ECE 551 Digital System Design & Synthesis

ECE 551 Digital System Design & Synthesis Lecture 10 Synthesis Techniques

Lecture 10 Topics Synthesis Process Revisited Optimization Stages in Synthesis Advanced Synthesis Strategies

Synthesis Verilog files aren’t hardware yet! Need to “synthesize” them Tool reads hardware descriptions Figures out what hardware to make Done automatically Faster! Easier! Designers still have to understand hardware! Avoid pre- vs. post-synthesis discrepancies Describe EFFICIENT hardware

Useful Documentation Fairly complete documentation is available for the Synopsys tools using: /afs/engr.wisc.edu/apps/eda/synopsys/syn_Y-2006.06-SP1/sold See especially (through Design Compiler link) Design Vision User Guide Design Compiler User Guide Design Compiler Reference Manuals HDL Compiler (Presto Verilog) Reference Manual HDL Compiler for Verilog Reference Manual Use as references

HDL Compiler for Verilog Reference Manual, pg. 1-5. HDL Compiler is called by Design Compiler and Design Vision Why do we need to compare synthesized code to initial code?

Design Compiler User Guide, pg. 2-17 Design Vision is GUI for Design Compiler: use design_vision Can also run Design Compiler directly using dc_shell To compile using a synthesis script use dc_shell -tcl_mode -f file_name

Synthesis Script Example [1]
# To run, place in the directory with all the Verilog files
# and type: dc_shell -tcl_mode -f script.tcl
#Analyze input files.
analyze -library WORK -format verilog {./prob5.v ./prob1.v ./prob2.v}
#Elaborate the design.
elaborate GF_multiplier_mword -architecture verilog -library WORK
#Sets clock constraint of 2ns with 50% duty cycle on signal "clock".
create_clock -name "clk" -period 2 -waveform {0 1} {clock}
set_dont_touch_network [ find clock clk ]
#Sets the area constraint for the design
set_max_area 50000

Synthesis Script Example [2]
#Check and compile the design
check_design > check_design.txt
uniquify
compile -map_effort medium
#Export netlist for post-synthesis simulation into synth_netlist.v
change_names -rule verilog -hierarchy
write -format verilog -hierarchy -output synth_netlist.v
#Generate reports
report_resources > resource_report.txt
report_area > area_report.txt
report_timing > timing_report.txt
report_constraint -all_violators > violator_report.txt
report_register -level_sensitive > latch_report.txt
exit

Internal Synthesizer Flow (Synopsys) [flow diagram] HDL Description -> Syntax Checking -> Synthesizer Policy Checking -> Elaboration & Translation -> Structural Representation -> Architectural Optimization -> Multi-Level Logic Optimization -> Technology Mapping (using the Technology Library) -> Technology-Based Implementation

Initial Steps Parsing for Syntax and Semantics Checking Gives error messages and warnings to user User may modify the HDL description in response Synthesizer Policy Checking (“Check Design”) Check for adherence to allowable language constructs Are you using unsupported operators or constructs? Combinational feedback? Multiple drivers to non-tristate? This is where you find out you can’t use certain Verilog constructs This is synthesizer-dependent Example: Advanced DesignWare library allows modulo with any value; most other tools only allow modulo with powers of 2. Certain things common to MOST synthesizers See HDL Compiler for Verilog Reference Manual for constructs

Elaboration & Translation Unrolls loops, substitutes macros & parameters, computes constant functions, evaluates generate conditionals Builds a structural representation of the design Like a netlist, but includes larger components Not just gate-level, may include adders, etc. Gives additional errors or warnings to the user Issues in initial transformation to hardware. For example, port sizes do not match Affects quality achieved by optimization steps Structural representation depends on HDL quality Poor HDL can prevent optimization

Importance of Translation It is important for the tool to recognize the sort of logic structures you are trying to describe. If it sees a 32-bit full adder, the tool has built-in solutions for optimizing adders Ripple-carry, carry-save, carry look-ahead, etc. If it just sees a Boolean function with 65 inputs, it has to work a lot harder to achieve the same results Do you think it can invent a CLA on the fly?

Implications of Translation Writing clear, easy to understand code not only benefits other engineers, but may give you better synthesis results. Another reason for standard coding guidelines Brush up on the list in “Verilog Styles That Kill” If you have a decent synthesis tool, it’s usually better to use Verilog’s built-in arithmetic operators rather than trying to build them from gates or Boolean equations

Optimization in Synthesis None of these are guaranteed! Most synthesizers will make at least some attempt Detect and eliminate redundant logic Detect combinational feedback loops Exploit don't-care conditions Try to detect unused states Detect and collapse equivalent states Make state assignments if not made already Synthesize multi-level logic equations subject to: constraints on area and/or speed available technology (library)

Optimization Process Optimization modifies the generic netlist resulting from elaboration and translation. Uses cells from the technology library (mapping) Attempts to meet all specified constraints The process is divided into major phases All or some selection of the major phases may be performed during optimization Phase selection can be controlled by the user Some optimizations can be disabled (ex: set_structure) or forced (ex: set_flatten)

Optimization Phases Major Optimization Stages Architectural Logic-Level Gate-Level Architectural optimization High-level optimizations that occur before the design is mapped to the logic-level Based on constraints and high-level coding style After optimization circuit function is represented by a generic, technology-independent netlist (GTECH)

Architectural Optimization In Synopsys, optimizations include: Sharing common mathematical subexpressions Sharing resources Selecting DesignWare* implementations Replacing the generic representation from Translation with pre-built, optimized circuits Reordering operators Identifying arithmetic expressions for datapath synthesis *DesignWare is Synopsys’s library of pre-designed circuit implementations

Architectural Optimization Examples: Replace an adder used as a counter with incrementer count = count + 1; Replace adder and separate subtractor with adder/subtractor if not used simultaneously if (~sub) z = a + b; else z = a - b; Performs selection of pre-designed components (Synopsys DesignWare) adders, multipliers, shifters, comparators, muxes, etc. Need good code for synthesizer to do this Designer knows more about the project than the tool does! It can only do so much on its own.
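The adder/subtractor replacement above relies on the two's-complement identity a - b = a + ~b + 1, so one adder with conditionally inverted b and sub wired to the carry-in covers both branches. The following is a minimal Python sketch of that idea, not tool output; WIDTH, add_sub and matches_separate_units are illustrative names that do not come from the lecture.

```python
# Toy bit-level model of a shared adder/subtractor.
# WIDTH, add_sub and matches_separate_units are illustrative names (assumptions).
WIDTH = 8
MASK = (1 << WIDTH) - 1

def add_sub(a, b, sub):
    """Return a + b when sub == 0, or a - b when sub == 1, modulo 2**WIDTH."""
    # XOR every bit of b with sub: b passes through for add, is inverted for subtract.
    b_eff = b ^ (MASK if sub else 0)
    # sub doubles as the carry-in, completing the two's-complement negation (~b + 1).
    return (a + b_eff + sub) & MASK

def matches_separate_units():
    """Exhaustively compare against a dedicated adder and a dedicated subtractor."""
    for a in range(1 << WIDTH):
        for b in range(1 << WIDTH):
            if add_sub(a, b, 0) != (a + b) & MASK:
                return False
            if add_sub(a, b, 1) != (a - b) & MASK:
                return False
    return True
```

The extra hardware compared with a plain adder is one XOR per bit of b, which is typically much cheaper than a second carry chain.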

Logic/Gate-Level Optimization Works on the generic netlist created by logic synthesis Produces a technology-specific netlist. In Synopsys, it consists of four stages: Mapping Delay optimization Design rule fixing Area optimization This phase often runs in multiple iterations if constraints are not met on the first try

Logic/Gate-Level Optimization Mapping Generates a gate level implementation using tech library Tries to meet timing and area goals Delay optimization Tries to fix delay violations from mapping phase. Does not fix design rule violations or meet area constraints. Design rule fixing Tries to correct design rule violations Inserting buffers or resizing existing cells If necessary, violates optimization constraints Area optimization Tries to meet area constraints, which have lowest priority

Combinational Optimization

Gate-Level Optimization

Boolean Logic-Level Optimizations Verilog Description Technology Implementation TRANSLATION ENGINE Two-level Logic Functions Optimized Multi-level Logic Functions OPTIMIZATION MAPPING Libraries

Logic Optimizations Area: Number of gates (fewer == smaller); Size of gates, # inputs (fewer == smaller) Delay: Number of logic levels (fewer == faster); Size of gates, # inputs (fewer == faster) Note that the examples that follow ignore NOT gates for gate count / levels of circuits. This is because many libraries offer gate cells with one or more inputs already inverted.

Logic Optimizations Decomposition Extraction Factoring Substitution Elimination You don’t have to remember the names of these But should understand logic optimization Different techniques targeting area vs. delay

Decomposition Find common expressions in a single function Reduce redundancy Reduce area (number/size of gates) May increase delay More levels of logic Define a G(x) cost function to compare expressions G(inverter) = 0 G(basic gate) = #inputs to the gate Basic gates: AND, OR, NAND, NOR Based on the concept that the size of a gate is proportional to the number of inputs

Decomposition Example
F = abc + abd + a’c’d’ + b’c’d’
F = ab(c + d) + c’d’(a’ + b’)
F = ab(c + d) + (c + d)’(ab)’
X = ab (1 gate, 1 level)
Y = c + d (1 gate, 1 level)
F = XY + X’Y’ (3 gates, 2 levels; 5 gates, 3 levels total)
G(Original) = 16 (four 3-input gates, one 4-input gate)
G(Decomposed) = 10 (five 2-input gates)
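The decomposition can be checked by brute force: enumerate all 16 input combinations and compare both forms, then apply the cost function G (sum of gate input counts, inverters free). A minimal sketch; the function names are illustrative, not from the lecture.

```python
from itertools import product

def gate_cost(fan_ins):
    """G = sum of gate input counts; inverters cost 0 and are not listed."""
    return sum(fan_ins)

def f_original(a, b, c, d):
    # F = abc + abd + a'c'd' + b'c'd'
    return ((a and b and c) or (a and b and d)
            or (not a and not c and not d)
            or (not b and not c and not d))

def f_decomposed(a, b, c, d):
    x = a and b                              # X = ab
    y = c or d                               # Y = c + d
    return (x and y) or (not x and not y)    # F = XY + X'Y'

# Exhaustive truth-table comparison over all 2**4 input combinations.
equivalent = all(
    bool(f_original(*v)) == bool(f_decomposed(*v))
    for v in product([False, True], repeat=4)
)

cost_original = gate_cost([3, 3, 3, 3, 4])     # four 3-input ANDs, one 4-input OR
cost_decomposed = gate_cost([2, 2, 2, 2, 2])   # five 2-input gates
```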

Extraction Find common sub-expressions between functions Like decomposition, but across more than one function Reduce redundancy Reduce area (number/size of gates) May increase delay if more logic levels introduced

Extraction Example
F = (a + b)cd + e (3 gates, 3 levels)
G = (a + b)e’ (2 gates, 2 levels)
H = cde (1 gate, 1 level)
Common subexpressions: X = a + b, Y = cd (1 gate, 1 level each)
F = XY + e (4 gates, 3 levels)
G = Xe’ (2 gates, 2 levels)
H = Ye (2 gates, 2 levels)
Before: (3) 2-input ORs, (2) 3-input ANDs, (1) 2-input AND; G(original) = 6 + 6 + 2 = 14
After: (2) 2-input ORs, (4) 2-input ANDs; G(extracted) = 4 + 8 = 12
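Because extraction works across several functions at once, the check has to compare all three outputs together. A small sketch under the same cost model (gate inputs summed, inverters free); before and after are illustrative names.

```python
from itertools import product

def before(a, b, c, d, e):
    """F, G, H as originally written, each with its own copy of the logic."""
    f = ((a or b) and c and d) or e
    g = (a or b) and not e
    h = c and d and e
    return bool(f), bool(g), bool(h)

def after(a, b, c, d, e):
    """F, G, H rebuilt from the shared subexpressions X = a + b and Y = cd."""
    x = a or b
    y = c and d
    f = (x and y) or e
    g = x and not e
    h = y and e
    return bool(f), bool(g), bool(h)

# All three outputs must agree for every one of the 2**5 input combinations.
same = all(before(*v) == after(*v) for v in product([False, True], repeat=5))

cost_before = 3 * 2 + 2 * 3 + 1 * 2   # three 2-input ORs, two 3-input ANDs, one 2-input AND
cost_after = 2 * 2 + 4 * 2            # two 2-input ORs, four 2-input ANDs
```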

Factoring Traditional two-level logic is sum-of-products Sometimes better expressed by product-of-sums Fewer literals => less area May increase delay if logic equation not completely factored (becomes multi-level)

Factoring Example
Definitely good:
F = ac + ad + bc + bd (7 gates, 3 levels*)
F = (a + b)(c + d) (3 gates, 2 levels)
Maybe good:
F = ac + ad + e (3 gates, 2 levels, G = 7)
F = a(c + d) + e (3 gates, 3 levels, G = 6)
This one might improve area, but will likely increase delay (tradeoff).
*Assuming 2-input gates
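The gate and level counts for the "definitely good" case can be reproduced by modeling each expression as a tree of 2-input gates. This is a sketch under that assumption; the tuple encoding and the helper names gates, levels and evaluate are hypothetical.

```python
from itertools import product

# An expression is a variable name or a tuple (op, left, right) for one 2-input gate.
def gates(e):
    """Number of 2-input gates in the expression tree."""
    return 0 if isinstance(e, str) else 1 + gates(e[1]) + gates(e[2])

def levels(e):
    """Longest input-to-output path, counted in gates."""
    return 0 if isinstance(e, str) else 1 + max(levels(e[1]), levels(e[2]))

def evaluate(e, env):
    """Evaluate the expression for the variable assignment in env."""
    if isinstance(e, str):
        return env[e]
    op, left, right = e
    lv, rv = evaluate(left, env), evaluate(right, env)
    return (lv and rv) if op == "and" else (lv or rv)

# F = ac + ad + bc + bd, with the 4-input OR built as a balanced tree of 2-input ORs.
sop = ("or",
       ("or", ("and", "a", "c"), ("and", "a", "d")),
       ("or", ("and", "b", "c"), ("and", "b", "d")))

# F = (a + b)(c + d)
factored = ("and", ("or", "a", "b"), ("or", "c", "d"))

equivalent = all(
    evaluate(sop, dict(zip("abcd", v))) == evaluate(factored, dict(zip("abcd", v)))
    for v in product([False, True], repeat=4)
)
```

Here gates(sop) and levels(sop) give 7 and 3, while the factored form gives 3 and 2, matching the slide.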

Substitution Similar to Extraction When one function is a sub-function of another Reduce area Fewer gates Can increase delay if more logic levels

Substitution Example
G = a + b (1 gate, 1 level)
F = a + b + c (1 gate, 1 level)
F = G + c (2 gates, 2 levels)
Before: (1) 2-input OR, (1) 3-input OR
After: (2) 2-input ORs (better area but increased levels)
With compile_ultra, the sub-expressions do not have to explicitly match, i.e. a + b would still be identified if F = b + c + a

Elimination (Flattening) Opposite of previous optimizations Goal is to reduce delay Make signals travel through as few logic levels as possible But will likely increase area Gate replication / redundant logic Can force/disable this step using set_flatten true / set_flatten false

Elimination Example
G = c + d (1 gate, 1 level)
F = Ga + G’b (3 gates, 3 levels)
F = ac + ad + bc’d’ (4 gates, 2 levels)
Before: (2) 2-input ORs, (2) 2-input ANDs
After: (1) 2-input OR, (1) 3-input OR, (2) 2-input ANDs, (1) 3-input AND (worse area, but fewer levels)
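Flattening substitutes G into F and expands, using (c + d)’ = c’d’ by De Morgan. A quick exhaustive check that the two-level form computes the same F; the function names are illustrative.

```python
from itertools import product

def f_multilevel(a, b, c, d):
    g = c or d                                # G = c + d
    return (g and a) or (not g and b)         # F = Ga + G'b

def f_flattened(a, b, c, d):
    # F = ac + ad + bc'd'  (G eliminated from F's logic cone)
    return (a and c) or (a and d) or (b and not c and not d)

equivalent = all(
    bool(f_multilevel(*v)) == bool(f_flattened(*v))
    for v in product([False, True], repeat=4)
)
```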

compile_ultra Optimizations Ultra-high mapping effort, 2-pass Compilation Automatic hierarchical ungrouping Ungroups small modules before mapping Ungroups critical path based on delay Automatic datapath extraction * E.g. carry-save adders, sharing/unsharing Boundary optimization Propagates logic across hierarchical boundaries (constants, NC inputs/outputs, NOT) Sequential inversion * Sequential elements can have their outputs inverted

Datapath Extraction Optimizations Uses carry-save adders where beneficial Carry-propagate adders only when result is needed
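A carry-save adder reduces three operands to a sum word and a carry word with no carry rippling between bit positions, so a chain of additions needs only one carry-propagate adder at the very end. The sketch below models this on non-negative Python integers; carry_save and add_many are illustrative names, and the left-to-right reduction order is an assumption (a real tool would pick the order from timing).

```python
def carry_save(a, b, c):
    """3:2 compressor: return (sum_word, carry_word) with a + b + c == sum_word + carry_word."""
    sum_word = a ^ b ^ c                              # per-bit sum, no carry propagation
    carry_word = ((a & b) | (a & c) | (b & c)) << 1   # per-bit majority, weighted one bit up
    return sum_word, carry_word

def add_many(operands):
    """Add non-negative integers using carry-save stages and one final carry-propagate add."""
    operands = list(operands)
    while len(operands) > 2:
        a, b, c = operands[:3]
        operands = list(carry_save(a, b, c)) + operands[3:]
    return sum(operands)   # the single carry-propagate addition
```

For example, add_many([13, 7, 22, 5]) goes through two carry-save stages before the final addition.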

Datapath Extraction Optimizations Comparator sharing A>B, A=B, A<B use a single subtractor with multiple outputs Optimization of parallel constant multipliers SOP to POS transformation Operand reordering Explores trade-offs of common sub-expression sharing and mutually exclusive resource sharing
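Comparator sharing works because one subtraction already carries all three answers: the borrow-out indicates A<B, an all-zero result indicates A=B, and neither indicates A>B. A sketch for unsigned operands; WIDTH and compare are illustrative names.

```python
# Unsigned comparison from a single subtraction (WIDTH is an illustrative choice).
WIDTH = 8
MASK = (1 << WIDTH) - 1

def compare(a, b):
    """Return (a > b, a == b, a < b) for unsigned WIDTH-bit a and b."""
    diff = a - b                 # the single shared subtraction
    borrow = diff < 0            # borrow-out: set exactly when a < b
    zero = (diff & MASK) == 0    # all result bits zero: set exactly when a == b
    return (not borrow and not zero, zero, borrow)
```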

Sharing and Unsharing Expression sharing may be overridden later due to timing Z1 <= A + B + C Z2 <= A + B + D Arrival time is A < B < D < C

Sharing and Unsharing Mutually exclusive operations can share resources if (SEL) Z = A + B; else Z = C + D; When would this kind of sharing be a bad idea?
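The two structures can be written side by side: two adders followed by an output mux, or two input muxes followed by one adder. This is a behavioral sketch only; two_adders and one_adder are illustrative names.

```python
def two_adders(sel, a, b, c, d):
    """Unshared: both sums are computed in parallel, then SEL picks one."""
    sum_ab = a + b
    sum_cd = c + d
    return sum_ab if sel else sum_cd

def one_adder(sel, a, b, c, d):
    """Shared: SEL picks the operands first, then a single adder produces Z."""
    left = a if sel else c
    right = b if sel else d
    return left + right
```

Both return the same Z, but in the shared form the addition cannot start until SEL has arrived, so a late-arriving SEL is one plausible case where sharing hurts timing.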

Sequential Inversion set compile_seqmap_enable_output_inversion true Useful if the available flip-flops do not have the same asynchronous input (preset or clear) as required in the design
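One way to picture sequential inversion: if the design needs a flip-flop with asynchronous preset but the library only has flip-flops with asynchronous clear, the clear flop can store the inverted value and have its output inverted again. The classes below are a toy behavioral model under that assumption, not a description of how the tool implements it; ClearFlop and PresetFromClear are hypothetical names.

```python
class ClearFlop:
    """D flip-flop whose only asynchronous input is clear (Q goes to 0)."""
    def __init__(self):
        self.q = 0

    def clear(self):
        self.q = 0

    def clock(self, d):
        self.q = d & 1


class PresetFromClear:
    """Acts like a preset flop (Q goes to 1) by storing ~Q in a ClearFlop."""
    def __init__(self):
        self.ff = ClearFlop()

    def preset(self):
        self.ff.clear()          # stored value 0 reads back as Q = 1

    def clock(self, d):
        self.ff.clock(d ^ 1)     # invert on the way in

    @property
    def q(self):
        return self.ff.q ^ 1     # invert on the way out
```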

Register Retiming At the HDL level, determining the optimal placement of registers is difficult and tedious at best, or just plain impossible at worst The register retiming tool moves registers through the synthesized combinational logic network to improve timing and/or area Equalize delay (i.e. reduce critical path delay by increasing delay in other paths) Reduce the number of flip-flops if timing criteria are met Usually propagate registers forward Be aware that this may change the values of some internal signals compared to pre-synthesis.
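The delay-equalizing goal can be illustrated with a toy model: a single path of gates split into two pipeline stages by one register, where the clock period is set by the slower stage. The delays and the names min_clock_period and best_cut are hypothetical; real retiming works on a whole netlist and must also preserve behavior at the outputs.

```python
def min_clock_period(delays, cut):
    """Clock period of a two-stage pipeline with the register after delays[:cut]."""
    return max(sum(delays[:cut]), sum(delays[cut:]))

def best_cut(delays):
    """Register position that minimizes the clock period (first one on ties)."""
    return min(range(len(delays) + 1),
               key=lambda cut: min_clock_period(delays, cut))

# Hypothetical gate delays in ns along one path.
delays = [2, 3, 1, 4]

period_as_written = min_clock_period(delays, 1)               # register after the first gate
period_retimed = min_clock_period(delays, best_cut(delays))   # register moved forward
```

With these numbers the register placed after the first gate gives stages of 2 ns and 8 ns, while moving it forward by one gate balances the stages at 5 ns each.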

Register Retiming Example (1)

Register Retiming Example (2)

DC Topographical Mode When optimizing for delay, the synthesis engine is not aware of the net delays, since the place-and-route has not been accomplished Delays can be back-annotated and synthesis repeated after place-and-route, until closure is reached Layout-aware synthesis attempts to get faster timing closure by predicting the physical design and using that information in synthesis and optimization, particularly with respect to delay Estimates the placement and routing Predicts and uses net capacitances in synthesis and optimization

Further Reading There are many more commands out there to give you greater control over the synthesis process if you want it. See: Synopsys Online Documentation (SOLD) Design Compiler man pages