ECE 551 Digital System Design & Synthesis Lecture 08 The Synthesis Process Constraints and Design Rules High-Level Synthesis Options.


1 ECE 551 Digital System Design & Synthesis Lecture 08 The Synthesis Process Constraints and Design Rules High-Level Synthesis Options


5 Of course, things are not so simply divided.


7 Pre-Synthesis Steps
- Syntax Check
  - Makes sure your HDL code follows the syntax rules of the Standard.
  - Finds errors like typos, missing semicolons, “begin” without “end”, assigning to a net in a behavioral block, etc.
  - Only a surface-level check
  - Checks each module in isolation; doesn’t look at how they fit together

8 Pre-Synthesis Steps
- Elaboration
  - “Elaborates” HDL statements
    - Unrolls FOR loops
    - Computes values of constant functions
    - Replaces parameters with their values
    - Substitutes macro text
    - Evaluates generate conditionals and loops
  - Checks to make sure instantiated modules are defined
  - Checks inter-module connections for mismatched input/output connections (i.e. module port width not the same as connected net/variable width)
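To make the elaboration steps concrete, here is a minimal sketch of a module that exercises several of them at once. The module name `elab_demo`, the macro `DEFAULT_WIDTH`, and the port names are hypothetical and chosen only for illustration; they do not come from the lecture.

```verilog
`define DEFAULT_WIDTH 4   // macro text is substituted before anything else

// Hypothetical example module; all names are illustrative assumptions.
module elab_demo #(parameter WIDTH = `DEFAULT_WIDTH)
                 (input  [WIDTH-1:0] a, b,
                  output [WIDTH-1:0] y,
                  output reg         parity);

  // Generate loop: elaboration creates WIDTH separate AND assignments,
  // one per value of the genvar.
  genvar g;
  generate
    for (g = 0; g < WIDTH; g = g + 1) begin : and_bits
      assign y[g] = a[g] & b[g];
    end
  endgenerate

  // Procedural FOR loop with constant bounds: once WIDTH is replaced by
  // its value, the loop can be unrolled into a fixed chain of XORs.
  integer i;
  always @(*) begin
    parity = 1'b0;
    for (i = 0; i < WIDTH; i = i + 1)
      parity = parity ^ y[i];
  end
endmodule
```

After elaboration with the default parameter, nothing in this module depends on `WIDTH`, `g`, or `i` any more: the design is four AND operations feeding a four-input XOR chain.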

9 Pre-Synthesis Steps
- Design Check
  - Checks design for issues that may make it unsynthesizable, but are otherwise legal HDL
  - Detects multiple drivers to non-tristates
  - Detects combinational loops
  - Gives errors or warnings about unsynthesizable constructs like delays, unsupported operators, etc.
  - Warns about unconnected or constant-value ports
  - May give warnings about inferred latches
  - Many of these produce warnings rather than errors; make sure you read the warnings when synthesizing!
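The inferred-latch warning is the one that most often catches people out, because the code that triggers it is perfectly legal. A small sketch of the problem and one common fix follows; the module names `latch_inferred` and `latch_free` are hypothetical.

```verilog
// Hypothetical modules for illustration; not taken from the lecture.

// Legal Verilog, but q is not assigned when sel == 0, so the hardware
// must remember the old value: the tool infers a latch.
module latch_inferred(input sel, input d, output reg q);
  always @(*) begin
    if (sel)
      q = d;
  end
endmodule

// Assigning q on every path through the block keeps it combinational.
module latch_free(input sel, input d, output reg q);
  always @(*) begin
    q = 1'b0;        // default value covers the sel == 0 case
    if (sel)
      q = d;
  end
endmodule
```

Both modules pass a syntax check. Only the design check (or a careful read of the synthesis log) reveals that the first one contains storage the designer probably did not intend.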

10 Synthesis Process
- Inputs
  - Functional hardware description in HDL
  - List of design constraints and design rules
    - Desired clock frequency / maximum delay
    - Limits on area, power, capacitance
  - Technology library (logic cells, wire models, etc.)
  - User-specified synthesis options/strategies
- Output
  - Ideally: A netlist that uses the specified technology library, produces the same behavior as the functional description, and meets the design constraints
  - Reports that summarize the area and timing of the implementation


12 Logic Synthesis Steps
- Translation
  - The synthesis tool identifies the behavior of high-level constructs and replaces them with a structural representation from a generic technology library.
  - Examples: “adder”, “multiplier”, “flip-flop”, “latch”
- High-Level Optimizations
  - The tool performs optimizations at the Boolean equation level
  - The types of optimizations depend on your strategies
  - Examples: Reducing the number of logic levels, minimizing the number of Boolean operations, eliminating redundant computations
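As a rough illustration of translation, the short module below contains two constructs that a synthesis tool would typically recognize and replace with generic components. The module name `translate_demo` is hypothetical, and the comments describe the usual outcome rather than the guaranteed behavior of any particular tool.

```verilog
// Hypothetical example; names are illustrative assumptions.
module translate_demo(input clk, input [7:0] a, b, output reg [7:0] q);
  // The '+' operator is typically translated to a generic 8-bit adder.
  wire [7:0] s = a + b;

  // An edge-triggered always block is typically translated to flip-flops.
  always @(posedge clk)
    q <= s;
endmodule
```

At this stage nothing has been tied to a real cell library yet: "adder" and "flip-flop" are still technology-independent placeholders.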

13 Logic Synthesis Steps
- Mapping
  - The synthesis tool replaces the generic representations of gates and logic structures with equivalent hardware representations from the provided technology library
  - The netlist now consists of a structural representation of logic cells (Standard Cell) or LUTs/CLBs (FPGA)
- Low-Level Optimizations
  - The tool performs optimizations at the logic cell level, either to reduce delay or reduce area
  - Examples: Duplicating logic, re-ordering operations to minimize delay, re-timing registers
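"Re-ordering operations to minimize delay" can be pictured with a four-input sum. The two hypothetical modules below compute the same value; the difference is only in how the additions are grouped, which is the kind of restructuring a tool may perform on its own.

```verilog
// Hypothetical modules for illustration; names are assumptions.

// Chain: three adders in series, so the critical path crosses all three.
module sum_chain(input [7:0] a, b, c, d, output [9:0] y);
  assign y = ((a + b) + c) + d;
endmodule

// Tree: two adders work in parallel and a third combines them,
// so the critical path crosses only two adders.
module sum_tree(input [7:0] a, b, c, d, output [9:0] y);
  assign y = (a + b) + (c + d);
endmodule
```

Because the output is 10 bits wide, all operands are extended to 10 bits before adding, and the two groupings give identical results for every input.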

14 A Brief Aside on Mapping
- People commonly say that when using Structural Verilog, you know exactly what gates you are getting.
  - Is this true?
- It actually depends on what’s in your Tech Library
  - If your library contains an XOR gate, then an XOR primitive will be mapped to that gate
  - But what if your Tech Library only contains NAND gates? Or only Look-up Tables?
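One way to picture the NAND-only case is the classic four-NAND construction of XOR. The sketch below shows what the designer wrote next to one possible mapped result; the module names are hypothetical, and a real tool targeting a NAND-only library is not guaranteed to choose exactly this structure.

```verilog
// What the designer wrote: a single XOR primitive.
module xor_as_written(input a, b, output y);
  xor g0(y, a, b);
endmodule

// One possible equivalent built only from 2-input NANDs
// (an assumption for illustration, not actual tool output).
module xor_nand_only(input a, b, output y);
  wire n1, n2, n3;
  nand g1(n1, a, b);    // n1 = ~(a & b)
  nand g2(n2, a, n1);   // n2 = ~(a & n1)
  nand g3(n3, b, n1);   // n3 = ~(b & n1)
  nand g4(y,  n2, n3);  // y  = (a & n1) | (b & n1) = a ^ b
endmodule
```

So even "structural" code only fixes the function; the gates you actually get depend on what the library offers.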

15 Why require Constraints & Strategies?
- Synthesis is hard (NP-hard!)
  - For a circuit of any useful size, the number of possible implementations is enormous
  - It is too computationally intensive to try them all
  - Need to know when a solution is good enough to stop
  - We usually give the tool hints on how to proceed
- Often there is no universally “best” solution
  - Area vs. delay
  - Throughput vs. latency
  - Power vs. frequency
- Constraints & strategies allow us to manage tradeoffs to find the solution that meets our needs

16 Constraint Examples
- Minimize area

    module mac(input clk, rst, input [31:0] in, output [63:0] out);
      reg [31:0] constreg;
      reg [63:0] mult, add, result;
      reg [2:0]  count;

      assign out = result;

      always @(*) mult = constreg * in;
      always @(*) add  = mult + result;

      always @(posedge clk) begin
        if (rst) begin
          constreg <= in;
          result   <= 0;
          count    <= 0;
        end else if (count > 0) begin
          result <= add;
          count  <= count - 1;
        end else begin
          result <= 0;
          count  <= 4;
        end
      end
    endmodule

17 Setting Design Constraints
- set_max_area 20000
  - Sets maximum area to 20,000 cell units
- set_max_delay 4 -to [all_outputs]
  - Sets maximum delay of 4 to any output
- set_max_dynamic_power 10mW
  - Sets maximum dynamic power to 10 mW
- create_clock "clk" -period 10
  - Specifies that port clk is a clock with a period of 10 ns
- create_clock -name "my_clk" -period 12
  - Creates a virtual clock called my_clk with a period of 12 ns; use with combinational logic

18 Constraint Examples
CLK_PERIOD = 4 (250 MHz)
MAX_AREA = 80000
Arrival: 3.73   Slack: 0.01   Area: 68122
Slack = CLK_PERIOD - (Arrival + Library Setup Time)
Library Setup Time is approximately 0.25-0.26 ns for these examples

19 Constraint Examples
CLK_PERIOD = 4
MAX_AREA = 65000
Arrival: 3.75   Slack: 0.00   Area: 64758

20 Constraint Examples
CLK_PERIOD = 4
MAX_AREA = 60000
Arrival: 3.75   Slack: 0.00   Area: 63377

21 Constraint Examples
- Maximize speed

    module mac(input clk, rst, input [31:0] in, output [63:0] out);
      reg [31:0] constreg;
      reg [63:0] mult, add, result;
      reg [2:0]  count;

      assign out = result;

      always @(*) mult = constreg * in;
      always @(*) add  = mult + result;

      always @(posedge clk) begin
        if (rst) begin
          constreg <= in;
          result   <= 0;
          count    <= 0;
        end else if (count > 0) begin
          result <= add;
          count  <= count - 1;
        end else begin
          result <= 0;
          count  <= 4;
        end
      end
    endmodule

22 Constraint Examples
CLK_PERIOD = 4 (250 MHz)
MAX_AREA = 80000
Arrival: 3.73 (+ 0.26 = 3.99)   Slack: 0.01   Area: 68122

23 Constraint Examples
CLK_PERIOD = 3.6 (278 MHz)
MAX_AREA = 80000
Arrival: 3.42 (+ 0.26 = 3.68)   Slack: -0.08   Area: 73131

24 Constraint Examples
CLK_PERIOD = 3.7 (270 MHz)
MAX_AREA = 90000
Arrival: 3.45 (+ 0.25 = 3.7)   Slack: 0.00   Area: 75673

25 Optimization Priorities
- Design rules have priority over timing goals
- Timing goals have priority over area goals
- Design rules have highest priority
- To prioritize area constraints:
  - use the ignore_tns (total negative slack) option when you specify the area constraint: set_max_area -ignore_tns 10000
- To change priorities use set_cost_priority
  - Example: set_cost_priority -delay
- To remove all optimization constraints use remove_constraint

26 Constraints Default Cost Vector

27 Compiling the Design
- Once optimization specifications are set, the design is compiled
- The compile command
  - Logic-level and gate-level synthesis
  - Optimizations of the design
- The compile_ultra command
  - Two-pass high effort compile of the design
  - May want to compile normally first to get a ballpark figure (higher effort == longer compilation)
- What is the purpose of doing multiple passes?

28 Synthesis Strategies
- Even after supplying HDL code, Tech Library, and Constraints, the designer is still responsible for the Synthesis Strategy.
- Why do we use Strategies?
  - The amount of CPU time and memory we devote to synthesis are still limited resources
  - The designers may already have a good idea about what sort of hardware they want

29 Compiling the Design
- Useful compile options include:
    -map_effort low | medium | high    (default is medium)
    -area_effort low | medium | high   (default same as map_effort)
    -incremental_mapping               (may improve already-mapped)
    -verify                            (compares initial and synthesized designs)
    -ungroup_all                       (collapses all levels of design hierarchy)

30 Top-Down Compilation
- Use the top-down compile strategy when compile time and synthesizer memory are not limiters
- Synthesizes each design unit separately and uses top-level constraints
- Basic steps are:
  - Read in the entire design using analyze/elaborate or: acs_read_hdl -recurse $TOP_DESIGN
  - Resolve multiple instances of any design references with uniquify
  - Apply attributes and constraints to the top level
  - Compile the design using compile or compile_ultra

31 Example Top-Down Script

    # read in the entire design
    analyze -library WORK -format verilog {E.v D.v C.v B.v A.v TOP.v}
    # elaborate takes the name of the top-level design, not the file list
    elaborate TOP -library WORK
    current_design TOP
    # link TOP to the libraries and modules it references
    link
    # set design constraints
    set_max_area 2000
    # resolve multiple references
    uniquify
    # compile the design
    compile

32 Bottom-Up Compile Strategy
- The bottom-up compile strategy
  - Compile the subdesigns separately and then incorporate them
  - Top-level constraints are applied and the design is checked for violations.
- Advantages:
  - Compiles large designs more quickly (divide-and-conquer)
  - Requires less memory than top-down compile
- Disadvantages:
  - Need to develop local constraints as well as global constraints
  - May need to repeat process several times to meet design goals
- Might use if memory or CPU time are limited

33 Compile-Once-Don’t-Touch Method
- The compile-once-don’t-touch method uses the set_dont_touch command to preserve the compiled subdesign

    current_design top
    characterize U2/U3
    current_design C
    compile
    current_design top
    set_dont_touch {U2/U3 U2/U4}
    compile

- What are advantages and disadvantages?

34 Resolving Multiple References
- In a hierarchical design, subdesigns are often referenced by more than one cell instance
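A small, hypothetical example of a multiply-referenced subdesign: the full adder below instantiates the same `half_adder` design twice, as cells U1 and U2. The module and instance names are illustrative assumptions, not the design from the slide's figure.

```verilog
// Subdesign that will be referenced more than once.
module half_adder(input a, b, output s, c);
  assign s = a ^ b;
  assign c = a & b;
endmodule

// Parent design: U1 and U2 are two cell instances of one reference.
module full_adder(input a, b, cin, output s, cout);
  wire s0, c0, c1;
  half_adder U1(.a(a),  .b(b),   .s(s0), .c(c0));
  half_adder U2(.a(s0), .b(cin), .s(s),  .c(c1));
  assign cout = c0 | c1;
endmodule
```

U1 and U2 see different surroundings (different input arrival times and output loads), which is why the tool needs to be told how to treat the shared `half_adder` reference before compiling.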

35 Uniquify Method
- The uniquify command creates a uniquely named copy of the design for each instance.

    current_design top
    uniquify
    compile

- Each design optimized separately
- What are advantages and disadvantages?

36 Ungroup Method (“Flattening”)
- The ungroup command makes unique copies of the design and removes levels of the hierarchy

    current_design B
    ungroup {U3 U4}
    current_design top
    compile

- What are advantages and disadvantages?

37 Benefits of Ungrouping Hierarchy

    module logic1(input a, c, e, output reg x);
      always @(a, c, e)
        x = ((~a|~c) & e) | (a&c);
    endmodule

    module logic2(input a, b, c, d, output reg y);
      always @(a, b, c, d)
        y = ((((~a|~c)&b) | ((a|~b)&c))&d) | ((a|~b)&~d);
    endmodule

    module logic(input a, b, c, d, e, f, output reg z);
      wire x, y;
      logic1 u1(a, c, e, x);      // module instances require an instance name
      logic2 u2(a, b, c, d, y);
      always @(x, y, f)
        z = (~f&x) | (f&y);
    endmodule

Without Hierarchy: Area 34.15, Delay 0.25
With Hierarchy: Area 36.15, Delay 0.25
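To see where the saving can come from, here is a hand-flattened sketch of the same three equations in one module. The name `logic_flat` is hypothetical, and this is only a picture of what the tool sees after ungrouping, not its actual output.

```verilog
// Hand-flattened version of logic1 + logic2 + logic (illustrative only).
module logic_flat(input a, b, c, d, e, f, output reg z);
  reg x, y;
  always @(*) begin
    // Formerly inside logic1
    x = ((~a|~c) & e) | (a&c);
    // Formerly inside logic2; note that (~a|~c) also appears in x
    y = ((((~a|~c)&b) | ((a|~b)&c))&d) | ((a|~b)&~d);
    // Formerly the top level
    z = (~f&x) | (f&y);
  end
endmodule
```

With the module boundaries gone, the optimizer is free to share terms such as `(~a|~c)` between what used to be separate subdesigns, which it cannot do while each subdesign must keep its own ports intact.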

38 Ungrouping versus Boolean Flattening
- Ungrouping is commonly referred to as “Flattening the Hierarchy”, even by tool vendors
- Because of this, many people incorrectly think the “set_flatten true” option in Synopsys is the same as “ungroup”
- set_flatten true tells Design Vision to flatten the Boolean equations describing your logic down to a two-level expression. That is, to create a Sum of Products expression.
- Flattening Boolean equations is a way of reducing delay at the cost of increased area; we’ll talk about it more in a later lecture.
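A small sketch of what Boolean flattening means, using two hypothetical modules that compute the same function. The gate counts in the comments assume simple AND/OR gates and are only meant to show the direction of the tradeoff.

```verilog
// Factored, multi-level form: AND -> OR -> AND (3 logic levels, 6 literals).
module factored(input a, b, c, d, e, f, output y);
  assign y = ((a & b) | c) & ((d & e) | f);
endmodule

// Flattened two-level sum of products: 4 AND terms into one OR
// (2 logic levels, 11 literals).
module flattened(input a, b, c, d, e, f, output y);
  assign y = (a & b & d & e) | (a & b & f) | (c & d & e) | (c & f);
endmodule
```

The flattened form has fewer levels between input and output, but nearly twice as many literals, which is the delay-for-area trade the slide describes.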

39 Dealing with Structured Logic
- Sometimes we do not want the synthesis tool to try to optimize our Boolean equations.
- Structured Logic refers to Boolean logic operations that are structured in a certain way to achieve a goal, such as reduced delay or fault tolerance.
  - Examples: Carry-Lookahead Adder, Wallace Multiplier, duplicated logic
- set_structure true (default): tells the tool it can re-order, factor, or decompose the logic equations
- set_structure false: tells the tool to leave the logic alone
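A 4-bit carry-lookahead adder is a compact example of structured logic. In the sketch below (module name `cla4` is hypothetical), every carry is written out as a flat function of the generate and propagate signals on purpose, so that no carry has to wait for the previous one.

```verilog
// Illustrative 4-bit carry-lookahead adder; names are assumptions.
module cla4(input [3:0] a, b, input cin, output [3:0] sum, output cout);
  wire [3:0] g = a & b;   // generate:  this bit creates a carry
  wire [3:0] p = a ^ b;   // propagate: this bit passes a carry along
  wire [4:0] c;

  assign c[0] = cin;
  assign c[1] = g[0] | (p[0] & c[0]);
  assign c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c[0]);
  assign c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
                     | (p[2] & p[1] & p[0] & c[0]);
  assign c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
                     | (p[3] & p[2] & p[1] & g[0])
                     | (p[3] & p[2] & p[1] & p[0] & c[0]);

  assign sum  = p ^ c[3:0];
  assign cout = c[4];
endmodule
```

An optimizer that is allowed to factor these equations could notice that c[2] equals g[1] | (p[1] & c[1]) and rebuild a ripple chain to save area, undoing the speed benefit the designer was after. That is the situation in which leaving the logic alone is the better choice.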

40 Checking your Design
- Use the check_design command to verify design consistency.
  - Usually run both before and after compiling a design
  - Gives a list of warning and error messages
    - Errors will cause compiles to fail
    - Warnings indicate a problem with the current design
  - Try to fix all of these, since later they can lead to problems
  - Use check_design -summary or check_design -no_warnings to limit the number of warnings given
- Use check_timing to locate potential timing problems

41 Analyzing your Design [1]
- There are several commands to analyze your design
- report_design
  - display characteristics of the current design
    - operating conditions, wire load model, output delays, etc.
    - parameters used by the design
- report_area
  - displays area information for the current design
    - number of nets, ports, cells, references
    - area of combinational logic, non-combinational, interconnect, total

42 Analyzing Your Design [2]
- report_hierarchy
  - displays the reference hierarchy of the current design
  - tells modules/cells used and the libraries they come from
- report_timing
  - reports timing information about the design
  - default shows one worst case delay path
- report_resources
  - Lists the resources and datapath blocks used by the current design
- Can send reports to files
  - report_resources > cmult_resources.rpt
- Lots of other report commands available

43 Synthesis Scripts
- Synthesis scripts provide a convenient method for performing synthesis multiple times
- To run the script, enter the directory which contains the Verilog code and type:
  - dc_shell -tcl_mode -f script.tcl
  - dc_shell -tcl_mode -f script.tcl > log.txt &
    - This will start the script in the background and store its output to log.txt

44 Example Synthesis Script

    analyze -library WORK -format verilog {./register_file_behave.v}
    elaborate reg_file_behave -architecture verilog -library WORK
    create_clock -name "clk" -period 2 -waveform {0 1} {clk}
    set_dont_touch_network [ find clock clk ]
    set_max_area 30000
    check_design
    uniquify
    compile -map_effort medium
    report_area > area_report.txt
    report_timing > timing_report.txt
    report_constraint -all_violators > violator_report.txt

45 Design Optimization: FIR Filter
- Used in signal processing
  - Passes through some data but not all (filter!)
  - Example: Remove noise from image/sound
- Uses multipliers and adders
  - Multiply constant “tap” value against time-delayed input value
- In the Verilog, y is out, b_k is taps, and x is data
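The symbols y, b_k, and x refer to the filter equation. Its figure is not part of this transcript, so the standard textbook form of an N-tap FIR filter is given here as an assumption, with N playing the role of `ntaps` in the Verilog:

```latex
y[n] = \sum_{k=0}^{N-1} b_k \, x[n-k]
```

Each output sample is a weighted sum of the N most recent inputs, so one output needs N multiplications and N-1 additions. How those operations are spread over time and hardware is what distinguishes the implementations that follow.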

46 FIR Filter Design

47 Design Optimization: FIR Filter
- We’ll look at three different approaches to implementing this filter
  - “Initial”
  - “Small”
  - “Fast”
- We’ll revisit the idea of re-architecting algorithms for better area, latency, and throughput later.
- As an exercise, you should take some time on your own to try to understand exactly what is happening in each of the following code segments.
  - Learning to read and understand someone else’s (confusing) code is an extremely valuable skill

48 Initial Design: Code [1]

    module fir_init(clk, rst, in, out);
      parameter bitwidth = 8;
      parameter ntaps = 4;
      parameter logntaps = 2;

      input clk, rst;
      input [bitwidth-1:0] in;
      output reg [bitwidth-1:0] out;

      reg [bitwidth-1:0] taps [0:ntaps-1];
      reg [bitwidth-1:0] data [0:ntaps-1];
      reg [logntaps:0] count;
      integer i;

49 Initial Design: Code [2]

      always @(posedge clk) begin
        if (rst) begin
          // indicate we need to load all the tap values
          count <= 0;
          // reset the data and taps
          for (i = 0; i < ntaps; i = i + 1) begin: resetloop
            data[i] <= 0;
            taps[i] <= 0;
          end
        end
        else if (count < ntaps) begin
          // we need to load the tap values before filtering
          for (i = ntaps-1; i > 0; i = i - 1) begin: loadtaps
            taps[i] <= taps[i-1];
          end
          // load the new value at tap[0]
          taps[0] <= in;
          count <= count+1;
        end

50 Initial Design: Code [3]

        else begin
          // ready to do the filtering
          // first shift in the new input data value
          for (i = ntaps-1; i > 0; i = i - 1) begin: shiftdata
            data[i] <= data[i-1];
          end
          // load the new value at data[0]
          data[0] <= in;
        end // else: !if(count < ntaps)
      end // always @ (posedge clk)

      // compute the filtered result
      always @(*) begin
        out = 0;
        for (i = 0; i < ntaps; i = i + 1) begin: filterloop
          out = out + (data[i] * taps[ntaps-1 - i]);
        end
      end
    endmodule

51 Initial Design: Synthesis
- Constraints
  - CLK_PERIOD 4
  - INPUT_DELAY 0.2
  - OUTPUT_DELAY 0.2
  - MAX_AREA 8000
- Results
  - Arrival Time 3.13
  - Slack 0.67 (MET)
  - Area 7335
- Should we make our constraints more aggressive?

52 Initial Design: Schematic

53 Small Design: Code [1]

    module fir_area(clk, rst, in, out);
      parameter bitwidth = 8;
      parameter ntaps = 4;
      parameter logntaps = 2;

      input clk, rst;
      input [bitwidth-1:0] in;
      output reg [bitwidth-1:0] out;

      reg [bitwidth-1:0] taps [0:ntaps-1];
      reg [bitwidth-1:0] data [0:ntaps-1];
      reg [bitwidth-1:0] partial;
      reg [logntaps:0] count;
      reg [logntaps-1:0] step;
      reg ready; // indicates ready to filter
      integer i;

54 Small Design: Code [2]

      always @(posedge clk) begin
        if (rst) begin
          // indicate we need to load all the tap values
          count <= 0;
          ready <= 0;
          // reset the data and taps
          for (i = 0; i < ntaps; i = i + 1) begin: resetloop
            data[i] <= 0;
            taps[i] <= 0;
          end
        end
        else if (count < ntaps && ~ready) begin
          // we need to load the tap values before filtering
          for (i = ntaps-1; i > 0; i = i - 1) begin: loadtaps
            taps[i] <= taps[i-1];
          end
          // load the new value at tap[0]
          taps[0] <= in;
          count <= count+1;
          // the last tap is being loaded this cycle; inside this branch
          // count is always < ntaps, so compare against ntaps-1
          if (count >= ntaps-1) begin
            ready <= 1;
            count <= 0;
          end
        end

55 Small Design: Code [3]

        else begin
          // ready to do the filtering
          // first shift in the new input data value
          for (i = ntaps-1; i > 0; i = i - 1) begin: shiftdata
            data[i] <= data[i-1];
          end
          // load the new value at data[0]
          data[0] <= in;
        end // else: !if(count < ntaps)
      end // always @ (posedge clk)

56 Small Design: Code [4]

      // compute the filtered result
      always @(posedge clk) begin
        if (rst || ~ready) begin
          step <= 0;
          partial <= 0;
        end
        else begin
          if (step == 0) begin
            out <= partial;
            partial <= (data[0] * taps[ntaps-1]);
          end
          else begin
            out <= out;
            partial <= partial + (data[step] * taps[ntaps - 1 - step]);
          end
          if (step < ntaps-1) step <= step + 1;
          else step <= 0;
        end
      end
    endmodule

57 Small Design: Synthesis
- Constraints
  - CLK_PERIOD 4
  - INPUT_DELAY 0.2
  - OUTPUT_DELAY 0.2
  - MAX_AREA 8000
- Results
  - Arrival Time 2.76 (vs. 3.13)
  - Slack 0.92 (MET) (4 clock cycles)
  - Area 5754 (vs. 7335)
- What are the tradeoffs?

58 Small Design: Schematic

59 Fast Design: Code [1]

    module fir_fast(clk, rst, in, out);
      parameter bitwidth = 8;
      parameter ntaps = 4;
      parameter logntaps = 2;

      input clk, rst;
      input [bitwidth-1:0] in;
      output [bitwidth-1:0] out;

      reg [bitwidth-1:0] taps [0:ntaps-1];
      reg [bitwidth-1:0] mult [0:ntaps-1];
      reg [bitwidth-1:0] partial [0:ntaps-1];
      reg [logntaps:0] count;
      reg ready; // indicates ready to filter
      integer i;

      assign out = partial[ntaps-1];

60 Fast Design: Code [2]

      always @(posedge clk) begin
        if (rst) begin
          // indicate we need to load all the tap values
          count <= 0;
          // ready is not assigned anywhere else; clear it so ~ready is defined
          ready <= 0;
          // reset the taps
          for (i = 0; i < ntaps; i = i + 1) begin: resetloop
            taps[i] <= 0;
          end
        end
        else if (count < ntaps && ~ready) begin
          // we need to load the tap values before filtering
          for (i = ntaps-1; i > 0; i = i - 1) begin: loadtaps
            taps[i] <= taps[i-1];
          end
          // load the new value at tap[0]
          taps[0] <= in;
          count <= count+1;
        end

61 Fast Design: Code [3]

        else begin
          // taps stay the same
        end // else: !if(count < ntaps)
      end // always @ (posedge clk)

      // compute the filtered result (pipelined)
      always @(posedge clk) begin
        // get the product of the input with each of the tap values
        for (i = 0; i < ntaps; i = i + 1)
          mult[i] <= in * taps[i];
        // special case at front
        partial[0] <= mult[0];
        // get the partial sums for the rest
        for (i = 1; i < ntaps; i = i + 1)
          partial[i] <= partial[i-1] + mult[i];
      end
    endmodule

62 Fast Design: Synthesis
- Constraints
  - CLK_PERIOD 4
  - INPUT_DELAY 0.2
  - OUTPUT_DELAY 0.2
  - MAX_AREA 8000
- Results
  - Arrival Time 1.92 (vs. 3.13)
  - Slack 1.82 (MET) (1 clock cycle!*)
  - Area 7311 (vs. 7335)
- What are the tradeoffs?

63 Fast Design: Schematic

64 Optimization Strategies
- Area vs. Delay: often only really optimize for one
  - “Fastest given an area constraint”
  - “Smallest given a speed constraint”
- Design Compiler Reference Manual has several pointers on synthesis settings for these goals
- In some ways, synthesis is as much an art as it is a science
  - Experiment with different options to see how they interact with each other

65 Design Examples
- All using same constraints
- No special synthesis options
- Can get even more dramatic results by combining:
  - Coding style
  - Tight constraints
  - Synthesis optimization options

66 Some More “Small Design” Results

                                  constraints                         results
    command                       area   input/output delay   clock period   area   slack
    compile -area_effort medium   8000   0.2                  4              5797   2.05
    compile -area_effort high     5500   0.2                  4              5778   2.05
    compile_ultra                 5500   0.2                  4              5242   1.42
    compile_ultra                 5000   0.2                  4              5242   1.42
    compile + compile_ultra       5000   0.2                  4              6562   1.78
    compile_ultra                 5500   0.2                  2              5274   0.01
    compile_ultra                 5500   0.2                  1.8            5391   0.00
    compile_ultra                 5500   0.2                  1.7            5519   0.00
    compile_ultra (rst no delay)  5500   0.2                  1.7            5414   0.00
    compile_ultra                 5500   0.1                  1.7            5636   0.01
    compile_ultra (rst no delay)  5500   0.1                  1.7            5414   0.00
    compile_ultra                 5500   0.5                  1.7            5923   0.00
    compile_ultra (rst no delay)  5500   0.5                  1.7            5414   0.00

67 Script

    analyze -library WORK -format verilog {fir_area.v}
    elaborate fir_area -architecture verilog -library WORK
    create_clock -name "clk" -period 4 {clk}
    set_dont_touch_network [ find clock clk ]
    set_max_area 5000
    set NORM_INPUTS [remove_from_collection [all_inputs] "clk rst"]
    #set NORM_INPUTS [remove_from_collection [all_inputs] "clk"]
    set_input_delay 0.2 -max -clock clk $NORM_INPUTS
    set_output_delay 0.2 -max -clock clk [all_outputs]
    check_design > check_design.txt
    uniquify
    #compile -map_effort medium -area_effort medium
    compile -map_effort high -area_effort high
    compile_ultra
    report_area > area_report.txt
    report_timing > timing_report.txt
    report_constraint -all_violators > violator_report.txt
    exit

68 Want more information about any of the Design Vision commands listed in these lectures?
Log in to a CAE computer and type:

    dc_shell
    man command_name

