by: Saikat Bandyopadhyay © Interra Systems India Pvt Ltd


1 Synthesis in EDA Flow
by: Saikat Bandyopadhyay © Interra Systems India Pvt Ltd

If everybody is ready, I can start today's presentation. Hello, my name is Saikat, and I am the manager of the Concorde team. This presentation is mostly for a general audience with some familiarity with EDA and Verilog. Though it is mostly an overview of Synthesis, I am pretty sure that even people who are familiar with Synthesis will find something new or interesting in it. The presentation should take a total of 3 hours, excluding Q and A. We will have one break after every hour except the last, where we will continue till the end of Q and A. Let's now go to the topics that will be covered.
interra confidential

2 Content
Defining Synthesis
History
IC Design Flow
Synthesis Flow
Analysis and Elaboration
Synthesis
Scheduling and Allocation
Optimization
Technology Mapping
Synthesis Goals and Constraints
Synthesizing Big Design
Variations in Synthesis
Q and A

We will start with the definition of Synthesis. This is trivial for some, but have you ever tried to explain Synthesis or Simulation to a non-EDA person? If you have, you know how non-trivial the simple job of defining can be. We cannot skip such an important topic in this presentation.
We will then go to the history of Synthesis: how the Synthesis that we are familiar with came into being.
Next we will describe the IC implementation flow, with emphasis on Synthesis.
Once we know the place of Synthesis in IC implementation, we will look deep inside Synthesis itself. We will break Synthesis up into its components and look into the flow inside it.
An important directive for Synthesis is to meet its clock speed, or to fit inside a particular die. These come as inputs to the Synthesis tool; we will go over them to get a better understanding of Synthesis.
By then we are supposed to know Synthesis, but we are still not done. We will go through the divide and conquer technique used to synthesize a really large design.
Most of the presentation covers standard Synthesis, RTL to gate in the ASIC flow. In the last section we will cover some variations from the regular flow.
And of course we have Q and A.

3 Defining Synthesis
Conversion of High Level Hardware Description to Gate Level Hardware Description
Levels of Hardware Description:
Gate level
Data Flow level
RTL level
Behavioural level

Synthesis, in short, can be described as the conversion of a high level hardware description to a gate level hardware description. When we normally use the word Synthesis, the high level description is at RTL level. OK, so let's get familiar with the various levels of hardware description. These are the Gate, Data Flow, RTL and Behavioural levels. Let's learn more about these levels.

4 Gate Level
Description of the hardware is purely in terms of nets connecting pins of gate instances and ports.
Example: a 2-input mux implemented with gate level components

module select(out, s, a, b);
  output out;
  input s, a, b;
  wire s_bar, t1, t2;
  INT_NOT  u1 (s_bar, s);      // s_bar = !s
  INT_AND2 u2 (t1, a, s);      // t1 = a & s
  INT_AND2 u3 (t2, b, s_bar);  // t2 = b & s_bar
  INT_OR2  u4 (out, t1, t2);   // out = t1 | t2
endmodule

Gate level describes the design as instances of technology cells and the nets connecting those instances. Here is an example of a gate level implementation of a 2-to-1 MUX. Assume that there is no MUX in the library, so the MUX is implemented with an inverter and 2-input AND/OR gates.

5 Data Flow Level
Gate level + assign statements
Normally used to represent combinational circuits
Can represent sequential circuits if used with instances of latches or flip-flops
Example: computes the absolute value

module abs (out, in);
  output [7:0] out;
  input [7:0] in;
  wire [7:0] twosCIn;
  assign twosCIn = ~in + 1;            // two's complement of the input
  assign out = in[7] ? twosCIn : in;   // negate only when the sign bit is set
endmodule

Data Flow level includes gate level plus assign statements. The assign statements can have complex expressions on the right hand side. Normally this represents combinational circuits. Along with gate instantiation, assign statements can be used to represent sequential logic too. Here is an example of a Data Flow level description. The design takes in a signed input and returns the absolute value of the input.

6 RTL Level
Explicit clock and state machine
Technology independent
Fixed architecture
Synthesizable
Example: RTL level description for recognizing overlapping 101 patterns

State diagram (edges labelled input/output):
S0 --0/0--> S0
S0 --1/0--> S1
S1 --1/0--> S1
S1 --0/0--> S2
S2 --1/1--> S1
S2 --0/0--> S0

In RTL level the clock is explicit, i.e. the clock is very clear from the design. You will see that in the example. The design is technology independent: it can be synthesized to gate level for any library of gates, using synthesis tools like DC, Concorde, BuildGates etc. Let's look at the following example. We want to recognize overlapping patterns of 101. Please take a little while and verify the state diagram. Initially we are in state S0. If we get input 1, the state changes to S1. Else, if the input is 0, the state remains S0. The output is 0 in both cases. Similarly …… Now let's look at its RTL description.

7 RTL Level

module recognize101(match, in, ck);
  input in, ck;
  output match;
  reg match;
  reg [1:0] state;
  always @(posedge ck) begin
    case (state)
      2'b00: begin                 // S0: waiting for the first 1
        if (in == 1) state = 2'b01;
        match = 1'b0;
      end
      2'b01: begin                 // S1: seen "1"
        if (in == 0) state = 2'b10;
        match = 1'b0;
      end
      2'b10: begin                 // S2: seen "10"
        if (in == 1) begin
          state = 2'b01;
          match = 1'b1;
        end else begin
          state = 2'b00;
          match = 1'b0;
        end
      end
      default: begin               // unused encoding: assumed to recover to S0
        state = 2'b00;
        match = 1'b0;
      end
    endcase
  end
endmodule

This description essentially implements the state diagram. Note that the clock is explicit in this case: a clock event is used to start the always block. For each value of state we check the current value of the input and take the appropriate action of changing to the new state and setting the output match in case a 101 pattern is found.
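As a cross-check of the state diagram, the recognizer can be modelled in software. This is a minimal Python sketch, not part of the original slides; the function name `recognize101` and the list-of-bits interface are assumptions chosen for illustration.

```python
def recognize101(bits):
    """Software model of the overlapping-101 recognizer (one output per input bit)."""
    state = 0            # 0 = S0, 1 = S1 (seen "1"), 2 = S2 (seen "10")
    outputs = []
    for b in bits:
        match = 0
        if state == 0:
            if b == 1:
                state = 1
        elif state == 1:
            if b == 0:
                state = 2
        else:  # state == 2
            if b == 1:
                state = 1    # the final 1 also starts the next pattern
                match = 1
            else:
                state = 0
        outputs.append(match)
    return outputs


overlapping = recognize101([1, 0, 1, 0, 1])
```

The input 1 0 1 0 1 contains two overlapping occurrences, so the model raises match on the third and fifth bits.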

8 Behavioural Level
Implicit clock and scheduling of events
Architecture independent
Mostly used for modeling only (not synthesizable)
Can be synthesized with special behavioural synthesis tools
Example: the following module computes sqrt. It uses the identity $\sum_{i=0}^{n-1} (2i+1) = n^2$.

module sqrt(in, out);
  input [7:0] in;
  output [3:0] out;
  reg [3:0] out;
  reg [7:0] tmp, odd;   // tmp must be as wide as in
  always @(in) begin
    tmp = in;
    out = 0;
    odd = 1;
    while (tmp > 0) begin
      if (tmp >= odd) begin
        out = out + 1;
        tmp = tmp - odd;
        odd = odd + 2;
      end else begin
        tmp = 0;
      end
    end
  end
endmodule
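The odd-number identity used by the sqrt module can be checked quickly in software. The following Python sketch mirrors the loop in the module; the function name `sqrt_by_odds` is an assumption for illustration only.

```python
def sqrt_by_odds(value):
    """Integer square root by subtracting successive odd numbers 1, 3, 5, ..."""
    tmp, out, odd = value, 0, 1
    while tmp > 0:
        if tmp >= odd:
            out += 1        # one more odd number fits, so the root grows by 1
            tmp -= odd
            odd += 2
        else:
            tmp = 0         # remainder is smaller than the next odd number: stop
    return out


roots = [sqrt_by_odds(v) for v in (0, 1, 10, 255)]
```

For every 8-bit input the result is the floor of the square root, which is why a 4-bit output is wide enough.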

9 History of Synthesis
Initial IC designs were handmade at mask level
  Polygon pushing tools (for example Calma®) were used for design.
  Simulation was done at this level by simulators like HiLo®.
Next, tools were developed for automatic generation of operators
  Some generators were developed for generating operators from parameters like input/output width and architecture (e.g. a 16 bit carry look ahead adder).
  The operators were connected by hand.
Later, schematic entry tools came to market
  Gates or operators could be drawn and connected schematically.
  Automatic tools would generate the mask from the schematic.
  Mentor Graphics Idea Station® had integrated schematic entry and simulation.

10 History of Synthesis (cont)
Next came High Level Hardware Description Languages
  Gateway Design Automation came up with the Verilog language.
  Verilog was essentially developed to model the behavior of electronic circuits, not for synthesis.
  Gateway developed the Verilog simulator now called Verilog-XL.
From High Level Description to Gate Level
  Synopsys was earlier called Optimal Solutions Inc. It specialized in gate level logic optimization.
  Synthesis happened as an afterthought. Since this modeling language (Verilog) was available, Synopsys engineers tried to convert various high level Verilog constructs into gate level wherever possible.
  Synthesis as we know it today was born.

11 IC Design Flow
Develop and verify the algorithm (C, MATLAB etc.)
Hand convert to an RTL level hardware description
Verify the RTL design by simulation. Power and timing estimation tools can also be used at RTL level.
Synthesis tools are used to convert the description to gate level.
Simulation or formal verification is done to verify functionality.

[Figure: design flow. Algorithm in C or MATLAB (execute and verify algorithm) -> RTL Description (simulate to verify functionality; estimate timing and power) -> Synthesis (with Tech Library and Constraints as inputs) -> Gate Description (verify functionality with simulation or formal verification; verify timing and power)]

12 IC Design Flow (cont)
A placement tool is now used to assign a place (x, y coordinates) to each gate.
Timing verification is done with a better estimate of wire delays.
A routing tool assigns locations for the nets that connect the gate instances.
Timing verification is done again with still more refined wire delays.
The mask is used to prepare the IC.

[Figure: design flow. Gate Description -> Placement (with Floor Plan and Physical Library as inputs) -> Placed Gates (verify timing; verify and correct placement rules) -> Routing -> Mask (GDSII) (verify timing; verify and correct mask rules) -> to IC foundry]

13 Synthesis Flow
Translate the RTL level design description in HDL to a gate level netlist.
Only the synthesizable subset of the HDL is supported in the description.
Different steps in the Synthesis flow, from RTL Description to Gate Level Description:
  Analysis
  Elaboration
  CDFG Generation
  CDFG Traversal (DFA)
  Allocation
  Macro Generation
  Optimization
  Technology Mapping
  Writing Netlist

14 Synthesis Flow (analysis)
Input: Design description in HDL (Verilog/VHDL file)
Output: Analyzed design units in an intermediate form, either in memory or on disk
Functionality:
  Perform syntax and semantic checks on the design description
  Create a data structure in a language dependent form (Object Model)

module my_mod(z, a, b, c);
  input [1:0] a, b, c;
  output [1:0] z;
  reg [1:0] z;
  always @(a or b or c) begin
    z = a + b - c;
  end
endmodule

[Figure: object model of my_mod, with the module node linked to its ports, its always block and the expression]

15 Synthesis Flow (elaboration)
Input: Analyzed design unit list
Output: Elaborated design unit list
Functionality:
  Expand the complete design hierarchy
  Generate a design unit list consisting of distinct design units
  Resolve all parameter values
  Compute all the constant expressions

Before elaboration:

module top (o, i1, i2);
  input [7:0] i1, i2;
  output [7:0] o;
  my_mod #(1) u1 (o[1:0], i1[1:0], i2[1:0]);
  my_mod #(3) u2 (o[7:2], i1[7:2], i2[7:2]);
endmodule

module my_mod(z, a, b);
  parameter w = 1;
  input [2*w-1:0] a, b;
  output [2*w-1:0] z;
  assign z = a + b;
endmodule

After elaboration:

module my_mod_1(z, a, b);
  input [1:0] a, b;
  output [1:0] z;
  assign z = a + b;
endmodule

module my_mod_3(z, a, b);
  input [5:0] a, b;
  output [5:0] z;
  assign z = a + b;
endmodule

module top (o, i1, i2);
  input [7:0] i1, i2;
  output [7:0] o;
  my_mod_1 u1 (o[1:0], i1[1:0], i2[1:0]);
  my_mod_3 u2 (o[7:2], i1[7:2], i2[7:2]);
endmodule

16 Synthesis Flow (cdfg)
Generation of Control and Data Flow Graphs
Input: Elaborated language dependent data structure
Output: Language independent Control and Data Flow Graphs (CDFG)

module my_mod(z, a, b, c, m, n);
  input [1:0] a, b, c;
  input m, n;
  output [1:0] z;
  reg [1:0] z;
  reg [1:0] t;
  always @(a or b or c or m or n) begin
    if (m) t = a;
    else if (n) t = b;
    z = t + c;
  end
endmodule

[Figure: CDFG of my_mod with nodes START, IF, =, NOP, ENDIF, + and END over the signals t, c, z, a, b, m, n]

17 Synthesis Flow (cdfg)
Distinct component of the synthesis routine: CDFG Generation
Populate a language independent representation of the input design as a Control and Data Flow Graph
Functional flow is input language dependent
Input: In-memory representation of the entire design created by the analyzer
Output: Language independent representation of the entire design as a directed graph
A graph is created for each concurrent block and represents the sequential behaviour of the design
Each node in the graph represents either a control node or a data node
Each edge in the graph represents either control flow or data flow

18 Synthesis Flow (dfa)
Data Flow Analysis and Creating Logic with Generic Gates
Traverse the CDFG created for each concurrent block
Calculate the driving logic for each assigned object in each path and store it as a logic equation
Both data logic and control logic are evaluated
Realize an abstract structure of the input design

[Figure: the CDFG of the example and the abstract structure inferred from it, built from a MUX, a LATCH and an adder over the signals a, b, m, n, c, z]

19 Synthesis Flow (dfa)
We analyze the CDFG and store the data in intermediate forms called path variable array (PVA) and path variable matrix (PVM).
Path Variable Array (PVA)
  One for each path
  Array of lhs-rhs pairs

Example: for the path  p = a + b; q = ~en

lhs   rhs
p     a+b
q     ~en

20 Synthesis Flow (dfa)
Path Variable Matrix (PVM)
Created each time paths join
Rows represent the lhs (signals getting assigned)
Columns are paths
For each column (path) there is an enabling condition

[Table: example PVM with rows p, q, r (lhs) and columns m == 1, m == 2, m == 3 (enabling conditions); the cells hold the entries a, a+b, b, m, n and NULL]

21 Synthesis Flow (dfa)
Data Flow Analysis
Each path consists of path segments, and for each path segment the data and control values are evaluated for each assigned object. These values are stored in a PVA (Path Variable Array).
A special construct, the PVM (Path Variable Matrix), is created out of PVAs to hold the values of the objects in different paths.
Each column in a PVM represents a particular path and each row represents a particular object.
Each entry in the matrix represents the logic value of a particular object in a particular path.

22 Synthesis Flow (dfa)
Data Flow Analysis (Example)

[Figure: the CDFG of the example (START, IF, =, NOP, ENDIF, +, END over t, c, z, a, b, m, n) annotated with PVAs P1, P11, P12, P121 and PVMs M1, M2, M3]

23 Synthesis Flow (dfa)
Data Flow Analysis (Example)
For each sequential block, one root PVA and one root PVM are allocated (P1, M1).
Starting from each branch node, a new PVA is created for each path segment (P11 and P12).
When a join node is hit, a new PVM (M2) is created out of the PVAs (P11 and P12).
This PVM is passed to the allocator for allocating the current data and control logic.
Clock, tristate and hold logic is allocated only from the root PVM (M1).

24 Synthesis Flow (dfa)
Inferring Logic from PVM
Each row of the PVM is analyzed and logic is inferred.
For a row in which all columns have values, a one-hot mux is inferred.
For a row in which some columns are empty, a latch is inferred.
Latch, flip-flop and tristate are allocated from the root PVM: M1

lhs\cond   m   ~m
d          a   b
Inferred: MUX selecting between a and b under m, driving d

lhs\cond   m   ~m
d          a   NULL
Inferred: LATCH with data a and enable m, driving d
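The row-by-row rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not Concorde's implementation: the dictionary layout, the use of `None` for NULL, and the name `infer_logic` are all assumptions.

```python
def infer_logic(pvm):
    """pvm maps each lhs signal to {condition: rhs}; None stands for a NULL entry."""
    inferred = {}
    for lhs, row in pvm.items():
        # Keep only the paths on which the signal is actually assigned.
        driven = {cond: rhs for cond, rhs in row.items() if rhs is not None}
        # Every path assigns the signal -> mux; some path leaves it alone -> latch.
        kind = "MUX" if len(driven) == len(row) else "LATCH"
        inferred[lhs] = (kind, driven)
    return inferred


mux_case = infer_logic({"d": {"m": "a", "~m": "b"}})
latch_case = infer_logic({"d": {"m": "a", "~m": None}})
```

In the latch case the surviving conditions (here only m) are the ones under which the latch is transparent.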

25 Synthesis Flow (dfa example)
Let's now infer logic for the CDFG that we created.
The initial PVM just has initial values (NULL).
At the first join node, PVM M2 is created:

lhs\cond   n   ~n
t_1        b   NULL

Since this row infers a latch, we wait till the root PVM M3:

lhs\cond   m   ~m
t          a   t_1

Since t_1 is not yet allocated, the PVM is divided into a PVM for data logic and a PVM for hold logic.

26 Synthesis Flow (dfa example)
PVM for data logic and PVM for hold logic
t_data goes to the data pin. t_hold goes to the hold pin, and the output is t.
Finally, logic for z is inferred from the root PVM.

[Figure: PVM for data logic (row t_data) and PVM for hold logic (row t_hold) over the conditions m and ~m, the MUX on a and b that drives the data pin, and the adder computing z from t and c]

27 Synthesis Flow (dfa example)
Inferred netlist for the CDFG

[Figure: inferred netlist built from RTL_MUX, RTL_LD and M_RTL_ADD cells with inputs a, b, c, m, n]

28 Synthesis Flow (cont.)
Allocation and Scheduling
Schedule the clock cycle in which to perform each operation
Allocate an actual hardware resource for each logic operation
Bind the allocated resource to the input and output data
Transform the design into netlist form by instantiating cells/macros and connecting them to achieve the functionality

29 Synthesis Flow (cont.)
Allocation and Scheduling
Example of a data flow path for scheduling
Trivial Scheduling
  Assumes infinite resources
  All operations in 1 clock cycle
  Large clock cycle
  Latency is 0

[Figure: data flow graph of multiply, add, subtract and compare operations, all placed within a single clock period]

30 Synthesis Flow (cont.)
Allocation and Scheduling
ASAP Scheduling
  One operation per clock cycle
  Independent operations done in parallel
  Operations done ASAP
  Smaller clock
  Latency is the number of levels

[Figure: the same data flow graph scheduled ASAP into time steps T1 to T4]
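ASAP scheduling amounts to giving each operation the earliest time step allowed by its data dependencies. The Python sketch below illustrates this on a small made-up data flow graph; the operation names and the graph itself are assumptions, not the graph drawn on the slide.

```python
def asap_schedule(deps):
    """deps maps each operation to the operations whose results it consumes.
    Returns the earliest time step (1-based) for every operation."""
    step = {}

    def visit(op):
        if op not in step:
            # An operation starts one step after its latest predecessor finishes.
            step[op] = 1 + max((visit(p) for p in deps[op]), default=0)
        return step[op]

    for op in deps:
        visit(op)
    return step


# Hypothetical graph: two multiplies feed an add, which feeds a subtract.
deps = {"mul1": [], "mul2": [], "add1": ["mul1", "mul2"], "sub1": ["add1"]}
schedule = asap_schedule(deps)
latency = max(schedule.values())
```

The two independent multiplies share step 1, and the latency equals the number of levels in the graph, which matches the properties listed for ASAP scheduling.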

31 Synthesis Flow (cont.)
Allocation and Scheduling
Scheduling under resource constraint
  Resources available: 1 multiplier, 1 add/sub
  Small clock (same as ASAP)
  Small area
  Large latency

[Figure: the same data flow graph scheduled into time steps T1 to T7 using one multiplier and one adder/subtractor]
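A simple way to schedule under a resource constraint is greedy list scheduling: in each time step, start as many ready operations as the available units allow. The Python sketch below uses that technique as an assumption; the slides do not say which algorithm the tool uses, and the graph and names are hypothetical.

```python
def list_schedule(ops, deps, limits):
    """ops: operation -> resource kind; deps: operation -> predecessors;
    limits: resource kind -> number of units (each must be at least 1).
    Assumes the dependency graph is acyclic."""
    step = {}
    t = 0
    while len(step) < len(ops):
        t += 1
        used = {}
        for op, kind in ops.items():      # dictionary order acts as the priority
            if op in step:
                continue
            # Ready only if every predecessor finished in an earlier step.
            ready = all(p in step and step[p] < t for p in deps[op])
            if ready and used.get(kind, 0) < limits[kind]:
                step[op] = t
                used[kind] = used.get(kind, 0) + 1
    return step


ops = {"mul1": "mul", "mul2": "mul", "add1": "alu", "sub1": "alu"}
deps = {"mul1": [], "mul2": [], "add1": ["mul1", "mul2"], "sub1": ["add1"]}
constrained = list_schedule(ops, deps, {"mul": 1, "alu": 1})
```

With a single multiplier the two multiplies are serialized, so the latency grows from 3 steps (ASAP) to 4 while the area shrinks to one multiplier and one adder/subtractor.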

32 Synthesis Flow (cont)
Macro Generation
Operators in data flow paths, like adders and multipliers, which are allocated as macros, are built in terms of primitive cells.
Input: Netlist with macro instances
Output: Netlist in terms of primitive instances only
Functionality:
  Based on the macro (operator type), input width and input type (signed, unsigned), the appropriate operator generator is called.
  The generator replaces the macro with primitive gates like PRIM_AND, PRIM_XOR.

33 Synthesis Flow (cont.)
Optimization
Circuit cost, whether area or speed, is optimized.
Optimization in Concorde is mainly done by SIS.
Hanging logic removal, removal of NOT gates connected in series, parallel instance removal etc. are done by traversing the netlist in Concorde code.

34 Synthesis Flow (cont)
Logic Optimization
Let's discuss the algorithm for one such case (expand).
Function to optimize:
  FON = ab'c' + a'b'c' + a'bc' + a'b'c
  Fdon't care = abc'
  FOFF can be computed as ab'c + a'bc + abc

Tabular representation:

FON                 FOFF
a b c               a b c
1 0 0  (ab'c')      1 0 1  (ab'c)
0 0 0  (a'b'c')     0 1 1  (a'bc)
0 1 0  (a'bc')      1 1 1  (abc)
0 0 1  (a'b'c)

[Figure: cube representation of the function on the axes a, b, c]

35 Synthesis Flow (cont)
Expand Algo

foreach row of FON
  foreach column of row
    if (FON[row][column] != *)
      F = FON
      F[row][column] = *
      if (F ∩ FOFF == ∅) {
        foreach row2 of F
          if (row != row2 && F[row] ∪ F[row2] == F[row])
            erase F[row2];
        FON = F
      }

36 Synthesis Flow (cont)
Expand Algo

[Figure: tabular and cube representations of FON and FOFF at each step of expand. Literals are raised to * one at a time, rows covered by an expanded row are erased, and FON ends as the two rows * * 0 and 0 0 *]
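The expand step can be tried out on the example function with a short Python sketch. Cubes are written as strings over 0, 1 and *, in the column order a, b, c; the helper names and the string encoding are assumptions made for this illustration and are not taken from SIS.

```python
def intersects(c1, c2):
    """Two cubes share a minterm unless some column holds opposite literals."""
    return all(x == y or x == "*" or y == "*" for x, y in zip(c1, c2))


def covers(big, small):
    """big contains small if every column of big is * or equal to small's."""
    return all(b == "*" or b == s for b, s in zip(big, small))


def expand(f_on, f_off):
    done, todo = [], list(f_on)
    while todo:
        cube = list(todo.pop(0))
        for col in range(len(cube)):
            if cube[col] == "*":
                continue
            saved, cube[col] = cube[col], "*"      # try raising this literal
            if any(intersects(cube, off) for off in f_off):
                cube[col] = saved                  # would hit the OFF set: undo
        # Erase every other row that the expanded cube now covers.
        todo = [c for c in todo if not covers(cube, c)]
        done = [c for c in done if not covers(cube, c)]
        done.append("".join(cube))
    return done


f_on = ["100", "000", "010", "001"]    # ab'c' + a'b'c' + a'bc' + a'b'c
f_off = ["101", "011", "111"]          # ab'c + a'bc + abc
expanded = expand(f_on, f_off)
```

The four original cubes collapse to two, `**0` and `00*`, i.e. the function c' + a'b'.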

37 Synthesis Flow (cont)
Sequential Optimization
Several kinds of sequential optimization techniques are also available.
Let's consider one such optimization (retiming).
The flip-flop or latch position is moved along the path to optimize area and speed.

38 Synthesis Flow (cont.)
Technology Mapping & Optimization
Map the generic synthesized netlist using customer specific library cells
  Rule based mapping
  Algorithm based mapping
Mapping criteria
  Get minimum area
  Get minimum delay

39 Synthesis Flow (cont.)
Technology Mapping & Optimization
Let's consider dynamic programming based mapping to optimize area.
Library cells are converted to NAND-INV trees based on their logic.

[Figure: library cells INV, NAND, AND and IOR with their NAND-INV trees]

40 Synthesis Flow (cont.)
Technology Mapping & Optimization
The design is also converted to a NAND-INV tree.
Algorithm:
  The cost of a cell is its area.
  The cost of input pins is 0.
  The cost of a vertex is the cost of the cell whose pattern matches the pattern at the vertex, plus the vertex costs at its inputs.
  If multiple cell patterns match the pattern at the vertex, we take the cell which results in the minimum vertex cost.
  Compute the cost for all vertices from input to output.

41 Synthesis Flow (cont.)
Technology Mapping & Optimization
Cost of V1 = cost(NAND) = 5
Cost of V2 = min(cost(INV)+cost(V1), cost(AND)) = 6
Cost of V3 = min(cost(IOR)+cost(V1), cost(NAND)+cost(V2)) = 10

[Figure: input design with vertices 1, 2, 3 and its minimum area implementation]
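The vertex cost recurrence can be written as a small dynamic program. In the Python sketch below, the cell areas (INV = 1, NAND = 5, AND = 7, IOR = 5) and the fan-in of each candidate match are assumptions chosen only so that the totals agree with the costs quoted above; the slides do not give the individual areas, and real pattern matching against NAND-INV trees is left out.

```python
def map_min_area(order, matches, area):
    """order: vertices from inputs to output.
    matches: vertex -> list of (cell, fan-in vertices or primary inputs).
    Primary inputs are absent from the cost table and therefore cost 0."""
    cost, choice = {}, {}
    for v in order:
        for cell, fanin in matches[v]:
            total = area[cell] + sum(cost.get(u, 0) for u in fanin)
            if v not in cost or total < cost[v]:
                cost[v], choice[v] = total, cell
    return cost, choice


area = {"INV": 1, "NAND": 5, "AND": 7, "IOR": 5}          # assumed areas
matches = {                                               # assumed matches
    "V1": [("NAND", ["i1", "i2"])],
    "V2": [("INV", ["V1"]), ("AND", ["i1", "i2"])],
    "V3": [("IOR", ["V1", "i3"]), ("NAND", ["V2", "i3"])],
}
cost, choice = map_min_area(["V1", "V2", "V3"], matches, area)
```

Because each vertex is visited after its inputs, every minimum is computed from already final sub-costs, which is what makes the bottom-up pass sufficient for tree-shaped designs.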

42 Synthesis Flow (cont.)
Writing Structural Netlist
Write the synthesized netlist in any desired format to output text files.
The output netlist is in structural form.

43 Synthesis Goals and Constraints
An RTL level hardware description can be implemented in many ways, at macro (architectural) or micro (logic) level.

[Figure: architectural choices, showing three adder arrangements that each compute a+b+c; logic choices, showing alternative gate structures over x, y, z]

44 Synthesis Goals and Constraints
Goals and constraints help the Synthesis tool to make the choices.
Goals can be to maximize speed or to minimize area or power.
Constraints are more detailed goals.
Constraints at chip level:
  Minimize area for a given clock speed
  Maximize speed as long as the design fits into an FPGA of a specific size
Constraints at block level are more complex.

45 Constraints at Block Level
Input delay specifies the data arrival time at each input separately.
Output delay specifies the extra delay after the output. The current design must make the output data arrive earlier to allow for it.
The clock waveform needs to be specified.
Specific paths can be given specific delays to meet.

46 Synthesizing Big Design
Big designs take too much memory and time to be synthesized as a whole.
They are divided into blocks (modules), and the blocks are synthesized separately.
Synthesis is done bottom up: leaf level blocks are synthesized first.
Constraints need to be computed from the top, since the constraints of each block come from the constraints of the whole chip.

47 Synthesizing Big Design
Designers divide the total chip area into an area constraint for each block.
The block constraints can be the total area or the width and height of each block.
Pin positions of each block are determined.
The Synthesis tool only takes in the area. The other constraints (width, height, pin positions) are for placement tools.

[Figure: chip layout divided into blocks B1 to B7]

48 Synthesizing Big Design
Similarly, designers divide the clock period into timing constraints for each block.
Say the clock period is 20 ns. For B1, flip-flop to output can be 7 ns; for B2, input to output can be 5 ns; for B3, input to flip-flop can be 8 ns.

[Figure: abstract design with blocks B1, B2, B3]
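The 20 ns example can be checked with a few lines of Python. This is only an illustrative sketch of the arithmetic; the function name `timing_slack` and the dictionary of per-block budgets are assumptions, not a budgeting tool's interface.

```python
def timing_slack(clock_period_ns, block_budgets_ns):
    """Slack left on one flip-flop to flip-flop path after summing block budgets.
    A negative result means the budgets overrun the clock period."""
    return clock_period_ns - sum(block_budgets_ns.values())


budgets = {
    "B1 flip-flop to output": 7,
    "B2 input to output": 5,
    "B3 input to flip-flop": 8,
}
slack = timing_slack(20, budgets)
```

Here the three budgets use up the whole clock period, so any block that misses its budget after Synthesis forces a redistribution among the others.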

49 Synthesizing Big Designs
This process of dividing the chip's resources is called budgeting.
Budgeting is mostly manual, but there are some tools to help with it.
The process is mostly iterative. After Synthesis, designers often find blocks that could not meet their constraints.
Designers normally redo the budgeting and synthesize again.

50 Variations in Synthesis
Higher Level Synthesis
  Input is at a higher level than RTL
Alternate Target Synthesis
  Output is not at gate level
Timing Driven Synthesis

51 Higher Level Synthesis
Behavioural Synthesis
Synthesis is done from behavioural level.
Output is normally RTL.
Unlike RTL Synthesis (regular Synthesis), architecture selection is done by the tool based on constraints.
Scheduling is non-trivial.
The clock is used to divide the data paths into different time slots.
Resources are shared if they are in different time slots.

52 Higher Level Synthesis
Protocol Synthesis
Input is in a language specific to describing communication protocols between designs.
Output is an RTL description for Synthesis.
Sometimes it also produces a C model for verification.
Examples are:
  Synopsys's Protocol Compiler
  Austin Protocol Compiler (APC) of The University of Texas at Austin
  ALFred Protocol Compiler

53 Higher Level Synthesis
Example of protocol input in Timed Asynchronous Protocol (TAP)

process pe
const Rp: integer=0; Bq: integer=0; tr: integer=10; qe: address
var sp: integer = 0; sq: array [2] of integer = 0; d, e: integer; initialize: integer = 1
begin
  act sendrqst in 0; initialize := 0
  timeout sendrqst
    rqst.e:=NCR(Bq,2,sq[0],sq[1]); send rqst to qe; act resend in tr;
  rcv rqst from qe
    d:=DCR(Bq,0,rqst.e); e:= DCR(Bq,1,rqst.e);
    if (sp=d)(sp=e) sp:=e; reply.e:= NCR(Bq,1,sp);
    log("detected adversary");
    fi
  timeout resend
    if sq[0] = sq[1] rqst.e:=NCR(Bq,2,1,sq[1]); send rqst to qe; act resend in tr;
    skip;
  rcv reply from qe
    d:= DCR(Rp,0,reply.e);
    if sq[1] = d sq[0]:=sq[1];
end

54 Alternate Target Synthesis
FPGA Synthesis
Special mapping to programmable gates, e.g. 4-input gates (often called LUTs) that can be programmed to any 4-input logic function.
Dedicated resources need special care during mapping and cost computation. Gates using carry chain wires have different delays from those using regular wires that go through switch boxes.
Architecture specific optimization

[Figure: LUTs connected through a switch box]
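The reason a 4-input LUT can realize any 4-input function is that it is just a 16-entry truth table indexed by the inputs. The Python sketch below models this; the helper names `program_lut` and `eval_lut` are assumptions for illustration and do not correspond to any vendor tool.

```python
from itertools import product


def program_lut(func, k=4):
    """Build the 2**k entry truth table of func; entry i holds the output for
    the input combination whose bit b (LSB first) is input number b."""
    return [func(*[(i >> b) & 1 for b in range(k)]) & 1 for i in range(2 ** k)]


def eval_lut(table, inputs):
    """Look up the output for a tuple of input bits (input 0 is the LSB)."""
    index = sum(bit << pos for pos, bit in enumerate(inputs))
    return table[index]


def example_function(a, b, c, d):
    """An arbitrary 4-input function used only to exercise the model."""
    return (a & b) | (c ^ d)


table = program_lut(example_function)
```

Mapping to LUTs therefore cares about how many inputs a piece of logic has, not how many gates it contains, which is one reason FPGA mapping costs differ from ASIC area costs.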

55 Alternate Target Synthesis
Physical Synthesis
Generates placed gates directly.
Design convergence is guaranteed.
A constraint that is met in Synthesis may not be met after placement, in which case we normally need to redo the Synthesis. Physical Synthesis helps to avoid this iteration.

56 Timing Driven Synthesis
Synthesis is done directly to technology gates.
Synthesis is done from input towards output (light to dark).
Architectures are selected while synthesizing, based on the delays.

57 Q & A
Thank you

