
1 Maria-Cristina Marinescu, Martin Rinard. Laboratory for Computer Science, Massachusetts Institute of Technology. A Synthesis Algorithm for Modular Design of Pipelined Circuits

2 Overall Goal: from a Modular, Asynchronous, Sequential Specification to an Efficient, Synchronous, Parallel Implementation in Synthesizable Verilog

3 Example: Specification [Figure: RESET; modules Instruction Fetch, Register Operand Fetch, Compute and Writeback; state PC, IM, RF; labels wen, inc, br]

4 Specification Properties. Basic Concepts: State (Registers and Memories), Conceptually Infinite Queues, Modules (state transformers). Queues Provide Modularity: Decouple Modules, Enable Independent Development, Promote Reusable Modular Designs.

5 Example: Implementation [Figure: RESET; modules Instruction Fetch, Register Operand Fetch, Compute and Writeback; state PC, IM, RF; labels wen, inc, br]

6 Implementation Issues: Synthesizing Efficient Combinational Logic; Queue Finitization; Synchronous Global Scheduling

7 Specification Language

8 Type Declarations
  type reg = int(3), val = int(8), loc = int(8);
  type ins = <...> | <...>;
  type irf = <...> | <...>;
State Declarations
  var PC: loc, IM: ins[N], RF: val[8];
  var IQ: queue(ins), RQ: queue(irf);

9 Modules. Each module is a set of update rules. Each update rule consists of a precondition and an action (a set of updates). A rule is enabled (and can execute) if its precondition is true in the current state. When a rule executes, it atomically applies the updates in its action to produce a new state.

10 Update Rules And Modules
Instruction Fetch Module
  TRUE -> iq = append(iq, im[pc]), pc = pc + 1;
Register Operand Fetch Module
  <...> = head(iq) and notin(rq, <...>) -> iq = tail(iq), rq = append(rq, <...>);
  <...> = head(iq) and notin(rq, <...>) -> iq = tail(iq), rq = append(rq, <...>);

11 Update Rules And Modules
Compute and Writeback Module
  <...> = head(rq) -> rf = rf[r = v+1], rq = tail(rq);
  <...> = head(rq) and (v == 0) -> pc = l, iq = nil, rq = nil;
  <...> = head(rq) and (v != 0) -> rq = tail(rq);

12 Abstract Model of Execution
Conceptually, system execution is a sequence of rule executions:
  while TRUE
    choose an enabled rule
    execute rule
    obtain new state
Concepts in Abstract Execution Model: rules execute atomically, asynchronously, and sequentially.
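The abstract execution loop on slide 12 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Rule` class, the `step` and `run` helpers, and the toy producer/consumer rules are hypothetical names that do not appear in the slides, and the "choose" step is modelled as a seeded random pick.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

State = Dict[str, object]

@dataclass
class Rule:
    name: str
    precondition: Callable[[State], bool]
    action: Callable[[State], State]  # returns only the variables it updates

def step(state: State, rules: List[Rule], rng: random.Random) -> Optional[State]:
    """Choose one enabled rule and apply its updates atomically."""
    enabled = [r for r in rules if r.precondition(state)]
    if not enabled:
        return None                      # no rule can execute
    rule = rng.choice(enabled)           # asynchronous: any enabled rule may run
    new_state = dict(state)
    new_state.update(rule.action(state)) # every update is computed from the old state
    return new_state

def run(state: State, rules: List[Rule], max_steps: int, seed: int = 0) -> State:
    """Sequential execution: one rule at a time, until quiescence or max_steps."""
    rng = random.Random(seed)
    for _ in range(max_steps):
        nxt = step(state, rules, rng)
        if nxt is None:
            break
        state = nxt
    return state

# Toy example (hypothetical): a producer and a consumer decoupled by a queue q.
RULES = [
    Rule("produce", lambda s: s["n"] < 3,
         lambda s: {"q": s["q"] + [s["n"]], "n": s["n"] + 1}),
    Rule("consume", lambda s: len(s["q"]) > 0,
         lambda s: {"q": s["q"][1:], "total": s["total"] + s["q"][0]}),
]

final = run({"n": 0, "q": [], "total": 0}, RULES, max_steps=100)
```

Whatever order the rules are picked in, the toy system reaches the same quiescent state, which is the kind of freedom the synthesis algorithm later exploits when it fixes a global schedule.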

13 Synthesis Algorithm

14 Starting Point: asynchronous, sequential abstract execution with conceptually infinite queues. Goal: efficient synchronous global schedule. In each clock cycle, multiple rules execute synchronously and concurrently (pipeline stages move together). Implement each queue with a finite hardware buffer that can be read and written in the same cycle.

15 Basic Idea. At each clock cycle, check each rule to see if it is enabled; if so, atomically update the state to reflect its execution. For each variable, generate an expression that specifies its new value at the end of the cycle. Challenge: sequential, atomic semantics for rules. Solution: symbolic rule execution.

16 Final Result for PC
  if (<...> = rq and v == 0) new pc = location
  else if ((iq == nil) or (<...> = iq and <...> != rq) or (<...> = iq and <...> != rq)) new pc = pc+1
  else new pc = pc

17 Algorithm Outline. Rule Numbering: for symbolic execution. Relaxation: shorten critical path by testing initial state. Queue Finitization: ensure rules execute only if there will be room for the result in output queues. Symbolic Execution. Optimizations. Synthesizable Verilog Generation: from optimized expressions.

18 Rule Numbering. Goal: resolve conflicts between parallel rule executions. Approach: for each state variable, number versions according to order; feed results of the previous rule into the next rule.

19
  TRUE -> iq1 = append(iq0, im[pc0]), pc1 = pc0 + 1;
  <...> = head(iq1) and notin(rq1, <...>) -> iq2 = tail(iq1), rq2 = append(rq1, <...>);
  <...> = head(iq2) and notin(rq2, <...>) -> iq3 = tail(iq2), rq3 = append(rq2, <...>);
  <...> = head(rq3) -> rf4 = rf3[r -> v+1], rq4 = tail(rq3);
  <...> = head(rq4) and v != 0 -> rq5 = tail(rq4);
  <...> = head(rq5) and v == 0 -> pc6 = l, rq6 = nil, iq6 = nil;
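A rough Python sketch of what rule numbering means operationally: within one cycle, rule k reads the state version left by rules 1 to k-1 and produces the next version. The instruction encodings `("inc", r)` and `("br", r, l)`, the queue entries `("inc", r, v)` and `("br", v, l)`, the `pending` helper standing in for `notin`, and the bound on `pc` in the fetch rule are all assumptions made for this illustration; the slides do not preserve the original tuple formats.

```python
def head(q):
    """First element of a queue (a tuple), or None when the queue is empty."""
    return q[0] if q else None

def pending(rq, r):
    """True if an entry already in rq will still write register r."""
    return any(e[0] == "inc" and e[1] == r for e in rq)

# Each rule returns a dict of updates, or None when its precondition is false.
def fetch(s):             # rule 1
    if s["pc"] < len(s["im"]):    # bound added so the toy program can end
        return {"iq": s["iq"] + (s["im"][s["pc"]],), "pc": s["pc"] + 1}

def operand_inc(s):       # rule 2
    h = head(s["iq"])
    if h and h[0] == "inc" and not pending(s["rq"], h[1]):
        return {"iq": s["iq"][1:], "rq": s["rq"] + (("inc", h[1], s["rf"][h[1]]),)}

def operand_br(s):        # rule 3
    h = head(s["iq"])
    if h and h[0] == "br" and not pending(s["rq"], h[1]):
        return {"iq": s["iq"][1:], "rq": s["rq"] + (("br", s["rf"][h[1]], h[2]),)}

def writeback_inc(s):     # rule 4
    h = head(s["rq"])
    if h and h[0] == "inc":
        _, r, v = h
        return {"rf": s["rf"][:r] + (v + 1,) + s["rf"][r + 1:], "rq": s["rq"][1:]}

def branch_not_taken(s):  # rule 5
    h = head(s["rq"])
    if h and h[0] == "br" and h[1] != 0:
        return {"rq": s["rq"][1:]}

def branch_taken(s):      # rule 6
    h = head(s["rq"])
    if h and h[0] == "br" and h[1] == 0:
        return {"pc": h[2], "iq": (), "rq": ()}

RULES = [fetch, operand_inc, operand_br, writeback_inc, branch_not_taken, branch_taken]

def run_cycle(state, rules=RULES):
    """Return all state versions 0..len(rules); version k feeds rule k+1."""
    versions = [state]
    for rule in rules:
        cur = versions[-1]
        updates = rule(cur)
        versions.append({**cur, **updates} if updates else dict(cur))
    return versions

start = {"pc": 0, "im": (("inc", 0), ("br", 1, 0)), "rf": (0,) * 8, "iq": (), "rq": ()}
after_one = run_cycle(start)[-1]
after_two = run_cycle(after_one)[-1]
```

In this chained form a freshly fetched instruction can pass through all three modules within a single cycle, which illustrates why the next step (relaxation) is needed to shorten the critical path.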

20 Relaxation
Issue: rule numbering may produce a long clock cycle.
Solution: for each rule R_i with precondition P_i, for each variable instance v_i in precondition P_i, replace v_i with its earliest safe version.
  ...
  R_{k-1}: P_{k-1} -> v_k = ...
  ...
  R_i: P_i(v_i, ...) -> ...
  ...
v_k is safe for v_i if either P_i[v_k/v_i] implies P_i, or (P_i, P_{k-1}) are mutually exclusive.
[Figure: 01 2 3 => 01 3 2]

21 Relaxation Result. Queues separate pipeline stages; items traverse one stage per clock cycle. Safety: if a rule executes in the new system, then it also executes in the old system and it generates the same result. Liveness: after relaxation, all rules test the initial state; if a rule is enabled in the old system but not in the new system, then some rule executes in the new system.

22
  TRUE -> iq1 = append(iq0, im[pc0]), pc1 = pc0 + 1;
  <...> = head(iq0) and notin(rq0, <...>) -> iq2 = tail(iq1), rq2 = append(rq1, <...>);
  <...> = head(iq0) and notin(rq0, <...>) -> iq3 = tail(iq2), rq3 = append(rq2, <...>);
  <...> = head(rq0) -> rf4 = rf3[r -> v+1], rq4 = tail(rq3);
  <...> = head(rq0) and v != 0 -> rq5 = tail(rq4);
  <...> = head(rq0) and v == 0 -> pc6 = l, rq6 = nil, iq6 = nil;

23 Queue Finitization. Issue: conceptually unbounded queues versus finite hardware buffers. Assumption: queues start within length at the beginning of the cycle. Goal: generate a circuit that makes queues remain within length at the end of the cycle. Basic Approach: before an enabled rule executes, be sure there will be room for its result in the output queues at the end of the clock cycle.

24 Queue Finitization Algorithm. Build Producer-Consumer Graph: nodes are rules; there is an edge between two rules if the first inserts into a queue and the second removes from that queue. [Figure: producer-consumer graph for the example, rules 1 2 3 4 5 6]
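As a small illustration, the producer-consumer graph can be built from two tables that record which queues each rule inserts into and removes from. The function name and the tables below are hypothetical; the tables are one reading of the six example rules, and treating rule 6 (which clears both queues) as a consumer of rq only is an assumption of this sketch.

```python
def producer_consumer_graph(inserts, removes):
    """Return the set of edges (producer, consumer).

    inserts / removes map a rule number to the set of queue names the rule
    inserts into / removes from.
    """
    edges = set()
    for producer, produced in inserts.items():
        for consumer, consumed in removes.items():
            # Edge if some queue is written by the first and drained by the second.
            if producer != consumer and produced & consumed:
                edges.add((producer, consumer))
    return edges

# Assumed reading of the example: rule 1 fills iq, rules 2-3 move items from
# iq to rq, and rules 4-6 drain rq.
INSERTS = {1: {"iq"}, 2: {"rq"}, 3: {"rq"}}
REMOVES = {2: {"iq"}, 3: {"iq"}, 4: {"rq"}, 5: {"rq"}, 6: {"rq"}}

EDGES = producer_consumer_graph(INSERTS, REMOVES)
```

Under these assumptions the graph is acyclic: rule 1 points to rules 2 and 3, and each of those points to rules 4, 5 and 6.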

25 Acyclic Graphs. Process rules in topological sort order and augment the execution precondition: if a rule inserts into a queue, require that either there is room in the queue when the rule executes, or future rules will execute and remove items to make room in the queue. Each queue has a counter of the number of elements in the queue at the start of the cycle. Combinational logic tracks queue insertions and deletions.
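The augmented precondition for the acyclic case might be evaluated as in the following sketch. It is a software model, not the generated circuit, and it makes several assumptions: `enabled` holds every rule's original precondition evaluated at the start of the cycle, consumers are visited before their producers (one possible orientation of the topological order), and the function and parameter names are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def fire_set(enabled, inserts, removes, count, capacity):
    """Decide which rules fire this cycle without overflowing any queue.

    enabled:  rule -> bool, original precondition (must list every rule)
    inserts:  rule -> set of queues the rule appends to
    removes:  rule -> set of queues the rule removes from
    count:    queue -> number of elements at the start of the cycle
    capacity: queue -> size of the finite hardware buffer
    """
    consumers_of = {}
    for rule, queues in removes.items():
        for q in queues:
            consumers_of.setdefault(q, set()).add(rule)

    # A producer depends on the consumers of its output queues, so those
    # consumers are decided first.
    deps = {r: set() for r in enabled}
    for r in enabled:
        for q in inserts.get(r, ()):
            deps[r] |= consumers_of.get(q, set()) - {r}

    fires = {}
    for r in TopologicalSorter(deps).static_order():
        room = all(
            count[q] < capacity[q]
            or any(fires.get(c, False) for c in consumers_of.get(q, ()))
            for q in inserts.get(r, ())
        )
        fires[r] = enabled[r] and room
    return fires

INSERTS = {1: {"iq"}, 2: {"rq"}, 3: {"rq"}}
REMOVES = {2: {"iq"}, 3: {"iq"}, 4: {"rq"}, 5: {"rq"}, 6: {"rq"}}
FULL = {"iq": 1, "rq": 1}       # both single-element queues are occupied
CAP = {"iq": 1, "rq": 1}

# All stages can move together: rule 4 drains rq, so rule 2 may refill it, and so on.
moving = fire_set({1: True, 2: True, 3: False, 4: True, 5: False, 6: False},
                  INSERTS, REMOVES, FULL, CAP)
# Rule 2 is blocked (for example by a hazard), so the fetch rule must stall.
stalled = fire_set({1: True, 2: False, 3: False, 4: True, 5: False, 6: False},
                   INSERTS, REMOVES, FULL, CAP)
```

The second call shows the stall behaviour: with iq full and no consumer of iq firing, rule 1 is held back even though its original precondition is TRUE.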

26 Example Instruction Fetch Rule After Queue Finitization
  Empty(iq0) or (<...> = head(rq0) and v == 0) or
  <...> = head(iq0) and notin(rq0, <...>) or
  <...> = head(iq0) and notin(rq0, <...>)
  -> iq1 = append(iq0, im[pc0]); pc1 = pc0 + 1;

27 Pipeline Implications. The counter becomes a presence bit for single-element queues. The additional preconditions can be viewed as pipeline stall logic. The design can be written to generate pipeline forwarding/bypassing instead of stalls.

28 Cyclic Graphs. Cyclic graphs lead to cyclic dependences: rule 1 depends on rule 2 to remove an item from a queue, but rule 2 depends on rule 1 to remove an item from another queue. The algorithm from the acyclic case would generate recursive preconditions. [Figure: rule 1 and rule 2 in a cycle]

29 Solution to Cyclic Dependence Problem. Groups of rules must execute together. Use depth-first search on the producer-consumer graph to find cyclic groups. Augment preconditions to allow all rules in a cycle to execute together. Extensions include paths into and out of a cyclic group.
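One way to find cyclic groups with depth-first search is Tarjan's strongly connected components algorithm, sketched below. The slides only say that depth-first search is used, so the choice of Tarjan's algorithm and the function name `cyclic_groups` are assumptions of this example.

```python
def cyclic_groups(graph):
    """Return the groups of rules that lie on a common cycle.

    graph maps a rule to an iterable of its successors in the
    producer-consumer graph. Uses Tarjan's SCC algorithm (recursive DFS).
    """
    index, low = {}, {}
    stack, on_stack = [], set()
    groups = []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            # Keep only real cycles: several rules, or a rule feeding itself.
            if len(comp) > 1 or v in graph.get(v, ()):
                groups.append(comp)

    for v in list(graph):
        if v not in index:
            visit(v)
    return groups

# Rules 1 and 2 depend on each other; rule 3 only consumes.
example = cyclic_groups({1: [2], 2: [1, 3], 3: []})
```

Each returned group would then get a shared augmented precondition so that all of its rules execute together.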

30 Symbolic Execution. Substitute out all intermediate versions of variables. Obtain an expression for the last version of each variable. Each expression defines the new value of the corresponding variable.
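Substituting out intermediate versions can be pictured as repeated replacement in an expression tree. In this sketch expressions are nested tuples `(operator, arg, ...)`, variables are strings, and `defs` maps each intermediate version to its defining expression; the representation, the `ite` (if-then-else) operator, and the symbols `p6` and `l` are hypothetical choices for illustration.

```python
def substitute(expr, defs):
    """Rewrite expr so that it mentions only variables without a definition.

    expr: a variable name (str), a constant, or a tuple (operator, arg, ...)
    defs: maps an intermediate variable version to its defining expression
    """
    if isinstance(expr, str):
        # Replace a defined version by its (recursively expanded) definition.
        return substitute(defs[expr], defs) if expr in defs else expr
    if isinstance(expr, tuple):
        # Keep the operator name, expand every argument.
        return (expr[0],) + tuple(substitute(arg, defs) for arg in expr[1:])
    return expr  # constants are left untouched

# Hypothetical definitions for pc: rule 1 increments it, rule 6 may redirect it.
DEFS = {
    "pc1": ("+", "pc0", 1),
    "pc6": ("ite", "p6", "l", "pc1"),  # if p6 then l else pc1
}

new_pc = substitute("pc6", DEFS)
```

The result mentions only `pc0` and the free symbols, so it can serve as the next-state expression for the register.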

31 Optimizations. Optimize the expressions from symbolic execution. CSE: avoid unnecessary replication of HW. Mutual Exclusion Testing: eliminate computation of values that never occur in practice as a result of mutually exclusive preconditions.
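Common subexpression elimination can be illustrated with hash-consing: every structurally identical subexpression is given one shared node, so it would be built in hardware once. This is a generic sketch of that idea, not necessarily how the authors' system implements CSE; the tuple representation and the function name `share` are assumptions, and mutual exclusion testing is not covered here.

```python
def share(expr, table):
    """Return the node id of expr, reusing ids for repeated subexpressions.

    expr:  a leaf (variable name or constant) or a tuple (operator, arg, ...)
    table: maps a structural key to its node id; filled in as a side effect
    """
    if isinstance(expr, tuple):
        # Key an operator node by its name and the ids of its arguments.
        key = ("op", expr[0]) + tuple(share(arg, table) for arg in expr[1:])
    else:
        key = ("leaf", expr)
    if key not in table:
        table[key] = len(table)   # first occurrence: allocate a new node
    return table[key]

# (a * b) + (a * b): the product appears twice in the tree.
table = {}
root = share(("+", ("*", "a", "b"), ("*", "a", "b")), table)
```

Here the expression tree has seven nodes, while the shared table holds four, because the repeated product and its operands are stored once.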

32 Symbolic Execution with Optimization
Final result for rq, assuming single item queues:
  if <...> = head(rq) and v == 0 new rq = nil
  else if <...> = head(iq) and notin(rq, <...>) new rq = <...>
  else if <...> = head(iq) and notin(rq, <...>) new rq = <...>
  else new rq = nil

33 Verilog Generation. Synthesize HW directly from expressions: each queue as one or more registers; each memory variable as a library block; each state variable as one or more registers, depending on type; each expression as combinational logic that feeds back into the corresponding registers.
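To make the register-plus-feedback structure concrete, here is a tiny Python emitter that prints Verilog for one state variable. It is a hypothetical helper, not the authors' generator: the function name, the `_next` naming scheme, the clock signal `clk`, the absence of reset logic, and the example next-state expression are all assumptions.

```python
def emit_register(name, width, next_expr):
    """Return Verilog text for one state variable.

    The register holds the current value; a wire carries the combinational
    next-state expression, which is latched on every rising clock edge.
    """
    return "\n".join([
        f"reg  [{width - 1}:0] {name};",
        f"wire [{width - 1}:0] {name}_next = {next_expr};",
        f"always @(posedge clk) {name} <= {name}_next;",
    ])

# Hypothetical next-state expression for an 8-bit pc with a stall signal.
verilog = emit_register("pc", 8, "stall ? pc : pc + 8'd1")
```

Printing `verilog` gives three lines: the register declaration, the combinational next-value wire, and the clocked update that feeds the expression back into the register.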

34 Experimental Results
We have implemented the synthesis system and used it to generate synthesizable Verilog for several specifications (map effort medium, area effort low, constraints 10ns).
  Benchmark    Cycle Time   Area (cells)
  Bubblesort   9.34ns       ~370
  Butterfly    9.57ns       ~412
  Processor    11.28ns      ~387
  Filter       9.51ns       ~252

35 Conclusion. Starting Point (Good for Designer): Modular, Asynchronous, Sequential Specification with Conceptually Infinite Queues. Ending Point (Good for Implementation): Efficient, Synchronous, Globally Scheduled, Parallel Implementation with Finite Queues in Synthesizable Verilog. Variety of Techniques: Symbolic Execution, Queue Finitization.

