CS 150 - Spring 2007 – Lec #16 – Retiming - 1 State Machine Timing zRetiming ySlosh logic between registers to balance latencies and improve clock timings.

CS 150 - Spring 2007 – Lec #16 – Retiming - 1 State Machine Timing zRetiming ySlosh logic between registers to balance latencies and improve clock timings yAccelerate or retard cycle in which outputs are asserted zParallelism yDoing more than one thing at a time zPipelining ySplitting computations into overlapped, smaller time steps

CS 150 - Spring 2007 – Lec #16 – Retiming - 2 Synchronizer Circuitry at Inputs and Outputs Synchronizer Circuitry at Inputs and Outputs Recall: Synchronous Mealy Machine Discussion zPlacement of flipflops before and after the output logic changes the timing of when the output signals are asserted …

CS 150 - Spring 2007 – Lec #16 – Retiming - 3 Recall: Synchronous Mealy Machine with Synchronizers Following Outputs Case III: Synchronized Outputs A asserted during Cycle 0, ƒ' asserted in next cycle Effect of ƒ delayed one cycle Signal goes into effect one cycle later

CS 150 - Spring 2007 – Lec #16 – Retiming - 4 Vending Machine State Machine zMoore machine youtputs associated with state 0¢ [0] 10¢ [0] 5¢ [0] 15¢ [1] N’ D’ + Reset D D N N+D N N’ D’ Reset’ N’ D’ Reset Mealy machine outputs associated with transitions 0¢0¢ 10¢ 5¢5¢ 15¢ (N’ D’ + Reset)/0 D/0 D/1 N/0 N+D/1 N/0 N’ D’/0 Reset’/1 N’ D’/0 Reset/0

CS 150 - Spring 2007 – Lec #16 – Retiming - 5 Open asserted only when in state 15 State Machine Retiming zMoore vs. (Async) Mealy Machine yVending Machine Example Open asserted when last coin inserted leading to state 15

CS 150 - Spring 2007 – Lec #16 – Retiming - 6 State Machine Retiming zRetiming the Moore Machine: Faster generation of outputs zSynchronizing the Mealy Machine: Add a FF, delaying the output zThese two implementations have identical timing behavior Push the AND gate through the State FFs and synchronize with an output FF Like computing open in the prior state and delaying it one state time

CS 150 - Spring 2007 – Lec #16 – Retiming - 7 Out calc Plus set-up NOTE: overlaps with Next State calculation State Machine Retiming zEffect on timing of Open Signal (Moore Case) Clk FF prop delay State Open Out prop delay Retimed Open Calculation

CS 150 - Spring 2007 – Lec #16 – Retiming - 8 State Machine Retiming zTiming behavior is the same, but are the implementations really identical? FF input in synchronous Mealy implementation FF input in retimed Moore implementation Only difference in don’t care case of nickel and dime at the same time

CS 150 - Spring 2007 – Lec #16 – Retiming - 9 Parallelism zExample, Student final grade calculation: read mt1, mt2, mt3, project; grade = 0.2  mt1 + 0.2  mt2 + 0.2  mt3 + 0.4  project; write grade; zHigh performance hardware implementation: As many operations as possible are done in parallel Doing more than one thing at a time: optimization in h/w often involves using parallelism to trade between cost and performance

CS 150 - Spring 2007 – Lec #16 – Retiming - 10 Parallelism zIs there a lower cost hardware implementation? Different tree organization? zCan factor out multiply by 0.2: zHow about sharing operators (multipliers and adders)?

CS 150 - Spring 2007 – Lec #16 – Retiming - 11 Time Multiplexing zReuse single ALU for all adds and multiplies yLower hardware cost, longer latency yBUT must add muxes/registers/control zConsider the combinational hardware circuit diagram as an abstract computation-graph: Alternative building blocks

CS 150 - Spring 2007 – Lec #16 – Retiming - 12 Time Multiplexing x + 0.2mt1 x + 0.4Proj 0 x + 0.2mt3 x + 0.2mt2 x + 0.2 0.4 mt1 mt2 mt3 proj 0 Time-multiplexing “covers” the computation graph by performing the action of each node one at a time

CS 150 - Spring 2007 – Lec #16 – Retiming - 13 Time Multiplexing x 0.2mt3 x + 0.4Proj x 0.2mt1 x + 0.2mt2 x 1 x + 1 x 0.2 1 mt3 mt1 x + 0.4 0.2 1 Proj mt2

CS 150 - Spring 2007 – Lec #16 – Retiming - 14 Pipelining Principle zPipelining review from CS61C: Analog to washing clothes: step 1:wash(20 minutes) step 2:dry(20 minutes) step 3:fold(20 minutes) 60 minutesx 4 loads  4 hours washload1 load2 load3 load4 dry load1 load2 load3 load4 fold load1 load2 load3 load4 20 min overlapped  2 hours

CS 150 - Spring 2007 – Lec #16 – Retiming - 15 Pipelining wash load1 load2 load3 load4 dry load1 load2 load3 load4 fold load1 load2 load3 load4 Increase number of loads, average time per load approaches 20 minutes Latency (time from start to end) for one load = 60 min Throughput = 3 loads/hour Pipelined throughput  # of pipe stages x un-pipelined throughput

CS 150 - Spring 2007 – Lec #16 – Retiming - 16 Pipelining zGeneral principle: zCut the CL block into pieces (stages) and separate with registers: T’ = 4 ns + 1 ns + 4 ns +1 ns = 10 ns F = 1/(4 ns +1 ns) = 200 MHz zCL block produces a new result every 5 ns instead of every 9 ns Assume T = 8 ns T FF (setup +clk  q) = 1 ns F = 1/9 ns = 111 MHz Assume T1 = T2 = 4 ns

CS 150 - Spring 2007 – Lec #16 – Retiming - 17 Limits on Pipelining zWithout FF overhead, throughput improvement proportional to # of stages After many stages are added. FF overhead begins to dominate: Other limiters to effective pipelining: Clock skew contributes to clock overhead Unequal stages FFs dominate cost Clock distribution power consumption feedback (dependencies between loop iterations) FF “overhead” is the setup and clk to Q times.

CS 150 - Spring 2007 – Lec #16 – Retiming - 18 Pipelining Example zF(x) = y i = a x i 2 + b x i + c zx and y are assumed to be “streams” zDivide into 3 (nearly) equal stages. zInsert pipeline registers at dashed lines. zCan we pipeline basic operators? z Computation graph:

CS 150 - Spring 2007 – Lec #16 – Retiming - 19 Example: Pipelined Adder zPossible, but usually not done … (arithmetic units can often be made sufficiently fast without internal pipelining)

CS 150 - Spring 2007 – Lec #16 – Retiming - 20 State Machine Retiming Summary zRetiming yVending Machine Example xVery simple output function in this particular case yBut if output takes a long time to compute vs. the next state computation time -- can use retiming to “balance” these calculations and reduce the cycle time zParallelism yTradeoffs in cost and performance yTime reuse of hardware to reduce cost but sacrifice performance zPipelining yIntroduce registers to split computation to reduce cycle time and allow parallel computation yTrade latency (number of stage delays) for cycle time reduction

CS 150 - Spring 2007 – Lec #16 – Retiming - 21 Announcements zMidterm II -- NEXT Thursday, 22 March in CS 150 Laboratory, 2:10 - 3:30+ (that is one week from today -- tell your friends!) yClosed book, open double sided crib sheet yTA review session next week yFive or so design-oriented questions covering: xState machine word problems xMemory systems xDatapath design xRegister transfer xController implementation Time state Jump Counter Branch Sequencer Horizontal and Vertical Microprogramming xRetiming, Parallelism, Pipelining xLabs 4, 5, Checkpoint 0, 1

CS 150 - Spring 2007 – Lec #16 – Retiming - 1 State Machine Timing zRetiming ySlosh logic between registers to balance latencies and improve clock timings.

Similar presentations

Presentation on theme: "CS 150 - Spring 2007 – Lec #16 – Retiming - 1 State Machine Timing zRetiming ySlosh logic between registers to balance latencies and improve clock timings."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

CS 150 - Spring 2007 – Lec #16 – Retiming - 1 State Machine Timing zRetiming ySlosh logic between registers to balance latencies and improve clock timings.

Similar presentations

Presentation on theme: "CS 150 - Spring 2007 – Lec #16 – Retiming - 1 State Machine Timing zRetiming ySlosh logic between registers to balance latencies and improve clock timings."— Presentation transcript:

Similar presentations

About project

Feedback