Penn ESE535 Spring 2013 -- DeHon 1 ESE535: Electronic Design Automation Day 3: January 16, 2013 Scheduled Operator Sharing.

Slides:



Advertisements
Similar presentations
CALTECH CS137 Fall DeHon 1 CS137: Electronic Design Automation Day 19: November 21, 2005 Scheduling Introduction.
Advertisements

EDA (CS286.5b) Day 10 Scheduling (Intro Branch-and-Bound)
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 8: February 11, 2015 Scheduling Introduction.
CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 14: March 3, 2004 Scheduling Heuristics and Approximation.
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 24: November 4, 2011 Synchronous Circuits.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day3: October 2, 2000 Arithmetic and Pipelining.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day17: November 20, 2000 Time Multiplexing.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 17: April 2, 2008 Scheduling Introduction.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 21: April 2, 2007 Time Multiplexing.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day8: October 18, 2000 Computing Elements 1: LUTs.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 21: April 15, 2009 Routing 1.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 3: January 26, 2009 Scheduled Operator Sharing.
Penn ESE Fall DeHon 1 ESE (ESE534): Computer Organization Day 19: March 26, 2007 Retime 1: Transformations.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 13: February 26, 2007 Interconnect 1: Requirements.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day4: October 4, 2000 Memories, ALUs, and Virtualization.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 11: February 14, 2007 Compute 1: LUTs.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 4: January 22, 2007 Memories.
EDA (CS286.5b) Day 11 Scheduling (List, Force, Approximation) N.B. no class Thursday (FPGA) …
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 5: February 2, 2009 Architecture Synthesis (Provisioning, Allocation)
CS294-6 Reconfigurable Computing Day 19 October 27, 1998 Multicontext.
Penn ESE Spring DeHon 1 ESE : Computer Organization Day 3: January 17, 2007 Arithmetic and Pipelining.
Penn ESE Spring DeHon 1 FUTURE Timing seemed good However, only student to give feedback marked confusing (2 of 5 on clarity) and too fast.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 5: January 24, 2007 ALUs, Virtualization…
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 5: February 2, 2009 Architecture Synthesis (Provisioning, Allocation)
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 8: February 13, 2008 Retiming.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 13: February 4, 2005 Interconnect 1: Requirements.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 9: February 24, 2014 Operator Sharing, Virtualization, Programmable Architectures.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 2: January 26, 2015 Covering Work preclass exercise.
CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 12: February 13, 2002 Scheduling Heuristics and Approximation.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 2: January 21, 2015 Heterogeneous Multicontext Computing Array Work Preclass.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 9: February 16, 2015 Scheduling Variants and Approaches.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 7: February 9, 2015 Scheduled Operator Sharing.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 7: February 6, 2012 Memories.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 24: April 18, 2011 Covering and Retiming.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 10: February 18, 2015 Architecture Synthesis (Provisioning, Allocation)
CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 7: February 3, 2002 Retiming.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 23: April 20, 2015 Static Timing Analysis and Multi-Level Speedup.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 13: March 3, 2015 Routing 1.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 5: February 1, 2010 Memories.
Caltech CS184 Winter DeHon CS184a: Computer Architecture (Structure and Organization) Day 3: January 10, 2005 Arithmetic and Pipelining.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 4: January 25, 2012 Sequential Logic (FSMs, Pipelining, FSMD)
CALTECH CS137 Spring DeHon 1 CS137: Electronic Design Automation Day 5: April 12, 2004 Covering and Retiming.
Penn ESE534 Spring DeHon 1 ESE534 Computer Organization Day 19: March 28, 2012 Minimizing Energy.
Penn ESE534 Spring DeHon 1 ESE534 Computer Organization Day 9: February 13, 2012 Interconnect Introduction.
Caltech CS184 Winter DeHon CS184a: Computer Architecture (Structure and Organization) Day 4: January 15, 2003 Memories, ALUs, Virtualization.
Lecture 17: Dynamic Reconfiguration I November 10, 2004 ECE 697F Reconfigurable Computing Lecture 17 Dynamic Reconfiguration I Acknowledgement: Andre DeHon.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 4: January 27, 2010 Sequential Logic (FSMs, Pipelining, FSMD)
CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 13: March 1, 2003 Scheduling Introduction.
Caltech CS184 Winter DeHon CS184a: Computer Architecture (Structure and Organization) Day 3: January 13, 2003 Arithmetic and Pipelining.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 11: January 31, 2005 Compute 1: LUTs.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 20: April 4, 2011 Static Timing Analysis and Multi-Level Speedup.
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 20: October 25, 2010 Pass Transistors.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 15: March 13, 2013 High Level Synthesis II Dataflow Graph Sharing.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 8: February 19, 2014 Memories.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 22: April 16, 2014 Time Multiplexing.
CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 11: February 11, 2002 Scheduling Introduction.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 25: April 17, 2013 Covering and Retiming.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 5: January 31, 2011 Scheduling Variants and Approaches.
ESE532: System-on-a-Chip Architecture
ESE532: System-on-a-Chip Architecture
ESE534: Computer Organization
ESE535: Electronic Design Automation
ESE535: Electronic Design Automation
ESE535: Electronic Design Automation
ESE534: Computer Organization
ESE534: Computer Organization
ESE535: Electronic Design Automation
ESE535: Electronic Design Automation
ESE535: Electronic Design Automation
Presentation transcript:

Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 3: January 16, 2013 Scheduled Operator Sharing

Penn ESE535 Spring DeHon 2 Today Sharing Resources Area-Time Tradeoffs Throughput vs. Latency VLIW Architectures Scheduling (introduce) Behavioral (C, MATLAB, …) RTL Gate Netlist Layout Masks Arch. Select Schedule FSM assign Two-level, Multilevel opt. Covering Retiming Placement Routing

Penn ESE535 Spring DeHon 3 Compute Function Compute: y=Ax 2 +Bx +C Assume –D(Mpy) > D(Add) –A(Mpy) > A(Add)

Penn ESE535 Spring DeHon 4 Spatial Quadratic A(Quad) = 3*A(Mpy) + 2*A(Add)

Penn ESE535 Spring DeHon 5 Latency vs. Throughput Latency: Delay from inputs to output(s) Throughput: Rate at which can introduce new set of inputs

Penn ESE535 Spring DeHon 6 Washer/Dryer Example 1 Washer Takes 30 minutes 1 Dryer Takes 45 minutes How long to do one load of wash? –  Wash latency How long to do 5 loads of wash? Wash Throughput?

Penn ESE535 Spring DeHon 7 Spatial Quadratic D(Quad) = 2*D(Mpy)+D(Add) = 21 Throughput 1/(2*D(Mpy)+D(Add)) = 1/21 A(Quad) = 3*A(Mpy) + 2*A(Add) = 32 Latency?

Synchronous Discipline Compute –From registers –Through combinational logic –To new values for registers Delay through logic sets a lower bound on the duration of each clock – the clock cycle Penn ESE370 Fall DeHon 8

Penn ESE535 Spring DeHon 9 Pipelined Spatial Quadratic D(Quad) = 3*D(Mpy) = 30 Throughput = 1/D(Mpy) = 1/10 A(Quad) = 3*A(Mpy)+2*A(Add)+6A(Reg) = 35

Penn ESE535 Spring DeHon 10 Quadratic with Single Multiplier and Adder? We’ve seen reuse to perform the same operation –pipelining We can also reuse a resource in time to perform a different role. –Here: x*x, A*(x*x), B*x –also: (Bx)+c, (A*x*x)+(Bx+c)

Penn ESE535 Spring DeHon 11 Quadratic Datapath Start with one of each operation

Penn ESE535 Spring DeHon 12 Multiplexer Gate allows us to select data from multiple sources Mux –For short Useful when sharing operators select i0 i1 o=i0*/select+ i1*select

Penn ESE535 Spring DeHon 13 Quadratic Datapath Multiplier serves multiple roles –x*x –A*(x*x) – B*x Use multiplexer to steer data (switch interconnections) –A(mux) < A(multiply)

Penn ESE535 Spring DeHon 14 Quadratic Datapath Multiplier serves multiple roles –x*x –A*(x*x) – B*x x, x*x x,A,B

Penn ESE535 Spring DeHon 15 Quadratic Datapath Multiplier serves multiple roles –x*x –A*(x*x) – B*x x, x*x x,A,B

Penn ESE535 Spring DeHon 16 Quadratic Datapath Adder serves multiple roles –(Bx)+c –(A*x*x)+(Bx+c) one always mpy output C, Bx+C

Penn ESE535 Spring DeHon 17 Quadratic Datapath

Penn ESE535 Spring DeHon 18 Quadratic Datapath Add input register for x

Penn ESE535 Spring DeHon 19 Cycle Impact? Need more cycles How about the delay of each cycle? –Add mux delay –Register setup/hold time, clock skew –Limited by slowest operation –Cycle? D(Mpy)+2*D(Mux2) = 10.2

Penn ESE535 Spring DeHon 20 Quadratic Control Now, we just need to control the datapath What control? Control: –LD x –LD x*x –MA Select –MB Select –AB Select –LD Bx+C –LD Y

Penn ESE535 Spring DeHon 21 Quadratic Control 1.LD_X 2.MA_SEL=x,MB_SEL[1:0]=x, LD_x*x 3.MA_SEL=x,MB_SEL[1:0]=B 4.AB_SEL=C,MA_SEL=x*x, MB_SEL=A, LD_Bx+C 5.AB_SEL=Bx+C, LD_Y [Could combine 1 and 5 and do in 4 cycles; analysis that follows assume 5 as shown.]

Penn ESE535 Spring DeHon 22 Quadratic Memory Control 1.LD_X 2.MA_SEL=x, MB_SEL[1:0]=x, LD_x*x 3.MA_SEL=x, MB_SEL[1:0]=B 4.AB_SEL=C, MA_SEL=x*x, MB_SEL=A, LD_Bx+C 5.AB_SEL=Bx+C, LD_Y

Penn ESE535 Spring DeHon 23 Quadratic Datapath Latency/Throughput/Area? Latency: 5*(D(MPY)+D(mux3))=51 Throughput: 1/Latency ~= 0.02 Area: A(Mpy)+A(Add)+5*A(Reg) +2*A(Mux2)+A(Mux3)+A(Imem)=17.5+ A(Imem)

Penn ESE535 Spring DeHon 24 Quadratic with 2 Mult, 1 Add Latency/Throughput/Area? step XX+ 1X*XB*X 2A*(X*X)(B*X)+C 3(A*X*X*X) +(B*X+C)

Penn ESE535 Spring DeHon 25 Quadratic with 2 Mult, 1 Add Latency = 3*(D(Mpy)+D(Mux))=30.3 Throughput = 1/30.3 ~=0.03 Area = 2*A(Mpy)+4*A(Mux2)+A(Add)+3*A(Reg) = 26.5 step XX+ 1X*XB*X 2A*(X*X)(B*X)+C 3(A*X*X*X) +(B*X+C)

Penn ESE535 Spring DeHon 26 Quadratic: Area-Time Tradeoff DesignAreaThroughputLatency 3M2A (pipe) M1A M1A

Penn ESE535 Spring DeHon 27 Registers  Memory Generally can see many registers If # registers >> physical operators –Only need to access a few at a time Group registers into memory banks

Penn ESE535 Spring DeHon 28 Memory Bank Quadratic Store x x*x B*x A*x 2 ; B*x+c (A*x 2 )+(B*x+c) X +

Penn ESE535 Spring DeHon 29 Memory Bank Quadratic Store x x*x B*x A*x 2 ; B*x+c (A*x 2 )+(B*x+c) X + x x2x2 x B A Bx Ax 2 c Bx+c

Penn ESE535 Spring DeHon 30 Cycle Impact? How cycle changed? Add mux delay Register setup/hold time, clock skew Memory read/write –Could pipeline X +

Penn ESE535 Spring DeHon 31 Cycle Impact? Add mux delay Register setup/hold time, clock skew Memory read/write –Could pipeline –Impact? Latency Throughput? X +

Impact When have big operators –Like multiplier Can share them to reduce area –At cost of throughput –Maybe at cost of latency, energy This gives a rich trade space Penn ESE535 Spring DeHon 32

Details At extreme, number of “big” operators is dominant cost –Total number for area –Number in path for delay Does cost additional area, delay to share them –sometimes a lower order cost Penn ESE535 Spring DeHon 33

Penn ESE535 Spring DeHon 34 VLIW Very Long Instruction Word Set of operators –Parameterize number, distribution (X, +, sqrt…) More operators  less time, more area Fewer operators  more time, less area Memories for intermediate state Memory for “long” instructions Schedule compute task General framework for specializing to problem –Wiring, memories get expensive –Opportunity for further optimizations General way to tradeoff area and time

Penn ESE535 Spring DeHon 35 VLIW + X X Address Instruction Memory

Penn ESE535 Spring DeHon 36 Review Reuse physical operators in time Share operators in different roles Allows us to reduce area at expense of increasing time Area-Time tradeoff Pay some sharing overhead –Muxes, memory VLIW – general formulation for shared datapaths

Design Automation Sets up two problems for us: Provisioning –(Architecture Selection) –After Spring Break Scheduling –Start introducing now –Next two lectures Penn ESE535 Spring DeHon 37 Behavioral (C, MATLAB, …) RTL Gate Netlist Layout Masks Arch. Select Schedule FSM assign Two-level, Multilevel opt. Covering Retiming Placement Routing

Penn ESE535 Spring DeHon 38 General Problem Resources are not free –Wires, io ports –Functional units LUTs, ALUs, Multipliers, …. –Memory access ports –State elements memory locations Registers –Flip-flop –loadable master-slave latch –Multiplexers (mux) select i0 i1 o=i0*/select+ i1*select

Penn ESE535 Spring DeHon 39 Trick/Technique Resources can be shared (reused) in time Sharing resources can reduce –instantaneous resource requirements –total costs (area) Pattern: scheduled operator sharing

Penn ESE535 Spring DeHon 40 Example

Penn ESE535 Spring DeHon 41 Example Assume unit delay operators. How many operators do I need to evaluate this computation in ~5 time units.

Penn ESE535 Spring DeHon 42 Sharing Does not have to increase delay –w/ careful time assignment –can often reduce peak resource requirements –while obtaining original (unshared) delay Alternately: Minimize delay given fixed resources

Penn ESE535 Spring DeHon 43 Schedule Examples time resource

Penn ESE535 Spring DeHon 44 More Schedule Examples

Penn ESE535 Spring DeHon 45 Scheduling Task: assign time slots (and resources) to operations –time-constrained: minimizing peak resource requirements n.b. time-constrained, not always constrained to minimum execution time –resource-constrained: minimizing execution time

Penn ESE535 Spring DeHon 46 Resource-Time Example Time Constraint: <5  -- 5  4 6,7  2 >7  1 Time Area

Penn ESE535 Spring DeHon 47 Scheduling Use Very general problem formulation –HDL/Behavioral  RTL –Register/Memory allocation/scheduling –Instruction/Functional Unit scheduling –Processor tasks –Time-Switched Routing TDMA, bus scheduling, static routing –Routing (share channel)

Penn ESE535 Spring DeHon 48 Two Types (1) Data independent –graph static –resource requirements and execution time independent of data –schedule staticly –maybe bounded-time guarantees –typical ECAD problem

Penn ESE535 Spring DeHon 49 Two Types (2) Data Dependent –execution time of operators variable depend on data –flow/requirement of operators data dependent –if cannot bound range of variation must schedule online/dynamically cannot guarantee bounded-time general case (I.e. halting problem) –typical “General-Purpose” (non-real-time) OS problem

Penn ESE535 Spring DeHon 50 Unbounded Resource Problem Easy: –compute ASAP schedule (next slide) I.e. schedule everything as soon as predecessors allow –will achieve minimum time –won’t achieve minimum area (meet resource bounds)

Penn ESE535 Spring DeHon 51 ASAP Schedule As Soon As Possible (ASAP) For each input –mark input on successor –if successor has all inputs marked, put in visit queue While visit queue not empty –pick node –update time-slot based on latest input –mark inputs of all successors, adding to visit queue when all inputs marked

Penn ESE535 Spring DeHon 52 ASAP Example Work Example

Penn ESE535 Spring DeHon 53 ASAP Example

Penn ESE535 Spring DeHon 54 Also Useful to Define ALAP As Late As Possible Work backward from outputs of DAG Also achieve minimum time w/ unbounded resources Rework Example

Penn ESE535 Spring DeHon 55 ALAP Example

Penn ESE535 Spring DeHon 56 ALAP and ASAP Difference in labeling between ASAP and ALAP is slack of node –Freedom to select timeslot –Class theme: exploit freedom to reduce costs If ASAP=ALAP, no freedom to schedule

Penn ESE535 Spring DeHon 57 ASAP, ALAP, Difference ASAP ALAP

Penn ESE535 Spring DeHon 58 Big Ideas: Scheduled Operator Sharing Area-Time Tradeoffs

Penn ESE535 Spring DeHon 59 Admin Assignment 1 out today –Grab from syllabus –Due next Monday –Includes Tools warmup Reading for Wednesday online