Presentation is loading. Please wait.

Presentation is loading. Please wait.

ESE534 Computer Organization

Similar presentations


Presentation on theme: "ESE534 Computer Organization"— Presentation transcript:

1 ESE534 Computer Organization
Day 7: February 17, 2014 Interconnect Introduction Penn ESE534 Spring DeHon

2 Previously Universal building blocks Computations with gates
Penn ESE534 Spring DeHon

3 Today Universal Spatially Programmable Crossbar
Programmable compute blocks Start thinking about optimizing interconnect Penn ESE534 Spring DeHon

4 Spatial Programmable Penn ESE534 Spring DeHon

5 Spatially Programmable
Program up “any” function Not sequentialize in time E.g. Want to build any FSM or any arithmetic operation Penn ESE534 Spring DeHon

6 Needs? Need a collection of gates. What else will we need?
Penn ESE534 Spring DeHon

7 Needs Need some registers Need way to programmably wire gates together
Penn ESE534 Spring DeHon

8 Multiplexer Interconnect
Use a multiplexer for programmable interconnect Can select any source to be an input for a gate How big is an N-input multiplexer? Penn ESE534 Spring DeHon

9 Sources? What are potential sources? Inputs to circuit -- I
Outputs of gates -- G Outputs of registers – R N=I+G+R Penn ESE534 Spring DeHon

10 Sinks Which things need programmable inputs? (and how many?)
Circuit outputs -- O Gate – needs one per input -- kG Assuming k-input gates Registers – R M = O+R+kG Penn ESE534 Spring DeHon

11 N-input, M-output Multiplexing
Area? Bits to control the multiplexers? Data input switching Capacitance Switched? Delay? Penn ESE534 Spring DeHon

12 Mux Programmable Interconnect
Area M×N = (I+G+R) × (O+kG+R) = kG2 + … Scales faster than gates! Penn ESE534 Spring DeHon

13 Interconnect Costs We can do better than this Even when we do better
Touch on a little later in lecture Dig into details later in term Even when we do better Interconnect can be dominate Area, delay, energy Particularly for Spatial Architectures Penn ESE534 Spring DeHon

14 Dominant Time LUT is the gate. Definition coming soon…
Penn ESE534 Spring DeHon

15 Dominant Power [Energy]
XC4003A data from Eric Kusse (UCB MS 1997) [Virtex II, Shang et al., FPGA 2002] [Tuan et al./FPGA 2006] Penn ESE534 Spring DeHon

16 Crossbar Penn ESE534 Spring DeHon

17 Crossbar Allows us to connect any of a set of inputs to any of the outputs. This is functionality provided with our muxes Penn ESE534 Spring DeHon

18 Crossbar Structure Can be more efficient
Penn ESE534 Spring DeHon

19 Crossbar Costs Area still goes as M×N Delay proportional to M + N
More realistic even for mux implementation Energy still goes as M×N Penn ESE534 Spring DeHon

20 Crossbar Notation Penn ESE534 Spring DeHon

21 Gates with Crossbar Interconnect
Penn ESE534 Spring DeHon

22 Programmable, Universal
Program to any function of N gates Limitation we only use nand2 gates? What can we say about functions of arbitrary 2-input gates? Penn ESE534 Spring DeHon

23 Universal Computation with Fixed Compute Operator
Being minimalists, this shows: do not need compute to be programmable Just use fixed nor2 or nand2 Penn ESE534 Spring DeHon

24 Programmable Functions
Penn ESE534 Spring DeHon

25 Mux can be a programmable gate
bool mux4(bool a, b, c, d, s0, s1) { return(mux2( mux2(a,b,s0), mux2(c,d,s0), s1)); } Penn ESE534 Spring DeHon

26 Program AND2? How do we program to behave as and2?
Penn ESE534 Spring DeHon

27 Mux as Logic bool and2(bool x, y)
{return (mux4(false,false,false,true,x,y));} bool or2(bool x, y) {return (mux4(false,true,true,true,x,y));} Just by routing “data” into this mux4, Can select any two input function Penn ESE534 Spring DeHon

28 LUT – LookUp Table When use a mux as programmable gate
Call it a LookUp Table (LUT) Implementing the Truth Table for small # of inputs # of inputs =k (need mux-2k) Just lookup the output result in the table Penn ESE534 Spring DeHon

29 Programmable Compute Can use programmable gate in place of nor gate
Penn ESE534 Spring DeHon

30 Programmable Compute How do we “program” to perform a particular function? Penn ESE534 Spring DeHon

31 Instructions Set of bits that tell the programmable units how to behave Penn ESE534 Spring DeHon

32 Is an Adder Universal? Assuming interconnect: A: 001a What’s c?
(big assumption as we have just seen) Consider: What’s c? A: 001a B: 000b S: 00cd Penn ESE534 Spring DeHon

33 Practically To reduce (some) interconnect,
and to reduce number of operations, do tend to build a bit more general “universal” computing function Penn ESE534 Spring DeHon

34 Arithmetic Logic Unit (ALU)
Observe: with small tweaks can get many functions with basic adder components Penn ESE534 Spring DeHon

35 Arithmetic and Logic Unit
Penn ESE534 Spring DeHon

36 ALU Functions A+B w/ Carry B-A A xor B (squash carry)
A*B (squash carry) /A Penn ESE534 Spring DeHon

37 Optimization and Design Space
Interconnect Optimization and Design Space Penn ESE534 Spring DeHon

38 Locality Maybe we don’t need to connect everything to everything?
Cluster groups of C things at leaves CG gates, CR registers Limit cluster I/O – CI, CO Crossbar within cluster Crossbar among clusters Penn ESE534 Spring DeHon

39 Compare Switches Penn ESE534 Spring DeHon

40 8×16=128 8×4+18×4=104 Penn ESE534 Spring DeHon

41 Comparing Full Crossbar needs: kG2 switches
How many switches needed for: CG gates per cluster CI inputs to cluster CO outputs from cluster (ignore registers and circuit input/output) Penn ESE534 Spring DeHon

42 Costs Cluster Input Crossbar: Cluster Output Crossbar:
Inputs: CI+CG Outputs: kCG Cluster Output Crossbar: Input: CG Output: CO Master Crossbar: Inputs: (G/CG) × CO Outputs: (G/CG) × CI (G/CG)×CO×(G/CG)×CI +(G/CG) ×(k×CG2+k×CG×CI+CG×CO) Penn ESE534 Spring DeHon

43 Costs Cluster: G2×(CI×CO/CG2)+k×G×(CG+CI+CO) Full Crossbar: kG2
Compare at: CG=8, CI=CO=2, G=256, k=2 Cluster case? Full crossbar case? 216×(2×2/82)+28×(8+2+2)×2 = ×29= ×211= 5×213 217 Penn ESE534 Spring DeHon

44 More? Can we do it again? Discuss Generalization? Issues?
Penn ESE534 Spring DeHon

45 Optimizing Interconnect
What other interconnect optimizations/structures might we use? Penn ESE534 Spring DeHon

46 Interconnect Design Space
Large interconnect design space We will be exploring systematically Day16—20+25 Penn ESE534 Spring DeHon

47 Big Ideas Interconnect can be programmable
Interconnect area/delay/energy can dominate compute area Exploiting structure can reduce area Locality Penn ESE534 Spring DeHon

48 Admin HW2 HW3 Wednesday Drop date is Friday HW4 out
On-time submissions graded (late assignments will be graded later) HW3 Wednesday Drop date is Friday HW4 out Reading for Wed. on Web Penn ESE534 Spring DeHon


Download ppt "ESE534 Computer Organization"

Similar presentations


Ads by Google