Presentation is loading. Please wait.

Presentation is loading. Please wait.

CS294-6 Reconfigurable Computing Day 6 September 10, 1998 Comparing Computing Devices.

Similar presentations


Presentation on theme: "CS294-6 Reconfigurable Computing Day 6 September 10, 1998 Comparing Computing Devices."— Presentation transcript:

1 CS294-6 Reconfigurable Computing Day 6 September 10, 1998 Comparing Computing Devices

2 Today Finish up empirical comparisons Instructions and their impact Modeling

3 Multiply

4 FIR

5 IIR/Biquad

6 DES Keysearch

7 DNA Sequence Match Problem: “cost” of transform S 1  S 2 Given: cost of insertion, deletion, substitution Relevance: similarity of DNA sequences –evolutionary similarity –structure predict function Typically: new sequence compared to large databse

8 DNA Sequence Match

9 Floating-Point Add (single prec.)

10 Floating-Point Mpy (single prec.)

11 Summary Raw densities (Area-Time) –FPGA/custom = 100x –Processor/custom = 1000x Special-purpose functional units in processors/DSPs, much lower net benefit since need to control and interconnect Gap narrows (closes) as programmable can be specialized

12 Modeling Why do we model?

13 Terminology Primitive Instruction (pinst) –Collection of bits which tell a bit-processing element what to do –Includes: select compute operation input sources in space (interconnect) input sources in time (retiming)

14 Computational Array Model Collection of computing elements –compute operator –local storage/retiming Interconnect Instruction

15 “Ideal” Instruction Control Issue a new instruction to every computational bit operator on every cycle

16 “Ideal” Instruction Distribution Why don’t we do this?

17 “Ideal” Instruction Distribution Problem: Instruction bandwidth (and storage area) quickly dominates everything else –Compute Block ~ 1M   x  –Instruction ~ 64 bits –Wire Pitch ~  –Memory bit ~  

18 Instruction Distribution 64x8 =512 Two instructions in 1024

19 Instruction Distribution Distribute from both sides = 2x

20 Instruction Distribution Distribute X and Y = 2x

21 Instruction Distribution Room to distribute 2 instructions across PE per metal layer (1024 = 2x8x64) Feed top and bottom (left and right) = 2x Two complete metal layers = 2x => 8 instructions / PE Side

22 Instruction Distribution Maximum of 8 instructions per PE side Saturate wire channels at 8x  N = N => at 64 PE –beyond this instruction distribution dominates area

23 Instruction Distribution Beyond 64 PE, instruction bandwidth dictates PE size –  Pe area /(64x8 )x4  N = N –Pe area =16K  x N

24 Instruction Memory Requirements Idea: put instruction memory in array Problem: Instruction memory can quickly dominate area, too –Memory Area = 64x1.2K  /instruction –PE area = 1M  + (Instructions)x80K 

25 Instruction Pragmatics Instruction requirements could dominate array size. Standard architecture trick: –Look for structure to exploit in “typical computations”

26 Typical Structure? What structure do we usually expect?

27 Two Extremes SIMD Array (microprocessors) –Instruction/cycle –share instruction across array of PE s –uniform operation in space –operation variance in time FPGA –Instruction/PE –assume temporal locality of instructions (same) –operation variance in space –uniform operations in time

28 Hybrids VLIW (SuperScalar) –Few pinsts/cycle –Share instruction across w bits DPGA –Small instruction store / PE

29 Architecture Instr. Taxonomy

30 Modeling Instruction Effects Restrictions from “ideal” save area Restriction from “ideal” limits usability (yield) of PE Want to understand effects –area model –utilization/yield model

31 Efficiency/Yield Intuition What happens when –Datapath is too wide? –Datapath is too narrow? –Instruction memory is too deep? –Instruction memory is too shallow?

32 Computing Device Composition –Bit Processing elements –Interconnect space time –Instruction Memory

33 Computing Device Composition Tile basic building blocks into array

34 Model Area

35 Spot Check Area FPGA –w=1, d=c=1, k=4 880K  –Xilinx 4K 630K  –Altera 8K 930K  SIMD –w=1000, c-0, d-64, k=3 170K  –Abacus 190K  Processor –w=32, d=32, c=1024, k=2 2.6M  –MIPS-X 2.1M 

36 Peak Densities from Model

37 Empirical Peak Densities

38 Efficiency What do we want to maximize? –Useful work per unit silicon Yield Fraction / Area (or minimize (Area/Yield) )

39 Efficiency For comparison, look at relative efficiency to ideal. Ideal = architecture exactly matched to application requirements Efficiency = A ideal /A arch A arch = Area Op/Yield

40 Efficiency Calculation

41 Efficiency with fixed Width w=1, 16K PEs

42 FPGA vs. DPGA Compare Robust point: when context memory area is 1/2 of total area

43 Robust Point depend on Width w=1 w=8 w=64

44 Processors and FPGAs FPGA c=d=1, w=1, k=4 “Processor” c=d=1024, w=64, k=2

45 Intermediate Architecture w=8 c=64 16K PEs

46 Caveats Model abstracts away many details which are important –interconnect –control –specialized functional units Applications are a heterogeneous mix of characteristics

47 Modeling Message Architecture space is huge Easy to be very inefficient Hard to pick one point robust across entire space

48 Summary Instruction resources can be significant Exploit application structure to reduce instruction requirements Classify architectures by instruction organization Arch. structure and application requirements mismatch => inefficiencies Model => visualize efficiency trends


Download ppt "CS294-6 Reconfigurable Computing Day 6 September 10, 1998 Comparing Computing Devices."

Similar presentations


Ads by Google