RICE UNIVERSITY ‘Stream’-based wireless computing Sridhar Rajagopal Research group meeting December 17, 2002 The figures used in the slides are borrowed.


1 RICE UNIVERSITY ‘Stream’-based wireless computing Sridhar Rajagopal Research group meeting December 17, 2002 The figures used in the slides are borrowed from papers at VT and Stanford.

2 RICE UNIVERSITY Motivation
- ‘Stream’-based computing
  - what does it mean?
- Not a well-defined term
  - ‘computation’ that uses flow of self-guided info.
  - ‘sequence of data’
- Related to flow of data through architecture
- Application to implementing wireless algorithms

3 RICE UNIVERSITY Outline
- Stallion
  - reconfigurable computing at Virginia Tech
  - ‘stream’-based computing #1
  - Custom Configurable Machines (CCM)
- Imagine
  - media processing at Stanford
  - ‘stream’-based computing #2
  - programmable architectures

4 RICE UNIVERSITY Stallion at VT
- Wormhole Run-Time Reconfiguration (RTR)
  - coarse-grained structure
  - reconfiguration using ‘streams’

5 RICE UNIVERSITY ‘Stream’ packets A stream packet Stream flow through architecture

6 RICE UNIVERSITY Functional description of PE

7 RICE UNIVERSITY Stream module description
4 States:
- IDLE – reconf. in progress
- BUSY – doing work
- PROGRAM – load reconf. data
- PASS – meant for next module
Need to output packet/cycle
- VALID – maintain sync.
- set INVALID instead of wait states
- strip information off stack
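The four states and the one-packet-per-cycle rule can be sketched as a small Python model. This is only an illustration under stated assumptions, not the Stallion design: the `Packet` fields, the integer module ids in the header stack, the `gain` stand-in for a loaded configuration, the buffering during reconfiguration, and the initial state are all hypothetical choices made to keep the sketch runnable.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Deque, List


class State(Enum):
    IDLE = auto()     # reconfiguration in progress
    BUSY = auto()     # doing work on a data packet
    PROGRAM = auto()  # loading reconfiguration data
    PASS = auto()     # forwarding a packet meant for a later module


@dataclass
class Packet:
    valid: bool = False                               # default is an INVALID filler
    targets: List[int] = field(default_factory=list)  # hypothetical header stack
    is_config: bool = False                           # True for reconfiguration data
    payload: int = 0


class StreamModule:
    def __init__(self, module_id: int, reconf_cycles: int = 1) -> None:
        self.module_id = module_id
        self.reconf_cycles = reconf_cycles
        self.remaining = 0                   # reconfiguration cycles still to run
        self.gain = 1                        # stand-in for the loaded configuration
        self.buffer: Deque[Packet] = deque() # data held back during reconfiguration
        self.state = State.IDLE              # assumption: starts out unconfigured

    def step(self, pkt: Packet) -> Packet:
        """Consume one packet and emit exactly one packet per cycle."""
        if pkt.valid:
            self.buffer.append(pkt)
        if self.remaining > 0:
            # Still reconfiguring: emit an INVALID packet instead of stalling.
            self.remaining -= 1
            self.state = State.IDLE
            return Packet()
        if not self.buffer:
            return Packet()                  # nothing to send, keep the stream in sync
        pkt = self.buffer.popleft()
        if not pkt.targets or pkt.targets[0] != self.module_id:
            self.state = State.PASS          # meant for a later module
            return pkt
        rest = pkt.targets[1:]               # strip this module's entry off the stack
        if pkt.is_config:
            self.state = State.PROGRAM
            self.gain = pkt.payload
            self.remaining = self.reconf_cycles
            return Packet()
        self.state = State.BUSY
        return Packet(valid=True, targets=rest, payload=pkt.payload * self.gain)
```

Feeding the module a configuration packet, then data, shows the INVALID fillers that replace wait states: the data packet arriving during reconfiguration is buffered and only processed once the module is free again.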

8 RICE UNIVERSITY Processing layer
- Static section
  - configures the reconf. section
  - buffers data during reconf. & sends ‘IDLE’ packets
- Reconf. section
  - processing of the data done here
- Higher layers convert algorithm to data and configuration patterns

9 RICE UNIVERSITY Cart before the horse: Colt before the Stallion
- Colt architecture (also at VT)
- IFU Mesh – mesh of interconnected func. units

10 RICE UNIVERSITY Stallion chip
[Figure: Stallion chip; labels: 16-bit data, 4-control]

11 RICE UNIVERSITY IFU mesh in Stallion
- Dashed lines – skip buses
- Can send operands over one or more IFUs

12 RICE UNIVERSITY IFU details
- Only left input can do barrel shifting
- ALU based on LUT
- Control register – stores control information for reconfiguration
- Optional delay register – provides latency to synchronize path lengths of different pipeline streams
- Cond. unit
- Output control unit

13 RICE UNIVERSITY Radio testbed at VT Stallion

14 RICE UNIVERSITY Worm-hole routing
- stream = worm, architecture = holes
- multiple, independent streams can wind their way through the chip simultaneously
- parts of system can be processing, parts could be reconfiguring
- GOAL: Layered Software Radio Architecture

15 RICE UNIVERSITY ‘Stream’ processing at Stanford
- Speeding up media applications
- Need lots of computations per memory reference
- Lots of data and sub-word parallelism
- Current GPP architectures do not have enough ALUs
- ‘Stream’ processors to the rescue

16 RICE UNIVERSITY Special-purpose processors
- Fed by dedicated wires/memories
- Lots (100s) of ALUs

17 RICE UNIVERSITY Care and feeding of ALUs
[Figure labels: Data Bandwidth, Instruction Bandwidth, Regs, Instr. Cache, IR, IP]
‘Feeding’ structure dwarfs ALU

18 RICE UNIVERSITY Architecture implications
- Tremendous opportunities
  - media problems have lots of parallelism and locality
  - VLSI technology enables 100s of ALUs/chip (1000s soon) (in 0.18 µm: 0.1 mm² per integer adder, 0.5 mm² per FP adder)
- Challenging problems
  - locality - global structures won’t work
  - explicit parallelism - ILP won’t keep 100 ALUs busy
  - memory - streaming applications don’t cache well
- It’s time to try some new approaches

19 RICE UNIVERSITY Register file organization
- Register file functions:
  - short-term storage for intermediate results
  - communication between multiple function units
- Global register files don’t scale with #ALUs
  - need more registers to hold more results (grows with #ALUs)
  - need more ports to connect all of the units (grows with #ALUs²)
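To make the scaling argument concrete, here is a small Python sketch. It takes the slide's growth rates at face value (register count linear in #ALUs, port cost quadratic in #ALUs); the function name and the single-ALU baseline are illustrative assumptions, not part of the original slides.

```python
def global_register_file_growth(num_alus: int) -> dict:
    """Relative growth versus a 1-ALU baseline, using the slide's rates."""
    if num_alus < 1:
        raise ValueError("num_alus must be at least 1")
    return {
        "registers": num_alus,   # grows with #ALUs
        "ports": num_alus ** 2,  # grows with #ALUs squared
    }


for n in (1, 10, 100):
    growth = global_register_file_growth(n)
    print(f"{n:>3} ALUs: registers x{growth['registers']}, ports x{growth['ports']}")
```

Under these rates, going from 1 to 100 ALUs multiplies the register count by 100 but the port cost by 10,000, which is the sense in which a global register file does not scale.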

20 RICE UNIVERSITY Register files dwarf ALUs

21 RICE UNIVERSITY Distributed register files
- Distributed register files mean:
  - not all functional units can access all data
  - each functional unit input/output no longer has a dedicated route from/to all register files

22 RICE UNIVERSITY Stream processing
[Figure labels: Input Data (Image 0, Image 1), convolve, convolve, SAD, Output Data (Depth Map); legend: Kernel, Stream]
- Little data reuse (pixels never revisited)
- Highly data parallel (output pixels not dependent on other output pixels)
- Compute intensive (60 operations per memory reference)
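For readers unfamiliar with the SAD kernel named in the figure, the following Python sketch shows a sum of absolute differences and one common way it is used to pick a disparity along an image row. It assumes SAD stands for sum of absolute differences; the function names, the 1-D row simplification, and the window/disparity parameters are hypothetical and not taken from the slides.

```python
from typing import Sequence


def sad(block_a: Sequence[int], block_b: Sequence[int]) -> int:
    """Sum of absolute differences between two equally sized pixel blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same length")
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def best_disparity(left_row: Sequence[int], right_row: Sequence[int],
                   x: int, window: int, max_disparity: int) -> int:
    """Return the shift d that minimises SAD between the window at x in
    left_row and the window at x - d in right_row (1-D simplification)."""
    if len(left_row) != len(right_row):
        raise ValueError("rows must have the same length")
    if x < 0 or x + window > len(left_row):
        raise ValueError("window does not fit in the row")
    reference = left_row[x:x + window]
    best_d, best_cost = 0, None
    for d in range(max_disparity + 1):
        start = x - d
        if start < 0:
            break  # candidate window would fall off the row
        cost = sad(reference, right_row[start:start + window])
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

Each output depends only on a small window of input pixels, which matches the bullets above: pixels stream through once, and outputs can be computed independently of each other.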

23 RICE UNIVERSITY Stream programming
- Streams: communication
    void main() {
      Stream a(256);
      Stream b(256);
      Stream c(256);
      Stream d(1024);
      ...
      example1(a, b, c);
      example2(c, d);
      ...
    }
- Kernels: computation
    KERNEL example1(istream a, istream b, ostream c) {
      loop_stream(a) {
        int ai, bi, ci;
        a >> ai;      // read one element from each input stream
        b >> bi;
        ci = ai * 2 + bi * 3;
        c << ci;      // append the result to the output stream
      }
    }
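The `example1` kernel on this slide can be mimicked in plain Python with generators. This is only an analogue for readers without the stream toolchain, not the Imagine programming model itself: Python iterables stand in for streams, and `zip` stopping at the shorter input is an assumption about what happens when the input lengths differ.

```python
from typing import Iterable, Iterator


def example1(a: Iterable[int], b: Iterable[int]) -> Iterator[int]:
    """Python analogue of the slide's kernel: one output element per input pair."""
    for ai, bi in zip(a, b):
        # Same local computation as the kernel body: ci = ai * 2 + bi * 3
        yield ai * 2 + bi * 3


# Elements are produced lazily, one at a time, as the inputs are consumed.
c = example1(range(256), range(256))
first_three = [next(c) for _ in range(3)]
print(first_three)
```

The kernel touches each input element once and keeps no state between iterations, which is what lets a stream processor run many such iterations in parallel.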

24 RICE UNIVERSITY Stream Processor
- Instructions are Load, Store, and Operate
  - operands are streams
- Operate performs a compound stream operation
  - read elements from input streams
  - perform a local computation
  - append elements to output streams
  - repeat until input stream is consumed
  - (e.g., triangle transform)

25 RICE UNIVERSITY Imagine

26 RICE UNIVERSITY Arithmetic clusters

27 RICE UNIVERSITY Bandwidth hierarchy
- VLIW clusters with shared control
- 41.2 32-bit operations per word of memory bandwidth
[Figure labels: SDRAM, 2 GB/s; Stream Register File, 32 GB/s; ALU Cluster, 544 GB/s]
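The three bandwidth figures on this slide can be compared with a few lines of Python. The pairing of numbers to levels (2 GB/s at SDRAM, 32 GB/s at the stream register file, 544 GB/s inside the ALU clusters) is an assumption about how the flattened figure labels line up; the dictionary and function names are illustrative.

```python
# Assumed pairing of the slide's figure labels with its bandwidth numbers.
BANDWIDTH_GB_PER_S = {
    "SDRAM": 2.0,
    "Stream Register File": 32.0,
    "ALU Cluster": 544.0,
}


def bandwidth_ratios(levels: dict, base: str = "SDRAM") -> dict:
    """Express each level's bandwidth as a multiple of the base level."""
    return {name: value / levels[base] for name, value in levels.items()}


for name, ratio in bandwidth_ratios(BANDWIDTH_GB_PER_S).items():
    print(f"{name}: {ratio:g}x SDRAM bandwidth")
```

Under that reading, each step toward the ALUs multiplies the available bandwidth (16x, then a further 17x), so a kernel has to reuse data locally to keep the ALUs fed from a much slower memory.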

28 RICE UNIVERSITY Conclusions
- ‘Streams’ shown to be promising for reconfigurable computing
  - wireless may need reconfigurability
- ‘Streams’ shown to be promising for media processing
  - wireless may have similar workloads
- Important to understand pros and cons of different methodologies for good wireless architectures
- Important to have the right tools

