StreamIt on Raw StreamIt Group: Michael Gordon, William Thies, Michal Karczmarek, David Maze, Jasper Lin, Jeremy Wong, Andrew Lamb, Ali S. Meli, Chris.

1 StreamIt on Raw
StreamIt Group: Michael Gordon, William Thies, Michal Karczmarek, David Maze, Jasper Lin, Jeremy Wong, Andrew Lamb, Ali S. Meli, Chris Leger, Sam Larsen, and Saman Amarasinghe
MIT Laboratory for Computer Science
MIT Computer Architecture Workshop, September 19, 2002

2 Von Neumann Languages
Why did C (FORTRAN, C++, etc.) become so successful?
–Abstracted out the differences between von Neumann machines: register set structure, functional units and capabilities, pipeline depth/width, memory/cache organization
–Directly exposed the common properties: single memory image, single control flow, a clear notion of time
–Can have a very efficient mapping to a von Neumann machine
Today von Neumann languages are a curse!

3 StreamIt: A Spatially-Aware Language
A language for streaming applications
–Provides high-level stream abstraction
A filter is the autonomous unit of computation.
Breaks the von Neumann language barrier
–Each filter has its own PC
–Each filter has its own address space
–No global time
–Explicit data movement between filters

4 The Filter
A filter communicates using FIFO channels, with the following operations:
–pop(): dequeue the item at the front of the incoming channel.
–peek(index): return the value at position index without dequeuing it.
–push(value): enqueue value on the outgoing channel.
The pop, peek, and push rates for each firing of a filter must be statically determined.
Each filter contains:
–An initialization function
–A steady-state “work” function
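The channel operations and static rates described on this slide can be sketched in Python. This is only an illustration, not StreamIt syntax: the `Channel` and `MovingAverage` classes and the `run` helper are hypothetical names invented here.

```python
from collections import deque

class Channel:
    """FIFO channel between two filters (hypothetical helper)."""
    def __init__(self, items=()):
        self._items = deque(items)
    def __len__(self):
        return len(self._items)
    def pop(self):
        # Dequeue the oldest item.
        return self._items.popleft()
    def peek(self, index):
        # Read the item at position index without dequeuing it.
        return self._items[index]
    def push(self, value):
        # Enqueue a value at the tail.
        self._items.append(value)
    def contents(self):
        return list(self._items)

class MovingAverage:
    """Example filter with static rates: peek n, pop 1, push 1 per firing."""
    def __init__(self, n):
        # Plays the role of the initialization function.
        self.n = n
        self.peek_rate, self.pop_rate, self.push_rate = n, 1, 1
    def work(self, inp, out):
        # Steady-state work function: one firing.
        total = sum(inp.peek(i) for i in range(self.n))
        out.push(total / self.n)
        inp.pop()

def run(filt, inp, out):
    # Fire the filter for as long as enough input is available.
    while len(inp) >= filt.peek_rate:
        filt.work(inp, out)

inp = Channel([1, 2, 3, 4, 5])
out = Channel()
run(MovingAverage(3), inp, out)
result = out.contents()
```

Because the filter peeks at three items but pops only one per firing, two items remain on the input channel when it can no longer fire.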

5 StreamIt Language
A collection of filters connected by channels.
Structured Streams
–Streaming applications have structure, not a free-form graph.
–Use a few constructs: pipeline, splitjoin and feedback
–Hierarchical composition
–Intuitive textual representation
–Greatly simplify compiler analysis

6 Hierarchical Structures
pipeline
–Sequential composition of streams
splitjoin
–Parallel composition of streams
feedback loop
–Cyclic composition of streams
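As a rough illustration of hierarchical composition, the sketch below models a stream as a Python function from a list of items to a list of items. The helper names `pipeline`, `splitjoin`, and `fmap` are hypothetical, the round-robin split and join policy is an assumption made for the example, and the feedback loop is left out for brevity.

```python
def fmap(f):
    """Wrap a per-item function as a stream (pop 1, push 1)."""
    return lambda items: [f(x) for x in items]

def pipeline(*stages):
    """Sequential composition: the output of each stage feeds the next."""
    def run(items):
        for stage in stages:
            items = stage(items)
        return items
    return run

def splitjoin(*branches):
    """Parallel composition with an assumed round-robin splitter and joiner."""
    def run(items):
        k = len(branches)
        usable = len(items) - len(items) % k   # drop an incomplete last round
        outs = [branch(items[i:usable:k]) for i, branch in enumerate(branches)]
        # Joiner: take one item from each branch in turn.
        return [x for group in zip(*outs) for x in group]
    return run

# Streams nest: a splitjoin is used here as one stage of a pipeline.
result = pipeline(
    fmap(lambda x: x + 1),
    splitjoin(fmap(lambda x: x * 10), fmap(lambda x: -x)),
)([0, 1, 2, 3])
```

Since every construct returns a stream of the same shape, constructs can be nested to any depth, which is the property the slide calls hierarchical composition.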

7 Compiler Flow Summary
StreamIt Code → Kopi Front-End → Parse Tree → SIR Conversion → SIR (unexpanded) → Graph Expansion → SIR (expanded) → Partitioning → Load-balanced Stream Graph → Layout → Filters assigned to Raw tiles → Communication Scheduler → Switch Code; Code Generation → Processor Code

8 Partitioning
Goal: Granularity of the stream graph should match the target architecture.
For Raw, we want the number of filters in the stream graph to equal the number of tiles.
The final stream graph needs to be load balanced.
Partitioning is currently driven by a simple greedy algorithm.
Two primary transformations:
–Fission
–Fusion
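The slide does not say how the greedy algorithm chooses what to fuse. The sketch below is one hypothetical greedy policy for a simple pipeline, not the compiler's actual algorithm: it repeatedly fuses the adjacent pair of filters with the smallest combined work estimate until the filter count matches the tile count. The function name `greedy_fuse` and the work estimates are made up for illustration.

```python
def greedy_fuse(work, tiles):
    """work: per-filter work estimates for a pipeline, in order.
    Returns (groups, loads): which original filters were fused together
    and the estimated work of each resulting filter."""
    groups = [[i] for i in range(len(work))]
    loads = list(work)
    while len(loads) > tiles:
        # Pick the adjacent pair whose fusion yields the lightest filter.
        j = min(range(len(loads) - 1), key=lambda i: loads[i] + loads[i + 1])
        loads[j:j + 2] = [loads[j] + loads[j + 1]]
        groups[j:j + 2] = [groups[j] + groups[j + 1]]
    return groups, loads

# Six filters, four tiles (hypothetical numbers).
groups, loads = greedy_fuse([4, 1, 1, 4, 2, 2], tiles=4)
```

A fuller version would also apply fission when there are fewer filters than tiles or when one filter dominates the load.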

9 Partitioning - Fission
Fission - splitting streams
–Duplicate a filter, placing the duplicates in a splitjoin to expose parallelism.
[Figure: a Filter replaced by a Splitter, duplicated filters, and a Joiner]
–Split a filter into a pipeline for load balancing.
[Figure: a Filter replaced by a pipeline Filter0, Filter1, …, FilterN]
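A minimal sketch of the first kind of fission, assuming a stateless filter that pops one item and pushes one item per firing, and assuming round-robin splitting and joining. The helper name `fiss` is hypothetical.

```python
def fiss(work, ways):
    """Arrange a stateless one-in, one-out filter as a `ways`-wide splitjoin.
    The result computes the same stream as applying `work` item by item."""
    def run(items):
        # Splitter: item i goes to duplicate i % ways.
        lanes = [items[i::ways] for i in range(ways)]
        # Each duplicate processes its own lane independently,
        # so the lanes could run on separate tiles.
        results = [[work(x) for x in lane] for lane in lanes]
        # Joiner: output i comes from duplicate i % ways.
        return [results[i % ways][i // ways] for i in range(len(items))]
    return run

square = lambda x: x * x
data = list(range(10))
fissed = fiss(square, 4)(data)
```

The transformation is only this simple because the filter keeps no state between firings and does not peek beyond the item it pops.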

10 Partitioning - Fusion
Fusion - merging streams
–Reduce the number of filters in a construct for load balancing and synchronization removal.
[Figure: a splitjoin (Splitter, Filter0 … FilterN, Joiner) fused into a single Filter; a pipeline (Filter0, Filter1, …, FilterN) fused into a single Filter]
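When two pipelined filters are fused, the fused filter needs its own static rates. The sketch below works these out for the simple case where neither filter peeks beyond what it pops. The function name `fuse_rates` is hypothetical, and this is standard rate balancing rather than a description of the compiler's internals.

```python
from math import gcd

def fuse_rates(a_pop, a_push, b_pop, b_push):
    """Rates for fusing pipeline A -> B into one filter.
    One firing of the fused filter runs A and B just often enough
    that everything A pushes is consumed by B."""
    per_round = a_push * b_pop // gcd(a_push, b_pop)   # least common multiple
    a_fires = per_round // a_push
    b_fires = per_round // b_pop
    fused_pop = a_pop * a_fires
    fused_push = b_push * b_fires
    return a_fires, b_fires, fused_pop, fused_push

# A pops 1 and pushes 2; B pops 3 and pushes 1 (hypothetical rates).
rates = fuse_rates(1, 2, 3, 1)
```

Here the fused filter fires A three times and B twice per firing, so it pops 3 items and pushes 2.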

11 Partitioning Example (Sort)
[Figure: stream graph with 242 filters; partitioned stream graph with 16 filters]

12 Layout
Goal: To assign each filter to exactly one Raw tile.
The layout algorithm is implemented using simulated annealing.
The cost function (energy) tries to measure the added synchronization imposed by the layout.
Want to avoid:
–Crossed routes
–Routes passing through tiles assigned to filters
Because of the static properties of StreamIt, exact communication properties of the stream graph are known at compile time.
–Cost function is quite accurate
–Leads to excellent layouts
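The general shape of a simulated-annealing layout pass can be sketched as follows. The cost function here is a stand-in chosen for brevity (extra hops beyond direct neighbors, summed over all channels); it is not the synchronization-based cost the slide describes. The names `anneal` and `cost`, the linear cooling schedule, and all parameter values are assumptions.

```python
import math
import random

def cost(placement, edges):
    """Stand-in energy: hops beyond adjacency, summed over all channels."""
    total = 0
    for u, v in edges:
        (r1, c1), (r2, c2) = placement[u], placement[v]
        total += abs(r1 - r2) + abs(c1 - c2) - 1
    return total

def anneal(filters, edges, rows, cols, steps=2000, t0=2.0, seed=0):
    rng = random.Random(seed)
    tiles = [(r, c) for r in range(rows) for c in range(cols)]
    assert len(filters) <= len(tiles)
    # slots[i] is the filter on tiles[i]; None marks an unused tile.
    slots = list(filters) + [None] * (len(tiles) - len(filters))
    rng.shuffle(slots)

    def placement():
        return {f: tiles[i] for i, f in enumerate(slots) if f is not None}

    current = cost(placement(), edges)
    best, best_cost = placement(), current
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9          # linear cooling
        i, j = rng.sample(range(len(slots)), 2)
        slots[i], slots[j] = slots[j], slots[i]        # propose a swap
        new = cost(placement(), edges)
        # Always accept improvements; accept regressions with a
        # probability that shrinks as the temperature drops.
        if new <= current or rng.random() < math.exp((current - new) / temp):
            current = new
            if new < best_cost:
                best, best_cost = placement(), new
        else:
            slots[i], slots[j] = slots[j], slots[i]    # undo the swap
    return best, best_cost

names = ["f0", "f1", "f2", "f3"]
edges = [("f0", "f1"), ("f1", "f2"), ("f2", "f3")]
best, best_cost = anneal(names, edges, rows=2, cols=2)
```

Swapping the contents of two tiles keeps every candidate a valid layout, so each filter always sits on exactly one tile.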

13 Layout Example (FFT)
[Figure: partitioned stream graph; zero-cost layout]

14 Layout Example (Radio)
[Figure: partitioned stream graph; best layout]

15 Routing
At this time, data items are routed using a simple dimension-ordered router.
The router traces the path from source to destination by first routing the Y dimension and then the X dimension.
All items are sent over the first static network.
The second static network and the dynamic network are unused.
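Dimension-ordered routing as described above, Y first and then X, can be sketched in a few lines. The `(x, y)` coordinate convention and the function name `route` are assumptions made for the example.

```python
def route(src, dst):
    """Return the tiles visited from src to dst, both given as (x, y).
    The Y coordinate is corrected first, then the X coordinate."""
    x, y = src
    path = [(x, y)]
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    return path

path = route((0, 0), (2, 1))
```

Because the order of dimensions is fixed, the path between any two tiles is fully determined at compile time.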

16 Communication Scheduling
The communication scheduler maps StreamIt’s channel abstraction to Raw’s static network.
The communication scheduler simulates the execution of a given schedule, recording the communication as it simulates.
–Assume that each filter fires instantaneously.
–Record the routing instruction for the source, destination, and intermediate hops.
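The simulation idea can be sketched as follows: walk through the schedule, treat each firing as instantaneous, and for every pushed item append one routing instruction to each switch along the path. The data layout (dictionaries keyed by filter name, instructions as `(from, to)` pairs, `"proc"` for the local processor) is hypothetical and much simpler than real switch code.

```python
def hops(src, dst):
    """Y-then-X path between (x, y) tiles."""
    (x, y), path = src, [src]
    while (x, y) != dst:
        if y != dst[1]:
            y += 1 if dst[1] > y else -1
        else:
            x += 1 if dst[0] > x else -1
        path.append((x, y))
    return path

def schedule_communication(firings, push_rate, consumer, tile_of):
    """firings: filter names in schedule order.
    Returns {tile: [(from, to), ...]} in simulated order."""
    switch_code = {}
    for f in firings:
        if f not in consumer:
            continue                       # a sink pushes nothing
        path = hops(tile_of[f], tile_of[consumer[f]])
        for _ in range(push_rate[f]):      # one route per pushed item
            for k, tile in enumerate(path):
                came_from = "proc" if k == 0 else path[k - 1]
                goes_to = "proc" if k == len(path) - 1 else path[k + 1]
                switch_code.setdefault(tile, []).append((came_from, goes_to))
    return switch_code

tile_of = {"A": (0, 0), "B": (1, 1)}
code = schedule_communication(
    firings=["A", "B"], push_rate={"A": 2}, consumer={"A": "B"}, tile_of=tile_of
)
```

In this example the tile at (0, 1) hosts no filter of its own but still receives two routing instructions, one per item passing through it.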

17 Code Generation
For the compute-processor, we generate C code that is compiled using Raw's GCC port.
We introduce an internal buffer for each filter.
–The buffer is necessary because of the peek operation.
–All items are received into this buffer.
Loop the “work” function infinitely in steady state:
–Each filter buffers its input until it has peek items in its buffer, then it fires.
–pop() and peek(index) are reads from the buffer.
–A push(value) is a static network send.
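The firing rule on this slide can be modeled in Python, although the compiler itself emits C. In this sketch, `network_in` stands in for static network receives and the returned `sent` list stands in for static network sends; `run_filter` and its parameters are hypothetical names, and the loop is finite only so that the example terminates.

```python
from collections import deque

def run_filter(network_in, peek_rate, pop_rate, work):
    """Model of one filter's steady-state loop.
    work(window) receives the peek_rate visible items and returns
    the list of values to push."""
    assert 1 <= pop_rate <= peek_rate
    buffer = deque()     # internal buffer, needed because of peek
    sent = []            # values pushed, i.e. sent on the network
    for item in network_in:
        buffer.append(item)                   # every item lands in the buffer
        while len(buffer) >= peek_rate:       # enough input: the filter fires
            window = [buffer[i] for i in range(peek_rate)]   # peek(i) reads
            sent.extend(work(window))                        # push(value)
            for _ in range(pop_rate):
                buffer.popleft()                             # pop() reads
    return sent, list(buffer)

# Peek 3, pop 2, push 1 (hypothetical rates): sum of a sliding window.
sent, leftover = run_filter(range(1, 8), peek_rate=3, pop_rate=2,
                            work=lambda w: [sum(w)])
```

Items that were peeked but not popped stay in the buffer for the next firing, which is why the buffer cannot be replaced by direct network reads.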

18 Results
We have detailed performance measurements over our 9 benchmarks in our upcoming ASPLOS paper, but we will not give them here.
–This is our initial implementation and we are working on optimizations.
But the results show that we are not communication limited.
–We need to focus on optimizing the generated compute-processor code.
In the following slides we give a comparison of StreamIt and C code for our benchmarks.

19 Speedup Over Single Tile
–For Radio we obtained the C implementation from a third party.
–For FIR, Sort, FFT, Filterbank, and 3GPP we wrote the C implementation following a reference algorithm.

20 Intel® Xeon™ Comparison
[Chart: throughput per cycle, normalized to a Xeon @ 2.2GHz, for FIR, Radar, Radio, Sort, FFT, Filterbank, GSM, Vocoder, and 3GPP. Series: sequential C program on 1 tile; StreamIt program on 16 tiles. Axis ticks run from 0 to 16; a data label of 37 also appears.]
–For Radio, GSM, and Vocoder we obtained the C implementation from a third party.
–For FIR, Sort, FFT, Filterbank, Radar, and 3GPP we wrote the C implementation following a reference algorithm.
–For Radar, GSM, and Vocoder the C implementation did not fit on a single Raw tile.

21 Conclusion
First step toward a portable stream language for communication-exposed architectures.
Future work:
–Optimizing the implementation
–Supporting more features of StreamIt
Other cool StreamIt projects:
–New syntax
–DSP domain-specific linear dataflow analysis and transformation
–Constrained scheduling

22 For More Information
StreamIt Homepage: http://cag.lcs.mit.edu/streamit
William Thies, Michal Karczmarek, and Saman Amarasinghe. StreamIt: A Language for Streaming Applications. 2002 International Conference on Compiler Construction, Grenoble, France. To appear in the Springer-Verlag Lecture Notes in Computer Science.
Michael I. Gordon, William Thies, et al. A Stream Compiler for Communication-Exposed Architectures. Proceedings of the Tenth International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, CA, October 2002.
Michael I. Gordon. A Stream-Aware Compiler for Communication-Exposed Architectures. S.M. Thesis, Massachusetts Institute of Technology, August 2002.

