SCORE - Stream Computations Organized for Reconfigurable Execution Eylon Caspi, Michael Chu, Randy Huang, Joseph Yeh, Yury Markovskiy Andre DeHon, John.


1 SCORE - Stream Computations Organized for Reconfigurable Execution Eylon Caspi, Michael Chu, Randy Huang, Joseph Yeh, Yury Markovskiy Andre DeHon, John Wawrzynek U.C. Berkeley BRASS group

2 Outline Lecture 1 – Introduction – Related Work – SCORE Computational Model – Hardware Requirements – Language Instantiation Lecture 2 – Execution Example – SCORE Run-Time Environment – Example: JPEG – Results and Conclusion

3 Introduction Problem: Lack of a unifying computational model that allows application portability and longevity without sacrificing a substantial fraction of raw capability Solution: Stream-based compute model. Divide computation into fixed-size “pages.” Time-multiplex “pages” onto hardware.

4 Introduction SCORE – Ease development, deployment, and range of RC applications – Efficient implementation maximizing resources

5 Introduction Current Issues – Existing targets not portable Software for RC hardware is tied to a particular device – Existing targets expose fixed resource limitations Impaired expressiveness Choice of algorithms restricted by available hardware No dynamic resource allocation Addressing Issues – Virtualize computation, communication, and memory resources – Convenient and efficient model

6 Introduction SCORE - Programming model is a natural abstraction of communication between spatial hardware blocks. A data flow communication graph captures the blocks of computation (operators) and the communication (streams) between them. The graph is then captured and mapped to hardware efficiently

7 Related Work Villasenor et al., circa 1995 – Motion-wavelet video coder – Hand-partitioned the design into “pages” and manually reconfigured each device Ran on 1/3 as many machines Experienced only 10% overhead SCORE builds on: – Instruction Set Architecture, Data Flow, Distributed and streaming computation models – PRISC, DISC, GARP

8 SCORE Computational Model Compute Model – Abstract model capturing essential semantics of computation Programming Model – Programming constructs providing convenient way to express computations in the compute model Execution Model – Low-level description of the computation and the semantics which the hardware is expected to provide when interpreting this description

9 Compute Model Graph of computation operators and memory blocks linked together by streams Streams – Provide node-to-node communication – Single source, single sink FIFO Queues Operators – Finite State Machine (FSM) node Interact via stream links – Turing Complete (TM) node Support resource allocation and stream operations
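
The compute model on this slide can be pictured with a small Python sketch. It is only an illustration, not SCORE code: the `Stream` class, the `scale` operator, and the three-stream graph are all hypothetical names chosen here. It shows operators that interact only through single-source, single-sink FIFO streams.

```python
from collections import deque


class Stream:
    """Single-source, single-sink FIFO queue of tokens (illustrative only)."""

    def __init__(self):
        self._q = deque()

    def write(self, token):
        self._q.append(token)

    def read(self):
        return self._q.popleft()

    def has_data(self):
        return len(self._q) > 0


def scale(src, dst, factor):
    """Hypothetical operator: fires once if a token is present on src."""
    if src.has_data():
        dst.write(src.read() * factor)
        return True
    return False


# Graph: a -> scale(x2) -> b -> scale(x10) -> c
a, b, c = Stream(), Stream(), Stream()
for x in [1, 2, 3]:
    a.write(x)

progress = True
while progress:
    first = scale(a, b, 2)
    second = scale(b, c, 10)
    progress = first or second

result = []
while c.has_data():
    result.append(c.read())
print(result)  # [20, 40, 60]
```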

10 Compute Model Operations are fully deterministic – Determinism of individual operators – Timing independent communication – Operators cannot side-effect each other’s state 1. Communicate through streams which guarantee a timing independent order of execution 2. Memory segments have single unique owner (no multiple read-write hazards)
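
A rough way to see the timing independence claimed here: if operators only fire when their input tokens are present and never touch each other's state, the order in which a scheduler picks them should not change the output. The sketch below assumes a hypothetical two-operator pipeline (`double`, `inc`) and uses a seeded random choice as a stand-in for arbitrary timing.

```python
import random
from collections import deque


def run(schedule_seed):
    """Run a two-operator pipeline under a randomly chosen firing order."""
    rng = random.Random(schedule_seed)
    src = deque([3, 1, 4, 1, 5])  # input stream
    mid = deque()                 # stream between the two operators
    out = []                      # tokens leaving the graph

    def double():
        if src:
            mid.append(src.popleft() * 2)

    def inc():
        if mid:
            out.append(mid.popleft() + 1)

    ops = [double, inc]
    while src or mid:
        # Pick any operator; it does nothing unless its input is present.
        rng.choice(ops)()
    return out


results = [run(seed) for seed in (0, 1, 42)]
print(results[0])  # [7, 3, 9, 3, 11]
```

Every seed produces the same token sequence, because each stream has one writer and one reader and preserves FIFO order.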

11 Programming Model Framework independent of device limits Guidelines for efficient execution on any hardware implementation Key Abstractions for Programming model – Operators – Streams – Memory Segments

12 Programming Model Operators – Represent an algorithmic transformation of input data to produce output data – Building blocks for computation (Multiplier, FIR, FFT) – Size of an operator in hardware is implementation dependent and is not limited by the programming model – Partitioning is an integral part of automating the compilation process

13 Programming Model Streams – Communication uses streaming data flow – Producer connected to consumer via streams – Defines where data is logically routed – Acts as unbounded length queue for data tokens – Data Presence Signals Operators signal when producing data and consuming data

14 Programming Model Memory Segments – Contiguous block of memory – serves as the basic unit for memory management – used by giving a specific operating mode, then linking it into a data flow graph
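
As a loose illustration of "give the segment an operating mode, then link it into the graph", the sketch below models a segment as a Python list with a single position counter. The mode names `seq_read` and `seq_write` and the `step` method are assumptions made for this example; the slides do not name the actual modes.

```python
from collections import deque


class Segment:
    """Contiguous block of memory with one operating mode (illustrative)."""

    MODES = {"seq_read", "seq_write"}  # hypothetical mode names

    def __init__(self, size, mode):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.data = [0] * size
        self.mode = mode
        self.pos = 0

    def step(self, stream):
        """Move one token between the segment and its stream, if possible."""
        if self.pos >= len(self.data):
            return False
        if self.mode == "seq_read":
            stream.append(self.data[self.pos])
        else:
            if not stream:
                return False  # no token present yet
            self.data[self.pos] = stream.popleft()
        self.pos += 1
        return True


src = Segment(3, "seq_read")
src.data = [5, 6, 7]
dst = Segment(3, "seq_write")
link = deque()  # the stream connecting the two segments

moved = True
while moved:
    produced = src.step(link)
    consumed = dst.step(link)
    moved = produced or consumed

print(dst.data)  # [5, 6, 7]
```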

15 Programming Model Dynamic Features – Dynamic rate operators Consume / produce tokens at data-dependent rates Efficient operators for tasks: – Data Compression (JPEG), decompression, searching, and filtering Scheduling decisions should be made at Run Time – Dynamic graph composition and instantiation Computational graphs can be created, extended or modified during execution – Dynamic handling of uncommon events (Exception Handling)
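
A dynamic-rate operator can be sketched as a Python generator that consumes a data-dependent number of input tokens for each output token. Run-length encoding is used here only as a familiar stand-in for the compression tasks listed above; it is not taken from the SCORE material.

```python
def run_length(tokens):
    """Emit (value, count) pairs; input consumed per output varies with data."""
    it = iter(tokens)
    try:
        current = next(it)
    except StopIteration:
        return  # empty input stream, nothing to emit
    count = 1
    for t in it:
        if t == current:
            count += 1
        else:
            yield (current, count)
            current, count = t, 1
    yield (current, count)


encoded = list(run_length([0, 0, 0, 5, 5, 7]))
print(encoded)  # [(0, 3), (5, 2), (7, 1)]
```

Because the number of tokens consumed per output is only known at run time, a static schedule cannot be fixed in advance, which is the motivation for run-time scheduling noted on the slide.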

16 Execution Model 3 Key Components – Compute Page (CP) fixed size block of RC logic which is the basic unit of virtualization and scheduling – Memory Segment contiguous block of memory which is the basic unit for data page management – Stream Link logical connection between the output of one page and the input of another page

19 Hardware Virtualization Compute pages, segments, and streams fundamental units for – allocation – virtualization – management of hardware resources

22 Example of Stream Buffer Execution

26 Model Implications Advice for Programmers – Describe computations as spatial pipelines with multiple, independent computational paths – Avoid or minimize feedback cycles – Expose large data streams to SCORE operators

27 Hardware Requirements Sequential Processor and RC device RC Device divided into a number of equivalent and independent compute pages Multiple distributed memory blocks required to store intermediate data High bandwidth, Low Latency communication, among compute pages and memory, allowing memory pages to be used concurrently

29 Language Instantiation One could define – subsets of conventional HDLs – subsets of conventional programming languages (C++, Java) Instead they define – RTL language to describe SCORE operators TDF: Intermediate language

30 Language Requirements SCORE Operators are synchronous, single clock entities with their own state – Communicate only through designed I/O streams – Operation is gated by data presence on the I/O streams – Each operation is viewed as a FSM with associated Data Path SCORE does not have a global shared memory abstraction among operators – Remember memory segments (no two operators can share memory at same time)

31 TDF RTL description with special syntax for handling the operator's input and output data streams – Data path operators similar to C To allow dynamic operators, the basic form is an FSM – Each state specifies the inputs which must be present before it can “fire” – When the inputs arrive, the operator consumes them and the FSM may choose to change states
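
The firing rule described here can be mimicked in Python. This is not TDF syntax; the `FsmOperator` class and the three-state `select` example are hypothetical. Each state lists the inputs that must hold a token before it can fire, and firing consumes those tokens and returns the next state.

```python
from collections import deque


class FsmOperator:
    """FSM whose states each name the inputs required before firing."""

    def __init__(self, states, start):
        self.states = states  # state name -> (required input names, action)
        self.state = start

    def try_fire(self, inputs, output):
        required, action = self.states[self.state]
        if not all(inputs[name] for name in required):
            return False  # stall until every required token is present
        tokens = {name: inputs[name].popleft() for name in required}
        self.state = action(tokens, output)
        return True


def choose(tokens, output):
    # The select token decides which data input the next state waits on.
    return "take_a" if tokens["s"] else "take_b"


def take_a(tokens, output):
    output.append(tokens["a"])
    return "sel"


def take_b(tokens, output):
    output.append(tokens["b"])
    return "sel"


select = FsmOperator(
    {"sel": (("s",), choose), "take_a": (("a",), take_a), "take_b": (("b",), take_b)},
    start="sel",
)
inputs = {"s": deque([1, 0, 0, 1]), "a": deque([10, 11]), "b": deque([20, 21])}
out = []
while select.try_fire(inputs, out):
    pass
print(out)  # [10, 20, 21, 11]
```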

36 END PART 1 Tune in next week for exciting examples

37 Execution Example Reference Figure 16 – Shows example of C++ program which uses the merge and uniq operators * SCORE operator instantiation and composition can be performed from C++ code
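
Figure 16 itself is not reproduced in this transcript, so the following Python sketch only suggests what composing merge and uniq operators might look like. It assumes `merge` combines two sorted streams into one sorted stream and `uniq` drops consecutive duplicates; the graph topology and input values are made up for illustration, and tokens are assumed never to be `None`.

```python
def merge(a, b):
    """Merge two sorted token streams into one sorted stream."""
    a, b = iter(a), iter(b)
    x, y = next(a, None), next(b, None)
    while x is not None and y is not None:
        if x <= y:
            yield x
            x = next(a, None)
        else:
            yield y
            y = next(b, None)
    while x is not None:  # drain whatever remains on the first input
        yield x
        x = next(a, None)
    while y is not None:  # drain whatever remains on the second input
        yield y
        y = next(b, None)


def uniq(stream):
    """Pass a token through only if it differs from the previous one."""
    last = object()  # sentinel that equals no real token
    for t in stream:
        if t != last:
            yield t
            last = t


# Hypothetical composition: two merges feeding one uniq.
i0, i1, i2 = [1, 4, 7], [2, 4, 8], [1, 3, 8]
merged_unique = list(uniq(merge(merge(i0, i1), i2)))
print(merged_unique)  # [1, 2, 3, 4, 7, 8]
```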

38 Example - Assumptions Design consists of 3 behavioral operators – Full implementation of each operator requires only one compute page The RC array contains one compute page and three configurable memory blocks – Each CMB partitioned into 4 segments (s0 - s3) s0 and s1 buffer computation data s2 and s3 store state / configuration for a compute page

39 Example - Assumptions CMB state maintained by controller – Details are not shown in this example Each compute page has 2 input and 2 output FIFO buffers Scheduling and array reconfiguration are performed at the beginning of each timeslice
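
Under these assumptions, three operators share one physical compute page, so only one is resident per timeslice and intermediate tokens wait in memory buffers. The Python sketch below is a simplified round-robin model of that idea, not the actual SCORE scheduler: the stage functions, buffer names, and `slice_len` parameter are hypothetical, and reconfiguration cost is ignored.

```python
from collections import deque


def make_stage(src, dst, fn):
    """Build an operator that moves one token from buffer src to buffer dst."""
    def fire(buffers):
        if not buffers[src]:
            return False
        buffers[dst].append(fn(buffers[src].popleft()))
        return True
    return fire


def run_timesliced(operators, buffers, slice_len):
    """Load one operator per timeslice onto the single page, round-robin."""
    timeline = []
    while True:
        progressed = False
        for name, fire in operators:
            fired = 0
            # The operator stays resident for at most slice_len firings.
            while fired < slice_len and fire(buffers):
                fired += 1
            if fired:
                progressed = True
                timeline.append((name, fired))
        if not progressed:
            return timeline


buffers = {"in": deque([1, 2, 3, 4, 5]), "t1": deque(), "t2": deque(), "out": deque()}
operators = [
    ("A", make_stage("in", "t1", lambda x: x + 1)),
    ("B", make_stage("t1", "t2", lambda x: x * 2)),
    ("C", make_stage("t2", "out", lambda x: x - 1)),
]
timeline = run_timesliced(operators, buffers, slice_len=2)
print(list(buffers["out"]))  # [3, 5, 7, 9, 11]
print(timeline[:3])          # [('A', 2), ('B', 2), ('C', 2)]
```

The output matches what a fully spatial pipeline would compute; only the order in which work happens changes, which is the point of virtualizing compute pages.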

40 Execution Example Physical view of array at each point in timeline Single Letter identifiers assigned – A: merge (inputs i0, i1) – B: merge (inputs t1, t2) – C: uniq – Segments: S0, S1

41 Timeline for Execution Example

42 Step-by-Step Execution Example

49 SCORE Run-Time Environment Building Applications Run-Time Environment

50 Example: JPEG

51 Conclusion

52 Figure 18

53 Figure 19

54 Figure 20

55 Table 2

56 Figure 21

57 Figure 4

