
1 Predictable Programming on a Precision Timed Architecture
Hiren D. Patel, UC Berkeley, hiren@eecs.berkeley.edu
Joint work with: Ben Lickly, Isaac Liu, Edward A. Lee (UC Berkeley); Sungjun Kim, Stephen A. Edwards (Columbia University)

2 Edwards and Lee - Case for PRET
2007
– Edwards and Lee made a case for precision timed computers (PRET machines)
– Predictability
– Repeatability
S. A. Edwards and E. A. Lee, The case for the precision timed (PRET) machine. In Proceedings of the 44th Annual Conference on Design Automation (San Diego, California, June 04 - 08, 2007). DAC '07. ACM, New York, NY, 264-265.
Patel, UC Berkeley, PRET

3 Edwards and Lee - Case for PRET
Unpredictability
– Difficulty in determining timing behavior through analysis
Non-repeatability
– Lack of guarantee that every execution yields the same timing behavior
Brittleness
– Small changes have big effects on timing behavior

4 Brittleness
Expensive affair
Tight coupling of software and hardware
Reliance on testing for validation
Upgrading difficult
Solution: stockpile
Source: www.skycontrol.net

5 But wait …
Real-time scheduling
– Worst-case execution time
    Detailed model of hardware
    Large engineering effort
    Valid for particular hardware models
– Interrupts, inter-process communication, locks …
Bench testing
– Brittle
Sebastian Altmeyer, Christian Hümbert, Björn Lisper, and Reinhard Wilhelm. Parametric Timing Analysis for Complex Architectures. In Proceedings of the 14th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA'08), pages 367-376, Kaohsiung, Taiwan, August 2008. IEEE Computer Society.

6 Precise Timing and High Performance
Traditional                     Alternative
Caches                          Scratchpads
Deep out-of-order pipelines     Thread-interleaved pipelines
Function-only ISAs              ISAs with timing instructions
Function-only languages         Languages and programming models with timing
Best-effort communication       Fixed-latency communication
Time-sharing                    Multiple independent processors

7 Outline
Introduction
Related Work
PRET Machine
Programming Example
Future Work
Conclusion

8 Related Work
Java Optimized Processor
– Schoeberl et al. [2003]
Timing instructions
– Ip and Edwards [2006]
Reactive processors
– Von Hanxleden et al. [2005]
– Salcic et al. [2005]
Virtual Simple Architecture
– Mueller et al. [2003]

9 Semantics of Timing Instructions
Deadline instructions
– Denote the required execution time of a block
When decoded
– Stall instruction if timer value is not 0
– Otherwise set timer value to new value
    deadi $t0, 10
    …
    deadi $t0, 8
    …
    deadi $t0, 0
    …
L0: …
    deadi $t0, 10
    b L0
    …
[Figure labels: Straight Line Block 0; Straight Line Block 1; Loop Block]

10 Tracing A Program Fragment
A: deadi $t0, 6
B: sethi %hi(0x3f800000), %g1
C: or %g1, 0x200, %g1
D: st %g1, [ %fp + -12 ]
E: deadi $t0, 8
F: …
[Figure: value of $t0 per cycle, starting at cycle 0: 6 5 4 3 2 1 0 8]
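
The trace above can be reproduced with a small model of the deadline-instruction semantics. This is only a sketch under stated assumptions: every ordinary instruction takes one thread cycle, the timer counts down by one per thread cycle, and a `deadi` that finds a non-zero timer is replayed on the next cycle. The function `simulate` and the tuple encoding of the program are hypothetical names introduced for illustration, not part of the PRET tools.

```python
def simulate(program):
    """Model one thread executing a list of (label, deadline) pairs.

    deadline is None for an ordinary instruction, or an integer for a
    `deadi` instruction. Assumption: one thread cycle per instruction.
    Returns ({label: issue cycle}, number of stall cycles).
    """
    timer = 0      # deadline timer ($t0 on the slide)
    cycle = 0
    pc = 0
    issued = {}
    stalls = 0
    while pc < len(program):
        label, deadline = program[pc]
        if deadline is not None and timer != 0:
            # deadi decoded while the timer is still running: stall and
            # replay the same instruction on the next cycle
            stalls += 1
        else:
            if deadline is not None:
                timer = deadline   # timer reached 0: load the new deadline
            issued[label] = cycle
            pc += 1
        if timer > 0:
            timer -= 1             # timer counts down once per cycle
        cycle += 1
    return issued, stalls


# Fragment from the slide: A and E are deadi, B to D are ordinary instructions.
fragment = [("A", 6), ("B", None), ("C", None), ("D", None), ("E", 8), ("F", None)]
issued, stalls = simulate(fragment)
print(issued, stalls)
```

In this model E stalls for two cycles and issues six cycles after A, so the block between the two deadline instructions takes the six cycles requested by `deadi $t0, 6`. If the block were longer than its deadline, the timer would already be 0 when E is decoded and no stall would occur, so in this sketch a deadline acts as a lower bound on block execution time.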

11 Precision Timed Architecture
Thread-interleaved pipeline
Scratchpad memories
Time-triggered main memory access
Round-robin thread scheduling

12 Memory Hierarchy
Clocks
– Main clock
– Derived clocks
Instruction and data scratchpad memories
– 1 cycle access latency
Main memory
– 16MB size
– Latency of 50ns
– Frequency: 250MHz, ~13 cycles latency
[Figure: block diagram with Core, SPM, DMA, and Main Mem.]
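
The "~13 cycles" figure follows from the two numbers on the slide. A minimal check, assuming the 250MHz frequency is the clock in which the main memory latency is counted (the constant names are illustrative):

```python
import math

CLOCK_MHZ = 250          # from the slide
MEMORY_LATENCY_NS = 50   # from the slide

# One cycle at 250 MHz lasts 1000 / 250 = 4 ns.
clock_period_ns = 1000 / CLOCK_MHZ

# 50 ns / 4 ns = 12.5 cycles; a partial cycle still occupies a whole cycle,
# so the latency is rounded up.
latency_cycles = math.ceil(MEMORY_LATENCY_NS / clock_period_ns)
print(clock_period_ns, latency_cycles)
```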

13 Thread-interleaved Pipeline
Thread stalls
– Main memory access
– Multi-cycle operations
– Deadline instructions
Replay mechanism
– Execute same PC next iteration
– Multi-cycle ALU ops replay instructions
[Figure: pipeline stages Fetch, Decode, Reg. Access, Execute, Memory, WriteBack with pipeline registers F/D, D/R, R/E, E/M, M/W. Annotations: Decrement Deadline Timers; Stall if Deadline Instruction; Increment PC; Check main memory access]
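
A rough sketch of round-robin thread interleaving and the replay mechanism follows. The six stage names are taken from the slide; the number of hardware threads (six, one per stage) is an assumption, and `stage_occupancy` and `run_thread` are hypothetical helper names used only for illustration.

```python
STAGES = ["Fetch", "Decode", "Reg. Access", "Execute", "Memory", "WriteBack"]
NUM_THREADS = 6  # assumption: one hardware thread per pipeline stage


def stage_occupancy(cycle, num_threads=NUM_THREADS):
    """Return {stage: thread id} for one clock cycle (None while filling).

    Round-robin fetch: the instruction fetched in cycle c belongs to
    thread c % num_threads and moves one stage deeper every cycle.
    """
    occupancy = {}
    for depth, stage in enumerate(STAGES):
        fetch_cycle = cycle - depth
        occupancy[stage] = fetch_cycle % num_threads if fetch_cycle >= 0 else None
    return occupancy


def run_thread(stall_pattern):
    """Replay sketch for one thread.

    stall_pattern[i] is True if the thread stalls on its i-th turn.
    Returns the PC fetched on each turn: a stalled instruction is
    fetched again (same PC) on the thread's next turn.
    """
    pc = 0
    fetched = []
    for stalled in stall_pattern:
        fetched.append(pc)
        if not stalled:
            pc += 1
    return fetched


print(stage_occupancy(10))
print(run_thread([False, True, True, False]))
```

With as many threads as stages, every stage holds a different thread in any given cycle, so in this model a thread never has two instructions in flight at once. A stall then only delays the stalled thread, which repeats the same PC on its next turn.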

14 Time-Triggered Access through Memory Wheel
Decouple thread’s access pattern
Time-triggered access
Best-case access time
– If accessed 1st cycle
Worst-case access time
– If accessed 2nd cycle of window
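
The best and worst cases can be illustrated with a small model of the wheel. The sketch assumes six threads, a window of 13 cycles per thread (matching the ~13-cycle main memory latency), and that an access can only begin on the first cycle of the owning thread's window. The thread count, the window layout, and the function name `access_latency` are assumptions for illustration.

```python
def access_latency(request_cycle, thread, num_threads=6, window=13):
    """Cycles from a memory request until its data is available.

    Assumed wheel layout: thread i owns cycles [i*window, (i+1)*window)
    of every rotation, and an access must start on the first cycle of
    the thread's own window.
    """
    rotation = num_threads * window           # one full turn of the wheel
    window_start = thread * window
    # cycles to wait until the thread's window next begins
    wait = (window_start - request_cycle) % rotation
    return wait + window


best = access_latency(request_cycle=0, thread=0)    # arrives on 1st cycle of window
worst = access_latency(request_cycle=1, thread=0)   # arrives on 2nd cycle, just missed
print(best, worst)
```

Under these assumptions the latency depends only on when the thread issues its request relative to its own window, not on what the other threads are doing, which is the decoupling the slide refers to.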

15 Tool Flow
GCC 3.4.4, SystemC 2.2, Python 2.4
[Figure labels: Boot code; Motorola SREC files; C programs; timing instructions]
GCC to compile boot code and program code

16 Simple Mutual Exclusion Example
Producer followed by Consumer and Observer
– Consumer and Observer execute together
Loop rate of two rotations of memory wheel
– 1st for Producer to write
– 2nd for Consumer and Observer to read
[Figure labels: Write to shared data; Read from shared data; Write to output]

17 Video Game Example
[Figure: block diagram]
Threads: Main-Control Thread, Graphic Thread, VGA-Driver Thread
Queues: Odd Queue, Even Queue
Buffers: Even Buffer, Odd Buffer
Data: Command, Pixel Data
Transitions:
– Swap (when sync requested and when odd queue empty)
– Sync (after queue swapped)
– Update Screen (sync request)
– Sync (after buffer swapped)
– Refresh (sync request)
– Swap (when sync requested and when vertical blank)

18 Timing Requirements
Signal            Timing Requirement   Pixel Cycles
V. Sync           64µs                 1611
V. Back-porch     1.02ms               25679
Draw 480 lines    15.25ms
V. Front-porch    350µs                8811
H. Sync           3.77µs               96
H. Back-porch     1.89µs               48
Draw 640 pixels   25.42µs
H. Front-porch    0.64µs               16

19 Timing Implementation
Pixel-clock using derived clock
– 25.175MHz
– ~39.72ns cycle period
Drawing 16 pixels
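
The pixel-clock period and the pixel-cycle counts can be checked with a short calculation. This is an illustrative sketch; the helper names `pixel_period_ns` and `pixel_cycles` are hypothetical, and rounding to the nearest whole cycle is an assumption about how the counts were obtained.

```python
PIXEL_CLOCK_HZ = 25.175e6  # derived pixel clock from the slide


def pixel_period_ns():
    """Duration of one pixel-clock cycle in nanoseconds."""
    return 1e9 / PIXEL_CLOCK_HZ


def pixel_cycles(duration_s):
    """Number of pixel-clock cycles in a duration, rounded to the nearest cycle."""
    return round(duration_s * PIXEL_CLOCK_HZ)


print(round(pixel_period_ns(), 2))       # period of one pixel cycle, in ns
print(pixel_cycles(64e-6))               # V. Sync requirement of 64 µs
print(pixel_cycles(350e-6))              # V. Front-porch requirement of 350 µs
print(round(16 * pixel_period_ns(), 1))  # time to draw 16 pixels, in ns
```

The V. Sync and V. Front-porch durations reproduce the pixel-cycle counts listed in the timing requirements (1611 and 8811), and 16 pixels take roughly 0.64µs, the H. Front-porch duration. Some horizontal timings, such as H. Sync, appear to be rounded durations, so the same calculation can land a cycle or so away from the listed count.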

20 Future Work
Architecture
– DMA
– DDR2 main memory model
– Thread synchronization primitives
– Shared data between threads
Real-time Benchmarks
– With timing requirements
Programming models
– Memory allocation schemes
– Synchronizations

21 Conclusion
What we want …
– Time as a first class citizen of embedded computing
– Predictability
– Repeatability
Where we are at …
– PRET cycle-accurate simulator
– Release …

23 Extras

24 More on Brittleness
Small changes may have big effects on timing behavior
Theorem (Richard’s anomalies): If a task set with fixed priorities, execution times, and precedence constraints is optimally scheduled on a fixed number of processors, then increasing the number of processors, reducing execution times, or weakening precedence constraints can increase the schedule length.
Richard L. Graham, “Bounds on the performance of scheduling algorithms”, in E. G. Coffman, Jr. (ed.), Computer and Job-Shop Scheduling Theory, John Wiley, New York, 1975.

25 Richard’s Anomalies
9 tasks, 3 processors, priority list, precedence order, execution times.
Tasks (name/execution time): T1/3, T2/2, T3/2, T4/2, T5/4, T6/4, T7/4, T8/4, T9/9
[Figure: precedence graph and schedule on 3 processors; time marks 0, 3, 12]

26 Richard’s Anomalies: Reducing Execution Times
eTime’ = eTime - 1
Tasks (name/execution time): T1/2, T2/1, T3/1, T4/1, T5/3, T6/3, T7/3, T8/3, T9/8
[Figure: precedence graph and schedule; time marks 0, 3, 12]

27 Richard’s Anomalies: More Processors
4 processors
Tasks (name/execution time): T1/3, T2/2, T3/2, T4/2, T5/4, T6/4, T7/4, T8/4, T9/9
[Figure: precedence graph and schedule on 4 processors; time marks 0, 3, 12, 15]

28 Richard’s Anomalies: Changing Priority List
L = (T1,T2,T4,T5,T6,T3,T9,T7,T8)
Tasks (name/execution time): T1/3, T2/2, T3/2, T4/2, T5/4, T6/4, T7/4, T8/4, T9/9
[Figure: precedence graph and schedule; time marks 0, 3, 12]
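
The anomalies on the preceding slides can be reproduced with a small list scheduler. This is a sketch, not the speaker's material: the execution times and the changed priority list come from the slides, but the precedence constraints were only in the figures, so the ones used here (T1 before T9; T4 before T5, T6, T7, and T8) are an assumption taken from the commonly cited form of Graham's example. The function `list_schedule` is a hypothetical helper.

```python
import heapq


def list_schedule(times, preds, priority, num_procs):
    """Greedy list scheduling; returns the schedule length.

    Whenever a processor is idle, it starts the first task in the
    priority list that has not started and whose predecessors are done.
    """
    running = []              # min-heap of (finish time, task)
    started, done = set(), set()
    now, free = 0, num_procs
    while len(done) < len(times):
        for task in priority:
            if free == 0:
                break
            if task not in started and all(p in done for p in preds.get(task, ())):
                started.add(task)
                heapq.heappush(running, (now + times[task], task))
                free -= 1
        # advance time to the next completion (and any simultaneous ones)
        now, task = heapq.heappop(running)
        done.add(task)
        free += 1
        while running and running[0][0] == now:
            done.add(heapq.heappop(running)[1])
            free += 1
    return now


times = {1: 3, 2: 2, 3: 2, 4: 2, 5: 4, 6: 4, 7: 4, 8: 4, 9: 9}   # from the slides
preds = {9: [1], 5: [4], 6: [4], 7: [4], 8: [4]}                 # ASSUMED constraints
order = [1, 2, 3, 4, 5, 6, 7, 8, 9]

baseline = list_schedule(times, preds, order, 3)
reduced = list_schedule({t: e - 1 for t, e in times.items()}, preds, order, 3)
more_procs = list_schedule(times, preds, order, 4)
new_list = list_schedule(times, preds, [1, 2, 4, 5, 6, 3, 9, 7, 8], 3)
print(baseline, reduced, more_procs, new_list)
```

Under the assumed constraints, the baseline schedule has length 12, while shortening every task by one unit, adding a fourth processor, or reordering the priority list each produce a longer schedule (13, 15, and 14 respectively). The 15 for four processors agrees with the value shown on the "More Processors" slide.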

29 Brittleness Again…
In general, all task scheduling strategies are brittle

