FAST: Frequency-Aware Static Timing Analysis


FAST: Frequency-Aware Static Timing Analysis. By Kiran Seth, Aravindh Anantaraman, Frank Mueller and Eric Rotenberg, Center for Embedded Systems Research, Departments of CS & ECE, North Carolina State University.

Real-Time Systems: Tasks have a deadline and must terminate on time. Classification: hard real-time (a missed deadline is a catastrophe) and soft real-time (a missed deadline lowers QoS). Multi-tasking real-time systems require scheduling algorithms: the scheduler arbitrates between tasks online, and a static schedulability test ensures that deadlines are met, which requires a known Worst-Case Execution Time (WCET).
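As an illustration of the static schedulability test mentioned on this slide, here is a minimal sketch of the classic EDF utilization test. It assumes independent periodic tasks whose deadlines equal their periods; the function name and the example task sets are hypothetical and not taken from the presentation.

```python
def edf_schedulable(tasks):
    """Return True if the task set passes the EDF utilization test.

    tasks: iterable of (wcet, period) pairs in the same time unit.
    Assumes independent periodic tasks with deadline == period.
    """
    utilization = sum(wcet / period for wcet, period in tasks)
    return utilization <= 1.0


# Hypothetical task sets (times in ms).
light = [(1, 4), (2, 8)]   # utilization 0.5
heavy = [(3, 4), (3, 8)]   # utilization 1.125
```

The test is only as safe as the WCET values fed into it, which is why the following slides focus on obtaining safe WCET bounds.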

Static Timing Analysis: To schedule tasks in real-time systems, we need the Worst-Case Execution Time (WCET) and Worst-Case Execution Cycles (WCEC). An experimentally measured WCET gives unsafe bounds because of input and hardware complexity. Use a static timing analysis toolset to obtain safe WCET bounds.

Static Instruction Cache Analysis: Work explained in [Mueller RTS-J’00]. Interprocedural data-flow analysis predicts each cache reference as one of always-hit, always-miss, first-hit, or first-miss. Each instruction is categorized for each loop level and function (a function is treated as a loop with 1 iteration).
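To show how the four categories feed a timing bound, here is a small sketch that turns a category into a worst-case miss count for one instruction inside a loop. It assumes the usual reading of the categories (first-miss: only the first reference misses; first-hit: only the first reference hits); the function name is hypothetical and this is not the analyzer's actual code.

```python
def worst_case_misses(category, iterations):
    """Worst-case I-cache misses of one instruction over a loop.

    category: 'always-hit', 'always-miss', 'first-hit' or 'first-miss'.
    iterations: number of loop iterations (1 for a plain function).
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    misses = {
        "always-hit": 0,
        "always-miss": iterations,
        "first-miss": 1,               # first reference misses, the rest hit
        "first-hit": iterations - 1,   # first reference hits, the rest miss
    }
    return misses[category]
```

For a loop with 10 iterations, a first-miss instruction contributes 1 miss and a first-hit instruction contributes 9.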

Static Data Cache Simulation: Accurate static timing analysis needs data cache analysis. Currently, the data cache analysis tool is not accurate enough: too many restrictions, not general enough for real code. Improvements by [Vera RTSS’03]. Possible solutions: treat all data accesses as hits (highly underestimated); treat all data accesses as misses (highly overestimated); or assume a cache big enough to fit the whole data set and treat only first-time accesses as misses (cold misses only), otherwise hits. Accurate? Yes. But what if caches are smaller? No significant impact on this study.

Static Timing Analyzer: Path- and tree-based approach [Healy IEEE TC’99]. Find nodes in the CFG and derive the WCEC for each node; a node is a function or loop. WCET is calculated bottom-up. Standard timing analysis assumptions apply: no recursion, all loop bounds must be known, no function pointers.
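The bottom-up calculation can be pictured with a heavily simplified sketch. It assumes each node (function or loop) carries a fixed worst-case cycle count for its own code and a known loop bound, and it ignores the path, pipeline and cache analysis that the real tool performs. The `Node` class and the example numbers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cycles: int                  # worst-case cycles of the node's own code, per iteration
    bound: int = 1               # loop bound; 1 for a function
    children: list = field(default_factory=list)


def wcec(node):
    """Worst-case cycles of a node, computed bottom-up over its children."""
    per_iteration = node.cycles + sum(wcec(child) for child in node.children)
    return node.bound * per_iteration


# Hypothetical program: main() contains an outer loop with a nested inner loop.
inner = Node("inner_loop", cycles=20, bound=10)
outer = Node("outer_loop", cycles=5, bound=4, children=[inner])
main = Node("main", cycles=30, children=[outer])
```

The recursion terminates only because of the stated assumptions: without recursion and with known loop bounds, the tree is finite and every multiplier is known.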

Motivation of FAST: Dynamic Voltage Scaling (DVS) scheduling schemes change the system's frequency/voltage to save power without missing deadlines. Several DVS scheduling schemes are available. They are a good fit for real-time systems: most real-time systems have low utilization and are low-power embedded systems. Potential for considerable energy savings with DVS.

Problem: Current DVS schemes ignore the effects of frequency scaling on WCEC. They assume that WCEC is constant with frequency and therefore overestimate WCET at lower frequencies. To demonstrate the problem: obtain the WCET of C-Lab benchmarks with the static timing analysis tool for frequencies from 100 MHz to 1 GHz, then assess observed WCEC and WCET vs. the assumption made by DVS schemes.

Actual vs. Assumed WCEC for FFT: WCEC changes with frequency modulation; WCEC increases with higher frequency. Constant memory latency: 100 ns.

Actual vs. Assumed WCET for FFT: Difference in chosen frequency for DVS with WCET = 5 ms: assumed ~550 MHz, actual ~150 MHz.

Parametric Frequency Model, Problem: DVS considers processor frequency scaling but ignores the effect of frequency scaling on memory accesses. With frequency scaling, the cycle count for processor operations remains constant, except for memory operations, and that is the problem. DVS schemes overestimate the WCET at lower frequencies, cannot fully utilize the available slack, and largely waste the power savings potential.

Parametric Frequency Model, Solution: Calculate WCEC accounting for the effects of memory accesses using the new parametric frequency model. Model: WCEC(f) = i + mN = i + mLf, where i is the invariant number of worst-case cycles (for non-memory operations), m is the number of worst-case memory accesses, and N is the number of cycles per memory access, which depends on memory latency L and frequency f: N = Lf.
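A minimal sketch of the model WCEC(f) = i + mLf follows. The values of i and m are hypothetical and chosen only to make the arithmetic easy to follow; the 100 ns latency matches the constant memory latency quoted for the FFT plots. Latency is given in ns and frequency in MHz, so cycles divided by MHz yields microseconds.

```python
def wcec(i, m, latency_ns, freq_mhz):
    """WCEC(f) = i + m * N with N = L * f cycles per memory access."""
    cycles_per_access = latency_ns * freq_mhz / 1000.0  # ns * MHz / 1000 = cycles
    return i + m * cycles_per_access


def wcet_us(i, m, latency_ns, freq_mhz):
    """WCET in microseconds at the given frequency."""
    return wcec(i, m, latency_ns, freq_mhz) / freq_mhz


# Hypothetical task: 400,000 invariant cycles and 1,000 worst-case memory accesses.
I_CYCLES, M_ACCESSES, LATENCY_NS = 400_000, 1_000, 100

actual_at_100mhz = wcet_us(I_CYCLES, M_ACCESSES, LATENCY_NS, 100)
# What a DVS scheme assumes if it reuses the 1 GHz cycle count at 100 MHz.
assumed_at_100mhz = wcec(I_CYCLES, M_ACCESSES, LATENCY_NS, 1000) / 100
```

With these numbers the frequency-aware WCET at 100 MHz is 4100 µs, whereas the constant-WCEC assumption gives 5000 µs. The gap grows with the number of memory accesses m.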

Using the Parametric Frequency Model: Instruction sequence: A: add R2, R1, R3; B: load R4, [M1]; C: add R2, R1, R4; D: add R2, R1, R5. The sequence is simulated through a simple pipeline to explain the parametric frequency model. Simple pipeline: 6 stages, data and instruction caches, N = 10.

Example 0: Cache Hits. Recall: B is a load instruction. WCEC = 9 + 0N. Each row represents a pipeline stage. Time (and cycle count) increases horizontally.

Example 1: Effect of I-cache Miss. WCEC = 9 + 1N. The stall due to the I-cache miss is shown. The model accurately captures memory latency, however long.

Example 2: Effect of D-cache Miss. Recall: B is a load instruction. WCEC = 9 + 1N. The stall due to the D-cache miss is shown. Again, the model captures memory latency, however long. Notice: during stall cycles, no useful work is done.

Example 3: Effect of I- & D-cache Miss. WCEC = 9 + 2N. I-cache miss first, then D-cache miss. Overlap between useful cycles and stall cycles; this also occurs during high-latency execution operations (e.g. floating-point, multiply, ...) that overlap with a D-cache miss. Leads to overestimation; rare in practice, and the WCET is still safe.
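The four pipeline examples all instantiate the same expression, 9 + mN, with a different miss count m. The sketch below simply recomputes them for the slide's N = 10; the helper name is hypothetical, and the larger value of N in the usage note is an assumption added only to show how the stall term scales.

```python
BASE_CYCLES = 9  # invariant cycles of the 4-instruction sequence in the 6-stage pipeline


def example_wcec(misses, cycles_per_access=10):
    """WCEC = 9 + m * N for the example instruction sequence."""
    return BASE_CYCLES + misses * cycles_per_access


examples = {
    "Example 0: cache hits": example_wcec(0),
    "Example 1: I-cache miss": example_wcec(1),
    "Example 2: D-cache miss": example_wcec(1),
    "Example 3: I- and D-cache miss": example_wcec(2),
}
```

Raising N to a hypothetical 100 cycles per access, as would happen at a ten times higher clock with the same memory latency, changes only the stall term: `example_wcec(2, 100)` evaluates to 209 while the invariant part stays at 9.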

Experimental Validation: Combine the frequency model with our static timing analyzer to obtain the FAST tool, which expresses WCEC as FAST equations. Experiment to validate results from the FAST tool: run benchmarks through the FAST tool to obtain an equation representing WCEC for each benchmark; run the same benchmarks through the traditional timing analysis tool; vary frequencies from 100 MHz to 1 GHz.

Frequency-Aware Static Timing Analysis (FAST): The FAST tool is “as accurate” as traditional static timing analysis. Slight overestimation in the case of floating-point benchmarks.

FAST in EDF Scheduling with DVS: DVS with EDF: $\sum_k C_k/P_k \le \alpha$, where $\alpha = f_c/f_m$. FAST with EDF: $\sum_k (i_k + m_k L \alpha f_m)/(P_k f_m) \le \alpha$. Schedulability test: $\sum_k (i_k/P_k) \,/\, \bigl(f_m (1 - L \sum_k m_k/P_k)\bigr) \le \alpha$. Implemented the frequency model for 3 EDF-DVS algorithms by [Pillai & Shin]. Look-ahead improved: at completion, consider the next deadline. Up to 34% additional energy savings (5-11% on average) at low utilization, but 0.5-8% less savings at high utilization.
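The difference between the two tests can be sketched numerically. The code below assumes the reading in which C_k is the WCET at the maximum frequency f_m, α = f_c/f_m is the scaling factor, and each task k has WCEC i_k + m_k·L·f. The task parameters and function names are hypothetical; this is an illustration of the inequalities, not the authors' scheduler.

```python
def alpha_conventional(tasks, latency_s, f_max_hz):
    """Smallest alpha with sum(C_k / P_k) <= alpha, where C_k is the WCET at f_max."""
    return sum((i + m * latency_s * f_max_hz) / f_max_hz / p for i, m, p in tasks)


def alpha_fast(tasks, latency_s, f_max_hz):
    """Smallest alpha satisfying the frequency-aware test, or None if infeasible."""
    invariant = sum(i / p for i, m, p in tasks)
    memory = latency_s * sum(m / p for i, m, p in tasks)
    if memory >= 1.0:
        return None  # memory stall time alone exceeds the available time
    return invariant / (f_max_hz * (1.0 - memory))


def utilization_at(tasks, latency_s, freq_hz):
    """Actual utilization when every task runs at freq_hz."""
    return sum((i + m * latency_s * freq_hz) / freq_hz / p for i, m, p in tasks)


# Hypothetical tasks: (invariant cycles i_k, memory accesses m_k, period P_k in s).
TASKS = [(400_000, 1_000, 5e-3), (1_000_000, 5_000, 20e-3)]
LATENCY_S, F_MAX_HZ = 100e-9, 1e9

conv = alpha_conventional(TASKS, LATENCY_S, F_MAX_HZ)
fast = alpha_fast(TASKS, LATENCY_S, F_MAX_HZ)
```

For this task set the conventional test yields α = 0.175 while the frequency-aware test yields roughly 0.136, so a lower frequency is admissible. On real hardware α would be rounded up to the next available frequency setting. Both functions make a single pass over the task set.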

Improving DVS Schemes: Use the parametric frequency model to improve DVS schemes: provide an accurate WCET for improved energy savings. Architectural simulator: SimpleScalar + Wattch [Brooks ISCA’00]. 6-stage simple in-order pipeline processor model with I-cache and D-cache (8 KB each). Run 4-8 tasks simultaneously (the scheduler runs as its own task). More accurate than the E ~ V²f model? Results are newer than the paper.

Static RT-DVS vs. FAST Static RT-DVS: Base case: EDF with tasks at 1 GHz and idle at 100 MHz; no sleep mode; small task periods. Task sets: 1: integer, 2: float, 3: mix. High: 0.9 utilization; Low: 0.5. The static scheme is better than base EDF: 12-60% energy savings. FAST-Static is even better: 40-78% savings at high and lower utilization.

Cycle-conserving RT-DVS vs. FAST Cycle-conserving RT-DVS: Dynamic scheduling: early completion is reclaimed as slack. Cycle-conserving: 57-72% energy savings. FAST: 71-80% savings.

Look-ahead RT-DVS vs. FAST Look-ahead RT-DVS: Most aggressive DVS: early completion plus maximum deferral. Look-ahead: slightly higher savings than cycle-conserving, at 68-80%. FAST: slightly better in most cases, at 72-83%.

Look-ahead RT-DVS vs. FAST Look-ahead RT-DVS (E ~ V²f model): Higher savings: up to 96%? The ratio of look-ahead to FAST is similar. Wattch has a detailed power model and is probably more accurate.

Conclusion: Energy savings in real-time systems can be significantly improved by considering the effects of frequency scaling on WCET. FAST + Static RT-DVS is as good as Look-ahead RT-DVS, with less overhead. The parameterized frequency model can easily track the effects of frequency scaling on WCET. The FAST tool works best when there are many cache misses (if D-cache analysis is highly inaccurate, which is usually true, FAST can make up for it), when memory latency is high, and when dynamic slack reclaiming during DVS scheduling is insufficient. Integrated into real-time hardware support [VISA ISCA’03].

BACKUP SLIDES

The V²f model

Old DVS Scheduling Simulator: Event-based simulator of the scheduler. Has to assume a miss rate for the tasks in dynamic schemes. Uses the E ~ V²f energy model. Gives a good idea of the savings, but is it accurate?

Static RT-DVS vs. FAST Static RT-DVS

Cycle-conserving RT-DVS vs. FAST cycle-conserving RT-DVS

Look-ahead RT-DVS vs. FAST Look-ahead RT-DVS

DVS schemes (Pillai & Shin): Static RT-DVS uses the static slack available in the schedule. Cycle-conserving RT-DVS uses static slack plus dynamic slack due to early completion. Look-ahead RT-DVS uses static slack plus dynamic slack due to early completion plus latest possible scheduling (look-ahead).

Complexity: Original EDF test: O(n). Modified EDF test: still O(n).