F A S T Frequency-Aware Static Timing Analysis


1 F A S T Frequency-Aware Static Timing Analysis
By Kiran Seth, Aravindh Anantaraman, Frank Mueller and Eric Rotenberg
Center for Embedded Systems Research, Departments of CS & ECE, North Carolina State University

2 Real-Time Systems
Tasks have a deadline → must terminate on time
Classification:
  Hard real-time: missed deadline → catastrophe
  Soft real-time: missed deadline → low QoS
Multi-tasking real-time systems require scheduling algorithms →
  Scheduler ensures task arbitration online
  Schedulability test ensures deadlines are met (static test); requires known Worst-Case Execution Time (WCET)

3 Static Timing Analysis
To schedule tasks in real-time systems, need Worst-Case Execution Time (WCET) and Worst-Case Execution Cycles (WCEC)
Experimental WCET → unsafe bounds
  Due to input & hardware complexity
Use static timing analysis toolset to obtain safe WCET bounds

4 Static Instruction Cache Analysis
Work explained in [Mueller RTS-J’00]
Interprocedural data-flow analysis
Predicts each cache reference as one of: always-hit, always-miss, first-hit, first-miss
Each instruction categorized for each loop level and function (a function is treated as a loop w/ 1 iteration)
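To illustrate how the four categories feed into a worst-case count, here is a minimal sketch. It assumes the usual reading of the categories (first-miss: only the first reference in the loop misses; first-hit: only the first reference hits). The names `Category` and `worst_case_misses` are hypothetical and are not taken from the analyzer itself.

```python
from enum import Enum


class Category(Enum):
    ALWAYS_HIT = "always-hit"
    ALWAYS_MISS = "always-miss"
    FIRST_HIT = "first-hit"
    FIRST_MISS = "first-miss"


def worst_case_misses(category: Category, iterations: int) -> int:
    """Misses charged to one instruction over `iterations` executions
    at a given loop level (assumed interpretation of the categories)."""
    if iterations <= 0:
        return 0
    if category is Category.ALWAYS_HIT:
        return 0
    if category is Category.ALWAYS_MISS:
        return iterations
    if category is Category.FIRST_HIT:
        # First reference hits, every later reference misses.
        return iterations - 1
    # FIRST_MISS: only the first reference misses.
    return 1
```

A function counts as a loop with one iteration, so a first-miss instruction at function level contributes exactly one miss.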

5 Static Data Cache Simulation
For accurate static timing analysis, need data cache analysis
Currently, data cache analysis tool not accurate enough
  Too many restrictions, not general enough for real code
  Improvements by [Vera RTSS’03]
Solutions →
  All data accesses hits… highly underestimated
  All data accesses misses… highly overestimated
  Assume big enough cache to fit the whole data set
  Assume first-time accesses are misses (cold misses only), otherwise hits
Accurate? Yes. But what if caches are smaller? No significant impact on this study

6 Static Timing Analyzer
Path & tree-based approach [Healy IEEE TC’99]
Find nodes in the CFG and derive WCEC for each node
  A node is a function or loop
WCET is calculated bottom-up
Standard timing analysis assumptions apply →
  No recursion
  All loop bounds must be known
  No function pointers
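The bottom-up calculation can be sketched as a recursion over a tree of function and loop nodes. This is a strongly simplified illustration, not the analyzer from [Healy IEEE TC’99]: it assumes each node has a single worst-case path with a fixed cycle count and ignores pipeline and cache effects. `Node`, `own_cycles` and `loop_bound` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    own_cycles: int      # worst-case cycles of the node's own code, per iteration
    loop_bound: int = 1  # a function is treated as a loop with 1 iteration
    children: List["Node"] = field(default_factory=list)


def wcec(node: Node) -> int:
    """Worst-case cycles of a node, computed bottom-up from its children.

    Terminates because recursion is excluded and all loop bounds are known.
    """
    per_iteration = node.own_cycles + sum(wcec(child) for child in node.children)
    return node.loop_bound * per_iteration


# Hypothetical program: main() contains an outer loop with a nested inner loop.
inner = Node("inner_loop", own_cycles=10, loop_bound=100)
outer = Node("outer_loop", own_cycles=5, loop_bound=10, children=[inner])
main = Node("main", own_cycles=20, children=[outer])
```

The three standard assumptions map directly onto the sketch: without known loop bounds `loop_bound` is undefined, and recursion or function pointers would prevent building a finite tree.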

7 Motivation of FAST
Dynamic Voltage Scaling (DVS) scheduling schemes
  Change frequency/voltage of the system to save power without missing deadlines
  Several DVS scheduling schemes available
Good fit for real-time systems
  Most real-time systems:
    have low utilization
    are low-power embedded systems
  Potential for considerable energy savings with DVS

8 Problem
Current DVS schemes:
  Ignore effects of frequency scaling on WCEC
  DVS schemes assume: WCEC constant with frequency
  Overestimate WCET at lower frequencies
To demonstrate the problem:
  WCET of C-Lab benchmark → static timing analysis tool
  For frequencies 100 MHz – 1 GHz
  Assess observed WCEC & WCET vs. assumption made by DVS schemes

9 Actual vs. Assumed WCEC for FFT
WCEC changes with frequency modulation
WCEC increases with higher frequency
Constant memory latency: 100 ns

10 Actual vs. Assumed WCET for FFT
Difference in chosen frequency for DVS w/ WCET = 5 ms
  assumed: ~550 MHz
  actual: ~150 MHz
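The gap between the assumed and the actual frequency comes from how each approach solves for the lowest frequency that still meets a time budget D. The sketch below assumes a WCEC of the form i + m·L·f (the parametric model on slide 12). All numbers are hypothetical and are not the FFT benchmark's values; the function names are illustrative.

```python
def min_frequency_assumed(i: float, m: float, latency_s: float,
                          f_max_hz: float, budget_s: float) -> float:
    """Lowest frequency if WCEC is taken as constant (measured at f_max)."""
    wcec_at_fmax = i + m * latency_s * f_max_hz
    return wcec_at_fmax / budget_s


def min_frequency_model(i: float, m: float, latency_s: float,
                        budget_s: float) -> float:
    """Lowest frequency if memory time m*L is frequency-independent.

    WCET(f) = i/f + m*L <= D  gives  f >= i / (D - m*L).
    """
    compute_budget = budget_s - m * latency_s
    if compute_budget <= 0:
        raise ValueError("memory time alone exceeds the time budget")
    return i / compute_budget


# Hypothetical task: 500,000 invariant cycles, 20,000 memory accesses,
# 100 ns memory latency, 1 GHz maximum frequency, 5 ms budget.
assumed = min_frequency_assumed(500_000, 20_000, 100e-9, 1e9, 5e-3)
actual = min_frequency_model(500_000, 20_000, 100e-9, 5e-3)
print(f"assumed: {assumed / 1e6:.0f} MHz, model: {actual / 1e6:.0f} MHz")
```

With these made-up inputs the constant-WCEC assumption asks for roughly three times the frequency the model does, the same kind of gap the slide reports for FFT.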

11 Parametric Frequency Model
Problem: DVS
  Considers processor frequency scaling
  Ignores effect of frequency scaling on memory accesses
With frequency scaling:
  Cycles for processor operations remain constant
  Except for memory operations → problem
DVS schemes overestimate the WCET at lower frequencies
  Cannot fully utilize available slack
  Power savings potential largely wasted

12 Parametric Frequency Model
Solution: calculate WCEC accounting for effects of memory accesses using the new parametric frequency model
Model: WCEC(f) = i + mN = i + mLf
  i: invariant # of worst-case cycles (for non-memory operations)
  m: # of worst-case memory accesses
  N: # of cycles per memory access; depends on memory latency L and frequency f: N = Lf
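As a rough illustration, the model can be evaluated directly. The sketch below uses the 100 ns memory latency mentioned earlier in the deck; the values of i and m are hypothetical, and the function names are not from the FAST tool.

```python
def cycles_per_access(latency_s: float, freq_hz: float) -> float:
    """N = L * f: memory latency expressed in processor cycles."""
    return latency_s * freq_hz


def wcec(i: float, m: float, latency_s: float, freq_hz: float) -> float:
    """WCEC(f) = i + m * N = i + m * L * f."""
    return i + m * cycles_per_access(latency_s, freq_hz)


def wcet(i: float, m: float, latency_s: float, freq_hz: float) -> float:
    """WCET(f) = WCEC(f) / f = i / f + m * L, in seconds."""
    return wcec(i, m, latency_s, freq_hz) / freq_hz


# Hypothetical task: i = 1000 invariant cycles, m = 10 memory accesses.
for freq_mhz in (100, 250, 500, 1000):
    f = freq_mhz * 1e6
    print(freq_mhz, "MHz:", round(wcec(1000, 10, 100e-9, f)), "cycles,",
          wcet(1000, 10, 100e-9, f), "s")
```

WCEC grows with frequency because each memory access costs more cycles, while the memory share of WCET (m·L) stays fixed in wall-clock time.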

13 Using the Parametric Frequency Model
A: add R2, R1, R3
B: load R4, [M1]
C: add R2, R1, R4
D: add R2, R1, R5
Instruction sequence simulated through a simple pipeline to explain the parametric frequency model
Simple pipeline:
  6 stages
  Data & instruction cache
  N = 10

14 Example 0: Cache Hits
Recall: B is load instruction
WCEC = 9 + 0N
Each row represents a pipeline stage. Time (and cycle count) increases horizontally.

15 Example 1: Effect of I-cache miss
WCEC = 9 + 1N
Stall due to I-cache miss is shown
Model accurately captures memory latency, however long

16 Example 2: Effect of D-cache miss
Recall: B is load instruction
WCEC = 9 + 1N
Stall due to D-cache miss is shown
Again, model captures memory latency, however long
Notice: during stall cycles, no useful work is done

17 Example 3: Effect of I- & D-cache Miss
WCEC = 9 + 2N
I-cache miss first, then D-cache miss
Overlap between useful cycles & stall cycles
  Also during high-latency execution operations
  E.g. floating-point, multiply, … → overlap w/ D-cache miss
Leads to overestimation → in practice rare, still safe WCET
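The arithmetic behind Examples 0 to 3 can be checked with a few lines. The base of 9 cycles and N = 10 come from the slides. Pairing the 100 ns latency with a 100 MHz clock to obtain N = 10 is an assumption made here for illustration.

```python
BASE_CYCLES = 9  # invariant cycles i of the four-instruction sequence


def cycles_per_access(latency_ns: float, freq_mhz: float) -> float:
    """N = L * f, with L in nanoseconds and f in MHz."""
    return latency_ns * freq_mhz / 1000.0


def example_wcec(misses: int, n: float) -> float:
    """WCEC = 9 + m * N for the example instruction sequence."""
    return BASE_CYCLES + misses * n


# Number of worst-case memory accesses m in each example.
EXAMPLES = {
    "Example 0: cache hits": 0,
    "Example 1: I-cache miss": 1,
    "Example 2: D-cache miss": 1,
    "Example 3: I- & D-cache miss": 2,
}

n = cycles_per_access(latency_ns=100, freq_mhz=100)  # assumed pairing, N = 10
for label, misses in EXAMPLES.items():
    print(label, "->", example_wcec(misses, n), "cycles")
```

Because stalls are simply added on top of the 9 invariant cycles, any overlap between a stall and useful work makes the model's result an upper bound rather than an exact count.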

18 Experimental Validation
Combine frequency model with our static timing analyzer
FAST tool: WCEC → FAST equations
Experiment to validate results from FAST tool:
  Run benchmarks through FAST tool
  An equation representing WCEC for benchmark obtained
  Run same benchmarks through traditional timing analysis tool
  Vary frequencies: 100 MHz – 1 GHz

19 Frequency-Aware Static Timing Analysis (FAST)
FAST tool → “as accurate” as traditional static timing analysis
Slight overestimation in case of floating-point benchmarks

20 FAST in EDF Scheduling with DVS
DVS with EDF: Σ_k C_k/P_k ≤ α, where α = f_c/f_m
FAST with EDF: Σ_k (i_k + m_k·L·α·f_m) / (P_k·f_m) ≤ α
→ Schedulability test: Σ_k (i_k/P_k) / ( f_m·(1 − L·Σ_k m_k/P_k) ) ≤ α
Implemented frequency model for 3 EDF-DVS algorithms
  Algorithms by [Pillai & Shin]
  Look-ahead improved: @ completion, consider next deadline
Up to 34% additional energy savings (5-11% on avg.) at low utilization, but 0.5-8% less savings at high utilization
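A minimal sketch of the two tests, reading the slide's schedulability condition as α ≥ Σ(i_k/P_k) / (f_m·(1 − L·Σ m_k/P_k)). The task set and function names are hypothetical. Each task is a tuple (i_k in cycles, m_k in accesses, P_k in seconds).

```python
from typing import Sequence, Tuple

Task = Tuple[float, float, float]  # (i_k cycles, m_k accesses, P_k seconds)


def conventional_min_alpha(tasks: Sequence[Task], latency_s: float,
                           f_max_hz: float) -> float:
    """EDF-DVS test with C_k fixed at its value for the maximum frequency."""
    return sum(((i + m * latency_s * f_max_hz) / f_max_hz) / p
               for i, m, p in tasks)


def fast_min_alpha(tasks: Sequence[Task], latency_s: float,
                   f_max_hz: float) -> float:
    """Frequency-aware test: smallest alpha = f_c / f_m that keeps EDF feasible."""
    cpu_demand = sum(i / p for i, _, p in tasks)              # cycles per second
    memory_share = latency_s * sum(m / p for _, m, p in tasks)  # fraction of time
    if memory_share >= 1.0:
        raise ValueError("memory time alone saturates the processor")
    return cpu_demand / (f_max_hz * (1.0 - memory_share))


# Hypothetical task set: 100 ns memory latency, 1 GHz maximum frequency.
tasks = [(500_000, 20_000, 10e-3), (1_000_000, 10_000, 20e-3)]
print("conventional alpha:", conventional_min_alpha(tasks, 100e-9, 1e9))
print("FAST alpha:        ", fast_min_alpha(tasks, 100e-9, 1e9))
```

Both functions make a single pass over the task set, so the modified test stays linear in the number of tasks.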

21 Improving DVS schemes
Use parametric frequency model to improve DVS schemes
  Provide accurate WCET
  Improved energy savings
Architectural simulator: SimpleScalar + Wattch [Brooks ISCA’00]
  6-stage simple in-order pipeline processor model
  I-cache and D-cache (8 KB each)
  Run 4-8 tasks simultaneously (scheduler runs as its own task)
  More accurate than E ~ V²f model?
Results newer than paper

22 Static RT-DVS vs. FAST Static RT-DVS
Base case: EDF
  Tasks at 1 GHz
  Idle: 100 MHz
  No sleep mode → small task periods
Task sets: 1: integer, 2: float, 3: mix
Utilization: high: 0.9, low: 0.5
Static scheme better than base EDF → 12-60% energy savings
FAST-Static even better → 40-78% savings
  At high + lower utilization

23 Cycle-conserving RT-DVS vs. FAST cycle-conserving RT-DVS
Dynamic scheduling → early completion, reclaimed as slack
Cycle-conserving → 57-72% energy savings
FAST → 71-80% savings

24 Look-ahead RT-DVS vs. FAST Look-ahead RT-DVS
Most aggressive DVS: early completion + max. deferral
Look-ahead: slightly higher savings than cycle-conserving, 68-80%
FAST: slightly better in most cases, 72-83%

25 Look-ahead RT-DVS vs. FAST Look-ahead RT-DVS
E ~ V²f model
  Higher savings: up to 96%?
  Ratio look-ahead / FAST similar
Wattch: detailed power model
  Probably more accurate

26 Conclusion
Energy savings in real-time systems can be significantly improved by considering the effects of frequency scaling on WCET
  FAST + Static RT-DVS as good as Look-Ahead RT-DVS, with less overhead
The parameterized frequency model can easily track effects of frequency scaling on WCET
FAST tool works best when →
  Many cache misses
    If D-cache analysis is highly inaccurate (usually true), FAST can make up for it
  High memory latency
  Insufficient dynamic slack reclaiming (during DVS scheduling)
Integrated into real-time hardware support [VISA ISCA’03]

27 BACKUP SLIDES

28 The V²f model

29 Old DVS Scheduling Simulator
Event-based simulator of scheduler.
Have to assume miss rate for the tasks in dynamic schemes.
Uses E ~ V²f energy model.
Gives a good idea about savings, BUT accurate??

30 Static RT-DVS vs. FAST Static RT-DVS

31 Cycle-conserving RT-DVS vs. FAST cycle-conserving RT-DVS

32 Look-ahead RT-DVS vs. FAST Look-ahead RT-DVS

33 DVS schemes (Pillai & Shin)
Static RT-DVS: uses static slack available in the schedule.
Cycle-conserving RT-DVS: uses static slack + dynamic slack due to early completion.
Look-ahead RT-DVS: uses static slack + dynamic slack due to early completion + latest possible scheduling (look-ahead).
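For the static scheme, frequency selection with a frequency-aware WCET could look like the sketch below. It assumes a processor with a small set of discrete frequencies and tasks described by (i_k, m_k, P_k) as in the parametric model. It is not the Pillai & Shin implementation, and all names and numbers are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Task = Tuple[float, float, float]  # (i_k cycles, m_k accesses, P_k seconds)


def edf_utilization(tasks: Sequence[Task], latency_s: float,
                    freq_hz: float) -> float:
    """EDF utilization at freq_hz, with WCET_k(f) = i_k / f + m_k * L."""
    return sum((i / freq_hz + m * latency_s) / p for i, m, p in tasks)


def select_static_frequency(tasks: Sequence[Task], latency_s: float,
                            frequencies_hz: Sequence[float]) -> Optional[float]:
    """Lowest available frequency at which the task set passes the EDF test."""
    for freq in sorted(frequencies_hz):
        if edf_utilization(tasks, latency_s, freq) <= 1.0:
            return freq
    return None  # not schedulable even at the highest frequency


# Hypothetical task set and frequency levels; 100 ns memory latency.
task_set = [(500_000, 20_000, 10e-3), (1_000_000, 10_000, 20e-3)]
levels = [100e6, 200e6, 400e6, 1e9]
print(select_static_frequency(task_set, 100e-9, levels))
```

The cycle-conserving and look-ahead schemes would additionally re-evaluate the choice at run time as tasks complete early, which this static sketch does not attempt.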

34 Complexity
Original EDF test → O(n)
Modified EDF test → still O(n)

