Accurate Timing Analysis by Modeling Caches, Speculation and their Interaction Xianfeng Li Tulika Mitra Abhik Roychoudhury National University of Singapore.


1 Accurate Timing Analysis by Modeling Caches, Speculation and their Interaction Xianfeng Li Tulika Mitra Abhik Roychoudhury National University of Singapore

2 Why Timing Analysis?
- Timing guarantees for real-time embedded systems
- Real-time scheduling:
  - Worst-case bound on execution time
  - Tasks are guaranteed to be schedulable irrespective of inputs
- Tight bound to avoid idle processor cycles
- Extremely important for safety-critical systems

3 Worst Case Execution Time (WCET)
- Maximum execution time of a program on a micro-architecture for all possible inputs
- Measurement
  - Execute program for all inputs: impractical
  - Execute program for selected inputs to get a lower bound on WCET (Observed WCET)
- Analysis
  - Employ static analysis to compute an upper bound on WCET (Estimated WCET)
- Observed WCET ≤ Actual WCET ≤ Estimated WCET

4 WCET Analysis
- Program path analysis [Shaw’89, Healy’98, ...]
  - Not all possible paths in the program are feasible
- Micro-architectural modeling
  - Dynamically variable instruction execution time
  - Cache, Pipeline [Li’99, Theiling’00, Schneider’99, ...]
  - Speculative execution (branch prediction) [Mitra’02]
  - Combined modeling of cache + speculative execution

5 Speculative Execution
[Figure: execution timelines for a branch B under three cases: no speculative execution, correct prediction, and misprediction, with the misprediction penalty marked.]

6 Cache + Speculation: Destructive Effect
[Figure: branch B with successors N and T, shown against cache contents and execution; N and T map to the same cache block.]
- Cache miss 1: loading into cache from the speculated path
- Cache miss 2: loading into cache from the correct path

7 Destructive Effect: Extra Cache Misses
- Cache miss penalty (CMP) along the speculative path
  - Fully masked by the branch misprediction penalty (BMP)
  - Partially masked by BMP: wait for the cache miss to be serviced before executing the correct path
- Cache miss penalty along the correct path due to fetch along the speculative path
[Figure: timelines comparing BMP and CMP for the fully masked and partially masked cases.]

8 Cache + Speculation: Constructive Effect
[Figure: branch B with blocks N and S, shown against cache contents and execution; two of the blocks map to the same cache block.]
- Cache miss 1: loading into cache from the speculated path
- Cache hit: correct block already loaded into cache

9 How serious is the effect?

10 Technique: Integer Linear Programming
- Integrate program analysis and micro-architectural modeling in an ILP framework [Li and Malik 1995]
- Input:
  - Control Flow Graph (CFG) of the program
  - User-provided loop bounds, recursion depth, etc.
  - Specification of micro-architecture
- Objective function: execution time (maximized)
- Constraints
  - Flow constraints from the Control Flow Graph
  - Constraints from micro-architectural modeling
- ILP formulation of instruction cache + speculative execution

11 Objective Function
$$\mathrm{WCET} = \sum_{B} \left( cost_B \times count_B + BMP \times misprediction_B + CMP \times miss_B + mp\_delay_B \right)$$
- $cost_B \times count_B$: execution time of basic block B without cache miss and branch misprediction
- $BMP \times misprediction_B$: penalty due to mispredictions
- $CMP \times miss_B$: penalty due to cache misses
  - Includes constructive and destructive effect of speculation along the correct path
- $mp\_delay_B$: penalty due to partially masked cache misses along the speculative path (variable CMP)
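The objective can be evaluated for a fixed assignment of the per-block variables, which is a handy sanity check on ILP solver output. The following is a minimal sketch, not the authors' analyzer: the `BlockProfile` type, its field names, and the sample numbers are hypothetical, while the penalty values are the ones listed under Experimental Methodology.

```python
from dataclasses import dataclass

# Penalties from the experimental setup in this talk, in cycles.
BMP = 5   # branch misprediction penalty
CMP = 10  # cache miss penalty


@dataclass
class BlockProfile:
    """Per-basic-block quantities (hypothetical container for illustration)."""
    cost: int            # cycles per execution, no miss and no misprediction
    count: int           # execution count of the block
    mispredictions: int  # mispredictions of the branch ending the block
    misses: int          # instruction cache misses along the correct path
    mp_delay: int        # cycles from partially masked speculative-path misses


def wcet_objective(blocks):
    """Evaluate the objective: sum the four terms over all basic blocks."""
    return sum(
        b.cost * b.count + BMP * b.mispredictions + CMP * b.misses + b.mp_delay
        for b in blocks
    )


# Made-up values, purely to exercise the formula.
example_blocks = [
    BlockProfile(cost=4, count=10, mispredictions=2, misses=1, mp_delay=3),
    BlockProfile(cost=6, count=5, mispredictions=0, misses=2, mp_delay=0),
]
example_wcet = wcet_objective(example_blocks)
```

In the ILP itself these quantities are variables chosen by the solver to maximize the sum; the function above only evaluates one candidate assignment.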

12 Flow Constraints: Easy!
[Figure: control flow graph with basic blocks B1, B2, B3, B4.]
Inflow = basic block execution count = outflow:
$$e_{s,1} + e_{3,1} = count_1 = e_{1,2} + e_{1,4}$$
$$e_{1,2} + e_{2,2} = count_2 = e_{2,3} + e_{2,2}$$
$$e_{2,3} + e_{4,3} = count_3 = e_{3,1} + e_{3,E}$$
$$e_{1,4} = count_4 = e_{4,3}$$
Loop bounds (bound on maximum loop iterations):
$$e_{2,2} \le 100, \qquad e_{3,1} \le 10$$
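The flow constraints amount to conservation at every basic block. As a rough sketch (the helper names and the sample edge counts are assumptions, not taken from the talk), the following checks a candidate set of edge counts against the constraints and loop bounds on this slide:

```python
from collections import defaultdict


def block_counts(edge_counts, source="s", sink="E"):
    """Return count_B per block, checking that inflow equals outflow."""
    inflow = defaultdict(int)
    outflow = defaultdict(int)
    for (src, dst), n in edge_counts.items():
        outflow[src] += n
        inflow[dst] += n
    counts = {}
    for block in set(inflow) | set(outflow):
        if block in (source, sink):
            continue  # the start and end nodes are not basic blocks
        if inflow[block] != outflow[block]:
            raise ValueError(f"flow not conserved at block {block}")
        counts[block] = inflow[block]
    return counts


def within_loop_bounds(edge_counts, bounds):
    """Check each bounded edge against its maximum count."""
    return all(edge_counts.get(edge, 0) <= limit for edge, limit in bounds.items())


# Hypothetical edge counts for the four-block CFG on this slide.
edges = {
    ("s", 1): 1, (3, 1): 10,   # into B1
    (1, 2): 8, (1, 4): 3,      # out of B1
    (2, 2): 100, (2, 3): 8,    # B2 self-loop and exit
    (4, 3): 3,                 # B4 to B3
    (3, "E"): 1,               # exit
}
loop_bounds = {(2, 2): 100, (3, 1): 10}

counts = block_counts(edges)
bounds_ok = within_loop_bounds(edges, loop_bounds)
```

An ILP solver searches over all edge counts satisfying these equalities and bounds; the check above only validates one point in that space.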

13 Other Constraints
- Branch misprediction constraints
  - Bound misprediction_B
  - Details appeared in an earlier paper: T. Mitra, A. Roychoudhury and X. Li. Timing Analysis of Embedded Software for Speculative Processors. In ACM Intl. Symposium on System Synthesis (ISSS), 2002
- Instruction cache miss constraints
  - Bound miss_B [Li, Malik and Wolfe 1999]

14 Modeling Cache-Speculation Interaction
- Modify instruction cache miss constraints to model the constructive/destructive effect of speculation along the correct path
- Add additional constraints on mp_delay_B: penalty due to partially masked cache misses along the speculative path

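15 Modeling Instruction Cache
[Figure: control flow graph (S, B1, B2, B3, B4, E) beside the cache conflict graph for B1 and B3, with edges p_{S,1}, p_{1,3}, p_{3,1}, p_{3,E}.]
- Cache conflict graph: flow among blocks mapping to the same cache line
$$p_{S,1} + p_{3,1} = count_1 = p_{1,3}$$
$$miss_1 = p_{S,1} + p_{3,1}$$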
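The relation miss_1 = p_{S,1} + p_{3,1} says that B1 misses exactly when the previous occupant of its cache line was not B1. A small simulation can illustrate this. It is only a sketch under simplifying assumptions: a direct-mapped cache, one cache line per basic block, and a hypothetical line mapping and trace.

```python
def count_misses(trace, line_of):
    """Count instruction cache misses per block in a direct-mapped cache.

    trace:   block names in execution order
    line_of: block name -> cache line index (assumed mapping)
    """
    resident = {}  # cache line -> block currently loaded there
    misses = {}
    for block in trace:
        line = line_of[block]
        if resident.get(line) != block:
            misses[block] = misses.get(block, 0) + 1
            resident[line] = block  # load the block, evicting the old occupant
    return misses


# B1 and B3 share cache line 0, as in the conflict graph on this slide.
line_of = {"B1": 0, "B3": 0, "B2": 1, "B4": 2}
trace = ["B1", "B2", "B3", "B1", "B4", "B3"]
misses = count_misses(trace, line_of)
```

In this trace B1 is entered once from the start (p_{S,1} = 1) and once after B3 has occupied the line (p_{3,1} = 1), so it misses twice, matching the constraint.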

16 Constructive Effect of Speculation
[Figure: control flow graph of B1 to B4 with N/T branch edges, showing the speculative path and the correct path; the speculative node B3(2,T) takes a miss with a partially masked CMP.]

17 Constructive Effect of Speculation
[Figure: same graph as the previous slide; the miss at the speculative node B3(2,T) turns a later miss on B3 into a hit.]
- miss_3 will decrease by the amount of flow between B3(2,T) and B3

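18 Destructive Effect of Speculation
[Figure: control flow graph of B1 to B4 with N/T branch edges, showing the speculative path and the correct path; B2 and B4 conflict in the cache, and the speculative node B4(1,N) takes a miss with a partially masked CMP, turning a later hit on B2 into a miss.]
- miss_2 will increase by the amount of flow between B4(1,N) and B2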
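Both effects can be reproduced with a toy fetch trace in which speculative fetches fill the cache but only correct-path misses are counted. This is an illustrative sketch, not the ILP model itself: the direct-mapped cache, the one-line-per-block mapping, and the traces are assumptions.

```python
def correct_path_misses(events, line_of):
    """Count cache misses seen on the correct path.

    events:  (block, speculative) pairs in fetch order
    line_of: block name -> cache line index (assumed mapping)
    """
    resident = {}  # cache line -> block currently loaded there
    misses = 0
    for block, speculative in events:
        line = line_of[block]
        if resident.get(line) != block:
            resident[line] = block  # speculative fetches also fill the cache
            if not speculative:
                misses += 1
    return misses


# Destructive: B2 and B4 share a line; a wrong-path fetch of B4 evicts B2.
destructive_lines = {"B1": 1, "B2": 0, "B4": 0, "B3": 2}
no_spec = [("B1", False), ("B2", False), ("B3", False),
           ("B1", False), ("B2", False), ("B3", False)]
with_spec = no_spec[:4] + [("B4", True)] + no_spec[4:]
destructive = (correct_path_misses(no_spec, destructive_lines),
               correct_path_misses(with_spec, destructive_lines))

# Constructive: B1 and B3 share a line; a wrong-path fetch of B3 prefetches it.
constructive_lines = {"B1": 0, "B3": 0, "B2": 1, "B4": 2}
plain = [("B1", False), ("B2", False), ("B2", False), ("B3", False)]
prefetched = plain[:2] + [("B3", True)] + plain[2:]
constructive = (correct_path_misses(plain, constructive_lines),
                correct_path_misses(prefetched, constructive_lines))
```

The first pair shows one extra correct-path miss caused by speculation, the second shows one miss removed, mirroring the increase in miss_2 and the decrease in miss_3 described on these slides.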

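19 General Flow Involving Extra Nodes
[Figure: four cases (Case 1 to Case 4) of flow between normal nodes (n, n1) and the extra speculative nodes m(b,X), m1(b,X), m2(b1,Y) introduced for a branch b with outcome X and a branch b1 with outcome Y.]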

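20 Additional Constraints
[Figure: branch b with outcome X followed by speculative-path blocks B1, B2, ..., Bn inside the BMP window.]
Assumption: CMP > BMP.
$$count(m_i^{(b,X)}) = misprediction_{(b,X)} - \sum_{k=1}^{i-1} miss(m_k^{(b,X)})$$
$$mp\_delay_{(b,X)} = \sum_{k=1}^{n} miss(m_k^{(b,X)}) \times delay(m_k^{(b,X)})$$
$$delay(m_i^{(b,X)}) = CMP - \left( BMP - \sum_{k=1}^{i-1} cost(m_k^{(b,X)}) \right)$$
And some others.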
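The delay formula can be read as: by the time the i-th speculative block is fetched, the earlier blocks have already used part of the BMP window, so less of a miss there is hidden. A minimal sketch of that arithmetic follows; the function names and the block costs are hypothetical, and the penalties are the values from the experimental setup.

```python
def speculative_delays(costs, bmp_cycles, cmp_cycles):
    """delay(m_i) = CMP - (BMP - sum of cost(m_k) for k < i), for each block."""
    delays = []
    elapsed = 0  # cycles already spent in earlier speculative-path blocks
    for cost in costs:
        delays.append(cmp_cycles - (bmp_cycles - elapsed))
        elapsed += cost
    return delays


def mp_delay(costs, miss_counts, bmp_cycles, cmp_cycles):
    """mp_delay = sum over blocks of miss(m_k) * delay(m_k)."""
    delays = speculative_delays(costs, bmp_cycles, cmp_cycles)
    return sum(m * d for m, d in zip(miss_counts, delays))


# BMP = 5 and CMP = 10 cycles; two speculative blocks of 2 cycles each (made up).
delays = speculative_delays([2, 2], bmp_cycles=5, cmp_cycles=10)
total = mp_delay([2, 2], miss_counts=[0, 3], bmp_cycles=5, cmp_cycles=10)
```

A miss in the first speculative block leaves CMP - BMP cycles unmasked, and the unmasked part grows for later blocks, which is why the slide calls this a variable CMP.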

21 Benchmarks
Program   Description                                Paths  Loops
matsum    Summation of two 100 * 100 matrices        S      [illegible]
matmult   Multiplication of two 10 * 10 matrices     S      [illegible]
isort     Insertion sort of 100-element array
bsearch   Binary search of 100-element array
fft       1024-point Fast Fourier Transform          S      [illegible]
fdct      Fast Discrete Cosine Transform             S      [illegible]
dhry      Dhrystone benchmark                        S
des       Data Encryption Standard
whet      Whetstone benchmark                        S
djpg      Decompress 128 * 96 color JPG image

22 Experimental Methodology
- Observed WCET: simulation
  - SimpleScalar cycle-accurate architectural simulator
  - In-order execution, no pipeline, no data cache misses
  - Branch misprediction penalty = 5 cycles
  - Cache miss penalty = 10 cycles
- Estimated WCET: prototype analyzer
  - Input: benchmark in assembly code, μ-arch parameters, loop bounds
  - Output: ILP constraints
  - Feed the constraints to CPLEX, a commercial ILP solver

23 Accuracy (Smaller Benchmarks)
Program   WCET Obs  WCET Est  Ratio  Misprediction Est/Obs  Cache miss Est/Obs
matsum    105K      106K
matmult   25.1K     25.6K
isort     48.6K     48.8K
bsearch
fft
fdct      219K      229K

24 Accuracy (Larger Benchmarks)
Program   WCET Obs  WCET Est  Ratio  Misprediction Est/Obs  Cache miss Est/Obs
dhry      218.6K    232.5K
des       87.4K     96.4K
whet      545.5K    581.5K
djpg      44.9M     65.2M
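The Est/Obs ratio for each benchmark can be recomputed from the observed and estimated WCET values that appear in the two accuracy tables. The sketch below does only that arithmetic; the dictionary layout and names are illustrative, and the values are in thousands of cycles as listed (djpg converted from millions).

```python
# (observed, estimated) WCET in thousands of cycles, as listed on the slides.
wcet = {
    "matsum": (105.0, 106.0),
    "matmult": (25.1, 25.6),
    "isort": (48.6, 48.8),
    "fdct": (219.0, 229.0),
    "dhry": (218.6, 232.5),
    "des": (87.4, 96.4),
    "whet": (545.5, 581.5),
    "djpg": (44900.0, 65200.0),
}

# Overestimation ratio: estimated WCET divided by observed WCET.
ratios = {name: est / obs for name, (obs, est) in wcet.items()}

for name, ratio in ratios.items():
    print(f"{name:8s} {ratio:.3f}")
```

Since the observed value is a lower bound and the estimate an upper bound on the actual WCET, every ratio should be at least 1; a ratio close to 1 indicates a tight estimate.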

25 Scalability

26 Summary
- Micro-architectural modeling is crucial for tight estimation of Worst Case Execution Time (WCET)
- Existing methods typically focus on a single micro-architectural feature
  - Cache
  - Pipeline
  - Speculation
- A step towards combining micro-architectural features which affect each other
  - Cache misses/hits due to speculation

