

1 Computer Science 12, Design Automation for Embedded Systems
Reconciling Compilers & Timing Analysis for Safety-Critical Real-Time Systems – WCET-aware program optimizations
Heiko Falk, Embedded Systems/Real-Time Systems, Ulm University, Germany
Jan C. Kleinsorge, Computer Science 12, TU Dortmund, Germany

2 © J.C.Kleinsorge | 2012-03-31 CGO 2012 Computer Science 12 | DAES Slide 2 / 49
Outline
• WCET-aware Optimizations and Code Quality
• Graph Coloring Register Allocation
• Scratchpad Memory Allocation
• Cache-aware Memory Content Selection
• Cache Partitioning for Multi-task Real-time Systems
• Combination of Scratchpad Allocation, Memory Content Selection and Cache Partitioning

3 WCET-aware Optimizations and Code Quality
WCET as objective function
• Reduce the WCET through actual speed-up, but also by enhancing analyzability
• Side effects of changes on timing are hard to anticipate
• Issuing just a single instruction can lead to uncertainty regarding:
  ◦ location, alignment, access pattern (cache), schedule (pipeline), branch prediction, etc.
Code quality for WCET-aware optimizations
• Avoid dynamic dispatch, excessive inflation and layout changes without being clear about their effects
• In short: maintain predictability first

4 WCET-aware Optimizations and Systems (1)
[Figure: levels of abstraction versus uncertainty]
• Semantics: Computation, Layout, Pipeline, System
• Representations: Expressions, Insn (virt.), Insn (phys.), “BLOB”
• Open parameters: ideally just program input; + location (accesses to busses, memories); + order, registers; + dependencies

5 WCET-aware Optimizations and Systems (2)
Practical heuristics to pick the right level of abstraction:
• Still on the WCEP?
• Will the decision change the WCEP?
• Are side effects possible, and to what extent are they bounded?
• What is the overall impact on the system?
• How often do we need to re-evaluate intermediate solutions?
Uncertainty that cannot be tackled at any level:
• Speculative execution, cache hierarchies (and replacement policies), timing anomalies in general, general I/O, etc.
• However: the trend towards (many) simpler cores in fact improves the situation as far as per-task predictability is concerned

6 Outline: Graph Coloring Register Allocation

7 Workflow of Graph Coloring RA
1. Initialization: Build interference graph G = (V, E) with V = {virtual registers} ∪ {K physical processor registers}; e = (v, w) ∈ E ⇔ v and w may never share the same PHREG (i.e. v and w interfere)
2. Simplification: Remove all nodes v ∈ V with degree < K
3. Spilling: After step 2, each node of G has degree ≥ K. Select one v ∈ V; mark v as potential spill; remove v from G
4. Repeat steps 2 and 3 until G = ∅
5. Coloring: Successively re-insert nodes v into G in reverse order; if there is a free color k_v, color v; else, mark v as actual spill
[A. W. Appel, Modern Compiler Implementation in C, 1998]
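The five steps can be sketched in a few lines of Python. This is a minimal illustration, not the allocator from the talk: the function name `color_graph` is hypothetical, physical registers are not modelled as pre-colored nodes, and the potential spill is simply the node of highest degree.

```python
def color_graph(graph, k):
    """Simplify/spill/select sketch of graph-coloring register allocation.

    graph: dict mapping each virtual register to the set of registers it
           interferes with (must be symmetric).
    k:     number of available physical registers (colors).
    Returns (coloring, actual_spills).
    """
    work = {v: set(n) for v, n in graph.items()}  # mutable working copy
    stack = []
    while work:
        # Step 2 (simplification): prefer any node with degree < k.
        node = next((v for v in sorted(work) if len(work[v]) < k), None)
        if node is None:
            # Step 3 (spilling): all degrees are >= k, so pick a potential
            # spill. Assumption: highest degree wins.
            node = max(sorted(work), key=lambda v: len(work[v]))
        stack.append(node)
        for neighbour in work.pop(node):
            work[neighbour].discard(node)

    coloring, spills = {}, []
    # Step 5 (coloring): re-insert nodes in reverse order of removal.
    while stack:
        v = stack.pop()
        used = {coloring[n] for n in graph[v] if n in coloring}
        free = [c for c in range(k) if c not in used]
        if free:
            coloring[v] = free[0]
        else:
            spills.append(v)  # potential spill becomes an actual spill
    return coloring, spills
```

For a triangle of three mutually interfering registers and K = 2, one register ends up as an actual spill; with K = 3 all three are colored.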

8 Problem of Standard Graph Coloring
3. Spilling: After step 2, each node of G has degree ≥ K. Select one v ∊ V; mark v as potential spill; remove v from G
Which node v should be selected as potential spill? Common graph coloring implementations select …
• … the first node v according to the order in which VREGs were generated during code selection,
• … the node with highest degree in the interference graph,
• … a node with high degree, with many DEFs/USEs, in some inner loop (maybe depending on profiling data).
Result: uncontrolled spill code generation, potentially along the Worst-Case Execution Path (WCEP) defining the WCET!

9 WCET-aware Register Allocation
• Derived from classic Chaitin graph coloring
• Register allocation as a problem at the “tip” of the memory hierarchy
• Besides runtime overhead, spill code affects:
  ◦ instruction count, schedule, memory layout, cache access pattern, etc.
• WCET-aware optimization must take into account:
  ◦ Where to store data (actual allocation decision)?
  ◦ But also: where to place spill instructions (relative to the WCEP)?
The catch:
• … relies on WCET data provided by WCET analysis using aiT
• … cannot obtain WCET data as long as the code contains virtual registers

10 WCET-aware RA: of Chickens and Eggs
Pessimistic register allocation:
• Start by marking all VREGs as actual spills (each VREG is spilled, so the code is now fully analyzable)
• Perform WCET analysis, get the WCEP
• Allocate the VREGs of the basic block b with the most worst-case spill code executions to PHREGs, using standard GC on the original program
• Re-evaluate the new WCEP
• Stop and allocate the rest once no more VREGs lie on the WCEP

11 Results – Worst-Case Execution Times
[Chart: 100% = WCET_EST using standard graph coloring (highest degree); labelled values: 93%, 24%, 69%]

12 Results – Average-Case Execution Times
[Chart: 100% = ACET using standard graph coloring (highest degree); labelled range: -6% to -12%]

13 Summary & Caveats
Summary
• Standard graph coloring is unaware of worst-case properties
• May thus lead to uncontrolled spill code generation along the WCEP
• WCET-aware register allocation: combination of standard graph coloring with a WCET-aware spill heuristic
• Average WCET reductions over 46 benchmarks: 31.2%
Caveats
• “Bad” spills are not revocable and might unbalance pipeline load
• Experiments with highly accurate ILP-based WCET-aware register allocation
[H. Falk, WCET-aware Register Allocation based on Graph Coloring, DAC 2009]

14 Outline: Scratchpad Memory Allocation

15 Caches vs. Scratchpad Memories (SPM)
Caches: Processor, L1 cache, main memory
• Hardware-controlled
• Cache contents difficult to predict statically
• Latencies of memory accesses highly variable
• WCET_EST often imprecise
• Caches often deactivated in hard real-time systems
Scratchpads: Processor, SPM, main memory
• No autonomous hardware
• SPM seamlessly integrated in processor’s address space
• Latencies of memory accesses constant
• WCET_EST extremely precise
• SPM contents to be defined by compiler

16 Scratchpad Allocation: Variants and Caveats
Characteristics of code and data allocation:
• Data relocation and mutation “naturally” supported by architecture
• Code relocation usually requires modification of instructions
  ◦ Locality annihilated
  ◦ (Potentially already optimized) runtime properties destroyed
Static and dynamic scratchpad optimization:
• Static: precompute global and static relocation, maintain order (therefore locations implicit)
• Dynamic: precompute dynamic exchange of SPM contents
  ◦ Perspective of static analysis: self-modifying code
  ◦ Static dispatch (overlaying targets)
  ◦ Memory allocation is hard

17 ILP for WCET-aware SPM Allocation of Code
Goal
• Determine the set of basic blocks to be allocated to the SPM
• … such that the selected basic blocks lead to overall minimization of WCET_EST
• … under consideration of switching WCEPs.
Approach
• Integer-linear programming (ILP)
• Optimality of results: no need for backtracking techniques
• In the following: uppercase = constants, lowercase = variables

18 Decision Variables & Costs
• Binary decision variables per basic block (BB): [formula not preserved in transcript]
• Costs of basic block b_i: [formula not preserved in transcript] models the WCET_EST of b_i if it is allocated to main memory or SPM, respectively
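The formulas of this slide did not survive in the transcript. A plausible reconstruction, following the stated convention (uppercase = constants, lowercase = variables), is sketched below. The symbol names $x_i$, $c_i$, $C_i^{main}$ and $C_i^{spm}$ are assumptions, not taken from the slide.

```latex
x_i =
\begin{cases}
  1 & \text{if basic block } b_i \text{ is allocated to the SPM} \\
  0 & \text{if } b_i \text{ stays in main memory}
\end{cases}
\qquad
c_i = C_i^{\mathit{main}} \cdot (1 - x_i) + C_i^{\mathit{spm}} \cdot x_i
```

With this encoding the cost $c_i$ stays linear in the decision variable, which is what an ILP formulation requires.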

19 Intraprocedural Control Flow
• Modeling of a function’s control flow:
Acyclic sub-graphs: [Figure: CFG with nodes A, B, C, D, E; constraints not preserved in transcript; the variable for A = WCET of any path starting at A]
(Reducible) loops: [Figure: CFG with nodes A, B, C, D, E; loop L = B, C, D]
• Treat body of innermost loop L like an acyclic sub-graph
• Fold loop L
• Costs of L: [formula not preserved in transcript]
• Continue with next innermost loop
[V. Suhendra et al., WCET Centric Data Allocation to Scratchpad Memory, RTSS 2005]
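To make the acyclic case concrete, the following sketch computes "the WCET of any path starting at A" for fixed block costs. In the ILP the same relation would presumably appear as inequalities over variables, because the block costs depend on the allocation decisions. All names, the example graph and the costs are hypothetical.

```python
from functools import lru_cache


def path_wcet(cfg, cost, entry):
    """Longest-path cost from `entry` in an acyclic control-flow graph.

    cfg:  dict mapping a block to a tuple of its successor blocks.
    cost: dict mapping a block to its (fixed) worst-case cost.
    """
    @lru_cache(maxsize=None)
    def w(block):
        # w(b) = cost(b) + max over successors; exit blocks have no successor.
        return cost[block] + max((w(s) for s in cfg.get(block, ())), default=0)

    return w(entry)


def fold_loop(body_wcet, max_iterations):
    """Cost of a folded loop: body cost times its maximum iteration count."""
    return body_wcet * max_iterations


# Hypothetical diamond-shaped CFG: A branches to B or C, both join in D, then E.
CFG = {"A": ("B", "C"), "B": ("D",), "C": ("D",), "D": ("E",)}
COST = {"A": 5, "B": 10, "C": 3, "D": 4, "E": 2}
```

A folded loop can then be inserted into the enclosing acyclic graph as a single node whose cost is `fold_loop(...)`, which mirrors the "continue with next innermost loop" step.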

20 Cross-Memory Jumps
• Allocation of consecutive BBs:
  ◦ Allocation of consecutive BBs in the CFG to different memories requires adaptation/insertion of dedicated jumping code
  ◦ Cross-memory jumps are costly
  ◦ Jumping code: variable overhead in terms of WCET_EST and code size, depending on decision variables
• Jump scenarios: a) implicit, b) unconditional, c) conditional
[Figure: the three jump scenarios between basic blocks b_i, b_j and b_k]

21 Penalties for Cross-Memory Jumps
• Penalties for jump scenarios (⊕ ≙ Boolean XOR):
• Penalty for implicit jumps: add a high penalty if BBs i and j are placed in different memories [formula not preserved in transcript]
• Penalty for unconditional jumps:
  ◦ If b_i and b_j are in different memories: [formula not preserved in transcript]
  ◦ If b_i and b_j are adjacent in the same memory: 0
  ◦ If b_i and b_j are not adjacent in the same memory: [formula not preserved in transcript]
• Conditional jumps: obvious combination of the implicit and unconditional penalties
[Figure: jump scenarios between basic blocks b_i, b_j and b_k]

22 Jump Penalties & Interprocedural Control Flow
• Jump penalties for basic block b_i: [formula not preserved in transcript]
• Modeling of the global control flow:
  ◦ A variable [symbol not preserved] models the cost of the WCEP starting at the entry block of function F
  ◦ If F’ calls F, this variable must be added to the WCET_EST of F’

23 Call Penalties
• Call penalties for basic block b_i: if b_i calls F, add the WCET_EST of F to the call penalty. Furthermore, add one penalty [symbol not preserved] if b_i contains a cross-memory call, and another [symbol not preserved] otherwise.
• Final control flow constraints per basic block b_i: add jump and call penalties to the variable modeling the WCET_EST of any path starting at b_i

24 Objective Function
• WCET_EST of entire program: [formula not preserved in transcript]
• A variable [symbol not preserved] models the WCET_EST of the entire program
Scratchpad Capacity
• Size of BB b_i depends on the actual jumping code for b_i:
  ◦ Size of jumping code for b_i: number of bytes for jumping code, depending on jump/call scenario
  ◦ Total size of basic block b_i: size of b_i without any jumping code plus size of b_i’s jumping code
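Since the slide's formulas are missing from the transcript, the objective and the capacity constraint are sketched here in the form the text suggests. All symbols are assumptions: $w^{\mathit{main}}_{\mathit{entry}}$ stands for the variable modeling the WCET_EST of the whole program, $x_i$ for the binary SPM decision of block $b_i$, $S_i$ for its size without jumping code, $s_i^{\mathit{jump}}$ for the allocation-dependent jumping-code size, and $S_{\mathit{SPM}}$ for the scratchpad capacity.

```latex
\min \; w^{\mathit{main}}_{\mathit{entry}}
\qquad \text{subject to} \qquad
\sum_{i} \bigl( S_i \cdot x_i + s_i^{\mathit{jump}} \bigr) \;\le\; S_{\mathit{SPM}}
```

Here $s_i^{\mathit{jump}}$ is assumed to be defined such that it is zero whenever $b_i$ is not placed in the SPM, so only SPM-resident bytes count against the capacity.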

25 Average WCET_EST for 73 Benchmarks
• Steady WCET_EST decreases for increasing SPM sizes
• WCET_EST reductions from 7% to 40%
[Chart: average WCET_EST over SPM size; labelled value: 7%]

26 Summary & Caveats
Summary
• Current state of the art:
  ◦ Neglects varying jumping code in basic blocks
  ◦ Selects one element of the power set of basic blocks
• Our approach:
  ◦ Models changing WCEPs
  ◦ Uses jump scenarios to cope with varying jumping code
Caveats
• Implicit control-flow model requires well-structured code
• No component-wise compilation
[H. Falk, J. Kleinsorge, Optimal Static WCET-aware Scratchpad Allocation of Program Code, DAC 2009]

27 Outline: Cache-aware Memory Content Selection

28 Cache-aware Memory Content Selection
• Compilers are good at dealing with registers (register/stack)
• WCC is good at SPM allocation (SPM/main memory)
• Aspects of cache-aware optimizations:
  ◦ Generally unresolved problem due to the system-wide influence of local decisions and generally unknown cache parameters
  ◦ Only generalized attempts on data, such as loop transformations, to improve the average access pattern
• For predictability and idle optimization potential in code:
  ◦ Divide the program into cached and uncached parts
  ◦ Software-controlled memory content selection to adapt to the actual access pattern

29 Cache-aware Memory Content Selection
Example for unprofitable memory layout:
• Mutual eviction of functions
• Could lead to a highly increased WCET_EST due to thrashing

    void foo1() {
      for (i = 0; i < 100; i++) {
        foo2();
        foo3();
        ...
      }
    }

30 Cache-aware Memory Content Selection
Basic idea:
• Step-wise allocation of functions to cached memory areas
• Select functions whose WCET_EST benefits most from cached execution
• Functions that are unprofitable w.r.t. a program’s WCET_EST must not evict profitable ones from the cache
• Hill-climbing approach with a “profit” metric: [formula not preserved in transcript]

31 Memory Content Selection Algorithm (1)

    LLIR mcs( LLIR P, Cache cache ):
      // Precompute profit
      profit = computeFunctionProfit( P )
      // Fill cache exactly once
      for_each( sort( F in P, profit ) ):
        allocate( F, cache )
        if cache.full: break
      // Perform WCET-aware cache-allocation
      ...

32 Memory Content Selection Algorithm (2)

      // “Overcharge” cache memory unless WCET_EST degrades
      wcet = computeWCET( P )
      profit = computeFunctionProfit( P )
      // As before: most profitable function first
      for_each( sort( F in P, profit ) ):
        allocate( F, cache )
        tmp = computeWCET( P )
        // Only keep improvements
        if ( wcet < tmp ):
          deallocate( F, cache )
        else
          wcet = tmp
          profit = computeFunctionProfit( P )
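A runnable approximation of the two phases is sketched below. It is a simplification under stated assumptions: the WCET analysis is replaced by a caller-supplied function `wcet_of` (in the talk this role is played by aiT), profits are static instead of being recomputed inside the loop, and `toy_wcet` is a made-up model in which functions `a` and `c` thrash when both are cached.

```python
def memory_content_selection(functions, profit, size, cache_size, wcet_of):
    """Greedy, WCET-driven choice of functions to place in cached memory.

    profit, size: dicts keyed by function name (static here; an assumption).
    wcet_of:      callable mapping a frozenset of cached functions to a
                  WCET estimate; stands in for a real WCET analyzer.
    """
    ranked = sorted(functions, key=lambda f: profit[f], reverse=True)

    # Phase 1: fill the cache exactly once, most profitable function first.
    cached, used = set(), 0
    for f in ranked:
        if used + size[f] > cache_size:
            break
        cached.add(f)
        used += size[f]

    # Phase 2: "overcharge" the cache, keeping a function only if the
    # WCET estimate does not degrade.
    wcet = wcet_of(frozenset(cached))
    for f in ranked:
        if f in cached:
            continue
        trial = wcet_of(frozenset(cached | {f}))
        if trial <= wcet:
            cached.add(f)
            wcet = trial
    return cached, wcet


def toy_wcet(cached):
    """Hypothetical WCET model: 'a' and 'c' evict each other when both cached."""
    wcet = 100
    if "a" in cached:
        wcet -= 30
    if "b" in cached:
        wcet -= 10
    if "a" in cached and "c" in cached:
        wcet += 50
    return wcet
```

Calling `memory_content_selection(["a", "b", "c"], {"a": 10, "b": 5, "c": 1}, {"a": 4, "b": 4, "c": 4}, 4, toy_wcet)` keeps `a` and `b` cached and rejects `c`, because caching `c` would raise the estimate.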

33 Results compared to unoptimized cache usage
[Chart: WCET_EST relative to unoptimized cache usage; labelled value: 20%]

34 Summary & Caveats
Conclusion
• Iterative approach ensures optimizing along a possibly switching WCEP
• Profitable functions are not evicted from the cache by functions that are unprofitable w.r.t. their WCET_EST
• Achieves WCET_EST reductions of up to 20%
Caveats
• Greedy approach (upside: direct, simple)
• Functions as allocation units might be too coarse
[S. Plazar, P. Lokuciejewski and P. Marwedel, WCET-driven Cache-aware Memory Content Selection, ISORC 2010]

35 Outline: Cache Partitioning for Multi-task Real-time Systems

36 Software-based Cache Partitioning
General thoughts on the presented optimization strategies:
• Until now, greedy relocation has been a successful strategy to get around intra-task cache conflicts, due to its tight coupling with static WCET analysis
• It fails in multi-task environments: the analysis is unaware of potential preemptions
• Safety can only be achieved by guaranteeing that no collisions occur
• Granularity: instructions (possibly splitting basic blocks)
Intuition:
• Divide the cache into partitions of optimal size
• Assign one task per partition to prevent mutual eviction

37 Software-based Cache Partitioning
• Exploit the cache addressing logic (index bits)
• Distribute memory blocks of tasks over the address space
• Ensure mapping to particular cache lines
• Effectively inverts the logical mapping direction
[Figure: address space with marks at 0x0, 0x80 and 0x100]
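The index-bit argument can be illustrated with a small sketch. It assumes a direct-mapped cache in which the set index is the line-granular address modulo the number of sets; the line size of 32 bytes and the function names are hypothetical, while 64 cache lines matches the figure on the following slide.

```python
def cache_set_index(address, line_size, num_sets):
    """Set index selected by the index bits of `address`."""
    return (address // line_size) % num_sets


def partition_chunks(first_set, part_sets, line_size, num_sets, count):
    """Address ranges (start, end) that map only to the cache sets
    first_set .. first_set + part_sets - 1.

    Code of one task would be split into such chunks and spread over the
    address space with a stride of one full cache size.
    """
    cache_bytes = line_size * num_sets
    chunk_bytes = part_sets * line_size
    chunks = []
    for k in range(count):
        start = k * cache_bytes + first_set * line_size
        chunks.append((start, start + chunk_bytes))
    return chunks
```

With 32-byte lines and 64 sets, `partition_chunks(16, 16, 32, 64, 2)` yields two 512-byte ranges whose addresses all fall into sets 16 to 31, so a task placed there cannot evict code of a task confined to other sets.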

38 WCET-aware Cache Partitioning
Greedy approach
• Partition size depends on the task’s code size
• Example: 4 tasks with the same code size
[Figure: cache lines 0 to 63 divided among Task 1 to Task 4]
Better
• ILP model to select an individual partition size per task
• Take the number of activations into account
[Figure: cache lines 0 to 63 divided among Task 1 to Task 4 with individual partition sizes]
[F. Müller, Compiler Support for Software-Based Cache Partitioning, 1995]
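The greedy baseline can be sketched as a proportional split of cache lines by code size. The rounding rule (leftover lines go to the largest tasks) and the function name are assumptions made for this illustration.

```python
def greedy_partition_sizes(code_sizes, cache_lines):
    """Split `cache_lines` among tasks in proportion to their code size.

    code_sizes: dict mapping task name to code size in bytes.
    Returns a dict mapping task name to its number of cache lines.
    """
    total = sum(code_sizes.values())
    sizes = {t: (cache_lines * s) // total for t, s in code_sizes.items()}
    # Integer division may leave a few lines unassigned; hand them to the
    # largest tasks (an arbitrary but deterministic tie-break).
    leftover = cache_lines - sum(sizes.values())
    for t in sorted(code_sizes, key=code_sizes.get, reverse=True)[:leftover]:
        sizes[t] += 1
    return sizes
```

Four tasks of equal code size sharing 64 lines receive 16 lines each, as in the slide's example. The split ignores how much each task actually benefits from cache space, which is the motivation for the ILP-based selection.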

39 ILP Formulation

40 ILP Formulation
• Each task must have a partition assigned: [formula not preserved in transcript]
• Keep track of the cache size: [formula not preserved in transcript]

41 ILP Formulation
• Partition-specific WCET per task: [formula not preserved in transcript]
• Objective function to minimize: [formula not preserved in transcript]
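The constraints named on these slides can be mimicked by an exhaustive search over partition sizes. This is a stand-in for an ILP solver, suitable only for tiny inputs, and it rests on assumptions: each task picks exactly one size from a precomputed table, the chosen sizes must fit into the cache, and the objective is the activation-weighted sum of partition-specific WCETs.

```python
from itertools import product


def best_partitioning(wcet, activations, cache_size):
    """Pick one partition size per task, minimizing the weighted WCET sum.

    wcet:        wcet[task][size] = WCET estimate of `task` with `size` lines.
    activations: dict mapping task to its number of activations (weights).
    Returns (objective, {task: size}) or None if nothing fits.
    """
    tasks = sorted(wcet)
    best = None
    # One size per task: "each task must have a partition assigned".
    for choice in product(*(sorted(wcet[t]) for t in tasks)):
        # "Keep track of the cache size".
        if sum(choice) > cache_size:
            continue
        # Partition-specific WCET per task, weighted by activations.
        total = sum(activations[t] * wcet[t][s] for t, s in zip(tasks, choice))
        if best is None or total < best[0]:
            best = (total, dict(zip(tasks, choice)))
    return best


# Hypothetical WCET table for two tasks and partition sizes of 1 to 3 lines.
WCET_TABLE = {"t1": {1: 100, 2: 60, 3: 55}, "t2": {1: 80, 2: 70, 3: 40}}
```

With equal weights and four available lines, the search settles on two lines per task. Tripling the activations of `t2` shifts the optimum to one line for `t1` and three for `t2`, which shows why activation counts matter.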

42 Results: MRTC benchmarks
Average of 100 sets of randomly selected tasks:
• 5 tasks: ~6 kB
• 10 tasks: ~12 kB
• 15 tasks: ~19 kB
[Chart: WCET relative to greedy approach]

43 Summary & Caveats
Conclusion
• Optimal partition sizes w.r.t. the overall system WCET
• Partitioning introduces predictability for preemptive schedules
• Average WCET reduction of 12% (5 tasks) up to 19% (15 tasks) compared to the greedy approach
Caveats
• “Zero-collision” policy can be too conservative depending on the actual cache logic and scheduling policy
• Pre-computation of partitions is time consuming
• Locality in address space (basic block splits, instruction corrections)
[S. Plazar, P. Lokuciejewski and P. Marwedel, WCET-aware Software Based Cache Partitioning for Multi-Task Real-Time Systems, WCET 2009]

44 Outline: Combination of Scratchpad Allocation, Memory Content Selection and Cache Partitioning

45 An Experiment: Combined Approach
Can SPM allocation, memory content selection and cache partitioning be combined?
• Intention is to fully exploit the memory hierarchy
• All three severely alter the memory layout due to relocation and partitioning
• Order of application is critical for good results
Example: MCS prior to SPM
[Figure: memory regions Cached, Uncached and SPM; Task i is split into Task i,j, Task i,k and Task i,l, which are split further into Task i,j,0 to Task i,j,2 and Task i,k,0 to Task i,k,1]

46 Combined Approach (1)
Reasoning about the order of application
• SPM allocation (SPMA) should be performed prior to Memory Content Selection (MCS) and Cache Partitioning (CP)
• CP prior to MCS:
  ◦ Similar to previous example: cache potentially under-utilized
• MCS prior to CP:
  ◦ CP only considers objects designated to be cached by MCS
  ◦ Likely that the greedy MCS decision was inappropriate given the potential exploited by a fine-grained partitioning
• Compute an MCS solution per partition in the precomputation of CP
• Apply the CP ILP to determine the optimal combination

47 Application in order:
• Effects of SPM, CP invoking MCS in preprocessing
[Figure labels: Remains uncached (MCS); Not affected by CP/MCS (SPMA)]

48 Evaluation
Gains in WCET_EST:
• crc, fft1, gsm_decode, trellis
[Chart: gain compared to unoptimized code over SPM size (%) and cache size (%); labelled values: 92%, 73%]

49 Conclusion
Remarks
• WCET-aware compilation:
  ◦ Compilers usually are unaware of timing
  ◦ Optimistic optimization strategies: no clearly defined objective
  ◦ “Maybe faster but could be worse” doesn’t quite cut it for hard real-time applications (profile-guided optimization no match)
  ◦ Fine-grained optimization decisions span from well-directed exploitation over conflict reduction to full conflict freedom
Challenges
• Multi-tasking: component-wise compilation, interaction, OS
• Multi-core: inter-core communication
• Tailor (fully) predictable but still highly configurable systems

51 Backup: RA

52 WCET-Aware Graph Coloring (1)

    LLIR WCET_GC_RA( LLIR P ):
      // Iterate until current WCEP is fully allocated.
      while ( true ):
        // Clone P, spill all VREGs of P’ onto stack.
        LLIR P’ = P.copy()
        P’.spillAllVREGs()
        // Compute Worst-Case Execution Path for fully spilled LLIR.
        set WCEP = computeWCEP( P’ )
        // If there are no more VREGs, the allocation loop is over.
        if ( getVREGs( WCEP ) == ∅ )
          break

53 WCET-Aware Graph Coloring (2)

        // Determine that block on the WCEP with highest product of
        // Worst-Case Execution Count * spilling instructions.
        basic_block b’ = getMaxSpillCodeBlock( WCEP )
        basic_block b = getBlockOfOriginalP( b’ )
        // Collect all VREGs of this most critical block.
        list vregs = getVREGs( b )
        // Sort VREGs by #occurrences, apply standard graph coloring.
        vregs.sort( occurrences of VREG in b )
        traditionalGraphColoring( P, vregs )
      end while
      // Allocate all remaining VREGs not lying on the WCEP.
      traditionalGraphColoring( P, getVREGs( P ) )
      return P

54 WCET-aware RA: spilling

55 Backup: SPM allocation

56 Timing Predictability of Caches & SPMs (G.721)
• SPMs are, in contrast to caches, highly predictable: WCET_EST scales with ACETs
[L. Wehmeyer, P. Marwedel, Influence of Memory Hierarchies on Predictability for Time Constrained Embedded Software, DATE 2005]

57 Support of the ILP by WCC Infrastructure
• WCET_EST of BB b_i for SPM and main memory: [symbols not preserved in transcript]
• Max. iteration counts of loop L: [symbol not preserved in transcript]
• Size of BB b_i: [symbol not preserved in transcript]
• SPM size = 47 kB
• SPM access = 1 cycle
• Flash access = 6 cycles
• Other parameters hard-coded: [symbols not preserved in transcript], …

58 WCET_EST for g721 encoder
• Steady WCET_EST decreases for increasing SPM sizes
• WCET_EST reductions from 29% to 48%
• X-axis: SPM size = x% of benchmark’s code size
• Y-axis: 100% = WCET_EST when not using SPM at all

59 WCET_EST for cover
• X-axis: SPM size = x% of benchmark’s code size
• Y-axis: 100% = WCET_EST when not using SPM at all
• Stepwise WCET_EST decreases: useful content is allocated to the SPM only at 40%, 70% and 100% relative SPM size
• WCET_EST reductions of 10%, 35% and 44%

60 WCET_EST for md5
• X-axis: SPM size = x% of benchmark’s code size
• Y-axis: 100% = WCET_EST when not using SPM at all
• Almost invariable WCET_EST reductions for all SPM sizes: 40% to 44%
• ILP clearly finds the tiny but time-critical hot-spot of md5 and allocates it to the SPM

61 Backup: Content selection

62 Cache-aware Memory Content Selection
Example for unprofitable memory layout:
• Mutual eviction of functions
• Could lead to a highly increased WCET_EST due to an increased number of possible cache misses
WCET_EST reduction: (350 - 195) + (690 - 470) = 375 cycles

63 Evaluation
• Infineon TriCore TC1796, 16 kB 2-way set associative cache (LRU), 2 MB program flash
• Employed the 10 largest benchmarks of our benchmark suites DSPstone, MediaBench, MiBench, MRTC, NetBench and UTDSP
• Code size ranges from 5 kB (v32.modem_bencode) up to 15 kB (the two rijndael benchmarks)
• Using optimization level -O3 (incl. procedure positioning)
• Artificially limit cache sizes to 5, 10 and 20% of overall code size

64 Optimization Time
• Most of the optimization time is consumed by repetitive WCET analyses employing aiT
• Maximal number of analyses amounts to: [formula not preserved in transcript]
• Test machine: Intel Xeon X3220 (2.4 GHz)
• rijndael_decoder: 6 WCET analyses consumed almost 2 hours of CPU time
• g721/g723_encode: 17 WCET analyses amount to 8 and 10 minutes of analysis time, respectively

65 Backup: Cache partitioning

66 Distribution of Code
• Achieved by exploiting the linker
• Each portion is assigned to its own section
• Example linker script for two tasks: [listing not preserved in transcript]

67 Memory usage

68 Optimization Time
• Host machine: Dual Xeon L5420 @ 2.50 GHz
• Using a single core
• Complete workflow consists of:
  ◦ Compilation: up to 3 minutes
  ◦ Analysis: up to 1 hour / task
  ◦ Optimization: up to 1 minute

69 Results: UTDSP benchmarks
Average of 100 sets of randomly selected tasks:
• 5 tasks: ~8 kB
• 10 tasks: ~18 kB
• 15 tasks: ~26 kB
[Chart: WCET relative to greedy approach]

70 Partitioning Overhead
• Average WCET increase
• Caused by additional jumps

71 Backup: Combined

72 Combined Approach (2)
Adaptation of algorithms for the multi-tasking model
• SPMA
  ◦ Requires a heuristic to assign memory space per task
  ◦ Three heuristics directly apparent:
    - WCET: ratio of single-task WCET to accumulated task-set WCET
    - CS: ratio of code size to accumulated code size
    - WCET&CS = (WCET/CS)/2: based on the assumption that larger portions of assigned space also yield performance improvements
• CP and MCS
  ◦ Restrict to functions not already allocated to the SPM
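The first two heuristics amount to a proportional split of the SPM, which the following sketch illustrates. The function name, the dictionary layout and the example figures are hypothetical, and the combined WCET&CS heuristic is left out.

```python
def spm_shares(tasks, spm_size, key):
    """Assign SPM space per task in proportion to `key`.

    tasks: dict mapping task name to a dict with 'wcet' and 'code_size'.
    key:   'wcet' for the WCET heuristic, 'code_size' for the CS heuristic.
    Returns a dict mapping task name to its SPM share in bytes.
    """
    total = sum(t[key] for t in tasks.values())
    return {name: spm_size * t[key] // total for name, t in tasks.items()}


# Hypothetical task set: WCET in cycles, code size in bytes.
TASKS = {
    "crc": {"wcet": 300, "code_size": 1000},
    "fft": {"wcet": 100, "code_size": 3000},
}
```

For a 4096-byte SPM, the WCET heuristic gives `crc` three quarters of the space, while the CS heuristic gives three quarters to `fft`. The two heuristics can therefore disagree substantially on the same task set.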

73 Evaluation
Sample results for multi-task set of:
• crc, g721 marcuslee decoder, h264dec_ldecode_block
[Chart: gain compared to unoptimized code over allowed relative cache size of full task-set; algorithms: CP, CP&MCS]

74 Backup: DemoCar


