Reconciling Compilers & Timing Analysis for Safety-Critical Real-Time Systems – WCET-aware Program Optimizations

Heiko Falk, Embedded Systems/Real-Time Systems, Ulm University, Germany
Jan C. Kleinsorge, Computer Science 12, TU Dortmund, Germany
Computer Science 12 – Design Automation for Embedded Systems (DAES)
© J.C. Kleinsorge | 2012-03-31 | CGO 2012 | Computer Science 12 | DAES

Outline
- WCET-aware Optimizations and Code Quality
- Graph Coloring Register Allocation
- Scratchpad Memory Allocation
- Cache-aware Memory Content Selection
- Cache Partitioning for Multi-task Real-time Systems
- Combination of Scratchpad Allocation, Memory Content Selection and Cache Partitioning
WCET-aware Optimizations and Code Quality
- WCET as objective function: actual speed-up, but also gains by enhancing analyzability
- Side-effects of changes on timing are hard to anticipate: issuing just a single instruction can lead to uncertainty regarding location, alignment, access pattern (cache), schedule (pipeline), branch prediction, etc.
- Code quality for WCET-aware optimizations: avoid dynamic dispatch, excessive inflation and layout changes without being clear about their effects
- In short: maintain predictability first
WCET-aware Optimizations and Systems (1)
[Figure: abstraction levels from expressions over virtual and physical instructions down to the raw binary ("BLOB"), spanning semantics, computation, layout, pipeline and system; uncertainty grows with each abstraction step. Open parameters per level: ideally just program input; location (accesses to busses, memories); order and registers; dependencies.]
WCET-aware Optimizations and Systems (2)
Practical heuristics to pick the right level of abstraction:
- Still on the WCEP? Will the decision change the WCEP?
- Are side-effects possible, and to what extent are they bounded?
- What is the overall impact on the system?
- How often do we need to re-evaluate intermediate solutions?
Uncertainty that cannot be tackled at any level: speculative execution, cache hierarchies (and replacement policies), timing anomalies in general, general I/O, etc.
However: the trend towards (many) simpler cores in fact improves the situation as far as per-task predictability is concerned.
Workflow of Graph Coloring Register Allocation
1. Initialization: build the interference graph G = (V, E) with V = {virtual registers} ∪ {K physical processor registers}; e = (v, w) ∈ E ⇔ v and w may never share the same physical register (i.e., v and w interfere)
2. Simplification: remove all nodes v ∈ V with degree < K
3. Spilling: after step 2, each node of G has degree ≥ K. Select one v ∈ V; mark v as potential spill; remove v from G
4. Repeat steps 2 and 3 until G = ∅
5. Coloring: successively re-insert nodes v into G in reverse order; if there is a free color k for v, color v; else, mark v as actual spill
[A. W. Appel, Modern Compiler Implementation in C, 1998]
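The five steps above can be sketched as a small, self-contained routine. This is a generic Chaitin-style allocator, not the WCC implementation; the graph representation and the highest-degree spill heuristic are illustrative assumptions.

```python
# Minimal sketch of Chaitin-style graph-coloring register allocation
# (simplify / potential spill / select). Illustrative only.

def color_graph(interference, K):
    """interference: dict node -> set of interfering neighbours.
    K: number of available colors (physical registers).
    Returns (coloring, actual_spills)."""
    graph = {v: set(n) for v, n in interference.items()}
    stack = []
    # Steps 2-4: simplify nodes with degree < K, otherwise pick a
    # potential spill (here: highest degree), until the graph is empty.
    while graph:
        low = [v for v in graph if len(graph[v]) < K]
        v = low[0] if low else max(graph, key=lambda u: len(graph[u]))
        stack.append(v)
        for n in graph.pop(v):
            graph[n].discard(v)
    # Step 5: re-insert in reverse removal order and try to color.
    colors, spills = {}, []
    for v in reversed(stack):
        used = {colors[n] for n in interference[v] if n in colors}
        free = [c for c in range(K) if c not in used]
        if free:
            colors[v] = free[0]
        else:
            spills.append(v)  # actual spill
    return colors, spills
```

A triangle of three mutually interfering registers needs three colors; with only two, exactly one node becomes an actual spill.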
Problem of Standard Graph Coloring
Step 3 (spilling): after step 2, each node of G has degree ≥ K. Select one v ∈ V; mark v as potential spill; remove v from G.
Which node v should be selected as potential spill? Common graph coloring implementations select ...
- the first node v according to the order in which VREGs were generated during code selection,
- the node with the highest degree in the interference graph, or
- a node with high degree and many DEFs/USEs in some inner loop, possibly depending on profiling data.
The result: uncontrolled spill code generation, potentially along the Worst-Case Execution Path (WCEP) defining the WCET!
WCET-aware Register Allocation
- Derived from classic Chaitin graph coloring
- Register allocation as a problem at the "tip" of the memory hierarchy
- Besides runtime overhead, spill code affects: instruction count, schedule, memory layout, cache accesses and patterns, etc.
- A WCET-aware optimization must take into account where to store data (the actual allocation decision), but also where to place (spill) instructions relative to the WCEP
- The catch: the optimization relies on WCET data provided by WCET analysis using aiT, but WCET data cannot be obtained while the code still contains virtual registers
WCET-aware RA: of Chickens and Eggs
Pessimistic register allocation:
1. Start by marking all VREGs as actual spills (each VREG is spilled; now the code is fully analyzable)
2. Perform WCET analysis, obtain the WCEP
3. Allocate the VREGs of the basic block b with the most worst-case spill code executions to PHREGs, using standard graph coloring on the original program
4. Re-evaluate the resulting new WCEP
5. Stop and allocate the rest if no more VREGs lie on the WCEP
Results – Worst-Case Execution Times
[Chart: WCET_EST per benchmark, with 100% = WCET_EST using standard graph coloring (highest-degree heuristic); highlighted values are 93%, 69% and 24%.]
Results – Average-Case Execution Times
[Chart: ACET per benchmark, with 100% = ACET using standard graph coloring (highest-degree heuristic); ACET reductions of 6% to 12%.]
Summary & Caveats
Summary:
- Standard graph coloring is unaware of worst-case properties and may thus lead to uncontrolled spill code generation along the WCEP
- WCET-aware register allocation: combination of standard graph coloring with a WCET-aware spill heuristic
- Average WCET reduction over 46 benchmarks: 31.2%
Caveats:
- "Bad" spills are not revocable and might unbalance pipeline load
- Experiments with a highly accurate ILP-based WCET-aware register allocation
[H. Falk, WCET-aware Register Allocation based on Graph Coloring, DAC 2009]
Caches vs. Scratchpad Memories (SPM)
Caches (processor → L1 cache → main memory):
- Hardware-controlled
- Cache contents difficult to predict statically
- Latencies of memory accesses highly variable
- WCET_EST often imprecise
- Caches often deactivated in hard real-time systems
Scratchpads (processor → SPM → main memory):
- No autonomous hardware
- SPM seamlessly integrated in the processor's address space
- Latencies of memory accesses constant
- WCET_EST extremely precise
- SPM contents to be defined by the compiler
Scratchpad Allocation: Variants and Caveats
Characteristics of code and data allocation:
- Data relocation and mutation "naturally" supported by the architecture
- Code relocation usually requires modification of instructions; locality is annihilated, and (potentially already optimized) runtime properties are destroyed
Static and dynamic scratchpad optimization:
- Static: precompute global and static relocation, maintain order (locations are therefore implicit)
- Dynamic: precompute the dynamic exchange of SPM contents; from the perspective of static analysis this is self-modifying code with static dispatch (overlaying targets)
Memory allocation is hard.
ILP for WCET-aware SPM Allocation of Code
Goal: determine the set of basic blocks to be allocated to the SPM, such that the selected basic blocks lead to an overall minimization of WCET_EST, under consideration of switching WCEPs.
Approach: integer-linear programming (ILP); optimality of results, so no need for backtracking techniques.
In the following: uppercase = constants, lowercase = variables.
Decision Variables & Costs
- Binary decision variables per basic block (BB) [formulas lost in transcription]
- Costs of basic block b_i: model the WCET_EST of b_i if it is allocated to main memory or to the SPM, respectively
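The slide's formulas did not survive transcription. A plausible reconstruction, consistent with the surrounding text (one binary variable per basic block, per-memory cost constants), might look as follows; this is an assumption, not the paper's exact notation:

```latex
% one binary decision variable per basic block b_i:
x_i = \begin{cases} 1 & \text{if } b_i \text{ is allocated to the SPM} \\
                    0 & \text{if } b_i \text{ stays in main memory} \end{cases}
% cost of b_i depending on its placement, with constants
% C_i^{spm}, C_i^{mm} = WCET_EST of b_i on SPM / in main memory:
c_i = x_i \cdot C_i^{\mathit{spm}} + (1 - x_i) \cdot C_i^{\mathit{mm}}
```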
Intraprocedural Control Flow
Modeling of a function's control flow (example CFG with blocks A, B, C, D, E and loop L = {B, C, D}):
- Acyclic sub-graphs: a variable models the WCET of any path starting at A
- (Reducible) loops: treat the body of the innermost loop L like an acyclic sub-graph, fold loop L and account its costs, then continue with the next innermost loop
[V. Suhendra et al., WCET Centric Data Allocation to Scratchpad Memory, RTSS 2005]
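The path-cost recurrence and the loop folding sketched on this slide can be written out; the notation below (w for path-cost variables, c for per-block costs, N_L for the loop's iteration bound) is assumed, since the slide's own formulas were lost:

```latex
% longest-path cost over an acyclic region: the path cost starting at
% b_i covers b_i's own cost plus the worst successor path,
w_{b_i} \;\ge\; c_i + w_{b_j}
  \qquad \text{for each CFG successor } b_j \text{ of } b_i ,
% folding a (reducible) loop L with entry block b_L and iteration
% bound N_L into a single node with cost
c_L \;=\; N_L \cdot w_{b_L}
```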
Cross-Memory Jumps
Allocation of consecutive BBs: allocating BBs that are consecutive in the CFG to different memories requires the adaptation/insertion of dedicated jumping code.
- Cross-memory jumps are costly
- Jumping code causes variable overhead in terms of WCET_EST and code size, depending on the decision variables
Jump scenarios between blocks b_i, b_j, b_k: a) implicit, b) unconditional, c) conditional
Penalties for Cross-Memory Jumps
Penalties for the jump scenarios (⊕ denotes Boolean XOR):
- Implicit jumps: add a high penalty if BBs b_i and b_j are placed in different memories
- Unconditional jumps: if b_i and b_j are in different memories, add the cross-memory penalty; if b_i and b_j are adjacent in the same memory, the penalty is 0; if b_i and b_j are not adjacent in the same memory, add the jump penalty
- Conditional jumps: the obvious combination of the implicit and unconditional cases
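The penalty terms involve the Boolean XOR x_i ⊕ x_j of two binary placement variables (x_i = 1 iff b_i is on the SPM). In an ILP such an XOR is typically linearized with an auxiliary binary variable; this is a standard construction, assumed here rather than taken from the slide:

```latex
% y_{ij} = x_i \oplus x_j, linearized with four constraints over
% binaries x_i, x_j, y_{ij} \in \{0, 1\}:
y_{ij} \ge x_i - x_j, \qquad y_{ij} \ge x_j - x_i,
y_{ij} \le x_i + x_j, \qquad y_{ij} \le 2 - x_i - x_j
% a cross-memory penalty then enters the cost, e.g. as
% P^{\mathit{jump}} \cdot y_{ij}
```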
Jump Penalties & Interprocedural Control Flow
- Jump penalties for basic block b_i [formulas lost in transcription]
- Modeling of the global control flow: a variable models the cost of the WCEP starting at F's entry block; if F' calls F, this cost must be added to the WCET_EST of F'
Call Penalties
- Call penalties for basic block b_i: if b_i calls F, add the WCET_EST of F to b_i's call penalty; furthermore, add a cross-memory penalty if b_i contains a cross-memory call, and nothing otherwise
- Final control flow constraints per basic block b_i: add the jump and call penalties to the variable modeling the WCET_EST of any path starting at b_i
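The consolidated per-block constraint described in the text can be sketched as follows; the symbols (w for path-cost variables, c_i for b_i's placement-dependent cost, p terms for the penalties) are assumed, since the slide's own formulas were lost:

```latex
% per-block control flow constraint: the worst-case path cost starting
% at b_i covers b_i's own cost, its jump and call penalties, and the
% worst successor path,
w_{b_i} \;\ge\; c_i + p_i^{\mathit{jump}} + p_i^{\mathit{call}} + w_{b_j}
  \qquad \text{for each CFG successor } b_j \text{ of } b_i
```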
Objective Function
- WCET_EST of the entire program: a dedicated variable models the WCET_EST of the entire program
- The size of BB b_i depends on the actual jumping code for b_i: the total size of b_i is the size of b_i without any jumping code plus the size of b_i's jumping code (a number of bytes depending on the jump/call scenario)
- Scratchpad capacity: the allocated blocks must fit into the SPM
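Putting the pieces together, the objective and the capacity constraint plausibly take the following shape; the symbols (w_main for the program's WCET_EST variable, x_i for the binary placement variables, s_i for total block sizes) are assumed, as the slide's formulas were lost:

```latex
\min\; w_{\mathit{main}}
  \qquad \text{(WCET}_{\mathit{EST}}\text{ of the entire program)}
\text{s.t.}\quad \sum_i x_i \cdot s_i \;\le\; S^{\mathit{SPM}}
% s_i: size of b_i including its jumping code;
% S^{SPM}: scratchpad capacity
```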
Average WCET_EST for 73 Benchmarks
- Steady WCET_EST decreases for increasing SPM sizes
- WCET_EST reductions from 7% to 40%
[Chart: average WCET_EST over SPM size]
Summary & Caveats
Summary:
- Current state of the art: neglects the varying jumping code in basic blocks and merely selects one element of the power set of basic blocks
- Our approach: models changing WCEPs and uses jump scenarios to cope with varying jumping code
Caveats:
- The implicit control-flow model requires well-structured code
- No component-wise compilation
[H. Falk, J. C. Kleinsorge, Optimal Static WCET-aware Scratchpad Allocation of Program Code, DAC 2009]
Cache-aware Memory Content Selection
- Compilers are good at dealing with registers (register/stack); WCC is good at SPM allocation (SPM/main memory)
Aspects of cache-aware optimizations:
- Generally an unresolved problem, due to the system-wide influence of local decisions and generally unknown cache parameters
- So far only generalized attempts on data, like loop transformations, to improve the average access pattern
For predictability and the idle optimization potential in code:
- Divide the program into cached and uncached parts
- Software-controlled memory content selection to adapt to the actual access pattern
Cache-aware Memory Content Selection
Example of an unprofitable memory layout: mutual eviction of functions could lead to a highly increased WCET_EST due to thrashing.

```c
void foo1(void) {
  int i;
  for (i = 0; i < 100; i++) {
    foo2();
    foo3();
    /* ... */
  }
}
```
Cache-aware Memory Content Selection
Basic idea: step-wise allocation of functions to cached memory areas.
- Select the functions whose WCET_EST benefits most from cached execution
- Functions that are unprofitable w.r.t. the program's WCET_EST must not evict profitable ones from the cache
- Hill-climbing approach with a "profit" metric [formula lost in transcription]
Memory Content Selection Algorithm (1)

```
LLIR mcs( LLIR P, Cache cache ):
  // Precompute profit
  profit = computeFunctionProfit( P )
  // Fill cache exactly once
  for_each( sort( F in P, profit ) ):
    allocate( F, cache )
    if cache.full: break
  // Perform WCET-aware cache-allocation...
```
Memory Content Selection Algorithm (2)

```
  // "Overcharge" cache memory unless WCET_EST degrades
  wcet = computeWCET( P )
  profit = computeFunctionProfit( P )
  // As before: most profitable function first
  for_each( sort( F in P, profit ) ):
    allocate( F, cache )
    tmp = computeWCET( P )
    // Only keep improvements
    if ( wcet < tmp ):
      deallocate( F, cache )
    else:
      wcet = tmp
      profit = computeFunctionProfit( P )
```
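The two phases can be condensed into a runnable sketch. The WCET analysis (aiT in the paper) is stubbed by a caller-supplied function, and the profit metric (estimated gain per byte) is an assumption of this sketch, not the paper's actual formula:

```python
# Runnable sketch of the two-phase greedy memory content selection.
# wcet_of stands in for the repeated WCET analysis; the per-function
# 'gain' values and the gain/size profit metric are assumptions.

def select_cached(functions, cache_size, wcet_of):
    """functions: dict name -> {'size': bytes, 'gain': estimated cycles
    saved when cached}; wcet_of(cached_set) -> WCET_EST of the program.
    Returns the set of functions placed in the cached memory area."""
    def profit(f):
        return functions[f]['gain'] / functions[f]['size']
    order = sorted(functions, key=profit, reverse=True)
    cached, used = set(), 0
    # Phase 1: fill the cache exactly once, most profitable first.
    for f in order:
        if used + functions[f]['size'] <= cache_size:
            cached.add(f)
            used += functions[f]['size']
        else:
            break
    # Phase 2: "overcharge" the cached area, but only keep a function
    # if the program's WCET_EST does not degrade.
    wcet = wcet_of(cached)
    for f in order:
        if f in cached:
            continue
        cached.add(f)
        tmp = wcet_of(cached)
        if tmp > wcet:
            cached.discard(f)  # revert: WCET_EST got worse
        else:
            wcet = tmp
    return cached
```

In phase 2 the WCET check is what keeps the greedy climb safe: a candidate that thrashes against already-selected functions raises the analyzed WCET and is immediately reverted.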
Results
[Chart: WCET_EST compared to unoptimized cache usage; reductions of up to 20%.]
Summary & Caveats
Summary:
- The iterative approach ensures optimization along a possibly switching WCEP
- Profitable functions are not evicted from the cache by functions that are unprofitable w.r.t. their WCET_EST
- Achieves WCET_EST reductions of up to 20%
Caveats:
- Greedy approach (upside: direct and simple)
- Functions as allocation units might be too coarse
[S. Plazar, P. Lokuciejewski, P. Marwedel, WCET-driven Cache-aware Memory Content Selection, ISORC 2010]
Software-based Cache Partitioning
General thoughts on the presented optimization strategies:
- Until now, greedy relocation has been a successful strategy to get around intra-task cache conflicts, due to the tight coupling with static WCET analysis
- It fails in multi-task environments: the analysis is unaware of potential preemptions, so safety can only be achieved by guaranteeing no collisions
- Granularity: instructions (possibly splitting basic blocks)
Intuition:
- Divide the cache into partitions of optimal size
- Assign one task per partition to prevent mutual eviction
Software-based Cache Partitioning
- Exploit the cache addressing logic (index bits)
- Distribute the memory blocks of the tasks over the address space (e.g., at 0x0, 0x80, 0x100) to ensure a mapping to particular cache lines
- This effectively inverts the logical mapping direction
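The index-bit trick can be illustrated with a toy direct-mapped cache. The parameters (64-byte lines, 32 sets) are illustrative, not the TC1796's, and the placement function is a simplified sketch of the idea, not the paper's code layout algorithm:

```python
# Toy model of software-based cache partitioning: choose each memory
# block's address so that its cache index bits always fall inside the
# task's partition. Parameters are illustrative.
LINE_SIZE = 64
NUM_SETS = 32

def cache_set(addr):
    """Cache set an address maps to (its index bits)."""
    return (addr // LINE_SIZE) % NUM_SETS

def place_in_partition(block_index, first_set, partition_sets):
    """Address for the block_index-th block of a task that owns the
    sets [first_set, first_set + partition_sets): consecutive blocks
    cycle through the partition's sets, and each wrap-around skips
    ahead by a full cache size in the address space."""
    wrap = block_index // partition_sets
    offset = block_index % partition_sets
    return (wrap * NUM_SETS + first_set + offset) * LINE_SIZE
```

Laying a task's code out this way leaves holes in its address range (hence the code-size overhead discussed later), but guarantees that two tasks with disjoint partitions can never evict each other.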
WCET-aware Cache Partitioning
- Greedy approach: the partition size depends on the task's code size (figure: cache lines 0 to 63 divided among 4 tasks with the same code size)
- Better: an ILP model to select an individual partition size per task, taking the number of activations into account
[F. Müller, Compiler Support for Software-Based Cache Partitioning, 1995]
ILP Formulation
[The slides' formulas were lost in transcription. The constraints described are:]
- Each task must have exactly one partition (size) assigned
- Keep track of the cache size: the partitions must not exceed it
- Partition-specific WCET per task
- Objective function: minimize the overall system WCET
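The decision space the ILP searches can be made concrete with a tiny exhaustive stand-in: one partition size per task, chosen from a candidate set, so that the sizes fit into the cache and the activation-weighted sum of the tasks' partition-specific WCETs is minimal. This is not the paper's ILP, and the candidate sizes and WCET table below are made up:

```python
from itertools import product

# Exhaustive-search stand-in for the cache-partitioning ILP: assign
# each task a partition size that fits the cache and minimizes the
# activation-weighted total WCET. Data is illustrative.

def best_partitioning(tasks, cache_lines, candidate_sizes):
    """tasks: list of (activations, {size: wcet_at_that_size}).
    Returns (best_total_wcet, best_sizes)."""
    best = (float('inf'), None)
    for sizes in product(candidate_sizes, repeat=len(tasks)):
        if sum(sizes) > cache_lines:
            continue  # partitions must fit into the cache
        total = sum(act * wcet_by_size[s]
                    for (act, wcet_by_size), s in zip(tasks, sizes))
        if total < best[0]:
            best = (total, sizes)
    return best
```

The example below shows why equal shares (the greedy scheme) can lose: a frequently activated task profits more from a large partition than a rarely activated one.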
Results: MRTC Benchmarks
Average of 100 sets of randomly selected tasks (5 tasks: ~6 kB, 10 tasks: ~12 kB, 15 tasks: ~19 kB total code size)
[Chart: WCET relative to the greedy approach]
Summary & Caveats
Summary:
- Optimal partition sizes w.r.t. the overall system WCET
- Partitioning introduces predictability for preemptive schedules
- Average WCET reductions of 12% (5 tasks) up to 19% (15 tasks) compared to the greedy approach
Caveats:
- The "zero-collision" policy can be too conservative, depending on the actual cache logic and scheduling policy
- Pre-computation of partitions is time consuming
- Locality in the address space suffers (basic block splits, instruction corrections)
[S. Plazar, P. Lokuciejewski, P. Marwedel, WCET-aware Software Based Cache Partitioning for Multi-Task Real-Time Systems, WCET 2009]
An Experiment: Combined Approach
- Can SPM allocation, memory content selection and cache partitioning be combined? The intention is to fully exploit the memory hierarchy
- All three severely alter the memory layout due to relocation and partitioning
- The order of application is critical for good results
- Example: MCS prior to SPM allocation [figure: per-task memory objects split across SPM, cached and uncached areas]
Combined Approach (1)
Reasoning about the order of application:
- SPM allocation (SPMA) should be performed prior to Memory Content Selection (MCS) and Cache Partitioning (CP)
- CP prior to MCS: similar to the previous example, the cache is potentially under-utilized
- MCS prior to CP: CP only considers objects designated to be cached by MCS; it is likely that the greedy MCS decision was inappropriate given the potential exploited by a fine-grained partitioning
- Therefore: compute an MCS solution per partition in the pre-computation phase of CP, then apply the CP ILP to determine the optimal combination
Application in order: effects of SPMA, then CP invoking MCS in preprocessing
[Figure: resulting memory map; some content remains uncached (MCS decision), while SPM-allocated content is not affected by CP/MCS.]
Evaluation
[Chart: gain compared to unoptimized code over SPM size (%) and cache size (%); WCET_EST gains of up to 92% and 73% for crc, fft1, gsm_decode and trellis.]
Conclusion
WCET-aware compilation:
- Compilers are usually unaware of timing
- Optimistic optimization strategies have no clearly defined objective; "maybe faster but could be worse" doesn't quite cut it for hard real-time applications (profile-guided optimization is no match)
- Fine-grained optimization decisions span from well-directed exploitation over conflict reduction to full conflict freedom
Challenges:
- Multi-tasking: component-wise compilation, interaction, OS
- Multi-core: inter-core communication
- Tailoring (fully) predictable but still highly configurable systems
Backup: RA
WCET-Aware Graph Coloring (1)

```
LLIR WCET_GC_RA( LLIR P ):
  // Iterate until current WCEP is fully allocated.
  while ( true ):
    // Clone P, spill all VREGs of P' onto stack.
    LLIR P' = P.copy()
    P'.spillAllVREGs()
    // Compute Worst-Case Execution Path for fully spilled LLIR.
    set WCEP = computeWCEP( P' )
    // If there are no more VREGs, the allocation loop is over.
    if ( getVREGs( WCEP ) == ∅ )
      break
```
WCET-Aware Graph Coloring (2)

```
    // Determine the block on the WCEP with the highest product of
    // worst-case execution count * spilling instructions.
    basic_block b' = getMaxSpillCodeBlock( WCEP )
    basic_block b = getBlockOfOriginalP( b' )
    // Collect all VREGs of this most critical block.
    list vregs = getVREGs( b )
    // Sort VREGs by #occurrences, apply standard graph coloring.
    vregs.sort( occurrences of VREG in b )
    traditionalGraphColoring( P, vregs )
  end while
  // Allocate all remaining VREGs not lying on the WCEP.
  traditionalGraphColoring( P, getVREGs( P ) )
  return P
```
WCET-aware RA: Spilling
[Slide content was a figure; lost in transcription.]
Backup: SPM Allocation
Timing Predictability of Caches & SPMs (G.721)
SPMs are, in contrast to caches, highly predictable: WCET_EST scales with ACET.
[L. Wehmeyer, P. Marwedel, Influence of Memory Hierarchies on Predictability for Time Constrained Embedded Software, DATE 2005]
Support of the ILP by WCC Infrastructure
Provided by the infrastructure [formulas lost in transcription]:
- WCET_EST of BB b_i for SPM and for main memory
- Maximum iteration counts of loop L
- Size of BB b_i
Other parameters are hard-coded: SPM size = 47 kB, SPM access = 1 cycle, flash access = 6 cycles, ...
WCET_EST for g721_encoder
X-axis: SPM size = x% of the benchmark's code size; Y-axis: 100% = WCET_EST when not using the SPM at all
- Steady WCET_EST decreases for increasing SPM sizes
- WCET_EST reductions from 29% to 48%
WCET_EST for cover
X-axis: SPM size = x% of the benchmark's code size; Y-axis: 100% = WCET_EST when not using the SPM at all
- Stepwise WCET_EST decreases: useful content is allocated to the SPM only at 40%, 70% and 100% relative SPM size
- WCET_EST reductions of 10%, 35% and 44%
WCET_EST for md5
X-axis: SPM size = x% of the benchmark's code size; Y-axis: 100% = WCET_EST when not using the SPM at all
- Almost invariable WCET_EST reductions for all SPM sizes: 40% to 44%
- The ILP clearly finds the tiny but time-critical hot-spot of md5 and allocates it to the SPM
Backup: Content Selection
Cache-aware Memory Content Selection
Example of an unprofitable memory layout: mutual eviction of functions could lead to a highly increased WCET_EST due to an increased number of possible cache misses.
WCET_EST reduction in the example: (350 - 195) + (690 - 470) = 375 cycles
Evaluation
- Infineon TriCore TC1796, 16 kB 2-way set-associative cache (LRU), 2 MB program flash
- Employed the 10 largest benchmarks of our benchmark suites: DSPStone, MediaBench, MiBench, MRTC, NetBench and UTDSP
- Code size ranges from 5 kB (v32.modem_bencode) up to 15 kB (the two rijndael benchmarks)
- Optimization level -O3 (incl. procedure positioning)
- Cache sizes artificially limited to 5, 10 and 20% of the overall code size
Optimization Time
- Most of the optimization time is consumed by repetitive WCET analyses employing aiT; the maximal number of analyses is bounded [formula lost in transcription]
- Test machine: Intel Xeon X3220 (2.4 GHz)
- rijndael_decoder: 6 WCET analyses consumed almost 2 hours of CPU time
- g721/g723_encode: 17 WCET analyses amount to 8 and 10 minutes of analysis time, respectively
Backup: Cache Partitioning
Distribution of Code
- Achieved by exploiting the linker
- Each portion is assigned to its own section
- Example linker script for two tasks [lost in transcription]
Memory Usage
[Slide content was a figure; lost in transcription.]
Optimization Time
- Host machine: dual Xeon L5420 @ 2.50 GHz, using a single core
- The complete workflow consists of: compilation (up to 3 minutes), analysis (up to 1 hour per task), optimization (up to 1 minute)
Results: UTDSP Benchmarks
Average of 100 sets of randomly selected tasks (5 tasks: ~8 kB, 10 tasks: ~18 kB, 15 tasks: ~26 kB total code size)
[Chart: WCET relative to the greedy approach]
Partitioning Overhead
[Chart: average WCET increase, caused by the additional jumps]
Backup: Combined
Combined Approach (2)
Adaptation of the algorithms to the multi-tasking model:
- SPMA requires a heuristic to assign memory space per task; three heuristics are directly apparent:
  - WCET: ratio of the single task's WCET to the accumulated task-set WCET
  - CS: ratio of code size to accumulated code size
  - WCET&CS = (WCET/CS)/2: based on the assumption that a larger portion of assigned space also yields a performance improvement
- CP and MCS are restricted to functions not already allocated to the SPM
Evaluation
Sample results for a multi-task set of: crc, g721_marcuslee_decoder, h264dec_ldecode_block
[Chart: gain compared to unoptimized code over the allowed relative cache size of the full task-set; algorithms: CP, CP&MCS]
Backup: DemoCar
[The DemoCar slides were figures; lost in transcription.]