Framework for Profile-Analysis Data-Layout Optimizations. Shai Rubin, Ras Bodik, Trishul Chilimbi. Microsoft Research, University of Wisconsin.

1 Framework for Profile-Analysis Data-Layout Optimizations
Shai Rubin, Ras Bodik, Trishul Chilimbi
Microsoft Research, University of Wisconsin

2 Data Layout Optimization (What)
Memory hierarchy: CPU, cache (1 cycle), memory (10^2 cycles), disk (10^6 cycles).
Reference sequence: A.x, B, A.z
DL optimization: increase spatial locality of data to prevent memory faults.
[Figure: cache blocks and memory pages touched over time by A.x, B, A.z, shown for the original data layout and for the modified data layout produced by DL optimization.]

3 Data Layout Optimization (How)
Scientific (array based) programs: reference summary from array dependence analysis (static); optimal for simple loops.
General purpose (pointer based) programs: reference summary from a reference trace (dynamic); heuristic.
[Diagram: Program → Data Layout Optimizer (chooses an optimal or “good” layout from the layout space) → Enforce layout (1. compile time, 2. runtime) → Program′]

4 Problems with Current Data-Layout Optimization
Computationally hard to find the optimal layout [Petrank].
Computationally hard to approximate the optimal layout [Petrank].
Implication: heuristics are not robust:
– will not work for all programs.
From our experience with heuristics:
– Field Reordering [Chilimbi PLDI ’99]: no improvement (on perl).
– Custom Memory Allocator [Seidl ASPLOS ’98]: degrades performance (on espresso).
Our approach: replace heuristic with feedback-driven search.

5 Searching For a Data Layout
Problem: perform a search in the data layout space. Look for:
– a “good” layout.
– an “easy” to enforce layout.
Search advantage:
– Robust: for each program, finds a “good” layout.
[Figure: the data layout space, showing the current program data layout, the optimal data layout, the “good” layouts, and the “good” + “easy” to enforce layouts.]

6 Is Search Practical?
Not clear: Enforce.
[Diagram: Reference Trace → Optimizer (Heuristic) → Data Layout (one of the possible layouts) → Enforce layout; search loop: Edit → Compile → Execute → Evaluate → Continue? → End]

7 Outline
Background and Problem Definition
Search is a solution, but may not be practical
– Making the search practical
Applications
Summary

8 Making the Search Practical
Framework for Data Layout Optimization (Data Layout Search Engine):
Compress(T) → CST
Data Object Analysis: DOA(CST, LS) → NLS
Layout Selector: LS(NLS, B, CST, SS) → DL
Enforce Layout: AL(DL, CST) → NT
Evaluate: Simulate(NT) → B
Continue(B)? Otherwise End.
Legend: T = reference trace; CST = compressed symbolic trace; LS (as an argument) = layout space; NLS = narrowed space; SS = search strategy; DL = data layout; NT = new trace; B = benefit.
Layout space: “good” and enforceable layouts (Class Splitting, Linearization, Field Reordering).
[Diagram: the search engine overlaid on the loop Edit → Compile → Execute → Evaluate → Continue? → End]

9 Trace Representation
Problem: a reference trace cannot be easily manipulated because it is too large (>10GB, >100M references).
Solution: compressed trace (using modified SEQUITUR).
Example:
- Trace: acbcbcbcbdbdbdbde
- SEQUITUR representation:
  S → acBBBAAe
  B → bc
  A → CC
  C → bd
Representation advantage:
- Compact; fits into main memory [Chilimbi PLDI ’01].
- Exposes repetitions (we use this later).
- It produces a symbolic trace (i.e., a terminal is a data object).
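To make the grammar on this slide concrete, here is a minimal sketch that expands the SEQUITUR-style rules back into the original trace. The dictionary encoding and the function name `expand` are illustrative assumptions, not part of the authors' framework; only the rules and the trace come from the slide.

```python
# Rules copied from the slide: upper-case symbols are non-terminals,
# lower-case symbols are terminals (data objects).
GRAMMAR = {
    "S": "acBBBAAe",
    "B": "bc",
    "A": "CC",
    "C": "bd",
}


def expand(symbol, grammar=GRAMMAR):
    """Recursively expand a symbol into the sub-trace it stands for."""
    if symbol not in grammar:  # terminal: a single data-object reference
        return symbol
    return "".join(expand(s, grammar) for s in grammar[symbol])


trace = expand("S")
print(trace)
```

Expanding `S` reproduces the 17-reference example trace from only four short rules, which is the repetition that the later simulation step exploits.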

10 Framework for Data-Layout Optimization
Data Layout Search Engine:
Compress(T) → CST
Data Object Analysis: DOA(CST, LS) → NLS
Layout Selector: LS(NLS, B, CST, SS) → DL
Enforce Layout: EL(DL, CST) → CST’
Evaluate: Simulate(NT) → B
Continue(B)? Otherwise End.
Legend: T = reference trace; CST = compressed symbolic trace; LS (as an argument) = layout space; NLS = narrowed space; SS = search strategy; DL = data layout; NT = new trace; B = benefit.
Layout space: “good” and enforceable layouts (Class Splitting, Linearization, Field Reordering).

11 Avoid re-compilation
Problem: data layout evaluation = (edit + compilation + simulation).
Solution: “pretend” that the program was edited and compiled.
Enforce Layout: symbolic trace + data layout → concrete address trace.
Single symbolic trace: A.x, B, A.z, B
Layout before: A.x → 10, A.z → 14, B → 20
Layout after: A.x → 30, A.z → 34, B → 20
User: Edit program → Compile → Run (simulate) → new concrete trace 30,20,34,20
Optimizer: Enforce Layout → Simulate → new concrete trace 30,20,34,20
Simple, but crucial for an efficient search.
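The Enforce Layout step on this slide amounts to a lookup: each symbolic reference is replaced by the address the candidate layout assigns to it. A minimal sketch follows, using the symbols and addresses from the slide; the function name `enforce_layout` and the dictionary encoding of a layout are assumptions made for illustration.

```python
# Symbolic trace and the two address assignments shown on the slide.
SYMBOLIC_TRACE = ["A.x", "B", "A.z", "B"]
LAYOUT_BEFORE = {"A.x": 10, "A.z": 14, "B": 20}
LAYOUT_AFTER = {"A.x": 30, "A.z": 34, "B": 20}


def enforce_layout(symbolic_trace, layout):
    """Map every symbolic reference to its address under the given layout."""
    return [layout[obj] for obj in symbolic_trace]


before = enforce_layout(SYMBOLIC_TRACE, LAYOUT_BEFORE)
after = enforce_layout(SYMBOLIC_TRACE, LAYOUT_AFTER)
print(before)
print(after)
```

Because the symbolic trace is collected once and only the mapping changes, a new candidate layout can be evaluated without editing, recompiling, or rerunning the program.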

12 Framework for Data-Layout Optimization
Data Layout Search Engine:
Compress(T) → CST
Data Object Analysis: DOA(CST, LS) → NLS
Layout Selector: LS(NLS, B, CST, SS) → DL
Enforce Layout: EL(DL, CST) → CST’
Evaluate: Simulate(CST’) → B
Continue(B)? Otherwise End.
Legend: T = reference trace; CST = compressed symbolic trace; LS (as an argument) = layout space; NLS = narrowed space; SS = search strategy; DL = data layout; B = benefit.
Layout space: “good” and enforceable layouts (Class Splitting, Linearization, Field Reordering).

13 Memoization: Efficient Trace Simulation
Evaluation using simulation: MissRate_T = Simulate(T);
Problem: simulation of the whole trace (T) is too expensive.
Solution: avoid re-simulation of repeated sub-traces.
T: bcbcbcbdbdbdbd
SEQUITUR representation:
  S → BBBAA
  B → bc
  A → CC
  C → bd
Memoization:
1. Simulate each “low level” rule, compute its memoization value.
   – For cache simulation: memoization value = CacheState [CS].
2. Recursively compose memoization values for “higher” rules.
  CS_C = Simulate′(C)
  CS_B = Simulate′(B)
  CS_A = CS_C ∘ CS_C
  CS_S = CS_B ∘ CS_B ∘ CS_B ∘ CS_A ∘ CS_A
MissRate_T =
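The composition idea on this slide can be illustrated with a small sketch. It assumes a direct-mapped cache and a made-up object-to-block layout, and it uses a per-rule summary (internal misses, first block seen per set, last block left per set) as the memoized value. That summary is an illustrative stand-in for the slide's CacheState, not the authors' actual encoding; all names below are hypothetical.

```python
GRAMMAR = {"S": "BBBAA", "B": "bc", "A": "CC", "C": "bd"}  # rules from the slide
BLOCK = {"b": 0, "c": 1, "d": 2}  # assumed layout: object -> cache block
NUM_SETS = 2                      # assumed direct-mapped cache with two sets

_memo = {}


def compose(x, y):
    """Summary of sub-trace X followed by sub-trace Y."""
    misses_x, first_x, last_x = x
    misses_y, first_y, last_y = y
    misses = misses_x + misses_y
    first = dict(first_x)
    for s, blk in first_y.items():
        if s in last_x:
            if last_x[s] != blk:  # Y's first access to set s evicts X's block
                misses += 1
        else:
            first[s] = blk        # set untouched by X: outcome depends on the past
    last = dict(last_x)
    last.update(last_y)
    return misses, first, last


def summarize(symbol):
    """Memoized summary: each grammar rule is summarized only once."""
    if symbol not in _memo:
        if symbol in GRAMMAR:
            parts = [summarize(s) for s in GRAMMAR[symbol]]
            result = parts[0]
            for part in parts[1:]:
                result = compose(result, part)
        else:
            s = BLOCK[symbol] % NUM_SETS
            result = (0, {s: BLOCK[symbol]}, {s: BLOCK[symbol]})
        _memo[symbol] = result
    return _memo[symbol]


def misses_from_cold_cache(symbol):
    misses, first, _ = summarize(symbol)
    return misses + len(first)    # every first touch of a set misses when cold


def expand(symbol):
    if symbol not in GRAMMAR:
        return symbol
    return "".join(expand(s) for s in GRAMMAR[symbol])


def simulate_directly(trace):
    """Reference implementation: walk the whole trace reference by reference."""
    cache, misses = {}, 0
    for obj in trace:
        s = BLOCK[obj] % NUM_SETS
        if cache.get(s) != BLOCK[obj]:
            misses += 1
            cache[s] = BLOCK[obj]
    return misses


print(misses_from_cold_cache("S"), simulate_directly(expand("S")))
```

Under these assumptions both functions report the same miss count, but the memoized version touches each of the four rules once instead of walking all 14 references.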

14 Outline
Background and Problem Definition
Search is a solution, but maybe not feasible
– Making the search practical:
  Trace representation
  Avoid recompilation
  Efficient simulation
Applications
Summary

15 Framework Application (1)
Application: an implementation of the framework that searches in a sub-space of the layout space.
Field Reordering:
– Objective: reduce number of cache misses.
– Sub-space: all possible (legal) orders of fields in (heap) objects.
– Our search strategy: (almost) exhaustive search.
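An exhaustive search over field orders can be sketched as follows. The field names, sizes, access trace, and the tiny single-block cache model are all hypothetical; the sketch only shows the shape of the search (enumerate orders, enforce each one on the symbolic trace, keep the order with the fewest misses), not the authors' implementation.

```python
from itertools import permutations

FIELD_SIZE = {"x": 8, "y": 8, "z": 8, "w": 8}     # hypothetical fields (bytes)
TRACE = ["x", "z", "x", "z", "y", "x", "z", "w"]  # hypothetical symbolic trace
BLOCK_SIZE = 16                                   # assumed cache block size
NUM_SETS = 1                                      # assumed single-block cache


def layout_for(order):
    """Assign consecutive offsets to the fields in the given order."""
    offsets, offset = {}, 0
    for field in order:
        offsets[field] = offset
        offset += FIELD_SIZE[field]
    return offsets


def count_misses(trace, offsets):
    """Simulate a direct-mapped cache over the trace under one field layout."""
    cache, misses = {}, 0
    for field in trace:
        block = offsets[field] // BLOCK_SIZE
        s = block % NUM_SETS
        if cache.get(s) != block:
            misses += 1
            cache[s] = block
    return misses


def best_order(fields, trace):
    """Try every field order and keep the one with the fewest misses."""
    return min(permutations(fields),
               key=lambda order: count_misses(trace, layout_for(order)))


best = best_order(list(FIELD_SIZE), TRACE)
print(best, count_misses(TRACE, layout_for(best)))
```

In this toy trace the search places `x` and `z` in the same block because they are accessed together most often. The number of orders grows factorially with the number of fields, which is why cheap evaluation of each candidate matters.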

16 Field Reordering: Exhaustive Search
We compared:
– Best field order found by our iterative search.
– Field orders produced by existing heuristics:
  Fields Temporal Affinity [Chilimbi PLDI ’99]
  Fields Access Frequency [Truong PACT ’98]
Runtime improvement: 0%-4.5%.

17 Custom Memory Allocator (CMA)
Objective: reduce number of page faults.
CMA can work well if it has a good placement function: a function that assigns dynamically allocated heap objects to memory pages (heaps).
Reference trace: ABABA
[Figure: address over time for the trace ABABA across Page 1 and Page 2 under two allocators. Allocator 1: poor locality. Allocator 2: good locality.]

18 CMA Placement Function (PF)
PF: map objects to heaps. PF(heap object) → int
malloc(size s){ }
How can we find a placement function using our framework?
– A placement function defines a data layout.
– Learn by measuring the benefits of its data layout.
– How: use a learning algorithm.
Profiling information: Profile(heap objects) → runtime attributes
Learner (decision tree learner): PF(Attributes) → int
Use the framework to evaluate PF.
Example decision tree on Size: size < 24 → 1; size ≥ 24 → 2
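A placement function of the kind shown in the slide's decision tree can be sketched as a one-line test on the allocation size. The threshold 24 comes from the slide; which branch maps to which heap number, the allocation list, and the helper `assign_heaps` are assumptions for illustration only.

```python
SIZE_THRESHOLD = 24  # split point from the slide's decision tree


def placement_function(size):
    """Return a heap id for an allocation; the branch-to-heap mapping is assumed."""
    return 1 if size < SIZE_THRESHOLD else 2


def assign_heaps(allocations, pf=placement_function):
    """Group (name, size) allocations by the heap the placement function picks."""
    heaps = {}
    for name, size in allocations:
        heaps.setdefault(pf(size), []).append(name)
    return heaps


allocations = [("a", 16), ("b", 64), ("c", 8), ("d", 24)]  # hypothetical objects
heaps = assign_heaps(allocations)
print(heaps)
```

Each candidate placement function defines a grouping like this, and therefore a data layout, so it can be scored by enforcing that layout on the symbolic trace and simulating the result.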

19 CMA Results
Program       Number of heaps
Espresso      2
Boxsim        8
Twolf         5
Perl          5
Ghostscript   10
Lp_solve      6
1 Relative to original working set size.

20 Contributions and Future Work
Formulate data layout optimization as a search process.
Build a framework for efficient search process.
Improve existing optimizations; enable new optimizations.
Framework limitations:
– Difficult to handle very large traces (>0.5B references).
– Requires some guidance from the programmer (search strategy).
Future work:
– Advanced search strategies that combine several optimizations.
– Other non-data-layout optimizations, e.g. prefetching.

