Lecture 9 Dynamic Compilation


1 Lecture 9 Dynamic Compilation
- Motivation & Background
- Overview
- Compilation Policy
- Partial Method Compilation
- Partial Dead Code Elimination
- Escape Analysis
- Results

Reference: "Partial Method Compilation Using Dynamic Profile Information", John Whaley, OOPSLA '01

2 I. Goals of This Lecture
- Beyond static compilation
- Example of a complete system
- Use of data flow techniques in a new context
- Experimental approach

3 Static/Dynamic
- Compiler: high-level → binary, static
- Interpreter: high-level, emulate, dynamic (example: MATLAB)
- Dynamic compilation: high-level → binary, dynamic
  - Machine-independent, dynamic loading
  - Cross-module optimization
  - Specialize the program using runtime information (without profiling)

4 High-Level/Binary
Binary translator: binary → binary; mostly dynamic
- Run code "as-is"
- Software migration (x86 → Alpha, Sun, Transmeta; PowerPC → x86)
- Virtualization (make hardware virtualizable)
- Dynamic optimization (DynamoRIO)
- Security (execute out of a code cache that is "protected")

5 Closed-world vs. Open-world
- Closed-world assumption (most static compilers): all code is available a priori for analysis and compilation.
- Open-world assumption (most dynamic compilers): code is not all available up front; arbitrary code can be loaded at run time.
- The open-world assumption precludes many optimization opportunities.
- Solution: optimistically assume the best case, but provide a way out if necessary (see the sketch below).
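
To make the open-world point concrete, here is a minimal, hypothetical Java sketch (the class name passed on the command line is made up): code that was never visible at compile time can still be loaded and run, so a dynamic compiler can only assume it has seen everything if it is prepared to back out of that assumption later.

    // Hypothetical sketch: why the open-world assumption matters. A class that is not
    // referenced anywhere at compile time can still be loaded and executed at run time.
    import java.util.function.Supplier;

    public class OpenWorldDemo {
        public static void main(String[] args) throws Exception {
            // Statically known code path.
            Supplier<String> s = () -> "built-in";
            System.out.println(s.get());

            // Open world: load a class chosen at run time (e.g. from a plugin list).
            // The compiler could not have analyzed this code ahead of time.
            if (args.length > 0) {
                Class<?> c = Class.forName(args[0]);                  // arbitrary code
                Object plugin = c.getDeclaredConstructor().newInstance();
                System.out.println(plugin);                           // may run an overridden toString()
            }
        }
    }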

6 II. Overview of Dynamic Compilation
- Interpretation/compilation policy decisions
  - Choosing what to compile, and how
- Collecting runtime information
  - Instrumentation
  - Sampling (a small sampling sketch follows below)
- Exploiting runtime information
  - e.g. frequently executed code paths
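
As a rough illustration of the sampling approach (a minimal sketch only; a real VM samples internally with far lower overhead, and this is not the system described in the lecture), a background thread can periodically record which method each application thread is currently executing:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Minimal sketch of sampling-based profiling. Every ~10 ms the profiler thread
    // records the top stack frame of each live thread; methods with high counts are "hot".
    public class SamplingProfiler implements Runnable {
        private final Map<String, Integer> counts = new ConcurrentHashMap<>();

        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                for (Map.Entry<Thread, StackTraceElement[]> e
                         : Thread.getAllStackTraces().entrySet()) {
                    StackTraceElement[] stack = e.getValue();
                    if (stack.length > 0) {
                        String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                        counts.merge(top, 1, Integer::sum);
                    }
                }
                try { Thread.sleep(10); } catch (InterruptedException ie) { return; }
            }
        }

        public Map<String, Integer> hotMethods() { return counts; }

        public static void main(String[] args) throws InterruptedException {
            SamplingProfiler p = new SamplingProfiler();
            Thread sampler = new Thread(p, "sampler");
            sampler.setDaemon(true);
            sampler.start();
            Thread.sleep(100);                 // ...the real workload would run here...
            System.out.println(p.hotMethods());
        }
    }

Instrumentation, by contrast, inserts explicit counters into the code itself; the counter-based sketch after slide 10 uses that style.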

7 Speculative Inlining
Virtual call sites are deadly:
- They kill optimization opportunities.
- Virtual dispatch is expensive on modern CPUs.
- They are very common in object-oriented code.

Speculatively inline the most likely call target, based on class hierarchy or profile information (see the hand-written sketch below). Many virtual call sites have only one target, so this technique is very effective in practice.
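
The following is a source-level illustration of the guard-plus-fallback idea (a JIT does this on compiled code, not in Java source; the Shape/Circle classes are hypothetical):

    // Guarded (speculative) inlining, written out by hand.
    interface Shape { double area(); }

    final class Circle implements Shape {
        final double r;
        Circle(double r) { this.r = r; }
        public double area() { return Math.PI * r * r; }
    }

    public class SpeculativeInlining {
        // Profile data says the receiver is almost always a Circle, so inline
        // Circle.area() behind a cheap type guard; otherwise fall back to the
        // normal virtual call (a real VM might deoptimize instead).
        static double areaSpeculative(Shape s) {
            if (s instanceof Circle) {            // guard: expected receiver type
                Circle c = (Circle) s;
                return Math.PI * c.r * c.r;       // inlined body of Circle.area()
            }
            return s.area();                      // slow path: virtual dispatch
        }

        public static void main(String[] args) {
            System.out.println(areaSpeculative(new Circle(2.0)));
        }
    }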

8 III. Compilation Policy
∆T_total = T_compile − (n_executions × T_improvement)

If ∆T_total is negative, our compilation policy decision was effective. We can try to:
- Reduce T_compile (faster compile times)
- Increase T_improvement (generate better code)
- Focus on large n_executions (compile hot spots)

80/20 rule (Pareto Principle): 20% of the work for 80% of the advantage.
(A toy break-even calculation follows below.)
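
A toy computation of this trade-off (all numbers are invented; only the break-even logic matters):

    // Break-even check for the compilation-policy formula above.
    public class CompilePolicy {
        public static void main(String[] args) {
            double tCompile     = 50.0;     // ms spent compiling the method
            double tImprovement = 0.02;     // ms saved per execution of compiled code
            long   nExecutions  = 10_000;   // how many more times the method will run

            double deltaTotal = tCompile - nExecutions * tImprovement;
            // Negative deltaTotal means compiling paid for itself.
            System.out.printf("delta = %.1f ms -> %s%n", deltaTotal,
                    deltaTotal < 0 ? "compile it" : "keep interpreting");

            // Break-even point: n_executions > T_compile / T_improvement.
            System.out.printf("break-even at %.0f executions%n", tCompile / tImprovement);
        }
    }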

9 Latency vs. Throughput
Tradeoff: startup speed vs. execution performance

                           Startup speed    Execution performance
    Interpreter            Best             Poor
    'Quick' compiler       Fair             Fair
    Optimizing compiler    Poor             Best

10 Multi-Stage Dynamic Compilation System
Stage 1: interpreted code
    | when execution count = t1 (e.g. 2000)
    v
Stage 2: compiled code
    | when execution count = t2
    v
Stage 3: fully optimized code

The execution count is the sum of method invocations and back edges executed.

Our dynamic compilation system uses a three-stage system. In Stage 1, all methods are interpreted. When the execution count for a method exceeds a certain threshold, t1, we transition to Stage 2, a compiled version. When the execution count hits another threshold, t2, we transition to Stage 3, a fully optimized version of the method. (A counter-based sketch of these transitions follows below.)
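
A minimal counter-based sketch of these transitions (not the actual VM code; the value of t2 and the compile hooks are placeholders, since the slide's example value for t2 is not given):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Execution counts are bumped on every method invocation and back edge;
    // methods are promoted when they cross the thresholds.
    public class TieredProfiler {
        enum Tier { INTERPRETED, COMPILED, OPTIMIZED }

        static final int T1 = 2_000;    // example value from the slide
        static final int T2 = 50_000;   // arbitrary illustrative value

        private final Map<String, Integer> counts = new ConcurrentHashMap<>();
        private final Map<String, Tier> tiers = new ConcurrentHashMap<>();

        // Called by the interpreter/baseline code on each invocation and back edge.
        void tick(String method) {
            int c = counts.merge(method, 1, Integer::sum);
            Tier t = tiers.getOrDefault(method, Tier.INTERPRETED);
            if (t == Tier.INTERPRETED && c >= T1) {
                tiers.put(method, Tier.COMPILED);     // Stage 1 -> Stage 2
                // compileBaseline(method);           // hypothetical hook
            } else if (t == Tier.COMPILED && c >= T2) {
                tiers.put(method, Tier.OPTIMIZED);    // Stage 2 -> Stage 3
                // compileOptimized(method);          // hypothetical hook
            }
        }

        public static void main(String[] args) {
            TieredProfiler p = new TieredProfiler();
            for (int i = 0; i < 3_000; i++) p.tick("Foo.bar");
            System.out.println(p.tiers.get("Foo.bar"));   // COMPILED
        }
    }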

11 Granularity of Compilation
- Compilation takes time proportional to the amount of code being compiled, and many optimizations are not linear.
- Methods can be large, especially after inlining.
- Cutting inlining too much hurts performance considerably.
- Even "hot" methods typically contain some code that is rarely or never executed.

All of the previous techniques compile a method at a time. However, when we compile a method at a time, we really don't have much choice in the matter: methods can be large, especially after performing inlining, and cutting inlining too much can hurt performance considerably. Even when we are frugal about inlining, methods can still become very large.

12 Example: SpecJVM db
    void read_db(String fn) {
        int n = 0, act = 0, b;
        byte buffer[] = null;
        try {
            FileInputStream sif = new FileInputStream(fn);
            buffer = new byte[n];
            // Hot loop:
            while ((b = sif.read(buffer, act, n - act)) > 0) {
                act = act + b;
            }
            sif.close();
            if (act != n) {
                /* lots of error handling code, rare */
            }
        } catch (IOException ioe) {
            /* lots of error handling code, rare */
        }
    }

Here is an example from the SpecJVM db benchmark. This method contains a hot loop, and that loop contains a method call. However...

13 Example: SpecJVM db
Lots of rare code! (Same method as on the previous slide.)

This method also contains lots of error handling code that is rarely or never executed. The "read" method called in the inner loop itself also contains hot and not-hot portions.

14 Optimize hot “regions”, not methods
- Optimize only the most frequently executed segments within a method.
- Simple technique: any basic block executed during Stage 2 is said to be hot.
- Beneficial secondary effect of improving optimization opportunities on the common paths.

The problem is that the regions that are important to compile have nothing to do with method boundaries. By using a method granularity, the compiler wastes a lot of time compiling and optimizing large pieces of code that do not matter.

15 Method-at-a-time Strategy
[Graph: % of basic blocks compiled (vertical axis) vs. execution threshold (horizontal axis, semi-log scale)]

Now let's compare the performance of a method-at-a-time strategy with an optimal strategy. This graph shows the percentage of basic blocks that are compiled using a method-at-a-time strategy at different compilation threshold values. The vertical axis gives the percentage of basic blocks that are compiled, and the horizontal axis gives the execution thresholds; note that the horizontal axis is on a semi-log scale. At threshold one, all basic blocks in all methods that have ever been executed are compiled, so the value is one hundred percent. At threshold 5000, the percentage ranges from about 3% up to about 75%, depending on the application.

Benchmark notes: "check" is an anomaly because it is code that exists just to check the correctness of the JVM and is not indicative of real programs; "mtrt" is a raytracer with a tight innermost loop; "javac" is a realistic workload.

16 Actual Basic Blocks Executed
[Graph: % of basic blocks actually executed (vertical axis) vs. execution threshold (horizontal axis, log scale)]

Now let's look at how many basic blocks are actually executed at different threshold values. The first thing to notice is that even at threshold one, on average only slightly more than half of the basic blocks are ever executed. As the threshold value increases, the percentage continues to drop. This shows that a large amount of code is rare, even in hot methods. Note that because the horizontal axis is a log scale, we see a rather gentle descent; if this were graphed on a linear scale, the drop-off would be much sharper.

17 Dynamic Code Transformations
- Compiling partial methods
- Partial dead code elimination
- Escape analysis

18 IV. Partial Method Compilation
Based on profile data, determine the set of rare blocks.
- Use the code coverage information from the first compiled version (a sketch follows below).

First, based on profile data, we determine the set of rare blocks. We use the code coverage information described earlier from Stage 2: any basic block that is never executed in Stage 2 is said to be rare.
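
A minimal sketch of deriving the rare set from Stage 2 coverage (the block IDs and the coverage source are hypothetical):

    import java.util.HashSet;
    import java.util.Set;

    public class RareBlocks {
        // Every block of the method that was never executed in Stage 2 is rare.
        static Set<Integer> rareBlocks(Set<Integer> allBlocks, Set<Integer> executedInStage2) {
            Set<Integer> rare = new HashSet<>(allBlocks);
            rare.removeAll(executedInStage2);
            return rare;
        }

        public static void main(String[] args) {
            Set<Integer> all = Set.of(0, 1, 2, 3, 4, 5);
            Set<Integer> executed = Set.of(0, 1, 2, 5);     // coverage bits from Stage 2
            System.out.println(rareBlocks(all, executed));  // contains 3 and 4
        }
    }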

19 Partial Method Compilation
Perform live variable analysis.
- Determine the set of live variables at rare block entry points (e.g. live: x, y, z). A small liveness sketch follows below.

Next, before we perform any code transformations or optimizations, we perform live variable analysis on the code and determine the set of live variables at each of the rare block entry points.
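
For reference, a small sketch of the textbook backward dataflow that computes liveness over a CFG (in[b] = use[b] ∪ (out[b] \ def[b]), out[b] = union of in[s] over the successors s of b); the CFG and variable names below are hypothetical:

    import java.util.*;

    public class Liveness {
        static Map<Integer, Set<String>> compute(
                Map<Integer, List<Integer>> succ,    // block -> successor blocks
                Map<Integer, Set<String>> use,       // variables read before any write
                Map<Integer, Set<String>> def) {     // variables written
            Map<Integer, Set<String>> in = new HashMap<>();
            for (Integer b : succ.keySet()) in.put(b, new HashSet<>());

            boolean changed = true;
            while (changed) {                        // iterate to a fixed point
                changed = false;
                for (Integer b : succ.keySet()) {
                    Set<String> out = new HashSet<>();
                    for (Integer s : succ.get(b)) out.addAll(in.get(s));
                    Set<String> newIn = new HashSet<>(out);
                    newIn.removeAll(def.get(b));
                    newIn.addAll(use.get(b));
                    if (!newIn.equals(in.get(b))) { in.put(b, newIn); changed = true; }
                }
            }
            return in;                               // in[b]: variables live at entry of b
        }

        public static void main(String[] args) {
            // Block 2 plays the role of a rare block that reads x and y.
            Map<Integer, List<Integer>> succ = Map.of(0, List.of(1), 1, List.of(2), 2, List.of());
            Map<Integer, Set<String>> use = Map.of(0, Set.of(), 1, Set.of(), 2, Set.of("x", "y"));
            Map<Integer, Set<String>> def = Map.of(0, Set.of("x", "y"), 1, Set.of("z"), 2, Set.of());
            System.out.println(compute(succ, use, def).get(2));   // x and y live at the rare entry
        }
    }

The set computed at each rare block entry is exactly what the transfer map recorded later (slide 22) has to describe.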

20 Partial Method Compilation
Redirect the control flow edges that targeted rare blocks, and remove the rare blocks (the redirected edges now branch "to interpreter..."); see the sketch below.

Third, we redirect the control flow edges that targeted the rare blocks, and then remove the rare blocks as unreachable code.
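
A sketch of that CFG rewrite, assuming a hypothetical successor-list representation of the control flow graph:

    import java.util.*;

    public class RedirectRareEdges {
        // Redirect every edge into a rare block to a synthetic "interpreter transfer"
        // block, then drop the rare blocks, which are now unreachable.
        static void redirect(Map<Integer, List<Integer>> succ,
                             Set<Integer> rare,
                             int interpreterTransferBlock) {
            for (Map.Entry<Integer, List<Integer>> e : succ.entrySet()) {
                if (rare.contains(e.getKey())) continue;           // removed below anyway
                List<Integer> targets = e.getValue();
                for (int i = 0; i < targets.size(); i++) {
                    if (rare.contains(targets.get(i))) {
                        targets.set(i, interpreterTransferBlock);   // edge now exits to the interpreter
                    }
                }
            }
            succ.keySet().removeAll(rare);
        }

        public static void main(String[] args) {
            Map<Integer, List<Integer>> succ = new HashMap<>();
            succ.put(0, new ArrayList<>(List.of(1, 3)));            // block 3 is rare
            succ.put(1, new ArrayList<>(List.of(2)));
            succ.put(2, new ArrayList<>());
            succ.put(3, new ArrayList<>(List.of(2)));
            redirect(succ, Set.of(3), 99);                          // 99: interpreter transfer block
            System.out.println(succ);                               // edge 0->3 became 0->99; block 3 is gone
        }
    }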

21 Partial Method Compilation
Perform compilation normally.
- Analyses treat the interpreter transfer point as an unanalyzable method call.

Next, we perform compilation normally. All of the analyses and optimizations simply treat the interpreter transfer point as an unanalyzable method call. One of the most attractive aspects of this technique is that it doesn't require any changes to any of the existing optimizations and analyses.

22 Partial Method Compilation
Record a map for each interpreter transfer point.
- In code generation, generate a map that specifies the location, in registers or memory, of each of the live variables (e.g. live: x, y, z with x: sp - 4, y: R1, z: sp - 8).
- Maps are typically < 100 bytes each. (A data-structure sketch follows below.)

Finally, in code generation, we record a map for each interpreter transfer point. This map specifies the location, in a register or in memory, where each live variable can be found. These maps are typically less than 100 bytes each.
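
A small data-structure sketch of such a map (the variable names, register and offsets are the slide's example; the representation itself is hypothetical):

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class TransferMap {
        // A location is either a register name or a stack offset relative to sp.
        record Location(String register, Integer spOffset) {
            static Location reg(String r)  { return new Location(r, null); }
            static Location stack(int off) { return new Location(null, off); }
            @Override public String toString() {
                return register != null
                        ? register
                        : "sp " + (spOffset >= 0 ? "+ " : "- ") + Math.abs(spOffset);
            }
        }

        private final Map<String, Location> locations = new LinkedHashMap<>();

        void put(String var, Location loc) { locations.put(var, loc); }

        public static void main(String[] args) {
            // The slide's example: live variables x, y, z at one transfer point.
            TransferMap m = new TransferMap();
            m.put("x", Location.stack(-4));
            m.put("y", Location.reg("R1"));
            m.put("z", Location.stack(-8));
            System.out.println(m.locations);   // {x=sp - 4, y=R1, z=sp - 8}
        }
    }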

23 V. Partial Dead Code Elimination
Move computation that is only live on a rare path into the rare block, saving computation in the common case.

The first optimization is partial dead code elimination. We made a slight modification to our existing dead code elimination algorithm to treat rare blocks specially: computation that is only live on a rare path is moved into the rare block, which saves computation in the common case.

24 Partial Dead Code Example
Before:
    x = 0;
    ...
    if (rare branch 1) {
        ...
        z = x + y;
    }
    if (rare branch 2) {
        ...
        a = x + z;
    }

After:
    if (rare branch 1) {
        x = 0;
        ...
        z = x + y;
    }
    if (rare branch 2) {
        x = 0;
        ...
        a = x + z;
    }

Here is a really simple example: x is defined on the main path but only used on the rare paths, so the algorithm copies the defining instruction into the rare blocks and removes it from the common path.

25 VI. Escape Analysis
- Escape analysis finds objects that do not escape a method or a thread.
- "Captured" by a method: the object can be allocated on the stack or in registers.
- "Captured" by a thread: synchronization operations on the object can be avoided.
- All Java objects are normally heap allocated, so this is a big win.

We also modified our pointer and escape analysis to support rare-block information. Treating an entrance to the rare path as a method call is a conservative assumption: the call could perform arbitrary operations, much more than the original code. Typically, though, this does not matter because there are no merges...

26 Escape Analysis
- Stack-allocate objects that don't escape in the common blocks.
- Eliminate synchronization on objects that don't escape the common blocks.
- If a branch to a rare block is taken:
  - Copy stack-allocated objects to the heap and update pointers.
  - Reapply the eliminated synchronizations.
  (A hand-written illustration of this fix-up follows below.)
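
A hand-written illustration of the common-path/rare-path split (purely illustrative; the VM does this on compiled code, and the Point class below is hypothetical): on the common path the object never exists on the heap, and only when the rare branch is taken is an equivalent heap object materialized for the unoptimized code to use.

    class Point { int x, y; Point(int x, int y) { this.x = x; this.y = y; } }

    public class RarePathMaterialize {
        static int hot(int a, int b, boolean rareCondition) {
            // "Stack-allocated" / scalar-replaced object: just two locals, no heap
            // allocation and no synchronization on the common path.
            int px = a, py = b;
            if (rareCondition) {
                // Rare path: materialize the object on the heap and hand it to the
                // slow path (standing in for the interpreter transfer).
                Point p = new Point(px, py);
                return rareSlowPath(p);
            }
            return px + py;                    // common path: no object was ever allocated
        }

        static int rareSlowPath(Point p) { return p.x * p.y; }

        public static void main(String[] args) {
            System.out.println(hot(3, 4, false));   // fast path
            System.out.println(hot(3, 4, true));    // rare path with heap object
        }
    }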

27 VII. Run Time Improvement
[Graph: run time per benchmark, normalized to the original, three bars each]
- First bar: original run time with whole-method optimization (normalized to 100%)
- Second bar: partial method compilation (PMC)
- Third bar: PMC plus the two optimizations described above
- Bottom bar: execution time if the code had been compiled/optimized from the beginning

This graph shows the run time improvement from using our technique. The partial method compilation technique by itself improves run times by an average of 5%. Adding the two optimizations described above improves run times further.

