Fully Dynamic Specialization AJ Shankar OSQ Lunch 9 December 2003.


“That’s Why They Play the Game”
Programs are executed because we can’t determine their behavior statically!
Idea: optimize programs dynamically to take advantage of runtime information we can’t get statically
  - Look at portions of the program for predictable inputs that we can optimize for

Specialization
Recompile portions of the program, using known runtime values as constants
  - Possibly many variants of the same code
  - Allow for fallback to original code when assumptions are not met
  - Predictable == recurrent
[Figure: a generic code region (G) and specialized variants (P2, P3, P4); regions labeled Unpredictable and Predictable]

How It Works
Choose a good region of code to specialize: after a good predictable instruction
Insert dispatch that checks the result of the chosen instruction
Recompile code for different results of the instruction
During execution, jump to appropriate specialized code
[Figure: the chosen instruction (labels: X = …, LOAD pc) followed by Dispatch(X), which branches to Spec1, Spec2, or Default, then the rest of the code]
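The dispatch step can be sketched in a few lines of Python. This is only an illustration: `make_dispatch`, `generic_scale`, and the two hand-written variants are hypothetical names, and a real specializer would generate the variants by recompiling code rather than taking them as lambdas.

```python
from typing import Callable, Dict

Variant = Callable[[int], int]


def make_dispatch(specialized: Dict[int, Variant],
                  default: Callable[[int, int], int]) -> Callable[[int, int], int]:
    """Build a dispatcher keyed on x, the result of the trigger instruction."""
    def dispatch(x: int, arg: int) -> int:
        variant = specialized.get(x)
        if variant is not None:
            return variant(arg)   # x is one of the expected values
        return default(x, arg)    # assumption not met: fall back to generic code
    return dispatch


def generic_scale(x: int, arg: int) -> int:
    """Original, unspecialized code: the multiplier x is only known at run time."""
    return x * arg


# Hand-written stand-ins for code recompiled with x as a constant.
scale = make_dispatch(
    {0: lambda arg: 0,            # x == 0: the multiply folds to a constant
     2: lambda arg: arg + arg},   # x == 2: the multiply becomes an add
    generic_scale,
)
```

Calling `scale(2, 21)` takes the variant for x == 2, while `scale(7, 3)` misses every variant and runs the generic code.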

Tying Things Together
If Foo is specialized on X
And because of X, Y is constant
And Foo calls Bar with param Y
And Bar is specialized on Y
Foo can jump straight to that specialized version of Bar
[Figure: Method Foo (Dispatch, Spec_X, call to Bar(Y)) linked directly to Spec_Y inside Method Bar (Dispatch, Spec_Y, Spec_Z, …)]

When Is This a Good Idea?
Any app whose execution is heavily dependent on input
For instance:
  - Interpreters
  - Raytracers
  - Dynamic content producers (CGI scripts, etc.)

Specialization Is Hard!
Specializing code at runtime is costly
  - Can even slow the program down
Existing specializers rely on static annotations to clue them in about profitable areas
  - Difficult to get right
  - Limits specialization potential

Existing: DyC, Cyclone, etc.
Explicitly annotate static data
No support for automatic specialization of frequently executed code
  - Could compile lots of useless stuff
No concrete store information
  - Doesn’t take advantage of the fact that memory location X is constant for the lifetime of the program

Existing: Calpa
Mock et al., 2000. Extension to DyC.
Profile execution on sample input to derive annotations
But converting a concrete profile to an abstract annotation means
  - Still unable to detect concrete memory constants
  - Frequently executed code for arbitrary input?
Still needs source, is offline!

Motivating Example: Interpreter
    while (1) {
        i = instrs[pc];
        switch (i.opcode) {
        case ADD:
            env[i.res] = env[i.op1] + env[i.op2];
            pc++;
            break;
        case BNEQ:
            if (env[i.op1] != 0)
                pc = env[i.op2];
            else
                pc++;
            break;
        ...
        }
    }
Sample interpreted program:
    X = 10;
    …
    WHILE (Z != 0) {
        Y = X+Z;
        …
    }
X is constant after initialization (a concrete memory location)
Y = X+Z is executed frequently

Motivating Example: Interpreter
Generic interpreter loop and sample program (X = 10; … WHILE (Z != 0) { Y = X+Z; … }) as on the previous slide. Specialized on pc == 15, the loop becomes:
    while (1) {
        while (pc == 15) {
            // Y = X + Z
            env[3] = 10 + env[2];
            …
            // Z != 0 ?
            if (env[2] == 0)
                pc = 19;
        }
        // otherwise: normal loop
    }
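To make the transformation concrete, here is a small runnable sketch of a generic interpreter loop next to a hand-specialized one. Everything in it is an assumption for illustration: the tuple instruction encoding, the slot numbers `X`, `Z`, `Y`, `M1`, the extra decrement instruction that makes the loop terminate, and a BNEQ whose target is an immediate pc rather than a value read from `env`. The specialized variant is written by hand, where the talk proposes generating it at runtime.

```python
ADD, BNEQ, HALT = "ADD", "BNEQ", "HALT"
X, Z, Y, M1 = 0, 1, 2, 3          # hypothetical env slots; env[M1] holds -1

PROGRAM = [
    (ADD, Y, X, Z),               # pc 0: Y = X + Z
    (ADD, Z, Z, M1),              # pc 1: Z = Z - 1
    (BNEQ, Z, 0, None),           # pc 2: if Z != 0 goto 0
    (HALT, None, None, None),     # pc 3
]


def step(instrs, env, pc):
    """One iteration of the generic loop; returns the next pc, or None on HALT."""
    op, a, b, c = instrs[pc]
    if op == ADD:
        env[a] = env[b] + env[c]
        return pc + 1
    if op == BNEQ:
        return b if env[a] != 0 else pc + 1
    return None


def run_generic(instrs, env):
    pc = 0
    while pc is not None:
        pc = step(instrs, env, pc)
    return env


def run_specialized(instrs, env):
    """Same loop plus a variant for pc == 0. The variant assumes that instrs,
    env[X] == 10 and env[M1] == -1 are never overwritten."""
    pc = 0
    while pc is not None:
        while pc == 0:
            env[Y] = 10 + env[Z]      # Y = X + Z with X folded to 10
            env[Z] = env[Z] - 1       # Z = Z + M1 with M1 folded to -1
            if env[Z] == 0:           # BNEQ not taken: leave the variant
                pc = 3
        pc = step(instrs, env, pc)    # normal loop for every other pc
    return env
```

Both functions leave the same final `env` for the same starting values; the specialized one skips the instruction fetch and opcode switch on the hot path.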

A More Concrete Approach
Do everything at runtime!
Specialize on execution-time hot values
Know which concrete memory locations are constant
Other benefits of this approach:
  - Specialize temporally, as execution progresses
  - Specialize dynamically loaded libraries as well
  - No annotations or source code necessary

A Quick Recap
Choose a good region of code to specialize
Insert dispatch that checks the result of the chosen instruction (the “trigger”)
Recompile code for different values of a hot instruction
During execution, jump to appropriate specialized code
[Figure: the generic picture (X = …, Dispatch(X), Spec1, Spec2, Default, rest of code) instantiated for the interpreter: the trigger is LOAD pc, and Dispatch(pc) branches to variants for pc=15 and pc=27 or to the default while(1) loop]

The Details
Need to identify the best predictable instruction
  - Specializing on its result should provide the greatest benefit
  - To find it, gather profile information about all instructions
Need to actually do the specializing

Instrumentation: Hot Values
What’s a hot value? One that occurs frequently as the result of an instruction
  - x % 2 has two very hot values, 0 and 1
Good candidate instructions are predictable: they result in (only) a few hot values
  - For instance, small_constant_table[x], but not rand(x)
Case study: Interpreter
  - Predictable instructions: LOAD pc, instr.opcode
    instr = instrs[pc];
    switch(instr.opcode) { … }
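A hot-value profile can be approximated with per-instruction counters. The sketch below is an assumption about how such instrumentation might look, not the talk's implementation: `ValueProfile`, its methods, and the "share of executions covered by the top k values" predictability measure are all hypothetical.

```python
from collections import Counter, defaultdict


class ValueProfile:
    """Per-instruction histogram of observed result values."""

    def __init__(self):
        self.counts = defaultdict(Counter)   # instruction id -> Counter of results

    def record(self, instr_id, value):
        self.counts[instr_id][value] += 1

    def hot_values(self, instr_id, k=2):
        """The k most frequent results of this instruction."""
        return [v for v, _ in self.counts[instr_id].most_common(k)]

    def predictability(self, instr_id, k=2):
        """Fraction of executions covered by the k hottest values."""
        c = self.counts[instr_id]
        total = sum(c.values())
        if total == 0:
            return 0.0
        return sum(n for _, n in c.most_common(k)) / total


profile = ValueProfile()
for x in range(100):
    profile.record("x % 2", x % 2)                  # two very hot values
    profile.record("scatter", (x * 37) % 101)       # 100 distinct results
```

Here `"x % 2"` is fully covered by its two hot values, while the scattered instruction is a poor candidate because no small set of values covers many executions.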

Instrumentation: Store Profile
Keep track of memory locations that have been written to
Idea: if a location hasn’t been written to yet, it probably won’t be later, either
Case study: Interpreter
  - Store profile says env[Y] is written to a lot, but env[X] and instrs[] are never written to
    env[i.res] = env[i.op1] + env[i.op2];
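A store profile can be as simple as a write counter per location. In this sketch the `StoreProfile` class, the tuple addresses such as `("env", "Y")`, and the explicit `record_store` calls are hypothetical stand-ins for instrumentation a runtime would insert on every store; profiling is assumed to start after X has been initialized.

```python
from collections import Counter


class StoreProfile:
    """Counts writes per memory location (here, any hashable address)."""

    def __init__(self):
        self.writes = Counter()

    def record_store(self, addr):
        self.writes[addr] += 1

    def never_written(self, addr):
        # Heuristic: not written so far, so assume it stays constant.
        return self.writes[addr] == 0


profile = StoreProfile()
env = {"X": 10, "Y": 0, "Z": 5}      # initial values, set before profiling starts
while env["Z"] != 0:
    env["Y"] = env["X"] + env["Z"]
    profile.record_store(("env", "Y"))
    env["Z"] -= 1
    profile.record_store(("env", "Z"))
```

After the loop, `env[Y]` and `env[Z]` show five writes each, while `env[X]` and any `instrs` slot show none and are candidates for being treated as constants.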

Invalidating Specialized Code
Memory locations may not really be constant
When ‘constant’ memory is overwritten, must invalidate or modify specializations that depended on it
How does Calpa handle invalidation?
  - Computes points-to set
  - Inserts invalidation calls at all appropriate points (offline)
  - Too costly an approach, without modification

Invalidation Options
Write barrier
  - Still feasible if field is private
    class Interpreter {
        private Instruction[] instrs;
        void SetInstrs(Instruction[] is) { instrs = is; }
    }
On-entry checks
  - Feasible if specialization depends on a small number of memory locations
  - e.g. Factor(BigInt x)
Hardware support
  - e.g. Mondrian
  - Ideal solution
  - Possible to simulate?
[Figure: Hot Instruction, then Dispatch to Spec1 or Default, with a CheckMem step that leads to Invalidate]
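The write-barrier option for a private field might look like the following sketch. It is illustrative only: `set_instrs`, `specialized_for`, the cache, and the invalidation counter are hypothetical, and the "specialized code" is just a closure that captured the instruction it assumed constant.

```python
class Interpreter:
    """Private instrs field whose only writer doubles as a write barrier."""

    def __init__(self, instrs):
        self._instrs = list(instrs)
        self._spec_cache = {}       # pc -> code specialized on instrs[pc]
        self.invalidations = 0

    def set_instrs(self, instrs):
        # Write barrier: the single place where _instrs is overwritten, so
        # every specialization that assumed the old contents is dropped here.
        self._instrs = list(instrs)
        if self._spec_cache:
            self._spec_cache.clear()
            self.invalidations += 1

    def specialized_for(self, pc):
        """Return (building if needed) code that treats instrs[pc] as constant."""
        if pc not in self._spec_cache:
            instr = self._instrs[pc]                 # folded in as a constant
            self._spec_cache[pc] = lambda: instr
        return self._spec_cache[pc]


interp = Interpreter(["ADD", "BNEQ"])
before = interp.specialized_for(0)()     # built against the original instrs
interp.set_instrs(["SUB", "BNEQ"])       # barrier fires, cache is invalidated
after = interp.specialized_for(0)()      # rebuilt against the new instrs
```

Because the field is private, checking the one setter is enough; a public field would need a barrier on every store that could alias it.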

Specialization Procedure
Recap: We know which instructions are good candidates, what their hot values are, and what parts of memory are likely to be invariant
Want to compile different versions of the same block of code relative to a chosen trigger instruction
Each version is keyed on a hot value of that instruction
What instruction, if any, should be a basis for specialization?

Specialization Algorithm
1. Find good candidate instructions
   - Predictable
   - Frequently executed
2. For each candidate instruction
   - Simultaneously evaluate method using constant propagation for some of its hot values
   - Compute overall cost/benefit
3. Choose the best instruction

Algorithm Pseudo-code
    foreach (value v in hot values)
        worklist.push(<n, state>);    // one node/state pair per hot value v
    previously_emitted = [ ];
    while (<n, state> = pop worklist) {
        <n', state'> = evaluate(<n, state>);    // uses store information, fixes jumps
        foreach (n'' in succ(n')) {
            // have we already seen this node/state pair before?
            prev_instr = previously_emitted[<n'', state'>];
            if (prev_instr) {
                // if so, link to it
                n'.modify_jump_to(n''->prev_instr);
            } else {
                // otherwise, keep evaluating
                worklist.push(<n'', state'>);
            }
        }
        instr = emit_instruction(n');
        // remember this pair in case we see it again
        previously_emitted[<n, state>] = instr;
    }
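A much-reduced version of this worklist loop can be run on the interpreter example. The sketch makes several simplifying assumptions that are not in the talk: the abstract state is only the known value of pc, the instruction encoding is a hypothetical tuple, BNEQ targets are immediates, residual code is emitted as strings, and the "already seen" check happens when a pair is popped rather than when a successor is examined.

```python
ADD, BNEQ, HALT = "ADD", "BNEQ", "HALT"


def evaluate(instrs, pc):
    """Constant-propagate a known pc through one interpreter iteration.

    Returns (residual code, successor pc values). The instruction fetch and
    the opcode switch disappear because instrs[pc] is known."""
    op, a, b, c = instrs[pc]
    if op == ADD:
        return f"env[{a}] = env[{b}] + env[{c}]", [pc + 1]
    if op == BNEQ:
        return f"if env[{a}] != 0: goto L{b} else: goto L{pc + 1}", [b, pc + 1]
    return "return", []


def specialize(instrs, hot_pcs):
    worklist = list(hot_pcs)
    emitted = {}                      # state (here: pc) -> index into code
    code = []
    while worklist:
        pc = worklist.pop()
        if pc in emitted:             # seen this state before: jumps link to it
            continue
        residual, succs = evaluate(instrs, pc)
        emitted[pc] = len(code)
        code.append(f"L{pc}: {residual}")
        for nxt in succs:
            if nxt not in emitted:    # otherwise, keep evaluating
                worklist.append(nxt)
    return code


PROGRAM = [
    (ADD, 2, 0, 1),                   # pc 0: env[2] = env[0] + env[1]
    (ADD, 1, 1, 3),                   # pc 1: env[1] = env[1] + env[3]
    (BNEQ, 1, 0, None),               # pc 2: if env[1] != 0 goto 0
    (HALT, None, None, None),         # pc 3
]
code = specialize(PROGRAM, [0])
```

The back edge from pc 2 to pc 0 finds pc 0 already emitted, so the residual code loops back to `L0` instead of unrolling again; that memoization is what keeps the output finite here.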

Specializing the Interpreter
    while (1) {
        i = instrs[pc];
        switch (i.opcode) {
        case ADD:
            env[i.res] = env[i.op1] + env[i.op2];
            pc++;
            break;
        case BNEQ:
            if (env[i.op1] != 0)
                pc = env[i.op2];
            else
                pc++;
            break;
        ...
        }
    }
Candidates:
  - instr.opcode: executed very frequently; a small handful of values
  - pc: executed very frequently; more values, but still reasonable

Specializing on instr.opcode
Generic loop:
    LOOP: i = instrs[pc]
          switch(i.opcode)
          …
          case ADD: env[i.res] = env[i.op1]+env[i.op2]
                    pc = pc + 1
                    goto LOOP
Specialized via Dispatch(opcode), with i.opcode = ADD known:
    switch(ADD)
    case ADD:
    env[i.res] = env[i.op1]+env[i.op2]
    pc = pc + 1
    goto LOOP
    LOOP: i = instrs[pc]
[Figure: running benefit annotations along the specialized path: benefit = 1, 2, 3]
Other values of opcode have similar results…

Specializing on pc
Generic loop:
    LOOP: i = instrs[pc]
          switch(i.opcode)
          …
          case ADD: env[i.res] = env[i.op1]+env[i.op2]
                    pc = pc + 1
                    goto LOOP
Specialized via Dispatch(pc), with the known state shown in parentheses:
    LOOP: i = instrs[15]             (pc = 15 ; i = ADD Y, X, Z)
    switch(ADD)
    case ADD: env[Y] = 10 + env[Z]   (Y = X + Z)
    pc = 15 + 1                      (pc = 16 ; i = ADD Y, X, Z)
    LOOP: i = instrs[16]             (pc = 16 ; i = BNEQ Z, 15)
    switch(BNEQ)
    if (env[Z] != 0) pc = 15
    pc++; …
[Figure: running benefit annotations along the specialized path: benefit = 1, 2, 3, 6, 7, 8, 9, 10, …]

Final Result
Choose to specialize on pc because benefit is far greater than for instr.opcode
Generate different versions for each of the hottest values of pc
Terminate loop unrolling either naturally (when we don’t know what pc is anymore) or with a simple heuristic

Heuristics
Algorithm may not terminate when unrolling loops
  - Simple heuristic: widen variables when we’ve seen the same node, say, 10 times (or use frequency statistics)
Algorithm may generate lots of code
  - Need to only look at parts of state that matter
  - Widen somewhere…
Other issues: Algorithm may be slow
  - Need better way to prune off bad candidates
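The visit-count widening heuristic can be illustrated on a single loop node. This is a sketch under assumptions: `TOP`, `unroll`, and the `step` callback are hypothetical names, the state is one integer, and `step` returns `None` when the loop ends naturally.

```python
TOP = object()    # abstract "unknown" value: stop treating the variable as constant


def unroll(start, step, limit=10):
    """Unroll one loop node while its state stays a known constant.

    Returns (states unrolled, final): final is None if the loop ended on its
    own, or TOP if the node was visited `limit` times and the state widened."""
    trace, state, visits = [], start, 0
    while state is not None:
        if visits == limit:
            return trace, TOP         # widen: give up on further unrolling
        trace.append(state)
        visits += 1
        state = step(state)
    return trace, None


# A counter that never stops on its own is cut off after 10 visits.
unbounded, widened = unroll(0, lambda i: i + 1)
# A short loop terminates naturally before the limit is reached.
bounded, final = unroll(0, lambda i: i + 1 if i < 3 else None)
```

Without the limit, the first call would produce a new state on every visit, so the node/state memoization would never hit and unrolling would not terminate.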

Implementation Ideas
Use Dynamo
  - Hot trace as basis for specialization
  - Intuitively, follow the lifetime of an object as it travels through the program across function boundaries
  - Unfortunately, closed-source, and API isn’t expressive enough

Implementation Ideas
JikesRVM
  - Java VM written in Java
  - Has a primitive framework for sampling
  - Has a fairly sophisticated framework for dynamic recompilation
  - Does aggressive inlining
  - Only instrument hot traces (but compiler is slow…)
