1 Java Bytecode Optimization Optimizing Java Bytecode for Embedded Systems Stefan Hepp.

2 Overview
■ Toolchain JOP, JIT vs. ahead-of-time compilation
■ Existing open source tools
■ JOPtimizer framework and code representations
■ Inlining
■ Results

3 Toolchain Overview
■ Source code is compiled with javac to Java bytecode
■ Optimization is deferred to the JVM, which uses profiling information and a JIT compiler
■ Not feasible on embedded processors like JOP

4 Toolchain Overview
■ Ahead-of-time optimization needed
■ Optimization of bytecode for target platform
■ Output is Java bytecode
■ Profiling vs. static WCET

5 Toolchain Overview
■ Advantages over JIT
  ▪ Run time of the optimizer is not critical
  ▪ No warm-up phase to gather profiling information and to do JIT compiling
■ Disadvantages
  ▪ Less accurate or no profiling information available at design time, class hierarchy may change dynamically
  ▪ Target platform must be known

6 Existing Tools
■ Soot framework looks promising, but it is very complex and not designed for embedded systems
■ Other open source tools usually only remove unused methods and obfuscate code

7 JOPtimizer
■ JOPtimizer: a new framework for optimizations
  ▪ Intermediate code representations
  ▪ Inlining which respects method size restrictions

8 Assumptions
■ Assumptions about embedded applications
  ▪ No dynamic class loading or class modifications at runtime
  ▪ Reflection is not used
  ▪ All class files are available at compile time (except native classes)
■ Allows more optimizations (but assumptions can be disabled)
■ Exclude “library” code (like java.*)
  ▪ Define: library classes must not extend/reference application classes

9 Java Class Files
■ A class consists of:
  ▪ ConstantPool: indexed table of constants (numbers, Strings, class names, method names, signatures, ...)
  ▪ Class name, super-class, interfaces (references to CP)
  ▪ Fields, methods: name, signature, flags
  ▪ Method code as attribute of methods
  ▪ Stack architecture with variable length encoding
■ Parsing and compiling of class files is done by existing libraries (BCEL, ASM, ...)

10 The JVM Instruction Set
■ (Partially) typed stack instructions
■ 32-bit (int, float, reference, byte, short, ...) and 64-bit (long, double) variables
■ Exception handling, synchronization, subroutines
■ Stack and variable table entries always 32-bit
■ No indirect jumps, stack size must be static

    private Map m;

    private void test(int i) {
        int j[] = new int[2];
        float a = 2.0f;
        j[0] = i * (int) a;
        m.put(this, j);
    }

    private test(I)V
        ICONST_2
        NEWARRAY T_INT
        ASTORE 2
        FCONST_2
        FSTORE 3
        ALOAD 2
        ICONST_0
        ILOAD 1
        FLOAD 3
        F2I
        IMUL
        IASTORE
        ALOAD 0
        GETFIELD #4
        ALOAD 0
        ALOAD 2
        INVOKEINTERFACE #7
        POP
        RETURN
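
Because the stack size must be static, the maximum operand stack depth of a method can be computed by walking its instructions. The following is a minimal sketch for the straight-line listing above only. The class `StackDepth` and its table of stack effects are illustrative assumptions, not part of JOPtimizer, and the effect of `INVOKEINTERFACE` assumes the call is `Map.put`.

```java
final class StackDepth {
    // Opcodes of the test(I)V listing; operands are dropped because only
    // the opcode matters for the stack effect in this sketch.
    static final String[] SLIDE_LISTING = {
        "ICONST_2", "NEWARRAY", "ASTORE", "FCONST_2", "FSTORE",
        "ALOAD", "ICONST_0", "ILOAD", "FLOAD", "F2I", "IMUL", "IASTORE",
        "ALOAD", "GETFIELD", "ALOAD", "ALOAD", "INVOKEINTERFACE", "POP", "RETURN"
    };

    // Net change of the operand stack, counted in 32-bit slots.
    static int effect(String op) {
        switch (op) {
            case "ICONST_0": case "ICONST_2": case "FCONST_2":
            case "ALOAD": case "ILOAD": case "FLOAD":
                return 1;   // pushes one 32-bit value
            case "ASTORE": case "FSTORE": case "IMUL": case "POP":
                return -1;
            case "NEWARRAY": case "F2I": case "GETFIELD": case "RETURN":
                return 0;   // GETFIELD: 0 only for a 32-bit field such as m
            case "IASTORE":
                return -3;  // pops array reference, index and value
            case "INVOKEINTERFACE":
                return -2;  // assumption: Map.put pops receiver + 2 args, pushes result
            default:
                throw new IllegalArgumentException("not covered by this sketch: " + op);
        }
    }

    // Largest depth reached while executing the instructions in order.
    static int maxDepth(String[] ops) {
        int depth = 0;
        int max = 0;
        for (String op : ops) {
            depth += effect(op);
            if (depth < 0) {
                throw new IllegalStateException("stack underflow at " + op);
            }
            max = Math.max(max, depth);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(maxDepth(SLIDE_LISTING)); // prints 4
    }
}
```

Code with branches would need the same walk per basic block, with matching depths checked at every join point.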

11 Stackcode Representation
■ Internal representation (“stackcode”)
  ▪ Types and constant values as parameters of instructions to reduce number of different instructions (~40 stackcode instructions)
  ▪ Stack emulation to determine operand types for all instructions (swap, dup, ...)
  ▪ Variables and types instead of 32-bit slots
  ▪ Constant values instead of references into CP
  ▪ Split basic blocks at exception handler ranges too
■ Still a stack architecture
■ Stackcode can be mapped directly to bytecode (allows analysis of code size and execution time)
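
The stack emulation mentioned above can be sketched as follows: untyped `dup`, `swap` and `pop` receive their operand types by tracking what each preceding instruction pushed. The class `TypeEmulator` and the annotated names such as `dup.float` are hypothetical and chosen for illustration; the sketch handles 32-bit values only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class TypeEmulator {
    // Returns the code with dup/swap/pop annotated by the emulated operand types.
    static List<String> annotate(List<String> code) {
        Deque<String> stack = new ArrayDeque<>(); // types of the values on the stack
        List<String> out = new ArrayList<>();
        for (String ins : code) {
            if (ins.startsWith("push.")) {
                // "push.int 1" pushes a value of type "int"
                stack.push(ins.substring(5).split(" ")[0]);
                out.add(ins);
            } else if (ins.equals("dup")) {
                String top = stack.peek();
                if (top == null) {
                    throw new IllegalStateException("dup on empty stack");
                }
                stack.push(top);
                out.add("dup." + top);
            } else if (ins.equals("swap")) {
                String top = stack.pop();
                String below = stack.pop();
                stack.push(top);
                stack.push(below);
                // annotation lists the types as they were before the swap
                out.add("swap." + below + "." + top);
            } else if (ins.equals("pop")) {
                out.add("pop." + stack.pop());
            } else {
                throw new IllegalArgumentException("not covered by this sketch: " + ins);
            }
        }
        return out;
    }
}
```

For the input `push.int 1`, `push.float 2.0f`, `swap`, `pop`, `dup` the sketch yields `swap.int.float`, `pop.int` and `dup.float` for the three untyped instructions.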

12 Quadcode Representation
■ Stack creates implicit dependencies between instructions and blocks, makes optimizations more complex
■ Quadruple form of code (“quadcode”)
  ▪ Create local variable per stack slot, emulate stack to determine the arguments of instructions
  ▪ Instructions with types and constants as parameters
  ▪ Instructions to manipulate the stack are not needed (pop) or are replaced with copy instructions (load, swap, dup, ...)

13 Quadcode Representation
■ Quadcode representation enables simpler implementation of optimizations, but code cannot be mapped to bytecode directly
■ Stackcode and quadcode are similar to Soot internal representations (Baf, Jimple, Shimple)

    public int calc(int a, int b) {
        // quadcode                       // stackcode                // bytecode
        copy.ref s0, l0                   // load.ref l0              // aload_0
        getfield.'Test.fField' s0, s0     // getfield 'Test.fField'   // getfield #3
        copy.float s1, 2.0f               // push.float 2.0f          // fconst_2
        binop.float.div s0, s0, s1        // binop.float.div          // fdiv
        copy.float l3, s0                 // store.float l3           // fstore_3
        copy.int s0, l1                   // load.int l1              // iload_1
        return.int s0                     // return.int               // ireturn
    }
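
The step from stackcode to quadcode in the `calc` example can be reproduced by emulating the stack and naming each stack slot `s0`, `s1`, ... The sketch below covers only the instruction forms that occur in that example. The class `StackToQuad` and the string-based instruction format are assumptions for illustration, not JOPtimizer's actual data structures.

```java
import java.util.ArrayList;
import java.util.List;

final class StackToQuad {
    // Converts straight-line stackcode into quadcode with one variable per stack slot.
    static List<String> convert(List<String> stackcode) {
        List<String> quads = new ArrayList<>();
        int depth = 0; // current emulated stack depth
        for (String ins : stackcode) {
            String[] parts = ins.split(" ", 2);
            String op = parts[0];
            String arg = parts.length > 1 ? parts[1] : "";
            if (op.startsWith("load.") || op.startsWith("push.")) {
                // a push becomes a copy into the next free stack variable
                quads.add("copy." + op.substring(5) + " s" + depth + ", " + arg);
                depth++;
            } else if (op.startsWith("store.")) {
                depth--;
                quads.add("copy." + op.substring(6) + " " + arg + ", s" + depth);
            } else if (op.equals("getfield")) {
                // replaces the object reference on top of the stack by the field value
                quads.add("getfield." + arg + " s" + (depth - 1) + ", s" + (depth - 1));
            } else if (op.startsWith("binop.")) {
                depth--; // two operands are popped, one result is pushed
                quads.add(op + " s" + (depth - 1) + ", s" + (depth - 1) + ", s" + depth);
            } else if (op.startsWith("return.")) {
                depth--;
                quads.add(op + " s" + depth);
            } else {
                throw new IllegalArgumentException("not covered by this sketch: " + ins);
            }
        }
        return quads;
    }

    public static void main(String[] args) {
        List<String> calc = new ArrayList<>();
        calc.add("load.ref l0");
        calc.add("getfield 'Test.fField'");
        calc.add("push.float 2.0f");
        calc.add("binop.float.div");
        calc.add("store.float l3");
        calc.add("load.int l1");
        calc.add("return.int");
        for (String quad : convert(calc)) {
            System.out.println(quad);
        }
    }
}
```

Run on the stackcode column of the example, the sketch emits the quadcode column line by line.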

14 Creation of Bytecode
■ Transformation back from quadcode to bytecode
  ▪ Create complete expressions from instructions (“decompile” code), compile expression trees to JVM instructions like javac (Soot does this with Grimp)
  ▪ Create stack form of quadruple instructions, compile to bytecode (JOPtimizer does this, optional in Soot)
    ▪ Per quadcode instruction: load parameters on stack, execute operation and store result back
■ Load/store elimination and local variable allocation for stackcode needed before bytecode can be created
■ Decompilation method of Soot gets slightly better results
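
The per-instruction scheme (load parameters, execute, store result) can be sketched as below. The class `QuadToStack` and the signature of `lower` are hypothetical. For simplicity the sketch assumes all operands share the result type, which holds for `copy` and `binop` but not for instructions such as `getfield`.

```java
import java.util.ArrayList;
import java.util.List;

final class QuadToStack {
    // s<n> and l<n> name variables; anything else is treated as a constant.
    static boolean isVariable(String operand) {
        return operand.matches("[sl]\\d+");
    }

    // Lowers one quadcode instruction to stackcode:
    // load every argument, execute the operation, store the result.
    static List<String> lower(String op, String type, String dest, String... args) {
        List<String> out = new ArrayList<>();
        for (String arg : args) {
            out.add((isVariable(arg) ? "load." : "push.") + type + " " + arg);
        }
        if (!op.equals("copy")) {
            out.add(op); // a copy needs no operation between load and store
        }
        if (dest != null) {
            out.add("store." + type + " " + dest);
        }
        return out;
    }

    public static void main(String[] args) {
        // copy.float s1, 2.0f  followed by  binop.float.div s0, s0, s1
        System.out.println(lower("copy", "float", "s1", "2.0f"));
        System.out.println(lower("binop.float.div", "float", "s0", "s0", "s1"));
    }
}
```

The two lowered instructions contain `store.float s1` followed shortly by `load.float s1`. Pairs like this are what the load/store elimination has to remove before the result is comparable to the original bytecode.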

15 Inlining
■ Invocations are expensive on JOP
■ Inline methods to eliminate invocation overhead
■ Inlining is not always possible
  ▪ Callee code restrictions
  ▪ Code size and variable table size restrictions of JOP
■ Inlining comes at a price
  ▪ Caller code size increases, makes caller cache miss more costly
  ▪ Overall program size increases if callee is not removed (e.g. because it is called somewhere else)

16 Inlining methods
1. Traverse callgraph bottom-up (leaves first)
2. Find and devirtualize invocations
  ▪ static, final, private invocations are not virtual
  ▪ Check class hierarchy for overriding methods
3. Check if inlining is admissible
4. Estimate gain
5. Replace invocation with copy of callee
  ▪ Insert null-pointer check for callee class reference
  ▪ Map local variables of callee above caller variables
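
Step 5 maps the callee's local variables above those of the caller. A minimal sketch of that renumbering on stackcode-like strings follows. The class `LocalRemap`, the instruction format and the field name `Counter.base` are illustrative assumptions. The sketch omits the null-pointer check and supports only callees with a single trailing return and an otherwise empty stack at that point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LocalRemap {
    private static final Pattern LOCAL = Pattern.compile("\\bl(\\d+)\\b");

    // Produces the code that replaces the invocation in the caller.
    // paramTypes lists the callee parameters in order (including "ref" for this).
    static List<String> inlineBody(List<String> callee, String[] paramTypes, int callerLocals) {
        List<String> out = new ArrayList<>();
        // Prologue: the arguments are on the stack, last argument on top,
        // so they are stored into the shifted parameter variables in reverse order.
        for (int i = paramTypes.length - 1; i >= 0; i--) {
            out.add("store." + paramTypes[i] + " l" + (callerLocals + i));
        }
        for (int k = 0; k < callee.size(); k++) {
            String ins = callee.get(k);
            if (ins.startsWith("return")) {
                if (k != callee.size() - 1) {
                    throw new IllegalArgumentException("sketch supports a single trailing return only");
                }
                continue; // the result, if any, simply stays on the caller's stack
            }
            // Shift every local variable index by the number of caller variables.
            Matcher m = LOCAL.matcher(ins);
            StringBuffer sb = new StringBuffer();
            while (m.find()) {
                int shifted = Integer.parseInt(m.group(1)) + callerLocals;
                m.appendReplacement(sb, "l" + shifted);
            }
            m.appendTail(sb);
            out.add(sb.toString());
        }
        return out;
    }
}
```

For a callee body `load.ref l0`, `getfield 'Counter.base'`, `load.int l1`, `binop.int.add`, `return.int` inlined into a caller with four variables, the callee's `l0` and `l1` become `l4` and `l5`.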

17 Inlining Checks
■ Inlining is not possible if
  ▪ new code size or variable table size of caller exceeds platform limits
  ▪ the callee uses exception handlers or synchronized code
    ▪ throwing an exception clears the stack
    ▪ stack of the caller needs to be saved and restored if an exception is handled within the inlined method (not yet implemented in JOPtimizer)
  ▪ the method or class is excluded from inlining by configuration (caller or callee, e.g. native class)
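
These conditions translate into a simple predicate. The sketch below is an assumption-laden illustration: `InlineCheck`, `MethodInfo` and its fields are hypothetical names, the platform limits are passed in as parameters rather than taken from JOP, and the new code size is only roughly estimated as the sum of both method sizes.

```java
final class InlineCheck {
    // Hypothetical summary of the properties the checks need.
    static final class MethodInfo {
        final int codeSize;            // size of the method code in bytes
        final int localVariables;      // size of the variable table
        final boolean hasHandlers;     // uses exception handlers
        final boolean usesSynchronized;
        final boolean excluded;        // excluded by configuration, e.g. native class

        MethodInfo(int codeSize, int localVariables, boolean hasHandlers,
                   boolean usesSynchronized, boolean excluded) {
            this.codeSize = codeSize;
            this.localVariables = localVariables;
            this.hasHandlers = hasHandlers;
            this.usesSynchronized = usesSynchronized;
            this.excluded = excluded;
        }
    }

    static boolean canInline(MethodInfo caller, MethodInfo callee,
                             int maxCodeSize, int maxLocalVariables) {
        if (caller.excluded || callee.excluded) {
            return false;
        }
        if (callee.hasHandlers || callee.usesSynchronized) {
            return false;
        }
        // Rough estimate: callee variables are mapped above the caller's,
        // callee code is appended to the caller's code size.
        if (caller.codeSize + callee.codeSize > maxCodeSize) {
            return false;
        }
        return caller.localVariables + callee.localVariables <= maxLocalVariables;
    }
}
```

A more precise size estimate would subtract the removed invoke and return instructions and add the argument stores and the null-pointer check.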

18 Inlining Checks (cont.)
■ Check field and method references in callee code
  ▪ Must be accessible from caller
  ▪ Else make field or method public if possible
    ▪ Always possible for fields as they are not virtual in Java
    ▪ All overriding methods must be made public too
    ▪ If a private method is made public, all invocations have to be changed from invokespecial to invokevirtual (luckily only methods of callee class have to be searched)
    ▪ Naming conflicts or dynamic class loading can prevent changes, thus preventing inlining

    class A
        public a()
            tmp = new C()
            invoke tmp.b()
    class B
        public b()
            if (v == null) invoke B.c()
        private c()
    class C extends B
        private c()
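
The naming conflict in the A/B/C example can be demonstrated in plain Java. In the sketch below the class names with the suffixes `Private` and `Public` are invented for illustration: the first pair mirrors the original hierarchy, the second pair shows what happens if both `c()` methods are made public so that `b()` could be inlined into a class outside `B`.

```java
final class AccessDemo {
    public static void main(String[] args) {
        System.out.println(new CPrivate().b()); // prints B.c
        System.out.println(new CPublic().b());  // prints C.c
    }
}

// Original situation: both c() methods are private, so they are unrelated
// and b() always calls the implementation in the base class.
class BPrivate {
    public String b() { return c(); }
    private String c() { return "B.c"; }
}

class CPrivate extends BPrivate {
    private String c() { return "C.c"; }
}

// After making c() public: the method in the subclass now overrides the one
// in the base class, so the same call in b() is dispatched to the subclass.
class BPublic {
    public String b() { return c(); }
    public String c() { return "B.c"; }
}

class CPublic extends BPublic {
    @Override
    public String c() { return "C.c"; }
}
```

Because the visibility change alters which method runs, an optimizer has to detect such a conflict and refuse to inline rather than change the access flags.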

19 Inlining Checks (cont.)
■ Estimate gain of inlining
  ▪ Depends on cache state
  ▪ Possible degradation of performance if inlined method is seldom invoked
  ▪ Calculate gain based on invocation frequency and cache state estimations
  ▪ Decrease weight of callees with multiple call sites to limit the increase of application code size
  ▪ Select method with highest (positive) weight for inlining
■ Add inlined invocations to inlining candidate list, repeat inlining (check with new code size)
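
The slides do not give the weight formula, so the following is only one possible shape of such a heuristic, under stated assumptions: saved cycles are invocation frequency times a per-invocation overhead, an estimated extra cache miss cost is subtracted, and the result is divided by the number of call sites. `InlineGain`, `Candidate` and all parameters are hypothetical.

```java
import java.util.List;

final class InlineGain {
    // Hypothetical description of one inlining candidate.
    static final class Candidate {
        final String name;
        final double invocations;          // estimated invocation frequency
        final int callSites;               // number of call sites of the callee
        final double extraCacheMissCycles; // estimated cost of the larger caller

        Candidate(String name, double invocations, int callSites, double extraCacheMissCycles) {
            this.name = name;
            this.invocations = invocations;
            this.callSites = callSites;
            this.extraCacheMissCycles = extraCacheMissCycles;
        }
    }

    // Assumed formula, not taken from JOPtimizer.
    static double weight(Candidate c, double invokeOverheadCycles) {
        double savedCycles = c.invocations * invokeOverheadCycles;
        return (savedCycles - c.extraCacheMissCycles) / c.callSites;
    }

    // Returns the candidate with the highest positive weight, or null if none is positive.
    static Candidate best(List<Candidate> candidates, double invokeOverheadCycles) {
        Candidate best = null;
        double bestWeight = 0.0;
        for (Candidate c : candidates) {
            double w = weight(c, invokeOverheadCycles);
            if (w > bestWeight) {
                best = c;
                bestWeight = w;
            }
        }
        return best;
    }
}
```

With an overhead of 10 cycles, a callee invoked 1000 times from one call site outweighs the same callee shared by four call sites, and a callee invoked only 10 times with an extra miss cost of 500 cycles gets a negative weight and is skipped.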

20 Benchmark Results
■ Inlining of stackcode, JBE @ 60 MHz
■ Inlining limited by maximal code size imposed by JOP's memory cache
■ Removal of unused code should be implemented

21 Inlining Improvements
■ Many improvements possible
  ▪ Type analysis/callgraph thinning for better devirtualization
  ▪ Better cache state and invocation frequency estimation (WCET-driven?)
  ▪ Run optimizations to reduce code size prior to inlining
  ▪ Allow inlining of synchronized code/exception handlers
  ▪ Try to find invocations with highest gain application-wide
  ▪ ...

22 Summary
■ Optimizing code at runtime is not feasible for (real-time) embedded systems
■ Existing open source tools are not designed for embedded systems
■ Inlining implemented in JOPtimizer takes the target platform into account (code restrictions, caching, ...), up to 14% speedup of JBE benchmark
■ Load/store elimination and local variable allocation are needed before further optimizations can be implemented
■ Still many improvements possible

23 Q&A Thanks for your attention! Questions?

24 Transformations
