HotSpot™: A Huge Step Beyond JIT’s
Zhanyong Wan, May 1st, 2000

1 HotSpot™: A Huge Step Beyond JIT’s
Zhanyong Wan
May 1st, 2000

2 Sources of Information
• From Sun’s web-site
  – HotSpot white paper: http://java.sun.com/products/hotspot/whitepaper.html
  – Various articles on Sun’s web-site: http://java.sun.com/products/hotspot/
• From other web-sites
  – Java on Steroids: Sun's High-Performance Java Implementation, U. Hölzle et al. (slides from HotChips IX, August 1997): http://www.cs.ucsb.edu/oocsb/papers/HotChips.pdf
  – The HotSpot Virtual Machine, Bill Venners: http://www.artima.com/designtechniques/hotspot.html
  – HotSpot: A new breed of virtual machine, Eric Armstrong: http://www.javaworld.com/jw-03-1998/f_jw-03-hotspot.html

3 Overview
• Why Java is different
• Why JIT is not good enough
• What HotSpot does
• The HotSpot architecture
  – Memory model
  – Thread model
  – Adaptive optimization
• Conclusions

4 History
• 1st generation JVM
  – Purely interpreting
  – 30-50 times slower than C++
• 2nd generation JVM
  – JIT compilers
  – 3-10 times slower than C++
• Static compilers
  – Better performance than JITs

5 The Future?
• HotSpot
  – Dynamic, fully optimizing compiler
  – Close-to-C++ performance
  – May even exceed the speed of C++ in the future

6 Questions of Interest
• How is it possible that HotSpot runs programs faster than the native code generated by a static optimizing Java compiler?
• How does HotSpot score? (The collection of technologies used by HotSpot.)
• Where did they get the ideas?
• Which of these technologies also apply in other systems (e.g. JIT, static source code/bytecode compiler, C++)?
• Can Java be made to surpass the performance of C++, or is this just hype?

7 Why Java Is Different (from C++)
• Granularity of factoring
  – Smaller classes
  – Smaller methods
  – More frequent calls
  – Standard compiler analysis fails
• Dynamic dispatch
  – Slower calls for virtual functions
  – Much more frequent than in C++
• Sophisticated run-time system
  – Allocation, garbage collection
  – Threads, synchronization
• Dynamically changing program
  – Classes loaded/discarded on the fly

8 Why Java Is Different (cont’d)
• Distributed in a portable form
  – A compiler can generate optimal machine code for a particular processor version, e.g. Pentium vs. Pentium II
  – Welcomes dynamic compilation (developed in the last decade)!

9 Find the Java Bottleneck
• Time used in a typical Java program executed w/ JDK interpreter:
  – Allocation/GC: 1/6
  – Synchronization: 1/6
  – Byte code: 2/3
  – Native methods: negligible
• Performance critical code: the “hot spots”
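
The breakdown above also bounds what faster bytecode execution alone can buy. The following is a rough sketch using Amdahl's law, which the slides do not mention; the class and method names are hypothetical, and only the three fractions come from the slide.

```java
// Hypothetical helper: overall speedup when each component of run time
// is accelerated by its own factor (Amdahl's law).
final class BottleneckModel {
    // Fractions of run time taken from the slide.
    static final double GC = 1.0 / 6;
    static final double SYNC = 1.0 / 6;
    static final double BYTECODE = 2.0 / 3;

    /** Returns old time / new time, with the old time normalized to 1. */
    static double overallSpeedup(double gcFactor, double syncFactor, double bytecodeFactor) {
        double newTime = GC / gcFactor + SYNC / syncFactor + BYTECODE / bytecodeFactor;
        return 1.0 / newTime;
    }
}
```

Under this model, `BottleneckModel.overallSpeedup(1, 1, 10)` is 2.5, and even infinitely fast bytecode execution caps out at 3, because allocation/GC and synchronization still take their combined 1/3 of the original time.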

10 Why JIT Is Not Good Enough
• Compiles on method-by-method basis when a method is first invoked
• Compilation consumes “user time”
  – Startup latency
  – Dilemma: either good code or fast compiler
    · Gains of better optimization may not justify extra compile time
    · More concerned w/ generating code quickly than w/ generating the quickest code
• Root of problem: compilation is too eager

11 The Baaad Way to Optimize
• People try to help: the optimization lore
  – Make methods final or static
  – Use large classes/methods
  – Avoid interfaces (interface method invocation much slower than regular dynamic method dispatch)
  – Avoid creating lots of short-lived objects
  – Avoid synchronization (very expensive)
  – Against good OO design!
• “Premature optimization is the root of all evil.” (Donald Knuth)

12 The HotSpot Way to Optimize
• Optimize only when you know you have a problem
  1. A program starts off being interpreted
  2. A profiler collects run-time info in the background
  3. After a while, a set of hot spots is identified
  4. A thread is launched to compile the methods in the hot spots
     · Execution of the program is *not* blocked
     · “Take your time!” – fully optimizing
     · Take advantage of the late compilation: run-time info used
  5. Once a method is compiled, it doesn’t need to be interpreted
  6. Native code can be discarded when the hot spots change
     · Keeping the footprint small
     · Bytecode is always kept around
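
The six steps can be pictured with a toy model. This is a minimal sketch, assuming a simple per-method invocation counter and a fixed threshold; the slides do not say how HotSpot's profiler actually identifies hot spots, and every name here is hypothetical. Compilation is modeled as an instant state change, whereas the slide says it happens on a separate thread without blocking the program.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of "interpret first, compile only what turns out to be hot".
final class AdaptiveRuntimeSketch {
    private final int hotThreshold;                        // assumed tuning knob
    private final Map<String, Integer> counts = new HashMap<>();
    private final Set<String> compiled = new HashSet<>();

    AdaptiveRuntimeSketch(int hotThreshold) {
        this.hotThreshold = hotThreshold;
    }

    /** Records one invocation and returns the mode that call ran in. */
    String invoke(String method) {
        if (compiled.contains(method)) {
            return "compiled";                             // step 5
        }
        int n = counts.merge(method, 1, Integer::sum);     // step 2: profiling
        if (n >= hotThreshold) {
            compiled.add(method);                          // steps 3-4, simplified
        }
        return "interpreted";                              // step 1
    }

    /** Step 6: drop native code; the bytecode is still there to interpret. */
    void discard(String method) {
        compiled.remove(method);
        counts.remove(method);
    }

    boolean isCompiled(String method) {
        return compiled.contains(method);
    }
}
```

With a threshold of 3, the first three calls to a method return "interpreted" and the fourth returns "compiled"; after `discard`, the method starts over in the interpreter.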

13 The HotSpot Way (cont’d)
• Tackles each of the bottlenecks
  – Adaptive optimization
  – Fast, accurate garbage collection
  – Fast thread synchronization
• Performance
  – 2-3 times faster than JITs
  – Comparable to C++
• Most importantly, eliminates the “performance excuse” for poor designs/code

14 The HotSpot Architecture
• Memory model
• Thread model
• Adaptive compiler

15 The HotSpot Memory Model
• Object references
  – Java 2 SDK: as indirect handles
    · Relocating objects made easy
    · A significant performance bottleneck
  – HotSpot: as direct pointers
    · A performance boost
    · GC must adjust all references to an object when it is relocated
• Object headers
  – Java 2 SDK: 3-word
  – HotSpot: 2-word
    · 2 bits for GC mark (reference count removed?)
    · An 8% savings in heap size
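
The trade-off between the two reference schemes can be sketched on a toy heap. This is an illustration only, with an `int[]` standing in for memory and hypothetical method names; it is not how either VM lays out objects.

```java
// Toy heap contrasting handle-based and direct references.
final class ReferenceSchemes {
    /** Handle scheme: every field read goes through the handle table first. */
    static int readViaHandle(int[] heap, int[] handleTable, int handle, int fieldOffset) {
        int address = handleTable[handle];       // extra indirection on each access
        return heap[address + fieldOffset];
    }

    /** Direct scheme: the reference is the address itself. */
    static int readDirect(int[] heap, int address, int fieldOffset) {
        return heap[address + fieldOffset];
    }

    /** Relocating with handles: move the object, then update one table slot. */
    static void relocateWithHandle(int[] heap, int[] handleTable, int handle, int size, int newAddress) {
        System.arraycopy(heap, handleTable[handle], heap, newAddress, size);
        handleTable[handle] = newAddress;
    }

    /** Relocating with direct pointers: every reference must be patched. Returns the number patched. */
    static int relocateDirect(int[] heap, int[] references, int oldAddress, int size, int newAddress) {
        System.arraycopy(heap, oldAddress, heap, newAddress, size);
        int patched = 0;
        for (int i = 0; i < references.length; i++) {
            if (references[i] == oldAddress) {
                references[i] = newAddress;
                patched++;
            }
        }
        return patched;
    }
}
```

Reads are cheaper in the direct scheme, while relocation is cheaper with handles. That is why direct pointers require a collector that can find every reference to an object.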

16 Garbage Collection Background
• GC traditionally considered inefficient
  – Takes 1/6 of the time in an interpreting JVM
  – Even worse in a JIT VM
• Modern GC technology
  – Performs substantially better than explicit freeing
  – How can this be true?
    · Unnecessary copies avoided
    · Memory segmentation, spatial locality

17 The HotSpot Garbage Collector
• A high-level GC framework
  – New collection algorithms can be “plugged-in”
  – Currently has 3 cooperating GC algorithms
• Major features
  – Fast allocation and reclamation
  – Fully accurate: guarantees full memory reclamation
  – Completely eliminates memory fragmentation
  – Incremental, no perceivable pauses (usually < 10ms)
  – Small memory overhead
    · 2-bit GC mark per object
    · 2-word object header (instead of 3-word in Java 2 SDK)

18 The HotSpot GC: Accuracy
• A partially accurate (conservative) collector must
  – Either avoid relocating objects
  – Or use handles to refer indirectly to objects (slow)
• The HotSpot collector
  – Fully accurate
  – All inaccessible objects can be reclaimed
  – All objects can be relocated
    · Eliminates memory fragmentation
    · Increases memory locality

19 The HotSpot GC: the Structure
• Three cooperating collectors
  – A generational copying collector
    · For short-lived objects
  – A mark-compact “old object” collector
    · For longer-lived objects when the live object set is small
  – An incremental “pauseless” collector
    · For longer-lived objects when the live object set is big

20 Generational Copying Collector
• Observation: the vast majority (often > 95%) of the objects are very short-lived
• The way it works
  – A memory area is reserved as an object “nursery”
  – Allocation is just updating a pointer and checking for overflow: extremely fast
  – By the time the nursery overflows, most objects in it are dead; the collector just moves the few survivors to the “old object” memory area
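
The "update a pointer and check for overflow" allocation path can be sketched as follows. This is a toy model with sizes in abstract words and hypothetical names; finding the survivors (the tracing part of a real collector) is left to the caller.

```java
// Toy nursery with bump-pointer allocation.
final class Nursery {
    private final int capacity;   // nursery size in words
    private int top = 0;          // address of the next free word

    Nursery(int capacity) {
        this.capacity = capacity;
    }

    /** Returns the address of the new object, or -1 when the nursery has overflowed. */
    int allocate(int words) {
        if (top + words > capacity) {
            return -1;            // overflow check: time for a minor collection
        }
        int address = top;
        top += words;             // the whole allocation is one pointer bump
        return address;
    }

    /**
     * Minor collection: the survivors (given by size) move to the old-object
     * area and the nursery is reused from the start. Returns words promoted.
     */
    int collect(int[] survivorSizes) {
        int promoted = 0;
        for (int size : survivorSizes) {
            promoted += size;
        }
        top = 0;                  // dead objects cost nothing to reclaim
        return promoted;
    }

    int used() {
        return top;
    }
}
```

The collection cost in this model depends only on the survivors, which matches the slide's point: when most nursery objects are dead, very little has to be moved.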

21 Mark-Compact Collector
• Rare case
  – Triggered by low-memory conditions or programmatic requests
• Time proportional to the size of the set of live objects
  – Calls for an incremental collector when the size is large

22 Incremental Pauseless Collector
• An alternative to the mark-compact collector
• Relatively constant pause time even w/ extremely large data set
• Suitable for server applications and soft real-time applications (games, animations)
• The way it works
  – The “train” algorithm
  – Breaks up GC pauses into tiny pauses
  – Not a hard real-time algorithm: no guarantee for upper limit on pause times
• Side-benefit: better memory locality
  – Tends to relocate tightly-coupled objects together

23 The HotSpot Thread Model
• Native thread support
  – Currently supports Solaris & 32-bit Windows
  – Preemption
  – Multiprocessing
• Per-thread activation stack is shared w/ native methods
  – Fast calls between C and Java

24 Thread Synchronization
• Synchronization takes 1/6 of the time in an interpreting JVM
  – (I think) the proportion can be even higher for a JIT
• HotSpot’s thread synchronization
  – Ultra-fast (“a breakthrough”)
  – Constant time for all uncontended (no rival) synch
  – Fully scalable to multiprocessor
  – Makes fine-grain synch practical, encouraging good OO design
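
The slides do not say how HotSpot achieves constant-time uncontended synchronization. One common way to get that property, shown here purely as an assumption, is to make the fast path a single compare-and-swap on a lock word. The class name is hypothetical, and unlike a Java monitor this toy lock is neither reentrant nor blocking.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy lock whose uncontended acquire and release are one CAS each.
final class ThinLockSketch {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    /** Fast path: succeeds with one compare-and-swap when nobody holds the lock. */
    boolean tryLock() {
        return owner.compareAndSet(null, Thread.currentThread());
    }

    /** Releases the lock if the calling thread holds it; returns whether it did. */
    boolean unlock() {
        return owner.compareAndSet(Thread.currentThread(), null);
    }
}
```

A real implementation also needs a slow path for the contended case (queueing or blocking the rival thread), which this sketch leaves out.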

25 Adaptive Inlining
• Method invocations reduce the effectiveness of optimizers
  – Standard optimizers don’t perform well across method boundaries (need bigger block of code)
  – Inlining is the solution
• Inlining has problems
  – Increased memory footprint
  – Inlining is harder w/ OO languages because of dynamic dispatching (worse in Java than in C++)
• HotSpot uses run-time information to
  – Inline only the critical methods
  – Limit the set of methods that might be invoked at a certain point
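
At the source level, "limiting the set of methods that might be invoked" can be pictured as a guarded inline. This is a hand-written illustration of the idea, assuming a profile that showed one dominant receiver class; the compiler works on bytecode rather than source, and all the types below are hypothetical.

```java
interface Shape {
    double area();
}

final class Square implements Shape {
    final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}

final class Circle implements Shape {
    final double radius;
    Circle(double radius) { this.radius = radius; }
    public double area() { return Math.PI * radius * radius; }
}

final class InliningSketch {
    /** As written: every iteration makes a dynamically dispatched call. */
    static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }

    /** After specialization: the common receiver's body is inlined behind a type guard. */
    static double totalAreaSpecialized(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            if (s instanceof Square) {           // guard for the profiled common case
                Square q = (Square) s;
                sum += q.side * q.side;          // inlined body of Square.area()
            } else {
                sum += s.area();                 // fallback: ordinary dynamic dispatch
            }
        }
        return sum;
    }
}
```

Both methods return the same result for any input; the specialized one just exposes the common case to the optimizer as straight-line code.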

26 Dynamic Deoptimization
• Simple inlining may violate the Java semantics
  – A program can change the patterns of method invocation
  – Java program can change on the fly via dynamic class loading/discarding
  – Optimizations may become invalid
• Must be able to deoptimize dynamically!
  – HotSpot can deoptimize (revert back to bytecode?) a hot spot even during the execution of the code for it.
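
One way to picture why class loading can invalidate an optimization: compiled code that inlined a call is only valid while a single implementation of the target is loaded. The sketch below is a toy state machine built on that assumption, with hypothetical names throughout. Falling back to interpretation follows the slide's own tentative reading ("revert back to bytecode?"), and the hard part, switching over in the middle of a running method, is not modeled.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model: an optimization that depends on a single-implementation assumption.
final class DeoptSketch {
    private final Set<String> loadedImplementations = new HashSet<>();
    private boolean optimized = false;

    /** Class-loading event; a second implementation breaks the assumption. */
    void loadImplementation(String className) {
        loadedImplementations.add(className);
        if (optimized && loadedImplementations.size() > 1) {
            optimized = false;    // deoptimize: the bytecode is always available
        }
    }

    /** Inlining is allowed only while exactly one implementation is loaded. */
    boolean tryOptimize() {
        optimized = loadedImplementations.size() == 1;
        return optimized;
    }

    String executionMode() {
        return optimized ? "compiled" : "interpreted";
    }
}
```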

27 Fully Optimizing Compiler
• Performs all the classic optimizations
  – Dead code elimination
  – Loop invariant hoisting
  – Common sub-expression elimination
  – Constant propagation
  – And more …
• Java-specific optimizations
  – Null-check elimination
  – Range-check elimination
• Global graph coloring register allocator
• Highly portable
  – Relying on a small machine description file
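
A small before/after pair shows what loop-invariant hoisting and range-check elimination amount to. It is written by hand at the source level as an illustration; the compiler performs these transformations on its own internal representation, and the method names are hypothetical.

```java
final class OptimizationSketch {
    /**
     * As written: scale * scale is recomputed on every iteration, and each
     * data[i] carries an implicit bounds check.
     */
    static int sumScaled(int[] data, int scale) {
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i] * (scale * scale);
        }
        return sum;
    }

    /** What an optimizer could effectively produce. */
    static int sumScaledHoisted(int[] data, int scale) {
        int factor = scale * scale;   // loop-invariant hoisting
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            // The loop condition already proves 0 <= i < data.length, so a
            // compiler may drop the range check on data[i]. Evaluating
            // data.length proves data is non-null, so no further null check
            // is needed either.
            sum += data[i] * factor;
        }
        return sum;
    }
}
```

Both versions compute the same value for every input, including when `int` arithmetic overflows, which is the property that lets a compiler make the substitution.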

28 Transparent Debugging & Profiling Semantics
• Native code generation & optimization fully transparent to the programmer
  – Uses two stacks
    · One real, one simulating
  – Overhead of two stacks?
• Pure bytecode semantics: easy debugging & profiling
• Question: what’s the point of a transparent profiling semantics?

29 Performance Evaluation
• Micro-benchmarks: not the way
  – No or few method calls/synchronizations
  – Small live data set
  – No correlation w/ real programs
  – Give unrealistic results for HotSpot
• SPEC JVM98 benchmark
  – The only industry-standard benchmark for Java
  – Predictive of the performance across a number of real applications

30 Where are the ideas from?
• Mostly from the last decade’s academic work
  – Dynamic compilation
  – Modern GC
  – HotSpot puts them together
• Academic research is relevant!

31 (My) Conclusions
• HotSpot is great
  – Many new technologies previously only seen in academia
• Java performance may come close to or exceed that of current C++ implementations
• However, Sun’s argument that Java can be faster than C++ is not convincing yet:
  – C++ has better control over machine resources
  – Many technologies used in HotSpot can be exploited for C++ as well. Especially:
    · Fast synchronization
    · Dynamic compilation
    · Maybe GC (for some dialects of C++)
  – Whether Java can exceed C++ remains to be tested

