1
HotSpot™: A Huge Step Beyond JIT’s
Zhanyong Wan
May 1st, 2000
2
Sources of Information
• From Sun’s web-site
  – HotSpot white paper http://java.sun.com/products/hotspot/whitepaper.html
  – Various articles on Sun’s web-site http://java.sun.com/products/hotspot/
• From other web-sites
  – Java on Steroids: Sun's High-Performance Java Implementation, U. Hölzle et al. (slides from HotChips IX, August 1997) http://www.cs.ucsb.edu/oocsb/papers/HotChips.pdf
  – The HotSpot Virtual Machine, Bill Venners http://www.artima.com/designtechniques/hotspot.html
  – HotSpot: A new breed of virtual machine, Eric Armstrong http://www.javaworld.com/jw-03-1998/f_jw-03-hotspot.html
3
Overview
• Why Java is different
• Why JIT is not good enough
• What HotSpot does
• The HotSpot architecture
  – Memory model
  – Thread model
  – Adaptive optimization
• Conclusions
4
History
• 1st-generation JVM
  – Purely interpreting
  – 30-50 times slower than C++
• 2nd-generation JVM
  – JIT compilers
  – 3-10 times slower than C++
• Static compilers
  – Better performance than JITs
5
The Future?
• HotSpot
  – Dynamic, fully optimizing compiler
  – Close-to-C++ performance
  – May even exceed the speed of C++ in the future
6
Questions of Interest
• How is it possible that HotSpot runs programs faster than the native code generated by a static optimizing Java compiler?
• How does HotSpot score? (The collection of technologies used by HotSpot.)
• Where did they get the ideas?
• Which of these technologies also apply in other systems (e.g. JIT, static source code/bytecode compiler, C++)?
• Can Java be made to surpass the performance of C++, or is this just hype?
7
Why Java Is Different (to C++)
• Granularity of factoring
  – Smaller classes
  – Smaller methods
  – More frequent calls
  – Standard compiler analysis fails
• Dynamic dispatch
  – Slower calls for virtual functions
  – Much more frequent than in C++
• Sophisticated run-time system
  – Allocation, garbage collection
  – Threads, synchronization
• Dynamically changing program
  – Classes loaded/discarded on the fly
8
Why Java Is Different (cont’d)
• Distributed in a portable form
  – A compiler can generate optimal machine code for a particular processor version, e.g. Pentium vs. Pentium II
  – Welcomes dynamic compilation (developed in the last decade)!
9
Find the Java Bottleneck
• Time used in a typical Java program executed w/ JDK interpreter:
  – Allocation/GC: 1/6
  – Synchronization: 1/6
  – Byte code: 2/3
  – Native methods: negligible
• Performance critical code: the “hot spots”
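The breakdown above puts a ceiling on what a better compiler alone can deliver. The sketch below is a rough illustration, not part of the talk: it takes the 1/6, 1/6, 2/3 fractions from the slide and applies Amdahl's law to them. The class and method names are made up for the example.

```java
final class BottleneckEstimate {
    // Fractions of run time from the slide (JDK interpreter); native methods ignored.
    static final double GC = 1.0 / 6;
    static final double SYNC = 1.0 / 6;
    static final double BYTECODE = 2.0 / 3;

    /** Overall speedup when each component is made faster by the given factor. */
    static double overallSpeedup(double gcFactor, double syncFactor, double bytecodeFactor) {
        double newTime = GC / gcFactor + SYNC / syncFactor + BYTECODE / bytecodeFactor;
        return 1.0 / newTime;
    }

    public static void main(String[] args) {
        // Only bytecode execution gets 10x faster; GC and synchronization are untouched.
        System.out.println("bytecode only: " + overallSpeedup(1, 1, 10));
        // All three components get 10x faster.
        System.out.println("all three:     " + overallSpeedup(10, 10, 10));
    }
}
```

Under these assumed fractions, speeding up bytecode execution alone can never yield more than a 3x overall gain, because the remaining third of the time is spent in GC and synchronization.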
10
Why JIT Is Not Good Enough
• Compiles on method-by-method basis when a method is first invoked
• Compilation consumes “user time”
  – Startup latency
  – Dilemma: either good code or fast compiler
    • Gains of better optimization may not justify extra compile time
    • More concerned w/ generating code quickly than w/ generating the quickest code
• Root of problem: compilation is too eager
11
The Baaad Way to Optimize
• People try to help: the optimization lore
  – Make methods final or static
  – Large classes/methods
  – Avoid interfaces (interface method invocation much slower than regular dynamic method dispatch)
  – Avoid creating lots of short-lived objects
  – Avoid synchronization (very expensive)
  – Against good OO design!
• “Premature optimization is the root of all evil.” (Donald Knuth)
12
The HotSpot Way to Optimize
• Optimize only when you know you have a problem
  1. A program starts off being interpreted
  2. A profiler collects run-time info in the background
  3. After a while, a set of hot spots is identified
  4. A thread is launched to compile the methods in the hot spots
     • Execution of the program is *not* blocked
     • “Take your time!” – fully optimizing
     • Take advantage of the late compilation: run-time info used
  5. Once a method is compiled, it doesn’t need to be interpreted
  6. Native code can be discarded when the hot spots change
     • Keeping the footprint small
     • Bytecode is always kept around
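The six steps above can be pictured with a toy model. This is only a sketch under stated assumptions: the slides do not say how hot spots are detected, so the invocation-count threshold, the class `HotSpotSketch`, and all of its methods are hypothetical. The background compiler thread is replaced by an explicit `compilePending()` call to keep the example deterministic.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

final class HotSpotSketch {
    private final int threshold; // assumed hotness criterion, not taken from the slides
    private final Map<String, Integer> invocationCounts = new HashMap<>();
    private final Queue<String> compileQueue = new ArrayDeque<>();
    private final Set<String> compiled = new HashSet<>();

    HotSpotSketch(int threshold) {
        this.threshold = threshold;
    }

    /** Records one call and reports how it runs: "compiled" or "interpreted". */
    String invoke(String method) {
        if (compiled.contains(method)) {
            return "compiled";
        }
        int count = invocationCounts.merge(method, 1, Integer::sum);
        if (count == threshold) {
            compileQueue.add(method); // queued once; the caller is not blocked
        }
        return "interpreted";
    }

    /** Stand-in for the background compiler thread: compiles everything queued. */
    void compilePending() {
        String method;
        while ((method = compileQueue.poll()) != null) {
            compiled.add(method);
        }
    }

    /** Step 6: drop native code; the bytecode is still there, so interpretation resumes. */
    void discard(String method) {
        compiled.remove(method);
        invocationCounts.remove(method);
    }

    public static void main(String[] args) {
        HotSpotSketch vm = new HotSpotSketch(3);
        for (int i = 0; i < 3; i++) {
            System.out.println(vm.invoke("parse"));
        }
        vm.compilePending();
        System.out.println(vm.invoke("parse"));
    }
}
```

The point of the model is the ordering: a method keeps running in the interpreter until the compiler has finished, and discarding compiled code is always safe because the interpreter can take over again.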
13
The HotSpot Way (cont’d)
• Tackles each of the bottlenecks
  – Adaptive optimization
  – Fast, accurate garbage collection
  – Fast thread synchronization
• Performance
  – 2-3 times faster than JITs
  – Comparable to C++
• Most importantly, eliminates the “performance excuse” for poor designs/code
14
The HotSpot Architecture
• Memory model
• Thread model
• Adaptive compiler
15
The HotSpot Memory Model
• Object references
  – Java 2 SDK: as indirect handles
    • Relocating objects made easy
    • A significant performance bottleneck
  – HotSpot: as direct pointers
    • A performance boost
    • GC must adjust all references to an object when it is relocated
• Object headers
  – Java 2 SDK: 3-word
  – HotSpot: 2-word
    • 2 bits for GC mark (reference count removed?)
    • An 8% savings in heap size
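The trade-off between handles and direct pointers can be made concrete with a small simulation. This is a hedged sketch, not either VM's actual layout: memory is modelled as an `int` array, and `HandleHeap`, `DirectHeap`, and their methods are names invented for the example.

```java
final class ReferenceModels {
    /** Handle-based references: every access goes through a table. */
    static final class HandleHeap {
        final int[] heap = new int[16];       // simulated memory, one int per "object"
        final int[] handleTable = new int[4]; // handle -> current address

        int read(int handle) {
            return heap[handleTable[handle]]; // two loads per access
        }

        void relocate(int handle, int newAddress) {
            heap[newAddress] = heap[handleTable[handle]];
            handleTable[handle] = newAddress; // one update, however many references exist
        }
    }

    /** Direct pointers: one load per access, but relocation must patch every reference. */
    static final class DirectHeap {
        final int[] heap = new int[16];

        int read(int address) {
            return heap[address];
        }

        /** Moves the object, rewrites each matching entry in refs, returns how many were patched. */
        int relocate(int oldAddress, int newAddress, int[] refs) {
            heap[newAddress] = heap[oldAddress];
            int patched = 0;
            for (int i = 0; i < refs.length; i++) {
                if (refs[i] == oldAddress) {
                    refs[i] = newAddress;
                    patched++;
                }
            }
            return patched;
        }
    }

    public static void main(String[] args) {
        DirectHeap direct = new DirectHeap();
        direct.heap[2] = 42;
        int[] refs = {2, 5, 2};
        System.out.println("references patched: " + direct.relocate(2, 9, refs));
    }
}
```

Patching every reference is only possible if the collector knows exactly where all references are, which is why direct pointers go hand in hand with a fully accurate collector.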
16
Garbage Collection Background
• GC traditionally considered inefficient
  – Takes 1/6 of the time in an interpreting JVM
  – Even worse in a JIT VM
• Modern GC technology
  – Performs substantially better than explicit freeing
  – How can this be true?
    • Unnecessary copies avoided
    • Memory segmentation, space locality
17
The HotSpot Garbage Collector
• A high-level GC framework
  – New collection algorithms can be “plugged-in”
  – Currently has 3 cooperating GC algorithms
• Major features
  – Fast allocation and reclamation
  – Fully accurate: guarantees full memory reclamation
  – Completely eliminates memory fragmentation
  – Incremental, no perceivable pauses (usually < 10ms)
  – Small memory overhead
    • 2-bit GC mark per object
    • 2-word object header (instead of 3-word in Java 2 SDK)
18
The HotSpot GC: Accuracy
• A partially accurate (conservative) collector must
  – Either avoid relocating objects
  – Or use handles to refer indirectly to objects (slow)
• The HotSpot collector
  – Fully accurate
  – All inaccessible objects can be reclaimed
  – All objects can be relocated
    • Eliminates memory fragmentation
    • Increases memory locality
19
The HotSpot GC: the Structure
• Three cooperating collectors
  – A generational copying collector
    • For short-lived objects
  – A mark-compact “old object” collector
    • For longer-lived objects when the live object set is small
  – An incremental “pauseless” collector
    • For longer-lived objects when the live object set is big
20
Generational Copying Collector
• Observation: the vast majority (often > 95%) of the objects are very short-lived
• The way it works
  – A memory area is reserved as an object “nursery”
  – Allocation is just updating a pointer and checking for overflow: extremely fast
  – By the time the nursery overflows, most objects in it are dead; the collector just moves the few survivors to the “old object” memory area
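Bump-pointer allocation and nursery evacuation, as described above, fit in a few lines. The sketch below is a simplified model rather than HotSpot's collector: sizes are abstract "words", liveness is supplied by a caller-provided predicate instead of tracing from roots, and `NurserySketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class NurserySketch {
    private final int capacity;   // nursery size in abstract words
    private int top = 0;          // bump pointer: next free offset
    private int oldGenUsed = 0;   // words occupied in the "old object" area
    private final List<int[]> objects = new ArrayList<>(); // {offset, size} per nursery object

    NurserySketch(int capacity) {
        this.capacity = capacity;
    }

    /** Allocation is an overflow check plus an addition. Returns the offset, or -1 if full. */
    int allocate(int size) {
        if (top + size > capacity) {
            return -1; // nursery overflow: the caller should run collect()
        }
        int offset = top;
        top += size;
        objects.add(new int[] {offset, size});
        return offset;
    }

    /**
     * Moves survivors to the old area and empties the nursery; returns the survivor count.
     * Simplification: a real copying collector traces from roots and never visits dead
     * objects, whereas this toy asks isLive about every object's offset.
     */
    int collect(IntPredicate isLive) {
        int survivors = 0;
        for (int[] obj : objects) {
            if (isLive.test(obj[0])) {
                oldGenUsed += obj[1];
                survivors++;
            }
        }
        objects.clear();
        top = 0; // the whole nursery becomes free at once
        return survivors;
    }

    int oldGenUsed() {
        return oldGenUsed;
    }

    public static void main(String[] args) {
        NurserySketch nursery = new NurserySketch(10);
        nursery.allocate(4);
        nursery.allocate(4);
        System.out.println("overflow signalled: " + (nursery.allocate(4) == -1));
        System.out.println("survivors: " + nursery.collect(offset -> offset == 4));
    }
}
```

Because the nursery is reset in one step, the cost of a collection depends on the few survivors rather than on the number of dead objects, which is what makes short-lived objects cheap.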
21
Mark-Compact Collector
• Rare case
  – Triggered by low-memory conditions or programmatic requests
• Time proportional to the size of the set of live objects
  – Calls for an incremental collector when the size is large
22
Incremental Pauseless Collector
• An alternative to the mark-compact collector
• Relatively constant pause time even w/ extremely large data set
• Suitable for server applications and soft real-time applications (games, animations)
• The way it works
  – The “train” algorithm
  – Breaks up GC pauses into tiny pauses
  – Not a hard real-time algorithm: no guarantee for upper limit on pause times
• Side-benefit: better memory locality
  – Tends to relocate tightly-coupled objects together
23
The HotSpot Thread Model
• Native thread support
  – Currently supports Solaris & 32-bit Windows
  – Preemption
  – Multiprocessing
• Per-thread activation stack is shared w/ native methods
  – Fast calls between C and Java
24
Thread Synchronization
• Synchronization takes 1/6 of the time in an interpreting JVM
  – (I think) the proportion can be even higher for a JIT
• HotSpot’s thread synchronization
  – Ultra-fast (“a breakthrough”)
  – Constant time for all uncontended (no rival) synch
  – Fully scalable to multiprocessor
  – Makes fine-grain synch practical, encouraging good OO design
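For readers unfamiliar with the term, fine-grain synchronization means locking small methods on individual objects rather than wrapping large regions in one lock. The sketch below only illustrates that style with standard Java `synchronized` methods; it says nothing about how HotSpot implements locks, and the class name is made up.

```java
final class FineGrainedCounter {
    private long value;

    // Each call takes and releases this object's monitor. With a single thread
    // the lock is uncontended, which is the case the slide says is constant time.
    synchronized void increment() {
        value++;
    }

    synchronized long get() {
        return value;
    }

    /** Runs the given number of threads, each incrementing a shared counter perThread times. */
    static long runWorkers(int threads, int perThread) {
        FineGrainedCounter counter = new FineGrainedCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4, 100_000));
    }
}
```

When such calls are expensive, programmers are tempted to remove them or to coarsen the locking by hand; cheap uncontended locks take away that temptation.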
25
Adaptive Inlining
• Method invocations reduce the effectiveness of optimizers
  – Standard optimizers don’t perform well across method boundaries (need bigger block of code)
  – Inlining is the solution
• Inlining has problems
  – Increased memory footprint
  – Inlining is harder w/ OO languages because of dynamic dispatching (worse in Java than in C++)
• HotSpot uses run-time information to
  – Inline only the critical methods
  – Limit the set of methods that might be invoked at a certain point
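To see what "limiting the set of methods that might be invoked" buys, compare a dynamically dispatched loop with a hand-written version of what a compiler could produce once profiling shows a single receiver class. This is an illustration written in source form; the actual transformation happens on compiled code, and `Shape`, `Square`, and `InliningSketch` are hypothetical names.

```java
interface Shape {
    double area();
}

final class Square implements Shape {
    final double side;

    Square(double side) {
        this.side = side;
    }

    @Override
    public double area() {
        return side * side;
    }
}

final class InliningSketch {
    /** As written by the programmer: one dynamically dispatched call per element. */
    static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }

    /** Roughly what could be generated if the profile says every receiver is a Square. */
    static double totalAreaInlined(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            if (s.getClass() == Square.class) { // cheap type guard
                Square q = (Square) s;
                sum += q.side * q.side;         // body of Square.area() inlined
            } else {
                sum += s.area();                // uncommon path: ordinary dispatch
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = {new Square(2), new Square(3)};
        System.out.println(totalArea(shapes) == totalAreaInlined(shapes));
    }
}
```

Once the call is gone, the loop body is a plain arithmetic expression that the classic optimizations can work on.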
26
Dynamic Deoptimization
• Simple inlining may violate the Java semantics
  – A program can change the patterns of method invocation
  – Java program can change on the fly via dynamic class loading/discarding
  – Optimizations may become invalid
• Must be able to deoptimize dynamically!
  – HotSpot can deoptimize (revert back to bytecode?) a hot spot even during the execution of the code for it.
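The idea of an optimization becoming invalid can be modelled in a few lines. The sketch is a deliberately crude stand-in: "classes" are entries in a map, the "optimized" code is a hard-coded copy of the only known method, and loading any class drops the optimization. It does not model deoptimizing a method that is in the middle of executing, which the slide says HotSpot can do. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

final class DeoptSketch {
    private final Map<String, IntUnaryOperator> loaded = new HashMap<>();
    private boolean optimized = true; // compiled under the assumption that only "Base" exists

    DeoptSketch() {
        loaded.put("Base", x -> x + 1);
    }

    /** Models dynamic class loading; any new class invalidates the inlining assumption. */
    void loadClass(String name, IntUnaryOperator method) {
        loaded.put(name, method);
        optimized = false; // deoptimize: future calls take the general path
    }

    /** Calls the method on a receiver of the given (already loaded) class. */
    int call(String receiverClass, int x) {
        if (optimized) {
            return x + 1; // "inlined" copy of Base's method, no lookup needed
        }
        return loaded.get(receiverClass).applyAsInt(x); // general path: full dispatch
    }

    boolean isOptimized() {
        return optimized;
    }

    public static void main(String[] args) {
        DeoptSketch site = new DeoptSketch();
        System.out.println(site.call("Base", 1) + " optimized=" + site.isOptimized());
        site.loadClass("Sub", x -> x * 10);
        System.out.println(site.call("Sub", 1) + " optimized=" + site.isOptimized());
    }
}
```

The fast path is only correct while its assumption holds, so the system needs a reliable way back to the general path whenever the program changes.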
27
Fully Optimizing Compiler
• Performs all the classic optimizations
  – Dead code elimination
  – Loop invariant hoisting
  – Common sub-expression elimination
  – Constant propagation
  – And more …
• Java-specific optimizations
  – Null-check elimination
  – Range-check elimination
• Global graph coloring register allocator
• Highly portable
  – Relying on a small machine description file
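A before/after pair shows what some of the listed optimizations do. The second method below is a hand-applied approximation at source level, offered as an illustration only; a compiler performs these transformations on its internal representation, and range-check elimination in particular cannot be expressed in Java source, so it appears only as a comment. The class and method names are invented.

```java
final class ClassicOptimizations {
    /** As a programmer might write it. */
    static int[] scaleNaive(int[] data, int a, int b) {
        int[] out = new int[data.length];
        for (int i = 0; i < data.length; i++) {
            // (a + b) is computed twice per iteration and never changes inside the loop.
            out[i] = data[i] * (a + b) + (a + b);
        }
        return out;
    }

    /** The same computation with the optimizations applied by hand. */
    static int[] scaleOptimized(int[] data, int a, int b) {
        int n = data.length; // array length read once
        int k = a + b;       // common sub-expression, hoisted out of the loop
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            // Since 0 <= i < n == data.length == out.length is provable here,
            // a compiler could drop the per-access range checks.
            out[i] = data[i] * k + k;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(java.util.Arrays.equals(
                scaleNaive(data, 2, 3), scaleOptimized(data, 2, 3)));
    }
}
```

With an optimizer doing this automatically, the programmer can keep the first, more readable form.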
28
Transparent Debugging & Profiling Semantics
• Native code generation & optimization fully transparent to the programmer
  – Uses two stacks
    • One real, one simulating
  – Overhead of two stacks?
• Pure bytecode semantics: easy debugging & profiling
• Question: what’s the point of a transparent profiling semantics?
29
Performance Evaluation
• Micro-benchmarks: not the way
  – No or few method calls/synchronizations
  – Small live data set
  – No correlation w/ real programs
  – Give unrealistic results for HotSpot
• SPEC JVM98 benchmark
  – The only industry-standard benchmark for Java
  – Predictive of the performance across a number of real applications
30
Where are the ideas from?
• Mostly from the last decade’s academic work
  – Dynamic compilation
  – Modern GC
  – HotSpot puts them together
• Academic research is relevant!
31
(My) Conclusions
• HotSpot is great
  – Many new technologies previously only seen in academia
• Java performance may come close to or exceed that of current C++ implementations
• However, Sun’s argument that Java can be faster than C++ is not convincing yet:
  – C++ has better control over machine resources
  – Many technologies used in HotSpot can be exploited for C++ as well. Especially:
    • Fast synchronization
    • Dynamic compilation
    • Maybe GC (for some dialects of C++)
  – Whether Java can exceed C++ remains to be tested