Presentation is loading. Please wait.

Presentation is loading. Please wait.

What is cool in Java 8 and new in 9

Similar presentations


Presentation on theme: "What is cool in Java 8 and new in 9"— Presentation transcript:

1 What is cool in Java 8 and new in 9
From a VM engineer‘s perspective Tobias Hartmann Compiler Group – Java HotSpot Virtual Machine Oracle Corporation March 2017

2 About me Software engineer in the HotSpot JVM Compiler Team at Oracle
Based in Baden, Switzerland Master’s degree in Computer Science from ETH Zurich Worked on various compiler-related projects Currently working on future Value Type support for Java Preparing for JDK

3 General disclaimer Everything is subject to change, not a promise, don‘t rely on it

4 Outline Intro: The HotSpot Java Virtual Machine
Part 1: What's cool in Java 8 Background: JIT compilation in HotSpot Tiered Compilation Part 2: What's new in Java 9 Segmented Code Cache Compact Strings Ahead-of-time Compilation Emphasize that this talk is from a VM engineer‘s perspective

5 A typical computing platform
Application Software User Applications Spring Apache Sling I’m expecting to hear many interesting talks today about this part of the world. Below that is a collection of layers often referred to as “System software” System Software Java SE Java EE Java Virtual Machine Operating system Hardware

6 A typical computing platform
Application Software User Applications Spring Apache Sling Hopefully: Black box System Software Java SE Java EE Java Virtual Machine Operating system Hardware

7 A typical computing platform
Application Software User Applications Spring Apache Sling Very often: Grey box Developer’s using system-internal features (for optimization or just by curiosity). Today’s talk: Java Virtual Machine. More specifically: a particular implementation of it, HotSpot. And as we are compiler engineers, we’ll mostly talk about topics related to compilation. System Software Java SE Java EE Java Virtual Machine Operating system Hardware

8 Programming language implementation
C Linux ARM Compiler Standard libraries Debugger Memory management Solaris SPARC Compiler Standard libraries Debugger Memory management Compiler Standard libraries Debugger Memory management Compiler Standard libraries Debugger Memory management And you have to do this for every different language. For developers: the awareness of different platforms/architecture: extra effort without too much gain in productivity. So a nightmare. Language implementation Operating system Windows Linux Intel x86 Intel x86 Hardware

9 (Language) virtual machine
Programming language Java JavaScript Scala Python HotSpot VM Language virtual machine (at particular HotSpot). The VM offers all services required by a language implementation (e.g., memory management). From the developer’s perspective: there’s only one VM, no different platforms Also, HotSpot support many different programming languages, not only Java, but today’s talk is going to focus mostly on Java. Virtual machine Linux Mac OS X Solaris Operating system Windows PPC ARM SPARC Intel x86 Hardware

10 The VM: An application developer’s view
Java source code int i = 0; do { i++; } while (i < f()); Bytecodes 0: iconst_0 1: istore_1 2: iinc 5: iload_1 6: invokestatic f 9: if_icmplt 2 12: return compile HotSpot Java VM execute Ahead-of-time Using javac Instructions for an abstract machine Stack-based machine (no registers)

11 The VM: A VM engineer’s view
Bytecodes 0: iconst_0 1: istore_1 2: iinc 5: iload_1 6: invokestatic f 9: if_icmplt 2 12: return HotSpot Java VM Compilation system compile C1 C2 Compiled method produce Machine code Debug info Object maps Garbage collector manage The VM engineers world starts here Efficiency really depends on the VM Interpreter execute Heap Stack access

12 Outline Intro: The HotSpot Java Virtual Machine
Part 1: What's cool in Java 8 Background: JIT compilation in HotSpot Tiered Compilation Part 2: What's new in Java 9 Segmented Code Compact Strings Ahead-of-Time Compilation Conclusion

13 Interpretation vs. compilation in HotSpot
Template-based interpreter Generated at VM startup (before program execution) Maps a well-defined machine code sequence to every bytecode instruction Compilation system Speedup relative to interpretation: ~100X Two just-in-time compilers (C1, C2) Aggressive optimistic optimizations Bytecodes 0: iconst_0 1: istore_1 2: iinc 5: iload_1 6: invokestatic f 9: if_icmplt 2 12: return Machine code mov -0x8(%r14), %eax movzbl 0x1(%r13), %ebx inc %r13 mov $0xff40,%r10 jpmq *(%r10, %rbx, 8) Isn’t that fast enough? Load local variable 1 Dispatch next instruction

14 Ahead-of-time vs. just-in-time compilation
AOT: Before program execution JIT: During program execution Tradeoff: Resource usage vs. performance of generated code Performance Amount of compilation Bad performance due to interpretation Good performance due to good selection of compiled methods and applied optimizations Bad performance due to compilation overhead WITH JDK 8 javac bytecode compilation to bytecode In JDK 9 AOT compilation to native code Interpretation Compile everything

15 JIT compilation in HotSpot
Resource usage vs. performance Getting to the “sweet spot” Selecting methods to compile Selecting compiler optimizations Limitation: Single compiler in the VM (client or server) Starting with JDK 8: Both compilers enabled at the same time (Tiered Compilation)

16 1. Selecting methods to compile
Hot methods (frequently executed methods) Profile method execution # of method invocations, # of backedges A method’s lifetime in the VM Compiler’s optimistic assumptions proven wrong Deoptimization # method invocations > THRESHOLD1 # of backedges > THRESHOLD2 Interpreter Compiler (C1 or C2) Code cache Gather profiling information Compile bytecode to native code Store machine code

17 Example optimization:
Hot path compilation Control flow graph Generated code S1; S2; S3; if (x > 3) guard(x > 3) S1; S2; S3; S4; S8; S9; Deoptimize Not only selection on the method level but also selection on the bytecode level 10’000 True False S4; S5; S6; S7; S8; S9;

18 Example optimization:
Virtual call inlining Class hierarchy Method to be compiled class A { void bar() { S1; } void foo() { A a = create(); // return A or B a.bar(); } Compiler: Inline call? loaded ... and selection on the callees to be inlined Yes. class B extends A { void bar() { S2; } not loaded

19 Example optimization: Virtual call inlining
Class hierarchy Method to be compiled class A { void bar() { S1; } void foo() { A a = create(); // return A or B S1; } Compiler: Inline call? loaded Yes. Benefits of inlining Virtual call avoided Code locality Optimistic assumption: only A is loaded Note dependence on class hierarchy Deoptimize if hierarchy changes class B extends A { void bar() { S2; } not loaded

20 Example optimization: Virtual call inlining
Class hierarchy Method to be compiled class A { void bar() { S1; } void foo() { A a = create(); // return A or B a.bar(); } Compiler: Inline call? loaded No. class B extends A { void bar() { S2; } loaded

21 Deoptimization Compiler’s optimistic assumption proven wrong
Assumptions about class hierarchy Profile information does not match method behavior Switch execution from compiled code to interpretation Reconstruct state of interpreter at runtime Complex implementation Compiled code Possibly thrown away Possibly reprofiled and recompiled We have seen two optimizations that rely on assumptions that may be wrong at runtime

22 Performance effect of deoptimization
Follow the variation of a method’s performance Interpreted Compiled Interpreted Compiled Performance Time Reference from AOT part VM Startup Compilation Deoptimization Compilation

23 Background: JIT compilation in HotSpot
Resource usage vs. performance Getting to the “sweet spot” Selecting methods to compile Selecting compiler optimizations Limitation: Single compiler in the VM (client or server) Starting with JDK 8: Both compilers enabled at the same time (Tiered Compilation)

24 2. Selecting compiler optimizations
C1 compiler Limited set of optimizations Fast compilation Small footprint C2 compiler Aggressive optimistic optimizations High resource demands High-performance code Graal Experimental compiler Will be part of HotSpot for AOT in JDK 9 Client VM Tiered Compilation (enabled since JDK 8) Server VM

25 Outline Intro: The HotSpot Java Virtual Machine
Part 1: What's cool in Java 8 Background: JIT compilation in HotSpot Tiered Compilation Part 2: What's new in Java 9 Segmented Code Compact Strings Ahead-of-Time Compilation Conclusion

26 Tiered Compilation Introduced in JDK 7, enabled by default in JDK 8
Combines the benefits of Interpreter: Fast startup C1: Fast compilation C2: High peak performance Within the sweet spot Faster startup More profile information

27 Benefits of Tiered Compilation
Client VM (C1 only) warm-up time Performance Time Interpreted C1-compiled This is an artist’s concept! VM Startup Compilation

28 Benefits of Tiered Compilation
Server VM (C2 only) warm-up time Performance Interpreted C2-compiled VM Startup Time Compilation

29 Benefits of Tiered Compilation
warm-up time Performance Interpreted C1-compiled C2-compiled VM Startup Time Compilation Compilation

30 Additional benefit: More accurate profiling
w/ tiered compilation: 1’100 samples gathered w/o tiered compilation: 300 samples gathered Profiling without tiered compilation Interpreter C2 (non-profiled) 300 samples 100 samples 200 samples time 100 samples 1000 samples Interpreter C1 (profiled) C2 (non-profiled) Profiling with tiered compilation

31 Tiered Compilation Combined benefits of interpreter, C1, and C2
Additional benefits More accurate profiling information Drawbacks Complex implementation Careful tuning of compilation thresholds needed More pressure on code cache

32 A method’s lifetime (Tiered Compilation)
Deoptimization Code cache Interpreter C1 C2 Collect profiling information Generate code quickly Continue collecting profiling information Generate high-quality code Use profiling information

33 Performance of a method (Tiered Compilation)
Interpreted C1 compiled C2 compiled Interpreted C2 compiled Performance Time VM Startup Compilation Compilation Deoptimization Compilation

34 Compilation levels (detailed view)
Typical compilation sequence C2 Associated thresholds: Tier4InvocationThreshold Tier4MinInvocationThreshold Tier4CompileThreshold Tier4BackEdgeThreshold 4 C1: full profiling 3 It‘s not as simple as it seems Compilation level Associated thresholds: Tier3InvokeNotifyFreqLog Tier3BackedgeNotifyFreqLog Tier3InvocationThreshold Tier3MinInvocationThreshold Tier3BackEdgeThreshold Tier3CompileThreshold C1: limited profiling 2 C1: no profiling 1 Interpreter

35 JDK 8 features JVM enhancements Tools Java for Everyone Innovation
Tiered Compilation PermGen removal Performance improvements Profiles for constrained devices JSR 310: Date & Time APIs Non-Gregorian calendars Unicode 6.2 ResourceBundle BCP47 locale matching Globalization & Accessibility JSR 308: Annotations on Java Type Native app bundling App Store Bundling tools jdeps JVM enhancements Tools Parallel operations for core collections APIs Improvements in functionality Improved type inference Lambda Expressions Language Interop Nashorn Java for Everyone Innovation Core libraries Deployment enhancements JavaFX 8 Public UI Control API Java SE Embedded support Enhanced HTML5 support 3D shapes and attributes Printing Mission Control Flight Recorder Usage Tracker Advanced Management Console MSI Enterprise JRE Installer Limited doPrivilege NSA Suite B algorithm support SNI Server Side support DSA updated to FIPS186-3 AEAD JSSE CipherSuites Security Client Enterprise

36 Outline Intro: The HotSpot Java Virtual Machine
Part 1: What's cool in Java 8 Background: JIT compilation in HotSpot Tiered Compilation Part 2: What's new in Java 9 Conclusion

37 JDK 9 features Specialized Behind the scenes Housekeeping
Process API Updates Variable Handles Spin-Wait Hints Dynamic Linking of Language-Defined Object Models Enhanced Method Handles More Concurrency Updates Compiler Control Segmented Code Cache Ahead-of-Time Compilation Parser API for Nashorn Prepare JavaFX UI Controls & CSS APIs for Modularization Modular Java Application Packaging New Version-String Scheme Reserved Stack Areas for Critical Sections Segmented Code Cache Ahead-of-Time Compilation Indify String Concatenation Unified JVM Logging Unified GC Logging Make G1 the Default Garbage Collector Use CLDR Locale Data by Default Validate JVM Command-Line Flag Arguments Java-Level JVM Compiler Interface Disable SHA-1 Certificates Deprecate the Applet API Process Import Statements Correctly Annotations Pipeline 2.0 Elide Deprecation Warnings on Import Statements Filter Incoming Serialization Data Milling Project Coin Simplified Doclet API JDK 9 features Compact Strings Store Interned Strings in CDS Archives Improve Contended Locking Compact Strings Improve Secure Application Performance Leverage CPU Instructions for GHASH and RSA Tiered Attribution for javac Javadoc Search Marlin Graphics Renderer HiDPI Graphics on Windows and Linux Enable GTK 3 on Linux Update JavaFX/Media to Newer Version of GStreamer Specialized HTTP 2 Client Unicode 8.0 UTF-8 Property Files ECMAScript 6 Features in Nashorn Datagram Transport Layer Security (DTLS) OCSP Stapling for TLS TLS Application-Layer Protocol Negotiation Extension SHA-3 Hash Algorithms DRBG-Based SecureRandom Implementations Create PKCS12 Keystores by Default Merge Selected Xerces Updates into JAXP HarfBuzz Font-Layout Engine XML Catalogs HTML5 Javadoc Behind the scenes Jigsaw – Modularize JDK Enhanced Deprecation Stack-Walking API Convenience Factory Methods for Collections Platform Logging API and Service jshell: The Java Shell (Read-Eval-Print Loop) Compile for Older Platform Versions Multi-Release JAR Files Platform-Specific Desktop Features TIFF Image I/O Multi-Resolution Images Housekeeping Remove GC Combinations Deprecated in JDK 8 Remove Launch-Time JRE Version Selection Remove the JVM TI hprof Agent Remove the jhat Tool New functionality New standards Removed

38 Outline Intro: The HotSpot Java Virtual Machine
Part1: What's cool in Java 8 Background: JIT compilation in HotSpot Tiered Compilation Part2: What's new in Java 9 Segmented Code Compact Strings Ahead-of-Time Compilation Conclusion

39 What is the code cache? Stores code generated by JIT compilers
Continuous chunk of memory Managed (similar to the Java heap) Fixed size Essential for performance Simplify slide

40 Code cache usage: JDK 6 and 7

41 Code cache usage: JDK 8 (Tiered Compilation)

42 Code cache usage: JDK 9

43 Challenges Tiered compilation increases amount of code by up to 4X
All code is stored in a single code cache High fragmentation and bad locality But is this a problem in real life?

44 Code cache usage: Reality
free space profiled code non-profiled code

45 Code cache usage: Reality
hotness free space profiled code non-profiled code

46 Design: Types of compiled code
Optimization level Size Cost Lifetime Non-method code optimized small cheap immortal Profiled code (C1) instrumented medium limited Non-profiled code (C2) highly optimized large expensive long

47 Design Without Segmented Code Cache With Segmented Code Cache
non-profiled methods profiled methods non-methods

48 Segmented Code Cache: Reality
profiled methods non-profiled methods free space profiled code non-profiled code

49 Segmented Code Cache: Reality
profiled methods non-profiled methods hotness

50 Evaluation: Responsiveness
Sweeper (GC for compiled code) Just as with GC, sweeping takes time and affects performance

51 Evaluation: Performance

52 What we have learned Segmented Code Cache helps
To reduce the sweeper overhead and improve responsiveness To reduce memory fragmentation To improve code locality And thus improves overall performance To be released with JDK 9

53 Outline Intro: The HotSpot Java Virtual Machine
Part 1: What's cool in Java 8 Background: JIT compilation in HotSpot Tiered Compilation Part 2: What's new in Java 9 Segmented Code Compact Strings Ahead-of-Time Compilation Conclusion

54 Java Strings H E L L O 0x0048 0x0045 0x004C 0x004C 0x004F
public class HelloWorld { public static void main(String[] args) { String myString = "HELLO"; System.out.println(myString); } public final class String { private final char value[]; ... } H E L L O char value[] = 0x0048 0x0045 0x004C 0x004C 0x004F UTF-16 encoded 2 bytes

55 “Perfection is achieved, not when there is nothing more to add, but when there is nothing more to take away.” Antoine de Saint Exupéry

56 There is a lot to take away here..
UTF-16 encoded Strings always occupy two bytes per char Wasted memory if only Latin-1 (one-byte) characters used: But is this a problem in real life? H E L L O char value[] = 0x0048 0x0048 0x0045 0x0045 0x004C 0x004C 0x004C 0x004C 0x004F 0x004F 2 bytes

57 Real life analysis: char[] footprint
950 heap dumps from a variety of applications char[] footprint makes up 10% - 45% of live data Majority of characters are single byte Predicted footprint reduction of 5% - 10%

58 Project Goals Memory footprint reduction by improving space efficiency of Strings Meet or beat performance of JDK 9 Full compatibility with related Java and native interfaces Full platform support x86/x64, SPARC, ARM Linux, Solaris, Windows, Mac OS X

59 Design String class now uses a byte[] instead of a char[]
Additional 'coder' field indicates which encoding is used public final class String { private final byte value[]; private final byte coder; ... } H E L L O byte value[] = 0x00 0x48 0x00 0x45 0x00 0x4C 0x00 0x4C 0x00 0x4F UTF-16 encoded byte value[] = 0x48 0x45 0x4C 0x4C 0x4F Latin-1 encoded H E L L O

60 Design If all characters have a zero upper byte
String is compressed to Latin-1 by stripping off high order bytes If a character has a non-zero upper byte String cannot be compressed and is stored UTF-16 encoded byte value[] = 0x00 0x47 0x48 0x00 0x45 0x00 0x4C 0x00 0x4C 0x00 0x4F UTF-16 encoded Compression Inflation byte value[] = 0x48 0x45 0x4C 0x4C 0x4F Latin-1 encoded

61 Design Compression / inflation needs to fast
Requires HotSpot support in addition to Java class library changes JIT compilers: Intrinsics and String concatenation optimizations Runtime: String object constructors, JNI, JVMTI GC: String deduplication Kill switch to enforce UTF-16 encoding (-XX:-CompactStrings) For applications that extensively use UTF-16 characters

62 Evaluation: Performance
SPECjbb2005 21% footprint reduction 27% less GCs 5% throughput improvement SPECjbb2015 7% footprint reduction 11% critical-jOps improvement Weblogic (startup) 10% footprint reduction 5% startup time improvement To be released with JDK 9

63 Outline Intro: The HotSpot Java Virtual Machine
Part 1: What's cool in Java 8 Background: JIT compilation in HotSpot Tiered Compilation Part 2: What's new in Java 9 Segmented Code Compact Strings Ahead-of-Time Compilation Conclusion

64 Ahead-of-Time Compilation
Compile Java classes to native code prior to launching the VM AOT compilation is done by new jaotc tool Uses Java based Graal compiler as backend Stores code and metadata in shared object file Improves start-up time Limited impact on peak performance Sharing of compiled code between VM instances

65 Revisit: Performance of a method (Tiered Compilation)
Interpreted C1 compiled C2 compiled Compilation Deoptimization Performance Time VM Startup

66 Performance of a method (Tiered AOT)
AOT compiled C2 compiled Compilation Interpreted Deoptimization C1 compiled Performance Time We replace the interpreter VM Startup

67 Ahead-of-Time Compilation
Experimental feature Supported on Linux x64 Limited to the java.base module Try with your own code - feedback is welcome! To be released with JDK 9 More to come in future releases

68 Summary Many cool features in Java 8
Tiered Compilation More features to come with Java 9 in July 2017 Segmented Code Cache, Compact Strings, Ahead-of-Time compilation Java – A vibrant platform Early access releases are available: I hope I was able to convince you that Java is a vibrant platform Here is a quote from Larry Ellison that shows Oracle‘s dedication to Java "Our SaaS products are built on top of Java and the Oracle DB—that’s the platform.” Larry Ellison, Oracle CTO

69

70 Evaluation: Code locality
Instruction Cache (ICache) 14% less ICache misses Instruction Translation Lookaside Buffer (ITLB1) 44% less ITLB misses Overall performance 9% speedup with microbenchmark 1 caches virtual to physical address mappings to avoid slow page walks

71 Microbenchmark: LogLineBench
public class LogLineBench { int size; String method = generateString(size); public String work() throws Exceptions { return "[" + System.nanoTime() + "] " + Thread.currentThread().getName() + "Calling an application method \"" + method + "\" without fear and prejudice."; }

72 LogLineBench results Kill switch works (no regression)
Performance ns/op Allocated b/op 1 10 100 Baseline 149 153 231 888 904 1680 CS disabled 152 150 230 CS enabled 142 139 169 504 512 Kill switch works (no regression) 27% performance improvement and 46% footprint reduction


Download ppt "What is cool in Java 8 and new in 9"

Similar presentations


Ads by Google