1 The Limits of Java Performance: Breaking through the Scalability Barriers imposed by the Java Platform. Ron Kleinman, Lead Product Technologist

2 Agenda
Java Platform Scalability Barriers? What Scalability Barriers?
─ The language is the platform. How big a platform?
Avoiding the Barriers: What do people do today?
When the Barriers can't be avoided: Scalability Design Patterns
─ Scaling Out: When performance fades, add some blades
─ Scaling Up: Performance Gains through Virtual Domains
─ Scaling Middleware: What seems local is remote
─ Scaling External: Moving to a bigger house
Focused Solution: Java Compute Appliance
Building it out: Leveraging an integrated Appliance Architecture

3 Scalability Barriers: What Scalability Barriers?
There just seems to be something about Managed Runtime Environments.
"Perhaps the most commonly asked questions regarding memory management in .NET are: 'How long does a garbage collection take?' and 'How can I control when the garbage collector runs?' Apprehensive that 'pauses' caused by garbage collections will be perceived by users, application developers often search for ways to control when garbage collections occur." - Steven Pratschner, Microsoft Program Manager for the .NET Common Language Runtime
"Ruby's garbage collector (GC) has become a problem for the Luz user experience. The GC process can cause the entire application to pause for upwards of 200 ms at a time (on a P3 1.2 GHz), which is simply unacceptable for an application doing real-time animation where, to achieve even 24 fps, a new frame must be generated every 42 ms. As a result, we see 'hiccups' in the animation." - Gnome Coder

4 What Java Platform Scalability Barriers?
1. Resource Limit on Maximum # usable GB of Memory
─ Unused Memory must be freed and defragmented
─ All "in use" references must be found, flagged and changed
─ GC Pauses scale linearly with memory size (~1 GB max)
2. Resource Limit on Maximum # usable CPUs
─ Synchronized Method "Large Grained" (see the sketch below)
─ Lock suspends all but one thread
─ Lock Contention, not Data Contention
─ 4 CPUs vs. 400
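The second barrier is easiest to see in code. A minimal sketch (the QuoteCache class is hypothetical, not from the deck): one synchronized method guards the whole map with a single monitor, so threads compete for the lock even when they touch unrelated keys, and adding CPUs adds lock contention rather than throughput.

```java
// Hypothetical illustration of the "large-grained lock" barrier: every caller
// serializes on the same monitor, regardless of which symbol it touches.
import java.util.HashMap;
import java.util.Map;

public class QuoteCache {
    private final Map<String, Double> lastPrice = new HashMap<>();

    // Coarse-grained: 400 threads updating 400 different symbols still run one at a time.
    public synchronized void update(String symbol, double price) {
        lastPrice.put(symbol, price);
    }

    public synchronized Double get(String symbol) {
        return lastPrice.get(symbol);
    }
}
```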

5 "Avoiding the Barriers" (Living within 4 CPU / 4 GB constraints)
Force garbage collection to occur at non-peak times
Write components in C, C++
─ Use native components rather than Java components (JNI)
Limit Java cache sizes
─ Reuse own Memory (keep own pool)
Handcrafted fine tuning:
─ GC Algorithm Dependency (à la VMS Fortran "File Open")
Throw more hardware at the problem (may not work)
Recode the Application
─ Increase CPU concurrency with finer-grained locking (R/W) - see the sketch below
─ Attack GC Pauses with Real Time Java Extensions
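A minimal sketch of the "finer-grained locking (R/W)" workaround using the standard java.util.concurrent.locks API; the RwQuoteCache class is hypothetical, not part of the presentation. Reads proceed in parallel and only writes are exclusive, which suits the read-heavy caches described later (100 readers for every writer).

```java
// Finer-grained read/write locking: concurrent readers, exclusive writers.
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RwQuoteCache {
    private final Map<String, Double> lastPrice = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public void update(String symbol, double price) {
        lock.writeLock().lock();          // exclusive: blocks readers and writers
        try {
            lastPrice.put(symbol, price);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public Double get(String symbol) {
        lock.readLock().lock();           // shared: many readers may hold this at once
        try {
            return lastPrice.get(symbol);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```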

6 "Confronting the Barriers"
"If something can't go on forever, it won't." - Herb Stein, Former Chair of the Council of Economic Advisers (pre-2005)

7 Dealing with Increasing Peak Loading: The Trade Exchange Program
[Diagram: a single deployed host - Operating System, Memory, CPUs, Application, JVM, Stock DB Cache]

8 Maintaining Service Level Agreements in the face of massively increasing demand
# Stock Feeds up
─ More sources of data to correlate
# Trades up
─ Greater volume of transactions to handle
# Metrics up
─ More things to monitor for each trade
Processing / Metric up
─ "Secret Sauce" Trading algorithms more complex
Required maximum response times way down
─ 1-2 msec and lower
─ Significant swings in Latency Jitter intolerable
─ GC Pause can cost $$$

9 Handling Increasing Workload: It's not just providing additional Capacity

10 It's making that Capacity usable

11 Java Application Scalability Design Patterns: Adding Computing Capacity
Multiple Real Application Instances
─ 1. Horizontal (Scale Out - with Commodity Servers)
─ 2. Vertical (Scale Up - with Hypervisor Domains on Enterprise Servers)
Single Virtual Application Instance
─ 3. Middleware (Scale Virtually - with customized software modules)
Single Real Application Instance
─ 4. External (Scale Specialized - with Java Compute Appliances)

12 1. Horizontal Scale Out to host multiple instances
Add more commodity servers to the Data Center
[Diagram: two commodity servers, each with its own Operating System, Memory, CPUs, Application, JVM and Stock DB Cache partition - one holding [A-L], the other [M-Z]]
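Scaling out this way forces the data refactoring called out on the next slides: the stock cache must be partitioned by key range and every request routed to the instance that owns the key. A minimal sketch, assuming a hypothetical ShardRouter and two hosts (names are illustrative only):

```java
// Hypothetical key-range router for the [A-L] / [M-Z] split shown above.
// Each application instance owns a shard, and callers must know which one to ask.
import java.util.List;

public class ShardRouter {
    private final List<String> shardHosts;   // e.g. ["host-a-l:8080", "host-m-z:8080"]

    public ShardRouter(List<String> shardHosts) {
        this.shardHosts = shardHosts;
    }

    // Route by the first letter of the ticker symbol: A-L -> shard 0, M-Z -> shard 1.
    public String hostFor(String symbol) {
        char c = Character.toUpperCase(symbol.charAt(0));
        int shard = (c <= 'L') ? 0 : 1;
        return shardHosts.get(shard);
    }
}
```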

13 2. Vertical Scale Up to host multiple instances
Create more virtual servers on a Hypervisor
[Diagram: one enterprise server (CPUs, Memory) running a Hypervisor that hosts two virtual servers, each with its own Operating System, JVM, Application and Stock DB Cache partition ([A-L] and [M-Z])]

14 Breaking through the Java Platform Scalability Barrier: Hardware Servers vs. Virtual Servers

Scale: Pure Horizontal
─ Strategy: Separate Application Instances on Hardware Servers
─ Advantages: Easy expansion via addition of homogeneous commodity servers. "Cloud-izable" / Hadoop-ish
─ Issues: Refactor Data (Shards), Recode Application, Peak load swings can exceed resource limits, Partial crashes, Load Management, Over Provisioning, Server sprawl

Scale: Pure Vertical
─ Strategy: Separate Application Instances on Virtual Servers
─ Advantages: Hypervisor provides better resource utilization. Reduces Server Sprawl. Easier to manage.
─ Issues: Same Java Platform limitations within each instance (Refactor Data, Recode Application, Peak load swings can still exceed JVM memory capacity / result in huge pauses). Cloud "Orthogonal".

15 3. Memory Scale Out to multiple systems
Use Middleware to simulate one huge memory heap
[Diagram: two commodity servers (Operating System, Memory, CPUs, Application, JVM running Instrumented Byte Codes) backed by a central Virtual Memory Hub hosting a Federated Global Memory Cache and the Stock DB [A-Z]]

16 Breaking through the Java Platform Scalability Barrier: Multiple Local Memory Heaps vs. Single Global Memory Heap

Scale: Pure Horizontal
─ Strategy: Separate Application Instances on Commodity Servers with separate local memory
─ Advantages: Easy expansion via addition of homogeneous commodity servers. "Cloud-izable" / Hadoop-ish
─ Issues: Refactor Data, Recode Application, Peak load swings can exceed resource limits, Partial crashes, Load Management, Over Provisioning, Server sprawl

Scale: + Shared Global Memory
─ Strategy: Shared global memory supported by Java byte code instrumentation (get/put element)
─ Advantages: Selected object elements shared, dynamically cached, transparently updated from a central source. Effective JVM memory limits transparently bypassed.
─ Issues: Not all data elements can be shared (ex: hash keys). Cache misses can cause widely varying response latencies. Performance dependent upon data usage (reads >> writes good). Multiple points of partial failure. Central hub limits Cloud Computing. Global thread locks tough to scale.
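A conceptual sketch of what the "get/put element" instrumentation amounts to; this is not any vendor's actual mechanism, and RemoteHub is a hypothetical interface. A read is served from a local cache and falls back to the central hub on a miss (the source of the variable latencies listed above), while a write goes through to the hub.

```java
// Conceptual illustration only: byte-code instrumentation would rewrite field
// accesses into hooks like these. RemoteHub and its methods are hypothetical.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class GlobalMemoryClient {
    interface RemoteHub {                       // assumed central memory hub API
        Object fetch(String key);
        void store(String key, Object value);
    }

    private final RemoteHub hub;
    private final Map<String, Object> localCache = new ConcurrentHashMap<>();

    public GlobalMemoryClient(RemoteHub hub) {
        this.hub = hub;
    }

    // "get element": serve from the local cache; on a miss, pay a network round trip.
    public Object getElement(String key) {
        return localCache.computeIfAbsent(key, hub::fetch);
    }

    // "put element": write through to the hub and refresh the local copy.
    public void putElement(String key, Object value) {
        hub.store(key, value);
        localCache.put(key, value);
    }
}
```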

17 4. Externally Scale on a Specialized Java Appliance
Add physical memory and CPUs as needed
[Diagram: the Original Deployed Host retains only a JVM Proxy; the Application, JVM and Stock DB Cache now run on a Java Compute Appliance with its own Kernel, Memory and CPUs]

18 Scalability with the Appliance Design Pattern
─ Physically Isolate Resource
─ Share Resource
─ Centrally Manage
─ Expand Capacity
─ Extend Functionality
[Diagram: applications transparently utilizing remote hardware (Memory, CPUs, Network, Storage) externalized into an Appliance]

19 Example #1: Router (Share, Manage, Scale Up, Extend)
Resource Externalized: Network
Router Appliance:
─ Guaranteed Message Delivery
─ Auto-encryption
─ Protocol Gateway
─ High bandwidth WAN connections
[Diagram: applications (Operating System, Memory, CPUs) connected to the Network through a Router Appliance]

20 Example #2: Storage Area Network (Share, Manage, Scale Up, Extend)
Resource Externalized: Storage
Storage Area Network (SAN):
─ Flash as Storage
─ Disk Mirroring
─ Need based Allocation
[Diagram: applications (Operating System, Memory, CPUs) using externalized Storage through a SAN]

21 Example #3: Java Compute Appliance (JCA) (Share, Manage, Scale Up, Extend)
Transparently bring the Application to the Resources
[Diagram: the Original Deployed System (Operating System, Memory, CPUs) keeps a Proxy JVM; the Java Application runs on the Appliance JVM inside the JCA, which provides an Optimized Kernel, 100's of GBs of Memory and 100's of CPUs]

22 Example #3: Java Compute Appliance (JCA): Complete Java Application / Deployed Platform separation
JVM: Decouples a Java Application from the OS
─ Decoupled from local hardware (& any Hypervisor)
─ Decoupled from connected appliances
─ Decoupled from Middleware
─ Last remaining resource connections are Memory and CPUs
Move the Java Application to its Computing Resources
─ Decouple from the original deployment platform entirely
─ Transparently redeploy on a Java Compute Appliance (JCA)
─ Use Appliance Memory and CPUs
─ Same appliance advantages apply: Share, Centrally Manage, Expand, Optimize / Extend
─ And some other ones as well (Stability)

23 [Diagram: a three-tier deployment - Web tier, App Server Hosts running VM Proxies, and Database - with the application JVMs backed by a shared Compute Pool]

24 Java Compute Appliance: Sharing the Resources

25 An integrated Java Compute Appliance: Vega 3
─ Up to 864 CPU Cores, 768 GBytes of Memory
─ On-Chip Hardware Extensions
─ Azul Thread Execution Kernel (AzTEK)
─ Multiple Mission Critical Java Applications, each on its own Azul VM

26 #1. The GC Pause Scalability Barrier: Make the problem part of the solution
Maximum usable Memory Limit Removed
─ Scale from 1 to 100 GB heap
─ Constant response latencies of 1-3 msec
─ No change to existing Java code

27 Impact of Garbage Collection (Actual Financial Service Trading system under load)
Performance Impact / Complexity Impact
Native Configuration:
-Xms2g -Xmx2g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:TargetSurvivorRatio=80 -XX:CMSInitiatingOccupancyFraction=85 -XX:SurvivorRatio=8 -XX:MaxNewSize=320m -XX:NewSize=320m -XX:MaxTenuringThreshold=10
Azul Configuration:
-Xms3g -Xmx3g
[Chart: response latency under load, Native vs. With Azul (yes, that flat blue line)]
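To reproduce the general shape of this chart on any JVM, a crude "hiccup meter" is enough: a timing loop under allocation pressure that records the largest gap between samples. The sketch below is illustrative only and is not the benchmark harness behind the slide.

```java
// Crude GC-pause / hiccup detector: each iteration should take ~1 ms of sleep;
// any gap far above that is a stall, which on most collectors is a GC pause.
import java.util.ArrayList;
import java.util.List;

public class HiccupMeter {
    public static void main(String[] args) throws InterruptedException {
        List<byte[]> retained = new ArrayList<>();
        long worstGapMs = 0;
        long last = System.nanoTime();
        for (int i = 0; i < 100_000; i++) {
            retained.add(new byte[10_000]);          // steady allocation pressure
            if (retained.size() > 5_000) {
                retained.subList(0, 2_500).clear();  // let older buffers become garbage
            }
            Thread.sleep(1);                         // expect roughly 1 ms between samples
            long now = System.nanoTime();
            long gapMs = (now - last) / 1_000_000;
            if (gapMs > worstGapMs) {
                worstGapMs = gapMs;
                System.out.println("new worst gap between samples: " + gapMs + " ms");
            }
            last = now;
        }
        System.out.println("worst observed gap: " + worstGapMs + " ms");
    }
}
```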

28 #2. The Large-Grained Lock Scalability Barrier: A two-tier integrated approach
Serialized portions of a program severely limit scalability.
Amdahl's Law: Speedup = 1 / ((1 - P) + P/N), so Efficiency = Speedup / N
(N = # of concurrent threads, P = run-time fraction of parallelizable code)
─ At 4 threads: 5% serialized code = 87%+ efficiency
─ At 400 threads: 5% serialized code = <5% efficiency!
Solution
─ Automated: Optimistic Thread Concurrency (OTC)
─ Manual: Real Time Performance Monitoring (RTPM)
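The arithmetic behind those two bullets, as a small self-contained check (class and method names are arbitrary):

```java
// Amdahl's Law check for the figures quoted above (P = 0.95, i.e. 5% serialized code).
public class Amdahl {
    static double efficiency(double p, int n) {
        double speedup = 1.0 / ((1.0 - p) + p / n);   // Amdahl's Law
        return speedup / n;                            // efficiency = speedup per thread
    }

    public static void main(String[] args) {
        System.out.printf("4 threads:   %.1f%% efficiency%n", 100 * efficiency(0.95, 4));    // ~87%
        System.out.printf("400 threads: %.1f%% efficiency%n", 100 * efficiency(0.95, 400));  // ~4.8%
    }
}
```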

29 Optimistic Thread Concurrency (OTC)
Strategy: Assume no data contention
How it Works
─ Java Synchronized Block: Similar to a DB transaction
─ Block is Transactional around synchronized {…}
─ Transparent roll back if an object element is impacted
─ JVM Dynamic lock levels (Speculative, Thick), based on runtime profile
Where it Works
─ Thread instances access different variables
─ Thread instances access the same variables for read
─ Hash Table for product database: 100 readers for every writer
When it Works
─ Parallel execution of all threads in the same synchronized method
─ Competition for actual data elements, not the lock
─ Amdahl's law: Efficiency reflects actual data (not lock) contention times
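OTC itself lives in the Azul JVM and hardware, so it cannot be shown in application code. The sketch below is only an analogy in standard Java: an optimistic "do the work, validate, retry on conflict" loop built on compareAndSet. It conveys the run-first / roll-back-on-conflict idea, but it is not Azul's mechanism.

```java
// Analogy only: optimistic execute-then-validate using java.util.concurrent.atomic.
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

public class OptimisticCounterMap {
    // Immutable snapshot; a successful compareAndSet is the "commit".
    private final AtomicReference<Map<String, Integer>> state =
            new AtomicReference<>(Map.of());

    public void increment(String key) {
        while (true) {
            Map<String, Integer> current = state.get();          // optimistic read
            Map<String, Integer> updated = new HashMap<>(current);
            updated.merge(key, 1, Integer::sum);                 // do the work speculatively
            if (state.compareAndSet(current, Map.copyOf(updated))) {
                return;                                          // no conflict: commit
            }
            // Another thread committed first: discard our work and retry (the "roll back").
        }
    }

    public int get(String key) {
        return state.get().getOrDefault(key, 0);
    }
}
```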

30 + Real Time Performance Monitoring (RTPM)
JVM-assisted deep visibility into Application Performance
─ Threads (List / States, Trace, Lock Contention details, CPU Usage)
─ GC (Cycle phase results, min/max pauses, memory used / freed)
─ Memory (Detailed Live Objects breakdown, updated every GC cycle)
─ Socket IO (Open connections, quantity of data, associated latency)
Performance Bottleneck and Problem Detection
─ Multi-core processing & concurrency
─ Memory Demands and Memory Leaks
─ Multithreading Race Conditions (**)
Zero Overhead
─ Monitoring won't impact the application being monitored
─ No disturbance to the production environment
Real Time – Always On
─ Allows ID of Performance Problems as they happen
─ No application restarts
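RTPM is Azul-specific, but the kind of data it surfaces (thread states, lock-contention counts, GC totals) can be approximated, with more overhead, through the standard java.lang.management API. A minimal sketch, not Azul's tooling:

```java
// Standard-JDK approximation of RTPM-style data: thread states, blocked counts,
// and per-collector GC counts and times.
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class MiniMonitor {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {   // locked monitors + synchronizers
            System.out.printf("%-30s %-15s blocked %d times%n",
                    info.getThreadName(), info.getThreadState(), info.getBlockedCount());
        }
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("GC %-20s collections=%d totalTime=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```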

31 A Java Compute Appliance gives Java Applications Room to Scale
Traditional:
─ Garbage Collection Pauses
─ 2 GB Heaps
─ 2-4 CPUs
─ Lock Contention
─ Instabilities due to resource limitations
─ Over-provisioning and server sprawl
With a Java Compute Appliance:
─ Pauseless GC
─ Up to 670 GB Heaps
─ 100s of CPUs
─ OTC / RTPM
─ No resource related restarts
─ Server consolidation

32 JCA Product Proof Point: Winner of largest single-instance JVM benchmark
[Chart: SPECjbb2005 results comparing the appliance against the 7380, E25K, Itanium RX6600, T5220, PowerEdge 2950 and P570, with system cost annotations of $0.75M and $4.5M]

33 Breaking through the Java Platform Scalability Barrier: Distributed Global Memory Heap vs. Appliance-enhanced JVM

Scale: Shared Global Memory
─ Strategy: Separate Application Instances on Commodity Servers provided with a local, dynamically updated cache of shared global memory
─ Advantages: Usable JVM memory limits transparently extended via a local cache of the larger global memory. Easy CPU scalability via addition of commodity servers.
─ Issues: Not all data elements can be shared (ex: hash keys). Cache misses can cause widely varying response latencies. Performance dependent upon data usage (reads >> writes good). Multiple points of partial failure. Global data locks tough to scale.

Scale: External Java Appliance
─ Strategy: Single Application Instance on an Appliance provided with massive amounts of usable CPUs and memory
─ Advantages: Transparent JVM scalability to full utilization of all memory and CPU resources on the JCA. No "resource limit" crashes. Predictably low response latencies.
─ Issues: Not applicable to all Java Apps (heavy JNI use, "chatty" DB applications, single threaded). IT objections to a new hardware configuration. SAAS, not Cloud.

34 Summary
Java barriers to scalability are becoming more painful:
─ Memory utilization limited by GC pauses
─ CPU utilization limited by coarse-grained thread locks ("synchronized")
Obvious workarounds take you only so far
Standard "scale out" & "scale up" strategies have drawbacks
─ Server sprawl, code modifications, partial failures,...
Additional (and transparent) scalability solutions are possible for "Managed Environments"
─ Shared Global Memory
─ External Java Compute Appliance
No one answer is right in all cases

35 Questions

36 References
Azul Engineer to Engineer Technical Site
─ http://www.azulsystems.com/e2e/
VMS Fortran File Open Options
─ http://www.astro.virginia.edu/class/oconnell/astr511/idl_5.1_html/idl130.htm
My email address
─ ron.kleinman@yahoo.com

37 Additional Material...

38 Breaking through the Java Platform Scalability Barrier: Transparent App Redeployment on a Java Compute Appliance

Scale: Horizontal
─ Strategy: Separate Application Instances on Commodity Servers
─ Strengths: Easy expansion via addition of homogeneous commodity servers. "Cloud-izable"
─ Issues: Refactor Data, Recode Application, Peak load swings can exceed resource limits, Partial crashes, Load Management, Over Provisioning, Server sprawl

Scale: Vertical
─ Strategy: Separate Application Instances on Virtual Servers
─ Strengths: Hypervisor provides better resource utilization. Reduces Server Sprawl. Easier to administrate.
─ Issues: Same Java Platform limitations within each instance (Refactor Data, Recode Application, Peak load swings can still exceed JVM memory capacity). Cloud orthogonal.

Scale: External Appliance
─ Strategy: Redeploy a single Instance of the Application on a specialized JCA
─ Strengths: Enough usable memory & CPUs to operate as before. No "resource limit" crashes. Scalability is Transparent.
─ Issues: Not applicable to all Java Applications (heavy JNI use, "chatty" DB applications, single threaded). IT objection to a new hardware configuration. SAAS, not Cloud.

39 Other Advantages of an Integrated Appliance: Low-level hooks into Kernel / Hardware
Compute Pool Manager
─ Central view of Appliance Resources
─ Policy based management
─ Establish resource guarantees
─ Set application resources (min, max, redundancy, etc.)
Real Time Performance Monitor
─ Zero cost Java application probes
─ Extensive memory & thread usage info
─ Isolate problems even in Production

40 Java Compute Appliance: Summary Value Proposition: Share, Manage, Scale up, Extend
Large (100's of GB) heap support / No user-visible pauses
─ Reduce maximum response latency / jitter
─ Reduce total application instance count (fewer / larger instances)
─ End crashes due to hitting the memory limit under peak loading
─ Enable new design alternatives (e.g. cache the entire database in memory)
Hardware-assisted Optimistic Thread Concurrency
─ LHF: Critical bottlenecks minimized
JVM-assisted Real Time Performance Monitor
─ Critical bottlenecks discovered
No new APIs required (ex: Real Time Java)
─ Nor any code changes for tuning / performance
No changes to application deployment procedures

