Optimizing your Java Applications for multi-core hardware

Presentation transcript:

Optimizing your Java Applications for multi-core hardware
Prashanth K Nageshappa (prashanth.k.n@in.ibm.com)
Java Technologies, IBM

Agenda
- Evolution of Processor Architecture
- Why should I care?
- Think about Scalability
- How to exploit Parallelism in Java
- JVM optimizations for multi-core scalability

As The World Gets Smarter, Demands On IT Will Grow
- Examples: smart supply chains, intelligent oil field technologies, smart food systems, smart healthcare, smart energy grids, smart retail
- Callout figures on the slide: 10x, 1 Trillion, 25 Billion
- Devices will be connected to the internet by 2011
- Global trading systems are under extreme stress, handling billions of market data messages each day
- Digital data is projected to grow tenfold from 2007 to 2011
- 70% on average is spent on maintaining current IT infrastructure versus adding new capabilities
- IT infrastructure must grow to meet these demands: global scope, processing scale, efficiency

Hardware Trends
- Increasing transistor density
- Clock speed leveling off
- Increasing number of cores
- Non-Uniform Memory Access (NUMA)
- Main memory getting larger

In 2010, POWER Systems Bring Massive Parallelism

System             Year  Process  Threads/core  Cores/chip  Sockets/server  Threads
2010 POWER system  2010  45 nm    4             8           32              1024
POWER6™            2007  65 nm    2             2           32              128
POWER5™            2004  130 nm   2             2           32              128
POWER4™            2001  180 nm   1             2           16              32

Why should I care?
- Your application may be re-used
- Better performance
- Better leverage additional resources
  - Cores, hardware threads, memory, etc.

Think about scalability
- Serial bottlenecks inhibit scalability
  - Organize your application into parallel tasks
  - Consider the Executor API for tasks (java.util.concurrent.Executor, ExecutorService)
  - Too many threads can be just as bad as too few
- Do not rely on the JVM to discover opportunities
  - No automatic parallelization
  - Java class libraries do not exploit vector processor capabilities
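
The advice to organize work into parallel tasks on a bounded pool can be sketched as follows. This is a minimal illustration, not the presenter's code: the class name `ParallelTasks`, the sum-of-squares workload, and the one-chunk-per-worker split are all assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelTasks {
    // Hypothetical unit of work: sum of squares over [from, to).
    static long sumOfSquares(int from, int to) {
        long total = 0;
        for (int i = from; i < to; i++) {
            total += (long) i * i;
        }
        return total;
    }

    // Splits [0, n) into roughly one chunk per worker and runs the chunks on a fixed-size pool.
    static long parallelSumOfSquares(int n, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = (n + workers - 1) / workers;
            List<Future<Long>> parts = new ArrayList<>();
            for (int start = 0; start < n; start += chunk) {
                final int from = start;
                final int to = Math.min(n, start + chunk);
                Callable<Long> task = () -> sumOfSquares(from, to);
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // waits for that chunk to finish
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Size the pool from the hardware rather than creating one thread per task.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(parallelSumOfSquares(1_000_000, cores));
    }
}
```

Sizing the pool from `availableProcessors()` is one simple way to avoid both too few and too many threads for CPU-bound work; I/O-bound work may call for a different size.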

Think about scalability
- Load imbalance
  - Workload not evenly distributed
  - Consider breaking large tasks into smaller ones
  - Change serial algorithms to parallel ones
- Tracing and I/O
  - A bottleneck unless updates are infrequent or the log is striped (RAID)
  - Blocking disk/console I/O inhibits scalability
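
One way to break a large task into smaller ones is recursive splitting with the fork/join framework, which arrived in Java SE 7, after this talk. The sketch below is an illustration under that assumption; the class name `ArraySumTask` and the `THRESHOLD` value are hypothetical and would need tuning for a real workload.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ArraySumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // assumed cut-off below which work is done serially
    private final int[] data;
    private final int lo;
    private final int hi;

    ArraySumTask(int[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            long total = 0;
            for (int i = lo; i < hi; i++) {
                total += data[i];
            }
            return total;
        }
        int mid = (lo + hi) >>> 1;
        ArraySumTask left = new ArraySumTask(data, lo, mid);
        ArraySumTask right = new ArraySumTask(data, mid, hi);
        left.fork();                       // schedule the left half; idle workers can steal it
        long rightSum = right.compute();   // compute the right half in the current thread
        return left.join() + rightSum;
    }

    // Sums the array using the common fork/join pool (Java 8+).
    static long sum(int[] data) {
        return ForkJoinPool.commonPool().invoke(new ArraySumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i % 10;
        }
        System.out.println(sum(data));
    }
}
```

Because the halves are small and numerous, work stealing tends to even out the load imbalance mentioned above.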

Synchronization and locking
- J9's three-tiered locking: spin, yield, OS
- Avoid synchronization in static methods
- Consider breaking long synchronized blocks into several smaller ones
  - May be bad if it results in many context switches
- Java Lock Monitor (JLM) tool can help
  - http://perfinsp.sourceforge.net/jlm.html
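
A closely related technique to splitting long synchronized blocks is narrowing the critical section so that only the shared-state update holds the lock. The sketch below assumes a hypothetical `ResultLog` class whose `format` step touches no shared state; it is not taken from the presentation.

```java
import java.util.ArrayList;
import java.util.List;

class ResultLog {
    private final Object lock = new Object();
    private final List<String> entries = new ArrayList<>();

    // Hypothetical expensive step; it reads no shared state, so it needs no lock.
    private static String format(int value) {
        return "value=" + value;
    }

    // Coarse version: formatting happens while the lock is held.
    void recordCoarse(int value) {
        synchronized (lock) {
            entries.add(format(value));
        }
    }

    // Narrow version: only the shared-list update is inside the critical section.
    void recordNarrow(int value) {
        String line = format(value);
        synchronized (lock) {
            entries.add(line);
        }
    }

    int size() {
        synchronized (lock) {
            return entries.size();
        }
    }

    // Runs `threads` writers that each record `perThread` entries; returns the final size.
    static int runWriters(int threads, int perThread) {
        ResultLog log = new ResultLog();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    log.recordNarrow(i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return log.size();
    }

    public static void main(String[] args) {
        System.out.println(runWriters(4, 1000));
    }
}
```

Whether the narrow version wins depends on how expensive the work outside the lock is; a tool such as JLM can show whether the monitor is actually contended.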

Synchronization and locking
- Volatiles
  - Compiler will not cache the value
  - Creates a memory barrier
- Avoid synchronized container classes
  - Building scalable data structures is difficult
  - Use java.util.concurrent (j/u/c)
- Non-blocking object access
  - Possible with j/u/c
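
The visibility guarantee of `volatile` is easiest to see with a stop flag. This is a small sketch with a hypothetical `StoppableWorker` class; note that `volatile` gives visibility only and does not make compound updates such as `count++` atomic.

```java
class StoppableWorker implements Runnable {
    // volatile: each read in the loop observes the latest write made by another thread.
    private volatile boolean running = true;
    // Written only by the worker thread; safe to read from another thread after join().
    private long iterations;

    @Override
    public void run() {
        while (running) {
            iterations++;
        }
    }

    void stop() {
        running = false;
    }

    long iterations() {
        return iterations;
    }

    // Starts a worker, asks it to stop, and reports whether it exited within the timeout.
    static boolean stopsWithin(long timeoutMillis) {
        StoppableWorker worker = new StoppableWorker();
        Thread thread = new Thread(worker);
        thread.setDaemon(true); // do not keep the JVM alive if something goes wrong
        thread.start();
        try {
            Thread.sleep(20);
            worker.stop();
            thread.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !thread.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("stopped: " + stopsWithin(5000));
    }
}
```

Without `volatile` on `running`, the JIT compiler would be allowed to hoist the read out of the loop, and the worker might never observe the stop request.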

java.util.concurrent package
- Introduced in Java SE 5
- Alternative forms of strong synchronization
  - Lighter weight, better scalability compared to intrinsic locks
- java.util.concurrent.atomic.*
- java.util.concurrent.locks.*
- Concurrent collections
- Synchronizers
- Task executors (Executor framework)

j/u/c/atomic.*
- Atomic primitives
  - Strong form of synchronization
  - But do not use locks: non-blocking
  - Exploit atomic instructions such as compare-and-swap in hardware
  - Support compound actions
- Classes: AtomicBoolean, AtomicInteger, AtomicIntegerArray, AtomicIntegerFieldUpdater, AtomicLong, AtomicLongArray, AtomicLongFieldUpdater, AtomicMarkableReference, AtomicReference, AtomicReferenceArray, AtomicReferenceFieldUpdater, AtomicStampedReference

j/u/c/atomic.*
- Getters and setters: get, set, lazySet
- Updates: getAndSet, getAndAdd/getAndIncrement/getAndDecrement, addAndGet/incrementAndGet/decrementAndGet
- CAS: compareAndSet/weakCompareAndSet
- Conversions: toString, intValue, longValue, floatValue, doubleValue
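
The update and CAS methods listed above can be sketched with `AtomicInteger`. The class name `AtomicDemo` and the "running maximum" use case are assumptions made for illustration; the retry loop shows how a compound action is built from `compareAndSet` without a lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class AtomicDemo {
    private final AtomicInteger hits = new AtomicInteger();
    private final AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);

    // Lock-free increment; returns the updated value.
    int hit() {
        return hits.incrementAndGet();
    }

    // Compound "set if larger" action built from a compare-and-set retry loop.
    void offer(int candidate) {
        while (true) {
            int current = max.get();
            if (candidate <= current) {
                return; // nothing to update
            }
            if (max.compareAndSet(current, candidate)) {
                return; // our update won
            }
            // Another thread changed max in between; re-read and retry.
        }
    }

    int max() {
        return max.get();
    }

    // Increments one shared counter from several threads and returns the final count.
    static int countConcurrently(int threads, int perThread) {
        AtomicDemo demo = new AtomicDemo();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    demo.hit();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return demo.hits.get();
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 10_000));
    }
}
```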

j/u/c/locks.*
- Problems with intrinsic locks
  - Impossible to back off from a lock attempt
    - Deadlock
  - Lack of features
    - Read vs write
    - Fairness policies
  - Block-structured
    - Must lock and release in the same method
- j/u/c/locks
  - Greater flexibility for locks and conditions
  - Non-block-structured
  - Provides reader-writer locks
    - Why block other readers?
    - Better scalability

j/u/c/locks.*
- Interfaces: Condition, Lock, ReadWriteLock
- Classes: ReentrantLock, ReentrantReadWriteLock, LockSupport, AbstractQueuedSynchronizer
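
Two of the features named above, reader-writer locks and backing off from a lock attempt, can be sketched in one small class. `SettingsStore` and its method names are hypothetical; the lock calls themselves (`readLock()`, `writeLock()`, `tryLock(long, TimeUnit)`) are standard `java.util.concurrent.locks` API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class SettingsStore {
    private final ReadWriteLock rw = new ReentrantReadWriteLock();
    private final Map<String, String> values = new HashMap<>();

    // Many readers may hold the read lock at the same time.
    String get(String key) {
        rw.readLock().lock();
        try {
            return values.get(key);
        } finally {
            rw.readLock().unlock();
        }
    }

    // Writers are exclusive: they block readers and other writers.
    void put(String key, String value) {
        rw.writeLock().lock();
        try {
            values.put(key, value);
        } finally {
            rw.writeLock().unlock();
        }
    }

    // Backs off instead of waiting forever: returns false if the write lock is not acquired in time.
    boolean tryPut(String key, String value, long timeoutMillis) {
        boolean acquired;
        try {
            acquired = rw.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        if (!acquired) {
            return false;
        }
        try {
            values.put(key, value);
            return true;
        } finally {
            rw.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        SettingsStore store = new SettingsStore();
        store.put("mode", "fast");
        System.out.println(store.get("mode"));
        System.out.println(store.tryPut("mode", "safe", 10));
    }
}
```

Unlike a `synchronized` block, the explicit `lock()`/`unlock()` pair must be released in a `finally` block, since nothing releases it automatically when an exception is thrown.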

j/u/c.* - Concurrent Collections
- Concurrent, thread-safe implementations of several collections
  - HashMap → ConcurrentHashMap
  - TreeMap → ConcurrentSkipListMap
  - ArrayList → CopyOnWriteArrayList
  - Set → CopyOnWriteArraySet
  - Queues → ConcurrentLinkedQueue or one of the blocking queues
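
As an illustration of replacing a synchronized `HashMap` with `ConcurrentHashMap`, the sketch below counts words from several threads without any explicit lock. The `WordCounter` class is hypothetical, and `Map.merge` requires Java 8 or later, which postdates this talk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class WordCounter {
    private final Map<String, Integer> counts = new ConcurrentHashMap<>();

    // merge() on ConcurrentHashMap performs the read-modify-write atomically per key.
    void add(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    int count(String word) {
        return counts.getOrDefault(word, 0);
    }

    // Adds the same word from several threads and returns the final count for it.
    static int countConcurrently(String word, int threads, int perThread) {
        WordCounter counter = new WordCounter();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.add(word);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.count(word);
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently("java", 4, 5_000));
    }
}
```

A separate `get` followed by `put` would not be safe here even on a concurrent map; the atomicity comes from doing the update in a single `merge` call.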

Strains on the VM
- Excessive use of temporary memory can lead to increased garbage collector activity
  - Stop-the-world GC pauses the application
- Excessive class loading
  - Updating class hierarchy
  - Invalidating JIT optimizations
  - Consider creating a “startup” phase
- Transitions between Java and native code
  - VM access lock

Memory Footprint
- Little control over object allocation in Java
  - Small, short-lived objects are easier to cache
  - Large, long-lived objects are likely to cause cache misses
  - Memory Analyzer Tool (MAT) can help
- Consider using large pages to reduce TLB misses
  - -Xlp, requires OS support
- Tune your heap settings
  - Heap lock contention with flat heap

Affinitizing JVMs
- Can exploit cache hierarchy on a subset of cores
- JVM working set can fit within the physical memory of a single node in a NUMA system
- Linux: taskset, numactl
- Windows: start

Is my application scalable?
- Low CPU means resources are not maximized
  - Evaluate if application has too few/many threads
  - Locks and synchronization
  - Network connections, I/O
  - Thrashing: working set is too large for physical memory
- High CPU is generally good, as long as resources are spent in application threads, doing meaningful work
- Evaluate where time is being spent
  - Garbage collection
  - VM/JIT
  - OS kernel functions
  - Other processes
- Tune, tune, tune
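
For a first rough look at how much time goes to garbage collection, the standard `java.lang.management` beans can be queried from inside the application. This is only a coarse sketch with a hypothetical `GcTimeProbe` class; the reported times are approximate, and dedicated tools give a far more detailed picture.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcTimeProbe {
    // Approximate accumulated GC time in milliseconds, summed over all collectors.
    // Collectors that report -1 (undefined) are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // JVM uptime in milliseconds, for comparison with the GC total.
    static long uptimeMillis() {
        return ManagementFactory.getRuntimeMXBean().getUptime();
    }

    public static void main(String[] args) {
        // Allocate some short-lived garbage so the collectors have work to do.
        long checksum = 0;
        for (int i = 0; i < 200_000; i++) {
            byte[] scratch = new byte[1024];
            checksum += scratch.length;
        }
        System.out.println("allocated bytes: " + checksum);
        System.out.println("GC ms: " + totalGcMillis() + ", uptime ms: " + uptimeMillis());
    }
}
```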

Write Once, Tune Everywhere
- HealthCenter, GCMV, MAT
  - http://www.ibm.com/developerworks/java/jdk/tools/
- Dependence on operating system
  - Memory allocation
  - Socket layer
- Tune for hardware capabilities
  - How many cores?
  - How much memory?
  - What is the limit on network access?
  - Are there storage bottlenecks?

IBM Java Execution Model is Built for Parallelism
- JIT Compiler
  - Generates high performance code for application threads
  - Customizes execution to underlying hardware
  - Optimizes locking performance
  - Asynchronous compilation thread
- Garbage Collector
  - Manages memory on behalf of the application
  - Must balance throughput against observed pauses
  - Exploits multiple hardware threads
- Application Threads
  - Java software threads are executed on multiple hardware threads
  - Thread-safe libraries with scalable concurrency support for parallel programming

Configurable Garbage Collection policies
- Multiple policies to match varying user requirements
  - Pause time, throughput, memory footprint and GC overhead
- All modes exploit parallel execution
  - Dynamic adaptation to number of available hardware cores & threads
  - GC scalability independent from user application scalability
  - Very low overhead (<3%) on typical workloads

How do GC policies compare? - optthruput
- Optimize Throughput
  - Highly parallel GC + streamlined application thread execution
  - May cause longer pause times
  - -Xgcpolicy:optthruput
[Figure: timeline of Java application threads (Thread 1 to Thread n) and GC activity over time.]
Picture is only illustrative and doesn’t reflect any particular real-life application. The purpose is to show theoretical differences in pause times between GC policies.

How do GC policies compare? - optavgpause
- Optimize Pause Time
  - GC cleans up concurrently with application thread execution
  - Sacrifice some throughput to reduce average pause times
  - -Xgcpolicy:optavgpause
[Figure: timeline of Java application threads (Thread 1 to Thread n), GC activity and concurrent tracing over time.]

How do GC policies compare? - gencon
- Balanced
  - Clean up many short-lived objects concurrent with application threads
  - Some pauses needed to collect longer-lived objects
  - -Xgcpolicy:gencon
[Figure: timeline of Java application threads (Thread 1 to Thread n), global GC, scavenge GC and concurrent tracing over time.]

How do GC policies compare? - subpool
- Scalable
  - Scalable GC focused on the larger multiprocessor machines
  - Improved object allocation algorithm
  - May not be appropriate for small-to-midsize configurations
  - -Xgcpolicy:subpool
- Uses multiple free lists
- Tries to predict the size of future allocation requests based on earlier allocation requests
- Recreates free lists at the end of each GC based on these predictions
- While allocating objects on the heap, free chunks are chosen using a “best fit” method, as opposed to the “first fit” method used in other algorithms
- Concurrent marking is disabled

JVM optimizations for multi-core scalability
- Lock removal across JVM and class libraries
- java.util.concurrent package optimizations
- Better working set for cache efficiency
- Stack allocation
- Remove/optimize synchronization
- Thread-local storage for send/receive buffers
- Non-blocking containers
- Asynchronous JIT compilation on a separate thread
- Right-sized application runtimes

Thank You
Questions?
Email: prashanth.k.n@in.ibm.com
http://www.ibm.com/developerworks/java/
Thank you in other languages: Grazie (Italian), Gracias (Spanish), Obrigado (Brazilian Portuguese), Merci (French), Danke (German)

Special notices © IBM Corporation 2010. All Rights Reserved. The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. 
Therefore, no assurance can be given that an individual user will achieve results similar to those stated here. All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. The following are trademarks of the International Business Machines Corporation in the United States and/or other countries: AIX, CICS, CICSPlex, DataPower, DB2, DB2 Universal Database, i5/OS, IBM, the IBM logo, IMS/ESA, Power Systems, Lotus, OMEGAMON, OS/390, Parallel Sysplex, pureXML, Rational, Redbooks, Sametime, SMART SOA, System z, Tivoli, WebSphere, and z/OS. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at ibm.com/legal/copytrade.shtml. Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. Intel and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.