Exploring Multi-Threaded Java Application Performance on Multicore Hardware Ghent University, Belgium OOPSLA 2012 presentation – October 24th 2012 Jennifer.


1 Exploring Multi-Threaded Java Application Performance on Multicore Hardware
Jennifer B. Sartor, Lieven Eeckhout
Ghent University, Belgium
OOPSLA 2012 presentation, October 24th 2012

2 Modern Software & Hardware
Managed languages
- Ubiquitous, but added runtime layer
- Many service threads interact with application
  - JIT compilation, on-stack replacement, collector
  - Stop the application, possibly critical
  - Share hardware resources
Multicore with multiple sockets
- How do we schedule threads with constrained resources?
  - Scale core frequency for power
  - Use caches of all sockets, or limit communication

3 Extensive Performance Study
Multi-threaded Java application on multicore, multi-socket hardware
Large space to explore
- Number of threads
- Thread-to-core/socket mapping
- Pairing or isolating application and JVM threads
- Pinning
- Impact of frequency scaling
- Difference between startup and steady state
How do choices with scheduling and hardware resources affect performance?

4 Experimental Machine: Nehalem
Scale frequency per socket to 1.596 or 3.059 GHz

5 Gain Insight on Scheduling
Application
Java Virtual Machine
- Garbage collector
- Just-in-time compiler with on-stack replacement
Cao et al. [ISCA 2012] studied JVM amenability to heterogeneity by measuring service threads' performance per energy
We study end-to-end performance

6 Roadmap
1. Cost of Isolation
2. Frequency Scaling
3. Pairing Threads

7 Experimental Methodology
Jikes Research Virtual Machine (Dec 2011)
- Generational Immix collector
- 1.5, 2, and 3x minimum heap sizes
Multithreaded DaCapo benchmarks 9.12-bach
- avrora, lusearch (with fix), pmd, sunflow, xalan
- Also pseudojbb2005
Timed 10 invocations
- Steady state: measure 15th iteration
- Startup: measure 1st iteration
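The iteration-based measurement described on this slide can be sketched as a small in-process harness. This is only an illustration under assumptions: the names `IterationTimer`, `timeIterations`, and `timeOf` are hypothetical, the workload is a stand-in, and the study's 10 invocations are separate runs of Jikes RVM, which this sketch does not model.

```java
import java.util.function.IntConsumer;

// Hypothetical harness: names and structure are illustrative, not taken from the slides.
class IterationTimer {
    static final int STARTUP_ITERATION = 1;       // slide: startup is the 1st iteration
    static final int STEADY_STATE_ITERATION = 15; // slide: steady state is the 15th iteration

    /** Runs the workload `iterations` times and returns each iteration's time in nanoseconds. */
    static long[] timeIterations(IntConsumer workload, int iterations) {
        long[] nanos = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            workload.accept(i + 1); // pass the 1-based iteration number
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    /** Returns the recorded time for a 1-based iteration number. */
    static long timeOf(long[] nanos, int iterationNumber) {
        return nanos[iterationNumber - 1];
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption): a real run would execute one benchmark iteration here.
        IntConsumer workload = n -> {
            long sum = 0;
            for (int k = 0; k < 1_000_000; k++) {
                sum += k;
            }
            if (sum < 0) {
                throw new IllegalStateException("unexpected overflow");
            }
        };
        long[] nanos = timeIterations(workload, STEADY_STATE_ITERATION);
        System.out.println("startup (iteration 1): " + timeOf(nanos, STARTUP_ITERATION) + " ns");
        System.out.println("steady state (iteration 15): " + timeOf(nanos, STEADY_STATE_ITERATION) + " ns");
    }
}
```

Early iterations include compilation work, which is why the first and the fifteenth iteration are reported separately. How the 10 invocations are aggregated is not stated on the slide, so the sketch leaves that out.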

8 Baseline Setup
[Diagram: Nehalem cores 0-7 on Socket 0 and Socket 1, showing application threads and JVM service threads (collection, compilation)]
Pin application & collection threads
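One way to reason about a pinned setup like this is to write the core-to-socket layout down explicitly. The sketch below is a model under assumptions, not the study's mechanism: it assumes cores 0-3 sit on socket 0 and cores 4-7 on socket 1 (the transcript names eight cores and two sockets but not their numbering), and `SocketLayout`, `socketOf`, `coresOn`, and `pinRoundRobin` are hypothetical names. It only computes a placement plan; actually pinning threads needs OS or JVM support outside standard Java.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a 2-socket, 8-core machine; the core numbering is an assumption.
class SocketLayout {
    static final int SOCKETS = 2;
    static final int CORES_PER_SOCKET = 4;

    /** Returns the socket that a core belongs to under the assumed numbering. */
    static int socketOf(int core) {
        if (core < 0 || core >= SOCKETS * CORES_PER_SOCKET) {
            throw new IllegalArgumentException("no such core: " + core);
        }
        return core / CORES_PER_SOCKET;
    }

    /** Lists the cores on one socket under the assumed numbering. */
    static List<Integer> coresOn(int socket) {
        List<Integer> cores = new ArrayList<>();
        for (int c = socket * CORES_PER_SOCKET; c < (socket + 1) * CORES_PER_SOCKET; c++) {
            cores.add(c);
        }
        return cores;
    }

    /** Assigns thread i to cores.get(i % cores.size()); returns the chosen core per thread. */
    static int[] pinRoundRobin(int threadCount, List<Integer> cores) {
        int[] plan = new int[threadCount];
        for (int i = 0; i < threadCount; i++) {
            plan[i] = cores.get(i % cores.size());
        }
        return plan;
    }

    public static void main(String[] args) {
        // Thread counts here are hypothetical examples, not values from the slides.
        int[] appPlan = pinRoundRobin(4, coresOn(0));
        int[] gcPlan = pinRoundRobin(2, coresOn(1));
        System.out.println("application threads -> cores " + Arrays.toString(appPlan));
        System.out.println("collection threads  -> cores " + Arrays.toString(gcPlan));
    }
}
```

Keeping the plan as plain data makes it easy to express the later variations (isolating service threads on another socket, or pairing them with application threads) by changing only which core list each thread group receives.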

9 Boosting Socket Frequency
[Diagram: Nehalem cores 0-7 on Socket 0 and Socket 1]
1.596 to 3.059 GHz: 27-50% improvement in execution time
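As a rough sanity check on the scale of this result, the arithmetic relating the two frequencies to run time can be sketched as follows. The helper names are hypothetical, the assumption that run time scales inversely with frequency is an idealization, and the slide does not state how its improvement percentage was computed, so the formula below is one common definition rather than necessarily the authors'.

```java
// Hypothetical helpers for relating clock frequency to run time.
class FrequencyScaling {
    /** Percent reduction in run time if time scaled exactly as 1/frequency (idealized assumption). */
    static double idealTimeReductionPercent(double lowGHz, double highGHz) {
        return (1.0 - lowGHz / highGHz) * 100.0;
    }

    /** Percent improvement defined as (slow - fast) / slow (an assumed definition). */
    static double improvementPercent(double slowSeconds, double fastSeconds) {
        return (slowSeconds - fastSeconds) / slowSeconds * 100.0;
    }

    public static void main(String[] args) {
        double low = 1.596;  // GHz, from the slide
        double high = 3.059; // GHz, from the slide
        System.out.println(String.format("frequency ratio: %.3f", high / low));
        System.out.println(String.format("idealized time reduction: %.1f%%",
                idealTimeReductionPercent(low, high)));
    }
}
```

With the two frequencies from the slide, the idealized reduction works out to roughly 47.8%. Measured times would be passed to `improvementPercent` to compare against that figure.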

10 Exploring The Cost of Isolation
[Diagram: Nehalem cores 0-7 on Socket 0 and Socket 1; label: collection threads]

11 Isolating Collection Threads
Isolating the collector does not significantly hurt performance

12 Exploring The Cost of Isolation
[Diagram: Nehalem cores 0-7 on Socket 0 and Socket 1; label: compiler thread]

13 Isolating Compiler Thread at Startup
Isolating the compiler at startup has little impact

14 Isolating On-Stack-Replace at Startup
Isolating OSR at startup improves performance

15 Exploring The Cost of Isolation
[Diagram: Nehalem cores 0-7 on Socket 0 and Socket 1; label: all JVM service threads]

16 Isolating All JVM Threads
Isolating service threads significantly hurts only one benchmark

17 Exploring Frequency Scaling
[Diagram: Nehalem cores 0-7 on Socket 0 and Socket 1]
Baseline: JVM service threads isolated, all cores at highest frequency

18 Exploring Frequency Scaling
[Two diagrams of Nehalem cores 0-7, compared side by side]
Lower frequency of application threads versus lower frequency of JVM service threads

19 Lower Frequency: Collector vs App
Lowering the collector's frequency affects performance 5x less than lowering the application's

20 Lower Freq at Startup: Compiler vs App
Lowering the compiler's frequency is not detrimental compared to lowering the application's

21 Lower Frequency: JVM vs App
Lowering the JVM's frequency affects performance 5x less than lowering the application's

22 Exploring Pairing Threads
[Diagram: Nehalem cores 0-7 on Socket 0 and Socket 1]
Pair application and collection threads

23 Pairing App & Collector, 2 Sockets
For all benchmarks but avrora, pairing application and collector threads performs best

24 Overall Performance Comparison
Either use 1 socket, or isolate the compiler thread

25 Conclusions: Scheduling Insights
1 socket: # application = # collection threads
2 sockets:
- Isolate compilation thread
- Pair application and collection threads
- Set # application threads = # cores, fewer collection threads
Increasing application frequency is more important than increasing JVM service thread frequency
Analyzed Java performance given hardware resources

