Published by Damon Cunningham. Modified over 9 years ago.
Slide 1: Understanding Performance, Power and Energy Behavior in Asymmetric Processors
Nagesh B Lakshminarayana and Hyesoon Kim
School of Computer Science, Georgia Institute of Technology
Slide 2: Outline
– Background and Motivation
– Thread Interactions
– Dynamic Scheduling
– Asymmetry-Aware Scheduling
– Conclusion and Future Work
Slide 3: Heterogeneous Architectures
A particularly interesting class of parallel machines is heterogeneous architectures:
– Multiple types of processing elements (PEs) are available on the same machine
[Figure: PE A and PE B connected through an interconnect]
Slide 4: Heterogeneous Architectures
Heterogeneous architectures are becoming very common:
– IBM Cell processor: a core plus special accelerators
– Asymmetric processors (the focus of this talk): fast and slow cores on the same chip
Slide 5: Machine Configurations
– All-slow (SMP): all processors run at their lowest frequency
– Half-half (AMP): half of the processors run at their highest frequency, the rest at a lower frequency
– All-fast (SMP): all processors run at their highest frequency
– M-I experiments use 8 threads and M-II experiments use 16 threads, matching each machine's core count
– AMPs are emulated by scaling core frequencies with SpeedStep/PowerNow
Machine-I: 2-socket 1.87 GHz quad-core Intel Xeon, 4MB L2 cache, 8GB RAM, 40GB HDD, RHEL 5
Machine-II: 4-socket 2 GHz quad-core AMD Opteron 8350, 2MB L3 cache, 32GB RAM, 1TB HDD, RHEL 4
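Slide 6: Power Measurement
– Total system power consumption is measured with an Extech 380801 power analyzer
[Figure: setup linking the power socket, the analyzer, and the experiment machine by power cables, with the analyzer connected to a separate Windows machine by a serial cable]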
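Total energy is the integral of measured power over the run. The following is a minimal sketch. It assumes the Windows machine logs the analyzer's readings as hypothetical (time in seconds, power in watts) pairs, because the slides do not describe the actual log format. The function `energy_joules` is illustrative and is not part of any Extech software.

```python
def energy_joules(samples: list[tuple[float, float]]) -> float:
    """Integrate (time_s, power_w) samples with the trapezoidal rule; returns joules."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        # Average the two neighbouring readings and multiply by the interval length.
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


if __name__ == "__main__":
    # Hypothetical readings taken once per second (not measured data).
    readings = [(0.0, 200.0), (1.0, 220.0), (2.0, 220.0), (3.0, 200.0)]
    energy = energy_joules(readings)
    duration = readings[-1][0] - readings[0][0]
    print(energy, energy / duration)  # total energy (J) and average power (W)
```

The trapezoidal rule is used here because it needs no assumption about how power varies between readings beyond linear interpolation.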
Slide 7: PARSEC Benchmark Suite
Desktop-oriented multithreaded benchmark suite
– Multithreaded
– Application domains include animation, data mining, and financial analysis
– Parallelized with Pthreads and OpenMP
Slide 8: Performance of PARSEC Benchmarks
On average, the performance of half-half lies between that of all-slow and all-fast.
[Figure: execution time per benchmark, grouped as slow-limited, middle-perf, and unstable]
Slide 9: Classification of Benchmarks
[Figure: thread timelines up to a barrier for (a) slow-limited, (b) middle-perf, and (c) unstable benchmarks]
Slide 10: Energy Consumption of PARSEC
In half-half and all-slow, total energy consumption is higher than in all-fast, even though the average power consumed may be lower.
[Figure: energy consumption of the slow-limited and middle-perf benchmarks]
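The apparent paradox follows from energy being average power multiplied by execution time. Let \bar{P} be the average system power and T the execution time of a configuration. A slower configuration then uses more energy exactly when its slowdown outweighs its power saving:

```latex
E = \bar{P}\,T
\qquad\Longrightarrow\qquad
E_{\text{half-half}} > E_{\text{all-fast}}
\iff
\frac{T_{\text{half-half}}}{T_{\text{all-fast}}} > \frac{\bar{P}_{\text{all-fast}}}{\bar{P}_{\text{half-half}}}
```

Take hypothetical numbers as an illustration. A configuration that draws 10% less average power but runs 20% longer consumes 0.9 × 1.2 = 1.08 times the energy.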
Slide 11: Behavior of PARSEC Benchmarks
Observations
– Different applications behave differently on AMPs
– An SMP with all-fast processors usually consumes the least energy
Slide 12: Why do different applications behave differently on AMPs?
Slide 14: Thread Interactions
Sources of thread interactions:
– Critical sections
– Barriers
Slide 15: Critical Sections (CS)
[Figure: thread timelines for case (a) and case (b), showing useful work, critical sections, and time spent waiting to enter CSs]
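A lock-protected shared counter illustrates a critical section and waiting to enter one. This is a minimal Python sketch: the talk's benchmarks use Pthreads and OpenMP, and Python's `threading.Lock` stands in for a mutex here. Each increment is a critical section, and a thread that finds the lock held waits before entering. The function and parameter names are illustrative. Because of CPython's global interpreter lock, the sketch shows mutual exclusion only, not the timing behavior measured in the talk.

```python
import threading


def run_workers(num_threads: int, increments: int) -> int:
    """Each thread enters a lock-protected critical section `increments` times."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            # Useful (parallel) work outside the critical section would go here.
            with lock:  # a thread that finds the lock held waits here
                counter += 1  # critical section: one thread at a time

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run_workers(8, 10_000))  # 80000: no increment is lost
```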
Slide 16: Barriers
[Figure: threads that reach a barrier early wait for the other threads to finish]
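A simple model shows why symmetric work separated by barriers can be slow-limited. Each phase ends when the last thread arrives, so on half-half the fast threads finish early and then wait. The sketch below assumes equal work per thread, thread speed proportional to a relative core speed (a hypothetical 2:1 fast/slow ratio), and no other interaction. The names are illustrative.

```python
def barrier_phase(work: list[float], speeds: list[float]) -> tuple[float, list[float]]:
    """Return (phase time, per-thread waiting time) for one barrier-separated phase.

    Thread i runs on a core of relative speed speeds[i]; all threads leave the
    barrier only when the slowest one arrives.
    """
    arrival = [w / s for w, s in zip(work, speeds)]
    phase_time = max(arrival)
    waiting = [phase_time - a for a in arrival]
    return phase_time, waiting


if __name__ == "__main__":
    work = [1.0] * 8  # equal (symmetric) work per thread
    configs = {
        "all-slow": [1.0] * 8,
        "half-half": [2.0] * 4 + [1.0] * 4,  # hypothetical 2x fast/slow speed ratio
        "all-fast": [2.0] * 8,
    }
    for name, speeds in configs.items():
        t, wait = barrier_phase(work, speeds)
        print(name, t, sum(wait))  # phase time and total time spent waiting
```

In this model half-half takes as long as all-slow, and the fast cores spend the difference idle at the barrier.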
Slide 17: Effect of Critical Section Length
CS-limited application
As the critical section length increases, the average power consumed decreases.
[Figure: normalized power consumption]
Slide 18: Effect of Critical Section Length
CS-limited application
[Figure: normalized execution time]
Slide 19: Effect of Critical Section Length
CS-limited application
The performance of AMPs is sensitive to CS length.
[Figure: normalized execution time]
Slide 20: Effect of Critical Section Length
CS-limited application
Energy consumption shows the same trend.
[Figure: normalized energy consumption]
Slide 21: Effect of Critical Section Frequency
– Both the length and the frequency of CSs affect performance and energy consumption
– As CS frequency increases, the performance difference between half-half and all-fast shrinks
– If the majority of the execution time is spent waiting for locks, a few slow processors cost little performance
– Results are available in the paper
Slide 22: Effect of Barriers
– With few barriers, half-half performs similarly to all-slow
– With a large number of barriers, half-half performs similarly to all-fast
– Results are available in the paper
Slide 24: Dynamic Scheduling
– Motivation: better run-time adaptivity
– Each thread requests more work after completing its assigned work
– Supported by OpenMP and Intel Threading Building Blocks
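A small simulation contrasts static partitioning with dynamic scheduling of the kind OpenMP's `schedule(dynamic)` clause provides. It is a sketch, not how any runtime is implemented. It assumes equal-sized chunks, relative core speeds, and zero scheduling overhead. Under dynamic scheduling, whichever core becomes free first takes the next chunk.

```python
import heapq


def static_makespan(num_chunks: int, chunk_work: float, speeds: list[float]) -> float:
    """Each core gets an equal block of chunks up front; the slowest core finishes last."""
    n = len(speeds)
    base, extra = divmod(num_chunks, n)
    return max((base + (1 if i < extra else 0)) * chunk_work / s
               for i, s in enumerate(speeds))


def dynamic_makespan(num_chunks: int, chunk_work: float, speeds: list[float]) -> float:
    """The core that becomes free first requests and runs the next chunk."""
    ready = [(0.0, i) for i in range(len(speeds))]  # (time core becomes free, core)
    heapq.heapify(ready)
    finish = 0.0
    for _ in range(num_chunks):
        t, i = heapq.heappop(ready)
        done = t + chunk_work / speeds[i]
        finish = max(finish, done)
        heapq.heappush(ready, (done, i))
    return finish


if __name__ == "__main__":
    speeds = [1.0] * 8 + [2.0] * 8  # hypothetical: 8 slow cores, 8 fast cores
    s = static_makespan(1600, 1.0, speeds)
    d = dynamic_makespan(1600, 1.0, speeds)
    print(s, d, d / s)
```

With static partitioning, the fast cores sit idle once their share is done. Dynamic scheduling hands them extra chunks, so total throughput approaches the sum of all core speeds.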
Slide 25: Dynamic Scheduling
– Can help improve performance and reduce energy consumption in AMPs
– Should be preferred to static and guided policies
Parallel-for application (AMP entries are given as static/dynamic scheduling):
Machine configuration | Normalized execution time | Normalized energy consumption
16 @ 1 GHz (SMP) | 1.0 | 1.0
16 @ 1.2 GHz (SMP) | 0.83 | 0.87
16 @ 1.4 GHz (SMP) | 0.71 | 0.78
16 @ 1.7 GHz (SMP) | 0.59 | 0.68
16 @ 2 GHz (SMP) | 0.50 | 0.61
8 @ 1 GHz, 8 @ 2 GHz (AMP) | 1.00/0.67 | 1.05/0.73
8 @ 1.2 GHz, 8 @ 2 GHz (AMP) | 0.83/0.63 | 0.90/0.70
8 @ 1.4 GHz, 8 @ 2 GHz (AMP) | 0.71/0.59 | 0.80/0.67
8 @ 1.7 GHz, 8 @ 2 GHz (AMP) | 0.59/0.54 | 0.69/0.63
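The execution-time column can be reproduced with a back-of-the-envelope model. It assumes that execution time scales as 1/frequency, which ignores memory stalls. Under static scheduling, every thread gets 1/16 of the work, so the AMP finishes when the slow threads do. Under ideal dynamic scheduling, work flows to cores in proportion to their speed. The function name and its defaults are illustrative.

```python
def normalized_times(slow_ghz: float, fast_ghz: float = 2.0,
                     n_slow: int = 8, n_fast: int = 8,
                     base_ghz: float = 1.0) -> tuple[float, float]:
    """Execution time normalized to all cores at base_ghz, assuming time ~ 1/frequency."""
    n = n_slow + n_fast
    # Static: equal share per thread, so the slow threads determine completion.
    static = base_ghz / slow_ghz
    # Ideal dynamic: all cores stay busy, so time ~ n / (sum of core frequencies).
    dynamic = n * base_ghz / (n_slow * slow_ghz + n_fast * fast_ghz)
    return static, dynamic


if __name__ == "__main__":
    for slow in (1.0, 1.2, 1.4, 1.7):
        static, dynamic = normalized_times(slow)
        print(f"8 @ {slow} GHz, 8 @ 2 GHz: static {static:.3f}, dynamic {dynamic:.3f}")
```

The model gives 1/f_slow for static scheduling and 16/(8 f_slow + 16) for dynamic scheduling. Both agree with the table's AMP execution times to within rounding. The energy column depends on power draw and is not modeled here.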
Slide 27: Scheduling in AMPs
Longest Job to a Fast Processor First (LJFPF) [Lakshminarayana’08]
[Figure: threads of different lengths mapped to fast and slow cores ahead of a barrier]
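The following sketches the LJFPF idea for one barrier-separated phase. It assumes one thread per core and that per-thread work is known in advance. The mapping is an illustrative reading of the policy's name, not the implementation from [Lakshminarayana’08]. The function names are hypothetical.

```python
def ljfpf(thread_work: list[float], core_speeds: list[float]) -> dict[int, int]:
    """Map thread -> core: the longest thread goes to the fastest core, and so on down."""
    threads = sorted(range(len(thread_work)), key=lambda t: thread_work[t], reverse=True)
    cores = sorted(range(len(core_speeds)), key=lambda c: core_speeds[c], reverse=True)
    return dict(zip(threads, cores))


def barrier_time(thread_work: list[float], core_speeds: list[float],
                 mapping: dict[int, int]) -> float:
    # All threads meet at a barrier, so the phase ends with the last thread.
    return max(thread_work[t] / core_speeds[c] for t, c in mapping.items())


if __name__ == "__main__":
    work = [4.0, 1.0, 1.0, 1.0]    # hypothetical: one long thread, three short ones
    speeds = [1.0, 1.0, 1.0, 2.0]  # hypothetical: three slow cores, one fast core
    naive = {i: i for i in range(len(work))}  # thread i on core i, ignoring length
    print(barrier_time(work, speeds, naive))                # 4.0: long thread on a slow core
    print(barrier_time(work, speeds, ljfpf(work, speeds)))  # 2.0: long thread on the fast core
```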
Slide 28: How Does the Scheduler Know the Length of Work?
– Current mechanism: the application sends task length information to the scheduler
– Ongoing work: a prediction mechanism
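The slides do not say how the prediction mechanism works. One possible approach is shown here only as a hypothetical sketch. It prefers a length reported by the application, as in the current mechanism. Otherwise it falls back to an exponential moving average (EMA) of previously observed lengths. The class, its methods, and the `alpha` parameter are all illustrative assumptions.

```python
from typing import Optional


class TaskLengthPredictor:
    """Hypothetical predictor: EMA of observed task lengths, overridden by app hints."""

    def __init__(self, alpha: float = 0.5) -> None:
        self.alpha = alpha  # weight given to the newest observation
        self.estimates: dict[str, float] = {}

    def predict(self, task: str, hint: Optional[float] = None) -> Optional[float]:
        # A length supplied by the application takes priority over the history.
        if hint is not None:
            return hint
        return self.estimates.get(task)  # None until the task has been observed

    def observe(self, task: str, measured: float) -> None:
        old = self.estimates.get(task)
        self.estimates[task] = measured if old is None else (
            self.alpha * measured + (1 - self.alpha) * old)


if __name__ == "__main__":
    p = TaskLengthPredictor()
    p.observe("kernel", 10.0)
    p.observe("kernel", 20.0)
    print(p.predict("kernel"))            # 15.0
    print(p.predict("other"))             # None
    print(p.predict("other", hint=7.0))   # 7.0
```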
Slide 29: LJFPF
ITK: medical image processing applications (open source)
MultiRegistration (registration method)
– Kernel with 50 iterations
– The 50 iterations are divided among 8 threads
[Figure: normalized execution time and normalized energy consumption]
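Because 50 is not a multiple of 8, the threads end up asymmetric. The sketch below assumes the iterations are split as evenly as possible; the slides do not give the actual ITK split. Under that assumption, two threads get 7 iterations and six get 6. It also assumes a hypothetical 2:1 fast/slow speed ratio on a half-half machine with 4 slow and 4 fast cores. Placing the 7-iteration threads on fast cores then shortens the kernel from 7 to 6 slow-core iteration times. The helper names are illustrative.

```python
def split_iterations(total: int, threads: int) -> list[int]:
    """Divide loop iterations as evenly as possible; the first threads get one extra."""
    base, extra = divmod(total, threads)
    return [base + 1 if i < extra else base for i in range(threads)]


def barrier_time(iterations: list[int], speeds: list[float]) -> float:
    # Thread i runs on core i; the kernel ends when the last thread finishes.
    return max(it / s for it, s in zip(iterations, speeds))


if __name__ == "__main__":
    iters = split_iterations(50, 8)
    print(iters)  # [7, 7, 6, 6, 6, 6, 6, 6]
    speeds = [1.0] * 4 + [2.0] * 4  # hypothetical: 4 slow cores, then 4 fast cores
    print(barrier_time(iters, speeds))          # 7.0: longer threads on slow cores
    print(barrier_time(sorted(iters), speeds))  # 6.0: longest threads on fast cores
```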
Slide 31: Conclusion & Future Work
Conclusion
– Evaluated the performance and energy consumption behavior of multithreaded applications on AMPs
– For symmetric workloads:
  – With little thread interaction, an SMP with fast processors is the better choice
  – With a lot of thread interaction, an AMP could be better
– For asymmetric threads, an AMP could provide the lowest energy consumption
Future Work
– Predict application characteristics and use the predicted information for thread scheduling on AMPs
Slide 32: Thank you!