
1 Understanding Performance, Power and Energy Behavior in Asymmetric Processors
Nagesh B Lakshminarayana, Hyesoon Kim
School of Computer Science, Georgia Institute of Technology

2 Outline
–Background and Motivation
–Thread Interactions
–Dynamic Scheduling
–Asymmetry Aware Scheduling
–Conclusion and Future Work

3 Heterogeneous Architectures
A particularly interesting class of parallel machines is heterogeneous architectures
–Multiple types of Processing Elements (PEs) available on the same machine
[Figure: PE A, PE B, Interconnect]

4 Heterogeneous Architectures
Heterogeneous architectures are becoming very common
[Figure: IBM Cell processor; special accelerator; asymmetric processors with fast cores and slow cores]
Focus of this talk: asymmetric processors

5 Machine configurations
All-slow (SMP): All processors running at their lowest frequency
Half-half (AMP): Half of the processors running at their highest frequency, rest running at their lower frequency
All-fast (SMP): All processors running at their highest frequency
Machine-I: 2 Socket 1.87 GHz Quad-core Intel Xeon, 4MB L2 cache, 8GB RAM, 40GB HDD, RHEL 5
Machine-II: 4 Socket 2 GHz Quad-core AMD Opteron 8350, 2MB L3 cache, 32GB RAM, 1TB HDD, RHEL 4
M-I experiments have 8 threads, M-II experiments have 16 threads
AMPs emulated using SpeedStep/PowerNow

6 Power Measurement
Using Extech 380801 Power Analyzer
Total system power consumption
[Figure labels: Experiment Machine, Windows Machine, Power Cable, Serial Cable, Power Socket]

7 PARSEC Benchmark Suite
Desktop-oriented multithreaded benchmark suite
–Multithreaded
–Animation, Data Mining, Financial Analysis
–Pthreads, OpenMP

8 Performance of PARSEC benchmarks
On average, performance of half-half is between that of all-slow and all-fast
[Figure: execution time, benchmarks grouped as slow-limited, middle-perf, unstable]

9 Classification of Benchmarks
(a) slow-limited (b) middle-perf (c) unstable
[Figure: thread timelines with a barrier for each class]

10 Energy Consumption of PARSEC
In half-half/all-slow, total energy consumption is higher even though average power consumed might be lower
[Figure: energy consumption, benchmarks grouped as slow-limited, middle-perf]
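The relation behind this slide is energy = average power × execution time. As a rough illustration, the sketch below divides normalized energy by normalized execution time for the SMP rows of the parallel-for table on slide 25 to get the implied average power. The function name is hypothetical, and the baseline energy of 1.0 is assumed from the normalization.

```python
# Normalized (execution time, energy) for the SMP rows of the parallel-for
# table on slide 25. Baseline is 16 cores @ 1 GHz; its energy of 1.0 is
# assumed from the normalization.
SMP_ROWS = {
    "16 @ 1 GHz": (1.00, 1.00),
    "16 @ 1.2 GHz": (0.83, 0.87),
    "16 @ 1.4 GHz": (0.71, 0.78),
    "16 @ 1.7 GHz": (0.59, 0.68),
    "16 @ 2 GHz": (0.50, 0.61),
}


def implied_average_power(norm_time, norm_energy):
    """Energy = average power x time, so average power = energy / time."""
    return norm_energy / norm_time


for config, (t, e) in SMP_ROWS.items():
    p = implied_average_power(t, e)
    print(f"{config:>13}: time={t:.2f} energy={e:.2f} implied power={p:.2f}")
```

Under these numbers the 2 GHz configuration draws about 22% more average power than the 1 GHz baseline yet uses 39% less energy, because it finishes in half the time.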

11 Behavior of PARSEC Benchmarks
Observations
–Different applications behave differently on AMPs
–Usually SMP with fast processors saves energy

12 Why do different applications behave differently on AMPs?

13 Outline
–Background and Motivation
–Thread Interactions
–Dynamic Scheduling
–Asymmetry Aware Scheduling
–Conclusion and Future Work

14 Thread Interactions
Sources of thread interactions
–Critical Sections
–Barriers

15 Critical Sections (CS)
Waiting to enter CSs
[Figure: Case (a) and Case (b) thread timelines showing useful work, critical section, and waiting]

16 Barriers
Waiting for other threads to finish
[Figure: thread timelines between two barriers]
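A minimal sketch of why barriers tie a phase to the slowest thread, assuming each thread's time is its work divided by its core speed. The function name, work units, and speeds are hypothetical, and barrier overhead itself is left out.

```python
def barrier_phase_time(work_per_thread, core_speeds):
    """Time for one barrier-delimited phase: every thread waits for the slowest."""
    return max(w / s for w, s in zip(work_per_thread, core_speeds))


work = [1.0] * 8                      # equal work per thread (hypothetical units)
all_slow = [1.0] * 8                  # relative core speeds
half_half = [2.0] * 4 + [1.0] * 4
all_fast = [2.0] * 8

for name, speeds in [("all-slow", all_slow),
                     ("half-half", half_half),
                     ("all-fast", all_fast)]:
    print(name, barrier_phase_time(work, speeds))
```

With equal work per thread, the half-half phase takes as long as the all-slow phase in this model, since the fast cores finish early and then wait at the barrier.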

17 Effect of Critical Section length
CS limited application
As critical section length increases, the average power consumed decreases
[Figure: normalized power consumption]

18 Effect of Critical Section length
CS limited application
[Figure: normalized execution time]

19 Effect of Critical Section length
CS limited application
Performance of AMPs is sensitive to CS length
[Figure: normalized execution time]

20 Effect of Critical Section length
CS limited application
Energy consumption shows the same trend
[Figure: normalized energy consumption]

21 Effect of Critical Section frequency
–Both length and frequency of CSs affect performance and energy consumption
–As frequency increases, the performance difference between half-half and all-fast reduces
–If the majority of the execution time is spent waiting for locks, it is OK to have a few slow processors
–Results available in the paper
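To experiment with critical section length and frequency, here is a toy event-driven model, not the authors' experimental setup. It assumes one thread per core, a single FIFO lock, and that both parallel work and critical-section work scale with core speed. All names and parameter values are hypothetical.

```python
import heapq


def simulate_lock(core_speeds, iterations, parallel_work, cs_work):
    """Finish time for threads that alternate parallel work and a critical
    section guarded by one FIFO lock (one thread per core, iterations >= 1)."""
    # Heap entries: (time the thread requests the lock, thread id, iterations left).
    requests = [(parallel_work / s, tid, iterations)
                for tid, s in enumerate(core_speeds)]
    heapq.heapify(requests)
    lock_free_at = 0.0
    finish = 0.0
    while requests:
        requested_at, tid, left = heapq.heappop(requests)
        speed = core_speeds[tid]
        start = max(requested_at, lock_free_at)   # wait while the lock is held
        end = start + cs_work / speed
        lock_free_at = end
        finish = max(finish, end)
        if left > 1:
            # Do the next chunk of parallel work, then request the lock again.
            heapq.heappush(requests, (end + parallel_work / speed, tid, left - 1))
    return finish


configs = {
    "all-slow": [1.0] * 8,
    "half-half": [2.0] * 4 + [1.0] * 4,
    "all-fast": [2.0] * 8,
}
for cs in (0.0, 0.1, 1.0):   # longer critical sections mean more serialization
    for name, speeds in configs.items():
        t = simulate_lock(speeds, iterations=100, parallel_work=1.0, cs_work=cs)
        print(f"cs_work={cs:<4} {name:>9}: {t:.1f}")
```

Raising `cs_work` lengthens each critical section, and lowering `parallel_work` makes critical sections more frequent, so both knobs from the slide can be varied independently.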

22 Effect of Barriers
–With few barriers, half-half performs similarly to all-slow
–With a large number of barriers, half-half performs similarly to all-fast
–Results available in the paper

23 Outline
–Background and Motivation
–Thread Interactions
–Dynamic Scheduling
–Asymmetry Aware Scheduling
–Conclusion and Future Work

24 Dynamic Scheduling
Motivation: better run-time adaptivity
–Each thread requests more work after completing its assigned work
–OpenMP, Intel Threading Building Blocks
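The difference between handing out work up front and letting threads ask for more can be sketched with a small simulation. This is an idealized model with hypothetical names: iteration cost scales with core speed, and scheduling overhead is ignored.

```python
import heapq


def static_time(num_iters, iter_work, core_speeds):
    """Static: iterations are split evenly up front; the slowest core finishes last."""
    per_thread = num_iters / len(core_speeds)
    return max(per_thread * iter_work / s for s in core_speeds)


def dynamic_time(num_iters, iter_work, core_speeds, chunk=1):
    """Dynamic: a core takes the next chunk as soon as it finishes its last one."""
    free_at = [(0.0, tid) for tid in range(len(core_speeds))]
    heapq.heapify(free_at)
    remaining = num_iters
    finish = 0.0
    while remaining > 0:
        now, tid = heapq.heappop(free_at)        # earliest idle core
        take = min(chunk, remaining)
        remaining -= take
        done = now + take * iter_work / core_speeds[tid]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, tid))
    return finish


half_half = [2.0] * 8 + [1.0] * 8   # 8 fast and 8 slow cores (relative speeds)
print("static :", static_time(2400, 1.0, half_half))
print("dynamic:", dynamic_time(2400, 1.0, half_half))
```

In the dynamic case the fast cores simply come back for work twice as often, so no core sits idle while iterations remain.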

25 Dynamic Scheduling
Can help improve performance and reduce energy consumption in AMPs
Should be preferred to static and guided policies
Parallel-for application (AMP rows are given as Static/Dynamic)

Machine configuration           Normalized Execution Time   Normalized Energy Consumption
16 @ 1 GHz (SMP)                1.0                         1.0
16 @ 1.2 GHz (SMP)              0.83                        0.87
16 @ 1.4 GHz (SMP)              0.71                        0.78
16 @ 1.7 GHz (SMP)              0.59                        0.68
16 @ 2 GHz (SMP)                0.50                        0.61
8 @ 1 GHz, 8 @ 2 GHz (AMP)      1.00/0.67                   1.05/0.73
8 @ 1.2 GHz, 8 @ 2 GHz (AMP)    0.83/0.63                   0.90/0.70
8 @ 1.4 GHz, 8 @ 2 GHz (AMP)    0.71/0.59                   0.80/0.67
8 @ 1.7 GHz, 8 @ 2 GHz (AMP)    0.59/0.54                   0.69/0.63
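One way to read the execution-time column is through a simple throughput model, assuming performance scales linearly with clock frequency. This is an illustrative assumption, not a model stated in the talk, and the function names are hypothetical. Under it, a static split is bound by the slow cores, while an ideal dynamic split is bound by the aggregate throughput of all cores.

```python
def static_norm_time(slow_ghz, base_ghz=1.0):
    """Equal static split: the slow cores finish last, so they set the time."""
    return base_ghz / slow_ghz


def dynamic_norm_time(slow_ghz, fast_ghz=2.0, n_slow=8, n_fast=8, base_ghz=1.0):
    """Ideal dynamic split: baseline throughput over aggregate AMP throughput."""
    baseline_throughput = (n_slow + n_fast) * base_ghz
    return baseline_throughput / (n_slow * slow_ghz + n_fast * fast_ghz)


# Slow-core frequency in GHz -> (static, dynamic) normalized execution time
# as reported in the AMP rows of the table.
REPORTED = {1.0: (1.00, 0.67), 1.2: (0.83, 0.63), 1.4: (0.71, 0.59), 1.7: (0.59, 0.54)}

for slow, (rep_static, rep_dynamic) in REPORTED.items():
    print(f"{slow} GHz: static model={static_norm_time(slow):.3f} "
          f"(reported {rep_static}), dynamic model={dynamic_norm_time(slow):.3f} "
          f"(reported {rep_dynamic})")
```

For these four AMP rows the model lands within 0.01 of the reported execution times, which suggests the parallel-for kernel scales close to linearly with frequency.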

26 Outline
–Background and Motivation
–Thread Interactions
–Dynamic Scheduling
–Asymmetry Aware Scheduling
–Conclusion and Future Work

27 Scheduling in AMPs
Longest Job to a Fast Processor First (LJFPF) [Lakshminarayana’08]
[Figure: thread timelines up to a barrier on fast cores and slow cores]
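One plausible reading of LJFPF, sketched under strong assumptions: there is exactly one task per core, task lengths are known up front, and a task's run time is its length divided by core speed. The function names and the example workload are hypothetical, and the policy in the cited work may differ in its details.

```python
def ljfpf_assign(task_lengths, core_speeds):
    """Pair the longest task with the fastest core, the next longest with the
    next fastest, and so on. Returns a dict of task index -> core index."""
    if len(task_lengths) != len(core_speeds):
        raise ValueError("this sketch assumes exactly one task per core")
    tasks = sorted(range(len(task_lengths)),
                   key=lambda i: task_lengths[i], reverse=True)
    cores = sorted(range(len(core_speeds)),
                   key=lambda c: core_speeds[c], reverse=True)
    return dict(zip(tasks, cores))


def time_to_barrier(task_lengths, core_speeds, assignment):
    """All threads meet at a barrier, so the slowest (task, core) pair decides."""
    return max(task_lengths[t] / core_speeds[c] for t, c in assignment.items())


# Hypothetical workload: 50 iterations over 8 threads, two threads get 7
# iterations and six get 6. Cores 0-3 are slow, cores 4-7 are fast.
lengths = [7, 7, 6, 6, 6, 6, 6, 6]
speeds = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0]

naive = {i: i for i in range(8)}          # task i runs on core i
ljfpf = ljfpf_assign(lengths, speeds)
print("naive:", time_to_barrier(lengths, speeds, naive))
print("LJFPF:", time_to_barrier(lengths, speeds, ljfpf))
```

In this example the naive placement leaves a 7-iteration task on a slow core, while LJFPF moves both long tasks to fast cores and the barrier is reached sooner.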

28 How Does the Scheduler Know the Length of Work?
–Current mechanism: application sends task length information
–Ongoing work: prediction mechanism

29 LJFPF
ITK: medical image processing applications (open source)
MultiRegistration (registration method)
–kernel with 50 iterations
–50 iterations divided among 8 threads
[Figure: normalized execution time and normalized energy consumption]

30 Outline
–Background and Motivation
–Thread Interactions
–Dynamic Scheduling
–Asymmetry Aware Scheduling
–Conclusion and Future Work

31 Conclusion & Future Work
Conclusion
Evaluated the performance/energy consumption behavior of multithreaded applications in AMPs
For symmetric workloads
–With little thread interaction: SMP with fast processors
–With a lot of thread interaction: AMP could be better
For asymmetric threads
–AMP could provide lowest energy consumption
Future Work
Predict application characteristics and use predicted information for thread scheduling on AMPs

32 Thank you!

