The Impact of Performance Asymmetry in Multicore Architectures. Saisanthosh Balakrishnan, Ravi Rajwar, Michael Upton, Konrad Lai. UW-Madison and Intel Corp.


1 The Impact of Performance Asymmetry in Multicore Architectures
Saisanthosh Balakrishnan, Ravi Rajwar, Michael Upton, Konrad Lai
UW-Madison and Intel Corp.
32nd Annual International Symposium on Computer Architecture

2 Performance asymmetry: difference in compute power of processors
- Architectural differences
- Micro-architectural parameters
- Other
- Heat: thermal throttling
Why need asymmetry now?
- CMP / many cores as commodity systems
- Run variety of workloads
- Good serial performance and high throughput
- Optimal energy consumption ...
Assume an asymmetric multicore system (F F S S)

3 Asymmetry & MT workloads
- Scalable? N procs., different configs. (plot: performance vs. compute power). Need to utilize asymmetry: perform better.
- Stable? N procs., same config. (plot: performance over the same/many runs). Need predictable and robust performance.

4 The problems
- Programmers: algorithm, correctness, thread partitioning. Don't reason about asymmetry.
- Characteristics of threads: partitioning, synchronization barriers, interference, lifetime
- Scheduling of threads: OS kernel, library, application, DB/Web servers, managed runtime systems (Java, .NET)

5 Contributions
Asymmetry negatively affects applications
- Studied many workloads on real hardware
- Observed unpredictable workload behavior
This can be fixed by
- Evaluating threads' work partitioning
- Scheduling of threads with asymmetry

6 Outline
- Asymmetry and Performance
- Evaluation Methodology
- Asymmetric Configurations
- Workloads and Results

7 Evaluation methodology
Asymmetry in real hardware
- Intel 4-way 3-GHz Xeon
- Different cores run at different frequencies
- Software controlled
Benefits
- Long real-time runs (no simulations)
- Workloads are set up according to specs
- Representative of other forms of asymmetry
  - Communication
  - Micro-architecture etc.

8 Configurations
- Symmetric: all fast (FFFF), all slow (SSSS)
- Asymmetric: 1 slow (FFFS), 2 slow (FFSS), 3 slow (FSSS)
F = full frequency
S = one-eighth of full frequency (in talk and paper)
S = one-fourth of full frequency (in paper)
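As a rough illustration of how much compute power each configuration offers, the sketch below sums per-core speeds for the five configurations on the 4-way machine. It assumes, as an idealization not stated on the slide, that a core's compute power scales linearly with its clock frequency; the names `CONFIGS` and `compute_power` are hypothetical.

```python
from fractions import Fraction

FAST = Fraction(1)       # F: full frequency
SLOW = Fraction(1, 8)    # S: one-eighth of full frequency (the setting used in the talk)

# The five 4-core configurations named on the slide.
CONFIGS = {
    "all fast": "FFFF",
    "1 slow": "FFFS",
    "2 slow": "FFSS",
    "3 slow": "FSSS",
    "all slow": "SSSS",
}

def compute_power(cores, slow=SLOW):
    """Sum of per-core speeds, assuming speed is proportional to frequency."""
    return sum(FAST if c == "F" else slow for c in cores)

for name, cores in CONFIGS.items():
    print(f"{name:9s} {cores}  {float(compute_power(cores)):.3f}")
```

Under this assumption the "1 slow" configuration has 3.125 units of compute power against 4 for "all fast"; a workload that scales with asymmetry would track these totals.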

9 Studying impact
- Scalability: plot of perf. metric across configurations (all slow, 3 slow, 2 slow, 1 slow, all fast)
- Stability: plot of perf. metric (asymm.) over the same or many runs

10 Workloads evaluated
Workloads: SPECjbb, SPECjAppServer, Apache, Zeus, TPC-H, SPECOMP, H.264, PMake
Categories: middle-tier business apps. (throughput parallel); webservers (throughput parallel); task-based parallelization; embarrassingly parallel

11 Impact of asymmetry
[Table: workloads (SPECjbb, SPECjAppServer, Apache, Zeus, TPC-H, SPECOMP, H.264, PMake) against columns Scalable, Stable, Fix]

12 Workloads
- Managed runtime system (BEA JRockit & Sun HotSpot)
- Windows 2003 and Linux
- 2 GCs: Parallel and Gen. Concurrent. Only minor GC
- Up to 20 threads
- Minimal communication

13 SPECjbb
[Plots: Stability (JRockit/Gencon GC) on 2 slow; 4 runs with kernel fix]
- Problem: interference from runtime system (JVM, GC)
- Fix: kernel scheduler moves jobs from slow to fast if free
Scalable? Stable?
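The kernel fix above (move jobs from a slow processor to a fast one when the fast one is free) can be sketched as a small rebalancing step. This is a minimal illustration under assumed data structures, not the actual kernel change: the `cores` list of dictionaries and the `rebalance` function are hypothetical.

```python
def rebalance(cores):
    """Move jobs from slow cores onto idle, strictly faster cores.

    cores: list of dicts with keys "speed" (float) and "job" (str or None).
    The list is updated in place; returns a list of (job, src, dst) moves.
    """
    # Idle cores, fastest first.
    idle = sorted((i for i, c in enumerate(cores) if c["job"] is None),
                  key=lambda i: -cores[i]["speed"])
    # Busy cores, slowest first.
    busy = sorted((i for i, c in enumerate(cores) if c["job"] is not None),
                  key=lambda i: cores[i]["speed"])
    moves = []
    for dst, src in zip(idle, busy):
        if cores[dst]["speed"] <= cores[src]["speed"]:
            break  # remaining pairs cannot improve either
        job = cores[src]["job"]
        cores[dst]["job"], cores[src]["job"] = job, None
        moves.append((job, src, dst))
    return moves

# Example: a "2 slow" machine where one fast core has gone idle.
cores = [
    {"speed": 1.0, "job": None},
    {"speed": 1.0, "job": "gc"},
    {"speed": 0.125, "job": "worker-1"},
    {"speed": 0.125, "job": "worker-2"},
]
print(rebalance(cores))
```

A real scheduler would also have to weigh migration cost and cache affinity, which this sketch ignores.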

14 Workloads
- Webserver on Linux
- Thread-based vs. event-based model
- ApacheBench
- Raw perf. with static page
- Light and heavy loads

15 Apache
[Plot: Scalability & Stability (light load)]
- Problem: light load, threads can be on fast/slow
- No issues under heavy load
- Fixes: kernel scheduler or shorter lifetime of threads
Stable? Scalable?

16 Zeus
[Plot: Scalability & Stability]
- Under heavy and light loads: unpredictable
- Superior perf. on symmetric configs.
- Problem: aggressive application-level scheduling
Stable? Scalable?

17 Workloads
- OMP: scientific app. Loop-based parallelization. Intel Fortran, OpenMP on Linux
- H.264: media encoding. OpenMP on Windows 2003
- PMake: parallel make of Linux kernel

18 SPECOMP
[Plot: Scalability with app. fix]
- OpenMP schedules tasks assuming equal perf. procs.
- Problem: fast processors are held by slow
- Fix: change scheduling of tasks to on-demand
- Downside: overheads
Scalable? Stable?
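The difference between up-front and on-demand task assignment can be illustrated with a toy model of a single parallel loop that ends in a barrier. The sketch assumes equal-sized tasks, per-processor speeds proportional to frequency, and zero scheduling overhead (so it does not capture the downside noted on the slide); the function names are hypothetical, and this is a simulation, not OpenMP code.

```python
import heapq

def static_makespan(n_tasks, speeds, work=1.0):
    """Up-front split: each processor gets an equal share of the tasks,
    and the loop ends when the slowest processor reaches the barrier."""
    base, extra = divmod(n_tasks, len(speeds))
    return max((base + (1 if i < extra else 0)) * work / s
               for i, s in enumerate(speeds))

def on_demand_makespan(n_tasks, speeds, work=1.0):
    """On-demand: whichever processor becomes free first takes the next task."""
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_tasks):
        t, i = heapq.heappop(free_at)   # earliest-free processor
        t += work / speeds[i]           # time at which it finishes this task
        finish = max(finish, t)
        heapq.heappush(free_at, (t, i))
    return finish

speeds = [1.0, 1.0, 1.0, 0.125]         # the "1 slow" configuration
print(static_makespan(32, speeds))      # 64.0
print(on_demand_makespan(32, speeds))   # 16.0
```

With the up-front split, the three fast processors finish their 8 tasks at time 8 and then wait at the barrier until the slow processor finishes at time 64. With on-demand assignment the fast processors absorb most of the work, although a task picked up late by the slow processor still stretches the tail.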

19 H.264 & PMake
[Plots: H.264; PMake]
- H.264 slows down significantly with 1 slow proc.
- Speeds up with 1 fast proc.
- PMake linearly scalable on all configurations
Scalable? Stable?

20 Impact of asymmetry
[Table columns: Scalable, Stable, Fix (app. fix or kernel fix)]
SPECjbb
- Interference from runtime system. Garbage collector dependent. Concurrent GC causes more problems.
- Fix: Migrate tasks from slow to fast core if one is free. Inspect runtime software, interference between threads (GC).
SPECjAppServer
- Robust, multi-tier application. Feedback tunes the workload. Very responsive to interference, small heaps etc.
Apache
- Thread serves many requests to reduce overheads. Problems with light load. Threads can map to fast or slow proc.
- Fix: Migrate tasks from slow to fast core if one is free. Or, handle few requests and recycle threads. High overhead, low perf.
Zeus
- Superior perf. in symmetric system. Unpredictable on asymm. with heavy and light loads. Independent application scheduling.
- Fix: Reconsider application scheduling.
TPC-H
- Query parallelization not aware of asymm. Intra-query parallelization worsens stability.
- Fix: Approx. application change by reducing degree of parallelization. Fix application scheduler. Consider asymm. in query optimization engine.
SPECOMP
- OpenMP based parallelization with sync. barriers. Fast cores held by slow.
- Fix: Assign tasks on-demand instead of up-front. Make OpenMP understand asymm.
H.264
- Robust application. Heavy utilization. Threads well-balanced and abundant.
PMake
- Multi-programming with several tasks.

21 Conclusions
Asymmetric systems
- Good for energy and performance
- But can introduce unpredictability
Software to understand asymmetry
- Evaluate application's work partitioning
- Scheduling of tasks. Mostly no other changes.
- Maybe feedback-based
Suitable asymmetry
- Many slow & few fast processors

22 Questions?

