SYNAR Systems Networking and Architecture Group: Scheduling on Heterogeneous Multicore Processors Using Architectural Signatures, by Daniel Shelepov and Alexandra Fedorova


1 SYNAR Systems Networking and Architecture Group Scheduling on Heterogeneous Multicore Processors Using Architectural Signatures Daniel Shelepov and Alexandra Fedorova School of Computing Science, Simon Fraser University, Vancouver, Canada

2 Architectural Signatures in a Nutshell
Task:
– schedule jobs appropriately given a variety of different cores available
Caveats:
– The scheduler doesn’t know job behaviour a priori
– Scalability: potentially hundreds of cores available
Our approach:
– Analyze job performance offline
– Describe the findings in the job’s architectural signature
– The scheduler uses signatures to make intelligent core assignment decisions

3 Talk Outline
– Background
– Methodology
– Results
– Summary and Future Work

4 Background: Heterogeneous CPUs
Heterogeneous CPUs = several types of cores:
– Simple vs. complex: cache size, issue width, presence of advanced features, power consumption
– Specialized (possibly). Example: many FPUs
Expose a common ISA
May contain 100s or 1000s of cores (“manycore”)
Bottom line: better efficiency = saved power
[Figure: now, homogeneous multicore CPUs; future, heterogeneous multi- and manycore CPUs. Core types shown: complex, simple, specialized]

5 Background: Heterogeneous Scheduling
The scheduler needs to be aware of:
– underlying core features
– job performance on various cores
Otherwise, no informed scheduling decision can be made => no benefit from heterogeneity
[Figure: Scheduler ?]

6 Architectural Signature Approach
A signature is provided along with the job binary.
Signatures
– are constructed offline
– are μarch.-independent
– provide guidance for selecting appropriate cores
[Figure: Scheduler]

7 Talk Outline
– Background
– Methodology
– Results
– Summary and Future Work

8 Constructing Signatures
OFFLINE PROFILING
– Collect microarchitecture-independent profiling data
– Examples: instruction mix, memory access patterns
OFFLINE ANALYSIS
– Generate performance-predicting metrics that a scheduler is able to use
– Examples: optimal cache size, inherent ILP, clock speed sensitivity
PREDICTION MODEL
– Create a model for generating meaningful performance-predicting metrics from the collected profiling data
SCHEDULING
– Interpret the performance-predicting metrics and schedule

9 Case Study: Clock Speed Sensitivity
Frequency changes affect different jobs differently.
Clock speed sensitivity captures these differences.
[Figure: completion time at different clock speeds]

10 Offline Profiling
We use MICA, a custom toolkit for Pin by Hoste and Eeckhout [2] (http://trappist.elis.ugent.be/~kehoste/MICA/).
MICA gathers a variety of μarch.-independent metrics.
For clock speed sensitivity, we want reuse distance data.

11 Offline Analysis
Reuse distances are used to estimate abstract L2 cache miss rates.
L2 cache miss rates are used to estimate clock speed elasticity, a metric that puts a number on sensitivity.
– requires a prediction model for elasticity as a function of cache miss rate (see next slide)
Elasticity values are placed into the architectural signature.
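
The first step on this slide, turning reuse distances into an abstract miss rate, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes a fully associative LRU cache, in which an access hits exactly when its reuse distance (the number of distinct other blocks touched since the previous access to the same block) is smaller than the cache capacity in blocks. The function name, the histogram layout, and the sample numbers are hypothetical.

```python
import math
from typing import Mapping


def estimate_miss_rate(reuse_histogram: Mapping[float, int], cache_blocks: int) -> float:
    """Estimate the miss rate of a fully associative LRU cache (an assumption).

    reuse_histogram maps a reuse distance to the number of accesses observed
    with that distance. First-time (cold) accesses use math.inf as distance.
    """
    total = sum(reuse_histogram.values())
    if total == 0:
        return 0.0
    # Under LRU, an access misses when at least `cache_blocks` distinct
    # blocks were touched since the last access to the same block.
    misses = sum(count for dist, count in reuse_histogram.items()
                 if dist >= cache_blocks)
    return misses / total


# Hypothetical profile: 100 accesses, cache holding 1024 blocks.
sample = {0: 50, 10: 30, 5000: 15, math.inf: 5}
miss_rate = estimate_miss_rate(sample, cache_blocks=1024)
```

Because the histogram itself does not depend on any particular cache, the same profile can be re-evaluated for several cache capacities without profiling the job again.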

12 Prediction Model
The graph maps SPEC CPU benchmarks by estimated L2 miss rate and clock speed elasticity.
We build a linear model and then use it to predict elasticity during offline analysis.
Constructed once, it can be used for all future analyses, unless a better model is proposed.
[Figure axis labels: more sensitive, less sensitive]
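
A linear model of this kind can be sketched with an ordinary least-squares fit. The slides do not give the fitted coefficients or the training points, so the data below is made up purely for illustration, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_elasticity(miss_rate: float, slope: float, intercept: float) -> float:
    """Apply the fitted line to a new job's estimated L2 miss rate."""
    return slope * miss_rate + intercept


# Hypothetical training points (estimated miss rate, measured elasticity).
# These are NOT the SPEC CPU measurements from the talk.
miss_rates = [0.0, 1.0, 2.0]
elasticities = [-1.0, -0.8, -0.6]
slope, intercept = fit_line(miss_rates, elasticities)
predicted = predict_elasticity(1.5, slope, intercept)
```

The fit is done once on benchmark data; afterwards only `predict_elasticity` is needed when a new job's signature is built.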

13 Scheduling
Recall: the architectural signature contains elasticity values.
Elasticity is straightforward to interpret.
Using elasticity, the scheduler categorizes jobs as highly sensitive, moderately sensitive, or insensitive.
Finally, we’re ready to schedule.
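
The categorization step might look like the sketch below. The slides name the three categories but not the cut-off values or the assignment policy, so the thresholds (0.7 and 0.3) and the simple "most sensitive job gets the fastest core" pairing are assumptions made only for illustration.

```python
from typing import Dict, List


def categorize(elasticity: float, high: float = 0.7, moderate: float = 0.3) -> str:
    """Map an elasticity value to a sensitivity category.

    The thresholds are hypothetical; only the magnitude matters because
    elasticity of completion time with respect to clock speed is negative.
    """
    magnitude = abs(elasticity)
    if magnitude >= high:
        return "highly sensitive"
    if magnitude >= moderate:
        return "moderately sensitive"
    return "insensitive"


def assign_cores(jobs: Dict[str, float], core_ghz: List[float]) -> Dict[str, float]:
    """Pair the most clock-sensitive jobs with the fastest cores (assumed policy)."""
    ranked_jobs = sorted(jobs, key=lambda name: abs(jobs[name]), reverse=True)
    ranked_cores = sorted(core_ghz, reverse=True)
    return dict(zip(ranked_jobs, ranked_cores))


# Hypothetical signatures: job name -> elasticity.
signatures = {"job_a": -0.9, "job_b": -0.1, "job_c": -0.5}
placement = assign_cores(signatures, [2.0, 3.0, 2.0])
```

With these inputs `job_a` lands on the 3.0 GHz core and the two less sensitive jobs share the 2.0 GHz cores.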

14 Clock Speed Sensitivity Data Flow
MICA reuse distance data => abstract L2 cache miss rates => clock speed elasticity values => clock speed sensitivity category

15 Talk Outline
– Background
– Methodology
– Results
– Summary and Future Work

16 Evaluating Clock Speed Sensitive Scheduling
Completion times with our clock speed aware prototype, normalized to completion times with the default Linux 2.6.18 scheduler.
[Figure panels:]
– Highly heterogeneous workload: two 2GHz cores, two 3GHz cores
– Balanced workload: one each of 2GHz, 2.33GHz, 2.67GHz, and 3GHz cores
– Uniform workload: two 2GHz cores, two 3GHz cores

17 Talk Outline
– Background
– Methodology
– Results
– Summary and Future Work

18 Summary
A framework for developing microarchitecture-independent architectural signatures to assist heterogeneity-aware scheduling
Proof of concept: clock speed aware scheduling
Results: tangible benefits even on mildly heterogeneous platforms
– up to 4% average throughput increase on a multicore system with 2GHz and 3GHz cores

19 Future Work
Extend our framework to include other core characteristics (cache size, issue width, ...)
Develop and analyze a heterogeneity-aware scheduler in a real operating system (Sun Solaris)
Compare that scheduler with other heterogeneity-aware schedulers

20 References
[1] M. Becchi and P. Crowley. Dynamic Thread Assignment on Heterogeneous Multiprocessor Architectures. In Proceedings of the Conference on Computing Frontiers, 2006.
[2] K. Hoste and L. Eeckhout. Microarchitecture-Independent Workload Characterization. IEEE Micro Hot Tutorials, 27(3):63-72, 2007.
[3] R. Kumar, D. M. Tullsen, P. Ranganathan, N. Jouppi, and K. Farkas. Single-ISA Heterogeneous Multicore Architectures for Multithreaded Workload Performance. In Proceedings of the 31st Annual International Symposium on Computer Architecture, 2004.

21 Appendix A: Existing Approaches
Algorithms by Becchi [1] and Kumar [3]
These rely on performance monitoring to determine the optimal assignment.
Potential drawbacks:
– don’t scale well to many types of cores
– limited applicability to short-lived threads
[Figure: Scheduler]

22 Appendix B: Input Sets and Performance
Varying input sets can drastically affect performance
– ref vs. test input in SPEC CPU2000
One architectural signature can provide for at most one input
This is a difficult problem that we are not currently tackling
There are smart ways to create parameterized approximations that account for data input size:
– Y. Zhong, S. G. Dropsho and C. Ding. Miss rate prediction across all program inputs. In Proceedings of Parallel Architectures and Compilation Techniques, 2003.

23 Appendix C: Elasticity
We need two measurements of completion time at two different frequencies.
Then we calculate the clock speed elasticity of completion time as follows (E = elasticity, T = completion time, F = clock speed):
[Equation not preserved in this transcript]
The larger the magnitude, the more sensitive the completion time is to clock speed.
In this case, -1.0 is considered very elastic (sensitive), because it means that an increase in frequency by a factor of X will decrease the completion time by the same factor.
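
As an illustration of the calculation described on this slide, the sketch below computes elasticity from two (completion time, frequency) measurements. The exact formula used in the talk is not available here, so the log-ratio form is an assumption; it was chosen because it yields exactly -1.0 when raising the frequency by a factor of X cuts the completion time by the same factor, which matches the interpretation given on the slide. The function name and sample numbers are hypothetical.

```python
import math


def clock_speed_elasticity(t1: float, f1: float, t2: float, f2: float) -> float:
    """Elasticity of completion time T with respect to clock speed F.

    Uses the log-ratio form E = ln(T2 / T1) / ln(F2 / F1), which is an
    assumed formula and not necessarily the one shown in the talk.
    Requires f1 != f2 and positive times and frequencies.
    """
    return math.log(t2 / t1) / math.log(f2 / f1)


# Fully CPU-bound (hypothetical) job: doubling the clock halves the time.
cpu_bound = clock_speed_elasticity(t1=100.0, f1=2.0, t2=50.0, f2=4.0)

# Memory-bound (hypothetical) job: a faster clock barely helps.
memory_bound = clock_speed_elasticity(t1=100.0, f1=2.0, t2=95.0, f2=3.0)
```

Values close to -1.0 mark jobs that benefit almost fully from a faster core, while values close to 0 mark jobs whose completion time hardly changes with clock speed.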

24 Appendix D: Different Cache Sizes
L2 miss rates (and elasticity) depend heavily on cache size => it has to be taken into account
Solution: calculate miss rates and elasticity for common cache configurations; the scheduler picks the appropriate one
This is a reasonable approach, because cache size aware scheduling takes precedence over clock speed aware scheduling

