Amphisbaena: Modeling Two Orthogonal Ways to Hunt on Heterogeneous Many-cores an analytical performance model for boosting performance Jun Ma, Guihai Yan,


1 Amphisbaena: Modeling Two Orthogonal Ways to Hunt on Heterogeneous Many-cores
An analytical performance model for boosting performance
Jun Ma, Guihai Yan, Yinhe Han and Xiaowei Li
State Key Laboratory of Computer Architecture, Institute of Computing Technology, C.A.S.
Univ. of Chinese Academy of Sciences

2 Trends in Cloud Computing
• The increasing computing demands
  • More massive
  • More diverse
  • High service level agreements (response time, throughput)
• The computing platform to meet these demands
  • Multicore to manycore
  • Homogeneous to heterogeneous

3 Two Orthogonal Ways to Boost Performance
• Scale-out speedup: exploit many cores for higher thread-level parallelism
• Scale-up speedup: exploit heterogeneous cores for optimal application-core mapping

4 Quantifying Scale-out and Scale-up Speedup
• The overall performance [equation not preserved in the transcript]
It indicates how to improve the overall performance of each application.
How can the application-specific scale-out and scale-up speedups be determined?

5 Amphisbaena: an Analytical Approach to Model Performance
• Amphisbaena, or Phi for short
• Models the overall performance speedup coming from two orthogonal ways:
  • Scale-up speedup: the ratio of performance on target cores to current cores under the same multithreading configuration.
  • Scale-out speedup: the ratio of performance on the target multithreading configuration to the current configuration on the same type of cores.
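The two ratios defined on this slide can be sketched as follows. This is only an illustration: the names `alpha` and `beta` follow the later "Alpha Model" and "Beta Model" slide titles, combining them by multiplication is an assumption motivated by the orthogonality claim, and the performance numbers are made up.

```python
def scale_out_speedup(perf_target_threads, perf_current_threads):
    """Performance ratio of target to current multithreading configuration,
    measured on the same type of cores."""
    return perf_target_threads / perf_current_threads


def scale_up_speedup(perf_target_cores, perf_current_cores):
    """Performance ratio of target to current core type,
    measured under the same multithreading configuration."""
    return perf_target_cores / perf_current_cores


def overall_speedup(alpha, beta):
    """Assumed combination: orthogonal speedups multiply."""
    return alpha * beta


# Hypothetical throughput measurements (arbitrary units).
alpha = scale_out_speedup(perf_target_threads=3.0, perf_current_threads=2.0)
beta = scale_up_speedup(perf_target_cores=2.4, perf_current_cores=2.0)
phi = overall_speedup(alpha, beta)
print(f"alpha={alpha:.2f} beta={beta:.2f} phi={phi:.2f}")
```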

6 Experimental Setup
• cluster-based layout
• distributed, banked LLC
• directory-based MOESI protocol

7 Scale-out Speedup
The model has three terms (their symbols are not preserved in the transcript):
– the serial part
– the parallelizable part
– the multithreading penalty
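The slide's equation is lost, so the following is only a sketch of how the three listed terms might combine. It assumes an Amdahl-style execution time (serial part plus parallelizable part divided by the thread count) with the multithreading penalty added on top; the function name and the additive form are assumptions, not the authors' formula. With a zero penalty it reduces to Amdahl's Law.

```python
def speedup_with_penalty(serial, parallel, threads, penalty=0.0):
    """Speedup over single-threaded execution.

    serial, parallel: fractions of single-threaded time (normally sum to 1).
    penalty: extra time caused by multithreading, in the same units
             (assumed to be added to the execution time).
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    time_single = serial + parallel
    time_multi = serial + parallel / threads + penalty
    return time_single / time_multi


# Amdahl's Law (no penalty) versus the same workload with a small penalty.
print(speedup_with_penalty(0.1, 0.9, threads=4))                # about 3.08
print(speedup_with_penalty(0.1, 0.9, threads=4, penalty=0.05))  # about 2.67
```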

8 Observation
Synchronization-related factors:
– modulating constant
– synchronization waiting cycles per kilo-instructions (SPKI)
– thread number
Miss-related factors:
– modulating constant
– misses waiting cycles per kilo-instructions (MPKI)
– thread number squared
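One plausible reading of the factors listed on this slide is a penalty with two terms: one that grows with the thread number and SPKI, and one that grows with the thread number squared and MPKI. The sketch below assumes the factors in each group are multiplied and the two groups are summed; that form, the parameter names `k_sync` and `k_miss`, and the sample values are all assumptions.

```python
def multithreading_penalty(spki, mpki, threads, k_sync, k_miss):
    """Assumed penalty: k_sync * SPKI * n + k_miss * MPKI * n**2.

    spki: synchronization waiting cycles per kilo-instructions
    mpki: misses waiting cycles per kilo-instructions
    threads: thread number n
    k_sync, k_miss: modulating constants (hypothetical names)
    """
    sync_term = k_sync * spki * threads
    miss_term = k_miss * mpki * threads ** 2
    return sync_term + miss_term


# Made-up inputs, only to show how the quadratic term takes over as n grows.
for n in (2, 4, 8, 16):
    print(n, multithreading_penalty(spki=2.0, mpki=0.5, threads=n,
                                    k_sync=0.1, k_miss=0.01))
```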

9 The Details of Multithreading Penalty
[Figure: multithreading penalty model, split into offline and online parts]

10 Alpha Model Accuracy
The Alpha model's error is under 5% on average, compared with 11.4% for Amdahl's Law.

11 Scale-up Speedup
• The frontend: issue width W [Big, Small]
• The backend: ROB size R [Big, Small]
How can the CPI be predicted on various types of cores?
[Figure: core layout with cores labeled B (big) and S (small); core types C0, C1, C2, C3]

12 Observation
• This trend is well approximated by a power law.
• This trend fits an exponential function well.
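The plotted trends are not preserved in the transcript, so the sketch below only illustrates the two function families the slide names. It fits a power law y = a * x^b and an exponential y = a * exp(b * x) by least squares in log space, using the standard library only. The helper names and the synthetic data points are assumptions for illustration.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_power_law(xs, ys):
    """Fit y = a * x**b via log y = log a + b * log x. Returns (a, b)."""
    b, log_a = linear_fit([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(log_a), b


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) via log y = log a + b * x. Returns (a, b)."""
    b, log_a = linear_fit(xs, [math.log(y) for y in ys])
    return math.exp(log_a), b


# Synthetic points generated from known curves, not measured data.
xs = [1.0, 2.0, 4.0, 8.0]
power_ys = [2.0 * x ** -0.5 for x in xs]
exp_ys = [3.0 * math.exp(-0.25 * x) for x in xs]
print(fit_power_law(xs, power_ys))
print(fit_exponential(xs, exp_ys))
```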

13 The Details of CPI Model
• memory intensity
• computing intensity
• bias
[Figure: CPI model, split into offline and online parts]

14 Beta Model Accuracy
The Beta model's error is kept below 8% on average, compared with 12.2% for PIE.

15 Phi Model Accuracy
The prediction error of overall performance is kept below 12% on average.

16 Orthogonality Validation
• Three measured values.
For most applications, the orthogonality error is below 5% on average.
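The slide does not say how the orthogonality error is computed. A minimal sketch, assuming the three measured values are the scale-out speedup, the scale-up speedup, and the overall speedup, and that the error is the relative gap between the measured overall speedup and the product of the other two, could look like this. The function name and the sample numbers are hypothetical.

```python
def orthogonality_error(scale_out_measured, scale_up_measured, overall_measured):
    """Relative gap between the measured overall speedup and the product
    of the separately measured scale-out and scale-up speedups."""
    product = scale_out_measured * scale_up_measured
    return abs(product - overall_measured) / overall_measured


# Made-up measurements for one application.
err = orthogonality_error(1.5, 1.2, 1.75)
print(f"orthogonality error: {err:.1%}")
```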

17 Application of Phi Model
• Using Phi for runtime management
  1. Predict online the performance speedup coming from scale-out and scale-up on any other target configuration.
  2. Invoke the scheduling algorithm to find the optimal configuration, that is, the one that maximizes performance.
  3. The operating system enables the specified multithreading and application-core mapping.

18 Phi Scheduling
D_out
  function: decide the thread number to spawn for each application.
  policy: an application with higher scale-out speedup should spawn more threads.
D_up
  function: decide the cores to map for each application.
  policy: the application with the largest scale-up speedup is allocated the fastest type of cores.
Phi
  algorithm: Phi scheduling uses a heuristic algorithm to maximize performance.
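The two policies on this slide can be illustrated with a simple greedy sketch. This is not the authors' heuristic, which the transcript does not describe. It assumes predicted speedups are already available per application, splits a thread budget in proportion to scale-out speedup, and pairs applications with core types in order of scale-up speedup. All names (`decide_threads`, `decide_cores`, the application and core labels) are hypothetical.

```python
def decide_threads(scale_out, total_threads):
    """D_out sketch: applications with higher scale-out speedup get more threads.

    scale_out: dict mapping application name to predicted scale-out speedup.
    Every application gets one thread; the rest are split proportionally.
    """
    apps = sorted(scale_out, key=scale_out.get, reverse=True)
    if total_threads < len(apps):
        raise ValueError("need at least one thread per application")
    threads = {app: 1 for app in apps}
    spare = total_threads - len(apps)
    total_speedup = sum(scale_out.values())
    for app in apps:
        threads[app] += int(spare * scale_out[app] / total_speedup)
    # Threads lost to rounding go to the highest-speedup applications first.
    leftover = total_threads - sum(threads.values())
    for app in apps[:leftover]:
        threads[app] += 1
    return threads


def decide_cores(scale_up, core_types_fastest_first):
    """D_up sketch: the application with the largest scale-up speedup is paired
    with the fastest core type, the next largest with the next fastest, and so on.
    Assumes at least as many core types as applications."""
    ranked = sorted(scale_up, key=scale_up.get, reverse=True)
    return dict(zip(ranked, core_types_fastest_first))


print(decide_threads({"A": 3.0, "B": 1.0}, total_threads=8))
print(decide_cores({"A": 1.1, "B": 1.6}, ["fast", "slow"]))
```

A real scheduler would also have to handle core counts per type and trade the two decisions off against each other; the sketch treats them independently for brevity.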

19 Performance Comparison
On average, Phi outperforms the three baselines by 12.2% (Static), 13.3% (Bias) and 12.9% (PIE).

20 Related Works
• Performance prediction and optimization periodically
• Only decided the number of threads/active cores
  – CPR: Composable Performance Regression for Scalable Multiprocessor [Benjamin C. Lee et al., MICRO 2008]
  – FDT: Feedback-Driven Threading: Power-Efficient and High-Performance Execution of Multi-threaded Workloads on CMPs [M. Aater Suleman et al., ASPLOS 2008]
• Only decided the type of heterogeneous cores
  – Single-ISA Heterogeneous Multi-core Architectures for Multithreaded Workload Performance [Rakesh Kumar et al., ISCA 2004]
  – Scheduling Heterogeneous Multi-cores Through Performance Impact Estimation (PIE) [Kenzo Van Craeynest et al., ISCA 2012]

21 Conclusion
• Analytical model for performance prediction
  • Scale-out speedup
  • Scale-up speedup
  • Overall performance
• Phi scheduling
  • Apply for runtime management
  • Return optimal performance

22 Thanks for Your Attention

