Canturk ISCI Margaret MARTONOSI


Phase Detection and Prediction on Real Systems for Workload-Adaptive Power Management
Speaker notes: This talk presents a recent study on phase analysis for power characterization. We look at the problem from two different angles, namely control-flow-based and event-counter-based approaches, and evaluate their responses.

Program Phases
Program phases are distinct and often-recurring regions of program behavior. How can we detect recurrent execution under real-system variability? How can we predict future phase patterns? How can we leverage predicted phase behavior for workload-adaptive power management? Can we do better than simple, reactive methods? Phases are useful for characterizing execution regions, for using current phase behavior to predict future behavior, and for managing dynamic adaptation.

Research Overview
Monitor application execution via specific features, classify those features into phases, detect and predict phase behavior, apply dynamic power management guided by phase predictions, and validate with real measurements.
[Figure: research framework with Application, Runtime Monitoring (Hardware Performance Counters, Dynamic Program Flow), Phase Analysis, Power Estimation, Dynamic Management, and Real Measurements blocks]

This Talk
Track memory accesses per instruction (Mem/Uop, i.e., normalized by retired micro-ops) via hardware performance counters, classify execution into phase patterns based on Mem/Uop rates, predict future behavior with the Global Phase History Table (GPHT) predictor, and use the phase predictions to guide dynamic voltage and frequency scaling (DVFS).
[Figure: framework diagram with Application, Runtime Monitoring (Hardware Performance Counters, Dynamic Program Flow), Phase Analysis (Phase Classification, Phase Prediction), Power Estimation, Dynamic Management, and Real Measurements blocks]
Speaker notes: This talk covers a specific recent project.
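Performance counters are usually read as cumulative totals, so a per-interval Mem/Uop rate comes from the difference between two consecutive samples. The sketch below assumes a hypothetical `CounterSample` record holding a memory-access event count and a retired micro-op count; how the counters are actually read (and which hardware events are used) is not shown on the slide and is left out here.

```python
from typing import List, NamedTuple


class CounterSample(NamedTuple):
    """Hypothetical cumulative counter snapshot taken once per sampling interval."""
    mem_accesses: int  # cumulative memory-access event count (assumed event)
    uops_retired: int  # cumulative retired micro-op count


def mem_per_uop(prev: CounterSample, curr: CounterSample) -> float:
    """Mem/Uop rate over the interval between two snapshots."""
    d_mem = curr.mem_accesses - prev.mem_accesses
    d_uops = curr.uops_retired - prev.uops_retired
    if d_uops <= 0:
        return 0.0  # idle or invalid interval: report no memory activity
    return d_mem / d_uops


def interval_rates(samples: List[CounterSample]) -> List[float]:
    """Turn a list of cumulative snapshots into one Mem/Uop rate per interval."""
    return [mem_per_uop(a, b) for a, b in zip(samples, samples[1:])]
```

Because Mem/Uop is a ratio of two event counts, it does not depend directly on the interval length, which is one reason such a metric can stay stable when DVFS changes the clock rate.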

From Execution to Phases
[Figure: Mem/Uop rate (0.000 to 0.020) and assigned phase (1 to 5) versus cycles (about 2.8E+10 to 3.3E+10)]
Different Mem/Uop ranges are assigned to different phases; a higher phase number means a more memory-bound phase. The resulting phase patterns expose the available recurrence. This simple phase definition is resilient to system variations and invariant to dynamic power management actions.
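Assigning Mem/Uop ranges to phases amounts to a lookup against sorted range boundaries. A minimal sketch, assuming five phases and hypothetical boundaries that only loosely follow the figure's axis range (the actual thresholds are not given on the slide):

```python
from bisect import bisect_right

# Hypothetical Mem/Uop boundaries separating phases 1..5 (not the talk's values).
# Each boundary is the inclusive lower bound of the next, more memory-bound phase.
PHASE_BOUNDARIES = (0.005, 0.010, 0.015, 0.020)


def classify_phase(mem_per_uop: float, boundaries=PHASE_BOUNDARIES) -> int:
    """Map a Mem/Uop rate to a phase number; higher means more memory bound."""
    return bisect_right(boundaries, mem_per_uop) + 1
```

For example, `classify_phase(0.012)` returns 3. Using fixed ranges keeps the classifier cheap enough to run on every sampling interval inside the OS.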

Predicting Phases with the GPHT
[Figure: GPHT structure. A Global Phase History Register (GPHR) of depth N holds the most recent phases Pt, Pt-1, ..., Pt-N; a Pattern History Table (PHT) holds, per entry, a tag (a stored phase-history pattern), the predicted next phase for that pattern, and an age/invalid field.]
The last observed phase, read from the performance counters, is shifted into the GPHR. The predicted phase comes from the corresponding PHT prediction entry if the GPHR pattern matches a PHT tag, and from GPHR(0), the last observed phase, if there is no matching pattern. The GPHT is similar to a global history branch predictor and is implemented in the OS for on-the-fly phase prediction.
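The lookup and fallback rules above can be sketched as a small class. This is an illustration under stated assumptions, not the authors' OS implementation: the PHT is modeled as a dictionary keyed by the full GPHR contents, each entry is trained with the phase that actually followed its pattern, and the age field is assumed to drive least-recently-used replacement when the table is full.

```python
from collections import deque


class GPHT:
    """Minimal Global Phase History Table sketch (replacement policy is assumed)."""

    def __init__(self, gphr_depth=8, pht_entries=128):
        self.depth = gphr_depth
        self.capacity = pht_entries
        # GPHR: most recent phases; index 0 is the last observed phase, GPHR(0).
        self.gphr = deque(maxlen=gphr_depth)
        # PHT: tag (tuple of the last `depth` phases) -> [predicted phase, last-use time].
        self.pht = {}
        self.clock = 0  # logical time standing in for the age field (assumption)

    def predict(self):
        """Predict the next phase from the current GPHR contents."""
        if not self.gphr:
            return None  # nothing observed yet
        entry = self.pht.get(tuple(self.gphr))
        if entry is not None:
            entry[1] = self.clock  # refresh the entry's age on a hit
            return entry[0]
        return self.gphr[0]  # no matching pattern: fall back to GPHR(0)

    def update(self, observed_phase):
        """Train on the phase just observed, then shift it into the GPHR."""
        self.clock += 1
        if len(self.gphr) == self.depth:
            tag = tuple(self.gphr)
            if tag not in self.pht and len(self.pht) >= self.capacity:
                # Table full: evict the least recently used pattern.
                victim = min(self.pht, key=lambda t: self.pht[t][1])
                del self.pht[victim]
            # Remember that this history pattern was followed by observed_phase.
            self.pht[tag] = [observed_phase, self.clock]
        self.gphr.appendleft(observed_phase)  # oldest phase drops off the end
```

A usage example on a repeating phase pattern: once each history pattern has been seen once, the predictor follows the cycle exactly, whereas a last-value predictor would miss on every phase change.

```python
trace = [1, 2, 3] * 4
gpht = GPHT(gphr_depth=2, pht_entries=16)
predictions = []
for phase in trace:
    predictions.append(gpht.predict())
    gpht.update(phase)
```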

Prediction Accuracies
[Figure: prediction accuracy (%, 40 to 100) per benchmark for LastValue and for the GPHT with GPHR depth 8 and 1024, 128, 64, or 1 PHT entries; benchmarks: gzip_log, mcf_inp, gcc_200, gap_ref, gcc_166, apsi_ref, gcc_scilab, gcc_expr, ammp_in, parser_ref, mgrid_in, applu_in, equake_in, wupwise_ref, gcc_integrate, bzip2_program, bzip2_source, bzip2_graphic]
Compared with reactive approaches (last-value prediction), the GPHT performs significantly better for highly varying applications, with up to 6X and on average 2.4X fewer mispredictions. It performs well down to 128 PHT entries, and its accuracy converges to that of last-value prediction as the number of PHT entries approaches 1.
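To make the comparison concrete, the sketch below computes per-trace accuracy, a last-value baseline, and an improvement factor. It assumes the "fewer mispredictions" factor is the ratio of the two predictors' misprediction rates; that reading is an assumption, and the function names are hypothetical.

```python
def accuracy(predictions, actual):
    """Fraction of intervals whose predicted phase matches the observed phase."""
    hits = sum(1 for p, a in zip(predictions, actual) if p == a)
    return hits / len(actual)


def last_value_predictions(trace):
    """Reactive baseline: predict that the next phase equals the last observed one."""
    return [None] + list(trace[:-1])


def misprediction_reduction(baseline_accuracy, improved_accuracy):
    """How many times fewer mispredictions the improved predictor makes.

    Undefined (division by zero) if the improved predictor is perfect."""
    return (1.0 - baseline_accuracy) / (1.0 - improved_accuracy)
```

With hypothetical accuracies of 70% for last value and 90% for the GPHT, `misprediction_reduction(0.7, 0.9)` is about 3.0, i.e., 30% versus 10% of intervals mispredicted.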

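Phase-Driven Dynamic Power Management
Phase definitions → memory boundedness → DVFS potential: more memory-bound phases spend more time waiting on memory, so they can run at a lower (V, f) setting with less performance loss. Each predicted phase therefore maps to a corresponding (V, f) setting.
[Figure: implementation overview]
Speaker notes: Now we can use these phases to guide dynamic power management.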
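The phase-to-setting mapping can be sketched as a lookup table. The voltage and frequency values below are hypothetical placeholders, not the settings of the platform used in the talk; the only property taken from the slide is that more memory-bound phases receive lower settings.

```python
from typing import NamedTuple, Optional


class OperatingPoint(NamedTuple):
    freq_mhz: int
    voltage_v: float


# Hypothetical (V, f) table: phase 1 (least memory bound) runs fastest,
# phase 5 (most memory bound) runs slowest.
PHASE_TO_SETTING = {
    1: OperatingPoint(1500, 1.40),
    2: OperatingPoint(1300, 1.30),
    3: OperatingPoint(1100, 1.20),
    4: OperatingPoint(900, 1.10),
    5: OperatingPoint(700, 1.00),
}
FASTEST = PHASE_TO_SETTING[1]


def choose_setting(predicted_phase: Optional[int]) -> OperatingPoint:
    """Return the (V, f) setting for a predicted phase.

    A missing or unknown prediction falls back to the fastest setting,
    so a bad prediction costs energy rather than performance."""
    return PHASE_TO_SETTING.get(predicted_phase, FASTEST)
```

Falling back to the fastest setting is one conservative design choice; a real governor might instead keep the current setting when no prediction is available.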

Complete Example
[Figure: three panels versus instructions (about 1.5E+09 to 5.0E+09): (1) Mem/Uop (0.000 to 0.024) with the actual phase and the GPHT-predicted phase (1 to 5); (2) power [W] for baseline and GPHT-guided execution; (3) BIPS for baseline and GPHT-guided execution]
The GPHT can accurately predict varying application behavior. GPHT-guided DVFS yields significant power savings compared with the baseline, with negligible performance degradation.

Improvement over Reactive Methods
GPHT-guided DVFS achieves a 7% energy-delay product (EDP) improvement over reactive methods, with comparable or lower performance degradation.
Speaker notes: The plots show the EDP improvement and performance degradation of the GPHT and last-value approaches with respect to baseline execution.
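EDP multiplies energy by execution time, so with average power P and runtime t it equals P·t·t. The sketch below computes EDP improvement and performance degradation relative to a baseline run; the function names and the example numbers are hypothetical, not measurements from the talk.

```python
def edp(avg_power_w: float, runtime_s: float) -> float:
    """Energy-delay product: energy (P * t, in joules) times delay (t, in seconds)."""
    energy_j = avg_power_w * runtime_s
    return energy_j * runtime_s


def edp_improvement(base_power_w, base_time_s, new_power_w, new_time_s) -> float:
    """Fractional EDP reduction of a managed run relative to the baseline."""
    return 1.0 - edp(new_power_w, new_time_s) / edp(base_power_w, base_time_s)


def perf_degradation(base_time_s: float, new_time_s: float) -> float:
    """Fractional slowdown of the managed run relative to the baseline."""
    return new_time_s / base_time_s - 1.0
```

For instance, a hypothetical run that drops average power from 10 W to 8 W while stretching runtime from 1.00 s to 1.05 s has an EDP of 8.82 J·s versus 10 J·s, an improvement of about 11.8% for a 5% slowdown. Because delay enters EDP twice, a power saving only pays off in EDP if the slowdown stays small.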

Conclusions
Phase characterizations help identify repetitive application behavior under real-system variability and dynamic management actions. Runtime phase predictions with the Global Phase History Table can accurately predict future application behavior, with up to 6X and on average 2.4X fewer mispredictions than reactive approaches. Dynamic power management guided by these phase predictions helps improve system power/performance efficiency, with 27% EDP improvement over the baseline and 7% over reactive approaches. The presented research framework and real-system experiments can guide phase-oriented characterization and dynamic adaptation applications.
Speaker notes: In this work, we presented our observations from real-system experiments evaluating workload power characterization with control-flow-based and event-counter-based features. The results of our study are summarized in the points above. We hope that the resulting experimental framework and observations can guide phase-oriented characterization and system adaptation work on real systems.

Thanks!

EXTRAS
1.1) Why care about phases: examples
1.2) Why care about power phases: examples
1.3) What different features have previous studies looked at?
2) Experiment setup details

1.1) Why Care About Phases?
Characterizing execution regions: summarize execution into representative execution regions.
[Figure: execution divided into regions E1, E2, E3, E4]

1.1) Why Care About Phases?
Characterizing execution regions; managing dynamic adaptation.
[Figure: dynamic/adaptive management switching between OFF and ON]

1.1) Why Care About Phases?
Characterizing execution regions; managing dynamic adaptation; using current phase behavior to predict future behavior.
[Figure: load references and store misses over time [s]]

1.2) Why Care About Power Phases?
Useful for guiding power budget / temperature limit management (e.g., Montecito/Foxton).
[Figure: power [W] and temperature [°C] over time [s], comparing uncontrolled temperature with an enforced temperature limit, where the processor is told to slow down]

1.2) Why Care About Power Phases?
Useful for guiding power budget / temperature limit management and for power/temperature-aware scheduling [Bellosa et al. COLP'03]. This helps in two ways: it reduces the cooling cost / heat removal rate for a server, and it extends battery life for a mobile device, as less cooling power/time is needed.
[Figure: power [W] over time [s]]

1.2) Why Care About Power Phases?
Useful for guiding power budget / temperature limit management, power/temperature-aware scheduling, and power balancing for multiprocessor systems / activity migration.
[Figure: Task1 and Task2 on Core/μP 1 and Core/μP 2 with per-core power; either swap or migrate the hot task, or slow down the hot core and speed up the other]

Older

This Talk
Classify application execution into phases based on hardware performance counters, predict phase behavior, apply dynamic power management guided by phase predictions, and validate with real measurements.
[Figure: framework with Application, Runtime Monitoring (Hardware Performance Counters, Dynamic Program Flow), Power Estimation, Phase Analysis, Dynamic Management, and Real Measurements blocks]

Predicting Phases with the GPHT
[Figure: GPHT structure. A Global Phase History Register (GPHR) of depth N holds the most recent phases Pt, Pt-1, ..., Pt-N; a Pattern History Table (PHT) holds, per entry, a tag (a stored phase-history pattern), the predicted next phase for that pattern, and an age/invalid field.]
The last observed phase, read from the performance counters, is shifted into the GPHR. The predicted phase comes from the corresponding PHT prediction entry if the GPHR pattern matches a PHT tag, and from GPHR(0) if there is no matching pattern. The GPHT is similar to a global history branch predictor and is implemented in the OS for on-the-fly phase prediction.