ECE 692 2009 Power Control for Chip Multiprocessors Xue Li Oct 27, 2009.

Presentation transcript:

ECE Power Control for Chip Multiprocessors Xue Li Oct 27, 2009

ECE Outline Two ways to control power of chip multiprocessors –MPC control with online model estimation –Simple closed loop control with risk evaluation

ECE Temperature-Constrained Power Control for Chip Multiprocessors with Online Model Estimation Yefu Wang, Kai Ma, Xiaorui Wang

ECE Introduction Power and temperature are the major constraints on further throughput improvement of CMPs –Peak power consumption of a CMP should be controlled to enable higher computing densities. –The temperature of a CMP should be kept below a threshold to avoid thermal failures. –Performance delivered per watt needs to be maximized.

ECE State of the Art
Power control for CMP
–Open-loop search or optimization [Isci’06], [Teodorescu’08], etc.: highly dependent on the accuracy of the system model
–Heuristics [Isci’06], [Meng’08], etc.: no theoretical guarantee of control accuracy/stability
–Chip-wide DVFS (Dynamic Voltage and Frequency Scaling) [McGowen’06], [Floyd’07], etc.: suboptimal in performance
Dynamic thermal management
–Heuristics or feedback control theory [Brooks’01], [Skadron’03], etc.: power and temperature are controlled separately

ECE Challenges and Solutions
–Challenge: multiple cores may need to be manipulated simultaneously to control both power and temperature. Solution: Multi-Input-Multi-Output (MIMO) control
–Challenge: optimal control algorithms need to be designed for power shifting among different cores. Solution: model predictive control (MPC) theory
–Challenge: different cores may be coupled together. Solution: specific design constraints
–Challenge: workload is unpredictable at design time. Solution: online parameter estimation
–Challenge: control accuracy and system stability are critical. Solution: theoretically guaranteed control performance and stability

ECE Temperature-Constrained Power Control MIMO control loop invoked periodically –Power monitor sends the chip-level power consumption to the controller –Controller reads temperature and performance metrics of each core –Controller computes new DVFS levels based on MPC control theory –New per-core DVFS levels are sent to the cores –Online model estimator updates the power model

ECE Steps of Model Predictive Control System modeling –Power model Controller design –MPC controller design –Constraints: Frequency range Power budget Temperature Other design requirements System stability analysis

ECE Steps of Model Predictive Control (current step: System modeling) System modeling –Power model Controller design –MPC controller design –Constraints: Frequency range Power budget Temperature Other design requirements System stability analysis

ECE System Modeling: Power Model (1) Power consumption of one core –A: estimated system parameters –Initial value can be defined by system identification –May change for different workloads –Can be updated by online estimation

ECE System Modeling: Power Model (2) –Total power consumption of the CMP –Power model validation (figure: total power consumption of the chip)
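
The equations for the power model are not preserved in this transcript. As a rough illustration only, the sketch below assumes a linear difference model in which a change in a core's frequency changes its power in proportion to an estimated gain, and the chip power is the sum over cores. The function names, the gains and the frequency changes are hypothetical and are not taken from the slides.

```python
def predict_core_power(p_now, a, delta_f):
    """Assumed linear model: p(k+1) = p(k) + a * delta_f(k)."""
    return p_now + a * delta_f


def predict_chip_power(tp_now, gains, delta_fs):
    """Assumed chip-level model: tp(k+1) = tp(k) + sum_i a_i * delta_f_i(k)."""
    return tp_now + sum(a * df for a, df in zip(gains, delta_fs))


# Hypothetical example: 4 cores, gains in W/GHz, frequency changes in GHz.
gains = [10.0, 12.0, 9.0, 11.0]
delta_fs = [-0.2, 0.0, 0.1, -0.1]
print(predict_chip_power(80.0, gains, delta_fs))
```

The gains play the role of the "estimated system parameters" on the slide: they start from an identified value and may be re-estimated as the workload changes.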

ECE Steps of Model Predictive Control (current step: Controller design) System modeling –Power model Controller design –MPC controller design –Constraints: Frequency range Power budget Temperature Other design requirements System stability analysis

ECE Controller Design: MPC Controller Control objective: minimize the cost function –Cost function terms: control accuracy and performance optimization –Model prediction –Measured from power meter: feedback
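
The cost function itself is not preserved in this transcript. For orientation, a generic MPC cost with one tracking term (control accuracy) and one term that pulls frequencies toward their maximum (performance optimization) could be written as below. This is a textbook-style form given as an assumption, not the slide's equation: the prediction horizon $P$, control horizon $M$, weights $Q$ and $R$, power set point $P_s$, predicted chip power $tp$ and frequency vector $F$ are all illustrative symbols.

```latex
V(k) = \sum_{i=1}^{P} \left\| tp(k+i \mid k) - P_s \right\|_{Q}^{2}
     + \sum_{i=0}^{M-1} \left\| F(k+i \mid k) - F_{\max} \right\|_{R}^{2}
```

In such a form, $tp(k+i \mid k)$ comes from the model prediction, while the measured chip power at time $k$ provides the feedback that anchors the prediction.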

ECE Controller Design: Constraints (1) –Physical frequency range –Power budget for each core –Other design requirements

ECE Controller Design: Constraints (2) Model between temperature and frequency –Temperature & power –Power & frequency Temperature constraint

ECE Steps of Model Predictive Control (current step: System stability analysis) System modeling –Power model Controller design –MPC controller design –Constraints: frequency range power budget temperature other design requirements System stability analysis

ECE Controller Design: Stability Analysis
Stability:
–Converge to desired bounds from any initial condition
Unknown system gain g:
–Ratio of the actual system parameter to the estimated system parameter used in the design
–The bigger the stable range of g, the better the system adaptability
The system is proved to be stable in a wide range
–Uniform workload: 0 < g ≤ 8.83
–Different workloads: 0 < g1 ≤ […], […] < g2 ≤ 17.6
The control system remains stable as long as the real parameter of a system is less than 8.83 times the value used to design the system.
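
To make the role of the gain g concrete, the sketch below simulates a much simpler one-step controller (not the MPC controller from the slides) that assumes a gain `a_est` while the real plant gain is `g * a_est`. For this toy controller the error is multiplied by (1 - g) each period, so it converges only for 0 < g < 2; the wider bounds on the slide belong to the MPC design and are not reproduced here. All names and numbers are hypothetical.

```python
def simulate(g, a_est=10.0, budget=80.0, p0=100.0, steps=50):
    """Power trace of a one-step controller when the real gain is g * a_est."""
    p = p0
    trace = [p]
    for _ in range(steps):
        delta_f = (budget - p) / a_est   # controller believes the gain is a_est
        p = p + g * a_est * delta_f      # plant responds with the real gain
        trace.append(p)
    return trace


for g in (0.5, 1.0, 1.5, 2.5):
    final_error = abs(simulate(g)[-1] - 80.0)
    print(f"g={g}: final |error| = {final_error:.3g}")
```

Runs with g inside the stable range settle on the budget, while g = 2.5 diverges, which is the kind of behavior a stability analysis is meant to rule out for the expected range of workloads.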

ECE Online Model Estimation Recursive Least Squares (RLS) estimator to update the model periodically –The RLS estimator records […] and […] –The estimator calculates […] and […] –The estimator updates […] with […] in the system model
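
The symbols on this slide are missing from the transcript, so the following is only a minimal sketch of a scalar recursive least squares estimator with a forgetting factor, assuming the model "power change = a × frequency change". The class name, the initial covariance and the forgetting factor are illustrative choices, not values from the presentation.

```python
class ScalarRLS:
    """Scalar recursive least squares for y = a * x, with forgetting factor."""

    def __init__(self, a0, p0=100.0, forgetting=0.95):
        self.a = a0            # current estimate of the gain
        self.P = p0            # estimate covariance (large means low confidence)
        self.lam = forgetting  # values below 1 discount old samples

    def update(self, x, y):
        """Fold in one sample (x = frequency change, y = observed power change)."""
        k = self.P * x / (self.lam + x * self.P * x)   # estimator gain
        self.a += k * (y - self.a * x)                 # correct by prediction error
        self.P = (self.P - k * x * self.P) / self.lam  # shrink covariance
        return self.a


# Hypothetical workload whose true gain is 12 W/GHz; the estimator starts at 10.
estimator = ScalarRLS(a0=10.0)
for dx in [0.1, -0.2, 0.3, -0.1] * 50:
    estimator.update(dx, 12.0 * dx)
print(round(estimator.a, 3))
```

A forgetting factor below 1 lets the estimate follow workload changes at run time, at the cost of more sensitivity to measurement noise.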

ECE System Implementation
Power lines (current signal), current probe (1 mV/A), USB interface
                 Physical Testbed      Simulation
CPU              Intel Xeon X5365      Alpha-like
Cores            4                     4, 8, 16
Power Monitor    Digital multimeter    Wattch
Temp Sensor      Coretemp driver
Controller       Software
Workload         SPEC CPU 2006

ECE Experimentation Baselines Empirical results –Control accuracy –Application performance –Temperature constraints –Online model estimator Simulation results –Control accuracy –Application performance

ECE Baselines
Priority
–Per-core DVFS
–Heuristic based: if power > budget, DVFS decreases by 1; if power < budget, DVFS increases by 1
Improved priority
–Priority with safety margin
MaxBIPS
–Per-core DVFS
–Predictive based: uses a typical workload to build a static table offline
–Exhaustive search over combinations of DVFS levels for all cores
–Workload sensitive
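
The Priority heuristic can be sketched as a single step function. The slide only states that the DVFS level moves by 1 depending on whether power is above or below the budget; which core is adjusted (lowest priority slowed first, highest priority sped up first) and how the safety margin of Improved Priority is applied are assumptions made here for illustration.

```python
def priority_step(levels, priorities, power, budget, max_level, margin=0.0):
    """One control period of a Priority-style heuristic.

    levels[i] is the DVFS level of core i (0 = slowest, max_level = fastest).
    margin = 0 mimics Priority; margin > 0 mimics Improved Priority (assumed).
    """
    new_levels = list(levels)
    target = budget - margin
    if power > target:
        # Over the target: slow down the lowest-priority core that can go lower.
        candidates = [i for i, lvl in enumerate(new_levels) if lvl > 0]
        if candidates:
            victim = min(candidates, key=lambda i: priorities[i])
            new_levels[victim] -= 1
    elif power < target:
        # Under the target: speed up the highest-priority core that can go higher.
        candidates = [i for i, lvl in enumerate(new_levels) if lvl < max_level]
        if candidates:
            winner = max(candidates, key=lambda i: priorities[i])
            new_levels[winner] += 1
    return new_levels


print(priority_step([2, 2], priorities=[1, 2], power=35.0, budget=32.0, max_level=3))
```

Because the step size is fixed and there is no model of how much power one level is worth, a heuristic like this has no guarantee of settling exactly on the budget.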

ECE Baselines: MaxBIPS
Define two N*M matrices: Power and BIPS
–N: number of cores
–M: number of power modes
Fill in the matrices with actual and predicted values
–Power: cubic scaling
–BIPS: linear scaling
Find the combination of per-core power modes that achieves the best BIPS under the power budget
Mode     Power Savings   Performance Degradation
Mode1    None
Mode2    15%             5%
Power Matrix
         Core1   Core2
Mode1    20      16
Mode2    17      14
BIPS Matrix
         Core1   Core2
Mode1    80      60
Mode2    69      52
BIPS           Power
80+60=140      20+16=36
80+52=132      20+14=34
69+60=129      17+16=33
69+52=121      17+14=31
If the power budget is 32, the last one will be selected
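
The selection step in this example can be reproduced with a short exhaustive search. The sketch below uses the Power and BIPS values from the slide's two-core, two-mode example; the function name and the mode-major table layout are choices made here for illustration.

```python
from itertools import product

# Values from the slide's example, indexed as table[mode][core].
POWER = [[20, 16], [17, 14]]
BIPS = [[80, 60], [69, 52]]


def max_bips(power, bips, budget):
    """Try every per-core mode combination; keep the best BIPS within budget.

    Returns (modes, total_bips, total_power), or None if nothing fits.
    """
    n_modes, n_cores = len(power), len(power[0])
    best = None
    for modes in product(range(n_modes), repeat=n_cores):
        total_power = sum(power[m][c] for c, m in enumerate(modes))
        total_bips = sum(bips[m][c] for c, m in enumerate(modes))
        if total_power <= budget and (best is None or total_bips > best[1]):
            best = (modes, total_bips, total_power)
    return best


print(max_bips(POWER, BIPS, budget=32))  # ((1, 1), 121, 31): both cores in Mode2
```

The loop visits M^N combinations, which is why a table-driven exhaustive search becomes impractical as the number of cores grows.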

ECE Empirical Results: Control Accuracy (1) Comparison of steady state errors –Steady state error: violation of power budget at different power level. –MPC follows the set point well.

ECE Empirical Results: Control Accuracy (2) MPC vs. MaxBIPS / Priority / Improved Priority. Figure annotations: much lower than the set point; fits well; oscillates around the set point; exceeds the budget at times.

ECE Empirical Results: Application Performance Comparison of SPEC performance among MPC, MaxBIPS and Improved Priority under different power budgets. –MPC achieves better performance because it can precisely achieve the set-point power. –Average improvement of MPC is 9.69% over MaxBIPS and 8.95% over Improved Priority.

ECE Empirical Results: Temperature Constraints Emulate a thermal emergency by lowering the temperature constraint –Figure (a) shows that the temperatures of the cores are quickly constrained to the lower bound. –Figure (b) shows that the temperature constraint works effectively to reduce power consumption.

ECE Empirical Results: Online Model Estimator MPC vs. MPC with estimator –Workload may change significantly at run time. –Estimator can correct system parameters dynamically. –MPC without estimator suffers large oscillations.

ECE Simulation Results: Control Accuracy Simulation with more cores (4, 8, 16) –Average power and standard deviation of the different control methods. –MPC precisely converges to the budget. –MaxBIPS is absent for 16 cores because its static prediction table grows exponentially with the number of cores.

ECE Simulation Results: Application Performance SPEC benchmark performance comparison under different number of cores (Set point = 95%, 85%)

ECE Conclusion A temperature-constrained chip-level power controller –Designed based on MPC control theory –Accurately controls power consumption –Temperatures of the cores are limited to stay below the constraint. –An online model estimator periodically updates the system model Compared with state-of-the-art work –More accurate power control –Better application performance

ECE Multi-Optimization Power Management for Chip Multiprocessors Ke Meng, Russ Joseph, Robert P. Dick Northwestern University Li Shang University of Colorado

ECE Introduction Power is still a first-class design constraint in the CMP era. –Higher transistor density –Higher leakage power –High power density leads to thermal issues Power is still a precious computing resource –When power is limited, maximizing the chip-wide performance requires global and local coordination.

ECE System Framework –Select power optimizations and allowable power modes –Collect data from sensors and counters; calculate power/performance –Analyze, search and tune –Soft-limit budget

ECE Optimization Pool (1) DVFS –Simple models: Frequency: linear with voltage; Power: changes cubically with voltage; Performance: roughly linear with frequency –High efficiency: cubic relationship between frequency and power
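
The scaling rules on this slide are easy to check numerically. The sketch below assumes the idealized relations stated above (power proportional to the cube of the frequency ratio, performance proportional to the frequency ratio) and ignores leakage and memory stalls; the frequency ratios are example values.

```python
def dvfs_scale(freq_ratio):
    """Return (relative power, relative performance) at a given frequency ratio.

    Assumes voltage scales linearly with frequency, so power scales cubically,
    and that performance is linear in frequency.
    """
    return freq_ratio ** 3, freq_ratio


for ratio in (1.00, 0.95, 0.90, 0.85):
    power, perf = dvfs_scale(ratio)
    print(f"f={ratio:.2f}: power={power:.3f}, performance={perf:.2f}")
```

Under these assumptions, running at 85% of nominal frequency costs about 15% of performance but cuts dynamic power to roughly 61%, which is the efficiency argument for DVFS.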

ECE Optimization Pool (2) Cache resizing –Large leakage: big savings –Workload variety: unused private capacity

ECE Models and Experimentations Models –Dynamic voltage / frequency scaling (DVFS) –Cache resizing –Unified analytic models –Risk evaluation –Search algorithms Experimentation –Configuration –Model validation –Model evaluation –Power violation

ECE Models and Experimentations (current section: Models) Models –Dynamic voltage / frequency scaling (DVFS) –Cache resizing –Unified analytic models –Risk evaluation –Search algorithms Experimentation –Configuration –Model validation –Model evaluation –Power violation

ECE Analytic Models: DVFS DVFS modeling –CPI stack counters: count computing stalls and L2 miss stalls; Computing stalls: change with frequency; L2 miss stalls: constant regardless of frequency –Performance model –Power: cubic with frequency
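
One way to read this performance model is that compute time shrinks or grows with the core clock, while the wall-clock time spent waiting on L2 misses stays fixed. The sketch below encodes that reading; it is an interpretation of the slide, and the function name and cycle counts are hypothetical.

```python
def predicted_time(compute_cycles, l2_stall_cycles, f_base, f_new):
    """Predict execution time (seconds) at f_new from counters measured at f_base.

    Assumption: compute cycles are frequency-independent, so their time scales
    as 1/f; L2 miss stall time is set by memory latency and does not change.
    """
    compute_time = compute_cycles / f_new
    memory_time = l2_stall_cycles / f_base
    return compute_time + memory_time


base = predicted_time(6e9, 3e9, f_base=3e9, f_new=3e9)
slow = predicted_time(6e9, 3e9, f_base=3e9, f_new=2.55e9)
print(f"slowdown at 85% frequency: {slow / base - 1:.1%}")
```

A memory-bound workload therefore loses less performance under DVFS than the frequency ratio alone would suggest, which is what the CPI stack counters let the model capture.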

ECE Analytic Models: Cache Resizing Cache resizing modeling –Non-stall cycles –Stall cycles due to cache misses Power: Average leakage power of a cache way times number of active ways

ECE Analytic Models: Unification Unified analytic models with DVFS and cache resizing –Performance: weak interaction among multiple optimizations allows independent speed-ups –Power: DVFS has a strong influence; additive contribution of cache resizing
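
A minimal sketch of how the two optimizations might be combined is shown below, assuming that performance factors multiply (independent speed-ups), that core power follows the cubic DVFS rule, and that cache leakage is added as leakage per way times the number of active ways. The exact equations are not given in the transcript; every name and number here is hypothetical.

```python
def unified_estimate(base_core_power, leak_per_way, active_ways,
                     freq_ratio, cache_perf_factor):
    """Estimate (power, relative performance) for one core and its private cache.

    Assumptions: core power scales with freq_ratio cubed, cache leakage is
    additive, and the DVFS and cache-resizing performance effects multiply.
    """
    core_power = base_core_power * freq_ratio ** 3
    cache_power = leak_per_way * active_ways
    performance = freq_ratio * cache_perf_factor
    return core_power + cache_power, performance


# Hypothetical core: 20 W at full speed, 0.5 W leakage per way, 4 of 8 ways on,
# 90% frequency, and a 2% slowdown from the smaller cache.
print(unified_estimate(20.0, 0.5, 4, 0.9, 0.98))
```

Treating the effects as separable keeps the search space tractable, since each optimization can be evaluated without re-profiling every combination.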

ECE Analytic Models: Risk Evaluation Why do risk evaluation? –Some optimizations are more prone to phase adjustment. –Severe performance loss and power violation. How to do risk evaluation? –DVFS: assume zero risk. –Cache resizing: cache activity variation threshold.

ECE Analytic Models: Search Algorithms (1)
Brute-force search
–Traverses all possible power modes
–Always finds the best combination
–Slow when the search space is large
Greedy search
–Takes the best step currently available; current best step: the power mode with the maximal delta power/performance ratio
–Fast
–Can get stuck in local minima; results show this happens rarely
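
The greedy strategy can be sketched as follows: start with every core in its fastest mode and, while the chip is over budget, lower the one core whose next mode saves the most power per unit of performance lost. The table values are borrowed from the two-core MaxBIPS example earlier in this deck; the function name and the tie-breaking behavior are assumptions.

```python
# Example tables indexed as table[mode][core]; mode 0 is the fastest.
POWER = [[20, 16], [17, 14]]
BIPS = [[80, 60], [69, 52]]


def greedy_search(power, bips, budget):
    """Return a per-core mode list that fits the budget, or None if none does."""
    n_modes, n_cores = len(power), len(power[0])
    modes = [0] * n_cores

    def total(table):
        return sum(table[m][c] for c, m in enumerate(modes))

    while total(power) > budget:
        best_core, best_ratio = None, -1.0
        for c, m in enumerate(modes):
            if m + 1 >= n_modes:
                continue  # this core is already in its slowest mode
            d_power = power[m][c] - power[m + 1][c]
            d_perf = bips[m][c] - bips[m + 1][c]
            ratio = d_power / d_perf if d_perf > 0 else float("inf")
            if ratio > best_ratio:
                best_core, best_ratio = c, ratio
        if best_core is None:
            return None  # every core is at its slowest mode and still over budget
        modes[best_core] += 1
    return modes


print(greedy_search(POWER, BIPS, budget=32))  # [1, 1]
```

Each iteration inspects at most one candidate step per core, so the cost grows roughly with cores times modes rather than exponentially, at the price of possibly missing the global optimum.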

ECE Models and Experimentations (current section: Experimentation) Models –Dynamic voltage / frequency scaling (DVFS) –Cache resizing –Unified analytic models –Risk evaluation –Search algorithms Experimentation –Configuration –Model validation –Model evaluation –Power violation

ECE Experiment: Configuration
Processor Setup
Cores            4 Alpha21264-like cores
L1 I/D Cache     64KB 2-way private 64B blocks
L2 Cache         2MB 8-way private 128B blocks
Tech node        65 nm
DVFS Range       85%, 90%, 95%, 100% of 3GHz
Group No.   Workloads                        Stability
Group A     equake, swim, sixtrack, gcc      Moderate
Group B     applu, gap, facerec, vortex      Moderate
Group C     mesa, eon, lucas, wupwise        Stable
Group D     art, mcf, parser, vpr            Unstable

ECE Experiment: Model Validation Cache CPI model validation

ECE Experiment: Model Evaluation (1) Modeling-greedy vs. modeling-global / trial-and-error –Trial-and-error (DVFS + cache resizing): starts a trial stage when entering a stable phase; only works with workloads possessing stable phases (Group C). –Analytical modeling (DVFS + cache resizing): 8% perf loss vs. 35% power saving; greedy search works extremely well

ECE Experiment: Model Evaluation (2) Modeling with risk management vs. MaxBIPS –Simple (DVFS + cache resizing): analytical modeling without risk evaluation. –With risk evaluation: results either better or almost unchanged. –MaxBIPS (only DVFS): not always the worst. Difficult to manage multiple optimizations. Even with risk evaluation, errors can be made before the risk is identified.

ECE Conclusion The power problem is critical in CMPs. CMP power management must coordinate global and local power usage. Analytical modeling is more favorable than trial-and-error. Risk evaluation is necessary to avoid frequent prediction errors.

ECE Comparison
                           1st paper          2nd paper
Controlling                Predictive based   Heuristic based
Power budget               Hard limit         Soft limit
Temperature management     Yes                No
L2 cache involvement       No                 Yes
Hardware implementation    Yes                No

ECE Critiques First paper –Temperature constraint seems much higher than normal working conditions. –Explanation of […] in the temperature constraints is not very clear. Second paper –Modeling accuracy is low. –No absolute guarantee of power consumption. –Too many arbitrary assumptions.

ECE Thank you