An Analysis of Efficient Multi-Core Global Power Management Policies Authors: Canturk Isci†, Alper Buyuktosunoglu†, Chen-Yong Cher†, Pradip Bose† and Margaret.

Presentation transcript:

An Analysis of Efficient Multi-Core Global Power Management Policies Authors: Canturk Isci†, Alper Buyuktosunoglu†, Chen-Yong Cher†, Pradip Bose† and Margaret Martonosi The 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06) Speaker: Jun Shen

Agenda
– Background
– Motivation
– Contributions
– Details of the contributions
– Overview of the global power management policy
– Brief overview of the simulation
– Comparison of the different policies
– Evaluation methodology
– Three neglected issues
– Advantages and drawbacks of the paper
– Relationship between the paper and the course
– Impact of the paper on the project
– Q&A

Background
– Multicore architectures are increasingly widespread because of the well-known “walls”
– Power and temperature problems are becoming increasingly critical

Motivation: two “how” questions
– How can a power budget be enforced through a global power manager?
– How can power be minimized for a given performance target?

Contributions: three primary contributions
– The design of a global power manager (PM)
– A fast static power-management analysis tool
– An evaluation of different PM policies with different focuses, such as prioritization, fairness, and throughput

Overview of global PM (1)
Why do we need a global power manager?
– To exploit the well-known variability in workload demand and characteristics (e.g. across threads and cores)
– To coordinate the adaptive actions of each core under a given power budget

Overview of global PM (2)

Overview of global PM (3)
Some preconditions: each core
– has its own dynamic controller
– has its own power-performance monitor (e.g. current monitor, performance-monitoring counter hardware)
– can run in multiple power modes

Overview of global PM (4)
The PM's control loop:
– The PM periodically collects power-performance data from the local monitors
– The PM reports the data to the OS
– The OS returns the power budget, thread affinities, and the high-level scheduling and load-balancing plan to the PM
– The PM decides the power mode of each core based on that information
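The four steps of the loop can be sketched as a single function that wires together four callbacks. This is only an illustration: the names `pm_iteration`, `uniform_policy`, the `Sample` tuple, the `"budget_w"` key and the mode names `"turbo"` and `"eff"` are all hypothetical and do not come from the slides.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical per-core reading: (power in watts, throughput in BIPS).
Sample = Tuple[float, float]


def pm_iteration(
    read_monitors: Callable[[], List[Sample]],
    consult_os: Callable[[List[Sample]], Dict[str, float]],
    decide_modes: Callable[[List[Sample], Dict[str, float]], List[str]],
    apply_modes: Callable[[List[str]], None],
) -> List[str]:
    """Run one pass of the PM loop and return the chosen per-core modes."""
    samples = read_monitors()               # collect data from the local monitors
    os_plan = consult_os(samples)           # report to the OS, receive budget and plan
    modes = decide_modes(samples, os_plan)  # pick a power mode for every core
    apply_modes(modes)                      # hand the modes to the per-core controllers
    return modes


def uniform_policy(samples: List[Sample], os_plan: Dict[str, float]) -> List[str]:
    """Toy decision rule (an assumption): stay in "turbo" while power fits the budget."""
    total_power = sum(power for power, _ in samples)
    mode = "turbo" if total_power <= os_plan["budget_w"] else "eff"
    return [mode] * len(samples)
```

In a real system the callbacks would be backed by hardware monitors and an OS interface; here they are plain functions so the control flow can be read in isolation.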

Overview of global PM (5)
Possible implementations of the PM:
– A separate on-die microcontroller with Foxton-like underlying monitors
– A separate helper daemon on a dedicated core
– A low-level, hypervisor-like program interface

Brief of simulation (1)
– Based on the IBM Turandot simulator
– Power statistics from IBM PowerTimer
– The list of core parameters

Brief of simulation (2)
– Uses single-threaded Turandot results for each power-mode simulation
– Simulates the multicore by progressing simultaneously over Turandot traces, each of which is the execution of a different benchmark
– Validates the simulation against a cycle-accurate full-CMP implementation of Turandot (???)

Brief of simulation (3)
New ideas:
– Time-driven L2
– Thread synchronization to handle the multiple-clock-domain mode
Experimental results:
– Simulated power differs from the CMP simulation by less than 5%
– Performance difference in the range [9%, 30%]
Note: the upper bound occurs with a highly memory-bound application

Brief of simulation (4)
Core power modes
– Target: a ratio of power savings to performance degradation of 3:1
– Comparison between the target and the experimental estimates
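A quick calculation suggests where a ratio near 3:1 can come from. It assumes the cubic power and linear performance scaling that the slides use later for estimation; the function name and the scaling factors 0.95 and 0.85 are illustrative assumptions, not values taken from the slides.

```python
def savings_to_degradation_ratio(scale: float) -> float:
    """Ratio of power savings to performance loss for a DVFS scaling factor.

    Assumes power scales with scale**3 and performance scales linearly
    with scale, where scale = 1.0 is full speed.
    """
    power_savings = 1.0 - scale ** 3
    perf_degradation = 1.0 - scale
    return power_savings / perf_degradation


# Algebraically the ratio equals 1 + scale + scale**2, so it approaches
# 3 as scale approaches 1 and drops for more aggressive scaling.
for scale in (0.95, 0.85):
    print(scale, round(savings_to_degradation_ratio(scale), 2))
```

Under these assumptions a mild mode stays close to 3:1, while a deeper mode falls somewhat below it.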

Brief of simulation (5) Target estimation

Global PM policies (1)
Policy introduction
– Priority: every core has a predefined priority; a higher-priority core gets a higher voltage and therefore higher throughput
– Power balancing: tries to equalize the power consumption of all cores
– Throughput optimization: picks the combination of power modes that maximizes throughput
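Two of these policies can be sketched in a few lines, assuming the PM already holds per-core, per-mode estimates of power and throughput. The function names, the matrix layout (mode 0 is the fastest) and the exhaustive search are assumptions made for illustration; the slides do not describe how the policies are implemented. The name MaxBIPS used in a later slide seems to refer to the throughput-optimization policy, though the slides do not say so explicitly.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

# Hypothetical layout: matrix[core][mode], with mode 0 the fastest mode.
Matrix = Sequence[Sequence[float]]


def total_power(power: Matrix, modes: Sequence[int]) -> float:
    """Chip power when core i runs in modes[i]."""
    return sum(power[core][mode] for core, mode in enumerate(modes))


def throughput_optimization(
    power: Matrix, bips: Matrix, budget: float
) -> Optional[Tuple[int, ...]]:
    """Return the mode combination with the highest total BIPS within budget."""
    best, best_bips = None, float("-inf")
    for combo in product(range(len(power[0])), repeat=len(power)):
        if total_power(power, combo) > budget:
            continue  # this combination overshoots the budget
        combo_bips = sum(bips[core][mode] for core, mode in enumerate(combo))
        if combo_bips > best_bips:
            best, best_bips = combo, combo_bips
    return best  # None when no combination fits the budget


def priority_policy(
    power: Matrix, priority_order: Sequence[int], budget: float
) -> List[int]:
    """Slow down the lowest-priority cores first until the budget is met.

    priority_order lists core indices from highest to lowest priority.
    """
    modes = [0] * len(power)
    slowest = len(power[0]) - 1
    for core in reversed(priority_order):
        while total_power(power, modes) > budget and modes[core] < slowest:
            modes[core] += 1
    return modes
```

The exhaustive search visits modes^cores combinations, which is manageable for a handful of cores and modes but would need a smarter search on larger chips.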

Global PM policies (2)
Chip-wide DVFS: an alternative
– Advantage: simple implementation (no synchronization across cores)
– Disadvantages:
  – High penalty for a small power overshoot when only a few power modes are available
  – Large performance deviation across different types of tasks

Global PM policies (3)

Evaluation Methodology (1)
Proposed evaluation methodology
– Policy curve: overall performance degradation under several budgets (with respect to all-turbo execution)
– Budget curve: the percentage of power consumed under one specific policy relative to the original power budget
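One point on each curve could be computed as follows. This is a sketch under assumptions: the function names are hypothetical, and using total chip throughput as the measure of "overall performance" is a guess, since the slides do not define the aggregate.

```python
from typing import Sequence


def policy_curve_point(bips_policy: Sequence[float], bips_turbo: Sequence[float]) -> float:
    """Overall performance degradation (%) relative to all-turbo execution.

    Assumes overall performance is the sum of per-core throughput.
    """
    return 100.0 * (1.0 - sum(bips_policy) / sum(bips_turbo))


def budget_curve_point(power_consumed: float, power_budget: float) -> float:
    """Power actually consumed under a policy, as a percentage of the budget."""
    return 100.0 * power_consumed / power_budget
```

Repeating the calculation for several budgets gives one curve per policy; a budget-curve value close to 100% means the policy uses almost all of the power it is allowed.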

Evaluation Methodology (2) Policy Curve

Evaluation Methodology (3) Budget Curve

Evaluation Methodology (4)
Other issues:
– What about fairness? Some cores may always get their full budget while others always starve
– Some fairness metrics: weighted speedup, harmonic mean of thread speedups
– Weighted slowdown = harmonic mean of the individual speedups with respect to turbo execution (the harmonic mean emphasizes the worst unfairness)
– Speedup = performance with enhancement / baseline performance; in this paper it is actually a slowdown
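The harmonic-mean metric can be written out directly from the definition above. The function name and the sample numbers are illustrative assumptions; only the formula (harmonic mean of per-thread speedups relative to turbo) comes from the slide.

```python
from typing import Sequence


def weighted_slowdown(bips_policy: Sequence[float], bips_turbo: Sequence[float]) -> float:
    """Harmonic mean of per-thread speedups relative to turbo execution.

    Each speedup is at most 1.0 when a policy only slows cores down,
    which is why the slide calls the result a slowdown.
    """
    speedups = [p / t for p, t in zip(bips_policy, bips_turbo)]
    return len(speedups) / sum(1.0 / s for s in speedups)


# One thread untouched, one thread halved: the harmonic mean (about 0.67)
# is lower than the arithmetic mean (0.75), so the starved thread weighs more.
print(weighted_slowdown([2.0, 0.5], [2.0, 1.0]))
```

Because the harmonic mean is dominated by the smallest speedup, a policy that starves a single thread scores worse than one that spreads the same total slowdown evenly.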

Evaluation Methodology (5) Weighted slowdown

Evaluation Methodology (6)

Evaluation Methodology (7) Dynamic adaptability

Issues (1)
How can we learn the power/performance behavior of applications?
– Careful exploration: try small-scale power changes??
  – Not suitable for a harsh adaptation policy
– Build a profile of each application from past experience??
  – Not always reliable

Issues (2)
A new solution:
– Rationale: an application's behavior at another DVFS setting can be estimated analytically with reasonable accuracy
– How:
  – Set up a core × power-mode matrix
  – Power has a cubic relationship with the scaling ratio
  – BIPS has a linear relationship with the scaling ratio
  – Frequency has a linear dependency on voltage
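The estimation step can be sketched as follows: one observed (power, BIPS) pair per core is projected to every power mode using the cubic and linear relationships listed above. The function names and the example scaling factors (1.0, 0.95, 0.85) are assumptions for illustration; the slides do not give the actual mode settings.

```python
from typing import List, Sequence, Tuple


def estimate_modes(
    power_now: float, bips_now: float, scale_now: float, scales: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Project one core's observed power and BIPS to every mode.

    scale_now is the DVFS scaling ratio the core is currently running at;
    scales lists the scaling ratio of each available mode.
    """
    power = [power_now * (s / scale_now) ** 3 for s in scales]  # cubic in the ratio
    bips = [bips_now * (s / scale_now) for s in scales]         # linear in the ratio
    return power, bips


def build_matrices(
    observations: Sequence[Tuple[float, float, float]], scales: Sequence[float]
) -> Tuple[List[List[float]], List[List[float]]]:
    """Build core x mode matrices from (power, bips, current scale) per core."""
    rows = [estimate_modes(p, b, s, scales) for p, b, s in observations]
    return [r[0] for r in rows], [r[1] for r in rows]
```

The resulting matrices have one row per core and one column per mode, so a policy can evaluate any mode combination without actually switching the cores into it.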

Issues (3)
Validity of the solution
– With SPEC, the power estimation error ranges from 0.1% to 0.3%
– The BIPS estimation error ranges from 2% to 4%

Issues (4)
Where is the ceiling of optimization?
– If we could know the future, everything would be easy. (Sorry, I don't know how to get the data from the oracle.)

Issues (5) How efficient is MaxBIPS in general cases?

Issues (6)

Advantages and Drawbacks
Advantages
– Refer to the contributions
Drawbacks
– Some important details are skipped, such as how the data for the oracle policy were obtained
– How the 3:1 ratio of power savings to performance degradation is maintained is not explained
– The authors do not reveal the relationship between the number of power modes and power management efficiency

Link between this paper and CSE520
– Explores the power and performance relationship in a CMP system
– The optimization approach can be extended to architecture design

Project
My project plans to explore how the number of power modes influences the efficiency of a power management policy

Q&A The End