Optimal Reliability-Constrained Overdrive Frequency Selection in Multicore Systems. Andrew B. Kahng and Siddhartha Nath, UC San Diego / VLSI CAD Laboratory

Presentation transcript:

-1- UC San Diego / VLSI CAD Laboratory Optimal Reliability-Constrained Overdrive Frequency Selection in Multicore Systems Andrew B. Kahng and Siddhartha Nath VLSI CAD LABORATORY, UC San Diego

-2- Outline
Motivation
Previous Work
Our Work
Problem Formulation
Optimal (Discretized) Solution Flow
Results
Conclusions

-3- Reliability in Multicore Systems
Modern multicore processors operate in multiple operating modes
  – E.g., nominal, supply voltage scaling, turbo, etc.
Reliability is a key processor design consideration at leading-edge technology nodes to guarantee a prescribed system lifetime
Task scheduling affects how cores are used
  – A subset of cores can fail before others

-4- Scheduling in Multicore Systems
Scheduler packs tasks using some or all of the available processing cores
[Figure: Application A and Application B packed onto cores; axes: Time, #Cores]

-5- Core Wearout
Mean time to failure (MTTF) is a measure of the lifetime of a core
Reliability mechanisms degrade the MTTF of a core
  – E.g., electromigration (EM), stress migration, hot carrier injection, bias temperature instability, etc.
When not all cores are simultaneously active
  – Adjust task scheduling on a subset of active cores for balanced wearout

-6- Impact of Overdrive Frequency
Overdrive frequency: the frequency that results from overclocking the cores to meet performance and throughput requirements
Overdrive frequencies cause faster MTTF degradation
Two challenges
  – Can violate “acceptable throughput” for tasks
      Cores fail before all assigned tasks are completed
  – Can violate minimum “acceptable performance” for tasks
      Cores operate at lower frequencies

-7- Terminology

-8- Outline
Motivation
Previous Work
Our Work
Problem Formulation
Optimal (Discretized) Solution Flow
Results
Conclusions

-9- Classification of Existing Works
Work | Type
Reiss12 | NRC, NLG, NPG
Karpuzcu09 | RC, NLG, NPG
Mihic04 | RC, LG (Dynamic power management), NPG
Rosing07 | RC, LG (Dynamic power management), NPG
Rong06 | RC, LG (Dynamic power management), NPG
Coskun09 | RC, LG (Dynamic thermal management), NPG
Srinivasan04 | RC, LG (Dynamic reliability management), NPG
Karl08 | RC, LG (Dynamic reliability management), NPG
(N)RC – (Non-) Reliability Constrained
(N)LG – (No) Lifetime Guarantee
(N)PG – (No) Performance Guarantee

-10- Counterexample to NRC Policies
Task schedule
Max frequency = 3GHz
Min acceptable frequency = 1.8GHz
Initial lifetime = 7 years (61320h)
#Active cores (m) | Nominal execution time (AF = 1) | Overdrive execution time (AF = 9.77)
1 | 1000h | 3000h
2 | 2000h | 5000h
3 | 3000h | 8000h
4 | 2000h | 5000h
All cores always operate at 3GHz
  – From HotSpot simulations, AF = 9.77
Lifetime after nominal tasks requiring m = 3 is [value missing in transcript] h
  – Tasks requiring m = 3 cannot complete overdrive execution
  – Tasks requiring m = 4 cannot complete at all
Cannot guarantee “acceptable throughput”!!!
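
The arithmetic behind this counterexample can be sketched as follows. This is only an illustration under assumptions that are not stated on the slide: AF is read as a wearout acceleration factor, the system is assumed to have N = 4 symmetric cores, and wearout is assumed to be perfectly balanced, so a task that keeps m cores active for t hours consumes AF * t * m / N hours of every core's lifetime. The function name `first_failure` is hypothetical.

```python
N_CORES = 4            # assumption: the slide only shows m up to 4
LIFETIME_H = 61320.0   # initial lifetime from the slide (7 years)
AF_OVERDRIVE = 9.77    # AF at 3GHz from the slide

# (m, nominal hours, overdrive hours) from the task schedule table
SCHEDULE = [(1, 1000, 3000), (2, 2000, 5000), (3, 3000, 8000), (4, 2000, 5000)]


def first_failure(schedule, lifetime_h, af_overdrive, n_cores):
    """Return (m, mode, remaining_hours) for the first step that does not fit
    into the remaining per-core lifetime, or None if every step fits."""
    remaining = lifetime_h
    for m, t_nom, t_od in schedule:
        steps = (("nominal", t_nom, 1.0), ("overdrive", t_od, af_overdrive))
        for mode, hours, af in steps:
            # Assumed accounting: balanced wear spreads the load over all cores.
            wear = af * hours * m / n_cores
            if wear > remaining:
                return m, mode, remaining
            remaining -= wear
    return None


result = first_failure(SCHEDULE, LIFETIME_H, AF_OVERDRIVE, N_CORES)
print(result)
```

Under these assumptions the first step that no longer fits is the overdrive execution of the tasks requiring m = 3, which is consistent with the claim on the slide; the remaining-lifetime figure the sketch reports depends entirely on the assumed accounting model.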

-11- Counterexample to RC-LG Policies
Task schedule
Max frequency = 3GHz
Min acceptable frequency = 1.8GHz
Initial lifetime = 61320h
#Active cores (m) | Nominal execution time (AF = 1) | Overdrive execution time (AF = 9.77)
1 | 1000h | 3000h
2 | 2000h | 5000h
3 | 3000h | 8000h
4 | 2000h | 5000h
All cores operate initially at 3GHz, and then at 1.6GHz
  – From HotSpot simulations, AF = 9.77
All tasks are completed, but
  – Tasks requiring m = 3, 4 operate at 1.6GHz < 1.8GHz (acceptable performance) !!!
Cannot guarantee “acceptable performance”!!!

-12- Outline
Motivation
Previous Work
Our Work
Problem Formulation
Optimal (Discretized) Solution Flow
Results
Conclusions

-13- What Do We Do Differently?
We formulate a new Maximum-Value Reliability-Constrained Overdrive Frequencies (MVRCOF) optimization (offline) problem
Important because
  – Overdrive frequencies are our optimization variables
  – User experience is the value
We guarantee prescribed levels of “acceptable performance” and “acceptable throughput”

-14- Comparison of Ours vs. Existing Works
Work | Type
Reiss12 | NRC, NLG, NPG
Karpuzcu09 | RC, NLG, NPG
Mihic04 | RC, LG (Dynamic power management), NPG
Rosing07 | RC, LG (Dynamic power management), NPG
Rong06 | RC, LG (Dynamic power management), NPG
Coskun09 | RC, LG (Dynamic thermal management), NPG
Srinivasan04 | RC, LG (Dynamic reliability management), NPG
Karl08 | RC, LG (Dynamic reliability management), NPG
Our Work | RC, LG (Dynamic reliability management), PG
(N)RC – (Non-) Reliability Constrained
(N)LG – (No) Lifetime Guarantee
(N)PG – (No) Performance Guarantee

-15- What is the Optimal Solution?
Task schedule
Max frequency = 3GHz
Min acceptable frequency = 1.8GHz
Initial lifetime = 61320h
#Active cores (m) | Nominal execution time (AF = 1) | Overdrive execution time (AF = 9.77)
1 | 1000h | 3000h
2 | 2000h | 5000h
3 | 3000h | 8000h
4 | 2000h | 5000h
Optimal (discretized) solution from exhaustive search
#Active cores (m) | Nominal frequency | Overdrive frequency
1 | 1.5GHz | 2.85GHz
2 | 1.5GHz | 2.3GHz
3 | 1.5GHz | 1.8GHz
4 | 1.5GHz | 1.8GHz
We guarantee both “acceptable performance” and “acceptable throughput” if a solution exists!!!

-16- Our Key Contributions
We develop a new MVRCOF formulation to maximize the value of operating multiple cores at overdrive frequencies
Our solutions provide guarantees for prescribed lower bounds on “acceptable performance” and “acceptable throughput”
We propose an optimal (discretized) solution using exhaustive search as well as an approximate heuristic flow
Our solutions determine optimal overdrive frequencies as well as execution times for each active core
We empirically determine that our optimal solutions improve the objective function value by up to 17.4% versus existing works

-17- Outline
Motivation
Previous Work
Our Work
Problem Formulation
Optimal (Discretized) Solution Flow
Results
Conclusions

-18- Formulation

-19- Formulation In English

-20- Formulation In English
Guarantees “acceptable throughput”, i.e., all tasks complete within lifetime and cores wear out in a balanced manner
Upper bound on instantaneous power dissipated by any core
Upper bound on instantaneous temperature of all active cores

-21- MVRCOF Inputs: Task Description
Source: Scheduler (App 1, App 2, ..., App X)
E_{l,m}: Execution times in nominal and overdrive modes with different numbers of active cores
w_{l,m}: Weights in nominal and overdrive modes with different numbers of active cores
f_{nom,m}: Nominal frequencies at different numbers of active cores

-22- MVRCOF Inputs: System Description
Source: SoC Designer
N: Number of available symmetric cores
P_max: Maximum power of any core
f_max: Maximum frequency of any core
T_max: Maximum die temperature
T_nom: Nominal temperature
MTTF: Initial MTTF of any core

-23- MVRCOF Outputs
MVRCOF solver outputs: f_{OD,m}, v_{j,m,l}, u_{i,l}
Optimal overdrive frequencies for each set of active cores
% lifetime each core operates in nominal and overdrive modes

-24- MVRCOF Inputs and Outputs
Task Description (from Scheduler: App 1, App 2, ..., App X): E_{l,m}, w_{l,m}, f_{nom,m}
System Description (from SoC Designer): N, P_max, f_max, T_max, T_nom, MTTF
MVRCOF solver
Outputs: f_{OD,m}, v_{j,m,l}, u_{i,l}
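
One way to picture the solver inputs is as two plain records, sketched below. The class and field names (`TaskDescription`, `SystemDescription`, `validate`) are hypothetical, and the reading of the indices (l as the mode, m as the number of active cores) is an assumption. The execution times, nominal frequency, maximum frequency and MTTF are taken from the example slides; the weights, power limit and temperatures are placeholder values invented purely so that the sketch runs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Mode = str  # assumed encoding of index l: "nominal" or "overdrive"
MODES = ("nominal", "overdrive")


@dataclass(frozen=True)
class TaskDescription:
    exec_time_h: Dict[Tuple[Mode, int], float]  # E_{l,m}
    weight: Dict[Tuple[Mode, int], float]       # w_{l,m}
    f_nom_ghz: Dict[int, float]                 # f_{nom,m}


@dataclass(frozen=True)
class SystemDescription:
    n_cores: int      # N
    p_max_w: float    # P_max
    f_max_ghz: float  # f_max
    t_max_c: float    # T_max
    t_nom_c: float    # T_nom
    mttf_h: float     # MTTF


def validate(task: TaskDescription, system: SystemDescription) -> List[str]:
    """Return a list of consistency problems (empty if the inputs look sane)."""
    problems = []
    for m in range(1, system.n_cores + 1):
        if m not in task.f_nom_ghz:
            problems.append(f"missing f_nom for m={m}")
        elif task.f_nom_ghz[m] > system.f_max_ghz:
            problems.append(f"f_nom exceeds f_max for m={m}")
        for mode in MODES:
            if (mode, m) not in task.exec_time_h:
                problems.append(f"missing E for {mode}, m={m}")
            if (mode, m) not in task.weight:
                problems.append(f"missing w for {mode}, m={m}")
    if system.t_nom_c > system.t_max_c:
        problems.append("T_nom exceeds T_max")
    return problems


# Execution times from the example task schedule: m -> (nominal h, overdrive h)
hours = {1: (1000, 3000), 2: (2000, 5000), 3: (3000, 8000), 4: (2000, 5000)}
task = TaskDescription(
    exec_time_h={(mode, m): float(h[i])
                 for m, h in hours.items() for i, mode in enumerate(MODES)},
    weight={(mode, m): 1.0 for m in hours for mode in MODES},  # placeholder
    f_nom_ghz={m: 1.5 for m in hours},
)
system = SystemDescription(
    n_cores=4,        # assumption
    p_max_w=10.0,     # placeholder
    f_max_ghz=3.0,
    t_max_c=100.0,    # placeholder
    t_nom_c=60.0,     # placeholder
    mttf_h=61320.0,
)
print(validate(task, system))
```

Keeping the task and system descriptions separate mirrors the slide, where one comes from the scheduler and the other from the SoC designer.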

-25- Outline
Motivation
Previous Work
Our Work
Problem Formulation
Optimal (Discretized) Solution Flow
Results
Conclusions

-26- Optimal (Discretized) Solution Flow

-27- Heuristic Flow

-28- Outline
Motivation
Previous Work
Our Work
Problem Statement
Optimal (Discretized) Solution Flow
Results
Conclusions

-29- Experimental Setup
Each core is simulated with 72 copies of jpeg_encoder from OpenCores
  – SP&R implementation with commercial tools and foundry 45nm libraries
Power simulation using Synopsys PrimeTime-PX
  – Increase voltage from 0.8V to 1.2V in steps of 10mV
  – Increase frequency from 1.5GHz to 3GHz in steps of 50MHz
Thermal simulation using HotSpot
LP solver is lp_solve
Baseline policy is RC-LG from existing works
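
The voltage and frequency sweeps described for the power simulation can be enumerated as in the sketch below. Whether every voltage is paired with every frequency is not stated on the slide, so the full cross product is an assumption; the variable names are illustrative.

```python
# Integer millivolts and megahertz avoid floating-point drift in the steps.
voltages_mv = list(range(800, 1200 + 1, 10))   # 0.8V to 1.2V in 10mV steps
freqs_mhz = list(range(1500, 3000 + 1, 50))    # 1.5GHz to 3GHz in 50MHz steps

# Assumed: every (voltage, frequency) pair is a candidate simulation point.
grid = [(v / 1000, f / 1000) for v in voltages_mv for f in freqs_mhz]

print(len(voltages_mv), len(freqs_mhz), len(grid))
```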

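-30- Testcases
[Table residue: column header “Name”, times in Kh, first testcase “4-I”; the remaining cell values are garbled in the transcript: 1, 2 3, 4 1, 2 3, 2 3, 5 8, 5 0.5, , , , 0.6]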

-31- Optimal, Heuristic vs. RC-LG
[Figure: comparison of the optimal and heuristic flows against RC-LG; annotations: -12%, -9%]

-32- Runtime Comparison

-33- Outline
Motivation
Previous Work
Our Work
Problem Statement
Optimal (Discretized) Solution Flow
Results
Conclusions

-34- Conclusions
We formulate and solve a new MVRCOF problem under lifetime reliability constraints
We develop an MVRCOF solver that implements our optimal (discretized) and heuristic flows
Our optimal solutions guarantee both “acceptable performance” and “acceptable throughput”
We empirically demonstrate that our optimal solutions achieve up to 17.4% greater value of the objective function than existing works
Our future work includes
  – Applying our methods to traces from actual server workloads
  – Expanding our methods to handle other objectives
  – Achieving solutions that are temperature history-aware

-35- Thank You!

-36- Back up

-37- Notation

-38- Optimal Solution Flow
[Flow diagram]
f_{OD,m} -> Power simulation -> Power(f_{OD,m}) -> Thermal simulation -> (f_{OD,m}, temp, AF) LUT
LUT lookup: (m, j), core temp, f_{OD,m} -> AF
Exhaustive search: for each core i, f_{OD,m} and combination j of m -> LP
Output: optimal obj fn value, f_{OD,m} and t_{j,m,l}
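
The exhaustive-search idea in this flow can be illustrated with a toy sketch. It is not the authors' solver: the LP step is replaced by a closed-form lifetime check, `toy_af` is a made-up stand-in for the (f_OD,m, temp, AF) lookup table that only matches the slides at its endpoints (AF = 1 at 1.5GHz, AF = 9.77 at 3GHz), overdrive execution time is assumed to scale as 1/f relative to 3GHz, wearout is assumed balanced across N = 4 cores, and the objective is simply the sum of the chosen frequencies. All function names are hypothetical.

```python
from itertools import product

N_CORES = 4                                   # assumption
LIFETIME_H = 61320.0                          # from the example slides
F_NOM, F_MIN_OK, F_MAX = 1500, 1800, 3000     # MHz, from the example slides
AF_AT_FMAX = 9.77                             # from the example slides
# m -> (nominal hours, overdrive hours at 3GHz), from the example slides
SCHEDULE = {1: (1000, 3000), 2: (2000, 5000), 3: (3000, 8000), 4: (2000, 5000)}


def toy_af(f_mhz):
    """Hypothetical AF model standing in for the simulated lookup table."""
    return AF_AT_FMAX ** ((f_mhz - F_NOM) / (F_MAX - F_NOM))


def per_core_wear(freqs):
    """Lifetime hours consumed per core, assuming balanced wearout."""
    wear = 0.0
    for (m, (t_nom, t_od)), f in zip(sorted(SCHEDULE.items()), freqs):
        t_scaled = t_od * F_MAX / f           # assumed: time scales as 1/f
        wear += (t_nom + toy_af(f) * t_scaled) * m / N_CORES
    return wear


def exhaustive_search(step_mhz=100):
    """Try every combination of discretized overdrive frequencies (one per m)
    and keep the feasible combination with the highest toy objective."""
    candidates = range(F_MIN_OK, F_MAX + 1, step_mhz)
    best = None
    for freqs in product(candidates, repeat=len(SCHEDULE)):
        if per_core_wear(freqs) > LIFETIME_H:
            continue                          # violates the lifetime budget
        value = sum(freqs)                    # toy objective, unit weights
        if best is None or value > best[0]:
            best = (value, freqs)
    return best


best = exhaustive_search()
print(best)
```

Restricting the candidates to the range from the minimum acceptable frequency up to the maximum frequency builds the “acceptable performance” bound into the search space, while the lifetime check plays the role of the “acceptable throughput” constraint.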