PowerMixer IP: IP-Level Power Modeling for Processors


Shan-Chien Fang (1), Jia-Lu Liao (2), Chen-Wei Hsu (2), Chia-Chien Weng (2), Shi-Yu Huang (2), Wen-Tsan Hsieh (3), Jen-Chieh Yeh (3)
(1) TinnoTek Inc., Taiwan
(2) Dept. of Electrical Engineering, National Tsing Hua University, Taiwan
(3) Industrial Technology Research Institute, Taiwan

Introduction
- Power dissipation has become a major design metric
  - IR drop, signal integrity
  - power budgeting, power tradeoff, battery lifetime
  - power grid design, thermal analysis, packaging
- High-level power estimation
  - enables power optimization at an early stage
  - achieves higher power savings
  - is fast, but often suffers from inadequate accuracy
- PowerMixer IP
  - IP-based power modeling/analysis tool
  - bottom-up power modeling/analysis methodology
  - fast and accurate power analysis for large SoC designs

Power Modeling Strategies
PowerMixer IP provides two kinds of model:
General IP Model
1. For general IP
2. Adopts an operation-mode-based model
3. Built by observing user-defined operation modes and key signals
Processor Model
1. Specific to processors
2. Adopts an instruction-level or stage-accurate model
3. Built by observing the program counter register and the instruction registers
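To make the operation-mode-based idea concrete, here is a minimal Python sketch. It assumes the model is simply a table of average power per operation mode, applied to a per-cycle trace of observed modes. The mode names, the power values, and the function `energy_from_mode_trace` are all hypothetical; the slides do not describe the internal form of the PowerMixer IP model.

```python
def energy_from_mode_trace(mode_power_mw, mode_trace, cycle_time_ns):
    """Return total energy in pJ for a per-cycle trace of operation modes.

    mode_power_mw: average power of each mode in mW (hypothetical values).
    mode_trace:    the mode observed in each clock cycle.
    cycle_time_ns: clock period in ns. Note that mW * ns = pJ.
    """
    return sum(mode_power_mw[mode] * cycle_time_ns for mode in mode_trace)


# Illustrative numbers only, not taken from the slides.
example_model = {"idle": 1.0, "encrypt": 4.0}
example_trace = ["idle", "encrypt", "encrypt", "idle"]
print(energy_from_mode_trace(example_model, example_trace, cycle_time_ns=10.0))
```

A lookup table keeps the per-cycle cost of the estimate constant, which is one plausible reason a mode-based model can run much faster than gate-level simulation.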

IP-Based Power Simulation
Inputs to PowerMixer IP (IP-based power simulation):
(1) SoC netlist
(2) IP power models (.PMF)
(3) Essential VCD
(4) Standard-cell power library
SoC blocks shown: μProcessor, cache, bus, DMA, ASICs, ...
Output: power profile
PowerMixer IP can significantly speed up the simulation process!

Processor Modeling Example: PAC-DSP Core Architecture
- The PAC-DSP core is a 5-issue VLIW processor with 8 pipeline stages
- Its ISA supports 206 instructions

Energy Model Complexity
- Enumerate all possible instruction combinations
  - 206 is the total number of instructions
  - 5 is the number of instructions per issue
  - complexity: $O(206^5)$
- Divide all instructions into instruction classes
  - instructions with similar behavior go in one class
  - instructions are divided into 13 types
  - $O(206^5) \rightarrow O(13^5)$
- Sum up the individual power of each instruction in an issue
  - $O(13^5) \rightarrow O(13)$
- Consider the power consumption of an instruction in eight different stages
  - $O(13) \rightarrow O(13 \times 8) = O(104)$
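The reduction steps above are plain arithmetic, and a short Python sketch shows the size of each step. The constants come from the slide; the function and dictionary key names are only illustrative.

```python
NUM_INSTRUCTIONS = 206  # instructions in the PAC-DSP ISA (from the slide)
ISSUE_WIDTH = 5         # instructions per issue (from the slide)
NUM_CLASSES = 13        # instruction classes (from the slide)
NUM_STAGES = 8          # pipeline stages (from the slide)


def model_sizes():
    """Number of model entries needed after each simplification step."""
    return {
        "all_instruction_combinations": NUM_INSTRUCTIONS ** ISSUE_WIDTH,
        "class_combinations": NUM_CLASSES ** ISSUE_WIDTH,
        "independent_classes": NUM_CLASSES,
        "classes_times_stages": NUM_CLASSES * NUM_STAGES,
    }


for step, size in model_sizes().items():
    print(f"{step}: {size:,}")
```

The first entry is roughly 3.7e11 combinations, while the final model needs only 104 coefficients, which is why the class and stage decomposition makes characterization tractable.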

Basic Period of Processor Energy Model
- Divide the execution time of the training programs into a number of basic periods
- Basic period: the time period during which the program counter's value does not change
- Calculate the energy $E_i$ of each basic period $i$
[Figure: timing diagram of CLK and PC, with the energies E_1 through E_5 of five consecutive basic periods]
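The basic-period definition can be sketched in a few lines of Python. The sketch assumes two per-cycle inputs of equal length, a program-counter trace and an energy sample per cycle. Both inputs and the function name `basic_period_energies` are hypothetical; the slides do not say how the traces are stored.

```python
from itertools import groupby


def basic_period_energies(pc_trace, cycle_energy):
    """Split a per-cycle PC trace into basic periods and sum energy per period.

    A basic period is a maximal run of consecutive cycles with the same PC.
    Returns a list of (pc_value, energy) pairs, one per basic period.
    """
    if len(pc_trace) != len(cycle_energy):
        raise ValueError("pc_trace and cycle_energy must have the same length")
    periods = []
    start = 0
    for pc, run in groupby(pc_trace):
        length = sum(1 for _ in run)
        periods.append((pc, sum(cycle_energy[start:start + length])))
        start += length
    return periods


# Illustrative trace: the PC stalls at 0x0 for two cycles and at 0x8 for three.
print(basic_period_energies([0x0, 0x0, 0x4, 0x8, 0x8, 0x8, 0x4],
                            [1.0, 2.0, 3.0, 1.0, 1.0, 1.0, 5.0]))
```

Note that a PC value seen again later starts a new basic period, since only consecutive cycles are grouped.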

Generate Energy Matrix
Energy equation for each basic period:
$$N_{i,1} J_1 + N_{i,2} J_2 + \cdots + N_{i,104} J_{104} = E_i$$
- $E_i$: energy consumption of basic period $i$
- $N_{i,s}$: number of times the $s$-th stage is executed in basic period $i$
- $J_s$: one-time execution energy of stage $s$
- $s$: pipeline stage id in each instruction class
Solve the energy matrix (one equation per basic period) to obtain the $J$ vector.
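The slides do not say which solver is used to obtain the $J$ vector. One common choice for an overdetermined system of this kind is least squares, and the following Python sketch solves it through the normal equations with Gauss-Jordan elimination. The solver choice, the function name, and the three-unknown example (instead of 104) are assumptions made for illustration.

```python
def solve_energy_matrix(N, E):
    """Least-squares solution J of N @ J = E, using the normal equations.

    N: list of rows, N[i][s] = executions of stage s in basic period i.
    E: list of measured energies, one per basic period.
    """
    rows, cols = len(N), len(N[0])
    # Augmented normal-equation system [N^T N | N^T E].
    A = [[sum(N[r][i] * N[r][j] for r in range(rows)) for j in range(cols)]
         + [sum(N[r][i] * E[r] for r in range(rows))]
         for i in range(cols)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(cols):
        pivot = max(range(c, cols), key=lambda r: abs(A[r][c]))
        if abs(A[pivot][c]) < 1e-12:
            raise ValueError("training data does not determine every J_s")
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(cols):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [A[i][cols] / A[i][i] for i in range(cols)]


# Illustrative data: energies generated from J = [2.0, 0.5, 1.5].
N_example = [[1, 0, 2], [0, 3, 1], [2, 1, 0], [1, 1, 1]]
E_example = [5.0, 3.0, 4.5, 4.0]
print(solve_energy_matrix(N_example, E_example))
```

The singular-matrix check reflects a practical constraint: the training programs must exercise every instruction class and stage, otherwise some $J_s$ cannot be determined.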

Experimental Results
Accuracy and Runtime Comparisons of IP-level Power Analysis
[Figure: AES power waveform, general IP model vs. gate-level; signals shown: CLK, STATUS, ENDE]

IP Design | Gate Count | Power (gate-level) | Power (IP-level) | Error | Run time (gate-level) | Run time (IP-level) | Speedup
AES | 88K | 63.20 mW | 63.29 mW | 0.14% | 120.1 sec | 0.37 sec | 324X
PACDSP | 248K | 18.9 mW | 18.3 mW | -3.1% | 937.0 sec | 2 sec | 468X
AndesCore | 490K | 215.9 mW | 220.2 mW | 1.9% | (value missing) | 32.21 sec | 314X
(name missing) | (value missing) | 1.53 mW | 1.54 mW | 0.7% | 316.7 sec | 6.02 sec | 52X
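The Error and Speedup columns can be reproduced from the other columns. The Python sketch below assumes the error is the signed relative difference of the IP-level estimate against the gate-level value, and the speedup is the ratio of run times. These definitions are inferred from the numbers, not stated on the slide.

```python
def relative_error_pct(gate_level_mw, ip_level_mw):
    """Signed error of the IP-level estimate relative to gate-level, in percent."""
    return 100.0 * (ip_level_mw - gate_level_mw) / gate_level_mw


def speedup(gate_level_sec, ip_level_sec):
    """How many times faster the IP-level run is than the gate-level run."""
    return gate_level_sec / ip_level_sec


# Rows with complete data, copied from the table:
# (design, gate-level mW, IP-level mW, gate-level sec, IP-level sec)
rows = [
    ("AES", 63.20, 63.29, 120.1, 0.37),
    ("PACDSP", 18.9, 18.3, 937.0, 2.0),
]
for name, p_gate, p_ip, t_gate, t_ip in rows:
    print(f"{name}: error {relative_error_pct(p_gate, p_ip):+.2f}%, "
          f"speedup {speedup(t_gate, t_ip):.1f}X")
```

Under these definitions the computed values agree with the table to within rounding or truncation of the last digit.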

Power Exploration & Design Trade-Off
- Application: H.264 (100K instructions)
- Specification: PAC-DSP with various cache sizes at 240 MHz
- $T_{target}$: execution time for each of the different cache sizes
- $T_{reference}$: execution time with the 32K cache
- $E_{target}$: energy for each of the different cache sizes
- $E_{reference}$: energy with the 32K cache
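The four quantities above suggest that each cache configuration is compared against the 32K reference. A minimal Python sketch, assuming the comparison is the pair of ratios $T_{target}/T_{reference}$ and $E_{target}/E_{reference}$, follows. The ratio form, the function name, and every number in the example are assumptions; the slide's actual measurements are not in the transcript.

```python
def normalized_tradeoff(measurements, reference_key):
    """Normalize (time, energy) pairs against a reference configuration.

    measurements: dict mapping a configuration label to (time, energy).
    Returns a dict mapping each label to (T_target/T_ref, E_target/E_ref).
    """
    ref_time, ref_energy = measurements[reference_key]
    return {label: (time / ref_time, energy / ref_energy)
            for label, (time, energy) in measurements.items()}


# Placeholder values in arbitrary units, NOT results from the presentation.
placeholder = {"32K": (4.0, 8.0), "16K": (5.0, 6.0)}
for label, (t_ratio, e_ratio) in normalized_tradeoff(placeholder, "32K").items():
    print(f"{label}: time x{t_ratio:.2f}, energy x{e_ratio:.2f}")
```

A configuration with a time ratio above 1 and an energy ratio below 1 trades performance for energy relative to the reference, which is the kind of trade-off this slide explores.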

Summary
- PowerMixer IP: an IP-based power analysis tool
- Constructs the power models of processors and various other IPs automatically
- Explores potential power-performance trade-offs at an early SoC design stage
- ~100X power simulation speedup with high estimation accuracy