Energy-Effective Issue Logic Hasan Hüseyin Yılmaz.



Outline
- Introduction
  - Why is energy consumption a big issue?
  - What can be done?
  - Objectives
  - Related works
- Definitions
  - What is issue logic?
  - Register rename logic
  - Wake-up logic
  - Selection logic
- Solution
  - Architectural design
  - Changes made to wake-up logic
  - Changes made to selection logic
- Results
  - Experiments
- Conclusions

Why is power consumption a big issue?
- More than 95% of current microprocessors are used in embedded systems
- Battery life is the bottleneck of mobile systems
- We are beginning to reach the limits of conventional cooling techniques
- Power saving vs. performance

Objectives
- To show the potential for power savings through dynamically reconfiguring the issue logic
- To maximize the flexibility of a system to carry out reconfigurations effectively and efficiently
- To minimize the impact on performance

Related Works
- Scaling voltage and frequency dynamically
- Programmable thermal thresholds
- Disabling unused parts of the processor
- Reducing cache power
- Smart branch prediction mechanisms

Pipeline Organization
- More flexibility in reconfiguring the pipeline due to different types of instructions
- Associated signals are disabled when a cluster is disabled → power saving
- All instructions within an enabled cluster are "visible" to the selection logic → power inefficient
- Shrinking the issue queue size → limits exposure to ILP

Pipeline organization 2
- Pipeline stages

Experimental system

Energy consumption of units in microprocessor for fp programs

Energy consumption of units in microprocessor for integer programs

Concerns
- Concentrate on the issue logic
- The issue logic consists of two parts:
  - Wake-up logic
  - Selection logic

Wake-up Logic
- Updates source dependences for instructions in the issue window that are waiting for their source operands to become available
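
The wake-up step described above can be sketched in software. This is only an illustrative model, not the hardware design from the referenced paper: the `QueueEntry` layout, the tag names (`p1`, `p2`), and the `wake_up` helper are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class QueueEntry:
    """One issue-window entry (hypothetical layout, for illustration only)."""
    name: str
    waiting_on: set = field(default_factory=set)  # source tags not yet produced

    @property
    def ready(self) -> bool:
        return not self.waiting_on


def wake_up(window, result_tag):
    """Broadcast one result tag to the window and clear matching dependences.

    Returns the names of entries that became ready because of this broadcast.
    """
    newly_ready = []
    for entry in window:
        if result_tag in entry.waiting_on:
            entry.waiting_on.discard(result_tag)
            if entry.ready:
                newly_ready.append(entry.name)
    return newly_ready


window = [
    QueueEntry("add", {"p1", "p2"}),
    QueueEntry("mul", {"p2"}),
    QueueEntry("sub", set()),
]
first = wake_up(window, "p2")   # "mul" has no other pending source
second = wake_up(window, "p1")  # "add" was still waiting on p1
```

In this model every entry is examined on every broadcast, including entries that are already ready. That per-entry work is the activity the later slides aim to reduce.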

Selection Logic
- Chooses instructions for execution from the pool of ready instructions
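
A minimal sketch of selection, assuming an oldest-first priority and a fixed issue width. The slides do not state which policy the processor uses, so the policy, the `select` function, and the age values are assumptions made for illustration.

```python
def select(ready_ages, issue_width):
    """Pick up to issue_width ready instructions, oldest first.

    ready_ages maps instruction name -> age (larger means older). Oldest-first
    is an assumed policy; the slides do not say which priority is used.
    """
    by_age = sorted(ready_ages, key=lambda name: ready_ages[name], reverse=True)
    return by_age[:issue_width]


# Hypothetical pool of ready instructions with their ages.
ready_pool = {"load": 7, "add": 3, "mul": 5, "sub": 1}
issued = select(ready_pool, issue_width=2)
```

Only instructions already marked ready appear in the pool, so the cost of selection grows with the number of entries the selection logic has to consider.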

Power consumption
- The instruction queue and its associated issue logic are responsible for 25% of the total power consumption of the microprocessor
- Empty entries and ready entries waste power in the wake-up logic
- Dynamically resize the instruction queue to use energy more efficiently

Solution
- Wake-up logic is disabled for empty and ready entries in the issue queue
- An algorithm was developed to dynamically resize the instruction queue
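
To see why disabling wake-up for empty and ready entries helps, the sketch below counts how many entries take part in one tag broadcast, with and without gating. The queue snapshot and the one-comparison-per-entry cost model are assumptions; real hardware has a comparator per source operand.

```python
from enum import Enum


class State(Enum):
    EMPTY = "empty"      # no instruction in the entry
    READY = "ready"      # all source operands already available
    WAITING = "waiting"  # at least one source operand outstanding


def comparisons_per_broadcast(entries, gate_idle_entries):
    """Count entries that compare against one broadcast result tag.

    Simplification: one comparison per active entry, whereas real hardware
    has one comparator per source operand.
    """
    if gate_idle_entries:
        return sum(1 for state in entries if state is State.WAITING)
    return len(entries)


# Hypothetical 8-entry queue snapshot.
queue = [State.EMPTY] * 4 + [State.READY] * 2 + [State.WAITING] * 2
baseline = comparisons_per_broadcast(queue, gate_idle_entries=False)
gated = comparisons_per_broadcast(queue, gate_idle_entries=True)
```

With this made-up snapshot, only the two waiting entries still need to compare tags once gating is enabled.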

Energy analysis for wake-up logic
- Empty entries account on average for 74.9% of issue logic energy
- Ready entries account on average for 14% of issue logic energy
- Together, the wake-up logic wastes about 89% of total issue logic energy (74.9% + 14% ≈ 89%)

Issue Queue Structure
- Non-ready instructions are hidden from the selection logic → power efficient
- Exposure to potential ILP is maximized by maintaining the size of the issue queue at all times
- Broadcasting of computed results is restricted to non-ready instructions only → power efficient

Dynamically resized issue queue

Algorithm
- Add a bit to each reorder buffer entry
- Divide the instruction queue into smaller portions
- Set the bit to zero when an instruction is dispatched
- When an instruction is issued, set its bit if it comes from the youngest portion of the instruction queue
- Increment a counter for each instruction whose bit is set
- Check the counter every fixed number of cycles, referred to as a "quantum"
- If the counter is smaller than a predetermined threshold, reduce the instruction queue size by one portion
- Every 5 quanta, increase the queue size by one portion if it is not at maximum size
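
The steps above can be sketched as a small controller. This is a hedged illustration only: the class name `QueueResizer`, the portion count (8), and the threshold (10) are hypothetical, and the sketch updates the counter at issue time and applies the shrink check before the growth check within a quantum, neither of which the slide specifies. Only the "every 5 quanta" growth period is taken from the slide.

```python
class QueueResizer:
    """Toy controller for the portion-based resizing heuristic.

    The threshold value and portion count are hypothetical; the "every 5
    quanta" growth period comes from the slide.
    """

    def __init__(self, max_portions=8, threshold=10, grow_every=5):
        self.max_portions = max_portions
        self.portions = max_portions   # start with the full queue enabled
        self.threshold = threshold
        self.grow_every = grow_every
        self.counter = 0               # issues seen from the youngest portion
        self.quanta_seen = 0

    def record_issue(self, from_youngest_portion):
        """Called per issued instruction; mirrors setting the ROB bit."""
        if from_youngest_portion:
            self.counter += 1

    def end_quantum(self):
        """Apply the shrink and grow rules at a quantum boundary."""
        self.quanta_seen += 1
        # Shrink: the youngest portion contributed too little this quantum.
        if self.counter < self.threshold and self.portions > 1:
            self.portions -= 1
        # Grow: periodically give the queue a chance to expand again.
        if (self.quanta_seen % self.grow_every == 0
                and self.portions < self.max_portions):
            self.portions += 1
        self.counter = 0
        return self.portions


resizer = QueueResizer()
sizes = []
for quantum in range(5):
    # Assume the youngest portion contributes only 3 issues per quantum.
    for _ in range(3):
        resizer.record_issue(from_youngest_portion=True)
    sizes.append(resizer.end_quantum())
```

The periodic growth step matters because a shrunken queue cannot observe whether a larger queue would now be useful; growing every few quanta lets the controller re-test that.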

Dynamic issue queue analysis
- Performance is hardly affected
- On average, 1.7% of IPC is lost

Conclusion
- Energy consumption of the whole processor is reduced by 15% on average
- 1.7% of IPC is lost on average

Conclusion 2
- "A good design strategy should be flexible enough to dynamically reconfigure available resources according to the program's needs" [2].
- The out-of-order instruction issue mechanism is the most energy-consuming part of the processor.
- Much unneeded activity and checking is done in the issue logic.

References
[1] Folegnani, D. and González, A. 2001. Energy-effective issue logic. In Proceedings of the 28th Annual International Symposium on Computer Architecture (Göteborg, Sweden, June 30 - July 04, 2001). ISCA '01.
[2] Bai, Y. and Bahar, R. I. 2004. A low-power in-order/out-of-order issue queue. ACM Trans. Archit. Code Optim. 1, 2 (Jun. 2004).
[3] Canal, R. and González, A. 2000. A low-complexity issue logic. In Proceedings of the 14th International Conference on Supercomputing (Santa Fe, New Mexico, United States, May 2000).
[4] Palacharla, S., Jouppi, N. P., and Smith, J. E. 1997. Complexity-effective superscalar processors. In Proceedings of the 24th Annual International Symposium on Computer Architecture (Denver, Colorado, United States, June 1997).

THANKS Questions?