Computer Architecture Principles Dr. Mike Frank


Computer Architecture Principles Dr. Mike Frank CDA 5155 Summer 2003 Module #17 Introduction to Advanced Pipelining: Instruction-Level Parallelism

Advanced Pipelining Techniques: More Instruction-Level Parallelism

Advanced Pipelining (Chapter 4 of 2nd edition; Appendix A-8 and Chapters 3 & 4 of 3rd edition) Focus on Instruction-Level Parallelism (ILP): executing multiple instructions (within a single program execution thread) simultaneously. Note that even ordinary pipelining exploits some ILP, by overlapping the execution of multiple instructions. ILP can be increased further using multiple-issue datapaths that initiate multiple instructions at once; such microarchitectures are called superscalar. Examples: RS/6000, PowerPC, Pentium, etc.

Pipeline Performance The ideal pipeline CPI is the minimum number of cycles per instruction issued when no stalls occur. It may be < 1 in superscalar machines: e.g., ideal CPI = 1/3 in a 3-way issue machine (e.g., IA-64). Real pipeline CPI = ideal pipeline CPI + structural stalls + RAW stalls + WAR stalls + WAW stalls + control stalls (average values per instruction). Performance is maximized by applying techniques that eliminate stalls and reduce the ideal CPI. Note: real pipeline CPI still doesn't account for cache misses (we return to this in Chapter 5).

Advanced Pipelining Techniques

Technique                                  Reduces
Loop unrolling                             Control stalls
Basic pipeline scheduling                  RAW stalls
Dynamic scheduling w/ scoreboarding        RAW stalls
Dynamic scheduling w/ register renaming    WAR & WAW stalls
Dynamic branch prediction                  Control stalls
Issuing multiple instructions per cycle    Ideal pipeline CPI
Compiler dependence analysis               Ideal CPI & data stalls
Software pipelining & trace scheduling     Ideal CPI & data stalls
Speculation                                All data & control stalls
Dynamic memory disambiguation              RAW stalls involving memory

Basic Blocks & ILP A basic block is a straight-line code segment with no branches into or out of it. Basic blocks tend to be small: 6–7 instructions on average. ILP within a single basic block is therefore limited; we need ways to parallelize execution across multiple basic blocks.

Loop-Level Parallelism (LLP) Perform multiple loop iterations in parallel. This works for some loops but not others. Examples: for (I=1; I<=1000; I++) x[I] = x[I] + y[I]; (iterations are independent, so the loop parallelizes) versus for (I=1; I<=1000; I++) sum = sum + x[I]; (each iteration reads the sum written by the previous one, a loop-carried dependence). Early vector-based supercomputers (e.g., Crays) relied on this technique extensively, compiling FORTRAN loops to vector operations.

Converting LLP to ILP The technique of loop unrolling transforms multiple loop iterations into a single instruction stream without (most of the) branches: several iterations are merged into one larger basic block, and other ILP techniques can then be used to parallelize execution within that block. Loop unrolling can be done either statically by the compiler or dynamically by the hardware.