Computer Architecture Principles Dr. Mike Frank


Computer Architecture Principles Dr. Mike Frank CDA 5155 Summer 2003 Module #17 Introduction to Advanced Pipelining: Instruction-Level Parallelism

Advanced Pipelining Techniques: More Instruction-Level Parallelism

Advanced Pipelining (Chapter 4 of 2nd edition; Appendix A-8 and Chapters 3 & 4 of 3rd edition) Focus on Instruction-Level Parallelism (ILP): executing multiple instructions (within a single program execution thread) simultaneously. Note that even ordinary pipelining exploits some ILP, by overlapping the execution of multiple instructions. ILP can be increased further using multiple-issue datapaths that initiate multiple instructions at once; such microarchitectures are called superscalar. Examples: RS/6000, PowerPC, Pentium, etc.

Pipeline Performance The ideal pipeline CPI is the minimum number of cycles per instruction issued when no stalls occur. It may be < 1 in superscalar machines: e.g., ideal CPI = 1/3 in a 3-way issue machine (e.g., IA-64). Real pipeline CPI = ideal pipeline CPI + structural stalls + RAW stalls + WAR stalls + WAW stalls + control stalls (average values per instruction). Performance is maximized by applying techniques that eliminate stalls and reduce the ideal CPI. Note: real pipeline CPI still doesn't account for cache misses (we return to this in Chapter 5).

Advanced Pipelining Techniques

Technique                                  Reduces
Loop unrolling                             Control stalls
Basic pipeline scheduling                  RAW stalls
Dynamic scheduling w/ scoreboarding        RAW stalls
Dynamic scheduling w/ register renaming    WAR & WAW stalls
Dynamic branch prediction                  Control stalls
Issuing multiple instructions per cycle    Ideal pipeline CPI
Compiler dependence analysis               Ideal CPI & data stalls
Software pipelining & trace scheduling     Ideal CPI & data stalls
Speculation                                All data & control stalls
Dynamic memory disambiguation              RAW stalls involving memory

Basic Blocks & ILP A basic block is a straight-line code segment with no branches into or out of it. Basic blocks tend to be small: 6–7 instructions on average. ILP within a single basic block is therefore limited; we need ways to parallelize execution across multiple basic blocks.

Loop-Level Parallelism (LLP) Perform multiple loop iterations in parallel. This works for some loops but not others. Examples: for (I=1; I<=1000; I++) x[I] = x[I] + y[I]; (iterations are independent, so the loop parallelizes) versus for (I=1; I<=1000; I++) sum = sum + x[I]; (each iteration reads the sum written by the previous one, a loop-carried dependence). Early vector-based supercomputers (e.g., Crays) relied on this technique extensively, compiling FORTRAN loops to vector operations.

Converting LLP to ILP The technique of loop unrolling transforms multiple loop iterations into a single instruction stream without (most of the) branches: several iterations are merged into one larger basic block, and other ILP techniques can then be used to parallelize execution within that block. Loop unrolling can be done either statically by the compiler or dynamically by the hardware.