Recap Multicycle Operations – MIPS Floating Point; Putting It All Together: the MIPS R4000 Pipeline


A.7. Another View: the MIPS R4300, a 64-bit embedded processor
– Used in the Nintendo 64 games console, colour laser printers and network routers
Still uses the "classic" five-stage pipeline
FP has out-of-order completion
– Hybrid scheme for providing precise exceptions (an instruction is issued only when it is certain that preceding instructions will not cause an exception; this may stall the pipeline)

A.8. Cross-Cutting Issues: Instruction Sets and Pipelining
Simple instruction sets make pipelining easier
They also allow scheduling of code
– Instructions are reordered for maximum efficiency
– Statically, by the compiler, or dynamically, by the hardware
– E.g. a memory-to-memory addition is one instruction on the VAX, but four instructions on a RISC machine (load, load, add, store)
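To make the VAX-versus-RISC contrast concrete, here is a rough sketch (the mnemonics are recalled from the two architectures, not taken from the slides; A, B and C stand for memory locations):

```asm
; VAX: a single memory-to-memory instruction
ADDL3   A, B, C          ; C = A + B

; MIPS-style RISC: four simple register-oriented instructions
lw      $t0, A           ; load A
lw      $t1, B           ; load B
add     $t2, $t0, $t1    ; add
sw      $t2, C           ; store C
```

Each of the four RISC instructions does one simple thing per pipeline stage, which is exactly what makes them easy to pipeline and to reorder.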

Dynamic Scheduling
Simple pipelines:
– Fetch an instruction and issue it (unless stalled for data hazard prevention)
Dynamic scheduling: the processor rearranges instruction execution to minimise stalls!
– Out-of-order execution
– Out-of-order completion!

Dynamic Scheduling
fdivd %f2, %f4, %f0
faddd %f0, %f8, %f10   ! depends on the divide (%f0)
fsubd %f8, %f14, %f12  ! independent
Normal pipeline:
– Stalling faddd also stalls fsubd
Dynamic scheduling:
– Hardware lets fsubd execute while faddd stalls
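The effect can be sketched with a toy timing model in Python (the latencies below are invented for illustration; the slides give none): with in-order issue, the stalled faddd also holds up the independent fsubd, while out-of-order issue lets fsubd start immediately.

```python
# Toy model: each instruction is (dest, srcs, latency). An instruction may
# start once its source registers are ready; with in-order issue it must
# additionally wait until all earlier instructions have issued.
PROGRAM = [
    ("f0",  ("f2", "f4"),  24),  # fdivd %f2, %f4, %f0   (invented: 24 cycles)
    ("f10", ("f0", "f8"),   4),  # faddd %f0, %f8, %f10  (RAW on f0)
    ("f12", ("f8", "f14"),  4),  # fsubd %f8, %f14, %f12 (independent)
]

def finish_times(program, in_order):
    ready = {}       # register -> cycle its value becomes available
    start_floor = 0  # earliest start for the next instruction (in-order only)
    finishes = []
    for dest, srcs, latency in program:
        start = max([ready.get(r, 0) for r in srcs] + [start_floor])
        if in_order:
            start_floor = max(start_floor, start + 1)  # one issue per cycle
        ready[dest] = start + latency
        finishes.append(start + latency)
    return finishes
```

Under this model `finish_times(PROGRAM, in_order=True)` gives [24, 28, 29] (fsubd is dragged behind the stalled faddd), while `finish_times(PROGRAM, in_order=False)` gives [24, 28, 4]: the independent subtract finishes long before the divide.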

Dynamic Scheduling
Issuing instructions is split in two:
– Issue: decode, check for structural hazards
– Read operands: check for data hazards
As soon as its operands are available, an instruction can start execution
– Out-of-order execution
– Out-of-order completion
– Complicates exception handling!
– Introduces WAR and WAW hazards

Dynamic Scheduling
Two approaches:
– Scoreboarding: centralised control
– Tomasulo's Algorithm: distributed control

Scoreboarding
Developed for the CDC 6600 (1964)
Issues instructions provided they do not depend on any active or stalled instruction
Requires multiple (or pipelined) functional units
– E.g. the CDC 6600: 4 FP units, 7 integer units and 5 memory units
– Here, assume MIPS with one integer unit, 2 FP multipliers, 1 FP adder and 1 FP divide unit

Scoreboard
Determines data dependences, then decides when an instruction can read its operands and begin execution
Tracks when an instruction can write its result
Centralises all hazard detection and resolution

Scoreboard
Instructions may stall at:
– Issue (first stage of the old ID): resolves WAW and structural hazards
– Read Operands (second stage of the old ID): resolves RAW hazards
– Write Results: resolves WAR hazards

Scoreboard
Keeps track of:
– Instruction status: where is the instruction? (Issue, Read Operands, Execute, Write Result)
– Functional unit status: busy? destination register for the result; source registers for the operands
– Register result status: for each register, which functional unit (if any) will write to it
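These tables can be sketched as plain data structures. The following is a deliberately simplified model (unit names, field names and the exact stall conditions are our own simplifications, not the full CDC 6600 algorithm), showing where each hazard class is caught:

```python
# Deliberately simplified scoreboard bookkeeping (illustrative only).
fu_status = {"integer": None, "fp_add": None}  # None = unit idle
register_result = {}  # register -> functional unit that will write it

def can_issue(fu, dest):
    # Issue stalls on structural (unit busy) and WAW (pending writer of dest)
    return fu_status[fu] is None and dest not in register_result

def issue(fu, dest, srcs):
    assert can_issue(fu, dest)
    # Record, at issue time, which unit each source operand must wait for (RAW)
    fu_status[fu] = {
        "dest": dest, "srcs": tuple(srcs), "read_done": False,
        "wait_on": {r: register_result[r] for r in srcs if r in register_result},
    }
    register_result[dest] = fu

def can_read_operands(fu):
    # Read Operands stalls until every producer seen at issue time has written
    return not fu_status[fu]["wait_on"]

def read_operands(fu):
    assert can_read_operands(fu)
    fu_status[fu]["read_done"] = True

def can_write_result(fu):
    # Write Result stalls while an earlier instruction still needs to read
    # the OLD value of our destination register (WAR)
    dest = fu_status[fu]["dest"]
    return not any(
        name != fu and u is not None and not u["read_done"]
        and dest in u["srcs"] and dest not in u["wait_on"]
        for name, u in fu_status.items()
    )

def write_result(fu):
    assert can_write_result(fu)
    for u in fu_status.values():  # wake consumers waiting on this unit
        if u is not None:
            u["wait_on"] = {r: p for r, p in u["wait_on"].items() if p != fu}
    register_result.pop(fu_status[fu]["dest"], None)
    fu_status[fu] = None
```

The three checks mirror the three stall points above: Issue catches structural and WAW hazards, Read Operands catches RAW hazards, and Write Result catches WAR hazards.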

Scoreboard
Relatively simple to implement
– Main "cost" is the extra busses connecting the multiple functional units
Performance is limited by:
– The amount of parallelism (independent instructions) available
– The number of scoreboard entries (the "window" used for instruction look-ahead)
– The number and types of functional units
– Dependences (causing WAR and WAW stalls)

A.9. Fallacies and Pitfalls
Pitfall: unexpected execution sequences may cause unexpected hazards
– E.g. a WAW hazard caused by the compiler filling delay slots
Pitfall: extensive pipelining can lead to poor price/performance
– E.g. the VAX microprogram pipeline

Fallacies and Pitfalls
Pitfall: evaluating a code scheduler using unoptimised code

Concluding Remarks
Before 1980, pipelining was only used in expensive supercomputers and high-end mainframes
Mid-1980s: adopted by high-end microprocessors
– Displaced minicomputers and mainframes
1990s: desktop processors using sophisticated pipelines
– Dynamic scheduling, multiple issue, etc.

Chapter 3: Instruction-Level Parallelism and Its Dynamic Exploitation

Introduction
Chapter 3: dynamic techniques, using hardware
– Pentium, Athlon, MIPS, SPARC, etc.
Chapter 4: static techniques, using software
– Itanium

3.1. Instruction-Level Parallelism
Pipelining overlaps the execution of independent instructions
– Instruction-level parallelism (ILP)
This chapter extends the basic concept of pipelining by:
– Reducing hazards
– Increasing performance by exploiting further parallelism

Pipeline Performance
Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls
The ideal CPI is the maximum performance attainable by the implementation
Seek to reduce each stall term as far as possible

Finding ILP
Basic block
– A single straight-line code sequence with no branches in or out
– Offers very little ILP
– A branch frequency of 15% (integer code) means only six or seven instructions per block, and these are largely dependent on one another
Need more ILP!
– Need to look for parallelism across basic blocks

Finding ILP
Loop-level parallelism
– Often, loop iterations are independent of each other:
for (int k = 0; k < 1000; k++)
    x[k] = x[k] + y[k];
– 1000 independent "blocks" of work
– Exploited via "loop unrolling"
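As a sketch of what unrolling does (transliterated into Python for brevity; the factor of four is an arbitrary choice, not from the slides), the unrolled body exposes four independent element updates per iteration for a scheduler to overlap:

```python
def add_rolled(x, y):
    # Original loop: one element update per iteration
    for k in range(len(x)):
        x[k] = x[k] + y[k]

def add_unrolled(x, y):
    # Unrolled by four: the four updates in the body are independent of
    # each other, so a compiler or hardware scheduler can overlap them
    n = len(x)
    k = 0
    while k + 4 <= n:
        x[k]     = x[k]     + y[k]
        x[k + 1] = x[k + 1] + y[k + 1]
        x[k + 2] = x[k + 2] + y[k + 2]
        x[k + 3] = x[k + 3] + y[k + 3]
        k += 4
    while k < n:  # cleanup loop for leftover elements
        x[k] = x[k] + y[k]
        k += 1
```

Both versions compute the same result; unrolling only changes how much independent work is visible per iteration (at the cost of code size and a cleanup loop).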

Dependences
Which instructions depend on each other?
Three types of dependence:
– Data dependences
– Name dependences
  - Antidependence
  - Output dependence
– Control dependences

Data Dependence
One instruction requires the result of another
– directly or indirectly
– through registers or memory
Their order must be maintained
Data dependences are a property of the program
Whether a dependence becomes a RAW hazard, and causes a stall, is a property of the pipeline

Name Dependence
Two instructions use the same register or memory location (i.e. the same name), but there is no data flow between them
Antidependence
– Instruction i reads, instruction j (after i) writes (WAR)
Output dependence
– Instruction i writes, instruction j (after i) writes again (WAW)
Removed by "register renaming"
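Renaming can be sketched as a table mapping each architectural register to a fresh physical register on every write (a toy model, not any particular machine's scheme): true RAW dependences survive, while WAR and WAW name clashes disappear.

```python
# Toy register renaming: every write allocates a fresh physical register
# (p1, p2, ...), so WAR and WAW clashes on a name disappear while true
# (RAW) dependences are preserved through the renamed producer.
def rename(program):
    table = {}     # architectural register -> current physical register
    counter = 0
    renamed = []
    for dest, srcs in program:
        new_srcs = tuple(table.get(r, r) for r in srcs)  # read current mapping
        counter += 1
        table[dest] = f"p{counter}"                      # fresh name per write
        renamed.append((table[dest], new_srcs))
    return renamed

# f8 carries an antidependence (WAR) and an output dependence (WAW):
program = [
    ("f6", ("f0", "f8")),    # reads f8
    ("f8", ("f10", "f14")),  # writes f8 (WAR with the read above)
    ("f8", ("f10", "f8")),   # writes f8 again (WAW), reads previous f8 (RAW)
]
```

After renaming, the two writes target distinct physical registers (so the WAW ordering constraint vanishes), the first read still sees the original f8 (so the WAR constraint vanishes), and the third instruction's source is rewritten to the second instruction's physical destination, preserving the true dependence.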

Avoiding Data Hazards Techniques for exploiting parallelism that preserve program order only where it affects results

Control Dependences
Determine the order of execution of instructions
S1 is control dependent on the branch condition c1:
if (c1) {
    S1;
}

Control Dependences
If instruction i is control dependent on a branch, it cannot be moved before the branch (its execution would no longer be controlled by the branch)
If instruction i is not control dependent on a branch, it cannot be moved after the branch (its execution would become controlled by the branch)

Control Dependences
Control dependences may be violated, provided we preserve:
– Exception behaviour
– Data flow