EENG449b/Savvides Lec 5.1 1/27/04 January 27, 2004 Prof. Andreas Savvides Spring 2004 EENG 449bG/CPSC 439bG Computer.

EENG449b/Savvides Lec 5.1 1/27/04
January 27, 2004
Prof. Andreas Savvides
Spring 2004
EENG 449bG/CPSC 439bG Computer Systems
Lecture 5
FP Pipelining & Dynamically Scheduled Pipelines and Overview of ARM Architecture Part I

Floating-Point Support in Pipelines
Floating-point operations take more than 1 or 2 cycles to complete
– Structural hazards
– Data hazards
Multiple functional units required
– Loads, stores and integer ALUs
– FP and integer multiplier
– FP adder that handles FP add, subtract and conversion
– FP and integer divider
Initiation interval: the number of cycles that must elapse between issuing two operations of a given type

Multiple FUs and Latencies

Functional unit                       Latency   Initiation interval
Integer ALU                           0         1
Data memory (integer and FP loads)    1         1
FP add                                3         1
FP multiply                           6         1
FP divide                             24        25
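The table can be read as a small lookup structure. The sketch below is illustrative only: the names `FunctionalUnit`, `UNITS`, `raw_stall_cycles` and `issue_cycles` are hypothetical, and it assumes the usual textbook reading of latency as the number of intervening cycles between an instruction that produces a result and one that consumes it (with forwarding).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionalUnit:
    latency: int              # intervening cycles between producer and dependent consumer
    initiation_interval: int  # minimum cycles between issuing two ops of this type


# Values copied from the slide's table; the dictionary keys are made-up labels.
UNITS = {
    "int_alu": FunctionalUnit(latency=0, initiation_interval=1),
    "load": FunctionalUnit(latency=1, initiation_interval=1),
    "fp_add": FunctionalUnit(latency=3, initiation_interval=1),
    "fp_mul": FunctionalUnit(latency=6, initiation_interval=1),
    "fp_div": FunctionalUnit(latency=24, initiation_interval=25),
}


def raw_stall_cycles(producer: str) -> int:
    """Stall cycles for a dependent instruction issued right after the producer
    (assumes the latency definition stated in the lead-in)."""
    return UNITS[producer].latency


def issue_cycles(unit: str, count: int, start: int = 0) -> list:
    """Earliest issue cycles for `count` back-to-back operations on one unit."""
    step = UNITS[unit].initiation_interval
    return [start + i * step for i in range(count)]
```

For example, `issue_cycles("fp_div", 3)` spaces three divides 25 cycles apart because the divider is not pipelined, while three FP adds can issue in consecutive cycles.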

Support for Multiple Outstanding Operations
Additional pipeline registers needed

Hazards in Longer Pipelines
1. The divide unit is not fully pipelined, so structural hazards can occur
2. Instructions have varying running times, so the number of register writes required in a cycle can be larger than 1
3. WAW hazards are possible, since instructions don't reach WB in order
4. Instructions can complete in a different order than they were issued, causing problems with exceptions
5. Because of the longer latency of operations, stalls for RAW hazards will be more frequent

FP Pipeline Hazards Example
Figure A.34
Simultaneous writeback: two ways to handle it
– Stall an instruction in the ID stage
– Stall the instruction when it tries to enter WB
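A simultaneous-writeback conflict can be found by computing the cycle in which each issued instruction would reach WB. The sketch below is a rough model, not the textbook's hardware: the names `EX_STAGES`, `writeback_cycle` and `writeback_conflicts` are hypothetical, the execute-stage counts (1 for integer and load, 4 for FP add, 7 for FP multiply) are assumptions, and every instruction is assumed to pass through MEM and WB after its execute stages and to share one register write port.

```python
from collections import defaultdict

# Assumed number of execute stages per instruction kind (illustrative values).
EX_STAGES = {"int": 1, "load": 1, "fp_add": 4, "fp_mul": 7}


def writeback_cycle(id_cycle: int, kind: str) -> int:
    """Cycle in which the instruction is in WB: its execute stages follow ID,
    then one MEM cycle, then WB."""
    return id_cycle + EX_STAGES[kind] + 2


def writeback_conflicts(issued):
    """Map each WB cycle claimed by more than one instruction to their names.

    `issued` is a list of (name, kind, id_cycle) tuples.
    """
    by_cycle = defaultdict(list)
    for name, kind, id_cycle in issued:
        by_cycle[writeback_cycle(id_cycle, kind)].append(name)
    return {cycle: names for cycle, names in by_cycle.items() if len(names) > 1}


# Hypothetical instruction mix: a multiply, an add and a load decoded 3 cycles apart.
sequence = [("MUL.D", "fp_mul", 2), ("ADD.D", "fp_add", 5), ("L.D", "load", 8)]
conflicts = writeback_conflicts(sequence)
```

With these assumed stage counts all three instructions land in WB in the same cycle, which is the situation the two stalling options on the slide are meant to resolve.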

Checks for Detecting Hazards
Three checks must be performed before a multicycle instruction can issue in the ID stage:
Check for structural hazards
– Wait until the required functional unit is not busy and a register write port is available when needed
Check for a RAW data hazard
– Wait until the source registers are not listed as pending destinations
Check for a WAW data hazard
– Determine whether an instruction that has already issued has the same destination register as this instruction. If so, stall the instruction issue in ID.
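The three checks can be sketched as a single function that either allows issue or names the reason for the stall. This is a minimal illustration under stated assumptions, not the actual interlock logic: `PipelineState`, `Instr` and `issue_check` are hypothetical names, and the state is assumed to track busy units, pending destination registers and reserved write-port cycles as plain sets.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineState:
    busy_units: set = field(default_factory=set)         # units that cannot accept a new op
    pending_dests: set = field(default_factory=set)      # destinations of issued, unfinished instructions
    write_port_cycles: set = field(default_factory=set)  # cycles in which the write port is reserved


@dataclass
class Instr:
    unit: str       # functional unit the instruction needs
    dest: str       # destination register
    sources: tuple  # source registers
    wb_cycle: int   # cycle in which it would write back if issued now


def issue_check(instr: Instr, state: PipelineState):
    """Return the reason to stall in ID, or None if the instruction may issue."""
    # Structural hazards: functional unit and register write port.
    if instr.unit in state.busy_units:
        return "structural: functional unit busy"
    if instr.wb_cycle in state.write_port_cycles:
        return "structural: write port taken"
    # RAW: a source register is still a pending destination.
    if any(src in state.pending_dests for src in instr.sources):
        return "RAW"
    # WAW: an already-issued instruction writes the same destination.
    if instr.dest in state.pending_dests:
        return "WAW"
    return None
```

The order of the checks only decides which reason is reported; any one of them is enough to hold the instruction in ID.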

MIPS R4000 Pipeline
Decompose the 5-stage pipeline into a deeper 8-stage pipeline (superpipeline)
– Achieve higher clock rates => better performance
Extra stages come from decomposing memory accesses
Longer pipelines increase the amount of forwarding and branch delays

Branch Delay Cycles
Branch outcome needs 3 cycles

Dynamically Scheduled Pipelines
Simple pipelines result in hazards that require stalling.
Static scheduling: compilers rearrange instructions to avoid stalls.
Dynamic scheduling: the processor executes instructions out of order to minimize stalls.
Dynamic scheduling requires splitting the ID stage into two stages:
– Issue: decode instructions, check for structural hazards
– Read operands: wait until there are no data hazards, then read operands
– Also need to know when each instruction begins and ends execution
Requires a lot more bookkeeping!
More when we discuss Tomasulo's algorithm in Chapter 3…

Scoreboarding
Scoreboarding: a technique that allows out-of-order execution when resources are available and there are no data dependencies. It originated in the CDC 6600 in the mid-1960s.
The scoreboard is fully responsible for instruction execution and hazard detection
– Requires changes in the number of functional units and the latency of operations
– Needs to keep track of the status of all instructions in execution

Scoreboarding II

More Hazards
WAR and WAW hazards are now possible!

    DIV.D F0, F2, F4
    ADD.D F10, F0, F8
    SUB.D F8, F8, F14
WAR! If SUB.D executes first

    DIV.D F0, F2, F4
    ADD.D F10, F0, F8
    SUB.D F10, F8, F14
WAW! If SUB.D executes first
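The two sequences above can be checked mechanically by comparing the registers of an earlier and a later instruction. The following is a small sketch with hypothetical names (`Instr`, `hazards`); it only classifies register dependences between a pair of instructions and does not model timing.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: tuple


def hazards(earlier: Instr, later: Instr) -> set:
    """Hazards that can arise if `later` is allowed to run ahead of `earlier`."""
    found = set()
    if earlier.dest in later.srcs:
        found.add("RAW")  # later reads what earlier writes
    if later.dest in earlier.srcs:
        found.add("WAR")  # later overwrites a register earlier still has to read
    if later.dest == earlier.dest:
        found.add("WAW")  # both write the same register
    return found


div = Instr("DIV.D", "F0", ("F2", "F4"))
add = Instr("ADD.D", "F10", ("F0", "F8"))
sub_war = Instr("SUB.D", "F8", ("F8", "F14"))   # third instruction of the first sequence
sub_waw = Instr("SUB.D", "F10", ("F8", "F14"))  # third instruction of the second sequence
```

Here `hazards(add, sub_war)` reports a WAR on F8 and `hazards(add, sub_waw)` reports a WAW on F10, while `hazards(div, add)` is the RAW on F0 that keeps ADD.D waiting behind the long divide in both sequences.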

Refer to Figures A.52 to A.54 for example scoreboard tables
Scoreboarding is limited by:
– Amount of parallelism among instructions
– The number of scoreboard entries
– The number and types of functional units
– Presence of antidependencies and output dependencies

Announcements
Example on page 44 of the textbook is wrong
– CPI for FPSQR not included in the computation of CPI…
– Everything after that is affected…
Midterm I, Thursday, Feb. 19
– Chapters 1, 2, Appendix A and microcontroller material from class
Readings for next class and project-related material are posted on the class website

ARM Architecture Part I

Where is ARM Today?

Not the case when you have loads and stores!

Microcontroller View

Price/Performance/Peripheral Tradeoffs
For many consumer electronics, cost is an issue
– ARM7TDMI cores have less HW and cost less
– With today's prices you can get an ARM7-based chip for < $5.00
Power Tradeoffs
– Power performance is given in Watts/MIPS, but
– Lifetime is a bandwidth vs. throughput issue
  » Bandwidth vs. throughput of battery life

ML675001/67Q5002/67Q5003 Features
– ARM7TDMI
– ROM-less (ML675001), 256KB MCP Flash (ML67Q5002), 512KB MCP Flash (ML67Q5003)
– 8KB Unified Cache
– 32KB RAM
– Interrupts, FIQ
– I2C (1-ch x master)
– DMA (2-ch)
– Timers (7 x 16-bit)
– WDT (16-bit)
– PWM (2 x 16-bit)
– UART (2-ch) / SIO (1-ch)
– GPIO (5 x 8-bit)
– ADC (4-ch x 10-bit)
– Up to 66MHz
– -40 ~ +85 °C
– Package: 144 LFBGA, 144 QFP

Next Time
– Power Metrics
– Dynamic Voltage Scaling
– Microcontroller Programming Cycle