Computer Architecture Lec 1: Introduction Dr. Eng. Amr T. Abdel-Hamid CSEN 601 Spring 2011 Computer Architecture Text book slides: Computer Architecture:

Presentation transcript:

Computer Architecture, Lec 1: Introduction. Dr. Eng. Amr T. Abdel-Hamid. CSEN 601, Spring 2011. Textbook slides: Computer Architecture: A Quantitative Approach, 4th Edition, John L. Hennessy & David A. Patterson, with modifications.

Dr. Amr Talaat Elect 707 Computer Architecture

CPU History in a Flash
- Intel 4004 (1971): 4-bit processor, 2312 transistors, 0.4 MHz, 10 micron PMOS, 11 mm² chip
- RISC II (1983): 32-bit, 5-stage pipeline, 40,760 transistors, 3 MHz, 3 micron NMOS, 60 mm² chip
- Processor is the new transistor? 125 mm² chip, micron CMOS = 2312 RISC II + FPU + Icache + Dcache
  - RISC II shrinks to ~0.02 mm² at 65 nm
  - Caches via DRAM or 1-transistor SRAM

Instruction Set Architecture: Critical Interface
[Figure: the instruction set as the interface between software and hardware]
- Properties of a good abstraction:
  - Lasts through many generations (portability)
  - Used in many different ways (generality)
  - Provides convenient functionality to higher levels
  - Permits an efficient implementation at lower levels

ISA vs. Computer Architecture
- Old definition of computer architecture = instruction set design
  - Other aspects of computer design called implementation
  - Insinuates implementation is uninteresting or less challenging
- Our view is: computer architecture >> ISA
- Architect's job is much more than instruction set design; technical hurdles today are more challenging than those in instruction set design

Computer Architecture is Design and Analysis
Architecture is an iterative process:
- Searching the space of possible designs
- At all levels of computer systems
[Figure: creativity produces good, mediocre, and bad ideas, which are sorted by cost/performance analysis]

Administrivia

Course Focus
Understanding the design techniques, machine structures, technology factors, and evaluation methods that will determine the form of computers in the 21st century.
[Figure: Computer Architecture (organization, hardware/software boundary, interface design (ISA)) surrounded by Technology, Programming Languages, Operating Systems, History, Applications, Measurement & Evaluation, Parallelism, and Compilers]

Why Study Computer Architecture?
- Culture of anticipating and exploiting advances in technology
- Careful, quantitative comparisons
  - Define, quantify, and summarize relative performance
  - Define and quantify relative cost
  - Define and quantify dependability
  - Define and quantify power
- Culture of well-defined interfaces that are carefully implemented and thoroughly checked
- Quantitative Principles of Design
  1. Take Advantage of Parallelism
  2. Principle of Locality
  3. Focus on the Common Case
  4. Amdahl's Law
  5. The Processor Performance Equation

1) Taking Advantage of Parallelism
- Increasing throughput of a server computer via multiple processors or multiple disks
- Detailed HW design (DSD course shortly)
  - Carry-lookahead adders use parallelism to speed up computing sums from linear to logarithmic in the number of bits per operand
  - Multiple memory banks are searched in parallel in set-associative caches
- Pipelining: overlap instruction execution to reduce the total time to complete an instruction sequence
  - Not every instruction depends on its immediate predecessor, so executing instructions completely or partially in parallel is possible
  - Classic 5-stage pipeline: 1) Instruction Fetch (Ifetch), 2) Register Read (Reg), 3) Execute (ALU), 4) Data Memory Access (Dmem), 5) Register Write (Reg)

2) The Principle of Locality
- The Principle of Locality: programs access a relatively small portion of the address space at any instant of time
- Two different types of locality:
  - Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
  - Spatial locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access)
- For the last 30 years, HW has relied on locality for memory performance
[Figure: processor (P), cache ($), and memory (MEM)]

Levels of the Memory Hierarchy

Level             Capacity          Access time             Cost
CPU registers     100s Bytes        300-500 ps
L1 and L2 cache   10s-100s KBytes   ~1 ns to ~10 ns         $1000s/GByte
Main memory       GBytes            80-200 ns               ~$100/GByte
Disk              10s TBytes        10 ms (10,000,000 ns)   ~$1/GByte
Tape              infinite          sec-min                 ~$1/GByte

Staging / transfer unit between levels:

Levels                   Unit              Managed by       Size
Registers <-> L1 cache   Instr. operands   prog./compiler   1-8 bytes
L1 cache <-> L2 cache    Blocks            cache cntl
L2 cache <-> Memory      Blocks            cache cntl
Memory <-> Disk          Pages             OS               4K-8K bytes
Disk <-> Tape            Files             user/operator    Mbytes

Upper levels are faster; lower levels are larger.

3) Focus on the Common Case
- Common sense guides computer design
  - Since it's engineering, common sense is valuable
- In making a design trade-off, favor the frequent case over the infrequent case
  - E.g., the instruction fetch and decode unit is used more frequently than the multiplier, so optimize it first
  - E.g., if a database server has 50 disks per processor, storage dependability dominates system dependability, so optimize it first
- The frequent case is often simpler and can be done faster than the infrequent case
  - E.g., overflow is rare when adding 2 numbers, so improve performance by optimizing the more common case of no overflow
  - This may slow down overflow, but overall performance is improved by optimizing for the normal case
- What is the frequent case, and how much is performance improved by making that case faster? => Amdahl's Law

4) Amdahl's Law
Best you could ever hope to do: Speedup_maximum = 1 / (1 - Fraction_enhanced)
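The "best you could ever hope to do" bound is the limiting case of the usual statement of Amdahl's Law. The symbol names below follow the textbook's convention and are an assumption about what the slide showed:

```latex
\[
\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times
  \left[ (1 - \text{Fraction}_{\text{enhanced}})
       + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right]
\]
\[
\text{Speedup}_{\text{overall}}
  = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}}
  = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}})
           + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}
\]
\[
\text{Speedup}_{\text{maximum}}
  = \lim_{\text{Speedup}_{\text{enhanced}} \to \infty} \text{Speedup}_{\text{overall}}
  = \frac{1}{1 - \text{Fraction}_{\text{enhanced}}}
\]
```

The limit shows why the unenhanced fraction caps the achievable gain no matter how fast the enhanced part becomes.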

Amdahl's Law example
- New CPU 10X faster
- I/O bound server, so 60% of time is spent waiting for I/O
Speedup_overall = 1 / (0.6 + 0.4 / 10) = 1 / 0.64 = 1.56
Apparently, it's human nature to be attracted by 10X faster, vs. keeping in perspective that it's just 1.6X faster.
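The arithmetic behind this example can be checked with a few lines of Python. This is a minimal sketch; the function name `amdahl_speedup` is made up for illustration, and the 40% enhanced fraction follows from the slide's "60% time waiting for I/O".

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only a fraction of execution time is sped up."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# 60% of the time is I/O wait, so only 40% benefits from the 10X faster CPU.
overall = amdahl_speedup(fraction_enhanced=0.4, speedup_enhanced=10.0)
print(f"{overall:.2f}")  # 1.56
```

Even an infinitely fast CPU would be capped at 1 / 0.6, about 1.67X, for this workload.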

5) Processor Performance Equation

CPU time = Seconds / Program = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)

The three factors are instruction count, CPI, and cycle time. What affects each factor:

                Inst Count   CPI   Clock Rate
Program             X
Compiler            X        (X)
Inst. Set           X         X
Organization                  X        X
Technology                             X
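As a quick illustration of the equation, the sketch below multiplies the three factors. The workload numbers (one billion instructions, CPI of 1.5, 2 GHz clock) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def cpu_time_seconds(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instructions x cycles/instruction x seconds/cycle."""
    seconds_per_cycle = 1.0 / clock_rate_hz
    return instruction_count * cpi * seconds_per_cycle


# Hypothetical program: 1e9 instructions, CPI = 1.5, 2 GHz clock.
t = cpu_time_seconds(1_000_000_000, 1.5, 2_000_000_000)
print(f"{t:.2f} s")  # 0.75 s
```

Halving CPI or doubling the clock rate would each halve this time, which is why the table tracks which design layer influences which factor.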

5 Steps of MIPS Datapath (Figure A.2, Page A-8)
[Figure: unpipelined MIPS datapath with five steps: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. Components shown: Next PC adder (+4), instruction memory, register file (RS1, RS2, RD), sign extend (Imm), ALU with Zero? test, data memory, LMD, MUXes, WB data]

5 Steps of MIPS Datapath (Figure A.3, Page A-9)
[Figure: pipelined MIPS datapath with the same five steps, separated by the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB]

5 Steps of MIPS Datapath (Figure A.3, Page A-9)
[Figure: pipelined MIPS datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB]
Data stationary control: local decode for each instruction phase / pipeline stage

Visualizing Pipelining (Figure A.2, Page A-8)
[Figure: pipeline diagram of instruction order vs. time in clock cycles (Cycle 1 to Cycle 7); each instruction passes through Ifetch, Reg, ALU, DMem, Reg and starts one cycle after the previous instruction]
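The staggered diagram can be reproduced in text form. The sketch below assumes an ideal pipeline with no stalls and one stage per clock cycle; the helper names are invented for illustration, and the unpipelined comparison assumes each instruction runs all five steps back to back.

```python
STAGES = ["Ifetch", "Reg", "ALU", "DMem", "Reg"]  # stage names as on the slide


def pipelined_cycles(n_instr: int, n_stages: int = len(STAGES)) -> int:
    """Cycles to finish n_instr instructions in an ideal pipeline (no stalls)."""
    return 0 if n_instr == 0 else n_stages + n_instr - 1


def unpipelined_cycles(n_instr: int, n_stages: int = len(STAGES)) -> int:
    """Cycles if each instruction completes all stages before the next starts."""
    return n_instr * n_stages


def diagram(n_instr: int) -> list[str]:
    """One text row per instruction; '.' marks cycles where it is not in the pipe."""
    total = pipelined_cycles(n_instr)
    rows = []
    for i in range(n_instr):
        cells = ["."] * total
        for s, name in enumerate(STAGES):
            cells[i + s] = name  # instruction i enters the pipe in cycle i + 1
        rows.append(" ".join(f"{c:<6}" for c in cells).rstrip())
    return rows


for row in diagram(4):
    print(row)
print(pipelined_cycles(4), unpipelined_cycles(4))  # 8 20
```

Each individual instruction still takes five cycles; the gain comes from throughput, with one instruction completing per cycle once the pipe is full.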

Pipelining is not quite that easy!
- Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle
  - Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)
  - Data hazards: instruction depends on the result of a prior instruction still in the pipeline (missing sock)
  - Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)

One Memory Port / Structural Hazards (Figure A.4, Page A-14)
[Figure: pipeline diagram of instruction order vs. time in clock cycles (Cycle 1 to Cycle 7) for Load, Instr 1, Instr 2, Instr 3, Instr 4; the Load's DMem access and a later instruction's Ifetch need the single memory port in the same cycle]

One Memory Port / Structural Hazards (Similar to Figure A.5, Page A-15)
[Figure: pipeline diagram of instruction order vs. time in clock cycles (Cycle 1 to Cycle 7) for Load, Instr 1, Instr 2, Stall, Instr 3; a bubble is inserted so that Instr 3 is fetched one cycle later]
How do you “bubble” the pipe?

Speed Up Equation for Pipelining
For simple RISC pipeline, CPI = 1:
Speedup = Pipeline Depth / (1 + Pipeline stall CPI) x (Cycle Time unpipelined / Cycle Time pipelined)
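The CPI = 1 form is a special case of the general pipeline speedup relation. The LaTeX below is a reconstruction in the textbook's usual notation, offered as an assumption about the slide's missing equations; it agrees with the SpeedUp expressions used in the dual-port example.

```latex
\[
\text{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Average stall cycles per instruction}
\]
\[
\text{Speedup} =
  \frac{\text{Ideal CPI} \times \text{Pipeline depth}}
       {\text{Ideal CPI} + \text{Pipeline stall CPI}}
  \times
  \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}
\]
\[
\text{Ideal CPI} = 1 \;\Rightarrow\;
\text{Speedup} =
  \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}}
  \times
  \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}
\]
```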

Example: Dual-port vs. Single-port
- Machine A: dual-ported memory (“Harvard Architecture”)
- Machine B: single-ported memory, but its pipelined implementation has a 1.05 times faster clock rate
- Ideal CPI = 1 for both
- Loads are 40% of instructions executed

SpeedUp_A = Pipeline Depth / (1 + 0) x (clock_unpipe / clock_pipe) = Pipeline Depth
SpeedUp_B = Pipeline Depth / (1 + 0.4 x 1) x (clock_unpipe / (clock_unpipe / 1.05)) = (Pipeline Depth / 1.4) x 1.05 = 0.75 x Pipeline Depth
SpeedUp_A / SpeedUp_B = Pipeline Depth / (0.75 x Pipeline Depth) = 1.33

Machine A is 1.33 times faster.
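The comparison can be re-derived numerically. In this sketch the pipeline depth of 5 is an assumed value (it cancels in the ratio), `clock_ratio` stands for clock_unpipe / clock_pipe as cycle times, and the function name is hypothetical.

```python
def pipeline_speedup(depth: int, stall_cpi: float, clock_ratio: float = 1.0) -> float:
    """Speedup over the unpipelined machine, assuming ideal CPI = 1.

    clock_ratio is (unpipelined cycle time) / (pipelined cycle time).
    """
    return depth / (1.0 + stall_cpi) * clock_ratio


DEPTH = 5  # assumed depth; any value gives the same A/B ratio

# Machine A: dual-ported memory, so loads cause no structural stalls.
speedup_a = pipeline_speedup(DEPTH, stall_cpi=0.0)

# Machine B: 40% loads, each stalling 1 cycle, but a 1.05x faster clock.
speedup_b = pipeline_speedup(DEPTH, stall_cpi=0.4 * 1, clock_ratio=1.05)

print(round(speedup_a / speedup_b, 2))  # 1.33
```

The 5% clock advantage of Machine B does not make up for the 40% increase in CPI caused by the single memory port.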

Data Hazard on R1 (Figure A.6, Page A-17)
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
[Figure: pipeline diagram of instruction order vs. time in clock cycles, with stages IF, ID/RF, EX, MEM, WB; the instructions after the add read r1 before or while the add writes it back]

Three Generic Data Hazards
- Read After Write (RAW): Instr J tries to read operand before Instr I writes it
  I: add r1,r2,r3
  J: sub r4,r1,r3
- Caused by a “dependence” (in compiler nomenclature). This hazard results from an actual need for communication.

Three Generic Data Hazards
- Write After Read (WAR): Instr J writes operand before Instr I reads it
  I: sub r4,r1,r3
  J: add r1,r2,r3
  K: mul r6,r1,r7
- Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”.
- Can't happen in MIPS 5-stage pipeline because:
  - All instructions take 5 stages, and
  - Reads are always in stage 2, and
  - Writes are always in stage 5

Three Generic Data Hazards
- Write After Write (WAW): Instr J writes operand before Instr I writes it
  I: sub r1,r4,r3
  J: add r1,r2,r3
  K: mul r6,r1,r7
- Called an “output dependence” by compiler writers. This also results from the reuse of name “r1”.
- Can't happen in MIPS 5-stage pipeline because:
  - All instructions take 5 stages, and
  - Writes are always in stage 5
- Will see WAR and WAW in more complicated pipes
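The three definitions reduce to simple register-name comparisons between an earlier instruction I and a later instruction J. The sketch below is illustrative only: the `Instr` type and `data_dependences` function are invented here, and the check reports dependences by name, not whether a given pipeline actually turns them into hazards.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str
    dest: str                # register written
    srcs: tuple[str, ...]    # registers read


def data_dependences(i: Instr, j: Instr) -> set[str]:
    """Dependence classes between earlier instruction i and later instruction j."""
    found = set()
    if i.dest in j.srcs:
        found.add("RAW")  # j reads what i writes (true dependence)
    if j.dest in i.srcs:
        found.add("WAR")  # j overwrites what i still has to read (anti-dependence)
    if i.dest == j.dest:
        found.add("WAW")  # both write the same register (output dependence)
    return found


# The I/J pairs from the three slides.
raw = data_dependences(Instr("add", "r1", ("r2", "r3")), Instr("sub", "r4", ("r1", "r3")))
war = data_dependences(Instr("sub", "r4", ("r1", "r3")), Instr("add", "r1", ("r2", "r3")))
waw = data_dependences(Instr("sub", "r1", ("r4", "r3")), Instr("add", "r1", ("r2", "r3")))
print(raw, war, waw)  # {'RAW'} {'WAR'} {'WAW'}
```

In the 5-stage MIPS pipe only the RAW case needs hardware attention, for the reasons listed on the WAR and WAW slides.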

Forwarding to Avoid Data Hazard (Figure A.7, Page A-19)
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
[Figure: pipeline diagram of instruction order vs. time in clock cycles; the add's result is forwarded from later pipeline stages to the ALU inputs of the following instructions]

HW Change for Forwarding (Figure A.23, Page A-37)
[Figure: forwarding paths from the EX/MEM and MEM/WR pipeline registers back through muxes to the ALU inputs; other labels: ID/EX, Registers, NextPC, Immediate, Data Memory]
What circuit detects and resolves this hazard?
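One common answer to the question above is a forwarding unit that compares the destination registers held in the later pipeline registers with the source registers of the instruction entering the ALU. The sketch below is an assumption about such a unit, not the circuit from the figure; the `PipeReg` fields and the `forward_select` function are hypothetical names, and register 0 is treated as hardwired to zero as in MIPS.

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    """Hypothetical subset of a pipeline register: write-enable and destination."""
    reg_write: bool = False
    rd: int = 0


def forward_select(src_reg: int, ex_mem: PipeReg, mem_wb: PipeReg) -> str:
    """Choose the source for one ALU operand of the instruction in EX."""
    # The most recent producer (one instruction ahead) takes priority.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src_reg:
        return "EX/MEM"
    # Otherwise use the producer two instructions ahead, if it matches.
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src_reg:
        return "MEM/WB"
    return "REGFILE"


# add r1,r2,r3 followed by sub r4,r1,r3: the add sits in EX/MEM while sub is in EX.
print(forward_select(1, PipeReg(True, 1), PipeReg()))          # EX/MEM
# One cycle later, and r6,r1,r7 is in EX: sub is in EX/MEM, the add in MEM/WB.
print(forward_select(1, PipeReg(True, 4), PipeReg(True, 1)))   # MEM/WB
```

The returned label corresponds to a mux select at the ALU input; a real design would evaluate this once per source operand every cycle.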

Forwarding to Avoid LW-SW Data Hazard (Figure A.8, Page A-20)
add r1,r2,r3
lw r4, 0(r1)
sw r4,12(r1)
or r8,r6,r9
xor r10,r9,r11
[Figure: pipeline diagram of instruction order vs. time in clock cycles, showing forwarding of r1 to the lw and of the loaded r4 to the sw]

Data Hazard Even with Forwarding (Figure A.9, Page A-21)
lw r1, 0(r2)
sub r4,r1,r6
and r6,r1,r7
or r8,r1,r9
[Figure: pipeline diagram of instruction order vs. time in clock cycles; the loaded value is available only after DMem, too late for the ALU stage of the immediately following sub]

Data Hazard Even with Forwarding (Similar to Figure A.10, Page A-21)
lw r1, 0(r2)
sub r4,r1,r6
and r6,r1,r7
or r8,r1,r9
[Figure: pipeline diagram of instruction order vs. time in clock cycles; a bubble is inserted into sub, and, and or so that each executes one cycle later]
How is this detected?
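A typical way to detect this case is a load-use interlock: stall when the instruction in ID/EX is a load whose destination matches a source of the instruction being decoded. The sketch below illustrates that check under this assumption; the function and parameter names are invented, and register 0 is treated as hardwired to zero.

```python
def must_stall(id_ex_is_load: bool, id_ex_dest: int, if_id_srcs: tuple[int, ...]) -> bool:
    """True if the instruction in decode must wait one cycle for a load result."""
    return bool(id_ex_is_load and id_ex_dest != 0 and id_ex_dest in if_id_srcs)


# lw r1,0(r2) is in ID/EX while sub r4,r1,r6 is being decoded: insert a bubble.
print(must_stall(True, 1, (1, 6)))   # True
# An ALU producer such as add r1,r2,r3 can be covered by forwarding instead.
print(must_stall(False, 1, (1, 3)))  # False
```

After the one-cycle bubble, the loaded value can be forwarded from the MEM/WB register to the ALU input, so no further stall is needed.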

Control Hazard on Branches: Three Stage Stall
10: beq r1,r3,36
14: and r2,r3,r5
18: or r6,r1,r7
22: add r8,r1,r9
36: xor r10,r1,r11
[Figure: pipeline diagram; the three instructions after the beq are fetched before the branch outcome is known]
What do you do with the 3 instructions in between? How do you do it? Where is the “commit”?

Branch Stall Impact
- If CPI = 1, 30% branch, stall 3 cycles => new CPI = 1.9!
- Two-part solution:
  - Determine branch taken or not sooner, AND
  - Compute taken branch address earlier
- MIPS branch tests if register = 0 or ≠ 0
- MIPS solution:
  - Move Zero test to ID/RF stage
  - Adder to calculate new PC in ID/RF stage
  - 1 clock cycle penalty for branch versus 3
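The CPI figure follows from adding the average stall cycles per instruction to the base CPI. The sketch below reproduces the slide's 1.9 and, assuming the same 30% branch frequency, shows the effect of the 1-cycle penalty; the function name is hypothetical.

```python
def cpi_with_branch_stalls(base_cpi: float, branch_fraction: float, penalty_cycles: int) -> float:
    """Effective CPI when every branch costs a fixed number of stall cycles."""
    return base_cpi + branch_fraction * penalty_cycles


print(round(cpi_with_branch_stalls(1.0, 0.30, 3), 2))  # 1.9
print(round(cpi_with_branch_stalls(1.0, 0.30, 1), 2))  # 1.3
```

Moving the zero test and the target adder into ID/RF therefore cuts the branch overhead from 0.9 to 0.3 cycles per instruction under these assumptions.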

Pipelined MIPS Datapath (Figure A.24, Page A-38)
[Figure: pipelined MIPS datapath with the Zero? test and the branch-target adder moved into the Instr. Decode / Reg. Fetch stage; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]
Interplay of instruction set design and cycle time.