Computer Science and Engineering. Copyright by Hesham El-Rewini. Advanced Computer Architecture, CSE 8383. January 24, 2008, Session 2.

Contents (Memory)
- Memory Hierarchy
- Cache Memory
- Placement Policies: Direct Mapping, Fully Associative, Set Associative
- Replacement Policies: FIFO, Random, Optimal, LRU, MRU
- Cache Write Policies

Memory Hierarchy
(Figure: CPU registers, cache, main memory, secondary storage. Moving down the hierarchy, latency and capacity increase while speed, bandwidth, and cost per bit decrease.)

Sequence of events
1. The processor makes a request for X.
2. X is sought in the cache.
3. If it is found: a hit (hit ratio h).
4. Otherwise: a miss (miss ratio m = 1 - h).
5. On a miss, X is sought in main memory.
6. The scheme generalizes to more levels.
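The hit and miss ratios determine the effective access time of a two-level hierarchy. A minimal sketch in Python; the latencies are illustrative assumptions, since the slides give no timing figures:

```python
# Effective (average) access time for a two-level hierarchy.
# The latency values used below are assumptions for illustration.
def effective_access_time(h, t_cache, t_memory):
    """h: hit ratio; t_cache, t_memory: access times in ns."""
    m = 1 - h                          # miss ratio, as defined on the slide
    return h * t_cache + m * t_memory  # hits served by cache, misses by memory

# Example: 95% hit ratio, 1 ns cache, 100 ns main memory -> about 5.95 ns.
print(round(effective_access_time(0.95, 1, 100), 2))
```

Even a small drop in the hit ratio raises the average sharply, which is why placement and replacement policies matter.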

Cache Memory
- The idea is to keep the information expected to be used most frequently in the cache.
- Locality of Reference: Temporal Locality, Spatial Locality
- Placement Policies
- Replacement Policies

Placement Policies
How to map memory blocks (lines) to cache block frames (line frames).
(Figure: memory blocks on one side, cache block frames on the other.)

Placement Policies
- Direct Mapping
- Fully Associative
- Set Associative

Direct Mapping
- Simplest scheme.
- A memory block is mapped to a fixed cache block frame (many-to-one mapping).
- j = i mod N, where
  - j: cache block frame number
  - i: memory block number
  - N: number of cache block frames

Address Format (Direct Mapping)
- Memory: M blocks; block size: B words; cache: N block frames.
- Address size: log2(M * B) bits.

  Tag: log2(M/N) bits (remaining bits) | Block frame: log2(N) bits | Word: log2(B) bits

Example (Direct Mapping)
- Memory: 4K blocks; block size: 16 words; address size: log2(4K * 16) = 16 bits.
- Cache: 128 block frames.

  Tag: 5 bits | Block frame: 7 bits | Word: 4 bits
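The field widths and the j = i mod N rule can be checked mechanically. A short Python sketch for this example (the sample address 0xABCD is an arbitrary choice):

```python
# Splitting a 16-bit address for the direct-mapped example:
# 4K memory blocks, 16-word blocks, 128 cache block frames.
B, N, M = 16, 128, 4096

word_bits  = B.bit_length() - 1          # log2(16)  = 4
frame_bits = N.bit_length() - 1          # log2(128) = 7
tag_bits   = (M // N).bit_length() - 1   # log2(32)  = 5

def split(addr):
    word  = addr & (B - 1)                       # lowest 4 bits
    frame = (addr >> word_bits) & (N - 1)        # next 7 bits
    tag   = addr >> (word_bits + frame_bits)     # remaining 5 bits
    return tag, frame, word

addr = 0xABCD                  # an arbitrary 16-bit address
block = addr >> word_bits      # memory block number i
tag, frame, word = split(addr)
assert frame == block % N      # the frame field is exactly j = i mod N
print(tag_bits, frame_bits, word_bits)   # 5 7 4
```

The assertion shows why direct mapping is cheap: the frame number is simply the low bits of the block number, so no search is needed.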

Example (cont.)
(Figure: the 4K memory blocks mapped onto the 128 cache block frames; each frame stores a 5-bit tag.)

Fully Associative
- Most flexible.
- A memory block is mapped to any available cache block frame (many-to-many mapping).
- Requires an associative search over all frames.

Address Format (Fully Associative)
- Memory: M blocks; block size: B words; cache: N block frames.
- Address size: log2(M * B) bits.

  Tag: log2(M) bits (remaining bits) | Word: log2(B) bits

Example (Fully Associative)
- Memory: 4K blocks; block size: 16 words; address size: log2(4K * 16) = 16 bits.
- Cache: 128 block frames.

  Tag: 12 bits | Word: 4 bits

Example (cont.)
(Figure: any memory block can go into any cache block frame; each frame stores a 12-bit tag.)

Set Associative
- A compromise between the other two.
- The cache is divided into a number of sets; each set holds a number of blocks.
- A memory block is mapped to any available cache block frame within a specific set.
- Associative search only within a set.

Address Format (Set Associative)
- Memory: M blocks; block size: B words; cache: N block frames.
- Number of sets: S = N / (number of blocks per set).
- Address size: log2(M * B) bits.

  Tag: log2(M/S) bits (remaining bits) | Set: log2(S) bits | Word: log2(B) bits

Example (Set Associative)
- Memory: 4K blocks; block size: 16 words; address size: log2(4K * 16) = 16 bits.
- Cache: 128 block frames; 4 blocks per set; number of sets = 32.

  Tag: 7 bits | Set: 5 bits | Word: 4 bits
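The same kind of address split applies here, with the frame field replaced by a set field. A Python sketch for this example (0xABCD is again an arbitrary sample address):

```python
# Set-associative split for the example: 4K memory blocks,
# 16-word blocks, 128 cache blocks, 4 blocks per set -> 32 sets.
B, S = 16, 32
word_bits = 4                            # log2(B)
set_bits  = 5                            # log2(S)
tag_bits  = 16 - set_bits - word_bits    # 7 remaining bits

def split(addr):
    word = addr & (B - 1)                    # lowest 4 bits
    s    = (addr >> word_bits) & (S - 1)     # next 5 bits: set index
    tag  = addr >> (word_bits + set_bits)    # remaining 7 bits
    return tag, s, word

print(tag_bits, set_bits, word_bits)   # 7 5 4
print(split(0xABCD))                   # (85, 28, 13)
```

Only the four frames of set 28 need to be searched associatively for tag 85, which is the compromise the slide describes.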

Example (cont.)
(Figure: the cache organized as 32 sets of 4 block frames; each frame stores a 7-bit tag.)

Comparison
- Simplicity
- Cost of associative search
- Cache utilization
- Replacement flexibility

Group Exercise
The instruction set for your architecture has 40-bit addresses, with each addressable item being a byte. You elect to design a four-way set-associative cache, with each of the four blocks in a set containing 64 bytes. Assume that you have 256 sets in the cache. Show the format of the address.

Group Exercise (cont.)
- Address size: 40 bits.
- Block size: 64 bytes; blocks per set: 4; number of sets: 256.
- Cache: 256 * 4 blocks.

  Tag: 26 bits | Set: 8 bits | Word: 6 bits

Group Exercise (cont.)
Consider the following sequence of addresses (all in hex):
0E1B01AA05, 0E1B01AA07, 0E1B2FE305, 0E1B4FFD8F, 0E1B01AA0E
In your cache, what will be the tags in the set(s) that contain these references at the end of the sequence? Assume that the cache is initially flushed (empty).

Group Exercise (cont.)
Splitting each address into tag (top 26 bits), set (next 8 bits), and word (low 6 bits):

  0E1B01AA05 -> set A8, tag 0386C06
  0E1B01AA07 -> set A8, tag 0386C06 (same block as the first reference)
  0E1B2FE305 -> set 8C, tag 0386CBF
  0E1B4FFD8F -> set F6, tag 0386D3F
  0E1B01AA0E -> set A8, tag 0386C06 (same block as the first reference)

At the end of the sequence, set A8 holds tag 0386C06, set 8C holds tag 0386CBF, and set F6 holds tag 0386D3F.
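The set indices and tags for the exercise can be checked mechanically with a few bit operations:

```python
# Checking the group exercise: 40-bit addresses, 64-byte blocks,
# 256 sets -> word 6 bits, set 8 bits, tag 26 bits.
addrs = [0x0E1B01AA05, 0x0E1B01AA07, 0x0E1B2FE305,
         0x0E1B4FFD8F, 0x0E1B01AA0E]

for a in addrs:
    s   = (a >> 6) & 0xFF     # set index: bits 6..13
    tag = a >> 14             # tag: top 26 bits
    print(f"{a:010X} -> set {s:02X}, tag {tag:07X}")
```

Three of the five references fall in the same block (they differ only in the 6-bit word offset), so only three distinct (set, tag) pairs end up cached.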

Replacement Techniques
- FIFO
- LRU
- MRU
- Random
- Optimal

Group Exercise
Suppose that your cache can hold only three blocks and the block requests are as follows:
7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1
Show the contents of the cache if the replacement policy is (a) LRU, (b) FIFO, (c) Optimal.

Group Exercise (cont.)
Working the sequence through a 3-block cache gives:
- FIFO: 15 misses
- MRU: 16 misses
- LRU: 12 misses
- Optimal: 9 misses
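The miss counts for the exercise can be reproduced with a small simulator. A sketch in Python; the list order doubles as insertion order for FIFO and as a recency order for LRU and MRU:

```python
# Simulating replacement policies on the exercise's request
# sequence with a 3-block cache; returns the number of misses.
refs = [7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1]
CAP = 3

def simulate(policy):
    cache, misses = [], 0
    for i, r in enumerate(refs):
        if r in cache:
            if policy in ("LRU", "MRU"):   # track recency: move to the end
                cache.remove(r); cache.append(r)
            continue
        misses += 1
        if len(cache) == CAP:
            if policy == "OPT":            # evict block used farthest in the future
                future = refs[i+1:]
                victim = max(cache, key=lambda b:
                             future.index(b) if b in future else len(future))
            elif policy == "MRU":          # evict the most recently used block
                victim = cache[-1]
            else:                          # FIFO and LRU: oldest is at the front
                victim = cache[0]
            cache.remove(victim)
        cache.append(r)
    return misses

for p in ("FIFO", "LRU", "MRU", "OPT"):
    print(p, simulate(p))    # FIFO 15, LRU 12, MRU 16, OPT 9
```

Optimal needs future knowledge, so it is not implementable in hardware; it serves as the lower bound the practical policies are measured against.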

Cache Write Policies
- Cache hit: Write Through or Write Back
- Cache miss: Write-allocate or Write-no-allocate

Read Policy on a Cache Miss
- The missed block is brought into the cache while the required word is forwarded to the CPU immediately; or
- The missed block is stored entirely in the cache first, and the required word is then forwarded to the CPU.

Pentium IV Two-Level Cache
(Figure: Processor -> L1 cache -> L2 cache -> Main Memory)

Pentium IV L1 Cache
- Organization: set-associative
- Block size: 64 bytes
- L1 cache size: 8 KB
- Blocks per set: four
- CPU addressing: byte addressable

CPU and Memory Interface
(Figure: the CPU's MAR drives n address lines and its MDR drives b data lines to main memory, together with an R/W control line; the n-bit address selects one of 2^n locations, each b bits wide.)

Pipelining

Contents
- Introduction
- Linear Pipelines
- Nonlinear Pipelines

Basic Idea
- Assembly line.
- Divide the execution of a task among a number of stages.
- A task is divided into subtasks to be executed in sequence.
- Performance improves compared to sequential execution.

Pipeline
(Figure: a stream of tasks enters the pipeline; each task is divided into subtasks 1, 2, ..., n, executed by stages 1, 2, ..., n.)

5 Tasks on a 4-Stage Pipeline
(Figure: a space-time diagram showing tasks 1 through 5 progressing through the four stages, one stage per time unit.)

Speedup
A stream of m tasks passes through an n-stage pipeline; each stage takes time t.
- T(seq) = n * m * t
- T(pipe) = n * t + (m - 1) * t = (n + m - 1) * t
- Speedup = T(seq) / T(pipe) = (n * m) / (n + m - 1)
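The speedup formula is easy to evaluate; a short Python sketch, applied to the earlier 5-tasks-on-4-stages example:

```python
# Pipeline timing: n stages, m tasks, stage time t.
def t_seq(n, m, t):
    return n * m * t               # each task uses all n stages in turn

def t_pipe(n, m, t):
    return (n + m - 1) * t         # fill the pipe, then one result per cycle

def speedup(n, m):
    return (n * m) / (n + m - 1)   # t cancels out

# 4 stages, 5 tasks, unit stage time:
print(t_seq(4, 5, 1), t_pipe(4, 5, 1), speedup(4, 5))  # 20 8 2.5
```

As m grows large, the speedup approaches n, the number of stages: the startup cost of filling the pipeline is amortized over the task stream.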

Linear Pipeline
- Processing stages are linearly connected and perform fixed functions.
- Synchronous pipeline: clocked latches between stage i and stage i+1; equal delays in all stages.
- Asynchronous pipeline: handshaking between stages.

Latches
(Figure: stages S1, S2, S3 separated by latches L1, L2.)
Equal delays define the clock period; the slowest stage determines the delay.

Reservation Table

  Time ->  1   2   3   4
  S1       X
  S2           X
  S3               X
  S4                   X

5 Tasks on 4 Stages

  Time ->  1   2   3   4   5   6   7   8
  S1       X   X   X   X   X
  S2           X   X   X   X   X
  S3               X   X   X   X   X
  S4                   X   X   X   X   X

Nonlinear Pipelines
- Variable functions
- Feed-forward connections
- Feedback connections

3 Stages and 2 Functions
(Figure: stages S1, S2, S3 with feed-forward and feedback connections; the pipeline can evaluate two functions, X and Y.)

Reservation Tables for X and Y
(Figure: one reservation table per function. Function X occupies S1 three times, S2 twice, and S3 three times; function Y occupies S1 twice, S2 once, and S3 three times, reflecting the feedback paths.)

Linear Instruction Pipelines
Assume the following instruction execution phases:
- Fetch (F)
- Decode (D)
- Operand Fetch (O)
- Execute (E)
- Write Results (W)

Pipeline Instruction Execution

  Time ->  1    2    3    4    5    6    7
  F        I1   I2   I3
  D             I1   I2   I3
  O                  I1   I2   I3
  E                       I1   I2   I3
  W                            I1   I2   I3

Dependencies
- Data dependency (an operand is not ready yet)
- Instruction dependency (branching)
Will that cause a problem?

Data Dependency
I1: Add R1, R2, R3
I2: Sub R4, R1, R5
(Figure: I2 reaches its operand-fetch stage before I1 has written R1 back.)

Solutions
- Stall
- Forwarding
- Write and read in one cycle
- ...

Instruction Dependency
I1: Branch
I2: ...
(Figure: until I1's branch is resolved, the pipeline does not know which instruction to fetch as I2.)

Solutions
- Stall
- Predict branch taken
- Predict branch not taken
- ...

Floating-Point Multiplication
- Inputs: (Mantissa 1, Exponent 1), (Mantissa 2, Exponent 2)
- Add the two exponents to obtain the output exponent.
- Multiply the two mantissas.
- Normalize the mantissa and adjust the exponent.
- Round the product mantissa to a single-length mantissa; this may require adjusting the exponent again.
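The steps above can be sketched with integer arithmetic. This is a simplified model, not a real IEEE implementation: the mantissa is an unsigned fixed-point value 1.f with F fraction bits, and signs, overflow, and special values are ignored:

```python
# A sketch of the floating-point multiply steps. A normalized
# mantissa 1.f is stored as an integer in [2**F, 2**(F+1)).
F = 8   # fraction bits (illustrative; real formats use 23 or 52)

def fp_mul(m1, e1, m2, e2):
    e = e1 + e2                      # 1. add the two exponents
    p = m1 * m2                      # 2. multiply mantissas (2F fraction bits)
    if p >= 2 ** (2 * F + 1):        # 3. normalize: product in [2, 4) -> shift
        p >>= 1
        e += 1
    p = (p + (1 << (F - 1))) >> F    # 4. round back to F fraction bits
    if p >= 2 ** (F + 1):            # 5. rounding may overflow: renormalize
        p >>= 1
        e += 1
    return p, e

# 1.5 * 1.5 = 2.25 -> mantissa 1.125, exponent +1
m, e = fp_mul(3 << (F - 1), 0, 3 << (F - 1), 0)
print(m / 2**F, e)   # 1.125 1
```

Each numbered step is one candidate pipeline stage, which is exactly how the multiply pipeline on the next slide is organized.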

Linear Pipeline for Floating-Point Multiplication
(Stages: Add Exponents -> Multiply Mantissas (partial products and accumulator) -> Normalize -> Round -> Renormalize)

Linear Pipeline for Floating-Point Addition
(Stages: Subtract Exponents -> Partial Shift (align mantissas) -> Add Mantissas -> Find Leading 1 -> Partial Shift (normalize) -> Round -> Renormalize)

Combined Adder and Multiplier
(Figure: the adder and multiplier pipelines merged into one, with stages labeled A through H: exponent subtract/add, partial shift, add mantissa, partial products, find leading 1, round, renormalize, and partial shift; the two functions share stages.)