Computer Architecture Key Points

Computer Architecture Key Points
John Morris, Electrical & Computer Engineering / Computer Science, The University of Auckland
(Photo: Iolanthe II drifts off Waiheke Island)

Memory Bottleneck
State-of-the-art processor: f = 3 GHz, tclock = 330 ps, 1-2 instructions per cycle, ~25% of instructions reference memory.
Memory response needed: roughly one reference every 4 instructions x 330 ps ≈ 1.3 ns.
Bulk semiconductor RAM: 100 ns+ for a 'random' access!
The processor will spend most of its time waiting for memory!

Memory Bottleneck
Assume:
– Clock speed f = 3 GHz, so cycle time tcyc = 1/f = 330 ps
– 32-bit = 4-byte machine words, so internal bandwidth = (bytes per word) x f = 4 x f = 12 GB/s
– 64-bit PCI bus at fbus = 32 MHz, i.e. only 8 x 32 MHz = 256 MB/s
Clearly a bottleneck! With ~25% load/store instructions the memory path would need to sustain 3 GB/s.
(In the original diagram, arrow width roughly indicates data bandwidth.)
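
A quick check of these numbers in C (a minimal sketch; the clock and bus rates are the slide's assumptions):

    #include <stdio.h>

    int main(void) {
        double f_cpu = 3e9;     /* core clock: 3 GHz                  */
        int word_bytes = 4;     /* 32-bit machine word                */
        double f_bus = 32e6;    /* PCI bus clock assumed by the slide */
        int bus_bytes = 8;      /* 64-bit PCI bus                     */

        /* one word per cycle internally vs one transfer per bus cycle */
        printf("internal: %.1f GB/s\n", word_bytes * f_cpu / 1e9); /* 12.0  */
        printf("PCI bus:  %.3f GB/s\n", bus_bytes * f_bus / 1e9);  /* 0.256 */
        return 0;
    }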

Cache
A small, fast memory: the second level of the memory hierarchy.
– Typically ~50 KB (1998), ~2-cycle access time, on the same die as the processor
– "Off-chip" cache also possible: a custom cache chip closely coupled to the processor
– Built from fast static RAM (SRAM) rather than slower dynamic RAM
– Several levels possible
A cache holds the most recently used memory locations "closer" to the processor, where closer = closer in time.

Memory Bottleneck
Same assumptions: f = 3 GHz, tcyc = 330 ps, 4-byte words, internal bandwidth 12 GB/s, 64-bit PCI bus at 32 MHz.
A cache provides a small, very fast memory 'close' to the processor.
'Close' = close in time, i.e. with a high-bandwidth connection.
(Arrow width roughly indicates data bandwidth.)

Memory Bottleneck
Same assumptions: f = 3 GHz, tcyc = 330 ps, 4-byte words, internal bandwidth 12 GB/s, 64-bit PCI bus at 32 MHz.
Small → fast. Larger → slower. Very large → very slow.
(Arrow width roughly indicates data bandwidth.)

Memory hierarchy & performance
The usual metric is machine cycle time, tcyc = 1/f.
Visible to the programmer:
– Registers: < 1 cycle latency (respond in the same cycle)
Transparent to the programmer:
– Level 1 (L1) cache: ~2 cycle latency
– L2 cache: 5-6 cycles
– L3 cache: about 10 cycles
– Main memory: 100+ cycles for a random access
– Disc: > 1 ms, i.e. > 10^6 cycles
Effective memory access time: teff = Σi fi ti, where fi = fraction of accesses that hit at level i and ti = access time at level i.
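
A worked example of this formula (illustrative hit fractions and latencies, not measurements from any particular machine):

    #include <stdio.h>

    /* t_eff = sum over levels i of f_i * t_i */
    double t_eff(const double f[], const double t[], int levels) {
        double sum = 0.0;
        for (int i = 0; i < levels; i++)
            sum += f[i] * t[i];
        return sum;
    }

    int main(void) {
        double f[] = { 0.90, 0.08, 0.02 };  /* accesses satisfied at L1, L2, memory */
        double t[] = { 2.0, 6.0, 100.0 };   /* access times in cycles               */
        printf("t_eff = %.2f cycles\n", t_eff(f, t, 3));  /* 4.28 cycles */
        return 0;
    }

Note how the 2% of accesses that reach main memory contribute almost half of the effective access time.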

Cache - organisation
Direct-mapped cache: each word in the cache has a tag.
Assume:
– cache size: 2^k words
– machine words: p bits
– byte-addressed memory, so m = log2(p/8) low-order bits are not used to address words (m = 2 for 32-bit machines)
Address format (p bits total): tag (p-k-m bits) | cache address (k bits) | byte address (m bits)
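
A sketch of this address split in C, assuming p = 32, a 2^10-word cache (k = 10) and 4-byte words (m = 2):

    #include <stdint.h>
    #include <stdio.h>

    #define K 10                    /* cache holds 2^K words */
    #define M 2                     /* log2(bytes per word)  */

    int main(void) {
        uint32_t addr = 0x12345678;

        uint32_t byte  =  addr       & ((1u << M) - 1);  /* byte within word */
        uint32_t index = (addr >> M) & ((1u << K) - 1);  /* cache address    */
        uint32_t tag   =  addr >> (K + M);               /* remaining bits   */

        /* prints tag=0x12345 index=0x19e byte=0 */
        printf("tag=%#x index=%#x byte=%u\n", tag, index, byte);
        return 0;
    }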

Cache - organisation
(Diagram: a direct-mapped cache of 2^k lines, each line holding a tag and a data word. The k-bit cache address field of the memory address selects a line; the line's stored tag (p-k-m bits) is compared with the tag field of the address to generate Hit?)

Cache - Conflicts
Two addresses separated by 2^(k+m) bytes will hit the same cache location: any addresses in which the k cache-address bits are the same map to the same cache line.

Cache - Conflicts
What happens when a word is modified in the cache?
Write-back cache:
– Only writes data back to memory when needed
– A miss on a modified line costs two memory accesses: write the modified word back, then read the new word
Write-through cache:
– Every write is queued as a low-priority write to main memory
– The processor is delayed by reads only; the memory write occurs in parallel with other work
– Instruction and necessary data fetches take priority

Cache - Write-through or write-back?
Write-through seems a good idea, but ... multiple writes to the same location waste memory bus bandwidth; typical programs do better with write-back caches, however.
Often you can easily predict which policy will be best, and some processors (e.g. PowerPC) allow you to classify memory regions as write-back or write-through.

Cache - more bits
Cache lines need some status bits in addition to the tag bits:
– Valid (V): all set to false on power-up; set to true as words are loaded into the cache
– Dirty (M, modified): needed by a write-back cache. A write-through cache always queues the write, so lines are never 'dirty'.
Cache line layout: Tag (p-k-m bits) | V (1 bit) | M (1 bit) | Data (p bits)
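
Putting the pieces together, a line with these bits might look like the following C sketch (illustrative only; hardware implements this in logic, and queue_write is a hypothetical stand-in for the write-through queue):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint32_t tag;        /* p-k-m tag bits                              */
        bool     valid;      /* V: line holds data                          */
        bool     dirty;      /* M: modified since loaded (write-back only)  */
        uint32_t data;       /* one p-bit word                              */
    } cache_line_t;

    /* Hypothetical low-priority write queue used by a write-through cache */
    static void queue_write(uint32_t addr, uint32_t data) {
        printf("queued write of %u to %#x\n", data, addr);
    }

    /* Write-back: update the line and mark it dirty; memory is written
       only when the line is eventually replaced */
    void store_wb(cache_line_t *l, uint32_t data) {
        l->data  = data;
        l->dirty = true;
    }

    /* Write-through: update the line and queue the memory write at once,
       so the line is never 'dirty' */
    void store_wt(cache_line_t *l, uint32_t addr, uint32_t data) {
        l->data = data;
        queue_write(addr, data);
    }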

Cache – Improving Performance
Conflicts (addresses 2^(k+m) bytes apart) degrade cache performance: they lower the hit rate.
Murphy's Law operates: addresses are never random! Some locations 'thrash' in the cache, continually replaced and restored.
Put another way: ideal cache performance depends on uniform access to all parts of memory, which never happens in real programs!
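
A sketch of a worst-case access pattern (assumes, for illustration, a 64 KB direct-mapped data cache, so 64 KB plays the role of 2^(k+m)):

    #include <stdlib.h>

    #define CACHE_BYTES (64 * 1024)            /* assumed cache size */
    #define N (CACHE_BYTES / sizeof(int))

    int main(void) {
        /* a[i] and b[i] are exactly CACHE_BYTES apart, so they map
           to the same line of the direct-mapped cache */
        int *buf = malloc(2 * CACHE_BYTES);
        int *a = buf, *b = buf + N;
        long sum = 0;

        for (int pass = 0; pass < 1000; pass++)
            for (size_t i = 0; i < N; i++)
                sum += a[i] + b[i];   /* each access evicts the line the
                                         other array just loaded: thrashing */
        free(buf);
        return (int)(sum & 1);
    }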

Cache - Fully Associative
All tags are compared at the same time; words can use any cache line.

Cache - Fully Associative
Each tag is compared at the same time; any match → hit.
Avoids 'unnecessary' flushing.
Replacement policy: Least Recently Used (LRU)
– Needs extra status bits: cycles since last accessed
Hardware cost is high:
– Extra comparators
– Wider tags: p-m bits vs p-k-m bits

Cache - Set Associative
2-way set associative: each set holds two words, so only two comparators are needed.

Cache - Set Associative
n-way set associative caches:
– n can be small: 2, 4, 8 give the best performance for reasonable hardware cost
– Used by most high-performance processors
Replacement policy: LRU choice from the n candidates.
A reasonable LRU approximation needs only 1 or 2 bits per line: set on access, cleared / decremented by a timer; choose a cleared word for replacement (see the sketch below).
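
A software sketch of an n-way lookup with this 1-bit LRU approximation (illustrative structure; real hardware compares all ways in parallel, and the timer that clears the ref bits is not shown):

    #include <stdbool.h>
    #include <stdint.h>

    #define WAYS 4
    #define SETS 256                    /* log2(SETS) = 8 index bits */

    typedef struct {
        bool     valid, ref;            /* ref: 1-bit LRU approximation */
        uint32_t tag;
        uint32_t data;
    } way_t;

    static way_t cache[SETS][WAYS];

    bool lookup(uint32_t addr, uint32_t *data) {
        uint32_t set = (addr >> 2) & (SETS - 1);
        uint32_t tag =  addr >> (2 + 8);

        for (int w = 0; w < WAYS; w++) {        /* parallel in hardware */
            way_t *l = &cache[set][w];
            if (l->valid && l->tag == tag) {
                l->ref = true;                  /* mark recently used */
                *data = l->data;
                return true;
            }
        }

        /* Miss: prefer a victim whose ref bit has been cleared */
        int victim = 0;
        for (int w = 0; w < WAYS; w++)
            if (!cache[set][w].ref) { victim = w; break; }
        (void)victim;   /* ... fetch the line from memory into this way ... */
        return false;
    }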

Cache - Locality of Reference
Temporal locality: the same location will be referenced again soon.
– Access the same data again; program loops access the same instructions again
– The caches described so far exploit temporal locality
Spatial locality: nearby locations will be referenced soon.
– The next element of an array; the next instruction of a program
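
The classic demonstration of spatial locality is array traversal order (a sketch; the actual speed difference depends on line length and cache size):

    #define N 1024
    static double a[N][N];

    /* Row-major order: consecutive accesses touch adjacent addresses,
       so each fetched cache line is fully used (good spatial locality) */
    double sum_by_rows(void) {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += a[i][j];
        return s;
    }

    /* Column order: consecutive accesses are N*8 bytes apart, so almost
       every access starts a new cache line (poor spatial locality) */
    double sum_by_cols(void) {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += a[i][j];
        return s;
    }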

Cache - Line Length
To exploit spatial locality, use very long cache lines: fetching one datum fetches its neighbours too.
Example: PowerPC 601 (Motorola/Apple/IBM), first of the single-chip POWER processors:
– 64 sets, 8-way set associative, 32 bytes per line
– 32 bytes (8 instructions) fetched into the instruction buffer in one cycle
– Total: 64 x 8 x 32 = 16 KB

Cache - Separate I- and D-caches
Unified cache: instructions and data in the same cache.
Two caches, one for instructions and one for data, increase total bandwidth.
Example: MIPS R10000
– 32 KB instruction cache; the instruction cache is pre-decoded! (32 → 36 bits per instruction)
– 32 KB data cache: 8-word (64-byte) lines, 2-way set associative, 256 sets
– Replacement policy?

COMPSYS 304 Computer Architecture: Memory Management Units
(Photo: Reefed down - heading for Great Barrier Island)

Memory Management Unit
Virtual address space: each user has a "private" address space.
(Diagram: User D's address space)

Virtual Addresses
Mappings between user space and physical memory are created by the OS.

Memory Management Unit (MMU)
Responsible for VIRTUAL → PHYSICAL address mapping.
Sits between the CPU and the cache: the cache operates on physical addresses (mostly; there is some research on virtually-addressed caches).
(Diagram: CPU → (VA) → MMU → (PA) → Cache → (PA) → Main Memory, for both D and I streams)

MMU - operation
(Diagram: the q-bit virtual address is split into a (q-k)-bit virtual page number and a k-bit offset within the page; the page number indexes the page table, whose entry supplies the physical page address, which is combined with the offset.)
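
A software sketch of this translation (hypothetical single-level page table; a real MMU does this in hardware and real systems use multi-level or inverted tables):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdlib.h>

    #define PAGE_BITS 12                  /* k = 12: 4 KB pages       */
    #define NPAGES (1u << 20)             /* 2^(q-k) pages for q = 32 */

    typedef struct {
        bool     valid;     /* set: in memory; cleared: swapped out          */
        uint32_t frame;     /* physical page number, or disc block if !valid */
    } pte_t;

    static pte_t page_table[NPAGES];

    /* Hypothetical fault handler: the OS would fetch the page from disc */
    static void page_fault(uint32_t vpn) { (void)vpn; exit(1); }

    uint32_t translate(uint32_t va) {
        uint32_t vpn    = va >> PAGE_BITS;               /* virtual page number */
        uint32_t offset = va & ((1u << PAGE_BITS) - 1);  /* offset within page  */

        if (!page_table[vpn].valid)
            page_fault(vpn);

        return (page_table[vpn].frame << PAGE_BITS) | offset;
    }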

MMU - Virtual memory space
Page Table Entries can also point to disc blocks, via the valid bit:
– Set: page is in memory; the address field is the physical page address
– Cleared: page is "swapped out"; the address field is a disc block address
The MMU hardware generates a page fault when a swapped-out page is requested.
This allows the virtual memory space to be larger than physical memory: only the "working set" is in physical memory; the remainder is on the paging disc.

Page Fault
(Diagram: address translation when the page table entry's valid bit is cleared, raising a page fault.)

MMU – Page faults
Very expensive! Note the gap in access times: main memory ~100+ ns, disc ~1+ ms, a factor of 10^4 slower!!
– May require write-back of an old (but modified) page
– May require reading Page Table Entries from disc!
A good way to make a system thrash!

MMU – Access control
Provides additional protection to the programmer: pages can be marked read-only or execute-only.
This can prevent wayward programs from corrupting their own program code or vital data.
The protection is in hardware! The MMU raises an exception if an illegal access is attempted; the OS traps the exception and processes it.
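
On a Unix-like system this hardware protection is visible through the mprotect() call; a minimal sketch (Linux/BSD, error checking omitted):

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        size_t page = (size_t)sysconf(_SC_PAGESIZE);

        /* Get one page of read/write memory */
        char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        strcpy(p, "vital data");

        /* Ask the OS (and hence the MMU) to mark the page read-only */
        mprotect(p, page, PROT_READ);

        printf("%s\n", p);   /* reads still succeed                */
        p[0] = 'X';          /* write now raises SIGSEGV: an MMU   */
                             /* exception, trapped by the OS       */
        return 0;
    }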

MMU – Inverted page tables; Sharing
Inverted page tables: a scheme which saves memory for page tables.
– One PTE per page of physical memory
– A hash function maps virtual page numbers to entries; collisions are probable, so lookup is possibly slower
Sharing: map virtual pages for several users to the same physical page.
– Good for sharing program code; data too (read/write control provided by the OS)
– Saves physical memory and reduces pressure on main memory
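
A sketch of an inverted-table lookup (hypothetical structure and hash; real designs such as the PowerPC hashed page table differ in detail):

    #include <stdint.h>

    #define NFRAMES 4096       /* one entry per physical page frame */
    #define NHASH   1024

    typedef struct {
        uint32_t vpn;          /* virtual page number                */
        uint16_t owner;        /* address-space (process) identifier */
        int      next;         /* next frame in hash chain, or -1    */
    } ipte_t;

    static ipte_t frames[NFRAMES];   /* the inverted table                        */
    static int    anchor[NHASH];     /* bucket -> first frame; initialise to -1   */

    static unsigned hash(uint16_t owner, uint32_t vpn) {
        return (vpn ^ ((uint32_t)owner << 7)) % NHASH;   /* simple hash */
    }

    /* Returns the physical frame number, or -1 on a miss
       (the OS would then consult its own full tables) */
    int ipt_lookup(uint16_t owner, uint32_t vpn) {
        for (int f = anchor[hash(owner, vpn)]; f != -1; f = frames[f].next)
            if (frames[f].vpn == vpn && frames[f].owner == owner)
                return f;            /* frame number is the table index */
        return -1;
    }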

MMU – TLB
Translation Look-aside Buffer (TLB): a cache for page table entries.
– Enables the MMU to translate VA → PA in time!
– Can be quite small: 50-100 entries
– Often fully associative; the small size avoids one 'cost' of an FA cache, since only 50-100 comparators are needed
TLB coverage: the amount of memory covered by the TLB entries, i.e. the size of a program for which VA → PA translation will be fast.
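
For example, assuming 4 KB pages and a 64-entry TLB (illustrative numbers):

    #include <stdio.h>

    int main(void) {
        int entries   = 64;          /* assumed TLB size   */
        int page_size = 4096;        /* assumed 4 KB pages */
        /* coverage = entries x page size */
        printf("TLB coverage = %d KB\n", entries * page_size / 1024);  /* 256 KB */
        return 0;
    }

A program whose working set exceeds this coverage will take frequent TLB misses even when its data fits in the caches.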

Memory Hierarchy - Operation