Memory and Caches I
The Hardware/Software Interface, CSE351 Winter 2013, University of Washington

Roadmap
The course's running example, at each layer of the stack:

C:
    car *c = malloc(sizeof(car));
    c->miles = 100;
    c->gals = 17;
    float mpg = get_mpg(c);
    free(c);

Java:
    Car c = new Car();
    c.setMiles(100);
    c.setGals(17);
    float mpg = c.getMPG();

Assembly language:
    get_mpg:
        pushq %rbp
        movq %rsp, %rbp
        ...
        popq %rbp
        ret

Below that: machine code, OS, computer system.

Topics: Data & addressing; Integers & floats; Machine code & C; x86 assembly programming; Procedures & stacks; Arrays & structs; Memory & caches; Processes; Virtual memory; Memory allocation; Java vs. C

Themes of CSE 351
Interfaces and abstractions
 So far: data type abstractions in C; x86 instruction set architecture (interface to hardware)
 Today: abstractions of memory
 Soon: process and virtual memory abstractions
Representation
 Integers, floats, addresses, arrays, structs
Translation
 Understand the assembly code that will be generated from C code
Control flow
 Procedures and stacks; buffer overflows

Making memory accesses fast!
 Cache basics
 Principle of locality
 Memory hierarchies
 Cache organization
 Program optimizations that consider caches

How does execution time grow with SIZE?

    int array[SIZE];
    int A = 0;
    for (int i = 0; i < 200000; ++i) {
        for (int j = 0; j < SIZE; ++j) {
            A += array[j];
        }
    }

[Plot: TIME vs. SIZE]
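For readers who want to reproduce this, below is a self-contained, runnable version of the experiment. The timing harness, the size sweep, and the constant-total-work scaling are assumptions added for illustration; the original slide fixes the outer loop count instead.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void) {
        /* Sweep SIZE from 4 K ints to 16 M ints, keeping total work
           roughly constant so every run takes similar wall time. */
        for (int size = 1 << 12; size <= 1 << 24; size <<= 2) {
            int *array = malloc((size_t)size * sizeof(int));
            if (!array) return 1;
            for (int j = 0; j < size; j++) array[j] = 1;

            long iters = (1L << 28) / size;
            volatile int A = 0;              /* volatile: keep the loop live */
            clock_t start = clock();
            for (long i = 0; i < iters; i++)
                for (int j = 0; j < size; j++)
                    A += array[j];
            double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

            printf("SIZE = %8d ints: %6.2f ns/element\n",
                   size, secs * 1e9 / ((double)iters * size));
            free(array);
        }
        return 0;
    }

Once SIZE stops fitting in a cache level, the ns/element figure jumps, which is the shape of the "Actual Data" plot on the next slide.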

Actual Data
[Plot: measured Time vs. SIZE]

Problem: Processor-Memory Bottleneck
[Diagram: CPU and registers connected to main memory over a bus]
Processor performance doubled about every 18 months; bus bandwidth evolved much more slowly.
 Core 2 Duo CPU: can process at least 256 bytes/cycle
 Core 2 Duo memory bus: bandwidth 2 bytes/cycle, latency 100 cycles
Problem: lots of waiting on memory.

Problem: Processor-Memory Bottleneck
[Same diagram, with a cache inserted between the CPU and main memory]
 Core 2 Duo CPU: can process at least 256 bytes/cycle
 Core 2 Duo memory bus: bandwidth 2 bytes/cycle, latency 100 cycles
Solution: caches.

Cache
English definition: a hidden storage space for provisions, weapons, and/or treasures.
CSE definition: computer memory with short access time used for the storage of frequently or recently used instructions or data (i-cache and d-cache).
 More generally: used to optimize data transfers between system elements with different characteristics (network interface cache, I/O cache, etc.)

General Cache Mechanics
[Diagram: a small cache sitting above a larger memory]
 Memory: larger, slower, cheaper; viewed as partitioned into "blocks"
 Cache: smaller, faster, more expensive; holds copies of a subset of the blocks
 Data is copied between memory and cache in block-sized transfer units

General Cache Concepts: Hit
 Request: data in block b is needed
 Block b is in the cache: hit!

General Cache Concepts: Miss
 Request: data in block b is needed
 Block b is not in the cache: miss!
 Block b is fetched from memory and stored in the cache
 Placement policy: determines where b goes
 Replacement policy: determines which block gets evicted (the victim)
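The placement and replacement policies are easiest to see in code. Below is a minimal sketch of the simplest scheme, a direct-mapped cache, in which each block can live in exactly one slot; the struct and the parameters (BLOCK_SIZE, NUM_SLOTS) are made up for the example, not a description of real hardware.

    #include <stdbool.h>
    #include <stdint.h>

    #define BLOCK_SIZE 64    /* bytes per block */
    #define NUM_SLOTS  128   /* 128 slots * 64 B = an 8 KB toy cache */

    struct slot {
        bool     valid;      /* does this slot hold a block at all? */
        uint64_t tag;        /* identifies which block it holds */
    };

    static struct slot cache[NUM_SLOTS];

    /* Returns true on a hit. On a miss, "fetches" the block by
       installing its tag, evicting whatever was there (the victim).
       Placement: block b can only go in slot b % NUM_SLOTS, so the
       replacement policy is trivial: evict that slot's occupant. */
    bool cache_access(uint64_t addr) {
        uint64_t block = addr / BLOCK_SIZE;   /* block number */
        uint64_t slot  = block % NUM_SLOTS;   /* placement policy */
        uint64_t tag   = block / NUM_SLOTS;

        if (cache[slot].valid && cache[slot].tag == tag)
            return true;                      /* hit */

        cache[slot].valid = true;             /* miss: evict + install */
        cache[slot].tag   = tag;
        return false;
    }

Other organizations relax the placement policy so a block can go in any of several slots, which makes replacement a real decision (e.g., evict the least recently used block).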

Cost of Cache Misses
Huge difference between a hit and a miss
 Could be 100x, if just L1 and main memory
Would you believe 99% hits is twice as good as 97%?
 Consider: cache hit time of 1 cycle, miss penalty of 100 cycles
 Average access time:
   97% hits: 1 cycle + 0.03 * 100 cycles = 4 cycles
   99% hits: 1 cycle + 0.01 * 100 cycles = 2 cycles
This is why "miss rate" is used instead of "hit rate"
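The arithmetic above is an instance of the standard average memory access time formula: AMAT = hit time + miss rate * miss penalty. A quick check of the slide's numbers:

    #include <stdio.h>

    /* AMAT = hit time + miss rate * miss penalty. The miss rate, not
       the hit rate, is what matters: going from 97% to 99% hits cuts
       the miss rate from 3% to 1%, halving the average. */
    double amat(double hit_time, double miss_rate, double miss_penalty) {
        return hit_time + miss_rate * miss_penalty;
    }

    int main(void) {
        printf("97%% hits: %.0f cycles\n", amat(1.0, 0.03, 100.0)); /* 4 */
        printf("99%% hits: %.0f cycles\n", amat(1.0, 0.01, 100.0)); /* 2 */
        return 0;
    }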

Why Caches Work
Locality: programs tend to use data and instructions with addresses near or equal to those they have used recently.
 Temporal locality: recently referenced items are likely to be referenced again in the near future. Why is this important?
 Spatial locality: items with nearby addresses tend to be referenced close together in time. How do caches take advantage of this?

Example: Locality?

    int sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;

Data:
 Temporal: sum referenced in each iteration
 Spatial: array a[] accessed in stride-1 pattern
Instructions:
 Temporal: cycle through loop repeatedly
 Spatial: reference instructions in sequence; number of instructions is small
Being able to assess the locality of code is a crucial skill for a programmer.

Locality Example #1

    int sum_array_rows(int a[M][N]) {
        int i, j, sum = 0;
        for (i = 0; i < M; i++)
            for (j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }

3x4 array, row-major layout in memory:
    a[0][0] a[0][1] a[0][2] a[0][3]
    a[1][0] a[1][1] a[1][2] a[1][3]
    a[2][0] a[2][1] a[2][2] a[2][3]

Access order (stride-1):
    1: a[0][0]  2: a[0][1]  3: a[0][2]  4: a[0][3]
    5: a[1][0]  6: a[1][1]  7: a[1][2]  8: a[1][3]
    9: a[2][0] 10: a[2][1] 11: a[2][2] 12: a[2][3]

Locality Example #2

    int sum_array_cols(int a[M][N]) {
        int i, j, sum = 0;
        for (j = 0; j < N; j++)
            for (i = 0; i < M; i++)
                sum += a[i][j];
        return sum;
    }

Same 3x4 array; access order (stride-N):
    1: a[0][0]  2: a[1][0]  3: a[2][0]
    4: a[0][1]  5: a[1][1]  6: a[2][1]
    7: a[0][2]  8: a[1][2]  9: a[2][2]
   10: a[0][3] 11: a[1][3] 12: a[2][3]
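These two functions compute the same sum, so any timing difference is pure locality. A small harness (not from the slides; the 4096x4096 dimensions are arbitrary) makes the stride-1 vs. stride-N gap measurable:

    #include <stdio.h>
    #include <time.h>

    #define M 4096
    #define N 4096

    static int a[M][N];   /* 64 MB, zero-initialized */

    long sum_array_rows(void) {    /* stride-1: good spatial locality */
        long sum = 0;
        for (int i = 0; i < M; i++)
            for (int j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }

    long sum_array_cols(void) {    /* stride-N: one element used per
                                      block fetched, until it is evicted */
        long sum = 0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < M; i++)
                sum += a[i][j];
        return sum;
    }

    static double time_it(long (*f)(void)) {
        clock_t start = clock();
        volatile long s = f();     /* volatile: keep the call live */
        (void)s;
        return (double)(clock() - start) / CLOCKS_PER_SEC;
    }

    int main(void) {
        printf("row-major:    %.3f s\n", time_it(sum_array_rows));
        printf("column-major: %.3f s\n", time_it(sum_array_cols));
        return 0;
    }

On typical hardware the column-major version is several times slower, even though it executes the same instructions on the same data.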

Memory Hierarchies
Some fundamental and enduring properties of hardware and software systems:
 Faster storage technologies almost always cost more per byte and have lower capacity.
 The gaps between memory technology speeds are widening.
   True for: registers ↔ cache, cache ↔ DRAM, DRAM ↔ disk, etc.
 Well-written programs tend to exhibit good locality.
These properties complement each other beautifully. They suggest an approach for organizing memory and storage systems known as a memory hierarchy.

An Example Memory Hierarchy
From smaller, faster, and costlier per byte (top) to larger, slower, and cheaper per byte (bottom):
 registers: CPU registers hold words retrieved from the L1 cache
 on-chip L1 cache (SRAM): holds cache lines retrieved from the L2 cache
 off-chip L2 cache (SRAM): holds cache lines retrieved from main memory
 main memory (DRAM): holds disk blocks retrieved from local disks
 local secondary storage (local disks): holds files retrieved from disks on remote network servers
 remote secondary storage (distributed file systems, web servers)

Memory Hierarchies
Fundamental idea of a memory hierarchy:
 For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.
Why do memory hierarchies work?
 Because of locality, programs tend to access the data at level k more often than they access the data at level k+1.
 Thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit.
Big idea: the memory hierarchy creates a large pool of storage that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.

Cache Performance Metrics
Miss rate
 Fraction of memory references not found in cache (misses / accesses) = 1 - hit rate
 Typical numbers: 3-10% for L1; can be quite small (e.g., < 1%) for L2, depending on size, etc.
Hit time
 Time to deliver a line in the cache to the processor
 Includes time to determine whether the line is in the cache
 Typical hit times: 1-2 clock cycles for L1; 5-20 clock cycles for L2
Miss penalty
 Additional time required because of a miss
 Typically 50-200 cycles for L2 (trend: increasing!)

Examples of Caching in the Hierarchy

    Cache Type       What is Cached?        Where is it Cached?    Latency (cycles)   Managed By
    Registers        4/8-byte words         CPU core               0                  Compiler
    TLB              Address translations   On-chip TLB            0                  Hardware
    L1 cache         64-byte blocks         On-chip L1             1                  Hardware
    L2 cache         64-byte blocks         Off-chip L2            10                 Hardware
    Virtual memory   4-KB pages             Main memory            100                Hardware + OS
    Buffer cache     Parts of files         Main memory            100                OS
    Network cache    Parts of files         Local disk             10,000,000         File system client
    Browser cache    Web pages              Local disk             10,000,000         Web browser
    Web cache        Web pages              Remote server disks    1,000,000,000      Web server

Memory Hierarchy: Core 2 Duo
(Not drawn to scale. L1/L2 cache: 64 B blocks.)

    Level                 Size      Throughput (to level above)   Latency
    L1 I-cache, D-cache   32 KB     16 B/cycle                    3 cycles
    L2 unified cache      ~4 MB     8 B/cycle                     14 cycles
    Main memory           ~4 GB     2 B/cycle                     100 cycles
    Disk                  ~500 GB   1 B/30 cycles                 millions of cycles
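To get a feel for these numbers, here is a back-of-the-envelope sketch of how many cycles it takes to stream 1 MiB through each level, using only the slide's throughput figures; the program itself is illustration, not from the slides.

    #include <stdio.h>

    int main(void) {
        const double bytes = 1 << 20;                 /* 1 MiB */
        const char  *level[] = { "L1 cache", "L2 cache",
                                 "Main memory", "Disk" };
        /* Bytes per cycle, from the slide; disk is 1 B per 30 cycles. */
        const double bpc[]   = { 16.0, 8.0, 2.0, 1.0 / 30.0 };

        for (int i = 0; i < 4; i++)
            printf("%-12s %12.0f cycles per MiB\n", level[i], bytes / bpc[i]);
        return 0;
    }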
