Review of Memory Hierarchy & Storage
CSCE430/830 Computer Architecture
Lecturer: Prof. Hong Jiang
Fall 2008
Portions of these slides are derived from: Dave Patterson © UCB

The Principle of Locality
The Principle of Locality:
– Programs access a relatively small portion of the address space at any instant of time.
Two different types of locality:
– Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse).
– Spatial locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access).
For the last 15 years, hardware has relied on locality for speed. Locality is a property of programs that is exploited in machine design.

Memory Hierarchy - the Big Picture
Problem: memory is too slow and too small.
Solution: a memory hierarchy.
[Figure: the processor (control + datapath) with its registers, backed by an L1 on-chip cache, an L2 off-chip cache, main memory (DRAM), and secondary storage (disk); access time grows from about a nanosecond at the registers to roughly 5,000,000 ns (5 ms) at the disk, while capacity grows from under 1 KB of registers through <16 MB of L2 and <16 GB of DRAM to >100 GB of disk.]

Fundamental Cache Questions
Q1: Where can a block be placed in the upper level? (Block placement)
Q2: How is a block found if it is in the upper level? (Block identification)
Q3: Which block should be replaced on a miss? (Block replacement)
Q4: What happens on a write? (Write strategy)

Q1: Where can a block be placed in the upper level?
Example: memory block 12 placed in an 8-block cache:
– Fully associative: anywhere in the cache.
– Direct mapped: only in cache block (12 mod 8) = 4.
– 2-way set associative: anywhere in set (12 mod 4) = 0.
– Set-associative mapping: set = (block number) modulo (number of sets).
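As a small illustration of the mapping above (not part of the original slides), a sketch in C that computes where memory block 12 lands under each placement policy:

```c
#include <stdio.h>

int main(void) {
    unsigned block = 12;          /* memory block number from the slide's example */
    unsigned cache_blocks = 8;    /* total blocks in the cache                    */

    /* Direct mapped: one block per set, so slot = block mod #blocks. */
    unsigned direct_mapped_slot = block % cache_blocks;

    /* 2-way set associative: 8 blocks / 2 ways = 4 sets. */
    unsigned num_sets_2way = cache_blocks / 2;
    unsigned set_2way = block % num_sets_2way;

    printf("direct mapped : block %u -> cache block %u\n", block, direct_mapped_slot);
    printf("2-way assoc   : block %u -> set %u (either way within the set)\n", block, set_2way);
    /* Fully associative: the block may go in any of the 8 cache blocks. */
    return 0;
}
```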

Q2: How is a block found if it is in the upper level?
Tag on each block:
– No need to check the index or block offset.
Increasing associativity shrinks the index and expands the tag.
Address breakdown: | Tag | Index | Block Offset |  (tag and index together form the block address).
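A short sketch (my own illustration) of how an address splits into tag, index, and block offset, assuming a hypothetical cache with 64-byte blocks and 128 sets:

```c
#include <stdio.h>
#include <stdint.h>

/* Assumed example parameters (not from the slides): 64-byte blocks, 128 sets. */
#define BLOCK_BYTES 64u    /* => 6 offset bits */
#define NUM_SETS    128u   /* => 7 index bits  */

int main(void) {
    uint32_t addr = 0x12345678u;   /* arbitrary example address */

    uint32_t offset_bits = 6;      /* log2(BLOCK_BYTES) */
    uint32_t index_bits  = 7;      /* log2(NUM_SETS)    */

    uint32_t offset = addr & (BLOCK_BYTES - 1);
    uint32_t index  = (addr >> offset_bits) & (NUM_SETS - 1);
    uint32_t tag    = addr >> (offset_bits + index_bits);

    printf("addr=0x%08x  tag=0x%x  index=%u  offset=%u\n", addr, tag, index, offset);
    return 0;
}
```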

Q3: Which block should be replaced on a miss?
Easy for direct mapped (there is only one candidate).
Set associative or fully associative:
– Random
– LRU (Least Recently Used)
Miss rates by cache size and replacement policy:

              2-way              4-way              8-way
  Size        LRU     Random     LRU     Random     LRU     Random
  16 KB       5.2%    5.7%       4.7%    5.3%       4.4%    5.0%
  64 KB       1.9%    2.0%       1.5%    1.7%       1.4%    1.5%
  256 KB      1.15%   1.17%      1.13%   1.13%      1.12%   1.12%
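A minimal sketch (not from the slides) of LRU bookkeeping for one set of a 4-way cache, using access timestamps; real hardware typically uses approximate (pseudo-)LRU bits, but the policy is the same:

```c
#include <stdio.h>
#include <stdint.h>

#define WAYS 4

typedef struct {
    int      valid;
    uint32_t tag;
    uint64_t last_used;   /* timestamp of the most recent access */
} Line;

/* Return the way to victimize: an invalid way if one exists, else the LRU way. */
static int pick_victim(const Line set[WAYS]) {
    int victim = 0;
    for (int w = 0; w < WAYS; w++) {
        if (!set[w].valid) return w;   /* free slot: no replacement needed */
        if (set[w].last_used < set[victim].last_used) victim = w;
    }
    return victim;
}

int main(void) {
    Line set[WAYS] = {
        {1, 0xA, 10}, {1, 0xB, 42}, {1, 0xC, 7}, {1, 0xD, 30}
    };
    printf("LRU victim is way %d\n", pick_victim(set));   /* way 2, last used at t=7 */
    return 0;
}
```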

Q4: What happens on a write?

                                  Write-Through                           Write-Back
  Policy                          Data written to the cache block is      Write data only to the cache;
                                  also written to lower-level memory      update the lower level when a
                                                                          block falls out of the cache
  Debug                           Easy                                    Hard
  Do read misses produce
  writes?                         No                                      Yes
  Do repeated writes make it
  to the lower level?             Yes                                     No

Additional option (on miss): let writes to an un-cached address allocate a new cache line ("write-allocate").
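To make the table concrete, here is a rough sketch in C (my own illustration, with made-up names such as memory_write and a 64-byte line) of what the two policies do on a write hit and on an eviction:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    bool     valid;
    bool     dirty;        /* used only by the write-back policy */
    uint32_t tag;
    uint8_t  data[64];
} Line;

/* Illustrative stand-in for writing a block to the next lower level. */
static void memory_write(uint32_t addr, const uint8_t *block, size_t nbytes) {
    (void)block; (void)nbytes;
    printf("  lower-level write at 0x%08x\n", addr);
}

/* Write-through: update the cache line and immediately propagate to memory. */
static void write_hit_through(Line *line, uint32_t addr, int off, uint8_t byte) {
    line->data[off] = byte;
    memory_write(addr, line->data, sizeof line->data);
}

/* Write-back: update only the cache line and mark it dirty;
   memory is updated later, when the dirty line is evicted.   */
static void write_hit_back(Line *line, int off, uint8_t byte) {
    line->data[off] = byte;
    line->dirty = true;
}

/* Eviction: only a write-back cache must flush dirty data here. */
static void evict(Line *line, uint32_t addr) {
    if (line->valid && line->dirty)
        memory_write(addr, line->data, sizeof line->data);
    line->valid = false;
    line->dirty = false;
}

int main(void) {
    Line wt = { .valid = true }, wb = { .valid = true };
    printf("write-through hit:\n");
    write_hit_through(&wt, 0x1000, 0, 0xAB);   /* memory written immediately     */
    printf("write-back hit (no lower-level traffic yet):\n");
    write_hit_back(&wb, 0, 0xAB);
    printf("write-back eviction:\n");
    evict(&wb, 0x2000);                        /* dirty data reaches memory now  */
    return 0;
}
```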

Set Associative Cache Design
Key idea:
– Divide the cache into sets.
– Allow a block to go anywhere within a set.
Advantage:
– Better hit rate.
Disadvantages:
– More tag bits.
– More hardware.
– Higher access time.
[Figure: a four-way set-associative cache.]

Cache Performance Measures
Hit rate: fraction of accesses found in the cache.
– Usually so high that we talk instead about the miss rate = 1 - hit rate.
Hit time: time to access the cache.
Miss penalty: time to replace a block from the lower level, including the time to replace it in the CPU.
– Access time: time to access the lower level.
– Transfer time: time to transfer the block.
Average memory access time (AMAT) = Hit time + Miss rate x Miss penalty  (in ns or clock cycles).
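A small worked example of the AMAT formula above (the numbers are illustrative, not from the slides):

```c
#include <stdio.h>

int main(void) {
    /* Illustrative parameters: 1-cycle hit time, 5% miss rate, 100-cycle miss penalty. */
    double hit_time     = 1.0;     /* cycles */
    double miss_rate    = 0.05;
    double miss_penalty = 100.0;   /* cycles */

    double amat = hit_time + miss_rate * miss_penalty;   /* 1 + 0.05*100 = 6 cycles */
    printf("AMAT = %.1f cycles\n", amat);
    return 0;
}
```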

Cache Performance
Miss-oriented approach to memory access:
  CPU time = IC x (CPI_Execution + Memory accesses per instruction x Miss rate x Miss penalty) x Clock cycle time
– CPI_Execution includes ALU and memory instructions.
Separating out the memory component entirely:
  CPU time = IC x (ALU ops per instruction x CPI_ALUOps + Memory accesses per instruction x AMAT) x Clock cycle time
– AMAT = average memory access time.
– CPI_ALUOps does not include memory instructions.
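As a quick worked example of the miss-oriented formulation (numbers are illustrative, not from the slides):

```c
#include <stdio.h>

int main(void) {
    /* Illustrative parameters. */
    double IC            = 1e9;      /* instructions                                */
    double cpi_execution = 1.2;      /* includes ALU and memory instructions (hits) */
    double mem_per_inst  = 0.4;      /* memory accesses per instruction             */
    double miss_rate     = 0.05;
    double miss_penalty  = 100.0;    /* cycles                                      */
    double cycle_time    = 0.5e-9;   /* seconds (2 GHz clock)                       */

    /* Miss-oriented formulation from the slide. */
    double cpu_time = IC * (cpi_execution + mem_per_inst * miss_rate * miss_penalty) * cycle_time;

    printf("memory stall cycles per instruction = %.2f\n",
           mem_per_inst * miss_rate * miss_penalty);    /* 0.4 * 0.05 * 100 = 2.0       */
    printf("CPU time = %.3f s\n", cpu_time);            /* 1e9 * (1.2+2.0) * 0.5e-9 = 1.6 s */
    return 0;
}
```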

Details of the Page Table
The page table maps virtual page numbers to physical frames ("PTE" = page table entry).
Virtual memory => treat main memory as a cache for disk.
[Figure: a virtual address (virtual page number + 12-bit offset) indexes into the page table, which resides in physical memory at the address held in the page table base register; each PTE holds a valid bit (V), access rights, and a physical frame address; the selected frame number concatenated with the offset gives the physical address in the physical memory space.]
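A minimal sketch (my own illustration, not the course's code) of the translation the figure describes, assuming 4 KB pages (12-bit offset) and a small flat page table:

```c
#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                       /* 4 KB pages: 12-bit offset        */
#define NUM_PAGES  1024u                     /* illustrative small address space */

typedef struct {
    int      valid;                          /* V bit                                   */
    int      access_rights;                  /* e.g., read/write permissions            */
    uint32_t frame;                          /* physical frame number (the "PA" field)  */
} PTE;

static PTE page_table[NUM_PAGES];            /* stands in for the table in physical memory */

/* Translate a virtual address to a physical address; returns 0 on a page fault. */
static int translate(uint32_t va, uint32_t *pa) {
    uint32_t vpn    = va >> PAGE_SHIFT;      /* virtual page number indexes the table */
    uint32_t offset = va & ((1u << PAGE_SHIFT) - 1);
    if (vpn >= NUM_PAGES || !page_table[vpn].valid)
        return 0;                            /* V = 0: page fault, handled by the OS */
    *pa = (page_table[vpn].frame << PAGE_SHIFT) | offset;
    return 1;
}

int main(void) {
    page_table[3] = (PTE){ .valid = 1, .access_rights = 3, .frame = 0x42 };
    uint32_t pa;
    if (translate((3u << PAGE_SHIFT) | 0x9A4, &pa))
        printf("VA 0x%05x -> PA 0x%05x\n", (3u << PAGE_SHIFT) | 0x9A4, pa);
    return 0;
}
```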

Two-level Page Tables
Page tables may not fit in memory!
– A table for 4 KB pages in a 32-bit address space has 1M entries.
– Each process needs its own address space!
32-bit virtual address: | P1 index (10 bits) | P2 index (10 bits) | page offset (12 bits) |
– The top-level table is wired in main memory.
– A subset of the 1024 second-level tables is in main memory; the rest are on disk or unallocated.
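A small sketch (my own illustration) of how the 10/10/12 split above carves up a 32-bit virtual address:

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t va = 0xCAFEBABEu;               /* arbitrary example virtual address */

    /* 10/10/12 split for a 32-bit address with 4 KB pages and 1024-entry tables. */
    uint32_t offset = va & 0xFFFu;           /* low 12 bits                                     */
    uint32_t p2     = (va >> 12) & 0x3FFu;   /* next 10 bits: index into a second-level table   */
    uint32_t p1     = (va >> 22) & 0x3FFu;   /* top 10 bits: index into the top-level table     */

    printf("VA 0x%08x -> p1=%u p2=%u offset=0x%03x\n", va, p1, p2, offset);
    return 0;
}
```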

The TLB Caches Page Table Entries
V = 0 pages either reside on disk or have not yet been allocated; the OS handles a reference to a V = 0 page as a "page fault".
Physical and virtual pages must be the same size!
The TLB caches page table entries.
– MIPS handles TLB misses in software (with random replacement); other machines use hardware.
[Figure: the virtual page number is looked up in the TLB, whose entries (tagged for an ASID) cache recently used page table entries and supply the physical frame address; on a hit, the frame and page offset form the physical address, otherwise the page table in memory is consulted.]
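A rough sketch (illustrative only, with made-up sizes) of a tiny fully associative TLB lookup consistent with the figure: each entry caches one PTE, tagged with the virtual page number and an ASID.

```c
#include <stdio.h>
#include <stdint.h>

#define TLB_ENTRIES 8
#define PAGE_SHIFT  12u

typedef struct {
    int      valid;
    uint32_t asid;    /* address-space ID, so entries from different processes coexist */
    uint32_t vpn;     /* virtual page number (the tag)                                  */
    uint32_t frame;   /* cached physical frame number from the PTE                      */
} TLBEntry;

static TLBEntry tlb[TLB_ENTRIES];

/* Fully associative lookup: returns 1 on a TLB hit and fills *pa. */
static int tlb_lookup(uint32_t asid, uint32_t va, uint32_t *pa) {
    uint32_t vpn = va >> PAGE_SHIFT;
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].asid == asid && tlb[i].vpn == vpn) {
            *pa = (tlb[i].frame << PAGE_SHIFT) | (va & ((1u << PAGE_SHIFT) - 1));
            return 1;
        }
    }
    return 0;   /* TLB miss: walk the page table (in software on MIPS, in hardware elsewhere) */
}

int main(void) {
    tlb[0] = (TLBEntry){ .valid = 1, .asid = 7, .vpn = 0x12, .frame = 0x99 };
    uint32_t pa;
    if (tlb_lookup(7, 0x12ABC, &pa))
        printf("TLB hit: PA = 0x%05x\n", pa);   /* 0x99ABC */
    return 0;
}
```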

Virtually Indexed, Physically Tagged Cache
What is the motivation?
– Fast cache hits: the cache is indexed in parallel with the TLB access.
– None of the shortcomings of a virtual cache.
How can it be correct?
– Require cache way size <= page size; the physical index then comes entirely from the page offset.
– The virtual and physical indices are therefore identical ⇒ it works like a physically indexed cache!
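A quick check of the way-size constraint, using assumed parameters (4 KB pages, a 32 KB 8-way cache with 64-byte lines; none of these numbers come from the slides):

```c
#include <stdio.h>

int main(void) {
    /* Illustrative parameters. */
    unsigned page_size  = 4096;    /* bytes           */
    unsigned cache_size = 32768;   /* 32 KB cache     */
    unsigned block_size = 64;      /* bytes per line  */
    unsigned assoc      = 8;       /* ways            */

    unsigned way_size = cache_size / assoc;     /* bytes indexed within one way */
    unsigned sets     = way_size / block_size;

    printf("way size = %u bytes, sets = %u\n", way_size, sets);
    if (way_size <= page_size)
        printf("OK: index + offset bits fit within the page offset, "
               "so virtual index == physical index\n");
    else
        printf("Not OK: the index would need translated (virtual page number) bits\n");
    return 0;
}
```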

Virtually Indexed, Physically Tagged Cache
[Figure only.]

Summary #1/3: The Cache Design Space
Several interacting dimensions:
– cache size
– block size
– associativity
– replacement policy
– write-through vs. write-back
– write allocation
The optimal choice is a compromise:
– depends on access characteristics
» workload
» use (I-cache, D-cache, TLB)
– depends on technology / cost
Simplicity often wins.
[Figure: design-space sketch plotting "good" vs. "bad" against less/more of factor A and factor B, alongside axes for cache size, block size, and associativity.]

Summary #2/3: Caches
The Principle of Locality:
– Programs access a relatively small portion of the address space at any instant of time.
» Temporal locality: locality in time.
» Spatial locality: locality in space.
Three major categories of cache misses:
– Compulsory misses: sad facts of life (e.g., cold-start misses).
– Capacity misses: reduced by increasing cache size.
– Conflict misses: reduced by increasing cache size and/or associativity. Nightmare scenario: the ping-pong effect!
Write policy: write-through vs. write-back.
Today CPU time is a function of (ops, cache misses), not just f(ops): this affects compilers, data structures, and algorithms.

Summary #3/3: TLB, Virtual Memory
Page tables map virtual addresses to physical addresses; TLBs are important for fast translation.
TLB misses are significant in processor performance:
– funny times, as most systems cannot access all of the second-level cache without taking TLB misses!
Caches, TLBs, and virtual memory are all understood by examining how they deal with four questions:
1) Where can a block be placed?
2) How is a block found?
3) Which block is replaced on a miss?
4) How are writes handled?
Today VM allows many processes to share a single memory without having to swap all processes to disk; today VM protection is more important than the memory-hierarchy benefit, yet computers remain insecure.
Prepare for the debate + quiz on Wednesday.

Summary of Virtual Machine Monitors
Virtual machine revival:
– Overcome security flaws of modern OSes.
– Processor performance is no longer the highest priority.
– Manage software, manage hardware.
"… VMMs give OS developers another opportunity to develop functionality no longer practical in today's complex and ossified operating systems, where innovation moves at geologic pace." [Rosenblum and Garfinkel, 2005]
Virtualization challenges for the processor, virtual memory, and I/O:
– Paravirtualization and ISA upgrades to cope with those difficulties.
Xen as an example VMM using paravirtualization:
– 2005 performance on non-I/O-bound, I/O-intensive apps: 80% of native Linux without a driver VM, 34% with a driver VM.
The Opteron memory hierarchy is still critical to performance.

Disk Device Performance
[Figure: disk anatomy showing platter, spindle, arm, actuator, head, sectors, inner and outer tracks, and the controller.]
Disk latency = seek time + rotation time + transfer time + controller overhead.
– Seek time: depends on the number of tracks the arm must move and the seek speed of the disk.
– Rotation time: depends on how fast the disk rotates and how far the sector is from the head.
– Transfer time: depends on the data rate (bandwidth) of the disk (bit density) and the size of the request.
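A worked example of the latency formula with assumed disk parameters (7200 RPM, 4 ms average seek, 50 MB/s transfer rate, 4 KB request, 0.2 ms controller overhead; none of these figures are from the slides):

```c
#include <stdio.h>

int main(void) {
    /* Illustrative disk parameters. */
    double seek_ms       = 4.0;      /* average seek time    */
    double rpm           = 7200.0;
    double transfer_MBps = 50.0;     /* sustained data rate  */
    double request_KB    = 4.0;      /* size of the request  */
    double controller_ms = 0.2;

    double rotation_ms = 0.5 * (60.0 / rpm) * 1000.0;               /* average = half a revolution */
    double transfer_ms = (request_KB / 1024.0) / transfer_MBps * 1000.0;

    double latency_ms = seek_ms + rotation_ms + transfer_ms + controller_ms;
    printf("rotation = %.2f ms, transfer = %.3f ms, total latency = %.2f ms\n",
           rotation_ms, transfer_ms, latency_ms);                   /* roughly 4.17, 0.078, 8.45 */
    return 0;
}
```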

Redundant Arrays of (Inexpensive) Disks
Files are "striped" across multiple disks.
Redundancy yields high data availability:
– Availability: service is still provided to the user even if some components have failed.
Disks will still fail; contents are reconstructed from data redundantly stored in the array.
– Capacity penalty to store the redundant information.
– Bandwidth penalty to update the redundant information.

Summary: RAID Techniques
Goal was performance; popularity is due to the reliability of storage.
Disk mirroring / shadowing (RAID 1):
– Each disk is fully duplicated onto its "shadow".
– Logical write = two physical writes.
– 100% capacity overhead.
Parity data-bandwidth array (RAID 3):
– Parity computed horizontally across the stripe.
– Logically a single high-data-bandwidth disk.
High I/O-rate parity array (RAID 5):
– Interleaved parity blocks.
– Independent reads and writes.
– Logical write = 2 reads + 2 writes (read the old data and old parity, write the new data and new parity).
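To make the RAID 5 "logical write = 2 reads + 2 writes" concrete, a sketch (my own illustration) of the small-write parity update, where new parity = old parity XOR old data XOR new data:

```c
#include <stdio.h>
#include <stdint.h>

#define BLOCK 8   /* tiny block size for illustration */

/* RAID 5 small write: read old data and old parity, compute new parity, write both back. */
static void raid5_small_write(uint8_t old_data[BLOCK], uint8_t parity[BLOCK],
                              const uint8_t new_data[BLOCK]) {
    for (int i = 0; i < BLOCK; i++) {
        /* new parity = old parity XOR old data XOR new data */
        parity[i] ^= old_data[i] ^ new_data[i];
        old_data[i] = new_data[i];            /* write the new data block */
    }
}

int main(void) {
    uint8_t d0[BLOCK] = {1,2,3,4,5,6,7,8};    /* data block being updated         */
    uint8_t d1[BLOCK] = {9,9,9,9,9,9,9,9};    /* another data block in the stripe */
    uint8_t p[BLOCK];
    for (int i = 0; i < BLOCK; i++) p[i] = d0[i] ^ d1[i];   /* initial stripe parity */

    uint8_t new_d0[BLOCK] = {8,7,6,5,4,3,2,1};
    raid5_small_write(d0, p, new_d0);

    /* Check: the parity still reconstructs the untouched block d1. */
    int ok = 1;
    for (int i = 0; i < BLOCK; i++) if ((p[i] ^ d0[i]) != d1[i]) ok = 0;
    printf("parity consistent after small write: %s\n", ok ? "yes" : "no");
    return 0;
}
```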