Feb. 2011 - Computer Architecture, Memory System Design - Slide 1: Part V Memory System Design

Slide 2: About This Presentation
This presentation is intended to support the use of the textbook Computer Architecture: From Microprocessors to Supercomputers, Oxford University Press, 2005, ISBN X. It is updated regularly by the author as part of his teaching of the upper-division course ECE 154, Introduction to Computer Architecture, at the University of California, Santa Barbara. Instructors can use these slides freely in classroom teaching and for other educational purposes. Any other use is strictly prohibited. © Behrooz Parhami
First edition released July 2003; revised July 2004, July 2005, Mar. 2006, Mar. 2007, Mar. 2008, Feb. 2009, and Feb. 2011.

Slide 3: V Memory System Design
Topics in This Part:
Chapter 17 Main Memory Concepts
Chapter 18 Cache Memory Organization
Chapter 19 Mass Memory Concepts
Chapter 20 Virtual Memory and Paging
Design problem: we want a memory unit that can keep up with the CPU's processing speed, has enough capacity for programs and data, and is inexpensive, reliable, and energy-efficient.

Slide 4: 17 Main Memory Concepts
Technologies and organizations for the computer's main memory: SRAM (cache), DRAM (main), and flash (nonvolatile); interleaving and pipelining to get around the "memory wall."
Topics in This Chapter:
17.1 Memory Structure and SRAM
17.2 DRAM and Refresh Cycles
17.3 Hitting the Memory Wall
17.4 Interleaved and Pipelined Memory
17.5 Nonvolatile Memory
17.6 The Need for a Memory Hierarchy

Slide 5: 17.1 Memory Structure and SRAM
Fig. Conceptual inner structure of a 2^h × g SRAM chip and its shorthand representation.

Slide 6: Multiple-Chip SRAM
Fig. Eight 128K × 8 SRAM chips forming a 256K × 32 memory unit.

Slide 7: SRAM (from Computer Architecture, Background and Motivation, Jan. 2011)
Figure 2.10 SRAM memory is simply a large, single-port register file.

Slide 8: 17.2 DRAM and Refresh Cycles
DRAM vs. SRAM memory cell complexity.
Fig. The single-transistor DRAM cell, considerably simpler than the SRAM cell, leads to dense, high-capacity DRAM memory chips.

Slide 9: DRAM Refresh Cycles and Refresh Rate
Fig. Variations in the voltage across a DRAM cell capacitor after writing a 1 and subsequent refresh operations.

Slide 10: Loss of Bandwidth to Refresh Cycles
Example 17.2: A 256 Mb DRAM chip is organized as a 32M × 8 memory externally and as a 16K × 16K array internally. Rows must be refreshed at least once every 50 ms to forestall data loss; refreshing a row takes 100 ns. What fraction of the total memory bandwidth is lost to refresh cycles?
Solution: Refreshing all 16K rows takes 16 × 1024 × 100 ns = 1.64 ms. Loss of 1.64 ms every 50 ms amounts to 1.64/50 = 3.3% of the total bandwidth.
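The arithmetic in Example 17.2 can be checked with a short script (a sketch; the constants come straight from the example):

```python
# Fraction of DRAM bandwidth lost to refresh, using Example 17.2's numbers.
rows = 16 * 1024                 # internal array has 16K rows
refresh_time_per_row = 100e-9    # refreshing one row takes 100 ns
refresh_interval = 50e-3         # every row must be refreshed within 50 ms

total_refresh_time = rows * refresh_time_per_row        # time to refresh all rows
lost_fraction = total_refresh_time / refresh_interval   # share of bandwidth lost

print(f"{total_refresh_time * 1e3:.2f} ms per {refresh_interval * 1e3:.0f} ms "
      f"= {lost_fraction:.1%} of bandwidth")
```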

Slide 11: DRAM Organization
Fig. A 256 Mb DRAM chip organized as a 32M × 8 memory.

Slide 12: DRAM Packaging
Fig. Typical DRAM package housing a 4M × 4 memory; 24-pin dual in-line package (DIP).

Slide 13: 17.3 Hitting the Memory Wall
Fig. Memory density and capacity have grown along with CPU power and complexity, but memory speed has not kept pace.

Slide 14: 17.4 Pipelined and Interleaved Memory
Fig. Pipelined cache memory.
Memory latency may involve other supporting operations besides the physical access itself:
Virtual-to-physical address translation (Chapter 20)
Tag comparison to determine cache hit/miss (Chapter 18)

Slide 15: Memory Interleaving
Fig. Interleaved memory is more flexible than wide-access memory in that it can handle multiple independent accesses at once. The four banks serve addresses 0, 4, 8, ...; 1, 5, 9, ...; 2, 6, 10, ...; and 3, 7, 11, ..., respectively.
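The figure's four-bank arrangement amounts to a simple address-to-bank mapping, sketched below (low-order interleaving, matching the slide's address lists):

```python
NUM_BANKS = 4

def bank_of(address):
    # Low-order interleaving: consecutive addresses fall in consecutive banks.
    return address % NUM_BANKS

def index_in_bank(address):
    # Position of the word within its bank's local storage.
    return address // NUM_BANKS

# Bank 0 serves addresses 0, 4, 8, ...; bank 1 serves 1, 5, 9, ...; and so on,
# so a burst of sequential accesses keeps all four banks busy in parallel.
banks = {b: [a for a in range(16) if bank_of(a) == b] for b in range(NUM_BANKS)}
```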

Slide 16: 17.5 Nonvolatile Memory
Fig. Flash memory organization.

Slide 17: 17.6 The Need for a Memory Hierarchy
The widening speed gap between CPU and main memory:
Processor operations take on the order of 1 ns, while a memory access requires tens or even hundreds of ns.
Memory bandwidth limits the instruction execution rate: each instruction executed involves at least one memory access, so a few to hundreds of MIPS is the best that can be achieved.
A fast buffer memory can help bridge the CPU-memory gap. The fastest memories are expensive and thus not very large, so a second intermediate cache level is often used.

Slide 18: Typical Levels in a Hierarchical Memory
Fig. Names and key characteristics of levels in a memory hierarchy.

Slide 19: 18 Cache Memory Organization
Processor speed is improving at a faster rate than memory's, so the processor-memory speed gap has been widening. Cache is to main memory as a desk drawer is to a file cabinet.
Topics in This Chapter:
18.1 The Need for a Cache
18.2 What Makes a Cache Work?
18.3 Direct-Mapped Cache
18.4 Set-Associative Cache

Slide 20: 18.1 The Need for a Cache
All three of our MicroMIPS designs (single-cycle at 125 MHz with CPI = 1, multicycle, and pipelined with CPI ≈ 1.1) assumed 2-ns data and instruction memories; however, typical RAMs are many times slower.

Slide 21: Desktop, Drawer, and File Cabinet Analogy
Fig. Items on a desktop (register) or in a drawer (cache) are more readily accessible than those in a file cabinet (main memory). Once the "working set" is in the drawer, very few trips to the file cabinet are needed.

Slide 22: Cache, Hit/Miss Rate, and Effective Access Time
With one level of cache and hit rate h:
C_eff = h C_fast + (1 - h)(C_slow + C_fast) = C_fast + (1 - h) C_slow
Fig. CPU with register file, fast cache memory, and slow main memory.
Data is in the cache a fraction h of the time (say, a hit rate of 98%); we go to main memory a fraction 1 - h of the time (say, a cache miss rate of 2%). The cache is transparent to the user; transfers occur automatically.
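The effective-access-time formula translates directly into code; the cycle counts below are illustrative, not from the slide:

```python
def effective_access_time(h, c_fast, c_slow):
    # C_eff = C_fast + (1 - h) * C_slow: every access pays the cache's cost,
    # and the fraction (1 - h) that misses additionally pays the slow memory's cost.
    return c_fast + (1 - h) * c_slow

# With a 98% hit rate, a 1-cycle cache, and a 50-cycle main-memory penalty,
# the average access costs only about 2 cycles.
c_eff = effective_access_time(h=0.98, c_fast=1, c_slow=50)
```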

Slide 23: Multiple Cache Levels
Fig. Cache memories act as intermediaries between the superfast processor and the much slower main memory. Cleaner and easier to analyze.

Slide 24: Performance of a Two-Level Cache System
Example 18.1: A system with L1 and L2 caches has a CPI of 1.2 with no cache miss. There are 1.1 memory accesses on average per instruction. What is the effective CPI with cache misses factored in?
Level | Local hit rate | Miss penalty
L1 | 95% | 8 cycles
L2 | 80% | 60 cycles
Solution: C_eff = C_fast + (1 - h1)[C_medium + (1 - h2) C_slow]. Because C_fast is included in the CPI of 1.2, we must account for the rest:
CPI = 1.2 + 1.1 × (1 - 0.95)[8 + (1 - 0.8) × 60] = 1.2 + 1.1 × 0.05 × 20 = 2.3
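Example 18.1's computation, reproduced as a short sketch:

```python
base_cpi = 1.2            # CPI assuming every access hits in L1
accesses_per_instr = 1.1  # memory accesses per instruction
h1, l1_miss_penalty = 0.95, 8    # L1 local hit rate; cycles to reach L2
h2, l2_miss_penalty = 0.80, 60   # L2 local hit rate; cycles to reach main memory

# Per access: a fraction (1 - h1) misses in L1; of those, (1 - h2) also miss in L2.
miss_cycles = (1 - h1) * (l1_miss_penalty + (1 - h2) * l2_miss_penalty)
cpi = base_cpi + accesses_per_instr * miss_cycles   # effective CPI, about 2.3
```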

Slide 25: 18.2 What Makes a Cache Work?
Temporal locality and spatial locality.
Fig. Assuming no conflict in address mapping, the cache will hold a small program loop in its entirety, leading to fast execution.

Slide 26: 18.3 Direct-Mapped Cache
Fig. Direct-mapped cache holding 32 words within eight 4-word lines. Each line is associated with a tag and a valid bit.

Slide 27: Direct-Mapped Cache Behavior
Fig. Address trace: 1, 7, 6, 5, 32, 33, 1, 2, ...
1: miss; words 3, 2, 1, 0 fetched into their line
7: miss; words 7, 6, 5, 4 fetched
6: hit
5: hit
32: miss; words 35, 34, 33, 32 fetched (replacing 3, 2, 1, 0)
33: hit
1: miss; words 3, 2, 1, 0 fetched (replacing 35, 34, 33, 32)
2: hit
... and so on
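The trace above can be reproduced with a minimal direct-mapped cache simulator, sketched here for the slide's cache of eight 4-word lines:

```python
NUM_LINES, WORDS_PER_LINE = 8, 4
lines = [None] * NUM_LINES   # tag held by each line; None means the line is invalid

def access(word_address):
    block = word_address // WORDS_PER_LINE   # which 4-word block the word belongs to
    index = block % NUM_LINES                # the single line this block maps to
    tag = block // NUM_LINES                 # distinguishes blocks sharing that line
    if lines[index] == tag:
        return "hit"
    lines[index] = tag                       # miss: fetch the block, replacing the old one
    return "miss"

trace = [1, 7, 6, 5, 32, 33, 1, 2]
results = [access(a) for a in trace]
# results matches the slide: miss, miss, hit, hit, miss, hit, miss, hit
```

Note how addresses 1 and 32 map to the same line (index 0) with different tags, so they keep evicting each other: a conflict that a set-associative cache, described next, would avoid.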

Slide 28: 18.4 Set-Associative Cache
Fig. Two-way set-associative cache holding 32 words of data within 4-word lines and 2-line sets.

Slide 29: Cache Memory Design Parameters
Cache size (in bytes or words). A larger cache can hold more of the program's useful data but is more costly and likely to be slower.
Block or cache-line size (unit of data transfer between cache and main). With a larger cache line, more data is brought into the cache with each miss. This can improve the hit rate but may also bring in low-utility data.
Placement policy. Determines where an incoming cache line is stored. More flexible policies imply higher hardware cost and may or may not have performance benefits (due to more complex data location).
Replacement policy. Determines which of several existing cache blocks (into which a new cache line can be mapped) should be overwritten. Typical policies: choosing a random or the least recently used block.
Write policy. Determines whether updates to cache words are immediately forwarded to main memory (write-through) or modified blocks are copied back to main memory if and when they must be replaced (write-back or copy-back).

Slide 30: 19 Mass Memory Concepts
Today's main memory is huge, but still inadequate for all needs. Magnetic disks provide extended and back-up storage; optical disks and disk arrays are other mass storage options.
Topics in This Chapter:
19.1 Disk Memory Basics
19.2 Organizing Data on Disk
19.5 Disk Arrays and RAID
19.6 Other Types of Mass Memory

Slide 31: 19.1 Disk Memory Basics
Fig. Disk memory elements and key terms.

Slide 32: Disk Drives
Typically 2-8 cm.

Slide 33: Access Time for a Disk
The three components of disk access time. Disks that spin faster have a shorter average and worst-case access time.
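One of those components, rotational latency, depends directly on spindle speed: on average the head waits half a revolution for the target sector. The rpm values below are illustrative, not taken from the slide:

```python
def avg_rotational_latency_ms(rpm):
    # Average rotational latency is half of one revolution time.
    seconds_per_revolution = 60.0 / rpm
    return 0.5 * seconds_per_revolution * 1000.0

lat_slow = avg_rotational_latency_ms(7200)    # desktop-class spindle
lat_fast = avg_rotational_latency_ms(15000)   # high-end server spindle
# Roughly doubling the rotation speed halves this component of access time.
```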

Slide 34: Representative Magnetic Disks
Table 19.1 Key attributes of three representative magnetic disks, from the highest capacity to the smallest physical size (ca. early 2003). [More detail (weight, dimensions, recording density, etc.) in the textbook.]

Attribute               | Seagate Barracuda 180 | Hitachi DK23DA | IBM Microdrive
Application domain      | Server                | Laptop         | Pocket device
Capacity                | 180 GB                | 40 GB          | 1 GB
Platters / surfaces     | 12 / 24               | 2 / 4          | 1 / 2
Cylinders               | ...                   | ...            | ...
Sectors per track, avg. | ...                   | ...            | ...
Buffer size             | 16 MB                 | 2 MB           | 1/8 MB
Seek time, min/avg/max  | 1, 8, 17 ms           | 3, 13, 25 ms   | 1, 12, 19 ms
Diameter                | 3.5"                  | 2.5"           | 1.0"
Rotation speed, rpm     | ...                   | ...            | ...
Typical power           | 14.1 W                | 2.3 W          | 0.8 W

Slide 35: Disk Caching
Same idea as the processor cache: bridge the main-disk speed gap.
Read/write an entire track with each disk access: "access one sector, get 100s free," with a hit rate around 90%.
The disks listed in Table 19.1 have buffers from 1/8 to 16 MB.
Rotational latency is eliminated; a transfer can start from any sector.
Back-up power is needed so that changes in the disk cache are not lost (it is needed anyway for head retraction upon power loss).

Slide 36: 19.5 Disk Arrays and RAID
The need for high-capacity, high-throughput secondary (disk) memory:

Processor speed | RAM size | Disk I/O rate | Number of disks | Disk capacity | Number of disks
1 GIPS | 1 GB | 100 MB/s | 1         | 100 GB | 1
1 TIPS | 1 TB | 100 GB/s | ...       | 100 TB | 100
1 PIPS | 1 PB | 100 TB/s | 1 Million | 100 PB | ...
1 EIPS | 1 EB | 100 PB/s | 1 Billion | 100 EB | 100 Million

Amdahl's rules of thumb for system balance:
1 RAM byte for each IPS
100 disk bytes for each RAM byte
1 I/O bit per sec for each IPS
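Amdahl's rules of thumb can be written down as a small helper (a sketch; the rules themselves are as stated on the slide):

```python
def balanced_system(ips):
    # 1 RAM byte for each instruction per second (IPS).
    ram_bytes = ips
    # 100 disk bytes for each RAM byte.
    disk_bytes = 100 * ram_bytes
    # 1 I/O bit per second for each IPS.
    io_bits_per_sec = ips
    return ram_bytes, disk_bytes, io_bits_per_sec

# A 1 TIPS machine calls for 1 TB of RAM and 100 TB of disk.
ram, disk, io = balanced_system(10**12)
```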

Slide 37: Redundant Array of Independent Disks (RAID)
Fig. RAID levels 0-6, with a simplified view of data organization.
Parity: A ⊕ B ⊕ C ⊕ D ⊕ P = 0, so any one lost block is recoverable, e.g. B = A ⊕ C ⊕ D ⊕ P.
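The parity relation can be demonstrated in a few lines (a sketch; the block contents are made up):

```python
def xor_blocks(*blocks):
    # XOR corresponding bytes of equal-length blocks.
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

A, B, C, D = b"\x0f", b"\x33", b"\x55", b"\xf0"
P = xor_blocks(A, B, C, D)           # parity block: A xor B xor C xor D
B_rebuilt = xor_blocks(A, C, D, P)   # disk holding B fails; rebuild it from the rest
```

Because XOR is its own inverse, XOR-ing all five blocks together yields zero, which is exactly the invariant the slide states.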

Slide 38: RAID Product Examples
IBM ESS Model 750.

Slide 39: 19.6 Other Types of Mass Memory
Fig. Magnetic and optical disk memory units; flash drives (thumb drives, travel drives).

Slide 40: Optical Disks
Fig. Simplified view of recording format and access mechanism for data on a CD-ROM or DVD-ROM. Tracks are spiral, rather than concentric.

Slide 41: Automated Tape Libraries

Slide 42: 20 Virtual Memory and Paging
Managing data transfers between main and mass memory is cumbersome; virtual memory automates this process. The key to virtual memory's success is the same as for cache.
Topics in This Chapter:
20.1 The Need for Virtual Memory
20.2 Address Translation in Virtual Memory
20.4 Page Placement and Replacement
20.6 Improving Virtual Memory Performance

Slide 43: 20.1 The Need for Virtual Memory
Fig. Program segments in main memory and on disk.

Slide 44: Memory Hierarchy: The Big Picture
Fig. Data movement in a memory hierarchy.

Slide 45: 20.2 Address Translation in Virtual Memory
Fig. Virtual-to-physical address translation parameters.

Slide 46: Page Tables and Address Translation
Fig. The role of the page table in the virtual-to-physical address translation process.
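The translation process can be sketched with a toy single-level page table; the page size and mappings below are illustrative, not from the slide:

```python
PAGE_SIZE = 4096  # 4 KB pages

# Virtual page number -> physical page (frame) number; absent pages are on disk.
page_table = {0: 7, 1: 3, 2: 9}

def translate(virtual_address):
    vpn = virtual_address // PAGE_SIZE     # virtual page number
    offset = virtual_address % PAGE_SIZE   # page offset, unchanged by translation
    if vpn not in page_table:
        raise LookupError("page fault: bring the page in from disk")
    return page_table[vpn] * PAGE_SIZE + offset

physical = translate(1 * PAGE_SIZE + 20)   # virtual page 1, offset 20 -> frame 3
```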

Slide 47: Page Replacement Policies
The least-recently used (LRU) policy can be implemented by maintaining a stack.

Page referenced:    A  B  A  F  B  E  A
MRU ->           D  A  B  A  F  B  E  A
                 B  D  A  B  A  F  B  E
                 E  B  D  D  B  A  F  B
LRU ->           C  E  E  E  D  D  A  F
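The stack evolution in the table can be simulated directly, starting from the slide's initial stack D, B, E, C:

```python
CAPACITY = 4
stack = ["D", "B", "E", "C"]   # most recently used first, least recently used last

def reference(page):
    if page in stack:
        stack.remove(page)     # hit: the page moves up to the MRU position
    elif len(stack) == CAPACITY:
        stack.pop()            # miss with a full stack: evict the LRU page
    stack.insert(0, page)      # the referenced page becomes the MRU

for page in ["A", "B", "A", "F", "B", "E", "A"]:
    reference(page)
# Final stack, MRU to LRU: A, E, B, F (the table's last column)
```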

Slide 48: 20.6 Improving Virtual Memory Performance
Table 20.1 Memory hierarchy parameters and their effects on performance

Parameter variation                   | Potential advantages                                   | Possible disadvantages
Larger main or cache size             | Fewer capacity misses                                  | Longer access time
Larger pages or longer lines          | Fewer compulsory misses (prefetching effect)           | Greater miss penalty
Greater associativity (cache only)    | Fewer conflict misses                                  | Longer access time
More sophisticated replacement policy | Fewer conflict misses                                  | Longer decision time, more hardware
Write-through policy (cache only)     | No write-back time penalty, easier write-miss handling | Wasted memory bandwidth, longer access time

Slide 49: Summary of Memory Hierarchy
Fig. Data movement in a memory hierarchy.
Cache memory provides the illusion of very high speed; virtual memory provides the illusion of very large size; main memory offers reasonable cost but is slow and small. Locality makes the illusions work.