Cosc 3P92 Week 9 Lecture slides


"The human brain starts working the moment you are born and never stops until you stand up to speak in public." - George Jessel

Memory Organization
In a typical computer system, the storage system is organized according to the following hierarchy. Moving down the list toward internal storage, access time decreases; moving up toward archival storage, cost per bit decreases:
- Archival storage (magnetic tape or photographic): slow access (1-5 s) and large capacity (almost unlimited)
- Moving-head disk (magnetic or optical)
- High-speed drum
- Charge-coupled device
- Main memory
- Cache
- Internal storage: fast access (1-20 ns) and small capacity (1-4K bytes)

Memory speed
- Access time (Ta): the average time taken to read a unit of information, e.g. 100 ns (100 x 10^-9 s)
- Access rate (Ra) = 1/Ta (bits/second), e.g. 1/100 ns = 10 Mb/s
- Cycle time (Tc): the average time lapse between two successive read operations, e.g. 500 ns (500 x 10^-9 s)
- Bandwidth or transfer rate (Rc) = 1/Tc (bits/second), e.g. 1/500 ns = 2 Mb/s
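The slide's two example figures can be checked directly; this is a minimal sketch, and the helper names are illustrative, not from the slides:

```python
def access_rate(ta_seconds):
    """Access rate Ra = 1/Ta, in bits per second."""
    return 1.0 / ta_seconds

def bandwidth(tc_seconds):
    """Bandwidth Rc = 1/Tc, in bits per second."""
    return 1.0 / tc_seconds

ra = access_rate(100e-9)   # Ta = 100 ns
rc = bandwidth(500e-9)     # Tc = 500 ns
print(ra)  # 10 Mb/s, i.e. 1e7 bits/second
print(rc)  # 2 Mb/s, i.e. 2e6 bits/second
```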

Classes of Memory
- RAM ("normal memory")
- Direct-access storage: HD, CD-ROM, DVD
- Sequential-access storage: tapes, e.g. DAT
- Associative (content-addressable) memory: searches for data via bit patterns. A CAM (Content Addressable Memory) includes comparison logic with each bit of storage; a data value is broadcast to all words of storage and compared with the values there; words which match are flagged, and subsequent operations can then work on the flagged words. (computing-dictionary.thefreedictionary.com)
- ROM

Categories of RAM and ROM
Primary memory divides into RAM and ROM:
- RAM: magnetic (core) or semiconductor; semiconductor RAM is either static or dynamic, built in bipolar or MOS technology
- ROM: Mask ROM, PROM, EPROM, EAROM

Main Memory Design
A 1K x 4 RAM chip has a 10-bit address bus (A9-A0), a 4-bit bidirectional data bus (D3-D0), a chip-select line (CS), and a write-enable line (WE):

CS  WE  Mode          Status of the bidirectional data lines D3-D0   Power
H   X   Not selected  High impedance                                 Standby
L   L   Write         Act as input bus                               Active
L   H   Read          Act as output bus                              Active

Main Memory Design
Q. How do we build a 4K x 4 RAM using four 1K x 4 RAM chips?
Use the top two address bits (A11, A10) to select one of the four chips; the remaining ten bits (A9-A0) address within the selected chip.

Chip  A11  A10  A9 ... A0   Range
0     0    0    x  ... x    0000 to 1023
1     0    1    x  ... x    1024 to 2047
2     1    0    x  ... x    2048 to 3071
3     1    1    x  ... x    3072 to 4095
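The address decoding in the table above can be sketched in a few lines; the function name is illustrative, not from the slides:

```python
def chip_select(addr):
    """Split a 12-bit address into (chip number, offset within chip)."""
    assert 0 <= addr < 4096       # 4K addresses need 12 bits
    chip = addr >> 10             # top two bits A11, A10 pick the chip
    offset = addr & 0x3FF         # low ten bits A9-A0 go to every chip
    return chip, offset

print(chip_select(0))      # (0, 0): first word of chip 0
print(chip_select(1024))   # (1, 0): first word of chip 1
print(chip_select(4095))   # (3, 1023): last word of chip 3
```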

Main Memory Design
Q. How do we build a 256 KB RAM system with a 16-bit address bus and four 64 KB RAM chips?
Memory bank-switching: the processor drives a 1-of-n decoder (using log2 n select bits) whose enable outputs (Enable 1, Enable 2, ..., Enable n) activate one memory bank at a time; all banks sit on the address bus in parallel.

Main memory design
Memory address extension: the processor holds a 4-bit base register; the 16-bit value on the address bus is treated as an offset, and the base and offset are combined to form a 20-bit physical address sent to memory.

Cache Memory
Cache: a fast-access memory buffer between the CPU and main memory.
- Locality principle: programs usually use limited memory areas, in contrast to totally random access
  - spatial locality: nearby locations/addresses
  - temporal locality: recently accessed items tend to be accessed again
- If commonly used memory can be buffered in a high-speed cache, overall performance is enhanced
- The cache takes the form of a small amount of store, with hardware support for maintenance and lookup
- Each cache cell holds a cache line: a block of main memory (4-64 words)
- Cache hit: requested memory resides in the cache

Cache
- Cache miss: requested memory is not in the cache, and must be fetched from main memory and put into the cache
- Unified cache: instructions and data share the same cache
- Split cache: separate instruction and data caches; parallel access doubles the bandwidth
- Level 2 cache: sits between the instruction/data caches and main memory
Cache maintenance algorithms are similar in spirit to virtual-memory ideas at the operating-system level; the main difference is that the cache is hardware-supported, whereas virtual memory is implemented in software.

Measuring cache performance
- c: cache access time
- m: main memory access time
- hr: hit ratio (0 <= hr <= 1) = # cache hits / total memory requests
- mr: miss ratio = 1 - hr
mean access time = c + (1 - hr)m
- as hr -> 1, mean access time -> c
- as hr -> 0, mean access time -> c + m

Cache Example
Example: let c = 160 ns, m = 960 ns, hr = 0.90 (a common value).
mean access time = 160 + (1 - 0.90) x 960 = 256 ns
efficiency = c / mean = 160/256 = 62.5%
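The example's arithmetic can be reproduced directly; the helper name is illustrative, not from the slides:

```python
def mean_access_time(c, m, hr):
    """Mean access time = c + (1 - hr) * m."""
    return c + (1 - hr) * m

mat = mean_access_time(c=160, m=960, hr=0.90)   # times in ns
print(mat)         # ~256 ns, as on the slide
print(160 / mat)   # efficiency ~0.625, i.e. 62.5%
```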

Direct mapping
Main memory block i (of M blocks) maps to cache slot i mod N (N cache slots).

Direct Mapped Cache

Direct mapping
- Use a hash function to find the cache location: normally modulo some bit field of the address, i.e. just use the low-end field
- Cache fields: valid bit; tag (the block # being held); value (the data block)
- Scheme, on a memory request:
  - compute the cache slot (low n bits)
  - check the block (tag) field
  - hit: return the value
  - miss: fetch the block from memory, give it to the CPU, and put it into the computed slot (replacing the existing item if there)
- Can occasionally produce thrashing, e.g. addresses that are a multiple of the cache size (64K) will reside at the same entry
- A split instruction/data cache helps avoid thrashing
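The scheme above can be sketched as a small simulation; class and field names are illustrative, not from the slides, and the data block itself is omitted for brevity:

```python
class DirectMappedCache:
    def __init__(self, num_slots):
        self.num_slots = num_slots
        # each slot holds (valid bit, tag); the value field is omitted
        self.slots = [(False, None)] * num_slots

    def access(self, block_number):
        """Return True on a hit, False on a miss (installing the block)."""
        slot = block_number % self.num_slots   # low bits pick the slot
        tag = block_number // self.num_slots   # remaining bits are the tag
        valid, stored_tag = self.slots[slot]
        if valid and stored_tag == tag:
            return True                        # hit
        self.slots[slot] = (True, tag)         # miss: replace existing item
        return False

cache = DirectMappedCache(num_slots=8)
print(cache.access(3))    # False: cold miss
print(cache.access(3))    # True: now cached
print(cache.access(11))   # False: 11 mod 8 == 3, so it evicts block 3
print(cache.access(3))    # False: block 3 was just evicted (thrashing)
```

The last two accesses show the thrashing the slide warns about: blocks a multiple of the cache size apart fight over the same slot.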

Set associative mapping
The cache is divided into sets (Set 0, Set 1, ..., Set N/S - 1), with S blocks per set; main memory block i maps to set i mod (N/S).

Set associative mapping

Set associative mapping [4.39]
- Uses the same hash function as direct mapping, except that each cache slot holds multiple data blocks, usually at most 4 ("4-way")
- Searching the blocks in a slot is done associatively: simultaneous pattern matching
- More flexible than direct mapping: multiple blocks per set
- Uses a smaller tag than a fully associative cache, therefore cheaper to implement the associative matching
- Commonly used in larger systems (e.g. the VAX-11/780)
- Which line should be replaced when a slot is full? e.g. LRU (least recently used)
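A minimal 2-way set-associative sketch with LRU replacement, using an ordered dictionary per set to track recency; the names are illustrative, not from the slides:

```python
from collections import OrderedDict

class SetAssociativeCache:
    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # one ordered dict per set: keys are tags, least recently used first
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, block_number):
        """Return True on a hit, False on a miss (with LRU eviction)."""
        index = block_number % self.num_sets
        tag = block_number // self.num_sets
        s = self.sets[index]
        if tag in s:
            s.move_to_end(tag)        # refresh this tag's LRU position
            return True
        if len(s) >= self.ways:
            s.popitem(last=False)     # evict the least recently used tag
        s[tag] = True
        return False

cache = SetAssociativeCache(num_sets=4, ways=2)
print(cache.access(1))   # False: cold miss
print(cache.access(5))   # False: same set (1 mod 4 == 5 mod 4), second way
print(cache.access(1))   # True: both blocks fit, unlike direct mapping
```

The final access is the flexibility the slide mentions: two conflicting blocks coexist in one set instead of evicting each other.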

Writing back to the memory
Only write to memory if the cache data has been modified. Two policies:
(i) Write back (write-deferred): use a modified bit; when swapping out a cache slot or ending the job, write the slot back only if its modified bit is set
(ii) Write through: whenever modifying data, always write it back to main memory as well; this is necessary if memory is being shared in a DMA or multiprocessing system
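The write-back policy can be sketched with a toy cache line carrying a modified (dirty) bit; the names and the dictionary-as-memory are illustrative, not from the slides:

```python
class WriteBackLine:
    def __init__(self, data):
        self.data = data
        self.dirty = False          # the "modified bit" from the slide

memory = {0: 10}                    # toy main memory: block number -> value
line = WriteBackLine(memory[0])     # load block 0 into the cache line

line.data = 42                      # CPU write hits only the cache...
line.dirty = True                   # ...so just set the modified bit
print(memory[0])                    # 10: main memory is still stale

if line.dirty:                      # on eviction (or job end), write back
    memory[0] = line.data
print(memory[0])                    # 42: memory updated once, at eviction
```

Under write-through, by contrast, the assignment to `memory[0]` would happen on every write, keeping memory always current at the cost of extra traffic.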

Example: direct mapping
- 4-byte blocks
- 1-byte words
- 8 slots in the cache

Example (cont)

The end