Eng. Mohammed Timraz
Electronics & Communication Engineer
University of Palestine
Faculty of Engineering and Urban Planning
Software Engineering Department

Computer System Architecture ESGD2204
Saturday, 16th April 2010
Chapter 7, Lecture 13

Chapter 7: Memory Level

4 Questions for Memory Hierarchy

Q1: Where can a block be placed in the upper level? (Block placement)
Q2: How is a block found if it is in the upper level? (Block identification)
Q3: Which block should be replaced on a miss? (Block replacement)
Q4: What happens on a write? (Write strategy)

Q1: Where can a block be placed in the upper level?

Memory block 12 placed in an 8-block cache:
- Fully associative: block 12 can go in any of the 8 cache blocks
- Direct mapped: block 12 can go only in cache block (12 mod 8) = 4
- 2-way set associative: block 12 can go in either block of set (12 mod 4) = 0
- Set-associative mapping: set = block number modulo (number of sets)
(Figure: the allowed cache blocks for block 12 shown in blue.)
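To make the three mappings concrete, here is a minimal Python sketch (our illustration, not from the original slides; the function name allowed_slots and its arguments are invented) that lists the cache blocks a memory block may occupy under each scheme:

```python
def allowed_slots(block_number, num_blocks, ways):
    """Return the cache block indices where a memory block may be placed.

    ways = 1            -> direct mapped
    ways = num_blocks   -> fully associative
    otherwise           -> ways-way set associative
    """
    num_sets = num_blocks // ways
    set_index = block_number % num_sets  # S.A. mapping: block number mod number of sets
    return [set_index * ways + way for way in range(ways)]

# The slide's example: memory block 12, 8-block cache.
print(allowed_slots(12, 8, 1))  # direct mapped:         [4]           (12 mod 8 = 4)
print(allowed_slots(12, 8, 2))  # 2-way set associative: [0, 1]        (set 12 mod 4 = 0)
print(allowed_slots(12, 8, 8))  # fully associative:     [0, 1, ... 7] (anywhere)
```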

Q2: How is a block found if it is in the upper level (the cache)?

Bit fields in a 32-bit memory address used to access the cache:

    | Tag: 18 bits | Index: 8 bits | Block offset: 6 bits |

- Index (8 bits): selects one of 256 entries (sets) in the cache
- Block offset (6 bits): 64 bytes/block, i.e. 4 bits for 16 words/block plus 2 bits for 4 bytes/word
- Tag (18 bits): stored per block and compared on access; no need to store or check the index and offset bits

Data capacity of this one-way (direct-mapped) cache: 256 x 512 bits / 8 = 16 KB.
Increasing associativity shrinks the index and expands the tag.
For virtual memory the same split applies: the offset selects bits within a page, and the remaining bits form the block (a.k.a. page) address.
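As a sketch of the decoding above (our own code, assuming the slide's 18/8/6 bit split), the fields can be extracted with shifts and masks:

```python
TAG_BITS, INDEX_BITS, OFFSET_BITS = 18, 8, 6  # 18 + 8 + 6 = 32-bit address

def split_address(addr):
    """Split a 32-bit address into (tag, index, offset) for the cache above."""
    offset = addr & ((1 << OFFSET_BITS) - 1)                   # byte within the 64-byte block
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)    # one of 256 sets
    tag = addr >> (OFFSET_BITS + INDEX_BITS)                   # compared against the stored tag
    return tag, index, offset

tag, index, offset = split_address(0x1234ABCD)
print(f"tag={tag:#x} index={index} offset={offset}")

# Data capacity check: 256 blocks x 64 bytes/block = 16 KB
print((1 << INDEX_BITS) * (1 << OFFSET_BITS))  # 16384
```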

Q3: Which block should be replaced after a miss?
(After start-up, the cache is nearly always full.)

- Easy if direct mapped: only 1 block ("1 way") per index.
- If set associative or fully associative, must choose:
  - Random ("Ran"): easy to implement (1 bit/way if only 2-way), but not the best.
  - LRU (Least Recently Used): best, but hard to implement if more than 8-way; other LRU approximations also beat Random.

Miss rates for 3 cache sizes and associativities:

    Data size   2-way LRU / Ran     4-way LRU / Ran     8-way LRU / Ran
    16 KB       5.2%  / 5.7%        4.7%  / 5.3%        4.4%  / 5.0%
    64 KB       1.9%  / 2.0%        1.5%  / 1.7%        1.4%  / 1.5%
    256 KB      1.15% / 1.17%       1.13% / 1.13%       1.12% / 1.12%

Random picks give about the same low miss rate as LRU for large caches.
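For illustration only (a sketch we added, not slide material; the trace and names are invented), one cache set under LRU and Random replacement can be modeled with an ordered dictionary that tracks recency:

```python
from collections import OrderedDict
import random

def simulate_set(accesses, ways, policy="LRU"):
    """Count misses for one cache set under LRU or Random replacement."""
    blocks = OrderedDict()  # keys = resident block tags, oldest first
    misses = 0
    for tag in accesses:
        if tag in blocks:
            blocks.move_to_end(tag)       # hit: mark as most recently used
        else:
            misses += 1
            if len(blocks) == ways:       # set full: evict one block
                if policy == "LRU":
                    blocks.popitem(last=False)                # evict least recently used
                else:
                    del blocks[random.choice(list(blocks))]   # evict at random
            blocks[tag] = None
    return misses

trace = [1, 2, 3, 1, 2, 4, 1, 2, 3, 4]
print(simulate_set(trace, ways=2, policy="LRU"))
print(simulate_set(trace, ways=2, policy="Random"))
```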

Q4: What happens on a write? (Write policy)

Policy:
- Write-through: data written to the cache block is also written to the next lower-level memory.
- Write-back: write new data only to the cache; update the lower level just before a written (dirty) block leaves the cache, so its value is not lost.

Debugging: easier with write-through; harder with write-back.

Can read misses force writes? Write-through: no. Write-back: yes (this used to slow some reads; now a write buffer hides it).

Do repeated writes touch the lower level? Write-through: yes, memory is busier. Write-back: no.

Additional option: let writes to an un-cached address allocate a new cache line ("write-allocate"); otherwise just write through.
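A toy sketch of the traffic difference (our illustration; it assumes a one-block cache purely to keep the counting obvious), contrasting how many writes reach the lower level under each policy:

```python
def lower_level_writes(write_addrs, policy):
    """Count writes reaching lower-level memory for a 1-block cache (illustrative)."""
    resident, dirty, writes_below = None, False, 0
    for addr in write_addrs:
        if policy == "write-through":
            writes_below += 1            # every store also goes below
        else:  # write-back
            if resident != addr:         # miss: the current block leaves the cache
                if dirty:
                    writes_below += 1    # write back the evicted dirty block
                resident, dirty = addr, False
            dirty = True                 # only the cache copy is updated
    # flush the final dirty block, if any
    return writes_below + (1 if policy == "write-back" and dirty else 0)

stores = ["A", "A", "A", "B", "A", "A"]
print(lower_level_writes(stores, "write-through"))  # 6: memory is busier
print(lower_level_writes(stores, "write-back"))     # 3: two dirty evictions + final flush
```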

Write Buffers for Write-Through Caches

    Processor -> Cache -> Write Buffer -> Lower-Level Memory

The write buffer holds addresses and data awaiting write-through to the lower levels.

Q. Why a write buffer? A. So the CPU does not stall for writes.
Q. Why a buffer, why not just one register? A. Bursts of writes are common.
Q. Are Read-After-Write (RAW) hazards an issue for the write buffer? A. Yes! Either drain the buffer before the next read, or check buffer addresses before a read miss.
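The RAW check can be sketched as follows (our illustration; the class and method names are invented, and memory is modeled as a plain dictionary):

```python
from collections import deque

class WriteBuffer:
    """Illustrative FIFO write buffer between a write-through cache and memory."""
    def __init__(self, memory):
        self.memory = memory
        self.pending = deque()  # (address, data) pairs awaiting write-through

    def write(self, addr, data):
        self.pending.append((addr, data))  # CPU does not stall; entry drains later

    def read_miss(self, addr):
        # RAW check: a pending write to this address must be honored
        for a, d in reversed(self.pending):  # newest pending write wins
            if a == addr:
                return d
        return self.memory.get(addr)         # otherwise fetch from the lower level

    def drain_one(self):
        if self.pending:
            a, d = self.pending.popleft()
            self.memory[a] = d               # retire the oldest buffered write

mem = {0x100: 7}
wb = WriteBuffer(mem)
wb.write(0x100, 42)
print(wb.read_miss(0x100))  # 42: found in the buffer, not stale memory
wb.drain_one()
print(mem[0x100])           # 42
```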

5 Basic Cache Optimizations

Reducing miss rate:
1. Larger block size (reduces compulsory, "cold", misses)
2. Larger cache size (reduces capacity misses)
3. Higher associativity (reduces conflict misses)
(... and multiprocessors add cache coherence misses: the "4 Cs")

Reducing miss penalty:
4. Multilevel caches: total miss rate = Π_k (local miss rate_k), the product of the local miss rates over all cache levels k (see the sketch just after this list).

Reducing hit time (minimal cache latency):
5. Giving reads priority over writes, since the CPU is waiting: a read completes before earlier writes still sitting in the write buffer.
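As referenced in item 4, the multilevel miss-rate product is a one-liner in Python (our sketch; the example rates are made up):

```python
from math import prod

def global_miss_rate(local_miss_rates):
    """Total (global) miss rate of a multilevel cache = product of local miss rates."""
    return prod(local_miss_rates)

# e.g. L1 misses on 5% of accesses, L2 misses on 20% of the accesses it sees:
print(global_miss_rate([0.05, 0.20]))  # 0.01: only 1% of accesses reach main memory
```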

Definition: Performance

Performance is in units of things done per second; bigger is better.

If we are primarily concerned with response time:

    performance(X) = 1 / execution_time(X)

"X is N times faster than Y" means the speedup N is:

    N = performance(X) / performance(Y) = execution_time(Y) / execution_time(X)
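These definitions translate directly to code; a tiny sketch (ours, with made-up execution times):

```python
def performance(execution_time):
    """Performance = 1 / execution time (response-time view)."""
    return 1.0 / execution_time

def speedup(time_x, time_y):
    """'X is N times faster than Y': N = perf(X)/perf(Y) = time(Y)/time(X)."""
    return time_y / time_x

print(speedup(time_x=10.0, time_y=15.0))  # 1.5: X is 1.5 times faster than Y
```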

Performance: What to Measure

Usually we rely on benchmarks rather than real workloads. To increase predictability, collections of benchmark applications, called benchmark suites, are popular.

SPEC CPU: popular desktop benchmark suite
- CPU only, split between integer and floating-point programs
- SPECint2000 had 12 integer codes; SPECfp2000 had 14 floating-point codes
- SPEC CPU2006 has 12 integer benchmarks (CINT2006) and 17 floating-point benchmarks (CFP2006)
- SPECSFS (NFS file server) and SPECWeb (web server) have been added as server benchmarks

Performance: What to Measure (cont.)

The Transaction Processing Performance Council (TPC) measures server performance and cost-performance for databases:
- TPC-C: complex queries for online transaction processing
- TPC-H: models ad hoc decision support
- TPC-W: a transactional web benchmark
- TPC-App: application server and web services benchmark

Define and Quantify Dependability

How do we decide when a system is operating properly? Infrastructure providers now offer Service Level Agreements (SLAs), which are guarantees of how dependable their networking or power service will be.

Systems alternate between two states of service:
1. Service accomplishment (working), where the service is delivered as specified in the SLA
2. Service interruption (not working), where the delivered service differs from the SLA

Failure = transition from state 1 (working) to state 2 (not working)
Restoration = transition from state 2 back to state 1
(Chain: Fault -> Error -> Failure)

Define and Quantify Dependability (cont.)

Module reliability = a measure of continuous service accomplishment (or, equivalently, of time to failure):
1. Mean Time To Failure (MTTF) measures reliability.
2. Failures In Time (FIT) = 1/MTTF, the failure rate, usually reported as failures per billion (10^9) hours of operation.

Mean Time To Repair (MTTR) measures service interruption.
- Mean Time Between Failures (MTBF) = MTTF + MTTR

Module availability measures service as it alternates between the two states of accomplishment and interruption (a number between 0 and 1, e.g. 0.9):

    Module availability = MTTF / (MTTF + MTTR)

Example: Calculating Reliability

If modules have exponentially distributed lifetimes (the age of a module does not affect its probability of failure), the overall failure rate is the sum of the failure rates of the modules.

Calculate the FIT (rate) and MTTF (1/rate) for 10 disks (1M-hour MTTF per disk), 1 disk controller (0.5M-hour MTTF), and 1 power supply (0.2M-hour MTTF):

    Failure rate = 10 x (1/1,000,000) + 1/500,000 + 1/200,000
                 = (10 + 2 + 5) / 1,000,000 = 17/1,000,000 failures per hour
                 = 17,000 FIT (failures per 10^9 hours)
    MTTF = 10^9 / 17,000 ≈ 59,000 hours
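The same computation in Python (a sketch we added; system_fit and mttfs are our names):

```python
def system_fit(mttf_hours):
    """Sum the per-module failure rates, in FIT (failures per 10**9 hours)."""
    return sum(1e9 / m for m in mttf_hours)

# The slide's system: 10 disks, 1 disk controller, 1 power supply.
mttfs = [1_000_000] * 10 + [500_000, 200_000]
fit = system_fit(mttfs)
mttf = 1e9 / fit
print(f"{fit:.0f} FIT, MTTF = {mttf:.0f} hours")  # 17000 FIT, MTTF = 58824 hours
```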

The Cache Design Space

Several interacting dimensions:
- cache size
- block size
- associativity
- replacement policy
- write-through vs. write-back
- write allocation

The optimal choice is a compromise:
- it depends on access characteristics
  - workload
  - use (I-cache, D-cache, TLB)
- it depends on technology / cost

Simplicity often wins.
(Figure: design-space plot over cache size, block size, and associativity, showing good and bad regions as each factor varies from less to more, trading factor A against factor B.)

The Cache Design: Summary

The Principle of Locality:
- Programs access a relatively small portion of the address space at any instant of time.
  - Temporal locality: locality in time
  - Spatial locality: locality in space

Three major uniprocessor categories of cache misses:
- Compulsory misses: sad facts of life; example: cold-start misses.
- Capacity misses: remedied by increasing cache size.
- Conflict misses: remedied by increasing cache size and/or associativity. Nightmare scenario: the ping-pong effect!

Write policy: write-through vs. write-back.

Today, CPU time is a function of (operations, cache misses) rather than just f(operations): increasing performance now involves compilers, data structures, and algorithms.