Published by Camron Cobb. Modified over 9 years ago.
1
CSL718: Memory Hierarchy
Cache Memories
6th Feb, 2006
Anshul Kumar, CSE IITD
2
Memory technologies
  Semiconductor (random access): Registers, SRAM, DRAM, FLASH
  Magnetic (random + sequential): FDD, HDD
  Optical (random + sequential): CD, DVD
3
Hierarchical structure
  [Figure: memory levels stacked below the CPU. The level nearest the CPU has the fastest speed, smallest size, and highest cost/bit; the farthest level has the slowest speed, biggest size, and lowest cost/bit.]
4
System Configuration: e-bay price: Rs. 37,500
  Processor: Intel P4 3.2GHz (800FSB) 1024k CPU with Hyper Threading
  CPU Fan: P4 Heavy Duty Cooling Fan With Heat Sink
  Motherboard: D915G express chipset 800FSB (up to 3.6GHz support)
  Memory: 1GB DDR400 PC3200 DUAL CHANNEL RAM
  Video Card: GeForce FX MB 16x PCI-e video with TV out
  Hard drive: 160GB 7200RPM UDMA-150 SATA
  CD drive: 52x32x52x16x CDRW + DVD ROM drive
  Floppy drive: Sony 1.44MB 3.5" drive
  Sound: AC 97 6 ch 5.1 Full duplex digital sound, stereo speakers
  Network: 10/100 RJ45 onboard network (Ethernet, cable or DSL)
  Modem: 56k v92 modem
  Ports: Six USB 2.0 ports, 1 serial, 1 parallel, 1 microphone jack
  Case: Black i BOX 522 Mid Tower 400w power supply (front USB)
  Keyboard: Black PS2 Windows Keyboard
  Mouse: Black PS2 Scroll Mouse
  Monitor: 17" SAMSUNG 793S MONITOR
5
Main Memory for Pentium IV
DDR (double data rate) DRAM
  Size     Interface   Price
  128 MB   PC-333      Rs
  256 MB               Rs. 1,299
  1 GB                 Rs. 4,999
           PC-400      Rs. 5,299
6
Disk drives
Seagate Barracuda 7200 RPM
  Capacity   Price
  40 GB      Rs. 2,999
  80 GB      Rs. 3,499
  120 GB     Rs. 4,499
  160 GB     Rs. 4,799
  200 GB     Rs. 5,500
  250 GB     Rs. 6,999
  300 GB     Rs. 9,900
  400 GB     Rs. 14,950
7
Data transfer between levels
  [Figure: the processor accesses the nearest memory level. On a hit the access is served there; on a miss, data is transferred from the next level.]
  unit of transfer = block
8
Principle of locality
  Temporal Locality: references repeated in time
  Spatial Locality: references repeated in space
    Special case: Sequential Locality
9
Memory Hierarchy Analysis
  Memory $M_i$: $M_1, M_2, \ldots, M_n$
  Capacity $s_i$: $s_1 < s_2 < \cdots < s_n$
  Unit cost $c_i$: $c_1 > c_2 > \cdots > c_n$
  Total cost: $C_{total} = \sum_i c_i \, s_i$
  Access time: $t_i = \tau_1 + \tau_2 + \cdots + \tau_i$ ($\tau_i$ at level $i$), $\tau_1 < \tau_2 < \cdots < \tau_n$
  Hit ratios $h_i(s_i)$: $h_1 < h_2 < \cdots < h_n = 1$
  Effective time: $T_{eff} = \sum_i m_i \, h_i \, t_i = \sum_i m_i \, \tau_i$
  Miss before level $i$: $m_i = (1 - h_1)(1 - h_2) \cdots (1 - h_{i-1})$
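The two forms of the effective-time sum can be checked numerically. The sketch below is only an illustration: the function names and the three-level numbers are made up, and the hit ratios are treated as local to each level (the fraction of accesses reaching level i that hit there), which is what makes the two sums agree.

```python
from math import prod

def miss_before(h):
    """m_i = (1-h_1)(1-h_2)...(1-h_{i-1}); the empty product gives m_1 = 1."""
    return [prod(1 - hj for hj in h[:i]) for i in range(len(h))]

def effective_time(h, tau):
    """Return T_eff computed both ways: sum m_i*h_i*t_i and sum m_i*tau_i."""
    m = miss_before(h)
    t = [sum(tau[:i + 1]) for i in range(len(tau))]  # t_i = tau_1 + ... + tau_i
    by_hits = sum(mi * hi * ti for mi, hi, ti in zip(m, h, t))
    by_levels = sum(mi * taui for mi, taui in zip(m, tau))
    return by_hits, by_levels

def total_cost(c, s):
    """C_total = sum of c_i * s_i."""
    return sum(ci * si for ci, si in zip(c, s))

# Hypothetical three-level hierarchy (illustrative numbers only).
h = [0.9, 0.95, 1.0]      # last level always hits (h_n = 1)
tau = [1, 10, 100]        # per-level access times
print(effective_time(h, tau))              # both values are about 2.5
print(total_cost([100, 10, 1], [1, 16, 256]))
```

Both sums come to about 2.5 time units for these numbers, since each level's time is paid exactly by the accesses that reach it.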
10
Cache Types
  Instruction | Data | Unified | Split
    Split vs. Unified:
      Split allows specializing each part
      Unified allows best use of the capacity
  On-chip | Off-chip
    on-chip: fast but small
    off-chip: large but slow
  Single level | Multi level
11
Cache Policies
  Placement: what gets placed where?
  Read: when? from where?
  Load: order of bytes/words?
  Fetch: when to fetch new block?
  Replacement: which one?
  Write: when? to where?
12
Block placement strategies
  Direct mapped | Set associative | Fully associative
  [Figure: the same cache drawn three ways: direct mapped (indexed by block #), set associative (indexed by set #), and fully associative. Each drawing shows the data array, the tag array, and which tags are searched.]
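The three placement strategies differ only in how many frames a memory block may occupy. A minimal sketch, assuming the usual modulo mapping (set = block address mod number of sets); `candidate_frames` and the 8-frame cache are hypothetical, chosen for illustration:

```python
def candidate_frames(block_addr, num_frames, ways):
    """Frames where a block may be placed.

    ways == 1           -> direct mapped (one candidate frame)
    1 < ways < frames   -> set associative (any frame in one set)
    ways == num_frames  -> fully associative (any frame)
    """
    num_sets = num_frames // ways
    set_index = block_addr % num_sets          # assumed modulo mapping
    return list(range(set_index * ways, (set_index + 1) * ways))

# Block 12 in a hypothetical 8-frame cache:
print(candidate_frames(12, 8, 1))   # direct mapped: [4]
print(candidate_frames(12, 8, 2))   # 2-way set associative: [0, 1]
print(candidate_frames(12, 8, 8))   # fully associative: all 8 frames
```

The number of candidate frames is also the number of tags that must be searched on each access.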
13
Organization/placement policy
  Cache: Set 1 ... Set S
  Set: Sector 1, Sector 2, ..., Sector SE, plus LRU information
  Sector: Tag, Block 1, Block 2, ..., Block B
  Block: V, D, S bits, AU 1, AU 2, ..., AU A
14
Addressing Cache
  Address fields: Sector Name | Set Index | Block | Displacement
    Sector Name: compared to tags
    Set Index: selects set
    Block: selects block
    Displacement: selects AU
  Early select: access data after tag matching
  Late select: access data while tag matching
15
Cache organization example
  [Figure: a cache with 8 sets. Each set holds 2 sectors; each sector has a tag and 2 blocks; each block has V and D bits and 2 AUs.]
16
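Cache access mechanism
  [Figure: direct-mapped cache. The address is split into an 18-bit tag, a 12-bit index, and a 2-bit byte offset. The index selects one of the entries (up to 4095), each holding v, an 18-bit tag, and 32 bits of data. The stored tag is compared (=) with the address tag to produce Hit; the 32-bit data is read out.]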
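The access mechanism in the figure can be mimicked in a few lines. This is a rough sketch using the field widths from the figure (18-bit tag, 12-bit index, 2-bit byte offset); the class name, the fill-on-miss behaviour, and the `memory` dictionary are assumptions added for illustration.

```python
INDEX_BITS, OFFSET_BITS = 12, 2   # widths taken from the figure

class DirectMappedCache:
    """One 32-bit word per entry, 4096 entries (hypothetical model)."""

    def __init__(self):
        n = 1 << INDEX_BITS
        self.valid = [False] * n
        self.tag = [0] * n
        self.data = [0] * n

    def read(self, addr, memory):
        index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        hit = self.valid[index] and self.tag[index] == tag
        if not hit:
            # Assumed policy: fill the entry from memory on a miss.
            self.valid[index] = True
            self.tag[index] = tag
            self.data[index] = memory[addr & ~0x3]   # word-aligned address
        return hit, self.data[index]

memory = {0x1000: 111, 0x5000: 222}   # 0x1000 and 0x5000 share index 0x400
cache = DirectMappedCache()
print(cache.read(0x1000, memory))   # (False, 111): cold miss
print(cache.read(0x1000, memory))   # (True, 111): hit
print(cache.read(0x5000, memory))   # (False, 222): same index, different tag
print(cache.read(0x1000, memory))   # (False, 111): evicted by the conflict
```

The last two reads show the conflict misses that direct mapping allows when two addresses share an index.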
17
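Cache with 4 word blocks
  [Figure: direct-mapped cache with 4-word blocks. The 32-bit address (bits 31..0) is split into an 18-bit tag, a 10-bit index, a 2-bit block offset, and a 2-bit byte offset. The index selects one of the entries (up to 1023), each holding v, an 18-bit tag, and four 32-bit data words. The tag comparison (=) produces Hit; a Mux driven by the block offset selects the 32-bit Data word.]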
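The field split for the 4-word-block cache can be written out directly. A small sketch with a hypothetical helper `split_address`; the widths (18/10/2/2) come from the figure and the sample address is arbitrary.

```python
def split_address(addr, index_bits=10, block_offset_bits=2, byte_offset_bits=2):
    """Split a 32-bit address into (tag, index, block_offset, byte_offset)."""
    byte_offset = addr & ((1 << byte_offset_bits) - 1)
    addr >>= byte_offset_bits
    block_offset = addr & ((1 << block_offset_bits) - 1)   # drives the word Mux
    addr >>= block_offset_bits
    index = addr & ((1 << index_bits) - 1)                 # selects the entry
    tag = addr >> index_bits                               # compared with stored tag
    return tag, index, block_offset, byte_offset

tag, index, block_offset, byte_offset = split_address(0x12345678)
print(hex(tag), hex(index), block_offset, byte_offset)   # 0x48d1 0x167 2 0
```

Compared with the one-word-per-entry cache, two index bits have moved into the block offset, so there are a quarter as many entries, each four times as wide.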
18
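4-way set associative cache
  [Figure: the address is split into a 20-bit tag, an 8-bit index, a 2-bit block offset, and a 2-bit byte offset. The index selects one of the sets (up to 255). Each of the four ways holds v, a 20-bit tag, and 128 bits of data. Four comparators (=) check the tags in parallel; a Mux per way selects a 32-bit word, and a final Mux selects the word from the matching way. Outputs: Hit and Data.]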
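A lookup in this organization checks all four ways of the selected set. The sketch below uses the figure's widths (20/8/2/2); the data layout (a list of ways, each a list of `(valid, tag, words)` tuples) and the function names are assumptions for illustration, and the sequential loop stands in for the four parallel comparators.

```python
WAYS, SETS = 4, 256

def split(addr):
    """Return (tag, index, block_offset, byte_offset) for a 20/8/2/2 split."""
    return addr >> 12, (addr >> 4) & 0xFF, (addr >> 2) & 0x3, addr & 0x3

def lookup(cache, addr):
    """Return (hit, word). Hardware compares the four tags in parallel."""
    tag, index, block_offset, _ = split(addr)
    for way in cache:
        valid, way_tag, words = way[index]
        if valid and way_tag == tag:
            return True, words[block_offset]   # word Mux, then way Mux
    return False, None

# Empty cache, then place one 4-word block in way 2 (hypothetical contents).
cache = [[(False, 0, [0] * 4) for _ in range(SETS)] for _ in range(WAYS)]
cache[2][0x67] = (True, 0x12345, [10, 11, 12, 13])

print(lookup(cache, 0x12345678))   # (True, 12)
print(lookup(cache, 0x54321670))   # (False, None): same set, no matching tag
```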
19
Read policies
  Sequential or concurrent
    Sequential: initiate memory access only after detecting a miss
    Concurrent: initiate memory access along with cache access in anticipation of a miss
  With or without forwarding
    Without forwarding: give data to CPU after filling the missing block in cache
    With forwarding: forward data to CPU as it gets filled in cache
20
Read Policies
  [Timing diagrams: cache access takes 1 time unit, memory access takes T; $p_m$ is the miss probability.]
  Sequential Simple: $T_{eff} = (1 - p_m) \cdot 1 + p_m \, (T + 2)$
  Concurrent Simple: $T_{eff} = (1 - p_m) \cdot 1 + p_m \, (T + 1)$
  Sequential Forward: $T_{eff} = (1 - p_m) \cdot 1 + p_m \, (T + 1)$
  Concurrent Forward: $T_{eff} = (1 - p_m) \cdot 1 + p_m \, T$
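The four formulas differ only in the miss penalty, which makes them easy to compare side by side. A minimal sketch; the function name and the sample values of the miss probability and T are made up for illustration.

```python
# Miss-path time for each read policy, as given in the formulas above.
MISS_TIME = {
    "sequential_simple": lambda T: T + 2,
    "concurrent_simple": lambda T: T + 1,
    "sequential_forward": lambda T: T + 1,
    "concurrent_forward": lambda T: T,
}

def t_eff(policy, p_miss, T):
    """T_eff = (1 - p_m) * 1 + p_m * (miss-path time)."""
    return (1 - p_miss) * 1 + p_miss * MISS_TIME[policy](T)

# Hypothetical numbers: 5% miss probability, memory access of 20 cycles.
for policy in MISS_TIME:
    print(policy, round(t_eff(policy, 0.05, 20), 3))
```

With these values the results are 2.05, 2.0, 2.0, and 1.95 cycles: concurrency and forwarding each remove one cycle from the miss path.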
21
Load policies
  [Figure: a 4-AU block with a cache miss on AU 1, showing the order in which AUs are loaded under each policy.]
  Block Load
  Load Forward
  Fetch Bypass (wrap around load)
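The load orders can be sketched as below. The slide text does not spell out the orders, so the interpretation here is an assumption: block load fetches the whole block from its start, load forward fetches from the missed AU to the end of the block, and fetch bypass starts at the missed AU and wraps around. AUs are assumed to be numbered from 0, and `load_order` is a hypothetical helper.

```python
def load_order(policy, miss_au, block_size=4):
    """Order in which AUs of a block are loaded (assumed semantics)."""
    if policy == "block_load":
        # Whole block, starting from AU 0.
        return list(range(block_size))
    if policy == "load_forward":
        # From the missed AU to the end of the block only.
        return list(range(miss_au, block_size))
    if policy == "fetch_bypass":
        # Missed AU first, then wrap around to cover the whole block.
        return [(miss_au + k) % block_size for k in range(block_size)]
    raise ValueError(f"unknown policy: {policy}")

for policy in ("block_load", "load_forward", "fetch_bypass"):
    print(policy, load_order(policy, miss_au=1))
```

For a miss on AU 1 this prints [0, 1, 2, 3], [1, 2, 3], and [1, 2, 3, 0] respectively.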
22
Fetch Policies
  Fetch on miss (demand fetching)
  Software prefetching
  Hardware prefetching
23
Fetch Policies
  Demand fetching: fetch only when required (miss)
  Hardware prefetching: automatically prefetch next block
  Software prefetching: programmer decides to prefetch
    questions: how much ahead (prefetch distance), how often
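The effect of next-block hardware prefetching on a sequential access stream can be seen in a toy model. This sketch assumes an unbounded cache, prefetching triggered only on a miss, and a made-up reference stream; `fetch_misses` is a hypothetical name.

```python
def fetch_misses(block_refs, prefetch_next=False):
    """Count misses for a stream of block numbers.

    Assumptions: the cache never evicts, and with prefetch_next=True
    a miss on block b also brings in block b + 1.
    """
    cached = set()
    misses = 0
    for b in block_refs:
        if b not in cached:
            misses += 1
            cached.add(b)            # demand fetch
            if prefetch_next:
                cached.add(b + 1)    # hardware prefetch of the next block
    return misses

# Sequential scan touching each of 8 blocks twice.
refs = [b for b in range(8) for _ in range(2)]
print(fetch_misses(refs))                       # 8 misses with demand fetching
print(fetch_misses(refs, prefetch_next=True))   # 4 misses with next-block prefetch
```

Prefetching on every miss halves the misses on this stream; it would gain nothing on a stream with no sequential locality.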
24
Software Control of Cache
  Software visible cache
    mode selection (WT, WB etc.)
    block flush
    block invalidate
    block prefetch
25
Replacement Policies
  Least Recently Used (LRU)
  Least Frequently Used (LFU)
  First In First Out (FIFO)
  Random
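LRU and FIFO can give different miss counts on the same reference stream. A minimal sketch for a fully associative cache; the function name, the capacity of 3 frames, and the reference string are illustrative assumptions.

```python
from collections import OrderedDict

def replacement_misses(refs, capacity, policy="LRU"):
    """Count misses in a fully associative cache under LRU or FIFO."""
    frames = OrderedDict()   # oldest entry first
    misses = 0
    for block in refs:
        if block in frames:
            if policy == "LRU":
                frames.move_to_end(block)   # a hit refreshes recency
            continue                        # FIFO ignores hits
        misses += 1
        if len(frames) >= capacity:
            frames.popitem(last=False)      # evict the oldest entry
        frames[block] = True
    return misses

refs = [1, 2, 3, 1, 4, 1, 2]
print(replacement_misses(refs, 3, "LRU"))    # 5
print(replacement_misses(refs, 3, "FIFO"))   # 6
```

FIFO evicts block 1 when block 4 arrives even though block 1 was just reused, which costs it the extra miss.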
26
Write Policies
  Write Hit
    Write Back
    Write Through
  Write Miss
    Write Through (with or without Write Allocate)
  Buffers are used in all cases to hide latencies
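The difference in memory traffic between the two write-hit policies can be shown with a toy count. This sketch assumes every write hits in the cache, nothing is evicted, and a write-back cache flushes its dirty blocks once at the end; `memory_writes` and the write stream are hypothetical.

```python
def memory_writes(block_writes, policy):
    """Count writes reaching memory for a stream of written block numbers."""
    if policy == "write_through":
        # Every write is propagated to memory immediately.
        return len(block_writes)
    if policy == "write_back":
        # Writes only set the dirty bit; each dirty block is written once
        # when it is flushed (assumed here to happen at the end).
        dirty = set(block_writes)
        return len(dirty)
    raise ValueError(f"unknown policy: {policy}")

writes = [0, 0, 1, 0, 1]
print(memory_writes(writes, "write_through"))   # 5
print(memory_writes(writes, "write_back"))      # 2
```

Write back saves traffic when blocks are written repeatedly, at the cost of keeping a dirty bit and writing the block out on eviction.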