Processor Design and Implementation: CPU LAB for Computer Organization 1. LAB 7: MIPS Set-Associative D-cache.


Processor Design and Implementation: CPU LAB for Computer Organization 1

LAB 7: MIPS Set-Associative D-cache

Lab Objectives 1. Understand the function and operating principles of the data cache 2. Understand cache write policies 3. Understand the set-associative cache structure 4. Implement simple set-associative cache modifications

D-cache vs. I-cache The CPU both reads and writes data memory, but it never writes instruction memory (modifying instruction memory is prohibited). As a result, the d-cache is both written and read by the CPU, while the i-cache is only read.

Hits vs. Misses Read hits – this is what we want! Read misses – stall the CPU, fetch the block from memory, deliver it to the cache, restart. Write hits: – write the data to both the cache and memory (write-through) – write the data only into the cache, and write it back to memory later (write-back) Write misses: – read the entire block into the cache, then write the word (write-allocate) – or just write around the cache

Write Policy Write hit – Write-through (WT) – Write-back (WB) Write miss – Write-allocate (or write allocation) – Write-around

Write Policy Write-hit policies: – Write-through (also called store-through): write to main memory whenever a write is performed to the cache. – Write-back (also called store-in or copy-back): write to main memory only when the modified data in the cache is evicted.

Write-Through vs. Write-Back – Policy: write-through writes data to the cache block and also to lower-level memory; write-back writes data only to the cache and copies it back when a dirty copy is replaced. – Debug: write-through is easy; write-back is hard. – Do read misses produce writes? Write-through: no. Write-back: yes. – Do repeated writes make it to the lower level? Write-through: yes. Write-back: no.

Write Policy Write-miss policies: – Write-allocate (or write allocation): read the missing block from lower-level memory into the cache, then proceed as for a write hit (WT or WB). – Write-around: write the data directly into the next-level memory, bypassing the cache.
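The contrast between the two write-hit policies can be sketched in a few lines of Python. This is an illustrative toy model, not the lab's Verilog; single-word blocks, a dict-backed main memory, and the class names are all assumptions made for brevity.

```python
# Illustrative sketch: a toy model contrasting write-through and write-back.
# Single-word blocks and a dict-backed memory are simplifying assumptions.

class WriteThroughCache:
    """Write-through + write-allocate: every write also updates memory."""
    def __init__(self, memory, num_lines=4):
        self.memory = memory               # backing store: dict addr -> value
        self.lines = [None] * num_lines    # each line: (tag, value) or None
        self.n = num_lines

    def write(self, addr, value):
        index, tag = addr % self.n, addr // self.n
        self.lines[index] = (tag, value)   # allocate/update the cache line
        self.memory[addr] = value          # write-through: memory stays current

class WriteBackCache:
    """Write-back: memory is updated only when a dirty line is evicted."""
    def __init__(self, memory, num_lines=4):
        self.memory = memory
        self.lines = [None] * num_lines    # each line: (tag, value, dirty)
        self.n = num_lines

    def write(self, addr, value):
        index, tag = addr % self.n, addr // self.n
        line = self.lines[index]
        if line is not None and line[0] != tag and line[2]:
            old_tag, old_value, _ = line   # evicting a dirty line:
            self.memory[old_tag * self.n + index] = old_value  # copy back first
        self.lines[index] = (tag, value, True)  # dirty; memory not updated yet

mem_wt, mem_wb = {}, {}
wt, wb = WriteThroughCache(mem_wt), WriteBackCache(mem_wb)
wt.write(0, "A"); wb.write(0, "A")
print(mem_wt.get(0))   # "A": write-through updated memory immediately
print(mem_wb.get(0))   # None: write-back has not touched memory yet
wb.write(4, "B")       # addr 4 maps to the same line, evicting dirty addr 0
print(mem_wb.get(0))   # "A": copied back to memory on eviction
```

Note how the write-back cache defers the memory write until eviction, which is exactly why a dirty bit is needed.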

N-way set associative – N direct-mapped caches in parallel – An index selects a set of N blocks

Direct-Mapped Cache

Set-Associative Cache

Associativity Associativity is a trade-off. Cache operations with more associativity take more power, chip area, and potentially time. However, caches with more associativity suffer fewer misses, so the CPU wastes less time reading from main memory.

2-Way Associative Example The sequence of memory accesses: 00, 20, 00, 1c, 00

Mem Block: 00 – Miss (Mem[00] loaded)
Mem Block: 20 – Miss (Mem[20] loaded)
Mem Block: 00 – Hit
Mem Block: 1c – Miss (Mem[1c] loaded)
Mem Block: 00 – Hit

Hit Rate = 2/5 = 40%
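The access sequence above can be replayed with a short simulation. This is a sketch; the 16-byte line size, 2 sets, and LRU replacement are assumptions chosen so the parameters reproduce the hit/miss pattern of the example.

```python
# Sketch: replay the sequence 00, 20, 00, 1c, 00 on a 2-way set-associative
# cache. Assumptions: 16-byte lines, 2 sets, LRU replacement.

LINE_SIZE = 16   # bytes per line
NUM_SETS  = 2
WAYS      = 2

def simulate(addresses):
    # each set is a list of block numbers, ordered least- to most-recently used
    sets = [[] for _ in range(NUM_SETS)]
    results = []
    for addr in addresses:
        block = addr // LINE_SIZE
        s = sets[block % NUM_SETS]
        if block in s:
            s.remove(block)          # hit: refresh the LRU ordering
            s.append(block)
            results.append("Hit")
        else:
            if len(s) == WAYS:       # set full: evict the LRU block
                s.pop(0)
            s.append(block)
            results.append("Miss")
    return results

accesses = [0x00, 0x20, 0x00, 0x1c, 0x00]
results = simulate(accesses)
for a, r in zip(accesses, results):
    print(f"{a:02x}: {r}")
hit_rate = results.count("Hit") / len(results)
print(f"Hit rate = {hit_rate:.0%}")   # 40%
```

Blocks 00 and 20 land in the same set and occupy its two ways, while 1c lands in the other set, so 00 is never evicted and hits on both reuses.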

Set-Associative Cache

Types of cache misses 1. Compulsory misses: the block must be brought into the cache on the first access to it; also called cold-start misses. 2. Capacity misses: blocks are discarded from the cache because the cache cannot contain all the blocks needed for program execution. 3. Conflict misses: occur when multiple blocks map to the same set; they cannot happen in a fully associative cache.

Cache Optimization – Increase cache size: reduces capacity misses; may increase access time. – Increase associativity: reduces conflict misses; may increase access time. – Increase block size: reduces compulsory misses; increases miss penalty.

Replacement policy When a line must be evicted from the cache to make room for incoming data, the replacement policy determines which line is evicted. The general goal is to minimize future cache misses by evicting a line that is unlikely to be referenced again soon.

Replacement policy – Least recently used (LRU) – Random

Replacement policy Least recently used (LRU): the cache ranks the lines in a set according to how recently they have been accessed, and evicts the least-recently used line from the set when an eviction is necessary.

Replacement policy Random: a randomly selected line from the appropriate set is evicted to make room for incoming data. Studies have shown that LRU replacement generally gives slightly higher hit rates than random replacement, but the differences are very small for caches of reasonable size.
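The claim that LRU beats random replacement only slightly can be checked with a quick experiment. This is an illustrative sketch: the cache geometry (4 sets of 2 ways), the synthetic loop-heavy trace, and the `run` helper are all assumptions, not part of the lab.

```python
# Sketch: compare LRU and random replacement on the same synthetic trace.
# Geometry (4 sets x 2 ways) and the trace distribution are assumptions.
import random

def run(trace, sets=4, ways=2, policy="lru", rng=None):
    cache = [[] for _ in range(sets)]   # each set: blocks in LRU order
    hits = 0
    for block in trace:
        s = cache[block % sets]
        if block in s:
            hits += 1
            if policy == "lru":
                s.remove(block); s.append(block)   # refresh LRU order
        else:
            if len(s) == ways:
                victim = 0 if policy == "lru" else rng.randrange(ways)
                s.pop(victim)                      # evict a line
            s.append(block)
    return hits / len(trace)

rng = random.Random(0)
# a loop-heavy trace: mostly reuses a small working set, with occasional strays
trace = [rng.randrange(8) if rng.random() < 0.9 else rng.randrange(64)
         for _ in range(10000)]
lru_rate  = run(trace, policy="lru")
rand_rate = run(trace, policy="random", rng=random.Random(1))
print(f"LRU hit rate:    {lru_rate:.3f}")
print(f"Random hit rate: {rand_rate:.3f}")
```

On traces like this the two hit rates typically land within a few percentage points of each other, consistent with the observation above.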

Cache in a CPU system In most systems, caches are meant to be transparent: the CPU stalls while the cache fetches a block from memory, and leaves the stall state when the cache has finished fetching.

Lab Part 1. Direct-Mapped Cache

Lab Part 1  Complete the unfinished wiring in the direct-mapped cache module (dcache_system.v), then use ModelSim to simulate whether the data cache behaves correctly; the result is shown in the transcript window at the bottom.  The cache size is 16 KB and the line size is 16 bytes.  The cache's write policy is write-through with write-allocate.

Lab Part 1  Split the Addr signal into index, line, and tag_in as required.  Addr_mem is the address sent to memory; complete the remaining fields after tag.
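The widths of those address fields follow from the cache geometry. The sketch below assumes 32-bit byte addresses (as in MIPS) and uses this lab's parameters: a 16 KB direct-mapped cache with 16-byte lines.

```python
# Sketch: tag / index / offset widths for a 16 KB direct-mapped cache with
# 16-byte lines, assuming 32-bit byte addresses.
import math

ADDR_BITS  = 32
CACHE_SIZE = 16 * 1024   # bytes
LINE_SIZE  = 16          # bytes

offset_bits = int(math.log2(LINE_SIZE))    # byte offset within a line
num_lines   = CACHE_SIZE // LINE_SIZE      # 1024 lines
index_bits  = int(math.log2(num_lines))    # selects one cache line
tag_bits    = ADDR_BITS - index_bits - offset_bits

print(offset_bits, index_bits, tag_bits)   # 4 10 18

def split(addr):
    # mirror the wire split: Addr -> (tag_in, index, offset)
    offset = addr & (LINE_SIZE - 1)
    index  = (addr >> offset_bits) & (num_lines - 1)
    tag    = addr >> (offset_bits + index_bits)
    return tag, index, offset
```

So in the Verilog, tag_in is the upper 18 bits of Addr, the index the next 10, and the byte offset the lowest 4.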

 Run the simulation and inspect the waveform to debug.  Click the icon on the left, or use the toolbar: Simulate -> Start Simulation.

Lab Part 1  Open the work folder and find the testbench (as shown in the figure on the left).  Remember to uncheck "Enable optimization" below.

Lab Part 1  Right-click dcache1 and choose Add to -> Wave -> All items in region to add its waveforms.  To view other waveforms, repeat the same steps for the other modules.

Note: Parameters in Verilog  How to use a parameter: in Verilog, adding #(number) when instantiating a module lets you generate that module with a different value as needed.  The two figures below show an example: adding #(32) to memc in the left figure overrides Size = 1 with Size = 32 when the module is generated (right figure), so the port width can be changed flexibly.

Lab Part 2. 2-Way Set-Associative Cache

Lab Part 2  Modify the direct-mapped cache completed in Part 1 into a 2-way set-associative cache.  The cache size is still 16 KB and the line size is 16 bytes. (How wide must the index be to meet this spec?)
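One way to reason about the index-width question (a sketch assuming 32-bit byte addresses): for an N-way set-associative cache the index selects a set rather than a line, so the number of sets is cache_size / (line_size × ways).

```python
# Sketch: field widths for an N-way set-associative cache, assuming 32-bit
# byte addresses. The index selects a set of N lines.
import math

def field_widths(cache_size, line_size, ways, addr_bits=32):
    offset_bits = int(math.log2(line_size))
    num_sets    = cache_size // (line_size * ways)
    index_bits  = int(math.log2(num_sets))
    tag_bits    = addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits

print(field_widths(16 * 1024, 16, 1))   # direct-mapped: (18, 10, 4)
print(field_widths(16 * 1024, 16, 2))   # 2-way: (19, 9, 4)
```

Doubling the associativity at the same cache size halves the number of sets, so the index loses one bit and the tag gains one.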

Lab Part 2  With 2-way set associativity, DataOut and tag_out each become two signals; complete the ???? parts to decide which DataOut and tag_out the CPU sees.  numout is the signal that decides which block to write. If block 1 is full, write into block 2; if both hold data, take turns writing so new data replaces old data.

Lab Part 2  Simulation and verification work the same way as in Part 1, but when debugging it is recommended to add the signals of both dcache modules for convenience.  Whether the output is correct is again shown in the transcript at the bottom of ModelSim.  Compare the hit rate with Part 1.

Challenge. Write-Back Cache

Challenge  Modify the control unit (dcache_ctrl.v) in the writeback folder so that the cache's write policy changes from write-through to write-back (the write-miss policy remains write-around).  The meaning of each state and control signal can be found in the comments in dcache_ctrl.v. Following the write-back cache state diagram provided later, add (cs==WB) at the correct places in the output logic block, and adjust the control-signal outputs for the write-back state.  After simulation, the transcript shows whether the result is correct; the cs signal in the waveform must show c and a (hexadecimal) for write-back to have occurred.

Challenge – Write-Through State Diagram

Challenge – Write-Back State Diagram