1 Lecture: Memory Technology Innovations Topics: memory schedulers, refresh, state-of-the-art and upcoming changes: buffer chips, 3D stacking, non-volatile.

Slides:



Advertisements
Similar presentations
Lecture 19: Cache Basics Today’s topics: Out-of-order execution
Advertisements

A Case for Refresh Pausing in DRAM Memory Systems
Miss Penalty Reduction Techniques (Sec. 5.4) Multilevel Caches: A second level cache (L2) is added between the original Level-1 cache and main memory.
COEN 180 DRAM. Dynamic Random Access Memory Dynamic: Periodically refresh information in a bit cell. Else it is lost. Small footprint: transistor + capacitor.
1 Lecture 6: Chipkill, PCM Topics: error correction, PCM basics, PCM writes and errors.
Main Mem.. CSE 471 Autumn 011 Main Memory The last level in the cache – main memory hierarchy is the main memory made of DRAM chips DRAM parameters (memory.
1 Lecture: Memory, Coherence Protocols Topics: wrap-up of memory systems, intro to multi-thread programming models.
1 Lecture 13: DRAM Innovations Today: energy efficiency, row buffer management, scheduling.
CSCE 212 Chapter 7 Memory Hierarchy Instructor: Jason D. Bakos.
Lecture 12: DRAM Basics Today: DRAM terminology and basics, energy innovations.
1 Lecture 14: Cache Innovations and DRAM Today: cache access basics and innovations, DRAM (Sections )
DRAM. Any read or write cycle starts with the falling edge of the RAS signal. –As a result the address applied in the address lines will be latched.
1 Lecture 15: DRAM Design Today: DRAM basics, DRAM innovations (Section 5.3)
1 Lecture 16: Virtual Memory Today: DRAM innovations, virtual memory (Sections )
1 COMP 206: Computer Architecture and Implementation Montek Singh Wed., Nov. 13, 2002 Topic: Main Memory (DRAM) Organization.
1 Lecture 13: Cache Innovations Today: cache access basics and innovations, DRAM (Sections )
1 Lecture 14: Virtual Memory Today: DRAM and Virtual memory basics (Sections )
1 Lecture 14: DRAM, PCM Today: DRAM scheduling, reliability, PCM Class projects.
1 Lecture 1: Introduction and Memory Systems CS 7810 Course organization:  5 lectures on memory systems  5 lectures on cache coherence and consistency.
1 Towards Scalable and Energy-Efficient Memory System Architectures Rajeev Balasubramonian School of Computing University of Utah.
©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois, ECE408/CS483 Applied Parallel Programming Lecture 7: DRAM.
1 Lecture 4: Memory: HMC, Scheduling Topics: BOOM, memory blades, HMC, scheduling policies.
Survey of Existing Memory Devices Renee Gayle M. Chua.
1 Lecture: Virtual Memory, DRAM Main Memory Topics: virtual memory, TLB/cache access, DRAM intro (Sections 2.2)
1 Lecture: Large Caches, Virtual Memory Topics: cache innovations (Sections 2.4, B.4, B.5)
1 Lecture 14: DRAM Main Memory Systems Today: cache/TLB wrap-up, DRAM basics (Section 2.3)
Modern DRAM Memory Architectures Sam Miller Tam Chantem Jon Lucas CprE 585 Fall 2003.
1 Lecture 2: Memory Energy Topics: energy breakdowns, handling overfetch, LPDRAM, row buffer management, channel energy, refresh energy.
1 CMPE 421 Parallel Computer Architecture PART3 Accessing a Cache.
COMP541 Memories II: DRAMs
1 Adapted from UC Berkeley CS252 S01 Lecture 18: Reducing Cache Hit Time and Main Memory Design Virtucal Cache, pipelined cache, cache summary, main memory.
1 Lecture 27: Disks Today’s topics:  Disk basics  RAID  Research topics.
1 Lecture 5: Refresh, Chipkill Topics: refresh basics and innovations, error correction.
1 Lecture 5: Scheduling and Reliability Topics: scheduling policies, handling DRAM errors.
1 Lecture 3: Memory Buffers and Scheduling Topics: buffers (FB-DIMM, RDIMM, LRDIMM, BoB, BOOM), memory blades, scheduling policies.
1 Lecture: Memory Technology Innovations Topics: state-of-the-art and upcoming changes: buffer chips, 3D stacking, non-volatile cells, photonics Multiprocessor.
Computer Organization CS224 Fall 2012 Lessons 39 & 40.
Simultaneous Multi-Layer Access Improving 3D-Stacked Memory Bandwidth at Low Cost Donghyuk Lee, Saugata Ghose, Gennady Pekhimenko, Samira Khan, Onur Mutlu.
1 Lecture 3: Memory Energy and Buffers Topics: Refresh, floorplan, buffers (SMB, FB-DIMM, BOOM), memory blades, HMC.
1 Chapter Seven. 2 SRAM: –value is stored on a pair of inverting gates –very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: –value.
1 Lecture: DRAM Main Memory Topics: DRAM intro and basics (Section 2.3)
15-740/ Computer Architecture Lecture 25: Main Memory
1 Lecture 20: OOO, Memory Hierarchy Today’s topics:  Out-of-order execution  Cache basics.
1 Lecture 4: Memory Scheduling, Refresh Topics: scheduling policies, refresh basics.
1 Lecture 16: Main Memory Innovations Today: DRAM basics, innovations, trends HW5 due on Thursday; simulations can take a few hours Midterm: 32 scores.
1 Lecture: Memory Basics and Innovations Topics: memory organization basics, schedulers, refresh,
COMP541 Memories II: DRAMs
SRAM Memory External Interface
COMP541 Memories II: DRAMs
Morgan Kaufmann Publishers Memory & Cache
Lecture 15: DRAM Main Memory Systems
Lecture: Memory, Multiprocessors
Staged-Reads : Mitigating the Impact of DRAM Writes on DRAM Reads
Lecture: DRAM Main Memory
Lecture 23: Cache, Memory, Virtual Memory
Lecture 22: Cache Hierarchies, Memory
Lecture: DRAM Main Memory
Lecture: DRAM Main Memory
Lecture: Memory Technology Innovations
Lecture 20: OOO, Memory Hierarchy
Morgan Kaufmann Publishers Memory Hierarchy: Cache Basics
Lecture 20: OOO, Memory Hierarchy
Lecture 15: Memory Design
If a DRAM has 512 rows and its refresh time is 9ms, what should be the frequency of row refresh operation on the average?
Chapter 4: MEMORY.
Lecture 22: Cache Hierarchies, Memory
15-740/ Computer Architecture Lecture 19: Main Memory
DRAM Hwansoo Han.
Presentation transcript:

1 Lecture: Memory Technology Innovations Topics: memory schedulers, refresh, state-of-the-art and upcoming changes: buffer chips, 3D stacking, non-volatile cells, photonics Midterm scores: 90+ is top 20, 85+ is top 40, 79+ is top 60, 69+ is top 80, Common errors: SWP, power equations

2 Row Buffers Each bank has a single row buffer Row buffers act as a cache within DRAM  Row buffer hit: ~20 ns access time (must only move data from row buffer to pins)  Empty row buffer access: ~40 ns (must first read arrays, then move data from row buffer to pins)  Row buffer conflict: ~60 ns (must first precharge the bitlines, then read new row, then move data to pins) In addition, must wait in the queue (tens of nano-seconds) and incur address/cmd/data transfer delays (~10 ns)

3 Open/Closed Page Policies If an access stream has locality, a row buffer is kept open  Row buffer hits are cheap (open-page policy)  Row buffer miss is a bank conflict and expensive because precharge is on the critical path If an access stream has little locality, bitlines are precharged immediately after access (close-page policy)  Nearly every access is a row buffer miss  The precharge is usually not on the critical path Modern memory controller policies lie somewhere between these two extremes (usually proprietary)

4 Problem 3 For the following access stream, estimate the finish times for each access with the following scheduling policies: Req Time of arrival Open Closed Oracular X 0 ns Y 10 ns X ns X ns Y ns X ns Note that X, X+1, X+2, X+3 map to the same row and Y, Y+1 map to a different row in the same bank. Ignore bus and queuing latencies. The bank is precharged at the start.

5 Problem 3 For the following access stream, estimate the finish times for each access with the following scheduling policies: Req Time of arrival Open Closed Oracular X 0 ns Y 10 ns X ns X ns Y ns X ns Note that X, X+1, X+2, X+3 map to the same row and Y, Y+1 map to a different row in the same bank. Ignore bus and queuing latencies. The bank is precharged at the start.

6 Problem 4 For the following access stream, estimate the finish times for each access with the following scheduling policies: Req Time of arrival Open Closed Oracular X 10 ns X+1 15 ns Y 100 ns Y ns X ns Y ns Note that X, X+1, X+2, X+3 map to the same row and Y, Y+1 map to a different row in the same bank. Ignore bus and queuing latencies. The bank is precharged at the start.

7 Problem 4 For the following access stream, estimate the finish times for each access with the following scheduling policies: Req Time of arrival Open Closed Oracular X 10 ns X+1 15 ns Y 100 ns Y ns X ns Y ns Note that X, X+1, X+2, X+3 map to the same row and Y, Y+1 map to a different row in the same bank. Ignore bus and queuing latencies. The bank is precharged at the start.

8 Address Mapping Policies Consecutive cache lines can be placed in the same row to boost row buffer hit rates Consecutive cache lines can be placed in different ranks to boost parallelism Example address mapping policies: row:rank:bank:channel:column:blkoffset row:column:rank:bank:channel:blkoffset

9 Reads and Writes A single bus is used for reads and writes The bus direction must be reversed when switching between reads and writes; this takes time and leads to bus idling Hence, writes are performed in bursts; a write buffer stores pending writes until a high water mark is reached Writes are drained until a low water mark is reached

10 Scheduling Policies FCFS: Issue the first read or write in the queue that is ready for issue First Ready - FCFS: First issue row buffer hits if you can Close page -- early precharge Stall Time Fair: First issue row buffer hits, unless other threads are being neglected

11 Refresh Every DRAM cell must be refreshed within a 64 ms window A row read/write automatically refreshes the row Every refresh command performs refresh on a number of rows, the memory system is unavailable during that time A refresh command is issued by the memory controller once every 7.8us on average

12 Problem 5 Consider a single 4 GB memory rank that has 8 banks. Each row in a bank has a capacity of 8KB. On average, it takes 40ns to refresh one row. Assume that all 8 banks can be refreshed in parallel. For what fraction of time will this rank be unavailable? How many rows are refreshed with every refresh command?

13 Problem 5 Consider a single 4 GB memory rank that has 8 banks. Each row in a bank has a capacity of 8KB. On average, it takes 40ns to refresh one row. Assume that all 8 banks can be refreshed in parallel. For what fraction of time will this rank be unavailable? How many rows are refreshed with every refresh command? The memory has 4GB/8KB = 512K rows There are 8K refresh operations in one 64ms interval. Each refresh operation must handle 512K/8K = 64 rows Each bank must handle 8 rows One refresh operation is issued every 7.8us and the memory is unavailable for 320ns, i.e., for 4% of time.

14 Error Correction For every 64-bit word, can add an 8-bit code that can detect two errors and correct one error; referred to as SECDED – single error correct double error detect A rank is now made up of 9 x8 chips, instead of 8 x8 chips Stronger forms of error protection exist: a system is chipkill correct if it can handle an entire DRAM chip failure

15 Modern Memory System PROC.. 4 DDR3 channels 64-bit data channels 800 MHz channels 1-2 DIMMs/channel 1-4 ranks/channel

16 Cutting-Edge Systems PROC SMB.. The link into the processor is narrow and high frequency The Scalable Memory Buffer chip is a “router” that connects to multiple DDR3 channels (wide and slow) Boosts processor pin bandwidth and memory capacity More expensive, high power

17 Title Bullet