Lecture 12: DRAM Basics
Today: DRAM terminology and basics, energy innovations

DRAM Main Memory
Main memory is built from DRAM cells, which have much higher storage density than SRAM
DRAM cells lose their state over time – they must be refreshed periodically, hence the name Dynamic
DRAM access suffers from long access times and high energy overhead
Since the pin count on a processor chip is not expected to increase much, we will hit a memory bandwidth wall

DRAM Organization
[Figure: the on-chip memory controller connects over a memory bus (channel) to DIMMs; a DIMM contains ranks of DRAM chips (devices); each chip contains banks; each bank is built from arrays; each array supplies 1/8th of the row buffer and one word of data output.]

DRAM Array Access
A 16Mb DRAM array = 4096 x 4096 array of bits
The 12 row address bits arrive first (Row Access Strobe, RAS); all 4096 bits of the selected row are read out into the Row Buffer
The 12 column address bits arrive next (Column Access Strobe, CAS); the column decoder selects the requested bits from the row buffer
Eight bits are returned to the CPU, one per cycle
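To make the address split concrete, here is a minimal sketch (not from the slides) of how a 24-bit bit-address within this 4096 x 4096 array divides into row and column parts; the "low bits = column" ordering is an assumption for illustration.

```python
# Minimal sketch: split a 24-bit address within one 4096 x 4096 (16Mb)
# array into a 12-bit row (RAS) and a 12-bit column (CAS) address.
ROW_BITS = 12   # 2^12 = 4096 rows
COL_BITS = 12   # 2^12 = 4096 bits per row

def split_array_address(addr: int):
    """Return (row, col) for a bit address inside the array."""
    col = addr & ((1 << COL_BITS) - 1)                 # sent with CAS
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)   # sent first, with RAS
    return row, col

# Two addresses that share a row: the second needs only the CAS step,
# since the whole row already sits in the row buffer.
print(split_array_address(0x00A123))   # (10, 0x123)
print(split_array_address(0x00A456))   # (10, 0x456)
```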

Salient Points I
DIMM, rank, bank, and array form a hierarchy in the storage organization
Because of electrical constraints, only a few DIMMs can be attached to a bus
Ranks help increase the capacity on a DIMM
Multiple DRAM chips are used for every access to improve data transfer bandwidth
Multiple banks are provided so that different requests can be worked on simultaneously
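To make the channel/rank/bank hierarchy concrete, here is a minimal sketch of one possible physical-address decomposition; the field widths and their ordering are illustrative assumptions, not a standard mapping.

```python
# Hypothetical address mapping for illustration only; real memory
# controllers pick field widths and ordering to balance load across
# channels, ranks, and banks.
FIELD_WIDTHS = [            # low-order fields listed first
    ("column",  7),         # 128 cache blocks per 8KB row
    ("bank",    3),         # 8 banks per rank
    ("rank",    2),         # 4 ranks per channel
    ("row",    15),         # 32K rows per bank
    ("channel", 1),         # 2 channels
]

def decode(phys_addr: int, block_bits: int = 6):
    """Map a physical address to (channel, rank, bank, row, column),
    at 64B cache-block granularity."""
    addr = phys_addr >> block_bits      # drop the block-offset bits
    fields = {}
    for name, width in FIELD_WIDTHS:
        fields[name] = addr & ((1 << width) - 1)
        addr >>= width
    return fields

print(decode(0x12345678))
```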

Salient Points II
To maximize density, arrays within a bank are made large → rows are wide → row buffers are wide (an 8KB read for a 64B request)
Each array provides a single bit to the output pin in a cycle (for high density and because there are few pins)
DRAM chips are described as xN, where N refers to the number of output pins; one rank may be composed of eight x8 DRAM chips (the data bus is 64 bits)
The memory controller schedules memory accesses to maximize row buffer hit rates and bank/rank parallelism
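As a quick sanity check on these numbers, the sketch below works out the chips-per-rank and row-buffer-to-request arithmetic implied by the bullets (the values are the ones stated on the slide).

```python
# Back-of-the-envelope arithmetic for the numbers above.
chip_width_bits  = 8                       # an "x8" chip has 8 data pins
bus_width_bits   = 64                      # width of the rank's data bus
chips_per_rank   = bus_width_bits // chip_width_bits   # -> 8 chips

row_buffer_bytes = 8 * 1024                # 8KB row buffer across the rank
request_bytes    = 64                      # one cache block
fetch_ratio      = row_buffer_bytes // request_bytes   # 8KB read for 64B -> 128x

print(chips_per_rank, fetch_ratio)         # 8, 128
```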

Salient Points III
Banks and ranks offer memory parallelism
Row buffers act as a cache within DRAM
  Row buffer hit: ~20 ns access time (must only move data from the row buffer to the pins)
  Empty row buffer access: ~40 ns (must first read the arrays, then move data from the row buffer to the pins)
  Row buffer conflict: ~60 ns (must first write back the existing row, then read the new row, then move data to the pins)
In addition, requests must wait in the queue (tens of nanoseconds) and incur address/cmd/data transfer delays (~10 ns)
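The sketch below is a toy per-bank row-buffer model that reproduces the ~20/40/60 ns hit/empty/conflict cases above under an open-page policy; queuing and transfer delays are ignored.

```python
# Toy row-buffer model using the latencies from the slide.
HIT_NS, EMPTY_NS, CONFLICT_NS = 20, 40, 60

class Bank:
    def __init__(self):
        self.open_row = None            # row currently held in the row buffer

    def access(self, row: int) -> int:
        """Return the access latency in ns and update the row buffer."""
        if self.open_row == row:
            return HIT_NS               # row buffer hit
        latency = EMPTY_NS if self.open_row is None else CONFLICT_NS
        self.open_row = row             # the new row now occupies the buffer
        return latency

bank = Bank()
print([bank.access(r) for r in (5, 5, 9, 5)])   # [40, 20, 60, 60]
```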

Technology Trends
Improvements in technology (smaller devices) → DRAM capacities double every two years, but latency does not change much
Power wall: 25-40% of datacenter power can be attributed to the DRAM system
Will soon hit a density wall; may have to be replaced by other technologies (phase change memory, STT-RAM)
Interconnects may have to be photonic to overcome the bandwidth limitation imposed by pins on the chip

Latency and Power Wall
Latency and power can both be improved by employing smaller arrays; this incurs a penalty in density and cost
Latency and power can both be improved by increasing the row buffer hit rate; this requires intelligent mapping of data to rows, clever scheduling of requests, etc.
Power can be reduced by minimizing overfetch – either read fewer chips or read parts of a row; this incurs penalties in area or bandwidth
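To see why the row buffer hit rate matters for energy, here is a sketch using made-up per-operation energy values (placeholders, not measured numbers).

```python
# Illustrative (made-up) per-operation energies, in nanojoules.
E_ACTIVATE  = 2.0   # read a full row from the arrays into the row buffer
E_COLUMN    = 0.5   # move the requested column data to the pins
E_PRECHARGE = 1.0   # close the open row before activating a new one

def avg_energy_per_access(hit_rate: float) -> float:
    """Hits pay only the column access; misses also pay an activate
    and, eventually, a precharge for the row they displace."""
    miss_rate = 1.0 - hit_rate
    return E_COLUMN + miss_rate * (E_ACTIVATE + E_PRECHARGE)

for hr in (0.2, 0.5, 0.8):
    print(f"hit rate {hr:.0%}: {avg_energy_per_access(hr):.2f} nJ/access")
```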

Overfetch
Overfetch is caused by multiple factors:
  Each array is large (fewer peripherals → more density)
  Involving more chips per access → more data transfer pin bandwidth
  More overfetch → more prefetch; helps apps with locality
  Involving more chips per access → less data loss when a chip fails → lower overhead for reliability

Re-Designing Arrays
Udipi et al., ISCA’10

Selective Bitline Activation
Additional logic per array so that only the relevant bitlines are read out
Essentially results in finer-grain partitioning of the DRAM arrays
Two papers in 2010: Udipi et al., ISCA’10; Cooper-Balis and Jacob, IEEE Micro

Rank Subsetting
Instead of using all chips in a rank to read out 64-bit words every cycle, form smaller parallel ranks
Increases data transfer time; reduces the size of the row buffer
But, lower energy per row read, and compatible with modern DRAM chips
Increases the number of banks and hence promotes parallelism (reduces queuing delays)
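A rough sketch of the transfer-time vs. row-buffer-size tradeoff, assuming x8 chips, 64B requests, and 1KB of row per chip (all numbers are illustrative, not tied to a specific DRAM standard).

```python
# Illustrative rank-subsetting arithmetic.
def rank_subset_stats(chips_per_subset: int,
                      chip_width_bits: int = 8,
                      request_bytes: int = 64,
                      row_bytes_per_chip: int = 1024):
    bus_bits   = chips_per_subset * chip_width_bits
    beats      = (request_bytes * 8) // bus_bits        # bus transfers per 64B block
    row_buffer = chips_per_subset * row_bytes_per_chip  # row opened per access
    return beats, row_buffer

for chips in (8, 4, 2, 1):      # full rank down to a single chip
    beats, rb = rank_subset_stats(chips)
    print(f"{chips} chip(s): {beats:2d} beats per request, {rb // 1024}KB row buffer")
```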
