1 Lecture 1: Introduction and Memory Systems CS 7810
Course organization:
• 5 lectures on memory systems
• 5 lectures on cache coherence and consistency
• 2 lectures on transactional memory
• 2 lectures on interconnection networks
• 4 lectures on caches
• 3 lectures on core design
• 1 lecture on parallel algorithms
• 3 lectures: student paper presentations
• 3 lectures: student project presentations

2 Logistics
Reference texts:
• Parallel Computer Architecture, Culler, Singh, Gupta (a more recent reference is Fundamentals of Parallel Computer Architecture, Yan Solihin)
• Principles and Practices of Interconnection Networks, Dally & Towles
• Introduction to Parallel Algorithms and Architectures, Leighton
• Memory Systems: Cache, DRAM, Disk, Jacob et al.
• A number of books in the Morgan and Claypool Synthesis Lecture series

3 More Logistics
• Projects: simulation-based, creative, teams of up to 4 students; be prepared to spend time towards the middle and end of the semester – more details in a few weeks
• Final project report due in late April (will undergo conference-style peer review); also watch out for ISCA workshop deadlines
• One assignment on memory scheduling due in early Feb
• Grading:
  • 50% project
  • 20% assignment
  • 10% paper presentation
  • 20% take-home final

4 DRAM Main Memory
• Main memory is built out of DRAM cells, which have much higher storage density than SRAM cells
• DRAM cells lose their state over time and must be refreshed periodically – hence the name Dynamic
• DRAM access suffers from long access times and high energy overheads
• Since the pin count on a processor chip is not expected to increase much, we will hit a memory bandwidth wall
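
As a rough sketch of what refresh costs, the Python calculation below estimates the fraction of time a rank is busy refreshing; the 64 ms retention window, 7.8 µs refresh interval (tREFI), and ~350 ns per-command refresh latency (tRFC) are typical DDR3/DDR4-era values assumed here for illustration, not taken from the slides.

```python
# Sketch: fraction of time a DRAM rank is unavailable due to refresh.
# Assumed, typical DDRx parameters (illustrative, not from the slides):
RETENTION_MS = 64      # every row must be refreshed within this window
T_REFI_US = 7.8        # interval between refresh commands (64 ms / 8192)
T_RFC_NS = 350         # time one refresh command occupies the rank

refresh_cmds = (RETENTION_MS * 1000) / T_REFI_US      # ~8192 per window
busy_fraction = T_RFC_NS / (T_REFI_US * 1000)

print(f"refresh commands per {RETENTION_MS} ms window: {refresh_cmds:.0f}")
print(f"fraction of time lost to refresh: {busy_fraction:.1%}")  # ~4.5%
```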

5 Memory Architecture
[Figure: processor with memory controller driving address/cmd and data buses to DIMMs; each DIMM contains banks with row buffers]
• DIMM: a PCB with DRAM chips on the back and front
• Rank: a collection of DRAM chips that work together to respond to a request and keep the data bus full; a 64-bit data bus needs eight x8 DRAM chips, four x16 DRAM chips, etc.
• Bank: a subset of a rank that is busy during one request
• Row buffer: the last row (say, 8 KB) read from a bank; acts like a cache
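
A couple of lines of Python capture the rank arithmetic above (a minimal sketch; the 64-bit bus width is from the slide, and the chip widths are the common x4/x8/x16 options):

```python
# Sketch: how many xN DRAM chips are needed to fill a 64-bit data bus.
BUS_WIDTH = 64  # bits supplied per transfer by one rank (from the slide)

for chip_width in (4, 8, 16):      # an xN chip contributes N bits per beat
    chips_per_rank = BUS_WIDTH // chip_width
    print(f"x{chip_width} chips -> {chips_per_rank} chips per rank")
# x8 -> 8 chips, x16 -> 4 chips, matching the slide
```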

6 DRAM Array Access
[Figure: row decoder, 4096 x 4096 bit array, row buffer, column decoder]
• A 16 Mb DRAM array = a 4096 x 4096 array of bits
• 12 row address bits arrive first, signaled by the Row Access Strobe (RAS); all 4096 bits of the selected row are read out into the row buffer
• 12 column address bits arrive next, signaled by the Column Access Strobe (CAS)
• Eight bits are returned to the CPU, one per cycle
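
As a quick sanity check on these numbers (a sketch; all values are from the slide):

```python
import math

# Sketch: derive the RAS/CAS address split for the 16 Mb array above.
rows, cols = 4096, 4096
array_bits = rows * cols                 # 16,777,216 bits = 16 Mb
row_addr_bits = int(math.log2(rows))     # 12 bits, sent with RAS
col_addr_bits = int(math.log2(cols))     # 12 bits, sent with CAS

print(f"array: {array_bits >> 20} Mb")
print(f"row address bits: {row_addr_bits}, column address bits: {col_addr_bits}")
```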

7 Salient Points I
• DIMM, rank, bank, and array form a hierarchy in the storage organization
• Because of electrical constraints, only a few DIMMs can be attached to a bus
• Ranks help increase the capacity on a DIMM
• Multiple DRAM chips are used for every access to improve data transfer bandwidth
• Multiple banks are provided so that we can work on different requests simultaneously

8 Salient Points II
• To maximize density, arrays within a bank are made large → rows are wide → row buffers are wide (an 8 KB read for a 64 B request)
• Each array provides a single bit to the output pin in a cycle (for high density, and because there are few pins)
• DRAM chips are described as xN, where N refers to the number of output pins; one rank may be composed of eight x8 DRAM chips (the data bus is 64 bits)
• The memory controller schedules memory accesses to maximize row buffer hit rates and bank/rank parallelism
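
The slides don't name a specific scheduling policy, but a widely used one with exactly this goal is FR-FCFS (first-ready, first-come-first-served): prefer the oldest request that hits in an open row buffer, else the oldest request overall. A minimal Python sketch, purely illustrative:

```python
# Sketch: FR-FCFS scheduling (first-ready FCFS) - prioritize row buffer
# hits, then age. Illustrative; real controllers track much more state.
def pick_next(queue, open_rows):
    """queue: list of (arrival_time, bank, row), oldest first.
    open_rows: dict mapping each bank to its currently open row (or None)."""
    # First-ready: the oldest request that hits in an open row buffer
    for req in queue:
        _, bank, row = req
        if open_rows.get(bank) == row:
            return req
    # Fall back to plain FCFS: the oldest request overall
    return queue[0] if queue else None

reqs = [(0, 0, 7), (1, 1, 3), (2, 0, 5)]
print(pick_next(reqs, {0: 5, 1: 2}))  # -> (2, 0, 5), the row buffer hit
```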

9 Salient Points III
• Banks and ranks offer memory parallelism
• Row buffers act as a cache within DRAM
  • Row buffer hit: ~20 ns access time (must only move data from the row buffer to the pins)
  • Empty row buffer access: ~40 ns (must first read the arrays, then move data from the row buffer to the pins)
  • Row buffer conflict: ~60 ns (must first write back the existing row, then read the new row, then move data to the pins)
• In addition, a request must wait in the queue (tens of nanoseconds) and incur address/cmd/data transfer delays (~10 ns)
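
These three latency classes translate directly into a tiny row buffer model (a sketch using the slide's approximate 20/40/60 ns figures; queuing and address/cmd/data transfer delays are ignored):

```python
# Sketch: classify each access against a bank's row buffer and report latency.
HIT, EMPTY, CONFLICT = 20, 40, 60   # ns (approximate, from the slide)

def access(open_row_state, row):
    """open_row_state: one-element list holding the open row (or None)."""
    if open_row_state[0] == row:
        return HIT                   # data moves row buffer -> pins
    lat = EMPTY if open_row_state[0] is None else CONFLICT
    open_row_state[0] = row          # (a conflict also writes back the old row)
    return lat

bank = [None]
for r in (3, 3, 7, 3):
    print(f"row {r}: {access(bank, r)} ns")   # 40, 20, 60, 60
```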

10 Technology Trends
• Improvements in technology (smaller devices) → DRAM capacities double every two years, but latency does not change much
• Power wall: 25-40% of datacenter power can be attributed to the DRAM system
• DRAM will soon hit a density wall and may have to be replaced by other technologies (phase change memory, STT-RAM)
• The pin count on a chip is not increasing → bandwidth limitations

11 Power Wall
Many contributors to memory power (Micron power calculator):
• Overfetch
• Channel
• Buffer chips and SerDes
• Background power (output drivers)
• Leakage and refresh

12 Overfetch
Overfetch is caused by multiple factors (a worked ratio follows the list):
• Each array is large (fewer peripherals → more density)
• Involving more chips per access → more data transfer pin bandwidth
• More overfetch → more prefetch; helps apps with locality
• Involving more chips per access → less data loss when a chip fails → lower overhead for reliability
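
To put a number on overfetch (a minimal sketch; the 8 KB row and 64 B request sizes are carried over from Salient Points II):

```python
# Sketch: overfetch ratio for a conventional DRAM access.
ROW_BUFFER_BYTES = 8 * 1024   # an entire 8 KB row is activated
CACHE_LINE_BYTES = 64         # but only one 64 B line was requested

overfetch = ROW_BUFFER_BYTES // CACHE_LINE_BYTES
print(f"data activated per useful byte: {overfetch}x")   # 128x
```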

13 Re-Designing Arrays
[Figure: array redesign from Udipi et al., ISCA’10]

14 Selective Bitline Activation
• Additional logic per array so that only the relevant bitlines are read out
• Essentially results in finer-grain partitioning of the DRAM arrays
• Two papers in 2010: Udipi et al., ISCA’10, and Cooper-Balis and Jacob, IEEE Micro

15 Rank Subsetting
• Instead of using all the chips in a rank to read out 64-bit words every cycle, form smaller parallel ranks
• Increases data transfer time; reduces the size of the row buffer
• But lower energy per row read, and compatible with modern DRAM chips
• Increases the number of banks and hence promotes parallelism (reduces queuing delays) – see the sketch below
• Initial ideas proposed in Mini-Rank (MICRO 2008) and MC-DIMM (CAL 2008 and SC 2009)
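
The trade-offs above can be made concrete with a small calculation (a sketch; the 64 B request, eight-x8-chip rank, and 8 KB row buffer are carried over from earlier slides, and the subset sizes are illustrative):

```python
# Sketch: rank subsetting trades longer transfers for smaller row
# activations and more independent "mini-ranks".
REQUEST_BITS = 64 * 8         # one 64 B cache line
FULL_RANK_CHIPS = 8           # eight x8 chips = a 64-bit bus
ROW_KB_PER_CHIP = 1           # 8 KB row buffer spread over 8 chips

for chips in (8, 4, 2, 1):    # full rank down to a single-chip subset
    bus_bits = chips * 8                  # narrower bus per subset
    beats = REQUEST_BITS // bus_bits      # more beats to move the line
    row_kb = chips * ROW_KB_PER_CHIP      # smaller row per activation
    subsets = FULL_RANK_CHIPS // chips    # independent subsets per rank
    print(f"{chips} chips: {beats} beats, {row_kb} KB row, {subsets} subsets")
```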

16 DRAM Variants – LPDRAM and RLDRAM
• LPDDR (low power) and RLDRAM (low latency)
• [Table: latency and power data from Chatterjee et al. (MICRO 2012)]

17 LPDRAM
• A low-power device operating at lower voltages and currents
• Efficient low-power modes, fast exit from low-power mode
• Lower bus frequencies
• Typically used in mobile systems (not in DIMMs)
