1 COMPUTER ARCHITECTURE CS 6354: Main Memory
Samira Khan, University of Virginia, Oct 9, 2017
The content and concept of this course are adapted from CMU ECE 740.

2 AGENDA
Logistics
Review from last lecture
Main Memory

3 LOGISTICS
Oct 11: First student paper presentation
Oct 13: Two papers
  Neural Acceleration for General-Purpose Approximate Programs, MICRO 2012
  RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization, MICRO 2013
Oct 13: Reviews due
  Presenters do not need to submit reviews

4 PRESENTATION GUIDELINES
30 mins for presentation and 5 mins for Q&A
Start with the authors' slides, then modify them to make them yours
Format is similar to reviews:
  Spend significant time on the background and problem
  Key idea
  Mechanism
  Results
  Pros, cons, what you liked, future work
Send the slides at least one week before the presentation
Practice at least 3 times before the class
If necessary, first write it down and then practice

5 PRESENTATION GUIDELINES
Background and problem statement (10 mins)
Key idea/mechanism (8-10 mins)
Results (2-3 mins)
Pros/cons/discussion (5-7 mins)
Q&A (5 mins)

6 DRAM CELL OPERATION
1. A DRAM cell stores data as charge
2. A DRAM cell is refreshed every 64 ms
[Figure: a DRAM cell shown as a logical view (left) and a vertical cross section (right), labeling the transistor, capacitor, bitline, and bitline contact]
A DRAM cell stores data as charge in a capacitor; the amount of charge indicates either data zero or one. A transistor acts as a switch connecting the capacitor to a line called the bitline. When a DRAM cell is read, the transistor is turned on and the capacitor charge is sensed through the bitline.

7 THE DRAM SCALING PROBLEM
DRAM stores charge in a capacitor (charge-based memory)
  Capacitor must be large enough for reliable sensing
  Access transistor should be large enough for low leakage and high retention time
  Scaling beyond 40-35 nm (2013) is challenging [ITRS, 2009]
DRAM capacity, cost, and energy/power hard to scale

8 SOLUTIONS TO THE DRAM SCALING PROBLEM
Two potential solutions:
  1. Tolerate DRAM (by taking a fresh look at it)
  2. Enable emerging memory technologies to eliminate/minimize DRAM
Do both: hybrid memory systems

9 SOLUTION 1: TOLERATE DRAM
Overcome DRAM shortcomings with:
  System-DRAM co-design
  Novel DRAM architectures, interfaces, functions
  Better waste management (efficient utilization)
Key issues to tackle:
  Reduce refresh energy
  Improve bandwidth and latency
  Reduce waste
  Enable reliability at low cost

10 SOLUTION 2: EMERGING MEMORY TECHNOLOGIES
Some emerging resistive memory technologies seem more scalable than DRAM (and they are non-volatile)
Example: Phase Change Memory
  Expected to scale to 9 nm (2022 [ITRS])
  Expected to be denser than DRAM: can store multiple bits/cell
But emerging technologies have shortcomings as well
  Can they be enabled to replace/augment/surpass DRAM?
Even if DRAM continues to scale, there could be benefits to examining and enabling these new technologies.

11 HYBRID MEMORY SYSTEMS
[Figure: a CPU connected through a DRAM controller to DRAM and through a PCM controller to Phase Change Memory (or Tech. X)]
DRAM: fast, durable; but small, leaky, volatile, high-cost
PCM: large, non-volatile, low-cost; but slow, wears out, high active energy
Hardware/software manage data allocation and movement to achieve the best of multiple technologies

12 WHY MEMORY HIERARCHY?
We want both fast and large
But we cannot achieve both with a single level of memory
Idea: Have multiple levels of storage (progressively bigger and slower as the levels are farther from the processor) and ensure most of the data the processor needs is kept in the fast(er) level(s)

13 MEMORY HIERARCHY
Fundamental tradeoff:
  Fast memory: small
  Large memory: slow
Idea: Memory hierarchy
  Latency, cost, size, bandwidth
[Figure: CPU with register file and cache, backed by main memory (DRAM), backed by hard disk]

14 A MODERN MEMORY HIERARCHY
Register file: 32 words, sub-nsec (manual/compiler register spilling)
L1 cache: ~32 KB, ~nsec (automatic HW cache management)
L2 cache: 512 KB ~ 1 MB, many nsec
L3 cache, ...
Main memory (DRAM): GBs, ~100 nsec
Swap disk: 100 GB, ~10 msec (automatic demand paging)

15 THE DRAM SUBSYSTEM

16 DRAM SUBSYSTEM ORGANIZATION
Channel → DIMM → Rank → Chip → Bank → Row/Column

17 PAGE MODE DRAM
A DRAM bank is a 2D array of cells: rows x columns
A "DRAM row" is also called a "DRAM page"
"Sense amplifiers" are also called the "row buffer"
Each address is a <row, column> pair
Access to a "closed row":
  Activate command opens the row (places it into the row buffer)
  Read/write command reads/writes a column in the row buffer
  Precharge command closes the row and prepares the bank for the next access
Access to an "open row":
  No need for an activate command
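To make the open/closed-row command sequences concrete, here is a minimal C sketch of a bank under an open-row policy (the Bank struct and printouts are illustrative, not a real controller interface):

    #include <stdbool.h>
    #include <stdio.h>

    /* Illustrative model: a bank tracks which row, if any, is in its row buffer. */
    typedef struct {
        bool     row_open;  /* is some row currently in the row buffer? */
        unsigned open_row;  /* which row, if so                         */
    } Bank;

    static void access_bank(Bank *b, unsigned row, unsigned col) {
        if (b->row_open && b->open_row == row) {          /* open row: hit */
            printf("READ row %u col %u (row-buffer hit)\n", row, col);
            return;
        }
        if (b->row_open)                                  /* conflict: close old row */
            printf("PRE  (close row %u)\n", b->open_row);
        printf("ACT  row %u\n", row);                     /* open the new row */
        b->row_open = true;
        b->open_row = row;
        printf("READ row %u col %u\n", row, col);
    }

    int main(void) {
        Bank b = { false, 0 };
        access_bank(&b, 0, 0);   /* closed row: ACT + READ     */
        access_bank(&b, 0, 85);  /* open-row hit: READ only    */
        access_bank(&b, 1, 0);   /* conflict: PRE + ACT + READ */
        return 0;
    }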

18 DRAM BANK OPERATION
[Figure: a bank as a grid of rows and columns with a row decoder, row buffer, and column mux. An access to (Row 0, Column 0) activates Row 0 into the row buffer; subsequent accesses to other columns of Row 0 (e.g., Column 1, Column 85) are row-buffer HITs; an access to Row 1 is a row-buffer CONFLICT that requires closing Row 0 and activating Row 1 before data can be read]

19 THE DRAM CHIP
Consists of multiple banks (2-16 in Synchronous DRAM)
Banks share command/address/data buses
The chip itself has a narrow interface (4-16 bits per read)

20 DRAM RANK AND MODULE
Rank: multiple chips operated together to form a wide interface
  All chips comprising a rank are controlled at the same time
    Respond to a single command
    Share address and command buses, but provide different data
A DRAM module consists of one or more ranks
  E.g., DIMM (dual inline memory module)
  This is what you plug into your motherboard
If we have chips with an 8-bit interface, to read 8 bytes in a single access, use 8 chips in a DIMM
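The last point is simple arithmetic; a small C sketch (values taken from the slide: a 64-bit rank built from x8 chips) shows which data bits each chip drives:

    #include <stdio.h>

    int main(void) {
        const int rank_bits = 64;                /* rank data bus width */
        const int chip_bits = 8;                 /* per-chip interface  */
        const int chips = rank_bits / chip_bits; /* = 8 chips per rank  */
        for (int c = 0; c < chips; c++)
            printf("chip %d drives data bits <%d:%d>\n",
                   c, c * chip_bits, c * chip_bits + chip_bits - 1);
        return 0;
    }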

21 A 64-BIT WIDE DIMM (ONE RANK)

22 A 64-BIT WIDE DIMM (ONE RANK)
Advantage:
  Acts like a high-capacity DRAM chip with a wide interface
  Flexibility: the memory controller does not need to deal with individual chips
Disadvantage:
  Granularity: accesses cannot be smaller than the interface width

23 DRAM CHANNELS
2 independent channels: 2 memory controllers (above)
2 dependent/lockstep channels: 1 memory controller with a wide interface (not shown above)

24 THE DRAM SUBSYSTEM: THE TOP-DOWN VIEW

25 DRAM SUBSYSTEM ORGANIZATION
Channel → DIMM → Rank → Chip → Bank → Row/Column

26 THE DRAM SUBSYSTEM: "CHANNEL"
[Figure: a processor with two memory channels, each connected to a DIMM (dual in-line memory module)]

27 BREAKING DOWN A DIMM
[Figure: side view of a DIMM (dual in-line memory module), showing the front and back of the module]

28 BREAKING DOWN A DIMM
[Figure: side view of a DIMM; the front of the DIMM is Rank 0, a collection of 8 chips, and the back is Rank 1]

29 RANK
[Figure: Rank 0 (front) and Rank 1 (back) share the memory channel; each rank connects to the Addr/Cmd bus, a chip-select signal CS <0:1>, and the 64-bit Data <0:63> bus]

30 BREAKING DOWN A RANK
[Figure: Rank 0 consists of Chip 0 through Chip 7; each chip supplies 8 bits of Data <0:63> (Chip 0 drives <0:7>, Chip 1 drives <8:15>, ..., Chip 7 drives <56:63>)]

31 BREAKING DOWN A CHIP
[Figure: Chip 0 contains 8 banks (Bank 0, ...), all sharing the chip's 8-bit data interface <0:7>]

32 BREAKING DOWN A BANK
[Figure: Bank 0 is a 2D array of rows (row 0 through row 16k-1), each row divided into 1B columns. An activated row is held in the row buffer, from which one 1B column at a time is selected onto the chip's 8-bit interface <0:7>]
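A toy C model of this structure (array sizes shrunk for the sketch; the slide's bank has ~16k rows of 1B columns):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define NUM_ROWS 16   /* stand-in for the slide's 16k rows */
    #define NUM_COLS 8    /* 1-byte columns                    */

    static uint8_t cells[NUM_ROWS][NUM_COLS]; /* the 2D cell array       */
    static uint8_t row_buffer[NUM_COLS];      /* sense amps / row buffer */

    /* Activate: move an entire row into the row buffer. */
    static void activate(int row) { memcpy(row_buffer, cells[row], NUM_COLS); }

    /* Read: the column mux selects 1B from the row buffer onto <0:7>. */
    static uint8_t read_col(int col) { return row_buffer[col]; }

    int main(void) {
        cells[3][5] = 0x42;
        activate(3);
        printf("read row 3, col 5: %#x\n", read_col(5));
        return 0;
    }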

33 DRAM SUBSYSTEM ORGANIZATION
Channel → DIMM → Rank → Chip → Bank → Row/Column

34 EXAMPLE: TRANSFERRING A CACHE BLOCK
[Figure: the physical memory space from 0x00 to 0xFFFF...F; the 64B cache block at address 0x40 is mapped to Channel 0, DIMM 0, Rank 0]

35 EXAMPLE: TRANSFERRING A CACHE BLOCK
[Figure: the 64B cache block at 0x40 maps across Chip 0 through Chip 7 of Rank 0; each chip contributes 8 bits of Data <0:63>]

36 EXAMPLE: TRANSFERRING A CACHE BLOCK
[Figure: Row 0, Col 0 is selected in every chip of Rank 0]

37 EXAMPLE: TRANSFERRING A CACHE BLOCK
[Figure: each chip supplies 1B from Row 0, Col 0, so the first 8B of the block are transferred over Data <0:63>]

38 EXAMPLE: TRANSFERRING A CACHE BLOCK
[Figure: Row 0, Col 1 is selected next in every chip]

39 EXAMPLE: TRANSFERRING A CACHE BLOCK
[Figure: each chip supplies 1B from Row 0, Col 1, transferring the next 8B of the block]

40 EXAMPLE: TRANSFERRING A CACHE BLOCK
[Figure: the sequence continues column by column]
A 64B cache block takes 8 I/O cycles to transfer. During the process, 8 columns are read sequentially.
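The arithmetic behind the animation, as a C sketch (numbers taken from the slides: 8 chips, 1B per chip per cycle, a 64B block):

    #include <stdio.h>

    int main(void) {
        const int chips = 8, block_bytes = 64;
        const int bytes_per_cycle = chips * 1;            /* 8B across Data <0:63> */
        const int cycles = block_bytes / bytes_per_cycle; /* = 8 I/O cycles        */
        for (int col = 0; col < cycles; col++)
            printf("cycle %d: column %d -> 8B (1B from each of 8 chips)\n",
                   col, col);
        return 0;
    }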

41 LATENCY COMPONENTS: BASIC DRAM OPERATION
CPU → controller transfer time
Controller latency
  Queuing & scheduling delay at the controller
  Access converted to basic commands
Controller → DRAM transfer time
DRAM bank latency (sketched in code below)
  Simple CAS (column address strobe) if row is "open" OR
  RAS (row address strobe) + CAS if array precharged OR
  PRE + RAS + CAS (worst case)
DRAM → controller transfer time
  Bus latency (BL)
Controller → CPU transfer time
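As a back-of-the-envelope C sketch, the three bank-latency cases add up like this (all timing values below are made-up placeholders, not datasheet numbers):

    #include <stdio.h>

    int main(void) {
        /* Assumed, illustrative timings in ns -- not from any datasheet. */
        int tPRE = 15, tRAS = 15, tCAS = 15;
        int open_row     = tCAS;                /* row already open */
        int closed_row   = tRAS + tCAS;         /* array precharged */
        int row_conflict = tPRE + tRAS + tCAS;  /* worst case       */
        printf("open %d ns, closed %d ns, conflict %d ns\n",
               open_row, closed_row, row_conflict);
        return 0;
    }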

42 MULTIPLE BANKS (INTERLEAVING) AND CHANNELS
Multiple banks:
  Enable concurrent DRAM accesses
  Bits in the address determine which bank an address resides in (see the sketch after this list)
Multiple independent channels serve the same purpose
  But they are even better because they have separate data buses
  Increased bus bandwidth
Enabling more concurrency requires reducing:
  Bank conflicts
  Channel conflicts
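A minimal C sketch of the address-bits point (the bit positions below are arbitrary examples, not a fixed mapping; real controllers choose the interleaving):

    #include <stdint.h>
    #include <stdio.h>

    #define CHANNEL_SHIFT 6  /* example: bit 6 picks the channel (after a 64B block offset) */
    #define BANK_SHIFT    7  /* example: bits 9:7 pick one of 8 banks                       */

    int main(void) {
        uint64_t addr = 0x12345678;
        unsigned channel = (addr >> CHANNEL_SHIFT) & 0x1;  /* 2 channels */
        unsigned bank    = (addr >> BANK_SHIFT)    & 0x7;  /* 8 banks    */
        printf("addr %#llx -> channel %u, bank %u\n",
               (unsigned long long)addr, channel, bank);
        return 0;
    }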

43 HOW MULTIPLE BANKS HELP

44 DRAM REFRESH (I)
DRAM capacitor charge leaks over time
The memory controller needs to read each row periodically to restore the charge
  Activate + precharge each row every N ms
  Typical N = 64 ms
Implications on performance?
  -- DRAM bank unavailable while being refreshed
  -- Long pause times: if we refresh all rows in a burst, every 64 ms the DRAM will be unavailable until the refresh ends
Burst refresh: all rows refreshed immediately after one another
Distributed refresh: each row refreshed at a different time, at regular intervals (see the sketch below)
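The distributed-refresh schedule is a one-line computation, sketched here in C (the 8192-rows figure is an assumption for illustration, not from the slides):

    #include <stdio.h>

    int main(void) {
        const double window_ms = 64.0;  /* refresh window N        */
        const int    rows      = 8192;  /* assumed rows to refresh */
        /* Spread refreshes evenly: one row every 64 ms / 8192 = 7.8 us. */
        printf("refresh one row every %.2f us\n", window_ms * 1000.0 / rows);
        return 0;
    }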

45 DRAM REFRESH (II)
Distributed refresh eliminates long pause times
How else can we reduce the effect of refresh on performance?
Can we reduce the number of refreshes?

46 DOWNSIDES OF DRAM REFRESH
-- Energy consumption: each refresh consumes energy
-- Performance degradation: DRAM rank/bank unavailable while being refreshed
-- QoS/predictability impact: (long) pause times during refresh
-- Refresh rate limits DRAM density scaling
Liu et al., "RAIDR: Retention-Aware Intelligent DRAM Refresh," ISCA 2012.

47 COMPUTER ARCHITECTURE CS 6354: Main Memory
Samira Khan, University of Virginia, Oct 9, 2017
The content and concept of this course are adapted from CMU ECE 740.

