
1 DRAM background Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling, Ganesh et al., HPCA'07 CS 8501, Mario D. Marino, 02/08

2 DRAM Background

3 Typical Memory Buses: address, command, data, DIMM (Dual In-Line Memory Module) selection

4 DRAM cell

5 DRAM array

6 DRAM device or chip

7 Command/data movement: DRAM chip

8 Operations (commands): protocol, timing

9 Examples of DRAM operations (commands)

10

11 The purpose of a row access command is to move data from the DRAM arrays to the sense amplifiers. Relevant timing parameters: tRCD and tRAS

12 A column read command moves data from the array of sense amplifiers of a given bank to the memory controller. Relevant timing parameters: tCAS and tBurst

13 Precharge: a separate phase that is a prerequisite for the subsequent phases of a row access operation (bitlines set to Vcc/2 or Vcc)
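
To make the timing parameters on slides 11-13 concrete, here is a minimal C sketch of the ACTIVATE -> READ -> PRECHARGE constraints. It is not from the paper; the cycle values are invented round numbers, not a real datasheet.

/* Illustrative sketch: enforcing the timing constraints named on the
 * slides -- tRCD (ACTIVATE -> READ), tCAS (READ -> first data),
 * tRAS (ACTIVATE -> PRECHARGE) and tBurst (data transfer time).
 * Cycle values are invented example numbers. */
#include <stdio.h>

enum { tRCD = 4, tCAS = 4, tRAS = 12, tBURST = 4 }; /* in DRAM cycles */

typedef struct {
    long activated_at;   /* cycle of the last ACTIVATE to this bank */
    int  row_open;       /* 1 while a row sits in the sense amplifiers */
} Bank;

/* Earliest cycle at which a column READ may issue after an ACTIVATE. */
long earliest_read(const Bank *b) { return b->activated_at + tRCD; }

/* Earliest cycle at which PRECHARGE may issue (row restore needs tRAS). */
long earliest_precharge(const Bank *b) { return b->activated_at + tRAS; }

int main(void) {
    Bank b = { .activated_at = 100, .row_open = 1 };
    long rd  = earliest_read(&b);          /* 104: ACTIVATE + tRCD    */
    long dat = rd + tCAS;                  /* 108: READ + tCAS        */
    long end = dat + tBURST;               /* 112: end of data burst  */
    long pre = earliest_precharge(&b);     /* 112: ACTIVATE + tRAS    */
    printf("READ@%ld data@%ld..%ld PRECHARGE>=%ld\n", rd, dat, end, pre);
    return 0;
}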

14 Organization, access, protocols

15 Logical Channels: set of physical channels connected to the same memory controller

16 Examples of Logical Channels

17 Rank = set of banks

18

19 Row = DRAM page

20 Width: aggregating DRAM chips
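
As a concrete illustration of the organization levels just introduced (channel, rank, bank, row, column), here is a small C sketch that splits a physical address into those fields. The field widths are an assumed example geometry (2 channels, 2 ranks, 8 banks, 16K rows, 1K columns), not taken from the paper; real controllers use many different interleavings.

/* Illustrative sketch: decomposing a physical address into the DRAM
 * organization fields. Bit layout is an assumed example, not standard. */
#include <stdint.h>
#include <stdio.h>

typedef struct { unsigned chan, rank, bank, row, col; } DramAddr;

static DramAddr decode(uint64_t pa) {
    DramAddr a;
    a.col  = pa         & 0x3FF;   /* 10 bits: 1K columns          */
    a.bank = (pa >> 10) & 0x7;     /*  3 bits: 8 banks             */
    a.rank = (pa >> 13) & 0x1;     /*  1 bit : 2 ranks             */
    a.chan = (pa >> 14) & 0x1;     /*  1 bit : 2 logical channels  */
    a.row  = (pa >> 15) & 0x3FFF;  /* 14 bits: 16K rows (pages)    */
    return a;
}

int main(void) {
    DramAddr a = decode(0x12345678ULL);
    printf("chan=%u rank=%u bank=%u row=%u col=%u\n",
           a.chan, a.rank, a.bank, a.row, a.col);
    return 0;
}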

21 Scheduling: banks

22 Scheduling banks

23 Scheduling: ranks

24 Open-page vs. close-page Open-page: data access to and from the cells requires separate row and column commands; the row is kept open afterwards – Favors accesses to the same row (sense amps stay open) – Typical of general-purpose computers (desktop/laptop) Close-page: the row is closed (precharged) after each access – Suits intense request streams, favors random accesses – Typical of large multiprocessor/multicore systems
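
A minimal C sketch of the two policies, counting the DRAM commands one column access costs under each. The counts are the idealized ones implied by the slide; a real controller would issue the close-page READ with auto-precharge rather than a separate PRECHARGE command.

/* Illustrative sketch: command count per access under each page policy. */
#include <stdio.h>

typedef enum { OPEN_PAGE, CLOSE_PAGE } Policy;

typedef struct { int row_open; unsigned open_row; } Bank;

/* Returns the number of DRAM commands needed for one column access. */
static int issue_access(Bank *b, unsigned row, Policy p) {
    int cmds = 0;
    if (p == OPEN_PAGE && b->row_open && b->open_row == row) {
        cmds = 1;                       /* row hit: column READ only    */
    } else {
        if (b->row_open) cmds++;        /* PRECHARGE the stale row      */
        cmds += 2;                      /* ACTIVATE + column READ       */
    }
    if (p == CLOSE_PAGE) {
        cmds++;                         /* close the row right away...  */
        b->row_open = 0;
    } else {
        b->row_open = 1;                /* ...or keep it in the amps    */
        b->open_row = row;
    }
    return cmds;
}

int main(void) {
    Bank b = {0};
    int first  = issue_access(&b, 7, OPEN_PAGE); /* miss: ACT + READ = 2 */
    int second = issue_access(&b, 7, OPEN_PAGE); /* hit:  READ only  = 1 */
    printf("open-page, same row twice: %d then %d commands\n", first, second);
    return 0;
}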

25 Available Parallelism in DRAM System Organization Channel Pros: performance; different logical channels with independent memory controllers and scheduling strategies Cons: number of pins, power to deliver, smart but not adaptive firmware

26 Available Parallelism in DRAM System Organization Rank Pros: accesses can proceed in parallel in different ranks (subject to bus availability) Cons: rank-to-rank switching penalties at high frequency; globally synchronous DRAM (global clock)

27 Available Parallelism in DRAM System Organization Bank: accesses to different banks can overlap (subject to bus availability) Row: only one row per bank can be active at any time Column: depends on page management (close-page / open-page)

28 Paper: Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling, Ganesh et al., HPCA'07

29

30 Issues Parallel bus scaling: frequency, width, length, depth (many hops => latency) #memory controllers increases with CPUs, GPUs – #DIMMs/channel (depth) decreases: 4 DIMMs/channel in DDR, 2 DIMMs/channel in DDR2, 1 DIMM/channel in DDR3 Scheduling

31 Contributions Applied DDR-based memory controller policies to FBDIMM memory Evaluation of performance Exploited FBDIMM depth: rank (DIMM) parallelism Latency and bandwidth compared for FBDIMM and DDR – high channel utilization: FBDIMM differs by 7% in latency, 10% in bandwidth – low channel utilization: 25% in latency, 10% in bandwidth

32 Northbound channel: read data / Southbound channel: commands and write data AMB: pass-through switch, buffer, serial/parallel converter
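
To illustrate why channel depth matters, here is a toy C model of how read latency grows with a DIMM's position in the daisy chain, since the command travels south and the data travels north through one AMB per intervening DIMM. The numbers are invented examples, not measurements from the paper.

/* Illustrative toy model: FBDIMM read latency vs. position in the chain. */
#include <stdio.h>

enum { AMB_HOP_NS = 2, DRAM_CORE_NS = 30 }; /* assumed example values */

static int fbdimm_read_ns(int dimm_position /* 1 = nearest controller */) {
    /* southbound hops to the DIMM + DRAM access + northbound hops back */
    return dimm_position * AMB_HOP_NS + DRAM_CORE_NS
         + dimm_position * AMB_HOP_NS;
}

int main(void) {
    for (int d = 1; d <= 8; d++)
        printf("DIMM %d: ~%d ns\n", d, fbdimm_read_ns(d));
    return 0;
}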

33 Methodology DRAMsim simulator: execution-driven Detailed models of FBDIMM and DDR2 based on real standard configurations Standalone / coupled with M5/SS/Sesc Benchmarks (bandwidth-bound): SVM from Bio-Parallel (reads: 90%), SPEC-mixed: 16 independent applications (r:w = 2:1), UA from NAS (r:w = 3:2), ART (SPEC-2000, OpenMP) (r:w = 2:1)

34 Methodology (cont.) Different scheduling policies: greedy, OBF (open bank first), most/least pending, and RIFF 16-way CMP, 8 MB L2 Multi-threaded traces gathered with CMP$im SPEC traces using SimpleScalar with 1 MB L2, in-order core 1 rank/DIMM

35 High bandwidth utilization: – FBDIMM achieves better bandwidth – at the cost of larger latency

36 ART and UA: latency reduction

37 Low utilization: serialization cost dominates Depth: the FBDIMM scheduler offsets serialization

38 Overheads: queueing, southbound channel and rank availability Single-rank: higher latency

39 Scheduling Best policy: RIFF, which prioritizes reads over writes
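
A minimal C sketch of the read-first idea behind RIFF: among pending requests, issue reads before writes, oldest first. The real policy in the paper may include further tie-breaking, so treat this as an assumption-laden illustration.

/* Illustrative sketch: read-priority request selection. */
#include <stdio.h>

typedef struct { int is_read; long arrival; } Req;

/* Pick the index of the next request to issue, or -1 if the queue is empty. */
static int riff_pick(const Req *q, int n) {
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (best < 0 ||
            (q[i].is_read && !q[best].is_read) ||      /* reads first  */
            (q[i].is_read == q[best].is_read &&
             q[i].arrival < q[best].arrival))          /* then oldest  */
            best = i;
    }
    return best;
}

int main(void) {
    Req q[] = { {0, 10}, {1, 20}, {1, 15}, {0, 5} };
    int i = riff_pick(q, 4);
    printf("issue %s arrived at %ld\n", q[i].is_read ? "read" : "write",
           q[i].arrival);                 /* picks the read arrived at 15 */
    return 0;
}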

40 Bandwidth is less sensitive than latency to the scheduling policy Higher latency in open-page mode More channels => lower per-channel utilization

