1 Towards Scalable and Energy-Efficient Memory System Architectures Rajeev Balasubramonian School of Computing University of Utah.


1 1 Towards Scalable and Energy-Efficient Memory System Architectures Rajeev Balasubramonian School of Computing University of Utah

2 2 Main Memory Problems PROCESSOR DIMM 1. Energy 2. High capacity at high bandwidth 3. Reliability

3 3 Motivation: Memory Energy Contributions of memory to overall system energy: 25-40% (IBM, Sun, and Google server data summarized by Meisner et al., ASPLOS’09). HP servers: 175 W out of ~785 W for 256 GB memory (HP power calculator). Intel SCC: memory controller contributes 19-69% of chip power (ISSCC’10).

4 4 Motivation: Reliability DRAM data from Schroeder et al., SIGMETRICS’09: 25K-70K errors per billion device hours per Mbit; 8% of DRAM DIMMs affected by errors every year. DRAM error rates may get worse as scalability limits are reached; PCM (hard and soft) error rates are expected to be high as well. Primary concern: storage and energy overheads for error detection and correction. ECC support is not too onerous; chipkill is much worse.

5 5 Motivation: Capacity, Bandwidth [Figure labels: Processor, DIMM]

6 6 Motivation: Capacity, Bandwidth [Figure labels: Processor, DIMM] Cores are increasing, but pins are not.

7 7 Motivation: Capacity, Bandwidth [Figure labels: Processor, DIMM] Cores are increasing, but pins are not. High channel frequency → fewer DIMMs. Will eventually need disruptive shifts: NVM, optics. Can’t have high capacity, high bandwidth, and low energy: pick 2 of the 3!

8 8 Memory System Basics Processor M DIMM M M Multiple on-chip memory controllers that handle multiple 64-bit channels

9 9 Memory System Basics: FB-DIMM Processor M DIMM M FB-DIMM: Can boost capacity with narrow channels and buffering at each DIMM M DIMM M M

10 10 What’s a Rank? Processor M x8 64b DIMM Rank: DRAM chips required to provide the 64b output expected by a JEDEC standard bus For example: 8 x8 DRAM chips

11 11 What’s a Bank? Processor M x8 64b DIMM Bank: A portion of a rank that is tied up when servicing a request; multiple banks in a rank enable parallel handling of multiple requests BANK

12 12 What’s an Array? [Figure labels: Processor, M, x8, 64b, DIMM, BANK] Array: matrix of cells. One array provides 1 bit/cycle. Each array reads out an entire row. Large array → high density.

13 13 What’s a Row Buffer? [Figure labels: Array, Wordline, Bitlines, Row Buffer, RAS, CAS, Output pin]

14 14 Row Buffer Management Row buffer: collection of rows read out by arrays in a bank Row buffer hits incur low latency and low energy Bitlines must be precharged before a new row can be read Open page policy: delays the precharge until a different row is encountered Close page policy: issues the precharge immediately
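The trade-off between the two policies can be illustrated with a minimal sketch. It assumes a single bank and uses made-up latency units (`T_CAS`, `T_RAS`, `T_PRE` are hypothetical values, not figures from the talk), and it assumes that under the close page policy the precharge is hidden off the critical path.

```python
# Hypothetical latencies in arbitrary units; not taken from the talk.
T_CAS = 1  # column access out of an already-open row buffer
T_RAS = 2  # activate: read a row from the arrays into the row buffer
T_PRE = 2  # precharge the bitlines before a new row can be read


def simulate(rows, policy):
    """Return (row_buffer_hits, total_latency) for one bank.

    rows   -- sequence of row numbers, one per request
    policy -- "open" (delay precharge) or "close" (precharge immediately)
    """
    open_row = None  # row currently held in the row buffer, if any
    hits = 0
    latency = 0
    for row in rows:
        if policy == "open":
            if open_row == row:
                # Row buffer hit: only the column access is needed.
                hits += 1
                latency += T_CAS
            else:
                # Different row: the delayed precharge is paid now.
                if open_row is not None:
                    latency += T_PRE
                latency += T_RAS + T_CAS
                open_row = row
        elif policy == "close":
            # Assumption: the precharge was issued right after the previous
            # access, so it does not add to this request's latency.
            latency += T_RAS + T_CAS
        else:
            raise ValueError(f"unknown policy: {policy}")
    return hits, latency


if __name__ == "__main__":
    for trace in ([1, 1, 1, 2], [1, 2, 1, 2]):
        for policy in ("open", "close"):
            print(trace, policy, simulate(trace, policy))
```

Under these assumptions the open page policy wins when consecutive requests touch the same row, and the close page policy wins when every request touches a different row.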

15 15 Primary Sources of Energy Inefficiency Overfetch: 8 KB of data read out for each cache line request Poor row buffer hit rates: diminished locality in multi-cores Electrical medium: bus speeds have been increasing Reliability measures: overhead in building a reliable system from inherently unreliable parts

16 16 SECDED Support 64-bit data word + 8-bit ECC. One extra x8 chip per rank. Storage and energy overhead of 12.5%. Cannot handle complete failure in one chip.

17 17 Chipkill Support I 64-bit data word + 8-bit ECC, with at most one bit from each DRAM chip. Use 72 DRAM chips to read out 72 bits. Dramatic increase in activation energy and overfetch. Storage overhead is still 12.5%.

18 18 Chipkill Support II 8-bit data word + 5-bit ECC, with at most one bit from each DRAM chip. Use 13 DRAM chips to read out 13 bits. Storage and energy overhead: 62.5%. Other options exist; trade-off between energy and storage.
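The overhead percentages on the SECDED and chipkill slides follow directly from the ratio of ECC bits to data bits. A small sketch of that arithmetic (the function and dictionary names are illustrative only):

```python
def ecc_overhead(data_bits, ecc_bits):
    """Extra storage spent on ECC, as a fraction of the data word."""
    return ecc_bits / data_bits


# (data bits, ECC bits) per word, as given on the slides.
schemes = {
    "SECDED": (64, 8),
    "Chipkill I": (64, 8),
    "Chipkill II": (8, 5),
}

if __name__ == "__main__":
    for name, (data_bits, ecc_bits) in schemes.items():
        overhead = ecc_overhead(data_bits, ecc_bits)
        print(f"{name}: {data_bits}+{ecc_bits} bits, overhead {overhead:.1%}")
```

Chipkill I keeps the 12.5% storage ratio but spreads each 72-bit word over 72 chips, which is where the activation energy and overfetch increase comes from; Chipkill II activates only 13 chips but pays 5 ECC bits for every 8 data bits.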

19 19 Summary So Far We now understand: why memory energy is a problem (overfetch, row buffer miss rates); why reliability incurs high energy overheads (chipkill support requires high activation per useful bit); why capacity and bandwidth increases cost energy (need high frequency and buffering per hop).

20 20 Crucial Timing Disruptive changes may be compelling today. Increasing role of memory energy. Increasing role of memory errors. Impact of multi-core: high bandwidth needs, loss of locality. Emerging technologies (NVM, optics): will require a revamp of memory architecture; ideas can be easily applied to NVM; role of DRAM may change.

21 21 Attacking the Problem Find ways to maximize row buffer utility Find ways to reduce overfetch Treat reliability as a first-class design constraint Use photonics and 3D to boost capacity and bandwidth Solutions must be very cost-sensitive

22 22 Maximizing Row Buffer Locality Micro-pages (ASPLOS’10) Handling multiple memory controllers (PACT’10) On-going work: better write scheduling, better bank management (data mapping, row closure)

23 23 Micro-Pages Key observation: most accesses to a page are localized to a small region (micro-page)

24 24 Solution Identify hot micro-pages Co-locate hot micro-pages in reserved DRAM rows Memory controller keeps track of re-direction Low overheads if applications have few hot micro-pages that account for most memory accesses Processor M DIMM
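The three steps on this slide (identify, co-locate, re-direct) can be sketched as follows. This is only an illustration under stated assumptions: the 1 KB micro-page size, the access-count heuristic, and all function names are hypothetical and are not taken from the ASPLOS’10 design.

```python
from collections import Counter

MICRO_PAGE_SIZE = 1024  # assumed micro-page size in bytes (hypothetical)


def micro_page_id(addr):
    """Index of the micro-page that contains a physical address."""
    return addr // MICRO_PAGE_SIZE


def build_redirection(trace, num_reserved_slots):
    """Pick the most-accessed micro-pages and assign each one a slot
    in the reserved DRAM rows. Returns {micro_page_id: slot}."""
    counts = Counter(micro_page_id(addr) for addr in trace)
    hottest = [mp for mp, _ in counts.most_common(num_reserved_slots)]
    return {mp: slot for slot, mp in enumerate(hottest)}


def translate(addr, redirection):
    """Memory-controller lookup: hot micro-pages are served from the
    reserved rows, everything else from its original location."""
    mp = micro_page_id(addr)
    if mp in redirection:
        return ("reserved", redirection[mp], addr % MICRO_PAGE_SIZE)
    return ("home", addr)


if __name__ == "__main__":
    trace = [0, 8, 16, 5000, 5008, 9000]
    table = build_redirection(trace, num_reserved_slots=2)
    print(table)
    print(translate(5008, table))
    print(translate(9000, table))
```

The redirection table stays small only if a few micro-pages account for most accesses, which is the low-overhead condition the slide states.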

25 25 Results Overall 9% improvement in performance and 15% reduction in energy

26 26 Handling Multiple Memory Controllers [Figure labels: M, DIMM] Data mapping across multiple memory controllers is key: must equalize load and queuing delays; must minimize “distance”; must maximize row buffer hit rates.

27 27 Solution Cost function to guide initial page placement Similar cost function to guide page migration Initial page placement improves performance by 7%, page migration by 9% Row buffer hit rates can be doubled
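The slides do not give the cost function itself. As a purely hypothetical sketch, a placement cost could combine the three goals named on the previous slide (load, distance, row buffer hit rate) as a weighted sum; the fields, weights, and linear form below are assumptions for illustration, not the PACT’10 formulation.

```python
from dataclasses import dataclass


@dataclass
class ControllerState:
    queue_len: int       # pending requests, a proxy for load and queuing delay
    hops: int            # "distance" from the requesting core to this controller
    row_hit_rate: float  # recent row buffer hit rate, in [0, 1]


def placement_cost(mc, w_load=1.0, w_dist=1.0, w_miss=1.0):
    """Hypothetical weighted cost: lower is better."""
    return (w_load * mc.queue_len
            + w_dist * mc.hops
            + w_miss * (1.0 - mc.row_hit_rate))


def choose_controller(controllers):
    """Return the name of the controller with the lowest cost."""
    return min(controllers, key=lambda name: placement_cost(controllers[name]))


if __name__ == "__main__":
    controllers = {
        "MC0": ControllerState(queue_len=10, hops=1, row_hit_rate=0.5),
        "MC1": ControllerState(queue_len=2, hops=3, row_hit_rate=0.5),
    }
    print(choose_controller(controllers))
```

The same kind of function could be re-evaluated periodically to decide whether migrating a page to another controller is worthwhile.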

28 28 Reducing Overfetch Key idea: eliminate overfetch by employing smaller arrays and activating a single array in a single chip. Single Subarray Access (SSA), ISCA’10. Positive effects: minimizes activation energy; small activation footprint, so more arrays can be asleep longer; enables higher parallelism and reduces queuing delays. Negative effects: longer transfer time; drop in density; no row buffer hits; vulnerable to chip failure; change to standards.

29 29 Energy Results Dynamic energy reduction of 6x In some cases, 3x reduction in leakage

30 30 Performance Results SSA better on half the programs (mem-intensive ones)

31 31 Support for Reliability Checksum support per row allows low-cost error detection. Can build a 2nd-tier error-correction scheme, based on RAID. Reads: single array read. Writes: two array reads and two array writes. [Figure labels: DRAM chip, Data row, Checksum, Parity DRAM chip]
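A minimal sketch of the two tiers, assuming CRC32 as a stand-in for the per-row checksum and simple XOR parity across chips as the RAID-style second tier (both are assumptions; the slide does not name the codes used). The class and method names are illustrative.

```python
import zlib


def checksum(data: bytes) -> int:
    # Assumption: CRC32 stands in for the per-row checksum.
    return zlib.crc32(data)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class Stripe:
    """One row from each data chip plus one row in a parity chip."""

    def __init__(self, rows):
        self.rows = list(rows)                       # one row per data chip
        self.sums = [checksum(r) for r in self.rows]
        parity = bytes(len(self.rows[0]))            # all-zero row
        for r in self.rows:
            parity = xor_bytes(parity, r)
        self.parity = parity

    def read(self, chip):
        row = self.rows[chip]                        # single array read
        if checksum(row) == self.sums[chip]:
            return row                               # tier 1: no error detected
        # Tier 2: rebuild the row from the other chips and the parity row.
        rebuilt = self.parity
        for i, other in enumerate(self.rows):
            if i != chip:
                rebuilt = xor_bytes(rebuilt, other)
        return rebuilt

    def write(self, chip, new_row):
        old_row = self.rows[chip]                    # array read 1: old data
        old_parity = self.parity                     # array read 2: old parity
        self.rows[chip] = new_row                    # array write 1: new data
        self.sums[chip] = checksum(new_row)
        # array write 2: new parity = old parity XOR old data XOR new data
        self.parity = xor_bytes(xor_bytes(old_parity, old_row), new_row)
```

The common case, an error-free read, touches a single array; the parity chip is consulted only when the checksum flags a problem, and every write pays the two reads and two writes listed on the slide.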

32 32 Capacity and Bandwidth Silicon photonics to break the pin barrier at the processor. But, several concerns at the DIMM: breaking the DRAM pin barrier will impact cost; high capacity → daisy-chaining and loss of power; high static power for photonics, so high utilization is needed; scheduling for large capacities.

33 33 Exploiting 3D Stacks (ISCA’11) [Figure labels: Processor, Memory controller, Waveguide, DIMM, DRAM chips, Interface die + Stack controller] Interface die for photonic penetration. Does not impact DRAM design. Few photonic hops; high utilization. Interface die schedules low-level operations.

34 34 Packet-Based Scheduling Protocol High capacity → high scheduling complexity. Move to a packet-based interface: processor issues an address request; processor reserves a slot for data return; scheduling minutiae are handled by the stack controller; data is returned at the correct time; back-up slot in case the deadline is not met. Better plug’n’play. Reduced complexity at processor. Can handle heterogeneity.
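The slot-reservation idea can be sketched as below. This is an illustration only: the fixed request-to-data latency, the spacing of the back-up slot, the rule of sliding to the next free slot on a conflict, and all names are assumptions rather than details of the ISCA’11 protocol.

```python
class ProcessorSideScheduler:
    """Tracks which data-return slots on the channel are already reserved."""

    def __init__(self, fixed_latency, backup_gap):
        self.fixed_latency = fixed_latency  # assumed request-to-data delay, in slots
        self.backup_gap = backup_gap        # assumed distance to the back-up slot
        self.reserved = set()

    def _next_free(self, slot):
        while slot in self.reserved:
            slot += 1
        return slot

    def issue(self, now):
        """Issue an address request at time `now`; reserve a primary and a
        back-up slot for the returning data. Returns (primary, backup)."""
        primary = self._next_free(now + self.fixed_latency)
        self.reserved.add(primary)
        backup = self._next_free(primary + self.backup_gap)
        self.reserved.add(backup)
        return primary, backup


def return_slot(ready_time, primary, backup):
    """Stack-controller side: send data in the primary slot if it is ready
    by then, otherwise fall back to the back-up slot."""
    if ready_time <= primary:
        return primary
    if ready_time <= backup:
        return backup
    raise RuntimeError("data missed both reserved slots")
```

The processor never sees bank or row-buffer state in this scheme; it only tracks slot reservations, which is what keeps its side simple while the stack controller handles the low-level timing.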

35 35 Summary Treat reliability as a first-order constraint. Possible to use photonics to break the pin barrier without disrupting memory chip design: boosts bandwidth and capacity! Can reduce memory chip energy by reducing overfetch and with better row buffer management.

36 36 Acks Terrific students in the Utah Arch group Prof. Al Davis (Utah) and collaborators at HP, Intel, IBM Funding from NSF, Intel, HP, University of Utah

