1
© 2007 IBM Corporation
HPCA 2010
Improving Read Performance of PCM via Write Cancellation and Write Pausing
Moinuddin Qureshi, Michele Franceschini, and Luis Lastras
IBM T. J. Watson Research Center, Yorktown Heights, NY
2
Introduction
More cores in a system mean more concurrency and a larger working set.
DRAM-based memory systems are hitting power, cost, and scaling walls.
Phase Change Memory (PCM) is an emerging technology projected to be more scalable, denser, and more power-efficient.
3
PCM Operation
Cells are switched by heating them with electrical pulses.
RESET state: amorphous (high resistance), reached with a large current.
SET state: crystalline (low resistance), reached with a small current.
Read latency is 2x-4x that of DRAM; write latency is much higher.
[Figure: temperature versus time for the RESET and SET pulses relative to T_melt and T_cryst, with a cell schematic showing the access device and memory element. Photo courtesy: Bipin Rajendran, IBM]
4
Problem of Contention from Slow Writes
PCM writes are 4x-8x slower than reads.
Writes are not latency critical, so the typical response is to use large buffers and intelligent scheduling.
But once a write is scheduled to a bank, a later-arriving read must wait.
Write requests therefore cause contention for reads, which increases read latency.
5
Outline
Introduction
Quantifying the Problem
Adaptive Write Cancellation
Write Pausing
Combining Cancellation & Pausing
Summary
6
Configuration: Hybrid Memory
Processor chip, DRAM cache (256MB), PCM-based main memory.
Each bank has a separate RDQ and WRQ (32-entry).
The baseline uses read-priority scheduling while the WRQ is less than 80% full.
If the WRQ is more than 80% full, it uses an oldest-first policy ("forced write"), which is rare (<0.1%).
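The baseline policy can be sketched as a per-bank selection function. This is a minimal illustration, not the authors' simulator: the function name `pick_next_request`, the use of `deque`, and the reading of "oldest-first" as "service the oldest pending write" are assumptions.

```python
from collections import deque

WRQ_CAPACITY = 32            # from the slide: 32-entry WRQ per bank
FORCED_WRITE_FRACTION = 0.8  # from the slide: forced writes above 80% occupancy


def pick_next_request(rdq, wrq):
    """Choose the next request for one bank (hypothetical helper).

    Returns a (kind, request) pair, where kind is "forced_write",
    "read", "write", or None when both queues are empty.
    """
    # Assumption: in forced mode the oldest pending write is drained first.
    if len(wrq) > FORCED_WRITE_FRACTION * WRQ_CAPACITY:
        return "forced_write", wrq.popleft()
    # Read priority: reads go first whenever any are pending.
    if rdq:
        return "read", rdq.popleft()
    if wrq:
        return "write", wrq.popleft()
    return None, None
```

With one pending read and 25 queued writes the read is chosen; with 26 queued writes (more than 80% of 32) the oldest write is forced ahead of the read.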
7
Problem
Writes significantly increase read latency (a problem only for asymmetric memories).
Read latency = 1K cycles; write latency = 8K cycles (sensitivity in paper).
12 workloads, each with 8 benchmarks from SPEC06.
[Figure: effective read latency (cycles) and normalized execution time for Baseline, No Read Priority, Write Latency=1K, and Write Latency=0]
8
Outline
Introduction
Problem: Writes Delaying Reads
Adaptive Write Cancellation
Write Pausing
Combining Cancellation & Pausing
Summary
9
Write Cancellation
Write cancellation "aborts" an ongoing write to improve read latency.
The cancelled line is left in a non-deterministic state, so a read request that matches it is serviced from the WRQ.
Write cancellation is performed as soon as a read request arrives at a bank (as long as the write is not being done in forced mode).
10
Write Cancellation with Static Threshold
WCST: cancel a write request only if less than K% of its service is done.
Canceling a write request that is close to completion is wasteful and causes episodes of forced writes (low performance).
[Figure: results for thresholds ranging from NeverCancel to AlwaysCancel; the value 2365 is labeled on the plot]
11
Adaptive Write Cancellation
The best threshold depends on the number of pending entries in the WRQ.
Fewer entries: higher threshold (best read latency).
More entries: lower threshold (reduces forced writes).
Write Cancellation with Adaptive Threshold (WCAT): Threshold = 100 - (4 * NumEntriesInWRQ)
[Figure: threshold (0% to 100%) versus number of entries in the WRQ (10, 20, 30), falling from high to low, with a forced-writes region at high occupancy]
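The WCAT rule can be written out directly from the formula on this slide. The sketch below is illustrative: the function names are hypothetical, and clamping the threshold at 0 for more than 25 entries is an assumption, since the slide only gives the linear formula.

```python
def wcat_threshold(num_entries_in_wrq):
    """Adaptive threshold in percent: 100 - 4 * NumEntriesInWRQ.

    Assumption: the value is clamped at 0 so it never goes negative.
    """
    return max(0, 100 - 4 * num_entries_in_wrq)


def should_cancel(percent_service_done, num_entries_in_wrq, forced_mode=False):
    """Cancel the ongoing write only if less than threshold% of it is done."""
    if forced_mode:
        # Writes issued in forced mode are never cancelled.
        return False
    return percent_service_done < wcat_threshold(num_entries_in_wrq)


print(wcat_threshold(0))      # 100: empty WRQ, cancel at any point
print(wcat_threshold(10))     # 60
print(should_cancel(50, 10))  # True: 50% done is below the 60% threshold
print(should_cancel(70, 10))  # False: too close to completion
```

A static-threshold scheme (WCST) corresponds to replacing `wcat_threshold` with a constant K.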
12
Adaptivity of WCAT
Num Entries in WRQ   Low (0-1)   Med (2-13)   High (14-25)   Forced (26+)
WCST (K=75%)         61.4%       29.8%        7.4%           1.43%
WCAT                 58.2%       35.4%        5.6%           0.72%
WCAT uses a higher threshold initially, when the WRQ is empty, but a lower threshold later, which reduces the episodes of forced writes.
We sampled all WRQs every 2M cycles to measure occupancy.
13
Results for WCAT
Baseline: 2365 cycles. Ideal: 1K cycles.
The adaptive threshold reduces latency and incurs half the overhead.
14
Outline
Introduction
Problem: Writes Delaying Reads
Adaptive Write Cancellation
Write Pausing
Combining Cancellation & Pausing
Summary
15
Iterative Write in PCM devices
In Multi-Level Cells (MLC), the programming precision requirement increases linearly with the number of levels.
PCM cells respond differently to the same programming pulse.
The acknowledged solution to this uncertainty is iterative writes.
Each iteration consists of write, read, and verify steps.
[Figure: write, read, verify loop that repeats while the cell is not done]
16
Model for Iterative Writes
We develop an analytical model to capture the number of iterations in terms of bits/cell, the number of levels written in one shot, and learning.
The time required to write a line is the worst case over all cells in the line.
Average number of iterations for MLC with 3 bits/cell: 8.3 (consistent with MLC literature).
17
Concept of Write Pausing
Iterative writes can be paused to service pending read requests.
Reads can be performed at the end of each iteration (a potential pause point).
[Figure: timeline of a four-iteration write (Iter 1 to Iter 4) with potential pause points between iterations; with pausing, a read Rd X is serviced between Iter 2 and Iter 3]
This gives better read latency with negligible write overhead.
We extend the iterative write algorithm of Nirschl et al. [IEDM'07] to support Write Pausing.
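The effect of pausing at iteration boundaries can be illustrated with a small timing model. This is a sketch under simplifying assumptions that are not in the slides: every iteration takes the same number of cycles, and the function name `read_wait` is hypothetical.

```python
def read_wait(arrival, num_iters, iter_cycles, pausing):
    """Cycles a read waits for the bank.

    `arrival` is how many cycles into an ongoing write the read arrives.
    Assumption: all iterations have the same length `iter_cycles`.
    """
    write_cycles = num_iters * iter_cycles
    if arrival >= write_cycles:
        return 0  # the write has already finished
    if not pausing:
        return write_cycles - arrival  # wait for the whole write to finish
    # With pausing, wait only until the next iteration boundary.
    next_boundary = -(-arrival // iter_cycles) * iter_cycles  # ceiling division
    return next_boundary - arrival


# An 8K-cycle write modeled as 8 iterations of 1000 cycles each.
print(read_wait(1500, 8, 1000, pausing=False))  # 6500
print(read_wait(1500, 8, 1000, pausing=True))   # 500
```

In this model the wait under pausing is bounded by one iteration, whatever the total write latency.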
18
Results for Write Pausing
Write Pausing at the end of an iteration gets 85% of the benefit of "Anytime" Pause.
19
Outline
Introduction
Problem: Writes Delaying Reads
Adaptive Write Cancellation
Write Pausing
Combining Cancellation & Pausing
Summary
20
Write Pausing + WCAT
[Figure: timelines of a four-iteration write (Iter 1 to Iter 4) interrupted by a read Rd X; in the combined scheme Iter 2 is cancelled so that Rd X is serviced sooner]
Only one iteration is cancelled, so this "micro-cancellation" has low overhead.
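One way to picture the combined scheme is as a per-read decision between cancelling the current iteration and waiting for its end. The sketch below is an interpretation, not the authors' algorithm: it assumes equal-length iterations and assumes the WCAT threshold is applied to the progress of the current iteration; the function name `service_read` is hypothetical.

```python
def service_read(arrival, iter_cycles, wrq_entries):
    """Decide how a read arriving `arrival` cycles into a write is handled.

    Returns (action, read_wait_cycles, redo_cycles).
    Assumptions: equal-length iterations; the WCAT threshold
    (100 - 4 * entries, clamped at 0) is applied per iteration.
    """
    offset = arrival % iter_cycles
    if offset == 0:
        return ("pause", 0, 0)  # already at a pause point
    percent_done = 100 * offset / iter_cycles
    threshold = max(0, 100 - 4 * wrq_entries)
    if percent_done < threshold:
        # Micro-cancellation: abort only this iteration and redo it later.
        return ("micro_cancel", 0, offset)
    # Otherwise let the iteration finish and pause at its boundary.
    return ("pause", iter_cycles - offset, 0)


print(service_read(1500, 1000, 5))   # ('micro_cancel', 0, 500)
print(service_read(1900, 1000, 5))   # ('pause', 100, 0)
```

The cost of a micro-cancellation in this model is at most one iteration of redone work, which matches the slide's point that only one iteration is cancelled.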
21
Results
Write Pause + Micro Cancellation is very close to Anytime Pause (the re-execution overhead of micro-cancellation is <4% extra iterations).
Baseline: 2365 cycles. Ideal: 1K cycles.
22
Impact of Write Queue Size
We will need large buffers to best exploit the benefit of Pausing.
[Figure: speedup with respect to the baseline (32-entry) for different write queue sizes]
23
Outline
Introduction
Problem: Writes Delaying Reads
Adaptive Write Cancellation
Write Pausing
Combining Cancellation & Pausing
Summary
24
Summary
Slow writes increase the effective read latency (2.3x).
Write Cancellation: cancel an ongoing write to service a read.
Threshold-based write cancellation.
Adaptive threshold: better performance, half the overhead.
Write Pausing exploits iterative writes to service pending reads.
Write Pausing + Micro Cancellation is close to optimal pausing.
Effective read latency drops from 2365 to 1330 cycles (1.45x speedup).
We will need large write buffers to exploit the benefit of Pausing.
25
Questions
26
Write Pausing in Iterative Algorithms (Nirschl+ IEDM'07)
27
Workloads and Figure of Merit
12 memory-intensive workloads from SPEC 2006: 6 rate-mode (eight copies of the same benchmark) and 6 mix-mode (two copies of four benchmarks).
Key metric: Effective Read Latency.
Tin = time at which the read request enters the RDQ.
Tout = time at which the read request finishes service at memory.
Effective Read Latency = Tout - Tin (average reported).
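The figure of merit is a plain average over read requests. A minimal sketch, assuming each request is recorded as a hypothetical `(t_in, t_out)` pair of cycle counts; the sample numbers are made up for illustration and are not results from the talk.

```python
def effective_read_latency(requests):
    """Average of Tout - Tin over all read requests.

    `requests` is a list of (t_in, t_out) pairs in cycles:
    t_in is when the read enters the RDQ, t_out is when memory finishes it.
    """
    latencies = [t_out - t_in for t_in, t_out in requests]
    return sum(latencies) / len(latencies)


# Illustrative values only: two uncontended reads and one delayed by a write.
sample = [(0, 1000), (500, 3000), (100, 1100)]
print(effective_read_latency(sample))  # 1500.0
```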
28
Sensitivity to Write Latency
At WriteLatency=4K, the speedup is 1.35x instead of 1.45x (at 8K latency).