Achieving High Performance and Fairness at Low Cost Lavanya Subramanian, Donghyuk Lee, Vivek Seshadri, Harsha Rastogi, Onur Mutlu 1 The Blacklisting Memory.

Achieving High Performance and Fairness at Low Cost Lavanya Subramanian, Donghyuk Lee, Vivek Seshadri, Harsha Rastogi, Onur Mutlu 1 The Blacklisting Memory Scheduler

Main Memory Interference Problem Causes interference between applications’ requests 2 Core Main Memory

Inter-Application Interference Results in Performance Degradation High application slowdowns due to interference 3

Tackling Inter-Application Interference: Application-aware Memory Scheduling 4 Monitor Rank Highest Ranked AID Enforce Ranks Full ranking increases critical path latency and area significantly to improve performance and fairness 4 3 2 1 2 4 3 1 Req 11 Req 24 Req 31 Req 41 Req 53 Req 71 Req 83 Request Buffer Req 52 Request App. ID (AID) = = = = = = = =

Performance vs. Fairness vs. Simplicity 5 Performance Fairness Simplicity FRFCFS PARBS ATLAS TCM Blacklisting Ideal App-unaware App-aware (Ranking) Our Solution (No Ranking) Is it essential to give up simplicity to optimize for performance and/or fairness? Our solution achieves all three goals Very Simple Low performance and fairness Complex

Outline  Introduction  Problems with Application-aware Schedulers  Key Observations  The Blacklisting Memory Scheduler Design  Evaluation  Conclusion 6

Outline  Introduction  Problems with Application-aware Schedulers  Key Observations  The Blacklisting Memory Scheduler Design  Evaluation  Conclusion 7

Problems with Previous Application-aware Memory Schedulers 1. Full ranking increases hardware complexity 2. Full ranking causes unfair slowdowns 8

Ranking Increases Hardware Complexity 10 Highest Ranked AID Enforce Ranks Req 11 Req 24 Req 31 Req 41 Req 53 Req 71 Req 83 Request Buffer Req 54 Request App. ID (AID) Next Highest Ranked AID Monitor Rank 4 3 2 1 2 4 3 1 = = = = = = = = Hardware complexity increases with application/core count

11 Ranking Increases Hardware Complexity 8x 1.8x Ranking-based application-aware schedulers incur high hardware cost From synthesis of RTL implementations using a 32nm library

Ranking Causes Unfair Slowdowns 13 GemsFDTDGemsFDTD (high memory intensity) sjeng sjeng (low memory intensity) Full ordered ranking of applications GemsFDTD denied request service

Ranking Causes Unfair Slowdowns 14 Ranking-based application-aware schedulers cause unfair slowdowns GemsFDTD (high memory intensity) sjeng (low memory intensity)

Problems with Previous Application-aware Memory Schedulers 1. Full ranking increases hardware complexity 2. Full ranking causes unfair slowdowns 15 Our Goal: Design a memory scheduler with Low Complexity, High Performance, and Fairness

Outline  Introduction  Problems with application-aware schedulers  Key Observations  The Blacklisting Memory Scheduler Design  Evaluation  Conclusion 16

Key Observation 1: Group Rather Than Rank Observation 1: Sufficient to separate applications into two groups, rather than do full ranking 17 Benefit 1: Low complexity compared to ranking Group Vulnerable Interference Causing > Monitor Rank 4 3 2 1 2 4 3 1 4 2 3 1

Key Observation 1: Group Rather Than Rank 18 GemsFDTD (high memory intensity) sjeng (low memory intensity) No denial of request service

19 Key Observation 1: Group Rather Than Rank Benefit 2: Lower slowdowns than ranking GemsFDTD (high memory intensity) sjeng (low memory intensity)

Key Observation 1: Group Rather Than Rank Observation 1: Sufficient to separate applications into two groups, rather than do full ranking 20 How to classify applications into groups? Group Vulnerable Interference Causing > Monitor Rank 4 3 2 1 2 4 3 1 4 2 3 1

Key Observation 2 Observation 2: Serving a large number of consecutive requests from an application causes interference Basic Idea: Group applications with a large number of consecutive requests as interference-causing  Blacklisting Deprioritize blacklisted applications Clear blacklist periodically (1000s of cycles) Benefits: Lower complexity Finer grained grouping decisions  Lower unfairness 21

The Blacklisting Memory Scheduler (BLISS) 23 1. Monitor AID2 00 AID Blacklist 10 20 30 AID1 1 Last Req AID3 # Consecutive Requests 1 21 234 4 2. Blacklist Last Req AID3 # Consecutive Requests 1 2. Blacklist 00 AID Blacklist 1 20 30 1 1. Monitor Req Blacklist Req 10 Req 21 Req 31 Req 40 Req 50 Req 60 Req 71 Req 80 Request Buffer ? ? ? 3. Prioritize 4. Clear Periodically 0 Simple and scalable design 3. Prioritize 4. Clear Periodically 1. Monitor ? ? ? ? ?

Methodology Configuration of our simulated baseline system – 24 cores – 4 channels, 8 banks/channel – DDR3 1066 DRAM – 512 KB private cache/core Workloads – SPEC CPU2006, TPC-C, Matlab, NAS – 80 multiprogrammed workloads 25

Metrics System Performance: Fairness: Complexity: Critical path latency and area from synthesis with 32 nm library 26

Previous Memory Schedulers FRFCFS [Zuravleff and Robinson, US Patent 1997, Rixner et al., ISCA 2000] – Prioritizes row-buffer hits and older requests FRFCFS-Cap [Mutlu and Moscibroda, MICRO 2007] – Caps number of consecutive row-buffer hits PARBS [Mutlu and Moscibroda, ISCA 2008] – Batches oldest requests from each application; prioritizes batch – Employs ranking within a batch ATLAS [Kim et al., HPCA 2010] – Prioritizes applications with low memory-intensity TCM [Kim et al., MICRO 2010] – Always prioritizes low memory-intensity applications – Shuffles thread ranks of high memory-intensity applications 27 Application-unaware + Low complexity - Low performance and fairness Application-aware + High performance and fairness - High complexity

Performance and Fairness 28 Ideal 5% 21% 1. Blacklisting achieves the highest performance 2. Blacklisting balances performance and fairness

Complexity 29 43% 70% Blacklisting reduces complexity significantly Ideal

Performance vs. Fairness vs. Simplicity 30 Performance Fairness Simplicity FRFCFS -Cap PARBS ATLAS TCM Blacklisting Ideal Highest performance Close to simplest Close to fairest Blacklisting is the closest scheduler to ideal

Summary Applications’ requests interfere at main memory Prevalent solution approach – Application-aware memory request scheduling Key shortcoming of previous schedulers: Full ranking – High hardware complexity – Unfair application slowdowns Our Solution: Blacklisting memory scheduler – Sufficient to group applications rather than rank – Group by tracking number of consecutive requests Much simpler than application-aware schedulers at higher performance and fairness 31

Achieving High Performance and Fairness at Low Cost Lavanya Subramanian, Donghyuk Lee, Vivek Seshadri, Harsha Rastogi, Onur Mutlu 32 The Blacklisting Memory Scheduler

Backup Slides 33

DRAM Memory Organization FR-FCFS Memory Scheduler [Zuravleff and Robinson, US Patent ‘97; Rixner et al., ISCA ‘00] – Row-buffer hit first – Older request first Unaware of inter-application interference 34 Memory Controller Bank 0 Row Buffer Bank 1 Row Buffer Bank 2 Row Buffer Bank 3 Row Buffer Rows Columns Bank 0Bank 1Bank 2Bank 3 Row hitRow miss (2-3x latency of row hit) Row Buffer

Tackling Inter-Application Interference: Application-aware Memory Scheduling Monitor application memory access characteristics (e.g., memory intensity) Rank applications based on memory access characteristics Prioritize requests at the memory controller, based on ranking 35

Performance and Fairness 5% higher system performance and 21% lower maximum slowdown than TCM 36

Complexity Results Blacklisting achieves 70% lower latency than TCM 37 Blacklisting achieves 43% lower area than TCM

Understanding Why Blacklisting Works libquantum (High memory-intensity application) Blacklisting shifts the request distribution towards the left calculix (Low memory-intensity application) Blacklisting shifts the request distribution towards the right 38

Harmonic Speedup 39

Effect of Workload Memory Intensity 40

Combining FRFCFS-Cap and Blacklisting 41

Sensitivity to Blacklisting Threshold 42

Sensitivity to Clearing Interval 43

Sensitivity to Core Count 44

Sensitivity to Channel Count 45

Sensitivity to Channel Count 46

Breakdown of Benefits 47

BLISS vs. Criticality-aware Scheduling 48

Sub-row Interleaving 49

Meeting DDR Timing Requirements 50

Achieving High Performance and Fairness at Low Cost Lavanya Subramanian, Donghyuk Lee, Vivek Seshadri, Harsha Rastogi, Onur Mutlu 1 The Blacklisting Memory.

Similar presentations

Presentation on theme: "Achieving High Performance and Fairness at Low Cost Lavanya Subramanian, Donghyuk Lee, Vivek Seshadri, Harsha Rastogi, Onur Mutlu 1 The Blacklisting Memory."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Achieving High Performance and Fairness at Low Cost Lavanya Subramanian, Donghyuk Lee, Vivek Seshadri, Harsha Rastogi, Onur Mutlu 1 The Blacklisting Memory.

Similar presentations

Presentation on theme: "Achieving High Performance and Fairness at Low Cost Lavanya Subramanian, Donghyuk Lee, Vivek Seshadri, Harsha Rastogi, Onur Mutlu 1 The Blacklisting Memory."— Presentation transcript:

Similar presentations

About project

Feedback