Insertion Policy Selection Using Decision Tree Analysis Samira Khan, Daniel A. Jiménez University of Texas at San Antonio.



Motivation  The L1 and L2 caches filter cache accesses  The Last-Level Cache (LLC) therefore sees little temporal locality  A large fraction of blocks brought into the cache are never accessed again (zero-reuse lines)  For the SPEC CPU 2006 benchmarks, on average 60.18% of lines are never accessed again while they reside in the LLC

Motivation  There are no cache bursts in the LLC  Only a small portion of hits occur near the MRU position

Goal  Evict zero-reuse lines as early as possible  Keep lines in the cache long enough to receive their first hit  Require minimal change to the LRU policy  Use as little extra space as possible

Insertion Position Selection  Find the optimal insertion position  Zero-reuse lines get evicted earlier  Most non-zero-reuse lines remain in the cache until their first hit  This gets rid of zero-reuse lines and makes space for useful lines  Use decision tree analysis via set dueling to find that position  This allows choosing which insertion positions to set duel
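A single round of set dueling between two insertion positions can be sketched in a few lines. This is a minimal illustration under assumed parameters (leader-set ranges, PSEL counter width, update rule), not the authors' exact hardware configuration:

```python
# Minimal sketch of set dueling between two insertion positions.
# Leader-set ranges and counter width are illustrative assumptions.

NUM_SETS = 1024
LEADER_A = set(range(0, 32))        # these sets always insert at position A (e.g. MRU)
LEADER_B = set(range(32, 64))       # these sets always insert at position B (e.g. middle)
PSEL_BITS = 10
PSEL_MAX = (1 << PSEL_BITS) - 1

psel = PSEL_MAX // 2                # saturating counter, starts at the midpoint

def on_miss(set_index):
    """On a miss, update PSEL for leader sets; return the insertion policy used."""
    global psel
    if set_index in LEADER_A:
        psel = min(PSEL_MAX, psel + 1)   # a miss in A's leaders is evidence for B
        return "A"
    if set_index in LEADER_B:
        psel = max(0, psel - 1)          # a miss in B's leaders is evidence for A
        return "B"
    # Follower sets adopt whichever policy is currently winning.
    return "B" if psel > PSEL_MAX // 2 else "A"
```

Chaining such duels in a tournament (winner advances to duel the next candidate position) yields the decision-tree selection described above.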

[Diagram: decision-tree tournament of set duels over the five insertion positions (MRU, nearMRU, middle, nearLRU, LRU). The first duel compares the middle and MRU positions; if middle wins, the next duel compares nearLRU vs. middle (and then LRU vs. middle); if MRU wins, the next duel compares nearMRU vs. MRU, narrowing down to the best insertion position.] For 400.perlbench, 66.67% of lines brought into the cache are never accessed again, and 73.03% of hits occur between the MRU and middle positions

Adaptive Multi Set Dueling  Current multi set dueling  One leader-set group for each insertion policy  Partial follower sets duplicate the winning set's policy  The policies set duel in a tournament manner  Not scalable  Leader sets running the losing policies hurt performance  Adaptive multi set dueling  Leader sets adaptively choose their policy  No need for partial follower sets  Scalable
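The tournament over insertion positions can be sketched as below. This is an illustrative abstraction: the real scheme duels positions pairwise with hardware counters over live leader sets, whereas here per-position miss counts are assumed to be available directly:

```python
# Hedged sketch of a tournament over insertion positions: duel the current
# winner against the next candidate; the better position advances.
# The candidate order and the miss-count inputs are illustrative assumptions.

POSITIONS = ["MRU", "nearMRU", "middle", "nearLRU", "LRU"]

def pick_insertion_position(misses):
    """Run the tournament; `misses` maps position -> observed miss count
    (fewer misses wins the duel)."""
    winner = POSITIONS[0]
    for challenger in POSITIONS[1:]:
        if misses[challenger] < misses[winner]:
            winner = challenger          # challenger dethrones the winner
    return winner
```

In the adaptive scheme, the leader sets themselves switch to the surviving policy after each duel, so no leader set keeps running a losing policy and no partial follower sets are needed.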

Result

Space Overhead  Space overhead for a 1 MB, 16-way set-associative LLC:

Parameter                      | Storage      | Total storage
LRU position per line          | 4 bits       | 1024 × 16 × 4 bits = 8 KB
Set type per set               | 2 bits       | 1024 × 2 = 2048 bits
Two counters (psel1 & psel2)   | 10 bits each | 20 bits
One counter (switched)         | 1 bit        | 1 bit
Total                          |              | 8 KB + 2069 bits
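The totals in the table can be cross-checked with a short calculation (cache geometry assumed: 1 MB, 16-way, 64 B lines, hence 1024 sets):

```python
# Cross-check of the overhead table for a 1 MB, 16-way set-associative LLC
# with 64 B lines (1024 sets assumed).

sets, ways = 1024, 16
lru_bits = sets * ways * 4            # 4-bit LRU stack position per line (16-way)
set_type_bits = sets * 2              # 2 bits per set to mark leader/follower type
psel_bits = 2 * 10                    # two 10-bit policy-selection counters
switch_bits = 1                       # one "switched" flag

extra = set_type_bits + psel_bits + switch_bits
assert lru_bits == 8 * 1024 * 8       # 65536 bits = 8 KB, the baseline LRU cost
assert extra == 2069                  # matches the 2069 extra bits in the conclusion
```

Only the 2069 bits beyond the LRU positions are new storage; the 8 KB of LRU state is already paid for by the baseline policy.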

Conclusion  Insertion position selection using decision tree analysis  Requires minimal change to LRU  Needs only 2069 bits of extra space  Chooses the best insertion position adaptively  Gets rid of zero-reuse lines without any storage-hungry predictor  Makes multi set dueling scalable

Questions

Zero Reuse Lines in SPEC CPU 2006

Adaptive Multi Set Dueling  [Diagram: leader-set organization in the LLC. The current multi-set-dueling scheme needs one leader-set group per policy pair, with counters psel_ab, psel_cd, psel_ef, psel_gh over policies p_a through p_h. The adaptive scheme uses a single group of leader sets with two counters, psel_1 and psel_2; a counter such as psel_1 is incremented when policy p_b wins and decremented by 1 when p_a wins.]

Result  [Chart: per-benchmark choice among the insertion positions MRU, nearMRU, middle, nearLRU, and LRU, as tracked by the psel_1 and psel_2 counters.]