Poisson Processes and Maximum Likelihood Estimator for Cache Replacement: A Preliminary Attempt. ECEn 670 Semester Project. Wei Dang, Jacob Frogget.


Outline
Motivation
Cache replacement algorithms
Poisson processes in a webserver-based cache
Poisson processes in a microprocessor-based cache
Maximum likelihood estimator
Predicting procedures
Simulation and evaluation
Results compared to LRU
Future work

Motivation
The memory-processor speed gap is getting larger.

One of the Solutions
Memory hierarchy: L1/L2/L3 caches sit closer to the chip and have lower latency, but smaller size.
Cache replacement: the cache is fast but limited in size, so blocks conflict; good replacement policies are needed under contention.

Cache Replacement Algorithms
(Figure: a cache conflict in a 4-way associative cache, with blocks selected by set index.)

Common Cache Replacement Algorithms
Random
Least Recently Used (LRU, most widely used)
Least Frequently Used (LFU)
FIFO
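LRU, the baseline the project compares against, can be sketched with an ordered map; this is a minimal illustration of the policy, not tied to any particular simulator:

```python
from collections import OrderedDict

class LRUSet:
    """One set of an N-way associative cache with LRU replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # tag -> None, ordered oldest -> newest

    def access(self, tag):
        """Return True on hit, False on miss (after filling the block)."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)  # mark as most recently used
            return True
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)  # evict the least recently used tag
        self.blocks[tag] = None
        return False
```

A re-accessed tag is moved to the "newest" end, so the "oldest" end always holds the eviction victim.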

Poisson Processes for a Webserver Cache
The arrival times of queries to a webserver can be modeled as a Poisson process.
Interpretation: it gives the probability of having seen k queries up to some point in time.
Assumption: arrivals of queries are independent of each other. Not always true, but valid in most cases.

Poisson Processes for a Microprocessor Cache
Here the independence assumption is invalid: references to the cache are highly correlated (especially for the data cache), due to temporal locality and spatial locality.

Poisson Processes for a Microprocessor Cache (cont.)
One Poisson process is kept for each block within a set.
(Figure: an example set from a 4-way associative cache, selected by set index.)

Poisson Processes for a Microprocessor Cache (cont.)
There is correlation between these four random processes.
Each block keeps a local counter; each set keeps one global counter n.
(Figure: a 4-way set with per-block local counters and a global counter, selected by set index.)

Maximum Likelihood Estimator
For Poisson-distributed observations x₁, …, xₙ, the maximum likelihood estimate of the rate is the arithmetic mean of the observations: λ̂ = (1/n) Σᵢ xᵢ.
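That the ML estimate is the arithmetic mean follows by setting the derivative of the Poisson log-likelihood to zero; the sample values below are made up for illustration:

```python
def poisson_mle(counts):
    """ML estimate of the Poisson rate: the arithmetic mean of the observations.

    Maximizing L(lam) = prod_i lam**x_i * exp(-lam) / x_i! gives
    d/dlam log L = sum(x_i)/lam - n = 0, hence lam_hat = sum(x_i)/n.
    """
    return sum(counts) / len(counts)

lam_hat = poisson_mle([3, 1, 4, 1, 5])  # hypothetical per-interval access counts
```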

Predicting Procedures
Initially, each block is assigned a default rate estimate λ̂.
Given the previously calculated λ̂ for a block, the updated estimate is recomputed as the arithmetic mean of that block's counter observations.
The probability of each block being referenced again is then derived from its estimated rate.
Replace the block with the lowest probability; choose randomly among blocks with equal probability.
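The slide's formulas did not survive extraction, so the following is only one plausible reading of the procedure: each block i keeps a local access counter cᵢ, the set keeps a global counter n, the rate estimate is λ̂ᵢ = cᵢ/n, and the re-access probability is λ̂ᵢ / Σⱼ λ̂ⱼ. Since all λ̂ᵢ share the denominator n, evicting the lowest-probability block reduces to evicting the block with the smallest local counter. The counter definitions and the probability formula here are assumptions, not the authors' exact equations:

```python
import random

class PoissonSet:
    """One cache set that evicts the block with the lowest estimated rate.

    ASSUMED model (the original slide formulas were lost): the rate estimate
    for block i is lam_i = c_i / n, where c_i is the block's local counter and
    n is the set's global counter; comparing lam_i reduces to comparing c_i.
    """

    def __init__(self, ways, init_count=1):
        self.ways = ways
        self.init_count = init_count  # assumed prior so new blocks survive briefly
        self.counts = {}              # tag -> local counter c_i
        self.n = 0                    # global counter for the set

    def access(self, tag):
        self.n += 1
        if tag in self.counts:
            self.counts[tag] += 1
            return True  # hit
        if len(self.counts) >= self.ways:
            low = min(self.counts.values())
            victims = [t for t, c in self.counts.items() if c == low]
            del self.counts[random.choice(victims)]  # random choice on ties
        self.counts[tag] = self.init_count  # local history is reset on (re)fill
        return False  # miss
```

Note that resetting the local counter on eviction is exactly the history loss the authors later identify as a weakness of their model.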

Simulation and Evaluation
Simulator: MyDLX cache simulator from EE628
Metric: miss rate for the instruction cache and the data cache
Various associativities
Five benchmarks
Compared against LRU
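The MyDLX traces and benchmarks are not reproducible here, but the miss-rate metric itself is simple to compute from any address trace. A toy trace-driven harness (the trace, geometry, and the compact LRU policy used to drive it are all invented for illustration):

```python
from collections import OrderedDict

class LRUSet:
    """Compact per-set LRU policy used only to demonstrate the harness."""
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()

    def access(self, tag):
        if tag in self.blocks:
            self.blocks.move_to_end(tag)
            return True                      # hit
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)  # evict least recently used
        self.blocks[tag] = None
        return False                         # miss

def miss_rate(trace, sets, ways, block_bytes, set_factory):
    """Run a byte-address trace through a set-associative cache model.

    set_factory(ways) builds one per-set policy object exposing
    access(tag) -> bool (hit); any replacement policy can be plugged in.
    """
    cache = [set_factory(ways) for _ in range(sets)]
    misses = 0
    for addr in trace:
        block = addr // block_bytes  # byte address -> block number
        index = block % sets         # set index
        tag = block // sets
        if not cache[index].access(tag):
            misses += 1
    return misses / len(trace)

# e.g. a made-up trace on a 2-set, 2-way cache with 16-byte blocks
rate = miss_rate([0, 16, 32, 0, 16, 32], sets=2, ways=2,
                 block_bytes=16, set_factory=LRUSet)
```

Swapping `set_factory` between the LRU policy and the Poisson-based policy on the same traces is how the miss-rate comparison in the results would be run.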

Results

Results not encouraging
Sometimes a 0% miss rate for both algorithms (possibly due to the inherent characteristics of the benchmarks).
The statistical approach is worse than LRU in most cases, and gets worse at higher associativity (more blocks to predict).

Analysis of Deficiencies of Our Model
The independence model may be inaccurate: even accesses to the same block within a set may not be independent.
The local counter is reset to 0 on eviction, so a block's history is eliminated.

Future Work and Challenges
A more accurate model with more correlation parameters for each Poisson process.
Implementation complexity: the estimator is expensive in hardware, and LRU is already expensive at high associativity.
It may instead be implemented as a software cache supplementing the hardware cache.

The End Thank you! Questions?