Analysis of a Dynamic Page Remapping Technique to Reduce L2 Misses in an SMT Processor. CSE 240B Class Project, Spring 2005, UCSD. Subhradyuti Sarkar, Siddhartha Saha.

Motivation ● Cache misses carry a considerable penalty; the L2 miss penalty is usually orders of magnitude higher than that of an L1 miss. ● SMT processors may be more vulnerable to L2 misses because: – If more than one instance of the same thread runs on the processor, they will always collide in the same cache pages. – Even different threads, if compiled by the same compiler, will have similar virtual address ranges for their stack, heap and data segments.

Introduction ● In this work, we look at a hybrid hardware/software technique to reduce L2 cache misses in an SMT processor. ● We use a set of hardware counters for every cache page to keep track of the relative hotness/coldness of cache pages. ● If the miss rate and/or access rate amongst the cache pages becomes skewed beyond a certain threshold, we use an adaptive algorithm which tries to smooth out the cache utilization.

Contribution Summary ● An adaptive algorithm which can detect variation in utilization amongst cache pages. ● Another algorithm that can smooth out the cache utilization, possibly improving cache performance.

Hot/Cold Detection Algorithm ● Short Term History – We detect whether a cache page was hot, cold or neutral in the last epoch: – COLD: access_count[i] < (total_access_count/N) * t_cold – HOT: miss_rate[i] > t_miss && access_count[i] > (total_access_count/N) * t_hot – NEUTRAL: otherwise
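The per-epoch classification above can be sketched as a small C routine. The struct layout and the threshold parameter names (t_cold, t_hot, t_miss) follow the slide's notation but are otherwise illustrative assumptions, not code from the actual SMTSIM modification:

```c
#include <stddef.h>

/* Per-cache-page counters maintained in hardware (assumed layout). */
enum page_state { PAGE_COLD, PAGE_NEUTRAL, PAGE_HOT };

typedef struct {
    unsigned long access_count; /* accesses to this cache page this epoch */
    unsigned long miss_count;   /* misses on this cache page this epoch */
} page_counters;

/* Classify one cache page for the epoch that just ended.
 * total_access is the sum of access_count over all n_pages pages,
 * so total_access / n_pages is the average per-page access count. */
enum page_state classify_page(const page_counters *pc,
                              unsigned long total_access,
                              size_t n_pages,
                              double t_cold, double t_hot, double t_miss)
{
    double avg = (double)total_access / (double)n_pages;
    double miss_rate = pc->access_count
        ? (double)pc->miss_count / (double)pc->access_count
        : 0.0;

    if ((double)pc->access_count < avg * t_cold)
        return PAGE_COLD;                       /* well below average use */
    if (miss_rate > t_miss && (double)pc->access_count > avg * t_hot)
        return PAGE_HOT;                        /* heavily used and missing */
    return PAGE_NEUTRAL;
}
```

Note that a page must be both frequently accessed and miss-prone to count as HOT, which keeps busy-but-well-behaved pages out of the re-coloring pass.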

Hot/Cold Detection Algorithm ● Long Term History – We keep an N-element circular history holding the state of each cache page for the last N epochs. – In our simulations, we took N = 4. ● Based on the Long Term History, we determine whether to classify a page as HOT or COLD. ● If both the number of HOT pages and the number of COLD pages is non-zero, we call the re-coloring algorithm.
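A minimal sketch of the N = 4 circular history might look as follows. The slides do not state the exact promotion rule, so the majority-vote policy here is an assumption:

```c
#include <stddef.h>

#define HIST_LEN 4  /* N = 4 epochs, as in the slides */

enum page_state { PAGE_COLD, PAGE_NEUTRAL, PAGE_HOT };

typedef struct {
    enum page_state hist[HIST_LEN]; /* circular buffer of per-epoch states */
    size_t head;                    /* slot the next epoch's state goes into */
} page_history;

/* Record this page's short-term state for the epoch that just ended. */
void history_push(page_history *h, enum page_state s)
{
    h->hist[h->head] = s;
    h->head = (h->head + 1) % HIST_LEN;
}

/* Long-term classification: promote a page to HOT (or COLD) when a
 * majority of the last N epochs agree (an assumed policy). */
enum page_state long_term_state(const page_history *h)
{
    int hot = 0, cold = 0;
    for (size_t i = 0; i < HIST_LEN; i++) {
        if (h->hist[i] == PAGE_HOT)  hot++;
        if (h->hist[i] == PAGE_COLD) cold++;
    }
    if (hot > HIST_LEN / 2)  return PAGE_HOT;
    if (cold > HIST_LEN / 2) return PAGE_COLD;
    return PAGE_NEUTRAL;
}
```

Requiring agreement over several epochs filters out pages that were hot for only a single epoch, so re-coloring is not triggered by transient bursts.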

Re-coloring Algorithm ● For each cache page, keep track of the virtual pages which access it most frequently. ● From each HOT page, move all but one of the frequently accessed virtual pages to a COLD page in the cache. ● We exit when the number of HOT or COLD pages becomes zero, or when a maximum number of pages have been re-colored.
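The loop described above can be sketched as follows. The list representation, the per-page limit MAX_VPAGES, the budget MAX_RECOLOR, and the remap_vpage() helper are all illustrative assumptions; in the real design the remap would go through the TLB-level translation discussed on the next slide:

```c
#include <stddef.h>

#define MAX_VPAGES  4   /* frequent virtual pages tracked per cache page (assumed) */
#define MAX_RECOLOR 32  /* re-coloring budget per invocation (assumed) */

/* Stub: would update the address-remap table so that virtual page
 * `vp` now maps to cache page `to`. */
static void remap_vpage(unsigned long vp, size_t to) { (void)vp; (void)to; }

typedef struct {
    unsigned long vpages[MAX_VPAGES]; /* most frequently accessing virtual pages */
    size_t n_vpages;
} cache_page_info;

/* Move all but one frequent virtual page off each HOT cache page onto
 * COLD cache pages.  `hot` and `cold` are index lists produced by the
 * hot/cold detection pass and are consumed from the back. */
void recolor(cache_page_info *pages,
             size_t *hot, size_t n_hot,
             size_t *cold, size_t n_cold)
{
    size_t moved = 0;
    while (n_hot > 0 && n_cold > 0 && moved < MAX_RECOLOR) {
        cache_page_info *hp = &pages[hot[--n_hot]];
        /* keep vpages[0] in place, spread the rest over cold pages */
        for (size_t i = 1;
             i < hp->n_vpages && n_cold > 0 && moved < MAX_RECOLOR;
             i++) {
            remap_vpage(hp->vpages[i], cold[--n_cold]);
            moved++;
        }
        hp->n_vpages = 1;
    }
}
```

Leaving exactly one frequent virtual page on each formerly HOT cache page is what spreads the conflicting pages apart while disturbing as few mappings as possible.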

Re-Coloring ● Ideally, the page in memory should be moved. ● Following the idea of Calder et al., we can achieve the same effect by modifying the TLB. ● We simulated this in SMTSIM by implementing an address remap module.
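The remap module amounts to rewriting the page-color bits of an address before it indexes the L2, instead of physically copying the page. A minimal sketch, assuming an illustrative geometry (8 KB cache pages and 64 colors; the slides do not give the actual parameters):

```c
#include <stdint.h>

#define PAGE_SHIFT 13           /* 8 KB cache pages (assumed) */
#define N_COLORS   64           /* number of page colors (assumed) */
#define COLOR_MASK ((uint64_t)(N_COLORS - 1))

/* Remap table: color_map[c] is the cache page that addresses whose
 * natural color is c are steered to.  Identity means "no remap". */
static uint64_t color_map[N_COLORS];

void remap_init(void)
{
    for (uint64_t c = 0; c < N_COLORS; c++)
        color_map[c] = c;
}

/* Re-color: addresses of color `from` now land in cache page `to`. */
void remap_set(uint64_t from, uint64_t to)
{
    color_map[from & COLOR_MASK] = to & COLOR_MASK;
}

/* Applied to each address before L2 indexing, mimicking the
 * TLB-resident translation: only the color bits change, so the
 * page offset and the higher address bits are preserved. */
uint64_t remap_addr(uint64_t paddr)
{
    uint64_t color  = (paddr >> PAGE_SHIFT) & COLOR_MASK;
    uint64_t newcol = color_map[color];
    return (paddr & ~(COLOR_MASK << PAGE_SHIFT))
         | (newcol << PAGE_SHIFT);
}
```

Because only the cache-index bits are rewritten, the data never moves in memory; the hot virtual page simply starts competing for a different (cold) set of L2 lines.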

Changes to SMTSIM [Diagram: two SMT processor contexts, each with an instruction cache (IC), data cache (DC) and miss address file (MAF), sharing an L2 cache; a Translation Unit containing the Hot/Cold Detection and Remap logic sits between the processors and the L2 cache.]

Result

Future Work ● These experiments did not produce very good results, but there is further scope for improvement. ● There are many more design choices for the re-coloring algorithm; ours was a basic one. ● Based on the HOT/COLD information, the effect of skewed cache indexing may also be investigated.