A Graph Theoretic Approach to Cache-Conscious Placement of Data for Direct Mapped Caches Mirza Beg and Peter van Beek University of Waterloo June 6 2010.

Slides:



Advertisements
Similar presentations
Impact of Interference on Multi-hop Wireless Network Performance Kamal Jain, Jitu Padhye, Venkat Padmanabhan and Lili Qiu Microsoft Research Redmond.
Advertisements

Minimum Clique Partition Problem with Constrained Weight for Interval Graphs Jianping Li Department of Mathematics Yunnan University Jointed by M.X. Chen.
Context-Sensitive Interprocedural Points-to Analysis in the Presence of Function Pointers Presentation by Patrick Kaleem Justin.
Bayesian Networks, Winter Yoav Haimovitch & Ariel Raviv 1.
Explicit Preemption Placement for Real- Time Conditional Code via Graph Grammars and Dynamic Programming Bo Peng, Nathan Fisher, and Marko Bertogna Department.
1 Advancing Supercomputer Performance Through Interconnection Topology Synthesis Yi Zhu, Michael Taylor, Scott B. Baden and Chung-Kuan Cheng Department.
1 NP-Complete Problems. 2 We discuss some hard problems:  how hard? (computational complexity)  what makes them hard?  any solutions? Definitions 
Global Flow Optimization (GFO) in Automatic Logic Design “ TCAD91 ” by C. Leonard Berman & Louise H. Trevillyan CAD Group Meeting Prepared by Ray Cheung.
Approximation Algorithms Chapter 5: k-center. Overview n Main issue: Parametric pruning –Technique for approximation algorithms n 2-approx. algorithm.
S. J. Shyu Chap. 1 Introduction 1 The Design and Analysis of Algorithms Chapter 1 Introduction S. J. Shyu.
GOLOMB RULERS AND GRACEFUL GRAPHS
CS774. Markov Random Field : Theory and Application Lecture 17 Kyomin Jung KAIST Nov
A Fairy Tale of Greedy Algorithms Yuli Ye Joint work with Allan Borodin, University of Toronto.
Train DEPOT PROBLEM USING PERMUTATION GRAPHS
Clustering short time series gene expression data Jason Ernst, Gerard J. Nau and Ziv Bar-Joseph BIOINFORMATICS, vol
Perfect Graphs Lecture 23: Apr 17. Hard Optimization Problems Independent set Clique Colouring Clique cover Hard to approximate within a factor of coding.
Vertex Cover, Dominating set, Clique, Independent set
NP-Complete Problems Reading Material: Chapter 10 Sections 1, 2, 3, and 4 only.
NP-Complete Problems Problems in Computer Science are classified into
Code Generation for Basic Blocks Introduction Mooly Sagiv html:// Chapter
CSE 421 Algorithms Richard Anderson Lecture 6 Greedy Algorithms.
Introduction to Boosting Aristotelis Tsirigos SCLT seminar - NYU Computer Science.
CS Master – Introduction to the Theory of Computation Jan Maluszynski - HT Lecture NP-Completeness Jan Maluszynski, IDA, 2007
1 Internet Networking Spring 2002 Tutorial 6 Network Cost of Minimum Spanning Tree.
ARCHEOLOGICAL SERIATION AND INTERVAL GRAPHS
The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.
Approximation Algorithms
TECH Computer Science Graph Optimization Problems and Greedy Algorithms Greedy Algorithms  // Make the best choice now! Optimization Problems  Minimizing.
Outline Introduction The hardness result The approximation algorithm.
Applications of types of SATs Arash Ahadi What is SAT?
Network Aware Resource Allocation in Distributed Clouds.
1 Treewidth, partial k-tree and chordal graphs Delpensum INF 334 Institutt fo informatikk Pinar Heggernes Speaker:
Introduction to Job Shop Scheduling Problem Qianjun Xu Oct. 30, 2001.
1 CS612 Algorithms for Electronic Design Automation CS 612 – Lecture 8 Lecture 8 Network Flow Based Modeling Mustafa Ozdal Computer Engineering Department,
Batch Scheduling of Conflicting Jobs Hadas Shachnai The Technion Based on joint papers with L. Epstein, M. M. Halldórsson and A. Levin.
1 Network Coding and its Applications in Communication Networks Alex Sprintson Computer Engineering Group Department of Electrical and Computer Engineering.
1 CPSC 320: Intermediate Algorithm Design and Analysis July 11, 2014.
Minimizing Cache Usage in Paging Alejandro López-Ortiz, Alejandro Salinger University of Waterloo.
TECH Computer Science NP-Complete Problems Problems  Abstract Problems  Decision Problem, Optimal value, Optimal solution  Encodings  //Data Structure.
Major objective of this course is: Design and analysis of modern algorithms Different variants Accuracy Efficiency Comparing efficiencies Motivation thinking.
Memory Allocation of Multi programming using Permutation Graph By Bhavani Duggineni.
1 Markov Random Fields with Efficient Approximations Yuri Boykov, Olga Veksler, Ramin Zabih Computer Science Department CORNELL UNIVERSITY.
CS774. Markov Random Field : Theory and Application Lecture 02
Introduction to Graph Theory
NP-Complete Problems. Running Time v.s. Input Size Concern with problems whose complexity may be described by exponential functions. Tractable problems.
Graph Coloring. Vertex Coloring problem in VLSI routing channels Standard cells Share a track Minimize channel width- assign horizontal Metal wires to.
Trading Cache Hit Rate for Memory Performance Wei Ding, Mahmut Kandemir, Diana Guttman, Adwait Jog, Chita R. Das, Praveen Yedlapalli The Pennsylvania State.
Fixed parameter algorithms for protein similarity search under mRNA structure constrains A joint work by: G. Blin, G. Fertin, D. Hermelin, and S. Vialette.
CSE 589 Part V One of the symptoms of an approaching nervous breakdown is the belief that one’s work is terribly important. Bertrand Russell.
Computing Branchwidth via Efficient Triangulations and Blocks Authors: F.V. Fomin, F. Mazoit, I. Todinca Presented by: Elif Kolotoglu, ISE, Texas A&M University.
1 CS612 Algorithms for Electronic Design Automation CS 612 – Lecture 8 Lecture 8 Network Flow Based Modeling Mustafa Ozdal Computer Engineering Department,
A global approach Finding correspondence between a pair of epipolar lines for all pixels simultaneously Local method: no guarantee we will have one to.
1 CPSC 320: Intermediate Algorithm Design and Analysis July 30, 2014.
Register Allocation Ajay Mathew Pereira and Palsberg. Register allocation via coloring of chordal graphs. APLOS'05Register allocation via coloring of chordal.
Energy minimization Another global approach to improve quality of correspondences Assumption: disparities vary (mostly) smoothly Minimize energy function:
Hongyu Liang Institute for Theoretical Computer Science Tsinghua University, Beijing, China The Algorithmic Complexity.
Conceptual Foundations © 2008 Pearson Education Australia Lecture slides for this course are based on teaching materials provided/referred by: (1) Statistics.
Single Static Assignment Intermediate Representation (or SSA IR) Many examples and pictures taken from Wikipedia.
ICS 353: Design and Analysis of Algorithms NP-Complete Problems King Fahd University of Petroleum & Minerals Information & Computer Science Department.
Graph Coloring and Applications
An introduction to chordal graphs and clique trees
Mathematical Foundations of AI
COMPLEXITY THEORY IN PRACTICE
Parallel Programming By J. H. Wang May 2, 2017.
Hard Problems Introduction to NP
What is the next line of the proof?
Approximation Algorithms
ICS 353: Design and Analysis of Algorithms
Objective of This Course
Register Allocation via Coloring of Chordal Graphs
Presentation transcript:

A Graph Theoretic Approach to Cache-Conscious Placement of Data for Direct Mapped Caches Mirza Beg and Peter van Beek University of Waterloo June

Outline Motivation and Introduction Background Data Assignment to Cache Evaluation Related Work Conclusions 2

Outline Motivation and Introduction Background Data Assignment to Cache Evaluation Related Work Conclusions 3

Motivation Cache is a bottleneck for data access Important to improve cache locality instrumental for program performance Important to improve theoretical understanding of cache management 4

Introduction Direct mapped cache Each block has only one place it can appear in cache Types of cache misses Compulsory misses Conflict misses Capacity misses Goal is to minimize number of conflict misses Place data in memory such that it is mapped to the cache to cause minimum number of misses 5

Problem 6 Cache o1o1 o2o2 onon Memory objects Object access sequence < o 1, o 2, o 4, o 1, o 4, o 2, o 1, o 4, o 5, o 4, o 4, o 1, o 3, o 2, o 4, …… o n-1, o 9, o n, o 1, o 4 > < o 1, o 2, o 4, o 1, o 4, o 2, o 1, o 4, o 5, o 4, o 4, o 1, o 3, o 2, o 4, …… o n-1, o 9, o n, o 1, o 4 >

Problem A direct mapped cache with k lines O: the set of all objects accessed during the program run  : sequence of object accesses Defn: Given O and , find a mapping f: O  {1, …, k} such that the number of conflict misses is minimized The problem in general is NP-complete 7

Limiting Assumptions Direct mapped cache Object sizes equal to the size of line in cache Accurate profile information Known sequence of object accesses 8

Outline Motivation and Introduction Background Data Assignment to Cache Evaluation Related Work Conclusions 9

Graph Theory Interval Graph: An intersection graph of a set of intervals 10

Graph Theory Several NP-hard problems become polynomial because interval graphs have a perfect elimination order (PEO) The PEO of an interval graph can be determined in linear time For a graph with a PEO the chromatic number can be determined in linear time All maximal cliques can be found in linear time Induced sub-graphs of interval graphs are also interval graphs 11

Outline Motivation and Introduction Background Data Assignment to Cache Evaluation Related Work Conclusions 12

Program Profile 13 What are the events in the sequence ? Memory reads are recorded Where the interval starts and stops ? From the first read to the last Memory reads are recorded The interval begins at the first read and ends at the last o1o1 o2o2 o4o4 o3o3

Conflict Graph Each vertex represents a memory object Each edge weight represents the number of conflict misses caused if the objects connected to the edge are assigned to the same line in the cache 14  = ( o 1, o 2, o 3, o 1, o 2, o 3, o 1, o 2, o 1, o 4, o 1, o 4, o 1, o 4, o 1, o 4, o 3, o 2, o 5, o 2, o 5, o 2, o 5, o 2, o 5, o 2, o 4, o 5, o 5, o 4, o 5, o 4, o 5 ) o1o1 o2o2 o3o3 o4o4 o5o  = ( o 1, o 2, o 3, o 1, o 2, o 3, o 1, o 2, o 1, o 4, o 1, o 4, o 1, o 4, o 1, o 4, o 3, o 2, o 5, o 2, o 5, o 2, o 5, o 2, o 5, o 2, o 4, o 5, o 5, o 4, o 5, o 4, o 5 )

Theoretical Results (I) Theorem: Given a sequence of data accesses, the conflict graph of a program is an interval graph Results for interval graphs apply to conflict graphs Theorem: Chromatic number of the conflict graph can be determined in linear time If the number of cache lines is greater than the chromatic number of the conflict graph then the objects can be optimally assigned to cache lines for zero conflict misses 15

Theoretical Results (II) Corollary: Chromatic number for conflict graph gives number of cache lines required to achieve zero conflict misses Corollary: For a placement that creates no cache conflicts, the sum of edges in the sub- graph for objects assigned to same cache line is zero, for each cache line No two objects having an edge between them in conflict graph are assigned to same cache line 16

Assignment of Objects to Cache 17

Assignment of Objects to Cache 18 o1o1 o2o2 o3o3 o4o4 o5o o1o1 o2o2 o3o3 o4o4 o5o5 o1o1 o2o2 o 34 o5o

Outline Motivation and Introduction Background Data Assignment to Cache Evaluation Related Work Conclusions 19

Evaluation A small set of benchmarks (C) A comparison of the number of misses between Cache conscious placement algorithm (ours) Modulo assignment (current practice) Simulation of the cache Assumes freedom to rearrange objects in memory (e.g. array elements) 20

Profiler Record the access sequence of memory objects for a complete run of the program Instrument the code for memory allocations and accesses Output: an ordered sequence of memory objects 21

Evaluation 22 Bisort from the Olden benchmark

Evaluation 23 Cachekiller

Outline Motivation and Introduction Background Data Assignment to Cache Evaluation Related Work Conclusions 24

Related Work First to study hardness of the offline cache conscious data placement [Thabit 1982] A complete framework for profile bases cache conscious data placement [Calder et al. 1998] Even if there are a small number of misses, cache conscious data placement cannot be reasonably approximated [Petrank and Rawitz 2002] Optimal register allocation for SSA-form programs [Hack and Goos 2006] 25

Outline Motivation and Introduction Background Data Assignment to Cache Evaluation Related Work Conclusions 26

Conclusions Identified relationship between conflict graphs and interval graphs Gives us a better understanding of direct- mapped caches An offline algorithm for cache conscious data placement for direct mapped caches an optimal assignment if there exists one with no conflict misses a heuristic to decrease the size of the conflict graph, otherwise 27

Questions & Comments 28