Automatically Characterizing Large Scale Program Behavior
Timothy Sherwood, Erez Perelman, Greg Hamerly, Brad Calder
Used with permission of author.



Slide 2 (ASPLOS: Sherwood et al.): Characterizing Program Behavior
Ideal: to understand the effects of cycle-level events on full program execution
Challenge: to achieve this without doing complete detailed simulation
How: build a high-level model of program behavior that can be used in conjunction with limited detailed simulation

Slide 3: Goals
The goals of this research are:
–To create an automatic system that is capable of intelligently characterizing time-varying program behavior
–To provide both analytic and software tools to help with program phase identification
–To demonstrate the utility of these tools for finding places to simulate (SimPoints)
–Without full-program detailed simulation

Slide 4: Our Approach
Programs are neither
–completely homogeneous
–nor totally random
Instead they are quite structured
Discover this structure
The key is the code that is executing
–the code determines the program behavior

Slide 5: Large Scale Behavior (gzip)
[Figure: time-varying plots of IPC, energy, DL1, IL1, and L2 cache behavior, and branch predictor (bpred) behavior over the execution of gzip]

Slide 6: Some Definitions
Interval:
–a set of instructions that execute one after the other in program order
–100 million instructions in this work
Phase:
–a set of intervals with very similar behavior
–regardless of temporal adjacency

Slide 7: Outline
–Examining the Programs
–Finding Phases Automatically
–Application to Efficient Simulation
–Conclusions

Slide 8: Fingerprinting Intervals
Fingerprint each interval in the program
–enabling us to build a high-level model
Basic Block Vector [PACT’01]
–tracks the code that is executing
–a long, sparse vector
–one dimension per static basic block
–based on instruction execution frequency

Slide 9: Basic Block Vectors
One Basic Block Vector (BBV) for each interval:
–one entry per static basic block, indexed by block ID
–each entry is the block’s execution count, weighted by the block’s size in instructions
–the vector is normalized to sum to 1
We can now compare vectors
Start with simple manual analysis
–compare all N^2 pairs of intervals
Enter the Similarity Matrix…
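The BBV construction above can be sketched in a few lines of Python. This is an illustrative sketch, not the paper’s tooling: `make_bbv` is a hypothetical helper, and the block IDs, sizes, and counts are made-up example data.

```python
# Sketch of Basic Block Vector construction for one interval.
# Each entry = (times the block executed) * (block size in instructions),
# then the whole vector is normalized so its entries sum to 1.

def make_bbv(exec_counts, block_sizes):
    """exec_counts: {block_id: executions in this interval}
       block_sizes: {block_id: instructions in the block}"""
    weighted = {b: exec_counts[b] * block_sizes[b] for b in exec_counts}
    total = sum(weighted.values())
    return {b: w / total for b, w in weighted.items()}

# made-up data: 3 static basic blocks seen in one interval
bbv = make_bbv({1: 20, 2: 5, 3: 10}, {1: 4, 2: 8, 3: 2})
```

Normalizing makes intervals of different lengths directly comparable: two intervals that spend the same fraction of time in the same code get nearby vectors.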

Slide 10: Distance Between 2 BBVs
Manhattan Distance: $d(a,b) = \sum_{i=1}^{D} |a_i - b_i|$
Euclidean Distance: $d(a,b) = \sqrt{\sum_{i=1}^{D} (a_i - b_i)^2}$
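Both formulas translate directly to code; `manhattan`, `euclidean`, and the toy 3-dimensional vectors below are illustrative names and data, not part of the paper’s implementation.

```python
import math

def manhattan(a, b):
    # sum of absolute per-dimension differences
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    # square root of the sum of squared per-dimension differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# two toy normalized BBVs
a = [0.5, 0.3, 0.2]
b = [0.4, 0.4, 0.2]
```

The paper uses Manhattan distance for the similarity matrix and Euclidean distance during clustering; both are valid metrics over BBV space.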

Slide 11: Similarity Matrix (gzip-graphic)
–Element (x, y) is the distance between the BBVs of 2 intervals
–Compares all N^2 pairs of intervals
–The diagonal axis is time, in executed instructions
–To compare 2 points, go horizontally from one and vertically from the other
–Darker points indicate similar vectors
–Clearly shows the phase behavior

Slide 12: Time Varying Behavior (gzip-graphic)
[Figure: IPC and L1 data cache miss rate over time, each plotted as a % of that metric’s maximum value]

Slide 13: A More Complex Matrix - gcc
–Still much structure
–Dark boxes show phase behavior
–Boxes in the interior show recurring phases
–A strong dark line off the main diagonal indicates that the first half of execution is similar to the second half
–Manual inspection is not feasible or scalable

Slide 14: Outline
–Examining the Programs
–Finding Phases Automatically
–Application to Efficient Simulation
–Conclusions

Slide 15: Finding the Phases
A Basic Block Vector is a point in space
The problem is to find groups of vectors/points that are all similar
–making sure that all points in a group are similar to one another
–and ensuring that points that are different are put into different groups
This is a clustering problem
A phase is a cluster of BB Vectors

Slide 16: Phase-finding Algorithm
I. Profile the program and track BB Vectors
II. Use the k-means algorithm to find clusters in the data, for many different values of k
III. Score the likelihood of each clustering
IV. Pick the best clustering

Slide 17: Improving Performance
K-means requires many vector manipulations
–Basic Block Vectors are very long: more than 100,000 dimensions for gcc; 800,000 for Microsoft apps
–Need to make the vectors smaller
–while still preserving relative distances

Slide 18: Random Projection
–Multiply each vector by a random matrix
–Can safely reduce down to 15 dimensions
–Reduces run-time from days to minutes
X' = X × M, where X is the (intervals × nbb) matrix of BBVs, M is an (nbb × 15) matrix whose entries are drawn uniformly at random from [-1, 1], and X' is the reduced (intervals × 15) matrix
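A minimal sketch of this projection, assuming plain Python lists for the vectors (a real implementation would use a linear-algebra library); `random_projection` and the toy data are illustrative, not the paper’s code.

```python
import random

def random_projection(vectors, out_dim=15, seed=0):
    """Multiply each D-dimensional vector by a fixed D x out_dim matrix
    whose entries are uniform random values in [-1, 1]."""
    rng = random.Random(seed)
    d = len(vectors[0])
    m = [[rng.uniform(-1.0, 1.0) for _ in range(out_dim)] for _ in range(d)]
    return [[sum(v[i] * m[i][j] for i in range(d)) for j in range(out_dim)]
            for v in vectors]

# three toy sparse 1000-dimensional "BBVs", reduced to 15 dimensions
vecs = [[0.0] * 1000 for _ in range(3)]
for v in vecs:
    v[0] = 1.0
low = random_projection(vecs)
```

The same matrix M must be used for every interval, so that relative distances between intervals are (approximately) preserved after projection.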

Slide 19: K-means and Scoring
Initialize by randomly choosing k points as cluster centers
Iterate until convergence (no membership changes):
–For each data point, compute the distance to each center and assign the point to the closest cluster
–For each cluster, make the new center the centroid of its member points
Do this for k = 1 to 10
Use the BIC (Bayesian Information Criterion) to score each of the 10 resulting clusterings
–Choose a clustering that achieves at least 90% of the spread in BIC scores
–A high BIC score means the clustering predicts the performance metric with low variance
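The iteration on this slide is standard Lloyd’s k-means. A minimal sketch (omitting the BIC scoring step), assuming Euclidean distance and toy 2-D points; `kmeans` and the data are illustrative names, not the paper’s implementation.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: random initial centers, then alternate
    assignment and centroid update until memberships stop changing."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # k distinct points as initial centers
    assign = None
    for _ in range(iters):
        # assignment step: each point joins its closest center's cluster
        new_assign = [min(range(k),
                          key=lambda c: sum((p[i] - centers[c][i]) ** 2
                                            for i in range(len(p))))
                      for p in points]
        if new_assign == assign:             # converged: no membership changes
            break
        assign = new_assign
        # update step: move each center to the centroid of its members
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign

# two well-separated toy groups in 2-D
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
       (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centers, labels = kmeans(pts, 2)
```

In the paper, this runs on the 15-dimensional projected BBVs for each k from 1 to 10, and BIC then arbitrates among the ten resulting clusterings.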

Slide 20: Example – gzip Revisited
[Figure: IPC, energy, DL1, IL1, L2, and bpred behavior over time for gzip]

Slide 21: gzip – Phases Discovered
[Figure: the same gzip metrics, annotated with the phases found by clustering]

Slide 22: gcc – A Complex Example
[Figure: IPC, energy, DL1, IL1, L2, and bpred behavior over time for gcc]

Slide 23: gcc – Phases Discovered
[Figure: the same gcc metrics, annotated with the phases found by clustering]

Slide 24: Outline
–Examining the Programs
–Finding Phases Automatically
–Application to Efficient Simulation
–Conclusions

Slide 25: Efficient Simulation
Simulating to completion is not feasible
–detailed simulation of SPEC takes months
–cycle-level effects can’t be ignored
To reduce simulation time
–simulate only a subset of the program at cycle-level accuracy
–which subset you pick is very important, for both accuracy and efficiency

Slide 26: Simulation Options
–Simulate blind: no estimate of accuracy
–Single point: a problem for complex programs that have many phases
–Random sample: high accuracy, but with many sections of similar code you will do a lot of redundant work
–Choose multiple points, by examining the calculated phase information

Slide 27: Multiple SimPoints
Perform phase analysis
For each phase in the program
–pick the interval most representative of the phase (smallest Euclidean distance from the centroid)
–this is the SimPoint for that phase
Perform detailed simulation for the SimPoints
Weigh the results for each SimPoint
–according to the size of the phase it represents
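The selection and weighting steps above can be sketched as follows; `pick_simpoints`, `estimate`, and the toy 1-D data are hypothetical names and inputs for illustration, not the released SimPoint tool.

```python
def pick_simpoints(vectors, labels, k):
    """For each cluster, pick the member interval closest to the centroid
    and weight it by the cluster's share of all intervals."""
    simpoints = {}
    dim = len(vectors[0])
    for c in range(k):
        members = [i for i, a in enumerate(labels) if a == c]
        if not members:
            continue
        centroid = [sum(vectors[i][d] for i in members) / len(members)
                    for d in range(dim)]
        best = min(members,
                   key=lambda i: sum((vectors[i][d] - centroid[d]) ** 2
                                     for d in range(dim)))
        simpoints[best] = len(members) / len(labels)   # weight = phase size
    return simpoints

def estimate(simpoints, per_interval_metric):
    """Whole-program estimate = weighted sum of the metric at each SimPoint."""
    return sum(w * per_interval_metric[i] for i, w in simpoints.items())

# toy example: 5 intervals in 2 phases
vecs = [[0.0], [0.2], [0.1], [5.0], [5.2]]
labels = [0, 0, 0, 1, 1]
sp = pick_simpoints(vecs, labels, 2)
```

Because the weights are the phase sizes, detailed simulation of only the SimPoint intervals yields an estimate of the full-program metric.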

Slide 28: Sample of SPEC2000 SimPoints
[Table: for each benchmark, the chosen SimPoints (intervals of 100M instructions) and their weights]

Slide 29: Results – Average Error
[Chart: average error with respect to full SimpleScalar simulation, for the integer programs]

Slide 30: An Aside – Why the High Average Error Rates?
For the L2 and instruction caches (especially for the integer benchmarks), multiple-SimPoint errors were high. Why?
–The errors are combined using the arithmetic mean
–Some programs have a small number of misses, and the errors for them act as outliers

Slide 31: Results – Max Error
[Chart: maximum error for each benchmark]

Slide 32: Outline
–Examining the Programs
–Finding Phases Automatically
–Application to Efficient Simulation
–Conclusions

Slide 33: Conclusions
There is a gap between
–cycle-level events
–and full-program effects
Exploit large-scale structure to
–provide a high-level model
–find the model with no detailed simulation
–use it in conjunction with limited detailed simulation

Slide 34: Conclusions (cont.)
Our strategy:
–take advantage of the structure found in programs
–summarize that structure in the form of phases
–find phases using techniques from clustering
Use this for efficient simulation
–high accuracy
–with orders of magnitude less time

Workloads (© 2003, Carla Ellis)
[Diagram: a spectrum of experimental environments – prototype/real system, execution-driven simulation, trace-driven simulation, stochastic simulation – matched with workload types ranging from live/“real” workloads and benchmark applications (SPEC2000) to micro-benchmark programs, synthetic traces, and made-up data sets; SPEC2000 intervals and Basic Block Vectors feed the SimPoints analysis, with fast-forwarding between points]

Discussion: Destination Initial Hypothesis (© 2003, Carla Ellis)
[Diagram: the experimental process from a vague idea, through “groping around” experiences and initial observations, to a hypothesis]
Pre-proposal 1: Sketch out what information you would need to collect (or have already gathered) in a “groping around” phase to get from a vague idea to the hypothesis stage for your planned project.