15-745 Spring 2005: Path Profile Estimation and Superblock Formation. Jeff Pang, Jimeng Sun.

Presentation transcript:

Path Profile Estimation and Superblock Formation
Jeff Pang, Jimeng Sun

Motivation
Continuous Optimization
Dynamic Optimization
Realistic Profiles
[Diagram labels: Optimize, Compile, Run, Profile]

Challenges
Automate optimization
Low overhead profiling
Accurate profiling
[Diagram labels: Optimize, Compile, Run, Profile, Path Profile, Sample, Estimate]

Project Goals
Simulate a performance monitoring unit (PMU)
  – Like those in the Pentium 4, Itanium, PPC 970, etc.
  – Allows sampling of the last few branches executed
Estimate the full path profile using samples
  – Leverage data mining techniques similar to PageRank
Validate by doing Superblock formation
  – Powerful optimization to improve scheduling (especially on VLIW processors)
[Diagram labels: Superblock Formation, Run with Simulated PMU, Path Profile, Sample, Data Mining based Path Estimation]
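The PMU goal above can be pictured with a small sketch. This is not the project's HALT-based simulator; it is a toy model in Python, and the class name `SimulatedPMU` and the parameters `depth` and `period` are assumptions made for illustration. It keeps the last few executed branches in a ring buffer and records a snapshot of that buffer at a fixed sampling period.

```python
from collections import Counter, deque


class SimulatedPMU:
    """Toy PMU model: remembers the last `depth` executed branches.

    Every `period` branches, the current buffer contents are recorded as
    one partial-path sample. Both parameters are hypothetical values.
    """

    def __init__(self, depth=4, period=100):
        self.history = deque(maxlen=depth)  # ring buffer of recent branch edges
        self.period = period
        self.count = 0
        self.samples = Counter()            # partial path -> times sampled

    def on_branch(self, src, dst):
        """Hook that instrumentation would call at each executed branch."""
        self.history.append((src, dst))
        self.count += 1
        if self.count % self.period == 0:
            self.samples[tuple(self.history)] += 1


# Feed a short, made-up branch trace through the simulated PMU.
pmu = SimulatedPMU(depth=2, period=3)
trace = [("A", "B"), ("B", "D"), ("D", "A"), ("A", "C"), ("C", "D"), ("D", "A")]
for src, dst in trace:
    pmu.on_branch(src, dst)
```

Each key in `pmu.samples` is a partial path of at most `depth` branch edges, which is the kind of input a path estimator would consume.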

Superblock Formation
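Since the figure for this slide is not in the transcript, here is a rough sketch of the textbook form of superblock formation: pick a trace by following the hottest successor edge, then use tail duplication to remove side entrances. It is not the project's SUIF pass. The CFG representation (a dict of successor lists with string block names), the function names, and the edge weights are all assumptions for illustration.

```python
def select_trace(cfg, edge_weight, seed):
    """Grow a trace from `seed` by repeatedly following the hottest edge."""
    trace, block = [seed], seed
    while cfg.get(block):
        nxt = max(cfg[block], key=lambda s: edge_weight.get((block, s), 0))
        # Stop at a loop back into the trace or at a never-executed edge.
        if nxt in trace or edge_weight.get((block, nxt), 0) == 0:
            break
        trace.append(nxt)
        block = nxt
    return trace


def form_superblock(cfg, trace):
    """Return a new CFG where `trace` has no side entrances.

    Blocks from the first side entrance onward are duplicated (named with
    a trailing apostrophe), and every edge entering them from anywhere
    except their trace predecessor is redirected to the duplicate.
    """
    new_cfg = {b: list(succs) for b, succs in cfg.items()}
    for i in range(1, len(trace)):
        side_preds = [p for p, succs in cfg.items()
                      if trace[i] in succs and p != trace[i - 1]]
        if side_preds:
            break
    else:
        return new_cfg  # trace is already a superblock

    tail = trace[i:]
    copy_of = {b: b + "'" for b in tail}
    # Duplicated blocks keep their exits but chain to each other's copies.
    for b in tail:
        new_cfg[copy_of[b]] = [copy_of.get(s, s) for s in cfg[b]]
    # Redirect side entrances in the original blocks to the duplicates.
    pred_in_trace = {b: trace[j - 1] for j, b in enumerate(trace) if j >= i}
    for p, succs in cfg.items():
        new_cfg[p] = [copy_of[s] if s in copy_of and p != pred_in_trace[s] else s
                      for s in succs]
    return new_cfg


# Hypothetical diamond CFG with a hot left side.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
weights = {("A", "B"): 90, ("A", "C"): 10, ("B", "D"): 90,
           ("C", "D"): 10, ("D", "E"): 100}
trace = select_trace(cfg, weights, "A")
sb_cfg = form_superblock(cfg, trace)
```

In this example the cold block C originally jumps into the middle of the hot trace at D; after tail duplication it targets the copy D' instead, so the trace A, B, D, E can only be entered at A.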

Project Outline
Implement PMU simulator and Superblock optimization as SUIF passes
Implement Estimator offline using sampled branch profiles and SUIF CFG
[Diagram labels: source, label, instrument, c2dil, instrumented program, superblock, optimized program, offline estimator, path profile]
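To make the offline estimator step concrete, here is one very simple way such an estimator could work. It is a stand-in, not the authors' PageRank-like technique: it derives per-block branch probabilities from the sampled partial paths and scores each acyclic CFG path as the product of its edge probabilities (a first-order Markov assumption). All function names and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def edge_probabilities(samples):
    """Convert sampled partial paths into per-block branch probabilities."""
    edge_counts = Counter()
    for path, n in samples.items():
        for edge in path:
            edge_counts[edge] += n
    out_totals = defaultdict(int)
    for (src, _dst), n in edge_counts.items():
        out_totals[src] += n
    return {(s, d): n / out_totals[s] for (s, d), n in edge_counts.items()}


def enumerate_paths(cfg, entry, exit_block):
    """Yield every acyclic entry-to-exit path of a small CFG (iterative DFS)."""
    stack = [(entry, (entry,))]
    while stack:
        block, path = stack.pop()
        if block == exit_block:
            yield path
            continue
        for succ in cfg.get(block, ()):
            if succ not in path:  # ignore back edges
                stack.append((succ, path + (succ,)))


def estimate_path_profile(cfg, entry, exit_block, samples):
    """Estimate relative path frequencies from partial-path samples."""
    prob = edge_probabilities(samples)
    profile = {}
    for path in enumerate_paths(cfg, entry, exit_block):
        p = 1.0
        for edge in zip(path, path[1:]):
            p *= prob.get(edge, 0.0)  # unsampled edges count as never taken
        profile[path] = p
    return profile


# Made-up CFG and samples: each key is a tuple of sampled branch edges.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
samples = {(("A", "B"), ("B", "D")): 3, (("A", "C"), ("C", "D")): 1}
profile = estimate_path_profile(cfg, "A", "D", samples)
```

A Markov product like this loses correlations between branches that are further apart than one edge, which is one reason a real estimator would want to exploit the multi-branch samples more carefully.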

Completed
PMU Simulation: modified HALT profiling lib
Initial offline estimator (?)
[Diagram labels: source, label, instrument, c2dil, instrumented program, superblock, optimized program, offline estimator, path profile]

Initial Results

Todo
Superblock SUIF pass
  – Have c2dil build hyperblocks using superblocks
Refine path estimation heuristics
[Diagram labels: source, label, instrument, c2dil, instrumented program, superblock, optimized program, offline estimator, path profile]

Questions or Comments?
Anyone have a good scheduler from Assignment 2?
  – Better scheduler = better comparison of superblock scheduling performance
The HALT lib may be useful
  – If you need to uniquely label branches, basic blocks, loads, stores, etc.
  – Insert instrumentation at those points

Extra Slides

Related Work
Shye et al.
  – Used heuristics to rebuild path profiles using PMU partial path samples
  – Achieved 80-99% path accuracy with <5% overhead
  – Did not evaluate impact on optimizations
Chen et al.
  – Used PMU partial path sampling to dynamically form hot traces
  – Used to adapt to dynamic phase transitions