Michael J. Voss and Rudolf Eigenmann PPoPP, ‘01 (Presented by Kanad Sinha)

- Motivation
- General choices for adaptive optimization
- ADAPT
    - The Architecture
    - The Language
    - An example
- Results

There's only so much optimization that can be performed at compile time.
- Compilers must generate code for generic system models, making compile-time assumptions that may be sensitive to input that is unknown until runtime.
- Convergence of technologies: it is difficult to generate a common binary that exploits each individual system's characteristics.

Possible solution? “Use of adaptive and dynamic optimization paradigms, where optimization is performed at runtime when complete system and input knowledge is available.”

- Choose from statically generated code variants
    + Easy
    - May not result in the maximum possible optimization
    - Can result in code explosion
- Parameterization
    + Single copy of the source
    - May still not result in the maximum possible optimization
- Dynamic compilation
    + Complete input and system knowledge: maximum optimization possible
    - Considerable runtime overhead
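To make the first two options concrete, here is a minimal C sketch (illustrative only; the function names and the unroll/tile choices are not from the paper). Statically generated variants keep several pre-compiled versions of the same loop and pick one at runtime; parameterization keeps a single version whose tuning knob is an ordinary runtime argument.

    #include <stddef.h>

    /* Statically generated variants: two pre-compiled copies of the same
       loop, one unrolled by 4.  A dispatcher picks one at runtime. */
    static void scale_plain(double *a, size_t n, double k) {
        for (size_t i = 0; i < n; i++) a[i] *= k;
    }

    static void scale_unroll4(double *a, size_t n, double k) {
        size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            a[i] *= k; a[i+1] *= k; a[i+2] *= k; a[i+3] *= k;
        }
        for (; i < n; i++) a[i] *= k;   /* remainder iterations */
    }

    void scale_dispatch(double *a, size_t n, double k) {
        if (n >= 4) scale_unroll4(a, n, k);   /* variant chosen at runtime */
        else        scale_plain(a, n, k);
    }

    /* Parameterization: a single copy of the code; the tile size is a
       runtime argument (must be >= 1) instead of a compile-time constant. */
    void scale_tiled(double *a, size_t n, double k, size_t tile) {
        for (size_t t = 0; t < n; t += tile) {
            size_t end = (t + tile < n) ? t + tile : n;
            for (size_t i = t; i < end; i++) a[i] *= k;
        }
    }

The trade-off from the slide is visible directly: the variant approach duplicates code for every choice, while the parameterized version stays compact but can only adapt along the dimensions that were turned into parameters.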

- ADAPT: Automated De-Coupled Adaptive Program Optimization
- A generic framework that leverages existing tools
- Uses a domain-specific language, AL, in which adaptive techniques can be specified

- Supports dynamic compilation and parameterization
- Enables optimizations through "runtime sampling"
- Facilitates an iterative modification-and-search approach

Three functions of a dynamic/adaptive optimization system (sketched below):
- Evaluate the effectiveness of a particular optimization for the current input and system information
- Apply the optimization if it is profitable
- Re-evaluate applied optimizations and tune them according to current runtime conditions
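A minimal sketch of how the three roles might fit together. Everything here (the timing hook, the install helper, the toy work functions) is hypothetical scaffolding, not ADAPT's actual API:

    #include <stdio.h>
    #include <time.h>

    typedef void (*variant_fn)(void);

    static variant_fn active;   /* the version the application currently calls */

    /* Evaluate: time one run of a code version (coarse, clock()-based). */
    static double time_variant(variant_fn v) {
        clock_t t0 = clock();
        v();
        return (double)(clock() - t0) / CLOCKS_PER_SEC;
    }

    /* Apply: install a version as the active one. */
    static void install(variant_fn v) { active = v; }

    /* One pass of the evaluate / apply / re-evaluate cycle.  Call it again
       whenever input or system conditions change: an earlier decision is
       never assumed to stay optimal. */
    static void adapt_cycle(variant_fn candidate) {
        double t_cur = time_variant(active);
        double t_new = time_variant(candidate);
        if (t_new < t_cur)      /* profitable under current conditions */
            install(candidate);
    }

    /* Two toy versions of the same work, only to make the sketch runnable. */
    static void work_v1(void) { for (volatile long i = 0; i < 2000000; i++) ; }
    static void work_v2(void) { for (volatile long i = 0; i < 1000000; i++) ; }

    int main(void) {
        active = work_v1;
        adapt_cycle(work_v2);
        printf("active version: %s\n", active == work_v2 ? "v2" : "v1");
        return 0;
    }

In ADAPT itself this cycle is driven by the runtime system rather than by the application, but the evaluate/apply/re-evaluate shape is the same.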

The runtime system consists of:
- A modified version of the application
- A remote optimizer, which has
    - the source code
    - a description of the target machine
    - stand-alone tools and compilers
- A local optimizer, the remote optimizer's agent on the running system, which
    - detects hot spots
    - tracks multiple interval contexts (here, loop bounds)
    - runs in a separate thread (see the sketch below)
Optimization and execution are truly asynchronous.
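The "separate thread" point can be pictured with POSIX threads (a sketch under the assumption of a pthreads platform; the function names and the polling loop are invented):

    #include <pthread.h>
    #include <unistd.h>

    /* Hypothetical local-optimizer loop.  It runs in its own thread, so
       monitoring and tuning never block the application: this is the
       "truly asynchronous" decoupling on the slide. */
    static void *local_optimizer(void *arg) {
        (void)arg;
        for (;;) {
            /* poll interval timings, detect hot spots, and invoke the
               remote optimizer over RPC when one is found */
            sleep(1);   /* placeholder for real monitoring */
        }
        return NULL;
    }

    void start_local_optimizer(void) {
        pthread_t tid;
        pthread_create(&tid, NULL, local_optimizer, NULL);
        pthread_detach(tid);   /* runs alongside the application threads */
    }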

- The local optimizer (LO) invokes the remote optimizer (RO) when a hotspot is detected
- The RO tunes the interval using the available tools, according to user-specified heuristics
- The RPC returns
- If new code is available, it is dynamically linked into the application as the new best or experimental version, depending on the RO's message (see the linking sketch below)
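On a Unix-like system the dynamic-linking step could look like the following. This is a sketch only: the shared-object and symbol names are invented, and the slide does not specify that ADAPT uses exactly this mechanism.

    #include <dlfcn.h>
    #include <stdio.h>

    typedef void (*interval_fn)(double *, int);

    /* Load a freshly compiled version of a hot interval into the running
       process.  so_path and symbol (e.g. "interval7.so", "interval7")
       stand for whatever the remote optimizer shipped back. */
    interval_fn link_new_version(const char *so_path, const char *symbol) {
        void *handle = dlopen(so_path, RTLD_NOW | RTLD_LOCAL);
        if (!handle) {
            fprintf(stderr, "dlopen: %s\n", dlerror());
            return NULL;        /* keep running the current best version */
        }
        interval_fn fn = (interval_fn)dlsym(handle, symbol);
        if (!fn)
            fprintf(stderr, "dlsym: %s\n", dlerror());
        return fn;              /* caller installs it as best/experimental */
    }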

- Candidate code sections have two control-flow paths:
    - through the best known version
    - through the experimental version
    - each of these can be replaced dynamically
- A flag indicates which version to execute
- Experimental versions of each context are monitored:
    - the collected data is used as feedback
    - if an experimental version is better, it is swapped in as the new best known version
(A sketch of this dispatch scheme follows.)
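A minimal sketch of the two-path entry stub (illustrative; ADAPT generates this scaffolding automatically, and none of these identifiers come from the paper):

    #include <time.h>

    typedef void (*interval_fn)(double *, int);

    static interval_fn best;            /* best known version (set at startup) */
    static interval_fn experimental;    /* version under test, may be NULL     */
    static int run_experimental;        /* flag set by the local optimizer     */
    static double best_time = 1e30;     /* best observed time for this context */

    /* The application always enters the candidate section through this stub. */
    void interval_entry(double *data, int n) {
        if (run_experimental && experimental) {
            clock_t t0 = clock();
            experimental(data, n);                      /* sample the new version */
            double t = (double)(clock() - t0) / CLOCKS_PER_SEC;
            if (t < best_time) {                        /* feedback: better? swap */
                best = experimental;
                best_time = t;
            }
            run_experimental = 0;                       /* one sample per request */
        } else {
            best(data, n);
        }
    }

In this sketch the stub itself times and swaps; in ADAPT the monitoring data feeds back to the local optimizer, which decides when to promote an experimental version.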

The optimization process runs outside the critical path, i.e., decoupled from execution.

- The ADAPT Language (AL) *
- Features:
    - Uses an LL(1) grammar, which keeps the parser simple
    - A domain-specific language with a C-style format
    - Defines reserved words that at runtime contain useful input data and system information

* "A full description of the ADAPT language is beyond the scope of this paper", and, by extension, of this presentation.

The AL example on the slide has four parts:
- Initialize some variables
- Constraints
- An interface to the tool to be used
- A block that defines the heuristic

AL statements:
- constraint(compile-time constraint): supplies a compile-time constraint
- apply_spec(condition, type, syntax[, params]): a description of a tool or flag
- collect(event list) execute; : initiates the monitoring of an experimental code version
- mark_as_best: specifies that the code variant that would be generated under the current runtime conditions is a new best known version
- end_phase: denotes the end of an optimization phase

Test machines: a six-processor Sun Ultra Enterprise 4000 and a uniprocessor Pentium II Linux workstation.
- Useless copying: run a dynamically compiled version of the code without applying any optimization. Result: overhead of less than ~5%; some cases even show a speed-up.
- Specialization: loop bounds replaced as constants by their runtime values. Result: average improvement of 13.6% (E4000) and 2.2% (Pentium).
- Flag selection: experiment with various combinations of compiler flags. Result: average improvement of 35% (E4000) and 9.2% (Pentium); identified some non-intuitive choices.
- Loop unrolling: loops unrolled by factors that evenly divide the number of iterations of the innermost loop, up to a maximum factor of 10. Result: average improvement of 18% (E4000) and 5% (Pentium).
- Loop tiling: loops deemed appropriate are tiled for 1/2, 1/4, ..., 1/16 of the L2 cache size. Result: average improvement of 13.5% (E4000) and 9.8% (Pentium).
- Parallelization: loops deemed appropriate by Polaris are parallelized. Result: average improvement of 51.8% (E4000).

- There is an advantage to doing runtime optimization
- It can be applied to general-purpose programs as well
- For full-blown runtime optimization, the optimization process needs to move outside the critical path

if (questions("?!") == 1) delay();
THANK_YOU("Have a great weekend!");