Rohan Yadav and Charles Yuan (rohany) (chenhuiy)


Portability
Rohan Yadav and Charles Yuan (rohany) (chenhuiy)

Portability in Multiple Contexts
- Improving legacy code
- Architecture-adaptive variant selection
- Architecture-specific optimization

Improving Legacy Code (COBOL)
- Big idea: reconstruct "High-Level Information" (HLI) from old binaries
- Use HLI to perform new optimizations without the source code
- Many of these are only possible due to COBOL…
  - Stack and heap variables are at fixed offsets from static locations
  - COBOL runtime functions all live at particular memory locations
  - The constant pool is similarly located

Making BCD’s not suck These operations are incredibly slow Author’s identify and remove as many BCD op’s as possible Store intermediate results in registers Replace runtime BCD functions with better implementations

Results

Architecture-Adaptive Code
- Different GPU architectures support some operations better than others
- Select an algorithm implementation (variant) based on these differences, using machine learning
- Don't want to retrain on every new architecture!

Approach
- Collect device features (core count, clock rates, atomic performance, …)
- Find the features most relevant to each variant's performance
  - Using all the features is possible, but doesn't perform well
  - Profile kernels to find which device features matter most for each kernel
- Further limit the search space by cross-validating on the target architecture

Approach (cont.)
- Train on a set of source architectures
- Use the collected data to build a model for the target architecture (see the sketch below)
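An illustrative sketch of the idea, assuming hypothetical feature names, numbers, and variant names (not the paper's actual data or pipeline): a classifier maps device features to the fastest variant, trained only on measurements from source architectures.

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical [core_count, clock_mhz, atomics_per_us] for three source GPUs
source_features = [
    [2048, 1100, 120],
    [3584, 1500, 340],
    [5120, 1400, 610],
]
fastest_variant = ["scan_v1", "scan_v2", "scan_v2"]  # measured winners

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(source_features, fastest_variant)

# Choose a variant for a target GPU that was never benchmarked directly.
print(model.predict([[4352, 1350, 480]]))  # e.g. ['scan_v2']
```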

Results

Architecture-Specific Code
- Compilers rely on architecture-specific information
  - Cost models, memory models, optimization decisions
- These models are very complicated

Obtaining Models
- A large number of programs are run many times and analyzed
- The samples are redundant and excessive
- Experts can write heuristics to shortcut the process

Obtaining Models
- Problem 1: writing heuristics takes years (and millions of dollars)!
- Problem 2: hardware keeps changing (and getting more heterogeneous)!

Big Idea #1 Machine Learning

How to Use ML?
- Iterative compilation: automatically derive heuristics by training predictors to select optimizations
- Can outperform expert-written heuristics!
- Still a problem: random search wastes a ton of time
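A minimal sketch of iterative compilation, assuming hypothetical features, runtimes, and a single unroll-factor knob (none of which come from the paper): a predictor trained on a few measured runs becomes a learned heuristic that replaces further search.

```python
from sklearn.ensemble import RandomForestRegressor

# [loop_trip_count, body_size, unroll_factor] -> measured runtime (s)
X = [
    [1000, 12, 1], [1000, 12, 4], [1000, 12, 8],
    [64,   80, 1], [64,   80, 4], [64,   80, 8],
]
y = [2.1, 1.4, 1.6, 0.9, 1.1, 1.5]

predictor = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# For a new loop, predict each unroll factor's runtime and pick the
# cheapest: no extra compile-and-run cycles needed.
loop = [5000, 20]
best = min([1, 2, 4, 8], key=lambda u: predictor.predict([loop + [u]])[0])
print("predicted best unroll factor:", best)
```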

Big Idea #2 Active Learning

What Does That Mean?
- Don't just randomly run programs and then train
- Identify where the most optimization is possible and move in that direction!
- Key objective: minimize the number of samples per example

Sequential Analysis
- Candidate set: the possible next examples to use for training
- Traditionally: keep the training set disjoint from the candidate set
- New algorithm: in the main loop, consider not only new examples but also whether an old one is useful again

The main loop:
- Start from good-quality data
- Add one observation at a time
- Previous data stays in the candidate set
- Repeat until complete
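A minimal sketch of that sequential loop; `measure`, `fit_model`, and `acquisition` are hypothetical stand-ins for benchmarking a configuration, training the predictor, and scoring how informative a candidate would be, not the paper's actual interfaces.

```python
import random

def sequential_learn(candidates, measure, fit_model, acquisition, budget):
    # Start from a small batch of good-quality seed observations.
    training = [(x, measure(x)) for x in random.sample(candidates, 5)]

    for _ in range(budget):
        model = fit_model(training)
        # Score every candidate. Already-used examples stay in the
        # candidate set, so an old point can be selected again if the
        # model would still benefit from re-observing it.
        x = max(candidates, key=lambda c: acquisition(model, c))
        training.append((x, measure(x)))  # one observation at a time

    return fit_model(training)
```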

Algorithmic Tools
- Problem: need to estimate the uncertainty of each prediction
- Solution 1: Gaussian process (GP) regression, but GP inference is cubic time
- Solution 2: dynamic trees

Dynamic trees:
- Partition the state space into hyperrectangles of similar outputs
- Maintain a decision tree of hyperrectangle nodes
- Stochastically choose one of three tree manipulations at each step
- Result: no pruning pass at the end, and resistance to noisy data!
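The paper uses dynamic trees precisely to avoid the GP's cubic cost, but since there is no standard dynamic-tree library, this sketch uses a Gaussian process to illustrate the underlying idea both models share: a predictive mean plus a standard deviation that tells the learner where it is least certain.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

X = np.array([[1.0], [2.0], [4.0], [8.0]])  # configurations tried so far
y = np.array([2.1, 1.4, 1.6, 3.0])          # measured runtimes

gp = GaussianProcessRegressor().fit(X, y)

candidates = np.linspace(1, 8, 50).reshape(-1, 1)
mean, std = gp.predict(candidates, return_std=True)

# Sample next where predictive uncertainty is highest.
print("next configuration to measure:", candidates[np.argmax(std)][0])
```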

Evaluation
- Task: find the optimal set of compilation parameters for a program
  - Loop unrolling, cache tiling, register tiling
- SPAPT: a suite of search problems for automatic performance tuning
  - Stencil codes, linear algebra, and other HPC problems
- Compared against a baseline ML approach
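For a sense of scale, here is an illustrative sketch of this kind of search space (the knob names and ranges are hypothetical, not SPAPT's actual parameters): even a handful of knobs multiplies into a space too large to sample exhaustively, which is why sample efficiency matters.

```python
from itertools import product

unroll_factors = [1, 2, 4, 8, 16]
cache_tile_sizes = [16, 32, 64, 128]
register_tile_sizes = [1, 2, 4]

# Every combination of the three knobs is a candidate configuration.
space = list(product(unroll_factors, cache_tile_sizes, register_tile_sizes))
print(len(space), "configurations from just three knobs")  # 60
```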

Critique
- Maybe this is really an ML paper
- "compiler" shows up only 10 times in the paper
  - 6 of those before the introduction ends!

The Good
- Presents a convincing alternative to random search and traditional techniques
- Demonstrates that state-of-the-art ML-based approaches have a big efficiency gap to overcome
- Broadly applicable to compiler optimizations across domains (parallelism, performance, memory)

The Unclear
- How good are the optimized outputs?
- How well does the convergence generalize to other types of programs?
- How does the learner scale if it must revisit old samples frequently?

Discussion