CISC 879 - Machine Learning for Solving Systems Problems John Cavazos Dept of Computer & Information Sciences University of Delaware www.cis.udel.edu/~cavazos/cisc879.


CISC 879: Machine Learning for Solving Systems Problems
Machine Learning applied to Static Compilation
Lecture 2
John Cavazos
Dept of Computer & Information Sciences
University of Delaware

Hardware constantly changing
► Heterogeneous Processors in Gaming Devices
► Massively Parallel Graphics Processing Units
► Heterogeneous Processors in Supercomputers
► Powerful Embedded Devices

Compilers changing slower
► In the early days of compilers …
► 1957: The FORTRAN Automatic Coding System
[Figure labels: Front End, Middle End, Back End; Front End, Index Optimiz’n, Code Merge, Flow Analysis, Register Allocation, Final Assembly]

Compilers changing slower
► And 50 years later …
► Compilers have not changed much
► Inadequate support for modern architectures
► 2007: Typical Compiler
[Figure labels: Front End, Middle End, Back End; Front End, High-Level Optimiz’n, Mid-Level Optimiz’n, Flow Analysis, Register Allocation, Final Assembly]

Proposed Solution
► Intelligent Compilers
► Using AI (i.e., machine learning) techniques
► Learn to optimize
► Specialize to architecture
[Figure labels: Feedback; Intelligent Compiler (Ex: Neural Networks, Decision Trees, Reinforcement Learning); Applications; Architecture]

Intelligent Compilers?
► Compiler improves itself
► Showing it examples of behaviour we want
[Figure labels: Unroll, Tiling, Fusion, Fission]

Applying Machine Learning
► Inputs
  ► Program characterization
► Outputs
  ► Set of optimizations to apply

Case Study
► Whole Program Optimization
► Paper: Rapidly Selecting Good Compiler Optimizations using Performance Counters, Cavazos et al., CGO 2007

Whole Program Optimization
► Automatically construct “model”
► Map performance counters to good opts
► Model predicts optimizations to apply
► Use performance counter characterization

Inputs: Performance Cntrs
Columns: Mnemonic, Description, Avg Values
► FPU_IDL (Floating Unit Idle)
► VEC_INS (Vector Instructions)
► BR_INS (Branch Instructions)
► L1_ICH (L1 Icache Hits)
[Figure label: Application]

Outputs: Optimizations
Columns: Optimization Level, Optimizations Controlled
Optimization levels: Opt Level O0, Opt Level O1, Opt Level O2
Optimizations controlled, in slide order:
► Branch Opts Low
► Constant Prop / Local CSE
► Reorder Code
► Copy Prop / Tail Recursion
► Static Splitting / Branch Opt Med
► Simple Opts Low
► While into Untils / Loop Unroll
► Branch Opt High / Redundant BR
► Simple Opts Med / Load Elim
► Expression Fold / Coalesce
► Global Copy Prop / Global CSE
► SSA

Training Compiler
► Present a training database of
  ► Characteristics of application
  ► “Right” optimizations to use
[Figure labels: Unroll, Tiling, Fusion, Fission (twice); (.91,.32,.40,51); (.61,.12,.50,81); Model]

Using Trained Compiler
► Present characteristics of “new” application
► Compiler predicts how to optimize it
[Figure labels: (.81,.35,.40,69); Model]

Performance Counters

Characterization of 181.mcf
► Perf cntrs relative to several benchmarks
► Problem: Greater number of memory accesses per instruction than average
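
The "relative to several benchmarks" view above can be sketched as a ratio of each counter to its suite average. This is only an illustrative sketch: the function name `relative_profile`, the counter name `MEM_ACC`, and every number are made up for the example and do not come from the slides or the paper.

```python
from statistics import mean

def relative_profile(target, suite):
    """Divide each counter of `target` by that counter's mean over `suite`."""
    profile = {}
    for name, value in target.items():
        avg = mean(bench[name] for bench in suite)
        profile[name] = value / avg if avg else float("nan")
    return profile

# Hypothetical per-instruction event rates (illustrative values only).
suite = [
    {"MEM_ACC": 0.30, "BR_INS": 0.15},
    {"MEM_ACC": 0.40, "BR_INS": 0.25},
    {"MEM_ACC": 0.50, "BR_INS": 0.20},
]
target = {"MEM_ACC": 0.80, "BR_INS": 0.20}

rel = relative_profile(target, suite)
for name, ratio in rel.items():
    print(f"{name}: {ratio:.2f}x the suite average")
```

A ratio well above 1 for a memory-access counter is the kind of signal the slide calls out for 181.mcf.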

Training Perf Cntr Model
► Programs to train model (different from test program).
► Baseline runs to capture performance counter values.
► Obtain performance counter values for a benchmark.
► Runs with the best optimizations to get speedup values.
► Perform training on a large set of programs.

Using Perf Cntr Model
► New program for which we want good performance.
► Baseline run to capture performance counter values.
► Input performance counter values to model.
► Model predicts optimization sequences to apply.
► Model can predict multiple optimization sequences to try.
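
One way a model could propose several sequences to try is to treat its output as a per-optimization probability and draw on/off settings from it. The sketch below assumes exactly that; the slides do not say how the sequences are generated, and `sample_sequences` and the optimization names are hypothetical.

```python
import random

def sample_sequences(probabilities, count, seed=0):
    """Draw `count` on/off settings, enabling each opt with its probability."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [
        {opt: rng.random() < p for opt, p in probabilities.items()}
        for _ in range(count)
    ]

# Hypothetical model output: probability that each optimization is beneficial.
predicted = {"unroll": 1.0, "tiling": 0.6, "fusion": 0.3, "fission": 0.0}

candidates = sample_sequences(predicted, count=5)
for setting in candidates:
    enabled = [opt for opt, on in setting.items() if on]
    print(enabled)
```

Each candidate would then be compiled and timed, and the fastest one kept.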

Logistic Regression
► Variation of ordinary regression
► Inputs
  ► Continuous, discrete, or a mix
  ► 60 performance counters
  ► All normalized to cycles executed
► Outputs
  ► Number between 0 and 1
  ► Probability an optimization is beneficial
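
A minimal sketch of the idea, assuming plain stochastic gradient descent on the log loss and two counters instead of 60. The training data, learning rate, and function names are invented for illustration; this is not the paper's model or training procedure.

```python
import math

def normalize(counters, cycles):
    """Scale raw counter totals by the number of cycles executed."""
    return [c / cycles for c in counters]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability (between 0 and 1) that the optimization is beneficial."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def train(samples, labels, lr=0.5, epochs=2000):
    """Fit weights by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = predict(weights, bias, x) - y  # d(log loss)/dz
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

# Made-up training set: (raw counter totals, cycles), label 1 = opt helped.
raw = [([900, 100], 1000), ([800, 200], 1000),
       ([200, 700], 1000), ([100, 900], 1000)]
labels = [1, 1, 0, 0]
samples = [normalize(c, cyc) for c, cyc in raw]

weights, bias = train(samples, labels)
p_hi = predict(weights, bias, normalize([850, 150], 1000))
p_lo = predict(weights, bias, normalize([150, 800], 1000))
print(f"{p_hi:.2f} {p_lo:.2f}")
```

In the setting the slides describe, one such predictor per optimization would give a vector of probabilities for a new program.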

Experimental Methodology
► Pathscale industrial-strength compiler
  ► Compare to highest opt level (-Ofast)
  ► Orchestrate 121 compiler optimizations
► AMD Athlon processor
  ► Real machine; Not simulation
► 57 benchmarks
  ► SPEC (95, 2000), MiBench, Polyhedral

Evaluated Search Strategies
► RAND
  ► Randomly select 500 optimization seqs
► Combined Elimination (CE)
  ► State-of-the-art search technique [CGO ‘06]
► Performance Counter (PC) Model
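
The RAND strategy can be sketched as uniform sampling of on/off settings followed by picking the fastest. The sketch assumes each optimization is toggled independently with probability 0.5, and `toy_runtime` is a hypothetical stand-in for compiling and timing a real program.

```python
import random

def random_search(num_opts, evaluate, trials=500, seed=0):
    """Evaluate `trials` random on/off settings and keep the fastest one."""
    rng = random.Random(seed)
    best_seq, best_time = None, float("inf")
    for _ in range(trials):
        seq = tuple(rng.random() < 0.5 for _ in range(num_opts))
        elapsed = evaluate(seq)
        if elapsed < best_time:
            best_seq, best_time = seq, elapsed
    return best_seq, best_time

def toy_runtime(seq):
    """Hypothetical cost model: opts 0 and 1 help, opt 2 hurts."""
    return 10.0 - 1.0 * seq[0] - 0.5 * seq[1] + 0.8 * seq[2]

best_seq, best_time = random_search(3, toy_runtime)
print(best_seq, best_time)
```

With 121 real optimizations the space is far too large to cover, so how well this works depends on how many good settings exist.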

PCModel vs CE
► 9 benchmarks show over 20% improvement; 17% on average!
► Obtained over 25% improvement on 6 benchmarks!
► On average, CE obtains 9% and PC Model 17% over -Ofast

Performance vs Evaluations
► PC Model (17%)
► Random (17%)
► Combined Elim (12%)

CE worse than RAND?
► Combined Elimination
  ► Easily stuck in local minima
► RAND and PC Model
  ► Probabilistic techniques
  ► Depends on distribution of good points
  ► Not susceptible to local minima
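
The local-minimum effect can be shown on a toy problem. The sketch uses a simplified greedy elimination (start with everything on, turn off one optimization at a time while that helps), which is a stand-in and not the published Combined Elimination algorithm. `toy_runtime` is a hypothetical cost model in which two optimizations only pay off when disabled together.

```python
import random

def greedy_elimination(num_opts, evaluate):
    """Turn off one optimization per step, stopping when no removal helps."""
    current = [True] * num_opts
    best = evaluate(tuple(current))
    while True:
        trials = []
        for i in range(num_opts):
            if current[i]:
                trial = current.copy()
                trial[i] = False
                trials.append((evaluate(tuple(trial)), i))
        if not trials:
            break
        elapsed, index = min(trials)
        if elapsed >= best:
            break  # no single removal improves: stuck
        current[index] = False
        best = elapsed
    return tuple(current), best

def random_sampling(num_opts, evaluate, trials=200, seed=0):
    """Evaluate random on/off settings and keep the fastest one."""
    rng = random.Random(seed)
    best_seq, best_time = None, float("inf")
    for _ in range(trials):
        seq = tuple(rng.random() < 0.5 for _ in range(num_opts))
        elapsed = evaluate(seq)
        if elapsed < best_time:
            best_seq, best_time = seq, elapsed
    return best_seq, best_time

def toy_runtime(seq):
    """Hypothetical: disabling one opt alone hurts, disabling both helps."""
    a, b = seq
    if a and b:
        return 10.0
    if a or b:
        return 11.0
    return 8.0

greedy_result = greedy_elimination(2, toy_runtime)
sampled_result = random_sampling(2, toy_runtime)
print(greedy_result, sampled_result)
```

The greedy search never leaves the all-on setting because every single step looks worse, while sampling has no such dependence on the path taken.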

Static vs Dynamic Features

Conclusions
► Using machine learning successful
  ► Out-performs production-quality compiler
► Using performance counters
  ► Automatically determines important characteristics
  ► Optimizations applied only when beneficial

Example Projects
► Use performance counters to predict “how” and “when” to apply an optimization
  ► Individual Opts: E.g., how many times to unroll a loop?
  ► Optimization sequences: Which opts to apply?
► Malware identification
  ► Can malware be identified by performance counter characteristics?