Practical Assignment Sinking for Dynamic Compilers

Practical Assignment Sinking for Dynamic Compilers Reid Copeland, Mark Stoodley, Vijay Sundaresan, Thomas Wong IBM Toronto Lab Compilation Technology 9/17/2019

Agenda: Introduction; Practical Dataflow Analysis; Program Transformation Overview; Results and Summary.

Introduction: A local variable assignment is redundant if execution can follow a path on which the assigned variable is dead. Goal: remove such redundant assignments. Transformation: move the assignment down past the blocks where the variable is dead, so that the redundant store is avoided.

Optimization: Assignment sinking is a widely implemented optimization in static compilers. It is commonly implemented with a PRE-based (partial redundancy elimination) algorithm, which is too expensive for a dynamic compiler. For the Testarossa JIT compiler, a practical method was devised to do assignment sinking. This presentation contains material which has Patents Pending.

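Example: [CFG figure, apparently before sinking: blocks BB5 to BB11; BB5 contains x = a; other blocks contain z = x + y, z = x * 2 and y = x / 2.]

Example: [CFG figure, apparently after sinking: the same blocks; x = a appears three times, alongside z = x + y and y = x / 2.]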

Motivation: Execution can be sped up if an assignment is sunk from a hot or scorching block to a cold block. [Figure: blocks BB5, BB6 and BB7; BB5 contains "x = ..."; a scorching edge is annotated "x not live"; another block contains the use "... = x".]

Motivation (cont'd): [Figure: the same CFG with a second "x = ..." shown next to the use "... = x".]

Motivation: [Figure: blocks BB5, BB7, BB9 and BB13 containing "i0 = i", "i = i0 + 1", "... = i0", "if () goto BB5" and a later use of i.]

Motivation (cont'd): [Figure: the same blocks, with "i = i0 + 1" shown twice and an additional "i = i0 + 2" before the use of i.]

Practical Dataflow Analysis: Formulate the dataflow problem in terms of partial liveness, since partial liveness implies a redundant assignment. The solution to the partial liveness problem indicates which blocks have both live and dead paths. Use that solution to perform the assignment sinking transformation.

Dataflow Variables: Liveness: a variable is live at the block on some path. Live-On-All-Path (LOAP): a variable is live at the block on all the paths which follow it. Live-On-Not-All-Path (LONAP): a variable is only partially live at the block, that is, the block has both live and dead successor paths.

Dataflow Equations (Liveness): Any-path backward dataflow analysis; a variable is live at the block on some path. GEN set: the variables used in the block before possibly being assigned. KILL set: the variables assigned in the block.
$$\mathrm{Liveness}_{out}(b) = \bigcup_{b_i \in \mathrm{succ}(b)} \mathrm{Liveness}_{in}(b_i)$$
$$\mathrm{Liveness}_{in}(b) = \mathrm{GEN}(b) \cup \bigl(\mathrm{Liveness}_{out}(b) - \mathrm{KILL}(b)\bigr)$$
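The liveness equations on this slide can be solved by iterating to a fixed point. The following is a minimal sketch, not the Testarossa implementation: the function name `liveness`, the dictionaries `SUCC`, `GEN` and `KILL`, and the four-block CFG itself are assumptions made up for illustration (the block names merely echo the slides).

```python
def liveness(succ, gen, kill):
    """Any-path backward analysis, iterated until nothing changes."""
    live_in = {b: set() for b in succ}
    live_out = {b: set() for b in succ}
    changed = True
    while changed:
        changed = False
        for b in succ:
            # Liveness_out(b): union of Liveness_in over b's successors
            out = set().union(*(live_in[s] for s in succ[b]))
            # Liveness_in(b) = GEN(b) | (Liveness_out(b) - KILL(b))
            new_in = gen[b] | (out - kill[b])
            if out != live_out[b] or new_in != live_in[b]:
                live_out[b], live_in[b] = out, new_in
                changed = True
    return live_in, live_out


# Hypothetical CFG: BB5 branches to BB6 and BB9, which both reach BB7 (exit).
# BB5 holds "x = a", BB6 holds "z = x + y", BB9 and BB7 do not touch x.
SUCC = {"BB5": ["BB6", "BB9"], "BB6": ["BB7"], "BB9": ["BB7"], "BB7": []}
GEN = {"BB5": {"a"}, "BB6": {"x", "y"}, "BB9": set(), "BB7": set()}
KILL = {"BB5": {"x"}, "BB6": {"z"}, "BB9": set(), "BB7": set()}

live_in, live_out = liveness(SUCC, GEN, KILL)
print(sorted(live_out["BB5"]))  # x is live out of BB5 because BB6 uses it
```

Starting from empty sets gives the smallest solution, which is the usual choice for an any-path (union) problem.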

Dataflow Equations (LOAP): All-path backward dataflow analysis; a variable is live at the block and on all the paths that follow it. GEN and KILL sets: same as for Liveness.
$$\mathrm{LOAP}_{out}(b) = \bigcap_{b_i \in \mathrm{succ}(b)} \mathrm{LOAP}_{in}(b_i)$$
$$\mathrm{LOAP}_{in}(b) = \mathrm{GEN}(b) \cup \bigl(\mathrm{LOAP}_{out}(b) - \mathrm{KILL}(b)\bigr)$$
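The LOAP equations differ from liveness only in the meet operator, which becomes an intersection. The sketch below illustrates this on a hypothetical CFG; the names `loap`, `SUCC`, `GEN` and `KILL` are illustrative. The slides do not say how the sets are initialized, so the optimistic start (every variable) used here is an assumption, chosen because it is the usual convention for all-path problems.

```python
def loap(succ, gen, kill):
    """All-path backward analysis: the meet over successors is intersection."""
    universe = set().union(*gen.values(), *kill.values())
    # Assumption: start optimistically with every variable so that the
    # intersection can only shrink the sets; exit blocks have nothing live out.
    loap_in = {b: set(universe) for b in succ}
    loap_out = {b: set(universe) if succ[b] else set() for b in succ}
    changed = True
    while changed:
        changed = False
        for b in succ:
            if succ[b]:
                # LOAP_out(b): intersection of LOAP_in over b's successors
                out = set.intersection(*(loap_in[s] for s in succ[b]))
            else:
                out = set()
            # LOAP_in(b) = GEN(b) | (LOAP_out(b) - KILL(b))
            new_in = gen[b] | (out - kill[b])
            if out != loap_out[b] or new_in != loap_in[b]:
                loap_out[b], loap_in[b] = out, new_in
                changed = True
    return loap_in, loap_out


# Hypothetical CFG: BB5 ("x = a") branches to BB6 ("z = x + y") and BB9
# (no reference to x); both reach the exit block BB7.
SUCC = {"BB5": ["BB6", "BB9"], "BB6": ["BB7"], "BB9": ["BB7"], "BB7": []}
GEN = {"BB5": {"a"}, "BB6": {"x", "y"}, "BB9": set(), "BB7": set()}
KILL = {"BB5": {"x"}, "BB6": {"z"}, "BB9": set(), "BB7": set()}

loap_in, loap_out = loap(SUCC, GEN, KILL)
print("x" in loap_in["BB6"], "x" in loap_in["BB9"], "x" in loap_out["BB5"])
```

In this CFG x is live on all paths into BB6 but not into BB9, so it is not in LOAP_out of BB5, which mirrors the 1/0/0 annotations of the LOAP example slide.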

Dataflow Equations (LONAP): A variable is only partially live at the block. The equations are non-iterative and are expressed in terms of LOAP and Liveness.
$$\mathrm{LONAP}_{out}(b) = \mathrm{Liveness}_{out}(b) - \mathrm{LOAP}_{out}(b)$$
$$\mathrm{LONAP}_{in}(b) = \mathrm{Liveness}_{in}(b) - \mathrm{LOAP}_{in}(b)$$
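Because LONAP is a plain set difference, it needs no iteration once Liveness and LOAP are known. The sketch below assumes hypothetical, hand-written solutions for a small CFG in which BB5 holds `x = a`; the names `lonap`, `LIVENESS_OUT`, `LOAP_OUT` and `ASSIGNS` are illustrative. Treating an assignment as a sinking candidate when its variable is in LONAP_out of its block is one reading of the slides, not a confirmed detail of the implementation.

```python
# Hypothetical per-block solutions (sets of variable names).
LIVENESS_OUT = {"BB5": {"x", "y"}, "BB6": set(), "BB9": set()}
LOAP_OUT = {"BB5": set(), "BB6": set(), "BB9": set()}


def lonap(liveness, loap):
    """LONAP(b) = Liveness(b) - LOAP(b), computed block by block."""
    return {b: liveness[b] - loap[b] for b in liveness}


lonap_out = lonap(LIVENESS_OUT, LOAP_OUT)

# Variables assigned in each block (hypothetical): BB5 contains "x = a".
ASSIGNS = {"BB5": ["x"]}

# Assumed candidate rule: the assigned variable is only partially live on exit.
candidates = [(b, v) for b, vs in ASSIGNS.items() for v in vs if v in lonap_out[b]]
print(candidates)
```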

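LOAP Example: [CFG figure: blocks BB5 (x = a), BB6 and BB9. For x, BB5 is annotated LOAP_out=0; one block is annotated LOAP_in=1 and another LOAP_in=0. Statements shown: z = x + y, y = x / 2, z = x * 2.]

LONAP Example: [CFG figure: BB5 (x = a) annotated LOAP_out=0, LONAP_out=1; another block annotated LOAP_in=1, LONAP_in=0. Statements shown: z = x + y, y = x / 2, z = x * 2.]

LONAP Example (cont'd): [CFG figure: the same annotations, with statements y = x + 2, y = x / 2, z = x * 2.]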

Design Considerations: LONAP indicates, in terms of partial liveness, where an assignment can be beneficially sunk. The live ranges of variables change when an assignment is sunk. Profile information is used to determine whether sinking an assignment is profitable.
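The slides do not say how profile information enters the decision. One plausible check, shown here purely as an assumption, compares how often the original block runs with how often the placement blocks run; the function name `is_profitable` and the frequencies are hypothetical.

```python
def is_profitable(source_freq, placement_freqs):
    """Assumed rule: sink only if the copies execute less often in total."""
    return sum(placement_freqs) < source_freq


# Hypothetical block frequencies from a profile: the assignment sits in a
# scorching block (1000 executions) and would be placed in one cold block.
hot_to_cold = is_profitable(1000, [10])
# Sinking into two blocks that together run as often as the source gains nothing.
no_gain = is_profitable(1000, [600, 400])
print(hot_to_cold, no_gain)
```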

Design Considerations (Cont'd): GEN and KILL are still needed to indicate where an assignment can be legally sunk. Successfully sinking an assignment can create opportunities for earlier assignments to be sunk. Assignments may also be sunk along exception edges.

Program Transformation Overview: Determine Liveness, LOAP and LONAP. Analyze the blocks in postorder to identify potentially movable assignments. Perform a store placement pass to sink those assignments.

Store Placement Algorithm: An assignment is sunk according to LONAP (sink along paths where it is beneficial) and GEN / KILL (sink along paths where it is legal). Sunk assignments are placed in the target block or in a synthetic block which jumps to the target. The dataflow information is updated along the path where the assignment is sunk, which allows earlier assignments to be sunk without an additional pass.
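The placement rules above can be sketched as a walk over successor blocks. This is a simplified illustration under stated assumptions, not the patented Testarossa algorithm: the function `place_store`, the `CFG` dictionary and its contents are hypothetical, the dataflow sets are restricted to the variable x, the dataflow update step is omitted, and the sketch assumes every loop header has more than one predecessor so that the recursion terminates.

```python
def place_store(var, rhs_vars, src, cfg):
    """Return placement sites for `var = ...` when it is sunk out of block src."""
    sites = []
    for s in cfg["succ"][src]:
        if var not in cfg["live_in"][s]:
            continue  # var is dead along this path: no copy is needed
        if len(cfg["preds"][s]) > 1:
            # s is also reached from other blocks: use a synthetic block on the edge
            sites.append(("edge", src, s))
        elif (var in cfg["lonap_in"][s] and var not in cfg["gen"][s]
              and not (rhs_vars & cfg["kill"][s])):
            # beneficial (LONAP) and legal (GEN / KILL): keep sinking through s
            sites.extend(place_store(var, rhs_vars, s, cfg))
        else:
            sites.append(("entry", s))  # place the copy at the top of s
    return sites


# Hypothetical CFG: BB5 ("x = a") -> BB6, BB9; BB6 -> BB7 ("z = x + y"), BB8;
# BB9 holds "y = x / 2"; BB7, BB8 and BB9 all reach the exit block BB10.
CFG = {
    "succ": {"BB5": ["BB6", "BB9"], "BB6": ["BB7", "BB8"], "BB7": ["BB10"],
             "BB8": ["BB10"], "BB9": ["BB10"], "BB10": []},
    "preds": {"BB5": [], "BB6": ["BB5"], "BB9": ["BB5"], "BB7": ["BB6"],
              "BB8": ["BB6"], "BB10": ["BB7", "BB8", "BB9"]},
    "gen": {"BB5": {"a"}, "BB6": set(), "BB7": {"x", "y"}, "BB8": set(),
            "BB9": {"x"}, "BB10": set()},
    "kill": {"BB5": {"x"}, "BB6": set(), "BB7": {"z"}, "BB8": set(),
             "BB9": {"y"}, "BB10": set()},
    # Only the entries for x are shown in the two sets below.
    "live_in": {"BB5": set(), "BB6": {"x"}, "BB7": {"x"}, "BB8": set(),
                "BB9": {"x"}, "BB10": set()},
    "lonap_in": {"BB5": set(), "BB6": {"x"}, "BB7": set(), "BB8": set(),
                 "BB9": set(), "BB10": set()},
}

sites = place_store("x", {"a"}, "BB5", CFG)
print(sites)
```

Here the assignment passes through BB6, where x is only partially live, and stops at BB7 and BB9, where x is used; no copy is placed on the path through BB8, where x is dead.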

Store Placement Example: [CFG figure: blocks BB5 to BB11; BB5 contains y = x + 1 and x = a; other blocks contain z = x + y, y = x / 2 and z = x * 2.]

Store Placement Example (cont'd): KILL_cursor maintains the kill symbols of the assignments traversed so far in the block. [CFG figure: BB5 annotated KILL_cursor(x)=1; all other blocks annotated KILL(x)=0.]

Store Placement Example (cont'd): 'x' is cleared in KILL_cursor and set in KILL for the placement blocks. [CFG figure: BB5 annotated KILL_cursor(x)=0; two blocks annotated KILL(x)=1, each shown with a copy of x = a; the remaining blocks annotated KILL(x)=0.]

Store Placement Example (cont'd): The earlier assignment to 'y' can now be sunk. [CFG figure: the same annotations, with y = x + 1 now shown next to a sunk copy of x = a.]

Results: Sinking Opportunities (Compile Time). Run on x86-32 Windows, 3.0 GHz, 1.5 GB RAM.

SPECjvm98    No. of Methods   Assignments Sunk   Assignments Placed
compress     45               164                213
jess         91               336                463
db           52               218                289
javac        224              931                1173
mpegaudio    84               266                351
mtrt         60               207                333
jack         110              580                1032

Results: Compile Time Overhead. Run on x86-32 Windows, 3.0 GHz, 1.5 GB RAM.

SPECjvm98    Scorching Compile (ms)   PRE Cost / Scorching (%)   Partial Liveness Cost / Scorching (%)   Partial Liveness Cost / Overall (%)
compress     970                      72                         3.7                                     1.1
jess         2480                     19                         8.8                                     1.9
db           783, 55, 0.8 [only three values survive in the transcript; their column alignment is uncertain]
javac        500                      20                         2.4                                     0.0
mpegaudio    1230                     60                         6.0                                     1.3
mtrt         6853                     45                         41                                      1.6
jack         3436                     35                         1.8                                     0.3

Results: x86-32 Performance [chart not preserved in the transcript]

Results: x86-64 Performance [chart not preserved in the transcript]

Summary: A practical dataflow solution for assignment sinking is presented; it is used in the Testarossa JIT compiler. Compile-time overhead is negligible. Performance improvements were found in the benchmarks. Future work: further tuning to gain more performance.

Questions?

Thank You.

Backup

Tuning Example: [Figure: BB8 contains x = a followed by z = x + y, annotated "Last use of x here".]

Tuning Example (after applying CSE and DSE): [Figure: BB8 contains x = a followed by z = a + y, still annotated "Last use of x here".]

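Critical Edge Example: [Figure: a block containing "x = ..." and a block containing the use "... = x".]

Critical Edge Example (cont'd): [Figure: the same CFG with a second "x = ..." shown.]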
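The figures for these backup slides are lost, but the title points at the standard notion of a critical edge: an edge whose source has several successors and whose target has several predecessors. An assignment cannot be placed at either end of such an edge without affecting other paths, which is presumably why the store placement slide mentions synthetic blocks. The sketch below shows generic edge splitting on a hypothetical CFG; the function `split_critical_edges` and the block names are illustrative.

```python
def split_critical_edges(succ):
    """Insert a synthetic block on every critical edge of a CFG."""
    preds = {b: [] for b in succ}
    for b, targets in succ.items():
        for t in targets:
            preds[t].append(b)
    new_succ = {b: list(targets) for b, targets in succ.items()}
    for b, targets in succ.items():
        for i, t in enumerate(targets):
            # critical edge: source has >1 successor and target has >1 predecessor
            if len(targets) > 1 and len(preds[t]) > 1:
                synthetic = f"{b}_{t}"        # new block that only jumps to t
                new_succ[b][i] = synthetic
                new_succ[synthetic] = [t]
    return new_succ


# Hypothetical CFG: A branches to B and C, and B falls through to C,
# so the edge A -> C is critical.
split = split_critical_edges({"A": ["B", "C"], "B": ["C"], "C": []})
print(split)
```

A sunk copy of an assignment can then be placed in the synthetic block `A_C` without executing on the path that enters C from B.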