1 CGO 2006: The Fourth International Symposium on Code Generation and Optimization New York, March 26-29, 2006 Conference Review Presented by: Ivan Matosevic.

2 Outline  Conference overview  Brief summaries of sessions  Keynote speeches  Best paper

3 Conference Overview  Primary focus: back-end compilation techniques Static analysis and optimization Profiling Run-time techniques  8 sessions, 29 papers  Dominating topics: multicores, dynamic compilation

4 Overview of Sessions 1. Dynamic Optimization 2. Object-Oriented Code Generation and Optimization 3. Phase Detection and Profiling 4. Tiled and Multicore Compilation 5. Static Code Generation and Optimization Issues 6. SIMD Compilation 7. Optimization Space Exploration 8. Security and Reliability

5 Session 1: Dynamic Optimization  Kim Hazelwood (University of Virginia), Robert Cohn (Intel), A Cross-Architectural Interface for Code Cache Manipulation Pin dynamic instrumentation system with code cache The paper describes an API for various operations with the code cache (callbacks, lookups, statistics, etc.)  Derek Bruening, Vladimir Kiriansky, Tim Garnett, Sanjeev Banerji (Determina Corporation), Thread-Shared Software Code Caches Problem: sharing a code cache across multiple threads Authors propose a fine-grained locking scheme Evaluation using DynamoRIO

6 Session 1: Dynamic Optimization  Keith Cooper, Anshuman Dasgupta (Rice Univ.), Tailoring Graph-coloring Register Allocation For Runtime Compilation Problem: register allocation in JIT compilers Authors propose a novel lightweight graph-colouring technique  Weifeng Zhang, Brad Calder, Dean Tullsen (UC San Diego), A Self Repairing Prefetcher in an Event-Driven Dynamic Optimization Framework Extension of the Trident event-driven dynamic optimization framework (previously proposed by the same authors) Dynamic insertion of prefetching instructions based on run-time analysis

7 Session 2: Object-Oriented Code Generation and Optimization  Suresh Srinivas, Yun Wang, Miaobo Chen, Qi Zhang, Eric Lin, Valery Ushakov, Yoav Zach, Shalom Goldenberg (Intel Corporation), Java JNI Bridge: An MRTE Framework for Mixed Native ISA Execution Use a dynamic translator for the execution of native calls to one ISA on a different ISA’s Java platform  Kris Venstermans, Lieven Eeckhout, Koen De Bosschere (Ghent University), Space-Efficient 64-bit Java Objects through Selective Typed Virtual Addressing Use address bits on a 64-bit architecture to encode object type in order to save memory Objects of the same type allocated in a contiguous (virtual) region
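The typed-addressing idea summarized above can be illustrated with a small sketch. It only models the general principle (type encoded in the high address bits, one contiguous virtual region per type); the 48-bit split, the `TypedHeap` class and the `type_of` helper are assumptions made for illustration and are not taken from the paper, and the "selective" part of the technique is not modeled.

```python
OFFSET_BITS = 48  # assumed: low bits address bytes within a type's region


class TypedHeap:
    """Bump allocator that gives every type its own virtual region."""

    def __init__(self):
        self.next_offset = {}  # type id -> next free offset in its region

    def allocate(self, type_id, size):
        offset = self.next_offset.get(type_id, 0)
        self.next_offset[type_id] = offset + size
        # The type id occupies the high bits of the returned address.
        return (type_id << OFFSET_BITS) | offset


def type_of(address):
    """Recover the type id from the address alone, with no header load."""
    return address >> OFFSET_BITS


heap = TypedHeap()
a = heap.allocate(type_id=3, size=24)
b = heap.allocate(type_id=3, size=24)
c = heap.allocate(type_id=7, size=16)
```

Because the type is recoverable from the address, an object in such a scheme would not need a per-object type word in its header, which is where the memory saving would come from.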

8 Session 2: Object-Oriented Code Generation and Optimization  Daryl Maier, Pramod Ramarao, Mark Stoodley, Vijay Sundaresan (IBM Canada), Experiences with Multi-threading and Dynamic Class Loading in a Java Just-In-Time Compiler The IBM Testarossa JIT compiler This paper focuses on code patching and profiling in a multi-threaded environment with a lot of class loading/unloading  Lixin Su, Mikko H Lipasti (University of Wisconsin Madison), Dynamic Class Hierarchy Mutation Run-time reassignment of objects from one derived class to another, changing their virtual tables Offers opportunity for optimizations based on specialization

9 Session 3: Phase Detection and Profiling  Priya Nagpurkar (UCSB), Michael Hind (IBM), Chandra Krintz (UCSB), Peter Sweeney, V.T. Rajan (IBM), Online Phase Detection Algorithms Detecting phase behaviour in virtual machines Track dynamic program parameters (methods invoked, branch directions…) over time and apply a similarity model  Jeremy Lau, Erez Perelman, Brad Calder (UC San Diego), Selecting Software Phase Markers with Code Structure Analysis Portions of code whose execution correlates with phase changes Procedure calls and returns, loop boundaries Profile-based hierarchical loop-call graph

10 Session 3: Phase Detection and Profiling  Shashidhar Mysore, Banit Agrawal, Timothy Sherwood, Nisheeth Shrivastava, Subhash Suri (UC Santa Barbara), Profiling over Adaptive Ranges Voted best paper – details later  Hyesoon Kim, Muhammad Aater Suleman, Onur Mutlu, Yale N. Patt (UT-Austin), 2D-Profiling: Detecting Input-Dependent Branches with a Single Input Data Set Predicts whether the prediction accuracy of each branch will vary across input sets Heuristic approach used to derive representative profiling results from a single input set

11 Session 4: Tiled and Multicore Compilation  David Wentzlaff, Anant Agarwal (MIT), Constructing Virtual Architectures on a Tiled Processor Map components of a superscalar architecture (Pentium III) onto a parallel tiled architecture (Raw) using dynamic translation In a way, uses Raw as a coarse-grain FPGA  Aaron Smith (UT-Austin), J. Burrill (UMass at Amherst), J. Gibson, B. Maher, N. Nethercote, B. Yoder, D. Burger, K. S. McKinley (UT-Austin), Compiling for EDGE Architectures TRIPS EDGE (Explicit Data Graph Execution) architecture This paper focuses on compilation of standard C and FORTRAN benchmarks

12 Session 4: Tiled and Multicore Compilation  Shih-wei Liao, Zhaohui Du, Gansha Wu, Guei-Yuan Lueh (Intel), Data and Computation Transformations for Brook Streaming Applications on Multiprocessors Parallel compiler for the Brook streaming language An extension of C that enables specifying data parallelism  Michael L. Chu, Scott A. Mahlke (University of Michigan), Compiler-directed Object Partitioning for Multicluster Processors Partitioning of data in clustered architectures such as Raw It was not clear to me what programming model these authors have in mind

13 Session 5: Static Code Generation and Optimization Issues  Two papers about the HP-UX Itanium compiler: Dhruva R. Chakrabarti, Shin-Ming Liu (Hewlett-Packard), Inline Analysis: Beyond Selection Heuristics Cross-module techniques for selection of inlined call sites and the choice of specialized function versions Robert Hundt, Dhruva R. Chakrabarti, Sandya S. Mannarswamy (Hewlett-Packard), Practical Structure Layout Optimization and Advice Data layout and placement on the heap to improve locality Structure splitting, structure peeling, dead field removal, and field reordering

14 Session 5: Static Code Generation and Optimization Issues  Chris Lupo, Kent Wilken (University of California, Davis), Post Register Allocation Spill Code Optimization Authors propose a profile-based algorithm for placement of save/restore instructions handling spilled variables in function calls Implemented as a part of GCC  Seung Woo Son, Guangyu Chen, Mahmut Kandemir (Pennsylvania State University), A Compiler-Guided Approach for Reducing Disk Power Consumption by Exploiting Disk Access Locality Goal: restructure code so that disk idle periods are lengthened The approach targets array-based programs: disk layout of array data exposed to the compiler

15 Session 6: SIMD Compilation  Jianhui Li, Qi Zhang, Shu Xu, Bo Huang (Intel China Software Center), Optimizing Dynamic Binary Translation for SIMD Instructions Algorithms for dynamic binary translation of SIMD instructions in general-purpose architectures (such as MMX in x86) Evaluation using IA-32 binaries on Itanium 2  Dorit Nuzman (IBM), Richard Henderson (Red Hat), Multi-Platform Auto-Vectorization Implementation of automatic vectorizer for GCC 4.0

16 Session 7: Optimization-space Exploration  Felix Agakov, Edwin Bonilla, John Cavazos, Bjoern Franke, Grigori Fursin, Michael O'Boyle, Marc Toussaint, John Thomson, Chris Williams (U. of Edinburgh), Using Machine Learning to Focus Iterative Optimization Predictive modelling used to search the optimization space Targets embedded platforms – AMD Au1500 and Texas Instruments TI C6713  Prasad Kulkarni, David Whalley, Gary Tyson (Florida State University), Jack Davidson (University of Virginia), Exhaustive Optimization Phase Order Space Exploration Exhaustive search of the phase order space (15 phases) using aggressive pruning; takes time on the order of minutes to hours Targets StrongARM SA-100

17 Session 7: Optimization-space Exploration  Zhelong Pan, Rudolf Eigenmann (Purdue University), Fast and Effective Orchestration of Compiler Optimizations for Automatic Performance Tuning Problem: find the optimal combination of 38 GCC O3 options, targeting Pentium IV and Sparc II Proposed heuristic algorithm that provides a quality solution in time on the order of several hours

18 Session 8: Security and Reliability  Edson Borin (UNICAMP), Cheng Wang, Youfeng Wu (Intel), Guido Araujo (UNICAMP), Software-Based Transparent and Comprehensive Control-Flow Error Detection Addresses the problem of soft (transient) errors that cause branches to incorrect instructions Implemented in SW as a part of a dynamic binary translator  Tao Zhang, Xiaotong Zhuang, Santosh Pande (Georgia Tech), Compiler Optimizations to Reduce Security Overheads Optimizations that specifically target techniques that implement software protection with minimal HW support

19 Session 8: Security and Reliability  Susanta Nanda, Wei Li, Tzi-cker Chiueh (State University of NY at Stony Brook), BIRD: Binary Interpretation using Runtime Disassembly Goal: framework for automatic detection of vulnerabilities such as buffer overflows when the source code is not available Static and dynamic disassembly and instrumentation – targets Windows x86 application

20 Keynote Speeches  Wei Li, Principal Engineer, Intel: "Parallel Programming 2.0"  Kevin Stoodley, Fellow and CTO of Compilation Technology, IBM: "Productivity and Performance: Future Directions in Compilers"

21 Wei Li: Parallel Programming 2.0  Major technological change: Moore’s Law continues to increase transistor counts However: power, memory latency, limits to ILP are setting an effective performance ceiling  General trend towards thread-level on-chip parallelism SMT Chip multiprocessors

22 Wei Li: Parallel Programming 2.0  “Parallel Programming 2.0” refers to the advent of multicores  A very optimistic future vision:

23 Wei Li: Parallel Programming 2.0  Key issue – where will the parallelism come from?  Parallel programming needs to become more mainstream Consumer vs. HPC/server/database Inclusion into education at more elementary level New tools for greater ease of programming  Intel’s parallel programming tools http://www.intel.com/software

24 K. Stoodley:"Productivity and Performance: Future Directions in Compilers"  Limits to traditional static compilation  Overview of IBM compiler technology  Testarossa JIT compiler, Toronto Portable Optimizer, Tobey backend  Challenges at present and near future  Software abstraction complexity – forces the scope of compilation to higher levels  Maintaining high performance backwards compatibility increasingly difficult

25 K. Stoodley:"Productivity and Performance: Future Directions in Compilers"  Future: convergence/combination of dynamic and static compilation technologies [Figure: diagram of IBM compilation components. Labels: front ends (xlc, xlC, xlf), W-Code, Toronto Portable Optimizer (TPO), Profile-Directed Feedback (PDF), TOBEY backend, static machine code; class/jar files, J9 execution engine (Java + others), Testarossa JIT, dynamic machine code; CPO; binary translation.]

26 Best Paper  Shashidhar Mysore, Banit Agrawal, Timothy Sherwood, Nisheeth Shrivastava, Subhash Suri (UC Santa Barbara): Profiling over Adaptive Ranges

27 Profiling over Adaptive Ranges  Problem: how to count specific events efficiently and accurately? Code segments executed Memory regions accessed IP addresses of routed packets  In all cases, impossible to maintain separate counters for the entire range of values Each basic block, memory address, IP address…

28 Trade-off: Precision vs. Efficiency Unlimited counters Uniform ranges  Profiling with uniform ranges fails to distinguish hot code
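A toy comparison makes the problem with uniform ranges concrete. The event stream, the value range and the `uniform_profile` helper below are hypothetical and chosen only for illustration.

```python
from collections import Counter


def uniform_profile(events, lo, hi, num_counters):
    """Count events in num_counters equal-width ranges over [lo, hi)."""
    width = (hi - lo) // num_counters
    counts = [0] * num_counters
    for value in events:
        counts[(value - lo) // width] += 1
    return counts


# Hypothetical stream: one very hot value and one lukewarm value.
events = [5] * 90 + [200] * 10

exact = Counter(events)                      # one counter per distinct value
coarse = uniform_profile(events, 0, 256, 4)  # four counters of width 64
```

Here `coarse` reports 90 events somewhere in [0, 64) but cannot say that all of them hit the single value 5, whereas `exact` can, at the cost of a counter per value.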

29 Higher Precision for Hot Regions  Good trade-off with limited resources: High precision for hot regions Low precision for colder ones, but this affects the accuracy less  Challenge: how to determine what exactly to count with what precision?

30 Solution: Adaptive Profiling  Start with one counter; split counters as they become hot:
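The splitting step can be sketched as a tree of range counters. This is a minimal illustration under stated assumptions, not the authors' implementation: the binary split, the fixed `split_threshold`, and the names `RangeNode` and `AdaptiveProfiler` are all hypothetical.

```python
class RangeNode:
    """Counter for the half-open value range [lo, hi)."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.count = 0
        self.children = []  # empty while the node is a leaf


class AdaptiveProfiler:
    def __init__(self, lo, hi, split_threshold):
        self.root = RangeNode(lo, hi)
        self.split_threshold = split_threshold  # hypothetical fixed threshold

    def record(self, value):
        # Walk down to the smallest existing range that covers the value.
        node = self.root
        while node.children:
            node = next(c for c in node.children if c.lo <= value < c.hi)
        node.count += 1
        # A hot leaf covering more than one value is split in two.
        if node.count > self.split_threshold and node.hi - node.lo > 1:
            mid = (node.lo + node.hi) // 2
            node.children = [RangeNode(node.lo, mid), RangeNode(mid, node.hi)]

    def nodes(self, node=None):
        """Yield every node of the tree in pre-order."""
        node = node or self.root
        yield node
        for child in node.children:
            yield from self.nodes(child)


profiler = AdaptiveProfiler(0, 256, split_threshold=10)
for _ in range(100):
    profiler.record(5)

leaves = [n for n in profiler.nodes() if not n.children]
hot = max(leaves, key=lambda n: n.count)
```

After 100 hits on the value 5, the hottest leaf has narrowed to the range [5, 6), while the rest of [0, 256) is still covered by a handful of coarse counters. A split node keeps the events it counted before splitting, so the counts over all nodes still add up to the number of recorded events.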

33 Counter Merging  Problem: what if program behaviour changes after the initialization phase?

35 Counter Merging  Solution: perform counter merging along with splitting

36 Counter Merging  Counters of merged child nodes added to the parent

38 Counter Merging  Problem: how to identify nodes for merging? They are by definition the ones that are not updated frequently  Solution: periodic batched merge operations Tree depth grows at a logarithmic rate, so merges can be done at exponentially increasing intervals
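A batched merge pass and its schedule might look roughly like the sketch below. It is an illustration under assumptions rather than the paper's algorithm: the `Node` class, the absolute `merge_threshold` and the doubling schedule in `merge_points` are hypothetical choices.

```python
class Node:
    """Counter for the half-open value range [lo, hi)."""

    def __init__(self, lo, hi, count=0):
        self.lo, self.hi, self.count = lo, hi, count
        self.children = []


def subtree_total(node):
    """Total number of events recorded in a node and its descendants."""
    return node.count + sum(subtree_total(c) for c in node.children)


def batched_merge(node, merge_threshold):
    """Post-order pass that folds cold leaf children into their parent."""
    for child in node.children:
        batched_merge(child, merge_threshold)
    if node.children and all(not c.children for c in node.children):
        child_sum = sum(c.count for c in node.children)
        if child_sum < merge_threshold:
            node.count += child_sum  # child counters are added to the parent
            node.children = []


def merge_points(first, limit):
    """Event counts at which a merge pass runs: first, 2*first, 4*first, ..."""
    point = first
    while point <= limit:
        yield point
        point *= 2


# Hypothetical tree: the left half of [0, 8) is hot, the right half is cold.
root = Node(0, 8, count=5)
hot, cold = Node(0, 4, count=100), Node(4, 8, count=1)
hot.children = [Node(0, 2, count=50), Node(2, 4, count=60)]
cold.children = [Node(4, 6, count=1), Node(6, 8, count=2)]
root.children = [hot, cold]

total_before = subtree_total(root)
batched_merge(root, merge_threshold=10)
schedule = list(merge_points(1000, 10000))
```

The pass collapses the cold subtree under [4, 8) into a single counter and leaves the hot subtree untouched; no events are lost because child counts are added to the parent. Running the pass only at doubling event counts keeps the number of merge passes logarithmic in the number of events.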

39 Additional Contributions  Heuristics for splitting and merging  Theoretical analysis of accuracy guarantees  Proposal for hardware implementation  Experimental evaluation Memory requirements Average and worst-case errors on benchmarks Performance of HW implementation Accuracies on the order of 98.0-99.8% with only 8-64K of memory

40 Conclusions  Highly interesting program My short presentation certainly doesn’t do justice to most of the mentioned works!  Readings to perhaps consider for future CARG: D. Wentzlaff, A. Agarwal, Constructing Virtual Architectures on a Tiled Processor A. Smith et al., Compiling for EDGE Architectures F. Agakov et al., Using Machine Learning to Focus Iterative Optimization (Highly subjective!)
