CS 343 presentation Concrete Type Inference Department of Computer Science Stanford University.

Concrete type analysis: why we care
Runtime cost of virtual method resolution is high
Reduction of code size
Call graphs needed for interprocedural analysis
Function inlining
Inference algorithms are very expensive; coming up with efficient algorithms is the challenge

Fast Static Analysis of C++ Virtual Function Calls Bacon and Sweeney

Overview
Goal: resolving virtual function calls
Three static analysis algorithms
– Unique Name
– Class Hierarchy Analysis (CHA)
– Rapid Type Analysis (RTA)

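Example
class A {
public:
    virtual int foo() { return 1; }
};
class B : public A {
public:
    virtual int foo() { return 2; }
    virtual int foo(int t) { return t + 1; }
};
int main() {
    B* p = new B();
    int result1 = p->foo(1);  // only B declares foo(int)
    int result2 = p->foo();   // static type B
    A* q = p;
    int result3 = q->foo();   // static type A, dynamic type B
}
[Class hierarchy diagram: A, with derived class B; methods int foo() and int foo(int)]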

Unique Name
Link-time process
Doesn’t require access to source code
Checks mangled names
A unique signature means the virtual call can be replaced with a direct call
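A minimal sketch of the Unique Name check, assuming the linker's view of the program has been reduced to (class, signature) pairs. The `Definition` type and both function names are hypothetical and only illustrate the idea.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical link-time view: (defining class, method signature),
// e.g. {"B", "foo(int)"}. A real tool would read mangled names instead.
using Definition = std::pair<std::string, std::string>;

// Count how many classes define each signature.
std::map<std::string, int> countDefinitions(const std::vector<Definition>& defs) {
    std::map<std::string, int> counts;
    for (const auto& d : defs) ++counts[d.second];
    return counts;
}

// A call can be bound directly when its signature has exactly one definition.
bool hasUniqueName(const std::map<std::string, int>& counts,
                   const std::string& signature) {
    assert(!signature.empty());
    auto it = counts.find(signature);
    return it != counts.end() && it->second == 1;
}
```

On the A/B example, `foo(int)` is defined only in B, so a call to it can be made direct, while `foo()` has two definitions and stays virtual under this test.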

Class Hierarchy Analysis
Uses the statically declared type together with class hierarchy information
Builds a call graph
Replaces virtual calls with direct calls when no derived class of the static type overrides the method
Relies on type safety of the language (sometimes need to disable downcasts)
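A rough sketch of how CHA can compute the possible targets of one virtual call. The `ClassInfo` and `Hierarchy` types and all function names are assumptions made for this illustration, not the paper's data structures.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical program model; names and layout are illustrative only.
struct ClassInfo {
    std::string parent;             // empty for a root class
    std::set<std::string> methods;  // signatures defined or overridden here
};
using Hierarchy = std::map<std::string, ClassInfo>;

// Find the class whose definition of `sig` an object of class `cls` would run.
std::string lookup(const Hierarchy& h, std::string cls, const std::string& sig) {
    while (!cls.empty()) {
        const ClassInfo& info = h.at(cls);
        if (info.methods.count(sig)) return cls;
        cls = info.parent;
    }
    return "";  // not defined anywhere up the chain
}

// True if `cls` is `base` or derives from it.
bool isSubclassOf(const Hierarchy& h, std::string cls, const std::string& base) {
    while (!cls.empty()) {
        if (cls == base) return true;
        cls = h.at(cls).parent;
    }
    return false;
}

// CHA: possible targets of a virtual call through a pointer whose static
// type is `staticType`. One target means the call can be made direct.
std::set<std::string> chaTargets(const Hierarchy& h, const std::string& staticType,
                                 const std::string& sig) {
    assert(h.count(staticType) == 1);
    std::set<std::string> targets;
    for (const auto& entry : h) {
        if (!isSubclassOf(h, entry.first, staticType)) continue;
        const std::string definer = lookup(h, entry.first, sig);
        if (!definer.empty()) targets.insert(definer + "::" + sig);
    }
    return targets;
}
```

For the A/B example, a call to `foo()` through a `B*` has the single target `B::foo()`, while the same call through an `A*` has two candidates and is left virtual by CHA.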

Rapid Type Analysis
Starts with the call graph generated by CHA
Prunes the call graph based on static information about class instantiation
Flow insensitive, like CHA
– Results in efficiency
– Inherits the limitations of flow-insensitive analysis
Relies on type safety of the language (sometimes need to disable downcasts)
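The pruning step can be sketched as a filter over CHA's answer for one call site. This sketch takes the set of instantiated classes as given; the `Dispatch` alias and `rtaTargets` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical input: for one call site, CHA's answer as a map from each
// possible receiver class to the method body that class would dispatch to.
using Dispatch = std::map<std::string, std::string>;

// RTA keeps only the targets reachable through classes that the program
// instantiates somewhere (no flow information is used).
std::set<std::string> rtaTargets(const Dispatch& chaDispatch,
                                 const std::set<std::string>& instantiated) {
    assert(!chaDispatch.empty());  // CHA should report at least one candidate
    std::set<std::string> targets;
    for (const auto& entry : chaDispatch) {
        if (instantiated.count(entry.first)) targets.insert(entry.second);
    }
    return targets;
}
```

In the A/B example only B is ever created with `new`, so the call `q->foo()` through an `A*` shrinks from two CHA candidates to the single target `B::foo()`.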

Results
What biases the results? (C++)
Ran the analysis algorithms on seven real programs of varying size (small to large)
RTA wins 4 out of 7. Why? Discuss
Static analysis can fail with certain programming idioms (e.g. base* b = new sub())
Code size: often reduced dramatically

Practical Virtual Method Call Resolution for Java Sundaresan et al.

Overview
Study practical, context-insensitive, flow-insensitive techniques to resolve virtual function calls in Java
Present reaching-type analysis
– Variable-type analysis
– Refers-to analysis
Uses the Soot (Jimple) framework

Three Groups of Analyses
Baseline (discussed previously)
– Class hierarchy analysis
– Rapid type analysis
Reaching type
– Declared type analysis
– Variable type analysis (finer-grained, more accurate)
Refers-to
– Developed for C but ported to

Reaching-type Analysis
Build a type propagation graph
Initialize the graph with type information generated by new()
Propagate type information along directed edges
Nodes are associated with all reaching types
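The four steps above can be sketched with a simple worklist. This is an illustration under assumptions: nodes are plain strings, `PropagationGraph` and its members are hypothetical names, and the paper's own propagation strategy may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical propagation graph: an edge src -> dst is added for every
// assignment "dst = src".
struct PropagationGraph {
    std::map<std::string, std::vector<std::string>> successors;
    std::map<std::string, std::set<std::string>> reachingTypes;

    void addAssignment(const std::string& dst, const std::string& src) {
        successors[src].push_back(dst);
    }

    // Initialization: "node = new Type()" seeds the node with Type.
    void addAllocation(const std::string& node, const std::string& type) {
        assert(!type.empty());
        reachingTypes[node].insert(type);
    }

    // Propagation: push types along edges until nothing changes.
    void propagate() {
        std::queue<std::string> worklist;
        for (const auto& entry : reachingTypes) worklist.push(entry.first);
        while (!worklist.empty()) {
            const std::string node = worklist.front();
            worklist.pop();
            // Copy, so a self-edge cannot modify the set being read.
            const std::set<std::string> types = reachingTypes[node];
            for (const std::string& next : successors[node]) {
                std::set<std::string>& target = reachingTypes[next];
                const std::size_t before = target.size();
                target.insert(types.begin(), types.end());
                if (target.size() != before) worklist.push(next);
            }
        }
    }
};
```

After `propagate()`, a virtual call on a variable whose reaching-type set holds a single class can be treated as monomorphic.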

Variable and Declared Type Analysis
Variable type (pg 10)
– Uses the variable name as the representative
Declared type (pg 11)
– Uses the type with which the variable was declared
– Puts all variables of the same declared type into the same equivalence class
– Coarser and less precise
Both algorithms have an initialization phase and a propagation phase
Size of propagation graph: O(C*M_c) edges

Refers-to Analysis
Takes aliasing into account
Nodes
– Reference nodes (locals, parameters, instance fields)
– Abstract location nodes (heap locations)
Algorithm: each reference node initially refers to a unique abstract location; assignments merge abstract locations as the algorithm progresses
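The merging behaviour described in the algorithm line resembles a union-find structure. The sketch below assumes exactly that and nothing more: node ids are caller-chosen indices, the `RefersTo` class is hypothetical, and field handling in the full analysis is left out.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Union-find over reference nodes; the representative of a set stands for
// one abstract location.
class RefersTo {
public:
    explicit RefersTo(int nodeCount) : parent_(nodeCount) {
        // Initially every reference node refers to its own abstract location.
        std::iota(parent_.begin(), parent_.end(), 0);
    }

    // Representative abstract location of a reference node.
    int location(int node) {
        assert(node >= 0 && node < static_cast<int>(parent_.size()));
        while (parent_[node] != node) {
            parent_[node] = parent_[parent_[node]];  // path halving
            node = parent_[node];
        }
        return node;
    }

    // An assignment "lhs = rhs" merges the two abstract locations.
    void assign(int lhs, int rhs) {
        const int a = location(lhs);
        const int b = location(rhs);
        if (a != b) parent_[a] = b;
    }

    bool mayAlias(int x, int y) { return location(x) == location(y); }

private:
    std::vector<int> parent_;
};
```

Merging is symmetric here, so `a = b` makes both names share one location; that keeps the analysis cheap at the cost of precision.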

Alternative Approaches
Type prediction
– Requires profiling code
– Makes the common case fast
– Runtime type test
– Resolves more calls
Alias analysis
– Very expensive (interprocedural, flow sensitive)
Sometimes static analysis is not possible, e.g. classes loaded dynamically based on command-line inputs, or newly available classes
Does anyone see a way to address this?
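Type prediction with a runtime type test might look like the following at a single call site. This is a hand-written illustration, not output from any of the tools discussed; `callFoo` is a hypothetical name, and the classes reuse the names of the earlier A/B example.

```cpp
#include <cassert>
#include <typeinfo>

struct A {
    virtual ~A() = default;
    virtual int foo() { return 1; }
};
struct B : A {
    int foo() override { return 2; }
};

// Hypothetical call site where profiling suggested the receiver is usually a B.
int callFoo(A* receiver) {
    assert(receiver != nullptr);
    if (typeid(*receiver) == typeid(B)) {
        // Common case: the exact type is known, so the call is bound
        // statically and becomes a candidate for inlining.
        return static_cast<B*>(receiver)->B::foo();
    }
    // Fallback: ordinary virtual dispatch.
    return receiver->foo();
}
```

The guard keeps the code correct even when the prediction is wrong, which is why this approach still works when classes are loaded dynamically.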

Benchmarks and Results
Ran on 9 programs, 7 of which are used in the SPECjvm benchmark suite
Variable type analysis is best at improving call graph precision
Type-based analysis is more efficient because it builds nodes based on the classes in the program rather than on each individual variable
Table II shows exact numbers for how many monomorphic edges…
So why couldn’t they resolve all of these? How did they get this information in the first place?

“The Cartesian Product Algorithm” Simple and Precise Type Inference of Parametric Polymorphism

Polymorphism
Explicit concrete type declarations are undesirable for the programmer
Algorithms must be used to infer types
Parametric polymorphism: the ability of routines to be invoked on arguments of several different types
CPA uses context sensitivity, whereas other inference algorithms do not; this is key because CPA uses different code for each context

Basic Type Inference Algorithm
Step 1: Allocate type variables (associate a type variable with every slot and expression in the program)
Step 2: Seed type variables (to capture the initial state of the target program)
Step 3: Establish constraints and propagate (builds a directed graph that expresses the propagation of types through assignments)
The basic algorithm analyzes polymorphism imprecisely
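To see where the imprecision comes from, consider a hypothetical identity-like method that simply returns its argument, analyzed with one shared template. The `SharedTemplate` struct and `analyzeSends` are names invented for this sketch.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

using TypeSet = std::set<std::string>;

// One shared template for the method: a type variable for the argument
// and a type variable for the result.
struct SharedTemplate {
    TypeSet argument;
    TypeSet result;
};

// Basic algorithm: every send feeds its actual types into the same
// template, and every send reads back the same result type variable.
std::vector<TypeSet> analyzeSends(const std::vector<TypeSet>& actuals) {
    assert(!actuals.empty());  // this sketch expects at least one send
    SharedTemplate tmpl;
    for (const TypeSet& a : actuals) {
        tmpl.argument.insert(a.begin(), a.end());
    }
    tmpl.result = tmpl.argument;  // constraint: the body returns its argument
    return std::vector<TypeSet>(actuals.size(), tmpl.result);
}
```

With one send passing smallInt and another passing float, both sends are reported as returning {smallInt, float}, although each really returns only the type it passed in.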

Improvements on Basic Algorithm
1-level expansion
– Different templates for each send
– Inefficient
p-level expansion (precise, yet worst-case complexity is exponential)
Iterative algorithm (precise, more efficient than expansion)

Cartesian Product Algorithm
“There is no such thing as a polymorphic call, only polymorphic call sites”
Turns the analysis of each send into a case analysis (makes exact type information available for each case immediately, eliminates iteration)
Maintains per-method pools of templates so that templates can be shared (efficiency)
Iteration is avoided because of
– Monotonicity of the Cartesian product
– Monotone context of application (the iterative algorithm is not monotone because comparing types for equality is not a monotone function)
Efficient and precise (also, no need to expand away inheritance)
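A small sketch of the case analysis and the per-method template pool, for a method with a receiver and one argument. The `MethodTemplates` class is hypothetical, and `analyzeBody` is a stand-in callback that pretends to analyze the method body on exact types; each template's result is simplified to a single type name.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>

using TypeSet = std::set<std::string>;
using Tuple = std::pair<std::string, std::string>;  // (receiver, argument)

// Hypothetical per-method pool of templates, keyed by monomorphic tuple.
class MethodTemplates {
public:
    explicit MethodTemplates(std::function<std::string(const Tuple&)> analyzeBody)
        : analyzeBody_(std::move(analyzeBody)) {
        assert(analyzeBody_);  // a body analysis must be supplied
    }

    // Analyze one send: a case analysis over the Cartesian product of the
    // receiver's and the argument's concrete types.
    TypeSet analyzeSend(const TypeSet& receiver, const TypeSet& argument) {
        TypeSet result;
        for (const std::string& r : receiver) {
            for (const std::string& a : argument) {
                const Tuple key(r, a);
                auto it = pool_.find(key);
                if (it == pool_.end()) {
                    // First time this tuple is seen: create its template.
                    it = pool_.emplace(key, analyzeBody_(key)).first;
                }
                result.insert(it->second);  // templates are shared across sends
            }
        }
        return result;
    }

    std::size_t templateCount() const { return pool_.size(); }

private:
    std::function<std::string(const Tuple&)> analyzeBody_;
    std::map<Tuple, std::string> pool_;
};
```

Because the pool only ever grows and each template sees exact types, a later send that brings new types just adds new tuples; templates created earlier never need to be reanalyzed.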

Precision improvements possible? Yes
mod: arg = ( self - (arg * (self div: arg)) )
x mod: y, where type(x) = type(y) = {smallInt, float}
Iterative algorithm infers {smallInt, float}
CPA infers {smallInt}
In this case, there is a benefit from having four templates connected, one for each tuple in the product of the types of x and y

Results
“Extractor”: having less precise type information forces it to extract more
CPA delivers the smallest extractions and the best CPU time of the different algorithms
How generalizable are the results from the Self system?
How much type inference is even necessary for the programs they benchmarked (Unix diff command)?

Thanks