Chord: A Versatile Platform for Program Analysis Mayur Naik Intel Labs, Berkeley PLDI 2011 Tutorial.

1 Chord: A Versatile Platform for Program Analysis Mayur Naik Intel Labs, Berkeley PLDI 2011 Tutorial

2 What is Chord? Static and dynamic program analysis framework for Java. Started in 2006 as a static checker of races and deadlocks. Publicly available under the New BSD License. Key goals: – versatile: applies to various analyses, domains, platforms – extensible: users can build their own analyses atop given ones – productive: facilitates rapid prototyping of analyses – robust: deterministic, handles partial programs, etc.

3 Key Features of Chord Many standard static and dynamic analyses Writing/solving analyses using Datalog/BDDs Analyses as building blocks Context-sensitive static analysis framework Dynamic analysis framework

4 Outline of Tutorial Part 1: Getting Started With Chord Program Representation Part 2: Analysis Using Datalog/BDDs Chaining Analyses Together Part 3: Context-Sensitive Analysis Dynamic Analysis

5 Downloading Chord
Stable Binary Release
– http://jchord.googlecode.com/files/chord-bin-2.0.tar.gz
Stable Source Release
1. http://jchord.googlecode.com/files/chord-src-2.0.tar.gz (mandatory)
   – Chord's source code + JARs of libraries used by Chord
2. http://jchord.googlecode.com/files/chord-libsrc-2.0.tar.gz (optional)
   – (adapted) Java source code of libraries used by Chord
Latest Development Snapshot
    svn checkout http://jchord.googlecode.com/svn/trunk/ chord
Or check out only relevant directories under trunk/:
– main/ (released as 1 above)
– libsrc/ (released as 2 above)
– test/ (Chord's regression test suite)
– … (many more)

6 Compiling Chord
Requirements:
– JVM for Java 5 or higher
– Apache Ant
– C++ compiler (not needed by default)
Optional: edit chord.properties
– to enable C BuDDy library: set chord.use.buddy=true
– to enable C++ JVMTI agent: set chord.use.jvmti=true
Run in main directory: ant compile
Contents of main/: build.xml, chord.properties, agent/, bdd/, doc/, examples/, lib/, src/, web/, chord.jar, libbuddy.so | buddy.dll | libbuddy.dylib, libchord_instr_agent.so

7 Running Chord
Requirements: JVM for Java 5 or higher; no other dependencies (e.g., Eclipse)
Run either command in any directory:
    ant -f /build.xml [-Dkey_i=val_i]* run
        (requires Apache Ant; not available in Binary Release)
    java -cp /chord.jar [-Dkey_i=val_i]* chord.project.Boot
where the directory prefix of /build.xml and /chord.jar denotes the path of Chord's main/ directory, and -Dkey_i=val_i sets the value of system property key_i to val_i

8 Chord Properties All inputs to Chord are specified via System Properties conventionally named chord.* (e.g., chord.work.dir) Three choices with decreasing precedence: 1.On command line via –Dkey=val format use to specify properties specific to the current Chord run 2.Via user-specified file denoted by chord.props.file use to specify properties specific to program being analyzed (e.g. its main class, classpath, etc.) default value = "[chord.work.dir]/chord.properties" 3.Via pre-defined file main/chord.properties use to specify properties that must hold in every Chord run (e.g., maximum memory to be used by JVM)
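The precedence order above can be sketched as a lookup that consults the three sources in turn. This is a minimal illustration, not Chord's actual property loader: the class `PropertyPrecedence`, its `resolve` method, and the property values are all hypothetical.

```java
import java.util.Properties;

/** Hypothetical sketch: look a key up in three sources of decreasing precedence. */
final class PropertyPrecedence {
    /** Returns the value from the first source that defines the key, or null. */
    static String resolve(String key, Properties commandLine,
                          Properties userFile, Properties predefinedFile) {
        Properties[] sources = { commandLine, userFile, predefinedFile };
        for (Properties source : sources) {
            String value = source.getProperty(key);
            if (value != null) {
                return value;  // the highest-precedence definition wins
            }
        }
        return null;  // not defined in any source
    }

    public static void main(String[] args) {
        Properties commandLine = new Properties();
        Properties userFile = new Properties();
        Properties predefinedFile = new Properties();
        // Illustrative values only.
        predefinedFile.setProperty("chord.verbose", "1");
        userFile.setProperty("chord.main.class", "foo.Main");
        commandLine.setProperty("chord.verbose", "0");

        System.out.println(resolve("chord.verbose", commandLine, userFile, predefinedFile));
        System.out.println(resolve("chord.main.class", commandLine, userFile, predefinedFile));
    }
}
```

With these values, the command-line setting of chord.verbose shadows the one from the predefined file, while chord.main.class is found only in the user-specified file.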

9 Architecture of Chord [Figure: architecture diagram. Components: Classic or Modern Runtime; bytecode translator (joeq); bytecode instrumentor (javassist); saxon XSLT; bddbddb; BuDDy; Java2HTML; static analysis; Datalog analysis; dynamic analysis. Artifacts: program bytecode, program source, program inputs, program quadcode, domains D1 and D2, relations R1, R2 and R12, analysis result in XML, analysis result in HTML. Annotations on the example program analysis: "user demands this to run"; "starts, blocks on R2, D2"; "starts, runs to finish"; "starts, blocks on D1, D2, R1, R12"; "starts, blocks on D1"; "resumes, runs to finish".]

10 Setting Up a Java Program for Analysis
Command to run in Chord's main directory:
    ant -Dchord.work.dir= /example run
Layout of example/:
    example/
        src/foo/Main.java ...
        classes/foo/Main.class ...
        lib/src/taz/...
        lib/jar/taz.jar
        chord.properties
        chord_output/bddbddb/
Contents of example/chord.properties:
    chord.main.class=foo.Main
    chord.class.path=classes:lib/jar/taz.jar
    chord.src.path=src:lib/src
    chord.run.ids=0,1
    chord.args.0="-thread 1 -n 10"
    chord.args.1="-thread 2 -n 50"

11 Java Program Representations [Figure: Java source code (.java) is compiled by javac to Java bytecode (.class), which javap turns into disassembled Java bytecode]

12 Example: Java Source Code
File test/HelloWorld.java:
    1: package test;
    2:
    3: public class HelloWorld {
    4:     public static void main(String[] args) {
    5:         System.out.println("Hello World!");
    6:     }
    7: }

13 Pretty-Printing Java Bytecode
    javap -private -verbose -classpath [-bootclasspath ]
Output:
    public class test.HelloWorld extends java.lang.Object
    Constant pool:
    const #1 = Method #6.#20; // java/lang/Object." ":()V
    ...
    public static void main(java.lang.String[]);
      Code:
       Stack=2, Locals=1, Args_size=1
       0: getstatic #2; // Field java/lang/System.out:Ljava/io/PrintStream;
       3: ldc #3; // String Hello World!
       5: invokevirtual #4; // Method java/io/PrintStream.println:...
       8: return
    SourceFile: "HelloWorld.java"
    LineNumberTable:
       line 5: 0
       line 6: 8
    LocalVariableTable:
       Start  Length  Slot  Name  Signature
       0      9       0     args  [Ljava/lang/String;
Run "javac -g" on .java files to keep debug info (lines, vars, source) in .class files

14 Java Program Representations [Figure: Java source code (.java) is compiled by javac to Java bytecode (.class); javap disassembles the bytecode, and Joeq translates it to quadcode]

15 Pretty-Printing Quadcode
    ant -Dchord.work.dir= -Dchord.out.file= -Dchord.print.classes= -Dchord.verbose=0 run
Alternative options: -Dchord.print.methods= -Dchord.print.all.classes=true
Replace any `$` by `#` to prevent shell interpretation
Output:
    Class: test.HelloWorld
    Method: main:([Ljava/lang/String;)V@test.HelloWorld
    0#1 5#3 5#2 8#4
    Control flow graph:
    BB0 (ENTRY) (in:, out: BB2)
    BB2 (in: BB0 (ENTRY), out: BB1 (EXIT))
    1: GETSTATIC_A T1,.out
    3: MOVE_A T2, AConst: "Hello World!"
    2: INVOKEVIRTUAL_V println:(Ljava/lang/String;)V@java.io.PrintStream, (T1,T2)
    4: RETURN_V
    BB1 (EXIT) (in: BB2, out: )
    Exception handlers: []
    Register factory: Registers: 3

16 Type Hierarchy: jq_Type has subclasses jq_Primitive and jq_Reference; jq_Reference has subclasses jq_Class and jq_Array (all defined in package joeq.Class)

17 chord.program.Program API
static Program g(): the Program object representing the program being analyzed
IndexSet getTypes(): all types in classes that may be loaded
IndexSet getClasses(): all classes that may be loaded
IndexSet getMethods(): all methods that may be called

18 joeq.Class.jq_Class API String getName() fully-qualified name of the class, e.g., "java.lang.String[]" jq_InstanceField[] getDeclaredInstanceFields() all instance fields declared in the class jq_StaticField[] getDeclaredStaticFields() all static fields declared in the class jq_InstanceMethod[] getDeclaredInstanceMethods() all instance methods declared in the class jq_StaticMethod[] getDeclaredStaticMethods() all static methods declared in the class

19 joeq.Class.jq_Method API String getName().toString() name of the method String getDesc().toString() descriptor of the method, e.g., "(Ljava/lang/String;)V" jq_Class getDeclaringClass() declaring class of the method ControlFlowGraph getCFG() control-flow graph of the method Quad getQuad(int bci) first quad at the given bytecode offset (null if missing) int getLineNumber(int bci) line number of the given bytecode offset (-1 if missing) String toString() ID of the method in format mName:mDesc@cName

20 Control Flow Graphs (CFGs) Each CFG contains: a set of registers (register factory) a directed graph whose nodes are basic blocks and edges denote control flow Register Factory: one register per argument (local variables) named R0, R1, …, Rn one register per temporary (stack variables) named Tn+1, Tn+2, …, Tm Basic Block (BB): sequence of primitive statements (quads) unique entry BB: no quads and no incoming edges unique exit BB: no quads and no outgoing edges

21 joeq.Compiler.Quad.ControlFlowGraph API RegisterFactory getRegisterFactory() set of all local variables EntryOrExitBasicBlock entry() unique entry basic block EntryOrExitBasicBlock exit() unique exit basic block List reversePostOrder () List of all basic blocks in reverse post-order jq_Method getMethod() containing method of the CFG
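To make "reverse post-order" concrete, here is a minimal sketch over a plain successor map. It does not use joeq: the class `ReversePostOrder` and the string block names are hypothetical stand-ins for ControlFlowGraph and BasicBlock, and joeq's own traversal may order sibling blocks differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: reverse post-order of the blocks reachable from an entry block. */
final class ReversePostOrder {
    static List<String> compute(Map<String, List<String>> successors, String entry) {
        List<String> postOrder = new ArrayList<>();
        visit(entry, successors, new HashSet<String>(), postOrder);
        Collections.reverse(postOrder);  // a block now precedes its successors, back edges aside
        return postOrder;
    }

    private static void visit(String block, Map<String, List<String>> successors,
                              Set<String> seen, List<String> postOrder) {
        if (!seen.add(block)) {
            return;  // already visited
        }
        for (String next : successors.getOrDefault(block, Collections.<String>emptyList())) {
            visit(next, successors, seen, postOrder);
        }
        postOrder.add(block);  // emitted only after all successors
    }

    public static void main(String[] args) {
        // Shape of the HelloWorld CFG: BB0 (ENTRY) -> BB2 -> BB1 (EXIT).
        Map<String, List<String>> cfg = new LinkedHashMap<>();
        cfg.put("BB0", Arrays.asList("BB2"));
        cfg.put("BB2", Arrays.asList("BB1"));
        System.out.println(compute(cfg, "BB0"));
    }
}
```

Visiting blocks in this order means a forward analysis sees each block after its predecessors, except along loop back edges.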

22 joeq.Compiler.Quad.BasicBlock API int size() number of quads in the basic block Quad getQuad(int index) quad at the given 0-based index List getPredecessors() list of immediate predecessor basic blocks List getSuccessors() list of immediate successor basic blocks jq_Method getMethod() containing method of the basic block

23 Quad Instructions
Each quad contains an operator and up to 4 operands
Example: getfield l = b.f:
    Operand lo = Getfield.getDest(q);
    Operand bo = Getfield.getBase(q);
    if (lo instanceof RegisterOperand && bo instanceof RegisterOperand) {
        Register l = ((RegisterOperand) lo).getRegister();
        Register b = ((RegisterOperand) bo).getRegister();
        jq_Field f = Getfield.getField(q).getField();
        ...
    }

24 Kinds of Quads joeq.Compiler.Quad.Operator Move Getstatic Branch Invoke Phi Putstatic IntIfCmp InvokeVirtual Unary Getfield Goto InvokeStatic Binary Putfield Jsr InvokeInterface New ALoad Ret NewArray AStore LookupSwitch MultiNewArray Checkcast TableSwitch Alength Instanceof Monitor Return

25 joeq.Compiler.Quad.Quad API Operator getOperator() kind of the quad int getBCI() bytecode offset of the quad in its containing method String toByteLocStr() unique identifier of the quad in format offset!mName:mDesc@cName String toJavaLocStr() location of the quad in format fileName:lineNum in Java source code String toLocStr() location of the quad in both Java bytecode and source code String toVerboseStr() verbose description of the quad (its location plus contents) BasicBlock getBasicBlock() containing basic block of the quad
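The identifier format offset!mName:mDesc@cName returned by toByteLocStr() can be taken apart with plain string operations. The sketch below is not part of joeq or Chord: the class `QuadLocation` is hypothetical, and the sample identifier is constructed by hand from the HelloWorld example rather than produced by a Chord run.

```java
/** Hypothetical parser for quad identifiers of the form offset!mName:mDesc@cName. */
final class QuadLocation {
    final int bytecodeOffset;
    final String methodName;
    final String methodDesc;
    final String className;

    private QuadLocation(int bytecodeOffset, String methodName,
                         String methodDesc, String className) {
        this.bytecodeOffset = bytecodeOffset;
        this.methodName = methodName;
        this.methodDesc = methodDesc;
        this.className = className;
    }

    static QuadLocation parse(String id) {
        int bang = id.indexOf('!');             // ends the bytecode offset
        int colon = id.indexOf(':', bang + 1);  // separates method name and descriptor
        int at = id.lastIndexOf('@');           // starts the class name
        if (bang < 0 || colon < 0 || at < colon) {
            throw new IllegalArgumentException("not of form offset!mName:mDesc@cName: " + id);
        }
        int offset = Integer.parseInt(id.substring(0, bang));
        return new QuadLocation(offset, id.substring(bang + 1, colon),
                id.substring(colon + 1, at), id.substring(at + 1));
    }

    public static void main(String[] args) {
        // Hand-written identifier for illustration.
        QuadLocation loc = parse("5!main:([Ljava/lang/String;)V@test.HelloWorld");
        System.out.println(loc.bytecodeOffset + " " + loc.methodName + " "
                + loc.methodDesc + " " + loc.className);
    }
}
```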

26 Traversing Quadcode
    import chord.program.Program;
    import joeq.Class.jq_Method;
    import joeq.Compiler.Quad.*;

    QuadVisitor qv = new QuadVisitor.EmptyVisitor() {
        public void visitNew(Quad q) { ... }
        public void visitPhi(Quad q) { ... }
        ...
    };
    Program program = Program.g();
    for (jq_Method m : program.getMethods()) {
        if (!m.isAbstract()) {
            ControlFlowGraph cfg = m.getCFG();
            for (BasicBlock bb : cfg.reversePostOrder())
                for (Quad q : bb.getQuads())
                    q.accept(qv);
        }
    }

27 Java Program Representations [Figure: Java source code (.java) is compiled by javac to Java bytecode (.class); javap disassembles the bytecode, and Joeq translates it to quadcode; j2h or Java2HTML turns Java source code into HTMLized Java source code (.html)]

28 HTMLizing Java Source Code Programmatically: import chord.program.Program; Program program = Program.g(); program.HTMLizeJavaSrcFiles(); From command line: 1.Use j2h: ant –Djava.dir= –Dhtml.dir= j2h_xref 2.Use Java2HTML: ant –Djava.dir= –Dhtml.dir= j2h_fast

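29 Java Program Representations [Figure: the representations of slide 27 (Java source code .java, Java bytecode .class, disassembled Java bytecode, quadcode, HTMLized Java source code .html) extended with Jasmin code (.j); tools labeling the edges: javac, javap, Joeq, j2h, Java2HTML, Chord, Jasmin]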

30 Analysis Scope Construction Determines which parts of the program to analyze Computed in either of these cases: chord.build.scope=true chord.program.Program.g() is called Algorithm specified by chord.scope.kind=[rta|cha|dynamic] Rapid Type Analysis (RTA) Class Hierarchy Analysis (CHA) Dynamic Analysis All three algorithms require specifying: chord.main.class= chord.class.path=

31 Analysis Scope Representation
Reachable Methods
    stored in file specified by chord.methods.file (default = "[chord.out.dir]/methods.txt")
    entries of form:
        mname:mdesc@cname
        ...
Resolved Reflection
    stored in file specified by chord.reflect.file (default = "[chord.out.dir]/reflect.txt")
    sections, one per reflective API:
        # resolvedClsForNameSites    Class Class.forName(String)
        ...
        # resolvedObjNewInstSites    Object Class.newInstance()
        ...
        # resolvedConNewInstSites    Object Constructor.newInstance(Object[])
        ...
        # resolvedAryNewInstSites    Object Array.newInstance(Class, int)
        ...
    entries of form:
        bci!mname:mdesc@cname->cname1,cname2,...,cnameN

32 Rapid Type Analysis (RTA) Preferred (and default) scope construction algorithm. Allows specifying reflection resolution via chord.reflect.kind=[none|static|dynamic]. The preferred way to resolve reflection is dynamic, which requires specifying how to run the program (as on slide 10): chord.run.ids=id1,…,idN chord.args.id1=, …, chord.args.idN=

33 Dynamic Analysis Based Scope Construction Runs the program and observes which classes are loaded. Requires JVMTI (set chord.use.jvmti=true in file main/chord.properties). Requires specifying how to run the program (as on slide 10): chord.run.ids=id1,…,idN chord.args.id1=, …, chord.args.idN= All methods of each loaded class are deemed reachable. Currently no support for reflection resolution.

34 Additional Analysis Scope Features Scope Reuse Enables using scope constructed by a previous run of Chord Constructs scope from files specified by chord.methods.file and chord.reflect.file Specified via chord.reuse.scope=true Scope Exclusion Enables excluding certain classes from scope Treats all methods in such classes as no-ops Specified via three properties: 1. chord.std.scope.exclude (default = "") 2. chord.ext.scope.exclude (default = "") 3. chord.scope.exclude (default = "[chord.std.scope.exclude],[chord.ext.scope.exclude]")

35 Native Method Stubs Specified in file main/src/chord/program/stubs/stubs.txt in format: mname:mdesc@cname stub_cname where stub_cname denotes a class implementing: public interface joeq.Compiler.Quad.ICFGBuilder { public ControlFlowGraph run(jq_Method m); } Example: start:()V@java.lang.Thread chord.program.stubs.ThreadStartCFGBuilder

36 Example Native Method Stub
Behavior being modeled:
    void start() { this.run(); return; }
Stub:
    public ControlFlowGraph run(jq_Method m) {
        jq_Class c = m.getDeclaringClass();
        jq_Method n = c.getDeclaredInstanceMethod(new jq_NameAndDesc("run", "()V"));
        RegisterFactory f = new RegisterFactory(0, 1);
        Register r = f.getOrCreateLocal(0, c);
        ControlFlowGraph cfg = new ControlFlowGraph(m, 1, 0, f);
        Quad q1 = Invoke.create(0, m, Invoke.INVOKEVIRTUAL_V.INSTANCE, null, new MethodOperand(n), 1);
        Invoke.setParam(q1, 0, new RegisterOperand(r, c));
        Quad q2 = Return.create(1, m, RETURN_V.INSTANCE);
        BasicBlock bb = cfg.createBasicBlock(1, 1, 2, null);
        bb.appendQuad(q1);
        bb.appendQuad(q2);
        BasicBlock eb = cfg.entry(), xb = cfg.exit();
        eb.addSuccessor(bb);
        bb.addPredecessor(eb);
        bb.addSuccessor(xb);
        xb.addPredecessor(bb);
        return cfg;
    }

37 Outline of Tutorial Part 1: Getting Started With Chord Program Representation Part 2: Analysis Using Datalog/BDDs Chaining Analyses Together Part 3: Context-Sensitive Analysis Dynamic Analysis

38 Program Domain Building block for analyses based on Datalog/BDDs Represents an indexed set of values of a fixed kind typically artifacts from program being analyzed (e.g., set of all methods in the program) Assigns unique 0-based index to each value everything in Datalog/BDDs must be numbered indices given in order in which values are added order affects efficiency of running analysis on large sets initial indices (0, 1,...) typically given to frequently-used values (e.g., the main method) O(1) access to value given index, and vice versa
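The indexing behavior described above (unique 0-based indices in insertion order, O(1) lookup both ways) can be sketched with a list plus a hash map. This is an illustration only, not Chord's ProgramDom: the class `IndexedDomain` is hypothetical, and the method identifiers are typed in by hand.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an indexed set: values get 0-based indices in insertion order. */
final class IndexedDomain<T> {
    private final List<T> values = new ArrayList<>();         // index -> value
    private final Map<T, Integer> indices = new HashMap<>();  // value -> index

    /** Adds the value if absent; returns true if it was added. */
    boolean add(T value) {
        if (indices.containsKey(value)) {
            return false;
        }
        indices.put(value, values.size());
        values.add(value);
        return true;
    }

    /** Adds the value if absent; returns its index in either case. */
    int getOrAdd(T value) {
        add(value);
        return indices.get(value);
    }

    /** Returns the index of the value, or -1 if absent. */
    int indexOf(T value) {
        Integer index = indices.get(value);
        return index == null ? -1 : index;
    }

    T get(int index) {
        return values.get(index);
    }

    int size() {
        return values.size();
    }

    public static void main(String[] args) {
        IndexedDomain<String> methods = new IndexedDomain<>();
        methods.add("main:([Ljava/lang/String;)V@Bldg");  // frequently used value first
        methods.add("start:()V@java.lang.Thread");
        System.out.println(methods.indexOf("start:()V@java.lang.Thread"));
    }
}
```

Because indices follow insertion order, adding frequently used values first is what gives them the small indices mentioned above.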

39 Example Predefined Program Domains
    Name  Description               Defining Class
    T     types                     chord.analyses.type.DomT
    M     methods                   chord.analyses.method.DomM
    F     fields                    chord.analyses.field.DomF
    V     variables of ref type     chord.analyses.var.DomV
    P     quads (program points)    chord.analyses.point.DomP
    H     object allocation quads   chord.analyses.alloc.DomH
    I     method call quads         chord.analyses.invk.DomI
    E     heap-accessing quads      chord.analyses.heapacc.DomE
    A     abstract threads          chord.analyses.alias.DomA
    C     abstract method contexts  chord.analyses.alias.DomC
    O     abstract objects          chord.analyses.alias.DomO

40 Writing a Program Domain Analysis
Domain M: all methods in the program
– main method has index 0
– java.lang.Thread.start() method has index 1
    package chord.analyses.method;
    @Chord(name = "M")
    public class DomM extends ProgramDom {
        @Override
        public void fill() {
            Program program = Program.g();
            add(program.getMainMethod());
            jq_Method start = program.getThreadStartMethod();
            if (start != null)
                add(start);
            for (jq_Method m : program.getMethods())
                add(m);
        }
    }

41 Running a Program Domain Analysis ant -Dchord.work.dir= -Dchord.run.analyses=M run

42 Running a Program Domain Analysis
Output in chord_output/bddbddb/: M.dom, M.map
Contents of M.map (values of domain M in index order):
    main:([Ljava/lang/String;)V@Bldg
    start:()V@java.lang.Thread
    :()V@Bldg
    …

43 chord.project.analyses.ProgramDom API void setName(String name) set name of domain boolean add(T val) add value to domain if not present; return true if added int getOrAdd(T val) add value to domain if not present; return its index in either case void save() save domain to disk (.dom and.map files) String toUniqueString(T val) unique string representation of value int size() number of values in domain T get(int index) value having the given index; IndexOutofBoundsEx if not found int indexOf(T val) index of given value; -1 if not found Note: values once added cannot be removed!

44 Program Relation Building block for analyses based on Datalog/BDDs. Represents a set of tuples over one or more fixed program domains. Represented symbolically as a BDD, which enables storing and manipulating large relations efficiently. Provides various relational operations: projection, selection, join, etc. BDD size and efficiency of operations depend heavily on the encoding of relation content, as opposed to its size: – ordering of values within program domains – relative ordering between program domains

45 Writing a Program Relation Analysis
Relation MI: tuples (m, i) such that method m contains call i
    package chord.analyses.invk;
    @Chord(name = "MI", sign = "M0,I0:M0_I0")
    public class RelMI extends ProgramRel {
        @Override
        public void fill() {
            DomI domI = (DomI) doms[1];
            for (Quad q : domI) {
                jq_Method m = q.getMethod();
                add(m, q);
            }
        }
    }
M0,I0: domain names
– order mnemonically (hard to change over time)
– suffix 0, 1, etc. distinguishes repeating domains
M0_I0: domain order
– only dictates performance
– can also be I0_M0 or I0xM0
– easy to change over time

46 Writing a Program Relation Analysis
Relation VT: tuples (v, t) such that local variable v has type t
    package chord.analyses.var;
    @Chord(name = "VT", sign = "V0,T0:T0_V0")
    public class RelVT extends ProgramRel {
        @Override
        public void fill() {
            // pseudocode loop header, as on the slide
            for (each RegisterOperand o of each quad) {
                Register v = o.getRegister();
                jq_Type t = o.getType();
                add(v, t);
            }
        }
    }

47 Running a Program Relation Analysis ant -Dchord.work.dir= -Dchord.run.analyses=VT run

48 Running a Program Relation Analysis
Output in chord_output/bddbddb/: V.dom, T.dom, V.map, T.map, VT.bdd
Contents of VT.bdd:
    # V0:2 T0:2
    # 1 2
    # 3 4
    6 4
    2 1 4 3
    7 4 0 1
    6 3 7 1
    5 3 0 7
    4 2 5 0
    3 2 6 5
    2 1 3 4

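49 Program Relation as Binary Function
Variable v0 has types t1, t2, t3
Variable v1 has type t3
Variable v2 has type t3
Relation VT = { (0, 1), (0, 2), (0, 3), (1, 3), (2, 3) }
VT as a binary function f of bits b1 b2 (variable index) and b3 b4 (type index):
    b1 b2 b3 b4 | f
    0  0  0  0  | 0
    0  0  0  1  | 1
    0  0  1  0  | 1
    0  0  1  1  | 1
    0  1  0  0  | 0
    0  1  0  1  | 0
    0  1  1  0  | 0
    0  1  1  1  | 1
    1  0  0  0  | 0
    1  0  0  1  | 0
    1  0  1  0  | 0
    1  0  1  1  | 1
    1  1  0  0  | 0
    1  1  0  1  | 0
    1  1  1  0  | 0
    1  1  1  1  | 0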
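The step from the tuple set to the truth table can be reproduced in a few lines. The sketch below assumes the encoding suggested by the slide, two bits per domain with b1 b2 holding the variable index and b3 b4 the type index; the class `RelationAsFunction` is hypothetical and unrelated to Chord's ProgramRel.

```java
/** Hypothetical sketch: a relation over two 2-bit domains as a function of four bits. */
final class RelationAsFunction {
    /** Packs (v, t) into the 4-bit row number b1 b2 b3 b4, with b1 most significant. */
    static int encode(int v, int t) {
        return (v << 2) | t;
    }

    /** Returns f as a 16-entry table: f[row] is true iff the row encodes a tuple. */
    static boolean[] truthTable(int[][] tuples) {
        boolean[] f = new boolean[16];
        for (int[] tuple : tuples) {
            f[encode(tuple[0], tuple[1])] = true;
        }
        return f;
    }

    public static void main(String[] args) {
        int[][] vt = { {0, 1}, {0, 2}, {0, 3}, {1, 3}, {2, 3} };
        boolean[] f = truthTable(vt);
        for (int row = 0; row < f.length; row++) {
            String bits = String.format("%4s", Integer.toBinaryString(row)).replace(' ', '0');
            System.out.println(bits + " " + (f[row] ? 1 : 0));
        }
    }
}
```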

50 BDD: Binary Decision Diagrams (Bryant 1986) Graphical Encoding of a Binary Function [Figure: full decision tree over b1, b2, b3, b4 with 0 edges and 1 edges; its 16 leaves are the values of f]

51 BDD: Collapsing Redundant Nodes [Figure: the decision tree of slide 50 before collapsing]

52 BDD: Collapsing Redundant Nodes [Figure: the diagram after merging all leaves into a single 0 terminal and a single 1 terminal]

53 BDD: Collapsing Redundant Nodes [Figure: the diagram after merging duplicate b4 nodes]

54 BDD: Collapsing Redundant Nodes [Figure: the diagram after merging duplicate b3 nodes]

55 BDD: Eliminating Unnecessary Nodes [Figure: the diagram of slide 54 before elimination]

56 BDD: Eliminating Unnecessary Nodes [Figure: the final diagram, with one b1 node, two b2 nodes, two b3 nodes, one b4 node, and terminals 0 and 1]

57 BDD Representation on Disk
[Figure: the final BDD of slide 56 with its internal nodes numbered 2 to 7]
Output in chord_output/bddbddb/: V.dom, T.dom, V.map, T.map, VT.bdd
Contents of VT.bdd, with BDD variables written as b1 to b4:
    # V0:2 T0:2
    # b1 b2
    # b3 b4
    6 4            (6 internal nodes, 4 BDD variables)
    b2 b1 b4 b3    (BDD variable order)
    7 b4 0 1       (one entry per internal node)
    6 b3 7 1
    5 b3 0 7
    4 b2 5 0
    3 b2 6 5
    2 b1 3 4

58 BDD Variable Order is Important
Function: b1 b2 + b3 b4
[Figure: two BDDs of this function. Under the order b1 < b2 < b3 < b4 the BDD has 4 internal nodes (b1, b2, b3, b4); under the order b1 < b3 < b2 < b4 it has 6 (b1, two b3, two b2, b4).]
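The effect of variable order on BDD size can be checked with a small brute-force construction. The sketch below is not BuDDy or bddbddb: the class `BddSize` is hypothetical, it enumerates all assignments (so it only suits a handful of variables), and it applies the two reductions from the preceding slides, collapsing redundant nodes through a unique table and eliminating nodes whose two branches coincide.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch: count internal nodes of the reduced BDD of a small function. */
final class BddSize {
    private final Predicate<boolean[]> function;
    private final int[] order;                                    // order[level] = variable tested there
    private final Map<String, Integer> unique = new HashMap<>();  // (var, low, high) -> node id
    private int nextId = 2;                                       // ids 0 and 1 are the terminals

    private BddSize(Predicate<boolean[]> function, int[] order) {
        this.function = function;
        this.order = order;
    }

    /** Returns the id of the reduced node for the function under the current partial assignment. */
    private int build(int level, boolean[] assignment) {
        if (level == order.length) {
            return function.test(assignment) ? 1 : 0;
        }
        int var = order[level];
        assignment[var] = false;
        int low = build(level + 1, assignment);
        assignment[var] = true;
        int high = build(level + 1, assignment);
        if (low == high) {
            return low;  // eliminate unnecessary node: the test on var does not matter
        }
        String key = var + ":" + low + ":" + high;
        Integer id = unique.get(key);
        if (id == null) {  // otherwise collapse into the existing identical node
            id = nextId++;
            unique.put(key, id);
        }
        return id;
    }

    static int internalNodes(Predicate<boolean[]> function, int[] order) {
        BddSize bdd = new BddSize(function, order);
        bdd.build(0, new boolean[order.length]);
        return bdd.unique.size();
    }

    public static void main(String[] args) {
        // b[0]..b[3] stand for b1..b4.
        Predicate<boolean[]> f = b -> (b[0] && b[1]) || (b[2] && b[3]);
        System.out.println(internalNodes(f, new int[] { 0, 1, 2, 3 }));  // order b1 < b2 < b3 < b4
        System.out.println(internalNodes(f, new int[] { 0, 2, 1, 3 }));  // order b1 < b3 < b2 < b4
    }
}
```

For b1 b2 + b3 b4 the two calls report 4 and 6 internal nodes respectively, in line with the two diagrams on this slide.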

59 chord.project.analyses.ProgramRel API void setName(String name) set name of relation void setSign(RelSign sign) set signature (domain names and order) of relation void setDoms(Dom[] doms) set domains of relation void zero() or one() initialize contents of relation to zero (no tuples) or one (all tuples) void add(T1 e1, …, TN eN) add tuple (e1, …, eN) to relation void remove(T1 e1, …, TN eN) remove tuple (e1, …, eN) from relation void save() save contents of relation to disk

60 chord.project.analyses.ProgramRel API void load() load contents of relation from disk Iterable getAryNValTuples() iterate over all tuples in the relation int size() number of tuples in the relation boolean contains(T1 e1, …, TN eN) does relation contain tuple (e1, …, eN)? RelView getView() obtain a copy of the relation upon which to do projection, selection, etc. without affecting original relation void close() free memory used to hold relation

61 Pointer Analysis Answers which pointers can point to which objects at run-time Central to many program optimization & verification problems Problem is undecidable No exact (i.e. both sound and complete) solution But many conservative (i.e. sound) approximate solutions exist Determine which pointers may point to which objects All are incomplete but differ in precision (i.e. false-positive rate) Continues to be active area of research

62 Example
    class List {
        Obj[] elems;
        List() { Obj[] a = new Obj[…]; this.elems = a; }
    }
    class Bldg {
        List events, floors;
        static void main(String[] a) { Bldg b = new Bldg(); }
        Bldg() {
            List el = new List(); this.events = el;
            List fl = new List(); this.floors = fl;
            for (int i = 0; i < K; i++) { Event e = new Event(); el.elems[i] = e; }
            for (int i = 0; i < M; i++) { Floor f = new Floor(); fl.elems[i] = f; }
        }
    }
[Figure: run-time heap of the program: a Bldg object whose events and floors fields point to two List objects, each with its own Obj[] elems array, one holding Event objects and the other Floor objects; variables b, el, fl, e, f, a point into this heap]
Query: disjoint-reach(el, fl)?

63 0-CFA Pointer Analysis for Java Flow sensitivity flow-insensitive: ignores intra-procedural control flow Call graph construction Heap abstraction Aggregate modeling Context sensitivity

64 Example: Flow Insensitivity
The example program of slide 62 with the two for loops ignored and each array index i replaced by *:
    Event e = new Event(); el.elems[*] = e;
    Floor f = new Floor(); fl.elems[*] = f;

65 0-CFA Pointer Analysis for Java Flow sensitivity flow-insensitive: ignores intra-procedural control flow Call graph construction on-the-fly: mutually recursively with pointer analysis Heap abstraction Aggregate modeling Context sensitivity

66 Example: Call Graph (Base Case) [Figure: the example program of slide 62 (loops ignored, array indices written as *), marking the code deemed reachable so far] reachableM(0).

67 0-CFA Pointer Analysis for Java Flow sensitivity flow-insensitive: ignores intra-procedural control flow Call graph construction on-the-fly: mutually recursively with pointer analysis Heap abstraction allocation sites: objects at same site indistinguishable Aggregate modeling Context sensitivity

68 Example: Heap Abstraction
The example program with each allocation site numbered (loops ignored, array indices written as *):
    List():  Obj[] a = new6 Obj[…]; this.elems = a;
    main():  Bldg b = new1 Bldg();
    Bldg():  List el = new2 List(); this.events = el;
             List fl = new3 List(); this.floors = fl;
             Event e = new4 Event(); el.elems[*] = e;
             Floor f = new5 Floor(); fl.elems[*] = f;

69 Rule for Object Allocation Sites
Statement: v = new h …
[Figure: points-to graph before and after the statement; afterwards v also points to new h]
    VH(v, h) :- reachableM(m), MobjValAsgnInst(m, v, h).

70 Rule for Copy Assignments
Statement: v1 = v2
[Figure: points-to graph before and after the statement; v2 points to new h, and afterwards v1 also points to new h]
    VH(v1, h) :- reachableM(m), MobjVarAsgnInst(m, v1, v2), VH(v2, h).

71 0-CFA Pointer Analysis for Java Flow sensitivity flow-insensitive: ignores intra-procedural control flow Call graph construction on-the-fly: mutually recursively with pointer analysis Heap abstraction allocation sites: objects at same site indistinguishable Aggregate modeling instance field sensitive but array element insensitive Context sensitivity

72 Rule for Heap Writes
Statement: b.f = v, where f is instance field or [*] (array element)
[Figure: points-to graph before and after the statement; b points to new h1 and v points to new h2, and afterwards new h1 also has an f edge to new h2]
    HFH(h1, f, h2) :- reachableM(m), MputInstFldInst(m, b, f, v), VH(b, h1), VH(v, h2).

73 Rule for Heap Reads
Statement: v = b.f, where f is instance field or [*] (array element)
[Figure: points-to graph before and after the statement; b points to new h1, which has an f edge to new h2, and afterwards v also points to new h2]
    VH(v, h2) :- reachableM(m), MgetInstFldInst(m, v, b, f), VH(b, h1), HFH(h1, f, h2).
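The four rules seen so far (allocation, copy, heap write, heap read) can be run to a fixpoint on the Bldg example with a naive loop. The sketch below is not Chord's Datalog solver and makes loud simplifications: every method is assumed reachable, so the reachableM premise is dropped; constructor calls are modeled by hand as copies of the receiver into hypothetical variables thisBldg and thisList; and the array accesses go through hypothetical temporaries t1 and t2. The class `PointsToSketch` and all fact encodings are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: naive fixpoint over the VH and HFH rules, all methods assumed reachable. */
final class PointsToSketch {
    final List<String[]> allocs = new ArrayList<>();  // {v, h}     for v = new h
    final List<String[]> copies = new ArrayList<>();  // {v1, v2}   for v1 = v2
    final List<String[]> stores = new ArrayList<>();  // {b, f, v}  for b.f = v
    final List<String[]> loads = new ArrayList<>();   // {v, b, f}  for v = b.f
    final Map<String, Set<String>> vh = new TreeMap<>();   // variable -> allocation sites
    final Map<String, Set<String>> hfh = new TreeMap<>();  // "site.field" -> allocation sites

    private static Set<String> at(Map<String, Set<String>> map, String key) {
        return map.computeIfAbsent(key, k -> new TreeSet<String>());
    }

    /** Applies the four rules repeatedly until no new fact is derived. */
    void solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] a : allocs) {
                changed |= at(vh, a[0]).add(a[1]);
            }
            for (String[] c : copies) {
                changed |= at(vh, c[0]).addAll(new ArrayList<>(at(vh, c[1])));
            }
            for (String[] s : stores) {
                for (String h1 : new ArrayList<>(at(vh, s[0]))) {
                    changed |= at(hfh, h1 + "." + s[1]).addAll(new ArrayList<>(at(vh, s[2])));
                }
            }
            for (String[] l : loads) {
                for (String h1 : new ArrayList<>(at(vh, l[1]))) {
                    changed |= at(vh, l[0]).addAll(new ArrayList<>(at(hfh, h1 + "." + l[2])));
                }
            }
        }
    }

    /** Facts written by hand for the Bldg example, using the site numbers of slide 68. */
    static PointsToSketch bldgExample() {
        PointsToSketch p = new PointsToSketch();
        String[][] allocFacts = { {"b", "new1"}, {"el", "new2"}, {"fl", "new3"},
                                  {"e", "new4"}, {"f", "new5"}, {"a", "new6"} };
        // Receiver passing of the constructor calls, modeled as copies (assumption).
        String[][] copyFacts = { {"thisBldg", "b"}, {"thisList", "el"}, {"thisList", "fl"} };
        String[][] storeFacts = { {"thisBldg", "events", "el"}, {"thisBldg", "floors", "fl"},
                                  {"thisList", "elems", "a"},
                                  {"t1", "[*]", "e"}, {"t2", "[*]", "f"} };
        // el.elems[*] = e is split into t1 = el.elems; t1[*] = e (likewise for fl).
        String[][] loadFacts = { {"t1", "el", "elems"}, {"t2", "fl", "elems"} };
        for (String[] x : allocFacts) p.allocs.add(x);
        for (String[] x : copyFacts) p.copies.add(x);
        for (String[] x : storeFacts) p.stores.add(x);
        for (String[] x : loadFacts) p.loads.add(x);
        p.solve();
        return p;
    }

    public static void main(String[] args) {
        PointsToSketch p = bldgExample();
        System.out.println("VH  = " + p.vh);
        System.out.println("HFH = " + p.hfh);
    }
}
```

Under these assumptions both lists end up sharing the single array site new6, whose elements may be either new4 or new5, which is why this abstraction cannot separate what el and fl reach.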

74 0-CFA Pointer Analysis for Java Flow sensitivity flow-insensitive: ignores intra-procedural control flow Call graph construction on-the-fly: mutually recursively with pointer analysis Heap abstraction allocation sites: objects at same site indistinguishable Aggregate modeling instance field sensitive but array element insensitive Context sensitivity context-insensitive: ignores inter-procedural control flow (analyzes each method in single context)

75 Rule for Dynamically Dispatching Calls
Call site i: v.foo() inside method T n.bar() { …; i: v.foo(); …; }, where CHA(T, foo) = T m.foo() { … }
[Figure: points-to graph and call graph before and after; v points to new h of type T, and afterwards call site i has an edge to method m]
    IM(i, m) :- reachableM(n), MI(n, i), virtIM(i, m), IinvkArg0(i, v), VH(v, h), HT(h, t), CHA(t, m, m).
    reachableM(m) :- IM(_, m).

76 Writing a Datalog Analysis
    #name=cipa-0cfa-dlog
Program domains:
    .include "V.dom"
    .include "T.dom"
    ...
BDD variable order:
    .bddvarorder M0xI0_F0_V0xV1_T0_H0xH1
Input, intermediate, output program relations (represented as BDDs):
    VT(v:V0, t:T0) input
    reachableM(m:M0)
    FH(f:F0, h:H0) output
    VH(v:V0, h:H0) output
    HFH(h1:H0, f:F0, h2:H1) output
    IM(i:I0, m:M0) output
    ...
Analysis constraints (Horn clauses), solved via BDD operations:
    reachableM(m) :- IM(_, m).
    ...

77 Running a Datalog Analysis
    ant -Dchord.work.dir= -Dchord.run.analyses=cipa-0cfa-dlog run
Files in chord_output/bddbddb/: V.dom, T.dom, V.map, T.map, VT.bdd, reachableM.bdd, FH.bdd, VH.bdd, HFH.bdd, IM.bdd

78 Example [Figure: points-to graph computed for the example program of slide 68, with variable nodes b, el, fl, e, f, a, allocation-site nodes new1 Bldg, new2 List, new3 List, new4 Event, new5 Floor, new6 Obj[], and heap edges labeled events, floors, elems and [*]]

79 Printing Program Relations (Command Line)
    ant -Dwork.dir= / chord_output/bddbddb -Ddlog.file=a.dlog solve
File a.dlog:
    .include "V.dom"
    .include "H.dom"
    .include "F.dom"
    .bddvarorder ...
    VH(v:V0, h:H0) input
    HFH(h1:H0, f:F0, h2:H1) input
    rVH(v:V0, h:H0)
    rVV(v1:V0, v2:V1) printtuples
    rVH(v, h) :- VH(v, h).
    rVH(v, h) :- rVH(v, h), HFH(h, _, h).
    rVV(v1, v2) :- v1 …
Output:
    Relation rVV: el! :()V@Bldg, fl! :()V@Bldg ...

80 Querying Program Relations (Command Line)
    ant -Dwork.dir= / chord_output/bddbddb -Ddlog.file=q.dlog debug
File q.dlog:
    .include "V.dom"
    .include "H.dom"
    .include "F.dom"
    .bddvarorder ...
    VH(v:V0, h:H0) input
    HFH(h1:H0, f:F0, h2:H1) input
File V.map:
    b!main:(…)@Bldg
    ...
File H.map:
    null
    1!main:(…)@Bldg
    2! :()V@Bldg
    3! :()V@Bldg
    ...
Session:
    prompt> VH(0,h)?
    1!main:(…)@Bldg
    prompt> HFH(1,_,h)?
    2! :()V@Bldg
    3! :()V@Bldg
[Figure: points-to graph of the example program, as on slide 78]

81 Pros and Cons of Datalog/BDDs 1.Good for rapidly crafting initial versions of analysis with focus on false positive/negative rate instead of scalability 2.Good for analyses … 1.whose constraint solving strategy is not obvious (e.g. best known alternative is chaotic iteration) 2.on data with lots of redundancy and too large to compute/store/read using Java if represented explicitly (e.g. cloning-based analyses) 3.involving few simple rules (e.g. transitive closure) 3.Bad for analyses … 1.with more complicated formulations (e.g. summary-based analyses) 2.over domains not known exactly in advance (i.e. on-the-fly analyses) 3.involving many interdependent rules (e.g. points-to analyses) 4.Unintuitive effects of BDDs on performance (e.g. k-CFA: small non-uniform k across sites worse than large uniform k)

82 Writing an Analysis in Chord Declaratively in Datalog or imperatively in Java Datalog analysis is any file that: has extension.dlog or.datalog occurs in path specified by property chord.dlog.analysis.path Java analysis is any class that: is annotated with @Chord occurs in path specified by property chord.java.analysis.path

83 Writing a Java Analysis
Create subclass of chord.project.analyses.JavaAnalysis:
    @Chord(name = "my-java",                     // mandatory field
           consumes = { "C1", ..., "Cm" },
           produces = { "P1", ..., "Pn" },
           namesOfTypes = { "T1", ..., "Tk" },   // target types not inferable otherwise
           types = { T1.class, ..., Tk.class },
           namesOfSigns = { "S1", ..., "Sr" },   // relation signs not inferable otherwise
           signs = { "...", ..., "..." })
    public class MyAnalysis extends JavaAnalysis {
        @Override
        public void run() { ... }
    }
Compile above class to a location in path specified by any of:
    Property name                  Default value
    chord.std.java.analysis.path   "chord.jar"
    chord.ext.java.analysis.path   ""
    chord.java.analysis.path       concat. of above two property values

84 Chord Project Global entity for organizing all analyses and their inputs and outputs (collectively called analysis results) Computed if chord.project.Project.g() is called Consists of set of each of: analyses called tasks analysis results called targets data/control dependencies between tasks and targets Either of two kinds chosen by chord.classic=[true|false]: chord.project.ClassicProject (this tutorial) only data dependencies, can only run tasks sequentially chord.project.ModernProject (ongoing) data and control dependencies, can run tasks in parallel

85 Computing a Chord Project
Compute all tasks:
– Each file with extension .dlog/.datalog in chord.dlog.analysis.path
– Each class having annotation @Chord in chord.java.analysis.path
Compute all targets:
– Each target consumed or produced by some task
Compute dependency graph:
– Nodes are all tasks and targets
– Edge from target C to task T if T consumes C
– Edge from task T to target P if T produces P
Perform consistency checks:
– Error if target has no type or has multiple types, error if relation has no sign, warn if target produced by multiple tasks, etc.

86 Example: Chord Project
Tasks T1, T2, T3, T4 and targets R1, R2, R3, R4:

    {} T1 { R1 }
    {} T2 { R1 }
    { R4 } T3 { R2 }
    { R1, R2 } T4 { R3, R4 }

Each task has form { C1, …, Cm } T { P1, …, Pn } where:
– T is name of task
– C1, …, Cm are names of targets consumed by the task
– P1, …, Pn are names of targets produced by the task
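The example project above can be modeled in a few lines of plain Java. This is a hedged sketch, not Chord's implementation: the class ProjectGraphSketch and its methods are hypothetical names, and only two of the steps described earlier are shown (building the consumes/produces edges and the "target produced by multiple tasks" check).

```java
import java.util.*;

class ProjectGraphSketch {
    // A task with the names of the targets it consumes and produces.
    static final class Task {
        final String name;
        final List<String> consumes, produces;
        Task(String name, List<String> consumes, List<String> produces) {
            this.name = name; this.consumes = consumes; this.produces = produces;
        }
    }

    // The four tasks of the example project.
    static List<Task> exampleTasks() {
        return Arrays.asList(
            new Task("T1", Collections.<String>emptyList(), Arrays.asList("R1")),
            new Task("T2", Collections.<String>emptyList(), Arrays.asList("R1")),
            new Task("T3", Arrays.asList("R4"), Arrays.asList("R2")),
            new Task("T4", Arrays.asList("R1", "R2"), Arrays.asList("R3", "R4")));
    }

    // Dependency edges: "C -> T" if T consumes C, "T -> P" if T produces P.
    static List<String> edges(List<Task> tasks) {
        List<String> res = new ArrayList<>();
        for (Task t : tasks) {
            for (String c : t.consumes) res.add(c + " -> " + t.name);
            for (String p : t.produces) res.add(t.name + " -> " + p);
        }
        return res;
    }

    // Consistency check: targets produced by more than one task.
    static Set<String> producedByMultipleTasks(List<Task> tasks) {
        Map<String, Integer> count = new TreeMap<>();
        for (Task t : tasks)
            for (String p : t.produces) count.merge(p, 1, Integer::sum);
        Set<String> res = new TreeSet<>();
        for (Map.Entry<String, Integer> e : count.entrySet())
            if (e.getValue() > 1) res.add(e.getKey());
        return res;
    }

    public static void main(String[] args) {
        System.out.println(edges(exampleTasks()));
        System.out.println(producedByMultipleTasks(exampleTasks())); // prints [R1]
    }
}
```

In this example R1 is the only target that would trigger the warning, since both T1 and T2 produce it.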

87 Running a Java Analysis

    ant -Dchord.work.dir= -Dchord.run.analyses=my-java run

    @Chord(name = "my-java",
           consumes = { "C1", ..., "Cm" },
           produces = { "P1", ..., "Pn" })
    public class MyAnalysis extends JavaAnalysis {
        @Override public void run() { ... }
    }

If done bit of this analysis is 1: do nothing
Else do the following in order:
1. For each of C1, …, Cm whose done bit is 0:
   – Recursively run unique analysis producing it
   – Report runtime error if none or multiple such analyses exist
2. Execute run() method of this analysis
3. Set done bits of this analysis and P1, …, Pn to 1
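The done-bit procedure above can be sketched as a small self-contained Java class. Everything here is an assumption for illustration: DemandDrivenRunner, its fields, and the tasks A, B, C with targets X, Y are hypothetical, and recording a task name stands in for executing its run() method. The sketch does not guard against cyclic dependencies.

```java
import java.util.*;

class DemandDrivenRunner {
    static final class Task {
        final String name;
        final List<String> consumes, produces;
        Task(String name, List<String> consumes, List<String> produces) {
            this.name = name; this.consumes = consumes; this.produces = produces;
        }
    }

    private final Map<String, Task> tasks = new LinkedHashMap<>();
    private final Set<String> doneTasks = new HashSet<>();    // done bits of tasks
    private final Set<String> doneTargets = new HashSet<>();  // done bits of targets
    final List<String> executed = new ArrayList<>();          // order in which tasks ran

    void add(String name, List<String> consumes, List<String> produces) {
        tasks.put(name, new Task(name, consumes, produces));
    }

    void run(String name) {
        if (doneTasks.contains(name)) return;                 // done bit is 1: do nothing
        Task t = tasks.get(name);
        for (String c : t.consumes) {
            if (doneTargets.contains(c)) continue;
            List<Task> producers = new ArrayList<>();
            for (Task u : tasks.values())
                if (u.produces.contains(c)) producers.add(u);
            if (producers.size() != 1)                        // none or multiple producers
                throw new IllegalStateException(
                    "expected one producer of " + c + ", found " + producers.size());
            run(producers.get(0).name);                       // recursively run the unique producer
        }
        executed.add(name);                                   // stands in for executing run()
        doneTasks.add(name);
        doneTargets.addAll(t.produces);
    }

    public static void main(String[] args) {
        DemandDrivenRunner r = new DemandDrivenRunner();
        r.add("A", Collections.<String>emptyList(), Arrays.asList("X"));
        r.add("B", Arrays.asList("X"), Arrays.asList("Y"));
        r.add("C", Arrays.asList("X", "Y"), Collections.<String>emptyList());
        r.run("C");
        System.out.println(r.executed);                       // prints [A, B, C]
    }
}
```

Running C a second time does nothing because its done bit is already set, which is the caching behavior the procedure describes.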

88 Running a Java Analysis
Tasks T1, T2, T3, T4 and targets R1, R2, R3, R4:

    {} T1 { R1 }
    {} T2 { R1 }
    { R4 } T3 { R2 }
    { R1, R2 } T4 { R3, R4 }

    ant -Dchord.work.dir= -Dchord.run.analyses=T1,T4 run

89 Predefined Analysis Templates
Organized in a hierarchy in package chord.project.analyses:

    JavaAnalysis
        ProgramDom
        ProgramRel
        DlogAnalysis
        RHSAnalysis
            ForwardRHSAnalysis
            BackwardRHSAnalysis
        BasicDynamicAnalysis
            DynamicAnalysis

90 chord.project.ClassicProject API
ITask getTask(String name)
    representation of named task
Object getTrgt(String name)
    representation of named target
ITask runTask(String name)
    run named task (and any needed tasks prior to it)
boolean is[Task|Trgt]Done(String name)
    is named task/target already executed/computed?
void set[Task|Trgt]Done(String name)
    set done bit of named task/target to 1
void reset[Task|Trgt]Done(String name)
    set done bit of named task/target to 0

91 Example Java Analysis

    package chord.analyses.alias;

    @Chord(name = "cicg-java", consumes = { "IM" })
    public class CICGAnalysis extends JavaAnalysis {
        private ProgramRel cg;
        @Override public void run() {
            cg = (ProgramRel) ClassicProject.g().getTrgt("IM");
        }
        public Set<jq_Method> getCallees(Quad q) {
            if (!cg.isOpen()) cg.load();
            RelView view = cg.getView();
            view.selectAndDelete(0, q);
            Iterable<jq_Method> res = view.getAry1ValTuples();
            Set<jq_Method> callees = new HashSet<jq_Method>();
            for (jq_Method m : res) callees.add(m);
            view.free();
            return callees;
        }
        public void free() {
            if (cg.isOpen()) cg.close();
        }
    }

92 Example Java Analysis

    @Chord(name = "my-java")
    public class MyAnalysis extends JavaAnalysis {
        @Override public void run() {
            ClassicProject p = ClassicProject.g();
            CICGAnalysis a = (CICGAnalysis) p.getTask("cicg-java");
            p.runTask(a);
            for (Quad q : ...) {
                Set<jq_Method> tgts = a.getCallees(q);
                ...
            }
            a.free();
        }
    }

93 Specialized Java Analyses
ProgramDom:
– Consumes targets specified in @Chord annotation
– Produces only a single target (the defined program domain itself)
– run() method computes and saves domain to disk
ProgramRel:
– Consumes targets specified in @Chord annotation, plus target of each of its program domains
– Produces only a single target (the defined program relation itself)
– run() method computes and saves relation to disk
DlogAnalysis:
– Consumes only its declared domains and declared input relations
– Produces only its declared output relations
– run() method runs bddbddb

94 Analyses as Building Blocks
1. Modularity
   – each analysis is written independently
2. Flexibility
   – analyses can interact in powerful ways with other analyses (by user-specified data/control dependencies)
3. Efficiency
   – analyses executed in demand-driven fashion
   – results computed by each analysis automatically cached for reuse by other analyses without re-computation
   – independent analyses automatically executed in parallel
4. Reliability
   – result is independent of order in which analyses are run

95 Outline of Tutorial Part 1: Getting Started With Chord Program Representation Part 2: Analysis Using Datalog/BDDs Chaining Analyses Together Part 3: Context-Sensitive Analysis Dynamic Analysis

96 Context-Sensitive Analysis
Respects inter-procedural control-flow to varying degrees
Broadly two kinds:
– Bottom-Up: analyze method without any knowledge of its callers
– Top-Down: analyze method only in called contexts
Two kinds of top-down approaches:
– Cloning-based (k-limited)
– Summary-based
Fully context-sensitive approaches:
– Bottom-up
– Top-down summary-based

97 Context-Sensitive Analysis in Chord
Top-down: both cloning-based and summary-based
Cloning-based analysis:
– k-CFA, k-object-sensitivity, hybrid
Summary-based analysis:
– Tabulation algorithm from Reps, Horwitz, Sagiv (POPL'95)

98 Example: Context-Insensitive Analysis
Query: disjoint-reach(el, fl)?

    class Bldg {
        List events, floors;
        static void main(String[] a) { Bldg b = new /*1*/ Bldg(); }
        Bldg() {
            List el = new /*2*/ List(); this.events = el;
            List fl = new /*3*/ List(); this.floors = fl;
            for (int i = 0; i < K; i++) { Event e = new /*4*/ Event(); el.elems[*] = e; }
            for (int i = 0; i < M; i++) { Floor f = new /*5*/ Floor(); fl.elems[*] = f; }
        }
    }
    class List {
        Obj[] elems;
        List() { Obj[] a = new /*6*/ Obj[…]; this.elems = a; }
    }

[Figure: points-to graph over locals b, el, fl, e, f, a; allocation sites 1 (Bldg), 2 and 3 (List), 4 (Event), 5 (Floor), 6 (Obj[]); field edges events, floors, elems, [*]; context labels 1 and 2, 3]

99 Example: Cloning-Based Analysis
Query: disjoint-reach(el, fl)?

    class Bldg {
        List events, floors;
        static void main(String[] a) { Bldg b = new /*1*/ Bldg(); }
        Bldg() {
            List el = new /*2*/ List(); this.events = el;
            List fl = new /*3*/ List(); this.floors = fl;
            for (int i = 0; i < K; i++) { Event e = new /*4*/ Event(); el.elems[*] = e; }
            for (int i = 0; i < M; i++) { Floor f = new /*5*/ Floor(); fl.elems[*] = f; }
        }
    }
    class List {
        Obj[] elems;
        List() { Obj[] a = new /*6*/ Obj[…]; this.elems = a; }
    }
    List() { Obj[] a = new /*6*/ Obj[…]; this.elems = a; }    // shown a second time on the slide

[Figure: points-to graph over locals b, el, fl, e, f, a; allocation sites 1 (Bldg), 2 and 3 (List), 4 (Event), 5 (Floor), 6 (Obj[]); field edges events, floors, elems, [*]; context labels 1, 2, 3]

100 Example: Cloning with Object Sensitivity
Query: disjoint-reach(el, fl)?

    class Bldg {
        List events, floors;
        static void main(String[] a) { Bldg b = new /*1*/ Bldg(); }
        Bldg() {
            List el = new /*2*/ List(); this.events = el;
            List fl = new /*3*/ List(); this.floors = fl;
            for (int i = 0; i < K; i++) { Event e = new /*4*/ Event(); el.elems[*] = e; }
            for (int i = 0; i < M; i++) { Floor f = new /*5*/ Floor(); fl.elems[*] = f; }
        }
    }
    class List {
        Obj[] elems;
        List() { Obj[] a = new /*6*/ Obj[…]; this.elems = a; }
    }
    List() { Obj[] a = new /*6*/ Obj[…]; this.elems = a; }    // shown a second time on the slide

[Figure: points-to graph over locals b, el, fl, e, f, a; allocation sites 1 (Bldg), 2 and 3 (List), 4 (Event), 5 (Floor), and site 6 (Obj[]) drawn twice; field edges events, floors, elems, [*]; context labels 1, 2, 3]

101 Running Cloning-based Analyses in Chord
chord.ctxt.kind=[ci|cs|co]
    kind of context sensitivity for each method and its locals
chord.inst.ctxt.kind=[ci|cs|co]
    kind of context sensitivity for each instance method and its locals
chord.stat.ctxt.kind=[ci|cs|co]
    kind of context sensitivity for each static method and its locals
chord.kobj.k=[1|2|…]
    k value to use for each object allocation site
chord.kcfa.k=[1|2|…]
    k value to use for each method call site

    ant -Dchord.work.dir= -Dchord.run.analyses= run

cspa_0cfa.dlog, cspa_kcfa.dlog, cspa_kobj.dlog, cspa_hybrid.dlog
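The role of the k value in a k-limited, call-site-based context can be illustrated with a short Java sketch. This is a generic textbook-style rendering, not Chord's code: the class KLimitedContext, the method push, and the call-site labels i2 and i3 are hypothetical.

```java
import java.util.*;

class KLimitedContext {
    // Context of a callee: the call site followed by the caller's context,
    // truncated so that at most the k most recent call sites are kept.
    static List<String> push(List<String> callerCtxt, String callSite, int k) {
        List<String> ctxt = new ArrayList<>();
        if (k > 0) ctxt.add(callSite);
        for (String s : callerCtxt) {
            if (ctxt.size() >= k) break;
            ctxt.add(s);
        }
        return ctxt;
    }

    public static void main(String[] args) {
        List<String> caller = Arrays.asList("i1");
        // With k = 1, calls from two different sites get two different contexts.
        System.out.println(push(caller, "i2", 1));   // prints [i2]
        System.out.println(push(caller, "i3", 1));   // prints [i3]
        // With k = 0, every call collapses into the single empty context.
        System.out.println(push(caller, "i2", 0));   // prints []
    }
}
```

Under these assumptions, a method called from two sites is analyzed once per distinct context when k is at least 1, and only once overall when k is 0, which is the difference between the cloning-based and context-insensitive examples.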

102 Output of Pointer/Call-Graph Analyses in Chord
cspa_0cfa.dlog, cspa_kcfa.dlog, cspa_kobj.dlog, cspa_hybrid.dlog:
– rootCM(c,m): m is entry method in ctxt c
– CICM(c1,i,c2,m): call site i in ctxt c1 may call method m in ctxt c2
– CVC(c,v,o): local v may point to object o in ctxt c of its declaring method
– FC(f,o): static field f may point to object o
– CFC(o1,f,o2): instance field f of object o1 may point to object o2
cipa_0cfa.dlog:
– rootM, IM, VH, FH, HFH

103 Cloning-Based vs. Summary-Based Analysis
Cloning-based Analysis:
– Flow-insensitive
– Notion of method contexts is somewhat arbitrary
Summary-based Analysis:
– Flow-sensitive
– Notion of method contexts is defined by the user

104 Example: Thread-Escape Analysis

    class Bldg {
        List events, floors;
        static void main(String[] a) { Bldg b = new Bldg(); }
        Bldg() {
            List el = new List(); this.events = el;
            List fl = new List(); this.floors = fl;
            for (int i = 0; i < K; i++) { Event e = new Event(); el.elems[i] = e; }
            for (int i = 0; i < M; i++) { Floor f = new Floor(); fl.elems[i] = f; }
        }
    }
    class List {
        Obj[] elems;
        List() { Obj[] a = new Obj[…]; this.elems = a; }
    }

[Figure: heap with objects Bldg, List, List, Obj[], Obj[], Event, Event, Floor; fields events, floors, elems; array indices 0, 1; locals el, fl, b]

105 Example: Thread-Escape Analysis
local(p, v): Is v reachable from single thread at p?

    class List {
        Obj[] elems;
        List() { Obj[] a = new Obj[…]; this.elems = a; }
    }
    class Bldg {
        List events, floors;
        static void main(String[] a) {
            Bldg b = new Bldg();
            for (int i = 0; i < K; i++) { List el = b.events; Event v = el.elems[i]; }
        }
        Bldg() {
            List el = new List(); this.events = el;
            List fl = new List(); this.floors = fl;
            for (int i = 0; i < K; i++) { Event e = new Event(); el.elems[i] = e; }
            for (int i = 0; i < M; i++) { Floor f = new Floor(); fl.elems[i] = f; }
            for (int i = 0; i < N; i++) { Elev t = new Elev(fl); t.start(); }
        }
    }

[Figure: heap with objects Bldg, List, List, Obj[], Obj[], Event, Event, Floor, Elev; fields events, floors, elems; array indices 0, 1; locals el, fl, b, v; program point p; legend marking objects as local or shared]

106 Example: Trivial Pointer Abstraction
Query: local(p, v)?

    class List {
        Obj[] elems;
        List() { Obj[] a = new Obj[…]; this.elems = a; }
    }
    class Bldg {
        List events, floors;
        static void main(String[] a) {
            Bldg b = new Bldg();
            for (int i = 0; i < K; i++) { List el = b.events; Event v = el.elems[i]; }
        }
        Bldg() {
            List el = new List(); this.events = el;
            List fl = new List(); this.floors = fl;
            for (int i = 0; i < K; i++) { Event e = new Event(); el.elems[i] = e; }
            for (int i = 0; i < M; i++) { Floor f = new Floor(); fl.elems[i] = f; }
            for (int i = 0; i < N; i++) { Elev t = new Elev(fl); t.start(); }
        }
    }

[Figure: heap with objects Bldg, List, List, Obj[], Obj[], Event, Event, Floor, Elev; fields events, floors, elems; array indices 0, 1; local v; program point p]

107 Example: Allocation Sites Pointer Abstraction
Query: local(p, v)?

    class List {
        Obj[] elems;
        List() { Obj[] a = new Obj[…]; this.elems = a; }
    }
    class Bldg {
        List events, floors;
        static void main(String[] a) {
            Bldg b = new Bldg();
            for (int i = 0; i < K; i++) { List el = b.events; Event v = el.elems[i]; }
        }
        Bldg() {
            List el = new List(); this.events = el;
            List fl = new List(); this.floors = fl;
            for (int i = 0; i < K; i++) { Event e = new Event(); el.elems[i] = e; }
            for (int i = 0; i < M; i++) { Floor f = new Floor(); fl.elems[i] = f; }
            for (int i = 0; i < N; i++) { Elev t = new Elev(fl); t.start(); }
        }
    }

[Figure: heap with objects Bldg, List, List, Obj[], Obj[], Event, Event, Floor, Elev; fields events, floors, elems; array indices 0, 1; local v; program point p]

108 Example: k-CFA Pointer Abstraction
Query: local(p, v)?

    class List {
        Obj[] elems;
        List() { Obj[] a = new Obj[…]; this.elems = a; }
    }
    class Bldg {
        List events, floors;
        static void main(String[] a) {
            Bldg b = new Bldg();
            for (int i = 0; i < K; i++) { List el = b.events; Event v = el.elems[i]; }
        }
        Bldg() {
            List el = new List(); this.events = el;
            List fl = new List(); this.floors = fl;
            for (int i = 0; i < K; i++) { Event e = new Event(); el.elems[i] = e; }
            for (int i = 0; i < M; i++) { Floor f = new Floor(); fl.elems[i] = f; }
            for (int i = 0; i < N; i++) { Elev t = new Elev(fl); t.start(); }
        }
    }

[Figure: heap with objects Bldg, List, List, Obj[], Obj[], Event, Event, Floor, Elev; fields events, floors, elems; array indices 0, 1; local v; program point p]

109 Complexity of Static Analyses
L = program points, F = fields, H = allocation sites, I = call sites, Q = queries

    pointer abstraction                    max abstract values (N)
    trivial                                1
    allocation sites                       H
    k-CFA                                  H · I^k
    2-partition                            2

    control-flow abstraction               max abstract states
    flow and context insensitive           1
    flow sensitive, context insensitive    L
    flow and context sensitive             L · 2^(N^2 · F)
    flow and context sensitive             Q · L · 4^F

Our Static Analysis: 2-partition pointer abstraction; flow and context sensitive control-flow abstraction with Q · L · 4^F abstract states
Challenge: an abstraction that is both precise and scalable

110 Drawback of Existing Static Analyses
Different queries require different parts of the program to be abstracted precisely
But existing analyses use the same abstraction to prove all queries simultaneously
– existing analyses sacrifice precision and/or scalability
[Figure: a single static analysis with abstraction A takes program P and queries Q1, Q2 and answers both P ⊢ Q1? and P ⊢ Q2?]

111 Insight 1: Client-Driven Static Analysis
Query-driven: allows using separate abstractions for proving different queries
Parametrized: parameter dictates how much precision to use for each program part for a given query
[Figure: one static analysis with abstraction A1 takes program P and query Q1 and answers P ⊢ Q1?; another with abstraction A2 takes P and query Q2 and answers P ⊢ Q2?]

112 Example: Client-Driven Static Analysis (RHS)
Query: local(p, v)?

    static void main(…) {
        Bldg b = new Bldg();
        for (*) { List el = b.events; Event v = el.elems[*]; }
    }
    Bldg() {
        List el = new List(); this.events = el;
        List fl = new List(); this.floors = fl;
        for (*) { Event e = new Event(); el.elems[*] = e; }
        for (*) { Floor f = new Floor(); fl.elems[*] = f; }
        for (*) { Elev t = new Elev(fl); t.start(); }
    }
    List() { Obj[] a = new Obj[…]; this.elems = a; }

[Figure: allocation sites labeled h1 to h7 and program point p; abstract heaps over locals b, this, el, fl, e, f, t, v with field edges events, floors, elems, [*]; a row of per-site parameter values for h1 … h7]

113 Writing a Summary-Based Analysis in Chord
Implement representations of path/summary edges:

    class PE, SE implements chord.project.analyses.rhs.IEdge {
        @Override public boolean matchesSrcNodeOf(IEdge edge) { … }
        @Override public boolean mergeWith(IEdge edge) { … }
    }

Create a subclass of chord.project.analyses.rhs.[Forward|Backward]RHSAnalysis:

    @Chord(name = "…")
    public class MyAnalysis extends ForwardRHSAnalysis {
        @Override ICICG getCallGraph() { … }
        @Override Set<…> getInitPathEdges() { … }
        @Override PE getInitPathEdge(Quad q, jq_Method m, PE pe) { … }
        @Override PE getMiscPathEdge(Quad q, PE pe) { … }
        @Override PE getInvkPathEdge(Quad q, PE clr, jq_Method m, SE tgt) { … }
        @Override SE getSummaryEdge(jq_Method m, PE pe) { … }
        @Override public boolean doMerge() { … }
        @Override PE getCopy(PE pe) { … }
    }
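As a rough illustration of what an edge representation might look like, the sketch below implements the two method names shown for IEdge against a local stand-in interface. The interface Edge, the class SetEdge, and the chosen semantics are assumptions, not Chord's definitions: here two edges match when they start from the same source state, and merging joins the target states and reports whether anything changed.

```java
import java.util.*;

// Hypothetical stand-in for chord.project.analyses.rhs.IEdge (assumed semantics).
interface Edge {
    boolean matchesSrcNodeOf(Edge other);
    boolean mergeWith(Edge other);
}

// An edge whose source and target abstract states are sets of facts.
final class SetEdge implements Edge {
    final Set<String> src;   // abstract state at method entry
    final Set<String> dst;   // abstract state at the current program point

    SetEdge(Set<String> src, Set<String> dst) {
        this.src = new TreeSet<>(src);
        this.dst = new TreeSet<>(dst);
    }

    // Two edges are candidates for merging when their source states coincide.
    public boolean matchesSrcNodeOf(Edge other) {
        return other instanceof SetEdge && src.equals(((SetEdge) other).src);
    }

    // Join the other edge's target state into this one; true if this edge changed.
    public boolean mergeWith(Edge other) {
        return dst.addAll(((SetEdge) other).dst);
    }

    public static void main(String[] args) {
        SetEdge e1 = new SetEdge(Collections.singleton("a"), Collections.singleton("x"));
        SetEdge e2 = new SetEdge(Collections.singleton("a"), Collections.singleton("y"));
        if (e1.matchesSrcNodeOf(e2)) System.out.println(e1.mergeWith(e2)); // prints true
        System.out.println(e1.dst);                                        // prints [x, y]
    }
}
```

Returning whether the merge changed the edge lets a worklist-style solver decide if the edge needs to be processed again; whether merging happens at all would be governed by something like the doMerge() switch.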

114 Insight 2: Leveraging Dynamic Analysis
Challenge: Efficiently find cheap parameter to prove query
– 2^H choices, most choices imprecise or unscalable
Our solution: Use dynamic analysis
– parameter is inferred efficiently (linear in H)
– it can fail to prove query, but it is precise in practice
– and no cheaper parameter can prove query
[Figure: dynamic analysis takes program P, query Q and inputs I1 … In and produces a parameter over the H sites; static analysis with abstraction A then takes P and Q and answers P ⊢ Q?]

115 Example: Leveraging Dynamic Analysis
Query: local(p, v)?

    static void main(String[] a) {
        Bldg b = new Bldg();
        for (int i = 0; i < K; i++) { List el = b.events; Event v = el.elems[i]; }
    }
    Bldg() {
        List el = new List(); this.events = el;
        List fl = new List(); this.floors = fl;
        for (int i = 0; i < K; i++) { Event e = new Event(); el.elems[i] = e; }
        for (int i = 0; i < M; i++) { Floor f = new Floor(); fl.elems[i] = f; }
        for (int i = 0; i < N; i++) { Elev t = new Elev(fl); t.start(); }
    }
    List() { Obj[] a = new Obj[…]; this.elems = a; }

[Figure: allocation sites labeled h1 to h7 and program point p; concrete heap with objects Bldg, List, List, Obj[], Obj[], Event, Floor, Elev, Elev; fields events, floors, elems; array indices 0, 1; local v; a row of per-site parameter values for h1 … h7]

116 Dynamic Analysis Implementation Space for Java
Chord supports instrumenting bytecode at load-time and offline
Approaches compared: implement inside a JVM; use JVMTI; instrument bytecode at load-time; instrument bytecode offline
Criteria compared: portability, efficiency, flexibility, other issues
Drawbacks listed:
– Portability: dependency on specific version of specific JVM; not supported by some JVMs (e.g. Android)
– Flexibility: no support for what is doable by bytecode instrumentation; can change only method bytecode after class loaded
– Other issues: not trivial to modify production JVM; event handling code must be written in C/C++; must run program twice to find which classes to instrument; bytecode verifier may fail at runtime

117 Writing A Dynamic Analysis in Chord

    import chord.project.analyses.DynamicAnalysis;

    @Chord(name = "…")
    public class MyDynamicAnalysis extends DynamicAnalysis {
        @Override public InstrScheme getInstrScheme() {
            InstrScheme s = new InstrScheme();
            s.set…(…);
            ...
            s.set…(…);
            return s;
        }
        @Override public void initAllPasses() { … }
        @Override public void doneAllPasses() { … }
        @Override public void initPass() { … }
        @Override public void donePass() { … }
        @Override public void process…(…) { … }
        ...
        @Override public void process…(…) { … }
    }

118 Predefined Instrumentation Events
– EnterMainMethod(t)
– EnterMethod(m, t), LeaveMethod(m, t)
– EnterLoop(b, t), LoopIteration(b, t), LeaveLoop(b, t)
– BasicBlock(b, t)
– Quad(p, t)
– [Bef|Aft]MethodCall(i, t, o)
– [Bef|Aft]New(h, t, o), NewArray(h, t, o)
– [Get|Put]staticPrimitive(e, t, b, f), [Get|Put]staticReference(e, t, b, f, o)
– [Get|Put]fieldPrimitive(e, t, b, f), [Get|Put]fieldReference(e, t, b, f, o)
– [Get|Put]aloadPrimitive(e, t, b, i), [Get|Put]aloadReference(e, t, b, i, o)
– [Get|Put]astorePrimitive(e, t, b, i), [Get|Put]astoreReference(e, t, b, i, o)
– Thread[Start|Join](i, t, o)
– [Acquire|Release]Lock([l|r], t, o)
– Wait|NotifyAny|NotifyAll(i, t, o)
Dynamic IDs: t=thread ID, o=object ID (0 denotes null)
Static IDs: m:M, b:B, p:P, i:I, h:H, e:E, f:F, l:L, r:R
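To show how a handler might consume the dynamic IDs carried by such events, here is a self-contained Java sketch that flags objects accessed by more than one thread. It does not use Chord's classes: SharedObjectTracker and onFieldAccess are hypothetical names, and "accessed by two threads" is a deliberate simplification of thread-sharing. Only the thread ID t and the base object ID b of a field-access event are used, with 0 treated as null as stated above.

```java
import java.util.*;

class SharedObjectTracker {
    private final Map<Integer, Integer> firstThread = new HashMap<>(); // object ID -> first accessing thread
    private final Set<Integer> shared = new TreeSet<>();               // objects seen from two or more threads

    // Hypothetical handler, called once per field-access event
    // with thread ID t and base object ID b.
    void onFieldAccess(int t, int b) {
        if (b == 0) return;                                // object ID 0 denotes null
        Integer owner = firstThread.putIfAbsent(b, t);     // null if this is the first access to b
        if (owner != null && owner != t) shared.add(b);
    }

    Set<Integer> sharedObjects() { return shared; }

    public static void main(String[] args) {
        SharedObjectTracker tr = new SharedObjectTracker();
        tr.onFieldAccess(1, 10);
        tr.onFieldAccess(2, 10);   // second thread touches object 10
        tr.onFieldAccess(2, 11);   // object 11 is only ever touched by thread 2
        System.out.println(tr.sharedObjects());            // prints [10]
    }
}
```

In a real analysis the handler body would live in the overridden event-processing methods of the analysis class, and the instrumentation scheme would request only the events the handler needs.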

119 Configuring Dynamic Analysis
Bytecode instrumentation kind: chord.instr.kind=[online|offline]
JVMTI to start/end generating events: chord.use.jvmti=[true|false]
Reuse traces from older Chord run: chord.reuse.traces=[true|false]
How to communicate events: chord.trace.kind=[none|pipe|full]
– none: in same JVM as that running instrumented program
  Pro: can inspect state
  Con: either exclude JDK from instrumentation or don't use it in event handling code, to avoid correctness or performance problems
– full: in separate JVM after JVM running instrumented program finishes
  Con: infeasible for long-running programs which generate lots of events, since all events are stored in a (binary) file on disk
– pipe: in separate JVM in parallel with JVM running instrumented program
  Best option: uses buffered POSIX pipe to communicate events between event-generating JVM and event-handling JVM

120 Architecture of Dynamic Analysis in Chord
chord.project.analyses.BasicDynamicAnalysis
    workhorse run() method: configures and runs dynamic analysis
chord.project.analyses.DynamicAnalysis
    provides interface to handle predefined instrumentation events
chord.instr.BasicInstrumentor
    provides interface to instrument various parts of a Java program
chord.instr.Instrumentor
    instruments predefined events
chord.runtime.BasicEventHandler
    starts/stops one-JVM dynamic analysis and maintains object IDs
chord.runtime.TraceEventHandler
    starts/stops two-JVM dynamic analysis
chord.runtime.EventHandler
    writes predefined events to buffer encapsulating trace file

121 Combining Static and Dynamic Analysis
Static followed by Dynamic:
– reduce instrumentation overhead of dynamic
Dynamic followed by Static:
– Counterexamples: query is false on some input
– Likely invariants: a query true on some inputs is likely true on all inputs [Ernst 2001]
– Proofs: a query true on some inputs is likely true on all inputs and for likely the same reason [this talk]
Static and Dynamic interleaved:
– Yogi, concolic testing (EXE, DART, CUTE, SAGE)

122 Benchmark Characteristics

    benchmark   classes   methods    bytecodes   allocation sites   queries
                          (x 1000)   (x 1000)    (x 1000)           (x 1000)
    hedc            309       1.9        151          1.9               0.6
    weblech         532       3.1        230          3.0               0.7
    lusearch        611       3.8        267          3.5               7.2
    hsqldb          771       6.4        472          5.1              14.4
    avrora         1498       5.9        312          5.9              14.4
    sunflow         992       6.6        478          6.1              10.0

124 Precision Comparison
Previous Approach:
– Pointer abstraction: Allocation sites
– Control abstraction: Flow insensitive, Context insensitive
Our Approach:
– Pointer abstraction: 2-partition
– Control abstraction: Flow sensitive, Context sensitive

125 Precision Comparison
Previous Approach:
– Previous scalable approach resolves 27% of queries
Our Approach:
– Our approach resolves 82% of queries
– 55% of queries are proven thread-local
– 27% of queries are observed thread-shared

126 Running Time Breakdown
Columns: baseline static analysis; our approach: dynamic analysis, static analysis total, static analysis per query group (mean, max)

    benchmark   baseline   dynamic   static total   group mean   group max
    hedc        24s        6s        38s            1s           2s
    weblech     39s        8s        1m             2s           4s
    lusearch    43s        31s       8m             3s           6s
    hsqldb      1m08s      35s       86m            11s          21s
    avrora      1m00s      32s       41m            5s           8s
    sunflow     1m18s      3m        74m            9s           19s

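127 Sparsity of Our Abstraction
Columns: total # sites; # sites set to: all queries (mean, max), proven queries (mean, max)

    benchmark   total # sites   all queries      proven queries
                                mean    max      mean    max
    hedc        1,914            3.2     12      1.4      5
    weblech     2,958            2.2      8      1.5      5
    lusearch    3,549            2.2     18      1.5     18
    hsqldb      5,056            2.7     56      1.3      5
    avrora      5,923           12.1    195      2.3     31
    sunflow     6,053            2.2     18      1.3     15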

128 Related Open-Source Projects
– JikesRVM: Java Research Virtual Machine
– Soot + Paddle: Static analysis and transformation framework for Java bytecode
– IBM WALA: Static analysis framework for Java bytecode and related languages
– RoadRunner (Flanagan & Freund): Dynamic analysis framework for Java concurrency

129 Acknowledgments
– Joeq: Static analysis and transformation framework for Java bytecode
– Javassist: Java bytecode manipulation framework
– bddbddb: BDD-based Datalog solver

130 Further Information
– Chord homepage: http://jchord.googlecode.com/
– Chord user guide: http://chord.stanford.edu/user_guide/
– Chord questions: chord-discuss@googlegroups.com

131 Thank You!

