Chord: A Program Analysis Platform for Java

1 Chord: A Program Analysis Platform for Java
CS 6340

2 What is Chord?
Static and dynamic program analysis framework for Java
Started in 2006 as a static checker of races and deadlocks
Publicly available under the New BSD License
Key goals:
  versatile: applies to various analyses, domains, platforms
  extensible: users can build own analyses atop given ones
  productive: facilitates rapid prototyping of analyses
  robust: deterministic, handles partial programs, etc.
Notes:
  1. versatility:
     a) analyses: static, dynamic, imperatively in Java or declaratively in Datalog, summary or cloning, client-driven, iterative refinement, combined static/dynamic
     b) domains: parallel, mobile, cloud, verification, testing, security, performance
     c) platforms: Android & Hadoop; highly portable (no dependence on OS or JVM, does not require Eclipse)
  2. extensibility
  3. productivity: many program analysis templates are offered (e.g. RHS, Datalog, etc.)
  4. robustness: conglomeration of tools that are reasonably efficient but robust, results deterministic across different runs, etc.

3 Key Features of Chord
Many standard static and dynamic analyses
Writing/solving analyses using Datalog/BDDs
Analyses as “building blocks”
Context-sensitive static analysis framework
Dynamic analysis framework

4 Outline of Lecture Getting Started with Chord Program Representation
Analysis Using Datalog/BDDs Chaining Analyses Together Context-Sensitive Analysis

5 Downloading Chord
Stable Binary Release
Stable Source Release
  1. (mandatory) Chord’s source code + JARs of libraries used by Chord
  2. (optional) (adapted) Java source code of libraries used by Chord
Latest Development Snapshot
  svn checkout chord
  Or check out only relevant directories under trunk/:
    main/ (released as 1 above)
    libsrc/ (released as 2 above)
    test/ (Chord’s regression test suite)
    … (many more)

6 Compiling Chord
Requirements:
  JVM for Java 5 or higher
  Apache Ant
  C++ compiler (not needed by default)
Optional: edit chord.properties
  to enable C BuDDy library: set chord.use.buddy=true
  to enable C++ JVMTI agent: set chord.use.jvmti=true
Run in main directory: ant compile
main/
  build.xml
  chord.properties
  agent/ bdd/ doc/ examples/ lib/ src/ web/
  chord.jar
  libbuddy.so | buddy.dll | libbuddy.dylib
  libchord_instr_agent.so

7 Running Chord
Requirements:
  JVM for Java 5 or higher
  no other dependencies (e.g., Eclipse)
Run either command in any directory:
  ant -f <…>/build.xml [-Dkey_i=val_i]* run
    requires Apache Ant
    not available in Binary Release
  java -cp <…>/chord.jar [-Dkey_i=val_i]* chord.project.Boot
where:
  <…> denotes path of Chord’s main/ directory
  -Dkey_i=val_i sets value of system property key_i to val_i

8 Chord Properties
All inputs to Chord are specified via system properties, conventionally named chord.* (e.g., chord.work.dir)
Three ways to set a property, in decreasing order of precedence:
  1. On command line via -Dkey=val format
     use to specify properties specific to the current Chord run
  2. Via user-specified file denoted by chord.props.file
     use to specify properties specific to program being analyzed (e.g. its main class, classpath, etc.)
     default value = "[chord.work.dir]/chord.properties"
  3. Via pre-defined file main/chord.properties
     use to specify properties that must hold in every Chord run (e.g., maximum memory to be used by JVM)
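
The three-level precedence can be illustrated with plain java.util.Properties. This is only a sketch of the lookup order described on the slide, not Chord's actual loading code; the class name PropertyPrecedence, the method resolve, and the property values used are hypothetical.

```java
import java.util.Properties;

// Hypothetical helper (not part of Chord): looks a key up in three sources,
// from highest to lowest precedence.
class PropertyPrecedence {
    static String resolve(String key, Properties commandLine,
                          Properties userFile, Properties defaultFile) {
        String v = commandLine.getProperty(key);   // 1. -Dkey=val on the command line
        if (v != null) return v;
        v = userFile.getProperty(key);             // 2. file named by chord.props.file
        if (v != null) return v;
        return defaultFile.getProperty(key);       // 3. main/chord.properties (null if unset)
    }

    public static void main(String[] args) {
        Properties cmd = new Properties();
        Properties user = new Properties();
        Properties dflt = new Properties();
        // Illustrative values only.
        dflt.setProperty("chord.scope.kind", "rta");
        user.setProperty("chord.scope.kind", "cha");
        user.setProperty("chord.main.class", "foo.Main");
        cmd.setProperty("chord.scope.kind", "dynamic");

        System.out.println(resolve("chord.scope.kind", cmd, user, dflt));
        System.out.println(resolve("chord.main.class", cmd, user, dflt));
    }
}
```

With these values the command-line setting of chord.scope.kind hides the other two, while chord.main.class falls through to the user file.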

9 Architecture of Chord
[Figure: architecture diagram. Inputs: a Java program (program source, program bytecode, program inputs). Components: bytecode translator (joeq) producing program quadcode, bytecode instrumentor (javassist), bddbddb with BuDDy, saxon XSLT, Java2HTML, and the Classic or Modern Runtime. An example program analysis consists of static, Datalog, and dynamic analyses that produce domains D1 and D2 and relations R1, R2, and R12. The user demands one analysis to run; each analysis either starts and runs to finish, or starts, blocks on the results it needs (e.g., on D1; on R2 and D2; on D1, D2, R1, and R12), and then resumes and runs to finish. Analysis results are produced in XML and in HTML.]

10 Setting Up a Java Program for Analysis
example/
  src/
    foo/
      Main.java
  classes/
    foo/
      Main.class
  lib/
    src/
      taz/
    jar/
      taz.jar
  chord.properties
  chord_output/
    bddbddb/
Command to run in Chord’s main directory:
  ant -Dchord.work.dir=<…>/example run
File chord.properties:
  chord.main.class=foo.Main
  chord.class.path=classes:lib/jar/taz.jar
  chord.src.path=src:lib/src
  chord.run.ids=0,1
  chord.args.0="-thread 1 -n 10"
  chord.args.1="-thread 2 -n 50"

11 Outline of Lecture Getting Started with Chord Program Representation
Analysis Using Datalog/BDDs Chaining Analyses Together Context-Sensitive Analysis

12 Java Program Representations
Java source code .java javac Java bytecode .class javap Disassembled Java bytecode

13 Example: Java Source Code
File test/HelloWorld.java:
    package test;

    public class HelloWorld {
        public static void main(String[] args) {
            System.out.print("Hello World!");
        }
    }

14 Pretty-Printing Java Bytecode
javap -private -verbose -classpath <CLASS_PATH> [-bootclasspath <BOOT_CLASS_PATH>] <CLASS_NAME>

public class test.HelloWorld extends java.lang.Object
  SourceFile: "HelloWorld.java"
  Constant pool:
    const #1 = Method #6.#20; // java/lang/Object."<init>":()V
  public static void main(java.lang.String[]);
    Code:
      Stack=2, Locals=1, Args_size=1
      0: getstatic #2; // Field java/lang/System.out:Ljava/io/PrintStream;
      3: ldc #3; // String Hello World!
      5: invokevirtual #4; // Method java/io/PrintStream.println:
      8: return
    LineNumberTable:
      line 5: 0
      line 6: 8
    LocalVariableTable:
      Start Length Slot Name Signature
      args [Ljava/lang/String;

Run "javac -g" on .java files to keep debug info (lines, vars, source) in .class files

15 Java Program Representations
Java source code .java javac Java bytecode .class Joeq Quadcode javap Disassembled Java bytecode

16 Pretty-Printing Quadcode
ant -Dchord.work.dir=<WORK_DIR> -Dchord.out.file=<OUTPUT_FILE> -Dchord.print.classes=<CLASS_NAMES> -Dchord.verbose=0 run

Class: test.HelloWorld
Method: #1 5#3 5#2 8#4
Control flow graph:
BB0 (ENTRY) (in: <none>, out: BB2)
BB2 (in: BB0 (ENTRY), out: BB1 (EXIT))
1: GETSTATIC_A T1, .out
3: MOVE_A T2, AConst: "Hello World!"
2: INVOKEVIRTUAL_V (T1,T2)
4: RETURN_V
BB1 (EXIT) (in: BB2, out: <none>)
Exception handlers: []
Register factory: Registers: 3

Alternative options:
  -Dchord.print.methods=<METHOD_SIGNS>
  -Dchord.print.all.classes=true
Replace any `$` by `#` to prevent shell interpretation

17 Type Hierarchy
jq_Type
  jq_Primitive
  jq_Reference
    jq_Class
    jq_Array
(all defined in package joeq.Class)

18 chord.program.Program API
static Program g()
    the program representation (calling it triggers analysis scope construction)
IndexSet<jq_Type> getTypes()
    all types in classes that may be loaded
IndexSet<jq_Reference> getClasses()
    all classes that may be loaded
IndexSet<jq_Method> getMethods()
    all methods that may be called

19 joeq.Class.jq_Class API
String getName()
    fully-qualified name of the class, e.g., "java.lang.String[]"
jq_InstanceField[] getDeclaredInstanceFields()
    all instance fields declared in the class
jq_StaticField[] getDeclaredStaticFields()
    all static fields declared in the class
jq_InstanceMethod[] getDeclaredInstanceMethods()
    all instance methods declared in the class
jq_StaticMethod[] getDeclaredStaticMethods()
    all static methods declared in the class

20 joeq.Class.jq_Method API
String getName().toString()
    name of the method
String getDesc().toString()
    descriptor of the method, e.g., "(Ljava/lang/String;)V"
jq_Class getDeclaringClass()
    declaring class of the method
ControlFlowGraph getCFG()
    control-flow graph of the method
Quad getQuad(int bci)
    first quad at the given bytecode offset (null if missing)
int getLineNumber(int bci)
    line number of the given bytecode offset (-1 if missing)
String toString()
    ID of the method in

21 Control Flow Graphs (CFGs)
Each CFG contains:
  a set of registers (register factory)
  a directed graph whose nodes are basic blocks and edges denote control flow
Register Factory:
  one register per argument (local variables), named R0, R1, …, Rn
  one register per temporary (stack variables), named T(n+1), T(n+2), …, Tm
Basic Block (BB):
  sequence of primitive statements (quads)
  unique entry BB: no quads and no incoming edges
  unique exit BB: no quads and no outgoing edges

22 joeq.Compiler.Quad.ControlFlowGraph API
RegisterFactory getRegisterFactory()
    set of all local variables
EntryOrExitBasicBlock entry()
    unique entry basic block
EntryOrExitBasicBlock exit()
    unique exit basic block
List<BasicBlock> reversePostOrder()
    list of all basic blocks in reverse post-order
jq_Method getMethod()
    containing method of the CFG

23 joeq.Compiler.Quad.BasicBlock API
int size()
    number of quads in the basic block
Quad getQuad(int index)
    quad at the given 0-based index
List<BasicBlock> getPredecessors()
    list of immediate predecessor basic blocks
List<BasicBlock> getSuccessors()
    list of immediate successor basic blocks
jq_Method getMethod()
    containing method of the basic block

24 Quad Instructions
Each quad contains an operator and up to 4 operands
Example: getfield l = b.f:
    // q is a Getfield quad; extract its destination, base, and field
    Operand lo = Getfield.getDest(q);
    Operand bo = Getfield.getBase(q);
    if (lo instanceof RegisterOperand && bo instanceof RegisterOperand) {
        Register l = ((RegisterOperand) lo).getRegister();
        Register b = ((RegisterOperand) bo).getRegister();
        jq_Field f = Getfield.getField(q).getField();
    }

25 Kinds of Quads joeq.Compiler.Quad.Operator
Move Getstatic Branch Invoke Phi Putstatic IntIfCmp InvokeVirtual Unary Getfield Goto InvokeStatic Binary Putfield Jsr InvokeInterface New ALoad Ret NewArray AStore LookupSwitch MultiNewArray Checkcast TableSwitch Alength Instanceof Monitor Return

26 joeq.Compiler.Quad.Quad API
Operator getOperator()
    kind of the quad
int getBCI()
    bytecode offset of the quad in its containing method
String toByteLocStr()
    unique identifier of the quad in
String toJavaLocStr()
    location of the quad in format fileName:lineNum in Java source code
String toLocStr()
    location of the quad in both Java bytecode and source code
String toVerboseStr()
    verbose description of the quad (its location plus contents)
BasicBlock getBasicBlock()
    containing basic block of the quad

27 Traversing Quadcode
    import chord.program.Program;
    import joeq.Class.jq_Method;
    import joeq.Compiler.Quad.*;

    // Visitor that reacts only to the quad kinds of interest
    QuadVisitor qv = new QuadVisitor.EmptyVisitor() {
        public void visitNew(Quad q) { ... }
        public void visitPhi(Quad q) { ... }
    };
    Program program = Program.g();
    for (jq_Method m : program.getMethods()) {
        if (!m.isAbstract()) {
            ControlFlowGraph cfg = m.getCFG();
            for (BasicBlock bb : cfg.reversePostOrder())
                for (Quad q : bb.getQuads())
                    q.accept(qv);
        }
    }

28 Java Program Representations
Java source code .java HTMLized Java source code .html j2h Java2HTML javac Java bytecode .class Joeq Quadcode javap Disassembled Java bytecode

29 HTMLizing Java Source Code
Programmatically:
    import chord.program.Program;
    Program program = Program.g();
    program.HTMLizeJavaSrcFiles();
From command line:
  Use j2h: ant -Djava.dir=<JAVA_DIR> -Dhtml.dir=<HTML_DIR> j2h_xref
  Use Java2HTML: ant -Djava.dir=<JAVA_DIR> -Dhtml.dir=<HTML_DIR> j2h_fast

30 Java Program Representations
Java source code .java HTMLized Java source code .html j2h Java2HTML javac Java bytecode .class Joeq Quadcode javap Jasmin Chord Disassembled Java bytecode Jasmin code .j

31 Analysis Scope Construction
Determines which parts of the program to analyze
Computed in either of these cases:
  chord.build.scope=true
  chord.program.Program.g() is called
Algorithm specified by chord.scope.kind=[rta|cha|dynamic]
  Rapid Type Analysis (RTA)
  Class Hierarchy Analysis (CHA)
  Dynamic Analysis
All three algorithms require specifying:
  chord.main.class=<MAIN CLASS>
  chord.class.path=<CLASSPATH>

32 Analysis Scope Representation
Reachable Methods
  stored in file specified by chord.methods.file (default = "[chord.out.dir]/methods.txt")
Resolved Reflection
  stored in file specified by chord.reflect.file (default = "[chord.out.dir]/reflect.txt")
  sections of the file and the reflective calls they resolve:
    # resolvedClsForNameSites    Class Class.forName(String)
    # resolvedObjNewInstSites    Object Class.newInstance()
    # resolvedConNewInstSites    Object Constructor.newInstance(Object[])
    # resolvedAryNewInstSites    Object Array.newInstance(Class, int)

33 Rapid Type Analysis (RTA)
Preferred (and default) scope construction algorithm
Allows specifying reflection resolution via chord.reflect.kind=[none|static|dynamic]
Preferred way to resolve reflection is ‘dynamic’; it requires specifying how to run the program:
  chord.run.ids=id1,…,idN
  chord.args.id1=<ARGS1>, …, chord.args.idN=<ARGSN>

34 Dynamic Analysis Based Scope Construction
Runs program and observes which classes are loaded
Requires JVMTI (set chord.use.jvmti=true in file main/chord.properties)
Requires specifying how to run program:
  chord.run.ids=id1,…,idN
  chord.args.id1=<ARGS1>, …, chord.args.idN=<ARGSN>
All methods of each loaded class are deemed reachable
Currently no support for reflection resolution

35 Additional Analysis Scope Features
Scope Reuse
  Enables using scope constructed by a previous run of Chord
  Constructs scope from files specified by chord.methods.file and chord.reflect.file
  Specified via chord.reuse.scope=true
Scope Exclusion
  Enables excluding certain classes from scope
  Treats all methods in such classes as no-ops
  Specified via three properties:
    1. chord.std.scope.exclude (default = "")
    2. chord.ext.scope.exclude (default = "")
    3. chord.scope.exclude (default = "[chord.std.scope.exclude],[chord.ext.scope.exclude]")

36 Native Method Stubs
Specified in file main/src/chord/program/stubs/stubs.txt in format:
    stub_cname
where stub_cname denotes a class implementing:
    public interface joeq.Compiler.Quad.ICFGBuilder {
        public ControlFlowGraph run(jq_Method m);
    }
Example:
    start:()V@java.lang.Thread chord.program.stubs.ThreadStartCFGBuilder

37 Example Native Method Stub
Behavior modeled by the stub:
    void start() { this.run(); return; }
Stub implementation:
    public ControlFlowGraph run(jq_Method m) {
        jq_Class c = m.getDeclaringClass();
        jq_Method n = c.getDeclaredInstanceMethod(new jq_NameAndDesc("run", "()V"));
        // one local register holding `this`
        RegisterFactory f = new RegisterFactory(0, 1);
        Register r = f.getOrCreateLocal(0, c);
        ControlFlowGraph cfg = new ControlFlowGraph(m, 1, 0, f);
        // q1: this.run()
        Quad q1 = Invoke.create(0, m, Invoke.INVOKEVIRTUAL_V.INSTANCE, null, new MethodOperand(n), 1);
        Invoke.setParam(q1, 0, new RegisterOperand(r, c));
        // q2: return
        Quad q2 = Return.create(1, m, RETURN_V.INSTANCE);
        BasicBlock bb = cfg.createBasicBlock(1, 1, 2, null);
        bb.appendQuad(q1);
        bb.appendQuad(q2);
        // link entry -> bb -> exit
        BasicBlock eb = cfg.entry(), xb = cfg.exit();
        eb.addSuccessor(bb);
        bb.addPredecessor(eb);
        bb.addSuccessor(xb);
        xb.addPredecessor(bb);
        return cfg;
    }

38 Outline of Lecture Getting Started with Chord Program Representation
Analysis Using Datalog/BDDs Chaining Analyses Together Context-Sensitive Analysis

39 Program Domain
Building block for analyses based on Datalog/BDDs
Represents an indexed set of values of a fixed kind
  typically artifacts from program being analyzed (e.g., set of all methods in the program)
Assigns unique 0-based index to each value
  everything in Datalog/BDDs must be numbered
  indices given in order in which values are added
  order affects efficiency of running analysis on large sets
  initial indices (0, 1, ...) typically given to frequently-used values (e.g., the main method)
O(1) access to value given index, and vice versa
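
An indexed set with constant-time lookup in both directions can be sketched with a list plus a hash map. This is a minimal illustration of the idea, not Chord's ProgramDom implementation; the class name IndexedDomain is hypothetical, and its method names merely mirror the style of the ProgramDom API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a program domain: values get 0-based indices in
// insertion order, and values are never removed.
class IndexedDomain<T> {
    private final List<T> values = new ArrayList<>();          // index -> value
    private final Map<T, Integer> indices = new HashMap<>();   // value -> index

    /** Adds val if absent; returns true if it was added. */
    boolean add(T val) {
        if (indices.containsKey(val)) return false;
        indices.put(val, values.size());
        values.add(val);
        return true;
    }

    /** Adds val if absent; returns its index in either case. */
    int getOrAdd(T val) {
        add(val);
        return indices.get(val);
    }

    /** Value at the given index; throws IndexOutOfBoundsException if absent. */
    T get(int index) { return values.get(index); }

    /** Index of val, or -1 if absent. */
    int indexOf(T val) {
        Integer i = indices.get(val);
        return i == null ? -1 : i;
    }

    int size() { return values.size(); }

    public static void main(String[] args) {
        IndexedDomain<String> domM = new IndexedDomain<>();
        domM.add("main");            // frequently used value gets index 0
        domM.add("Thread.start");
        domM.add("foo");
        domM.add("main");            // duplicate: ignored
        System.out.println(domM.indexOf("main") + " " + domM.get(1) + " " + domM.size());
    }
}
```

Adding frequently used values first, as the slide suggests, simply means calling add on them before the bulk loop.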

40 Example Predefined Program Domains
Name  Description                Defining Class
T     types                      chord.analyses.type.DomT
M     methods                    chord.analyses.method.DomM
F     fields                     chord.analyses.field.DomF
V     variables of ref type      chord.analyses.var.DomV
P     quads (program points)     chord.analyses.point.DomP
H     object allocation quads    chord.analyses.alloc.DomH
I     method call quads          chord.analyses.invk.DomI
E     heap-accessing quads       chord.analyses.heapacc.DomE
A     abstract threads           chord.analyses.alias.DomA
C     abstract method contexts   chord.analyses.alias.DomC
O     abstract objects           chord.analyses.alias.DomO

41 Writing a Program Domain Analysis
    package chord.analyses.method;

    @Chord(name = "M")
    public class DomM extends ProgramDom<jq_Method> {
        @Override
        public void fill() {
            Program program = Program.g();
            // main method is added first, so it gets index 0
            add(program.getMainMethod());
            // java.lang.Thread.start(), if present, gets index 1
            jq_Method start = program.getThreadStartMethod();
            if (start != null)
                add(start);
            for (jq_Method m : program.getMethods())
                add(m);
        }
    }
Domain M: all methods in the program
  main method has index 0
  java.lang.Thread.start() method has index 1

42 Running a Program Domain Analysis
To run the DomM analysis defined above:
    ant -Dchord.work.dir=<…> -Dchord.run.analyses=M run

43 Running a Program Domain Analysis
Output files:
    chord_output/
      bddbddb/
        M.dom
        M.map
Contents of M.dom: M <N> M.map

44 chord.project.analyses.ProgramDom<T> API
void setName(String name)
    set name of domain
boolean add(T val)
    add value to domain if not present; return true if added
int getOrAdd(T val)
    add value to domain if not present; return its index in either case
void save()
    save domain to disk (.dom and .map files)
String toUniqueString(T val)
    unique string representation of value
int size()
    number of values in domain
T get(int index)
    value having the given index; IndexOutofBoundsEx if not found
int indexOf(T val)
    index of given value; -1 if not found
Note: values once added cannot be removed!

45 Program Relation
Building block for analyses based on Datalog/BDDs
Represents a set of tuples over one or more fixed program domains
Represented symbolically as a BDD
  enables storing and manipulating large relations efficiently
Provides various relational operations
  projection, selection, join, etc.
BDD size and efficiency of operations depend heavily on:
  encoding of relation content, as opposed to size
  ordering of values within program domains
  relative ordering between program domains

46 Writing a Program Relation Analysis
    package chord.analyses.invk;

    @Chord(name = "MI", sign = "M0,I0:M0_I0")
    public class RelMI extends ProgramRel {
        @Override
        public void fill() {
            DomI domI = (DomI) doms[1];
            for (Quad q : domI) {
                jq_Method m = q.getMethod();
                add(m, q);
            }
        }
    }
Relation MI: tuples (m, i) such that method m contains call i
M0,I0: domain names
  Order mnemonically (hard to change over time)
  Suffix 0, 1, etc. distinguishes repeating domains
M0_I0: domain order
  Only dictates performance
  Can also be I0_M0 or I0xM0
  Easy to change over time

47 Writing a Program Relation Analysis
    package chord.analyses.var;

    @Chord(name = "VT", sign = "V0,T0:T0_V0")
    public class RelVT extends ProgramRel {
        @Override
        public void fill() {
            // pseudocode loop header, as on the slide
            for (each RegisterOperand o of each quad) {
                Register v = o.getRegister();
                jq_Type t = o.getType();
                add(v, t);
            }
        }
    }
Relation VT: tuples (v, t) such that local variable v has type t

48 Running a Program Relation Analysis
To run the RelVT analysis defined above:
    ant -Dchord.work.dir=<…> -Dchord.run.analyses=VT run

49 Running a Program Relation Analysis
Output files:
    chord_output/
      bddbddb/
        V.dom, T.dom, V.map, T.map
        VT.bdd
Header of VT.bdd:
    # V0:2 T0:2
    # 1 2
    #

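50 Program Relation as Binary Function
Relation VT = { (0, 1), (0, 2), (0, 3), (1, 3), (2, 3) }
  Variable v0 has types t1, t2, t3
  Variable v1 has type t3
  Variable v2 has type t3
[Table: truth table with columns V (bits b1, b2), T (bits b3, b4), and f, where f is 1 exactly on the rows that encode a tuple of VT]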
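
The encoding of VT as a binary function can be spelled out in a few lines of Java. The sketch below assumes that b1 b2 are the bits of the variable index and b3 b4 the bits of the type index, most significant bit first; the bit significance is an assumption made for illustration, and the class name RelationAsFunction is hypothetical.

```java
// Hypothetical sketch: the relation VT over two 2-bit domains, viewed as a
// boolean function f(b1, b2, b3, b4).
class RelationAsFunction {
    // Tuples (v, t) of the relation VT from the slide.
    static final int[][] VT = { {0, 1}, {0, 2}, {0, 3}, {1, 3}, {2, 3} };

    /** Returns true iff the tuple encoded by the four bits is in VT. */
    static boolean f(int b1, int b2, int b3, int b4) {
        int v = (b1 << 1) | b2;   // assumed encoding: b1 is the high bit of v
        int t = (b3 << 1) | b4;   // assumed encoding: b3 is the high bit of t
        for (int[] tuple : VT)
            if (tuple[0] == v && tuple[1] == t) return true;
        return false;
    }

    public static void main(String[] args) {
        // Print the full truth table: 16 rows, one per bit assignment.
        System.out.println("b1 b2 b3 b4 f");
        for (int bits = 0; bits < 16; bits++) {
            int b1 = (bits >> 3) & 1, b2 = (bits >> 2) & 1;
            int b3 = (bits >> 1) & 1, b4 = bits & 1;
            System.out.println(b1 + "  " + b2 + "  " + b3 + "  " + b4 + "  "
                    + (f(b1, b2, b3, b4) ? 1 : 0));
        }
    }
}
```

A BDD is a compact graph representation of exactly this truth table.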

51 BDD: Binary Decision Diagrams (Bryant 1986)
Graphical Encoding of a Binary Function
[Figure: full binary decision tree over the BDD variables, with 0 edges and 1 edges leading to the terminal nodes]

52 BDD: Collapsing Redundant Nodes
[Figure: the full decision tree, before any nodes are collapsed]

53 BDD: Collapsing Redundant Nodes
[Figure: the terminal nodes are merged into a single 1 node]

54 BDD: Collapsing Redundant Nodes
[Figure: identical b4 nodes are merged, reducing them from 8 to 3]

55 BDD: Collapsing Redundant Nodes
[Figure: identical b3 nodes are merged, reducing them from 4 to 3]

56 BDD: Eliminating Unnecessary Nodes
[Figure: the diagram after collapsing, before unnecessary nodes are eliminated]

57 BDD: Eliminating Unnecessary Nodes
[Figure: nodes whose test is unnecessary are eliminated, leaving two b2 nodes, two b3 nodes, and one b4 node]

58 BDD Representation on Disk
chord_output/
  bddbddb/
    V.dom, T.dom, V.map, T.map
    VT.bdd
File VT.bdd:
    # V0:2 T0:2
    # b1 b2
    # b3 b4
  followed by the number of internal nodes and of BDD variables, the BDD variable order, and one entry per internal node of form: <nodeId, varId, loNodeId, hiNodeId>
[Figure: the BDD for relation VT with each internal node labeled by its node ID]

59 BDD Variable Order is Important
Function: b1b2 + b3b4
[Figure: two BDDs for this function, one under variable order b1 < b2 < b3 < b4 and one under variable order b1 < b3 < b2 < b4]
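
The effect of the variable order can be checked by building the reduced BDD of b1b2 + b3b4 under both orders and counting internal nodes. The sketch below uses exhaustive Shannon expansion with a unique table, which is fine for four variables but is not how BuDDy or bddbddb work internally; the class name BddSize and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: counts internal nodes of the reduced ordered BDD of
// (b1 AND b2) OR (b3 AND b4) under a given variable order.
class BddSize {
    /** The function being encoded; b[i] is the value of variable b(i+1). */
    static boolean fn(boolean[] b) {
        return (b[0] && b[1]) || (b[2] && b[3]);
    }

    private final int[] order;  // order[k] = 0-based variable tested at level k
    private final Map<String, Integer> unique = new HashMap<>(); // (var,lo,hi) -> node id

    BddSize(int[] order) { this.order = order; }

    /** Returns the node id for fn under the partial assignment; 0 and 1 are terminals. */
    private int build(int level, boolean[] assign) {
        if (level == order.length) return fn(assign) ? 1 : 0;
        int var = order[level];
        assign[var] = false;
        int lo = build(level + 1, assign);
        assign[var] = true;
        int hi = build(level + 1, assign);
        if (lo == hi) return lo;                 // eliminate unnecessary test
        String key = var + ":" + lo + ":" + hi;
        Integer id = unique.get(key);
        if (id == null) {                        // collapse identical nodes
            id = unique.size() + 2;
            unique.put(key, id);
        }
        return id;
    }

    /** Number of internal (non-terminal) nodes under the given order. */
    static int internalNodes(int[] order) {
        BddSize b = new BddSize(order);
        b.build(0, new boolean[order.length]);
        return b.unique.size();
    }

    public static void main(String[] args) {
        System.out.println("b1 < b2 < b3 < b4: " + internalNodes(new int[] {0, 1, 2, 3}));
        System.out.println("b1 < b3 < b2 < b4: " + internalNodes(new int[] {0, 2, 1, 3}));
    }
}
```

For this function the first order yields 4 internal nodes and the second yields 6; the gap grows quickly as more product terms are added.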

60 chord.project.analyses.ProgramRel<T> API
void setName(String name)
    set name of relation
void setSign(RelSign sign)
    set signature (domain names and order) of relation
void setDoms(Dom[] doms)
    set domains of relation
void zero() or one()
    initialize contents of relation to zero (no tuples) or one (all tuples)
void add(T1 e1, …, TN eN)
    add tuple (e1, …, eN) to relation
void remove(T1 e1, …, TN eN)
    remove tuple (e1, …, eN) from relation
void save()
    save contents of relation to disk

61 chord.project.analyses.ProgramRel<T> API
void load()
    load contents of relation from disk
Iterable<T1,…,TN> getAryNValTuples()
    iterate over all tuples in the relation
int size()
    number of tuples in the relation
boolean contains(T1 e1, …, TN eN)
    does relation contain tuple (e1, …, eN)?
RelView getView()
    obtain a copy of the relation upon which to do projection, selection, etc. without affecting original relation
void close()
    free memory used to hold relation

62 Example: Pointer Analysis
    class Bldg {
        List events, floors;
        static void main(String[] a) {
            Bldg b = new Bldg();
        }
        Bldg() {
            List el = new List();
            this.events = el;
            List fl = new List();
            this.floors = fl;
            for (int i = 0; i < K; i++) {
                Event e = new Event();
                el.elems[i] = e;
            }
            for (int i = 0; i < M; i++) {
                Floor f = new Floor();
                fl.elems[i] = f;
            }
        }
    }
    class List {
        Obj[] elems;
        List() {
            Obj[] a = new Obj[…];
            this.elems = a;
        }
    }
[Figure: heap graph. b points to a Bldg object whose events and floors fields point to two List objects, also pointed to by el and fl; the elems field of each List points to its own Obj[] array, whose elements point to Event objects and Floor objects respectively.]
Query: disjoint-reach(el, fl)?

63 Example: Call Graph (Base Case)
    class Bldg {
        List events, floors;
        static void main(String[] a) {
            Bldg b = new Bldg();
        }
        Bldg() {
            List el = new List();
            this.events = el;
            List fl = new List();
            this.floors = fl;
            // for (int i = 0; i < K; i++)
            Event e = new Event();
            el.elems[*] = e;
            // for (int i = 0; i < M; i++)
            Floor f = new Floor();
            fl.elems[*] = f;
        }
    }
    class List {
        Obj[] elems;
        List() {
            Obj[] a = new Obj[…];
            this.elems = a;
        }
    }
Code deemed reachable so far …
Base case: reachableM(0).

64 Example: Heap Abstraction
Each allocation site is given a number (new1, …, new6):
    class Bldg {
        List events, floors;
        static void main(String[] a) {
            Bldg b = new1 Bldg();
        }
        Bldg() {
            List el = new2 List();
            this.events = el;
            List fl = new3 List();
            this.floors = fl;
            // for (int i = 0; i < K; i++)
            Event e = new4 Event();
            el.elems[*] = e;
            // for (int i = 0; i < M; i++)
            Floor f = new5 Floor();
            fl.elems[*] = f;
        }
    }
    class List {
        Obj[] elems;
        List() {
            Obj[] a = new6 Obj[…];
            this.elems = a;
        }
    }

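65 Rule for Object Allocation Sites
Statement: v = new_h …
Rule: VH(v, h) :- reachableM(m), MobjValAsgnInst(m, v, h).
[Figure: points-to graph before and after the statement. Before: v points to new_h’. After: v points to new_h’ and new_h.]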

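66 Rule for Copy Assignments
Statement: v1 = v2
Rule: VH(v1, h) :- reachableM(m), MobjVarAsgnInst(m, v1, v2), VH(v2, h).
[Figure: points-to graph before and after the statement. Before: v1 points to new_h’ and v2 points to new_h. After: v1 points to new_h’ and new_h; v2 points to new_h.]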

67 Rule for Heap Writes
Statement: b.f = v, where f is an instance field or [*] (array element)
Rule: HFH(h1, f, h2) :- reachableM(m), MputInstFldInst(m, b, f, v), VH(b, h1), VH(v, h2).
[Figure: points-to graph before and after the statement. Before: b points to new_h1 and v points to new_h2. After: in addition, field f of new_h1 points to new_h2.]

68 Rule for Heap Reads
Statement: v = b.f, where f is an instance field or [*] (array element)
Rule: VH(v, h2) :- reachableM(m), MgetInstFldInst(m, v, b, f), VH(b, h1), HFH(h1, f, h2).
[Figure: points-to graph before and after the statement. Before: v points to new_h, b points to new_h1, and field f of new_h1 points to new_h2. After: v points to new_h and new_h2.]
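
The four rules for allocation sites, copy assignments, heap writes, and heap reads can be evaluated by naive fixpoint iteration. The following sketch does this in plain Java over explicit sets rather than BDDs, and it drops the reachableM premise by assuming every statement is reachable. The class name PointsToFixpoint and the five-statement input program are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: naive fixpoint evaluation of the VH and HFH rules.
// Input program (all statements assumed reachable):
//   a = new h1;  b = new h2;  a.f = b;  c = a;  d = c.f;
class PointsToFixpoint {
    static final String[][] ALLOC = { {"a", "h1"}, {"b", "h2"} }; // (v, h):       v = new h
    static final String[][] COPY  = { {"c", "a"} };               // (v1, v2):     v1 = v2
    static final String[][] PUT   = { {"a", "f", "b"} };          // (base, f, v): base.f = v
    static final String[][] GET   = { {"d", "c", "f"} };          // (v, base, f): v = base.f

    final Set<List<String>> vh = new HashSet<>();   // tuples (v, h)
    final Set<List<String>> hfh = new HashSet<>();  // tuples (h1, f, h2)

    void solve() {
        boolean changed = true;
        while (changed) {                 // repeat until no rule adds a tuple
            changed = false;
            // VH(v, h) :- alloc(v, h).
            for (String[] s : ALLOC)
                changed |= vh.add(Arrays.asList(s[0], s[1]));
            // VH(v1, h) :- copy(v1, v2), VH(v2, h).
            for (String[] s : COPY)
                for (List<String> t : new ArrayList<>(vh))
                    if (t.get(0).equals(s[1]))
                        changed |= vh.add(Arrays.asList(s[0], t.get(1)));
            // HFH(h1, f, h2) :- put(b, f, v), VH(b, h1), VH(v, h2).
            for (String[] s : PUT)
                for (List<String> tb : vh)
                    for (List<String> tv : vh)
                        if (tb.get(0).equals(s[0]) && tv.get(0).equals(s[2]))
                            changed |= hfh.add(Arrays.asList(tb.get(1), s[1], tv.get(1)));
            // VH(v, h2) :- get(v, b, f), VH(b, h1), HFH(h1, f, h2).
            for (String[] s : GET)
                for (List<String> tb : new ArrayList<>(vh))
                    for (List<String> e : hfh)
                        if (tb.get(0).equals(s[1]) && e.get(0).equals(tb.get(1))
                                && e.get(1).equals(s[2]))
                            changed |= vh.add(Arrays.asList(s[0], e.get(2)));
        }
    }

    public static void main(String[] args) {
        PointsToFixpoint p = new PointsToFixpoint();
        p.solve();
        System.out.println("VH = " + p.vh);
        System.out.println("HFH = " + p.hfh);
    }
}
```

On this input the fixpoint contains VH tuples for a, b, c, and d and a single HFH tuple (h1, f, h2). A Datalog solver computes the same least fixpoint but represents the relations symbolically.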

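69 Rule for Dynamically Dispatching Calls
Statement: call site i of the form v.foo() inside method Tn.bar(), where v points to new_h of type T and CHA(T, foo) = Tm.foo()
Rules:
    IM(i, m) :- reachableM(n), MI(n, i), virtIM(i, m’), IinvkArg0(i, v), VH(v, h), HT(h, t), CHA(t, m’, m).
    reachableM(m) :- IM(_, m).
[Figure: call graph before and after. Before: Tn.bar() { …; v.foo(); …; } is reachable and v points to new_h of type T. After: call site i has an edge to Tm.foo() { … }, which becomes reachable.]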

70 Writing a Datalog Analysis
    #name=cipa-0cfa-dlog

    .include "V.dom"
    .include "T.dom"
    ...

    .bddvarorder M0xI0_F0_V0xV1_T0_H0xH1

    VT(v:V0, t:T0) input
    reachableM(m:M0)
    FH(f:F0, h:H0) output
    VH(v:V0, h:H0) output
    HFH(h1:H0, f:F0, h2:H1) output
    IM(i:I0, m:M0) output

    reachableM(m) :- IM(_, m).
    ...
Parts of the file, in order:
  program domains
  BDD variable order
  input, intermediate, output program relations (represented as BDDs)
  analysis constraints (Horn clauses), solved via BDD operations

71 Running a Datalog Analysis
To run the cipa-0cfa-dlog analysis defined above:
    ant -Dchord.work.dir=<…> -Dchord.run.analyses=cipa-0cfa-dlog run
Files involved:
    chord_output/
      bddbddb/
        V.dom, T.dom, V.map, T.map
        VT.bdd
        reachableM.bdd
        FH.bdd
        VH.bdd
        HFH.bdd
        IM.bdd

72 Example
Code: the Bldg and List classes with allocation sites new1, …, new6, as on the Heap Abstraction slide.
[Figure: computed points-to graph. b points to new1 Bldg, whose events and floors fields point to new2 List and new3 List, also pointed to by el and fl. The elems fields of both lists point to the same abstract object new6 Obj[], also pointed to by a, whose [*] elements point to new4 Event and new5 Floor, pointed to by e and f.]

73 Printing Program Relations (Command Line)
ant -Dwork.dir=<…>/chord_output/bddbddb -Ddlog.file=a.dlog solve
File a.dlog:
    .include "V.dom"
    .include "H.dom"
    .include "F.dom"
    .bddvarorder ...
    VH(v:V0, h:H0) input
    HFH(h1:H0, f:F0, h2:H1) input
    rVH(v:V0, h:H0)
    rVV(v1:V0, v2:V1) printtuples
    rVH(v, h) :- VH(v, h).
    rVH(v, h) :- rVH(v, h’), HFH(h’, _, h).
    rVV(v1, v2) :- v1<v2, rVH(v1, h), rVH(v2, h).
Output:
    Relation rVV: ...
Query being answered: disjoint-reach(el, fl)?
[Figure: the points-to graph of the example, in which new2 List and new3 List both reach new6 Obj[].]

74 Querying Program Relations (Command Line)
ant -Dwork.dir=<…>/chord_output/bddbddb -Ddlog.file=q.dlog debug
File q.dlog:
    .include "V.dom"
    .include "H.dom"
    .include "F.dom"
    .bddvarorder ...
    VH(v:V0, h:H0) input
    HFH(h1:H0, f:F0, h2:H1) input
Queries:
    prompt> VH(0,h)?
    prompt> HFH(1,_,h)?
File V.map: ...
File H.map: null ...
[Figure: the points-to graph of the example, used to interpret the query results.]

75 Pros and Cons of Datalog/BDDs
Good for rapidly crafting initial versions of analysis, with focus on false positive/negative rate instead of scalability
Good for analyses …
  whose constraint solving strategy is not obvious (e.g. best known alternative is chaotic iteration)
  on data with lots of redundancy and too large to compute/store/read using Java if represented explicitly (e.g. cloning-based analyses)
  involving few simple rules (e.g. transitive closure)
Bad for analyses …
  with more complicated formulations (e.g. summary-based analyses)
  over domains not known exactly in advance (i.e. on-the-fly analyses)
  involving many interdependent rules (e.g. points-to analyses)
Unintuitive effects of BDDs on performance (e.g. k-CFA: small non-uniform k across sites worse than large uniform k)

76 Outline of Lecture Getting Started with Chord Program Representation
Analysis Using Datalog/BDDs Chaining Analyses Together Context-Sensitive Analysis

77 Writing an Analysis in Chord
Declaratively in Datalog or imperatively in Java
Datalog analysis is any file that:
  has extension .dlog or .datalog
  occurs in path specified by property chord.dlog.analysis.path
Java analysis is any class that:
  is annotated with @Chord
  occurs in path specified by property chord.java.analysis.path

78 Writing a Java Analysis
Create subclass of chord.project.analyses.JavaAnalysis:
    @Chord(name = "my-java",                       // mandatory field
           consumes = { "C1", ..., "Cm" },
           produces = { "P1", ..., "Pn" },
           namesOfTypes = { "T1", ..., "Tk" },     // target types not inferable otherwise
           types = { T1.class, ..., Tk.class },
           namesOfSigns = { "S1", ..., "Sr" },     // relation signs not inferable otherwise
           signs = { "...", ..., "..." })
    public class MyAnalysis extends JavaAnalysis {
        @Override
        public void run() { ... }
    }
Compile above class to a location in path specified by any of:
    Property name                  Default value
    chord.std.java.analysis.path   "chord.jar"
    chord.ext.java.analysis.path   ""
    chord.java.analysis.path       concat. of above two property values

79 Chord Project
Global entity for organizing all analyses and their inputs and outputs (collectively called analysis results)
Computed if chord.project.Project.g() is called
Consists of set of each of:
  analyses, called tasks
  analysis results, called targets
  data/control dependencies between tasks and targets
Either of two kinds, chosen by chord.classic=[true|false]:
  chord.project.ClassicProject (this tutorial)
    only data dependencies, can only run tasks sequentially
  chord.project.ModernProject (ongoing)
    data and control dependencies, can run tasks in parallel

80 Computing a Chord Project
Compute all tasks:
  Each file with extension .dlog/.datalog in chord.dlog.analysis.path
  Each class having a @Chord annotation in chord.java.analysis.path
Compute all targets:
  Each target consumed or produced by some task
Compute dependency graph:
  Nodes are all tasks and targets
  Edge from target C to task T if T consumes C
  Edge from task T to target P if T produces P
Perform consistency checks:
  Error if target has no type or has multiple types, error if relation has no sign, warn if target produced by multiple tasks, etc.

81 Example: Chord Project
Each task has form { C1, …, Cm } T { P1, …, Pn } where:
  T is name of task
  C1, …, Cm are names of targets consumed by the task
  P1, …, Pn are names of targets produced by the task
Tasks:
  {} T1 { R1 }
  {} T2 { R1 }
  { R4 } T3 { R2 }
  { R1, R2 } T4 { R3, R4 }
[Figure: dependency graph over tasks T1, T2, T3, T4 and targets R1, R2, R3, R4]

82 Running a Java Analysis
ant -Dchord.work.dir=<…> -Dchord.run.analyses=my-java run
    @Chord(name = "my-java",
           consumes = { "C1", ..., "Cm" },
           produces = { "P1", ..., "Pn" })
    public class MyAnalysis extends JavaAnalysis {
        @Override
        public void run() { ... }
    }
If done bit of this analysis is 1: do nothing
Else do the following in order:
  1. For each of C1, …, Cm whose done bit is 0:
       Recursively run unique analysis producing it
       Report runtime error if none or multiple such analyses exist
  2. Execute run() method of this analysis
  3. Set done bits of this analysis and P1, …, Pn to 1
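
The done-bit procedure above can be sketched as a small recursive scheduler. This is an illustration of the algorithm as stated on the slide, not the ClassicProject implementation; the class DemandDrivenRunner, its methods, and the three-task project in main are hypothetical, and the sketch does not detect cyclic dependencies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of demand-driven task execution with done bits.
class DemandDrivenRunner {
    static final class Task {
        final String name;
        final List<String> consumes;
        final List<String> produces;
        Task(String name, List<String> consumes, List<String> produces) {
            this.name = name;
            this.consumes = consumes;
            this.produces = produces;
        }
    }

    private final Map<String, Task> tasks = new LinkedHashMap<>();
    private final Set<String> doneTasks = new HashSet<>();    // tasks whose done bit is 1
    private final Set<String> doneTargets = new HashSet<>();  // targets whose done bit is 1
    final List<String> executed = new ArrayList<>();          // order in which tasks ran

    void addTask(String name, List<String> consumes, List<String> produces) {
        tasks.put(name, new Task(name, consumes, produces));
    }

    void runTask(String name) {
        Task t = tasks.get(name);
        if (t == null) throw new IllegalArgumentException("unknown task: " + name);
        if (doneTasks.contains(name)) return;          // done bit is 1: do nothing
        for (String c : t.consumes) {
            if (doneTargets.contains(c)) continue;     // target already computed
            Task producer = null;
            int count = 0;
            for (Task p : tasks.values())
                if (p.produces.contains(c)) { producer = p; count++; }
            if (count != 1)
                throw new IllegalStateException(
                        count + " tasks produce target " + c + "; expected exactly 1");
            runTask(producer.name);                    // recursively run unique producer
        }
        executed.add(name);                            // stands in for the task's run()
        doneTasks.add(name);
        doneTargets.addAll(t.produces);
    }

    public static void main(String[] args) {
        DemandDrivenRunner r = new DemandDrivenRunner();
        // Illustrative project: {} A {R1}, {R1} B {R2}, {R1, R2} C {R3}
        r.addTask("A", Collections.<String>emptyList(), Arrays.asList("R1"));
        r.addTask("B", Arrays.asList("R1"), Arrays.asList("R2"));
        r.addTask("C", Arrays.asList("R1", "R2"), Arrays.asList("R3"));
        r.runTask("C");
        System.out.println(r.executed);
    }
}
```

Requesting only C runs A, then B, then C; requesting C a second time does nothing because its done bit is already set.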

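83 Running a Java Analysis
For the example project with tasks:
  {} T1 { R1 }
  {} T2 { R1 }
  { R4 } T3 { R2 }
  { R1, R2 } T4 { R3, R4 }
run tasks T1 and T4:
    ant -Dchord.work.dir=<…> -Dchord.run.analyses=T1,T4 run
[Figure: dependency graph over tasks T1, T2, T3, T4 and targets R1, R2, R3, R4]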

84 Predefined Analysis Templates
Organized in a hierarchy in package chord.project.analyses:
  JavaAnalysis, ProgramDom, ProgramRel, DlogAnalysis, RHSAnalysis, ForwardRHSAnalysis, BackwardRHSAnalysis, BasicDynamicAnalysis, DynamicAnalysis
Users can override all these analysis templates except DlogAnalysis

85 chord.project.ClassicProject API
ITask getTask(String name)
    representation of named task
Object getTrgt(String name)
    representation of named target
ITask runTask(String name)
    run named task (and any needed tasks prior to it)
boolean is[Task|Trgt]Done(String name)
    is named task/target already executed/computed?
void set[Task|Trgt]Done(String name)
    set ‘done’ bit of named task/target to 1
void reset[Task|Trgt]Done(String name)
    set ‘done’ bit of named task/target to 0

86 Example Java Analysis
    package chord.analyses.alias;

    @Chord(name = "cicg-java", consumes = { "IM" })
    public class CICGAnalysis extends JavaAnalysis {
        private ProgramRel cg;

        @Override
        public void run() {
            // Fetch the consumed target "IM"
            cg = (ProgramRel) ClassicProject.g().getTrgt("IM");
        }

        public Set<jq_Method> getCallees(Quad q) {
            // Load the relation on first use
            if (!cg.isOpen())
                cg.load();
            RelView view = cg.getView();
            // Restrict the view to tuples whose column 0 is q
            view.selectAndDelete(0, q);
            Iterable<jq_Method> res = view.getAry1ValTuples();
            Set<jq_Method> callees = new HashSet<jq_Method>();
            for (jq_Method m : res)
                callees.add(m);
            view.free();
            return callees;
        }

        public void free() {
            if (cg.isOpen())
                cg.close();
        }
    }
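The lookup in getCallees can be pictured with ordinary collections. The sketch below assumes that selectAndDelete(0, q) keeps the tuples whose first column equals q and then drops that column; that reading, the class CalleeLookup, and the sample tuples are assumptions for illustration and do not use Chord's RelView.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a binary relation lookup; not Chord's RelView.
public class CalleeLookup {
    /** Keep tuples whose first column equals key, then project onto the second column. */
    static <A, B> Set<B> selectAndDeleteFirst(List<Map.Entry<A, B>> relation, A key) {
        Set<B> out = new LinkedHashSet<>();
        for (Map.Entry<A, B> tuple : relation)
            if (tuple.getKey().equals(key)) out.add(tuple.getValue());
        return out;
    }

    /** Made-up (call site, method) tuples standing in for the contents of IM. */
    static List<Map.Entry<String, String>> exampleIM() {
        List<Map.Entry<String, String>> im = new ArrayList<>();
        im.add(new SimpleEntry<String, String>("i1", "foo"));
        im.add(new SimpleEntry<String, String>("i1", "bar"));
        im.add(new SimpleEntry<String, String>("i2", "baz"));
        return im;
    }

    public static void main(String[] args) {
        System.out.println(selectAndDeleteFirst(exampleIM(), "i1"));
    }
}
```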

87 Example Java Analysis
    @Chord(name = "my-java")
    public class MyAnalysis extends JavaAnalysis {
        @Override
        public void run() {
            ClassicProject p = ClassicProject.g();
            CICGAnalysis a = (CICGAnalysis) p.getTask("cicg-java");
            p.runTask(a);
            for (Quad q : ...) {
                Set<jq_Method> tgts = a.getCallees(q);
            }
            a.free();
        }
    }

88 Specialized Java Analyses
ProgramDom:
  Consumes the targets specified in its @Chord annotation
  Produces only a single target (the defined program domain itself)
  run() method computes and saves domain to disk
ProgramRel:
  Consumes the targets specified in its @Chord annotation, plus the target of each of its program domains
  Produces only a single target (the defined program relation itself)
  run() method computes and saves relation to disk
DlogAnalysis:
  Consumes only its declared domains and declared input relations
  Produces only its declared output relations
  run() method runs bddbddb

89 Analyses as Building Blocks
Modularity:
  each analysis is written independently
Flexibility:
  analyses can interact in powerful ways with other analyses (by user-specified data/control dependencies)
Efficiency:
  analyses executed in demand-driven fashion
  results computed by each analysis automatically cached for reuse by other analyses without re-computation
  independent analyses automatically executed in parallel
Reliability:
  result is independent of order in which analyses are run

90 Outline of Lecture
Getting Started with Chord
Program Representation
Analysis Using Datalog/BDDs
Chaining Analyses Together
Context-Sensitive Analysis

91 Context-Sensitive Analysis
Respects inter-procedural control-flow to varying degrees
Broadly two kinds:
  Bottom-Up: analyze method without any knowledge of its callers
  Top-Down: analyze method only in called contexts
Two kinds of top-down approaches:
  Cloning-based (k-limited)
  Summary-based
Fully context-sensitive approaches:
  Bottom-up
  Top-down summary-based

92 Context-Sensitive Analysis in Chord
Top-down: both cloning-based and summary-based
Cloning-based analysis:
  k-CFA, k-object-sensitivity, hybrid
Summary-based analysis:
  Tabulation algorithm from Reps, Horwitz, Sagiv (POPL’95)

93 Example: Context-Insensitive Analysis
    class Bldg {
        List events, floors;
        static void main(String[] a) {
            Bldg b = new1 Bldg();
        }
        Bldg() {
            List el = new2 List();
            this.events = el;
            List fl = new3 List();
            this.floors = fl;
            Event e = new4 Event();
            el.elems[*] = e;
            Floor f = new5 Floor();
            fl.elems[*] = f;
        }
    }

    class List {
        Obj[] elems;
        List() {
            Obj[] a = new6 Obj[…];
            this.elems = a;
        }
    }

Query: disjoint-reach(el, fl)?
[Figure: points-to graph over variables b, el, fl, a, e, f and allocation sites new1 Bldg, new2 List, new3 List, new4 Event, new5 Floor, and a single new6 Obj[]; edges labeled events, floors, elems, [*]; context labels 1 and "2, 3". The slide also shows the loop headers for (int i = 0; i < K; i++) and for (int i = 0; i < M; i++).]

94 Example: Cloning-Based Analysis
    class Bldg {
        List events, floors;
        static void main(String[] a) {
            Bldg b = new1 Bldg();
        }
        Bldg() {
            List el = new2 List();
            this.events = el;
            List fl = new3 List();
            this.floors = fl;
            Event e = new4 Event();
            el.elems[*] = e;
            Floor f = new5 Floor();
            fl.elems[*] = f;
        }
    }

    class List {
        Obj[] elems;
        List() {
            Obj[] a = new6 Obj[…];
            this.elems = a;
        }
    }

Query: disjoint-reach(el, fl)?
[Figure: points-to graph over variables b, el, fl, a, e, f and allocation sites new1 Bldg, new2 List, new3 List, new4 Event, new5 Floor, and a single new6 Obj[]; edges labeled events, floors, elems, [*]; context label 1, with the List() constructor shown twice, once for context 2 and once for context 3. The slide also shows the loop headers for (int i = 0; i < K; i++) and for (int i = 0; i < M; i++).]

95 Example: Cloning with Object Sensitivity
    class Bldg {
        List events, floors;
        static void main(String[] a) {
            Bldg b = new1 Bldg();
        }
        Bldg() {
            List el = new2 List();
            this.events = el;
            List fl = new3 List();
            this.floors = fl;
            Event e = new4 Event();
            el.elems[*] = e;
            Floor f = new5 Floor();
            fl.elems[*] = f;
        }
    }

    class List {
        Obj[] elems;
        List() {
            Obj[] a = new6 Obj[…];
            this.elems = a;
        }
    }

Query: disjoint-reach(el, fl)?
[Figure: points-to graph over variables b, el, fl, a, e, f and allocation sites new1 Bldg, new2 List, new3 List, new4 Event, new5 Floor, and two copies of new6 Obj[], one labeled 2 and one labeled 3; edges labeled events, floors, elems, [*]; context label 1, with the List() constructor shown twice, once for context 2 and once for context 3. The slide also shows the loop headers for (int i = 0; i < K; i++) and for (int i = 0; i < M; i++).]
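The effect of the heap abstraction on the disjoint-reach(el, fl) query can be checked with a small reachability sketch. The two heap graphs below are one reading of the slide figures and should be treated as assumptions: in the first, both lists share a single abstract new6 array; in the second, new6 is cloned per receiver (written new6@2 and new6@3, a notation invented here). The class DisjointReach is hypothetical and is not part of Chord.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: abstract heap as a graph, query as set disjointness.
public class DisjointReach {
    static void edge(Map<String, Set<String>> heap, String from, String to) {
        heap.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }

    /** All abstract objects reachable from root, including root itself. */
    static Set<String> reachable(Map<String, Set<String>> heap, String root) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(root);
        while (!work.isEmpty()) {
            String o = work.pop();
            if (!seen.add(o)) continue;
            for (String succ : heap.getOrDefault(o, Collections.<String>emptySet()))
                work.push(succ);
        }
        return seen;
    }

    /** True if no abstract object is reachable from both a and b. */
    static boolean disjointReach(Map<String, Set<String>> heap, String a, String b) {
        Set<String> common = reachable(heap, a);
        common.retainAll(reachable(heap, b));
        return common.isEmpty();
    }

    /** Assumed abstraction with one shared array object for allocation site new6. */
    static Map<String, Set<String>> sharedArrayHeap() {
        Map<String, Set<String>> h = new HashMap<>();
        edge(h, "new2", "new6");
        edge(h, "new3", "new6");
        edge(h, "new6", "new4");
        edge(h, "new6", "new5");
        return h;
    }

    /** Assumed abstraction with new6 cloned per receiver list object. */
    static Map<String, Set<String>> clonedArrayHeap() {
        Map<String, Set<String>> h = new HashMap<>();
        edge(h, "new2", "new6@2");
        edge(h, "new3", "new6@3");
        edge(h, "new6@2", "new4");
        edge(h, "new6@3", "new5");
        return h;
    }

    public static void main(String[] args) {
        System.out.println("shared new6: " + disjointReach(sharedArrayHeap(), "new2", "new3"));
        System.out.println("cloned new6: " + disjointReach(clonedArrayHeap(), "new2", "new3"));
    }
}
```

Under these assumed graphs the query fails when the array object is shared and succeeds when it is cloned, since only the cloned abstraction keeps the two lists' element arrays apart.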

96 Running Cloning-based Analyses in Chord
cspa_0cfa.dlog, cspa_kcfa.dlog, cspa_kobj.dlog, cspa_hybrid.dlog
ant -Dchord.work.dir=<…> -Dchord.run.analyses=<ONE OF ABOVE> run
chord.ctxt.kind=[ci|cs|co]
  kind of context sensitivity for each method and its locals
chord.inst.ctxt.kind=[ci|cs|co]
  kind of context sensitivity for each instance method and its locals
chord.stat.ctxt.kind=[ci|cs|co]
  kind of context sensitivity for each static method and its locals
chord.kobj.k=[1|2|…]
  k value to use for each object allocation site
chord.kcfa.k=[1|2|…]
  k value to use for each method call site
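The role of the k value can be illustrated with the textbook notion of a k-limited call string: a callee's context is the call site followed by the caller's context, truncated to k entries. The sketch below follows that general definition and is not taken from Chord's source; the class KLimitedContext, the most-recent-first ordering, and the use of strings for call sites are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of k-limited call-string contexts; not Chord's code.
public class KLimitedContext {
    /** Callee context: call site first, then the caller's context, cut to at most k entries. */
    static List<String> extend(List<String> callerCtxt, String callSite, int k) {
        if (k < 0) throw new IllegalArgumentException("k must be non-negative");
        List<String> full = new ArrayList<>();
        full.add(callSite);
        full.addAll(callerCtxt);
        List<String> cut = new ArrayList<>(full.subList(0, Math.min(k, full.size())));
        return Collections.unmodifiableList(cut);
    }

    public static void main(String[] args) {
        // In the Bldg example, Bldg() is reached via site 1 and calls List() at sites 2 and 3.
        List<String> bldgCtxt = Arrays.asList("1");
        for (int k = 0; k <= 2; k++) {
            System.out.println("k=" + k
                + " List() via site 2: " + extend(bldgCtxt, "2", k)
                + ", via site 3: " + extend(bldgCtxt, "3", k));
        }
    }
}
```

With k = 0 both calls to List() collapse into the same empty context, which corresponds to the context-insensitive case; with k = 1 they are kept apart. For k-object-sensitivity the same truncation would apply to allocation sites of receiver objects instead of call sites.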

97 Output of Pointer/Call-Graph Analyses in Chord
cspa_0cfa.dlog, cspa_kcfa.dlog, cspa_kobj.dlog, cspa_hybrid.dlog:
  rootCM (c,m): m is entry method in ctxt c
  CICM (c1,i,c2,m): call site i in ctxt c1 may call method m in ctxt c2
  CVC (c,v,o): local v may point to object o in ctxt c of its declaring method
  FC (f,o): static field f may point to object o
  CFC (o1,f,o2): instance field f of object o1 may point to object o2
cipa_0cfa.dlog:
  rootM, IM, VH, FH, HFH

98 Cloning-Based vs. Summary-Based Analysis
Cloning-based Analysis:
  Flow-insensitive
  Notion of method contexts is somewhat arbitrary
Summary-based Analysis:
  Flow-sensitive
  Notion of method contexts is defined by the user

99 Related Open-Source Projects
JikesRVM: Java Research Virtual Machine
Soot + Paddle: Static analysis and transformation framework for Java bytecode
IBM WALA: Static analysis framework for Java bytecode and related languages

100 Further Information
Chord homepage: http://jchord.googlecode.com/
Chord user guide:
Chord questions:

