Semi-Sparse Flow-Sensitive Pointer Analysis
Ben Hardekopf, Calvin Lin
The University of Texas at Austin
POPL ’09
Simplified by Eric Villasenor

Overview
– Background
– Flow-Sensitive Analysis
– Semi-Sparse Flow-Sensitive Analysis
– Questions

Uses
Pointer analysis gathers pointer information that improves the precision of later analyses, which in turn enables optimizations
Flow-sensitive analysis is beneficial for:
– Security analysis
– Deep error checking
– Hardware synthesis
– Multi-threaded programs

Types of Analysis
Types of pointer analysis
– Flow-sensitive: considers statement ordering in the code; little progress made in scalability
– Context-sensitive: considers procedure calls; good progress in scalability
The two give complementary improvements in precision

Analysis Tradeoffs
Scalability vs Precision
– It takes time to analyze code
– It takes memory to hold the analysis
Insensitive vs Sensitive
– Insensitive: less complex, less precise
– Sensitive: more complex, more precise
Larger pieces of code are, in general, more complex

Traditional Flow-Sensitive Analysis
– Lattice of dataflow facts
– Meet operator on the lattice
– Transfer functions map lattice elements to other lattice elements
Uses a control-flow graph CFG = (N, E)
– N: nodes (program points)
– E: edges (control flow)

Traditional Flow-Sensitive Analysis
Iterative algorithm
– Runs until convergence
– Adds a node's successors to the work list when its output set changes
– Propagates pointer information to all reachable nodes
Prohibitive in memory use and computational complexity
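The iterative algorithm above can be sketched as a generic work-list solver. This is a minimal illustration, not the authors' implementation: the names `solve`, `transfer`, and `stmts`, the choice of set union as the meet operator, and the four-node diamond CFG are all assumptions made for the example.

```python
from collections import deque

def solve(succ, transfer):
    """Work-list iteration to a fixed point; facts are sets, meet is union."""
    preds = {n: [] for n in succ}
    for n, targets in succ.items():
        for s in targets:
            preds[s].append(n)
    out = {n: frozenset() for n in succ}
    worklist = deque(succ)                      # start with every node
    while worklist:
        n = worklist.popleft()
        in_n = frozenset().union(*(out[p] for p in preds[n]))
        new_out = transfer(n, in_n)
        if new_out != out[n]:                   # output changed:
            out[n] = new_out
            for s in succ[n]:                   # revisit the successors
                if s not in worklist:
                    worklist.append(s)
    return out

# Hypothetical diamond CFG: node 0 is "p = &a", node 1 is "p = &b",
# nodes 2 and 3 do not touch pointers.
succ = {0: [1, 2], 1: [3], 2: [3], 3: []}
stmts = {0: ("p", "a"), 1: ("p", "b"), 2: None, 3: None}

def transfer(n, facts):
    """Assignment to a variable kills its old targets and generates the new one."""
    if stmts[n] is None:
        return facts
    var, target = stmts[n]
    return frozenset(f for f in facts if f[0] != var) | {(var, target)}

out = solve(succ, transfer)
```

Every node keeps its own copy of the facts, including nodes 2 and 3 that never touch a pointer, which is the memory cost the slide calls prohibitive.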

Contributions
Two ideas
– Semi-sparse analysis
– Novel use of Binary Decision Diagrams
Two new optimizations
– Top-level pointer equivalence
– Local points-to graph equivalence

Static Single Assignment
– Def/use relation is captured
– Use it to reduce the information sent to nodes
[Figure: four code panels with the labels "Pointer Analysis" and "SSA"]
  w = a; x = b; y = &c; z = y; y = &d;
  w1 = a1; x1 = b1; y1 = c1; z1 = y1; y2 = d1;
  w = a; x = b; y = c; z = y; y = d;
  w1 = a1; x1 = b1; y1 = ?; z1 = ?; y2 = ?;

Partial Static Single Assignment
Two classes of variable
– Address-taken: in memory; use ALLOC/STORE
– Top-level: never expose their address; not dynamically allocated

Source:
  int a, b, *c, *d;
  int* w = &a;
  int* x = &b;
  int** y = &c;
  int** z = y;
  c = 0;
  *y = w;
  *z = x;
  y = &d;
  z = y;
  *y = w;
  *z = x;

Partial SSA form:
  w1 = ALLOC a
  x1 = ALLOC b
  y1 = ALLOC c
  z1 = y1
  STORE 0 y1
  STORE w1 y1
  STORE x1 z1
  y2 = ALLOC d
  z2 = y2
  STORE w1 y2
  STORE x1 z2
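The split into the two variable classes can be illustrated with a toy classifier over the slide's source listing. This is only a sketch: the regular-expression scan, the helper name `classify`, and treating `int` as the only keyword are assumptions that hold for this small example, and a real compiler would work on its IR rather than on source text.

```python
import re

# Running example from the slide, as plain C source text.
SOURCE = """
int a, b, *c, *d;
int* w = &a;
int* x = &b;
int** y = &c;
int** z = y;
c = 0;
*y = w;
*z = x;
y = &d;
z = y;
*y = w;
*z = x;
"""

IDENT = r"[A-Za-z_]\w*"

def classify(source):
    """Return (address_taken, top_level) sets of variable names."""
    names = set(re.findall(IDENT, source)) - {"int"}
    # A variable is address-taken if it ever appears as the operand of '&'.
    address_taken = set(re.findall(r"&\s*(" + IDENT + ")", source))
    return address_taken, names - address_taken

address_taken, top_level = classify(SOURCE)
```

Here `a`, `b`, `c`, and `d` come out as address-taken, matching the four ALLOC statements, while `w`, `x`, `y`, and `z` are the top-level variables that get SSA names.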

Partial Static Single Assignment
Advantages
– Single global points-to graph for top-level variables: each has the same pointer information over the entire program
– Top-level def/use information is immediately available
– Local points-to graphs contain only address-taken information

Dataflow Graph
DFG: a combination of a sparse evaluation graph (SEG) and def-use chains
– Optimized version of the CFG: omits nodes that neither define nor use pointer information
– Connects address-taken statements so that defs reach uses
Two-stage construction
– First stage considers DEF_adr and USE_adr
– Second stage connects top-level defs to uses

Dataflow Graph
Inst Type   Example       Def-Use Info
ALLOC       x = ALLOC_i   DEF_top
COPY        x = y z       DEF_top, USE_top
LOAD        x = *y        DEF_top, USE_top, USE_adr
STORE       *x = y        USE_top, DEF_adr, USE_adr
CALL        x = foo(y)    DEF_top, USE_top, DEF_adr, USE_adr
RET         return x      USE_top, USE_adr
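The def-use table above can be encoded directly as data. In this sketch the dictionary `DEF_USE` and the helper `in_seg` are hypothetical names. The helper assumes that an instruction belongs to the SEG portion of the DFG exactly when it defines or uses address-taken information, as the two-stage construction suggests.

```python
# Def-use information per instruction type, transcribed from the table.
DEF_USE = {
    "ALLOC": {"DEF_top"},
    "COPY":  {"DEF_top", "USE_top"},
    "LOAD":  {"DEF_top", "USE_top", "USE_adr"},
    "STORE": {"USE_top", "DEF_adr", "USE_adr"},
    "CALL":  {"DEF_top", "USE_top", "DEF_adr", "USE_adr"},
    "RET":   {"USE_top", "USE_adr"},
}

def in_seg(inst_type):
    """True if the instruction defines or uses address-taken information."""
    return bool(DEF_USE[inst_type] & {"DEF_adr", "USE_adr"})

seg_types = [t for t in DEF_USE if in_seg(t)]
```

ALLOC and COPY touch only top-level variables, so they are handled through def-use chains alone.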

Dataflow Graph
[Figure: DFG for the running example, drawn beside its partial SSA listing; the edges are not recoverable from the transcript]
  w1 = ALLOC a
  x1 = ALLOC b
  y1 = ALLOC c
  z1 = y1
  STORE 0 y1
  STORE w1 y1
  STORE x1 z1
  y2 = ALLOC d
  z2 = y2
  STORE w1 y2
  STORE x1 z2

Semi-Sparse Analysis
– Each function has a work list of program statements, initialized to the statements that define variables
– Each program statement that uses or defines address-taken variables has two points-to graphs: IN (incoming address-taken information) and OUT (outgoing address-taken information)
– A global points-to graph holds pointer information for top-level variables
– A function work list holds the functions waiting to be processed, initialized to contain all functions in the program

Semi-Sparse Analysis
Iterative algorithm: computes the following for all nodes k until convergence
  $\mathrm{IN}_k = \bigcup_{x \in \mathrm{pred}(k)} \mathrm{OUT}_x$
  $\mathrm{OUT}_k = \mathrm{GEN}_k \cup (\mathrm{IN}_k - \mathrm{KILL}_k)$
The KILL set determines whether an update is strong or weak
– Left-hand side known to be a single location: strong update (precise)
– Left-hand side uncertain: weak update (conservative)
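The difference between strong and weak updates can be sketched for a single STORE statement. This is an illustration under assumptions, not the paper's code: the function name `apply_store`, the dictionary representation of points-to graphs, and the example points-to sets are hypothetical. It also assumes that a singleton target always names one concrete memory location, which a real analysis must check before killing anything.

```python
def apply_store(in_graph, top_level, x, y):
    """Model STORE *x = y: compute OUT from IN for one statement."""
    targets = top_level[x]
    out = {loc: set(pts) for loc, pts in in_graph.items()}  # copy IN
    if len(targets) == 1:
        # Strong update: KILL the old contents of the one target, GEN the new.
        (loc,) = targets
        out[loc] = set(top_level[y])
    else:
        # Weak update: KILL is empty, so old contents survive next to GEN.
        for loc in targets:
            out.setdefault(loc, set()).update(top_level[y])
    return out

# Hypothetical top-level points-to sets; z is deliberately ambiguous.
top_level = {"w": {"a"}, "x": {"b"}, "y": {"c"}, "z": {"c", "d"}}

after_strong = apply_store({"c": {"null"}}, top_level, "y", "w")
after_weak = apply_store(after_strong, top_level, "z", "x")
```

The first call overwrites what `c` held because `y` has exactly one target; the second call only adds to `c` and `d` because `z` may point to either.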

Top-Level Pointer Equivalence
Optimization
– Reduces the number of top-level variables in the DFG
– x ≡ y iff x and y have identical points-to sets (for every z, x points to z iff y points to z)
Key idea
– Replace variables that have identical points-to sets with a single representative
– A member of the set is selected as the representative
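Choosing a representative per equivalence class can be sketched as grouping variables by their points-to sets. The sketch assumes the sets are already available from some cheaper pre-analysis, which the slide does not spell out. The function name `representatives` and the rule of picking the alphabetically first member are illustrative choices.

```python
def representatives(points_to):
    """Map each variable to one representative with the same points-to set."""
    rep_of_set = {}
    rep = {}
    for var in sorted(points_to):           # deterministic choice of member
        key = frozenset(points_to[var])
        rep[var] = rep_of_set.setdefault(key, var)
    return rep

# Top-level variables of the running example and what they point to.
points_to = {
    "w1": {"a"}, "x1": {"b"},
    "y1": {"c"}, "z1": {"c"},
    "y2": {"d"}, "z2": {"d"},
}
rep = representatives(points_to)
```

With this mapping `z1` is replaced by `y1` and `z2` by `y2`, so the two COPY statements drop out of the graph.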

Top-Level Pointer Equivalence
[Figure: DFG of the running example before and after replacing z1 with y1 and z2 with y2]
Before:
  w1 = ALLOC a
  x1 = ALLOC b
  y1 = ALLOC c
  z1 = y1
  STORE 0 y1
  STORE w1 y1
  STORE x1 z1
  y2 = ALLOC d
  z2 = y2
  STORE w1 y2
  STORE x1 z2
After (STORE x1 z1 becomes STORE x1 y1; STORE x1 z2 becomes STORE x1 y2):
  w1 = ALLOC a
  x1 = ALLOC b
  y1 = ALLOC c
  STORE 0 y1
  STORE w1 y1
  STORE x1 y1
  y2 = ALLOC d
  STORE w1 y2
  STORE x1 y2

Local Points-to Graph Equivalence
Optimization
– Eliminates nodes in the DFG with identical points-to graphs; such nodes share a single points-to graph
– Used in the SEG portion of the graph
Key idea
– Non-preserving nodes: only STORE and CALL modify address-taken pointer information
– Preserving nodes: propagate pointer information to other nodes
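One simple way to find nodes that can share a points-to graph is to group preserving nodes by the set of non-preserving nodes whose output reaches them through preserving nodes only. The sketch below follows that idea. It is not the authors' algorithm, and the function name `group_preserving`, the successor-map representation, and the five-node example graph are assumptions.

```python
def group_preserving(succ, non_preserving):
    """Group preserving nodes that receive information from the same sources."""
    reach = {n: set() for n in succ}
    for src in non_preserving:
        stack, seen = list(succ[src]), set()
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            reach[n].add(src)
            if n not in non_preserving:     # preserving nodes pass info along
                stack.extend(succ[n])
    groups = {}
    for n in succ:
        if n not in non_preserving:
            groups.setdefault(frozenset(reach[n]), []).append(n)
    return list(groups.values())

# Hypothetical SEG fragment: S1 and S2 are STOREs, L1..L3 are LOADs.
succ = {"S1": ["L1"], "L1": ["L2"], "L2": [], "S2": ["L3"], "L3": []}
groups = group_preserving(succ, {"S1", "S2"})
```

L1 and L2 both see only what S1 produced, so they land in one group and can share a single points-to graph; L3 depends on S2 and stays separate.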

Local Points-to Graph Equivalence
– Process takes O(n^3), where n is the number of nodes in the SEG portion of the DFG (DEF_adr or USE_adr)
– Further optimized to only use STORE: 0.1% precision loss
– Similar to RTL: STORE to STORE collapsible
[Figure: separate STORE, LOAD, and RET points-to graphs alongside a collapsed points-to graph]

BDDs
– Compressed representation of sets and relations: operations are performed without decompression
– Set operations can be performed in time polynomial in the size of the BDDs
– Useful for storing the CFG and points-to graphs
– Transfer functions are BDD operations (set operations)

Semi-Sparse Symbolic Analysis
– Encode top-level points-to information in a BDD: most variables are top-level
– BDDs cannot operate on individual statements efficiently, so use the iterative algorithm for address-taken points-to information (strong and weak updates)
– Allows the BDD to operate efficiently

Results of the Analysis
SS = Semi-Sparse Flow-Sensitive; SSO = Semi-Sparse Flow-Sensitive Optimized

Pointer information
representation   SS (against baseline)            SSO (against baseline)           SSO vs SS
bitmap           75x faster, 26x less memory      183x faster, 47x less memory     2.5x faster, 6.8x less memory
BDD              44.8x faster, 1.4x less memory   114x faster, 1.4x less memory    4.4x faster, 1.03x less memory

Questions
