Finding Application Errors and Security Flaws Using PQL: a Program Query Language MICHAEL MARTIN, BENJAMIN LIVSHITS, MONICA S. LAM PRESENTED BY SATHISHKUMAR.


1 Finding Application Errors and Security Flaws Using PQL: a Program Query Language
Michael Martin, Benjamin Livshits, Monica S. Lam
Presented by Sathishkumar
Instructor: Christoph Csallner

2 Outline
1. Introduction
2. PQL
3. Abstract Execution Trace
4. PQL Query
5. Dynamic matcher
6. Static checker
7. Experimental results
8. Conclusion

3 1. Introduction
- Program analyzers have found a large number of errors in software.
- Program checkers are targeted at finding error patterns common to many application programs.
- Error checkers verify whether a program conforms to certain design rules.
- An important class of design rules deals with sequences of events associated with a set of related objects.

4 2. PQL
- PQL stands for Program Query Language.
- It allows programmers to express questions about a program as queries.
- A query looks like a code excerpt corresponding to the shortest amount of code that would violate a design rule.
- When a match is found, the query can specify an action to perform.
- The matched events may be widely spaced in the program.

5 Techniques used and results found
- Uses both static and dynamic analysis.
- Applied to 6 large real-world open-source Java applications containing nearly 60k classes.
- Found 206 errors, including:
  - security flaws
  - resource leaks
  - violations of consistency invariants

6 Focus
- PQL focuses on an important class of error patterns that deal with sequences of events associated with a set of related objects.
- The events may be scattered throughout different methods.
- PQL finds all matches in the program that have equivalent behavior.
- On a match, it records relevant information or corrects the erroneous execution.

7 Example: SQL injection vulnerability
- Applications that use user-controlled input strings directly as database query commands are susceptible to SQL injection.
- Code fragment in a Java servlet:
    con.execute(request.getParameter("query"));
- This code reads a parameter from an HTTP request and passes it directly to a database back end.
- By supplying an appropriate query, a malicious user can gain access to unauthorized data, damage the contents of the database, and in some cases even execute arbitrary code on the server.

8 Resolving SQL injection vulnerability
To catch this, look for the following in the code:
- an object r of type HttpServletRequest,
- an object c of type Connection, and
- an object p of type String,
such that
- invoking getParameter on r yields the string p, and
- p is used as a parameter to an invocation of execute on c.
If a match is found, replace the call to execute with a call to Util.CheckedSQL, which validates the query to ensure that it matches a permissible action. If the query is invalid or otherwise unsafe, the request is not made.
Note: the two events need not happen consecutively in the application.

9 SQL injection query (sample)

10 Static and dynamic checkers
Static checker:
- Finds all potential matches in the program.
- Uses points-to analysis.
- Its results are flow-insensitive with respect to the query, so it does not ensure that the calls occur in the same order.
- Its results may contain false positives but no false negatives.
Dynamic checker:
- Finds the matches that occur at run time.
- Is precise and permits actions to be triggered on a match.
- PQL creates an instrumented version of the input program that reports a run-time match if and only if there are object instances that match the query.

11 3. Abstract Execution Trace
- The program execution is abstracted as a trace of primitive events, each of which contains a unique event ID, an event type, and a list of attributes.
- Objects are named by unique identifiers.
- PQL focuses on objects, so it only matches against instructions that directly dereference objects.
- PQL currently does not allow references to variables of primitive data types such as integers, floats, and characters.

12 AET (cont’d)
- Field loads and stores: the attributes of these event types are the source object, the target object, and the field name.
- Array loads and stores: the attributes of these event types are the source and target objects. The array index is ignored.
- Method calls and returns: the attributes of these event types are the method invoked, the formal objects passed in as arguments, and the returned object. The return event also includes the ID of its corresponding call event.
- Object creations: the attributes of this event type are the newly returned object and its class.
- End of program: this event type has no attributes and occurs just before the Java Virtual Machine terminates.

13 AET example
If Len = 2, two matches of the simpleSQLInjection query are found in this trace.

14 4. PQL query
- A PQL query is a pattern to be matched on the execution trace, together with actions to be performed upon a match.
- A match to the query is a set of objects and a subsequence of the trace that together satisfy the pattern.
- Two matches:
  - r=o3, c=o5, p=o4
  - r=o3, c=o5, p=o7
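The idea of a match as "a set of objects plus a subsequence of the trace" can be sketched in plain Java. This is only an illustration, not PQL's matcher: the `CallEvent` class, the `findMatches` method, and the four-event trace are hypothetical, chosen so that the object IDs agree with the two matches listed above, and object types are ignored for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SimpleSqlInjectionMatcher {

    /** One call event: result = receiver.method(argument). Hypothetical model. */
    static final class CallEvent {
        final int id;
        final String method;
        final String receiver;
        final String argument;
        final String result; // null when the returned object is not of interest

        CallEvent(int id, String method, String receiver, String argument, String result) {
            this.id = id;
            this.method = method;
            this.receiver = receiver;
            this.argument = argument;
            this.result = result;
        }
    }

    /** Finds bindings (r, c, p) where p = r.getParameter(_) is later followed by c.execute(p). */
    static List<String> findMatches(List<CallEvent> trace) {
        List<String> matches = new ArrayList<String>();
        for (int i = 0; i < trace.size(); i++) {
            CallEvent get = trace.get(i);
            if (!"getParameter".equals(get.method) || get.result == null) {
                continue;
            }
            // The second event may occur anywhere later in the trace, not only right after.
            for (int j = i + 1; j < trace.size(); j++) {
                CallEvent exec = trace.get(j);
                if ("execute".equals(exec.method) && get.result.equals(exec.argument)) {
                    matches.add("r=" + get.receiver + ", c=" + exec.receiver + ", p=" + get.result);
                }
            }
        }
        return matches;
    }

    /** A made-up trace: two loop iterations, each reading a parameter and executing it. */
    static List<CallEvent> sampleTrace() {
        return Arrays.asList(
            new CallEvent(1, "getParameter", "o3", "o2", "o4"),
            new CallEvent(2, "execute", "o5", "o4", null),
            new CallEvent(3, "getParameter", "o3", "o6", "o7"),
            new CallEvent(4, "execute", "o5", "o7", null));
    }

    public static void main(String[] args) {
        for (String match : findMatches(sampleTrace())) {
            System.out.println(match);
        }
    }
}
```

The nested loop makes the "not necessarily consecutive" property explicit: any later execute call on the same string object completes the match.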

15 Query Grammar
- Wildcard "_": a query can use the wildcard symbol "_", whose different occurrences can be matched to different member names or objects.
- Sequence "a; b": statement a is followed by statement b. They need not be contiguous; any events may occur between them.
- Exclusion "~b": statement b must not occur. The pattern a; ~b; c matches a followed by c if and only if b does not occur between them.
- Alternation "|": a | b is the statement matching either a or b.
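The sequence and exclusion rules can be made concrete with a small Java sketch over a trace of abstract event labels. The class and method names are hypothetical, and real PQL statements also bind objects, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ExclusionPatternSketch {

    /**
     * Matches the pattern "a; ~b; c" on a trace of event labels.
     * Returns index pairs {i, j} where trace[i] is a, trace[j] is c, i < j,
     * and b does not occur anywhere between them.
     */
    static List<int[]> matchWithExclusion(List<String> trace, String a, String b, String c) {
        List<int[]> matches = new ArrayList<int[]>();
        for (int i = 0; i < trace.size(); i++) {
            if (!trace.get(i).equals(a)) {
                continue;
            }
            for (int j = i + 1; j < trace.size(); j++) {
                String event = trace.get(j);
                if (event.equals(b)) {
                    break; // the excluded event occurred, so no later c can pair with this a
                }
                if (event.equals(c)) {
                    matches.add(new int[] {i, j}); // unrelated events in between are allowed
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> trace = Arrays.asList("a", "x", "c", "b", "c");
        for (int[] m : matchWithExclusion(trace, "a", "b", "c")) {
            System.out.println("a at " + m[0] + ", c at " + m[1]);
        }
    }
}
```

In the sample trace, the first c pairs with a even though an unrelated event x sits between them, while the second c is rejected because b occurs first.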

16 Query Grammar (cont’d)
- Partial order "a, b, c;": the three statements a, b, and c may match in any order.
- Within construct "within": matches a method call event and then a pattern, insisting that the return of the method does not occur at any point between the call and the full match of the pattern.
- Example use: checking for potential leaks of file handles.

17 Query Grammar (cont’d)

18 Subqueries
- Subqueries allow users to specify recursive event sequences or recursive object relations.
- Subqueries are analogous to recursive functions in a programming language.
- They can return multiple values, which are bound to variables in the calling query.
- By recursively invoking subqueries, each with its own set of variables, queries can match against an unbounded number of objects.

19 Recursive subquery

20 Recursive subquery (cont’d)
- Recursion is useful for matching against Java wrappers.
- Java exposes higher-level I/O functions by providing wrappers over base input streams.
- For example, to read Java objects from some socket s, one might first wrap the stream with a BufferedInputStream to cache incoming data, then with an ObjectInputStream to parse the objects from the stream.
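The wrapper chain described above looks as follows in Java. To keep the sketch runnable without a network connection, a ByteArrayInputStream stands in for the socket's input stream; that substitution, and the class and method names, are assumptions made for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class WrappedStreamSketch {

    /** Serializes a value, then reads it back through two levels of stream wrappers. */
    static Object roundTrip(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            // Stand-in for s.getInputStream() on a real socket s (assumption).
            InputStream base = new ByteArrayInputStream(bytes.toByteArray());
            // First wrapper: caches incoming data.
            InputStream buffered = new BufferedInputStream(base);
            // Second wrapper: parses objects from the buffered stream.
            try (ObjectInputStream in = new ObjectInputStream(buffered)) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```

The object returned by readObject comes from a stream two wrappers away from the base stream, which is why a query that tracks such data needs to follow an arbitrary number of wrapping steps.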

21 Recursive subquery (cont’d)
The derivedStream subquery captures arbitrary levels of wrapping:
- The base case declares that any stream can be considered derived from itself.
- The other case captures a single wrapper and then re-invokes derivedStream recursively.
The query first finds all the streams derived from the input stream of a socket, then all objects read from any of the derived streams.

22 Match found!
PQL provides two facilities to log information about matches or to perform actions:
- executes: executes a specified method when a match occurs.
- replaces: replaces an existing statement with a specified method that represents the actions to be executed in its place.
- Symbol "*": every variable in the match is packaged into a collection that can be handled generically, for example Util.PrintStackTrace(*).

23 5. Dynamic matcher
Finding matches to PQL queries dynamically consists of the following three steps:
1. Translate the queries to state machines.
2. Instrument the target application to produce the full abstract execution trace.
3. Use a query recognizer to interpret all the state machines over the execution trace and find all matches.

24 State machines
- A state machine contains a set of states, which includes a start state, a fail state, and an accept state.
- Each state carries a set of bindings.
- Bindings: a mapping from variables in a PQL query to objects in the heap at run time.
- A state transition specifies the event under which the current state and current bindings transition to the next state and a new set of bindings.
- State transitions generally represent a single primitive statement corresponding to a single event in the execution trace.

25 Translate Query to State machines
query main()
uses Object x, f;
matches {
    x = getParameter(_) | x = getHeader(_);
    f := derived(x);
    execute(f);
}
query derived(Object x)
uses Object t;
returns Object y;
matches {
    { y := x; }
  | { t = x.toString(); y := derived(t); }
  | { t.append(x); y := derived(t); }
}

26 State machine: query main()
Transition labels in the diagram: *, x = getParameter(_), x = getHeader(_), f := derived(x), execute(f).

27 State machine: query derived()
Transition labels in the diagram: *, y := x, t = x.toString(), t.append(x), y := derived(t).

28 main(): event o1 = getHeader(o2). Bindings shown: { }, {x=o1}.

29 derived(o1). Bindings shown: {x=o1}, then {x=y=o1} via y := x.

30 main(): after o1 = getHeader(o2). Bindings shown: { }, {x=o1}, {x=o1, f=o1}.

31 derived(o1): event o3.append(o1). Bindings shown: {x=o1}, {x=o1, t=o3}, {x=y=o1}.

32 derived(o3). Bindings shown: {x=o3}, {x=y=o3}.

33 derived(o1): after events o1 = getHeader(o2) and o3.append(o1). Bindings shown: {x=o1}, {x=o1, t=o3}, {x=y=o1}, {x=o1, y=t=o3}.

34 execute(o3), main(): after events o1 = getHeader(o2), o3.append(o1), o3.append(o4), o5 = execute(o3). Bindings shown: { }, {x=o1}, {x=o1, f=o1}, {x=o1, f=o3}.
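The walkthrough above can be approximated in Java. Note the swap: this sketch does not build or interpret state machines. It simply keeps, for each source object x, the set of objects derived from x so far, which is enough to reproduce the final binding x=o1, f=o3 on the four-event trace from the slides. The `Event` class and all method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DerivedStringTracker {

    /** Event: result = receiver.method(argument); unused slots are null. */
    static final class Event {
        final String method;
        final String receiver;
        final String argument;
        final String result;

        Event(String method, String receiver, String argument, String result) {
            this.method = method;
            this.receiver = receiver;
            this.argument = argument;
            this.result = result;
        }
    }

    /** Reports "x=..., f=..." whenever execute(f) is called with f derived from a source x. */
    static List<String> findMatches(List<Event> trace) {
        // For each source object x: the objects derived from x so far (x itself included).
        Map<String, Set<String>> derived = new LinkedHashMap<String, Set<String>>();
        List<String> matches = new ArrayList<String>();
        for (Event e : trace) {
            if (e.method.equals("getParameter") || e.method.equals("getHeader")) {
                // x = getParameter(_) | x = getHeader(_); base case y := x.
                if (e.result != null && !derived.containsKey(e.result)) {
                    Set<String> self = new LinkedHashSet<String>();
                    self.add(e.result);
                    derived.put(e.result, self);
                }
            } else if (e.method.equals("toString")) {
                // t = x.toString(): t is derived whenever x is.
                for (Set<String> set : derived.values()) {
                    if (e.result != null && set.contains(e.receiver)) {
                        set.add(e.result);
                    }
                }
            } else if (e.method.equals("append")) {
                // t.append(x): t is derived whenever x is.
                for (Set<String> set : derived.values()) {
                    if (e.receiver != null && set.contains(e.argument)) {
                        set.add(e.receiver);
                    }
                }
            } else if (e.method.equals("execute")) {
                for (Map.Entry<String, Set<String>> entry : derived.entrySet()) {
                    if (entry.getValue().contains(e.argument)) {
                        matches.add("x=" + entry.getKey() + ", f=" + e.argument);
                    }
                }
            }
        }
        return matches;
    }

    /** The trace from the slides: getHeader, two appends, then execute. */
    static List<Event> slideTrace() {
        return Arrays.asList(
            new Event("getHeader", null, "o2", "o1"),
            new Event("append", "o3", "o1", null),
            new Event("append", "o3", "o4", null),
            new Event("execute", null, "o3", "o5"));
    }

    public static void main(String[] args) {
        System.out.println(findMatches(slideTrace()));
    }
}
```

The second append, o3.append(o4), changes nothing because o4 is not derived from any source, which mirrors how an unrelated event leaves the bindings untouched.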

35 6. Static checker
- Uses pointer analysis: a static code analysis technique that establishes which pointers, or heap references, can point to which variables or storage locations.
- The points-to information is stored in a deductive database called bddbddb.
- The data are compactly represented with binary decision diagrams (BDDs) and can be accessed efficiently with queries written in the logic programming language Datalog.

36 The bddbddb database
- All inputs and results of the static analyzer are stored as relations in the bddbddb database.
- The database domains include bytecodes B, variables V, methods M, contexts C, integers Z, and heap objects H.
- A context represents one of the call chains that can occur in the program.
- The source program is represented as a number of input relations:
  - actual: parameter passing
  - ret: method returns
  - fldld: field loads
  - fldst: field stores
  - arrayld: array loads
  - arrayst: array stores

37 Datalog
A Datalog program P consists of a set of domains D, a set of relations R, and a set of rules Q.

38 Datalog (cont’d)
Input relations:
- vP0(v, h): the set of initial points-to relations; an operation places a reference to heap object h in variable v. Example: s = new String()
- store(x, f, y): x.f = y
- load(x, f, y): y = x.f
- assign(x, y): x = y
Derived relations:
- vP(v, h) is true if variable v may point to heap object h at any point during program execution.
- hP(h1, f, h2) is true if the heap object field h1.f may point to heap object h2.
Rules:
- Rule 1: if v has an initial reference to heap object h, then v can point to h.
- Rule 2: if variable v2 can point to object h and v1 includes v2 (that is, assign(v1, v2) holds), then v1 can also point to h.
- Rule 3: for v1.f = v2, if v1 can point to h1 and v2 can point to h2, then h1.f can point to h2.
- Rule 4: for v2 = v1.f, if v1 can point to h1 and h1.f can point to h2, then v2 can point to h2.

39 Java to Input relations
- Domain V contains the values va, vb, vd, representing the variables a, b, d.
- Domain H contains the values h1, h3, representing the objects allocated on lines 1 and 3.
- Domain F consists of the value name, representing the name field of a Dog object.
- The initial points-to relations in vP0 are (va, h1) and (vd, h3).
- The program has one assignment operation, assign(vb, va), and one store operation, store(vd, name, vb).

40 Java to Input relations (cont’d)
- Rule 1: vP(va, h1) and vP(vd, h3) are true.
- Rule 2: vP(vb, h1) is true, since assign(vb, va) and vP(va, h1) are true.
- Rule 3: hP(h3, name, h1) is true, since store(vd, name, vb), vP(vd, h3), and vP(vb, h1) are true.
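The derivation above can be reproduced by applying the four rules repeatedly until nothing new is derived. The Java sketch below does this with plain hash sets of comma-joined tuples; it is not how bddbddb works internally (bddbddb represents relations as BDDs), and the class and method names are hypothetical. The rules appear in Datalog form in the comments.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class PointsToSketch {
    // Each relation is a set of comma-joined tuples, for example "va,h1".
    final Set<String> vP0 = new LinkedHashSet<String>();    // vP0(v, h)
    final Set<String> assign = new LinkedHashSet<String>(); // assign(v1, v2): v1 = v2
    final Set<String> store = new LinkedHashSet<String>();  // store(v1, f, v2): v1.f = v2
    final Set<String> load = new LinkedHashSet<String>();   // load(v1, f, v2): v2 = v1.f
    final Set<String> vP = new LinkedHashSet<String>();     // derived: vP(v, h)
    final Set<String> hP = new LinkedHashSet<String>();     // derived: hP(h1, f, h2)

    /** Applies Rules 1 to 4 until a fixed point is reached. */
    void solve() {
        vP.addAll(vP0); // Rule 1: vP(v, h) :- vP0(v, h).
        boolean changed = true;
        while (changed) {
            changed = false;
            // Work on snapshots so the sets can grow while we iterate.
            List<String[]> vars = split(vP);
            List<String[]> heap = split(hP);
            // Rule 2: vP(v1, h) :- assign(v1, v2), vP(v2, h).
            for (String[] a : split(assign)) {
                for (String[] p : vars) {
                    if (p[0].equals(a[1])) {
                        changed |= vP.add(a[0] + "," + p[1]);
                    }
                }
            }
            // Rule 3: hP(h1, f, h2) :- store(v1, f, v2), vP(v1, h1), vP(v2, h2).
            for (String[] s : split(store)) {
                for (String[] p1 : vars) {
                    for (String[] p2 : vars) {
                        if (p1[0].equals(s[0]) && p2[0].equals(s[2])) {
                            changed |= hP.add(p1[1] + "," + s[1] + "," + p2[1]);
                        }
                    }
                }
            }
            // Rule 4: vP(v2, h2) :- load(v1, f, v2), vP(v1, h1), hP(h1, f, h2).
            for (String[] l : split(load)) {
                for (String[] p : vars) {
                    for (String[] h : heap) {
                        if (p[0].equals(l[0]) && h[0].equals(p[1]) && h[1].equals(l[1])) {
                            changed |= vP.add(l[2] + "," + h[2]);
                        }
                    }
                }
            }
        }
    }

    private static List<String[]> split(Set<String> relation) {
        List<String[]> tuples = new ArrayList<String[]>();
        for (String tuple : relation) {
            tuples.add(tuple.split(","));
        }
        return tuples;
    }

    /** Input relations of the Dog example from the slides. */
    static PointsToSketch dogExample() {
        PointsToSketch p = new PointsToSketch();
        p.vP0.add("va,h1");
        p.vP0.add("vd,h3");
        p.assign.add("vb,va");
        p.store.add("vd,name,vb");
        return p;
    }

    public static void main(String[] args) {
        PointsToSketch p = dogExample();
        p.solve();
        System.out.println("vP = " + p.vP);
        System.out.println("hP = " + p.hP);
    }
}
```

On the Dog example the loop needs two productive passes: the first derives vP(vb, h1) through Rule 2, and only then can Rule 3 derive hP(h3, name, h1).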

41 PQL to Datalog

42 PQL to Datalog (cont’d)
The query is represented as a number of input relations:
- actual: parameter passing
- ret: method returns
- fldld: field loads
- fldst: field stores
- arrayld: array loads
- arrayst: array stores

43 PQL to Datalog (cont’d)
The Datalog rule says that an object h is a cause of an injection if b1 is a call to getParameter, b2 is a call to execute, and the return result v1 of getParameter in some context c1 points to the same heap object h as v2, the first parameter of the call to execute in some context c2. In other words, the result of getParameter() is the input to execute(), so a match is found.

44 7. Experimental results
Application       No. of classes   No. of bugs found
Eclipse                   19,439                 192
personalblog               5,236                   2
road2hibernate             7,062                   1
snipsnap                  10,851                   8
roller                    16,359                   1
webgoat                    1,021                   2
TOTAL                     59,968                 206

45 Major error patterns
Important error patterns found by PQL:
- Serialization errors: a data corruption bug in web servers. Example: do not store an object of type X in Y.
- SQL injections: a major threat to the security of database servers.
- Mismatched method pairs: cause resource leaks and data structure inconsistencies. "A call to method A must always be followed by a call to method B." Example: lock(); ... unlock();
- Lapsed listeners: a common memory leak pattern in Java that may lead to resource exhaustion and crashes in long-running applications.

46 8. Conclusion
- PQL is a Program Query Language that allows programmers to express questions about a program as queries.
- A query looks like a code excerpt corresponding to the shortest amount of code that would violate a design rule.
- When a match is found, the query can specify an action to perform.
- The matched events may be widely spaced in the program.
- Static analysis can solve the serialization error query.
- Dynamic analysis handles complex queries such as mismatched method pairs and lapsed listeners.
- 206 bugs were found in nearly 60k classes.

47 References
- Michael Martin, Benjamin Livshits, and Monica S. Lam. Finding Application Errors and Security Flaws Using PQL: a Program Query Language. Computer Science Department, Stanford University.
- research.microsoft.com/~livshits/papers/ppt/oopsla05.ppt
- John Whaley, Dzintars Avots, Michael Carbin, and Monica S. Lam. Using Datalog with Binary Decision Diagrams for Program Analysis. Computer Science Department, Stanford University, Stanford, CA 94305, USA.

48 Questions? Thank you!

