
1. Datalog Applied to Software Security Vulnerabilities
  Datalog
  Data-Flow Analysis

2. The Story
  - I long thought that Datalog was a dead issue.
  - My colleague Monica Lam discovered Datalog and has been using it to write efficient algorithms for important data-flow questions.
  - So Datalog gets new life, but not as a database query language.

3. Outline
  1. Vulnerabilities.
     - Buffer overflow.
     - SQL injection.
  2. Datalog --- review.
  3. Andersen's pointer analysis.
     - Explanation.
     - Expression in Datalog.

4. Buffer Overflow
  - User provides a string that is later copied to a buffer of fixed length.
  - Buffer is located on the run-time stack, so it is near the place where the return address is held.
  - String is so long it overwrites the return address.
  - Return actually goes to intruder's code.

5. The Data-Flow Issue
  - Suppose we encounter strcpy(b,a) in a program.
  - Was the string a obtained by reading input?
  - Has some function already checked that a is not too long to fit in buffer b?
  - Input and check could have occurred anywhere in the program.

6. SQL Injection
  - SQL queries are often constructed by programs, e.g., via JDBC.
  - These queries may take constants from user input.
  - Careless code can allow unexpected queries to be constructed and executed.

7. Example: SQL Injection
  - Relation Accounts(name, passwd, acct).
  - Web interface: get name and password from user, store in strings n and p, issue query, display account number.
    SELECT acct FROM Accounts WHERE name = :n AND passwd = :p

8. User (Who Is Not Yanni) Types
  Name: Yanni' //
  Password: who cares?
  Your account number is 1234-567

9. The Query Executed
  SELECT acct FROM Accounts WHERE name = 'Yanni' //' AND passwd = 'who cares?'
  Everything after the // is treated as a comment, so the password test is never applied.
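
The attack on slides 7 to 9 can be reproduced in a few lines. The following is a minimal sketch, assuming Python's standard `sqlite3` module stands in for the JDBC interface of the slides; the helper names `make_db`, `lookup_unsafe`, and `lookup_safe` and the stored password are hypothetical. SQLite starts a line comment with `--`, so the attack string uses `--` where the slide writes `//`.

```python
import sqlite3

def make_db():
    """Build an in-memory Accounts(name, passwd, acct) table with one row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Accounts (name TEXT, passwd TEXT, acct TEXT)")
    # The password 'secret' is made up for this illustration.
    conn.execute("INSERT INTO Accounts VALUES ('Yanni', 'secret', '1234-567')")
    return conn

def lookup_unsafe(conn, n, p):
    """Vulnerable: user text is spliced directly into the SQL string."""
    query = ("SELECT acct FROM Accounts "
             "WHERE name = '" + n + "' AND passwd = '" + p + "'")
    return [row[0] for row in conn.execute(query)]

def lookup_safe(conn, n, p):
    """Safer: the driver binds n and p as values, never as SQL text."""
    query = "SELECT acct FROM Accounts WHERE name = ? AND passwd = ?"
    return [row[0] for row in conn.execute(query, (n, p))]

conn = make_db()
attack_name = "Yanni' --"   # closes the quote, then comments out the rest
print(lookup_unsafe(conn, attack_name, "who cares?"))
print(lookup_safe(conn, attack_name, "who cares?"))
```

With the concatenated query the password test disappears into the comment; with bound parameters the whole attack string is compared against the name column as ordinary data.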

10. The Data-Flow Issue
  - If a string is passed to JDBC or another query interface, did (part of) the string come from the user?
  - If so, has it been checked for validity?
  - Again --- the string could have come from anywhere in the program.

11. Outline
  1. Vulnerabilities.
     - Buffer overflow.
     - SQL injection.
  2. Datalog --- review.
  3. Andersen's pointer analysis.
     - Explanation.
     - Expression in Datalog.

12. Datalog --- (1)
  Atom = p(A,B,C), where p is the predicate and the arguments A, B, C are variables or constants.
  Literal = Atom or NOT Atom
  Rule = Atom :- Literal & … & Literal
  - The body (right of :-): for each assignment of values to variables that makes all these literals true …
  - … make the atom on the left (the head) true.

13. Example: Datalog Rules
  path(X,Y) :- edge(X,Y)
  path(X,Y) :- path(X,Z) & path(Z,Y)
  "There is a path from X to Y if there is an edge from X to Y, or if there is some Z such that there are paths from X to Z and from Z to Y."
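
To make the two path rules concrete, here is a small sketch that evaluates them by hand in Python. The function name `paths` and the representation of `edge` as a set of pairs are assumptions made for illustration; this is not how a Datalog engine is actually implemented.

```python
def paths(edges):
    """Compute path(X,Y) from edge(X,Y) using the two rules of slide 13."""
    # Rule 1: path(X,Y) :- edge(X,Y)
    path = set(edges)
    while True:
        # Rule 2: path(X,Y) :- path(X,Z) & path(Z,Y)
        # The shared variable Z is the join condition between the two subgoals.
        new = {(x, y)
               for (x, z) in path
               for (z2, y) in path
               if z == z2}
        if new <= path:          # nothing new: a fixed point has been reached
            return path
        path |= new

print(sorted(paths({(1, 2), (2, 3), (3, 4)})))
```

On the chain 1 → 2 → 3 → 4 the loop adds (1,3) and (2,4) first, then (1,4), and then stops.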

14. Datalog --- (2)
  - Intuition: subgoals in the body are combined by "and" (strictly speaking: "join").
  - Intuition: multiple rules for a predicate (head) are combined by "or."

15. Datalog --- (3)
  - Predicates are implemented by relations.
  - Each tuple, or assignment of values to the arguments, represents a ground atom (true or false proposition).

16. EDB Vs. IDB Predicates
  - Some predicates are given as input data.
    - Called EDB, or extensional database predicates.
  - Others are defined by the rules only.
    - Called IDB, or intensional database predicates.

17. Datalog for Program Analysis
  - Inspect the code to produce the EDB.
    - Example: userDef(s,i) = "string s is assigned a user-supplied value at position i."
  - Compute IDB relations that tell what you need to know about how values flow through the program.
    - Example: unsafe(x,i) = "string x at position i has a value that is derived from the user."

18. Iterative Algorithm for Datalog
  - Start with the EDB predicates = "whatever the code dictates," and with all IDB predicates empty.
  - Repeatedly examine the bodies of the rules, and see what new IDB facts can be discovered from the EDB and existing IDB facts.
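
The iterative algorithm can be sketched as a tiny evaluator for positive rules (no NOT). Everything about the encoding is an assumption for illustration: atoms are tuples `(predicate, arg, ...)`, strings that start with an uppercase letter are variables, and the sample rules are a simplified, position-free version of the userDef/unsafe example, with `copy(To,From)` as the only statement form.

```python
def is_var(term):
    """Assumed convention: variables are strings starting with an uppercase letter."""
    return isinstance(term, str) and term[:1].isupper()

def match(atom, fact, binding):
    """Extend binding so that atom equals fact; return None if impossible."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    binding = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding

def apply_rule(head, body, facts):
    """Join the body atoms against facts and instantiate the head."""
    bindings = [{}]
    for atom in body:
        bindings = [b2 for b in bindings for f in facts
                    if (b2 := match(atom, f, b)) is not None]
    return {(head[0],) + tuple(b[t] if is_var(t) else t for t in head[1:])
            for b in bindings}

def naive_eval(rules, edb):
    """Start from the EDB, apply every rule repeatedly until nothing new appears."""
    facts = set(edb)
    while True:
        derived = set()
        for head, body in rules:
            derived |= apply_rule(head, body, facts)
        if derived <= facts:
            return facts
        facts |= derived

# Hypothetical, simplified rules: unsafe(S) :- userDef(S)
#                                 unsafe(V) :- copy(V,W) & unsafe(W)
RULES = [(("unsafe", "S"), [("userDef", "S")]),
         (("unsafe", "V"), [("copy", "V", "W"), ("unsafe", "W")])]
EDB = {("userDef", "n"), ("copy", "q", "n"), ("copy", "r", "q"), ("copy", "t", "k")}
print(sorted(f for f in naive_eval(RULES, EDB) if f[0] == "unsafe"))
```

The sketch assumes every head variable also appears in the body (a safe rule), and it re-derives old facts on every round, which is the inefficiency that seminaive evaluation removes.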

19. Outline
  1. Vulnerabilities.
     - Buffer overflow.
     - SQL injection.
  2. Datalog --- review.
  3. Andersen's pointer analysis.
     - Explanation.
     - Expression in Datalog.

20. Andersen's Pointer Analysis
  - Computes Java object references.
  - Includes "where can this string have come from?"
  - Cast of characters:
    1. Local variables, which point to:
    2. Heap objects, which may have fields that are references to other heap objects.

21. Representing Heap Objects
  - A heap object is named by the statement in which it is created.
  - Note many run-time objects may have the same name.
  - Example: h: T v = new T; says variable v can point to (one of) the heap object(s) created by statement h.
  [Diagram: v points to h]

22. Other Relevant Statements
  - v.f = w makes the f field of the heap object h pointed to by v point to what variable w points to.
  [Diagram with nodes v, h, g, w, i and edges labeled f]

23. Other Statements --- (2)
  - v = w.f makes v point to what the f field of the heap object h pointed to by w points to.
  [Diagram with nodes v, h, g, w, i and an edge labeled f]

24. Other Statements --- (3)
  - v = w makes v point to whatever w points to.
    - Interprocedural analysis: the same form also models copying an actual parameter to the corresponding formal parameter, or a return value to a variable.
  [Diagram with nodes v, h, w]

25. EDB Relations
  - The facts about the statements in the program and what they do to pointers are accumulated and placed in several EDB relations.
  - Example: there would be an EDB relation copy(To,From) whose tuples are the pairs (v,w) such that there is a copy statement v=w.
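
As an illustration of how such EDB relations might be collected, the sketch below pattern-matches the four statement forms from slides 21 to 24. Only `copy(To,From)` is named in the slides; the relation names `new`, `store`, and `load`, their argument orders, and the regular expressions are assumptions. The sketch expects calls to have been rewritten already into copy statements for parameters and return values.

```python
import re

# One pattern per statement form; order matters because the copy pattern is the most general.
PATTERNS = [
    ("new",   re.compile(r"^(\w+):\s*\w+\s+(\w+)\s*=\s*new\s+\w+$")),  # h: T v = new T
    ("store", re.compile(r"^(\w+)\.(\w+)\s*=\s*(\w+)$")),              # v.f = w
    ("load",  re.compile(r"^(\w+)\s*=\s*(\w+)\.(\w+)$")),              # v = w.f
    ("copy",  re.compile(r"^(\w+)\s*=\s*(\w+)$")),                     # v = w
]

def extract_edb(statements):
    """Return a dict mapping each (hypothetical) relation name to a set of tuples."""
    edb = {name: set() for name, _ in PATTERNS}
    for stmt in statements:
        text = stmt.strip().rstrip(";")
        for name, pattern in PATTERNS:
            m = pattern.match(text)
            if m:
                groups = m.groups()
                if name == "new":
                    groups = (groups[1], groups[0])   # store as (variable, heap label)
                edb[name].add(groups)
                break
    return edb

# The example program of slide 29, with the call b = p(b) lowered by hand
# into a parameter copy (x = b) and a return-value copy (b = a).
program = ["h: T a = new T;", "a.f = x;", "g: T b = new T;",
           "x = b;", "b = a;", "b = b.f;"]
print(extract_edb(program))
```

Statements that match none of the four forms are silently skipped here; a real front end would work on the compiler's intermediate representation rather than on source text.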

26. Convention for EDB
  - Instead of using EDB relations for the various statement forms, we shall simply use the quoted statement itself to stand for an atom derived from the statement.
  - Example: "v=w" stands for copy(v,w).

27. IDB Relations
  - pts(V,H) will get the set of pairs (v,h) such that variable v can point to heap object h.
  - hpts(H1,F,H2) will get the set of triples (h,f,g) such that the field f of heap object h can point to heap object g.

28. Datalog Rules
  1. pts(V,H) :- "H: V = new T"
  2. pts(V,H) :- "V=W" & pts(W,H)
  3. pts(V,H) :- "V=W.F" & pts(W,G) & hpts(G,F,H)
  4. hpts(H,F,G) :- "V.F=W" & pts(V,H) & pts(W,G)

29. Example
  T p(T x) {
    h: T a = new T;
    a.f = x;
    return a;
  }
  void main() {
    g: T b = new T;
    b = p(b);
    b = b.f;
  }

30. Apply Rules Recursively --- Round 1
  T p(T x) { h: T a = new T; a.f = x; return a; }
  void main() { g: T b = new T; b = p(b); b = b.f; }
  Facts: pts(a,h), pts(b,g)

31. Apply Rules Recursively --- Round 2
  T p(T x) { h: T a = new T; a.f = x; return a; }
  void main() { g: T b = new T; b = p(b); b = b.f; }
  Facts: pts(a,h), pts(b,g), pts(b,h), pts(x,g)

32. Apply Rules Recursively --- Round 3
  T p(T x) { h: T a = new T; a.f = x; return a; }
  void main() { g: T b = new T; b = p(b); b = b.f; }
  Facts: pts(a,h), pts(b,g), pts(x,g), pts(b,h), hpts(h,f,g), pts(x,h)

33. Apply Rules Recursively --- Round 4
  T p(T x) { h: T a = new T; a.f = x; return a; }
  void main() { g: T b = new T; b = p(b); b = b.f; }
  Facts: pts(a,h), pts(b,g), pts(x,g), pts(b,h), pts(x,h), hpts(h,f,g), hpts(h,f,h)
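
The four rules and the four rounds can be replayed with a short script. This is a sketch under stated assumptions: the EDB is written out by hand, the relation names `new`, `load`, and `store` are hypothetical (only `copy` is named in the slides), and the call `b = p(b)` is modeled as the copies `x = b` and `b = a`, following the interprocedural remark on slide 24.

```python
def andersen_rounds(new, copy, load, store):
    """Apply rules 1-4 round by round; return pts, hpts, and the new facts per round.

    new:   {(v, h)}     from "h: T v = new T"
    copy:  {(v, w)}     from "v = w"
    load:  {(v, w, f)}  from "v = w.f"
    store: {(v, f, w)}  from "v.f = w"
    """
    pts, hpts, rounds = set(), set(), []
    while True:
        d_pts = set(new)                                          # rule 1
        d_pts |= {(v, h) for (v, w) in copy
                  for (w2, h) in pts if w == w2}                  # rule 2
        d_pts |= {(v, h) for (v, w, f) in load
                  for (w2, g) in pts if w == w2
                  for (g2, f2, h) in hpts if g == g2 and f == f2}  # rule 3
        d_hpts = {(h, f, g) for (v, f, w) in store
                  for (v2, h) in pts if v == v2
                  for (w2, g) in pts if w == w2}                  # rule 4
        d_pts -= pts
        d_hpts -= hpts
        if not d_pts and not d_hpts:      # no new facts: fixed point
            return pts, hpts, rounds
        pts |= d_pts
        hpts |= d_hpts
        rounds.append((d_pts, d_hpts))

# EDB for the example program of slide 29.
pts, hpts, rounds = andersen_rounds(
    new={("a", "h"), ("b", "g")},
    copy={("x", "b"), ("b", "a")},        # parameter passing and return value
    load={("b", "b", "f")},               # b = b.f
    store={("a", "f", "x")},              # a.f = x
)
for i, (d_pts, d_hpts) in enumerate(rounds, start=1):
    print("round", i, "new pts:", sorted(d_pts), "new hpts:", sorted(d_hpts))
```

Each round uses only the facts known at its start, which is why hpts(h,f,g) appears in round 3 and hpts(h,f,h) in round 4. The statement b = b.f contributes nothing new in this program, because everything it could derive for b is already present.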

34. The Rest of the Story...
  - More complex analysis adds arguments to predicates that distinguish:
    1. Flow --- where in the code you are.
    2. Context --- what is the sequence of functions that appear on the run-time stack.

35. Catching Security Holes
  - Since strings in Java are really reference variables, the analysis (with flow and context) lets us know exactly where a string that is used could have been created.
  - Dangerous ones were read from the user.

36. Computational Complexity
  - Analysis (with flow and context) for million-line programs is beyond the state of the art.
  - Whaley and Lam show efficiency enhancements from:
    1. Seminaive (incremental) evaluation of Datalog.
    2. Binary Decision Diagrams (BDDs).
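
The idea behind seminaive evaluation can be shown on the path rules of slide 13. The sketch below is a plain-Python illustration of the general technique, not Whaley and Lam's implementation; the function name `seminaive_paths` and the set-of-pairs representation are assumptions. Each round joins only the facts discovered in the previous round (the delta) against the full relation, instead of re-joining everything.

```python
def seminaive_paths(edges):
    """Transitive closure where each round only joins the newest facts."""
    path = set(edges)     # rule 1: path(X,Y) :- edge(X,Y)
    delta = set(edges)    # facts that were new in the previous round
    while delta:
        # Rule 2 is nonlinear (path appears twice in the body), so the new
        # facts may sit in either subgoal: join delta with path, and path with delta.
        candidates = {(x, y) for (x, z) in delta
                      for (z2, y) in path if z == z2}
        candidates |= {(x, y) for (x, z) in path
                       for (z2, y) in delta if z == z2}
        delta = candidates - path     # keep only the genuinely new facts
        path |= delta
    return path

print(sorted(seminaive_paths({(1, 2), (2, 3), (3, 4)})))
```

Any derivation that uses only old facts was already tried in an earlier round, so skipping it loses nothing; the result is the same relation that the naive loop computes.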

37. Research Suggestions
  1. Scaling up to "Windows"-sized programs.
  2. Handle weakly typed languages (e.g. C) where anything can be a pointer.
  3. Framing defense against new attacks as data-flow problems.

