Mining Specifications: (lots of) code -> specifications of correctness
Glenn Ammons (Univ. of Wisconsin), Ras Bodík (Univ. of Wisconsin), Jim Larus (Microsoft Research)



2 Motivation: why specifications?
Verification tools find bugs early, make guarantees, and scale with programs, but they need specifications.
[Diagram: program + specifications -> verifier -> Bugs!]

3 Language-usage specifications
Examples: array accesses, memory allocation, type safety, ...
Easy to write, big payoff.
[Diagram: program -> verifier -> Bugs!]

4 Library-usage specifications
Examples: cut-and-paste (X11), network server (socket API), device drivers (kernel API), ...
Harder to write, smaller payoff.
[Diagram: program -> verifier -> Bugs!]

5 Program specifications
Examples: symbol table well-formed, IR well-formed, ...
Hardest to write, smallest payoff.
[Diagram: program -> verifier -> Bugs!]

6 Solution: specification mining
Specification mining gleans specifications from artifacts of program development:
From programs (static)?
From executions of test cases (dynamic)?
From other artifacts?

7 Mining from traces
Advantages:
No infeasible paths
Pointer/alias analysis is easy
Few bugs, as program passes its tests
Common behavior is correct behavior
Example trace:
...
socket(domain = 2, type = 1, proto = 0, return = 7)
accept(so = 7, addr = 0x40, addr_len = 0x50, return = 8)
write(so = 8, buf = 0x100, len = 23, return = 23)
read(so = 8, buf = 0x100, len = 12, return = 12)
close(so = 8, return = 0)
close(so = 7, return = 0)
...
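To make the trace format above concrete, here is a minimal sketch of a parser for one trace line. The `name(attr = value, ...)` syntax is copied from the slide; the function name `parse_event` and the choice to store values as integers are assumptions made for illustration, not part of the authors' tool.

```python
import re

# Assumed line format, copied from the slide: name(attr = value, ...)
EVENT_RE = re.compile(r"^\s*(\w+)\((.*)\)\s*$")

def parse_event(line):
    """Parse 'socket(domain = 2, return = 7)' into (name, {attr: int})."""
    match = EVENT_RE.match(line)
    if match is None:
        raise ValueError(f"not a trace event: {line!r}")
    name, body = match.groups()
    attrs = {}
    for field in filter(None, (f.strip() for f in body.split(","))):
        key, value = (part.strip() for part in field.split("="))
        attrs[key] = int(value, 0)  # base 0 accepts both 23 and 0x40
    return name, attrs

trace = [
    "socket(domain = 2, type = 1, proto = 0, return = 7)",
    "accept(so = 7, addr = 0x40, addr_len = 0x50, return = 8)",
    "close(so = 8, return = 0)",
]
events = [parse_event(line) for line in trace]
```

Each event becomes a `(name, attributes)` pair, which is a convenient shape for steps such as dependence annotation.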

8 Output: a specification
Specification says what programs should do:
Temporal dependences (accept follows socket)
Data dependences (accept input is socket output)
[Automaton diagram with labels: start, socket(return = X), accept(so = X, return = Y), read(so = Y), write(so = Y), close(so = Y), close(so = X), end]

9 How we mine specifications
[Pipeline diagram: Traces -> extract scenarios -> Scenarios (dep. graphs) -> standardize -> Strings -> PFSA learner -> PFSA -> postprocess -> Specification. The traces are lists of calls such as socket(domain = 2, type = 1, proto = 0, return = 7); the scenarios are socket(...) accept(...) read(...) write(...) close(...); the strings are sequences of letters; the PFSA has weighted edges; the specification is the automaton from slide 8.]

10 Outline of the talk
The specification mining problem
Our specification mining system
  Annotating traces with dependences
  Extracting and standardizing scenarios
  Probabilistic learning and postprocessing
Experimental results
Related work

11 An impossible problem
Find a Turing machine that generates C, given T.
[Diagram: sets I (all traces), C (all correct traces), T (training traces)]
Unsolvable:
No restrictions on C
No connection between C and T
Simple variants are also undecidable [Gold67]

12-14 A simpler problem
Find a PFSA that generates an approximation of P.
[Plot: P, a probability distribution over all scenarios; the probability axis runs from 0 to 1; correct scenarios are separated from noise]
Tractable, plus:
Scenarios are small
Noise handled
Finite-state
Weights useful for postprocessing

15 Outline of the talk
The specification mining problem
Our specification mining system
  Annotating traces with dependences
  Extracting and standardizing scenarios
  Probabilistic learning and postprocessing
Verifying traces
Experimental results
Related work

16 Dependence annotation
[Diagram: Traces -> dependence annotator -> Annotated traces]
socket(domain = 2, type = 1, proto = 0, return = 7)
accept(so = 7, addr = 0x40, addr_len = 0x50, return = 8)
write(so = 8, buf = 0x100, len = 23, return = 23)
close(so = 8, return = 0)
close(so = 7, return = 0)

17-18 Dependence annotation
Definers: socket.return, accept.return, close.so
Users: accept.so, read.so, write.so, close.so
[Same diagram and trace as slide 16]
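The definer and user lists above suggest a simple way to link events. The following is a simplified sketch, not the authors' annotator: it links each use or redefinition of a value to the most recent event that defined it. The tuple encoding of events and the function name `annotate` are assumptions.

```python
# Attribute roles copied from the slide; the event encoding is an assumption.
DEFINERS = {("socket", "return"), ("accept", "return"), ("close", "so")}
USERS = {("accept", "so"), ("read", "so"), ("write", "so"), ("close", "so")}

def annotate(trace):
    """Return dependence edges (earlier_index, later_index, value).

    An edge links the most recent definer of a value to a later event
    that uses or redefines that value.
    """
    last_def = {}  # value -> index of the latest event that defined it
    edges = []
    for i, (name, attrs) in enumerate(trace):
        linked = set()
        for attr, value in attrs.items():
            role = (name, attr)
            if (role in USERS or role in DEFINERS) and value in last_def:
                if value not in linked:
                    edges.append((last_def[value], i, value))
                    linked.add(value)
        # Record definitions only after the uses of this event are linked.
        for attr, value in attrs.items():
            if (name, attr) in DEFINERS:
                last_def[value] = i
    return edges

trace = [
    ("socket", {"domain": 2, "type": 1, "proto": 0, "return": 7}),
    ("accept", {"so": 7, "addr": 0x40, "addr_len": 0x50, "return": 8}),
    ("write", {"so": 8, "buf": 0x100, "len": 23, "return": 23}),
    ("close", {"so": 8, "return": 0}),
    ("close", {"so": 7, "return": 0}),
]
edges = annotate(trace)
```

Attributes that appear in neither list, such as `write.return`, never create edges, which is why the value 23 is ignored here.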


20-22 Extracting scenarios
[Diagram: Annotated traces + Seeds -> scenario extractor -> Abstract scenarios]
socket(domain = 2, type = 1, proto = 0, return = 7)
accept(so = 7, addr = 0x40, addr_len = 0x50, return = 8)
write(so = 8, buf = 0x100, len = 23, return = 23)
close(so = 8, return = 0)
close(so = 7, return = 0)

23 Simplifying scenarios
[Diagram: Annotated traces + Seeds -> scenario extractor -> Abstract scenarios]
socket(domain = 2, type = 1, proto = 0, return = 7) [seed]
accept(so = 7, addr = 0x40, addr_len = 0x50, return = 8)
write(so = 8, buf = 0x100, len = 23, return = 23)
close(so = 8, return = 0)
close(so = 7, return = 0)

24 Simplifying scenarios
Drops attributes not used in dependences.
socket(return = 7) [seed]
accept(so = 7, return = 8)
write(so = 8)
close(so = 8)
close(so = 7)
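The simplification step can be sketched as a filter over attributes. This is an illustrative sketch only: the table `DEPENDENCE_ATTRS` is assembled from the definer and user lists on the dependence annotation slides, and the function name `simplify` is hypothetical.

```python
# Attributes that can carry a dependence (definers and users on the slides).
DEPENDENCE_ATTRS = {
    "socket": {"return"},
    "accept": {"so", "return"},
    "read": {"so"},
    "write": {"so"},
    "close": {"so"},
}

def simplify(scenario):
    """Keep only the attributes that can carry a dependence."""
    return [
        (name, {a: v for a, v in attrs.items()
                if a in DEPENDENCE_ATTRS.get(name, set())})
        for name, attrs in scenario
    ]

scenario = [
    ("socket", {"domain": 2, "type": 1, "proto": 0, "return": 7}),
    ("accept", {"so": 7, "addr": 0x40, "addr_len": 0x50, "return": 8}),
    ("write", {"so": 8, "buf": 0x100, "len": 23, "return": 23}),
    ("close", {"so": 8, "return": 0}),
    ("close", {"so": 7, "return": 0}),
]
simplified = simplify(scenario)
```

The result matches the simplified scenario shown on slide 24.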

25 Standardizing scenarios
[Diagram: Simplified scenarios (equivalent scenarios) -> Standardization -> Abstract scenarios]
Two transformations:
Naming: foo(val = 7) -> foo(val = X)
Reordering: foo(); bar(); -> bar(); foo();
Finds the least standardized scenario, in lexicographic order.

26 Standardizing scenarios
Use-def and def-def dependences
socket(return = 7) [seed]
accept(so = 7, return = 8)
write(so = 8)
read(so = 8)
close(so = 8)
close(so = 7)

27 Standardizing scenarios: Reorder
socket(return = 7) [seed]
accept(so = 7, return = 8)
read(so = 8)
write(so = 8)
close(so = 8)
close(so = 7)

28 Standardizing scenarios: Reorder, Name
socket(return = X) [seed]
accept(so = X, return = Y)
read(so = Y)
write(so = Y)
close(so = Y)
close(so = X)
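The naming transformation can be sketched as a substitution of concrete values by variables in order of first appearance. This is a minimal sketch under that assumption; the slides do not say how names are assigned, and `fresh_names` and `name_values` are hypothetical helpers.

```python
from itertools import count

def fresh_names():
    """Yield X, Y, Z, then V0, V1, ... (the naming scheme is an assumption)."""
    yield from "XYZ"
    for i in count():
        yield f"V{i}"

def name_values(scenario):
    """Replace each concrete value with a variable, in order of first use."""
    names = fresh_names()
    mapping = {}
    named = []
    for call, attrs in scenario:
        new_attrs = {}
        for attr, value in attrs.items():
            if value not in mapping:
                mapping[value] = next(names)
            new_attrs[attr] = mapping[value]
        named.append((call, new_attrs))
    return named

reordered = [
    ("socket", {"return": 7}),
    ("accept", {"so": 7, "return": 8}),
    ("read", {"so": 8}),
    ("write", {"so": 8}),
    ("close", {"so": 8}),
    ("close", {"so": 7}),
]
abstract = name_values(reordered)
```

Two scenarios that differ only in descriptor numbers map to the same abstract scenario, which is what lets the learner treat them as the same string.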

29 Standardizing scenarios
Each interaction is a letter to the PFSA learner.
String: A B D E F G
socket(return = X) [seed]
accept(so = X, return = Y)
read(so = Y)
write(so = Y)
close(so = Y)
close(so = X)


31 PFSA learning
[Diagram: Abstract scenarios -> automaton learner -> Specification]
Algorithm due to Raman et al.:
1. Build a weighted retrieval tree
2. Merge similar states

32-34 PFSA learning (animation)
[Figure: a weighted retrieval tree over the letters A to G with edge weights 100, 99, 100, 99 and 1. In slide 32 two branches end in separate G edges; in slide 33 the G edges carry weights 1 and 99; in slide 34 they are merged into a single G edge with weight 100.]
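Step 1 of the learning algorithm can be sketched as a prefix tree whose edges count how many training strings pass through them. This sketch covers only the tree construction; the state-merging step of Raman et al. is not shown. The nested-dictionary layout and the example strings are assumptions, not data from the talk.

```python
def build_retrieval_tree(strings):
    """Build a prefix tree whose edges count the strings passing through."""
    root = {}
    for s in strings:
        node = root
        for letter in s:
            child = node.setdefault(letter, {"count": 0, "next": {}})
            child["count"] += 1
            node = child["next"]
    return root

# Hypothetical training set: 99 common scenarios and 1 rare one.
training = ["ABG"] * 99 + ["ACG"]
tree = build_retrieval_tree(training)
```

The rare branch through `C` keeps a weight of 1, so a later pass can recognise it as infrequent.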

35 Postprocessing: coring
[Diagram: Abstract scenarios -> automaton learner -> Specification]
1. Remove infrequent transitions
2. Convert PFSA to NFA
[Figure: the merged PFSA over A to G with edge weights 100, 99, 100, 99, 1 and 100]

36 Postprocessing: coring
[Figure: the same automaton after coring, with the weights dropped]
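The two coring steps can be sketched as a filter followed by discarding the weights. This is a sketch under stated assumptions: the PFSA is encoded as a dictionary from `(source, letter, target)` to weight, and "infrequent" is decided by a fixed threshold `min_weight`. The slides do not specify the cut-off rule, and the example automaton is hypothetical.

```python
def core(pfsa, min_weight):
    """Drop transitions lighter than min_weight, then drop the weights.

    pfsa maps (source, letter, target) to a weight; the result is the
    transition set of an NFA.
    """
    return {edge for edge, weight in pfsa.items() if weight >= min_weight}

# Hypothetical PFSA with one rare transition.
pfsa = {
    ("s0", "A", "s1"): 100,
    ("s1", "B", "s2"): 99,
    ("s1", "C", "s2"): 1,
    ("s2", "G", "s3"): 100,
}
nfa = core(pfsa, min_weight=5)
```

A fuller version would also prune states that become unreachable once their incoming transitions are removed.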


38 Where to find bugs?
In programs (static verification)?
Or in traces (dynamic verification)?

39 How we verify specifications
[Pipeline diagram: Traces -> extract scenarios -> Scenarios (dep. graphs) -> standardize -> Strings -> check automaton membership]

40 Verifying traces
Specification:
socket(return = X) [seed]
accept(so = X, return = Y)
read(so = Y)
write(so = Y)
close(fd = Y)
close(fd = X)
First trace, OK (both sockets closed):
... socket(return = 7) accept(so = 7, return = 8) write(so = 8) read(so = 8) close(so = 8) close(so = 7) ...
Second trace, Bug! (socket 7 not closed):
... socket(return = 7) accept(so = 7, return = 8) write(so = 8) read(so = 8) close(so = 8) ...
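Checking a standardized scenario against the specification amounts to an NFA membership test. The sketch below assumes a letter encoding (A = socket, B = accept, C = read, D = write, E = close of the accepted socket, F = close of the listening socket) and an automaton shape read off the specification diagram; both are illustrative assumptions.

```python
def accepts(nfa, start, finals, word):
    """Return True if the NFA accepts the word (no epsilon transitions)."""
    current = {start}
    for letter in word:
        current = {dst for (src, sym, dst) in nfa
                   if src in current and sym == letter}
        if not current:
            return False
    return bool(current & finals)

# Assumed specification: socket, accept, any reads/writes, close Y, close X.
SPEC = {
    ("s0", "A", "s1"),
    ("s1", "B", "s2"),
    ("s2", "C", "s2"),
    ("s2", "D", "s2"),
    ("s2", "E", "s3"),
    ("s3", "F", "s4"),
}

ok_trace = "ABDCEF"    # write, read, then both sockets closed
buggy_trace = "ABDCE"  # the listening socket is never closed
ok_result = accepts(SPEC, "s0", {"s4"}, ok_trace)
buggy_result = accepts(SPEC, "s0", {"s4"}, buggy_trace)
```

The second string stops in a non-final state, which corresponds to the "socket 7 not closed" bug on the slide.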

41 Experimental results
Attempted to mine and verify two published X11 rules.
Challenge: small, buggy training sets (16 programs).

42 Learning by trial and error
Start with a rule learned from one trusted trace. Then:
1. Randomly select an unused trace.
2. Trace obeys rule? If yes, go back to step 1.
3. If no, ask the expert: is the trace buggy?
   yes: report bug.
   no (rule too specific): add trace to training set; learn a new rule.
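The trial-and-error loop can be sketched as follows. The learner, the rule check and the expert are passed in as functions because the slide does not define their interfaces; every name here (`mine_by_trial_and_error`, `learn`, `obeys`, `is_buggy`) is hypothetical, and the toy stand-ins at the bottom exist only to exercise the control flow.

```python
import random

def mine_by_trial_and_error(trusted, unused, learn, obeys, is_buggy, rng=random):
    """Grow the training set until every unused trace has been visited."""
    training = [trusted]
    rule = learn(training)
    bugs = []
    pool = list(unused)
    rng.shuffle(pool)  # visiting a shuffled pool stands in for random selection
    for trace in pool:
        if obeys(rule, trace):
            continue                 # no inspection needed
        if is_buggy(trace):          # expert judgment
            bugs.append(trace)
        else:                        # rule too specific
            training.append(trace)
            rule = learn(training)
    return rule, bugs

# Toy stand-ins: a "rule" is just the set of traces seen so far.
rule, bugs = mine_by_trial_and_error(
    trusted="ab",
    unused=["ab", "ac", "bad1"],
    learn=lambda traces: set(traces),
    obeys=lambda rule, trace: trace in rule,
    is_buggy=lambda trace: trace.startswith("bad"),
)
```

Only traces that violate the current rule reach the expert, which is how the process reduces the number of traces that need inspection.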

43 Results
1. A timestamp-passing rule
   4 traces did not need inspection
   learned the rule! (compact: 7 states)
   bugs in 2 out of 16 programs (ups, e93)
   English specification was incomplete (3 traces)
   expert and corer agreed on 81% of the hot core
2. SetOwner(x) must be followed by GetSelection(x)
   failed to learn the rule (very small learning set)
   but bugs in 2 out of 5 programs (xemacs, ups)


45 Related work
Arithmetic pre/post conditions
  Daikon [Ernst et al], Houdini [Flanagan and Leino]
  properties orthogonal from us
  eventually, we may need to include and learn some arithmetic relationships
Temporal relationships over calls
  intrusion detection: [Ghosh et al], [Wagner and Dean]
  software processes: [Cook and Wolf]
  error checking: [Engler et al SOSP 2001]
    lexical and syntactic pattern matching
    user must write templates (e.g., always follows )
  design patterns: [Reiss and Renieris]

46 Conclusion
Introduced specification mining, a new approach for learning correctness specifications.
Refined the problem into a problem of probabilistic learning from traces.
Developed and demonstrated a practical specifications miner.

47 End of talk

48 How we mine specifications
[Pipeline diagram: Program -> tracer -> Instrumented program; Instrumented program + Test inputs -> run -> Traces -> dependence annotator -> Annotated traces]
...
socket(domain = 2, type = 1, proto = 0, return:T0 = 7)
[SETUP socket:T0 7]
accept(so:T0 = 7, addr = 0x40, addr_len = 0x50, return:T0 = 8)
[USE socket:T0 8]
close(so:T0 = 8, return = 0)
close(so:T0 = 7, return = 0)
...

49-50 How we mine specifications
[Diagram, completed on slide 50: Program -> tracer -> Instrumented program]
Program:
int s = socket(AF_INET, SOCK_STREAM, 0);
[DO SETUP]
while(cond1) {
    int ns = accept(s, &addr, &len);
    while(cond2) {
        [USE NS]
        if (cond3) return;
    }
    close(ns);
}
close(s);

51-52 How we mine specifications
[Pipeline diagram: Program -> tracer -> Instrumented program; Instrumented program + Test inputs -> run -> Traces; slide 52 adds: Traces -> dependence annotator -> Annotated traces]
...
socket(domain = 2, type = 1, proto = 0, return = 7)
[SETUP socket 7]
accept(so = 7, addr = 0x40, addr_len = 0x50, return = 8)
[USE socket 8]
close(so = 8, return = 0)
close(so = 7, return = 0)
...

53 How we mine specifications
[Pipeline diagram, extended: Annotated traces + Scenario seeds -> scenario extractor -> Abstract scenarios]
socket(return = X) [seed]
[SETUP socket X]
accept(so = X, return = Y)
[USE socket Y]
close(so = Y)
close(so = X)

54 How we mine specifications
[Pipeline diagram, extended: Abstract scenarios -> automaton learner -> Specification]
[Automaton labels: socket(return = X) [seed], [SETUP X], accept(so = X, return = Y), [USE Y], close(fd = Y), close(fd = X)]

55-56 Reducing the problem
The problem: find an automaton that generates C, given T.
[Diagram: sets I (all traces), C (all correct traces), T (training traces)]
Issues:
What if C is not r.e.?
Checkers and learners need finite specs.

57 Reducing the problem
The problem: find an automaton that generates C, given T.
Assume that C is regular.
Issue: What if the program is not regular?
[Diagram: sets I (all traces), C (all correct traces, regular), T (training traces); thumbnail of the earlier step: Unrestricted]

58 Reducing the problem
The problem: find an automaton that generates C_S, given T_S.
Assume that the size of scenarios is bounded.
Issue: No connection between C_S and T_S!
[Diagram: sets I_S (all scenarios, bounded size), C_S (all correct scenarios, regular), T_S (training scenarios); thumbnails of earlier steps: Unrestricted, Regular]

59 Reducing the problem
The problem: find an automaton that generates C_S, given T_S.
Assume that T_S presents each element of C_S at least once.
Issue: Undecidable [Gold67]
[Diagram: sets I_S (all scenarios, bounded size), C_S (all correct scenarios, regular), T_S = c_0, c_1, ...; thumbnails of earlier steps: Unrestricted, Regular, Scenarios]

60 Reducing the problem
The problem: find a PFSA that generates P', where P and P' are close (by some distance metric).
Assume P is generated by a PFSA.
[Plot: P, a probability distribution over I_S (all scenarios), generated by a PFSA, on an axis from 0 to 1; thumbnails of earlier steps: Unrestricted, Regular, Scenarios, Complete presentation]

61 Digression: postprocessing
PFSA = NFA with weights
Specification = NFA
Convert PFSA to specification:
1. Find hot core (that is, drop noise)
   drop infrequent scenarios
   drop infrequent parts of scenarios
2. Drop weights

62-67 Preparing input traces
Slide 62 starts with type variables T0 to T5:
socket(domain = 2, type = 1, proto = 0, return:T0 = 7)
[SETUP socket:T1 7]
accept(so:T2 = 7, addr = 0x40, addr_len = 0x50, return:T3 = 8)
[USE socket:T4 8]
close(so:T5 = 8, return = 0)
close(so:T5 = 7, return = 0)
Slides 63 to 66 rewrite one type variable per slide: T1 becomes T0, T2 becomes T0, T5 becomes T0, then T4 becomes T3.
Slide 67 ends with the single type T0:
socket(domain = 2, type = 1, proto = 0, return:T0 = 7)
[SETUP socket:T0 7]
accept(so:T0 = 7, addr = 0x40, addr_len = 0x50, return:T0 = 8)
[USE socket:T0 8]
close(so:T0 = 8, return = 0)
close(so:T0 = 7, return = 0)

68-70 Extracting scenarios
[Diagram: Annotated traces + Seeds -> scenario extractor -> Abstract scenarios]
socket(domain = 2, type = 1, proto = 0, return:T0 = 7)
[SETUP socket:T0 7]
accept(so:T0 = 7, addr = 0x40, addr_len = 0x50, return:T0 = 8)
[USE socket:T0 8]
close(so:T0 = 8, return = 0)
close(so:T0 = 7, return = 0)

71 Simplifying scenarios
socket(domain = 2, type = 1, proto = 0, return:T0 = 7) [seed]
[SETUP socket:T0 7]
accept(so:T0 = 7, addr = 0x40, addr_len = 0x50, return:T0 = 8)
[USE socket:T0 8]
close(so:T0 = 8, return = 0)
close(so:T0 = 7, return = 0)

72 Simplifying scenarios
Drop untyped attributes.
socket(return:T0 = 7) [seed]
[SETUP socket:T0 7]
accept(so:T0 = 7, return:T0 = 8)
[USE socket:T0 8]
close(so:T0 = 8)
close(so:T0 = 7)

73 Standardizing scenarios
Standardization puts equivalent scenarios into a canonical abstract form.
[Diagram: Simplified scenarios (equivalent scenarios) -> Standardization -> Abstract scenarios]
A search using two transformations:
Naming: foo(val = 7) -> foo(val = X)
Reordering: foo(); bar(); -> bar(); foo();

74 Standardizing scenarios
socket(return:T0 = 7) [seed]
[SETUP socket:T0 7]
accept(so:T0 = 7, return:T0 = 8)
[USE Y]
close(so:T0 = 8)
close(so:T0 = 7)

75 Standardizing scenarios
socket(return:T0 = 7) [seed]
[SETUP socket:T0 7]
accept(so:T0 = 7, return:T0 = 8)
write(so:T0 = 8)
read(so:T0 = 8)
close(so:T0 = 8)
close(so:T0 = 7)

76 Standardizing scenarios: Reorder
socket(return:T0 = 7) [seed]
[SETUP socket:T0 7]
accept(so:T0 = 7, return:T0 = 8)
read(so:T0 = 8)
write(so:T0 = 8)
close(so:T0 = 8)
close(so:T0 = 7)

77 Standardizing scenarios: Reorder, Name
socket(return:T0 = X) [seed]
[SETUP socket:T0 X]
accept(so:T0 = X, return:T0 = Y)
read(so:T0 = Y)
write(so:T0 = Y)
close(so:T0 = Y)
close(so:T0 = X)

78 Standardizing scenarios
Each interaction is a letter to the PFSA learner.
String: A B C D E F G
[Same scenario as slide 77]

79 Coring
Coring removes PFSA transitions that occur infrequently and converts the PFSA into an NFA.
[Diagram: Abstract scenarios -> automaton learner -> Specification]
[Automaton labels: socket(return = X) [seed], [SETUP X], accept(so = X, return = Y), [USE Y], close(fd = Y), close(fd = X)]

80 Verification
Do all traces of a program P satisfy a specification A?

81 Verification
Do all traces of a program P satisfy a specification A?
Does a trace T satisfy A?
Definition: T satisfies A if every seed in T is surrounded by a scenario that satisfies A.

82 Verification
Do all traces of a program P satisfy a specification A?
Does a trace T satisfy A?
Does a scenario S satisfy A?
[Diagram labels: Language of A; Abstract scenarios satisfying A; Simplified scenarios satisfying A; Concrete scenarios satisfying A; Standardization; Simplification; S?]

83 Experiments
What we wanted to find out:
Hypothesis 1: the process will find bugs and reduce the number of traces that the expert must inspect.
Hypothesis 2: the miner's final specification will match the English rule.
Hypothesis 3: the corer and the human will agree on the hot core.
Gathered traces from 16 programs: 5 programs in the X11 distribution and 11 contributed programs.

84 Testing vs. verification
testing: [diagram labels: input, program, "is the output correct?"]
verification: [diagram labels: program, property, checker, "does property hold?"]
[Other labels: X11, sockets]
sample properties: allocated memory is freed. locks are released. …

85 Testing vs. verification
Completeness ("coverage"): verification (if sound) guarantees that program contains no bugs of a well-specified class.

           testing   verification
aspects    all       some
control    some      all
data       some      all

[Label on the table: our focus]

86 Verification: recent successes
Recent successes:
specification languages: temporal logics, automata, …
abstractors: SLAM, FeaVer
checkers: model checking, theorem proving, type systems
What's still missing?
[Diagram labels: program (L1), abstractor, abstract program (L2), property, checker, "property holds?", formal specification of correctness, "? specifications"]

87 So who formulates specifications?
Programmers? Probably not.
Why they won't:
  too busy: yet another language to learn?
  specifications aren't cool.
  specification languages are hard: LTL, anyone?
Why they shouldn't:
  may misunderstand usage rules.
  may not know all usage rules.
Mining Specifications:
  Convenient and easy: anyone can do it.
  Like in data mining, discover surprise rules.

88 Advantages of mining
Exploits the massive programmers' effort reflected in the code.
  Programmers resolved many problems:
    incomplete system requirements.
    incomplete API documentation.
    implementation-dependent API rules.
  Want redundancy? (without redundant programming)
    ask multiple programmers (and vote).
Exploits the testers' effort in devising test inputs.

89 Our output: a specification
[Automaton labels: x = socket(), bind(x), listen(x), y = accept(x), read(y), write(y), close(y), close(x)]

90 How do we mine?
Underlying premise: Even bad software is debugged enough to show hints of correct behavior.
Maxim: Common usage is the correct usage.

91 Mining = machine learning
Reduce the problem into the well-known problem of learning regular languages.
Obstacles:
1. source code is too detailed and hard to analyze
2. what is "common" behavior?
Solutions:
1. learn from dynamic behavior
2. learn probabilistically
-> learn from traces into probabilistic FSMs

92 Input: trace(s)
7 = socket(2, 1, 0);
bind(7, 0x400120, 16);
listen(7, 5);
8 = accept(7, 0x400200, 0x400240);
read(8, 0x400320, 255);
write(8, 0x400320, 12);
read(8, 0x400320, 255);
write(8, 0x400320, 7);
close(8);
10 = accept(7, 0x400200, 0x400240);
read(10, 0x400320, 255);
write(10, 0x400320, 13);
close(10);
close(7);
…
[The slide repeats this trace three times next to the specification automaton from slide 89.]

93 The mining algorithm
dynamic execution (traces)
-> trace abstraction -> usage scenarios (strings)
-> (off-the-shelf) RegExp learner -> generalized scenarios (probabilistic FSA)
-> user: extract heavy core (and approve) -> specification (NFA)
specification (NFA) + dynamic exe. to be checked (trace) -> dynamic checker -> OK/bug

94 Trace abstraction: 4 challenges
1. Traces interleave useful and useless events.
   sockets created by accept are independent, …
2. Specifications must include both temporal and value-flow constraints.
3. Only some of API calls' arguments impose "true" dependences.
   accept does not alter the state of the bound socket, …
4. Specifications may impose only partial order.
   filling in fields of a structure before a call, …

95 Finding dependences
7 = socket(2, 1, 0);
bind(7, 0x400120, 16);
listen(7, 5);
8 = accept(7, 0x400200, 0x400240);
read(8, 0x400320, 255);
write(8, 0x400320, 12);
read(8, 0x400320, 255);
write(8, 0x400320, 7);
close(8);
10 = accept(7, 0x400200, 0x400240);
read(10, 0x400320, 255);
write(10, 0x400320, 13);
close(10);
close(7);
…
Some args and return values are handles to data structures.
Calls may:
  write through the handle
  read through the handle
  read and write
Def-use dependences connect writers to readers.

96 Trace abstraction
[Animation frames, in the order they appear in the transcript:]
h(_, ) a(, ) d(, ) b(_, ) e( )
h(3, 5) c(10) a(4, 5) d(4, 7) b(0, 5) f(10) h(8, 11) e(7) f(50) d(15, 1) c(7) a(9, 11) b(6, 7) d(9, 14) f(20) e(7) …
h(_, X) a(Y, X) b(_, X) d(Y, Z) e(Z)
h(_, X) a(Y, X) b(_, X) d(Y, Z) e(Z)
h(_, 5) c(10) a(4, 5) d(4, 7) b(_, 5) f(10) h(_, 11) e(7) f(_) d(_, _) c(7) a(9, 11) b(_, 11) d(9, _) e(_) f(_) …
h(_, X) a(Y, X) d(Y, Z) b(_, X) e(Z)
h(_, X) a(Y, X) b(_, X) d(Y, Z)

97 The output PFSA
[Figure: a PFSA over h(_, X), a(Y, X), b(_, X), d(Y, Z), e(Z) and a second d(Y, Z) edge, with edge weights 2, 2, 2, 1, 1 and 1]

98 Renaming and reordering the chop
Outline of the algorithm
input: a chop (a dag of data dependences)
output: the canonical chop
1. reorder: list all possible chop schedules
   trick: only list those with calls in lexicographic order
2. rename: abstract arguments in each schedule
3. select lexicographically least schedule
lexicographic order:
a(…) b(…) < b(…) b(…)
a(X) b(…) < a(Y) b(…)
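The three steps above can be sketched with a brute-force search. This sketch enumerates every permutation and filters by the dependences, so it does not implement the slide's trick of listing only schedules with calls in lexicographic order. The event encoding, the dependence set and the variable names `X0`, `X1` are assumptions made for illustration.

```python
from itertools import permutations

def rename(schedule):
    """Abstract concrete values into X0, X1, ... in order of first use."""
    mapping = {}
    out = []
    for call, args in schedule:
        named = tuple(mapping.setdefault(v, f"X{len(mapping)}") for v in args)
        out.append((call, named))
    return tuple(out)

def canonical_chop(events, deps):
    """events: list of (call, args); deps: pairs (i, j), i must precede j."""
    best = None
    for order in permutations(range(len(events))):
        position = {e: p for p, e in enumerate(order)}
        if any(position[i] > position[j] for i, j in deps):
            continue  # not a valid schedule of the dag
        candidate = rename([events[e] for e in order])
        if best is None or candidate < best:
            best = candidate  # keep the lexicographically least schedule
    return best

events = [
    ("socket", (7,)), ("accept", (7, 8)), ("write", (8,)),
    ("read", (8,)), ("close", (8,)), ("close", (7,)),
]
# Assumed dependences: only the order of write and read is left free.
deps = {(0, 1), (1, 2), (1, 3), (2, 4), (3, 4), (4, 5)}
canonical = canonical_chop(events, deps)
```

Because `read` sorts before `write`, the canonical form places the read first regardless of the order in which the two calls occurred in the trace.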

99 Checking: the meaning of the spec
[Specification diagram with labels: a(x), b(x), seed(x), c(x)]
means: whenever seed(x) is executed, it must be preceded by a(x), b(x) and followed by c(x).
does not mean: a(x) must be followed by b(x), seed(x), c(x) (because a is not a seed).

100 Dynamic checking
Used in our experiments.
The checker mirrors the learner:
[Diagram: specification (NFA) + dynamic execution to be checked (trace) -> dynamic checker -> OK/bug]
for each seed in the trace
    extract a chop
    if some substring from chop in NFA
        seed verified!
    else
        extract a larger chop (up to a bound)
    fail if no chop verifies

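101 Static checking
Conversion to a "checkable" specification:
[Diagram: the specification over a(x), b(x), seed(x), c(x) is converted into a checking automaton with transitions labeled a(x), b(x), seed(x), c(x), ^b(x), ^seed(x) and ^c(x) | end, and with states marked OK and bug!]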

102 Related work
Arithmetic pre/post conditions
  Daikon, Houdini
  properties orthogonal from us
  eventually, we may need to include and learn some arithmetic relationships
Temporal relationships over calls
  intrusion detection: [Ghosh et al], [Wagner and Dean]
  software processes: [Cook and Wolf]
  error checking: [Engler et al SOSP 2001]
    lexical and syntactic pattern matching
    user must write templates (e.g., always follows )

103 Ongoing work
Mechanize tool.
Find more gold.

104 Ongoing work
Gathering traces
  Right now: hand-built tracers for stdio and X11
  Ongoing: a tool to generate tracers from header files
Def-use annotations
  Right now: by hand
  Ongoing: tools to help
  Future?: static analysis
Selecting the core
  Right now: by hand and heuristics
  Ongoing: algorithms (lots of ideas, none tried)
Find more bugs
  80MB of gzipped stdio traces

105 Future work
Give gold to jewelers.
[Diagram labels: code, inputs, Mining, specifications, bugs, ?; tools: SPIN, Vault, Verisoft, SLAM, ESP, …]

106 Summary
Semi-automatically formulating well-formed, non-trivial specifications is an important part of the verification tool chain.
Contributions:
  introduced specifications mining
  phrased it as probabilistic learning from dynamic traces
  decomposed it into a sequence of subproblems (using an off-the-shelf learner)
  developed dynamic checker
  found bugs

107 The supply/demand pyramids
[Figure: two pyramids, skill (supply) and effort (demand). Labels: LTL; C; C++; Java; Visual Basic; javascript, html, XML; s/w development; requirements analysis; verification and testing]

