
Mining Specifications: (lots of) code → specifications. Glenn Ammons (Univ. of Wisconsin), Ras Bodík (Univ. of Wisconsin), Jim Larus (Microsoft Research).


2 Mining Specifications: (lots of) code → specifications
Glenn Ammons (Univ. of Wisconsin), Ras Bodík (Univ. of Wisconsin), Jim Larus (Microsoft Research)

3 Verification: beyond engine-less cars
Recent successes: specification languages, checkers, abstractors.
What’s still missing? Specifications. Drivers wanted.

4 So who formulates specifications?
Programmers? Probably not.
Why they won’t:
- too busy;
- yet another language to learn?
- specifications aren’t cool.
Why they shouldn’t:
- may misunderstand usage rules.
- may not know all usage rules.
Mining specifications:
- Convenience.
- As in data mining, discover surprising rules.

5 Advantages of mining
Exploits the massive programmer effort reflected in the code.
Programmers have already resolved many problems:
- incomplete system requirements.
- incomplete API documentation.
- implementation-dependent rules.
Want redundancy (without redundant programming)? Ask multiple programmers (and vote).

6 Our output: a specification
[Figure: state machine over the calls x = socket(), bind(x), listen(x), y = accept(x), read(y), write(y), close(y), close(x).]
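A specification of this kind can be sketched as a small finite-state machine. The slide only lists the calls, so the states and transitions below are an assumption about what the figure shows (a listening socket x that accepts connections y, each read and written any number of times before being closed); the state names and the `accepts` helper are hypothetical.

```python
# Hypothetical encoding of the socket specification as an FSM.
# The transition structure is assumed; the slide only lists the calls.
SPEC = {
    ("start", "x = socket()"): "created",
    ("created", "bind(x)"): "bound",
    ("bound", "listen(x)"): "listening",
    ("listening", "y = accept(x)"): "connected",
    ("connected", "read(y)"): "connected",
    ("connected", "write(y)"): "connected",
    ("connected", "close(y)"): "listening",
    ("listening", "close(x)"): "done",
}
ACCEPTING = {"done"}


def accepts(calls):
    """Return True if the call sequence is a complete run of the FSM."""
    state = "start"
    for call in calls:
        state = SPEC.get((state, call))
        if state is None:  # no transition: the sequence violates the spec
            return False
    return state in ACCEPTING


good = ["x = socket()", "bind(x)", "listen(x)", "y = accept(x)",
        "read(y)", "write(y)", "close(y)", "close(x)"]
leaky = ["x = socket()", "bind(x)", "listen(x)", "y = accept(x)",
         "read(y)", "close(x)"]  # connection y is never closed
print(accepts(good), accepts(leaky))
```

Under these assumed transitions the first sequence is accepted and the second is rejected, because close(x) is not allowed while a connection is still open.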

7 How do we mine?
Underlying premise: even bad software is debugged enough to show hints of correct behavior.
→ Maxim: common usage is the correct usage.

8 Mining = machine learning
Reduce the problem to the well-known problem of learning regular languages.
Obstacles:
1. bugs in the source code may be learned into the specification
2. what is “common” behavior?
Solutions:
1. learn from dynamic behavior
2. learn probabilistically
→ learn from traces into probabilistic FSMs

9 Input: trace(s)
    7 = socket(2, 1, 0);
    bind(7, 0x400120, 16);
    listen(7, 5);
    8 = accept(7, 0x400200, 0x400240);
    read(8, 0x400320, 255);
    write(8, 0x400320, 12);
    read(8, 0x400320, 255);
    write(8, 0x400320, 7);
    close(8);
    10 = accept(7, 0x400200, 0x400240);
    read(10, 0x400320, 255);
    write(10, 0x400320, 13);
    close(10);
    close(7);
    …
[Figure: the specification state machine over x = socket(), bind(x), listen(x), y = accept(x), read(y), write(y), close(y), close(x).]
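To make the input concrete, here is a minimal sketch that parses a trace in the textual form shown on the slide and selects the events that touch one descriptor. The regular expression, the function names `parse_trace` and `events_touching`, and the rule "an event touches a handle if it returns it or takes it as first argument" are all assumptions made for illustration, not the authors' tooling.

```python
import re

# Matches an optional "ret = " prefix, then "name(args);" (assumed trace syntax).
CALL = re.compile(r"(?:(\w+)\s*=\s*)?(\w+)\((.*?)\)\s*;")

TRACE = """
7 = socket(2, 1, 0);
bind(7, 0x400120, 16);
listen(7, 5);
8 = accept(7, 0x400200, 0x400240);
read(8, 0x400320, 255);
write(8, 0x400320, 12);
read(8, 0x400320, 255);
write(8, 0x400320, 7);
close(8);
10 = accept(7, 0x400200, 0x400240);
read(10, 0x400320, 255);
write(10, 0x400320, 13);
close(10);
close(7);
"""


def parse_trace(text):
    """Turn trace text into a list of (name, return_value_or_None, args)."""
    events = []
    for ret, name, args in CALL.findall(text):
        arg_list = [a.strip() for a in args.split(",") if a.strip()]
        events.append((name, ret or None, arg_list))
    return events


def events_touching(events, handle):
    """Keep events that return `handle` or take it as their first argument."""
    return [
        (name, ret, args)
        for name, ret, args in events
        if ret == handle or (args and args[0] == handle)
    ]


events = parse_trace(TRACE)
print([name for name, _, _ in events_touching(events, "8")])
```

Selecting by descriptor separates the two accepted connections (8 and 10) from each other and from the listening socket (7), which is the kind of slicing a per-object specification needs.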

10 The mining algorithm
dynamic execution (traces)
→ trace abstraction → usage scenarios (strings)
→ (off-the-shelf) RegExp learner → generalized scenarios (probabilistic NFA)
→ extract heavy core (and approve) → specification (NFA)
→ dynamic checker, which also takes the dynamic execution to be checked (trace) → OK/bug
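The learner, heavy-core and checker stages can be illustrated with a deliberately simplified stand-in. The sketch below is not the off-the-shelf learner the slide refers to: it assumes a first-order model in which each state is just the previous call, estimates transition probabilities by counting, keeps only transitions above a threshold as the "heavy core", and checks new scenarios against that core. All names and the threshold value are hypothetical.

```python
from collections import Counter, defaultdict

START, END = "<start>", "<end>"


def learn_pfsm(scenarios):
    """Estimate transition probabilities between consecutive calls."""
    counts = defaultdict(Counter)
    for scenario in scenarios:
        prev = START
        for symbol in scenario:
            counts[prev][symbol] += 1
            prev = symbol
        counts[prev][END] += 1
    return {
        state: {nxt: n / sum(out.values()) for nxt, n in out.items()}
        for state, out in counts.items()
    }


def heavy_core(pfsm, threshold):
    """Drop rare transitions; what remains is a non-probabilistic automaton."""
    return {
        state: {nxt for nxt, p in out.items() if p >= threshold}
        for state, out in pfsm.items()
    }


def check(core, scenario):
    """Dynamic check: every step, including termination, must be in the core."""
    prev = START
    for symbol in list(scenario) + [END]:
        if symbol not in core.get(prev, set()):
            return False
        prev = symbol
    return True


# Nine correct scenarios and one buggy scenario that forgets close().
training = [["accept", "read", "write", "close"]] * 9 + [["accept", "read", "write"]]
core = heavy_core(learn_pfsm(training), threshold=0.2)
print(check(core, ["accept", "read", "write", "close"]))
print(check(core, ["accept", "read", "write"]))
```

In this toy run the buggy scenario contributes a transition from write to termination with probability 0.1, which falls below the threshold and is pruned, so the learned core rejects the very bug it was trained on. This mirrors the maxim that common usage is the correct usage.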

11 Trace abstraction: 4 challenges
1. Traces interleave useful and useless events. The RegExp learner cannot separate them.
2. Specifications must include both temporal and value-flow constraints. The RegExp learner is only good with temporal constraints.
3. Only some of the API calls’ arguments impose “true” dependences. It is infeasible to learn value-flow constraints on all arguments.
4. Specifications may impose only a partial order. Encoding all legal partial orders would produce a huge FSM.

12 Trace abstraction
Trace: h(3, 5) c(10) a(4, 5) d(4, 7) b(0, 5) f(10) h(8, 11) e(7) f(50) d(15, 1) c(7) a(9, 11) b(6, 7) d(9, 14) f(20) e(7) …
Trace with some arguments abstracted: h(_, 5) c(10) a(4, 5) d(4, 7) b(_, 5) f(10) h(_, 11) e(7) f(_) d(_, _) c(7) a(9, 11) b(_, 11) d(9, _) e(_) f(_) …
Scenarios:
    h(_, X) a(Y, X) d(Y, Z) b(_, X) e(Z)
    h(_, X) a(Y, X) b(_, X) d(Y, Z) e(Z)
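The step from concrete values to symbolic names can be sketched as follows. The rule used here is an assumption made for illustration: a value is treated as relevant only if it occurs in more than one event of the scenario, relevant values are named X, Y, Z in order of first appearance, and every other argument becomes `_`. The function name `standardize` is hypothetical.

```python
from collections import Counter


def standardize(scenario, names="XYZUVW"):
    """Replace concrete argument values with symbolic names or '_'.

    `scenario` is a list of (call_name, [argument values]).
    Assumption: a value matters only if it appears in two or more events.
    """
    # Count in how many events each value occurs (once per event).
    occurrences = Counter(v for _, args in scenario for v in set(args))
    mapping = {}
    result = []
    for call, args in scenario:
        new_args = []
        for v in args:
            if occurrences[v] < 2:
                new_args.append("_")  # value flows nowhere: ignore it
                continue
            if v not in mapping:
                k = len(mapping)
                mapping[v] = names[k] if k < len(names) else f"V{k}"
            new_args.append(mapping[v])
        result.append(f"{call}({', '.join(new_args)})")
    return result


# Events h(3, 5) a(4, 5) d(4, 7) b(0, 5) e(7), picked out of the slide's trace.
concrete = [("h", [3, 5]), ("a", [4, 5]), ("d", [4, 7]), ("b", [0, 5]), ("e", [7])]
print(" ".join(standardize(concrete)))
```

For these five events the sketch yields h(_, X) a(Y, X) d(Y, Z) b(_, X) e(Z), the first scenario shown on the slide. Renaming by first appearance means two scenarios that use different descriptors but the same pattern of value flow map to the same string, which is what a string learner needs.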

13 Preliminary experiments
Attempted to learn and verify two published X Windows rules. As of Friday:
1. A timestamp-passing rule
   - learned the rule! (compact: 6 states)
   - bugs in 2 out of 17 programs (ups, e93)
2. SetOwner(x) must be followed by GetSelection(x)
   - failed to learn the rule (small learning set)
   - but bugs in 2 out of 5 programs (xemacs, ups)

14 Related work
Arithmetic pre/post conditions
- Daikon, Houdini
- properties orthogonal to ours
- eventually, we may need to include and learn some arithmetic relationships
Temporal relationships over calls
- intrusion detection: [Ghosh et al], [Wagner and Dean]
- software processes: [Cook and Wolf]
- error checking: [Engler et al SOSP 2001]
  - lexical and syntactic pattern matching
  - user must write templates (e.g., always follows )

15 Ongoing work
Mechanize tool. Find more gold.

16 Future work
Give gold to jewelers.
[Figure labels: code, inputs, Mining, specifications, bugs; tools: SPIN, Vault, Verisoft, SLAM, ESP, …, ?]

17 Summary
Semi-automatically creating well-formed, non-trivial specifications is an important part of the verification tool chain.
Contributions:
- introduced specification mining
- phrased it as probabilistic learning from dynamic traces
- decomposed it into a sequence of subproblems (using an off-the-shelf learner)
- developed a dynamic checker
- found bugs

18 Discussion
Expressibility:
- what classes of properties can/should we learn?
- can we learn more than we can check?
- can a single-threaded specification avoid race conditions?

19 Backup Slides

