Presentation is loading. Please wait.

Presentation is loading. Please wait.

A SYSTEMATIC STUDY OF AUTOMATED PROGRAM REPAIR: FIXING 55 OUT OF 105 BUGS FOR $8 EACH Claire Le Goues Michael Dewey-Vogt Stephanie Forrest Westley Weimer.

Similar presentations


Presentation on theme: "A SYSTEMATIC STUDY OF AUTOMATED PROGRAM REPAIR: FIXING 55 OUT OF 105 BUGS FOR $8 EACH Claire Le Goues Michael Dewey-Vogt Stephanie Forrest Westley Weimer."— Presentation transcript:

1 A SYSTEMATIC STUDY OF AUTOMATED PROGRAM REPAIR: FIXING 55 OUT OF 105 BUGS FOR $8 EACH Claire Le Goues Michael Dewey-Vogt Stephanie Forrest Westley Weimer http://genprog.cs.virginia.edu 1

2 Claire Le Goues, ICSE 2012 PROBLEM: BUGGY SOFTWARE http://genprog.cs.virginia.edu “Everyday, almost 300 bugs appear […] far too many for only the Mozilla programmers to handle.” – Mozilla Developer, 2005 Annual cost of software errors in the US: $59.5 billion (0.6% of GDP). Average time to fix a security-critical error: 28 days. 2 90%: Maintenance 10%: Everything Else

3 Claire Le Goues, ICSE 2012 HOW BAD IS IT? http://genprog.cs.virginia.edu 3

4 Claire Le Goues, ICSE 2012 http://genprog.cs.virginia.edu 4

5 Claire Le Goues, ICSE 2012 http://genprog.cs.virginia.edu 5

6 Claire Le Goues, ICSE 2012 Tarsnap: 125 spelling/style 63 harmless 11 minor + 1 major 75/200 = 38% TP rate $17 + 40 hours per TP …REALLY? http://genprog.cs.virginia.edu 6

7 Claire Le Goues, ICSE 2012 Tarsnap: 125 spelling/style 63 harmless 11 minor + 1 major 75/200 = 38% TP rate $17 + 40 hours per TP …REALLY? http://genprog.cs.virginia.edu 7

8 Claire Le Goues, ICSE 2012 …REALLY? http://genprog.cs.virginia.edu 8

9 Claire Le Goues, ICSE 2012 SOLUTION: PAY STRANGERS http://genprog.cs.virginia.edu 9

10 Claire Le Goues, ICSE 2012 SOLUTION: PAY STRANGERS http://genprog.cs.virginia.edu 10

11 Claire Le Goues, ICSE 2012 SOLUTION: AUTOMATE http://genprog.cs.virginia.edu 11

12 Claire Le Goues, ICSE 2012 GENPROG: AUTOMATIC 1, SCALABLE, COMPETITIVE BUG REPAIR. AUTOMATED PROGRAM REPAIR http://genprog.cs.virginia.edu 12 1 C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer, “GenProg: A generic method for automated software repair,” Transactions on Software Engineering, vol. 38, no. 1, pp. 54– 72, 2012. W. Weimer, T. Nguyen, C. Le Goues, and S. Forrest, “Automatically finding patches using genetic programming,” in International Conference on Software Engineering, 2009, pp. 364–367.

13 Claire Le Goues, ICSE 2012 GENPROG: AUTOMATIC 1, SCALABLE, COMPETITIVE BUG REPAIR. AUTOMATED PROGRAM REPAIR http://genprog.cs.virginia.edu 13 1 C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer, “GenProg: A generic method for automated software repair,” Transactions on Software Engineering, vol. 38, no. 1, pp. 54– 72, 2012. W. Weimer, T. Nguyen, C. Le Goues, and S. Forrest, “Automatically finding patches using genetic programming,” in International Conference on Software Engineering, 2009, pp. 364–367.

14 Claire Le Goues, ICSE 2012 GENPROG: AUTOMATIC, SCALABLE, COMPETITIVE BUG REPAIR. AUTOMATED PROGRAM REPAIR http://genprog.cs.virginia.edu 14

15 Claire Le Goues, ICSE 2012 GENPROG: AUTOMATIC, SCALABLE, COMPETITIVE BUG REPAIR. AUTOMATED PROGRAM REPAIR http://genprog.cs.virginia.edu 15

16 Claire Le Goues, ICSE 2012 GENPROG: AUTOMATIC, SCALABLE, COMPETITIVE BUG REPAIR. AUTOMATED PROGRAM REPAIR http://genprog.cs.virginia.edu 16

17 Claire Le Goues, ICSE 2012 INPUT OUTPUT EVALUATE FITNESS DISCARD ACCEPT MUTATE

18 Claire Le Goues, ICSE 2012 DISCARD INPUT EVALUATE FITNESS MUTATE ACCEPT OUTPUT

19 Claire Le Goues, ICSE 2012 Search: random (GP) search through nearby patches. Approach: compose small random edits. Where to change? How to change it? http://genprog.cs.virginia.edu 19 BIRD’S EYE VIEW

20 Claire Le Goues, ICSE 2012 http://genprog.cs.virginia.edu 20 Input: 2 56 1 3 4 8 7 9 1 1010 1212

21 Claire Le Goues, ICSE 2012 http://genprog.cs.virginia.edu 21 Input: 2 56 1 3 4 8 7 9 1 1010 1212 Legend: High change probability. Low change probability. Not changed.

22 Claire Le Goues, ICSE 2012 http://genprog.cs.virginia.edu 22 2 56 1 3 4 8 7 9 1 1010 1212 An edit is: Replace statement X with statement Y Insert statement X after statement Y Delete statement X

23 Claire Le Goues, ICSE 2012 http://genprog.cs.virginia.edu 23 2 56 1 3 4 8 7 9 1 1010 1212 An edit is: Replace statement X with statement Y Insert statement X after statement Y Delete statement X

24 Claire Le Goues, ICSE 2012 http://genprog.cs.virginia.edu 24 2 56 1 3 4 8 7 9 1 1010 1212 An edit is: Replace statement X with statement Y Insert statement X after statement Y Delete statement X

25 Claire Le Goues, ICSE 2012 http://genprog.cs.virginia.edu 25 2 56 1 3 4 8 7 9 1 1010 1212 An edit is: Replace statement X with statement Y Insert statement X after statement Y Delete statement X

26 Claire Le Goues, ICSE 2012 http://genprog.cs.virginia.edu 26 2 56 1 3 4 8 7 9 1 1010 1212 An edit is: Replace statement X with statement Y Insert statement X after statement Y Delete statement X 4

27 Claire Le Goues, ICSE 2012 http://genprog.cs.virginia.edu 27 2 56 1 3 4 8 7 9 1 1010 1212 An edit is: Replace statement X with statement Y Insert statement X after statement Y Delete statement X 4

28 Claire Le Goues, ICSE 2012 http://genprog.cs.virginia.edu 28 2 56 1 3 4 7 9 1 1010 1212 An edit is: Replace statement X with statement Y Insert statement X after statement Y Delete statement X 4 4’

29 Claire Le Goues, ICSE 2012 http://genprog.cs.virginia.edu 29 2 56 1 3 4 7 9 1 1010 1212 An edit is: Replace statement X with statement Y Insert statement X after statement Y Delete statement X 4 4’

30 Claire Le Goues, ICSE 2012 GENPROG: AUTOMATIC, SCALABLE, COMPETITIVE BUG REPAIR. AUTOMATED PROGRAM REPAIR http://genprog.cs.virginia.edu 30

31 Claire Le Goues, ICSE 2012 GENPROG: AUTOMATIC, SCALABLE, COMPETITIVE BUG REPAIR. AUTOMATED PROGRAM REPAIR http://genprog.cs.virginia.edu 31

32 Claire Le Goues, ICSE 2012 SCALABLE: SEARCH SPACE http://genprog.cs.virginia.edu 32 http://genprog.cs.virginia.edu 32 http://genprog.cs.virginia.edu 32 2 56 1 3 4 8 7 9 1 1010 1212

33 Claire Le Goues, ICSE 2012 SCALABLE: SEARCH SPACE http://genprog.cs.virginia.edu 33 http://genprog.cs.virginia.edu 33 http://genprog.cs.virginia.edu 33 2 56 1 3 4 8 7 9 1 1010 1212

34 Claire Le Goues, ICSE 2012 SCALABLE: SEARCH SPACE http://genprog.cs.virginia.edu 34 http://genprog.cs.virginia.edu 34 http://genprog.cs.virginia.edu 34 2 56 1 3 8 7 9 1 1010 1212 4

35 Claire Le Goues, ICSE 2012 SCALABLE: SEARCH SPACE http://genprog.cs.virginia.edu 35 http://genprog.cs.virginia.edu 35 http://genprog.cs.virginia.edu 35 2 56 1 3 8 7 9 1 1010 1212 4 Fix localization: intelligently choose code to move.

36 Claire Le Goues, ICSE 2012 SCALABLE: REPRESENTATION 1 1 2 2 5 5 4 4 Naïve: 1 1 2 2 4 4 5 5’ http://genprog.cs.virginia.edu 36 1 1 3 3 2 2 5 5 4 4 Input: New: Delete(3) Replace(3,5)

37 Claire Le Goues, ICSE 2012 SCALABLE: REPRESENTATION 1 1 2 2 5 5 4 4 Naïve: 1 1 2 2 4 4 5 5 5’ http://genprog.cs.virginia.edu 37 1 1 3 3 2 2 5 5 4 4 Input: New: Delete(3) Replace(3,5) New fitness, crossover, and mutation operators to work with a variable-length genome.

38 Claire Le Goues, ICSE 2012 SCALABLE: PARALLELISM http://genprog.cs.virginia.edu 38 Fitness: Subsample test cases. Evaluate in parallel. Random runs: Multiple simultaneous runs on different seeds.

39 Claire Le Goues, ICSE 2012 GENPROG: AUTOMATIC, SCALABLE, COMPETITIVE BUG REPAIR. AUTOMATED PROGRAM REPAIR http://genprog.cs.virginia.edu 39

40 Claire Le Goues, ICSE 2012 GENPROG: AUTOMATIC, SCALABLE, COMPETITIVE BUG REPAIR. AUTOMATED PROGRAM REPAIR http://genprog.cs.virginia.edu 40

41 Claire Le Goues, ICSE 2012 COMPETITIVE http://genprog.cs.virginia.edu How many bugs can GenProg fix? How much does it cost? 41

42 Claire Le Goues, ICSE 2012 Goal: systematically test GenProg on a general, indicative bug set. General approach: Avoid overfitting: fix the algorithm. Systematically create a generalizable benchmark set. Try to repair every bug in the benchmark set, establish grounded cost measurements. http://genprog.cs.virginia.edu SETUP 42

43 Claire Le Goues, ICSE 2012 Goal: systematically evaluate GenProg on a general, indicative bug set. General approach: Avoid overfitting: fix the algorithm. Systematically create a generalizable benchmark set. Try to repair every bug in the benchmark set, establish grounded cost measurements. http://genprog.cs.virginia.edu SETUP 43

44 Claire Le Goues, ICSE 2012 CHALLENGE: INDICATIVE BUG SET http://genprog.cs.virginia.edu 44

45 Claire Le Goues, ICSE 2012 Goal: a large set of important, reproducible bugs in non-trivial programs. Approach: use historical data to approximate discovery and repair of bugs in the wild. SYSTEMATIC BENCHMARK SELECTION http://genprog.cs.virginia.edu 45

46 Claire Le Goues, ICSE 2012 Consider top programs from SourceForge, Google Code, Fedora SRPM, etc: Find pairs of viable versions where test case behavior changes. Take all tests from most recent version. Go back in time through the source control. Corresponds to a human-written repair for the bug tested by the failing test case(s). http://genprog.cs.virginia.edu SYSTEMATIC BENCHMARK SELECTION 46

47 Claire Le Goues, ICSE 2012 BENCHMARKS ProgramLOCTestsBugsDescription fbc97,0007733Language (legacy) gmp145,0001462Multiple precision math gzip491,000125Data compression libtiff77,0007824Image manipulation lighttpd62,0002959Web server php 1,046,00 0 8,47144Language (web) python407,00035511Language (general) wireshark 2,814,00 0 637Network packet analyzer Total 5,139,00 0 10,19 3 105 http://genprog.cs.virginia.edu 47

48 Claire Le Goues, ICSE 2012 CHALLENGE: GROUNDED COST MEASUREMENTS http://genprog.cs.virginia.edu 48

49 Claire Le Goues, ICSE 2012 http://genprog.cs.virginia.edu 49

50 Claire Le Goues, ICSE 2012 http://genprog.cs.virginia.edu 50

51 Claire Le Goues, ICSE 2012 READY http://genprog.cs.virginia.edu 51

52 Claire Le Goues, ICSE 2012 GO http://genprog.cs.virginia.edu 52

53 Claire Le Goues, ICSE 2012 13 HOURS LATER http://genprog.cs.virginia.edu 53

54 Claire Le Goues, ICSE 2012 SUCCESS/COST Program Defects Repaire d Cost per non-repairCost per repair HoursUS$HoursUS$ fbc1/38.525.566.524.08 gmp1/29.936.611.600.44 gzip1/55.113.041.410.30 libtiff17/247.815.041.050.04 lighttpd5/910.797.251.340.25 php28/4413.008.801.840.62 python1/1113.008.801.220.16 wireshark1/713.008.801.230.17 Total55/10511.22h1.60h http://genprog.cs.virginia.edu $403 for all 105 trials, leading to 55 repairs; $7.32 per bug repaired. 54

55 Claire Le Goues, ICSE 2012 JBoss issue tracking: median 5.0, mean 15.3 hours. 1 IBM: $25 per defect during coding, rising at build, Q&A, post-release, etc. 2 Tarsnap.com: $17, 40 hours per non-trivial repair. 3 Bug bounty programs in general: At least $500 for security-critical bugs. One of our php bugs has an associated security CVE. 1 C. Weiß, R. Premraj, T. Zimmermann, and A. Zeller, “How long will it take to fix this bug?” in Workshop on Mining Software Repositories, May 2007. 2 L. Williamson, “IBM Rational software analyzer: Beyond source code,” in Rational Software Developer Conference, Jun. 2008. 3 http://www.tarsnap.com/bugbounty.html http://genprog.cs.virginia.edu PUBLIC COMPARISON 55

56 Claire Le Goues, ICSE 2012 GenProg: scalable, automatic bug repair. Algorithmic improvements for scalability: fix localization, internal representation, parallelism. Systematic study: Indicative, systematically-generated set of bugs that humans care about. Repaired 52% of 105 bugs in 96 minutes, on average, for $7.32 each. Benchmarks/results/source code/VM images available: http://genprog.cs.virginia.edu 56 CONCLUSIONS/CONTRIBUTIONS

57 Claire Le Goues, ICSE 2012 I LOVE QUESTIONS. http://genprog.cs.virginia.edu 57 (Examples: “Which bugs can GenProg fix?” “What happens if you run for more than 13 hours/change the probability distributions/pick a different crossover/etc?” “How do you know the patches are any good?” “How do your patches compare to human patches?” …)

58 Claire Le Goues, ICSE 2012 WHICH BUGS…? Slightly more likely to fix bugs where the human: restricts the repair to statements. touched fewer files. As fault space decreases, success increases, repair time decreases. As fix space increases, repair time decreases. http://genprog.cs.virginia.edu 58

59 Claire Le Goues, ICSE 2012 FINDING BUGS IS HARD Opaque or non-automated GUI testing. Firefox, Eclipse, OpenOffice Inaccessible or small version control histories. bash, cvs, openssh Few viable versions for recent tests. valgrind Require incompatible automake, libtool Earlier versions of gmp No bugs GnuCash, openssl Non-deterministic tests... http://genprog.cs.virginia.edu

60 Claire Le Goues, ICSE 2012 1.class test_class { 2. public function __get($n) 3. { return $this; %$ } 4. public function b() 5. { return; } 6.} 7.global $test3; 8.$test3 = new test_class(); 9.$test3->a->b(); EXAMPLE: PHP BUG #54372 http://genprog.cs.virginia.edu Relevant code: function zend_std_read_property in zend_object_handlers.c Note: memory management uses reference counting. Problem: this line: 449.zval_ptr_dtor(object) If object points to $this and $this is global, its memory is completely freed, even though we could access $this later. Expected output: nothing Buggy output: crash on line 9. 60

61 Claire Le Goues, ICSE 2012 GenProg : % 448c448,451 > Z_ADDROF_P(object); > if (PZVAL_IS_REF(object)) > { > SEPARATE_ZVAL(&object); > } zval_ptr_dtor(&object) EXAMPLE: PHP BUG #54372 http://genprog.cs.virginia.edu 61 Human : % 449c449,453 < zval_ptr_dtor(&object); > if (*retval != object) > { // expected > zval_ptr_dtor(&object); > } else { > Z_DELREF_P(object); > }

62 Claire Le Goues, ICSE 2012 Is automatically-patched code more or less maintainable? Approach: Ask 102 humans maintainability questions about patched code (human vs. GenProg). Results: No difference in accuracy/time between human accepted and GenProg patches. Automatically-documented GenProg patches result in higher accuracy and lower effort than human patches. Zachary P. Fry, Bryan Landau, Westley Weimer: A Human Study of Patch Maintainability. International Symposium on Software Testing and Analysis (ISSTA) 2012: to appear http://genprog.cs.virginia.edu 62 PATCH QUALITY

63 Claire Le Goues, ICSE 2012 PATCH REPRESENTATION ProgramFaultLOCRepair Ratio gcdinfinite loop221.07 uniq-utxsegfault11461.01 look-utxsegfault11691.00 look-svrinfinite loop13631.00 units-svrsegfault15043.13 deroff-utxsegfault22361.22 nullhttpdbuffer exploit55751.95 indentinfinite loop99061.70 flexsegfault187753.75 atrisbuffer exploit215530.97 Average63251.68 http://genprog.cs.virginia.edu 63


Download ppt "A SYSTEMATIC STUDY OF AUTOMATED PROGRAM REPAIR: FIXING 55 OUT OF 105 BUGS FOR $8 EACH Claire Le Goues Michael Dewey-Vogt Stephanie Forrest Westley Weimer."

Similar presentations


Ads by Google