
1 Taming False Alarms from a Domain-Unaware C Analyzer by a Statistical Post Analysis
Yungbum Jung, Jaehwang Kim, Jaeho Shin, Kwangkeun Yi
Programming Research Lab., Seoul National University

2 Motivation: an Industry's Challenge
Taming False Alarms from a Domain-Unaware C Analyzer by a Statistical Post Analysis, SAS 2005, Jaeho Shin
In 2004, a company's SQA department asked us for a C buffer-overrun static analyzer that
- must be sound
- must have a reasonable cost
- must be domain-unaware
Our path
- Sound analyzer: push the cost-accuracy balance to its limit
- Statistical filter: sift out inevitable false alarms and rank alarms by their probability of being true

3 Outline
- Airac, our analyzer
  - Internals
  - Performance
- Statistical analysis
  - Symptoms
  - Models
    - Bayesian analysis
    - Linear logistic regression
  - Sifting out, ranking

4 Airac
Array Index Range Analyzer for C
Our static analyzer
- is an abstract interpreter
- does numerical interval analysis
- is sound, in the sense of detecting all possible buffer overruns
- covers full ANSI C plus some GNU extensions

5 Abstraction
Usual abstraction for stateful programs
α: set of concrete machine transition traces → map from program points to abstract states (PgmPt → State)

6 Abstract Domains
Machine   = State × PgmPt
State     = Stk × Mem × Dmp
Mem       = Addr → Val
Val       = Interval × 2^Addr × 2^Array
Addr      = PgmVar + AllocSite + AllocSite × Field
Array     = AllocSite × Base × Size
AllocSite = PgmPt
[a, b] ∈ Interval = Base = Size
...
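
The Interval component of Val can be illustrated in a few lines. This is a minimal sketch of a textbook interval domain, not Airac's implementation; the names `Interval`, `join`, `leq`, `scale` and `TOP` are assumptions introduced here, and the empty (bottom) interval is left out for brevity.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Interval:
    """Hypothetical interval [lo, hi]; infinite bounds use math.inf."""
    lo: float
    hi: float

    def join(self, other: "Interval") -> "Interval":
        # Least upper bound: the smallest interval covering both operands.
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def leq(self, other: "Interval") -> bool:
        # Partial order: self is included in other.
        return other.lo <= self.lo and self.hi <= other.hi

    def scale(self, k: int) -> "Interval":
        # Abstract multiplication by a constant k.
        if k == 0:
            return Interval(0, 0)  # avoids inf * 0 on unbounded intervals
        a, b = self.lo * k, self.hi * k
        return Interval(min(a, b), max(a, b))  # a negative k swaps the bounds


# The "top" value: no information about the number.
TOP = Interval(-math.inf, math.inf)
```

For instance, scaling [0, 9] by 2 gives [0, 18], which is the range of values stored by `a[i] = 2 * i` when `i` ranges over [0, 9].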

7 Techniques Used
Accuracy improvement by
- narrowing after widening
- flow-sensitivity
- context pruning (limited to linear expressions)
- static inlining (parameterized)
- static loop unrolling (parameterized)
Cost reduction by
- careful worklist order: lazy at join points
- selective join/compare
- stack obviation
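
"Narrowing after widening" can be sketched on the loop `for (i = 0; i < 10; i++)`. The code below uses the standard textbook interval widening and narrowing operators on plain `(lo, hi)` tuples; it is an illustration under those assumptions, not Airac's code, and all function names are made up for this example.

```python
import math

INF = math.inf


def join(a, b):
    # Smallest interval covering both a and b.
    return (min(a[0], b[0]), max(a[1], b[1]))


def widen(old, new):
    # Standard interval widening: any bound that grows jumps to infinity.
    lo = old[0] if new[0] >= old[0] else -INF
    hi = old[1] if new[1] <= old[1] else INF
    return (lo, hi)


def narrow(old, new):
    # Standard interval narrowing: only infinite bounds are refined.
    lo = new[0] if old[0] == -INF else old[0]
    hi = new[1] if old[1] == INF else old[1]
    return (lo, hi)


def step(head):
    # One abstract round at the loop head of `for (i = 0; i < 10; i++)`:
    # the entry value [0, 0] joined with the incremented guarded value.
    guarded = (head[0], min(head[1], 9))  # filter by i < 10
    if guarded[0] > guarded[1]:
        return (0, 0)  # the body is unreachable, only the entry value remains
    return join((0, 0), (guarded[0] + 1, guarded[1] + 1))


def analyze_loop_head():
    head = (0, 0)
    while True:  # ascending phase with widening
        nxt = widen(head, step(head))
        if nxt == head:
            break
        head = nxt
    widened = head
    narrowed = narrow(head, step(head))  # one descending step
    return widened, narrowed


widened, narrowed = analyze_loop_head()
```

Widening alone leaves the loop-head value of `i` at [0, +inf]; the narrowing step recovers [0, 10], from which the exit condition `i >= 10` yields [10, 10].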

8 Stack Obviation
- Size of Stk is proportional to program size
- Most of the analysis time = join + compare
- OK to skip join/compare for Stk if changes to Stk are always reflected in Mem
- Achieved by a simple syntactic transformation
    e1 ? e2 : e3  →  { if (e1) t = e2 else t = e3; t }
    e[f()]  →  t = f(); e[t]
- 3 to 5 times speed-up

9 Error Recovery During Analysis
    int a[10], i, j;
    for (i = 0; i < 10; i++) {
        a[i] = 2 * i;
    }
    j = a[i];
    a[i] = …
    …
Buffer overrun since i ∈ [10, 10]
Optimistic assumption: i ∈ [0, 9], j ∈ [0, 18]
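
One possible reading of this slide: the analyzer raises an alarm when the index interval may leave the array bounds, then continues as if the access had been valid. The sketch below encodes that reading only; `check_access` and `recover` are hypothetical names, and the fallback to the whole valid range is an assumption chosen to reproduce the slide's i ∈ [0, 9].

```python
def check_access(index, size):
    """Return True if a[index] may overrun an array of `size` elements."""
    lo, hi = index
    return lo < 0 or hi > size - 1


def recover(index, size):
    """Hypothetical optimistic recovery: continue as if the index were valid."""
    lo, hi = max(index[0], 0), min(index[1], size - 1)
    # If no valid index remains, fall back to the whole valid range.
    return (lo, hi) if lo <= hi else (0, size - 1)


i_after_loop = (10, 10)  # value of i at loop exit in the slide's example
alarm = check_access(i_after_loop, 10)
i_assumed = recover(i_after_loop, 10)
```

With `i_assumed` equal to [0, 9], reading `a[i]` yields the join of the stored values `2 * i`, that is [0, 18], matching the slide's j ∈ [0, 18].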

10 Warnings about Performance
- Assume typeful C programs: arrays must be used as the same type as declared
- Artificial semantics after errors, e.g. overrun, null dereference
- No side effects for library functions
- No main(): analyze procedures in their defined order
- No alarms about buffers whose size is top
- Top value for free variables

11 Performance 1/2
Linux kernel 2.6.4    Alarms  Real Errors  LOC    Time (sec)
vmax302.c (79)        1       1            246    3
xfrm_user.c (235)     2       1            1,201  109
usb-midi.c (332)      10      4            2,206  3617
atkbd.c (332)         5       2            811    285
keyboard.c (411)      2       1            1,256  9
af_inet.c (48)        1       1            1,273  79
eata_pio.c (183)      3       1            984    8
cdc_acm.c (468)       5       3            849    119
ip6_output.c (198)    0       0            1,110  45
mptbase.c (777)       2       1            6,158  8251
aty128fb.c (98)       2       1            2,466  3671
Performed on a Linux 2.6 box with a Pentium 4 3.2GHz and 4GB RAM

12 Performance 2/2
GNU Software          Alarms  Real Errors  LOC     Time (sec)
tar-1.13 (2,630)      66      1            20,258  577
bison-1.875 (5,164)   50      0            15,907  809
sed-4.0.8 (461)       29      0            6,053   1154
gzip-1.2.4a (799)     17      0            7,327   794
grep-2.5.1 (187)      2       0            9,297   604
Commercial Software   Alarms, Real Errors, LOC, Time (min) (cell boundaries lost in the transcript)
A                     189280,3798
B                     196563,584,664789
C                     7815119,21182
D                     4357806,829112
E                     197112517,3148

13 Statistical Post Analysis
1. We collect
   - samples of true and false alarms
   - symptoms of each alarm
2. From them, we compute the trueness of each alarm, i.e. the probability that it is true given its symptoms
3. With trueness we can
   - sift out false alarms
   - report truer alarms first

14 Symptoms
Syntactic symptoms
  (-) AfterLoop, AfterBranch, AfterReturn, InNestedLoopBody, InNestedBranchBody
  (+) InLoopCond, InBranchCond, InFunParam, InNestedFunParam, InRightOfAnd
Semantic symptoms
  (-) JoinN, NotNarrowed, ComplexData, InCyclicCallChain
  (+) Prunning, PassedValue, ConstantVariable, ConstantIndex, ConstantArrayConstantIndex
Result symptoms
  (-) TopIndex, HalfInfiniteIndex
  (+) FiniteOffsetFiniteArray, FiniteIndex
Common sense + shallow inside information
[Figure: f, g, h, [9, 10]]

15 Bayesian Analysis
- For each alarm, we compute the conditional probability that it is true given its symptoms
- Numbers come from "learning samples"
- Estimated using a Monte Carlo method
- We assume symptoms occur independently (naïve Bayesian filtering)
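
The naïve Bayesian computation can be sketched as follows. Note the substitution: where the slide mentions Monte Carlo estimation, this sketch uses plain add-one smoothed frequency counts instead, to stay short. The function names and the toy learning samples are invented for illustration; only the symptom names come from the slides.

```python
def train(samples, symptoms):
    """samples: list of (set_of_symptoms, is_true). Returns smoothed estimates."""
    n_true = sum(1 for _, t in samples if t)
    n_false = len(samples) - n_true
    prior = (n_true + 1) / (len(samples) + 2)  # smoothed P(true)
    like = {}
    for s in symptoms:
        in_true = sum(1 for sym, t in samples if t and s in sym)
        in_false = sum(1 for sym, t in samples if not t and s in sym)
        # (P(symptom | true alarm), P(symptom | false alarm)), both smoothed
        like[s] = ((in_true + 1) / (n_true + 2), (in_false + 1) / (n_false + 2))
    return prior, like


def trueness(alarm_symptoms, prior, like):
    """P(true | symptoms) under the independence assumption."""
    p_t, p_f = prior, 1 - prior
    for s, (pt, pf) in like.items():
        if s in alarm_symptoms:
            p_t *= pt
            p_f *= pf
        else:
            p_t *= 1 - pt
            p_f *= 1 - pf
    return p_t / (p_t + p_f)


# Made-up learning samples, for illustration only.
samples = [
    ({"ConstantIndex"}, True),
    ({"ConstantIndex"}, True),
    ({"AfterLoop"}, False),
    ({"AfterLoop"}, False),
    ({"AfterLoop", "ConstantIndex"}, False),
]
prior, like = train(samples, ["ConstantIndex", "AfterLoop"])
high = trueness({"ConstantIndex"}, prior, like)
low = trueness({"AfterLoop"}, prior, like)
```

On this toy data an alarm showing only `ConstantIndex` scores well above 0.5 and one showing only `AfterLoop` well below it, which is the kind of separation the filter relies on.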

16 Sifting Out Threshold
User's knob: his/her risk ratio (R_s / R_r)
  Risk of      true errors   false alarms
  silencing    R_s           0
  reporting    0             R_r
Minimize risk expectation
Risk expectation of an alarm with probability p when
  silencing = R_s × p
  reporting = R_r × (1 − p)
We silence if R_s × p < R_r × (1 − p)
Hence, sift out when p < R_r / (R_r + R_s) = 1 / (1 + R_s / R_r)
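
The threshold rule is simple enough to check numerically. A small sketch, with hypothetical function names, that computes the threshold from the two risks and applies the silencing test directly:

```python
def sift_threshold(r_s, r_r):
    """Probability below which an alarm is silenced: R_r / (R_r + R_s)."""
    return r_r / (r_r + r_s)


def should_silence(p, r_s, r_r):
    """Silence when the expected risk of silencing is below that of reporting."""
    return r_s * p < r_r * (1 - p)


threshold = sift_threshold(3, 1)  # silencing a true error costs 3x a false report
```

With R_s = 3 × R_r the threshold is 0.25, so an alarm with trueness 0.2 is silenced and one with trueness 0.3 is reported.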

17 Experiments
With alarms from
- parts of the Linux kernel
- programs in algorithm textbooks
Learning and testing
- 50%/50%, randomly chosen
- repeated 15 times
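
The evaluation protocol can be sketched as a repeated random half split. The helper name, the fixed seed and the generator interface are assumptions made for this example; the slides do not say how results from the 15 runs were combined.

```python
import random


def repeated_splits(alarms, repeats=15, seed=0):
    """Yield (learning, testing) halves from `repeats` random shuffles."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    for _ in range(repeats):
        shuffled = alarms[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        yield shuffled[:half], shuffled[half:]


splits = list(repeated_splits(list(range(10))))
```

Each pair partitions the alarm set, so no alarm is used for both learning and testing within one run.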

18 Sifting Out Alarms
R_s = 3 × R_r ⇒ threshold = 0.25
- 74.84% of false alarms were filtered out :-)
- 31.40% of true alarms were also swept out :-(

19 Ranking Alarms
- Show the user "truer" alarms first
- 15.17% of the false alarms are mixed in by the time the user has seen 50% of the true alarms
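
The ranking figure can be read as: sort alarms by decreasing trueness, walk down the list until half of the true alarms have been shown, and count the share of all false alarms passed on the way. The sketch below implements that reading, which is an interpretation of the slide rather than a documented definition; the function name and the toy data are hypothetical.

```python
import math


def false_seen_until(ranked, fraction=0.5):
    """ranked: list of (trueness, is_true).

    Returns the share of all false alarms shown before `fraction`
    of the true alarms has been shown.
    """
    ordered = sorted(ranked, key=lambda a: a[0], reverse=True)
    total_true = sum(1 for _, t in ordered if t)
    total_false = len(ordered) - total_true
    need = math.ceil(fraction * total_true)
    seen_true = seen_false = 0
    for _, is_true in ordered:
        if seen_true >= need:
            break
        if is_true:
            seen_true += 1
        else:
            seen_false += 1
    return seen_false / total_false if total_false else 0.0


# Made-up ranking: one false alarm outranks the first true alarm.
toy = [(0.9, False), (0.8, True), (0.7, True), (0.2, False), (0.1, False), (0.05, False)]
mixed = false_seen_until(toy)
```

Here one of the four false alarms precedes the first true alarm, so the measure is 0.25.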

20 Binary Logistic Regression
- Trueness of an alarm given its binary symptom vector
- Generalized linear model
- Coefficients from the learning set
For example, …
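
In general form, a binary logistic model maps a 0/1 symptom vector to a probability through the logistic function of a linear combination. The sketch below shows that general form only; the coefficients and intercept are placeholders invented for illustration, not values fitted in this work.

```python
import math


def logistic_trueness(symptom_vector, coefficients, intercept):
    """Trueness = 1 / (1 + exp(-z)), with z = intercept + sum(c_i * x_i)."""
    z = intercept + sum(c * x for c, x in zip(coefficients, symptom_vector))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # algebraically equal form that avoids overflow for very negative z
    return e / (1.0 + e)


# Placeholder coefficients: symptom 0 raises trueness, symptom 1 lowers it.
coeffs = [1.5, -2.0]
neutral = logistic_trueness([0, 0], coeffs, 0.0)
raised = logistic_trueness([1, 0], coeffs, 0.0)
lowered = logistic_trueness([0, 1], coeffs, 0.0)
```

A positive coefficient pushes the trueness of alarms showing that symptom above the baseline, and a negative one pushes it below.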

21 Bayesian vs. Logistic Regression 1/2
With threshold 0.25, the following can be sifted out
- Bayesian: 74.84% of false alarms, 31.40% of true alarms
- Logistic regression: 90.05% of false alarms, 20.85% of true alarms

22 Bayesian vs. Logistic Regression 2/2
Share of false alarms mixed in by the time the user has seen 50% of the true alarms
- Bayesian: 15.17%
- Logistic regression: 4.10%
Conjecture: does the logistic regression model respect symptom dependency?

23 Related Work
Buffer overrun detection
- ARCHER [Xie, Chou & Engler 2003]
- SPLINT [Zitser, Lippmann & Leek 2004]
- CSSV [Dor, Rodeh & Sagiv 2003]
- ASTRÉE [Cousot et al. 2005, 2003]
Statistical approach
- Z-ranking [Kremenek & Engler 2003]
- Error Correlation [Kremenek et al. 2004]
Labels on the slide: unsound, require annotation, domain-aware

24 Conclusion
- Our "sound" static analyzer, Airac, is realistic
- False alarms are inevitable in a domain-unaware setting
- Statistical approaches helped
  - a viable approach to handling false alarms
  - natural symptoms seem to work
  - orthogonal to other static analysis techniques
  - generic; depends on the learning set

25 Thank you
Questions?
Demo available at http://ropas.snu.ac.kr/airac

26 Scalability
O(n^3)

27 Widening / Narrowing
Definition …

28 GNU tar alarm rankings
0.9901198 real
0.9855307 false
0.9850591 false
0.9780444 false
0.9369590 false
0.3175148 false
0.3007076 false
0.2456504 false
0.1879507 false
0.1838634 false
0.1504679 false
0.1472117 false
0.1454532 false
0.1450839 false
0.1370842 false
0.1283671 false
0.1281401 false
0.1272549 false
0.1153716 false
0.1136177 false
0.1129601 false
0.1123550 false
0.1117830 false
0.1100936 false
0.1073766 false
0.1050893 false
0.1030711 false
0.0962164 false
0.0876318 false
0.0855448 false
0.0828855 false
0.0812378 false
0.0779431 false
0.0773685 false
0.0747280 false
0.0737355 false
0.0679391 false
0.0661072 false
0.0643442 false
0.0613386 false
0.0548322 false
0.0511073 false
0.0506068 false
0.0498492 false
0.0490331 false
0.0423234 false
0.0419616 false
0.0414079 false
0.0412521 false
0.0409813 false
0.0403298 false
0.0400940 false
0.0372400 false
0.0350889 false
0.0349503 false
0.0309685 false

136.5882 false
101.1781 false
99.7203 false
96.2315 false
* 86.2998 real
73.8084 false
73.1886 false
72.7778 false
68.4188 false
62.6774 false
42.8508 false
41.8871 false
41.2995 false
40.9927 false
40.8565 false
36.9405 false
36.8396 false
35.9099 false
30.6340 false
27.8751 false
27.2553 false
26.7238 false
22.3087 false
21.3393 false
21.2129 false
20.7517 false
17.1343 false
15.2969 false
10.0862 false
9.3782 false
8.8474 false
0.5915 false
-5.0794 false
-8.9928 false
-24.5940 false
-28.7990 false
-36.3809 false
-41.3275 false
-66.1833 false
-71.1299 false

29 Other Possible Questions
- Applicability to other analyses
- About symptoms
- Destructive update conditions
- …

