/* iComment: Bugs or Bad Comments? */


1 /* iComment: Bugs or Bad Comments? */
Lin Tan, Ding Yuan, Gopal Krishna, Yuanyuan Zhou. Published at SOSP 2007. Presented by Kevin Boos.

2 In a Nutshell
iComment: static analysis + NLP.
Detects code-comment mismatches using both source code and comments.

3 Roadmap
iComment paper: Motivation, Challenges, Contributions, Approach & Methodology, Results, Related Work
Complexity
Authors' other works

4 Motivation
Software bugs hurt reliability; they cost the US economy an estimated $60 billion annually.
Mismatches between code and developer assumptions lead to bugs, and comments (around 30% of source code) record those assumptions. For example:

  // Caller must acquire lock.
  static int reset_hardware(...) {
      ...  // access shared data
  }
  static int in2000_bus_reset(...) {
      reset_hardware(...);  // called without acquiring the lock: bug or bad comment?
  }

5 Prevalence of Comments
Comments encode developer assumptions: locks that must be held, interrupts that must be disabled, etc.
Other tools do not utilize comments, ignoring valuable information about developer intentions.

  Software | Lines of Code | Lines of Comments
  Linux    | 5 million     | 1 million
  Mozilla  | 3.3 million   | 0.5 million

6 Code vs. Comments
  Code                 | Comments             | Implication
  Precise              | Imprecise            | Comments are harder to analyze.
  Can be tested        | Cannot be tested     | Software evolution makes comments less reliable.
  Harder to understand | Easier to understand | Developers read comments before code; wrong comments mislead programmers.
Developer assumptions cannot always be inferred from source code alone; comments and code are redundant, or at least should be.

7 Inconsistencies
When comments and code disagree, what's wrong: the comment or the code?
Bad code may be a bug: a developer mistake, out-of-date code, or a copy-and-paste error (cf. clone detection).
Bad comments cause future bugs.

8 Challenges
Parsing and understanding comments is hard: natural language is ambiguous and varied, and comments may be grammatical disasters. The same rule can be phrased many ways:
  /* We need to acquire the IRQ lock before calling ... */
  /* Lock must be acquired on entry to this function. */
  /* Caller must hold instance lock! */
NLP only captures sentence structure, with no concept of understanding, but achieves decent accuracy:
  1. Part-of-speech tagging (accuracy: 97%)
  2. Chunking (accuracy: 90%)
  3. Semantic role labeling (accuracy: 70%)
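As a rough illustration (constructed for this summary, not taken from the paper) of what the three stages might produce for the comment "Caller must hold instance lock":
  1. Part-of-speech tagging: Caller/NN must/MD hold/VB instance/NN lock/NN
  2. Chunking: [NP Caller] [VP must hold] [NP instance lock]
  3. Semantic role labeling: predicate = hold, agent = Caller, object = instance lock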

9 Contributions
First step towards automatically analyzing comments.
Combines NLP, machine learning, and static analysis to identify inconsistent code and comments.
Real-world applicability: discovered 60 new bugs or bad comments while covering only two topics (locks and calls); over 1/3 of the inconsistencies discovered were confirmed by developers.

10 Approach
Two types of comments:
  Explanatory: /* set the access flags */
  Assumptions/rules: /* don't call with lock held */
Check comment rules topic by topic: a general framework in which users choose the hot topics.

11 Rule Templates
  <Lock L> must be held before entering <Function F>.
  <Lock L> must NOT be held before entering <Function F>.
  <Lock L> must be held in <Function F>.
  <Lock L> must NOT be held in <Function F>.
  <Function A> must be called from <Function B>.
  <Function A> must NOT be called from <Function B>.
Other templates exist (see paper), and users can add more.
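As an illustration (a hypothetical binding, not an output quoted from the paper), the comment /* Caller must hold instance lock! */ matches the first template and would be instantiated roughly as: <Lock L = instance lock> must be held before entering <Function F = the commented function>.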

12 Handling Comments
Extract comments using NLP, keyword filters, and correlated-word filters.
Classify comments (rule generation): manually label a small subset, use machine learning to build a decision tree, let the decision tree match comments to templates, then fill template parameters with actual variables. A concrete sketch of this step follows below.
Training is optional for users; the authors did it once per topic before releasing iComment. This is feasible because programmers share wording and phrasing (confirmed by the correlated-word results), and cross-software training shows that decision trees trained on one project classify comments from other projects with high accuracy (~89%). Manually classifying the comments of 2 topics for Linux, Mozilla, Apache, and Wine took only about 2 hours.
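To make the classification step concrete, here is a minimal C sketch of a hand-written stand-in for the learned decision tree, using keyword features of the kind such a tree might test. The feature words and tree structure are illustrative assumptions, not iComment's actual model:

  #include <stdio.h>
  #include <string.h>

  /* Hypothetical feature test: does the comment contain the word? */
  static int has_word(const char *comment, const char *word) {
      return strstr(comment, word) != NULL;
  }

  /* Illustrative stand-in for the machine-learned decision tree:
   * route a comment to a rule template, or mark it explanatory. */
  static const char *classify(const char *comment) {
      if (has_word(comment, "lock")) {
          if (has_word(comment, "not") || has_word(comment, "n't"))
              return "<Lock L> must NOT be held before entering <Function F>";
          if (has_word(comment, "must") || has_word(comment, "hold"))
              return "<Lock L> must be held before entering <Function F>";
      }
      return "explanatory comment (no rule)";
  }

  int main(void) {
      const char *samples[] = {
          "/* Caller must hold instance lock! */",
          "/* don't call with lock held */",
          "/* set the access flags */",
      };
      for (unsigned i = 0; i < sizeof samples / sizeof *samples; i++)
          printf("%s -> %s\n", samples[i], classify(samples[i]));
      return 0;
  }

A real decision tree would be learned from the manually labeled subset rather than hand-coded, but the shape of the computation is the same: a cascade of cheap word-feature tests ending in a template label.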

13 Rule Checker
Static analysis: flow sensitive and context sensitive, respecting the scope of each comment (where it applies and what it references).
Display the inconsistencies, sorted by ranking on support probability:
  SP = numSupport / (numSupport + numViolations)
where numSupport is the number of cases in which the rule holds and numViolations is the number of cases in which the rule is violated.
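A minimal sketch of that ranking metric in C (the function name and signature are assumptions for illustration, not taken from the paper's artifact):

  /* Support probability: the fraction of observed cases in which an
   * inferred rule actually holds; rules with high SP but a few
   * violations make the most promising bug reports. */
  double support_probability(int num_support, int num_violations) {
      return (double)num_support / (double)(num_support + num_violations);
  }

For instance, a rule followed at 9 call sites and violated at 1 has SP = 0.9, so its single violation ranks high as a likely bug or bad comment.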

14 Evaluation
Four large software projects; two topics (locks and function calls); average training data: 18%.

  Software | SLOC   | #Cmts. | Language | Description
  Linux    | 5.0 M  | 1.0 M  | C        | OS
  Mozilla  | 3.3 M  | 0.51 M | C, C++   | Browser suite
  Wine     | 1.5 M  | 0.22 M | C        | Runs Windows apps in Linux
  Apache   | 0.27 M | 0.06 M | C        | Web server

15 Results
Automatically detected 60 new bugs and bad comments; 19 of them already confirmed by developers (confirmed counts in parentheses).

  Software | Mismatches | Bugs    | Bad Cmts. | FP | Rules
  Linux    | 51 (14)    | 30 (11) | 21 (3)    | 32 | 1209
  Mozilla  | 6 (5)      | 2 (1)   | 4 (4)     | 3  | 410
  Wine     | 2          | 1       | 1         |    | 149
  Apache   | 1          | 0       | 1         |    | 64
  Total    | 60 (19)    | 33 (12) | 27 (7)    | 38 | 1832

Of the 60 inconsistencies, 33 were new bugs and 27 were bad comments; at least 37 of the mismatches are impossible to detect with previous work.
False positives exist (38%), caused by incorrectly generated rules and inaccuracy in rule checking; they are "reasonable" in part because the static analysis does not handle pointer or array aliasing, structs, etc.
Accuracy = (number of correctly identified comments) / (total number of comments extracted); Precision = T+ / (T+ + F+); Recall = T+ / (T+ + F-).
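As a worked check of those definitions against the table's totals: taking the 60 true mismatches as T+ and the 38 false positives as F+, precision = 60 / (60 + 38), about 61%, consistent with the reported 38% false positive rate. Recall cannot be computed from the slide, since the number of missed inconsistencies (F-) is not reported.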

16 Training Accuracy
Accuracy: % of correct mismatches (i.e., the accuracy of rule extraction through the machine-learned decision trees).

  Software-specific training:
    Software | Linux | Mozilla | Wine  | Apache
    Accuracy | 90.8% | 91.3%   | 96.4% | 100%

  Cross-software training:
    Training SW   | Mozilla | Wine  | Apache
    Linux         | 81.5%   | 78.6% | 83.3%
    Linux+Mozilla |         | 89.3% | 88.9%

17 Related Work
Extracting rules from source code: iComment employs static analysis but not dynamic traces.
Annotations: poor adoption rates; require manual effort per comment.
Documentation generation: no usage of NLP; iComment also analyzes unstructured comments.

18 Complexity
Now that we know what iComment is capable of, we can see how it helps developers manage complexity.
Detecting inconsistencies: NLP is abstracted away by tools; machine learning requires only simple manual training rules.
Code maintenance: developers may forget to be thorough; when maintaining code, they may apply a manual update to one section but forget to update other parts.
Automatic bug detection: locking errors are extremely complex.

19 Author Bio
Primary author: Lin Tan, professor at the University of Waterloo.
Her overarching research theme is improving software reliability by leveraging code comments; much prior work instead uses source code, execution traces, and manual input to trace bugs and identify problems. She has 3 patents, but none of them relate to leveraging code comments.
iComment was preceded by "HotComments", an ideas paper that posited analyzing code comments to detect inconsistencies between comments and source code.

20 Author Bio
Secondary author: Ding Yuan, a new professor at the University of Toronto.
His work focuses on the reliability of large software systems: better logging so that system administrators can diagnose failures in post-mortem debugging, and enhancing the output of loggers even when source code is unavailable.

21 Author Bio
Professor: Yuanyuan Zhou. Focus: better debuggers and software reliability; founded PatternInsight.
The authors worked under Yuanyuan Zhou at UIUC; she focuses on software reliability and improving debugging practices. She has since moved to UC San Diego and founded a startup, PatternInsight, which incorporates tools from iComment and other static analysis software to find bugs using both source code and execution logs. Code Insight is the product most similar to iComment; it has major corporate customers such as Intel, Cisco, Motorola, Qualcomm, and EMC.
The group followed up with aComment, which analyzes code (specifically function calls) and comments to generate annotations specifying invariants the code must uphold; aComment uses these annotations to detect bugs related to interrupts and concurrency. It computes function preconditions and postconditions that iComment could not ascertain from comments alone: about 25% of lock-related comments specify rules about whether a lock must be held, but only 5% of comments about interrupts specify rules; the other 95% are just discussion or noteworthy points about interrupts. aComment therefore also uses context clues in the code, such as BUG_ON(!irqs_disabled()), to assert that interrupts are disabled on function entry. A similar follow-up work applies the same idea to JavaDoc.

22 PatternInsight Startup

23 Conclusion
Comment-code inconsistencies are bad: they lead to poorer software quality and reliability.
iComment is the first work to automatically analyze comments; it uses NLP and static code analysis, detected real bugs in Linux and Mozilla, and helps manage the complexity of code consistency and maintenance.

