International Conference on Program Comprehension (ICPC) 2008 A Traceability Technique for Specifications Aharon Abadi, Mordechai Nisenson and Yahalomit Simionovici.

Presentation transcript:

International Conference on Program Comprehension (ICPC) 2008 A Traceability Technique for Specifications Aharon Abadi, Mordechai Nisenson and Yahalomit Simionovici

ICPC A Comparison of Traceability Techniques for Specifications
Outline
Motivation
Goals
Our Solution: Outline of Traceability Link Process
IR Techniques
Experiments
Conclusions
Future work

Traceability
The ability to link between different artifacts
  – Example artifacts: code, user manuals, design documentation, development wikis, etc.
In particular, link code to:
  – Relevant requirements
  – Sections in design documents
  – Test cases
  – Other structured and free-text artifacts
Also, link from requirements, design documents, etc. to code

What's Traceability Good For?
Program Comprehension
  – Top-down
  – Bottom-up
Particularly relevant for the maintenance of legacy systems
Impact analysis
  – Keeping non-code artifacts up-to-date
Requirement Tracing
  – Discover what code needs to change to handle a new requirement
  – Aid in determining whether a specification is completely implemented and covered by tests

Challenges
Scalability
  – Large # of artifacts
Heterogeneity
  – Large # of different document formats and programming languages
Noisy
  – Free-text information (natural language): conjunctions, prepositions, abbreviations, etc.
  – Some information may be outdated, or just plain wrong
Prior work:
  – Recovering Traceability Links in Software Artifact Management Systems using Information Retrieval Methods [De Lucia et al., 2007]
  – Recovering Traceability Links between Code and Documentation [Antoniol et al., 2002, Deerwester et al., 1990, Marcus and Maletic, 2003]

Example
/*
 * The File interface provides…
 */
public class FileImpl extends FilePOA {
    private String nativefileName;
    /**
     * Creates a new File…
     */
    public FileImpl(String nativePath...) { … }
    /**
     * …
     */
    private String f(..) { … }
}

Goals
Examine the effectiveness of IR techniques for traceability between code and documentation on real-world data
Most prior work compared 2 specific algorithms, LSI and VSM
  – Is LSI really better?
  – How does LSI stack up against other dimensionality reduction techniques?
  – How does it compare with techniques that do not reduce dimensionality?
How do different levels of abstraction affect the choice of the best methods?
  – How to fit a method and parameters to a dataset?

Traceability Link Process
[Diagram. Labels under "Off line processes" / "Document Pre-processing": documents, Sectoring, sections, Text Preprocessing, IR-Index. Labels on the query side: partial code, words extraction, Words expansion, Text Preprocessing, Words ranking, Query Construction, (word 1, rank 1), …, (word m, rank m). Result: sections.]

Text Preprocessing
Input: … Copyright owners grant member companies of the OMG permission to make a limited …
Output: … copyright owner grant member compani omg permiss make limit …
Steps: lower-casing; removal of stop-words, numbers, etc.; stemming
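The preprocessing steps named on this slide could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stop-word list and the `crude_stem` suffix stripper are hypothetical stand-ins, and a real stemmer (the slide's output looks like Porter-style stems) would give somewhat different results.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stop-word list (assumption, not from the slides).
STOP_WORDS = {"a", "an", "and", "of", "the", "to", "in", "for", "is", "are"}

# Crude suffix stripper standing in for a real stemmer; longest suffixes first.
SUFFIXES = ("ations", "ation", "ions", "ion", "ies", "ing", "ed", "es", "s")


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> List[str]:
    """Lower-case, drop numbers and punctuation, remove stop-words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("Copyright owners grant member companies of the OMG permission"))
```

Keeping the stemmer behind a single function makes it easy to swap the toy version for a proper one without touching the rest of the pipeline.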

Words Extraction
Sources of words: class name, public function names, public function arguments and return type, comments, super class name
Words extracted from the FileImpl example: FileImpl, nativePath, FilePOA, "Creates a new File…", "The File interface provides…"
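A rough sketch of how words might be pulled from a Java class like the FileImpl example, using regular expressions. Everything here is an assumption made for illustration: the function name `extract_words`, the regex-based approach (a real tool would more likely use a Java parser), and the simplified `JAVA_SOURCE`, which omits the parameters and bodies elided on the slide. Methods with extra modifiers such as `static` are not handled.

```python
import re

# Simplified version of the slide's example; elided parts are left out.
JAVA_SOURCE = """
/*
 * The File interface provides…
 */
public class FileImpl extends FilePOA {
    private String nativefileName;

    /**
     * Creates a new File…
     */
    public FileImpl(String nativePath) { }

    private String f(int x) { return null; }
}
"""


def extract_words(source: str) -> dict:
    """Collect class name, super class, public function data and comments."""
    # Comment text: drop the delimiters and the leading '*' on each line.
    comments = []
    for body in re.findall(r"/\*(.*?)\*/", source, flags=re.DOTALL):
        lines = [line.strip(" \t*") for line in body.splitlines()]
        text = " ".join(line for line in lines if line)
        if text:
            comments.append(text)

    code = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)

    match = re.search(r"\bclass\s+(\w+)(?:\s+extends\s+(\w+))?", code)
    class_name = match.group(1) if match else None
    super_class = match.group(2) if match else None

    # Public functions and constructors: 'public [ReturnType] name(params)'.
    functions, arguments, return_types = [], [], []
    pattern = r"public\s+(?:(\w+)\s+)?(\w+)\s*\(([^)]*)\)"
    for ret, name, params in re.findall(pattern, code):
        functions.append(name)
        if ret:
            return_types.append(ret)
        for param in params.split(","):
            parts = param.split()
            if parts:
                arguments.append(parts[-1])  # last token is the parameter name

    return {
        "class_name": class_name,
        "super_class": super_class,
        "functions": functions,
        "arguments": arguments,
        "return_types": return_types,
        "comments": comments,
    }


if __name__ == "__main__":
    print(extract_words(JAVA_SOURCE))
```

As on the slide, private members such as `nativefileName` and `f` contribute no words.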

Words Expansion
Input: … NativePath, fileName, delete_all_elements …
Output: … NativePath, Native, Path, fileName, File, Name, delete_all_elements, Delete, all, elements …
Use well-known coding standards for sub-word separation
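Sub-word separation along the lines of this slide can be sketched with two common conventions, underscores and camelCase boundaries. The function name `expand` and the exact regular expression are assumptions; the sketch keeps each sub-word's original case, on the assumption that later preprocessing lower-cases everything anyway.

```python
import re
from typing import List


def expand(identifier: str) -> List[str]:
    """Return the identifier followed by its sub-words, if it has more than one."""
    parts: List[str] = []
    for chunk in identifier.split("_"):
        # Acronym runs (e.g. POA), capitalized or lower-case words, digit runs.
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    if len(parts) <= 1:
        return [identifier]
    return [identifier] + parts


if __name__ == "__main__":
    for name in ("NativePath", "fileName", "delete_all_elements"):
        print(expand(name))
```

Keeping the original identifier alongside its parts means an exact match on the full name in the documentation can still contribute to the score.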

Information Retrieval (IR) Methods
Vector Space Model (VSM) [Salton et al., 1975], implemented by Lucene
  – Each document, d, is represented by a vector of ranks of the terms in the vocabulary: $v_d = [r_d(w_1), r_d(w_2), \ldots, r_d(w_{|V|})]$
  – The query is similarly represented by a vector
  – The similarity between the query and the document is the cosine of the angle between their respective vectors
Jensen-Shannon Similarity Model [Abadi et al., 2008]
  – Each document, d, is represented by its empirical probability distribution over words: $p_d(w)$
  – The query is similarly represented
  – The similarity score is calculated as $1 - JS(p_q, p_d)$, where JS is the Jensen-Shannon divergence
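The two similarity scores described on this slide can be sketched as below. This is a minimal illustration under stated assumptions: term weights are plain dictionaries (Lucene's actual ranking formula is more involved and is not reproduced here), the divergence uses base-2 logarithms so that it lies in [0, 1], and all function names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(query_weights: dict, doc_weights: dict) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(w * doc_weights.get(t, 0.0) for t, w in query_weights.items())
    norm_q = math.sqrt(sum(w * w for w in query_weights.values()))
    norm_d = math.sqrt(sum(w * w for w in doc_weights.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)


def distribution(tokens: list) -> dict:
    """Empirical probability distribution over the words in a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def js_divergence(p: dict, q: dict) -> float:
    """Jensen-Shannon divergence with base-2 logs, so the value is in [0, 1]."""
    vocab = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl(a: dict) -> float:
        # Kullback-Leibler divergence of a from the mixture m.
        return sum(pa * math.log2(pa / m[t]) for t, pa in a.items() if pa > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def js_similarity(p: dict, q: dict) -> float:
    """Similarity score 1 - JS(p_q, p_d)."""
    return 1.0 - js_divergence(p, q)


if __name__ == "__main__":
    query = distribution(["file", "native", "path"])
    doc = distribution(["file", "interface", "file", "path"])
    print(cosine_similarity(query, doc), js_similarity(query, doc))
```

Both scores need only per-document term statistics, in contrast to the dimensionality reduction methods, which are fitted to the whole collection.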

Dimensionality Reduction Methods
LSI [Deerwester et al., 1990]
  – Commonly used in prior studies
  – An algebraic method
  – Dimensions represent orthogonal topics
PLSI [Hofmann, 1999]
  – Probabilistic extension to LSI
  – Based on the assumption that documents are mixtures of topic distributions
  – Words and documents are conditionally independent given the topic
SDR [Globerson and Tishby, 2003]
  – Based on information theory
  – Topics are sufficient statistics in information theory terms
  – These statistics are functions that capture maximum mutual information between words and documents
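For orientation, the first two methods are usually written as follows. The notation is an assumption, not taken from the slides: $X$ is the term-document matrix, $k$ the number of retained dimensions, and $z$ a latent topic.

```latex
% LSI: rank-k truncated singular value decomposition of the term-document matrix
X \approx X_k = U_k \Sigma_k V_k^{\top}

% PLSI: words w and documents d are conditionally independent given the topic z
P(w, d) = \sum_{z} P(z)\, P(w \mid z)\, P(d \mid z)
```

In LSI the columns of $U_k$ are orthogonal, which corresponds to the "orthogonal topics" on the slide. The PLSI factorization is a direct statement of the conditional independence assumption.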

Datasets
Software Communication Architecture (SCA) is an open architecture framework that defines how software and hardware elements operate within a software-defined radio.
Common Object Request Broker Architecture (CORBA) is OMG's open, vendor-independent architecture and infrastructure that computer applications use to work together over networks.
Documentation details (cell values missing from the transcript):
Dataset | Size (MB) | Sections | Vocabulary size
SCA     |           |          |
CORBA   |           |          |
Queries details:
Dataset | # classes | # relevant results / query | Total # of relevant results
SCA     | 7         | 6 – 13                     | 65
CORBA   | 4         | 5 – 20                     | 58

IR Quality Measures
[Formulas for two measures at rank n and for average precision; not recoverable from the transcript]
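A minimal sketch of the standard IR definitions of precision and recall at rank n, average precision, and mean average precision (MAP). These are the textbook forms and may differ in detail from the formulas on the slide; the function names and the section identifiers in the example are hypothetical.

```python
def precision_at(ranked: list, relevant: set, n: int) -> float:
    """Fraction of the top-n results that are relevant."""
    return sum(1 for d in ranked[:n] if d in relevant) / n


def recall_at(ranked: list, relevant: set, n: int) -> float:
    """Fraction of all relevant items found in the top-n results."""
    return sum(1 for d in ranked[:n] if d in relevant) / len(relevant)


def average_precision(ranked: list, relevant: set) -> float:
    """Mean of the precision values at the ranks where a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: list) -> float:
    """Average of average_precision over (ranked, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    ranked = ["s1", "s2", "s3", "s4"]  # sections returned for one query
    relevant = {"s1", "s3"}            # sections judged relevant
    print(precision_at(ranked, relevant, 2), recall_at(ranked, relevant, 2))
    print(average_precision(ranked, relevant))
```

Average precision rewards rankings that place relevant sections early, which is why MAP is a convenient single number for comparing methods across queries.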

[Figure: MAP versus Method]

[Figure: Mean Average Precision (MAP) versus Dimension]

[Figure: Precision versus Recall]

[Figure or table: Dimensionality of Datasets; labels: SCA, CORBA, PLSI Results]

[Figure: Precision versus Recall over Algorithms for SCA]

[Figure: Precision versus Recall over Algorithms for CORBA]

[Figure: MAP versus Method, combined over SCA & CORBA]

Outline
Motivation
Our Solution: Outline of Traceability Link Process
Similarity measures
IR Techniques
IR Quality Measures
Experiments
Conclusions
Future work

Conclusions
Our most significant results are:
  – Traceability between code and documentation in real-world systems is effective via IR techniques.
  – For realistic datasets, the Vector Space Model and the Jensen-Shannon model, which do not perform dimensionality reduction, were shown to be the most effective.
  – SDR was shown to be the best dimensionality reduction model; specifically, it is better than LSI.
  – As the documentation links become more abstract, the performance of VSM, the JS model and SDR becomes equivalent.
Additional results:
  – SDR was shown to be robust to the datasets' abstractness level
  – LSI and PLSI are sensitive to the datasets' abstractness level
  – We believe that PLSI's poor performance is due to the difficulty of modeling very short documents, which could result in severe overfitting

Future work
Development of new measures for evaluating different IR algorithms and datasets, specifically for traceability
  – Example: developing a measure of abstractness for a specification, which will help with tuning parameters such as dimensionality
Using dimensionality reduction techniques to create a thesaurus from the indexed data and using it to add synonyms to the query
Traceability for other types of documents and links
Investigate alternative methods for query construction

References
A. De Lucia, F. Fasano, R. Oliveto, and G. Tortora. Recovering Traceability Links in Software Artifact Management Systems using Information Retrieval Methods. ACM Trans. Softw. Eng. Methodol., 16(4):13, 2007.
G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, and E. Merlo. Recovering Traceability Links Between Code and Documentation. IEEE Trans. Softw. Eng., 28(10), 2002.
S.C. Deerwester, S.T. Dumais, T.K. Landauer, G.W. Furnas, and R.A. Harshman. Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, 41(6), 1990.
A. Marcus and J.I. Maletic. Recovering Documentation to Source Code Traceability Links using Latent Semantic Indexing. In ICSE '03: Proceedings of the 25th International Conference on Software Engineering, 2003.
G. Salton, A. Wong, and C.S. Yang. A Vector Space Model for Automatic Indexing. Commun. ACM, 18(11), 1975.
T. Hofmann. Probabilistic Latent Semantic Indexing. In SIGIR, 50-57, 1999.
A. Globerson and N. Tishby. Sufficient Dimensionality Reduction. Journal of Machine Learning Research, 3, 2003.

Thank You!