
1 Multi-Abstraction Concern Localization
Tien-Duy B. Le, Shaowei Wang, and David Lo
School of Information Systems, Singapore Management University

2 Motivation
Concern localization:
– Locating code units that match a textual description
– Textual descriptions: bug reports or feature requests
– Code units: classes or methods' source code
Documents are compared:
– Based on the words (IR) or topics (topic modeling) they contain
→ They are compared at a single level of abstraction, i.e., the word/topic level

3 Motivation
A word can be abstracted at multiple levels of abstraction:
Eindhoven → North Brabant → Netherlands → Western Europe → European Continent
(Level 1 → Level 2 → Level 3 → … → Level N)

4 Multi-Abstraction Concern Localization
[Diagram: a source code unit and a bug report or feature request are each abstracted at Levels 1, 2, 3, …, N, and the two are compared across these levels.]

5 Multi-Abstraction Concern Localization
Locating code units that match a textual description:
– By comparing documents at multiple abstraction levels
– By leveraging multiple topic models
Three main components:
– Text preprocessing
– Hierarchy creation
– Multi-abstraction retrieval technique

6 Overall Framework
[Diagram: the method corpus and the concerns are preprocessed; hierarchy creation builds an abstraction hierarchy (Level 1, Level 2, …, Level N); a standard retrieval technique plus multi-abstraction retrieval produce ranked methods per concern.]

7 Hierarchy Creation
We apply Latent Dirichlet Allocation (LDA) a number of times.
LDA (with default settings) accepts:
– The number of topics K
– A set of documents
LDA returns:
– K topics, each a distribution over words
– The probability of each topic t appearing in each document d
(A sketch of one application follows.)
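
A minimal sketch of a single LDA application, assuming the gensim library (the slides do not name an implementation); the function name apply_lda is hypothetical.

```python
# Hypothetical sketch: one LDA application (gensim assumed).
from gensim import corpora, models

def apply_lda(tokenized_docs, num_topics):
    """Run LDA once; return each document's topic-probability vector."""
    dictionary = corpora.Dictionary(tokenized_docs)
    bows = [dictionary.doc2bow(doc) for doc in tokenized_docs]
    lda = models.LdaModel(bows, num_topics=num_topics, id2word=dictionary)
    # minimum_probability=0.0 asks gensim to report (almost) every topic,
    # giving one probability entry per topic for each document.
    return [
        [p for _, p in sorted(lda.get_document_topics(bow, minimum_probability=0.0))]
        for bow in bows
    ]
```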

8 Hierarchy Creation
Each application of LDA creates a topic model with K topics:
– Its topics are assigned to the documents
– It corresponds to one abstraction level
An abstraction hierarchy of height L:
– Height = number of topic models
– Created by L applications of LDA (sketched below)
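
A sketch of hierarchy creation under the same assumptions, reusing the hypothetical apply_lda above; the default topic counts match the H4 setting used later in the experiments.

```python
def build_hierarchy(tokenized_docs, topic_counts=(50, 100, 150, 200)):
    """Build an abstraction hierarchy of height L = len(topic_counts):
    one topic model (per-document topic probabilities) per level."""
    return [apply_lda(tokenized_docs, k) for k in topic_counts]
```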

9 Multi-Abstraction Vector Space Model
Multi-abstraction Vector Space Model (VSM):
– Standard VSM + abstraction hierarchy
In the standard VSM:
– A document is represented as a vector of weights
– Each element corresponds to a word; its value is the word's term frequency-inverse document frequency (tf-idf) weight (sketched below)
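
A sketch of the standard VSM representation using scikit-learn's tf-idf weighting; the library and the exact tf-idf variant are illustrative choices, and raw_docs is an assumed list of preprocessed document strings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# raw_docs: assumed list of preprocessed document strings.
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(raw_docs)  # one tf-idf row per document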

10 Multi-Abstraction Vector Space Model
We extend the document vectors. Added elements:
– The topics of the topic models in the abstraction hierarchy
– Their values are the probabilities of those topics appearing in the document
Example (see the sketch below):
– The document vector has length 10
– The abstraction hierarchy has 3 topic models of size 50, 100, 150
– The extended document vector has size: 10 + (50 + 100 + 150) = 310
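
A sketch of the vector extension, assuming the tf-idf rows and hierarchy from the earlier sketches; with a length-10 vector and three models of 50, 100, and 150 topics it reproduces the 10 + (50+100+150) = 310 arithmetic.

```python
import numpy as np

def extend_vector(tfidf_row, hierarchy, doc_index):
    """Concatenate a document's tf-idf weights with its topic
    probabilities from every topic model in the hierarchy."""
    word_part = np.asarray(tfidf_row).ravel()               # e.g. length 10
    topic_parts = [np.asarray(level[doc_index]) for level in hierarchy]
    return np.concatenate([word_part, *topic_parts])        # e.g. 10 + 300 = 310
```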

11 Experiments
Dataset:
– 285 AspectJ faulty versions extracted from iBugs
Evaluation metric:
– Mean Average Precision (MAP), sketched below

Hierarchy   Number of Topics per Level
H1          50
H2          50, 100
H3          50, 100, 150
H4          50, 100, 150, 200
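
A sketch of the MAP computation; the names rankings (concern → ranked methods) and gold (concern → truly relevant methods) are hypothetical.

```python
def mean_average_precision(rankings, gold):
    """MAP over all concerns: average, per concern, of the precision
    measured at each rank where a relevant method is retrieved."""
    average_precisions = []
    for concern, ranked_methods in rankings.items():
        relevant = gold[concern]
        hits, precisions = 0, []
        for rank, method in enumerate(ranked_methods, start=1):
            if method in relevant:
                hits += 1
                precisions.append(hits / rank)
        average_precisions.append(sum(precisions) / len(relevant))
    return sum(average_precisions) / len(average_precisions)
```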

12 Empirical Results

Technique        MAP     Improvement over Baseline
Baseline (VSM)   0.0669  N/A
H1               0.0715  6.82%
H2               0.0777  16.11%
H3               0.0787  17.65%
H4               0.0799  19.36%

→ The MAP improvement of H4 is 19.36%
→ MAP improves as the height of the abstraction hierarchy increases

13 Empirical Results
Number of concerns with various improvements (p):

Improvement (p)   H1    H2    H3    H4
…                 21    27    30    30
…                 25    22    25    22
…                 18    14    12    11
…                 113   64    42    41
…                 108   158   176   181

→ The improvements are positive for most of the concerns

14 Conclusion
– We propose a multi-abstraction concern localization framework
– We also propose a multi-abstraction vector space model
– Our experiments on 285 AspectJ bugs show a MAP improvement of up to 19.36%

15 Future Work
Extend the experiments by investigating:
– Different numbers of topics in each level of the hierarchy
– Different hierarchy heights
– Different topic models:

Topic Model                   Word Ordering       Word Correlation
Latent Dirichlet Allocation   Bag of Words        No
Pachinko Allocation Model     Bag of Words        Yes
Syntactic Topic Model         Sequence of Words   No

16 Future Work
Analyze the effects of document lengths:
– For different numbers of topics
– For different hierarchy heights
→ Experiment with Panichella et al.'s method [1] to infer good LDA configurations for our approach

[1] A. Panichella, B. Dit, R. Oliveto, M. Di Penta, D. Poshyvanyk, and A. De Lucia. How to Effectively Use Topic Models for Software Engineering Tasks? An Approach Based on Genetic Algorithms. ICSE 2013.

17 Thank you!
Questions? Comments? Advice?
{btdle.2012, shaoweiwang.201, davidlo}@smu.edu.sg

