Trevor Savage, Bogdan Dit, Malcom Gethers and Denys Poshyvanyk

Presentation transcript:

TopicXP: Exploring Topics in Source Code using Latent Dirichlet Allocation Trevor Savage, Bogdan Dit, Malcom Gethers and Denys Poshyvanyk 26th IEEE International Conference on Software Maintenance Timişoara, Romania September 16, 2010 Good evening. My name is Malcom Gethers and I'm a PhD student in the SEMERU Group at the College of William and Mary. Today I will be demonstrating a tool we developed, namely TopicXP. The tool assists developers with program understanding by utilizing the notion of topics, which are obtained by modeling source code with the topic model Latent Dirichlet Allocation. Additionally, the tool leverages structural information to give developers additional insight into how topics in the source code relate. Before I begin the demonstration, let me provide some background on LDA and on Maximal Weighted Entropy, a cohesion metric which the tool implements and utilizes.

Latent Dirichlet Allocation (LDA) LDA is a topic model which represents documents as probabilistic mixtures of topics. The model is emerging as a useful tool for various software maintenance tasks. As input, LDA accepts a collection of documents, where each document corresponds to a collection of words. Given the documents and a parameter indicating the desired number of topics, LDA infers topics from the provided documents. Each topic is represented as a probability distribution over the set of terms appearing in the collection; for example, a topic related to a given concept would assign high probability to the terms describing that concept compared to other terms in the corpus. After the topics are inferred, LDA models each document as a probability distribution over the set of topics, so a document which discusses a particular topic is indicated by a high probability of that topic for the document. Probabilistic Topic Models (Latent Dirichlet Allocation – LDA [Blei'03]) Models documents as mixtures of topics
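To illustrate the inputs and outputs described above, here is a minimal sketch using the gensim library on identifier tokens extracted from a few classes. This is only an illustration of LDA itself, not TopicXP's implementation; the toy documents and parameter values are made up.

from gensim import corpora, models

# Each "document" is the bag of identifier/comment tokens from one class (toy data).
documents = [
    ["parse", "token", "lexer", "syntax", "parse"],
    ["socket", "connect", "send", "receive", "socket"],
    ["parse", "syntax", "tree", "node", "token"],
]

dictionary = corpora.Dictionary(documents)               # term <-> id mapping
corpus = [dictionary.doc2bow(doc) for doc in documents]  # bag-of-words vectors

# Infer 2 topics from the corpus.
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary,
                      passes=20, random_state=0)

print(lda.show_topic(0, topn=3))           # top terms of topic 0 with probabilities
print(lda.get_document_topics(corpus[0]))  # topic mixture of the first document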

Maximal Weighted Entropy (MWE) Maximal Weighted Entropy is a cohesion measure which combines Latent Dirichlet Allocation and information entropy. The metric determines the cohesiveness of a class based on how topics are implemented across the methods of that class; for example, a class in which one topic is consistently discussed in all methods would receive high cohesion. To measure the cohesion of a class, we analyze the topic distribution of each method within that class. The notions of Occupancy and Distribution capture, respectively, the degree to which a topic is relevant to the class and the entropy of the topic across all methods in the class. So, for each topic we evaluate the probability of it appearing in each method; from that information, Occupancy and Distribution can be computed for the topic. MWE is the maximum of the product of Occupancy and Distribution across all topics. With this metric we are able to leverage LDA and information entropy to measure the cohesiveness of classes. Occupancy(tj) captures the average probability of topic tj across the methods of the class. Distribution(tj) captures the spread of tj across the methods using information entropy. MWE(C) = max over topics tj of Occupancy(tj) × Distribution(tj)
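A small sketch may help make the formula concrete. This is my own illustration rather than TopicXP's code; it assumes a hypothetical matrix theta of per-method topic probabilities for one class and uses normalized entropy as the Distribution term.

import numpy as np

def mwe(theta):
    """Maximal Weighted Entropy for one class.

    theta: (num_methods, num_topics) array where theta[i, j] is the
    probability of topic j in method i (each row sums to 1).
    """
    num_methods, num_topics = theta.shape
    occupancy = theta.mean(axis=0)  # average probability of each topic in the class
    # Normalize each topic's probabilities over the methods, then take its entropy.
    col = theta / (theta.sum(axis=0, keepdims=True) + 1e-12)
    entropy = -(col * np.log(col + 1e-12)).sum(axis=0)
    distribution = entropy / np.log(num_methods) if num_methods > 1 else np.ones(num_topics)
    return float(np.max(occupancy * distribution))

# A class whose two methods share one dominant topic scores higher than
# a class whose methods discuss different topics.
cohesive = np.array([[0.9, 0.1], [0.8, 0.2]])
scattered = np.array([[0.9, 0.1], [0.1, 0.9]])
print(mwe(cohesive), mwe(scattered))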

Demonstration

Thank you. Questions? SEMERU @ William and Mary http://www.cs.wm.edu/semeru/TopicXP