Caimei Lu et al. (KDD 2010) Presented by Anson Liang.

Presentation transcript:


Documents: articles, web pages, etc. Tags: e.g. economy, science, facebook, twitter, … Users: add documents and annotate them with tags

E.g.  Delicious.com ◦ A social bookmarking service

(Diagram: Web pages – Tags – Users)

 Different users have different perspectives! One user may tag a movie with "movie, action, inspiration", another with "movie, action, violent"

 Objectives ◦ Simulate the generation process of social annotations ◦ Automatically generate tags for documents based on user perspective

 Topic Analysis using Generative Models
◦ Latent Dirichlet Allocation (LDA) model
 Discovers topics for a given document
 A document has many words
 A document may belong to many topics
 A word may belong to many topics
 A topic may be associated with many words
(Diagram: Documents 1..N Words, Words N..N Topics)

 LDA – Plate notation
◦ α is the parameter of the uniform Dirichlet prior on the per-document topic distributions.
◦ β is the parameter of the uniform Dirichlet prior on the per-topic word distributions.
◦ θ_i is the topic distribution for document i.
◦ z_ij is the topic for the j-th word in document i, and w_ij is the specific word.
◦ The w_ij are the only observable variables; all the other variables are latent.
◦ φ is a K×V matrix (V is the size of the vocabulary); each row is the word distribution of one topic.
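The generative process that this plate notation encodes can be sketched in a few lines of NumPy. This is a generic LDA sketch, not the paper's code; the sizes K, V, D, the hyperparameters α and β, and the Poisson document length are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

K, V, D = 5, 1000, 10        # topics, vocabulary size, documents (illustrative)
alpha, beta = 0.1, 0.01      # symmetric Dirichlet hyperparameters (illustrative)

# phi: K x V matrix; each row is one topic's distribution over the vocabulary
phi = rng.dirichlet(np.full(V, beta), size=K)

docs = []
for d in range(D):
    theta = rng.dirichlet(np.full(K, alpha))   # theta_i: per-document topic distribution
    n_words = rng.poisson(100)                 # document length (a modeling convenience)
    z = rng.choice(K, size=n_words, p=theta)   # z_ij: topic of the j-th word
    w = np.array([rng.choice(V, p=phi[k]) for k in z])  # w_ij: the observed words
    docs.append(w)
```

In inference the process runs in reverse: only the w_ij are observed, and θ, z, and φ must be recovered from them.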

 Conditionally-independent LDA (Ramage et al.) ◦ Tags are generated from the topics (the same source as the words)

 Correlated LDA (Bundschus et al.)
1. Generates the topics associated with the words of a document
2. Generates tags from the topics associated with the words

 Problem ◦ These models assume words and tags are generated from the same source  However, words are written by the authors, while tags are added by users (readers)  User information is missing from the tag generation process!

 Three classification schemas of social tags

 Overview ◦ Two factors act on the tag generation process: 1.The topics of the document 2.The perspective adopted by the user

 Model Formulation
◦ Standard LDA, extended with a 0/1 switch x for each tag:
 x = 1: the tag is generated from the topics learned from the words
 x = 0: the tag is generated from the user's perspective
◦ Distributions: document-topic, topic-word, topic-tag, user-perspective, perspective-tag
◦ λ: the probability that each tag is generated from topics
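A minimal sketch of this switch mechanism as a forward sampler. The variable names and sizes are mine, the Dirichlet priors over tags are an assumption, and in the actual model these distributions are learned from data rather than drawn at random:

```python
import numpy as np

rng = np.random.default_rng(0)

K, P, T = 5, 3, 200   # topics, perspectives, tag vocabulary (illustrative sizes)

theta_d = rng.dirichlet(np.full(K, 0.1))           # document-topic distribution
theta_u = rng.dirichlet(np.full(P, 0.1))           # user-perspective distribution
phi_t   = rng.dirichlet(np.full(T, 0.01), size=K)  # topic-tag distributions
psi     = rng.dirichlet(np.full(T, 0.01), size=P)  # perspective-tag distributions
lam     = 0.7                                      # lambda: P(x = 1), tag comes from a topic

def draw_tag():
    if rng.random() < lam:              # x = 1: tag generated from a document topic
        z = rng.choice(K, p=theta_d)
        return rng.choice(T, p=phi_t[z])
    else:                               # x = 0: tag generated from a user perspective
        p = rng.choice(P, p=theta_u)
        return rng.choice(T, p=psi[p])

tags = [draw_tag() for _ in range(20)]
```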


 Parameter Estimation
◦ Six distributions to estimate:
1. document-topic distribution θ(d)
2. topic-word distribution φ(w)
3. topic-tag distribution φ(t)
4. user-perspective distribution θ(u)
5. perspective-tag distribution ψ
6. binomial distribution λ
◦ Estimated via Gibbs sampling!
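The paper derives a collapsed Gibbs sampler for the full topic-perspective model; as a simpler illustration of the technique, here is a collapsed Gibbs sampler for plain LDA, which yields the document-topic and topic-word estimates θ(d) and φ(w). It uses the standard conditional p(z_i = k | rest) ∝ (n_dk + α)(n_kw + β)/(n_k + Vβ); the hyperparameters and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_lda(docs, K, V, alpha=0.1, beta=0.01, iters=50):
    """Collapsed Gibbs sampling for plain LDA; docs is a list of word-id lists."""
    D = len(docs)
    n_dk = np.zeros((D, K))   # topic counts per document
    n_kw = np.zeros((K, V))   # word counts per topic
    n_k  = np.zeros(K)        # total words per topic
    z = [rng.integers(K, size=len(doc)) for doc in docs]   # random initialization
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # p(z = k | rest) ∝ (n_dk + α)(n_kw + β)/(n_k + V·β)
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    # point estimates of the distributions from the final counts
    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + K * alpha)
    phi   = (n_kw + beta) / (n_k[:, None] + V * beta)
    return theta, phi
```

The full model adds analogous count tables for tags, perspectives, and the switch variable x, but the decrement–resample–increment loop has the same shape.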


 Datasets & Setup
◦ A social tagging dataset from delicious.com, after refining:
 documents
 4414 users
 unique tags
 unique words
◦ 90% for training, 10% for testing

 Evaluation Criterion ◦ Perplexity  Reflects the ability of the model to predict tags for new, unseen documents  A lower perplexity score indicates better generalization performance
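Perplexity here is the exponential of the negative average held-out log-likelihood per tag. A sketch of the computation for a tags-over-topics model, where θ and φ stand for any learned document-topic and topic-tag distributions (the names are mine):

```python
import numpy as np

def perplexity(docs, theta, phi):
    """exp(-(1/N) * sum over held-out tags w of log p(w | d)),
    where p(w | d) = sum_k theta[d, k] * phi[k, w]."""
    log_lik, n = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            log_lik += np.log(theta[d] @ phi[:, w])
            n += 1
    return np.exp(-log_lik / n)
```

As a sanity check, a model that is uniform over a vocabulary of size V scores exactly V, so any model below V has learned something.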

 Number of topics vs. number of perspectives

 Tag Perplexity

 Discovered Topics ◦ Observed high semantic correlations among the top words and tags for each topic

 Discovered Perspectives ◦ Perspectives are more complicated than topics ◦ The correlation among the tags assigned to each perspective is not as obvious as those for topics

 The generation sources of tags
◦ Tags generated from the topics learned from the words
◦ Tags generated from user perspectives

 Apply the results of the model to: 1. Tag recommendation 2. Personalized information retrieval

 Topic-perspective LDA model ◦ Simulate the tag generation process in a more meaningful way ◦ Tags are generated from both the topics in the document and the user perspective ◦ May be applied in computer vision tagging problems

 Thank you!