Title: The Author-Topic Model for Authors and Documents

Presentation transcript:

Title: The Author-Topic Model for Authors and Documents Authors: Rosen-Zvi, Griffiths, Steyvers, Smyth Venue: the 20th Conference on Uncertainty in Artificial Intelligence Year: 2004 To give you guys a little personal connection… Presenter: Peter Wu Date: Apr 7, 2015

Title: The Author-Topic Model for Authors and Documents Authors: Rosen-Zvi, Griffiths, Steyvers, Smyth Venue: the 20th Conference on Uncertainty in Artificial Intelligence Year: 2004 This paper introduced a new generative model for topic modeling. Presenter: Peter Wu Date: Apr 7, 2015

Title: The Author-Topic Model for Authors and Documents Authors: Rosen-Zvi, Griffiths, Steyvers, Smyth Venue: the 20th Conference on Uncertainty in Artificial Intelligence Year: 2004 Extension: added the modeling of authors' interests. Built upon the original topic model… The extension part is that they added… If you are familiar with the original LDA… Presenter: Peter Wu Date: Apr 7, 2015 Blei, Ng, & Jordan. "Latent Dirichlet Allocation." Journal of Machine Learning Research 3 (2003): 993-1022.

Outline Motivation Model formulation Parameter estimation Evaluation Application

Outline Motivation; Model formulation (generative process; plate notation; a comparison with LDA); Parameter estimation (Gibbs sampling); Evaluation; Application. Describe how the generative process works, give a convenient visualization of that process, then a comparison with LDA to show the difference and innovation of this paper.

Motivation Learning the interests of authors is a fundamental problem raised by large collections of documents. Previous work usually adopts a discriminative approach, and the chosen features are usually superficial. The authors introduce a generative model that represents each author with a distribution of weights over latent topics. It is an unsupervised clustering algorithm: only the number of topics T needs to be specified. First, this is an important task because we have large collections and the authors' interests are not obvious; the notion of an author's interests is captured by the model. Note that the topics are latent clusters.

Model Formulation It's storytelling time for a generative model! Suppose we have a corpus of D documents that spans a vocabulary of V words and is collectively written by A authors. In this corpus, each document d contains a bag of words w_d (N_d words drawn from the vocabulary; order doesn't matter) and is written by a set of authors a_d (a subset of the A authors). This is what we observe! We see what we observe and hypothesize a story of how it was generated; now we ask the question: how could such a corpus be created? A toy representation of such an observed corpus is sketched below.
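As a concrete illustration of the observed data (a hypothetical toy example, not from the paper; names and sizes are arbitrary), such a corpus might be represented like this:

```python
# Toy observed corpus: word ids index a vocabulary of size V,
# author ids index the A authors; word order within a document is ignored.
vocabulary = ["topic", "model", "author", "word", "data"]      # V = 5
authors = ["Rosen-Zvi", "Griffiths", "Steyvers", "Smyth"]      # A = 4

corpus = [
    {"words": [0, 1, 1, 3, 4], "authors": [0, 3]},   # document 0: w_d as a bag of word ids, a_d as author ids
    {"words": [2, 2, 0, 4],    "authors": [1, 2]},   # document 1
]
```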

Model Formulation (Cont'd) How could what we observe be created? We introduce a latent layer of topic clusters, whose number T is specified by a human, just like in any unsupervised clustering algorithm (e.g., k-means). Suppose each of the A authors writes about the T topics with different probabilities: author k (k ∈ {1, …, A}) writes about topic j (j ∈ {1, …, T}) with probability θ_kj. The probabilities θ_kj form an A×T matrix representing the author-topic distributions. Suppose each of the T topics is represented by a distribution of weights over the V words in the vocabulary: given topic j (j ∈ {1, …, T}), word w_m (m ∈ {1, …, V}) is used with probability φ_jm. The probabilities φ_jm form a T×V matrix representing the topic-word distributions. To generate each word in each document: (1) uniformly choose an author x among the document's authors a_d; (2) sample a topic z from author x's topic distribution θ_x· (author x's row in the A×T matrix); (3) sample a word from topic z's word distribution φ_z· (topic z's row in the T×V matrix). Where do these latent topics come into play? They sit between authors and words and give rise to the two distributions. Now we can start the generative process, i.e., the drawing; see the sketch after this slide. Now a quick quiz: what distribution is this? Multinomial!
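The generative story above can be sketched in a few lines of Python. This is a minimal illustration assuming numpy, with toy sizes and illustrative names (theta, phi, generate_document); it is not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)
A, T, V = 4, 3, 5            # authors, topics, vocabulary size (toy values)
alpha, beta = 0.1, 0.01      # symmetric Dirichlet hyperparameters (see the next slides)

theta = rng.dirichlet(alpha * np.ones(T), size=A)   # A x T author-topic distributions
phi = rng.dirichlet(beta * np.ones(V), size=T)      # T x V topic-word distributions

def generate_document(author_ids, n_words):
    """Generate one document's words given its set of authors a_d."""
    words = []
    for _ in range(n_words):
        x = rng.choice(author_ids)         # 1. uniformly choose an author among a_d
        z = rng.choice(T, p=theta[x])      # 2. sample a topic from author x's row of theta
        w = rng.choice(V, p=phi[z])        # 3. sample a word from topic z's row of phi
        words.append(w)
    return words

print(generate_document(author_ids=[0, 3], n_words=10))
```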

Plate Notation The generative process conveniently visualized. Matrices θ and φ are the parameters we need to estimate in order to learn the authors' interests and the topic patterns of the observed corpus. α and β are called Dirichlet priors; they are hyperparameters governing the multinomial distributions stored in the rows of matrices θ and φ. They are pre-specified to be symmetric and we don't need to estimate them. Everything we discussed on the last slide can be visualized by this diagram, called plate notation. We can see that, to generate the words, matrices θ and φ are the parameters we need to estimate. What are α and β?

A Crash Course on the Dirichlet Distribution Beta distribution: parameters α > 0, β > 0, determining the shape of the PDF; support x ∈ [0, 1]; PDF $p(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}$; sampling result: a value between 0 and 1 that can be used as the parameter p of a binomial distribution B(n, p). Dirichlet distribution: parameters $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_K)$ with $\alpha_i > 0$ and $K \ge 2$, determining the shape of the PDF; support $\mathbf{x} = (x_1, \ldots, x_K)$ with $x_i \in [0, 1]$ and $\sum_{i=1}^{K} x_i = 1$; PDF $p(\mathbf{x}) = \frac{\Gamma(\sum_{i=1}^{K}\alpha_i)}{\prod_{i=1}^{K}\Gamma(\alpha_i)} \prod_{i=1}^{K} x_i^{\alpha_i-1}$; sampling result: a vector of values between 0 and 1 that sum to 1 and can be used as the parameters $\mathbf{p} = (p_1, \ldots, p_K)$ of a multinomial distribution Multi(n, p). For convenience we set symmetric priors (a single scalar value shared by all components of α, and likewise for β).
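To make the Beta-to-binomial and Dirichlet-to-multinomial connection concrete, here is a small numpy illustration (the numbers are arbitrary, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# Beta draw: a single probability usable as the p of a binomial distribution.
p = rng.beta(a=2.0, b=5.0)
heads = rng.binomial(n=100, p=p)

# Dirichlet draw: a probability vector (sums to 1) usable as the p of a multinomial.
alpha = np.array([0.5, 0.5, 0.5, 0.5])       # symmetric prior, K = 4
probs = rng.dirichlet(alpha)
counts = rng.multinomial(n=100, pvals=probs)

print(p, heads, probs.sum(), counts)          # probs.sum() is 1.0 up to floating point
```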

A Comparison between the Author-Topic Model and LDA Author-Topic Model (Rosen-Zvi et al., 2004) vs. Latent Dirichlet Allocation (Blei et al., 2003). Now that we have finished formulating the model, we'll move on to parameter estimation.

Outline Motivation; Model formulation (generative process; plate notation; comparison with LDA); Parameter estimation (Gibbs sampling); Evaluation; Application

Parameter Estimation Parameters: all elements θ_kj and φ_jm in the two matrices θ and φ. Strategy: instead of estimating θ and φ directly, sample the authorship and topic assignment for each word in each document, and estimate θ and φ from the sample of assignments. The strategy is somewhat involved, so let's look at it a little more closely. "Instead of estimating θ and φ directly": why? Direct estimation is intractable! "Sample the authorship and topic assignment for each word in each document": how? Gibbs sampling! "Estimate θ and φ from the sample of assignments": how? We have a formula for this.

"Sample the authorship and topic assignment for each word in each document" How to do this? Gibbs sampling! Gibbs sampling: when sampling from a joint distribution directly is impossible or hard, iteratively draw and update samples from the conditional distributions; this converges to a sample as if drawn from the joint distribution. Toy example: sampling from a bivariate joint distribution $p(\theta_1, \theta_2)$ by alternating draws from $p(\theta_1 \mid \theta_2)$ and $p(\theta_2 \mid \theta_1)$; see the sketch after this slide. Real application in the Author-Topic Model: for each document d, we want to draw samples from $p(x_1, z_1, \ldots, x_{N_d}, z_{N_d} \mid \mathbf{w}_d, \mathbf{a}_d)$, but that is not feasible. Instead, we initialize with random assignments for all words, and then update the authorship and topic assignment of each word in document d, $(x_i, z_i)$ for $i = 1, \ldots, N_d$, with a sample from a conditional distribution: the distribution of the authorship and topic assignment for one word, conditioned on all other words and the assignments of all other words. So Gibbs sampling is the practice of iteratively sampling each variable from its conditional distribution given the rest.
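Since the slide does not spell out the toy bivariate example, here is one hedged version of it, using a standard bivariate normal with correlation rho (my choice of joint), whose two conditionals are known in closed form:

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.8                      # correlation of the assumed bivariate normal
n_iter, burn_in = 5000, 500

theta1, theta2 = 0.0, 0.0      # arbitrary initialization
samples = []
for it in range(n_iter):
    # Each conditional of a standard bivariate normal is N(rho * other, 1 - rho^2).
    theta1 = rng.normal(rho * theta2, np.sqrt(1 - rho**2))
    theta2 = rng.normal(rho * theta1, np.sqrt(1 - rho**2))
    if it >= burn_in:
        samples.append((theta1, theta2))

samples = np.array(samples)
print(np.corrcoef(samples.T)[0, 1])   # should be close to rho
```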

"Sample the authorship and topic assignment for each word in each document" The conditional distribution of the authorship and topic assignment for one word, given all other words and the assignments of all other words, is
$$P(x_i = k, z_i = j \mid w_i = m, \mathbf{z}_{-i}, \mathbf{x}_{-i}, \mathbf{w}_{-i}, \mathbf{a}_d) \propto \frac{C^{WT}_{mj} + \beta}{\sum_{m'} C^{WT}_{m'j} + V\beta} \cdot \frac{C^{AT}_{kj} + \alpha}{\sum_{j'} C^{AT}_{kj'} + T\alpha}$$
where $C^{WT}_{mj}$ is the number of times word m is assigned to topic j, and $C^{AT}_{kj}$ is the number of times topic j (for some word) is assigned to author k, both counts excluding the current word. The first fraction is the smoothed probability of the word being sampled given the topic; the second is the smoothed probability of the topic being sampled given the author. After many iterations the sampler converges. The two fractions are exactly the probabilities stored in the two matrices! Doesn't this sound familiar? A code sketch of this update follows.
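A minimal sketch (not the authors' implementation) of one collapsed Gibbs update implementing the conditional above; the count matrices CWT (V x T) and CAT (A x T) and the outer loop over documents and iterations are assumed to be maintained elsewhere:

```python
import numpy as np

def resample_word(m, doc_authors, x_old, z_old, CWT, CAT, alpha, beta, rng):
    """One collapsed Gibbs step for word m: resample its (author, topic) pair."""
    V, T = CWT.shape

    # Remove the current assignment from the counts.
    CWT[m, z_old] -= 1
    CAT[x_old, z_old] -= 1

    # P(x=k, z=j | ...) up to a constant, for every (author in a_d, topic) pair.
    # The uniform 1/|a_d| author-choice factor is constant and cancels in the normalization.
    word_given_topic = (CWT[m, :] + beta) / (CWT.sum(axis=0) + V * beta)            # length T
    topic_given_author = (CAT[doc_authors, :] + alpha) / \
                         (CAT[doc_authors, :].sum(axis=1, keepdims=True) + T * alpha)  # |a_d| x T
    probs = topic_given_author * word_given_topic        # broadcasts to |a_d| x T
    probs /= probs.sum()

    # Sample a flat index and map it back to an (author, topic) pair.
    idx = rng.choice(probs.size, p=probs.ravel())
    a_idx, z_new = np.unravel_index(idx, probs.shape)
    x_new = doc_authors[a_idx]

    # Add the new assignment back into the counts.
    CWT[m, z_new] += 1
    CAT[x_new, z_new] += 1
    return x_new, z_new
```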

"Estimate θ and φ from the sample of assignments" How to do this? We have a formula for this, which is just the converged probabilities/weights from the last step: $\varphi_{jm} = \frac{C^{WT}_{mj} + \beta}{\sum_{m'} C^{WT}_{m'j} + V\beta}$ and $\theta_{kj} = \frac{C^{AT}_{kj} + \alpha}{\sum_{j'} C^{AT}_{kj'} + T\alpha}$. Sure enough, we reuse the fractions from the conditional distribution as the estimates of the parameters; a numpy version follows.
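In numpy, these point estimates read directly off the count matrices (the same hypothetical CWT and CAT as in the sampler sketch above):

```python
import numpy as np

def estimate_parameters(CWT, CAT, alpha, beta):
    """Point estimates of phi (T x V) and theta (A x T) from the count matrices."""
    V, T = CWT.shape
    phi = (CWT + beta).T / (CWT.sum(axis=0)[:, None] + V * beta)          # row j = topic j's word distribution
    theta = (CAT + alpha) / (CAT.sum(axis=1, keepdims=True) + T * alpha)  # row k = author k's topic distribution
    return phi, theta
```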

Outline Motivation; Model formulation (generative process; plate notation; comparison with LDA); Parameter estimation (Gibbs sampling); Evaluation (given a test document and its author(s), calculate a perplexity score); Application (predict the authors of a test document). 1. Perplexity measures how well the generative model we built predicts the test document; it is a function of the probability the model assigns to the document, and lower is better. 2. To predict the authors, simply rank the perplexity scores obtained under different candidate authors; the candidates with lower perplexity are more likely to be the real authors. A sketch of both follows.
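A simplified sketch of both evaluation and author prediction. The paper averages over posterior samples of θ and φ; this version just plugs in the single point estimates from the previous step, so it is an approximation, not the paper's exact procedure:

```python
import numpy as np

def perplexity(test_words, candidate_authors, theta, phi):
    """Perplexity of a test document (list of word ids) under a candidate author set."""
    # p(word): average over the candidate authors of sum_j theta[a, j] * phi[j, word]
    word_probs = theta[candidate_authors].mean(axis=0) @ phi     # length-V vector
    log_lik = np.log(word_probs[test_words]).sum()
    return np.exp(-log_lik / len(test_words))

def predict_author(test_words, theta, phi):
    """Rank single-author hypotheses by perplexity; lower means a more likely author."""
    scores = [perplexity(test_words, [a], theta, phi) for a in range(theta.shape[0])]
    return int(np.argmin(scores))
```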

Takeaway By incorporating into the generative process a word-level choice of author and a choice of topic according to the author-topic distributions, the Author-Topic Model manages to learn the relationships between authors and topics, and between topics and words. Gibbs sampling is a solution to the difficulty of sampling from joint multivariate distributions and is used to infer parameter values for generative models. The Author-Topic Model can also be used to predict the authors of an unseen document. The learned relationship between authors and topics solves the problem raised in the motivation.

Thank you! Questions?