Analysis of Social Media MLD 10-802, LTI 11-772 William Cohen 10-09-2010.

Analysis of Social Media MLD 10-802, LTI 11-772 William Cohen

Stochastic blockmodel graphs. Last week: spectral clustering. Theory suggests it will work for graphs produced by a particular generative model. Question: can you directly maximize Pr(structure, parameters | data) for that model?

Outline Stochastic block models & inference question Review of text models – Mixture of multinomials & EM – LDA and Gibbs (or variational EM) Block models and inference Mixed-membership block models Multinomial block models and inference w/ Gibbs Bestiary of other probabilistic graph models – Latent-space models, exchangeable graphs, p1, ERGM

Review – supervised Naïve Bayes. Naïve Bayes model: compact plate-notation representation. [Plate diagrams: the expanded model with class C and words W1 … WN for each document, and the compact version with a word plate of size N nested inside a document plate of size M, with parameters for the class prior and the per-class word distributions.]

Review – supervised Naïve Bayes. Multinomial Naïve Bayes generative process:
For each document d = 1, …, M: generate C_d ~ Mult(· | π).
For each position n = 1, …, N_d: generate w_n ~ Mult(· | β, C_d).
[Plate diagram: class node C and word nodes W1 … WN inside a document plate of size M, with class-prior parameter π and per-class word-distribution parameters β.]
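
A minimal sketch of this generative process in Python/NumPy; the sizes, the random seed, and the names pi and beta are illustrative assumptions, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

K, V, M, N = 3, 50, 10, 20           # classes, vocab size, docs, words per doc (illustrative)
pi = rng.dirichlet(np.ones(K))       # class prior
beta = rng.dirichlet(np.ones(V), K)  # per-class word distributions, shape (K, V)

docs, labels = [], []
for d in range(M):
    c = rng.choice(K, p=pi)                    # C_d ~ Mult(. | pi)
    words = rng.choice(V, size=N, p=beta[c])   # w_n ~ Mult(. | beta, C_d)
    labels.append(c)
    docs.append(words)
```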

Review – supervised Naïve Bayes. Multinomial naïve Bayes, learning: maximize the log-likelihood of the observed variables w.r.t. the parameters. This is a convex problem with a global optimum, and the solution is available in closed form as relative-frequency (count-based) estimates.
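
The closed-form solution itself is not reproduced in the transcript; the standard maximum-likelihood estimates for this model, written with the π/β notation used above and without smoothing, are:

\[
\hat{\pi}_c = \frac{\#\{d : C_d = c\}}{M},
\qquad
\hat{\beta}_{c,w} = \frac{\#\{(d,n) : C_d = c,\ w_{d,n} = w\}}{\sum_{w'} \#\{(d,n) : C_d = c,\ w_{d,n} = w'\}}
\]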

Review – unsupervised Naïve Bayes. Mixture model: the unsupervised naïve Bayes model, in which the class becomes a latent variable Z. [Plate diagram: latent class Z and words W inside a document plate of size M, with the class-prior and word-distribution parameters.] The joint probability of words and classes is as in the supervised case, but the classes are not visible, so the likelihood must marginalize over Z.

Review – unsupervised Naïve Bayes. Mixture model, learning: the likelihood is not a convex function, so there is no guaranteed global optimum. Solution: Expectation Maximization, an iterative algorithm that finds a local optimum and is guaranteed to maximize a lower bound on the log-likelihood of the observed data.

Review – unsupervised Naïve Bayes. Quick summary of EM: – log is a concave function, so Jensen's inequality gives a lower bound, e.g. log(0.5 x_1 + 0.5 x_2) ≥ 0.5 log(x_1) + 0.5 log(x_2) – the lower bound is convex! – optimize this lower bound w.r.t. each variable instead. [Figure: plot illustrating Jensen's inequality, comparing log(0.5 x_1 + 0.5 x_2) with 0.5 log(x_1) + 0.5 log(x_2), together with the entropy term H(·) in the bound.]

Review – unsupervised Naïve Bayes. Mixture model, EM solution. E-step: compute the posterior distribution over each document's latent class given the current parameters. M-step: re-estimate the parameters from expected counts under that posterior. Key capability: estimate the distribution of latent variables given observed variables.
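
A compact sketch of EM for this mixture-of-multinomials model; it is an illustration under assumed names (a document-by-vocabulary count matrix X, parameters pi and beta), not the lecture's code:

```python
import numpy as np

def em_multinomial_mixture(X, K, n_iter=50, seed=0):
    """X: (M, V) matrix of word counts per document; K: number of latent classes."""
    rng = np.random.default_rng(seed)
    M, V = X.shape
    pi = np.full(K, 1.0 / K)                 # class prior
    beta = rng.dirichlet(np.ones(V), K)      # per-class word distributions (K, V)
    for _ in range(n_iter):
        # E-step: responsibilities r[d, k] = Pr(Z_d = k | w_d, current parameters)
        log_r = np.log(pi) + X @ np.log(beta).T          # (M, K), up to a constant
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from expected counts (tiny smoothing avoids log(0))
        pi = r.sum(axis=0) / M
        counts = r.T @ X + 1e-12                         # (K, V) expected word counts
        beta = counts / counts.sum(axis=1, keepdims=True)
    return pi, beta, r
```

Each iteration increases the lower bound on the observed-data log-likelihood, so the procedure converges to a local optimum as described above.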

Review - LDA

Motivation. Assumptions: 1) documents are i.i.d.; 2) within a document, words are i.i.d. (bag of words).
For each document d = 1, …, M: generate θ_d ~ D1(…).
For each word n = 1, …, N_d: generate w_n ~ D2(· | θ_d).
Now pick your favorite distributions for D1, D2.
[Plate diagram: θ per document and w per word, with the word plate of size N nested inside a document plate of size M.]

Latent Dirichlet Allocation.
For each document d = 1, …, M: generate θ_d ~ Dir(· | α).
For each position n = 1, …, N_d: generate z_n ~ Mult(· | θ_d), then generate w_n ~ Mult(· | β_{z_n}).
“Mixed membership.”
[Plate diagram: α → θ → z → w, with the word plate of size N nested inside the document plate of size M, and a plate of size K for the topic-word distributions β.]
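
A minimal sketch of this generative process; the sizes, hyperparameter values, Poisson document lengths, and the Dirichlet draw for the topic-word distributions (as in smoothed LDA) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, M = 5, 100, 20                 # topics, vocab size, documents (illustrative)
alpha, eta = 0.1, 0.01               # Dirichlet hyperparameters (illustrative)

beta = rng.dirichlet(np.full(V, eta), K)        # K topic-word distributions
docs = []
for d in range(M):
    N_d = rng.poisson(50) + 1                   # document length (illustrative choice)
    theta = rng.dirichlet(np.full(K, alpha))    # theta_d ~ Dir(alpha)
    z = rng.choice(K, size=N_d, p=theta)        # z_n ~ Mult(theta_d)
    w = np.array([rng.choice(V, p=beta[k]) for k in z])   # w_n ~ Mult(beta_{z_n})
    docs.append(w)
```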

LDA’s view of a document

LDA topics

Review - LDA Latent Dirichlet Allocation – Parameter learning: Variational EM – Numerical approximation using lower-bounds – Results in biased solutions – Convergence has numerical guarantees Gibbs Sampling – Stochastic simulation – unbiased solutions – Stochastic convergence

Review - LDA Gibbs sampling – Applicable when the joint distribution is hard to evaluate but the conditional distributions are known – The sequence of samples comprises a Markov chain – The stationary distribution of the chain is the joint distribution Key capability: estimate the distribution of one latent variable given the other latent variables and the observed variables.

Why does Gibbs sampling work? What's the fixed point? – The stationary distribution of the chain is the joint distribution. When will it converge (in the limit)? – When the graph defined by the chain is connected. How long will it take to converge? – Depends on the second eigenvalue (the spectral gap) of the graph defined by the chain.

Called “collapsed Gibbs sampling” since you've marginalized away some variables. From: Parameter estimation for text analysis – Gregor Heinrich
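
For reference, the standard collapsed Gibbs update for LDA (as derived, e.g., in Heinrich's note; the α/η notation below is the common convention, not necessarily the slide's) is, with the i-th token excluded from all counts:

\[
\Pr(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\;
\big(n^{-i}_{d_i,k} + \alpha\big)\,
\frac{n^{-i}_{k,w_i} + \eta}{n^{-i}_{k,\cdot} + V\eta}
\]

where n_{d,k} counts tokens in document d assigned to topic k, n_{k,w} counts assignments of word w to topic k, and V is the vocabulary size.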

Review - LDA. Latent Dirichlet Allocation, learning with Gibbs sampling: randomly initialize each z_{m,n}; repeat for t = 1, …: for each document m and word n, find Pr(z_{m,n} = k | the other z's) and sample z_{m,n} according to that distribution. “Mixed membership.” [Plate diagram: α → θ → z → w with topic-word distributions β, as before.]
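
A minimal sketch of this sampler as collapsed Gibbs sampling for LDA, using the update given above; the variable names and symmetric hyperparameters are assumptions, not the lecture's code:

```python
import numpy as np

def lda_gibbs(docs, K, V, alpha=0.1, eta=0.01, n_iter=200, seed=0):
    """docs: list of arrays of word ids; returns topic assignments and count tables."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), K))          # document-topic counts
    n_kw = np.zeros((K, V))                  # topic-word counts
    n_k = np.zeros(K)                        # tokens per topic
    z = [rng.integers(K, size=len(doc)) for doc in docs]   # random initialization
    for d, doc in enumerate(docs):           # tabulate the initial assignments
        for n, w in enumerate(doc):
            k = z[d][n]
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[d][n]                  # remove this token from the counts
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # Pr(z_{m,n} = k | other z's), up to normalization
                p = (n_dk[d] + alpha) * (n_kw[:, w] + eta) / (n_k + V * eta)
                k = rng.choice(K, p=p / p.sum())
                z[d][n] = k                  # add it back with the newly sampled topic
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return z, n_dk, n_kw
```

After burn-in, the count tables n_dk and n_kw can be normalized (with the α and η smoothing) to estimate the document-topic and topic-word distributions.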

Outline Stochastic block models & inference question Review of text models – Mixture of multinomials & EM – LDA and Gibbs (or variational EM) Block models and inference Mixed-membership block models Multinomial block models and inference w/ Gibbs Bestiary of other probabilistic graph models – Latent-space models, exchangeable graphs, p1, ERGM

Statistical Models of Networks Want a generative probabilistic model that's amenable to analysis… but more expressive than Erdos-Renyi. One approach: the exchangeable graph model – Exchangeable: X1, X2 are exchangeable if Pr(X1,X2,W) = Pr(X2,X1,W) – A generalization of i.i.d.-ness – It's a Bayesian thing

Review - LDA. Motivation. Assumptions: 1) documents are i.i.d.; 2) within a document, words are i.i.d. (bag of words).
For each document d = 1, …, M: generate θ_d ~ D1(…).
For each word n = 1, …, N_d: generate w_n ~ D2(· | θ_d).
Docs and words are exchangeable.
[Plate diagram: θ per document and w per word, with the word plate of size N nested inside a document plate of size M.]

Stochastic Block models: assume 1) nodes within a block z and 2) edges between blocks z_p, z_q are exchangeable. [Plate diagram: a block assignment z_p for each of N nodes, drawn from a prior π; an edge indicator a_{pq} for each of the N² node pairs, depending on the block pair (z_p, z_q).]

Stochastic Block models: assume 1) nodes within a block z and 2) edges between blocks z_p, z_q are exchangeable. [Same plate diagram as on the previous slide.] Gibbs sampling: randomly initialize z_p for each node p; for t = 1, …: for each node p, compute the distribution of z_p given the other z's and sample z_p from it. See: Snijders & Nowicki, 1997, Estimation and Prediction for Stochastic Blockmodels for Graphs with Latent Block Structure.
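
A sketch of one sweep of a Gibbs sampler for a simple directed Bernoulli block model; the conjugate Beta/Dirichlet priors and all names here are illustrative assumptions and may differ from the model as presented:

```python
import numpy as np

def sbm_gibbs_sweep(A, z, K, rng, a0=1.0, b0=1.0, alpha=1.0):
    """One Gibbs sweep for a directed Bernoulli stochastic block model.
    A: (N, N) 0/1 adjacency matrix (no self-loops); z: (N,) int array of block ids."""
    N = A.shape[0]
    off_diag = ~np.eye(N, dtype=bool)
    # Resample block-block edge probabilities B[k, l] from their Beta posteriors
    B = np.empty((K, K))
    for k in range(K):
        for l in range(K):
            mask = np.outer(z == k, z == l) & off_diag
            edges, pairs = A[mask].sum(), mask.sum()
            B[k, l] = rng.beta(a0 + edges, b0 + pairs - edges)
    # Resample the block prior pi from its Dirichlet posterior
    pi = rng.dirichlet(alpha + np.bincount(z, minlength=K))
    # Resample each node's block given the other nodes' blocks, B, and pi
    for p in range(N):
        others = np.arange(N) != p
        logp = np.log(pi).copy()
        for k in range(K):
            p_out = B[k, z[others]]        # Pr(edge p -> q) if z_p = k
            p_in = B[z[others], k]         # Pr(edge q -> p) if z_p = k
            logp[k] += np.sum(A[p, others] * np.log(p_out)
                              + (1 - A[p, others]) * np.log(1 - p_out))
            logp[k] += np.sum(A[others, p] * np.log(p_in)
                              + (1 - A[others, p]) * np.log(1 - p_in))
        w = np.exp(logp - logp.max())
        z[p] = rng.choice(K, p=w / w.sum())
    return z, B, pi
```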

Mixed Membership Stochastic Block models. [Plate diagram: a membership vector θ_p for each of N nodes, drawn with Dirichlet parameter α; for each of the N² node pairs (p, q), per-pair block indicators z_{p→q} and z_{p←q} and an edge indicator a_{pq}.] Airoldi et al, JMLR 2008
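
For reference, the generative process this diagram depicts (the mixed membership stochastic blockmodel of Airoldi et al., 2008; the notation below is the standard one and may differ slightly from the slide's) is:

\[
\theta_p \sim \mathrm{Dir}(\alpha) \ \text{for each node } p; \qquad
z_{p\to q} \sim \mathrm{Mult}(\theta_p),\;
z_{p\leftarrow q} \sim \mathrm{Mult}(\theta_q) \ \text{for each pair } (p,q); \qquad
a_{pq} \sim \mathrm{Bernoulli}\big(B_{z_{p\to q},\, z_{p\leftarrow q}}\big)
\]

where B is a K × K matrix of block-to-block interaction probabilities.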

Mixed Membership Stochastic Block models

Parkkinen et al paper

Another mixed membership block model

Notation: z = (z_i, z_j) is a pair of block ids; n_z = # of pairs labeled z; q_{z1,i} = # of links to node i from block z1; q_{z1,·} = # of outlinks in block z1; δ = indicator for the diagonal; M = # of nodes.

Another mixed membership block model

Outline Stochastic block models & inference question Review of text models – Mixture of multinomials & EM – LDA and Gibbs (or variational EM) Block models and inference Mixed-membership block models Multinomial block models and inference w/ Gibbs Bestiary of other probabilistic graph models – Latent-space models, exchangeable graphs, p1, ERGM

Exchangeable Graph Model. Defined by a 2^k × 2^k table q(b1, b2). Draw a length-k bit string b(n) for each node n from a uniform distribution. For each pair of nodes n, m: flip a coin with bias q(b(n), b(m)); if it's heads, connect n and m. A more complicated way to pick the bit strings: pick a k-dimensional vector u from a multivariate normal with variance α and covariance β, so the u_i's are correlated; pass each u_i through a sigmoid so it's in [0,1], call that p_i; pick b_i using p_i.

Exchangeable Graph Model (continued). Pick a k-dimensional vector u from a multivariate normal with variance α and covariance β, so the u_i's are correlated; pass each u_i through a sigmoid so it's in [0,1], call that p_i; pick b_i using p_i. If α is big then u_x, u_y are really big (or small), so p_x, p_y will end up in a corner of the unit square. [Figure: the unit square [0,1] × [0,1] with its corners marked.]
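
A small sketch of sampling a graph from the exchangeable graph model described on the two slides above; the particular covariance structure, the random symmetric table q, and all sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 30, 4                                   # nodes, bits per node (illustrative)
alpha, beta_cov = 2.0, 1.5                     # variance / covariance of u (illustrative)

# q: 2^k x 2^k table of connection probabilities; random and symmetrized here so the
# resulting graph is undirected (illustrative choice)
q = rng.uniform(size=(2**k, 2**k))
q = (q + q.T) / 2

# Correlated bit strings: u ~ N(0, Sigma), p_i = sigmoid(u_i), b_i ~ Bernoulli(p_i)
Sigma = np.full((k, k), beta_cov) + (alpha - beta_cov) * np.eye(k)   # diag = alpha, off-diag = beta
def bit_string(rng):
    u = rng.multivariate_normal(np.zeros(k), Sigma)
    p = 1.0 / (1.0 + np.exp(-u))
    return rng.random(k) < p

b = np.array([bit_string(rng) for _ in range(N)])          # (N, k) bits, one string per node
idx = b @ (2 ** np.arange(k))                              # bit string -> row/column of q

# For each pair of nodes, connect with probability q(b(n), b(m))
A = np.zeros((N, N), dtype=int)
for n in range(N):
    for m in range(n + 1, N):
        A[n, m] = A[m, n] = rng.random() < q[idx[n], idx[m]]
```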

The p1 model for a directed graph. Parameters, per node i: – Θ: background edge probability – α_i: “expansiveness” – how extroverted is i? – β_i: “popularity” – how much do others want to link to i? – ρ_i: “reciprocation” – how likely is i to respond to an incoming link with an outgoing one? A logistic-regression-like procedure can be used to fit this model to data from a graph.

Exponential Random Graph Model Basic idea: – Define some features of the graph (e.g., number of edges, number of triangles, …) – Build a MaxEnt-style model based on these features
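
Concretely, with graph features f_k(G) and weights θ_k, the MaxEnt-style model takes the standard exponential-family form (standard ERGM notation, not taken from the slide):

\[
P(G) = \frac{1}{Z(\theta)} \exp\!\Big( \sum_k \theta_k f_k(G) \Big),
\qquad
Z(\theta) = \sum_{G'} \exp\!\Big( \sum_k \theta_k f_k(G') \Big)
\]

The normalizer Z(θ) sums over all graphs on the node set, which is why fitting ERGMs typically relies on MCMC approximations.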

Latent Space Model. Each node i has a latent position z(i) in Euclidean space. The z(i)'s are drawn from a mixture of Gaussians. The probability of interaction between i and j depends on the distance between z(i) and z(j). Inference is a little more complicated… [Handcock & Raftery, 2007]
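
A common concrete choice for the interaction probability in latent space models (the logistic distance form of Hoff, Raftery & Handcock; the slide may use a different parameterization) is:

\[
\mathrm{logit}\,\Pr(a_{ij} = 1 \mid z_i, z_j) \;=\; \beta_0 \;-\; \lVert z_i - z_j \rVert
\]

so nodes with nearby latent positions are more likely to be connected.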

Outline Stochastic block models & inference question Review of text models – Mixture of multinomials & EM – LDA and Gibbs (or variational EM) Block models and inference Mixed-membership block models Multinomial block models and inference w/ Gibbs Bestiary of other probabilistic graph models – Latent-space models, exchangeable graphs, p1, ERGM