Presentation transcript: “Topic models for corpora and for graphs”

1 Topic models for corpora and for graphs

2 More experiments
More “real” network datasets

3 More experiments

4

5 More experiments: varying the number of clusters for PIC and PIC4 (which starts with a random 4-d point rather than a random 1-d point).

6 More experiments: varying the amount of noise for PIC and PIC4 (which starts with a random 4-d point rather than a random 1-d point).

7 Motivation
Social graphs seem to have some aspects of randomness (small diameter, giant connected components, …) and some structure (homophily, scale-free degree distribution?).

8 More terms
“Stochastic block model”, aka “block-stochastic matrix”:
Draw ni nodes in block i.
With probability pij, connect pairs (u, v) where u is in block i and v is in block j.
Special, simple case: pii = qi, and pij = s for all i ≠ j (see the sampling sketch below).
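A minimal sketch of sampling a graph from this special case of the stochastic block model; the block sizes ni, within-block probabilities qi, and cross-block probability s below are made-up example values, not from the slides.

```python
# Sample an undirected graph from the simple stochastic block model:
# p_ii = q_i within blocks, p_ij = s across blocks.
import numpy as np

rng = np.random.default_rng(0)

n = [20, 30, 25]                 # n_i: number of nodes drawn in block i
q = [0.30, 0.25, 0.35]           # p_ii = q_i: within-block edge probability
s = 0.02                         # p_ij = s for all i != j

# block id for every node
blocks = np.repeat(np.arange(len(n)), n)
N = blocks.size

# P[u, v] = p_{block(u), block(v)}
P = np.where(blocks[:, None] == blocks[None, :],
             np.array(q)[blocks][:, None], s)

# connect each pair (u, v) independently with probability p_ij
A = (rng.random((N, N)) < P).astype(int)
A = np.triu(A, 1); A = A + A.T   # keep it a simple undirected graph
print(A.sum() // 2, "edges among", N, "nodes")
```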

9 Not? football

10 Not? books Artificial graph

11 Stochastic blockmodel graphs
Alternative to spectral methods: can you directly maximize Pr(structure,parameters|data) for a reasonable model?

12 The next big question
Alternative to spectral methods: can you directly maximize Pr(structure, parameters | data) for a reasonable model?
How expensive is it? How well does it work?
What’s the “reasonable model”, and how reasonable does it have to be?
We’ve looked in depth at one generative model so far….

13 Outline
Stochastic block models & the inference question
Review of text models: mixture of multinomials & EM; LDA and Gibbs (or variational EM)
Block models and inference
Mixed-membership block models
Multinomial block models and inference w/ Gibbs

14 Review – supervised Naïve Bayes
Naïve Bayes model and its compact representation: a class variable C generates the words W1, W2, …, WN of a document; in plate notation the word node W is replicated N times within each of M documents, with word-distribution parameter β.

15 Review – supervised Naïve Bayes
Multinomial Naïve Bayes (sketched in code below):
For each document d = 1, …, M: generate Cd ~ Mult(· | π).
For each position n = 1, …, Nd: generate wn ~ Mult(· | β, Cd).
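A minimal sketch of this generative story; the toy vocabulary, class prior π, and per-class word distributions β below are made-up illustration values, not from the slides.

```python
# Supervised multinomial naive Bayes as a generator:
# C_d ~ Mult(pi), then each word w_n ~ Mult(beta[C_d]).
import numpy as np

rng = np.random.default_rng(1)
vocab = ["goal", "match", "vote", "party", "ball", "law"]
pi = np.array([0.5, 0.5])                                # Pr(class)
beta = np.array([[0.3, 0.3, 0.02, 0.03, 0.3, 0.05],      # class 0: sports-ish words
                 [0.05, 0.02, 0.3, 0.3, 0.03, 0.3]])     # class 1: politics-ish words

M, Nd = 3, 8                                             # documents, words per doc
for d in range(M):
    c = rng.choice(2, p=pi)                              # generate C_d ~ Mult(.|pi)
    words = rng.choice(vocab, size=Nd, p=beta[c])        # w_n ~ Mult(.|beta, C_d)
    print(f"doc {d} (class {c}):", " ".join(words))
```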

16 Review – supervised Naïve Bayes
Multinomial naïve Bayes: learning. Maximize the log-likelihood of the observed variables with respect to the parameters. This is a convex function, so there is a global optimum. Solution: closed-form relative-frequency estimates (see the reconstruction below).
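A standard reconstruction of that maximum-likelihood solution, assuming the class prior π and per-class word distributions β used on the earlier slides (the exact formula on the original slide is not reproduced here):

```latex
% ML estimates for supervised multinomial naive Bayes,
% where #(w, d) is the count of word w in document d.
\hat{\pi}_k \;=\; \frac{\sum_{d=1}^{M} \mathbf{1}[C_d = k]}{M},
\qquad
\hat{\beta}_{k,w} \;=\; \frac{\sum_{d:\,C_d = k} \#(w, d)}{\sum_{w'} \sum_{d:\,C_d = k} \#(w', d)}
```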

17 Review – unsupervised Naïve Bayes
Mixture model: unsupervised naïve Bayes model. The joint probability of words and classes is the same as before, but the classes are not visible: the class node becomes a latent variable Z in the plate diagram (M documents, N words, parameter β).

18 Review – unsupervised Naïve Bayes
Mixture model: learning. The objective is not a convex function, so there is no global-optimum solution. Solution: Expectation Maximization, an iterative algorithm that finds a local optimum and is guaranteed to maximize a lower bound on the log-likelihood of the observed data.

19 Review – unsupervised Naïve Bayes
Mixture model: EM solution (sketched in code below).
E-step: for each document d and class k, compute the posterior Pr(label for d is k | words, current parameters) — the γ’s.
M-step: re-estimate Pr(class k) and Pr(word w | k) from the γ’s.
Key capability: estimate the distribution of latent variables given observed variables.
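A minimal sketch of EM for the mixture-of-multinomials model; the toy document-term count matrix X, the number of classes K, the smoothing constant, and the iteration count are arbitrary illustration choices, not from the slides.

```python
# EM for an unsupervised mixture of multinomials (unsupervised naive Bayes).
import numpy as np

rng = np.random.default_rng(2)
X = rng.integers(0, 4, size=(30, 10)).astype(float)   # 30 docs, 10 word types
K = 2

pi = np.full(K, 1.0 / K)                               # Pr(class k)
beta = rng.dirichlet(np.ones(X.shape[1]), size=K)      # Pr(word | class k)

for _ in range(50):
    # E-step: gamma[d, k] = Pr(label of doc d is k | words, current params)
    log_post = np.log(pi) + X @ np.log(beta).T
    log_post -= log_post.max(axis=1, keepdims=True)
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # M-step: re-estimate Pr(class) and Pr(word | class) from soft counts
    pi = gamma.mean(axis=0)
    beta = gamma.T @ X + 1e-3                          # small smoothing keeps probs > 0
    beta /= beta.sum(axis=1, keepdims=True)

print("mixing weights:", np.round(pi, 3))
```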

20 Review - LDA

21 Review - LDA
Motivation. Assumptions: 1) documents are i.i.d.; 2) within a document, words are i.i.d. (bag of words).
For each document d = 1, …, M: generate θd ~ D1(…).
For each word n = 1, …, Nd: generate wn ~ D2(· | θdn).
Now pick your favorite distributions for D1, D2.

22 Latent Dirichlet Allocation
“Mixed membership” Latent Dirichlet Allocation (sketched in code below):
For each document d = 1, …, M: generate θd ~ Dir(· | α).
For each position n = 1, …, Nd: generate zn ~ Mult(· | θd); generate wn ~ Mult(· | β, zn).
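A minimal sketch of the LDA generative story above; α, the topic-word distributions β, and the corpus sizes are toy values chosen for illustration.

```python
# LDA as a generator: theta_d ~ Dir(alpha), z_n ~ Mult(theta_d), w_n ~ Mult(beta[z_n]).
import numpy as np

rng = np.random.default_rng(3)
K, V, M, Nd = 3, 12, 4, 10                  # topics, vocab size, docs, words per doc
alpha = np.full(K, 0.5)
beta = rng.dirichlet(np.ones(V), size=K)    # one word distribution per topic

docs = []
for d in range(M):
    theta = rng.dirichlet(alpha)            # theta_d ~ Dir(. | alpha)
    z = rng.choice(K, size=Nd, p=theta)     # z_n ~ Mult(. | theta_d)
    w = np.array([rng.choice(V, p=beta[k]) for k in z])   # w_n ~ Mult(. | beta, z_n)
    docs.append(w)

print("word ids of doc 0:", docs[0])
```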

23 LDA’s view of a document

24 LDA topics

25 Review - LDA
Latent Dirichlet Allocation: parameter learning.
Variational EM: numerical approximation using lower bounds; results in biased solutions; convergence has numerical guarantees.
Gibbs sampling: stochastic simulation; unbiased solutions; stochastic convergence.

26 Review - LDA Gibbs sampling
Applicable when the joint distribution is hard to evaluate but the conditional distributions are known. The sequence of samples comprises a Markov chain, and the stationary distribution of the chain is the joint distribution. Key capability: estimate the distribution of one latent variable given the other latent variables and the observed variables.
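A minimal, generic Gibbs sampling illustration (not from the slides): sampling a bivariate normal with correlation ρ by alternating its two known full conditionals, which is exactly the “hard joint, easy conditionals” setting described above.

```python
# Gibbs sampling from a standard bivariate normal with correlation rho,
# alternating x | y and y | x.
import numpy as np

rng = np.random.default_rng(4)
rho, T = 0.8, 10_000
x, y = 0.0, 0.0
samples = np.empty((T, 2))
for t in range(T):
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))   # x | y ~ N(rho*y, 1 - rho^2)
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))   # y | x ~ N(rho*x, 1 - rho^2)
    samples[t] = x, y

# after burn-in, the empirical correlation approaches rho
print("empirical correlation:", np.corrcoef(samples[2000:].T)[0, 1])
```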

27 Why does Gibbs sampling work?
What’s the fixed point? The stationary distribution of the chain is the joint distribution.
When will it converge (in the limit)? When the graph defined by the chain is connected.
How long will it take to converge? Depends on the second eigenvector of that graph.

28

29 Called “collapsed Gibbs sampling”, since you’ve marginalized away some variables. From: Parameter Estimation for Text Analysis, Gregor Heinrich.

30 Review - LDA
“Mixed membership” Latent Dirichlet Allocation, collapsed Gibbs sampling (sketched in code below):
Randomly initialize each zm,n.
Repeat for t = 1, …. (30? 100?)
For each doc m and word n: find Pr(zmn = k | other z’s) and sample zmn according to that distribution.
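A minimal sketch of the collapsed Gibbs sampler above, in the style of Heinrich’s notes; the toy corpus, K, α, and the topic-word smoothing η are illustration values, and a real implementation would use sparse count structures.

```python
# Collapsed Gibbs sampling for LDA on a tiny corpus of word-id lists.
import numpy as np

rng = np.random.default_rng(5)
docs = [[0, 1, 2, 1], [3, 4, 3, 5], [0, 2, 4, 5]]        # word ids per document
K, V, alpha, eta = 2, 6, 0.5, 0.1

# count tables: n_dk = topic counts per doc, n_kw = word counts per topic
n_dk = np.zeros((len(docs), K)); n_kw = np.zeros((K, V)); n_k = np.zeros(K)
z = [rng.integers(0, K, size=len(d)) for d in docs]      # random initialization
for m, d in enumerate(docs):
    for n, w in enumerate(d):
        k = z[m][n]
        n_dk[m, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

for t in range(200):
    for m, d in enumerate(docs):
        for n, w in enumerate(d):
            k = z[m][n]
            # remove this token's current assignment from the counts
            n_dk[m, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
            # Pr(z_mn = k | other z's, words) ∝ (n_dk + alpha) * (n_kw + eta) / (n_k + V*eta)
            p = (n_dk[m] + alpha) * (n_kw[:, w] + eta) / (n_k + V * eta)
            k = rng.choice(K, p=p / p.sum())              # sample the new topic
            z[m][n] = k
            n_dk[m, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

# estimated topic-word distributions
print(np.round((n_kw + eta) / (n_kw + eta).sum(axis=1, keepdims=True), 2))
```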

31 Outline
Stochastic block models & the inference question
Review of text models: mixture of multinomials & EM; LDA and Gibbs (or variational EM)
Block models and inference
Mixed-membership block models
Multinomial block models and inference w/ Gibbs
Bestiary of other probabilistic graph models: latent-space models, exchangeable graphs, p1, ERGM

32 Review - LDA
Motivation. Assumptions: 1) documents are i.i.d.; 2) within a document, words are i.i.d. (bag of words).
For each document d = 1, …, M: generate θd ~ D1(…).
For each word n = 1, …, Nd: generate wn ~ D2(· | θdn).
Docs and words are exchangeable.

33 Stochastic block models: assume that 1) nodes within a block z, and 2) edges between blocks zp, zq, are exchangeable.

34 Stochastic block models: assume that 1) nodes within a block z, and 2) edges between blocks zp, zq, are exchangeable.
Gibbs sampling (sketched in code below): randomly initialize zp for each node p. For t = 1, …: for each node p, compute Pr(zp | other z’s) and sample zp.
See: Snijders & Nowicki, 1997, Estimation and Prediction for Stochastic Blockmodels for Groups with Latent Graph Structure.
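A minimal sketch of that Gibbs sweep over block assignments, assuming the block-connection probabilities are known and fixed; a full sampler, as in Snijders & Nowicki, would also resample those probabilities. The graph, sizes, and probabilities below are toy values.

```python
# Gibbs sampling of block assignments z_p for a stochastic block model
# with known block-connection probabilities P.
import numpy as np

rng = np.random.default_rng(6)

def gibbs_sbm(A, P, prior, T=100):
    """A: NxN 0/1 adjacency matrix, P: KxK edge probabilities, prior: K block prior."""
    N, K = A.shape[0], P.shape[0]
    z = rng.integers(0, K, size=N)                        # random initialization
    for t in range(T):
        for p in range(N):
            others = np.delete(np.arange(N), p)
            a = A[p, others]                              # observed edges at node p
            logp = np.log(prior).copy()
            for k in range(K):                            # Pr(z_p = k | other z's, A)
                pk = P[k, z[others]]
                logp[k] += np.sum(a * np.log(pk) + (1 - a) * np.log(1 - pk))
            w = np.exp(logp - logp.max())
            z[p] = rng.choice(K, p=w / w.sum())           # sample z_p
    return z

# tiny demo: a 2-block graph generated from the same model, then recovered
P = np.array([[0.4, 0.05], [0.05, 0.4]])
true_z = np.repeat([0, 1], 15)
A = (rng.random((30, 30)) < P[true_z][:, true_z]).astype(int)
A = np.triu(A, 1); A = A + A.T
print(gibbs_sbm(A, P, prior=np.array([0.5, 0.5])))
```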

35 Mixed Membership Stochastic Block models
(Plate diagram of the model.) Airoldi et al., JMLR 2008

36 originally - PSK MLG 2009

37 Another mixed membership block model

38 Another mixed membership block model
z = (zi, zj) is a pair of block ids
nz = # pairs with label z
qz1,i = # links to node i from block z1
qz1,· = # outlinks in block z1
δ = indicator for the diagonal
M = # nodes
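A minimal sketch of how these count statistics could be tallied from a directed edge list in which each edge carries a pair of block ids; the edges and block assignments are made up for illustration, and the diagonal indicator δ is omitted.

```python
# Count statistics n_z, q_{z1,i}, q_{z1,.}, M for an edge list with
# per-edge block-pair assignments z = (z_i, z_j).
from collections import Counter

edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 0)]         # links i -> j
z = {(0, 1): (0, 1), (0, 2): (0, 1), (1, 2): (1, 1),
     (2, 0): (1, 0), (3, 0): (0, 0)}                     # hypothetical assignments

n_z = Counter(z.values())                                 # n_z   = # pairs with label z
q_z1_i = Counter()                                        # q_{z1,i} = # links to i from block z1
q_z1_dot = Counter()                                      # q_{z1,.} = # outlinks in block z1
for (i, j), (z1, z2) in z.items():
    q_z1_i[(z1, j)] += 1
    q_z1_dot[z1] += 1

M = len({u for e in edges for u in e})                    # M = # nodes
print(n_z, q_z1_i, q_z1_dot, M)
```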

39 Another mixed membership block model

40 Experiments
+ lots of synthetic data. Balasubramanyan, Lin, Cohen, NIPS w/s 2010

41

42

43 Experiments Balasubramanyan, Lin, Cohen, NIPS w/s 2010

44 Experiments Balasubramanyan, Lin, Cohen, NIPS w/s 2010

45 Other observations…
Much faster to learn from a large sparse graph than the MMSB model.
Easy to combine with other generative components, e.g., in order to model documents + graph structure.

