
Semantic History Embedding in Online Generative Topic Models Pu Wang (presenter) Authors: Loulwah AlSumait Daniel Barbará


1 Semantic History Embedding in Online Generative Topic Models
Pu Wang (presenter)
Authors: Loulwah AlSumait (lalsumai@gmu.edu), Daniel Barbará (dbarbara@gmu.edu), Carlotta Domeniconi (carlotta@cs.gmu.edu)
Department of Computer Science, George Mason University
SDM 2009

2 Outline
• Introduction and related work
• Online LDA (OLDA)
• Parameter generation
  – Sliding history window
  – Contribution weights
• Experiments
• Conclusion and future work

3 Introduction
• When a topic is observed at a certain time, it is more likely to appear in the future
• Previously discovered topics hold important information about the underlying structure of the data
• Incorporating such information into future knowledge discovery can enhance the inferred topics

4 Related Work
• Q. Sun, R. Li et al., ACL 2008: an LDA-based Fisher kernel to measure the semantic similarity between blocks of text in documents
• X. Wang et al., ICDM 2007: the Topical N-Gram model, which automatically identifies feasible n-grams based on the context that surrounds them
• X. Phan et al., IW3C2 2008: a classifier trained on a small set of labeled documents together with an LDA topic model estimated from Wikipedia

5 Tracking Topics: Online LDA (OLDA)
[Plate diagram of the OLDA graphical model at streams t and t+1 (variables z, w, θ, φ, α, β over N_d words, M documents, K topics; time between t and t+1 = ε), with each stream S_t feeding three modules: Topic Evolution Tracking, Priors Construction, and Emerging Topic Detection (which produces the Emerging Topic List).]

6 Inference Process
• Parameter generation combines the current stream with historic observations
• The result is a simple inference problem, solved by Gibbs sampling
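The slide reduces OLDA inference at each stream to standard LDA inference via Gibbs sampling. Below is a minimal collapsed-Gibbs sketch (my own illustrative code, not the Matlab toolbox the authors used); note that `beta` may be a full K × V matrix, which is exactly where OLDA injects its history-informed priors.

```python
import numpy as np

def gibbs_lda(docs, V, K, alpha, beta, iters=100, seed=0):
    """Collapsed Gibbs sampling for LDA over one stream.
    docs: list of word-index lists; beta may be a scalar (plain LDA)
    or a K x V matrix (OLDA's history-informed priors)."""
    rng = np.random.default_rng(seed)
    beta = np.broadcast_to(np.asarray(beta, dtype=float), (K, V)).copy()
    beta_sum = beta.sum(axis=1)                 # per-topic prior mass
    nzw = np.zeros((K, V)); ndz = np.zeros((len(docs), K)); nz = np.zeros(K)
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):              # initialize counts
        for w, k in zip(doc, z[d]):
            nzw[k, w] += 1; ndz[d, k] += 1; nz[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                     # remove current assignment
                nzw[k, w] -= 1; ndz[d, k] -= 1; nz[k] -= 1
                # full conditional P(z_i = k | z_-i, w)
                p = (nzw[:, w] + beta[:, w]) / (nz + beta_sum) * (ndz[d] + alpha)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                nzw[k, w] += 1; ndz[d, k] += 1; nz[k] += 1
    phi = (nzw + beta) / (nz + beta_sum)[:, None]       # topic-word estimates
    theta = (ndz + alpha) / (ndz + alpha).sum(axis=1, keepdims=True)
    return theta, phi
```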

7 Topic Evolution Tracking
• Topic alignment over time
• Handles changes in lexicon, topic drift

Aligned topics over time (P(topic); P(word|topic)):
Time t:
  Topic 1 (0.65): bank (0.44), money (0.35), loan (0.21)
  Topic 2 (0.35): factory (0.53), production (0.34), labor (0.13)
Time t+1:
  Topic 1 (0.43): bank (0.5), credit (0.32), money (0.18)
  Topic 2 (0.57): factory (0.48), cost (0.32), manufacturing (0.2)
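In OLDA the alignment is largely implicit: topic k at time t+1 evolves from topic k at time t because the latter shapes its prior. Purely as an illustration (not a method from the paper), evolved topics can be sanity-checked against their predecessors by matching on symmetric KL divergence between word distributions:

```python
import numpy as np

def align_topics(phi_old, phi_new):
    """For each topic in phi_new (K_new x V), return the index of the
    closest topic in phi_old (K_old x V) under symmetric KL divergence.
    Illustrative only: OLDA aligns topics via priors, not matching."""
    eps = 1e-12
    p = phi_old + eps                       # avoid log(0)
    q = phi_new + eps
    # pairwise symmetric KL: kl[i, j] = D(p_i || q_j) + D(q_j || p_i)
    kl = (p[:, None, :] * np.log(p[:, None, :] / q[None, :, :])).sum(-1) \
       + (q[None, :, :] * np.log(q[None, :, :] / p[:, None, :])).sum(-1)
    return kl.argmin(axis=0)                # nearest old topic per new topic
```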

8 Sliding History Window
• Consider all topic-word distributions within a "sliding history window" (δ)
• Alternatives for keeping track of history at time t:
  – Full memory: δ = t
  – Short memory: δ = 1
  – Intermediate memory: δ = c
• Evolution matrix: for each topic, its distribution over the dictionary is tracked over time, one column per stream in the window
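The three memory settings above can all be realized with a bounded queue; the evolution matrix B_k for topic k then has one column per retained stream. A small sketch with hypothetical helper names:

```python
from collections import deque
import numpy as np

def make_history(delta):
    """Sliding history window: retains at most the last `delta`
    topic-word matrices phi (each K x V), one per stream.
    delta = 1 is short memory; delta = t (every stream so far) is
    full memory; a constant c in between is intermediate memory."""
    return deque(maxlen=delta)

def evolution_matrix(history, k):
    """Evolution matrix B_k for topic k: a V x delta matrix whose
    columns are topic k's word distributions, oldest stream first."""
    return np.column_stack([phi[k] for phi in history])
```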

9 Contribution Control
• Evolution tuning parameters ω: individual weights of the models in the window
  – Decaying history: ω_1 < ω_2 < … < ω_δ
  – Equal contributions: ω_1 = ω_2 = … = ω_δ
• Total weight of history (vs. weight of new observations):
  – Balanced weights (sum = 1)
  – Biased toward the past (sum > 1)
  – Biased toward the future (sum < 1)
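The two knobs on this slide, the weight profile (equal vs. decaying) and the total weight, can be sketched as one weight-vector constructor. This is my own illustration; the geometric `decay` rate is an assumption, not a value from the paper:

```python
import numpy as np

def history_weights(delta, scheme="equal", total=1.0, decay=0.5):
    """Contribution weights omega_1..omega_delta over the window
    (oldest first), scaled so they sum to `total`.

    total = 1 balances history against new observations; total > 1
    biases toward the past; total < 1 biases toward the future."""
    if scheme == "equal":
        w = np.ones(delta)
    else:  # "decaying": older models weigh less (omega_1 < ... < omega_delta)
        w = decay ** np.arange(delta - 1, -1, -1)
    return total * w / w.sum()
```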

10 Parameter Generation
• Priors of the topic distribution over words at time t+1 are constructed from the history window
• The topic distribution is then generated from these priors
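The generation step can be sketched as a weighted combination of the topics inferred over the window: the Dirichlet prior for topic k at time t+1 is the evolution matrix B_k^t applied to the weight vector ω. Function and variable names here are mine:

```python
import numpy as np

def generate_beta(history, omega):
    """Priors for the topic-word distributions at stream t+1:
    beta_k^{t+1} = B_k^t @ omega, i.e. a weighted sum of topic k's
    word distributions over the window.
    history: sequence of K x V phi matrices, oldest first;
    omega: length-delta weight vector (oldest first).
    Returns a K x V matrix of Dirichlet priors, one row per topic."""
    stacked = np.stack(list(history))       # (delta, K, V)
    return np.einsum('d,dkv->kv', omega, stacked)
```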

11 Experimental Design
• "Matlab Topic Modeling Toolbox" by Mark Steyvers and Tom Griffiths
• Datasets:
  – NIPS: proceedings from 1988–2000; 1,740 papers, 13,649 unique words, 2,301,375 word tokens; 13 streams of 90 to 250 documents each
  – Reuters-21578: news from 26-FEB-1987 to 19-OCT-1987; 10,337 documents, 12,112 unique words, 793,936 word tokens; 30 streams (29 of 340 documents, 1 of 517)
• Baselines:
  – OLDA_fixed: no memory
  – OLDA (ω(1)): short memory
• Performance evaluation:
  – Measure: perplexity
  – Test set: documents of the next year or stream
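Perplexity on the held-out stream is the exponential of the negative average log-likelihood per token; lower is better. A minimal sketch, scoring each token under the document's topic mixture:

```python
import numpy as np

def perplexity(test_docs, theta, phi):
    """Perplexity = exp(-(1/N) * sum_d sum_i log p(w_di)), where
    p(w_di) = sum_k theta[d, k] * phi[k, w_di] and N is the total
    number of held-out tokens."""
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(test_docs):
        for w in doc:
            log_lik += np.log(theta[d] @ phi[:, w])
        n_tokens += len(doc)
    return np.exp(-log_lik / n_tokens)
```

As a sanity check, a uniform model over a V-word vocabulary has perplexity exactly V.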

12 Reuters: OLDA with Fixed β vs. OLDA with Semantic β
[Perplexity plot comparing the no-memory baseline (fixed β) against OLDA with semantic β.]

13 Reuters: OLDA with Different Window Sizes and Weights
• Increasing the window size enhanced prediction
• Incremental history information (δ > 1, sum > 1) did not improve topic estimation at all
[Perplexity plot comparing short memory, equal contributions, and incremental history information as window size increases.]

14 NIPS: OLDA with Different Window Sizes
• Increasing the window size enhanced prediction w.r.t. short memory
• Window sizes greater than 3 enhanced prediction
[Perplexity plot with no-memory and short-memory baselines, also showing the effect of total weight.]

15 NIPS: OLDA with Different Total Weights
• Models with lower total weight resulted in better prediction
[Perplexity plot comparing no memory, sum of weights = 1, and decreased sums of weights.]

16 NIPS & Reuters: OLDA with Different Total Weights
[Perplexity plots with variable sum(ω) at δ = 2, comparing increased vs. decreased total sums of weights.]

17 NIPS: OLDA with Equal vs. Decaying History Contributions
[Perplexity plot comparing equal and decaying contribution weights.]

18 Conclusions
• Studied the effect of embedding semantic information in LDA topic modeling of text streams
• Parameter generation based on topical structures inferred in the past
• Semantic embedding enhances OLDA prediction
• Evaluated the effects of the total influence of history, the history window size, and equal vs. decaying contributions
• Future work:
  – Use of prior knowledge
  – Effect of embedded historic semantics on detecting emerging and/or periodic topics

