
Semantic History Embedding in Online Generative Topic Models Pu Wang (presenter) Authors: Loulwah AlSumait Daniel Barbará


1 Semantic History Embedding in Online Generative Topic Models
Pu Wang (presenter)
Authors: Loulwah AlSumait (lalsumai@gmu.edu), Daniel Barbará (dbarbara@gmu.edu), Carlotta Domeniconi (carlotta@cs.gmu.edu)
Department of Computer Science, George Mason University
SDM 2009

2 Outline
• Introduction and related work
• Online LDA (OLDA)
• Parameter generation
  – Sliding history window
  – Contribution weights
• Experiments
• Conclusion and future work

3 Introduction
• When a topic is observed at a certain time, it is more likely to appear in the future
• Previously discovered topics hold important information about the underlying structure of the data
• Incorporating such information into future knowledge discovery can enhance the inferred topics

4 Related Work
• Q. Sun, R. Li et al., ACL 2008: an LDA-based Fisher kernel to measure the semantic similarity between blocks of text in documents
• X. Wang et al., ICDM 2007: the Topical N-Gram model, which automatically identifies feasible n-grams based on the context that surrounds them
• X. Phan et al., IW3C2 2008: a classifier trained on a small set of labeled documents together with an LDA topic model estimated from Wikipedia

5 Tracking Topics: Online LDA (OLDA)
[Plate diagram of the OLDA graphical model at streams t and t+1 (variables z, w, θ, φ, α, β over N_d words, M documents, K topics; time between t and t+1 = ε), with each stream S_t feeding three modules: Topic Evolution Tracking, Priors Construction, and Emerging Topic Detection (which produces the Emerging Topic List).]

6 Inference Process
• Parameter generation combines the current stream with historic observations
• The result is a simple inference problem, solved by Gibbs sampling
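The slide reduces OLDA inference at each stream to standard LDA inference via Gibbs sampling. Below is a minimal collapsed-Gibbs sketch (my own illustrative code, not the Matlab toolbox the authors used); note that `beta` may be a full K × V matrix, which is exactly where OLDA injects its history-informed priors.

```python
import numpy as np

def gibbs_lda(docs, V, K, alpha, beta, iters=100, seed=0):
    """Collapsed Gibbs sampling for LDA over one stream.
    docs: list of word-index lists; beta may be a scalar (plain LDA)
    or a K x V matrix (OLDA's history-informed priors)."""
    rng = np.random.default_rng(seed)
    beta = np.broadcast_to(np.asarray(beta, dtype=float), (K, V)).copy()
    beta_sum = beta.sum(axis=1)                 # per-topic prior mass
    nzw = np.zeros((K, V)); ndz = np.zeros((len(docs), K)); nz = np.zeros(K)
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):              # initialize counts
        for w, k in zip(doc, z[d]):
            nzw[k, w] += 1; ndz[d, k] += 1; nz[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                     # remove current assignment
                nzw[k, w] -= 1; ndz[d, k] -= 1; nz[k] -= 1
                # full conditional P(z_i = k | z_-i, w)
                p = (nzw[:, w] + beta[:, w]) / (nz + beta_sum) * (ndz[d] + alpha)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                nzw[k, w] += 1; ndz[d, k] += 1; nz[k] += 1
    phi = (nzw + beta) / (nz + beta_sum)[:, None]       # topic-word estimates
    theta = (ndz + alpha) / (ndz + alpha).sum(axis=1, keepdims=True)
    return theta, phi
```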

7 Topic Evolution Tracking
• Topic alignment over time
• Handles changes in lexicon, topic drift

Aligned topics over time (P(topic); P(word|topic)):
Time t:
  Topic 1 (0.65): bank (0.44), money (0.35), loan (0.21)
  Topic 2 (0.35): factory (0.53), production (0.34), labor (0.13)
Time t+1:
  Topic 1 (0.43): bank (0.5), credit (0.32), money (0.18)
  Topic 2 (0.57): factory (0.48), cost (0.32), manufacturing (0.2)
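In OLDA the alignment is largely implicit: topic k at time t+1 evolves from topic k at time t because the latter shapes its prior. Purely as an illustration (not a method from the paper), evolved topics can be sanity-checked against their predecessors by matching on symmetric KL divergence between word distributions:

```python
import numpy as np

def align_topics(phi_old, phi_new):
    """For each topic in phi_new (K_new x V), return the index of the
    closest topic in phi_old (K_old x V) under symmetric KL divergence.
    Illustrative only: OLDA aligns topics via priors, not matching."""
    eps = 1e-12
    p = phi_old + eps                       # avoid log(0)
    q = phi_new + eps
    # pairwise symmetric KL: kl[i, j] = D(p_i || q_j) + D(q_j || p_i)
    kl = (p[:, None, :] * np.log(p[:, None, :] / q[None, :, :])).sum(-1) \
       + (q[None, :, :] * np.log(q[None, :, :] / p[:, None, :])).sum(-1)
    return kl.argmin(axis=0)                # nearest old topic per new topic
```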

8 Sliding History Window
• Consider all topic-word distributions within a "sliding history window" (δ)
• Alternatives for keeping track of history at time t:
  – Full memory: δ = t
  – Short memory: δ = 1
  – Intermediate memory: δ = c
• Evolution matrix: for each topic, its distribution over the dictionary is tracked over time, one column per stream in the window
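The three memory settings above can all be realized with a bounded queue; the evolution matrix B_k for topic k then has one column per retained stream. A small sketch with hypothetical helper names:

```python
from collections import deque
import numpy as np

def make_history(delta):
    """Sliding history window: retains at most the last `delta`
    topic-word matrices phi (each K x V), one per stream.
    delta = 1 is short memory; delta = t (every stream so far) is
    full memory; a constant c in between is intermediate memory."""
    return deque(maxlen=delta)

def evolution_matrix(history, k):
    """Evolution matrix B_k for topic k: a V x delta matrix whose
    columns are topic k's word distributions, oldest stream first."""
    return np.column_stack([phi[k] for phi in history])
```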

9 Contribution Control
• Evolution tuning parameters ω: individual weights of the models in the window
  – Decaying history: ω_1 < ω_2 < … < ω_δ
  – Equal contributions: ω_1 = ω_2 = … = ω_δ
• Total weight of history (vs. weight of new observations):
  – Balanced weights (sum = 1)
  – Biased toward the past (sum > 1)
  – Biased toward the future (sum < 1)
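The two knobs on this slide, the weight profile (equal vs. decaying) and the total weight, can be sketched as one weight-vector constructor. This is my own illustration; the geometric `decay` rate is an assumption, not a value from the paper:

```python
import numpy as np

def history_weights(delta, scheme="equal", total=1.0, decay=0.5):
    """Contribution weights omega_1..omega_delta over the window
    (oldest first), scaled so they sum to `total`.

    total = 1 balances history against new observations; total > 1
    biases toward the past; total < 1 biases toward the future."""
    if scheme == "equal":
        w = np.ones(delta)
    else:  # "decaying": older models weigh less (omega_1 < ... < omega_delta)
        w = decay ** np.arange(delta - 1, -1, -1)
    return total * w / w.sum()
```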

10 Parameter Generation
• Priors of the topic distribution over words at time t+1 are constructed from the history window
• The topic distribution is then generated from these priors
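The generation step can be sketched as a weighted combination of the topics inferred over the window: the Dirichlet prior for topic k at time t+1 is the evolution matrix B_k^t applied to the weight vector ω. Function and variable names here are mine:

```python
import numpy as np

def generate_beta(history, omega):
    """Priors for the topic-word distributions at stream t+1:
    beta_k^{t+1} = B_k^t @ omega, i.e. a weighted sum of topic k's
    word distributions over the window.
    history: sequence of K x V phi matrices, oldest first;
    omega: length-delta weight vector (oldest first).
    Returns a K x V matrix of Dirichlet priors, one row per topic."""
    stacked = np.stack(list(history))       # (delta, K, V)
    return np.einsum('d,dkv->kv', omega, stacked)
```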

11 Experimental Design
• "Matlab Topic Modeling Toolbox" by Mark Steyvers and Tom Griffiths
• Datasets:
  – NIPS: proceedings from 1988–2000; 1,740 papers, 13,649 unique words, 2,301,375 word tokens; 13 streams of 90 to 250 documents each
  – Reuters-21578: news from 26-FEB-1987 to 19-OCT-1987; 10,337 documents, 12,112 unique words, 793,936 word tokens; 30 streams (29 of 340 documents, 1 of 517)
• Baselines:
  – OLDA_fixed: no memory
  – OLDA (ω(1)): short memory
• Performance evaluation:
  – Measure: perplexity
  – Test set: documents of the next year or stream
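Perplexity on the held-out stream is the exponential of the negative average log-likelihood per token; lower is better. A minimal sketch, scoring each token under the document's topic mixture:

```python
import numpy as np

def perplexity(test_docs, theta, phi):
    """Perplexity = exp(-(1/N) * sum_d sum_i log p(w_di)), where
    p(w_di) = sum_k theta[d, k] * phi[k, w_di] and N is the total
    number of held-out tokens."""
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(test_docs):
        for w in doc:
            log_lik += np.log(theta[d] @ phi[:, w])
        n_tokens += len(doc)
    return np.exp(-log_lik / n_tokens)
```

As a sanity check, a uniform model over a V-word vocabulary has perplexity exactly V.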

12 Reuters: OLDA with Fixed β vs. OLDA with Semantic β
[Perplexity plot comparing the no-memory baseline (fixed β) against OLDA with semantic β.]

13 Reuters: OLDA with Different Window Sizes and Weights
• Increasing the window size enhanced prediction
• Incremental history information (δ > 1, sum > 1) did not improve topic estimation at all
[Perplexity plot comparing short memory, equal contributions, and incremental history information as window size increases.]

14 NIPS: OLDA with Different Window Sizes
• Increasing the window size enhanced prediction w.r.t. short memory
• Window sizes greater than 3 enhanced prediction
[Perplexity plot with no-memory and short-memory baselines, also showing the effect of total weight.]

15 NIPS: OLDA with Different Total Weights
• Models with lower total weight resulted in better prediction
[Perplexity plot comparing no memory, sum of weights = 1, and decreased sums of weights.]

16 NIPS & Reuters: OLDA with Different Total Weights
[Perplexity plots with variable sum(ω) at δ = 2, comparing increased vs. decreased total sums of weights.]

17 NIPS: OLDA with Equal vs. Decaying History Contributions
[Perplexity plot comparing equal and decaying contribution weights.]

18 Conclusions
• Studied the effect of embedding semantic information in LDA topic modeling of text streams
• Parameter generation based on topical structures inferred in the past
• Semantic embedding enhances OLDA prediction
• Evaluated the effects of the total influence of history, the history window size, and equal vs. decaying contributions
• Future work:
  – Use of prior knowledge
  – Effect of embedded historic semantics on detecting emerging and/or periodic topics

