Online Multiscale Dynamic Topic Models

Presentation transcript:

Online Multiscale Dynamic Topic Models
Best Research Paper Award Honorable Mention
Tomoharu Iwata, Yasushi Sakurai, Takeshi Yamada, Naonori Ueda
NTT Communication Science Laboratories, Japan

Introduction
Topic models for analyzing document dynamics.
Models: dynamic topic model [Blei+06], topics over time [Wang+06], dynamic mixture model [Wei+07], topic tracking model [Iwata+09].
Data: scientific papers, news articles, blogs, e-mail.

Multiscale dynamics
Topics naturally evolve on multiple timescales.
Example: the politics topic in news articles.
Long timescale (many years): constitution, congress, president.
Middle timescale (tens of years): names of members of Congress.
Short timescale (a few days): names of bills under discussion.

Proposed model
Multiscale Dynamic Topic Model (MDTM): a topic model for analyzing dynamics with multiple timescales.
Robust: information loss is reduced by considering both short- and long-timescale dynamics.
Efficient online inference: the model is updated using only newly obtained data, so past data need not be stored.

Standard topic model
Latent Dirichlet Allocation (LDA) [Blei+03] is the basis of the proposed model.
A document is modeled as a mixture of topics.
The word distribution of each topic is generated from a symmetric Dirichlet.
No dynamics.
[Figure: graphical model of LDA. Labels: topic proportions (Dirichlet), topic (Multinomial), word (Multinomial), word distribution (Dirichlet). Plates: #docs, #words, #topics. Legend: ○ latent variable, ● observed variable, □ repetition.]
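The LDA generative process on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the hyperparameter values (`alpha=0.5`, `beta=0.1`), and the use of a symmetric Dirichlet for the topic proportions are assumptions made for the example.

```python
import random


def sample_dirichlet(params, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [x / total for x in draws]


def generate_corpus(num_docs, doc_len, num_topics, vocab_size,
                    alpha=0.5, beta=0.1, seed=0):
    """Sample documents (lists of word ids) from the LDA generative process.

    alpha and beta are illustrative symmetric Dirichlet parameters.
    """
    rng = random.Random(seed)
    # One word distribution per topic, drawn from a symmetric Dirichlet.
    phi = [sample_dirichlet([beta] * vocab_size, rng)
           for _ in range(num_topics)]
    docs = []
    for _ in range(num_docs):
        # Topic proportions of this document.
        theta = sample_dirichlet([alpha] * num_topics, rng)
        doc = []
        for _ in range(doc_len):
            z = rng.choices(range(num_topics), weights=theta)[0]  # topic
            w = rng.choices(range(vocab_size), weights=phi[z])[0]  # word
            doc.append(w)
        docs.append(doc)
    return docs


corpus = generate_corpus(num_docs=5, doc_len=20, num_topics=3, vocab_size=10)
```

Nothing in this process depends on time, which is the "no dynamics" limitation the slide points out.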

Multiscale word distribution
The word distribution at epoch t is generated depending on a weighted sum of the multiscale word distributions at epoch t-1, from the short-scale to the long-scale distribution.
The word distribution at scale s covers epochs t-2^(s-1) to t-1.
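As a small illustration of the weighted sum described on this slide, the sketch below combines per-scale word distributions for a single topic. The function name and the example numbers are hypothetical, and the slides do not say whether the weights are normalized, so none is assumed here.

```python
def multiscale_prior(scale_dists, weights):
    """Weighted sum of per-scale word distributions for one topic.

    scale_dists[s] is the word distribution of scale s+1 at epoch t-1
    (a list of probabilities over the vocabulary); weights[s] is its
    non-negative weight.
    """
    if len(scale_dists) != len(weights):
        raise ValueError("need exactly one weight per scale")
    vocab_size = len(scale_dists[0])
    return [
        sum(lam * dist[w] for lam, dist in zip(weights, scale_dists))
        for w in range(vocab_size)
    ]


# Hypothetical numbers: a short-scale and a long-scale distribution
# over a three-word vocabulary, with the short scale weighted more.
short_scale = [0.7, 0.2, 0.1]
long_scale = [0.4, 0.4, 0.2]
prior = multiscale_prior([short_scale, long_scale], weights=[3.0, 1.0])
```

Because each input distribution sums to one, the entries of `prior` sum to the total weight, so larger weights mean a prior that pulls the new word distribution more strongly toward the past.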

Generative process of MDTM
The word distribution is generated depending on the weighted sum of the multiscale word distributions.
The prior of the topic proportions is generated depending on its previous value.
[Figure: graphical model of the generative process. Labels: Gamma, Dirichlet, Multinomial, Multinomial, Dirichlet; topic proportions' prior, topic proportions (of documents at epoch t), weight, multiscale word dist.; plate: #scales. Note: ξ at epoch t-1 is a hyper-parameter.]
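Written out, the generative process suggested by this slide might look as follows. The notation is assumed for illustration: only ξ (multiscale word distribution) and λ (weight) appear in the transcript, and the Gamma parameters in the first line are one plausible choice that makes the prior's mean equal to its previous value, not something the transcript confirms.

```latex
\begin{align*}
\alpha_{t,z} &\sim \mathrm{Gamma}\bigl(\gamma\,\alpha_{t-1,z},\ \gamma\bigr)
  && \text{topic proportions' prior (parameters assumed)} \\
\boldsymbol{\phi}_{t,z} &\sim \mathrm{Dirichlet}\Bigl(\sum_{s=1}^{S} \lambda_{t,z,s}\,
  \boldsymbol{\xi}^{(s)}_{t-1,z}\Bigr)
  && \text{word distribution of topic } z \\
\boldsymbol{\theta}_{t,d} &\sim \mathrm{Dirichlet}(\boldsymbol{\alpha}_{t})
  && \text{topic proportions of document } d \\
z_{t,d,n} &\sim \mathrm{Multinomial}(\boldsymbol{\theta}_{t,d})
  && \text{topic of the } n\text{-th word} \\
w_{t,d,n} &\sim \mathrm{Multinomial}(\boldsymbol{\phi}_{t,z_{t,d,n}})
  && \text{the } n\text{-th word}
\end{align*}
```

Here S is the number of scales, and the multiscale distributions at epoch t-1 are treated as fixed hyper-parameters when the model for epoch t is fitted.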

Online inference
At each epoch, the model is updated using the newly obtained data and the previous model.
Stochastic EM:
[E-step] collapsed Gibbs sampling of latent topics
[M-step] maximum joint likelihood estimation of parameters
[Figure: models and data at epochs t and t+1.]
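The E-step can be illustrated with a standard collapsed Gibbs sweep for a topic model with Dirichlet priors. This is a generic sketch, not the authors' implementation: the function names and toy data are made up, and `beta[k][w]` stands in for whatever prior pseudo-count the model supplies for word w in topic k (in MDTM, presumably the weighted multiscale sum). The M-step, which re-estimates the priors, is not shown.

```python
import random


def init_state(docs, num_topics, vocab_size, rng):
    """Assign random topics to tokens and build the count tables."""
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    n_dk = [[0] * num_topics for _ in docs]                # doc-topic counts
    n_kw = [[0] * vocab_size for _ in range(num_topics)]   # topic-word counts
    n_k = [0] * num_topics                                 # tokens per topic
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
    return z, n_dk, n_kw, n_k


def gibbs_sweep(docs, state, alpha, beta, rng):
    """Resample the topic of every token once (collapsed Gibbs sampling)."""
    z, n_dk, n_kw, n_k = state
    topics = range(len(alpha))
    beta_sum = [sum(row) for row in beta]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            # Remove the token's current assignment from the counts.
            n_dk[d][k] -= 1
            n_kw[k][w] -= 1
            n_k[k] -= 1
            # Conditional probability of each topic, up to a constant.
            weights = [
                (n_dk[d][j] + alpha[j])
                * (n_kw[j][w] + beta[j][w]) / (n_k[j] + beta_sum[j])
                for j in topics
            ]
            k = rng.choices(topics, weights=weights)[0]
            z[d][i] = k
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1


# Toy data: three documents over a four-word vocabulary, two topics.
docs = [[0, 0, 1, 1], [2, 3, 3, 2], [0, 1, 2, 3]]
alpha = [0.5, 0.5]
beta = [[0.1] * 4 for _ in range(2)]
rng = random.Random(0)
state = init_state(docs, num_topics=2, vocab_size=4, rng=rng)
for _ in range(20):
    gibbs_sweep(docs, state, alpha, beta, rng)
```

In an online setting, `docs` would hold only the documents of the current epoch, while `alpha` and `beta` carry the information inherited from the previous model.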

Estimation of multiscale distribution
Maximum likelihood estimate: the word probability of scale s is the word count of scale s divided by the total word count of scale s. The word count of scale s is the sum of the word counts at epochs t' from t-2^(s-1)+1 to t.
Online update: current value = previous value + current word count - first word count in the scale.
Required memory: [expression not preserved in this transcript]
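The exact online update can be sketched as a sliding-window sum. The class below is illustrative only: its name and methods are hypothetical, and it tracks the counts of a single topic. It keeps the per-epoch counts of the longest window so that the oldest count can be subtracted when it leaves a scale.

```python
from collections import Counter, deque


class MultiscaleCounts:
    """Exact sliding-window word counts for scales s = 1..num_scales.

    Scale s covers the most recent 2**(s-1) epochs.
    """

    def __init__(self, num_scales):
        self.num_scales = num_scales
        # Per-epoch counts, kept for the longest window only.
        self.history = deque(maxlen=2 ** (num_scales - 1))
        self.totals = [Counter() for _ in range(num_scales)]

    def update(self, new_counts):
        """Add the counts of the newest epoch to every scale."""
        new_counts = Counter(new_counts)
        for s in range(1, self.num_scales + 1):
            width = 2 ** (s - 1)
            total = self.totals[s - 1]
            total.update(new_counts)  # previous value + current count
            if len(self.history) >= width:
                # Subtract the count that just left this scale's window.
                total.subtract(self.history[-width])
        self.history.append(new_counts)

    def counts(self, s):
        """Word counts of scale s, with zero entries removed."""
        return +self.totals[s - 1]

    def probabilities(self, s):
        """Maximum likelihood word probabilities of scale s."""
        counts = self.counts(s)
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()} if total else {}


# Hypothetical word counts for four consecutive epochs.
stream = [{"hmm": 2}, {"hmm": 1, "speech": 1}, {"speech": 3}, {"policy": 4}]
ms = MultiscaleCounts(num_scales=3)
for epoch_counts in stream:
    ms.update(epoch_counts)
```

After the four epochs, scale 1 holds only the last epoch, scale 2 the last two, and scale 3 all four. The `history` buffer is what makes this exact version memory-hungry: its length doubles with every additional scale.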

Approximated efficient estimation of multiscale word distribution
Decrease the update frequency for long-scale distributions.
Store only the previous epoch count.
Required memory: [expression not preserved in this transcript]

Example with three scales: epochs whose counts are included in each scale's word count.

                       t=4       t=5       t=6       t=7       t=8
scale=3                1 2 3 4   1 2 3 4   1 2 3 4   1 2 3 4   5 6 7 8
scale=2                3 4       3 4       5 6       5 6       7 8
scale=1                4         5         6         7         8
newly obtained count             5         6         7         8
update                           [s=1]     [s=1,2]   [s=1]     [s=1,2,3]

[Figure annotations: "count of s=3", "count at t=3".]
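One way to reproduce the update schedule in the example (scale s refreshed only every 2^(s-1) epochs) is sketched below. The merging rule is an assumption chosen because it matches the example: a scale is rebuilt from the current and previous values of the next shorter scale, so only two count tables per scale are stored. The class and method names are hypothetical.

```python
from collections import Counter


class ApproxMultiscaleCounts:
    """Approximate multiscale counts that keep two tables per scale."""

    def __init__(self, num_scales):
        self.num_scales = num_scales
        self.current = [Counter() for _ in range(num_scales)]
        self.previous = [Counter() for _ in range(num_scales)]
        self.epoch = 0

    def update(self, new_counts):
        """Consume the newest epoch; return the list of updated scales."""
        self.epoch += 1
        # Scale 1 is replaced by the new count at every epoch.
        self.previous[0] = self.current[0]
        self.current[0] = Counter(new_counts)
        updated = [1]
        for s in range(2, self.num_scales + 1):
            if self.epoch % 2 ** (s - 1) != 0:
                break  # longer scales are refreshed even less often
            # Assumed rule: merge the two latest values of scale s-1.
            merged = self.current[s - 2] + self.previous[s - 2]
            self.previous[s - 1] = self.current[s - 1]
            self.current[s - 1] = merged
            updated.append(s)
        return updated


# Use the epoch number itself as the only "word", so that the keys of
# each table show which epochs the table covers.
ms = ApproxMultiscaleCounts(num_scales=3)
schedule = {}
for t in range(1, 9):
    schedule[t] = ms.update({t: 1})
covered = [sorted(table) for table in ms.current]
```

With eight epochs fed in, `schedule` and `covered` can be compared against the columns of the example. The price of the lower memory use is that a long scale can lag behind the newest data between its refreshes.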

Experiments
Data sets:
NIPS: papers in NIPS from 1987 to 1999
PNAS: titles in PNAS from 1915 to 2005
Digg: blog posts in social news site Digg from 1/29 to 2/20 in 2000
Addresses: the State of the Union addresses from 1790 to 2002
Methods:
MDTM: Online Multiscale Dynamic Topic Model
DTM: Online Dynamic Topic Model (MDTM with #scales=1)
LDAall: LDA that uses all past data for inference
LDAone: LDA that uses just the current data for inference
LDAonline: LDA with online inference

Average perplexity
[Table: average perplexity (standard deviation) of each method; values not preserved in this transcript.]
MDTM can appropriately model the dynamics through its use of multiscale properties.
DTM does not model the long-timescale dependencies.
LDAall and LDAonline do not model the dynamics.
LDAone ignores the past information.
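For reference, perplexity is the exponential of the negative average log-likelihood per held-out word, so lower is better. The helper below is a minimal sketch with a hypothetical name; it assumes the model has already assigned a probability to each held-out token.

```python
import math


def perplexity(token_probs):
    """Perplexity of held-out tokens given their predicted probabilities."""
    if not token_probs:
        raise ValueError("need at least one token")
    log_likelihood = sum(math.log(p) for p in token_probs)
    return math.exp(-log_likelihood / len(token_probs))


# A model that spreads its probability uniformly over four words.
uniform_example = perplexity([0.25] * 8)
```

A model that assigns every token probability 1/V has perplexity V, which gives a useful baseline when reading perplexity tables.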

Perplexity with different #scales
[Figure: perplexity versus #scales, for Digg and for Addresses.]
Perplexity decreased as #scales increased, which indicates the importance of considering multiscale dynamics.

Estimated weights for each scale
[Figure: weight (λ) versus scale, for Addresses and for Digg.]
Weights decreased as the timescale lengthened: recent short-scale distributions are more informative for estimating the current distribution.

Topic extraction 1 (NIPS): speech recognition topic
1992-1999: speech recognition word speaker training set tdnn time test speakers
1992-1995: system data letter state letters neural utterance words phoneme classification
1996-1999: state hmm system probabilities model words context hmms markov probability
1992-1993: level phonetic segmentation language segment accuracy duration continuous unit male
1994-1995: spectral feature false acoustic independent models normalization rate trained gradient
1996-1997: log likelihood models sequence sequences hidden hybrid states frame transition
1998-1999: hidden states models feature continuous modeling features adaption human acoustic

Topic extraction 2 (NIPS): reinforcement learning topic
1992-1999: learning state control action time policy reinforcement optimal actions recognition
1992-1995: dynamic space model exploration states programming barto sutton goal task
1996-1999: function state algorithm model agent decision step reward markov space
1992-1993: robot based controller system forward level memory real jordan world
1994-1995: skills policies singh adaptive iteration stochastic transition values expected based
1996-1997: grid based memory controller continuous cost system temporal iteration interpolation
1998-1999: rl machine policies environment iteration mdp singh finite update search

Conclusion
Topic model with multiscale dynamics.
Efficient online inference.
Experimentally confirmed the high predictive performance.
Future work:
Automatic determination of the length of each scale and of #topics.
Evaluation on other data, such as web access logs, blogs, and e-mail.