Slide 1/42: Probabilistic Graphical Models For Text Mining: A Topic Modeling Survey
V. Jelisavčić*, B. Furlan**, J. Protić**, V. Milutinović**
* Mathematical Institute of the Serbian Academy of Sciences and Arts, 11000 Belgrade, Serbia
** Department of Computer Engineering, School of Electrical Engineering, University of Belgrade, 11000 Belgrade, Serbia

Slide 2/42: Summary
Introduction to topic models
Theoretical introduction
– Probabilistic graphical models: basics
– Inference in graphical models
– Finding topics with PGM
Classification of topic models
– Classification method
– Examples and applications
Conclusion and ideas for future research

Slide 3/42: Introduction to topic models
How do we define "topic"?
– A group of words that frequently co-occur together
– Context
– Semantics?
Why model topics?
– Soft clustering of text
– Similar documents -> similar topics
– Machine learning from text: where to start? What features to use? How to handle the dimensionality of a million (or billion) word corpus? How to use additional features alongside pure text?

Slide 4/42: Introduction to topic models
How to deal with uncertainty in natural language:
– Probabilistic approach
Comparison with language models:
– Short-distance vs. long-distance dependence
– Local vs. global
– Sequence vs. bag

Slide 5/42: Introduction to topic models
Topic modeling in a nutshell:
Text + (PG model + inference algorithm) -> Topics

Slide 6/42: Probabilistic graphical models: basics
Modeling the problem:
– Start with the variable space
– Uncertainty through probability
Basic elements:
– Observed variables
– Latent variables
– Priors
– Parameters

Slide 7/42: Probabilistic graphical models: basics
Too many variables -> too many dimensions in the variable space
Dimension reduction through the independence assumption
– Representing independencies using graphs
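To make the graph-based independence idea concrete (an illustration added to this transcript, not taken from the slide), a directed graphical model factorizes the joint distribution over variables $x_1, \dots, x_n$ into local conditionals, one per node given its parents:

$$p(x_1, \dots, x_n) \;=\; \prod_{i=1}^{n} p\bigl(x_i \mid \mathrm{pa}(x_i)\bigr)$$

For $n$ binary variables this replaces a joint table with $2^n$ entries by a set of small conditional tables, which is exactly the dimension reduction the slide refers to.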

Slide 8/42: Probabilistic graphical models: basics
Marginal & conditional independence: knowing the difference
Goals:
– Learn the full probability distribution from observed data
– Find the marginal distribution over some subset of variables
– Find the most likely value of a specific variable
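Stated compactly (added for clarity; the slide only names the distinction): $X$ and $Y$ are marginally independent when $p(x, y) = p(x)\,p(y)$, and conditionally independent given $Z$ when $p(x, y \mid z) = p(x \mid z)\,p(y \mid z)$. The two do not imply each other: in a common-cause structure $Z \rightarrow X$, $Z \rightarrow Y$, the variables $X$ and $Y$ are dependent marginally but become independent once $Z$ is observed.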

Slide 9/42: Inference and learning in graphical models
Likelihood:
– Maximum likelihood estimation
– Maximum a posteriori estimation
– Maximum margin estimation
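The first two estimation criteria can be written explicitly (standard definitions, added here for reference), with $D$ the observed data and $\theta$ the parameters:

$$\hat{\theta}_{\mathrm{ML}} = \arg\max_{\theta}\, p(D \mid \theta), \qquad \hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta}\, p(D \mid \theta)\, p(\theta)$$

Maximum-margin estimation instead chooses parameters that maximize the separation between the correct output and its competitors, rather than a probability of the data.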

Slide 10/42: Inference and learning in graphical models
Goal: learn the values of the latent variables from the given data (observed variables):
– What are the most probable values of the latent variables? The values with the highest likelihood given the evidence!
– Going a step further (full Bayesian approach): what are the most probable distributions of the latent variables? Use prior distributions!

Slide 11/42: Inference and learning in graphical models
If there are no latent variables, learning is simple
– The likelihood is a concave function, so finding its maximum is trivial
If there are latent variables, things tend to get more complicated
– Learning is sometimes intractable: to calculate the normalizing constant for the likelihood, a sum (or integral) over all possible values must be computed
– Approximation algorithms are required
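Concretely (a standard identity added for readers of the transcript), with observed variables $x$ and latent variables $z$, the posterior is

$$p(z \mid x) \;=\; \frac{p(x \mid z)\, p(z)}{p(x)}, \qquad p(x) \;=\; \sum_{z} p(x \mid z)\, p(z)$$

(an integral for continuous $z$). The denominator $p(x)$ is the normalizing constant mentioned above: the sum runs over every joint configuration of the latent variables, which is what makes exact learning intractable and motivates the approximate algorithms on the next slide.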

Slide 12/42: Inference and learning in graphical models
– Expectation maximization
– Markov chain Monte Carlo (Gibbs sampling)
– Variational inference
– Kalman filtering
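To make one of these algorithms concrete, below is a minimal collapsed Gibbs sampler for LDA in Python. It is a sketch written for this transcript, not code from the survey; the toy corpus, the number of topics K, and the hyperparameters alpha and beta are arbitrary choices.

import numpy as np

def gibbs_lda(docs, V, K=10, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.
    docs: list of documents, each a list of word ids in [0, V).
    Returns the document-topic and topic-word count matrices."""
    rng = np.random.default_rng(seed)
    D = len(docs)
    ndk = np.zeros((D, K))   # topic counts per document
    nkw = np.zeros((K, V))   # word counts per topic
    nk = np.zeros(K)         # total token count per topic
    z = []                   # current topic assignment of every token

    # random initialization of token-topic assignments
    for d, doc in enumerate(docs):
        zd = rng.integers(K, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # remove the token's current assignment from the counts
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # full conditional p(z = k | everything else), up to a constant
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                # record the new assignment and add it back to the counts
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return ndk, nkw

# toy usage: four tiny "documents" over a 6-word vocabulary, two topics
docs = [[0, 1, 2, 0, 1], [0, 1, 0, 2], [3, 4, 5, 3], [4, 5, 3, 4]]
ndk, nkw = gibbs_lda(docs, V=6, K=2, iters=100)
print((nkw + 0.01) / (nkw + 0.01).sum(axis=1, keepdims=True))  # topic-word distributions

Each sweep removes one token's topic from the counts, samples a new topic from its full conditional, and adds it back; after enough sweeps the accumulated counts give the document-topic and topic-word distributions.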

Slide 13/42: Finding topics with PGM
I.I.D. – Bag of words (de Finetti theorem)
Representing semantics using probability: dimensionality reduction
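The de Finetti representation referenced here (written out for clarity) states that an exchangeable sequence of words is distributed as if the words were i.i.d. draws conditioned on a latent random parameter:

$$p(w_1, \dots, w_N) \;=\; \int \Bigl(\prod_{n=1}^{N} p(w_n \mid \theta)\Bigr)\, p(\theta)\, d\theta$$

This is the formal justification for treating a document as a bag of words governed by latent mixture variables.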

Slide 14/42: Finding topics with PGM
Variables: documents, words, topics
– Observed: words, documents
– Latent: topics, topic assignments to words
Documents contain words
Topics are sets of words that frequently co-occur together (context)

Slide 15/42: Finding topics with PGM
Soft clustering:
– Documents contain multiple topics
– Each topic can be found in multiple documents ⇒ each document has its own distribution over topics
– Topics contain multiple word types
– Each word type can be found in multiple topics (with different probability) ⇒ each topic has its own distribution over word types

Slide 16/42: Finding topics with PGM
Probabilistic latent semantic indexing (pLSI):
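For reference (the slide itself showed the model graphically), pLSI models each word occurrence in document $d$ as a mixture over latent topics $z$:

$$p(w \mid d) \;=\; \sum_{z} p(w \mid z)\, p(z \mid d)$$

Its main weakness is that $p(z \mid d)$ is a separate parameter vector for every training document, so the number of parameters grows linearly with the corpus and there is no generative account of unseen documents; this is the overfitting problem the following slides address with priors.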

Slide 17/42: Finding topics with PGM
Soft clustering:
– Documents contain multiple topics
– Each topic can be found in multiple documents ⇒ each document has its own distribution over topics
– Topics contain multiple word types
– Each word type can be found in multiple topics (with different probability) ⇒ each topic has its own distribution over word types
The number of parameters to learn should be independent of the total number of documents
– Avoid overfitting
– Solution: use priors!
Each word token in a document comes from a specific topic ⇒ each word token should have its own topic identifier assigned

Slide 18/42: Finding topics with PGM
Adding the priors: LDA
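The slide presented LDA graphically; in standard notation (added here for readers of the transcript) its generative process is

$$\phi_k \sim \mathrm{Dir}(\beta), \qquad \theta_d \sim \mathrm{Dir}(\alpha), \qquad z_{d,n} \sim \mathrm{Cat}(\theta_d), \qquad w_{d,n} \sim \mathrm{Cat}(\phi_{z_{d,n}})$$

for topics $k = 1, \dots, K$, documents $d$, and word tokens $n$. The Dirichlet priors $\alpha$ and $\beta$ turn the per-document topic proportions and per-topic word distributions into random variables, so the number of free parameters no longer grows with the number of documents.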

Slide 19/42: Finding topics with PGM
Advantages of using PGMs:
– Extendable: add more features to the model easily, use different prior distributions, incorporate other forms of knowledge alongside text
– Modular: lessons learned in one model can easily be adopted by another
– Widely applicable: topics can be used to augment solutions to various existing problems

Slide 20/42: Classification
Relaxing the exchangeability assumption:
– Document relations: time, links
– Topic relations: correlations, sequence
– Word relations: intra-document (sequentiality), inter-document (entity recognition)

Slide 21/42: Classification
Modeling with additional data:
– Document features: sentiment, authors
– Topic features: labels
– Word features: concepts

Slide 22/42: Classification

Slide 23/42: Examples and applications: Document relations
In the base model (LDA), documents are exchangeable (document exchangeability assumption)
By removing this assumption, we can build more complex models
More complex models -> new (more specific) applications
Two types of document relations:
a) Sequential (time)
b) Networked (links, citations, references, ...)

Slide 24/42: Examples and applications
Modeling time: topic detection and tracking
– Trend detection: what was popular? What will be popular?
– Event detection: something important has happened
– Topic tracking: evolution of a specific topic

Slide 25/42: Examples and applications
Modeling time: two approaches
– Markov dependency (short-distance): Dynamic Topic Model
– Time as an additional feature (long-distance): Topics-over-Time
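For reference, the two cited models handle time differently (details summarized here from the original papers rather than shown on the slide): the Dynamic Topic Model chains each topic across time slices through a Gaussian state space on its natural parameters,

$$\beta_{t,k} \mid \beta_{t-1,k} \;\sim\; \mathcal{N}(\beta_{t-1,k},\, \sigma^2 I),$$

mapping each $\beta_{t,k}$ to word probabilities with a softmax, while Topics-over-Time keeps the topics fixed and instead treats each document's timestamp as an extra observation drawn from a per-topic Beta distribution.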

Slide 26/42: Examples and applications

Slide 27/42: Examples and applications
Modeling document networks:
– Web (documents with hyperlinks)
– Messages (documents with senders and recipients)
– Scientific papers (documents and citations)

Slide 28/42: Examples and applications: Topic relations
In the base model (LDA), topics are "exchangeable" (topic exchangeability assumption)
By removing this assumption, we can build more complex models
More complex models -> new (more specific) applications
Two types of topic relations:
a) Correlations (topic hierarchy, similarity, ...)
b) Sequence (linear structure of text)

Slide 29/42: Examples and applications
Topic correlations:
– Instead of finding a "flat" topic structure: topic hierarchy (super-topics and sub-topics), topic correlation matrix, or an arbitrary DAG structure
Topic sequence:
– The sequential nature of human language: text is written from beginning to end, topics in later chapters tend to depend on previous ones, Markov property
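One concrete instance of the correlation-matrix option is the Correlated Topic Model, summarized here for reference: it replaces the Dirichlet prior on the per-document topic proportions with a logistic-normal prior,

$$\eta_d \sim \mathcal{N}(\mu, \Sigma), \qquad \theta_{d,k} \;=\; \frac{\exp(\eta_{d,k})}{\sum_{k'} \exp(\eta_{d,k'})},$$

so the covariance matrix $\Sigma$ directly captures which topics tend to co-occur within a document.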

Slide 30/42: Examples and applications

Slide 31/42: Examples and applications: Word relations
In the base model (LDA), words are "exchangeable" (word exchangeability assumption)
By removing this assumption, we can build more complex models
More complex models -> new (more specific) applications
Two types of word relations:
a) Intra-document (word sequence)
b) Inter-document (entity recognition, multilinguality, ...)

Slide 32/42: Examples and applications
Intra-document word relations:
– The sequential nature of text: modeling phrases and n-grams, Markov property
Inter-document word relations:
– Some words can be treated as special entities (not sufficiently investigated)
– Multilingual models: harnessing multiple languages, bridging the language gap

Slide 33/42: Examples and applications

Slide 34/42: Examples and applications
Relaxing the aforementioned exchangeability assumptions is not the only way to extend the LDA model to new problems and more complex domains
Extensions can also be made by utilizing additional features on any of the three levels (document, topic, word)
Combining different features from different domains can solve new compound problems (e.g., time evolution of topic hierarchies)

Slide 35/42: Examples and applications
Examples of models with additional features at the document level:
– Author topic models
– Group topic models
– Sentiment topic models
– Opinion topic models

Slide 36/42: Examples and applications

Slide 37/42: Examples and applications
Examples of models with additional features at the topic level:
– Supervised topic models
– Segmentation topic models

Slide 38/42: Examples and applications
Examples of models with additional features at the word level:
– Concept topic models
– Entity disambiguation topic models

Slide 39/42: Examples and applications
Using simple additional features is sometimes not enough:
– How to incorporate knowledge? A complex set of features (with their dependencies), Markov logic networks? Incorporate knowledge through priors; room for improvement!
The number of parameters is often not known in advance:
– How many topics are there in a corpus? Solution: non-parametric distributions such as the Dirichlet process (Chinese restaurant process, stick-breaking process, Pitman-Yor process, Indian buffet process, ...); see the stick-breaking sketch below
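A minimal sketch of the stick-breaking construction behind the Dirichlet process (illustrative Python written for this transcript; the truncation level and concentration value are arbitrary choices):

import numpy as np

def stick_breaking(alpha, num_sticks, seed=0):
    """Draw truncated mixture weights from a Dirichlet process prior.
    alpha: concentration; larger values spread mass over more components."""
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, size=num_sticks)                      # stick-break proportions
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))  # stick length left before break k
    return v * remaining                                           # weight of component k

weights = stick_breaking(alpha=2.0, num_sticks=20)
print(weights.round(3), weights.sum())  # sums to just under 1 because of truncation

In a non-parametric topic model these weights act as the (in principle infinite, here truncated) topic proportions, so the effective number of topics is inferred from the data instead of being fixed in advance.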

Slide 40/42: Examples and applications

Slide 41/42: Conclusion and ideas for future research
– Extending the "word" side of topic models (e.g., harnessing morphology): Stem LDA
– Combining existing topic modeling paradigms on new problems
– New topic representations (using ontology triplets instead of simple terms)

Slide 42/42: THE END
Thank you for your time.
Probabilistic Graphical Models For Text Mining: A Topic Modeling Survey