Probabilistic Clustering-Projection Model for Discrete Data


1 Probabilistic Clustering-Projection Model for Discrete Data
Shipeng Yu (1,2), Kai Yu (2), Volker Tresp (2), Hans-Peter Kriegel (1)
(1) Institute for Computer Science, University of Munich
(2) Siemens Corporate Technology, Munich, Germany
October 2005

2 Outline
Motivation
Previous Work
The PCP Model
Learning in PCP Model
Experiments
Conclusion and Future Work

3 Motivation
We model discrete data in this work
  Fundamental problem for data mining and machine learning
  In “bag-of-words” document modelling: document-word pairs
  In collaborative filtering: item-rating pairs
Properties
  The data can be described as a big matrix with integer entries
  The data matrix is normally very sparse (>90% are zeros)
[Figure: D x V data matrix of occurrences; rows are documents d_1 ... d_D, columns are words w_1 ... w_V]
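To make the "big sparse integer matrix" concrete, here is a minimal sketch that builds a document-word count matrix from raw text and measures its sparsity. The function names and the toy documents are illustrative assumptions, not part of the presentation.

```python
from collections import Counter


def build_count_matrix(docs):
    """Return (vocab, matrix) where matrix[d][v] counts word v in document d."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    index = {word: v for v, word in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in docs]
    for d, doc in enumerate(docs):
        for word, count in Counter(doc.split()).items():
            matrix[d][index[word]] = count
    return vocab, matrix


def sparsity(matrix):
    """Fraction of matrix entries that are zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for x in row if x == 0)
    return zeros / total


# Hypothetical toy corpus, chosen only for illustration.
toy_docs = ["car engine car", "hockey puck ice", "engine oil"]
vocab, counts = build_count_matrix(toy_docs)
```

Even on three tiny documents most entries are zero; on real corpora with thousands of words the zero fraction is far higher, which is the property the slide points to.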

4 Data Clustering
Goal: Group similar documents together
For continuous data: Distance-based similarity (k-means)
  Iteratively minimize a distance-based cost function
  Equivalent to a limiting case of a Gaussian mixture model (equal spherical covariances, hard assignments)
For discrete data: Occurrence-based similarity
  Similar documents should have similar occurrences of words
  The Gaussian assumption does not hold for discrete data
[Figure: D x V document-word matrix]

5 Data Projection
Goal: Find a low-dimensional feature mapping
For continuous data: Principal Component Analysis
  Find orthogonal dimensions to explain data covariance
For discrete data: Topic detection
  Topics explain the co-occurrences of words
  Topics are not orthogonal, but independent
[Figure: words w_1 ... w_V of the D x V document-word matrix mapped to topics z_1 ... z_K]

6 Projection versus Clustering
They are normally modelled separately
But why not jointly?
  More informative projection → better document clusters
  Better clustering structure → better projection for words
  There should be a stable situation
And how? PCP Model
  Well-defined generative model for the data
  Standard ways for learning and inference
  Generalizable to new data
[Figure: D x V document-word matrix with topics z_1 ... z_K]

7 Previous Work for Discrete Data
Projection model
  PLSI [Hofmann 99]
    First topic model
    Not well-defined generative model
  LDA [Blei et al 03]
    State-of-the-art topic model
    Generalize PLSI with Dirichlet prior
    No clustering effect is modelled
Clustering model
  NMF [Lee & Seung 99]
    Factorize the data matrix
    Can be explained as a clustering model
    No projection of words is directly modelled
Joint Projection-Clustering model
  Two-sided clustering [Hofmann & Puzicha 98]: Same problem as PLSI
  Discrete-PCA [Buntine & Perttu 03]: Similar to LDA in spirit
  TTMM [Keller & Bengio 04]: Lacks a full Bayesian explanation
[Figure: D x V document-word matrix with topics z_1 ... z_K]

8 PCP Model: Overview
Probabilistic Clustering-Projection Model
  A probabilistic model for discrete data
  A clustering model using projected features
  A projection model with structural data
Learning in PCP model: Variational EM
  Exactly equivalent to iteratively performing clustering and projection operations
  Guaranteed convergence

9 PCP Model: Sampling Process
D documents, M clusters, K topics, V words
Clustering model using projected features
Projection model with structural data
[Figure: sampling process for documents w_1 ... w_D; each document has a cluster, and each word position has a topic z_{d,n} and a word w_{d,n}. Labels: Clustering, Projection, Dirichlet (twice), Multinomial (three times), Weights, Cluster centers]
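The figure for this slide did not survive extraction, so the following is only one plausible reading of its labels, written as a minimal sampling sketch: Dirichlet-distributed cluster weights and cluster centers, a multinomial cluster draw per document, a multinomial topic draw per word from the document's cluster center, and a multinomial word draw from the topic's row of the projection matrix. The function name, the symmetric Dirichlet hyperparameters of 1, the fixed document length, and the randomly drawn projection matrix `beta` are all assumptions made for illustration.

```python
import numpy as np


def sample_pcp_corpus(D, M, K, V, doc_len, seed=0):
    """Sample D toy documents under an assumed clustering-projection process."""
    rng = np.random.default_rng(seed)
    pi = rng.dirichlet(np.ones(M))             # cluster weights, shape (M,)
    theta = rng.dirichlet(np.ones(K), size=M)  # cluster centers over topics, shape (M, K)
    beta = rng.dirichlet(np.ones(V), size=K)   # projection matrix, shape (K, V); random stand-in
    clusters = rng.choice(M, size=D, p=pi)     # one cluster per document
    docs = []
    for d in range(D):
        # Each word position gets a topic from the cluster center of document d.
        topics = rng.choice(K, size=doc_len, p=theta[clusters[d]])
        # Each topic emits a word from its row of the projection matrix.
        words = np.array([rng.choice(V, p=beta[k]) for k in topics])
        docs.append(words)
    return clusters, docs


clusters, docs = sample_pcp_corpus(D=5, M=2, K=3, V=10, doc_len=7)
```

Under this reading, documents in the same cluster share one topic distribution, which is what ties the clustering side to the projection side.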

10 PCP Model: Plate Model
[Figure: plate model with the likelihood. Labels: Likelihood, Model Parameters, Latent Variables, Observations, Clustering Model, Projection Model]

11 Learning in PCP Model
We are interested in the posterior distribution
  The integral is intractable
Variational EM learning
  Approximate the posterior with a variational distribution
  Minimize the KL-divergence $D_{KL}(q \,\|\, \hat{p})$
  Variational E-step: Minimize w.r.t. variational parameters
  Variational M-step: Minimize w.r.t. model parameters
  Iterate until convergence
[Figure: variational parameters; the variational distribution has two Dirichlet and two Multinomial factors]
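Why minimizing a KL-divergence is a sensible learning objective can be checked numerically on a tiny discrete model: for any variational distribution q over a latent z, the log evidence splits into a lower bound plus the KL-divergence from q to the true posterior. The sketch below uses a made-up three-state latent variable; it illustrates the general identity and is not the PCP model itself.

```python
import math


def elbo_and_kl(joint, q):
    """joint[z] = p(x, z) for the observed x; q[z] = variational distribution over z.

    Returns (log evidence, evidence lower bound, KL(q || posterior)).
    """
    evidence = sum(joint)
    posterior = [p / evidence for p in joint]
    elbo = sum(qz * math.log(pz / qz) for qz, pz in zip(q, joint) if qz > 0)
    kl = sum(qz * math.log(qz / post) for qz, post in zip(q, posterior) if qz > 0)
    return math.log(evidence), elbo, kl


# Hypothetical joint probabilities p(x, z) for z in {0, 1, 2}.
joint = [0.10, 0.05, 0.02]
log_evidence, elbo, kl = elbo_and_kl(joint, q=[0.5, 0.3, 0.2])
```

Because the log evidence does not depend on q, shrinking the KL term in the E-step tightens the bound, and raising the bound over model parameters in the M-step cannot decrease it; this is the usual argument behind the convergence of variational EM.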

12 Update Equations
Equations can be separated into clustering updates and projection updates
Variational EM learning corresponds to iteratively performing clustering and projection until convergence
  Clustering Updates
  Projection Updates

13 Clustering Updates
Update soft cluster assignments, P(c_d = m)
Update cluster centers
Update cluster weights
[Equations not recovered. Annotations on the equations: Sufficient Projection term, Prior term, Likelihood term]
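The slide's equations are lost, but its annotations say the soft assignment P(c_d = m) combines a prior term and a likelihood term. A generic way to combine two such terms is to add them in log space and normalize per document, as in this minimal sketch. The function name and argument shapes are assumptions, and the exact PCP update may differ in how each term is computed.

```python
import numpy as np


def soft_assignments(log_prior, log_lik):
    """Combine per-cluster log prior (M,) with per-document log likelihood (D, M).

    Returns a (D, M) array whose rows are normalized cluster probabilities.
    """
    scores = log_prior[None, :] + log_lik
    # Subtract the row maximum before exponentiating to avoid underflow.
    scores = scores - scores.max(axis=1, keepdims=True)
    resp = np.exp(scores)
    return resp / resp.sum(axis=1, keepdims=True)


# Hypothetical inputs: two clusters with equal prior, two documents.
resp = soft_assignments(
    np.log(np.array([0.5, 0.5])),
    np.array([[0.0, 0.0], [-1000.0, 0.0]]),
)
```

The second toy document has a log likelihood of -1000 under the first cluster; the max-subtraction keeps the normalization finite there.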

14 Projection Updates
Update word projection, P(z_{d,n} = k)
Update projection matrix
[Equations not recovered. Annotations on the equations: Sufficient Clustering term, Empirical estimate]
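The "empirical estimate" annotation suggests that the projection matrix is re-estimated from expected word-topic counts, as in the standard topic-model M-step. The sketch below shows that standard step under the assumption that PCP's update has the same shape; the function name, the `phi` layout, and the smoothing constant are illustrative choices, not taken from the slides.

```python
import numpy as np


def update_projection_matrix(docs, phi, V, smoothing=1e-12):
    """Re-estimate a (K, V) topic-word matrix from soft topic assignments.

    docs: list of int arrays of word ids, one array per document.
    phi:  list of (N_d, K) arrays, phi[d][n, k] approximating P(z_{d,n} = k).
    """
    K = phi[0].shape[1]
    counts = np.full((V, K), smoothing)
    for words, resp in zip(docs, phi):
        # np.add.at accumulates correctly when a word id repeats in a document.
        np.add.at(counts, words, resp)
    beta = counts.T
    return beta / beta.sum(axis=1, keepdims=True)


# Hypothetical example: one document with words [0, 0, 1] and hard topic assignments.
beta = update_projection_matrix(
    docs=[np.array([0, 0, 1])],
    phi=[np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])],
    V=3,
)
```

Each row of the result is a distribution over words for one topic; the tiny smoothing constant only guards against division by zero for unused topics.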

15 PCP Learning Algorithm
[Figure: learning loop alternating Clustering Updates and Projection Updates. Labels: Sufficient Projection term, Sufficient Clustering term]

16 Experiments
Methodology
  Document Modelling: Compare model generalization
  Word Projection: Evaluate topic space
  Document Clustering: Evaluate clustering results
Data sets
  5 categories in Reuters-21578: 3948 docs, 7665 words
  4 categories in 20Newsgroup: 3888 docs, 8396 words
Preprocessing
  Stemming and stop-word removal
  Keep words that occur in at least 5 documents
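The vocabulary filtering described under Preprocessing can be sketched as follows. Stemming is left out because it needs an external stemmer; the function name, the stop-word set, and the toy documents are assumptions for illustration only.

```python
from collections import Counter


def filter_vocabulary(tokenized_docs, stop_words, min_doc_freq=5):
    """Drop stop words and words that appear in fewer than min_doc_freq documents."""
    doc_freq = Counter()
    for tokens in tokenized_docs:
        # Count each word once per document, not once per occurrence.
        doc_freq.update(set(tokens))
    keep = {
        word
        for word, df in doc_freq.items()
        if df >= min_doc_freq and word not in stop_words
    }
    return [[t for t in tokens if t in keep] for tokens in tokenized_docs]


# Hypothetical toy input with a lowered threshold so the effect is visible.
toy = [["the", "car"], ["the", "car"], ["the", "bike"]]
filtered = filter_vocabulary(toy, stop_words={"the"}, min_doc_freq=2)
```

The threshold counts documents rather than total occurrences, matching the slide's "at least 5 documents" wording.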

17 Case Study
Run on a 4-group subset of 20Newsgroup data
[Figure panels: Car, Bike, Baseball, Hockey]

18 Exp1: Document Modelling
Goal: Evaluate generalization performance
Methods to compare
  PLSI: A “pseudo” form for generalization
  LDA: State-of-the-art method
Metric: Perplexity
  $\mathrm{Perp}(D_{ts}) = \exp\left( -\sum_{d} \ln p(\mathbf{w}_d) \,/\, \sum_{d} |\mathbf{w}_d| \right)$
  90% for training and 10% for testing
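Held-out perplexity is conventionally the exponential of the negative average per-word log likelihood on the test documents. A minimal sketch of that computation follows; the function name and inputs are assumptions, and producing the per-document log likelihoods is the job of whichever model is being evaluated.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """exp(-sum_d ln p(w_d) / sum_d |w_d|) over held-out documents."""
    return math.exp(-sum(doc_log_likelihoods) / sum(doc_lengths))


# Sanity check with a hypothetical model that is uniform over V = 50 words.
V = 50
lengths = [3, 7]
log_liks = [n * math.log(1.0 / V) for n in lengths]
uniform_perplexity = perplexity(log_liks, lengths)
```

A uniform model over V words has perplexity V, so lower values mean the model concentrates probability on the words that actually occur in the test set.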

19 Exp2: Word Projection
Goal: Evaluate the projection matrix β
Methods to compare: PLSI, LDA
We train SVMs on the 10-dimensional space after projection
Test classification accuracy on held-out data
[Figure panels: Reuters, Newsgroup]

20 Exp3: Document Clustering
Goal: Evaluate clustering for documents
Methods to compare
  NMF: Do factorization for clustering
  LDA+k-means: Do clustering on the projected space
Metric: normalized mutual information
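Normalized mutual information compares predicted cluster labels with true categories and is invariant to how the clusters are numbered. The slide does not say which normalization was used; the sketch below assumes the geometric-mean variant, I(C; K) / sqrt(H(C) H(K)), and the function name is illustrative.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_true, labels_pred):
    """Mutual information of two labelings divided by sqrt(H(true) * H(pred))."""
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    count_joint = Counter(zip(labels_true, labels_pred))
    mi = sum(
        (c / n) * math.log(c * n / (count_true[a] * count_pred[b]))
        for (a, b), c in count_joint.items()
    )
    h_true = -sum((c / n) * math.log(c / n) for c in count_true.values())
    h_pred = -sum((c / n) * math.log(c / n) for c in count_pred.values())
    if h_true == 0 or h_pred == 0:
        # A labeling with a single cluster carries no information.
        return 0.0
    return mi / math.sqrt(h_true * h_pred)


# Same partition with swapped cluster ids, then an unrelated partition.
same_partition = normalized_mutual_information([0, 0, 1, 1], [1, 1, 0, 0])
unrelated = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

The score is 1 when the clustering reproduces the categories up to relabeling and 0 when the two labelings are statistically independent.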

21 Conclusion
PCP is a well-defined generative model
PCP models clustering and projection jointly
Learning in PCP corresponds to an iterative process of clustering and projection
PCP learning guarantees convergence
Future work
  Large scale experiments
  Build a probabilistic model with more factors

22 Thank you! Questions?

