Local Linear Matrix Factorization for Document Modeling Institute of Computing Technology, Chinese Academy of Sciences Lu Bai,


1 Local Linear Matrix Factorization for Document Modeling
Lu Bai, Jiafeng Guo, Yanyan Lan, Xueqi Cheng
Institute of Computing Technology, Chinese Academy of Sciences
bailu@software.ict.ac.cn

2 Outline
- Introduction
- Our approach
- Experimental results
- Conclusion

3 Introduction

4 Background

5 Previous work
- No local geometric regularization
  - None, or global regularization only (e.g. SVD, PLSA, LDA, NMF)
  - Over-fitting and poor generalization
- Pairwise neighborhood smoothing
  - Increases the low-dimensional affinity of nearby document pairs (e.g. LapPLSA, LTM, DTM)
  - Loses the geometric information among pairs, especially when documents are unevenly distributed
- Heuristic similarity measure and neighbors
  - Empirical similarity threshold and number of neighbors (e.g. LapPLSA, LTM)
  - An improper similarity measure or number of neighbors hurts the representation
- A new low-dimensional representation mining method that better exploits the geometric relationships among documents
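The pairwise neighborhood smoothing criticized on this slide can be illustrated with a small sketch. It is a generic graph-Laplacian penalty written for illustration, not the exact formulation of LapPLSA, LTM, or DTM. The function names, the cosine similarity, and the values of `k` and `threshold` are all assumptions; they are exactly the kind of heuristic choices the slide points to.

```python
import numpy as np

def knn_cosine_graph(X, k=2, threshold=0.0):
    """Symmetric kNN affinity graph from cosine similarity.

    Both k and threshold are heuristic knobs (assumed values, for illustration).
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)  # a document is never its own neighbor
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(S[i])[::-1][:k]:  # k most similar documents
            if S[i, j] > threshold:
                W[i, j] = W[j, i] = S[i, j]
    return W

def pairwise_smoothness(U, W):
    """trace(U^T L U) with L = D - W, i.e. 0.5 * sum_ij W_ij * ||u_i - u_j||^2."""
    L = np.diag(W.sum(axis=1)) - W
    return float(np.trace(U.T @ L @ U))

rng = np.random.default_rng(0)
X = rng.random((6, 10))   # toy document-word matrix: 6 documents, 10 words
U = rng.random((6, 3))    # toy low-dimensional document representations
W = knn_cosine_graph(X, k=2)
penalty = pairwise_smoothness(U, W)
```

The penalty treats every neighboring pair independently, so it only pulls pairs together and carries no information about how a document sits relative to its neighbors as a group.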

6 Our approach
Basic ideas
- Factorize the document-word matrix in the NMF way
  - Mines a low-dimensional semantic representation
- Model each document's relationships as a local linear combination
  - Preserves rich local geometric information
  - Selects neighbors without a similarity measure or threshold
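The two basic ideas can be combined in a minimal sketch. The transcript does not preserve the actual LLMF objective, so the loss below is an assumed stand-in: an NMF-style fit term, a term asking each document representation to be a linear combination of the others, and an L1 penalty that keeps the combination weights sparse so that neighbors are selected by the optimization itself. The names `llmf_objective`, `llmf_step`, `fit_llmf_sketch`, the weights `lam` and `mu`, and the projected (proximal) gradient solver are all hypothetical and are not the authors' model-fitting procedure.

```python
import numpy as np

def llmf_objective(X, U, V, W, lam, mu):
    """Assumed loss: ||X - UV||^2 + lam * ||U - WU||^2 + mu * ||W||_1."""
    fit = np.sum((X - U @ V) ** 2)
    local = np.sum((U - W @ U) ** 2)
    return float(fit + lam * local + mu * np.abs(W).sum())

def llmf_step(X, U, V, W, lam, mu, step):
    """One projected / proximal gradient step on all three blocks."""
    R = U @ V - X                      # reconstruction residual
    E = U - W @ U                      # local linear residual
    I = np.eye(W.shape[0])
    gU = 2 * R @ V.T + 2 * lam * (I - W).T @ E
    gV = 2 * U.T @ R
    gW = -2 * lam * E @ U.T
    U_new = np.maximum(U - step * gU, 0.0)   # keep U nonnegative
    V_new = np.maximum(V - step * gV, 0.0)   # keep V nonnegative
    W_new = W - step * gW
    W_new = np.sign(W_new) * np.maximum(np.abs(W_new) - step * mu, 0.0)  # soft-threshold
    np.fill_diagonal(W_new, 0.0)             # a document cannot reconstruct itself
    return U_new, V_new, W_new

def fit_llmf_sketch(X, k, lam=1.0, mu=0.1, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U, V, W = rng.random((n, k)), rng.random((k, m)), np.zeros((n, n))
    step = 1e-2
    history = [llmf_objective(X, U, V, W, lam, mu)]
    for _ in range(iters):
        for _ in range(30):                  # backtracking: accept only if the loss drops
            cand = llmf_step(X, U, V, W, lam, mu, step)
            obj = llmf_objective(X, *cand, lam, mu)
            if obj < history[-1]:
                U, V, W = cand
                history.append(obj)
                step *= 1.2
                break
            step *= 0.5
        else:
            break                            # no improving step found; stop
    return U, V, W, history

X = np.random.default_rng(1).random((8, 12))  # toy document-word matrix
U, V, W, history = fit_llmf_sketch(X, k=3)
```

In this sketch the nonzero entries of row `i` of `W` play the role of document `i`'s neighbors; no similarity function or neighbor count is supplied by hand.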

7 Local Linear Matrix Factorization (LLMF) min

8 Cont’ min

9 Graphic Model of LLMF

10 LLMF vs Others
- Compared with models without geometric information (e.g. NMF, PLSA, LDA)
  - LLMF smooths each document representation with its neighbors
- Compared with models with geometric constraints (e.g. LapPLSA, LTM)
  - LLMF is free of similarity measure and neighborhood threshold
  - LLMF is more robust in preserving local geometric structure in unbalanced data distributions

11 Model fitting

12 Experimental Settings
Data sets: 20news and la1 (from Weka)
- Word stemming
- Stop-word removal

Data set | Num. of documents | Num. of words | Num. of categories
20news   | 18,744            | 26,214        | 20
la1      | 2,850             | 13,195        | 5
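The two preprocessing steps named on this slide can be sketched as follows. The slides do not say which stemmer or stop-word list was used, so the tiny `STOP_WORDS` set and the `crude_stem` suffix stripper here are hypothetical stand-ins (a real pipeline would more likely use something like a Porter stemmer and a full stop list). The output is a toy document-word count matrix of the kind a factorization model takes as input.

```python
import re
from collections import Counter

# Toy stop-word list (assumption; not the list used in the experiments).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are", "in", "to"}

def crude_stem(token):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

def document_word_matrix(docs):
    """Return the sorted vocabulary and a documents-by-words count matrix."""
    counts = [Counter(preprocess(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[w] for w in vocab] for c in counts]
    return vocab, matrix

docs = [
    "The models are learning topics",
    "A model learned the topic of documents",
]
vocab, matrix = document_word_matrix(docs)
# vocab  -> ['document', 'learn', 'model', 'topic']
# matrix -> [[0, 1, 1, 1], [1, 1, 1, 1]]
```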

13 Cont’

14 Experimental Results

15 Cont’

16 Conclusion
Conclusions
- We propose a novel method, LLMF, for learning low-dimensional representations of documents with local linear constraints.
- LLMF captures the rich geometric information among documents better than methods based on independent pairwise relationships.
- Experiments on the 20news and la1 benchmarks show that the proposed approach learns better semantic representations than the baseline methods.
Future work
- Extend LLMF to parallel and distributed settings
- Apply LLMF to recommendation systems, which looks promising

17 References
- D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. JMLR, 3:993–1022, 2003.
- D. Cai, X. He, and J. Han. Locally consistent concept factorization for document clustering. TKDE, 23(6):902–913, 2011.
- D. Cai, Q. Mei, J. Han, and C. Zhai. Modeling hidden topics on document manifold. CIKM ’08, pages 911–920, New York, NY, USA, 2008. ACM.
- T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42(1–2):177–196, 2001.
- S. Huh and S. E. Fienberg. Discriminative topic modeling based on manifold learning. KDD ’10, pages 653–662, New York, NY, USA, 2010. ACM.

18 Thanks!! Q&A

19 Appendix

