Latent Tree Models
Nevin L. Zhang
Dept. of Computer Science & Engineering
The Hong Kong Univ. of Sci. & Tech.
AAAI 2014 Tutorial

[Photos: HKUST in 2014 and in 1988]

Latent Tree Models
- Part I: Non-Technical Overview (25 minutes)
- Part II: Definition and Properties (25 minutes)
- Part III: Learning Algorithms (110 minutes, with a 30-minute break halfway)
- Part IV: Applications (50 minutes)

Part I: Non-Technical Overview
- Latent tree models
- What LTMs can be used for:
  - Discovery of co-occurrence/correlation patterns
  - Discovery of latent variables/structures
  - Multidimensional clustering
- Examples:
  - Danish beer survey data
  - Text data

Latent Tree Models (LTMs)
- Tree-structured probabilistic graphical models
- Leaf nodes are observed (manifest variables); discrete or continuous
- Internal nodes are latent (latent variables); discrete
- Each edge is associated with a conditional distribution, and one node with a marginal distribution
- Together, these define a joint distribution over all the variables (Zhang, JMLR 2004), as written out below
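To make the last point concrete: rooting the tree at a latent node h_r and writing pa(v) for the parent of node v (notation assumed here, not from the slides), the model defines

```latex
P(x_1, \dots, x_n,\; h_1, \dots, h_m)
  \;=\; P(h_r) \prod_{v \neq h_r} P\left(v \mid \mathrm{pa}(v)\right),
```

with one factor per node: the marginal at the root and one conditional per edge.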

Latent Tree Analysis (LTA)
Learning latent tree models: from data on the observed variables, obtain a latent tree model. This means determining:
- the number of latent variables,
- the number of possible states for each latent variable,
- the connections among the nodes, and
- the probability distributions.
(A toy example of these ingredients is sketched below.)
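A minimal sketch of those ingredients as plain Python data structures; the structure and all numbers here are invented for illustration, not taken from the tutorial:

```python
# Toy latent tree: one binary latent root H with two observed
# children X1 and X2. Everything here is hypothetical.

root_states = [0, 1]                     # states of the latent variable H
root_marginal = {0: 0.6, 1: 0.4}         # P(H)

# Connections: H -> X1 and H -> X2, each edge carrying P(child | H).
cpds = {
    "X1": {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}},  # P(X1 | H)
    "X2": {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}},  # P(X2 | H)
}

def joint(h, x1, x2):
    """P(H=h, X1=x1, X2=x2): one factor per node, as in the factorization."""
    return root_marginal[h] * cpds["X1"][h][x1] * cpds["X2"][h][x2]

def likelihood(x1, x2):
    """P(X1=x1, X2=x2): sum the latent variable out of the joint."""
    return sum(joint(h, x1, x2) for h in root_states)

print(likelihood(1, 1))  # 0.6*0.1*0.3 + 0.4*0.8*0.9 = 0.306
```

Learning reverses this direction: given only samples of (X1, X2), an LTA algorithm must decide that a single binary H suffices, how it connects to the leaves, and what the numbers are.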

LTA on Danish Beer Market Survey Data
- 463 consumers, 11 beer brands
- Questionnaire, for each brand:
  - Never seen the brand before (s0)
  - Seen before, but never tasted (s1)
  - Tasted, but do not drink regularly (s2)
  - Drink regularly (s3)
(Mourad et al., JAIR 2013)

Why are the variables grouped as they are?
- Responses on the brands in each group are strongly correlated:
  - GronTuborg and Carlsberg: main mass-market beers
  - TuborgClas and CarlSpec: frequent beers, a bit darker than the above
  - CeresTop, CeresRoyal, Pokal, ...: minor local beers
- In general, LTA partitions the observed variables into groups such that
  - the variables in each group are strongly correlated, and
  - the correlations within each group can be properly modeled using a single latent variable.
(One way to check such correlations on data is sketched below.)
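A quick way to inspect the "strongly correlated" claim on survey data is empirical pairwise mutual information between responses; a self-contained sketch, with hypothetical brand responses:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete
    variables, given paired observations xs and ys."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log( p(x,y) / (p(x) * p(y)) )
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

# Hypothetical responses on the s0..s3 scale for two brands:
brand_a = [3, 3, 2, 3, 0, 2, 3, 1]
brand_b = [3, 3, 2, 3, 1, 2, 3, 0]
print(mutual_information(brand_a, brand_b))  # large when responses co-vary
```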

Multidimensional Clustering
- Each latent variable gives a partition of the consumers.
- H1:
  - Class 1: likely to have tasted TuborgClas, CarlSpec and Heineken, but do not drink them regularly
  - Class 2: likely to have seen or tasted the beers, but did not drink them regularly
  - Class 3: likely to drink TuborgClas and CarlSpec regularly
- H0 and H2 give two other partitions.
- In general, LTA is a technique for multiple clustering; in contrast, k-means and mixture models give only one partition.
(How a latent variable induces a partition is sketched below.)
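Clustering by a latent variable amounts to computing its posterior given a respondent's answers and picking the most probable state. A minimal sketch, reusing the toy model from earlier (repeated so the snippet runs on its own; all numbers remain hypothetical):

```python
root_marginal = {0: 0.6, 1: 0.4}                       # P(H)
cpds = {
    "X1": {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}},  # P(X1 | H)
    "X2": {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}},  # P(X2 | H)
}

def joint(h, x1, x2):
    return root_marginal[h] * cpds["X1"][h][x1] * cpds["X2"][h][x2]

def posterior(x1, x2):
    """P(H | X1=x1, X2=x2): normalize the joint over the latent states."""
    weights = {h: joint(h, x1, x2) for h in root_marginal}
    z = sum(weights.values())
    return {h: w / z for h, w in weights.items()}

post = posterior(1, 1)
print(post, "-> cluster", max(post, key=post.get))
# {0: 0.0588..., 1: 0.9411...} -> cluster 1
```

With several latent variables H0, H1, H2, running the same computation per variable yields several partitions of the same respondents, which is the multidimensional-clustering behavior described above.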

Unidimensional vs. Multidimensional Clustering
- Clustering: grouping of objects into clusters such that objects in the same cluster are similar while objects from different clusters are dissimilar.
- The result of clustering is often a single partition of all the objects.

How to Cluster Those?

How to Cluster Those? Style of picture

How to Cluster Those? Type of object in picture

Multidimensional Clustering
- Complex data usually have multiple facets and can be meaningfully partitioned in multiple ways.
- Multidimensional clustering / multi-clustering
- LTA is a model-based method for multidimensional clustering.
- Other methods:

Clustering of Variables and Objects
- LTA produces a partition of the observed variables.
- For each cluster of variables, it produces a partition of the objects.

Binary Text Data: WebKB
- 1,041 web pages collected from 4 CS departments in 1997
- 336 words

Latent Tree Model for WebKB Data (Liu et al., MLJ 2013)
- 89 latent variables

Latent Tree Model for WebKB Data

Why are the variables grouped as they are?
- Words in each group tend to co-occur.
- On binary text data, LTA partitions the word variables into groups such that
  - words in each group tend to co-occur, and
  - the co-occurrences can be properly explained using a single latent variable.
- LTA is thus a method for identifying co-occurrence relationships.

LTA is an alternative approach to topic detection. For example, the latent variable Y66 gives a multidimensional clustering of the pages:
- Y66 = 4: object-oriented programming (OOP)
- Y66 = 2: non-OOP programming
- Y66 = 1: programming language
- Y66 = 3: not on programming
More on this in Part IV. (A sketch of reading latent states as topics follows.)
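The interpretation step, reading a latent state as a topic by listing the words it makes probable, can be sketched as follows; the word probabilities are invented for illustration and are not the tutorial's actual model:

```python
# Hypothetical P(word occurs | Y66 = state) for a few words.
p_word_given_state = {
    4: {"object": 0.85, "inherit": 0.70, "class": 0.60, "exam": 0.05},
    3: {"object": 0.05, "inherit": 0.02, "class": 0.40, "exam": 0.75},
}

for state, probs in p_word_given_state.items():
    # Characterize each state by its most probable words.
    top_words = sorted(probs, key=probs.get, reverse=True)[:3]
    print(f"Y66={state}:", ", ".join(top_words))
```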

Summary
- Latent tree models:
  - tree-structured probabilistic graphical models
  - leaf nodes: observed variables
  - internal nodes: latent variables
- What LTA can be used for:
  - discovery of co-occurrence patterns in binary data
  - discovery of correlation patterns in general discrete data
  - discovery of latent variables/structures
  - multidimensional clustering
  - topic detection in text data
  - probabilistic modelling

Key References:
- Anandkumar, A., Chaudhuri, K., Hsu, D., Kakade, S. M., Song, L., & Zhang, T. (2011). Spectral methods for learning multivariate latent tree structure. In Twenty-Fifth Conference on Neural Information Processing Systems (NIPS-11).
- Anandkumar, A., Ge, R., Hsu, D., Kakade, S. M., & Telgarsky, M. (2012a). Tensor decompositions for learning latent variable models. Preprint.
- Anandkumar, A., Hsu, D., & Kakade, S. M. (2012b). A method of moments for mixture models and hidden Markov models. An abridged version appears in Proceedings of COLT 2012.
- Choi, M. J., Tan, V. Y., Anandkumar, A., & Willsky, A. S. (2011). Learning latent tree graphical models. Journal of Machine Learning Research, 12, 1771–1812.
- Friedman, N., Ninio, M., Pe'er, I., & Pupko, T. (2002). A structural EM algorithm for phylogenetic inference. Journal of Computational Biology, 9(2), 331–353.
- Harmeling, S., & Williams, C. K. I. (2011). Greedy learning of binary latent trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(6), 1087–1097.
- Hsu, D., Kakade, S., & Zhang, T. (2009). A spectral algorithm for learning hidden Markov models. In The 22nd Annual Conference on Learning Theory (COLT 2009).

Key References:
- Mossel, E., Roch, S., & Sly, A. Robust estimation of latent tree graphical models: Inferring hidden states with inexact parameters. Submitted.
- Mourad, R., Sinoquet, C., & Leray, P. (2011). A hierarchical Bayesian network approach for linkage disequilibrium modeling and data-dimensionality reduction prior to genome-wide association studies. BMC Bioinformatics, 12, 16.
- Mourad, R., Sinoquet, C., Zhang, N. L., Liu, T. F., & Leray, P. (2013). A survey on latent tree models and applications. Journal of Artificial Intelligence Research, 47.
- Parikh, A. P., Song, L., & Xing, E. P. (2011). A spectral algorithm for latent tree graphical models. In Proceedings of the 28th International Conference on Machine Learning (ICML-2011).
- Saitou, N., & Nei, M. (1987). The neighbor-joining method: A new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 4(4), 406–425.
- Song, L., Parikh, A., & Xing, E. (2011). Kernel embeddings of latent tree graphical models. In Twenty-Fifth Conference on Neural Information Processing Systems (NIPS-11).
- Tan, V. Y. F., Anandkumar, A., & Willsky, A. (2011). Learning high-dimensional Markov forest distributions: Analysis of error rates. Journal of Machine Learning Research, 12, 1617–1653.

Key References:
- Chen, T., & Zhang, N. L. (2006). Quartet-based learning of shallow latent variables. In Proceedings of the Third European Workshop on Probabilistic Graphical Models (PGM 2006), 59–66.
- Chen, T., Zhang, N. L., Liu, T., Poon, K. M., & Wang, Y. (2012). Model-based multidimensional clustering of categorical data. Artificial Intelligence, 176(1), 2246–2269.
- Liu, T. F., Zhang, N. L., Liu, A. H., & Poon, L. K. M. (2013). Greedy learning of latent tree models for multidimensional clustering. Machine Learning.
- Liu, T. F., Zhang, N. L., & Chen, P. X. (2014). Hierarchical latent tree analysis for topic detection. In ECML 2014.
- Poon, L. K. M., Zhang, N. L., Chen, T., & Wang, Y. (2010). Variable selection in model-based clustering: To do or to facilitate. In Proceedings of the 27th International Conference on Machine Learning (ICML-2010).
- Wang, Y., Zhang, N. L., & Chen, T. (2008). Latent tree models and approximate inference in Bayesian networks. Journal of Artificial Intelligence Research, 32, 879–900.
- Wang, X. F., Guo, J. H., Hao, L. Z., Zhang, N. L., & Chen, P. X. (2013). Recovering discrete latent tree models by spectral methods.
- Wang, X. F., & Zhang, N. L. (2014). A study of recently discovered equalities about latent tree models using inverse edges. In PGM 2014.
- Zhang, N. L. (2004). Hierarchical latent class models for cluster analysis. Journal of Machine Learning Research, 5, 697–723.
- Zhang, N. L., & Kocka, T. (2004a). Effective dimensions of hierarchical latent class models. Journal of Artificial Intelligence Research, 21, 1–17.

Key References:
- Zhang, N. L., & Kocka, T. (2004b). Efficient learning of hierarchical latent class models. In Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 585–593.
- Zhang, N. L., Nielsen, T. D., & Jensen, F. V. (2004). Latent variable discovery in classification models. Artificial Intelligence in Medicine, 30(3), 283–299.
- Zhang, N. L., Wang, Y., & Chen, T. (2008). Discovery of latent structures: Experience with the CoIL Challenge 2000 data set. Journal of Systems Science and Complexity, 21(2), 172–183.
- Zhang, N. L., Yuan, S., Chen, T., & Wang, Y. (2008). Latent tree models and diagnosis in traditional Chinese medicine. Artificial Intelligence in Medicine, 42(3), 229–245.
- Zhang, N. L., Yuan, S., Chen, T., & Wang, Y. (2008). Statistical validation of TCM theories. Journal of Alternative and Complementary Medicine, 14(5).
- Zhang, N. L., Fu, C., Liu, T. F., Poon, K. M., Chen, P. X., Chen, B. X., & Zhang, Y. L. (2014). The latent tree analysis approach to patient subclassification in traditional Chinese medicine. Evidence-Based Complementary and Alternative Medicine.
- Xu, Z. X., Zhang, N. L., Wang, Y. Q., Liu, G. P., Xu, J., Liu, T. F., & Liu, A. H. (2013). Statistical validation of traditional Chinese medicine syndrome postulates in the context of patients with cardiovascular disease. The Journal of Alternative and Complementary Medicine, 18, 1–6.
- Zhao, Y., Zhang, N. L., Wang, T. F., & Wang, Q. G. (2014). Discovering symptom co-occurrence patterns from 604 cases of depressive patient data using latent tree models. The Journal of Alternative and Complementary Medicine, 20(4).