Latent Tree Analysis of Unlabeled Data Nevin L. Zhang Dept. of Computer Science & Engineering The Hong Kong Univ. of Sci. & Tech.


Outline
- Latent tree models
- Latent tree analysis algorithms
- What can LTA be used for:
  - Discovery of co-occurrence/correlation patterns
  - Discovery of latent variables/structures
  - Multidimensional clustering
- Examples:
  - Danish beer survey data
  - Text data
  - TCM survey data

Latent Tree Models
- Tree-structured probabilistic graphical models
  - Leaves are observed (manifest variables): discrete or continuous
  - Internal nodes are latent (latent variables): discrete
  - Each edge is associated with a conditional distribution
  - One node is associated with a marginal distribution
  - Together, these define a joint distribution over all the variables (Zhang, JMLR 2004)
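The factorization above can be sketched on a minimal tree: one binary latent root H with two binary observed leaves X1 and X2, where summing out H gives the joint over the leaves. All numbers here are illustrative, not from any model in the talk.

```python
import itertools

# A minimal latent tree: binary latent root H, binary observed leaves X1, X2.
# The numbers are made up for illustration.
p_h = [0.4, 0.6]                      # marginal P(H)
p_x1_h = [[0.9, 0.1], [0.2, 0.8]]     # P(X1 | H), rows indexed by h
p_x2_h = [[0.7, 0.3], [0.1, 0.9]]     # P(X2 | H)

def joint(x1, x2):
    """P(X1=x1, X2=x2): multiply edge distributions, sum out the latent H."""
    return sum(p_h[h] * p_x1_h[h][x1] * p_x2_h[h][x2] for h in range(2))

# Sanity check: the induced distribution over the leaves sums to 1.
total = sum(joint(x1, x2) for x1, x2 in itertools.product([0, 1], repeat=2))
print(round(total, 10))  # 1.0
```

Because X1 and X2 are conditionally independent given H, the model needs only the per-edge tables rather than a full joint table, which is what makes larger latent trees tractable.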

Latent Tree Analysis
Learning a latent tree model from data on the observed variables means determining:
- the number of latent variables,
- the number of possible states for each latent variable,
- the connections among the nodes, and
- the probability distributions.
Model selection criterion: find the model that maximizes the BIC score
BIC(m|D) = log P(D|m, θ*) − (d/2) log N
where D is the data, N the sample size, m the model, θ* the MLE of the parameters, and d the number of free parameters.
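The BIC score above is straightforward to compute once the maximized log-likelihood is known. The sketch below uses made-up log-likelihood values to show the typical trade-off: a richer model fits better but pays a larger complexity penalty.

```python
import math

def bic_score(loglik, num_free_params, sample_size):
    """BIC(m|D) = log P(D|m, theta*) - (d/2) * log N  (larger is better here)."""
    return loglik - num_free_params / 2.0 * math.log(sample_size)

# Illustrative comparison of two candidate models on the same data.
# The log-likelihoods and parameter counts are hypothetical.
score_simple = bic_score(loglik=-1050.0, num_free_params=10, sample_size=500)
score_rich = bic_score(loglik=-1040.0, num_free_params=40, sample_size=500)
print(score_simple > score_rich)  # True: the penalty outweighs the better fit
```

Here the richer model gains 10 nats of likelihood but loses about 93 nats to the penalty term, so BIC prefers the simpler model.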

Algorithms: EAST
- Search-based: Extension, Adjustment, Simplification until Termination
- Can deal with ~100 observed variables (Chen, Zhang et al. AIJ 2011)
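EAST is a score-based search over model structures. The skeleton below is a generic greedy search in the same spirit; the `neighbors` and `score` functions are stand-ins, not the actual EAST operators or the BIC computation.

```python
# Generic greedy score-based structure search: repeatedly move to the best
# scoring neighbor, stop when no candidate improves the score.
# 'neighbors' and 'score' are placeholders for the real search operators.
def greedy_search(initial, neighbors, score):
    current, best = initial, score(initial)
    while True:
        improved = False
        for cand in neighbors(current):
            s = score(cand)
            if s > best:
                current, best, improved = cand, s, True
        if not improved:
            return current, best

# Toy instance: 'models' are integers, the score peaks at 7.
result, s = greedy_search(0, lambda m: [m - 1, m + 1],
                          lambda m: -(m - 7) ** 2)
print(result)  # 7
```

In the real algorithm the candidate moves would add, adjust, or simplify latent structure, and the score would be (an approximation of) BIC.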

Unidimensionality Test (Liu, Zhang et al. MLJ 2013)


Chow-Liu tree (1968)

- Close to EAST in terms of model quality
- Can deal with 1,000 observed variables (Liu, Zhang et al. MLJ 2013)
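The Chow-Liu procedure named on the previous slide can be sketched directly: estimate pairwise mutual information from data, then take a maximum spanning tree over it. This is a minimal stdlib implementation on toy binary data, not the full latent-tree learner.

```python
import math
from itertools import combinations

def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between binary columns i and j."""
    n = len(data)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = sum(1 for row in data if row[i] == a and row[j] == b) / n
            p_a = sum(1 for row in data if row[i] == a) / n
            p_b = sum(1 for row in data if row[j] == b) / n
            if p_ab > 0:
                mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi

def chow_liu_tree(data, num_vars):
    """Maximum spanning tree over pairwise MI (Kruskal with union-find)."""
    edges = sorted(((mutual_information(data, i, j), i, j)
                    for i, j in combinations(range(num_vars), 2)), reverse=True)
    parent = list(range(num_vars))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                 # adding the edge creates no cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Toy data: X0 and X1 always agree; X2 is independent of both.
data = [(0,0,0), (0,0,1), (1,1,0), (1,1,1),
        (0,0,0), (1,1,1), (1,1,0), (0,0,1)]
print(chow_liu_tree(data, 3))  # the (0, 1) edge is picked first
```

As expected, the strongly dependent pair (X0, X1) enters the tree first, and the remaining edge merely connects the independent X2.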

Outline
- Latent tree models
- Latent tree analysis algorithms
- What can LTA be used for:
  - Discovery of co-occurrence/correlation patterns
  - Discovery of latent variables/structures
  - Multidimensional clustering
- Examples:
  - Danish beer survey data
  - Text data
  - TCM survey data

Danish Beer Market Survey
- 463 consumers, 11 beer brands
- Questionnaire, for each brand:
  - Never seen the brand before (s0)
  - Seen before, but never tasted (s1)
  - Tasted, but do not drink regularly (s2)
  - Drink regularly (s3)
(Mourad et al. JAIR 2013)

Why are the variables grouped as such?
- GronTuborg and Carlsberg: main mass-market beers
- TuborgClas and CarlSpec: frequent beers, a bit darker than the above
- CeresTop, CeresRoyal, Pokal, …: minor local beers
- Grouped as such because the responses on the brands in each group are strongly correlated.
- Intuitively, latent tree analysis partitions the observed variables into groups such that:
  - the variables in each group are strongly correlated, and
  - the correlations within each group can be properly modeled using a single latent variable.

Multidimensional Clustering
- Each latent variable gives a partition of the consumers.
  - H1:
    - Class 1: likely to have tasted TuborgClas, CarlSpec and Heineken, but do not drink them regularly
    - Class 2: likely to have seen or tasted the beers, but did not drink them regularly
    - Class 3: likely to drink TuborgClas and CarlSpec regularly
- Intuitively, latent tree analysis is a technique for multiple clustering.
  - K-means and mixture models give only one partition.
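A latent variable induces a partition by assigning each respondent to their most probable latent state. The fragment below sketches this for a hypothetical three-class latent variable with two brand questions; none of the numbers come from the actual survey model.

```python
# Posterior over a latent clustering variable H given a respondent's answers,
# for a naive-Bayes-like fragment of a latent tree. All numbers are made up.
p_h = [0.5, 0.3, 0.2]  # P(H): three hypothetical consumer classes
# P(answer | H) for two brand questions; answers coded 0..3 (s0..s3)
p_brand1 = [[0.10, 0.20, 0.60, 0.10],
            [0.40, 0.40, 0.10, 0.10],
            [0.05, 0.05, 0.20, 0.70]]
p_brand2 = [[0.10, 0.20, 0.60, 0.10],
            [0.50, 0.30, 0.10, 0.10],
            [0.05, 0.10, 0.15, 0.70]]

def posterior(a1, a2):
    """P(H | brand1=a1, brand2=a2) by Bayes' rule."""
    unnorm = [p_h[h] * p_brand1[h][a1] * p_brand2[h][a2] for h in range(3)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

post = posterior(3, 3)            # a 'drinks both regularly' respondent
cluster = post.index(max(post))   # hard assignment: most probable class
print(cluster)  # 2
```

With several latent variables, repeating this per latent variable yields one partition each, which is exactly the "multiple clustering" view on the slide.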

Binary Text Data: WebKB
- 1,041 web pages collected from 4 CS departments
- Each page represented by binary word-occurrence variables
(Liu et al. PGM 2012, MLJ 2013)

Latent Tree Model for the WebKB Data Obtained by the BI Algorithm

Latent Tree Models for the WebKB Data


Why are the variables grouped as such?
- Grouped as such because the words in each group tend to co-occur.
- On binary data, latent tree analysis partitions the observed word variables into groups such that:
  - the words in each group tend to co-occur, and
  - the correlations can be properly explained using a single latent variable.
- LTA is a method for identifying co-occurrence relationships.

Multidimensional Clustering
LTA is an approach to topic detection:
- Y66 = 4: object-oriented programming (OOP)
- Y66 = 2: non-OOP programming
- Y66 = 1: programming languages
- Y66 = 3: not on programming

Outline
- Latent tree models
- Latent tree analysis algorithms
- What can LTA be used for:
  - Discovery of co-occurrence/correlation patterns
  - Discovery of latent variables/structures
  - Multidimensional clustering
- Examples:
  - Danish beer survey data
  - Text data
  - TCM survey data

Background of Research
- Common practice in China, and increasingly in the Western world:
  - Patients with a WM (Western medicine) disease are divided into several TCM classes.
  - Different classes are treated differently using TCM treatments.
- Example:
  - WM disease: depression
  - TCM classes:
    - Liver-qi stagnation (肝气郁结). Treatment principle: soothe the liver and relieve stagnation (疏肝解郁). Prescription: Chaihu Shugan San (柴胡疏肝散)
    - Deficiency of liver yin and kidney yin (肝肾阴虚). Treatment principle: nourish the kidney and the liver (滋肾养肝). Prescription: Xiaoyao San combined with Liuwei Dihuang Wan (逍遥散合六味地黄丸)
    - Vacuity of both heart and spleen (心脾两虚). Treatment principle: replenish qi and strengthen the spleen (益气健脾). Prescription: Guipi Tang (归脾汤)
    - …

Key Question
- How should patients with a WM disease be divided into subclasses from the TCM perspective?
  - What are the TCM classes?
  - What are the characteristics of each TCM class?
  - How can the different TCM classes be differentiated?
- Important for:
  - Clinical practice
  - Research:
    - Randomized controlled trials for efficacy
    - Modern biomedical understanding of TCM concepts
- There is no consensus; different doctors/researchers use different schemes. This is a key weakness of TCM.

Key Idea
- Our objective: provide an evidence-based method for TCM patient classification.
- Key idea:
  - Cluster analysis of symptom data => empirical partitions of patients
  - Check whether the partitions correspond to TCM class concepts
- Key technology: multidimensional clustering
  - The motivation for developing latent tree analysis

Symptom Data of Depressive Patients
- Subjects:
  - 604 depressive patients aged between 19 and 69, from 9 hospitals
  - Selected using the Chinese classification of mental disorders clinical guideline CCMD-3
  - Exclusions: subjects who took anti-depression drugs within two weeks prior to the survey; women in the gestational and suckling periods; etc.
- Symptom variables:
  - From the TCM literature on depression between 1994 and …
  - Searched with the phrase "抑郁 and 证" ("depression and syndrome") on the CNKI (China National Knowledge Infrastructure) database
  - Kept only studies where patients were selected using the ICD-9, ICD-10, CCMD-2, or CCMD-3 guidelines
  - 143 symptoms reported in those studies altogether
(Zhao et al. JACM 2014)

The Depression Data
- Data as a table:
  - 604 rows, one per patient
  - 143 columns, one per symptom
  - Table cells: 0 = symptom not present, 1 = symptom present
- Removed: symptoms occurring fewer than 10 times
- 86 symptom variables entered latent tree analysis
- The structure of the latent tree model obtained is shown on the next two slides
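The preprocessing step above (dropping symptoms that occur fewer than 10 times) can be sketched directly on a 0/1 patient-by-symptom table. The data here is a toy stand-in, not the real survey.

```python
# Drop binary symptom columns whose total count is below a threshold.
def filter_rare_columns(rows, min_count=10):
    """Return (kept column indices, filtered rows) for a 0/1 table."""
    counts = [sum(col) for col in zip(*rows)]          # occurrences per symptom
    keep = [j for j, c in enumerate(counts) if c >= min_count]
    filtered = [[row[j] for j in keep] for row in rows]
    return keep, filtered

# Toy table: 20 patients, 3 symptoms. Symptom 1 occurs only 3 times
# and is dropped; symptoms 0 and 2 occur 20 and 10 times and are kept.
rows = [[1, 1 if i < 3 else 0, i % 2] for i in range(20)]
keep, filtered = filter_rare_columns(rows, min_count=10)
print(keep)  # [0, 2]
```

On the real data this step reduced the 143 reported symptoms to the 86 variables that entered latent tree analysis.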

Model Obtained for the Depression Data (Top)

Model Obtained for the Depression Data (Bottom)

The Empirical Partitions
- The first cluster (Y29 = s0) consists of 54% of the patients, while the second cluster (Y29 = s1) consists of 46%.
- The two symptoms 'fear of cold' and 'cold limbs' do not occur often in the first cluster,
- while they both tend to occur with high probabilities (0.8 and 0.85) in the second cluster.

Probabilistic Symptom Co-Occurrence Patterns
- The table indicates that the two symptoms 'fear of cold' and 'cold limbs' tend to co-occur in the cluster Y29 = s1.
- The pattern is meaningful from the TCM perspective:
  - TCM asserts that YANG DEFICIENCY (阳虚) can lead to, among other symptoms, 'fear of cold' and 'cold limbs'.
  - So the co-occurrence pattern suggests the TCM syndrome type (证型) YANG DEFICIENCY (阳虚).
- The partition Y29 suggests that:
  - Among depressive patients, there is a subclass of patients with YANG DEFICIENCY.
  - In this subclass, 'fear of cold' and 'cold limbs' occur with high probabilities (0.8 and 0.85).

Probabilistic Symptom Co-Occurrence Patterns
- Y28 = s1 captures the probabilistic co-occurrence of 'aching lumbus', 'lumbar pain like pressure' and 'lumbar pain like warmth'.
- This pattern is present in 27% of the patients.
- It suggests that:
  - Among depressive patients, there is a subclass that corresponds to the TCM concept of KIDNEY DEPRIVED OF NOURISHMENT (肾虚失养).
  - The characteristics of the subclass are given by the distributions for Y28 = s1.

Probabilistic Symptom Co-Occurrence Patterns
- Y27 = s1 captures the probabilistic co-occurrence of 'weak lumbus and knees' and 'cumbersome limbs'.
- This pattern is present in 44% of the patients.
- It suggests that:
  - Among depressive patients, there is a subclass that corresponds to the TCM concept of KIDNEY DEFICIENCY (肾虚).
  - The characteristics of the subclass are given by the distributions for Y27 = s1.
- Y27, Y28 and Y29 together provide evidence for defining KIDNEY YANG DEFICIENCY.

Probabilistic Symptom Co-Occurrence Patterns
- Pattern Y21 = s1: evidence for defining STAGNANT QI TURNING INTO FIRE (气郁化火)
- Y15 = s1: evidence for defining QI DEFICIENCY
- Y17 = s1: evidence for defining HEART QI DEFICIENCY
- Y16 = s1: evidence for defining QI STAGNATION
- Y19 = s1: evidence for defining QI STAGNATION IN HEAD

Probabilistic Symptom Co-Occurrence Patterns
- Y9 = s1: evidence for defining DEFICIENCY OF BOTH QI AND YIN (气阴两虚)
- Y10 = s1: evidence for defining YIN DEFICIENCY (阴虚)
- Y11 = s1: evidence for defining DEFICIENCY OF STOMACH/SPLEEN YIN (脾胃阴虚)

Symptom Mutual-Exclusion Patterns
- Some empirical partitions reveal symptom exclusion patterns.
- Y1 reveals the mutual exclusion of 'white tongue coating', 'yellow tongue coating' and 'yellow-white tongue coating'.
- Y2 reveals the mutual exclusion of 'thin tongue coating', 'thick tongue coating' and 'little tongue coating'.
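A crude empirical version of the mutual-exclusion check is easy to state: two binary symptoms are (empirically) mutually exclusive if they never co-occur in any patient. The toy data below mimics the tongue-coating example; it is illustrative only, and the actual analysis reads these patterns off the latent tree model rather than from raw counts.

```python
from itertools import combinations

def mutually_exclusive_pairs(rows):
    """Return index pairs (i, j) of binary columns that are never both 1."""
    cols = list(zip(*rows))
    return [(i, j) for i, j in combinations(range(len(cols)), 2)
            if not any(a and b for a, b in zip(cols[i], cols[j]))]

# Columns 0-2 mimic 'white', 'yellow', 'yellow-white' tongue coating:
# each toy patient shows at most one of them.
rows = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
print(mutually_exclusive_pairs(rows))  # [(0, 1), (0, 2), (1, 2)]
```

In the model-based view, such a group shows up as a latent variable whose states each make one of the symptoms likely and the others unlikely.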

Summary of TCM Data Analysis
- By analyzing 604 cases of depressive patient data using latent tree models, we have discovered a host of probabilistic symptom co-occurrence patterns and symptom mutual-exclusion patterns.
- Most of the co-occurrence patterns have clear TCM syndrome connotations, and the mutual-exclusion patterns are also reasonable and meaningful.
- The patterns can be used as evidence for defining TCM classes in the context of depressive patients and for differentiating between those classes.

Another Perspective: Statistical Validation of TCM Postulates
- TCM terms such as YANG DEFICIENCY were introduced to explain symptom co-occurrence patterns observed in clinical practice.
- [Diagram: YANG DEFICIENCY explains Y29 = s1; KIDNEY DEPRIVED OF NOURISHMENT explains Y28 = s1; …]
(Zhang et al. JACM 2008)

Value of the Work in the View of Others
D. Haughton and J. Haughton. Living Standards Analytics: Development through the Lens of Household Survey Data. Springer.
- Zhang et al. provide a very interesting application of latent class (tree) models to diagnoses in traditional Chinese medicine (TCM).
- The results tend to confirm known theories in Chinese traditional medicine.
- This is a significant advance, since the scientific bases for these theories are not known.
- The model proposed by the authors provides at least a statistical justification for them.

Summary
- Latent tree models:
  - Tree-structured probabilistic graphical models
  - Leaf nodes: observed variables
  - Internal nodes: latent variables
- What can LTA be used for:
  - Discovery of co-occurrence patterns in binary data
  - Discovery of correlation patterns in general discrete data
  - Discovery of latent variables/structures
  - Multidimensional clustering
  - Topic detection in text data
  - A key role in TCM patient classification

References
- N. L. Zhang (2004). Hierarchical latent class models for cluster analysis. Journal of Machine Learning Research, 5(6).
- T. Chen, N. L. Zhang, T. F. Liu, Y. Wang, L. K. M. Poon (2011). Model-based multidimensional clustering of categorical data. Artificial Intelligence, 176(1).
- T. F. Liu, N. L. Zhang, A. H. Liu, L. K. M. Poon (2012). A novel LTM-based method for multidimensional clustering. European Workshop on Probabilistic Graphical Models (PGM-12).
- T. F. Liu, N. L. Zhang, P. X. Chen, A. H. Liu, L. K. M. Poon, and Y. Wang (2013). Greedy learning of latent tree models for multidimensional clustering. Machine Learning.
- R. Mourad, C. Sinoquet, N. L. Zhang, T. F. Liu and P. Leray (2013). A survey on latent tree models and applications. Journal of Artificial Intelligence Research, 47.
- N. L. Zhang, S. H. Yuan, T. Chen and Y. Wang (2008). Statistical validation of TCM theories. Journal of Alternative and Complementary Medicine, 14(5).
- N. L. Zhang, S. H. Yuan, T. Chen and Y. Wang (2008). Latent tree models and diagnosis in traditional Chinese medicine. Artificial Intelligence in Medicine, 42.
- Z. X. Xu, N. L. Zhang, Y. Q. Wang, G. P. Liu, J. Xu, T. F. Liu, and A. H. Liu (2013). Statistical validation of traditional Chinese medicine syndrome postulates in the context of patients with cardiovascular disease. The Journal of Alternative and Complementary Medicine.
- Y. Zhao, N. L. Zhang, T. F. Wang, Q. G. Wang (2014). Discovering symptom co-occurrence patterns from 604 cases of depressive patient data using latent tree models. The Journal of Alternative and Complementary Medicine.

Thank You!