1 Latent Tree Analysis Nevin L. Zhang* and Leonard K. M. Poon**
* The Hong Kong University of Science and Technology ** The Education University of Hong Kong

2 What is Latent Tree Analysis?
Repeated event co-occurrences may be:
- due to common hidden causes or genuine direct correlations, OR
- coincidental, especially in big data.
Challenge: identify the co-occurrences that are due to hidden causes or correlations.
Latent tree analysis solves a related and simpler problem: detect co-occurrences that can be statistically explained by a tree of latent variables.
It can be used to solve interesting tasks:
- Multidimensional clustering
- Hierarchical topic detection
- Latent structure discovery

3 Basic Latent Tree Models (LTM)
A tree-structured Bayesian network in which:
- all variables are discrete,
- variables at leaf nodes are observed, and
- variables at internal nodes are latent.
Parameters: P(Y1), P(Y2|Y1), P(X1|Y2), P(X2|Y2), …
Semantics: the joint distribution is the product of the conditional distributions along the tree.
Also known as hierarchical latent class (HLC) models (Zhang, JMLR 2004).
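A minimal numeric sketch of this factorization on the toy tree Y1 -> Y2 -> {X1, X2}; all CPT values below are hypothetical:

```python
import numpy as np

# Toy basic LTM: Y1 -> Y2 -> {X1, X2}, all variables binary.
pY1 = np.array([0.7, 0.3])                   # P(Y1)
pY2_Y1 = np.array([[0.9, 0.1], [0.2, 0.8]])  # P(Y2|Y1), rows: y1
pX1_Y2 = np.array([[0.8, 0.2], [0.3, 0.7]])  # P(X1|Y2), rows: y2
pX2_Y2 = np.array([[0.6, 0.4], [0.1, 0.9]])  # P(X2|Y2), rows: y2

# Joint over (Y1, Y2, X1, X2) from the tree factorization.
joint = np.einsum('a,ab,bc,bd->abcd', pY1, pY2_Y1, pX1_Y2, pX2_Y2)

# Distribution over the observed variables: sum out the latents.
pX = joint.sum(axis=(0, 1))
print(pX)        # P(X1, X2), a 2x2 table
print(pX.sum())  # 1.0
```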

4 Pouch Latent Tree Models (PLTM)
An extension of the basic LTM (Poon et al., ICML 2010):
- a rooted tree,
- internal nodes represent discrete latent variables, and
- each leaf node consists of one or more continuous observed variables, called a pouch.
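In a PLTM, the continuous variables in a pouch are modeled as jointly Gaussian given the state of their discrete parent. A minimal sampling sketch with made-up parameters for one pouch (X1, X2) and a binary parent Y:

```python
import numpy as np

rng = np.random.default_rng(0)

pY = np.array([0.6, 0.4])                          # P(Y)
mu = [np.array([0.0, 0.0]), np.array([3.0, 2.0])]  # pouch mean per state
cov = [np.array([[1.0, 0.5], [0.5, 1.0]]),         # pouch covariance
       np.array([[0.8, -0.2], [-0.2, 1.5]])]       # per state

# Sample: first the discrete parent, then the Gaussian pouch.
y = rng.choice(2, p=pY)
x = rng.multivariate_normal(mu[y], cov[y])
print(y, x)
```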

5 More General Latent Variable Tree Models
More general variants allow internal nodes to be observed, internal nodes to be continuous, and forests instead of trees (Choi et al., JMLR 2011). The primary focus of this talk is the basic LTM.

6 Identifiability Issues
A root change leads to an equivalent model, so edge orientations are unidentifiable. Hence we are really talking about undirected models: an undirected LTM represents an equivalence class of directed LTMs. In implementation, the model is represented as a directed model instead of an MRF, so that the partition function is always 1. (Zhang, JMLR 2004)
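A quick numeric check of this on a single edge Y1 - Y2, with hypothetical numbers: rooting at Y1 and rooting at Y2 give the same joint distribution.

```python
import numpy as np

# Rooted at Y1: joint = P(Y1) P(Y2|Y1).
pY1 = np.array([0.7, 0.3])
pY2_Y1 = np.array([[0.9, 0.1], [0.2, 0.8]])
joint = np.diag(pY1) @ pY2_Y1              # P(y1, y2)

# Re-rooted at Y2: P(Y2) P(Y1|Y2) recovers the same joint.
pY2 = joint.sum(axis=0)
pY1_Y2 = joint.T / pY2[:, None]            # rows: y2
joint_rerooted = (np.diag(pY2) @ pY1_Y2).T
print(np.allclose(joint, joint_rerooted))  # True
```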

7 Identifiability Issues
|X|: the cardinality of variable X, i.e., its number of states. Theorem (Zhang, JMLR 2004): the set of all regular models for a given set of observed variables is finite. Consequently, latent variables cannot have too many states.
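As I read the regularity condition in Zhang (2004), a regular latent variable can have at most as many states as the product of its neighbors' cardinalities divided by the largest of them. A minimal checker under that assumption (illustrative only):

```python
from math import prod

def regular_cardinality_bound(neighbor_cards):
    """Assumed bound |Z| <= prod(|X_i|) / max(|X_i|) over the
    neighbors X_i of a latent variable Z (my reading of
    Zhang, JMLR 2004)."""
    return prod(neighbor_cards) // max(neighbor_cards)

# A latent node with three binary neighbors: at most 4 states.
print(regular_cardinality_bound([2, 2, 2]))  # 4
```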

8 Latent Tree Analysis (LTA)
Learning latent tree models: determine
- the number of latent variables,
- the number of possible states for each latent variable,
- the connections among the variables, and
- the probability distributions.

9 Latent Tree Analysis (LTA)
Learning latent tree models: determine
- the number of latent variables,
- the number of possible states for each latent variable,
- the connections among the nodes, and
- the probability distributions.
Difficult, but doable: in practice, candidate models are searched over and scored, as sketched below.
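The search-based methods on the next slide score candidate models with BIC. A minimal sketch, assuming the usual form BIC(m|D) = log P(D|m, theta*) - d(m)/2 * log N; the parameter-counting helper `ltm_free_params` is a hypothetical illustration:

```python
import numpy as np

def ltm_free_params(cards, parent):
    """Free parameters of a directed LTM: each node X contributes
    (|X| - 1) * |parent(X)| entries; the root has no parent."""
    d = 0
    for node, card in cards.items():
        p = parent.get(node)
        d += (card - 1) * (cards[p] if p is not None else 1)
    return d

def bic(loglik, d, n):
    """BIC(m|D) = log P(D|m, theta*) - d(m)/2 * log N."""
    return loglik - 0.5 * d * np.log(n)

# Tree Y1 -> Y2 -> {X1, X2}, all binary: 1 + 2 + 2 + 2 = 7 params.
cards = {'Y1': 2, 'Y2': 2, 'X1': 2, 'X2': 2}
parent = {'Y2': 'Y1', 'X1': 'Y2', 'X2': 'Y2'}
print(ltm_free_params(cards, parent))  # 7
```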

10 Three Settings for Algorithm Developments
Setting 1: CLRG (Choi et al., 2011; Huang et al., 2015)
- Assume the data are generated from an unknown LTM.
- Investigate properties of LTMs and use them for learning, e.g., recover the model structure from the tree additivity of information distances (see the sketch after this list).
- Theoretical guarantees to recover the generative model under certain conditions.
Setting 2: EAST, BI (Chen et al., 2012; Liu et al., 2013)
- Do not assume the data are generated from an LTM.
- Fit an LTM to the data using the BIC score, via search or heuristics.
- It does not make sense to talk about theoretical guarantees.
- Obtains better models than Setting 1, because the Setting 1 assumption is usually untrue.
Setting 3: HLTA (Liu et al., 2014; Chen et al., 2016)
- Consider usefulness in addition to model fit.
- Hierarchy of latent variables.
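For Setting 1, a minimal sketch of the information distance of Choi et al. (2011), assuming their definition d_ij = -log(|det J_ij| / sqrt(det M_i * det M_j)), where J_ij is the joint probability matrix and M_i the diagonal matrix of marginals; all numbers below are made up. On a tree, this distance is additive along paths, which is what CLRG exploits:

```python
import numpy as np

def info_distance(J):
    """Information distance of Choi et al. (2011) for the joint
    probability matrix J of two discrete variables."""
    mi = np.diag(J.sum(axis=1))   # marginal of the row variable
    mj = np.diag(J.sum(axis=0))   # marginal of the column variable
    return -np.log(abs(np.linalg.det(J)) /
                   np.sqrt(np.linalg.det(mi) * np.linalg.det(mj)))

# Toy binary chain X1 - X2 - X3, specified by P(X1) and the
# transition matrices P(X2|X1) and P(X3|X2).
p1 = np.array([0.6, 0.4])
T12 = np.array([[0.9, 0.1], [0.2, 0.8]])    # rows: x1, cols: x2
T23 = np.array([[0.7, 0.3], [0.25, 0.75]])  # rows: x2, cols: x3

J12 = np.diag(p1) @ T12                # joint of (X1, X2)
J23 = np.diag(J12.sum(axis=0)) @ T23   # joint of (X2, X3)
J13 = np.diag(p1) @ T12 @ T23          # joint of (X1, X3)

d12, d23, d13 = map(info_distance, (J12, J23, J13))
print(np.isclose(d13, d12 + d23))  # True: additivity along the chain
```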

11 Current Capabilities
Takes a few hours on a single machine to analyze data sets with thousands of variables and hundreds of thousands of instances.

12 What can LTA be used for?
Multidimensional clustering
Hierarchical topic detection
Latent structure discovery
Other applications

13 What can LTA be used for?
Multidimensional clustering
Hierarchical topic detection
Latent structure discovery
Other applications

14 How to Cluster?
Cluster analysis: grouping objects into clusters such that objects in the same cluster are similar and objects from different clusters are dissimilar.

15 How to Cluster these?

16 How to Cluster these?

17 How to Cluster these?

18 Multidimensional Clustering
Complex data usually have multiple facets and can be meaningfully partitioned in multiple ways, so it is more reasonable to look for multiple ways to partition the data. How do we get multiple partitions?

19 How to get one partition?
Finite mixture models: one latent variable Z.
- Gaussian mixture models: for continuous data.
- Latent class model (mixture of multinomial distributions): for categorical data.
Key point: use a model with one latent variable to obtain one partition (see the sketch below).
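A minimal sketch of the continuous case with scikit-learn, on hypothetical toy data: one latent variable (the mixture component), hence one partition of the data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),   # cluster 1
               rng.normal(4, 1, size=(100, 2))])  # cluster 2

gm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gm.predict(X)   # one partition: a cluster label per object
```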

20 How to get multiple partitions?
Use models with multiple latent variables to obtain multiple partitions. Latent tree models are probabilistic graphical models with multiple latent variables, and a generalization of latent class models.

21 Multidimensional Clustering of Social Survey Data
// Survey on corruption in Hong Kong and the performance of the anti-corruption agency (ICAC)
// 31 questions, 1200 samples
C_City: s0 s1 s2 s3 // very common, quite common, uncommon, very uncommon
C_Gov: s0 s1 s2 s3
C_Bus: s0 s1 s2 s3
Tolerance_C_Gov: s0 s1 s2 s3 // totally intolerable, intolerable, tolerable, totally tolerable
Tolerance_C_Bus: s0 s1 s2 s3
WillingReport_C: s0 s1 s2 // yes, no, depends
LeaveContactInfo: s0 s1 // yes, no
I_EncourageReport: s0 s1 s2 s3 s4 // very sufficient, sufficient, average, ...
I_Effectiveness: s0 s1 s2 s3 s4 // very effective, effective, average, ineffective, very ineffective
I_Deterrence: s0 s1 s2 s3 s4 // very sufficient, sufficient, average, ...
…
(Chen et al., AIJ 2012)

22 Latent Structure Discovery
Y2: Demographic background; Y4: ICAC performance; Y6: Level of corruption; Y3: Tolerance toward corruption; Y5: Change in level of corruption; Y7: ICAC accountability

23 Multidimensional Clustering
Y2=s0: low-income youngsters; Y2=s1: women with little or no income; Y2=s2: people with good education and good income; Y2=s3: people with poor education and average income.

24 Multidimensional Clustering
Values of the observed variables: s0 = totally intolerable, …, s3 = totally tolerable.
Interpretation of the values of the latent variable:
- Y3=s0: people who find corruption totally intolerable (57%)
- Y3=s1: people who find corruption intolerable (27%)
- Y3=s2: people who find corruption tolerable (15%)
Interesting finding: people who are tough on corruption are equally tough toward C_Gov and C_Bus, while people who are lenient about corruption are more lenient toward C_Bus than toward C_Gov.

25 Multidimensional Clustering
Who are the toughest toward corruption among the 4 groups? Interesting finding: Y2=s2 (good education and good income) is the toughest on corruption, Y2=s3 (poor education and average income) is the most lenient, and the other two classes are in between.

26 Multidimensional Clustering Summary
Latent tree analysis has found several interesting ways to partition the ICAC data, and revealed some interesting relationships between different partitions. (Chen et al. AIJ 2012)

27 What Can LTA be used for?
Multidimensional clustering
Hierarchical topic detection
Latent structure discovery
Other applications

28 Hierarchical Latent Tree Analysis (HLTA)
Each word is a binary variable: 0 = absent from the document, 1 = present in the document. Each document is thus a binary vector over the vocabulary (see the sketch below).
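A minimal sketch of this representation with scikit-learn, on toy documents:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the video card driver crashed",
        "new graphics card and driver installed",
        "the court ruled on the case"]

# binary=True records presence/absence rather than word counts.
vec = CountVectorizer(binary=True)
X = vec.fit_transform(docs)   # documents-by-vocabulary 0/1 matrix
print(vec.get_feature_names_out())
print(X.toarray())
```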

29 Topics
Each latent variable partitions the documents into 2 clusters, and the document clusters are interpreted as topics. Example: Z14=0 is a background topic; Z14=1 is the “video-card-driver” topic. Each latent variable gives one topic.
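One plausible way to read such a topic off a latent variable (not necessarily HLTA's exact procedure, and the probabilities below are made up): rank words by how much more often they occur in the Z=1 cluster than in the background cluster.

```python
import numpy as np

words = np.array(['video', 'card', 'driver', 'court', 'case'])
p_w_z1 = np.array([0.62, 0.70, 0.58, 0.03, 0.02])  # P(w=1 | Z=1)
p_w_z0 = np.array([0.02, 0.04, 0.03, 0.05, 0.04])  # P(w=1 | Z=0)

# Words that occur much more often under Z=1 describe the topic.
order = np.argsort(p_w_z1 - p_w_z0)[::-1]
print(words[order][:3])   # ['card' 'video' 'driver']
```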

30 Topic Hierarchy
Latent variables at high levels capture “long-range” word co-occurrences, giving more general topics. Latent variables at low levels capture “short-range” word co-occurrences, giving more specific topics.

31 The New York Times Dataset
From the UCI repository: 300,000 articles; vocabulary of 10,000 words selected using TF-IDF. HLTA took 7 hours on a desktop machine.

32 Model Learned from 300,000 New York Times Articles

35 Hierarchical Latent Tree Analysis: Summary
Latent tree analysis gives a novel method for hierarchical topic detection. It is able to find meaningful topics and topic hierarchies, and it differs from LDA-based methods in several important ways. In empirical evaluations, the new method significantly outperforms the LDA-based methods. (Chen et al., AAAI 2016)

36 What Can LTA be used for?
Multidimensional clustering
Hierarchical topic detection
Latent structure discovery
Other applications

37 Link to Deep Learning
Commonalities between HLTM and DBN: both define a distribution over observed binary variables, and both have multiple layers of latent variables.
Differences between HLTM and DBN: an HLTM is tree-structured and learned from data; a DBN has full connections between layers, manually specified.

38 Link to Deep Learning
A potential method for learning structures for deep models: learn an HLTM from data, use it as the skeleton for a deep model, and add additional links for better model fit (see the sketch below).
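A minimal sketch of the skeleton idea (not the authors' implementation): a layer whose connections are restricted by a fixed 0/1 mask, e.g. the parent-child links of a learned HLTM plus a few added links; the mask below is hypothetical.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Linear layer whose connections are limited to a fixed mask."""
    def __init__(self, mask):
        super().__init__()
        out_dim, in_dim = mask.shape
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_dim))
        self.register_buffer('mask', mask)

    def forward(self, x):
        # Masked weights: only the skeleton's links carry signal.
        return x @ (self.weight * self.mask).t() + self.bias

# Tree skeleton: latent 0 <- inputs {0, 1}, latent 1 <- inputs {2, 3},
# plus one extra link (latent 1 <- input 1) added for model fit.
mask = torch.tensor([[1., 1., 0., 0.],
                     [0., 1., 1., 1.]])
layer = MaskedLinear(mask)
print(layer(torch.randn(3, 4)).shape)   # torch.Size([3, 2])
```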

39 Link to Deep Learning
Sparse Boltzmann Machines (SBM) (Chen et al., AAAI 2017; Wed 10-11, ML19). Unlike a restricted Boltzmann machine (RBM), an SBM learns an appropriate structure from data and thereby avoids overfitting. Per-word perplexity is compared across:
- an RBM with the optimal number of hidden units,
- an RBM with the same number of hidden units as the SBM,
- an RBM with the same interlayer links as the SBM,
- an RBM with the same number of hidden units as the SBM and pruned links, and
- the SBM.

40 Analysis of Online Transaction Data
Use latent tree models to model co-purchases of items. The model structure can be used to organize the items, potentially providing a new perspective on collaborative filtering. Results on last.fm.

41 Analysis of Data from Traditional Chinese Medicine (TCM)
Use latent tree models to model co-occurrences of symptoms. Significance of the work: TCM concepts have been criticized as vague and elusive in meaning; our work shows that they are terms labeling symptom patterns and that they correspond to soft clusters of patients. (Zhang et al., JACM 2008)

42 Analysis of Gene Expression Data
Use latent tree models to model co-expressions of genes; this is helpful in reconstructing transcriptional regulatory networks. Recently, Gitter and others learned a latent tree model to explain co-expressions of genes, which can help reconstruct such networks (Gitter et al., 2016). So there are many things we can do with latent tree models.

43 What Can LTA be used for?
Multidimensional clustering
Hierarchical topic detection
Latent structure discovery
Other applications

44 LTMs for Probabilistic Modelling
An attractive representation of joint distributions (Pearl, 1988): computationally very simple to work with, yet able to represent complex relationships among observed variables. These two properties are exploited to build tractable probabilistic models in (Wang, Zhang, and Chen, 2008; Kaltwang, Todorovic, and Pantic, 2015; Yu, Huang, and Dauwels, 2016).

45 Uni-Dimensional Clustering of Categorical Data
Latent class model (LCM): an LTM with one latent variable, also known as a mixture of multinomials. Widely used for cluster analysis in the social, behavioral, and health sciences (Collins and Lanza, 2010). Key weakness: the local independence assumption is too strong. Latent tree models offer a natural framework in which this assumption can be relaxed.
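In symbols, local independence says the observed variables are mutually independent given the class variable, so the LCM joint is:

```latex
P(X_1, \dots, X_n) \;=\; \sum_{z} P(Z = z) \prod_{i=1}^{n} P(X_i \mid Z = z)
```

A latent tree model relaxes this by attaching different groups of observed variables to different, mutually correlated latent variables.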

46 Summary
Latent tree models can be used to model:
- co-occurrences of words in documents,
- co-occurrences of symptoms among patients,
- co-purchases of items in online transaction data,
- co-expressions of genes, and
- correlations among answers to survey questions.
They are a useful tool for:
- multidimensional clustering,
- hierarchical topic detection,
- latent structure learning, and
- tractable probabilistic modeling.
They also relax the local independence assumption of latent class models.

