Consistent and Efficient Reconstruction of Latent Tree Models

Presentation transcript:

Consistent and Efficient Reconstruction of Latent Tree Models
Myung Jin Choi
Joint work with Vincent Tan, Anima Anandkumar, and Alan S. Willsky
Stochastic Systems Group, Laboratory for Information and Decision Systems, Massachusetts Institute of Technology
September 29, 2010

Latent Tree Graphical Models
[Figure: five observed stock variables – Disney, CVS, Microsoft, Apple, Dell.]

Latent Tree Graphical Models
[Figure: the same stocks connected through hidden nodes Market, Computer, and Computer Equipment.]

Outline
Reconstruction of a latent tree
Algorithm 1: Recursive Grouping
Algorithm 2: CLGrouping
Experimental results

Reconstruction of a Latent Tree
Gaussian model: each node is a scalar Gaussian variable.
Discrete model: each node is a discrete variable with K states.
Goal: reconstruct the latent tree using samples of the observed nodes only.
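As a concrete toy instance of this setup (the tree shape, correlations, and names below are illustrative assumptions, not the talk's examples), the sketch builds a small Gaussian latent tree, draws samples, and keeps only the observed coordinates:

```python
import numpy as np

# Toy latent tree: hidden node h connects observed variables x0, x1, x2.
# In a Gaussian tree model the correlation between two leaves is the
# product of the edge correlations along the path through h.
edge_rho = {0: 0.8, 1: 0.7, 2: 0.6}     # correlation of each leaf with h

Sigma = np.eye(4)                        # covariance of (h, x0, x1, x2)
for i in range(3):
    Sigma[0, i + 1] = Sigma[i + 1, 0] = edge_rho[i]
    for j in range(i + 1, 3):
        Sigma[i + 1, j + 1] = Sigma[j + 1, i + 1] = edge_rho[i] * edge_rho[j]

rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(4), Sigma, size=5000)
observed = samples[:, 1:]   # the algorithm only ever sees these columns
```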

Minimal Latent Trees (Pearl, 1988)
Conditions for minimal latent trees:
Each hidden node has at least three neighbors.
Any two variables are neither perfectly dependent nor independent.

Desired Properties for Algorithms
Consistent for minimal latent trees: correct recovery given exact distributions.
Computationally efficient.
Low sample complexity.
Good empirical performance.

Related Work
EM-based approaches (ZhangKocka04, HarmelingWilliams10, ElidanFriedman05): no consistency guarantees; computationally expensive.
Phylogenetic trees: neighbor-joining (NJ) method (SaitouNei87).

Information Distance
Gaussian distributions: d_ij = -log |ρ_ij|, where ρ_ij is the correlation coefficient between X_i and X_j.
Discrete distributions: d_ij = -log ( |det J^ij| / sqrt(det M^i det M^j) ), where J^ij is the joint probability matrix of (X_i, X_j) and M^i is the diagonal matrix of marginal probabilities of X_i.
The algorithms use information distances of observed variables. Assume first that the exact information distances are given.
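A minimal sketch of estimating these distances from data, assuming the formulas above; `observed` is a samples-by-variables array (e.g., from the earlier sketch), and the discrete case is shown for one pair of K-state variables:

```python
import numpy as np

def gaussian_info_distances(samples):
    """d_ij = -log|rho_ij| from the empirical correlation matrix."""
    corr = np.corrcoef(samples, rowvar=False)
    # Clip to avoid log(0) if an empirical correlation is exactly 0 or +-1.
    return -np.log(np.clip(np.abs(corr), 1e-12, 1.0))

def discrete_info_distance(xi, xj, K):
    """d_ij = -log( |det J| / sqrt(det M_i * det M_j) ) for K-state variables."""
    J = np.zeros((K, K))
    for a, b in zip(xi, xj):
        J[a, b] += 1.0
    J /= len(xi)                      # empirical joint probability matrix
    Mi = np.diag(J.sum(axis=1))       # diagonal marginal matrix of x_i
    Mj = np.diag(J.sum(axis=0))       # diagonal marginal matrix of x_j
    return -np.log(np.abs(np.linalg.det(J))
                   / np.sqrt(np.linalg.det(Mi) * np.linalg.det(Mj)))
```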

Additivity of Information Distances on Trees
For any pair of nodes k and l, d_kl = Σ_{(i,j) ∈ Path(k,l)} d_ij: information distances add along the path between k and l in the tree.

Testing Node Relationships
Node j – a leaf node, node i – parent of j: then d_jk = d_ij + d_ik, i.e., d_jk - d_ik = d_ij, for all k ≠ i, j.
Can identify (parent, leaf child) pairs.

Testing Node Relationships
Nodes i and j – leaf nodes sharing the same parent h (sibling nodes): then d_ik - d_jk = d_ih - d_jh is constant for all k ≠ i, j, with |d_ik - d_jk| < d_ij.
Can identify leaf-sibling pairs.
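A sketch of the resulting pairwise test, assuming exact distances in a matrix `D` over observed nodes (sample-based versions relax these equalities, as noted later in the talk):

```python
import numpy as np

def classify_pair(D, i, j, tol=1e-9):
    """Classify (i, j) using Phi_ijk = D[i,k] - D[j,k] over all other nodes k."""
    others = [k for k in range(len(D)) if k not in (i, j)]
    phi = np.array([D[i, k] - D[j, k] for k in others])
    if np.all(np.abs(phi + D[i, j]) < tol):
        return "i is the parent of leaf j"     # D[j,k] = D[i,j] + D[i,k] for all k
    if np.all(np.abs(phi - D[i, j]) < tol):
        return "j is the parent of leaf i"
    if np.ptp(phi) < tol and np.abs(phi[0]) < D[i, j] - tol:
        return "i and j are leaf siblings"     # Phi constant, |Phi| < d_ij
    return "neither"
```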

Recursive Grouping
Step 1. Compute Φ_ijk = d_ik - d_jk for all triples of observed nodes (i, j, k).

Recursive Grouping
Step 2. Identify (parent, leaf child) pairs and leaf-sibling pairs.

Recursive Grouping Step 3. Introduce a hidden parent node for each sibling group without a parent.

Recursive Grouping
Step 4. Compute the information distances for the new hidden nodes, e.g., for a new hidden parent h of siblings i and j: d_ih = (d_ij + Φ_ijk)/2, and d_hk = d_ik - d_ih for other nodes k.

Recursive Grouping Step 5. Remove the identified child nodes and repeat Steps 2-4.


Recursive Grouping
Identifies a group of family nodes at each step and introduces hidden nodes recursively.
Correctly recovers all minimal latent trees.
Computational complexity O(diam(T) m^3); worst case O(m^4).
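A simplified sketch of one pass of Steps 2-4, under stated assumptions: it handles only leaf-sibling groups (observed-parent cases are omitted for brevity), uses exact distances, and relies on `classify_pair` from the earlier sketch:

```python
import numpy as np

def rg_iteration(D, active):
    """One simplified pass of Steps 2-4 over the current active node set.
    Returns the enlarged distance matrix and the new active set."""
    # Step 2: partition active nodes into sibling families.
    families, used = [], set()
    for i in active:
        if i in used:
            continue
        fam = [i] + [j for j in active if j != i and j not in used
                     and classify_pair(D, i, j) == "i and j are leaf siblings"]
        if len(fam) > 1:
            used.update(fam)
            families.append(fam)

    next_active = [i for i in active if i not in used]
    for fam in families:
        # Step 3: introduce a hidden parent h for the sibling group.
        h = D.shape[0]
        D = np.pad(D, ((0, 1), (0, 1)))
        i, j = fam[0], fam[1]
        k = next(x for x in active if x not in fam)   # a reference node
        # Step 4: d_ih = (d_ij + Phi_ijk) / 2, then d_xh = d_ix - d_ih.
        d_ih = 0.5 * (D[i, j] + (D[i, k] - D[j, k]))
        D[i, h] = D[h, i] = d_ih
        for x in range(h):
            if x != i:
                D[x, h] = D[h, x] = D[i, x] - d_ih
        next_active.append(h)
    return D, next_active
```

Step 5 then amounts to calling `rg_iteration` repeatedly on the shrinking active set until a single family remains.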

CLGrouping Algorithm

Chow-Liu Tree
MST(V; D): minimum spanning tree of V using D as edge weights, where V = set of observed nodes and D = information distances.
Computational complexity O(m^2 log m).
For Gaussian models, MST(V; D) = Chow-Liu tree (minimizes KL-divergence to the distribution given by D).
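A sketch of this step using an off-the-shelf MST routine on the information-distance matrix (scipy's is used here as an assumed convenience; any MST algorithm works), plus the internal-node extraction used next:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def chow_liu_tree(D):
    """Edges of MST(V; D) over m observed nodes (D: dense m x m distances).
    Distances are strictly positive for minimal trees, so no edge is
    mistaken for the sparse matrix's implicit zeros."""
    mst = minimum_spanning_tree(D)
    rows, cols = mst.nonzero()
    return list(zip(rows.tolist(), cols.tolist()))

def internal_nodes(edges, m):
    """Non-leaf nodes of the tree, i.e., nodes with degree > 1."""
    deg = np.zeros(m, dtype=int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return [v for v in range(m) if deg[v] > 1]
```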

Surrogate Node
Surrogate node of i: Sg(i) = argmin_{j ∈ V} d_ij, the observed node closest to i in information distance (V = set of observed nodes).

Property of the Chow-Liu Tree
MST(V; D) is obtained from the latent tree by contracting each hidden node with its surrogate node (see the edge-contraction slides at the end).

CLGrouping Algorithm
Step 1. Using the information distances of the observed nodes, construct the Chow-Liu tree MST(V; D), and identify the set of internal nodes, here {3, 5}.

CLGrouping Algorithm Step 2. Select an internal node and its neighbors, and apply the recursive-grouping (RG) algorithm.

CLGrouping Algorithm
Step 3. Replace the sub-tree spanning the neighborhood with the output of RG.
This procedure recovered hidden variables h1 and h2, whose surrogate node is 5, the internal node we selected. This holds in general: whenever we apply recursive grouping to an internal node and its neighbors, we recover the hidden nodes that have that internal node as a surrogate.

CLGrouping Algorithm Repeat Steps 2-3 until all internal nodes are operated on.

CLGrouping
Step 1: Construct the Chow-Liu tree MST(V; D).
Step 2: For each internal node and its neighbors, apply a latent-tree-learning subroutine (RG or NJ).
Correctly recovers all minimal latent trees.
Computational complexity O(m^2 log m + (#internal nodes)(maximum degree)^3), typically dominated by the O(m^2 log m) term.
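Putting the two stages together, a high-level driver might look like the sketch below; `learn_subtree` is a stand-in parameter for the RG (or NJ) subroutine plus the splice-back of Step 3, not a name from the paper's code:

```python
def cl_grouping(D, learn_subtree):
    """Sketch of CLGrouping.  learn_subtree(D, v, nbrs) should return the
    edges of the latent subtree learned over internal node v and its
    neighbors (possibly introducing new hidden-node indices)."""
    edges = chow_liu_tree(D)                       # Step 1 (previous sketch)
    tree = {frozenset(e) for e in edges}
    for v in internal_nodes(edges, len(D)):
        # Step 2: v's current neighbors in the evolving tree.
        nbrs = sorted({u for e in tree if v in e for u in e} - {v})
        new_edges = learn_subtree(D, v, nbrs)
        # Step 3: remove the star around v, splice in the learned subtree.
        tree = {e for e in tree if v not in e}
        tree |= {frozenset(e) for e in new_edges}
    return tree
```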

Sample-based Algorithms
Compute the ML estimates of the information distances.
Use relaxed constraints for testing node relationships.
Consistent. More details in the paper.

Experimental Results
Simulations using synthetic datasets.
Compare RG, NJ, CLRG, and CLNJ.
Metrics: Robinson-Foulds metric and KL-divergence.

Performance Comparisons
For a double star, RG is clearly the best.
NJ is poor at recovering the HMM structure.
CLGrouping performs well on all three structures.
Average running time for CLGrouping: under 1 second.

Monthly Stock Returns
Monthly returns of 84 companies in the S&P 100, sampled from 1990 to 2007.
Latent tree learned using CLNJ.

20 Newsgroups with 100 Words
16,242 binary samples of 100 words, each entry indicating whether the word appears in the article.
Latent tree learned using regCLRG.

Contributions
We introduced two algorithms to learn latent trees.
Recursive Grouping: identifies families at each layer of the tree and introduces hidden nodes recursively.
CLGrouping: first constructs the Chow-Liu tree among the observed nodes, then applies latent-tree-learning subroutines such as RG or NJ locally to each internal node and its neighbors.

Contributions
Both algorithms are consistent, and CLGrouping shows superior experimental results in both accuracy and computational efficiency.
A longer version of the paper – with detailed comparisons to existing methods and proofs of all theoretical results – and a MATLAB implementation are available at the project webpage:
http://people.csail.mit.edu/myungjin/latentTree.html

Testing Node Relationships
Φ_ijk = d_ik - d_jk is constant for all k ≠ i, j when i and j are leaf siblings.

Edge Contractions
[Figure: left – contract edge (h1, x1); right – contract edge (h1, x1) and then (h2, x1).]

CLGrouping and Edge Contractions
The Chow-Liu tree is obtained by contracting each hidden node with its surrogate node; CLGrouping reverses these edge-contraction operations.
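The contraction view is easy to reproduce with a generic graph library; a sketch assuming networkx (whose contracted_nodes is a standard call), with a toy tree whose labels mirror the figure and where, as on the edge-contraction slide, both hidden nodes contract into x1:

```python
import networkx as nx

# Toy latent tree: hidden nodes h1, h2 with observed leaves x1..x4.
T = nx.Graph([("h1", "x1"), ("h1", "x2"), ("h1", "h2"),
              ("h2", "x3"), ("h2", "x4")])

# Contracting each hidden node into its surrogate (assumed here to be x1
# for both) yields a tree on the observed nodes only -- by the property
# above, this is the Chow-Liu tree.
G = nx.contracted_nodes(T, "x1", "h1", self_loops=False)
G = nx.contracted_nodes(G, "x1", "h2", self_loops=False)
print(sorted(G.edges()))   # edges among observed nodes after contraction
```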

Desired Properties for Algorithms
Consistency: correct recovery given exact distributions.
Computational efficiency.
Sample complexity.

CLGrouping Algorithm
Step 3. Replace the sub-tree spanning the neighborhood with the output of RG.
In this subtree, h3 has only two neighbors, so it will not be recovered. The remaining tree has two hidden nodes, and since RG recovers the correct minimal latent tree, this is the output of RG.