
Presentation transcript:

ABSTRACT: We examine how to determine the number of states of a hidden variable when learning probabilistic models. This problem is crucial for improving our ability to learn compact models and complements our earlier work on discovering hidden variables. We describe an approach that utilizes score-based agglomerative state clustering, which allows us to efficiently evaluate models over a range of cardinalities for the hidden variable. We extend the procedure to handle several interacting hidden variables, and we demonstrate the effectiveness of the approach on several synthetic and real-life data sets. We show that our approach learns models with hidden variables that generalize better and have better structure than previous approaches.

Learning the Dimensionality of Hidden Variables
Gal Elidan, Nir Friedman
Hebrew University {galel,

Why is dimensionality important?
Representation: the I-map (the minimal structure that implies only the independencies that hold in the marginal distribution) is typically complex; a hidden variable with the right number of states keeps the model compact while not introducing new independencies.
Improve learning: models with fewer parameters allow us to learn faster and more robustly.

What is a Bayesian Network
A Bayesian network represents a joint probability over a set of random variables using a DAG. Example (the classic Asia network): Visit to Asia, Smoking, Tuberculosis, Lung Cancer, Bronchitis, Abnormality in Chest, X-Ray, Dyspnea. Each node has a conditional probability distribution given its parents, e.g. P(D|A,B) = 0.8, P(D|¬A,B) = 0.1, P(D|A,¬B) = 0.1, P(D|¬A,¬B) = 0.01, and the joint distribution factorizes as P(X1, ..., Xn) = P(V) P(S) P(T|V) ... P(X|A) P(D|A,B).

The FindHidden Algorithm
A hidden variable discovery algorithm (Elidan et al., 2000) that uses structural signatures (approximate cliques) to detect hidden variables. Given a semi-clique S with N nodes, it proposes a candidate network:
(1) Introduce H as a parent of all nodes in S.
(2) Replace all incoming edges to S by edges to H.
(3) Remove all inter-S edges.
(4) Make all children of S children of H, if the result is acyclic.

Learning: Structural EM
Training data over the observed variables, together with a hidden variable H, are used to evaluate candidate structures (illustrated on the slide with networks over X1, X2, X3, H, Y1, Y2, Y3).
E-Step: compute expected counts for the current candidates, e.g. N(X1), N(X2), N(X3), N(H, X1, X2, X3), ...
M-Step: score and parameterize the candidates using a Bayesian scoring metric.
Re-iterate with the best candidate.

Single Hidden Variable
(Figure: a network in which H has parents X1, X2, X3 and children Y1, Y2, Y3; the state space of H is collapsed step by step, from h ∈ {1, 2, ..., n} to h ∈ {1, 2, ..., n-1}, and so on.)

Choosing the dimensionality (see the code sketch after these slides)
Start with a unique state for each Markov blanket assignment of the hidden variable.
Greedily combine the two states that give the maximal score improvement.
Choose the number of states that corresponds to the maximal score.

Behavior of the score
Efficient computation: the counts of a merged state are the sum of the counts of its components, N[h_i, Pa_H] + N[h_j, Pa_H] = N[h_ij, Pa_H], and do not depend on the other states.
Complexity reduction increases the score. The likelihood of the family of H increases when |H| is smaller, while the likelihood of the families of H's children decreases, and plunges significantly as H approaches a single state.

Several interacting variables
A round-robin approach iterates over the hidden variables from the bottom up. Each hidden variable is initialized with a single state, so that agglomeration initially relies only on the observable nodes. Every step improves the complete score, which guarantees convergence of the method.
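The agglomeration loop from "Choosing the dimensionality" can be made concrete with a short sketch. This is an illustration only, not the authors' implementation: initial_counts and family_score are hypothetical stand-ins, the score would in practice be a Bayesian family score, and with incomplete data the counts would be expected counts from the E-step of Structural EM.

from itertools import combinations

def agglomerate_states(initial_counts, family_score):
    """Greedy score-based agglomeration of a hidden variable's states.

    initial_counts: dict mapping each initial state (one per Markov-blanket
        assignment) to a dict of {parent_assignment: count}.
    family_score: callable taking the full {state: counts} table and returning
        a score (e.g. a Bayesian family score built from these statistics).
    Returns (best_score, best_count_table); the size of the returned table is
    the chosen cardinality of the hidden variable.
    """
    current = {state: dict(c) for state, c in initial_counts.items()}
    best_score, best_table = family_score(current), dict(current)

    while len(current) > 1:
        step_best_score, step_best_table = float("-inf"), None
        for s1, s2 in combinations(current, 2):
            # Merged sufficient statistics are just the sum of the two
            # components: N[h_i, Pa_H] + N[h_j, Pa_H] = N[h_ij, Pa_H].
            merged = {s: c for s, c in current.items() if s not in (s1, s2)}
            merged[(s1, s2)] = {
                pa: current[s1].get(pa, 0) + current[s2].get(pa, 0)
                for pa in set(current[s1]) | set(current[s2])
            }
            score = family_score(merged)
            if score > step_best_score:
                step_best_score, step_best_table = score, merged
        current = step_best_table
        if step_best_score > best_score:
            best_score, best_table = step_best_score, dict(current)

    return best_score, best_table

Because merged counts are sums of already-computed counts, each merge only requires re-scoring the families of H and of its children, not the rest of the network.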
Summary and Future Work
We introduced the importance of setting the correct dimensionality for hidden variables and implemented a computationally effective agglomerative method to determine the number of states. The algorithm performs well and improves the quality and performance of the learned models when combined with the hidden variable discovery algorithm FindHidden.
Future work: use additional measures to discover hidden variables, such as edge confidence and information measures computed directly from the data; handle hidden variables when the data is sparse; explore hidden variables in Probabilistic Relational Models.

Integration with FindHidden
Log-loss performance of FindHidden with and without agglomeration on test data from several data sets; the baseline is the performance of the original input network. (Figure: bar chart of log-loss in bits/instance for Original, FindHidden, and FindHidden with Agglomeration on HR, LVFAILURE, VENTLUNG, INTUBATION, TB, STOCK, and NEWS.)
(Figures: the TB network after FindHidden, and after FindHidden with agglomeration; nodes include x-ray, smpros, hivpos, age, Hidden, hivres, clustered, ethnic, homeless, pob, gender, disease_site.)

Alarm network experiment
24 variables in the Alarm network were hidden and the agglomeration method was applied:
Perfect recovery: 15 variables.
Single missing state: 2 variables.
Extra state: 2 variables. These variables' children have stochastic CPDs, and the algorithm tries to explain dependencies that arise in the specific training set.
Collapsed to a single state: 5 variables. These were redundant (confirmed by aggressive EM).
(Figure: agglomeration tree of the HYPOVOLEMIA node in the Alarm network. Leaves show assignments to the node's parents, e.g. N,T,L; H,F,H; L,F,L. Each internal node is numbered according to agglomeration order and shows the change in score, e.g. +17.5, +5.0, -19.6, -185.5.)

Synthetic networks
(Figure: three networks over observed variables x0-x7 and hidden variables h0-h3: the true model, in which h0-h3 have 3, 2, 4, and 3 states respectively; the model learned with agglomeration; and the model learned with binary states.)
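For reference, the log-loss numbers above are measured in bits per instance on held-out data; lower is better, and the baseline is the original input network. Below is a minimal sketch of that measurement, assuming a hypothetical model.log_probability(instance) method (returning natural-log values) for a learned network.

import math

def log_loss_bits_per_instance(model, test_instances):
    # Average negative log2-probability the model assigns to held-out instances.
    # model.log_probability is a hypothetical interface standing in for a
    # learned Bayesian network; it is assumed to return natural-log values.
    total_bits = 0.0
    count = 0
    for instance in test_instances:
        total_bits -= model.log_probability(instance) / math.log(2)
        count += 1
    return total_bits / count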