Published by Joan Bound. Modified over 2 years ago.

Learning the Dimensionality of Hidden Variables

ABSTRACT: We examine how to determine the number of states of a hidden variable when learning probabilistic models. This problem is crucial for improving our ability to learn compact models, and it complements our earlier work on discovering hidden variables. We describe an approach that uses score-based agglomerative state clustering. This approach allows us to efficiently evaluate models with a range of cardinalities for the hidden variable. We extend our procedure to handle several interacting hidden variables. We demonstrate the effectiveness of the approach by evaluating it on several synthetic and real-life data sets. We show that our approach learns models with hidden variables that generalize better and have better structure than previous approaches.

Why is dimensionality important?
Representation: The I-map (the minimal structure that implies only independencies that hold in the marginal distribution) is typically complex.
Improve learning: Models with fewer parameters allow us to learn faster and more robustly.
[Figure annotation: "not introducing new independencies"]

Learning: Structural EM
[Figure: the Structural EM loop on a network over X1, X2, X3, Y1, Y2, Y3 with hidden variable H. The training data and the current network feed the E-step (computation), which produces the expected counts N(X1), N(X2), N(X3), N(H, X1, X2, X3), ... The M-step then scores and parameterizes candidate networks.]
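As a rough illustration of the expected counts produced in the E-step, the sketch below accumulates N(H, X1, X2, X3) from per-instance posteriors over H. It assumes the posteriors P(H | instance) have already been computed by inference in the current network; the function name `expected_counts` and the data layout are hypothetical and are not taken from the poster.

```python
from collections import defaultdict


def expected_counts(parent_assignments, posteriors):
    """Accumulate expected counts N(H=h, X=x).

    parent_assignments: one tuple (x1, x2, x3) per training instance.
    posteriors: one list per instance, posteriors[m][h] = P(H=h | instance m).
    Assumption: the posteriors come from inference in the current network,
    which is not shown here.
    """
    counts = defaultdict(float)
    for x, posterior in zip(parent_assignments, posteriors):
        for h, p in enumerate(posterior):
            # Each instance contributes fractionally to every state of H.
            counts[(h, x)] += p
    return dict(counts)


# Three instances, binary hidden variable.
instances = [(0, 1, 0), (0, 1, 0), (1, 1, 1)]
posteriors = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
counts = expected_counts(instances, posteriors)
```

Because every posterior sums to one, the expected counts sum to the number of training instances, just as ordinary counts would if H were observed.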
The procedure then re-iterates with the best candidate.
[Equation: Bayesian scoring metric]

What is a Bayesian Network?
A Bayesian network represents a joint probability over a set of random variables using a DAG:
[Figure: the network over Visit to Asia (V), Smoking (S), Tuberculosis (T), Lung Cancer, Bronchitis (B), Abnormality in Chest (A), X-Ray (X) and Dyspnea (D).]
P(D|A,B) = 0.8    P(D|¬A,B) = 0.1    P(D|A,¬B) = 0.1    P(D|¬A,¬B) = 0.01
P(X1, …, Xn) = P(V) P(S) P(T|V) … P(X|A) P(D|A,B)

Single Hidden Variable
[Figure: two copies of the network over X1, X2, X3, Y1, Y2, Y3 with hidden variable H, one with h ∈ {1, 2, …, n} and one with h ∈ {1, 2, …, n-1}.]

Choosing the dimensionality
Start with a unique value for each Markov blanket assignment of the hidden variable.
Greedily combine the two states whose merge gives the maximal score improvement.
Choose the number of values that corresponds to the maximal score.

The FindHidden Algorithm
A hidden variable discovery algorithm (Elidan et al., 2000) that uses structural signatures (approximate cliques) to detect hidden variables.
[Figure: a semi-clique S with N nodes.]
Propose a candidate network:
(1) Introduce H as a parent of all nodes in S
(2) Replace all incoming edges to S by edges to H
(3) Remove all inter-S edges
(4) Make all children of S children of H if acyclic

Behavior of the score
Efficient computation: N[h_i, Pa_H] + N[h_j, Pa_H] = N[h_ij, Pa_H], and this does not depend on other states.
Complexity reduction increases the score.
The likelihood of Family H increases when |H| is smaller.
The likelihood of Family child(H) decreases, and it plunges significantly toward a single state.

Several interacting variables
A round-robin approach iterates over the hidden variables from the bottom up.
Initialize each hidden variable with a single state, so that the first steps rely only on observable nodes.
Every step improves the complete score, which guarantees convergence of the method.

Gal Elidan, Nir Friedman
Hebrew University
{galel, nir}@huji.ac.il

Summary and Future Work
We highlighted the importance of setting the correct dimensionality for hidden variables and implemented a computationally effective agglomerative method to determine the number of states.
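The agglomerative method can be sketched in a few lines. This is a simplified illustration rather than the authors' implementation, and it makes several assumptions that are not in the poster: H has no parents, so its Markov blanket is just its children; each training instance is hard-assigned to the state of its children's joint assignment; and the score is a Dirichlet-multinomial marginal likelihood with a pseudo-count `alpha` per cell. The names `family_score`, `score`, `merge` and `agglomerate` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import lgamma


def family_score(count_vectors, alpha=1.0):
    """Log marginal likelihood of one family: one count vector per parent state."""
    total = 0.0
    for counts in count_vectors:
        k = len(counts)
        total += lgamma(alpha * k) - lgamma(alpha * k + sum(counts))
        total += sum(lgamma(alpha + n) - lgamma(alpha) for n in counts)
    return total


def score(states, alpha=1.0):
    """Score of the family of H (assumed parentless) plus the families of its children.

    states[h][c] is the count vector of child c in state h.
    """
    h_counts = [sum(state[0]) for state in states]  # N[h]
    total = family_score([h_counts], alpha)
    for c in range(len(states[0])):
        total += family_score([state[c] for state in states], alpha)
    return total


def merge(a, b):
    """Counts of the merged state are the sums of the two states' counts."""
    return [[x + y for x, y in zip(ca, cb)] for ca, cb in zip(a, b)]


def agglomerate(instances, arities, alpha=1.0):
    """Greedy merging; returns the best-scoring set of states and its score."""
    # Start with one state per distinct assignment of the children.
    states = []
    for assignment, n in Counter(instances).items():
        state = [[0] * k for k in arities]
        for c, value in enumerate(assignment):
            state[c][value] = n
        states.append(state)
    best_states, best_score = states, score(states, alpha)
    while len(states) > 1:
        candidates = []
        for i, j in combinations(range(len(states)), 2):
            rest = [s for k, s in enumerate(states) if k not in (i, j)]
            merged = rest + [merge(states[i], states[j])]
            candidates.append((score(merged, alpha), merged))
        top_score, states = max(candidates, key=lambda t: t[0])
        if top_score > best_score:
            best_states, best_score = states, top_score
    return best_states, best_score


# Three perfectly correlated binary children: two states should survive.
data = [(0, 0, 0)] * 50 + [(1, 1, 1)] * 50
best_states, best_score = agglomerate(data, arities=[2, 2, 2])
```

For clarity the sketch rescores the whole model for every candidate pair. Since merged counts are just sums, a practical version would update only the terms of the two merged states, which is the efficiency the poster points to.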
The algorithm performs well and improves the quality and performance of the learned models when combined with the hidden variable discovery algorithm FindHidden.
Future work:
Use additional measures to discover hidden variables, such as edge confidence, information measures computed directly from the data, etc.
Handle hidden variables when the data is sparse.
Explore hidden variables in Probabilistic Relational Models.

Integration with FindHidden
[Figure: log-loss performance (bits/instance) of FindHidden with and without agglomeration on test and real-life data: HR, LVFAILURE, VENTLUNG, INTUBATION, TB, STOCK, NEWS. The baseline is the performance of the original input network. Series: Original FindHidden; with Agglomeration.]
[Figure: the TB network after FindHidden, and the TB network after FindHidden with agglomeration. Nodes: x-ray, smpros, hivpos, age, Hidden, hivres, clustered, ethnic, homeless, pob, gender, disease_site.]

24 variables in the Alarm network were hidden, and the agglomeration method was applied:
Perfect recovery: 15 variables
Single missing state: 2 variables
Extra state: 2 variables. The children of these variables have stochastic CPDs. The algorithm tries to explain dependencies that arise in a specific training set.
Collapse to a single state: 5 variables. These were redundant (confirmed by aggressive EM).

[Figure: networks over x0 to x7 with hidden variables h0 to h3. Panels: true model (h0-h3 have 3, 2, 4, 3 states); model learned with agglomeration; model learned with binary states.]

Agglomeration tree of the HYPOVOLEMIA node in the Alarm network. Leaves show assignments to parents. Each node is numbered according to agglomeration order and shows the change in score.
Leaves: N,T,L  N,F,L  N,F,H  N,F,N  H,F,H  L,F,L  H,F,L  H,F,N  L,F,N  L,F,H  L,T,N  H,T,L  L,T,L
Merges, in agglomeration order: (1) +610.6  (2) +46.4  (3) +38.4  (4) +23.4  (5) +17.5  (6) +15.6  (7) +12.3  (8) +10.6  (9) +10.0  (10) +5.0  (11) -19.6  (12) -185.5
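The score changes in the agglomeration tree can be used to read off the chosen cardinality. The sketch below assumes that the 13 leaf assignments and the 12 numbered score changes transcribed from the figure are complete, and that each merge removes exactly one state; the helper `best_cardinality` is hypothetical.

```python
from itertools import accumulate


def best_cardinality(n_initial_states, score_deltas):
    """Return (number of states, cumulative gain) at the best point of the merge sequence."""
    best_merges, best_gain = 0, 0.0
    for merges, gain in enumerate(accumulate(score_deltas), start=1):
        if gain > best_gain:
            best_merges, best_gain = merges, gain
    # Each merge removes one state.
    return n_initial_states - best_merges, best_gain


# Score changes in agglomeration order, as listed for the HYPOVOLEMIA tree.
deltas = [610.6, 46.4, 38.4, 23.4, 17.5, 15.6, 12.3, 10.6, 10.0, 5.0, -19.6, -185.5]
n_states, gain = best_cardinality(13, deltas)
```

With the numbers as transcribed, the cumulative gain peaks after merge (10), at about +789.8, which leaves 3 states. Merges (11) and (12) both lower the score.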
