
Slide 1: Ideal Parent Structure Learning
Gal Elidan, with Iftach Nachman and Nir Friedman
School of Engineering & Computer Science, The Hebrew University, Jerusalem, Israel

Slide 2: The Ideal Parent Approach
Learning structure. Input: data (instances over the variables). Output: a network structure.
Search loop: Init: start with an initial structure. (1) Consider local changes. (2) Score each candidate. (3) Apply the best modification.
Problems: we need to score many candidates, and each candidate requires costly parameter optimization, so structure learning is often impractical.
The ideal parent approach: approximate the improvement of candidate changes (fast); optimize and score only the promising candidates (slow).
[Figure: four candidate structures over S, C, E, D with their scores, e.g. -17.23, -19.19, -23.13.]
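The two-tier scheme above can be sketched as follows. This is an illustrative sketch, not the authors' code: the function name, candidate move names, and scores are all hypothetical.

```python
def pick_best_move(candidates, approx_score, exact_score, K=2):
    """Rank moves by the cheap approximation, then fully optimize and
    score only the top-K shortlist (hypothetical sketch of the scheme)."""
    ranked = sorted(candidates, key=approx_score, reverse=True)
    shortlist = ranked[:K]                 # only these pay the costly optimization
    return max(shortlist, key=exact_score)

# Toy usage: the approximation roughly tracks the exact score
cands = ["add C->E", "add S->E", "del C->D"]
approx = {"add C->E": -17.2, "add S->E": -19.2, "del C->D": -23.1}.__getitem__
exact  = {"add C->E": -17.0, "add S->E": -18.9, "del C->D": -23.5}.__getitem__
print(pick_best_move(cands, approx, exact, K=2))  # -> add C->E
```

The costly `exact_score` is called K times per family instead of once per candidate, which is where the speedup comes from.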

Slide 3: Linear Gaussian Networks
[Figure: an example network over A, B, C, D, E, with the CPD P(E|C) shown.]
In a linear Gaussian network, each CPD is a Gaussian whose mean is a linear function of the parents: P(X | u1, ..., uk) = N(theta_0 + sum_i theta_i u_i, sigma^2).
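A minimal log-likelihood computation for a linear Gaussian CPD, for concreteness (the function name and interface are assumptions, not from the talk):

```python
import numpy as np

def linear_gaussian_loglik(x, U, theta, sigma):
    """Log-likelihood of a child profile x under P(X | u) = N(theta . u, sigma^2).
    U has one row per instance and one column per parent."""
    resid = x - U @ theta                          # prediction error per instance
    n = len(x)
    return (-0.5 * n * np.log(2 * np.pi * sigma**2)
            - 0.5 * np.sum(resid**2) / sigma**2)
```

Structure search repeats computations like this for every scored candidate, which is why avoiding most of them matters.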

Slide 4: The Ideal Parent Idea
Goal: score only promising candidates.
[Figure: the child profile X across instances, the profile of its current parents U, and the resulting prediction Pred(X|U).]

Slide 5: The Ideal Parent Idea (cont.)
Goal: score only promising candidates.
Step 1: compute the optimal hypothetical parent, the "ideal profile" Y for which the prediction Pred(X|U,Y) would match the child profile X exactly.
Step 2: search the potential parents Z1, Z2, Z3, Z4 for the profile most similar to the ideal profile.
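In the linear Gaussian case, Step 1 has a closed form: the ideal profile is the residual that a new parent with unit coefficient would have to explain. A minimal sketch under that assumption:

```python
import numpy as np

def ideal_profile(x, U, theta):
    """Hypothetical parent profile y satisfying x = U @ theta + y exactly,
    i.e. the residual left over by the current parents (linear Gaussian case)."""
    return x - U @ theta

# Toy usage: current parents explain all but a constant offset of 1
x = np.array([3.0, 5.0])
U = np.array([[1.0], [2.0]])
theta = np.array([2.0])
print(ideal_profile(x, U, theta))  # -> [1. 1.]
```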

Slide 6: The Ideal Parent Idea (cont.)
Step 3: add the most similar parent found (here Z2) to the family and optimize its parameters, yielding the new prediction Pred(X|U,Z).

Slide 7: Choosing the Best Parent Z
Our goal: choose the candidate Z that maximizes the likelihood improvement of adding Z as a parent of X.
We define a similarity measure C1(y, z) between the ideal profile y and a candidate profile z.
Theorem: C1(y, z) is the likelihood improvement when only the coefficient of z is optimized (all other parameters held fixed).
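A sketch of the candidate selection, assuming the linear Gaussian form of the measure, C1(y, z) = <y, z>^2 / (2 sigma^2 <z, z>); the function and candidate names are illustrative:

```python
import numpy as np

def c1_similarity(y, z, sigma):
    """Likelihood gain of adding candidate z when only z's coefficient is
    optimized (assumed linear Gaussian form of C1)."""
    return np.dot(y, z) ** 2 / (2 * sigma**2 * np.dot(z, z))

def best_candidate(y, candidates, sigma):
    """Pick the potential parent whose profile best matches the ideal profile."""
    return max(candidates, key=lambda name: c1_similarity(y, candidates[name], sigma))

# Toy usage: z1 is aligned with the ideal profile, z2 is orthogonal to it
y = np.array([1.0, 0.0])
candidates = {"z1": np.array([1.0, 0.0]), "z2": np.array([0.0, 1.0])}
print(best_candidate(y, candidates, sigma=1.0))  # -> z1
```

Note C1 depends on z only through its direction and overlap with y, so it can rank all potential parents without fitting any of them.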

Slide 8: Similarity vs. Score
[Figure: scatter plots of the true score against the C1 and C2 similarity measures.]
C2 is more accurate: the effect of keeping the variance fixed, as C1 does, is large. C1 will nevertheless be useful later. Either way, we now have an efficient approximation of the score.

Slide 9: Ideal Parent in Search
Structure search involves:
- O(N^2) add-parent moves
- O(NE) replace-parent moves
- O(E) delete-parent moves
- O(E) reverse-edge moves
The vast majority of evaluations (the adds and replacements) are replaced by the ideal-parent approximation; only K candidates per family are actually optimized and scored.

Slide 10: Gene Expression Experiment
4 gene expression datasets, with 44 (Amino), 89 (Metabolism), and 173 (2x Conditions) variables.
[Figure: test negative log-likelihood and speedup as a function of K, for Amino, Metabolism, Conditions (AA), and Conditions (Met), against the greedy baseline.]
Only 0.4%-3.6% of the candidate changes are fully evaluated; speedup of 1.8-2.7 over greedy search.

Slide 11: Scope
Conditional probability distributions (CPDs) of the form P(X | u1, ..., uk) = N(g(u1, ..., uk; theta), sigma^2): a link function g plus white noise.
General requirement: g(U) must be invertible with respect to each u_i.
Examples: linear Gaussian, chemical reaction, sigmoid Gaussian.

Slide 12: Sigmoid Gaussian CPD
Problem: with a sigmoid link there is no simple closed form for the similarity measures.
[Figure: the likelihoods P(X=0.5|Z) and P(X=0.85|Z) as functions of Z, the sigmoid g(z), and the ideal values Y(0.5) and Y(0.85); the exact likelihood is compared to a linear approximation around Y.]
Solution: linearize the likelihood around the ideal value Y. The sensitivity to Z then depends on the gradient of the link function at each specific instance.

Slide 13: Sigmoid Gaussian CPD (cont.)
[Figure: equi-likelihood potentials over Z for X=0.5 and X=0.85, before and after the gradient correction.]
After the gradient correction, we can use the same similarity measure as in the linear Gaussian case.
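A heavily hedged sketch of the correction, assuming the sigmoid CPD has the form P(X | u) = N(g(theta . u), sigma^2) with g the logistic function: the ideal profile inverts the link, and both profiles are weighted per instance by the link gradient g' = g(1-g) before applying the linear Gaussian similarity.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def corrected_profiles(x, U, theta, z):
    """Assumed form of the gradient correction: the ideal profile is
    y = logit(x) - theta . u (inverting the sigmoid link), and both y and
    the candidate profile z are scaled per instance by the link gradient."""
    t = U @ theta
    y = np.log(x / (1 - x)) - t          # logit(x) - theta . u
    g = sigmoid(t)
    w = g * (1 - g)                      # gradient of the link at each instance
    return w * y, w * z
```

Instances where the sigmoid is saturated get weight near zero, matching the picture above: the likelihood there is insensitive to Z.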

Slide 14: Sigmoid Gene Expression
4 gene expression datasets with 44 (Amino), 89 (Metabolism), and 173 (Conditions) variables.
[Figure: test negative log-likelihood and speedup as a function of K, against the greedy baseline.]
Only 2.2%-6.1% of the moves are fully evaluated; 18-30 times faster than greedy search.

Slide 15: Adding New Hidden Variables
Idea: introduce a hidden parent H for a set of nodes X1, ..., Xk whose ideal profiles Y1, ..., Yk are similar.
For the linear Gaussian case, the likelihood improvement of adding H as a parent of the whole cluster can be bounded using the similarity of each child's ideal profile to H's profile.
Challenge: find the profile h that maximizes this bound.

Slide 16: Scoring a Hidden Parent
For a cluster A, the bound is a Rayleigh quotient: score(h) = h^T (Y Y^T) h / (h^T h), where Y is the matrix whose columns are the ideal profiles of the cluster members and |A| is the size of the cluster.
The maximizing h* must lie in the span of the columns of Y, and is the eigenvector of Y Y^T with the largest eigenvalue.
Finding h* therefore amounts to solving an eigenvector problem.
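Finding h* is then a standard top-eigenvector computation; a minimal sketch (interface assumed):

```python
import numpy as np

def best_hidden_profile(Y):
    """Y: matrix whose columns are the ideal profiles of the cluster members.
    The Rayleigh quotient h^T (Y Y^T) h / (h^T h) is maximized by the
    eigenvector of Y Y^T with the largest eigenvalue."""
    M = Y @ Y.T
    vals, vecs = np.linalg.eigh(M)       # symmetric matrix, eigenvalues ascending
    return vecs[:, -1]                   # eigenvector for the largest eigenvalue
```

For long instance profiles one could equally work with the smaller Gram matrix Y^T Y, since h* lies in the span of Y's columns.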

Slide 17: Finding the Best Cluster
Compute the pairwise scores only once, e.g. score(X1, X2) = 12.35, score(X1, X3) = 14.12, score(X3, X4) = 3.11.
Scores of larger clusters are then computed using these cached quantities.

Slide 18: Finding the Best Cluster (cont.)
Grow the best pair greedily: starting from (X1, X3) with score 14.12, adding X2 gives 18.45, then also adding X4 gives 16.79.
- Select the cluster with the highest score.
- Add the hidden parent and continue with the structure search.
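The greedy cluster growth above can be sketched as follows; the `score` lookup and seed pair are illustrative, reusing the slide's numbers as toy values:

```python
def grow_cluster(nodes, score, seed_pair):
    """Greedy sketch: start from the best pair, repeatedly add the node that
    gives the highest-scoring enlarged cluster, and return the best cluster
    seen along the way.  `score` maps a frozenset of nodes to its bound."""
    cluster = set(seed_pair)
    best, best_score = frozenset(cluster), score(frozenset(cluster))
    remaining = set(nodes) - cluster
    while remaining:
        node = max(remaining, key=lambda n: score(frozenset(cluster | {n})))
        cluster.add(node)
        remaining.remove(node)
        s = score(frozenset(cluster))
        if s > best_score:
            best, best_score = frozenset(cluster), s
    return best, best_score

# Toy usage with the slide's values; unlisted clusters score -inf
scores = {
    frozenset({"X1", "X3"}): 14.12,
    frozenset({"X1", "X2", "X3"}): 18.45,
    frozenset({"X1", "X2", "X3", "X4"}): 16.79,
}
score = lambda c: scores.get(c, float("-inf"))
print(grow_cluster(["X1", "X2", "X3", "X4"], score, ("X1", "X3")))
```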

Slide 19: Bipartite Network
Instances sampled from a biological expert network with 7 (hidden) parents and 141 (observed) children.
[Figure: train and test log-likelihood as a function of the number of instances, for Greedy, Ideal K=2, Ideal K=5, and the gold-standard network.]
Speedup is roughly x10; greedy search takes over 2.5 days!

Slide 20: Summary
- A new method for significantly speeding up structure learning in continuous-variable networks
- Offers a promising time vs. performance tradeoff
- Guided insertion of new hidden variables
Future work:
- Improve cluster identification for the non-linear case
- Explore additional distributions and the relation to GLMs
- Combine the ideal parent approach as a plug-in with other search approaches
