Learning Module Networks. Eran Segal, Stanford University. Joint work with: Dana Pe’er (Hebrew U.), Daphne Koller (Stanford), Aviv Regev (Harvard), Nir Friedman (Hebrew U.)


1 Learning Module Networks Eran Segal Stanford University Joint work with: Dana Pe’er (Hebrew U.) Daphne Koller (Stanford) Aviv Regev (Harvard) Nir Friedman (Hebrew U.)

2 Learning Bayesian Networks
Density estimation: model the data distribution in a population; probabilistic inference: prediction, classification.
Dependency structure: interactions between variables; causality; scientific discovery.
[Diagram: data table and a Bayesian network over INTL, MSFT, MOT, NVLS]

3–5 Stock Market
Learn the dependency of stock prices as a function of: global influencing factors; sector influencing factors; prices of other major stocks.
[Chart: MSFT, DELL, INTL, NVLS, MOT prices, Jan.’02 to Jan.’03, building up to a Bayesian network over the stocks]

6 Stock Market: Fragment of a Learned BN
4411 stocks (variables), 273 trading days (instances), Jan.’02 to Mar.’03.
Problems: statistical robustness; interpretability.

7 Key Observation
Many stocks depend on the same influencing factors in much the same way. Example: Intel, Novellus, Motorola, and Dell all depend on the price of Microsoft.
Many other domains have similar characteristics: gene expression, collaborative filtering, computer network performance, …
[Chart: MSFT, DELL, INTL, NVLS, MOT prices, Jan.’02 to Jan.’03]

8 The Module Network Idea
[Diagram: a Bayesian network over INTL, MSFT, MOT, DELL, AMAT, HPQ with a separate CPD per variable, versus a module network in which Modules I–III each share a single CPD across their variables]

9 Problems and Solutions
Statistical robustness: share parameters and dependencies between variables with similar behavior.
Interpretability: explicit modeling of the modular structure.

10 Outline Module Network Probabilistic model Learning the model Experimental results

11 Module Network Components
Module assignment function:
A(MSFT) = M_I
A(MOT) = A(DELL) = A(INTL) = M_II
A(AMAT) = A(HPQ) = M_III
[Diagram: variables INTL, MSFT, MOT, DELL, AMAT, HPQ grouped into Modules I–III]

12 Module Network Components
Module assignment function.
Set of parents for each module:
Pa(M_I) = ∅
Pa(M_II) = {MSFT}
Pa(M_III) = {DELL, INTL}
[Diagram: Modules I–III with their parent edges]

13 Module Network Components
Module assignment function.
Set of parents for each module.
CPD template for each module.
[Diagram: Modules I–III, each with one shared CPD]
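The three components above can be collected into one small data structure. A minimal Python sketch; the class and method names (`ModuleNetwork`, `module_of`) are illustrative, not from the authors' implementation:

```python
from dataclasses import dataclass, field

@dataclass
class ModuleNetwork:
    assignment: dict  # variable -> module, e.g. "MSFT" -> "I"
    parents: dict     # module -> list of parent variables
    cpd: dict = field(default_factory=dict)  # module -> shared CPD parameters

    def module_of(self, var):
        return self.assignment[var]

# The example from the slides: three modules over six stocks.
mn = ModuleNetwork(
    assignment={"MSFT": "I", "MOT": "II", "DELL": "II", "INTL": "II",
                "AMAT": "III", "HPQ": "III"},
    parents={"I": [], "II": ["MSFT"], "III": ["DELL", "INTL"]},
)
assert mn.module_of("DELL") == "II"
assert mn.parents[mn.module_of("HPQ")] == ["DELL", "INTL"]
```

Note that every variable inherits the parents and the CPD of its module, which is what makes the representation compact.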

14 Ground Bayesian Network
A module network induces a ground BN over X.
A module network defines a coherent probability distribution over X if the ground BN is acyclic.
[Diagram: module network and its ground Bayesian network over INTL, MSFT, MOT, DELL, AMAT, HPQ]
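Grounding can be sketched as giving every variable the parent set of its module; `ground_bn` is a hypothetical helper name, not from the original implementation:

```python
def ground_bn(assignment, parents):
    """Expand a module network into per-variable parent sets (a ground BN)."""
    return {var: list(parents[mod]) for var, mod in assignment.items()}

bn = ground_bn(
    {"MSFT": "I", "MOT": "II", "DELL": "II", "INTL": "II",
     "AMAT": "III", "HPQ": "III"},
    {"I": [], "II": ["MSFT"], "III": ["DELL", "INTL"]},
)
# Every variable in a module gets that module's parents.
assert bn["MOT"] == ["MSFT"] and bn["HPQ"] == ["DELL", "INTL"]
assert bn["MSFT"] == []
```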

15 Module Graph
Nodes correspond to modules.
M_i → M_j if at least one variable in M_i is a parent of M_j.
Theorem: the ground BN is acyclic if the module graph is acyclic.
Acyclicity is checked efficiently on the module graph.
[Diagram: module network and its module graph over M_I, M_II, M_III]
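A sketch of that check: build the module graph from the assignment function and per-module parent sets, then test acyclicity with a topological sort over modules, not variables. All function names are illustrative:

```python
from collections import defaultdict, deque

def module_graph(assignment, parents):
    """Edge M_i -> M_j when some variable assigned to M_i is a parent of M_j."""
    edges = defaultdict(set)
    for mod, pars in parents.items():
        for p in pars:
            edges[assignment[p]].add(mod)
    return edges

def is_acyclic(modules, edges):
    """Kahn's topological sort: acyclic iff every module gets processed."""
    indeg = {m: 0 for m in modules}
    for src, dsts in edges.items():
        for d in dsts:
            indeg[d] += 1
    queue = deque(m for m in modules if indeg[m] == 0)
    seen = 0
    while queue:
        m = queue.popleft()
        seen += 1
        for d in edges.get(m, ()):
            indeg[d] -= 1
            if indeg[d] == 0:
                queue.append(d)
    return seen == len(modules)

assignment = {"MSFT": "I", "DELL": "II", "INTL": "II", "AMAT": "III"}
parents = {"I": [], "II": ["MSFT"], "III": ["DELL", "INTL"]}
assert is_acyclic({"I", "II", "III"}, module_graph(assignment, parents))
# Adding AMAT (in Module III) as a parent of Module II creates a cycle II <-> III:
parents["II"] = ["MSFT", "AMAT"]
assert not is_acyclic({"I", "II", "III"}, module_graph(assignment, parents))
```

The check costs O(modules + edges) regardless of how many variables each module contains, which is the point of the theorem.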

16 Outline Module Network Probabilistic model Learning the model Experimental results

17 Learning Overview
Given data D, find the assignment function A and structure S that maximize the Bayesian score:
score(A, S : D) = log P(D | A, S) + log P(A, S)
where the marginal data likelihood integrates out the parameters, P(D | A, S) = ∫ P(D | A, S, θ) P(θ | A, S) dθ, and P(A, S) is the assignment/structure prior.

18 Likelihood Function
The likelihood function decomposes by modules: parameters θ_{M_I}, θ_{M_II | MSFT}, θ_{M_III | DELL, INTL} are shared within each module and computed from the sufficient statistics of each (variable, parents) pair.
[Diagram: three data instances over INTL, MSFT, MOT, DELL, AMAT, HPQ and the shared per-module parameters]
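The decomposition can be illustrated with a toy shared-CPD likelihood. The Gaussian template below is an illustrative stand-in for the actual CPD template; the point is only that the total log-likelihood splits into independent per-module terms:

```python
import math

def module_loglik(data, assignment, params):
    """Log-likelihood with one shared Gaussian CPD per module."""
    total = 0.0
    for var, values in data.items():
        mu, sigma = params[assignment[var]]  # parameters shared across the module
        for x in values:
            total += -0.5 * math.log(2 * math.pi * sigma ** 2) \
                     - (x - mu) ** 2 / (2 * sigma ** 2)
    return total

data = {"DELL": [1.0, 1.2], "INTL": [0.9, 1.1], "MSFT": [2.0, 2.1]}
assignment = {"DELL": "II", "INTL": "II", "MSFT": "I"}
params = {"I": (2.0, 0.5), "II": (1.0, 0.5)}

# The total decomposes into independent per-module terms:
ll = module_loglik(data, assignment, params)
ll_I = module_loglik({"MSFT": data["MSFT"]}, assignment, params)
ll_II = module_loglik({v: data[v] for v in ("DELL", "INTL")}, assignment, params)
assert abs(ll - (ll_I + ll_II)) < 1e-9
```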

19 Bayesian Score Decomposition
The Bayesian score decomposes by modules, into one local term per module over that module's variables and parents.
Example: deleting the parent INTL from Module III changes only Module III's term.
[Diagram: module network with the edge INTL → Module III deleted]
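In miniature, this is why decomposition makes search cheap: the total score is a sum of per-module terms, so an operator that touches only one module changes one term. Toy scores and hypothetical helper names:

```python
def total_score(module_scores):
    """Total Bayesian score as a sum of per-module local terms."""
    return sum(module_scores.values())

def apply_delete_parent(module_scores, module, rescore):
    """Apply an operator touching one module; only that term is recomputed."""
    new_scores = dict(module_scores)
    new_scores[module] = rescore(module)
    return new_scores

scores = {"I": -10.0, "II": -12.0, "III": -15.0}
new = apply_delete_parent(scores, "III", lambda m: -14.0)
# The total score delta equals the single local delta:
assert total_score(new) - total_score(scores) == 1.0
assert new["I"] == scores["I"] and new["II"] == scores["II"]
```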

20 Bayesian Score Decomposition
The Bayesian score decomposes by modules.
Example: moving MOT between modules (A(MOT) = 2 → A(MOT) = 1) changes only the terms of the two affected modules.
[Diagram: MOT reassigned from Module II to Module I]

21 Algorithm Overview
Find the assignment function A and structure S that maximize the Bayesian score.
[Flowchart: find initial assignment A → improve structure S → improve assignments A, iterated]

22 Initial Assignment Function
Find variables that behave similarly across instances (e.g., MOT, INTL, and DELL track each other across trading days) and assign them to the same module: A(MOT) = A(INTL) = A(DELL) = M_II.
[Diagram: data matrix of variables (stocks) by instances (trading days), with similar rows clustered]
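A minimal stand-in for this initialization: greedily group variables whose value vectors across instances are close. The threshold clustering here is an assumption for illustration, not the authors' clustering procedure:

```python
def initial_assignment(data, threshold):
    """Greedy clustering: data maps variable -> tuple of values across instances."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    reps, assignment = [], {}
    for var, vec in data.items():
        for idx, rep in enumerate(reps):
            if dist(vec, rep) <= threshold:
                assignment[var] = idx  # join an existing module
                break
        else:
            reps.append(vec)           # start a new module
            assignment[var] = len(reps) - 1
    return assignment

data = {"MOT": (1.0, 2.0, 1.5), "INTL": (1.1, 2.1, 1.4),
        "DELL": (0.9, 1.9, 1.6), "MSFT": (5.0, 6.0, 5.5)}
a = initial_assignment(data, threshold=0.5)
assert a["MOT"] == a["INTL"] == a["DELL"]  # similar stocks share a module
assert a["MSFT"] != a["MOT"]
```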

23 Algorithm Overview
Find the assignment function A and structure S that maximize the Bayesian score.
[Flowchart: find initial assignment A → improve structure S → improve assignments A, iterated]

24 Learning Dependency Structure
Heuristic search with operators: add/delete a parent for a module (edges cannot be reversed).
Acyclicity is handled by checking the module graph, which is efficient.
Efficient computation: after applying an operator for module M_j, only the scores of operators for module M_j need updating.
[Diagram: candidate operators, e.g. INTL → Module I rejected as cyclic, INTL → Module III accepted]
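The efficient-update bookkeeping can be sketched as a per-module cache of operator scores, invalidated only for the module that an applied operator touched. `rescore` is a toy stand-in for the Bayesian score delta:

```python
class OperatorCache:
    def __init__(self, rescore):
        self.rescore = rescore
        self.cache = {}       # (module, operator) -> score delta
        self.recomputed = []  # log of rescoring work, for illustration

    def score(self, module, op):
        if (module, op) not in self.cache:
            self.cache[(module, op)] = self.rescore(module, op)
            self.recomputed.append((module, op))
        return self.cache[(module, op)]

    def apply(self, module):
        # An applied operator changes only this module's local score,
        # so only this module's cached operator scores are invalidated.
        self.cache = {k: v for k, v in self.cache.items() if k[0] != module}

cache = OperatorCache(rescore=lambda m, op: 0.1)
cache.score("II", "add MSFT")
cache.score("III", "delete INTL")
cache.apply("III")
cache.score("II", "add MSFT")  # cache hit: no recomputation needed
assert cache.recomputed == [("II", "add MSFT"), ("III", "delete INTL")]
```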

25 Learning Dependency Structure
Structure search is done at the module level: both parent selection (a reduced search space relative to a BN) and acyclicity checking. Individual variables are used only to compute sufficient statistics.

26 Algorithm Overview
Find the assignment function A and structure S that maximize the Bayesian score.
[Flowchart: find initial assignment A → improve structure S → improve assignments A, iterated]

27 Learning Assignment Function
Try A(DELL) = M_I: score 0.7.

28 Learning Assignment Function
A(DELL) = M_I: score 0.7; A(DELL) = M_II: score 0.9.

29 Learning Assignment Function
A(DELL) = M_I: score 0.7; A(DELL) = M_II: score 0.9; A(DELL) = M_III: cyclic, rejected.

30 Learning Assignment Function
DELL is assigned to M_II, the highest-scoring legal module.
[Diagram: DELL moved into Module II]

31 Ideal Algorithm Learn the module assignment of all variables simultaneously

32–33 Problem
Due to acyclicity, assignments cannot be optimized for each variable separately: moving DELL to Module IV and MSFT to Module III may each look legal in isolation, yet together they create a cycle in the module graph.
[Diagram: module network and module graph over M_I–M_IV illustrating the joint cycle]

34 Learning Assignment Function
Sequential update algorithm: iterate over all variables; for each variable, find its optimal assignment given the current assignment of all other variables.
Efficient computation: when changing a variable's assignment from M_i to M_j, only the scores of modules M_i and M_j need recomputing.
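One step of the sequential update, mirroring the DELL example from the earlier slides: try each module, discard moves that would make the module graph cyclic, and keep the best-scoring legal assignment. `score` and `legal` are toy stand-ins for the Bayesian score and the module-graph acyclicity check:

```python
def reassign(var, modules, assignment, score, legal):
    """Assign var to its best-scoring legal module, all else held fixed.
    Assumes at least one legal module exists (the current one always is)."""
    best = max((m for m in modules if legal(var, m)),
               key=lambda m: score(var, m))
    assignment[var] = best
    return best

assignment = {"DELL": "I"}
scores = {"I": 0.7, "II": 0.9, "III": 1.5}
best = reassign("DELL", ["I", "II", "III"], assignment,
                score=lambda v, m: scores[m],
                legal=lambda v, m: m != "III")  # M_III would create a cycle
assert best == "II"               # highest-scoring *legal* module wins
assert assignment["DELL"] == "II"
```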

35 Learning the Model
Initialize the module assignment A.
Optimize the structure S.
Optimize the module assignment A: for each variable, find its optimal assignment given the current assignment of all other variables.
[Diagram: MOT being reassigned between modules during optimization]

36 Related Work
[Table comparing Bayesian networks, parameter sharing, PRMs, OOBNs, and module networks on four axes: shared structure, shared parameters, learning the parameter sharing (addressed for OOBNs by Langseth et al.), and learning the structure; module networks support all four.]

37 Outline Module Network Probabilistic model Learning the model Experimental results Statistical validation Case study: Gene regulation

38 Learning Algorithm Performance
[Plot: Bayesian score (avg. per gene) vs. algorithm iterations (0 to 20), with structure-change iterations marked; the score improves from about -131 to -128]

39 Generalization to Test Data
Synthetic data: 10 modules, 500 variables. Best performance is achieved for models with 10 modules.
[Plot: test data likelihood (per instance) vs. number of modules (0 to 200), for 25, 50, 100, 200, and 500 training instances]

40 Generalization to Test Data
Synthetic data: 10 modules, 500 variables. The gain beyond 100 instances is small.
[Plot: test data likelihood (per instance) vs. number of modules, for 25 to 500 training instances]

41 Structure Recovery
Synthetic data: 10 modules, 500 variables. 74% of 2250 parent-child relationships recovered.
[Plot: recovered structure (% correct) vs. number of modules, for 25 to 500 training instances]

42 Stock Market
4411 variables (stocks), 273 instances (trading days). Comparison to Bayesian networks (cross-validation).
[Plot: test-data log-likelihood gain per instance over the Bayesian network baseline, vs. number of modules (0 to 300)]

43 Regulatory Networks Learn structure of regulatory networks: Which genes are regulated by each regulator

44 Gene Expression Data
Measures the mRNA level of all genes in one condition. Learn the dependency of gene expression as a function of the expression of regulators.
[Heat map: genes by experiments, induced vs. repressed]

45 Gene Expression
2355 variables (genes), 173 instances (arrays). Comparison to Bayesian networks.
[Plot: test-data log-likelihood gain per instance relative to the Bayesian network baseline, vs. number of modules (0 to 500)]

46 Biological Evaluation
Find sets of co-regulated genes (regulatory modules); find the regulators of each module.
[Figure: validation statistics, 46/50 and 30/50]
Segal et al., Nature Genetics, 2003

47 Experimental Design
Hypothesis: regulator X activates process Y. Experiment: knock out X and repeat the experiment.
[Diagram: regression-tree fragment testing HAP4 and Ypl230W, with true/false branches]
Segal et al., Nature Genetics, 2003

48 Differentially Expressed Genes
Ypl230w knockout vs. wild type (0 to 24 hrs.): 341 differentially expressed genes (>16x).
Ppt1 knockout vs. wild type (0 to 60 min.): 602 genes (>4x).
Kin82 knockout vs. wild type (0 to 60 min.): 281 genes (>4x).
Segal et al., Nature Genetics, 2003

49 Biological Experiments Validation
Were the differentially expressed genes predicted as targets? Rank modules by enrichment for differentially expressed genes. All regulators regulate their predicted modules. Segal et al., Nature Genetics, 2003

Ppt1 knockout:
#   Module                                 Significance
14  Ribosomal and phosphate metabolism     8/32, 9e-3
11  Amino acid and purine metabolism       11/53, 1e-2
15  mRNA, rRNA and tRNA processing         9/43, 2e-2
39  Protein folding                        6/23, 2e-2
30  Cell cycle                             7/30, 2e-2

Ypl230w knockout:
#   Module                                 Significance
39  Protein folding                        7/23, 1e-4
29  Cell differentiation                   6/41, 2e-2
5   Glycolysis and folding                 5/37, 4e-2
34  Mitochondrial and protein fate         5/37, 4e-2

Kin82 knockout:
#   Module                                 Significance
3   Energy and osmotic stress I            8/31, 1e-4
2   Energy, osmolarity & cAMP signaling    9/64, 6e-3
15  mRNA, rRNA and tRNA processing         6/43, 2e-2

50 Summary
Probabilistic model for learning modules of variables and their structural dependencies.
Improved performance over Bayesian networks: statistical robustness; interpretability.
Application to gene regulation: reconstruction of many known regulatory modules; prediction of targets for unknown regulators.
