
1 Exploiting Parameter Domain Knowledge for Learning in Bayesian Networks
~ Thesis Defense ~ Stefan Niculescu, Carnegie Mellon University, July 2005
Thesis Committee: Tom Mitchell (Chair), John Lafferty, Andrew Moore, Bharat Rao (Siemens Medical Solutions)

2 Domain Knowledge
In the real world, data is often too sparse to build an accurate model.
Domain knowledge can help alleviate this problem.
Several types of domain knowledge:
– Relevance of variables (feature selection)
– Conditional independences among variables
– Parameter domain knowledge

3 Parameter Domain Knowledge
A Bayesian network for a real-world domain:
– can have a huge number of parameters
– often has too little data to estimate them accurately
Parameter domain knowledge constraints:
– reduce the space of feasible parameters
– reduce the variance of parameter estimates
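The framework slides that follow present the estimation problem only as images; as a minimal sketch of the constrained maximum likelihood problem such constraints induce (my notation, not a transcription of the slides):

    \hat{\theta} = \arg\max_{\theta} \; \log P(D \mid \theta)
    \quad \text{subject to} \quad
    g_i(\theta) = 0 \ (i = 1,\dots,m), \qquad
    h_j(\theta) \le 0 \ (j = 1,\dots,k),

where the equality constraints g_i encode knowledge such as shared or proportional parameters, and the inequality constraints h_j encode knowledge such as bounds on aggregate probabilities.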

4 Parameter Domain Knowledge Examples:
DK: "If a person has a family history of heart attack, Race and Pollution are not significant factors for the probability of getting a heart attack."
DK: "Two voxels in the brain may exhibit the same activation pattern during a cognitive task, but with different amplitudes."
DK: "Two countries may have different heart disease rates, but the relative proportion of Heart Attack to CHF is the same."
DK: "The aggregate probability of adverbs in English is less than the aggregate probability of verbs."

5 Thesis
Standard methods for performing parameter estimation in Bayesian networks can be naturally extended to take advantage of parameter domain knowledge provided by a domain expert. These new learning algorithms perform better (in terms of probability density estimation) than existing ones.

6 Outline
Motivation
→ Parameter Domain Knowledge Framework
Simple Parameter Sharing
Parameter Sharing in Hidden Process Models
Types of Parameter Domain Knowledge
Related Work
Summary / Future Work

7 Parameter Domain Knowledge Framework ~ Domain Knowledge Constraints ~

8 Parameter Domain Knowledge Framework ~ Frequentist Approach, Complete Data ~

9

10 Parameter Domain Knowledge Framework ~ Frequentist Approach, Incomplete Data ~
EM algorithm; repeat until convergence:

11 Parameter Domain Knowledge Framework ~ Frequentist Approach, Incomplete Data ~ ~ Discrete Variables ~
EM algorithm; repeat until convergence:
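The update equations for this slide did not survive the transcript; a minimal sketch of one EM iteration for discrete variables when the M-step is subject to the domain-knowledge constraints (my notation, an assumption about the slide's content):

    \text{E-step:}\quad \tilde{N}_{ijk} = \sum_{d=1}^{N} P\bigl(X_i = k,\ \mathrm{Pa}(X_i) = j \mid x_d,\ \theta^{(t)}\bigr)

    \text{M-step:}\quad \theta^{(t+1)} = \arg\max_{\theta} \sum_{i,j,k} \tilde{N}_{ijk} \log \theta_{ijk}
    \quad \text{subject to the domain-knowledge constraints}

With no constraints the M-step reduces to the usual relative-frequency estimates; with parameter sharing it takes the same form as the shared-count estimates of the next section, applied to the expected counts.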

12 Parameter Domain Knowledge Framework ~ Bayesian Approach ~

13 Parameter Domain Knowledge Framework ~ Bayesian Approach ~

14 Parameter Domain Knowledge Framework ~ Computing the Normalization Constant ~

15 Parameter Domain Knowledge Framework ~ Computing the Normalization Constant ~
Example: in H_7, ε = 0.5 H(2)

16 Outline
Motivation
Parameter Domain Knowledge Framework
→ Simple Parameter Sharing
Parameter Sharing in Hidden Process Models
Types of Parameter Domain Knowledge
Related Work
Summary / Future Work

17 Simple Parameter Sharing ~ Maximum Likelihood Estimators ~
Theorem. The maximum likelihood parameters are given by the formula on the slide (not transcribed; see the sketch below).
Example: a cubical die cut symmetrically at each corner – the 6 square faces share one parameter (k_1 = 6) and the 8 corner faces share another (k_2 = 8); in general a shared parameter appears in a total of k_i places.
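A hedged reconstruction of the untranscribed formula, assuming (as the die example suggests) that group i consists of k_i outcomes constrained to share one probability value θ_i: with N_i the aggregate count observed for that group and N the total number of observations, maximizing the likelihood under the constraint Σ_i k_i θ_i = 1 gives

    \hat{\theta}_i = \frac{N_i}{k_i \, N}.

For the cut die, each of the 6 square faces would receive θ_1 = N_1 / (6N) and each of the 8 corner faces θ_2 = N_2 / (8N).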

18 Simple Parameter Sharing ~ Dependent Dirichlet Priors ~

19 Simple Parameter Sharing ~ Variance Reduction in Parameter Estimates ~

20 Simple Parameter Sharing ~ Experiments – Learning a Probability Distribution ~
Synthetic dataset:
– Probability distribution over 50 values
– 50 randomly generated parameters: 6 shared between 2 and 5 times (together covering about half of the values); the rest "not shared" (shared exactly once)
– 1000 examples sampled from this distribution
– Purpose: domain knowledge is readily available; lets us study the effect of training set size (up to 1000) and compare the estimated distribution to the true distribution
Models:
– STBN (Standard Bayesian Network)
– PDKBN (Bayesian Network with PDK)
A sketch of this kind of experiment follows.
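A minimal Python sketch of this kind of experiment (the group sizes and smoothing constants below are hypothetical, not the exact setup from the slide): sample data from a distribution with shared parameters, estimate it with and without the sharing constraint, and compare KL divergence to the true distribution.

    import numpy as np

    rng = np.random.default_rng(0)

    # True distribution over 50 values, with some groups constrained to share one probability.
    groups = [list(range(i * 3, i * 3 + 3)) for i in range(6)]   # 6 shared groups (hypothetical sizes)
    singles = [[v] for v in range(18, 50)]                        # remaining values, "shared" exactly once
    all_groups = groups + singles
    theta_true = np.zeros(50)
    for g, r in zip(all_groups, rng.random(len(all_groups))):
        theta_true[g] = r / len(g)                                # same value for every member of a group
    theta_true /= theta_true.sum()

    def kl(p, q):
        return float(np.sum(p * np.log(p / q)))

    def estimate(counts, use_sharing):
        n = counts.sum()
        if not use_sharing:
            return (counts + 1e-3) / (n + 50e-3)                  # standard (lightly smoothed) MLE
        est = np.zeros(50)
        for g in all_groups:                                      # shared MLE: theta_i = N_i / (k_i * N)
            est[g] = (counts[g].sum() + 1e-3) / (len(g) * (n + 50e-3))
        return est / est.sum()

    for n_train in [30, 100, 1000]:
        data = rng.choice(50, size=n_train, p=theta_true)
        counts = np.bincount(data, minlength=50).astype(float)
        print(n_train,
              "STBN KL=%.4f" % kl(theta_true, estimate(counts, False)),
              "PDKBN KL=%.4f" % kl(theta_true, estimate(counts, True)))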

21 Experimental Results
The difference between PDKBN and STBN shrinks as the training set grows, but PDKBN is much better when training data is scarce.
PDKBN performs better than STBN; the largest difference is 0.05 in KL (at 30 examples).
On average, STBN needs 1.86 times more examples to catch up in KL:
40 (PDKBN) ~ 103 (STBN); 200 (PDKBN) ~ 516 (STBN); 650 (PDKBN) ~ >1000 (STBN)

22 Outline
Motivation
Parameter Domain Knowledge Framework
Simple Parameter Sharing
→ Parameter Sharing in Hidden Process Models
Types of Parameter Domain Knowledge
Related Work
Summary / Future Work

23 Hidden Process Models
One observation (trial): [equation on slide]
N different trials: [equation on slide]
All trials and all processes have equal length T.
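The model equation on this slide is not in the transcript; one common way to write such a Hidden Process Model (my notation, an assumption consistent with the later slides) is that the observed signal in voxel v at time t is a sum of scaled, time-shifted process signatures plus Gaussian noise:

    y_v(t) = \sum_{p} c_{v,p}\, P_p\bigl(t - t_p\bigr) + \varepsilon_v(t),
    \qquad \varepsilon_v(t) \sim \mathcal{N}(0, \sigma_v^2),

where P_p is the time course of hidden process p, t_p its start time (known here, since a process is assumed to start when its stimulus is presented), and c_{v,p} a voxel-specific amplitude.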

24 Parameter Sharing in HPMs
Similar shape of activity, different amplitudes (X_v).

25 Parameter Sharing in HPMs ~ Maximum Likelihood Estimation ~
l'(P, C) is quadratic in (P, C) jointly, but linear in P alone and linear in C alone!

26 Parameter Sharing in HPMs ~ Maximum Likelihood Estimation ~
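A minimal sketch of how this structure can be exploited (an assumption about the estimation procedure, not a transcription of the slide): because the prediction is linear in the process signatures P for fixed amplitudes C, and linear in C for fixed P, each can be updated in closed form by least squares, alternating until convergence. Process onsets are ignored here for brevity.

    import numpy as np

    def fit_hpm(Y, K, n_iter=50, seed=0):
        """Alternating least-squares sketch: Y (voxels x time) ~ C @ P,
        with C (voxels x K amplitudes) and P (K processes x time).
        All processes are assumed to start at t = 0 in this simplification."""
        rng = np.random.default_rng(seed)
        V, T = Y.shape
        P = rng.standard_normal((K, T))
        for _ in range(n_iter):
            # With P fixed, the model is linear in C: one least-squares solve per voxel.
            C = np.linalg.lstsq(P.T, Y.T, rcond=None)[0].T     # shape (V, K)
            # With C fixed, the model is linear in P: one least-squares solve per time point.
            P = np.linalg.lstsq(C, Y, rcond=None)[0]           # shape (K, T)
        return C, P

    # Tiny usage example with synthetic data.
    rng = np.random.default_rng(1)
    true_P = rng.standard_normal((2, 32))
    true_C = rng.standard_normal((10, 2))
    Y = true_C @ true_P + 0.1 * rng.standard_normal((10, 32))
    C, P = fit_hpm(Y, K=2)
    print("reconstruction error:", np.linalg.norm(Y - C @ P))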

27 Starplus Dataset
Trial: read a sentence, view a picture, answer whether the sentence describes the picture.
40 trials, 32 time slices each (2 per second):
– picture presented first in half of the trials
– sentence first in the other half
Three possible objects: star, dollar, plus. Collected by Just et al.
IDEA: model using HPMs with two processes, "Sentence" and "Picture":
– We assume a process starts when its stimulus is presented
– Use shared HPMs where possible

28 It is true that the star is above the plus?

29

30 (picture stimulus: a plus sign above a star)

31

32 Parameter Sharing in HPMs ~ Hierarchical Partitioning Algorithm ~
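The algorithm itself did not survive the transcript; the following is only a plausible reconstruction consistent with the surrounding slides (the splitting rule, the score, and the way clusters are divided are all my assumptions): start with all voxels in a region sharing one model and recursively split a cluster only when the split improves a penalized fit score.

    import numpy as np

    def shared_fit_score(Y):
        """Toy stand-in score: negative squared error around a single shared time course,
        minus a complexity penalty (a real implementation would use the HPM likelihood)."""
        mean_course = Y.mean(axis=0, keepdims=True)
        err = np.sum((Y - mean_course) ** 2)
        n_params = Y.shape[1]
        return -err - 0.5 * n_params * np.log(Y.size)

    def hierarchical_partition(Y, voxels=None):
        """Recursively split a cluster of voxels if splitting improves the score."""
        if voxels is None:
            voxels = list(range(Y.shape[0]))
        if len(voxels) <= 1:
            return [voxels]
        whole = shared_fit_score(Y[voxels])
        mid = len(voxels) // 2
        left, right = voxels[:mid], voxels[mid:]          # naive split; a real version would split spatially
        split = shared_fit_score(Y[left]) + shared_fit_score(Y[right])
        if split > whole:                                 # split only when it pays off
            return hierarchical_partition(Y, left) + hierarchical_partition(Y, right)
        return [voxels]

    # Tiny usage example: two groups of voxels, each with its own shared time course.
    rng = np.random.default_rng(2)
    Y = np.vstack([np.tile(rng.standard_normal(16), (5, 1)),
                   np.tile(rng.standard_normal(16), (5, 1))])
    Y += 0.05 * rng.standard_normal(Y.shape)
    print(hierarchical_partition(Y))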

33 Parameter Sharing in HPMs ~ Experiments ~
We compare three models, based on average (per-trial) likelihood:
– StHPM – standard, per-voxel HPM
– ShHPM – one HPM for all voxels in an ROI (24 ROIs total)
– HieHPM – hierarchical HPM
Effect of training set size (6 to 40 examples) in CALC:
– ShHPM is biased here: better than StHPM at small sample sizes, worse at 40 examples
– HieHPM is the best; it can represent both of the other models
– HieHPM gives e^106 times better data likelihood than StHPM at 40 examples
– StHPM needs 2.9 times more examples to catch up

34 Parameter Sharing in HPMs ~ Experiments ~
Performance over the whole brain (40 examples):
– HieHPM is the best: e^1792 times better data likelihood than StHPM; better than StHPM in 23/24 ROIs; better than ShHPM in 12/24 ROIs and equal in 11/24
– ShHPM is second best: e^464 times better data likelihood than StHPM; better than StHPM in 18/24 ROIs
– ShHPM is biased, but it makes sense to share whole ROIs that are not involved in the cognitive task

35 Learned Voxel Clusters
In the whole brain: ~300 clusters, ~15 voxels per cluster
In CALC: ~60 clusters, ~5 voxels per cluster

36 Sentence Process in CALC

37 Outline
Motivation
Parameter Domain Knowledge Framework
Simple Parameter Sharing
Parameter Sharing in Hidden Process Models
→ Types of Parameter Domain Knowledge
Related Work
Summary / Future Work

38 Parameter Domain Knowledge Types
DISCRETE:
– Known parameter values
– Parameter sharing and proportionality constants – one distribution
– Sum sharing and ratio sharing – one distribution
– Parameter sharing and hierarchical sharing – multiple distributions
– Sum sharing and ratio sharing – multiple distributions
CONTINUOUS (Gaussian distributions):
– Parameter sharing and proportionality constants – one distribution
– Parameter sharing in Hidden Process Models
INEQUALITY CONSTRAINTS:
– Between sums of parameters – one distribution
– Upper bounds on sums of parameters – one distribution

39 Probability Ratio Sharing
We want to model P(Word | Language) for two languages, English and Spanish, with different sets of words.
Domain knowledge, expressed over word groups (in the figure, T_1 = computer words, T_2 = business words):
– About computers: computer, keyboard, monitor, etc.
– The relative frequency of "computer" to "keyboard" is the same in both languages
– The aggregate mass of a group can be different across languages

40 Probability Ratio Sharing
DK: parameters of a given group (a given color in the figure) preserve their relative ratios across all distributions.
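A minimal way to write this constraint (my notation): for any two words w, w' in the same group T and any two languages l, l',

    \frac{\theta_{w \mid l}}{\theta_{w' \mid l}} = \frac{\theta_{w \mid l'}}{\theta_{w' \mid l'}},

equivalently θ_{w|l} = α_{T,l} · β_w, where α_{T,l} is the aggregate mass of group T in language l (free to differ across languages) and the within-group proportions β_w, with Σ_{w∈T} β_w = 1, are shared across languages.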

41 Proportionality Constants for Gaussians

42 Inequalities between Sums of Parameters
In spoken language:
– Each adverb comes along with a verb
– Each adjective comes with a noun or pronoun
Therefore it is reasonable to expect that:
– The frequency of adverbs is less than that of verbs
– The frequency of adjectives is less than that of nouns and pronouns
Equivalently, and in general within the same distribution, the constraints take the form sketched below.
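The constraint equations on this slide are not in the transcript; a hedged reconstruction of their likely form (my notation):

    \sum_{w \in \mathrm{Adverbs}} \theta_w \;\le\; \sum_{w \in \mathrm{Verbs}} \theta_w,
    \qquad
    \sum_{w \in \mathrm{Adjectives}} \theta_w \;\le\; \sum_{w \in \mathrm{Nouns} \cup \mathrm{Pronouns}} \theta_w,

and, in general, for disjoint index sets A and B within the same distribution, Σ_{i∈A} θ_i ≤ Σ_{j∈B} θ_j.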

43 Outline
Motivation
Parameter Domain Knowledge Framework
Simple Parameter Sharing
Parameter Sharing in Hidden Process Models
Types of Parameter Domain Knowledge
→ Related Work
Summary / Future Work

44 Dirichlet Priors in a Bayes Net
The domain expert specifies an assignment of parameters (a prior belief), which leaves room for some error (the spread / variance).
Several types:
– Standard Dirichlet
– Dirichlet tree priors
– Dependent Dirichlet
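For reference, the conjugate update that makes the standard Dirichlet prior convenient (a standard fact, stated here for context rather than taken from the slide): with a prior Dir(α_1, …, α_K) on a multinomial θ and observed counts N_1, …, N_K (N = Σ_k N_k), the posterior is Dir(α_1 + N_1, …, α_K + N_K), and the MAP estimate is

    \hat{\theta}_k = \frac{N_k + \alpha_k - 1}{N + \sum_{j} \alpha_j - K}.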

45 Markov Models...

46 Module Networks
In a module: same parents, same CPTs.
(Image from "Learning Module Networks" by Eran Segal and Daphne Koller.)

47 Context Specific Independence
(figure: example network with variables Alarm, Set, Burglary)

48 Limitations of Current Models
Dirichlet priors:
– When the number of parameters is huge, specifying a useful prior is difficult
– Unable to enforce even simple constraints: additional hyperparameters are needed to enforce basic parameter sharing, and then no closed form MAP estimates can be computed
– Dependent Dirichlet priors are not conjugate priors; our priors are both dependent and conjugate
Markov Models, Module Networks and CSI:
– Particular cases of our parameter sharing domain knowledge
– Do not allow sharing at the granularity of individual parameters

49 Outline
Motivation
Parameter Domain Knowledge Framework
Simple Parameter Sharing
Parameter Sharing in Hidden Process Models
Types of Parameter Domain Knowledge
Related Work
→ Summary / Future Work

50 Summary
Parameter-related domain knowledge is needed when data is scarce:
– Reduces the number of free parameters
– Reduces the variance in parameter estimates (illustrated on simple parameter sharing)
Developed a unified Parameter Domain Knowledge framework:
– From both a frequentist and a Bayesian point of view
– For both complete and incomplete data
Developed efficient learning algorithms for several types of PDK:
– Closed form solutions for most of these types
– For both discrete and continuous variables
– For both equality and inequality constraints
– Markov Models, Module Networks and Context Specific Independence are particular cases of our parameter sharing framework
Developed a method for automatically learning the domain knowledge (illustrated on HPMs).
Experiments show the superiority of models using PDK.

51 Future Work
Interactions among different types of parameter domain knowledge
Incorporate parameter domain knowledge in structure learning
Hard vs. soft constraints
Parameter domain knowledge for learning undirected graphical models

52 Questions?

53 THE END

