
1 Learning First-Order Probabilistic Models with Combining Rules. Sriraam Natarajan, Prasad Tadepalli, Eric Altendorf, Thomas G. Dietterich, Alan Fern, Angelo Restificar. School of EECS, Oregon State University.

2 First-order Probabilistic Models. Combine the expressiveness of first-order logic with the uncertainty modeling of graphical models. Several formalisms already exist: Probabilistic Relational Models (PRMs), Bayesian Logic Programs (BLPs), Stochastic Logic Programs (SLPs), Relational Bayesian Networks (RBNs), Probabilistic Logic Programs (PLPs), and others. Parameter sharing and quantification allow compact representation. “The project’s difficulty and the project team’s competence influence the project’s success.”

4 Multiple Parents Problem. Often multiple objects are related to an object by the same relationship: one’s friend’s drinking habits influence one’s own; a student’s GPA depends on the grades in the courses he takes; the size of a mosquito population depends on the temperature and the rainfall each day since the last freeze. The target variable in each of these statements has multiple influents (“parents” in Bayes net jargon).

5 Multiple Parents for Population. (Diagram: Rain1, Temp1, Rain2, Temp2, Rain3, Temp3 all point to Population.) ■ Variable number of parents ■ Large number of parents ■ Need for compact parameterization

6 Solution 1: Aggregators. (Diagram: Rain1, Rain2, Rain3 feed a deterministic AverageRain node; Temp1, Temp2, Temp3 feed a deterministic AverageTemp node; the two aggregates stochastically influence Population.) Problem: does not take into account the interaction between related parents Rain and Temp.

7 Solution 2: Combining Rules. (Diagram: Rain1 and Temp1 produce Population1, Rain2 and Temp2 produce Population2, Rain3 and Temp3 produce Population3.) The top 3 distributions share parameters. The 3 distributions are combined into one final distribution for Population.

8 Outline. First-order Conditional Influence Language. Learning the parameters of Combining Rules. Experiments and Results.

9 First-Order Conditional Influence Language. Learning the parameters of Combining Rules. Experiments and Results.

10 First-order Conditional Influence Language (FOCIL). Task and role of a document influence its folder: if {task(t), doc(d), role(d,r,t)} then r.id, t.id Qinf d.folder. The folder of the source of the document influences the folder of the document: if {doc(d1), doc(d2), source(d1,d2)} then d1.folder Qinf d2.folder. The difficulty of the course and the intelligence of the student influence his/her GPA: if {student(s), course(c), takes(s,c)} then s.IQ, c.difficulty Qinf s.gpa.
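
For concreteness, here is a minimal sketch of how such a FOCIL statement could be held in a plain data structure. This is an assumption for illustration only and not part of the original slides; the field names (conditions, influents, target, combining_rule) are made up.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FocilStatement:
    """One 'if {conditions} then influents Qinf target' statement."""
    conditions: List[str]          # literals that must hold (no uncertainty allowed here)
    influents: List[str]           # attributes that probabilistically influence the target
    target: str                    # the influenced attribute
    combining_rule: str = "mean"   # how multiple instances of this statement are combined

# The three statements from the slide, written in this representation:
folder_from_task_role = FocilStatement(
    conditions=["task(t)", "doc(d)", "role(d,r,t)"],
    influents=["r.id", "t.id"],
    target="d.folder")

folder_from_source = FocilStatement(
    conditions=["doc(d1)", "doc(d2)", "source(d1,d2)"],
    influents=["d1.folder"],
    target="d2.folder")

gpa_from_course = FocilStatement(
    conditions=["student(s)", "course(c)", "takes(s,c)"],
    influents=["s.IQ", "c.difficulty"],
    target="s.gpa")
```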

11 Relationship to Other Formalisms. Shares many of the same properties as other statistical relational models. Generalizes path expressions in probabilistic relational models to arbitrary conjunctions of literals. Unlike BLPs, explicitly distinguishes between conditions, which do not allow uncertainty, and influents, which do. Monotonicity relationships can be specified: if {person(p)} then p.age Q+ p.height.

12 Combining Multiple Instances of a Single Statement. If {task(t), doc(d), role(d,r,t)} then t.id, r.id Qinf (Mean) d.folder. (Diagram: the instantiations (t1.id, r1.id) and (t2.id, r2.id) each give a distribution over d.folder, and the Mean combining rule merges them.)

13 A Different FOCIL Statement for the Same Target Variable. If {doc(s), doc(d), source(s,d)} then s.folder Qinf (Mean) d.folder. (Diagram: s1.folder and s2.folder each give a distribution over d.folder, and the Mean combining rule merges them.)

14 Combining Multiple Statements. Weighted Mean { If {task(t), doc(d), role(d,r,t)} then t.id, r.id Qinf (Mean) d.folder. If {doc(s), doc(d), source(s,d)} then s.folder Qinf (Mean) d.folder. }
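
Spelled out as an equation (a reconstruction from the slide text, not a transcribed formula), with rule 1 the task-role statement instantiated m_1 times and rule 2 the source statement instantiated m_2 times:

P(d.folder = y) \;=\; w_1 \cdot \frac{1}{m_1}\sum_{i=1}^{m_1} P_1(y \mid t_i.id,\, r_i.id) \;+\; w_2 \cdot \frac{1}{m_2}\sum_{j=1}^{m_2} P_2(y \mid s_j.folder), \qquad w_1 + w_2 = 1.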

15 “Unrolled” Network for Folder Prediction. (Diagram: the task-role rule applies Mean to the distributions from (t1.id, r1.id) and (t2.id, r2.id); the source rule applies Mean to the distributions from s1.folder and s2.folder; a Weighted Mean combines the two rule-level distributions into the final distribution over d.folder.)

16 First-Order Conditional Influence Language. Learning the parameters of Combining Rules. Experiments and Results.

17 General Unrolled Network. (Diagram: rule 1 has m_1 instances, instance i with inputs X^1_{i,1}, …, X^1_{i,k}; rule 2 likewise has m_2 instances with inputs X^2_{i,1}, …, X^2_{i,k}. Within each rule, the instance distributions are combined by Mean; the two rule-level distributions are then combined by Weighted Mean to give the distribution of Y.)
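
A minimal sketch of how this unrolled network computes the distribution of Y, assuming each rule's shared CPT is given as a function from an instance's inputs to a distribution over Y. The function and variable names here are illustrative, not from the slides:

```python
from typing import Callable, Dict, List, Sequence

Distribution = Dict[str, float]   # value of Y -> probability

def mean(dists: Sequence[Distribution]) -> Distribution:
    """Mean combining rule: average the instance distributions of one rule."""
    values = {v for d in dists for v in d}
    return {v: sum(d.get(v, 0.0) for d in dists) / len(dists) for v in values}

def weighted_mean(dists: Sequence[Distribution], weights: Sequence[float]) -> Distribution:
    """Weighted-mean combining rule: mix the rule-level distributions."""
    values = {v for d in dists for v in d}
    return {v: sum(w * d.get(v, 0.0) for w, d in zip(weights, dists)) for v in values}

def predict_y(rule_cpts: Sequence[Callable[[tuple], Distribution]],
              rule_instances: Sequence[List[tuple]],
              rule_weights: Sequence[float]) -> Distribution:
    """Combine: Mean within each rule over its instances, Weighted Mean across rules."""
    rule_dists = [mean([cpt(x) for x in instances])
                  for cpt, instances in zip(rule_cpts, rule_instances)]
    return weighted_mean(rule_dists, rule_weights)
```

One wrinkle the sketch glosses over: if a rule has no instances on a given example, its distribution should simply be dropped and the remaining rule weights renormalized.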

18 Gradient Descent for Squared Error. Minimize the squared error between the observed values of the target variable and the probabilities predicted by the unrolled network.
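
One plausible form of this objective, reconstructed as an assumption (the slide's equations were not transcribed):

E = \frac{1}{2}\sum_{e}\sum_{y}\bigl(I(y_e = y) - P(y \mid x_e)\bigr)^2, \qquad P(y \mid x_e) = \sum_{r} w_r\,\frac{1}{m_r^{e}}\sum_{i=1}^{m_r^{e}} P_r(y \mid x^{e}_{r,i}),

where I(y_e = y) is 1 if example e has true value y and 0 otherwise, m_r^e is the number of instances of rule r on example e, and P_r is the shared conditional distribution of rule r. The gradient with respect to each CPT entry and each weight then follows by the chain rule.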

19 Gradient Descent for Loglikelihood. Maximize the loglikelihood of the observed values of the target variable under the predicted distribution.
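
Under the same reconstruction (again an assumption, since the slide's equation was not transcribed), the objective and its gradient are

L = \sum_{e} \log P(y_e \mid x_e), \qquad \frac{\partial L}{\partial \theta} = \sum_{e} \frac{1}{P(y_e \mid x_e)}\,\frac{\partial P(y_e \mid x_e)}{\partial \theta}

for any CPT parameter \theta, with P(y \mid x_e) defined as on the previous slide.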

20 Learning the Weights. The weights of the Weighted Mean are learned by gradient descent on the mean squared error or on the loglikelihood.
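
As a hedged sketch of what these updates look like for the weights themselves (assuming the weighted-mean-of-means form of P(y | x_e) above, and ignoring the constraint \sum_r w_r = 1, which can be restored by renormalizing after each step), the loglikelihood gradient for rule r is

\frac{\partial L}{\partial w_r} = \sum_{e} \frac{1}{P(y_e \mid x_e)} \cdot \frac{1}{m_r^{e}} \sum_{i=1}^{m_r^{e}} P_r(y_e \mid x^{e}_{r,i}),

and the mean-squared-error gradient is obtained the same way by applying the chain rule to the squared-error objective.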

21 Expectation-Maximization. (Diagram: the same general unrolled network, viewed as a mixture. Rule 1 with weight w_1 mixes the distributions of its instances, with parameters θ^1_1, …, θ^1_{m_1} and within-rule weight 1/m_1 each; rule 2 with weight w_2 mixes θ^2_1, …, θ^2_{m_2} with weight 1/m_2 each; the result is the distribution of Y.)

22 EM Learning. Expectation step: compute the responsibilities of each instance of each rule. Maximization step: compute the maximum-likelihood parameters using the responsibilities as the counts, where n is the number of examples with 2 or more rules instantiated.
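
A plausible instantiation of both steps, treating the weighted mean of means as a mixture over rules and instances (an assumption; the slide's formulas were not transcribed). E-step, responsibility of instance i of rule r on example e:

\gamma_e(r, i) = \frac{w_r\,\frac{1}{m_r^{e}}\,P_r(y_e \mid x^{e}_{r,i})}{\sum_{r'} w_{r'}\,\frac{1}{m_{r'}^{e}} \sum_{j} P_{r'}(y_e \mid x^{e}_{r',j})}.

M-step: each CPT entry of P_r is re-estimated from the responsibility-weighted counts of its instances, and the rule weights are updated as

w_r \leftarrow \frac{1}{n} \sum_{e} \sum_{i} \gamma_e(r, i),

with n as defined on the slide.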

23 First-Order Conditional Influence Language. Learning the parameters of Combining Rules. Experiments and Results.

24 Experimental Setup. 500 documents, 6 tasks, 2 roles, 11 folders. Each document typically has 1-2 task-role pairs. 25% of documents have a source folder. 10-fold cross validation. Model: Weighted Mean { If {task(t), doc(d), role(d,r,t)} then t.id, r.id Qinf (Mean) d.folder. If {doc(s), doc(d), source(s,d)} then s.folder Qinf (Mean) d.folder. }

25 Folder Prediction Task. Evaluation metric: mean reciprocal rank, where n_i is the number of times the true folder was ranked as i. Propositional classifiers for comparison: decision trees (J48) and Naïve Bayes, with features that count the occurrences of each task-role pair and the source document folder.
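
The formula itself did not survive transcription; a standard form consistent with the slide's definition of n_i is

MRR = \frac{1}{N} \sum_{i} \frac{n_i}{i}, \qquad N = \sum_{i} n_i,

i.e., the average over test documents of one over the rank assigned to the true folder.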

26 Results (number of test documents by the rank of the true folder; MRR = mean reciprocal rank):

Rank    EM      GD-MS   GD-LL   J48     NB
1       349     354     346     351     326
2       107     98      113     100     110
3       22      26      18      28      34
4       15      12      15      6       19
5       6       4       4       6       4
6       0       0       3       0       0
7       1       4       1       2       0
8       0       2       0       0       1
9       0       0       0       6       1
10      0       0       0       0       0
11      0       0       0       0       5
MRR     0.8299  0.8325  0.8274  0.8279  0.797

27 Learning the Weights. Original dataset: the 2nd rule has more weight ⇒ it is more predictive when both rules are applicable. Modified dataset: the folder names of all the sources were randomized ⇒ the 2nd rule is made ineffective ⇒ the weight of the 2nd rule decreases.

                        EM           GD-MS        GD-LL
Original data set
  Weights               ⟨.15, .85⟩   ⟨.22, .78⟩   ⟨.05, .95⟩
  Score                 .8299        .8325        .8274
Modified data set
  Weights               ⟨.9, .1⟩     ⟨.84, .16⟩   ⟨1, 0⟩
  Score                 .7934        .8021        .7939

28 Lessons from Real-world Data. The propositional learners are almost as good as the first-order learners in this domain! The number of parents is 1-2 in this domain, and about ¾ of the time only one rule is applicable, so ranking of probabilities is easy in this case. Accurate modeling of the probabilities is still needed for making predictions that combine with other predictions and for cost-sensitive decision making.

29 Synthetic Data Set. 2 rules with 2 inputs each: W_rule1 = 0.1, W_rule2 = 0.9. Probability that an example matches a rule = 0.5. If an example matches a rule, the number of instances is 3-10. Performance metric: average absolute error in predicted probability.
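
A minimal sketch of a generator for such a data set, under explicit assumptions not stated on the slide: binary inputs and target, made-up CPTs for the two rules, and a uniform choice of the instance count.

```python
import random

# Made-up shared CPTs: P_r(Y=1 | x1, x2) for each rule r; illustrative values only.
CPT = [
    {(0, 0): 0.2, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.9},  # rule 1
    {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.8},  # rule 2
]
RULE_WEIGHTS = [0.1, 0.9]
P_MATCH = 0.5          # probability that an example matches a given rule
MIN_INST, MAX_INST = 3, 10

def make_example(rng: random.Random):
    """Sample instances for each matched rule, then Y from the weighted mean of means."""
    instances = []
    for _ in CPT:
        if rng.random() < P_MATCH:
            k = rng.randint(MIN_INST, MAX_INST)
            instances.append([(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(k)])
        else:
            instances.append([])
    matched = [r for r, inst in enumerate(instances) if inst]
    if not matched:                       # ensure at least one rule applies
        return make_example(rng)
    total_w = sum(RULE_WEIGHTS[r] for r in matched)
    p_y1 = sum(RULE_WEIGHTS[r] / total_w *
               sum(CPT[r][x] for x in instances[r]) / len(instances[r])
               for r in matched)
    y = 1 if rng.random() < p_y1 else 0
    return instances, y, p_y1

rng = random.Random(0)
data = [make_example(rng) for _ in range(1000)]
```

The average absolute error in predicted probability can then be computed by comparing a learner's predicted P(Y=1) against the generating p_y1 returned with each example.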

30 Synthetic Data Set - Results

31 Synthetic Data Set: GD-MS

32 Synthetic Data Set: GD-LL

33 Synthetic Data Set: EM

34 Conclusions. Introduced a general instance of the multiple parents problem in first-order probabilistic languages. Gradient descent and EM successfully learn the parameters of the conditional distributions as well as the parameters of the combining rules (weights). First-order methods significantly outperform propositional methods in modeling the distributions when the number of parents is ≥ 3.

35 Future Work. We plan to extend these results to more general classes of combining rules, develop efficient inference algorithms with combining rules, and develop compelling applications. Combining rules and aggregators: can they both be understood as instances of causal independence?

