Sriraam Natarajan Introduction to Probabilistic Logical Models Slides based on tutorials by Kristian Kersting, James Cussens, Lise Getoor & Pedro Domingos.


Take-Away Message: Learn from rich, highly structured data. Progress to date: a burgeoning research area, "close enough" to the goal, with easy-to-use open-source software available. Lots of challenges/problems remain for the future.

Outline: Introduction, Probabilistic Logic Models, Directed vs Undirected Models, Learning, Conclusion

Introduction, Probabilistic Logic Models, Directed vs Undirected Models, Learning, Conclusion

Motivation: Most learners assume i.i.d. data (independent and identically distributed): one type of object, and objects have no relation to each other. Example task: predict whether an image shows an "eclipse".

Real-World Data (Dramatically Simplified)
(PatientID, Gender, Birthdate): P1, M, 3/22/63
(PatientID, Date, Physician, Symptoms, Diagnosis): P1, 1/1/01, Smith, palpitations, hypoglycemic; P1, 2/1/03, Jones, fever/aches, influenza
(PatientID, Date, Lab Test, Result): P1, 1/1/01, blood glucose, 42; P1, 1/9/01, blood glucose, 45
(PatientID, SNP1, SNP2, …, SNP500K): P1, AA, AB, …, BB; P2, AB, BB, …, AA
(PatientID, Date Prescribed, Date Filled, Physician, Medication, Dose, Duration): P1, 5/17/98, 5/18/98, Jones, prilosec, 10mg, 3 months
The data is non-i.i.d. and multi-relational, with shared parameters. Solution: First-Order Logic / Relational Databases.

The World is inherently Uncertain. Graphical models (here, e.g., a Bayesian network) model uncertainty explicitly by representing the joint distribution. [Figure: a Bayesian network with nodes Influenza, Fever and Ache; nodes are random variables, edges are direct influences.] Note that this is a propositional model!

Logic + Probability = Probabilistic Logic, aka Statistical Relational Learning (SRL). Starting from logical models, add probabilities; starting from probabilistic models, add relations; both paths lead to SRL. Uncertainty in SRL models is captured by probabilities, weights or potential functions.

A (very) Brief History: The term "Probabilistic Logic" was coined by Nilsson in 1986, who considered "probabilistic entailment", i.e., assigning every sentence a probability between 0 and 1. Early work (Halpern, Bacchus and others) focused on representation, not learning. Ngo and Haddawy (1995) gave one of the earlier approaches. Late 90's: OOBN, PRM, PRISM, SLP, etc. '00-'05: a plethora of approaches (representation). Learning methods since '01. Recent thrust: inference (lifted inference techniques).

Several SRL formalisms => Endless Possibilities: Web data (web), Biological data (bio), Social Network Analysis (soc), Bibliographic data (cite), Epidemiological data (epi), Communication data (comm), Customer networks (cust), Collaborative filtering problems (cf), Trust networks (trust), Reinforcement Learning, Natural Language Processing, SAT, …

(Propositional) Logic Program – 1-slide Intro
Clauses: IF burglary and earthquake are true THEN alarm is true
Program: burglary. earthquake. alarm :- burglary, earthquake. marycalls :- alarm. johncalls :- alarm.
In a clause head :- body, the atom before :- is the head and the atoms after it form the body.
Herbrand Base (HB) = all atoms in the program: burglary, earthquake, alarm, marycalls, johncalls
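As a minimal sketch (my own illustration, not part of the original slides), the least Herbrand model of this program can be computed by forward chaining in a few lines of Python:

```python
# Minimal sketch (not from the slides): computing the least Herbrand model
# of the propositional alarm program by forward chaining.
program = [
    ([], "burglary"),                      # facts have an empty body
    ([], "earthquake"),
    (["burglary", "earthquake"], "alarm"), # alarm :- burglary, earthquake.
    (["alarm"], "marycalls"),
    (["alarm"], "johncalls"),
]

model = set()
changed = True
while changed:                             # keep applying clauses until a fixpoint is reached
    changed = False
    for body, head in program:
        if head not in model and all(atom in model for atom in body):
            model.add(head)
            changed = True

print(sorted(model))  # all five atoms of the Herbrand base are derived
```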

Logic Programming (LP): 2 views: 1) Model-Theoretic, 2) Proof-Theoretic

Model Theoretic View
A logic program restricts the set of possible worlds. The five propositions (burglary, earthquake, alarm, marycalls, johncalls) form the Herbrand base; each can be true or false, which specifies the set of possible worlds. An interpretation is a model of a clause C if and only if, whenever the body of C holds, the head holds too.
Program: burglary. earthquake. alarm :- burglary, earthquake. marycalls :- alarm. johncalls :- alarm.
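A small sketch of this view (illustrative Python, not from the slides): enumerate all 2^5 interpretations of the Herbrand base and keep those that are models of every clause; for this program only one world survives:

```python
# Minimal sketch: enumerating the 2^5 interpretations over the Herbrand base
# and keeping only those that satisfy every clause of the alarm program.
from itertools import product

atoms = ["burglary", "earthquake", "alarm", "marycalls", "johncalls"]
clauses = [
    ([], "burglary"), ([], "earthquake"),
    (["burglary", "earthquake"], "alarm"),
    (["alarm"], "marycalls"), (["alarm"], "johncalls"),
]

def is_model(world):
    # A world is a model if, for every clause, a true body implies a true head.
    return all(not all(world[a] for a in body) or world[head]
               for body, head in clauses)

worlds = [dict(zip(atoms, vals)) for vals in product([True, False], repeat=5)]
models = [w for w in worlds if is_model(w)]
print(f"{len(worlds)} possible worlds, {len(models)} of which are models")
```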

Probabilities on Possible Worlds
Specifies a joint distribution P(X1, …, Xn) over a fixed, finite set of random variables {X1, …, Xn}, where each random variable takes a value from its respective domain. This defines a probability distribution over all possible interpretations of burglary, earthquake, alarm, marycalls and johncalls (each true or false).
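A minimal sketch of the idea, with made-up CPT numbers (the slides give none): factor a joint distribution over the five propositions Bayesian-network style, so that every possible world gets a probability:

```python
# Minimal sketch with invented numbers: a joint distribution over the five
# propositions, factored Bayesian-network style, so every interpretation
# (possible world) gets a probability.
from itertools import product

def p(world):
    b, e, a, m, j = (world[k] for k in "beamj")
    pb = 0.01 if b else 0.99
    pe = 0.02 if e else 0.98
    pa_true = {(True, True): 0.95, (True, False): 0.9,
               (False, True): 0.3, (False, False): 0.01}[(b, e)]
    pa = pa_true if a else 1 - pa_true
    pm = (0.7 if a else 0.05) if m else (0.3 if a else 0.95)
    pj = (0.9 if a else 0.1) if j else (0.1 if a else 0.9)
    return pb * pe * pa * pm * pj

worlds = [dict(zip("beamj", v)) for v in product([True, False], repeat=5)]
print(sum(p(w) for w in worlds))                      # sums to 1.0
print(sum(p(w) for w in worlds if w["j"] and w["m"])) # P(johncalls, marycalls)
```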

Proof Theoretic View
A logic program can be used to prove goals that are entailed by the program. For the program burglary. earthquake. alarm :- burglary, earthquake. marycalls :- alarm. johncalls :- alarm., the goal :- johncalls resolves to :- alarm, then to :- burglary, earthquake, then to :- earthquake, and finally to the empty clause {}, so johncalls is proved.
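A minimal Python sketch of this view (my own illustration): a backward-chaining prover that mirrors the derivation above:

```python
# Minimal sketch (not from the slides): proving a goal by backward chaining,
# mirroring the :- johncalls, :- alarm, :- burglary, earthquake derivation.
program = {
    "burglary": [[]], "earthquake": [[]],
    "alarm": [["burglary", "earthquake"]],
    "marycalls": [["alarm"]], "johncalls": [["alarm"]],
}

def prove(goals):
    if not goals:                 # empty goal list = empty clause {} = success
        return True
    first, rest = goals[0], goals[1:]
    return any(prove(body + rest) for body in program.get(first, []))

print(prove(["johncalls"]))       # True: johncalls is entailed by the program
```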

Probabilities on Proofs
Stochastic grammars: each time a rule is applied in a proof, the probability of the rule is multiplied into the overall probability. Useful in NLP for finding the most likely parse tree or the total probability that a particular sentence is derived. SLD trees are used for resolution. Example rules: 1.0 : S → NP, VP; 1/3 : NP → i; 1/3 : NP → Det, N; 1/3 : NP → NP, PP; …
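A tiny sketch with illustrative numbers (the VP rule and the chosen derivation are my assumptions, not from the slide): the probability of a derivation is the product of the labels of the rules it applies:

```python
# Minimal sketch: derivation probability in a stochastic grammar is the
# product of the probabilities of the applied rules. The VP rule below is assumed.
from math import prod

rule_prob = {"S -> NP VP": 1.0, "NP -> i": 1/3, "NP -> Det N": 1/3,
             "NP -> NP PP": 1/3, "VP -> V NP": 0.5}

derivation = ["S -> NP VP", "NP -> i", "VP -> V NP", "NP -> Det N"]
print(prod(rule_prob[r] for r in derivation))   # 1.0 * 1/3 * 0.5 * 1/3
```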

A hierarchy of clausal logics: Propositional Clausal Logic (expressions can be true or false), Relational Clausal Logic (constants and variables refer to objects), Full Clausal Logic (functors aggregate objects).

Introduction, Probabilistic Logic Models, Directed vs Undirected Models, Learning, Conclusion

First-Order/Relational Logic + Probability = PLM: Model-Theoretic vs. Proof-Theoretic, Directed vs. Undirected, Aggregators vs. Combining Rules

Model-Theoretic Approaches

Probabilistic Relational Models – Getoor et al. Combine the advantages of relational logic and Bayesian networks: natural domain modeling (objects, properties, relations), generalization over a variety of situations, and compact, natural probability models. They integrate uncertainty with the relational model: properties of domain entities can depend on properties of related entities. (See Lise Getoor's talk on learning PRMs.)

Relational Schema (example): Professor(Name, Popularity, Teaching-Ability); Course(Name, Instructor, Rating, Difficulty); Student(Name, Intelligence, Ranking); Registration(RegID, Course, Student, Grade, Satisfaction). Primary keys are indicated by a blue rectangle in the figure, and M/1 annotations indicate one-to-many relationships.

Probabilistic Relational Models (example): attributes Student.Intelligence and Student.Ranking, Course.Rating and Course.Difficulty, Professor.Popularity and Professor.Teaching-Ability, Registration.Grade and Registration.Satisfaction. The student's ranking depends on the AVG (average) of his grades; a course's rating depends on the average satisfaction of students in the course. CPTs such as P(Popularity | Teaching-Ability) and P(Satisfaction | Teaching-Ability) range over values L/M/H. Parameters are shared between all the Professors.
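A minimal sketch of the aggregator idea, with invented CPD numbers and a hypothetical 4-point grade scale: the average of a student's grades is computed first, so the same CPD applies no matter how many courses the student took:

```python
# Minimal sketch with invented numbers: a PRM-style aggregator. The parent of
# Student.Ranking is the *average* of the student's registration grades, so the
# CPD stays fixed regardless of the number of courses.
def avg(values):
    return sum(values) / len(values)

# P(Ranking | aggregated grade), with grades on a 4-point scale (assumed).
def p_ranking(ranking, grades):
    a = avg(grades)
    if a >= 3.5:   cpd = {"high": 0.8, "medium": 0.15, "low": 0.05}
    elif a >= 2.5: cpd = {"high": 0.3, "medium": 0.5,  "low": 0.2}
    else:          cpd = {"high": 0.05, "medium": 0.35, "low": 0.6}
    return cpd[ranking]

print(p_ranking("high", [4.0, 3.7, 3.9]))   # three courses
print(p_ranking("high", [4.0, 3.7]))        # same parameters, two courses
```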

Probabilistic Entity-Relationship Models (PERMs) – Heckerman et al. Extend ER models to represent probabilistic relationships. An ER model consists of entity classes, relationships and attributes of the entities. A DAPER model adds directed arcs between attributes, local distributions, and conditions on the arcs. Example: entities Student and Course with relationship Takes; arcs from Student.Intell and Course.Diff to Takes.Grade, with arc conditions Student[Grade] = Student[Intell] and Course[Grade] = Course[Diff].

Bayesian Logic Programs (BLPs)
Example clause: sat(S,L) | student(S), professor(P), course(C), grade(S,C,G), teachingAbility(P,A). Here sat, grade and teachingAbility are predicates, S, P, C, G and A are variables (the arguments), and sat(S,L), grade(S,C,G) and teachingAbility(P,A) are atoms. A CPT over satisfaction values L/M/H given the grade (A/B/C) and the teaching ability is associated with the clause.

Bayesian Logic Programs (BLPs) – Kersting & De Raedt
sat(S,L) | student(S), professor(P), course(C), grade(S,C,G), teachingAbility(P,A)
popularity(P,L) | professor(P), teachingAbility(P,A)
grade(S,C,G) | course(C), student(S), difficultyLevel(C,D)
grade(S,C,G) | student(S), IQ(S,I)
Associated with each clause is a CPT. There can be multiple ground instances of a clause (e.g., one per course), which are combined using combining rules.
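A minimal sketch of a combining rule, with assumed numbers; the mean combining rule shown here is one common choice, not necessarily the one used in the slides:

```python
# Minimal sketch (assumed numbers): a combining rule for BLPs. The clause for
# sat(S, L) fires once per course the student takes, giving one distribution
# over satisfaction per ground instance; a combining rule (here: the mean)
# merges them into a single CPD.
def mean_combining_rule(distributions):
    keys = distributions[0].keys()
    n = len(distributions)
    return {k: sum(d[k] for d in distributions) / n for k in keys}

# One distribution per course instantiation, e.g. from P(sat | grade, ability).
per_course = [
    {"low": 0.1, "medium": 0.3, "high": 0.6},   # course 1
    {"low": 0.5, "medium": 0.3, "high": 0.2},   # course 2
    {"low": 0.2, "medium": 0.4, "high": 0.4},   # course 3
]
print(mean_combining_rule(per_course))
```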

Proof-Theoretic Probabilistic Logic Methods

Probabilistic Proofs – PRISM
Associate a probability label with each fact: a labelled fact p:f means f is true with probability p. Example (blood types): the distribution over a person's blood type, P(Bloodtype = A/B/AB/O), is determined by the distribution over the inherited genes, P(Gene = A/B/O).

Probabilistic Proofs – PRISM (blood-type program)
bloodtype(a) :- (genotype(a,a) ; genotype(a,o) ; genotype(o,a)).
bloodtype(b) :- (genotype(b,b) ; genotype(b,o) ; genotype(o,b)).
bloodtype(o) :- genotype(o,o).
bloodtype(ab) :- (genotype(a,b) ; genotype(b,a)).
genotype(X,Y) :- gene(father,X), gene(mother,Y).
(0.4) gene(P,a)   (0.4) gene(P,b)   (0.2) gene(P,o)
A child's genotype is given by the gene inherited from each parent P; the probabilities are attached to the gene facts.

PRISM
Logic programs with probabilities attached to facts. Clauses have no probability labels: they are always true (probability 1). Switches are used to sample the facts, i.e., the facts are generated at random during program execution. Probability distributions are defined on the proofs of the program given the switches.
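As a minimal sketch (a Python re-implementation of the blood-type program above, using the switch probabilities from the slide), the distribution over blood types can be obtained by enumerating the father/mother gene choices:

```python
# Minimal sketch: the PRISM blood-type example evaluated exactly by
# enumerating the father/mother gene choices.
from itertools import product
from collections import defaultdict

gene = {"a": 0.4, "b": 0.4, "o": 0.2}    # (0.4) gene(P,a), etc.

def bloodtype(x, y):                      # genotype (x, y) -> phenotype
    if {x, y} == {"o"}:        return "o"
    if {x, y} <= {"a", "o"}:   return "a"
    if {x, y} <= {"b", "o"}:   return "b"
    return "ab"

dist = defaultdict(float)
for x, y in product(gene, repeat=2):      # father's gene, mother's gene
    dist[bloodtype(x, y)] += gene[x] * gene[y]

print(dict(dist))   # e.g. P(bloodtype = a) = 0.4*0.4 + 2*0.4*0.2 = 0.32
```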

Probabilistic Proofs – Stochastic Logic Programs (SLPs)
Similar to stochastic grammars: attach probability labels to clauses. Some refutations fail at the clause level; normalization is used to account for these failures.

Example SLP:
0.4: s(X) :- p(X), p(X).   0.6: s(X) :- q(X).
0.3: p(a).   0.7: p(b).   0.2: q(a).   0.8: q(b).
SLD tree for the goal :- s(X): the first clause (0.4) leads to :- p(X), p(X), which succeeds for X/a (0.3 * 0.3) or X/b (0.7 * 0.7) and fails on the mixed choices (0.3 * 0.7 and 0.7 * 0.3); the second clause (0.6) leads to :- q(X), which succeeds for X/a (0.2) or X/b (0.8).

Normalizing by the probability mass of the non-failing derivations (1 - 0.4*0.3*0.7 - 0.4*0.7*0.3 = 0.832):
P(s(a)) = (0.4*0.3*0.3 + 0.6*0.2) / 0.832 = 0.1875
P(s(b)) = (0.4*0.7*0.7 + 0.6*0.8) / 0.832 = 0.8125
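A small Python sketch that reproduces these numbers by enumerating the derivations, multiplying the labels along each one, discarding failures, and renormalizing:

```python
# Minimal sketch: checking the SLP numbers above by enumerating derivations.
clause = {"s1": 0.4, "s2": 0.6}        # 0.4: s(X):-p(X),p(X).  0.6: s(X):-q(X).
p_fact = {"a": 0.3, "b": 0.7}          # 0.3: p(a).  0.7: p(b).
q_fact = {"a": 0.2, "b": 0.8}          # 0.2: q(a).  0.8: q(b).

derivations = []                        # (answer or None if failed, probability)
for x1, px1 in p_fact.items():          # first clause: resolve p(X), then p(X) again
    for x2, px2 in p_fact.items():
        answer = x1 if x1 == x2 else None   # mismatched choices fail
        derivations.append((answer, clause["s1"] * px1 * px2))
for x, qx in q_fact.items():            # second clause
    derivations.append((x, clause["s2"] * qx))

z = sum(p for ans, p in derivations if ans is not None)   # 0.832
for val in ("a", "b"):
    print(val, sum(p for ans, p in derivations if ans == val) / z)
# a 0.1875, b 0.8125
```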

Directed Models vs. Undirected Models: a directed edge Parent -> Child carries a conditional distribution P(Child | Parent); an undirected edge between Friend1 and Friend2 carries a potential function φ(Friend1, Friend2).

Undirected Probabilistic Logic Models: upgrade undirected propositional models to the relational setting. Markov Nets → Markov Logic Networks; Markov Random Fields → Relational Markov Nets; Conditional Random Fields → Relational CRFs.

Markov Logic Networks (Richardson & Domingos)
Soften logical clauses: a first-order clause is a hard constraint on the world; soften the constraints so that when a constraint is violated the world becomes less probable, not impossible. A higher weight means a stronger constraint; infinite weights recover pure first-order logic.
Probability(World S) = (1/Z) exp{ Σ_i weight_i × numberTimesTrue(f_i, S) }

Example: Friends & Smokers. With two constants, Anna (A) and Bob (B), the ground network contains the nodes Smokes(A), Smokes(B), Cancer(A), Cancer(B), Friends(A,A), Friends(A,B), Friends(B,A) and Friends(B,B).
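A minimal sketch of the MLN semantics on this example; the two formulas and their weights below are the usual textbook Friends & Smokers rules, assumed here since the slide does not list them:

```python
# Minimal sketch: the unnormalized weight exp(sum_i w_i * n_i(S)) of one world
# of the two-constant Friends & Smokers MLN. The formulas and weights are
# assumed (standard textbook example):
#   1.5  Smokes(x) => Cancer(x)
#   1.1  Friends(x,y) ^ Smokes(x) => Smokes(y)
import math
from itertools import product

consts = ["A", "B"]
world = {                                   # one particular truth assignment
    ("Smokes", "A"): True,  ("Smokes", "B"): False,
    ("Cancer", "A"): False, ("Cancer", "B"): False,
    **{("Friends", x, y): (x != y) for x, y in product(consts, repeat=2)},
}

def n_smoking_causes_cancer(w):             # true groundings of Smokes(x) => Cancer(x)
    return sum((not w[("Smokes", x)]) or w[("Cancer", x)] for x in consts)

def n_friends_smoke_alike(w):               # true groundings of Friends(x,y) ^ Smokes(x) => Smokes(y)
    return sum((not (w[("Friends", x, y)] and w[("Smokes", x)])) or w[("Smokes", y)]
               for x, y in product(consts, repeat=2))

weight = 1.5 * n_smoking_causes_cancer(world) + 1.1 * n_friends_smoke_alike(world)
print(math.exp(weight))   # unnormalized; divide by Z (sum over all worlds) for a probability
```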

Plethora of Approaches
Relational Bayes Nets: model the distribution over relationships. Bayesian Logic: handles "identity" uncertainty. Relational Probability Trees: extend decision trees to the logical setting. Relational Dependency Networks: extend DNs to the logical setting. CLP-BN: integrates Bayesian networks with constraint logic programming.

Multiple Parents Problem
Often multiple objects are related to an object by the same relationship: one's friends' drinking habits influence one's own; a student's GPA depends on the grades in the courses he takes; the size of a mosquito population depends on the temperature and the rainfall each day since the last freeze. The resultant variable in each of these statements has multiple influences ("parents" in Bayes-net jargon).

Multiple Parents for "population": the Population variable has parents Rain1, Temp1, Rain2, Temp2, Rain3, Temp3, … ■ Variable number of parents ■ Large number of parents ■ Need for compact parameterization

Solution 1: Aggregators – PRM, RDN, PRL, etc. The daily Rain1, Temp1, Rain2, Temp2, Rain3, Temp3 values are first combined deterministically into AverageRain and AverageTemp, which then stochastically influence Population.

Solution 2: Combining Rules – BLP, RBN, LBN, etc. Each (Rain_i, Temp_i) pair stochastically produces its own intermediate prediction Population_i, and the resulting Population1, Population2, Population3 distributions are combined into the final Population distribution by a combining rule.
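A minimal sketch contrasting the two solutions on a toy version of the population example (the per-day model, the numbers and the noisy-or choice are all assumptions for illustration):

```python
# Minimal sketch with made-up probabilities: two ways of handling the variable
# number of (Rain_i, Temp_i) parents of a binary "population boom" variable.
import math

days = [(2.0, 25.0), (0.0, 31.0), (5.5, 22.0)]      # (rain in mm, temp in C) per day

def p_boom_given(rain, temp):                        # toy per-day conditional model
    return 1 / (1 + math.exp(-(0.3 * rain + 0.1 * (temp - 25))))

# Solution 1: aggregators -- summarize the parents deterministically (averages),
# then apply the conditional model once to the aggregated values.
avg_rain = sum(r for r, _ in days) / len(days)
avg_temp = sum(t for _, t in days) / len(days)
p_aggregated = p_boom_given(avg_rain, avg_temp)

# Solution 2: combining rules -- apply the model once per day (one clause
# instance each) and combine the resulting distributions, here with noisy-or.
p_combined = 1 - math.prod(1 - p_boom_given(r, t) for r, t in days)

print(p_aggregated, p_combined)                      # the two solutions differ
```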

Introduction, Probabilistic Logic Models, Directed vs Undirected Models, Learning, Conclusion

Learning
Parameter Learning: where do the numbers come from? Structure Learning: neither the logic program nor the model structure is fixed.
Evidence: Model-theoretic approaches learn from interpretations, e.g., {burglary = false, earthquake = true, alarm = ?, johncalls = ?, marycalls = true}; proof-theoretic approaches learn from entailment.

Parameter Estimation
Given a set of examples E and a logic program L, the goal is to compute the parameter values λ* that best explain the data. MLE: λ* = argmax_λ P(E | L, λ), or equivalently the log-likelihood argmax_λ log P(E | L, λ). With fully observed data, MLE reduces to frequency counting. With missing data, the Expectation-Maximization (EM) algorithm is used: the E-step computes a distribution over all possible completions of each partially observed data case, and the M-step computes updated parameter values by frequency counting.

Parameter Estimation – Model Theoretic
The given data and the current model induce a Bayesian network, and the parameters are then estimated: the E-step determines the distribution of values for unobserved states, and the M-step produces improved estimates of the parameters of each node. Parameters are identical for different ground instances of the same clause, and aggregators or combining rules are taken into account.
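A minimal sketch of the fully observed case, with invented data: maximum-likelihood estimation is frequency counting, with the counts pooled over all ground instances of a clause because the parameters are tied:

```python
# Minimal sketch with invented data: maximum-likelihood estimation by frequency
# counting, pooling counts over all ground instances of the same clause
# (the parameters are shared across instances).
from collections import Counter

# Observed (teachingAbility, satisfaction) pairs from many ground instances
# of the sat(S,L) clause, across different students, courses and professors.
data = [("high", "high"), ("high", "high"), ("high", "low"),
        ("low", "low"), ("low", "low"), ("low", "high")]

joint = Counter(data)
parent = Counter(a for a, _ in data)
cpt = {(a, s): joint[(a, s)] / parent[a] for a, s in joint}
print(cpt)   # P(satisfaction | teachingAbility), shared by every ground instance
# With partially observed worlds, EM would replace these counts by the expected
# counts computed in the E-step, then redo this M-step.
```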

Parameter Estimation – Proof Theoretic
Based on refutations and failures, under the assumption that the examples are logically entailed by the program. Parameters are estimated by computing the SLD tree for each example; each path from root to leaf is one possible computation, and the completions are weighted with the product of the probabilities associated with the clauses/facts used. Improved estimates are then obtained iteratively.

Introduction, Probabilistic Logic Models, Directed vs Undirected Models, Learning, Conclusion

Taxonomy of probabilistic logic (PL) approaches: distributional semantics vs. constraint based, and model theoretic vs. proof theoretic, with model-theoretic approaches further split into directed and undirected. Representative systems include RBN, BLP and PRM (directed), ML, RPT and MRF (undirected), and PHA, PRISM and SLP (proof theoretic).

Summary (Type / Direction / Multiple-Parents / Inference / Pitfalls):
SLP: Proof Theoretic / – / Multiple paths / SLD trees / Structure learning unexplored, simple models
PRISM: Proof Theoretic / – / Multiple paths / Proof trees / Structure learning unexplored, simple models
PRM: Model Theoretic / Directed / Aggregators / Unrolling to a BN / Slot chains are binary, no implementation
BLP: Model Theoretic / Directed / Combining rules / And/Or tree (BN) / Limitations of directed models
ML: Model Theoretic / Undirected / Counts of the instantiations / Mainly sampling / Inference is hard, representation is too general