STATISTICAL RELATIONAL LEARNING
Joint work with Sriraam Natarajan, Kristian Kersting, Jude Shavlik

BAYESIAN NETWORKS
[Figure: the classic alarm network. Burglary and Earthquake are parents of Alarm, which is the parent of JohnCalls and MaryCalls; each node carries a conditional probability table.]
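The CPT numbers on this slide did not survive the transcript, so here is a minimal sketch in plain Python using the textbook parameters for this network (an assumption, not taken from the slide). It shows how the factorization P(B)P(E)P(A|B,E)P(J|A)P(M|A) answers a query by enumeration.

```python
from itertools import product

# Assumed (textbook) CPTs for the alarm network; not taken from the slide.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,      # P(Alarm | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                       # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}                       # P(MaryCalls | Alarm)

def joint(b, e, a, j, m):
    """Joint probability via the network factorization P(B)P(E)P(A|B,E)P(J|A)P(M|A)."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# Query by enumeration: P(Burglary | JohnCalls = true, MaryCalls = true)
num = sum(joint(True, e, a, True, True) for e, a in product([True, False], repeat=2))
den = sum(joint(b, e, a, True, True) for b, e, a in product([True, False], repeat=3))
print(num / den)   # roughly 0.28 with these CPTs
```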

BAYESIAN NETWORK FOR A CITY
[Figure: one copy of the Burglary/Earthquake/Alarm network per house H1 through H5, each with Calls nodes for its neighboring houses, Calls(H1) through Calls(H6).]

SHARED VARIABLES
[Figure: the per-house networks share a single Earthquake(BL) node; each house Hi keeps its own Burglary(Hi) and Alarm(Hi) nodes, which feed Calls(H1) through Calls(H5).]

FIRST ORDER LOGIC
Predicates: Burglary(house), Earthquake(city), Alarm(house), Calls(nhouse), HouseInCity(house, city), Neighbor(house, nhouse)
Rule: Alarm(house) :- HouseInCity(house, city), Earthquake(city), Burglary(house)
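To make the jump from the rule to the replicated networks concrete, here is a small sketch in plain Python; the houses, city, and neighbor facts below are invented for illustration. Grounding the rule produces one propositional Alarm node per house, all sharing the city's Earthquake node.

```python
# Hypothetical toy facts (not from the slide) used to ground the rule
# Alarm(house) :- HouseInCity(house, city), Earthquake(city), Burglary(house)
houses = ["H1", "H2", "H3"]
city_of = {"H1": "BL", "H2": "BL", "H3": "BL"}                        # HouseInCity facts
neighbors = [("H1", "H2"), ("H2", "H1"), ("H2", "H3"), ("H3", "H2")]  # Neighbor(house, nhouse)

parents = {}
for h in houses:
    # Each ground Alarm(h) depends on the shared Earthquake node and its own Burglary node.
    parents[f"Alarm({h})"] = [f"Earthquake({city_of[h]})", f"Burglary({h})"]
for h, nh in neighbors:
    # Calls(nhouse) depends on the alarm of the house it neighbors.
    parents.setdefault(f"Calls({nh})", []).append(f"Alarm({h})")

for atom, pa in sorted(parents.items()):
    print(atom, "<-", pa)
```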

LOGIC + PROBABILITY = STATISTICAL RELATIONAL LEARNING MODELS
[Figure: adding probabilities to logic, or adding relations to probabilistic graphical models, yields Statistical Relational Learning (SRL).]

ALPHABETIC SOUP
- Knowledge-based model construction [Wellman et al., 1992]
- PRISM [Sato & Kameya, 1997]
- Stochastic logic programs [Muggleton, 1996]
- Probabilistic relational models [Friedman et al., 1999]
- Bayesian logic programs [Kersting & De Raedt, 2001]
- Bayesian logic [Milch et al., 2005]
- Markov logic [Richardson & Domingos, 2006]
- Relational dependency networks [Neville & Jensen, 2007]
- ProbLog [De Raedt et al., 2007]
And many others!

RELATIONAL DATABASE
[Figure: schema with tables Prof(Level), ProfCourse(Rating), Course(Diff), StudentCourse(Grade), Student(IQ, Satisfaction).]

FIRST ORDER LOGIC
Predicates for the schema: Prof(P), Level(P,L), Diff(C), Course(C), taughtBy(P,C), ratings(P,C,R), Student(S), IQ(S,I), satis(S,B), takes(S,C), grade(S,C,G)

GRAPHICAL MODEL
[Figure: grades(S, C, G) and Diff(S, C, D) are aggregated into avgGrade(S, G) and avgDiff(S, D), which are the parents of satisfaction(S, B).]
P(satisfaction(S, B) | avgGrade(S, G), avgDiff(S, D))
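As a concrete illustration of the aggregation step, here is a small Python sketch; the grades, difficulties, weights, and the logistic form of the conditional are all assumptions for illustration. A student's courses are aggregated into avgGrade and avgDiff, which then parameterize P(satisfaction | avgGrade, avgDiff).

```python
import math

# Hypothetical relational facts (not from the slide)
grades = {("sue", "cs101"): 3.7, ("sue", "cs202"): 3.0}   # grade(S, C, G)
diff   = {"cs101": 2.0, "cs202": 4.0}                     # diff(C, D)

def aggregates(student):
    """avgGrade(S, G) and avgDiff(S, D) computed from the student's courses."""
    courses = [c for (s, c) in grades if s == student]
    avg_grade = sum(grades[(student, c)] for c in courses) / len(courses)
    avg_diff = sum(diff[c] for c in courses) / len(courses)
    return avg_grade, avg_diff

def p_satisfied(student, w_grade=1.2, w_diff=-0.8, bias=0.0):
    """One possible conditional: a logistic function of the two aggregates."""
    g, d = aggregates(student)
    return 1.0 / (1.0 + math.exp(-(bias + w_grade * g + w_diff * d)))

print(p_satisfied("sue"))
```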

RELATIONAL DECISION TREE
Tree for predicting Fine(X):
  speed(X,S), S>120?
    no:  N
    yes: job(X, politician)?
      yes: N
      no:  knows(X,Y)?
        no:  Y
        yes: job(Y, politician)?
          yes: N
          no:  Y
Training data (Name, Speed, Job, Fine):
  Bob, 120, Teacher, N
  Alice, 150, Writer, N
  John, 180, Politician, N
  Mary, 160, Student, Y
  Mike, 140, Engineer, Y
knows(Person1, Person2): (Alice, John), (Mary, Mike), (Mary, Alice), (Bob, Mike), (Bob, Mary)

RELATIONAL DECISION TREE (classifying Alice, step 1)
The root test is instantiated to speed(Alice, 150), 150 > 120, which succeeds.

RELATIONAL DECISION TREE (step 2)
The next test becomes job(Alice, politician), which fails, since Alice is a writer.

RELATIONAL DECISION TREE (step 3)
The test knows(Alice, Y) succeeds with the binding Y = John: knows(Alice, John).

RELATIONAL DECISION TREE (step 4)
The final test job(John, politician) succeeds, so Alice reaches the leaf labeled N: she is not fined, matching her entry in the table.
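The walkthrough above can be written directly as a small Python function over the slide's toy facts (a sketch; the function and variable names are mine, and the leaf semantics follow the tree as reconstructed earlier):

```python
speed = {"Bob": 120, "Alice": 150, "John": 180, "Mary": 160, "Mike": 140}
job = {"Bob": "teacher", "Alice": "writer", "John": "politician",
       "Mary": "student", "Mike": "engineer"}
knows = {("Alice", "John"), ("Mary", "Mike"), ("Mary", "Alice"),
         ("Bob", "Mike"), ("Bob", "Mary")}

def fine(x):
    """Traverse the relational decision tree for person x and return Y or N."""
    if not speed[x] > 120:
        return "N"
    if job[x] == "politician":
        return "N"
    acquaintances = [y for (a, y) in knows if a == x]
    if not acquaintances:                     # knows(X, Y) fails
        return "Y"
    if any(job[y] == "politician" for y in acquaintances):
        return "N"                            # some binding of Y is a politician
    return "Y"

for person in speed:
    print(person, fine(person))   # matches the Fine column of the table
```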

RELATIONAL PROBABILITY TREES
- Use probabilities at the leaves
- Can be used to represent conditional distributions
- Can use regression values at the leaves to represent regression functions
[Figure: the same speed/job/knows tree as above, with values at the leaves instead of class labels.]
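A relational probability tree only changes what is stored at the leaves. As a sketch (the leaf probabilities below are illustrative, not from the slide), the same tree can return P(Fine(x) = Y) instead of a hard label:

```python
def p_fine(x, speed, job, knows):
    """Same tests as the decision tree above; leaves hold assumed probabilities."""
    if not speed[x] > 120:
        return 0.05
    if job[x] == "politician":
        return 0.10
    acquaintances = [y for (a, y) in knows if a == x]
    if not acquaintances:
        return 0.90
    return 0.15 if any(job[y] == "politician" for y in acquaintances) else 0.80
```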

STRUCTURE LEARNING PROBLEM
Learn the structure of the conditional distributions: find the parents and the distribution for the target concept.
[Figure: candidate parents avgGrade(S, G), avgDiff(S, D), IQ(S, I), and level(P, L) for the target satisfaction(S, B).]

RELATIONAL TREE LEARNING
[Figure: learning a relational regression tree over examples x1 (Δ = 0.7), x2 (Δ = -0.2), x3 (Δ = -0.9), with facts paper(x1, y1), paper(x1, y2), paper(x3, y1) and student(x1), student(x2). Candidate tests such as paper(X, Y), student(X), and adviser(X) partition the examples, e.g. student(X) = T gives {x1, x2} and student(X) = F gives {x3}.]
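One way to see the split decision in the figure is to score each candidate test by how well it separates the examples' regression values. The sketch below (plain Python; scoring by squared error around each partition's mean is an assumed but common choice) uses the toy gradients and facts from the slide:

```python
gradients = {"x1": 0.7, "x2": -0.2, "x3": -0.9}
facts = {
    "student(X)": {"x1", "x2"},
    "paper(X,Y)": {"x1", "x3"},   # x has at least one paper(x, Y) fact
    "adviser(X)": set(),          # no adviser facts in the toy data
}

def sse(values):
    """Squared error of values around their mean (0 for an empty partition)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def score(test):
    true_part = [g for x, g in gradients.items() if x in facts[test]]
    false_part = [g for x, g in gradients.items() if x not in facts[test]]
    return sse(true_part) + sse(false_part)

scores = {t: round(score(t), 3) for t in facts}
print(scores, "-> split on", min(scores, key=scores.get))   # student(X) wins here
```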

FUNCTIONAL GRADIENT BOOSTING
Sequentially learn models where each subsequent model corrects the previous one.
[Figure: the current model's predictions are subtracted from the data to give residues; a regression tree ψ_m is induced on the residues and added to the model; iterating yields the final model as a sum of trees.]
Natarajan et al., MLJ 2012
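For the log-likelihood objective used in this line of work, the pointwise functional gradient of an example takes the simple form I(y = 1) - P(y = 1 | current model); these gradients are the residues that the next regression tree fits. A minimal sketch (plain Python, hypothetical example names):

```python
import math

def prob(psi_sum):
    """Sigmoid of the summed regression values F(x) = sum_m psi_m(x)."""
    return 1.0 / (1.0 + math.exp(-psi_sum))

def gradients(examples, current_model):
    """examples: list of (x, y) with y in {0, 1};
    current_model: callable x -> F_{m-1}(x), the summed regression value."""
    return [(x, y - prob(current_model(x))) for x, y in examples]

# With a flat initial model F_0(x) = 0, every example gets P = 0.5,
# so positives get gradient +0.5 and negatives get -0.5.
print(gradients([("advisedBy(a, b)", 1), ("advisedBy(a, c)", 0)], lambda x: 0.0))
```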

BOOSTING ALGORITHM
For each gradient step m = 1 to M:
  For each query predicate P:
    Generate a training set using the previous model F_{m-1}:
      For each example x:
        Compute the gradient for x
        Add (x, gradient) to the training set
    Learn a regression function T_{m,P} on the training set
    Add T_{m,P} to the model, giving F_m
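The loop above can be sketched as a short Python skeleton (the regression-tree learner is passed in as a hypothetical stand-in; a real implementation would induce relational regression trees as in the earlier slides):

```python
import math

def boost(examples_by_predicate, fit_regression_tree, M=10):
    """examples_by_predicate: {predicate: [(x, y), ...]} with y in {0, 1}.
    fit_regression_tree: fits (x, gradient) pairs, returns a callable x -> value."""
    models = {p: [] for p in examples_by_predicate}            # F_0: empty sum of trees

    def F(p, x):
        return sum(tree(x) for tree in models[p])

    def P(p, x):
        return 1.0 / (1.0 + math.exp(-F(p, x)))

    for m in range(M):                                         # gradient steps
        for p, examples in examples_by_predicate.items():      # query predicates
            trainset = [(x, y - P(p, x)) for x, y in examples] # gradients as residues
            models[p].append(fit_regression_tree(trainset))    # T_{m,p} added to F_m
    return models
```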

UW-CSE
Task: predict the advisedBy relation, given student, professor, courseTA, courseProf, etc. relations; 5-fold cross-validation.
[Table: AUC-ROC, AUC-PR, likelihood, and training time for Boosting, RDN, and Alchemy; the numeric entries are not recoverable from this transcript, apart from training times in seconds for Boosting and RDN versus hours for Alchemy.]

CARDIA
- Family history, medical history, physical activity, nutrient intake, obesity questions, psychosocial factors, pulmonary function, etc.
- Goal: identify risk factors in early adulthood that cause serious cardiovascular issues in older adults.
- Extremely rich dataset with 25 years of information.
S. Natarajan, J. Carr

RESULTS

IMITATION LEARNING
An expert agent performs actions (trajectories). Goal: learn a policy from these trajectories that suggests actions based on the current state.
Natarajan et al., IJCAI 2011

[Figure: the Gridworld and Robocup evaluation domains.]

ALZHEIMER'S RESEARCH
- AD: a progressive neurodegenerative condition resulting in loss of cognitive abilities and memory.
- MRI: a neuroimaging method for visualizing brain anatomy.
- Humans are not very good at identifying people with AD, especially before cognitive decline.
- MRI data is a major source for distinguishing AD vs. CN (cognitively normal) or MCI vs. CN.
Natarajan et al., under review

PROPOSITIONAL MODELS (WITH AAL)

CONCLUSION
- Statistical Relational Learning combines first-order logic with probabilistic models.
- Relational trees can be used to represent conditional distributions.
- Boosted relational trees can be used to efficiently learn the structure of SRL models.