STATISTICAL RELATIONAL LEARNING
Joint work with Sriraam Natarajan, Kristian Kersting, Jude Shavlik
BAYESIAN NETWORKS
Nodes: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
[Figure: probability tables labeled e, b, a; the only surviving value is 0.01]
BAYESIAN NETWORK FOR A CITY
Houses: H1, H2, H3, H4, H5
One copy of Burglary, Earthquake, Alarm per house, with these Calls nodes:
Calls(H1), Calls(H3)
Calls(H2)
Calls(H2), Calls(H4)
Calls(H3), Calls(H5)
Calls(H4), Calls(H6)
SHARED VARIABLES
Earthquake(BL)
Burglary(H1), Burglary(H2), Burglary(H3), Burglary(H4)
Alarm(H1), Alarm(H2), Alarm(H3), Alarm(H4)
Calls(H1), Calls(H2), Calls(H3), Calls(H4), Calls(H5)
FIRST-ORDER LOGIC
Predicates: Burglary(house), Earthquake(city), Alarm(house), Calls(nhouse), HouseInCity(house, city), Neighbor(house, nhouse)
Rule: Alarm(house) :- HouseInCity(house, city), Earthquake(city), Burglary(house)
[Figure: probability table with columns e, b, a]
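The Alarm rule above acts as a template: grounding it once per house yields the per-house parent sets of the city network, with the single Earthquake variable shared by every house in the same city. The following is a minimal sketch of that grounding step. The fact list (four houses in a city named BL) and the helper name `ground_alarm_rule` are assumptions made for illustration, not part of the original slides.

```python
# Hypothetical HouseInCity facts: four houses, all in the city "BL".
house_in_city = [("H1", "BL"), ("H2", "BL"), ("H3", "BL"), ("H4", "BL")]


def ground_alarm_rule(facts):
    """Ground  Alarm(house) :- HouseInCity(house, city), Earthquake(city), Burglary(house).

    Returns a dict mapping each ground Alarm atom to the list of its
    ground parents in the resulting network.
    """
    parents = {}
    for house, city in facts:
        head = f"Alarm({house})"
        parents[head] = [f"Earthquake({city})", f"Burglary({house})"]
    return parents


network = ground_alarm_rule(house_in_city)
for head, body in network.items():
    print(head, "<-", ", ".join(body))
```

Every ground Alarm atom lists the same Earthquake(BL) parent, which is the sharing shown on the Shared Variables slide.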
LOGIC + PROBABILITY = STATISTICAL RELATIONAL LEARNING MODELS
Logic, add probabilities → Statistical Relational Learning (SRL)
Probabilities, add relations → Statistical Relational Learning (SRL)
[Figure nodes: PRating, CRating, Diff]
ALPHABETIC SOUP
Knowledge-based model construction [Wellman et al., 1992]
PRISM [Sato & Kameya, 1997]
Stochastic logic programs [Muggleton, 1996]
Probabilistic relational models [Friedman et al., 1999]
Bayesian logic programs [Kersting & De Raedt, 2001]
Bayesian logic [Milch et al., 2005]
Markov logic [Richardson & Domingos, 2006]
Relational dependency networks [Neville & Jensen, 2007]
ProbLog [De Raedt et al., 2007]
And many others!
RELATIONAL DATABASE
Tables (columns):
Prof, Level
Prof, Course, Rating
Course, Diff
Student, Course, Grade
Student, IQ, Satisfaction
FIRST-ORDER LOGIC
Prof(P), Level(P,L)
Course(C), Diff(C)
taughtBy(P,C), ratings(P,C,R)
Student(S), IQ(S,I), satis(S,B)
takes(S,C), grade(S,C,G)
Source tables (columns): (Prof, Level), (Prof, Course, Rating), (Course, Diff), (Student, Course, Grade), (Student, IQ, Satisfaction)
GRAPHICAL MODEL
Nodes: satisfaction(S, B), Diff(S, C, D), grades(S, C, G), avgGrade(S, G), avgDiff(S, D)
P(satisfaction(S, B) | avgGrade(S, G), avgDiff(S, D))
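The aggregates avgGrade and avgDiff collapse a variable number of ground grade and difficulty facts per student into a fixed pair of parents for satisfaction. A minimal sketch of that aggregation follows. The numeric facts, the per-course `diff` table, and the function name `aggregate_parents` are hypothetical and chosen only to show the mechanics.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ground facts: grades(S, C, G) and a per-course difficulty.
grades = [("s1", "c1", 4.0), ("s1", "c2", 3.0), ("s2", "c1", 2.0)]
diff = {"c1": 3.0, "c2": 1.0}


def aggregate_parents(grades, diff):
    """Return {student: (avgGrade, avgDiff)} over the courses each student took."""
    grade_lists = defaultdict(list)
    diff_lists = defaultdict(list)
    for student, course, grade in grades:
        grade_lists[student].append(grade)
        diff_lists[student].append(diff[course])
    return {s: (mean(grade_lists[s]), mean(diff_lists[s])) for s in grade_lists}


features = aggregate_parents(grades, diff)
print(features)
```

Whatever the number of courses, each student ends up with exactly two parent values, so one conditional distribution can be shared by all students.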
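RELATIONAL DECISION TREE
Data (Name, Speed, Job, Fine): (Bob, 120, Teacher, N), (Alice, 150, Writer, N), (John, 180, Politician, N), (Mary, 160, Student, Y), (Mike, 140, Engineer, Y)
Knows (Person1, Person2): (Alice, John), (Mary, Mike), (Mary, Alice), (Bob, Mike), (Bob, Mary)
Tree tests, root first: speed(X,S), S>120; job(X, politician); knows(X,Y); job(Y, politician)
Each test has a no branch and a yes branch; leaves: N, N, N, Y, Y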
RELATIONAL DECISION TREE (worked example for Alice, same data and knows tables as on the first Relational Decision Tree slide)
Step 1: speed(Alice,150), 150>120
Step 2: job(Alice, politician)
Step 3: knows(Alice,John)
Step 4: job(John, politician)
Each test has a no branch and a yes branch; leaves: N, N, N, Y, Y
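The trace for Alice can be reproduced by evaluating the four tests against the two tables. The sketch below assumes one particular assignment of leaves to branches, reconstructed from the flattened slide so that it agrees with the Fine column. The original figure may arrange the branches differently, and the function name `predict_fine` is hypothetical.

```python
# Tables from the slide.
speed = {"Bob": 120, "Alice": 150, "John": 180, "Mary": 160, "Mike": 140}
job = {"Bob": "teacher", "Alice": "writer", "John": "politician",
       "Mary": "student", "Mike": "engineer"}
fine = {"Bob": "N", "Alice": "N", "John": "N", "Mary": "Y", "Mike": "Y"}
knows = [("Alice", "John"), ("Mary", "Mike"), ("Mary", "Alice"),
         ("Bob", "Mike"), ("Bob", "Mary")]


def predict_fine(x):
    """Walk the tree for person x (the leaf placement is an assumption)."""
    if not speed[x] > 120:            # speed(X,S), S>120
        return "N"
    if job[x] == "politician":        # job(X, politician)
        return "N"
    known = [y for (p, y) in knows if p == x]
    if not known:                     # knows(X,Y) has no binding for Y
        return "Y"
    if any(job[y] == "politician" for y in known):  # job(Y, politician)
        return "N"
    return "Y"


for name in speed:
    print(name, predict_fine(name), fine[name])
```

The variable Y introduced by knows(X,Y) is shared with the test below it, which is what distinguishes a relational tree from a propositional one: the lower test asks whether some acquaintance of X is a politician.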
RELATIONAL PROBABILITY TREES
Use probabilities on the leaves
Can be used to represent the conditional distributions
Can use regression values on leaves to represent regression functions
Tree tests, root first: speed(X,S), S>120; job(X, politician); knows(X,Y); job(Y, politician); each with a no branch and a yes branch
STRUCTURE LEARNING PROBLEM
Learn the structure of the conditional distributions
Find the parents and the distribution for the target concept
Target: satisfaction(S, B)
Candidate parents: avgGrade(S, G), avgDiff(S, D), IQ(S, I), level(P, L)
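RELATIONAL TREE LEARNING
[Figure: a regression tree over the tests student(X) and paper(X,Y) is grown from examples with targets Δ (x1: 0.7, x2: -0.2, x3: -0.9). Fact tables: (X, Y) pairs (x1, y1), (x1, y2), (x3, y1); X values x1, x2. Candidate tests: paper(X, Y), student(X), adviser(X). The examples are partitioned at each split: {x1: 0.7, x2: -0.2} versus {x3: -0.9}, then further by paper(X,Y).]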
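One common way to choose the test at a node of a regression tree is to pick the candidate whose yes and no partitions have the smallest total squared error around their means. The sketch below applies that criterion to the Δ values from the slide. It assumes that the (X, Y) table lists paper facts and that the single-column X table lists student facts. The squared-error criterion itself is also an assumption, since the slide does not name the scoring function.

```python
# Regression targets from the slide.
delta = {"x1": 0.7, "x2": -0.2, "x3": -0.9}
# Assumed reading of the fact tables.
paper = {("x1", "y1"), ("x1", "y2"), ("x3", "y1")}
student = {"x1", "x2"}

# Candidate tests; paper(X,Y) succeeds if some Y exists for X.
tests = {
    "student(X)": lambda x: x in student,
    "paper(X,Y)": lambda x: any(p == x for p, _ in paper),
}


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def split_error(test):
    """Total squared error after splitting the examples on a test."""
    yes = [d for x, d in delta.items() if test(x)]
    no = [d for x, d in delta.items() if not test(x)]
    return sse(yes) + sse(no)


errors = {name: split_error(test) for name, test in tests.items()}
best = min(errors, key=errors.get)
print(errors, best)
```

Under these assumptions student(X) gives the lower error, which agrees with student(X) appearing first in the slide's tree.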
FUNCTIONAL GRADIENT BOOSTING
Sequentially learn models where each subsequent model corrects the previous model
Data - Predictions = Residuals
Initial Model + induced corrections (induce, iterate)
Final Model = … ψ_m
Natarajan et al. MLJ’12
BOOSTING ALGORITHM
For each gradient step m = 1 to M
    For each query predicate P
        Generate trainset using previous model F_{m-1}
            For each example x
                Compute gradient for x
                Add to trainset
        Learn a regression function T_{m,p}
        Add T_{m,p} to the model F_m
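The loop above can be sketched in a propositional setting. The version below makes several simplifying assumptions that are not in the slides: a single target instead of multiple query predicates, one numeric feature per example, regression stumps in place of relational regression trees, and the pointwise gradient y - P(y=1|x) that arises from the log-likelihood under a sigmoid link. The names `fit_stump` and `boost` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, targets):
    """Least-squares regression stump on one numeric feature."""
    best = None
    for threshold in sorted(set(xs)):
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        lmean = sum(left) / len(left)          # left is never empty
        rmean = sum(right) / len(right) if right else 0.0
        err = (sum((t - lmean) ** 2 for t in left)
               + sum((t - rmean) ** 2 for t in right))
        if best is None or err < best[0]:
            best = (err, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    return lambda x: lmean if x <= threshold else rmean


def boost(xs, ys, steps=20):
    """Return psi(x), the sum of the regression functions learned so far."""
    model = []

    def psi(x):
        return sum(tree(x) for tree in model)

    for _ in range(steps):
        # Generate the trainset from the previous model: one gradient per example.
        gradients = [y - sigmoid(psi(x)) for x, y in zip(xs, ys)]
        # Learn a regression function on the gradients and add it to the model.
        model.append(fit_stump(xs, gradients))
    return psi


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 0, 1, 1, 1]
psi = boost(xs, ys)
probs = [sigmoid(psi(x)) for x in xs]
print([round(p, 3) for p in probs])
```

Each step fits the current gradients rather than the labels, so later stumps only correct what the earlier ones still get wrong.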
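UW-CSE
Predict advisedBy relation
Given student, professor, courseTA, courseProf, etc. relations
5-fold cross validation
Results table columns: AUC-ROC, AUC-PR, Likelihood, Training Time
Rows: Boosting (training time in s), RDN (training time in s), Alchemy (training time in hrs)
[Numeric values are missing from the source]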
CARDIA
Family history, medical history, physical activity, nutrient intake, obesity questions, psychosocial, pulmonary function, etc.
Goal is to identify risk factors in early adulthood that cause serious cardiovascular issues in older adults
Extremely rich dataset with 25 years of information
S. Natarajan, J. Carr
RESULTS
IMITATION LEARNING
Expert agent performs actions (trajectories)
Goal: Learn a policy from these trajectories to suggest actions based on current state
Natarajan et al. IJCAI’11
[Figures: Gridworld domain; Robocup domain]
ALZHEIMER'S RESEARCH
AD: progressive neurodegenerative condition resulting in loss of cognitive abilities and memory
MRI: neuroimaging method for visualization of brain anatomy
Humans are not very good at identifying people with AD, especially before cognitive decline
MRI data: major source for distinguishing AD vs CN (cognitively normal) or MCI vs CN
Natarajan et al. Under review
PROPOSITIONAL MODELS (WITH AAL)
CONCLUSION
Statistical Relational Learning combines first-order logic with probabilistic models
Relational trees are used to represent conditional distributions
Boosting trees can be used to efficiently learn the structure of SRL models