Practical Probabilistic Relational Learning Sriraam Natarajan.


1 Practical Probabilistic Relational Learning Sriraam Natarajan

2 Take-Away Message Learn from rich, highly structured data!

3 Traditional Learning
- Data + Attributes (Features); the data is i.i.d.
[figure: a table of binary observations over the columns B, E, A, M, J feeding the learning of the classic alarm Bayesian network (Earthquake, Burglary, Alarm, MaryCalls, JohnCalls)]

4 Learning
[figure: the alarm network (Burglary and Earthquake are parents of Alarm, which is a parent of JohnCalls and MaryCalls) annotated with conditional probability tables (entries such as 0.08/0.92, 0.01/0.99, 0.1/0.9, …) estimated from the data]
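In this traditional i.i.d. setting, each conditional probability table can be estimated by maximum likelihood, i.e. by counting. A minimal sketch for one CPT of the alarm network, using a toy binary dataset over (Burglary, Earthquake, Alarm) that is illustrative only:

```python
from collections import Counter

# Toy i.i.d. dataset: each row is one example (Burglary, Earthquake, Alarm).
data = [
    (1, 0, 1), (0, 0, 0), (0, 1, 1), (0, 0, 0),
    (1, 1, 1), (0, 0, 0), (0, 1, 0), (1, 0, 1),
]

def cpt_alarm(data):
    """Maximum-likelihood estimate of P(Alarm=1 | Burglary, Earthquake) by counting."""
    fired, seen = Counter(), Counter()
    for b, e, a in data:
        seen[(b, e)] += 1     # how often each parent configuration occurs
        fired[(b, e)] += a    # how often the alarm fired for that configuration
    return {pe: fired[pe] / seen[pe] for pe in seen}

print(cpt_alarm(data))
```

The other CPTs of the network are estimated the same way, with one counting pass per variable-parent family.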

5 Real-World Problem: Predicting Adverse Drug Reactions

Patient Table: PatientID, Gender, Birthdate. E.g. P1, M, 3/22/63
Visit Table: PatientID, Date, Physician, Symptoms, Diagnosis. E.g. P1, 1/1/01, Smith, palpitations, hypoglycemic; P1, 2/1/03, Jones, fever/aches, influenza
Lab Tests: PatientID, Date, Lab Test, Result. E.g. P1, 1/1/01, blood glucose, 42; P1, 1/9/01, blood glucose, 45
Prescriptions: PatientID, Date Prescribed, Date Filled, Physician, Medication, Dose, Duration. E.g. P1, 5/17/98, 5/18/98, Jones, prilosec, 10mg, 3 months
SNP Table: PatientID, SNP1, SNP2, …, SNP500K. E.g. P1: AA, AB, …, BB; P2: AB, BB, …, AA

6 Logic + Probability = Probabilistic Logic, aka Statistical Relational Learning
[diagram: start from logic and add probabilities, or start from probabilistic models and add relations; both paths lead to Statistical Relational Learning (SRL)]
- Several SRL workshops in the past decade
- This year: StaRAI @ AAAI 2013

7 [diagram: the landscape of formalisms along two axes, propositional vs. first-order and deterministic vs. stochastic, with and without learning. Propositional logic, classical machine learning, and propositional rule learning sit at the propositional end; first-order logic and inductive logic programming at the relational end; probability theory and probabilistic logic on the stochastic side. Statistical relational learning occupies the first-order, stochastic, learning corner.]

8 Costs and Benefits of the SRL Soup
Benefits:
- Rich pool of different languages
- Very likely there is a language that fits your task at hand well
- A lot of research remains to be done ;-)
Costs:
- Learning SRL models is much harder
- Not all frameworks support all kinds of inference and learning settings
How do we actually learn relational models from data?

9 Why is this problem hard?
- Non-convex problem
- Repeated search over parameters for every step in the induction of the model
- First-order logic allows for different levels of generalization
- Repeated inference for every step of parameter learning
- Inference is #P-complete
How can we scale this?

10 Relational Probability Trees [Blockeel & De Raedt '98]
- Each conditional probability distribution can be learned as a tree
- Leaves are probabilities
- The final model is the set of the RRTs
Example (predicting heartAttack(X)): the root tests male(X); inner nodes test cholesterol (chol(X,Y,L), Y > 40, L > 200), a hypertension diagnosis (diag(X, Hypertension, Z), Z > 55), and BMI (bmi(X,W,55), W > 30); the leaves carry probabilities 0.8, 0.77, 0.3, and 0.05.
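A relational probability tree of this shape can be traversed with a few lines of code. The sketch below uses illustrative tests and leaf probabilities; the thresholds and the tree layout are stand-ins, not the exact tree from the slide:

```python
# Internal nodes are (test, yes_subtree, no_subtree); leaves are probabilities
# P(heartAttack(X) = true). Tests are predicates over a patient's known facts.
tree = (
    lambda p: p["male"],
    (lambda p: p["age"] > 40 and p["chol"] > 200, 0.8, 0.3),
    (lambda p: p["bmi"] > 30, 0.77, 0.05),
)

def predict(node, patient):
    """Walk the tree: branch on each internal test, return the leaf probability."""
    if isinstance(node, float):
        return node
    test, yes, no = node
    return predict(yes if test(patient) else no, patient)

patient = {"male": True, "age": 52, "chol": 240, "bmi": 28}
print(predict(tree, patient))  # → 0.8
```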

11 Learning Problem #1: Parameter Learning
- Probability of an example: modeled via logistic regression
- Weight learning: the gradient of the log-likelihood w.r.t. w yields a per-example term Δ_i; summing all the gradients gives the final update to w
- Several gradient-based approaches in SRL: Singla & Domingos AAAI '05, Jaeger ICML '07, Natarajan et al. ICML '05, AMAI '08
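For the logistic-regression case, the per-example gradient of the log-likelihood is Δ_i = x_i (y_i − P(y=1 | x_i)), and the update sums these terms. A minimal gradient-ascent sketch; the toy dataset and learning rate are illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(examples, labels, lr=0.1, iters=1000):
    """Gradient ascent on the log-likelihood: grad[j] = sum_i x[i][j] * (y[i] - p[i])."""
    w = [0.0] * len(examples[0])
    for _ in range(iters):
        grad = [0.0] * len(w)
        for x, y in zip(examples, labels):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            for j, xj in enumerate(x):
                grad[j] += xj * (y - p)        # one per-example term of the sum
        w = [wj + lr * gj for wj, gj in zip(w, grad)]
    return w

# Tiny dataset: one feature plus a bias term; positives have the larger feature.
X = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]
y = [0, 0, 1, 1]
w = fit(X, y)
print(sigmoid(w[0] * 3.0 + w[1]))  # close to 1 for a clearly positive example
```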

12 Learning Problem #2: Structure Learning
- Large space of possible structures
- Typical approaches:
  - Use ILP techniques to learn the structure, followed by parameter learning (Kersting & De Raedt '02)
  - Learn parameters for every candidate structure; there may be no closed-form solution for parameter learning (Kok & Domingos ICML '05)

13 Functional Gradients [J. Friedman, Annals of Statistics '01]
- Probability of an example is a function of its potential ψ(x)
- Functional gradient: the gradient of the log-likelihood w.r.t. ψ(x); summing all the gradients gives the final ψ(x)
- Example per-example gradients:

  x           Δ
  a1 a2 a3    0.7
  b1 b2 b3   -0.2
  c1 c2 c3   -0.9
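Assuming a sigmoid link over the potential ψ(x), the functional gradient at each training example is Δ(x) = I(y = 1) − P(y = 1 | x). A minimal sketch; the example identifiers are placeholders:

```python
import math

def pointwise_gradients(psi, examples):
    """Functional gradient of the log-likelihood, evaluated per example:
    delta(x) = I(y = 1) - P(y = 1 | x), with P(y = 1 | x) = sigmoid(psi(x))."""
    out = []
    for x, y in examples:
        p = 1.0 / (1.0 + math.exp(-psi(x)))
        out.append((x, y - p))
    return out

# With psi identically 0, P(y = 1 | x) = 0.5, so every gradient is +/- 0.5:
# positive examples get a positive delta, negatives a negative one.
examples = [("a1", 1), ("b1", 0), ("c1", 0)]
print(pointwise_gradients(lambda x: 0.0, examples))
```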

14 Gradient (Tree) Boosting [Friedman, Annals of Statistics 29(5):1189-1232, 2001]
- Model = weighted combination of a large number of small trees (models)
- Intuition: generate an additive model by sequentially fitting small trees to the pseudo-residuals of a regression at each iteration
[figure: starting from an initial model, compute residuals (data minus predictions under the loss function), induce a tree on them, and iterate; the final model is the sum of the induced trees]
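The fit-to-residuals loop can be sketched for squared loss, with one-split regression stumps standing in for the small trees; the helper names and toy data are illustrative, not from the original deck:

```python
def fit_stump(xs, residuals):
    """Weak learner: a single threshold split minimizing squared error (brute force)."""
    best = None
    for t in xs:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left) if left else 0.0
        rv = sum(right) / len(right) if right else 0.0
        err = sum((r - (lv if x <= t else rv)) ** 2 for x, r in zip(xs, residuals))
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv

def boost(xs, ys, rounds=20, lr=0.5):
    """Gradient boosting for squared loss: each stump is fit to the residuals."""
    model = [lambda x: 0.0]                                # initial model F_0
    for _ in range(rounds):
        pred = [sum(m(x) for m in model) for x in xs]
        residuals = [y - p for y, p in zip(ys, pred)]      # pseudo-residuals
        stump = fit_stump(xs, residuals)
        model.append(lambda x, s=stump: lr * s(x))         # damped additive step
    return lambda x: sum(m(x) for m in model)

F = boost([0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 1.0, 1.0])
print(round(F(0.0), 3), round(F(3.0), 3))
```

Each round fits a stump to the current pseudo-residuals and adds a damped copy of it to the model, mirroring the additive scheme in the figure.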

15 Boosting Results – MLJ '11 (predicting the advisor for a student)

  Algo      Likelihood  AUC-ROC  AUC-PR  Time
  Boosting  0.810       0.961    0.930   9 s
  MLN       0.730       0.535    0.621   93 hrs

Other evaluation tasks: movie recommendation, citation analysis, machine reading.

16 Other Applications
Similar results in several other problems:
- Imitation Learning: learning how to act from demonstrations (Natarajan et al. IJCAI '11); Robocup, a grid-world domain, a traffic-signal domain, and blocksworld
- Prediction of CAC levels: predicting cardiovascular risk in young adults (Natarajan et al. IAAI '13)
- Prediction of heart attacks (Weiss et al. IAAI '12, AI Magazine '12)
- Prediction of the onset of Alzheimer's (Natarajan et al. ICMLA '12, Natarajan et al. IJMLC 2013)

17 Parallel Lifted Learning

18 [diagram: stochastic ML scales well (stochastic gradients, online learning, …); statistical relational learning brings symmetries, compact models, and lifted inference; the goal is to combine the two in a parallel setting]

19 Symmetry-Based Inference

20 [diagram: colored graph nodes illustrating the construction]
Root clause: P(Anna) ∨ !P(Bob)
Neighboring clauses: P(Anna) => !HI(Bob); P(Anna) => HI(Anna); P(Bob) => HI(Bob); P(Bob) => !HI(Anna)
Tree (set of clauses): P(Anna) ∨ !P(Bob); P(Bob) => HI(Bob); P(Bob) => !HI(Anna)
Variabilized tree: P(X) ∨ !P(Y); P(Y) => HI(Y); P(Y) => !HI(X)
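The variabilization step replaces each distinct constant with a fresh logical variable, consistently across the whole clause set. A minimal sketch operating on the clauses as strings; a real implementation would work on parsed terms:

```python
import re

def variabilize(clauses):
    """Map each constant (a word in parentheses) to a fresh variable, reused
    consistently across all clauses: Anna -> X, Bob -> Y, ..."""
    mapping, names = {}, iter("XYZUVW")
    def fresh(match):
        const = match.group(1)
        if const not in mapping:
            mapping[const] = next(names)
        return "(" + mapping[const] + ")"
    return [re.sub(r"\((\w+)\)", fresh, c) for c in clauses]

tree = ["P(Anna) ∨ !P(Bob)", "P(Bob) => HI(Bob)", "P(Bob) => !HI(Anna)"]
print(variabilize(tree))  # → ['P(X) ∨ !P(Y)', 'P(Y) => HI(Y)', 'P(Y) => !HI(X)']
```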

21 Lifted Training [cycle]:
1. Generate initial tree pieces and variabilize their arguments
2. Randomly draw mini-batches
3. Generate tree pieces from the corresponding patterns
4. Compute the gradient using lifted BP
5. Update the covariance matrix C, or some low-rank variant
6. Update the parameter vector and the corresponding equations; repeat from step 2

22 Challenges
- Message schedules
- Iterative map-reduce?
- How do we take this idea to learning the models?
- How can we more efficiently parallelize symmetry identification?
- What are the compelling problems? Vision, NLP, …

23 Conclusion
- The world is inherently relational and uncertain
- SRL has developed into an exciting field in the past decade, with several previous SRL workshops
- Boosting relational models has shown promising initial results, applied to several different problems; a first scalable relational learning algorithm
- How can we parallelize/scale this algorithm?
- Can it benefit from an inference algorithm, like belief propagation, that can be parallelized easily?

24 Future Work
- Develop lifted online structure learning
- Integrate ideas from databases: exploit relational logic on DBs and implement lifted inference techniques on them
- Real-world applications of FGB: activity recognition, localization, natural language processing, bio-medical applications
- Predictive personalized medicine: mine information from large-scale medical databases; use text from the web (blogs) and combine the learned models with the clinical data
- Learning from experts: evaluate in several domains such as Wargus and Robocup

