Practical Probabilistic Relational Learning Sriraam Natarajan.

Presentation transcript:

Practical Probabilistic Relational Learning Sriraam Natarajan

Take-Away Message Learn from rich, highly structured data!

Traditional Learning: Data + Attributes (Features). Data is i.i.d.
[Figure: a Bayesian network with nodes Burglary, Earthquake, Alarm, JohnCalls, MaryCalls]

Learning
[Figure: the Bayesian network over Burglary, Earthquake, Alarm, JohnCalls, MaryCalls, now learned from data]

Real-World Problem: Predicting Adverse Drug Reactions

Patient Table:
PatientID | Gender | Birthdate
P1 | M | 3/22/63

Visit Table:
PatientID | Date | Physician | Symptoms | Diagnosis
P1 | 1/1/01 | Smith | palpitations | hypoglycemic
P1 | 2/1/03 | Jones | fever, aches | influenza

Lab Tests:
PatientID | Date | Lab Test | Result
P1 | 1/1/01 | blood glucose | 42
P1 | 1/9/01 | blood glucose | 45

Prescriptions:
PatientID | Date Prescribed | Date Filled | Physician | Medication | Dose | Duration
P1 | 5/17/98 | 5/18/98 | Jones | prilosec | 10mg | 3 months

SNP Table:
PatientID | SNP1 | SNP2 | … | SNP500K
P1 | AA | AB | … | BB
P2 | AB | BB | … | AA

Logic + Probability = Probabilistic Logic, aka Statistical Relational Learning
[Diagram: Logic + Add Probabilities -> Statistical Relational Learning (SRL); Probabilistic Models + Add Relations -> SRL]
Several SRL workshops in the past decade; this year at AAAI 2013

[Figure: the SRL landscape, laid out along three axes: Deterministic vs. Stochastic, No Learning vs. Learning, and Propositional vs. First-Order. Points include Propositional Logic, First-Order Logic, Probability Theory, Probabilistic Logic, Classical Machine Learning, Propositional Rule Learning, Inductive Logic Programming, and Statistical Relational Learning]

Costs and Benefits of the SRL soup
Benefits:
- Rich pool of different languages
- Very likely that there is a language that fits your task at hand well
- A lot of research remains to be done ;-)
Costs:
- "Learning" SRL is much harder
- Not all frameworks support all kinds of inference and learning settings
How do we actually learn relational models from data?

Why is this problem hard?
- Non-convex problem
- Repeated search over parameters for every step in the induction of the model
- First-order logic allows for different levels of generalization
- Repeated inference for every step of parameter learning
- Inference is #P-complete
How can we scale this?

Relational Probability Trees [Blockeel & De Raedt '98]
- Each conditional probability distribution can be learned as a tree
- Leaves are probabilities
- The final model is the set of relational regression trees (RRTs)
[Figure: an RRT for predicting heartAttack(X), with yes/no branches through the tests male(X); chol(X,Y,L), Y > 40, L > 200; diag(X, Hypertension, Z), Z > 55; and bmi(X, W, ...) with a threshold on W, ending in probability leaves]
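To make the picture concrete, here is a minimal sketch (not the authors' implementation) of evaluating such a tree against a database of ground facts. The fact encoding and the leaf probabilities are assumptions made for illustration; only the tests mirror the slide's example.

```python
# A minimal sketch of a relational probability tree for heartAttack(X).
# Internal nodes test first-order conditions against ground facts; leaves
# return probabilities. Leaf values here are illustrative, not from the talk.

def p_heart_attack(facts, x):
    """facts: dict mapping a predicate name to a set of argument tuples."""
    if (x,) in facts.get('male', set()):
        # chol(X, Y, L) with Y > 40 and L > 200, for some recorded reading
        if any(s == x and y > 40 and l > 200
               for (s, y, l) in facts.get('chol', set())):
            return 0.8
        return 0.3
    # diag(X, Hypertension, Z) with Z > 55
    if any(s == x and d == 'Hypertension' and z > 55
           for (s, d, z) in facts.get('diag', set())):
        return 0.6
    return 0.1

facts = {'male': {('p1',)}, 'chol': {('p1', 45, 230)}, 'diag': set()}
print(p_heart_attack(facts, 'p1'))  # -> 0.8
```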

Learning Problem #1: Parameter Learning (Logistic Regression)
- Probability of an example
- Weight learning: the gradient of the log-likelihood w.r.t. w gives a per-example gradient Δ_i
- Sum all gradients to get the final w
- Several gradient-based approaches in SRL: Singla & Domingos AAAI '05, Jaeger ICML '07, Natarajan et al. ICML '05, AMAI '08
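The equations themselves did not survive the transcript; a hedged reconstruction in standard logistic-regression notation (my notation, not necessarily the slide's):

```latex
P(y_i = 1 \mid \mathbf{x}_i) = \frac{1}{1 + e^{-\mathbf{w}^{\top}\mathbf{x}_i}},
\qquad
\Delta_i = \nabla_{\mathbf{w}} \log P(y_i \mid \mathbf{x}_i)
         = \bigl(y_i - P(y_i = 1 \mid \mathbf{x}_i)\bigr)\,\mathbf{x}_i,
\qquad
\mathbf{w} \leftarrow \mathbf{w} + \eta \sum_i \Delta_i
```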

Learning Problem #2: Structure Learning
- Large space of possible structures
- Typical approaches:
  - Use ILP techniques to learn the structure, followed by parameter learning (Kersting and De Raedt '02)
  - Learn parameters for every candidate structure; there may be no closed-form solution for parameter learning (Kok and Domingos ICML '05)

Functional Gradients (J. Friedman, Annals of Statistics '01)
- Probability of an example
- Functional gradient: the gradient of the log-likelihood w.r.t. the function value ψ(x)
- Sum all gradients to get the final ψ(x)
[Table: examples x with their pointwise gradients Δ, e.g. (a1, a2, a3) -> 0.7; the remaining values did not survive extraction]
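The slide's own equation is missing from the transcript; the standard form from the functional-gradient boosting literature, in my notation, is:

```latex
P(y_i = 1 \mid \mathbf{x}_i) = \frac{e^{\psi(\mathbf{x}_i)}}{1 + e^{\psi(\mathbf{x}_i)}},
\qquad
\Delta(\mathbf{x}_i) = \frac{\partial \log P(y_i \mid \mathbf{x}_i)}{\partial \psi(\mathbf{x}_i)}
                     = I(y_i = 1) - P(y_i = 1 \mid \mathbf{x}_i)
```

So each example's gradient is simply the residual between its true label and the model's current predicted probability.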

Gradient (Tree) Boosting [Friedman, Annals of Statistics 29(5), 2001]
- Model = weighted combination of a large number of small trees
- Intuition: generate an additive model by sequentially fitting small trees to the pseudo-residuals of a regression at each iteration
[Diagram: Data + loss function -> initial model; at each iteration, compute residuals from data and current predictions, induce a small tree on the residuals, and add it to the model; Final Model = initial model + tree 1 + tree 2 + ...]
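A minimal runnable sketch of this loop for binary classification, using ordinary (propositional) regression trees; the relational version fits relational regression trees to the same pseudo-residuals, but the control flow is identical. Tree depth, learning rate, and the toy data are illustrative choices, not values from the talk.

```python
# Functional-gradient (tree) boosting in the style of Friedman (2001):
# repeatedly fit a small regression tree to the pointwise gradients
# (label minus current predicted probability) and add it to the model.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, n_trees=20, lr=0.5, max_depth=3):
    y = np.asarray(y, dtype=float)
    psi = np.zeros(len(y))                   # current psi(x) for each example
    trees = []
    for _ in range(n_trees):
        p = 1.0 / (1.0 + np.exp(-psi))       # P(y=1 | x) = sigmoid(psi(x))
        residuals = y - p                    # pointwise functional gradient
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        psi += lr * tree.predict(X)          # step along the fitted gradient
        trees.append(tree)
    return trees

def predict_proba(trees, X, lr=0.5):
    psi = lr * sum(t.predict(X) for t in trees)
    return 1.0 / (1.0 + np.exp(-psi))

X = np.array([[0., 1.], [1., 0.], [1., 1.], [0., 0.]])
y = [0, 1, 1, 0]
print(predict_proba(boost(X, y), X))  # probabilities near the labels
```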

Boosting Results (MLJ '11)
[Table: Boosting vs. MLN on likelihood, AUC-ROC, AUC-PR, and training time (seconds for boosting vs. hours for the MLN); the numeric entries did not survive extraction]
Tasks: predicting the advisor for a student, movie recommendation, citation analysis, machine reading

Other Applications
Similar results in several other problems:
- Imitation learning: learning how to act from demonstrations (Natarajan et al. IJCAI '11); RoboCup, a grid-world domain, a traffic-signal domain, and blocksworld
- Prediction of CAC levels: predicting cardiovascular risk in young adults (Natarajan et al. IAAI '13)
- Prediction of heart attacks (Weiss et al. IAAI '12, AI Magazine '12)
- Prediction of the onset of Alzheimer's (Natarajan et al. ICMLA '12, Natarajan et al. IJMLC 2013)

Parallel Lifted Learning

[Diagram: the intersection of three areas: Stochastic ML (scales well, stochastic gradients, online learning, ...), Statistical Relational (symmetries, compact models, lifted inference, ...), and Parallel computation]

Symmetry-Based Inference

[Figure: growing a variabilized tree of clauses over the atoms P(Anna), P(Bob), HI(Anna), HI(Bob)]
Root clause: P(Anna) ∨ !P(Bob)
Neighboring clauses: P(Anna) => !HI(Bob); P(Anna) => HI(Anna); P(Bob) => HI(Bob); P(Bob) => !HI(Anna)
Tree (set of clauses): P(Anna) ∨ !P(Bob); P(Bob) => HI(Bob); P(Bob) => !HI(Anna)
Variabilized tree: P(X) ∨ !P(Y); P(Y) => HI(Y); P(Y) => !HI(X)
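The final step, replacing constants with logical variables, is mechanical; a toy sketch follows (the clause encoding is an assumption for illustration, not code from the talk):

```python
# Variabilize ground clauses: map each distinct constant to a fresh logical
# variable, so Anna -> X and Bob -> Y, as in the slide's example. A literal
# is (predicate, argument, negated?); a clause is a list of literals, with
# implications written in clause form (!body v head).
def variabilize(clauses):
    mapping, fresh = {}, iter('XYZUVW')
    def var(const):
        if const not in mapping:
            mapping[const] = next(fresh)
        return mapping[const]
    return [[(pred, var(arg), neg) for (pred, arg, neg) in clause]
            for clause in clauses]

tree = [
    [('P', 'Anna', False), ('P', 'Bob', True)],   # P(Anna) v !P(Bob)
    [('P', 'Bob', True), ('HI', 'Bob', False)],   # P(Bob) => HI(Bob)
    [('P', 'Bob', True), ('HI', 'Anna', True)],   # P(Bob) => !HI(Anna)
]
print(variabilize(tree))  # P(X) v !P(Y); P(Y) => HI(Y); P(Y) => !HI(X)
```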

Lifted Training
1. Generate initial tree pieces and variabilize their arguments.
2. Randomly draw mini-batches.
3. Generate tree pieces from the corresponding patterns.
4. Compute the gradient using lifted BP.
5. Update the covariance matrix C, or some low-rank variant.
6. Update the parameter vector and the corresponding equations; repeat from step 2.

Challenges
- Message schedules: iterative MapReduce?
- How do we take this idea to learning the models?
- How can we more efficiently parallelize symmetry identification?
- What are the compelling problems? Vision, NLP, ...

Conclusion
- The world is inherently relational and uncertain
- SRL has developed into an exciting field in the past decade, with several previous SRL workshops
- Boosting relational models has shown promising initial results
  - Applied to several different problems
  - First scalable relational learning algorithm
- How can we parallelize/scale this algorithm?
- Can this benefit from an inference algorithm, like Belief Propagation, that can be parallelized easily?

Future Work
- Develop lifted online structure learning
- Integrate ideas from databases: exploit relational logic on DBs and implement lifted inference techniques on them
- Real-world applications of FGB: activity recognition, localization, natural language processing, biomedical applications
- Predictive personalized medicine: mine information from large-scale medical databases; use text from the web (blogs) and combine the learned models with the clinical data
- Learning from experts: evaluate in several domains such as Wargus and RoboCup