© Daphne Koller, 2003 Probabilistic Models of Relational Domains Daphne Koller Stanford University.


© Daphne Koller, 2003 Probabilistic Models of Relational Domains Daphne Koller Stanford University

© Daphne Koller, 2003 Overview Relational logic Probabilistic relational models Inference in PRMs Learning PRMs Uncertainty about domain structure Summary

© Daphne Koller, 2003 Attribute-Based Models & Relational Models

© Daphne Koller, 2003 Bayesian Networks: Problem Bayesian nets use a propositional representation, but the real world has objects related to each other. [Figure: the template Intelligence/Difficulty/Grade instantiated as Intell_Jane, Diffic_CS101, Grade_Jane_CS101; Intell_George, Diffic_Geo101, Grade_George_Geo101; Intell_George, Diffic_CS101, Grade_George_CS101, with observed grades A and C.] These "instances" are not independent.

© Daphne Koller, 2003 The University BN [Figure: ground network with nodes Difficulty_Geo101, Difficulty_CS101, Intelligence_George, Intelligence_Jane, Grade_Jane_CS101, Grade_George_CS101, Grade_George_Geo101.]

© Daphne Koller, 2003 The Genetics BN G = genotype, B = bloodtype. [Figure: family-tree network with genotype nodes G_Homer, G_Marge, G_Bart, G_Lisa, G_Maggie, G_Harry, G_Betty, G_Selma and corresponding bloodtype nodes B_Homer, B_Marge, B_Bart, B_Lisa, B_Maggie, B_Harry, B_Betty, B_Selma.]

© Daphne Koller, 2003 Simple Approach A graphical model with shared parameters … and shared local dependency structure. We want to encode this constraint, both for the human knowledge engineer and for the network learning algorithm. How do we specify which nodes share params, and which nodes have shared (but different) structure?
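Parameter sharing can be sketched in a few lines. This is a minimal illustration with hypothetical names, not the deck's actual implementation: every ground variable of the same template attribute points at one shared CPD object, so editing or learning the CPD affects all instances at once.

```python
class SharedCPD:
    """One conditional probability table shared by all ground instances."""
    def __init__(self, table):
        self.table = table  # maps parent-value tuples to distributions

# A single template CPD for Grade given (Intelligence, Difficulty).
grade_cpd = SharedCPD({
    ("smart", "easy"): {"A": 0.8, "B": 0.15, "C": 0.05},
    ("smart", "hard"): {"A": 0.5, "B": 0.35, "C": 0.15},
    ("weak",  "easy"): {"A": 0.3, "B": 0.5,  "C": 0.2},
    ("weak",  "hard"): {"A": 0.1, "B": 0.4,  "C": 0.5},
})

# Every (student, course) registration reuses the same CPD object.
ground = {
    ("Jane", "CS101"):    grade_cpd,
    ("George", "CS101"):  grade_cpd,
    ("George", "Geo101"): grade_cpd,
}

# All ground variables literally share one parameter object.
assert ground[("Jane", "CS101")] is ground[("George", "Geo101")]
```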

© Daphne Koller, 2003 Simple Approach II We can write a special-purpose program for each domain: genetic inheritance (family tree imposes constraints) university (course registrations impose constraints) Is there something more general?

© Daphne Koller, 2003 Attribute-Based Worlds Smart_Jane & easy_CS101 ⇒ GetA_Jane_CS101; Smart_Mike & easy_Geo101 ⇒ GetA_Mike_Geo101; Smart_Jane & easy_Geo101 ⇒ GetA_Jane_Geo101; Smart_Rick & easy_CS221 ⇒ GetA_Rick_C… World = assignment of values to attributes / truth values to propositional symbols

© Daphne Koller, 2003 Object-Relational Worlds World = relational interpretation: objects in the domain, properties of these objects, relations (links) between objects. ∀x,y (Smart(x) & Easy(y) & Take(x,y) ⇒ Grade(A,x,y))

© Daphne Koller, 2003 Relational Logic General framework for representing: objects & their properties classes of objects with same model relations between objects Represent a model at the template level, and apply it to an infinite set of domains Given finite domain, each instantiation of the model is propositional, but the template is not

© Daphne Koller, 2003 Relational Schema Specifies the types of objects in the domain, the attributes of each type of object, and the types of relations between objects. [Figure: classes Professor (Teaching-Ability), Student (Intelligence), Course (Difficulty), Registration (Grade, Satisfaction); relations Teach, Has, In.]

© Daphne Koller, 2003 St. Nordaf University [Figure: an example world. Prof. Smith and Prof. Jones teach Welcome to Geo101 and Welcome to CS101; George and Jane are registered in courses. Attribute values: Teaching-Ability (High/Low), Difficulty (Easy/Hard), Intelligence (Smart/Weak), Grade (A/B/C), Satisfaction (Like/Hate).]

© Daphne Koller, 2003 Relational Logic: Summary Vocabulary: Classes of objects: Person, Course, Registration, … Individual objects in a class: George, Jane, … Attributes of these objects: George.Intelligence, Reg1.Grade Relationships between these objects Of(Reg1,George), Teaches(CS101,Smith) A world specifies: A set of objects, each in a class The values of the attributes of all objects The relations that hold between the objects

© Daphne Koller, 2003 Binary Relations Any relation can be converted into an object: R(x1,x2,…,xk) ⇒ new "relation" object y, with R1(x1,y), R2(x2,y), …, Rk(xk,y). E.g., registrations are "relation objects" ⇒ can restrict attention to binary relations R(x,y)
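The conversion on this slide is mechanical, so it is easy to sketch. A tentative helper (all names are illustrative) that turns each k-tuple of a relation into a fresh "relation" object linked to its arguments by binary relations:

```python
def binarize(relation_name, tuples):
    """Convert a k-ary relation into relation objects plus k binary relations.

    Each k-tuple becomes a fresh object y, linked to its i-th argument x_i
    by the binary relation R_i(x_i, y), as on the slide.
    """
    objects, binary = [], []
    for idx, args in enumerate(tuples):
        y = f"{relation_name}_{idx}"          # new "relation" object
        objects.append(y)
        for i, x in enumerate(args, start=1):
            binary.append((f"R{i}", x, y))    # R_i(x_i, y)
    return objects, binary

# Registrations as relation objects:
objs, links = binarize("Reg", [("Jane", "CS101"), ("George", "Geo101")])
```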

© Daphne Koller, 2003 Relations & Links Binary relations can also be viewed as links, specifying the set of objects related to x via R: R(x,y) ⇒ y ∈ x.R1 and x ∈ y.R2. E.g., for Teaches(p,c): p.Courses = {courses c : Teaches(p,c)}, c.Instructor = {professors p : Teaches(p,c)}

© Daphne Koller, 2003 Probabilistic Relational Models: Relational Bayesian Networks

© Daphne Koller, 2003 Probabilistic Models Uncertainty model: space of “possible worlds”; probability distribution over this space. In attribute-based models, world specifies assignment of values to fixed set of random variables In relational models, world specifies Set of domain elements Their properties Relations between them

© Daphne Koller, 2003 PRM Scope The entire set of relational worlds is infinite and too broad. Assume a circumscribed class of sets of worlds W[σ], consistent with some type of background knowledge σ. A PRM Π is a template defining P(W[σ]) for any such σ. Simplest class, attribute-based PRMs: σ = relational skeleton, a finite set of entities E and the relations between them; W[σ] = all assignments of values to all attributes of the entities in E. The PRM template defines a distribution over W[σ] for any such σ. [K. & Pfeffer '98; Friedman, Getoor, K., Pfeffer '99]

© Daphne Koller, 2003 Relational Bayesian Network Universals: probabilistic patterns hold for all objects in a class. Locality: represent direct probabilistic dependencies; links define potential interactions. [Figure: Professor (Teaching-Ability), Course (Difficulty), Student (Intelligence), Reg (Grade A/B/C, Satisfaction).] [K. & Pfeffer; Poole; Ngo & Haddawy]

© Daphne Koller, 2003 RBN: Semantics σ: a set of objects & the relations between them. W[σ]: the set of all assignments of values to all attributes of all objects in σ. P(W[σ]) is defined by a ground Bayesian network: variables: attributes of all objects; dependencies: determined by the relational links in σ and the dependency structure of the RBN model Π.
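The grounding step can be sketched directly. A minimal illustration (skeleton, slot names, and attribute names are hypothetical) of unrolling a template dependency X.A ← X.γ.B into ground-network edges, given a relational skeleton:

```python
# A toy relational skeleton: registrations r1, r2 and the courses they are in.
skeleton = {
    "Reg": {"r1": {"In": "cs101"}, "r2": {"In": "geo101"}},
    "Course": {"cs101": {}, "geo101": {}},
}

# Template-level structure: Reg.Grade depends on Reg.In.Difficulty.
template_parents = {("Reg", "Grade"): [("In", "Difficulty")]}

def ground_structure(skeleton, template_parents):
    """Instantiate template dependencies as edges of the ground BN."""
    edges = []
    for (cls, attr), parents in template_parents.items():
        for obj, slots in skeleton[cls].items():
            for link, parent_attr in parents:
                parent_obj = slots[link]  # follow the link to the parent object
                edges.append(((parent_obj, parent_attr), (obj, attr)))
    return edges

edges = ground_structure(skeleton, template_parents)
```

Each registration gets its own Grade variable, but all of them share the single template dependency; the skeleton alone decides which course's Difficulty is the parent.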

© Daphne Koller, 2003 RBN Structure For each class X and attribute A, the structure specifies the parents of X.A. Within an object: for any object x in class X, x.B is a parent of x.A (X.B → X.A, e.g. Course.Level → Course.Difficulty). Via links: with γ a link or chain of links, x.γ.B is a parent of x.A (X.γ.B → X.A, e.g. Reg.In.Difficulty → Reg.Grade).

© Daphne Koller, 2003 RBN Semantics [Figure: the ground BN for St. Nordaf, with Teaching-Ability, Difficulty, Intelligence, Grade, and Satisfaction variables for Prof. Smith, Prof. Jones, Welcome to CS101, Welcome to Geo101, George, and Jane.]

© Daphne Koller, 2003 The Web of Influence [Figure: observed grades (A, C) in Welcome to CS101 and Welcome to Geo101 propagate evidence between students' intelligence (low/high) and courses' difficulty (easy/hard).]

© Daphne Koller, 2003 Aggregate Dependencies [Figure: Student Jane Doe (Intelligence, Honors) with registrations Reg #5077, Reg #5054, Reg #5639, each with Grade (A/B/C) and Satisfaction.] Problem!!! Need dependencies on sets, e.g. the avg grade over any number of registrations. [K. & Pfeffer '98; Friedman, Getoor, K., Pfeffer '99]

© Daphne Koller, 2003 Aggregate Dependencies [Figure: Student (Intelligence, Honors), Reg (Grade, Satisfaction), Course (Difficulty, Rating), Professor (Teaching-Ability, Stress-Level), with avg and count aggregates on the multi-valued links.] Aggregators: sum, min, max, avg, mode, count.
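The aggregators named on the slide are ordinary set functions; a tiny dispatcher makes the idea concrete (a sketch, not the deck's implementation):

```python
from statistics import mean, mode

def aggregate(values, how):
    """Collapse a multiset of parent values into one value a CPD can condition on."""
    fns = {"sum": sum, "min": min, "max": max,
           "avg": mean, "mode": mode, "count": len}
    return fns[how](values)

# E.g., Jane's grade points across three registrations (hypothetical data):
grades = [4.0, 3.0, 4.0]
# Honors can then depend on aggregate(grades, "avg"), Stress-Level on a count, etc.
```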

© Daphne Koller, 2003 Basic RBN: Summary An RBN specifies: a probabilistic dependency structure S, i.e. a set of parents X.γ.B for each class attribute X.A; and a set of local probability models: an aggregator for each multi-valued dependency and a set of CPD parameters θ for each X.A. Given a relational skeleton structure σ, the RBN induces a probability distribution over worlds ω, defined via a ground BN over the attributes x.A of all objects.

© Daphne Koller, 2003 Extension: Class Hierarchy Subclasses inherit all attributes of parents, but may have additional ones For inherited attribute X.A, subclass can: inherit parent’s probabilistic model overwrite with local probabilistic model Example: Professor has subclasses assistant, associate, full Inherit distribution over Stress-Level Modify distribution over Salary [K. & Pfeffer ’98]

© Daphne Koller, 2003 Extension: Class Hierarchies Hierarchies allow reuse in knowledge engineering and in learning Parameters and dependency models shared across more objects If class assignments specified in , class hierarchy does not introduce complications [K. & Pfeffer ’98]

© Daphne Koller, 2003 Coherence & Acyclicity For a given skeleton σ, the PRM Π asserts dependencies between attributes of objects. Π defines a coherent probability model over W[σ] if the ground dependency graph G(Π,σ) is acyclic. [Figure: Reg (Grade, Satisfaction) and Professor (Teaching-Ability, Stress-Level); Smith.Stress-Level depends probabilistically on itself.] [Friedman, Getoor, K., Pfeffer '99]

© Daphne Koller, 2003 Guaranteeing Acyclicity How do we guarantee that a PRM Π is acyclic for every object skeleton σ? From the PRM dependency structure S, build a template-level class dependency graph: add edge Y.B → X.A if X.γ.B ∈ Parents(X.A) and the class of X.γ is Y. Attribute stratification: class dependency graph acyclic ⇒ G(Π,σ) acyclic for all σ.
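The stratification test reduces to a standard cycle check on the class dependency graph. A sketch (the example graphs use the deck's running attributes, but the code itself is a generic DFS, not the paper's algorithm):

```python
def has_cycle(graph):
    """Detect a directed cycle. graph: dict node -> list of successors."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}

    def visit(n):
        color[n] = GRAY                      # on the current DFS path
        for m in graph.get(n, []):
            if color.get(m, WHITE) == GRAY:  # back edge: cycle found
                return True
            if color.get(m, WHITE) == WHITE and visit(m):
                return True
        color[n] = BLACK                     # fully explored
        return False

    return any(color[n] == WHITE and visit(n) for n in list(graph))

# Stratified: Course.Difficulty -> Reg.Grade only.
ok = {"Course.Difficulty": ["Reg.Grade"], "Reg.Grade": []}
# Not stratified: Stress-Level and Satisfaction feed each other.
bad = {"Prof.Stress": ["Reg.Satisfaction"], "Reg.Satisfaction": ["Prof.Stress"]}
```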

© Daphne Koller, 2003 Limitation of Stratification [Figure: three Person objects (M-chromosome, P-chromosome, Blood-type) linked by Father and Mother; the class dependency graph over Person.M-chrom, Person.P-chrom, Person.B-type contains a cycle even though every ground network is acyclic.]

© Daphne Koller, 2003 Limitation of Stratification Prior knowledge: the Father-of relation is acyclic, so dependence of Person.A on Person.Father.B cannot induce cycles. [Figure: the same three-Person network with Father and Mother links.]

© Daphne Koller, 2003 Guaranteeing Acyclicity With guaranteed-acyclic (g.a.) relations, some cycles in the dependency graph are guaranteed to be safe. We color the edges of the dependency graph: yellow for edges within a single object (X.B → X.A), green for edges via a g.a. relation (Y.B → X.A), red for edges via other relations (Y.B → X.A). A cycle is safe if it has a green edge and no red edge. [Figure: Person.M-chrom and Person.P-chrom → Person.B-type.]

© Daphne Koller, 2003 Object-Oriented Bayesian Nets* OOBNs are RBNs with only one type of relation: one object can be a "part-of" another. Objects can only interact with their component parts; other types of relationships must be embedded into the "part-of" framework. This defines a "neat" hierarchy of related objects and provides a clearly defined object interface between an object x and its enclosing object y. [K. & Pfeffer '97] * Simplified view

© Daphne Koller, 2003 OOBN Example Drivers, cars, and the collision must all be viewed as parts of a situation object. [Figure: Driver1 and Driver2 (Owner, Speed, Value), Car1 and Car2, and a Collision with Severity, Car1-Damage, Car2-Damage.]

© Daphne Koller, 2003 Part-of Hierarchy Objects in the model are encapsulated within each other, forming a tree. [Figure: an Accident situation contains Driver1, Car1, Severity, Car2, Driver2; each car contains Price, Engine (Power), Speed; each driver contains Income.]

© Daphne Koller, 2003 Lines of Communication [Figure: the same part-of tree; probabilistic influence flows only along the interfaces between an object and its enclosing object.]

© Daphne Koller, 2003 Probabilistic Relational Models: Relational Markov Networks

© Daphne Koller, 2003 Why Undirected Models? Symmetric, non-causal interactions E.g., web: categories of linked pages are correlated Cannot introduce direct edges because of cycles Patterns involving multiple entities E.g., web: “triangle” patterns Directed edges not appropriate “Solution”: Impose arbitrary direction Not clear how to parameterize CPD for variables involved in multiple interactions Very difficult within a class-based parameterization [Taskar, Abbeel, K. 2001]

© Daphne Koller, 2003 Markov Networks: Review [Figure: undirected graph over Alice, Bob, Chris, Dave, Eve, with a potential φ(A,B,C) over one clique.]

© Daphne Koller, 2003 Markov Networks: Review A Markov network is an undirected graph over some set of variables V, associated with a set of potentials φi. Each potential is a factor over a subset Vi; the variables in Vi must form a (sub)clique in the network.

© Daphne Koller, 2003 Relational Markov Networks Probabilistic patterns hold for groups of objects Groups defined as sets of (typed) elements linked in particular ways Study Group Student2 Reg2 Grade Intelligence Course Reg Grade Student Difficulty Intelligence Template potential [Taskar, Abbeel, K. 2002]


© Daphne Koller, 2003 RMN Language Define clique templates: all tuples {reg R1, reg R2, group G} s.t. In(G, R1), In(G, R2), with compatibility potential φ(R1.Grade, R2.Grade). The ground Markov network then contains a potential φ(r1.Grade, r2.Grade) for all appropriate r1, r2.
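Grounding a clique template is just a relational query over the world. A sketch with hypothetical object names: enumerate every pair of registrations that share a study group, yielding one potential instance per match.

```python
from itertools import combinations

# Toy world: which registrations belong to which study group.
in_group = {
    "geo_sg": ["r_jane_geo", "r_george_geo"],
    "cs_sg":  ["r_george_cs", "r_jill_cs"],
}

def ground_cliques(in_group):
    """Instantiate the template {R1, R2, G : In(G,R1), In(G,R2)}.

    Each returned triple gets its own copy of the shared template
    potential phi(R1.Grade, R2.Grade) in the ground Markov network.
    """
    return [(r1, r2, g)
            for g, regs in in_group.items()
            for r1, r2 in combinations(regs, 2)]

cliques = ground_cliques(in_group)
```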

© Daphne Koller, 2003 Ground MN (or Chain Graph) [Figure: the grades of George, Jane, and Jill in Welcome to Geo101 and Welcome to CS101, connected through the Geo and CS study groups; Difficulty and Intelligence variables appear as directed parents.]

© Daphne Koller, 2003 PRM Inference

© Daphne Koller, 2003 Inference: Simple Method Define ground network as in semantics Apply standard inference methods Problem: Very large models can be specified very easily Resulting ground network often highly connected Exact inference is typically intractable In practice, often must resort to approximate methods such as belief propagation

© Daphne Koller, 2003 Exploiting Structure: Encapsulation Objects interact only in limited ways, so we can define an object interface: Outputs: object attributes influencing other objects; Inputs: external attributes influencing the object. The object is independent of everything else given its interface, so inference can be encapsulated within objects, with "communication" limited to interfaces. [Figure: a Student with Intelligence, Honors, GPA and external Upbringing, Personality, Job-prospects.] [K. & Pfeffer, 1997]

© Daphne Koller, 2003 Exploiting Structure: Encapsulation Marginalize object distribution onto interface Dependency graph over interfaces induced by Inter-object dependencies And hence by the relational structure Perform inference over interfaces If interaction graph has low tree-width, can use exact inference E.g., part-of hierarchy in some OOBNs If relational structure more complex, can use BP A form of Kikuchi BP, where cluster selection is guided by relational structure

© Daphne Koller, 2003 Exploiting Structure: Reuse Objects from the same class have the same model. For generic objects (ones with no internal evidence), the marginal distribution over the interface is the same, so we can reuse inference: a form of "lifting". [Figure: George and Jane, each with Honors, Upbringing, Personality, Job-prospects, GPA, Intelligence.] [Pfeffer & K. 1998]

© Daphne Koller, 2003 Exploiting Structure: Reuse Generic objects often play the same role in the model, e.g. multiple students that all take the same class. Reuse: compute the interface once. Combinatorics: compute the total contribution to the probability in closed form: P(k students like the class | teaching-ability = low) = P(generic student likes the class | teaching-ability = low)^k. [Pfeffer & K. 1998]
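The closed-form trick is worth one line of arithmetic: k exchangeable generic students contribute a single factor raised to the k-th power, instead of k separate inference passes. The probability value here is hypothetical.

```python
p_like = 0.2   # hypothetical P(generic student likes the class | ability = low)
k = 5

# Naive approach: one inference pass (here, one factor) per student.
naive = 1.0
for _ in range(k):
    naive *= p_like

# Combinatorics: one pass, then a power.
closed_form = p_like ** k

assert abs(naive - closed_form) < 1e-15
```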

© Daphne Koller, 2003 Case Study Battlefield situation assessment for missile units: several locations, many units, each with a detailed model. Example object classes: Battalion, Battery, Vehicle, Location, Weather. Example relations: At-Location, Has-Weather, Sub-battery/In-battalion, Sub-vehicle/In-battery.

© Daphne Koller, 2003 Effect of Exploiting Structure [Chart: running time in seconds vs. #vehicles of each type in each battery, for flat BN, no reuse, with reuse, and with combinatorics.] Query: P(Next-Mission). The flat BN has ≈2400 nodes; the costs shown range from ≈20 minutes down to ≈9 sec. [Pfeffer & K. 1998]

© Daphne Koller, 2003 PRM Learning

© Daphne Koller, 2003 Outline Relational Bayesian networks Likelihood function ML parameter estimation EM Structure learning Relational Markov networks Parameter estimation Applications: Collective classification – web data Relational clustering – biological data

© Daphne Koller, 2003 PRM Learning: Input Input: a full world ω. [Figure: the St. Nordaf world with all objects, relations, and attribute values observed.]

© Daphne Koller, 2003 Likelihood Function Likelihood of a BN with shared parameters: the joint likelihood is a product of likelihood terms, one for each attribute X.A and its family. For each X.A, the likelihood function aggregates counts from all occurrences x.A in the world ω. [Friedman, Getoor, K., Pfeffer, 1999]

© Daphne Koller, 2003 Likelihood Function: Multinomials Sufficient statistics: N[a,u] = the number of objects x in class X with x.A = a and parent values u, aggregated over all of ω. Log-likelihood: ℓ(θ : ω) = Σ_{X.A} Σ_{a,u} N[a,u] log θ_{a|u}.

© Daphne Koller, 2003 RBN Parameter Estimation MLE parameters: θ_{a|u} = N[a,u] / Σ_{a'} N[a',u]. Bayesian estimation: a prior for each attribute X.A; the posterior uses the aggregated sufficient statistics.
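Shared-parameter MLE amounts to pooling counts over every object in the class and normalizing per parent configuration. A sketch on hypothetical registration data:

```python
from collections import Counter, defaultdict

# (difficulty, intelligence, grade) for each registration in the world omega.
data = [
    ("easy", "smart", "A"), ("easy", "smart", "A"),
    ("hard", "smart", "B"), ("easy", "weak", "B"),
    ("easy", "weak", "B"),  ("hard", "weak", "C"),
]

# Aggregated sufficient statistics N[a, u]: counts pooled across all objects.
counts = defaultdict(Counter)
for difficulty, intelligence, grade in data:
    counts[(difficulty, intelligence)][grade] += 1

# MLE: normalize each parent configuration u separately.
theta = {u: {g: n / sum(c.values()) for g, n in c.items()}
         for u, c in counts.items()}
```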

© Daphne Koller, 2003 Learning w. Missing Data EM Algorithm applies essentially unchanged E-step computes expected sufficient statistics, aggregated over all objects in class M-step uses ML (or MAP) parameter estimation Key difference: In general, the hidden variables are not independent Computation of expected sufficient statistics requires inference over entire network [Same reasoning as for forward-backward algorithm in temporal models]

© Daphne Koller, 2003 Learning w. Missing Data: EM [Figure: EM iterations for P(Registration.Grade | Course.Difficulty, Student.Intelligence), with hidden Course difficulty (easy/hard) and Student intelligence (low/high) and grades A/B/C.] [Dempster et al. 77]

© Daphne Koller, 2003 Learning RBN Structure Define the set of legal RBN structures: those with legal class dependency graphs. Define a scoring function, e.g. the Bayesian score: a product of family scores, one for each X.A, using the aggregated sufficient statistics. Search for a high-scoring legal structure. [Friedman, Getoor, K., Pfeffer, 1999]

© Daphne Koller, 2003 Learning RBN Structure All operations done at class level Dependency structure = parents for X.A Acyclicity checked using class dependency graph Score computed at class level Individual objects only contribute to sufficient statistics Can be obtained efficiently using standard DB queries

© Daphne Koller, 2003 Exploiting Locality: Phased Search Phase 0: consider only dependencies within a class. [Figure: candidate moves over Student, Course, Reg with their Δscores, e.g. Add C.A → C.B, Delete S.I → S.P.]

© Daphne Koller, 2003 Exploiting Locality: Phased Search Phase 1: consider dependencies one link away. [Figure: candidate moves with their Δscores, e.g. Add C.A → R.B, Add S.I → R.C.]

© Daphne Koller, 2003 Exploiting Locality: Phased Search Phase k: consider dependencies k links away. [Figure: candidate moves with their Δscores, e.g. Add C.A → S.P, Add S.I → C.B.]

© Daphne Koller, 2003 Exploiting Locality: Phased Search [Figure: a learned dependency structure over Professor (Popularity, Teaching-Ability), Student (Intelligence, Honors), Course (Difficulty, Rating), and Reg (Grade, Satisfaction).]

© Daphne Koller, 2003 TB Patients in San Francisco [Getoor, Rhee, K., Small, 2001] Patients Contacts Strains

© Daphne Koller, 2003 TB Patients in San Francisco Input: a relational database with 2300 patients, 1000 strains, and contacts. Output: a PRM for the domain. [Figure: learned model over Strain (Outbreak), Patient (Suscept, Age, POB, TB-type, XRay, Race, CareLoc, HIV, Smear, Gender), and Contact (Infected, CloseCont, Relation, Treatment, Subcase, CareLoc).]

© Daphne Koller, 2003 Outline Relational Bayesian networks Likelihood function ML parameter estimation EM Structure learning Relational Markov networks Parameter estimation Applications: Collective classification – web data Relational clustering – biological data

© Daphne Koller, 2003 Learning RMN Parameters [Figure: the study-group template with potential φ(Reg.Grade, Reg2.Grade) over Student, Course, and Reg.] Parameterize the potentials as a log-linear model. [Taskar, Wong, Abbeel, K., 2002]

© Daphne Koller, 2003 Learning RMN Parameters For example: f_AA(ω) = # of tuples {reg r1, reg r2, group g} s.t. In(g, r1), In(g, r2) and r1.Grade = A, r2.Grade = A. The likelihood involves these feature counts in ω and the partition function.

© Daphne Koller, 2003 Learning RMN Parameters Parameter estimation is not closed form, but the problem is convex, with a unique global maximum, so we can use methods such as conjugate gradient. The gradient for each feature is (actual count − expected count): the optimization tries to find parameters s.t. expected counts = actual counts. Computing the expected counts requires inference over the ground Markov network.
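The gradient identity (actual count minus expected count) can be checked on a tiny toy network where the expectation is computable by enumeration. This sketch uses plain gradient ascent rather than conjugate gradient, and the model (one feature f = [x == y] over two binary variables) is invented for illustration:

```python
import math
from itertools import product

def expected_count(w):
    """E[f] under the log-linear model P(x,y) proportional to exp(w * f(x,y))."""
    configs = list(product([0, 1], repeat=2))
    scores = [math.exp(w * (x == y)) for x, y in configs]
    Z = sum(scores)                     # partition function, by enumeration
    return sum(s / Z for (x, y), s in zip(configs, scores) if x == y)

actual = 0.8          # hypothetical empirical frequency of the feature
w = 0.0
for _ in range(500):  # gradient ascent: dL/dw = actual - expected
    w += 0.5 * (actual - expected_count(w))
```

At convergence the expected count matches the actual count, exactly as the slide states; in a real RMN the enumeration is replaced by inference over the ground network.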

© Daphne Koller, 2003 Learning RMNs [Figure: maximizing L = log P(Grades, Intelligence, Difficulty) over the potentials φ(Reg1.Grade, Reg2.Grade), with Difficulty easy/hard, Intelligence low/high, and Grades A/B/C.]

© Daphne Koller, 2003 Outline Relational Bayesian networks Likelihood function ML parameter estimation EM Structure learning Relational Markov networks Parameter estimation Applications: Collective classification – web data Relational clustering – biological data

© Daphne Koller, 2003 Collective Classification [Pipeline: Training Data → Learning → Probabilistic Relational Model (Course, Student, Reg) → Inference on New Data → Conclusions.] Example: train on one year of student intelligence, course difficulty, and grades; given only the grades in the following year, predict all students' intelligence.

© Daphne Koller, 2003 Discriminative Training Goal: given values of the observed variables ω.O = o, predict the desired target values ω.T = t*. We do not necessarily want the model to fit the joint distribution P(ω.O = o, ω.T = t*). To maximize classification accuracy, we can consider other optimization criteria: maximize the conditional log-likelihood, or maximize the margin over the second-highest-probability label. [Taskar, Abbeel, K., 2002; Taskar, Guestrin, K. 2003]

© Daphne Koller, 2003 Web  KB Tom Mitchell Professor WebKB Project Sean Slattery Student Advisor-of Project-of Member [Craven et al.]

© Daphne Koller, 2003 Web Classification Experiments WebKB dataset Four CS department websites Bag of words on each page Links between pages Anchor text for links Experimental setup Trained on three universities Tested on fourth Repeated for all four combinations

© Daphne Koller, 2003 Standard Classification [Figure: a Page with Category and words Word1 … WordN, e.g. "professor", "department", "extract information", "computer science", "machine learning" …] Categories: faculty, course, project, student, other. Discriminatively trained model = logistic regression.

© Daphne Koller, 2003 Standard Classification [Figure: the logistic model extended with link-word features LinkWord1 … LinkWordN from anchor text, e.g. "working with Tom Mitchell …".]
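The flat baseline on these slides can be sketched as a tiny logistic-regression classifier over bag-of-words features, trained by gradient ascent. The documents and vocabulary below are hypothetical, and this is a toy stand-in for the actual WebKB setup:

```python
import math

# Toy training data: (bag-of-words features, label); 1 = faculty, 0 = course.
docs = [
    ({"professor": 1, "research": 1}, 1),
    ({"homework": 1, "course": 1}, 0),
]

w = {}  # one weight per word, initialized lazily to 0
for _ in range(500):
    for feats, y in docs:
        z = sum(w.get(f, 0.0) * v for f, v in feats.items())
        p = 1 / (1 + math.exp(-z))               # sigmoid
        for f, v in feats.items():               # gradient ascent step
            w[f] = w.get(f, 0.0) + 0.5 * (y - p) * v

def predict(feats):
    """P(label = faculty | words)."""
    z = sum(w.get(f, 0.0) * v for f, v in feats.items())
    return 1 / (1 + math.exp(-z))
```

The collective model on the following slides keeps these word features but adds link potentials so that the labels of connected pages are predicted jointly rather than independently.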

© Daphne Koller, 2003 Power of Context Professor? Student? Post-doc?

© Daphne Koller, 2003 Collective Classification [Figure: two linked Pages, each with Category and words Word1 … WordN; a Link with From and To, and a compatibility potential φ(From.Category, To.Category).]

© Daphne Koller, 2003 Collective Classification Classify all pages collectively, maximizing the joint label probability. [Figure: logistic word features on each page plus link potentials between page categories.]

© Daphne Koller, 2003 More Complex Structure [Figure: a page category C with words W1 … Wn, linked to Faculty, Students, and Courses sections.]

© Daphne Koller, 2003 Collective Classification: Results [Chart: error rates for the Logistic, Links, Section, and Link+Section models.] 35.4% relative reduction in error relative to the strong flat approach. [Taskar, Abbeel, K., 2002]

© Daphne Koller, 2003 Relational Clustering Clustering of instances. [Pipeline: Unlabeled Relational Data → Learning → Probabilistic Relational Model (Course, Student, Reg).] Example: given only students' grades, cluster similar students.

© Daphne Koller, 2003 Movie Data Internet Movie Database

© Daphne Koller, 2003 Discovering Hidden Types [Figure: Actor, Director, and Movie classes, each with a hidden Type attribute; Movie also has Genres, Rating, Year, #Votes, MPAA Rating.] [Taskar, Segal, K., 2001]

© Daphne Koller, 2003 Directors Steven Spielberg Tim Burton Tony Scott James Cameron John McTiernan Joel Schumacher Alfred Hitchcock Stanley Kubrick David Lean Milos Forman Terry Gilliam Francis Coppola Actors Anthony Hopkins Robert De Niro Tommy Lee Jones Harvey Keitel Morgan Freeman Gary Oldman Sylvester Stallone Bruce Willis Harrison Ford Steven Seagal Kurt Russell Kevin Costner Jean-Claude Van Damme Arnold Schwarzenegger … Movies Wizard of Oz Cinderella Sound of Music The Love Bug Pollyanna The Parent Trap Mary Poppins Swiss Family Robinson … Terminator 2 Batman Batman Forever GoldenEye Starship Troopers Mission: Impossible Hunt for Red October Discovering Hidden Types

© Daphne Koller, 2003 Biology 101: Pathways Pathways are sets of genes that act together to achieve a common function

© Daphne Koller, 2003 Biology 101: Expression [Figure: DNA → RNA → Protein; genes with coding and control regions; the transcription factor Swi5 binds a control region.] Cells express different subsets of their genes in different tissues and under different conditions.

© Daphne Koller, 2003 Finding Pathways: Attempt I Use protein-protein interaction data. [Figure: an interaction graph partitioned into Pathway I, Pathway II, Pathway III.]

© Daphne Koller, 2003 Finding Pathways: Attempt I Use protein-protein interaction data Problems: Data is very noisy Structure is lost: Large connected component (3527/3589 genes) in interaction graph

© Daphne Koller, 2003 Finding Pathways: Attempt II Use gene expression data Thousands of arrays available under different conditions Clustering Pathway I Pathway II

© Daphne Koller, 2003 Finding Pathways: Attempt II Use gene expression data Thousands of arrays available under different conditions Problems: Expression is only a ‘weak’ indicator of interaction Data is noisy Interacting pathways are not separable

© Daphne Koller, 2003 Finding Pathways: Our Approach Use both types of data to find pathways Find “active” interactions using gene expression Find pathway-related co-expression using interactions Pathway I Pathway II Pathway III Pathway IV

© Daphne Koller, 2003 Probabilistic Model Genes are partitioned into “pathways”: Every gene is assigned to one of ‘k’ pathways Random variable for each gene with domain {1,…,k} Expression component: Model likelihood is higher when genes in the same pathway have similar expression profiles Interaction component: Model likelihood is higher when genes in the same pathway interact [Segal, Wang, K., 2003]

© Daphne Koller, 2003 Expression Component Naïve Bayes model: the pathway variable g.C of gene g is the parent of g.E1, g.E2, …, g.Em, the expression levels of gene g in m arrays
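The naïve-Bayes expression component can be sketched as a per-pathway Gaussian likelihood over a gene's expression vector. This is an illustrative reconstruction, not the authors' code; the pathway parameters and gene names below are made up.

```python
import math

def expression_log_likelihood(expr, mean, var):
    """Naive-Bayes expression component: given a gene's pathway
    assignment g.C, each array's expression level g.Ei is an
    independent Gaussian with that pathway's parameters.
    expr, mean, var are equal-length lists (one entry per array)."""
    ll = 0.0
    for e, m, v in zip(expr, mean, var):
        ll += -0.5 * math.log(2 * math.pi * v) - (e - m) ** 2 / (2 * v)
    return ll

# Hypothetical per-pathway (mean, variance) parameters for 3 arrays:
pathway_params = {
    1: ([0.0, 0.0, 0.0], [1.0, 1.0, 1.0]),
    2: ([2.0, 2.0, 2.0], [1.0, 1.0, 1.0]),
}
gene_expr = [1.9, 2.1, 2.0]
best = max(pathway_params,
           key=lambda c: expression_log_likelihood(gene_expr, *pathway_params[c]))
# the gene's profile is closest to pathway 2's mean
```

In the full model this per-gene likelihood is only one factor; the interaction component on the next slides couples the assignments of different genes.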

© Daphne Koller, 2003 Protein Interaction Component Diagram: Gene 1 and Gene 2 with pathway variables g1.C and g2.C, connected when their protein products interact Interacting genes are more likely to be in the same pathway Compatibility potential φ(g1.C, g2.C): higher weight when g1.C = g2.C
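One way to realize this compatibility potential is a same/different table. The weight alpha is an assumed parameter; the slide only states that same-pathway assignments of interacting genes score higher.

```python
def compatibility(c1, c2, alpha=2.0):
    """Compatibility potential phi(g1.C, g2.C) on a protein-protein
    interaction edge: a shared pathway assignment gets weight
    alpha > 1, differing assignments get weight 1.
    alpha = 2.0 is illustrative, not a value from the paper."""
    return alpha if c1 == c2 else 1.0
```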

© Daphne Koller, 2003 Joint Probabilistic Model Diagram: Markov network over Genes 1–4; each gene's pathway variable gi.C is connected to its expression variables gi.E1, gi.E2, gi.E3 and, via compatibility potentials φ, to the pathway variables of interacting genes

© Daphne Koller, 2003 Learning Task E-step: compute pathway assignments M-step: estimate Gaussian distribution parameters; estimate compatibility potentials using cross-validation Large Markov network with high connectivity ⇒ use loopy belief propagation
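The inference step can be sketched as sum-product loopy belief propagation on the pairwise Markov network of pathway variables. This is a minimal illustrative implementation, not the authors' code; the EM loop and cross-validated potential estimation are omitted, and all names and numbers are made up.

```python
from collections import defaultdict

def loopy_bp(node_pot, edges, edge_pot, iters=20):
    """Sum-product loopy belief propagation on a pairwise Markov net.
    node_pot[i]  : unnormalized potentials over the k pathways
                   (here: the gene's expression likelihoods)
    edges        : undirected (i, j) pairs (protein interactions)
    edge_pot(a,b): pairwise compatibility of assignments a and b
    Returns approximate marginal beliefs for every node."""
    k = len(next(iter(node_pot.values())))
    # msgs[(i, j)][b]: message from node i to node j about j taking value b
    msgs = {(i, j): [1.0] * k for a, b in edges for i, j in ((a, b), (b, a))}
    nbrs = defaultdict(list)
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            out = []
            for b in range(k):
                s = 0.0
                for a in range(k):
                    prod = node_pot[i][a] * edge_pot(a, b)
                    for n in nbrs[i]:
                        if n != j:       # all incoming messages except j's
                            prod *= msgs[(n, i)][a]
                    s += prod
                out.append(s)
            z = sum(out)
            new[(i, j)] = [x / z for x in out]
        msgs = new
    beliefs = {}
    for i in node_pot:
        bel = list(node_pot[i])
        for n in nbrs[i]:
            for a in range(k):
                bel[a] *= msgs[(n, i)][a]
        z = sum(bel)
        beliefs[i] = [x / z for x in bel]
    return beliefs

# Two interacting genes, k = 2 pathways: gene 1 has strong expression
# evidence for pathway 0, gene 2 is uninformative.
pot = {1: [0.9, 0.1], 2: [0.5, 0.5]}
edge = lambda a, b: 2.0 if a == b else 1.0
bel = loopy_bp(pot, [(1, 2)], edge)
# gene 2's belief is pulled toward pathway 0 by its interacting neighbor
```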

© Daphne Koller, 2003 Capturing Protein Complexes Evaluated against an independent data set of interacting proteins Chart: complex coverage (%) vs. number of complexes, for our method and standard expression clustering 124 complexes covered at 50% by our method 46 complexes covered at 50% by clustering

© Daphne Koller, 2003 RNAse Complex Pathway Recovered pathway: YHR081W, RRP40, RRP42, MTR3, RRP45, RRP4, RRP43, DIS3, TRM7, SKI6, RRP46, CSL4 Includes all 10 known pathway genes Only 5 genes found by clustering

© Daphne Koller, 2003 Interaction Clustering RNAse complex found by interaction clustering as part of cluster with 138 genes

© Daphne Koller, 2003 Uncertainty about Domain Structure or: PRMs are not just template BNs/MNs

© Daphne Koller, 2003 Structural Uncertainty Class uncertainty: to which class does an object belong? Relational uncertainty: what is the relational (link) structure? Identity uncertainty: which “names” refer to the same objects? Also covers data association

© Daphne Koller, 2003 Relational Uncertainty Probability distribution over graph structures Link existence model E.g., hyperlinks between webpages Each potential link is a separate event Its existence is a random variable Link reference model E.g., instructors teaching a course Distribution over # of endpoints for each outgoing link Each link has a distribution over its endpoint e.g., the instructor link for a CS course likely points to a CS prof Many other models possible [Getoor, Friedman, K., Taskar, 2002]

© Daphne Koller, 2003 Link Existence Model Background knowledge is an object skeleton: a set of entity objects PRM defines a distribution over worlds: assignments of values to all attributes, and existence of links between objects [Getoor, Friedman, K., Taskar, 2002]

© Daphne Koller, 2003 Link Existence as a PRM Define objects for any potential links Potential link object for any pair of webpages w1, w2 Each potential link object has a link existence attribute, denoting whether the link exists or not Link existence variables have a probabilistic model: a CPD over Exists (True/False) given From.Type and To.Type (Student or Professor) [Getoor, Friedman, K., Taskar, 2002]
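The link existence model can be sketched as a CPD keyed on the two endpoint types, sampled once per potential-link object. The probabilities and page names below are illustrative assumptions, not values from the paper.

```python
import random

# Hypothetical CPD for the Exists attribute of a potential-link object,
# conditioned on the types of its two endpoint pages (numbers made up):
EXISTS_CPD = {
    ("Professor", "Student"): 0.30,
    ("Student", "Professor"): 0.20,
    ("Professor", "Professor"): 0.10,
    ("Student", "Student"): 0.05,
}

def sample_links(pages, cpd, rng):
    """Treat every ordered pair of distinct pages as a potential-link
    object and sample its Exists attribute from the type-conditioned
    CPD. pages: list of (name, type) pairs."""
    links = []
    for w1, t1 in pages:
        for w2, t2 in pages:
            if w1 != w2 and rng.random() < cpd[(t1, t2)]:
                links.append((w1, w2))
    return links

pages = [("p1", "Professor"), ("s1", "Student"), ("s2", "Student")]
links = sample_links(pages, EXISTS_CPD, random.Random(0))
```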

© Daphne Koller, 2003 Why Are Link Models Useful? Predict which links exist in the world Which professor teaches each course Which student will register for which course Use known links to infer values of attributes Given that a student registered for a hard class, is she more likely to be a graduate student? Given that one page points to another, is it more likely to be a faculty page?

© Daphne Koller, 2003 Predicting Relationships Predict and classify relationships between objects Example entities: Tom Mitchell (Professor), Sean Slattery (Student), WebKB Project; relationships: Advisor-of, Member

© Daphne Koller, 2003 Flat Model Diagram: relation variable Rel with values NONE, advisor, instructor, TA, member, project-of, predicted from the words (Word 1 … Word N) of the From-page, the words of the To-page, and the link words (LinkWord 1 … LinkWord N)

© Daphne Koller, 2003 Collective Classification: Links Diagram: as in the flat model, but the relation variable Rel is also connected to the Category of the From- and To-pages [Taskar, Wong, Abbeel, K., 2002]

© Daphne Koller, 2003 Link Model...

© Daphne Koller, 2003 Triad Model Diagram: Professor, Student, Group; relations: Advisor, Member

© Daphne Koller, 2003 Triad Model Diagram: Professor, Student, Course; relations: Advisor, TA, Instructor

© Daphne Koller, 2003 Link Prediction: Results Error measured over links predicted to be present Link presence cutoff is at the precision/recall break-even point (≈ 30% for all models) Relative reduction in error relative to the strong flat approach [Taskar, Wong, Abbeel, K., 2002]

© Daphne Koller, 2003 Identity Uncertainty Model Background knowledge is an object universe: a set of potential objects PRM defines a distribution over worlds: assignments of values to object attributes, and a partition of objects into equivalence classes Objects in the same class have the same attribute values [Pasula, Marthi, Milch, Russell, Shpitser, 2002]

© Daphne Koller, 2003 Citation Matching Model (simplified) Each citation object is associated with a paper object Uncertainty over equivalence classes for papers If P1 = P2, they have the same attributes & links Diagram: Citation (ObsTitle, Text, Author-as-Cited Name) linked to Paper (Title, PubType, Authors) via Appears-in, Refers-to, Written-by; link chain: Appears-in.Refers-to.Written-by

© Daphne Koller, 2003 Identity Uncertainty Depending on choice of equivalence classes: Number of objects changes Dependency structure changes No “nice” corresponding ground BN Algorithm: Each partition hypothesis defines simple BN Use MCMC over equivalence class partition Exact inference over resulting BN defines acceptance probability for Markov chain [Pasula, Marthi, Milch, Russell, Shpitser, 2002]
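The MCMC step over equivalence-class partitions can be sketched as Metropolis-Hastings where each proposal moves one citation to another (or a fresh) class. Here `score` is a stand-in for the exact-inference probability of the ground BN a partition induces; the toy titles and scoring function are made up for illustration.

```python
import random
from itertools import combinations

def mcmc_partition(items, score, steps, rng):
    """Metropolis-Hastings over partitions of `items` into equivalence
    classes (hypothesized co-referring citations). `score(partition)`
    may be any positive unnormalized score; in the real algorithm it
    comes from exact inference in the BN the partition defines."""
    part = [{x} for x in items]            # start: every citation separate
    cur = score(part)
    for _ in range(steps):
        prop = [set(c) for c in part]
        src = rng.randrange(len(prop))
        x = rng.choice(sorted(prop[src]))  # move one citation...
        prop[src].discard(x)
        dst = rng.randrange(len(prop) + 1) # ...to another or a new class
        if dst == len(prop):
            prop.append(set())
        prop[dst].add(x)
        prop = [c for c in prop if c]      # drop emptied classes
        new = score(prop)
        if new >= cur or rng.random() < new / cur:
            part, cur = prop, new          # accept the proposed partition
    return part

# Toy score: reward co-clustered citation pairs with matching titles.
titles = {"c1": "prm", "c2": "prm", "c3": "bn"}
def score(part):
    s = 1.0
    for cls in part:
        for a, b in combinations(sorted(cls), 2):
            s *= 4.0 if titles[a] == titles[b] else 0.1
    return s

result = mcmc_partition(sorted(titles), score, 200, random.Random(1))
```

The simplified acceptance rule above ignores the proposal's asymmetry; a full Metropolis-Hastings implementation would include the proposal-ratio correction.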

© Daphne Koller, 2003 Identity Uncertainty Results Accuracy of citation recovery: % of actual citation clusters recovered perfectly [Pasula, Marthi, Milch, Russell, Shpitser, 2002] 61.5% relative reduction in error relative to state of the art

© Daphne Koller, 2003 Summary: PRMs … Inherit the advantages of graphical models: Coherent probabilistic semantics Exploit structure of local interactions Allow us to represent the world in terms of: Objects Classes of objects Properties of objects Relations

© Daphne Koller, 2003 So What Do We Gain? Convenient language for specifying complex models “Web of influence”: subtle & intuitive reasoning A mechanism for tying parameters and structure within models and across models Framework for learning from relational and heterogeneous data

© Daphne Koller, 2003 So What Do We Gain? New way of thinking about models & problems Incorporating heterogeneous data by connecting related entities New problems: Collective classification Relational clustering Uncertainty about richer structures: Link graph structure Identity

© Daphne Koller, 2003 But What Do We Really Gain? Simple PRMs ≈ relational logic with a fixed domain and universal quantification only Induce a “propositional” BN Can augment the language with additional expressivity: Existential quantifiers & functions Equality The resulting language is inherently more expressive, allowing us to represent distributions over worlds where dependencies vary significantly [Getoor et al., Pasula et al.] worlds with different numbers of objects [Pfeffer et al., Pasula et al.] worlds with infinitely many objects [Pfeffer & K.] Big questions: Inference & Learning Are PRMs just a convenient language for specifying attribute-based graphical models?

© Daphne Koller, 2003 Pieter Abbeel, Nir Friedman, Lise Getoor, Carlos Guestrin, Brian Milch, Avi Pfeffer, Eran Segal, Ken Takusagawa, Ben Taskar, Haidong Wang, Ming-Fai Wong