Probabilistic Models of Object-Relational Domains


Probabilistic Models of Object-Relational Domains
Daphne Koller, Stanford University
Joint work with: Lise Getoor, Ben Taskar, Drago Anguelov, Nir Friedman, Pieter Abbeel, Rahul Biswas, Avi Pfeffer, Ming-Fai Wong, Evan Parker, Eran Segal

Bayesian Networks: Problem

Bayesian nets use a propositional representation, but the real world has objects related to each other, and these "instances" are not independent.

[Figure: ground-network fragments with nodes such as Intell_Jane, Intell_George, Diffic_CS101, Diffic_Geo101, and the corresponding grade variables, e.g., Grade_Jane_CS101 = A, Grade_George_CS101 = C.]

One of the key benefits of the propositional representation was our ability to represent our knowledge without explicitly enumerating the worlds. It turns out that we can do the same in the probabilistic framework. The key idea, introduced by Pearl in the Bayesian network framework, is to use locality of interaction, an assumption that seems to be a fairly good approximation of the world in many cases. However, this representation suffers from the same problem as other propositional representations: we have to create separate representational units (propositions) for the different entities in our domain, and these instances are not independent. For example, the difficulty of CS101 in one network is the same difficulty as in another, so evidence in one network should influence our beliefs in the other.

Probabilistic Relational Models

Combine advantages of relational logic & BNs:
- Natural domain modeling: objects, properties, relations
- Generalization over a variety of situations
- Compact, natural probability models

Integrate uncertainty with the relational model:
- Properties of domain entities can depend on properties of related entities
- Uncertainty over the relational structure of the domain

St. Nordaf University

[Figure: the St. Nordaf domain. Prof. Jones and Prof. Smith (Teaching-ability) teach Geo101 and CS101 (Difficulty); George and Jane (Intelligence) are registered in the courses, and each registration record carries a Grade and a Satisfaction attribute.]

Let us consider an imaginary university called St. Nordaf. St. Nordaf has two faculty, two students, two courses, and three registrations of students in courses, each of which is associated with a registration record. These objects are linked to each other: professors teach classes, students register in classes, etc. Each of the objects in the domain has properties that we care about.

Relational Schema

Specifies the types of objects in the domain, the attributes of each type of object, and the types of relations between objects.

[Figure: schema diagram. Classes: Professor (Teaching-Ability), Student (Intelligence), Course (Difficulty), Registration (Grade, Satisfaction). Relations: Teach, Take, In.]

The university can be described in an organized form using a relational database. The schema of a database tells us what types of objects we have, which attributes of these objects are of interest, and how the objects can relate to each other.

Representing the Distribution

- Very large probability space for a given context: all possible assignments of all attributes of all objects
- Infinitely many potential contexts, each associated with a very different set of worlds
- Need to represent an infinite set of complex distributions

Unfortunately, this is a very large number of worlds, and to specify a probabilistic model we would have to specify a probability for each one. Furthermore, the resulting distribution would be good only for a limited time. If St. Nordaf hired a new faculty member, or got a new student, or even if the two students registered for different classes next year, it would no longer apply, and St. Nordaf would have to pay Acme Consulting all over again. We want a model that holds for the infinitely many potential universities consistent with this very simple schema. So we are stuck with what seems to be an impossible problem: how do we represent an infinite set of possible distributions, each of which is by itself very complex?

Probabilistic Relational Models

- Universals: probabilistic patterns hold for all objects in a class
- Locality: represent direct probabilistic dependencies; links define potential interactions

[Figure: class-level template over Professor (Teaching-Ability), Student (Intelligence), Course (Difficulty), and Reg (Grade, Satisfaction); Reg.Grade depends on Student.Intelligence and Course.Difficulty.]

The two key ideas that come to our rescue derive from the two approaches that we are trying to combine. From relational logic, we have the notion of universal patterns, which hold for all objects in a class. From Bayesian networks, we have the notion of locality of interaction, which in the relational case has a particular twist: links give us precisely a notion of "interaction", and thereby provide a roadmap for which objects can interact with each other. In this example, we have a template, like a universal quantifier for a probabilistic statement. It tells us: "For any registration record in my database, the grade of the student in the course depends on the intelligence of that student and the difficulty of that course." This dependency will be instantiated for every object (of the right type) in our domain. It is also associated with a conditional probability distribution that specifies the nature of that dependence. We can also have dependencies over several links, e.g., the satisfaction of a student on the teaching ability of the professor who teaches the course. [K. & Pfeffer; Poole; Ngo & Haddawy]

PRM Semantics

Instantiated PRM → ground Bayesian network:
- variables: attributes of all objects
- dependencies: determined by the links & the PRM

[Figure: ground network over the St. Nordaf objects: Prof. Jones and Prof. Smith (Teaching-ability), Geo101 and CS101 (Difficulty), George and Jane (Intelligence), with Grade and Satisfaction variables for each registration.]

The semantics of this model is that it generates an appropriate probabilistic model for any domain that we might encounter. The instantiation is the embodiment of universals on the one hand, and of locality of interaction, as specified by the links, on the other.
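To make the instantiation step concrete, here is a minimal sketch (in Python, with hypothetical names; not the authors' implementation) of how the template dependency "Reg.Grade depends on Student.Intelligence and Course.Difficulty" would be unrolled into ground-network variables for a particular set of registration records:

```python
# Minimal sketch: unrolling a PRM template into ground-BN variables.
# All names (registrations, unroll_grade_dependency, ...) are hypothetical.

registrations = [
    {"student": "George", "course": "CS101"},
    {"student": "George", "course": "Geo101"},
    {"student": "Jane",   "course": "Geo101"},
]

def unroll_grade_dependency(registrations):
    """For each registration, create a Grade variable whose parents are the
    Intelligence of that student and the Difficulty of that course."""
    ground_dependencies = {}
    for reg in registrations:
        child = f"Grade_{reg['student']}_{reg['course']}"
        parents = [f"Intelligence_{reg['student']}",
                   f"Difficulty_{reg['course']}"]
        ground_dependencies[child] = parents
    return ground_dependencies

for child, parents in unroll_grade_dependency(registrations).items():
    print(child, "<-", parents)
```

The "universals" idea corresponds to attaching the same shared CPD to every ground Grade variable produced this way.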

The Web of Influence

[Figure: posterior probabilities over course difficulty (easy/hard) and student intelligence (low/high) shifting as grade evidence (A's and C's) accumulates across CS101 and Geo101.]

This web of influence has interesting ramifications from the perspective of the types of reasoning patterns that it supports. Consider Forrest Gump. A priori, we believe that he is pretty likely to be smart. Evidence about two classes that he took changes our probabilities only very slightly. However, we see that most people who took CS101 got A's. In fact, even people who did fairly poorly in other classes got an A in CS101. Therefore, we believe that CS101 is probably an easy class. To get a C in an easy class is unlikely for a smart student, so our probability that Forrest Gump is smart goes down substantially.
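As a toy illustration of this "explaining away" effect (with made-up numbers, not taken from the talk), one can compute how the posterior over a student's intelligence drops once the course is believed to be easy:

```python
# Toy explaining-away computation with invented CPT numbers,
# only to illustrate the qualitative reasoning pattern on the slide.

p_smart = 0.7                       # prior P(Intelligence = high)
# P(Grade = C | Intelligence, Difficulty) -- made-up values
p_c_given = {("high", "easy"): 0.05, ("high", "hard"): 0.30,
             ("low",  "easy"): 0.40, ("low",  "hard"): 0.70}

def posterior_smart(difficulty):
    """P(Intelligence = high | Grade = C, Difficulty = difficulty)."""
    num = p_smart * p_c_given[("high", difficulty)]
    den = num + (1 - p_smart) * p_c_given[("low", difficulty)]
    return num / den

print("P(smart | C, hard) =", round(posterior_smart("hard"), 3))   # ~0.50
print("P(smart | C, easy) =", round(posterior_smart("easy"), 3))   # ~0.23
# Once other students' A's convince us the course is easy, the C becomes
# much stronger evidence that the student is not smart.
```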

Reasoning with a PRM

Generic approach:
- Instantiate the PRM to produce a ground BN
- Use standard BN inference

In most cases, the resulting BN is too densely connected to allow exact inference, so use approximate inference: belief propagation.

Improvement: use the domain structure (objects & relations) to guide computation, e.g., a Kikuchi approximation where clusters = objects.

Data → Model → Objects

[Figure: pipeline. A database (Course, Student, Reg) and expert knowledge feed a learner, which produces a probabilistic model; the model, together with data for a new situation, feeds probabilistic inference, which answers: What are the objects in the new situation? How are they related to each other?]

Two Recent Instantiations

From a relational dataset with objects & links, classify objects and predict relationships:
- Target application: recognize terrorist networks
- Actual application: from webpages to a database

From raw sensor data to categorized objects:
- Laser data acquired by a robot
- Extract objects, with their static & dynamic properties
- Discover classes of similar objects

Summary

PRMs inherit key advantages of probabilistic graphical models:
- Coherent probabilistic semantics
- Exploit the structure of local interactions

Relational models are inherently more expressive:
- "Web of influence": use multiple sources of information to reach conclusions
- Exploit both relational information and the power of probabilistic reasoning

Discriminative Probabilistic Models for Relational Data
Ben Taskar, Stanford University
Joint work with: Pieter Abbeel, Daphne Koller, Ming-Fai Wong

Web → KB

[Figure: extracted knowledge-base fragment. Tom Mitchell (Professor), WebKB (Project), Sean Slattery (Student), connected by Advisor-of, Project-of, and Member links.]

Of course, data is not always so nicely arranged for us as in a relational database. Let us consider the biggest source of data: the world wide web. Consider the webpages in a computer science department. Here is one webpage, which links to another. This second webpage links to a third, which links back to the first two. There is also a webpage with a lot of outgoing links to webpages on this site. This is not nice clean data. Nobody labels these webpages for us and tells us what they are. We would like to learn to understand this data, and conclude from it that we have a "Professor Tom Mitchell", one of whose interests is a project called "WebKB". "Sean Slattery" is one of the students on the project, and Professor Mitchell is his advisor. Finally, Tom Mitchell is a member of the CS CMU faculty, which contains many other faculty members. How do we get from the raw data to this type of analysis? [Craven et al.]

Undirected PRMs: Relational Markov Nets

- Universals: probabilistic patterns hold for all groups of objects
- Locality: represent local probabilistic dependencies

Address limitations of directed models:
- Increase expressive power by removing the acyclicity constraint
- Improve predictive performance through discriminative training

[Figure: template potential over a study group. The grades of two students (Reg.Grade, Reg2.Grade) in the same course are coupled, along with their Intelligence and the course Difficulty.]

[Taskar, Abbeel, Koller '02]

RMN Semantics

Instantiated RMN → ground Markov network:
- variables: attributes of all objects
- dependencies: determined by the links & the RMN

[Figure: ground Markov network over George, Jane, and Jill, whose grades in Geo101 and CS101 are coupled through the Geo and CS study groups.]

Learning RMNs

- Parameter estimation is not closed form
- Convex problem → unique global maximum
- Maximize L = log P(Grades, Intelligence | Difficulty)
- Parameters are not independent: the same template potential over (Reg1.Grade, Reg2.Grade) is shared by all of its instantiations (a toy sketch of this optimization follows below)

[Figure: ground network over grades (A/B/C), intelligence (low/high), and difficulty (easy/hard), with the shared template potential over pairs of grades.]
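A minimal sketch of the kind of convex optimization involved (hypothetical toy model, not the RMN training code): the log-likelihood of a log-linear model is concave in the weights, and its gradient is the empirical feature counts minus the expected feature counts under the current parameters. The toy below fits a two-label model with a shared "agreement" parameter by exact enumeration:

```python
import itertools, math

# Toy log-linear model over two binary labels with a shared agreement
# parameter; gradient = empirical counts - expected counts (convex problem).
# All names and data are hypothetical.

data = [(1, 1), (1, 1), (0, 0), (1, 0), (1, 1)]

def features(y1, y2):
    return [float(y1 == y2), float(y1 + y2)]   # [agreement, #positive labels]

def expected_features(w):
    scores = {y: math.exp(sum(wi * fi for wi, fi in zip(w, features(*y))))
              for y in itertools.product([0, 1], repeat=2)}
    z = sum(scores.values())
    exp_f = [0.0, 0.0]
    for y, s in scores.items():
        for k, fk in enumerate(features(*y)):
            exp_f[k] += (s / z) * fk
    return exp_f

w = [0.0, 0.0]
emp = [sum(f) / len(data) for f in zip(*(features(*y) for y in data))]
for _ in range(200):                  # gradient ascent on the avg. log-likelihood
    exp_f = expected_features(w)
    w = [wi + 0.5 * (e - x) for wi, e, x in zip(w, emp, exp_f)]
print("learned weights:", [round(wi, 2) for wi in w])
```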

Web Classification Experiments

WebKB dataset:
- Four CS department websites
- Five categories (faculty, student, project, course, other)
- Bag of words on each page
- Links between pages
- Anchor text for links

Experimental setup (see the protocol sketch below):
- Trained on three universities
- Tested on the fourth
- Repeated for all four combinations
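For concreteness, the leave-one-school-out protocol looks roughly like the loop below (the four schools are those of the standard WebKB corpus; train_model and evaluate are hypothetical stand-ins, not the actual RMN code):

```python
# Sketch of the leave-one-school-out evaluation protocol from the slide.
schools = ["cornell", "texas", "washington", "wisconsin"]

def train_model(train_schools):
    return {"trained_on": tuple(train_schools)}      # placeholder model

def evaluate(model, test_school):
    return 0.0                                       # placeholder accuracy

for held_out in schools:
    train_schools = [s for s in schools if s != held_out]
    model = train_model(train_schools)
    acc = evaluate(model, held_out)
    print(f"test={held_out:<11} train={train_schools} accuracy={acc:.3f}")
```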

Exploiting Links

[Figure: template with a From-Page and a To-Page, each with a Category and words Word1 ... WordN, connected by a Link with its own anchor words; the potential couples the two page categories.]

Classify all pages collectively, maximizing the joint label probability.

35.4% relative reduction in error relative to a strong flat approach.

Scalability

- WebKB data set size: 1300 entities, 180K attributes, 5800 links
- Network size / school: 40,000 variables, 44,000 edges
- Training time: 20 minutes
- Classification time: 15-20 seconds

Predicting Relationships

[Figure: Tom Mitchell (Professor), Sean Slattery (Student), and WebKB (Project), with the Advisor-of and Member links to be predicted.]

Even more interesting are relationships between objects.

WebKB++

- Four new department web sites: Berkeley, CMU, MIT, Stanford
- Labeled page type (8 types): faculty, student, research scientist, staff, research group, research project, course, organization
- Labeled hyperlinks and virtual links (6 types): advisor, instructor, TA, member, project-of, NONE
- Data set size: 11K pages, 110K links, 2 million words

Flat Model

[Figure: the relation variable Rel (NONE, advisor, instructor, TA, member, project-of) is predicted from the words of the From-Page and To-Page, the link type, and the link words LinkWord1 ... LinkWordN, with no coupling between relations.]

Flat Model [figure]

Collective Classification: Links

[Figure: link model. The From-Page and To-Page categories are added as variables coupled to the relation variable Rel, in addition to the page words, link type, and link words.]

Link Model [figure]

Triad Model

[Figure: triad template coupling the Advisor relation between a Professor and a Student with both of their Member relations to the same Group.]

Triad Model

[Figure: triad template coupling the Advisor relation between a Professor and a Student with the Instructor and TA relations they have to the same Course.]

Triad Model

Link Prediction: Results

72.9% relative reduction in error relative to a strong flat approach.
- Error measured over links predicted to be present
- The link-presence cutoff is at the precision/recall break-even point (30% for all models); a small sketch of this choice follows below
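For readers unfamiliar with the break-even criterion, here is a minimal sketch of choosing the cutoff where precision equals recall (the scores and labels below are invented for illustration):

```python
# Sketch of choosing a link-presence cutoff at the precision/recall
# break-even point; scores and labels are invented for illustration.

scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]   # sorted descending
labels = [1,    1,    0,    1,    0,    1,    0,    0]       # 1 = link present
total_positives = sum(labels)

best_cutoff, best_gap = None, float("inf")
for k in range(1, len(scores) + 1):          # predict the top-k scores as "present"
    tp = sum(labels[:k])
    precision, recall = tp / k, tp / total_positives
    if abs(precision - recall) < best_gap:
        best_gap, best_cutoff = abs(precision - recall), scores[k - 1]

print("break-even cutoff ~", best_cutoff)
```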

Summary

Use relational models to recognize entities & relations directly from raw data.

Collective classification:
- Classifies multiple entities simultaneously
- Exploits links & correlations between related entities
- Uses web-of-influence reasoning to reach strong conclusions from weak evidence

Undirected PRMs allow high-accuracy discriminative training & rich graphical patterns.

Learning Object Maps from Laser Range Data
Dragomir Anguelov, Daphne Koller, Evan Parker
Robotics Lab, Stanford University

Occupancy Grid Maps

- Static world assumption
- Inadequate for answering symbolic queries

[Figure: occupancy grid with a robot and a moving person.]

Objects

Entities with coherent properties:
- Shape
- Color
- Kinematics (motion)

Object Maps

- Natural and concise representation
- Exploit prior knowledge about object models: walls are straight, doors open and close
- Learn global properties of the environment: primary orientation of walls, typical door width
- Generalize properties across objects: objects viewed as instances of object classes, with parameter sharing

Learning Object Maps

- Define a probabilistic generative model
- Suggest object hypotheses
- Optimize the object parameters (EM)
- Select the highest-scoring model

[Figure: output of the procedure — object properties and object segmentation of the laser data.]

Laser Sensor Data

Probabilistic Model

Data: a set of scans
- Scan: a set of <robot position, laser beam reading> tuples
- Each scan is associated with a static map Mt

Global map M: a set of objects {1, …, J}
- Each object i is described by {S[i], Dt[i]}
- S[i] – static parameters
- Dt[i] – dynamic parameters
- Non-static environment – dynamic parameters vary only between static maps
- Fully dynamic environment – dynamic parameters vary between scans
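One way to picture this decomposition is with the hypothetical data structures below (a sketch mirroring the slide, not the authors' implementation):

```python
# Hypothetical data structures mirroring the scan / static-map / object
# decomposition on the slide.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Scan:
    robot_pose: Tuple[float, float, float]   # (x, y, heading)
    beams: List[Tuple[float, float]]         # (beam angle, measured range)
    static_map_index: int                    # which static map M_t the scan belongs to

@dataclass
class MapObject:
    static_params: Dict[str, float]          # S[i], e.g. wall line parameters
    dynamic_params: Dict[int, Dict[str, float]] = field(default_factory=dict)
    # D_t[i]: per-static-map (or per-scan) parameters, e.g. a door's opening angle

@dataclass
class GlobalMap:
    objects: List[MapObject]                 # objects 1..J
```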

Probabilistic Model - II

[Figure: graphical model over the general map M, the static maps M1 … MT, the objects, the robot positions, the laser beams, and the correspondence variables that assign beams to objects.]

Generative Model Specification

- Sensor model
- Object models (particular instantiation: walls, doors)
- Model score

Sensor Model

Modeling occlusion: a reading ztk is generated from one of (see the likelihood sketch below):
- a random model (uniform probability)
- the first object the beam intersects, i.e., the actual object (Gaussian probability)
- a MaxRange model (delta function)

Why we should model occlusion:
- Realistic sensor model
- Helps to infer motion
- Improved model search
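A minimal sketch of such a three-component beam likelihood (mixture weights, noise level, and the discretized max-range spike are invented for illustration; this is not the paper's exact parameterization):

```python
import math

# Three-component beam likelihood: uniform "random" component, Gaussian
# around the range of the first object the beam intersects, and a
# max-range component.

Z_MAX = 8.0          # maximum sensor range (meters), assumed
SIGMA = 0.05         # measurement noise of the "hit" component, assumed
W_RANDOM, W_HIT, W_MAXRANGE = 0.1, 0.85, 0.05   # assumed mixture weights

def beam_likelihood(z_measured, z_expected):
    """p(z | expected range of the first intersected object)."""
    p_random = 1.0 / Z_MAX if 0.0 <= z_measured <= Z_MAX else 0.0
    p_hit = (math.exp(-0.5 * ((z_measured - z_expected) / SIGMA) ** 2)
             / (SIGMA * math.sqrt(2 * math.pi)))
    p_max = 1.0 if abs(z_measured - Z_MAX) < 1e-3 else 0.0   # discretized delta
    return W_RANDOM * p_random + W_HIT * p_hit + W_MAXRANGE * p_max

print(beam_likelihood(2.48, 2.50))   # beam hits near the expected object surface
print(beam_likelihood(8.00, 2.50))   # beam passed through (max-range reading)
```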

Wall Object Model

Wall model i:
- A line (two parameters, e.g., orientation and offset)
- S intervals, each denoting a segment along the line (two endpoints per segment)
- 2S + 2 independent parameters in total
- Collinear segments bias

Door Object Model

Door model i:
- A pivot p
- Width w
- A set of opening angles, one per time step (t = 1, 2, …)
- Limited rotation (90°)
- Arc "center" d
- 4 static + 1 dynamic parameter

Model Score

Maximize the log-posterior probability of the map M given the data Z: log P(M | Z) ∝ log P(Z | M) + log p(M). Adding objects and segments increases the data likelihood but also increases the number of parameters, so a structure prior p(M) over possible maps penalizes complexity in terms of:
- |S[M]| – number of static parameters in M
- |D[M]| – number of dynamic parameters in M
- L[M] – total length of segments in M
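As a rough sketch of how such a score trades off fit against complexity (the penalty weights below are invented, not the paper's prior):

```python
# Rough sketch of the model score: data log-likelihood plus a structure
# prior that penalizes static/dynamic parameters and total segment length.
# The penalty weights are invented for illustration.

ALPHA_S, ALPHA_D, ALPHA_L = 1.0, 1.0, 0.5

def model_score(log_likelihood, n_static_params, n_dynamic_params,
                total_segment_length):
    log_structure_prior = -(ALPHA_S * n_static_params
                            + ALPHA_D * n_dynamic_params
                            + ALPHA_L * total_segment_length)
    return log_likelihood + log_structure_prior

# Example: compare a map with and without an extra (possibly spurious) door.
print(model_score(log_likelihood=-1040.0, n_static_params=14,
                  n_dynamic_params=3, total_segment_length=22.5))
print(model_score(log_likelihood=-1038.5, n_static_params=18,
                  n_dynamic_params=5, total_segment_length=23.0))
```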

Learning Model Parameters (EM)

E-step: compute expectations

M-step:
- Walls: optimize line parameters, optimize segment ends
- Doors: optimize pivot and angles, optimize door width

Suggesting Object Hypotheses

Wall hypotheses (see the Hough-transform sketch below):
- Use the Hough transform (histogram-based approach)
- Compute the preferred direction of the environment
- Use both to suggest lines

Door hypotheses:
- Use temporal differencing of static maps
- Check whether points along segments in the static maps Mt are well explained in the general map M
- If not, the segment is a potential door
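A minimal Hough-transform sketch for proposing line (wall) hypotheses from 2D laser points; the bin sizes and vote threshold are invented, and this is only an illustration of the histogram-based idea:

```python
import math
from collections import Counter

# Minimal Hough transform over 2D laser points to suggest wall lines.
def hough_line_hypotheses(points, theta_step=math.radians(5), rho_step=0.1,
                          min_votes=20):
    """Return (theta, rho) line candidates: x*cos(theta) + y*sin(theta) = rho."""
    votes = Counter()
    thetas = [k * theta_step for k in range(int(math.pi / theta_step))]
    for x, y in points:
        for theta in thetas:
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(round(theta, 4), round(rho / rho_step) * rho_step)] += 1
    return [cell for cell, count in votes.items() if count >= min_votes]

# Usage: points along a wall at x = 2.0 yield a candidate near theta=0, rho=2.
wall_points = [(2.0, 0.1 * i) for i in range(30)]
print(hough_line_hypotheses(wall_points))
```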

Results for a Single Pass

Results for Two Passes

Future Work

- Simultaneous localization & mapping
- Object class hierarchies
- Dynamic environments
- Enrich the object representation: more sophisticated shape models, color, 3D

Hierarchical Object Maps

Learn to recognize object classes. Come visit our poster!