
Learning Statistical Models From Relational Data Lise Getoor University of Maryland, College Park Joint work with: Nir Friedman, Hebrew U. Daphne Koller, Stanford Avi Pfeffer, Harvard Ben Taskar, Stanford

Outline Motivation and Background PRMs w/ Attribute Uncertainty PRMs w/ Link Uncertainty PRMs w/ Class Hierarchies Statistical Relational Models

Discovering Patterns in Structured Data Patient Treatment Strain Contact

Learning Statistical Models Traditional approaches: –work well with flat representations –fixed-length attribute-value vectors –assume independent (IID) samples Problems: –flattening the relational data introduces statistical skew –loses relational structure, so link-based patterns cannot be detected –must fix attributes in advance

Outline Background: »Bayesian Networks (BNs) [Pearl, 1988] –Probabilistic Relational Models (PRMs) Learning PRMs w/ Attribute Uncertainty PRMs w/ Link Uncertainty PRMs w/ Class Hierarchies Statistical Relational Models

Bayesian Networks nodes = random variables edges = direct probabilistic influence Network structure encodes independence assumptions: XRay conditionally independent of Pneumonia given Infiltrates [Figure: BN over XRay, Lung Infiltrates, Sputum Smear, Tuberculosis, Pneumonia]

Bayesian Networks Associated with each node X_i there is a conditional probability distribution P(X_i | Pa_i : θ), a distribution over X_i for each assignment to its parents –If variables are discrete, P is usually multinomial –P can be linear Gaussian, mixture of Gaussians, … [Figure: the CPT P(I | P, T) gives a distribution over Infiltrates for each of the four assignments to Pneumonia and Tuberculosis; the numeric entries are not legible in this transcript]

BN Semantics Compact & natural representation: –nodes have ≤ k parents ⇒ O(2^k · n) vs. O(2^n) params conditional independencies in BN structure + local probability models = full joint distribution over domain: P(P, T, I, X, S) = P(P) P(T) P(I | P, T) P(X | I) P(S | T)
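As a concrete illustration of this factorization, a minimal Python sketch for the five-variable tuberculosis example; the CPT numbers are made-up placeholders, not values from the talk.

# All probabilities below are illustrative placeholders.
p_pneumonia = 0.1                                    # P(P = true)
p_tb = 0.05                                          # P(T = true)
p_infiltrates = {(True, True): 0.9, (True, False): 0.8,
                 (False, True): 0.7, (False, False): 0.05}   # P(I = true | P, T)
p_xray = {True: 0.85, False: 0.1}                    # P(X = true | I)
p_smear = {True: 0.8, False: 0.05}                   # P(S = true | T)

def bern(p_true, value):
    return p_true if value else 1.0 - p_true

def joint(p, t, i, x, s):
    """P(P,T,I,X,S) = P(P) P(T) P(I|P,T) P(X|I) P(S|T)."""
    return (bern(p_pneumonia, p) * bern(p_tb, t) *
            bern(p_infiltrates[(p, t)], i) *
            bern(p_xray[i], x) * bern(p_smear[t], s))

print(joint(p=False, t=True, i=True, x=True, s=True))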

Queries Full joint distribution specifies the answer to any query: P(variable | evidence about others) [Figure: querying the disease variables given observed XRay and Sputum Smear]

BN Learning BN models can be learned from empirical data –parameter estimation via numerical optimization –structure learning via combinatorial search BN hypothesis space is biased towards distributions with independence structure [Figure: an Inducer takes Data and outputs a BN]
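For the parameter-estimation step, a minimal sketch of maximum-likelihood CPT estimation by counting; the record format and tiny data set are illustrative assumptions, not the talk's.

from collections import Counter

def fit_cpt(rows, child, parents):
    """Maximum-likelihood estimate of P(child | parents) from dict-shaped records."""
    joint, marg = Counter(), Counter()
    for r in rows:
        pa = tuple(r[p] for p in parents)
        joint[(pa, r[child])] += 1
        marg[pa] += 1
    return {key: n / marg[key[0]] for key, n in joint.items()}

data = [{"Pneumonia": True, "Tuberculosis": False, "Infiltrates": True},
        {"Pneumonia": False, "Tuberculosis": True, "Infiltrates": True},
        {"Pneumonia": False, "Tuberculosis": False, "Infiltrates": False}]
print(fit_cpt(data, "Infiltrates", ["Pneumonia", "Tuberculosis"]))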

Outline Background: Bayesian Networks (BNs) »Probabilistic Relational Models (PRMs) [Pfeffer, 2000] Learning PRMs w/ Attribute Uncertainty PRMs w/ Link Uncertainty PRMs w/ Class Hierarchies Statistical Relational Models

Probabilistic Relational Models Combine advantages of relational logic & Bayesian networks: –natural domain modeling: objects, properties, relations; –generalization over a variety of situations; –compact, natural probability models. Integrate uncertainty with relational model: –properties of domain entities can depend on properties of related entities; –uncertainty over relational structure of domain.

Relational Schema Describes the types of objects and relations in the database. Classes and attributes: Patient (Homeless, HIV-Result, Ethnicity, Disease-Site), Contact (Close-Contact, Skin-Test, Age, Contact-Type), Strain (Unique, Infectivity). Relationships: Infected-with, Interacted-with.

Probabilistic Relational Model [Figure: dependency structure over Patient (Homeless, HIV-Result, POB, Disease Site), Contact (Close-Contact, Contact-Type, Age, Transmitted), and Strain (Unique, Infectivity)] Local model, e.g. P(Cont.Transmitted | Cont.Contactor.HIV, Cont.Close-Contact): a CPT P(T | H, C) with a distribution for each parent assignment (t,t), (f,t), (t,f), (f,f); the numeric entries are not legible in this transcript.

Relational Skeleton Fixed relational skeleton σ: –set of objects in each class –relations between them Uncertainty over the assignment of values to attributes; the PRM defines a distribution over instantiations of the attributes. [Figure: strains s1, s2; patients p1, p2, p3; contacts c1, c2, c3]

A Portion of the BN [Figure: ground BN fragment over P1.Disease-Site, P1.Homeless, P1.HIV-Result, P1.POB and the attributes of contacts C1 and C2 (Close-Contact, Contact-Type, Age, Transmitted); C1.Transmitted and C2.Transmitted share the same CPT P(T | H, C).]

PRM: Aggregate Dependencies Aggregates: sum, min, max, avg, mode, count. [Figure: Patient Jane Doe (POB US, Homeless no, HIV-Result negative, Disease-Site pulmonary, Age ???) with Contact #5075 (friend, not close, middle-aged, not transmitted), Contact #5076 (spouse, close, middle-aged, transmitted), Contact #5077 (coworker, not close, middle-aged, not transmitted); the dependency on the contacts' attributes goes through an aggregate such as mode.]
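A minimal sketch of how an aggregate parent value might be computed; the contact records and the choice of the mode aggregate are illustrative, not the authors' code.

from collections import Counter

def mode(values):
    """Aggregate a multiset of related-attribute values into a single parent value."""
    values = list(values)
    return Counter(values).most_common(1)[0][0] if values else None

contacts_of_jane = [{"Contact-Type": "spouse",   "Age": "middle-aged"},
                    {"Contact-Type": "coworker", "Age": "middle-aged"},
                    {"Contact-Type": "friend",   "Age": "middle-aged"}]
parent_value = mode(c["Age"] for c in contacts_of_jane)   # -> "middle-aged"
# Other aggregates (sum, min, max, avg, count) plug into the CPD the same way.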

PRM with AU Semantics PRM + relational skeleton σ ⇒ probability distribution over completions I (the attributes of all objects). [Figure: PRM over Strain, Patient, Contact combined with a skeleton containing strains s1, s2, patients p1-p3, and contacts c1-c3.]

Learning PRMs w/ AU Database Patient Strain Contact Relational Schema Patient Contact Strain Parameter estimation Structure selection

Parameter Estimation in PRMs Assume a known dependency structure S. Goal: estimate the PRM parameters θ, the entries in the local probability models. θ is good if it is likely to generate the observed data, instance I. MLE Principle: choose θ* so as to maximize the likelihood l(θ : I, S). As in Bayesian network learning, the crucial property is decomposition: separate terms for different X.A.
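The slide's likelihood expression was lost in extraction; a hedged reconstruction of the decomposed log-likelihood in standard PRM notation (the symbols are my assumption, not copied from the slide):

l(\theta : \mathcal{I}, S) \;=\; \log P(\mathcal{I} \mid S, \theta)
\;=\; \sum_{X} \sum_{A \in \mathcal{A}(X)} \sum_{x \in \sigma(X)}
      \log P\big(\mathcal{I}_{x.A} \,\big|\, \mathcal{I}_{\mathrm{Pa}(x.A)}, \theta\big)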

ML Parameter Estimation The sufficient statistics are counts obtained by a query over the database: join the Contact table with the Patient table of the contacting patient and count co-occurrences of Cont.Transmitted with the parent assignment (Cont.Contactor.HIV, Cont.Close-Contact); normalizing the counts within each parent assignment (t,t), (f,t), (t,f), (f,f) gives the CPT P(T | H, C). [Numeric entries not legible in this transcript.]
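A minimal pandas sketch of such a count query; the column names (contactor_id, hiv, close_contact, transmitted) and the tiny tables are assumptions for illustration, not the actual TB database schema.

import pandas as pd

# Tiny illustrative tables standing in for the Patient and Contact tables.
patients = pd.DataFrame({"patient_id": [1, 2],
                         "hiv": [True, False]})
contacts = pd.DataFrame({"contactor_id":  [1, 1, 2, 2],
                         "close_contact": [True, False, True, False],
                         "transmitted":   [True, False, False, False]})

joined = contacts.merge(patients, left_on="contactor_id", right_on="patient_id")

counts = (joined.groupby(["hiv", "close_contact", "transmitted"])
                .size().rename("n").reset_index())
# Normalize within each parent assignment (hiv, close_contact) to get P(T | H, C).
counts["p"] = counts["n"] / counts.groupby(["hiv", "close_contact"])["n"].transform("sum")
print(counts)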

Structure Selection Idea: –define scoring function –do local search over legal structures Key Components: –legal models –scoring models –searching model space

Structure Selection Idea: –define scoring function –do local search over legal structures Key Components: »legal models –scoring models –searching model space

Legal Models A PRM defines a coherent probability model over a skeleton σ if the dependencies between object attributes are acyclic. How do we guarantee that a PRM is acyclic for every skeleton? [Figure: Researcher Prof. Gump (Reputation high) is author-of Papers P1 and P2 (both Accepted yes); Reputation depends on an aggregate (sum) of the papers' Accepted attributes.]

Attribute Stratification From the PRM dependency structure S, build a class-level dependency graph: add the edge Paper.Accepted → Researcher.Reputation if Researcher.Reputation depends directly on Paper.Accepted. Attribute stratification: dependency graph acyclic ⇒ PRM acyclic for any skeleton σ. The algorithm can be made more flexible: it allows certain cycles along guaranteed-acyclic relations.
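A minimal sketch of the acyclicity test on the class-level dependency graph, done here with Kahn's topological sort; the edge list is illustrative and this is not the authors' exact algorithm.

from collections import defaultdict, deque

def is_stratified(edges):
    """True iff the class-level attribute dependency graph has no directed cycle."""
    graph, indeg, nodes = defaultdict(list), defaultdict(int), set()
    for u, v in edges:
        graph[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    queue = deque(n for n in nodes if indeg[n] == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in graph[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return visited == len(nodes)

deps = [("Paper.Accepted", "Researcher.Reputation")]   # illustrative edge list
print(is_stratified(deps))                             # True: no attribute-level cycle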

Structure Selection Idea: –define scoring function –do local search over legal structures Key Components: –legal models »scoring models –searching model space

Scoring Models Bayesian approach: score a candidate structure by its marginal likelihood and prior. Standard approach to scoring models; used in Bayesian network learning.
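The score formula itself did not survive extraction; a hedged reconstruction of the standard Bayesian score the slide refers to (notation is mine):

\mathrm{Score}(S : \mathcal{I}) \;=\; \log P(S \mid \mathcal{I})
\;\propto\; \log P(\mathcal{I} \mid S) + \log P(S),
\qquad
P(\mathcal{I} \mid S) \;=\; \int P(\mathcal{I} \mid S, \theta)\, P(\theta \mid S)\, d\theta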

Structure Selection Idea: –define scoring function –do local search over legal structures Key Components: –legal models –scoring models »searching model space

Searching Model Space Phase 0: consider only dependencies within a class. Candidate moves over the Patient, Contact, and Strain models are evaluated by their change in score (Δscore), e.g. Delete C.C → C.T, Add S.I → S.U.

Phased Structure Search Phase 1: consider dependencies from "neighboring" classes, via schema relations, e.g. Δscore for Add S.I → P.D, Add P.H → C.T.

Phase 2: consider dependencies from "further" classes, via relation chains, e.g. Δscore for Add S.I → C.T, Add C.P → S.I.

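Putting the phases together, a minimal sketch of the greedy phased search loop; the score, legal, and candidates callables stand in for the Bayesian score, the stratification check, and the phase-dependent candidate generator, and are assumptions rather than the authors' code.

def phased_structure_search(structure, score, legal, candidates, max_phase=2):
    """Greedy hill-climbing over dependency structures. Each phase widens the
    candidate set (phase 0: within-class parents, phase 1: neighboring classes,
    phase 2: longer relation chains)."""
    best = score(structure)
    for phase in range(max_phase + 1):
        improved = True
        while improved:
            improved = False
            for cand in candidates(structure, phase):
                if not legal(cand):          # e.g. attribute stratification check
                    continue
                s = score(cand)
                if s > best:
                    structure, best, improved = cand, s, True
                    break                    # rescan candidates of the new structure
    return structure
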
Experimental Evaluation

Synthetic Data Simple 'genetic' domain. Construct training sets of various sizes. Compare the log-likelihood of a test set of size 100,000 under –the 'gold' standard model –learned parameters (model structure given) –a learned model (both structure and parameters learned)

Blood Type M-chromosome P-chromosome Person Result Contaminated Blood Test Blood Type M-chromosome P-chromosome Person Blood Type M-chromosome P-chromosome Person (Father) (Mother)

Error on Test Set

Error Variance

Errors in Learned Structure

TB Cases in SF Patient (~2,300): Ethnicity, Homeless, Diagnosis, HIV result, Disease-site, X-ray. Contact (~20,000): Contact-type, Age, Care, Infected. Strain (~1,000): Unique, Drug-Resistance.

TB PRM [Figure: the learned PRM over the Patient, Contact, Strain, and Subcase classes, with dependencies among attributes such as hivres, ageatdx, ethnic, gender, pob, homeless, xray, disease site, smrpos, care, # contacts, # infected, % infected, closecont, contype, contage, hh_oohh, result, transmitted, and infectivity.]

SEC PRM [Figure: the learned PRM over the Company, Role, Prev-Role, and Person classes (table sizes including 20,000 and 40,000 records), with dependencies among attributes such as total assets, # roles, rtn earn, assets, age, fired, # employees, top_role, retired, and salary.]

Outline Motivation and Background PRMs w/ Attribute Uncertainty »PRMs w/ Link Uncertainty PRMs w/ Class Hierarchies Statistical Relational Models

Introduction [Figure: a scientific paper at Cornell with Topic ∈ {Theory, AI}, linked to agent and theory papers.] Classification can use: attributes of the object, attributes of linked objects, attributes of heterogeneous linked objects ⇒ Collective Classification.

Our Approach Motivation: relational structure provides useful information for density estimation and prediction Construct probabilistic models of relational structure that capture link uncertainty Here we propose two new mechanisms: –Reference uncertainty –Existence uncertainty

PRMs w/ AU: another example A PRM consists of: Relational Schema (Person: Age, Gender, Income; Movie: Genre; Vote: Rank), Dependency Structure, and Local Probability Models, e.g. P(Vote.Rank | Vote.Person.Gender, Vote.Person.Age, Vote.Movie.Genre).

PRM w/ Attribute Uncertainty Fixed relational skeleton σ: –set of objects in each class –relations between them (primary and foreign keys) Uncertainty over the assignment of values to attributes. [Figure: movies m1, m2; persons p1, p2; votes v1 (m1, p1), v2 (m1, p2), v3 (m2, p2).]

PRM w/ AU Semantics PRM + relational skeleton σ ⇒ ground BN defining a distribution over complete instantiations I of the attributes of all objects. [Figure: the Person, Movie, and Vote objects of the skeleton unrolled into a ground BN.]

Issue PRM w/ AU applicable only in domains where we have full knowledge of the relational structure Next we introduce PRMs which allow uncertainty over relational structure…

PRMs w/ Link Uncertainty Advantages: –Applicable in cases where we do not have full knowledge of relational structure –Incorporating uncertainty over relational structure into probabilistic model can improve predictive accuracy Two approaches: –Reference uncertainty –Existence uncertainty Different probabilistic models; varying amount of background knowledge required for each

Citation Relational Schema Wrote Paper Topic Word1 WordN … Word2 Paper Topic Word1 WordN … Word2 Cites Count Citing Paper Cited Paper Author Institution Research Area

Attribute Uncertainty [Figure: Paper (Topic, Word1 … WordN) written by Author (Research Area, Institution).] Local models: P(WordN | Topic), P(Topic | Paper.Author.Research-Area), P(Institution | Research Area).

Reference Uncertainty [Figure: a scientific paper whose bibliography entries point to unknown papers (?) in a document collection.]

PRM w/ Reference Uncertainty A dependency model for the foreign keys Cites.Cited and Cites.Citing. Naïve approach: a multinomial over the primary key; this is not compact and limits the ability to generalize.

Reference Uncertainty Example The value of Cites.Cited is chosen by first selecting a partition of the papers according to a multinomial (here the papers are partitioned by Topic into Paper.Topic = AI and Paper.Topic = Theory), then choosing uniformly among the papers in the selected partition. [Figure: papers P1-P5 split into the AI and Theory partitions, with the partition-selection CPT attached to the Cites relation.]
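A minimal sketch of sampling a foreign-key value under reference uncertainty; the paper list and partition CPD are made-up illustrations, not the values on the slide.

import random

papers = {"P1": "Theory", "P2": "Theory", "P3": "AI", "P4": "AI", "P5": "AI"}

# P(partition of the cited paper | topic of the citing paper); made-up numbers.
partition_cpd = {"AI":     {"AI": 0.8, "Theory": 0.2},
                 "Theory": {"AI": 0.3, "Theory": 0.7}}

def sample_cited(citing_topic):
    dist = partition_cpd[citing_topic]
    part = random.choices(list(dist), weights=list(dist.values()))[0]
    candidates = [p for p, t in papers.items() if t == part]
    return random.choice(candidates)      # uniform within the chosen partition

print(sample_cited("AI"))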

PRMs w/ RU Semantics PRM-RU + entity skeleton σ ⇒ probability distribution over full instantiations I (attribute values and citation links). [Figure: papers P1-P5 with topics (P1's topic unknown) and the Cites relation to be instantiated.]

Learning PRMs w/ RU Idea: just like in PRMs w/ AU –define a scoring function –do greedy local structure search Issues: –expanded search space: partitions must be constructed, so new operators are needed

Learning Idea: –define scoring function –do phased local search over legal structures Key Components: –legal models –scoring models –searching model space PRMs w/ RU model new dependencies new operators unchanged

Structure Search: New Operators [Figure: the dependency model for the cited paper starts with a single partition containing all papers (probability 1.0); a Refine operator splits a partition on an attribute (e.g. Refine on Topic, giving Topic = AI, or Refine on Author.Institution, giving Institution = MIT), and each refinement is evaluated by its Δscore.]

PRMs w/ RU Summary Defines semantics for uncertainty over foreign-key values. Search now includes the operators Refine and Abstract for constructing the foreign-key dependency model. Provides one simple mechanism for link uncertainty.

Existence Uncertainty Document Collection ? ? ?

PRM w/ Exists Uncertainty Cites Dependency model for existence of relationship Paper Topic Words Paper Topic Words Exists

Exists Uncertainty Example The Cites relation carries a boolean Exists attribute with CPT P(Exists | Citer.Topic, Cited.Topic), one row for each pair of topics in {Theory, AI} × {Theory, AI}. [Numeric entries not legible in this transcript.]
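A minimal sketch of existence uncertainty: every potential Cites pair carries a binary Exists variable whose CPD conditions on attributes of both endpoints. The probabilities and paper list are made up, not the slide's values.

import random

# P(Exists = true | Citer.Topic, Cited.Topic); illustrative values only.
p_exists = {("Theory", "Theory"): 0.010, ("Theory", "AI"): 0.001,
            ("AI", "Theory"): 0.002, ("AI", "AI"): 0.012}

papers = {"P1": "Theory", "P2": "Theory", "P3": "AI", "P4": "AI", "P5": "AI"}

edges = [(a, b) for a in papers for b in papers
         if a != b and random.random() < p_exists[(papers[a], papers[b])]]
print(edges)   # a sampled citation graph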

PRMs w/ EU Semantics PRM-EU + object skeleton σ ⇒ probability distribution over full instantiations I. [Figure: papers P1-P5 with topics (P1's topic unknown); which Cites links exist is part of the instantiation.]

Learning PRMs w/ EU Idea: just like in PRMs w/ AU –define a scoring function –do greedy local structure search Issues: –efficiency: compute sufficient statistics for the Exists attribute without explicitly enumerating the relations that do not exist

Structure Selection Idea: –define scoring function –do phased local search over legal structures Key Components: –legal models –scoring models –searching model space PRMs w/ EU model new dependencies unchanged

Experiment I: EachMovie + IMDb* [Figure: schema with MOVIE (genre flags: action, animation, art_foreign, classic, comedy, drama, family, horror, romance, thriller; theater_status, video_status), PERSON (gender, age, education, personal_income, household_income), VOTE (rank), ROLE, and ACTOR; table sizes 1,600; 35,000; 50,000; 25,000; 300,000.] * © Internet Movie Database Limited

EachMovie + PRM-RU [Figure: the learned PRM-RU over MOVIE, ROLE, VOTE, PERSON, ACTOR; the reference-uncertainty partitions split, e.g., on gender (M/F) and on the action genre flag (true/false).] Typical voter: male, young adult, college w/o degree, middle income.

EachMovie + PRM-EU [Figure: the learned PRM-EU over MOVIE, ROLE, VOTE, PERSON, ACTOR, with Exists attributes on the relationships.] Finding: men are much more likely to vote on action movies.

Experiment II: Prediction Paper P506 Paper P516 Topic Reinforcement Learning Words … Paper P1309 Topic Probabilistic Reasoning Words … Paper P289 Topic Reinforcement Learning Words … Cited Papers Paper P134 Topic Reinforcement Learning Words … Paper P1067 Topic Reinforcement Learning Words … Citing Papers Topic ?? w1wN...

Domains Citation domain: Cites (Exists) between a cited Paper and a citing Paper (Topic, w1 … wN); Cora dataset, McCallum et al. Web domain: Link (Exists) between a From Page and a To Page (Category, w1 … wN); WebKB, Craven et al.

Prediction Accuracy

Experiment III: Collective Classification [Figure: the unrolled network over three papers (Topic, Word1 … WordN), their authors (Area, Institution), and an Exists variable for each potential citation between the papers.]

Inference in Unrolled BN Prediction requires inference in “unrolled” network –Infeasible for large networks –Use approximate inference for E-step Loopy belief propagation (Pearl, 88; McEliece, 98) –Scales linearly with size of network –Guaranteed to converge only for polytrees –Empirically, often converges in general nets (Murphy,99) Local message passing –Belief messages transferred between related instances –Induces a natural “influence” propagation behavior Instances give information about related instances
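A minimal sketch of loopy belief propagation as generic sum-product message passing on a pairwise model (binary topic labels, a made-up "linked papers tend to share a topic" potential); it illustrates the local message-passing idea but is not the exact inference code used for the unrolled PRM networks.

def prod(xs):
    out = 1.0
    for x in xs:
        out *= x
    return out

def loopy_bp(nodes, edges, unary, pairwise, iters=20):
    """Approximate marginals on a pairwise model. unary[n]: {label: potential};
    pairwise[(li, lj)]: compatibility between neighboring labels."""
    labels = list(next(iter(unary.values())))
    nbrs = {n: [m for a, b in edges for m in (a, b) if n in (a, b) and m != n]
            for n in nodes}
    # msg[(i, j)][l]: message from node i to neighbor j about j taking label l
    msg = {(i, j): {l: 1.0 for l in labels}
           for a, b in edges for (i, j) in ((a, b), (b, a))}
    for _ in range(iters):
        new = {}
        for (i, j) in msg:
            out = {lj: sum(unary[i][li] * pairwise[(li, lj)] *
                           prod(msg[(k, i)][li] for k in nbrs[i] if k != j)
                           for li in labels)
                   for lj in labels}
            z = sum(out.values())
            new[(i, j)] = {l: v / z for l, v in out.items()}
        msg = new
    beliefs = {}
    for n in nodes:
        b = {l: unary[n][l] * prod(msg[(k, n)][l] for k in nbrs[n]) for l in labels}
        z = sum(b.values())
        beliefs[n] = {l: v / z for l, v in b.items()}
    return beliefs

# Three mutually citing papers; beliefs propagate between related instances.
nodes = ["P1", "P2", "P3"]
edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P1")]
unary = {"P1": {"AI": 0.9, "Theory": 0.1},
         "P2": {"AI": 0.5, "Theory": 0.5},
         "P3": {"AI": 0.2, "Theory": 0.8}}
pairwise = {(a, b): (0.8 if a == b else 0.2)
            for a in ("AI", "Theory") for b in ("AI", "Theory")}
print(loopy_bp(nodes, edges, unary, pairwise))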

Web Domain [Figure: From-Page and To-Page (Category, Word1 … WordN, Hub) connected by a Link with an Exists variable and anchor Words (Has-Anchor).]

WebKB Results* * from “Probabilistic Models of Text and Link Structure for Hypertext Classification”, Getoor, Segal, Taskar and Koller in IJCAI 01 Workshop Text Learning: Beyond Classification

Outline Motivation and Background PRMs w/ Attribute Uncertainty PRMs w/ Structural Uncertainty »PRMs w/ Class Hierarchies Statistical Relational Models

From Instances to Classes in Probabilistic Relational Models Compare two approaches: –Probabilistic Relational Models (PRMs) –Bayesian Networks (BNs) PRMs with Class Hierarchies (PRM-CH): –bridge the gap between BNs and PRMs Learning PRM-CHs: –hierarchy supplied –discovering the hierarchy

PRM for Collaborative Filtering Relational Schema: TV-Program (Genre, Budget, Time-slot, Network), Person (Age, Gender, Education, Income), Vote (Program, Voter, Ranking). Dependency Model: Vote.Ranking depends on attributes of the voter and the program (e.g. Education, Income, Genre). [CPT entries not legible in this transcript.]

PRM Instantiation [Figure: TV-Programs Nova (doc, low budget, primetime, PBS), Seinfeld (sitcom, high budget, rerun, ABC), Frasier (sitcom, medium budget, primetime, ABC); Persons Jane Doe (elderly, female, bs, medium income) and John Deer (middle-aged, male, hs, low income); Votes #5630-#5633 with unknown Rankings, all predicted with the same shared CPT.]

BN for Collaborative filtering Law & Order Frasier NBC Monday Night Movies Mad about you Beverly Hills Seinfeld Friends Melrose Place Models Inc. Breese, et al. UAI-98

Limitations of PRMs In a PRM, all instances of the same class must use the same dependency model; it cannot distinguish: –documentaries and sitcoms –"60 Minutes" and Seinfeld A PRM cannot have dependencies that are "cyclic": –the ranking for Frasier depends on the ranking for Friends

Limitations of BNs In a BN, each instance has its own dependency model, so it cannot generalize over instances: –if John tends to like sitcoms, he will probably like next season's offerings –whether a person enjoys sitcom reruns depends on whether they watch primetime sitcoms A BN can only model relationships between at most one class of instances at a time: –in the previous model, we cannot model relationships between people –if my roommate watches Seinfeld I am more likely to join in

Desired Model Allows both class and instance dependencies. [Figure: subclasses of TV-Program (e.g. Soap, Documentary) and of Vote (Sitcom-Vote, Doc-Vote), each with its own dependency structure, alongside the generic TV-Program and Vote classes and Person (Age, Gender, Education, Income).]

PRMs w/ Class Hierarchies Allows us to: Refine a "heterogeneous" class into more coherent subclasses. Refine the probabilistic model along the class hierarchy: –can specialize/inherit CPDs –can construct new dependencies that would be "cyclic" at the class level Provides a bridge from the class-based model to the instance-based model.

PRM-CH Relational Schema: Person (Age, Gender, Education, Income), TV-Program (Genre, Budget, Time-slot, Network), Vote (Program, Voter, Ranking). Class Hierarchy: TV-Program → {SoapOpera, SitCom, Documentary, Drama}, Drama → {Legal-Drama, Medical-Drama}. Dependency Model: CPDs such as Budget can be specialized down the hierarchy. (Koller & Pfeffer 1998; Pfeffer 2000)

Learning PRM-CHs Given the relational schema and a database instance I (TVProgram, Person, Vote), two scenarios: –class hierarchy provided –learn the class hierarchy

Bayesian Model Selection for PRMs Idea: –define scoring function –do phased local search over legal structures Key Components: –scoring models –searching model space PRM-CHs new operators unchanged

Guaranteeing Acyclicity with Subclasses Soap-Vote Program Voter Ranking Doc-Vote Program Voter Ranking Soap-Vote.Ranking Doc-Vote.Ranking Vote Program Voter Ranking Vote.Ranking Vote.Class

Scenario 1: Class hierarchy is provided Learning the PRM-CH uses new operators: –Specialize/Inherit (e.g. specialize the CPD for Budget from TV-Program down to SoapOpera, SitCom, Documentary, Drama, Legal-Drama, and Medical-Drama)

Learning the Class Hierarchy Issue: partially observable data set. Construct a decision tree for the class, defined over attributes observed in the training set. [Figure: a decision tree that splits first on TV-Program.Genre (sitcom / drama / documentary) and then on TV-Program.Network.Nationality (American / English / French), yielding leaf classes class1-class6.] New operators: –split on a class attribute –split on a related-class attribute

EachMovie + PRM MOVIE: Animation, Family, Drama, Comedy, Romance, Action, Horror, Thriller, Theater Status, Video Status, Art/Foreign, Classic. VOTE: Rating. PERSON: Gender, Age, Personal Income, Household Income, Education. 1,400 movies, 5,000 people, 240,000 votes.

PRM-CH [Figure: the learned hierarchy specializes MOVIE into ROMANCE-MOVIE, ACTION-MOVIE, COMEDY-MOVIE, and OTHER-MOVIE, and VOTE into ROMANCE-VOTE, ACTION-VOTE, COMEDY-VOTE, and OTHER-VOTE, each with its own dependencies on the movie attributes (Animation, Family, Drama, Horror, Thriller, Theater Status, Video Status, Art/Foreign, Classic) and on PERSON (Gender, Age, Personal Income, Household Income, Education).]

Comparison 5 test sets: 1,000 votes, ~100 movies, ~115 people each. –PRM mean LL: -12,079 –PRM-CH mean LL: [value and standard deviations not legible in this transcript] Using a standard t-test, the PRM-CH model outperforms the PRM model with over 99% confidence.

PRM-CH Summary PRMs with class hierarchies are a natural extension of PRMs: –Specialization/Inheritance of CPDs –Allows new dependency structures Provide bridge from class-based to instance- based models Learning techniques proposed –Need efficient heuristics –Empirical validation on real-world domains

Outline Motivation and Background PRMs w/ Attribute Uncertainty PRMs w/ Structural Uncertainty PRMs w/ Class Hierarchies »Statistical Relational Models Summary

Statistical Relational Models Capture frequency information, rather than probabilistic information about individuals. Application to selectivity estimation (SIGMOD-01). Here, development of the theory.

A Comparison of Two Approaches Possible worlds vs. domain frequency; generalization vs. compression. Syntax: (almost) the same. Learning: (almost) the same. Semantics: (very) different. Inference: (very) different.

Application: Query Result Size Estimation Crucial for: –multirelational data mining –cost-based query optimization –query profilers Key: the joint frequency distribution F_D(A_1, …, A_n); e.g. F_D(X, Y) is a table over the values x_1, x_2, x_3 of X and y_1, …, y_4 of Y, with entries such as F_D(x_2, y_3). Problem: exponential in the number of attributes (v^n), so representing the distribution exactly is infeasible.

Traditional Approaches to Selectivity Estimation Approximate joint distribution by making several key independence assumptions: –Attribute Value Independence: joint distribution is product of single attribute distributions –Join Uniformity Assumption: tuple in one relation is equally likely to join with any tuple in the other relation

SRMs Use graphical models to compactly represent joint distribution –over single table for select selectivity –over multiple tables for join selectivity Provides a unified framework for estimating the selectivity of select-join queries over multiple tables

System Architecture The Model Constructor builds the model from the database offline; at execution time the Selectivity Estimator answers Size(Q) for a query Q. Methods exist for incremental maintenance of the BN (Friedman and Goldszmidt, 1997).

BN Construction* Heuristic search over graph structure and over tree structure at the nodes. Learn more complex networks when required, simpler networks when possible, subject to storage size restrictions. Key computational step: –computation of sufficient statistics, the frequency of different instantiations of a node and its parents in the DB. Construct a BN B such that P_D(A_1, …, A_n) ≈ F_D(A_1, …, A_n) / |R|. (* Cooper and Herskovits, 1992; Heckerman, 1995)

BNs for Selectivity Estimation Query: select * from R where R.A_1 = a_1 and … and R.A_k = a_k. Size(Q) = |R| × P_D(a_1, …, a_k). Use a Bayesian inference algorithm* to compute P_D(a_1, …, a_k). Algorithm complexity depends on BN connectivity; efficient in practice. (* Pearl, 1988; Lauritzen and Spiegelhalter, 1988)
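A minimal sketch of select-selectivity estimation with a two-attribute BN (A → B); the attribute names, table size, and probabilities are illustrative, not the paper's model.

def estimate_select_size(table_size, p_a, p_b_given_a, query):
    """Tiny BN over attributes A -> B; query maps attribute name to selected value."""
    a, b = query.get("A"), query.get("B")
    if a is not None and b is not None:
        p = p_a[a] * p_b_given_a[(a, b)]
    elif a is not None:
        p = p_a[a]
    else:                                   # only B is selected: marginalize A out
        p = sum(p_a[ai] * p_b_given_a[(ai, b)] for ai in p_a)
    return table_size * p

p_income = {"high": 0.2, "low": 0.8}
p_type_given_income = {("high", "luxury"): 0.6, ("high", "necessity"): 0.4,
                       ("low", "luxury"): 0.05, ("low", "necessity"): 0.95}
# select * from Purchase where Income = 'high' and Type = 'luxury'
print(estimate_select_size(100000, p_income, p_type_given_income,
                           {"A": "high", "B": "luxury"}))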

Foreign-key Join Selectivity (Person ⋈ Purchase) Under the uniform join assumption, and assuming referential integrity, Size(Purchase ⋈ Person) = |Purchase|.

Correlated Attributes [Figure: Person ⋈ Purchase, where Person.Income ∈ {high, low} is correlated with Purchase.Type ∈ {luxury, necessity}.]

Skewed Join [Figure: Person ⋈ Purchase, where the join is skewed across Income = high / Income = low and Type = luxury / Type = necessity.]

Join Indicator Query: select * from R, S where R.F = S.K and R.A = a and S.B = b. P(J_F) = probability that a randomly chosen tuple from R joins with a randomly chosen tuple from S. size(Q) = |R| × |S| × P(J_F = true, a, b).

Universal Foreign Key Closure A DB schema is table-stratified if we can order the tables such that whenever R.F refers to S.K, S precedes R in the stratification ordering. The universal foreign key closure is the query constructed by introducing a tuple variable for each leaf in the stratification and introducing, for each foreign key, a new tuple variable.

Universal Foreign Key Closure Example 1: schema R, S, T with R.F referring to S and S.F referring to T; stratification T < S < R; the closure introduces tuple variables r, s, t with r.F = s.K and s.F = t.K. Example 2: schema R, S with R.F_1 and R.F_2 both referring to S; stratification S < R; the closure introduces r, s_1, s_2 with r.F_1 = s_1.K and r.F_2 = s_2.K.

Statistical Relational Models Model distribution of attributes across multiple tables Allow attribute values to depend on attributes in the same table (like a BN) Allow attribute values to depend on attributes in other tables along a foreign key join Can model the join probability of two tuples using join indicator variable

Example SRM [Figure: Person (Income, Age) Attended School (Prestige); Purchase (Type) Bought-by Person; join indicators J_school and J_person, whose CPTs condition on attributes such as Type = necessity and Income = high. The numeric CPT entries (values such as 0.999, 0.99, 0.01 appear) are only partially legible in this transcript.]

Path Dependency Graph Construct the path dependency graph. [Figure: nodes for the tuple variables' attributes and join indicators (e.g. q.E, q.J_R, q.J_U, r.A, r.J_S, u.D, u.J_S, s_1.B, s_1.J_T, s_2.B, s_2.J_T, t_1.C, t_2.C), with edges following the SRM dependencies.]

SRM Semantics Theorem: if D is a model of the SRM Σ, then P_U(V) = P_Σ(V). Definition: D is a model of Σ if, over the same table-stratified schema, I_U(V, nondescendants(V) | Pa(V), J* = T) and P_U(V | Pa(V), J* = T) = P_Σ(V | Pa(V), J* = T).

Answering Queries Using SRMs Construct the Query Evaluation BN for the query: select * from Person, Purchase where Person.id = Purchase.buyer-id and Person.Income = high and Purchase.Type = luxury. Compute the upward closure of the query attributes by including all of their parents as well. [Figure: from the SRM over Person (Income, Age), School (Prestige), Purchase (J_person, Type), the query evaluation BN retains Income, Type, and J_person plus their ancestors.]
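A minimal sketch of computing the upward closure of the query attributes; the parent map is an illustrative assumption, not the learned SRM.

def upward_closure(query_attrs, parents):
    """parents: dict attr -> list of parent attrs. Returns attrs plus all ancestors."""
    closure, stack = set(), list(query_attrs)
    while stack:
        a = stack.pop()
        if a not in closure:
            closure.add(a)
            stack.extend(parents.get(a, []))
    return closure

parents = {"Purchase.Type": ["Person.Income", "Purchase.J_person"],
           "Person.Income": ["Person.Age"],
           "Purchase.J_person": []}
print(upward_closure({"Person.Income", "Purchase.Type", "Purchase.J_person"}, parents))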

SRM Learning Learn parameters & qualitative dependency structure Extend known techniques for learning Bayesian networks from data and learning PRMs Database Patient Strain Contact

Structure selection Idea: as in BNs –define a scoring function: for generalization, the Bayesian score or MDL; for compression, a modified log-likelihood –do greedy local structure search Issues: –immense set of structures: searching over a large space –efficiency: sufficient statistics are harder to compute, since they are associated with multiple entities and require intelligent use of DB technology

SRM for TB Database [Figure: the learned SRM over the Patient, Contact, and Strain tables, with attributes such as hiv, age, smear, care, us_born, gender, race, homeless, disease_site, treatment, contage, contype, closecont, hhoohh, infected, and unique, plus the join indicators J_patient and J_strain.]

Query Evaluation BN for TB Query: select * from patient, strain where patient.strain = strain.id and patient.homeless = true. [Figure: the query evaluation BN retains hiv, us_born, gender, race, homeless, unique, and J_strain.]

Experimental Setup Compare: –Sampling –BN w/ Uniform Join –SRM all under the same storage restrictions. Two real-world databases: the TB database and a financial database. Relative error on three different query sets; each query joins three relations and selects on one attribute from each relation.

Results on Select-Join Queries [Two tables, one per database, reporting average relative error (%) for SAMPLE, BN+UJ, and SRM on each query set; the numeric entries are not legible in this transcript. TB database: Patient 2.3K tuples, Contact 20K tuples, Strain 1K tuples. Financial database: Account 4.5K tuples, Transaction 106K tuples, District 77 tuples. SRM construction times of 124 sec and 157 sec were reported; estimation times are not legible.]

SRMs vs PRMs Syntax - (almost) same Learning - (almost) same Semantics - (very) different Inference - (very) different

Conclusions PRMs can represent a distribution over attributes from multiple tables. PRMs can capture link uncertainty. PRMs allow inferences about individuals while taking into account the relational structure (they do not make inappropriate independence assumptions). SRMs provide a unified framework for selectivity estimation for both select and join operations. SRMs provide an extremely compact model that captures the frequency information in multirelational data.

Selected Publications
"Learning Probabilistic Models of Link Structure", L. Getoor, N. Friedman, D. Koller and B. Taskar, JMLR.
"Probabilistic Models of Text and Link Structure for Hypertext Classification", L. Getoor, E. Segal, B. Taskar and D. Koller, IJCAI-01 Workshop on Text Learning: Beyond Classification.
"Selectivity Estimation using Probabilistic Models", L. Getoor, B. Taskar and D. Koller, SIGMOD-01.
"Learning Probabilistic Relational Models", L. Getoor, N. Friedman, D. Koller and A. Pfeffer, chapter in Relational Data Mining, eds. S. Dzeroski and N. Lavrac; see also N. Friedman, L. Getoor, D. Koller and A. Pfeffer, IJCAI-99.
"Learning Probabilistic Models of Relational Structure", L. Getoor, N. Friedman, D. Koller and B. Taskar, ICML-01.
"From Instances to Classes in Probabilistic Relational Models", L. Getoor, D. Koller and N. Friedman, ICML Workshop on Attribute-Value and Relational Learning: Crossing the Boundaries.
Notes from the AAAI Workshop on Learning Statistical Models from Relational Data, eds. L. Getoor and D. Jensen.
Notes from the IJCAI Workshop on Learning Statistical Models from Relational Data, eds. L. Getoor and D. Jensen.