Pseudo-Likelihood for Relational Data
Oliver Schulte, School of Computing Science, Simon Fraser University, Vancouver, Canada.
To appear at the SIAM Conference on Data Mining (SDM).

The Main Topic
In relational data, units are interdependent, so there is no product likelihood function for the model. How do we do model selection? The proposal of this talk: use a pseudo-likelihood. It is an unnormalized product likelihood, like the independent-unit likelihood but with event frequencies in place of event counts.

Overview
Define a pseudo log-likelihood for directed graphical models (Bayes nets). Interpret it as the expected log-likelihood of randomly selected small groups of units. Learning algorithms: the MLE solution, model selection, and simulations.

Outline
Brief intro to relational databases. Statistics and relational databases. Briefer intro to Bayes nets. Relational random variables. Relational (pseudo-)likelihoods.

Relational Databases
1970s: computers are spreading, and many organizations use them to store their data. Ad hoc formats make it hard to build general data management systems and lead to lots of duplicated effort. The standardization dilemma: too restrictive, and the format doesn't fit users' needs; too loose, and we are back to ad hoc solutions.

The Relational Format
Codd (IBM Research, 1970). The fundamental question: what kinds of information do users need to represent? Answered by first-order predicate logic (Russell, Tarski): the world consists of individuals/entities and relationships/links among them.

Tabular Representation
Tables for entity types and for relationships.

Student:
  s-id   Intelligence  Ranking
  Jack   3             1
  Kim    2             1
  Paul   1             2

Professor:
  p-id    Popularity  Teaching-ability
  Oliver  3           1
  Jim     2           1

Course:
  c-id  Rating  Difficulty
  (rows not recovered)

RA:
  s-id  p-id    Salary  Capability
  Jack  Oliver  High    3
  Kim   Oliver  Low     1
  Paul  Jim     Med     2

Registration:
  s-id  c-id  Grade  Satisfaction
  Jack  101   A      1
  Jack  102   B      2
  Kim   102   A      1
  Paul  101   B      1

Database Management Systems
Maintain data in linked tables. Structured Query Language (SQL) allows fast data retrieval, e.g., find all SFU students who are statistics majors with GPA > 3.0. A multi-billion dollar industry ($15+ billion: IBM, Microsoft, Oracle, SAP, PeopleSoft).
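To make the retrieval example concrete, here is a minimal sketch (not from the talk) of that kind of query, run through Python's sqlite3 against a toy table; the major and gpa columns are hypothetical additions for illustration.

import sqlite3

# Toy in-memory database; the major and gpa columns are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (s_id TEXT, major TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [("Jack", "Statistics", 3.4), ("Kim", "Statistics", 2.8), ("Paul", "Biology", 3.6)],
)

# The slide's example: find all students who are statistics majors with GPA > 3.0.
rows = conn.execute(
    "SELECT s_id FROM Student WHERE major = 'Statistics' AND gpa > 3.0"
).fetchall()
print(rows)  # [('Jack',)]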

Relational Domain Models
Visualizing the domain ontology is an active area of research: the Unified Modelling Language (UML), the Semantic Web (XML). The classic tool: the Entity-Relationship (ER) diagram.

ER Diagram Example
Entity sets: Students (name, intelligence, ranking), Professors (name, popularity, teaching ability), Courses (number, difficulty, rating). Relationships: teaches (Professors-Courses) and Registered (Students-Courses, with attributes grade and satisfaction).

ER Model for Social Network
Entity set: Actors, with attributes name, Smokes, Cancer. Relationship: Friend.

Data tables:

Actors:
  Name  Smokes  Cancer
  Anna  T       T
  Bob   T       F

Friend:
  Name1  Name2
  Anna   Bob
  Bob    Anna

Social network view: Anna (Smokes = true, Cancer = true) and Bob (Smokes = true, Cancer = false), linked as friends.

Relationship to Social Network Analysis
A single-relation social network is a simple special case of a relational database. The converse also holds if you allow different types of nodes ("actors"), labels on nodes, different types of (hyper)edges, and labels on edges; see Newman (2003), SIAM Review. Observation: a relational database is equivalent to a general network in this sense.

Outline
Brief intro to relational databases (done). Statistics and relational databases. Briefer intro to Bayes nets. Relational random variables. Relational (pseudo-)likelihoods.

Beyond Storing and Retrieving Data
Much new interest in analyzing databases: data mining, data warehousing, business intelligence, predictive analytics. The fundamental question: how to combine logic and probability? Domingos (CS, University of Washington): "Logic handles complexity, probability represents uncertainty."

Typical Tasks for Statistical-Relational Learning (SRL)
Link-based classification: given the links of a target entity and the attributes of related entities, predict the class label of the target entity. Link prediction: given the attributes of entities and their other links, predict the existence of a link.

Link-based Classification
Predict attributes given links and other attributes, e.g., P(diff(101))?

Course:
  c-id  Rating  Difficulty
  101   3       ???

The Student, Professor, RA, and Registration tables are as shown earlier.

Link Prediction
Predict links given links and attributes, e.g., P(Registered(jack, 101))? The tables are as shown earlier.

Generative Models
Model the joint distribution over links and attributes. This is today's topic; we will use Bayes nets as the model class.

What is a Bayes (belief) net?
Qualitative part: a directed acyclic graph (DAG). Nodes are random variables; edges represent direct influence. Quantitative part: a set of conditional probability distributions, e.g., P(A | E, B) for the family of the Alarm node in the classic example network with nodes Earthquake, Burglary, Alarm, Radio, Call. Together they give a compact representation of a joint probability distribution via conditional independence and define a unique distribution in factored form. (Figure from N. Friedman.)
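The factored form mentioned on the slide is the standard Bayes net factorization; in LaTeX notation:

P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)

For the alarm example this reads P(B, E, A, R, C) = P(B) P(E) P(A | B, E) P(R | E) P(C | A), assuming the usual edge set Burglary -> Alarm <- Earthquake, Earthquake -> Radio, Alarm -> Call.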

Why are Bayes nets useful?
The graph structure supports a modular representation of knowledge; local, distributed algorithms for inference and learning; and an intuitive (possibly causal) interpretation. It also provides a solution to the relevance problem: it is easy to compute "Is X relevant to Y given Z?" (There is a nice UBC demo.)

Outline
Brief intro to relational databases (done). Statistics and relational databases (done). Briefer intro to Bayes nets (done). Relational random variables. Relational (pseudo-)likelihoods.

Relational Data: What Are the Random Variables?
Intuitively, the attributes and relationships in the database, i.e., the columns plus link existence, i.e., the components of the ER diagram. Proposal from David Poole (CS, UBC): apply the concept of functors from logic programming. I combine this with the random-selection probabilistic semantics for logic of Halpern (CS, Cornell) and Bacchus (CS, U of Toronto).

Population Variables
Russell: "A good notation thinks for us." Consider a model with multiple populations. Let X1, X2, Y1, Y2, ... be population variables; each represents a random draw from a population, and population variables are jointly independent. A functor f is a function of one or more population variables. A functor random variable is written f1(X), f2(X,Y), or f3(X,Y,Z).

Unary Functors = Descriptive Attributes of Entities
Populations of students and professors, with population variables S, P. Attribute random variables: age(S), gpa(S), age(P), rank(P). We can have several selections, e.g., age(S1), age(S2). If S is uniform over the students in the database, then P(gpa(S) = 3.0) is the empirical or database frequency of a 3.0 GPA in the student population. Functors can be instantiated (grounded) with constants, e.g., gpa(jack) returns the GPA of Jack.
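In symbols, the database-frequency reading of the slide's example (with S uniform over the students stored in the database) is:

P\bigl(\mathit{gpa}(S) = 3.0\bigr) \;=\; \frac{\#\{\, s \in \mathrm{Students} : \mathit{gpa}(s) = 3.0 \,\}}{\#\mathrm{Students}}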

Binary Functors = Relationships
Registered(S, C) is the indicator function of the existence of the relationship. If S, C are uniformly distributed over the observed populations, then P(Registered(S,C) = 1) = #{(s,c) : student s is registered in course c} / (#Students x #Courses), the database frequency of registration. We can also form chains, e.g., P(grade(S,C) = A, Teaches(C,P) = 1).
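A minimal sketch (my own illustration, not code from the paper) of the binary-functor frequency, computed from the toy Registration data shown earlier:

students = ["Jack", "Kim", "Paul"]
courses = [101, 102]
registered = {("Jack", 101), ("Jack", 102), ("Kim", 102), ("Paul", 101)}

# P(Registered(S,C) = 1) with S, C uniform over the observed populations:
# number of registered (student, course) pairs over all possible pairs.
p_registered = len(registered) / (len(students) * len(courses))
print(p_registered)  # 4 / 6 = 0.666...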

Functor Bayes Nets
Poole (IJCAI 2003): a functor Bayes net is a Bayes net whose nodes are functor random variables.

Likelihood Functions for Functor Bayes Nets: Latent Variables
Problem: given a database D and an FBN model B, how do we define P(D | B)? The fundamental issue is interdependent units; the data are not i.i.d. One approach introduces latent variables such that units are independent conditional on a hidden "state" (e.g., Kersting et al., IJCAI 2009). Cf. social network analysis (Hoff, Raftery (Statistics, U of Washington), Linkletter (Statistics, SFU)) and nonnegative matrix factorization (the Netflix challenge).

Likelihood Function for Single-Table Data
For a single table T, the log-likelihood sums, over each child-node value and parent state, the table count of co-occurrences times the log of the corresponding Bayes net parameter. Running example: the Actors table shown earlier (Anna: Smokes = T, Cancer = T; Bob: Smokes = T, Cancer = F) with a small net over the nodes Smokes(Y) and Cancer(Y).
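Written out (a standard form consistent with the slide's labels; here n_{ijk} is the table count of rows in which child node i takes value k and its parents are in state j, and theta_{ijk} is the corresponding Bayes net parameter):

\log P(T \mid B) \;=\; \sum_{i,j,k} n_{ijk} \, \log \theta_{ijk}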

Proposed Pseudo Log-Likelihood
For a database D, the pseudo log-likelihood has the same form, but uses the database joint frequency of each child-node value and parent state in place of the table count. Running example: the Actors and Friend tables above with the functor Bayes net over Smokes(X), Friend(X,Y), Smokes(Y), and Cancer(Y).
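Following the slide's description, the proposed pseudo log-likelihood has the same shape as the single-table log-likelihood, with the table counts n_{ijk} replaced by database joint frequencies p_{ijk} of each child value and parent state (my transcription of a formula that appeared as an image):

\mathit{pll}(D \mid B) \;=\; \sum_{i,j,k} p_{ijk} \, \log \theta_{ijk}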

Random Selection Log-Likelihood
1. Randomly select instances X1 = x1, ..., Xn = xn for each population variable in the FBN.
2. Look up their properties and relationships in the database.
3. Compute the log-likelihood for the FBN assignment obtained from the instances.
4. L_R = expected log-likelihood over uniform random selection of instances.
Proposition: the random selection log-likelihood equals the pseudo log-likelihood.
In the running example (net over Smokes(X), Friend(X,Y), Smokes(Y), Cancer(Y)), L_R is the average of the four instance log-likelihoods, approximately -1.8.
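A minimal sketch of the random-selection procedure on the smokers example; the Bayes net structure follows one plausible reading of the slide's figure (Smokes(X) and Friend(X,Y) as parents of Smokes(Y), which is the parent of Cancer(Y)), and all parameter values are made up for illustration:

import math

smokes = {"Anna": True, "Bob": True}
cancer = {"Anna": True, "Bob": False}
friend = {("Anna", "Bob"), ("Bob", "Anna")}
people = list(smokes)

def log_lik(x, y):
    # Log-likelihood of one grounding (X = x, Y = y) under hand-set CPTs.
    s_x, s_y = smokes[x], smokes[y]
    f, c = (x, y) in friend, cancer[y]
    p = 1.0
    p *= 0.6 if s_x else 0.4                    # P(Smokes(X)), made up
    p *= 0.5                                    # P(Friend(X,Y)), taken as a root here
    p_sy = 0.8 if (s_x and f) else 0.5          # P(Smokes(Y) | Smokes(X), Friend(X,Y))
    p *= p_sy if s_y else 1.0 - p_sy
    p_c = 0.3 if s_y else 0.05                  # P(Cancer(Y) | Smokes(Y))
    p *= p_c if c else 1.0 - p_c
    return math.log(p)

# Expected log-likelihood over uniform random selection of (x, y):
# with two actors there are exactly four groundings, so average them directly.
groundings = [(x, y) for x in people for y in people]
L_R = sum(log_lik(x, y) for x, y in groundings) / len(groundings)
print(round(L_R, 2))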

Parameter Estimation
Proposition: for a given database D, the parameter values that maximize the pseudo-likelihood are the empirical conditional frequencies.
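In the notation above (my reconstruction, consistent with the proposition), the maximizing parameters are the conditional database frequencies:

\hat{\theta}_{ijk} \;=\; \frac{p_{ijk}}{\sum_{k'} p_{ijk'}}

i.e., the database frequency of child value k given parent state j for node i.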

Model Selection
A new model selection algorithm (Khosravi, Schulte et al., AAAI 2010): a level-wise search through the table join lattice.

Running Time on Benchmarks
Time in minutes; NT = did not terminate. Entries of the form x + y give structure learning time + parametrization time (with Markov net methods). JBN: our join-based algorithm. MLN, CMLN: standard programs from the University of Washington (Alchemy).

Accuracy
Basically, a leave-one-out average.

Future Work: Inference
Prediction is usually based on knowledge-based model construction (Ngo and Haddawy, 1997; Koller and Pfeffer, 1997; Haddawy, 1999). The basic idea: instantiate the population variables with all population members and predict using the instantiated model. With Bayes nets this can lead to cycles. My conjecture: the cycles can be handled with a normalization constant that has a closed form. Help?!

Summary: Likelihood for Relational Data
Combining relational databases and statistics is very important in practice; it combines logic and probability. Interdependent units make it hard to define a model likelihood. Proposal: consider a randomly selected small group of individuals; the pseudo log-likelihood is the expected log-likelihood of the randomly selected group.

Summary: Statistics with Pseudo-Likelihood
Theorem: the random-selection pseudo log-likelihood is equivalent to the standard single-table likelihood, with table counts replaced by database frequencies. Maximum likelihood estimates = database frequencies. An efficient model selection algorithm based on lattice search. In simulations, very fast (minutes vs. days) with much better predictive accuracy.

Thank you! Any questions?

Choice of Functors
Functors can be complex, e.g., nested: wealth(father(father(X))); or aggregate: AVG_C {grade(S,C) : Registered(S,C)}. In the remainder of this talk we use functors corresponding to attributes (columns), e.g., intelligence(S), grade(S,C), and Boolean relationship indicators, e.g., Friend(X,Y).

Hidden Variables Avoid Cycles
Example net: Rich(X), Friend(X,Y), Rich(Y), with latent nodes U(X), U(Y). Assign unobserved values u(jack), u(jane); the probability that Jack and Jane are friends depends on their unobserved "type". In the ground model, rich(jack) and rich(jane) are correlated given that they are friends, but neither is an ancestor of the other. This approach is common in social network analysis (Hoff 2001; Hoff and Raftery 2003; Fienberg 2009) and in the $1M-prize Netflix challenge; it has also been used for multiple types of relationships (Kersting et al. 2009). It is computationally demanding.
