Label and Link Prediction in Relational Data

Label and Link Prediction in Relational Data
Ben Taskar, Pieter Abbeel, Ming-Fai Wong, Daphne Koller (Stanford University)
Presented by Yufei Pan (Rutgers University)

Key Ideas: flat vs. relational data; directed vs. undirected models.

Flat vs. Relational. The vast majority of work in statistical classification has focused on "flat" data: data consisting of identically structured entities, typically assumed to be independent and identically distributed (IID). However, the relations among entities can help us achieve better classification accuracy.

Example. [Figure: Tom Mitchell (Professor) is Advisor-of Sean Slattery (Student); both have Member and Project-of links to the WebKB Project.] Of course, data is not always arranged as nicely for us as in a relational database. Consider the biggest source of data, the World Wide Web, and in particular the webpages of a computer science department. Here is one webpage, which links to another. This second webpage links to a third, which links back to the first two. There is also a webpage with many outgoing links to webpages on this site. This is not nice, clean data: nobody labels these webpages for us or tells us what they are. We would like to learn to understand this data, and conclude from it that we have a professor, Tom Mitchell, one of whose interests is a project called WebKB; that Sean Slattery is one of the students on the project and Professor Mitchell is his advisor; and that Tom Mitchell is a member of the CMU CS faculty, which contains many other faculty members. How do we get from the raw data to this type of analysis?

Collective Classification. Rather than classifying each document separately, we want a form of collective classification: simultaneously decide on the class labels of all of the entities together, thereby explicitly taking advantage of the correlations between the labels of related entities.
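In symbols (a sketch of the idea, not notation taken from the slides), collective classification replaces independent per-entity predictions with a single joint prediction:

$$\hat{\mathbf{y}} = \arg\max_{y_1,\dots,y_n} P(y_1,\dots,y_n \mid \mathbf{x}) \qquad \text{rather than} \qquad \hat{y}_i = \arg\max_{y_i} P(y_i \mid \mathbf{x}_i) \ \text{for each } i.$$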

A little bit of history… Probabilistic Relational Models (PRMs), a relational version of Bayesian networks, were used to define a joint probabilistic model for a collection of related entities; they were proposed by Koller and colleagues in 1998. Directed PRMs have two limitations that undirected models address: first, undirected models do not impose the acyclicity constraint that hinders the representation of many important relational dependencies in directed models; second, undirected models are well suited for discriminative training, where we optimize the conditional likelihood of the labels given the features, which generally improves classification accuracy.

Undirected PRMs: Relational Markov Networks. A relational Markov network (RMN) compactly defines a Markov network over a relational data set. The graphical structure of an RMN is based on the relational structure of the domain, and can easily model complex patterns over related entities. [Speaker notes:] The two key ideas that come to our rescue derive from the two approaches we are trying to combine. From relational logic, we have the notion of universal patterns, which hold for all objects in a class. From Bayesian networks, we have the notion of locality of interaction, which in the relational case has a particular twist: links give us precisely a notion of "interaction", and thereby provide a roadmap for which objects can interact with each other. In this example, we have a template, like a universal quantifier for a probabilistic statement. It tells us: "For any registration record in my database, the grade of the student in the course depends on the intelligence of that student and the difficulty of that course." This dependency is instantiated for every object of the right type in our domain, and it is associated with a conditional probability distribution that specifies the nature of the dependence. We can also have dependencies over several links, e.g., the satisfaction of a student may depend on the teaching ability of the professor who teaches the course.

Markov Network
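(The slide presented this definition as an image; the following is a standard reconstruction for reference.) A Markov network over variables $\mathbf{x}$ with a set of cliques $\mathcal{C}$ defines a joint distribution as a normalized product of clique potentials:

$$P(\mathbf{x}) = \frac{1}{Z} \prod_{c \in \mathcal{C}} \phi_c(\mathbf{x}_c), \qquad Z = \sum_{\mathbf{x}'} \prod_{c \in \mathcal{C}} \phi_c(\mathbf{x}'_c),$$

where $\mathbf{x}_c$ is the assignment to the variables in clique $c$ and $Z$ is the partition function.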

Markov Network (continued)
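(Again reconstructing the slide's image with the standard log-linear form.) Each potential is typically represented log-linearly, so the log-probability is linear in the weights:

$$\phi_c(\mathbf{x}_c) = \exp\{\mathbf{w}_c \cdot \mathbf{f}_c(\mathbf{x}_c)\} \quad\Longrightarrow\quad \log P(\mathbf{x}) = \sum_{c \in \mathcal{C}} \mathbf{w}_c \cdot \mathbf{f}_c(\mathbf{x}_c) - \log Z.$$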

Relational Markov Network. An RMN specifies a conditional distribution over all of the labels of all of the entities in an instantiation, given the relational structure and the content attributes. Roughly speaking, it specifies the cliques and potentials between attributes of related entities at the template level, so a single model provides a coherent distribution for any collection of instances from the schema.
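Concretely (a sketch of the unrolled distribution, with notation approximating Taskar et al.'s), instantiating the templates over the data yields

$$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \prod_{C \in \text{templates}} \; \prod_{c \in C(\mathcal{I})} \phi_C(\mathbf{x}_c, \mathbf{y}_c),$$

where $C(\mathcal{I})$ is the set of cliques produced by template $C$ on instantiation $\mathcal{I}$, and all cliques from the same template share the same potential $\phi_C$.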

Example: suppose that pages with the same label tend to link to each other. We can capture this correlation between labels by introducing, for each link, a clique between the labels of the source and the target page.

Relational clique template:

```sql
SELECT doc1.Category, doc2.Category
FROM Doc doc1, Doc doc2, Link link
WHERE link.From = doc1.Key AND link.To = doc2.Key
```
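To make template unrolling concrete, here is a minimal Python sketch; the Doc/Link data structures and the function name are hypothetical illustrations, not code from the paper. It instantiates one pairwise clique per link, mirroring the SELECT/FROM/WHERE query above:

```python
# Minimal sketch of unrolling the link clique template over toy data.
# The docs/links records and the clique representation are illustrative
# assumptions, not structures defined in the paper.

docs = {
    "d1": {"category": None},  # categories are the unknowns to predict
    "d2": {"category": None},
    "d3": {"category": None},
}
links = [("d1", "d2"), ("d2", "d3"), ("d3", "d1")]

def unroll_link_template(docs, links):
    """For each link (src, dst), emit a clique over the pair of Category
    variables -- the relational analogue of the SQL template above."""
    cliques = []
    for src, dst in links:
        if src in docs and dst in docs:  # the WHERE clause's key joins
            cliques.append((("Category", src), ("Category", dst)))
    return cliques

print(unroll_link_template(docs, links))
# One clique per link; all of them share the same template-level
# potential parameters in the unrolled Markov network.
```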

Another example: links in a webpage tend to point to pages of the same category. This pattern can be expressed by the following template:

```sql
SELECT doc1.Category, doc2.Category
FROM Doc doc1, Doc doc2, Link link1, Link link2
WHERE link1.From = link2.From
  AND link1.To = doc1.Key
  AND link2.To = doc2.Key
  AND NOT doc1.Key = doc2.Key
```

Learning RMNs. Training is for conditional classification: we maximize the conditional log-likelihood of the labels given the features using gradient-based optimization. Unlike in directed models, the parameters are not independent (the global partition function couples them), so there is no closed-form solution.
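For reference (this is the standard result for log-linear models, not a formula transcribed from the slides), the gradient of the conditional log-likelihood of a log-linear RMN is

$$\nabla_{\mathbf{w}} \log P(\mathbf{y} \mid \mathbf{x}) = \mathbf{f}(\mathbf{x}, \mathbf{y}) - \mathbb{E}_{P(\mathbf{y}' \mid \mathbf{x})}\!\left[\mathbf{f}(\mathbf{x}, \mathbf{y}')\right],$$

i.e., empirical feature counts minus expected feature counts under the current model; computing the expectation requires (approximate) inference.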

Inference in Markov Networks. The task is to compute the posterior distribution over the label variables in the instantiation, given the content variables. However, the networks resulting from domains such as our hypertext classification task are very large and densely connected, so exact inference is completely intractable. Instead, we use approximate inference with belief propagation, which is guaranteed to converge to the correct marginal probabilities for each node only for singly connected Markov networks, but is applied here to loopy networks as well.
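As a concrete illustration, here is a minimal sketch of loopy belief propagation on a small pairwise Markov network. The three-node cycle, the potentials, and the synchronous update schedule are illustrative assumptions, not the paper's setup:

```python
import numpy as np

nodes = ["d1", "d2", "d3"]               # e.g., three mutually linked webpages
edges = [("d1", "d2"), ("d2", "d3"), ("d3", "d1")]
K = 2                                     # two categories per page

unary = {n: np.array([0.6, 0.4]) for n in nodes}   # local (content) evidence
pairwise = np.array([[2.0, 1.0],                   # same-label links preferred
                     [1.0, 2.0]])

# messages[(i, j)]: message from node i to node j, a vector over j's states
messages = {(i, j): np.ones(K)
            for (a, b) in edges for (i, j) in [(a, b), (b, a)]}

def neighbors(n):
    return [b if a == n else a for (a, b) in edges if n in (a, b)]

for _ in range(50):                       # fixed number of synchronous sweeps
    new = {}
    for (i, j) in messages:
        # product of i's unary potential and all incoming messages except j's
        incoming = unary[i].copy()
        for k in neighbors(i):
            if k != j:
                incoming *= messages[(k, i)]
        m = pairwise.T @ incoming          # sum-product: sum over i's states
        new[(i, j)] = m / m.sum()          # normalize for numerical stability
    messages = new

for n in nodes:
    belief = unary[n].copy()
    for k in neighbors(n):
        belief *= messages[(k, n)]
    belief /= belief.sum()
    print(n, belief)                       # approximate posterior marginals
```

On this loopy graph the marginals are approximate; on a tree the same updates would be exact after one forward-backward pass.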

Exploiting Links. [Figure: a model in which each page's Category depends on its own words (Word1 … WordN) and, through each Link, on the Category of the From-Page and To-Page.] Classify all pages collectively, maximizing the joint label probability. This gives a 35.4% relative reduction in error compared to the strong flat approach.

Predicting Relationships. [Figure: Tom Mitchell (Professor) is Advisor-of Sean Slattery (Student); both are Members of the WebKB Project.] Even more interesting than entity labels are the relationships between objects.

Flat Model. [Figure: the relation variable (Type/Rel), with values NONE, advisor, instructor, TA, member, project-of, is predicted from the words of the From-Page and To-Page (Word1 … WordN) and the link words (LinkWord1 … LinkWordN) alone.]

Collective Classification: Links. [Figure: the link model adds the Category variables of the From-Page and To-Page, so the relation type Rel is predicted jointly with the page categories rather than from the words alone.]

Web Classification Experiments. Models compared: a flat model (logistic regression over local features) and relational models built from the following clique templates.

Link template:

```sql
SELECT doc1.Category, doc2.Category
FROM Doc doc1, Doc doc2, Link link
WHERE link.From = doc1.Key AND link.To = doc2.Key
```

Section template:

```sql
SELECT page.Category, sec.Category
FROM Page page, Section sec
WHERE sec.Page = page.Key
```

Link+Section template:

```sql
SELECT sec.Category, page.Category
FROM Section sec, Link link, Page page
WHERE link.Sec = sec.Key AND link.To = page.Key
```

Results. [Figure: the accuracy comparison chart from the slide is not reproduced in the transcript.]

Summary
- Use relational models to recognize entities and relations directly from raw data.
- Collective classification: classifies multiple entities simultaneously, exploits links and correlations between related entities, and uses web-of-influence reasoning to reach strong conclusions from weak evidence.
- Undirected PRMs allow high-accuracy discriminative training and rich graphical patterns.

Thank you!