Label and Link Prediction in Relational Data


1 Label and Link Prediction in Relational Data
Ben Taskar, Pieter Abbeel, Ming-Fai Wong, Daphne Koller (Stanford University). Presented by Yufei Pan (Rutgers University).

2 Key Ideas: Flat vs. Relational; Directed vs. Undirected

3 Flat vs. Relational The vast majority of work in statistical classification methods has focused on "flat" data: data consisting of identically structured entities, typically assumed to be independent and identically distributed (IID). However, the relations among entities could potentially help us achieve better classification accuracy.

4 Example [Figure: a fragment of the WebKB domain, with entities Tom Mitchell (Professor), Sean Slattery (Student), and the WebKB Project, connected by Advisor-of, Project-of, and Member relations.]
Of course, data is not always so nicely arranged for us as in a relational database. Consider the biggest source of data: the world wide web, and in particular the webpages in a computer science department. Here is one webpage, which links to another. This second webpage links to a third, which links back to the first two. There is also a webpage with many outgoing links to webpages on this site. This is not nice, clean data: nobody labels these webpages for us and tells us what they are. We would like to learn to understand this data and conclude from it that we have a Professor Tom Mitchell, one of whose interests is a project called WebKB; that Sean Slattery is one of the students on the project and Professor Mitchell is his advisor; and that Tom Mitchell is a member of the CMU CS faculty, which contains many other faculty members. How do we get from the raw data to this type of analysis?

5 Collective Classification
Rather than classifying each document separately, we want to perform a form of collective classification: simultaneously decide on the class labels of all of the entities together, and thereby explicitly take advantage of the correlations between the labels of related entities.
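In symbols (a restatement for clarity, not notation from the slides): a flat classifier predicts each label independently, while collective classification maximizes the joint conditional probability of all labels given all content and all links:

\hat{y}_i = \arg\max_{y_i} P(y_i \mid x_i) \quad \text{(flat, per-document)}

\hat{\mathbf{y}} = \arg\max_{y_1,\dots,y_N} P(y_1,\dots,y_N \mid x_1,\dots,x_N, \text{links}) \quad \text{(collective)}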

6 A Little Bit of History Probabilistic Relational Models (PRMs), a relational version of Bayesian networks, were used to define a joint probabilistic model for a collection of related entities (Koller, 1998). Directed PRMs have two limitations that motivate an undirected alternative: (1) the acyclicity constraint of directed models hinders the representation of many important relational dependencies, a constraint that undirected models do not impose; and (2) undirected models are well suited for discriminative training, where we optimize the conditional likelihood of the labels given the features, which generally improves classification accuracy.

7 Undirected PRMs: Relational Markov Nets
A Relational Markov Network (RMN) compactly defines a Markov network over a relational data set. The graphical structure of an RMN is based on the relational structure of the domain, and it can easily model complex patterns over related entities. The two key ideas that come to our rescue derive from the two approaches we are trying to combine. From relational logic, we have the notion of universal patterns, which hold for all objects in a class. From Bayesian networks, we have the notion of locality of interaction, which in the relational case has a particular twist: links give us precisely a notion of "interaction", and thereby provide a roadmap for which objects can interact with each other. In this example, we have a template, like a universal quantifier for a probabilistic statement. It tells us: "For any registration record in my database, the grade of the student in the course depends on the intelligence of that student and the difficulty of that course." This dependency is instantiated for every object (of the right type) in our domain. It is also associated with a conditional probability distribution that specifies the nature of that dependence. We can also have dependencies over several links, e.g., the satisfaction of a student depends on the teaching ability of the professor who teaches the course.

8 Markov Network
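Only the heading survives here. As standard background (not taken from the slide): a Markov network over variables X with a set of cliques C defines the joint distribution

P(\mathbf{x}) = \frac{1}{Z} \prod_{c \in C} \phi_c(\mathbf{x}_c), \qquad Z = \sum_{\mathbf{x}'} \prod_{c \in C} \phi_c(\mathbf{x}'_c)

where each \phi_c is a non-negative potential over the variables in clique c and Z is the partition function.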

9 Markov Network – Contd
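Again only the heading survives; the continuation presumably covers the log-linear parameterization commonly used with Markov networks (a sketch of the standard form, not the slide's own content):

\phi_c(\mathbf{x}_c) = \exp\{\mathbf{w}_c \cdot \mathbf{f}_c(\mathbf{x}_c)\} \quad\Longrightarrow\quad \log P(\mathbf{x}) = \sum_{c \in C} \mathbf{w}_c \cdot \mathbf{f}_c(\mathbf{x}_c) - \log Z

Each clique contributes a weighted sum of feature values, so the log-probability is linear in the weights up to the normalization term.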

10 Relational Markov Network
A relational Markov network (RMN) specifies a conditional distribution over all of the labels of all of the entities in an instantiation given the relational structure and the content attributes. Roughly speaking, it specifies the cliques and potentials between attributes of related entities at a template level, so a single model provides a coherent distribution for any collection of instances from the schema.
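Concretely (following the standard RMN formulation; the exact notation here is mine), an RMN with a set of clique templates defines, for an instantiation with content attributes X, link structure R, and labels Y,

P(\mathbf{Y} \mid \mathbf{X}, R) = \frac{1}{Z(\mathbf{X}, R)} \prod_{C \in \mathcal{C}} \; \prod_{c \in C(\mathcal{I})} \phi_C(\mathbf{x}_c, \mathbf{y}_c)

where C(I) is the set of cliques produced by unrolling template C over the instantiation, and all cliques produced by the same template share the same potential \phi_C.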

11 Example Suppose that pages with the same label tend to link to each other. We can capture this correlation between labels by introducing, for each link, a clique between the labels of the source and the target page.

12 Relational clique template
SELECT doc1.Category, doc2.Category
FROM Doc doc1, Doc doc2, Link link
WHERE link.From = doc1.Key and link.To = doc2.Key
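To make the template concrete, here is a minimal Python sketch (toy data structures of my own, not code from the paper) that unrolls this Link template over a small relational instance: every matching (doc1, doc2, link) triple contributes one clique over the pair of Category labels.

# Toy relational instance: documents and directed links between them.
docs = {
    "d1": {"words": {"faculty", "research"}},
    "d2": {"words": {"student", "homework"}},
    "d3": {"words": {"project", "code"}},
}
links = [("d1", "d2"), ("d1", "d3"), ("d3", "d2")]   # (From, To) pairs

def unroll_link_template(docs, links):
    """Unroll the Link clique template: one clique over
    (doc1.Category, doc2.Category) for every link whose
    endpoints are both documents in the instance."""
    cliques = []
    for src, dst in links:
        if src in docs and dst in docs:
            # The clique's variables are the two Category labels.
            cliques.append((("Category", src), ("Category", dst)))
    return cliques

print(unroll_link_template(docs, links))
# [(('Category', 'd1'), ('Category', 'd2')),
#  (('Category', 'd1'), ('Category', 'd3')),
#  (('Category', 'd3'), ('Category', 'd2'))]

All of these cliques would share a single potential over pairs of categories, since they come from the same template.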

13 Another Example Links on a webpage tend to point to pages of the same category. This pattern can be expressed by the following template:
SELECT doc1.Category, doc2.Category
FROM Doc doc1, Doc doc2, Link link1, Link link2
WHERE link1.From = link2.From
  and link1.To = doc1.Key and link2.To = doc2.Key
  and not doc1.Key = doc2.Key

14 Learning RMNs The parameters are trained discriminatively, maximizing the conditional log-likelihood of the labels given the content using gradient-based optimization. Because the parameters of a clique template are shared across all of its instantiations, they are not independent, and there is no closed-form solution.
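As a sketch of what the gradient looks like (the standard form for conditional log-linear models; the Gaussian regularization term is an assumption on my part), the derivative of the conditional log-likelihood with respect to the weights of template C is the difference between empirical and expected feature counts:

\frac{\partial}{\partial \mathbf{w}_C} \log P_{\mathbf{w}}(\mathbf{y} \mid \mathbf{x}) = \sum_{c \in C(\mathcal{I})} \mathbf{f}_C(\mathbf{x}_c, \mathbf{y}_c) \;-\; \mathbb{E}_{\mathbf{y}' \sim P_{\mathbf{w}}(\cdot \mid \mathbf{x})}\!\left[ \sum_{c \in C(\mathcal{I})} \mathbf{f}_C(\mathbf{x}_c, \mathbf{y}'_c) \right] \;-\; \frac{\mathbf{w}_C}{\sigma^2}

The expectation is computed (approximately) by inference in the unrolled network, which is why learning and inference are tightly coupled.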

15 Inference in Markov Networks
The goal of inference is to compute the posterior distribution over the label variables in the instantiation, given the content variables. However, the networks resulting from domains such as our hypertext classification task are very large and densely connected, so exact inference is completely intractable. Instead, we use approximate inference with belief propagation, which is guaranteed to converge to the correct marginal probabilities for each node only for singly connected Markov networks, but often gives good approximations on loopy networks in practice.
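For concreteness, here is a minimal sketch of sum-product loopy belief propagation on a pairwise Markov network (a simplified implementation of my own with hypothetical data structures, not the authors' code; the unrolled RMN cliques are assumed to be pairwise, as in the link templates above).

import numpy as np

def loopy_bp(n_nodes, edges, node_pot, edge_pot, n_iters=50):
    """Sum-product loopy BP on a pairwise Markov network.

    edges    : list of (i, j) pairs
    node_pot : dict i -> array of shape (K,)  (local evidence, e.g. from words)
    edge_pot : dict (i, j) -> array of shape (K, K) (shared template potential)
    Returns a length-K approximate marginal for each node.
    """
    K = len(next(iter(node_pot.values())))
    # Messages in both directions along every edge, initialized uniformly.
    msgs = {}
    for i, j in edges:
        msgs[(i, j)] = np.ones(K) / K
        msgs[(j, i)] = np.ones(K) / K
    nbrs = {i: [] for i in range(n_nodes)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    for _ in range(n_iters):
        new = {}
        for i, j in msgs:
            # Product of i's local potential and incoming messages, excluding j.
            prod = node_pot[i].copy()
            for k in nbrs[i]:
                if k != j:
                    prod = prod * msgs[(k, i)]
            pot = edge_pot[(i, j)] if (i, j) in edge_pot else edge_pot[(j, i)].T
            m = pot.T @ prod          # sum over x_i
            new[(i, j)] = m / m.sum() # normalize for numerical stability
        msgs = new

    beliefs = []
    for i in range(n_nodes):
        b = node_pot[i].copy()
        for k in nbrs[i]:
            b = b * msgs[(k, i)]
        beliefs.append(b / b.sum())
    return beliefs

# Example: two binary labels joined by one edge that prefers agreement.
node_pot = {0: np.array([0.8, 0.2]), 1: np.array([0.4, 0.6])}
edge_pot = {(0, 1): np.array([[2.0, 1.0], [1.0, 2.0]])}
print(loopy_bp(2, [(0, 1)], node_pot, edge_pot))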

16 Exploiting Links [Figure: the Category of the From-Page and the Category of the To-Page are connected through the Link, alongside each page's Word1 ... WordN evidence.] Classify all pages collectively, maximizing the joint label probability. Result: a 35.4% relative reduction in error relative to the strong flat approach.

17 Predicting Relationships
Even more interesting are the relationships between objects. [Figure: Tom Mitchell (Professor) is Advisor-of Sean Slattery (Student), and both are connected to the WebKB Project through Member relations.]

18 Flat Model [Figure: the flat model predicts the relation variable Rel, whose type ranges over NONE, advisor, instructor, TA, member, project-of, ..., directly from the words of the From-Page and To-Page (Word1 ... WordN) and the words on the link anchor (LinkWord1 ... LinkWordN), with no category variables.]

19 Flat Model [figure]

20 Collective Classification: Links [Figure: the link model adds the Category variable of the From-Page and the Category variable of the To-Page, each connected to its page's Word1 ... WordN evidence, and both connected to the relation variable Rel together with the link words LinkWord1 ... LinkWordN.]

21 Link Model [figure]

22 Web Classification Experiments
Flat model: Logistic. Relational models use the following clique templates:
Link template:
SELECT doc1.Category, doc2.Category
FROM Doc doc1, Doc doc2, Link link
WHERE link.From = doc1.Key and link.To = doc2.Key
Section template:
SELECT page.Category, sec.Category
FROM Page page, Section sec
WHERE sec.Page = page.Key
Link+Section template:
SELECT sec.Category, page.Category
FROM Section sec, Link link, Page page
WHERE link.Sec = sec.Key and link.To = page.Key

23 Results [figure]

24 Summary Use relational models to recognize entities and relations directly from raw data. Collective classification classifies multiple entities simultaneously, exploits links and correlations between related entities, and uses web-of-influence reasoning to reach strong conclusions from weak evidence. Undirected PRMs allow high-accuracy discriminative training and rich graphical patterns.

25 Thank you!

