
1 Probabilistic Models of Relational Data. Daphne Koller, Stanford University. Joint work with: Lise Getoor, Ming-Fai Wong, Eran Segal, Avi Pfeffer, Pieter Abbeel, Nir Friedman, Ben Taskar

2 Why Relational? The real world is composed of objects that have properties and are related to each other. Natural language is all about objects and how they relate to each other: “George got an A in Geography 101.”

3 Attribute-Based Worlds. “Smart students get A’s in easy classes”: Smart_Jane & easy_CS101 → GetA_Jane_CS101; Smart_Mike & easy_Geo101 → GetA_Mike_Geo101; Smart_Jane & easy_Geo101 → GetA_Jane_Geo101; Smart_Rick & easy_CS221 → GetA_Rick_C… A world = an assignment of values to attributes / truth values to propositional symbols.

4 Object-Relational Worlds. A world = a relational interpretation: objects in the domain, properties of these objects, relations (links) between objects. ∀x,y (Smart(x) & Easy(y) & Take(x,y) → Grade(A,x,y))

5 Why Probabilities? All universals are false (almost): “Smart students get A’s in easy classes.” True universals are rarely useful: “Smart students get either A, B, C, D, or F.” “The actual science of logic is conversant at present only with things either certain, impossible, or entirely doubtful … Therefore the true logic for this world is the calculus of probabilities …” — James Clerk Maxwell

6 Probable Worlds. Probabilistic semantics: a set of possible worlds, each world associated with a probability. Here a world assigns values to course difficulty {easy, hard}, student intelligence {weak, smart}, and grade {A, B, C} — twelve worlds in all.
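The semantics on this slide can be sketched directly in code. Below is a minimal illustration (the probability numbers are invented for the example, not taken from the talk): each possible world is one joint assignment to (difficulty, intelligence, grade), and the probabilities over all twelve worlds sum to one.

```python
# Probabilistic semantics as on the slide: a distribution over possible worlds.
# All numeric values below are illustrative, not from the talk.
from itertools import product

difficulties = ["easy", "hard"]
intelligences = ["weak", "smart"]
grades = ["A", "B", "C"]

# P(difficulty), P(intelligence), and P(grade | difficulty, intelligence)
p_diff = {"easy": 0.6, "hard": 0.4}
p_intel = {"weak": 0.7, "smart": 0.3}
p_grade = {  # (difficulty, intelligence) -> distribution over grades
    ("easy", "weak"):  {"A": 0.3, "B": 0.4, "C": 0.3},
    ("easy", "smart"): {"A": 0.9, "B": 0.08, "C": 0.02},
    ("hard", "weak"):  {"A": 0.05, "B": 0.25, "C": 0.7},
    ("hard", "smart"): {"A": 0.5, "B": 0.3, "C": 0.2},
}

# A "world" is one joint assignment; its probability is a product of factors.
worlds = {
    (d, i, g): p_diff[d] * p_intel[i] * p_grade[(d, i)][g]
    for d, i, g in product(difficulties, intelligences, grades)
}

total = sum(worlds.values())  # a proper distribution sums to 1
```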

7 Representation: Design Axes. Two axes: attributes vs. sequences vs. objects, and categorical vs. probabilistic (epistemic state vs. world state). Categorical models of attributes: propositional logic, CSPs; of sequences: automata, grammars; of objects: first-order logic, relational databases. Probabilistic models of attributes: Bayesian nets, Markov nets; of sequences: n-gram models, HMMs, probabilistic CFGs.

8 Outline: Bayesian Networks (representation & semantics; reasoning); Probabilistic Relational Models; Collective Classification; Undirected Discriminative Models; Collective Classification Revisited; PRMs for NLP.

9 Bayesian Networks. Nodes = variables; edges = direct influence. Graph structure encodes independence assumptions: Letter is conditionally independent of Intelligence given Grade. [Figure: network over Difficulty, Intelligence, Grade, SAT, and Letter, with the CPD P(G|D,I) giving a distribution over grades A, B, C.]

10 BN Semantics. Conditional independencies in the BN structure + local probability models = full joint distribution over the domain. Compact & natural representation: with nodes having ≤ k parents, 2^k · n parameters rather than 2^n, and the parameters are natural and easy to elicit.
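The parameter-count argument on this slide is easy to make concrete. A short sketch (the choice of n = 20 variables and k = 3 parents is illustrative):

```python
# The slide's counting argument: a full joint over n binary variables needs
# 2**n - 1 free parameters, while a BN where each node has at most k parents
# needs at most n * 2**k. The n and k below are illustrative.

def full_joint_params(n):
    """Free parameters of an unrestricted joint over n binary variables."""
    return 2 ** n - 1

def bn_params(n, k):
    """Upper bound for a BN: one Bernoulli parameter per parent setting."""
    return n * 2 ** k

n, k = 20, 3
full, compact = full_joint_params(n), bn_params(n, k)
```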

11 Reasoning Using BNs. The full joint distribution specifies the answer to any query: P(variable | evidence about others). “Probability theory is nothing but common sense reduced to calculation.” — Pierre Simon Laplace

12 BN Inference. BN inference is NP-hard, and exact inference in dense BNs is intractable. Structured BNs allow effective inference, which can use the graph structure: graph separation → conditional independence, so we can do separate inference in the parts and combine the results over the interface. Complexity: exponential in the largest separator.
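To make the query pattern P(variable | evidence) concrete, here is exact inference by enumeration on a tiny student network (Difficulty → Grade ← Intelligence). The CPD numbers are invented for illustration:

```python
# Exact inference by enumeration on a tiny BN: Difficulty -> Grade <- Intelligence.
# We answer P(Intelligence | Grade = "A") by summing out Difficulty.
# All CPD numbers are illustrative, not from the talk.

p_diff = {"easy": 0.6, "hard": 0.4}
p_intel = {"weak": 0.7, "smart": 0.3}
p_grade = {  # (difficulty, intelligence) -> distribution over grades
    ("easy", "weak"):  {"A": 0.3, "B": 0.4, "C": 0.3},
    ("easy", "smart"): {"A": 0.9, "B": 0.08, "C": 0.02},
    ("hard", "weak"):  {"A": 0.05, "B": 0.25, "C": 0.7},
    ("hard", "smart"): {"A": 0.5, "B": 0.3, "C": 0.2},
}

def posterior_intelligence(grade):
    """P(Intelligence | Grade = grade), marginalizing over Difficulty."""
    unnorm = {}
    for i in p_intel:
        unnorm[i] = sum(p_diff[d] * p_intel[i] * p_grade[(d, i)][grade]
                        for d in p_diff)
    z = sum(unnorm.values())
    return {i: v / z for i, v in unnorm.items()}

post = posterior_intelligence("A")
```

With these numbers, observing an A raises the belief that the student is smart, which is exactly the kind of evidence flow the surrounding slides describe.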

13 Approximate BN Inference. Belief propagation is an iterative message-passing algorithm for approximate inference in BNs. In each iteration (until “convergence”), nodes pass “beliefs” as messages to neighboring nodes. Pros: linear time per iteration; works very well in practice, even for dense networks. Cons: limited theoretical guarantees; might not converge.
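The message-passing loop the slide describes can be sketched in a few lines. The following is a minimal sum-product belief propagation on a small chain A–B–C with invented potentials; on a tree like this it converges to the exact marginals, while on loopy graphs the same updates give the approximation the slide describes:

```python
# Minimal sum-product belief propagation on a pairwise model (chain A - B - C).
# All potentials are illustrative. On a tree, the beliefs converge to the
# exact marginals; on loopy graphs the same loop is "loopy BP".

nodes = ["A", "B", "C"]
edges = [("A", "B"), ("B", "C")]
K = 2  # binary variables

unary = {n: [1.0, 1.0] for n in nodes}  # uniform local evidence
unary["A"] = [0.9, 0.1]                 # A strongly prefers state 0
pair = [[2.0, 1.0], [1.0, 2.0]]         # neighbors prefer to agree

def neighbors(n):
    return [b for a, b in edges if a == n] + [a for a, b in edges if b == n]

# messages[(i, j)][x_j] is the message from node i to node j
messages = {}
for a, b in edges:
    messages[(a, b)] = [1.0, 1.0]
    messages[(b, a)] = [1.0, 1.0]

for _ in range(10):  # iterate "until convergence" (fixed count here)
    new = {}
    for (i, j) in messages:
        m = []
        for xj in range(K):
            s = 0.0
            for xi in range(K):
                incoming = 1.0
                for k in neighbors(i):
                    if k != j:
                        incoming *= messages[(k, i)][xi]
                s += unary[i][xi] * pair[xi][xj] * incoming
            m.append(s)
        z = sum(m)
        new[(i, j)] = [v / z for v in m]
    messages = new

def belief(n):
    """Node belief: local evidence times all incoming messages, normalized."""
    b = list(unary[n])
    for k in neighbors(n):
        for x in range(K):
            b[x] *= messages[(k, n)][x]
    z = sum(b)
    return [v / z for v in b]
```

Note how the evidence at A propagates through the agreement potentials and tilts the belief at C toward state 0, even though C has no local evidence.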

14 Outline: Bayesian Networks; Probabilistic Relational Models (language & semantics; web of influence); Collective Classification; Undirected Discriminative Models; Collective Classification Revisited; PRMs for NLP.

15 Bayesian Networks: Problem. Bayesian nets use a propositional representation, but the real world has objects related to each other. [Figure: the Intelligence/Difficulty/Grade fragment duplicated per instance — Intell_Jane, Diffic_CS101, Grade_Jane_CS101; Intell_George, Diffic_Geo101, Grade_George_Geo101; Intell_George, Diffic_CS101, Grade_George_CS101.] These “instances” are not independent.

16 Probabilistic Relational Models. Combine the advantages of relational logic & BNs: natural domain modeling (objects, properties, relations); generalization over a variety of situations; compact, natural probability models. Integrate uncertainty with the relational model: properties of domain entities can depend on properties of related entities; uncertainty over the relational structure of the domain.

17 St. Nordaf University. [Figure: an example world — Prof. Smith and Prof. Jones (Teaching-Ability) teach the courses “Welcome to CS101” and “Welcome to Geo101” (Difficulty); George and Jane (Intelligence) are registered in courses, each Registration carrying a Grade and Satisfaction.]

18 Relational Schema. Specifies the types of objects in the domain, the attributes of each type of object, and the types of relations between objects. Classes & attributes: Professor (Teaching-Ability), Course (Difficulty), Student (Intelligence), Registration (Grade, Satisfaction). Relations: Teach, In, Take.

19 Probabilistic Relational Models [K. & Pfeffer; Poole; Ngo & Haddawy]. Universals: probabilistic patterns hold for all objects in a class. Locality: represent direct probabilistic dependencies; links define the potential interactions. [Figure: dependency template over Professor.Teaching-Ability, Course.Difficulty, Student.Intelligence, Reg.Grade (with values A, B, C), and Reg.Satisfaction.]

20 PRM Semantics. Instantiated PRM → BN: variables are the attributes of all objects; dependencies are determined by the links & the PRM. [Figure: the template instantiated for Prof. Smith, Prof. Jones, “Welcome to CS101”, “Welcome to Geo101”, George, and Jane.]

21 The Web of Influence. [Figure: evidence flows through the instantiated network — observed grades (A, C) in “Welcome to CS101” and “Welcome to Geo101” shift beliefs about course difficulty (easy/hard) and student intelligence (low/high).]

22 Outline: Bayesian Networks; Probabilistic Relational Models; Collective Classification & Clustering (learning models from data; collective classification of webpages); Undirected Discriminative Models; Collective Classification Revisited; PRMs for NLP.

23 Learning PRMs [Friedman, Getoor, K., Pfeffer]. [Figure: a learner takes a relational database (Course, Student, Reg tables) together with expert knowledge and produces a PRM.]

24 Learning PRMs. Parameter estimation: a probabilistic model with shared parameters (the grades of all students share the same model), so standard techniques for maximum-likelihood or Bayesian parameter estimation apply. Structure learning: define a scoring function over structures, then use combinatorial search to find a high-scoring structure.
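The parameter-sharing point is worth making concrete: because every Registration object shares one CPD P(Grade | Difficulty, Intelligence), maximum-likelihood estimation simply pools counts across all (student, course) pairs. A minimal sketch with toy data:

```python
# Shared-parameter maximum likelihood, as on the slide: all registrations
# share one CPD P(Grade | Difficulty, Intelligence), so MLE pools counts
# over every (student, course) pair. The toy data is illustrative.
from collections import Counter

# (course_difficulty, student_intelligence, grade) for each registration
registrations = [
    ("easy", "smart", "A"), ("easy", "smart", "A"), ("easy", "weak", "B"),
    ("hard", "smart", "A"), ("hard", "weak", "C"), ("hard", "weak", "C"),
    ("hard", "weak", "B"),
]

counts = Counter()         # joint counts of (difficulty, intelligence, grade)
parent_counts = Counter()  # counts of each parent configuration
for d, i, g in registrations:
    counts[(d, i, g)] += 1
    parent_counts[(d, i)] += 1

def mle(d, i, g):
    """Max-likelihood estimate of P(Grade=g | Difficulty=d, Intelligence=i)."""
    return counts[(d, i, g)] / parent_counts[(d, i)]
```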

25 Web → KB [Craven et al.]. [Figure: Tom Mitchell (Professor), Sean Slattery (Student), and the WebKB Project, connected by Advisor-of, Project-of, and Member relations.]

26 Web Classification Experiments. WebKB dataset: four CS department websites; bag of words on each page; links between pages; anchor text for links. Experimental setup: train on three universities, test on the fourth; repeat for all four combinations.

27 Standard Classification. Naive Bayes over the words on a page: Category → Word_1 … Word_N. Categories: faculty, course, project, student, other. [Figure: sample page words — professor, department, extract, information, computer, science, machine, learning — and a bar chart of test error for the “words only” model.]
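The “words only” baseline is a naive Bayes model over the bag of words. A minimal sketch with an invented two-class toy corpus (the real experiment uses the five WebKB categories and far more data):

```python
# Naive Bayes bag-of-words classifier, the "words only" baseline of the slide.
# The tiny training corpus and the two categories are illustrative.
import math
from collections import Counter, defaultdict

train = [
    ("faculty", "professor department computer science"),
    ("faculty", "professor research teaching"),
    ("course",  "course homework lecture syllabus"),
    ("course",  "course lecture exam"),
]

class_counts = Counter()
word_counts = defaultdict(Counter)
vocab = set()
for label, text in train:
    class_counts[label] += 1
    for w in text.split():
        word_counts[label][w] += 1
        vocab.add(w)

def log_posterior(label, words):
    """log P(label) + sum_w log P(w | label), with add-one smoothing."""
    total = sum(word_counts[label].values())
    lp = math.log(class_counts[label] / sum(class_counts.values()))
    for w in words:
        lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
    return lp

def classify(text):
    words = text.split()
    return max(class_counts, key=lambda c: log_posterior(c, words))
```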

28 Exploiting Links. Also use the words in the hyperlinks pointing at a page (e.g., “working with Tom Mitchell …”). [Figure: the page model extended with LinkWord variables; the bar chart compares “words only” against “link words”.]

29 Collective Classification [Getoor, Segal, Taskar, Koller]. Classify all pages collectively, maximizing the joint label probability; approximate inference by belief propagation. [Figure: From-Page and To-Page models (Category, Word_1 … Word_N) joined by a Link-Exists variable; the bar chart compares “words only”, “link words”, and “collective”.]

30 Learning with Missing Data: EM [Dempster et al. ’77]. Estimate P(Registration.Grade | Course.Difficulty, Student.Intelligence) when some values are unobserved. [Figure: EM iterates between beliefs over courses (easy/hard) and students (low/high) and re-estimates of the grade distribution over A, B, C.]

31 Discovering Hidden Types Internet Movie Database http://www.imdb.com

32 Discovering Hidden Types [Taskar, Segal, Koller]. [Figure: Movie (Genres, Rating, Year, #Votes, MPAA Rating) linked to Actor and Director, each class augmented with a hidden Type attribute.]

33 Discovering Hidden Types. Directors: Steven Spielberg, Tim Burton, Tony Scott, James Cameron, John McTiernan, Joel Schumacher, Alfred Hitchcock, Stanley Kubrick, David Lean, Milos Forman, Terry Gilliam, Francis Coppola. Actors: Anthony Hopkins, Robert De Niro, Tommy Lee Jones, Harvey Keitel, Morgan Freeman, Gary Oldman, Sylvester Stallone, Bruce Willis, Harrison Ford, Steven Seagal, Kurt Russell, Kevin Costner, Jean-Claude Van Damme, Arnold Schwarzenegger, … Movies: Wizard of Oz, Cinderella, Sound of Music, The Love Bug, Pollyanna, The Parent Trap, Mary Poppins, Swiss Family Robinson, … and Terminator 2, Batman, Batman Forever, GoldenEye, Starship Troopers, Mission: Impossible, Hunt for Red October.

34 Outline: Bayesian Networks; Probabilistic Relational Models; Collective Classification & Clustering; Undirected Discriminative Models (Markov networks; relational Markov networks); Collective Classification Revisited; PRMs for NLP.

35 Directed Models: Limitations. The acyclicity constraint limits expressive power (e.g., two objects linked to by a student are probably not both professors); we want arbitrary patterns over sets of objects & links. Acyclicity forces modeling of all potential links, so the network size is O(N²) and inference is quadratic; letting influence flow only over existing links exploits link-graph sparsity, giving network size O(N). Generative training fits all of the data rather than maximizing accuracy; we want discriminative training, maximizing P(labels | observations). Solution: undirected models [Lafferty, McCallum, Pereira].

36 Markov Networks. Graph structure encodes independence assumptions: Chris is conditionally independent of Eve given Alice & Dave. [Figure: network over Alice, Betty, Chris, Dave, and Eve with a compatibility potential ψ(A,B,C).]
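The Markov network semantics can be sketched on a tiny chain A–B–C: the joint distribution is a normalized product of compatibility potentials, and graph separation (B separates A from C) yields exactly the kind of conditional independence the slide states. Potentials are invented for illustration:

```python
# A tiny Markov network (chain A - B - C): the joint is a normalized product
# of edge compatibility potentials, and since B separates A from C, we get
# A independent of C given B. Potential values are illustrative.
from itertools import product

psi_ab = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}
psi_bc = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0}

# Unnormalized measure and partition function Z
weight = {(a, b, c): psi_ab[(a, b)] * psi_bc[(b, c)]
          for a, b, c in product((0, 1), repeat=3)}
Z = sum(weight.values())
joint = {w: v / Z for w, v in weight.items()}

def cond_a_given(b, c):
    """P(A = 1 | B = b, C = c) from the joint."""
    return joint[(1, b, c)] / (joint[(0, b, c)] + joint[(1, b, c)])

# Because B separates A and C, this value does not depend on c:
indep = abs(cond_a_given(0, 0) - cond_a_given(0, 1)) < 1e-12
```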

37 Relational Markov Networks [Taskar, Abbeel, Koller ’02]. Universals: probabilistic patterns hold for all groups of objects. Locality: represent local probabilistic dependencies; sets of links give us the possible interactions. [Figure: a template potential over the grades of two registrations, Reg.Grade and Reg2.Grade, for students in the same study group.]

38 RMN Semantics. Instantiated RMN → MN: variables are the attributes of all objects; dependencies are determined by the links & the RMN. [Figure: George, Jane, and Jill in the CS and Geo study groups for “Welcome to CS101” and “Welcome to Geo101”.]

39 Outline: Bayesian Networks; Probabilistic Relational Models; Collective Classification & Clustering; Undirected Discriminative Models; Collective Classification Revisited (discriminative training of RMNs; webpage classification; link prediction); PRMs for NLP.

40 Learning RMNs. Parameter estimation is not closed-form, but the problem is convex, so there is a unique global maximum. Maximize L = log P(Grades, Intelligence | Difficulty), where the model is built from template potentials such as ψ(Reg1.Grade, Reg2.Grade).
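The simplest instance of this training setup is logistic regression: the degenerate case of a conditional log-linear model with no link potentials. Its conditional log-likelihood is concave, so plain gradient ascent reaches the unique global maximum, matching what the slide states for RMNs. A sketch with invented toy data:

```python
# Conditional maximum likelihood by gradient ascent, in its simplest form:
# logistic regression (a log-linear model with no link potentials).
# The objective log P(labels | observations) is concave, so plain gradient
# ascent finds the global maximum. Data and learning rate are illustrative.
import math

# (features, label) pairs; feature[0] is a constant bias term
data = [([1.0, 2.0], 1), ([1.0, 1.5], 1), ([1.0, -1.0], 0), ([1.0, -2.0], 0)]
w = [0.0, 0.0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

lr = 0.5
for _ in range(200):  # gradient ascent on the conditional log-likelihood
    grad = [0.0, 0.0]
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for k in range(len(w)):
            grad[k] += (y - p) * x[k]  # gradient of log P(y | x)
    w = [wi + lr * g for wi, g in zip(w, grad)]

# predicted P(label = 1 | x) for each training point
probs = [sigmoid(sum(wi * xi for wi, xi in zip(w, x))) for x, _ in data]
```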

41 Flat Models. Logistic regression: P(Category | Words). [Figure: Category with Word_1 … Word_N and link words as features.]

42 Exploiting Links. [Figure: From-Page and To-Page models (Category, Word_1 … Word_N) joined by a Link.] 42.1% relative reduction in error relative to the generative approach.

43 More Complex Structure. [Figure: patterns relating a Faculty page’s category to the categories and words of its Students and Courses pages.]

44 Collective Classification: Results. 35.4% relative reduction in error relative to the strong flat approach.

45 Scalability. WebKB data set size: 1300 entities, 180K attributes, 5800 links. Network size per school: directed model, 200,000 variables and 360,000 edges; undirected model, 40,000 variables and 44,000 edges. Running times: directed models, 3 sec training / 180 sec classification; undirected models, 20 minutes training / 15-20 sec classification. The difference in training time decreases substantially when some training data is unobserved, or when we want to model with hidden variables.

46 Predicting Relationships. Even more interesting are the relationships between objects; e.g., verbs are almost always relationships. [Figure: Tom Mitchell (Professor), Sean Slattery (Student), and the WebKB Project, with Advisor-of and Member relations to be predicted.]

47 Flat Model. Predict the relation type of a link from its words alone: Type ∈ {NONE, advisor, instructor, TA, member, project-of}. [Figure: From-Page words, To-Page words, and LinkWord_1 … LinkWord_N feeding into Type.]

49 Collective Classification: Links. [Figure: the link Type now also depends on the Category of the From-Page and of the To-Page.]

51 Triad Model. [Figure: a Professor is Advisor of a Student; both are Members of the same Group.]

52 Triad Model. [Figure: a Professor is Advisor of a Student; the Professor is Instructor and the Student is TA of the same Course.]

54 WebKB++. Four new department web sites: Berkeley, CMU, MIT, Stanford. Labeled page types (8): faculty, student, research scientist, staff, research group, research project, course, organization. Labeled hyperlinks and virtual links (6 types): advisor, instructor, TA, member, project-of, NONE. Data set size: 11K pages, 110K links, 2 million words.

55 Link Prediction: Results. Error is measured over links predicted to be present; the link-presence cutoff is at the precision/recall break-even point (≈ 30% for all models). 72.9% relative reduction in error relative to the strong flat approach.

56 Summary. PRMs inherit the key advantages of probabilistic graphical models: coherent probabilistic semantics, and exploiting the structure of local interactions. Relational models are inherently more expressive. The “web of influence”: use all available information to reach powerful conclusions, exploiting both relational information and the power of probabilistic reasoning.

57 Outline: Bayesian Networks; Probabilistic Relational Models; Collective Classification & Clustering; Undirected Discriminative Models; Collective Classification Revisited; PRMs for NLP* (word-sense disambiguation; relation extraction; natural language understanding (?)). *An outsider’s perspective, or “Why Should I Care?”

58 Word-Sense Disambiguation. “Her advisor gave her feedback about the draft.” Neighboring words alone may not provide enough information to disambiguate; we can gain insight by considering the compatibility between the senses of related words. [Figure: candidate senses for the ambiguous words — financial, academic, physical, figurative, electrical, criticism, wind, paper.]

59 Collective Disambiguation. Objects: words in the text. Attributes: sense, gender, number, part of speech, … Links: grammatical relations (subject-object, modifier, …); close semantic relations (is-a, cause-of, …); the same word in different sentences (one sense per discourse). Compatibility parameters: learned from tagged data, or based on prior knowledge (e.g., WordNet, FrameNet). Can we infer grammatical structure and disambiguate word senses simultaneously rather than sequentially? Can we integrate inter-word relationships directly into our probabilistic model?

60 Relation Extraction. Text: “ACME’s board of directors began a search for a new CEO after the departure of current CEO, James Jackson, following allegations of creative accounting practices at ACME. [6/01] … In an attempt to improve the company’s image, ACME is considering former judge Mary Miller for the job. [7/01] … As her first act in her new position, Miller announced that ACME will be doing a stock buyback. [9/01] …” [Figure: the extracted relation graph over Miller, Jackson, ACME, and an Announcement, with relations Departs, CEO-Of, Made-Candidate, Concerns, and the query Hired??]

61 Understanding Language. “Professor Sarah met Jane. She explained the hole in her proof.” Most likely interpretation: [Figure: Professor Sarah; Student Jane; a proof reading “Theorem: P=NP” with the hole “N=1”.]

62 Resolving Ambiguity [Goldman & Charniak; Pasula & Russell]. “Professor Sarah met Jane. She explained the hole in her proof.” Probabilistic reasoning about objects, their attributes, and the relationships between them resolves attribute values, link types, and object identity: professors often meet with students, so Jane is probably a student; professors like to explain, so “she” is probably Prof. Sarah.

63 Acquiring Semantic Models. Statistical NLP reveals patterns, but standard models learn patterns at the word level, and word patterns are only implicit surrogates for underlying semantic patterns: “teacher” objects tend to participate in certain relationships, and we can use this pattern for objects not explicitly labeled as teachers. [Figure: verbs co-occurring with “teacher” — be, train, hire, pay, fire, serenade — with frequencies 24%, 3%, 1.5%, 1.4%, 0.3%.]

64 Competing Approaches. Desiderata: semantic understanding; scaling up (via learning); robustness to noise & ambiguity. [Figure: logical and statistical approaches each cover part of the desiderata — they are complementary, and PRMs aim to combine them.]

65 Statistics: from Words to Semantics. Represent statistical patterns at the semantic level: what types of objects participate in what types of relationships. Learn statistical models of semantics from text, and reason using the models to obtain a global semantic understanding of the text. [Image: Georgia O’Keeffe, “Ladder to the Moon”.]

