© Daphne Koller, 2003 Probabilistic Models of Relational Domains Daphne Koller Stanford University.


1 © Daphne Koller, 2003 Probabilistic Models of Relational Domains Daphne Koller Stanford University

2 © Daphne Koller, 2003 Overview Relational logic Probabilistic relational models Inference in PRMs Learning PRMs Uncertainty about domain structure Summary

3 © Daphne Koller, 2003 Attribute-Based Models & Relational Models

4 © Daphne Koller, 2003 Bayesian Networks: Problem Bayesian nets use propositional representation Real world has objects, related to each other Intelligence Difficulty Grade Intell_Jane Diffic_CS101 Grade_Jane_CS101 Intell_George Diffic_Geo101 Grade_George_Geo101 Intell_George Diffic_CS101 Grade_George_CS101 A C These “instances” are not independent

5 © Daphne Koller, 2003 The University BN Difficulty_ Geo101 Difficulty_ CS101 Grade_ Jane_ CS101 Intelligence_ George Intelligence_ Jane Grade_ George_ CS101 Grade_ George_ Geo101

6 © Daphne Koller, 2003 G_Homer G_Bart G_Marge G_Lisa G_Maggie G_Harry G_Betty G_Selma B_Harry B_Betty B_Selma B_Homer B_Marge B_Bart B_Lisa B_Maggie The Genetics BN G = genotype B = bloodtype

7 © Daphne Koller, 2003 Simple Approach Graphical model with shared parameters … and shared local dependency structure Want to encode this constraint: For human knowledge engineer For network learning algorithm How do we specify which nodes share params? shared (but different) structure across nodes?

8 © Daphne Koller, 2003 Simple Approach II We can write a special-purpose program for each domain: genetic inheritance (family tree imposes constraints) university (course registrations impose constraints) Is there something more general?

9 © Daphne Koller, 2003 Attribute-Based Worlds Smart_Jane & easy_CS101 ⇒ GetA_Jane_CS101 Smart_Mike & easy_Geo101 ⇒ GetA_Mike_Geo101 Smart_Jane & easy_Geo101 ⇒ GetA_Jane_Geo101 Smart_Rick & easy_CS221 ⇒ GetA_Rick_C World = assignment of values to attributes / truth values to propositional symbols

10 © Daphne Koller, 2003 Object-Relational Worlds World = relational interpretation: Objects in the domain Properties of these objects Relations (links) between objects ∀x,y (Smart(x) & Easy(y) & Take(x,y) ⇒ Grade(A,x,y))

11 © Daphne Koller, 2003 Relational Logic General framework for representing: objects & their properties classes of objects with same model relations between objects Represent a model at the template level, and apply it to an infinite set of domains Given finite domain, each instantiation of the model is propositional, but the template is not

12 © Daphne Koller, 2003 Relational Schema Specifies types of objects in domain, attributes of each type of object & types of relations between objects Student Intelligence Registration Grade Satisfaction Course Difficulty Professor Teaching-Ability Classes Attributes Teach Relations Has In

13 © Daphne Koller, 2003 St. Nordaf University [diagram: Prof. Smith and Prof. Jones teach Welcome to CS101 and Welcome to Geo101; George and Jane are registered in courses; attribute values shown: Teaching-ability (High/Low), Difficulty (Hard/Easy), Grade (A/B/C), Satisfac (Like/Hate), Intelligence (Smart/Weak)]

14 © Daphne Koller, 2003 Relational Logic: Summary Vocabulary: Classes of objects: Person, Course, Registration, … Individual objects in a class: George, Jane, … Attributes of these objects: George.Intelligence, Reg1.Grade Relationships between these objects Of(Reg1,George), Teaches(CS101,Smith) A world specifies: A set of objects, each in a class The values of the attributes of all objects The relations that hold between the objects

15 © Daphne Koller, 2003 Binary Relations Any relation can be converted into an object: R(x1,x2,…,xk) ⇒ new “relation” object y, R1(x1,y), R2(x2,y), …, Rk(xk,y) E.g., registrations are “relation objects” ⇒ Can restrict attention to binary relations R(x,y)
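The reification trick on this slide can be sketched in Python. This is our own illustration, not code from the talk; the `reify` helper and its naming scheme are hypothetical.

```python
# Convert one k-ary relation tuple into a fresh "relation object" plus k binary
# links, as on the slide: R(x1,...,xk)  =>  object y with R1(x1,y), ..., Rk(xk,y).
# (Illustrative sketch; `reify` and the name format are our own inventions.)
import itertools

_counter = itertools.count()

def reify(relation_name, args):
    """Return (relation_object, binary_links) for one k-ary tuple."""
    y = f"{relation_name}#{next(_counter)}"              # fresh relation object
    links = [(f"{relation_name}_{i + 1}", x, y)          # binary relation Ri(xi, y)
             for i, x in enumerate(args)]
    return y, links

obj, links = reify("Registered", ["Jane", "CS101"])
```

In the university schema, Registration objects are exactly such reified relation objects linking a Student and a Course.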

16 © Daphne Koller, 2003 Relations & Links Binary relations can also be viewed as links: Specify the set of objects related to x via R R(x,y) ⇒ y ∈ x.R1, x ∈ y.R2 E.g., Teaches(p,c): p.Courses = {courses c : Teaches(p,c)} c.Instructor = {professors p : Teaches(p,c)}

17 © Daphne Koller, 2003 Probabilistic Relational Models: Relational Bayesian Networks

18 © Daphne Koller, 2003 Probabilistic Models Uncertainty model: space of “possible worlds”; probability distribution over this space. In attribute-based models, world specifies assignment of values to fixed set of random variables In relational models, world specifies Set of domain elements Their properties Relations between them

19 © Daphne Koller, 2003 PRM Scope Entire set of relational worlds is infinite and too broad Assume circumscribed class of sets of worlds W[σ] consistent with some type of background knowledge σ PRM Π is a template defining P_Π(W[σ]) for any such σ Simplest class, attribute-based PRMs: σ = relational skeleton: finite set of entities E and relations between them W[σ] = all assignments of values to all attributes of entities in E PRM template defines distribution over W[σ] for any such σ [K. & Pfeffer ’98; Friedman, Getoor, K., Pfeffer ’99]

20 © Daphne Koller, 2003 Relational Bayesian Network Universals: Probabilistic patterns hold for all objects in class Locality: Represent direct probabilistic dependencies Links define potential interactions Student Intelligence Reg Grade Satisfaction Course Difficulty Professor Teaching-Ability [K. & Pfeffer; Poole; Ngo & Haddawy] ABC

21 © Daphne Koller, 2003 RBN: Semantics σ: set of objects & relations between them W[σ]: the set of all assignments of values to all attributes of all objects in σ P_Π(W[σ]) is defined by a ground Bayesian network: variables: attributes of all objects dependencies: determined by relational links in σ and the dependency structure of RBN model Π

22 © Daphne Koller, 2003 RBN Structure For each class X and attribute A, structure specifies parents for X.A For any object x in class X, x.B is parent of x.A (X.B → X.A, e.g., Course.Level → Course.Difficulty) ρ: link or chain of links For any object x in class X, x.ρ.B is parent of x.A (X.ρ.B → X.A, e.g., Reg.In.Difficulty → Reg.Grade)
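The slot-chain parents on this slide can be made concrete with a toy relational skeleton. The dict representation and the `resolve` helper below are our own sketch, not notation from the talk.

```python
# Resolve a slot chain such as Reg.In.Difficulty against a toy relational
# skeleton, yielding the parent value for Reg.Grade.
# (Illustrative data structures; not from the talk.)
skeleton = {
    "reg1": {"class": "Reg", "In": "CS101"},
    "CS101": {"class": "Course", "Difficulty": "hard"},
}

def resolve(obj, chain):
    """Follow a chain of links, then read the final attribute."""
    *links, attr = chain
    for link in links:
        obj = skeleton[obj][link]    # follow the link to the related object
    return skeleton[obj][attr]       # read the attribute on the final object

value = resolve("reg1", ["In", "Difficulty"])   # parent value for reg1.Grade
```

A chain of length zero (just an attribute) reduces to the within-class case X.B → X.A.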

23 © Daphne Koller, 2003 RBN Semantics Prof. Smith Prof. Jones Welcome to CS101 Welcome to Geo101 George Jane Teaching-ability Difficulty Grade Satisfac Intelligence

24 © Daphne Koller, 2003 Welcome to CS101 low / high The Web of Influence Welcome to Geo101 A C low high easy / hard

25 © Daphne Koller, 2003 Aggregate Dependencies Student Jane Doe (Intelligence, Honors) with Reg #5077, Reg #5054, Reg #5639 (Grade, Satisfaction) Problem!!! Need dependencies on sets, e.g. via the avg grade [CPT residue: rows A, B, C with values 0.9/0.1, 0.6/0.4, 0.2/0.8] [K. & Pfeffer ’98; Friedman, Getoor, K., Pfeffer ’99]

26 © Daphne Koller, 2003 Aggregate Dependencies Student Intelligence Honors Reg Grade Satisfaction Course Difficulty Rating Professor Teaching-Ability Stress-Level avg count sum, min, max, avg, mode, count
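The aggregators listed on this slide map almost directly onto Python built-ins. The `AGGREGATORS` table below is our own sketch of how a multi-valued dependency (e.g., Honors depending on all of a student's grades) would be collapsed to a single parent value.

```python
# Aggregation functions for multi-valued dependencies, matching the slide's
# list: sum, min, max, avg, mode, count. (The table itself is our sketch.)
from statistics import mean, mode

AGGREGATORS = {
    "sum": sum, "min": min, "max": max,
    "avg": mean, "mode": mode, "count": len,
}

# Jane's grades across her registrations (illustrative grade points)
grades = [4.0, 3.0, 3.0]
avg_grade = AGGREGATORS["avg"](grades)   # single value fed into Honors' CPD
```

The aggregated value is what the child attribute's CPD actually conditions on, keeping the CPD's arity fixed regardless of how many registrations a student has.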

27 © Daphne Koller, 2003 Basic RBN: Summary RBN specifies A probabilistic dependency structure S: a set of parents X.ρ.B for each class attribute X.A A set of local probability models: aggregator to use for each multi-valued dependency, set of CPD parameters θ[X.A] Given relational skeleton structure σ, RBN induces a probability distribution over worlds W[σ] Distribution defined via ground BN over attributes x.A of all objects

28 © Daphne Koller, 2003 Extension: Class Hierarchy Subclasses inherit all attributes of parents, but may have additional ones For inherited attribute X.A, subclass can: inherit parent’s probabilistic model overwrite with local probabilistic model Example: Professor has subclasses assistant, associate, full Inherit distribution over Stress-Level Modify distribution over Salary [K. & Pfeffer ’98]

29 © Daphne Koller, 2003 Extension: Class Hierarchies Hierarchies allow reuse in knowledge engineering and in learning Parameters and dependency models shared across more objects If class assignments specified in , class hierarchy does not introduce complications [K. & Pfeffer ’98]

30 © Daphne Koller, 2003 Coherence & Acyclicity For given skeleton σ, PRM Π asserts dependencies between attributes of objects: Π defines a coherent probability model over W[σ] if the ground graph G[Π,σ] is acyclic [diagram: Reg (Grade, Satisfaction) and Professor (Teaching-Ability, Stress-Level); Smith.Stress-Level depends probabilistically on itself] [Friedman, Getoor, K., Pfeffer ’99]

31 © Daphne Koller, 2003 Guaranteeing Acyclicity How do we guarantee that a PRM Π is acyclic for every object skeleton σ? Attribute stratification: PRM dependency structure S ⇒ template-level dependency graph: edge Y.B → X.A if X.ρ.B ∈ Parents(X.A) and the class of X.ρ is Y class dependency graph acyclic ⇒ G[Π,σ] acyclic for all σ
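The stratification check amounts to a cycle test on the class-level dependency graph. A minimal sketch using Kahn's topological-sort algorithm (function name and edge encoding are ours):

```python
# Check attribute stratification: the class-level dependency graph must be
# acyclic, which guarantees every ground network is acyclic.
# (Sketch of ours; edges are (parent_attr, child_attr) strings.)
from collections import defaultdict

def is_stratified(edges):
    succ, indeg, nodes = defaultdict(set), defaultdict(int), set()
    for u, v in edges:
        nodes |= {u, v}
        if v not in succ[u]:
            succ[u].add(v)
            indeg[v] += 1
    frontier = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while frontier:                      # peel off nodes with no remaining parents
        u = frontier.pop()
        seen += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                frontier.append(v)
    return seen == len(nodes)            # all nodes peeled  <=>  no cycle

# university model: fine
assert is_stratified([("Course.Difficulty", "Reg.Grade"),
                      ("Student.Intelligence", "Reg.Grade")])
# mutual dependence, as in the Stress-Level example: rejected
assert not is_stratified([("Professor.Stress-Level", "Reg.Satisfaction"),
                          ("Reg.Satisfaction", "Professor.Stress-Level")])
```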

32 © Daphne Koller, 2003 Limitation of Stratification Person M-chromosome P-chromosome Blood-type Person M-chromosome P-chromosome Blood-type Person M-chromosome P-chromosome Blood-type Father Mother Person.M-chromPerson.P-chrom Person.B-type

33 © Daphne Koller, 2003 Limitation of Stratification Prior knowledge: the Father-of relation is acyclic Dependence of Person.A on Person.Father.B cannot induce cycles Person M-chromosome P-chromosome Blood-type Person M-chromosome P-chromosome Blood-type Person M-chromosome P-chromosome Blood-type Father Mother

34 © Daphne Koller, 2003 Guaranteeing Acyclicity With guaranteed acyclic relations, some cycles in the dependency graph are guaranteed to be safe. We color the edges in the dependency graph: yellow: within a single object (X.B → X.A); green: via a guaranteed-acyclic relation (Y.B → X.A); red: via other relations (Y.B → X.A) A cycle is safe if it has a green edge and it has no red edge (e.g., Person.M-chrom, Person.P-chrom → Person.B-type)

35 © Daphne Koller, 2003 Object-Oriented Bayesian Nets* OOBNs are RBNs with only one type of relation: one object can be a “part-of” another Objects can only interact with component parts Other types of relationships must be embedded into the “part-of” framework Defines “neat” hierarchy of related objects Provides clearly defined object interface between object x and its enclosing object y [K. & Pfeffer ’97] * Simplified view

36 © Daphne Koller, 2003 OOBN Example Drivers, cars, collision must all be viewed as part of situation object Driver1 Car1 Owner SpeedValue Car2 Owner Driver2 SpeedValue Collision Severity Car1 Damage Car2 Damage

37 © Daphne Koller, 2003 Part-of Hierarchy Objects in the model are encapsulated within each other, forming a tree. [tree: Accident situation → Driver1, Car1, Severity, Car2, Driver2; cars contain Price, Engine (Power), Speed; drivers contain Income]

38 © Daphne Koller, 2003 Lines of Communication [same tree: Accident situation → Driver1, Car1, Severity, Car2, Driver2, with dependencies crossing object boundaries among Price, Engine, Speed, Income, Power]

39 © Daphne Koller, 2003 Probabilistic Relational Models: Relational Markov Networks

40 © Daphne Koller, 2003 Why Undirected Models? Symmetric, non-causal interactions E.g., web: categories of linked pages are correlated Cannot introduce direct edges because of cycles Patterns involving multiple entities E.g., web: “triangle” patterns Directed edges not appropriate “Solution”: Impose arbitrary direction Not clear how to parameterize CPD for variables involved in multiple interactions Very difficult within a class-based parameterization [Taskar, Abbeel, K. 2001]

41 © Daphne Koller, 2003 Markov Networks: Review Chris Dave Eve Alice Bob [clique over A, B, C] Potential ψ(A,B,C)

42 © Daphne Koller, 2003 Markov Networks: Review A Markov network is an undirected graph over some set of variables V Graph associated with a set of potentials ψ_i Each potential is a factor over a subset V_i Variables in V_i must be a (sub)clique in the network

43 © Daphne Koller, 2003 Relational Markov Networks Probabilistic patterns hold for groups of objects Groups defined as sets of (typed) elements linked in particular ways Study Group Student2 Reg2 Grade Intelligence Course Reg Grade Student Difficulty Intelligence Template potential [Taskar, Abbeel, K. 2002]


45 © Daphne Koller, 2003 RMN Language Define clique templates All tuples {reg R1, reg R2, group G} s.t. In(G, R1), In(G, R2) Compatibility potential ψ(R1.Grade, R2.Grade) Ground Markov network contains potential ψ(r1.Grade, r2.Grade) for all appropriate r1, r2
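Grounding the clique template can be sketched as follows (toy skeleton and names are ours): every pair of registrations in the same study group gets a copy of the one shared template potential.

```python
# Ground the RMN clique template: for each study group G and each pair of
# registrations R1, R2 with In(G, R1), In(G, R2), emit a clique whose factor
# is the shared template potential phi(R1.Grade, R2.Grade).
# (Toy skeleton of ours, not from the talk.)
from itertools import combinations

in_group = {"geo_sg": ["reg1", "reg2", "reg3"],   # Geo study group
            "cs_sg": ["reg4", "reg5"]}            # CS study group

def ground_cliques(in_group):
    cliques = []
    for group, regs in in_group.items():
        for r1, r2 in combinations(sorted(regs), 2):
            cliques.append((f"{r1}.Grade", f"{r2}.Grade"))  # one copy of phi each
    return cliques

cliques = ground_cliques(in_group)   # C(3,2) + C(2,2) = 4 ground cliques
```

All four ground cliques share the same parameters, which is exactly how the template ties parameters across objects.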

46 © Daphne Koller, 2003 Welcome to CS101 Ground MN (or Chain Graph) Welcome to Geo101 Difficulty Grade Intelligence George Jane Jill Intelligence Geo Study Group CS Study Group Grade

47 © Daphne Koller, 2003 PRM Inference

48 © Daphne Koller, 2003 Inference: Simple Method Define ground network as in semantics Apply standard inference methods Problem: Very large models can be specified very easily Resulting ground network often highly connected Exact inference is typically intractable In practice, often must resort to approximate methods such as belief propagation

49 © Daphne Koller, 2003 Honors Intelligence Upbringing Personality Job-prospects Student Exploiting Structure: Encapsulation Objects interact only in limited ways Can define object interface: Outputs: Object attributes influencing other objects Inputs: External attributes influencing object Object is independent of everything given interface Inference can be encapsulated within objects, with “communication” limited to interfaces GPA Intelligence [K. & Pfeffer, 1997]

50 © Daphne Koller, 2003 Exploiting Structure: Encapsulation Marginalize object distribution onto interface Dependency graph over interfaces induced by Inter-object dependencies And hence by the relational structure Perform inference over interfaces If interaction graph has low tree-width, can use exact inference E.g., part-of hierarchy in some OOBNs If relational structure more complex, can use BP A form of Kikuchi BP, where cluster selection is guided by relational structure

51 © Daphne Koller, 2003 Exploiting Structure: Reuse Objects from same class have same model For generic objects – no internal evidence – the marginal over the interface is the same Can reuse inference – a form of “lifting” [diagram: George and Jane, each with Honors, Upbringing, Personality, Job-prospects, GPA, Intelligence] [Pfeffer & K. 1998]

52 © Daphne Koller, 2003 Exploiting Structure: Reuse Generic objects often play the same role in the model Multiple students that all take the same class Reuse: compute interface once Combinatorics: compute total contribution to probability in closed form P(k students like the class | teaching-ability = low) = P(generic student likes the class | teaching-ability = low)^k [Pfeffer & K. 1998]
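The closed-form combination can be checked numerically. p = 0.3 below is an illustrative number, not one from the talk:

```python
# k exchangeable generic students (no internal evidence) contribute p**k in
# closed form, rather than k separate per-object inference passes.
p = 0.3          # P(generic student likes the class | teaching-ability = low); illustrative
k = 5

naive = 1.0
for _ in range(k):       # what per-object inference would compute, object by object
    naive *= p

closed_form = p ** k     # one inference call + closed-form combinatorics
assert abs(naive - closed_form) < 1e-12
```

The saving is that inference for the generic student runs once, independent of k.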

53 © Daphne Koller, 2003 Case Study Battlefield situation assessment for missile units: several locations, many units, each has detailed model Example object classes: Battalion, Battery, Vehicle, Location, Weather Example relations: At-Location, Has-Weather, Sub-battery/In-battalion, Sub-vehicle/In-battery

54 © Daphne Koller, 2003 Effect of Exploiting Structure [chart: running time in seconds vs. #vehicles of each type in each battery, for flat BN, no reuse, with reuse, with combinatorics] Query: P(Next-Mission) BN has ≈2400 nodes; cost ≈20 minutes (flat) vs. ≈9 sec [Pfeffer & K. 1998]

55 © Daphne Koller, 2003 PRM Learning

56 © Daphne Koller, 2003 Outline Relational Bayesian networks Likelihood function ML parameter estimation EM Structure learning Relational Markov networks Parameter estimation Applications: Collective classification – web data Relational clustering – biological data

57 © Daphne Koller, 2003 PRM Learning: Input Teaches In-course Registered In-course Prof. SmithProf. Jones George Jane Welcome to CS101 Welcome to Geo101 Teaching-ability Difficulty Registered Grade Satisfac Intelligence High Hard Easy A C B Hate Smart Weak Input: a full world 

58 © Daphne Koller, 2003 Likelihood Function Likelihood of a BN with shared parameters Joint likelihood is a product of likelihood terms One for each attribute X.A and its family For each X.A, the likelihood function aggregates counts from all occurrences x.A in world ω [Friedman, Getoor, K., Pfeffer, 1999]

59 © Daphne Koller, 2003 Likelihood Function: Multinomials Sufficient statistics: N[a, u] = number of objects x in class X with x.A = a and parent values u, aggregated over all of ω Log-likelihood: ℓ = Σ over X.A, a, u of N[a, u] · log θ[a | u]
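A toy illustration of the aggregated sufficient statistics and the resulting log-likelihood; the world and the CPD values below are our own, not the talk's:

```python
# Aggregated multinomial sufficient statistics: one count table per class
# attribute, pooled over every object in the world. (Toy data of ours.)
import math
from collections import Counter

# (grade, (difficulty, intelligence)) for every Reg object in the world omega
data = [("A", ("easy", "smart")),
        ("A", ("easy", "smart")),
        ("C", ("hard", "weak"))]
counts = Counter(data)            # N[a, u], aggregated over all Reg objects

# shared CPD parameters theta[a | u] (illustrative numbers)
theta = {("A", ("easy", "smart")): 0.8, ("C", ("easy", "smart")): 0.2,
         ("A", ("hard", "weak")): 0.1, ("C", ("hard", "weak")): 0.9}

# log-likelihood = sum over (a, u) of N[a, u] * log theta[a | u]
loglik = sum(n * math.log(theta[key]) for key, n in counts.items())
```

Because the parameters are shared, two registrations with the same grade and parent values contribute to the same count cell rather than to separate likelihood terms.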

60 © Daphne Koller, 2003 RBN Parameter Estimation MLE parameters: θ̂[a | u] = N[a, u] / Σ over a′ of N[a′, u] Bayesian estimation: Prior for each attribute X.A Posterior uses aggregated sufficient statistics
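The MLE is then just per-parent-configuration normalization of the aggregated counts (toy numbers, ours):

```python
# Maximum-likelihood estimate of the shared CPD: normalize the aggregated
# counts N[a, u] within each parent configuration u. (Toy counts of ours.)
counts = {("A", "easy"): 8, ("C", "easy"): 2,
          ("A", "hard"): 3, ("C", "hard"): 7}

totals = {}                                  # sum of counts per parent config u
for (a, u), n in counts.items():
    totals[u] = totals.get(u, 0) + n

mle = {(a, u): n / totals[u] for (a, u), n in counts.items()}
```

A Bayesian estimate would add Dirichlet prior pseudo-counts to the same aggregated table before normalizing.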

61 © Daphne Koller, 2003 Learning w. Missing Data EM Algorithm applies essentially unchanged E-step computes expected sufficient statistics, aggregated over all objects in class M-step uses ML (or MAP) parameter estimation Key difference: In general, the hidden variables are not independent Computation of expected sufficient statistics requires inference over entire network [Same reasoning as for forward-backward algorithm in temporal models]

62 © Daphne Koller, 2003 Learning w. Missing Data: EM P(Registration.Grade | Course.Difficulty, Student.Intelligence) [EM illustration: Courses (easy/hard), Students (low/high), Grades (A/B/C)] [Dempster et al. ’77]

63 © Daphne Koller, 2003 Learning RBN Structure Define set of legal RBN structures Ones with legal class dependency graphs Define scoring function  Bayesian score Product of family scores: One for each X.A Uses aggregated sufficient statistics Search for high-scoring legal structure [Friedman, Getoor, K., Pfeffer, 1999]

64 © Daphne Koller, 2003 Learning RBN Structure All operations done at class level Dependency structure = parents for X.A Acyclicity checked using class dependency graph Score computed at class level Individual objects only contribute to sufficient statistics Can be obtained efficiently using standard DB queries

65 © Daphne Koller, 2003 Exploiting Locality: Phased Search Phase 0: consider only dependencies within a class Δscore: Add C.A → C.B Δscore: Delete S.I → S.P [search over Student, Course, Reg classes]

66 © Daphne Koller, 2003 Exploiting Locality: Phased Search Phase 1: consider dependencies one link away Δscore: Add C.A → R.B Δscore: Add S.I → R.C [search over Student, Course, Reg classes]

67 © Daphne Koller, 2003 Exploiting Locality: Phased Search Phase k: consider dependencies k links away Δscore: Add C.A → S.P Δscore: Add S.I → C.B [search over Student, Course, Reg classes]

68 © Daphne Koller, 2003 Reg Grade Satisfaction Professor Popularity Teaching-Ability Student Intelligence Honors Difficulty Rating Course Exploiting Locality: Phased Search

69 © Daphne Koller, 2003 TB Patients in San Francisco [Getoor, Rhee, K., Small, 2001] Patients Contacts Strains

70 © Daphne Koller, 2003 Outbreak Suscept Age POB TB-type XRay Race CareLoc HIV Smear Gender Infected CloseCont Relation Treatment Subcase CareLoc TB Patients in San Francisco Strain Patient Contact Input: Relational database 2300 patients 1000 strains 20000 contacts Output: PRM for domain

71 © Daphne Koller, 2003 Outline Relational Bayesian networks Likelihood function ML parameter estimation EM Structure learning Relational Markov networks Parameter estimation Applications: Collective classification – web data Relational clustering – biological data

72 © Daphne Koller, 2003 Learning RMN Parameters Student2 Reg2 Grade Intelligence Course Reg Grade Student Difficulty Intelligence Template potential  Study Group Parameterize potentials as log-linear model [Taskar, Wong, Abbeel, K., 2002]

73 © Daphne Koller, 2003 Learning RMN Parameters For example: f_AA(ω) = # of tuples {reg r1, reg r2, group g} s.t. In(g, r1), In(g, r2); r1.Grade = A, r2.Grade = A Feature counts are taken in ω; normalization requires the partition function

74 © Daphne Koller, 2003 Learning RMN Parameters Parameter estimation is not closed form Convex problem ⇒ unique global maximum Can use methods such as conjugate gradient Gradient = actual count − expected count: the gradient process tries to find parameters s.t. expected counts = actual counts Computing expected counts requires inference over the ground Markov network
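The gradient can be illustrated on a two-variable toy network, with the expected count computed by exact enumeration; the model, the feature, and all numbers below are our own sketch:

```python
# Gradient of the log-linear RMN likelihood for one weight w of feature f:
# dL/dw = actual count - expected count. Expected count is computed here by
# brute-force enumeration over a tiny two-grade ground network. (Toy model.)
import math
from itertools import product

grades = ["A", "B", "C"]
w = 0.5                                        # current weight of feature f_AA
f = lambda g1, g2: 1.0 if g1 == g2 == "A" else 0.0

score = lambda g1, g2: math.exp(w * f(g1, g2))          # unnormalized measure
Z = sum(score(g1, g2) for g1, g2 in product(grades, grades))  # partition function

expected = sum(f(g1, g2) * score(g1, g2) / Z
               for g1, g2 in product(grades, grades))
actual = f("A", "A")            # observed world: both study partners got an A
gradient = actual - expected    # positive here, so w gets pushed up
```

At the optimum the gradient vanishes, i.e. expected counts match actual counts, which is the moment-matching condition on the slide.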

75 © Daphne Koller, 2003 Learning RMNs  (Reg1.Grade,Reg2.Grade) P(Grades,Intelligence,Difficulty) Difficulty Intelligence Grade low / higheasy / hard ABCABC Intelligence Grade L = log Difficulty

76 © Daphne Koller, 2003 Outline Relational Bayesian networks Likelihood function ML parameter estimation EM Structure learning Relational Markov networks Parameter estimation Applications: Collective classification – web data Relational clustering – biological data

77 © Daphne Koller, 2003 Collective Classification Model structure: Probabilistic Relational Model over Course, Student, Reg Training Data → Learning; New Data → Inference → Conclusions Example: Train on one year of student intelligence, course difficulty, and grades Given only grades in following year, predict all students’ intelligence

78 © Daphne Koller, 2003 Discriminative Training Goal: Given values of observed variables ω.O = o, predict desired target values ω.T = t* Do not necessarily want the model to fit the joint distribution P(ω.O = o, ω.T = t*) To maximize classification accuracy, we can consider other optimization criteria Maximize conditional log likelihood Maximize margin relative to the second-highest-probability label [Taskar, Abbeel, K., 2002; Taskar, Guestrin, K. 2003]

79 © Daphne Koller, 2003 Web  KB Tom Mitchell Professor WebKB Project Sean Slattery Student Advisor-of Project-of Member [Craven et al.]

80 © Daphne Koller, 2003 Web Classification Experiments WebKB dataset Four CS department websites Bag of words on each page Links between pages Anchor text for links Experimental setup Trained on three universities Tested on fourth Repeated for all four combinations

81 © Daphne Koller, 2003 Professor department extract information computer science machine learning … Standard Classification Categories: faculty course project student other Discriminatively trained model = Logistic Regression Page... Category Word 1 Word N

82 © Daphne Koller, 2003 Standard Classification Page: Category, Word1 … WordN, plus LinkWord1 … LinkWordN from anchor text (e.g., “working with Tom Mitchell …”) [bar chart: error of the Logistic baseline]

83 © Daphne Koller, 2003 Power of Context Professor? Student? Post-doc?

84 © Daphne Koller, 2003 Collective Classification Page (Category, Word1 … WordN) From-Link-To Page (Category, Word1 … WordN) Compatibility ψ(From, To)

85 © Daphne Koller, 2003 Collective Classification Page (Category, Word1 … WordN) From-Link-To Page (Category, Word1 … WordN) Classify all pages collectively, maximizing the joint label probability [bar chart: Logistic vs. Links]

86 © Daphne Koller, 2003 More Complex Structure C Wn W1 Faculty S Students S Courses


89 © Daphne Koller, 2003 Collective Classification: Results [bar chart: error for Logistic, Links, Section, Link+Section] 35.4% relative reduction in error relative to strong flat approach [Taskar, Abbeel, K., 2002]

90 © Daphne Koller, 2003 Relational Clustering Model structure: Probabilistic Relational Model over Course, Student, Reg Unlabeled Relational Data → Learning Example: Given only students’ grades, cluster similar students (clustering of instances)

91 © Daphne Koller, 2003 Movie Data Internet Movie Database http://www.imdb.com

92 © Daphne Koller, 2003 Actor Director Movie Genres Rating Year #Votes MPAA Rating Discovering Hidden Types Type [Taskar, Segal, K., 2001]

93 © Daphne Koller, 2003 Directors Steven Spielberg Tim Burton Tony Scott James Cameron John McTiernan Joel Schumacher Alfred Hitchcock Stanley Kubrick David Lean Milos Forman Terry Gilliam Francis Coppola Actors Anthony Hopkins Robert De Niro Tommy Lee Jones Harvey Keitel Morgan Freeman Gary Oldman Sylvester Stallone Bruce Willis Harrison Ford Steven Seagal Kurt Russell Kevin Costner Jean-Claude Van Damme Arnold Schwarzenegger … Movies Wizard of Oz Cinderella Sound of Music The Love Bug Pollyanna The Parent Trap Mary Poppins Swiss Family Robinson … Terminator 2 Batman Batman Forever GoldenEye Starship Troopers Mission: Impossible Hunt for Red October Discovering Hidden Types

94 © Daphne Koller, 2003 Biology 101: Pathways Pathways are sets of genes that act together to achieve a common function

95 © Daphne Koller, 2003 Biology 101: Expression Gene 2 Coding Control Gene 1 Coding Control DNA RNAProtein Swi5 Transcription factor Swi5 Cells express different subsets of their genes in different tissues and under different conditions

96 © Daphne Koller, 2003 Finding Pathways: Attempt I Use protein-protein interaction data [diagram: Pathway I, Pathway II, Pathway III]

97 © Daphne Koller, 2003 Finding Pathways: Attempt I Use protein-protein interaction data Problems: Data is very noisy Structure is lost: Large connected component (3527/3589 genes) in interaction graph

98 © Daphne Koller, 2003 Finding Pathways: Attempt II Use gene expression data Thousands of arrays available under different conditions Clustering Pathway I Pathway II

99 © Daphne Koller, 2003 Finding Pathways: Attempt II Use gene expression data Thousands of arrays available under different conditions Pathway I Pathway II Problems: Expression is only ‘weak’ indicator of interaction Data is noisy Interacting pathways are not separable

100 © Daphne Koller, 2003 Finding Pathways: Our Approach Use both types of data to find pathways Find “active” interactions using gene expression Find pathway-related co-expression using interactions Pathway I Pathway II Pathway III Pathway IV

101 © Daphne Koller, 2003 Probabilistic Model Genes are partitioned into “pathways”: Every gene is assigned to one of ‘k’ pathways Random variable for each gene with domain {1,…,k} Expression component: Model likelihood is higher when genes in the same pathway have similar expression profiles Interaction component: Model likelihood is higher when genes in the same pathway interact [Segal, Wang, K., 2003]

102 © Daphne Koller, 2003 Expression Component Naïve Bayes: the pathway of gene g (g.C) is the parent of g.E1, g.E2, g.E3, …, g.Ek, the expression levels of gene g in the m arrays

103 © Daphne Koller, 2003 Protein Interaction Component Gene 1 (g1.C) and Gene 2 (g2.C): protein product interaction Interacting genes are more likely to be in the same pathway Compatibility potential ψ(g1.C, g2.C) [table residue: potential values over pathway assignments omitted]

104 © Daphne Koller, 2003 Joint Probabilistic Model Genes 1–4 with pathway assignments g1.C … g4.C, expression variables g1.E1 … g4.E3, and compatibility potentials ψ between interacting genes [diagram residue omitted]

105 © Daphne Koller, 2003 Learning Task E-step: compute pathway assignments M-step: Estimate Gaussian distribution parameters Estimate compatibility potentials using cross-validation Large Markov network with high connectivity ⇒ use loopy belief propagation

106 © Daphne Koller, 2003 Capturing Protein Complexes Independent data set of interacting proteins [chart: Num Complexes vs. Complex Coverage (%), our method vs. standard expression clustering] 124 complexes covered at 50% for our method 46 complexes covered at 50% for clustering

107 © Daphne Koller, 2003 YHR081W RRP40 RRP42 MTR3 RRP45 RRP4 RRP43 DIS3 TRM7 SKI6 RRP46 CSL4 RNAse Complex Pathway YHR081W SKI6 RRP42 RRP45 RRP46 RRP43 TRM7 RRP40 MTR3 RRP4 DIS3 CSL4 Includes all 10 known pathway genes Only 5 genes found by clustering

108 © Daphne Koller, 2003 Interaction Clustering RNAse complex found by interaction clustering as part of cluster with 138 genes

109 © Daphne Koller, 2003 Uncertainty about Domain Structure or PRMs are not just template BNs/MNs

110 © Daphne Koller, 2003 Structural Uncertainty Class uncertainty: To which class does an object belong Relational uncertainty: What is the relational (link) structure Identity uncertainty: Which “names” refer to the same objects Also covers data association

111 © Daphne Koller, 2003 Relational Uncertainty Probability distribution over graph structures Link existence model E.g., hyperlinks between webpages Each potential link is a separate event Its existence is a random variable Link reference model E.g., instructors teaching a course Distribution over # of endpoints for each outgoing link Each link has distribution over link endpoint e.g., instructor link for CS course likely to point to CS prof Many other models possible [Getoor, Friedman, K. Taskar, 2002]

112 © Daphne Koller, 2003 Link Existence Model Background knowledge  is an object skeleton A set of entity objects PRM defines distribution over worlds  Assignments of values to all attributes Existence of links between objects [Getoor, Friedman, K. Taskar, 2002]

113 © Daphne Koller, 2003 Link Existence as a PRM Define objects for any potential links Potential link object for any pair of webpages w1, w2 Each potential link object has link existence attribute, denoting whether the link exists or not Link existence variables have probabilistic model P(Exists | From.Type, To.Type) [table residue: P(Exists = True) is small, roughly 0.001–0.005, for each (Student/Professor) type pair] [Getoor, Friedman, K., Taskar, 2002]
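The link-existence CPD is just a lookup table over type pairs. The probabilities below are illustrative, chosen only to be small in the spirit of the slide's table:

```python
# Link-existence CPD as a lookup table: P(Exists = True | From.Type, To.Type).
# The exact numbers are illustrative, not the slide's. Each potential link
# (pair of pages) contributes one Bernoulli factor to the likelihood.
import math

p_exists = {
    ("Student", "Student"): 0.001,
    ("Student", "Professor"): 0.002,
    ("Professor", "Student"): 0.005,
    ("Professor", "Professor"): 0.001,
}

def link_logprob(pairs):
    """Log-probability of an observed link pattern.

    pairs: list of ((from_type, to_type), exists) for candidate page pairs."""
    lp = 0.0
    for (t_from, t_to), exists in pairs:
        p = p_exists[(t_from, t_to)]
        lp += math.log(p if exists else 1.0 - p)
    return lp

lp = link_logprob([(("Professor", "Student"), True),
                   (("Student", "Student"), False)])
```

Note that every non-link also contributes a factor, which is what lets the model trade off link evidence against page-content evidence.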

114 © Daphne Koller, 2003 Why Are Link Models Useful? Predict which links exist in the world Which professor teaches each course Which student will register for which course Use known links to infer values of attributes Given that student registered for a hard class, is she more likely to be a graduate student Given that one page points to another, is it more likely to be a faculty page?

115 © Daphne Koller, 2003 Predicting Relationships Predict and classify relationships between objects Tom Mitchell Professor WebKB Project Sean Slattery Student Advisor-of Member

116 © Daphne Koller, 2003 Rel Flat Model... Page Word 1 Word N From-... Page Word 1 Word N To- Type... LinkWord 1 LinkWord N NONE advisor instructor TA member project-of

117 © Daphne Koller, 2003 Collective Classification: Links Rel... Page Word 1 Word N From-... Page Word 1 Word N To- Type... LinkWord 1 LinkWord N Category [Taskar, Wong, Abbeel, K., 2002]

118 © Daphne Koller, 2003 Link Model...

119 © Daphne Koller, 2003 Triad Model ProfessorStudent Group Advisor Member

120 © Daphne Koller, 2003 Triad Model ProfessorStudent Course Advisor TA Instructor

121 © Daphne Koller, 2003 Link Prediction: Results Error measured over links predicted to be present Link presence cutoff is at precision/recall break-even point (≈30% for all models) 72.9% relative reduction in error relative to strong flat approach [Taskar, Wong, Abbeel, K., 2002]

122 © Daphne Koller, 2003 Identity Uncertainty Model Background knowledge  is an object universe A set of potential objects PRM defines distribution over worlds  Assignments of values to object attributes Partition of objects into equivalence classes Objects in same class have same attribute values [Pasula, Marthi, Milch, Russell, Shpitser, 2002]

123 © Daphne Koller, 2003 Citation Matching Model* Each citation object associated with paper object Uncertainty over equivalence classes for papers If P 1 =P 2, have same attributes & links Author Name Citation ObsTitle Text * Simplified Author-as-Cited Name Paper Title PubType Appears-in Refers-to Written-by Link chain: Appears-in. Refers-to. Written-by Title, PubType Authors

124 © Daphne Koller, 2003 Identity Uncertainty Depending on choice of equivalence classes: Number of objects changes Dependency structure changes No “nice” corresponding ground BN Algorithm: Each partition hypothesis defines simple BN Use MCMC over equivalence class partition Exact inference over resulting BN defines acceptance probability for Markov chain [Pasula, Marthi, Milch, Russell, Shpitser, 2002]

125 © Daphne Koller, 2003 Identity Uncertainty Results Accuracy of citation recovery: % of actual citation clusters recovered perfectly [Pasula, Marthi, Milch, Russell, Shpitser, 2002] 61.5% relative reduction in error relative to state of the art

126 © Daphne Koller, 2003 Summary: PRMs … Inherit the advantages of graphical models: Coherent probabilistic semantics Exploit structure of local interactions Allow us to represent the world in terms of: Objects Classes of objects Properties of objects Relations

127 © Daphne Koller, 2003 So What Do We Gain? Convenient language for specifying complex models “Web of influence”: subtle & intuitive reasoning A mechanism for tying parameters and structure within models across models Framework for learning from relational and heterogeneous data

128 © Daphne Koller, 2003 So What Do We Gain? New way of thinking about models & problems Incorporating heterogeneous data by connecting related entities New problems: Collective classification Relational clustering Uncertainty about richer structures: Link graph structure Identity

129 © Daphne Koller, 2003 But What Do We Really Gain? Simple PRMs ≈ relational logic w. fixed domain and ∀ only Induce a “propositional” BN Can augment language with additional expressivity Existential quantifiers & functions Equality Resulting language is inherently more expressive, allowing us to represent distributions over worlds where dependencies vary significantly [Getoor et al., Pasula et al.] worlds with different numbers of objects [Pfeffer et al., Pasula et al.] worlds with infinitely many objects [Pfeffer & K.] Big questions: Inference & Learning Are PRMs just a convenient language for specifying attribute-based graphical models?

130 © Daphne Koller, 2003 http://robotics.stanford.edu/~koller/  Pieter Abbeel  Nir Friedman  Lise Getoor  Carlos Guestrin  Brian Milch  Avi Pfeffer  Eran Segal  Ken Takusagawa  Ben Taskar  Haidong Wang  Ming-Fai Wong http://dags.stanford.edu/

