
1 © Daphne Koller, 2005 Probabilistic Models of Relational Domains Daphne Koller Stanford University

2 Relations are Everywhere
The web: webpages (& the entities they represent), hyperlinks.
Corporate databases: customers, products, transactions.
Social networks: people, institutions, friendship links.
Biological data: genes, proteins, interactions, regulation.
Sensor data about the physical world: 3D points, objects, spatial relationships.

3 Relational Data is Different
Data instances are not independent: topics of linked webpages are correlated.
Data instances are not identically distributed: heterogeneous instances (papers, authors).
No IID assumption → this is a good thing.

4 © Daphne Koller, 2005 Attribute-Based & Relational Probabilistic Models: attribute-based probabilistic models; relational logic; relational Bayesian networks; relational Markov networks

5 Bayesian Networks
Nodes = variables; edges = direct influence.
The graph structure encodes independence assumptions: Job is conditionally independent of Intelligence given Grade.
[Figure: student network over Difficulty, Intelligence, Grade, SAT, Job, with the CPD P(G|D,I) over grades A/B/C]
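To make the factorization concrete, here is a minimal Python sketch of this student network (SAT omitted); all the probability numbers are made up for illustration, and the query is answered by brute-force enumeration over the joint:

```python
# Minimal sketch of the slide's student network (SAT omitted).
# All probabilities are illustrative, not from the talk.

# Priors
P_D = {"easy": 0.6, "hard": 0.4}          # Difficulty
P_I = {"low": 0.7, "high": 0.3}           # Intelligence

# CPD P(Grade | Difficulty, Intelligence); each row sums to 1
P_G = {
    ("easy", "low"):  {"A": 0.30, "B": 0.40, "C": 0.30},
    ("easy", "high"): {"A": 0.90, "B": 0.08, "C": 0.02},
    ("hard", "low"):  {"A": 0.05, "B": 0.25, "C": 0.70},
    ("hard", "high"): {"A": 0.50, "B": 0.30, "C": 0.20},
}
# Job depends only on Grade: Job is conditionally independent of
# Intelligence given Grade, exactly the independence the slide mentions.
P_J = {"A": 0.8, "B": 0.5, "C": 0.2}      # P(Job = yes | Grade)

def joint(d, i, g, job):
    p_job = P_J[g] if job else 1.0 - P_J[g]
    return P_D[d] * P_I[i] * P_G[(d, i)][g] * p_job

# Answer a query by enumeration: P(Intelligence = high | Job = yes)
num = sum(joint(d, "high", g, True) for d in P_D for g in "ABC")
den = sum(joint(d, i, g, True) for d in P_D for i in P_I for g in "ABC")
print(f"P(I = high | Job = yes) = {num / den:.3f}")
```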

6 Reasoning Using BNs
The full joint distribution specifies the answer to any query: P(variable | evidence about others).
[Figure: evidence propagating through the student network over Difficulty, Intelligence, Grade, SAT, Job]

7 Bayesian Networks: Problem
Bayesian nets use a propositional representation, but the real world has objects related to each other.
[Figure: the Intelligence/Difficulty/Grade pattern copied per instance: (Intell_Jane, Diffic_CS101, Grade_Jane_CS101), (Intell_George, Diffic_Geo101, Grade_George_Geo101), (Intell_George, Diffic_CS101, Grade_George_CS101)]
These "instances" are not independent.

8 The University BN
[Figure: ground network over Difficulty_Geo101, Difficulty_CS101, Intelligence_George, Intelligence_Jane, and the grade nodes Grade_Jane_CS101, Grade_George_CS101, Grade_George_Geo101]

9 The Genetics BN (G = genotype, B = bloodtype)
[Figure: family-tree network with genotype nodes G_Harry, G_Betty, G_Selma, G_Homer, G_Marge, G_Bart, G_Lisa, G_Maggie, each with a corresponding bloodtype node B_*]

10 Simple Approach
A graphical model with shared parameters… and shared local dependency structure.
We want to encode this constraint, both for the human knowledge engineer and for the network learning algorithm.
How do we specify which nodes share parameters? Which nodes have shared (but distinct) structure?

11 Simple Approach II
We can write a special-purpose program for each domain: genetic inheritance (the family tree imposes constraints); university (course registrations impose constraints).
Is there something more general?

12 Relational Logic
A general framework for representing: objects & their properties; classes of objects that share the same model; relations between objects.
Represent a model at the template level, and apply it to an infinite set of domains.
Given a finite domain, each instantiation of the model is propositional, but the template is not.

13 Relational Schema
Specifies the types of objects in the domain, the attributes of each type of object, and the types of relations between objects.
Classes & attributes: Student (Intelligence), Registration (Grade, Satisfaction), Course (Difficulty), Professor (Teaching-Ability).
Relations: Teach (Professor-Course), Has (Student-Registration), In (Registration-Course).
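As a concrete (hypothetical) rendering of this schema, the classes, attributes, and relations might be written as plain Python dataclasses; the names follow the slide, everything else is illustrative:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Professor:
    name: str
    teaching_ability: Optional[str] = None      # attribute

@dataclass
class Course:
    name: str
    difficulty: Optional[str] = None            # attribute
    instructor: Optional[Professor] = None      # "Teach" relation as a link

@dataclass
class Student:
    name: str
    intelligence: Optional[str] = None          # attribute

@dataclass
class Registration:                             # reifies Student-In-Course
    student: Student                            # "Has" link
    course: Course                              # "In" link
    grade: Optional[str] = None
    satisfaction: Optional[str] = None

jones = Professor("Jones", "high")
geo101 = Course("Geo101", "hard", jones)
jane = Student("Jane", "smart")
reg1 = Registration(jane, geo101, grade="A")
```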

14 St. Nordaf University
[Figure: an example world. Prof. Smith and Prof. Jones (Teaching-Ability) teach Welcome to CS101 and Welcome to Geo101 (Difficulty easy/hard); George and Jane (Intelligence smart/weak) hold Registered relations with Grades A/B/C and Satisfaction like/hate, connected by Teaches and In-course links]

15 Relational Logic: Summary
Vocabulary: classes of objects (Person, Course, Registration, …); individual objects in a class (George, Jane, …); attributes of these objects (George.Intelligence, Reg1.Grade); relationships between these objects (Of(Reg1, George), Teaches(CS101, Smith)).
A world specifies: a set of objects, each in a class; the values of the attributes of all objects; the relations that hold between the objects.

16 Binary Relations
Any relation can be converted into an object: R(x1, x2, …, xk) → a new "relation" object y with R1(x1, y), R2(x2, y), …, Rk(xk, y).
E.g., registrations are "relation objects" → we can restrict attention to binary relations R(x, y).
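A tiny sketch of this reduction (the helper name and the fact encoding are made up for illustration):

```python
import itertools

_fresh = itertools.count()

def reify(relation, *args):
    """Turn R(x1, ..., xk) into a new 'relation' object y plus the
    binary facts R1(x1, y), ..., Rk(xk, y)."""
    y = f"{relation}_obj{next(_fresh)}"
    facts = [(f"{relation}{i + 1}", x, y) for i, x in enumerate(args)]
    return y, facts

# E.g., a registration is the relation object for Registered(student, course):
reg, facts = reify("Registered", "Jane", "CS101")
print(reg, facts)
# Registered_obj0 [('Registered1', 'Jane', 'Registered_obj0'),
#                  ('Registered2', 'CS101', 'Registered_obj0')]
```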

17 Relations & Links
Binary relations can also be viewed as links, specifying the set of objects related to x via R: R(x, y) → y ∈ x.R1 and x ∈ y.R2.
E.g., Teaches(p, c) → p.Courses = {courses c : Teaches(p, c)} and c.Instructor = {professors p : Teaches(p, c)}.

18 Relational Bayesian Networks [K. & Pfeffer; Poole; Ngo & Haddawy]
Universals: probabilistic patterns hold for all objects in a class.
Locality: represent direct probabilistic dependencies; links define potential interactions.
[Figure: class-level template over Professor.Teaching-Ability, Student.Intelligence, Course.Difficulty, and Reg.Grade/Satisfaction, with a shared CPD over grades A/B/C]
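A minimal grounding sketch, assuming registrations are given as (student, course, professor) name triples: each registration's Grade depends on its course's Difficulty and its student's Intelligence, and its Satisfaction on the professor's Teaching-Ability, all instances sharing the class-level CPDs (structure as I read the slide's figure; the code itself is illustrative):

```python
def ground_rbn(registrations):
    """Unroll the class-level dependency template into ground-BN edges.
    registrations: iterable of (student, course, professor) name triples."""
    edges = []
    for student, course, prof in registrations:
        grade = f"Grade({student},{course})"
        satisf = f"Satisfaction({student},{course})"
        edges += [
            (f"Difficulty({course})", grade),       # shared CPD P(G | D, I)
            (f"Intelligence({student})", grade),
            (f"Teaching-Ability({prof})", satisf),  # shared CPD P(S | T)
        ]
    return edges

for edge in ground_rbn([("George", "Geo101", "Jones"),
                        ("George", "CS101", "Smith"),
                        ("Jane", "CS101", "Smith")]):
    print(edge)
```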

19 RBN Semantics
[Figure: the template unrolled over the St. Nordaf world: Teaching-Ability nodes for Prof. Smith and Prof. Jones, Difficulty nodes for Welcome to CS101 and Welcome to Geo101, Intelligence nodes for George and Jane, and Grade/Satisfaction nodes for each registration]

20 The Web of Influence
[Figure: evidence propagates through the ground network: observed grades (an A, a C) in Welcome to Geo101 and Welcome to CS101 shift beliefs about low/high intelligence and easy/hard difficulty]

21 Why Undirected Models? [Taskar, Abbeel, K. 2001]
Symmetric, non-causal interactions. E.g., on the web, the categories of linked pages are correlated, and we cannot introduce directed edges because of cycles.
Patterns involving multiple entities. E.g., "triangle" patterns on the web; directed edges are not appropriate.
"Solution": impose an arbitrary direction. But it is not clear how to parameterize a CPD for variables involved in multiple interactions; very difficult within a class-based parameterization.

22 Markov Networks
[Figure: an undirected network over people (Laura, Noah, Mary, James, Kyle), with a shared template potential on each edge]

23 Markov Networks: Review
A Markov network is an undirected graph over a set of variables V, associated with a set of potentials φi.
Each potential φi is a factor over a subset Vi; the variables in Vi must form a (sub)clique in the network.
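A toy numeric sketch of the definition (variables, potentials, and numbers all illustrative): the unnormalized probability of an assignment is the product of its clique potentials, and the partition function sums that product over all assignments:

```python
import itertools

variables = ["X1", "X2", "X3"]
# Each potential is a factor over a subset of the variables (a subclique)
potentials = [
    (("X1", "X2"), lambda a, b: 10.0 if a == b else 1.0),  # agreement
    (("X2", "X3"), lambda a, b: 5.0 if a == b else 1.0),
]

def unnormalized(assign):
    p = 1.0
    for scope, phi in potentials:
        p *= phi(*(assign[v] for v in scope))
    return p

# Partition function: sum over all joint assignments of binary variables
Z = sum(unnormalized(dict(zip(variables, vals)))
        for vals in itertools.product([0, 1], repeat=len(variables)))
print(unnormalized({"X1": 1, "X2": 1, "X3": 1}) / Z)
```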

24 Relational Markov Networks [Taskar, Abbeel, K. 2002]
Probabilistic patterns hold for groups of objects; groups are defined as sets of (typed) elements linked in particular ways.
[Figure: a template potential over the Grades of two registrations (Reg, Reg2) whose students are in the same Study Group, with attributes Intelligence and Difficulty]

25 RMN Language
Define clique templates: all tuples {reg R1, reg R2, group G} s.t. In(G, R1), In(G, R2), with a compatibility potential φ(R1.Grade, R2.Grade).
The ground Markov network contains a potential φ(r1.Grade, r2.Grade) for all appropriate r1, r2.
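A sketch of this grounding step under an assumed data structure (a dict from study group to its registrations); every ground pair shares the single template potential:

```python
from itertools import combinations

def ground_template(groups, phi):
    """groups: dict group -> list of registration ids.
    Returns ground potentials as (scope, factor) pairs, all sharing phi."""
    ground = []
    for g, regs in groups.items():
        for r1, r2 in combinations(regs, 2):   # all pairs in the same group
            ground.append(((f"{r1}.Grade", f"{r2}.Grade"), phi))
    return ground

# Shared template potential: same grades are 3x more compatible
phi = lambda g1, g2: 3.0 if g1 == g2 else 1.0
print(ground_template({"geo_study": ["reg1", "reg2", "reg3"]}, phi))
```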

26 Ground MN (or Chain Graph)
[Figure: ground network over the Intelligence of George, Jane, and Jill, the Difficulty of Welcome to CS101 and Welcome to Geo101, and Grades coupled through the CS and Geo study groups]

27 © Daphne Koller, 2005 Case Study I: Linked Documents Webpage classification Link prediction Citation matching

28 Web → KB [Craven et al.]
[Figure: extracting a knowledge base from webpages: Tom Mitchell (Professor), Sean Slattery (Student), the WebKB Project, and the relations Advisor-of, Project-of, Member]

29 Standard Classification
Classify a page into categories: faculty, course, project, student, other.
[Figure: a page's Category with observed words Word1 … WordN, e.g. "professor", "department", "extract information", "computer science", "machine learning"]

30 Standard Classification
Also use the words on incoming links (LinkWord1 … LinkWordN, e.g. "working with Tom Mitchell").
Discriminatively trained naive Markov = logistic regression.
4-fold CV: trained on 3 universities, tested on the 4th.
[Chart: test-set error of the logistic model, y-axis 0 to 0.18]

31 Power of Context
Professor? Student? Post-doc?

32 Collective Classification
Model structure: a probabilistic relational model over Course, Student, Reg; learn from training data, then run inference on new data to draw conclusions.
Example: train on one year of student intelligence, course difficulty, and grades; given only the grades in the following year, predict all students' intelligence.

33 Collective Classification Model [Taskar, Abbeel, K., 2002]
Link the categories of linked pages with a compatibility potential φ(From.Category, To.Category); classify all pages collectively, maximizing the joint label probability.
[Chart: test-set error, Logistic vs. Links model, y-axis 0 to 0.18]

34 More Complex Structure [Taskar, Abbeel, K., 2002]

35 [Figure: model with page Category C and words W1 … Wn, coupled through Faculty, Students, and Courses relations]

36 Collective Classification: Results [Taskar, Abbeel, K., 2002]
[Chart: test-set error for Logistic, Links, Section, and Link+Section models, y-axis 0 to 0.18]
35.4% error reduction over logistic.

37 Max Conditional Likelihood
Estimation: learn w by maximizing the conditional likelihood P(y* | x, w) of the training labels.
Classification: argmax_y P(y | x, w).
But we don't care about the conditional distribution P(y | x) itself.

38 Max Margin Estimation [Taskar, Guestrin, K., 2003] (see also [Collins, 2002; Hoffman 2003])
What we really want: correct class labels.
Estimation: maximize the margin γ subject to ||w|| = 1, requiring w·f(x, y*) ≥ w·f(x, y) + γ · (# labeling mistakes in y) for every alternative labeling y.
Classification: argmax_y w·f(x, y).
A quadratic program, but with exponentially many constraints.
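For intuition, a brute-force sketch of the margin-scaled constraints (toy feature map and data; real label spaces are exponentially large, which is exactly the problem the next slide addresses):

```python
import itertools
import numpy as np

def feats(x, y):
    """Toy joint feature map f(x, y) for a sequence labeling."""
    node = sum(int(xi == yi) for xi, yi in zip(x, y))   # label matches input
    edge = sum(int(a == b) for a, b in zip(y, y[1:]))   # neighbor agreement
    return np.array([node, edge], dtype=float)

def mistakes(y_star, y):
    return sum(int(a != b) for a, b in zip(y_star, y))

def min_scaled_margin(w, x, y_star, labels=(0, 1)):
    """min over y != y* of [w.f(x,y*) - w.f(x,y)] / (# labeling mistakes in y);
    the max-margin QP maximizes this quantity subject to ||w|| = 1."""
    best = float("inf")
    for y in itertools.product(labels, repeat=len(x)):
        m = mistakes(y_star, y)
        if m > 0:
            gap = w @ feats(x, y_star) - w @ feats(x, y)
            best = min(best, gap / m)
    return best

w = np.array([1.0, 0.5])
w /= np.linalg.norm(w)                  # enforce ||w|| = 1
x, y_star = (0, 1, 1), (0, 1, 1)
print(min_scaled_margin(w, x, y_star))
```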

39 Max Margin Markov Networks [Taskar, Guestrin, K., 2003]
We use the structure of the Markov network to give an equivalent formulation of the QP that is exponential only in the tree width of the network; complexity matches max-likelihood classification.
Can solve approximately in networks where the induced width is too large (analogous to loopy belief propagation).
Can use kernel-based features: SVMs meet graphical models!

40 WebKB Revisited: 16.1% relative reduction in error relative to conditional-likelihood RMNs.

41 Predicting Relationships
Even more interesting: predicting the relationships between objects.
[Figure: Tom Mitchell (Professor), Sean Slattery (Student), the WebKB Project, with the links Advisor-of and Member to predict]

42 Flat Model
Introduce an exists/type attribute Rel for each potential link, with values NONE, advisor, instructor, TA, member, project-of; learn a discriminative model for this attribute.
[Figure: Rel.Type predicted from the From-page words Word1 … WordN, the To-page words, and the link words LinkWord1 … LinkWordN]

43 Collective Classification: Links [Taskar, Wong, Abbeel, K., 2002]
[Figure: the flat link model extended with the Categories of the From- and To-pages, so link types and page categories are predicted jointly]

44 Link Prediction: Results [Taskar, Wong, Abbeel, K., 2002]
Error measured over links predicted to be present; the link-presence cutoff is at the precision/recall break-even point (≈30% for all models).
72.9% relative reduction in error relative to the strong flat approach.

45 Identity Uncertainty [Pasula, Marthi, Milch, Russell, Shpitser, 2002]
Model background knowledge as an object universe: a set of potential objects.
A PRM defines a distribution over worlds: assignments of values to object attributes, plus a partition of the objects into equivalence classes; objects in the same class have the same attribute values.

46 Citation Matching Model (simplified)
Each citation object is associated with a paper object; the uncertainty is over the equivalence classes for papers. If P1 = P2, the papers have the same attributes & links.
[Figure: schema with Citation (ObsTitle, Text), Paper (Title, PubType), Author (Name, Author-as-Cited Name), and the relations Appears-in, Refers-to, Written-by; the link chain Appears-in.Refers-to.Written-by connects a citation to the cited paper's Title, PubType, and Authors]

47 Identity Uncertainty [Pasula, Marthi, Milch, Russell, Shpitser, 2002]
Depending on the choice of equivalence classes, the number of objects changes and the dependency structure changes, so there is no "nice" corresponding ground BN.
Algorithm: each partition hypothesis defines a simple BN; use MCMC over equivalence-class partitions, where exact inference over the resulting BN defines the acceptance probability for the Markov chain.
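A toy Metropolis-style sketch of that loop: here `score` stands in for the exact-inference probability of the hypothesis's BN, and the proposal simply moves one citation between clusters (everything in this snippet is illustrative, not the paper's actual kernel):

```python
import math, random

citations = ["c1", "c2", "c3", "c4"]
titles = {"c1": "prob models", "c2": "prob model", "c3": "svm", "c4": "svms"}

def score(partition):
    """Stand-in for the log-probability of the BN induced by a partition;
    here it simply rewards clusters of citations with similar titles."""
    s = 0.0
    for cluster in partition:
        for a in cluster:
            for b in cluster:
                s += 2.0 if titles[a][:4] == titles[b][:4] else -2.0
    return s

def propose(partition):
    """Move one random citation to a random (possibly new) cluster."""
    new = [set(c) for c in partition]
    c = random.choice(citations)
    for cl in new:
        cl.discard(c)
    new = [cl for cl in new if cl] + [set()]   # allow opening a new cluster
    random.choice(new).add(c)
    return [cl for cl in new if cl]

random.seed(0)
part = [{c} for c in citations]                # start: all singletons
for _ in range(2000):
    cand = propose(part)
    if math.log(random.random()) < score(cand) - score(part):
        part = cand                            # accept
print(part)
```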

48 Identity Uncertainty Results Accuracy of citation recovery: % of actual citation clusters recovered perfectly [Pasula, Marthi, Milch, Russell, Shpitser, 2002] 61.5% relative reduction in error relative to state of the art

49 © Daphne Koller, 2005 Case Study II: 3D Objects Object registration Part finding Shape modeling Scene segmentation

50 3D Scene Understanding Goal: Understand 3D data in terms of objects and relations “puppet holding stick”

51 3D Object Models
Object models: discover object parts; model pose variation in terms of parts.
Class models: model shape variation within a class.
All models are learned from data.

52 The Dataset
Cyberware scans: 4 views, ~125k polygons, ~65k points each, with missing surfaces; datasets of 70 scans and 48 scans.

53 Standard Modeling Pipeline [Allen, Curless, Popovic 2002]
1. Articulated template; 2. fit the template to scans; 3. interpolation.
Requires a lot of human intervention, and models pose or body-shape deformations, but not both.
Similar to: [Lewis et al. '00], [Sloan et al. '01], [Mohr, Gleicher '03], …

54 Data Preprocessing: Registration Task: Establish correspondences between two surfaces [Anguelov et al., 2004]

55 Generative Model [Anguelov, Srinivasan, K., Thrun, Pang, Davis, 2004]
A deformation/transformation θ maps the model mesh X to a transformed mesh X′; data generation with correspondences C produces the data mesh Z. Correspondence c_k specifies which point x′_i generated point z_k.
Goal: given model mesh X and data mesh Z, recover the transformation θ and the correspondences C.

56 Standard Method: Nonrigid ICP
Correspondences c1, c2, … for different points are computed independently: poor correspondences → poor transformations.

57 Geodesic Potentials: close → close
Nearby points in Z must be nearby in X; correlates pairs of adjacent points z_k, z_l.

58 Geodesic Potentials: far → far
Distant points in Z must be distant in X; correlates pairs of distant points z_k, z_l.

59 Collective Correspondence Model
Each scan point z_i has a correspondence variable C_i ranging over the model points 1, 2, …, N, with an appearance potential φ(C_i) matching local appearance and a deformation potential φ(C_i, C_j) linking pairs of scan points.
Label all points collectively, maximizing the joint label probability.

60 Inference is hard!
Large model, many edges: exact inference is intractable!
Loopy belief propagation: an approximate inference algorithm that passes messages between nodes in the graph. It often works fairly well in practice, but it doesn't always converge, and when it does, the convergence point is not always very good.
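A compact sum-product loopy BP sketch for a pairwise network over binary variables, showing the message-passing loop on a small cycle (structure and potentials are illustrative):

```python
import numpy as np

def loopy_bp(nodes, edges, node_pot, edge_pot, iters=50):
    """Synchronous sum-product BP. nodes: list; edges: list of (i, j);
    node_pot[i]: length-2 array; edge_pot[(i, j)]: 2x2 array [x_i, x_j]."""
    msgs = {(i, j): np.ones(2)
            for (a, b) in edges for (i, j) in [(a, b), (b, a)]}
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            # Product of node potential and messages into i, except from j
            belief = node_pot[i].copy()
            for (k, l) in msgs:
                if l == i and k != j:
                    belief *= msgs[(k, l)]
            pot = (edge_pot[(i, j)] if (i, j) in edge_pot
                   else edge_pot[(j, i)].T)       # orient as [x_i, x_j]
            m = pot.T @ belief                    # sum out x_i
            new[(i, j)] = m / m.sum()
        msgs = new
    beliefs = {}
    for i in nodes:
        b = node_pot[i].copy()
        for (k, l) in msgs:
            if l == i:
                b *= msgs[(k, l)]
        beliefs[i] = b / b.sum()
    return beliefs

nodes = [0, 1, 2]
edges = [(0, 1), (1, 2), (2, 0)]                  # a loop
same = np.array([[2.0, 1.0], [1.0, 2.0]])         # agreement potential
node_pot = {0: np.array([0.9, 0.1]), 1: np.ones(2), 2: np.ones(2)}
edge_pot = {e: same for e in edges}
print(loopy_bp(nodes, edges, node_pot, edge_pot))
```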

61 Inference is hard!
In our case: with a very fine mesh, BP can converge to a poor result, so use a coarse-grained mesh and then refine.
There are O(n²) "farness" constraints, and inference in the resulting fully connected model is completely intractable; but most constraints are never relevant, so add constraints only as needed.

62 Results: Pose Variation [Anguelov, Srinivasan, K., Thrun, Pang, Davis, 2004]

63 Results: Shape Variation [Anguelov, Srinivasan, K., Thrun, Pang, Davis, 2004]

64 Recovering articulated models Input: models, correspondences Output: rigid parts, skeleton [Anguelov, Koller, Pang, Srinivasan, Thrun ‘04]

65 Recovering Articulation: State of the Art [Cheung et al., '03]
The algorithm assigns points to parts independently, ignoring the correlations between the assignments, and is prone to local minima. Each joint is estimated from a separate sequence; the skeletons are then combined (9 parts).

66 Recovering Articulation [Anguelov et al. '04]
Stages of the process: 1. register meshes using the Correlated Correspondences algorithm; 2. cluster the surface into rigid parts; 3. estimate joints.

67 Collective Clustering
Model structure: a probabilistic relational model over Course, Student, Reg, learned from unlabeled relational data; clustering of instances.
Examples: given only students' grades, cluster similar students; given a set of 3D meshes, cluster "related" points.

68 Learning w. Missing Data: EM
Learn a joint probabilistic model with hidden variables; the EM algorithm applies essentially unchanged. The E-step computes expected sufficient statistics, aggregated over all objects in a class; the M-step uses ML (or MAP) parameter estimation.
Key difference: in general, the hidden variables are not independent, so computing the expected sufficient statistics requires inference over the entire network.
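A self-contained EM sketch in this spirit: a shared CPD P(Grade | Intelligence) is estimated with Intelligence hidden, pooling the expected sufficient statistics over all students. (Toy data; it also makes the simplifying assumption that students are independent given their own hidden variable, the very assumption that fails in the relational setting, as noted above.)

```python
import numpy as np

# 3 students x 3 courses; grades encoded A=0, B=1, C=2 (toy data)
grades = np.array([[0, 0, 1], [2, 2, 1], [0, 1, 0]])
K, G = 2, 3                                   # intelligence values, grade values
prior = np.full(K, 1.0 / K)                   # P(intelligence)
cpd = np.random.default_rng(0).dirichlet(np.ones(G), size=K)  # P(grade | intel)

for _ in range(100):
    # E-step: posterior over each student's hidden intelligence
    log_post = np.log(prior) + np.array(
        [[np.log(cpd[k, gs]).sum() for k in range(K)] for gs in grades])
    post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    # M-step: ML estimates from expected counts, aggregated over all students
    counts = np.zeros((K, G))
    for s, gs in enumerate(grades):
        for g in gs:
            counts[:, g] += post[s]
    prior = post.mean(axis=0)
    cpd = counts / counts.sum(axis=1, keepdims=True)

print(cpd.round(2))   # shared CPD P(grade | intelligence)
```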

69 Learning w. Missing Data: EM [Dempster et al. 77]
[Figure: the shared CPD P(Registration.Grade | Course.Difficulty, Student.Intelligence), with hidden low/high intelligence and easy/hard difficulty, aggregated over all courses, students, and grades A/B/C]

70 Collective Clustering Model
[Figure: each model point's original position and hidden part assignment, together with the part transformation, determine its new position; data points (with observed positions) are tied to model points by Correspond links, and In/Near relations couple neighboring points]

71 Associative Markov Nets
Variables y_i ∈ {1, …, K} with node potentials φ_i and associative edge potentials φ_ij that reward agreement between linked variables (e.g., nearby pixels or laser scan points, similar webpages).
For K = 2, the optimal assignment can be found exactly using min-cut*; for K > 2, solve within a factor of 2 of optimal. (*Greig et al. 89, Kolmogorov & Zabih 02)
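A sketch of the K = 2 case via max-flow/min-cut, using networkx; this is the standard graph construction for binary associative energies, with illustrative costs:

```python
import networkx as nx

def amn_map(unary, pairwise):
    """MAP for a binary associative MRF as a min s-t cut.
    unary: {i: (cost_of_label0, cost_of_label1)}
    pairwise: {(i, j): penalty >= 0 for disagreement (associativity)}."""
    g = nx.DiGraph()
    for i, (c0, c1) in unary.items():
        g.add_edge("s", i, capacity=c1)   # cut iff i is labeled 1
        g.add_edge(i, "t", capacity=c0)   # cut iff i is labeled 0
    for (i, j), lam in pairwise.items():
        g.add_edge(i, j, capacity=lam)    # cut iff labels disagree
        g.add_edge(j, i, capacity=lam)
    cut_value, (s_side, _) = nx.minimum_cut(g, "s", "t")
    # Nodes on the source side take label 0, the rest label 1
    return {i: (0 if i in s_side else 1) for i in unary}, cut_value

labels, energy = amn_map(
    unary={1: (0.2, 1.0), 2: (0.5, 0.5), 3: (1.0, 0.1)},
    pairwise={(1, 2): 0.6, (2, 3): 0.6})
print(labels, energy)
```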

72 Finding Parts [Anguelov, K., Pang, Srinivasan, Thrun, 2004]
Hidden variables: the assignment of points to parts. Parameters: the part transformations.
Optimize using the EM algorithm, with the E-step performed efficiently using the min-cut algorithm; the number of clusters is determined automatically.

73 Results: Puppet articulation [Anguelov, K., Pang, Srinivasan,Thrun, 2004]

74 Results: Arm articulation [Anguelov, K., Pang, Srinivasan,Thrun, 2004]

75 Results: 50 Human Scans Tree-shaped skeleton found Rigid parts found [Anguelov, K., Pang, Srinivasan,Thrun, 2004]

76 Modeling Human Deformation [Anguelov, Srinivasan, K., Thrun, Rodgers, Davis, 2005]
Each deformed polygon is obtained from its template polygon by composing a pose deformation (predicted from nearby joint angles), a body-shape deformation (a linear subspace, PCA), and a rigid part rotation.

77 Pose Deformation
A regression function maps joint angles (input) to deformations (output).
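A least-squares sketch of such a regression function (shapes and synthetic data are illustrative stand-ins for the per-polygon deformation model):

```python
import numpy as np

rng = np.random.default_rng(0)
angles = rng.normal(size=(70, 3))    # 70 training poses x 3 nearby joint angles
# 9 = entries of a 3x3 polygon deformation matrix (flattened), synthesized
# here as a noisy linear function of the angles
deform = angles @ rng.normal(size=(3, 9)) + 0.01 * rng.normal(size=(70, 9))

# Fit W minimizing ||[angles, 1] W - deform||^2
X = np.hstack([angles, np.ones((70, 1))])
W, *_ = np.linalg.lstsq(X, deform, rcond=None)

def predict_deformation(joint_angles):
    """Predicted (flattened) 3x3 deformation for one pose."""
    return np.append(joint_angles, 1.0) @ W

print(predict_deformation(np.array([0.1, -0.2, 0.3])).round(2))
```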

78 Pose deformation [Anguelov, Srinivasan, K., Thrun, Rodgers, Davis, 2005]

79 Body Shape Deformation
Body-shape deformations are modeled in a low-dimensional subspace (PCA).
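A plain PCA-via-SVD sketch of such a subspace (synthetic data; the real model applies this to per-polygon deformation matrices):

```python
import numpy as np

rng = np.random.default_rng(1)
shapes = rng.normal(size=(48, 300))     # 48 scans x 300 deformation params

mean = shapes.mean(axis=0)
U, S, Vt = np.linalg.svd(shapes - mean, full_matrices=False)
basis = Vt[:10]                          # top 10 principal directions

def encode(shape):
    """Full deformation -> low-dimensional shape coefficients beta."""
    return basis @ (shape - mean)

def decode(beta):
    """Shape coefficients beta -> full deformation."""
    return mean + beta @ basis

beta = encode(shapes[0])
print(np.linalg.norm(shapes[0] - decode(beta)))   # reconstruction error
```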

80 Shape Deformation Model

81 Shape Transfer [Anguelov, Srinivasan, K., Thrun, Rodgers, Davis, 2005]

82 Shape Completion [Anguelov, Srinivasan, K., Thrun, Rodgers, Davis, 2005]
Given sparse surface markers, find the most probable surface w.r.t. the model (joint angles and rotations R, body shape in PCA space), yielding the completed surface.

83 Partial View Completion [Anguelov, Srinivasan, K., Thrun, Rodgers, Davis, 2005]

84 Motion Capture Animation [Anguelov, Srinivasan, K., Thrun, Rodgers, Davis, 2005]

85 Segmentation [Anguelov, Taskar, Chatalbashev, Gupta, K., Heitz, Ng, 2005]
Train a model to assign points to parts, with discriminative training using pre-segmented images.
Collective classification: neighboring points are more likely to be assigned to the same part; use an associative Markov network, with min-cut for inference.

86 3D Mapping (data provided by Michael Montemerlo & Sebastian Thrun)
Sensors: laser range finder, GPS, IMU. Labels: ground, building, tree, shrub.
Training: 30 thousand points; testing: 3 million points.

87-90 [Image-only slides]

91 Segmentation Results
Hand-labeled 180K test points.
Model   Accuracy
SVM     68%
V-SVM   73%
AMN     93%

92 Segmentation Results Comparison [Anguelov, Taskar, Chatalbashev, Gupta, K., Heitz, Ng, 2005]
Labels: head, torso, legs, background.
Model   Accuracy
SVM     86.5%
V-SVM   87.2%
AMN     94.4%
[Figure: example segmentations for RMN, SVM, and RMN without links]

93 © Daphne Koller, 2005 Case Study III: Cellular Networks Discovering regulatory networks from gene expression data Predicting protein-protein interactions

94 Model-Based Approach
Biological processes are about objects & relations.
Classes of objects: genes, experiments, tissues, patients.
Properties: observed (gene sequence, experiment conditions) and hidden (gene function).
Relations: gene regulation, protein-protein interactions.

95 Biology 101: Gene Expression
Cells express different subsets of their genes in different tissues and under different conditions.
[Figure: DNA → RNA → protein; a transcription factor (Swi5) binds the control region of a gene and regulates its coding region]

96 Gene Expression Microarrays
Measure the mRNA level for all genes in one condition; hundreds of experiments; highly noisy.
[Figure: genes x experiments matrix of expression values for gene i in experiment j, colored induced/repressed]

97 Gene Regulation Model (Segal et al., Nature Genetics, 2003)
The expression level in each module is a function of the expression of its regulators.
[Figure: relational model with a gene's Module assignment, an Experiment, the expression Levels of Regulators 1-3 in that experiment, and the gene's resulting Expression]

98 Module Networks
Goal: discover regulatory modules and their regulators.
Module genes: a set of genes that are similarly controlled. Regulation program: expression as a function of the regulators.
[Figure: a regulation program as a decision tree with true/false tests on regulators such as HAP4 and CMK1]
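As a stand-in for such a regulation program, here is a shallow regression tree predicting a module's expression from regulator expression (synthetic data and the use of scikit-learn are illustrative; the actual method learns these trees jointly with the module assignments):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
regs = rng.normal(size=(200, 3))             # 200 arrays x 3 regulators
# Module expression driven by a switch on regulator 1 (toy stand-in for
# a HAP4-like test), plus noise
module_expr = np.where(regs[:, 0] > 0,
                       1.5 + 0.1 * rng.normal(size=200),
                       -1.0 + 0.1 * rng.normal(size=200))

# The "regulation program": a depth-2 regression tree over the regulators
program = DecisionTreeRegressor(max_depth=2).fit(regs, module_expr)
print(export_text(program, feature_names=["Reg1", "Reg2", "Reg3"]))
```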

99 Global Module Map (Segal et al., Nature Genetics, 2003)
Are the module genes functionally coherent? Are some module genes known targets of the predicted regulators?
[Figure: map of numbered modules and their predicted regulators, including Hap4, Xbp1, Yer184c, Yap6, Gat1, Ime4, Lsg1, Msn4, Gac1, Gis1, Ypl230w, Not3, Sip2, Kin82, Cmk1, Tpk1, Ppt1, Tpk2, Pph3, Bmh1, Gcn20; summary counts 46/50 and 30/50]

100 Wet Lab Experiments Summary (Segal et al., Nature Genetics, 2003)
3/3 regulators regulate the computationally predicted genes.
New yeast biology suggested: Ypl230w activates protein-folding, cell wall, and ATP-binding genes; Ppt1 represses phosphate metabolism and rRNA processing; Kin82 activates energy and osmotic stress genes.

101 Human Data
Human is a lot more complicated than yeast: more genes, regulators, and noise, and less ability to perturb the system. How do we identify "real" regulatory relationships?
Idea: use comparative genomics. "Accidental" relationships in expression data are uncorrelated across species, while relevant relationships confer a selective advantage and are likely maintained.
Goal: discover regulatory modules that are shared across organisms.

102 Conserved Gene Regulation Model
Two coupled module networks, one per organism: orthologs are more likely to be in the same module, and regulation programs for the same module are more likely to share regulators.
Goal: discover regulators that are shared across organisms.

103 Conserved Regulation: Data
Human (90 arrays): normal brain (4), medulloblastoma (60), other brain tumors (26): gliomas, AT/RT, PNETs. Mouse (43 arrays): normal brain (20), medulloblastoma (23).
3718 human-mouse orthologous gene pairs measured in both human & mouse microarrays; 604 candidate regulators based on GO annotations, including both transcription factors & signaling proteins.

104 Does Adding Mouse Data Help?
By combining expression data from two species, we learn a better model of gene regulation in each.
[Chart: improvement in expression prediction accuracy on unseen arrays, measured as test-data log-likelihood gain per gene over a human-only module network, as a function of C, the bonus for assigning orthologs to corresponding modules; similar results for mouse]

105 Conserved Cyclin D Module
34/38 (human) and 34/40 (mouse) genes are shared. Significant split on medulloblastoma for both human (p < 0.02) and mouse (p < 10^-6), and poor survival in human (p < 0.03).
[Figure: expression splits separating medulloblastomas in both species: 17/22 vs. 2/11 MB, and 23/24 vs. 0/19 MB]

106 Cyclin D1 & Medulloblastoma
Cyclin D1 is known to be an important mediator of Shh-induced proliferation and tumorigenesis in medulloblastoma (Oliver, 2003).
[Figure: mouse and human expression panels]

107 Conclusion

108 Under the Hood: Representation
"Correct" modeling granularity is key: too simple → miss important structure; too rich → cannot be identified from the data.
Relational models provide significant value: exploiting correlations between related instances, and integrating across multiple data sources and multiple levels of abstraction.

109 Under the Hood: Inference
Huge graphical models (3,000-50,000 hidden variables, often very densely connected) → exact inference is intractable.
Often use belief propagation, but additional ideas are key to scaling and convergence: "smart" initialization; hierarchical, coarse-to-fine models; incremental inference, with gradual introduction of constraints; pruning the space using heuristics.
Important to identify & exploit additional structure, e.g., the use of min-cut for segmentation.

110 Under the Hood: Learning
Relational models are inherently based on reuse: fewer parameters to estimate, and more identifiable.
Algorithmic ideas are key to good accuracy: phased learning of model components; discriminative training using the max-margin approach.
Important to identify & exploit additional structure, e.g., convex optimization (relaxed linear programming) for discriminative training of certain Markov networks.

111 The Web of Influence
The world contains many different types of objects, spanning multiple scales, related in complex networks: a "web of influence" that provides powerful clues for understanding the world.
"When we try to pick out anything by itself, we find that it is bound fast by a thousand invisible cords that cannot be broken, to everything in the universe." (John Muir, 1869)

