Scalable Probabilistic Databases with Factor Graphs and MCMC. Michael Wick, Andrew McCallum, and Gerome Miklau. VLDB 2010.

Presentation transcript:

1 Scalable Probabilistic Databases with Factor Graphs and MCMC. Michael Wick, Andrew McCallum, and Gerome Miklau. VLDB 2010

2 Outline
- Background of research
- Key contributions
- FACTORIE language
- Models for information extraction
- MCMC with database "assist"
- Experimental results
- Implications for information extraction more generally

3 Background of research
- McCallum is an ML researcher crossing the bridge to DB
  - Mostly tools and apps (incl. IE) for undirected models
- "Probabilistic databases" are undergoing significant evolution (see the survey by Dalvi et al., CACM, 2009):
  - Early PDB systems attached probabilities to tuples: 0.7: Employs(IBM, John), 0.95: Employs(IBM, Mary), etc.
  - Aggregation queries etc. under global independence
  - Around 2005, model-based approaches took over, but faced the same issues (expressive power, complexity) as in AI

(Slides 4-8: figures only, no transcript.)

9 Key contributions
- Increasingly sophisticated CRF-like models for extraction, entity resolution, schema mapping, etc.
- FACTORIE for model construction and inference
- Efficient MCMC inference on relational worlds
  - Handles very large models without blowing up
  - Efficient local computation for each MC step
- Integration with database technology:
  - Possible world = database; MC step = database update
  - Query evaluation directly on the database
  - Incremental re-evaluation after each MC step

(Slides 11-12: figures only, no transcript.)

13 Factor graphs
- Nodes are variables and factors (potentials on sets of variables)
- Links connect variables to the factors that include them
- P(x_1, ..., x_n) = (1/Z) ∏_j F_j(s_j), where (in this paper) F_j(s_j) = exp(ϕ_j(s_j) · θ_j) with feature functions ϕ_j (a minimal sketch of this scoring follows below)
- FACTORIE uses loops in a way analogous to BUGS (plates)
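To make the log-linear scoring concrete, here is a minimal Python sketch (FACTORIE itself is a Scala library; the `unnormalized_prob` helper and the data layout are illustrative assumptions, not its API):

```python
import math

# A minimal sketch of log-linear factor-graph scoring. Each factor j
# touches a subset s_j of the variables and scores it as
# exp(phi_j(s_j) . theta_j); the product over factors gives P(x) up to Z.

def unnormalized_prob(world, factors):
    """world: dict var -> value; factors: list of (vars, phi, theta) triples."""
    log_score = 0.0
    for vars_j, phi_j, theta_j in factors:
        s_j = tuple(world[v] for v in vars_j)        # the factor's slice of the world
        feats = phi_j(s_j)                           # feature vector phi_j(s_j)
        log_score += sum(f * t for f, t in zip(feats, theta_j))
    return math.exp(log_score)                       # proportional to P(x); Z never computed

# Example: two binary variables tied by a single "agreement" factor.
factors = [(("x1", "x2"), lambda s: [1.0 if s[0] == s[1] else 0.0], [2.0])]
print(unnormalized_prob({"x1": True, "x2": True}, factors))   # exp(2.0) ~= 7.389
```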

14 MCMC (Metropolis-Hastings)
- Worlds x, evidence e, posterior π(x) = P(x | e) = P(x, e)/P(e)
- Proposal distribution q(x' | x) determines the neighborhood of x
- MH samples x' from q(x' | x) and accepts with probability
  α(x' | x) = min(1, [π(x') q(x | x')] / [π(x) q(x' | x)]) = min(1, [P(x', e) q(x | x')] / [P(x, e) q(x' | x)])
- For graphical models (and BLOG), P(x, e) is a product of local conditional probabilities (or potentials)
- If the change from x to x' is local (e.g., a single tuple becomes true or false), almost all terms in P(x, e) and P(x', e) cancel out (see the sketch below)
- Hence the per-step computation cost is independent of model size
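A hedged sketch of one such local MH step; `factors_of` and `log_factor` are hypothetical helpers standing in for whatever index maps a variable to the factors touching it:

```python
import math
import random

def mh_step(world, factors_of, log_factor, rng=random):
    """One MH step with a single-variable flip proposal.
    factors_of(var) returns only the factors touching var, so the
    acceptance ratio involves a handful of local terms rather than
    the whole product P(x, e)."""
    var = rng.choice(list(world))
    old = world[var]
    local = factors_of(var)                          # only factors whose value can change
    log_old = sum(log_factor(f, world) for f in local)
    world[var] = not old                             # e.g., toggle a tuple true/false
    log_new = sum(log_factor(f, world) for f in local)
    # Symmetric proposal, so the q terms cancel: accept w.p. min(1, P(x',e)/P(x,e)).
    log_ratio = log_new - log_old
    if log_ratio < 0 and rng.random() >= math.exp(log_ratio):
        world[var] = old                             # reject: restore the old world
    return world
```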

15 Additional efficiency
- Pasula and Russell, "Approximate Inference for First-Order Probabilistic Languages", IJCAI-01
  - MCMC over relational worlds with identity uncertainty
- One specific issue: how to sample a new object for a function value?
  - E.g., sample a Prof as the value of Advisor(Student_12), where the student's choice depends on the funding each Prof has
  - Gibbs sampling: n Profs => compute probabilities for n networks!
  - Metropolis-Hastings: propose a new advisor, evaluate the ratio (sketched below)
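A sketch of that contrast under stated assumptions: `log_affinity` is a hypothetical log-potential tying a student to a professor (e.g., via funding). A Gibbs move would have to evaluate it for all n professors to normalize; the MH move below needs only the proposed and current values:

```python
import math
import random

def mh_resample_advisor(student, profs, advisor, log_affinity, rng=random):
    """advisor: dict student -> prof. Proposes a uniformly random new
    advisor and accepts via the MH ratio (uniform proposal: q terms cancel)."""
    current = advisor[student]
    proposed = rng.choice(profs)
    log_ratio = log_affinity(student, proposed) - log_affinity(student, current)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        advisor[student] = proposed                  # accept: one evaluation, not n
    return advisor
```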

16-20 MCMC on values
(Figures only: five animation frames showing successive MCMC states on a network over variables B(H_1)..B(H_5), A(H_1)..A(H_5), Earthquake(R_a), and Earthquake(R_b).)

21 Integration with DB technology
- Databases are designed for:
  - storing lots of data
  - efficient processing of queries on lots of data
- How much can we borrow from DB technology to help with probabilistic IE?

(Slides 22-32: figures only, no transcript.)

33 Optimizing query evaluation
- In databases, running a query can be expensive, especially if it involves scanning all the data:
  - Aggregation, e.g., #{x, y : R(x, y) ∧ R(y, x)}
  - Quantifier alternation, chains of literals, etc.
- A materialized view is a cached database-table representation of a query result
- Incremental view maintenance recomputes the materialized view whenever any tuple changes
  - E.g., if R(A, B) is set to true, check R(B, A) and add 1 (toy sketch below)
- So the query can be re-evaluated much faster after each MC step
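A toy Python sketch of that incremental update, counting unordered pairs so that setting R(A, B) true when R(B, A) already holds adds exactly 1. The paper relies on a real DBMS's materialized views; this only illustrates the O(1) maintenance step:

```python
class MutualPairCount:
    """Materialized count of unordered pairs {x, y} with both
    R(x, y) and R(y, x) true, maintained incrementally."""

    def __init__(self):
        self.R = set()        # current extension of relation R
        self.count = 0        # the materialized view over R

    def set_true(self, x, y):
        if (x, y) in self.R:
            return
        if x == y or (y, x) in self.R:   # mirror-tuple check instead of a rescan
            self.count += 1
        self.R.add((x, y))

    def set_false(self, x, y):
        if (x, y) not in self.R:
            return
        if (y, x) in self.R:             # includes x == y: the mutual pair breaks
            self.count -= 1
        self.R.discard((x, y))

view = MutualPairCount()
view.set_true("A", "B"); view.set_true("B", "A")
print(view.count)                        # 1: the pair {A, B} is now mutual
```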

(Slide 34: figure only, no transcript.)

35 Drawbacks of black-box DB technology
- Modifying tuples in a disk-resident DB is expensive
  - DB technology is designed mostly for atomic transactions; 500/second on a $10K system
- Difficult to add new types of optimization, e.g., maintaining efficient summaries (min, etc.)
- Not suitable for some data types, e.g., images
- A "database" sounds like a "possible world", but only under Herbrand semantics

36 Experiments: NER
- Skip-chain CRF includes links between labels for identical tokens (but not across docs!); see the sketch below
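A hedged sketch of how such skip edges might be enumerated; the capitalization filter is an assumption for illustration, not the paper's exact rule:

```python
from collections import defaultdict
from itertools import combinations

def skip_edges(doc_tokens):
    """doc_tokens: list of token strings for a single document.
    Returns index pairs (i, j) whose labels get joined by a skip factor;
    pairs are only formed within the document, never across documents."""
    positions = defaultdict(list)
    for i, tok in enumerate(doc_tokens):
        if tok[:1].isupper():                        # likely-name tokens only
            positions[tok].append(i)
    return [p for idxs in positions.values() if len(idxs) > 1
            for p in combinations(idxs, 2)]

print(skip_edges(["Smith", "said", "that", "Smith", "agreed"]))  # [(0, 3)]
```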

37 Experiments: NER
- Proposal distribution (sketched below):
  - Choose up to five documents at random
  - Choose one label variable at random among these
  - Choose a label at random
- Data: 1788 NYT articles
- Query: # of B-PER labels (evaluated every 10K MC steps), plus/minus 50
- Essentially each B-PER decision is independent; too many parameters, too little context, no parameter uncertainty!
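A minimal sketch of that proposal under assumed data structures (documents as lists of label-variable dicts; the tag set is illustrative):

```python
import random

LABELS = ["B-PER", "I-PER", "B-ORG", "I-ORG", "O"]   # illustrative tag set

def propose(docs, rng=random):
    """docs: list of documents, each a list of {'label': ...} variables.
    Mutates one label and returns (variable, old_label) so the caller
    can score the move and revert it on rejection."""
    pool = rng.sample(docs, k=min(5, len(docs)))     # up to five documents
    variables = [v for doc in pool for v in doc]
    var = rng.choice(variables)                      # one label variable at random
    old = var["label"]
    var["label"] = rng.choice(LABELS)                # one label at random
    return var, old
```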

38 Summary
- A serious attempt to create scalable, nontrivial probability models and inference technology for IE
- Experiments unconvincing, both for raw efficiency and reasonableness
- Not clear whether FACTORIE is "elegantly" usable to create very complex models
- Some continuing work...

(Slide 39: figure only, no transcript.)