Tuffy Scaling up Statistical Inference in Markov Logic using an RDBMS

Slides:

Advertisements

Similar presentations

Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science.

Advertisements

Bayesian Belief Propagation

Autonomic Scaling of Cloud Computing Resources

Exact Inference. Inference Basic task for inference: – Compute a posterior distribution for some query variables given some observed evidence – Sum out.

Discriminative Training of Markov Logic Networks

University of Texas at Austin Machine Learning Group Department of Computer Sciences University of Texas at Austin Discriminative Structure and Parameter.

Online Max-Margin Weight Learning for Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science.

Methods of Proof Chapter 7, second half.. Proof methods Proof methods divide into (roughly) two kinds: Application of inference rules: Legitimate (sound)

DISCOVER: Keyword Search in Relational Databases Vagelis Hristidis University of California, San Diego Yannis Papakonstantinou University of California,

Belief Propagation by Jakob Metzler. Outline Motivation Pearl’s BP Algorithm Turbo Codes Generalized Belief Propagation Free Energies.

Speeding Up Inference in Markov Logic Networks by Preprocessing to Reduce the Size of the Resulting Grounded Network Jude Shavlik Sriraam Natarajan Computer.

Andrew Courter Texas Tech University CS5331.  PKS Why PKS? STRIPS The Databases Inference Algorithm Extended Features  PKS Examples  Conclusion and.

Outline SQL Server Optimizer  Enumeration architecture  Search space: flexibility/extensibility  Cost and statistics Automatic Physical Tuning  Database.

Efficient Weight Learning for Markov Logic Networks Daniel Lowd University of Washington (Joint work with Pedro Domingos)

School of Computing Science Simon Fraser University Vancouver, Canada.

Optimizing Statistical Information Extraction Programs Over Evolving Text Fei Chen Xixuan (Aaron) Feng Christopher Ré Min Wang.

Logistics Course reviews Project report deadline: March 16 Poster session guidelines: – 2.5 minutes per poster (3 hrs / 55 minus overhead) – presentations.

A Probabilistic Framework for Information Integration and Retrieval on the Semantic Web by Livia Predoiu, Heiner Stuckenschmidt Institute of Computer Science,

CSE 574 – Artificial Intelligence II Statistical Relational Learning Instructor: Pedro Domingos.

1 BotGraph: Large Scale Spamming Botnet Detection Yao Zhao EECS Department Northwestern University.

Code and Decoder Design of LDPC Codes for Gbps Systems Jeremy Thorpe Presented to: Microsoft Research

Research at Intel Distributed Localization of Modular Robot Ensembles Robotics: Science and Systems 25 June 2008 Stanislav Funiak, Michael Ashley-Rollman.

BotGraph: Large Scale Spamming Botnet Detection Yao Zhao Yinglian Xie *, Fang Yu *, Qifa Ke *, Yuan Yu *, Yan Chen and Eliot Gillum ‡ EECS Department,

1 Uniform Sampling from the Web via Random Walks Ziv Bar-Yossef Alexander Berg Steve Chien Jittat Fakcharoenphol Dror Weitz University of California at.

Statistical Relational Learning Pedro Domingos Dept. Computer Science & Eng. University of Washington.

Boosting Markov Logic Networks

Managing Large RDF Graphs (Infinite Graph) Vaibhav Khadilkar Department of Computer Science, The University of Texas at Dallas FEARLESS engineering.

Interpreting the data: Parallel analysis with Sawzall LIN Wenbin 25 Mar 2014.

Hazy: Making Statistical Applications Easier to Build and Maintain Christopher Ré Computer Sciences University of Wisconsin-Madison.

Learning Structure in Bayes Nets (Typically also learn CPTs here) Given the set of random variables (features), the space of all possible networks.

Ahsanul Haque *, Swarup Chandra *, Latifur Khan * and Charu Aggarwal + * Department of Computer Science, University of Texas at Dallas + IBM T. J. Watson.

Lisa Torrey and Jude Shavlik University of Wisconsin Madison WI, USA.

Ahsanul Haque *, Swarup Chandra *, Latifur Khan * and Michael Baron + * Department of Computer Science, University of Texas at Dallas + Department of Mathematical.

A Markov Random Field Model for Term Dependencies Donald Metzler W. Bruce Croft Present by Chia-Hao Lee.

CHAPTERS 7, 8 Oliver Schulte Logical Inference: Through Proof to Truth.

第十讲概率图模型导论 Chapter 10 Introduction to Probabilistic Graphical Models

Transfer in Reinforcement Learning via Markov Logic Networks Lisa Torrey, Jude Shavlik, Sriraam Natarajan, Pavan Kuppili, Trevor Walker University of Wisconsin-Madison,

Daphne Koller Variable Elimination Graph-Based Perspective Probabilistic Graphical Models Inference.

Exploiting Context Analysis for Combining Multiple Entity Resolution Systems -Ramu Bandaru Zhaoqi Chen Dmitri V.kalashnikov Sharad Mehrotra.

Computer Science and Engineering Parallelizing Defect Detection and Categorization Using FREERIDE Leonid Glimcher P. 1 ipdps’05 Scaling and Parallelizing.

Relational Macros for Transfer in Reinforcement Learning Lisa Torrey, Jude Shavlik, Trevor Walker University of Wisconsin-Madison, USA Richard Maclin University.

1 Markov Random Fields with Efficient Approximations Yuri Boykov, Olga Veksler, Ramin Zabih Computer Science Department CORNELL UNIVERSITY.

Chapter 11 Statistical Techniques. Data Warehouse and Data Mining Chapter 11 2 Chapter Objectives  Understand when linear regression is an appropriate.

Advisor: Hsin-His Chen Reporter: Chi-Hsin Yu Date: From AAAI 2008 William Pentney, Department of Computer Science & Engineering University of.

Approximate Inference: Decomposition Methods with Applications to Computer Vision Kyomin Jung ( KAIST ) Joint work with Pushmeet Kohli (Microsoft Research)

MapReduce and Data Management Based on slides from Jimmy Lin’s lecture slides ( (licensed.

“Isolating Failure Causes through Test Case Generation “ Jeremias Rößler Gordon Fraser Andreas Zeller Alessandro Orso Presented by John-Paul Ore.

Speeding Up Warehouse Physical Design Using A Randomized Algorithm Minsoo Lee Joachim Hammer Dept. of Computer & Information Science & Engineering University.

© 2009 Ilya O. Ryzhov 1 © 2008 Warren B. Powell 1. Optimal Learning On A Graph INFORMS Annual Meeting October 11, 2009 Ilya O. Ryzhov Warren Powell Princeton.

FACTORBASE: SQL for Multi-Relational Model Learning Zhensong Qian and Oliver Schulte, Simon Fraser University, Canada 1.Qian, Z.; Schulte, O. The BayesBase.

Indexing Correlated Probabilistic Databases Bhargav Kanagal, Amol Deshpande University of Maryland, College Park, USA SIGMOD Presented.

RDFPath: Path Query Processing on Large RDF Graph with MapReduce Martin Przyjaciel-Zablocki et al. University of Freiburg ESWC May 2013 SNU IDB.

From Theory to Practice: Efficient Join Query Processing in a Parallel Database System Shumo Chu, Magdalena Balazinska and Dan Suciu Database Group, CSE,

Query Processing – Query Trees. Evaluation of SQL Conceptual order of evaluation – Cartesian product of all tables in from clause – Rows not satisfying.

DISTIN: Distributed Inference and Optimization in WSNs A Message-Passing Perspective SCOM Team

DeepDive Model Dongfang Xu Ph.D student, School of Information, University of Arizona Dec 13, 2015.

Single-Pass Belief Propagation

1 Scalable Probabilistic Databases with Factor Graphs and MCMC Michael Wick, Andrew McCallum, and Gerome Miklau VLDB 2010.

Community detection via random walk Draft slides.

Happy Mittal (Joint work with Prasoon Goyal, Parag Singla and Vibhav Gogate) IIT Delhi New Rules for Domain Independent Lifted.

Artificial Intelligence Lecture No. 8 Dr. Asad Ali Safi Assistant Professor, Department of Computer Science, COMSATS Institute of Information Technology.

Multi-cellular paradigm The molecular level can support self- replication (and self- repair). But we also need cells that can be designed to fit the specific.

Happy Mittal Advisor : Parag Singla IIT Delhi Lifted Inference Rules With Constraints.

BAHIR DAR UNIVERSITY Institute of technology Faculty of Computing Department of information technology Msc program Distributed Database Article Review.

Markov Random Fields with Efficient Approximations

Markov Logic Networks for NLP CSCI-GA.2591

Knowledge Base Completion

Lecture 20: Scaling Inference with RDBMS

Dong Deng+, Yu Jiang+, Guoliang Li+, Jian Li+, Cong Yu^

Yingze Wang and Shi-Kuo Chang University of Pittsburgh

Presentation transcript:

Tuffy Scaling up Statistical Inference in Markov Logic using an RDBMS Feng Niu, Chris Ré, AnHai Doan, and Jude Shavlik University of Wisconsin-Madison

Tuffy scales up Markov Logic by orders of magnitude using an RDBMS One Slide Summary Machine Reading is a DARPA program to capture knowledge expressed in free-form text Similar challenges in enterprise applications We use Markov Logic, a language that allows rules that are likely – but not certain – to be correct Markov Logic yields high quality, but current implementations are confined to small scales Tuffy scales up Markov Logic by orders of magnitude using an RDBMS

Outline Markov Logic Tuffy the System Data model Query language Inference = grounding then search Tuffy the System Scaling up grounding with RDBMS Scaling up search with partitioning

A Familiar Data Model Datalog? EDB IDB Markov Logic program Datalog + Weights ≈ Markov Logic Relations with known facts Relations to be predicted EDB IDB

* [Richardson & Domingos 2006] Markov Logic* Syntax: a set of weighted logical rules Weights: cost for rule violation Semantics: a distribution over possible worlds Each possible world 𝐼 incurs total cost cost(𝐼) Pr[𝐼] ∝⁡ exp⁡(−cost(𝐼)) Thus most likely world has lowest cost 3 wrote(s,t) ∧ advisedBy(s,p)  wrote(p,t) // students’ papers tend to be co-authored by advisors exponential models * [Richardson & Domingos 2006]

Markov Logic by Example Rules 3 wrote(s,t) ∧ advisedBy(s,p)  wrote(p,t) // students’ papers tend to be co-authored by advisors 5 advisedBy(s,p) ∧ advisedBy(s,q)  p = q // students tend to have at most one advisor ∞ advisedBy(s,p)  professor(p) // advisors must be professors EDB IDB Evidence wrote(Tom, Paper1) wrote(Tom, Paper2) wrote(Jerry, Paper1) professor(John) … Query advisedBy(?, ?) // who advises whom

Inference Rules Inference Query Relations Evidence Relations MAP regular tuples Marginal tuple probabilities

Inference Rules Query Relations Grounding Search Evidence Relations Find tuples that are relevant (to the query) Find tuples that are true (in most likely world)

How to Perform Inference Step 1: Grounding Instantiate the rules wrote(s, t) ∧ advisedBy(s, p)  wrote(p, t) 3 wrote(Tom, P1) ∧ advisedBy(Tom, Jerry)  wrote (Jerry, P1) 3 wrote(Tom, P1) ∧ advisedBy(Tom, Chuck)  wrote (Chuck, P1) 3 wrote(Chuck, P1) ∧ advisedBy(Chuck, Jerry)  wrote (Jerry, P1) 3 wrote(Chuck, P2) ∧ advisedBy(Chuck, Jerry)  wrote (Jerry, P2) … Grounding

How to Perform Inference Step 1: Grounding Instantiated rules  Markov Random Field (MRF) A graphical structure of correlations 3 wrote(Tom, P1) ∧ advisedBy(Tom, Jerry)  wrote (Jerry, P1) 3 wrote(Tom, P1) ∧ advisedBy(Tom, Chuck)  wrote (Chuck, P1) 3 wrote(Chuck, P1) ∧ advisedBy(Chuck, Jerry)  wrote (Jerry, P1) 3 wrote(Chuck, P2) ∧ advisedBy(Chuck, Jerry)  wrote (Jerry, P2) … Nodes: Truth values of tuples Edges: Instantiated rules

How to Perform Inference Step 2: Search Problem: Find most likely state of the MRF (NP-hard) Algorithm: WalkSAT*, random walk with heuristics Remember lowest-cost world ever seen advisee advisor Tom Jerry Chuck Search False True * [Kautz et al. 2006]

Outline Markov Logic Tuffy the System Data model Query language Inference = grounding then search Tuffy the System Scaling up grounding with RDBMS Scaling up search with partitioning

Challenge 1: Scaling Grounding Previous approaches Store all data in RAM Top-down evaluation [Singla and Domingos 2006] [Shavlik and Natarajan 2009] RAM size quickly becomes bottleneck Even when runnable, grounding takes long time

3 wrote(s, t) ∧ advisedBy(s, p)  wrote(p, t) Grounding in Alchemy* Prolog-style top-down grounding with C++ loops Hand-coded pruning, reordering strategies 3 wrote(s, t) ∧ advisedBy(s, p)  wrote(p, t) For each person s: For each paper t: If !wrote(s, t) then continue For each person p: If wrote(p, t) then continue Emit grounding using <s, t, p> Grounding sometimes accounts for over 90% of Alchemy’s run time [*] reference system from UWash

Grounding in Tuffy Encode grounding as SQL queries Executed and optimized by RDBMS

Grounding Performance Tuffy achieves orders of magnitude speed-up Relational Classification Entity Resolution Alchemy [C++] 68 min 420 min Tuffy [Java + PostgreSQL] 1 min 3 min Evidence tuples 430K 676 Query tuples 10K 16K Rules 15 3.8K Yes, join algorithms & optimizer are the key!

Challenge 2: Scaling Search Grounding Search

Challenge 2: Scaling Search First attempt: pure RDBMS, search also in SQL No-go: millions of random accesses Obvious fix: hybrid architecture RDBMS RAM Grounding Search Alchemy Tuffy-DB Tuffy Problem: stuck if |MRF | > |RAM|!

Partition to Scale up Search Observation MRF sometimes have multiple components Solution Partition graph into components Process in turn

Effect of Partitioning Pro Con (?) Motivated by scalability Willing to sacrifice quality Scalability Parallelism What’s the effect on quality?

Partitioning Hurts Quality? Alchemy took over 1 hr. Quality similar to Tuffy-no-part Relational Classification Goal: lower the cost quickly Partitioning can actually improve quality!

Partitioning (Actually) Improves Quality Reason: Tuffy Tuffy-no-part WalkSATiteration cost1 cost2 cost1 + cost2 1 5 20 25 min

Partitioning (Actually) Improves Quality Reason: Tuffy Tuffy-no-part WalkSATiteration cost1 cost2 cost1 + cost2 1 5 20 25 2 10 30 min

Partitioning (Actually) Improves Quality Reason: Tuffy Tuffy-no-part WalkSATiteration cost1 cost2 cost1 + cost2 1 5 20 25 2 10 30 3 min cost[Tuffy] = 10 cost[Tuffy-no-part] = 25

Partitioning (Actually) Improves Quality Theorem (roughly): Under certain technical conditions, component-wise partitioning reduces expected time to hit an optimal state by (2 ^ #components) steps. 100 components  100 years of gap!

In the paper: cost-based trade-off model Further Partitioning Partition one component further into pieces Graph Scalability Quality   Sparse / Dense In the paper: cost-based trade-off model

Conclusion Markov Logic is a powerful framework for statistical inference But existing implementations do not scale Tuffy scales up Markov Logic inference RDBMS query processing is perfect fit for grounding Partitioning improves search scalability and quality Try it out! http://www.cs.wisc.edu/hazy/tuffy