1 Learning Markov Logic Network Structure Via Hypergraph Lifting
Stanley Kok
Dept. of Computer Science and Eng., University of Washington, Seattle, USA
Joint work with Pedro Domingos

2 Synopsis of LHL
Goal of LHL. Input: a relational DB, e.g.
Advises: (Pete, Sam), (Pete, Saul), (Paul, Sara), …
TAs: (Sam, CS1), (Sam, CS2), (Sara, CS1), …
Teaches: (Pete, CS1), (Pete, CS2), (Paul, CS2), …
Output: a probabilistic KB, e.g.
2.7 Teaches(p, c) ∧ TAs(s, c) ⇒ Advises(p, s)
1.4 Advises(p, s) ⇒ Teaches(p, c) ∧ TAs(s, c)
-1.1 TAs(s, c) ⇒ Advises(s, p)
…
[Figure: the DB drawn as a hypergraph over the constants Pete, Paul, Pat, Phil, Sam, Sara, Saul, Sue, and CS1–CS8 with Teaches, TAs, and Advises hyperedges, lifted into clusters labeled Professor, Student, and Course.]

3 Experimental Results
[Bar charts: area under the precision-recall curve (AUC) and conditional log-likelihood (CLL) for LHL, BUSL, and MSL.]

4 Outline
Background
Learning via Hypergraph Lifting
Experiments
Future Work

5 Markov Logic
A logical KB is a set of hard constraints on the set of possible worlds
Let’s make them soft constraints: when a world violates a formula, it becomes less probable, not impossible
Give each formula a weight (higher weight ⇒ stronger constraint)

6 Markov Logic
A Markov logic network (MLN) is a set of pairs (F, w):
F is a formula in first-order logic
w is a real number
It defines a probability distribution over worlds x (vectors of truth assignments to ground atoms):
P(X = x) = (1/Z) exp( Σᵢ wᵢ nᵢ(x) )
where Z is the partition function, wᵢ is the weight of the i-th formula, and nᵢ(x) is the number of true groundings of the i-th formula in x.
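A minimal numeric sketch of this distribution (illustrative only; the function and variable names are my own, not part of any MLN system):

import math

def unnormalized_prob(counts, weights):
    # exp(sum_i w_i * n_i(x)) for one world x, where counts[i] = n_i(x)
    return math.exp(sum(w * n for w, n in zip(weights, counts)))

# Two formulas with weights 2.7 and -1.1 that have 3 and 1 true
# groundings in some world x; Z would sum this quantity over all
# possible worlds, so only the numerator is shown.
print(unnormalized_prob([3, 1], [2.7, -1.1]))  # exp(2.7*3 - 1.1*1)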

7 MLN Structure Learning
Challenging task
Few approaches to date [Kok & Domingos, ICML’05; Mihalkova & Mooney, ICML’07; Biba et al., ECAI’08; Huynh & Mooney, ICML’08]
Most MLN structure learners greedily and systematically enumerate formulas:
Computationally expensive; large search space
Susceptible to local optima

8 MSL [Kok & Domingos, ICML’05]
While beam not empty:
    Add unit clauses to beam
    While beam has changed:
        For each clause c in beam:
            c′ ← add a literal to c
            newClauses ← newClauses ∪ {c′}
        beam ← k best clauses in beam ∪ newClauses
    Add best clause in beam to MLN
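A hedged Python sketch of one round of this beam search; score(), add_literal(), and the clause representation are assumed stand-ins, not Alchemy’s actual API:

def msl_round(unit_clauses, score, add_literal, k=10):
    # Start the beam from the unit clauses.
    beam = sorted(unit_clauses, key=score, reverse=True)[:k]
    changed = True
    while changed:
        new_clauses = []
        for c in beam:
            new_clauses.extend(add_literal(c))  # every one-literal extension of c
        best = sorted(set(beam) | set(new_clauses), key=score, reverse=True)[:k]
        changed = best != beam
        beam = best
    return max(beam, key=score) if beam else None  # best clause goes into the MLN

The outer loop of the slide repeats this, adding the returned clause to the MLN, until the beam yields no further improvement.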

9 Relational Pathfinding [Richards & Mooney, AAAI’92]
Find paths of linked ground atoms → formulas
Path ≡ a conjunction that is true at least once
Exponential search space of paths, so restricted to short paths
[Figure: in the ground hypergraph, the path Advises(Pete, Sam) ∧ Teaches(Pete, CS1) ∧ TAs(Sam, CS1) is variabilized to Advises(p, s) ∧ Teaches(p, c) ∧ TAs(s, c).]

10 BUSL [Mihalkova & Mooney, ICML’07]
Finds short paths with a form of relational pathfinding
Path → Boolean variable → node in a Markov network
Greedily tries to link the nodes with edges
Cliques → clauses: form disjunctions of the atoms in a clique’s nodes
Greedily adds clauses to an empty MLN
Example: a clique over Advises(p, s) ∧ Teaches(p, c) and TAs(s, c) yields clauses such as
Advises(p, s) ∨ Teaches(p, c) ∨ TAs(s, c)
¬Advises(p, s) ∨ ¬Teaches(p, c) ∨ TAs(s, c)
…

11 Outline
Background
Learning via Hypergraph Lifting
Experiments
Future Work

12 Learning via Hypergraph Lifting (LHL)
Uses relational pathfinding to a fuller extent
Induces a hypergraph over clusters of constants
[Figure: the ground hypergraph over Pete, Paul, Pat, Phil, Sam, Sara, Saul, Sue, and CS1–CS8 is “lifted” into a hypergraph over clusters of those constants.]

13 Learning via Hypergraph Lifting (LHL)
Uses a hypergraph (V, E):
V: a set of nodes
E: a set of labeled, non-empty, ordered subsets of V
Finds paths in the hypergraph
Path: a set of hyperedges such that for any two hyperedges e₀ and eₙ in the set, there exists a sequence of hyperedges in the set leading from e₀ to eₙ

14 Learning via Hypergraph Lifting (LHL)
A relational DB can be viewed as a hypergraph
Nodes ≡ constants
Hyperedges ≡ true ground atoms
[Figure: the Advises, TAs, and Teaches tables drawn as a hypergraph over Pete, Paul, …, Sam, Sara, …, and CS1–CS8.]
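A small sketch of this view (illustrative; the toy db mirrors the tables above):

from collections import defaultdict

db = {
    "Advises": [("Pete", "Sam"), ("Pete", "Saul"), ("Paul", "Sara")],
    "TAs":     [("Sam", "CS1"), ("Sam", "CS2"), ("Sara", "CS1")],
    "Teaches": [("Pete", "CS1"), ("Pete", "CS2"), ("Paul", "CS2")],
}

def db_to_hypergraph(db):
    nodes = set()                   # constants
    edges = []                      # (predicate, args) = true ground atoms
    incident = defaultdict(list)    # constant -> hyperedges touching it
    for pred, rows in db.items():
        for args in rows:
            nodes.update(args)
            edge = (pred, args)
            edges.append(edge)
            for a in args:
                incident[a].append(edge)
    return nodes, edges, incident

nodes, edges, incident = db_to_hypergraph(db)
print(incident["Sam"])  # hyperedges containing the constant Sam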

15 LHL = Clustering + Relational Pathfinding
LHL “lifts” the hypergraph into a more compact representation:
Jointly clusters the nodes into higher-level concepts
Clusters the hyperedges
Traces paths in the lifted hypergraph
[Figure: the ground hypergraph “lifted” into a cluster-level hypergraph.]

16 Learning via Hypergraph Lifting
LHL has three components:
LiftGraph: lifts the hypergraph
FindPaths: finds paths in the lifted hypergraph
CreateMLN: creates rules from the paths, and adds the good ones to an empty MLN

17 LiftGraph
Defined using Markov logic
Jointly clusters constants in a bottom-up agglomerative manner
Allows information to propagate from one cluster to another
Ground atoms are also clustered
#Clusters need not be specified in advance
Each lifted hyperedge contains ≥ one true ground atom

18 Learning Problem in LiftGraph
Find the cluster assignment C that maximizes the posterior probability P(C | D) ∝ P(D | C) P(C), where D is the truth values of the ground atoms, P(D | C) is defined with an MLN, and P(C) is defined with another MLN.
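The same objective in LaTeX, in its usual MAP form (a restatement of the slide, nothing beyond it):

C^{\mathrm{MAP}} = \arg\max_{C} P(C \mid D) = \arg\max_{C} \big[ \log P(D \mid C) + \log P(C) \big]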

19 LiftGraph’s P(D|C) MLN
For each predicate r and each cluster combination containing a true ground atom of r, we have an atom prediction rule.
[Figure: clusters Professor = {Pete, Paul, Pat, Phil}, Student = {Sam, Sara, Saul, Sue}, and Course = {CS1, …, CS8}, with Teaches, TAs, and Advises hyperedges between them; the Teaches combination between Professor and Course is highlighted.]

20 LiftGraph’s P(D|C) MLN
For each predicate r and each cluster combination containing a true ground atom of r, we have an atom prediction rule, e.g.
p ∈ {Pete, Paul, Pat, Phil} ∧ c ∈ {CS1, …, CS8} ⇒ Teaches(p, c)

21 LiftGraph’s P(D|C) MLN
For each predicate r, we have a default atom prediction rule covering the default cluster combinations (those not containing a true ground atom of r), e.g.
x ∈ {Pete, Paul, Pat, Phil} ∧ y ∈ {Sam, Sara, Saul, Sue} ⇒ Teaches(x, y)

22 LiftGraph’s P(C) MLN
Each symbol belongs to exactly one cluster (infinite weight)
Exponential prior on the number of cluster combinations (negative weight −λ)

23 LiftGraph
Hard assignments of constants to clusters
Weights and log-posterior computed in closed form
Searches for the cluster assignment with the highest log-posterior

24 LiftGraph’s Search Algorithm
[Figure: each constant (Pete, Paul, Sam, Sara, CS1, CS2, CS3, …) starts in its own cluster; Pete and Paul are merged into a single cluster.]

25 LiftGraph’s Search Algorithm
[Figure: merging continues; {Sam, Sara} and {CS1, CS2, CS3} form clusters, and the Teaches and Advises hyperedges are lifted accordingly.]
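A hedged sketch of this agglomerative search; log_posterior() is a stand-in, and the real system evaluates candidate merges in closed form rather than rescoring whole clusterings:

def lift_graph_search(constants, log_posterior):
    clusters = [frozenset([c]) for c in constants]  # one cluster per symbol
    improved = True
    while improved:
        improved = False
        base = log_posterior(clusters)
        best_gain, best_pair = 0.0, None
        # Consider every pairwise merge; keep the one with the largest gain.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                merged = [cl for k, cl in enumerate(clusters) if k not in (i, j)]
                merged.append(clusters[i] | clusters[j])
                gain = log_posterior(merged) - base
                if gain > best_gain:
                    best_gain, best_pair = gain, (i, j)
        if best_pair is not None:
            i, j = best_pair
            merged_cluster = clusters[i] | clusters[j]
            clusters = [cl for k, cl in enumerate(clusters) if k not in (i, j)]
            clusters.append(merged_cluster)
            improved = True
    return clusters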

26 FindPaths
[Figure: the lifted hypergraph over the Professor, Student, and Course clusters with Teaches, TAs, and Advises hyperedges.]
Paths found:
Advises(Professor, Student)
Advises(Professor, Student), Teaches(Professor, Course)
Advises(Professor, Student), Teaches(Professor, Course), TAs(Student, Course)
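A sketch of FindPaths as depth-first enumeration of connected hyperedge sets up to a length cap; it reuses the (predicate, args) hyperedge representation from the earlier sketch, with the understanding that in LHL the input is the lifted hypergraph:

def find_paths(edges, incident, max_len=4):
    paths = set()

    def extend(path, nodes_seen):
        paths.add(frozenset(path))  # every prefix is itself a path
        if len(path) >= max_len:
            return
        for node in list(nodes_seen):
            for edge in incident[node]:      # hyperedges linked to the path
                if edge not in path:
                    _, args = edge
                    extend(path + [edge], nodes_seen | set(args))

    for edge in edges:
        _, args = edge
        extend([edge], set(args))
    return paths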

27 Clause Creation
Each path is a conjunction of lifted atoms, e.g.
Advises(Professor, Student) ∧ Teaches(Professor, Course) ∧ TAs(Student, Course)
Replace each cluster with a variable:
Advises(p, s) ∧ Teaches(p, c) ∧ TAs(s, c)
Convert the conjunction into clauses over every combination of negated and non-negated atoms:
¬Advises(p, s) ∨ ¬Teaches(p, c) ∨ ¬TAs(s, c)
Advises(p, s) ∨ ¬Teaches(p, c) ∨ ¬TAs(s, c)
Advises(p, s) ∨ Teaches(p, c) ∨ ¬TAs(s, c)
…
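A sketch of this sign-flipping step (illustrative atom strings; not the authors’ code):

from itertools import product

def path_to_clauses(atoms):
    # One clause per assignment of negated/non-negated to each atom.
    clauses = []
    for signs in product([True, False], repeat=len(atoms)):
        literals = [a if pos else "¬" + a for a, pos in zip(atoms, signs)]
        clauses.append(" ∨ ".join(literals))
    return clauses

for clause in path_to_clauses(["Advises(p,s)", "Teaches(p,c)", "TAs(s,c)"]):
    print(clause)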

28 Clause Pruning
Each clause and each of its sub-clauses is assigned a score:
¬Advises(p, s) ∨ ¬Teaches(p, c) ∨ TAs(s, c)
Advises(p, s) ∨ ¬Teaches(p, c) ∨ TAs(s, c)
…
¬Advises(p, s) ∨ ¬Teaches(p, c)
¬Advises(p, s) ∨ TAs(s, c)
¬Teaches(p, c) ∨ TAs(s, c)
…
¬Advises(p, s)
¬Teaches(p, c)
TAs(s, c)

29 Clause Pruning
Compare each clause against its sub-clauses (taken individually).
[Same clause list and scores as the previous slide.]
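A sketch of the pruning rule, under the assumption (consistent with the scoring above) that a clause is kept only when it outscores each of its sub-clauses taken individually; score() is a stand-in:

from itertools import combinations

def prune(clauses, score):
    kept = []
    for clause in clauses:  # clause: tuple of signed literals
        subs = [sub for r in range(1, len(clause))
                for sub in combinations(clause, r)]
        if all(score(clause) > score(sub) for sub in subs):
            kept.append(clause)
    return kept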

30 MLN Creation
Add clauses to an empty MLN in order of decreasing score
Retrain the weights of the clauses each time a clause is added
Retain the clause in the MLN if the overall score improves
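A sketch of this final step; retrain_weights() and mln_score() are assumed stand-ins for weight learning and the overall score (e.g., a pseudo-likelihood measure), not Alchemy’s actual API:

def create_mln(clauses, clause_score, retrain_weights, mln_score):
    mln = []
    best = mln_score(mln, retrain_weights(mln))
    for clause in sorted(clauses, key=clause_score, reverse=True):
        candidate = mln + [clause]
        weights = retrain_weights(candidate)   # relearn all clause weights
        new = mln_score(candidate, weights)
        if new > best:                         # retain only if the MLN improves
            mln, best = candidate, new
    return mln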

31 Outline
Background
Learning via Hypergraph Lifting
Experiments
Future Work

32 Datasets
IMDB
Created from the IMDB.com DB
Movies, actors, etc., and their relationships
17,793 ground atoms; 1,224 true ones
UW-CSE
Describes an academic department
Students, faculty, etc., and their relationships
260,254 ground atoms; 2,112 true ones

33 Datasets
Cora
Citations to computer science papers
Papers, authors, titles, etc., and their relationships
687,422 ground atoms; 42,558 true ones

34 Methodology
Five-fold cross-validation
Inferred the probability of truth for the groundings of each predicate, using the groundings of all other predicates as evidence
Evaluation measures:
Area under the precision-recall curve (AUC)
Average conditional log-likelihood (CLL)

35 Methodology
MCMC inference algorithms in Alchemy were used to evaluate the test atoms
1 million samples; up to 24 hours

36 Methodology
Compared with:
MSL [Kok & Domingos, ICML’05]
BUSL [Mihalkova & Mooney, ICML’07]
Lesion study:
NoLiftGraph: LHL with no hypergraph lifting; finds paths directly in the unlifted hypergraph
NoPathFinding: LHL with no pathfinding; uses the MLN representing the LiftGraph

37 LHL vs. BUSL vs. MSL: Area under Precision-Recall Curve
[Bar charts of AUC (± standard error) for LHL, BUSL, and MSL on IMDB, UW-CSE, and Cora. IMDB AUC: LHL 0.69, BUSL 0.47, MSL 0.47. Cora AUC: LHL 0.87, BUSL 0.17, MSL 0.17. UW-CSE values shown in the charts.]

38 LHL vs. BUSL vs. MSL: Conditional Log-Likelihood
[Bar charts of CLL for LHL, BUSL, and MSL on IMDB, UW-CSE, and Cora.]

39 LHL vs. BUSL vs. MSL: Runtime
[Bar charts of runtime for LHL, BUSL, and MSL: minutes for IMDB and UW-CSE, hours for Cora.]

40 LHL vs. NoLiftGraph: Area under Precision-Recall Curve
[Bar charts of AUC for LHL and NoLiftGraph on IMDB, UW-CSE, and Cora. Cora AUC: LHL 0.87; NoLiftGraph (labeled LHL-FindPaths in the table) 0.91.]

41 LHL vs. NoLiftGraph: Conditional Log-Likelihood
[Bar charts of CLL for LHL and NoLiftGraph on IMDB, UW-CSE, and Cora.]

42 LHL vs. NoLiftGraph: Runtime
[Bar charts of runtime for LHL and NoLiftGraph: minutes for IMDB and UW-CSE, hours for Cora.]

43 LHL vs. NoPathFinding
[Bar charts of AUC and CLL for LHL and NoPathFinding on IMDB and UW-CSE. IMDB AUC: LHL 0.69; NoPathFinding (labeled LHL-LiftGraph in the table) 0.45.]

44 Examples of Rules Learned
If a is an actor, d is a director, and they both worked in the same movie, then a probably worked under d.
If p is a professor and p co-authored a paper with s, then s is likely a student.
If papers x and y have the same author, then x and y are likely the same paper.

45 Outline
Motivation
Background
Learning via Hypergraph Lifting
Experiments
Future Work

46 Future Work
Integrate the components of LHL
Integrate LHL with lifted inference [Singla & Domingos, AAAI’08]
Construct an ontology simultaneously with the probabilistic KB
Further scale LHL up
Apply LHL to larger, richer domains, e.g., the Web

47 Conclusion
LHL = Clustering + Relational Pathfinding
“Lifts” data into a more compact form, essential for speeding up relational pathfinding
LHL outperforms state-of-the-art structure learners