




Slide 1: Learning Markov Logic Network Structure Via Hypergraph Lifting
Stanley Kok, Dept. of Computer Science and Eng., University of Washington, Seattle, USA
Joint work with Pedro Domingos

Slide 2: Synopsis of LHL
Input: a relational DB, e.g. tables Advises (Pete–Sam, Pete–Saul, Paul–Sara, …), TAs (Sam–CS1, Sam–CS2, Sara–CS1, …), and Teaches (Pete–CS1, Pete–CS2, Paul–CS2, …).
Output: a probabilistic KB, e.g.:
 2.7  Teaches(p, c) ∧ TAs(s, c) ⇒ Advises(p, s)
 1.4  Advises(p, s) ⇒ Teaches(p, c) ∧ TAs(s, c)
-1.1  TAs(s, c) ⇒ Advises(s, p)
…
Goal of LHL: [Figure: the ground hypergraph over the constants (Pete, Paul, Pat, Phil, Sam, Sara, Saul, Sue, CS1–CS8) with Teaches, TAs, and Advises hyperedges is lifted into a hypergraph over the clusters Professor, Student, and Course.]

Slide 3: Experimental Results
[Figure: bar charts comparing LHL, BUSL, and MSL on area under the precision-recall curve (AUC) and conditional log-likelihood (CLL).]

Slide 4: Outline
- Background
- Learning via Hypergraph Lifting
- Experiments
- Future Work

Slide 5: Markov Logic
- A logical KB is a set of hard constraints on the set of possible worlds
- Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible
- Give each formula a weight (higher weight ⇒ stronger constraint)

Slide 6: Markov Logic
A Markov logic network (MLN) is a set of pairs (F, w):
- F is a formula in first-order logic
- w is a real number

P(X = x) = (1/Z) exp( Σ_i w_i n_i(x) )

where x is a vector of truth assignments to ground atoms, Z is the partition function, w_i is the weight of the i-th formula, and n_i(x) is the number of true groundings of the i-th formula in x.
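For toy domains this distribution can be computed by brute force. The sketch below is ours, not from the talk (the example formula and its weight are assumptions); it enumerates every world and normalizes by the partition function Z:

```python
import itertools
import math

def mln_distribution(atoms, formulas):
    """atoms: ground atom names; formulas: list of (weight, n_i) where
    n_i(world) counts the formula's true groundings in that world.
    Returns [(world, probability)] with P(x) = exp(sum_i w_i n_i(x)) / Z."""
    worlds = [dict(zip(atoms, vals))
              for vals in itertools.product([False, True], repeat=len(atoms))]
    scores = [math.exp(sum(w * n(x) for w, n in formulas)) for x in worlds]
    Z = sum(scores)  # partition function
    return [(x, s / Z) for x, s in zip(worlds, scores)]

# One weighted formula: Teaches(P,C) ∧ TAs(S,C) ⇒ Advises(P,S), weight 2.7.
atoms = ["Teaches(P,C)", "TAs(S,C)", "Advises(P,S)"]
def n_rule(x):
    # 1 if the (single) grounding of the implication is true in world x
    return 1 if (not (x["Teaches(P,C)"] and x["TAs(S,C)"])) or x["Advises(P,S)"] else 0

dist = mln_distribution(atoms, [(2.7, n_rule)])
```

The one world that violates the formula gets weight exp(0) instead of exp(2.7), so it is less probable than every other world, but not impossible — exactly the soft-constraint behavior described on the previous slide.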

Slide 7: MLN Structure Learning
- Challenging task
- Few approaches to date [Kok & Domingos, ICML'05; Mihalkova & Mooney, ICML'07; Biba et al., ECAI'08; Huynh & Mooney, ICML'08]
- Most MLN structure learners greedily and systematically enumerate formulas
  - Computationally expensive; large search space
  - Susceptible to local optima

Slide 8: MSL [Kok & Domingos, ICML'05]

While beam not empty:
    Add unit clauses to beam
    While beam has changed:
        For each clause c in beam:
            c' ← add a literal to c
            newClauses ← newClauses ∪ {c'}
        beam ← k best clauses in beam ∪ newClauses
    Add best clause in beam to MLN
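The inner loop above can be sketched in Python roughly as follows; `extend` and `score` are hypothetical stand-ins for MSL's clause refinement and weighted pseudo-likelihood scoring:

```python
def beam_search(unit_clauses, extend, score, k):
    """Grow clauses one literal at a time, keeping the k best, until the
    beam stops changing; return the best clause found."""
    beam = list(unit_clauses)
    changed = True
    while changed:
        new_clauses = []
        for c in beam:
            new_clauses.extend(extend(c))  # c' <- add a literal to c
        # k best clauses in beam ∪ newClauses; ties broken deterministically
        # (clauses are assumed orderable, e.g. tuples of literal names)
        candidates = sorted(set(beam) | set(new_clauses),
                            key=lambda c: (-score(c), c))[:k]
        changed = candidates != beam
        beam = candidates
    return max(beam, key=score)  # best clause in beam goes into the MLN
```

This greedy, systematic enumeration is exactly what makes the search space large and the procedure prone to local optima, as the previous slide notes.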

Slide 9: Relational Pathfinding [Richards & Mooney, AAAI'92]
- Finds paths of linked ground atoms → formulas
- Path ≡ conjunction that is true at least once
- Exponential search space of paths
- Restricted to short paths
Example: the ground path Advises(Pete, Sam) ∧ Teaches(Pete, CS1) ∧ TAs(Sam, CS1) is variabilized to Advises(p, s) ∧ Teaches(p, c) ∧ TAs(s, c).
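A minimal illustration of the idea (our sketch, not the AAAI'92 implementation): grow a path by repeatedly adding ground atoms that share a constant with atoms already on the path, up to a length bound:

```python
def find_paths(atoms, start, max_len):
    """atoms: list of (predicate, args) tuples; returns all connected paths
    (as frozensets of atoms) of length <= max_len starting at `start`."""
    paths = []

    def grow(path, constants):
        paths.append(frozenset(path))
        if len(path) == max_len:
            return
        for atom in atoms:
            _, args = atom
            # Extend only with unused atoms linked to the path by a constant
            if atom not in path and constants & set(args):
                grow(path | {atom}, constants | set(args))

    grow(frozenset([start]), set(start[1]))
    return set(paths)

db = [("Advises", ("Pete", "Sam")),
      ("Teaches", ("Pete", "CS1")),
      ("TAs", ("Sam", "CS1"))]
paths = find_paths(db, db[0], max_len=3)
```

Even on this three-atom example the recursion visits the same path through multiple orderings, which hints at the exponential blow-up the slide mentions, and why the search must be restricted to short paths.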

Slide 10: BUSL [Mihalkova & Mooney, ICML'07]
- Finds short paths with a form of relational pathfinding
- Path → Boolean variable → node in a Markov network
- Greedily tries to link the nodes with edges
- Cliques → clauses: form disjunctions of the atoms in a clique's nodes
- Greedily adds clauses to an empty MLN
Example: from the clique containing Advises(p, s) ∧ Teaches(p, c) and TAs(s, c), candidate clauses include Advises(p, s) ∨ Teaches(p, c) ∨ TAs(s, c), ¬Advises(p, s) ∨ ¬Teaches(p, c) ∨ TAs(s, c), …

Slide 11: Outline
- Background
- Learning via Hypergraph Lifting
- Experiments
- Future Work

Slide 12: Learning via Hypergraph Lifting (LHL)
- Uses relational pathfinding to fuller extent
- Induces a hypergraph over clusters of constants
[Figure: the ground hypergraph over individual constants (Pete, Paul, Pat, Phil, Sam, Sara, Saul, Sue, CS1–CS8) is "lifted" into a hypergraph over clusters of constants.]

Slide 13: Learning via Hypergraph Lifting (LHL)
- Uses a hypergraph (V, E)
  - V: set of nodes
  - E: set of labeled, non-empty, ordered subsets of V
- Finds paths in a hypergraph
  - Path: a set of hyperedges such that for any two hyperedges e_0 and e_n in the set, there exists a sequence of hyperedges in the set leading from e_0 to e_n

Slide 14: Learning via Hypergraph Lifting (LHL)
A relational DB can be viewed as a hypergraph:
- Nodes ≡ constants
- Hyperedges ≡ true ground atoms
Example: the tuples in the Advises, TAs, and Teaches tables (e.g. Advises(Pete, Sam), TAs(Sam, CS1), Teaches(Pete, CS1)) become labeled hyperedges over the constant nodes.
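This view is easy to materialize. A minimal sketch (the helper names are ours) that turns DB tuples into nodes, labeled ordered hyperedges, and an incidence map:

```python
from collections import defaultdict

def db_to_hypergraph(tuples):
    """tuples: iterable of (predicate, args) ground atoms. Returns
    (nodes, hyperedges, incident) where incident maps each constant to
    the hyperedges it participates in."""
    nodes, hyperedges = set(), []
    incident = defaultdict(list)
    for pred, args in tuples:
        nodes.update(args)               # nodes are constants
        edge = (pred, tuple(args))       # labeled, ordered subset of V
        hyperedges.append(edge)          # hyperedges are true ground atoms
        for const in args:
            incident[const].append(edge)
    return nodes, hyperedges, incident

db = [("Advises", ("Pete", "Sam")),
      ("Teaches", ("Pete", "CS1")),
      ("TAs", ("Sam", "CS1"))]
nodes, edges, incident = db_to_hypergraph(db)
```

The incidence map is what pathfinding needs: from any constant you can look up every ground atom it links to.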

Slide 15: LHL = Clustering + Relational Pathfinding
- LHL "lifts" the hypergraph into a more compact representation
  - Jointly clusters nodes into higher-level concepts
  - Clusters hyperedges
- Traces paths in the lifted hypergraph
[Figure: the ground hypergraph is lifted into one over the clusters {Pete, Paul, Pat, Phil}, {Sam, Sara, Saul, Sue}, and {CS1, …, CS8}.]

Slide 16: Learning via Hypergraph Lifting
LHL has three components:
- LiftGraph: lifts the hypergraph
- FindPaths: finds paths in the lifted hypergraph
- CreateMLN: creates rules from the paths and adds the good ones to an empty MLN

Slide 17: LiftGraph
- Defined using Markov logic
- Jointly clusters constants in a bottom-up agglomerative manner
  - Allows information to propagate from one cluster to another
- Ground atoms are also clustered
- #Clusters need not be specified in advance
- Each lifted hyperedge contains ≥ one true ground atom

Slide 18: Learning Problem in LiftGraph
Find the cluster assignment C that maximizes the posterior probability

P(C | D) ∝ P(D | C) P(C)

where D is the truth values of the ground atoms, P(D | C) is defined with an MLN, and P(C) is defined with another MLN.

Slide 19: LiftGraph's P(D|C) MLN
For each predicate r and each cluster combination containing a true ground atom of r, we have an atom prediction rule.
[Figure: clusters Professor = {Pete, Paul, Pat, Phil}, Student = {Sam, Sara, Saul, Sue}, and Course = {CS1, …, CS8}, with the Teaches hyperedge connecting Professor to Course.]

Slide 20: LiftGraph's P(D|C) MLN
For each predicate r and each cluster combination containing a true ground atom of r, we have an atom prediction rule, e.g.:

p ∈ Professor ∧ c ∈ Course ⇒ Teaches(p, c)

Slide 21: LiftGraph's P(D|C) MLN
For each predicate r, we also have a default atom prediction rule covering the remaining cluster combinations (the default cluster combination), e.g.:

x ∈ Professor ∧ y ∈ Student ⇒ Teaches(x, y)
…

Slide 22: LiftGraph's P(C) MLN
- Each symbol belongs to exactly one cluster (infinite weight)
- Exponential prior on #cluster combinations (negative weight −λ)

Slide 23: LiftGraph
- Hard assignments of constants to clusters
- Weights and log-posterior computed in closed form
- Searches for the cluster assignment with the highest log-posterior
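The bottom-up search can be sketched as greedy agglomerative merging; `log_posterior` below is a hypothetical stand-in for LiftGraph's closed-form score:

```python
import itertools

def agglomerative_search(constants, log_posterior):
    """Start with each constant in its own cluster; repeatedly apply the
    pairwise merge that most improves log_posterior, until none does."""
    clusters = [frozenset([c]) for c in constants]
    score = log_posterior(clusters)
    improved = True
    while improved and len(clusters) > 1:
        improved = False
        best = None
        for a, b in itertools.combinations(clusters, 2):
            merged = [c for c in clusters if c not in (a, b)] + [a | b]
            s = log_posterior(merged)
            if s > score:  # hard assignment: accept only strict improvements
                score, best, improved = s, merged, True
        if best is not None:
            clusters = best
    return clusters
```

Because the assignments are hard and the score is available in closed form, each candidate merge can be evaluated without any sampling or weight learning.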

Slide 24: LiftGraph's Search Algorithm
[Figure: the search starts with each constant (Pete, Paul, Sam, Sara, CS1–CS3) in its own cluster and considers merging Pete and Paul.]

Slide 25: LiftGraph's Search Algorithm
[Figure: clusters grow by successive merges, e.g. {Pete, Paul}, {Sam, Sara}, and {CS1, CS2, CS3}, with the Teaches and Advises hyperedges lifted to the merged clusters.]

Slide 26: FindPaths
Paths are traced in the lifted hypergraph, growing one lifted hyperedge at a time:

Advises(Professors, Students)
Advises(Professors, Students), Teaches(Professors, Courses)
Advises(Professors, Students), Teaches(Professors, Courses), TAs(Students, Courses)

(Here Professors = {Pete, Paul, Pat, Phil}, Students = {Sam, Sara, Saul, Sue}, and Courses = {CS1, …, CS8}.)

Slide 27: Clause Creation
Each path is converted into a conjunction by replacing the clusters in its lifted hyperedges with variables:

Advises(p, s) ∧ Teaches(p, c) ∧ TAs(s, c)

The conjunction then yields clauses with the atoms negated in every combination, e.g.:

¬Advises(p, s) ∨ ¬Teaches(p, c) ∨ ¬TAs(s, c)
Advises(p, s) ∨ ¬Teaches(p, c) ∨ ¬TAs(s, c)
Advises(p, s) ∨ Teaches(p, c) ∨ ¬TAs(s, c)
…

Slide 28: Clause Pruning

Clause                                          Score
¬Advises(p, s) ∨ ¬Teaches(p, c) ∨ TAs(s, c)     -1.15
Advises(p, s) ∨ ¬Teaches(p, c) ∨ TAs(s, c)      -1.17
…
¬Advises(p, s) ∨ ¬Teaches(p, c)                 -2.21
¬Advises(p, s) ∨ TAs(s, c)                      -2.23
…
¬Advises(p, s)                                  -2.03
¬Teaches(p, c)                                  -3.13
¬Teaches(p, c) ∨ TAs(s, c)                      -2.93
TAs(s, c)                                       -3.93
…

Slide 29: Clause Pruning
[Same clause/score table as the previous slide.]
Compare each clause against its sub-clauses (taken individually).
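One reading of this pruning rule — a sketch under the assumption that a clause survives only if it scores better than each of its sub-clauses taken individually — using the example scores from the slide:

```python
from itertools import combinations

def sub_clauses(clause):
    """All proper, non-empty sub-clauses (clauses are frozensets of literals)."""
    lits = sorted(clause)
    return [frozenset(c) for n in range(1, len(lits))
            for c in combinations(lits, n)]

def prune(scored):
    """scored: dict clause -> score. Keep a clause only if it beats every
    one of its sub-clauses that also has a score."""
    kept = {}
    for clause, score in scored.items():
        subs = [scored[s] for s in sub_clauses(clause) if s in scored]
        if all(score > s for s in subs):
            kept[clause] = score
    return kept

# Scores from the slide's example table
scored = {
    frozenset({"¬Advises(p,s)", "¬Teaches(p,c)", "TAs(s,c)"}): -1.15,
    frozenset({"¬Advises(p,s)", "¬Teaches(p,c)"}): -2.21,
    frozenset({"¬Advises(p,s)", "TAs(s,c)"}): -2.23,
    frozenset({"¬Teaches(p,c)", "TAs(s,c)"}): -2.93,
    frozenset({"¬Advises(p,s)"}): -2.03,
    frozenset({"¬Teaches(p,c)"}): -3.13,
    frozenset({"TAs(s,c)"}): -3.93,
}
kept = prune(scored)
```

Under this rule the three-literal clause (score -1.15) survives because it beats every sub-clause, while ¬Advises(p, s) ∨ ¬Teaches(p, c) (-2.21) is pruned: its sub-clause ¬Advises(p, s) alone already scores -2.03.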

Slide 30: MLN Creation
- Add clauses to an empty MLN in order of decreasing score
- Retrain the weights of the clauses each time a clause is added
- Retain the clause in the MLN if the overall score improves
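A compact sketch of this greedy construction; `overall_score` is a hypothetical stand-in for retraining the clause weights and rescoring the whole MLN:

```python
def create_mln(scored_clauses, overall_score):
    """scored_clauses: list of (clause, score). Add clauses in decreasing
    score order; keep each only if the MLN's overall score improves."""
    mln = []
    best = overall_score(mln)
    for clause, _ in sorted(scored_clauses, key=lambda cs: cs[1], reverse=True):
        candidate = mln + [clause]
        s = overall_score(candidate)  # retrain weights, rescore whole MLN
        if s > best:
            mln, best = candidate, s  # retain clause: overall score improved
    return mln

def toy_score(mln):
    # Hypothetical stand-in: each clause helps, except "C", which hurts
    return -1 if "C" in mln else len(mln)

mln = create_mln([("A", 3), ("B", 2), ("C", 1)], toy_score)
```

Rescoring the whole MLN after each tentative addition is what lets a clause that looked good in isolation be rejected when it adds nothing (or is harmful) in the context of the clauses already kept.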

Slide 31: Outline
- Background
- Learning via Hypergraph Lifting
- Experiments
- Future Work

Slide 32: Datasets
IMDB
- Created from the IMDB.com DB
- Movies, actors, etc., and their relationships
- 17,793 ground atoms; 1,224 true ones
UW-CSE
- Describes an academic department
- Students, faculty, etc., and their relationships
- 260,254 ground atoms; 2,112 true ones

Slide 33: Datasets
Cora
- Citations to computer science papers
- Papers, authors, titles, etc., and their relationships
- 687,422 ground atoms; 42,558 true ones

Slide 34: Methodology
- Five-fold cross-validation
- Inferred the probability of truth for the groundings of each predicate, with the groundings of all other predicates as evidence
- Evaluation measures:
  - Area under the precision-recall curve (AUC)
  - Average conditional log-likelihood (CLL)

Slide 35: Methodology
- MCMC inference algorithms in Alchemy to evaluate the test atoms
- 1 million samples
- 24 hours

Slide 36: Methodology
Compared with:
- MSL [Kok & Domingos, ICML'05]
- BUSL [Mihalkova & Mooney, ICML'07]
Lesion study:
- NoLiftGraph: LHL with no hypergraph lifting; finds paths directly from the unlifted hypergraph
- NoPathFinding: LHL with no pathfinding; uses the MLN representing LiftGraph

Slide 37: LHL vs. BUSL vs. MSL — Area under Precision-Recall Curve

System   IMDB AUC      IMDB CLL       UW-CSE AUC    UW-CSE CLL
LHL      0.69 ± 0.01   -0.13 ± 0.00   0.22 ± 0.01   -0.04 ± 0.00
BUSL     0.47 ± 0.01   -0.14 ± 0.00   0.21 ± 0.01   -0.05 ± 0.00
MSL      0.47 ± 0.01   -0.17 ± 0.00   0.18 ± 0.01   -0.57 ± 0.00

System   Cora AUC      Cora CLL
LHL      0.87 ± 0.00   -0.26 ± 0.00
BUSL     0.17 ± 0.00   -0.37 ± 0.00
MSL      0.17 ± 0.00   -0.37 ± 0.00

Slide 38: LHL vs. BUSL vs. MSL — Conditional Log-Likelihood
[Same results table as Slide 37; the bar charts on this slide highlight the CLL columns for IMDB, UW-CSE, and Cora.]

Slide 39: LHL vs. BUSL vs. MSL — Runtime

System   IMDB (min)     UW-CSE (hr)    Cora (hr)
LHL      15.63 ± 1.88   7.55 ± 1.51    14.82 ± 1.78
BUSL     4.69 ± 1.02    12.97 ± 9.80   18.65 ± 9.52
MSL      0.17 ± 0.10    2.13 ± 0.38    65.60 ± 1.82

Slide 40: LHL vs. NoLiftGraph — Area under Precision-Recall Curve

System        Cora AUC      Cora CLL
LHL           0.87 ± 0.00   -0.26 ± 0.00
NoLiftGraph   0.91 ± 0.00   -0.17 ± 0.00

[Bar charts also compare the two systems on IMDB and UW-CSE.]

Slide 41: LHL vs. NoLiftGraph — Conditional Log-Likelihood
[Same results table as Slide 40; the bar charts here highlight CLL on IMDB, UW-CSE, and Cora.]

Slide 42: LHL vs. NoLiftGraph — Runtime

System        IMDB (min)   UW-CSE (hr)   Cora (hr)
LHL           15.63        7.55          14.82
NoLiftGraph   242.41       158.24        5935.5

Slide 43: LHL vs. NoPathFinding

System          IMDB AUC      IMDB CLL       UW-CSE AUC    UW-CSE CLL
LHL             0.69 ± 0.01   -0.13 ± 0.00   0.22 ± 0.01   -0.04 ± 0.00
NoPathFinding   0.45 ± 0.01   -0.27 ± 0.00   0.14 ± 0.01   -0.06 ± 0.00

Slide 44: Examples of Rules Learned
- If a is an actor and d is a director, and they both worked in the same movie, then a probably worked under d.
- If p is a professor, and p co-authored a paper with s, then s is likely a student.
- If papers x and y have the same author, then x and y are likely the same paper.

Slide 45: Outline
- Motivation
- Background
- Learning via Hypergraph Lifting
- Experiments
- Future Work

Slide 46: Future Work
- Integrate the components of LHL
- Integrate LHL with lifted inference [Singla & Domingos, AAAI'08]
- Construct an ontology simultaneously with the probabilistic KB
- Further scale LHL up
- Apply LHL to larger, richer domains, e.g., the Web

Slide 47: Conclusion
- LHL = Clustering + Relational Pathfinding
- "Lifts" the data into a more compact form; essential for speeding up relational pathfinding
- LHL outperforms state-of-the-art structure learners


