Download presentation

Presentation is loading. Please wait.

Published byFaith Plair Modified over 2 years ago

1
Unifying MAX SAT, Local Consistency Relaxations, and Soft Logic with Hinge-Loss MRFs Stephen H. Bach Bert Huang Lise Getoor Maryland Virginia Tech UC Santa Cruz

2
Modeling Relational Data with Markov Random Fields

3
3 Markov Random Fields Probabilistic model for high-dimensional data: The random variables represent the data, such as whether a person has an attribute or whether a link exists The potentials score different configurations of the data The weights scale the influence of different potentials

4
4 Markov Random Fields Variables and potentials form graphical structure:

5
5 MRFs with Logic One way to compactly define MRFs is with first-order logic, e.g., Markov logic networks Use first-order logic to define templates for potentials -Ground out weighted rules over graph data -The truth table of each ground rule is a potential -Each first-order rule has a weight that becomes the potential’s Richardson and Domingos, 2006

6
6 MRFs with Logic Let be a set of weighted logical rules, where each rule has the general form -Weights and sets and index variables Equivalent clausal form:

7
7 MRFs with Logic Probability distribution:

8
8 MAP (maximum a posteriori) inference seeks a most- probable assignment to the unobserved variables MAP inference is This MAX SAT problem is combinatorial and NP-hard! MAP Inference

9
MAX SAT Relaxation

10
10 Approximate Inference View MAP inference as optimizing rounding probabilities Expected score of a clause is a weighted noisy-or function: Then expected total score is But, is highly non-convex!

11
11 Approximate Inference It is the products in the objective that make it non-convex The expected score can be lower bounded using the relationship between arithmetic and harmonic means: This leads to the lower bound Goemans and Williamson, 1994

12
12 So, we solve the linear program If we set, a greedy rounding method will find a -optimal discrete solution If we set, it improves to ¾-optimal Approximate Inference Goemans and Williamson, 1994

13
Local Consistency Relaxation

14
14 Local Consistency Relaxation LCR is a popular technique for approximating MAP in MRFs -Often simply called linear programming (LP) relaxation -Dual decomposition solves dual to LCR objective Idea: relax search over consistent marginals to simpler set LCR admits fast message-passing algorithms, but no quality guarantees in general Wainwright and Jordan 2008

15
15 Local Consistency Relaxation : pseudomarginals over variable states : pseudomarginals over joint potential states Wainwright and Jordan 2008

16
Unifying the Relaxations

17
17 Analysis Bach et al. AISTATS 2015 j=1 j=2 j=3 and so on…

18
18 Analysis Bach et al. AISTATS 2015

19
19 Analysis We can now analyze each potential’s parameterized subproblem in isolation: Using the KKT conditions, we can find a simplified expression for each solution based on the parameters : Bach et al. AISTATS 2015

20
20 Analysis Bach et al. AISTATS 2015 Substitute back into outer objective

21
21 Analysis Leads to simplified, projected LCR over : Bach et al. AISTATS 2015

22
22 Analysis Bach et al. AISTATS 2015 Local Consistency Relaxation MAX SAT Relaxation

23
23 Consequences MAX SAT relaxation solved with choice of algorithms Rounding guarantees apply to LCR! Bach et al. AISTATS 2015

24
Soft Logic and Continuous Values

25
25 Continuous Values Continuous values can also be interpreted as similarities Or degrees of truth. Łukasiewicz logic is a fuzzy logic for reasoning about imprecise concepts Bach et al. In Preparation

26
26 All Three are Equivalent Local Consistency Relaxation MAX SAT Relaxation Exact MAX SAT for Łukasiewicz logic Bach et al. In Preparation

27
27 Consequences Exact MAX SAT for Łukasiewicz logic is equivalent to relaxed Boolean MAX SAT and local consistency relaxation for logical MRFs So these scalable message-passing algorithms can also be used to reason about similarity, imprecise concepts, etc.! Bach et al. In Preparation

28
Hinge-Loss Markov Random Fields

29
29 Generalizing Relaxed MRFs Relaxed, logic-based MRFs can reason about both discrete and continuous relational data scalably and accurately Define a new distribution over continuous variables: We can generalize this inference objective to be the energy of a new type of MRF that does even more Bach et al. NIPS 12, Bach et al. UAI 13

30
30 Generalizations Arbitrary hinge-loss functions (not just logical clauses) Hard linear constraints Squared hinge losses Bach et al. NIPS 12, Bach et al. UAI 13

31
31 Hinge-Loss MRFs Define hinge-loss MRFs by using this generalized objective as the energy function Bach et al. NIPS 12, Bach et al. UAI 13

32
32 HL-MRF Inference and Learning MAP inference for HL-MRFs always a convex optimization -Highly scalable ADMM algorithm for MAP Supervised Learning -No need to hand-tune weights -Learn from training data -Also highly scalable Unsupervised and Semi-Supervised Learning -New learning algorithm that interleaves inference and parameter updates to cut learning time by as much as 90% (under review) Bach et al. NIPS 12, Bach et al. UAI 13 More in Bert’s talk

33
Probabilistic Soft Logic

34
34 Probabilistic Soft Logic (PSL) Probabilistic programming language for defining HL-MRFs PSL components -Predicates:relationships or properties -Atoms:(continuous) random variables -Rules:potentials

35
35 Example: Voter Identification ? $$ Tweet Status update 5.0 : Donates(A, “Republican”) -> Votes(A, “Republican”) 0.3 : Mentions(A, “Affordable Care”) -> Votes(A, “Democrat”) Votes(A, “Republican”) + Votes(A, “Democrat”) = 1.0.

36
36 Example: Voter Identification 0.8 : Votes(A,P) && Spouse(B,A) -> Votes(B,P) 0.3 : Votes(A,P) && Friend(B,A) -> Votes(B,P) spouse colleague spouse friend

37
37 Example: Voter Identification /* Predicate definitions */ Votes(Person, Party) Donates(Person, Party)(closed) Mentions(Person, Term)(closed) Colleague(Person, Person)(closed) Friend(Person, Person)(closed) Spouse(Person, Person)(closed) /* Local rules */ 5.0 : Donates(A, P) -> Votes(A, P) 0.3 : Mentions(A, “Affordable Care”) -> Votes(A, “Democrat”) 0.3 : Mentions(A, “Tax Cuts”) -> Votes(A, “Republican”)... /* Relational rules */ 1.0 : Votes(A,P) && Spouse(B,A) -> Votes(B,P) 0.3 : Votes(A,P) && Friend(B,A) -> Votes(B,P) 0.1 : Votes(A,P) && Colleague(B,A) -> Votes(B,P) /* Range constraint */ Votes(A, “Republican”) + Votes(A, “Democrat”) = 1.0.

38
38 PSL Defines HL-MRFs /* Predicate definitions */ Votes(Person, Party) Donates(Person, Party)(closed) Mentions(Person, Term)(closed) Colleague(Person, Person)(closed) Friend(Person, Person)(closed) Spouse(Person, Person)(closed) /* Local rules */ 5.0 : Donates(A, P) -> Votes(A, P) 0.3 : Mentions(A, “Affordable Care”) -> Votes(A, “Democrat”) 0.3 : Mentions(A, “Tax Cuts”) -> Votes(A, “Republican”)... /* Relational rules */ 1.0 : Votes(A,P) && Spouse(B,A) -> Votes(B,P) 0.3 : Votes(A,P) && Friend(B,A) -> Votes(B,P) 0.1 : Votes(A,P) && Colleague(B,A) -> Votes(B,P) /* Range constraint */ Votes(A, “Republican”) + Votes(A, “Democrat”)=1. +=

39
39 Open Source Implementation PSL is implemented as an open source library and programming language interface It’s ready to use for your next project Some of the other groups already using PSL: -Jure Leskovec (Stanford)[West et al., TACL 14] -Dan Jurafsky (Stanford)[Li et al., ArXiv 14] -Ray Mooney (UT Austin)[Beltagy et al., ACL 14] -Kevin Murphy (Google)[Pujara et al., BayLearn 14]

40
40 Other PSL Topics Not discussed here: -Smart grounding -Lazy inference -Distributed inference and learning Future work: -Lifted inference -Generalized rounding guarantees

41
41 psl.cs.umd.edu

42
Applications

43
43 PSL Empirical Highlights Compared with discrete MRFs: Predicting MOOC outcomes via latent engagement (AuPR): Collective ClassificationTrust Prediction PSL81.8%0.7 sec.482 AuPR0.32 sec Discrete79.7%184.3 sec.441 AuPR212.36 sec TechWomen-CivilGenes Lecture Rank0.3330.5080.688 PSL-Direct0.3930.5650.757 PSL-Latent0.5460.8160.818 Bach et al. UAI 13, Ramesh et al. AAAI 14

44
44 PSL Empirical Highlights Improved activity recognition in video: Compared on drug-target interaction prediction: 5 Activities6 Activities HOG47.4%.481 F159.6%.582 F1 PSL + HOG59.8%.603 F179.3%.789 F1 ACD67.5%.678 F183.5%.835 F1 PSL + ACD69.2%.693 F186.0%.860 F1 London et al. CVPR WS 13, Fakhraei et al. TCBB 14 AuPRP@130 Perlman’s Method.564 ±.05.594 ±.04 PSL.617 ±.05.616 ±.04

45
Conclusion

46
46 Conclusion HL-MRFs unite and generalize different ways of viewing fundamental AI problems, including MAX SAT, probabilistic graphical models, and fuzzy logic PSL’s mix of expressivity and scalability makes it a general-purpose tool for network analysis, bioinformatics, NLP, computer vision, and more Many exciting open problems, from algorithms, to theory, to applications

Similar presentations

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google