Unifying MAX SAT, Local Consistency Relaxations, and Soft Logic with Hinge-Loss MRFs
Stephen H. Bach (University of Maryland), Bert Huang (Virginia Tech), Lise Getoor (UC Santa Cruz)

Modeling Relational Data with Markov Random Fields

3 Markov Random Fields  Probabilistic model for high-dimensional data:  The random variables represent the data, such as whether a person has an attribute or whether a link exists  The potentials score different configurations of the data  The weights scale the influence of different potentials
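For reference, a sketch of the standard log-linear form this slide describes (notation assumed here, not taken from the slide):

    P(\mathbf{x}) = \frac{1}{Z(\mathbf{w})} \exp\Big( \sum_j w_j \, \phi_j(\mathbf{x}) \Big)

where each \phi_j is a potential, w_j is its weight, and Z(\mathbf{w}) normalizes the distribution.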

4 Markov Random Fields  Variables and potentials form graphical structure:

5 MRFs with Logic  One way to compactly define MRFs is with first-order logic, e.g., Markov logic networks  Use first-order logic to define templates for potentials -Ground out weighted rules over graph data -The truth table of each ground rule is a potential -Each first-order rule has a weight that becomes the potential’s weight (Richardson and Domingos, 2006)

6 MRFs with Logic  Let R be a set of weighted logical rules, where each rule has the general form of a weighted implication -Each rule carries a weight, and index sets identify which variables it uses  Equivalent clausal form: each rule can be rewritten as a single weighted disjunctive clause, as sketched below
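A sketch of the clausal form, assuming the notation of Bach et al. (AISTATS 2015): each rule j is a weighted disjunction over Boolean variables y_i,

    w_j : \bigvee_{i \in I_j^{+}} y_i \;\vee\; \bigvee_{i \in I_j^{-}} \neg y_i ,

where w_j \geq 0 and the index sets I_j^{+}, I_j^{-} pick out the variables that appear unnegated and negated.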

7 MRFs with Logic  Probability distribution:
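A sketch of the distribution, continuing the assumed notation: with C_j(\mathbf{y}) = 1 if clause j is satisfied and 0 otherwise,

    P(\mathbf{y}) = \frac{1}{Z(\mathbf{w})} \exp\Big( \sum_j w_j \, C_j(\mathbf{y}) \Big), \qquad \mathbf{y} \in \{0,1\}^n .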

8 MAP Inference  MAP (maximum a posteriori) inference seeks a most-probable assignment to the unobserved variables  For logical MRFs, MAP inference is a weighted MAX SAT problem: maximize the total weight of satisfied clauses  This MAX SAT problem is combinatorial and NP-hard!
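In the assumed notation, MAP inference is the weighted MAX SAT problem

    \arg\max_{\mathbf{y} \in \{0,1\}^n} \; \sum_j w_j \, C_j(\mathbf{y}) ,

i.e., find the discrete assignment that maximizes the total weight of satisfied clauses.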

MAX SAT Relaxation

10 Approximate Inference  View MAP inference as optimizing rounding probabilities  The expected score of a clause is a weighted noisy-or function of those probabilities, and the expected total score is the sum of these terms, as sketched below  But this expected score is highly non-convex!
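A sketch of the expected score under independent rounding, where each variable y_i is set to 1 with probability p_i (standard analysis; notation assumed):

    \mathbb{E}[\text{score}] = \sum_j w_j \Big( 1 - \prod_{i \in I_j^{+}} (1 - p_i) \prod_{i \in I_j^{-}} p_i \Big) .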

11 Approximate Inference  It is the products in the objective that make it non-convex  The expected score can be lower bounded using the relationship between arithmetic and harmonic means  This leads to the per-clause lower bound sketched below (Goemans and Williamson, 1994)
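For reference, the standard Goemans–Williamson per-clause bound of this type (notation assumed), where k_j is the number of literals in clause j and \hat{\beta}_k = 1 - (1 - 1/k)^k \geq 1 - 1/e:

    1 - \prod_{i \in I_j^{+}} (1 - p_i) \prod_{i \in I_j^{-}} p_i \;\geq\; \hat{\beta}_{k_j} \, \min\Big\{ 1, \sum_{i \in I_j^{+}} p_i + \sum_{i \in I_j^{-}} (1 - p_i) \Big\} .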

12 Approximate Inference  So, we solve the linear program that maximizes this lower bound over rounding probabilities in [0, 1]  Rounding greedily with probabilities taken from the LP solution yields a constant-factor-optimal discrete solution  With a refined choice of rounding probabilities, the guarantee improves to ¾-optimal (Goemans and Williamson, 1994)

Local Consistency Relaxation

14 Local Consistency Relaxation  LCR is a popular technique for approximating MAP in MRFs -Often simply called linear programming (LP) relaxation -Dual decomposition solves the dual of the LCR objective  Idea: relax the search over consistent marginals to a simpler set of locally consistent pseudomarginals  LCR admits fast message-passing algorithms, but no quality guarantees in general (Wainwright and Jordan, 2008)

15 Local Consistency Relaxation  Pseudomarginals over variable states and pseudomarginals over joint potential states, constrained to agree with each other locally, as sketched below (Wainwright and Jordan, 2008)
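A sketch of the first-order local consistency relaxation in its standard form (Wainwright and Jordan, 2008; notation assumed), with \theta_j the potentials, \mu_i pseudomarginals over variable states, and \mu_j pseudomarginals over joint potential states:

    \max_{\mu \geq 0} \; \sum_j \sum_{\mathbf{x}_j} \theta_j(\mathbf{x}_j) \, \mu_j(\mathbf{x}_j)
    \quad \text{s.t.} \quad \sum_{\mathbf{x}_j \setminus x_i} \mu_j(\mathbf{x}_j) = \mu_i(x_i) \;\; \forall j, i, x_i , \qquad \sum_{x_i} \mu_i(x_i) = 1 \;\; \forall i .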

Unifying the Relaxations

17 Analysis  [Slide figure: the outer objective is decomposed into a parameterized subproblem for each potential, j = 1, 2, 3, and so on] (Bach et al., AISTATS 2015)

18 Analysis Bach et al. AISTATS 2015

19 Analysis  We can now analyze each potential’s parameterized subproblem in isolation  Using the KKT conditions, we can find a simplified expression for each subproblem’s solution in terms of its parameters (Bach et al., AISTATS 2015)

20 Analysis  Substitute the simplified solutions back into the outer objective (Bach et al., AISTATS 2015)

21 Analysis  This leads to a simplified LCR, projected onto the variable pseudomarginals, as sketched below (Bach et al., AISTATS 2015)
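A sketch of the projected objective over the variable pseudomarginals, in the form derived by Bach et al. (AISTATS 2015; notation assumed):

    \max_{\boldsymbol{\mu} \in [0,1]^n} \; \sum_j w_j \, \min\Big\{ 1, \sum_{i \in I_j^{+}} \mu_i + \sum_{i \in I_j^{-}} (1 - \mu_i) \Big\} ,

which is the same objective as the MAX SAT relaxation’s linear program.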

22 Analysis  The projected Local Consistency Relaxation and the MAX SAT Relaxation have the same objective (Bach et al., AISTATS 2015)

23 Consequences  The MAX SAT relaxation can be solved with a choice of algorithms  MAX SAT rounding guarantees therefore apply to LCR! (Bach et al., AISTATS 2015)

Soft Logic and Continuous Values

25 Continuous Values  Continuous values can also be interpreted as similarities, or as degrees of truth  Łukasiewicz logic is a fuzzy logic for reasoning about such imprecise concepts (Bach et al., in preparation)
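For reference, the standard Łukasiewicz operators over truth values in [0, 1] (standard definitions, not taken from the slide):

    a \,\tilde{\wedge}\, b = \max\{0,\, a + b - 1\}, \qquad a \,\tilde{\vee}\, b = \min\{1,\, a + b\}, \qquad \neg a = 1 - a .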

26 All Three are Equivalent  Local Consistency Relaxation = MAX SAT Relaxation = Exact MAX SAT for Łukasiewicz logic (Bach et al., in preparation)

27 Consequences  Exact MAX SAT for Łukasiewicz logic is equivalent to relaxed Boolean MAX SAT and local consistency relaxation for logical MRFs  So these scalable message-passing algorithms can also be used to reason about similarity, imprecise concepts, etc.! Bach et al. In Preparation

Hinge-Loss Markov Random Fields

29 Generalizing Relaxed MRFs  Relaxed, logic-based MRFs can reason about both discrete and continuous relational data scalably and accurately  Define a new distribution over the continuous variables  We can generalize this inference objective to be the energy of a new type of MRF that does even more (Bach et al. NIPS 12, Bach et al. UAI 13)

30 Generalizations  Arbitrary hinge-loss functions (not just logical clauses)  Hard linear constraints  Squared hinge losses Bach et al. NIPS 12, Bach et al. UAI 13

31 Hinge-Loss MRFs  Define hinge-loss MRFs by using this generalized objective as the energy function Bach et al. NIPS 12, Bach et al. UAI 13
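A sketch of the hinge-loss MRF density, following the definition in Bach et al. (NIPS 12, UAI 13; notation assumed):

    P(\mathbf{y}) = \frac{1}{Z(\mathbf{w})} \exp\Big( -\sum_j w_j \big( \max\{0,\, \ell_j(\mathbf{y})\} \big)^{p_j} \Big), \qquad \mathbf{y} \in [0,1]^n, \;\; p_j \in \{1, 2\},

where each \ell_j is a linear function of \mathbf{y}, and \mathbf{y} may additionally be required to satisfy hard linear constraints.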

32 HL-MRF Inference and Learning  MAP inference for HL-MRFs is always a convex optimization -Highly scalable ADMM algorithm for MAP  Supervised Learning -No need to hand-tune weights -Learn from training data -Also highly scalable  Unsupervised and Semi-Supervised Learning -New learning algorithm that interleaves inference and parameter updates to cut learning time by as much as 90% (under review) Bach et al. NIPS 12, Bach et al. UAI 13 More in Bert’s talk
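To make the convexity claim concrete, here is a minimal sketch, not the PSL/ADMM implementation, that solves MAP for a toy HL-MRF with an off-the-shelf convex solver; the grounded rules, weights, and evidence values are made up for illustration:

import cvxpy as cp

# Minimal sketch: MAP inference for a tiny hinge-loss MRF as a convex program.
# This is NOT the PSL implementation (which uses a scalable ADMM algorithm);
# the grounded rules, weights, and evidence values below are illustrative only.

# Unobserved truth values in [0, 1] for one person "bob".
votes_rep = cp.Variable()   # Votes(bob, "Republican")
votes_dem = cp.Variable()   # Votes(bob, "Democrat")

# Observed (closed) evidence, treated as constants.
spouse_votes_rep = 0.9      # Votes(alice, "Republican"); alice is bob's spouse
mentions_aca = 0.7          # Mentions(bob, "Affordable Care")

# Hinge-loss potentials: each grounded rule contributes its weighted
# "distance to satisfaction", e.g. the rule
#   1.0 : Votes(A, P) && Spouse(B, A) -> Votes(B, P)
# grounds (with Spouse = 1 observed) to the hinge max{0, spouse_votes_rep - votes_rep}.
energy = (
    1.0 * cp.pos(spouse_votes_rep - votes_rep)   # spouse rule
    + 0.3 * cp.pos(mentions_aca - votes_dem)     # mention rule
)

constraints = [
    votes_rep >= 0, votes_rep <= 1,
    votes_dem >= 0, votes_dem <= 1,
    votes_rep + votes_dem == 1,                  # hard range constraint
]

# MAP inference = minimize the energy subject to the constraints (convex).
cp.Problem(cp.Minimize(energy), constraints).solve()
print("Votes(bob, Republican) =", round(float(votes_rep.value), 3))
print("Votes(bob, Democrat)   =", round(float(votes_dem.value), 3))

Because every potential is a convex hinge and the constraints are linear, the optimum found here is a global MAP state; PSL solves the same kind of problem at scale with its ADMM algorithm.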

Probabilistic Soft Logic

34 Probabilistic Soft Logic (PSL)  Probabilistic programming language for defining HL-MRFs  PSL components -Predicates: relationships or properties -Atoms: (continuous) random variables -Rules: potentials

35 Example: Voter Identification  [Slide figure: a voter with an unknown vote, observed donation ($$) and tweet / status-update evidence]
5.0 : Donates(A, “Republican”) -> Votes(A, “Republican”)
0.3 : Mentions(A, “Affordable Care”) -> Votes(A, “Democrat”)
Votes(A, “Republican”) + Votes(A, “Democrat”) = 1.0.

36 Example: Voter Identification  [Slide figure: a social network with spouse, friend, and colleague relationships]
0.8 : Votes(A,P) && Spouse(B,A) -> Votes(B,P)
0.3 : Votes(A,P) && Friend(B,A) -> Votes(B,P)

37 Example: Voter Identification
/* Predicate definitions */
Votes(Person, Party)
Donates(Person, Party) (closed)
Mentions(Person, Term) (closed)
Colleague(Person, Person) (closed)
Friend(Person, Person) (closed)
Spouse(Person, Person) (closed)
/* Local rules */
5.0 : Donates(A, P) -> Votes(A, P)
0.3 : Mentions(A, “Affordable Care”) -> Votes(A, “Democrat”)
0.3 : Mentions(A, “Tax Cuts”) -> Votes(A, “Republican”)
...
/* Relational rules */
1.0 : Votes(A,P) && Spouse(B,A) -> Votes(B,P)
0.3 : Votes(A,P) && Friend(B,A) -> Votes(B,P)
0.1 : Votes(A,P) && Colleague(B,A) -> Votes(B,P)
/* Range constraint */
Votes(A, “Republican”) + Votes(A, “Democrat”) = 1.0.

38 PSL Defines HL-MRFs
/* Predicate definitions */
Votes(Person, Party)
Donates(Person, Party) (closed)
Mentions(Person, Term) (closed)
Colleague(Person, Person) (closed)
Friend(Person, Person) (closed)
Spouse(Person, Person) (closed)
/* Local rules */
5.0 : Donates(A, P) -> Votes(A, P)
0.3 : Mentions(A, “Affordable Care”) -> Votes(A, “Democrat”)
0.3 : Mentions(A, “Tax Cuts”) -> Votes(A, “Republican”)
...
/* Relational rules */
1.0 : Votes(A,P) && Spouse(B,A) -> Votes(B,P)
0.3 : Votes(A,P) && Friend(B,A) -> Votes(B,P)
0.1 : Votes(A,P) && Colleague(B,A) -> Votes(B,P)
/* Range constraint */
Votes(A, “Republican”) + Votes(A, “Democrat”) = 1.0.

39 Open Source Implementation  PSL is implemented as an open source library and programming language interface  It’s ready to use for your next project  Some of the other groups already using PSL: -Jure Leskovec (Stanford) [West et al., TACL 14] -Dan Jurafsky (Stanford) [Li et al., ArXiv 14] -Ray Mooney (UT Austin) [Beltagy et al., ACL 14] -Kevin Murphy (Google) [Pujara et al., BayLearn 14]

40 Other PSL Topics  Not discussed here: -Smart grounding -Lazy inference -Distributed inference and learning  Future work: -Lifted inference -Generalized rounding guarantees

41 psl.cs.umd.edu

Applications

43 PSL Empirical Highlights  Compared with discrete MRFs:

                Collective Classification        Trust Prediction
  PSL           81.8%        0.7 sec             .482 AuPR     0.32 sec
  Discrete      79.7%        184.3 sec           .441 AuPR     … sec

 Predicting MOOC outcomes via latent engagement (AuPR):

  [Table: results on the Tech, Women-Civil, and Genes courses, comparing Lecture Rank, PSL-Direct, and PSL-Latent]

Bach et al. UAI 13, Ramesh et al. AAAI 14

44 PSL Empirical Highlights  Improved activity recognition in video:

                5 Activities               6 Activities
  HOG           47.4%      .481 F1         59.6%      .582 F1
  PSL + HOG     59.8%      .603 F1         79.3%      .789 F1
  ACD           67.5%      .678 F1         83.5%      .835 F1
  PSL + ACD     69.2%      .693 F1         86.0%      .860 F1

 Compared on drug-target interaction prediction:

  Perlman’s Method    .564 ± …        … ± .04
  PSL                 .617 ± …        … ± .04

London et al. CVPR WS 13, Fakhraei et al. TCBB 14

Conclusion

46 Conclusion  HL-MRFs unite and generalize different ways of viewing fundamental AI problems, including MAX SAT, probabilistic graphical models, and fuzzy logic  PSL’s mix of expressivity and scalability makes it a general-purpose tool for network analysis, bioinformatics, NLP, computer vision, and more  Many exciting open problems, from algorithms, to theory, to applications