1 Efficient Stochastic Local Search for MPE Solving
Frank Hutter, The University of British Columbia (UBC), Vancouver, Canada
Joint work with Holger Hoos (UBC) and Thomas Stützle (Darmstadt University of Technology, Germany)

2 Stochastic local search (SLS): a general algorithmic framework for solving combinatorial problems

3 MPE in graphical models: many applications

4 Outline
- Most probable explanation (MPE) problem
  - Problem definition
  - Previous work
- SLS algorithms for MPE
  - Illustration
  - Previous SLS algorithms
  - Guided Local Search (GLS) in detail
- From Guided Local Search to GLS+
  - Modifications
  - Performance gains
  - Comparison to state-of-the-art

5 MPE – problem definition (in the most general representation: factor graphs)
- Given a factor graph:
  - Discrete variables X = {X_1, ..., X_n}
  - Factors Φ = {φ_1, ..., φ_m} over subsets of X
  - A factor φ_i over variables V_i ⊆ X assigns a non-negative number to every complete instantiation v_i of V_i
- Find:
  - A complete instantiation {x_1, ..., x_n} maximizing ∏_{i=1}^{m} φ_i[x_1, ..., x_n]
- NP-hard (simple reduction from SAT)
- Also known as Max-product or maximum a posteriori (MAP)
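To make the objective concrete, here is a minimal sketch in Python (the published implementation is in C); the factor tables and variable names below are made up for illustration and roughly mirror the toy example on the next slides.

```python
# Minimal sketch (not the authors' C implementation) of how the MPE objective
# can be evaluated: factors are tables mapping value tuples of their scope to
# non-negative numbers, and the objective is the product over all factors.

# Hypothetical toy factors over four variables; the numbers are made up.
factors = [
    {"scope": ("X1", "X2"),       "table": {(2, 1): 0.1, (2, 0): 0.9}},
    {"scope": ("X2", "X3", "X4"), "table": {(1, 0, 0): 0.2, (0, 0, 0): 10.0}},
    {"scope": ("X1",),            "table": {(2,): 2.7}},
]

def objective(assignment, factors):
    """Product over all factor entries selected by a complete assignment."""
    value = 1.0
    for f in factors:
        key = tuple(assignment[v] for v in f["scope"])
        value *= f["table"].get(key, 1.0)  # entries omitted from the toy tables default to 1
    return value

print(objective({"X1": 2, "X2": 1, "X3": 0, "X4": 0}, factors))  # 0.1 * 0.2 * 2.7
```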

6 Previous approaches for solving MPE
- Variable elimination / junction tree
  - Exponential in the graphical model's induced width
- Approximation with loopy belief propagation and its generalizations [Yedidia, Freeman, Weiss '02]
- Approximation with Mini-Buckets (MB) [Dechter & Rish '97] → also gives lower & upper bounds
- Search algorithms
  - Local search
  - Branch and bound with various MB heuristics [Dechter's group]
    - UAI '03: B&B with MB heuristic shown to be state-of-the-art

7 Motivation for our work
- B&B clearly outperforms the best SLS algorithm so far, even on random problem instances [Marinescu, Kask, Dechter, UAI '03]
- MPE is closely related to weighted Max-SAT [Park '02]
  - For Max-SAT, SLS is state-of-the-art (at the very least for random problems)
- Why is SLS not state-of-the-art for MPE?
  - Additional problem structure inside the factors
  - But for completely random problems? SLS algorithms should be much better than they currently are
- We took the best SLS algorithm so far (GLS) and improved it

8 Outline
- Most probable explanation (MPE) problem
  - Problem definition
  - Previous work
- SLS algorithms for MPE
  - Illustration
  - Previous SLS algorithms
  - Guided Local Search (GLS) in detail
- From Guided Local Search to GLS+
  - Modifications
  - Performance gains
  - Comparison to state-of-the-art

9 SLS for MPE – illustration
[Figure: factor graph with variables X1, ..., X4 and factors φ_1, ..., φ_5]
Instantiation [2,1,0,0]:  ∏_{i=1}^{M} φ_i[2,1,0,0] = 0.1 * 0.2 * 2.7 * 0.9 * 33.2

10 SLS for MPE – illustration
[Figure: same factor graph; a local search move flips X2 from 1 to 0]
New instantiation [2,0,0,0]:  ∏_{i=1}^{M} φ_i[2,0,0,0] = ∏_{i=1}^{M} φ_i[2,1,0,0] * 0.9/0.2 * 10/33.2
(only the factors containing X2 change, so the objective can be updated incrementally)
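This incremental update is what makes SLS moves cheap. A small sketch, building on the toy `factors` and `objective()` from the slide-5 example (the helper name and structure are mine, not from the original code):

```python
def incremental_update(value, assignment, var, new_val, factors):
    """Return the objective after setting `var` to `new_val`, given the current
    objective `value` for `assignment`; the assignment is updated in place.
    Only factors whose scope contains `var` are touched."""
    for f in factors:
        if var in f["scope"]:
            old_key = tuple(assignment[v] for v in f["scope"])
            value /= f["table"].get(old_key, 1.0)   # divide out the old entries
    assignment[var] = new_val
    for f in factors:
        if var in f["scope"]:
            new_key = tuple(assignment[v] for v in f["scope"])
            value *= f["table"].get(new_key, 1.0)   # multiply in the new entries
    return value
    # Note: dividing by a zero entry would fail; real implementations avoid this
    # by working in log space (as GLS+ does) or by recomputing from scratch.

a = {"X1": 2, "X2": 1, "X3": 0, "X4": 0}
v = objective(a, factors)                        # 0.054
v = incremental_update(v, a, "X2", 0, factors)   # 24.3, same as recomputing from scratch
```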

11 Previous SLS algorithms for MPE
- Iterated Conditional Modes [Besag '86]
  - Just greedy hill climbing
- Stochastic Simulation
  - Sampling algorithm, very poor for optimization
- Greedy + Stochastic Simulation [Kask & Dechter '99]
  - Outperforms the above & simulated annealing by orders of magnitude
- Guided Local Search (GLS) [Park '02] (and Iterated Local Search (ILS) [Hutter '04])
  - Outperforms Greedy + Stochastic Simulation by orders of magnitude

12 Guided Local Search (GLS) [Voudouris 1997]
- Subclass of Dynamic Local Search [Hoos & Stützle, 2004]; iteratively:
  1) Local search → local optimum
  2) Modify the evaluation function
- In local optima: penalize some solution features
  - Solution features for MPE are partial assignments
  - Evaluation function = objective function - sum of the respective penalties
- Penalty update rule experimentally designed
- Performs very well across many problem classes

13 GLS for MPE [Park 2002]
- Initialize penalties to 0
- Evaluation function: objective function - sum of penalties of the current instantiation
  ∏_{i=1}^{m} φ_i[x_1, ..., x_n] - Σ_i p_i[x_1, ..., x_n]
- In a local optimum:
  - Choose partial instantiations (according to the GLS update rule)
  - Increment their penalty by 1
- Every N_ρ local optima:
  - Smooth all penalties by multiplying them with ρ < 1
  - Important to eventually optimize the original objective function
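A compact sketch of this loop, assuming the toy factor representation from slide 5. It is hedged in two ways: the parameter names are mine, and for brevity it penalizes all factor entries of the current local optimum rather than selecting them by the GLS utility rule used in the actual algorithm.

```python
import random

def gls_mpe(variables, domains, factors, max_steps=10000, smooth_every=200, rho=0.99):
    """Hedged GLS-for-MPE sketch (maximization). Penalties are attached to
    factor entries ("partial instantiations")."""

    def entry(i, a):
        return tuple(a[v] for v in factors[i]["scope"])

    def objective(a):
        prod = 1.0
        for i in range(len(factors)):
            prod *= factors[i]["table"].get(entry(i, a), 1.0)
        return prod

    def evaluation(a):
        # objective minus the penalties of the factor entries selected by `a`
        return objective(a) - sum(penalties.get((i, entry(i, a)), 0.0)
                                  for i in range(len(factors)))

    assignment = {v: random.choice(domains[v]) for v in variables}
    penalties, num_local_optima = {}, 0
    best, best_obj = dict(assignment), objective(assignment)

    for _ in range(max_steps):
        # best-improvement step w.r.t. the (penalized) evaluation function
        best_move, best_move_eval = None, evaluation(assignment)
        for v in variables:
            for val in domains[v]:
                if val == assignment[v]:
                    continue
                old = assignment[v]
                assignment[v] = val
                e = evaluation(assignment)
                assignment[v] = old
                if e > best_move_eval:
                    best_move, best_move_eval = (v, val), e
        if best_move is None:
            # local optimum: increment penalties (simplified: all current entries)
            num_local_optima += 1
            for i in range(len(factors)):
                key = (i, entry(i, assignment))
                penalties[key] = penalties.get(key, 0.0) + 1.0
            if num_local_optima % smooth_every == 0:
                penalties = {k: p * rho for k, p in penalties.items()}  # smoothing
        else:
            assignment[best_move[0]] = best_move[1]
            obj = objective(assignment)
            if obj > best_obj:
                best, best_obj = dict(assignment), obj
    return best, best_obj
```

The sketch recomputes the evaluation function from scratch for every candidate move; the real implementation instead caches per-move deltas, which is exactly the caching improvement discussed on slide 18.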

14 Outline
- Most probable explanation (MPE) problem
  - Problem definition
  - Previous work
- SLS algorithms for MPE
  - Illustration
  - Previous SLS algorithms
  - Guided Local Search (GLS) in detail
- From Guided Local Search to GLS+
  - Modifications
  - Performance gains
  - Comparison to state-of-the-art

15 GLS → GLS+: overview of modified components
- Modified evaluation function
  - Pay more attention to the actual objective function
- Improved caching of the evaluation function
  - Straightforward adaptation of SAT caching schemes
- Tuning of the smoothing parameter ρ
  - Over two orders of magnitude improvement!
- Initialization with Mini-Buckets instead of random
  - Was shown to perform better by [Kask & Dechter, 1999]

16 GLS → GLS+ (1): modified evaluation function
- GLS:
  ∏_{i=1}^{m} φ_i[x_1, ..., x_n] - Σ_i p_i[x_1, ..., x_n]
  - Product of entries minus sum of penalties ≈ zero minus sum of penalties → almost neglects the objective function
- GLS+:
  Σ_{i=1}^{m} log(φ_i[x_1, ..., x_n]) - Σ_i p_i[x_1, ..., x_n]
  - Use the logarithmic objective function
  - Very simple, but much better results
  - Penalties are now just new temporary factors that decay over time!
  - Could be improved by dynamic weighting of the penalties
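In the sketch from slide 13, this change amounts to swapping in a log-space evaluation function. A hedged adaptation follows; the small floor guarding against zero entries is my addition, and the actual implementation may handle zeros differently.

```python
import math

def evaluation_glsplus(assignment, factors, penalties, floor=1e-300):
    """GLS+ evaluation: sum of log factor entries minus sum of penalties."""
    total = 0.0
    for i, f in enumerate(factors):
        key = tuple(assignment[v] for v in f["scope"])
        total += math.log(max(f["table"].get(key, 1.0), floor))  # log objective term
        total -= penalties.get((i, key), 0.0)                    # penalty term
    return total
```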

17 GLS → GLS+ (1): modified evaluation function
- Much faster in early stages of the search
- Speedups of about one order of magnitude
[Figure: solution quality over time for GLS vs. GLS+]

18 GLS → GLS+ (2): speedups by caching
- Time complexity for a single best-improvement step:
  - Previously best caching: Θ(|V| × |D_V| × …)
  - Improved caching: Θ(|V_improving| × |D_V|)
[Figure: speedups obtained by the improved caching scheme]
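A hedged sketch of the SAT-style caching idea this adapts: keep the evaluation delta of every (variable, value) move up to date, and after a move rescore only the variables that share a factor with the changed variable, so a best-improvement step only has to scan cached deltas. The class and method names below are mine, and the authors' scheme may differ in detail.

```python
from collections import defaultdict

class ScoreCache:
    def __init__(self, variables, domains, factors, eval_delta):
        # eval_delta(assignment, var, val) -> change in evaluation if var := val
        self.domains, self.eval_delta = domains, eval_delta
        self.neighbors = defaultdict(set)          # var -> vars sharing a factor with it
        for f in factors:
            for u in f["scope"]:
                self.neighbors[u].update(f["scope"])
        self.delta = {}                             # (var, val) -> cached delta
        self.rebuild({}, [])                        # caller fills via rebuild(assignment, variables)

    def rebuild(self, assignment, variables):
        for v in variables:
            for val in self.domains[v]:
                self.delta[(v, val)] = self.eval_delta(assignment, v, val)

    def after_move(self, assignment, flipped_var):
        # only moves touching a factor of `flipped_var` can have changed deltas
        self.rebuild(assignment, self.neighbors[flipped_var])

    def best_move(self):
        # returns ((var, val), delta); delta <= 0 signals a local optimum
        return max(self.delta.items(), key=lambda kv: kv[1])
```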

19 GLS → GLS+ (3): tuning the smoothing factor ρ
- [Park '02] stated GLS to have "no parameters"
- Changing ρ from Park's setting of 0.8 to 0.99
  - Sometimes from unsolvable to milliseconds
  - Effect increases for large instances
[Figure: runtime distributions for different smoothing factors ρ, including ρ = 0.99 and ρ = 1]

20 GLS → GLS+ (4): initialization with Mini-Buckets
- Sometimes a bit worse, sometimes much better
- Particularly helps for some structured instances

21 Outline
- Most probable explanation (MPE) problem
  - Problem definition
  - Previous work
- SLS algorithms for MPE
  - Illustration
  - Previous SLS algorithms
  - Guided Local Search (GLS) in detail
- From Guided Local Search to GLS+
  - Modifications
  - Performance gains
  - Comparison to state-of-the-art

22 Comparison based on [Marinescu, Kask, Dechter, UAI '03]
- Branch & Bound with MB heuristic was state-of-the-art for MPE, even for random instances!
  - Scales better than the original GLS with
    - Number of variables
    - Domain size
  - Both as an anytime algorithm and in terms of the time needed to find the optimum
- On the same problem instances, we show that our new GLS+ scales better than their implementation with
  - Number of variables
  - Domain size
  - Density
  - Induced width

23 Benchmark instances
- Randomly generated Bayes nets
  - Graph structure: completely random / grid networks
  - Controlled number of variables & domain size
- Random networks with controlled induced width
- Bayesian networks from the Bayes net repository

24 Original GLS vs. B&B with MB heuristic: relative solution quality after 100 seconds for random grid networks of size N×N
[Figure: three panels for small, medium, and large networks]

25 GLS+ vs. GLS and B&B with MB heuristic: relative solution quality after 100 seconds for random grid networks of size N×N
[Figure: three panels for small, medium, and large networks]

26 GLS+ vs. B&B with MB heuristic: solution time with increasing domain size on random networks
[Figure: three panels for small, medium, and large networks]

27 Solution times with increasing induced width on random networks
[Figure: comparison of d-BBMB, s-BBMB, original GLS, and GLS+]

28 Results for the Bayes net repository
- GLS+ shows the overall best performance
  - Only algorithm to solve the Link network (in 1 second!)
  - Problems for Barley and especially Diabetes
- Preprocessing with partial variable elimination helps a lot
  - Can reduce the number of variables dramatically

29 Conclusions
- SLS algorithms are competitive for MPE solving
  - Scale very well, especially with induced width
  - But they need careful design, analysis & parameter tuning
- SLS and machine learning (ML) people should talk
  - SLS can perform very well for some traditional ML problems
- Our C source code is online
  - Please use it
  - There's also a Matlab interface

30 Extensions in progress
- Real problem domains
  - MRFs for stereo vision
  - CRFs for sketch recognition
- Domain-dependent extensions
  - Hierarchical SLS for problems in computer vision
- Automated parameter tuning
  - Use machine learning to predict runtime for different settings of algorithm parameters
  - Use the parameter setting with the lowest predicted runtime

31 The End
Thanks to
- Holger Hoos & Thomas Stützle
- Radu Marinescu for their B&B code
- You, for your attention