
Efficiently handling discrete structure in machine learning Stefanie Jegelka MADALGO summer school

Overview
– discrete labeling problems (MAP inference)
– (structured) sparse variable selection
– finding informative / influential subsets
Recurrent questions: how to model prior knowledge / assumptions (structure)? how to optimize efficiently?
Recurrent themes: convexity, submodularity, polyhedra

Intuition: min vs max

Sensing Place sensors to monitor temperature

Sensing
Y_s: temperature at location s; X_s: sensor value at location s; X_s = Y_s + noise.
[figure: graphical model with sensor nodes x_1, …, x_6 and temperature nodes y_1, …, y_6]
Where to measure to maximize information about y? the information gain is a monotone submodular function!
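To make the diminishing-returns claim concrete, here is a toy numerical check (not from the slides; the Gaussian sensor model and all names are assumptions): for jointly Gaussian readings, information gain is F(S) = ½ log det(I + K_S / σ²), and the marginal gain of a sensor shrinks as the selected set grows.

```python
import numpy as np

# Toy check: Gaussian information gain has diminishing returns.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6))
K = A @ A.T + 6 * np.eye(6)   # well-conditioned covariance over 6 toy sites
noise = 1.0

def info_gain(S):
    """F(S) = 0.5 * log det(I + K_S / noise^2): information gain of
    reading the sensors in S (jointly Gaussian case)."""
    S = sorted(S)
    if not S:
        return 0.0
    K_S = K[np.ix_(S, S)]
    return 0.5 * np.linalg.slogdet(np.eye(len(S)) + K_S / noise**2)[1]

# Marginal gain of sensor e shrinks as the base set grows.
e = 5
small, large = {0}, {0, 1, 2, 3}
print(info_gain(small | {e}) - info_gain(small))   # larger marginal gain
print(info_gain(large | {e}) - info_gain(large))   # smaller marginal gain
```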

Maximizing influence

Maximizing diffusion
each node v has a monotone submodular activation function f_v and a random threshold θ_v; v is activated if f_v(active neighbors) ≥ θ_v.
Theorem (Mossel & Roch 07): F(S) = E[# active after n steps, starting from seed set S] is (monotone) submodular.
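A minimal Monte Carlo sketch of threshold-style diffusion, under assumed toy structures (adjacency dict, linear-threshold activation standing in for a general monotone submodular f_v; all names are hypothetical):

```python
import random

def simulate_spread(graph, weights, seeds, steps=10, trials=200, seed=0):
    """Estimate E[# active nodes] under the linear threshold model.
    graph: {node: set of in-neighbors}; weights[(u, v)]: influence of u on v
    (in-weights of each v summing to at most 1). Thresholds are drawn
    uniformly per trial; averaging over them is what makes the expected
    spread submodular in `seeds`."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        theta = {v: rng.random() for v in graph}
        active = set(seeds)
        for _ in range(steps):
            newly = {v for v in graph if v not in active
                     and sum(weights.get((u, v), 0.0)
                             for u in graph[v] if u in active) >= theta[v]}
            if not newly:
                break
            active |= newly
        total += len(active)
    return total / trials
```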

Diversity priors “spread out”

Determinantal point processes
normalized similarity matrix K; sample a set Y with P(Y = S) ∝ det(K_S): repulsion between similar items.
log det(K_S) is submodular (not monotone).
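A small sketch (toy kernel; helper names are hypothetical) of the L-ensemble probability and the log-det objective:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(5, 5))
L = B @ B.T + 0.5 * np.eye(5)   # toy positive-definite similarity kernel

def dpp_prob(S):
    """P(Y = S) = det(L_S) / det(L + I) for an L-ensemble DPP."""
    S = sorted(S)
    num = np.linalg.det(L[np.ix_(S, S)]) if S else 1.0
    return num / np.linalg.det(L + np.eye(len(L)))

def logdet(S):
    """log det(L_S): submodular, but not monotone in general, since
    adding a near-duplicate item drives the determinant toward 0."""
    S = sorted(S)
    return 0.0 if not S else np.linalg.slogdet(L[np.ix_(S, S)])[1]

print(dpp_prob({0, 2}), logdet({0}), logdet({0, 1, 2, 3, 4}))
```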

Diversity priors (Kulesza & Taskar 10)

Summarization (Lin & Bilmes 11): objective = relevance + diversity

Maximizing submodular functions: NP-hard
non-monotone function (assume generic case):
– bi-directional greedy (BFNS12)
– local search (FMV07)
monotone function (constrained):
– greedy (NWF78)
– relaxation (CCPV11)
exact methods (NW81, GSTT99, KNTB09)

Monotone maximization
greedy algorithm: S_0 = ∅; S_{i+1} = S_i ∪ { argmax_e F(S_i ∪ {e}) − F(S_i) }; stop when |S| = k.
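A minimal sketch of this greedy rule (hypothetical interface: F takes a Python set, k is the cardinality budget):

```python
def greedy_max(F, ground_set, k):
    """Greedy maximization of a monotone submodular F under |S| <= k:
    repeatedly add the element with the largest marginal gain."""
    S = set()
    for _ in range(k):
        best = max((e for e in ground_set if e not in S),
                   key=lambda e: F(S | {e}) - F(S))
        S.add(best)
    return S
```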

Monotone maximization
Theorem (NWF78): greedy achieves F(S_greedy) ≥ (1 − 1/e) · max_{|S| ≤ k} F(S).
[plot: sensor placement, information gain of greedy vs. optimal – empirically greedy is near-optimal]
speedup in practice: "lazy greedy" (Minoux, 78)
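Lazy greedy exploits diminishing returns: a stale marginal gain can only overestimate the current one, so most re-evaluations can be skipped. A sketch using Python's heapq (same hypothetical F interface as above; elements assumed orderable so heap ties resolve):

```python
import heapq

def lazy_greedy_max(F, ground_set, k):
    """Lazy greedy (Minoux '78): keep stale gains in a max-heap. By
    diminishing returns a stale gain is an upper bound, so a candidate
    whose refreshed gain still tops the heap is the true argmax."""
    S, base = set(), F(set())
    heap = [(-(F({e}) - base), e) for e in ground_set]
    heapq.heapify(heap)
    for _ in range(k):
        while True:
            neg_gain, e = heapq.heappop(heap)
            fresh = F(S | {e}) - F(S)           # re-evaluate this candidate
            if not heap or fresh >= -heap[0][0]:
                S.add(e)                         # still the best: take it
                break
            heapq.heappush(heap, (-fresh, e))    # otherwise re-insert
    return S
```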

More complex constraints
Ground set: pairs (camera location, direction); configuration: a subset S of the ground set; sensing quality: a monotone submodular F(S).
A configuration is feasible if no camera points in two directions at once.

Matroids
S is independent if …
… |S| ≤ k: uniform matroid
… S contains at most one element from each group: partition matroid
… S contains no cycles: graphic matroid
Downward closed: S independent, T ⊆ S ⇒ T also independent
Exchange property: S, U independent, |S| > |U| ⇒ some element of S ∖ U can be added to U, keeping it independent
Consequence: all maximal independent sets have the same size
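For intuition, a partition matroid independence oracle is a few lines; a sketch with hypothetical arguments (`group_of` maps each element to its group, `capacity` gives per-group limits):

```python
from collections import Counter

def partition_independent(S, group_of, capacity):
    """S is independent in the partition matroid iff it contains at most
    capacity[g] elements from each group g."""
    counts = Counter(group_of[e] for e in S)
    return all(c <= capacity[g] for g, c in counts.items())
```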

More complex constraints
Same camera example: a configuration is feasible if no camera points in two directions at once.
This is partition matroid independence: one group per camera; S is independent if it contains at most one direction per camera.

Maximization over matroids
greedy algorithm: S = ∅; repeatedly add the element with the largest marginal gain among those keeping S independent; stop when no such element remains.
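A sketch of matroid-constrained greedy (names hypothetical; `independent` is any membership oracle, e.g. `lambda S: partition_independent(S, group_of, capacity)` from the sketch above):

```python
def greedy_matroid(F, ground_set, independent):
    """Greedy over a matroid: among elements that keep S independent,
    add the one with the largest marginal gain of F."""
    S = set()
    while True:
        candidates = [e for e in ground_set
                      if e not in S and independent(S | {e})]
        if not candidates:
            return S
        best = max(candidates, key=lambda e: F(S | {e}) - F(S))
        S.add(best)
```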

Maximization over matroids
Theorem (FNW78): greedy gives a ½ approximation over any matroid.
better: relaxation (continuous greedy) achieves a (1 − 1/e) approximation factor (CCPV11).

Multilinear relaxation vs. Lovász ext.
multilinear extension: F̃(x) = E_{S∼x}[F(S)]; concave in certain directions; approximate by sampling.
Lovász extension: convex (iff F is submodular); exactly computable in O(n log n).
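A sketch of the Lovász extension via the sorting formula (hypothetical set-function interface; assumes F(∅) = 0 so the formula is exact):

```python
import numpy as np

def lovasz_extension(F, x):
    """f_L(x) = sum_i x_{pi(i)} * (F(first i elements) - F(first i-1)),
    where pi sorts the coordinates of x in decreasing order.
    Cost: one sort (O(n log n)) plus n calls to F."""
    order = np.argsort(-np.asarray(x))
    value, prefix, prev = 0.0, set(), F(set())
    for i in order:
        prefix = prefix | {int(i)}
        cur = F(prefix)
        value += x[int(i)] * (cur - prev)
        prev = cur
    return value
```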

Maximizing submodular functions: NP-hard
non-monotone function (assume generic case):
– bi-directional greedy (BFNS12)
– local search (FMV07)
monotone function (constrained):
– greedy (NWF78)
– relaxation (CCPV11)
exact methods (NW81, GSTT99, KNTB09)

Non-monotone maximization
[illustration: bi-directional greedy on ground set {a, b, c, d, e, f} – maintain A (growing from ∅) and B (shrinking from the full set); each element in turn is either added to A or discarded from B]

Theorem (BFNS12): the randomized bi-directional (double) greedy returns S with E[F(S)] ≥ ½ · max F, i.e. a ½ approximation for unconstrained non-monotone submodular maximization.
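A sketch of the randomized double greedy named in BFNS12, assuming F is a nonnegative submodular function given as a Python callable:

```python
import random

def double_greedy(F, ground_set, seed=0):
    """Randomized double greedy (BFNS12): 1/2 approximation in
    expectation for unconstrained nonnegative submodular maximization."""
    rng = random.Random(seed)
    elements = list(ground_set)
    A, B = set(), set(elements)
    for e in elements:
        a = F(A | {e}) - F(A)          # gain of adding e to A
        b = F(B - {e}) - F(B)          # gain of removing e from B
        a, b = max(a, 0.0), max(b, 0.0)
        if a + b == 0 or rng.random() < a / (a + b):
            A.add(e)                   # keep e
        else:
            B.discard(e)               # drop e
    return A                           # A == B after the single pass
```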

Summary
– submodular maximization: NP-hard – ½ approximation
– constrained maximization: NP-hard, mostly constant approximation factors
– submodular minimization: exploit convexity – poly-time
– constrained minimization? special cases poly-time; many cases polynomial lower bounds

Constraints
ground set: edges in a graph
minimum… cut, matching, path, spanning tree

Recall: MAP and cuts
pairwise random field: E(y) = Σ_i θ_i(y_i) + Σ_{(i,j)} θ_ij(y_i, y_j); MAP inference = minimum cut.
What's the problem? minimum cut prefers a short cut = short object boundary.
[figure: segmentation, aim vs. reality]

MAP and cuts
Minimum cut: minimize a sum of edge weights; implicit criterion: short cut = short boundary.
Minimum cooperative cut: minimize a submodular function of the cut edges – not a sum of edge weights! new criterion: the boundary may be long if it is homogeneous.

Reward co-occurrence of edges
sum of weights: use few edges; submodular cost function: use few groups S_i of edges.
[illustration: two cuts – 25 edges of 1 type vs. 7 edges of 4 types]
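One standard way to express "use few groups" is a concave function of the weight inside each group; a sketch (square root is one possible concave choice; argument names are hypothetical):

```python
import math

def coop_cut_cost(cut_edges, group_of, weight):
    """F(C) = sum over groups g of sqrt(total weight of C within g).
    Concave-over-groups costs are submodular and reward cuts whose edges
    co-occur in few groups, i.e. homogeneous boundaries."""
    totals = {}
    for e in cut_edges:
        g = group_of[e]
        totals[g] = totals.get(g, 0.0) + weight[e]
    return sum(math.sqrt(t) for t in totals.values())
```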

Results
[figure: segmentation results, graph cut vs. cooperative cut]

Constrained optimization
minimum cut, matching, path, spanning tree – with a submodular cost function.
Approaches: convex relaxation; minimize a surrogate function (Goel et al. `09, Iwata & Nagano `09, Goemans et al. `09, Jegelka & Bilmes `11, Iyer et al. `13, Kohli et al. `13, ...); approximate optimization.
approximation bounds depend on F: polynomial – constant – FPTAS

Efficient constrained optimization (JB11, IJB13)
minimize a series of surrogate functions:
1. compute a modular (linear) upper bound of F, tight at the current solution
2. solve the easy sum-of-weights problem with these weights, and repeat.
efficient: only need to solve sum-of-weights problems
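A sketch of this majorize-minimize loop under assumptions: it uses one standard modular upper bound (tight at the current set) and a hypothetical `min_weight_solver` that minimizes a sum of weights over the combinatorial constraint (e.g. a shortest-path or spanning-tree routine):

```python
def majorize_minimize(F, ground_set, min_weight_solver, iters=10):
    """Iteratively minimize modular upper bounds of a submodular F.
    At the current set S, one valid bound (tight at S) assigns weight
    F(S) - F(S - {e}) to e in S and F({e}) - F(empty) to e outside S;
    by submodularity the resulting modular function upper-bounds F.
    min_weight_solver(weights) returns a feasible set minimizing the
    weighted sum."""
    V = set(ground_set)
    S = min_weight_solver({e: F({e}) - F(set()) for e in V})
    for _ in range(iters):
        w = {e: (F(S) - F(S - {e})) if e in S else (F({e}) - F(set()))
             for e in V}
        S_new = min_weight_solver(w)
        if F(S_new) >= F(S):
            break                      # no improvement: stop
        S = S_new
    return S
```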

Does it work?
[plot: cost of the Goemans et al. 2009 surrogate vs. majorize-minimize (after 1 iteration) vs. the optimal solution]
empirical results much better than theoretical worst-case bounds!?

Does it work?
[figure: segmentations – approximate solution (Kohli, Osokin, Jegelka 2013; Jegelka & Bilmes 2011) vs. optimal solution vs. minimum cut solution]

Theory and practice
worst-case lower bounds vs. achieved approximation and learning bounds for trees, matchings, cuts (from Goel et al. `09, Iwata & Nagano `09, Jegelka & Bilmes `11, Goemans et al. `09, Svitkina & Fleischer `08, Balcan & Harvey `12)
Good approximations in practice … BUT not in theory? theory says: no good approximations possible (in general)
What makes some (practical) problems easier than others?

Curvature
curvature κ(F) = 1 − min_e (marginal cost F(V) − F(V ∖ {e})) / (single-item cost F({e})), ranging from 0 (modular) to 1 (strongly saturating).
Theorems (IJB 2013): tightened upper & lower bounds for constrained minimization, approximation, and learning; the worst-case-to-opt cost ratio improves from large to small as κ shrinks, as a function of the size of the set.
for submodular max, curvature-dependent bounds: (Conforti & Cornuéjols `84, Vondrák `08)
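Total curvature is cheap to measure with a value oracle; a sketch (assumes F({e}) > 0 for every element):

```python
def curvature(F, ground_set):
    """Total curvature kappa(F) = 1 - min_e (F(V) - F(V - {e})) / F({e}).
    kappa = 0 for modular (linear) F; kappa -> 1 as F saturates.
    Uses 2n + 1 evaluations of F."""
    V = set(ground_set)
    FV = F(V)
    return 1.0 - min((FV - F(V - {e})) / F({e}) for e in V)
```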

Curvature and approximations
[plot: approximation factor as a function of curvature κ – smaller is better]

If there were more time…
– Learning submodular functions
– Adaptive submodular maximization
– Online learning / optimization
– Distributed algorithms
– worst case vs. average / practical case
– Many more applications…
pointers and references: slides:

Summary
– discrete labeling problems (MAP inference)
– (structured) sparse variable selection
– finding informative / influential subsets
Recurrent questions: how to model prior knowledge / assumptions (structure)? how to optimize efficiently?
Recurrent themes: convexity, submodularity, polyhedra

Submodularity and machine learning
distributions over labels, sets – often: tractability via submodularity, e.g. "attractive" graphical models, determinantal point processes
(convex) regularization – submodularity: "discrete convexity", e.g. combinatorial sparse estimation
submodularity is behind a lot of machine learning!