
1 Efficiently handling discrete structure in machine learning Stefanie Jegelka MADALGO summer school

2 Overview
– discrete labeling problems (MAP inference)
– (structured) sparse variable selection
– finding informative / influential subsets
Recurrent questions: how to model prior knowledge / assumptions (structure)? efficient optimization?
Recurrent themes: convexity, submodularity, polyhedra

3

4 Intuition: min vs max

5 Sensing Place sensors to monitor temperature

6 Sensing. Y_s: temperature at location s; X_s: sensor value at location s, with X_s = Y_s + noise. [figure: graphical model with sensor nodes x_1, …, x_6 observing temperature nodes y_1, …, y_6] Where to measure to maximize information about y? The information gain is a monotone submodular function!

7 Maximizing influence

8 Maximizing diffusion. Each node has a monotone submodular activation function and a random threshold, and becomes activated once the function of its active neighbors exceeds the threshold. Theorem (Mossel & Roch 07): the expected number of active nodes after n steps is a submodular function of the seed set.

9 Diversity priors “spread out”

10 Determinantal point processes. Given a normalized similarity (kernel) matrix L, a sample Y is drawn with probability proportional to det(L_Y): similar items repel each other. F(Y) = log det(L_Y) is submodular (but not monotone).
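To make the log-determinant objective above concrete, here is a minimal sketch (not from the slides): it evaluates F(Y) = log det(L_Y) with numpy, using a made-up RBF similarity kernel as the matrix L.

```python
import numpy as np

def logdet_objective(L, Y):
    """F(Y) = log det(L_Y): submodular (but not monotone) in Y."""
    idx = sorted(Y)
    if not idx:
        return 0.0                      # det of the empty matrix is 1
    sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
    return logdet

# toy similarity kernel (illustrative assumption): RBF kernel on 1-D points
pts = np.linspace(0, 1, 5)
L = np.exp(-(pts[:, None] - pts[None, :]) ** 2 / 0.1)

print(logdet_objective(L, {0, 4}))      # far-apart points: higher value
print(logdet_objective(L, {0, 1}))      # similar points: lower value (repulsion)
```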

11 Diversity priors (Kulesza & Taskar 10)

12 Summarization (Lin & Bilmes 11): objective = relevance + diversity.

13 Maximizing submodular functions: NP-hard.
– exact methods: (NW81, GSTT99, KNTB09)
– generic (non-monotone) case: bi-directional greedy (BFNS12), local search (FMV07)
– monotone function (constrained): greedy (NWF78), relaxation (CCPV11)

14 Monotone maximization, greedy algorithm: start with S = ∅; in each step add the element with the largest marginal gain, argmax_e F(S ∪ {e}) − F(S), until |S| = k.

15 Monotone maximization. Theorem (NWF78): the greedy solution S satisfies F(S) ≥ (1 − 1/e) · max_{|T| ≤ k} F(T). [plot: sensor placement, information gain of the optimal vs. the greedy solution; empirically greedy is nearly optimal] Speedup in practice: “lazy greedy” (Minoux, 78).
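A hedged sketch of the greedy rule together with Minoux's lazy acceleration; `F` stands for any monotone submodular set function (e.g. an information-gain objective), elements are assumed hashable and orderable (e.g. integers), and all names are illustrative.

```python
import heapq

def lazy_greedy(F, ground_set, k):
    """Greedy maximization of a monotone submodular F under |S| <= k,
    with Minoux's lazy evaluations: stale marginal gains sit in a max-heap
    and are only re-evaluated when they reach the top (assumes k <= |V|)."""
    S, fS = set(), F(set())
    # heap entries: (-gain, element, iteration at which the gain was computed)
    heap = [(-(F({e}) - fS), e, 0) for e in ground_set]
    heapq.heapify(heap)
    for it in range(1, k + 1):
        while True:
            neg_gain, e, stamp = heapq.heappop(heap)
            if stamp == it:                      # gain is up to date: take it
                S.add(e)
                fS += -neg_gain
                break
            gain = F(S | {e}) - fS               # re-evaluate the stale gain
            heapq.heappush(heap, (-gain, e, it))
    return S
```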

16 More complex constraints. Ground set: (camera location, direction) pairs; a configuration is a subset of the ground set; sensing quality model F; budget of k cameras. A configuration is feasible if no camera points in two directions at once.

17-18 Matroids. A set S is independent if …
– … |S| ≤ k: uniform matroid
– … S contains at most one element from each group: partition matroid
– … S contains no cycles: graphic matroid
Downward closure: S independent ⇒ every T ⊆ S is also independent.
Exchange property: S, U independent, |S| > |U| ⇒ some element of S \ U can be added to U and the result is still independent.
All maximal independent sets have the same size.

19 More complex constraints (continued). Ground set: (camera location, direction) pairs; configuration; sensing quality model; k cameras. A configuration is feasible if no camera points in two directions at once: this is partition matroid independence, with at most one direction chosen per camera location.

20 Maximization over matroids, greedy algorithm: start with S = ∅; in each step add the element with the largest marginal gain among those that keep S independent.

21 Maximization over matroids. Theorem (FNW78): the greedy algorithm achieves a 1/2 approximation. Better: relaxation (continuous greedy), approximation factor 1 − 1/e (CCPV11).
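A hedged sketch of the matroid-constrained greedy from these slides; the independence oracle below encodes the camera example as a partition matroid (at most one direction per location), and all names are illustrative.

```python
def matroid_greedy(F, ground_set, is_independent):
    """Greedy for monotone submodular F over a matroid: repeatedly add the
    feasible element with the largest marginal gain (1/2-approximation, FNW78)."""
    V = set(ground_set)
    S, fS = set(), F(set())
    while True:
        best, best_gain = None, float("-inf")
        for e in V - S:
            if is_independent(S | {e}):        # skip additions that violate the matroid
                gain = F(S | {e}) - fS
                if gain > best_gain:
                    best, best_gain = e, gain
        if best is None:                       # no independent extension left
            return S
        S.add(best)
        fS += best_gain

# partition matroid for the camera example: ground set = (location, direction)
# pairs, a set is independent iff no location appears twice
def one_direction_per_camera(S):
    locations = [loc for (loc, _direction) in S]
    return len(locations) == len(set(locations))
```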

22 Multilinear relaxation vs. Lovász extension. Multilinear relaxation: concave in certain directions, approximated by sampling. Lovász extension: convex, computable exactly in O(n log n).
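A minimal sketch of the O(n log n) evaluation of the Lovász extension mentioned here: sort the coordinates and telescope F along the resulting chain of suplevel sets. `F` is any normalized set function (F(∅) = 0) on indices 0..n-1; the names are illustrative.

```python
def lovasz_extension(F, x):
    """Evaluate the Lovász extension of F at x in [0,1]^n: sort coordinates
    decreasingly and sum x-weighted marginal gains along the chain of
    suplevel sets (assumes F(set()) == 0)."""
    order = sorted(range(len(x)), key=lambda i: -x[i])   # O(n log n)
    value, S, prev = 0.0, set(), 0.0
    for i in order:
        S.add(i)
        fS = F(S)
        value += x[i] * (fS - prev)                      # marginal gain of i
        prev = fS
    return value
```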

23 Maximizing submodular functions: NP-hard.
– exact methods: (NW81, GSTT99, KNTB09)
– generic (non-monotone) case: bi-directional greedy (BFNS12), local search (FMV07)
– monotone function (constrained): greedy (NWF78), relaxation (CCPV11)

24 Non-monotone maximization [illustration of the bi-directional greedy on elements a–f: A grows from the empty set, B shrinks from {a, …, f}]

25 [illustration continued: a and c have been kept in A, b has been dropped from B]

26 Theorem (BFNS12): the randomized bi-directional (double) greedy returns a solution S with E[F(S)] ≥ 1/2 · max_T F(T) for unconstrained maximization of a non-negative submodular function.
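A hedged sketch of the randomized bi-directional (double) greedy of BFNS12 that this theorem refers to; `F` is assumed non-negative submodular, and the names are illustrative.

```python
import random

def double_greedy(F, ground_set):
    """Randomized double greedy (BFNS12): grow A from the empty set and shrink
    B from the full set; each element is either kept (added to A) or dropped
    (removed from B), with probability proportional to the two gains."""
    A, B = set(), set(ground_set)
    for e in ground_set:
        a = F(A | {e}) - F(A)          # gain of adding e to A
        b = F(B - {e}) - F(B)          # gain of removing e from B
        a, b = max(a, 0.0), max(b, 0.0)
        if a + b == 0 or random.random() < a / (a + b):
            A.add(e)                   # keep e
        else:
            B.discard(e)               # drop e
    return A                           # A == B at this point
```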

27 Summary
– submodular maximization: NP-hard; ½ approximation (unconstrained)
– constrained maximization: NP-hard, mostly constant approximation factors
– submodular minimization: exploit convexity; poly-time
– constrained minimization: special cases poly-time; many cases have polynomial lower bounds

28 Constraints. Ground set: edges in a graph; minimum cut, matching, path, spanning tree.

29 Recall: MAP and cuts. For a pairwise random field, MAP inference corresponds to a minimum cut. What’s the problem? The minimum cut prefers a short cut = a short object boundary. [figure: desired segmentation (“aim”) vs. actual result (“reality”)]

30 MAP and cuts. Minimum cut: minimize a sum of edge weights; implicit criterion: short cut = short boundary. Minimum cooperative cut: minimize a submodular function of the edges, not a sum of edge weights; new criterion: the boundary may be long if it is homogeneous.

31 Reward co-occurrence of edges. Sum of weights: use few edges. Submodular cost function: use few groups S_i of edges. [example: a cut with 7 edges of 4 types vs. a cut with 25 edges of only 1 type]
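To make “use few groups of edges” concrete, here is an assumption-level example (not necessarily the exact cost used in the papers): a cooperative cut cost that applies a concave function (square root) to the total weight of cut edges in each group; a concave function of a sum of weights is submodular, and this cost rewards homogeneous boundaries.

```python
import math

def cooperative_cut_cost(cut_edges, edge_group, edge_weight):
    """Submodular cost of a cut: sum over edge groups of sqrt(total weight of
    cut edges in that group). Homogeneous boundaries (few groups) are cheap
    even if they contain many edges; this is not a sum of edge weights."""
    per_group = {}
    for e in cut_edges:
        g = edge_group[e]
        per_group[g] = per_group.get(g, 0.0) + edge_weight[e]
    return sum(math.sqrt(w) for w in per_group.values())
```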

32 Results. [image comparison: graph cut vs. cooperative cut]

33 Constrained optimization (minimum cut, matching, path, spanning tree): approximate optimization via convex relaxation or by minimizing a surrogate function (Goel et al. ’09, Iwata & Nagano ’09, Goemans et al. ’09, Jegelka & Bilmes ’11, Iyer et al. ’13, Kohli et al. ’13, ...). Approximation bounds depend on F: polynomial, constant, or FPTAS.

34 Efficient constrained optimization (JB11, IJB13): minimize a series of surrogate functions. 1. Compute a linear upper bound of F. 2. Solve the easy sum-of-weights problem, and repeat. Efficient: we only need to solve sum-of-weights problems.
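A minimal sketch of the majorize-minimize loop described on this slide, under the assumption that `solve_modular` is an exact solver for the underlying sum-of-weights problem (min cut, shortest path, spanning tree, ...); the particular modular upper bound below is one standard choice for normalized F (F(∅) = 0), not necessarily the exact bound used in JB11/IJB13.

```python
def mm_minimize(F, ground_set, solve_modular, max_iters=20):
    """Majorize-minimize for constrained submodular minimization: repeatedly
    replace F by a modular upper bound that is tight at the current solution
    and re-solve the easy sum-of-weights problem with those weights."""
    S = set(solve_modular({e: F({e}) for e in ground_set}))  # start from plain weights
    best = F(S)
    for _ in range(max_iters):
        # 1. a modular upper bound of F, tight at S: elements of S are charged
        #    their removal gain, elements outside S their singleton value
        w = {e: (F(S) - F(S - {e})) if e in S else F({e}) for e in ground_set}
        # 2. solve the easy sum-of-weights problem with these weights
        T = set(solve_modular(w))
        if F(T) >= best:            # no improvement: stop
            return S
        S, best = T, F(T)
    return S
```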

35 Does it work? [plot: solution quality of Goemans et al. 2009, majorize-minimize after 1 iteration, and the optimal solution] Empirical results are much better than the theoretical worst-case bounds!?

36 Does it work? [images: minimum cut solution vs. approximate solution vs. optimal solution (Jegelka & Bilmes 2011; Kohli, Osokin, Jegelka 2013)]

37 Theory and practice. Worst-case approximation and learning lower bounds for trees, matchings, and cuts (Goel et al. ’09, Iwata & Nagano ’09, Jegelka & Bilmes ’11, Goemans et al. ’09, Svitkina & Fleischer ’08, Balcan & Harvey ’12). Good approximations in practice… BUT not in theory? Theory says: no good approximations possible (in general). What makes some (practical) problems easier than others?

38 Curvature: the worst-case ratio of marginal cost to single-item cost, κ_F = 1 − min_e F(e | V \ {e}) / F({e}); small κ means nearly modular, large κ means strongly diminishing returns. Theorems (IJB 2013): tightened, curvature-dependent upper & lower bounds for constrained minimization, approximation, and learning, as a function of the set size; analogous curvature-dependent results exist for submodular maximization (Conforti & Cornuéjols ’84, Vondrák ’08).
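As a quick illustration of the curvature quantity on this slide (the standard definition; the tightened bounds themselves are not reproduced here), a small sketch that compares marginal and single-item costs; `F` is assumed normalized with F({e}) > 0 for all e.

```python
def curvature(F, ground_set):
    """kappa_F = 1 - min_e F(e | V \\ {e}) / F({e}); 0 for modular F, values
    close to 1 indicate strongly diminishing returns."""
    V = set(ground_set)
    fV = F(V)
    ratios = [(fV - F(V - {e})) / F({e}) for e in ground_set]
    return 1.0 - min(ratios)
```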

39 Curvature and approximations. [plot: approximation factor as a function of curvature; smaller is better]

40 If there was more time… Learning submodular functions; adaptive submodular maximization; online learning/optimization; distributed algorithms; worst case vs. average practical case; many more applications… Pointers and references: http://www.cs.berkeley.edu/~stefje/madalgo/literature_list.pdf Slides: http://www.cs.berkeley.edu/~stefje/madalgo/

41 Summary
– discrete labeling problems (MAP inference)
– (structured) sparse variable selection
– finding informative / influential subsets
Recurrent questions: how to model prior knowledge / assumptions (structure)? efficient optimization?
Recurrent themes: convexity, submodularity, polyhedra

42 Submodularity and machine learning. Distributions over labels and sets: tractability often comes from submodularity, e.g. “attractive” graphical models, determinantal point processes. (Convex) regularization: submodularity as “discrete convexity”, e.g. combinatorial sparse estimation. Submodularity is behind a lot of machine learning!

