Presentation on theme: "Geometric methods in image processing, networks, and machine learning"— Presentation transcript:
1Geometric methods in image processing, networks, and machine learning Andrea BertozziUniversity of California, Los Angeles
2Diffuse interface methods Total variationGinzburg-Landau functionalW is a double well potential with two minimaTotal variation measures length of boundary between two constant regions.GL energy is a diffuse interface approximation of TV for binary functionals
3Diffuse interface Equations and their sharp interface limit Allen-Cahn equation – L2 gradient flow of GL functionalApproximates motion by mean curvaure - useful for image segmentation and image deblurring.Cahn-Hilliard equation – H-1 gradient flow of GL functionalApproximates Mullins-Sekerka problem (nonlocal): Pego; Alikakos, Bates, and Chen. Conserves the mean of u.Used in image inpainting – fourth order allows for two boundary conditions to besatisfied for inpainting.
4My First introduction to wavelets Impromptu tutorial by Ingrid Daubechies over lunch in the cafeteria at Bell Labs Murray Hill c when I was a PhD student in their GRPW program.summertimeFall, winter and spring
5Roughly 20 years later…..Then PhD student Julia Dobrosotskaya asked me if she could work with me on a thesis that combines wavelets and “UCLA” style algorithms.Result was the wavelet Ginzburg-Laundau functional to connect L1 compresive sensing with L2-based wavelet constructions.IEEE Trans Image Proc. 2008, Interfaces and Free Boundaries 2011, SIAM J. Image ProcThis work was the initial inspiration for our new work on nonlocal graph based methods.inpaintingBar code deconvolution
6Weighted graphs for “big data” In a typical application we have data supported on the graph, possibly high dimensional. The above weights represent comparison of the data.Examples include:voting records of US Congress – each person has a vote vector associated with them.Nonlocal means image processing – each pixel has a pixel neighborhood that can be compared with nearby and far away pixels.
7Graph Cuts and Total Variation Mimal cutMaximum cutTotal Variation of function f defined on nodes of a weighted graph:Min cut problems can be reformulated as a total variation minimization problemfor binary/multivalued functions defined on the nodes of the graph.
8Diffuse interface methods on graphs Bertozzi and Flenner MMS 2012.
9Convergence of graph GL functional van Gennip and ALB Adv. Diff. Eq. 2012
10An MBO scheme on graphs for segmentation and image processing E. Merkurjev, T. Kostic and A.L. Bertozzi, to appear SIAM J Imaging Sci 2013.Instead of minimizating the GL functionalApply MBO scheme involving a simple algorithm alternating the heat equation with thresholding.MBO stands for Merriman Bence and Osher who invented this scheme for differential operators a couple of decades ago…..
11Two-Step Minimization Procedure based on classical MBO scheme for motion by mean curvature (now on graphs)1) propagation by graph heat equation + forcing term2) thresholdingSimple! And often converges in just a few iterations (e.g. 4 for MNIST dataset)
12AlgorithmI) Create a graph from the data, choose a weight function and then create the symmetric graph Laplacian.II) Calculate the eigenvectors and eigenvalues of the symmetric graph Laplacian. It is only necessary to calculate a portion of the eigenvectors*.III) Initialize u.IV) Iterate the two-step scheme described above until a stopping criterion is satisfied.*Fast linear algebra routines are necessary – either Raleigh-Chebyshev procedure or Nystrom extension.
13Two Moons Segmentation Second eigenvector segmentationOur method’s segmentation
14Image segementation Original image 1 Original image 2 Handlabeled grass regionGrass label transferred
20Convergence and Energy Landscape for Cheeger Cut Clustering Bresson, Laurent, Uminsky, von Brecht (current and former postdocs of our group), NIPS 2012Relaxed continuous Cheeger cut problem (unsupervised)Ratio of TV term to balance term.Prove convergence of two algorithms based on CS ideasProvides a rigorous connection between graph TV and cut problems.
21Generalization MULTICLASS Machine Learning Problems (MBO) Garcia, Merkurjev, Bertozzi, Percus, Flenner, 2013Semi-supervised learningInstead of double well we have N-class well withMinima on a simplex in N-dimensions
22Multiclass examples – semi-supervised Three moons MBO Scheme 98.5% correct.5% ground truth used for fidelity.Greyscale image 4% random points for fidelity, perfect classification.
23MNIST Database Comparisons Semi-supervised learning Vs Supervised learningWe do semi-supervised withonly 3.6% of the digits as theKnown data.Supervised uses digits for training and tests on digits.
26Community Detection – modularity Optimization Joint work with Huiyi Hu, Thomas Laurent, and Mason PorterNewman, Girvan, Phys. Rev. E 2004.[wij] is graph adjacency matrixP is probability nullmodel (Newman-Girvan) Pij=kikj/2mki = sumj wij (strength of the node)Gamma is the resolution parametergi is group assignment2m is total volume of the graph = sumi ki = sumij wijThis is an optimization (max) problem. Combinatorially complex – optimize over all possible group assignments. Very expensive computationally.
27Bipartition of a graph Given a subset A of nodes on the graph define Vol(A) = sum i in A kiThen maximizing Q is equivalent to minimizingGiven a binary function on the graph f taking values +1, -1 define A to be the set where f=1, we can define:
28Equivalence to L1 compressive sensing Thus modularity optimization restricted to twogroups is equivalent toThis generalizes to n class optimization quite naturallyBecause the TV minimization problem involves functions with values on the simplex we can directly use the MBO scheme to solve this problem.
30LFR Benchmark – synthetic benchmark graphs Lancichinetti, Fortunato, and Radicchi Phys Rev. E 78(4) 2008.Each mode is assigned a degree from a powerlaw distribution with power x.Maximum degree is kmax and mean degree by <k>. Community sizes follow a powerlaw distribution with power beta subject to a constraint that the sum of of the community sizes equals the number of nodes N. Each node shares a fraction 1-m of edges with nodes in its own community and a fraction m with nodes in other communities (mixing parameter). Min and max community sizes are also specified.
31Normalized Mutual information Similarity measure for comparing two partitions based on information entropy.NMI = 1 when two partitions are identical and is expected to be zero when they are independent.For an N-node network with two partitions
34LFR50k Similar scaling to LFR1K 50,000 nodes Approximately 2000 communitiesRun times for LFR1K and 50K
35MNIST 4-9 digit segmentation 13782 handwritten digits. Graph created based on similarity scorebetween each digit. Weighted graph with connections.Modularity MBO performs comparably to Genlouvain but in about atenth the run time. Advantage of MBO based scheme will be forvery large datasets with moderate numbers of clusters.
37Conclusions and future work (new preprint) Yves van Gennip, Nestor Guillen, Braxton Osting, and Andrea L. Bertozzi, Mean curvature, threshold dynamics, and phase field theory on finite graphs, 2013.Diffuse interface formulation provides competitive algorithms for machine learning applications including nonlocal means imagingExtends PDE-based methods to a graphical frameworkFuture work includes community detection algorithms (very computationallyexpensive)Speedup includes fast spectral methods and the use of a small subset ofeigenfunctions rather than the complete basisCompetitive or faster than split-Bregman methods and other L1-TV basedmethods
38Cluster group at ICERM spring 2014 People working on the boundary between compressive sensing methods and graph/machine learning problemsFebruary 2014 (month long working group)Workshop to be organizedLooking for more core participants