Geometric methods in image processing, networks, and machine learning


Geometric methods in image processing, networks, and machine learning Andrea Bertozzi University of California, Los Angeles

Diffuse interface methods: total variation and the Ginzburg-Landau functional. W is a double-well potential with two minima. Total variation measures the length of the boundary between two constant regions. The GL energy is a diffuse interface approximation of TV for binary functions.
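The slide's formulas were images; in standard notation (with a common choice of double-well W, not spelled out on the slide), the Ginzburg-Landau energy reads:

```latex
E_\varepsilon(u) \;=\; \int \frac{\varepsilon}{2}\,|\nabla u|^2 \;+\; \frac{1}{\varepsilon}\,W(u)\,dx,
\qquad W(u) \;=\; \tfrac{1}{4}\,(u^2-1)^2 .
```

By the Modica-Mortola Gamma-convergence result, as the interface width ε → 0 this energy converges (up to a constant) to the total variation of u for functions taking the two well values, which is the sense in which GL approximates TV.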

Diffuse interface equations and their sharp interface limit. Allen-Cahn equation: the L2 gradient flow of the GL functional; approximates motion by mean curvature, useful for image segmentation and image deblurring. Cahn-Hilliard equation: the H^{-1} gradient flow of the GL functional; approximates the (nonlocal) Mullins-Sekerka problem (Pego; Alikakos, Bates, and Chen). It conserves the mean of u and is used in image inpainting: being fourth order, it allows two boundary conditions to be satisfied at the inpainting boundary.
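The two gradient flows named above (equation images missing from the transcript) take the following standard form, in one common scaling:

```latex
\text{Allen--Cahn (}L^2\text{ gradient flow):}\qquad
u_t \;=\; \varepsilon\,\Delta u \;-\; \frac{1}{\varepsilon}\,W'(u),
\\[4pt]
\text{Cahn--Hilliard (}H^{-1}\text{ gradient flow):}\qquad
u_t \;=\; \Delta\!\left(\frac{1}{\varepsilon}\,W'(u) \;-\; \varepsilon\,\Delta u\right).
```

The extra Laplacian in Cahn-Hilliard is what makes it fourth order and mass-conserving, matching the inpainting remarks above.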

My first introduction to wavelets: an impromptu tutorial by Ingrid Daubechies over lunch in the cafeteria at Bell Labs Murray Hill, c. 1987-88, when I was a PhD student in their GRPW program.

Roughly 20 years later... Then-PhD student Julia Dobrosotskaya asked me if she could work with me on a thesis combining wavelets and "UCLA"-style algorithms. The result was the wavelet Ginzburg-Landau functional, connecting L1 compressive sensing with L2-based wavelet constructions. IEEE Trans. Image Proc. 2008, Interfaces and Free Boundaries 2011, SIAM J. Image Proc. 2013. This work was the initial inspiration for our new work on nonlocal graph-based methods. Examples: inpainting; bar code deconvolution.

Weighted graphs for "big data." In a typical application we have data, possibly high dimensional, supported on the nodes of the graph, and the edge weights compare the data associated with pairs of nodes. Examples include: voting records of the US Congress, where each person has a vote vector associated with them; and nonlocal means image processing, where each pixel has a pixel neighborhood that can be compared with both nearby and faraway pixels.
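A minimal sketch of such a weight construction, using a Gaussian similarity kernel (a common choice; the scale parameter `sigma` and the dense all-pairs construction are illustrative — large graphs are sparsified, e.g. by keeping only nearest neighbors):

```python
import numpy as np

def gaussian_weights(X, sigma=1.0):
    """Weight matrix w_ij = exp(-||x_i - x_j||^2 / sigma^2) for data rows X.
    Illustrative dense construction; real 'big data' graphs keep it sparse."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    return np.exp(-d2 / sigma**2)

# three data points: two near each other, one far away
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
W = gaussian_weights(X)
```

Nearby points get weights close to 1, distant points weights close to 0, so W encodes exactly the pairwise comparisons described above.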

Graph Cuts and Total Variation. Minimal cut; maximum cut; total variation of a function f defined on the nodes of a weighted graph. Min-cut problems can be reformulated as total variation minimization problems for binary/multivalued functions defined on the nodes of the graph.
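The TV formula itself was an image on the slide; the standard definition is |f|_TV = (1/2) Σ_ij w_ij |f_i − f_j|, and for a binary 0/1 function this equals the weight of the cut between the two groups. A small numerical check of that equivalence (the example graph is made up for illustration):

```python
import numpy as np

def graph_tv(W, f):
    """Graph total variation: |f|_TV = 1/2 * sum_ij w_ij * |f_i - f_j|."""
    return 0.5 * np.sum(W * np.abs(f[:, None] - f[None, :]))

def cut_value(W, f):
    """Total weight of edges crossing between {f == 1} and {f == 0}."""
    A = f == 1
    return W[np.ix_(A, ~A)].sum()

# tiny weighted graph: nodes 0,1 strongly tied, node 2 weakly attached
W = np.array([[0.0, 2.0, 0.5],
              [2.0, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
f = np.array([1, 1, 0])  # binary labeling: cut separates node 2
```

Here both quantities equal 0.5 + 0.5 = 1.0, the weight of the two crossing edges.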

Diffuse interface methods on graphs Bertozzi and Flenner MMS 2012.

Convergence of graph GL functional van Gennip and ALB Adv. Diff. Eq. 2012

An MBO scheme on graphs for segmentation and image processing. E. Merkurjev, T. Kostic and A.L. Bertozzi, to appear, SIAM J. Imaging Sci. 2013. Instead of minimizing the GL functional, apply an MBO scheme: a simple algorithm alternating the heat equation with thresholding. MBO stands for Merriman, Bence, and Osher, who invented this scheme for differential operators a couple of decades ago.

Two-step minimization procedure based on the classical MBO scheme for motion by mean curvature (now on graphs): 1) propagation by the graph heat equation plus a forcing term; 2) thresholding. Simple, and it often converges in just a few iterations (e.g., 4 for the MNIST dataset).
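A runnable sketch of the two-step scheme on a toy semi-supervised problem. The graph, the fidelity weight `mu`, the implicit-Euler diffusion, and all step counts are illustrative choices, not parameters from the talk:

```python
import numpy as np

def graph_mbo(W, fid_idx, fid_val, mu=10.0, dt=1.0, n_diff=10, n_steps=5):
    """Sketch of graph MBO: (1) several implicit-Euler steps of the graph
    heat equation with fidelity forcing mu*(u - u_fid) on labeled nodes,
    then (2) threshold u to {0, 1}. All hyperparameters are illustrative."""
    n = W.shape[0]
    L = np.diag(W.sum(1)) - W                    # unnormalized graph Laplacian
    mu_vec = np.zeros(n); mu_vec[fid_idx] = mu   # forcing only on labeled nodes
    f = np.zeros(n); f[fid_idx] = fid_val
    A = np.eye(n) + dt * (L + np.diag(mu_vec))   # implicit Euler system matrix
    u = np.zeros(n)
    for _ in range(n_steps):
        for _ in range(n_diff):                  # step 1: diffusion + forcing
            u = np.linalg.solve(A, u + dt * mu_vec * f)
        u = (u > 0.5).astype(float)              # step 2: thresholding
    return u

# two 3-node cliques joined by one weak edge; one label known in each
W = np.zeros((6, 6))
for grp in ([0, 1, 2], [3, 4, 5]):
    for i in grp:
        for j in grp:
            if i != j:
                W[i, j] = 1.0
W[2, 3] = W[3, 2] = 0.1
labels = graph_mbo(W, fid_idx=[0, 3], fid_val=[0.0, 1.0])
```

Diffusion spreads the two known labels through their tightly connected cliques, and thresholding snaps the result to a binary partition, recovering the two communities.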

Algorithm: I) Create a graph from the data, choose a weight function, and form the symmetric graph Laplacian. II) Calculate the eigenvectors and eigenvalues of the symmetric graph Laplacian; it is only necessary to calculate a portion of the eigenvectors*. III) Initialize u. IV) Iterate the two-step scheme described above until a stopping criterion is satisfied. *Fast linear algebra routines are necessary — either the Rayleigh-Chebyshev procedure or the Nyström extension.
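Steps I-II can be sketched as follows. This uses a dense eigensolver for clarity; for the large graphs the talk targets one would use a sparse iterative solver (e.g. `scipy.sparse.linalg.eigsh`) or the Nyström extension mentioned above:

```python
import numpy as np

def sym_laplacian_eigs(W, k):
    """Symmetric normalized graph Laplacian L_s = I - D^{-1/2} W D^{-1/2}
    and its k smallest eigenpairs (dense eigh; illustrative only)."""
    d = W.sum(1)
    Dinv = np.diag(1.0 / np.sqrt(d))
    Ls = np.eye(len(d)) - Dinv @ W @ Dinv
    vals, vecs = np.linalg.eigh(Ls)      # eigenvalues in ascending order
    return vals[:k], vecs[:, :k]

# sanity check: two disconnected triangles -> two zero eigenvalues
W = np.zeros((6, 6))
for grp in ([0, 1, 2], [3, 4, 5]):
    for i in grp:
        for j in grp:
            if i != j:
                W[i, j] = 1.0
vals, vecs = sym_laplacian_eigs(W, 2)
```

The multiplicity of the zero eigenvalue equals the number of connected components, which is why the low eigenvectors carry the cluster structure the MBO iteration needs.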

Two moons segmentation. Panels: second-eigenvector segmentation; our method's segmentation.

Image segmentation. Original image 1; original image 2; hand-labeled grass region; grass label transferred.

Image segmentation. Hand-labeled cow region; cow label transferred; hand-labeled sky region; sky label transferred.

Bertozzi-Flenner (BF) vs. MBO on graphs. Panels: BF; Graph MBO.

Examples on image inpainting Original image Damaged image Local TV inpainting Nonlocal TV inpainting Our method’s result

Sparse Reconstruction Original image Damaged image Local TV inpainting Nonlocal TV inpainting Our method’s result

Performance: NLTV vs. MBO on graphs.

Convergence and Energy Landscape for Cheeger Cut Clustering. Bresson, Laurent, Uminsky, von Brecht (current and former postdocs of our group), NIPS 2012. Relaxed continuous Cheeger cut problem (unsupervised): the ratio of a TV term to a balance term. They prove convergence of two algorithms based on compressive sensing ideas, providing a rigorous connection between graph TV and cut problems.

Generalization to MULTICLASS machine learning problems (MBO). Garcia, Merkurjev, Bertozzi, Percus, Flenner, 2013. Semi-supervised learning: instead of a double well we have an N-class well with minima on a simplex in N dimensions.
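In the multiclass MBO iteration the thresholding step becomes projection of each node's soft class-assignment vector to the nearest simplex vertex, i.e. the one-hot vector of its largest entry. A minimal sketch (the array `U` is a made-up example):

```python
import numpy as np

def threshold_to_simplex(U):
    """Multiclass MBO thresholding: map each row of U (soft class scores)
    to the nearest vertex of the probability simplex (one-hot argmax)."""
    out = np.zeros_like(U)
    out[np.arange(U.shape[0]), U.argmax(1)] = 1.0
    return out

# two nodes, three classes: soft assignments after the diffusion step
U = np.array([[0.2, 0.5, 0.3],
              [0.7, 0.1, 0.2]])
```

Each row snaps to the class with the largest score, which is the N-class analogue of thresholding at 1/2 in the binary scheme.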

Multiclass examples, semi-supervised. Three moons, MBO scheme: 98.5% correct, with 5% of the ground truth used for fidelity. Grayscale image: 4% random points for fidelity, perfect classification.

MNIST database comparisons: semi-supervised learning vs. supervised learning. We do semi-supervised learning with only 3.6% of the digits as known data. Supervised learning uses 60,000 digits for training and tests on 10,000 digits.

Timing comparisons

Performance on COIL and WebKB.

Community Detection via Modularity Optimization. Joint work with Huiyi Hu, Thomas Laurent, and Mason Porter; Newman and Girvan, Phys. Rev. E 2004. [w_ij] is the graph adjacency matrix. P is the probability null model (Newman-Girvan), P_ij = k_i k_j / 2m, where k_i = sum_j w_ij is the strength of node i. Gamma is the resolution parameter, g_i is the group assignment, and 2m is the total volume of the graph: 2m = sum_i k_i = sum_ij w_ij. This is a maximization problem and is combinatorially complex — one optimizes over all possible group assignments, which is very expensive computationally.
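The objective itself (an image on the slide) is the standard resolution-parameter modularity Q = (1/2m) Σ_ij (w_ij − γ k_i k_j / 2m) δ(g_i, g_j). A direct implementation, checked on a graph whose modularity is known in closed form (two disconnected triangles split by component give Q = 1/2 at γ = 1):

```python
import numpy as np

def modularity(W, g, gamma=1.0):
    """Newman-Girvan modularity with resolution parameter gamma:
    Q = (1/2m) * sum_ij (w_ij - gamma * k_i*k_j / 2m) * delta(g_i, g_j)."""
    k = W.sum(1)                       # node strengths
    two_m = k.sum()                    # total volume 2m
    delta = g[:, None] == g[None, :]   # same-group indicator
    return ((W - gamma * np.outer(k, k) / two_m) * delta).sum() / two_m

# two disconnected triangles, partitioned by connected component
W = np.zeros((6, 6))
for grp in ([0, 1, 2], [3, 4, 5]):
    for i in grp:
        for j in grp:
            if i != j:
                W[i, j] = 1.0
g = np.array([0, 0, 0, 1, 1, 1])
```

Brute-force maximization of this quantity over all group assignments is what the slide calls combinatorially complex; the TV reformulation on the following slides is what makes it tractable.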

Bipartition of a graph. Given a subset A of the nodes of the graph, define Vol(A) = sum_{i in A} k_i. Given a binary function f on the graph taking values +1, -1, define A to be the set where f = 1; then maximizing Q is equivalent to minimizing a functional of f.

Equivalence to L1 compressive sensing. Thus modularity optimization restricted to two groups is equivalent to a total variation minimization problem, and this generalizes quite naturally to n-class optimization. Because the TV minimization problem involves functions with values on the simplex, we can directly use the MBO scheme to solve it.

Modularity optimization moons and clouds

LFR benchmark — synthetic benchmark graphs. Lancichinetti, Fortunato, and Radicchi, Phys. Rev. E 78(4) 2008. Each node is assigned a degree from a power-law distribution with exponent x; the maximum degree is kmax and the mean degree is <k>. Community sizes follow a power-law distribution with exponent beta, subject to the constraint that the sum of the community sizes equals the number of nodes N. Each node shares a fraction 1-mu of its edges with nodes in its own community and a fraction mu with nodes in other communities (mu is the mixing parameter). Minimum and maximum community sizes are also specified.

Normalized mutual information: a similarity measure for comparing two partitions, based on information entropy. NMI = 1 when the two partitions are identical, and its expected value is zero when they are independent. For an N-node network with two partitions:
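The formula on the slide is missing from the transcript; one standard form is NMI(A,B) = 2 I(A;B) / (H(A) + H(B)), with I the mutual information and H the entropy of each partition's label distribution. A small self-contained implementation (written from that standard definition, not from the slide):

```python
import numpy as np

def nmi(a, b):
    """Normalized mutual information between two partitions given as
    integer label arrays: 2*I(A;B) / (H(A) + H(B))."""
    n = len(a)
    def entropy(x):
        p = np.bincount(x) / n
        p = p[p > 0]
        return -np.sum(p * np.log(p))
    pa, pb = np.bincount(a) / n, np.bincount(b) / n
    mi = 0.0
    for i, pi in enumerate(pa):
        for j, pj in enumerate(pb):
            pij = np.mean((a == i) & (b == j))  # joint label distribution
            if pij > 0 and pi > 0 and pj > 0:
                mi += pij * np.log(pij / (pi * pj))
    return 2 * mi / (entropy(a) + entropy(b))
```

Identical partitions give NMI = 1, and partitions whose labels carry no information about each other give NMI = 0, matching the description above.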

LFR1K(1000,20,50,2,1,mu,10,50)

LFR50K: similar scaling to LFR1K; 50,000 nodes; approximately 2,000 communities. Run times for LFR1K and LFR50K.

MNIST 4-9 digit segmentation: 13,782 handwritten digits. The graph is created from a similarity score between each pair of digits — a weighted graph with 194,816 connections. Modularity MBO performs comparably to GenLouvain in about a tenth of the run time. The advantage of the MBO-based scheme will be for very large datasets with moderate numbers of clusters.

4-9 MNIST Segmentation

Conclusions and future work (new preprint): Yves van Gennip, Nestor Guillen, Braxton Osting, and Andrea L. Bertozzi, Mean curvature, threshold dynamics, and phase field theory on finite graphs, 2013. The diffuse interface formulation provides competitive algorithms for machine learning applications, including nonlocal-means imaging, and extends PDE-based methods to a graphical framework. Future work includes community detection algorithms (very computationally expensive). Speedups include fast spectral methods and the use of a small subset of eigenfunctions rather than the complete basis. The approach is competitive with, or faster than, split-Bregman and other L1-TV based methods.

Cluster group at ICERM, spring 2014: people working on the boundary between compressive sensing methods and graph/machine learning problems. February 2014 (month-long working group); a workshop to be organized; looking for more core participants.