
1 Semi-Supervised Learning in Gigantic Image Collections. Rob Fergus (NYU), Yair Weiss (Hebrew U.), Antonio Torralba (MIT).

2 What does the world look like? High-level image statistics. Object recognition for large-scale search. Gigantic image collections.

3 Spectrum of Label Information: human annotations, noisy labels, unlabeled data.

4 Semi-Supervised Learning using Graph Laplacian. V = data points, E = edges weighted by the n x n affinity matrix W. Graph Laplacian: L = D - W, where D is the diagonal degree matrix with D_ii = sum_j W_ij (the normalized form D^{-1/2} L D^{-1/2} is used later). [Zhu03, Zhou04]
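For concreteness, a minimal numpy sketch of this graph construction. The Gaussian affinity and the bandwidth `epsilon` are standard choices in graph-based SSL, assumed here rather than taken from the slide.

```python
import numpy as np

def graph_laplacian(X, epsilon=1.0):
    """Build the affinity matrix W and unnormalized Laplacian L = D - W.

    X: (n, d) array of data points; epsilon: Gaussian affinity bandwidth
    (an illustrative choice, not specified on the slide).
    """
    # Pairwise squared Euclidean distances between the n data points.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian affinity, a common choice for W in graph-based SSL.
    W = np.exp(-sq_dists / (2 * epsilon ** 2))
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))   # diagonal degree matrix
    L = D - W                    # unnormalized graph Laplacian
    return W, L
```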

5 SSL using Graph Laplacian. Want to find a label function f that minimizes J(f) = f' L f + sum_i lambda_i (f_i - y_i)^2, where y = labels and lambda = weights: lambda_i is large if point i is labeled and 0 if unlabeled. The first term enforces smoothness, the second agreement with the labels. Rewrite as J(f) = f' L f + (f - y)' Lambda (f - y) with Lambda = diag(lambda). Straightforward solution: f = (L + Lambda)^{-1} Lambda y.
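A sketch of this direct, non-scalable solution, assuming the quadratic cost above (the value of the labeled-point weight is illustrative). Solving this linear system over all n points is exactly the bottleneck the rest of the talk removes.

```python
import numpy as np

def ssl_direct(L, y, labeled_mask, lam=100.0):
    """Minimize f'Lf + sum_i lambda_i (f_i - y_i)^2 in closed form.

    y: (n,) labels (ignored where unlabeled);
    labeled_mask: (n,) boolean; lam: weight on labeled points.
    Assumes each connected component of the graph has a labeled point,
    so (L + Lambda) is invertible.
    """
    # Lambda is lam on labeled points, 0 on unlabeled points.
    Lam = np.diag(np.where(labeled_mask, lam, 0.0))
    # Setting the gradient to zero gives (L + Lambda) f = Lambda y.
    return np.linalg.solve(L + Lam, Lam @ y)
```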

6 Eigenvectors of Laplacian: smooth vectors will be linear combinations of the eigenvectors U of L with small eigenvalues, f = U alpha. [Belkin & Niyogi 06, Schoelkopf & Smola 02, Zhu et al 03, 08]

7 Rewrite System. Let U = the smallest k eigenvectors of L, alpha = coefficients, so f = U alpha, and let Sigma = diag of the corresponding eigenvalues. The optimum is now the solution to a k x k system: (Sigma + U' Lambda U) alpha = U' Lambda y.
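A sketch of this reduced solve with scipy, assuming the k x k system above. Note that it still extracts eigenvectors of the full n x n Laplacian, which is what the next slide identifies as the bottleneck; the eigenfunction construction later replaces that step.

```python
import numpy as np
from scipy.linalg import eigh

def ssl_reduced(L, y, labeled_mask, k=10, lam=100.0):
    """Solve SSL restricted to the span of the k smallest eigenvectors of L."""
    # Smallest-k eigenpairs of the Laplacian (still cubic in n here).
    sigma, U = eigh(L, subset_by_index=[0, k - 1])
    Lam = np.diag(np.where(labeled_mask, lam, 0.0))
    # k x k system: (Sigma + U' Lambda U) alpha = U' Lambda y
    A = np.diag(sigma) + U.T @ Lam @ U
    b = U.T @ (Lam @ y)
    alpha = np.linalg.solve(A, b)
    return U @ alpha   # label function f = U alpha
```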

8 Computational Bottleneck. Consider a dataset of 80 million images. Inverting L: inverting an 80 million x 80 million matrix. Finding eigenvectors of L: diagonalizing an 80 million x 80 million matrix.

9 Large Scale SSL: Related Work. Nystrom method: pick a small set of landmark points, compute the exact solution on these, then interpolate the solution to the rest. Others iteratively use classifiers to label data, e.g. the boosting-based method of Loeff et al. ICML'08. [see Zhu '08 survey]

10 Our Approach

11 Overview of Our Approach. Nystrom: reduce n by picking landmark points from the data. Ours: take the limit as n → ∞ and work directly with the data density.

12 Consider Limit as n → ∞. Consider x to be drawn from a 2D distribution p(x). Let L_p(F) be a smoothness operator on p(x), for a function F(x): L_p(F) = 1/2 ∫∫ (F(x1) - F(x2))^2 W(x1, x2) p(x1) p(x2) dx1 dx2, where W(x1, x2) = exp(-||x1 - x2||^2 / (2 epsilon^2)). Analyze the eigenfunctions of L_p(F).

13 Eigenvectors & Eigenfunctions

14 Key Assumption: Separability of Input Data. Claim: if p is separable, i.e. p(x1, x2) = p(x1) p(x2), then eigenfunctions of the marginals are also eigenfunctions of the joint density, with the same eigenvalue. [Nadler et al. 06, Weiss et al. 08]

15 Numerical Approximations to Eigenfunctions in 1D. 300k points drawn from distribution p(x); consider the marginal p(x1), approximated by a histogram h(x1). [Figure: data drawn from p(x), marginal p(x1), histogram h(x1)]

16 Numerical Approximations to Eigenfunctions in 1D. Solve for the values g of the eigenfunction at a set of discrete locations (histogram bin centers) and the associated eigenvalues. This is a B x B system (# histogram bins B = 50), built from the affinity between the discrete locations, with P = diag(h(x1)).
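A numpy/scipy sketch of this 1-D construction. The slide does not spell out the exact B x B system, so the generalized eigenproblem below, (D1 - P W P) g = sigma P D2 g with W the affinity between bin centers and P = diag(h), is an assumed reading of the construction; the precise scaling matrices may differ in the authors' implementation.

```python
import numpy as np
from scipy.linalg import eigh

def eigenfunctions_1d(x, num_bins=50, epsilon=0.2, k=3):
    """Numerically approximate 1-D eigenfunctions from a histogram of x.

    Assumed form of the B x B generalized eigenproblem (a sketch, not a
    verbatim transcription of the slide):
        (D1 - P W P) g = sigma * P D2 g
    with W the affinity between bin centers and P = diag(h).
    """
    h, edges = np.histogram(x, bins=num_bins, density=True)
    h = np.maximum(h, 1e-8)                 # floor empty bins so P stays invertible
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Affinity between the discrete locations (histogram bin centers).
    W = np.exp(-(centers[:, None] - centers[None, :]) ** 2 / (2 * epsilon ** 2))
    P = np.diag(h)
    D1 = np.diag((P @ W @ P).sum(axis=1))
    D2 = np.diag((P @ W).sum(axis=1))
    # Generalized eigenproblem; the smallest eigenvalues give the smoothest g.
    sigma, g = eigh(D1 - P @ W @ P, P @ D2)
    return centers, g[:, :k], sigma[:k]
```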

17 1D Approximate Eigenfunctions. [Figure: 1st, 2nd, and 3rd eigenfunctions of h(x1)]

18 Separability over Dimension. Build a histogram over dimension 2: h(x2). Now solve for the eigenfunctions of h(x2). [Figure: data, and the 1st, 2nd, and 3rd eigenfunctions of h(x2)]

19 From Eigenfunctions to Approximate Eigenvectors. Take each data point and do a 1-D interpolation in each eigenfunction → a k-dimensional vector (for k eigenfunctions). Very fast operation (has to be done nk times). [Figure: eigenfunction value vs. histogram bin]
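The interpolation step as a short numpy sketch: for each retained eigenfunction, every data point's value in the corresponding input dimension is looked up by 1-D interpolation against that eigenfunction's bin centers. The tuple layout of `eigenfunctions` is an illustrative convention, not prescribed by the slide.

```python
import numpy as np

def approx_eigenvectors(X, eigenfunctions):
    """Interpolate n data points into k approximate eigenvectors.

    eigenfunctions: list of (dim, centers, g) tuples, where `dim` is the
    input dimension the 1-D eigenfunction g was computed on and `centers`
    are its histogram bin centers (names are illustrative).
    """
    U = np.empty((X.shape[0], len(eigenfunctions)))
    for j, (dim, centers, g) in enumerate(eigenfunctions):
        # 1-D linear interpolation, done once per eigenfunction.
        U[:, j] = np.interp(X[:, dim], centers, g)
    return U
```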

20 Preprocessing. Need to make the data separable: rotate using PCA. [Figure: data that is not separable becomes separable after rotation]

21 Overall Algorithm
1. Rotate data to maximize separability (currently use PCA).
2. For each dimension: construct a 1D histogram, then solve numerically for eigenfunctions/eigenvalues.
3. Order eigenfunctions from all dimensions by increasing eigenvalue and take the first k.
4. Interpolate data into the k eigenfunctions, yielding approximate eigenvectors of the normalized Laplacian.
5. Solve the k x k least-squares system to give the label function.
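Putting the steps together, a high-level sketch of this pipeline. It reuses the illustrative helpers sketched above (`eigenfunctions_1d`, `approx_eigenvectors`); default parameter values and the decision to drop the trivial constant eigenfunction per dimension are assumptions, not details given on the slide.

```python
import numpy as np

def eigenfunction_ssl(X, y, labeled_mask, k=48, num_bins=50, lam=100.0):
    """End-to-end sketch of the algorithm on slide 21 (illustrative only)."""
    # 1. Rotate data so the dimensions are approximately independent (PCA).
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Xr = Xc @ Vt.T
    # 2-3. 1-D eigenfunctions per dimension, pooled and sorted by eigenvalue.
    cands = []
    for d in range(Xr.shape[1]):
        centers, G, sig = eigenfunctions_1d(Xr[:, d], num_bins=num_bins)
        # Skip the near-constant first eigenfunction (eigenvalue ~0).
        for j in range(1, G.shape[1]):
            cands.append((sig[j], d, centers, G[:, j]))
    cands.sort(key=lambda c: c[0])
    top = cands[:k]
    # 4. Interpolate data into the k chosen eigenfunctions -> approx. U.
    U = approx_eigenvectors(Xr, [(d, c, g) for _, d, c, g in top])
    sigma = np.array([s for s, _, _, _ in top])
    # 5. Solve the k x k system (Sigma + U' Lambda U) alpha = U' Lambda y.
    Lam = np.diag(np.where(labeled_mask, lam, 0.0))
    A = np.diag(sigma) + U.T @ Lam @ U
    alpha = np.linalg.solve(A, U.T @ (Lam @ y))
    return U @ alpha
```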

22 Experiments on Toy Data

23 Comparison of Approaches

24 Data

25 Nystrom Comparison. Too few landmark points result in highly unstable eigenvectors.

26 Nystrom Comparison Eigenfunctions fail when data has significant dependencies between dimensions

27 Experiments on Real Data

28 Experiments. Images from 126 classes downloaded from Internet search engines, 63,000 images in total (e.g. "dump truck", "emu"). Labels (correct/incorrect) provided by Geoff Hinton, Alex Krizhevsky, and Vinod Nair (U. Toronto and CIFAR).

29 Input Image Representation. Pixels are not a convenient representation. Use the Gist descriptor (Oliva & Torralba, 2001), PCA'd down to 64 dimensions. L2 distance between Gist vectors is a rough substitute for human perceptual distance.
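As a brief sketch of this preprocessing, assuming Gist descriptors have already been computed elsewhere (the random `gist_features` array below is only a stand-in for the real 384-D descriptors):

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder for the (n, 384) Gist descriptor matrix; in the experiments
# these come from Oliva & Torralba's Gist code, not computed here.
gist_features = np.random.randn(10000, 384)

# Project down to 64 dimensions before building eigenfunctions.
pca = PCA(n_components=64)
X64 = pca.fit_transform(gist_features)

# L2 distance between reduced Gist vectors: a rough proxy for perceptual
# distance between the corresponding images.
d01 = np.linalg.norm(X64[0] - X64[1])
```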

30 Are Dimensions Independent? [Figure: joint histograms for pairs of dimensions from the raw 384-dimensional Gist, and for pairs of dimensions after PCA. MI is the mutual information score; 0 = independent.]
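A small sketch of the independence check behind this figure: estimate the mutual information of a pair of dimensions from their joint histogram. The bin count and the specific pair compared are illustrative choices.

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Estimate MI (in nats) between two 1-D variables via a joint histogram."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()                  # joint distribution
    px = pxy.sum(axis=1, keepdims=True)        # marginal over a
    py = pxy.sum(axis=0, keepdims=True)        # marginal over b
    nz = pxy > 0                               # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

# e.g. MI between the first two PCA'd Gist dimensions:
# mutual_information(X64[:, 0], X64[:, 1])
```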

31 Real 1-D Eigenfunctions of PCA'd Gist Descriptors. [Figure: eigenfunctions 1 through 256, plotted as eigenfunction value vs. histogram bin; color indicates the input dimension, from x_min to x_max.]

32 Protocol. Task is to re-rank the images of each class. Measure precision @ 15% recall. Vary the # of labeled examples. Chance-level performance is 33%. Total of 63,000 images.

33

34

35

36

37 80 Million Images

38 Running on 80 million images. PCA to 32 dims, k = 48 eigenfunctions. Precompute the approximate eigenvectors (~20 GB). For each class, labels propagate through all 80 million images.

39

40 Summary. A semi-supervised scheme that can scale to really large problems. Rather than sub-sampling the data, we take the limit of infinite unlabeled data. Assumes the input data distribution is separable. Can propagate labels in a graph with 80 million nodes in fractions of a second.

41

42 Future Work. Can potentially use 2D or 3D histograms instead of 1D (requires more data). Consider diagonal eigenfunctions. Sharing of labels between classes.

43 Are Dimensions Independent? [Figure: joint histograms for pairs of dimensions from the raw 384-dimensional Gist, and for pairs of dimensions after PCA. MI is the mutual information score; 0 = independent.]

44 Are Dimensions Independent? [Figure: joint histograms for pairs of dimensions from the raw 384-dimensional Gist, and for pairs of dimensions after ICA. MI is the mutual information score; 0 = independent.]

45

46 Overview of Our Approach. Existing large-scale SSL methods try to reduce the # of points. We consider what happens as n → ∞: eigenvectors → eigenfunctions. Assume the input distribution is separable. Make a crude numerical approximation to the eigenfunctions. Interpolate the data in these approximate eigenfunctions to give approximate eigenvectors.

47 Eigenfunctions. Eigenfunctions are the limit of eigenvectors as n → ∞. Analytical forms of eigenfunctions exist only in a few cases (uniform, Gaussian distributions). Instead, we calculate a numerical approximation to the eigenfunctions. [Nadler et al. 06, Weiss et al. 08] [Coifman et al. 05, Nadler et al. 06, Belkin & Niyogi 07]

48 Complexity Comparison
Nystrom (polynomial in # landmarks): select m landmark points; get the smallest k eigenvectors of an m x m system; interpolate n points into the k eigenvectors; solve a k x k linear system.
Eigenfunction (linear in # data points): rotate n points; form d 1-D histograms; solve d linear systems, each b x b; do k 1-D interpolations of n points; solve a k x k linear system.
Key: n = # data points (big, >10^6); l = # labeled points (small, <100); m = # landmark points; d = # input dims (~100); k = # eigenvectors (~100); b = # histogram bins (~50).

49 Key Assumption: Separability of Input Data. Can't build accurate high-dimensional histograms (need too many points). Currently just use 1-D histograms; 2D or 3D ones are possible with enough data. This assumes the distribution is separable: p(x) = p(x1) p(x2) ... p(xd). For separable distributions, the eigenfunctions are also separable. [Nadler et al. 06, Weiss et al. 08]

50 Varying # Training Examples

