
1 Semi-Supervised Learning in Gigantic Image Collections. Rob Fergus (NYU), Yair Weiss (Hebrew U.), Antonio Torralba (MIT).

2 What does the world look like? High-level image statistics. Object recognition for large-scale search. Gigantic image collections.

3 Spectrum of Label Information: human annotations, noisy labels, unlabeled data.

4 Semi-Supervised Learning using Graph Laplacian. V = data points, E = edges weighted by the n x n affinity matrix W. Graph Laplacian: L = D - W, where D is the diagonal degree matrix with D_ii = sum_j W_ij (the normalized form D^{-1/2} L D^{-1/2} is used later). [Zhu03, Zhou04]
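For concreteness, a minimal numpy sketch of this graph construction. The Gaussian affinity and the bandwidth `epsilon` are standard choices in graph-based SSL, assumed here rather than taken from the slide.

```python
import numpy as np

def graph_laplacian(X, epsilon=1.0):
    """Build the affinity matrix W and unnormalized Laplacian L = D - W.

    X: (n, d) array of data points; epsilon: Gaussian affinity bandwidth
    (an illustrative choice, not specified on the slide).
    """
    # Pairwise squared Euclidean distances between the n data points.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian affinity, a common choice for W in graph-based SSL.
    W = np.exp(-sq_dists / (2 * epsilon ** 2))
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))   # diagonal degree matrix
    L = D - W                    # unnormalized graph Laplacian
    return W, L
```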

5 SSL using Graph Laplacian. Want to find a label function f that minimizes J(f) = f' L f + sum_i lambda_i (f_i - y_i)^2, where y = labels and lambda = weights: lambda_i is large if point i is labeled and 0 if unlabeled. The first term enforces smoothness, the second agreement with the labels. Rewrite as J(f) = f' L f + (f - y)' Lambda (f - y) with Lambda = diag(lambda). Straightforward solution: f = (L + Lambda)^{-1} Lambda y.
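A sketch of this direct, non-scalable solution, assuming the quadratic cost above (the value of the labeled-point weight is illustrative). Solving this linear system over all n points is exactly the bottleneck the rest of the talk removes.

```python
import numpy as np

def ssl_direct(L, y, labeled_mask, lam=100.0):
    """Minimize f'Lf + sum_i lambda_i (f_i - y_i)^2 in closed form.

    y: (n,) labels (ignored where unlabeled);
    labeled_mask: (n,) boolean; lam: weight on labeled points.
    Assumes each connected component of the graph has a labeled point,
    so (L + Lambda) is invertible.
    """
    # Lambda is lam on labeled points, 0 on unlabeled points.
    Lam = np.diag(np.where(labeled_mask, lam, 0.0))
    # Setting the gradient to zero gives (L + Lambda) f = Lambda y.
    return np.linalg.solve(L + Lam, Lam @ y)
```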

6 Eigenvectors of Laplacian: smooth vectors will be linear combinations of the eigenvectors U of L with small eigenvalues, f = U alpha. [Belkin & Niyogi 06, Schoelkopf & Smola 02, Zhu et al 03, 08]

7 Rewrite System. Let U = the smallest k eigenvectors of L, alpha = coefficients, so f = U alpha, and let Sigma = diag of the corresponding eigenvalues. The optimum is now the solution to a k x k system: (Sigma + U' Lambda U) alpha = U' Lambda y.
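A sketch of this reduced solve with scipy, assuming the k x k system above. Note that it still extracts eigenvectors of the full n x n Laplacian, which is what the next slide identifies as the bottleneck; the eigenfunction construction later replaces that step.

```python
import numpy as np
from scipy.linalg import eigh

def ssl_reduced(L, y, labeled_mask, k=10, lam=100.0):
    """Solve SSL restricted to the span of the k smallest eigenvectors of L."""
    # Smallest-k eigenpairs of the Laplacian (still cubic in n here).
    sigma, U = eigh(L, subset_by_index=[0, k - 1])
    Lam = np.diag(np.where(labeled_mask, lam, 0.0))
    # k x k system: (Sigma + U' Lambda U) alpha = U' Lambda y
    A = np.diag(sigma) + U.T @ Lam @ U
    b = U.T @ (Lam @ y)
    alpha = np.linalg.solve(A, b)
    return U @ alpha   # label function f = U alpha
```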

8 Computational Bottleneck. Consider a dataset of 80 million images. Inverting L: inverting an 80 million x 80 million matrix. Finding eigenvectors of L: diagonalizing an 80 million x 80 million matrix.

9 Large Scale SSL: Related Work. Nystrom method: pick a small set of landmark points, compute the exact solution on these, then interpolate the solution to the rest. Others iteratively use classifiers to label data, e.g. the boosting-based method of Loeff et al. ICML'08. [see Zhu '08 survey]

10 Our Approach

11 Overview of Our Approach. Nystrom: reduce n by picking landmark points from the data. Ours: take the limit as n → ∞ and work directly with the data density.

12 Consider Limit as n → ∞. Consider x to be drawn from a 2D distribution p(x). Let L_p(F) be a smoothness operator on p(x), for a function F(x): L_p(F) = 1/2 ∫∫ (F(x1) - F(x2))^2 W(x1, x2) p(x1) p(x2) dx1 dx2, where W(x1, x2) = exp(-||x1 - x2||^2 / (2 epsilon^2)). Analyze the eigenfunctions of L_p(F).

13 Eigenvectors & Eigenfunctions

14 Key Assumption: Separability of Input Data. Claim: if p is separable, i.e. p(x1, x2) = p(x1) p(x2), then eigenfunctions of the marginals are also eigenfunctions of the joint density, with the same eigenvalue. [Nadler et al. 06, Weiss et al. 08]

15 Numerical Approximations to Eigenfunctions in 1D. 300k points drawn from distribution p(x); consider the marginal p(x1), approximated by a histogram h(x1). [Figure: data drawn from p(x), marginal p(x1), histogram h(x1)]

16 Numerical Approximations to Eigenfunctions in 1D. Solve for the values g of the eigenfunction at a set of discrete locations (histogram bin centers) and the associated eigenvalues. This is a B x B system (# histogram bins B = 50), built from the affinity between the discrete locations, with P = diag(h(x1)).
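A numpy/scipy sketch of this 1-D construction. The slide does not spell out the exact B x B system, so the generalized eigenproblem below, (D1 - P W P) g = sigma P D2 g with W the affinity between bin centers and P = diag(h), is an assumed reading of the construction; the precise scaling matrices may differ in the authors' implementation.

```python
import numpy as np
from scipy.linalg import eigh

def eigenfunctions_1d(x, num_bins=50, epsilon=0.2, k=3):
    """Numerically approximate 1-D eigenfunctions from a histogram of x.

    Assumed form of the B x B generalized eigenproblem (a sketch, not a
    verbatim transcription of the slide):
        (D1 - P W P) g = sigma * P D2 g
    with W the affinity between bin centers and P = diag(h).
    """
    h, edges = np.histogram(x, bins=num_bins, density=True)
    h = np.maximum(h, 1e-8)                 # floor empty bins so P stays invertible
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Affinity between the discrete locations (histogram bin centers).
    W = np.exp(-(centers[:, None] - centers[None, :]) ** 2 / (2 * epsilon ** 2))
    P = np.diag(h)
    D1 = np.diag((P @ W @ P).sum(axis=1))
    D2 = np.diag((P @ W).sum(axis=1))
    # Generalized eigenproblem; the smallest eigenvalues give the smoothest g.
    sigma, g = eigh(D1 - P @ W @ P, P @ D2)
    return centers, g[:, :k], sigma[:k]
```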

17 1D Approximate Eigenfunctions. [Figure: 1st, 2nd, and 3rd eigenfunctions of h(x1)]

18 Separability over Dimension. Build a histogram over dimension 2: h(x2). Now solve for the eigenfunctions of h(x2). [Figure: data, and the 1st, 2nd, and 3rd eigenfunctions of h(x2)]

19 From Eigenfunctions to Approximate Eigenvectors. Take each data point and do a 1-D interpolation in each eigenfunction → a k-dimensional vector (for k eigenfunctions). Very fast operation (has to be done nk times). [Figure: eigenfunction value vs. histogram bin]
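The interpolation step as a short numpy sketch: for each retained eigenfunction, every data point's value in the corresponding input dimension is looked up by 1-D interpolation against that eigenfunction's bin centers. The tuple layout of `eigenfunctions` is an illustrative convention, not prescribed by the slide.

```python
import numpy as np

def approx_eigenvectors(X, eigenfunctions):
    """Interpolate n data points into k approximate eigenvectors.

    eigenfunctions: list of (dim, centers, g) tuples, where `dim` is the
    input dimension the 1-D eigenfunction g was computed on and `centers`
    are its histogram bin centers (names are illustrative).
    """
    U = np.empty((X.shape[0], len(eigenfunctions)))
    for j, (dim, centers, g) in enumerate(eigenfunctions):
        # 1-D linear interpolation, done once per eigenfunction.
        U[:, j] = np.interp(X[:, dim], centers, g)
    return U
```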

20 Preprocessing. Need to make the data separable: rotate using PCA. [Figure: data that is not separable becomes separable after rotation]

21 Overall Algorithm
1. Rotate data to maximize separability (currently use PCA).
2. For each dimension: construct a 1D histogram, then solve numerically for eigenfunctions/eigenvalues.
3. Order eigenfunctions from all dimensions by increasing eigenvalue and take the first k.
4. Interpolate data into the k eigenfunctions, yielding approximate eigenvectors of the normalized Laplacian.
5. Solve the k x k least-squares system to give the label function.
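Putting the steps together, a high-level sketch of this pipeline. It reuses the illustrative helpers sketched above (`eigenfunctions_1d`, `approx_eigenvectors`); default parameter values and the decision to drop the trivial constant eigenfunction per dimension are assumptions, not details given on the slide.

```python
import numpy as np

def eigenfunction_ssl(X, y, labeled_mask, k=48, num_bins=50, lam=100.0):
    """End-to-end sketch of the algorithm on slide 21 (illustrative only)."""
    # 1. Rotate data so the dimensions are approximately independent (PCA).
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Xr = Xc @ Vt.T
    # 2-3. 1-D eigenfunctions per dimension, pooled and sorted by eigenvalue.
    cands = []
    for d in range(Xr.shape[1]):
        centers, G, sig = eigenfunctions_1d(Xr[:, d], num_bins=num_bins)
        # Skip the near-constant first eigenfunction (eigenvalue ~0).
        for j in range(1, G.shape[1]):
            cands.append((sig[j], d, centers, G[:, j]))
    cands.sort(key=lambda c: c[0])
    top = cands[:k]
    # 4. Interpolate data into the k chosen eigenfunctions -> approx. U.
    U = approx_eigenvectors(Xr, [(d, c, g) for _, d, c, g in top])
    sigma = np.array([s for s, _, _, _ in top])
    # 5. Solve the k x k system (Sigma + U' Lambda U) alpha = U' Lambda y.
    Lam = np.diag(np.where(labeled_mask, lam, 0.0))
    A = np.diag(sigma) + U.T @ Lam @ U
    alpha = np.linalg.solve(A, U.T @ (Lam @ y))
    return U @ alpha
```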

22 Experiments on Toy Data

23 Comparison of Approaches

24 Data

25 Nystrom Comparison. Too few landmark points result in highly unstable eigenvectors.

26 Nystrom Comparison Eigenfunctions fail when data has significant dependencies between dimensions

27 Experiments on Real Data

28 Experiments. Images from 126 classes downloaded from Internet search engines, 63,000 images in total (e.g. "dump truck", "emu"). Labels (correct/incorrect) provided by Geoff Hinton, Alex Krizhevsky, and Vinod Nair (U. Toronto and CIFAR).

29 Input Image Representation. Pixels are not a convenient representation. Use the Gist descriptor (Oliva & Torralba, 2001), PCA'd down to 64 dimensions. L2 distance between Gist vectors is a rough substitute for human perceptual distance.
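As a brief sketch of this preprocessing, assuming Gist descriptors have already been computed elsewhere (the random `gist_features` array below is only a stand-in for the real 384-D descriptors):

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder for the (n, 384) Gist descriptor matrix; in the experiments
# these come from Oliva & Torralba's Gist code, not computed here.
gist_features = np.random.randn(10000, 384)

# Project down to 64 dimensions before building eigenfunctions.
pca = PCA(n_components=64)
X64 = pca.fit_transform(gist_features)

# L2 distance between reduced Gist vectors: a rough proxy for perceptual
# distance between the corresponding images.
d01 = np.linalg.norm(X64[0] - X64[1])
```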

30 Are Dimensions Independent? [Figure: joint histograms for pairs of dimensions from the raw 384-dimensional Gist, and for pairs of dimensions after PCA. MI is the mutual information score; 0 = independent.]
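A small sketch of the independence check behind this figure: estimate the mutual information of a pair of dimensions from their joint histogram. The bin count and the specific pair compared are illustrative choices.

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Estimate MI (in nats) between two 1-D variables via a joint histogram."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()                  # joint distribution
    px = pxy.sum(axis=1, keepdims=True)        # marginal over a
    py = pxy.sum(axis=0, keepdims=True)        # marginal over b
    nz = pxy > 0                               # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

# e.g. MI between the first two PCA'd Gist dimensions:
# mutual_information(X64[:, 0], X64[:, 1])
```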

31 Real 1-D Eigenfunctions of PCA'd Gist Descriptors. [Figure: eigenfunctions 1 through 256, plotted as eigenfunction value vs. histogram bin; color indicates the input dimension, from x_min to x_max.]

32 Protocol. Task is to re-rank the images of each class. Measure precision @ 15% recall. Vary the # of labeled examples. Chance-level performance is 33%. Total of 63,000 images.

33

34

35

36

37 80 Million Images

38 Running on 80 million images. PCA to 32 dims, k = 48 eigenfunctions. Precompute the approximate eigenvectors (~20 GB). For each class, labels propagate through all 80 million images.

39

40 Summary. A semi-supervised scheme that can scale to really large problems. Rather than sub-sampling the data, we take the limit of infinite unlabeled data. Assumes the input data distribution is separable. Can propagate labels in a graph with 80 million nodes in fractions of a second.

41

42 Future Work. Can potentially use 2D or 3D histograms instead of 1D (requires more data). Consider diagonal eigenfunctions. Sharing of labels between classes.

43 Are Dimensions Independent? [Figure: joint histograms for pairs of dimensions from the raw 384-dimensional Gist, and for pairs of dimensions after PCA. MI is the mutual information score; 0 = independent.]

44 Are Dimensions Independent? [Figure: joint histograms for pairs of dimensions from the raw 384-dimensional Gist, and for pairs of dimensions after ICA. MI is the mutual information score; 0 = independent.]

45

46 Overview of Our Approach. Existing large-scale SSL methods try to reduce the # of points. We consider what happens as n → ∞: eigenvectors → eigenfunctions. Assume the input distribution is separable. Make a crude numerical approximation to the eigenfunctions. Interpolate the data in these approximate eigenfunctions to give approximate eigenvectors.

47 Eigenfunctions. Eigenfunctions are the limit of eigenvectors as n → ∞. Analytical forms of eigenfunctions exist only in a few cases (uniform, Gaussian distributions). Instead, we calculate a numerical approximation to the eigenfunctions. [Nadler et al. 06, Weiss et al. 08] [Coifman et al. 05, Nadler et al. 06, Belkin & Niyogi 07]

48 Complexity Comparison
Nystrom (polynomial in # landmarks): select m landmark points; get the smallest k eigenvectors of an m x m system; interpolate n points into the k eigenvectors; solve a k x k linear system.
Eigenfunction (linear in # data points): rotate n points; form d 1-D histograms; solve d linear systems, each b x b; do k 1-D interpolations of n points; solve a k x k linear system.
Key: n = # data points (big, >10^6); l = # labeled points (small, <100); m = # landmark points; d = # input dims (~100); k = # eigenvectors (~100); b = # histogram bins (~50).

49 Key Assumption: Separability of Input Data. Can't build accurate high-dimensional histograms (need too many points). Currently just use 1-D histograms; 2D or 3D ones are possible with enough data. This assumes the distribution is separable: p(x) = p(x1) p(x2) ... p(xd). For separable distributions, the eigenfunctions are also separable. [Nadler et al. 06, Weiss et al. 08]

50 Varying # Training Examples

