Published by Gavin Newman. Modified over 3 years ago.

1
Semi-Supervised Learning in Gigantic Image Collections. Rob Fergus (New York University), Yair Weiss (Hebrew University), Antonio Torralba (MIT)

2
Gigantic Image Collections: What does the world look like? High-level image statistics. Object recognition for large-scale image search.

3
Spectrum of Label Information: Human annotations, Noisy labels, Unlabeled

4
Semi-Supervised Learning: Classification function should be smooth with respect to data density. [Figure panels: Data, Supervised, Semi-Supervised]

5
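Semi-Supervised Learning using Graph Laplacian: $W$ is the n x n affinity matrix (n = # of points), $W_{ij} = \exp(-\|x_i - x_j\|^2 / 2\epsilon^2)$. Graph Laplacian: $L = D^{-1/2}(D - W)D^{-1/2} = I - D^{-1/2} W D^{-1/2}$, with $D_{ii} = \sum_j W_{ij}$ [Zhu03, Zhou04]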
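
A minimal sketch of this construction, assuming a Gaussian affinity with bandwidth `eps` and the normalized Laplacian $L = I - D^{-1/2} W D^{-1/2}$. The function name, the toy data and the `eps` value are illustrative and not taken from the slides.

```python
import numpy as np

def normalized_laplacian(X, eps=1.0):
    """Return affinity matrix W and normalized graph Laplacian L for points X (n x d)."""
    # Squared Euclidean distances between all pairs of points.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * eps**2))       # Gaussian affinity (assumed kernel)
    deg = W.sum(axis=1)                     # D_ii = sum_j W_ij
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return W, L

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))               # toy data, not from the slides
W, L = normalized_laplacian(X, eps=0.5)
eigvals = np.linalg.eigvalsh(L)             # ascending; all lie in [0, 2]
```

Both matrices are dense n x n arrays, so this direct construction is only practical for small n.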

6
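SSL using Graph Laplacian: Want to find label function $f$ that minimizes $f^T L f + \sum_{i=1}^{n} \lambda_i (f(i) - y_i)^2$ (smoothness + agreement with labels), where y = labels and $\lambda_i = \lambda$ if point i is labeled, otherwise $\lambda_i = 0$. Solution: $(L + \Lambda) f = \Lambda y$ with $\Lambda = \mathrm{diag}(\lambda_i)$, an n x n system (n = # points).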
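
As a rough illustration, the minimizer of a smoothness term plus a label-agreement term can be found by solving one n x n linear system. The sketch below assumes the objective $f^T L f + (f - y)^T \Lambda (f - y)$ with diagonal label weights $\Lambda$, whose solution satisfies $(L + \Lambda) f = \Lambda y$. The toy clusters, kernel width and weight value are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated 2-D clusters of 50 points each (toy data, assumed).
X = np.vstack([rng.normal(-2.0, 0.3, size=(50, 2)),
               rng.normal(+2.0, 0.3, size=(50, 2))])
n = len(X)

# Gaussian affinity and normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / (2.0 * 0.5**2))
s = 1.0 / np.sqrt(W.sum(axis=1))
L = np.eye(n) - s[:, None] * W * s[None, :]

# One labeled point per cluster: y = -1 / +1, zero where unlabeled.
y = np.zeros(n)
lam = np.zeros(n)
y[0], y[50] = -1.0, +1.0
lam[0] = lam[50] = 100.0                   # label weight (assumed value)

# Setting the gradient of the objective to zero gives (L + Lambda) f = Lambda y.
f = np.linalg.solve(L + np.diag(lam), lam * y)
pred = np.sign(f)                          # predicted class for every point
```

Two labels are enough here because the smoothness term spreads each label across its own cluster.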

7
Eigenvectors of Laplacian: Smooth vectors will be linear combinations of eigenvectors U with small eigenvalues: $f = U\alpha$, $U = [\phi_1, \ldots, \phi_k]$ [Belkin & Niyogi 06, Schoelkopf & Smola 02, Zhu et al 03, 08]

8
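Rewrite System: Let U = smallest k eigenvectors of L, $\alpha$ = coeffs, so that $f = U\alpha$. k is user parameter (typically ~100). Optimal $\alpha$ is now solution to k x k system: $(\Sigma + U^T \Lambda U)\alpha = U^T \Lambda y$, where $\Sigma$ is the diagonal matrix of the k smallest eigenvalues and $\Lambda$ is the diagonal matrix of label weights.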
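
A small sketch of the reduced problem, assuming it takes the form $(\Sigma + U^T \Lambda U)\alpha = U^T \Lambda y$ with $f = U\alpha$, where $\Sigma$ holds the eigenvalues that belong to the columns of U. The helper name `solve_reduced` and the toy data are hypothetical. Setting k = n is used only as a sanity check that the reduced system reproduces the full n x n solution.

```python
import numpy as np

def solve_reduced(U, sigma, y, lam):
    """Solve (Sigma + U^T Lambda U) alpha = U^T Lambda y; return f = U alpha and alpha."""
    A = np.diag(sigma) + U.T @ (lam[:, None] * U)   # k x k matrix
    b = U.T @ (lam * y)
    alpha = np.linalg.solve(A, b)
    return U @ alpha, alpha

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))                         # toy data (assumed)
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / 2.0)                                # Gaussian affinity, eps = 1
s = 1.0 / np.sqrt(W.sum(axis=1))
L = np.eye(60) - s[:, None] * W * s[None, :]
sigma_all, U_all = np.linalg.eigh(L)                 # eigenvalues ascending

y = np.zeros(60)
lam = np.zeros(60)
y[0], y[1] = 1.0, -1.0
lam[0] = lam[1] = 100.0                              # label weight (assumed)

f_full = np.linalg.solve(L + np.diag(lam), lam * y)  # full n x n system
f_all, _ = solve_reduced(U_all, sigma_all, y, lam)   # k = n: same solution
f_k, alpha = solve_reduced(U_all[:, :5], sigma_all[:5], y, lam)  # k = 5
```

With k much smaller than n the solve costs only O(k^3), but the eigenvectors U must still be obtained first.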

9
Computational Bottleneck: Consider a dataset of 80 million images. Inverting L means inverting an 80 million x 80 million matrix. Finding eigenvectors of L means diagonalizing an 80 million x 80 million matrix.

10
Large Scale SSL - Related work: Nystrom method: pick small set of landmark points (Data → Landmarks), compute exact eigenvectors on these, interpolate solution to rest. Other approaches include: Mixture models (Zhu and Lafferty 05), Sparse Grids (Garcke and Griebel 05), Sparse Graphs (Tsang and Kwok 06) [see Zhu 08 survey]

11
Our Approach

12
Overview of Our Approach: Compute approximate eigenvectors. Nystrom: Data → Landmarks (reduce n); polynomial in number of landmarks. Ours: Data → Density (limit as n → ∞); linear in number of data-points.

13
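Consider Limit as n → ∞: Consider x to be drawn from 2D distribution p(x). Let $L_p(F)$ be a smoothness operator on p(x), for a function F(x): $L_p(F) = \frac{1}{2} \int (F(x_1) - F(x_2))^2 \, W(x_1, x_2) \, p(x_1) p(x_2) \, dx_1 dx_2$, where $W(x_1, x_2) = \exp(-\|x_1 - x_2\|^2 / 2\epsilon^2)$. Smoothness operator penalizes functions that vary in areas of high density. Analyze eigenfunctions of $L_p(F)$.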

14
Eigenvectors & Eigenfunctions

15
Key Assumption: Separability of Input data. Claim: If p is separable, then eigenfunctions of marginals are also eigenfunctions of the joint density, with same eigenvalue. [Figure panels: p(x_1, x_2), p(x_1), p(x_2)] [Nadler et al. 06, Weiss et al. 08]

16
Numerical Approximations to Eigenfunctions in 1D: 300,000 points drawn from distribution p(x). Consider the marginal p(x_1). [Figure panels: Data p(x); marginal p(x_1); histogram h(x_1)]

17
Numerical Approximations to Eigenfunctions in 1D: Solve for values of eigenfunction at set of discrete locations (histogram bin centers), and associated eigenvalues. B x B system (B = # histogram bins, e.g. 50).
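
One way to set up such a B x B system is sketched below. It assumes the generalized eigenproblem $(\tilde{D} - P\tilde{W}P) g = \sigma P \hat{D} g$, where $P = \mathrm{diag}(h)$, $\tilde{W}$ is a Gaussian affinity between bin centers, and $\tilde{D}$, $\hat{D}$ hold the column sums of $P\tilde{W}P$ and $P\tilde{W}$. The histogram floor, the kernel width `eps` and the bimodal toy data are assumptions made for this example.

```python
import numpy as np

def eigenfunctions_1d(x, n_bins=50, eps=0.3, n_funcs=3):
    """Approximate eigenfunctions of one data dimension from its histogram."""
    counts, edges = np.histogram(x, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Density at bin centers; the small floor (an assumption) keeps P invertible.
    h = counts / counts.sum() + 1e-3 / n_bins
    # Affinity between discrete locations (Gaussian kernel assumed).
    W = np.exp(-(centers[:, None] - centers[None, :]) ** 2 / (2.0 * eps**2))
    PWP = h[:, None] * W * h[None, :]
    D_tilde = PWP.sum(axis=0)                    # column sums of P W P
    PD_hat = h * (h[:, None] * W).sum(axis=0)    # diagonal of P * D_hat
    A = np.diag(D_tilde) - PWP
    # Generalized problem A g = sigma * diag(PD_hat) g, reduced to a standard
    # symmetric one through the substitution g = diag(PD_hat)^{-1/2} v.
    s = 1.0 / np.sqrt(PD_hat)
    sigma, V = np.linalg.eigh(s[:, None] * A * s[None, :])
    g = s[:, None] * V
    return centers, sigma[:n_funcs], g[:, :n_funcs]

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-1.0, 0.3, 5000), rng.normal(1.0, 0.3, 5000)])
centers, sigma, g = eigenfunctions_1d(x)
```

In this sketch the first eigenfunction is constant with eigenvalue near zero, and the second takes opposite signs on the two modes of the toy density.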

18
1D Approximate Eigenfunctions: [Figure panels: 1st, 2nd and 3rd eigenfunctions of h(x_1)]

19
Separability over Dimension: Build histogram over dimension 2: h(x_2). Now solve for eigenfunctions of h(x_2). [Figure panels: 1st, 2nd and 3rd eigenfunctions of h(x_2)]

20
From Eigenfunctions to Approximate Eigenvectors: Take each data point. Do 1-D interpolation in each eigenfunction. Very fast operation. [Figure: eigenfunction value vs. histogram bin (1 to 50)]
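
The interpolation step can be illustrated with NumPy's `np.interp`, assuming linear interpolation between bin centers (the slide does not say which scheme was used). The eigenfunction values here are a stand-in, not real output.

```python
import numpy as np

def interpolate_eigenfunction(x, centers, g):
    """Linearly interpolate eigenfunction values g (given at bin centers) to data x."""
    # np.interp clamps points outside [centers[0], centers[-1]] to the end values.
    return np.interp(x, centers, g)

centers = np.linspace(0.0, 1.0, 5)     # bin centers 0, 0.25, 0.5, 0.75, 1
g = centers ** 2                        # stand-in eigenfunction values (hypothetical)
x = np.array([0.125, 0.5, 2.0])         # coordinates of three data points
u = interpolate_eigenfunction(x, centers, g)   # one entry per data point
```

Each call costs O(n log B) for n points and B bins, which is why this step stays cheap even for very large collections.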

21
Preprocessing: Need to make data separable. Rotate using PCA. [Figure: not separable → (PCA) → separable]
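
A minimal PCA rotation, written with plain NumPy as a sketch; the function name and toy data are illustrative. Note that PCA only removes linear correlation between coordinates, which is a weaker property than the separability the method assumes.

```python
import numpy as np

def pca_rotate(X, n_components=None):
    """Center X and rotate it onto its principal axes (largest variance first)."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)          # ascending eigenvalues
    order = np.argsort(evals)[::-1]             # reorder to descending variance
    return Xc @ evecs[:, order[:n_components]]  # n_components=None keeps all

rng = np.random.default_rng(0)
Z = rng.normal(size=(5000, 2)) * np.array([3.0, 0.5])   # independent coordinates
c, s = np.cos(np.pi / 4), np.sin(np.pi / 4)
X = Z @ np.array([[c, s], [-s, c]])                      # rotate by 45 degrees
Y = pca_rotate(X)
corr_before = np.corrcoef(X, rowvar=False)[0, 1]
corr_after = np.corrcoef(Y, rowvar=False)[0, 1]
```

After the rotation the two coordinates are uncorrelated up to floating-point error, whereas the rotated input is strongly correlated.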

22
Overall Algorithm
1. Rotate data to maximize separability (currently use PCA)
2. For each of the d input dimensions:
–Construct 1D histogram
–Solve numerically for eigenfunctions/values
3. Order eigenfunctions from all dimensions by increasing eigenvalue & take first k
4. Interpolate data into k eigenfunctions
–Yields approximate eigenvectors of Laplacian
5. Solve k x k least squares system to give label function
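
The five steps can be strung together roughly as follows. This is a sketch under several assumptions, not the authors' implementation: a Gaussian affinity between bin centers, a small histogram floor, linear interpolation, and dropping the constant eigenfunction of each dimension so that the k x k system stays well-posed. All function names and parameter values are hypothetical.

```python
import numpy as np

def eigenfunctions_1d(x, n_bins, eps, n_funcs):
    """Non-constant eigenfunctions of one dimension, computed from its histogram."""
    counts, edges = np.histogram(x, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    h = counts / counts.sum() + 1e-3 / n_bins       # floor keeps densities positive
    W = np.exp(-(centers[:, None] - centers[None, :]) ** 2 / (2.0 * eps**2))
    PWP = h[:, None] * W * h[None, :]
    b = PWP.sum(axis=0)                             # equals the diagonal of P * D_hat
    s = 1.0 / np.sqrt(b)
    sigma, V = np.linalg.eigh(s[:, None] * (np.diag(b) - PWP) * s[None, :])
    g = s[:, None] * V
    return centers, sigma[1:n_funcs + 1], g[:, 1:n_funcs + 1]   # skip constant one

def eigenfunction_ssl(X, y, lam, k=4, n_bins=50, eps=0.3):
    # Step 1: rotate the data with PCA.
    Xc = X - X.mean(axis=0)
    _, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    Z = Xc @ evecs
    # Step 2: per-dimension histogram and numerical eigenfunctions.
    cands = []
    for j in range(Z.shape[1]):
        centers, sigma, g = eigenfunctions_1d(Z[:, j], n_bins, eps, k)
        cands += [(sigma[i], j, centers, g[:, i]) for i in range(len(sigma))]
    # Step 3: order by increasing eigenvalue and keep the first k.
    cands = sorted(cands, key=lambda c: c[0])[:k]
    # Step 4: interpolate every data point into each kept eigenfunction.
    U = np.column_stack([np.interp(Z[:, j], c, g) for _, j, c, g in cands])
    Sigma = np.diag([c[0] for c in cands])
    # Step 5: solve the k x k system for the coefficients of the label function.
    alpha = np.linalg.solve(Sigma + U.T @ (lam[:, None] * U), U.T @ (lam * y))
    return U @ alpha

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, size=(500, 2)),
               rng.normal(+1.0, 0.5, size=(500, 2))])       # toy data (assumed)
truth = np.r_[-np.ones(500), np.ones(500)]
y, lam = np.zeros(1000), np.zeros(1000)
for center, label in ((-1.0, -1.0), (1.0, 1.0)):
    i = np.argmin(((X - center) ** 2).sum(axis=1))          # one label per cluster
    y[i], lam[i] = label, 100.0
f = eigenfunction_ssl(X, y, lam)
accuracy = np.mean(np.sign(f) == truth)
```

No n x n matrix is ever formed: the only per-point work is the PCA projection and k one-dimensional interpolations.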

23
Experiments on Toy Data

24
Nystrom Comparison With Nystrom, too few landmark points result in highly unstable eigenvectors

25
Nystrom Comparison Eigenfunctions fail when data has significant dependencies between dimensions

26
Experiments on Real Data

27
Experiments: Images from 126 classes downloaded from Internet search engines, total 63,000 images (example classes: dump truck, emu). Labels (correct/incorrect) provided by Alex Krizhevsky, Vinod Nair & Geoff Hinton (CIFAR & U. Toronto).

28
Input Image Representation: Pixels not a convenient representation. Use Gist descriptor (Oliva & Torralba, 2001): apply oriented Gabor filters over different scales, average filter energy in each bin. L2 distance between Gist vectors is a rough substitute for human perceptual distance.

29
Are Dimensions Independent? [Figure, before PCA: joint histogram for pairs of dimensions from raw 384-dimensional Gist. After PCA: joint histogram for pairs of dimensions after PCA to 64 dimensions.] MI is mutual information score. 0 = Independent.
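
A mutual information score for a pair of dimensions can be estimated from their joint histogram. The slides do not say how MI was computed, so the plug-in estimator below, its bin count and the toy data are assumptions.

```python
import numpy as np

def mutual_information(a, b, n_bins=20):
    """Plug-in estimate of mutual information (in nats) from a joint histogram."""
    joint, _, _ = np.histogram2d(a, b, bins=n_bins)
    pxy = joint / joint.sum()                    # joint distribution over bins
    px = pxy.sum(axis=1, keepdims=True)          # marginal of a, shape (n_bins, 1)
    py = pxy.sum(axis=0, keepdims=True)          # marginal of b, shape (1, n_bins)
    nz = pxy > 0                                 # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
a = rng.normal(size=20000)
mi_indep = mutual_information(a, rng.normal(size=20000))            # independent pair
mi_dep = mutual_information(a, a + 0.1 * rng.normal(size=20000))    # dependent pair
```

The estimate for independent data is slightly above zero because of finite-sample bias, so scores are best compared with each other and not against an exact zero.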

30
Real 1-D Eigenfunctions of PCA'd Gist descriptors: [Figure: eigenfunctions, starting from Eigenfunction 1, plotted against input dimension]

31
Protocol: Task is to re-rank images of each class (class/non-class). Use eigenfunctions computed on all 63,000 images. Vary number of labeled examples. Measure precision at 15% recall.
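
For a re-ranking task of this kind, the score can be computed as below. This reads the metric as precision at 15% recall, which is an interpretation of the abbreviated slide text. The function name and the tiny example are hypothetical, and the function assumes at least one positive label.

```python
import numpy as np

def precision_at_recall(scores, labels, recall=0.15):
    """Precision at the first rank where `recall` of the positives are retrieved."""
    order = np.argsort(-scores, kind="stable")      # best score first
    hits = np.cumsum(labels[order])                 # positives found by each rank
    needed = max(1, int(np.ceil(recall * hits[-1])))
    rank = int(np.searchsorted(hits, needed)) + 1   # 1-based rank reaching `needed`
    return needed / rank

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2])
labels = np.array([1, 0, 1, 1, 0, 0, 1, 0])        # 1 = image belongs to the class
p15 = precision_at_recall(scores, labels)          # needs 1 positive, found at rank 1
p50 = precision_at_recall(scores, labels, 0.5)     # needs 2 positives, found at rank 3
```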

32-35
[Figures: four result plots, one per slide; recoverable axis label: Total number of images]

36
80 Million Images

37
Running on 80 million images: PCA to 32 dims, k=48 eigenfunctions. For each class, labels propagate through 80 million images. Precompute approximate eigenvectors (~20 GB). Label propagation is fast: <0.1 secs/keyword.

38
Japanese Spaniel: 3 positive, 3 negative. Labels from CIFAR set.

39
Airbus, Ostrich, Auto

40
Summary: Semi-supervised scheme that can scale to really large problems (linear in # points). Rather than sub-sampling the data, we take the limit of infinite unlabeled data. Assumes input data distribution is separable. Can propagate labels in graph with 80 million nodes in fractions of a second. Related paper in this NIPS by Nadler, Srebro & Zhou (see spotlights on Wednesday).

42
Future Work: Can potentially use 2D or 3D histograms instead of 1D (requires more data). Consider diagonal eigenfunctions. Sharing of labels between classes.

43
Comparison of Approaches

44
[Figure panels: Data; Exact Eigenvectors; Approximate Eigenvectors; Exact vs. Approximate Eigenvalues]

45
Are Dimensions Independent? [Figure, before PCA: joint histogram for pairs of dimensions from raw 384-dimensional Gist. After PCA: joint histogram for pairs of dimensions after PCA.] MI is mutual information score. 0 = Independent.

46
Are Dimensions Independent? [Figure, before ICA: joint histogram for pairs of dimensions from raw 384-dimensional Gist. After ICA: joint histogram for pairs of dimensions after ICA.] MI is mutual information score. 0 = Independent.

47
Varying # Eigenfunctions

48
Leveraging Noisy Labels: Images in dataset have noisy labels: the keyword used in the Internet search engine query. Can easily be incorporated into SSL scheme. Give weight 1/10th of hand-labeled example.
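
One plausible way to encode this weighting is through the diagonal of the label-weight matrix: hand-labeled points get weight `lam`, noisily labeled points get `lam / 10`, and everything else gets zero. The function name, the value of `lam` and the rule that hand labels override noisy ones are assumptions of this sketch.

```python
import numpy as np

def label_weights(n, hand_idx, noisy_idx, lam=100.0, noisy_frac=0.1):
    """Diagonal of the label-weight matrix for n points."""
    w = np.zeros(n)                  # unlabeled points carry no weight
    w[noisy_idx] = lam * noisy_frac  # search-engine keyword labels: 1/10 weight
    w[hand_idx] = lam                # hand labels take precedence if both exist
    return w

# Point 1 has both a hand label and a noisy label; points 4 and 5 are unlabeled.
w = label_weights(6, hand_idx=[0, 1], noisy_idx=[1, 2, 3])
```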

49
Leveraging Noisy Labels

50
Effect of Noisy Labels

51
Complexity Comparison
Nystrom (polynomial in # landmarks):
–Select m landmark points
–Get smallest k eigenvectors of m x m system
–Interpolate n points into k eigenvectors
–Solve k x k linear system
Eigenfunction (linear in # data points):
–Rotate n points
–Form d 1-D histograms
–Solve d linear systems, each b x b
–k 1-D interpolations of n points
–Solve k x k linear system
Key: n = # data points (big, >10^6); l = # labeled points (small, <100); m = # landmark points; d = # input dims (~100); k = # eigenvectors (~100); b = # histogram bins (~50)

52
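Semi-Supervised Learning using Graph Laplacian: V = data points (n in total), E = n x n affinity matrix W, $W_{ij} = \exp(-\|x_i - x_j\|^2 / 2\epsilon^2)$. Graph Laplacian: $L = D^{-1/2}(D - W)D^{-1/2} = I - D^{-1/2} W D^{-1/2}$, with $D_{ii} = \sum_j W_{ij}$ [Zhu03, Zhou04]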

54
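Consider Limit as n → ∞: Consider x to be drawn from 2D distribution p(x). Let $L_p(F)$ be a smoothness operator on p(x), for a function F(x): $L_p(F) = \frac{1}{2} \int (F(x_1) - F(x_2))^2 \, W(x_1, x_2) \, p(x_1) p(x_2) \, dx_1 dx_2$, where $W(x_1, x_2) = \exp(-\|x_1 - x_2\|^2 / 2\epsilon^2)$. Analyze eigenfunctions of $L_p(F)$.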

55
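Numerical Approximations to Eigenfunctions in 1D: Solve for values g of eigenfunction at set of discrete locations (histogram bin centers), and associated eigenvalues $\sigma$: $(\tilde{D} - P\tilde{W}P) g = \sigma P \hat{D} g$. B x B system (# histogram bins = 50). $\tilde{W}$ is the affinity between discrete locations; P is diag(h(x_1)); $\tilde{D}$ and $\hat{D}$ are diagonal matrices holding the column sums of $P\tilde{W}P$ and $P\tilde{W}$.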

56
Real 1-D Eigenfunctions of PCA'd Gist descriptors: [Figure: Eigenfunction 1 through Eigenfunction 256, arranged by input dimension; each panel plots eigenfunction value against histogram bin (1 to 50, spanning x_min to x_max); color = input dimension]
