Fast and Compact Retrieval Methods in Computer Vision Part II A. Torralba, R. Fergus and Y. Weiss. Small Codes and Large Image Databases for Recognition.


1 Fast and Compact Retrieval Methods in Computer Vision, Part II. A. Torralba, R. Fergus and Y. Weiss. Small Codes and Large Image Databases for Recognition. CVPR 2008. A. Torralba, R. Fergus, W. Freeman. 80 million tiny images: a large dataset for non-parametric object and scene recognition. TR. Presented by Ken and Ryan

2 Outline Large Datasets of Images Searching Large Datasets –Nearest Neighbor –ANN: Locality Sensitive Hashing Dimensionality Reduction –Boosting –Restricted Boltzmann Machines (RBM) Results

3 Goal Develop efficient image search and scene matching techniques that are fast and require very little memory Particularly on VERY large image sets Query

4 Motivation Image sets –Vogel & Schiele: 702 natural scenes in 6 categories –Oliva & Torralba: 2688 images –Caltech 101: ~50 images/category, ~5000 total –Caltech 256: 80-800 images/category, 30608 total Why do we want larger datasets?

5 Motivation Classify any image Complex classification methods don’t extend well Can we use a simple classification method?

6 Thumbnail Collection Project Collect images for ALL objects –List obtained from WordNet –75,378 non-abstract nouns in English

7 Thumbnail Collection Project Collected 80M images http://people.csail.mit.edu/torralba/tinyimages

8 How Much is 80M Images? One feature-length movie: –105 min = 151K frames @ 24 FPS For 80M images, watch 530 movies How do we store this? –1k * 80M = 80 GB –Actual storage: 760GB
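The figures on this slide can be checked with a few lines of arithmetic. This is only a sketch: the 1 KB per image value is the slide's own estimate, and the variable names are illustrative.

```python
# Back-of-the-envelope check of the numbers on this slide.
FPS = 24
MOVIE_MINUTES = 105
NUM_IMAGES = 80_000_000
BYTES_PER_IMAGE = 1_000  # the slide's "1k" estimate per thumbnail

frames_per_movie = MOVIE_MINUTES * 60 * FPS
movies_equivalent = NUM_IMAGES / frames_per_movie
storage_gb = NUM_IMAGES * BYTES_PER_IMAGE / 1e9

print(frames_per_movie)          # 151200
print(round(movies_equivalent))  # 529, which the slide rounds to 530
print(storage_gb)                # 80.0
```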

9 First Attempt Store each image as 32x32 color thumbnail Based on human visual perception Information: 32*32*3 channels =3072 entries

10 First Attempt Used SSD++ to find nearest neighbors of query image –Used first 19 principal components
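A minimal sketch of nearest-neighbor search by sum of squared differences (SSD). It shows plain SSD only, not the SSD++ variant named on the slide, and it assumes the vectors have already been projected onto their principal components. The function names and toy data are hypothetical.

```python
def ssd(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbors(query, database, k=1):
    """Return indices of the k database vectors with the smallest SSD to query."""
    order = sorted(range(len(database)), key=lambda i: ssd(query, database[i]))
    return order[:k]


# Toy 4-dimensional descriptors standing in for PCA-projected thumbnails.
database = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.9, 1.1, 1.0, 0.8],
]
query = [1.0, 1.0, 1.0, 0.9]
print(nearest_neighbors(query, database, k=2))  # [1, 2]
```

The scan is linear in the database size, which is the cost the later slides try to avoid.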

11 Motivation Part 2 Is this good enough? SSD is naïve Still too much storage required How can we fix this? –Traditional methods of searching large datasets –Binary reduction

12

13

14

15 Locality-Sensitive Hash Families

16

17 LSH Example
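The slide's own example was not preserved in the transcript. As an illustration only, one common locality-sensitive family hashes a vector by the signs of random projections; the sketch below assumes that family, and every name in it is hypothetical.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw one random Gaussian hyperplane normal per output bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_code(x, hyperplanes):
    """One bit per hyperplane: 1 if x lies on its positive side, else 0."""
    code = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, x))
        code = (code << 1) | (1 if dot > 0 else 0)
    return code


planes = make_hyperplanes(num_bits=8, dim=3)
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction as a, so same side of every hyperplane
print(lsh_code(a, planes) == lsh_code(b, planes))  # True
```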

18

19 Binary Reduction Lots of pixels → Gist vector (512 values) → Binary reduction (32 bits). For 80 million images: 164 GB → 320 MB
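The storage figures follow from simple arithmetic, and binary codes are typically compared by Hamming distance. The sketch below assumes 4-byte floats for each Gist value, which the slide does not state; the helper name is illustrative.

```python
def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


NUM_IMAGES = 80_000_000
gist_bytes = NUM_IMAGES * 512 * 4  # assumes 4-byte floats per Gist value
code_bytes = NUM_IMAGES * 32 // 8  # 32 bits per image

print(gist_bytes / 1e9)         # 163.84 (GB)
print(code_bytes / 1e6)         # 320.0 (MB)
print(hamming(0b1011, 0b0010))  # 2
```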

20 Gist “The ‘gist’ is an abstract representation of the scene that spontaneously activates memory representations of scene categories (a city, a mountain, etc.)” A. Oliva and A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. Journal of Computer Vision, 42(3):145–175, 2001.

21 Gist

22 http://ilab.usc.edu/siagian/Research/Gist/Gist.html Gist vector

23 Query Image Dataset Querying

24 1  ?

25 6  ?

26

27 Boosting Positive and negative image pairs train the discovery of the binary reduction. Similar pair: label = 1. Dissimilar pair: label = -1. 150K pairs, 80% negatives

28 BoostSSC (Similarity Sensitive Coding) Weights start uniform. Feature vector x_i: N values

29 BoostSSC For each bit m: –Choose the index n that minimizes a weighted error across the entire training set. Feature vector x from image i (N values) → binary reduction h(x) (M bits)

30 BoostSSC Weak classifications are evaluated via regression stumps on feature index n of x_i and x_j (N values each). We need to figure out α, β, and T for each n. If x_i and x_j are similar, we should get 1 for most n’s.

31 BoostSSC Try a range of thresholds T: –Regress f across the entire training set to find each α and β. –Keep the T that fits best. Then, keep the n that causes the least weighted error.
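The threshold and index search can be sketched as follows. This is a hedged illustration, not the presenters' code: it assumes the stump has the form f = alpha * agree + beta, where agree is 1 when both vectors of a pair fall on the same side of T, and it fits the two stump parameters (called alpha and beta here) by weighted least squares. All names and data are hypothetical.

```python
def fit_stump(pairs, labels, weights, n, T):
    """Weighted least-squares fit of f = alpha * agree + beta for index n, threshold T.

    Returns (alpha, beta, weighted_squared_error).
    """
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # agree -> [sum of w*z, sum of w]
    agrees = []
    for (xi, xj), z, w in zip(pairs, labels, weights):
        agree = int((xi[n] > T) == (xj[n] > T))
        agrees.append(agree)
        sums[agree][0] += w * z
        sums[agree][1] += w
    # The best constant for each group is the weighted mean label of that group.
    mean = {a: (s[0] / s[1] if s[1] > 0 else 0.0) for a, s in sums.items()}
    beta = mean[0]
    alpha = mean[1] - mean[0]
    error = sum(w * (z - (alpha * a + beta)) ** 2
                for a, z, w in zip(agrees, labels, weights))
    return alpha, beta, error


def best_stump(pairs, labels, weights, thresholds):
    """Search every feature index and candidate threshold; keep the lowest error."""
    best = None
    for n in range(len(pairs[0][0])):
        for T in thresholds:
            alpha, beta, err = fit_stump(pairs, labels, weights, n, T)
            if best is None or err < best[0]:
                best = (err, n, T, alpha, beta)
    return best


pairs = [([0.9, 0.1], [0.8, 0.9]),  # similar: same side on feature 0
         ([0.1, 0.2], [0.2, 0.3]),  # similar: same side on feature 0
         ([0.9, 0.5], [0.1, 0.5])]  # dissimilar: opposite sides on feature 0
labels = [1, 1, -1]
weights = [1 / 3] * 3
err, n, T, alpha, beta = best_stump(pairs, labels, weights, thresholds=[0.5])
print(n, T, alpha, beta)  # 0 0.5 2.0 -1.0
```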

32 BoostSSC (figure: feature vectors x_i and x_j, N values each, mapped to M bits; bit m uses index n)

33 BoostSSC Update weights. –Affects future error calculations
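The slide does not give the update rule. A common choice in boosting, assumed here purely for illustration, multiplies each pair's weight by exp(-z * f), where z is the pair label and f the stump output, then renormalizes, so wrongly predicted pairs count more in the next round.

```python
import math


def update_weights(weights, labels, predictions):
    """Multiply each weight by exp(-z * f) and renormalize to sum to 1."""
    raw = [w * math.exp(-z * f) for w, z, f in zip(weights, labels, predictions)]
    total = sum(raw)
    return [r / total for r in raw]


weights = [0.25, 0.25, 0.25, 0.25]
labels = [1, 1, -1, -1]
predictions = [1.0, -1.0, -1.0, 1.0]  # the 2nd and 4th pairs are predicted wrongly
new_weights = update_weights(weights, labels, predictions)
print(new_weights)
```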

34 BoostSSC In the end, each bit has an n index and a threshold. x_i (N values) → M bits
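Once every bit has its index and threshold, encoding an image is a series of comparisons. A minimal sketch, with hypothetical learned parameters:

```python
def encode(x, bit_params):
    """Build an integer code: bit m is 1 when x[n_m] > T_m."""
    code = 0
    for n, T in bit_params:
        code = (code << 1) | int(x[n] > T)
    return code


bit_params = [(0, 0.5), (2, 0.1), (1, 0.7)]  # hypothetical (n, T) per bit
x = [0.9, 0.2, 0.3]
print(format(encode(x, bit_params), "03b"))  # 110
```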

35 BoostSSC

36 Restricted Boltzmann Machine (RBM) Architecture Network of binary stochastic units. Hinton & Salakhutdinov, Nature 2006. Parameters: w: Symmetric Weights; b: Biases; h: Hidden Units; v: Visible Units

37 Multi-Layer RBM Architecture

38 Training RBM Models Two phases 1. Pre-training: Unsupervised. Use Contrastive Divergence to learn weights and biases. Gets parameters in the right ballpark. 2. Fine-tuning: Supervised. No longer stochastic. Backpropagate error to update parameters. Moves parameters to a local minimum.
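The pre-training phase can be illustrated with a single-layer sketch of one-step contrastive divergence (CD-1). It assumes binary visible and hidden units and a fixed learning rate; the function names, layer sizes, and toy data are hypothetical, and the fine-tuning phase is not shown.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def sample(probs, rng):
    """Draw a binary state for each unit from its activation probability."""
    return [1 if rng.random() < p else 0 for p in probs]


def hidden_probs(v, W, b_h):
    return [sigmoid(b_h[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b_h))]


def visible_probs(h, W, b_v):
    return [sigmoid(b_v[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b_v))]


def cd1_step(v0, W, b_v, b_h, rng, lr=0.1):
    """One CD-1 update on a single binary vector v0 (modifies W, b_v, b_h in place)."""
    ph0 = hidden_probs(v0, W, b_h)                  # positive phase
    h0 = sample(ph0, rng)
    v1 = sample(visible_probs(h0, W, b_v), rng)     # one-step reconstruction
    ph1 = hidden_probs(v1, W, b_h)                  # negative phase
    for i in range(len(v0)):
        for j in range(len(b_h)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b_v[i] += lr * (v0[i] - v1[i])
    for j in range(len(b_h)):
        b_h[j] += lr * (ph0[j] - ph1[j])


rng = random.Random(0)
num_visible, num_hidden = 4, 2
W = [[rng.gauss(0.0, 0.01) for _ in range(num_hidden)] for _ in range(num_visible)]
b_v = [0.0] * num_visible
b_h = [0.0] * num_hidden
data = [[1, 1, 0, 0], [0, 0, 1, 1]]
for _ in range(200):
    for v in data:
        cd1_step(v, W, b_v, b_h, rng)
print(hidden_probs(data[0], W, b_h))
```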

39 Greedy Pre-training (Unsupervised)

40

41

42

43 Neighborhood Components Analysis Goldberger, Roweis, Salakhutdinov & Hinton, NIPS 2004. Applied to the output of the RBM; W are the RBM weights

44 Neighborhood Components Analysis Goldberger, Roweis, Salakhutdinov & Hinton, NIPS 2004. Assume K=2 classes

45 Neighborhood Components Analysis Goldberger, Roweis, Salakhutdinov & Hinton, NIPS 2004. Pulls nearby points of same class closer

46 Neighborhood Components Analysis Goldberger, Roweis, Salakhutdinov & Hinton, NIPS 2004. Pulls nearby points of same class closer. Goal is to preserve neighborhood structure of original, high-dimensional space
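A small sketch of the standard NCA objective: for each point, the softmax probability (over negative squared distances in code space) of picking a neighbor with the same label, summed over all points. Higher is better. The code vectors here are toy values, not RBM outputs, and the function name is illustrative.

```python
import math


def nca_objective(codes, labels):
    """Sum over points of the probability of picking a same-class neighbor."""
    total = 0.0
    for i, (ci, yi) in enumerate(zip(codes, labels)):
        sims = {}
        for j, cj in enumerate(codes):
            if j != i:
                d2 = sum((a - b) ** 2 for a, b in zip(ci, cj))
                sims[j] = math.exp(-d2)
        norm = sum(sims.values())
        total += sum(s for j, s in sims.items() if labels[j] == yi) / norm
    return total


labels = [0, 0, 1, 1]
clustered = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]]  # classes grouped
mixed = [[0.0, 0.0], [3.0, 3.0], [0.1, 0.0], [3.1, 3.0]]      # classes interleaved
print(nca_objective(clustered, labels) > nca_objective(mixed, labels))  # True
```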

47 Experiments and Results

48 Searching Bit limitations: –Hashing scheme: Max. capacity for 13M images: 30 bits –Exhaustive search: 256 bits possible
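The hashing scheme can be sketched as a table keyed by the binary code, probed at every code within a small Hamming radius of the query. This is an illustration under that assumption, with a Python dict standing in for direct memory addressing; the names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_table(codes):
    """Map each binary code (an int) to the list of image indices that share it."""
    table = defaultdict(list)
    for idx, code in enumerate(codes):
        table[code].append(idx)
    return table


def probe(table, query, num_bits, radius):
    """Collect images whose code is within `radius` bit flips of the query."""
    hits = []
    for r in range(radius + 1):
        for flips in combinations(range(num_bits), r):
            code = query
            for bit in flips:
                code ^= 1 << bit
            hits.extend(table.get(code, []))
    return sorted(hits)


codes = [0b0000, 0b0001, 0b0111, 0b1111]
table = build_table(codes)
print(probe(table, query=0b0000, num_bits=4, radius=1))  # [0, 1]
```

The number of probed addresses grows combinatorially with the radius and code length, whereas exhaustive Hamming-distance search scales linearly with database size and tolerates longer codes.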

49 Searching Results

50 LabelMe Retrieval

51 Examples of Web Retrieval 12 neighbors using different distance metrics

52 Web Images Retrieval

53 Conclusion Efficient searching for large image datasets Compact image representation Methods for binary reductions –Locality-Sensitive Hashing –Boosting –Restricted Boltzmann Machines Searching techniques

54

55 How Much is 80M Images? 1 feature-length movie: –105 min = 151K frames @ 24 FPS For 80M images, watch 530 movies

56

57

58

59

60

61 Parameters: Weights w, Biases b

62 Input to RBM: Gist Vectors

