80 million tiny images: a large dataset for non-parametric object and scene recognition CS 4763 Multimedia Systems Spring 2008.

1 80 million tiny images: a large dataset for non-parametric object and scene recognition CS 4763 Multimedia Systems Spring 2008

2 Outline: Motivation; Low-dimensional image representation; Bridging the gap between images and semantic meaning; Experiments; Conclusion

3 Motivation Billions of images are available online, forming a dense sampling of the visual world. Can we use them effectively? Existing datasets contain only 10^2 to 10^4 images, spread over a few different classes.

4 Questions to address: How large must a dataset be to perform recognition robustly? What is the smallest image resolution that still gives reliable classification performance?

5 Low dimensional image representation 32 × 32 color images contain enough information for scene recognition, object detection and segmentation.
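The slides do not say how images were reduced to 32 × 32. As a minimal sketch, assuming simple block averaging and input sides that are multiples of 32 (for example 256 × 256), the reduction could look like this in Python with NumPy; `to_tiny` is a hypothetical helper name, not the authors' code.

```python
import numpy as np

def to_tiny(image: np.ndarray, size: int = 32) -> np.ndarray:
    """Downsample an (H, W, C) image to (size, size, C) by averaging pixel blocks.

    Assumes H and W are multiples of `size`; arbitrary sizes would need
    cropping or a proper resampling filter instead.
    """
    h, w, c = image.shape
    if h % size or w % size:
        raise ValueError("height and width must be multiples of size")
    fy, fx = h // size, w // size
    # Split each spatial axis into (output pixel, offset inside block), then average the offsets.
    blocks = image.astype(np.float64).reshape(size, fy, size, fx, c)
    return blocks.mean(axis=(1, 3))

# Usage: a random 256 x 256 RGB image becomes a 32 x 32 x 3 array (3,072 values).
rng = np.random.default_rng(0)
big = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
tiny = to_tiny(big)
```

Block averaging is only one choice; any anti-aliased resize would serve the same purpose of keeping coarse color layout while discarding fine detail.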

6 Low dimensional image representation (Cont.) Scene recognition

7 Low dimensional image representation (Cont.) Segmentation of 32 × 32 images

8 Low dimensional image representation (Cont.) The objects below cannot be recognized without knowledge of their context.

9 Low dimensional image representation (Cont.) Conclusion for low-resolution representation: a 32 × 32 color image contains enough information for scene recognition, object detection and segmentation.

10 Low dimensional image representation (Cont.) Conclusion for low-resolution representation: at small resolution it is practical to work with millions of images, both in terms of storage capacity and of image processing during retrieval. Example (one byte per color channel): 256 × 256 × 3 bytes = 192 KB per image, so 1 million images take about 192 GB; 32 × 32 × 3 bytes = 3 KB per image, so 1 million images take about 3 GB.
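The storage arithmetic can be checked with a few lines of Python, assuming uncompressed 8-bit RGB (one byte per channel). The "192 GB" figure treats a gigabyte as a million kilobytes; in binary units a million 256 × 256 images come to about 183 GiB. The helper `storage_bytes` is hypothetical.

```python
def storage_bytes(width: int, height: int, channels: int = 3,
                  bytes_per_channel: int = 1, count: int = 1) -> int:
    """Raw (uncompressed) storage in bytes for `count` images of the given size."""
    return width * height * channels * bytes_per_channel * count

KIB = 1024          # bytes per kibibyte
GIB = 1024 ** 3     # bytes per gibibyte

for side in (256, 32):
    per_image = storage_bytes(side, side)
    per_million = storage_bytes(side, side, count=1_000_000)
    print(f"{side} x {side}: {per_image / KIB:g} KiB per image, "
          f"{per_million / GIB:.1f} GiB per million images")

# The same estimate for a dataset of 80 million 32 x 32 images.
print(f"80 million tiny images: {storage_bytes(32, 32, count=80_000_000) / GIB:.0f} GiB")
```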

11 A large dataset of 32 × 32 images (Cont.) Collection procedure [Russell et al. 2008] Where? What? How?

12 A large dataset of 32 × 32 images (Cont.) Collection procedure [Russell et al. 2008] Where -- the Internet: images were collected from 7 independent image search engines. What -- the images returned by the search engines when queried with non-abstract nouns. How --

13 A large dataset of 32 × 32 images (Cont.) Statistics of the tiny images in the database

14 Statistics of very low resolution images Is there a statistical relationship between dataset size and the probability of finding similar images? How many images are needed to find a similar image for any input image?

15 Statistics of very low resolution images (Cont.) If we want to retrieve the 50 closest images from a 10,000-image dataset using an approximate search, how many images should we retrieve so that 80% of the true 50 closest images appear in the result?

16 Statistics of very low resolution images (Cont.) $S_N$: the set of N exact nearest neighbors. $S_M$: the set of M approximate nearest neighbors. The probability that the image of index i in $S_N$ is also inside $S_M$: $P(i \in S_M \mid i \in S_N)$.

17 Statistics of very low resolution images (Cont.)

18 With probability of 80% to find
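The quantity on these slides can be estimated empirically. The sketch below is a simplified version of that analysis, assuming that for each query we have its exact distances and its approximate distances (for example from a truncated PCA) to every dataset image. It measures the average fraction of the exact top N that appears in the approximate top M, and searches for the smallest M reaching a target such as 80%. All function names are hypothetical.

```python
from typing import Optional, Sequence, Set, Tuple

def top_k(distances: Sequence[float], k: int) -> Set[int]:
    """Indices of the k smallest distances."""
    return set(sorted(range(len(distances)), key=distances.__getitem__)[:k])

def recall_at_m(exact: Sequence[float], approx: Sequence[float], n: int, m: int) -> float:
    """Fraction of the N exact nearest neighbors found among the M approximate ones."""
    return len(top_k(exact, n) & top_k(approx, m)) / n

def smallest_m(queries: Sequence[Tuple[Sequence[float], Sequence[float]]],
               n: int, target: float = 0.8) -> Optional[int]:
    """Smallest M whose recall, averaged over all queries, reaches `target`."""
    size = len(queries[0][0])
    for m in range(1, size + 1):
        mean = sum(recall_at_m(e, a, n, m) for e, a in queries) / len(queries)
        if mean >= target:
            return m
    return None  # target never reached

# Usage: with a perfect approximation, M = 40 already recovers 80% of the top 50.
exact = [float(i) for i in range(100)]
m_needed = smallest_m([(exact, exact)], n=50)
```

Noisier approximate distances push the required M upward, which is the trade-off these slides examine.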

19 Statistics of very low resolution images (Cont.) Comparison between two images: the sum of squared differences (SSD) between images $I_1$ and $I_2$, $D_{ssd}^2 = \sum_{x,y,c} (I_1(x,y,c) - I_2(x,y,c))^2$. To speed up computation, they index the images using the first 19 principal components of the 80 million images.

20 Statistics of very low resolution images (Cont.) Approximate distance: $D_{ssd}^2 \approx \sum_{n=1}^{C} (v_1(n) - v_2(n))^2$, where C is the number of components used to approximate the distance and $v_i(n)$ is the n-th principal component coefficient of the i-th image.
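The exact and PCA-approximated distances can be sketched in Python with NumPy. This is a minimal illustration on random stand-in data, not the authors' code, and the helper names are hypothetical. Because the principal directions are orthonormal, keeping only C components can only drop terms, so the approximate distance never exceeds the exact SSD; with every component of the fitted basis it reproduces the SSD for images from the fitted set.

```python
import numpy as np

def ssd(i1: np.ndarray, i2: np.ndarray) -> float:
    """Sum of squared differences over all pixels x, y and color channels c."""
    d = i1.astype(np.float64) - i2.astype(np.float64)
    return float(np.sum(d * d))

def fit_pca(images: np.ndarray, c: int):
    """Mean image vector and the first c principal directions (orthonormal rows)."""
    x = images.reshape(len(images), -1).astype(np.float64)
    mean = x.mean(axis=0)
    # Rows of vt are principal directions sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:c]

def pca_coeffs(images: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Coefficients v_i(n) of each image on the retained principal components."""
    x = images.reshape(len(images), -1).astype(np.float64)
    return (x - mean) @ basis.T

def approx_ssd(v1: np.ndarray, v2: np.ndarray) -> float:
    """Sum over n = 1..C of (v_1(n) - v_2(n))^2."""
    d = v1 - v2
    return float(np.sum(d * d))

# Usage on 200 random 32 x 32 x 3 "images" with C = 19 components.
rng = np.random.default_rng(0)
imgs = rng.random((200, 32, 32, 3))
mean, basis = fit_pca(imgs, 19)
v = pca_coeffs(imgs, mean, basis)
exact = ssd(imgs[0], imgs[1])
approx = approx_ssd(v[0], v[1])
```

Comparing 19 numbers instead of 3,072 per image pair is what makes a linear scan over a very large collection affordable; the exact SSD can then be computed only for the shortlisted candidates.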

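21 Statistics of very low resolution images (Cont.) Image similarity metrics: to incorporate invariance to small translations, scaling and mirroring, they introduce the similarity measure $D_{warp}^2 = \min_\theta \sum_{x,y,c} (I_1(x,y,c) - T_\theta[I_2(x,y,c)])^2$, where $T_\theta$ is a warping transformation with parameters $\theta$, optimized by gradient descent.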

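22 Statistics of very low resolution images (Cont.) $I_2$ is initialized with the warping parameters obtained after optimizing the warping measure $D_{warp}$, giving $\hat{I}_2 = T_\theta[I_2]$; each pixel is then shifted individually within a 5 × 5 pixel window: $D_{shift}^2 = \sum_{x,y,c} \min_{|D_{x,y}| \le w} (I_1(x,y,c) - \hat{I}_2(x + D_x, y + D_y, c))^2$, with $w = 2$.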
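A direct NumPy sketch of $D_{shift}$, assuming the warped image $\hat{I}_2$ is already given (the gradient-descent warping step is omitted). The minimum is taken independently for every pixel and color channel, and positions that fall outside the image near the border reuse the nearest edge pixel, which is an assumption of this sketch. Function names are hypothetical.

```python
import numpy as np

def d_shift(i1: np.ndarray, i2_warped: np.ndarray, w: int = 2) -> float:
    """Squared distance allowing each pixel to shift by up to w pixels.

    w = 2 gives a 5 x 5 search window. Inputs are (H, W, C) arrays; i2_warped
    stands for I_2 after the D_warp alignment (not computed here).
    """
    a = i1.astype(np.float64)
    b = i2_warped.astype(np.float64)
    h, wd, _ = a.shape
    # Replicate edge pixels so shifted lookups near the border stay inside the array.
    padded = np.pad(b, ((w, w), (w, w), (0, 0)), mode="edge")
    best = np.full(a.shape, np.inf)
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            # shifted[y, x] == b[y + dy, x + dx], clamped at the borders
            shifted = padded[w + dy: w + dy + h, w + dx: w + dx + wd]
            best = np.minimum(best, (a - shifted) ** 2)
    return float(best.sum())

def ssd(i1: np.ndarray, i2: np.ndarray) -> float:
    """Plain sum of squared differences, for comparison."""
    d = i1.astype(np.float64) - i2.astype(np.float64)
    return float(np.sum(d * d))

# Usage: a copy moved one pixel to the right costs far less under D_shift than under SSD.
rng = np.random.default_rng(1)
img = rng.random((32, 32, 3))
moved = np.roll(img, 1, axis=1)
```

Since the zero offset is always among the candidates, $D_{shift}$ can never exceed the plain SSD; the extra freedom only helps when small local misalignments remain after warping.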

23 Statistics of very low resolution images (Cont.)

24 Impact of dataset size on performance (logarithmic scale); similarity metric: $D_{shift}$

25 Solution for the semantic gap: WordNet voting scheme. WordNet provides semantic relationships between the non-abstract nouns used as queries, and therefore between the collected images. WordNet tree: recognition of a test image can be performed at multiple semantic levels. Using the WordNet hierarchy, the labels of the retrieved images can be propagated to higher (more general) semantic levels.
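A minimal sketch of the voting idea, assuming each retrieved neighbor carries the noun it was collected with and that each noun's parent in the hierarchy is known. The small `PARENT` table is a hypothetical stand-in for WordNet, whose real paths are much deeper. Each neighbor votes for its own noun and for every ancestor, so general nodes such as "person" accumulate votes from many specific nouns.

```python
from collections import Counter
from typing import Iterable, List

# Hypothetical toy hierarchy (child -> parent); real WordNet paths are far longer.
PARENT = {
    "pianist": "person", "politician": "person", "person": "organism",
    "dog": "animal", "animal": "organism", "organism": "entity",
    "beach": "location", "location": "entity",
}

def ancestors(label: str) -> List[str]:
    """The label itself followed by every ancestor up to the root."""
    path = [label]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def wordnet_votes(neighbor_labels: Iterable[str]) -> Counter:
    """Each retrieved neighbor votes for its own noun and for every ancestor node."""
    votes: Counter = Counter()
    for label in neighbor_labels:
        votes.update(ancestors(label))
    return votes

# Usage: five neighbors with specific labels pool their votes at higher levels.
votes = wordnet_votes(["pianist", "politician", "dog", "pianist", "beach"])
```

No single specific noun dominates here, yet "person" collects three votes, which is how recognition at a coarser semantic level can succeed when the fine-grained labels are noisy.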

26 Solution for semantic gap (Cont.)

27 Experiments: images belonging to “person” in the WordNet tree, measured by $D_{shift}$.

28 Experiments – person detection Person detection: does the image contain a person or not? Existing detection approaches: face detection, head-and-shoulders detection, profile faces.

29 Experiments (Cont.) – person detection Comparison by the size of the person in the image

30 Experiments (Cont.) – person detection Person detection

31 Experiments (Cont.) – person detection Person detection (head >20%)

32 Experiments (Cont.) – person detection Evaluation using AltaVista images: re-ranking the images with the WordNet voting scheme
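One plausible reading of the re-ranking step, sketched in Python under the assumption that each AltaVista result already has per-node vote counts from its retrieved neighbors: results are sorted by their votes for "person". The function name and input layout are hypothetical.

```python
from typing import Dict, List, Tuple

def rerank(images: List[Tuple[str, Dict[str, int]]], target: str = "person") -> List[str]:
    """Order image ids by how many retrieved neighbors voted for `target`.

    `images` holds (image_id, votes) pairs in the original search-engine order.
    Python's sort is stable even with reverse=True, so ties keep that order.
    """
    ranked = sorted(images, key=lambda item: item[1].get(target, 0), reverse=True)
    return [image_id for image_id, _ in ranked]

# Usage: four search results re-ordered by their "person" votes.
order = rerank([("a", {"person": 10}), ("b", {"person": 70}),
                ("c", {}), ("d", {"person": 10})])
```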

33 Experiments (Cont.) – person detection

34 Experiments -- Person localization Person localization: extract multiple putative crops of the high-resolution query image. Each crop is resized to 32 × 32 pixels and used to query the tiny image dataset to obtain its retrieval set. To reduce the number of crops, the image is segmented using normalized cuts, producing around 10 segments, and all possible combinations of contiguous segments are considered.
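Considering "all possible combinations of contiguous segments" amounts to listing the connected subsets of the segment adjacency graph. A minimal Python sketch, assuming the normalized-cut step has already produced an adjacency map and a bounding box per segment (both hypothetical inputs); each resulting box would then be cropped, resized to 32 × 32 and used as a query. With about 10 segments there are at most 1,023 non-empty subsets, so exhaustive enumeration is cheap.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive pixel bounds

def contiguous_combinations(adjacency: Dict[int, Set[int]]) -> Set[FrozenSet[int]]:
    """All non-empty sets of segments that form one connected region.

    Every connected set can be built from a single segment by repeatedly
    adding an adjacent segment, so growing sets this way finds them all.
    """
    found: Set[FrozenSet[int]] = set()
    stack: List[FrozenSet[int]] = [frozenset([s]) for s in adjacency]
    while stack:
        subset = stack.pop()
        if subset in found:
            continue
        found.add(subset)
        for seg in subset:
            for nb in adjacency[seg] - subset:
                stack.append(subset | {nb})
    return found

def crop_box(boxes: Dict[int, Box], subset: FrozenSet[int]) -> Box:
    """Bounding box of the union of the chosen segments (the putative crop)."""
    return (min(boxes[s][0] for s in subset), min(boxes[s][1] for s in subset),
            max(boxes[s][2] for s in subset), max(boxes[s][3] for s in subset))

# Usage: three segments in a row (0-1-2) give 6 contiguous combinations;
# {0, 2} is excluded because those two segments do not touch.
combos = contiguous_combinations({0: {1}, 1: {0, 2}, 2: {1}})
```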

35 Experiments (Cont.) -- Person localization Similarity measure: $D_{shift}$; number of nearest neighbors: 80

36 Experiments – Scene recognition Scene recognition: retrieving images with the semantic meaning “location”

37 Experiments (Cont.) – Scene recognition Images with high votes for “location”; images with low votes for “location”

38 Experiments (Cont.) – Scene recognition

39 Experiments – Image annotation The target object is either absent or occupies at least 20% of the pixels; 80 nearest neighbors are used.

40 Conclusion Their experiments show that 32 × 32 is the minimum color image resolution for reliable object and scene recognition. The 79-million-image dataset can provide a reasonable density over the manifold of natural images. With this huge dataset and the semantic voting scheme, the approach performs well in person detection, person localization and scene recognition.

41 References
1. B. C. Russell, A. Torralba, K. Murphy, W. T. Freeman. LabelMe: a database and web-based tool for image annotation. Intl. J. Computer Vision, 77(1-3):157-173, 2008.
2. C. Fellbaum. WordNet: An Electronic Lexical Database. Bradford Books, 1998.

