80 million tiny images: a large dataset for non-parametric object and scene recognition CS 4763 Multimedia Systems Spring 2008.

Outline: Motivation; Low-dimensional image representation; Solution for the gap between images and semantic meaning; Experiments; Conclusion

Motivation: There are billions of images available online, providing a dense sampling of the visual world. Can we use them effectively? Existing datasets contain images spread over only a few classes.

Questions to address: How large must a dataset be to perform recognition robustly? What is the smallest image resolution that still gives reliable classification performance?

Low-dimensional image representation: 32 × 32 color images contain enough information for scene recognition, object detection and segmentation.

Low-dimensional image representation (Cont.): scene recognition.

Low-dimensional image representation (Cont.): segmentation of 32 × 32 images.

Low-dimensional image representation (Cont.): the objects below cannot be recognized without knowledge of their context.

Low-dimensional image representation (Cont.): Conclusion for the low-resolution representation. A 32 × 32 color image contains enough information for scene recognition, object detection and segmentation.

Low-dimensional image representation (Cont.): Conclusion for the low-resolution representation. At this resolution it is practical to work with millions of images, in terms of both storage capacity and processing during retrieval. Example: a 256 × 256 × 3 image takes 192 KB, so 1 million images take 192 GB; a 32 × 32 × 3 image takes 3 KB, so 1 million images take 3 GB.
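The storage figures above can be checked with a little arithmetic. This is a minimal sketch, assuming uncompressed 8-bit color (one byte per channel) and reading the slide's "KB" as 1024 bytes and its "GB" as 10^6 of those KB, which is how 192 KB per image becomes 192 GB per million images. The function names are hypothetical helpers, not part of the original work.

```python
N_IMAGES = 1_000_000  # one million images, as on the slide


def bytes_per_image(side: int, channels: int = 3) -> int:
    """Raw size of an uncompressed side x side color image, one byte per channel."""
    return side * side * channels


def kb_per_image(side: int) -> float:
    """Size in the slide's 'KB' (assumed here to be 1024 bytes)."""
    return bytes_per_image(side) / 1024


def gb_per_million(side: int) -> float:
    """Size of N_IMAGES images in the slide's 'GB' (assumed to be 10**6 'KB')."""
    return kb_per_image(side) * N_IMAGES / 10**6


if __name__ == "__main__":
    for side in (256, 32):
        print(f"{side} x {side} x 3: {kb_per_image(side):.0f} KB/image, "
              f"{gb_per_million(side):.0f} GB per million images")
```

Under a strict binary convention (1 GiB = 1024^3 bytes) the totals come out somewhat smaller, but the 64× ratio between the two resolutions is unchanged.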

A large dataset of 32 × 32 images (Cont.): Collection procedure [Russell et al. 2008]. Where? What? How?

A large dataset of 32 × 32 images (Cont.): Collection procedure [Russell et al. 2008]. Where: the Internet; images were collected from 7 independent image search engines. What: the images returned by the search engines when queried with non-abstract nouns. How:

A large dataset of 32 × 32 images (Cont.): statistics of the tiny images in the database.

Statistics of very low-resolution images: Is there a statistical relationship between dataset size and the probability of finding similar images? How many images are needed to find a similar image that matches any input image?

Statistics of very low-resolution images (Cont.): Suppose we want the top 50 closest images to a query in a dataset of 10,000 images, but rank the images with an approximate distance. How many images must be retrieved under the approximate ranking so that each of the true top 50 closest images is among them with 80% probability?

Statistics of very low-resolution images (Cont.): Let $S_N$ be the set of N exact nearest neighbors and $S_M$ the set of M approximate nearest neighbors. The quantity of interest is the probability that the image of index i in $S_N$ is also inside $S_M$: $P(i \in S_M \mid i \in S_N)$.
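The probability defined above can be estimated empirically: rank the dataset once with the exact distance and once with the approximate one, then check, for each exact rank i, how often that image lands in the approximate top M. This is a minimal sketch; the function names are hypothetical, and the demo uses synthetic distances (exact plus small noise) rather than real image data.

```python
import numpy as np


def neighbor_recall(exact_dist, approx_dist, n, m):
    """Boolean array: entry i is True if the i-th exact nearest neighbor (S_N, ranked by
    exact_dist) is also among the m nearest neighbors under approx_dist (S_M)."""
    exact_top_n = np.argsort(exact_dist, kind="stable")[:n]
    approx_top_m = set(np.argsort(approx_dist, kind="stable")[:m].tolist())
    return np.array([idx in approx_top_m for idx in exact_top_n.tolist()])


def recall_by_rank(exact_dists, approx_dists, n, m):
    """Average over queries: entry i estimates P(i-th exact neighbor is inside S_M)."""
    hits = np.stack([neighbor_recall(e, a, n, m) for e, a in zip(exact_dists, approx_dists)])
    return hits.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in: 100 queries against 10,000 images, approximate = exact + noise.
    exact = rng.random((100, 10_000))
    approx = exact + rng.normal(scale=0.01, size=exact.shape)
    p = recall_by_rank(exact, approx, n=50, m=200)
    print("P(rank i in S_M) for i = 1..5:", p[:5])
```

Increasing m until every entry of the returned array reaches 0.8 answers the question posed for N = 50.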

Statistics of very low-resolution images (Cont.)

With a probability of 80% of finding the true nearest neighbors.

Statistics of very low-resolution images (Cont.): Comparison between two images. The sum of squared differences (SSD) between two images $I_1$ and $I_2$ is $D_{ssd}^2 = \sum_{x,y,c} \left(I_1(x,y,c) - I_2(x,y,c)\right)^2$, where x, y index the pixels and c the color channel. To speed up computation, they index the images using the first 19 principal components of the 80 million images.
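The SSD above is a direct sum over pixels and color channels. A minimal sketch, assuming images are NumPy arrays of shape (32, 32, 3); the function name `ssd` is a hypothetical helper.

```python
import numpy as np


def ssd(i1: np.ndarray, i2: np.ndarray) -> float:
    """D_ssd^2: sum over x, y and color channel c of the squared pixel differences."""
    if i1.shape != i2.shape:
        raise ValueError("images must have the same shape, e.g. (32, 32, 3)")
    # Cast first: subtracting uint8 arrays directly would wrap around instead of going negative.
    diff = i1.astype(np.float64) - i2.astype(np.float64)
    return float(np.sum(diff * diff))
```

The cast to float matters for 8-bit images: without it, 0 - 2 in `uint8` becomes 254 and the distance is badly wrong.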

Statistics of very low-resolution images (Cont.): Approximate distance $D_{ssd}^2 \approx \sum_{n=1}^{C} \left(v_1(n) - v_2(n)\right)^2$, where C is the number of components used to approximate the distance and $v_i(n)$ is the nth principal component coefficient of the ith image.
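The approximate distance keeps only the first C principal component coefficients of each image. This is a minimal sketch, assuming PCA computed with NumPy's SVD on an in-memory sample of flattened images (a dataset of 80 million images would need a different way of estimating the components); the helper names are hypothetical.

```python
import numpy as np


def fit_pca(data: np.ndarray, c: int):
    """Return the mean and the first c principal directions of the row vectors in `data`."""
    mean = data.mean(axis=0)
    # Rows of vt are orthonormal principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:c]


def project(data: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """v_i(n): the coefficient of image i on principal component n."""
    return (data - mean) @ components.T


def approx_ssd(v1: np.ndarray, v2: np.ndarray) -> float:
    """Approximate D_ssd^2 by the squared distance between the first C coefficients."""
    d = v1 - v2
    return float(d @ d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.random((500, 32 * 32 * 3))  # synthetic stand-in for flattened 32x32x3 images
    mean, comps = fit_pca(images, c=19)      # 19 components, as on the slide
    v = project(images, mean, comps)
    exact = float(np.sum((images[0] - images[1]) ** 2))
    print(exact, approx_ssd(v[0], v[1]))
```

Because the principal directions are orthonormal, the approximation is the squared length of the projected difference, so it never exceeds the exact SSD and becomes exact when all components are kept.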

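Statistics of very low-resolution images (Cont.): Image similarity metrics. To incorporate invariance to small translations, scaling and mirroring, they introduce the measure $D_{warp}^2 = \min_{\theta} \sum_{x,y,c} \left(I_1(x,y,c) - T_{\theta}[I_2](x,y,c)\right)^2$, where $T_{\theta}$ warps $I_2$ with transformation parameters $\theta$; the minimization over $\theta$ is done by gradient descent.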
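The warping measure described above can be approximated crudely without gradient descent. This sketch swaps in a different technique: an exhaustive search over horizontal mirroring and small integer translations (scaling is not searched), keeping the smallest SSD. It only illustrates the idea of a transformation-invariant distance; `translate`, `warp_distance` and `max_shift` are hypothetical names.

```python
import numpy as np


def translate(img: np.ndarray, dx: int, dy: int) -> np.ndarray:
    """Shift an (H, W, C) image by (dx, dy) pixels, replicating edge pixels into the gap."""
    h, w = img.shape[:2]
    pad = max(abs(dx), abs(dy))
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    # Output pixel (y, x) takes input pixel (y - dy, x - dx), clamped at the border.
    return padded[pad - dy: pad - dy + h, pad - dx: pad - dx + w]


def warp_distance(i1: np.ndarray, i2: np.ndarray, max_shift: int = 2) -> float:
    """Minimum SSD over {I_2, mirrored I_2} x integer translations up to max_shift pixels."""
    a = i1.astype(np.float64)
    best = np.inf
    for candidate in (i2, i2[:, ::-1]):  # original and horizontally mirrored I_2
        c = candidate.astype(np.float64)
        for dx in range(-max_shift, max_shift + 1):
            for dy in range(-max_shift, max_shift + 1):
                diff = a - translate(c, dx, dy)
                best = min(best, float(np.sum(diff * diff)))
    return best
```

A mirrored or slightly shifted copy of an image gets distance zero here, which plain SSD would not give; continuous optimization as on the slide would additionally handle sub-pixel shifts and scale.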

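Statistics of very low-resolution images (Cont.): Starting from $\hat{I}_2$, the image $I_2$ warped with the parameters obtained from the gradient-descent optimization of the previous measure, pixels are further allowed to shift within a 5 × 5 pixel window: $D_{shift}^2 = \min_{|D_{x,y}| \le w} \sum_{x,y,c} \left(I_1(x,y,c) - \hat{I}_2(x + D_x, y + D_y, c)\right)^2$.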
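The shift step can be written as a per-pixel minimum over a small window. A minimal sketch, assuming each pixel may choose its own offset independently and that a 5 × 5 window means offsets in [-2, 2] (so w = 2); the input is taken to be the already-warped $\hat{I}_2$, and `shift_distance` is a hypothetical name.

```python
import numpy as np


def shift_distance(i1: np.ndarray, i2_warped: np.ndarray, w: int = 2) -> float:
    """Sketch of D_shift^2: each pixel of I_1 is matched to the best pixel of the warped
    I_2 within a (2w+1) x (2w+1) window; w = 2 gives the 5 x 5 window."""
    a = i1.astype(np.float64)
    b = i2_warped.astype(np.float64)
    h, wid = a.shape[:2]
    padded = np.pad(b, ((w, w), (w, w), (0, 0)), mode="edge")
    best = np.full((h, wid), np.inf)
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            # Pixel (y, x) of this view holds I_2(y + dy, x + dx), clamped at the border.
            view = padded[w + dy: w + dy + h, w + dx: w + dx + wid]
            per_pixel = np.sum((a - view) ** 2, axis=2)  # sum over color channels
            best = np.minimum(best, per_pixel)
    return float(best.sum())
```

Since the zero offset is always among the candidates, the result is never larger than the plain SSD, and with w = 0 it reduces to it.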

Statistics of very low-resolution images (Cont.)

Impact of the similarity metric on performance (logarithmic scale): $D_{shift}$.

Solution for the semantic gap: WordNet voting scheme. WordNet provides the semantic relationships between the non-abstract nouns used as queries, and therefore between the collected images. WordNet tree: recognition of a test image can be performed at multiple semantic levels. Using the WordNet hierarchy tree, we can also retrieve images at higher (more general) semantic levels.
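One way to picture voting at multiple semantic levels: each retrieved neighbor votes for its own label and for every ancestor of that label in the tree, so general nodes such as "person" collect votes from many specific nouns. This is a minimal sketch; the tiny `PARENT` hierarchy is hypothetical, and it simplifies real WordNet, where a noun can have several senses and parents.

```python
from collections import Counter

# Hypothetical toy fragment of a WordNet-style hierarchy: child -> parent.
PARENT = {
    "pedestrian": "person",
    "scientist": "person",
    "person": "organism",
    "dog": "animal",
    "animal": "organism",
    "organism": "entity",
    "beach": "location",
    "location": "entity",
}


def ancestors(label: str) -> list[str]:
    """The label itself followed by every node on its path to the root."""
    path = [label]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def wordnet_votes(neighbor_labels) -> Counter:
    """Each retrieved neighbor votes for its own label and for all of its ancestors."""
    votes = Counter()
    for label in neighbor_labels:
        votes.update(ancestors(label))
    return votes


if __name__ == "__main__":
    votes = wordnet_votes(["pedestrian", "scientist", "dog", "beach"])
    print(votes["person"], votes["organism"], votes["entity"])
```

Dividing a node's count by the number of neighbors gives a score that can be thresholded, or used to rank images, at whatever semantic level the task needs.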

Solution for the semantic gap (Cont.)

Experiments: images belonging to “person” in the WordNet tree, with similarity measured by $D_{shift}$.

Experiments – person detection: Does the image contain a person or not? Existing detectors: face detection, head-and-shoulders detection, profile faces.

Experiments (Cont.) – person detection: comparison by the size of the person in the image.

Experiments (Cont.) – person detection Person detection

Experiments (Cont.) – person detection Person detection (head >20%)

Experiments (Cont.) – person detection: evaluation on AltaVista images, reordering the images with the WordNet voting scheme.

Experiments (Cont.) – person detection

Experiments – person localization: Multiple putative crops are extracted from the high-resolution query image. Each crop is resized to 32 × 32 pixels and used to query the tiny-image dataset to obtain its retrieval set. To reduce the number of crops, the image is segmented with normalized cuts, producing around 10 segments, and all possible combinations of contiguous segments are considered.
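Enumerating "all possible combinations of contiguous segments" amounts to listing every connected subset of the segment adjacency graph; with around 10 segments there are at most 2^10 - 1 = 1023 subsets to test, so brute force is cheap. A minimal sketch, assuming the segmentation has already been turned into an adjacency dict (segment id to neighboring ids); the helper names are hypothetical, and cropping each subset's bounding box and resizing it to 32 × 32 is left out.

```python
from itertools import combinations


def is_connected(nodes, adjacency) -> bool:
    """True if `nodes` form a single connected piece in the segment adjacency graph."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nb in adjacency[stack.pop()]:
            if nb in nodes and nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen == nodes


def contiguous_combinations(adjacency) -> list[frozenset]:
    """All non-empty sets of segments that are contiguous (connected)."""
    segments = sorted(adjacency)
    result = []
    for k in range(1, len(segments) + 1):
        for combo in combinations(segments, k):
            if is_connected(combo, adjacency):
                result.append(frozenset(combo))
    return result


if __name__ == "__main__":
    # Hypothetical segmentation: three segments in a row, 0 touches 1, 1 touches 2.
    chain = {0: [1], 1: [0, 2], 2: [1]}
    print(contiguous_combinations(chain))
```

For the three-segment chain, {0, 2} is skipped because those two segments do not touch, leaving six candidate regions.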

Experiments (Cont.) – person localization: similarity measure: $D_{shift}$; number of nearest neighbors: 80.
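Retrieving the 80 nearest neighbors of a 32 × 32 query can be done by brute force over the dataset. This is a minimal sketch that, for speed and simplicity, swaps in plain SSD instead of the $D_{shift}$ measure named on the slide; `k_nearest` is a hypothetical helper, and the dataset is assumed to fit in memory as one array of shape (num_images, 32, 32, 3).

```python
import numpy as np


def k_nearest(query: np.ndarray, dataset: np.ndarray, k: int = 80) -> np.ndarray:
    """Indices of the k dataset images closest to `query` under plain SSD, nearest first."""
    flat = dataset.reshape(len(dataset), -1).astype(np.float64)
    diffs = flat - query.reshape(-1).astype(np.float64)
    dists = np.einsum("ij,ij->i", diffs, diffs)  # SSD from the query to every image
    k = min(k, len(dists))
    idx = np.argpartition(dists, k - 1)[:k]      # the k smallest, in arbitrary order
    return idx[np.argsort(dists[idx], kind="stable")]
```

`argpartition` finds the k smallest distances without fully sorting the whole dataset; only those k are then sorted.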

Experiments – scene recognition: retrieving images with the semantic meaning “location”.

Experiments (Cont.) – scene recognition: images with high votes for “location” vs. images with low votes for “location”.

Experiments (Cont.) – Scene recognition

Experiments – image annotation: the target object is either absent or occupies at least 20% of the image pixels; 80 nearest neighbors are used.

Conclusion: Their experiments show that 32 × 32 is the minimum color image resolution for reliable object and scene recognition. The dataset of 79 million images provides a reasonable density over the manifold of natural images. With this very large dataset and the semantic voting scheme, the approach performs well in person detection, person localization and scene recognition.

References
1. B. C. Russell, A. Torralba, K. Murphy, W. T. Freeman. LabelMe: a database and web-based tool for image annotation. Intl. J. Computer Vision, 77(1-3), 2008.
2. C. Fellbaum. WordNet: An Electronic Lexical Database. Bradford Books, 1998.