80 million tiny images: Large dataset for object and scene recognition
A. Torralba, R. Fergus, W. T. Freeman
Ron Yanovich, Guy Peled


Internet 2012 in numbers
- 7 petabytes: how much photo content Facebook added every month.
- 300 million: number of new photos added every day to Facebook.
- 5 billion: total number of photos uploaded to Instagram since its start, reached in September 2012.
- 58: number of photos uploaded every second to Instagram.
- 1: Apple iPhone 4S was the most popular camera on Flickr.

Image Retrieval
Image search is a specialized data search used to find images.
Search methods:
- Image meta search
- Content-based image retrieval

Image meta search
Search of images based on associated metadata such as keywords and text.
Google Images: the keywords for the image search are based on the filename of the image, the link text pointing to the image, and text adjacent to the image.
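
A minimal sketch of metadata-based ranking, assuming each image record carries the three text fields named on the slide. The record layout, the `tokens` and `meta_score` helpers, and the sample data are hypothetical; this is not how any real search engine scores images.

```python
def tokens(text):
    """Lowercase a string and split it into a set of words."""
    for separator in ("-", "_", "."):
        text = text.replace(separator, " ")
    return set(text.lower().split())


def meta_score(query, record):
    """Count query words found in the filename, link text and adjacent text."""
    query_words = tokens(query)
    fields = (record["filename"], record["link_text"], record["adjacent_text"])
    return sum(len(query_words & tokens(field)) for field in fields)


# Hypothetical records; no pixel data is inspected at any point.
records = [
    {"filename": "IMG_0042.jpg", "link_text": "photo",
     "adjacent_text": "Our garden sprinkler."},
    {"filename": "red_carrot.jpg", "link_text": "fresh carrot",
     "adjacent_text": "A carrot from the garden."},
]

ranked = sorted(records, key=lambda r: meta_score("carrot", r), reverse=True)
print([r["filename"] for r in ranked])
```

An image whose surrounding text never mentions the query word scores zero however well its pixels match, which is the weakness that content-based retrieval tries to address.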

Content-based image retrieval (CBIR)
The search analyzes the actual contents of the image: colors, shapes, textures, etc. The most common method for comparing two images in content-based image retrieval is an image distance measure. Many CBIR systems have been developed, but the problem of retrieving images on the basis of their pixel content remains largely unsolved.

prag.diee.unica.it

Why not combine both methods?

Primary goals
- 79,000,000 images collected from the WWW
- Image matching similar to Google search prediction (the “Did you mean?” tool)

The problem
79,000,000 images:
- Large storage
- Long processing time

Collecting ~80,000,000 images
- Using image search engines: Altavista, Ask, Flickr, Cydral, Google, Picsearch and Webshots
- 760GB on one hard disk?

Creating image dataset
Each image is labeled with one of the 75,062 non-abstract nouns in English, as listed in the WordNet lexical database. The result is a large semantic tree.

What is WordNet?
WordNet® is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.

[Figure: two branches of the WordNet tree. carrot > plant root > plant organ > plant part > natural object > object, physical object > entity, physical thing > entity; sprinkler > mechanical device > mechanism]

Reduce space and process time
At a size of 32x32 we can get a correct recognition rate of more than 80%.

Reduce space and process time
Moving from 256x256 to 32x32.
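
To make the saving concrete, here is a minimal sketch of the size reduction. It assumes plain block averaging on a grayscale image and 8-bit RGB storage; the slides do not say which resampling filter was actually used, and `downsample` is a hypothetical helper.

```python
def downsample(image, factor):
    """Shrink a square grayscale image by averaging factor x factor blocks."""
    size = len(image) // factor
    small = []
    for block_y in range(size):
        row = []
        for block_x in range(size):
            block = [
                image[block_y * factor + dy][block_x * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


# Synthetic 256x256 gradient standing in for a real photo.
big = [[(x + y) % 256 for x in range(256)] for y in range(256)]
tiny = downsample(big, 8)

bytes_big = 256 * 256 * 3   # uncompressed 8-bit RGB (assumption)
bytes_tiny = 32 * 32 * 3
reduction = bytes_big // bytes_tiny
print(len(tiny), len(tiny[0]), bytes_tiny, reduction)
```

Under these assumptions each tiny image needs 3,072 bytes uncompressed, 64 times less than its 256x256 counterpart.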

Reduce space and process time
Studies of face perception have shown that only 16x16 pixels are needed for robust face recognition. This remarkable performance is also found in a scene recognition task.

Reduce space and process time
Speech recognition uses 10^6 data points. Current experiments in object recognition typically use 10^2 to 10^4.

Reduce space and process time
- Human visual space: (100 years) * (30 frames per sec) ≈ 10^11 frames
- All possible 32x32 images ≈ 10^7400 images, most of which are just noise
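
The two orders of magnitude can be checked with a few lines of arithmetic. This sketch assumes 365-day years and 8-bit RGB pixels (256 values per channel, 3 channels); neither assumption is stated on the slide.

```python
import math

# Frames seen in 100 years at 30 frames per second.
SECONDS_PER_YEAR = 365 * 24 * 3600
frames_seen = 100 * SECONDS_PER_YEAR * 30
log10_frames = math.log10(frames_seen)

# Number of distinct 32x32 RGB images is 256 ** (32 * 32 * 3);
# work with its base-10 logarithm to avoid a huge integer.
values_per_image = 32 * 32 * 3
log10_images = values_per_image * math.log10(256)

print(f"frames in 100 years: about 10^{log10_frames:.1f}")
print(f"possible 32x32 images: about 10^{log10_images:.0f}")
```

The second exponent comes out just under 7400, which matches the rounded figure on the slide.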

Reduce space and process time
We conclude that 32x32 pixels contain enough information for our purpose. The advantage is the ability to work with millions of images (~10^8).

Statistics of low-res images
Image matching methods:
- SSD (sum of squared differences)
- Warp
- Shift (per pixel)
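
A minimal sketch of two of the measures listed above, on grayscale images stored as nested lists. The zero-mean, unit-norm normalization and the window `radius` are assumptions, the Warp step (a global transformation of the image) is omitted, and the function names are hypothetical rather than taken from the slides.

```python
import math


def normalize(image):
    """Return a zero-mean, unit-norm copy of a 2D grayscale image."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    norm = math.sqrt(sum((p - mean) ** 2 for p in pixels)) or 1.0
    return [[(p - mean) / norm for p in row] for row in image]


def ssd(a, b):
    """Sum of squared differences between two same-sized images."""
    return sum((pa - pb) ** 2
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def shift_distance(a, b, radius=2):
    """SSD variant: each pixel of a is matched to the closest-valued
    pixel of b inside a (2 * radius + 1) square window around it."""
    height, width = len(a), len(a[0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            total += min(
                (a[y][x] - b[ny][nx]) ** 2
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            )
    return total


# Two 8x8 images: a vertical bar, and the same bar moved one pixel right.
bar = normalize([[1.0 if x == 3 else 0.0 for x in range(8)] for _ in range(8)])
moved = normalize([[1.0 if x == 4 else 0.0 for x in range(8)] for _ in range(8)])

print(ssd(bar, moved), shift_distance(bar, moved, radius=1))
```

Plain SSD penalizes the one-pixel displacement, while the per-pixel shift absorbs it. Because the window always includes the unshifted position, the shift distance can never exceed the SSD.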

Statistics of low-res images

Recognition
The goal is to recognize objects and scenes using the simple SSD, Warp and Shift measures instead of complex matching algorithms.
Given an image, its neighbors are found using a similarity measure (D-Shift).

Recognition
Each neighbor in turn votes for its branch within the WordNet tree.
Classification: image search returns an object.
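
The voting step can be sketched as follows, assuming the nearest neighbors have already been found and each carries one noun label. The `PARENT` table is a toy hierarchy invented for illustration, not an excerpt of WordNet, and `branch`, `vote` and `classify` are hypothetical helpers.

```python
from collections import Counter

# Toy child -> parent table (hypothetical, not real WordNet data).
PARENT = {
    "politician": "person",
    "firefighter": "person",
    "person": "organism",
    "dog": "animal",
    "animal": "organism",
    "organism": "entity",
    "sprinkler": "device",
    "device": "entity",
}


def branch(label):
    """Return the label followed by all of its ancestors up to the root."""
    path = [label]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def vote(neighbor_labels):
    """Each neighbor adds one vote to every node on its branch."""
    votes = Counter()
    for label in neighbor_labels:
        votes.update(branch(label))
    return votes


def classify(neighbor_labels, candidates):
    """Pick the candidate node that collected the most votes."""
    votes = vote(neighbor_labels)
    return max(candidates, key=lambda node: votes[node])


neighbors = ["politician", "firefighter", "dog", "politician", "sprinkler"]
votes = vote(neighbors)
print(votes["person"], votes["entity"])
print(classify(neighbors, ["person", "animal", "device"]))
```

Votes accumulate toward the root, so neighbors with different specific labels (here "politician" and "firefighter") still reinforce each other at the shared "person" node.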

Person detection Is it a person?

Person detection
Standard approach: face detection algorithm

Person detection
Better approach: using the image DB
More than 23% of the images contain pictures of people.

Person detection
Evaluating performance with two different sets of test images:
- Evaluation using randomly drawn images
- Evaluation using Altavista images

Evaluation using randomly drawn images
- 1,125 images randomly drawn from the DB
- People were manually segmented in each image
Findings:
- Large appearance → better performance
- Weaker labels → largest object

Large appearance → better performance
Better performance is achieved when the person occupies more than 20% of the image.

Evaluation using Altavista images
- 1,018 images drawn by searching for the ‘person’ label
- Images classified using WordNet → reordered labels

Scene recognition
A search for images that match an entire scene rather than a specific object.
Tagging 1,125 randomly drawn pictures as “City”, “River”, “Field” or “Mountain”.

DB Size: 80,000, ,000 8,000
The larger the database, the higher the detection rate.

Achievements
- Building a large dataset of 79 million labeled 32x32 color images.
- Showing that a simple non-parametric method, in conjunction with a large dataset, can give reasonable performance on object recognition tasks.
- Tasks such as person detection and scene recognition perform as well as leading class-specific detectors.

Conclusions
It is possible to put less effort into the modeling part of object recognition (developing a suitable parametric representation for recognition) and instead improve the dataset itself, which can help to solve the same problem.

References
- 80 million tiny images
- ImageNet
- WordNet
- Precision and recall
- ROC curve