Large dataset for object and scene recognition: 80 million tiny images
A. Torralba, R. Fergus, W. T. Freeman
Ron Yanovich, Guy Peled

3 Internet 2012 in numbers (http://royal.pingdom.com/)
– 7 petabytes: how much photo content Facebook added every month.
– 300 million: number of new photos added every day to Facebook.
– 5 billion: total number of photos uploaded to Instagram since its start, reached in September 2012.
– 58: number of photos uploaded every second to Instagram.
– 1: the Apple iPhone 4S was the most popular camera on Flickr.

4 Image Retrieval
Image search is a specialized data search used to find images. Search methods:
– Image meta search
– Content-based image retrieval

5 Image meta search
Search of images based on associated metadata such as keywords, text, etc.
Google Images: the keywords for the image search are based on the filename of the image, the link text pointing to the image, and the text adjacent to the image. (http://en.wikipedia.org/wiki/Google_Images)
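Keyword-based meta search is essentially an inverted index over the text attached to each image. The sketch below assumes each image carries three hypothetical metadata fields (`filename`, `link_text`, `adjacent_text`); the tokenizer and the "all keywords must match" rule are illustrative choices, not how Google Images actually ranks results.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(images: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """Inverted index: keyword -> ids of the images whose metadata contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for image_id, metadata in images.items():
        # Filename, link text and adjacent text are all treated as plain text.
        for text in metadata.values():
            for word in tokenize(text):
                index[word].add(image_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the images whose metadata contains every keyword of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical metadata for two images.
images = {
    "img1": {"filename": "red_car.jpg", "link_text": "my new car",
             "adjacent_text": "Parked outside the house"},
    "img2": {"filename": "dsc001.jpg", "link_text": "holiday",
             "adjacent_text": "A red sunset over the sea"},
}
index = build_index(images)
print(sorted(search(index, "red car")))  # ['img1']
print(sorted(search(index, "red")))      # ['img1', 'img2']
```

Note that `img2` is returned for "red" only because of its surrounding text; the pixels are never inspected.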

6 Content-based image retrieval (CBIR)
The search analyzes the actual contents of the image: colors, shapes, textures, etc. The most common method for comparing two images in content-based image retrieval is an image distance measure. Many CBIR systems have been developed, but the problem of retrieving images on the basis of their pixel content remains largely unsolved. (http://en.wikipedia.org/wiki/Content-based_image_retrieval)
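As one concrete example of an image distance measure, the sketch below compares normalized color histograms with an L1 distance. The choice of histogram features, the 4 bins per channel and the pixel-list representation are assumptions for illustration; CBIR systems use many other measures.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit RGB values in 0..255


def color_histogram(pixels: List[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Normalized joint RGB histogram with bins_per_channel**3 bins."""
    step = 256 / bins_per_channel
    hist = [0.0] * bins_per_channel ** 3
    for r, g, b in pixels:
        # Quantize each channel to a bin index in 0..bins_per_channel-1.
        qr, qg, qb = (int(v // step) for v in (r, g, b))
        hist[(qr * bins_per_channel + qg) * bins_per_channel + qb] += 1
    return [count / len(pixels) for count in hist]


def histogram_distance(p: List[Pixel], q: List[Pixel]) -> float:
    """L1 distance between color histograms: 0.0 for identical color
    content, 2.0 when the two images share no color bin."""
    return sum(abs(a - b) for a, b in zip(color_histogram(p), color_histogram(q)))


red = [(255, 0, 0)] * 4
blue = [(0, 0, 255)] * 4
half = [(255, 0, 0)] * 2 + [(0, 0, 255)] * 2
print(histogram_distance(red, red))   # 0.0
print(histogram_distance(red, blue))  # 2.0
print(histogram_distance(red, half))  # 1.0
```

A histogram ignores where the colors are in the image, so two very different scenes with the same color distribution get distance 0.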

7 (Example images from prag.diee.unica.it and www.dailydawdle.com)

8 Why not combine both methods?

9 Primary goals
– 79,000,000 images collected from the WWW
– Image matching similar to Google's search prediction (the “Did you mean?” tool)

10 The problem
79,000,000 images:
– Large storage
– Long processing time

11 Collecting ~80,000,000 images
Using image search engines:
– Altavista, Ask, Flickr, Cydral, Google, Picsearch and Webshots
760 GB on one hard disk? (Image: www.apartmenttherapy.com)

12 Creating the image dataset
Each image is labeled with one of the 75,062 non-abstract nouns in English, as listed in the WordNet lexical database. The result is a large semantic tree.

13 What is WordNet?
WordNet® is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. (http://wordnet.princeton.edu)

14 (Figure: WordNet hypernym paths)
carrot → plant root → plant organ → plant part → natural object → object, physical object → entity, physical thing → entity
sprinkler → mechanical device → mechanism → …
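The "large semantic tree" can be pictured as the union of the hypernym paths of all labels. The sketch below merges such paths into a parent-to-children map and prints it. The carrot path follows the WordNet example; the sprinkler path is a hypothetical shortened chain (intermediate WordNet nodes omitted), and a real build would read all 75,062 paths from WordNet itself.

```python
from typing import Dict, List

# Hypernym chains, leaf first and root last.
PATHS: List[List[str]] = [
    ["carrot", "plant root", "plant organ", "plant part", "natural object",
     "object, physical object", "entity, physical thing", "entity"],
    # Hypothetical, shortened: the real WordNet chain has more nodes.
    ["sprinkler", "mechanical device", "mechanism", "entity"],
]


def build_tree(paths: List[List[str]]) -> Dict[str, List[str]]:
    """Merge leaf-to-root paths into a parent -> children map."""
    children: Dict[str, List[str]] = {}
    for path in paths:
        root_to_leaf = list(reversed(path))
        for parent, child in zip(root_to_leaf, root_to_leaf[1:]):
            siblings = children.setdefault(parent, [])
            if child not in siblings:
                siblings.append(child)
    return children


def render(children: Dict[str, List[str]], node: str, depth: int = 0) -> List[str]:
    """Indented text view of the subtree rooted at `node`."""
    lines = ["  " * depth + node]
    for child in children.get(node, []):
        lines.extend(render(children, child, depth + 1))
    return lines


tree = build_tree(PATHS)
print("\n".join(render(tree, "entity")))
```

Shared ancestors (here only `entity`) are stored once, which is what turns thousands of separate label chains into one tree.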

15 (Figure source: http://www.cs.princeton.edu/courses/archive/spr07/cos226/assignments/wordnet.html)

16 Reduce space and process time
With a size of 32×32 we can get a correct recognition rate of more than 80%.

17 Reduce space and process time
Moving from 256×256 to 32×32
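To make the saving concrete, the sketch below shrinks a square grayscale image by averaging non-overlapping blocks and compares uncompressed storage at both sizes. Block averaging and the 8-bit, 3-channel storage model are assumptions for illustration, not necessarily the resizing used to build the dataset; for color, the same averaging would be applied per channel.

```python
from typing import List

Gray = List[List[float]]


def block_average(image: Gray, size: int) -> Gray:
    """Shrink a square image to size x size by averaging non-overlapping
    blocks. Assumes the side length is a multiple of `size`
    (256 -> 32 uses 8x8 blocks)."""
    factor = len(image) // size
    out = []
    for by in range(size):
        row = []
        for bx in range(size):
            block = [image[by * factor + dy][bx * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def bytes_per_image(side: int, channels: int = 3) -> int:
    """Storage for an uncompressed image with 8 bits per channel."""
    return side * side * channels


print(bytes_per_image(256), bytes_per_image(32))    # 196608 3072
print(bytes_per_image(256) // bytes_per_image(32))  # 64

# A 256x256 checkerboard shrinks to a uniform gray 32x32 image.
big = [[float((x + y) % 2) for x in range(256)] for y in range(256)]
small = block_average(big, 32)
print(len(small), len(small[0]), small[0][0])  # 32 32 0.5
```

Each tiny image is 64 times smaller, and every distance computation touches 64 times fewer values.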

18 Reduce space and process time
Studies on face perception have shown that only 16×16 pixels are needed for robust face recognition. This remarkable performance is also found in a scene recognition task.

19 Reduce space and process time
Speech recognition uses 10^6 data points, whereas current experiments in object recognition typically use 10^2 to 10^4.

20 Reduce space and process time
Images seen by a human in a lifetime: (100 years) × (30 frames per second) ≈ 10^11
All possible 32×32 images ≈ 10^7400 images, most of which are just noise
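Both orders of magnitude can be checked directly. The sketch assumes 365.25-day years and 8 bits per color channel (256 levels for each of the 32×32×3 values); logarithms avoid building the enormous integer.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Frames seen in a lifetime: 100 years at 30 frames per second.
lifetime_frames = 100 * SECONDS_PER_YEAR * 30
print(f"lifetime frames ~ 10^{math.log10(lifetime_frames):.1f}")  # ~ 10^11.0

# Distinct 32x32 color images with 8 bits per channel: 256 ** (32 * 32 * 3).
values = 32 * 32 * 3
exponent = values * math.log10(256)
print(f"possible 32x32 images ~ 10^{exponent:.0f}")  # ~ 10^7398, i.e. about 10^7400
```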

21 Reduce space and process time
We conclude that 32² pixels contain enough information for our purpose. The advantage is the ability to work with millions of images (~10^8).

22 Statistics of low-res images
Image matching methods:
– SSD (sum of squared differences)
– Warp
– Shift (per pixel)
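Two of the matching methods can be sketched in a few lines. SSD compares pixels at identical positions; the per-pixel shift variant lets each pixel of the first image match the best pixel of the second within a small window, so small misalignments are not penalized. The window half-width `w = 1`, the nested-list image layout and taking the minimum over whole pixels (all channels together) are assumptions of this sketch; the warp step is omitted.

```python
from typing import List

# An image is indexed as image[row][col][channel] (assumption for this sketch).
Image = List[List[List[float]]]


def d_ssd(a: Image, b: Image) -> float:
    """Sum of squared differences over every pixel and color channel."""
    return sum(
        (pa - pb) ** 2
        for row_a, row_b in zip(a, b)
        for px_a, px_b in zip(row_a, row_b)
        for pa, pb in zip(px_a, px_b)
    )


def d_shift(a: Image, b: Image, w: int = 1) -> float:
    """Per-pixel shift distance: each pixel of `a` is compared with the best
    matching pixel of `b` inside a (2w+1) x (2w+1) window around the same
    position, and these minimum squared differences are summed."""
    height, width = len(a), len(a[0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            best = float("inf")
            for dy in range(-w, w + 1):
                for dx in range(-w, w + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < height and 0 <= xx < width:
                        diff = sum((pa - pb) ** 2 for pa, pb in zip(a[y][x], b[yy][xx]))
                        best = min(best, diff)
            total += best
    return total


def one_hot(height: int, width: int, row: int, col: int) -> Image:
    """Single-channel test image with one bright pixel."""
    return [[[1.0 if (r, c) == (row, col) else 0.0] for c in range(width)]
            for r in range(height)]


a = one_hot(3, 3, 1, 1)
b = one_hot(3, 3, 1, 2)  # same image, bright pixel moved one column right
print(d_ssd(a, b))    # 2.0
print(d_shift(a, b))  # 0.0
```

The one-pixel displacement costs SSD a full mismatch but costs the shift distance nothing, at the price of (2w+1)² comparisons per pixel.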

23 Statistics of low-res images

27 Recognition
The goal is to recognize objects and scenes using the SSD, Warp and Shift matching methods instead of complex matching algorithms. Given an image, its neighbors are found using a similarity measure (D-Shift).
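Finding the neighbors is a k-nearest-neighbor search over the database. For brevity this sketch swaps in plain SSD on flattened images instead of D-Shift, and the tiny three-value "images" and their labels are made up; a real search over ~80 million images would need a much faster index.

```python
import heapq
from typing import List, Sequence, Tuple


def ssd(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared differences between two flattened images."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbors(query: Sequence[float],
                      database: List[Tuple[List[float], str]],
                      k: int) -> List[str]:
    """Labels of the k database images closest to the query (smallest SSD first)."""
    best = heapq.nsmallest(k, database, key=lambda item: ssd(query, item[0]))
    return [label for _, label in best]


# Made-up flattened images with labels.
database = [
    ([0.0, 0.0, 0.0], "dark"),
    ([1.0, 1.0, 1.0], "bright"),
    ([0.9, 1.0, 1.0], "bright"),
    ([0.1, 0.0, 0.0], "dark"),
]
print(nearest_neighbors([1.0, 0.9, 1.0], database, k=2))  # ['bright', 'bright']
```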

28 Recognition
Each neighbor in turn votes for its branch within the WordNet tree.
(Figure panels: Classification; Image search returns an object)
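Voting "for a branch" can be read as: each neighbor adds one vote to its own label and to every ancestor of that label, and classification picks the node with the most votes among some set of candidates (for example all nodes at one level of the tree). The sketch below follows that reading; the hypernym paths are toy, hypothetical data, not real WordNet entries.

```python
from collections import Counter
from typing import Dict, List

# Toy hypernym paths (label first, root last). Simplified and hypothetical;
# a real system would read the full paths from WordNet.
HYPERNYMS: Dict[str, List[str]] = {
    "carrot": ["carrot", "plant root", "plant part", "natural object", "entity"],
    "radish": ["radish", "plant root", "plant part", "natural object", "entity"],
    "sprinkler": ["sprinkler", "mechanical device", "mechanism", "entity"],
}


def branch_votes(neighbor_labels: List[str]) -> Counter:
    """Each neighbor votes once for its label and once for every ancestor."""
    votes: Counter = Counter()
    for label in neighbor_labels:
        votes.update(HYPERNYMS[label])
    return votes


def classify(neighbor_labels: List[str], candidates: List[str]) -> str:
    """Pick the candidate node with the most votes."""
    votes = branch_votes(neighbor_labels)
    return max(candidates, key=lambda node: votes[node])


neighbors = ["carrot", "radish", "sprinkler"]
print(branch_votes(neighbors)["plant root"])                    # 2
print(classify(neighbors, ["plant root", "mechanical device"]))  # plant root
```

No single leaf label wins here (each gets one vote), but the votes accumulate higher in the tree, so the "plant root" branch still emerges from noisy neighbors.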

33 Person detection Is it a person?

34 Person detection
Standard approach: a face detection algorithm

35 Person detection
Better approach: using the image database, in which more than 23% of the images contain pictures of people

36 Person detection
Performance is evaluated on two different sets of test images:
– Evaluation using randomly drawn images
– Evaluation using Altavista images

37 Evaluation using randomly drawn images
1,125 images were randomly drawn from the database, and people were manually segmented in each image.
Findings:
– Large appearance → better performance
– Weaker labels → largest object

38 Large appearance → better performance
Better performance is achieved when the person occupies more than 20% of the image.
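Since people were manually segmented, the "more than 20% of the image" criterion can be computed from each segmentation mask. This sketch assumes a mask is a 2-D grid of booleans marking person pixels; the helper names are hypothetical.

```python
from typing import List


def person_fraction(mask: List[List[bool]]) -> float:
    """Fraction of image pixels covered by the person segmentation mask."""
    total = sum(len(row) for row in mask)
    covered = sum(sum(1 for v in row if v) for row in mask)
    return covered / total


def is_large_appearance(mask: List[List[bool]], threshold: float = 0.2) -> bool:
    """True when the person occupies more than `threshold` of the image."""
    return person_fraction(mask) > threshold


# A 32x32 mask whose top-left 16x16 block is labeled as person.
mask = [[r < 16 and c < 16 for c in range(32)] for r in range(32)]
print(person_fraction(mask))      # 0.25
print(is_large_appearance(mask))  # True
```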

41 Evaluation using Altavista images
1,018 images obtained by searching for the label ‘person’. The images were classified using WordNet → reordered labels.
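Reordering a result list is usually judged with precision and recall along the ranking (see the precision/recall and ROC references). The sketch ranks images by a score and reports (precision, recall) after each position. Using, say, the share of neighbor votes for the "person" node as that score is our assumption, and the four scores and ground-truth flags below are made up.

```python
from typing import List, Tuple


def precision_recall(scores: List[float],
                     is_person: List[bool]) -> List[Tuple[float, float]]:
    """Rank images by decreasing score and return (precision, recall) after
    each position. Assumes at least one image is a true person image."""
    ranked = sorted(zip(scores, is_person), key=lambda pair: pair[0], reverse=True)
    total_positive = sum(is_person)
    hits = 0
    curve = []
    for rank, (_, positive) in enumerate(ranked, start=1):
        hits += positive
        curve.append((hits / rank, hits / total_positive))
    return curve


curve = precision_recall([0.9, 0.2, 0.7, 0.4], [True, False, True, False])
print(curve)
```

A good reordering pushes the true person images to the top, so precision stays high until recall reaches 1.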

43 Scene recognition
A search for images that match an entire scene rather than a specific object. 1,125 randomly drawn pictures were tagged with the labels “City”, “River”, “Field” and “Mountain”.

44 Database size: 80,000,000 vs. 800,000 vs. 8,000
The larger the database, the higher the detection rate.

46 Achievements
– Building a large dataset of 79 million labeled 32×32 color images.
– Showing that a simple non-parametric method, in conjunction with a large dataset, can give reasonable performance on object recognition tasks.
– Tasks such as person detection and scene detection perform as well as leading class-specific detectors.

47 Conclusions
It is possible to put less effort into the modeling part of object recognition (developing suitable parametric representations for recognition) and instead improve the dataset itself, which can help solve the same problem.

48 References
– 80 million tiny images: http://people.csail.mit.edu/torralba/publications/80millionImages.pdf
– ImageNet: http://wordnet.cs.princeton.edu/papers/imagenet_cvpr09.pdf
– WordNet: http://wordnet.princeton.edu/wordnet/
– Precision and recall: http://en.wikipedia.org/wiki/Precision_and_recall
– ROC curve: http://en.wikipedia.org/wiki/Receiver_operating_characteristic
Images taken from:
– http://royal.pingdom.com/
– http://en.wikipedia.org/wiki/Google_Images
– http://en.wikipedia.org/wiki/Content-based_image_retrieval
– http://www.prag.diee.unica.it
– http://www.dailydawdle.com
– www.apartmenttherapy.com
– http://www.cs.princeton.edu/courses/archive/spr07/cos226/assignments/wordnet.html
