Analysis of Large Scale Visual Recognition


Analysis of Large Scale Visual Recognition
Fei-Fei Li and Olga Russakovsky
Olga Russakovsky, Jia Deng, Zhiheng Huang, Alex Berg, Li Fei-Fei. Detecting avocados to zucchinis: what have we done, and where are we going? ICCV 2013. http://image-net.org/challenges/LSVRC/2012/analysis

Want to recognize all objects in the world
[Example images: backpack, strawberry, flute, traffic light, matchstick, bathing cap, sea lion, racket]

Large-scale recognition
Want to recognize all objects in the world → need benchmark datasets

PASCAL VOC 2005-2012
20 object classes, 22,591 images
Tasks: Classification (e.g., person, motorcycle), Detection, Segmentation, Action (e.g., riding bicycle)
Everingham, Van Gool, Williams, Winn and Zisserman. The PASCAL Visual Object Classes (VOC) Challenge. IJCV 2010.

Large Scale Visual Recognition Challenge (ILSVRC) 2010-2012
From PASCAL VOC's 20 object classes and 22,591 images to 1000 object classes and 1,431,167 images (e.g., dalmatian)
http://image-net.org/challenges/LSVRC/{2010,2011,2012}

Variety of object classes in ILSVRC


ILSVRC Task 1: Classification
Ground truth: steel drum. At that time it was infeasible to label each image with all objects present, so each image has a single ground-truth label.

ILSVRC Task 1: Classification
Ground truth: steel drum
Output (✔): scale, t-shirt, steel drum, drumstick, mud turtle
Output (✗): scale, t-shirt, giant panda, drumstick, mud turtle

Accuracy = (1 / 100,000) Σ_i 1[correct on image i], summed over the 100,000 test images; an image counts as correct if any of the 5 predicted labels matches the ground truth.
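The flat-cost metric above is simple to state in code. Below is a minimal sketch (not the official evaluation script; label strings and list layout are illustrative assumptions):

```python
def top5_accuracy(predictions, ground_truth):
    """Fraction of images whose ground-truth label appears among
    the 5 predicted labels for that image (ILSVRC-style flat cost)."""
    assert len(predictions) == len(ground_truth)
    correct = sum(1 for preds, gt in zip(predictions, ground_truth)
                  if gt in preds[:5])
    return correct / len(ground_truth)

# Toy example mirroring the slide: 2 images, 5 guesses each.
preds = [["scale", "t-shirt", "steel drum", "drumstick", "mud turtle"],
         ["scale", "t-shirt", "giant panda", "drumstick", "mud turtle"]]
truth = ["steel drum", "steel drum"]
print(top5_accuracy(preds, truth))  # 0.5 (first image ✔, second ✗)
```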

ILSVRC Task 1: Classification results (accuracy, 5 predictions/image):
2010: 0.72
2011: 0.74
2012: 0.85
[Chart also showed the number of submissions per year.]

ILSVRC Task 2: Classification + Localization Steel drum

ILSVRC Task 2: Classification + Localization
Ground truth: steel drum, with bounding box
Output (✔): folding chair, persian cat, loud speaker, steel drum, picket fence, each with a bounding box; steel drum correctly labeled and localized
Output (✗, bad localization): same five labels, but the steel drum box misses the object
Output (✗, bad classification): folding chair, persian cat, loud speaker, picket fence, king penguin; steel drum not predicted

Accuracy = (1 / 100,000) Σ_i 1[correct on image i], over the 100,000 test images; a prediction is correct only if both the label and the bounding box are right.
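The cls+loc criterion combines a label match with a box-overlap test. A minimal sketch follows; the IoU ≥ 0.5 threshold is an assumption here, matching the standard PASCAL-style overlap criterion rather than quoting this talk:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def correct_cls_loc(predictions, gt_label, gt_box, thresh=0.5):
    """True if any of the 5 (label, box) predictions matches the
    ground-truth label AND overlaps the ground-truth box enough."""
    return any(label == gt_label and iou(box, gt_box) >= thresh
               for label, box in predictions[:5])

print(correct_cls_loc([("steel drum", (10, 10, 50, 50))],
                      "steel drum", (12, 12, 48, 52)))  # True
```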

ILSVRC Task 2: Classification + Localization results (accuracy, 5 predictions/image): [chart comparing ISI, SuperVision, and OXFORD_VGG]

What happens under the hood on classification+localization?
In order to do this, we need a couple of things first. We begin by introducing a subset of 500 classes of ILSVRC which is suitable for studying localization. We show that it has many of the same challenges as the PASCAL VOC dataset. We'll use these 500 classes for our analysis. Then we'll briefly discuss some of the winning algorithms of the 2012 challenge. Given these, we will point out where the two leading approaches differ.

What happens under the hood on classification+localization?
- Preliminaries: ILSVRC-500 (2012) dataset
- Leading algorithms
- A closer look at small objects
- A closer look at textured objects

ILSVRC (2012): 1000 object classes, ranging from easy to localize to hard to localize

ILSVRC-500 (2012): the 500 classes with the smallest objects

Object scale (fraction of image area occupied by the target object):
ILSVRC-500 (2012), 500 object categories: 25.3%
PASCAL VOC (2012), 20 object categories: 25.2%
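Object scale, as defined above, is straightforward to compute from a bounding box. A minimal sketch (box format (x1, y1, x2, y2) is an assumption):

```python
def object_scale(box, img_w, img_h):
    """Fraction of the image area occupied by the target object's box."""
    x1, y1, x2, y2 = box
    return ((x2 - x1) * (y2 - y1)) / (img_w * img_h)

# A 100x100 box in a 500x400 image covers 5% of the image.
print(object_scale((0, 0, 100, 100), 500, 400))  # 0.05
```

Averaging this quantity over a class's images gives the per-class scale used to pick the 500 smallest-object classes.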

Chance performance of localization
[Figure: ground-truth steel drum box plus candidate boxes B1-B9; N = 9 here]
For a single category and its set of images, chance performance can be computed from the candidate boxes. It is correlated with object scale, but also takes the distribution of object locations into account.
ILSVRC-500 (2012), 500 object categories: 8.4%
PASCAL VOC (2012), 20 object categories: 8.8%
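One plausible reading of this metric, sketched below under the assumption that a candidate box counts as a hit when its IoU with the ground truth is at least 0.5: draw one of the N candidate boxes uniformly at random and measure how often it localizes the object.

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def chance_localization(candidate_boxes, gt_box, thresh=0.5):
    """Probability that a candidate box drawn uniformly at random
    localizes the object (IoU >= thresh with the ground truth)."""
    hits = sum(1 for b in candidate_boxes if iou(b, gt_box) >= thresh)
    return hits / len(candidate_boxes)

cands = [(0, 0, 10, 10), (20, 20, 30, 30), (0, 0, 5, 10)]
print(chance_localization(cands, (0, 0, 10, 10)))  # 2 of 3 boxes hit
```

Averaging over a category's images yields the per-category numbers quoted above.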

Level of clutter
- Generate candidate object regions using selective search (van de Sande et al. ICCV 2011)
- Filter out regions inside the object
- Count the remaining regions
ILSVRC-500 (2012), 500 object categories: 128 ± 35 regions
PASCAL VOC (2012), 20 object categories: 130 ± 29 regions
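The three steps above can be sketched as follows. The talk uses selective-search proposals; here the proposals are plain box lists, and the 0.9 containment threshold for "inside the object" is an illustrative assumption:

```python
def level_of_clutter(proposals, gt_box, containment=0.9):
    """Count candidate regions that are NOT mostly inside the target
    object's box; more surviving regions means a more cluttered image."""
    def inside_fraction(r, g):
        # Fraction of region r's area contained in box g.
        ix1, iy1 = max(r[0], g[0]), max(r[1], g[1])
        ix2, iy2 = min(r[2], g[2]), min(r[3], g[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area = (r[2] - r[0]) * (r[3] - r[1])
        return inter / area if area > 0 else 0.0
    return sum(1 for r in proposals if inside_fraction(r, gt_box) < containment)

# One region inside the object (filtered), two in the background (counted).
props = [(10, 10, 40, 40), (60, 60, 90, 90), (40, 40, 80, 80)]
print(level_of_clutter(props, (0, 0, 50, 50)))  # 2
```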

What happens under the hood on classification+localization?
- Preliminaries: ILSVRC-500 (2012) dataset – similar to PASCAL
- Leading algorithms
- A closer look at small objects
- A closer look at textured objects

SuperVision (SV)
Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton (Krizhevsky NIPS12)
Image classification: deep convolutional neural network
- 7 hidden "weight" layers, 650K neurons, 60M parameters, 630M connections
- Rectified linear units, max pooling, dropout trick
- Randomly extracted 224x224 patches for more training data
- Trained with SGD on two GPUs for a week, fully supervised
Localization: regression on (x, y, w, h)
Very impressive results given poor localization
http://image-net.org/challenges/LSVRC/2012/supervision.pdf

OXFORD_VGG (VGG)
Karen Simonyan, Yusuf Aytar, Andrea Vedaldi, Andrew Zisserman
Image classification: Fisher vector + linear SVM (Sanchez CVPR11)
- Root-SIFT (Arandjelovic CVPR12), color statistics, augmentation with patch location (x, y) (Sanchez PRL12)
- Fisher vectors: 1024 Gaussians, 135K dimensions; no spatial pyramid; product quantization to compress
- Semi-supervised learning to find additional bounding boxes
- 1000 one-vs-rest SVMs trained with Pegasos SGD; 135M parameters!
Localization: deformable part-based models (Felzenszwalb PAMI10), without parts (root filter only)
http://image-net.org/challenges/LSVRC/2012/oxford_vgg.pdf

What happens under the hood on classification+localization?
- Preliminaries: ILSVRC-500 (2012) dataset – similar to PASCAL
- Leading algorithms: SV and VGG
- A closer look at small objects
- A closer look at textured objects

Results on ILSVRC-500
Cls+loc accuracy: SV 54.3%, VGG 45.8%

Difference in accuracy: SV versus VGG
Classification-only: per-class difference in classification accuracy (SV - VGG) plotted against object scale. SV is better on 452 classes, VGG on 34 (p < 0.001).
Classification+localization: per-class difference in cls+loc accuracy (SV - VGG) plotted against object scale. SV is better on 338 classes, VGG on 150.

Cumulative accuracy across scales
Classification-only: cumulative classification accuracy versus object scale, SV vs VGG
Classification+localization: cumulative cls+loc accuracy versus object scale, SV vs VGG
On the 205 classes with the smallest objects (object scale up to 0.24): about 40% vs 41% (VGG vs SV) for boxes, even though SV is not using sliding windows!

What happens under the hood on classification+localization?
- Preliminaries: ILSVRC-500 (2012) dataset – similar to PASCAL
- Leading algorithms: SV and VGG
- SV always great at classification, but VGG does better than SV at localizing small objects. WHY?
- A closer look at textured objects

Textured objects (ILSVRC-500)
Amount of texture ranges from low to high; 156 natural and 344 man-made classes.
No texture: 116 classes, average object scale 20.8%
Low texture: 189 classes, 23.7%
Medium texture: 143 classes, 23.5%
High texture: 52 classes, 25.0%

Textured objects (416 classes)
Classes subsampled so that average object scale is 20.8% at every level of texture:
No texture: 116 classes; low: 149; medium: 115; high: 35

Localizing textured objects (416 classes, same average object scale at each level of texture)
[Chart: localization accuracy of SV and VGG, on correctly classified images, by level of texture; 156 natural, 344 man-made classes]

What happens under the hood on classification+localization?
- Preliminaries: ILSVRC-500 (2012) dataset – similar to PASCAL
- Leading algorithms: SV and VGG
- SV always great at classification, but VGG does better than SV at localizing small objects
- Textured objects easier to localize, especially for SV

NEW: ILSVRC 2013 with large-scale object detection
Fully annotated: 200 object classes across 60,000 images (e.g., person, car, motorcycle, helmet)
Allows evaluation of generic object detection in cluttered scenes at scale
http://image-net.org/challenges/LSVRC/2013/

ILSVRC 2013 detection statistics, versus PASCAL VOC 2012:
Object classes: 20 → 200 (10x)
Training: 5.7K → 395K images; 13.6K → 345K objects (25x)
Validation: 5.8K → 20.1K images; 13.8K → 55.5K objects (4x)
Testing: 11.0K → 40.1K images; objects ---
More than 50,000 person instances annotated
http://image-net.org/challenges/LSVRC/2013/

NEW: ILSVRC 2013 with large-scale object detection
159 downloads so far: http://image-net.org/challenges/LSVRC/2013/
Submission deadline: Nov. 15th
ICCV workshop on December 7th, 2013
Fine-Grained Challenge 2013: https://sites.google.com/site/fgcomp2013/

Thank you! Prof. Alex Berg UNC Chapel Hill Dr. Jia Deng Stanford U. Zhiheng Huang Stanford U. Jonathan Krause Stanford U. Sanjeev Satheesh Stanford U. Hao Su Stanford U.