Presentation is loading. Please wait.

Presentation is loading. Please wait.

Analysis of Large Scale Visual Recognition Fei-Fei Li and Olga Russakovsky Refernce to paper, photos, vision-lab, stanford logos Olga Russakovsky, Jia.

Similar presentations


Presentation on theme: "Analysis of Large Scale Visual Recognition Fei-Fei Li and Olga Russakovsky Refernce to paper, photos, vision-lab, stanford logos Olga Russakovsky, Jia."— Presentation transcript:

1 Analysis of Large Scale Visual Recognition Fei-Fei Li and Olga Russakovsky Refernce to paper, photos, vision-lab, stanford logos Olga Russakovsky, Jia Deng, Zhiheng Huang, Alex Berg, Li Fei-Fei Detecting avocados to zucchinis: what have we done, and where are we going? ICCV 2013

2 Backpack

3 Flute Strawberry Traffic light Bathing cap Matchstick Racket Sea lion

4 Large-scale recognition

5 Need benchmark datasets

6 PASCAL VOC Classification: person, motorcycle Detection Segmentation Person Motorcycle Action: riding bicycle Everingham, Van Gool, Williams, Winn and Zisserman. The PASCAL Visual Object Classes (VOC) Challenge. IJCV object classes22,591 images

7 Large Scale Visual Recognition Challenge (ILSVRC) object classes22,591 images 1000 object classes1,431,167 images Dalmatian

8 Variety of object classes in ILSVRC

9

10 ILSVRC Task 1: Classification Steel drum

11 ILSVRC Task 1: Classification Output: Scale T-shirt Steel drum Drumstick Mud turtle Steel drum Output: Scale T-shirt Giant panda Drumstick Mud turtle

12 ILSVRC Task 1: Classification Output: Scale T-shirt Steel drum Drumstick Mud turtle Steel drum Accuracy = Output: Scale T-shirt Giant panda Drumstick Mud turtle Σ 100,000 images 1[correct on image i] 1 100,000

13 ILSVRC Task 1: Classification Accuracy (5 predictions/image) # Submissions

14 ILSVRC Task 2: Classification + Localization Steel drum

15 Foldin g chair Persian cat Loud speaker Steel drum Picket fence Output Steel drum ILSVRC Task 2: Classification + Localization

16 Foldin g chair Persian cat Loud speaker Steel drum Picket fence Output Foldin g chair Persian cat Loud speaker Steel drum Picket fence Output (bad localization) Foldin g chair Persian cat Loud speaker Picket fence King penguin Output (bad classification) Steel drum ILSVRC Task 2: Classification + Localization

17 Foldin g chair Persian cat Loud speaker Steel drum Picket fence Output Steel drum ILSVRC Task 2: Classification + Localization Accuracy = Σ 100,000 images 1[correct on image i] 1 100,000

18 ILSVRC Task 2: Classification + Localization ISI OXFORD_VGG SuperVision Accuracy (5 predictions)

19 What happens under the hood?

20 What happens under the hood on classification+localization?

21 What happens under the hood on classification+localization? Olga Russakovsky, Jia Deng, Zhiheng Huang, Alex Berg, Li Fei-Fei Detecting avocados to zucchinis: what have we done, and where are we going? ICCV 2013

22 What happens under the hood on classification+localization? Preliminaries: ILSVRC-500 (2012) dataset Leading algorithms Olga Russakovsky, Jia Deng, Zhiheng Huang, Alex Berg, Li Fei-Fei Detecting avocados to zucchinis: what have we done, and where are we going? ICCV 2013

23 What happens under the hood on classification+localization? Preliminaries: ILSVRC-500 (2012) dataset Leading algorithms A closer look at small objects A closer look at textured objects Olga Russakovsky, Jia Deng, Zhiheng Huang, Alex Berg, Li Fei-Fei Detecting avocados to zucchinis: what have we done, and where are we going? ICCV 2013

24 What happens under the hood on classification+localization? Preliminaries: ILSVRC-500 (2012) dataset Leading algorithms A closer look at small objects A closer look at textured objects Olga Russakovsky, Jia Deng, Zhiheng Huang, Alex Berg, Li Fei-Fei Detecting avocados to zucchinis: what have we done, and where are we going? ICCV 2013

25 Easy to localize Hard to localize 1000 object classes ILSVRC (2012)

26 Easy to localize Hard to localize 500 classes with smallest objects ILSVRC-500 (2012)

27 Easy to localize Hard to localize ILSVRC-500 (2012)500 object categories25.3% PASCAL VOC (2012)20 object categories25.2% Object scale (fraction of image area occupied by target object) ILSVRC-500 (2012) 500 classes with smallest objects

28 Chance Performance of Localization B1B1 B2B2 B 3,4,5,6,… Steel drum B1B1 B2B2 B3B3 B4B4 B5B5 B6B6 B7B7 B8B8 B9B9 N = 9 here

29 Chance Performance of Localization B1B1 B2B2 B 3,4,5,6,… Steel drum B1B1 B2B2 B3B3 B4B4 B5B5 B6B6 B7B7 B8B8 B9B9 N = 9 here

30 Chance Performance of Localization B1B1 B2B2 B 3,4,5,6,… Steel drum ILSVRC-500 (2012)500 object categories8.4% PASCAL VOC (2012)20 object categories8.8% B1B1 B2B2 B3B3 B4B4 B5B5 B6B6 B7B7 B8B8 B9B9 N = 9 here

31 Level of clutter Steel drum - Generate candidate object regions using method of Selective Search for Object Detection vanDeSande et al. ICCV Filter out regions inside object - Count regions

32 Level of clutter Steel drum - Generate candidate object regions using method of Selective Search for Object Detection vanDeSande et al. ICCV Filter out regions inside object - Count regions ILSVRC-500 (2012)500 object categories128 ± 35 PASCAL VOC (2012)20 object categories130 ± 29

33 What happens under the hood on classification+localization? Preliminaries: ILSVRC-500 (2012) dataset – similar to PASCAL Leading algorithms A closer look at small objects A closer look at textured objects Olga Russakovsky, Jia Deng, Zhiheng Huang, Alex Berg, Li Fei-Fei Detecting avocados to zucchinis: what have we done, and where are we going? ICCV 2013

34 SuperVision (SV) Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton (Krizhevsky NIPS12) Image classification: Deep convolutional neural networks 7 hidden weight layers, 650K neurons, 60M parameters, 630M connections Rectified Linear Units, max pooling, dropout trick Randomly extracted 224x224 patches for more data Trained with SGD on two GPUs for a week, fully supervised Localization: Regression on (x,y,w,h)

35 SuperVision (SV) Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton (Krizhevsky NIPS12) Image classification: Deep convolutional neural networks 7 hidden weight layers, 650K neurons, 60M parameters, 630M connections Rectified Linear Units, max pooling, dropout trick Randomly extracted 224x224 patches for more data Trained with SGD on two GPUs for a week, fully supervised Localization: Regression on (x,y,w,h)

36 OXFORD_VGG (VGG) Karen Simonyan, Yusuf Aytar, Andrea Vedaldi, Andrew Zisserman Image classification: Fisher vector + linear SVM (Sanchez CVPR11) Root-SIFT (Arandjelovic CVPR12), color statistics, augmentation with patch location (x,y) (Sanchez PRL12) Fisher vectors: 1024 Gaussians, 135K dimensions No SPM, product quantization to compress Semi-supervised learning to find additional bounding boxes 1000 one-vs-rest SVM trained with Pegasos SGD 135M parameters! Localization: Deformable part-based models (Felzenszwalb PAMI10), without parts (root-only)

37 What happens under the hood on classification+localization? Preliminaries: ILSVRC-500 (2012) dataset – similar to PASCAL Leading algorithms: SV and VGG A closer look at small objects A closer look at textured objects Olga Russakovsky, Jia Deng, Zhiheng Huang, Alex Berg, Li Fei-Fei Detecting avocados to zucchinis: what have we done, and where are we going? ICCV 2013

38 SVVGG Cls+loc accuracy 54.3% 45.8% Results on ILSVRC-500

39 p < Difference in accuracy: SV versus VGG Classification-only Foldin g chair Persian cat Loud speaker Steel drum Picket fence

40 Object scale p < Cls. Accuracy: SV - VGG Difference in accuracy: SV versus VGG Classification-only

41 SV better (452 classes) VGG better (34 classes) Object scale p < Cls. Accuracy: SV - VGG Difference in accuracy: SV versus VGG Classification-only

42 SV better (452 classes) VGG better (34 classes) Object scale p < Cls. Accuracy: SV - VGG Difference in accuracy: SV versus VGG Classification-only * *** SV beats VGG VGG beats SV

43 SV better (452 classes) VGG better (34 classes) Object scale p < Cls. Accuracy: SV - VGG Difference in accuracy: SV versus VGG Cls+Loc Accuracy: SV - VGG Object scale Classification-only VGG better (150 classes) SV better (338 classes) Classification+Localiation

44 Cumulative accuracy across scales SV VGG SV VGG Object scale Cumulative cls. accuracy Classification-only Classification+Localization Cumulative cls+loc accuracy Object scale

45 Cumulative accuracy across scales SV VGG SV Object scale Cumulative cls. accuracy Classification-only Classification+Localization Cumulative cls+loc accuracy Object scale smallest object classes VGG

46 What happens under the hood on classification+localization? Preliminaries: ILSVRC-500 (2012) dataset – similar to PASCAL Leading algorithms: SV and VGG SV always great at classification, but VGG does better than SV at localizing small objects A closer look at textured objects Olga Russakovsky, Jia Deng, Zhiheng Huang, Alex Berg, Li Fei-Fei Detecting avocados to zucchinis: what have we done, and where are we going? ICCV 2013

47 What happens under the hood on classification+localization? Preliminaries: ILSVRC-500 (2012) dataset – similar to PASCAL Leading algorithms: SV and VGG SV always great at classification, but VGG does better than SV at localizing small objects A closer look at textured objects WHY? Olga Russakovsky, Jia Deng, Zhiheng Huang, Alex Berg, Li Fei-Fei Detecting avocados to zucchinis: what have we done, and where are we going? ICCV 2013

48 What happens under the hood on classification+localization? Preliminaries: ILSVRC-500 (2012) dataset – similar to PASCAL Leading algorithms: SV and VGG SV always great at classification, but VGG does better than SV at localizing small objects A closer look at textured objects Olga Russakovsky, Jia Deng, Zhiheng Huang, Alex Berg, Li Fei-Fei Detecting avocados to zucchinis: what have we done, and where are we going? ICCV 2013

49 Textured objects (ILSVRC-500) Amount of texture Low High

50 No textureLow textureMedium textureHigh texture # classes Textured objects (ILSVRC-500) Amount of texture Low High

51 No textureLow textureMedium textureHigh texture # classes Object scale20.8%23.7%23.5%25.0% Textured objects (ILSVRC-500) Amount of texture Low High

52 No textureLow textureMedium textureHigh texture # classes Object scale20.8%23.7% 20.8%23.5% 20.8%25.0% 20.8% Textured objects (416 classes) Amount of texture Low High

53 Localizing textured objects (416 classes, same average object scale at each level of texture) Localization accuracy Level of texture SVVGG

54 Level of texture Localization accuracy On correctly classified images SVVGG Localizing textured objects (416 classes, same average object scale at each level of texture)

55 Level of texture Localization accuracy On correctly classified images SVVGG Localizing textured objects (416 classes, same average object scale at each level of texture)

56 What happens under the hood on classification+localization? Preliminaries: ILSVRC-500 (2012) dataset – similar to PASCAL Leading algorithms: SV and VGG SV always great at classification, but VGG does better than SV at localizing small objects Textured objects easier to localize, especially for SV Olga Russakovsky, Jia Deng, Zhiheng Huang, Alex Berg, Li Fei-Fei Detecting avocados to zucchinis: what have we done, and where are we going? ICCV 2013

57 ILSVRC 2013 with large-scale object detection 3x Fully annotated 200 object classes across 60,000 images Allows evaluation of generic object detection in cluttered scenes at scale Person Car Motorcycle Helmet Person Car Motorcycle Helmet NEW

58 ILSVRC 2013 with large-scale object detection StatisticsPASCAL VOC 2012ILSVRC 2013 Object classes20200 Training Images5.7K395K Objects13.6K345K Validation Images5.8K20.1K Objects13.8K55.5K Testing Images11.0K40.1K Objects--- 4x 3x 10x 25x More than 50,000 person instances annotated NEW

59 159 downloads so far: Submission deadline Nov. 15 th ICCV workshop on December 7 th, 2013 Fine-Grained Challenge 2013: https://sites.google.com/site/fgcomp2013/ 3x ILSVRC 2013 with large-scale object detection NEW

60 Thank you! Prof. Alex Berg UNC Chapel Hill Jonathan Krause Stanford U. Sanjeev Satheesh Stanford U. Zhiheng Huang Stanford U. Dr. Jia Deng Stanford U. Hao Su Stanford U.


Download ppt "Analysis of Large Scale Visual Recognition Fei-Fei Li and Olga Russakovsky Refernce to paper, photos, vision-lab, stanford logos Olga Russakovsky, Jia."

Similar presentations


Ads by Google