Classification spotlights

1 Classification spotlights
Large Scale Visual Recognition Challenge (ILSVRC) 2013: Classification spotlights

2 Additions to the ConvNet Image Classification Pipeline
Andrew Howard, Andrew Howard Consulting

Changes to Training:
- Use more pixels: train on square patches taken from the whole rectangular image instead of from the cropped central square.
- Apply additional color manipulations (contrast, brightness, color balance) to the training patches.
[Figure: source region for training patches, full rectangular image versus cropped central square]

Changes to Testing:
- Make predictions at different scales and on different views that together use all pixels.
- Previous: 10 predictions (2 flips * 5 translations).
- This submission: 90 predictions (2 flips * 5 translations * 3 scales * 3 views).
- The number of predictions can be reduced with no loss of accuracy by using stagewise regression.
[Figure: View 1, View 2, View 3]

Higher Resolution Models:
- Start from a fully trained model and fine-tune it on image patches from a higher resolution image.
- This can be trained in about 1/3 the number of epochs.
- Predictions on higher resolution images are complementary to those of the base model.

Final vision system: 13.6% error, made of 5 base models and 5 higher resolution models. The model structure is the same as last year's, except that the fully connected layers are twice as large, which doesn’t add much value.

Training neural networks can be quite time-consuming, so the focus was on ideas that are simple to test and simple to implement for improving the convolutional neural network based image classification pipeline. Models perform better with more data, so we added more training image patches by using all of the pixels from the image rather than only selecting training patches from the cropped central square. We also added color manipulations to these training image patches. At test time, we make predictions over multiple scales and multiple views of the image in order to generate diverse predictions and improve the overall combined prediction. Additionally, we build models on higher resolution images, which can be trained quickly by fine-tuning previously trained models. With these additions to the pipeline, the final system achieves a 13.6% top-five error rate using 5 base models and 5 high resolution models.
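
The test-time scheme on this slide (2 flips * 5 translations * 3 scales * 3 views = 90 predictions) can be illustrated with a minimal Python sketch. Several details here are assumptions that the slide does not state: that the 5 translations are the center plus four corner crops, that the predictions are combined by a plain average, and that `model` is a hypothetical callable returning class probabilities for one variant of the image. The stagewise regression used to reduce the number of predictions is not shown.

```python
from itertools import product

# Assumed layout of the test-time variants; only the counts come from the slide.
FLIPS = (False, True)                      # 2 flips
TRANSLATIONS = ("center", "top_left", "top_right",
                "bottom_left", "bottom_right")  # 5 translations (assumed positions)
SCALES = (0, 1, 2)                         # indices of 3 scales (actual sizes not given)
VIEWS = (0, 1, 2)                          # indices of 3 views that together cover all pixels


def enumerate_test_variants():
    """Return every (flip, translation, scale, view) combination."""
    return list(product(FLIPS, TRANSLATIONS, SCALES, VIEWS))


def average_predictions(predictions):
    """Average a list of equal-length class-probability lists."""
    count = len(predictions)
    num_classes = len(predictions[0])
    return [sum(p[c] for p in predictions) / count for c in range(num_classes)]


def predict_combined(image, model):
    """Run the hypothetical model on every variant and average the outputs."""
    predictions = [model(image, variant) for variant in enumerate_test_variants()]
    return average_predictions(predictions)


if __name__ == "__main__":
    # Dummy stand-in model: its output depends only on the flip flag.
    def dummy_model(image, variant):
        return [1.0, 0.0] if variant[0] else [0.0, 1.0]

    print(len(enumerate_test_variants()))
    print(predict_combined(None, dummy_model))
```

Dropping the scale and view factors from the product recovers the earlier 10-prediction setup (2 flips * 5 translations).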

3 CognitiveVision team
Cognitive Psychology Inspired Image Classification using Deep Neural Network
Kuiyuan Yang, Microsoft Research
Yalong Bai, Harbin Institute of Technology
Yong Rui, Microsoft Research

4 Our Classification Scheme
CognitiveVision team
Given an image, first predict its basic category, then predict its subcategory.
Basic Category Classification: Dog, Cat (easy to distinguish)
Dog Classification: dalmatian, French bulldog, Maltese dog, English setter
Cat Classification: Egyptian cat, tiger cat, Siamese cat
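
The two-stage scheme on this slide can be sketched in a few lines of Python. This only illustrates the control flow, not the team's implementation: `basic_classifier` and the entries of `sub_classifiers` are hypothetical callables standing in for trained networks, and the category lists are the examples shown on the slide.

```python
# Example subcategories taken from the slide; a real system would cover many more.
SUBCATEGORIES = {
    "dog": ["dalmatian", "French bulldog", "Maltese dog", "English setter"],
    "cat": ["Egyptian cat", "tiger cat", "Siamese cat"],
}


def classify_two_stage(image, basic_classifier, sub_classifiers):
    """Predict the basic category first, then the subcategory within it."""
    basic = basic_classifier(image)        # stage 1: e.g. "dog" or "cat"
    sub = sub_classifiers[basic](image)    # stage 2: classifier specific to that category
    return basic, sub


if __name__ == "__main__":
    # Toy stand-ins: the "image" is a dict that already carries the answers.
    toy_image = {"basic": "cat", "sub": "Siamese cat"}
    basic_clf = lambda img: img["basic"]
    sub_clfs = {name: (lambda img: img["sub"]) for name in SUBCATEGORIES}
    print(classify_two_stage(toy_image, basic_clf, sub_clfs))
```

Because each second-stage classifier only has to separate classes within one basic category, it can specialize in fine distinctions such as Egyptian cat versus tiger cat.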

5 Caffe: Open-Sourcing Deep Learning
Yangqing Jia, Trevor Darrell, UC Berkeley
Convolutional Architecture for Fast Feature Extraction
- Seamless switching between CPU and GPU
- Fast computation (2.5 ms / image with GPU)
- Full training and testing capability
- Reference ImageNet model available
A framework to support multiple applications: Classification, Embedding, Detection, Your next Application!
Publicly available at

6 Experiments for large scale visual recognition
We tried: Deep CNN (following Krizhevsky et al. ’12) + low-level features & spatial granularities
Top-1 accuracy = 0.567
Where did we fail? Appliances and instruments are confusing for us, including:
- TV vs. Screen
- Coffee mug vs. Cup
- Flute vs. Microphone
- …
Television (0.18), Hair spray (0.18), Coffee mug (0.10), Flute (0.10)

7 Agenda
8:30 Classification & localization
10:30 Detection
Noon Discussion panel
14:00 Invited talk by Vittorio Ferrari: Auto-annotation and self-assessment in ImageNet
14:40 Fine-Grained Challenge 2013
8:50 9:20 9:35 9:50 Spotlights 9:05
10:50 11:10 11:30 Spotlights 11:40
