
Programme
2pm Introduction – Andrew Zisserman, Chris Williams
2.10pm Overview of the challenge and results – Mark Everingham (Oxford)
2.40pm Session 1: The Classification Task
–Frederic Jurie presenting work by Jianguo Zhang (INRIA), 20 mins
–Frederic Jurie (INRIA), 20 mins
–Thomas Deselaers (Aachen), 20 mins
–Jason Farquhar (Southampton), 20 mins
Coffee break
4.30pm Session 2: The Detection Task
–Stefan Duffner/Christophe Garcia (France Telecom), 30 mins
–Mario Fritz (Darmstadt), 30 mins
5.30pm Discussion – Lessons learnt, and future challenges

The PASCAL Visual Object Classes Challenge
Mark Everingham, Luc Van Gool, Chris Williams, Andrew Zisserman

Challenge
Four object classes
–Motorbikes
–Bicycles
–People
–Cars
Classification
–Predict object present/absent
Detection
–Predict bounding boxes of objects

Competitions
Train on any (non-test) data
–How well do state-of-the-art methods perform on these problems?
–Which methods perform best?
Train on supplied data
–Which methods perform best given specified training data?

Data sets
train, val, test1
–Sampled from the same distribution of images
–Images taken from PASCAL image databases
–"Easier" challenge
test2
–Freshly collected for the challenge (mostly Google Images)
–"Harder" challenge

Training and first test set
[Table of per-class image and object counts for train+val and test1; most counts were lost in transcription. Recovered: People 84 images / 152 objects (train+val) and 84 images / 149 objects (test1); totals 684 images (train+val) and 689 images (test1).]

Example images

Second test set
[Table of per-class image and object counts for test2; counts were lost in transcription apart from the total of 1282 images.]

Example images

Annotation for training
Object class present/absent
Sub-class labels (partial)
–Car side, Car rear, etc.
Bounding boxes
Segmentation masks (partial)

Issues in ground truth
What objects should be considered detectable?
–Subjective judgement by size in image, level of occlusion, detection without 'inference'
–Disagreements will cause noise in evaluation, i.e. incorrectly-judged false positives
"Errors" in training data
–Un-annotated objects: requires machine learning algorithms robust to noise on class labels
–Inaccurate bounding boxes: hard to specify for some instances, e.g. bicycles
Detection threshold was set "liberally"

Results: Classification

Participants
[Table of participants and the test1/test2 classes (motorbikes, bicycles, people, cars) each entered; the check marks were lost in transcription. Participants: Aachen, Darmstadt, Edinburgh, FranceTelecom, HUT, INRIA: dalal, INRIA: dorko, INRIA: jurie, INRIA: zhang, METU, MPITuebingen, Southampton.]

Methods
Interest points (LoG/Harris) + patches/SIFT
–Histogram of clustered descriptors
SVM: INRIA: Dalal, INRIA: Zhang
Log-linear model: Aachen
Logistic regression: Edinburgh
Other: METU
–No clustering step
SVM with other kernels: MPITuebingen, Southampton
–Additional features
Color: METU; moments: Southampton
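As a rough illustration of the dominant recipe above (interest-point descriptors, a clustered codebook, per-image word histograms, an SVM), here is a minimal bag-of-visual-words sketch. It is not any participant's actual system; the file names, labels, and parameter choices (1000 visual words, RBF kernel) are assumptions for illustration only.

```python
# Minimal bag-of-visual-words sketch: SIFT descriptors -> k-means codebook
# -> normalised word histograms -> SVM. Illustrative only, not a VOC entry.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

sift = cv2.SIFT_create()

def descriptors(path):
    """Extract SIFT descriptors from one grayscale image."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(img, None)
    return desc if desc is not None else np.zeros((0, 128), np.float32)

def histogram(desc, codebook):
    """Assign descriptors to nearest visual words; L1-normalise the counts."""
    words = codebook.predict(desc) if len(desc) else np.array([], dtype=int)
    h = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return h / max(h.sum(), 1.0)

# Hypothetical training data: image paths and present/absent labels.
train_paths = ["img_0001.jpg", "img_0002.jpg"]
train_labels = [1, 0]

all_desc = np.vstack([descriptors(p) for p in train_paths])
codebook = KMeans(n_clusters=min(1000, len(all_desc)), n_init=3).fit(all_desc)

X = np.array([histogram(descriptors(p), codebook) for p in train_paths])
clf = SVC(kernel="rbf").fit(X, train_labels)
```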

Methods
Image segmentation and region features: HUT
–MPEG-7 color, shape, etc.
–Self-organizing map
Classification by detection: Darmstadt
–Generalized Hough transform / SVM verification

Evaluation
Receiver Operating Characteristic (ROC)
–Equal Error Rate (EER)
–Area Under Curve (AUC)
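A quick sketch of the two ROC summaries, with made-up scores. Note that the challenge results appear to quote EER so that higher is better (the true-positive rate at the operating point where false-positive and false-negative rates coincide); the snippet below computes the error rate at that point, so the quoted figure corresponds to one minus it.

```python
# ROC summaries for present/absent classification: AUC and the equal error
# rate (the point where false-positive rate equals false-negative rate).
# Labels and scores below are toy data.
import numpy as np
from sklearn.metrics import roc_curve, auc

y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0])                    # ground truth
y_score = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2])   # confidences

fpr, tpr, _ = roc_curve(y_true, y_score)
print("AUC:", auc(fpr, tpr))

idx = np.argmin(np.abs(fpr - (1.0 - tpr)))   # point where FPR ~= FNR
eer = (fpr[idx] + (1.0 - tpr[idx])) / 2.0
print("EER (error rate):", eer, "-> quoted as", 1.0 - eer)
```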

Competition 1: train+val/test1 1.1: Motorbikes Max EER: (INRIA: Jurie)

Competition 1: train+val/test1 1.2: Bicycles Max EER: (INRIA: Jurie, INRIA: Zhang)

Competition 1: train+val/test1 1.3: People Max EER: (INRIA: Jurie, INRIA: Zhang)

Competition 1: train+val/test1 1.4: Cars Max EER: (INRIA: Jurie)

Competition 2: train+val/test2 2.1: Motorbikes Max EER: (INRIA: Zhang)

Competition 2: train+val/test2 2.2: Bicycles Max EER: (INRIA: Zhang)

Competition 2: train+val/test2 2.3: People Max EER: (INRIA: Zhang)

Competition 2: train+val/test2 2.4: Cars Max EER: (INRIA: Zhang)

Classes and test1 vs. test2
Mean EER of 'best' results across classes
–test1: 0.946, test2: 0.741

Conclusions?
Interest points + SIFT + clustering (histogram) + SVM did 'best'
–Log-linear model (Aachen) a close second
–Results with SVM (INRIA) significantly better than with logistic regression (Edinburgh)
Method using detection (Darmstadt) did not do so well
–Cannot exploit context (= unintended bias?) of image
–Used subset of training data, and is able to localize

Competitions 3 & 4
Classification, any (non-test) training data to be used
No entries submitted

Results: Detection

Participants
[Table of participants and the test1/test2 classes (motorbikes, bicycles, people, cars) each entered; the check marks were lost in transcription. Participants: Aachen, Darmstadt, Edinburgh, FranceTelecom, HUT, INRIA: dalal, INRIA: dorko, INRIA: jurie, INRIA: zhang, METU, MPITuebingen, Southampton.]

Methods
Generalized Hough Transform
–Interest points, clustered patches/descriptors, GHT
–Darmstadt: SVM verification stage; side views with segmentation masks used for training
–INRIA: Dorko: SIFT features, semi-supervised clustering, single detection per image
"Sliding window" classifiers
–Exhaustive search over translation and scale
–FranceTelecom: convolutional neural network
–INRIA: Dalal: SVM with SIFT-based input representation
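The sliding-window strategy amounts to scanning a fixed-size window over every position at several scales and keeping windows the classifier scores above a threshold. A generic sketch follows; `score_window` stands in for any trained window classifier (SVM, convolutional network, ...) and is a placeholder, not code from an actual entry.

```python
# Generic sliding-window detection: exhaustive search over translation and
# scale, keeping windows scored above a threshold. Illustrative only.
import cv2

def sliding_window_detect(img, score_window, win=(128, 64),
                          stride=8, scales=(1.0, 0.75, 0.5), thresh=0.0):
    """Return (x, y, w, h, score) detections in original-image coordinates."""
    win_h, win_w = win
    detections = []
    for s in scales:
        scaled = cv2.resize(img, None, fx=s, fy=s)
        for y in range(0, scaled.shape[0] - win_h + 1, stride):
            for x in range(0, scaled.shape[1] - win_w + 1, stride):
                score = score_window(scaled[y:y + win_h, x:x + win_w])
                if score > thresh:
                    # Map the window back to original-image coordinates.
                    detections.append((int(x / s), int(y / s),
                                       int(win_w / s), int(win_h / s), score))
    return detections
```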

Methods
Baselines: Edinburgh
–Detection confidence: class prior probability, or a whole-image classifier (SIFT + logistic regression)
–Bounding box: entire image; scale-normalized mean bounding box from training data; bounding box of all interest points; or bounding box of interest points weighted by 'class purity'
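For concreteness, a minimal sketch of the simplest baseline above: one detection per image, the whole image as the bounding box, and a confidence equal to the class prior estimated from the training labels. This is an illustration, not the actual Edinburgh code.

```python
# Simplest baseline: whole image as the box, class prior as the confidence.
def prior_baseline(train_labels, test_images):
    prior = sum(train_labels) / len(train_labels)      # P(class present)
    return [(0, 0, img.shape[1], img.shape[0], prior)  # (x, y, w, h, conf)
            for img in test_images]
```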

Evaluation
Correct detection: >50% overlap between predicted and ground-truth bounding boxes (intersection over union)
–Multiple detections of the same object counted as one true positive plus false positives
Precision/Recall
–Average Precision (AP) as defined by TREC: mean of precision interpolated at recall = 0, 0.1, ..., 0.9, 1
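A short sketch of the two evaluation pieces above: the 50% bounding-box overlap test and TREC-style interpolated average precision. Boxes here are assumed to be (x1, y1, x2, y2) tuples.

```python
# Detection evaluation: intersection-over-union overlap test and 11-point
# interpolated average precision (precision at recall 0, 0.1, ..., 1).
import numpy as np

def overlap(a, b):
    """Area overlap (intersection over union) of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def average_precision(recall, precision):
    """Mean over r in {0, 0.1, ..., 1} of max precision at recall >= r."""
    return np.mean([precision[recall >= r].max() if (recall >= r).any() else 0.0
                    for r in np.arange(0.0, 1.1, 0.1)])
```

A detection counts as correct when `overlap(predicted, ground_truth) > 0.5`; detections are then ranked by confidence to trace the precision/recall curve that `average_precision` summarises.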

Competition 5: train+val/test1 5.1: Motorbikes Max AP: (Darmstadt)

Competition 5: train+val/test1 5.2: Bicycles Max AP: (Edinburgh)

Competition 5: train+val/test1 5.3: People Max AP: (INRIA: Dalal)

Competition 5: train+val/test1 5.4: Cars Max AP: (INRIA: Dalal)

Competition 6: train+val/test2 6.1: Motorbikes Max AP: (Darmstadt)

Competition 6: train+val/test2 6.2: Bicycles Max AP: (Edinburgh)

Competition 6: train+val/test2 6.3: People Max AP: (INRIA: Dalal)

Competition 6: train+val/test2 6.4: Cars Max AP: (INRIA: Dalal)

Classes and test1 vs. test2
Mean AP of 'best' results across classes
–test1: 0.408, test2: 0.195

Conclusions?
GHT (Darmstadt) method did 'best' on the classes entered
–SVM verification stage effective
–Limited to lower recall (by use of only side views)
SVM (INRIA: Dalal) comparable for cars, better on test2
–Smaller objects? Higher recall
Performance on bicycles and people was 'poor'
–"Non-solid" objects, articulation?

Competition 7: any train / test1
One entry, 7.3: People (INRIA: Dalal)
Use of own training data improved results dramatically over the supplied-data entry (AP: 0.013)

Competition 8: any train / test2
One entry, 8.3: People (INRIA: Dalal)
Use of own training data improved results dramatically over the supplied-data entry (AP: 0.021)

Conclusions
Classification
–Variety of methods and variations on SIFT+SVM
–Encouraging performance on all object classes
Detection
–Variety of methods and variations on GHT
–Encouraging performance on cars and motorbikes; people and bicycles more challenging
Use of own training data
–Only one entry (people detection), with much better results than using the provided training data
–State-of-the-art performance for pre-built classification/detection remains to be assessed