Large-Scale Object Recognition with Weak Supervision

Large-Scale Object Recognition with Weak Supervision Weiqiang Ren, Chong Wang, Yanhua Cheng, Kaiqi Huang, Tieniu Tan {wqren,cwang,yhcheng,kqhuang,tnt}@nlpr.ia.ac.cn

Task 2: Classification + Localization
Task 2b: Classification + localization with additional training data, ordered by classification error
Only classification labels are used. Full image as object location.

Outline: Motivation, Method, Results

Motivation

Why Weakly Supervised Localization (WSL)? Knowing where to look makes recognizing objects easier. However, in the classification-only task, no annotations of object location are available. Hence: Weakly Supervised Localization.

Current WSL results on VOC 2007 (detection mAP, %)

13.9: Weakly supervised object detector learning with model drift detection, ICCV 2011
15.0: Object-centric spatial pooling for image classification, ECCV 2012
22.4: Multi-fold MIL training for weakly supervised object localization, CVPR 2014
22.7: On learning to localize objects with minimal supervision, ICML 2014
26.2: Discovering Visual Objects in Large-scale Image Datasets with Weak Supervision, submitted to TPAMI
26.4: Weakly supervised object detection with posterior regularization, BMVC 2014
31.6: Weakly supervised object localization with latent category learning, ECCV 2014 (Sep 11, Poster Session 4A, #34)

Our Work
Weakly Supervised Object Localization with Latent Category Learning (ECCV 2014). VOC 2007 results: Ours 31.6, DPM 5.0 33.7
Discovering Visual Objects in Large-scale Image Datasets with Weak Supervision (submitted to TPAMI). VOC 2007 results: Ours 26.2, DPM 5.0 33.7
For efficiency in large-scale tasks, we use the second one.

Method

Framework [figure]: input images pass through conv layers and FC layers; stages (1) Cls Prediction, (2) Det Prediction, (3) and (4) Rescoring.

1st: CNN architecture, following Chatfield et al., "Return of the Devil in the Details: Delving Deep into Convolutional Nets"

2nd: MILinear SVM

MILinear: Region Proposal
A good region proposal algorithm has high recall, high overlap, a small number of proposals and low computation cost.
We use MCG pretrained on VOC 2012 (additional data).
Training: 128 windows/image. Testing: 256 windows/image, compared to ~2000 for Selective Search.
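
To make "high recall, high overlap, small number" concrete, here is a minimal sketch of how proposal recall at a fixed budget (such as 128 or 256 windows per image) could be measured. The IoU >= 0.5 threshold, the continuous (x1, y1, x2, y2) box format and the function names are assumptions for illustration, not part of the original pipeline.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in continuous coordinates, x2 > x1, y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def proposal_recall(gt_boxes: List[Box], proposals: List[Box],
                    top_k: int, thresh: float = 0.5) -> float:
    """Fraction of ground-truth boxes matched (IoU >= thresh) by the first top_k proposals.

    Assumes `proposals` is already sorted by descending objectness score.
    """
    if not gt_boxes:
        return 1.0
    kept = proposals[:top_k]
    hits = sum(1 for g in gt_boxes if any(iou(g, p) >= thresh for p in kept))
    return hits / len(gt_boxes)


# Toy usage: the third proposal is needed to cover the second object.
gt = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0)]
props = [(0.0, 0.0, 10.0, 9.0), (50.0, 50.0, 60.0, 60.0), (21.0, 21.0, 30.0, 30.0)]
recall_at_2 = proposal_recall(gt, props, top_k=2)
recall_at_3 = proposal_recall(gt, props, top_k=3)
```

A smaller budget is cheaper for MIL training and testing, so the goal is a proposal method whose recall saturates early.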

MILinear: Feature Representations
Low-level features: SIFT, LBP, HOG, shape context, Gabor, …
Mid-level features: Bag of Visual Words (BoVW)
Deep hierarchical features: convolutional networks, deep auto-encoders, deep belief nets

MILinear: Positive Window Mining
Clustering: KMeans
Topic models: pLSA, LDA, gLDA
CRF
Multiple instance learning: DD, EMDD, APR, MI-NN, MI-SVM, mi-SVM, MILBoost
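
Since MILinear is a multiple-instance linear SVM, its positive windows are mined in the multiple instance learning family listed above. The sketch below shows one MI-SVM-style relabeling step (the heuristic of Andrews et al.), assuming each image is a "bag" of window feature vectors and the current model is a linear scorer. The function names are hypothetical, and the exact mining rule used by MILinear may differ.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def score(w: Vector, b: float, x: Vector) -> float:
    """Linear window score w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def mine_windows(pos_bags: List[List[Vector]], neg_bags: List[List[Vector]],
                 w: Vector, b: float) -> Tuple[List[Vector], List[int]]:
    """One MI-SVM-style relabeling step (hypothetical helper).

    Each positive image contributes only its top-scoring window as a positive;
    every window of every negative image is kept as a negative.
    Returns (features, labels) for the next linear SVM fit.
    """
    xs: List[Vector] = []
    ys: List[int] = []
    for bag in pos_bags:
        best = max(bag, key=lambda x: score(w, b, x))
        xs.append(best)
        ys.append(1)
    for bag in neg_bags:
        for x in bag:
            xs.append(x)
            ys.append(-1)
    return xs, ys


# Toy usage: two positive images, one negative image, 2-d window features.
pos_bags = [[[0.0, 1.0], [2.0, 0.0]], [[1.0, 1.0]]]
neg_bags = [[[0.0, 0.0], [0.5, 0.5]]]
xs, ys = mine_windows(pos_bags, neg_bags, w=[1.0, 0.0], b=0.0)
```

In practice this step alternates with retraining the linear SVM on `(xs, ys)` until the selected windows stop changing.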

MILinear: Objective Function and Optimization
Multiple-instance linear SVM, optimized with a trust region Newton method: a Newton-type method that works in the primal, for faster convergence.
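
The slide names the model but not the formula. As a hedged illustration only, one standard way to write a multiple-instance linear SVM is the MI-SVM objective of Andrews et al., shown here with a squared hinge loss (the L2-loss form that trust region Newton solvers such as LIBLINEAR's handle in the primal). The notation and the choice of loss are assumptions, not taken from the slides.

```latex
\min_{\mathbf{w},\,b}\;
  \frac{1}{2}\lVert \mathbf{w} \rVert_2^2
  + C \sum_{i=1}^{N}
    \max\!\Bigl(0,\; 1 - Y_i \max_{j \in \mathcal{B}_i}\bigl(\mathbf{w}^\top \mathbf{x}_{ij} + b\bigr)\Bigr)^{2}
```

Here N is the number of training images, Y_i in {+1, -1} indicates whether image i contains the category, B_i is its set of candidate windows and x_ij is the feature vector of window j. For a negative image the inner max forces every window score below -1; for a positive image only the best window must score above +1. The inner max makes the problem non-convex, so it is usually solved by alternating between selecting each positive image's top window and solving the resulting convex SVM.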

MILinear: Optimization Efficiency

3rd: Detection Rescoring
Rescoring with softmax [figure: for each image, the 1000-dim detection scores of 128 boxes are max-pooled into one 1000-dim vector, which feeds a trained softmax over 1000 classes].
The softmax considers all categories simultaneously at each minibatch of the optimization, which suppresses the responses of visually similar object categories.
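
A minimal sketch of this rescoring, based on our reading of the figure: per-class max pooling over the boxes, then a linear layer with a softmax over classes, trained with cross-entropy. The function names, the toy sizes (3 classes instead of 1000, 2 boxes instead of 128) and the plain SGD update are assumptions for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def max_pool_boxes(box_scores: Matrix) -> List[float]:
    """(num_boxes x num_classes) detection scores -> per-class max over boxes."""
    return [max(col) for col in zip(*box_scores)]


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def rescore(pooled: List[float], W: Matrix, b: List[float]) -> List[float]:
    """Class probabilities from a linear layer + softmax over the pooled scores."""
    logits = [sum(w * x for w, x in zip(row, pooled)) + bk for row, bk in zip(W, b)]
    return softmax(logits)


def sgd_step(pooled: List[float], label: int, W: Matrix, b: List[float],
             lr: float = 0.1) -> None:
    """One cross-entropy gradient step on a single image; updates W and b in place.

    The softmax couples all classes: every wrong class gets a gradient equal to its
    current probability, so its logit is pushed down along with the true class going up.
    """
    p = rescore(pooled, W, b)
    for k in range(len(W)):
        g = p[k] - (1.0 if k == label else 0.0)  # d(loss)/d(logit_k)
        for j, x in enumerate(pooled):
            W[k][j] -= lr * g * x
        b[k] -= lr * g


# Toy usage: 2 boxes x 3 classes, identity initialisation of the softmax layer.
box_scores = [[0.2, 0.9, 0.1], [0.8, 0.3, 0.0]]
pooled = max_pool_boxes(box_scores)
W = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
b = [0.0, 0.0, 0.0]
probs_before = rescore(pooled, W, b)
for _ in range(50):
    sgd_step(pooled, 0, W, b)  # the true class is 0
probs_after = rescore(pooled, W, b)
```

Training the softmax jointly over all classes is what lets a confident but wrong, visually similar class lose probability mass to the correct one.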

4th: Classification Rescoring
Linear combination [figure: two 1000-dim score vectors combined into one 1000-dim vector].
One funny thing: we tried some other score-combination strategies, but they did not seem to work!
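
A minimal sketch of a per-class linear combination of classification and WSL scores, followed by top-5 selection. The weight `alpha`, the assumption that both score vectors are on comparable scales (e.g. probabilities) and the function names are hypothetical; the slides do not give the weights.

```python
from typing import List


def combine(cls_scores: List[float], det_scores: List[float],
            alpha: float = 0.5) -> List[float]:
    """Per-class linear combination alpha * cls + (1 - alpha) * det (alpha is hypothetical)."""
    return [alpha * c + (1.0 - alpha) * d for c, d in zip(cls_scores, det_scores)]


def top_k(scores: List[float], k: int = 5) -> List[int]:
    """Indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Toy usage with 3 classes: the detector's evidence flips the top prediction.
cls = [0.5, 0.3, 0.2]
det = [0.1, 0.7, 0.2]
fused = combine(cls, det, alpha=0.5)
best = top_k(fused, k=2)
```

In practice `alpha` would be tuned on a held-out validation set.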

Results

1st: Classification without WSL (top-5 error, %)
Baseline with one CNN: 13.7
Average of four CNNs: 12.5
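
For reference, a minimal sketch of the two operations behind these numbers: averaging per-class scores across several CNNs and computing top-5 error. Equal-weight averaging and the function names are assumptions; the slides only say "average".

```python
from typing import List


def average_models(model_scores: List[List[float]]) -> List[float]:
    """Element-wise mean of per-class scores from several models (equal weights assumed)."""
    n = len(model_scores)
    return [sum(col) / n for col in zip(*model_scores)]


def top5_error(scores_per_image: List[List[float]], labels: List[int]) -> float:
    """Fraction of images whose true label is not among the 5 highest-scoring classes."""
    wrong = 0
    for scores, y in zip(scores_per_image, labels):
        top5 = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:5]
        if y not in top5:
            wrong += 1
    return wrong / len(labels)


# Toy usage: 6 classes, two images; only the second image's label falls outside the top 5.
scores = [[0.6, 0.5, 0.4, 0.3, 0.2, 0.1], [0.6, 0.5, 0.4, 0.3, 0.2, 0.1]]
err = top5_error(scores, labels=[0, 5])
avg = average_models([[1.0, 0.0], [0.0, 1.0]])
```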

2nd: MILinear on ImageNet 2014, detection (localization) error (%)
Baseline (full image): 61.96
MILinear: 40.96
Winner: 25.3
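
A minimal sketch of the localization error used here, assuming the ILSVRC rule: up to five (class, box) guesses per image, and a guess counts as correct when its class matches a ground-truth object and its box overlaps that object with IoU >= 0.5. The function names are hypothetical. The toy example also shows why the full-image baseline scores poorly: a whole-image box rarely reaches IoU 0.5 with a smaller object.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), continuous coordinates
Labeled = Tuple[int, Box]                # (class index, box)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def loc_error(preds: List[Labeled], gt: List[Labeled], thresh: float = 0.5) -> int:
    """0 if any of the first five guesses matches a ground-truth object, else 1."""
    for c, box in preds[:5]:
        if any(c == gc and iou(box, gbox) >= thresh for gc, gbox in gt):
            return 0
    return 1


def mean_loc_error(all_preds: List[List[Labeled]], all_gt: List[List[Labeled]]) -> float:
    """Average per-image localization error over a dataset."""
    return sum(loc_error(p, g) for p, g in zip(all_preds, all_gt)) / len(all_gt)


# Toy usage: a 100x100 image with one object of class 3.
gt = [(3, (10.0, 10.0, 60.0, 60.0))]
full_image = [(3, (0.0, 0.0, 100.0, 100.0))]   # IoU 0.25: counted as an error
tight = [(3, (12.0, 12.0, 60.0, 60.0))]        # IoU ~0.92: counted as correct
dataset_error = mean_loc_error([full_image, tight], [gt, gt])
```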

2nd: MILinear on VOC 2007

2nd: MILinear on ILSVRC 2013 detection: mAP 9.63%, vs. 8.99% for DPM 5.0!

2nd: MILinear for classification (top-5 error, %)
MILinear: 17.1

3rd: WSL Rescoring (Softmax), top-5 error (%)
Baseline with one CNN: 13.7
Average of four CNNs: 12.5
MILinear: 17.1
MILinear + rescoring: 13.5
The softmax-based rescoring successfully suppresses the predictions of visually similar object categories!

4th: Cls and WSL Combination, top-5 error (%)
Baseline with one CNN model: 13.7
Average of four CNN models: 12.5
MILinear: 17.1
MILinear + rescoring: 13.5
Cls (12.5) + MILinear (13.5): 11.5
WSL and classification are complementary to each other!

Russakovsky et al., ImageNet Large Scale Visual Recognition Challenge.

Conclusion
WSL helps classification: combining it with the CNN classifiers lowers top-5 error from 12.5 to 11.5.
WSL has large potential, since weakly labeled data is cheap.

Thank You!