1 Rethinking architectures of DCNN and object detection in scene recognition
Wenchi MA, CV Group, EECS, KU, 03/20/2017

2 Current work and related considerations
Task: object detection
Algorithm: You Only Look Once (YOLO)
Architecture: GoogLeNet-based
Parameters: >= 97M (relatively small)
Techniques:
- Inception V3 (construction series)
- Efficient grid size reduction (channels in parallel)
- Feature fusion by multi-resolution feature maps (sketched after this list)
Problems: relatively high training loss and non-ideal mAP
Thinking:
- The structure of the model has a large impact on detection accuracy. Is the loss reasonable? Is the gap between training loss and test loss small? Is the model too large? (A large model overfits easily and becomes hard to control.)
- Keep searching for the balance between accuracy and model size. What standard should guide the construction?
- Relationship between objects and scenes: does scene classification benefit object detection? Merge scene information into object detection.
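The feature-fusion technique listed above can be illustrated with a small example. This is a minimal sketch assuming PyTorch; the upsample-and-concatenate scheme, channel widths, and map sizes are my own illustrative choices, not the actual model's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiResolutionFusion(nn.Module):
    """Fuse a coarse (low-resolution) and a fine (high-resolution) feature map.

    A minimal sketch: upsample the coarse map to the fine map's spatial size,
    concatenate along the channel axis, and mix with a 1x1 convolution.
    Channel sizes are hypothetical, not taken from the presentation's model.
    """
    def __init__(self, fine_ch=256, coarse_ch=512, out_ch=256):
        super().__init__()
        self.mix = nn.Conv2d(fine_ch + coarse_ch, out_ch, kernel_size=1)

    def forward(self, fine, coarse):
        # Bring the coarse map up to the fine map's resolution.
        coarse_up = F.interpolate(coarse, size=fine.shape[2:], mode="nearest")
        # Concatenate channels from both resolutions, then mix them.
        return self.mix(torch.cat([fine, coarse_up], dim=1))

# Example: a 52x52 fine map fused with a 26x26 coarse map.
fuse = MultiResolutionFusion()
fine = torch.randn(1, 256, 52, 52)
coarse = torch.randn(1, 512, 26, 26)
print(fuse(fine, coarse).shape)  # torch.Size([1, 256, 52, 52])
```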

3 Wide-Residual-Inception Networks for Real-time Object Detection
Computer Vision Laboratory, Inha University
Goal: scale down the size of the model even further.
Reference: Wide-Residual-Inception Networks for Real-time Object Detection, Youngwan Lee [2017]

4 Wide-Residual-Inception Networks for Real-time Object Detection
Computer Vision Laboratory, Inha University
Feature extractor: combines Inception-style modules with ResNet residual connections (see the sketch below).
Reference: Wide-Residual-Inception Networks for Real-time Object Detection, Youngwan Lee [2017]
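As a rough illustration of the Inception-plus-ResNet feature extractor mentioned above: parallel convolution branches with different receptive fields (the Inception part) wrapped in an identity shortcut (the ResNet part). A minimal PyTorch sketch; the branch layout and channel widths are assumptions, not the exact WR-Inception block from Lee [2017].

```python
import torch
import torch.nn as nn

class ResidualInceptionBlock(nn.Module):
    """Sketch of an Inception-style block with a ResNet shortcut.

    Two parallel branches with different receptive fields are concatenated,
    projected back to the input width, and added to the identity shortcut.
    This illustrates the idea only; it is not the exact WR-Inception block.
    """
    def __init__(self, channels=64):
        super().__init__()
        half = channels // 2
        # Branch 1: 1x1 conv (small receptive field).
        self.branch1 = nn.Sequential(
            nn.Conv2d(channels, half, 1), nn.BatchNorm2d(half), nn.ReLU(inplace=True))
        # Branch 2: 1x1 then 3x3 conv (larger receptive field).
        self.branch2 = nn.Sequential(
            nn.Conv2d(channels, half, 1), nn.BatchNorm2d(half), nn.ReLU(inplace=True),
            nn.Conv2d(half, half, 3, padding=1), nn.BatchNorm2d(half), nn.ReLU(inplace=True))
        # Project the concatenated branches back to the input width.
        self.project = nn.Conv2d(2 * half, channels, 1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = torch.cat([self.branch1(x), self.branch2(x)], dim=1)
        return self.relu(x + self.project(out))  # residual shortcut

block = ResidualInceptionBlock(64)
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```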

5 Wide-Residual-Inception Networks for Real-time Object Detection
Computer Vision Laboratory, Inha University
Detector: SSD with the WR-Inception network as its feature extractor.
Reference: Wide-Residual-Inception Networks for Real-time Object Detection, Youngwan Lee [2017]

6 Wide-Residual-Inception Networks for Real-time Object Detection
Computer Vision Laboratory, Inha University

Model           | Car AP | Car AR | Ped. AP | Ped. AR | Cyc. AP | Cyc. AR | mAP   | mAR
VGG-16          | 74     | 75     | 50      | 56      | 52      | 71      | 58    | 69
ResNet-101      | 76.04  | 74.82  | 47.74   | 56.07   | 53.61   | 75.26   | 58.9  | 70.06
WR-Inception    | 77.2   | 76.18  | 52.51   | 63.01   | 54.63   | 76.17   | 61.18 | 73.51
WR-Inception-12 | 78.24  | 80.24  | 51.08   | 64.29   | 59.28   | n/a     | 63.03 | 75.14

(One value in the WR-Inception-12 row is missing from the source and is left as n/a.)

Dataset: KITTI, collected with stereo cameras and lidar scanners in urban, rural, and highway driving environments. Its categories are cars, vans, trucks, pedestrians, sitting people, cyclists, trams, miscellaneous, and "do not care".
Reference: Wide-Residual-Inception Networks for Real-time Object Detection, Youngwan Lee [2017]
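For reference, the mAP and mAR columns are the per-class AP and AR values averaged over the three classes. A quick check on the VGG-16 row, in Python (the table's 58 / 69 suggests the paper rounds or averages slightly differently, so treat this only as the standard definition):

```python
# mAP / mAR are the per-class AP / AR values averaged over the classes.
# Values below are the VGG-16 row of the table (Car, Pedestrian, Cyclist).
ap = {"car": 74.0, "pedestrian": 50.0, "cyclist": 52.0}
ar = {"car": 75.0, "pedestrian": 56.0, "cyclist": 71.0}

mAP = sum(ap.values()) / len(ap)
mAR = sum(ar.values()) / len(ar)
print(f"mAP = {mAP:.2f}, mAR = {mAR:.2f}")  # mAP = 58.67, mAR = 67.33
```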

7 Wide-Residual-Inception Networks for Real-time Object Detection
Computer Vision Laboratory, Inha University
Contributions:
- Proposes a model that requires less memory and less computation yet shows better performance.
- Preserves the real-time performance of the object detector.
Query: KITTI is still a relatively small dataset, and its categories are limited. The model's performance should also be tested on larger, more general datasets such as ImageNet.

8 Proper model for a specific dataset
Cambridge
A somewhat unanswered question in deep learning: is the selected CNN optimal for the dataset in terms of accuracy and model size? Some standard is needed, but based on what?
Approach: given a CNN pre-trained on a specific dataset, refine the architecture to potentially increase accuracy while possibly reducing model size.
Standard: the feature-extraction ability of the CNN for that specific dataset.
Intuition: separation enhancement, i.e., best separate the classes of the dataset, assuming a constant network depth.
Reference: Refining Architectures of Deep Convolutional Neural Networks, Machine Intelligence Lab, University of Cambridge, UK and Microsoft Research Cambridge, UK [CVPR 2016]

9 Separation enhancement and deterioration capacity of a layer
Cambridge
Correlation matrices for the 8 convolutional layers of VGG-11 trained on SAD and on CAMIT-NSAD.
Dark blue: minimum correlation between classes; bright yellow: maximum correlation.
The correlation matrices indicate how well a given convolutional layer separates the classes (a sketch of the computation follows below).
Top row (SAD): the lower layers separate the classes better than the deeper layers do.
Bottom row (CAMIT-NSAD): the classes are separated less in the lower layers and more prominently in the deeper layers.
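A sketch of how such a class-correlation matrix could be computed for one convolutional layer: average the layer's (pooled) activations per class, then correlate the class means. This is a reconstruction of the general idea, assuming NumPy; it is not the paper's exact procedure.

```python
import numpy as np

def class_correlation_matrix(features, labels, num_classes):
    """Correlation between per-class mean feature vectors at one layer.

    features: (num_samples, feature_dim) activations of one conv layer,
              e.g. globally average-pooled over the spatial dimensions.
    labels:   (num_samples,) integer class labels.
    Returns a (num_classes, num_classes) correlation matrix; a low value
    for a class pair suggests the layer separates that pair well.
    """
    class_means = np.stack([features[labels == c].mean(axis=0)
                            for c in range(num_classes)])
    return np.corrcoef(class_means)

# Toy example: 100 samples, 64-dim pooled features, 5 classes.
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 64))
labs = rng.integers(0, 5, size=100)
C = class_correlation_matrix(feats, labs, num_classes=5)
print(C.shape)  # (5, 5)
```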

10 Separation enhancement and deterioration capacity of a layer
Cambridge
Comparing the correlation matrices of consecutive layers, C_l and C_l+1: for which class pairs did the separation increase, and for which did it deteriorate?
- The number of class pairs whose separation increased from one layer to the next.
- The number of class pairs whose separation decreased from one layer to the next.
This quantifies the inter-class separation, which varies across layers differently for different datasets (see the counting sketch below).
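Continuing the sketch above, comparing the correlation matrices of two consecutive layers reduces to counting, over all class pairs, where the correlation dropped (separation increased) and where it rose (separation deteriorated). Again a reconstruction of the idea, not the paper's code:

```python
import numpy as np

def separation_changes(C_l, C_next):
    """Count class pairs whose separation increased/decreased between layers.

    C_l, C_next: class-correlation matrices of consecutive layers.
    Lower correlation means better separation, so a drop in correlation
    from one layer to the next counts as a separation increase.
    """
    iu = np.triu_indices_from(C_l, k=1)    # each unordered class pair once
    delta = C_next[iu] - C_l[iu]
    increased = int(np.sum(delta < 0))     # correlation fell: better separated
    decreased = int(np.sum(delta > 0))     # correlation rose: worse separated
    return increased, decreased

# Toy example with random symmetric matrices standing in for C_l, C_l+1.
rng = np.random.default_rng(1)
A = rng.uniform(size=(5, 5)); A = (A + A.T) / 2
B = rng.uniform(size=(5, 5)); B = (B + B.T) / 2
print(separation_changes(A, B))
```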

11 Separation enhancement and deterioration capacity of a layer
Cambridge
Dataset split: train 22084, val 3056, test 5618 (the second split in the source lists val 3056 and test 5618 but is missing its train count).
DR = Deep Refined Architecture (the proposed approach)
DR-1 = Deep Refined Architecture with only the Stretch network
DR-2 = Deep Refined Architecture with only the Symmetric Split
Sp-1 = L1-sparsified network
Sp-2 = L2-sparsified network

12 Separation enhancement and deterioration capacity of a layer
Cambridge
Contributions:
- Provides a quantified procedure for refining a network architecture.
- Realizes a balance between precision and model size.
Query: SAD and CAMIT-NSAD are relatively small datasets, and they contain only scene data. What about large object datasets like ImageNet? The paper also sidesteps the generalization problem: when the source of the test data is unknown, how do we transfer-learn the model and refine a better one?

13 MIT: Object Detectors Emerge in Deep Scene CNNs
Published as a conference paper at ICLR 2015.
- The same network can do both object localization and scene recognition in a single forward pass.
- The deep features from Places-CNN tend to perform better on scene-related recognition tasks than the features from ImageNet-CNN.
Related work on scene recognition and classification: Key Laboratory of Intelligent Information Processing, Chinese Academy of Sciences (CAS) [CVPR 2016], which mixes scene data and object data together in the training process.
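As a sketch of the Places-CNN vs. ImageNet-CNN comparison above: the two networks share an architecture and differ only in training data, and their penultimate activations serve as generic deep features. The snippet below uses torchvision's ImageNet AlexNet weights as a stand-in, since Places-trained weights are not bundled with torchvision; loading Places weights instead would yield the Places-CNN features.

```python
import torch
import torchvision.models as models

# Load an AlexNet with ImageNet weights; a Places-CNN would be the same
# architecture with weights trained on the Places scene dataset instead.
net = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
net.eval()

# Drop the final classification layer so the network emits the
# penultimate-layer activations, used here as generic deep features.
net.classifier = torch.nn.Sequential(*list(net.classifier.children())[:-1])

with torch.no_grad():
    image = torch.randn(1, 3, 224, 224)   # stand-in for a preprocessed image
    feats = net(image)
print(feats.shape)  # torch.Size([1, 4096])
```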

14 What needs to be taken into consideration?
Cambridge
- Scale down the size of our model and make it easier to control, while improving its feature-extraction ability.
- How can training be carried out on both an object dataset and a scene dataset with a single feature extractor, and how can the abstracted scene information be merged with the object features? (One possible fusion is sketched below.)
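One simple way to merge the abstracted scene information with the object features, as asked above, is to concatenate a global scene descriptor onto each per-object feature vector before the classification head. This is only one possible design; the dimensions and the fusion-by-concatenation choice are assumptions.

```python
import torch
import torch.nn as nn

class SceneAwareHead(nn.Module):
    """Detection head that appends a global scene descriptor to each object.

    A minimal sketch of one possible fusion: every per-object feature vector
    is concatenated with the image-level scene feature before classification.
    Dimensions and fusion-by-concatenation are assumptions, not a known design.
    """
    def __init__(self, obj_dim=256, scene_dim=128, num_classes=10):
        super().__init__()
        self.classifier = nn.Linear(obj_dim + scene_dim, num_classes)

    def forward(self, obj_feats, scene_feat):
        # obj_feats: (num_objects, obj_dim); scene_feat: (scene_dim,)
        scene = scene_feat.unsqueeze(0).expand(obj_feats.size(0), -1)
        return self.classifier(torch.cat([obj_feats, scene], dim=1))

head = SceneAwareHead()
logits = head(torch.randn(5, 256), torch.randn(128))
print(logits.shape)  # torch.Size([5, 10])
```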

15 Thank you!

