Presentation on theme: "CNN architectures Mostly linear structure"— Presentation transcript:

1 CNN architectures Mostly linear structure
Mostly a linear structure; more generally a DAG (directed acyclic graph).
Examples: LeNet, AlexNet, ZF Net, GoogLeNet, VGGNet, ResNet.
VisGraph, HKUST
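As an illustration (not from the slides), here is a minimal PyTorch sketch contrasting the two structures: a purely linear stack of layers, and a ResNet-style residual block whose skip connection makes the computation graph a DAG. Layer sizes are arbitrary.

```python
# Minimal sketch (illustrative layer sizes): a plain linear stack of conv
# layers vs. a residual block whose skip connection turns the computation
# graph into a DAG rather than a simple chain.
import torch
import torch.nn as nn

# Purely linear structure: the output of each layer feeds only the next layer.
linear_cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
)

class ResidualBlock(nn.Module):
    """A ResNet-style block: two convs plus an identity skip connection (DAG)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)  # the skip edge makes the graph a DAG

x = torch.randn(1, 3, 32, 32)
print(linear_cnn(x).shape)        # torch.Size([1, 10])
print(ResidualBlock(3)(x).shape)  # torch.Size([1, 3, 32, 32])
```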

2 Learned convolutional filters: Stage 1
9 patches with the strongest activation, and the learned filters (7x7x96). These visualizations by Matt Zeiler and Rob Fergus give an idea of what the network is doing at different stages: the higher you go, the richer and more specialized the features become. Zeiler, Matthew D., and Rob Fergus. "Visualizing and Understanding Convolutional Networks." arXiv:1311.2901 (2013).
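As a rough analogue of these visualizations, the first-layer filters of a pretrained network can be plotted directly as small RGB images. The sketch below assumes a recent torchvision with a pretrained AlexNet (11x11, 64 filters) standing in for the 7x7x96 ZF Net filters shown on the slide.

```python
# Sketch: plot the learned first-layer filters of a pretrained CNN as RGB tiles.
# AlexNet from torchvision stands in for the ZF Net filters on the slide.
import matplotlib.pyplot as plt
import torchvision

model = torchvision.models.alexnet(weights="IMAGENET1K_V1")
filters = model.features[0].weight.detach()          # shape (64, 3, 11, 11)
filters = (filters - filters.min()) / (filters.max() - filters.min())  # to [0,1]

fig, axes = plt.subplots(8, 8, figsize=(8, 8))
for ax, f in zip(axes.flat, filters):
    ax.imshow(f.permute(1, 2, 0).numpy())             # CHW -> HWC for imshow
    ax.axis("off")
plt.show()
```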

3 Strongest activations: Stage 2
Zeiler, Matthew D., and Rob Fergus. "Visualizing and Understanding Convolutional Networks." arXiv:1311.2901 (2013).

4 Strongest activations: Stage 3
Zeiler, Matthew D., and Rob Fergus. "Visualizing and Understanding Convolutional Networks." arXiv:1311.2901 (2013).

5 Strongest activations: Stage 4
Zeiler, Matthew D., and Rob Fergus. "Visualizing and Understanding Convolutional Networks." arXiv:1311.2901 (2013).

6 Strongest activations: Stage 5
Zeiler, Matthew D., and Rob Fergus. "Visualizing and Understanding Convolutional Networks." arXiv:1311.2901 (2013).

7 Open questions
That deeper is better is only an empirical observation; images contain hierarchical structures.
Overfitting and generalization: meaningful data matter, and the intrinsic laws are not understood.
Networks are non-convex and need regularization (a sketch follows below).
Smaller networks are hard to train with local methods: their local minima are bad in loss, not stable, with large variance.
Bigger networks are easier: they have more local minima, but better ones, more stable, with small variance.
Make the network as big as the computational power, and the data, allow!
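On the "need regularization" point, here is a minimal sketch of two common regularizers in PyTorch, L2 weight decay via the optimizer and dropout before the classifier; the network and hyperparameter values are illustrative, not from the slides.

```python
# Sketch of common regularization (illustrative values):
# L2 weight decay via the optimizer plus dropout before the final classifier.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Dropout(p=0.5),            # randomly zero activations during training
    nn.Linear(32, 10),
)

# weight_decay adds an L2 penalty on the weights at every update step
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)
```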

8 CNN applications
Transfer learning: fine-tuning the CNN.
Keep some early layers: early layers contain more generic features (edges, color blobs) that are common to many visual tasks.
Fine-tune the later layers, which are more specific to the details of the classes.
CNN as feature extractor: remove the last fully connected layer and use the remaining activations as a kind of descriptor, or "CNN codes", for the image; AlexNet gives a 4096-dimensional descriptor. (Both uses are sketched below.)
VisGraph, HKUST
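A hedged sketch of both uses with torchvision's pretrained AlexNet (assuming a recent torchvision; the 20-class head is arbitrary): freeze the generic early layers and fine-tune a new classifier head, or drop the final fully connected layer to obtain the 4096-dimensional CNN codes.

```python
# Sketch: fine-tuning later layers of AlexNet, and using the network as a
# 4096-dim feature extractor by removing the last fully connected layer.
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.alexnet(weights="IMAGENET1K_V1")

# 1) Transfer learning: freeze the generic early layers, fine-tune the rest.
for param in model.features.parameters():
    param.requires_grad = False                      # keep edge/blob filters
model.classifier[6] = nn.Linear(4096, 20)            # new head for 20 classes
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3, momentum=0.9)

# 2) CNN codes: drop the final FC layer, keep the 4096-dim activations.
feature_extractor = torchvision.models.alexnet(weights="IMAGENET1K_V1")
feature_extractor.classifier = nn.Sequential(
    *list(feature_extractor.classifier.children())[:-1])
with torch.no_grad():
    codes = feature_extractor(torch.randn(1, 3, 224, 224))
print(codes.shape)                                    # torch.Size([1, 4096])
```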

9 What is object classification?
Demo on Stanford CS231n.

10 What are visual tasks?
General visual tasks, and specific ones: "face" and "people" detection and recognition, OCR; don't underestimate "small" problems.
In order of increasing difficulty:
Classification (the main or dominant object)
Localization (the dominant object, plus a tight bounding box)
Detection (any number of objects, any size)
Segmentation (semantic, pixel level)
I'll first define object detection from a computer vision perspective with respect to the other tasks: classification usually means predicting the class of the main object of an image; localization means predicting the class of the main object and also a tight bounding box around it; detection is similar to localization except that objects can be of any size and in any number (including zero); segmentation goes one step further by labeling every pixel of an image. These are ordered by increasing difficulty, and you probably don't need to go all the way to segmentation for many tasks.
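To make the four output formats concrete, here are illustrative (hypothetical) type definitions; they are not any particular library's API.

```python
# Illustrative output types for the four tasks on this slide.
from dataclasses import dataclass
from typing import List, Tuple
import numpy as np

@dataclass
class Classification:          # one label for the dominant object
    label: str

@dataclass
class Localization:            # dominant object's label plus a tight box
    label: str
    box: Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)

@dataclass
class Detection:               # any number of objects, any size (may be empty)
    objects: List[Localization]

@dataclass
class Segmentation:            # a class label for every pixel
    label_map: np.ndarray      # shape (H, W), integer class ids
```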

11 Why are they important?
Robotics: perception is broader than vision, but visual perception is fundamental and currently the bottleneck.
Self-driving cars.
Surveillance.
Perception is a big deal and is currently one of the biggest bottlenecks for applications such as robotics, self-driving cars, or surveillance (pandas only).

12 Face detection and recognition
Detection was easy even pre-DNN: the Viola-Jones approach (2001) used Haar features, AdaBoost, and a cascaded classifier.
Verification is a binary classification: verify whether two images belong to the same person.
Identification is a multi-class classification: classify an image into one of N identity classes. (Both formulations are sketched below.)
Key challenges: intra-personal variations and inter-personal variations.
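A sketch of the two formulations, assuming a hypothetical face-embedding network `embed_net` and a similarity threshold tuned on validation data: verification compares two embeddings, identification adds an N-way classification head.

```python
# Sketch (hypothetical embed_net and threshold): face verification as a binary
# decision on embedding similarity, face identification as N-way classification.
import torch
import torch.nn as nn
import torch.nn.functional as F

def verify(embed_net: nn.Module, img_a, img_b, threshold=0.7):
    """Binary classification: do the two images show the same person?"""
    with torch.no_grad():
        ea, eb = embed_net(img_a), embed_net(img_b)
    similarity = F.cosine_similarity(ea, eb).item()
    return similarity > threshold          # threshold tuned on validation data

class Identifier(nn.Module):
    """Multi-class classification: map an embedding to one of N identities."""
    def __init__(self, embed_net: nn.Module, embed_dim: int, num_identities: int):
        super().__init__()
        self.embed_net = embed_net
        self.head = nn.Linear(embed_dim, num_identities)

    def forward(self, img):
        return self.head(self.embed_net(img))   # logits over N identity classes
```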

13 Are they deployed?
Classification: personal image search (Google, Baidu, Bing).
Detection: face detection in cameras; elections (duplicate votes); CCTV; border control; casinos; visa processing; crime solving; prosopagnosia (face blindness).
Objects: license plates; pedestrian detection (Daimler, Mobileye), e.g. Mercedes-Benz E-Class and S-Class warning and automatic braking, reducing accidents and their severity; vehicle detection for forward collision warning (Mobileye); traffic sign detection (Mobileye).
What has been deployed so far? Regarding the recent deep learning work, mostly classification (deployed within 6 months at Google after the acquisition of the Toronto group). Regarding more traditional vision, there has been a lot of deployment of face detection, because that is one of the easiest detection problems. More complicated detection has recently made its way into cars, for example with pedestrian detection in the 2013 Mercedes.

14 Pre- and post-DNN
Pre-DNN: hand-crafted features and descriptors; bag of words; vocabulary tree (a sketch of this pipeline follows).
The DNN era: learned features.
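A sketch of the pre-DNN bag-of-visual-words pipeline, using OpenCV SIFT and scikit-learn flat k-means as stand-ins (a vocabulary tree is hierarchical k-means for faster lookup); function names are illustrative.

```python
# Sketch of the pre-DNN bag-of-visual-words pipeline (OpenCV SIFT and
# scikit-learn k-means used as stand-ins for the slide's vocabulary tree).
import cv2
import numpy as np
from sklearn.cluster import KMeans

def sift_descriptors(gray_image):
    sift = cv2.SIFT_create()
    _, desc = sift.detectAndCompute(gray_image, None)
    return desc if desc is not None else np.empty((0, 128), np.float32)

def build_vocabulary(training_images, vocab_size=1000):
    """Cluster all local descriptors into a visual vocabulary (flat k-means)."""
    all_desc = np.vstack([sift_descriptors(img) for img in training_images])
    return KMeans(n_clusters=vocab_size, n_init=4).fit(all_desc)

def bow_histogram(image, vocabulary):
    """Represent one image as a normalized histogram of visual-word counts."""
    words = vocabulary.predict(sift_descriptors(image).astype(np.float64))
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(np.float32)
    return hist / max(hist.sum(), 1.0)       # L1-normalize
```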

15 The machine learning landscape: supervised vs. unsupervised, deep vs. shallow
[Chart (slide: M. Ranzato) arranging methods along two axes, deep/shallow and supervised/unsupervised: Perceptron, SVM, Boosting, Neural Net, Recurrent Neural Net, Convolutional Neural Net on the supervised side; Autoencoder, Sparse Coding, GMM, Restricted BM, BayesNP, Deep (sparse/denoising) Autoencoder, Deep Belief Net on the unsupervised side.]
To situate this tutorial in the machine learning context, we'll be talking about convnets, which are in the deep, supervised area of machine learning, although they can be initialized with unsupervised pre-training too.
Slide: M. Ranzato

