Vision-inspired classification

Presentation transcript:

Vision-inspired classification: Deep Learning. Bart M. ter Haar Romeny, Northeastern University, Shenyang, China / Eindhoven University of Technology, Netherlands

From Wikipedia: Deep learning is a class of machine learning algorithms that use a cascade of many layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output from the previous layer as input. The algorithms may be supervised or unsupervised. Applications include pattern analysis (unsupervised) and classification (supervised).

Deep Learning is based on the (unsupervised) learning of multiple levels of features or representations of the data. Higher-level features are derived from lower-level features to form a hierarchical representation. So we have to learn:
What are filters? → Convolution
What are features? → Geometric invariants
How can kernels be learned? → Principal Component Analysis
How does the visual system do this? → Front-end visual cortex
How can we use this? → Software developments

Deep Learning is a very hot area of Machine Learning Research, with many remarkable recent successes, such as 97.5% accuracy on face recognition, nearly perfect German traffic sign recognition, or even Dogs vs Cats image recognition with 98.9% accuracy. Many winning entries in recent Kaggle Data Science competitions have used Deep Learning. The term "deep learning" refers to the method of training multi-layered neural networks, and became popular after papers by Geoffrey Hinton and his co-workers which showed a fast way to train such networks. http://www.kdnuggets.com/2014/05/learn-deep-learning-courses-tutorials-overviews.html

Yann LeCun, who did postdoctoral research with Geoff Hinton, also developed a very effective deep learning architecture, the convolutional network (ConvNet), which was successfully used in the late 1980s and early 1990s for automatic reading of amounts on bank checks. In May 2014, Baidu, the Chinese search giant, hired Andrew Ng, a leading Machine Learning and Deep Learning expert (and co-founder of Coursera), to head its new AI Lab in Silicon Valley, setting up an AI & Deep Learning race with Google (which hired Geoffrey Hinton) and Facebook (which hired Yann LeCun to head the Facebook AI Lab).

An introduction to: Deep Learning, aka (or related to) Deep Neural Networks, Deep Structured Learning, Deep Belief Networks, etc. Geoffrey E. Hinton: https://www.cs.toronto.edu/~hinton/

DL is providing breakthrough results in speech recognition and image classification. See the Hinton et al. 2012 paper (http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/38131.pdf), the MNIST results at http://yann.lecun.com/exdb/mnist/, and the CVPR 2012 paper at http://people.idsia.ch/~juergen/cvpr2012.pdf.

So: 1. What exactly is deep learning? And 2. Why is it generally better than other methods on image, speech and certain other types of data?

So: 1. What exactly is deep learning? And 2. Why is it generally better than other methods on image, speech and certain other types of data? The short answers: 1. ‘Deep learning’ means using a neural network with several layers of nodes between input and output. 2. The series of layers between input and output do feature identification and processing in a series of stages, just as our brains seem to.
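To make answer 1 concrete, here is a minimal sketch (in Python with NumPy, which the slides themselves do not use) of a network with several layers of nodes between input and output. The layer sizes and random weights are purely illustrative, and the input pattern is the first row of the toy dataset that appears later in these slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 5))    # input (3 fields) -> first hidden layer
W2 = rng.normal(size=(5, 4))    # first hidden layer -> second hidden layer
W3 = rng.normal(size=(4, 1))    # second hidden layer -> output (class score)

x  = np.array([1.4, 2.7, 1.9])  # one input pattern
h1 = sigmoid(x @ W1)            # stage 1: first set of learned features
h2 = sigmoid(h1 @ W2)           # stage 2: features built on stage-1 features
y  = sigmoid(h2 @ W3)           # output between 0 and 1
print(y)
```

Each layer takes the previous layer's output as its input, which is the "series of stages" referred to in answer 2.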

hmmm… OK, but: 3. multilayer neural networks have been around for 25 years. What’s actually new?

hmmm… OK, but: 3. multilayer neural networks have been around for 25 years. What’s actually new? We have always had good algorithms for learning the weights in networks with 1 hidden layer, but these algorithms are not good at learning the weights for networks with more hidden layers. What’s new is: algorithms for training many-layer networks.

A single unit: three inputs, -0.06, -2.5 and 1.4, arrive through weights W1, W2 and W3, and the unit produces an output f(x).

The same unit with concrete weights: inputs -0.06, -2.5 and 1.4 arrive through weights 2.7, -8.6 and 0.002. The weighted sum is x = -0.06×2.7 + (-2.5)×(-8.6) + 1.4×0.002 = 21.34, and the unit outputs f(x).
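As a quick check of the arithmetic above, the following sketch reproduces the unit's weighted sum. The slide does not say which squashing function f is; a sigmoid is assumed here purely for illustration.

```python
import numpy as np

inputs  = np.array([-0.06, -2.5, 1.4])   # the three incoming values
weights = np.array([2.7, -8.6, 0.002])   # the weights W1, W2, W3 from the diagram

x = np.dot(inputs, weights)   # -0.06*2.7 + (-2.5)*(-8.6) + 1.4*0.002
print(round(x, 2))            # 21.34

f_x = 1.0 / (1.0 + np.exp(-x))   # f(x): sigmoid, assumed for illustration
print(f_x)                       # essentially 1 for such a large weighted sum
```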

A dataset
Fields           class
1.4  2.7  1.9    0
3.8  3.4  3.2    0
6.4  2.8  1.7    1
4.1  0.1  0.2    0
etc …

Training the neural network (on the dataset above).

Training data as above. Initialise with random weights.

Present a training pattern: 1.4 2.7 1.9

Feed it through to get output: 1.4 2.7 1.9 → 0.8

Compare with target output: output 0.8, target 0, error 0.8

Adjust weights based on error.

Present another training pattern: 6.4 2.8 1.7

Feed it through to get output: 6.4 2.8 1.7 → 0.9

Compare with target output: output 0.9, target 1, error -0.1

Adjust weights based on error.

And so on…. Repeat this thousands, maybe millions of times – each time taking a random training instance, and making slight weight adjustments. Algorithms for weight adjustment are designed to make changes that will reduce the error.
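The loop just described (pick a random training instance, feed it through, compare with the target, adjust the weights slightly) can be sketched in a few lines. This is a simplified, single-unit version with a sigmoid output and a plain gradient-style update; a real network would also propagate the error back through its hidden layers, and the learning rate and number of steps here are arbitrary choices.

```python
import numpy as np

# The toy dataset from the slides: three fields and a 0/1 class label.
X = np.array([[1.4, 2.7, 1.9],
              [3.8, 3.4, 3.2],
              [6.4, 2.8, 1.7],
              [4.1, 0.1, 0.2]])
t = np.array([0.0, 0.0, 1.0, 0.0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
w = rng.normal(scale=0.1, size=3)    # initialise with random weights
b = 0.0
lr = 0.05                            # size of each "slight adjustment"

for step in range(10000):            # repeat thousands of times
    i = rng.integers(len(X))         # take a random training instance
    y = sigmoid(X[i] @ w + b)        # feed it through to get the output
    error = t[i] - y                 # compare with the target output
    w += lr * error * X[i]           # adjust weights based on the error
    b += lr * error

print(np.round(sigmoid(X @ w + b), 2))   # outputs should move towards [0, 0, 1, 0]
```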

The decision boundary perspective… Initial random weights

The decision boundary perspective… Present a training instance / adjust the weights

The decision boundary perspective… Present a training instance / adjust the weights

The decision boundary perspective… Present a training instance / adjust the weights

The decision boundary perspective… Present a training instance / adjust the weights

The decision boundary perspective… Eventually ….

The point I am trying to make: weight-learning algorithms for NNs are dumb. They work by making thousands and thousands of tiny adjustments, each making the network do better at the most recent pattern, but perhaps a little worse on many others. But, by dumb luck, eventually this tends to be good enough to learn effective classifiers for many real applications.

Some other points: If f(x) is non-linear, a network with 1 hidden layer can, in theory, learn perfectly any classification problem. A set of weights exists that can produce the targets from the inputs; the problem is finding them.

Some other ‘by the way’ points: If f(x) is linear, the NN can only draw straight decision boundaries (even if there are many layers of units).

Some other ‘by the way’ points: NNs use nonlinear f(x) so they can draw complex boundaries, but keep the data unchanged.

Some other ‘by the way’ points: NNs use nonlinear f(x) so they can draw complex boundaries, but keep the data unchanged. SVMs only draw straight lines, but they transform the data first in a way that makes that OK.

Feature detectors

what is this unit doing?

Hidden layer units become self-organised feature detectors. (Diagram: a hidden unit connected to a 63-pixel input, units 1–63, with strong +ve weights to a few of the inputs and low/zero weight to the rest.)

What does this unit detect? (Strong +ve weights to a few inputs, low/zero weight elsewhere.)

What does this unit detect? It will send a strong signal for a horizontal line in the top row, ignoring everything else.

What does this unit detect? (A different pattern of strong +ve and low/zero weights.)

What does this unit detect? Strong signal for a dark area in the top-left corner.
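A small sketch of the first idea: give a unit strong positive weights over the top row of a 63-pixel image and near-zero weights elsewhere, and it responds strongly to a horizontal line in the top row while ignoring the same line anywhere else. (A 9×7 pixel layout is assumed here; the slides do not state the exact image shape.)

```python
import numpy as np

H, W = 9, 7                       # 63 pixels, laid out row by row (assumed shape)
weights = np.zeros(H * W)
weights[:W] = 5.0                 # strong +ve weights on the top row, zero elsewhere

top_line = np.zeros((H, W)); top_line[0, :] = 1.0   # horizontal line in the top row
mid_line = np.zeros((H, W)); mid_line[4, :] = 1.0   # the same line, lower down

def unit_response(image):
    return 1.0 / (1.0 + np.exp(-weights @ image.ravel()))

print(unit_response(top_line))    # close to 1: strong signal
print(unit_response(mid_line))    # 0.5 (no net input): the unit ignores everything else
```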

What features might you expect a good NN to learn, when trained with data like this?

Vertical lines.

Horizontal lines.

Small circles.

But what about position invariance??? Our example unit detectors were tied to specific parts of the image.

Successive layers can learn higher-level features: the first layer of units detects lines in specific positions, and higher-level detectors build on these (horizontal line, “RHS vertical line”, “upper loop”, etc.).

Successive layers can learn higher-level features: the first layer of units detects lines in specific positions, and higher-level detectors build on these (horizontal line, “RHS vertical line”, “upper loop”, etc.). What does this unit detect?

So: multiple layers make sense

So: multiple layers make sense Many-layer neural network architectures should be capable of learning the true underlying features and ‘feature logic’, and therefore generalise very well …

But, until very recently, our weight-learning algorithms simply did not work on multi-layer architectures

Along came deep learning …

The new way to train multi-layer NNs…

The new way to train multi-layer NNs… Train this layer first

The new way to train multi-layer NNs… Train this layer first then this layer

The new way to train multi-layer NNs… Train this layer first then this layer then this layer

The new way to train multi-layer NNs… Train this layer first then this layer then this layer then this layer

The new way to train multi-layer NNs… Train this layer first then this layer then this layer then this layer finally this layer

The new way to train multi-layer NNs… EACH of the (non-output) layers is trained to be an auto-encoder. Basically, it is forced to learn good features that describe what comes from the previous layer.

An auto-encoder is trained, with an absolutely standard weight-adjustment algorithm, to reproduce the input.

An auto-encoder is trained, with an absolutely standard weight-adjustment algorithm, to reproduce the input. By making this happen with (many) fewer units than the inputs, the ‘hidden layer’ units are forced to become good feature detectors.
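Here is a minimal sketch of such an auto-encoder, written with NumPy on random stand-in data (in practice the input would be images or the outputs of the previous layer). A small hidden layer is trained by ordinary gradient descent on the reconstruction error, so its units are pushed to become feature detectors. The layer sizes, learning rate and number of epochs are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.random((200, 20))           # 200 stand-in input patterns, 20 features each

n_hidden = 5                        # (many) fewer units than the 20 inputs
W_enc = rng.normal(scale=0.1, size=(20, n_hidden))
W_dec = rng.normal(scale=0.1, size=(n_hidden, 20))
lr = 0.01

for epoch in range(2000):
    h = sigmoid(X @ W_enc)          # hidden layer: the learned features
    X_hat = h @ W_dec               # attempted reconstruction of the input
    err = X_hat - X                 # standard squared-error gradient descent
    W_dec -= lr * h.T @ err / len(X)
    W_enc -= lr * X.T @ ((err @ W_dec.T) * h * (1 - h)) / len(X)

print(np.mean((X_hat - X) ** 2))    # reconstruction error should have fallen
```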

Intermediate layers are each trained to be auto-encoders (or similar).

Final layer trained to predict class based on outputs from previous layers
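Putting the pieces together, here is a hedged end-to-end sketch of the greedy scheme: each layer is pre-trained as an auto-encoder on the outputs of the layer below, and only the final layer is trained with the class labels (here a simple logistic-regression output unit). The data, labels, layer sizes and training constants are all stand-ins for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(X, n_hidden, lr=0.01, epochs=2000, seed=0):
    # Train one layer as an auto-encoder on X and return its encoder weights.
    rng = np.random.default_rng(seed)
    W_enc = rng.normal(scale=0.1, size=(X.shape[1], n_hidden))
    W_dec = rng.normal(scale=0.1, size=(n_hidden, X.shape[1]))
    for _ in range(epochs):
        h = sigmoid(X @ W_enc)
        err = h @ W_dec - X
        W_dec -= lr * h.T @ err / len(X)
        W_enc -= lr * X.T @ ((err @ W_dec.T) * h * (1 - h)) / len(X)
    return W_enc

rng = np.random.default_rng(1)
X = rng.random((200, 20))                        # stand-in training inputs
y = (X[:, 0] + X[:, 1] > 1.0).astype(float)      # stand-in class labels

W1 = pretrain_layer(X, 10)                       # train this layer first
H1 = sigmoid(X @ W1)
W2 = pretrain_layer(H1, 5)                       # then this layer, on the layer-1 features
H2 = sigmoid(H1 @ W2)

# Finally, the output layer is trained with the labels (plain logistic regression).
w_out, b_out = np.zeros(H2.shape[1]), 0.0
for _ in range(5000):
    p = sigmoid(H2 @ w_out + b_out)
    w_out -= 0.1 * H2.T @ (p - y) / len(X)
    b_out -= 0.1 * np.mean(p - y)

print(np.mean((p > 0.5) == y))                   # training accuracy of the stacked net
```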

And that’s that. That’s the basic idea. There are many, many types of deep learning, different kinds of autoencoder, variations on architectures and training algorithms, etc… A very fast-growing area …