Introduction to Neural Networks


Introduction to Neural Networks http://playground.tensorflow.org/

Outline: Perceptrons and the perceptron update rule; multi-layer neural networks and how to train them; best practices for training classifiers. After that: convolutional neural networks.

Recall: “Shallow” recognition pipeline: image pixels → hand-crafted feature representation → off-the-shelf trainable classifier → class label.

“Deep” recognition pipeline: image pixels → Layer 1 → Layer 2 → Layer 3 → simple classifier. Learn a feature hierarchy all the way from pixels to the classifier; each layer extracts features from the output of the previous layer; all layers are trained jointly. (Note that classic deep methods such as convnets are purely supervised.)

Neural networks vs. SVMs (a.k.a. “deep” vs. “shallow” learning)

Linear classifiers revisited: Perceptron. Inputs x1, …, xD are weighted by w1, …, wD and summed; output: sgn(w · x + b). The bias b can be incorporated as a component of the weight vector by always including a feature with value set to 1.
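A minimal numpy sketch of this prediction rule and of the bias-as-feature trick (the function names are illustrative, not from the slides):

```python
import numpy as np

def predict(w, b, x):
    # Perceptron output: sign of the raw response w . x + b
    return 1 if np.dot(w, x) + b >= 0 else -1

def predict_homogeneous(w_aug, x):
    # Bias folded into the weights: append a constant feature 1 to x
    x_aug = np.append(x, 1.0)
    return 1 if np.dot(w_aug, x_aug) >= 0 else -1

w, b = np.array([0.4, -0.2]), 0.1
x = np.array([1.0, 2.0])
assert predict(w, b, x) == predict_homogeneous(np.append(w, b), x)
```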

Loose inspiration: Human neurons From Wikipedia: At the majority of synapses, signals are sent from the axon of one neuron to a dendrite of another... All neurons are electrically excitable, maintaining voltage gradients across their membranes… If the voltage changes by a large enough amount, an all-or-none electrochemical pulse called an action potential is generated, which travels rapidly along the cell's axon, and activates synaptic connections with other cells when it arrives.

Perceptron training algorithm: initialize the weights w randomly; cycle through the training examples in multiple passes (epochs); for each training example x with label y, classify with the current weights, y′ = sgn(w · x), and if it is classified incorrectly (y′ ≠ y), update the weights: w ← w + α y x, where α is a positive learning rate that decays over time.
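A sketch of this loop in numpy, assuming labels in {−1, +1} and a constant bias feature already appended to each input; the 1/epoch decay matches the convergence conditions discussed on a later slide:

```python
import numpy as np

def train_perceptron(X, y, epochs=20, alpha0=1.0, seed=0):
    """X: (n, d) inputs with a constant-1 bias feature appended; y: labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])    # small random initialization
    for epoch in range(1, epochs + 1):
        alpha = alpha0 / epoch                     # learning rate decays as O(1/t)
        for i in rng.permutation(len(X)):          # random presentation order
            y_pred = 1 if X[i] @ w >= 0 else -1
            if y_pred != y[i]:                     # update only on mistakes
                w += alpha * y[i] * X[i]
    return w

# Toy linearly separable data: the label is the sign of the first coordinate.
X = np.array([[2.0, 1.0, 1.0], [1.5, -1.0, 1.0], [-2.0, 0.5, 1.0], [-1.0, -2.0, 1.0]])
y = np.array([1, 1, -1, -1])
w = train_perceptron(X, y)
print([(1 if x @ w >= 0 else -1) for x in X])      # should match y
```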

Perceptron update rule: after the update w ← w + α y x, the raw response of the classifier changes from w · x to (w + α y x) · x = w · x + α y ‖x‖². If y = 1 and y′ = −1, the response is initially negative and will be increased; if y = −1 and y′ = 1, the response is initially positive and will be decreased.
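A quick numeric check of this effect (the numbers are made up for illustration):

```python
import numpy as np

w = np.array([0.2, -0.5])
x = np.array([1.0, 1.0])
y, alpha = 1, 0.5                       # true label +1, misclassified since w . x < 0

before = w @ x                          # -0.3: negative response
after = (w + alpha * y * x) @ x         # -0.3 + 0.5 * ||x||^2 = 0.7
print(before, after)                    # the response increased toward the correct sign
```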

Convergence of the perceptron update rule: on linearly separable data it converges to a perfect solution; on non-separable data it converges to a minimum-error solution, assuming examples are presented in random sequence and the learning rate decays as O(1/t), where t is the number of epochs.

Multi-layer perceptrons: to make nonlinear classifiers out of perceptrons, build a multi-layer neural network! This requires each perceptron to have a nonlinearity, and to be trainable, the nonlinearity should be differentiable. Sigmoid: g(t) = 1 / (1 + e^(−t)); rectified linear unit (ReLU): g(t) = max(0, t).
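A small sketch of these two nonlinearities and of a one-hidden-layer forward pass (the weights and layer sizes below are arbitrary, for illustration only):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def relu(t):
    return np.maximum(0.0, t)

def mlp_forward(x, W1, b1, w2, b2, g=relu):
    """One hidden layer: nonlinearity g is applied to the hidden pre-activations."""
    h = g(W1 @ x + b1)        # hidden-layer features
    return w2 @ h + b2        # scalar output score

rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
w2, b2 = rng.normal(size=4), 0.0
print(mlp_forward(x, W1, b1, w2, b2))
```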

Training of multi-layer networks: find network weights that minimize the prediction loss between the true and estimated labels of the training examples: E(w) = Σ_i l(x_i, y_i; w). Possible losses (for binary problems): quadratic loss l(x_i, y_i; w) = (f_w(x_i) − y_i)²; log-likelihood loss l(x_i, y_i; w) = −log P_w(y_i | x_i); hinge loss l(x_i, y_i; w) = max(0, 1 − y_i f_w(x_i)).
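Per-example versions of these three losses in numpy, assuming labels y in {−1, +1} and a raw score f = f_w(x); the logistic-model form of the log-likelihood loss is an assumption, not stated on the slide:

```python
import numpy as np

def quadratic_loss(f, y):
    return (f - y) ** 2

def log_likelihood_loss(f, y):
    # Assuming a logistic model: P_w(y | x) = sigmoid(y * f)
    return np.log1p(np.exp(-y * f))          # -log sigmoid(y * f), computed stably

def hinge_loss(f, y):
    return max(0.0, 1.0 - y * f)

f, y = 0.8, 1
print(quadratic_loss(f, y), log_likelihood_loss(f, y), hinge_loss(f, y))
```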

Training of multi-layer networks: find network weights that minimize the prediction loss E(w) = Σ_i l(x_i, y_i; w). Update the weights by gradient descent: w ← w − η ∂E/∂w, where η is the step size (learning rate). Back-propagation: gradients are computed in the direction from the output layer to the input layer and combined using the chain rule. Stochastic gradient descent: compute the weight update with respect to one training example (or a small batch of examples) at a time, and cycle through the training examples in random order over multiple epochs.
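A compact sketch of back-propagation with stochastic gradient descent for a one-hidden-layer network with sigmoid units and quadratic loss (the architecture, step size, and XOR demo are illustrative choices, not from the slides):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def sgd_train(X, y, hidden=8, epochs=5000, lr=1.0, seed=0):
    """One hidden layer, sigmoid activations, quadratic loss, labels y in {0, 1}."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=1.0, size=(hidden, X.shape[1]))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=1.0, size=hidden)
    b2 = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):          # SGD: one example at a time
            # Forward pass
            h = sigmoid(W1 @ X[i] + b1)
            f = sigmoid(w2 @ h + b2)
            # Backward pass: chain rule from output toward input
            d_f = 2 * (f - y[i]) * f * (1 - f)     # d loss / d output pre-activation
            d_w2, d_b2 = d_f * h, d_f
            d_h = d_f * w2
            d_z1 = d_h * h * (1 - h)               # through the hidden sigmoid
            d_W1, d_b1 = np.outer(d_z1, X[i]), d_z1
            # Gradient descent step
            W1 -= lr * d_W1; b1 -= lr * d_b1
            w2 -= lr * d_w2; b2 -= lr * d_b2
    return W1, b1, w2, b2

# XOR is not linearly separable, so a hidden layer is needed.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)
W1, b1, w2, b2 = sgd_train(X, y)
# Outputs should approach [0, 1, 1, 0] as training progresses.
print([round(float(sigmoid(w2 @ sigmoid(W1 @ x + b1) + b2)), 2) for x in X])
```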

Network with a single hidden layer Neural networks with at least one hidden layer are universal function approximators

Network with a single hidden layer: hidden layer size and network capacity (figure omitted). Source: http://cs231n.github.io/neural-networks-1/

Regularization: it is common to add a penalty (e.g., quadratic) on weight magnitudes to the objective function: E(w) = Σ_i l(x_i, y_i; w) + λ‖w‖². The quadratic penalty encourages the network to use all of its inputs “a little” rather than a few inputs “a lot”. Source: http://cs231n.github.io/neural-networks-1/
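A minimal sketch of adding this quadratic penalty to an objective and to its gradient (a form commonly called weight decay; the numbers are for illustration):

```python
import numpy as np

def l2_regularized_loss(data_loss, w, lam):
    """Total objective: data loss plus the quadratic (L2) penalty on the weights."""
    return data_loss + lam * np.sum(w ** 2)

def l2_gradient_contribution(w, lam):
    """Gradient of lam * ||w||^2, added to the data-loss gradient during training."""
    return 2 * lam * w

w = np.array([0.5, -1.0, 2.0])
print(l2_regularized_loss(0.3, w, lam=0.01))   # 0.3 + 0.01 * 5.25 = 0.3525
print(l2_gradient_contribution(w, lam=0.01))
```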

Multi-Layer Network Demo http://playground.tensorflow.org/

Dealing with multiple classes: if we need to classify inputs into C different classes, we put C units in the last layer to produce C one-vs.-others scores f_1, f_2, …, f_C. Apply the softmax function to convert these scores to probabilities: softmax(f_1, …, f_C) = (exp(f_1) / Σ_j exp(f_j), …, exp(f_C) / Σ_j exp(f_j)). If one of the inputs is much larger than the others, the corresponding softmax value will be close to 1 and the others will be close to 0. Use the log-likelihood (cross-entropy) loss: l(x_i, y_i; w) = −log P_w(y_i | x_i).
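These two pieces in numpy (the max-subtraction inside softmax is a standard numerical-stability trick, not something stated on the slide):

```python
import numpy as np

def softmax(f):
    """Convert C raw scores into probabilities; subtracting the max improves stability."""
    e = np.exp(f - np.max(f))
    return e / e.sum()

def cross_entropy_loss(f, y):
    """Negative log-probability of the true class index y under the softmax."""
    return -np.log(softmax(f)[y])

scores = np.array([2.0, 0.5, -1.0])
print(softmax(scores))                 # dominated by the first class
print(cross_entropy_loss(scores, 0))   # small loss: the true class already scores highest
```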

Neural networks: pros and cons. Pros: a flexible and general function approximation framework; extremely powerful models can be built by adding more layers. Cons: hard to analyze theoretically (e.g., training is prone to local optima); a huge amount of training data and computing power may be required to get good performance; the space of implementation choices is huge (network architectures, parameters).

Best practices for training classifiers. Goal: obtain a classifier with good generalization, i.e., good performance on never-before-seen data. (1) Learn parameters on the training set. (2) Tune hyperparameters (implementation choices) on the held-out validation set. (3) Evaluate performance on the test set. Crucial: do not peek at the test set when iterating steps 1 and 2!
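A schematic of this protocol on synthetic data, using closed-form ridge regression as a stand-in model and its λ as the hyperparameter to tune (everything below is illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data, then a train / validation / test split.
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=200)
idx = rng.permutation(200)
train, val, test = idx[:120], idx[120:160], idx[160:]

def fit_ridge(X, y, lam):
    """Closed-form ridge regression; lam plays the role of a hyperparameter."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def mse(w, X, y):
    return np.mean((X @ w - y) ** 2)

# Steps 1 and 2: fit on the training set, pick the hyperparameter on the validation set.
best_lam = min([0.01, 0.1, 1.0, 10.0],
               key=lambda lam: mse(fit_ridge(X[train], y[train], lam), X[val], y[val]))

# Step 3: report performance once on the untouched test set.
w = fit_ridge(X[train], y[train], best_lam)
print(best_lam, mse(w, X[test], y[test]))
```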

What’s the big deal?

http://www.image-net.org/challenges/LSVRC/announcement-June-2-2015

Bias-variance tradeoff: the prediction error of learning algorithms has two main components: bias, the error due to simplifying model assumptions, and variance, the error due to the randomness of the training set. The bias-variance tradeoff can be controlled by turning “knobs” that determine model complexity. [Figure: high bias, low variance vs. low bias, high variance; source link not preserved.]

Underfitting and overfitting. Underfitting: training and test error are both high; the model does an equally poor job on the training and the test set; the model is too “simple” to represent the data, or the model is not trained well. Overfitting: training error is low but test error is high; the model fits irrelevant characteristics (noise) in the training data; the model is too complex, or the amount of training data is insufficient. [Figure: example fits ranging from underfitting to a good tradeoff to overfitting; source link not preserved.]