Neural Networks Geoff Hulten

The Human Brain (According to a computer scientist)
- Network of ~100 billion neurons
- Each neuron has ~1,000 – 10,000 connections
- Neurons send electro-chemical signals
- Activation time is ~10 ms, so at most a chain of ~100 neurons can fire in 1 second
- [Image of a neuron, from Wikipedia]

Artificial Neural Network
- Grossly simplified approximation of how the brain works
- Features are used as input to an initial set of artificial neurons
- The outputs of those artificial neurons are used as inputs to others
- The output of the network is used as the prediction
- Mid-2010s image processing networks: ~50-100 layers, ~10-60 million artificial neurons
- [Figure: an artificial neuron (sigmoid unit)]
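To make the sigmoid unit concrete, here is a minimal Python sketch; the function name and argument layout are my own, not from the slides:

```python
import math

def sigmoid_unit(inputs, weights):
    """One artificial neuron (sigmoid unit): weights[0] is the bias term,
    weights[1:] multiply the inputs, and the weighted sum is squashed to (0, 1)."""
    net = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1.0 / (1.0 + math.exp(-net))
```

Stacking these units in layers, with each layer's outputs feeding the next, gives the networks on the following slides.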

Example Neural Network
- Fully connected network with a single hidden layer
- 2,313 weights to learn
- [Figure: 576 pixels (normalized) feed each hidden unit through 1 connection per pixel + a bias (2,308 weights in total); the hidden layer feeds the output layer, which produces P(y=1) (5 weights)]
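The weight count works out as follows, assuming 4 hidden units (which is what the 2,308 / 5 split implies):

$$\underbrace{4 \times (576 + 1)}_{\text{input} \to \text{hidden}} + \underbrace{4 + 1}_{\text{hidden} \to \text{output}} = 2{,}308 + 5 = 2{,}313$$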

Example of Predicting with a Neural Network
- [Figure: inputs $x_1 = 1.0$ and $x_2 = 0.5$ feed two sigmoid hidden units and one sigmoid output unit. Hidden unit 1 ($w_0 = 0.5$, $w_1 = -1.0$, $w_2 = 1.0$) has net input 0.0 and output ~0.5; hidden unit 2 ($w_0 = 1.0$, $w_1 = 0.5$, $w_2 = -1.0$) has output ~0.75; the output unit ($w_0 = 0.25$, $w_1 = 1.0$, $w_2 = 1.0$) has net input ~1.5, giving $P(y=1) \approx 0.82$]
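A short Python sketch of this forward pass. The slide's figure did not survive extraction, so the assignment of each weight triple to a particular unit (and the output unit's third weight of 1.0) is reconstructed to match the intermediate values shown, not taken verbatim from the slide:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x1, x2 = 1.0, 0.5

# Hidden unit 1: w0=0.5, w1=-1.0, w2=1.0 -> net input 0.0, output 0.5
o_h1 = sigmoid(0.5 + (-1.0) * x1 + 1.0 * x2)      # 0.5

# Hidden unit 2: w0=1.0, w1=0.5, w2=-1.0 -> output ~0.73 (shown as ~0.75 on the slide)
o_h2 = sigmoid(1.0 + 0.5 * x1 + (-1.0) * x2)      # ~0.73

# Output unit: w0=0.25, w1=1.0, w2=1.0 -> net input ~1.5, P(y=1) ~0.82
p_y1 = sigmoid(0.25 + 1.0 * o_h1 + 1.0 * o_h2)    # ~0.82
```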

What's Going On?
- Very limited feature engineering on the input
- Hidden nodes learn useful features instead
- [Figure: the input image (normalized); visualizations of the weights from hidden node 1 and hidden node 2, with positive and negative weights highlighted; the output P(y=1) is a logistic regression with the hidden-node responses as input]

Another Example Neural Network
- Fully connected network with two hidden layers
- 2,333 weights to learn
- [Figure: 576 pixels (normalized) feed the first hidden layer through 1 connection per pixel + a bias per unit (2,308 weights); the first hidden layer feeds the second hidden layer (20 weights); the second hidden layer feeds the output layer, which produces P(y=1) (5 weights)]
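Again assuming 4 units per hidden layer (implied by the 2,308 / 20 / 5 split), the count is:

$$\underbrace{4 \times (576 + 1)}_{2{,}308} + \underbrace{4 \times (4 + 1)}_{20} + \underbrace{4 + 1}_{5} = 2{,}333$$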

Single Network (Training Run), Multiple Tasks
- Hidden nodes learn generally useful filters
- [Figure: 576 pixels (normalized) feed a shared hidden layer; the output layer predicts P(eyeOpened), P(leftEye), P(glasses), and P(child)]

Neural Network Architectures/Concepts
- Fully connected layers
- Convolutional layers
- MaxPooling
- Activation (ReLU)
- Softmax
- Recurrent networks (LSTM & attention)
- Embeddings
- Residual networks
- Batch normalization
- Dropout
We will explore these in more detail later.

Loss We'll Use for Neural Networks

$$Loss(\langle \hat{y} \rangle, \langle y \rangle) = \frac{1}{2} \sum_{k \in outputs} (\hat{y}_k - y_k)^2$$

Example: $Loss(\langle .5, .1 \rangle, \langle 1, 0 \rangle) = \frac{1}{2}\left((.5 - 1)^2 + (.1 - 0)^2\right)$

$$Loss(testSet) = \frac{1}{n} \sum_{i}^{n} Loss(\langle \hat{y} \rangle, \langle y \rangle)$$

- [Figure: a network with two output nodes $\hat{y}_1$, $\hat{y}_2$ and their targets $y_1$, $y_2$]
There are all sorts of options for loss functions for neural networks…
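Plugging the numbers from the slide's example into the loss:

$$Loss(\langle .5, .1 \rangle, \langle 1, 0 \rangle) = \tfrac{1}{2}\left((-0.5)^2 + (0.1)^2\right) = \tfrac{1}{2}(0.25 + 0.01) = 0.13$$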

Optimizing Neural Nets – Back Propagation
- Gradient descent over the entire network's weight vector
- Easy to adapt to different network architectures
- Converges to a local minimum (usually won't find the global minimum)
- Training can be very slow! For this week's assignment… sorry… For next week we'll use a package
- In general, very well suited to running on GPUs

Conceptual Backprop
- Forward propagation: run the sample through the network and figure out how much error the network makes on it: $error \approx \hat{y} - y$
- Back propagation: figure out how much each part of the network contributes to that error ($\delta_o$, $\delta_{h1}$, $\delta_{h2}$)
- Update weights: step each weight to reduce the error it is contributing to
- [Figure: the example network with $x_1 = 1.0$, $x_2 = 0.5$, $y = 1$; hidden outputs ~0.5 and ~0.75, output ~0.82]

Backprop Example ($\alpha = 0.1$)
Forward propagation (as in the prediction example): $o_{h1} \approx 0.5$, $o_{h2} \approx 0.75$, $\hat{y} \approx 0.82$, $y = 1$, so error $\approx 0.18$.
Back propagation:
- $\delta_o = \hat{y}(1 - \hat{y})(y - \hat{y}) \approx 0.027$
- $\delta_h = o_h(1 - o_h) \sum_{k \in Outputs} w_{kh} \delta_k$, giving $\delta_{h2} \approx 0.005$
Update weights with $\Delta w_i = \alpha\, \delta_{next}\, x_i$:
- Output unit: $\Delta w_0 = 0.0027$, $\Delta w_1 = 0.0013$, $\Delta w_2 = 0.0020$
- Hidden unit 2: $\Delta w_0 = 0.0005$, $\Delta w_1 = 0.0005$, $\Delta w_2 = 0.00025$
- [Figure: the same network as in the prediction example, annotated with its weights and the $\delta$ values]
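A small Python check that reproduces the numbers above, using the slide's rounded activations. The weight of 1.0 on the connection from hidden unit 2 to the output unit is inferred from the $\delta_{h2}$ value rather than stated explicitly on the slide:

```python
alpha = 0.1
x1, x2 = 1.0, 0.5
o_h1, o_h2, y_hat, y = 0.5, 0.75, 0.82, 1.0     # rounded values from the slide

# Output unit: delta_o = y_hat * (1 - y_hat) * (y - y_hat)
delta_o = y_hat * (1 - y_hat) * (y - y_hat)      # ~0.027

# Weight updates for the output unit: delta_w_i = alpha * delta_o * x_i
dw0_out = alpha * delta_o * 1.0                  # ~0.0027 (bias input is 1)
dw1_out = alpha * delta_o * o_h1                 # ~0.0013
dw2_out = alpha * delta_o * o_h2                 # ~0.0020

# Hidden unit 2: assumes a weight of 1.0 on its connection to the output unit
delta_h2 = o_h2 * (1 - o_h2) * (1.0 * delta_o)   # ~0.005

# Weight updates for hidden unit 2
dw0_h2 = alpha * delta_h2 * 1.0                  # ~0.0005
dw1_h2 = alpha * delta_h2 * x1                   # ~0.0005
dw2_h2 = alpha * delta_h2 * x2                   # ~0.00025
```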

Backprop Algorithm
Initialize all weights to small random numbers (-0.05 to 0.05).
While not 'time to stop', repeatedly loop over the training data:
- Input a single training sample to the network and calculate $o_u$ for every neuron
- Back propagate the errors from the output to every neuron: $\delta_o = \hat{y}(1 - \hat{y})(y - \hat{y})$ and $\delta_h = o_h(1 - o_h) \sum_{k \in Outputs} w_{kh} \delta_k$
- Update every weight in the network: $\Delta w_i = \alpha\, \delta_{next}\, x_i$, $w_i = w_i + \Delta w_i$
Stopping criteria:
- Number of epochs (passes through the data)
- Training set loss stops going down
- Accuracy on validation data
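The algorithm maps directly onto code. Below is a minimal Python sketch for a fully connected network with one hidden layer of sigmoid units and a single output; the structure and names are my own, and it uses the simplest stopping criterion above (a fixed number of epochs):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_backprop(samples, num_inputs, num_hidden, alpha=0.1, epochs=100):
    """samples: list of (x, y) pairs with x a list of num_inputs features, y in {0, 1}."""
    # Initialize all weights to small random numbers; index 0 of each list is the bias.
    hidden_w = [[random.uniform(-0.05, 0.05) for _ in range(num_inputs + 1)]
                for _ in range(num_hidden)]
    output_w = [random.uniform(-0.05, 0.05) for _ in range(num_hidden + 1)]

    for _ in range(epochs):                      # stopping criterion: fixed number of epochs
        for x, y in samples:
            # Forward propagation: compute the output of every neuron
            o_h = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
                   for w in hidden_w]
            y_hat = sigmoid(output_w[0] +
                            sum(wi * oi for wi, oi in zip(output_w[1:], o_h)))

            # Back propagate the errors
            delta_o = y_hat * (1 - y_hat) * (y - y_hat)
            delta_h = [o_h[j] * (1 - o_h[j]) * output_w[j + 1] * delta_o
                       for j in range(num_hidden)]

            # Update every weight: w_i += alpha * delta_next * x_i
            output_w[0] += alpha * delta_o
            for j in range(num_hidden):
                output_w[j + 1] += alpha * delta_o * o_h[j]
            for j in range(num_hidden):
                hidden_w[j][0] += alpha * delta_h[j]
                for i in range(num_inputs):
                    hidden_w[j][i + 1] += alpha * delta_h[j] * x[i]

    return hidden_w, output_w
```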

Backprop with a Hidden Layer (or Multiple Outputs)
When a unit feeds several downstream units, its $\delta$ sums the contributions from all of them:
$\delta_{h1,1} = o_{h1,1}(1 - o_{h1,1})\,(w_{1,1 \to 2,1}\, \delta_{2,1} + w_{1,1 \to 2,2}\, \delta_{2,2})$
The general forms are the same as before:
- $\delta_o = \hat{y}(1 - \hat{y})(y - \hat{y})$
- $\delta_h = o_h(1 - o_h) \sum_{k \in Outputs} w_{kh} \delta_k$
- $\Delta w_i = \alpha\, \delta_{next}\, x_i$
- [Figure: forward propagation, back propagation, and weight updates on a network with two hidden layers; inputs $x_1 = 1.0$, $x_2 = 0.5$, target $y = 1$]

Stochastic Gradient Descent
- Gradient descent: calculate the gradient on all samples, then step
- Stochastic gradient descent: calculate the gradient on some samples, then step
- Stochastic can make progress faster (with a large training set)
- Stochastic takes a less direct path to convergence
- Minibatch: batch size of N instead of a single per-sample gradient
- [Figure: loss contours comparing the paths taken by gradient descent and stochastic gradient descent]
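A generic sketch of the minibatch loop, assuming a caller-supplied gradient_fn (a hypothetical helper, not from the slides) that returns the gradient of the loss over a batch:

```python
import random

def minibatch_gradient_descent(samples, weights, gradient_fn,
                               alpha=0.1, batch_size=32, epochs=10):
    """gradient_fn(batch, weights) -> list of per-weight loss gradients (hypothetical).
    batch_size = len(samples) gives plain (batch) gradient descent,
    batch_size = 1 gives per-sample stochastic gradient descent."""
    for _ in range(epochs):
        random.shuffle(samples)                       # visit samples in a random order
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]
            gradient = gradient_fn(batch, weights)
            weights = [w - alpha * g for w, g in zip(weights, gradient)]
    return weights
```

The only design choice is how many samples to average before each step: more samples per step means a more accurate gradient but fewer steps per pass through the data.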

Local Optima and Momentum
- Backprop converges to a local optimum. Why is this okay? In practice: neural networks overfit…
- Momentum: $\Delta w_i^{(n)} = \Delta w_i^{(n)} + \beta\, \Delta w_i^{(n-1)}$
- Power through local optima
- Converge faster (?)
- [Figure: loss as a function of the parameters, showing local optima]
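The momentum update in code; $\beta = 0.9$ is a common choice but is not specified on the slide:

```python
def momentum_update(delta_w, previous_delta_w, beta=0.9):
    """Blend this step's weight changes with the previous step's, so updates keep
    moving in a consistent direction and can power through small local optima."""
    return [dw + beta * prev for dw, prev in zip(delta_w, previous_delta_w)]
```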

Dead Neurons & Vanishing Gradients
- Neurons can die: $\delta_h = o_h(1 - o_h) \cdot \langle\text{stuff}\rangle$, and large weights saturate $o_h$ near 0 or 1, causing the gradients to 'vanish'
- $Sigmoid(10) \approx 0.99995$, $Sigmoid(20) \approx 0.999999998$
- Test: assert if this condition occurs
- What causes this:
  - Poor initialization of weights
  - Optimization that gets out of hand
  - Unnormalized input variables
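A quick way to see the saturation numerically (plain Python, names my own):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for z in (0, 2, 10, 20):
    o = sigmoid(z)
    # The o*(1-o) factor in every delta is the sigmoid's derivative; once the unit
    # saturates it is nearly zero, so the weights below it stop changing.
    print(z, o, o * (1 - o))
# sigmoid(10) ~ 0.99995       -> derivative ~ 5e-5
# sigmoid(20) ~ 0.999999998   -> derivative ~ 2e-9
```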

What Should You Do with Neural Networks?
As a model (similar to the others we've learned):
- Fully connected networks
- Few hidden layers (1, 2, 3)
- A few dozen nodes per hidden layer
- Tune the number of layers and the number of nodes per layer
- Do some feature engineering
- Be careful of overfitting
- Simplify if not converging
Leveraging recent breakthroughs:
- Understand standard architectures
- Get some GPU acceleration
- Get lots of data
- Craft a network architecture
- More on this next class

Summary of Artificial Neural Networks
- A model that very crudely approximates the way human brains work
- Each artificial neuron is similar to a linear model, with a non-linear activation function
- Neural networks are very expressive and can learn complex concepts (and overfit)
- Neural networks learn features (which we might otherwise have hand-crafted)
- Many options for network architectures
- Backpropagation is a flexible algorithm for learning neural networks