ECE 5424: Introduction to Machine Learning

Presentation transcript:

ECE 5424: Introduction to Machine Learning. Topics: Neural Networks, Backprop. Readings: Murphy 16.5. Stefan Lee, Virginia Tech

Recap of Last Time (C) Dhruv Batra

Not linearly separable data Some datasets are not linearly separable! http://www.eee.metu.edu.tr/~alatan/Courses/Demo/AppletSVM.html

Addressing non-linearly separable data – Option 1, non-linear features Choose non-linear features, e.g.: Typical linear features: w_0 + Σ_i w_i x_i Example of non-linear features: degree-2 polynomials, w_0 + Σ_i w_i x_i + Σ_{ij} w_ij x_i x_j The classifier h_w(x) is still linear in the parameters w, so it is as easy to learn The data becomes linearly separable in the higher-dimensional feature space Can be expressed via kernels (C) Dhruv Batra Slide Credit: Carlos Guestrin
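A minimal NumPy sketch of this idea (not from the slides; the function names and the XOR-style data are my own): a degree-2 feature map phi(x), after which the classifier is still linear in w.

```python
import numpy as np

def degree2_features(x):
    """phi(x): bias, linear, and all degree-2 polynomial terms of x."""
    d = len(x)
    feats = [1.0]                                                  # w_0 term
    feats += [x[i] for i in range(d)]                              # Σ_i w_i x_i terms
    feats += [x[i] * x[j] for i in range(d) for j in range(i, d)]  # Σ_ij w_ij x_i x_j terms
    return np.array(feats)

# XOR-style data: not linearly separable in the original 2-d space ...
X = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
y = np.array([1, 1, -1, -1])          # label is the sign of x1*x2

# ... but a classifier that is *linear in w* over phi(x) separates it:
# for d=2, phi = [1, x1, x2, x1^2, x1*x2, x2^2]; put the weight on the cross term.
w = np.zeros(6)
w[4] = 1.0
predictions = np.sign([w @ degree2_features(x) for x in X])
print(predictions)                    # matches y: [ 1.  1. -1. -1.]
```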

Addressing non-linearly separable data – Option 2, non-linear classifier Choose a classifier h_w(x) that is non-linear in the parameters w, e.g., decision trees, neural networks, … More general than linear classifiers, but can often be harder to learn (non-convex optimization required) Often very useful (outperforms linear classifiers) In a way, both ideas are related (C) Dhruv Batra Slide Credit: Carlos Guestrin

Biological Neuron (C) Dhruv Batra

Recall: The Neuron Metaphor Neurons accept information from multiple inputs and transmit information to other neurons. Multiply inputs by weights along edges; apply some function to the set of inputs at each node. Slide Credit: HKUST

Types of Neurons Linear Neuron, Logistic Neuron, Perceptron (potentially more). Require a convex loss function for gradient descent training. Slide Credit: HKUST
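A small sketch of these neuron types (illustrative, not HKUST's code): each applies a different output function to the same weighted sum of its inputs.

```python
import numpy as np

def pre_activation(w, x, b):
    """Every neuron type below first forms the weighted sum of its inputs."""
    return w @ x + b

def linear_neuron(w, x, b):
    return pre_activation(w, x, b)                         # identity output: regression

def logistic_neuron(w, x, b):
    z = pre_activation(w, x, b)
    return 1.0 / (1.0 + np.exp(-z))                        # sigmoid output in (0, 1)

def perceptron_neuron(w, x, b):
    return 1.0 if pre_activation(w, x, b) >= 0 else 0.0    # hard threshold

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
b = 0.05
print(linear_neuron(w, x, b), logistic_neuron(w, x, b), perceptron_neuron(w, x, b))
```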

Limitation A single “neuron” still gives only a linear decision boundary. What to do? Idea: Stack a bunch of them together! (C) Dhruv Batra

Multilayer Networks Cascade neurons together: the output from one layer is the input to the next. Each layer has its own set of weights. Slide Credit: HKUST

Universal Function Approximators Theorem: A 3-layer network with linear outputs can uniformly approximate any continuous function (on a compact input domain) to arbitrary accuracy, given enough hidden units [Funahashi ’89] (C) Dhruv Batra

Plan for Today Neural Networks Parameter learning Backpropagation (C) Dhruv Batra

Forward Propagation On board (C) Dhruv Batra
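Since forward propagation is worked on the board, here is a rough NumPy sketch of the same computation; the layer sizes, random weights, and sigmoid activation are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, params):
    """Feed the input forward: each layer's output becomes the next layer's input."""
    a = x
    for W, b in params:
        a = sigmoid(W @ a + b)      # weighted sum, then non-linearity
    return a

rng = np.random.default_rng(0)
layer_sizes = [3, 4, 2, 1]          # input -> hidden -> hidden -> output (made-up sizes)
params = [(rng.standard_normal((m, n)), np.zeros(m))
          for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]   # each layer has its own weights

x = np.array([0.2, -0.5, 1.0])
print(forward(x, params))           # the network's prediction for x
```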

Feed-Forward Networks Predictions are fed forward through the network to classify Slide Credit: HKUST

Gradient Computation First let’s try: Single Neuron for Linear Regression (C) Dhruv Batra

Gradient Computation First let's try: Single Neuron for Linear Regression. Now let's try the general case: Backpropagation! Really efficient. (C) Dhruv Batra
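A hedged sketch of both steps (the squared-error loss, sigmoid hidden units, and layer sizes are my assumptions): the gradient for a single linear-regression neuron, then backprop through one hidden layer, reusing each layer's error signal so the whole gradient costs about as much as a forward pass.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: single neuron for linear regression, loss L = 0.5 * (w.x - y)^2
def single_neuron_grad(w, x, y):
    err = w @ x - y
    return err * x                      # dL/dw, a direct chain-rule computation

# Step 2: general case, one hidden layer -- backpropagation.
def backprop(x, y, W1, b1, W2, b2):
    # forward pass, caching intermediate values
    z1 = W1 @ x + b1
    h = sigmoid(z1)
    yhat = W2 @ h + b2                  # linear output
    # backward pass: reuse each layer's error signal for the layer below
    d_yhat = yhat - y                   # dL/dyhat for L = 0.5*(yhat - y)^2
    dW2 = np.outer(d_yhat, h)
    db2 = d_yhat
    d_h = W2.T @ d_yhat
    d_z1 = d_h * h * (1 - h)            # sigmoid'(z1) = h*(1-h)
    dW1 = np.outer(d_z1, x)
    db1 = d_z1
    return dW1, db1, dW2, db2

rng = np.random.default_rng(0)
x, y = rng.standard_normal(3), np.array([0.7])
W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)
W2, b2 = rng.standard_normal((1, 4)), np.zeros(1)
print([g.shape for g in backprop(x, y, W1, b1, W2, b2)])
```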

Neural Nets Best performers on OCR: http://yann.lecun.com/exdb/lenet/index.html NetTalk, a text-to-speech system from 1987: http://youtu.be/tXMaFhO6dIY?t=45m15s Rick Rashid speaks Mandarin: http://youtu.be/Nu-nlQqFCKg?t=7m30s (C) Dhruv Batra

Historical Perspective (C) Dhruv Batra

Convergence of backprop Perceptron leads to convex optimization: gradient descent reaches the global minimum. Multilayer neural nets are not convex: gradient descent gets stuck in local minima, the learning rate is hard to set, and selecting the number of hidden units and layers is a fuzzy process. NNs had fallen out of fashion in the 90s and early 2000s; they are back with a new name and significantly improved performance! Deep networks: dropout and training on much larger corpora. (C) Dhruv Batra Slide Credit: Carlos Guestrin
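A toy illustration of the non-convexity point (my own example, not from the slides): gradient descent on a simple non-convex 1-d function lands in different minima depending on initialization, and a too-large learning rate can make it overshoot and diverge.

```python
import numpy as np

def f(w):           # a simple non-convex function with two minima
    return (w**2 - 1.0)**2 + 0.3 * w

def grad(w):
    return 4.0 * w * (w**2 - 1.0) + 0.3

def gradient_descent(w0, lr=0.05, steps=200):
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Different initializations land in different local minima:
print(gradient_descent(w0=+1.5))   # converges near the minimum around w ≈ +0.96
print(gradient_descent(w0=-1.5))   # converges near the (better) minimum around w ≈ -1.04
# A much larger learning rate (e.g. lr=0.5) can overshoot and diverge instead.
```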

Overfitting Many many many parameters Avoiding overfitting? More training data Regularization Early stopping (C) Dhruv Batra
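A rough sketch of early stopping on made-up synthetic data with a linear model (all names and numbers here are illustrative): track held-out error and keep the best parameters seen. (Regularization would instead add, e.g., a lambda*||w||^2 penalty to the training loss.)

```python
import numpy as np

rng = np.random.default_rng(0)
X_train, X_val = rng.standard_normal((50, 5)), rng.standard_normal((20, 5))
w_true = rng.standard_normal(5)
y_train = X_train @ w_true + 0.5 * rng.standard_normal(50)   # noisy synthetic targets
y_val = X_val @ w_true + 0.5 * rng.standard_normal(20)

def mse(w, X, y):
    return np.mean((X @ w - y) ** 2)

w = np.zeros(5)
best_w, best_val = w.copy(), np.inf
for step in range(500):
    grad = 2 * X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= 0.01 * grad
    val = mse(w, X_val, y_val)
    if val < best_val:               # early stopping: remember the best validation point
        best_w, best_val = w.copy(), val

print("best validation MSE:", best_val)
```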

A quick note (C) Dhruv Batra Image Credit: LeCun et al. ‘98

Rectified Linear Units (ReLU) (C) Dhruv Batra
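The slide itself is an image; as a sketch, here are ReLU and its derivative next to the sigmoid's, since the derivative is what flows backward during backprop.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    return (z > 0).astype(float)        # 1 where the unit is active, 0 elsewhere

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1 - s)                  # at most 0.25, so gradients shrink layer after layer

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(z), relu_grad(z))
print(sigmoid_grad(z))                  # compare the size of the gradients
```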

Convolutional Nets Basic Idea On board Assumptions: Local Receptive Fields Weight Sharing / Translational Invariance / Stationarity Each layer is just a convolution! (C) Dhruv Batra Image Credit: Chris Bishop
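A minimal 2-D convolution sketch (mine, not the lecture's code): one small filter slid over the image, with the same weights reused at every location (weight sharing) and each output depending only on a local patch (local receptive field).

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: slide one shared filter over the image.
    (No kernel flip, i.e. cross-correlation, as conv-net layers usually compute.)"""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]       # local receptive field
            out[i, j] = np.sum(patch * kernel)      # same weights at every location
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
edge_filter = np.array([[1.0, -1.0]])               # a made-up horizontal edge detector
print(conv2d(image, edge_filter))
```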

(C) Dhruv Batra Slide Credit: Marc'Aurelio Ranzato

Convolutional Nets Example: http://yann.lecun.com/exdb/lenet/index.html (C) Dhruv Batra Image Credit: Yann LeCun, Kevin Murphy

(C) Dhruv Batra Slide Credit: Marc'Aurelio Ranzato

Visualizing Learned Filters (C) Dhruv Batra Figure Credit: [Zeiler & Fergus ECCV14]

Autoencoders Goal: Compression. The output tries to predict the input. (C) Dhruv Batra Image Credit: http://ufldl.stanford.edu/wiki/index.php/Stacked_Autoencoders

Autoencoders Goal: Learn a low-dimensional “basis” for the data. (C) Dhruv Batra Image Credit: Andrew Ng
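A small sketch of the idea under made-up data (a linear autoencoder trained by gradient descent on reconstruction error; sizes and names are illustrative): the output tries to reproduce the input through a low-dimensional bottleneck.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))           # made-up data: 100 points in 10 dimensions

d_hidden = 3                                 # the low-dimensional "basis"
W_enc = 0.1 * rng.standard_normal((d_hidden, 10))
W_dec = 0.1 * rng.standard_normal((10, d_hidden))

def reconstruction_loss(X, W_enc, W_dec):
    Z = X @ W_enc.T                          # encode: compress each point to d_hidden numbers
    X_hat = Z @ W_dec.T                      # decode: try to predict the input back
    return np.mean((X_hat - X) ** 2)

# Train with plain gradient descent on the reconstruction error.
lr = 0.01
for _ in range(500):
    Z = X @ W_enc.T
    X_hat = Z @ W_dec.T
    err = (X_hat - X) / len(X)
    grad_dec = err.T @ Z
    grad_enc = (err @ W_dec).T @ X
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print(reconstruction_loss(X, W_enc, W_dec))  # lower than at initialization
```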

Stacked Autoencoders How about we compress the low-dim features more? (C) Dhruv Batra Image Credit: http://ufldl.stanford.edu/wiki/index.php/Stacked_Autoencoders

Sparse DBNs [Lee et al. ICML ‘09] Figure courtesy: Quoc Le (C) Dhruv Batra

Stacked Autoencoders Finally perform classification with these low-dim features. (C) Dhruv Batra Image Credit: http://ufldl.stanford.edu/wiki/index.php/Stacked_Autoencoders
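Continuing in the same spirit (the random encoder below merely stands in for a learned, stacked one): the low-dimensional codes are fed to an ordinary classifier, e.g. logistic regression.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(float)      # synthetic labels

W_enc = rng.standard_normal((3, 10))           # stands in for a learned (stacked) encoder
Z = np.tanh(X @ W_enc.T)                       # low-dimensional features

# Logistic regression on the codes Z instead of the raw inputs X.
w, b, lr = np.zeros(3), 0.0, 0.1
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
    grad_w = Z.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

print("training accuracy:", np.mean((p > 0.5) == y))
```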

What you need to know about neural networks Perceptron: representation, derivation. Multilayer neural nets: derivation of backprop, learning rule, expressive power.