Neural Networks
The Elements of Statistical Learning, Chapter 12
Presented by Nick Rizzolo

2. Modeling the Human Brain
- Input builds up on receptors (dendrites)
- The cell has an input threshold
- Upon breach of the cell's threshold, an activation is fired down the axon

3. "Magical" Secrets Revealed
- Linear features are derived from the inputs
- The target concept(s) are non-linear functions of those features

4. Outline
- Projection Pursuit Regression
- Neural Networks proper
- Fitting Neural Networks
- Issues in Training
- Examples

5. Projection Pursuit Regression
- Generalization of a 2-layer regression neural network
- Universal approximator
- Good for prediction
- Not good for deriving interpretable models of data

6. Projection Pursuit Regression
The model maps the inputs $X$ to the output as a sum of ridge functions of projections:
$$f(X) = \sum_{m=1}^{M} g_m(\omega_m^T X)$$
where the $g_m$ are ridge functions and the $\omega_m$ are unit vectors.
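As a concrete illustration of the model above, here is a minimal sketch of evaluating a fitted PPR model, assuming NumPy; the directions and ridge functions below are illustrative placeholders, not values from the slides:

```python
import numpy as np

def ppr_predict(X, omegas, ridge_fns):
    """Evaluate a PPR model: f(X) = sum_m g_m(omega_m^T X).

    X         : (n, p) input matrix
    omegas    : list of M unit vectors, each of shape (p,)
    ridge_fns : list of M callables g_m, applied elementwise
    """
    return sum(g(X @ w) for g, w in zip(ridge_fns, omegas))

# Illustrative two-term model (g_1 = tanh, g_2 = square):
X = np.random.randn(100, 3)
w1 = np.array([1.0, 0.0, 0.0])
w2 = np.array([0.0, 1.0, 1.0]) / np.sqrt(2.0)   # unit vector
f = ppr_predict(X, [w1, w2], [np.tanh, np.square])
```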

7. PPR: Derived Features
- The dot product $V_m = \omega_m^T X$ is the projection of $X$ onto $\omega_m$
- The ridge function $g_m(V_m)$ varies only in the direction $\omega_m$

8. PPR: Training
- Minimize the squared error $\sum_i \big[y_i - \sum_m g_m(\omega_m^T x_i)\big]^2$
- Consider the single-term case $M = 1$
- Given $\omega$, we derive the features $v_i = \omega^T x_i$ and smooth $y$ against them to estimate $g$
- Given $g$, we minimize over $\omega$ with Newton's method
- Iterate those two steps to convergence

9. PPR: Newton's Method
- Use derivatives of $g$ to iteratively improve the estimate of $\omega$:
$$g(\omega^T x_i) \approx g(\omega_{\text{old}}^T x_i) + g'(\omega_{\text{old}}^T x_i)\,(\omega - \omega_{\text{old}})^T x_i$$
- This reduces to a weighted least squares regression on $x_i$ to hit the working target $\omega_{\text{old}}^T x_i + \dfrac{y_i - g(\omega_{\text{old}}^T x_i)}{g'(\omega_{\text{old}}^T x_i)}$, with weights $g'(\omega_{\text{old}}^T x_i)^2$
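A minimal sketch of the alternating fit for a single-term model, assuming NumPy. A polynomial fit stands in for the smoothers suggested on the next slide, and the zero-derivative guard is my own addition; names are illustrative:

```python
import numpy as np

def fit_ppr_single_term(X, y, n_iter=10, deg=5, eps=1e-6):
    """Alternating fit of a one-term PPR model f(x) = g(w^T x).

    Smoothing step: polynomial fit (a stand-in for local regression
    or smoothing splines).  Direction step: one Gauss-Newton /
    weighted least squares update of w, then renormalize to unit length.
    """
    n, p = X.shape
    w = np.ones(p) / np.sqrt(p)               # arbitrary starting unit vector
    for _ in range(n_iter):
        v = X @ w                              # derived features v_i = w^T x_i
        g = np.poly1d(np.polyfit(v, y, deg))   # smooth y against v to estimate g
        slope = g.deriv()(v)
        slope = np.where(np.abs(slope) > eps, slope, eps)  # guard division by ~0
        target = v + (y - g(v)) / slope        # Gauss-Newton working response
        sw = np.abs(slope)                     # sqrt of the weights g'(v)^2
        w, *_ = np.linalg.lstsq(sw[:, None] * X, sw * target, rcond=None)
        w /= np.linalg.norm(w)                 # keep w a unit vector
    return w, g
```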

10. PPR: Implementation Details
- Suggested smoothing methods: local regression, smoothing splines
- The $g_m$'s can be readjusted with backfitting
- The $\omega_m$'s are usually not readjusted
- $(g_m, \omega_m)$ pairs are added in a forward stage-wise manner

11. Outline
- Projection Pursuit Regression
- Neural Networks proper
- Fitting Neural Networks
- Issues in Training
- Examples

12. Neural Networks
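The slide itself is the familiar network diagram. For reference in the slides that follow, the standard single-hidden-layer model from the text can be written as:

```latex
\begin{aligned}
Z_m   &= \sigma(\alpha_{0m} + \alpha_m^{T} X), & m &= 1, \dots, M \\
T_k   &= \beta_{0k} + \beta_k^{T} Z,           & k &= 1, \dots, K \\
f_k(X) &= g_k(T),                              & k &= 1, \dots, K
\end{aligned}
```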

13. NNs: Sigmoid and Softmax
- Transforming an activation into a probability
- Sigmoid: $\sigma(v) = \dfrac{1}{1 + e^{-v}}$
- Softmax: $g_k(T) = \dfrac{e^{T_k}}{\sum_{\ell=1}^{K} e^{T_\ell}}$
- Just like the multilogit model
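A minimal NumPy sketch of both functions; the max-shift in the softmax is a standard numerical-stability trick, not something from the slides:

```python
import numpy as np

def sigmoid(v):
    """Sigmoid activation: sigma(v) = 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + np.exp(-v))

def softmax(t):
    """Softmax over the last axis: g_k(T) = exp(T_k) / sum_l exp(T_l).

    Subtracting the max leaves the result unchanged but avoids overflow.
    """
    t = t - np.max(t, axis=-1, keepdims=True)
    e = np.exp(t)
    return e / np.sum(e, axis=-1, keepdims=True)
```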

14. NNs: Training
- We need an error function to minimize
  - Regression: sum of squared errors
  - Classification: cross-entropy
- Generic approach: gradient descent (a.k.a. back-propagation)
  - The error functions are differentiable
  - Forward pass to evaluate activations; backward pass to update weights

15. NNs: Back-Propagation
Back-propagation equations (squared error, with hidden activations $z_{mi} = \sigma(\alpha_{0m} + \alpha_m^T x_i)$):
$$\delta_{ki} = -2\,(y_{ik} - f_k(x_i))\, g_k'(\beta_k^T z_i), \qquad s_{mi} = \sigma'(\alpha_m^T x_i) \sum_{k=1}^{K} \beta_{km}\, \delta_{ki}$$
Update rules (learning rate $\gamma_r$):
$$\beta_{km} \leftarrow \beta_{km} - \gamma_r \sum_i \delta_{ki}\, z_{mi}, \qquad \alpha_{m\ell} \leftarrow \alpha_{m\ell} - \gamma_r \sum_i s_{mi}\, x_{i\ell}$$
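A compact NumPy sketch of one batch gradient-descent step implementing these equations for a single-hidden-layer regression network (sigmoid hidden units, identity outputs). The bias-in-column-0 layout, the 1/n averaging (which only rescales the learning rate), and the small random initialization are my own choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_weights(p, M, K, scale=0.1):
    """Small random starting weights; alpha is (M, p+1), beta is
    (K, M+1), with column 0 holding the bias term."""
    return (scale * rng.standard_normal((M, p + 1)),
            scale * rng.standard_normal((K, M + 1)))

def backprop_step(X, Y, alpha, beta, lr):
    """One batch gradient step: forward pass, then backward pass."""
    n = X.shape[0]
    Xb = np.hstack([np.ones((n, 1)), X])      # prepend bias input
    Z = 1.0 / (1.0 + np.exp(-(Xb @ alpha.T))) # forward: hidden activations
    Zb = np.hstack([np.ones((n, 1)), Z])
    F = Zb @ beta.T                           # outputs f_k(x_i)
    delta = -2.0 * (Y - F)                    # output-layer errors delta_ki
    s = Z * (1.0 - Z) * (delta @ beta[:, 1:]) # back-propagated errors s_mi
    beta -= lr * delta.T @ Zb / n             # update rules, averaged over batch
    alpha -= lr * s.T @ Xb / n
    return alpha, beta
```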

16. NNs: Back-Propagation Details
- Those were the regression equations; the classification equations are similar
- Learning can be batch or online
- Online learning rates can be decreased during training, ensuring convergence
- We usually want to start with small weights
- Sometimes impractically slow

17. Outline
- Projection Pursuit Regression
- Neural Networks proper
- Fitting Neural Networks
- Issues in Training
- Examples

18. Issues in Training: Overfitting
- Problem: we might actually reach the global minimum of the training error $R(\theta)$, which typically overfits
- Proposed solutions:
  - Early stopping: limit training by watching performance on a held-out validation set
  - Weight decay: penalize large weights by minimizing $R(\theta) + \lambda J(\theta)$, where $J(\theta) = \sum \beta_{km}^2 + \sum \alpha_{m\ell}^2$
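A minimal sketch of the resulting update, assuming a gradient already computed by back-propagation as in the earlier sketch; biases are often exempted from the penalty:

```python
import numpy as np

def decayed_update(w, grad, lr, lam):
    """Gradient step with weight decay: the penalty lam * sum(w**2)
    contributes 2 * lam * w to the gradient, shrinking weights toward 0."""
    return w - lr * (grad + 2.0 * lam * w)
```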

19. A Closer Look at Weight Decay
- With weight decay, the less complicated fitted hypothesis achieves a lower error rate

20. Outline
- Projection Pursuit Regression
- Neural Networks proper
- Fitting Neural Networks
- Issues in Training
- Examples

21. Example #1: Synthetic Data
- Too many hidden nodes lead to overfitting
- Multiple initial weight settings should be tried
- The radial function is learned poorly

22. Example #1: Synthetic Data
- Two parameters to tune: weight decay and the number of hidden units
- Suggested training strategy: fix one parameter at the value where the model is least constrained (e.g., plenty of hidden units), then cross-validate the other
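One way to follow this strategy, sketched with scikit-learn (assumed here, not part of the slides; its `alpha` parameter plays the role of the weight-decay penalty): fix a generous number of hidden units and cross-validate the decay on placeholder data:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import GridSearchCV

# Placeholder synthetic data; in practice use the data being modeled.
X, y = np.random.randn(200, 2), np.random.randn(200)

search = GridSearchCV(
    MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000),  # least constrained
    param_grid={"alpha": [1e-4, 1e-3, 1e-2, 1e-1]},         # cross-validate decay
    cv=5,
)
search.fit(X, y)
print(search.best_params_)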

23. Example #2: ZIP Code Data
- Handwritten digit data (Yann LeCun)
- NNs can be structurally tailored to suit the data
- Weight sharing: multiple units in a given layer use the same set of weights
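A toy one-dimensional illustration of weight sharing, assuming NumPy: every hidden unit applies the same 3-tap kernel to its own local window of the input, so the layer has 3 weights instead of 3 per position:

```python
import numpy as np

x = np.random.randn(16)                 # input signal
kernel = np.array([0.25, 0.5, 0.25])    # the single shared weight set
# np.convolve slides the (flipped) kernel across windows; with a
# symmetric kernel the flip is immaterial.  One hidden unit per window.
hidden = np.convolve(x, kernel, mode="valid")
```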

24. Example #2: 5 Networks
- Net 1: no hidden layer
- Net 2: one hidden layer
- Net 3: two hidden layers, local connectivity
- Net 4: two hidden layers, local connectivity, weight sharing in one layer
- Net 5: two hidden layers, local connectivity, weight sharing in two layers

25. Example #2: Results
- Net 5 does best
- A small number of features, each identifiable anywhere in the image, suffice

26. Conclusions
- Neural networks are a very general approach to both regression and classification
- They are an effective learning tool when:
  - The signal-to-noise ratio is high
  - Prediction is desired, rather than an interpretable description of the problem's solution
  - Targets are naturally distinguished by direction as opposed to distance