1 Neural Networks
The Elements of Statistical Learning, Chapter 12
Presented by Nick Rizzolo

2 Modeling the Human Brain
- Input builds up on receptors (dendrites)
- The cell has an input threshold
- When the cell's threshold is breached, an activation fires down the axon

3 “Magical” Secrets Revealed
- Linear features are derived from the inputs
- The target concept(s) are non-linear functions of those features

4 Outline
- Projection Pursuit Regression
- Neural Networks proper
- Fitting Neural Networks
- Issues in Training
- Examples

5 Projection Pursuit Regression
- Generalization of a two-layer regression neural network
- Universal approximator
- Good for prediction
- Not good for deriving interpretable models of the data

6 Projection Pursuit Regression
The output Y is modeled from the inputs X = (X_1, ..., X_p)^T through derived features:

    f(X) = \sum_{m=1}^{M} g_m(\omega_m^T X)

where the g_m are ridge functions and the \omega_m are p-dimensional unit vectors.

7 PPR: Derived Features
- The dot product V_m = \omega_m^T X is the projection of X onto \omega_m
- The ridge function g_m(V_m) varies only in the direction defined by \omega_m
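As a concrete, purely illustrative sketch of the model on the previous two slides, the PPR prediction rule takes only a few lines of Python; `ppr_predict`, `omegas`, and `ridge_fns` are hypothetical names, not from the slides:

```python
import numpy as np

def ppr_predict(X, omegas, ridge_fns):
    """Evaluate a fitted PPR model: f(X) = sum_m g_m(omega_m^T X).

    X         : (n, p) array of inputs
    omegas    : list of (p,) unit vectors omega_m
    ridge_fns : list of callables g_m mapping (n,) -> (n,)
    """
    f = np.zeros(X.shape[0])
    for omega, g in zip(omegas, ridge_fns):
        v = X @ omega          # derived feature V_m = omega_m^T X
        f += g(v)              # ridge function applied to the projection
    return f
```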

8 PPR: Training
- Minimize the squared error \sum_{i=1}^{N} \left[ y_i - \sum_{m=1}^{M} g_m(\omega_m^T x_i) \right]^2
- Consider the single-term case, M = 1
- Given \omega, derive the features v_i = \omega^T x_i and smooth to estimate g
- Given g, minimize over \omega with Newton's method
- Iterate those two steps to convergence

9 PPR: Newton's Method
Use derivatives of g to iteratively improve the estimate of \omega:

    g(\omega^T x_i) \approx g(\omega_{\text{old}}^T x_i) + g'(\omega_{\text{old}}^T x_i)(\omega - \omega_{\text{old}})^T x_i

This reduces to a weighted least squares regression on the x_i, with targets

    \omega_{\text{old}}^T x_i + \frac{y_i - g(\omega_{\text{old}}^T x_i)}{g'(\omega_{\text{old}}^T x_i)}

and weights g'(\omega_{\text{old}}^T x_i)^2.
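The alternating fit from the last two slides can be sketched as below. This is a rough illustration, not the book's implementation: a polynomial stands in for the local-regression or smoothing-spline smoother, and `ppr_fit_one_term` is a hypothetical name:

```python
import numpy as np

def ppr_fit_one_term(X, y, n_iter=10, degree=5):
    """One-term (M = 1) PPR fit: alternate (1) smoothing y against
    v = X @ omega to estimate g, and (2) a Newton step, i.e. weighted
    least squares, for omega given g."""
    n, p = X.shape
    omega = np.ones(p) / np.sqrt(p)                 # initial unit vector
    for _ in range(n_iter):
        v = X @ omega                               # derived features v_i
        coef = np.polyfit(v, y, degree)             # smooth: g ~ poly(v)
        g = np.polyval(coef, v)
        g_prime = np.polyval(np.polyder(coef), v)   # g'(v_i)
        g_prime = np.where(np.abs(g_prime) < 1e-6, 1e-6, g_prime)
        # Newton step: weighted least squares with weights g'(v_i)^2
        # and targets v_i + (y_i - g(v_i)) / g'(v_i)
        w = g_prime ** 2
        t = v + (y - g) / g_prime
        XtWX = X.T @ (X * w[:, None])
        XtWt = X.T @ (w * t)
        omega = np.linalg.lstsq(XtWX, XtWt, rcond=None)[0]
        omega /= np.linalg.norm(omega)              # keep omega a unit vector
    return omega, coef
```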

10 PPR: Implementation Details
- Suggested smoothing methods: local regression, smoothing splines
- The g_m's can be readjusted with backfitting
- The \omega_m's are usually not readjusted
- "(\omega_m, g_m) pairs are added in a forward stage-wise manner"

11 Outline
- Projection Pursuit Regression
- Neural Networks proper
- Fitting Neural Networks
- Issues in Training
- Examples

12 Neural Networks

13 NNs: Sigmoid and Softmax
- Transforming activations into probabilities
- Sigmoid: \sigma(v) = \frac{1}{1 + e^{-v}}
- Softmax: g_k(T) = \frac{e^{T_k}}{\sum_{\ell=1}^{K} e^{T_\ell}}
- The softmax is just like the multilogit model
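A minimal sketch of the two transformations (function names are ours, not the slides'):

```python
import numpy as np

def sigmoid(v):
    """Logistic sigmoid: squashes an activation into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-v))

def softmax(t):
    """Softmax over a vector of output activations T_k; subtracting
    the max before exponentiating is a standard stability trick."""
    e = np.exp(t - np.max(t))
    return e / e.sum()
```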

14 NNs: Training
- We need an error function to minimize
  - Regression: sum of squared errors
  - Classification: cross-entropy
- Generic approach: gradient descent (a.k.a. back propagation)
  - The error functions are differentiable
  - A forward pass evaluates the activations; a backward pass updates the weights

15 NNs: Back Propagation
With hidden units z_{mi} = \sigma(\alpha_m^T x_i), outputs f_k(x_i) = g_k(\beta_k^T z_i), and squared error R = \sum_i \sum_k (y_{ik} - f_k(x_i))^2, the back propagation equations are:

    \delta_{ki} = -2 (y_{ik} - f_k(x_i)) \, g_k'(\beta_k^T z_i)
    s_{mi} = \sigma'(\alpha_m^T x_i) \sum_{k=1}^{K} \beta_{km} \delta_{ki}

Update rules (learning rate \gamma_r):

    \beta_{km} \leftarrow \beta_{km} - \gamma_r \sum_i \delta_{ki} z_{mi}
    \alpha_{m\ell} \leftarrow \alpha_{m\ell} - \gamma_r \sum_i s_{mi} x_{i\ell}
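These equations translate almost line for line into code. A minimal sketch for the regression case with sigmoid hidden units and identity outputs (so g_k' = 1); `backprop_step` and the bias-free weight layout are our simplifications:

```python
import numpy as np

def backprop_step(X, Y, alpha, beta, gamma):
    """One batch gradient-descent step for a single-hidden-layer
    regression network with sigmoid hidden units and linear outputs.
    X: (n, p) inputs, Y: (n, K) targets,
    alpha: (M, p) input-to-hidden weights, beta: (K, M) hidden-to-output.
    (Bias terms omitted to keep the sketch short.)"""
    Z = 1.0 / (1.0 + np.exp(-(X @ alpha.T)))   # hidden activations z_mi
    F = Z @ beta.T                             # outputs f_k(x_i); g_k = identity
    delta = -2.0 * (Y - F)                     # delta_ki (g_k' = 1)
    S = (Z * (1.0 - Z)) * (delta @ beta)       # s_mi = sigma' * sum_k beta_km delta_ki
    beta = beta - gamma * (delta.T @ Z)        # beta_km  -= gamma * sum_i delta_ki z_mi
    alpha = alpha - gamma * (S.T @ X)          # alpha_ml -= gamma * sum_i s_mi x_il
    return alpha, beta
```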

16 NNs: Back Propagation Details
- Those were the regression equations; the classification equations are similar
- Updates can be batch or online
- Online learning rates can be decreased during training, ensuring convergence
- Weights are usually initialized to small values
- Sometimes impractically slow
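For illustration, an online variant with a decaying learning rate might look like the following, reusing the hypothetical `backprop_step` sketch above; the schedule \gamma_r = \gamma_0 / (1 + r) is one common choice, not something the slides prescribe:

```python
import numpy as np

# Illustrative setup: a small random problem and small initial weights.
rng = np.random.default_rng(0)
n, p, M, K = 100, 3, 5, 1
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, K))
alpha = 0.1 * rng.normal(size=(M, p))   # start weights small
beta = 0.1 * rng.normal(size=(K, M))

gamma0, n_epochs = 0.1, 50
for r in range(n_epochs):
    gamma = gamma0 / (1.0 + r)          # decaying online learning rate
    for i in rng.permutation(n):        # one example at a time (online)
        alpha, beta = backprop_step(X[i:i+1], Y[i:i+1], alpha, beta, gamma)
```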

17 Outline
- Projection Pursuit Regression
- Neural Networks proper
- Fitting Neural Networks
- Issues in Training
- Examples

18 Issues in Training: Overfitting
- Problem: the global minimum of the training error R is likely an overfit solution
- Proposed solutions:
  - Limit training by watching performance on a held-out test set (early stopping)
  - Weight decay: penalizing large weights

19 A Closer Look at Weight Decay
- Weight decay adds the penalty \lambda J(\theta) = \lambda \left( \sum \beta_{km}^2 + \sum \alpha_{m\ell}^2 \right) to the error function (a sketch follows)
- The less complicated hypothesis it produces achieves a lower test error rate
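A sketch of how the penalty changes the update: differentiating \lambda J(\theta) simply adds 2\lambda\theta to each gradient. This reuses the forward/backward pass of the `backprop_step` sketch above and is illustrative only:

```python
import numpy as np

def backprop_step_decay(X, Y, alpha, beta, gamma, lam):
    """One gradient step on the penalized error R(theta) + lam * J(theta),
    where J(theta) = sum(beta**2) + sum(alpha**2) (weight decay)."""
    Z = 1.0 / (1.0 + np.exp(-(X @ alpha.T)))   # hidden activations
    F = Z @ beta.T                             # linear outputs
    delta = -2.0 * (Y - F)
    S = (Z * (1.0 - Z)) * (delta @ beta)
    # The only change from plain back propagation: + 2*lam*theta terms.
    beta = beta - gamma * (delta.T @ Z + 2.0 * lam * beta)
    alpha = alpha - gamma * (S.T @ X + 2.0 * lam * alpha)
    return alpha, beta
```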

20 Outline
- Projection Pursuit Regression
- Neural Networks proper
- Fitting Neural Networks
- Issues in Training
- Examples

21 Example #1: Synthetic Data
- More hidden nodes lead to overfitting
- Multiple initial weight settings should be tried
- The radial function is learned poorly

22 Example #1: Synthetic Data
- Two parameters to tune: weight decay and the number of hidden units
- Suggested training strategy:
  - Fix either parameter at the value where the model is least constrained
  - Cross-validate the other

23 Example #2: ZIP Code Data (Yann LeCun)
- NNs can be structurally tailored to suit the data
- Weight sharing: multiple units in a given layer are constrained to use the same weights
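A minimal 1-D illustration of local connectivity with weight sharing (the ZIP code networks use 2-D patches of the image; names are hypothetical):

```python
import numpy as np

def shared_weight_layer(x, w):
    """Local connectivity + weight sharing, in 1-D for brevity.
    Each hidden unit sees only a small window of the input, and every
    unit applies the same weight vector w (one unit per window position)."""
    k = len(w)
    n_units = len(x) - k + 1
    z = np.empty(n_units)
    for m in range(n_units):
        z[m] = w @ x[m:m + k]          # the same w at every position
    return 1.0 / (1.0 + np.exp(-z))    # sigmoid activations

# Without sharing this layer would need n_units * k weights;
# with sharing it needs only k.
```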

24 Example #2: 5 Networks
- Net 1: no hidden layer
- Net 2: one hidden layer
- Net 3: two hidden layers, local connectivity
- Net 4: two hidden layers, local connectivity, one layer of weight sharing
- Net 5: two hidden layers, local connectivity, two layers of weight sharing

25 Example #2: Results
- Net 5 does best
- With weight sharing, a small number of features can be identified anywhere in the image

26 Conclusions
- Neural networks are a very general approach to both regression and classification
- They are an effective learning tool when:
  - The signal-to-noise ratio is high
  - Prediction is desired, rather than an interpretable description of the problem's solution
  - Targets are naturally distinguished by direction as opposed to distance

