Published by Dominik Silverthorn. Modified over 2 years ago.

1
Neural Networks and SVM Stat 600

2
Neural Networks
History: started in the 1950s and peaked in the 1990s.
Idea: learning the way the brain does.
Numerous applications:
– Handwriting, face, and speech recognition
– Vehicles that drive themselves
– Models of reading, sentence production, and dreaming

3
Non-linear Regression
In the end this is a non-linear regression problem. Consider our usual data set: Y (the response, numerical or categorical) and X1,…,Xp (the predictors). In the linear model we model Y as $Y = X\beta + \epsilon$. Here we instead say that Y is a function of hidden units H: $Y = g(H\beta) + \epsilon$, where $H = f(X\alpha)$. So the model is non-linear because of the functions g and f. The function g is generally chosen as the logistic (sigmoid) function, $g(z) = [1+e^{-z}]^{-1}$, which is the inverse of the logit transform.
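To make the two nested functions concrete, here is a minimal sketch of one forward pass through a single-hidden-layer network, assuming both f and g are the logistic function. The helper names `sigmoid` and `nn_forward`, and the layout of `alpha` and `beta` (first row or element holding the intercepts), are hypothetical choices for illustration, not the slides' notation for any package.

```r
# Logistic (sigmoid) function: g(z) = 1 / (1 + exp(-z)), the inverse of the logit
sigmoid <- function(z) 1 / (1 + exp(-z))

# Hypothetical helper: forward pass of a single-hidden-layer network.
# X: n x p predictors; alpha: (p + 1) x M hidden weights (row 1 = intercepts);
# beta: M + 1 output weights (element 1 = intercept).
nn_forward <- function(X, alpha, beta, f = sigmoid, g = sigmoid) {
  H <- f(cbind(1, X) %*% alpha)      # hidden units: H = f(X alpha), an n x M matrix
  drop(g(cbind(1, H) %*% beta))      # output: g(H beta), a length-n vector
}

set.seed(1)
X <- matrix(rnorm(10), nrow = 5, ncol = 2)          # 5 observations, 2 predictors
alpha <- matrix(runif(9, -0.5, 0.5), nrow = 3)      # 3 hidden units
beta <- runif(4, -0.5, 0.5)
yhat <- nn_forward(X, alpha, beta)                  # values in (0, 1)
```

Passing `g = identity` gives a linear output unit, which is the usual choice for a numerical response.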

4
Model Form

5
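Parameter Estimation
To control the level of overfitting we use penalized least squares, which penalizes overfitting with a ridge-regression-like squared penalty (weight decay). The penalty is imposed NOT on the number of parameters but on the MAGNITUDE of the parameters. The criterion is:
$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum w^2$,
where the second sum runs over all weights w in the network and $\lambda \ge 0$ (the `decay` argument of `nnet`) sets the strength of the penalty.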
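A small numeric sketch of the penalized criterion may help. The helper `penalized_sse` and the toy vectors are hypothetical; `lambda` plays the role of the weight-decay parameter, and the weights shown are arbitrary rather than taken from a fitted network.

```r
# Hypothetical helper: SSE plus a ridge-type (weight decay) penalty
penalized_sse <- function(y, yhat, weights, lambda) {
  sse <- sum((y - yhat)^2)            # goodness of fit
  penalty <- lambda * sum(weights^2)  # penalizes the MAGNITUDE of the weights, not their number
  sse + penalty
}

y <- c(1, 0, 1)
yhat <- c(0.8, 0.1, 0.6)
w <- c(0.5, -1, 2)
penalized_sse(y, yhat, w, lambda = 0)    # plain SSE: 0.21
penalized_sse(y, yhat, w, lambda = 0.2)  # 0.21 + 0.2 * (0.25 + 1 + 4) = 1.26
```

Larger weights raise the criterion even when the fit is unchanged, which is what pushes the estimates toward zero.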

6
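R code for nnet
# neural networks
library(nnet)
# factor response: nnet chooses entropy (2 classes) or softmax (>2) fitting itself
nnetmodel <- nnet(class ~ ., data = train.all, size = 8, decay = 0.2, linout = FALSE)
nnetmodel
nnetpred1 <- predict(nnetmodel, newdata = train.all, type = "class")
table(nnetpred1, train.all$class)   # training-set confusion matrix
nnetpred2 <- predict(nnetmodel, newdata = test.all, type = "raw")
nnetpred3 <- predict(nnetmodel, newdata = test.all, type = "class")
table(nnetpred3, test.all$class)    # test-set confusion matrix
library(devtools)
source_url('https://gist.github.com/fawda123/7471137/raw/c720af2cea5f312717f020a09946800d55b8f45b/nnet_plot_update.r')
plot.nnet(nnetmodel)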
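The code above relies on data frames `train.all` and `test.all` that are not shown. As a self-contained variant, the sketch below uses R's built-in `iris` data as a stand-in (an assumption, not the slides' data), renames the response to `class`, and makes a random 100/50 split; the seed and `maxit` value are arbitrary choices.

```r
library(nnet)  # recommended package shipped with R

# Stand-in data: iris, with the response renamed to match the slide's 'class'
dat <- iris
names(dat)[names(dat) == "Species"] <- "class"

set.seed(600)                            # nnet starts from random weights
idx <- sample(nrow(dat), 100)            # 100 training rows, 50 test rows
train.all <- dat[idx, ]
test.all  <- dat[-idx, ]

nnetmodel <- nnet(class ~ ., data = train.all, size = 8, decay = 0.2,
                  maxit = 500, trace = FALSE)
probs <- predict(nnetmodel, newdata = test.all, type = "raw")    # 50 x 3 class probabilities
pred  <- predict(nnetmodel, newdata = test.all, type = "class")  # predicted labels
table(pred, test.all$class)              # test-set confusion matrix
mean(pred == test.all$class)             # test-set accuracy
```

Because the starting weights are random, results can vary slightly with the seed.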

7
Example: Apple data

8
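Fitting Neural Networks
Generally the gradient descent method is used to fit the models: at iteration r each weight w is updated as
$w^{(r+1)} = w^{(r)} - \gamma_r \, \partial R(w^{(r)}) / \partial w$,
where R is the error function being minimized. The learning rate $\gamma_r$ is usually taken as a constant, and it can also be optimized by a line search that minimizes the error function at each update.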
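The update rule can be seen in a sketch on the simplest possible "network", a linear model with mean squared error, where the answer can be checked against exact least squares. The helper `grad_descent`, its constant `rate` (the learning rate), and the simulated data are illustrative assumptions; a real network would compute the gradient by back-propagation.

```r
# Hypothetical helper: batch gradient descent with a constant learning rate,
# minimizing R(w) = mean((y - X w)^2)
grad_descent <- function(X, y, rate = 0.1, iters = 1000) {
  w <- rep(0, ncol(X))                           # start at zero
  n <- nrow(X)
  for (r in seq_len(iters)) {
    resid <- drop(y - X %*% w)                   # current residuals
    grad <- drop(-2 / n * crossprod(X, resid))   # dR/dw at the current weights
    w <- w - rate * grad                         # w^(r+1) = w^(r) - rate * gradient
  }
  w
}

set.seed(2)
x <- as.numeric(scale(rnorm(50)))                # standardized predictor
y <- 3 + 2 * x + rnorm(50, sd = 0.5)
X <- cbind(1, x)
w_gd <- grad_descent(X, y)
w_ls <- drop(solve(crossprod(X), crossprod(X, y)))  # exact least-squares solution
cbind(w_gd, w_ls)                                    # the two columns should agree closely
```

With standardized inputs a rate of 0.1 is comfortably small enough to converge; a rate that is too large makes the iterates diverge.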

9
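Issues
– Starting values: pick weights close to zero to start the process.
– Overfitting: ridge (weight decay) or other penalties are used.
– Scaling inputs: it is a good idea to scale the inputs, e.g. standardize each to mean 0 and standard deviation 1.
– Number of hidden units: better to have too many than too few, with the penalty controlling overfitting.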
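Two of these points, scaling the inputs and starting near zero, can be sketched in a few lines. The matrices `train_x` and `test_x` are simulated placeholders; the key step is reusing the training means and standard deviations for the test set. The interval [-0.7, 0.7] mirrors the default `rang` of `nnet::nnet`, which draws its initial weights uniformly from [-rang, rang].

```r
set.seed(3)
train_x <- matrix(rnorm(40, mean = 10, sd = 5), ncol = 2)   # hypothetical inputs
test_x  <- matrix(rnorm(10, mean = 10, sd = 5), ncol = 2)

# Standardize with the TRAINING means and SDs, then apply the same transform to the test set
train_s <- scale(train_x)                                   # each column: mean 0, sd 1
test_s  <- scale(test_x,
                 center = attr(train_s, "scaled:center"),
                 scale  = attr(train_s, "scaled:scale"))

# Starting values: small random weights near zero, uniform on [-0.7, 0.7]
init_w <- runif(10, min = -0.7, max = 0.7)
```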

10
Support Vector Machines
Highly flexible, powerful modeling methods.
Recall that in linear regression we seek parameter estimates that minimize the SSE, and a drawback is that outliers affect this minimization. In robust regression we use Huber weights to reduce the effect of influential observations. SVM for regression uses a loss function similar to Huber's, but with a difference:
– In SVM, given a threshold set by the researcher, data points whose residuals fall within the threshold DO NOT contribute to the regression fit, while data points with an absolute residual greater than the threshold contribute an amount that grows linearly with the residual.
– Samples that fit the model well have NO effect on the regression.
– If the threshold is set high, ONLY the outliers affect the regression.
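The contrast between the three loss functions is easiest to see by evaluating them on a few residuals. The helper names are hypothetical, and the Huber constant `k = 1` and SVM threshold `eps = 1` are illustrative values, not defaults of any particular package.

```r
squared_loss <- function(r) r^2

# Huber: quadratic for small residuals, linear beyond k (robust regression)
huber_loss <- function(r, k = 1) ifelse(abs(r) <= k, r^2 / 2, k * abs(r) - k^2 / 2)

# SVM regression (epsilon-insensitive): zero inside the threshold, linear beyond it
eps_insensitive_loss <- function(r, eps = 1) pmax(abs(r) - eps, 0)

r <- c(-3, -1, -0.5, 0, 0.5, 1, 3)
cbind(r, squared = squared_loss(r), huber = huber_loss(r),
      svm = eps_insensitive_loss(r))
```

Only the residuals of -3 and 3 get a nonzero SVM loss, which illustrates why well-fitting samples have no effect on the SVM fit.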

11
SVM Estimation
To estimate the model parameters, SVM uses a user-specified loss function $L_\epsilon$ (the ε-insensitive loss) but also adds a penalty. The SVM coefficients minimize:
$\text{Cost} \sum_{i=1}^{n} L_\epsilon(y_i - \hat{y}_i) + \sum_{j=1}^{P} \beta_j^2$
The cost penalty is specified by the user and penalizes LARGE residuals. This is the opposite of ridge regression and neural networks, where the tuning parameter multiplies the penalty on large betas.
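For a linear SVM regression the criterion can be evaluated directly for any candidate coefficients. The sketch below only computes the objective; it is not a fitting routine. The helper `svm_objective`, the toy data, and the choice to leave the intercept `b0` unpenalized are assumptions for illustration.

```r
# Hypothetical helper: Cost * sum of eps-insensitive losses + sum of squared slopes
svm_objective <- function(b0, beta, X, y, cost = 1, eps = 0.1) {
  resid <- y - (b0 + drop(X %*% beta))
  loss <- pmax(abs(resid) - eps, 0)     # L_eps(y_i - yhat_i)
  cost * sum(loss) + sum(beta^2)        # intercept b0 is not penalized
}

X <- matrix(c(1, 2, 3, 4), ncol = 1)
y <- c(1.05, 2.0, 2.95, 5.0)
svm_objective(0, 1, X, y, cost = 1)     # only the last point (residual 1) contributes: 1.9
svm_objective(0, 1, X, y, cost = 10)    # a larger Cost weights that residual more: 10
```

Raising `cost` makes large residuals more expensive relative to the coefficient penalty, so the fit follows the data more closely.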

12
SVM plot for protein data (X1 to X7)
