
1 Data mining and statistical learning - lecture 11
Neural networks - a model class providing a joint framework for prediction and classification
- Relationship to other prediction models
- Some simple examples of neural networks
- Parameter estimation
- Joint framework for prediction and classification
- Features of neural networks

2 Ordinary least squares regression (OLS)
[Diagram: inputs x1, x2, …, xp feeding directly into the output y]
Model: y = β0 + β1 x1 + … + βp xp + ε
Terminology:
- β0: intercept (or bias)
- β1, …, βp: regression coefficients (or weights)
The response variable responds directly and linearly to changes in the inputs.
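
A minimal numerical sketch of this model in Python using numpy's least-squares solver; the simulated data and variable names are illustrative, not taken from the lecture:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 50, 3
    X = rng.normal(size=(n, p))                    # inputs x1, ..., xp
    beta_true = np.array([2.0, -1.0, 0.5])
    y = 1.0 + X @ beta_true + rng.normal(scale=0.1, size=n)

    # Append a column of ones so the intercept (bias) beta0 is estimated as well
    X1 = np.column_stack([np.ones(n), X])
    beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
    print(beta_hat)                                # approximately [1.0, 2.0, -1.0, 0.5]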

3 Principal components regression (PCR)
Extract principal components (linear combinations of the inputs) as derived features, and then model the target (response) as a linear function of these features.
[Diagram: inputs x1, …, xp mapped to principal components z1, …, zM, which feed into the output y]
The response variable responds indirectly and linearly to changes in the inputs.
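
A corresponding sketch of PCR, assuming scikit-learn is available; the choice of M = 3 components and the simulated data are illustrative:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 10))
    X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=100)     # strongly correlated inputs
    y = X[:, 0] - X[:, 2] + rng.normal(scale=0.1, size=100)

    # z1, ..., zM are the first M principal components of the inputs;
    # the response is then modelled as a linear function of these derived features
    pcr = make_pipeline(PCA(n_components=3), LinearRegression())
    pcr.fit(X, y)
    print(pcr.predict(X[:5]))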

4 Neural network with a single target
[Diagram: inputs x1, …, xp feeding a hidden layer of neurons z1, …, zM, which feed the output y]
The response to changes in the inputs is indirect and nonlinear.

5 Neuron
A neuron applies a sigmoid activation function to a linear combination of its inputs: z = σ(α0 + α'x), with for example σ(v) = 1/(1 + e^(-v)) or tanh(v).

6 Neural networks with a single target
Extract linear combinations of the inputs as derived features, and then model the target (response) as a linear function of a sigmoid function (activation function) of these features.
[Diagram: inputs x1, …, xp, hidden neurons z1, …, zM, output y]
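
A minimal forward-pass sketch of this architecture with random placeholder weights; tanh is used as the activation function here because the proc Neural examples below use tanh:

    import numpy as np

    def forward(x, alpha, alpha0, beta, beta0):
        """z_m = tanh(alpha0_m + alpha_m' x); y = beta0 + beta' z."""
        z = np.tanh(alpha0 + alpha @ x)      # M derived features (hidden neurons)
        return beta0 + beta @ z              # linear function of the derived features

    p, M = 4, 2                              # p inputs, M hidden neurons
    rng = np.random.default_rng(2)
    alpha, alpha0 = rng.normal(size=(M, p)), rng.normal(size=M)
    beta, beta0 = rng.normal(size=M), 0.0
    print(forward(rng.normal(size=p), alpha, alpha0, beta, beta0))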

7 Neural network with one input, one neuron, and one target
[Diagram: input x, hidden neuron z, output y]

8 Neural network with one input, one neuron, and one target
[Diagram: input x, hidden neuron z, output y]

9 Neural network with one input, one neuron, and one target - a simple example
- Select Advanced user interface
- Select 1 hidden node
- Tick Outputs from Training, …

10 Neural network with one input, one neuron, and one target

11 Output from proc Neural - one input, one neuron, one target
Parameter Estimates
 N  Parameter   Estimate     Gradient Objective Function
 1  x_H11       -5.851506     0.000000103
 2  BIAS_H11    -0.032606    -0.000001516
 3  H11_y       -1.017515     1.8123827E-8
 4  BIAS_y      -0.006434     1.2814216E-8
Value of Objective Function = 0.0106538302
H11 = hidden layer 1, neuron 1

12 Neural network with one input, one neuron, and one target - manual calculation of predicted values
Parameter Estimates
 N  Parameter   Estimate     Gradient Objective Function
 1  x_H11       -5.851506     0.000000103
 2  BIAS_H11    -0.032606    -0.000001516
 3  H11_y       -1.017515     1.8123827E-8
 4  BIAS_y      -0.006434     1.2814216E-8
Steps:
1. Standardize x to mean zero and variance one
2. Compute xstand*x_H11 + BIAS_H11
3. Take tanh to compute z
4. Compute z*H11_y + BIAS_y
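
These steps can be verified in a few lines of Python. The parameter values are the estimates shown above; the training mean and standard deviation used for standardization are not given on the slide, so the values passed in the example call are hypothetical:

    import numpy as np

    x_H11, BIAS_H11 = -5.851506, -0.032606   # input-to-hidden weight and bias
    H11_y, BIAS_y = -1.017515, -0.006434     # hidden-to-output weight and bias

    def predict(x, x_mean, x_sd):
        x_stand = (x - x_mean) / x_sd        # 1. standardize x
        v = x_stand * x_H11 + BIAS_H11       # 2. linear combination of the input
        z = np.tanh(v)                       # 3. tanh gives the neuron output z
        return z * H11_y + BIAS_y            # 4. linear function of z gives the prediction

    print(predict(0.5, 0.0, 1.0))            # hypothetical mean 0 and standard deviation 1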

13 Neural networks with one input, two neurons, and one target
[Diagram: input x, hidden neurons z1 and z2, output y]

14 Output from proc Neural - one input, two neurons, one target
Parameter Estimates
 N  Parameter   Estimate     Gradient Objective Function
 1  x_H11       -4.040296    -0.000006221
 2  x_H12       -4.755015     0.000008922
 3  BIAS_H11     0.449445    -0.000046905
 4  BIAS_H12     0.176599     0.000092579
 5  H11_y        0.767115     0.000009568
 6  H12_y       -1.781053     0.000026628
 7  BIAS_y      -0.014300    -0.000086070
Value of Objective Function = 0.0104173896

15 Absorbance records for ten samples of chopped meat
- 1 response variable (fat)
- 100 predictors (absorbance at 100 wavelengths or channels)
- The predictors are strongly correlated with each other

16 Absorbance records for 215 samples of chopped meat
The target is poorly correlated with each individual predictor.

17 Neural networks with a single target and many inputs - the fat content and absorbance dataset
A total of (p+2)*3+1 parameters are estimated.
[Diagram: inputs x1, x2, …, xp, three hidden neurons z1, z2, z3, output y]
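
A quick arithmetic check of this parameter count for p = 100 channels and three neurons, anticipating the 307 estimates reported on the next slide:

    p, M = 100, 3              # 100 absorbance channels, 3 hidden neurons
    hidden = M * (p + 1)       # each neuron has p weights and one bias
    output = M + 1             # output weights from the M neurons plus one bias
    print(hidden + output)     # 307, i.e. (p + 2)*3 + 1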

18 Neural networks with a single target and many inputs - parameter estimates for a model with three neurons
   N  Parameter       Estimate      Gradient Objective Function
 291  Channel90_H13    -0.534226    -0.243706
 292  Channel91_H13    -0.590502    -0.245327
 293  Channel92_H13    -0.482705    -0.246851
 294  Channel93_H13    -0.528643    -0.248195
 295  Channel94_H13    -0.333949    -0.249403
 296  Channel95_H13    -0.258637    -0.250348
 297  Channel96_H13     0.162351    -0.250953
 298  Channel97_H13     0.273746    -0.251128
 299  Channel98_H13     0.711445    -0.250887
 300  Channel99_H13     0.879623    -0.250285
 301  BIAS_H11         -2.144805     0.003961
 302  BIAS_H12          0.738894     0.095724
 303  BIAS_H13         -0.771776     0.587769
 304  H11_Fat          -1.504744     0.054906
 305  H12_Fat         -15.057170    -0.025459
 306  H13_Fat         -18.345040     0.006471
 307  BIAS_Fat         16.856496    -0.029187
Value of Objective Function = 0.3045279048
A total of 307 parameters

19 Neural networks with a single target and many inputs - output from a model with three neurons

20 Neural networks with a single target and many inputs - output from models with 1 to 10 neurons
Convergence problems

21 Neural networks with multiple targets
Extract linear combinations of the inputs as derived features, and then model the targets (responses) as linear functions of a sigmoid function (activation function) of these features.
[Diagram: inputs x1, …, xp, hidden neurons z1, …, zM, outputs y1, …, yK]

22 Neural networks for K-class classification
With the softmax activation function and the deviance (cross-entropy) error function, the neural network model is exactly a logistic regression model in the hidden units, and all the parameters are estimated by maximum likelihood.
[Diagram: inputs x1, …, xp, hidden neurons z1, …, zM, outputs y1, …, yK]
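
A sketch of the output layer only, with random placeholder weights, showing that the softmax outputs behave like multinomial logistic regression probabilities in the hidden-unit values z:

    import numpy as np

    def softmax(t):
        e = np.exp(t - t.max())              # subtract the maximum for numerical stability
        return e / e.sum()

    rng = np.random.default_rng(3)
    M, K = 5, 3                              # M hidden neurons, K classes
    z = np.tanh(rng.normal(size=M))          # hidden-unit values for one observation
    beta, beta0 = rng.normal(size=(K, M)), rng.normal(size=K)

    prob = softmax(beta0 + beta @ z)         # multinomial logistic regression in z
    print(prob, prob.sum())                  # K class probabilities summing to 1
    print(-np.log(prob[1]))                  # deviance contribution if the true class is k = 1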

23 Neural networks for regression and K-class classification
For regression, we use the sum-of-squared errors as our measure of fit.
For classification, we normally use the deviance (cross-entropy) error function, and the corresponding classifier assigns each observation to the class with the largest predicted output (probability).
[Diagram: inputs x1, …, xp, hidden neurons z1, …, zM, outputs y1, …, yK]

24 Fitting neural networks
[Diagram: inputs x1, …, xp, hidden neurons z1, …, zM, outputs y1, …, yK]
M(p+1) + K(M+1) parameters (weights)
We do not want the global minimizer of the deviance (cross-entropy) or sum-of-squares error function, since the global minimum typically corresponds to an over-fitted solution. Instead we use early stopping or a penalty term.
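
A sketch of the two quantities mentioned here: the number of weights M(p+1) + K(M+1), and a weight-decay style penalty that can be added to the error function before minimization (the value of the penalty parameter lam is an arbitrary illustration):

    import numpy as np

    p, M, K = 10, 4, 3
    print(M * (p + 1) + K * (M + 1))         # number of weights to estimate

    def weight_decay_penalty(weights, lam=0.01):
        """Penalty lam * sum of squared weights, added to the deviance or
        sum-of-squares error so that large weights are shrunk towards zero."""
        return lam * sum(np.sum(w ** 2) for w in weights)

    rng = np.random.default_rng(4)
    alpha = rng.normal(size=(M, p + 1))      # hidden-layer weights (including biases)
    beta = rng.normal(size=(K, M + 1))       # output-layer weights (including biases)
    print(weight_decay_penalty([alpha, beta]))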

25 Neural networks
- Provide a joint framework for prediction and classification
- Can describe both linear and nonlinear responses
- Can accommodate multidimensional correlated inputs
- Are prone to over-fitting; validation is a must
- Are difficult to interpret
- Convergence problems are not uncommon

26 Some characteristics of different learning methods
Characteristic                                        Neural networks   Trees
Natural handling of data of "mixed" type              Poor              Good
Handling of missing values                            Poor              Good
Robustness to outliers in input space                 Poor              Good
Insensitive to monotone transformations of inputs     Poor              Good
Computational scalability (large N)                   Poor              Good
Ability to deal with irrelevant inputs                Poor              Good
Ability to extract linear combinations of features    Good              Poor
Interpretability                                      Poor              Fair/good
Predictive power                                      Good              Poor

