Data mining and statistical learning - lab2-4 Lab 2, assignment 1: OLS regression of electricity consumption on temperature at 53 sites.

1 Data mining and statistical learning - lab2-4 Lab 2, assignment 1: OLS regression of electricity consumption on temperature at 53 sites

2 Data mining and statistical learning - lab2-4 SAS code for ridge regression

    proc reg data=mining.dailytemperature outest=dtempbeta ridge=0 to 10 by 1;
      model daily_consumption = stockholm g_teborg malm_ / p;
      output out=olsoutput pred=olspred;
    proc print data=dtempbeta;
    run;
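For readers without SAS, the same idea can be sketched in Python. This is an illustration, not a port of the lab code: the data are synthetic, the helper name `ridge_path` is hypothetical, and the sketch assumes the ridge constant k is added to the diagonal of the predictor correlation matrix (which is how the `ridge=` option is usually described), so the coefficients refer to standardized predictors.

```python
import numpy as np

def ridge_path(X, y, ks):
    """Ridge coefficients for standardized predictors, one row per ridge constant k."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # centre and scale each predictor
    yc = y - y.mean()                           # centre the response
    n, p = Xs.shape
    R = Xs.T @ Xs / n                           # correlation matrix of the predictors
    r = Xs.T @ yc / n                           # covariance of each predictor with y
    return np.array([np.linalg.solve(R + k * np.eye(p), r) for k in ks])

# Synthetic stand-in for three strongly correlated temperature series (assumption).
rng = np.random.default_rng(0)
base = rng.normal(size=200)
X = np.column_stack([base + 0.1 * rng.normal(size=200) for _ in range(3)])
y = 100.0 - 2.0 * X.sum(axis=1) + rng.normal(size=200)

ks = np.arange(0, 11)                 # same grid as ridge=0 to 10 by 1
betas = ridge_path(X, y, ks)
norms = np.linalg.norm(betas, axis=1)

for k, b in zip(ks, betas):
    print(k, np.round(b, 3))
```

The row for k = 0 is the OLS solution on the standardized data; as k grows, the coefficient vector shrinks towards zero, which is the bias-variance trade-off the following slides refer to.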

3 Data mining and statistical learning - lab2-4 Estimated regression parameters in ridge regression

4 Data mining and statistical learning - lab2-4 Predicted vs observed values in OLS regression and ridge regression - trade-off between variance and bias

5 Data mining and statistical learning - lab2-4 Fat content vs absorbance in different channels (wavelengths)

6 Data mining and statistical learning - lab2-4 OLS regression fat vs channel10, channel30, channel50, channel70, channel90

7 Data mining and statistical learning - lab2-4 OLS regression fat vs channel1 – channel 100

8 Data mining and statistical learning - lab2-4 OLS regression fat vs channel1 – channel 100

9 Data mining and statistical learning - lab2-4 OLS regression with strongly correlated predictors. If the matrix X^T X does not have full rank (some X-variables are linearly dependent), the least squares solution is not unique. If the X-variables are strongly correlated, then: (i) the estimated regression coefficients will be uncertain; (ii) the predictions may still be OK
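Both points can be checked with a small simulation. The sketch below uses synthetic data (an assumption, not the lab's spectra): an exactly duplicated predictor makes X^T X rank deficient, and a nearly duplicated predictor gives unstable coefficients but stable fitted values when the noise is redrawn.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)

# Exact linear dependence: the second predictor is a copy of the first.
X_dup = np.column_stack([np.ones(n), x1, x1])
rank = np.linalg.matrix_rank(X_dup.T @ X_dup)          # 2, although there are 3 columns
same_fit = np.allclose(X_dup @ np.array([1.0, 2.0, 0.0]),
                       X_dup @ np.array([1.0, 0.0, 2.0]))  # two solutions, one fit

# Strong correlation: the second predictor is almost a copy of the first.
x2 = x1 + 0.01 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

coefs, preds = [], []
for _ in range(200):
    y = 1.0 + x1 + x2 + rng.normal(size=n)             # same signal, new noise
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    coefs.append(b)
    preds.append(X @ b)

coef_sd = np.array(coefs)[:, 1].std()                  # spread of the x1 coefficient
pred_sd = np.array(preds).std(axis=0).mean()           # average spread of fitted values

print("rank of X^T X with duplicated column:", rank)
print("sd of x1 coefficient:", round(coef_sd, 2))
print("mean sd of fitted values:", round(pred_sd, 2))
```

The coefficient of `x1` varies by several units between replications while the fitted values barely move, because only the sum of the two coefficients is well determined.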

10 Data mining and statistical learning - lab2-4 Principal Component Analysis of lake survey data. Some variables vary much more than others. How does this influence the principal components derived from the covariance and correlation matrices, respectively?
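A small experiment suggests the answer. The sketch below uses synthetic data, not the lake survey: two variables share the same underlying signal but one is recorded on a scale 1000 times larger, and a third variable is unrelated noise. The helper `first_pc` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
z = rng.normal(size=n)
small = z + 0.5 * rng.normal(size=n)             # standard deviation around 1
large = 1000.0 * (z + 0.5 * rng.normal(size=n))  # same structure, much larger scale
other = rng.normal(size=n)                       # unrelated variable
X = np.column_stack([small, large, other])

def first_pc(M):
    """Leading eigenvector of a symmetric matrix and its share of the total variance."""
    vals, vecs = np.linalg.eigh(M)               # eigenvalues in ascending order
    return vecs[:, -1], vals[-1] / vals.sum()

v_cov, share_cov = first_pc(np.cov(X, rowvar=False))
v_cor, share_cor = first_pc(np.corrcoef(X, rowvar=False))

print("covariance PC1 loadings: ", np.round(np.abs(v_cov), 3), "share:", round(share_cov, 3))
print("correlation PC1 loadings:", np.round(np.abs(v_cor), 3), "share:", round(share_cor, 3))
```

With the covariance matrix the first component is essentially the high-variance variable alone. With the correlation matrix every variable is rescaled to unit variance, so the two related variables load about equally on the first component.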

11 Data mining and statistical learning - lab2-4 Principal Component Analysis of lake survey data - score plot derived from the correlation matrix

12 Data mining and statistical learning - lab2-4 Principal Component Analysis of lake survey data - eigenvectors derived from the correlation matrix

13 Data mining and statistical learning - lab2-4 Principal Component Analysis of lake survey data with outliers removed - score plot derived from the correlation matrix

14 Data mining and statistical learning - lab2-4 Principal Component Analysis of lake survey data with outliers removed - eigenvectors derived from the correlation matrix

15 Data mining and statistical learning - lab2-4 Principal Component Analysis of lake survey data with outliers removed - MINITAB score plot derived from the correlation matrix

16 Data mining and statistical learning - lab2-4 Principal Component Analysis of lake survey data with outliers removed - MINITAB loading plot derived from the correlation matrix

17 Data mining and statistical learning - lab2-4 Regression of an indicator matrix. Find a linear function which is (on average) one for objects in class 1 and otherwise (on average) zero. Assign a new object to class 1 if

18 Data mining and statistical learning - lab2-4 Discriminant analysis - decision border

19 Data mining and statistical learning - lab2-4 3D-plot of an indicator matrix for class 1

20 Data mining and statistical learning - lab2-4 3D-plot of an indicator matrix for class 2

21 Data mining and statistical learning - lab2-4 Regression of an indicator matrix - discriminating function Estimate discriminant functions for each class, and then classify a new object to the class with the largest value for its discriminant function
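The procedure on this slide can be sketched as follows, assuming two synthetic Gaussian classes in two dimensions (the lab data are not reproduced here). Each column of the indicator matrix is regressed on the predictors by least squares, and a new object goes to the class whose fitted function is largest. The function name `classify` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
X = np.vstack([rng.normal(loc=[0.0, 0.0], size=(n, 2)),    # class 1
               rng.normal(loc=[3.0, 3.0], size=(n, 2))])   # class 2
labels = np.repeat([0, 1], n)          # 0 stands for class 1, 1 for class 2

Y = np.eye(2)[labels]                  # indicator matrix: one column per class
A = np.column_stack([np.ones(len(X)), X])          # add an intercept column
B, *_ = np.linalg.lstsq(A, Y, rcond=None)          # one coefficient column per class

def classify(X_new):
    """Assign each row to the class with the largest fitted discriminant function."""
    A_new = np.column_stack([np.ones(len(X_new)), X_new])
    return np.argmax(A_new @ B, axis=1)

fitted = A @ B
accuracy = np.mean(classify(X) == labels)
print("training accuracy:", accuracy)
print("row sums of fitted values:", np.round(fitted.sum(axis=1)[:3], 6))
```

Because the model contains an intercept, the fitted values in each row sum to one, although individual values can fall outside the interval from 0 to 1.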

22 Data mining and statistical learning - lab2-4 Linear discriminant analysis (LDA) LDA is an optimal classification method when the data arise from Gaussian distributions with different means and a common covariance matrix
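Under those assumptions the linear discriminant function for class k is x^T S^{-1} m_k - 0.5 m_k^T S^{-1} m_k + log(pi_k), with class mean m_k, pooled covariance matrix S and prior pi_k. The sketch below estimates these quantities with numpy on synthetic data; the names `lda_fit` and `lda_predict` are hypothetical helpers, not functions from a library.

```python
import numpy as np

def lda_fit(X, labels):
    """Estimate class means, priors and the pooled (common) covariance matrix."""
    classes = np.unique(labels)
    n, p = X.shape
    means = np.array([X[labels == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(labels == c) for c in classes])
    pooled = np.zeros((p, p))
    for c, m in zip(classes, means):
        d = X[labels == c] - m
        pooled += d.T @ d                      # within-class sums of squares
    pooled /= n - len(classes)
    return classes, means, priors, pooled

def lda_predict(X_new, classes, means, priors, pooled):
    """Classify rows of X_new by the largest linear discriminant function."""
    w = np.linalg.solve(pooled, means.T)       # column k holds S^{-1} m_k
    const = -0.5 * np.sum(means.T * w, axis=0) + np.log(priors)
    scores = X_new @ w + const
    return classes[np.argmax(scores, axis=1)]

# Two Gaussian classes with different means and a common covariance (assumption).
rng = np.random.default_rng(4)
cov = np.array([[1.0, 0.5], [0.5, 1.0]])
X = np.vstack([rng.multivariate_normal([0.0, 0.0], cov, size=200),
               rng.multivariate_normal([4.0, 0.0], cov, size=200)])
labels = np.repeat([1, 2], 200)

model = lda_fit(X, labels)
accuracy = np.mean(lda_predict(X, *model) == labels)
print("training accuracy:", accuracy)
```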

