1 Datamining and statistical learning - lecture 9
Generalized additive models (GAMs)
- Some examples of linear models
- Proc GAM in SAS
- Model selection in GAM

2 Linear regression models
The inputs can be:
- quantitative inputs
- functions of quantitative inputs
- basis expansions of quantitative inputs
- dummy variables
- interaction terms
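All of these input types still give a model that is linear in its parameters: each one simply becomes a column of the design matrix. The sketch below builds one design-matrix row from a quantitative input x, a second quantitative input z and a three-level class input. The function name, the knot at 2.0 and the level labels are hypothetical choices made for illustration.

```python
def design_row(x, z, level, knot=2.0, levels=("a", "b", "c")):
    """Build one design-matrix row; every column enters the model linearly."""
    row = [1.0]                           # intercept
    row.append(x)                         # quantitative input
    row.append(x ** 2)                    # function of a quantitative input
    row.append(max(x - knot, 0.0) ** 3)   # truncated-power basis function (basis expansion)
    row.append(z)                         # second quantitative input
    # dummy variables for the class input; the first level is the reference level
    row.extend(1.0 if level == lev else 0.0 for lev in levels[1:])
    row.append(x * z)                     # interaction term
    return row

print(design_row(3.0, 0.5, "b"))  # → [1.0, 3.0, 9.0, 1.0, 0.5, 1.0, 0.0, 1.5]
```

Stacking such rows for all cases gives a matrix that ordinary least squares can use directly.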

3 Justification of linear regression models
- Many response variables are linearly or almost linearly related to a set of inputs.
- Linear models are easy to comprehend and to fit to observed data.
- Linear regression models are particularly useful when:
  - the number of cases is moderate
  - the data are sparse
  - the signal-to-noise ratio is low

4 Performance of predictors based on (i) a simple linear regression model and (ii) a quadratic regression model, when the true expected response is a second-order polynomial in the input
[Figure panels: Predictions based on a linear model; Predictions based on a quadratic model]
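The comparison on this slide can be reproduced with a small simulation. The sketch below assumes a hypothetical true mean 1 + 0.5x - 0.8x², adds Gaussian noise, fits a straight line and a parabola by least squares, and measures how far each predictor is from the true expected response on a grid. The coefficients, noise level and grids are made up; they are not the data behind the figure.

```python
import random

def polyfit(x, y, degree):
    """Least-squares polynomial coefficients b[0..degree] via the normal equations."""
    p = degree + 1
    A = [[sum(xi ** (j + k) for xi in x) for k in range(p)] for j in range(p)]
    c = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(p)]
    for col in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for k in range(col, p):
                A[r][k] -= factor * A[col][k]
            c[r] -= factor * c[col]
    b = [0.0] * p
    for j in reversed(range(p)):  # back substitution
        b[j] = (c[j] - sum(A[j][k] * b[k] for k in range(j + 1, p))) / A[j][j]
    return b

def predict(b, x):
    return sum(bj * x ** j for j, bj in enumerate(b))

def true_mean(x):
    return 1.0 + 0.5 * x - 0.8 * x ** 2  # hypothetical second-order polynomial

rng = random.Random(1)
x_train = [i / 10 for i in range(-20, 21)]
y_train = [true_mean(x) + rng.gauss(0, 0.3) for x in x_train]
x_grid = [i / 20 for i in range(-40, 41)]

mse_by_degree = {}
for degree in (1, 2):
    b = polyfit(x_train, y_train, degree)
    # mean squared distance between the predictor and the true expected response
    mse_by_degree[degree] = sum((predict(b, x) - true_mean(x)) ** 2 for x in x_grid) / len(x_grid)
    print(f"degree {degree}: MSE = {mse_by_degree[degree]:.4f}")
```

The linear predictor carries a systematic bias that no amount of data removes, while the quadratic predictor only suffers from estimation noise.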

5 Logistic regression of multiple purchases vs first amount spent

6 Logistic regression for a binary response variable Y
The logit of the expectation of Y given x (that is, the log-odds of Y = 1) is a linear function of x:
log( E(Y|x) / (1 - E(Y|x)) ) = β0 + β1·x
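The coefficients β0 and β1 are estimated by maximum likelihood. One common way is Newton-Raphson, which for the logit link coincides with iteratively reweighted least squares. The sketch below fits a single-input model; the "amount spent" and "repeat purchase" values are made up for illustration and are not the data in slide 5.

```python
import math

def fit_logistic(x, y, n_iter=25):
    """Fit logit P(Y=1 | x) = b0 + b1*x by maximum likelihood using Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]]
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))  # fitted E(Y | xi)
            w = p * (1.0 - p)                            # IRLS weight
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: add the inverse information matrix times the score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# hypothetical data: first amount spent (in hundreds) and whether the customer bought again
amount = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
repeat = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(amount, repeat)

def prob(a):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * a)))

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, P(repeat | amount=2) = {prob(2):.3f}")
```

The data deliberately overlap (a purchase at 4, none at 5 and 7); with perfectly separated classes the maximum likelihood estimate would not exist.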

7 Generalized additive models: some examples
- A nonlinear, additive model
- A mixed linear and nonlinear, additive model
- A mixed linear and nonlinear, additive model with a class variable
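The equations of the three examples are not preserved in this transcript. As a generic illustration only, assume two quantitative inputs X1 and X2, a class variable V with levels k = 1, ..., K, a link function g, μ = E(Y | inputs) and smooth functions f1, f2 centred to have mean zero (this notation is chosen here, not taken from the slides). The three model types can then be written as:

```latex
\begin{aligned}
\text{nonlinear, additive:}\quad
  & g(\mu) = \beta_0 + f_1(X_1) + f_2(X_2) \\
\text{mixed linear and nonlinear, additive:}\quad
  & g(\mu) = \beta_0 + \beta_1 X_1 + f_2(X_2) \\
\text{with a class variable } V = k:\quad
  & g(\mu) = \beta_0 + \alpha_k + \beta_1 X_1 + f_2(X_2)
\end{aligned}
```

With Gaussian errors and the identity link, as in the Rhine example, g(μ) = μ.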

8 Generalized additive models: Modelling the concentration of total nitrogen at Lobith on the Rhine
[Figure panels: Observed data; Fitted model]
Output: total-N concentration
Inputs: monthly pattern, trend function

9 Modelling the concentration of total nitrogen at Lobith on the Rhine: extracted additive components
[Figure panels: Year components; Month components]

10 Weekly mortality and confirmed cases of influenza in Sweden
Response: weekly mortality
Inputs: confirmed cases of influenza, seasonal dummies, long-term trend

11 SYNTAX for common GAM models
    Type of model       Syntax                               Mathematical form
    Parametric          model y = param(x);                  E(y) = β0 + β1·x
    Nonparametric       model y = spline(x);                 E(y) = β0 + s(x)
    Nonparametric       model y = loess(x);                  E(y) = β0 + s(x)
    Semiparametric      model y = param(x1) spline(x2);      E(y) = β0 + β1·x1 + s(x2)
    Additive            model y = spline(x1) spline(x2);     E(y) = β0 + s1(x1) + s2(x2)
    Thin-plate spline   model y = spline2(x1,x2);            E(y) = β0 + s(x1, x2)
Here s, s1 and s2 denote smooth nonparametric functions.

12 Generalized additive models: Modelling the concentration of total nitrogen at Lobith on the Rhine
Model 1
    proc gam data=Mining.Rhine;
      model Nconc = spline(Year) spline(Month);
      output out = addmodel1;
    run;
Model 2
    proc gam data=Mining.Rhine;
      model Nconc = spline2(Year, Month);
      output out = addmodel2;
    run;

13 Proc GAM: degrees of freedom of the spline components
The degrees of freedom of the spline components are either specified by the user (df= option) or selected automatically by generalized cross-validation when the MODEL statement option method=GCV is specified.
    proc gam data=Mining.Rhine;
      model Nconc = spline(Year, df=3) spline(Month, df=3);
      output out = addmodel1;
    run;
- df=3 implies that the same cubic polynomial is valid in the entire range of the input.
- Increasing the df value implies that knots are introduced.
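Generalized cross-validation scores a linear smoother with effective degrees of freedom df (the trace of its hat matrix) by GCV = n·RSS / (n - df)², and the df with the smallest score is chosen. The sketch below illustrates the criterion, assuming polynomial least-squares fits as the family of smoothers (for these, df is simply the number of coefficients) instead of the smoothing splines that PROC GAM tunes; the data and the cubic truth are synthetic.

```python
import random

def polyfit(x, y, degree):
    """Least-squares polynomial coefficients b[0..degree] via the normal equations."""
    p = degree + 1
    A = [[sum(xi ** (j + k) for xi in x) for k in range(p)] for j in range(p)]
    c = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(p)]
    for col in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for k in range(col, p):
                A[r][k] -= factor * A[col][k]
            c[r] -= factor * c[col]
    b = [0.0] * p
    for j in reversed(range(p)):  # back substitution
        b[j] = (c[j] - sum(A[j][k] * b[k] for k in range(j + 1, p))) / A[j][j]
    return b

def gcv_score(y, fitted, df):
    """GCV score n*RSS/(n - df)^2 for a linear smoother with hat-matrix trace df."""
    n = len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return n * rss / (n - df) ** 2

rng = random.Random(7)
x = [-1 + 2 * i / 49 for i in range(50)]
y = [xi ** 3 - xi + rng.gauss(0, 0.1) for xi in x]  # hypothetical data, cubic truth

scores = {}
for degree in range(7):
    b = polyfit(x, y, degree)
    fitted = [sum(bj * xi ** j for j, bj in enumerate(b)) for xi in x]
    scores[degree + 1] = gcv_score(y, fitted, degree + 1)  # df = number of coefficients
best_df = min(scores, key=scores.get)
print("GCV-selected df:", best_df)
```

The denominator penalizes flexibility: adding degrees of freedom always lowers RSS, but GCV only improves if RSS falls faster than (n - df)².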

14 Generalized additive models: Modelling the concentration of total nitrogen at Lobith on the Rhine
    proc gam data=Mining.Rhine;
      model Nconc = spline(Year) spline(Month);
      output out = addmodel1;
    run;

15 Generalized additive models: Modelling the concentration of total nitrogen at Lobith on the Rhine
Model 1

16 Generalized additive models: Modelling the concentration of total nitrogen at Lobith on the Rhine
Model 2 (df=4)

17 Generalized additive models: Modelling the concentration of total nitrogen at Lobith on the Rhine
Model 3 (df=20)

18 Generalized additive models: Modelling the concentration of total nitrogen at Lobith on the Rhine
Model 1
    The GAM Procedure
    Dependent Variable: Nconc
    Smoothing Model Component(s): spline(Year) spline(Month)
    Summary of Input Data Set
      Number of Observations                    168
      Number of Missing Observations            0
      Distribution                              Gaussian
      Link Function                             Identity
    Iteration Summary and Fit Statistics
      Final Number of Backfitting Iterations    2
      Final Backfitting Criterion               1.987193E-30
      The Deviance of the Final Estimate        42.92519322
    The local score algorithm converged.

19 Generalized additive models: Modelling the concentration of total nitrogen at Lobith on the Rhine
Model 1
    Regression Model Analysis
    Parameter Estimates
    Parameter        Estimate   Standard Error   t Value   Pr > |t|
    Intercept       420.69388         19.84413     21.20     <.0001
    Linear(Year)     -0.20824          0.00994    -20.94     <.0001
    Linear(Month)    -0.10461          0.01161     -9.01     <.0001
    Smoothing Model Analysis
    Analysis of Deviance
    Source                DF   Sum of Squares   Chi-Square   Pr > ChiSq
    Spline(Year)     3.00000         2.527155       9.3609       0.0249
    Spline(Month)    3.00000        51.143931     189.4432       <.0001
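The chi-square values in the analysis of deviance can be recomputed from the listing. The numbers are consistent with dividing each sum of squares by the estimated dispersion, deviance / error df, assuming the error df is 168 - 9 = 159 (intercept, two linear terms and 3 + 3 nonlinear spline DF); this decomposition is an inference from the printed values, not something the output states. The p-values then follow from a chi-square distribution with 3 degrees of freedom.

```python
import math

n_obs = 168                  # Number of Observations
deviance = 42.92519322       # The Deviance of the Final Estimate
# assumed model df: intercept + Linear(Year) + Linear(Month) + 3 + 3 spline DF
model_df = 1 + 2 + 3 + 3
error_df = n_obs - model_df  # 159
scale = deviance / error_df  # estimated dispersion (residual mean square)

def chi2_sf_3df(x):
    """Upper tail probability P(X > x) for a chi-square variable with 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

for name, ss in (("Spline(Year)", 2.527155), ("Spline(Month)", 51.143931)):
    chi_sq = ss / scale
    print(f"{name}: chi-square = {chi_sq:.4f}, p = {chi2_sf_3df(chi_sq):.3g}")
```

This reproduces the printed 9.3609 and 189.4432 to the displayed precision, and a p-value of about 0.0249 for Spline(Year).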

20 Generalized additive models: Modelling the concentration of total nitrogen at Lobith on the Rhine
Model 2

21 Generalized additive models: Modelling the concentration of total nitrogen at Lobith on the Rhine
Model 2 (20 df)

22 Estimation of additive models: the backfitting algorithm
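The details of the algorithm are not preserved in this transcript. The sketch below follows the usual description of backfitting: start from α = mean(y) and all components f_j = 0, then repeatedly smooth the partial residuals y - α - Σ_{k≠j} f_k against input j, re-centre each component to mean zero, and stop when the components no longer change. As an assumption, a simple running-mean smoother stands in for the cubic smoothing spline used by spline() in PROC GAM, and the data are synthetic.

```python
import math

def running_mean_smoother(x, r, k=5):
    """Smooth values r against input x with a running mean over k neighbours in
    sorted-x order (windows are truncated at the ends of the range)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    half = k // 2
    fitted = [0.0] * n
    for pos, i in enumerate(order):
        window = order[max(0, pos - half): pos + half + 1]
        fitted[i] = sum(r[j] for j in window) / len(window)
    return fitted

def backfit(X, y, smoother=running_mean_smoother, tol=1e-8, max_iter=100):
    """Fit y ~ alpha + sum_j f_j(X[j]) by backfitting; X is a list of input columns."""
    n, p = len(y), len(X)
    alpha = sum(y) / n
    f = [[0.0] * n for _ in range(p)]
    for it in range(1, max_iter + 1):
        max_change = 0.0
        for j in range(p):
            # partial residuals: remove the intercept and all other components
            partial = [y[i] - alpha - sum(f[k][i] for k in range(p) if k != j)
                       for i in range(n)]
            new = smoother(X[j], partial)
            mean_new = sum(new) / n
            new = [v - mean_new for v in new]  # centre so the components are identifiable
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new, f[j])))
            f[j] = new
        if max_change < tol:
            break
    fitted = [alpha + sum(f[j][i] for j in range(p)) for i in range(n)]
    return alpha, f, fitted, it

n = 200
x1 = [i / (n - 1) for i in range(n)]
x2 = [((37 * i) % n) / (n - 1) for i in range(n)]  # a permutation of the same grid
y = [math.sin(2 * math.pi * a) + 4 * (b - 0.5) ** 2 for a, b in zip(x1, x2)]
alpha, comps, fitted, n_iter = backfit([x1, x2], y)
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
tss = sum((yi - alpha) ** 2 for yi in y)
print(f"iterations: {n_iter}, R^2 = {1 - rss / tss:.4f}")
```

Swapping in a different smoother only changes the `smoother` argument; the outer loop is the backfitting algorithm itself.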

23 Modelling ln daily electricity consumption as a spline function of the population-weighted mean temperature in Sweden
    proc gam data=sasuser.smhi;
      model lnDaily_consumption = spline(Meantemp, df=20);
      ID Time;
      output out=smhiouttemp pred resid;
    run;

24 Modelling ln daily electricity consumption as a spline function of the population-weighted mean temperature in Sweden: residual analysis

25 Modelling ln daily electricity consumption in Sweden: residual analysis
Model inputs: spline of temperature, spline of Julian day, weekday dummies

26 Modelling ln daily electricity consumption in Sweden: residual analysis
Model with inputs: spline of temperature, spline of Julian day, weekday dummies
Model with inputs: splines of contemporaneous and time-lagged weather data, splines of Julian day and time, weekday and holiday dummies

27 Deviance analysis of the investigated models of ln daily electricity consumption in Sweden
- The residual deviance of a fitted model is twice the difference between the log-likelihood of the saturated model and that of the fitted model; up to an additive constant that does not depend on the fitted model, this is minus twice its log-likelihood.
- If the error terms are normally distributed, the (unscaled) deviance equals the sum of squared residuals.
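The Gaussian case can be checked numerically. The sketch below, with made-up responses and fitted values and the dispersion fixed at 1 (the unscaled deviance), computes the deviance as twice the log-likelihood gap between the saturated model (fitted values equal to the observations) and the fitted model, and compares it with the residual sum of squares.

```python
import math

def gaussian_loglik(y, mu, sigma2):
    """Log-likelihood of independent N(mu_i, sigma2) observations."""
    n = len(y)
    rss = sum((yi - mi) ** 2 for yi, mi in zip(y, mu))
    return -0.5 * n * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)

y = [2.1, 2.9, 4.2, 4.8, 6.1]       # hypothetical responses
fitted = [2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical fitted values
sigma2 = 1.0                        # dispersion fixed at 1 (unscaled deviance)

# saturated model: one parameter per observation, so mu_i = y_i
deviance = 2 * (gaussian_loglik(y, y, sigma2) - gaussian_loglik(y, fitted, sigma2))
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
print(deviance, rss)
```

The constant n·log(2πσ²) cancels in the difference, which is why only the residual sum of squares remains.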

28 Modelling ln daily electricity consumption in Sweden: time series plot of residuals

