Scatterplot Smoothing Using PROC LOESS and Restricted Cubic Splines

1 Scatterplot Smoothing Using PROC LOESS and Restricted Cubic Splines
Jonas V. Bilenas
Barclays Global Retail Bank/UK
Adjunct Faculty, Saint Joseph University, School of Business
June 23, 2011

2 Introduction
In this tutorial we will look at 2 scatterplot smoothing techniques:
The LOESS Procedure: Non-parametric regression smoothing (local regression or DWLS; Distance Weighted Least Squares).
Restricted Cubic Splines: Parametric smoothing that can be used in regression procedures to fit functional models.

3 SUG, RUG, & LUG Pictures

4 LOESS documentation from SAS
The LOESS procedure implements a nonparametric method for estimating regression surfaces pioneered by Cleveland, Devlin, and Grosse (1988), Cleveland and Grosse (1991), and Cleveland, Grosse, and Shyu (1992). The LOESS procedure allows great flexibility because no assumptions about the parametric form of the regression surface are needed.
The main features of the LOESS procedure are as follows:
fits nonparametric models
supports the use of multidimensional data
supports multiple dependent variables
supports both direct and interpolated fitting that uses kd trees
performs statistical inference
performs automatic smoothing parameter selection
performs iterative reweighting to provide robust fitting when there are outliers in the data
supports graphical displays produced through ODS Graphics

5 LOESS Procedure Details
LOESS fits a local regression function to the data within a chosen neighborhood of points.
The radius of each neighborhood is chosen so that the neighborhood contains a specified percentage of the data points.
This percentage is specified by a smoothing parameter (0 < smooth <= 1). The larger the smoothing parameter, the smoother the fitted curve.
The default smoothing parameter is 0.5.
The smoothing parameter can also be optimized with the SELECT= option:
AICC specifies the AICC criterion.
AICC1 specifies the AICC1 criterion.
GCV specifies the generalized cross validation criterion.
Within each neighborhood, the procedure performs a weighted least squares fit in which the weights decrease with the distance of points from the center of the neighborhood.
Observations with missing values are deleted.
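The neighborhood-and-weights idea above can be sketched in a few lines of Python. This is a minimal illustration, not PROC LOESS itself: it assumes a single predictor, a local linear fit, and the tricube weight function commonly used with LOESS, and it fits directly at every point instead of using kd-tree interpolation. The function names `loess_point` and `loess_curve` are made up for this sketch.

```python
import math

def loess_point(xs, ys, x0, smooth=0.5):
    """Local linear fit at x0 using the nearest fraction `smooth` of the points."""
    n = len(xs)
    # Number of points in the local neighborhood (assumed rounding rule).
    q = min(n, max(2, math.ceil(smooth * n)))
    dists = sorted(abs(x - x0) for x in xs)
    # Radius = distance to the q-th nearest point; guard against a zero radius.
    radius = max(dists[q - 1], 1e-12)
    # Tricube weights: close to 1 near x0, falling to 0 at the radius.
    w = [(1.0 - min(abs(x - x0) / radius, 1.0) ** 3) ** 3 for x in xs]
    if sum(w) == 0.0:
        # Degenerate case: every neighbor sits exactly on the radius.
        w = [1.0 if abs(x - x0) <= radius else 0.0 for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    # Weighted least squares line evaluated at x0.
    return my + slope * (x0 - mx)

def loess_curve(xs, ys, smooth=0.5):
    """Smoothed value at every observed x."""
    return [loess_point(xs, ys, x, smooth) for x in xs]
```

Raising `smooth` widens each neighborhood, so more points share in every local fit and the curve becomes smoother.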

6 Example of some LOESS
proc loess data=sashelp.cars;
   ods output outputstatistics=outstay;
   model MPG_Highway=MSRP / smooth=0.8 alpha=.05 all;
run;
Fit Summary (numeric values not preserved in the transcript)
Fit Method: kd Tree
Blending: Linear
Number of Observations
Number of Fitting Points
kd Tree Bucket Size
Degree of Local Polynomials
Smoothing Parameter
Points in Local Neighborhood
Residual Sum of Squares
Trace[L]
GCV
AICC
AICC
Delta
Delta
Equivalent Number of Parameters
Lookup Degrees of Freedom
Residual Standard Error

7 SUG, RUG, & LUG Pictures

8 Example of some LOESS
/* sort by the x variable so that i=j joins the predicted points from left to right */
proc sort data=outstay; by MSRP; run;
axis1 label = (angle=90 "MPG HIGHWAY");
axis2 label = (h=1.5 "MSRP");
symbol1 i=none c=black v=dot h=0.5;
symbol2 i=j value=none color=red l=1 width=30;
proc gplot data=outstay;
   plot (depvar pred)*MSRP / overlay haxis=axis2 vaxis=axis1 grid;
   title "LOESS Smooth=0.8";
run; quit;

9

10 LOESS with ODS GRAPHICS
ods html;
ods graphics on;
proc loess data=sashelp.cars;
   model MPG_Highway=MSRP / smooth=( ) alpha=.05 all; /* list of smoothing values not preserved in the transcript */
run;
ods graphics off;
ods html close;

11 Optimized LOESS
ods html;
ods graphics on;
proc loess data=sashelp.cars;
   model MPG_Highway=MSRP / SELECT=AICC;
run;
ods graphics off;
ods html close;
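To make the selection criteria concrete, here is a small Python sketch of how GCV and a bias-corrected AIC can be computed from quantities reported in the Fit Summary (residual sum of squares, number of observations, and Trace[L]). The formulas are the commonly cited textbook forms (generalized cross validation, and the corrected AIC of Hurvich, Simonoff, and Tsai); the exact scaling SAS uses should be checked against the PROC LOESS documentation. The candidate values in the usage note are hypothetical, not results from sashelp.cars.

```python
import math

def gcv(rss, n, trace_l):
    """Generalized cross validation: n * RSS / (n - Trace[L])^2."""
    return n * rss / (n - trace_l) ** 2

def aicc(rss, n, trace_l):
    """Bias-corrected AIC (assumed form): log(RSS/n) + 1 + 2(Trace[L]+1)/(n - Trace[L] - 2)."""
    sigma2 = rss / n
    return math.log(sigma2) + 1.0 + 2.0 * (trace_l + 1.0) / (n - trace_l - 2.0)

def pick_smooth(candidates, n, criterion=aicc):
    """candidates maps a smoothing parameter to (rss, trace_l); return the minimizer."""
    return min(candidates, key=lambda s: criterion(candidates[s][0], n, candidates[s][1]))
```

For example, `pick_smooth({0.2: (900.0, 14.0), 0.5: (1000.0, 6.0)}, n=100)` compares two hypothetical fits. A smaller smoothing parameter lowers the residual sum of squares but raises Trace[L], and both criteria penalize that extra flexibility.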

12 LOESS in SGPLOT
ods html;
ods graphics on;
title 'LOESS/SMOOTH=0.60';
proc sgplot data=sashelp.cars;
   loess x=MSRP y=MPG_Highway / smooth=0.60;
run; quit;
ods graphics off;
ods html close;

13 Optimized LOESS
ods html;
ods graphics on;
proc loess data=sashelp.cars;
   model MPG_Highway=MSRP Horsepower / SELECT=AICC;
run;
ods graphics off;
ods html close;

14 SUG, RUG, & LUG Pictures

15 LOESS for Time Series Plots
ods html;
ods graphics on;
title 'Time series plot';
proc loess data=ENSO;
   model Pressure = Month / SMOOTH= ;
run; quit;
ods graphics off;
ods html close;
Data from Cohen (SUGI 24)
Data also online:

16 LOESS for Time Series Plots (AICC optimized)

17 Large Number of Observations
Peter Flom Blog.
Set PLOTS(MAXPOINTS= ) in PROC LOESS. The default limit is 5000.
Run PROC LOESS on all the data, but plot after binning the independent variable and computing means on the binned data.
proc loess data=test; /* output 300 for each record */
   ods output outputstatistics=outstay;
   model MPG_Highway=horsepower / smooth=0.4;
run;
proc rank data=outstay groups=100 ties=low out=ranked;
   var horsepower;
   ranks r_horsepower;
proc means data=ranked noprint nway;
   class r_horsepower;
   var depvar pred Horsepower;
   output out=means mean=;
run;
axis1 label = (angle=90 "MPG HIGHWAY");
axis2 label = (h=1.5 "Horsepower");
symbol1 i=none c=black v=dot h=0.5;
symbol2 i=j value=none color=red l=1 width=10;
proc gplot data=means;
   plot (depvar pred)*Horsepower / overlay haxis=axis2 vaxis=axis1 grid;
   title "LOESS Smooth=0.4";
run; quit;
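The bin-then-average step above can be sketched in Python as follows. This is an illustration of the idea only: it assigns equal-count groups by sort order, which is a simplification of PROC RANK (GROUPS= uses a rank-based formula and TIES=LOW handles tied values differently). The function name `bin_means` and the tuple layout are assumptions made for this sketch.

```python
from collections import defaultdict

def bin_means(rows, groups=100):
    """rows: (x, depvar, pred) tuples. Returns per-bin means, ordered by x."""
    ordered = sorted(rows, key=lambda r: r[0])
    n = len(ordered)
    # Each bin accumulates [sum_x, sum_depvar, sum_pred, count].
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for i, (x, y, p) in enumerate(ordered):
        g = i * groups // n  # equal-count bin index from the sort position
        s = sums[g]
        s[0] += x
        s[1] += y
        s[2] += p
        s[3] += 1
    return [(s[0] / s[3], s[1] / s[3], s[2] / s[3])
            for _, s in sorted(sums.items())]
```

Plotting the returned means instead of the raw rows keeps the number of plotted points at `groups` no matter how many observations went into the fit.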

18 Large Number of Observations

19 SUG, RUG, & LUG Pictures

20 Restricted Cubic Splines
Recommended by Frank Harrell.
Knots are specified in advance.
The exact placement of knots is not critical.
Knots are usually placed at predetermined percentiles, with the number of knots k based on sample size.
Table: k, Quantiles (values not preserved in the transcript)

21 Restricted Cubic Splines
Percentile values can be derived using PROC UNIVARIATE.
The number of knots can be optimized by selecting the number that minimizes AICC.
Provides a parametric regression function.
The knot transformations can make the fitted coefficients difficult to interpret.
Interaction terms may be difficult to incorporate.
Much more efficient than categorizing continuous variables into dummy terms.
Macro available:

22 Restricted Cubic Splines
proc univariate data=sashelp.cars noprint;
   var horsepower;
   output out=knots pctlpre=P_ pctlpts= ;
run;
proc print data=knots; run;
Obs P_5 P_27_5 P_50 P_72_5 P_95

23 Restricted Cubic Splines
options nocenter mprint;
data test;
   set sashelp.cars;
   %rcspline (horsepower,115, 170, 210, 245, 340);
run;
LOG:
MPRINT(RCSPLINE): DROP _kd_;
MPRINT(RCSPLINE): _kd_= ( )** ;
MPRINT(RCSPLINE): horsepower1=max((horsepower-115)/_kd_,0)**3+(( )*max((horsepower-340)/_kd_,0)**3 -( )*max((horsepower-245)/_kd_,0)**3)/( );
MPRINT(RCSPLINE): ;
horsepower2=max((horsepower-170)/_kd_,0)**3+(( )*max((horsepower-340)/_kd_,0)**3 -( )*max((horsepower-245)/_kd_,0)**3)/( );
horsepower3=max((horsepower-210)/_kd_,0)**3+(( )*max((horsepower-340)/_kd_,0)**3 -( )*max((horsepower-245)/_kd_,0)**3)/( );
43 run;
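The spline terms in the log follow the restricted cubic spline construction described by Harrell. The Python sketch below computes the same kind of terms for one value of x. It assumes the standard form: with knots t_1 < ... < t_k, term j is (x - t_j)+^3 plus a correction built from the last two knots, and every difference is scaled by (t_k - t_1)^(2/3) so that the terms stay on a scale comparable to x. The numeric coefficients are blank in the transcript, so treat this as a reading of the formula and not a copy of the macro's output. `rcs_basis` is a name made up for this sketch.

```python
def rcs_basis(x, knots):
    """Nonlinear restricted cubic spline terms for x (k knots give k - 2 terms)."""
    t = sorted(knots)
    if len(t) < 3:
        raise ValueError("at least 3 knots are required")
    # Scaling constant, assumed to play the role of _kd_ in the macro log.
    kd = (t[-1] - t[0]) ** (2.0 / 3.0)

    def pos3(u):
        # Truncated cubic: zero to the left of the knot.
        return max(u / kd, 0.0) ** 3

    denom = t[-1] - t[-2]
    terms = []
    for tj in t[:-2]:
        terms.append(
            pos3(x - tj)
            + ((t[-2] - tj) * pos3(x - t[-1])
               - (t[-1] - tj) * pos3(x - t[-2])) / denom
        )
    return terms
```

With the five knots from the macro call, `rcs_basis(200, [115, 170, 210, 245, 340])` returns three values, matching horsepower1 through horsepower3. The correction terms cancel the cubic and quadratic parts beyond the last knot, which is what makes the spline "restricted" to be linear in the tails.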

24 Restricted Cubic Splines
proc reg data=test;
   model MPG_Highway = horsepower horsepower1 horsepower2 horsepower3;
   LINEAR: TEST horsepower1, horsepower2, horsepower3;
run; quit;
Output (numeric values not preserved in the transcript)
Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model <.0001
Error
Corrected Total
Root MSE R-Square
Dependent Mean Adj R-Sq
Coeff Var
Parameter Estimates
Variable Label DF Parameter Estimate Standard Error t Value Pr > |t|
Intercept Intercept <.0001
Horsepower <.0001
horsepower <.0001
horsepower <.0001
horsepower
Test LINEAR Results for Dependent Variable MPG_Highway
Source DF Mean Square F Value Pr > F
Numerator <.0001
Denominator

25 Restricted Cubic Splines (5 Knots)

26 Restricted Cubic Splines (7 Knots): Time Series Data
Regression terms not significant

27 SUG, RUG, & LUG Pictures

28 References
Akaike, H. (1973), “Information Theory and an Extension of the Maximum Likelihood Principle,” in Petrov and Csaki, eds., Proceedings of the Second International Symposium on Information Theory, 267–281.
Cleveland, W. S., Devlin, S. J., and Grosse, E. (1988), “Regression by Local Fitting,” Journal of Econometrics, 37, 87–114.
Cleveland, W. S. and Grosse, E. (1991), “Computational Methods for Local Regression,” Statistics and Computing, 1, 47–62.
Cohen, R. A. (SUGI 24), “An Introduction to PROC LOESS for Local Regression,” Paper
Harrell, F. (2010), “Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis (Springer Series in Statistics),” Springer.
Harrell RCSPLINE MACRO:
Stone, C. J. and Koo, C. Y. (1985), “Additive Splines in Statistics,” in Proceedings of the Statistical Computing Section ASA, 45–48, Washington, DC.

