# Speed Dating with Regression Procedures

David J Corliss, PhD, Wayne State University, Physics and Astronomy / Public Outreach

Model Selection Flowchart
Branches: Non-Linear, Linear Mixed, Non-Parametric

Decision: Continuous or Discrete Outcome
Discrete outcome: PROC LOGISTIC. Continuous outcome: PROC REG.

Simple Linear Regression
Regression Type: Continuous, linear
General regression procedure with a number of options but limited specialized capabilities, for which other procedures or packages have been developed.
Offers a choice of model variable selection methods (e.g., Forward, Backward, Best Subsets), can be coded for polynomial regression, allows multiple model statements and features interactive capability.
SAS = REG, R = lm function, regress
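To make the basic idea concrete, here is a minimal Python sketch of ordinary least squares for a single predictor, returning the intercept, slope and r². The function name `fit_line` and the data are made up for illustration; this is not how PROC REG or `lm` is implemented.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx                    # slope
    a = my - b * mx                  # intercept
    r2 = (sxy * sxy) / (sxx * syy)   # squared correlation
    return a, b, r2

# Made-up illustrative data (not the data used in the talk)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b, r2 = fit_line(x, y)
```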

Simple Linear Regression
Example: Homeless Students by State
Solid performance of the model across the range from low- to high-homelessness states indicates consistency of the factors correlated with the number of homeless students.
[Figure: actual percent vs. model percent of student population; r² = 0.652]

Special Data Needs: Problems with Outliers
Robust Regression
Regression Type: Continuous, linear
Robust regression is achieved by identifying outliers, limiting their influence by assigning weights and then performing standard regression.
Offers a choice of methods for outlier detection (e.g., M, LTS, S and MM estimation) and robust ANOVA.
SAS = ROBUSTREG, R = robustbase, robust
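The identify, weight, refit cycle described above can be sketched as iteratively reweighted least squares with Huber weights, which is one form of M estimation. This is an assumption-laden illustration: the function names, the tuning constant `k=1.345`, the MAD-based scale and the data are choices made here, not the ROBUSTREG implementation.

```python
from statistics import median

def weighted_line(x, y, w):
    """Weighted least squares for y = a + b*x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def huber_line(x, y, k=1.345, iters=50):
    """Refit repeatedly, down-weighting points with large residuals."""
    w = [1.0] * len(x)
    for _ in range(iters):
        a, b = weighted_line(x, y, w)
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust residual scale from the median absolute residual
        scale = median(abs(r) for r in resid) / 0.6745 or 1e-12
        # Huber weights: 1 inside the threshold, shrinking outside it
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
    return a, b, w

# Made-up data: an exact line with one gross outlier
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
y[5] = 41.0
a, b, w = huber_line(x, y)
```

The outlier is kept in the data but ends up with a weight close to zero, so the fitted slope stays near that of the remaining points.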

PROC ROBUSTREG
Example: Log-Log Regression With Weighted Outliers
Source: SAS/STAT® 9.2 User’s Guide, support.sas.com
In robust regression, the outliers need not be disregarded: weights can be assigned and incorporated in the regression.

Special Data Needs: Ill-Conditioned Data
Regression Using Givens Rotations
Regression Type: Continuous, linear
Regression using the Gentleman-Givens procedure instead of collecting crossproducts.
Intended for ill-conditioned data, where small errors in the data may cause large errors in the results; more accurate than simple regression in that setting.
SAS = ORTHOREG, R = givens
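As a rough illustration of solving least squares without forming crossproducts, the Python sketch below triangularizes the augmented matrix [X | y] with Givens rotations and then back-substitutes. It is a plain dense version written for this note (the name `givens_lstsq` is hypothetical), not the row-by-row Gentleman-Givens algorithm that ORTHOREG uses, and it assumes X has full column rank.

```python
import math

def givens_lstsq(X, y):
    """Least squares via Givens rotations on the augmented matrix [X | y]."""
    n, p = len(X), len(X[0])
    A = [list(row) + [yi] for row, yi in zip(X, y)]
    for j in range(p):
        for i in range(j + 1, n):
            if A[i][j] == 0.0:
                continue
            # Rotation in the (j, i) plane that zeroes A[i][j]
            r = math.hypot(A[j][j], A[i][j])
            c, s = A[j][j] / r, A[i][j] / r
            for k in range(j, p + 1):
                top, bot = A[j][k], A[i][k]
                A[j][k] = c * top + s * bot
                A[i][k] = -s * top + c * bot
    # Back-substitution on the upper-triangular system R * beta = Q'y
    beta = [0.0] * p
    for j in reversed(range(p)):
        acc = A[j][p] - sum(A[j][k] * beta[k] for k in range(j + 1, p))
        beta[j] = acc / A[j][j]
    return beta

# Made-up example: recover the coefficients of 1 + 2t + 3t^2
ts = [0.0, 1.0, 2.0, 3.0, 4.0]
X = [[1.0, t, t * t] for t in ts]
y = [1.0 + 2.0 * t + 3.0 * t * t for t in ts]
beta = givens_lstsq(X, y)
```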

Givens Rotation Regression
Example: Fitting a Higher-Order Polynomial
Source: SAS/STAT® 9.2 User’s Guide, support.sas.com
An example of fitting a 9th-degree polynomial, where near singularities must be distinguished from true ones.

Special Data Needs: Transformation
Regression with Data Transformation
Regression Type: Continuous, linear
Regression with a number of data transformations, including smooth, spline, Box-Cox and other non-linear forms.
Supports fitting splines with a user-specified degree and number of knots; capable of piece-wise solutions.
SAS = TRANSREG, R = reg, betareg
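Of the transformations listed, Box-Cox is the simplest to write down. The sketch below shows only the transformation for a fixed λ; choosing λ, which the procedure does by likelihood, is not shown. The function name `box_cox` is an illustrative choice.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of positive values: (y**lam - 1)/lam, or log(y) when lam == 0."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

# lam = 1 only shifts the data; lam = 0.5 is a square-root-type transform
shifted = box_cox([3.0, 5.0], 1.0)
rooted = box_cox([4.0, 9.0], 0.5)
```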

Regression with Data Transformation
Example: Spline Regression to a Complex Form
Splines are used to fit a spectrographic line profile in order to determine the radial velocity of gas erupting from a star.

Special Model Types: General Linear
General Linear Models
Regression Type: Continuous, linear
General-purpose procedure for continuous least squares regression using classification predictor variables as well as continuous ones.
While capable of many types of models and analysis, another procedure is often better for a specific task.
SAS = GLM, R = glm function

General Linear Model
Example: Age Group as a Categorical Predictor Variable
Source: An Overview of ODS Statistical Graphics in SAS® 9.3, Robert N. Rodriguez, SAS Institute Inc., Cary, NC
[Figure: distribution of response by agegroup] GLM used with box-and-whisker output.

Special Model Types: By Quantile
Quantile Regression
Regression Type: Continuous, linear
While other procedures model the mean, quantile regression models the median and other specified quantiles to provide a more complete picture of the response variable.
Uncertainties for individual quantiles can be estimated by bootstrapping.
SAS = QUANTREG, R = quantreg
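What separates modeling a quantile from modeling the mean is the loss function being minimized. The Python sketch below uses the standard check ("pinball") loss in an intercept-only setting, where the minimizer is simply a sample quantile. The function names and the data are made up, and a real quantile regression with predictors would need a linear-programming or similar solver, which is not shown.

```python
def pinball_loss(residuals, tau):
    """Check loss: positive residuals cost tau, negative ones cost (1 - tau)."""
    return sum(tau * r if r >= 0 else (tau - 1) * r for r in residuals)

def best_constant(y, tau):
    """Intercept-only quantile fit: search the observed values for the minimizer."""
    return min(sorted(y), key=lambda q: pinball_loss([yi - q for yi in y], tau))

# Made-up skewed data: the mean is 22, but the fitted median is 3
y = [1, 2, 3, 4, 100]
q10 = best_constant(y, 0.1)
q50 = best_constant(y, 0.5)
q90 = best_constant(y, 0.9)
```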

Quantile Regression
Example: 5/10/25/50/75/90/95% Quantiles
Predicted birth weight by maternal weight gain.
Source: Quantile regression with PROC QUANTREG, Peter L. Flom, Peter Flom Consulting, New York, NY
An example of quantile regression demonstrating greater detail than is possible with ordinary regression.

Special Model Types: PLS, PCA Regression
Partial Least Squares & Principal Components
Regression Type: Continuous, linear
Partial Least Squares and Principal Component regression: predictor and response variables are projected into a new coordinate system, possibly with reduced complexity.
Supports reduced rank regression with cross validation of the number of components.
SAS = PLS, R = pls

Partial Least Squares / Principal Components
Example: Variable Importance Plot
Principal Component variables are derived from the original, observed variables.

Special Model Types: Survey Data
Survey Regression
Regression Type: Continuous, linear
Special capabilities for analysis in the presence of common survey data features, including stratification, clustering and weighting.
Supports several methods for sampling and estimation of sampling error using either Taylor series or primary sample units.
SAS = SURVEYREG, R = survey

Survey Regression
Example: Regression with Stratified Sampling
Stratum Information (columns: Stratum Index, State, Region, N Obs, Population Total, Sampling Rate): the strata shown are in Iowa and Nebraska, with sampling rates between 3.00% and 20.0%.
Estimated Regression Coefficients (t Value, Pr > |t|): Intercept 2.22, 0.0433; FarmArea 4.66, 0.0004.
Tests of Model Effects (Num DF, F Value, Pr > F): Model 1, 21.74, 0.0004; Intercept F Value 4.93, Pr > F 0.0433.
Note: The denominator degrees of freedom for the F tests is 14.
The output also includes the covariance of the estimated regression coefficients (Intercept, FarmArea).
Source: PROC SURVEYREG, support.sas.com, example 98.4
Example output from application to survey data, with summary statistics and model parameters.

Special Model Types: PH on Survey Data
Proportional Hazards with Survey Data
Regression Type: Continuous, linear
Performs Cox Proportional Hazards modeling on survey data with truncation, supporting stratification, clustering and weighting.
Estimates the variance of model parameters by Taylor series, BRR or Jackknife.
SAS = SURVEYPHREG, R = survey

Proportional Hazards with Survey Data
Example: Stratified Sampling with Truncated Data
Analysis of Maximum Likelihood Estimates (t Value, Pr > |t|, Hazard Ratio): BodyWeight (DF 586) 3.78, 0.0002, 1.012; Smoke -1: -1.59, 0.1129, 0.309; Smoke 1: -1.74, 0.0826, 0.365; Smoke 2: -1.21, 0.2278, 0.510; Smoke 3: hazard ratio 1.000.
Type III Tests of Model Effects (Num DF, Den DF, F Value, Pr > F): BodyWeight 1, 586, 14.27, 0.0002; Smoke Num DF 3, F Value 1.49, Pr > F 0.2160.
Estimate, Row 1: Standard Error 0.3870, DF 586, t Value -1.95, Pr > |t| 0.0521, Exponentiated 0.4709.
Source: PROC SURVEYPHREG, support.sas.com, example 97.2
Example output for Proportional Hazards regression on survey data with truncation: summary statistics and model parameters.

Special Model Types: Categorical
Regression on Categorical Data
Regression Type: Continuous, linear
A generalization of continuous methods to categorical data; performs linear regression and other analyses on data that can be expressed in contingency tables.
Supports both ordinary and logistic regression, log-linear models and repeated measures.
SAS = CATMOD, R = catdata, vgam

Regression on Categorical Data
Example: Bartlett's Data, No 3-Variable Interaction
Data Summary: Response Length*Time*Status; Response Levels 8; Weight Variable wt; Populations 1; Data Set BARTLETT; Total Frequency 960.
Response Profiles: 8 responses defined by Length, Time and Status.
Maximum Likelihood Analysis of Variance (Chi-Square, Pr > ChiSq): Length (DF 1) 2.64, 0.1041; Time 5.25, 0.0220; Status 48.94, <.0001; Time*Status 95.01; Likelihood Ratio 2.29, 0.1299. Length*Time and Length*Status are also listed as sources.
Source: PROC CATMOD, support.sas.com, example 28.4
Example output from regression on categorical data, with summary statistics and model parameters.

Special Model Types: Complex Optimization
Response Surface Regression
Regression Type: Continuous, linear
Linear regression for fitting quadratic Response Surface Models, a type of general linear model that identifies where optimal response values occur more efficiently than ordinary regression or GLM.
Output displays the Response Surface and identifies ridges of optimum response.
SAS = RSREG, R = rsm

Response Surface Regression
Example: A Response Surface with Optimal Solution
An example of a response surface with the optimal solution found at the minimum; multiple minima and maxima are possible.

Special Model Types: Time to Failure
Survival Analysis
Regression Type: Continuous, linear
Models time-to-failure data as a linear combination of predictors and a random disturbance term, which can be described by many different distributions.
Supports standard survival analysis data censored on the right, left, both or neither.
SAS = LIFEREG, R = survival

Survival Analysis
Example: A Cumulative Hazard Model
This example plots the log-logistic vs. the Kaplan-Meier cumulative hazard.

Special Model Types: Time-dependent Risk
Proportional Hazards Model
Regression Type: Continuous, linear
Cox Proportional Hazards modeling, where a unit increase in a predictor multiplies the risk by a factor determined by the model.
Supports proportional hazards models with data censored on the right, left, both or neither, and variable selection by multiple methods, including best subset.
SAS = PHREG, R = coxph

Proportional Hazards Model
Example: Model With Time-Dependent Predictors
Example output from a Proportional Hazards model, with summary statistics and model parameters.

Special Model Types: Simultaneous Outcomes
Structural Equation Models
Regression Type: Continuous, linear
In Structural Equation Modeling, a linear combination of predictors describes a vector equal to a linear combination of outcome variables.
Supports latent variables, multiple and multivariate regression, path analysis and canonical correlation.
SAS = CALIS, R = sem

Structural Equation Model
Example: Linear Relations among Factor Loadings
Example output from a Structural Equation model, with matrices of model parameters.

Discrete Outcomes: Simple Logistic
Logistic Regression
Regression Type: binary & ordinal outcomes, linear
General procedure for logistic regression with a number of options; other procedures may offer more capabilities for specific types of discrete models.
Supports many model variable selection methods and diagnostic tests.
SAS = LOGISTIC, R = glm function
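For a binary outcome, the fit is usually found by iterating on the likelihood rather than in closed form. The Python sketch below runs plain Newton-Raphson for one predictor. The name `fit_logistic`, the data and the fixed iteration count are all illustrative assumptions, and it omits the safeguards (step control, separation checks) that a real procedure would include.

```python
import math

def fit_logistic(x, y, iters=25):
    """Newton-Raphson for logit P(y=1) = b0 + b1*x with a single predictor."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                # information matrix X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up, non-separable data so that the estimates are finite
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x, y)
odds_ratio = math.exp(b1)  # multiplicative change in odds per unit of x
```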

Discrete Outcomes: Simple Logistic
Logistic Regression
Data: IDRE / UCLA
Example data and output from a Logistic Regression model, with summary statistics and model parameters.

Discrete Outcomes: Generalized
Generalized Linear Models
Regression Type: discrete outcomes, linear
Generalized linear models with discrete outcomes, appropriate where the data are not normally distributed or the variance is not the same for all observations.
Supports Poisson Regression and Repeated Measures.
SAS = GENMOD, R = glm function

Discrete Outcomes: Generalized
Generalized Linear Models
Example output from a Generalized Linear Model of a discrete outcome, with summary statistics and model parameters.

Discrete Outcomes: Outcome Probability
PROBIT Models
Regression Type: discrete outcomes, linear
Models the probability that an observation will have a particular outcome.
Supports probit, logit, ordinal logistic, and extreme value / gompit.
SAS = PROBIT, R = glm, family = binomial(link = "probit")
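The model types named above differ mainly in the function that maps the linear predictor to a probability. A minimal Python sketch of three such inverse links follows; the function names are illustrative, and the gompit form shown is the usual complementary log-log expression.

```python
import math

def probit_prob(eta):
    """Standard normal CDF of the linear predictor."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def logit_prob(eta):
    """Logistic CDF of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-eta))

def gompit_prob(eta):
    """Extreme-value (complementary log-log) form: 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))

# The symmetric links agree at eta = 0; the gompit link is asymmetric
probs_at_zero = (probit_prob(0.0), logit_prob(0.0), gompit_prob(0.0))
```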

Discrete Outcomes: Outcome Probability
PROBIT Models
Example data and output from a PROBIT model, with summary statistics and model parameters.

Non-Linear Models: General
Non-Linear Models
Regression Type: non-linear
Performs non-linear regression with the dependent variable divided into a mean component and a (random) error component; the process is iterative.
Supports steepest-descent, Newton, modified Gauss-Newton and Marquardt methods.
SAS = NLIN, R = nls function, nleqslv
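To show what "iterative" means here, the Python sketch below applies unmodified Gauss-Newton steps to the two-parameter model y = a·exp(b·x). The model, the starting values and the name `gauss_newton_exp` are assumptions chosen for illustration; the modified Gauss-Newton and Marquardt methods add step control that is not shown.

```python
import math

def gauss_newton_exp(x, y, a, b, iters=20):
    """Fit y = a * exp(b * x) by Gauss-Newton from starting values (a, b)."""
    for _ in range(iters):
        s_aa = s_ab = s_bb = g_a = g_b = 0.0
        for xi, yi in zip(x, y):
            e = math.exp(b * xi)
            r = yi - a * e            # residual at the current estimates
            da, db = e, a * xi * e    # partial derivatives of the model
            s_aa += da * da
            s_ab += da * db
            s_bb += db * db
            g_a += da * r
            g_b += db * r
        # Solve the 2x2 normal equations (J'J) * step = J'r
        det = s_aa * s_bb - s_ab * s_ab
        a += (s_bb * g_a - s_ab * g_b) / det
        b += (s_aa * g_b - s_ab * g_a) / det
    return a, b

# Made-up noise-free data generated from a = 2, b = 0.5
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 * math.exp(0.5 * xi) for xi in x]
a_hat, b_hat = gauss_newton_exp(x, y, a=1.5, b=0.4)
```

Starting values matter: unlike linear regression, a poor starting point can make the iteration diverge.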

Non-Linear Models
Example: Fitting a Model to a Complex Curve
In this example, observations are normally distributed about a non-linear function, in this case a Morlet wavelet.

Non-Linear Models: Mixed Effects
Non-Linear Mixed-Effects Models
Regression Type: non-linear
Performs non-linear regression where both the mean and error components of the dependent variable are non-linear; the process uses a Taylor series expansion about zero.
Supports normal, binomial and Poisson distributions, with the capability for programming a general distribution.
SAS = NLMIXED, R = nlme

Non-Linear Mixed-Effects Models
Example: Plot of Profile of Trees Over Time
In this example, variability in the shape of observed trees increases over time.

Linear Mixed: Fixed and Random Effects
Mixed Models
Regression Type: linear, fixed and random effects
Performs linear regression using a linear combination of fixed effects added to a second linear combination of random effects.
Supports repeated measures in longitudinal studies; especially useful for dealing with missing data.
SAS = MIXED, R = lme4, coxme

Linear Mixed-Effects Models
Example: Repeated Measures
Example of a Mixed Effects Model, incorporating both fixed and random effects to improve the predictive power.

Linear Mixed: General
General Mixed Models
Regression Type: linear mixed
Generalization of mixed models to permit normally-distributed random effects and non-normal error terms.
Supports fitting models to correlated data or where the variability is not constant.
SAS = GLIMMIX, R = lme4

General Mixed Models
Example: Crossed Random Effects
LOESS with crossed random effects analyzes in-breeding in an isolated population, allowing generalization to all populations.

Non-Parametric Models: Localized
Local Regression
Regression Type: linear, non-parametric
Develops a model by applying non-parametric regression to segments of data and calculates confidence limits for the outcome; computationally intensive.
Supports multiple dependent variables, multidimensional predictors and interpolation using kd trees.
SAS = LOESS, R = locfit
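The segment-by-segment fitting described above can be sketched as a weighted straight-line fit around each evaluation point, using tricube weights over the nearest fraction of the data. This Python version handles one predictor and one evaluation point only; the function name, the `frac` parameter and the small bandwidth inflation are choices made for this sketch, and confidence limits and kd-tree interpolation are not shown.

```python
def local_linear(x, y, x0, frac=0.5):
    """Local linear estimate at x0 using tricube weights on the nearest points."""
    k = max(2, round(frac * len(x)))
    # Bandwidth: distance to the k-th nearest point, slightly inflated so
    # that this neighbour keeps a small positive weight
    h = sorted(abs(xi - x0) for xi in x)[k - 1] * 1.001 or 1e-12
    w = [(1.0 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    if sxx == 0.0:
        return my  # only one distinct x carries weight; fall back to the local mean
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    return my + (sxy / sxx) * (x0 - mx)

# Made-up check: on exactly linear data the local fit reproduces the line
x = [float(i) for i in range(10)]
y = [3.0 + 2.0 * xi for xi in x]
fitted = local_linear(x, y, 2.5)
```

Repeating the fit at every point of interest is what makes the method computationally intensive.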

Local Regression
Example: Periodicities in Weather Data
In this example, Local Regression is used to identify potential periodicities at 12 and 42 months.

Regression Type: linear, non-parametric
Generalized Additive Models, with multiple independent non-parametric predictors; univariate smoothing provides finer detail than is possible with the piece-wise LOESS procedure.
Supports non-parametric and semi-parametric models and multidimensional predictors.
SAS = GAM, R = gam

Additive Model
Example: Segmented Response Surface
An Additive Model used to fit a complex response surface without the loss of detail due to piece-wise fitting in local regression.

Questions