SAS Lecture 5 – Some regression procedures
Aidan McDermott, April 25, 2005

What will the output from this program look like? How many variables will be in the dataset example, and what will be the length and type of each variable? What will the variable package look like?

What will the output from this program look like?

Modeling with SAS
- examine relationships between variables
- estimate parameters and their standard errors
- calculate predicted values
- evaluate the fit or lack of fit of a model
- test hypotheses
- relate design to outcome

The linear model
Note: the outcome variable must be continuous and normally distributed given the independent variables.
Example:
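The slide's equation is not reproduced in the transcript; written generically (a sketch, with p predictors), the model is

Y = β0 + β1·X1 + … + βp·Xp + ε,   ε ~ Normal(0, σ²)

so the mean of Y is a linear function of the X's.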

the linear model with proc reg
- estimates parameters by least squares
- produces diagnostics to test model fit (e.g. scatter plots)
- tests hypotheses

Example:
proc reg data=mydata;
  model weight = height age;
run;

proc reg
Syntax:
proc reg <options>;
  model response = effects < / options>;
  plot yvariable*xvariable = 'symbol';
  by varlist;
  output <keyword=names>;
run;

proc reg
the proc reg statement syntax:
- data = SAS data set name: the input data set
- outest = SAS data set name: creates a data set with the parameter estimates
- simple: prints simple statistics
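A minimal sketch combining these options (the data set and variable names are hypothetical, carried over from the earlier example):

proc reg data=mydata outest=est simple;   /* est will hold the parameter estimates */
  model weight = height age;
run;
quit;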

proc reg
the model statement:
model response = effects < / options>;
- required
- variables must be numeric
- many options
- can specify more than one model statement

Example:
model weight = height age;
model weight = height age / p clm cli;

proc reg
the plot statement:
plot yvariable*xvariable < / options>;
- produces scatter plots: yvariable on the vertical axis and xvariable on the horizontal axis
- can specify several plots
- optional symbol to mark points
- yvariable and xvariable can be variables specified in model statements or statistics available in the output statement

Example:
plot weight * age / pred;
plot r. * p. / vref = 0;

proc reg
some statistics available for plotting:
- P. : predicted values
- R. : residuals
- L95. : lower 95% CI bound for individual prediction
- U95. : upper 95% CI bound for individual prediction
- L95M. : lower 95% CI bound for mean of dependent variable
- U95M. : upper 95% CI bound for mean of dependent variable

Example:
plot weight * age / pred;
plot r. * p. / vref = 0;
plot (weight p. l95. u95.) * age / overlay;

proc reg
the output statement:
output keyword=names;
- creates a SAS data set
- all original variables are included
- keyword=names specifies the statistics to include

Example:
output out=pvals p=pred r=resid;

Example: NMES
variables of interest:
- totalexp – total medical expenditure ($)
- chd5 – indicator of CHD
- lastage – age at last interview
- male – sex of participant

proc reg example (a sketch of the code follows below):
1. model: estimate parameters etc.
2. plot: make three plots
3. output: make an output dataset, regout
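The program itself is not captured in the transcript. A plausible sketch, assuming the NMES data set referenced later (lecture4.nmes) and the plot statements shown on the earlier slides:

proc reg data=lecture4.nmes;
  model totalexp = chd5 lastage male;                /* 1. estimate parameters */
  plot totalexp * lastage / pred;                    /* 2. three plots */
  plot r. * p. / vref = 0;
  plot (totalexp p. l95. u95.) * lastage / overlay;
  output out=regout p=pred r=resid;                  /* 3. output dataset regout */
run;
quit;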

The run statement
Many people assume that the run statement ends a procedure such as proc reg. This is because when SAS encounters a run statement it executes any outstanding instructions in the program buffer. But it may or may not end the procedure.

proc reg data=lecture4.nmes;
  model totalexp = chd5 lastage male;
run;
  model totalexp = chd5 lastage;
  plot r.*chd5;
run;
quit; /* ends the procedure */

proc glm (the general linear model)
- uses least-squares with generalized inverses
- performs linear regression, analysis of variance, analysis of covariance
- accepts classification variables (discrete) and continuous variables
- estimates and performs tests for general linear effects
- proc anova is suitable for “balanced” designs; proc glm can be used for either balanced or unbalanced designs
- suitable for random effects models

proc glm
Syntax:
proc glm data=name <options>;
  class classification-variables;
  model response = effects / options;
  means effects / options;
  random effects / options;
  estimate 'label' effect value / options;
  contrast 'label' effect value / options;
run;

proc glm
- the response (dependent) variable is continuous – same normality assumption as in proc reg
- independent variables are discrete or continuous; discrete variables must be listed on the class statement
- interaction terms can be specified with an asterisk, a*b, e.g. model bmi = a b a*b;

proc glm
means effects / options;
- computes arithmetic means and standard deviations of all continuous variables in the model (both dependent and independent) within each group for effects specified on the right-hand side of the model statement
- only class variables may be specified as effects
- options specify multiple comparison methods for main effect terms in the model

proc glm example (a sketch of the code follows below):
1. solution: show estimated parameters
2. means: show means for the smoke variable
3. class: treat smoke as discrete
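The slide's program is not in the transcript. A minimal sketch consistent with the three points above; the data set and the other model variables (totalexp, lastage, male) are assumptions carried over from the proc reg example:

proc glm data=lecture4.nmes;
  class smoke;                                       /* 3. treat smoke as discrete */
  model totalexp = smoke lastage male / solution;    /* 1. print the estimated parameters */
  means smoke;                                       /* 2. means for the smoke variable */
run;
quit;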

proc glm example (a sketch of the code follows below):
1. format: changes the reference group
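By default proc glm orders class levels by their formatted values and takes the last level as the reference group, so attaching a different format changes the reference group. A sketch, with a hypothetical format name and labels:

proc format;
  value smokef 0 = 'z: non-smoker'   /* label chosen so non-smoker sorts last and becomes the reference */
               1 = 'a: smoker';
run;

proc glm data=lecture4.nmes;
  class smoke;
  format smoke smokef.;              /* class levels are now ordered by the formatted labels */
  model totalexp = smoke lastage male / solution;
run;
quit;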

reg and glm
- Both the proc reg and proc glm procedures are suitable only when the outcome variable is normally distributed.
- proc reg has many regression diagnostic features, while proc glm allows you to fit more sophisticated linear models such as random effects models, models for unbalanced designs, etc.

non-normal outcomes
- In many situations we cannot assume our response variable is normally distributed.
- proc reg and proc glm are not suitable for modeling such outcomes.

Example: Suppose you are interested in estimating the prevalence of disease in a population. You have an indicator of disease (1 = Yes, 0 = No).

non-normal outcomes
Example: You are interested in estimating how the incidence of infant mortality has changed as a function of time.
Example: You are interested in estimating the median survival time for two groups of patients receiving either a placebo or treatment.

proc logistic
Example: Survey data: a parent agrees to close the school when certain toxic elements are found in the environment.
Variables:
- close: 0 = no, 1 = yes
- lived: years lived in the community

Syntax:
proc logistic <options>;
  model response = effects < / options>;
  class variables;
  by variables;
  output <out=dataset keyword=names>;
run;

proc logistic
The descending option means that we are modeling the probability that close=1 and not the probability that close=0.
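For the school-closing example this would look something like the following sketch (the data set name survey is hypothetical):

proc logistic data=survey descending;
  model close = lived;   /* models Pr(close = 1) because of the descending option */
run;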

proc genmod
- implements the generalized linear model
- fits models with normal, binomial or Poisson response variables (among others)
- fits generalized estimating equations for repeated measures data

proc genmod
Syntax:
proc genmod <options>;
  by variables;
  class variables;
  model response = effects < / options>;
  output <out=dataset keyword=names>;
  make 'table' out=name;
run;

proc genmod:
- the class statement says which variables are classification (categorical) variables
- the by statement produces a separate analysis for each level of the by variables (the data must be sorted in the order of the by variables)
- response is the response (dependent) variable in the regression model
- effects are a list of variables; these are the independent variables in the regression model. Any independent variables that are categorical must be listed in the class statement.

Example: Same model as we produced with proc glm. The default is a linear model. smoke will be treated as a categorical variable because of the class statement (a sketch of the code follows below).
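The slide's code is not in the transcript; a sketch, reusing the (assumed) variables from the earlier glm example:

proc genmod data=lecture4.nmes;
  class smoke;                           /* smoke treated as categorical */
  model totalexp = smoke lastage male;   /* defaults: dist=normal, link=identity, i.e. a linear model */
run;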

options for the model statement
- dist = specifies the distribution of the response variable (default = normal)
- link = specifies the link that will transform the response variable (default = identity)

Examples:
- logistic regression: dist=binomial link=logit
- Poisson regression: dist=poisson link=log

options for the model statement
- alpha = specifies the confidence level for confidence intervals
- waldci or lrci specifies that confidence intervals are to be computed; waldci gives approximate intervals and doesn't take as long as lrci, which gives intervals based on the likelihood ratio.
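A short illustration of these options in a model statement (a sketch, using the NMES variables from earlier):

model chd5 = lastage male / dist=binomial link=logit lrci alpha=0.05;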

the output statement
The output statement is just one of the ways to create a new SAS dataset containing results from the genmod procedure. The statement is similar to that found in proc means and proc glm.

Example:
output out=new predicted=fit upper=upper lower=lower;

the make statement
The make statement is another way to create a new SAS dataset containing results from the genmod procedure. ods is another, more general, way (see later).

Example:
make 'ParameterEstimates' out=parms;
make 'ParmInfo' out=parminfo;

example: logistic regression
- Perform a logistic regression analysis to determine how the odds of CHD are associated with age and gender in the 1987 NMES.
- Save the parameter estimates as a new dataset.
- Save the predicted values along with the original data.

Example: The descending option means that we are modeling the probability that chd5=1 and not the probability that chd5=0.
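One way to carry out this analysis with proc genmod; the slide's actual code is not in the transcript, so the statements below are a sketch (the output and make statements follow the earlier slides):

proc genmod data=lecture4.nmes descending;
  model chd5 = lastage male / dist=binomial link=logit;   /* odds of CHD vs. age and gender; models Pr(chd5 = 1) */
  output out=preds predicted=phat;                         /* predicted values kept with the original data */
  make 'ParameterEstimates' out=parms;                     /* parameter estimates saved as a new dataset */
run;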