SAS Lecture 5 – Some regression procedures Aidan McDermott, April 25, 2005.


1 SAS Lecture 5 – Some regression procedures Aidan McDermott, April 25, 2005

2 What will the output from this program look like? How many variables will be in the dataset example, and what will be the length and type of each variable? What will the variable package look like?

3 What will the output from this program look like?

4 Modeling with SAS
- examine relationships between variables
- estimate parameters and their standard errors
- calculate predicted values
- evaluate the fit or lack of fit of a model
- test hypotheses
design outcome

5 The linear model
Example:
Note: the outcome variable must be continuous and normally distributed given the independent variables.
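The slide's own example equation did not survive in the transcript. As a stand-in, and assuming the weight, height, and age variables used in the proc reg example on the next slide, a linear model of this kind can be written as:

```latex
\[
\text{weight}_i = \beta_0 + \beta_1\,\text{height}_i + \beta_2\,\text{age}_i + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2) \text{ independently}
\]
```

The normality note on the slide corresponds to the assumption on the error term: given height and age, weight is normal with constant variance.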

6 The linear model with proc reg
- estimates parameters by least squares
- produces diagnostics to test model fit (e.g. scatter plots)
- tests hypotheses
Example:
    proc reg data=mydata;
      model weight = height age;
    run;

7 proc reg
Syntax (items in angle brackets are optional):
    proc reg <options>;
      model response = effects </options>;
      plot yvariable*xvariable = 'symbol';
      by varlist;
      output <options>;
    run;

8 proc reg
proc reg statement syntax:
    data=SAS-data-set      input data set
    outest=SAS-data-set    creates a data set with the parameter estimates
    simple                 prints simple statistics

9 proc reg: the model statement
    model response = effects </options>;
- required
- variables must be numeric
- many options
- more than one model statement can be specified
Example:
    model weight = height age;
    model weight = height age / p clm cli;

10 proc reg: the plot statement
    plot yvariable*xvariable <= 'symbol'> </options>;
- produces scatter plots, with yvariable on the vertical axis and xvariable on the horizontal axis
- several plots can be specified
- optional symbol to mark points
- yvariable and xvariable can be variables specified in model statements or statistics available in the output statement
Example:
    plot weight * age / pred;
    plot r. * p. / vref = 0;

11 proc reg: some statistics available for plotting
    p.       predicted values
    r.       residuals
    l95.     lower 95% CI bound for individual prediction
    u95.     upper 95% CI bound for individual prediction
    l95m.    lower 95% CI bound for mean of dependent variable
    u95m.    upper 95% CI bound for mean of dependent variable
Example:
    plot weight * age / pred;
    plot r. * p. / vref = 0;
    plot (weight p. l95. u95.) * age / overlay;

12 proc reg: the output statement
    output <out=SAS-data-set> keyword=names;
- creates a SAS data set
- all original variables are included
- keyword=names specifies the statistics to include
Example:
    output out=pvals p=pred r=resid;

13 Example: NMES
Variables of interest:
    totalexp    total medical expenditure ($)
    chd5        indicator of CHD
    lastage     age at last interview
    male        sex of participant

14 proc reg example here:
1. model: estimate parameters etc.
2. plot: make three plots
3. output: make an output dataset regout
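The program shown on this slide was an image and is not in the transcript. A minimal sketch of what such a step might look like follows, assuming the lecture4.nmes data set used later in the lecture and the NMES variables listed on the previous slide. The particular three plots and the names pred and resid are illustrative guesses, not the lecturer's choices.

```sas
/* Sketch only: assumes the libref lecture4 is already assigned. */
proc reg data=lecture4.nmes;
   /* 1. model: estimate parameters, standard errors, etc. */
   model totalexp = chd5 lastage male;

   /* 2. plot: three plots (which three is an assumption) */
   plot totalexp * lastage / pred;   /* outcome against age            */
   plot r. * p. / vref=0;            /* residuals against fitted values */
   plot r. * lastage / vref=0;       /* residuals against age           */

   /* 3. output: original variables plus predictions and residuals */
   output out=regout p=pred r=resid;
run;
quit;
```

The data set regout then holds every original variable together with the new variables pred and resid.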


17 The run statement
Many people assume that the run statement ends a procedure such as proc reg, because when SAS encounters a run statement it executes any outstanding instructions in the program buffer. It may or may not end the procedure, however. proc reg is an interactive procedure: it stays active after run, so further statements can be submitted until a quit statement (or the start of another step) ends it.
    proc reg data=lecture4.nmes;
      model totalexp = chd5 lastage male;
    run;
      model totalexp = chd5 lastage;
      plot r.*chd5;
    run;
    quit;   /* ends the procedure */

18 proc glm (the general linear model)
- uses least squares with generalized inverses
- performs linear regression, analysis of variance, analysis of covariance
- accepts classification variables (discrete) and continuous variables
- estimates and performs tests for general linear effects
- proc anova is suitable for "balanced" designs; proc glm can be used for either balanced or unbalanced designs
- suitable for random effects models

19 proc glm
Syntax:
    proc glm data=name <options>;
      class classification variables;
      model response = effects / options;
      means effects / options;
      random effects / options;
      estimate 'label' effect value / options;
      contrast 'label' effect value / options;
    run;

20 proc glm
- the response (dependent) variable is continuous, with the same normality assumption as in proc reg
- independent variables are discrete or continuous; discrete variables must be listed on the class statement
- interaction terms can be specified with an asterisk (a*b), e.g.
    model bmi = a b a*b;

21 proc glm
    means effects / options;
- computes arithmetic means and standard deviations of all continuous variables in the model (both dependent and independent) within each group for effects specified on the right-hand side of the model statement
- only class variables may be specified as effects
- options specify multiple comparison methods for main effect terms in the model

22 proc glm example here:
1. solution: show estimated parameters
2. means: show means for the smoke variable
3. class: treat smoke as discrete
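The program for this slide is missing from the transcript. The three numbered features could be combined roughly as follows. This is a sketch under assumptions: the data set lecture4.nmes and the response bmi are placeholders, since the transcript names only the smoke variable.

```sas
/* Sketch only: data set and response variable are assumed. */
proc glm data=lecture4.nmes;
   class smoke;                    /* 3. treat smoke as discrete           */
   model bmi = smoke / solution;   /* 1. solution prints the estimates     */
   means smoke;                    /* 2. means within each level of smoke  */
run;
quit;
```

The class statement has to come before the model statement so that proc glm knows smoke is a classification variable when it builds the model.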


25 proc glm example here:
1. format: changes the reference group
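The slide's program is not in the transcript. The idea can be sketched as follows, assuming smoke is coded 0/1 and reusing the placeholder response bmi; the format name smokefmt and its labels are hypothetical. By default proc glm orders class levels by their formatted values and uses the last level as the reference (its parameter is set to zero), so a format that changes the sort order changes the reference group.

```sas
/* Sketch only: coding of smoke, the labels, and bmi are assumptions. */
proc format;
   value smokefmt 0 = '2: non-smoker'   /* sorts last, so becomes the reference */
                  1 = '1: smoker';
run;

proc glm data=lecture4.nmes;
   class smoke;
   format smoke smokefmt.;         /* levels are now ordered by the labels */
   model bmi = smoke / solution;
run;
quit;
```

Without the format statement the levels sort as 0, 1, and level 1 is the reference; with it, the level labelled '2: non-smoker' (smoke = 0) is the reference.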


27 reg and glm
- Both proc reg and proc glm are suitable only when the outcome variable is normally distributed.
- proc reg has many regression diagnostic features, while proc glm allows you to fit more sophisticated linear models such as random effects models, models for unbalanced designs, etc.

28 Non-normal outcomes
- In many situations we cannot assume our response variable is normally distributed.
- proc reg and proc glm are not suitable for modeling such outcomes.
Example: Suppose you are interested in estimating the prevalence of disease in a population. You have an indicator of disease (1 = Yes, 0 = No).

29 Non-normal outcomes
Example: You are interested in estimating how the incidence of infant mortality has changed as a function of time.
Example: You are interested in estimating the median survival time for two groups of patients receiving either a placebo or treatment.

30 proc logistic
Example: survey data on whether a parent agrees to close a school when certain toxic elements are found in the environment
Variables:
    close    0 = no, 1 = yes
    lived    years lived in community


32 Syntax:
    proc logistic <options>;
      model response = effects </options>;
      class variables;
      by variables;
      output <options>;
    run;

33 proc logistic
The descending option means that we are modeling the probability that close=1 and not the probability that close=0.
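A minimal sketch of the school-closure model with the descending option, assuming the survey data sit in a data set called survey (a hypothetical name; the transcript does not give one):

```sas
/* Sketch only: the data set name survey is assumed. */
proc logistic data=survey descending;
   /* With descending, the fitted model is for Pr(close = 1). */
   model close = lived;
run;
```

Without descending, proc logistic models the lower ordered response value, here close=0, so the sign of every coefficient would flip.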


36 proc genmod
- implements the generalized linear model
- fits models with normal, binomial or Poisson response variables (among others)
- fits generalized estimating equations for repeated measures data

37 proc genmod
Syntax:
    proc genmod <options>;
      by variables;
      class variables;
      model response = effects </options>;
      output <options>;
      make 'table' out=name;
    run;

38 proc genmod:
- the class statement says which variables are classification (categorical) variables
- the by statement produces a separate analysis for each level of the by variables (data must be sorted in the order of the by variables)
- response is the response (dependent) variable in the regression model
- effects are a list of variables. These are the independent variables in the regression model. Any independent variables that are categorical must be listed in the class statement.

39 Example: Same model as we produced with proc glm. The default is a linear model. smoke will be treated as a categorical variable because of the class statement
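The program for this slide is not in the transcript. A sketch of the same kind of model in proc genmod follows, again assuming the placeholder data set lecture4.nmes and response bmi, neither of which is confirmed by the slides.

```sas
/* Sketch only: data set and response variable are assumed. */
proc genmod data=lecture4.nmes;
   class smoke;          /* smoke is treated as categorical           */
   model bmi = smoke;    /* no dist= or link=, so the defaults apply: */
                         /* normal distribution, identity link        */
run;
```

With the default distribution and link, the parameter estimates should agree with those from the corresponding proc glm fit.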


41 Options for the model statement
- dist= specifies the distribution of the response variable (default = normal)
- link= specifies the link function that relates the mean of the response to the linear predictor (default = identity)
Examples:
    logistic regression:  dist=binomial link=logit
    Poisson regression:   dist=poisson link=log
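For reference, the role of the link can be written out. The notation below (g for the link, mu for the mean of the response, x for the covariates) is standard generalized linear model notation and is not taken from the slides.

```latex
\[
g(\mu_i) = \beta_0 + \beta_1 x_{1i} + \dots + \beta_p x_{pi}, \qquad \mu_i = E(Y_i)
\]
\[
\text{identity: } g(\mu) = \mu, \qquad
\text{logit: } g(\mu) = \log\frac{\mu}{1-\mu}, \qquad
\text{log: } g(\mu) = \log \mu
\]
```

For a binary response, the mean is the probability of the event, so the logit link models the log odds; for a count response, the log link models the log of the expected count.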

42 Options for the model statement
- alpha= sets the level for confidence intervals: the intervals have confidence coefficient 1 - alpha (the default, alpha=0.05, gives 95% intervals)
- waldci or lrci specifies that confidence intervals are to be computed. waldci gives Wald intervals, based on asymptotic normality of the estimates, and doesn't take as long as lrci. lrci gives intervals based on the likelihood ratio.

43 The output statement
The output statement is just one of the ways to create a new SAS dataset containing results from the genmod procedure. The statement is similar to that found in proc means and proc glm.
Example:
    output out=new predicted=fit upper=upper lower=lower;

44 The make statement
The make statement is another way to create a new SAS dataset containing results from the genmod procedure. ods is another, more general way (see later).
Example:
    make 'ParameterEstimates' out=parms;
    make 'ParmInfo' out=parminfo;

45 Example: logistic regression
- Perform a logistic regression analysis to determine how the odds of CHD are associated with age and gender in the 1987 NMES.
- Save the parameter estimates as a new dataset.
- Save the predicted values along with the original data.

46 Example: the descending option means that we are modeling the probability that chd5=1 and not the probability that chd5=0.
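The program for this example is not in the transcript. One way the three tasks could be written is sketched below, assuming the lecture4.nmes data set and the NMES variables listed earlier. The output names parms, chdpred, and phat are hypothetical, and the sketch uses ods output in place of the make statement shown on the earlier slide.

```sas
/* Sketch only: output data set and variable names are assumed. */

/* Save the parameter estimates table as the data set parms. */
ods output ParameterEstimates=parms;

proc genmod data=lecture4.nmes descending;
   /* descending: model Pr(chd5 = 1) rather than Pr(chd5 = 0) */
   model chd5 = lastage male / dist=binomial link=logit;

   /* Save predicted probabilities along with the original data. */
   output out=chdpred predicted=phat;
run;
```

Exponentiating the lastage and male estimates in parms gives the corresponding odds ratios for CHD.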


