SUDAAN training, KTL/TTO 2004. Research Triangle Institute.


1 SUDAAN training, KTL/TTO 2004

2 Research Triangle Institute

3 SUDAAN

4 Why?

5 Pitfalls
PITFALLS OF USING STANDARD STATISTICAL SOFTWARE PACKAGES FOR SAMPLE SURVEY DATA
Donna J. Brogan, Ph.D., Rollins School of Public Health, Emory University, Atlanta. April 15, 1997.
COPYRIGHT: This article is copyrighted and is not to be used without proper acknowledgment and citation. It will appear as a chapter in Encyclopedia of Biostatistics, edited by Peter Armitage and Theodore Colton (Editors-in-Chief), to be published by John Wiley in summer 1998 as six volumes. The article will be in a section titled "Design of Experiments and Sample Surveys", edited by Paul Levy.
AUTHOR CONTACT INFORMATION: Donna Jean Brogan, Ph.D., Professor of Biostatistics, Rollins School of Public Health, 1518 Clifton Road N.E., Room 324, Emory University, Atlanta, GA.
CAUTIONS IN USING STANDARD STATISTICAL SOFTWARE PACKAGES
Standard statistical software packages generally do not take into account four common characteristics of sample survey data: (1) unequal probability selection of observations, (2) clustering of observations, (3) stratification and (4) nonresponse and other adjustments [2].
Point estimates of population parameters are impacted by the value of the analysis weight for each observation. These weights depend upon the selection probabilities and other survey design features such as stratification and clustering. Hence, standard packages will yield biased point estimates if the weights are ignored.
Estimated variance formulas for point estimates based on sample survey data are impacted by clustering, stratification and the weights. By ignoring these aspects, standard packages generally underestimate the estimated variance of a point estimate, sometimes substantially so.
Most standard statistical packages can perform weighted analyses, usually via a WEIGHT statement added to the program code. Use of standard statistical packages with a weighting variable may yield the same point estimates for population parameters as sample survey software packages. However, the estimated variance often is not correct and can be substantially wrong, depending upon the particular program within the standard software package.
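The two cautions above (biased point estimates when weights are ignored, and understated variances when clustering is ignored) can be illustrated with a small Python sketch. It is not SUDAAN's implementation: the record layout `(stratum, psu, weight, y)`, the function names and the toy data are assumptions made for illustration, and the variance formula is the usual Taylor-linearization estimator for PSUs sampled with replacement within strata.

```python
from collections import defaultdict
from math import sqrt

# Each record is (stratum, psu, weight, y); the layout is illustrative only.

def unweighted_mean_and_se(records):
    """Mean and SE as computed when the data are treated as a simple random sample."""
    ys = [y for _, _, _, y in records]
    n = len(ys)
    mean = sum(ys) / n
    s2 = sum((y - mean) ** 2 for y in ys) / (n - 1)
    return mean, sqrt(s2 / n)

def weighted_mean(records):
    """Weighted mean: sum(w * y) / sum(w)."""
    total_w = sum(w for _, _, w, _ in records)
    return sum(w * y for _, _, w, y in records) / total_w

def design_based_se(records):
    """Taylor-linearization SE of the weighted mean,
    assuming PSUs are drawn with replacement within each stratum."""
    mean = weighted_mean(records)
    total_w = sum(w for _, _, w, _ in records)
    # Sum the linearized values w * (y - mean) / sum(w) within each PSU.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_totals[stratum][psu] += w * (y - mean) / total_w
    variance = 0.0
    for totals in psu_totals.values():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        centre = sum(totals.values()) / n_h
        variance += n_h / (n_h - 1) * sum((t - centre) ** 2 for t in totals.values())
    return sqrt(variance)

# Toy data: one stratum, two PSUs, observations identical within a PSU.
sample = [("A", 1, 1.0, 1.0), ("A", 1, 1.0, 1.0),
          ("A", 2, 1.0, 3.0), ("A", 2, 1.0, 3.0)]
```

On the toy data the simple-random-sample standard error is smaller than the design-based one, because the four observations carry only two clusters' worth of information.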

6 History

7 Platforms

8 Price

9 SUDAAN Overview
SUDAAN is a single program consisting of a family of procedures used to analyze data from complex surveys and other observational and experimental studies involving cluster-correlated data. A complex sample may be multistage, stratified, or clustered. Many samples also have unequal probabilities of selection, or are drawn from finite populations.
SUDAAN enables you to use survey data to obtain consistent estimates of population parameters and their standard errors in accordance with the sample design. SUDAAN also produces consistent estimates of regression coefficients, descriptive statistics, and their associated standard errors for cluster-correlated and repeated measures data applications in clinical, epidemiological, toxicological, and behavioral research.
SUDAAN SUPPORT
Direct inquiries about SUDAAN to: SUDAAN Business Coordinator, Research Triangle Institute, 3040 Cornwallis Road, Research Triangle Park, NC, USA.

10 SUDAAN Procedures: Utility Procedure: RECORDS Procedure
The RECORDS procedure is designed to print records from any ASCII, SAS, SASXPORT, SUDAAN, SUDXPORT or SPSS record file. This is particularly useful when you wish to verify that SUDAAN is reading your data properly. You can also use this procedure to obtain a file contents summary or convert an input file of one type to another. For instance, you can convert an ASCII data set to a SUDAAN data set (and vice versa).
NOTE: For SAS-Callable SUDAAN only, you can convert among the following file types: SAS, SUDAAN and SUDXPORT.
The RECORDS procedure statements can be grouped into these categories:
• Procedure statement: PROC RECORDS
• Computation statement: SUBPOPN
• Output statements: TITLE, FOOTNOTE, SETENV, PRINT, OUTPUT

11 SUDAAN Procedures: Utility Procedure: RECORDS Quick Reference
PROC RECORDS DATA=filename [SUDDATA=filename]
    [FILETYPE=ASCII|SAS|SPSS|SUDAAN|SUDXPORT|SASXPORT]
    [COUNTREC] [CONTENTS] [HISTORY] [NOPRINT] [MAXOBS=count]
    [NAMEFILE=filename] [LEVFILE=filename];
    /* SUDDATA=filename must be used to specify a SUDAAN input file in SAS-Callable SUDAAN */
[SUBPOPN expression / [ NAME="label" ];]
/* TITLE statement: for SAS-Callable SUDAAN use RTITLE */
/* FOOTNOTE statement: for SAS-Callable SUDAAN use RFOOTNOTE */
< SETENV { PAGEBEG=integer TABBEG=integer LINESIZE=integer PAGESIZE=integer
    LINESPCE=integer ROWSPCE=integer COLSPCE=integer ROWWIDTH=integer
    COLWIDTH=integer DECWIDTH=integer INDROWD=integer INDROWS=integer
    MAXIND=integer TOPMGN=integer LEFTMGN=integer LABWIDTH=integer }; >
< PRINT / [FILENAME=filename] [REPLACE] [STARTREC=number] [MAXREC=number]
    [NOHEAD] [NODATE] [NOTIME]; >
< OUTPUT / FILENAME=filename [REPLACE] [NOCOMP|NOCOMPRESS]
    [FILETYPE=ASCII|SAS|SUDAAN|SUDXPORT|SPSS|SASXPORT]
    [NAMEFILE=filename] [LEVFILE=filename] [STDTYPE=number]; >
RUN;

12 SUDAAN Procedures: Descriptive Procedures: CROSSTAB Procedure
The CROSSTAB procedure produces weighted frequency and percentage distributions for one-way (univariate, single-variable) and multi-way (multivariate or multiple-variable) tabulations. CROSSTAB also tests the hypothesis of no association between row and column variables in 2-way and multi-way tables, and produces odds ratios and relative risks in 2x2 tables.
CROSSTAB is primarily for descriptive analyses of categorical variables. DESCRIPT and RATIO produce descriptive statistics for continuous variables. Although DESCRIPT allows you to request weighted frequency counts, CROSSTAB is computationally more efficient for this purpose.
The CROSSTAB statements can be grouped into these categories:
• Procedure statement: PROC CROSSTAB
• Sample design statements: WEIGHT, NEST, TOTCNT, SAMCNT, JOINTPROB, REPWGT, IDVAR, JACKWGTS, JACKMULT
• Computation statements: SUBGROUP, LEVELS, RECODE, SUBPOPN, TABLES, TEST
• Output statements: SETENV, PRINT, TITLE, FOOTNOTE, OUTPUT, FORMAT
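As a rough illustration of what "weighted frequency and percentage distributions" means for a two-way table, the following Python sketch computes weighted cell counts and row percentages. The function name and the `(row_level, col_level, weight)` record layout are assumptions for this example; the sketch does not compute the standard errors or tests that CROSSTAB provides.

```python
from collections import defaultdict

def weighted_crosstab(records):
    """records: iterable of (row_level, col_level, weight).
    Returns {row: {col: (weighted_count, row_percent)}}."""
    cells = defaultdict(lambda: defaultdict(float))
    for row, col, weight in records:
        cells[row][col] += weight
    table = {}
    for row, cols in cells.items():
        row_total = sum(cols.values())
        table[row] = {col: (wsum, 100.0 * wsum / row_total)
                      for col, wsum in cols.items()}
    return table

# Hypothetical data: survey (1 or 2) by blood-pressure class, with weights.
example = weighted_crosstab([(1, 1, 2.0), (1, 2, 6.0), (2, 1, 5.0)])
```

The weighted count corresponds to the kind of quantity requested with WSUM, and the row percentage to ROWPER, in the PRINT statements shown later in the deck.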

13 SUDAAN Procedures: Descriptive Procedures: CROSSTAB Quick Reference
PROC CROSSTAB DATA=filename [ SUDDATA=filename ]
    [ FILETYPE=ASCII|SAS|SPSS|SUDAAN|SUDXPORT|SASXPORT ]
    [ DESIGN=WR|WOR|UNEQWOR|STRWR|STRWOR|SRS|BRR|JACKKNIFE ]
    [ PSUDATA=filename ] [ PSU_REC=count ] [ CONF_LIM=percent ] [ SMALL_CELL=count ]
    [ ATLEVEL1=position ] [ ATLEVEL2=position ] [ DDF=number ]
    [ DEFT4|DEFT1|DEFT2|DEFT3|DEFT|DEFF ]
    [ DISPLAY ] [ INCLUDE ] [ MERGEHI ] [ NOMARG ] [ NOCOL ] [ NOROW ] [ NOTOT ]
    [ NOPER ] [ NOSE ] [ NOWGT ] [ NOPRINT ] [ MAXOBS=count ]
    [ REPDATA=filename ] [ REP_REC=count ] [ EST_STR=count ] [ EST_PSU=count ]
    [ NAMEFILE=filename ] [ LEVFILE=filename ];
    /* SUDDATA=filename must be used to specify a SUDAAN input file in SAS-Callable SUDAAN */
[ WEIGHT variable; ]
[ REPWGT variables / [ ADJFAY=value ]; ]
[ IDVAR variable(s); ]
[ NEST variable(s) / [ PSULEV=position|FRL=position ] [ STRLEV=position ] [ MISSUNIT ] [ NOSORTCK ]; ]
[ TOTCNT variable(s); ]
[ SAMCNT variable(s); ]
[ JOINTPROB variable(s); ]
[ JACKWGTS varlist / ADJJACK=value; ]
[ JACKMULT value(s); ]
[ RECODE variable=(code_list); ]
[ SUBPOPN expression / [ NAME="label" ]; ]
[ SUBGROUP variable(s); ]
[ LEVELS level(s); ]
[ TABLES table_request(s); ]
[ TEST { CHISQ LLCHISQ CMH }; ]
continued

14 SUDAAN Procedures: Descriptive Procedures: CROSSTAB Quick Reference (cont.)
/* for SAS-Callable SUDAAN use RTITLE */
/* for SAS-Callable SUDAAN use RFOOTNOTE */
/* for SAS-Callable SUDAAN use RFORMAT */
...or... NSUM=label WSUM=label... etc.
continued

15 SUDAAN Procedures: Descriptive Procedures: CROSSTAB Quick Reference (cont.)
...or... NSUM=label WSUM=label... etc.
RUN;

16 SUDAAN Procedures: Descriptive Procedures: DESCRIPT Procedure
The DESCRIPT procedure produces descriptive statistics for analysis variables, including means, totals, percentages, geometric means, medians and other quantiles, and their standard errors for sample surveys and other clustered data applications. The analysis variables can be continuous or categorical. DESCRIPT computes standardized means according to the method of direct standardization. The standardizing weights are assumed to be known.
Within one call to DESCRIPT, all analysis variables must be either continuous or categorical. The analysis of both continuous and categorical variables requires separate calls to the DESCRIPT procedure. For continuous analysis variables, you can request estimates of totals, means, proportions, geometric means, and quantiles. For categorical variables, you can request estimates of totals, percentages, and their standard errors.
You can request design effects for means, totals, and percentages. Design effects are not available for contrast statistics, standardized estimates, or post-stratified estimates.
DESCRIPT is primarily for the descriptive analysis of continuous (and sometimes discrete) variables, while CROSSTAB is primarily for descriptive analyses of categorical variables.

17 SUDAAN Procedures: Descriptive Procedures: RATIO Procedure
The RATIO procedure produces ratio estimates and their standard errors for sample surveys and other clustered data applications. The numerator and denominator variables can be continuous or categorical. RATIO computes standardized means according to the method of direct standardization. The standardizing weights are assumed to be known.
For continuous variables, RATIO computes a ratio of weighted sums. If VAR1 and VAR2 denote the numerator and denominator variables respectively, and the variable WT denotes the weight, then the ratio estimate is $\hat{R} = \sum_i WT_i \cdot VAR1_i \,/\, \sum_i WT_i \cdot VAR2_i$, where the sums run over the analysis observations on the input data set.
For categorical variables, RATIO computes a ratio of weighted counts of individuals falling into a specified response category, as follows:
1) Any positive integer is a valid response category.
2) The numerator is the weighted sum of individuals who gave the specified integer response to the numerator variable.
3) The denominator is the weighted sum of individuals who gave the specified integer response to the denominator variable.
RATIO estimates can consist of a continuous numerator variable and a categorical denominator variable or vice versa. However, within one call to RATIO, all numerator variables must be of the same type, and all denominator variables must be of the same type.
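The two kinds of ratio described on this slide can be sketched in a few lines of Python. This is only an illustration of the point estimate (not of the standard error); the function names and the `(wt, var1, var2)` record layout are assumptions.

```python
def ratio_of_weighted_sums(records):
    """Continuous case: sum(wt * var1) / sum(wt * var2).
    records: iterable of (wt, var1, var2)."""
    records = list(records)
    numerator = sum(wt * var1 for wt, var1, _ in records)
    denominator = sum(wt * var2 for wt, _, var2 in records)
    return numerator / denominator

def ratio_of_weighted_counts(records, numerator_level, denominator_level):
    """Categorical case: weighted count of var1 == numerator_level
    divided by weighted count of var2 == denominator_level."""
    records = list(records)
    numerator = sum(wt for wt, var1, _ in records if var1 == numerator_level)
    denominator = sum(wt for wt, _, var2 in records if var2 == denominator_level)
    return numerator / denominator

continuous = ratio_of_weighted_sums([(2.0, 10.0, 5.0), (1.0, 4.0, 2.0)])
categorical = ratio_of_weighted_counts(
    [(2.0, 1, 1), (1.0, 2, 1), (3.0, 1, 2)], numerator_level=1, denominator_level=1)
```

In the continuous example the weighted sums are 24 and 12; in the categorical example the weighted counts are 5 and 3.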

18 SUDAAN Procedures: Regression Procedures: REGRESS Procedure
The REGRESS procedure fits linear models to sample survey data and other clustered data and repeated measures applications. Estimates of the model parameters and their standard errors are computed, along with tests of hypotheses.
REGRESS offers GEE model fitting techniques for efficient parameter estimation. For estimating variance of the parameter estimates, REGRESS implements two robust methods described in Binder (1983) and Zeger and Liang (1986), as well as a model-based (naive) variance estimation method. A choice of independent or exchangeable "working" correlations is also provided.
You can specify tests for linear combinations of the model parameters, and you can output the predicted values, residuals, parameter estimates, and their associated variance-covariance matrix for further hypothesis testing. Also, you can estimate and test linear combinations of the adjusted group means (also known as least squares means).

19 SUDAAN Procedures: Regression Procedures: LOGISTIC (RLOGIST) Procedure
The LOGISTIC procedure fits logistic regression models to complex sample survey data and other clustered data applications. LOGISTIC produces estimates of the model parameters and their standard errors, and tests the null hypothesis that individual regression coefficients associated with each variable in the model are equal to zero. LOGISTIC also provides tests for overall model significance, model minus intercept, as well as model main effects and interactions.
In addition, you can test linear combinations of the model parameters or output the parameter estimates and variance-covariance matrix to a data set for further hypothesis testing. You can also estimate and test linear combinations of the conditional and predicted marginals (generalizations of adjusted group means to non-linear models).
The LOGISTIC procedure estimates model parameters using generalized estimating equations (GEE). A choice of independent vs. exchangeable "working" correlations is also provided. For estimating variance of the parameter estimates, LOGISTIC implements two robust methods described in Binder (1983) and Zeger and Liang (1986), as well as a model-based (naive) variance estimation method.
NOTE: For SAS-Callable SUDAAN, the name LOGISTIC conflicts with a SAS procedure of the same name. Use RLOGIST to invoke the SUDAAN logistic regression procedure.

20 SUDAAN Procedures: Regression Procedures: MULTILOG Procedure
The MULTILOG procedure extends the modeling capabilities of SUDAAN to include categorical outcomes with more than two categories which may or may not have a natural ordering. These models can be viewed as generalizations of logit models for binary outcomes already available in SUDAAN in the LOGISTIC procedure. MULTILOG analyzes data from sample surveys as well as from randomized experiments and other observational studies involving cluster-correlated or longitudinal responses.
Two models have been implemented in the MULTILOG procedure: the proportional odds model with cumulative logit link for ordinal responses and a generalized multinomial logit model for nominal outcomes. Both models handle continuous as well as discrete explanatory variables. The generalized multinomial logit model produces separate parameter vectors for each of the generalized logit equations of interest; the proportional odds model produces a common slope but separate intercepts for each of the cumulative logit equations of interest.
The MULTILOG procedure estimates model parameters using generalized estimating equations (GEE). For estimating variance of the parameter estimates, MULTILOG implements two robust methods described in Binder (1983) and Zeger and Liang (1986), as well as a model-based (naive) variance estimation method. All three variance estimation methods allow a choice of independent vs. exchangeable working correlations for describing the dependence of responses within clusters. By default, the GEE iterative fitting procedure in the exchangeable case uses the one-step approach, although a multistep GEE procedure can also be obtained.
MULTILOG produces estimates of the model parameters and their standard errors, and tests the null hypothesis that individual regression coefficients associated with each variable in the model are equal to zero. MULTILOG also provides tests for overall model significance, model minus intercept, as well as model main effects and interactions. In addition, you can test linear combinations of the model parameters and output many statistics to an output data set. You can also estimate and test linear combinations of the conditional and predicted marginals (generalizations of adjusted group means to non-linear models).
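The phrase "a common slope but separate intercepts for each of the cumulative logit equations" can be made concrete with a short Python sketch that turns intercepts and a linear predictor into category probabilities. The function name is hypothetical, and the sign convention logit P(Y <= k) = alpha_k + x'beta is an assumption: packages differ on the sign of the slope term, and the slide does not say which convention MULTILOG uses.

```python
from math import exp

def cumulative_logit_probs(intercepts, slope_term):
    """Category probabilities under a proportional odds model.

    intercepts: increasing values alpha_1 < ... < alpha_{K-1}, one per cumulative logit
    slope_term: the common linear predictor x'beta shared by all equations
    Assumed convention: logit P(Y <= k) = alpha_k + slope_term.
    """
    cumulative = [1.0 / (1.0 + exp(-(alpha + slope_term))) for alpha in intercepts]
    cumulative.append(1.0)  # P(Y <= K) is always 1
    probs, previous = [], 0.0
    for value in cumulative:
        probs.append(value - previous)  # P(Y = k) = P(Y <= k) - P(Y <= k-1)
        previous = value
    return probs

three_categories = cumulative_logit_probs([-1.0, 1.0], 0.0)
```

Changing `slope_term` shifts every cumulative logit by the same amount, which is exactly the "common slope" restriction; the generalized multinomial logit model would instead need a separate coefficient vector per equation.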

21 SUDAAN Procedures: Regression Procedures: LOGLINK Procedure
The LOGLINK procedure in SUDAAN fits log-linear regression models to cluster-correlated count data not in the form of proportions. The counts are typically counts of events in a Poisson-like process.
The LOGLINK procedure estimates model parameters using generalized estimating equations (GEE). LOGLINK implements two robust variance estimation methods described in Binder (1983) and Zeger and Liang (1986), as well as a model-based (naive) variance estimation method. A choice of independent vs. exchangeable "working" correlations is also provided.
You can specify tests for linear combinations of the model parameters, and you can output many statistics for further hypothesis testing. Also, you can estimate and test linear combinations of the conditional and predicted marginals (generalizations of adjusted group means to non-linear models).
Like all of SUDAAN's procedures, LOGLINK is designed to analyze data from complex sample surveys (weighted, stratified, cluster-correlated data) as well as from randomized experiments and other observational studies involving cluster-correlated or longitudinal responses.

22 SUDAAN Procedures: Regression Procedures: SURVIVAL Procedure
SURVIVAL provides proportional hazards modeling for failure time outcomes, which may contain left- and right-censored observations, time-dependent covariates, and multiple events per subject. The SURVIVAL procedure fits the discrete or continuous (Cox) proportional hazards model to sample surveys and other clustered data applications. Estimates of the model parameters and their standard errors are computed, along with tests of hypotheses.
Enhancements in the current software release are as follows:
• Counting process style of input (Andersen and Gill, 1982) to permit left truncation, multiple events per subject, and time-dependent covariates. A time-dependent covariate is one whose value for any given individual can change over time during the course of a study.
• Computation of Schoenfeld residuals and Score residuals to allow users to evaluate "goodness of fit" and the validity of the proportional hazards assumption.
• Computation of Efron's likelihood approximation for ties in addition to the current default formula by Breslow.
• Option to allow stratified baseline hazard functions for different types of failures or different subgroups of the population.

23 Descriptive statistics with different software
TITLE1 'DESCRIPTIVE STATISTICS';
TITLE2 'SAS MEANS';
PROC MEANS DATA=WORK.T2K_DATA
    N NMISS MEAN STDERR MAXDEC=3;
  CLASS SP2 T2K;
  VAR SYSTBP2;
  TYPES () SP2 T2K SP2*T2K;
RUN;
TITLE2 'SAS MEANS + WEIGHT';
PROC MEANS DATA=WORK.T2K_DATA
    N NMISS SUMWGT MEAN STDERR MAXDEC=3;
  WEIGHT WAN_UNIONI;
  CLASS SP2 T2K;
  VAR SYSTBP2;
  TYPES () SP2 T2K SP2*T2K;
RUN;
options nolabel;
TITLE2 'SAS SURVEYMEANS';
PROC SURVEYMEANS DATA=WORK.T2K_DATA
    NOBS NMISS MEAN STDERR SUMWGT;
  STRATA OSITE;
  CLUSTER RYVAS;
  WEIGHT WAN_UNIONI;
  DOMAIN T2K SP2 T2K*SP2;
  VAR SYSTBP2;
RUN;
options label;
TITLE2 'SUDAAN DESCRIPT';
PROC DESCRIPT DATA=WORK.T2K_DATA DESIGN=WR;
  SETENV COLWIDTH=12 DECWIDTH=3;
  NEST OSITE RYVAS;
  WEIGHT WAN_UNIONI;
  SUBGROUP SP2 T2K;
  LEVELS 2 2;
  VAR SYSTBP2;
  TABLES SP2*T2K;
  PRINT / STYLE=NCHS; * or STYLE=BOX;
RUN;

24 Descriptive statistics: SAS MEANS output
Analysis variable: systbp2 (Systolinen verenpaine, systolic blood pressure).
Tables: overall; by Tutkimus (survey; 1=T2K, 2=MS); by Sukupuoli (sex; 1=M, 2=N); and by Sukupuoli and Tutkimus together.
Columns: N Obs, N, N Miss, Mean, Std Error (values not preserved in this transcript).

25 Descriptive statistics: SAS MEANS + WEIGHT output
Analysis variable: systbp2 (Systolinen verenpaine, systolic blood pressure).
Tables: overall; by Tutkimus (survey; 1=T2K, 2=MS); by Sukupuoli (sex; 1=M, 2=N); and by Sukupuoli and Tutkimus together.
Columns: N Obs, N, N Miss, Sum Wgts, Mean, Std Error (values not preserved in this transcript).

26 Descriptive statistics: SAS SURVEYMEANS output
Data summary: Number of Strata 44; Number of Clusters 5155; Number of Observations and Sum of Weights (values not preserved in this transcript).
Statistics for systbp2: overall, by t2k, by sp2, and by sp2 and t2k together.
Columns: Variable, N, N Miss, Sum of Weights, Mean, Std Error of Mean (values not preserved).

27 Descriptive statistics: SUDAAN DESCRIPT output
Number of observations read and weighted count (values not preserved in this transcript); denominator degrees of freedom: 5111.
Variance Estimation Method: Taylor Series (WR).
By: Variable, Sukupuoli (sex; 1=M, 2=N), Tutkimus (survey; 1=T2K, 2=MS). For: Variable = Systolinen verenpaine (systolic blood pressure).
Columns: Sample Size, Weighted Size, Total, Mean, SE Mean, with rows for each sex, Total and Missing (values not preserved).

28 Frequencies with different software
TITLE1 'FREQUENCIES';
TITLE2 'SAS FREQ';
PROC FREQ DATA=WORK.T2K_DATA;
  TABLE SYSTBP2_123;
RUN;
PROC FREQ DATA=WORK.T2K_DATA;
  TABLE SYSTBP2_123;
  BY T2K;
RUN;
TITLE2 'SAS FREQ + WEIGHT';
PROC FREQ DATA=WORK.T2K_DATA;
  WEIGHT WAN_UNIONI;
  TABLE SYSTBP2_123;
RUN;
PROC FREQ DATA=WORK.T2K_DATA;
  WEIGHT WAN_UNIONI;
  TABLE SYSTBP2_123;
  BY T2K;
RUN;
TITLE2 'SAS SURVEYMEANS';
PROC SURVEYMEANS DATA=WORK.T2K_DATA
    NOBS MEAN STDERR SUMWGT;
  STRATA OSITE;
  CLUSTER RYVAS;
  WEIGHT WAN_UNIONI;
  DOMAIN T2K;
  VAR SYSTBP2_123;
  CLASS SYSTBP2_123;
RUN;
TITLE2 'SUDAAN CROSSTAB';
PROC CROSSTAB DATA=WORK.T2K_DATA DESIGN=WR;
  SETENV COLWIDTH=12 DECWIDTH=3;
  NEST OSITE RYVAS;
  WEIGHT WAN_UNIONI;
  SUBGROUP T2K SYSTBP2_123;
  LEVELS 2 3;
  TABLES T2K*SYSTBP2_123;
  PRINT NSUM WSUM ROWPER SEROW / STYLE=NCHS;
RUN;

29 Frequencies: SAS FREQ output
Variable: SystBP2_123 (Syst.vp 3-luok., systolic blood pressure in 3 categories).
Tables: overall, then separately for Tutkimus (survey; 1=T2K, 2=MS) = 1 and = 2.
Columns: Frequency, Percent, Cumulative Frequency, Cumulative Percent (values not preserved in this transcript).

30 Frequencies: SAS FREQ + WEIGHT output
Variable: SystBP2_123 (Syst.vp 3-luok., systolic blood pressure in 3 categories).
Tables: overall, then separately for Tutkimus (survey; 1=T2K, 2=MS) = 1 and = 2.
Columns: Frequency, Percent, Cumulative Frequency, Cumulative Percent (values not preserved in this transcript).

31 Frequencies: SAS SURVEYMEANS output
Data summary: Number of Strata 44; Number of Clusters 5155; Number of Observations and Sum of Weights (values not preserved in this transcript).
Statistics for each level of SystBP2_123: overall, and by Tutkimus (survey; 1=T2K, 2=MS).
Columns: Variable, N, Sum of Weights, Mean, Std Error of Mean (values not preserved).

32 Frequencies: SUDAAN CROSSTAB output
Number of observations read and weighted count (values not preserved in this transcript); denominator degrees of freedom: 5111.
Variance Estimation Method: Taylor Series (WR).
By: Tutkimus (survey; 1=T2K, 2=MS), Syst.vp 3-luok. (systolic blood pressure in 3 categories).
Columns: Sample Size, Weighted Size, Row Percent, SE Row Percent, with a Total row for each survey (values not preserved).

33 Linear model with different software
TITLE1 'LINEAR MODEL';
TITLE2 'SAS GLM + WEIGHT';
PROC GLM DATA=WORK.T2K_DATA;
  WEIGHT WAN_UNIONI;
  CLASS SP2 IKA6 T2K;
  MODEL SYSTBP2 = SP2 IKA6 T2K BMI / SOLUTION;
RUN;
TITLE2 'SAS SURVEYREG';
PROC SURVEYREG DATA=WORK.T2K_DATA;
  STRATA OSITE;
  CLUSTER RYVAS;
  WEIGHT WAN_UNIONI;
  CLASS SP2 IKA6 T2K;
  MODEL SYSTBP2 = SP2 IKA6 T2K BMI / SOLUTION;
RUN;
TITLE2 'SUDAAN REGRESS';
PROC REGRESS DATA=WORK.T2K_DATA DESIGN=WR;
  SETENV COLWIDTH=12 DECWIDTH=3;
  NEST OSITE RYVAS;
  WEIGHT WAN_UNIONI;
  SUBGROUP SP2 IKA6 T2K;
  LEVELS 2 6 2;
  MODEL SYSTBP2 = SP2 IKA6 T2K BMI;
  PREDMARG SP2 IKA6 T2K;
RUN;

34 Linear model: Tests
SAS GLM + WEIGHT: columns Source, DF, Type III SS, Mean Square, F Value, Pr > F; rows sp2, ika6, t2k, bmi. Surviving p-values: <.0001 for ika6, t2k and bmi.
SAS SURVEYREG: columns Effect, Num DF, F Value, Pr > F; rows Model, Intercept, sp2, ika6, t2k, bmi. Surviving p-values: <.0001 for Model, Intercept, ika6, t2k and bmi.
SUDAAN REGRESS: columns Contrast, Degrees of Freedom, Wald F, P-value Wald F; rows OVERALL MODEL, MODEL MINUS INTERCEPT, INTERCEPT, SP2, IKA6, T2K, BMI.
(Other values are not preserved in this transcript.)

35 Linear model: Parameter estimates
SAS GLM + WEIGHT: columns Parameter, Estimate, Standard Error, t Value, Pr > |t|; rows Intercept, sp2 (2 levels), ika6 (6 levels), t2k (2 levels), bmi. Estimates for the intercept and the class-variable levels carry the "B" flag. Surviving p-values: <.0001 for the intercept, four of the ika6 levels, t2k and bmi.
SAS SURVEYREG: same columns and rows. Surviving p-values: <.0001 for the intercept, four of the ika6 levels, t2k and bmi.
(Other values are not preserved in this transcript.)

36 Linear model: Parameter estimates
SUDAAN REGRESS: columns Independent Variables and Effects, Beta Coeff., SE Beta, T-Test B=0, P-value T-Test B=0; rows Intercept, Sukupuoli (sex; 1=M, 2=N), Ikäryhmä (age group), Tutkimus (survey; 1=T2K, 2=MS), BodyMass-index (values not preserved in this transcript).

37 Linear model: Model-based adjustment
SUDAAN REGRESS: columns Predicted Marginal, SE, T:Marg=0, P-value; one block each for Sukupuoli (sex; 1=M, 2=N), Ikäryhmä (age group) and Tutkimus (survey; 1=T2K, 2=MS) (values not preserved in this transcript).
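A predicted marginal for one level of a factor is commonly computed by setting every respondent to that level, predicting from the fitted model, and taking the weighted average of the predictions. The Python sketch below follows that definition; it is an illustration under that assumption, not SUDAAN's code, and the function names, record layout and coefficients in `example_model` are all hypothetical.

```python
def predicted_marginal(records, predict, factor, level):
    """Weighted average prediction with `factor` set to `level` for everyone.

    records: list of (weight, covariates) where covariates is a dict
    predict: function mapping a covariates dict to a fitted value
    """
    total_weight = sum(weight for weight, _ in records)
    total = 0.0
    for weight, covariates in records:
        counterfactual = dict(covariates)
        counterfactual[factor] = level  # everyone is treated as belonging to `level`
        total += weight * predict(counterfactual)
    return total / total_weight

def example_model(x):
    # Hypothetical fitted linear model; the coefficients are made up.
    return 100.0 + 5.0 * (x["sex"] == 2) + 2.0 * x["bmi"]

people = [(1.0, {"sex": 1, "bmi": 20.0}), (3.0, {"sex": 2, "bmi": 30.0})]
marginal_sex1 = predicted_marginal(people, example_model, "sex", 1)
marginal_sex2 = predicted_marginal(people, example_model, "sex", 2)
```

For a linear model without interactions the difference between two predicted marginals equals the corresponding coefficient (here 5), while the marginals themselves depend on the weighted covariate distribution of the whole sample.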

38 Cross-sectional study / continuous response: REGRESS
TITLE1 'CROSS-SECTIONAL STUDY / CONTINUOUS RESPONSE';
TITLE2 'SUDAAN REGRESS';
PROC REGRESS DATA=WORK.T2K_DATA DESIGN=WR;
  SETENV COLWIDTH=12 DECWIDTH=3;
  NEST OSITE RYVAS;
  WEIGHT WAN_UNIONI;
  SUBPOPN T2K=1 / NAME="TERVEYS 2000";
  SUBGROUP SP2 AA01 PORTAANNOUSU;
  LEVELS 2 4 2;
  MODEL SYSTBP2 = BMI SP2 SP2*T114 IKA2 AA01 PORTAANNOUSU;
  TEST WALDF SATADJF;
  REFLEVEL SP2=1;
  PREDMARG SP2;
  PRINT / TESTS=DEFAULT BETAS=ALL PRED_MRG=ALL;
RUN;

39 Cross-sectional study / continuous response: SUBPOPN
SUBPOPN EXAMPLE: By including the following statements in your SUDAAN program, you can limit the analysis to records for which the value of the RACE variable is 2 (African-Americans in this case), the value of the SEX variable is 2 (females in this case), and the value of the AGE variable is either less than 18 or over 65.
SUBGROUP RACE SEX;
LEVELS 2 2;
SUBPOPN RACE=2 & SEX=2 & (AGE < 18 | AGE > 65) / NAME='African-American Females not in Labor Force';
WARNING: Expressions such as 18 <= AGE <= 65 are NOT appropriate on the SUBPOPN statement and may lead to unexpected results. To indicate all values of AGE between 18 and 65, use the expression: (18 <= AGE) & (AGE <= 65).
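The slide does not say why a chained comparison misbehaves on SUBPOPN. One common way such an expression goes wrong, in languages that evaluate comparisons strictly left to right, is sketched below in Python; whether SUDAAN behaves exactly like this is an assumption, and both function names are hypothetical.

```python
def chained_left_to_right(age):
    """Mimics evaluating 18 <= AGE <= 65 as (18 <= AGE) <= 65:
    the first comparison yields 0 or 1, which is then compared with 65."""
    first_result = 1 if 18 <= age else 0
    return first_result <= 65  # always True, whatever the age

def explicit_range(age):
    """The form recommended on the slide: (18 <= AGE) & (AGE <= 65)."""
    return (18 <= age) and (age <= 65)

ages = [10, 40, 70]
selected_by_chain = [age for age in ages if chained_left_to_right(age)]
selected_by_range = [age for age in ages if explicit_range(age)]
```

Under this reading the chained form selects every record, so the subpopulation silently becomes the whole sample; writing the two comparisons separately avoids the ambiguity.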

40 Cross-sectional study / continuous response: Parameters
Variance Estimation Method: Taylor Series (WR). SE Method: Robust (Binder, 1983). Working Correlations: Independent. Link Function: Identity.
Response variable SYSTBP2: Systolinen verenpaine (systolic blood pressure). For Subpopulation: TERVEYS (the SUBPOPN label in the program is "TERVEYS 2000").
Columns: Independent Variables and Effects, Beta Coeff., DEFF Beta #4, SE Beta, T-Test B=0, P-value T-Test B=0.
Rows: Intercept; Sukupuoli (sex; 1=M, 2=N); Siviilisääty (marital status); Kahden portaan nousu (two-step climb); BodyMass-index; Ikä (age); Sukupuoli by fS-Kol mmol/l (fasting serum cholesterol) interaction (values not preserved in this transcript).

41 Cross-sectional study / continuous response: Tests
Columns: Contrast, Degrees of Freedom, Wald F, P-value Wald F, S_waite Adj DF, S_waite Adj F, P-value S_waite Adj F.
Rows: OVERALL MODEL, MODEL MINUS INTERCEPT, INTERCEPT, SP2, AA01, PORTAANNOUSU, BMI, IKA2, T114 * SP2.
Predicted marginals for Sukupuoli (sex; 1=M, 2=N): columns Predicted Marginal, SE, T:Marg=0, P-value (values not preserved in this transcript).

42 Cross-sectional study / binary response: RLOGIST (LOGISTIC)
TITLE 'CROSS-SECTIONAL STUDY / BINARY (0/1) RESPONSE';
TITLE2 'SUDAAN RLOGIST (LOGISTIC)';
PROC RLOGIST DATA=WORK.T2K_DATA DESIGN=WR;
  SETENV COLWIDTH=12 DECWIDTH=3;
  NEST OSITE RYVAS;
  WEIGHT WAN_UNIONI;
  SUBPOPN T2K=1 / NAME="TERVEYS 2000";
  SUBGROUP SP2 AA01 PORTAANNOUSU;
  LEVELS 2 4 2;
  MODEL SYSTBP2_01 = BMI SP2 SP2*T114 IKA2 AA01 PORTAANNOUSU;
  TEST WALDF SATADJF;
  REFLEVEL SP2=1;
  PREDMARG SP2;
  PRINT / TESTS=DEFAULT BETAS=ALL PRED_MRG=ALL RISK=ALL;
RUN;

43 Cross-sectional study / binary response: Parameters
Variance Estimation Method: Taylor Series (WR). SE Method: Robust (Binder, 1983). Working Correlations: Independent. Link Function: Logit.
Response variable SYSTBP2_01: Syst.vp (0/1) (systolic blood pressure, 0/1). For Subpopulation: TERVEYS (the SUBPOPN label in the program is "TERVEYS 2000").
Columns: Independent Variables and Effects, Beta Coeff., DEFF Beta #4, SE Beta, T-Test B=0, P-value T-Test B=0.
Rows: Intercept; Sukupuoli (sex; 1=M, 2=N); Siviilisääty (marital status); Kahden portaan nousu (two-step climb); BodyMass-index; Ikä (age); Sukupuoli by fS-Kol mmol/l (fasting serum cholesterol) interaction (values not preserved in this transcript).

44 Cross-sectional study / binary response: Tests
Columns: Contrast, Degrees of Freedom, Wald F, P-value Wald F, S_waite Adj DF, S_waite Adj F, P-value S_waite Adj F.
Rows: OVERALL MODEL, MODEL MINUS INTERCEPT, INTERCEPT, SP2, AA01, PORTAANNOUSU, BMI, IKA2, T114 * SP2.
Predicted marginals for Sukupuoli (sex; 1=M, 2=N): columns Predicted Marginal, SE, T:Marg=0, P-value (values not preserved in this transcript).

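45 Cross-sectional study / binary response: Odds ratios
Columns: Independent Variables and Effects, Odds Ratio, Lower 95% Limit OR, Upper 95% Limit OR.
Rows: Intercept; Sukupuoli (sex; 1=M, 2=N); Siviilisääty (marital status); Kahden portaan nousu (two-step climb); BodyMass-index; Ikä (age); Sukupuoli by fS-Kol mmol/l (fasting serum cholesterol) interaction (values not preserved in this transcript).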
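The odds ratios and their limits in a table like this one are related to the logistic regression coefficients by exponentiation. A minimal Python sketch, assuming a normal critical value of 1.96 (the program may use a t critical value based on the design degrees of freedom instead; with several thousand degrees of freedom the two nearly coincide):

```python
from math import exp

def odds_ratio_with_limits(beta, se_beta, critical_value=1.96):
    """Return (odds ratio, lower limit, upper limit) for one coefficient.
    The default critical value is an assumption, not taken from the slides."""
    return (exp(beta),
            exp(beta - critical_value * se_beta),
            exp(beta + critical_value * se_beta))

# Hypothetical coefficient and standard error, for illustration only.
odds_ratio, lower, upper = odds_ratio_with_limits(0.0, 0.1)
```

Because the interval is built on the log-odds scale and then exponentiated, it is symmetric around the odds ratio in the multiplicative sense: lower times upper equals the odds ratio squared.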

46 Cross-sectional study / multi-category response: MULTILOG
TITLE 'CROSS-SECTIONAL STUDY / MULTI-CATEGORY RESPONSE';
TITLE2 'SUDAAN MULTILOG';
PROC MULTILOG DATA=WORK.T2K_DATA DESIGN=WR;
  SETENV COLWIDTH=12 DECWIDTH=3;
  NEST OSITE RYVAS;
  WEIGHT WAN_UNIONI;
  SUBPOPN T2K=1 / NAME="TERVEYS 2000";
  SUBGROUP SYSTBP2_123 SP2 AA01 PORTAANNOUSU;
  LEVELS ;
  MODEL SYSTBP2_123 = BMI SP2 T114 IKA2 AA01 PORTAANNOUSU / CUMLOGIT;
  TEST WALDF SATADJF;
  REFLEVEL SP2=1;
  PREDMARG SP2;
  PRINT / TESTS=DEFAULT BETAS=ALL PRED_MRG=ALL STYLE=NCHS;
RUN;
(The values on the LEVELS statement are not preserved in this transcript.)

47 Cross-sectional study / multi-category response: Parameters
Response: SYSTBP2_123 (log-odds). Two blocks of estimates, each labelled as a comparison "vs 3" (the first category in each label is not preserved in this transcript).
Columns: Independent Variables and Effects, Beta Coeff., DEFF Beta #4, SE Beta, T-Test B=0, P-value T-Test B=0.
Rows in each block: Intercept; Sukupuoli (sex; 1=M, 2=N); Siviilisääty (marital status); Kahden portaan nousu (two-step climb); Ikä (age); BodyMass-index; fS-Kol mmol/l (fasting serum cholesterol) (values not preserved).

48 Cross-sectional study / multi-category response: Tests
Columns: Contrast, Degrees of Freedom, Wald F, P-value Wald F, S_waite Adj DF, S_waite Adj F, P-value S_waite Adj F.
Rows: OVERALL MODEL, MODEL MINUS INTERCEPT, INTERCEPT, SP2, AA01, PORTAANNOUSU, IKA2, BMI, T114.
Predicted marginals for Sukupuoli (sex; 1=M, 2=N) within each category of Syst.vp 3-luok. (systolic blood pressure in 3 categories): columns Predicted Marginal, SE, T:Marg=0, P-value (values not preserved in this transcript).

49 Comparison of two samples / continuous response: REGRESS
TITLE1 'COMPARISON OF TWO SAMPLES / CONTINUOUS RESPONSE';
TITLE2 'SUDAAN REGRESS';
PROC REGRESS DATA=WORK.T2K_DATA DESIGN=WR;
  SETENV COLWIDTH=12 DECWIDTH=3;
  NEST OSITE RYVAS;
  WEIGHT WAN_UNIONI;
  SUBGROUP T2K SP2 AA01 PORTAANNOUSU;
  LEVELS ;
  MODEL SYSTBP2 = T2K BMI SP2 IKA2 AA01 PORTAANNOUSU;
  TEST SATADJF;
  REFLEVEL T2K=1;
  PREDMARG T2K;
RUN;
(The values on the LEVELS statement are not preserved in this transcript.)

50 Comparison of two samples / binary response: RLOGIST (LOGISTIC)
TITLE 'COMPARISON OF TWO SAMPLES / BINARY RESPONSE';
TITLE2 'SUDAAN RLOGIST (LOGISTIC)';
PROC RLOGIST DATA=WORK.T2K_DATA DESIGN=WR;
  SETENV COLWIDTH=12 DECWIDTH=3;
  NEST OSITE RYVAS;
  WEIGHT WAN_UNIONI;
  SUBGROUP T2K SP2 AA01 PORTAANNOUSU;
  LEVELS ;
  MODEL SYSTBP2_01 = T2K BMI SP2 IKA2 AA01 PORTAANNOUSU;
  TEST SATADJF;
  REFLEVEL T2K=1;
  PREDMARG T2K;
RUN;
(The values on the LEVELS statement are not preserved in this transcript.)

51 Comparison of two samples / multi-category response: MULTILOG
TITLE 'COMPARISON OF TWO SAMPLES / MULTI-CATEGORY RESPONSE';
TITLE2 'SUDAAN MULTILOG';
PROC MULTILOG DATA=WORK.T2K_DATA DESIGN=WR;
  SETENV COLWIDTH=12 DECWIDTH=3;
  NEST OSITE RYVAS;
  WEIGHT WAN_UNIONI;
  SUBGROUP SYSTBP2_123 T2K SP2 AA01 PORTAANNOUSU;
  LEVELS ;
  MODEL SYSTBP2_123 = T2K BMI SP2 IKA2 AA01 PORTAANNOUSU / CUMLOGIT;
  TEST SATADJF;
  REFLEVEL T2K=1;
  PREDMARG T2K;
  PRINT / STYLE=NCHS;
RUN;
(The values on the LEVELS statement are not preserved in this transcript.)

52 Run-time reclassification: RECODE
TITLE1 'RUN-TIME RECLASSIFICATION';
TITLE2 'RECODE';
PROC CROSSTAB DATA=WORK.T2K_DATA DESIGN=WR;
  SETENV COLWIDTH=12 DECWIDTH=3;
  NEST OSITE RYVAS;
  WEIGHT WAN_UNIONI;
  RECODE SYSTBP2_01=(0 1);
  SUBGROUP T2K SYSTBP2_01;
  LEVELS 2 2;
  TABLES T2K*SYSTBP2_01;
  PRINT NSUM ROWPER / STYLE=NCHS;
RUN;
RECODE EXAMPLES:
RECODE X = 1.5; recodes the continuous or categorical variable X to a 0-1 variable whose value is 0 if the input value is less than 1.5 and 1 if the input value is greater than or equal to 1.5.
RECODE ZERONE = (0 1); recodes the 0-1 variable ZERONE to be a 1-2 variable suitable for use on the SUBGROUP statement. Level 0 goes to 1; level 1 goes to 2.
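The two RECODE behaviours described on this slide can be mimicked in Python as follows. Only those two stated behaviours are modelled: the function names are hypothetical, and the handling of values that are not in the code list (returning `None`) is an assumption, since the slide does not cover that case.

```python
def recode_at_cutpoint(value, cutpoint):
    """RECODE X = 1.5 style: 0 below the cutpoint, 1 at or above it."""
    return 0 if value < cutpoint else 1

def recode_listed_levels(value, code_list):
    """RECODE ZERONE = (0 1) style: the first listed code becomes level 1,
    the second becomes level 2, and so on.
    Values not in the list return None (an assumption for this sketch)."""
    for position, code in enumerate(code_list, start=1):
        if value == code:
            return position
    return None

cut_examples = [recode_at_cutpoint(x, 1.5) for x in (1.0, 1.5, 2.0)]
level_examples = [recode_listed_levels(x, (0, 1)) for x in (0, 1)]
```

The second form is what makes a 0/1 variable such as SYSTBP2_01 usable on SUBGROUP, where levels are numbered from 1.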

53 Run-time reclassification: Before and after
Two tables (before and after recoding) of Tutkimus (survey; 1=T2K, 2=MS) by Syst.vp (0/1) (systolic blood pressure, 0/1).
Columns: Sample Size, Row Percent, with Total rows (values not preserved in this transcript).

54 Direct standardization: DESCRIPT + STDVAR & STDWGT
TITLE1 'DIRECT STANDARDIZATION';
TITLE2 'SUDAAN DESCRIPT + STDVAR & STDWGT';
PROC DESCRIPT DATA=WORK.T2K_DATA DESIGN=WR;
  NEST OSITE RYVAS;
  WEIGHT WAN_UNIONI;
  SUBGROUP IKA6 T2K;
  LEVELS 6 2;
  STDVAR IKA6;
  STDWGT ;    * sum = 1;
  * STDWGT ;  * sum = 100;
  * STDWGT ;  * the program rescales the weights itself;
  VAR SYSTBP2;
  TABLES T2K;
  PRINT / STYLE=NCHS;
RUN;
(The STDWGT values are not preserved in this transcript.)
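Direct standardization, as requested here with STDVAR and STDWGT, takes the weighted mean within each level of the standardizing variable and combines those means using fixed standard weights. The Python sketch below shows the point estimate only; the function name, record layout and example weights are assumptions. It also reflects the comments in the program: standard weights summing to 1 or to 100 give the same result once they are rescaled.

```python
from collections import defaultdict

def directly_standardized_mean(records, standard_weights):
    """records: iterable of (level, weight, y), level = standardizing category.
    standard_weights: {level: weight} on any positive scale; rescaled to sum to 1."""
    weight_sum = defaultdict(float)
    weighted_y_sum = defaultdict(float)
    for level, weight, y in records:
        weight_sum[level] += weight
        weighted_y_sum[level] += weight * y
    total_standard = sum(standard_weights.values())
    result = 0.0
    for level, standard_weight in standard_weights.items():
        if weight_sum[level] == 0:
            raise ValueError(f"no observations in level {level!r}")
        level_mean = weighted_y_sum[level] / weight_sum[level]
        result += (standard_weight / total_standard) * level_mean
    return result

# Hypothetical data: two age groups with weighted means 110 and 140.
data = [(1, 1.0, 100.0), (1, 1.0, 120.0), (2, 2.0, 140.0)]
as_proportions = directly_standardized_mean(data, {1: 0.25, 2: 0.75})
as_percentages = directly_standardized_mean(data, {1: 25, 2: 75})
```

In this toy example the crude weighted mean is 125, while the standardized mean is 132.5 because the standard population puts more weight on the second age group.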

55 Direct standardization: Before and after
Two tables (before and after standardization) for the variable Systolinen verenpaine (systolic blood pressure) by Tutkimus (survey; 1=T2K, 2=MS).
Columns: Sample Size, Weighted Size, Total, Mean, SE Mean, with a Total row (values not preserved in this transcript).

56 Model-adjusted means to a file: REGRESS + OUTPUT
TITLE1 'MODEL-ADJUSTED MEANS TO A FILE';
TITLE2 'SUDAAN REGRESS + OUTPUT';
PROC REGRESS DATA=WORK.T2K_DATA DESIGN=WR;
  NEST OSITE RYVAS;
  WEIGHT WAN_UNIONI;
  SUBGROUP SP2 IKA6 T2K;
  LEVELS 2 6 2;
  MODEL SYSTBP2 = SP2 IKA6 T2K BMI;
  PREDMARG SP2 IKA6 T2K;
  OUTPUT / FILENAME=MARGIN FILETYPE=SAS REPLACE PRED_MRG=ALL;
RUN;
PROC PRINT DATA=MARGIN LABEL;
RUN;

57 Model-adjusted means to a file: SUDAAN output
Columns: Predicted Marginal, SE, T:Marg=0, P-value; one block each for Sukupuoli (sex; 1=M, 2=N), Ikäryhmä (age group) and Tutkimus (survey; 1=T2K, 2=MS) (values not preserved in this transcript).

58 Model-adjusted means to a file: SAS output
Columns: Obs, Procedure Number, Table Number, Predicted Marginal, Marginal SE, T:Marg=0, P-value (values not preserved in this transcript).

59 Thank you!

