Published by Taryn Lucking. Modified over 2 years ago.

1
PROC SURVEYCORR
Jessica Hampton
CCSU, New Britain, CT
September 2013

2
Introduction

3
Medical Expenditure Panel Survey (MEPS)
- Administered annually since 1996 by the U.S. Department of Health and Human Services, Agency for Healthcare Research and Quality (AHRQ)
- Anonymity protected by removing individual identifiers from the public data files
- MEPS 2010 consolidated data file released September 2012
- Multiple components (household, insurance/employer, and medical provider)
- Household component (1,911 variables) covers the following topics:
  - Demographics
  - Household income
  - Employment
  - Diagnosed health conditions
  - Additional health status issues
  - Medical expenditures and utilization
  - Satisfaction with and access to care
  - Insurance coverage
- 18,692 records remain after excluding out-of-scope persons, negative person weights, and persons under 18 or 65 and older
- Covers the U.S. civilian, noninstitutionalized population
- ~3% out of scope (birth/adoption, death, incarceration, living abroad)

4
MEPS Survey Design Methods
- MEPS is a representative but NOT a random sample of the population
- Person weights must be used to produce reliable population estimates
- Stratification:
  - By demographic variables such as age, race, sex, and income
  - Goal is to maximize homogeneity within strata and heterogeneity between strata
  - Sometimes used to oversample groups that are under-represented in the general population or that have characteristics relevant to the study, for example blacks, Hispanics, and low-income households
- Clustering:
  - By geography, to reduce survey costs: a random sample of the entire U.S. population is not feasible or cost-effective
  - Ignoring within-cluster correlation underestimates variance/error: two families in the same neighborhood are more likely to be similar demographically (for example, similar income)
  - Clusters should be spatially close for cost effectiveness but as heterogeneous within as possible for reasonable variance
  - Multi-stage clustering used in MEPS: sample of counties >> sample of blocks >> individuals/households surveyed from block sample
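The variance cost of clustering can be illustrated with the standard design-effect approximation for equal-sized clusters. The formula is not on the slides, and the numbers in the worked example are hypothetical, chosen only to show the arithmetic.

```latex
% m   = number of sampled units per cluster (hypothetical)
% rho = intracluster correlation of the outcome (hypothetical)
\[
  \mathrm{deff} \;=\; \frac{\operatorname{Var}_{\text{cluster}}(\bar{y})}{\operatorname{Var}_{\text{SRS}}(\bar{y})}
  \;\approx\; 1 + (m - 1)\,\rho
\]
% Worked example: m = 20, rho = 0.05
\[
  \mathrm{deff} \approx 1 + 19 \times 0.05 = 1.95,
  \qquad
  \sqrt{1.95} \approx 1.40
\]
```

Under these assumed values, a standard error computed as if the sample were a simple random sample would be too small by a factor of about 1.4.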

5
Survey Design Considerations
- If person weights are ignored when generalizing sample findings to the entire population, totals, percentages, and means are overestimated for the oversampled groups and underestimated for the others
- In regression analysis, ignoring person weights leads to biased coefficient estimates
- If sampling strata and cluster variables are ignored, means and coefficient estimates are unaffected, but standard errors (and variance estimates) may be underestimated; that is, the reliability of an estimate may be overestimated
- When comparing one estimated population mean to another, the difference may then appear to be statistically significant when it is not (Machlin, S., Yu, W., & Zodet, M., 2005)
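One way to see the point about strata and clusters is to run PROC SURVEYMEANS twice, once with the person weight only and once with the full design. This is a minimal sketch rather than part of the original slides; it reuses the data set and variable names that appear elsewhere in this deck (PQI.MEPS_2010, VARSTR, VARPSU, PERWT10F, TOTEXP10) and assumes they exist in your session.

```sas
/* Run 1: person weight only; no strata or clusters are declared */
PROC SURVEYMEANS DATA=PQI.MEPS_2010 MEAN STDERR;
  WEIGHT PERWT10F;
  VAR TOTEXP10;
RUN;

/* Run 2: full design (strata, clusters, and person weight) */
PROC SURVEYMEANS DATA=PQI.MEPS_2010 MEAN STDERR;
  STRATA VARSTR;
  CLUSTER VARPSU;
  WEIGHT PERWT10F;
  VAR TOTEXP10;
RUN;
```

The two weighted means should be identical; only the standard errors are expected to differ between the runs.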

6
SAS Survey Procedures

7
- Intended for use with sample designs that may include unequal person weights, clustering, and stratification
- PROC SURVEYMEANS estimates population totals, percentages, and means, with estimated variances, confidence intervals, and descriptive statistics
- PROC SURVEYFREQ produces frequency tables, population estimates, percentages, and standard errors
- PROC SURVEYREG estimates regression coefficients by generalized least squares
- PROC SURVEYLOGISTIC fits logistic regression models for discrete-response (categorical) survey data by maximum likelihood
- PROC SURVEYMEANS and PROC SURVEYREG are available starting with SAS version 8
- PROC SURVEYFREQ and PROC SURVEYLOGISTIC are available starting with version 9
- PROC SURVEYSELECT, which draws samples, is not used in this project

8
PROC SURVEYMEANS Syntax
    PROC SURVEYMEANS DATA=PQI.MEPS_2010;
      STRATA VARSTR;
      CLUSTER VARPSU;
      WEIGHT PERWT10F;
      DOMAIN INSCOV10;
      VAR TOTEXP10 TOTSLF10;
    RUN;

9
PROC SURVEYMEANS Output

10
PROC SURVEYFREQ Syntax
    PROC SURVEYFREQ DATA=PQI.MEPS_2010;
      STRATA VARSTR;
      CLUSTER VARPSU;
      WEIGHT PERWT10F;
      TABLES PRIEU10 PRING10 INSCOV10;
    RUN;

11
PROC SURVEYFREQ Output

12
PROC SURVEYREG Syntax
    PROC SURVEYREG DATA=PQI.MEPS_2010;
      STRATA VARSTR;
      CLUSTER VARPSU;
      WEIGHT PERWT10F;
      MODEL &TARGET=&&VAR&I /SOLUTION;
      ODS OUTPUT PARAMETERESTIMATES=PARAMETER_EST FITSTATISTICS=FIT;
    RUN;

13
PROC SURVEYLOGISTIC Syntax
    PROC SURVEYLOGISTIC DATA=SASUSER.MEPS_2010;
      STRATA VARSTR;
      CLUSTER VARPSU;
      WEIGHT PERWT10F;
      MODEL TOTEXP_HIGH(EVENT='1')=AGE10X MARRIED--HISPANX POVLEV10--PHYACT53
            OBESE--ADSMOK42 ADINSA42--LOCATN_ER;
      ODS OUTPUT PARAMETERESTIMATES=WORK.PARAM;
    RUN;

14
PROC SURVEYLOGISTIC/REG Output
Default output (similar to PROC LOGISTIC and PROC REG):
- fit statistics (AIC, Schwarz criterion, R-square)
- chi-squared tests of the global null hypothesis
- degrees of freedom
- coefficient estimates
- standard errors of coefficient estimates and p-values
- odds ratio point estimates
- 95% Wald confidence intervals
Does not include:
- option for stepwise selection
- chi-squared test of residuals/tabled residuals (assumptions of normality and equal variance do not apply)
- influential observations/outliers (person weights)

15
PROC SURVEYCORR

16
Correlations
Three approaches:
1. Unweighted PROC CORR
2. PROC CORR with person weights
3. PROC SURVEYCORR macro with PROC SURVEYREG:
   - Uses all survey design variables (strata/cluster/weight)
   - Iteratively runs simple regression models for each predictor variable
   - Builds table with r-squared, r, and p-values
   - Sorted by r
- Similar results for all three approaches
- PROC CORR output unwieldy with large # of predictor variables
- PROC CORR cannot use strata and cluster variables

17
PROC CORR
    PROC CORR DATA=PQI.MEPS_2010 PLOTS=MATRIX RANK;
      VAR AGE10X WAGEP10X TTLP10X FAMINC10 POVLEV10 TOTSLF10 ERTEXP10 ERTOT10
          RXEXP10 OPTEXP10 OPTOTV10 OBVEXP10 OBTOTV10 IPTEXP10 IPNGTD10;
      WITH TOTEXP10;
      WEIGHT PERWT10F;
    RUN;

18
Step 1: PROC SURVEYCORR
    PROC SQL;
      SELECT NVAR INTO :NVAR
      FROM DICTIONARY.TABLES
      WHERE LIBNAME='PQI' AND MEMNAME='MEPS_2010';
    QUIT;
- SQL dictionary tables are used to select the number of variables (candidate predictors) in the data set and store it in a macro variable
- Note: library and data set names are stored in dictionary tables in all caps
- Number of predictor variables (NVAR) = number of iterations SAS will use in the DO loop later in the program
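Because the count drives the loop later on, it can help to confirm what was stored. The sketch below is a hedged variation on Step 1, not the author's code: it adds NOPRINT and the TRIMMED keyword (assumed available, SAS 9.3 or later) so the macro variable carries no leading blanks, then writes the value to the log.

```sas
PROC SQL NOPRINT;
  /* TRIMMED strips leading/trailing blanks from the stored value (SAS 9.3+) */
  SELECT NVAR INTO :NVAR TRIMMED
  FROM DICTIONARY.TABLES
  WHERE LIBNAME='PQI' AND MEMNAME='MEPS_2010';
QUIT;

/* Write the stored count to the log for a quick check */
%PUT NOTE: NVAR=&NVAR;
```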

19
Step 2: PROC SURVEYCORR
    PROC CONTENTS DATA=PQI.MEPS_2010 OUT=CONTENTS NOPRINT;
    RUN;
    PROC SQL NOPRINT;
      SELECT NAME INTO :VAR1-:VAR76
      FROM WORK.CONTENTS;
    QUIT;
- PROC CONTENTS is used to obtain a list of predictor variable names
- The list of variable names is stored as macro variables using the PROC SQL SELECT INTO statement
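The upper bound VAR76 in Step 2 is tied to one particular data set. A possible alternative, sketched here under the assumption of SAS 9.3 or later, leaves the range open-ended and takes the count from the automatic macro variable SQLOBS. Reading DICTIONARY.COLUMNS instead of PROC CONTENTS output and keeping only numeric variables (TYPE='num') are choices made for this illustration, not the author's method.

```sas
PROC SQL NOPRINT;
  /* Open-ended range: SAS creates VAR1, VAR2, ... for as many rows as are returned */
  SELECT NAME INTO :VAR1-
  FROM DICTIONARY.COLUMNS
  WHERE LIBNAME='PQI' AND MEMNAME='MEPS_2010' AND TYPE='num'
  ORDER BY VARNUM;
QUIT;

/* SQLOBS holds the number of rows returned by the last query */
%LET NVAR=&SQLOBS;
%PUT NOTE: NVAR=&NVAR first variable=&VAR1;
```

With this variant the loop count and the list of names come from the same query, so they cannot drift apart.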

20
Step 3: PROC SURVEYCORR
    PROC SQL;
      CREATE TABLE SURVEYCORR
        (PARAMETER CHAR(15), R_SQUARE CHAR(8), R NUM(8), PROBT NUM(8));
    QUIT;
- Create an empty table to store data
- Output from PROC SURVEYREG will be inserted one row at a time

21
Step 4: PROC SURVEYCORR
    %MACRO CORR(TARGET=);
    PROC SURVEYREG DATA=PQI.MEPS_2010;
      STRATA VARSTR;
      CLUSTER VARPSU;
      WEIGHT PERWT10F;
      MODEL &TARGET=&&VAR&I /SOLUTION;
      ODS OUTPUT PARAMETERESTIMATES=PARAMETER_EST FITSTATISTICS=FIT;
    RUN;
- First part of macro
- PROC SURVEYREG uses survey design variables in the STRATA, CLUSTER, and WEIGHT statements
- Optional ODS OUTPUT statement stores parameter estimates, fit statistics, and other information created when the model runs

22
Step 5: PROC SURVEYCORR
    PROC SQL;
      INSERT INTO SURVEYCORR
      SELECT PARAMETER,
             CVALUE1 AS R_SQUARE,
             SIGN(ESTIMATE)*SQRT(INPUT(CVALUE1,8.)) AS R,
             PROBT AS PVALUE
      FROM FIT, PARAMETER_EST
      WHERE UPCASE(LABEL1) = "R-SQUARE" AND PARAMETER = "&&VAR&I";
    QUIT;
    %MEND CORR;
- R-square value is extracted from the FitStatistics output with PROC SQL; UPCASE makes the label match regardless of how SAS capitalizes it
- P-value and sign of the estimated regression coefficient come from ParameterEstimates
- Square root function gives the magnitude of the correlation coefficient
- Sign of regression coefficient = direction of correlation (-/+) with target
- Target variable is input as a parameter when the macro is called
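The reason the square root of R-square, signed by the slope, recovers the correlation can be written out for a weighted simple regression with an intercept. The notation below (weighted covariance $s_{xy}$, weighted standard deviations $s_x$ and $s_y$) is introduced here for illustration and does not appear on the slides.

```latex
% Simple regression of y on x with an intercept (weighted moments):
\[
  \hat{\beta} = \frac{s_{xy}}{s_x^{2}},
  \qquad
  r = \frac{s_{xy}}{s_x\, s_y}
  \;\;\Longrightarrow\;\;
  \hat{\beta} = r\,\frac{s_y}{s_x}
\]
% s_x and s_y are positive, so the slope and the correlation share a sign:
\[
  \operatorname{sign}(\hat{\beta}) = \operatorname{sign}(r)
\]
% With a single predictor, R^2 equals the squared correlation:
\[
  R^{2} = r^{2}
  \;\;\Longrightarrow\;\;
  r = \operatorname{sign}(\hat{\beta})\,\sqrt{R^{2}}
\]
```

This identity holds only for a model with one continuous predictor; with several predictors or CLASS effects, R-square no longer corresponds to a single pairwise correlation.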

23
Step 6: PROC SURVEYCORR
    %MACRO LOOP;
      %DO I=1 %TO &NVAR;
        %CORR(TARGET=PUBAT10X);
      %END;
    %MEND LOOP;
- Call the macro
- Input desired target variable as parameter
- Iterate for each predictor variable (NVAR times)
- Each time the macro is run, a new row is inserted in table SURVEYCORR
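The slide shows only the definition of %LOOP; the loop runs once the macro is invoked. The following is a hedged sketch of how that invocation might look, assuming Steps 1 through 5 have already been submitted. The ODS EXCLUDE statements are an addition for this illustration: they keep the many regression listings out of the results window while still letting ODS OUTPUT create the PARAMETER_EST and FIT data sets.

```sas
/* Suppress printed procedure output; ODS OUTPUT data sets are still produced */
ODS EXCLUDE ALL;

%LOOP;

/* Restore normal printed output */
ODS EXCLUDE NONE;

/* Inspect the first few accumulated rows */
PROC PRINT DATA=SURVEYCORR (OBS=10);
RUN;
```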

24
Step 7: PROC SURVEYCORR
    PROC SQL;
      CREATE TABLE PQI.SURVEYCORR AS
      SELECT PARAMETER,
             R_SQUARE,
             R FORMAT BEST6.4,
             PROBT AS PVALUE FORMAT PVALUE6.4,
             CASE WHEN PROBT <=0.05 THEN "YES" ELSE "NO" END AS SIGNIFICANT_95
      FROM SURVEYCORR
      WHERE PARAMETER NOT IN ('DUPERSID','VARSTR','VARPSU','PERWT10F')
      ORDER BY ABS(R) DESC;
    QUIT;
Use PROC SQL to:
- Format results
- Sort by correlation size
- Exclude survey design variables from tabulated output

25
PROC SURVEYCORR Output
parameter      r-square   r   p-value   significance (95% C.L.)
TOTEXP                        <0.0001   yes
IPTEXP                        <0.0001   yes
TOTEXP_HIGH                   <0.0001   yes
IPNGTD                        <0.0001   yes
OBVEXP                        <0.0001   yes
RXEXP                         <0.0001   yes
OBTOTV                        <0.0001   yes
OPTEXP                        <0.0001   yes
TOTSLF                        <0.0001   yes
ADAPPT                        <0.0001   yes

26
Conclusions

27
Recommendations/Conclusions
- Only 4 SAS survey analysis procedures; no PROC SURVEYCORR
- PROC CORR:
  - Accepts person weights, but not strata/cluster variables
  - Significance levels (p-values) may be less accurate with complex survey designs
- Iterative approach with PROC SURVEYREG:
  - Can get r and p for a large # of predictor variables
  - Output tabled and ranked
- For categorical variables:
  - Either reformat to numeric first
  - Or use the CLASS statement in PROC SURVEYREG
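For the CLASS option mentioned above, a minimal sketch might look like the following. The choice of INSCOV10 as the categorical predictor and TOTEXP10 as the target is an assumption made for illustration; both names appear earlier in this deck, but this particular model does not.

```sas
PROC SURVEYREG DATA=PQI.MEPS_2010;
  STRATA VARSTR;
  CLUSTER VARPSU;
  WEIGHT PERWT10F;
  /* Treat the insurance coverage indicator as categorical */
  CLASS INSCOV10;
  MODEL TOTEXP10 = INSCOV10 / SOLUTION;
RUN;
```

With a CLASS variable the ParameterEstimates table contains one row per level, so the single signed r used by the macro no longer applies; the R-square from FitStatistics can still be used to rank the variable.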

28
References

29
References
- Carrington, W. J., Eltinge, J. L., & McCue, K. (2000). An Economist's Primer on Survey Samples. Working Paper. Suitland, MD: Center for Economic Studies, U.S. Bureau of the Census, October. Retrieved January 15 from ftp://tigerline.census.gov/ces/wp/2000/CES-WP pdf
- Cohen, J. W., & Rhoades, J. A. (2009). Group and Non-Group Private Health Insurance Coverage, 1996 to 2007: Estimates for the U.S. Civilian Noninstitutionalized Population under Age 65. Medical Expenditure Panel Survey (MEPS) Statistical Brief #267. Agency for Healthcare Research and Quality, Rockville, MD.
- DiJulio, B., & Claxton, G. (2010). Comparison of Expenditures in Nongroup and Employer-Sponsored Insurance. Kaiser Family Foundation, Menlo Park, CA.
- Kaiser Family Foundation (2008). How Non-Group Health Coverage Varies with Income. Menlo Park, CA.
- Machlin, S., & Yu, W. (2005). MEPS Sample Persons In-Scope for Part of the Year: Identification and Analytic Considerations. April. Agency for Healthcare Research and Quality, Rockville, MD. Retrieved from /survey_comp/hc_survey/hc_sample.shtml

30
References (continued)
- Machlin, S., Yu, W., & Zodet, M. (2005). Computing Standard Errors for MEPS Estimates. January. Agency for Healthcare Research and Quality, Rockville, MD.
- Medical Expenditure Panel Survey (MEPS). (2012). MEPS HC-138: 2010 Full Year Consolidated Data File. Rockville, MD: Agency for Healthcare Research and Quality (AHRQ), September. Retrieved September 27.
- Medical Expenditure Panel Survey (MEPS). (2012). MEPS HC-138: 2010 Full Year Consolidated Data Codebook. Rockville, MD: Agency for Healthcare Research and Quality (AHRQ), August 30. Retrieved September 27.
- Medical Expenditure Panel Survey (MEPS). MEPS-HC Panel Design and Collection Process. Agency for Healthcare Research and Quality, Rockville, MD.
- Medical Expenditure Panel Survey (MEPS). Data Use Agreement. Agency for Healthcare Research and Quality, Rockville, MD.

31
References (continued)
- O'Neill, J., & O'Neill, D. (2009). Who are the uninsured? An Analysis of America's Uninsured Population, Their Characteristics, and Their Health. Employment Policies Institute, Washington, D.C.
- SAS Institute Inc. (2008). SAS/STAT 9.2 User's Guide. Chapter 14: Introduction to Survey Sampling and Analysis Procedures. Cary, NC: SAS Institute Inc. Retrieved January 15 from urveysamp.pdf
- Trish, E., Damico, A., Claxton, G., Levitt, L., & Garfield, R. (2011). A Profile of Health Insurance Exchange Enrollees. Kaiser Family Foundation, Menlo Park, CA.
