
1 Multiple Imputation and Missing Race in the Pre-Invasive Cervical Cancer Study among Three States
2010 NAACCR Conference, Quebec City, June 22, 2010
Bin Huang, Kentucky Cancer Registry, University of Kentucky

2 The Pre-invasive Cervical Cancer Study
HPV vaccine
- Quadrivalent vaccine licensed for females in June 2006
- ACS developed guidelines for HPV vaccine use in June 2007
- Anticipated reductions in cervical cancers and other anogenital cancers
Need for surveillance systems
- Collection of population data for pre-invasive cervical cancer cases
- Monitoring effectiveness and efficacy
CDC-funded study
- Includes three cancer registries: Michigan, Kentucky, Louisiana
- Pre-pilot period (Sept-Dec 2008)
- Data collection Jan 2009-Dec 2009

3 Missing Data in the Study
Missing data issue
- Race: 30% missing
- Overall cases with complete data: 68.7%
Potential to cause bias or lead to inefficient analyses.

4 Missing Data Mechanism
Missing completely at random (MCAR)
- The missingness is independent of both the missing response and the observed response.
Missing at random (MAR)
- The missingness is independent of the missing response, given the observed values.
Not missing at random (NMAR)
- The missingness depends on the missing response itself, even after conditioning on the observed values.
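A small simulation makes the three definitions concrete. It is not from the study: the covariate x, the outcome y, and the missingness probabilities are hypothetical. Only under MCAR does the complete-case mean of y stay close to the full-data mean.

```python
import random
import statistics

def simulate(mechanism, n=20000, seed=1):
    """Return (full-data mean of y, complete-case mean of y) under one mechanism.

    x is always observed; y = x + noise; y is deleted with probability p_miss.
    """
    rng = random.Random(seed)
    all_y, observed_y = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = x + rng.gauss(0, 1)
        if mechanism == "MCAR":
            p_miss = 0.3                       # unrelated to x and y
        elif mechanism == "MAR":
            p_miss = 0.6 if x > 0 else 0.1     # depends only on the observed x
        elif mechanism == "NMAR":
            p_miss = 0.6 if y > 0 else 0.1     # depends on the value of y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        all_y.append(y)
        if rng.random() >= p_miss:             # y is observed for this case
            observed_y.append(y)
    return statistics.mean(all_y), statistics.mean(observed_y)

for mech in ("MCAR", "MAR", "NMAR"):
    full, cc = simulate(mech)
    print(f"{mech:4s}: full-data mean {full:+.3f}, complete-case mean {cc:+.3f}")
```

Under MAR, the bias can in principle be removed by using the observed x, for example in an imputation model. Under NMAR, the observed data alone cannot correct it.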

5 Methods to Treat Missing Data
Available-case methods
- Complete-case method (listwise deletion)
- Pairwise deletion
Single imputation methods
- Mean substitution
- Hot deck imputation
- Regression substitution
Modern approaches
- Maximum likelihood (ML) method
- Bayesian method
- Multiple imputation (MI)
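Single imputation fills each gap once and then treats the filled values as if they had been observed. The sketch below uses hypothetical ages with about 30% of values deleted completely at random. It shows one visible symptom: mean substitution shrinks the spread of the data, while a random hot deck (drawing donors from the observed values) roughly preserves it. Neither method reflects uncertainty about the imputed values, and that is the gap MI is designed to close.

```python
import random
import statistics

def mean_substitution(values):
    """Single imputation: replace every None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

def random_hot_deck(values, rng):
    """Single imputation: replace every None with a randomly chosen observed value."""
    donors = [v for v in values if v is not None]
    return [rng.choice(donors) if v is None else v for v in values]

rng = random.Random(2010)
ages = [rng.gauss(32, 8) for _ in range(5000)]                  # hypothetical ages
with_gaps = [None if rng.random() < 0.3 else a for a in ages]   # about 30% deleted at random

print(f"SD, complete data:     {statistics.stdev(ages):.2f}")
print(f"SD, mean substitution: {statistics.stdev(mean_substitution(with_gaps)):.2f}")
print(f"SD, random hot deck:   {statistics.stdev(random_hot_deck(with_gaps, rng)):.2f}")
```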

6 Multiple Imputation (MI)
MI is a three-step approach to estimation with incomplete data, first proposed by Rubin in 1977. MI assumes the missing data are MAR.
- Imputation: the missing data are filled in m times to generate m complete data sets. The imputation model should preserve the distributional relationship between the missing values and the observed values.
- Analysis: the m complete data sets are analyzed separately using standard statistical analyses.
- Combination: the results from the m complete data sets are combined to produce inferential results.
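The combination step is usually carried out with Rubin's rules. The pooled estimate is the mean of the m estimates. The total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. SAS PROC MIANALYZE performs this step; the plain-Python sketch below is only an illustration of the formulas, and the input numbers are hypothetical.

```python
import math
import statistics

def rubin_combine(estimates, variances):
    """Pool m completed-data estimates with Rubin's rules.

    estimates: point estimates Q_1..Q_m, one per imputed data set
    variances: their squared standard errors U_1..U_m
    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates with matching variances")
    q_bar = statistics.mean(estimates)        # pooled point estimate
    u_bar = statistics.mean(variances)        # average within-imputation variance
    b = statistics.variance(estimates)        # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b               # total variance
    if b == 0:
        df = math.inf                         # no between-imputation variability
    else:
        r = (1 + 1 / m) * b / u_bar           # relative increase in variance due to missingness
        df = (m - 1) * (1 + 1 / r) ** 2       # Rubin (1987) degrees of freedom
    return q_bar, math.sqrt(t), df

# Hypothetical example: proportion White estimated in m = 3 imputed data sets
est, se, df = rubin_combine([0.80, 0.82, 0.81], [0.0001, 0.0001, 0.0001])
print(f"pooled = {est:.3f}, SE = {se:.4f}, df = {df:.1f}")
```

The (1 + 1/m) factor inflates the between-imputation variance to account for using a finite number of imputations.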

7 Software Available
SAS
- PROC MI; PROC MIANALYZE
- MCMC option: assumes multivariate normality
SOLAS (Statistical Solutions Inc)
- Same assumption as SAS PROC MI
S-Plus: NORM
IVEware: SAS-callable
- PROC IMPUTE; PROC DESCRIBE; PROC REGRESS
- Does not assume multivariate normality

8 Aim of the Study
- To impute the missing race with MI
- To examine differences in estimates between the complete-case method and the MI method for:
  - the percentage of each race
  - the association between having AIS and race

9 Data – Pre-Invasive Cervical Cancer Cases
Three states: Kentucky, Louisiana and Michigan
- Total: 3843
- Kentucky: 953 (24.8%), Louisiana: 653 (17.0%), Michigan: 2237 (58.2%)
Variables (17)
- Demographics: race, address, age, ethnicity
- Data sources: reporting facility, facility type, time at diagnosis
- Disease data: site, histology code, histology terminology code, sequence code
Added variables (2), from the 2000 US Census
- % of Whites at county level
- % of Blacks at county level

10 Data Collection Process (Kentucky, Michigan, Louisiana)
Data entry methods: web-based entry; AIM reprogrammed; modifications to already existing methods; AIM reprogrammed; new web-based entry form; hard copy
Facilities reporting
- Kentucky: 38 hospital-based path labs; 10 independent free-standing path labs
- Michigan: 74 hospital-based path labs; 2 independent labs; 1 out-of-state cancer registry
- Louisiana: 47 of 104 hospitals; 0 of 3 medical oncology centers; 11 of 13 pathology laboratories; 5 of 16 surgery centers; 6 of 10 physician offices that use E-Path

11 Descriptive Analysis
Demographics for Cases in the Cervical Cancer Study
Variable                N       %
State
  KY                    953     24.8
  LA                    653     17.0
  MI                    2237    58.2
Age at diagnosis
  15-20                 297     7.7
  21-25                 933     24.3
  26-35                 1474    38.4
  36-50                 884     23.0
  50+                   247     6.4
  Missing               8       0.2
  Average age (years)   31.9
Ethnicity (NHIA)
  Non-Hispanic          3696    96.2
  Hispanic              147     3.8
County code
  Known                 3338    86.9
  Missing               505     13.1
Race
  White                 2154    56.1
  Black                 491     12.8
  Other                 39      1.0
  Missing               1159    30.2
State at diagnosis
  KY                    953     24.8
  LA                    651     16.9
  MI                    1718    44.7
  Other                 4       0.1
  Missing               517     13.5

12 Descriptive Analysis (cont.)
Characteristics of Cases in the Cervical Cancer Study
Variable                    N       %
Histology
  Carcinoma                 137     3.6
  Squamous                  3568    92.8
  Adenocarcinoma in situ    138     3.6
Site
  C530                      344     9.0
  C531                      86      2.2
  C538                      136     3.5
  C539                      3277    85.3
Histology terminology
  AIS                       136     3.5
  CIN III                   2854    74.3
  CIS                       382     9.9
  Severe dysplasia          471     12.3
Report source
  Hospital                  984     25.6
  Laboratory                2731    71.1
  Physician                 105     2.7
  Other                     23      0.6

13 Comparison Among the Three States
Characteristic          Kentucky        Louisiana       Michigan        P-value
                        N       %       N       %       N       %
Race                                                                    <0.0001
  White                 701     73.6    427     65.4    1026    45.9
  Black                 40      4.2     189     28.9    262     11.7
  Other                 2       0.2     12      1.8     25      1.1
  Missing               210     22.0    25      3.8     924     41.3
Histology terminology                                                   <0.0001
  AIS                   28      2.9     17      2.6     91      4.1
  CIN III               513     53.8    437     66.9    1904    85.1
  CIS                   128     13.4    113     17.3    141     6.3
  Severe dysplasia      284     29.8    86      13.2    101     4.5
Report source                                                           <0.0001
  Hospital              0       0.0     305     46.7    679     30.4
  Laboratory            952     99.9    325     49.8    1454    65.0
  Physician             0       0.0     3       0.5     102     4.6
  Other                 1       0.1     20      3.1     2       0.1
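The slides do not say which test produced these p-values. A Pearson chi-square test of independence is one common choice for a categorical-by-state table, and the sketch below applies it to the race counts above, assuming the Missing row is treated as its own category (also not stated on the slide). With 4 rows and 3 columns the test has 6 degrees of freedom, and for an even number of degrees of freedom the chi-square upper-tail probability has a closed form, so only the standard library is needed.

```python
import math

def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable; closed form valid only for even df:
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # term is now half**k / k!
        total += term
    return math.exp(-half) * total

# Race (rows: White, Black, Other, Missing) by state (columns: KY, LA, MI)
race_by_state = [
    [701, 427, 1026],
    [40, 189, 262],
    [2, 12, 25],
    [210, 25, 924],
]
stat, df = pearson_chi_square(race_by_state)
print(f"chi-square = {stat:.1f}, df = {df}, p = {chi2_sf_even_df(stat, df):.2e}")
```

The same functions can be applied to the known-versus-missing race comparisons, although an odd number of degrees of freedom would need a general chi-square distribution routine instead of the closed form.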

14 Missing Cases – Race, State at Diagnosis, County at Diagnosis

15 Comparison Between Known and Unknown Races
Variable                Known race      Missing race    P-value
                        N       %       N       %
Age at diagnosis                                        0.0459
  15-20                 205     7.7     92      8.0
  21-25                 658     24.5    275     23.8
  26-35                 995     37.1    479     41.5
  36-50                 637     23.8    247     21.4
  50+                   186     6.9     61      5.3
  Average age (years)   32.2            31.2
Ethnicity (NHIA)                                        0.0765
  Non-Hispanic          2591    96.5    1105    95.3
  Hispanic              93      3.5     54      4.7
Histology terminology                                   0.0033
  AIS                   101     3.8     35      3.0
  CIN III               1960    73.0    894     77.1
  CIS                   296     11.0    86      7.4
  Severe dysplasia      327     12.2    144     12.4
Report source                                           <0.0001
  Hospital              855     31.9    129     11.1
  Laboratory            1776    66.2    955     82.4
  Physician             30      1.1     75      6.5
  Other                 23      0.9     0       0.0

16 MI Methods
IVEware and SAS PROC MI
- Used both methods
- Only results from IVEware are presented
- IVEware: http://www.isr.umich.edu/src/smp/ive/

17 Missing Pattern – All States
Missing Pattern for Three-State Data
Pattern columns, in order: state at diagnosis, race, county, age at diagnosis, year at diagnosis, month at diagnosis (O = observed, X = missing)
Pattern     N       Percent
OOOOOO      2639    68.7
OOOOOX      2       0.1
OOOXXO      1       0.0
OOOXOX      2       0.1
OOXOOO      5       0.1
OXOOOO      677     17.6
OXXOOO      1       0.0
XOOOOO      10      0.3
XOXOOO      26      0.7
XXOOOO      8       0.2
XXXOOO      468     12.2
XXXXOO      5       0.1
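A missing-pattern table like this one can be tabulated by coding each record as a string of O (observed) and X (missing) across the chosen variables and counting the strings. The sketch below is a hypothetical illustration: the field names and the four records are made up, and a missing value is assumed to be stored as None.

```python
from collections import Counter

def missing_patterns(records, variables):
    """Return (pattern, count, percent) tuples, most frequent first.

    A pattern has one character per variable: 'O' = observed, 'X' = missing (None).
    """
    patterns = Counter(
        "".join("X" if rec.get(var) is None else "O" for var in variables)
        for rec in records
    )
    n = len(records)
    return [(pat, count, 100.0 * count / n) for pat, count in patterns.most_common()]

# Hypothetical field names, in the same column order as the slide
variables = ["state_dx", "race", "county", "age", "year_dx", "month_dx"]
records = [
    {"state_dx": "KY", "race": "White", "county": "001", "age": 24, "year_dx": 2009, "month_dx": 3},
    {"state_dx": "MI", "race": None, "county": "163", "age": 31, "year_dx": 2009, "month_dx": 7},
    {"state_dx": None, "race": None, "county": None, "age": 29, "year_dx": 2009, "month_dx": 1},
    {"state_dx": "LA", "race": "Black", "county": "071", "age": 40, "year_dx": 2009, "month_dx": 11},
]
for pattern, count, pct in missing_patterns(records, variables):
    print(pattern, count, f"{pct:.1f}")
```

Counting the X in each column position reproduces the per-variable missing totals. For example, on the slide the race column (second position) sums to 677 + 1 + 8 + 468 + 5 = 1159.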

18 Associations
Multivariate logistic regression showed:
- Race is significantly associated with ethnicity, histology terminology type, age, and state.
- Most notably, the percentage of each race at the county level is the dominant variable predicting race.

19 Imputation Model
- Variables included: race, registry, age, ethnicity, facility type, site, histology terminology code, sequence code, and percentages of races at the county level
- 10 imputed data sets (m = 10)
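IVEware imputes by sequential regression, which is not reproduced here. As a simpler stand-in that still produces proper multiple imputations, the sketch below uses the approximate Bayesian bootstrap (ABB) within imputation cells, creating m completed data sets. The cell variable (a made-up grouping of counties by percent Black), the field names, and the records are all hypothetical assumptions, not the study's imputation model.

```python
import random
from collections import defaultdict

def abb_impute(records, cell_key, target, m, seed=0):
    """Return m completed copies of `records`, filling missing `target` values
    by the approximate Bayesian bootstrap (ABB) within imputation cells.

    Assumes every cell that has a missing value also has at least one observed donor.
    """
    rng = random.Random(seed)
    donors = defaultdict(list)                     # observed target values per cell
    for rec in records:
        if rec[target] is not None:
            donors[rec[cell_key]].append(rec[target])

    completed = []
    for _ in range(m):
        # Step 1: resample each cell's donors with replacement; this propagates
        # uncertainty about the cell's race distribution into the imputations.
        boot = {cell: [rng.choice(vals) for _ in vals] for cell, vals in donors.items()}
        # Step 2: draw each missing value at random from its cell's bootstrap sample.
        data = []
        for rec in records:
            new = dict(rec)                        # copy so the input stays unchanged
            if new[target] is None:
                new[target] = rng.choice(boot[rec[cell_key]])
            data.append(new)
        completed.append(data)
    return completed

# Hypothetical cases; the cell is a made-up county grouping by percent Black.
cases = (
    [{"cell": "low_black_pct", "race": "White"}] * 9
    + [{"cell": "low_black_pct", "race": "Black"}] * 1
    + [{"cell": "low_black_pct", "race": None}] * 4
    + [{"cell": "high_black_pct", "race": "White"}] * 4
    + [{"cell": "high_black_pct", "race": "Black"}] * 6
    + [{"cell": "high_black_pct", "race": None}] * 4
)
for i, data in enumerate(abb_impute(cases, "cell", "race", m=10, seed=2010), start=1):
    pct_white = 100 * sum(r["race"] == "White" for r in data) / len(data)
    print(f"imputation {i}: {pct_white:.1f}% White")
```

Each of the m completed data sets is then analyzed separately, and the m results are pooled with Rubin's rules. The bootstrap in step 1 is what distinguishes ABB from a single random hot deck: it makes the imputed values vary between data sets by more than draw-to-draw noise.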

20 Frequency of Race
Race                All                     Kentucky                Louisiana               Michigan
  Method            N      %      S.E.      N      %      S.E.      N      %      S.E.      N      %      S.E.
White
  Complete case     2154   80.3   0.0086    701    94.4   0.0087    427    68.0   0.0226    1026   78.1   0.0129
  MI method         3141   81.7   0.0065    894    93.8   0.0086    444    68.0   0.0183    1803   80.6   0.0089
Black
  Complete case     491    18.3   0.0175    40     5.4    0.0357    189    30.1   0.0334    262    20.0   0.0247
  MI method         650    16.9   0.0065    56     5.8    0.0082    197    30.1   0.0180    398    17.8   0.0087
Other
  Complete case     39     1.5    0.0195    2      0.3    0.0387    12     1.9    0.0394    25     1.9    0.0273
  MI method         52     1.4    0.0023    4      0.4    0.0024    12     1.9    0.0054    36     1.6    0.0036
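The two rows differ partly in their denominators. Complete-case percentages drop cases with missing race, while the MI percentages are computed over all cases. The sketch below reproduces the Michigan White percentages from the state-comparison counts and the MI count on this slide. It assumes, since the slide does not say, that each MI count is the count averaged over the 10 imputed data sets.

```python
def complete_case_pct(counts, category):
    """Percent of `category` among cases with known race (the 'Missing' count is dropped)."""
    known = sum(n for race, n in counts.items() if race != "Missing")
    return 100.0 * counts[category] / known

def all_cases_pct(count, total):
    """Percent of all cases, e.g. from a count averaged over the m imputed data sets."""
    return 100.0 * count / total

# Michigan race counts from the state comparison table
michigan = {"White": 1026, "Black": 262, "Other": 25, "Missing": 924}
total = sum(michigan.values())                                # 2237 cases

print(round(complete_case_pct(michigan, "White"), 1))         # 1026 / 1313 → 78.1
print(round(all_cases_pct(1803, total), 1))                   # 1803 / 2237 → 80.6
```

Because 41.3% of Michigan cases lack race, the complete-case estimate rests on only 1313 of 2237 cases, which is where the two methods diverge most.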

21 Logistic Regression Analysis with AIS Status as the Dependent Variable
Effect                          Complete case               MI
                                O.R.    95% C.I.            O.R.    95% C.I.
Registry (baseline = Michigan)
  Kentucky                      0.524   0.313 - 0.879       0.636   0.398 - 1.015
  Louisiana                     0.615   0.352 - 1.075       0.652   0.370 - 1.148
Race (baseline = Black)
  White                         3.71    1.594 - 8.645       2.16    1.048 - 4.466
  Other                         3.86    0.744 - 19.993      2.42    0.440 - 13.332
Age                             1.03    1.012 - 1.045       1.02    1.007 - 1.038
Sequence (1st vs. 2nd)          0.07    0.033 - 0.148       0.04    0.022 - 0.086
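The table reports only odds ratios and confidence limits. If each interval is assumed to be a Wald interval that is symmetric on the log-odds scale, the implied standard error of log(OR) can be backed out as (ln upper - ln lower) / (2 × 1.96). For the MI column, Rubin's t reference distribution would use a multiplier somewhat larger than 1.96, so the recovered value is approximate. The sketch applies this to the White vs. Black row.

```python
import math

Z_975 = 1.959963984540054    # standard normal 97.5th percentile

def log_or_se_from_ci(lower, upper, z=Z_975):
    """SE of log(OR) implied by a Wald 95% CI that is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def wald_p(odds_ratio, se):
    """Two-sided normal-approximation p-value for H0: OR = 1."""
    z = abs(math.log(odds_ratio)) / se
    return math.erfc(z / math.sqrt(2))

# White vs. Black odds ratios and 95% CIs from the table
results = {
    "complete case": (3.71, 1.594, 8.645),
    "MI": (2.16, 1.048, 4.466),
}
for method, (odds_ratio, lower, upper) in results.items():
    se = log_or_se_from_ci(lower, upper)
    print(f"{method}: SE(log OR) = {se:.3f}, approx. p = {wald_p(odds_ratio, se):.4f}")
```

The implied standard error is smaller under MI (about 0.37 versus 0.43). This is consistent with MI using all 3843 cases rather than only those with known race, while the odds ratio itself moves toward 1.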

22 Summary
- The high percentage of cases with missing race likely introduced bias into the estimated race distribution, mainly in the Michigan data.
- The results show that, among reported cases, whites had substantially higher odds of AIS than blacks (odds ratio 2.16 with MI; 3.71 with the complete-case method).
- The two methods gave quantitatively different estimates in the logistic model.
- MI is relatively easy to implement and is appropriate for a wide range of datasets.

23 Acknowledgements
- CDC – Deblina Datta and staff
- Kentucky Cancer Registry: Thomas Tucker, Mary Jane Byrne, Brent Shelton
- Michigan Cancer Registry: Glenn Copeland, Won Silva and staff
- Louisiana Cancer Registry: Vivien Chen and staff
- Macro International – Benita O’Colma

24 Words to Share
John Wooden: “Be quick, but don’t hurry.”
“If you don’t have time to do it right, how will you find time to do it again?”

25 Questions? Bin Huang bhuang@kcr.uky.edu 859-219-0773 x 280 Thank You ! Merci !

