Random Group Variance Adjustments When Hot Deck Imputation Is Used to Compensate for Nonresponse Richard A. Moore Company Statistics Division US Census.

1 Random Group Variance Adjustments When Hot Deck Imputation Is Used to Compensate for Nonresponse
Richard A. Moore, Company Statistics Division, US Census Bureau
Presented by Samson Adeshiyan

2 2002 Survey of Business Owners (SBO)
Primary Goal: Provide Business Ownership Statistics
– State
– Industry
– Demographic Group
   Race: Native American, Asian, Black, Hawaiian/Pacific Islander, White, Public
   Ethnicity: Hispanic, Non-Hispanic
   Gender: Female, Equal, Male

3 SBO Primary Publication Level Statistics
Black-owned Grocery Stores in North Dakota (ND)
– Number
– Aggregate Sales
– Aggregate Payroll
– Aggregate Employment

4 What Do We Have? (Econ Census and Tax Returns)
5.5 million companies with paid employees
– Receipts, Payroll, Employment
– Geographic Codes
– Industry Codes
17.5 million companies without paid employees
– Receipts
– Industry and Geography Codes

5 What Are We Missing for Each Business?
Race of Ownership
Ethnicity of Ownership
Gender of Ownership
Obtain this from a stratified sample of 2.5 million businesses

6 Distribution at the US Level: 23 Million Companies
Women: 28%
Hispanic: 7%
Black: 5%
Asian: 5%
Native American: 1%
Hawaiian/Pacific Islander: 0.1%

7 Problem 1: Need Sufficient Representation in the Sample
Black-Owned Groceries in ND, 2002 Estimates
– 78 Black-owned businesses in ND
– 15 of these in Retail
– Only 4 are Grocery Stores
Can’t list groceries in ND in random order and sample systematically

8 “Modeled Guess” Codes from Admin Info for Each Company
Response from a Previous SBO
Population Distribution by ZIP Code
State/Industry Distribution in 1997 SBO
Owner’s (or Owners’) Social Security Number, when Available
– Race/Hispanic/Gender Codes on SSN Application
– Surnames (e.g. LOPEZ or WANG)
– Country of Birth (e.g. Korea or Cuba)
– Decennial Responses

9 Example
Name ................. Michelle Wie’s Pro Shop
Modeled Guess ........ Asian Female
Likelihood-Race ...... 0.8912
Likelihood-Hisp ...... 0.0012
Likelihood-Female .... 0.9500

10 Warning: Model Is Not 100% Accurate
Michelle Wie’s Pro Shop
– Responds as White, Non-Hispanic, Male
– Tabbed as White, Non-Hispanic, Male
If a business’s response is inconsistent with the modeled likelihoods, tabulate by the response
If a business does not respond, don’t directly infer responses from likelihoods

11 Problem 2: Differential Response Rates Between Demographic Groups
Owner              Likelihood-Hispanic   Response
Jose Martinez      0.985                 Hispanic
John Martinez      0.940                 ???
Jose’s Sub Shop    0.123                 Non-Hispanic
Juanita Martin     0.060                 Non-Hispanic
John Martin        0.040                 Non-Hispanic

12 Likelihoods Aid in Non-Response Adjustment
Case   Likelihood-Hispanic   Response       Weight
1      0.985                 Hispanic       4.0
2      0.940                 ???            4.0
3      0.123                 Non-Hispanic   4.0
4      0.060                 Non-Hispanic   4.0
5      0.040                 Non-Hispanic   4.0
Response Rate Adjusted Hispanic-owned Est ... 5.0 (4.0 * 5/4)
Hot Deck Imputed Hispanic-owned Est ......... 8.0 (4.0 + 4.0)
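The two estimates on this slide can be reproduced with a few lines of Python. This is a minimal sketch, not the SBO production procedure: the nearest-likelihood donor rule and the names `cases`, `response_rate_adjusted`, and `hot_deck_imputed` are assumptions made for illustration, since the slides do not say how donors are selected.

```python
# Toy data from the slide: (likelihood_hispanic, response, weight).
# None marks the nonrespondent (shown as ??? on the slide).
cases = [
    (0.985, "Hispanic", 4.0),
    (0.940, None, 4.0),
    (0.123, "Non-Hispanic", 4.0),
    (0.060, "Non-Hispanic", 4.0),
    (0.040, "Non-Hispanic", 4.0),
]


def response_rate_adjusted(cases, group):
    """Inflate the respondent total for `group` by n / r (inverse response rate)."""
    n = len(cases)
    r = sum(1 for _, resp, _ in cases if resp is not None)
    total = sum(w for _, resp, w in cases if resp == group)
    return total * n / r


def hot_deck_imputed(cases, group):
    """Give each nonrespondent the response of the respondent whose
    likelihood is closest (ASSUMED donor rule), then total the weights."""
    respondents = [c for c in cases if c[1] is not None]
    total = 0.0
    for like, resp, w in cases:
        if resp is None:
            donor = min(respondents, key=lambda d: abs(d[0] - like))
            resp = donor[1]
        if resp == group:
            total += w
    return total


print(response_rate_adjusted(cases, "Hispanic"))  # 4.0 * 5/4
print(hot_deck_imputed(cases, "Hispanic"))        # 4.0 + 4.0
```

The response-rate adjustment spreads the missing case evenly over all respondents, while the hot deck uses the likelihood to place it next to the most similar respondent, which is why the two Hispanic-owned estimates differ.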

13 For Variance: Random Group Replication (RG)
Considerable number of cases where the modeled guess disagrees with the actual response
– Cases tabbed from other strata
– Considerable variability in the weights of the tabulated cases

14 Likelihoods Aid in Non-Response Adjustment
Case   Likelihood   Response       Weight   RG   Receipts
1      0.98         Hispanic       4.0      1    10
2      0.94         ???            4.0      2    1
3      0.12         Non-Hispanic   4.0      3    5
4      0.06         Non-Hispanic   4.0      4    6
5      0.04         Non-Hispanic   4.0      5    8
Imputed Hispanic Firms Est = 8
Imputed Hispanic Receipts = 44

15 For Variance Calculation: Weight Adjustment Method
Factors on responding firms
Firms
– Respondents Estimate = 4
– Post-Impute Estimate = 8
– Weight Adjustment Factor = 2.0
Receipts
– Respondents Estimate = 40
– Post-Impute Estimate = 44
– Weight Adjustment Factor = 1.1
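The factors on this slide are simply the post-impute estimate divided by the respondents-only estimate. A small sketch using the five toy records from the preceding slide follows; the names `records`, `IMPUTED_RESPONSE`, `estimates`, and `weight_adjustment_factor` are hypothetical, and the imputed response is hard-coded to match the slide rather than derived.

```python
# Toy records from the slides: (response, weight, receipts).
# None marks the nonrespondent.
records = [
    ("Hispanic", 4.0, 10),
    (None, 4.0, 1),
    ("Non-Hispanic", 4.0, 5),
    ("Non-Hispanic", 4.0, 6),
    ("Non-Hispanic", 4.0, 8),
]
IMPUTED_RESPONSE = "Hispanic"  # ASSUMED result of the hot deck step


def estimates(records, group, use_receipts):
    """Return (respondents-only estimate, post-impute estimate) for `group`."""
    resp_est = 0.0
    post_est = 0.0
    for response, weight, receipts in records:
        y = receipts if use_receipts else 1.0
        final = response if response is not None else IMPUTED_RESPONSE
        if response == group:
            resp_est += weight * y
        if final == group:
            post_est += weight * y
    return resp_est, post_est


def weight_adjustment_factor(records, group, use_receipts=False):
    """Factor applied to responding firms so they reproduce the post-impute estimate."""
    resp_est, post_est = estimates(records, group, use_receipts)
    return post_est / resp_est


print(weight_adjustment_factor(records, "Hispanic"))                     # firms
print(weight_adjustment_factor(records, "Hispanic", use_receipts=True))  # receipts
```

Because the factor depends on the variable being tabulated, each responding record would need one factor per item (firms, receipts, and so on).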

16 Oh-Scheuren Adjustment Factor (1983)
r = # respondents
i = # imputed cases
n = i + r = total number of cases
V1 = variance with imputes treated as reported
V2 = V1 * (n/r + i/n)

17 Oh-Scheuren Method: Problems with Comparison
Research developed for Jackknife, not Random Group
Calculate response rates for cell
Best response rate for our example
– Not Missing at Random
– True response rate is 4 of 5
– Response rate for Hispanics is 1 of 2
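To see how much the choice of response rate matters, the Oh-Scheuren multiplier n/r + i/n can be evaluated for both rates mentioned here. This is only an arithmetic illustration of the formula on the previous slide; the function name `oh_scheuren_factor` is hypothetical.

```python
def oh_scheuren_factor(r, i):
    """Multiplier applied to V1 (variance with imputes treated as reported).

    r = number of respondents, i = number of imputed cases, n = r + i.
    """
    if r <= 0:
        raise ValueError("need at least one respondent")
    n = r + i
    return n / r + i / n


# Whole cell: 4 respondents, 1 imputed case (response rate 4 of 5).
print(oh_scheuren_factor(r=4, i=1))
# Hispanic cases only: 1 respondent, 1 imputed case (response rate 1 of 2).
print(oh_scheuren_factor(r=1, i=1))
```

With the cell-level rate the multiplier is 5/4 + 1/5, and with the Hispanic-only rate it is 2/1 + 1/2, so the adjusted variance depends heavily on which response rate is used.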

18 Donor Imputation Method (RG # Also Donated)
Before imputation:
Case   Likelihood   Response   Weight   RG   Receipts
1      0.98         Hispanic   4.0      1    10
2      0.94         ???        4.0      2    1
After imputation:
Case   Likelihood   Response   Weight   RG   Receipts
1      0.98         Hispanic   4.0      1    10
2      0.94         Hispanic   4.0      1    1
Imputed Hispanic Firms Est = 8
Imputed Hispanic Receipts = 44
Only RG #1 is non-zero. Same estimates. Higher variances.
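The effect of donating the RG number can be illustrated with the textbook random group variance estimator, v = sum((theta_a - theta_bar)^2) / (k(k-1)), where theta_a is the estimate from random group a alone. The slides do not spell out the estimator, so this is a sketch under assumptions: k = 5 groups (as in the toy table), each group total scaled up by k, and a hypothetical function name `random_group_variance`.

```python
def random_group_variance(weights, groups, k):
    """Random group estimate and variance of a weighted count.

    weights: weight of each tabulated case
    groups:  random group number (1..k) of each case
    Each group's total is scaled by k to estimate the full total (ASSUMED).
    """
    totals = [0.0] * k
    for w, g in zip(weights, groups):
        totals[g - 1] += w
    thetas = [k * t for t in totals]
    mean = sum(thetas) / k
    var = sum((t - mean) ** 2 for t in thetas) / (k * (k - 1))
    return mean, var


hispanic_weights = [4.0, 4.0]  # reported case and imputed case

# Imputed case keeps its own RG (2): two groups are non-zero.
print(random_group_variance(hispanic_weights, [1, 2], k=5))
# Imputed case takes the donor's RG (1): only RG 1 is non-zero.
print(random_group_variance(hispanic_weights, [1, 1], k=5))
```

Both calls return the same estimate of 8 Hispanic firms, but concentrating the two cases in one random group raises the variance, which matches the slide's "Same estimates. Higher variances."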

19 Advantages of Donating RG #
No need to add multiple factors to record
No need to calculate factors
No problems for microdata users

20 Compare the Ratios of the Variances of the Three Methods
R1 = VAR(Oh-Scheuren) / VAR(Weight Adjustment)
R2 = VAR(Donor) / VAR(Weight Adjustment)
Mean for R1 and R2 across publication cells
Std Dev for each of the means of R1 and R2
Null Hypothesis: Ri = 1 (90% confidence)
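One way to carry out this comparison is a two-sided normal test of the mean ratio against 1. The sketch below assumes that approach (critical value 1.645 for 90%) and uses made-up ratios purely as placeholder input; neither the test form nor the numbers come from the presentation, and `ratio_differs_from_one` is a hypothetical name.

```python
import math
import statistics

Z_90 = 1.645  # two-sided 90% critical value of the standard normal


def ratio_differs_from_one(ratios, z=Z_90):
    """Return (mean, std error of the mean, significant?) for H0: mean ratio = 1."""
    mean = statistics.mean(ratios)
    std_err = statistics.stdev(ratios) / math.sqrt(len(ratios))
    significant = abs(mean - 1.0) > z * std_err
    return mean, std_err, significant


# HYPOTHETICAL per-cell variance ratios, for illustration only.
example_ratios = [1.10, 1.20, 1.15, 1.05, 1.25]
mean, std_err, significant = ratio_differs_from_one(example_ratios)
print(f"mean={mean:.3f} se={std_err:.3f} significant={significant}")
```

A mean ratio flagged as not significant would correspond to the starred entries in the tables that follow.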

21 Ratio of Variances: Firm Counts
# Imputes     Oh-Sch/Wt   Donor/Wt
1 to 3        1.148       0.984*
4 to 5        1.176       0.963
6 to 9        1.136       0.941
10 to 19      1.087       1.069
20 to 49      1.069       1.205
50 or more    1.053       1.367
* Not statistically significantly different from 1.00 at the 90% level

22 Ratio of Variances: Receipts
# Imputes     Oh-Sch/Wt   Donor/Wt
1 to 3        1.230       0.958*
4 to 5        1.286       0.876
6 to 9        1.540       0.963*
10 to 19      1.541       0.914
20 to 49      1.499       0.900
50 or more    1.512       0.951
* Not statistically significantly different from 1.00 at the 90% level

23 Ratio of Variances: Firm Counts
Response Rate   Oh-Sch/Wt   Donor/Wt
45 to 55%       0.930       1.193
55 to 65%       1.076       1.182
65 to 75%       1.153       1.101
75 to 85%       1.130       1.043
85 to 95%       1.153       1.032*
* Not statistically significantly different from 1.00 at the 90% level

24 Ratio of Variances: Receipts
Response Rate   Oh-Sch/Wt   Donor/Wt
45 to 55%       1.790       0.902
55 to 65%       1.520       0.904
65 to 75%       1.465       0.940
75 to 85%       1.218       0.945
85 to 95%       1.153       0.954
* Not statistically significantly different from 1.00 at the 90% level

25 Are the Differences Acceptable?
Firm Count Variance Ratios Differ by 10%
Receipts Variance Ratios Differ by up to 70%
=>
Firm Count Relative SEs Differ by about 5%
Receipts Relative SEs Differ by up to 30%
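The step from variance ratios to relative standard errors is a square root, since a standard error is the square root of a variance. A quick check of the slide's arithmetic (the function name `se_ratio` is hypothetical):

```python
import math


def se_ratio(variance_ratio):
    """Ratio of standard errors implied by a ratio of variances."""
    return math.sqrt(variance_ratio)


for label, var_ratio in [("Firm counts", 1.10), ("Receipts", 1.70)]:
    pct = (se_ratio(var_ratio) - 1.0) * 100
    print(f"{label}: variance ratio {var_ratio:.2f} -> SE differs by about {pct:.0f}%")
```

A 10% difference in variance therefore corresponds to roughly a 5% difference in relative SE, and a 70% difference to roughly 30%.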

26 Asian-Owned Retail Operations in New Hampshire in 2002
            Estimate   Published RSE   Max Change in RSE
Firms       210        23%             +1%
Receipts    $70 Mil    19%             +6%

27 Lingering Question
Is the donation of the RG number sufficient, or do we need to augment the resulting variance with a factor (similar to the Oh-Scheuren factor)?

28 Any Questions?
Richard Moore
Richard.A.Moore.Jr@census.gov

