Published by Stewart Wells. Modified over 9 years ago.
1
Use of Administrative Data in Statistics Canada’s Annual Survey of Manufactures
Steve Matthews and Wesley Yung
May 16, 2004
The United Nations Statistical Commission and Economic Commission for Europe, Conference of European Statisticians
2
Outline
  Introduction
  Tax data programs at Statistics Canada
  The Annual Survey of Manufactures (ASM)
    Overview
    Strategy for use of tax data
  Analytical studies
  Conclusions and Future Work
3
Introduction
  Desire to increase use of tax data
    Reduce respondent burden
    Reduce survey costs
  Can be used at many stages of survey process
    Stratification
    Survey data validation
    Edit and imputation
    Estimation
4
Tax Data programs at Statistics Canada
  Tax data available to Statistics Canada
    Collected by Canada Revenue Agency (CRA)
    Access via a data-sharing agreement
    To be used only for statistical purposes
  Two extensive tax data programs
    Unincorporated businesses (T1)
    Incorporated businesses (T2)
5
Tax Data programs at Statistics Canada (cont’d)
  T1 - Population
    Unincorporated businesses
    Account for small share of revenues
  Administrative Data
    Sample-based
    Limited set of variables
    Edit and imputation is applied
    Weighted benchmarked estimates
6
Tax Data programs at Statistics Canada (cont’d)
  T2 - Population
    Incorporated businesses
    Account for large share of revenues
  Administrative Data
    Census-based
    Extensive set of variables
    Edit and imputation is applied
    Micro-data is produced
7
The Annual Survey of Manufactures
  Manufacturing is an important sector of Canadian economy
    ~17% of GDP
  Annual Survey of Manufactures
    Take-none Portion and Survey Portion
    Extensive questionnaire (financial and commodity)
    Data requirements (pseudo-census)
8
The Annual Survey of Manufactures (cont’d)
  Target population
    Drawn from Statistics Canada’s Business Register (BR)
    All businesses classified to manufacturing
  Sample design
    Non-survey portion
      Administrative data
    Survey portion
      Stratified SRS (Stratum = NAICS * Province * Size)
      Small take-some / Large take-some / Take-all
      Collected via mail-out / mail-back, follow-up via telephone
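The stratified SRS design on this slide can be illustrated with a minimal sketch. It is not the ASM's production sampling system: the record keys (`id`, `stratum`), the function name and the stratum sample sizes are hypothetical, and a stratum is treated as take-all simply when its requested sample size reaches its population size.

```python
import random
from collections import defaultdict


def stratified_srs(frame, sample_sizes, seed=0):
    """Draw a simple random sample without replacement in each stratum.

    frame: list of dicts with hypothetical keys 'id' and 'stratum'
           (a stratum could be a (NAICS, province, size) tuple).
    sample_sizes: dict mapping stratum -> n_h; if n_h >= N_h the stratum
                  is effectively take-all.
    Returns a list of (unit, design_weight) pairs with weight N_h / n_h.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for unit in frame:
        by_stratum[unit["stratum"]].append(unit)

    sample = []
    for stratum, units in by_stratum.items():
        n_h = min(sample_sizes.get(stratum, 0), len(units))
        if n_h == 0:
            continue  # nothing requested for this stratum
        weight = len(units) / n_h  # N_h / n_h; equals 1.0 in a take-all stratum
        for unit in rng.sample(units, n_h):
            sample.append((unit, weight))
    return sample
```

Within each stratum the design weights sum to the stratum population size, so a take-all stratum contributes weights of exactly 1.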
9
The Annual Survey of Manufactures (cont’d)
  Edit and Imputation
    Edits applied to ensure accuracy and coherence
    Extensive imputation to produce ‘pseudo-census’ dataset
      Historical imputation
      Ratio imputation
      Nearest-neighbour donor imputation
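Of the three imputation methods listed, ratio imputation is the simplest to sketch. The example below is illustrative only: it assumes a single imputation class, and the field names (`revenue` as the auxiliary variable, `expenses` as the target) and the function name are hypothetical rather than taken from the ASM system.

```python
def ratio_impute(records, target, auxiliary):
    """Fill missing `target` values with auxiliary * R, where
    R = (sum of target) / (sum of auxiliary) over responding units.

    records: list of dicts; a missing target value is represented by None.
    Returns new dicts with an added 'imputed' flag; inputs are not modified.
    """
    respondents = [r for r in records if r[target] is not None]
    aux_total = sum(r[auxiliary] for r in respondents)
    if aux_total == 0:
        raise ValueError("respondent auxiliary total is zero; no ratio available")
    ratio = sum(r[target] for r in respondents) / aux_total

    completed = []
    for r in records:
        filled = dict(r)
        filled["imputed"] = filled[target] is None
        if filled["imputed"]:
            filled[target] = ratio * filled[auxiliary]
        completed.append(filled)
    return completed
```

In practice the ratio would be computed separately within each imputation class, and the flag makes it possible to distinguish reported from imputed values later, for example in variance estimation.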
10
The Annual Survey of Manufactures (cont’d)
  Estimation
    Non-survey portion (tax data)
      Total Expenses only
      T1: weighted domain estimates
      T2: aggregates from administrative census dataset
    Survey portion (survey data and imputed data)
      Aggregates from pseudo-census dataset
    Domains of interest: NAICS and Province
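The two kinds of domain total mentioned here, a weighted estimate from a sample and an unweighted aggregate over a complete (pseudo-census) dataset, can be contrasted in a short sketch. The record keys (`domain`, `weight`) and function names are assumptions made for illustration.

```python
def horvitz_thompson_total(sample, variable, domain=None):
    """Weighted total: sum of design weight * value over sampled units,
    optionally restricted to one domain (e.g. a (NAICS, province) pair)."""
    total = 0.0
    for unit in sample:
        if domain is None or unit["domain"] == domain:
            total += unit["weight"] * unit[variable]
    return total


def pseudo_census_total(units, variable, domain=None):
    """Unweighted aggregate: every unit in the population has a reported
    or imputed value, so the total is a plain sum."""
    total = 0.0
    for unit in units:
        if domain is None or unit["domain"] == domain:
            total += unit[variable]
    return total
```

The pseudo-census form needs no weights because non-sampled and non-responding units have been filled in by imputation or substitution.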
11
Analytical Studies
  Motivation for two studies:
    Which variables should be ‘replaced’?
    What are the effects of the strategy on final estimates for all variables?
  Study 1 – Data comparison
  Study 2 – Impact Analysis
12
Analytical Study 1
  Study to select appropriate variables
    Comparison of reported data collected via survey and tax
    Simple businesses only
    Assess suitability for substitution of survey data
    Based on ~6,000 businesses
13
Analytical Study 1 (cont’d)
  Correlation Analysis
    Wide range of correlations
      Total Expenses: 0.9
      Total Energy Expenses: -0.10
  Reporting Patterns
    Same pattern (zero or positive) for individual businesses
      Total Expenses: 99%
      Total Energy Expenses: 50%
14
Analytical Study 1 (cont’d)
  Distribution of Ratios
    Examined histograms, fraction between 0.9 and 1.1
      Total Expenses: 60%
      Total Energy Expenses: 16%
  Population Estimates
    Relative difference between tax and survey-based estimates
      Total Expenses: 3%
      Total Energy Expenses: 28%
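The four comparisons used in Study 1 (correlation, reporting pattern, share of ratios between 0.9 and 1.1, and relative difference of totals) could be computed for one variable roughly as follows. This is a sketch under stated assumptions: the slides do not give the exact definitions, so the ratio is taken as tax over survey for units with a positive survey value, the totals are unweighted, and all names are hypothetical.

```python
from math import sqrt


def compare_sources(survey, tax):
    """Compare paired survey and tax values for one variable.

    survey, tax: equal-length lists of non-negative numbers, one pair per business.
    Returns a dict with the Pearson correlation, the share of businesses with
    the same zero/positive pattern, the share of tax/survey ratios in
    [0.9, 1.1], and the relative difference between the two totals.
    """
    n = len(survey)
    if n != len(tax) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")

    mean_s = sum(survey) / n
    mean_t = sum(tax) / n
    cov = sum((s - mean_s) * (t - mean_t) for s, t in zip(survey, tax))
    var_s = sum((s - mean_s) ** 2 for s in survey)
    var_t = sum((t - mean_t) ** 2 for t in tax)
    if var_s == 0 or var_t == 0:
        raise ValueError("correlation undefined for a constant series")

    # Ratios are only defined where the survey value is positive (assumption).
    ratios = [t / s for s, t in zip(survey, tax) if s > 0]
    return {
        "correlation": cov / sqrt(var_s * var_t),
        "same_pattern": sum((s > 0) == (t > 0) for s, t in zip(survey, tax)) / n,
        "ratio_in_band": sum(0.9 <= r <= 1.1 for r in ratios) / len(ratios),
        "relative_difference": abs(sum(tax) - sum(survey)) / sum(survey),
    }
```

A variable such as Total Expenses would score well on all four measures, while a detail variable such as Total Energy Expenses would not, which is the contrast the study reports.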
15
Analytical Study 1 (cont’d)
  Selected several variables for direct substitution
    Section totals and sub-totals (expenses, revenues, inventories, etc.)
  Remaining variables are imputed
    Imputation => assign distribution of details within each total
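One way to read "assign distribution of details within each total" is as a prorating step: the total is substituted from tax data, and the detail items are then filled in so that they add up to it. The sketch below assumes the shares come from some donor or class-level distribution; the slides do not say how the distribution is obtained, and the names are hypothetical.

```python
def allocate_details(total, shares):
    """Split a substituted total across detail items in proportion to `shares`.

    total: the section total taken from tax data.
    shares: dict mapping detail name -> non-negative weight, e.g. a donor's
            reported details (assumed source; not specified in the slides).
    Returns a dict of detail values that sum to `total`.
    """
    weight_sum = sum(shares.values())
    if weight_sum <= 0:
        raise ValueError("shares must have a positive sum")
    return {name: total * weight / weight_sum for name, weight in shares.items()}
```

For example, `allocate_details(1000.0, {"energy": 1, "materials": 7, "wages": 2})` assigns 100, 700 and 200 to the three details, so the details stay consistent with the substituted total.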
16
Analytical Study 1 - Conclusions
  Distinctly different results for different variables
  Direct substitution seems feasible for totals
  Direct substitution not recommended for details
  Use standard methods to impute other variables
17
Analytical Study 2
  Analysis to evaluate impact of tax data strategy
  Bias
    Comparison of estimates from different scenarios
  Variance
    Shao-Steel approach for variance estimation
    Reflects variance from sampling and imputation
    Assume equal probability of response within imputation class
18
Analytical Study 2 (cont’d)

  Scenarios     Tax Data Used in Imputation                       Estimator          Variance
  HT – No Tax   None (ratio imputation based on frame revenues)   Horvitz-Thompson   Sampling, Imputation
  PC – No Tax   None (ratio imputation based on frame revenues)   Pseudo-census      Imputation
  PC – Tax      Non-response (in or out of sample):               Pseudo-census      Imputation
                direct substitution, ratio imputation
19
Analytical Study 2 (cont’d)
  Comparison of resulting estimates for Total Expenses
  Relative Difference from “HT – No Tax” – Total Expenses

                All Manufacturing   NAICS3 x Province*
  PC – No Tax   1.8%                0.0%
  PC – Tax      0.5%                1.3%

  * Median value for all such domains
20
Analytical Study 2 (cont’d)
  Comparison of estimated CVs for Total Expenses
  Coefficient of Variation – Total Expenses

                All Manufacturing   NAICS3 x Province*
  HT – No Tax   0.3%                1.5%
  PC – No Tax   0.3%                1.5%
  PC – Tax      0.1%                0.7%

  * Median value for all such domains
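The quantities tabulated in Study 2 are a relative difference against the “HT – No Tax” estimate, a coefficient of variation, and a median over domains. A minimal sketch of these calculations follows; it assumes the relative difference is taken in absolute value and that a variance estimate is already available (the Shao-Steel variance estimator itself is not reproduced here). Function names are hypothetical.

```python
from math import sqrt
from statistics import median


def relative_difference(estimate, reference):
    """|estimate - reference| / |reference|, e.g. against the HT - No Tax total."""
    return abs(estimate - reference) / abs(reference)


def coefficient_of_variation(estimate, variance):
    """CV = standard error / estimate, with the standard error taken as
    the square root of the estimated variance."""
    return sqrt(variance) / abs(estimate)


def median_over_domains(values_by_domain):
    """Summarise a per-domain measure (e.g. the CV for each NAICS3 x Province
    cell) by its median, as in the starred table columns."""
    return median(values_by_domain.values())
```

Multiplying the results by 100 gives the percentages shown in the tables.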
21
Analytical Study 2 (cont’d)
  Comparison of resulting estimates for Total Energy Expenses
  Relative Difference from “HT – No Tax” – Total Energy Expenses

                All Manufacturing   NAICS3 x Province*
  PC – No Tax   1.2%                0.0%
  PC – Tax      0.8%                1.2%

  * Median value for all such domains
22
Analytical Study 2 (cont’d)
  Comparison of estimated CVs for Total Energy Expenses
  Coefficient of Variation – Total Energy Expenses

                All Manufacturing   NAICS3 x Province*
  HT – No Tax   0.3%                1.8%
  PC – No Tax   0.4%                1.8%
  PC – Tax      0.4%                1.8%

  * Median value for all such domains
23
Analytical Study 2 - Conclusions
  Bias
    Small relative difference between estimated totals from scenarios
  Variance
    Relatively low CV for all options
    Tax substitution variables: Scenario 3 most efficient
    Non-tax substitution variables: Scenario 1 most efficient
  Analytical capabilities
    Scenarios 2 and 3 provide most detail
24
Conclusions
  Results used to select 2004 strategy – “PC – Tax”
    Meets needs of data users
    Reduced cost and response burden
    Maintain (improve) quality
  Striving to further increase use of tax data
    Increased portion of population
    Increased number of variables
25
Future Work
  Editing of tax data
    Similar approach to survey data approach
    Potential to expand list of direct substitution variables
  Indirect use of tax data
    More adaptive models
  Quality indicators
    Account for increased variance and potential for bias due to imputation