Eurostat Secondary data: collection and use
Presented by Arnout van Delden, Methodologist, Statistics Netherlands
Secondary data
Secondary Sources: Registers; Base registers; Statistical registers; specific
PAST PRESENT FUTURE
Official Statistics: Post-World War II; Identifiers; Concepts: variable, units, time; Population registers; Administrative Census: Denmark (1981), Finland (1991), Netherlands (2001)
Use (EU/EFTA Survey 2010)
Types of use: Frame; Observations; Auxiliary data; Model parameters; Data quality
Columns, in order: admin data only, admin and survey data, survey data only, not specified, non response; Total
BR: 12.0, 16.0, 2.0; Total 30
SBS: 10.5, 11.5, 4.7, 0.7, 2.7; Total 30
STS: 4.0, 11.0, 14.0, 0.0, 1.0; Total 30
Prodcom: 0.0, 10.0, 13.0, 1.0, 2.0; Total 26
In sum: Many types of data sources; Long history; Potentially very useful
Collection Existence Access
Existence: Data Protection Act (DPA); Organisation registers data under DPA
Access
Element | Explanation
Legislation | National Statistics Act
Public approval | Informed consent
Identification codes | Base registers (business, dwellings, …)
Reliable data | Obliged to report errors; multiple users
Cooperation | Contacts with administration authorities
In sum: Explore potential data sources; Access: legal uses and public consent
Proper use
Exploration phase Source Meta
Processing phase: data useful? [Figure: Turnover, March '04 and Dec '04; Sample Survey]
Data patterns [Two tables with columns Unit, Period, Value; quarterly periods; values not recoverable]
Issues to consider
Dimension | Issues | Methods
Time | Reporting delays | Nowcasting, imputation
Time | Reporting period vs. statistical period | Harmonisation (time series)
Representation | Administrative units | Linkage
Representation | Coverage errors | Business register
Measurement | Data patterns | Model/time series
Measurement | Corrections | Updates
Measurement | Different meaning | Analyse
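The mismatch between reporting period and statistical period can be illustrated with a small sketch. It assumes a unit reports turnover for an arbitrary date range and that the value may be spread evenly over the days in that range, then summed per calendar month. This pro-rata allocation is a simpler stand-in for the time-series harmonisation named in the slide, not that method itself. The function name `allocate_to_months` and the example dates and amount are hypothetical.

```python
from datetime import date, timedelta


def allocate_to_months(start, end, value):
    """Spread a value reported for [start, end] (inclusive) over calendar
    months in proportion to the number of days falling in each month.
    Assumes the value accrues evenly per day (a strong simplification)."""
    if end < start:
        raise ValueError("end must not be before start")
    n_days = (end - start).days + 1
    daily = value / n_days
    monthly = {}
    day = start
    while day <= end:
        key = (day.year, day.month)
        monthly[key] = monthly.get(key, 0.0) + daily
        day += timedelta(days=1)
    return monthly


# Hypothetical four-week reporting period that straddles two months:
# 17 days fall in March, 11 days in April.
monthly = allocate_to_months(date(2004, 3, 15), date(2004, 4, 11), 2800.0)
print(monthly)
```

Monthly values obtained this way can then be summed to the statistical quarter. Where turnover is seasonal within the reporting period, the even-spread assumption would need to be replaced by a model.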
Access
Set of base registers: data re-used; report errors; 1 contact person in NSI; large dependency; users
Properties of Administrative data: 1) Collected externally; 2) Administrative goal; 3) Different objectives; 4) Subject to changes
2. Can I use a specific data source? What 'steps' are needed? Existence; Access; Fitness for use; Fall back scenarios; Processing
Processing: data integration. Linkage; Micro-integration; Imputation/weighting; Macro-integration
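The linkage step can be sketched as follows, assuming that the administrative source and the business register share an identification code so that exact matching is possible. The field names (`unit_id`, `nace`, `turnover`), the function name and the records are hypothetical. Real linkage often needs probabilistic matching and handling of differences between administrative and statistical units, which this sketch leaves out.

```python
def link_on_identifier(admin_records, register_records, key="unit_id"):
    """Exact linkage of administrative records to register units on a
    shared identifier. Returns linked records plus both kinds of mismatch."""
    register_by_id = {r[key]: r for r in register_records}
    linked, unlinked_admin = [], []
    for rec in admin_records:
        unit = register_by_id.get(rec[key])
        if unit is None:
            # Admin unit not in the register: possible undercoverage of the frame.
            unlinked_admin.append(rec)
        else:
            linked.append({**unit, **rec})
    admin_ids = {rec[key] for rec in admin_records}
    # Register units without admin data: candidates for imputation or survey.
    missing_in_admin = [r for r in register_records if r[key] not in admin_ids]
    return linked, unlinked_admin, missing_in_admin


# Hypothetical example data.
register = [
    {"unit_id": "001", "nace": "47.11"},
    {"unit_id": "002", "nace": "56.10"},
    {"unit_id": "003", "nace": "62.01"},
]
admin = [
    {"unit_id": "001", "turnover": 1200},
    {"unit_id": "003", "turnover": 450},
    {"unit_id": "999", "turnover": 80},
]

linked, unlinked_admin, missing_in_admin = link_on_identifier(admin, register)
print(len(linked), len(unlinked_admin), len(missing_in_admin))
```

The two mismatch lists feed the later steps: unlinked administrative units point to coverage errors, and register units without administrative data are the ones that imputation or weighting has to deal with.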
Fall back scenarios: Quarterly turnover from survey and admin data – Risk: only data from months 1 and 2 – Model: missing units predicted from respondents – Indicator: how many and which units to call
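The model and the indicator in this scenario can be illustrated with a minimal sketch. It assumes a simple ratio model: the growth between the previous and the current quarter observed among respondents is applied to the previous-quarter turnover of the missing units. The slide does not specify the model actually used, so this is only one possible choice. The function names, the 30% threshold and the figures are hypothetical.

```python
def predict_missing(previous, current):
    """Ratio model: predict current turnover of non-responding units from
    their previous-quarter value and the growth observed among respondents.
    previous: unit -> previous-quarter turnover (all units)
    current:  unit -> current-quarter turnover (respondents only)"""
    respondents = [u for u in previous if u in current]
    ratio = (sum(current[u] for u in respondents)
             / sum(previous[u] for u in respondents))
    predicted = {u: previous[u] * ratio for u in previous if u not in current}
    return ratio, predicted


def units_to_call(predicted, reported_total, max_predicted_share=0.30):
    """Indicator: list the largest predicted units to contact until the
    predicted share of the total falls to the threshold. Assumes a called
    unit reports roughly its predicted value, so the total stays fixed."""
    remaining = sum(predicted.values())
    total = reported_total + remaining
    calls = []
    for unit, value in sorted(predicted.items(), key=lambda kv: kv[1], reverse=True):
        if remaining / total <= max_predicted_share:
            break
        calls.append(unit)
        remaining -= value
    return calls


# Hypothetical figures: units D and E have not yet reported.
previous = {"A": 100.0, "B": 200.0, "C": 50.0, "D": 400.0, "E": 250.0}
current = {"A": 110.0, "B": 220.0, "C": 55.0}

ratio, predicted = predict_missing(previous, current)
calls = units_to_call(predicted, reported_total=sum(current.values()))
print(ratio, predicted, calls)
```

Sorting by predicted size means the few units that dominate the estimate are contacted first, which keeps the number of calls small.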
Fall back scenarios: Risk analyses; Strategy fall back scenario – Obtain missing data elsewhere? – Model-based approach – Inform users – Postpone publication
Processing: robust estimation. Medical expenses (volume, prices); Coding system for medical treatments; First coding in 2008; Coding slightly revised in 2009; New coding system in 2010
Fitness for use (Data)
Dimension | Description
Technical checks | Technical usability of file and data
Accuracy | 1) Closeness to true values, 2) Correctness, reliability
Completeness | Describes the corresponding set of real-world objects and variables
Time-related dimension | Time and/or stability related
Integrability | Capable of undergoing integration or of being integrated
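A few of these dimensions can be turned into simple indicators when a file is delivered. The sketch below covers only basic technical and completeness checks (missing identifiers, duplicate identifiers, missing values per variable). The indicator names, the field names `unit_id` and `turnover`, and the records are hypothetical and are not taken from a specific quality framework.

```python
def fitness_indicators(records, key="unit_id", variables=("turnover",)):
    """Basic indicators for a delivered file: share of records without an
    identifier, number of duplicate identifiers, and share of missing
    values for each variable of interest."""
    n = len(records)
    if n == 0:
        raise ValueError("file contains no records")
    ids = [r.get(key) for r in records]
    valid_ids = [i for i in ids if i not in (None, "")]
    return {
        "records": n,
        "missing_id_share": (n - len(valid_ids)) / n,
        "duplicate_ids": len(valid_ids) - len(set(valid_ids)),
        "item_missing_share": {
            v: sum(1 for r in records if r.get(v) is None) / n
            for v in variables
        },
    }


# Hypothetical delivery with one duplicate id, one missing id
# and one missing turnover value.
delivery = [
    {"unit_id": "001", "turnover": 10.0},
    {"unit_id": "001", "turnover": None},
    {"unit_id": None, "turnover": 5.0},
    {"unit_id": "002", "turnover": 7.0},
]

report = fitness_indicators(delivery)
print(report)
```

Indicators like these only flag whether a file is usable at all. Accuracy and integrability need comparison with other sources, for example the business register.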
Use
Type of use | Example | Source type
Population frame | Chamber of Commerce data for Business Register | Base register
Source for observations | VAT data for quarterly turnover estimates | Public admin source
Auxiliary data | Internet data to verify the NACE code of enterprises | Organic source
Estimation of model parameters | Energy supplier data for average energy consumption for CPI | Private admin source
Audit quality of statistical data | Social security data to assess the quality of employment position based on sample data | Public admin source
Concluding remarks: Merits – Reduction of response burden – Detailed and longitudinal data; Consequences – Relations with administrative data holder – Prone to changes