Paul Smith Office for National Statistics

1 Methodological challenges in integrating data collections in business statistics
Paul Smith, Office for National Statistics

2 Outline
Data quality for different sources
  quality measures for survey and administrative inputs
  quality measures for outputs
Combinations of sources
  familiar and more advanced situations
Mode effects
Models
Discussion

3 Statistical data collections - quality
Relevance
  generally questions conform to desired concepts
  may be tailoring for practicality
  consistency across collections even if concepts differ
Accuracy
  affected by sampling
  impacts from non-response, measurement error
Timeliness
  generally relatively timely

4 Administrative data - quality
Relevance
  questions conform to administrative (not statistical) concepts
  few concessions to statistical needs
Accuracy
  unaffected by sampling
  processes to discourage non-response
  treatment of measurement error differs by variable
Timeliness
  generally slow

5 Differences between types of source
Sampling accuracy is measurable for surveys, not relevant for administrative data sources
  confidence in quality reduced for admin data
  balance of accuracy measures different
Building statistical requirements into administrative series requires negotiation and agreement
  VAT classification information in the UK
  INSEE has statistical and accounting information well integrated
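To illustrate what "measurable" sampling accuracy means for a survey, the following is a minimal sketch that estimates a population total and its standard error. It assumes simple random sampling without replacement, which the slides do not specify; the function name and the example figures are hypothetical.

```python
import math


def srs_total_with_se(sample, population_size):
    """Estimate a population total and its standard error.

    Assumes simple random sampling without replacement (an assumption
    made for this sketch; real business surveys are usually stratified).
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two sampled units")
    if n > population_size:
        raise ValueError("sample cannot exceed the population")

    mean = sum(sample) / n
    # Sample variance with the usual n - 1 divisor.
    s2 = sum((y - mean) ** 2 for y in sample) / (n - 1)
    # Finite population correction: sampling a large share of the
    # population reduces the sampling error.
    fpc = 1 - n / population_size

    total = population_size * mean
    se = population_size * math.sqrt(fpc * s2 / n)
    return total, se


# Hypothetical turnover values for 4 businesses sampled out of 100.
total, se = srs_total_with_se([10, 12, 14, 16], population_size=100)
```

No comparable design-based calculation exists for an administrative source, since no sampling is involved; its errors come from coverage and measurement instead.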

6 Questionnaire design
Questionnaire design principles mostly used in designing statistical collections
Administrative data seen as “forms” not “questionnaires”
  less attention to question phrasing to obtain required answer
  more on statutory requirements

7 Output data quality
Data quality from combined outputs can be challenging to measure
  function of the qualities of the input sources, and the methods used to combine them
  some well-known general approaches
  development of measures needed for particular cases (e.g. from models)

8 Combinations of sources - 1
Frame and sample information
  Sampling frames typically derived from administrative sources
  Multiple uses of frame information
    sample design
    sample selection
    validation and editing
    estimation and variance estimation
  Quality easily derived – standard situation

9 Combinations of sources - 2
Dual-frame surveys
  More than one administrative source
    Pension funds survey in the UK
  Units
    Business register
  Challenges of population inflation if matching not perfect
  Estimate probability that unit appears in sample from either source
    use in appropriate weighting procedure
    adjustment for P(in both surveys) depends on survey type
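The weighting step can be sketched as follows. This is one possible approach under a stated assumption: the two samples are drawn independently, so the probability of appearing in both is the product of the two inclusion probabilities. As the slide notes, the adjustment for P(in both surveys) depends on the survey type, so other designs need a different formula. Function names and figures are hypothetical.

```python
def combined_inclusion_probability(pi_a, pi_b):
    """P(unit is sampled from frame A or frame B).

    Assumes the two samples are selected independently, so that
    P(in both) = pi_a * pi_b. Use 0.0 for a frame that does not
    list the unit.
    """
    return pi_a + pi_b - pi_a * pi_b


def dual_frame_total(units):
    """Estimate a total from the de-duplicated union of both samples.

    `units` holds one (value, pi_a, pi_b) tuple per distinct sampled
    unit. Imperfect matching between frames would leave duplicates in
    this list and inflate the estimate.
    """
    total = 0.0
    for value, pi_a, pi_b in units:
        pi = combined_inclusion_probability(pi_a, pi_b)
        if pi <= 0:
            raise ValueError("sampled unit must have positive inclusion probability")
        total += value / pi
    return total


# Hypothetical example: the first unit is on both frames, the second only on A.
estimate = dual_frame_total([(30.0, 0.5, 0.5), (10.0, 0.2, 0.0)])
```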

10 Combinations of sources - 3
Multiple surveys
  different periodicity
  summary information monthly, detail annually
  for example capital expenditure – quarterly breakdown, annual summary
Benchmarking
  where short-period surveys small (and variable) and annual larger (and less variable)
Quality measures
  account for sampling error in both sources
  account for non-response and measurement errors in larger survey
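As a minimal sketch of benchmarking, the simplest option is pro-rata scaling: the short-period estimates are multiplied by a common factor so that they sum to the annual figure. The slides do not say which benchmarking method is used, so this is an illustrative choice; the function name and figures are hypothetical.

```python
def prorata_benchmark(sub_annual, annual_benchmark):
    """Scale sub-annual estimates so they add up to the annual benchmark.

    Pro-rata scaling is an assumption made for this sketch. It keeps
    the within-year pattern of the short-period survey and takes the
    level from the larger annual survey.
    """
    current_sum = sum(sub_annual)
    if current_sum == 0:
        raise ValueError("sub-annual estimates sum to zero; cannot scale")
    factor = annual_benchmark / current_sum
    return [value * factor for value in sub_annual]


# Hypothetical quarterly capital expenditure estimates and annual benchmark.
quarters = prorata_benchmark([20.0, 30.0, 25.0, 25.0], annual_benchmark=110.0)
```

Because each year gets its own factor, pro-rata scaling can create a step between the last quarter of one year and the first quarter of the next; methods that smooth the adjustment over time are designed to avoid this.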

11 Combinations of sources - 4
Auxiliary information
  If administrative concept not close to statistical concept, data may still be useful
Auxiliary information in estimation
  not required to be correct, only correlated with outcome
  the better the correlation, the better the accuracy
Auxiliary information in validation
  use tax data to improve validation follow-up activity
Data confrontation
  Use multiple sources to identify discrepancies
  Balancing
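One common way to use auxiliary information in estimation is the ratio estimator, sketched below. The slides do not name a specific estimator, so this is an illustrative choice; the variable names and figures are hypothetical. The auxiliary variable (for example a tax-based turnover figure) must be known for the whole population but does not need to match the statistical concept.

```python
def ratio_estimate_total(sample_y, sample_x, population_x_total):
    """Ratio estimate of the population total of y.

    sample_y: survey values for the sampled units
    sample_x: auxiliary values for the same units, in the same order
    population_x_total: auxiliary total over the whole population

    The auxiliary values only need to be correlated with y; a
    systematic difference in concept is absorbed by the ratio.
    """
    if len(sample_y) != len(sample_x):
        raise ValueError("sample_y and sample_x must have the same length")
    x_sum = sum(sample_x)
    if x_sum == 0:
        raise ValueError("auxiliary values sum to zero; ratio undefined")
    ratio = sum(sample_y) / x_sum
    return ratio * population_x_total


# Hypothetical example: survey values run about 10% above the admin values.
estimate = ratio_estimate_total(
    sample_y=[11.0, 22.0, 33.0],
    sample_x=[10.0, 20.0, 30.0],
    population_x_total=1000.0,
)
```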

12 Mode effects
Mode effects manifest in several ways
  differences in contact rate
  differences in response rate given contact
  differences in question replies given response
Test differences through a designed experiment (van den Brakel & Renssen 1998, 2005)
  evaluates whole-process differences (not individual steps)
  non-response adjustment if good predictors for response amongst auxiliary data (variance increases)
  model-based adjustments for other changes
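A non-response adjustment based on auxiliary data can be sketched with weighting classes: respondents in each class carry the weight of the non-respondents in that class. The slides do not specify the adjustment method, so this is an illustrative choice; the class labels, function name and figures are hypothetical.

```python
from collections import defaultdict


def nonresponse_adjusted_weights(units):
    """Weighting-class non-response adjustment.

    `units` is a list of (class_label, design_weight, responded) tuples,
    where the class label comes from auxiliary data that predicts
    response. Design weights are assumed to be positive. Returns the
    adjusted weights in the same order (0.0 for non-respondents).
    """
    sampled = defaultdict(float)      # sum of design weights, all sampled units
    responding = defaultdict(float)   # sum of design weights, respondents only
    for cls, weight, responded in units:
        sampled[cls] += weight
        if responded:
            responding[cls] += weight

    adjusted = []
    for cls, weight, responded in units:
        if responded:
            # Inflate by the inverse of the weighted response rate in the class.
            adjusted.append(weight * sampled[cls] / responding[cls])
        else:
            adjusted.append(0.0)
    return adjusted


# Hypothetical sample: class "A" has one non-respondent, class "B" has none.
weights = nonresponse_adjusted_weights([
    ("A", 10.0, True),
    ("A", 10.0, True),
    ("A", 10.0, False),
    ("B", 5.0, True),
    ("B", 5.0, True),
])
```

The adjusted weights still sum to the original weighted sample size, but they are more variable than the design weights, which is one reason the variance increases.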

13 Temporal differences
Administrative data often have longer reference period than statistical requirement
Implies temporal disaggregation (model-based) – Dagum & Cholette 2006
Quality implications
  estimated data as inputs
  sensitivity of model to interesting changes

14 Models for combining data
Full flexibility in combining data available through modelling approach
Models at boundary between statistical producer and user
Ideally statistical results insensitive to model assumptions
  small area estimates useful for social surveys
  challenges for business surveys not yet resolved
  modelling for unit structures - BRES

15 Discussion
Aim: more from existing sources
  often imperfect matches
  modelling only appropriate approach
  subjective
  robust to assumptions
  sensitivity analysis
Mixed mode collections
  usability and low cost
  data combination quality components harder to measure

16 for more details see the paper, or contact
