
1 Deliverable 2.6: Selective Editing
Hannah Finselbach (ONS, UK) and Orietta Luzi (ISTAT, Italy)

2 Overview
– Introduction
– Related projects
– Combining data sources
– Selective editing – data sources and tools
– Selective editing in the SDWH framework
– Proposed case studies
– Deliverable outcomes and recommendations

3 Introduction
Selective editing options for a Statistical Data Warehouse – including options for weighting the importance of different outputs (UK and Italy)
Review or quality assure – Sweden (SELEKT)
Q1: Would you like to review and give comments? (Yes/No)

4 Statistical Data Warehouse (SDWH)
Benefits:
– Decreased cost of data access and analysis
– Common data model
– Common tools
– Drive increased use of administrative data
– Faster and more automated data management and dissemination

5 Statistical Data Warehouse (SDWH)
Drawbacks:
– Can have high costs for maintenance and implementing changes
– Tools may need to be developed for statistical processes
– Methodological issues of the SDWH framework – covered by WP2 Phase 1 (SGA-1)
→ “Work in progress” for most NSIs

6 Combining data sources
Many NSIs use admin data or registers to produce statistics
Advantages include:
– Reduction in data collection and statistical production costs
– Large amount of data available
– Re-use of data to reduce respondent burden
Drawbacks include:
– Different unit types (statistical and legal)
– Timeliness
– Discrepancies in variable definitions
A mixed-source approach is usually required

7 Editing
UNECE Glossary of Terms on Statistical Data Editing:
– “an activity that involves assessing and understanding data, and the three phases of detection, resolving, and treating anomalies…”
Large amount of literature on:
– Editing business surveys
– Editing administrative data

8 Aims and related projects
This deliverable aims to add value by investigating how to apply selective editing when combining sources
Mapping with other projects:
– ESSnet on Data Integration
– ESSnet on Administrative Data
– MEMOBUST
– EDIMBUS project (2007)
– EUREDIT project (2000–2003)
– BLUE-ETS
Q2: Do you know of any other relevant projects? (Yes/No)

9 Editing combined data sources
The SDWH will combine survey, register and admin data sources
Editing is required for:
– maintaining the business register and its quality;
– a specific output and its integrated sources;
– improving the statistical system.
Part of quality control in the SDWH
Split processes for data sources? (e.g. France)

10 Combined sources – questions…
Q3: Do you currently combine data sources?
– A. Yes; B. No; C. Unsure.
Q4: Do you have separate editing processes for each data source?
– A. Only survey data edited (admin data not edited);
– B. Data sources edited separately;
– C. Data sources edited separately, but units/variables present in both sources edited for coherence;
– D. Other.

11 Selective editing
Editing is traditionally time-consuming and expensive
Selective/significance editing:
– Prioritises records based on a score function that expresses the impact of their potential errors on estimates
– The score should consist of a risk (suspicion) component and an influence (potential impact) component
– Divides anomalies into a critical and a non-critical stream for possible clerical or manual resolution (possibly including follow-up)
– A more efficient editing process
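The risk × influence idea on this slide can be made concrete with a small sketch. This is a minimal illustration only, not the score used by SELEKT, SELEMIX or any other tool named in this deliverable; the function name, toy data and threshold are all assumptions.

```python
# Minimal sketch of a selective editing score: risk (suspicion that the
# value is in error) times influence (weighted impact of the potential
# error on a domain estimate). Names and numbers are illustrative only.

def local_score(y_obs, y_pred, weight, domain_total, risk):
    """Score one reported value for one output."""
    influence = weight * abs(y_obs - y_pred) / domain_total  # relative impact
    return risk * influence

# Toy data: (unit id, reported value, predicted value, weight, risk in [0, 1])
units = [
    ("U001", 120.0, 100.0, 2.5, 0.8),
    ("U002", 98.0, 100.0, 2.5, 0.1),
]
domain_total = 5_000.0
threshold = 0.005  # assumed; in practice tuned to available editing resources

# Units above the threshold go to the critical stream for manual follow-up;
# the rest go to the non-critical stream and are treated automatically.
critical = [uid for uid, y, p, w, r in units
            if local_score(y, p, w, domain_total, r) > threshold]
print(critical)  # ['U001']
```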

12 Selective editing – survey and admin data
– Use admin data as auxiliary data in the selective editing score function for survey data (e.g. UK, Italy)
– Use a score of the differences between data sources to determine which units need manual intervention (e.g. France)
– Use scores based on historical data
– Apply selective editing to admin data, with the same score function as for survey data but with weights = 1 (e.g. French SBS system)
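The second bullet (scoring differences between sources) can also be sketched. This is a hedged illustration, not the French system; the pairing of units, the weights and the threshold are assumed.

```python
# Illustrative sketch: score the discrepancy between a survey value and an
# admin value for units present in both sources, and flag only influential
# differences for manual review. Toy data; not any NSI's actual system.

def discrepancy_score(survey_val, admin_val, weight, domain_total):
    # Weighted relative impact of the discrepancy on the domain estimate.
    return weight * abs(survey_val - admin_val) / domain_total

paired = {  # unit id -> (survey value, admin value, weight)
    "U001": (150.0, 148.0, 1.0),  # weights = 1, as for admin data above
    "U002": (90.0, 300.0, 1.0),
}
domain_total = 10_000.0
threshold = 0.01  # assumed tuning parameter

flagged = [uid for uid, (s, a, w) in paired.items()
           if discrepancy_score(s, a, w, domain_total) > threshold]
print(flagged)  # ['U002']
```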

13 Selective editing – question
Q5: Is selective editing used in the processing of admin/register data at your organisation?
– A. No;
– B. No, but admin data are used as auxiliary data for selective editing of survey data;
– C. No, but a score function is used to compare data sources;
– D. Yes, selective editing is applied to admin data;
– E. Not sure.

14 Selective editing – tools
– SELEMIX – ISTAT
– SELEKT – Statistics Sweden
– Significance Editing Engine (SEE) – ABS
– SLICE – Statistics Netherlands
Q6: Are you aware of any other selective editing tools?
– A. Yes, I can provide documentation;
– B. Yes;
– C. No.

15 Selective editing in SDWH
Methodological issues:
– Survey weights are not meaningful in a SDWH
  Weights = 1? Several sets of weights tailored for different uses?
– Selective editing of data “without purpose”
  An importance weight for all potential uses? An alternative editing approach?
– Scores to compare data sources
  Should score functions be used, or should all discrepancies be followed up, or automatically corrected?
– Selective editing of admin data – manual intervention?
  Is selective editing appropriate if manual intervention is not possible? Should automatic correction be applied to admin data identified as suspicious?

16 Any solutions? …
Survey weights used in the selective editing score are not meaningful
Q7: What do you think would be the best option?
– A. Everything in the SDWH represents itself, therefore weights = 1
– B. Calculate several survey weights for all known uses of a unit's data item and incorporate them into one global score
– C. Calculate separate scores for all outputs, and combine them (max, average, sum)
– D. Other – discuss!
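Option C can be sketched as follows. The combination rules are exactly the max/average/sum from the slide; the function name and the example scores are hypothetical.

```python
# Sketch of option C: compute a local score per known output, then combine
# them into one global score per unit. The combination rule is a design
# choice: max is the most conservative, since a unit that is critical for
# any single output stays critical overall.

def global_score(output_scores, how="max"):
    combine = {
        "max": max,
        "average": lambda s: sum(s) / len(s),
        "sum": sum,
    }
    return combine[how](output_scores)

# e.g. local scores of one unit for three outputs re-using the same data item
print(global_score([0.002, 0.040, 0.011], how="max"))      # 0.04
print(global_score([0.002, 0.040, 0.011], how="average"))  # ~0.0177
```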

17 Any solutions? …
Selective editing of data “without purpose”
Q8: Is selective editing appropriate if the data will be used multiple times, with the purpose unknown at collection?
– A. No;
– B. No, another editing approach would be better;
– C. Yes, we would use key known/likely outputs to calculate the score;
– D. Yes, I can suggest/recommend a solution;
– E. Not sure.

18 Any solutions? …
Scores to compare data sources
Q9: Should score functions be used to compare sources, or should all discrepancies be followed up, or automatically corrected?
– A. All discrepancies need to be investigated by a data expert;
– B. All discrepancies need to be flagged, and can then be corrected automatically;
– C. Scores should be used to flag only significant/influential discrepancies, which should be investigated by a data expert;
– D. Scores should be used to flag only significant/influential discrepancies, which can then be corrected automatically;
– E. Other – discuss!
– F. Not sure.

19 Any solutions? …
Selective editing of admin data
Q10: Is selective editing appropriate if manual intervention is not possible?
– A. No; only correct fatal errors, systematic errors (e.g. unit errors) and suspicious reporting patterns;
– B. No; identify all errors/suspicious values and automatically correct/impute;
– C. Yes; identify only influential errors to avoid over-editing/over-imputing the admin source;
– D. Yes; as well as fatal errors, systematic errors and suspicious reporting patterns, also identify influential errors;
– E. Other;
– F. Not sure.

20 Experimental studies
ISTAT: prototype DWH for SBS
– Use SELEMIX
– Combine statistical and admin data sources at micro level to estimate economic accounts variables, known domains
– Evaluate the quality of model-based selective editing and automatic correction
– Re-use available data for other outputs
ONS: combined sources for STS
– Use SELEKT
– Monthly Business Survey and VAT turnover data
– Compare selective editing with traditional editing of admin data (followed by automatic correction), known domains
– Re-use available data for other outputs

21 Deliverable outcome – recommendations
Draft report to be put on the CROS portal – will include input from this workshop
Provide recommendations for the methodological issues of using selective editing in a SDWH:
– using best practice from NSIs, and
– the outcomes of the experimental studies.
Metadata checklist

22 Metadata requirements
Input to editing:
– Quality indicators (e.g. of the data source)
– Threshold for the selective editing score
– Potential publication domains
– Question number
– Predictor/expected value for the score (e.g. historical data, register data)
– Domain total and/or standard error estimate for the score
– Edit identification
– …
Output from editing:
– Raw and edited values
– Selective editing score
– Error number/description/type
– Flag if suspicious
– Flag if changed
– …
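One possible way to make this checklist operational is to pin it down as record types. The sketch below mirrors the slide's field names, but the types and structure are assumptions, not an existing SDWH schema.

```python
# Hypothetical record types for the editing metadata listed on this slide.
# Field names follow the checklist; types and structure are assumptions.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class EditingInputMetadata:
    source_quality: float            # quality indicator, e.g. of the data source
    score_threshold: float           # threshold for the selective editing score
    publication_domains: List[str]   # potential publication domains
    question_number: str
    predictor: float                 # predictor/expected value for the score
    domain_total: float              # and/or a standard error estimate
    edit_id: str                     # edit identification

@dataclass
class EditingOutputMetadata:
    raw_value: float
    edited_value: float
    score: float                     # selective editing score
    error_type: Optional[str] = None # error number/description/type
    suspicious: bool = False         # flag if suspicious
    changed: bool = False            # flag if changed
```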

23 Thank you!

