Automatic Editing with Hard and Soft Edits – Some First Experiences
Sander Scholtus, Sevinç Göksen (Statistics Netherlands)

Introduction
- Error localisation problem: try to identify the variables with erroneous or missing values.
- Edits: constraints that should be satisfied by the data.
  - Hard (fatal) edits, e.g. Turnover - Costs = Profit
  - Soft (query) edits, e.g. Profit / Turnover ≤ 0.6
- Manual editing uses both hard and soft edits; automatic editing uses only hard edits.
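
As a minimal sketch of the difference between the two kinds of edit, the Python snippet below checks the two example edits on a single record. The variable names `turnover`, `costs` and `profit`, the helper `check_edits` and the record values are hypothetical, and the snippet assumes turnover is positive so that the ratio edit can be written without a division.

```python
def check_edits(record):
    """Return the names of the example hard and soft edits a record fails (hypothetical helper)."""
    hard = {
        # Hard (fatal) edit: Turnover - Costs = Profit must hold exactly.
        "balance": record["turnover"] - record["costs"] == record["profit"],
    }
    soft = {
        # Soft (query) edit: Profit / Turnover <= 0.6, written as
        # Profit <= 0.6 * Turnover (equivalent when Turnover > 0).
        "profit_ratio": record["profit"] <= 0.6 * record["turnover"],
    }
    failed_hard = [name for name, ok in hard.items() if not ok]
    failed_soft = [name for name, ok in soft.items() if not ok]
    return failed_hard, failed_soft


# A record that breaks the balance edit: certainly erroneous.
print(check_edits({"turnover": 100, "costs": 60, "profit": 30}))  # → (['balance'], [])
# A balanced record with an unusually high profit margin: only suspicious.
print(check_edits({"turnover": 100, "costs": 30, "profit": 70}))  # → ([], ['profit_ratio'])
```

A manual editor would follow up on both records; an automatic procedure that uses only hard edits acts on the first one alone.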

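Error localisation (1)
- Fellegi and Holt (1976): find the smallest (weighted) number of variables that can be imputed so that all edits are satisfied.
- Minimise $D_{FH} = \sum_{j=1}^{p} w_j y_j$ so that all edits are satisfied, where $w_j$ is the weight of variable $x_j$, and $y_j = 1$ if $x_j$ is imputed and $y_j = 0$ otherwise.
- No room for soft edits: every edit must be satisfied, so soft edits can only be treated as hard edits.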
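
The Fellegi-Holt criterion can be illustrated by exhaustive search over subsets of variables. The sketch below is restricted, as an assumption, to linear equality edits $\sum_j a_{kj} x_j + b_k = 0$ with no other constraints on the imputed values; a subset is then a feasible solution exactly when the equations in the freed variables are consistent, which is checked with a rank test. Dedicated error-localisation algorithms also handle inequality edits and avoid enumerating all subsets; the helper names, record and weights here are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def rank(rows):
    """Rank of a matrix given as a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def feasible(record, edits, free):
    """Can the variables in `free` be re-imputed so that every equality edit holds?"""
    coeffs, rhs = [], []
    for a, b in edits:
        coeffs.append([a.get(v, 0) for v in free])
        # Move the fixed (non-imputed) variables to the right-hand side.
        rhs.append(-(b + sum(a.get(v, 0) * x for v, x in record.items() if v not in free)))
    augmented = [row + [r] for row, r in zip(coeffs, rhs)]
    # A linear system is consistent iff appending the right-hand side keeps the rank.
    return rank(coeffs) == rank(augmented)


def fellegi_holt(record, edits, weights):
    """Minimal total weight of imputed variables and all subsets attaining it."""
    names = list(record)
    best_cost, best = None, []
    for size in range(len(names) + 1):
        for free in combinations(names, size):
            cost = sum(weights[v] for v in free)
            if best_cost is not None and cost > best_cost:
                continue
            if feasible(record, edits, free):
                if best_cost is None or cost < best_cost:
                    best_cost, best = cost, [set(free)]
                else:  # cost == best_cost
                    best.append(set(free))
    return best_cost, best


# Hypothetical record failing the balance edit turnover - costs - profit = 0.
record = {"turnover": 100, "costs": 60, "profit": 30}
edits = [({"turnover": 1, "costs": -1, "profit": -1}, 0)]
weights = {"turnover": 2, "costs": 1, "profit": 1}  # turnover deemed more reliable
print(fellegi_holt(record, edits, weights))  # → (1, [{'costs'}, {'profit'}])
```

Because turnover carries a higher weight, the search prefers to impute either costs or profit, and both attain the minimum.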

Error localisation (2)
- Alternative approach: choose a function $D_{soft}$ that measures the degree of suspicion associated with particular soft edit failures.
- Minimise $D_{FH} + D_{soft}$ so that all hard edits are satisfied.
- Prototype algorithm in R (based on editrules).

Simulation study (1)
- Two data sets from the Dutch SBS 2007 (structural business statistics): medium-sized wholesale businesses.
- Raw and manually edited data available; one half used as test data, the other half as reference data.
- Test data set 1: 728 records, 12 variables, 16 hard edits, 10 soft edits; synthetic errors.
- Test data set 2: 580 records, 10 variables, 17 hard edits, 24 soft edits; real errors.

Simulation study (2): percentage of records with a perfect solution

| Editing approach (choice of $D_{soft}$) | Data set 1 | Data set 2 |
|---|---|---|
| No soft edits, only hard edits | 40.2% | 58.4% |
| All edits as hard edits | 36.8% | n/a |

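Choices for $D_{soft}$ – fixed weights (1)
- Fixed failure weights: each soft edit $k$ gets a failure weight $s_k$; let $z_k = 1$ if soft edit $k$ is failed by the edited record and $z_k = 0$ otherwise.
- Resulting target function to be minimised: $D_{FH} + D_{soft} = \sum_{j} w_j y_j + \sum_{k} s_k z_k$.
- Higher failure weight → "harder" soft edit.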
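
For a toy record with the balance edit Turnover - Costs = Profit as the only hard edit and Profit / Turnover ≤ 0.6 as the only soft edit, the sketch below evaluates the fixed-weight target function $\sum_j w_j y_j + \sum_k s_k z_k$, where $y_j = 1$ if variable $j$ is imputed and $z_k = 1$ if soft edit $k$ is still failed, for three hand-picked candidate solutions. Each candidate lists the imputed variables together with one completed record chosen by hand (not by a solver) as the best values available for that candidate; the code checks the hard edit and counts the soft edits that still fail. All names and the values of $w_j$ and $s_k$ are hypothetical.

```python
def balance(r):
    # Hard edit: Turnover - Costs = Profit.
    return r["turnover"] - r["costs"] == r["profit"]


def profit_ratio(r):
    # Soft edit: Profit / Turnover <= 0.6 (assumes Turnover > 0).
    return r["profit"] <= 0.6 * r["turnover"]


SOFT_EDITS = {"profit_ratio": profit_ratio}


def target_fixed(raw, imputed, completed, w, s):
    """D_FH + D_soft with fixed failure weights s_k for one candidate solution."""
    assert balance(completed), "a candidate must satisfy all hard edits"
    assert all(completed[v] == raw[v] for v in raw if v not in imputed)
    failed = [k for k, edit in SOFT_EDITS.items() if not edit(completed)]
    return sum(w[v] for v in imputed) + sum(s[k] for k in failed)


raw = {"turnover": 100, "costs": 20, "profit": 70}  # fails both edits
candidates = {
    "impute profit": ({"profit"}, {"turnover": 100, "costs": 20, "profit": 80}),
    "impute costs": ({"costs"}, {"turnover": 100, "costs": 30, "profit": 70}),
    "impute costs and profit": ({"costs", "profit"}, {"turnover": 100, "costs": 50, "profit": 50}),
}
w = {"turnover": 1.0, "costs": 1.0, "profit": 1.0}

for s_k in (0.5, 1.5):
    scores = {name: target_fixed(raw, imp, comp, w, {"profit_ratio": s_k})
              for name, (imp, comp) in candidates.items()}
    print(s_k, min(scores, key=scores.get), scores)
```

With $s_k = 0.5$, leaving the soft edit failed is cheaper than imputing a second variable; with $s_k = 1.5$ the soft edit behaves as a "harder" edit and the two-variable solution wins.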

Choices for $D_{soft}$ – fixed weights (2)
Possible choices for $s_k$:
A. All failure weights equal to 1.
B. Proportion of records that satisfy edit $k$ in the manually edited reference data. Interpretation: P(edited record satisfies edit $k$).
C. P(edited record satisfies edit $k$ | raw record fails edit $k$).
Alternative: categorised versions of B and C.
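
Choices B and C can be estimated directly from reference data in which each raw record is paired with its manually edited version. The sketch below does this for a single edit; the four reference records and the helper names `weight_b` and `weight_c` are hypothetical, and the edit is the Profit / Turnover ≤ 0.6 example from the introduction.

```python
def profit_ratio(r):
    # Soft edit k: Profit / Turnover <= 0.6 (assumes Turnover > 0).
    return r["profit"] <= 0.6 * r["turnover"]


def weight_b(edited, edit):
    """Choice B: share of manually edited records that satisfy the edit."""
    return sum(edit(r) for r in edited) / len(edited)


def weight_c(raw, edited, edit):
    """Choice C: share of edited records satisfying the edit among records
    whose raw version fails it; None if no raw record fails the edit."""
    after = [e for r, e in zip(raw, edited) if not edit(r)]
    if not after:
        return None
    return sum(edit(e) for e in after) / len(after)


raw = [{"turnover": 100, "profit": p} for p in (30, 70, 80, 50)]
edited = [{"turnover": 100, "profit": p} for p in (30, 70, 40, 50)]
print(weight_b(edited, profit_ratio))       # 3 of 4 edited records pass
print(weight_c(raw, edited, profit_ratio))  # 1 of the 2 raw failures passes after editing
```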

Simulation study (3): percentage of records with a perfect solution

| Editing approach (choice of $D_{soft}$) | Data set 1 | Data set 2 |
|---|---|---|
| No soft edits, only hard edits | 40.2% | 58.4% |
| All edits, using soft edits as hard edits | 36.8% | n/a |
| Sum of fixed failure weights A | 47.3% | 63.4% |
| Sum of fixed failure weights B | 52.1% | 60.9% |
| Sum of fixed failure weights C | 43.3% | 60.7% |
| Sum of fixed failure weights B (cat) | 50.0% | 64.5% |
| Sum of fixed failure weights C (cat) | 43.1% | 64.5% |

Choices for $D_{soft}$ – quantile edits (1)
- Drawback of fixed failure weights: no difference between large and small edit failures.
- Trick: quantile edits.

Choices for $D_{soft}$ – quantile edits (2)
- Idea: use different versions of the same edit by varying one of its constants.
- Choose values for this constant based on the fraction of reference data records that fail the resulting edit (e.g. 1%, 5%, 10%).
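
A minimal sketch of how such constants could be chosen: for a ratio edit $x_1 / x_3 \geq c$, take $c$ as an empirical quantile of the ratios observed in the reference data, so that the requested percentage of reference records lies below it and thus fails the edit. The function name and the simple order-statistic rule are assumptions rather than the presentation's procedure; ties in the reference ratios can make the realised percentage smaller than requested.

```python
def quantile_constants(ratios, fail_percents):
    """For each target percentage p, return c such that about p% of the
    reference ratios are strictly below c (and so fail x1 / x3 >= c)."""
    s = sorted(ratios)
    n = len(s)
    # Integer arithmetic avoids floating-point surprises in p * n / 100.
    return {p: s[min(p * n // 100, n - 1)] for p in fail_percents}


# Hypothetical reference data: 100 distinct ratios 0.01, 0.02, ..., 1.00.
ratios = [i / 100 for i in range(1, 101)]
constants = quantile_constants(ratios, [10, 5, 1])
for p, c in constants.items():
    failed = sum(r < c for r in ratios)
    print(f"{p}%: c = {c}, failing reference records = {failed}")
```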

Choices for $D_{soft}$ – quantile edits (3)
Example: ratio edit $x_1 / x_3 \geq c$

| % records failed in ref. data | $c$ | Quantile edit | $s_k$ | Cumulative $s_k$ |
|---|---|---|---|---|
| 10% | 0.75 | $x_1 / x_3 \geq 0.75$ | 1 | 1 |
| 5% | 0.60 | $x_1 / x_3 \geq 0.60$ | 1 | 2 |
| 1% | 0.10 | $x_1 / x_3 \geq 0.10$ | 1 | 3 |
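
The cumulative failure weights in this example amount to a step function of the ratio: the further $x_1 / x_3$ lies below the bulk of the reference data, the more quantile edits it fails. The sketch below sums the failure weights $s_k$ of the failed quantile edits for given values, using the constants from the example; the function name is hypothetical and $x_3 > 0$ is assumed.

```python
# (c, s_k) pairs for the quantile edits x1 / x3 >= c in the example.
QUANTILE_EDITS = [(0.75, 1.0), (0.60, 1.0), (0.10, 1.0)]


def quantile_penalty(x1, x3, edits=QUANTILE_EDITS):
    """Sum of the failure weights s_k of the quantile edits x1 / x3 >= c that fail."""
    ratio = x1 / x3
    return sum(s_k for c, s_k in edits if ratio < c)


for x1 in (80, 70, 50, 5):
    print(x1, quantile_penalty(x1, 100))
```

Unequal weights, such as the 0.90-0.05-0.05 variant in the simulation study, can be tried by passing a different list of `(c, s_k)` pairs; which quantile each weight belongs to is an assumption here.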

Simulation study (4): percentage of records with a perfect solution

| Editing approach (choice of $D_{soft}$) | Data set 1 | Data set 2 |
|---|---|---|
| No soft edits, only hard edits | 40.2% | 58.4% |
| All edits, using soft edits as hard edits | 36.8% | n/a |
| Sum of fixed failure weights A | 47.3% | 63.4% |
| Sum of fixed failure weights B | 52.1% | 60.9% |
| Sum of fixed failure weights C | 43.3% | 60.7% |
| Sum of fixed failure weights B (cat) | 50.0% | 64.5% |
| Sum of fixed failure weights C (cat) | 43.1% | 64.5% |
| 10-5-1% quantile edits, weights 0.33-0.33-0.33 | 54.4% | 63.4% |
| 10-5-1% quantile edits, weights 0.90-0.05-0.05 | 56.5% | 63.8% |

Choices for $D_{soft}$ – dynamic expressions
- Size of the failure of edit $k$: $e_k$.
- Linear equality edit $a_{k1} x_1 + \dots + a_{kp} x_p + b_k = 0$: take $e_k = |a_{k1} x_1 + \dots + a_{kp} x_p + b_k|$.
- Linear inequality edit $a_{k1} x_1 + \dots + a_{kp} x_p + b_k \geq 0$: take $e_k = \max\{0, -(a_{k1} x_1 + \dots + a_{kp} x_p + b_k)\}$.
- Use reference data to standardise the $e_k$, combined either as a linear sum or as a Mahalanobis distance.
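
The two definitions of $e_k$ translate directly into code. In the sketch below each edit is stored as a coefficient vector $(a_{k1}, \dots, a_{kp})$, a constant $b_k$ and a type; the function name and example records are hypothetical, and the ratio edit Profit / Turnover ≤ 0.6 is rewritten as the linear inequality $3\,\mathrm{Turnover} - 5\,\mathrm{Profit} \geq 0$ (valid when Turnover > 0). The standardisation step based on reference data is not shown.

```python
def failure_size(a, b, x, kind):
    """Size e_k of the failure of a linear edit a . x + b (= 0 or >= 0)."""
    value = sum(a_j * x_j for a_j, x_j in zip(a, x)) + b
    if kind == "eq":  # a . x + b = 0
        return abs(value)
    if kind == "ge":  # a . x + b >= 0
        return max(0, -value)
    raise ValueError(f"unknown edit type: {kind!r}")


# Variables in the order (turnover, costs, profit).
balance = ((1, -1, -1), 0, "eq")      # Turnover - Costs - Profit = 0
profit_ratio = ((3, 0, -5), 0, "ge")  # 3 Turnover - 5 Profit >= 0

for x in [(100, 60, 30), (100, 30, 70)]:
    print(x, [failure_size(a, b, x, kind) for a, b, kind in (balance, profit_ratio)])
```

A satisfied edit always gets $e_k = 0$, so only actual failures contribute to $D_{soft}$, and larger violations contribute more.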

Simulation study (5): percentage of records with a perfect solution

| Editing approach (choice of $D_{soft}$) | Data set 1 | Data set 2 |
|---|---|---|
| No soft edits, only hard edits | 40.2% | 58.4% |
| All edits, using soft edits as hard edits | 36.8% | n/a |
| Sum of fixed failure weights A | 47.3% | 63.4% |
| Sum of fixed failure weights B | 52.1% | 60.9% |
| Sum of fixed failure weights C | 43.3% | 60.7% |
| Sum of fixed failure weights B (cat) | 50.0% | 64.5% |
| Sum of fixed failure weights C (cat) | 43.1% | 64.5% |
| 10-5-1% quantile edits, weights 0.33-0.33-0.33 | 54.4% | 63.4% |
| 10-5-1% quantile edits, weights 0.90-0.05-0.05 | 56.5% | 63.8% |
| Sum of standardised soft edit failures | 49.2% | ? |
| Mahalanobis distance of soft edit failures | 46.8% | ? |

Conclusion
- Using soft edits improved error localisation.
- Choice of $D_{soft}$: results are not unequivocal (no single choice is best for both data sets); quantile edits seem to work well; there is room for improvement.
- Future work: extended simulation study with mixed data/edits.

