Data processing German foreign trade statistics


1 Data processing German foreign trade statistics
ADVANCED ISSUES IN INTERNATIONAL TRADE IN GOODS STATISTICS
ESTP training course, 2 – 4 April 2014
German foreign trade statistics

2 Data processing

3 German foreign trade statistics
- Up to 30 million records per month
- First results 40 days after the reference period
- Highly efficient data processing is necessary
- The statistical value is unevenly distributed
- Most records have a limited effect on the results

4 The ASA System

5 ASA: Data submission monitoring

6 Data submission monitoring
- Monitoring of the most important enterprises (“Top 60“) for German foreign trade
  - Checking of the variance
  - Investigation of large deviations from the previous year or month
- Identification of unusual deviations for all enterprises
  - Acceptance Factor: (Current Value – Mean Value) / Std. Dev.
- Fast correction or confirmation of unusual values
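The Acceptance Factor on this slide is a standardized deviation of the current value from its history. A minimal sketch, assuming the sample standard deviation is used and that a cutoff of 3 marks a value as unusual (neither detail is stated on the slide, and the figures are invented):

```python
import math
from statistics import mean, stdev

def acceptance_factor(current, history):
    """(current value - mean value) / standard deviation of the history."""
    mu = mean(history)
    sigma = stdev(history)  # assumption: sample standard deviation
    if sigma == 0:
        # A constant history: any change at all is an unbounded deviation.
        return 0.0 if current == mu else math.copysign(math.inf, current - mu)
    return (current - mu) / sigma

CUTOFF = 3.0  # hypothetical cutoff, not taken from the slides

# Invented monthly values for one enterprise (previous 12 months).
history = [100, 110, 90, 105, 95, 100, 102, 98, 107, 93, 101, 99]
current = 160
af = acceptance_factor(current, history)
needs_follow_up = abs(af) > CUTOFF
print(f"acceptance factor {af:.2f}, follow up: {needs_follow_up}")
```

A flagged value would then be corrected or confirmed with the enterprise, as the last bullet describes.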

7 Data submission monitoring

8 Data submission monitoring
- Checking of data delivery
  - Structural checks: data file format, field format, readability, statistics
  - Delivery-specific checks
    - Declaration attributes: form, flow, specific number, doublet
    - Declaration-specific checks: tax number, serial errors
- Processing of serial errors in the data declarations
- A data delivery with more than 250 errors is generally rejected
- Approval of data declarations for the main (micro-) data processing
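The rejection rule above can be sketched as a simple gate in front of the main processing. Only the limit of 250 errors comes from the slide; the field names and the individual checks below are hypothetical stand-ins for the structural and declaration-specific checks.

```python
MAX_ERRORS = 250  # from the slide: a delivery with more than 250 errors is rejected

# Hypothetical field names; the real declaration layout is not given in the slides.
REQUIRED_FIELDS = ("tax_number", "flow", "cn8", "value")

def check_record(record):
    """Return a list of error messages for one declaration record."""
    errors = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            errors.append(f"missing {field}")
    cn8 = record.get("cn8") or ""
    if cn8 and not (len(cn8) == 8 and cn8.isdigit()):
        errors.append("cn8 is not an 8-digit code")
    return errors

def check_delivery(records):
    """Return (accepted, errors); errors are (record index, message) pairs."""
    errors = [(i, msg) for i, rec in enumerate(records) for msg in check_record(rec)]
    return len(errors) <= MAX_ERRORS, errors

good = {"tax_number": "DE123", "flow": "A", "cn8": "87032210", "value": "1000"}
bad = {"tax_number": "", "flow": "A", "cn8": "87", "value": "1000"}  # 2 errors

accepted, errors = check_delivery([good] * 10 + [bad] * 126)  # 252 errors
print(accepted, len(errors))
```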

9 The ASA system: Selective editing

10 The selective editing process
- Limited capacity for manual correction
  - Important data records are corrected manually
- The vast majority of the data records have limited impact on the results
  - Less important data records are corrected by automated procedures
    - Rule-based procedures
    - Hot-Deck procedure
    - Regression-based procedure

11 Selective editing: Threshold values
- Prioritization by CN8-specific threshold values
  - High-quality results for all commodity codes
  - Determination of the micro data that are important for the results
- The highest potential value of a record (according to statistical value, supplementary unit and net mass) is compared with the threshold value of the respective CN8 code
- Threshold values are calculated from the processed, error-free micro data of the previous 12 months

12 Selective editing: Threshold values
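[Figure: ordered values for one CN8 code, flow arrivals, with the ranges below 25% and above 75% marked.]
Threshold (75%) for the CN8 code, flow arrivals: ( )/2 = 4150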
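The slide suggests that the 75% threshold is the midpoint of two neighbouring values in the ordered data. A minimal sketch under that assumption: the quantile convention, the "at or above the threshold" rule and all values below are illustrative, not taken from the ASA system.

```python
def threshold_75(values):
    """75% threshold of the error-free values of the previous 12 months.

    Assumed convention: if the 75% position falls exactly between two
    ordered records, the threshold is their midpoint; otherwise it is the
    record at that position.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("no reference values for this CN8 code")
    pos = 0.75 * n
    k = int(pos)
    if pos == k and 0 < k < n:
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[min(k, n - 1)]

def needs_manual_review(potential_value, threshold):
    """Assumption: records at or above the threshold go to manual editing."""
    return potential_value >= threshold

# Invented reference values for one CN8 code and flow.
reference = [100, 400, 200, 300]
t = threshold_75(reference)  # midpoint of 300 and 400
print(t, needs_manual_review(380, t), needs_manual_review(120, t))
```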

13 Classification by fictional value
- The statistical value can be erroneous
- The fictional value (highest potential value) is less vulnerable to errors
- The fictional value is the maximum of:
  - the statistical value
  - the average statistical value per supplementary unit multiplied by the supplementary unit
  - the average statistical value per net mass multiplied by the net mass
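The definition above translates directly into a maximum over three candidate values. In this sketch the parameter names and the example figures are illustrative; treating a missing supplementary unit or net mass as zero is an assumption.

```python
def fictional_value(stat_value, supp_units, net_mass,
                    avg_value_per_unit, avg_value_per_kg):
    """Highest potential value of a record.

    avg_value_per_unit and avg_value_per_kg are the average statistical
    value per supplementary unit and per net mass for the CN8 code.
    Missing quantities (None) contribute nothing (assumption).
    """
    by_units = avg_value_per_unit * (supp_units or 0)
    by_mass = avg_value_per_kg * (net_mass or 0)
    return max(stat_value, by_units, by_mass)

# A record whose declared value (50) looks far too small for its quantities.
value = fictional_value(stat_value=50, supp_units=10, net_mass=12000,
                        avg_value_per_unit=400, avg_value_per_kg=0.25)
print(value)
```

Because the maximum is used, a record with an understated statistical value is still prioritized when its quantities imply a large value.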

14 Selective editing: Validation checks
- The data records are compared with reference data in order to find errors and to prioritize them
- The reference data and validation rules are managed with the tool “BASE PL-Editor“
- The validation rules and the structure of the reference data are implemented in the ASA system by an XML file
- (Definite) errors and possible errors

15 Selective editing: Validation checks
- Errors
  - Invalid codes
  - Very unusual unit price
  - Invalid combinations
- Possible errors
  - Unlikely partner countries etc.
  - Unlikely unit price, value
  - Unlikely combinations

16 Selective editing: Validation checks

17 Selective editing: Validation checks

18

19 The ASA system: Selective editing

20 Selective editing: Automated correction
- Deterministic error correction
  - If-then correction rules
    - Effective method provided there is a strong correlation between variables
    - For example: CN8 code and mode of transport
  - Typical errors
    - For example: numerical code instead of ISO alpha code
  - Numerical variables
    - The supplementary unit and net mass are corrected using the statistical value and the average ratio
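The three kinds of deterministic correction can be sketched as follows. The rule table, the field names and the `net_mass_flagged` marker are hypothetical; the two country codes are a small illustrative excerpt of the ISO 3166-1 numeric to alpha-2 mapping.

```python
# Hypothetical if-then rule: goods under these CN8 codes are assumed to move
# only by fixed transport installations (mode of transport "7").
PIPELINE_ONLY_CN8 = {"27112100"}

# Typical error: ISO numeric country code declared instead of the alpha code.
NUMERIC_TO_ALPHA = {"250": "FR", "528": "NL"}

def correct_record(record, avg_value_per_kg):
    """Return a corrected copy of the record; the input is left unchanged."""
    fixed = dict(record)
    # If-then rule linking CN8 code and mode of transport.
    if fixed["cn8"] in PIPELINE_ONLY_CN8:
        fixed["transport_mode"] = "7"
    # Typical error: replace a numeric country code by its alpha code.
    fixed["partner"] = NUMERIC_TO_ALPHA.get(fixed["partner"], fixed["partner"])
    # Numerical variable: derive net mass from value and the average ratio.
    if fixed.get("net_mass_flagged"):
        fixed["net_mass"] = fixed["value"] / avg_value_per_kg
    return fixed

record = {"cn8": "27112100", "transport_mode": "3", "partner": "528",
          "value": 1000.0, "net_mass": 5.0, "net_mass_flagged": True}
fixed = correct_record(record, avg_value_per_kg=2.0)
print(fixed["transport_mode"], fixed["partner"], fixed["net_mass"])
```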

21 Selective editing: Automated correction
- Hot-Deck error correction
  - Correcting erroneous micro data by imputing values from error-free micro data (donor records)
  - Only categorical variables
- Nearest-neighbor approach for donor determination
  - Calculation of the distance between the records
  - Weighting of the variables
  - In most cases a donor with the same CN8 code
- Avoiding outliers as donors
  - Considering the impact on the donor result

22 Selective editing: Automated correction
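Hot-Deck donor determination
[Table: columns Variable 1, Variable 2, Variable 3 and Distance, with variable weights w (w1 = 1 and a weight of 2 are legible). Rows: erroneous record (A, B, C), potential donors 1 to 3 and the corrected record; apart from a value D for potential donor 1, the donor values and distances are not recoverable from the transcript.]
Distance between records X and Y: $d_{XY} = \sum_k w_k \, |x_k - y_k|$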
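Donor determination by nearest neighbour can be sketched for categorical variables as a weighted count of mismatches. The weights, field names and records below are invented for illustration; the outlier screening of donors mentioned on the previous slide is not modelled here.

```python
def distance(x, y, weights):
    """Weighted number of variables on which two records disagree."""
    return sum(w for key, w in weights.items() if x.get(key) != y.get(key))

def impute_from_donor(record, donors, weights, erroneous_fields):
    """Copy the erroneous fields from the nearest error-free donor record."""
    # Only variables that are not themselves erroneous enter the distance.
    compare = {k: w for k, w in weights.items() if k not in erroneous_fields}
    donor = min(donors, key=lambda d: distance(record, d, compare))
    fixed = dict(record)
    for field in erroneous_fields:
        fixed[field] = donor[field]
    return fixed, donor

# Hypothetical weights: a high weight on cn8 favours donors with the same code.
weights = {"cn8": 5, "partner": 2, "flow": 1}
record = {"cn8": "87032210", "partner": "FR", "flow": "A", "transport_mode": "X"}
donors = [
    {"cn8": "87032210", "partner": "NL", "flow": "A", "transport_mode": "3"},
    {"cn8": "10019900", "partner": "FR", "flow": "A", "transport_mode": "1"},
    {"cn8": "87032210", "partner": "FR", "flow": "A", "transport_mode": "2"},
]
fixed, donor = impute_from_donor(record, donors, weights, ["transport_mode"])
print(fixed["transport_mode"])
```

With these weights a donor that differs only in partner country (distance 2) is always preferred over one with a different CN8 code (distance 5 or more).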

23 The ASA system: Outlier detection

24 Outlier detection
- Comparison of current results with the results of the previous 12 months
- Outliers are highlighted by the Acceptance Factor: (Current value – Mean value) / Std. dev.
- Detailed results at CN8 level
  - CN8 result
  - Partner country result
  - Statistical value, net mass, supplementary unit and their ratios
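At the result level the same factor can be applied to each measure and to the ratios between measures. The sketch below checks statistical value, net mass and their ratio for one CN8 result against the previous 12 months; the cutoff of 3, the use of the sample standard deviation and all figures are assumptions.

```python
import math
from statistics import mean, stdev

def with_unit_value(result):
    """Add the ratio statistical value / net mass to a monthly result."""
    return {**result, "unit_value": result["value"] / result["net_mass"]}

def outlier_report(current, previous_months, cutoff=3.0):
    """Map each measure to (acceptance factor, flagged as outlier)."""
    current = with_unit_value(current)
    previous = [with_unit_value(m) for m in previous_months]
    report = {}
    for measure, now in current.items():
        past = [m[measure] for m in previous]
        mu, sigma = mean(past), stdev(past)
        if sigma > 0:
            af = (now - mu) / sigma
        else:
            # Constant history: any change is an unbounded deviation.
            af = 0.0 if now == mu else math.copysign(math.inf, now - mu)
        report[measure] = (af, abs(af) > cutoff)
    return report

# Invented monthly results for one CN8 code (previous 12 months).
values = [1000, 1020, 1010, 990] * 3
masses = [500, 520, 490, 510] * 3
previous = [{"value": v, "net_mass": m} for v, m in zip(values, masses)]

# The value looks normal, but the net mass (and so the unit value) does not.
report = outlier_report({"value": 1005, "net_mass": 100}, previous)
for measure, (af, flagged) in report.items():
    print(f"{measure}: {af:.1f} {'OUTLIER' if flagged else 'ok'}")
```

Checking the ratio as well as the raw measures catches results where one quantity is plausible on its own but inconsistent with the other.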

25 Outlier detection

