Data processing German foreign trade statistics

Advanced Issues in International Trade in Goods Statistics
ESTP training course, 2–4 April 2014
German foreign trade statistics

Data processing

German foreign trade statistics
- Up to 30 million records per month
- First results 40 days after the reference period
- Highly efficient data processing is necessary
- The statistical value is unevenly distributed across records
- Most records have only a limited effect on the results

The ASA System

ASA: Data submission monitoring

Data submission monitoring
- Monitoring the enterprises most important for German foreign trade (“Top 60”)
- Checking the variance:
  - Investigation of large deviations from the previous year or month
  - Identification of unusual deviations for all enterprises
  - Acceptance Factor: (current value − mean value) / standard deviation
- Fast correction or confirmation of unusual values
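The Acceptance Factor can be sketched as follows. This is a minimal illustration, not the ASA implementation: it assumes the comparison values are an enterprise's previous monthly values, uses the sample standard deviation, and flags enterprises with |AF| above a hypothetical cut-off of 3 (the slides give no value). The function names `acceptance_factor` and `flag_unusual` are hypothetical.

```python
import math
from statistics import mean, stdev

def acceptance_factor(current, history):
    """(current value - mean value) / standard deviation of `history`.

    `history` holds comparison values, e.g. an enterprise's previous monthly
    values; at least two are required for the sample standard deviation.
    """
    mu = mean(history)
    sigma = stdev(history)  # sample std. dev. (n - 1): an assumption
    if sigma == 0:
        # constant history: any change at all is treated as maximally unusual
        return 0.0 if current == mu else math.copysign(math.inf, current - mu)
    return (current - mu) / sigma

def flag_unusual(current_by_enterprise, history_by_enterprise, limit=3.0):
    """Return {enterprise: AF} for every enterprise with |AF| > limit.

    `limit` is a hypothetical cut-off; the slides give no value.
    """
    flagged = {}
    for enterprise, current in current_by_enterprise.items():
        af = acceptance_factor(current, history_by_enterprise[enterprise])
        if abs(af) > limit:
            flagged[enterprise] = af
    return flagged
```

Flagged enterprises would then be contacted for fast correction or confirmation of the unusual value.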

Checking of data delivery
- Structural checks: data file format, field format, readability, statistics
- Delivery-specific checks of the declaration attributes: form, flow, specific number, duplicates (doublets)
- Declaration-specific checks: tax number, serial errors

Processing serial errors in the data declarations
- A data delivery with more than 250 errors is generally rejected
- Approval of the data declarations for the main (micro-)data processing
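A simplified sketch of the delivery checks and the 250-error rejection rule. Only the threshold of 250 and the checked attributes (form, flow, specific number, doublets, tax number) come from the slide; the flow code list, the duplicate key, and the choice to count individual findings rather than erroneous declarations are assumptions, and the sketch rejects every delivery above the limit even though the slide says "generally". All names are hypothetical.

```python
from dataclasses import dataclass

MAX_ERRORS = 250  # slide: deliveries with more than 250 errors are generally rejected

@dataclass(frozen=True)
class Declaration:
    form: str
    flow: str
    specific_number: str
    tax_number: str

VALID_FLOWS = frozenset({"arrival", "dispatch"})  # hypothetical code list

def check_declarations(declarations):
    """Apply simplified delivery- and declaration-specific checks.

    Returns (findings, clean): findings is a list of (position, [problem, ...]),
    clean holds the error-free declarations.
    """
    findings, clean, seen = [], [], set()
    for pos, d in enumerate(declarations):
        problems = []
        if d.flow not in VALID_FLOWS:
            problems.append("invalid flow")
        key = (d.form, d.flow, d.specific_number)  # assumed duplicate key
        if key in seen:
            problems.append("doublet")  # same declaration delivered twice
        seen.add(key)
        if not d.tax_number.strip():
            problems.append("missing tax number")
        if problems:
            findings.append((pos, problems))
        else:
            clean.append(d)
    return findings, clean

def process_delivery(declarations):
    """Reject the whole delivery above MAX_ERRORS, else approve the clean declarations."""
    findings, clean = check_declarations(declarations)
    n_errors = sum(len(problems) for _, problems in findings)
    if n_errors > MAX_ERRORS:
        return "rejected", []
    return "approved", clean
```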

The ASA system: Selective editing

The selective editing process
- Limited capacity for manual correction
- Important data records are corrected manually
- The vast majority of data records have a limited impact on the results
- Less important data records are corrected by automated procedures:
  - Rule-based procedures
  - Hot-deck procedure
  - Regression-based procedure

Selective editing: Threshold values
- Prioritization by CN8-specific threshold values
- High-quality results for all commodity codes
- Identification of the micro data that are important for the results
- The highest potential value of a record (based on statistical value, supplementary unit and net mass) is compared with the threshold value of the respective CN8 code
- Threshold values are calculated from the processed, error-free micro data of the previous 12 months
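The prioritization step can be sketched as a simple routing rule: a record whose highest potential value exceeds the threshold of its CN8 code goes to manual correction, all others to the automated procedures. The record layout, the strict ">" comparison, and sending codes without a threshold to manual review are assumptions; the CN8 codes in the usage are illustrative only.

```python
def route_records(records, thresholds):
    """Split records into a manual and an automated editing queue.

    records: list of dicts with "cn8" and "fictional_value" (highest potential value)
    thresholds: dict mapping CN8 code -> threshold value
    """
    manual, automated = [], []
    for rec in records:
        threshold = thresholds.get(rec["cn8"])
        # assumption: codes without a threshold are treated as important
        if threshold is None or rec["fictional_value"] > threshold:
            manual.append(rec)
        else:
            automated.append(rec)
    return manual, automated
```

For example, with a threshold of 4150 for code "12345678", a record of that code with a highest potential value of 5000 is routed to manual correction and one with 100 to automated correction.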

Selective editing: Threshold values
[Figure: threshold (75%) for a CN8 code, flow arrivals; regions marked <25% and >75%; threshold = ( … )/2 = 4150]
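One possible reading of the figure, stated here as an assumption rather than the documented ASA rule: the records of a CN8 code and flow are ordered by value, the large records above the threshold together cover at least 75% of the total value (the small records below it less than 25%), and the threshold lies halfway between the smallest record of the upper group and the largest record of the lower group, which would explain the "( … )/2". The function name and the edge-case handling are hypothetical.

```python
def coverage_threshold(values, coverage=0.75):
    """Threshold so that the records above it cover at least `coverage` of the total.

    values: fictional values of the processed, error-free records of one
    CN8 code and flow in the previous 12 months (assumed input).
    """
    ordered = sorted(values, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("need a positive total value")
    cumulative = 0.0
    for i, v in enumerate(ordered):
        cumulative += v
        if cumulative >= coverage * total:
            if i + 1 < len(ordered):
                # halfway between the upper group's smallest and the lower group's largest record
                return (v + ordered[i + 1]) / 2
            return 0.0  # assumption: every record is needed, so every record is important
    return 0.0  # only reachable through floating-point rounding
```

With values 5000, 3300, 1000, 500 and 200, the two largest records cover 83% of the total, so the threshold is (3300 + 1000)/2 = 2150.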

Classification by fictional value
- The statistical value can be erroneous
- The fictional value (highest potential value) is less vulnerable to errors
- The fictional value is the maximum of:
  - the statistical value
  - the average statistical value per supplementary unit multiplied by the supplementary unit
  - the average statistical value per net mass multiplied by the net mass
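The fictional value translates directly into a maximum over the three candidates. In this sketch the average ratios are assumed to come from the reference data of the record's CN8 code, and the supplementary-unit term is skipped when the code has no supplementary unit; the function and parameter names are hypothetical.

```python
from typing import Optional

def fictional_value(stat_value: float,
                    net_mass: float,
                    avg_value_per_kg: float,
                    supp_unit: Optional[float] = None,
                    avg_value_per_supp_unit: Optional[float] = None) -> float:
    """Highest potential value of a record (maximum of the three candidates)."""
    candidates = [
        stat_value,                   # the declared statistical value
        avg_value_per_kg * net_mass,  # value implied by the net mass
    ]
    # not every CN8 code has a supplementary unit
    if supp_unit is not None and avg_value_per_supp_unit is not None:
        candidates.append(avg_value_per_supp_unit * supp_unit)
    return max(candidates)
```

A record declared with a statistical value of 50 but a net mass of 100 kg, for a code averaging 48 per kg, gets a fictional value of 4800, so a mistyped value cannot hide an important record.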

Selective editing: Validation checks
- The data records are compared with reference data in order to find errors and to prioritize them
- The reference data and validation rules are managed with the tool “BASE PL-Editor”
- The validation rules and the structure of the reference data are implemented in the ASA system via an XML file
- Distinction between (definite) errors and possible errors

Errors:
- Invalid codes
- Very unusual unit price
- Invalid combinations

Possible errors:
- Unlikely partner countries etc.
- Unlikely unit price, value
- Unlikely combinations
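A minimal sketch of how a record could be checked against reference data and sorted into errors and possible errors. The reference data structure, the two unit-price ranges (a wide "hard" range for errors, a narrow "soft" range for possible errors), the country list and the CN8 code are all hypothetical; combination checks are left out to keep the example short.

```python
VALID_COUNTRIES = {"FR", "NL", "PL", "IT", "CN", "US"}  # hypothetical excerpt

REFERENCE = {
    "12345678": {                      # hypothetical CN8 code
        "hard_price": (0.5, 500.0),    # value per kg; outside -> error
        "soft_price": (2.0, 50.0),     # value per kg; outside -> possible error
        "likely_partners": {"FR", "NL", "PL"},
    }
}

def validate(record, reference=REFERENCE, countries=VALID_COUNTRIES):
    """Return (errors, possible_errors) for one record."""
    errors, possible = [], []
    ref = reference.get(record["cn8"])
    if ref is None:
        errors.append("invalid CN8 code")
    if record["partner"] not in countries:
        errors.append("invalid partner country code")
    if record["net_mass"] <= 0:
        errors.append("invalid net mass")
    if errors:
        return errors, possible  # invalid codes: no further checks possible
    price = record["value"] / record["net_mass"]
    lo, hi = ref["hard_price"]
    if not lo <= price <= hi:
        errors.append("very unusual unit price")
    else:
        lo, hi = ref["soft_price"]
        if not lo <= price <= hi:
            possible.append("unlikely unit price")
    if record["partner"] not in ref["likely_partners"]:
        possible.append("unlikely partner country")
    return errors, possible
```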


Selective editing: Automated correction
Deterministic error correction
- If–then correction rules
- Effective method, provided there is a strong correlation between variables (for example, CN8 code and mode of transport)
- Typical errors (for example, a numerical country code instead of the ISO alpha code)
- Numerical variables: the supplementary unit and net mass are corrected using the statistical value and the average ratio
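The three kinds of deterministic correction can be sketched as plain if–then rules. The ISO 3166-1 numeric-to-alpha-2 excerpt is real, but the transport rule (CN8 code "12345678" always travelling with mode code "3"), the record layout and the function name are hypothetical, and the numeric variables are only re-derived when an earlier validation step has flagged them.

```python
# Excerpt of ISO 3166-1: numeric code -> alpha-2 code
NUMERIC_TO_ALPHA2 = {"250": "FR", "276": "DE", "380": "IT", "528": "NL", "616": "PL"}

# Hypothetical if-then rule: goods of this CN8 code always use this mode of transport
TRANSPORT_RULES = {"12345678": "3"}

def correct_record(rec, avg_value_per_kg, avg_value_per_supp_unit=None,
                   mass_flagged=False, supp_flagged=False):
    """Return a corrected copy of `rec` (a dict); the input is left unchanged."""
    out = dict(rec)
    # Typical error: numerical country code instead of the ISO alpha code
    code = str(out.get("partner", ""))
    if code.isdigit() and code.zfill(3) in NUMERIC_TO_ALPHA2:
        out["partner"] = NUMERIC_TO_ALPHA2[code.zfill(3)]
    # If-then rule: missing mode of transport derived from the CN8 code
    if not out.get("mode") and out.get("cn8") in TRANSPORT_RULES:
        out["mode"] = TRANSPORT_RULES[out["cn8"]]
    # Numerical variables: re-derived from the statistical value and the average ratio
    if mass_flagged and avg_value_per_kg > 0:
        out["net_mass"] = out["value"] / avg_value_per_kg
    if supp_flagged and avg_value_per_supp_unit:
        out["supp_unit"] = out["value"] / avg_value_per_supp_unit
    return out
```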

Hot-deck error correction
- Correcting erroneous micro data by imputing values from error-free micro data (donor records)
- Only categorical variables
- Nearest-neighbour approach for donor determination:
  - Calculation of the distance between the records
  - Weighting of the variables
  - In most cases a donor with the same CN8 code
- Avoiding outliers as donors
- Considering the impact on the donor's result

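Hot-deck: donor determination

[Figure: an erroneous record (values A, B, C for variables 1 to 3) compared with three potential donors; each variable carries a weight w_k, and the donor with the smallest distance supplies the values for the corrected record]

Distance between records X and Y: $d_{XY} = \sum_k w_k \, |x_k - y_k|$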
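A sketch of nearest-neighbour donor determination for categorical variables. It assumes that the difference term for a categorical variable is 0 when the categories agree and 1 otherwise, so the distance is the sum of the weights of the disagreeing variables, computed only over the variables that are not being imputed. Outliers are excluded and same-CN8 donors are preferred, as on the slide; the variable names and weights are hypothetical, and the impact on the donor's result is not modelled.

```python
def distance(a, b, variables, weights):
    """Weighted count of the categorical variables on which a and b disagree."""
    return sum(weights[v] for v in variables if a[v] != b[v])

def hot_deck(record, erroneous, donors, weights):
    """Replace the erroneous variables of `record` with the nearest donor's values.

    record, donors: dicts with a "cn8" key plus the categorical variables
    erroneous: names of the variables to impute
    weights: {variable: weight} for all categorical variables
    """
    compare = [v for v in weights if v not in erroneous]
    # avoid outliers as donors
    candidates = [d for d in donors if not d.get("outlier", False)]
    # prefer donors with the same CN8 code; fall back to all candidates
    same_cn8 = [d for d in candidates if d["cn8"] == record["cn8"]]
    pool = same_cn8 or candidates
    if not pool:
        raise ValueError("no admissible donor")
    best = min(pool, key=lambda d: distance(record, d, compare, weights))
    corrected = dict(record)
    for v in erroneous:
        corrected[v] = best[v]
    return corrected, best
```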

The ASA system: Outlier detection

Outlier detection
- Comparison of the current results with the results of the previous 12 months
- Outliers are highlighted by the Acceptance Factor: (current value − mean value) / standard deviation
- Detailed results at CN8 level:
  - CN8 result
  - Partner country result
  - Statistical value, net mass, supplementary unit and their ratios
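At result level, the same Acceptance Factor is applied to aggregates instead of single enterprises. This sketch aggregates records per CN8 code or per CN8 code and partner country, then compares statistical value, net mass and value per kg with the previous months. The cut-off of 3, skipping series with zero variance, and leaving out the supplementary unit are simplifying assumptions; the names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

def aggregate(records, by_partner=True):
    """Sum statistical value and net mass per CN8 code (and partner country)."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for r in records:
        key = (r["cn8"], r["partner"]) if by_partner else r["cn8"]
        totals[key][0] += r["value"]
        totals[key][1] += r["net_mass"]
    return {k: tuple(v) for k, v in totals.items()}

def indicators(value, mass):
    out = {"value": value, "net_mass": mass}
    if mass > 0:
        out["value_per_kg"] = value / mass  # ratio of the two variables
    return out

def detect_outliers(current, history, limit=3.0):
    """current: {key: (value, mass)}; history: list of such dicts (e.g. 12 months).

    Returns (key, indicator, acceptance factor) for every |AF| > limit.
    """
    flagged = []
    for key, (value, mass) in current.items():
        now = indicators(value, mass)
        past = [indicators(*month[key]) for month in history if key in month]
        for name, x in now.items():
            series = [p[name] for p in past if name in p]
            if len(series) < 2:
                continue
            sd = stdev(series)
            if sd == 0:
                continue  # assumption: constant series are not assessed
            af = (x - mean(series)) / sd
            if abs(af) > limit:
                flagged.append((key, name, af))
    return flagged
```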

Outlier detection
