Data processing German foreign trade statistics

Katherine Jenny Thompson

ADVANCED ISSUES IN INTERNATIONAL TRADE IN GOODS STATISTICS, ESTP training course, 2-4 April 2014

Data processing

German foreign trade statistics
Up to 30 million records are processed per month, and first results are produced 40 days after the reference period, so highly efficient data processing is necessary. The statistical value is unevenly distributed: most records have only a limited effect on the results.

The ASA System

ASA: Data submission monitoring

Data submission monitoring
The enterprises that matter most for German foreign trade (the “Top 60”) are monitored individually: their data are checked for variance, and large deviations from the previous year or month are investigated. For all enterprises, unusual deviations are identified with the Acceptance Factor: (Current Value − Mean Value) / Std. Dev. Unusual values are then quickly corrected or confirmed.
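A minimal sketch of the Acceptance Factor check, assuming each enterprise's history is a list of earlier monthly values and using the sample standard deviation (the slide does not say which kind). The names `acceptance_factor` and `flag_unusual`, the cut-off `limit=3.0` and the data are hypothetical, not taken from the ASA system.

```python
import math
from statistics import mean, stdev


def acceptance_factor(current: float, history: list[float]) -> float:
    """Acceptance Factor = (current value - mean value) / standard deviation."""
    m = mean(history)
    sd = stdev(history)  # sample standard deviation; needs at least two values
    if sd == 0:
        # Flat history: no change is neutral, any change is maximally unusual.
        return 0.0 if current == m else math.copysign(math.inf, current - m)
    return (current - m) / sd


def flag_unusual(deliveries: dict[str, float],
                 histories: dict[str, list[float]],
                 limit: float = 3.0) -> dict[str, float]:
    """Return the enterprises whose |AF| exceeds a hypothetical limit."""
    flagged = {}
    for enterprise, current in deliveries.items():
        af = acceptance_factor(current, histories[enterprise])
        if abs(af) > limit:
            flagged[enterprise] = round(af, 2)
    return flagged


histories = {"A": [100, 110, 90, 105, 95], "B": [50, 52, 48, 51, 49]}
print(flag_unusual({"A": 104, "B": 80}, histories))  # → {'B': 18.97}
```

Flagged enterprises would then be contacted for a fast correction or confirmation of the value.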

Data submission monitoring: checking of data deliveries
Structural checks: data file format, field format, readability, statistics.
Delivery-specific checks on declaration attributes: form, flow, specific number, doublets (duplicate declarations).
Declaration-specific checks: tax number, serial errors.
Serial errors in the data declarations are processed; a data delivery with more than 250 errors is generally rejected. Declarations that pass the checks are approved for the main (micro-)data processing.
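The rejection rule can be sketched as follows. This is a minimal illustration, assuming declarations arrive as dictionaries; the field names, the flow encoding, the tax-number test and the doublet key are hypothetical stand-ins for the real ASA checks, and the structural file checks are not shown. Only the limit of 250 errors comes from the slide.

```python
MAX_ERRORS = 250  # from the slide: deliveries with more than 250 errors are generally rejected
VALID_FLOWS = {"arrival", "dispatch"}  # hypothetical flow encoding


def check_declaration(decl: dict, seen_keys: set) -> list[str]:
    """Return the errors found in one declaration (hypothetical rules)."""
    errors = []
    if decl.get("flow") not in VALID_FLOWS:
        errors.append("invalid flow")
    tax = decl.get("tax_number") or ""
    if not str(tax).isalnum():
        errors.append("invalid tax number")  # placeholder for the real tax-number check
    # Doublet check: the same declaration content delivered twice.
    key = tuple(decl.get(f) for f in ("tax_number", "flow", "cn8", "partner", "value"))
    if key in seen_keys:
        errors.append("doublet")
    seen_keys.add(key)
    return errors


def process_delivery(declarations: list[dict]) -> tuple[str, list[dict]]:
    """Reject the delivery if it has too many errors, else approve clean declarations."""
    seen: set = set()
    approved, n_errors = [], 0
    for decl in declarations:
        errors = check_declaration(decl, seen)
        n_errors += len(errors)
        if not errors:
            approved.append(decl)
    if n_errors > MAX_ERRORS:
        return "rejected", []
    return "accepted", approved


good = [{"tax_number": "DE123", "flow": "arrival", "cn8": "22042979",
         "partner": "FR", "value": v} for v in (100, 200, 300)]
status, approved = process_delivery(good + [dict(good[0])])  # one doublet
print(status, len(approved))  # → accepted 3
```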

The ASA system: Selective editing

The selective editing process
Capacity for manual correction is limited, so only the important data records are corrected manually. The vast majority of records have limited impact on the results; these less important records are corrected by automated procedures: rule-based procedures, a hot-deck procedure and a regression-based procedure.

Selective editing: threshold values
Records are prioritized with CN8-specific threshold values, so that results are of high quality for all commodity codes: the thresholds determine which micro data are important for the results. The highest potential value of a record (derived from the statistical value, the supplementary unit and the net mass) is compared with the threshold value of the respective CN8 code. Threshold values are calculated from the processed, error-free micro data of the previous 12 months.

Selective editing: threshold values (example)
[Figure labels: <25%, >75%]
Threshold (75%) for CN8 code 22042979, flow arrivals: (6000 + 2300) / 2 = 4150
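The slide gives only the result of the threshold calculation. One reading that reproduces it, and which is an assumption rather than the documented ASA rule: sort the error-free records of the previous 12 months for one CN8 code and flow by statistical value in descending order, take records until they cover 75% of the total value, and place the threshold halfway between the last record taken and the first record left out. `cn8_threshold` is a hypothetical name, and the input values are invented so that the sketch reproduces the slide's (6000 + 2300) / 2 = 4150.

```python
def cn8_threshold(values: list[float], share: float = 0.75) -> float:
    """Hypothetical threshold: midpoint between the smallest record needed to
    reach `share` of the total value and the largest record not needed."""
    if not values:
        raise ValueError("no error-free records for this CN8 code")
    ordered = sorted(values, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for i, value in enumerate(ordered):
        cumulative += value
        if cumulative >= share * total:
            if i + 1 < len(ordered):
                return (value + ordered[i + 1]) / 2
            return value
    return ordered[-1]  # only reachable through floating-point rounding


# Invented statistical values of earlier records (CN8 22042979, arrivals).
print(cn8_threshold([2300, 10000, 200, 6000, 1500, 8000]))  # → 4150.0
```

Here the three largest records carry 24000 of 28000 (over 75%), while the records below the threshold carry 4000 (under 25%).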

Classification by fictional value
Because the statistical value itself can be erroneous, records are classified by a fictional value (the highest potential value), which is less vulnerable to errors. The fictional value is the maximum of: the statistical value; the average statistical value per supplementary unit multiplied by the supplementary unit; and the average statistical value per net mass multiplied by the net mass.
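A minimal sketch of the fictional value and of how it could route a record to manual or automated editing, assuming the average values per kilogram and per supplementary unit come from the same error-free reference data as the thresholds. The record layout, the reference figures and the `>=` comparison are assumptions; only the 4150 threshold is the slide's example. The first record shows why the fictional value helps: its declared value looks small (perhaps a lost digit), but its quantities imply a value above the threshold, so it still goes to manual editing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    cn8: str
    stat_value: float
    net_mass: float
    supp_unit: Optional[float] = None  # not every CN8 code has a supplementary unit


def fictional_value(rec: Record, value_per_kg: float,
                    value_per_su: Optional[float]) -> float:
    """Highest potential value: max of the statistical value and the values
    implied by net mass and supplementary unit at average unit values."""
    candidates = [rec.stat_value, value_per_kg * rec.net_mass]
    if rec.supp_unit is not None and value_per_su is not None:
        candidates.append(value_per_su * rec.supp_unit)
    return max(candidates)


def route(rec: Record, reference: dict, thresholds: dict) -> str:
    """Send records whose fictional value reaches the CN8 threshold to manual editing."""
    ref = reference[rec.cn8]
    fv = fictional_value(rec, ref["value_per_kg"], ref.get("value_per_su"))
    return "manual" if fv >= thresholds[rec.cn8] else "automated"


reference = {"22042979": {"value_per_kg": 2.0, "value_per_su": 2.5}}  # invented averages
thresholds = {"22042979": 4150.0}
suspicious = Record("22042979", stat_value=400, net_mass=2000, supp_unit=1800)
small = Record("22042979", stat_value=300, net_mass=120, supp_unit=100)
print(route(suspicious, reference, thresholds), route(small, reference, thresholds))
# → manual automated
```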

Selective editing: validation checks
Data records are compared with reference data in order to find errors and to prioritize them. The reference data and validation rules are managed with the tool “BASE PL-Editor”; the validation rules and the structure of the reference data are implemented in the ASA system via an XML file. The checks distinguish between (definite) errors and possible errors.

Selective editing: validation checks
Errors: invalid codes, very unusual unit prices, invalid combinations.
Possible errors: unlikely partner countries etc., unlikely unit prices or values, unlikely combinations.
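Below is a sketch of how rules of this kind might split findings into errors and possible errors. In the ASA system the rules and reference data come from BASE PL-Editor via an XML file; here they are simply held in a Python dataclass, and every field, price range and rule is a hypothetical example. Records with definite errors would then be prioritized ahead of those with only possible errors.

```python
from dataclasses import dataclass


@dataclass
class ReferenceData:
    valid_cn8: set
    valid_countries: set
    usual_partners: dict      # cn8 -> partner countries seen in the reference data
    hard_price_range: dict    # cn8 -> (min, max) value per kg; outside = error
    soft_price_range: dict    # cn8 -> (min, max) value per kg; outside = possible error
    invalid_pairs: set        # (cn8, mode of transport) combinations that cannot occur


def validate(rec: dict, ref: ReferenceData) -> tuple[list[str], list[str]]:
    """Split findings into definite errors and possible errors."""
    errors, possible = [], []
    cn8 = rec["cn8"]
    if cn8 not in ref.valid_cn8:
        return ["invalid CN8 code"], possible  # later checks need a valid code
    if rec["partner"] not in ref.valid_countries:
        errors.append("invalid partner country")
    elif rec["partner"] not in ref.usual_partners.get(cn8, set()):
        possible.append("unlikely partner country")
    if (cn8, rec["mode"]) in ref.invalid_pairs:
        errors.append("invalid combination of CN8 code and mode of transport")
    if rec["net_mass"] > 0:
        price = rec["value"] / rec["net_mass"]  # unit price per kg
        lo, hi = ref.hard_price_range[cn8]
        if not lo <= price <= hi:
            errors.append("very unusual unit price")
        else:
            lo, hi = ref.soft_price_range[cn8]
            if not lo <= price <= hi:
                possible.append("unlikely unit price")
    else:
        errors.append("invalid net mass")
    return errors, possible


ref = ReferenceData(
    valid_cn8={"22042979"}, valid_countries={"FR", "IT", "ES", "CN"},
    usual_partners={"22042979": {"FR", "IT", "ES"}},
    hard_price_range={"22042979": (0.5, 50.0)},
    soft_price_range={"22042979": (1.0, 10.0)},
    invalid_pairs={("22042979", "7")},
)
print(validate({"cn8": "22042979", "partner": "CN", "mode": "3",
                "value": 40000, "net_mass": 2000}, ref))
# → ([], ['unlikely partner country', 'unlikely unit price'])
```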

The ASA system: Selective editing

Selective editing: automated correction
Deterministic error correction uses if-then correction rules. This is effective where there is a strong correlation between variables (for example, between the CN8 code and the mode of transport) and for typical errors (for example, a numerical code given instead of the ISO alpha code). For numerical variables, the supplementary unit and the net mass are corrected using the statistical value and the average ratio.
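A minimal sketch of deterministic correction, assuming records are dictionaries and that the average ratios come from error-free reference data. The country-code table is a small excerpt of ISO 3166-1, but applying it to a `partner` field, the single CN8-to-mode rule and the name `correct_record` are illustrative assumptions, not the ASA rule base.

```python
from typing import Optional

# Excerpt of ISO 3166-1 numeric -> alpha-2 codes (extend as needed).
ISO_NUMERIC_TO_ALPHA2 = {"040": "AT", "056": "BE", "250": "FR", "276": "DE",
                         "380": "IT", "528": "NL"}

# Hypothetical if-then rule: CN8 code -> the only plausible mode of transport.
# 27160000 (electrical energy) -> 7 (fixed transport installation) is illustrative.
MODE_BY_CN8 = {"27160000": "7"}


def correct_record(rec: dict, value_per_kg: float, value_per_su: Optional[float],
                   suspect: set[str]) -> dict:
    """Apply deterministic if-then corrections and return a corrected copy."""
    fixed = dict(rec)
    # Typical error: numerical country code given instead of the ISO alpha code.
    code = str(fixed.get("partner", ""))
    if code.isdigit() and code.zfill(3) in ISO_NUMERIC_TO_ALPHA2:
        fixed["partner"] = ISO_NUMERIC_TO_ALPHA2[code.zfill(3)]
    # Strongly correlated variables: the CN8 code determines the mode of transport.
    if fixed.get("cn8") in MODE_BY_CN8:
        fixed["mode"] = MODE_BY_CN8[fixed["cn8"]]
    # Numerical variables: derive suspect quantities from the statistical value
    # and the average ratio (value per kg, value per supplementary unit).
    if "net_mass" in suspect:
        fixed["net_mass"] = round(fixed["value"] / value_per_kg, 3)
    if "supp_unit" in suspect and value_per_su:
        fixed["supp_unit"] = round(fixed["value"] / value_per_su, 3)
    return fixed


print(correct_record({"partner": "276", "cn8": "22042979", "mode": "3",
                      "value": 4000, "net_mass": 2}, 2.0, 2.5, {"net_mass"}))
# → {'partner': 'DE', 'cn8': '22042979', 'mode': '3', 'value': 4000, 'net_mass': 2000.0}
```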

Selective editing: automated correction
Hot-deck error correction corrects erroneous micro data by imputing values from error-free micro data (donor records). It is used only for categorical variables. Donors are determined with a nearest-neighbour approach: the distance between records is calculated, with the variables weighted. In most cases the donor has the same CN8 code; outliers are avoided as donors, and the impact on the donor's result is taken into account.

Selective editing: automated correction, hot-deck donor determination
[Figure: variables 1 to 3, each with a weight w_k (the slide shows weights of 1 and 2); an erroneous record with values A, B, C; potential donors 1 to 3 with their distances; and the corrected record.]
Distance between records X and Y: $d_{XY} = \sum_k w_k \lvert x_k - y_k \rvert$
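A minimal sketch of the donor search, assuming records are dictionaries of categorical codes, that a categorical difference |x_k − y_k| counts as 0 for a match and 1 for a mismatch, and that outliers are flagged beforehand. Variable names, weights and records are invented; the check on the impact on the donor's result is not modelled.

```python
def distance(x: dict, y: dict, weights: dict[str, float]) -> float:
    """d_XY = sum_k w_k * |x_k - y_k|, taking |x_k - y_k| as 0 when the
    categories agree and 1 otherwise (assumption for categorical variables)."""
    return sum(w * (x[k] != y[k]) for k, w in weights.items())


def choose_donor(record: dict, donors: list[dict], weights: dict[str, float]) -> dict:
    """Nearest error-free donor, preferring the same CN8 code and skipping outliers."""
    candidates = [d for d in donors if not d.get("outlier", False)]
    same_cn8 = [d for d in candidates if d["cn8"] == record["cn8"]]
    pool = same_cn8 or candidates
    if not pool:
        raise ValueError("no usable donor")
    return min(pool, key=lambda d: distance(record, d, weights))  # first donor wins ties


def impute(record: dict, donor: dict, fields: list[str]) -> dict:
    """Copy the erroneous categorical fields from the donor."""
    corrected = dict(record)
    for field in fields:
        corrected[field] = donor[field]
    return corrected


weights = {"partner": 1, "region": 1, "transaction": 2}  # hypothetical variables
record = {"cn8": "22042979", "partner": "FR", "region": "BY", "transaction": "11", "mode": "?"}
donors = [
    {"cn8": "22042979", "partner": "FR", "region": "NW", "transaction": "11", "mode": "3"},
    {"cn8": "22042979", "partner": "IT", "region": "NW", "transaction": "11", "mode": "2"},
    {"cn8": "22042979", "partner": "FR", "region": "BY", "transaction": "11", "mode": "1",
     "outlier": True},
    {"cn8": "22042110", "partner": "FR", "region": "BY", "transaction": "11", "mode": "4"},
]
donor = choose_donor(record, donors, weights)
print(impute(record, donor, ["mode"])["mode"])  # → 3
```

The exact match (third donor) is skipped because it is flagged as an outlier, and the fourth is ignored because donors with the same CN8 code exist.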

The ASA system: Outlier detection

Outlier detection
Current results are compared with the results of the previous 12 months, and outliers are highlighted by the Acceptance Factor: (Current value − Mean value) / Std. dev. The comparison uses detailed results at CN8 level (the CN8 result and the partner-country result) for the statistical value, net mass, supplementary unit and their ratios.
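The same Acceptance Factor can be applied cell by cell to the aggregated results. A minimal sketch, assuming results are keyed by (CN8 code, partner country) and that the previous 12 months are available as a list of cells per key; the cut-off of 3.0 and the data are invented for illustration.

```python
from statistics import mean, stdev
from typing import Optional


def _ratio(num: float, den: float) -> Optional[float]:
    return num / den if den > 0 else None


def indicators(cell: dict) -> dict[str, Optional[float]]:
    """Statistical value, net mass, supplementary unit and their ratios."""
    return {
        "value": cell["value"],
        "net_mass": cell["net_mass"],
        "supp_unit": cell["supp_unit"],
        "value_per_kg": _ratio(cell["value"], cell["net_mass"]),
        "value_per_su": _ratio(cell["value"], cell["supp_unit"]),
    }


def highlight_outliers(current: dict, previous_12: dict, limit: float = 3.0) -> list:
    """current: {(cn8, partner): cell}; previous_12: {(cn8, partner): [cell, ...]}."""
    flags = []
    for key, cell in current.items():
        history = [indicators(c) for c in previous_12.get(key, [])]
        for name, x in indicators(cell).items():
            past = [h[name] for h in history if h[name] is not None]
            if x is None or len(past) < 2:
                continue
            sd = stdev(past)
            if sd == 0:
                continue  # no meaningful Acceptance Factor for a flat history
            af = (x - mean(past)) / sd
            if abs(af) > limit:
                flags.append((key, name, round(af, 1)))
    return flags


history = [{"value": 1000 + 10 * i, "net_mass": 500 + (i % 3) * 10,
            "supp_unit": 400 + 4 * i} for i in range(12)]
current = {("22042979", "FR"): {"value": 1100, "net_mass": 55, "supp_unit": 440}}
for key, name, af in highlight_outliers(current, {("22042979", "FR"): history}):
    print(key, name, af)
```

In this invented example the value and supplementary unit look normal, while the net mass and the value per kilogram are flagged, pointing to an error in the net mass.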
