EDIT – Eurostat’s editing tool


1 EDIT – Eurostat’s editing tool
SDMX - TWG Paris, Dec. 2012

2 EDIT - features
Eurostat's generic data editing and imputation tool.
The process relies on a scripting language capable of expressing complex rules;
Data and metadata are isolated into independent ‘Domains’;
Available in a stand-alone version, a server version and a freely accessible Web version protected by 1-way SSL and ECAS;
The Web version allows any registered user to edit statistical data without installing any software.

3 Some functional capabilities
Allows formats to be defined with an editor in the user interface;
Imports/exports metadata from/to external files;
Imports/exports from/to Oracle;
Imports auxiliary data (lookup datasets);
Accepts GESMES, CSV, SDMX-ML and flat files;
Executes programs on imported dataset instances:
  Data set operations;
  Editing and deterministic imputation;
  Outlier detection (TRAMO, Hidiroglou-Berthelot and sigma-gap).
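
Of the outlier methods named above, sigma-gap is the simplest to illustrate. The following is a minimal Python sketch, assuming the common formulation in which values are sorted and everything beyond the first gap wider than k standard deviations is flagged. The function name, the default k, the restriction to the upper tail and the use of the population standard deviation are all assumptions made for illustration; they are not taken from EDIT.

```python
from statistics import median, pstdev


def sigma_gap_outliers(values, k=2.0):
    """Flag upper-tail outliers with a simple sigma-gap rule.

    Assumption: sigma is the population standard deviation and the search
    for a gap starts above the median. Real tools may choose differently.
    """
    if len(values) < 2:
        return []
    ordered = sorted(values)
    sigma = pstdev(ordered)
    if sigma == 0:
        return []  # all values identical, nothing to flag
    med = median(ordered)
    for i in range(1, len(ordered)):
        gap = ordered[i] - ordered[i - 1]
        # The first sufficiently wide gap above the median cuts off the tail.
        if ordered[i] > med and gap > k * sigma:
            return ordered[i:]
    return []


flagged = sigma_gap_outliers([10, 11, 12, 13, 14, 100])
```

With this input only the value 100 lies beyond a gap wider than two standard deviations, so it is the only value flagged.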

4 Integration Capabilities
Integration with EDAMIS – Eurostat's Single Exchange Point (ongoing):
  detects incoming files and processes them in unattended mode;
  publishes validation results to the EDAMIS Back Channel;
Integrated with the Euro SDMX Registry:
  fetches DSDs into EDIT structures;
  loads code lists from the Registry.

5 EDIT main principles
Treatment of micro and macro data;
Scripting principle (symbols/placeholders);
Editing seen as computations;
Multi-dataset approach;
Cube approach in computations.

6 EDIT Terminology
Domain – isolated workspace inside EDIT;
Dataset Definition (Format) – defines the structure of the data; identified by name inside a Domain;
Dataset – collection of data rows that follow the structure of a Format;
Program – a set of operations to be performed on a specified Format;
Key set – a set of fields which uniquely identifies a row in a data set;
Partition – a subset of the data identified by a fixed subset of the key set;
Transposition – the key set fields which uniquely identify a row inside a partition.
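
The key set, partition and transposition terms can be sketched in Python as follows. This is an illustration only: the sample rows, the field names and the helper functions are hypothetical, and the reading of "transposition" as the key fields left over once the partition fields are fixed is an assumption based on the definitions above.

```python
from collections import defaultdict

# Hypothetical data set: one row per (COUNTRY, MONTH, PRODUCT).
rows = [
    {"COUNTRY": "FR", "MONTH": "2012-11", "PRODUCT": "A", "VALUE": 10},
    {"COUNTRY": "DE", "MONTH": "2012-11", "PRODUCT": "A", "VALUE": 4},
    {"COUNTRY": "FR", "MONTH": "2012-11", "PRODUCT": "B", "VALUE": 7},
]

KEY_SET = ("COUNTRY", "MONTH", "PRODUCT")
PARTITION_KEYS = ("MONTH", "PRODUCT")  # fixed subset of the key set
# Assumed: the remaining key fields identify a row inside a partition.
TRANSPOSITION_KEYS = tuple(k for k in KEY_SET if k not in PARTITION_KEYS)


def key_of(row, fields):
    """Project a row onto the given key fields."""
    return tuple(row[f] for f in fields)


def has_unique_keys(data, fields):
    """True when no two rows share the same values for `fields`."""
    keys = [key_of(r, fields) for r in data]
    return len(keys) == len(set(keys))


def partition(data, fields):
    """Group rows that share the same values for the partition fields."""
    groups = defaultdict(list)
    for r in data:
        groups[key_of(r, fields)].append(r)
    return dict(groups)


partitions = partition(rows, PARTITION_KEYS)
```

Here the three rows fall into two partitions, one per (MONTH, PRODUCT) pair, and COUNTRY alone distinguishes the rows inside each partition.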

7 Scripting Language Capabilities
Custom scripting language designed specifically for data editing;
Attempts to be as simple as possible while remaining flexible enough to fit the requirements of any known or analysed domain;
Programs describe the rules and are composed of a set of steps with inputs and outputs;
Drawbacks:
  Programs are difficult for non-programmers to write;
  Does not follow a pure cube approach.

8 Working with EDIT
Define a format (input file characteristics);
Write a program composed of:
  Field rules – treating a single cell;
  Horizontal rules – at the level of records;
  Vertical rules – a subset of the data set is seen as transposed;
  Hierarchical rules – on two or more interlinked datasets;
  Dataset operations;
Import the data set instance;
Import auxiliary data (other datasets, lookup tables);
Establish or import program parameters;
Execute a job = run the program against all the imported or set data.
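
The idea of a program as an ordered set of steps that is run against imported data can be sketched in Python as below. This is not how EDIT is implemented; the step functions, the field names and `execute_job` are hypothetical names chosen for illustration.

```python
def normalise_country(rows):
    """Field rule (hypothetical): clean a single cell in every row."""
    return [{**r, "COUNTRY": r["COUNTRY"].strip().upper()} for r in rows]


def fill_value(rows):
    """Horizontal rule (hypothetical): derive VALUE from the same record."""
    return [
        {**r, "VALUE": r["PRICE"] * r["QUANTITY"]} if r["VALUE"] is None else r
        for r in rows
    ]


# A program is modelled here as an ordered list of steps; each step takes a
# data set (list of rows) as input and returns a new data set as output.
PROGRAM = [normalise_country, fill_value]


def execute_job(program, rows):
    """Run every step of the program, in order, against the imported rows."""
    for step in program:
        rows = step(rows)
    return rows


imported = [{"COUNTRY": " fr ", "PRICE": 2, "QUANTITY": 3, "VALUE": None}]
result = execute_job(PROGRAM, imported)
```

Each step returns a new list rather than modifying its input, so the imported data stays available for comparison after the job has run.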

9 Rules - examples
1. RECORD FL171 {
     CONDITION (NOT isNull (A1bis)) -> inLookup (A1bis, NACE, "CODE");
     ERRMSG "Rule FL171 failed for field [A1bis]: NACE rev 1.1" SEVERITY "Warning" (A1bis) ;
   }
2. RECORD pureRecord { PRICE := 20; }
3. RECORD conditionalRecord {
     CONDITION isNull(VALUE);
     THEN { VALUE := PRICE * QUANTITY; }
     ELSE { PRICE := VALUE / 5; QUANTITY := VALUE / PRICE; }
   }
4. VERTICAL pureVertical {
     EXPRESSION {
       KEYS COUNTRY, CTYPE, MONTH, PRODUCT; // dimensions – now the data set is a cube
       TRKEYS COUNTRY; // divide the cube in sub-cubes by country
       VALUE['TOTAL'] := nvl(VALUE['TOTAL'],0);
     }
   }
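
For readers unfamiliar with the scripting language, rules 1 and 3 above can be approximated in Python as follows. The mapping is an assumption: `->` is read as logical implication, `isNull` as a test for `None`, and the NACE lookup is replaced by a small hypothetical set of codes. The function names are illustrative only.

```python
# Hypothetical stand-in for the NACE lookup dataset and its "CODE" column.
NACE_CODES = {"01.11", "15.11"}


def check_fl171(record, lookup=NACE_CODES):
    """Rule 1: if A1bis is filled in, it must appear in the lookup.

    "P -> Q" is evaluated as "(not P) or Q".
    """
    value = record.get("A1bis")
    return value is None or value in lookup


def conditional_record(record):
    """Rule 3: derive VALUE when it is missing, else derive PRICE and QUANTITY."""
    out = dict(record)
    if out.get("VALUE") is None:
        out["VALUE"] = out["PRICE"] * out["QUANTITY"]
    else:
        out["PRICE"] = out["VALUE"] / 5
        # Note: fails with ZeroDivisionError when VALUE is 0, as PRICE is then 0.
        out["QUANTITY"] = out["VALUE"] / out["PRICE"]
    return out


derived = conditional_record({"VALUE": None, "PRICE": 2, "QUANTITY": 3})
split = conditional_record({"VALUE": 20, "PRICE": None, "QUANTITY": None})
```

A failed `check_fl171` would correspond to the warning message attached to rule FL171; the sketch returns only the boolean outcome and leaves reporting out.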

10 Editing rules - overview
1. Cell level rules may involve determining:
  whether the entry of any cell is an invalid blank;
  whether the recorded entries are among a set of valid codes for the cell.
2. Horizontal validation rule – at the level of a record; usually specified on the basis of extensive knowledge of the subject matter of the data. Example: combinations of fields which are jointly unacceptable.
3. Vertical validation rule – involves a data integrity check for entries across a collection of related records. Examples:
  the total number of imports for a given product is equal to the sum of imports from individual countries;
  the stock value at the beginning of a month is equal to the closing stock of the previous month.
4. Hierarchical validation rule – checks involving one or more datasets hierarchically interlinked.
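
The first vertical example (total imports equal to the sum over individual countries) can be sketched in Python as below. The row layout, the use of a COUNTRY value of "TOTAL" to mark the total row, and the function name are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical import figures; COUNTRY == "TOTAL" marks the reported total.
rows = [
    {"MONTH": "2012-11", "PRODUCT": "A", "COUNTRY": "FR", "VALUE": 10},
    {"MONTH": "2012-11", "PRODUCT": "A", "COUNTRY": "DE", "VALUE": 5},
    {"MONTH": "2012-11", "PRODUCT": "A", "COUNTRY": "TOTAL", "VALUE": 15},
    {"MONTH": "2012-11", "PRODUCT": "B", "COUNTRY": "FR", "VALUE": 4},
    {"MONTH": "2012-11", "PRODUCT": "B", "COUNTRY": "DE", "VALUE": 4},
    {"MONTH": "2012-11", "PRODUCT": "B", "COUNTRY": "TOTAL", "VALUE": 9},
]


def check_totals(data, tolerance=1e-9):
    """Return the (MONTH, PRODUCT) groups whose TOTAL row differs from
    the sum of the individual country rows."""
    totals = {}
    sums = defaultdict(float)
    for r in data:
        part = (r["MONTH"], r["PRODUCT"])
        if r["COUNTRY"] == "TOTAL":
            totals[part] = r["VALUE"]
        else:
            sums[part] += r["VALUE"]
    return sorted(p for p, t in totals.items() if abs(t - sums[p]) > tolerance)


failing = check_totals(rows)
```

Product A passes (10 + 5 = 15), while product B is reported because its total of 9 does not match 4 + 4.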

11 Eurostat's Meta-Language
SDMX - TWG Paris, Dec. 2012

12 VIP on Validation (VIP = Vision Infrastructure Project)
It is an ongoing project;
Main task of the project: organize and optimize data editing among Member States (MSs) and Eurostat for ESS data collections;
Main deliverables: a set of standards including a common Meta Language;
Additionally, a GUI-driven IT system capable of generating rule sets in a meta language, and standardised documentation of the validation rules understandable by business users.

13 The need for a common language
A formal, unambiguous language was needed for encoding rules so that they can be translated into other existing syntaxes;
This can help create a more efficient production chain with responsibilities clearly assigned to the different actors.

14 Accompanying tools
Guidelines for selecting a set of rules to ensure a defined minimum standard of quality for the data exchanged;
Guidelines for the assignment of responsibility in the data editing chain (Member States and Eurostat);
User requirements and functional specifications for:
  a tool to edit and monitor compliance;
  a tool to specify the rules.

15 Scope of the Meta-Language (ML)
A formal, unambiguous language for encoding validation rules;
Friendly to statisticians – where possible, rules are expressed in a human-understandable way;
Able to treat both micro data and aggregate data;
Allows exchange of validation rules between organizations;
Able to work with cubes and with bi-dimensional data sets (e.g. micro-data).

16 The Meta Language (ML) – under development
Information model:
  The simpler the information model, the more flexible the language;
  Data model = bi-dimensional data sets consisting of rows and columns;
  Allows working with all types of incoming files;
Operators/functions/calculations:
  Oriented towards statistical needs;
  Act on data model objects = data sets;
  Allow expression of logical operators and computations.

17 Documentation of the rules
The same operator/function/expression may be used to express different statistical meanings;
A rule is documented by:
  a list of the ML expressions used;
  a set of parameters;
  documentation – a text provided by the user when implementing the rule.
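
The three parts of a documented rule could be held in a small record like the Python sketch below. The class, its field names and the placeholder expression are hypothetical; the slides do not give the ML syntax, so no real ML expression is shown.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentedRule:
    """One documented validation rule (hypothetical structure)."""

    expressions: list[str]  # ML expressions used by the rule
    parameters: dict[str, object] = field(default_factory=dict)
    documentation: str = ""  # free text supplied by the user


# Placeholder only: the actual ML syntax is not given in the slides.
SHARED_EXPRESSION = "<ML expression comparing a total with a sum>"

import_rule = DocumentedRule(
    expressions=[SHARED_EXPRESSION],
    parameters={"measure": "IMPORTS"},
    documentation="Total imports must equal the sum over individual countries.",
)
stock_rule = DocumentedRule(
    expressions=[SHARED_EXPRESSION],
    parameters={"measure": "STOCKS"},
    documentation="Total stocks must equal the sum over warehouses.",
)
```

Both rules reuse the same expression; only the parameters and the user-supplied text record what each one means statistically.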

