Survey Data Management and Combined Use of DDI and SDMX: DDI and SDMX Use Case, Labor Force Statistics

1 Survey Data Management and Combined Use of DDI and SDMX: DDI and SDMX Use Case, Labor Force Statistics

2 PROCESS SCENARIO

3 Eurostat Web Site: The two output tables are the focus of the processes

4 GSBPM Stages

5 Process Scenario: Survey/Register → Raw Data Set → (anonymization, cleaning, recoding, etc.) → Micro-Data Set / Public Use Files → (tabulation, processing, case selection, etc.) → Aggregate Data Set (Lower Level) → (aggregation, harmonization) → Aggregate Data Set (Higher Level) → (aggregation, harmonization) → Indicators. DDI covers the earlier stages and SDMX the later ones; the aggregate structure is described by both the DDI NCube and the SDMX DSD.

6 PROCESS STAGES

7 Stage 1: Input Data Received. Survey and Unit Record Conceptual Model: – Survey is targeted at a specific population – Survey comprises questions – A question may be linked to a Variable – A Variable has a conceptual meaning (Concept) – Valid responses are Categories – Survey output is a Unit Record Data Set
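
The conceptual model on this slide can be pictured as a handful of linked record types. The following is only an illustrative sketch: the class names mirror the slide's terms, but the field names, the `is_valid` helper, and the sample values are assumptions made for the example and are not DDI element names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Concept:
    """Conceptual meaning behind a variable."""
    name: str


@dataclass(frozen=True)
class Category:
    """One valid response, with its code and label."""
    code: str
    label: str


@dataclass
class Variable:
    name: str
    concept: Concept
    categories: List[Category] = field(default_factory=list)

    def is_valid(self, code: str) -> bool:
        # A response is valid only if it matches one of the categories.
        return any(c.code == code for c in self.categories)


@dataclass
class Question:
    text: str
    variable: Optional[Variable] = None  # a question *may* be linked to a variable


@dataclass
class Survey:
    population: str
    questions: List[Question] = field(default_factory=list)


# Hypothetical sample content, for illustration only.
sex = Variable("SEX", Concept("Sex of respondent"),
               [Category("M", "Male"), Category("F", "Female")])
survey = Survey("Residents aged 15-74", [Question("What is your sex?", sex)])

# One row of the unit-record data set produced by the survey.
unit_record = {"SEX": "F"}
```

Each unit record is then just a mapping from variable names to category codes, which is what the later processing stages operate on.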

8 Stage 2: Data Processing and Cleaning. Editing Process can be a variety of functions: – Validation – Outlier trimming – Recoding – Edit for non-response. Comprises: – Description of the process – Program code used
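
As a rough illustration of the editing functions listed on this slide, the sketch below implements one small function per operation. All function names, variable names, codes, and thresholds are hypothetical; the slides do not show the actual program code used.

```python
def validate(record, valid_codes):
    """Return the names of variables whose value is not an allowed code."""
    return [name for name, codes in valid_codes.items()
            if record.get(name) not in codes]


def trim_outlier(value, lower, upper):
    """Clamp a numeric value into the range [lower, upper]."""
    return max(lower, min(value, upper))


def recode(value, mapping):
    """Replace a code using a mapping; unmapped codes pass through unchanged."""
    return mapping.get(value, value)


def edit_non_response(value, fill):
    """Substitute a fill code when the response is missing."""
    return fill if value is None else value


# Hypothetical unit record and rules.
record = {"SEX": "2", "HOURS": 130, "EMPSTAT": None}

cleaned = {
    "SEX": recode(record["SEX"], {"1": "M", "2": "F"}),
    "HOURS": trim_outlier(record["HOURS"], 0, 98),
    "EMPSTAT": edit_non_response(record["EMPSTAT"], "NRP"),
}

problems = validate(cleaned, {"SEX": {"M", "F"}, "EMPSTAT": {"EMP", "UNE", "NRP"}})
```

Keeping each operation as a separate, named function makes it straightforward to store a description and the code itself alongside the data, which is what the slide asks of the process documentation.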

9 Stage 3: Data Derivation. Survey and Unit Record Conceptual Model: – New Variables are created from existing Variables or from Concepts – There may be new Classifications (codes, categories) – Need a description and the program code that derives the new Variables
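
A derivation of this kind might look like the sketch below, which creates a new variable with its own classification from an existing one and records a description next to the code. The variable names, age bands, and the shape of the `DERIVATION` record are assumptions for illustration only.

```python
# New classification introduced by the derivation (hypothetical codes and bands).
AGE_GROUPS = {
    "Y15-24": (15, 24),
    "Y25-54": (25, 54),
    "Y55-74": (55, 74),
}


def derive_age_group(age):
    """Map a single-year age to an age-group code, or None if out of scope."""
    for code, (low, high) in AGE_GROUPS.items():
        if low <= age <= high:
            return code
    return None


# Description kept alongside the code so the derivation can be documented.
DERIVATION = {
    "target": "AGE_GROUP",
    "sources": ["AGE"],
    "description": "Group single-year age into three bands.",
    "code": derive_age_group.__name__,
}

record = {"AGE": 30}
record[DERIVATION["target"]] = derive_age_group(record["AGE"])
```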

10 Stage 4: Tabulation. The dimensional structure maps to the DDI NCube and the SDMX DSD. DDI: – NCube describes the structure and the provenance back to the Variables, etc. SDMX: – Data is published as an SDMX Data Set – DSD describes the dissemination structure – DSD can also describe the NCube structure – Structure Map can describe the mapping between the two – Applications can link back from SDMX structures to DDI structures – SDMX data can link back to Variables, data collection, etc.

11 Stage 5: Dissemination (SDMX). – Data Set references a Dataflow, DSD, or Provision Agreement; this identifies the structure (DSD) – Provision Agreement also identifies the Data Provider – Category Scheme supports “drill down” data discovery – Constraint contains the actual keys and Dimension values present in the data source – Application now has all of the metadata required to query for and process (e.g. visualise) the data

12 DDI PROCESSING AND STRUCTURES

13 Describing Unit-Record Data Sets in DDI [DEMO]

14 Describing Processes in DDI. In our example we have several types of processing: – Recoding – Validation and editing – Derivation of new variables. In DDI, these are described as “Processing Events”

15 Describing Processes in DDI (Continued). The Processing Event element is part of the “Data Collection” module, but is also used for describing processing later in the data lifecycle. A Processing Event can be: – Control operation – Cleaning operation – Weighting – Coding

16 Describing Processes in DDI (Continued). These elements allow for a description of the event and either a link to, or the direct expression of, the processing “code” (SAS, SPSS, Java, etc.) used to perform the process. The Coding element is divided into: – General Instruction: a generic process description – Derivation Instruction: for deriving new variables. These link to the variables used in the process

17 Tabulation in DDI. DDI describes dimensionalized data sets as “NCubes”. This is very similar to an SDMX DSD except: – The values are addressed using references to variables in a unit-record data set – Calculations of measures can be described in detail (dependent and independent variables, computation, etc.). This means that the actual process of tabulation can be described
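
To make the idea of tabulation from unit records concrete, here is a minimal sketch that aggregates records into cells keyed by dimension values. It is not DDI or SDMX code: the `tabulate` function, the variable names, and the use of a weighted count as the measure are all assumptions chosen for the example.

```python
from collections import defaultdict


def tabulate(records, dimensions, weight=None):
    """Aggregate unit records into cube cells keyed by dimension values.

    Each cell holds a count, or a sum of weights when `weight` names
    a weighting variable present in every record.
    """
    cube = defaultdict(float)
    for rec in records:
        key = tuple(rec[d] for d in dimensions)
        cube[key] += rec[weight] if weight else 1.0
    return dict(cube)


# Hypothetical unit records after cleaning and derivation.
records = [
    {"SEX": "F", "AGE_GROUP": "Y25-54", "WEIGHT": 1.5},
    {"SEX": "F", "AGE_GROUP": "Y25-54", "WEIGHT": 2.0},
    {"SEX": "M", "AGE_GROUP": "Y15-24", "WEIGHT": 1.0},
]

cube = tabulate(records, ["SEX", "AGE_GROUP"], weight="WEIGHT")
counts = tabulate(records, ["SEX"])
```

Because the cell keys are built from the unit-record variables, the link from each dimension back to its source variable stays explicit, which is the property the slide attributes to the NCube description.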

18 DDI NCUBE MAP TO SDMX DSD

19 DDI NCube Model

20 SDMX DSD Model

21 DDI NCube to SDMX DSD Model Map

22 DDI Representation to SDMX Representation Model Map

23 DDI Data (CSV Describable by DDI NCube Format). Note that the column names are not used (these are just for viewing). The columns are mapped to the Variable Id in the NCube and to the Component (Dimension, Data Attribute, Primary Measure) Id in SDMX

24 DDI NCube Data Set Model. Fundamentally, the Physical Location describes the CSV format. The CSV file can either be converted to SDMX-ML using data readers and data writers or loaded directly into a database using an appropriate data reader. In both cases, the map of the Dimension and Attribute Ids to the CSV columns, and the Id of the Dataflow, will need to be passed to the Data Reader so that it can verify the data content against the relevant DSD.
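
The role of the column map can be sketched as follows. This is a simplified stand-in for a data reader, not the tooling used in the presentation: the function `read_csv`, the positional `COLUMN_MAP`, the reduced `DSD` dictionary of allowed codes, and the sample data are all hypothetical.

```python
import csv
import io

# Hypothetical, heavily reduced DSD: allowed codes per coded component.
DSD = {"SEX": {"M", "F"}, "AGE": {"Y15-24", "Y25-54", "Y55-74"}}

# Position i in each CSV row maps to this component Id (header names are ignored).
COLUMN_MAP = ["SEX", "AGE", "OBS_VALUE"]


def read_csv(text, dataflow_id, column_map, dsd, has_header=True):
    """Read CSV rows positionally and check coded values against the DSD."""
    reader = csv.reader(io.StringIO(text))
    if has_header:
        next(reader, None)  # column names are for viewing only
    first_line = 2 if has_header else 1
    observations = []
    for line_no, row in enumerate(reader, start=first_line):
        if len(row) != len(column_map):
            raise ValueError(f"line {line_no}: expected {len(column_map)} columns")
        obs = dict(zip(column_map, row))
        for component, allowed in dsd.items():
            if obs[component] not in allowed:
                raise ValueError(
                    f"line {line_no}: {obs[component]!r} not valid for {component}")
        observations.append(obs)
    return {"dataflow": dataflow_id, "observations": observations}


sample = "Sex,Age group,Employed\nF,Y25-54,3.5\nM,Y15-24,1.0\n"
data_set = read_csv(sample, "EMPLY_SEX_AGE_NATION", COLUMN_MAP, DSD)
```

A row containing a code outside the DSD raises `ValueError`, which is the kind of check the slide describes the Data Reader performing before conversion or loading.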

25 Data Writers and Readers

26 SDMX STRUCTURES AND DATA DISCOVERY AND VISUALISATION

27 SDMX Structural Metadata: DSD LFS_STRUCTURE1; Dataflow EMPLY_SEX_OCC_EDUC; Dataflow EMPLY_SEX_AGE_NATION; Constraint; Constraint EMPLY_SEX_OCC_EDUC; Provision Agreement ES_EMPLY_SEX_AGE_NATION; Provision Agreement ES_EMPLY_SEX_OCC_EDUC; Data Provider ESTAT; Category Scheme ESTAT_TOPICS; Category LABOR; Category POPULATION; Category NAC; Category; Categorisation LAB_SEX_OCCC

28 Data Discovery: Registry Structures; Data Discovery GUI

29 User Data Selection: User Selection; Generated SDMX REST Query
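
The slide's generated query is not reproduced in the transcript. As a sketch of how a user selection could be turned into an SDMX REST data query, the function below follows the standard pattern of a `data` resource, a flow reference, and a dot-separated key in which `+` joins alternative values and an empty position means "all values". The base URL, the agency and version in the flow reference, and the dimension order are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_data_query(base_url, flow_ref, dimension_order, selection, **params):
    """Build an SDMX REST data query URL from a user selection.

    `selection` maps dimension Ids to lists of chosen codes; dimensions
    without a selection are left empty in the key (wildcard).
    """
    key = ".".join("+".join(selection.get(dim, [])) for dim in dimension_order)
    url = f"{base_url.rstrip('/')}/data/{flow_ref}/{key}"
    query = {name: value for name, value in params.items() if value is not None}
    return f"{url}?{urlencode(query)}" if query else url


# Hypothetical endpoint and dimension order; the dataflow Id is from slide 27.
query_url = build_data_query(
    "https://example.org/rest",
    "ESTAT,EMPLY_SEX_AGE_NATION,1.0",
    ["FREQ", "SEX", "AGE", "NATION"],
    {"FREQ": ["A"], "SEX": ["M", "F"]},
    startPeriod="2010",
)
```

In a real application the dimension order would come from the DSD and the selectable codes from the Constraint, as described in the dissemination stage.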

30 Pivot Table Built from Query Result

