ESCWA SDMX Workshop Session: SDMX and Data
Session Objectives
At the end of this session you will:
–Know the SDMX model of a data structure definition
–Understand the techniques to identify the structure of data
–Identify the concepts in a simple data set
–Be able to develop simple data structure definitions using SDMX tools
Data Set
Data Set: Structure
Data Set Structure
Computers need to know the structure of data in terms of:
–Concepts
–Code Lists
–Dimensionality
–Additional metadata
First: Identify the Concepts
A concept is "a unit of knowledge created by a unique combination of characteristics" (SDMX Information Model)
Data Set Structure: Concepts
Unit Multiplier, Unit, Topic, Time/Frequency, Country, Stock/Flow
Data Set Structure: Code Lists
CONCEPTS: Topic, Country, Flow
Code Lists
TOPIC: A Brady Bonds; B Bank Loans; C Debt Securities
COUNTRY: AR Argentina; MX Mexico; ZA South Africa
STOCK/FLOW: 1 Stock; 2 Flow
Data Makes Sense
16457
Q,ZA,B,1, =16457
Data Set Structure: Defining Multi-dimensional Structures
Comprises
–Concepts that identify the observation value (Dimensions)
–Concepts that add additional metadata about the observation value (Attributes)
–Concept that is the observation value (Measure)
–Any of these may be coded, text, date/time, number, etc. (Representation)
Data Set Structure: Concept Usage
Unit Multiplier (Attribute)
Unit (Attribute)
Topic (Dimension)
Time/Frequency (Dimension)
Country (Dimension)
Stock/Flow (Dimension)
Observation (Measure)
Data Structure Definition (model diagram)
A Data Structure Definition is made up of:
–Key: Dimensions, the concepts that identify the observation
–Group Key: concepts that identify groups of keys
–Attributes: concepts that add metadata
–Measures: concepts that are observed phenomenon
Each of these takes its semantic from a Concept and has a format given by a Representation, which is either Coded (has a Code List) or Non-coded.
Example shown: CONCEPTS Topic, Country, Flow; code list TOPIC: A Brady Bonds; B Bank Loans; C Debt Securities
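The model on this slide can be sketched in a few lines of Python. This is only an illustration of the relationships (component, role, concept, coded or non-coded representation), not the SDMX Information Model itself; the class names, the identifier EXAMPLE_DSD and the choice of components are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Role(Enum):
    DIMENSION = "dimension"  # concept that identifies the observation
    ATTRIBUTE = "attribute"  # concept that adds metadata
    MEASURE = "measure"      # concept that is the observed phenomenon


@dataclass
class CodeList:
    id: str
    codes: Dict[str, str]  # code -> description


@dataclass
class Component:
    concept: str                          # takes its semantic from a concept
    role: Role
    code_list: Optional[CodeList] = None  # None means a non-coded representation

    @property
    def is_coded(self) -> bool:
        return self.code_list is not None


@dataclass
class DataStructureDefinition:
    id: str
    components: List[Component] = field(default_factory=list)

    def by_role(self, role: Role) -> List[str]:
        """Return the concept names used in the given role, in declared order."""
        return [c.concept for c in self.components if c.role is role]


# Code lists taken from the earlier "Code Lists" slide.
topic = CodeList("TOPIC", {"A": "Brady Bonds", "B": "Bank Loans", "C": "Debt Securities"})
country = CodeList("COUNTRY", {"AR": "Argentina", "MX": "Mexico", "ZA": "South Africa"})

dsd = DataStructureDefinition(
    id="EXAMPLE_DSD",  # hypothetical identifier, not from the slides
    components=[
        Component("Topic", Role.DIMENSION, topic),
        Component("Country", Role.DIMENSION, country),
        Component("Time", Role.DIMENSION),       # non-coded (date/time)
        Component("Observation", Role.MEASURE),  # non-coded (number)
    ],
)

print(dsd.by_role(Role.DIMENSION))
```

Keeping the code list optional on the component mirrors the Coded / Non-coded split of the Representation box in the diagram.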
Data Makes Sense
16457
Q,ZA,B,1, =16457
Frequency,Country,Topic,Stock/Flow,Time=Observation
Quarterly, South Africa, Bank Loans, Stocks, 2nd quarter 1999
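As a rough illustration of how the code lists give the key its meaning, the sketch below decodes the four coded positions of the key Q,ZA,B,1. The code list contents come from the slides; the Frequency list holds only the one code that appears here, and the names CODE_LISTS, KEY_ORDER and decode_key are invented for this example. The time position is left out because the slides do not show its coded form.

```python
# Code lists from the "Code Lists" slide; Frequency is a partial list
# containing only the code used in the example key.
CODE_LISTS = {
    "Frequency": {"Q": "Quarterly"},
    "Country": {"AR": "Argentina", "MX": "Mexico", "ZA": "South Africa"},
    "Topic": {"A": "Brady Bonds", "B": "Bank Loans", "C": "Debt Securities"},
    "Stock/Flow": {"1": "Stock", "2": "Flow"},
}

# Order of the coded dimensions in the key, as on the slide.
KEY_ORDER = ["Frequency", "Country", "Topic", "Stock/Flow"]


def decode_key(key: str) -> dict:
    """Translate a comma-separated key into {concept: description}."""
    codes = [part.strip() for part in key.split(",")]
    if len(codes) != len(KEY_ORDER):
        raise ValueError(f"expected {len(KEY_ORDER)} codes, got {len(codes)}")
    decoded = {}
    for concept, code in zip(KEY_ORDER, codes):
        if code not in CODE_LISTS[concept]:
            raise ValueError(f"{code!r} is not a valid code for {concept}")
        decoded[concept] = CODE_LISTS[concept][code]
    return decoded


decoded = decode_key("Q,ZA,B,1")
print(decoded)
```

A code that is not in the list (for example "XX" for Country) is rejected, which is the constraint on the value domain that a code list provides.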
Identifying Concepts - Sources
–Existing data set tables (from website, from applications)
–Data Collection Instruments (questionnaires, Excel spreadsheets)
–Regulations, Handbooks, User Guides
  Labour Statistics Convention, 1985 (No. 160), Recommendation, 1985 (No. 170)
  Council Regulation No: 311/76/EEC of 09/02/1976; OJ: L039 of 14/02/1976; Compilation of statistics on foreign workers
–Database Tables
–Existing Data Structure Definitions (from other organisations)
Identify Concepts – from website Source: FAO proof of concept project Measurement = 1,000 Kg
Concepts
Reference Region; Commodity; Frequency and Time; Observation Value; Measure Type; Unit and Unit Multiplier (Measurement = 1,000 Kg)
Exercise: Identify Concept Role
Concept Role: Reminder
Dimensions
–Are the concepts that identify the observation value
Attributes
–Are the concepts that add additional metadata about the observation value
Measure
–Is the concept that is the observation value
Exercise: Concept Role
Reference Region (Dimension)
Commodity (Dimension)
Frequency and Time (Dimensions)
Observation Value (Measure)
Measure Type (Dimension)
Unit and Unit Multiplier, Measurement = 1,000 Kg (Attributes)
Data Set and Structure
Dimension Concept: FREQ, REF_AREA_REG, COMMODITY, MEASURE_TYPE, TIME
Measure Concept: OBS_VALUE
Attribute Concept: OBS_STATUS, OBS_CONF, UNIT, UNIT_MULTIPLIER
Identify/Define Code Lists
Purpose of a Code List
–Constrains the value domain of concepts when used in a structure like a data structure definition
–Defines a shortened, language-independent representation of the values
–Gives semantic meaning to the values, possibly in multiple languages
Agreeing on harmonised code lists is the most difficult aspect of defining a data structure definition
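The three purposes listed above can be illustrated with a small sketch: the codes constrain which values are allowed, the code itself is language independent, and each code can carry labels in more than one language. The class, the identifier CL_COUNTRY_EXAMPLE and the French labels are illustrative assumptions, not taken from an official code list.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CodeList:
    id: str
    labels: Dict[str, Dict[str, str]]  # code -> {language: label}

    def is_valid(self, code: str) -> bool:
        """A value is allowed only if it is one of the codes in the list."""
        return code in self.labels

    def label(self, code: str, lang: str = "en") -> str:
        """Return the label in the requested language, falling back to English."""
        names = self.labels[code]
        return names.get(lang, names["en"])


# Country codes from the earlier slide; French labels added for illustration.
countries = CodeList(
    id="CL_COUNTRY_EXAMPLE",  # hypothetical identifier
    labels={
        "AR": {"en": "Argentina", "fr": "Argentine"},
        "MX": {"en": "Mexico", "fr": "Mexique"},
        "ZA": {"en": "South Africa", "fr": "Afrique du Sud"},
    },
)

print(countries.is_valid("ZA"), countries.label("ZA"), countries.label("ZA", "fr"))
```

The data only ever carries the code ("ZA"); the label is looked up when the data is presented.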
Code Lists Required
Source: FAO proof of concept project
Reference Region; Commodity; Frequency; Measure Type; Unit and Unit Multiplier (Measurement = 1,000 Kg)
Code Lists
Code Lists (CL_)
For time series, the SDMX Cross-Domain Concepts recommend that all observations have a status code (Concept = OBS_STATUS) and a confidentiality code (Concept = OBS_CONF)
Data Structure Definition
Data Structure Definition - Reminder (model diagram)
–Key: Dimensions, the concepts that identify the observation
–Group Key: concepts that identify groups of keys
–Attributes: concepts that add metadata
–Measures: concepts that are observed phenomenon
Each takes its semantic from a Concept and has a format given by a Representation: Coded (has a Code List) or Non-coded.
Data Structure Definition - Agriculture (model diagram)
Data Structure Definition: AGRICULTURE_COMMODITY
Dimensions: FREQ, REF_AREA_REG, COMMODITY, MEASURE_TYPE, TIME
Code lists shown for the dimensions: CL_FREQ, CL_AREA_CTY, CL_COMMODITY, CL_MEASURE_ELEMENT
Attributes: OBS_STATUS, OBS_CONF, UNIT, UNIT_MULT
Code lists shown for the attributes: CL_OBS_STATUS, CL_OBS_CONF, CL_UNIT, CL_UNIT_MULT
Measure: OBS_VALUE
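One way to see what this structure does for a computer is to use it to split a flat record into its key, its observation value and its attributes. The sketch below assumes the component names shown on the slide; the function split_record and every value in the sample record are made up for illustration and do not come from the FAO data.

```python
# Component names as shown on the Agriculture slide.
DSD = {
    "id": "AGRICULTURE_COMMODITY",
    "dimensions": ["FREQ", "REF_AREA_REG", "COMMODITY", "MEASURE_TYPE", "TIME"],
    "measure": "OBS_VALUE",
    "attributes": ["OBS_STATUS", "OBS_CONF", "UNIT", "UNIT_MULT"],
}


def split_record(record: dict, dsd: dict = DSD):
    """Split a flat record into (key, observation value, attributes)."""
    dims = dsd["dimensions"]
    measure = dsd["measure"]
    known = set(dims) | {measure} | set(dsd["attributes"])

    # Every dimension and the measure must be present to identify a value.
    missing = [c for c in dims + [measure] if c not in record]
    if missing:
        raise ValueError(f"missing components: {missing}")

    # Anything not declared in the structure is rejected.
    unknown = sorted(set(record) - known)
    if unknown:
        raise ValueError(f"components not in the structure: {unknown}")

    key = tuple(record[d] for d in dims)
    attributes = {a: record[a] for a in dsd["attributes"] if a in record}
    return key, record[measure], attributes


# Placeholder values only; real codes would come from the code lists.
sample = {
    "FREQ": "A",
    "REF_AREA_REG": "REGION_X",
    "COMMODITY": "COMMODITY_X",
    "MEASURE_TYPE": "MEASURE_X",
    "TIME": "2005",
    "OBS_VALUE": 123.0,
    "UNIT": "KG",
    "UNIT_MULT": "3",
}

key, value, attributes = split_record(sample)
print(key, value, attributes)
```

This sketch does not check mandatory attributes or validate codes against the code lists; it only shows the split by concept role.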
© Metadata Technology SDMX and Data Formats Exercise: Identify Concepts
Exercise: Identify Concepts – from collection instrument Source: UNESCO Institute for Statistics
Data Entry - Table 2.1 Source: UNESCO Institute for Statistics
Data Entry - Table 2.2 Source: UNESCO Institute for Statistics
Exercise: Identify Dimension Concepts – from website Source: International Labour Organization
Identify Concepts: Table 2A Source: International Labour Organization
Identify Concepts: Table 2B Source: International Labour Organization
Identify Concepts: Table 2C Source: International Labour Organization
Identify Concepts: Table 2D Source: International Labour Organization
Identify Concepts: Table 2E Source: International Labour Organization
Dimension Concept
Identify Concepts: Table 2A: Reference Area, Sex, Time Period, Frequency, Measure Type
Identify Concepts: Table 2B: Economic Activity, Measure Type
Identify Concepts: Table 2C: Occupation, Measure Type
Identify Concepts: Table 2D: Status in Employment, Measure Type
Identify Concepts: Table 2E: Measure Type
Exercise: Identify Concepts – from collection instrument Source: UNESCO Institute for Statistics
Concepts: Time, Reference Area
Dimension Concepts - Tables 2.1/2.2 Source: UNESCO Institute for Statistics
Education Level, Sex, Institution Type, Measure Type, Work Mode, Programme Orientation
Labor Statistics: Data Structure Definition (Incomplete)
Education Statistics: Data Structure Definition (Incomplete)

Dimension Concept                           Representation
Frequency (FREQ)                            CL_FREQ
Reference Area (REF_AREA)                   CL_REF_AREA
Education level (EDUC_LEVEL)                CL_EDUCATLVTYP
Sex (SEX)                                   CL_SEX
Programme Orientation (PROG_ORIENTATION)    CL_PROG_ORIENTATION
Institution Type (INSTITUTION_TYPE)         CL_INSTITUTION_TYPE
Work Mode (WORK_MODE)                       CL_WORK_MODE
Measure Type (MEASURE_TYPE)                 CL_MEASURE_TYPE
Time (TIME)                                 Date/Time

Measure Concept                             Representation
Observation Value (OBS_VAL)                 Numeric
Education Statistics: Data Structure Definition (Incomplete)

Attribute Concept                       Assignment Status   Attachment    Representation
Observation Status (OBS_STATUS)         M(andatory)         Observation   CL_OBS_STATUS
Observation Confidentiality (OBS_CONF)  C(onditional)       Observation   CL_OBS_CONF
Unit (UNIT)                             M                   Series        CL_UNIT
Unit Multiplier (UNIT_MULTIPLIER)       M                   Series        CL_UNIT_MULT
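The assignment status and attachment level in this table can be read as a simple rule: at each level (Series or Observation), every mandatory attribute attached at that level must be supplied. A minimal sketch of that check follows; the dictionary layout, the function name missing_mandatory and the sample attribute values are assumptions made for the example.

```python
# Attribute definitions copied from the table above.
ATTRIBUTES = {
    "OBS_STATUS": {"status": "M", "attachment": "Observation"},
    "OBS_CONF": {"status": "C", "attachment": "Observation"},
    "UNIT": {"status": "M", "attachment": "Series"},
    "UNIT_MULTIPLIER": {"status": "M", "attachment": "Series"},
}


def missing_mandatory(level: str, supplied: dict) -> list:
    """List mandatory attributes attached at `level` that were not supplied."""
    return sorted(
        name
        for name, spec in ATTRIBUTES.items()
        if spec["attachment"] == level
        and spec["status"] == "M"
        and name not in supplied
    )


# Sample values are placeholders, not codes from the real code lists.
print(missing_mandatory("Series", {"UNIT": "UNIT_X"}))
print(missing_mandatory("Observation", {"OBS_STATUS": "STATUS_X"}))
```

Conditional attributes such as OBS_CONF are never reported as missing, since they may legitimately be absent.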
Identify Concepts from User Guide