Technical Overview of SDMX and DDI: Describing Microdata
Arofan Gregory, Metadata Technology

2 Outline
Background
Capabilities of SDMX for Describing Microdata and Related Information
  – Intended use
  – The nature of microdata
Capabilities of DDI for Describing Microdata and Related Information
Comparison
Criteria for Choosing a Standard

3 Background
There has been much discussion of how SDMX and DDI relate:
  – The UN/ECE SDMX-DDI Dialogue, a discussion involving users and members of both standards bodies
  – METIS and other conferences
  – The HLG and GSIM
To understand how a standard should be chosen, we need to understand the implications of our choices

4 Background (continued)
First, we must understand the capabilities of each standard, and whether it supports what we are trying to do
We must consider the implications for the IT infrastructure and tools used within the organization
We must understand the cost of adopting each standard in terms of staff and organizational capabilities

5 Points of Discussion
For time-series and aggregate data reported to international organizations, SDMX is seen as the best standard to use
  – But it is possible to describe aggregates in DDI
For describing questionnaires, DDI is the preferred standard in most cases
  – But it is possible to describe questionnaires using SDMX
For describing microdata sets, there is no simple choice: both standards are useful for certain microdata sets

6 Comparison
To compare the standards for particular purposes, we will look at the functionality each was designed to provide, and then consider the implications

7 SDMX Capabilities
SDMX is able to describe many types of data
  – Time series and cross-sectional aggregates
  – “Reference” metadata, in a highly configurable way (e.g., quality frameworks and methodological metadata)
  – Information for managing data exchange between counterparties
Data description is highly dimensional
  – Every data set is seen as having a dimensional structure that addresses each observation within the data set
  – Microdata, as well as aggregate data, can be modelled in a dimensionalized way
SDMX is designed to support specific types of microdata
  – Financial transactional registers
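
To make dimensional addressing concrete, here is a minimal Python sketch (not SDMX syntax and not any SDMX library API): each observation is stored under a key holding exactly one value per dimension. The dimension names and code values are hypothetical, and the observation values are placeholders, not real statistics.

```python
from typing import Dict, Tuple

# Hypothetical dimensions for an aggregate data set; their order fixes the key layout.
DIMENSIONS: Tuple[str, ...] = ("SEX", "AGE", "REGION", "TIME_PERIOD")

def make_key(**values: str) -> Tuple[str, ...]:
    """Return an observation key with exactly one value per dimension, in DIMENSIONS order."""
    missing = [d for d in DIMENSIONS if d not in values]
    unknown = [d for d in values if d not in DIMENSIONS]
    if missing or unknown:
        raise ValueError(f"missing={missing} unknown={unknown}")
    return tuple(values[d] for d in DIMENSIONS)

# Every observation is addressed by its full key (placeholder values).
observations: Dict[Tuple[str, ...], float] = {
    make_key(SEX="F", AGE="Y25-34", REGION="R1", TIME_PERIOD="2012"): 1.0,
    make_key(SEX="M", AGE="Y25-34", REGION="R1", TIME_PERIOD="2012"): 2.0,
}

print(observations[make_key(SEX="M", AGE="Y25-34", REGION="R1", TIME_PERIOD="2012")])  # 2.0
```

Because an incomplete key is rejected, every stored value is unambiguously located by its full set of dimension values.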

8 SDMX Capabilities (continued)
SDMX Reference metadata does not provide an explicit model of the metadata it can describe
  – You define the needed concepts
  – Concepts are arranged into a flat or hierarchical structure
  – Concepts are given suitable representations
But nothing in the SDMX specifications provides the model
  – This is provided by the using organization, and can be a standard (e.g., Eurostat’s quality frameworks)
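
As a rough illustration of “you define the concepts”, the Python sketch below models a user-defined concept scheme (concepts arranged hierarchically, each with a representation chosen by the using organization) and checks a metadata report against it. It is not the SDMX information model or any SDMX library; the class, function, and concept names (such as ACCURACY and SAMPLE_SIZE) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Concept:
    id: str
    representation: str  # "text" or "integer" in this sketch; chosen by the user, not the standard
    children: List["Concept"] = field(default_factory=list)

# Hypothetical value checks for the two representations used here.
CHECKS: Dict[str, Callable[[object], bool]] = {
    "text": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}

def flatten(concepts: List[Concept], prefix: str = "") -> Dict[str, str]:
    """Map each concept's dotted path (e.g. 'ACCURACY.SAMPLE_SIZE') to its representation."""
    paths: Dict[str, str] = {}
    for c in concepts:
        path = f"{prefix}.{c.id}" if prefix else c.id
        paths[path] = c.representation
        paths.update(flatten(c.children, path))
    return paths

def validate_report(report: Dict[str, object], scheme: List[Concept]) -> List[str]:
    """Return a list of problems; an empty list means the report fits the scheme."""
    allowed = flatten(scheme)
    problems: List[str] = []
    for path, value in report.items():
        if path not in allowed:
            problems.append(f"unknown concept: {path}")
        elif not CHECKS[allowed[path]](value):
            problems.append(f"bad value for {path}: {value!r}")
    return problems

# A hypothetical scheme: one flat concept and one small hierarchy.
SCHEME = [
    Concept("CONTACT", "text"),
    Concept("ACCURACY", "text", [Concept("SAMPLE_SIZE", "integer")]),
]

print(validate_report({"CONTACT": "Unit B1", "ACCURACY.SAMPLE_SIZE": "large"}, SCHEME))
# ["bad value for ACCURACY.SAMPLE_SIZE: 'large'"]
```

The model lives in SCHEME, not in the code: swapping in a different scheme changes what can be described, which mirrors the point that the using organization supplies the model.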

9 SDMX Capabilities (continued)
Questionnaires can be described as SDMX Reference metadata structures
  – The now-finished ESSnet project demonstrated this
  – But it was a very complicated use of this SDMX feature set
Methodological metadata can be expressed as SDMX Reference metadata
  – This works quite well, but is not necessarily “standard”

10 The Nature of Microdata
When we consider aggregate data, there are clear dimensions, sufficient to distinguish every observation in a data set
  – E.g., Percentage of Employment expressed as Sex by Age by Region
Microdata can also be described dimensionally
  – Any classificatory variable can act as a dimension
  – But each record also has a case identifier
  – The variables often hold different types of measures
Unlike aggregate data, microdata needs very few dimensions to identify an observation
  – All you need is the case identifier and the variable
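
The contrast can be sketched in Python: below, each value in a small, made-up microdata set is keyed only by (case identifier, variable name), whereas an aggregate observation would need a value for every dimension (Sex, Age, Region, and so on). The variable names and the CASE_ID identifier are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]

def to_observations(records: List[Record], id_var: str) -> Dict[Tuple[object, str], object]:
    """Key every non-identifier value by (case identifier, variable name)."""
    observations: Dict[Tuple[object, str], object] = {}
    for record in records:
        case_id = record[id_var]
        for variable, value in record.items():
            if variable != id_var:
                observations[(case_id, variable)] = value
    return observations

# Made-up records: SEX and REGION are classificatory; AGE and INCOME are numeric measures.
records = [
    {"CASE_ID": 101, "SEX": "F", "AGE": 34, "REGION": "R1", "INCOME": 41000.0},
    {"CASE_ID": 102, "SEX": "M", "AGE": 52, "REGION": "R2", "INCOME": 38500.0},
]

obs = to_observations(records, "CASE_ID")
print(obs[(102, "INCOME")])  # 38500.0
print(len(obs))              # 8 (2 cases x 4 variables)
```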

11 The Nature of Microdata (continued)
Microdata can also be described in a different way
  – As a rectangular table where variables are columns and cases are rows
  – This is a very common way to describe the structure of microdata
  – Many tools use this approach (SAS, SPSS, Stata, etc.)
  – This is a much more “relational” approach than a dimensionalized one (as seen in OLAP data warehouses, for example)
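
The rectangular view can be sketched by pivoting cells keyed by (case, variable) into rows and columns and writing them out as CSV, the kind of flat file commonly moved between statistical packages. This is a minimal Python illustration using only the standard library; the CASE_ID column name and the sample values are hypothetical.

```python
import csv
import io
from typing import Dict, List, Tuple

def to_rectangular(cells: Dict[Tuple[int, str], object], id_var: str = "CASE_ID") -> Tuple[List[str], List[Dict[str, object]]]:
    """Pivot (case, variable) -> value cells into rows (cases) and columns (variables)."""
    columns = [id_var]
    rows: Dict[int, Dict[str, object]] = {}
    for (case_id, variable), value in cells.items():
        if variable not in columns:
            columns.append(variable)
        rows.setdefault(case_id, {id_var: case_id})[variable] = value
    return columns, [rows[c] for c in sorted(rows)]

def to_csv(columns: List[str], rows: List[Dict[str, object]]) -> str:
    """Write one row per case and one column per variable."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

cells = {(101, "SEX"): "F", (101, "AGE"): 34, (102, "SEX"): "M", (102, "AGE"): 52}
columns, rows = to_rectangular(cells)
print(to_csv(columns, rows), end="")
# CASE_ID,SEX,AGE
# 101,F,34
# 102,M,52
```

A case lacking a variable would get an empty cell, because csv.DictWriter fills missing keys with its restval default, an empty string.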

12 The Capabilities of DDI
DDI comes from the data archive community, which has a strong focus on the microdata deposited by social science researchers
  – It has excellent capabilities for describing microdata sets using the unit-record (row-column) paradigm
  – It also has good capabilities for describing various phases of the data lifecycle (data collection, archiving, data processing, tabulation, methodology) with explicit models

13 DDI Capabilities (continued)
Very detailed description of questionnaires
  – Also an explicit model
DDI provides a description of the aggregation process, including the structural metadata for dimensionalized data sets (“NCubes”)
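
The aggregation step that DDI documents, turning unit records into a dimensionalized table, can be illustrated with a short Python tabulation. This is plain Python, not DDI markup: it counts made-up cases for each combination of two classificatory variables, which become the dimensions of the resulting cube. The function name and variables are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

def tabulate(records: Iterable[Dict[str, object]], dims: Sequence[str]) -> Dict[Tuple[object, ...], int]:
    """Count cases per combination of the chosen classificatory variables (the cube's dimensions)."""
    return dict(Counter(tuple(r[d] for d in dims) for r in records))

records = [
    {"CASE_ID": 1, "SEX": "F", "REGION": "R1"},
    {"CASE_ID": 2, "SEX": "F", "REGION": "R1"},
    {"CASE_ID": 3, "SEX": "M", "REGION": "R1"},
    {"CASE_ID": 4, "SEX": "F", "REGION": "R2"},
]

cube = tabulate(records, ["SEX", "REGION"])
print(cube)  # {('F', 'R1'): 2, ('M', 'R1'): 1, ('F', 'R2'): 1}
```

The case identifier drops out of the result: each cell is addressed by dimension values alone. In this sketch, combinations with no cases (M in R2) are simply absent.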

14 Comparison
Because of the use cases SDMX and DDI were designed to support, each has specific strengths
  – SDMX for exchange, reporting, and dissemination of aggregate data
  – DDI for describing data collection and the resulting microdata, along with the processes applied to it
But both standards can be used to support some common use cases
  – Questionnaires
  – Microdata description
  – Dimensionalized data

15 Comparison (continued)
For microdata description specifically, there are some significant differences between the standards’ capabilities
  – SDMX carries the data in an XML format, which can be problematic for large data sets
  – DDI describes an ASCII data file (or a file in another external format)
  – DDI can describe data files with different linked record types
  – SDMX cannot do this
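
To show what “linked record types” in an external ASCII file means, here is a minimal Python sketch of a hierarchical fixed-width file: household (H) records followed by their person (P) records, linked by a household identifier. The column layout plays the role of the record-layout information a DDI description would carry (record types and column positions), but the layout, field names, and data are hypothetical, and nothing here reads or writes DDI XML.

```python
from typing import Dict, List, Tuple

# Hypothetical layouts: field -> (start, end) as 0-based, end-exclusive column positions.
# Column 0 of every line holds the record type.
LAYOUTS: Dict[str, Dict[str, Tuple[int, int]]] = {
    "H": {"HH_ID": (1, 4), "REGION": (4, 6)},
    "P": {"HH_ID": (1, 4), "PERSON_NO": (4, 6), "SEX": (6, 7), "AGE": (7, 10)},
}

def parse(lines: List[str]) -> Tuple[Dict[str, Dict[str, str]], Dict[str, List[Dict[str, str]]]]:
    """Split H and P records, linking each person record to its household via HH_ID."""
    households: Dict[str, Dict[str, str]] = {}
    persons: Dict[str, List[Dict[str, str]]] = {}
    for line in lines:
        record_type = line[0]
        record = {name: line[s:e].strip() for name, (s, e) in LAYOUTS[record_type].items()}
        if record_type == "H":
            households[record["HH_ID"]] = record
            persons[record["HH_ID"]] = []
        elif record["HH_ID"] in households:
            persons[record["HH_ID"]].append(record)  # the link back to the household
        else:
            raise ValueError(f"person record without household: {line!r}")
    return households, persons

DATA = [
    "H001R1",
    "P00101F034",
    "P00102M036",
    "H002R2",
    "P00201F071",
]

households, persons = parse(DATA)
print(len(persons["001"]))       # 2
print(persons["002"][0]["AGE"])  # 071
```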

16 Comparison (continued)
For describing questionnaires and other types of related metadata (methodology, etc.), there are also major differences
  – SDMX relies on the Reference Metadata mechanism for these metadata, which has no specified model (it is configured by users)
  – DDI has an explicit model in the standard itself
  – Each of these can be a strength or a weakness, depending on the use case

17 Criteria for Choosing a Standard
Does the standard support the needed functionality?
  – E.g., SDMX can describe questionnaires, but if you need detailed flow logic, DDI is much better
How good is tool support for the needed functions?
  – E.g., for graphical data display, SDMX has good tools; DDI does not
Is there a high cost in terms of the learning curve?
  – Maintaining competencies among staff can be costly and difficult
  – Using a familiar standard may be the best choice
  – SDMX and DDI both require a significant learning investment for developers

18 Conclusions
SDMX and DDI were designed to support different uses, and have different strengths as a result
In most cases, SDMX is better for dimensionalized data sets, exchange, and dissemination
DDI is generally better for working with microdata and its collection and processing
But the choice of a suitable standard can only be made by taking a larger number of factors into consideration; it is not a simple black-and-white choice

