Agenda Item 3.3
SDMX reference architecture for NSIs
Francesco Rizzo
24th Meeting of the STNE Working Group “Statistics, Telematic Network & EDI”, June 2009
Eurostat Unit B5 – Statistical Information Technologies
STNE 24th Meeting – June
Presentation summary
The NSI perspective on SDMX: benefits
The NSI perspective on SDMX: how to reduce costs
The NSI perspective on SDMX: three important aspects
Data Repository (Warehousing) Architecture
Data Hub Architecture
SDMX reference metadata architecture
The Mapping Process
Codes mapping: example for frequency
The NSI perspective on SDMX: where to start from
First of all a quick simple analysis
Disseminating and reporting data in SDMX: different scenarios
Free/open SDMX software and tools inventory
SDMX Reference architecture for MSs
The toolkit for participating in a Eurostat SDMX project
The NSI perspective on SDMX: benefits
Reduce the reporting burden towards national, European and international institutions
Can improve harmonisation, standardisation and integration processes inside an NSI
Be part of an international “community” where NSIs can:
– share experiences and best practices
– freely share software and tools
The NSI perspective on SDMX: how to reduce costs
SDMX generally follows the open source culture, so tools used within the SDMX initiative can be made publicly available
Eurostat has been designing an SDMX reference architecture for MSs and developing some building blocks to facilitate SDMX implementations
Several NSIs and international organisations have been producing case studies from their direct experience in implementing SDMX
In 2007 Eurostat launched a training plan, aimed principally at MSs
Eurostat provides, upon request, technical advice to NSIs interested in starting SDMX projects
The NSI perspective on SDMX: analysis factors
Data Repository (Warehousing) Architecture
Data Hub Architecture
The Mapping Process
Data Repository (Warehousing) Architecture
[Diagram, NSI side and Eurostat side. Labels: Pull Requestor; eDAMIS; Data Input; SDMX Registry; Intermediate storage; Verification / Conversion to SDMX; Received data in SDMX-ML; Loader; register; Warehouse storage; Eurobase; query; Dissemination; XSL for SDMX-ML; PULL; PUSH]
Data Hub Architecture
[Diagram. Labels: NSI; SDMX Registry; RSS / data registration; Dissemination; XSL for SDMX-ML; Data Portal; Query; Data query; Response; Retrieve dataset; cache]
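One possible reading of the Data Hub labels (query, registry, retrieve dataset, cache) is sketched below in Python. The order of the interactions is an assumption drawn only from the diagram labels, and every name here (`DataHub`, `registry`, `fetch`, the example URL) is hypothetical; no real SDMX Registry or NSI web service API is used.

```python
class DataHub:
    """Minimal sketch of a hub that answers queries from NSI-held data."""

    def __init__(self, registry, fetch):
        # registry: dict mapping a dataflow id to the NSI endpoint that
        # registered data for it (a stand-in for the SDMX Registry).
        self.registry = registry
        # fetch: callable(endpoint, query) -> dataset (a stand-in for the
        # call to the NSI that holds the data).
        self.fetch = fetch
        # cache: datasets already retrieved, keyed by (dataflow, query).
        self.cache = {}

    def query(self, dataflow, query):
        key = (dataflow, query)
        if key in self.cache:
            # Serve a previously retrieved dataset without contacting the NSI.
            return self.cache[key]
        endpoint = self.registry[dataflow]     # where is the data registered?
        dataset = self.fetch(endpoint, query)  # retrieve it from the NSI
        self.cache[key] = dataset
        return dataset


# Usage with a fake NSI that records how often it is called.
calls = []

def fake_fetch(endpoint, query):
    calls.append((endpoint, query))
    return {"endpoint": endpoint, "query": query, "obs": [1.0, 2.0]}

hub = DataHub({"DEMO_FLOW": "https://nsi.example/ws"}, fake_fetch)
first = hub.query("DEMO_FLOW", "FREQ=A")
second = hub.query("DEMO_FLOW", "FREQ=A")
```

In this sketch the data stay at the NSI and the hub stores only what it has already been asked for, which is the main contrast with the warehouse architecture, where received data are loaded into central storage.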
The Mapping Process
Data sets within a data producer’s information system are described using “local” structural metadata (concepts, code lists, formats)
SDMX standards harmonize structural metadata within a statistical community, and describe data sets by DSDs (concepts, code lists, dimensions, attributes, measures, etc.)
– SDMX-ML structure files
“Local” structural metadata and SDMX structural metadata must be mapped (*):
– concepts mapping
– codes mapping
(*) see SDMX User Guide, page 73
Concepts mapping
– One concept of the DSD corresponds to a single “local” concept. A typical example is the measured value in the data provider's database, which corresponds to the Primary measure in the DSD.
– One “local” concept corresponds to two or more concepts within the SDMX structure file. For example, the “local” concept named Um contains the value “one million of Euro”. In the related SDMX structure file it corresponds to two concepts: Unit (Euro) and Unit multiple (one million).
– One concept within the SDMX structure file does not correspond to any “local” concept. An example is the concept Reference area, which is generally not used in a national organisation because it is the default.
– One concept within the SDMX structure file corresponds to two or more “local” concepts. For example, the Adjustment concept could correspond to two concepts named DAYADJ (working day adjusted) and SEASADJ (seasonally adjusted); see Model version 2.1 of PC-Axis.
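The four concept-mapping cases can be illustrated with a small Python sketch. The local field names `VALUE`, `UM`, `DAYADJ` and `SEASADJ` follow the examples above where given; the SDMX concept identifiers and all code values (`"EUR"`, `"6"`, `"XX"`, the adjustment codes) are placeholders chosen for illustration, not codes taken from any actual DSD.

```python
# Case 2 lookup: one local "Um" value splits into (Unit, Unit multiple).
UM_SPLIT = {"one million of Euro": ("EUR", "6")}  # placeholder codes

# Case 3: constant used because the local system has no reference area.
DEFAULT_REF_AREA = "XX"  # placeholder for the reporting country

# Case 4 lookup: two local flags (DAYADJ, SEASADJ) give one adjustment code.
ADJUSTMENT_CODES = {  # placeholder codes
    (False, False): "NONE",
    (True, False): "WDA",
    (False, True): "SA",
    (True, True): "SA_WDA",
}


def map_record(local):
    """Map one local record (a dict) to a dict of SDMX concept values."""
    sdmx = {}
    # Case 1: one local concept corresponds to one DSD concept.
    sdmx["OBS_VALUE"] = local["VALUE"]
    # Case 2: one local concept corresponds to two SDMX concepts.
    sdmx["UNIT"], sdmx["UNIT_MULT"] = UM_SPLIT[local["UM"]]
    # Case 3: an SDMX concept with no local counterpart gets a default.
    sdmx["REF_AREA"] = DEFAULT_REF_AREA
    # Case 4: two local concepts correspond to one SDMX concept.
    sdmx["ADJUSTMENT"] = ADJUSTMENT_CODES[(local["DAYADJ"], local["SEASADJ"])]
    return sdmx


record = map_record(
    {"VALUE": 12.5, "UM": "one million of Euro", "DAYADJ": True, "SEASADJ": False}
)
```

Keeping the lookups in plain tables, separate from `map_record`, means a new statistical domain only needs new tables rather than new code.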
Codes mapping: example for frequency

Proprietary code list
CODE  DESCRIPTION
1     Annual
12    Monthly
365   Daily
4     Quarterly
52    Weekly

SDMX code list
CODE  DESCRIPTION
A     Annual
M     Monthly
D     Daily
Q     Quarterly
W     Weekly
H     Half-yearly
B     Business

Mapping
SDMX CODE  Proprietary CODE  DESCRIPTION
A          1                 Annual
M          12                Monthly
D          365               Daily
Q          4                 Quarterly
W          52                Weekly
H          -                 Half-yearly
B          -                 Business
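The frequency example amounts to a lookup table between the two code lists. A minimal Python sketch follows, using only the codes shown in the example; the function and variable names are hypothetical.

```python
# Proprietary frequency code -> SDMX frequency code, as in the example.
PROPRIETARY_TO_SDMX_FREQ = {
    "1": "A",    # Annual
    "12": "M",   # Monthly
    "365": "D",  # Daily
    "4": "Q",    # Quarterly
    "52": "W",   # Weekly
}

# Reverse direction. SDMX codes H (Half-yearly) and B (Business) do not
# appear here because they have no proprietary counterpart.
SDMX_TO_PROPRIETARY_FREQ = {
    sdmx: local for local, sdmx in PROPRIETARY_TO_SDMX_FREQ.items()
}


def to_sdmx_freq(local_code):
    """Translate a proprietary frequency code, failing loudly if unmapped."""
    try:
        return PROPRIETARY_TO_SDMX_FREQ[str(local_code)]
    except KeyError:
        raise ValueError(
            f"No SDMX frequency code mapped for local code {local_code!r}"
        ) from None
```

Raising an error for an unmapped code, rather than passing the value through, keeps unmapped proprietary codes out of the SDMX-ML output.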
The NSI perspective on SDMX: where to start from
Decide to start using SDMX autonomously
– Design and build DSDs “unilaterally” and/or reuse those already available at European and international level
– Decide which part of the information system will be affected (collection, processing, analysis, dissemination) and which kind of SDMX architecture would be more suitable
Join SDMX projects launched by international organisations
– Several pilot projects launched by Eurostat within the ESS (SODI, Census Hub, EuroGroup Register, etc.)
– DSDs defined centrally by Eurostat following agreements reached within WG and TF
– The SDMX architecture implemented in the NSI must be compatible with the reference architecture of the whole project
First of all a quick simple analysis
Which statistical domains are involved?
Where are data and structural metadata currently stored?
How are the data involved currently disseminated, exchanged or reported?
What is the business process behind the exercise?
Will the new SDMX architecture be part of a data warehouse, of a hub, or of both reference architectures?
Disseminating and reporting data in SDMX: different scenarios

1. Starting point: Files (Excel, CSV, Gesmes, etc.)
   Action: Convert to SDMX-ML data files and:
     a. push the files to the data collector
     b. store those files on a web server and notify the data collector of the URL from which to pull the files
   Comments: Low development cost. High production cost. Low mapping process cost.

2. Starting point: Existing databases
   Action:
     a. extract SDMX-ML data files and:
        - push the files to the data collector
        - store those files on a web server and notify the data collector of the URL from which to pull the files
     b. data will be available upon request directly from the database on the Internet
   Comments: High development cost. Low production cost. Partially reusable across different databases. High mapping process cost.

3. Starting point: Ad-hoc database
   Action: As in point 2
   Comments: High development cost. Low production cost. Reusable for each statistical domain. Low mapping cost.
Free/open SDMX software and tools inventory 1/2
Data/Metadata Structure Definition and transformations
– SDMX Converter (Eurostat)
– Data Structure Wizard (Eurostat)
– SDMX Transformation Package (Metadata Technology)
– SDMX Authoring Tool (Metadata Technology)
– Data Structure Definition Tool (Metadata Technology)
– Metadata Structure Definition Editor (Metadata Technology)
Implementation of SDMX registry specifications
– SDMX Registry (Eurostat) (Metadata Technology) (UNSD)
– KeyMaster (Metadata Technology)
– Data provisioning (Metadata Technology)
– Data Set registration (Metadata Technology)
– SDMX Query Tool (Metadata Technology)
– SDMX Query Client (Metadata Technology)
Free/open SDMX software and tools inventory 2/2
Presentation of SDMX-ML data files to users
– Business Cycle clock (Eurostat)
– SDMX Visualization Tools (Eurostat)
– Visual framework (ECB)
Frameworks and toolkits for working with SDMX
– SDMX Framework (Istat)
– SDMX framework (Ole Sørensen)
– The NSI Web Service Prototype (Eurostat)
– Data Retriever Building Block (Eurostat)
Mapping tools
– SDMX Mapping assistant (Eurostat)
SDMX Reference architecture for MSs
The architecture represents the synthesis of several experiences worldwide and should be considered a guide or “best practice” document rather than a strict specification
The main objective is to provide a description/specification of a generalized architecture to be used, partially or as a whole, by MSs interested in starting SDMX projects
In 2009 Eurostat will develop and share, as open source, two building blocks detailed in the architecture
In 2010 more building blocks will be developed
The toolkit for participating in a Eurostat SDMX project
SDMX Structure file
XML schema
SDMX Generic data file example
SDMX Compact or Cross-sectional data file example
Message Implementation Guide
RSS2 feed file example
Atom feed example
Practical Guidelines for the Implementation of web feeds
Requirements for pull data transmission approach
Eurostat SDMX Technical Workshop
Title
– From the SDMX Information Model to the development of reusable software components
Purpose
– The workshop is aimed at software designers and developers and will be organised in several technical sessions. Its main goal is to provide the know-how needed to start designing and developing SDMX architectures for data exchange.
When and where
– The workshop lasts two full days, beginning Tuesday 22 September 2009 at 09:00 and ending Wednesday 23 September at 17:00, at the Instituto Nacional de Estadistica, Paseo de la Castellana 183, Room 118 (first floor), Madrid