SDMX Reference Infrastructure


1 SDMX Reference Infrastructure
SDMX for IT Experts SDMX Reference Infrastructure Jorge Nunes Raynald Palmieri 18-20 February 2014 Eurostat Unit B3 – IT and standards for data and metadata exchange

2 Table of Contents SDMX Reference Infrastructure
Software Components First implementation Intermediate implementation Ultimate implementation

3 SDMX Reference Infrastructure
Mapping Assistant Test Client NSI Web Service NSI Client Test Auth Config

4 SDMX-RI Components
[Deployment diagram. Network zones: Internal Network, Secure DMZ, DMZ, Internet, separated by firewalls. Hosts: Workstation, App Server, DB Servers, External Client. Components: Mapping Assistant, MSDB Store, DDB, NSI-WS, NSI-Client, Test Client.]

5 Overview
SDMX-RI First implementation: overview, shortcomings. SDMX-RI Intermediate solution: overview, rationale, changes, benefits, impact to users. SDMX-RI “Ultimate” solution.
[Roadmap diagram labels: SRI First Implementation, Streaming, SRI Intermediate Solution, Common API, SDMX v2.1, SRI “Ultimate” Solution.]
SDMX RI Development Roadmap 2012, unit B-3

6 SDMX-RI First implementation
Schematic overview Overview of different components Shortcomings

7 SDMX-RI overview SDMX RI Development Roadmap 2012, unit B-3

8 SDMX-RI Eurostat’s 1st implementation
[Component diagram. Web Service operations: getGenericData, getCompactData, getCrossSectionalData, queryStructure. Components: Web Service Provider, (1) Structure Retriever, (2) Query Parser, (3) Data Retriever, (4) SDMX Data Generator, (5) SDMX Model. Data stores: Mapping Store, Dissemination DB, PC-Axis.]
Web Service Provider: This component exposes the data through a Web Service interface that provides SDMX-ML messages. It follows the SDMX v2.0 WS Guidelines.
(1) Structure Retriever: This component serves the “queryStructure” calls, i.e. it handles SDMX structure-related queries. Currently, SRI responds to queries for DSDs, Codelists, Concept Schemes, Category Schemes, Dataflows and Hierarchical Codelists.
(2) Query Parser: This component takes the request from the Web Service Provider and populates the internal data model (i.e. the SDMX Model) with the query received in the request.
(3) Data Retriever: This component queries the dissemination database, gets the respective recordset and populates SDMX Model objects with the data retrieved, which are then returned. It serves the data-related queries of the SRI WS, i.e. getGenericData, getCompactData and getCrossSectionalData.
(4) SDMX-ML Data Generator: This component generates an SDMX-ML Dataset message from the DSD and the SDMX Model objects containing the data retrieved.
(5) SDMX Model: This component contains objects for storing data and metadata based on the SDMX information model. It is already used in several Eurostat components.
Mapping Store: This component (here a database) keeps the mappings between the SDMX structural metadata and the native format (a file or a DB schema). The mappings are created and edited off-line by the Mapping Assistant. In other words, the Mapping Store holds the mappings between an SDMX Data Structure Definition (DSD) and a DB schema (dissemination database) or a set of dissemination data files (PC-Axis files). It maps the DB schema from the database to the SDMX DSD.
Dissemination database: This is the final storage data warehouse maintained by the Data Provider. It stores data that can be published to potential Data Consumers.
PC-Axis files: This is the PC-Axis dissemination environment file format (aka px-files). A custom driver has been implemented for loading the data from px-files into a temporary in-memory database so that it can be queried by the Data Retriever.

9 First implementation shortcomings
Memory issues with large messages: all data is kept in memory before the response is sent. Performance: decreases for larger datasets and concurrent requests. Does not support SDMX 2.1. Current SDMX Model (5) design shortcomings: tightly coupled to the SDMX 2.0 XSD Schema; does not provide an API; requires in-depth knowledge of SDMX.
Current SDMX model design shortcomings:
Tied to SDMX Schema v2.0. The current SDMX model has been designed using the SDMX Schema v2.0 and the SDMX Information Model (IM). More specifically, from the SDMX IM it uses only the inheritance of the artefacts (e.g. the abstract classes Identifiable, Maintainable, ItemScheme etc.), since the class hierarchy was not present in the v2.0 XSDs. The rest of the classes were based on the XSDs, so their design is close to the SDMX v2.0 XSDs. For this reason, it is difficult to make them independent of the SDMX version in the future.
It does not provide programmatic interfaces. The SDMX model contains only a set of specific classes (aka beans) for keeping the information. In order to be integrated in various applications, those classes have to be used explicitly, because there are no interfaces for them. Having interfaces for the whole SDMX model would allow client programs to base their logic on the interface and not on its implementation. This would make it possible to switch between several implementations of the SDMX model without affecting the client program logic. For instance, a different CodelistBean implementation could be provided for large Codelists, storing them in a file in order to minimise memory requirements.
Useful only as SDMX information placeholder; no utility methods. In most cases the SDMX classes serve as a mere placeholder for the information that they represent, which means that they only provide getters and setters for that information. The model does not provide utility methods that would be handy to a client class when it needs to extract more refined information from the bean. With utility methods, the code in the client classes would be shorter and clearer. For instance, a utility method for the DSD would return all the artefact references used in the DSD, i.e. one call to get the “id, agency, version” of all the ConceptSchemes and Codelists used within the DSD. If no such utility method is available, the developer has to browse all the components of the DSD, using the appropriate getters and gathering the information on referenced artefacts.
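The point about programmatic interfaces can be illustrated with a minimal sketch. All names here (`Codelist`, `InMemoryCodelist`, `LazyCodelist`, `CodelistClient`) are hypothetical and are not taken from Eurostat's SDMX model; the lookup function merely stands in for a file-backed store of a large codelist.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical contract: client code depends only on this interface.
interface Codelist {
    String getId();
    String getCodeName(String codeId); // returns null when the code is unknown
}

// Default implementation: keeps every code in memory.
final class InMemoryCodelist implements Codelist {
    private final String id;
    private final Map<String, String> codes = new LinkedHashMap<>();

    InMemoryCodelist(String id, Map<String, String> codes) {
        this.id = id;
        this.codes.putAll(codes);
    }

    public String getId() { return id; }
    public String getCodeName(String codeId) { return codes.get(codeId); }
}

// Alternative for large codelists: resolves each code on demand through a
// lookup function (an assumed stand-in for reading from a file).
final class LazyCodelist implements Codelist {
    private final String id;
    private final Function<String, String> lookup;

    LazyCodelist(String id, Function<String, String> lookup) {
        this.id = id;
        this.lookup = lookup;
    }

    public String getId() { return id; }
    public String getCodeName(String codeId) { return lookup.apply(codeId); }
}

final class CodelistClient {
    // Works unchanged with any Codelist implementation.
    static String describe(Codelist codelist, String codeId) {
        String name = codelist.getCodeName(codeId);
        return codelist.getId() + ":" + codeId + "=" + (name == null ? "?" : name);
    }
}
```

Because `CodelistClient.describe` only sees the interface, swapping the in-memory implementation for the lazy one requires no change in the client logic.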

10 SDMX-RI Intermediate solution
Why an Intermediate solution Overview of different components What changed Benefits Impact to SRI installations SDMX RI Development Roadmap 2012, unit B-3

11 Why an Intermediate solution
To solve identified problems. Decreased performance: increased memory allocation results in long response times. “Out Of Memory” errors: increased memory allocation results in an inability to serve large data requests.

12 SDMX-RI Eurostat’s 1st implementation
[Component diagram of the first implementation. Web Service operations: getGenericData, getCompactData, getCrossSectionalData, queryStructure. Components: Web Service Provider, (1) Structure Retriever, (2) Query Parser, (3) Data Retriever, (4) SDMX Data Generator, (5) SDMX Model. Data stores: Mapping Store, Dissemination DB, PC-Axis.]
This slide is here for the transition from the first implementation to the intermediate solution.

13 SDMX-RI Eurostat intermediate solution
[Component diagram. Web Service operations: getGenericData, getCompactData, getCrossSectionalData, queryStructure. Components: Web Service Provider, (1) Structure Retriever, (3) Data Retriever (streaming), (5) SDMX Model/IO (revised). Data stores: Mapping Store, Dissemination DB, PC-Axis.]
Web Service Provider: This component exposes the data through a Web Service interface that provides SDMX-ML messages. It follows the SDMX v2.0 WS Guidelines.
(1) Structure Retriever: This component serves the “queryStructure” calls, i.e. it handles SDMX structure-related queries. Currently, SRI responds to queries for DSDs, Codelists, Concept Schemes, Category Schemes and Dataflows.
(3) Data Retriever: This component queries the dissemination database, gets the respective recordset and populates the SDMX data model with the data retrieved, which is then returned.
(5) SDMX Model/IO: This component contains objects for storing data and metadata based on the SDMX information model. Since the intermediate solution, it provides methods for reading and writing from/to SDMX-ML messages. It now includes what was previously described as (2) Query Parser and (4) SDMX-ML Data Generator. It is already used in several Eurostat SDMX software components and tools, e.g. SDMX Converter, Euro SDMX Registry, DSW.
Mapping Store, Dissemination database and PC-Axis files: unchanged from the first implementation (see slide 8).

14 SDMX-RI Intermediate solution What changed
Streaming of data in the service: usage of JAX-WS in Java (Axis 1.0 could not support streaming); SDMX Model/IO (5) revised with Streaming Writers. QP (2) is now part of the SDMX Model/IO (5) library. DG (4) functionality is now included in the SDMX Model/IO (5) library. DR API (3) changes due to streaming.
Additional technical information on the changes:
Before streaming, the Service stored the SOAP response payload in DOM elements. The Java implementation was based on Axis1 before moving to the JAX-WS API.
The SDMX IO library was revised to include streaming writers for each supported SDMX-ML dataset message (Generic, Compact, XS). The difference is that the data retrieved from the DDB is no longer stored in SDMX model objects at all. As the information is read from the DDB record by record, the appropriate writer call is used to write series and observations.
The DR API was changed due to streaming. Prior to the intermediate solution, the Dataset object from the SDMX model was used to return all the data of the request; the object was then passed to the DG to be written into an SDMX-ML file. The intermediate solution changed this approach: the DR no longer returns anything. It takes the query and a streaming writer object and uses them to stream the data to the destination and in the format specified by the caller.
QP functionality always existed in the SDMX IO, so the QP was more of a logical separation of modules. Now the same functionality exists only in the SDMX IO. Moreover, the DG is obsolete since it has been replaced by the streaming writers in the SDMX IO.
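The change to the DR API described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the SRI code: `StreamingWriter`, `PlainTextWriter`, `DbRow` and `StreamingRetriever` are hypothetical names, the output format is a toy text format rather than SDMX-ML, and rows are assumed to arrive ordered by series key.

```java
import java.util.Iterator;

// Hypothetical streaming writer: one call per series and per observation.
interface StreamingWriter {
    void startSeries(String seriesKey);
    void writeObservation(String period, String value);
    void close();
}

// Toy writer that appends a plain-text form to a StringBuilder (not SDMX-ML).
final class PlainTextWriter implements StreamingWriter {
    private final StringBuilder out;

    PlainTextWriter(StringBuilder out) { this.out = out; }

    public void startSeries(String seriesKey) {
        out.append('[').append(seriesKey).append(']');
    }

    public void writeObservation(String period, String value) {
        out.append(' ').append(period).append('=').append(value);
    }

    public void close() { out.append('\n'); }
}

// One record read from the dissemination database (assumed shape).
final class DbRow {
    final String seriesKey;
    final String period;
    final String value;

    DbRow(String seriesKey, String period, String value) {
        this.seriesKey = seriesKey;
        this.period = period;
        this.value = value;
    }
}

final class StreamingRetriever {
    // Returns nothing: each record is pushed to the writer as soon as it is
    // read, so no Dataset object is ever built in memory.
    static void retrieveData(Iterator<DbRow> rows, StreamingWriter writer) {
        String currentSeries = null;
        while (rows.hasNext()) {
            DbRow row = rows.next();
            if (!row.seriesKey.equals(currentSeries)) {
                writer.startSeries(row.seriesKey);
                currentSeries = row.seriesKey;
            }
            writer.writeObservation(row.period, row.value);
        }
        writer.close();
    }
}
```

Memory use in this shape depends on one record at a time rather than on the size of the whole result, which is the property the intermediate solution relies on.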

15 SDMX-RI Intermediate solution Benefits
Better performance: improvement of approximately 75% in concurrent-user scenarios. Solution to “Out Of Memory” problems for large datasets: no memory constraints.
On the .NET platform, a 74.77% improvement was observed for a query returning 200k observations (message size ~22 MB) with 5 concurrent users making the same request. The before and after response times are […] ms and 9285 ms. On the Java platform, an 85.44% improvement was observed for a query returning 200k observations (message size ~22 MB) with 5 concurrent users making the same request. The before and after response times are […] ms and […] ms.

16 SDMX-RI Intermediate solution Impact on existing installations
Organisations that have installed the first implementation: only re-install the Web Service. Existing clients of the Web Service are not affected; the SDMX 2.0 SOAP interface remains. Organisations using the SRI component APIs: the APIs have changed due to streaming support, so migration will be required. Organisations that have already modified the source code will have to make the changes again if they want to use the intermediate solution.

17 SDMX-RI “Ultimate” solution
Why an “Ultimate” solution Overview of different components What changes Benefits Impact to SRI installations

18 Why an “Ultimate” solution
Eurostat’s decision for a Common SDMX API: implementation of components covering all aspects of the API. Support for SDMX 2.1: new messages (data representation, queries); new Web Service interfaces (SOAP/REST).
The common SDMX API is intended to be used in a wider scope, i.e. in all organisations that use SDMX and not only within ESTAT’s modules.
Benefits of a common SDMX API:
It allows interchangeable API implementations.
It ensures reusability of common building blocks, like the reading and writing of SDMX-ML messages. New SDMX-ML building blocks will accept beans from the new API, which can be automatically integrated into other systems.
Independence from the SDMX version. The API has been designed solely on the Information Model and not on the schemas. For the same reason, it will be easier to move to a new version if there is one in the future. When a new message is provided in the future, it will be supported without any impact on user programs, because they will depend only on the API. A new message will only require a new implementation of the Reader and Writer interfaces.
It ensures clearer code in the client programs. It will be easier for developers to use because it hides the complexity of the SDMX messages.

19 SDMX-RI Eurostat’s “Ultimate” solution
[Component diagram. Web Service operations: getGenericData, getCompactData, getCrossSectionalData, queryStructure. Components: Web Service Provider, (1) Structure Retriever, SR API, (3) Data Retriever (streaming), DR API, (5) SDMX Model/IO (revised), (6) Common SDMX API, (7) SDMX API Implementation. Relationship label: <implements>. Data stores: Mapping Store, Dissemination DB, PC-Axis.]
Components with a dashed line are APIs, i.e. Application Programming Interfaces. Components with normal lines are concrete implementations.
Web Service Provider: This component exposes the data through a Web Service interface that provides SDMX-ML messages. It will be capable of exposing three different interfaces, i.e. SDMX 2.0 SOAP, SDMX 2.1 SOAP and REST.
(1) Structure Retriever: This component serves the “queryStructure” calls, i.e. it handles SDMX structure-related queries. Currently, SRI responds to queries for DSDs, Codelists, Concept Schemes, Category Schemes and Dataflows. In the context of the “Ultimate” solution, its API is packaged separately.
(3) Data Retriever: This component queries the dissemination database, gets the respective recordset and populates the SDMX data model with the data retrieved, which is then returned.
(6) Common SDMX API: A set of interfaces for handling data and metadata based on the SDMX information model. It provides methods for reading and writing from/to SDMX-ML messages. Eurostat plans to use it in several SDMX software components and tools, e.g. SDMX Converter, Euro SDMX Registry, DSW.
(7) SDMX API Implementation: This component is an implementation of the interfaces specified in (6) Common SDMX API; combined with the latter, it replaces the component (5) SDMX Model/IO. In the “Ultimate” solution it is based on a Metadata Technology implementation.
Mapping Store, Dissemination database and PC-Axis files: unchanged from the first implementation (see slide 8).

20 Ultimate solution What changes (1)
All modules will be modified to use the SDMX Common API (6). The SDMX Model/IO (5) will no longer be used: for Java the MT API implementation (7) will be used; for .NET the API implementation (7) will be developed. SRI component APIs will be changed due to the SDMX Common API. SDMX 2.1 messages and new query features will be supported.
Common SDMX API (6) vs SDMX Model/IO (5). Why?
The benefits of using (6), as mentioned on slide 18: it is intended for all organisations that use SDMX and not only for ESTAT’s modules, it allows interchangeable API implementations, it ensures reusability of common building blocks such as the reading and writing of SDMX-ML messages, it is independent of the SDMX version, and it ensures clearer code in the client programs.
The shortcomings of SDMX Model/IO (5), as described on slide 9:
Tied to SDMX Schema v2.0. Most classes were based on the v2.0 XSDs, so it is difficult to make them independent of the SDMX version.
It does not provide programmatic interfaces. The model contains only a set of specific classes (aka beans), so client programs cannot switch between implementations without changing their logic.
Examples of an interface from the Common API (Codelist and DSD):
    public abstract interface CodelistBean extends ItemSchemeBean {
        public abstract boolean isPartial();
        public abstract CodeBean getCodeById(String arg0);
        public abstract CodelistBean getStub();
        public abstract CodelistMutableBean getMutableInstance();
    }
    public abstract interface KeyFamilyBean extends MaintainableBean, ConstrainableBean {
        public abstract DimensionListBean getDimensionList();
        public abstract AttributeListBean getAttribtueList();
        public abstract MeasureListBean getMeasureList();
        public abstract List getDimensions(SDMX_STRUCTURE_TYPE... arg0);
        public abstract DimensionBean getFrequencyDimension();
        public abstract boolean hasFrequencyDimension();
        public abstract DimensionBean getDimension(String arg0);
        public abstract ComponentBean getComponent(String arg0);
        public abstract List getGroups();
        public abstract GroupBean getGroup(String arg0);
        public abstract DimensionBean getTimeDimension();
        public abstract PrimaryMeasureBean getPrimaryMeasure();
        public abstract List getAttributes();
        public abstract List getDatasetAttributes();
        public abstract List getGroupAttributes();
        public abstract List getGroupAttributes(String arg0);
        public abstract List getDimensionGroupAttributes();
        public abstract List getSeriesAttributes(String arg0);
        public abstract List getObservationAttributes();
        public abstract List getObservationAttributes(String arg0);
        public abstract AttributeBean getGroupAttribute(String arg0);
        public abstract AttributeBean getDimensionGroupAttribute(String arg0);
        public abstract AttributeBean getObservationAttribute(String arg0);
        public abstract KeyFamilyBean getStub();
        public abstract KeyFamilyMutableBean getMutableInstance();
    }
Useful only as SDMX information placeholder; no utility methods. The classes only provide getters and setters, so the developer has to browse all the components of the DSD and gather the information on referenced artefacts by hand.
Common SDMX API, getting the references of the KeyFamily bean:
    Set<CrossReferenceBean> crossReferences = keyFamilyBean.getCrossReferences();
In the current model, none of the above examples is possible. The user can only instantiate the bean and use the getters and setters to browse through the components and other information. There are interfaces to plug-in implementations, but there are no domain objects and no high-level utility methods.
Current ESTAT SDMX Model:
    // To get the references, the following lists have to be iterated, finding the
    // referenced ConceptScheme and Codelist of each component. Care must also be
    // taken not to collect the same Codelist twice.
    List dimensions = keyFamilyBean.getDimensions();
    PrimaryMeasureBean primaryMeasure = keyFamilyBean.getPrimaryMeasure();
    List attributes = keyFamilyBean.getAttributes();
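To make the manual approach concrete, the following sketch shows the kind of loop a client would have to write when no utility method is available. `DsdComponent` and `ManualReferenceCollector` are hypothetical, simplified stand-ins and not classes of either SDMX model; references are represented as plain strings for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a DSD component (dimension, measure or attribute).
final class DsdComponent {
    final String id;
    final String conceptSchemeRef; // e.g. "AGENCY:ID(VERSION)" of the concept scheme
    final String codelistRef;      // null for an uncoded component

    DsdComponent(String id, String conceptSchemeRef, String codelistRef) {
        this.id = id;
        this.conceptSchemeRef = conceptSchemeRef;
        this.codelistRef = codelistRef;
    }
}

final class ManualReferenceCollector {
    // Walks every component and gathers the referenced artefacts; the Set
    // guarantees that a Codelist used by several components is kept once.
    static Set<String> collect(List<DsdComponent> dimensions,
                               DsdComponent primaryMeasure,
                               List<DsdComponent> attributes) {
        List<DsdComponent> all = new ArrayList<>(dimensions);
        all.add(primaryMeasure);
        all.addAll(attributes);

        Set<String> references = new LinkedHashSet<>();
        for (DsdComponent component : all) {
            references.add(component.conceptSchemeRef);
            if (component.codelistRef != null) {
                references.add(component.codelistRef);
            }
        }
        return references;
    }
}
```

With a utility method on the bean, all of this collapses into a single call on the client side, which is the benefit the notes describe.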

21 Ultimate solution What changes (2)
Web Service will be extended to support SDMX 2.1 standardized SOAP and RESTful APIs New Web Service endpoints will be added above the Controller. New endpoints will co-exist with SDMX 2.0 endpoint. Will support SDMX 2.1 error handling.
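As an illustration of what a client of the new RESTful interface would send, the sketch below builds a data query URL following the SDMX 2.1 REST pattern `data/{flowRef}/{key}` with the `startPeriod` and `endPeriod` parameters. The class `RestDataQuery`, the base URL and the dataflow identifier are assumptions made for the example; the actual base path of an NSI Web Service installation is not given in the slides.

```java
import java.util.ArrayList;
import java.util.List;

final class RestDataQuery {
    // Builds the series key: values of one dimension are joined with "+",
    // dimensions are joined with "."; an empty list leaves a dimension wildcarded.
    static String buildKey(List<List<String>> dimensionValues) {
        List<String> parts = new ArrayList<>();
        for (List<String> values : dimensionValues) {
            parts.add(String.join("+", values));
        }
        return String.join(".", parts);
    }

    // Assembles {baseUrl}/data/{flowRef}/{key} plus optional period filters.
    static String buildUrl(String baseUrl, String flowRef, String key,
                           String startPeriod, String endPeriod) {
        StringBuilder url = new StringBuilder(baseUrl)
                .append("/data/").append(flowRef).append('/').append(key);
        List<String> params = new ArrayList<>();
        if (startPeriod != null) params.add("startPeriod=" + startPeriod);
        if (endPeriod != null) params.add("endPeriod=" + endPeriod);
        if (!params.isEmpty()) {
            url.append('?').append(String.join("&", params));
        }
        return url.toString();
    }
}
```

For a three-dimension DSD, the key `A.FR+DE.` would select annual data for FR or DE with the third dimension left open.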

22 WS extension to SDMX 2.1 new interfaces
[Component diagram. Requests: SOAP Request 2.0, SOAP Request 2.1, REST Request 2.1. Web Service Provider modules: NSI_Service_2.0, NSI_Service_2.1, NsiRestService, Controller. Other components: (1) Structure Retriever, (3) Data Retriever (streaming), (6) Common SDMX API, (7) SDMX API Implementation, (7) SDMX 2.0 Implementation, (7) SDMX 2.1 Implementation.]
Web Service Provider: This module exposes the data through a Web Service interface that provides SDMX-ML messages. It offers three Web Service interfaces: SOAP SDMX v2.0, SOAP SDMX v2.1 and REST SDMX v2.1.
NSI_Service_2.0: A module of the Web Service Provider component. It implements the Web Service SOAP interface according to the SDMX v2.0 Web Service guidelines. It serves such requests, which are passed on to the Controller.
NSI_Service_2.1: A module of the Web Service Provider component. It implements the Web Service SOAP interface according to the SDMX v2.1 Web Service guidelines (SDMX v2.1 provides a standardised WSDL). It serves such requests, which are passed on to the Controller.
NsiRestService: A module of the Web Service Provider component. It implements the Web Service RESTful API according to the SDMX v2.1 Web Service guidelines. It serves such requests, which are passed on to the Controller.
Controller: A module of the Web Service Provider component that holds all the logic of the Web Service Provider. It coordinates the calls to the rest of the modules (SR, DR, common SDMX API readers/writers) in order to carry out the request, so that the result is streamed back to the interface that was called, i.e. the v2.0, v2.1 or REST service.
Data Retriever: This module queries the dissemination database and gets the respective recordset, which is then streamed to the caller. The DR is given the query to process and a Streaming Writer that depends on the type of the message (i.e. Generic, Compact, XS).
Common SDMX API: This API provides interfaces of objects for storing data and metadata based on the SDMX information model. It also provides interfaces for reading and writing from/to SDMX-ML messages. It is the common SDMX API that is intended to be used across organisations in order to foster reusability of components. From this API, the SRI Web Service uses the reading of SDMX-ML Data Queries, the reading and writing of the SDMX-ML RegistryInterface for structure query requests/responses and the streaming writers of SDMX-ML Datasets.
SDMX API implementation: This is the implementation of the Common API used in the context of the SRI. The modules depend on the API that provides the interfaces; however, they need an implementation of the API to provide the actual functionality. The MT implementation for Java will be used in the SRI. An implementation for .NET is still pending.

23 Web Service request sequence
[Sequence diagram. Participants: WS Client, :ServiceImpl, :Controller, :DataQueryParseManager, :DataRetriever, :CompactDataWriterEngine. Calls: GetCompact, Controller(), HandleRequest(request, OutputStream), DataQueryParseManager(), buildDataQuery(request) returning DataQueryBean, DataRetriever(), getDsdForDataQuery(DataQueryBean) returning DataStructureBean, CompactWriter(OutputStream, DataStructureBean), RetrieveData(DataQueryBean, CompactWriter), Write* {Groups/Series/Obs}.]
This sequence diagram presents a simplified execution of a Web Service response to a request for Compact Data. The colours of the objects and classes shown here map to those used in the previous slide, i.e. light green for the WS Provider, purple for the Common SDMX API/Implementation and blue for the Data Retriever. The activities identified here are:
1. The appropriate ServiceImpl is initiated according to the call, i.e. on the SDMX 2.0 SOAP, SDMX 2.1 SOAP or REST interface. Input: Client Request. Output: N/A.
2. The Controller is initiated in order to handle the incoming request.
3. The appropriate parser is initiated to parse the incoming SDMX-ML into a DataQueryBean object. Input: SDMX-ML Query. Output: DataQueryBean.
4. The DR is initiated and the relevant DSD is obtained in order to validate and produce the Dataset. Input: DataQueryBean. Output: DataStructureBean.
5. The appropriate writer is initiated, i.e. the CompactWriter.
6. The CompactWriter is used within the DR to write the response to the output stream provided by the ServiceImpl. Input: DataQueryBean, CompactWriter (step 5).
Please note that the process of writing data from the dissemination database to the response stream is not shown here in detail, due to its complexity and the lack of space. This process is what is framed in the red round-cornered square; the asterisk (*) denotes that the Write{Groups/Series/Obs} set of methods is called iteratively while the DR goes through the database response and writes it (via the CompactWriter) to the OutputStream. As soon as the writing process finishes, control returns to the ServiceImpl in order to finalise the response.
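The sequence above can be mirrored in a small, self-contained sketch. The classes below are toy stand-ins named after the boxes in the diagram (`DataQuery`, `DataStructure`, `QueryParser`, `CompactWriter`, `Retriever`, `Controller`); they are assumptions for illustration and not the SRI or Common SDMX API classes. The query is a plain dataflow id instead of an SDMX-ML message, the output is plain text, and two fixed observations stand in for the database cursor.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Toy stand-ins for DataQueryBean and DataStructureBean.
final class DataQuery {
    final String dataflowId;
    DataQuery(String dataflowId) { this.dataflowId = dataflowId; }
}

final class DataStructure {
    final String id;
    DataStructure(String id) { this.id = id; }
}

// Step 3: parse the incoming request into a query object.
final class QueryParser {
    static DataQuery buildDataQuery(String request) {
        return new DataQuery(request.trim());
    }
}

// Step 5: writer bound to the response stream and the DSD.
final class CompactWriter {
    private final OutputStream out;

    CompactWriter(OutputStream out, DataStructure dsd) {
        this.out = out;
        write("DataSet " + dsd.id + "\n");
    }

    void writeObs(String period, String value) {
        write(period + "=" + value + "\n");
    }

    void close() {
        try {
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void write(String text) {
        try {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Steps 4 and 6: resolve the DSD, then stream the data through the writer.
final class Retriever {
    DataStructure getDsdForDataQuery(DataQuery query) {
        return new DataStructure("DSD_" + query.dataflowId);
    }

    void retrieveData(DataQuery query, CompactWriter writer) {
        // Fixed observations stand in for iterating over the database response.
        writer.writeObs("2010", "1.5");
        writer.writeObs("2011", "1.7");
    }
}

// Step 2: the Controller coordinates parser, retriever and writer.
final class Controller {
    static void handleRequest(String request, OutputStream out) {
        DataQuery query = QueryParser.buildDataQuery(request);
        Retriever retriever = new Retriever();
        DataStructure dsd = retriever.getDsdForDataQuery(query);
        CompactWriter writer = new CompactWriter(out, dsd);
        retriever.retrieveData(query, writer);
        writer.close();
    }
}
```

A service endpoint (step 1) would call `Controller.handleRequest` with the raw request and the response stream; since the writer is created from the caller's stream, the same controller logic can serve the SOAP and REST front ends.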

24 SDMX-RI Ultimate solution Benefits
Usage of Common SDMX API Interchangeable implementation. Foster component reusability. Support of data streaming Support of SDMX 2.1 New query capabilities. New message formats. Support of RESTful API

25 SDMX-RI Ultimate solution Impact for existing installations (1)
Organisations with Mapping Store in production, will have to: Install new Mapping Assistant Upgrade Mapping Store automatically within MA Organisations that have a Web Service installation in place, will have to: Install the new Web Service package Existing clients of Web Service will not be affected The SDMX 2.0 SOAP interface will remain

26 SDMX-RI Ultimate solution Impact for existing installations (2)
Organisations using the SRI components APIs Migration will be required Migration guidelines will be provided Organisations that have already done modifications to the source code Will have to make the changes again using the “Ultimate” solution

27 SDMX-RI - Introduction

