CSPA Specifications Overview

CSPA Specifications Overview Marco Silipo THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION

Outline CSPA Specifications The PoC Lessons learned

CSPA Specifications

Statistical Services The level of reusability promised by the adoption of a SOA depends on standard definitions of the services. CSPA describes any service in three layers: the Service Definition, the Service Specification and the Service Implementation Description.

Statistical Services

Statistical Services The capabilities of a Statistical Service are described in terms of the GSBPM sub-process that it relates to, the business function that it performs, and the GSIM information objects that are its inputs and outputs (the Service Definition layer).

Statistical Services The capabilities of a Statistical Service are fleshed out into business functions that have GSIM implementation-level objects as inputs and outputs; this document also includes metrics and methodologies (the Service Specification layer).

Statistical Services The functions of the Statistical Service are refined into detailed operations whose inputs and outputs are GSIM implementation-level objects (the Service Implementation Description layer).

Statistical Services This layer fully defines the service contract, including communications protocols, by means of the Service Implementation Description. It includes a precise description of all dependencies on the underlying infrastructure, non-functional characteristics and any relevant information about the configuration of the application being wrapped, where applicable.

Statistical Services In general, there will be exactly one Service Specification corresponding to each Service Definition, to ensure that standard data exchange can occur. At the implementation level, a service may have several implementations reflecting the environments of the supplying organizations. Each implementation must strictly adhere to the data format specified in the Service Specification.
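The one-Definition, one-Specification, many-Implementations relationship described above can be sketched as simple data structures. This is a minimal Python illustration under my own naming assumptions; the class and field names are not CSPA terms:

```python
from dataclasses import dataclass

@dataclass
class ServiceDefinition:
    """Conceptual level: GSBPM sub-process, business function, GSIM objects."""
    name: str
    gsbpm_subprocess: str
    gsim_inputs: list
    gsim_outputs: list

@dataclass
class ServiceSpecification:
    """Logical level: in general, exactly one per Service Definition."""
    definition: ServiceDefinition
    data_format: str  # the format every implementation must adhere to

@dataclass
class ServiceImplementation:
    """Physical level: possibly several, one per supplying environment."""
    specification: ServiceSpecification
    organisation: str
    protocol: str

# One Definition and one Specification, two environment-specific Implementations
definition = ServiceDefinition("Autocoding", "GSBPM 5.2 Classify and code",
                               ["NodeSet", "Unit Data Set"], ["Unit Data Set"])
spec = ServiceSpecification(definition, data_format="DDI 3.1")
impls = [ServiceImplementation(spec, "Statistics Canada", "SOAP"),
         ServiceImplementation(spec, "Statistics New Zealand", "SOAP")]

# Different implementations, but one shared data format from the Specification
assert all(i.specification.data_format == "DDI 3.1" for i in impls)
```

The design point this illustrates is the one made on the slide: because every implementation adheres strictly to the Specification's data format, implementations remain interchangeable across organisations.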

CSPA Statistical Services: Templates

CSPA Service Definition The CSPA Service Definition is at a conceptual level. In CSPA, the capabilities of a Statistical Service are described in terms of the GSBPM sub-process that it relates to, the business function that it performs, and the GSIM information objects that are its inputs and outputs.

Service Definition Template
- Name
- GSBPM
- Business Function
- Outcomes
- Restrictions
- GSIM Inputs
- GSIM Outputs
- Service dependencies

CSPA Service Specification The CSPA Service Specification is at a logical level. In this layer, the capabilities of a CSPA Service are fleshed out into business functions that have GSIM implementation-level objects as inputs and outputs. This document also includes metrics and methodologies.

Service Specification Template

CSPA Service Implementation The CSPA Service Implementation Description is at an implementation (or physical) level. In this layer, the functions of the CSPA Service are refined into detailed operations whose inputs and outputs are GSIM implementation-level objects. This layer fully defines the service contract, including communications protocols. It includes non-functional characteristics and any relevant information about the configuration of the application being wrapped.

Service Implementation Template 1/2

Service Implementation Template 2/2

CSPA Service Design & Implementation

CSPA Roles The use of CSPA is based on functional roles, to be assigned within a statistical organization:
- Investor
- Designer
- Service Builder
- Service Assembler
- Configurer
- User
These functional roles are presented below in the form of user stories.

Investor Story The Investor receives demand for a new data collection and needs to compare the cost of running a collection using traditional methods, with the cost of using a set of components as per CSPA. The Investor identifies existing Statistical Services to be used and identifies gaps where no Statistical Service already exists. The Investor weighs up creating a fully bespoke processing solution for the collection against having to build a new Statistical Service that fits into a set of existing Services. This would be done in consultation with other roles.

Designer Story The Designer has been given a set of high-level business requirements from the Investor: what data is needed, and what are the parameters of the process? In order to determine what functionality is available, the Designer considers internal and external capabilities by searching the Statistical Services Catalogue.

Designer Story - 2 When a possible internal candidate is found, a decision is made as to whether the existing functionality should be wrapped and exposed as a Statistical Service, or whether a new Statistical Service should be built. In the latter case, potential collaborators should be identified and negotiated with, in consultation with the Investor.

Designer Story - 3 Once development has been decided on, or where existing functionality must be heavily modified, the Designer specifies, for each needed Statistical Service, the functionality required to meet the requirements. The Statistical Service is defined at the conceptual and logical levels by a Service Definition using GSIM information objects and a Service Specification using GSIM implementation objects.

Designer Story - 4 At the implementation level, decisions must be made about how to realize the needed functionality using technology approaches; these are documented in a Service Implementation Description by the Builder. Service design across these levels includes information design, technology design, work-flow, interface design, and other relevant aspects. The Designer will address Service dependencies, Service contracts, extensibility, and all data, metadata and performance metrics. Capabilities for configuration by users will also be determined, as well as the degree of configuration to be implemented by the Builder.

Designer Story - 5 An alternative scenario is one where Statistical Services that meet all identified requirements are already available, having been found in the Statistical Services Catalogue. In this case, the Designer specifies which Statistical Services are to be used, for the Assembler to work with directly.

Builder Story The Service Builder receives a Statistical Service Definition and a Statistical Service Specification from the Designer. The Builder then implements the Statistical Service, creating a Service Implementation Description. The Builder also implements features of the Service so that the Assembler can integrate it into the local environment and deploy it. The Builder tests the components to confirm that they support the specified functionality.

Assembler Story The Assembler takes the Statistical Service and integrates it according to the understanding of the needed business process, as expressed in the design documentation. There are two cases for the Assembler:
- Statistical Services entirely assembled within the local environment, which provides a high degree of confidence in their compatibility.
- Use of external Statistical Services, which might require extension or modification. In this latter case, issues are communicated to the Designer and Builder for further development.

Configurer Story The Configurer takes the assembled process, and makes it suitable for use in the intended statistical domain. Parameters are specified according to the domain knowledge of the Configurer. Any issues with the assembled Service are communicated to the Designer, Builder, or Assembler.

User Story There is no single user, but a chain of users along the statistical process. The user chain covers everyone from the designers of surveys, through those conducting data collection operations, to those who process the collected data. The User does not need to know where the data and metadata are stored; in particular, the user does not need to actively manage how data flows between parts of the processing environment.

Catalogues Catalogues of reusable resources have a key role within CSPA. They provide lists and descriptions of standardized artefacts, and, where relevant, information on how to obtain and use them. The catalogues can be at many levels, from global to local. For example, it is envisaged that each statistical organization will have catalogues of processes, information objects and Statistical Services.

CSPA Roles Review

CSPA Roles Review

CSPA Use Cases Sharing and reuse of statistical software among NSIs

Proof of concept

CSPA Proof Of Concept The Proof of Concept produced the first CSPA Statistical Services. The work progressed in parallel with the development of the architecture itself, in order to test the architecture and provide quick feedback into its development.

CSPA Proof Of Concept Given the short timeframe in which to complete the Proof of Concept, it was decided that the Statistical Services for the Proof of Concept could not be built from scratch. Instead, the organisations involved in the project were consulted to find suitable candidate tools/applications that could be wrapped and exposed as Statistical Services.

CSPA PoC: the Tools
- Blaise: a data collection, data editing and data processing tool developed by Statistics Netherlands. For the Proof of Concept, only the collection function was involved.
- EditRules: an error localization tool developed by a staff member at Statistics Netherlands, made available under the GPL; it can be obtained through the CRAN website.
- CANCEIS (CANadian Census Edit and Imputation System): an editing tool used for error localization and imputation, developed by Statistics Canada.
- GCode: a generalized automated and assisted coding tool developed by Statistics Canada.
- Statistical Classification Service: a coding tool developed by Statistics New Zealand.

Designer The Designer role creates the Service Definitions and Service Specifications.

Designer - 2 The five tools to be wrapped for the Proof of Concept performed four business functions:
- Run Collection (Blaise)
- Error Localization (EditRules)
- Editing and Imputation (CANCEIS)
- Autocoding (GCode and Statistical Classification Service)
GCode and the Statistical Classification Service performed the same business function, so at the conceptual and logical levels they are the same service.

Builders Organizations involved in wrapping one of the candidate tools performed the role of "Service Builder". Five statistical organizations performed this role during the Proof of Concept:
- Australia: Run Collection Statistical Service (Blaise)
- Italy: Error Localization Statistical Service (EditRules)
- Canada: Editing and Imputation Statistical Service (CANCEIS)
- Netherlands: Autocoding Service 1 (GCode)
- New Zealand: Autocoding Service 2 (Statistical Classification Service)

Building a Service: Autocoding 1

Assemblers Within each statistical organization, there needs to be an infrastructural environment in which the generic services can be combined and configured to run as elements of organization-specific processes. This environment is not part of CSPA. CSPA assumes that each statistical organization has such an environment, and makes statements about the characteristics and capabilities that such a platform must have in order to be able to accept and run Statistical Services that comply with CSPA.

Assemblers - 2 The Statistical Services were implemented (in various combinations, as shown in Figure 5) in three statistical organizations (Italy, New Zealand and Sweden), which performed the role of Service Assembler for the Proof of Concept.

Statistical Service Definition Example
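The example itself is not reproduced in this transcript. As a sketch of what such a Definition might contain, here is a hypothetical entry for the Autocoding service, following the template fields from the Service Definition Template; all values are illustrative, not taken from the actual slide:

```python
# Hypothetical Service Definition for the Autocoding service, keyed by the
# template fields (Name, GSBPM, Business Function, Outcomes, Restrictions,
# GSIM Inputs, GSIM Outputs, Service dependencies). Values are illustrative.
autocoding_definition = {
    "Name": "Autocoding",
    "GSBPM": "5.2 Classify and code",
    "Business Function": "Assign codes from a codelist to free-text responses",
    "Outcomes": "A coded unit data set, plus process metrics",
    "Restrictions": "Input texts must relate to a single variable",
    "GSIM Inputs": ["Node Set", "Unit Data Set", "Data Structure", "Rule"],
    "GSIM Outputs": ["Unit Data Set", "Data Structure", "Process Metric"],
    "Service dependencies": [],
}

# The Definition stays at the conceptual level: GSIM object names only,
# no file formats or protocols (those belong to later layers)
assert all(isinstance(v, (str, list)) for v in autocoding_definition.values())
```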

Statistical Service Specification Protocol for invoking the service This service is invoked by calling a function called "CodeDataset". It takes the following seven parameters (all expressed as URIs, i.e. all data is passed by reference):
1) Location of the codelist
2) Location of the input dataset
3) Location of the structure file describing the input dataset
4) Location of the mapping file describing which variables in the input dataset are to be used
5) Location of the output dataset generated by the service
6) Location of the structure file describing the output dataset generated by the service
7) Location of the process metrics file generated by the service
All parameters are required. The protocol used to invoke this function is SOAP, in compliance with the CSPA guidance for developing Statistical Services.
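The seven-URI, pass-by-reference calling convention can be sketched on the client side. This is a Python illustration only: the parameter names and the validation helper are my own, and the real invocation is a SOAP call rather than a local function:

```python
from urllib.parse import urlparse

# The seven CodeDataset parameters, in call order; all are URIs
# (data is passed by reference, never inline).
PARAM_NAMES = [
    "codelist_uri", "input_dataset_uri", "input_structure_uri",
    "mapping_uri", "output_dataset_uri", "output_structure_uri",
    "metrics_uri",
]

def build_code_dataset_request(**params):
    """Check that all seven required parameters are present and look like
    URIs, and return them in call order. Client-side sketch only."""
    missing = [n for n in PARAM_NAMES if n not in params]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    for name in PARAM_NAMES:
        if not urlparse(params[name]).scheme:
            raise ValueError(f"{name} must be a URI, got {params[name]!r}")
    return [params[n] for n in PARAM_NAMES]

args = build_code_dataset_request(
    codelist_uri="file:///data/codelist.xml",
    input_dataset_uri="file:///data/in.dat",
    input_structure_uri="file:///data/in-structure.xml",
    mapping_uri="file:///data/mapping.xml",
    output_dataset_uri="file:///data/out.dat",
    output_structure_uri="file:///data/out-structure.xml",
    metrics_uri="file:///data/metrics.xml",
)
assert len(args) == 7  # all seven parameters are required
```

Note that the last three URIs name locations the service will write to; the caller chooses them up front, which is what makes the no-return-value design workable.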

Statistical Service Specification Input messages The first four parameters of the service refer to input files. In GSIM terms, the inputs to this service are:
1) a NodeSet consisting of Nodes, which bring together CategoryItems, CodeItems and other Designations (synonyms);
2) a Unit Data Set: the texts to be coded for a particular variable;
3) a Data Structure, describing the structure of the Unit Data Set;
4) a set of Rules, describing which variables the service should use for which purpose.

Statistical Service Specification Input messages The codelist to be passed in must be expressed as a DDI 3.1 instance, using the following structure. The table shows the mapping of the conceptual GSIM objects to their encoding in DDI 3.1

Statistical Service Specification Input messages The unit data set is a fixed-width ASCII file containing at least a case ID (50 characters maximum) and a variable containing text strings to be coded. Each entry should be on a single line. The corresponding GSIM objects:
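Reading such a fixed-width file can be sketched as follows. This is a hypothetical helper: the actual field widths are not stated on the slide (in practice they would come from the accompanying DDI 3.1 structure file), so they are passed in directly here, and the sample layout is invented for illustration:

```python
def parse_unit_dataset(lines, id_width, text_width):
    """Parse a fixed-width unit data set: a case ID followed by the text
    to be coded, one record per line. Widths would in practice come from
    the DDI structure file describing the dataset."""
    records = []
    for line in lines:
        case_id = line[:id_width].rstrip()                      # trim padding
        text = line[id_width:id_width + text_width].rstrip()    # trim padding
        records.append((case_id, text))
    return records

# Illustrative layout: 8-character case ID, 30-character text field
sample = [
    "C0000001Baker                         ",
    "C0000002Software developer            ",
]
print(parse_unit_dataset(sample, 8, 30))
# [('C0000001', 'Baker'), ('C0000002', 'Software developer')]
```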

Statistical Service Specification Input messages The structure of the unit data set must be expressed as a DDI 3.1 instance, using the following structure. The table below shows the mapping of the conceptual GSIM objects to their encoding in DDI 3.1:

Statistical Service Specification Output messages The output of the service consists of three files. In GSIM terms, the outputs of this service are:
5) a Unit Data Set containing the coded data for the variable concerned;
6) a Data Structure, describing the structure of this Unit Data Set;
7) a Process Metric, containing information about the execution of the service.
These generated files are placed at the locations indicated by the 5th, 6th and 7th input parameters. No return parameter is generated by the service.

Statistical Service Specification Output messages The unit data set will be a fixed-width ASCII file containing (for the successfully coded entries) the case ID (50 characters maximum) followed by the Code. Each entry should be on a single line.
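The service-side half of the by-reference pattern, writing the three generated files to the caller-supplied locations and returning nothing, can be sketched like this. The file layouts below are illustrative placeholders, not the real DDI 3.1 or metrics encodings:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def write_service_outputs(coded_records, out_dataset, out_structure, out_metrics):
    """Write the three generated files to the locations named by the 5th,
    6th and 7th parameters. Nothing is returned: outputs travel by reference."""
    with open(out_dataset, "w") as f:
        # Coded unit data set: case ID padded to the 50-character maximum,
        # followed by the code, one entry per line
        for case_id, code in coded_records:
            f.write(f"{case_id:<50}{code}\n")
    with open(out_structure, "w") as f:
        f.write("<DDIInstance/>")  # placeholder; the real file is DDI 3.1
    metrics = ET.Element("ProcessMetrics")  # placeholder metrics layout
    ET.SubElement(metrics, "RecordsCoded").text = str(len(coded_records))
    ET.ElementTree(metrics).write(out_metrics)

# Demo: write to a temporary directory, standing in for the URIs supplied
# by the caller in parameters 5-7
tmp = tempfile.mkdtemp()
paths = [os.path.join(tmp, n) for n in ("out.dat", "structure.xml", "metrics.xml")]
write_service_outputs([("CASE0001", "5223")], *paths)
```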

Statistical Service Specification Output messages The structure of the unit data set will be expressed as a DDI 3.1 instance, using the following structure. The table below shows the mapping of the conceptual GSIM objects to their encoding in DDI 3.1:

Statistical Service Specification Output messages The Process metrics will be expressed as an XML file structured in the following way:

Statistical Service Specification Error messages When the coding process cannot be executed or is aborted due to some error, the service will return an error message. The following error messages can be generated by the service.

What was proved? CSPA is practical and can be implemented by various agencies in a consistent way. Having tested the architecture, some of the real issues are now known and there is a tested foundation to move forward from. One quote from a business perspective on the Proof of Concept: "The proof-of-concept form of working with these concepts is in itself very interesting. We can quickly gain insight into both problems and possibilities."

What Was Proved

Lessons Learned International collaboration is a trade to be mastered. The ongoing contact with colleagues around the globe is stimulating and broadens understanding. The discussion forum on the CSPA wiki was useful for discussing and progressing issues. However, troubleshooting during the installation/configuration period was made difficult by time zone differences, which meant that simple problems often took several days to resolve.

Lessons Learned - 2 The separation of the design, build and assemble functions into distinct roles worked very well. However, because the time spent on the Design role was limited to the one-week design sprint, there was some blurring of the Designer and Builder roles: in some cases the Service Builders had to tighten up the design specifications they were given in order to complete the build work.

Lessons Learned - 3 Each of the Service Builders and Service Assemblers needed licences for the tools that were wrapped. This was both a challenge and an opportunity. Obtaining the licences took some time and caused (small) delays in starting work. This was not a big problem given the small scale of the Proof of Concept. However, in the future, if an organisation that owns a Statistical Service has to provide licences for every party who wants to try the service, this could become onerous. Some organisations had processes in place to provide licences and some did not. At least one organisation created a process that it will be able to use for future collaborations.

Lessons Learned - 4 The Proof of Concept chose to wrap existing tools into Statistical Services for pragmatic reasons, and the wrapping did introduce some complexity. In some cases, the tools being wrapped were not developed by the organisation performing the role of Service Builder. Building service wrappers with meaningful interfaces requires in-depth knowledge of the tool being wrapped, and the Service Assemblers likewise needed in-depth knowledge of the Service they were implementing. Support is required to implement a Statistical Service built by another organization.

Lessons Learned - 5 The Proof of Concept was one of the first real-world uses of GSIM Implementation. Support was provided to Service Builders by DDI experts as well as by participants in the HLG Frameworks and Standards for Statistical Modernisation project. However, Service Builders needed knowledge of GSIM and of GSIM implementation standards (DDI in the case of the Proof of Concept); that is where the LIM comes in. In some cases, DDI needed to be extended, and it took time to explore how these extensions should be done.