CSPA Specifications Overview

1 CSPA Specifications Overview
Marco Silipo THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION

2 Outline: CSPA Specifications; The PoC; Lessons learned

3 CSPA Specifications

4 Statistical Services The level of reusability promised by the adoption of a SOA is dependent on standard definitions of the services. CSPA has three layers to the description of any service: Service Definition, Service Specification, and Service Implementation Description.
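To make the three layers concrete, here is a minimal sketch (not part of CSPA itself) of how the three description documents for one Statistical Service might be held together in code; all field names beyond the layer names are illustrative assumptions.

```python
# Illustrative sketch only: the three CSPA description layers for one service.
# Field names other than the layer names are assumptions made for the example.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ServiceDefinition:  # conceptual level
    name: str
    gsbpm_subprocess: str                                    # GSBPM sub process the service relates to
    business_function: str
    gsim_inputs: List[str] = field(default_factory=list)     # GSIM information objects
    gsim_outputs: List[str] = field(default_factory=list)


@dataclass
class ServiceSpecification:  # logical level
    operations: List[str] = field(default_factory=list)      # business functions with GSIM implementation objects
    metrics: List[str] = field(default_factory=list)


@dataclass
class ServiceImplementationDescription:  # implementation (physical) level
    protocol: str                                             # e.g. SOAP endpoint details
    dependencies: List[str] = field(default_factory=list)     # underlying infrastructure
    configuration: Dict[str, str] = field(default_factory=dict)


@dataclass
class StatisticalService:
    definition: ServiceDefinition
    specification: ServiceSpecification
    # several implementations may exist, one per supplying organization
    implementations: List[ServiceImplementationDescription] = field(default_factory=list)
```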

5 Statistical Services

6 Statistical Services The capabilities of a Statistical Service are described in terms of the GSBPM sub process that it relates to, the business function that it performs, and the GSIM information objects that are its inputs and outputs.

7 Statistical Services The capabilities of a Statistical Service are fleshed out into business functions that have GSIM implementation level objects as inputs and outputs. This document also includes metrics and methodologies.

8 Statistical Services The functions of the Statistical Service are refined into detailed operations whose inputs and outputs are GSIM implementation level objects.

9 Statistical Services This layer fully defines the service contract, including communications protocols, by means of the Service Implementation Description. It includes a precise description of all dependencies on the underlying infrastructure, non-functional characteristics, and any relevant information about the configuration of the application being wrapped, when applicable.

10 Statistical Services In general, there will be one Service Specification corresponding to a Service Definition, to ensure that standard data exchange can occur. At the implementation level, services may have different implementations reflecting the environment of the supplying organization. Each implementation must rigidly adhere to the data format specified in the Service Specification.

11 CSPA Statistical Services: Templates

12 CSPA Service Definition
The CSPA Service Definition is at a conceptual level. In CSPA, the capabilities of a Statistical Service are described in terms of the GSBPM sub process that it relates to, the business function that it performs and GSIM information objects which are the inputs and outputs.

13 Service Definition Template
Name, GSBPM, Business Function, Outcomes, Restrictions, GSIM Inputs, GSIM Outputs, Service dependencies
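As an illustration only, a filled-in Service Definition for the PoC Autocoding business function could be captured as a simple data structure like the one below; all values, including the GSBPM sub process reference, are assumptions made for the example, not the official definition.

```python
# Hypothetical, filled-in Service Definition template for the PoC
# "Autocoding" business function. Values are illustrative assumptions.
autocoding_definition = {
    "Name": "Autocoding Service",
    "GSBPM": "5.2 Classify and code",        # assumed sub process reference
    "Business Function": "Autocoding",
    "Outcomes": "Coded unit data for the selected variable",
    "Restrictions": "Codelist and dataset structures must be supplied as DDI 3.1 instances",
    "GSIM Inputs": ["Node Set", "Unit Data Set", "Data Structure", "Rule"],
    "GSIM Outputs": ["Unit Data Set", "Data Structure", "Process Metric"],
    "Service dependencies": [],
}
```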

14 CSPA Service Specification
The CSPA Service Specification is at a logical level. In this layer, the capabilities of a CSPA Service are fleshed out into business functions that have GSIM implementation level objects as inputs and outputs. This document also includes metrics and methodologies.

15 Service Specification Template

16 CSPA Service Implementation
The CSPA Service Implementation Description is at an implementation (or physical) level. In this layer, the functions of the CSPA Service are refined into detailed operations whose inputs and outputs are GSIM implementation level objects. This layer fully defines the service contract, including communications protocols. It includes non-functional characteristics and any relevant information about the configuration of the application being wrapped.

17 Service Implementation Template 1/2

18 Service Implementation Template 2/2

19 CSPA Service Design & Implementation

20 CSPA Roles The use of CSPA is based on functional roles, to be assigned within a statistical organization: Investor, Designer, Service Builder, Service Assembler, Configurer, and User. The functional roles are presented below in the form of user stories.

21 Investor Story The Investor receives demand for a new data collection and needs to compare the cost of running a collection using traditional methods, with the cost of using a set of components as per CSPA. The Investor identifies existing Statistical Services to be used and identifies gaps where no Statistical Service already exists. The Investor weighs up creating a fully bespoke processing solution for the collection against having to build a new Statistical Service that fits into a set of existing Services. This would be done in consultation with other roles.

22 Designer Story The Designer has been given a set of high-level business requirements from the Investor: what data is needed, and what are the parameters of the process? In order to determine what functionality is available, the Designer will consider internal and external capabilities by searching in the Statistical Services Catalogue.

23 Designer Story - 2 When a possible internal candidate is found, a decision will be made as to whether the existing functionality should be wrapped and exposed as a Statistical Service, or whether a new Statistical Service should be built. In the latter case, potential collaborators should be identified and the collaboration negotiated with the Investor.

24 Designer Story - 3 Once development has been decided on, or where existing functionality must be heavily modified, the Designer will specify, for each needed Statistical Service, the functionality needed to meet requirements. The Statistical Service is defined on a conceptual and logical level by a Service Definition using GSIM information objects and a Service Specification using GSIM implementation objects.

25 Designer Story - 4 On an implementation level, decisions must be made about how to realize the needed functionality using technology approaches, and these are documented in a Service Implementation Description by the Builder. Service design across these levels includes information design, technology design, workflow, interface design, and other relevant aspects. The Designer will address Service dependencies, Service contracts, extensibility, and all data, metadata and performance metrics. Capabilities for configuration by users will also be determined, as well as the degree of configuration to be implemented by the Builder.

26 Designer Story - 5 An alternative scenario is one where Statistical Services are already available, having been found in the Statistical Services Catalogue, and meeting all identified requirements. In this case, the Designer specifies which Statistical Services are to be used, and documents this for the Assembler to work from directly.

27 Builder Story The Service Builder receives a Statistical Service Definition and a Statistical Service Specification from the Designer. The Statistical Service is then implemented by the Builder, who creates a Service Implementation Description. The Builder will also implement features of the Service so that the Assembler can integrate it into the local environment and deploy it. The Builder tests the components to support the specified functionality.

28 Assembler Story The Assembler will take the Statistical Service and integrate it according to the understanding of the needed business process, as expressed in the design documentation. There are two cases for the Assembler: Statistical Services entirely assembled within the local environment, which provides a high degree of confidence in their compatibility; and use of external Statistical Services, which might require extension or modification. In the latter case, issues would be communicated to the Designer and Builder for further development.

29 Configurer Story The Configurer takes the assembled process, and makes it suitable for use in the intended statistical domain. Parameters are specified according to the domain knowledge of the Configurer. Any issues with the assembled Service are communicated to the Designer, Builder, or Assembler.

30 User Story There is no single user but a chain of users along the Statistical process. The user chain covers everyone from the designers of surveys, through the conduct of data collection operations, through to those who process the collected data. The User does not need to know where the data and metadata are stored - in particular the user does not need to actively manage how data flows between parts of the processing environment.

31 Catalogues Catalogues of reusable resources have a key role within CSPA. They provide lists and descriptions of standardized artefacts, and, where relevant, information on how to obtain and use them. The catalogues can be at many levels, from global to local. For example, it is envisaged that each statistical organization will have catalogues of processes, information objects and Statistical Services.

32 CSPA Roles Review

33 CSPA Roles Review

34 CSPA Use Cases Sharing and reuse of statistical software among NSIs

35 Proof of concept

36 CSPA Proof Of Concept The Proof of Concept produced the first CSPA Statistical Services. The work progressed in parallel with the work undertaken to develop the architecture. The purpose of doing this was to test the architecture and provide quick feedback into its development.

37 CSPA Proof Of Concept Given the short timeframe in which to complete the Proof of Concept, it was decided that the Statistical Services for the Proof of Concept could not be built from scratch. Instead, the organisations involved in the project were consulted to find suitable candidate tools/applications that could be wrapped and exposed as Statistical Services.

38 CSPA PoC: the Tools
Blaise: A data collection, data editing and data processing tool developed by Statistics Netherlands. For the Proof of Concept only the collection function was involved.
Editrules: An error localization tool developed by a staff member at Statistics Netherlands, made available under the GPL; it can be obtained through the CRAN website.
CANCEIS (CANadian Census Edit and Imputation System): An editing tool used for error localization and imputation, developed by Statistics Canada.
GCode: A generalized automated and assisted coding tool developed by Statistics Canada.
Statistical Coding Service: A coding tool developed by Statistics New Zealand.

39 Designer The Designer role creates the Service Definitions and Service Specifications.

40 Designer - 2 The five tools which were to be wrapped for the Proof of Concept performed four business functions: Run Collection (Blaise), Error Localization (EditRules), Editing and Imputation (CANCEIS), and Autocoding (GCode and Statistical Classification Service). GCode and the Statistical Classification Service performed the same business function, so at the conceptual and logical level they are the same service.

41 Builders Organizations involved in the wrapping of one of the candidate tools performed the role of "Service Builders". Five statistical organizations performed this role during the Proof of Concept:
Australia: Run Collection Statistical Service (Blaise)
Italy: Error Localization Statistical Service (EditRules)
Canada: Editing and Imputation Statistical Service (CANCEIS)
Netherlands: Autocoding Service 1 (GCode)
New Zealand: Autocoding Service 2 (Statistical Classification Service)

42 Building a Service: Autocoding 1

43 Assemblers Within each statistical organization, there needs to be an infrastructural environment in which the generic services can be combined and configured to run as elements of organization-specific processes. This environment is not part of CSPA. CSPA assumes that each statistical organization has such an environment, and makes statements about the characteristics and capabilities that such a platform must have in order to be able to accept and run Statistical Services that comply with CSPA.

44 Assemblers - 2 The Statistical Services were implemented (in various combinations, as shown in Figure 5) in three statistical organizations (Italy, New Zealand and Sweden). These organizations performed the role of Service Assembler for the Proof of Concept.

45 Statistical Service Definition Example

46 Statistical Service Specification Protocol for invoking the service
This service is invoked by calling a function called "CodeDataset". There are seven parameters, all of them expressed as URIs, i.e. all data is passed by reference:
1) Location of the codelist;
2) Location of the input dataset;
3) Location of the structure file describing the input dataset;
4) Location of the mapping file describing which variables in the input dataset are to be used;
5) Location of the output dataset generated by the service;
6) Location of the structure file describing the output dataset generated by the service;
7) Location of the process metrics file generated by the service.
All parameters are required. The protocol used to invoke this function is SOAP, in compliance with the guidance provided by CSPA for developing Statistical Services.
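A minimal sketch of what such a call could look like from a SOAP client, assuming a Python client library such as zeep; the WSDL address and the file URIs are placeholders, not real locations published by CSPA.

```python
# Sketch only: invoking the CodeDataset operation with the seven URI
# parameters described above. Endpoint and URIs are hypothetical.
from zeep import Client

client = Client("https://example.org/autocoding?wsdl")  # placeholder WSDL location

client.service.CodeDataset(
    "file:///data/codelist.xml",          # 1) codelist (DDI 3.1 instance)
    "file:///data/input.asc",             # 2) input dataset (fixed-width ASCII)
    "file:///data/input-structure.xml",   # 3) structure file for the input dataset
    "file:///data/mapping.xml",           # 4) mapping file: which variables to use
    "file:///out/coded.asc",              # 5) output dataset to be written by the service
    "file:///out/output-structure.xml",   # 6) structure file for the output dataset
    "file:///out/metrics.xml",            # 7) process metrics file to be written
)
# All seven parameters are required; data is passed by reference (URIs), and
# the service writes its outputs to locations 5-7 rather than returning them.
```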

47 Statistical Service Specification Input messages
The first four parameters for the service refer to input files. In GSIM terms, the inputs to this service are:
1) a NodeSet consisting of Nodes, which bring together CategoryItems, CodeItems, and other Designations (synonyms);
2) a Unit data set: the texts to be coded for a particular variable;
3) a Data structure, describing the structure of the Unit data set;
4) a set of Rules, describing which variables the service should use for which purpose.

48 Statistical Service Specification Input messages
The codelist to be passed in must be expressed as a DDI 3.1 instance, using the following structure. The table shows the mapping of the conceptual GSIM objects to their encoding in DDI 3.1.

49 Statistical Service Specification Input messages
The unit data set is a fixed-width ASCII file containing at least a case ID (50 characters maximum) and a variable containing text strings to be coded. Each entry should be on a single line. The corresponding GSIM objects:
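A small sketch of producing such a file, assuming (purely for illustration) a 50-character case ID field followed by a 200-character text field; the actual field widths would be taken from the accompanying DDI 3.1 structure file.

```python
# Sketch only: writing a fixed-width ASCII unit data set as described above.
# The 200-character text width is an assumption made for the example.
records = [
    ("000001", "primary school teacher"),
    ("000002", "software developer"),
]

with open("input.asc", "w", encoding="ascii") as f:
    for case_id, text in records:
        # one entry per line: padded case ID, then the text string to be coded
        f.write(f"{case_id:<50}{text:<200}\n")
```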

50 Statistical Service Specification Input messages
The structure of the unit data set must be expressed as a DDI 3.1 instance, using the following structure. The table below shows the mapping of the conceptual GSIM objects to their encoding in DDI 3.1:

51 Statistical Service Specification Output messages
The output of the service consists of three files. In GSIM terms, the outputs of this service are:
5) a Unit data set containing the coded data for the variable concerned;
6) a Data structure, describing the structure of this Unit data set;
7) a Process Metric, containing information about the execution of the service.
These generated files will be placed at the locations indicated by the 5th, 6th and 7th input parameters. No return parameter will be generated by the service.

52 Statistical Service Specification Output messages
The unit data set will be a fixed-width ASCII file containing (for the successfully coded entries) the case ID (50 characters maximum) followed by the Code. Each entry should be on a single line.
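Reading the coded output back could look like the sketch below; the 50-character case ID width comes from the description above, while treating the remainder of each line as the Code is an assumption (in practice the accompanying structure file describes the exact layout).

```python
# Sketch only: parsing the fixed-width output file of coded entries.
coded = {}
with open("coded.asc", encoding="ascii") as f:
    for line in f:
        case_id = line[:50].strip()   # case ID occupies the first 50 characters
        code = line[50:].strip()      # assumed: the Code fills the rest of the line
        coded[case_id] = code

print(coded)
```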

53 Statistical Service Specification Output messages
The structure of the unit data set will be expressed as a DDI 3.1 instance, using the following structure. The table below shows the mapping of the conceptual GSIM objects to their encoding in DDI 3.1:

54 Statistical Service Specification Output messages
The Process metrics will be expressed as an XML file structured in the following way:
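The actual layout was shown in a figure that is not reproduced in this transcript, so the element names below are placeholders only; the sketch merely shows how such a metrics file could be read with the Python standard library.

```python
# Sketch only: reading a process-metrics XML file. "Metric", "Name" and
# "Value" are placeholder element names, not the real CSPA schema.
import xml.etree.ElementTree as ET

tree = ET.parse("metrics.xml")
for metric in tree.getroot().iter("Metric"):
    print(metric.findtext("Name"), metric.findtext("Value"))
```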

55 Statistical Service Specification Error messages
When the coding process cannot be executed or is aborted due to some error, the service will return an error message. The following error messages can be generated by the service.

56 What was proved? CSPA is practical and can be implemented by various agencies in a consistent way. Having tested the architecture, some of the real issues are now known and there is a tested foundation to move forward from. One quote from a business perspective on the Proof of Concept was: "The proof-of-concept form of working with these concepts is in itself very interesting. We can quickly gain insight to both problems and possibilities."

57 What Was Proved

58 Lessons Learned International collaboration is a trade to be mastered
The on-going contact with colleagues across the globe is stimulating and broadens understanding. The discussion forum on the CSPA wiki was useful for discussing and progressing issues. However, troubleshooting during the installation/configuration period was made difficult by the time zone differences. It meant that simple problems often took a number of days to resolve.

59 Lessons Learned - 2 The separation of roles into design, build and assemble functions worked very well. However, due to the limited time spent focused on the Design role (limited to the one-week design sprint), there was a blurring of the Designer and Builder roles. The Service Builders found in some cases that they had to tighten up the design specifications that they were given in order to complete the build work.

60 Lessons Learned - 3 Each of the Service Builders and Service Assemblers needed licences for the tools that were wrapped. This was both a challenge and an opportunity. Obtaining the licences took some time and caused (small) delays in starting work. This was not a big problem given the small scale of the Proof of Concept. However, in the future, if an organisation that owns a Statistical Service has to provide licences for every party who wants to try the service, this could become onerous. Some organisations had processes in place to provide licences and some did not. At least one organisation created a process that they will be able to use for future collaborations.

61 Lessons Learned - 4 The Proof of Concept chose to wrap existing tools into Statistical Services for pragmatic reasons. The wrapping did introduce some complexity. In some cases, the tools being wrapped by Service Builders were not developed by the organisation performing the role of Service Builder. Building service wrappers with meaningful interfaces requires in-depth knowledge of the tool being wrapped. The Service Assemblers also needed in-depth knowledge of the Service that they were implementing. Support is required to implement a Statistical Service built by another organization.

62 Lessons Learned - 5 The Proof of Concept was one of the first real-world uses of GSIM Implementation. Support was provided to Service Builders by DDI experts as well as participants in the HLG Frameworks and Standards for Statistical Modernisation project. However, Service Builders needed knowledge of GSIM and GSIM Implementation standards (DDI in the case of the Proof of Concept). That's where LIM comes in. In some cases, DDI needed to be extended. It took time to explore how these extensions should be done.

