Presentation on theme: "“Mapping the GSBPM on a SDW architecture”"— Presentation transcript:
1 “Mapping the GSBPM on a SDW architecture”
National Institute of Statistics – Italy
Antonio Laureti Palma
IT - Structural Business Statistics Unit
Workshop: ESSnet on Micro Data Linking and Data Warehousing in Statistical Production, 22 & 23 September 2011
2 Overview
The aim of this study is to define and contextualize a statistical data warehouse, in order to define a framework that assists the development and definition of “data warehousing and data linking”.
The data warehousing architecture presented can be considered the IT conclusion of the activities of the first year of the ESSnet, while the proposed modelling approach indicates the roadmap for the future IT representation of the context. It will be described by:
- Data Warehousing as a Single Coherent Statistical Production System
- Statistical Data Warehousing: an Architecture schema
- Modeling the Business Domain: Designer’s view of the GSBPM on the DWA schema
- Modeling the Data/Metadata Domain
- Conclusion
3 The Data Warehouse
IT definition: in computing, a data warehouse is a database used for reporting.
...the concept of data warehousing dates back to the late 1980s, when IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse" (from Wikipedia).
...as Bill Inmon says: “the data warehouse is at the center of the corporate information factory, which provides a logical framework for decision support environments and business management capabilities”.
...in essence, the data warehousing concept was intended to provide an architectural model for the flow of data from operational systems to delivering business intelligence.
4 Data Warehousing for Enterprise
DW centrality in an enterprise is obtained through an IT infrastructure transversal to all the operational systems.
The data from the operational systems are Extracted, Transformed and Loaded (ETL) into the DW, and are then available to the DSS and MIS.
[Figure: enterprise production line (Marketing, Resources, Production, Distribution, Sales), each feeding the Data Warehouse through ETL; the Data Warehouse serves the DSS (Decision Support System) and the MIS (Management Information System).]
5 Data Warehousing for Statistics
In an NSI, if the DW is mainly used for improving production efficiency, as in an enterprise, it is transversal to the statistical production line.
[Figure: statistical production line (Regulations, Resources, Surveys, Admin data, Elaboration, Output), each feeding the Data Warehouse through ETL; the Data Warehouse serves the DSS (Decision Support System) and the MIS (Management Information System).]
6 Data Warehousing for Statistics
In an NSI, if the DW is used both for “improving the production efficiency” (DSS-MIS) and for “creating the statistical product” (SD), then the DW is part of the production line.
...in this case, the DW can be considered a single logical repository, the center of the information factory, for all information generated by the NSI.
[Figure: statistical production line (Regulations, Resources, Surveys, Admin data) feeding the Data Warehouse through ETL; the Data Warehouse serves SD (Statistical Dissemination), DSS and MIS.]
7 From the survey, two issues arise:
Single coherent system (questions 6 to 13)
- 15 countries declare they do not have a single coherent system, even if 11 of them are planning to change it... this situation will probably change considerably in the next five years...
- Current output requirements are not integrated into data systems for 10 countries, and the situation will probably change for half of them...
- Those who have a single coherent system do not want to change it; metadata and data input are fully integrated in the data system, as is admin data.
Motivation to start DW (question 14)
- The main motivations are linked to the ways to (re)use data, the improvement of efficiency and the process integration in business statistics production...
- Additional motivations are integrating the project into the organization's processing model, reducing the burden (cost and time) on survey respondents, and increasing consistency and quality.
8 Disadvantages of a stove-pipe-like production
In a stove-pipe production system, every single production line corresponds to a specific domain of statistics, together with the corresponding production system. For each domain, the whole production process, from survey design to dissemination, takes place independently of other domains, and each has its own data suppliers and user groups.
[Figure: stove-pipe production lines for Structural Business Statistics (SBS), Short Term business Statistics (STS), Information Society (IS), Science Technology Innovation (STI) and I/O, each with its own survey data, administrative data, data integration, elaboration and statistical output; the Business Register also appears in the diagram.]
9 Data Warehousing as a Single Coherent System
In an NSI, a single coherent Data Warehousing System (DWSys) is aimed at improving production efficiency and creating the statistical products in a fully integrated way.
From this view, the DWSys becomes the “effective” Information System of the full statistical production line. The term DWSys should then be used to refer to the interaction between People, Business Processes, Data and Technology.
The Statistical Data Warehouse (SDW) can then be seen as a central statistical data store, regardless of the data’s source, for managing all available data of interest, improving the NSI’s ability to:
- (re)use data to create new data/new outputs;
- perform reporting;
- execute analysis;
- produce the necessary information.
10 DWSys Architectural description
A DWSys Architecture (DWA) for statistics is a rigorous description of the structure of the NSI production, which comprises the DWSys components (business entities or sub-processes), the externally visible properties of those components, and the relationships (e.g. the behavior) between them.
The DWA should be a framework for an NSI which defines how to organize the DWSys. It should:
- provide the mechanisms for communicating information about the relationships that are important in the architecture;
- provide the discipline to gather and organize the data and construct the views in a way that helps ensure integrity, accuracy and completeness;
- support the application of methods and the use of tools.
11 Layers of the enterprise architecture
In the context of creating an enterprise architecture, it is common to recognize four types of architecture, each corresponding to its particular architectural domain.
12 DWA – Business Domain
To provide a DWA as detailed as possible, in the context of statistics production, we can articulate the business domain in four functional layers:
- data source layer,
- integration layer,
- interpretation and data analysis layer,
- access layer.
Each layer has its own data domain structure:
- operational data, for data warehousing;
- meta data, the descriptive data of the SDW, usually used to manage, describe and monitor the information systems.
13 DWA layered business architecture
[Figure: the four layers SOURCE, INTEGRATION, INTERPRETATION & DATA ANALYSIS and ACCESS. Labels shown: Regulations, Resources, Statistical Surveys 1..n, Admin Data 1..n, Business Register, Staging Area, Primary Data, Data Marts, Dissemination, DSS, MIS, and Meta Data Management.]
14 DWA - functional Layer
Source Database Layer: this level is responsible for storing, physically or virtually, the data from internal (surveys) or external (archives) sources for statistical purposes.
Typical data sources, in the context of business statistics, are data from:
- specific surveys, like STS, ICT, CIS, SBS,
- Customs Agency,
- Revenue Agency,
- Chambers of Commerce,
- National Social Security Institute.
15 DWA - functional Layer
Integration layer: it is used for all integration and reconciliation activities on the data sources. This layer contains the set of applications that perform the main ETL, which manages:
- inconsistent coding for the same object: consistency is obtained through the coding defined by the data warehouse;
- adjustment of different units of measurement and inconsistent formats;
- alignment of inconsistent labels, where the same object is named differently. Usually the data are identified according to the definition contained in the metadata of the system;
- incomplete or incorrect data: in this case the operation may require human intervention to resolve issues that are not predictable a priori;
- data linking, in which different sources enable the creation of extended, or new, units of analysis.
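The integration tasks listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lookup tables, the field names (`unit_id`, `activity`, `turnover`, `sales_total`, `unit`) and the two source names are hypothetical and do not come from the presentation; in an actual SDW they would be read from the metadata system.

```python
# Hypothetical lookup tables; a real SDW would read these from its metadata.
ACTIVITY_CODES = {("survey", "MANUF"): "C", ("revenue_agency", "10"): "C"}
UNIT_FACTORS = {"EUR": 1.0, "kEUR": 1000.0}
LABELS = {"turnover": "turnover", "sales_total": "turnover"}


def harmonize(record, source):
    """Apply the warehouse coding, units and labels to one source record.

    Returns (clean_record, issues). A non-empty issue list means the record
    should be routed to a person instead of being loaded automatically.
    """
    issues = []
    clean = {"unit_id": record["unit_id"], "source": source}

    # Inconsistent coding: map the source code to the warehouse code.
    clean["activity"] = ACTIVITY_CODES.get((source, record.get("activity")))
    if clean["activity"] is None:
        issues.append("unknown activity code")

    # Units of measurement: look up the factor that converts to euros.
    factor = UNIT_FACTORS.get(record.get("unit", "EUR"))
    if factor is None:
        issues.append("unknown unit")

    # Inconsistent labels: rename source fields to the warehouse variable.
    for source_label, warehouse_label in LABELS.items():
        if source_label in record and factor is not None:
            clean[warehouse_label] = record[source_label] * factor

    # Incomplete data: flag the record instead of guessing a value.
    if "turnover" not in clean:
        issues.append("missing turnover")
    return clean, issues


def link(clean_records):
    """Data linking: merge records sharing a unit_id into one extended unit."""
    units = {}
    for rec in clean_records:
        unit = units.setdefault(rec["unit_id"], {"activity": rec["activity"]})
        unit["turnover_" + rec["source"]] = rec["turnover"]
    return units


survey_rec, survey_issues = harmonize(
    {"unit_id": "A1", "activity": "MANUF", "turnover": 12.5, "unit": "kEUR"},
    "survey")
admin_rec, admin_issues = harmonize(
    {"unit_id": "A1", "activity": "10", "sales_total": 12000, "unit": "EUR"},
    "revenue_agency")
linked = link([survey_rec, admin_rec])
print(linked)
```

Records that come back with a non-empty issue list are exactly the cases the slide says may need human intervention, so the sketch returns them rather than silently repairing them.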
16 DWA - functional Layer
Interpretation and data analysis layer: the basic functions performed at this level are advanced analysis and interpretation of data elaborations, both based on statistical algorithms. Here “statistical expert users” operate to produce strategic value information, working with data at maximum granularity. Only a small number of users are allowed to access the data, to avoid degrading server performance.
This is a strategy of “process of information delivery”, in which the demand for new statistical information does not involve the construction of new statistical production lines, but rather the creation of further data marts. The results of these activities are either unplanned aggregate data for the access layer or software rules for the next iteration, delivered through data marts, which are regarded as subsets of the DW, usually oriented to a specific business line or team.
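To make the idea of a data mart as a subset of the DW concrete, here is a minimal sketch. The row layout (`domain`, `activity`, `turnover`) and the function name `build_data_mart` are assumptions made for illustration only; they are not part of the presented architecture.

```python
from collections import defaultdict


def build_data_mart(primary_data, domain, group_by, measure):
    """Derive an aggregate for one domain from the primary data.

    The micro data stay in the warehouse; only the aggregate is handed on.
    """
    totals = defaultdict(float)
    for row in primary_data:
        if row["domain"] == domain:
            totals[row[group_by]] += row[measure]
    return dict(totals)


# Hypothetical primary data at unit level (maximum granularity).
primary_data = [
    {"unit_id": "A1", "domain": "SBS", "activity": "C", "turnover": 100.0},
    {"unit_id": "A2", "domain": "SBS", "activity": "C", "turnover": 200.0},
    {"unit_id": "A3", "domain": "SBS", "activity": "G", "turnover": 50.0},
    {"unit_id": "A4", "domain": "STS", "activity": "C", "turnover": 999.0},
]

sbs_mart = build_data_mart(primary_data, "SBS", "activity", "turnover")
print(sbs_mart)
```

A new information request then becomes another call with different parameters, not a new production line.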
17 DWA - functional Layer
Access Layer: it is the layer for the final presentation of the information sought, addressed to a wide range of users who are not necessarily experts in business statistics or IT tools. The access tools are:
- Specialized Business Intelligence tools: in this extensive category, in terms of solutions on the market, we find tools to build queries and navigational tools (OLAP viewers), including Web browsers;
- Graphics and publishing tools: the Business Intelligence tools are able to generate graphs and tables for their users; this solution essentially takes just a couple of steps, which avoids inefficiency;
- Office Automation tools: this is a reassuring solution for users who come to the data warehouse context for the first time, as they are not forced to learn new complex instruments. The problem is that this solution, while adequate with regard to productivity and efficiency, is very restrictive in the use of the data warehouse, since these instruments have significant architectural and functional limitations.
18 DWA – Modeling the Business Domain
The designer's view of the business is also known as the analytical view, and there are various standards for modeling this view. One of the most commonly used modeling standards is the Generic Statistical Business Process Model (GSBPM).
The GSBPM definition by UNECE (vers. 4) is:
“The original intention was for the GSBPM to provide a basis for statistical organizations to agree on standard terminology to aid their discussions on developing statistical metadata systems and processes. The GSBPM should therefore be seen as a flexible tool to describe and define the set of business processes needed to produce official statistics”.
So, in order to define a general and comprehensive architecture for statistical production, it may be useful to identify and locate the different phases of a generic statistical production process on the different functional levels of the DWA.
20 DWA - Mapping the GSBPM on DWA
The analysis of the location of sub-processes on a SDW architecture is graphically represented in the next slides, with the SDW functional layers on the horizontal axis and the nine GSBPM phases on the vertical axis. Each element inside the graph is a sub-process; we will consider GSBPM phases 4 to 7.
This is only an example of Model Processing. Each case must be validated and discussed in its own operational context; this is just a basis for setting up and starting the modelling work for the next two years of the ESSnet.
In this context, each sub-process must be regarded from one of the following points of view:
- methodological,
- planning,
- technological,
- operational.
Blank sub-processes are related to methodological, or planning, metadata definitions, while brown sub-processes are related to operational, or technological, functions for data elaboration.
21 Designer's view - Mapping the GSBPM on DWA
Sub-processes of the GSBPM allocated on the functional layers of the DWA.
Layers: Source Layer, Integration Layer, Interpretation and analysis Layer, Access Layer
7 Disseminate:
- 7.1 update output systems
- 7.2 produce dissemination
- 7.3 manage release of dissemination products
- 7.4 promote dissemination
- 7.5 manage user support
6 Analyze:
- 6.1 prepare draft output
- 6.2 validate outputs
- 6.3 scrutinize and explain
- 6.4 apply disclosure control
- 6.5 finalize outputs
22 Designer's view - Mapping the GSBPM on DWA
Sub-processes of the GSBPM allocated on the functional layers of the DWA.
Layers: Source Layer, Integration Layer, Interpretation and analysis Layer, Access Layer
5 Process:
- 5.1 integrate data
- 5.2 classify & code
- 5.3 review, validate & edit
- 5.4 impute
- 5.5 derive new variables and statistical units
- 5.6 calculate weights
- 5.7 calculate aggregate
- 5.8 finalize data files
4 Collect:
- 4.1 select sample
- 4.2 set up collection
- 4.3 run collection
- 4.4 finalize collection
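The grid described in these mapping slides (functional layers on one axis, GSBPM phases on the other, each cell holding sub-processes) can be represented as a small data structure. The sketch below is an assumption-laden illustration: the layer and point of view assigned to each sample sub-process are hypothetical placements chosen for the example, not the allocation shown in the original diagrams, which cannot be read from this transcript.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    SOURCE = "Source"
    INTEGRATION = "Integration"
    INTERPRETATION = "Interpretation and data analysis"
    ACCESS = "Access"


class View(Enum):
    METHODOLOGICAL = "methodological"
    PLANNING = "planning"
    TECHNOLOGICAL = "technological"
    OPERATIONAL = "operational"


@dataclass(frozen=True)
class SubProcess:
    code: str    # GSBPM sub-process code, e.g. "5.1"
    name: str
    layer: Layer
    view: View

    @property
    def phase(self):
        """GSBPM phase number, taken from the part before the dot."""
        return int(self.code.split(".")[0])

    @property
    def role(self):
        """Methodological/planning views define metadata; the others elaborate data."""
        if self.view in (View.METHODOLOGICAL, View.PLANNING):
            return "metadata definition"
        return "data elaboration"


# ILLUSTRATIVE placements only: layer and view are assumptions for this example.
SAMPLE = [
    SubProcess("4.3", "run collection", Layer.SOURCE, View.OPERATIONAL),
    SubProcess("5.1", "integrate data", Layer.INTEGRATION, View.OPERATIONAL),
    SubProcess("6.3", "scrutinize and explain", Layer.INTERPRETATION,
               View.METHODOLOGICAL),
    SubProcess("7.5", "manage user support", Layer.ACCESS, View.PLANNING),
]


def build_grid(sub_processes):
    """Index sub-process codes by (phase, layer), i.e. by grid cell."""
    grid = defaultdict(list)
    for sp in sub_processes:
        grid[(sp.phase, sp.layer)].append(sp.code)
    return dict(grid)


grid = build_grid(SAMPLE)
for (phase, layer), codes in sorted(grid.items(), key=lambda kv: kv[0][0]):
    print(phase, layer.value, codes)
```

Keeping the point of view on each sub-process lets the two colour classes of the diagram (metadata definition versus data elaboration) be derived instead of stored separately.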
23 Designer's view – Modeling the Data Domain
Graphic scheme of the layered architecture with a focus on “statistical data”:
24 SDW – Modeling the Meta Data Domain
Our purpose is to refer to an IT infrastructure for the SDW, so we should consider only structured metadata, articulated as:
- Structural Metadata (SM): they are used for the description, identification and retrieval of statistical and quality information. Moreover, they can link the various components of the SDW;
- Process Metadata (PM): they are used to store data usage and the maintenance of process administration, as well as the information needed for the automatic execution of workflows or management systems.
Both can be Active, when they enable operational use, manual or automated, for one or more processes, or Passive in all other uses.
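The two-way classification above (structural or process, active or passive) can be sketched as a small catalogue. The class names, the `drives` field and the sample entries are hypothetical and serve only to illustrate the distinction; the rule applied is the one stated on the slide, namely that metadata are active when they enable operational use for one or more processes.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    STRUCTURAL = "SM"
    PROCESS = "PM"


@dataclass
class MetadataItem:
    name: str
    kind: Kind
    drives: tuple = ()  # processes that use this item operationally

    @property
    def mode(self):
        """Active if at least one process uses the item, passive otherwise."""
        return "active" if self.drives else "passive"


# Hypothetical catalogue entries.
catalogue = [
    MetadataItem("activity_classification", Kind.STRUCTURAL,
                 ("5.2 classify & code",)),
    MetadataItem("variable_description", Kind.STRUCTURAL),
    MetadataItem("etl_schedule", Kind.PROCESS, ("workflow engine",)),
    MetadataItem("load_history", Kind.PROCESS),
]

active_names = [item.name for item in catalogue if item.mode == "active"]
print(active_names)
```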
25 Designer's view - Modeling the Meta Data Domain
Graphic scheme of the layered architecture with a focus on “meta data”:
26 Conclusion
We have contextualized statistical production in a Data Warehousing Architecture.
In doing so, we have introduced a general Enterprise Architecture vision for a SDW production system.
We have shown how the GSBPM representation can be used for modelling the business domain of the SDW layered architecture, giving a complete operational view for the deployment of statistical production cases.
Finally, we have shown the corresponding four-level data domain of the architecture for a Statistical Data Warehouse.