End-to-End Management of the Statistical Process An Initiative by ABS

1 End-to-End Management of the Statistical Process An Initiative by ABS
Bryan Fitzpatrick Rapanea Consulting Limited and Australian Bureau of Statistics Work Session on Statistical Metadata (METIS) March 2010, Geneva

2 The Objectives
- Business transformation aimed at
  - reducing cost
  - improving effectiveness and ability to respond
- A holistic approach to managing and improving the entire statistical life-cycle
- International collaboration
  - ABS does not want to go it alone; the aim is for a shared approach
  - sharing of ideas, interfaces, tools, but with acceptance of national differences
- Build on recent progress in the international statistical community
  - standards (SDMX, DDI), GSBPM; the aim is to make them work in practice
- A new program – IMTP: Information Management Transformation Program

3 End-to-End Management of the Statistical Process
- Metadata is always the key to better approaches and process improvements
  - it has been in all previous ABS improvement programs
- ABS has a long history of trying to manage metadata (with modest successes)
- Metadata means all the information we use in and around the processes and the data
  - to improve things we need to understand it, rationalise it, share it, and use it to automate and drive processes and make the outputs more integrated and usable
- Previous improvement programs have generally been much more limited
  - focused on a few areas in a few projects
  - narrow metadata focus

4 SDMX and DDI
- They are useful standards
  - they are not the focus of ABS interest in the exercise
  - the focus is optimising the statistical processes and improving the results from those processes
  - but we need to describe and manage all aspects of the statistical process, and that is their target domain
- They are international standards
  - sponsored and used by the community ABS is part of, for purposes that are relevant to IMTP
- To discuss the issues internally and with other organisations we need models
  - SDMX and DDI are in use, relevant, and fit for purpose
- IMTP aims to apply these standards (along with some others – ISO 11179, ISO 19115) and make them work
  - build on recent work in the international statistical community

5 IMTP and Metadata Management
- Metadata management will be a major part of IMTP
  - storing it, rationalising it, making it available for sharing and easy use, presenting it in different ways
  - and integrating with existing stores such as the Input Data Warehouse, Data Element Repository, and ABS Information Warehouse
- We talk of a “Metadata Bus” and “Metadata Services”
  - some technical jargon: it means the metadata is easily available to all systems running in the ABS environment
  - we are still figuring out precisely what we mean and how it should look
  - we need to get “use cases” – examples of what business areas and their systems need to do with the metadata
  - but the services will deliver various sorts of metadata in XML formats conforming to schemas from DDI and SDMX
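As a rough sketch of what a metadata service returning XML might look like, the snippet below serialises a small concept scheme. The element names, ids, and concepts are invented for illustration; a real service would emit XML conforming to the actual SDMX or DDI schemas.

```python
import xml.etree.ElementTree as ET

def concept_scheme_to_xml(scheme_id, concepts):
    """Serialise a concept scheme to an XML string.

    Element names loosely echo SDMX structure but are illustrative
    only, not the real schema vocabulary.
    """
    root = ET.Element("ConceptScheme", id=scheme_id)
    for concept_id, name in concepts.items():
        concept = ET.SubElement(root, "Concept", id=concept_id)
        ET.SubElement(concept, "Name").text = name
    return ET.tostring(root, encoding="unicode")

# "CS_DEMOG" and the concept ids are made up for the example.
doc = concept_scheme_to_xml("CS_DEMOG", {"AGE": "Age", "SEX": "Sex"})
```

Any downstream system can then parse the same document back into its own structures, which is the point of the "bus": one serialisation, many consumers.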

6 IMTP and Metadata Management
- IMTP focus will be on metadata that is “actionable”
  - it means we want it in a form that both people and systems can use
  - that can be easily stored and passed around
  - that can be used easily to generate whatever format is required in any particular case
    - including web pages, PDFs, manuals, other human-readable forms
  - SDMX and DDI both represent the metadata in XML
- Major focus on metadata management
  - versioned and maintained as in SDMX and DDI
  - “confrontation” across collections and processes
  - aim is consistent, standard metadata across the organisation, and consistent with international use wherever sensible

7 What sorts of metadata?
- Current ABS metadata management has many shortcomings
  - much metadata is in corporate stores, but in too many stores, and often documentary rather than actionable
  - often not used to drive systems even where it is available and actionable
    - the systems predated the stores
  - much metadata is still embedded in individual systems
  - there are cases of good managed shared approaches, but often narrowly focused, eg around dissemination
- End-to-end management of the process requires a comprehensive, consistent approach
  - questions, question controls, interviewer instructions
  - coding, editing and derivation metadata
  - data relationship metadata
  - table structures
  - classification evolution and history
  - alternative hierarchies in geography and other classifications

8 SDMX and DDI
- SDMX comes from the international agencies (OECD, IMF, Eurostat, UNSD, World Bank, ECB, BIS)
  - they get aggregate statistical tables from many countries, regularly over time
  - they want to automate and manage the process
  - they need standard agreed definitions and classifications, standard agreed table structures, standard agreed formats for both data and metadata
- They commissioned SDMX in 2002
  - started a project, gathered use cases, employed consultants
  - produced a standard and presented it to large numbers of international statistical forums
  - started to use it and to pressure NSOs to use it
- SDMX is pretty good
  - excellent for managing dissemination of statistical data
  - very good tools for very impressive web sites based on data organised in the SDMX model
  - also some good frameworks for managing evolution of classifications
  - a framework for discussing agreements on concepts and classifications
    - Metadata Common Vocabulary, Cross-Domain Concepts, Domain-specific Concepts

9 SDMX and DDI
- DDI (Data Documentation Initiative) comes from the data archive organisations across many countries
  - trying to capture and store survey data for future use, and to document it so future users can understand it and make sense of it
  - mostly social science collections from researchers; funding organisations are requiring such data to be preserved for further use
  - mostly they had to grab data and try to salvage metadata after the event, but DDI now aims to capture all metadata “at source”
- Early versions were narrowly focused on an individual data set
  - grew out of their documentation processes
- Latest version (DDI V3) is much more extensive and better organised
  - common analysis/designer support with SDMX
  - an end-to-end model compatible with the Generic Statistical Business Process Model (GSBPM)

10 DDI Metadata
- DDI has
  - Survey-level metadata
    - Citation, Abstract, Purpose, Coverage, Analysis Unit, Embargo, …
  - Data Collection metadata
    - Methodology, Sampling, Collection strategy
    - Questions, Control constructs, and Interviewer Instructions organised into schemes
  - Processing metadata
    - Coding, Editing, Derivation, Weighting
  - Conceptual metadata
    - Concepts organised into schemes, including links
    - Universes organised into schemes
    - Geography structures and locations organised into schemes
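To make the idea of "questions and instructions organised into schemes" concrete, here is a minimal sketch in Python. The field names, scheme id, and question are invented for illustration; DDI's actual QuestionItem carries far more detail (response domains, control constructs, multiple languages, etc).

```python
from dataclasses import dataclass, field

@dataclass
class Question:
    """One question with its interviewer instruction (illustrative fields only)."""
    question_id: str
    text: str
    instruction: str = ""

@dataclass
class QuestionScheme:
    """Questions grouped into a named, reusable scheme, in the DDI spirit."""
    scheme_id: str
    questions: list = field(default_factory=list)

# "QS_DEMO" and the question content are made up for the example.
scheme = QuestionScheme("QS_DEMO")
scheme.questions.append(
    Question("Q1", "What is your age?", "Record age in completed years."))
```

The value of the scheme is reuse: many collections can reference the same question item rather than re-keying it.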

11 DDI Metadata
- DDI has (cont)
  - Logical metadata
    - Categories organised into schemes
      - categories are labels and descriptions for question responses, eg Male, Unemployed, Plumber, Australia, …
    - Codes organised into schemes and linked to Categories
      - codes are representations for Categories, eg “M” for Male, “Aus” for Australia
    - Variables organised into schemes
      - variables are the places where we hold the codes that correspond to a response to a question
    - Data relationship metadata
      - eg, how Persons are linked to Households and Dwellings
    - NCube schemes
      - descriptions for tables
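The split between Categories (labels) and Codes (their representations) can be sketched as two small mappings, with a lookup that resolves a stored code back to its label. All ids and codes below are invented for illustration:

```python
# Categories are human-readable labels; Codes represent them in the
# data; a Variable would hold the codes.  Ids here are made up.
category_scheme = {"CAT_MALE": "Male", "CAT_FEMALE": "Female"}
code_scheme = {"M": "CAT_MALE", "F": "CAT_FEMALE"}  # code -> category id

def decode(code):
    """Resolve a stored code to its category label via the two schemes."""
    return category_scheme[code_scheme[code]]
```

Keeping the two schemes separate is what lets several code schemes (say "M"/"F" in one collection, "1"/"2" in another) share one category scheme.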

12 DDI Metadata
- DDI has (cont)
  - Physical metadata
    - record structures and layouts
  - File instance metadata
    - specific data files linked to their record structures
  - Archive metadata
    - archival formats, locations, retention times, etc
  - Places for other material not elsewhere described
    - Notes, Other Material
  - References to “Agencies” which own artefacts
    - but no explicit structure to describe them
  - Inheritance and links embedded in most schemes
    - but they need to be ferreted out; not necessarily easily usable
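Record-structure metadata is a good example of "actionable": once the layout is captured, parsing the physical file is mechanical. The layout below (field names and column positions) is invented for illustration:

```python
# Each entry: (variable name, start column, end column) in a
# fixed-width record.  The layout is made up for the example.
record_layout = [("person_id", 0, 6), ("age", 6, 9), ("sex", 9, 10)]

def parse_record(line, layout):
    """Slice one fixed-width record into named fields, driven by metadata."""
    return {name: line[start:end].strip() for name, start, end in layout}

record = parse_record("000042 35M", record_layout)
```

The same layout metadata can equally drive writing, validation, or documentation generation, which is why holding it once, centrally, pays off.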

13 SDMX Metadata
- SDMX has
  - Organisations organised into schemes
    - Organisations own and manage artefacts, and provide or receive things
  - Concepts organised into schemes
  - Codelists, including classifications
    - a Codelist combines DDI Categories and Codes
  - Data Structure Definitions (Key Families)
    - a DSD describes a conceptual multi-dimensional cube
    - used in a Data Flow and referenced in Datasets
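A minimal sketch of how a DSD and its codelists work together: the DSD names the dimensions of the cube and the codelist each draws from, which is enough to validate a series key. All ids, codes, and labels below are invented:

```python
# A Codelist pairs codes with labels (what DDI splits into Code and
# Category schemes).  Ids and codes here are made up.
codelists = {
    "CL_SEX": {"M": "Male", "F": "Female"},
    "CL_FREQ": {"A": "Annual", "Q": "Quarterly"},
}

# A Data Structure Definition: the dimensions of the cube and the
# codelist each dimension draws its values from.
dsd = {"id": "DSD_POP", "dimensions": {"SEX": "CL_SEX", "FREQ": "CL_FREQ"}}

def valid_key(dsd, key):
    """True if every dimension value in a series key is a valid code."""
    return all(
        key.get(dim) in codelists[cl]
        for dim, cl in dsd["dimensions"].items())
```

This is the mechanism that lets receiving agencies check incoming tables automatically against the agreed structure.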

14 SDMX Metadata
- SDMX has
  - Data Flows
    - described by a DSD, linked to registered data sets, and categorised
  - Categories organised into schemes
    - not the same as a DDI Category
    - provide a basis for indexing and searching data
  - Hierarchical Codelists
    - a misnomer – maps relationships amongst inter-related classifications
    - explicit, actionable representations of relationships
  - Process metadata
    - a Process has steps with descriptions, transition rules, computation information, inputs, outputs
    - all actionable, linked to other SDMX artefacts or to external sources
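The "actionable representation of relationships" in a Hierarchical Codelist can be pictured as a parent mapping between classification levels, which systems can walk mechanically. The codes below are invented, not drawn from any real classification:

```python
# Parent mapping between classification levels, e.g. detailed codes
# rolling up to broader groups.  Codes here are made up.
parent = {
    "7212": "72",  # detailed code -> group
    "7213": "72",
    "72": "7",     # group -> major group
}

def roll_up(code, levels=1):
    """Follow the parent mapping a given number of levels."""
    for _ in range(levels):
        code = parent[code]
    return code
```

Because the mapping is explicit data rather than prose, aggregation across classification versions or alternative hierarchies can be automated rather than hand-coded per collection.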

15 SDMX Metadata
- SDMX has
  - Structure Sets
    - additional linking of related DSDs and Flows
  - Reporting Taxonomies
    - information about assembling reports or publications
  - Reference Metadata, Metadata Structure Definitions, and Metadata Flows
    - additional, probably useful, options for attaching metadata to data
  - Annotations almost everywhere
    - good options for managed, actionable extensions

16 What sorts of metadata?
- What are we interested in?
  - Concepts
    - probably organised into schemes
    - what are the use cases?
  - Classifications
    - broken up into Categories and Codes, DDI-style?
    - with links to related classifications, SDMX Hierarchical Codelist-style?
  - Questions and related metadata
    - just how should it look?
    - a DDI package, but precisely what is useful?

17 What sorts of metadata?
- What are we interested in?
  - Survey-level metadata?
    - what are the use cases?
  - Structure Definitions
    - almost certainly, but we need use cases
  - Variable, Relationship, and Record Structure metadata
    - maybe, but we need use cases
  - Processing metadata
    - SDMX Process and/or DDI artefacts

18 What are the next steps?
- Basically we need use cases
  - How do we see our metadata being used? What are we trying to support?
  - What can we get from our pilot programs? We need to do our own abstraction from that
- We can then start to define a provisional set of services, with parameters and schemas
- We can then think about existing sources and demonstration systems
- We can then think about repositories and stores

19 Timeframe and Process
- We are at the start of the process
  - a project team that is still forming
  - several “satellite” projects
    - small, sometimes significant, projects attempting to apply ideas and provide use cases for design
- Have had substantial training and discussion around application of DDI and SDMX
  - international experts providing training
  - significant numbers of ABS staff involved; more to come later this month
- Not a “big bang” new implementation
  - rather a framework and environment for all new developments
  - with some retro-fitting to existing systems
  - some direct development of key components

20 International Collaboration
- A definite part of the project
  - most national agencies are feeling financial pressures and struggling to build everything themselves
- Need to discuss how collaboration might proceed
  - some discussions have been held amongst heads of NSOs; more are planned
  - agreed standards are an important enabler
  - need participation of NSOs in the evolution of standards
  - what are the barriers to collaboration, and how might we manage it?
  - probably do not want too large a group of collaborators at the start
- ABS (and others) will continue to report to international forums and meetings
  - managerial and technical
  - an important part of fostering the collaboration, finding out what others are doing, and getting feedback on our ideas

21 Questions?

