Data and Metadata Session 5 Mark Viney Australian Bureau of Statistics 6 June 2007.


Data and Metadata Session 5 Mark Viney Australian Bureau of Statistics 6 June 2007

What is Data?
- Data is a defined, measured quantity
- Types of statistical data
  - Raw data
  - Microdata
  - Macrodata
- Owners convert data from one type to another by cleaning, editing, imputing and aggregating during the data processing cycle

Raw Data
- Data as collected from respondent
  - It may be:
    - incomplete
    - inconsistent
  - It may still require:
    - cleaning
    - imputation
    - follow up with respondent

Microdata
- Raw data with initial problems removed
- Data coded to standard classifications
- May still contain identification of respondent

Macrodata
- Data resulting from the aggregation of microdata
- May include new data items:
  - totals
  - averages
  - percentages
  - seasonally adjusted/trend data
  - chain volume indices

Macrodata
- Typically publishable data
  - does not contain any respondent identification
  - confidentialised
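
The progression from raw data to microdata to macrodata can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the `hours` item and the mean-imputation rule are hypothetical, and only the state codes (2 for Victoria, 6 for Tasmania) follow the code list shown later in the deck.

```python
from statistics import mean

# Hypothetical raw responses; None marks an answer the respondent left blank.
raw = [
    {"respondent": "R1", "state": "tas", "hours": "32"},
    {"respondent": "R2", "state": "TAS", "hours": None},
    {"respondent": "R3", "state": "Vic", "hours": "40"},
    {"respondent": "R4", "state": "VIC", "hours": "36"},
]

# State codes as listed in the deck's CL_STATE example.
STATE_CODES = {"VIC": 2, "TAS": 6}


def to_microdata(records):
    """Clean and code raw records; impute missing hours with the mean of reported hours."""
    reported = [int(r["hours"]) for r in records if r["hours"] is not None]
    fallback = mean(reported)  # assumed imputation rule, for illustration only
    micro = []
    for r in records:
        missing = r["hours"] is None
        micro.append({
            "respondent": r["respondent"],                # microdata may still identify the respondent
            "state_code": STATE_CODES[r["state"].upper()],  # coded to a standard classification
            "hours": fallback if missing else int(r["hours"]),
            "imputed": missing,
        })
    return micro


def to_macrodata(micro):
    """Aggregate microdata to state averages; respondent identifiers are dropped."""
    by_state = {}
    for row in micro:
        by_state.setdefault(row["state_code"], []).append(row["hours"])
    return {code: mean(values) for code, values in sorted(by_state.items())}


micro = to_microdata(raw)
macro = to_macrodata(micro)
print(macro)
```

The macrodata here is just a mapping from state code to a number, which is exactly the kind of output that cannot be interpreted without metadata.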

Some Macrodata but what does it mean?

What is Metadata? Metadata can be defined simply as data about data - Bo Sundgren 1973

What is Metadata?
- Data that describes:
  - statistical data
  - processes
  - resources and tools used in statistics production
- Helps people interpret data
- Directs systems to process data

Some Macrodata with Metadata
- 0.6: GDP (Chain Volume Measure), %Change Sep qtr 06 to Dec qtr 06, Trend, Australia
- 1.0: GDP (Chain Volume Measure), %Change Sep qtr 06 to Dec qtr 06, Seasonally Adjusted, Australia
- 1.7: Terms of Trade, %Change Sep qtr 06 to Dec qtr 06, Seasonally Adjusted, Australia
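
One way to keep a figure and its metadata together is to store them in a single record. The sketch below uses the three figures from the slide; the class name `Observation` and its field names are assumptions made for illustration, not part of any ABS system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """A published number together with the metadata needed to interpret it."""
    value: float
    measure: str       # what is being measured
    unit: str          # how the value is expressed
    period: str        # reference period
    series_type: str   # e.g. Trend or Seasonally Adjusted
    region: str

    def describe(self) -> str:
        return (f"{self.value} {self.measure}, {self.unit} {self.period}, "
                f"{self.series_type}, {self.region}")


observations = [
    Observation(0.6, "GDP (Chain Volume Measure)", "%Change",
                "Sep qtr 06 to Dec qtr 06", "Trend", "Australia"),
    Observation(1.0, "GDP (Chain Volume Measure)", "%Change",
                "Sep qtr 06 to Dec qtr 06", "Seasonally Adjusted", "Australia"),
    Observation(1.7, "Terms of Trade", "%Change",
                "Sep qtr 06 to Dec qtr 06", "Seasonally Adjusted", "Australia"),
]

for obs in observations:
    print(obs.describe())
```

Without the metadata fields, the list reduces to 0.6, 1.0 and 1.7, which is the "what does it mean?" problem posed earlier.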

Some Macrodata with Metadata

How is metadata used?
- Tool for comprehension and understanding
  - provides meaning for numbers
- Tool for interpretation that facilitates the acquisition of new knowledge
- Helps find data and determine its fitness for use
- Helps develop new and improved processes

Types of Metadata
- Passive
  - documentation
- Active
  - used by systems to define the processing rules to produce outputs
  - can be re-used by several systems

Metadata - applying context to data
- Describes attributes of data
- Can describe:
  - Footnotes
  - Units
  - Scale/precision
  - Publication, products
  - Data users / suppliers
  - Collection concepts, sources and methods
  - Form definitions and question texts
  - Data Item definitions
  - Quality

Metadata - applying context to data (cont)
- Can describe:
  - Classifications
  - Processing rules
  - Systems
  - Programs
  - Databases
  - Processes
  - Flows
  - Services
  - Interfaces

Dataitem Metadata
[Diagram: a Dataitem surrounded by its metadata: When Collected, Units, Who provided?, Concept / meaning, Collections, Allowable values, Who owns the definition, Time Period]

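Dataitem Metadata (example)
[Diagram: the Dataitem "Age" with its metadata]
- When collected: Jan 2004
- Units: Years
- Who provided: Mark Viney
- Concept / meaning: Age (of person)
- Collections: Employment, Health
- Who owns the definition: Australian Bureau of Statistics
- Time period: 2003/2004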
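
A data item definition like the Age example can be held as a small structured record that systems can also use to check incoming values. The sketch below is illustrative only: the class and field names are assumptions, and the allowable range of 0 to 120 is hypothetical because the slide text does not give the allowable values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItemMetadata:
    """Definition of a data item, kept separately from the data values."""
    name: str
    concept: str
    units: str
    owner: str
    collections: tuple
    allowable_min: int  # hypothetical bound, not taken from the slide
    allowable_max: int  # hypothetical bound, not taken from the slide

    def is_valid(self, value) -> bool:
        """Return True when value is a whole number inside the allowable range."""
        if isinstance(value, bool) or not isinstance(value, int):
            return False
        return self.allowable_min <= value <= self.allowable_max


AGE = DataItemMetadata(
    name="Age",
    concept="Age (of person)",
    units="Years",
    owner="Australian Bureau of Statistics",
    collections=("Employment", "Health"),
    allowable_min=0,
    allowable_max=120,
)

print(AGE.is_valid(42), AGE.is_valid(150))
```

Because the definition lives in one record, both the Employment and Health collections can apply the same check.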

Dataset Metadata
[Diagram linking: Question Modules, Topics, Collection Instruments, Populations, Data Item Definitions, Collections, Classifications, Products, Datasets, Macrodata & Annotations, Data Items]

Dataset Metadata (example)
[Diagram labels: Approved Building Jobs (from BAPS) etc.; Dwellings; Housing; Area (SLA+); Type of building; Type of work; Excludes any existing floor area or any part of building not bounded by walls; Form (e.g. BACS4); Floor area created by the job (Square metres); Building Activity Collection; Floor area commenced during quarter; 2344, 17, 5, 165, 360, 165, n.a., n.p.; Building Activity: Number, Value by State by...]

Metadata Standards
- ISO
- Dublin Core
- SDMX

ISO
- Standard structure of metadata repository
- Makes metadata accessible, visible and searchable
- Provides understanding and reuse of data elements and definitions
- System interoperability

SDMX (Statistical Data and Metadata Exchange)
- XML based
- Model to facilitate the exchange of statistical data and metadata
  - data combined with metadata
- Data Cubes / Timeseries

Dublin Core
- Developing metadata standards for discovery across domains
- Defining frameworks for the interoperation of metadata sets

XBRL - eXtensible Business Reporting Language
- XML based
- Used for reporting of business-based data
- Standard Business Reporting
  - possible to produce respondent information direct from business software
- Reduced provider burden
- More standard and consistent reporting from providers

What Metadata helps us achieve
- Enforcing standards on strategic inputs and outputs
- Encouraging planning and management of statistical activities
- Reuse
  - single source of concept
  - reduced need to reinvent and manage
  - reduced costs

What Metadata helps us achieve (continued)
- Quality
  - consistent usage
  - common dialogue
  - improved understanding
- Flexibility and Productivity
- Knowledge Management
  - consistency
  - comparability

Combining Data and Metadata
SELECT CODE, LABEL_SEX FROM CL_SEX;
    CODE   LABEL_SEX
    ****   *********
    10     Males
    20     Females
    30     Persons
    BASE TOTAL

Combining Data and Metadata
SELECT CODE, LABEL_STATE FROM CL_STATE;
    CODE   LABEL_STATE
    ****   ***********
    1      New South Wales
    2      Victoria
    3      Queensland
    4      South Australia
    5      Western Australia
    6      Tasmania
    7      Northern Territory
    8      Australian Capital Territory
    0      Australia
    BASE TOTAL

Combining Data and Metadata
SELECT * FROM MD_LABOUR;
    CODE_SEX   CODE_STATE   EMPLOYMENT_RATE
    ********   **********   ***************

Combining Data and Metadata
SELECT LABEL_SEX, LABEL_STATE, EMPLOYMENT_RATE
FROM CL_SEX, CL_STATE, MD_LABOUR
WHERE MD_LABOUR.CODE_SEX = CL_SEX.CODE
  AND MD_LABOUR.CODE_STATE = CL_STATE.CODE;
    LABEL_SEX   LABEL_STATE   EMPLOYMENT_RATE
    *********   ***********   ***************
    Males       Tasmania      77.3
    Females     Tasmania      72.1
    Persons     Tasmania      74.0
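
The join on this slide can be tried end to end with an in-memory SQLite database. In this sketch the table and column names come from the slides, but the contents of `MD_LABOUR` are assumed: the deck shows only the joined result, so the three coded rows are inferred from it, and only two states are loaded to keep the example short. The query uses explicit `JOIN ... ON` syntax, which is equivalent to the comma-style join on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CL_SEX   (CODE INTEGER PRIMARY KEY, LABEL_SEX TEXT);
CREATE TABLE CL_STATE (CODE INTEGER PRIMARY KEY, LABEL_STATE TEXT);
CREATE TABLE MD_LABOUR (CODE_SEX INTEGER, CODE_STATE INTEGER, EMPLOYMENT_RATE REAL);

-- Code lists (metadata), as shown on the slides.
INSERT INTO CL_SEX VALUES (10, 'Males'), (20, 'Females'), (30, 'Persons');
INSERT INTO CL_STATE VALUES (2, 'Victoria'), (6, 'Tasmania');

-- Coded data: assumed rows, inferred from the joined result on the slide.
INSERT INTO MD_LABOUR VALUES (10, 6, 77.3), (20, 6, 72.1), (30, 6, 74.0);
""")

# Replace each code in the data with its label from the matching code list.
rows = conn.execute("""
    SELECT LABEL_SEX, LABEL_STATE, EMPLOYMENT_RATE
    FROM MD_LABOUR
    JOIN CL_SEX   ON MD_LABOUR.CODE_SEX   = CL_SEX.CODE
    JOIN CL_STATE ON MD_LABOUR.CODE_STATE = CL_STATE.CODE
    ORDER BY MD_LABOUR.CODE_SEX
""").fetchall()

for label_sex, label_state, rate in rows:
    print(f"{label_sex:<10}{label_state:<12}{rate}")

conn.close()
```

Storing only codes in the data table means a label can be corrected once in the code list and every output that joins to it picks up the change.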

Using Metadata
    10000   Total income
    2000    Other Income
    260     Income from hiring of equipment
    270     Cartage and setup
    1000    Hire Services
    140     Other construction equipment
    10      Compaction equipment
    20      Cranes
    30      Earthmoving equipment
    180     Other income from hire services
    60      Event/exhibition goods and equipment
    70      Transport equipment

Using Metadata
    CODE    LABEL_STATE
    *****   ***********
    10      Compaction equipment
    20      Cranes
    30      Earthmoving equipment
    60      Event/exhibition goods and equipment
    70      Transport equipment
    140     Other construction equipment
    180     Other income from hire services
    260     Income from hiring of equipment
    270     Cartage and setup
    1000    Hire services
    2000    Other income
    10000   Total income
    BASE, DETAILED, SUBTOTAL, TOTAL

Metadata Driven Systems
- These systems use metadata to direct and assist their functions
  - Active Metadata
- In general, this makes them considerably more flexible than systems that do not use metadata in this way
- The metadata may also be external to the system and used for other purposes and systems
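
A minimal sketch of active metadata: the function below contains no knowledge of sex, state or employment rates and takes all of its behaviour from the metadata passed to it. The structure of `DATASET_METADATA`, the function name `present` and the `%` unit are assumptions for illustration; the codes and labels follow the code lists shown earlier in the deck.

```python
# Code lists held once, outside any particular system.
CODE_LISTS = {
    "CL_SEX": {10: "Males", 20: "Females", 30: "Persons"},
    "CL_STATE": {2: "Victoria", 6: "Tasmania"},
}

# Metadata describing how each column of a dataset is to be presented (assumed layout).
DATASET_METADATA = {
    "CODE_SEX": {"code_list": "CL_SEX"},
    "CODE_STATE": {"code_list": "CL_STATE"},
    "EMPLOYMENT_RATE": {"unit": "%"},  # unit is an assumption
}


def present(row, metadata=DATASET_METADATA, code_lists=CODE_LISTS):
    """Render one coded row for output, guided entirely by the metadata."""
    rendered = []
    for column, value in row.items():
        rules = metadata.get(column, {})
        if "code_list" in rules:
            # Coded column: look the label up in the named code list.
            rendered.append(code_lists[rules["code_list"]][value])
        elif "unit" in rules:
            # Measured column: append its unit.
            rendered.append(f"{value}{rules['unit']}")
        else:
            # No metadata for this column: show the value as-is.
            rendered.append(str(value))
    return rendered


row = {"CODE_SEX": 10, "CODE_STATE": 6, "EMPLOYMENT_RATE": 77.3}
print(present(row))
```

Adding a new classification or changing a label then requires only a change to the metadata, not to the function.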

Reuse across systems Metadata

Reuse across systems
- Keep one copy of metadata
  - reduces confusion and ambiguity
  - reduces opportunities to get it wrong
  - reduces maintenance
  - reduces complexity to end user

Key points
- Invest in metadata and integrated metadata driven systems rather than point solutions
- Costs will be repaid many times over
- Avoid duplication as much as possible
  - or automate duplication to retain consistency and integrity

Questions?