Data and Metadata Session 5 Mark Viney Australian Bureau of Statistics 6 June 2007.

Presentation transcript:

1 Data and Metadata Session 5 Mark Viney Australian Bureau of Statistics 6 June 2007

2 What is Data?
- Data is a defined, measured quantity
- Types of statistical data:
  - Raw data
  - Microdata
  - Macrodata
- Owners convert data from one type to another by cleaning, editing, imputing and aggregating during the data processing cycle

3 Raw Data
- Data as collected from the respondent
  - It may be:
    - incomplete
    - inconsistent
  - It may still require:
    - cleaning
    - imputation
    - follow-up with the respondent

4 Microdata
- Raw data with initial problems removed
- Data coded to standard classifications
- May still contain identification of the respondent
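The step from raw data to microdata can be sketched in a few lines. This is a minimal illustration, not an ABS procedure: the field names (`respondent_id`, `sex`, `age`) are hypothetical, and the classification reuses the sex codes 10 (Males) and 20 (Females) from the CL_SEX code list shown later in this session. Records that are incomplete or carry values outside the classification are simply set aside here; in practice they would go to imputation or respondent follow-up.

```python
# Hypothetical classification: free-text responses mapped to standard codes
# (codes taken from the CL_SEX code list used later in the session).
SEX_CLASSIFICATION = {"male": 10, "female": 20}


def to_microdata(raw_records):
    """Drop incomplete/inconsistent records and code sex to the classification."""
    microdata = []
    for rec in raw_records:
        sex_text = (rec.get("sex") or "").strip().lower()
        age = rec.get("age")
        if sex_text not in SEX_CLASSIFICATION or age is None:
            # Incomplete or inconsistent: a real system would impute or
            # follow up with the respondent instead of silently skipping.
            continue
        microdata.append({
            "respondent_id": rec["respondent_id"],  # identification may remain
            "sex": SEX_CLASSIFICATION[sex_text],
            "age": int(age),
        })
    return microdata


raw = [
    {"respondent_id": "R1", "sex": "Male ", "age": "34"},
    {"respondent_id": "R2", "sex": "F", "age": "51"},        # not in classification
    {"respondent_id": "R3", "sex": "female", "age": None},   # incomplete
    {"respondent_id": "R4", "sex": "FEMALE", "age": "27"},
]
print(to_microdata(raw))
```

Note that the output still carries `respondent_id`, matching the point that microdata may still identify the respondent.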

5 Macrodata
- Data resulting from the aggregation of microdata
- May include new data items:
  - totals
  - averages
  - percentages
  - seasonally adjusted/trend data
  - chain volume indices

6 Macrodata (cont)
- Typically publishable data:
  - does not contain any respondent identification
  - confidentialised
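Aggregation and confidentialisation together can be sketched as follows, under stated assumptions: the unit records and the `state`/`income` fields are hypothetical, and the rule "suppress any cell with fewer than three contributors" is an illustrative threshold, not a published ABS rule. Suppressed cells are marked here as "n.p.". The output adds new data items (counts, averages, percentages) and drops respondent identifiers.

```python
from collections import defaultdict

# Hypothetical confidentiality rule: publish a cell only if at least
# MIN_CONTRIBUTORS unit records contribute to it.
MIN_CONTRIBUTORS = 3


def to_macrodata(microdata):
    """Aggregate unit records by state into counts, averages and percentages."""
    incomes_by_state = defaultdict(list)
    for record in microdata:
        # Only the classification code and the measure are kept;
        # respondent identifiers are not carried into the output.
        incomes_by_state[record["state"]].append(record["income"])
    grand_total = sum(len(v) for v in incomes_by_state.values())

    table = {}
    for state, incomes in sorted(incomes_by_state.items()):
        if len(incomes) < MIN_CONTRIBUTORS:
            table[state] = "n.p."  # suppressed: too few contributors
            continue
        table[state] = {
            "count": len(incomes),
            "average_income": sum(incomes) / len(incomes),
            "percent_of_records": 100 * len(incomes) / grand_total,
        }
    return table


# Hypothetical unit records (6 = Tasmania, 7 = Northern Territory in CL_STATE).
sample = [
    {"id": "R1", "state": 6, "income": 100},
    {"id": "R2", "state": 6, "income": 200},
    {"id": "R3", "state": 6, "income": 300},
    {"id": "R4", "state": 6, "income": 400},
    {"id": "R5", "state": 7, "income": 500},
]
result = to_macrodata(sample)
print(result)
```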

7 Some Macrodata
0.6
1.0
1.7
But what does it mean?

8 What is Metadata?
"Metadata can be defined simply as data about data." (Bo Sundgren, 1973)

9 What is Metadata?
- Data that describes:
  - statistical data
  - processes
  - resources and tools used in statistics production
- Helps people interpret data
- Directs systems to process data

10 Some Macrodata with Metadata
0.6  GDP (Chain Volume Measure), % change Sep qtr 06 to Dec qtr 06, Trend, Australia
1.0  GDP (Chain Volume Measure), % change Sep qtr 06 to Dec qtr 06, Seasonally Adjusted, Australia
1.7  Terms of Trade, % change Sep qtr 06 to Dec qtr 06, Seasonally Adjusted, Australia
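One way to keep each number attached to the metadata that makes it interpretable is to store them together. The sketch below is illustrative only: the `Observation` type and its field names are hypothetical, while the values and descriptions are the three figures from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """A macrodata value bundled with the metadata needed to interpret it."""
    value: float
    measure: str
    unit: str
    period: str
    adjustment: str
    region: str

    def describe(self) -> str:
        return (f"{self.value} ({self.unit}): {self.measure}, "
                f"{self.period}, {self.adjustment}, {self.region}")


observations = [
    Observation(0.6, "GDP (Chain Volume Measure)", "% change",
                "Sep qtr 06 to Dec qtr 06", "Trend", "Australia"),
    Observation(1.0, "GDP (Chain Volume Measure)", "% change",
                "Sep qtr 06 to Dec qtr 06", "Seasonally Adjusted", "Australia"),
    Observation(1.7, "Terms of Trade", "% change",
                "Sep qtr 06 to Dec qtr 06", "Seasonally Adjusted", "Australia"),
]
for obs in observations:
    print(obs.describe())
```

On its own, 0.6 means nothing; with its metadata it reads as a trend estimate of quarterly GDP growth for Australia.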

11 Some Macrodata with Metadata

12 How is metadata used?
- As a tool for comprehension and understanding:
  - provides meaning for numbers
- As a tool for interpretation, facilitating the acquisition of new knowledge
- To help find data and determine its fitness for use
- To help develop new and improved processes

13 Types of Metadata
- Passive
  - documentation
- Active
  - used by systems to define the processing rules that produce outputs
  - can be reused by several systems
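Active metadata means the processing rules live in data that a generic program reads, rather than being hard-coded. A minimal sketch, assuming a hypothetical rules table: `sex` is checked against a code list (codes from CL_SEX later in the session) and `hours_worked` against an illustrative 0-168 range. Adding or changing a data item means editing `RULES`, not the validator, and the same table could be read by other systems.

```python
# Hypothetical active-metadata table: each data item's rule is data, not code.
RULES = {
    "sex": {"kind": "code_list", "codes": {10, 20}},
    "hours_worked": {"kind": "range", "min": 0, "max": 168},
}


def check(item, value, rules=RULES):
    """Apply the rule that the metadata defines for one data item."""
    rule = rules[item]
    if rule["kind"] == "code_list":
        return value in rule["codes"]
    if rule["kind"] == "range":
        return rule["min"] <= value <= rule["max"]
    raise ValueError(f"unknown rule kind for {item}: {rule['kind']}")


def validate(record, rules=RULES):
    """Return the names of data items that are missing or fail their rule."""
    return [item for item in rules
            if item not in record or not check(item, record[item], rules)]


print(validate({"sex": 10, "hours_worked": 38}))   # passes every rule
print(validate({"sex": 30, "hours_worked": 200}))  # fails both rules
```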

14 Metadata - applying context to data
- Describes attributes of data
- Can describe:
  - footnotes
  - units
  - scale/precision
  - publications, products
  - data users/suppliers
  - collection concepts, sources and methods
  - form definitions and question texts
  - data item definitions
  - quality

15 Metadata - applying context to data (cont)
- Can describe:
  - classifications
  - processing rules
  - systems
  - programs
  - databases
  - processes
  - flows
  - services
  - interfaces

16 Data Item Metadata
(Diagram: a data item surrounded by its metadata)
- When collected
- Units
- Who provided?
- Concept/meaning
- Collections
- Allowable values
- Who owns the definition
- Time period

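17 Data Item Metadata (example)
Data item: Age
- When collected: Jan 2004
- Units: Years
- Who provided: Mark Viney
- Concept/meaning: Age (of person)
- Collections: Employment, Health
- Allowable values: 1 - 99
- Who owns the definition: Australian Bureau of Statistics
- Time period: 2003/2004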
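The attributes from the data item example map naturally onto a record type. This is a sketch only: the class name and field names are hypothetical, and the values are the "Age" example from the slide. The `is_allowable` method shows how the "allowable values" attribute can be used directly by a program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItemMetadata:
    """Hypothetical record holding the metadata attributes of one data item."""
    name: str
    concept: str
    units: str
    allowable_min: int
    allowable_max: int
    collections: tuple[str, ...]
    definition_owner: str
    time_period: str
    when_collected: str
    provided_by: str

    def is_allowable(self, value: int) -> bool:
        # The allowable-values attribute doubles as a validation rule.
        return self.allowable_min <= value <= self.allowable_max


age = DataItemMetadata(
    name="Age",
    concept="Age (of person)",
    units="Years",
    allowable_min=1,
    allowable_max=99,
    collections=("Employment", "Health"),
    definition_owner="Australian Bureau of Statistics",
    time_period="2003/2004",
    when_collected="Jan 2004",
    provided_by="Mark Viney",
)
print(age.is_allowable(45), age.is_allowable(120))
```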

18 Dataset Metadata
(Diagram: components of dataset metadata)
- Question modules
- Topics
- Collection instruments
- Populations
- Data item definitions
- Collections
- Classifications
- Products
- Datasets
- Macrodata & annotations
- Data items

19 Dataset Metadata (example)
- Approved Building Jobs (from BAPS)
- 8752.0, 8752.1 etc.
- Dwellings, Housing
- Area (SLA+), Type of building, Type of work
- Excludes any existing floor area or any part of building not bounded by walls
- Form (e.g. BACS4)
- Floor area created by the job (square metres)
- Building Activity Collection
- Floor area commenced during quarter
- 2344, 17, 5, 165, 360, 165, 162.47, n.a., n.p.
- Building Activity: Number, Value by State by...

20 Metadata Standards
- ISO 11179
- Dublin Core
- SDMX

21 ISO 11179
- Standard structure for a metadata repository
- Makes metadata accessible, visible and searchable
- Promotes understanding and reuse of data elements and definitions
- Supports system interoperability
www.iso.org
www.metadata-standards.org/11179

22 SDMX (Statistical Data and Metadata Exchange)
- XML-based
- Model to facilitate the exchange of statistical data and metadata
  - data combined with metadata
- Data cubes/time series
www.sdmx.org

23 Dublin Core
www.dublincore.org
- Developing metadata standards for discovery across domains
- Defining frameworks for the interoperation of metadata sets
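As a small illustration of discovery metadata, the sketch below describes this presentation using Dublin Core element names (`title`, `creator`, `subject`, `date`) in the Dublin Core elements namespace `http://purl.org/dc/elements/1.1/`. The wrapper element `record` is an assumption made for the example; it is not part of any Dublin Core schema.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialise elements with the dc: prefix


def dublin_core_record(fields):
    """Build a simple XML record from (Dublin Core element name, value) pairs."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


record = dublin_core_record([
    ("title", "Data and Metadata, Session 5"),
    ("creator", "Mark Viney"),
    ("subject", "Metadata"),
    ("date", "2007-06-06"),
])
print(ET.tostring(record, encoding="unicode"))
```

Because the element names are shared across domains, a catalogue that understands `dc:creator` or `dc:date` can index this record without knowing anything about statistics.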

24 XBRL - eXtensible Business Reporting Language
- XML-based
- Used for reporting business data
- Standard Business Reporting
  - possible to produce respondent information directly from business software
    - reduced provider burden
    - more standard and consistent reporting from providers
www.xbrl.org

25 What Metadata helps us achieve
- Enforcement of standards for strategic inputs and outputs
- Encourages planning and management of statistical activities
- Reuse
  - single source of each concept
  - reduced need to reinvent and manage
  - reduced costs

26 What Metadata helps us achieve (continued)
- Quality
  - consistent usage
  - common dialogue
  - improved understanding
- Flexibility and productivity
- Knowledge management
  - consistency
  - comparability

27 Combining Data and Metadata
  SELECT CODE, LABEL_SEX FROM CL_SEX;
  CODE  LABEL_SEX
  ----  ---------
  10    Males
  20    Females
  30    Persons
  BASE  TOTAL
  ----  -----
  10    30
  20    30

28 Combining Data and Metadata
  SELECT CODE, LABEL_STATE FROM CL_STATE;
  CODE  LABEL_STATE
  ----  ----------------------------
  1     New South Wales
  2     Victoria
  3     Queensland
  4     South Australia
  5     Western Australia
  6     Tasmania
  7     Northern Territory
  8     Australian Capital Territory
  0     Australia
  BASE  TOTAL
  ----  -----
  1     0
  2     0
  3     0
  4     0
  5     0
  6     0
  7     0
  8     0

29 Combining Data and Metadata
  SELECT * FROM MD_LABOUR;
  CODE_SEX  CODE_STATE  EMPLOYMENT_RATE
  --------  ----------  ---------------
  10        6           77.3
  20        6           72.1
  30        6           74.0

30 Combining Data and Metadata
  SELECT LABEL_SEX, LABEL_STATE, EMPLOYMENT_RATE
  FROM CL_SEX, CL_STATE, MD_LABOUR
  WHERE MD_LABOUR.CODE_SEX = CL_SEX.CODE
    AND MD_LABOUR.CODE_STATE = CL_STATE.CODE;
  LABEL_SEX  LABEL_STATE  EMPLOYMENT_RATE
  ---------  -----------  ---------------
  Males      Tasmania     77.3
  Females    Tasmania     72.1
  Persons    Tasmania     74.0
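The join on slides 27 to 30 can be reproduced end to end with Python's built-in `sqlite3` module and an in-memory database. The table and column names and the values come from the slides; the column types and the choice of SQLite are assumptions made so the example runs anywhere. The query uses explicit `JOIN ... ON` syntax, which is equivalent to the comma join with a `WHERE` clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CL_SEX (CODE INTEGER PRIMARY KEY, LABEL_SEX TEXT);
    CREATE TABLE CL_STATE (CODE INTEGER PRIMARY KEY, LABEL_STATE TEXT);
    CREATE TABLE MD_LABOUR (CODE_SEX INTEGER, CODE_STATE INTEGER,
                            EMPLOYMENT_RATE REAL);

    -- Metadata: code lists (classifications)
    INSERT INTO CL_SEX VALUES (10, 'Males'), (20, 'Females'), (30, 'Persons');
    INSERT INTO CL_STATE VALUES
        (1, 'New South Wales'), (2, 'Victoria'), (3, 'Queensland'),
        (4, 'South Australia'), (5, 'Western Australia'), (6, 'Tasmania'),
        (7, 'Northern Territory'), (8, 'Australian Capital Territory'),
        (0, 'Australia');

    -- Data: codes only, no labels
    INSERT INTO MD_LABOUR VALUES (10, 6, 77.3), (20, 6, 72.1), (30, 6, 74.0);
""")

# Replace codes with labels by joining the data to the code lists.
rows = conn.execute("""
    SELECT s.LABEL_SEX, st.LABEL_STATE, m.EMPLOYMENT_RATE
    FROM MD_LABOUR AS m
    JOIN CL_SEX AS s ON m.CODE_SEX = s.CODE
    JOIN CL_STATE AS st ON m.CODE_STATE = st.CODE
    ORDER BY m.CODE_SEX
""").fetchall()

for label_sex, label_state, rate in rows:
    print(label_sex, label_state, rate)
```

Keeping labels only in the code lists means a label change (for example, renaming a state) is made once and every joined output picks it up.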

31 Using Metadata
10000 Total income
  2000 Other income
    260 Income from hiring of equipment
    270 Cartage and setup
  1000 Hire services
    140 Other construction equipment
      10 Compaction equipment
      20 Cranes
      30 Earthmoving equipment
    180 Other income from hire services
      60 Event/exhibition goods and equipment
      70 Transport equipment

32 Using Metadata
  CODE   LABEL
  -----  ------------------------------------
  10     Compaction equipment
  20     Cranes
  30     Earthmoving equipment
  60     Event/exhibition goods and equipment
  70     Transport equipment
  140    Other construction equipment
  180    Other income from hire services
  260    Income from hiring of equipment
  270    Cartage and setup
  1000   Hire services
  2000   Other income
  10000  Total income
  BASE  DETAILED  SUBTOTAL  TOTAL
  ----  --------  --------  -----
  10    140       1000      10000
  20    140       1000      10000
  30    140       1000      10000
  60    180       1000      10000
  70    180       1000      10000
  260   260       2000      10000
  270   270       2000      10000
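The BASE/DETAILED/SUBTOTAL/TOTAL table is active metadata: a program can use it to derive every aggregate from base-level values alone. A minimal sketch, with the hierarchy copied from the table above and hypothetical base-level amounts for illustration only. Where a base item is also its own detailed item (260, 270), a set prevents double counting.

```python
from collections import defaultdict

# Hierarchy from the table above: base code -> (detailed, subtotal, total).
HIERARCHY = {
    10: (140, 1000, 10000),
    20: (140, 1000, 10000),
    30: (140, 1000, 10000),
    60: (180, 1000, 10000),
    70: (180, 1000, 10000),
    260: (260, 2000, 10000),
    270: (270, 2000, 10000),
}


def roll_up(base_values):
    """Add each base-level value into every level named for it in HIERARCHY."""
    totals = defaultdict(int)
    for code, value in base_values.items():
        # The set removes repeats such as 260 -> (260, 2000, 10000).
        for level_code in {code, *HIERARCHY[code]}:
            totals[level_code] += value
    return dict(totals)


# Hypothetical base-level amounts, for illustration only.
base = {10: 5, 20: 7, 30: 3, 60: 2, 70: 4, 260: 6, 270: 1}
rolled = roll_up(base)
for code in sorted(rolled):
    print(code, rolled[code])
```

Changing the hierarchy (say, moving an item to a different subtotal) changes the outputs without touching the code.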

33 Metadata Driven Systems
- These systems use metadata to direct and assist their functions
  - active metadata
- In general, this gives them a large advantage in flexibility over systems that hard-code their rules
- The metadata may also be external to the system and used for other purposes and systems

34 Reuse across systems Metadata

35 Reuse across systems
- Keep one copy of metadata
  - reduces confusion and ambiguity
  - reduces opportunities to get it wrong
  - reduces maintenance
  - reduces complexity for the end user

36 Key points
- Invest in metadata and integrated metadata-driven systems rather than point solutions
- Costs will be repaid many times over
- Avoid duplication as much as possible
  - or automate duplication to retain consistency and integrity
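Where a copy really is needed, it can be generated from the single source rather than maintained by hand. In this sketch the code list is CL_SEX from slide 27; the two "consumers" (a SQL `CHECK` clause for a database and a CSV lookup file for another system) and the function names are hypothetical examples of automated duplication.

```python
import csv
import io

# The one authoritative copy of the code list (codes and labels from CL_SEX).
CL_SEX = {10: "Males", 20: "Females", 30: "Persons"}


def check_constraint(column, code_list):
    """Derive a SQL CHECK clause from the code list instead of retyping the codes."""
    codes = ", ".join(str(code) for code in sorted(code_list))
    return f"CHECK ({column} IN ({codes}))"


def lookup_csv(code_list):
    """Derive a CODE,LABEL lookup file for another system from the same code list."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["CODE", "LABEL"])
    writer.writerows(sorted(code_list.items()))
    return buffer.getvalue()


print(check_constraint("CODE_SEX", CL_SEX))
print(lookup_csv(CL_SEX))
```

If a code is added to `CL_SEX`, regenerating both outputs keeps the database constraint and the lookup file consistent with each other and with the source.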

37 Questions?

