Toward a Global Infrastructure for Data and Metadata: The Open Data Foundation Arofan Gregory Executive Manager The Open Data Foundation.

Something Really Amazing Spaceships aren't that amazing… Aliens aren't that amazing… Mobile telephones aren't that amazing… Tasers aren't that amazing… These devices have access to the complete set of human (well, Federation) knowledge, via the ship's computer - That's AMAZING! An Epic Feat of Data Standardization!

A Big Idea It might seem too outrageous to imagine that every data source could be accessible and usable via a global network, but… –Consider all the domain grids which are emerging –Consider the number of modern technologies for leveraging data across networks –Consider the tools we have for solving problems of semantic interoperability Maybe Star Trek was only a few decades ahead of its time!

Something Missing… Technology alone cannot solve this problem. For centuries, scientists, librarians, and archivists have worked to perfect taxonomies and classifications for organizing and accessing human knowledge –Technologists can't replace the disciplines which have evolved from this work with technology alone –They can only automate it. Having an ontology doesn't mean you have an agreed, tried, and workable standard classification system! –A thousand little ontologies still produce chaos!

Why Now? The idea of a global data infrastructure is practical today because… –We have good, standards-based, networked technology –We have a highly sophisticated population of archivists and librarians who understand the challenges of large-scale classification, for all types of media –We have an emerging culture of data producers and users who are beginning to understand the potential offered by modern technology

The Open Data Movement From Wikipedia: Open Data is a philosophy and practice requiring that certain data are freely available to everyone, without restrictions from copyright, patents or other mechanisms of control. It has a similar ethos to a number of other "Open" movements and communities such as Open Source and Open Access.

The Open Data Foundation (ODaF) Although we respect this traditional goal of the Open Data movement, we feel that the technology issues, as opposed to the legal ones, have a different focus: –Much public data is inaccessible or unusable –Confidential data is less accessible than it could be –The collection and publication of some critical data is lacking, notably in the Developing World It is not enough to put the rights to data into the public domain – it must also be practically accessible to all potential users

What Do We Mean by Data? Official statistics collected by government agencies and international organizations –Usually aggregates and time-series data –Covers a huge range of social, scientific, and economic topics. Numeric research data supporting social sciences and hard sciences –Often lower-level microdata –May be gathered by survey or sourced from registers. Qualitative data used in social sciences research –Not research papers, but source data (e.g., interviews)

ODaF's Mission To bring together individuals from the statistics community, the research community, and the technology standards community. To promote the creation of a global infrastructure for data and metadata by providing open-source tools and supporting the adoption of a coordinated set of open technology standards. To promote the creation and use of knowledge, and fact-based decision-making, through improved access to data and metadata.

ODaF - Timeline The idea started at IASSIST 2006 in Edinburgh. Incorporated in 2006 as a US scientific non-profit. First face-to-face meeting in Washington DC in December 2006 at the National Opinion Research Center (NORC). September 2007: next face-to-face meeting in St. Helena, California. Next face-to-face meeting: NORC in DC, December 2007, followed by a European meeting (UK, Netherlands, or Germany) in early 2008. NOTE: We are a virtual organization – we don't rely on face-to-face meetings for conducting work (Thanks, Skype!)

ODaF - Directors –Bob Glushko – head of the UC Berkeley Center for Document Engineering and member of OASIS Board of Directors –Julia Lane – Vice President, NORC and world-class expert in data confidentiality issues –Ernie Boyko – former President of IASSIST –Rune Gloersen – head of IT at Statistics Norway

ODaF - Executive Managers Arofan Gregory – background in SGML/XML, technology standards (notably ebXML, UBL, UN/CEFACT, ISO TC154, DDI, and SDMX) Pascal Heus - lead developer for World Bank and International Household Survey Network, much experience with field-work in Africa, DDI implementor Chris Nelson – veteran OMGer (CWM), worked with many technology standards (UN/EDIFACT, GESMES, ebXML, SDMX, DDI), consummate UML modeler Jostein Ryssevik – former CEO of Nesstar North America, now with Ideas2Evidence, associated with Gallup Europe; longtime DDI implementor

ODaF - Advisors Sandra Cannon - Board of Governors of the Federal Reserve System; Gilles Collette - Visual Communications, Pan-American Health Organization (WHO); Daniel Gillman - US Bureau of Labor Statistics; Eduardo Gutentag - Chair, OASIS Board of Directors; Paul Johanis - Statistics Canada; Graeme Oakley - Australian Bureau of Statistics; Dr. Andrew Nelson - Joint Research Centre of the European Commission

ODaF – Advisors (cont.) Ken Miller - UK Data Archive / Economic and Social Data Service; Duane Nickull - Chair, OASIS SOA Reference Architecture TC; Juraj Riecan - United Nations Economic Commission for Europe (UNECE); Gerard Salou - European Central Bank; Professor Bo Sundgren, Ph.D - Statistics Sweden; Wendy Thomas - Minnesota Population Center, University of Minnesota; Wendy Watkins - Data Centre Coordinator, Maps, Data and Government Information Centre, Carleton University Library

ODaF - Organization We are project-oriented: –Any member can participate in projects (may be paid consultants for specific work, or volunteers) –Project proposal is put before Directors by Management team in consultation with Board of Advisors for approval –Work is conducted by specified project team, using specified resources –All Directors, Managers, and Advisors are volunteers. Work is focused on coordination of projects, with resources coming from other participating organizations.

The Problem Space The flows of data can be seen as forming a type of supply chain –Collected data are aggregated and reported/disseminated to other organizations –The points where data are exchanged can be problematic: loss of metadata; no automated integration into receiving systems; time- and resource-intensive. This exchange of data and metadata must be managed in an efficient, standard fashion if we are to build a global infrastructure.

[Diagram: the information chain. Individual Households and Banks, Corporates (transactions, accounts); National Statistical Organisations / Countries (accounts, statistics); Regional Organisations (accounts, statistics); International Organisations; Internet, Search, Navigation]

Data Lifecycle Model Within each level of the information chain, we see a process: –Data sourcing or collection –Data processing (re-coding, harmonization, aggregation) –Data dissemination and archiving –Data reporting and re-purposing Throughout this cycle, each step generates important metadata which can be captured to provide better downstream processing and understanding of the data Today, this metadata is often lost –Between steps of the lifecycle –When the final data product is exchanged in the information chain
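The idea of capturing metadata at each lifecycle step, rather than losing it between steps, can be sketched roughly as follows. This is only an illustration: the `Dataset` class, the `run_step` helper, the step names, and the sample values are all hypothetical and do not come from any of the standards discussed in this presentation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Dataset:
    """Values plus the metadata accumulated along the lifecycle (hypothetical model)."""
    values: List[float]
    metadata: List[str] = field(default_factory=list)


def run_step(ds: Dataset, name: str,
             fn: Callable[[List[float]], List[float]], note: str) -> Dataset:
    """Apply one lifecycle step and keep a record of what was done."""
    return Dataset(values=fn(ds.values), metadata=ds.metadata + [f"{name}: {note}"])


# Made-up sample values, for illustration only.
collected = Dataset([12.0, 15.0, 9.0], ["collection: household survey, 3 respondents"])
recoded = run_step(collected, "processing",
                   lambda v: [x * 1000 for x in v], "recoded from thousands to units")
aggregated = run_step(recoded, "dissemination",
                      lambda v: [sum(v)], "aggregated to a single total")

# The final product still carries the notes from every earlier step.
for line in aggregated.metadata:
    print(line)
```

The point of the sketch is that the downstream recipient of `aggregated` can still see how the figure was collected and processed, which is exactly the information that is often lost when data are exchanged.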

Data Lifecycle Model

An Observation on Organizations Governmental, supra-governmental, and research organizations which produce data have as a primary mission the collection of data –To support policy making –To support research –To support regulatory activities They do not have a primary mission to focus on the exchange of data with other organizations –This is often perceived as a burden rather than a part of the primary mission of the organization They are often not well-skilled in the latest technology for data exchange and interoperability Standards organizations tend to be too busy promoting their own standards to be worried about how users might combine them with other standards in implementations

Issues Issues with public data: –Public data which is not released: "Users won't understand it" - too little metadata! –Public data which is unusable: formats are bad, too little metadata about formats, terminology, methodology, coding, and concepts –Public data which cannot be accessed because its location/existence is not known –Public data which loses value because it cannot be published and accessed in a timely manner

Issues (cont.) Issues with confidential data: –Public data sets derived from confidential data have been damaged by anonymization –Confidential data which are not seen because access produces unacceptable disclosure risk There are secure Research Data Centers for allowing access to confidential data to qualified researchers –These are not as accessible or as open as they could be, due to their physical nature and the fact that they generally are not in communication with each other –Better metadata management and shared metadata leads to a better understanding of disclosure risk, and thus improved access for researchers

Note on Data Confidentiality You might think proponents of Open Data would disapprove of confidential data –Response rates are falling for all types of survey data collection due to fears of disclosure –There are many new ways of collecting data about individuals (RFID chips, security cameras, cell phones, etc.) –The standards for data confidentiality are there for a good reason – to protect individuals! We believe that confidential data should be as open as possible and not more!

Issues (cont.) Issues with data in the Developing World: –Absent data due to inefficient or nonexistent data collection/publication –Unsustainable data collection/publication produces insufficient continuity of data: once educated, IT workers get jobs in Europe and America; funding is typically not on-going, but only for a limited period. The vast majority of the world's population is in the Developing World, and the trend is increasing –To understand our world and make good policy, we must support sustainable data collection and publication about this huge segment of the population!

How Can We Solve These Problems? Many of these issues can be solved with modern technology –Better documentation using standard metadata formats –Better mechanisms for data discovery and access between organizations of all types –Better mechanisms for managing semantic interoperability –Free or inexpensive tools for metadata capture and data/metadata exchange –Improved mechanisms for sustainable collection and publication of data in the Developing World

ODaF's Vision A network of standard, federated registries provides the ability to discover data and metadata globally. Standard data and metadata formats and models provide the basis for automated use and integration between applications. Standard semantic registries and mappings to standard classifications/ontologies allow for semantic interoperability. All of these standards would be coordinated to work together predictably in an open architecture. Domains are self-governing – each has its own registries, classifications, etc. There must be minimum governance at the center for operation of the entire network. –Interoperability through mapping to the standards-based open architecture
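A rough sketch of how self-governing domain registries might still be searched together, through mappings to shared codes, is given below. Everything in it is an assumption made for illustration: the `DomainRegistry` and `Entry` classes, the `STD.*` codes, and the `example.org` locations are hypothetical and are not part of SDMX, ebXML, or any other standard named here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Entry:
    title: str
    local_code: str  # topic code in the domain's own classification
    location: str    # where the data set can be retrieved (placeholder URLs below)


class DomainRegistry:
    """A self-governing registry: its own entries and codes, plus a mapping to shared codes."""

    def __init__(self, domain: str, code_map: Dict[str, str]) -> None:
        self.domain = domain
        self.code_map = code_map  # local code -> shared standard code (hypothetical)
        self.entries: List[Entry] = []

    def register(self, entry: Entry) -> None:
        self.entries.append(entry)

    def find_by_standard_code(self, standard_code: str) -> List[Entry]:
        return [e for e in self.entries
                if self.code_map.get(e.local_code) == standard_code]


def federated_search(registries: List[DomainRegistry],
                     standard_code: str) -> Dict[str, List[Entry]]:
    """Query every registry in turn; the centre only needs to know the shared codes."""
    results: Dict[str, List[Entry]] = {}
    for reg in registries:
        hits = reg.find_by_standard_code(standard_code)
        if hits:
            results[reg.domain] = hits
    return results


labour = DomainRegistry("labour-statistics",
                        {"UNEMP": "STD.UNEMPLOYMENT", "WAGE": "STD.EARNINGS"})
labour.register(Entry("Monthly unemployment rate", "UNEMP", "https://example.org/labour/unemp"))
labour.register(Entry("Average hourly wages", "WAGE", "https://example.org/labour/wage"))

survey = DomainRegistry("household-surveys", {"q17_jobless": "STD.UNEMPLOYMENT"})
survey.register(Entry("Household survey: job-seeking module", "q17_jobless",
                      "https://example.org/survey/q17"))

found = federated_search([labour, survey], "STD.UNEMPLOYMENT")
for domain, entries in found.items():
    print(domain, [e.title for e in entries])
```

Each registry keeps its own codes; only the mapping table ties it to the shared vocabulary, which mirrors the idea of minimum governance at the center.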

Which Standards? ISO Statistical Data and Metadata Exchange (SDMX); Data Documentation Initiative (DDI); ISO/IEC 11179 Metadata Registries; ISO 19115 Digital Geographic Data; Metadata Encoding and Transmission Standard (METS); Extensible Business Reporting Language (XBRL); many others (SOA, ebXML, Web Services, Semantic Web, Dublin Core)

ISO SDMX Produced by official statistics organizations (BIS, ECB, Eurostat, IMF, OECD, World Bank, UN/SD). Now available as a 2.0 version –Supports all aggregate data & time-series –Supports all types of metadata (structural & reference metadata) –Provides standard registry interfaces for data sourcing and exchange (not specific to SDMX formats). Based on a formal meta-model (similar to OMG's Common Warehouse Metamodel, but more focused). Data and metadata formats and classifications are completely configurable. Also provides recommendations for concepts, codes, and classifications for official statistics.
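To give a feel for what "completely configurable" structures for time-series data could mean in practice, here is a minimal sketch. It is not the SDMX information model or any SDMX API: the `StructureDefinition` and `TimeSeries` classes, the dimension names, the codes, and the observation values are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StructureDefinition:
    """Hypothetical stand-in for a configurable structure: dimensions and their code lists."""
    dimensions: Dict[str, List[str]]  # dimension name -> allowed codes

    def validate(self, key: Dict[str, str]) -> None:
        """Raise ValueError unless the key gives one valid code for every dimension."""
        if set(key) != set(self.dimensions):
            raise ValueError(f"key must supply exactly {sorted(self.dimensions)}")
        for dim, code in key.items():
            if code not in self.dimensions[dim]:
                raise ValueError(f"{code!r} is not a valid code for {dim}")


@dataclass
class TimeSeries:
    key: Dict[str, str]                                           # dimension -> code
    observations: Dict[str, float] = field(default_factory=dict)  # period -> value


# The structure is data, not code: a different domain would supply different dimensions.
structure = StructureDefinition({
    "FREQ": ["A", "M"],
    "REF_AREA": ["NO", "SE"],
    "INDICATOR": ["CPI"],
})

series = TimeSeries({"FREQ": "A", "REF_AREA": "NO", "INDICATOR": "CPI"})
structure.validate(series.key)  # passes: every dimension has an allowed code
series.observations.update({"2005": 101.5, "2006": 103.8})  # made-up values
print(series)
```

Because the receiving system can check a series key against the shared structure, data can be loaded automatically instead of being re-keyed by hand.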

Data Documentation Initiative (DDI) Produced by a consortium of members (data archives and libraries, national statistical organizations, universities, etc.) Now in 3.0 candidate version which supports full data lifecycle (release Q1 2008) Fine-grained metadata for describing: –Data collection (surveys, registers, etc.) –Data processing (for recodes, harmonization, data comparison) –Data archiving and dissemination –Data can be stored inline or in native file formats –Supports microdata and n-dimensional cubes Aligned with SDMX, ISO/IEC 11179, METS, ISO 19115, and Dublin Core

ISO/IEC 11179 Metadata Registries Model for managing semantics of a data dictionary and the lifecycle of concepts/terms. There is a separate ISO specification under development for providing bindings in XML, C, and other languages. In widespread use in many other standards, as well as for terminology management within large organizations.
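The core of the ISO/IEC 11179 approach is that a data element pairs a concept (an object class plus a property) with a value domain that says how values are represented. The sketch below illustrates that separation only; the class layout is a simplification written for this example, not one of the bindings mentioned above, and the sample codes are invented.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DataElementConcept:
    object_class: str  # the thing being described, e.g. "Person"
    property: str      # the characteristic of interest, e.g. "Marital Status"


@dataclass
class ValueDomain:
    name: str
    permitted_values: Dict[str, str]  # code -> meaning


@dataclass
class DataElement:
    """A concept bound to one particular representation."""
    concept: DataElementConcept
    domain: ValueDomain

    def decode(self, code: str) -> str:
        return self.domain.permitted_values[code]


marital_status = DataElementConcept("Person", "Marital Status")

# Two organizations code the same concept differently (invented code lists).
survey_a = DataElement(marital_status,
                       ValueDomain("letter codes", {"S": "Single", "M": "Married"}))
survey_b = DataElement(marital_status,
                       ValueDomain("numeric codes", {"1": "Single", "2": "Married"}))

# Same meaning, different representation: the shared concept makes them comparable.
print(survey_a.concept == survey_b.concept)
print(survey_a.decode("M"), survey_b.decode("2"))
```

Keeping meaning and representation apart is what lets a registry state that two differently coded variables measure the same thing.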

ISO 19115 Digital Geographic Data Provides the standard metadata model for describing geographies. Implemented in several XML standards, including DDI (there is also a standard ISO XML). Well-accepted within the technology community and among communities of use (geographers, etc.)

METS A packaging standard for digital libraries/archives –Pulls together associated sets of files and establishes their relationship to one another Can carry metadata payloads in their native XML namespaces as metadata sections Cooperatively developed with DDI –METS left the description of data to DDI –DDI supports METS for archival packaging

XBRL XML standard from the accounting world for describing business reports Widely used by banking supervisory organizations –Major source of financial statistics Well marketed and widely supported Ongoing alignment project with SDMX

ODaF Vision - Standards [Diagram: Federated Registries (based on SDMX, ebXML, web services) hold registered references to source data: Aggregated Data/Metadata (SDMX), XBRL Business Reports, DDI Microdata Sets, ISO Geographies, Dublin Core Citations, METS Packaging. Standard classifications are organized using ISO semantic definitions.]

ODaF Activities We are early in our efforts to create such an infrastructure –To establish a sufficient set of well-aligned standards –To build open-source tools to support the use of these standards –To otherwise support the adoption and use of standard models, formats, and registries

ODaF Projects Standards Alignment Project: on-going effort to establish an agreed mapping between the mentioned standards SDMX Registry Hosting: Host SDMX registries on our own servers for those wishing to do prototype implementations DDI Development Support: provide hosting and infrastructure to support the use and development of DDI 3.0 DDI Foundation Tools Program: providing technical coordination and infrastructure for a multi-institution effort to build an Eclipse-based open-source toolkit for working with DDI 3.0, including transforms to/from SAS, SPSS, and STATA SDMX Browser: Developing an open-source tool (using Adobe Flex) for collecting, updating, and viewing statistical data in SDMX format – working in informal collaboration with ECB and OECD

ODaF Projects (cont.) DeXtris Browser: beta end-user tool for viewing and searching DDI 1/2.* and 3.0 metadata files – supports version transformations. UKDA QuDEX Draft Standard: Working as technical support for UKDA in their development of a standard for qualitative metadata (may become part of DDI). Canadian RDC Network: Providing technical advice to the Canadian RDC network on metadata management and implementation in support of DDI 3.0. NORC Virtual Data Enclave: Working to help develop and deploy the first virtual RDC in the US with data from NIST, others. Also involved in proposals to build a European virtual RDC.

ODaF Projects (cont.) Have contributed to the creation of training materials and online support for DDI 3.0, for general use White papers: DDI & SDMX (a comparison), guidelines for open-source tools development, others Member, DDI Alliance Sponsored IASSIST 2007 in Montreal (planned also for IASSIST 2008 in Palo Alto, CA)

ODaF - Where We Are Today New organization, lots of interest and support thus far Interesting projects are emerging, some early deliverables have been finished Looking for participation from interested, serious individuals Still at the stage of supporting and promoting a coordinated set of standards

To Learn More… ODaF: SDMX: DDI: ISO/IEC 11179: METS: ISO 19115: c/catalogue_detail.htm?csnumber= c/catalogue_detail.htm?csnumber=26020 XBRL:

Tools and Training For some free SDMX tools, implementation support site, and SDMX and DDI training courses:

Questions?