ODaF Europe 2008 Colchester, UK, April 14-15, 2008 DDI Landscape Pascal Heus Open Data Foundation firstname.lastname@example.org http://www.opendatafoundation.org
Background Concept of DDI and definition of needs grew out of the data archival community Established in 1995 as a grant funded project initiated and organized by ICPSR Members: –Social Science Data Archives (US, Canada, Europe) –Statistical data producers (including US Bureau of the Census, the US Bureau of Labor Statistics, Statistics Canada and Health Canada) February 2003 – Formation of DDI Alliance –Membership based alliance –Formalized development procedures
DDI Timeline / Status
2000 – DDI 1.0
–Simple survey
–Archival data formats
–Microdata only
2003 – DDI 2.0
–Aggregate data (based on matrix structure)
–Added geographic material to aid geographic search systems and GIS users
2004 – Acceptance of a new DDI paradigm
–Lifecycle model
–Shift from the codebook centric / variable centric model to capturing the lifecycle of data
–Agreement on expanded areas of coverage
2005
–Presentation of schema structure
–Focus on points of metadata creation and reuse
2006
–Presentation of first complete 3.0 model
–Internal and public review
2007
–Vote to move to Candidate Version (CR)
–Establishment of a set of use cases to test application and implementation
–October 3.0 CR2
2008
–February 3.0 CR3
–March 3.0 CR3 update
–April 3.0 CR3 final
–May: anticipated vote to publish DDI 3.0 at DDI Meeting (after IASSIST)
2009
–DDI 2.2?
–DDI 3.1?
DDI 3.0 and the Survey Life Cycle A survey is not a static process: it evolves dynamically over time and involves many agencies and individuals. DDI 2.x is about archiving; DDI 3.0 covers the entire life cycle. 3.0 focuses on metadata reuse (minimizes redundancies and discrepancies, supports comparison). It also supports multilingual content, grouping, geography, and more. 3.0 is extensible.
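The metadata reuse idea can be pictured with a minimal Python sketch. It is not the DDI 3.0 schema: the `Question` and `SurveyWave` classes, the `registry` dictionary and the question IDs are all hypothetical names invented for illustration. The point is only that a question is defined once and referenced by ID from each survey wave, which avoids duplicated definitions and makes comparison across waves a simple lookup.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Question:
    """A question defined once and shared by reference (hypothetical model)."""
    id: str
    text: str


@dataclass
class SurveyWave:
    """A survey wave that points at questions by ID instead of copying them."""
    name: str
    question_ids: List[str] = field(default_factory=list)


# Each question definition is stored exactly once (illustrative content).
registry: Dict[str, Question] = {
    "q_age": Question("q_age", "What is your age in years?"),
    "q_sex": Question("q_sex", "What is your sex?"),
}

wave_2007 = SurveyWave("2007", ["q_age", "q_sex"])
wave_2008 = SurveyWave("2008", ["q_age"])


def resolve(wave: SurveyWave) -> List[str]:
    """Look up the shared question text for every reference in a wave."""
    return [registry[qid].text for qid in wave.question_ids]


def shared_questions(a: SurveyWave, b: SurveyWave) -> List[str]:
    """Return the IDs used by both waves, in the order they appear in a."""
    other = set(b.question_ids)
    return [qid for qid in a.question_ids if qid in other]
```

Because both waves reference the same `q_age` entry, correcting the question text in `registry` updates every wave at once, and `shared_questions(wave_2007, wave_2008)` identifies comparable items without any text matching.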
DDI User Base (1) National Statistical Offices Line Ministries and other governmental agencies Data archives and libraries world-wide Research data centers Health Canada, Statistics Canada, HRSDC Canada Transport for London, Gallup-Europe Etc.
DDI User Base (2) International Household Survey Network (IHSN) –Major international organizations involved –Coordination of activities –Adopted DDI 1/2.x as standard –Developed the Microdata Management Toolkit and related tools / guidelines –http://www.surveynetwork.org Accelerated Data Program (ADP) –World Bank / PARIS21 –Implements IHSN activities in developing countries –Task 1: documentation and dissemination of existing survey microdata –Has introduced DDI in national statistical agencies in over 50 countries –http://www.surveynetwork.org/adp
DDI Alliance Membership based organization –Agencies: ICPSR, World Bank, Open Data Foundation –National data archives: Danish, Finnish, Dutch, Norwegian, Swiss, UK –Germany: Centre for Survey Research and Methodology (ZUMA), German Socio-Economic Panel Study (SOEP), Institute for Study of Labor (IZA), Zentralarchiv fuer Empirische Sozialforschung (University of Koeln) –Universities: Alberta, Berkeley, Guelph, Harvard/MIT, Minnesota, etc. Steering and Expert Committee Meets annually at IASSIST http://www.ddialliance.org
ICPSR The Inter-university Consortium for Political and Social Research One of the world's largest archives of digital social science data –Acquire and preserve social science data –Provide open and equitable access to these data –Promote effective data use Home of the DDI Alliance http://www.icpsr.umich.edu
International Household Survey Network Partnership of international organizations seeking to improve the availability, quality and use of survey data in developing countries Steering Committee: –United Kingdom Department for International Development (DfID), International Labour Organization (ILO), Partnership in Statistics for Development in the 21st Century (PARIS21), United Nations Children's Fund (UNICEF), United Nations Statistics Division (UNSD), World Health Organization and the Health Metrics Network (WHO/HMN), World Bank Plays a major role in the adoption of DDI around the globe, active in many developing countries Developer of the Microdata Management Toolkit http://www.surveynetwork.org
Open Data Foundation US based non-profit organization Adoption of global metadata standards and the development of open-source solutions promoting the use of statistical data Coordination of development efforts Board of directors, advisors and management group Open to individual membership, institutional association is through projects http://www.opendatafoundation.org
Metadata Technology UK based private company Consulting services and development of tools based on open standards and open source Training services, registry services, metadata repositories, hosting Focus on SDMX, DDI and related standards http://www.metadatechnology.com
IASSIST International Association for Social Science Information Services and Technology IASSIST is an international organization of professionals working in and with information technology and data services to support research and teaching in the social sciences. Individual-based membership Primary platform for DDI community Annual conference –2008: Stanford, CA, 2009: Tampere, Finland –DDI Alliance annual meeting http://www.iassistdata.org/
DDI Foundation Tools Program Initiative aiming at the development of a Foundation Framework and a Toolkit to support the implementation of DDI applications and utilities (open source) MOU established September 2007, 2-year program (renewable on an annual basis afterwards) Canada Research Data Centre Network, Danish Data Archive, DDI Alliance, GESIS-ZUMA, National Opinion Research Center (NORC), Open Data Foundation (ODaF), and the UK Data Archive (UKDA) Web site coming soon
UKDA Data Exchange Tools (DExT) Aims to develop, refine and test models for data exchange, for both survey data and qualitative research data, based on XML/RDF schemas, and to develop tools for data import and export Researching the feasibility of developing automated conversion procedures for legacy formats Collaborative efforts underway (w/ODaF) for data conversion tool (DExT) and qualitative metadata (QuDExT) http://www.data-archive.ac.uk/dext/
NORC Data Enclave National Opinion Research Center Provides a secure environment within which authorized researchers can access sensitive microdata remotely from their offices or onsite Data from the National Institute of Standards and Technology's (NIST) Technology Innovation Program (TIP), the Ewing Marion Kauffman Foundation, and the Economic Research Service at the US Department of Agriculture Virtual data enclave Using DDI and exploring innovative methods to link producer and researcher knowledge (collaborative spaces, source code analysis, researcher provided metadata) Technical support by ODaF http://dataenclave.norc.org
Canada RDC Project Consists of 14 Research Data Centres, 6 branch RDCs and the Federal Research Data Centre in Ottawa Data provided by Statistics Canada RDCs are now connected through a high-speed secure network Project to adopt a DDI 3.0 based metadata framework for survey documentation and research work and to sponsor development of tools ODaF providing technical assistance http://www.statcan.ca/english/rdc/index.htm
EU 7th Research Framework Program Under Socio-economic Sciences and Humanities – related specific 2007 objectives: to bring together existing research infrastructures to support the efficient provision of essential research services INFRA-2008-126.96.36.199: promoting European wide access to microdata sets of official statistics for research and leading to a European statistical system open to researchers. – INFRA-2008-188.8.131.52 (through the development, harmonization and optimal use of indicators and data for economic and innovation research) –INFRA-2008-184.108.40.206 (Developing improved access to historical archives and cultural collections for research purposes). European Access to Statistical Information (EURASI) –proposal was completed end of February for European RDC networking/remote access, data disclosure and metadata –Netherlands, Italy, Germany, Spain, Slovenia, Sweden, Hungary, Austria, Switzerland, Denmark, UK, Bulgaria
Other DDI Projects GESIS-ZUMA –Tools for mapping from SPSS and SAS save files to DDI 3.0 metadata –Requires a copy of SPSS for those transforms CASES –Has said they will support DDI 3.0 Algenta –New company producing survey design tools using the DDI 3.0 model as their internal data structure Blaise –Currently looking at DDI 3.0 –Was involved in its creation CSPro –Has support for an earlier version of DDI 3.0 (Public Comment) –Integration performed by a third party Nesstar –Currently evaluating DDI 3.0 ???
DDI Editor for archivists? Currently gathering requirements / wish list Likely to start with light, entry level editor (based on Flex) then move towards robust products (but likely on a case by case / project basis) Focus on archiving (DDI 1/2/3); different editors will need to be developed for other purposes (but could use same codebase)
Editor for Archivist Wish List
–User Friendly: platform independent, multiple languages, no knowledge of XML required
–Metadata import: Read from data files, instrument design tools, DDI
–Data import: Read common formats & Nesstar, save to ASCII (preservation)
–Metadata template: DDI Profile
–Metadata editing: Survey groups (catalogs), Survey description, File description and relationships, Variable-level metadata, Variable groups, Cubes?, ability to highlight text in an untagged document and tag it (external utility?)
–Metadata validation: Internal validation based on DDI 3.0 parser, Template-based validation, Plugin-based validation (external transforms), Ability to check spelling in various languages for all free text fields
–Repositories: Concepts, Classifications, Universes, Variables / Questions (with definition, question, interviewer instructions)
–Change tracking: versioning, reviewer's comments
–Metadata export: DDI 3.0, DDI 1/2.x (1.2.2 for backward compatibility with Nesstar), Mappings to Dublin Core, MARC, SDMX, etc.
–Metadata reporting: Generic facility to produce XSLT-based reports (fully customizable), Ability to schedule reporting, Option to disseminate report output through email, PDF, FTP, etc.
–Data export: Write ASCII data files + setup files for various software packages
–Extensions: Ability to add extension plugins (Nesstar?) or call external tools for processing
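The template-based validation item on the wish list could look roughly like the Python sketch below. This is an assumption about how such a check might work, not a description of any existing editor or of the DDI Profile mechanism: the `REQUIRED_FIELDS` template, the field names and the `validate_variable` function are hypothetical.

```python
# Hypothetical template: the fields an archive requires for every variable.
REQUIRED_FIELDS = ("name", "label", "universe")


def validate_variable(variable, required=REQUIRED_FIELDS):
    """Return a list of problems for one variable; an empty list means valid."""
    problems = []
    for field_name in required:
        value = variable.get(field_name)
        # Treat absent, None and whitespace-only values as missing.
        if value is None or not str(value).strip():
            problems.append(f"missing required field: {field_name}")
    return problems


complete = {"name": "age", "label": "Age in years", "universe": "All respondents"}
incomplete = {"name": "age", "label": " "}

print(validate_variable(complete))
print(validate_variable(incomplete))
```

Returning a list of messages rather than raising on the first failure lets an editor show every problem to the archivist at once, which fits the reviewer-oriented workflow the wish list describes.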