The IPY Data and Information Service—How do we get there? IPY Data Workshop Cambridge, England 3 March 2006 World Data Center for Glaciology, Boulder Facilitating the international exchange of snow and ice data Mark A. Parsons IPY Data Policy and Management Sub-committee IPY Data and Information Service Electronic Geophysical Year
IPY1 IPY2 IGY (IPY3) IPY4
IPYDIS; Mark A. Parsons, 3 March 2006. What will IPY4 bring? The Challenge! Access: Will researchers be able to find all the data relevant to their research and see relationships between data sets? Interoperability: Will they be able to merge and integrate different data sets across experiments and disciplines? Usability: Will they be able to subset, visualize, and transform the data? Preservation: Will they be able to retrieve and understand IPY4 data in 2050?
Organization of IPY Data Management: IPY Joint Committee; Data Policy & Management Subcommittee; scientists, data managers, funding agencies; Programme Office; Data & Information Service; eGY; Projects; Data Centers, Virtual Observatories, etc.; Users
Alternate Views of the DIS
Systems and Innovation. The Standish Group’s “CHAOS report”: an assessment of over 40,000 IT application projects. [Chart: shares of projects that succeeded, were “challenged”, or failed] “We're entering a new world in which data may be more important than software.” - Tim O'Reilly
The People Part. Service counts. “A striking proportion of project difficulties stem from people in both customer and supplier organisations failing to implement known best practice.” - Oxford University/Computer Weekly survey of public and private sector IT projects (emphasis added). However, people are much more able to adapt to change, uncertainty, and messy systems.
The People Part: Science and Data Management. Many have stated the need to involve scientists in data management, but… it is also important to involve data managers in conducting science. Field experiments: 20% increase in data quality (Parsons et al. 2004); 70% of experiment cost is data assembly (Bernhardsen 1992; Longley et al. 2001). Observing systems.
Preservation and Access: Two Peas in a Pod. Scientific Data Stewardship: “preservation and responsive supply of reliable and comprehensive data, products, and information for use in building new knowledge to…” - US Global Change Research Program, 1998. “the long-term preservation of the scientific integrity, monitoring and improving the quality, and the extraction of further knowledge from the data” - H. Diamond et al., NOAA/NESDIS, 2003
Access. What is it? Preservation requirements are well defined in the Open Archival Information System (OAIS) Reference Model, but: there is no similar model for access requirements; there is not even a common definition of “access” and what restricts it; and there are unique access requirements for social science data and non-digital collections (physical samples, photographs, audio, etc.). “Facts are terrible things if left sprawling and unattended…” - Norman Cousins
Standards: Essential but Cumbersome. Some possibilities: ISO 19115 metadata standard; OAIS Reference Model; OGC data transfer standards; other OGC standards; “Web Services” (WSDL, SOAP); other XML-based standards (GML, OAI-PMH, RSS, …); etc. No New Standards! “We must not … start from any and every accepted opinion, but only from those we have defined - those accepted by our judges or by those whose authority they recognize.” - Aristotle, c. 350 BC
Issues with the Data Itself. Formats: archives and users may have different needs. Consider four themes (Raymond, 2004): transparency; interoperability; extensibility; storage or transaction economy. “We often get blinded by the forms in which content is produced, rather than the job that the content does.” - Tim O’Reilly
Other Questions and Issues. How interoperable can we be? What does “portal” mean to you? How do we maximize use of existing data systems and structures? CODATA? WDCs? How does IPY data fit into current operational systems? What about GEOSS: can IPY be a prototype? Which technological trends can help us (ontologies, virtual observatories, portals, etc.)? How do we incorporate historical data? We need a solid business model, especially for the long term.
Breakout Groups. Methods for data discovery: portals (Paul Berkman, room 370). Ensuring data submission and publication: carrots and sticks (Jim Moore, room 303b). Semantics, ontologies, and language (Heather Lane, main room).
Charge to Breakout Groups
1. Determine rapporteur
2. Explicitly define problem(s)
3. Identify options to solve problem
4. Recommend steps to solve problem
5. Present to whole group for feedback
6. Revise
7. Write up results to be part of a larger workshop report. Include outstanding issues, next steps, etc.
The workshop report will be presented to the broader IPY research community for feedback and buy-in.
Data Systems Today © N. Carr 2006
What they need to become © N. Carr 2006