Report to WOAP-4 from the WOAP Task Group on Data Management (TGDM)
Howard Cattle, International CLIVAR Project Office, National Oceanography Centre, Southampton, UK
Bob Keeley, Integrated Science Data Management, Department of Fisheries and Oceans, Canada
TGDM – background
– Origin: WOAP-2 meeting, Ispra, August
– Membership & ToRs established early
– Composition: 1 member from each of the core projects, paired, where appropriate, with an IPO member; CEOP & WMP representation
– Initial mandate to review (a) the current status and management of observational data & model output archives within WCRP, including associated web sites, and (b) WCRP data policy
TGDM initial ToRs – summary
1. Determine what is being done in each project, project office and WCRP Working Group related to data management and to information and data access; seek to provide unifying guidelines for future operations
2. Review the data policies of sponsors & projects; recommend whether WCRP should have an overarching policy and, if so, draft one
3. Review the status, management, accessibility, stewardship and archiving of observational data and model output within the core projects
TGDM – current membership
– Chair & CLIVAR: Howard Cattle
– Vice-Chair: Robert Keeley
– SPARC: Christian von Savigny (Univ. of Bremen)
– GEWEX: Bill Rossow
– CLiC: Jim Moore (UCAR) & Taco de Bruin (NL)
– CEOP: Steve Williams (UCAR)
– WMP: Jim Kinter (COLA)
– WCRP: Dr Ghassem Asrar
– GEO: Michael Rast (GEO Secretariat)
– JPS coordination: Catherine Michaud (Paris Office)
Membership needs review
TGDM – activities to date
– Carried out an initial survey of dataset archives & management practices within the WCRP core projects
– Reviewed the data policies of WCRP sponsors & projects, where available
– Led the WCRP data policy statement (copy on WOAP-4 web page)
– Proposed a way forward in a paper initiated by Bob Keeley (TGDM report to WOAP-4, on WOAP-4 web page)
TGDM survey
Designed to identify:
a) Data and information management practices of projects
b) Project or sub-project data centres
c) Regional or special-purpose datasets assembled to serve the needs of specific sub-projects or tasks within core projects
d) Global datasets (including model output)
e) Links to datasets
Limited to documenting archived data that are currently generally accessible
What did completed survey forms reveal?
– The approach to data and information management depends on project structure
– A wide variety of data centres/archives is used: some at major data centres, others in various locations under project auspices
– Extensive and varied datasets and links, including global and regional observational datasets (satellite, in situ and derived), model output datasets, and data from campaigns, field studies and laboratory studies
What did completed survey forms reveal? (cont.)
– WCRP projects are originators and/or custodians of archives of model output or observational datasets of key importance to the recent IPCC and Ozone assessments
– No clear redundancies in the WCRP data archives are immediately apparent, but this needs further investigation
– A compressed version of the survey is attached to the TGDM Report to WOAP-4
Issues addressed in TGDM document to WOAP-4
– WCRP faces a varied and dispersed data management challenge (observation, synthesis and modelling datasets)
– Should we update and extend the earlier survey and publish it on the web?
– How do we ensure long-term management of WCRP datasets, especially in relation to data discovery, access and archiving (including the transition to a new WCRP structure)?
Issues addressed in TGDM document to WOAP-4 (cont.)
1. Update the list of datasets arising from the survey
2. Explore with projects the potential for rationalizing the naming convention of WCRP datasets
3. Manage WCRP datasets in the long term; seek to:
– Identify how long datasets need to be archived
– Ensure documentation is available for each dataset on how the data were collected, analysed or modelled
– Secure the future of datasets with respect to archiving, discovery and access
– Encourage data management as part of the planning process for new projects
Issues addressed in TGDM document (cont.)
4. Providing access (& discovery)
– Identify how datasets are to be made accessible and discoverable via standardized discovery metadata (the ISO standard is becoming more prevalent)
– Core Project Offices should seek to ensure their projects follow through with the chosen solution for data discovery and access
– Seek to ensure datasets are available in commonly used formats (e.g. NetCDF) with common abbreviations
Suggested strategy for providing access & discovery: describe datasets through records in WIGOS, and provide the data following one of the NetCDF conventions or in BUFR (other formats could be used)
Issues for WOAP
– Should we further scope the items in the Bob Keeley document and seek to implement them? Resourcing is a problem.
– At a minimum, should we seek to update the survey and post it on the web (WCRP web pages?)? This could initially be resourced through ICPO, but longer-term maintenance is an issue.
– The activity needs the core project IPOs to provide support and take responsibility for their datasets, as well as resourcing for overall management support
– What should the overall future of this activity and of the TGDM be? Who should chair it in the next phase? Should the TG meet?
Actions on survey of WCRP datasets
1. Check URLs in the survey; unresponsive URLs to be corrected by the appropriate core project leads
2. Each project to revisit the survey to confirm completeness and add other information where data are available, highlighting datasets collected and managed by the project
3. In the course of updating, each project to provide the name and address of a first point of contact for the datasets
4. Where there are references to multiple versions of the data, ensure the descriptions explain the attributes of each version
5. For each dataset, ask data holders to indicate whether and when their support for its data management (DM) might expire
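Action 1 lends itself to automation. The sketch below is illustrative only: it assumes the survey links can be exported as hypothetical (project, URL) pairs served over HTTP(S), and the function names `check_url` and `unresponsive_by_project` are invented for this example. It sends a HEAD request to each link using only the Python standard library and groups failing links by project, so each core project lead receives a list of URLs to correct.

```python
import urllib.error
import urllib.request
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# None means no HTTP response was received at all.
Status = Optional[int]


def check_url(url: str, timeout: float = 10.0) -> Status:
    """Return the HTTP status for url, or None if no HTTP response was received."""
    try:
        # HEAD avoids downloading the dataset itself; some servers reject it (405).
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, ValueError, OSError):
        return None  # malformed URL, DNS failure, refused connection or timeout


def unresponsive_by_project(
    entries: Iterable[Tuple[str, str]],
    checker: Callable[[str], Status] = check_url,
) -> Dict[str, List[Tuple[str, Status]]]:
    """Group failing (project, url) survey entries by project.

    A link counts as failing if no response arrived or the status is >= 400.
    """
    report: Dict[str, List[Tuple[str, Status]]] = defaultdict(list)
    for project, url in entries:
        status = checker(url)
        if status is None or status >= 400:
            report[project].append((url, status))
    return dict(report)
```

The `checker` parameter lets the grouping logic be exercised without network access. A server that answers HEAD with 405 would be reported as failing, so a flagged link may still need a manual look before a project lead is asked to fix it.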
Action on rationalizing names
6. Projects to consider the advantages of developing a scheme for naming files. If one is adopted, a document explaining the naming convention should be developed and posted with the files, and the files renamed to conform.
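If a naming scheme is adopted, conformance can be checked mechanically before files are renamed. The sketch below uses an invented convention, `<PROJECT>_<dataset>_v<version>_<YYYYMMDD>.<ext>`, purely for illustration. It is not an agreed WCRP scheme. The project codes are taken from the membership slide, and `parse_filename` and `nonconforming` are hypothetical helper names.

```python
import re
from typing import Dict, List, Optional

# Hypothetical convention, for illustration only (not an agreed WCRP scheme):
#   <PROJECT>_<dataset>_v<version>_<YYYYMMDD>.<ext>
# e.g. CLIVAR_sst-analysis_v2_20080115.nc
FILENAME_PATTERN = re.compile(
    r"(?P<project>[A-Za-z]+)_"
    r"(?P<dataset>[a-z0-9-]+)_"
    r"v(?P<version>\d+)_"
    r"(?P<date>\d{8})"
    r"\.(?P<ext>nc|bufr)"
)

# Project codes taken from the TGDM membership slide.
KNOWN_PROJECTS = {"CLIVAR", "CLiC", "GEWEX", "SPARC", "CEOP", "WMP"}


def parse_filename(name: str) -> Optional[Dict[str, str]]:
    """Split a conforming file name into its parts; return None if it does not conform.

    Only the format is checked: the date is not validated as a real calendar date.
    """
    match = FILENAME_PATTERN.fullmatch(name)
    if match is None or match.group("project") not in KNOWN_PROJECTS:
        return None
    return match.groupdict()


def nonconforming(names: List[str]) -> List[str]:
    """List the file names that would need renaming under the convention."""
    return [name for name in names if parse_filename(name) is None]
```

`fullmatch` is used instead of `match` with `$`, because in Python `$` also matches just before a trailing newline.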
Managing datasets in the long term
7. Each project should identify how long data should be held in archives (keeping data "forever" needs good arguments)
8. Each archived dataset should have readily available documentation clearly associated with it, describing how the data were collected, analysed or modelled
9. For projects ceasing operations, archive facilities should be sought to secure the future of the data generated and to provide data discovery and access
10. Data management must be factored into the planning of new projects
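The information requested in actions 3, 5, 7 and 8 could be held as one catalogue record per dataset, which makes gaps easy to report. The sketch below assumes a hypothetical `DatasetRecord` whose field names are invented for illustration and do not represent a WCRP or ISO schema. It flags records that have no point of contact or documentation, that retain data indefinitely without justification, or whose data management support has already expired.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical catalogue entry; field names are illustrative, not a WCRP schema."""
    name: str
    project: str
    contact: str                      # first point of contact (action 3)
    documentation_url: Optional[str]  # how data were collected/analysed/modelled (action 8)
    retain_until: Optional[date]      # None means "keep indefinitely" (action 7)
    retention_justification: str = ""
    support_expires: Optional[date] = None  # when the holder's DM support may end (action 5)


def management_issues(record: DatasetRecord, today: date) -> List[str]:
    """Return human-readable gaps in a record's long-term management information."""
    issues = []
    if not record.contact:
        issues.append("no point of contact")
    if not record.documentation_url:
        issues.append("no documentation")
    if record.retain_until is None and not record.retention_justification:
        issues.append("indefinite retention without justification")
    if record.support_expires is not None and record.support_expires <= today:
        issues.append("data management support has expired")
    return issues
```

A record whose support has expired is the case covered by action 9, where an alternative archive facility would need to be sought.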
Providing access
11. WCRP must decide how its datasets are to be made discoverable and accessible (options in paper)
12. Core project offices should seek to ensure their projects follow through with the chosen solution for data discovery and access
13. Projects should be encouraged to make their datasets available in commonly used formats with common abbreviations