Data Publication and Quality Control Procedure for CMIP5 / IPCC-AR5 Data (WDC Climate / DKRZ)

Similar presentations
Std-doi Publication of Climate Data at WDCC DataCite Summer Meeting 7./8. June 2010 Publication of climate data Heinke Höck World Data Center for Climate.
Introduction to DataCite Adam Farquhar PhD Head of Digital Library Technology, The British Library President, DataCite June 2010.
ESGF and ES-DOC Documenting climate models and their simulations ES-DOC current and future plans Working with ESGF Eric Guilyardi, Balaji, Cecelia DeLuca,
Preservation and Long Term Access of Data at the World Data Centre for Climate Frank Toussaint N.P. Drakenberg, H. Höck, M. Lautenschlager, H. Luthardt,
Climate Analytics on Global Data Archives Aparna Radhakrishnan 1, Venkatramani Balaji 2 1 DRC/NOAA-GFDL, 2 Princeton University/NOAA-GFDL 2. Use-case 3.
Authentication of the Federal Register Charley Barth Director, Office of the Federal Register United States Government.
M. Lautenschlager, H. Ramthun 1 Metafor Review 5 / 2010.
1 CS 502: Computing Methods for Digital Libraries Lecture 4 Identifiers and Reference Links.
M. Stockhause et al. Martina Stockhause, Michael Lautenschlager, Frank Toussaint Deutsches Klimarechenzentrum (DKRZ) World Data Centre for Climate (WDCC)
M. Diepenbroek (MARUM), M. Lautenschlager (MPI-M), E. Paliouras (DLR), H. Grobe (AWI) CODATA General Assembly, Berlin World Data Center Cluster.
Review on 5 Years DataCite and 10 Years DOI Registration for Data DataCite Annual Conference 2014 Nancy, August 25th – 26th Michael Lautenschlager (DKRZ.
Preservation and Long Term Access of Data at the World Data Centre for Climate Frank Toussaint N.P. Drakenberg, H. Höck, S. Kindermann, M. Lautenschlager,
M.Lautenschlager (WDCC / MPI-M) / / 1 GO-ESSP at LLNL Livermore, June 19th – 21st, 2006 World Data Center Climate: Status and Portal Integration.
System Design/Implementation and Support for Build 2 PDS Management Council Face-to-Face Mountain View, CA Nov 30 - Dec 1, 2011 Sean Hardman.
Z EGU Integration of external metadata into the Earth System Grid Federation (ESGF) K. Berger 1, G. Levavasseur 2, M. Stockhause 1, and M. Lautenschlager.
1 Eric Guilyardi and the Metafor team Common Metadata for Climate Modelling Digital Repositories Metafor Dissemination Workshop Abingdon, 14 March 2011.
Metadata Creation with the Earth System Modeling Framework Ryan O’Kuinghttons – NESII/CIRES/NOAA Kathy Saint – NESII/CSG July 22, 2014.
CIM – The Common Information Model in Climate Research
Metadata Concepts / Use in Climate Research Stephan Kindermann, Martina Stockhause German Climate Computing Center (DKRZ) Hamburg, Germany.
F. Toussaint (WDCC, Hamburg) / / 1 CERA : Data Structure and User Interface Frank Toussaint Michael Lautenschlager World Data Center for Climate.
VO Sandpit, November 2009 Environmental Data Archival: Practices and Benefits crib sheet Graham Parton With many thanks to Dr.
CC&E Best Data Management Practices, April 19, 2015 Please take the Workshop Survey 1.
Michael Lautenschlager World Data Center Climate Model and Data / Max-Planck-Institute for Meteorology German Climate Computing Centre (DKRZ)
M.Lautenschlager (WDCC, Hamburg) / / 1 Semantic Data Management for Organising Terabyte Data Archives Michael Lautenschlager World Data Center.
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES Data Replication Service Sandeep Chandra GEON Systems Group San Diego Supercomputer Center.
M.Lautenschlager (WDCC, Hamburg) / / 1 Training-Workshop Facilities and Sevices for Earth System Modelling Integrated Model and Data Infrastructure.
- EGU 2010 ESSI May Building on the CMIP5 effort to prepare next steps : integrate community related effort in the every day workflow to.
- Vendredi 27 mars PRODIGUER un nœud de distribution des données CMIP5 GIEC/IPCC Sébastien Denvil Pôle de Modélisation, IPSL.
Portable Infrastructure for the Metafor Metadata System Charlotte Pascoe 1, Gerry Devine 2 1 NCAS-BADC, 2 NCAS-CMS University of Reading PIMMS provides.
The Global Land Cover Facility is sponsored by NASA and the University of Maryland.The GLCF is a founding member of the Federation of Earth Science Information.
Michael Lautenschlager, Hannes Thiemann, Frank Toussaint WDC Climate / Max-Planck-Institute for Meteorology, Hamburg Joachim Biercamp, Ulf Garternicht,
IPCC TGICA and IPCC DDC for AR5 Data GO-ESSP Meeting, Seattle, Michael Lautenschlager World Data Center Climate Model and Data / Max-Planck-Institute.
The Repository of the World Data Centre for Climate Frank Toussaint, Michael Lautenschlager Max-Planck-Institut für Meteorologie Repositories in Research.
Data formats and requirements in CMIP6: the climate-prediction case Pierre-Antoine Bretonnière EC-Earth meeting, Reading, May 2015.
WP6/SA2: Access to IS-ENES Data Federation SA2 is a European distributed data infrastructure providing access to data from ESM simulations produced in.
Lautenschlager + Thiemann (M&D/MPI-M) / / 1 Introduction Course 2006 Services and Facilities of DKRZ and M&D Integrating Model and Data Infrastructure.
1 Accomplishments. 2 Overview of Accomplishments  Sustaining the Production Earth System Grid Serving the current needs of the climate modeling community.
1 Overall Architectural Design of the Earth System Grid.
1 Gateways. 2 The Role of Gateways  Generally associated with primary sites in ESG-CET  Provides a community-facing web presence  Can be branded as.
M. Stockhause 1, G. Levavasseur 2, K. Berger 1 1 Deutsches Klimarechenzentrum (DKRZ) 2 Institute Pierre Simon Laplace (IPSL) ESGF-QCWT Quality Control.
Earth System Curator and Model Metadata Discovery and Display for CMIP5 Sylvia Murphy and Cecelia Deluca (NOAA/CIRES) Hannah Wilcox (NCAR/CISL) Metafor.
1 Summary. 2 ESG-CET Purpose and Objectives Purpose  Provide climate researchers worldwide with access to data, information, models, analysis tools,
Barbara Hirschmann: Establishing a DOI service for Switzerland’s university and research sector.
LLNL-PRES-XXXXXX This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344.
The Modeling Circle Courtesy M. Lautenschlager, DKRZ.
19-20 October 2010IT Directors’ Group Meeting 1 Item 3.3.g of the agenda Vision Infrastructure Project on Secure Infrastructure for CONfidential data access.
IPCC WG II + III Requirements for AR5 Data Management GO-ESSP Meeting, Paris, Michael Lautenschlager, Hans Luthardt World Data Center Climate.
Hannes Thiemann Michael Lautenschlager Deutsches Klimarechenzentrum GmbH, Germany EGU 2010.
Application of RDF-OWL in the ESG Ontology Sylvia Murphy: Julien Chastang: Luca Cinquini:
Data Citation Implementation Pilot Workshop
Data Management Practices for Early Career Scientists: Closing Robert Cook Environmental Sciences Division Oak Ridge National Laboratory Oak Ridge, TN.
Jost von Hardenberg ISAC-CNR, Torino, Italy with Paolo Davini, Susanna Corti, and many others EUDAT User Forum, Rome,Italy 3-4 February, 2016.
Joseph Antony, Andrew Howard, Jason Andrade, Ben Evans, Claire Trenham, Jingbo Wang Production Petascale Climate Data Replication at.
What was done for AR4. Software developed for ESG was modified for CMIP3 (IPCC AR4) Prerelease ESG version 1.0 Modified data search Advance search Pydap.
CAS2K11 in Annecy, France September 11 – 14, 2011 Data Infrastructures at DKRZ Michael Lautenschlager.
Using a Simple Knowledge Organization System to facilitate Catalogue and Search for the ESA CCI Open Data Portal EGU, 21 April 2016 Antony Wilson, Victoria.
Weigel, Berger, Kindermann, Lautenschlager EGU Versioning for CMIP6 in the Earth System Grid Federation Data preparation Initial registration.
1 This slide indicated the continuous cycle of creating raw data or derived data based on collections of existing data. Identify components that could.
Intentions and Goals Comparison of core documents from DFIG and Publishing Workflow IG show that there is much overlap despite different starting points.
PIDs in EUDAT Webinar, 15 Februari 2013
Russian Academy of Sciences
AP7/AP8: Long-Term Archival of CMIP6 Data
World Conference on Climate Change October 24-26, 2016 Valencia, Spain
DIAS & DIAS data release 2 years DIAS-GCI Cooperation Hiroko KINUTANI DIAS (Data Integration and Analysis System in Japan) , St. Petersburg.
Data Citation Service for CMIP6 and IPCC DDC Aspects
An Overview of Data-PASS Shared Catalog
A step-by-step guide to DOI registration
CMIP6 / ENES Data TF Meeting: DKRZ
Research data in library catalogues and the joint initiative of European technical libraries for data registration Jan Brase Workshop Primary data for.
RDA uptake activities and plans: ESGF
Presentation transcript:

Martina Stockhause, Michael Lautenschlager, Heinke Hoeck, and Frank Toussaint (WDC Climate / DKRZ), EGU
CMIP5: cmip-pcmdi.llnl.gov/cmip5
CMIP5 Quality Control: purl.org/org/cmip5/qc

Poster sections: CMIP5 Quality Control Workflow; Distributed Quality Control Approach; Data Publication Procedure; Future Perspective

CMIP5 Quality Control (QC)

Three quality control (QC) levels are defined for CMIP5 data:

QC Level 1
- Metadata: technical checks on the METAFOR questionnaire input.
- Data: CMOR2 and ESG publisher conformance checks.

QC Level 2
- Metadata: METAFOR questionnaire metadata checked by a scientist.
- Data: technical checks, e.g. on the reliability of variable ranges, and consistency checks between the data and the data requirements.

QC Level 3 / DOI
- Data approved by the author and published with a DOI. Data assigned a DOI is formally citable and is granted persistent access.

DOI Publication Process

The final DOI data publication procedure complies with the regulations of the DataCite consortium:
- Scientific Quality Assurance: performed by the data author and documented via the publication service GUI atarrabi.
- Technical Quality Assurance: cross- and double-checks of data and metadata integrity.
- DOI publication: the DataCite DOI metadata and the DOI are sent separately to the registration agency, a member of the International DOI Foundation. Data and DOI remain unchanged and persistent.
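The three QC levels are cumulative: a dataset only reaches a level once the checks of that level and of all lower levels have passed, and each level is tied to an access policy. The sketch below encodes that reading in Python. The class names, field names, and the extra NONE level are hypothetical illustrations, not part of the CMIP5 tooling, and the access strings are paraphrases of the poster's access-rights list.

```python
from dataclasses import dataclass
from enum import IntEnum


class QCLevel(IntEnum):
    """CMIP5 QC levels as described on the poster (NONE is an added helper)."""
    NONE = 0    # assumption: level 1 checks not yet passed
    L1 = 1      # technical metadata checks; CMOR2 / ESG publisher conformance
    L2 = 2      # scientist-checked metadata; data range and consistency checks
    L3_DOI = 3  # approved by the author, technically cross-checked, DOI assigned


# Access rule per level, paraphrased from the poster's access-rights list.
ACCESS = {
    QCLevel.L1: "restricted, controlled by the modelling centre",
    QCLevel.L2: "scientific community",
    QCLevel.L3_DOI: "open, including non-scientific users; formally citable",
}


@dataclass
class DatasetChecks:
    """Hypothetical record of check outcomes for one dataset."""
    metadata_technical_ok: bool = False   # L1: METAFOR questionnaire input
    data_conformance_ok: bool = False     # L1: CMOR2 and ESG publisher
    metadata_reviewed_ok: bool = False    # L2: checked by a scientist
    data_consistency_ok: bool = False     # L2: ranges, data requirements
    author_approved: bool = False         # L3: SQA by the author (atarrabi)
    technical_qa_ok: bool = False         # L3: TQA cross- and double-checks


def qc_level(c: DatasetChecks) -> QCLevel:
    """Return the highest level whose checks, and all lower ones, passed."""
    if not (c.metadata_technical_ok and c.data_conformance_ok):
        return QCLevel.NONE
    if not (c.metadata_reviewed_ok and c.data_consistency_ok):
        return QCLevel.L1
    if not (c.author_approved and c.technical_qa_ok):
        return QCLevel.L2
    return QCLevel.L3_DOI


if __name__ == "__main__":
    level = qc_level(DatasetChecks(True, True, True, True))
    print(level.name, "->", ACCESS[level])  # prints: L2 -> scientific community
```

Making the levels strictly cumulative mirrors the poster's statement that the QC L3 cross-checks review the results of levels 1 and 2 rather than replacing them.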
CMIP5 Organization & Infrastructure Components

For the distribution of the data connected to the next IPCC report (AR5), the Earth System Grid Federation (ESGF) was founded. Its members have different responsibilities within the data infrastructure:
- PCMDI / LLNL: data and security infrastructure (ESG)
- BADC (British Atmospheric Data Centre): metadata infrastructure (METAFOR / CIM)
- WDCC (World Data Center Climate) / DKRZ: quality control and data publication (DataCite DOI)

[Figure: CMIP5 infrastructure components: data nodes (DN) with THREDDS Data Servers (TDS) performing QC L2; ESG gateways; QC repository; CIM metadata repository and CIM questionnaire (PCMDI / BADC / WDCC); WDCC as DOI publication agency for QC L3 / STD-DOI, with Technical Quality Assurance (TQA), atarrabi for the Scientific Quality Assurance (SQA) of the data author, the CERA2 long-term archive, a DOI catalogue, and DOI target pages; registration of the STD-DOI and the target-page URL with the IDF.]

For CMIP5, about 3 PB of officially requested data are expected to be archived. About 1 PB of that data will likely be of particular interest and will be replicated by the three ESGF partners. Because of the high data volume, the QC checks up to level 2 are performed in a distributed way across the ESGF. The final QC level 3 checks for the DOI assignment are carried out by WDCC. Afterwards, the CMIP5 data are formally citable and remain persistent.

Quality Control Workflow

The different QC levels are connected with different access rights for registered users.

[Figures: CMIP5 DOI publication GUI atarrabi; actors in the DOI publication process; workflow of distributed QC in CMIP5.]

QC checks for data and metadata are performed separately for levels 1 and 2, and their results are reviewed during the QC L3 cross-checks. QC is accomplished at the level of DRS atomic datasets, and the QC results are aggregated at the DRS experiment level.
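The aggregation from DRS atomic datasets to the DRS experiment level can be sketched as a group-by over dataset identifiers. This is a minimal sketch under two stated assumptions: the identifiers use a simplified, hypothetical dot-separated layout (institute.model.experiment.frequency.realm.table.ensemble.variable) rather than the full CMIP5 DRS, and an experiment is assumed to reach only the lowest QC level of any of its atomic datasets. The poster does not state the aggregation rule, and the function names are illustrative.

```python
from collections import defaultdict


def experiment_of(atomic_id: str) -> str:
    """Map an atomic dataset id to its experiment-level key.

    Assumes a simplified, hypothetical dot-separated id of the form
    institute.model.experiment.frequency.realm.table.ensemble.variable.
    """
    institute, model, experiment = atomic_id.split(".")[:3]
    return f"{institute}.{model}.{experiment}"


def aggregate_to_experiment(levels: dict[str, int]) -> dict[str, int]:
    """Aggregate per-atomic-dataset QC levels to experiment level.

    Assumed rule: an experiment only reaches the lowest QC level reached
    by any of its atomic datasets.
    """
    grouped: dict[str, list[int]] = defaultdict(list)
    for atomic_id, level in levels.items():
        grouped[experiment_of(atomic_id)].append(level)
    return {exp: min(found) for exp, found in grouped.items()}


if __name__ == "__main__":
    # Illustrative ids only; the QC levels are made up for the example.
    levels = {
        "MPI-M.MPI-ESM-LR.historical.mon.atmos.Amon.r1i1p1.tas": 2,
        "MPI-M.MPI-ESM-LR.historical.mon.atmos.Amon.r1i1p1.pr": 1,
        "MPI-M.MPI-ESM-LR.rcp45.mon.atmos.Amon.r1i1p1.tas": 2,
    }
    print(aggregate_to_experiment(levels))
    # {'MPI-M.MPI-ESM-LR.historical': 1, 'MPI-M.MPI-ESM-LR.rcp45': 2}
```

Taking the minimum is a conservative choice: a single atomic dataset that has not passed QC L2 keeps the whole experiment from being reported at level 2.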
In the gateways, data discovery is supported down to the level of ensemble versions (ESG datasets).

[Figure: Granularity of quality control in the CMIP5 context.]

Access rights per QC level:
- Restricted access (QC level 1 data): after ESG publication, access is restricted and controlled by the modelling centre.
- Scientific access (QC level 2): the scientific community is granted access to QC L2 data.
- QC level 3 / DOI: with the DOI assignment, the data archive is opened for access by non-scientific users.

For high-volume data such as climate data, quality assurance has to be carried out at the data storage centre before the repository is opened for data access. Data distributed in a data grid with decentralized repositories have to be checked at different sites with comparable QC procedures.

QC L3 / DOI Process in CMIP5

The cross- and double-checks of the Technical Quality Assurance make use of the QC results of the preceding levels. Data as well as metadata are reviewed, and data accessibility is checked. The final author approval step is supported by the GUI atarrabi: authors check basic metadata and add information about their own quality assurance (Scientific QA). A DOI is then assigned and registered at the International DOI Foundation (IDF: dx.doi.org) via the registration agency DataCite.

DOI construction rule for CMIP5: doi: /WDCC/CMIP5.

[Figure: actors in the DOI publication process: scientist, publication agency, and registration agency; stages along a time axis: permission (QC L2), Scientific Quality Assurance, Technical Quality Assurance, DOI publication.]

Overall CMIP5 QC Workflow

[Figure: flowchart. Start: data published (> 10 PB); data and metadata QC L1 on globally distributed data nodes. Decision nodes (YES / NO): "Metadata QC L2 passed?", "To be replicated among ESGF?", "Data QC L2 passed?", "QC L3 passed?". Outcome boxes: "Data NOT formally citable; modelling group controls access manually"; "Data NOT formally citable; automatic access granted after filling in the ESGF registration page"; "DOI assigned: data formally citable; data can appear in IPCC-DDC; automatic access granted after filling in the ESGF registration page"; "Replicated: copied to PCMDI, BADC, WDCC & elsewhere (~1 PB)"; "Discard data". Note in the flowchart: informal citation is still requested where formal citation is not available.]

Future Perspective

The current DOI publication procedure is comparable to the publication of grey literature in scientific print media. Integrating a peer-review process would require quality procedures accepted and agreed upon by the earth system modelling community. The complexity of the distributed quality control approach could be reduced by integrating the QC repository into the CIM metadata repository.

Distributed QC Approach

Distributed QC Procedure in CMIP5

CMIP5 data is delivered to one of the three ESGF partners, where it is ESG-published and thereby passes the QC L1 data checks. Afterwards, QC L2 data consistency checks are performed before a subset of the data is replicated among the ESGF. The QC L2 results are stored in a central QC repository. During the QC L3 / DOI checks, the DOI publication agency WDCC accesses these QC results. Other sources for cross- and double-checks are the CIM metadata repository, the THREDDS Data Server (TDS), and the metadata stored in the long-term archive at WDCC.

Thus, the effort of the QC L2 data checks is shared among the ESGF, while the QC L3 / DOI checks are performed at one site, making use of the QC L2 results stored in the central QC repository. Consequently, the QC procedure and tools have to be developed, maintained, and distributed centrally, and agreed upon within the scientific community.

Our distributed QC approach consists of different software components:
- QC Run Service: runs the QC tool and ingests the configuration and the results into the repository.
- QC Services: data analyses and exception statistics.
- QC Plotting Service: creates plots and ingests them into the repository.
- QC L2 assignment.

[Figure: Organisation of the CMIP5 data federation: multiple sites performing local QC checks (QC tool, QC services); central repository; project metadata repository; DOI publication agency with long-term archive; export of QC results for the DOI publication process.]

Abbreviations: ESG: Earth System Grid; MD: metadata; DN: data node; TDS: THREDDS Data Server; TQA: Technical Quality Assurance; SQA: Scientific Quality Assurance.
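The division of labour in the distributed QC procedure (data nodes write QC L2 results to a central QC repository; the DOI publication agency cross-checks them against other sources during Technical Quality Assurance) can be sketched as follows. Everything here is a hypothetical illustration: the class and function names are invented, the repository is an in-memory dictionary rather than a real database, and the checksum comparison is an assumed stand-in for the poster's "double checks" against the long-term archive.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class QCResult:
    site: str       # ESGF partner that ran the QC L2 checks
    passed: bool    # overall QC L2 outcome for the atomic dataset


@dataclass
class CentralQCRepository:
    """Hypothetical in-memory stand-in for the central QC repository."""
    results: dict[str, QCResult] = field(default_factory=dict)

    def ingest(self, site: str, dataset_id: str, passed: bool) -> None:
        # Each data node stores its local QC L2 result centrally.
        self.results[dataset_id] = QCResult(site, passed)


def technical_qa(
    dataset_ids: list[str],
    repo: CentralQCRepository,
    cim_records: set[str],
    tds_catalog: set[str],
    archive_checksums: dict[str, str],
    node_checksums: dict[str, str],
) -> list[str]:
    """Cross- and double-check datasets before a DOI is requested.

    Returns a list of problems; an empty list means this sketch of the
    technical QA step passes.
    """
    problems: list[str] = []
    for ds in dataset_ids:
        result = repo.results.get(ds)
        if result is None or not result.passed:
            problems.append(f"{ds}: QC L2 not passed")
        if ds not in cim_records:
            problems.append(f"{ds}: no CIM metadata")
        if ds not in tds_catalog:
            problems.append(f"{ds}: not accessible via TDS")
        archived = archive_checksums.get(ds)
        if archived is None or archived != node_checksums.get(ds):
            problems.append(f"{ds}: archive copy missing or differs from data node")
    return problems


if __name__ == "__main__":
    repo = CentralQCRepository()
    repo.ingest("PCMDI", "a", True)
    repo.ingest("BADC", "b", False)
    for problem in technical_qa(["a", "b"], repo, {"a", "b"}, {"a"},
                                {"a": "x", "b": "y"}, {"a": "x", "b": "z"}):
        print(problem)
```

Collecting all problems instead of stopping at the first one matches the review character of the QC L3 step: the publication agency can report every failed cross-check back to the data provider at once.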