Metadata Concepts / Use in Climate Research
Stephan Kindermann, Martina Stockhause
German Climate Computing Center (DKRZ), Hamburg, Germany

Overview
- Metadata descriptions: sources, usage
  - data level, preservation level, model level, domain knowledge level
- Metadata standards, IT principles

A) Metadata descriptions: sources, usage
(I) Data Description Level
- source: model run output
- format: GRIB, NetCDF-3/4 container formats (including basic metadata)
- metadata homogenization: Climate and Forecast (CF) Conventions conformance, CMOR2 compliance, controlled vocabularies (CVs)
- usage: analysis tools, data access scripts, data search (→ "linked data" principle)
(II) Data Preservation Level
- target: long-term archive data centers (e.g. WDCC)
- format: internal DB, various external formats, e.g. ISO 19139, DIF, ...
- usage: long-term data storage and access, citation, e.g. using DOIs
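Metadata homogenization against controlled vocabularies boils down to comparing each global attribute of a file with a list of permitted terms. The sketch below illustrates only that idea: the attribute names and vocabulary entries are hypothetical, illustrative subsets and not the official CMIP5/CMOR2 tables, and the function name is made up for this example.

```python
# Hypothetical controlled vocabularies (illustrative subsets, not the official CMIP5 tables)
CONTROLLED_VOCABS = {
    "frequency": {"3hr", "6hr", "day", "mon", "yr", "fx"},
    "realm": {"atmos", "ocean", "land", "landIce", "seaIce", "aerosol"},
}


def check_against_cvs(global_attrs: dict) -> list[str]:
    """Return a list of problems; an empty list means the attributes conform."""
    problems = []
    for attr, allowed in CONTROLLED_VOCABS.items():
        value = global_attrs.get(attr)
        if value is None:
            problems.append(f"missing attribute '{attr}'")
        elif value not in allowed:
            problems.append(f"'{attr}' = '{value}' is not in the controlled vocabulary")
    return problems


# A free-text realm name fails the check, the CV term passes
print(check_against_cvs({"frequency": "mon", "realm": "atmosphere"}))
print(check_against_cvs({"frequency": "mon", "realm": "atmos"}))
```

Rejecting free-text variants such as "atmosphere" is what makes a later uniform search over all files possible.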

A) Metadata descriptions: sources, usage
(III) Model Description Level
- source: researcher interviews, online questionnaire
- format: CIM (Common Information Model, from the Metafor FP7 project "Common Metadata for Climate Modelling Digital Repositories"); CONCIM: UML, APPCIM: XSD + vocabularies
- usage: model intercomparison, scientific portals, information space browsing / search
(IV) Semantic Annotation Level
- source: data metadata, model metadata, domain knowledge metadata
- format: OWL (RDF)
- usage: user navigation in portals, "faceted search", etc.
- deployments: Earth System Grid CMIP5 portal, IS-ENES portal
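Faceted search, mentioned above as a portal usage, works by filtering records on selected facet values and then counting the remaining values of each facet so the user can refine further. The following is a minimal in-memory sketch; the records, facet names, and functions are hypothetical and do not reflect the actual ESG or IS-ENES portal implementation.

```python
from collections import Counter

# Hypothetical metadata records; a portal would read these from its metadata catalogue
RECORDS = [
    {"id": "r1", "model": "model_a", "experiment": "historical", "realm": "atmos"},
    {"id": "r2", "model": "model_a", "experiment": "rcp45", "realm": "atmos"},
    {"id": "r3", "model": "model_b", "experiment": "historical", "realm": "ocean"},
]


def apply_filters(records, **selected):
    """Keep only records that match every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]


def facet_counts(records, facet):
    """Count how many records carry each value of one facet."""
    return Counter(r[facet] for r in records if facet in r)


hits = apply_filters(RECORDS, experiment="historical")
print([r["id"] for r in hits])      # ['r1', 'r3']
print(facet_counts(hits, "realm"))  # Counter({'atmos': 1, 'ocean': 1})
```

The counts are recomputed on the filtered set after each click, which is what lets the user narrow a large archive step by step.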

Short background info: the Fifth Coupled Model Intercomparison Project (CMIP5)
- Sponsored by the WGCM (WCRP Working Group on Coupled Modelling)
- Quality-controlled data to (eventually) appear in the IPCC Data Distribution Centre
- A worldwide data management infrastructure building effort, with a consistent workflow from producers to consumers
In numbers:
- Simulations: ~90,000 simulated years, ~60 experiments
- ~20 modelling centres using ~30 major model configurations
- ~2 million output datasets
- Tens of petabytes of output
- ~2 petabytes of CMIP5 requested output
- ~1 petabyte of CMIP5 "replicated" output, to be replicated at BADC and DKRZ (expected to arrive in 2010/2011)
- ~10 TB of land biochemistry (from the long-term experiments alone)

B) Metadata standards, IT principles
(I) Data Description Level
- GRIB and NetCDF data containers: tens of petabytes of data, plus metadata
- File naming convention based on CVs, building uniform URIs: DRS (Data Reference Syntax)
  activity/product/institute/model/experiment/frequency/realm/variable/ensemble
- Data servers, metadata (MD) catalogue servers, wget access
- → enabling "linked data"
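Because every DRS component comes from a controlled vocabulary, a path can be both generated and parsed mechanically, which is what makes the URIs uniform. This sketch follows only the component order listed on the slide; the full DRS specification defines further components (such as a version), which are omitted here, and the example values are illustrative.

```python
from typing import NamedTuple


class DRSId(NamedTuple):
    """DRS components in the order listed on the slide (simplified, no version)."""
    activity: str
    product: str
    institute: str
    model: str
    experiment: str
    frequency: str
    realm: str
    variable: str
    ensemble: str


def to_path(drs: DRSId) -> str:
    """Join the components into a DRS-style path."""
    return "/".join(drs)


def from_path(path: str) -> DRSId:
    """Split a DRS-style path back into its named components."""
    parts = path.strip("/").split("/")
    if len(parts) != len(DRSId._fields):
        raise ValueError(f"expected {len(DRSId._fields)} components, got {len(parts)}")
    return DRSId(*parts)


example = DRSId("cmip5", "output1", "MPI-M", "MPI-ESM-LR", "historical",
                "mon", "atmos", "tas", "r1i1p1")
path = to_path(example)
print(path)  # cmip5/output1/MPI-M/MPI-ESM-LR/historical/mon/atmos/tas/r1i1p1
print(from_path(path).variable)  # tas
```

Prefixing such a path with a data server's base URL yields a predictable address for each dataset, so scripts (e.g. wget) can fetch data without a separate lookup.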

B) Metadata standards, IT principles
(II) Data Preservation Level: WDCC metadata concept
- CERA2 DB schema, OWL conceptual model
- Tape archive, search API, QC, DOI assignment, ...
- Access: CERA GUI, IS-ENES portal, ...; exchange via OAI-PMH, ISO, ...
- Design goals: scalability, sustainability, common CVs, flexibility, user-friendly GUIs
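OAI-PMH, listed above as one exchange channel, is an HTTP protocol in which a harvester sends a request such as `verb=ListRecords&metadataPrefix=oai_dc` and receives XML records. The sketch below builds such a request and parses a response offline; the base URL and the sample record are hypothetical, not the real WDCC endpoint, and resumption tokens (used for paging) are left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text: str) -> list[dict]:
    """Extract identifier and Dublin Core title from each record in a response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
        })
    return records


# Hypothetical response, shaped like an OAI-PMH ListRecords answer
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:exp-001</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example climate experiment</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://oai.example.org/provider"))
print(parse_records(SAMPLE))
```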

B) Metadata standards, IT principles
(III) Model Description Level: Metafor FP7 project: Common Information Model (CIM)
- Formal metadata model of the climate modelling process
- It includes descriptions of the experiments being undertaken, the simulations being run in support of these experiments, the software models and tools used to implement the simulations, and the data generated by the software.
- CMIP5 use case: CV collection, CMIP5 questionnaire

Metafor CIM overview
- CONCIM (UML) → automatic translation → APPCIM (XSD), building on ISO and the Geography Markup Language (GML) series
- CIM instances (interlinked XML files)
- Used by: CMIP5 portal(s), IS-ENES portal, Metafor catalogue

Metadata collection

Automatic XML → RDF translation into ESG OWL instances, used by the CMIP5 gateway(s) and the IS-ENES portal (IS-ENES: InfraStructure for the European Network for Earth System Modelling)
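The automatic XML → RDF step can be pictured as a mapping from XML elements to subject/predicate/object triples. The sketch below is a simplified illustration only: the XML element names and the `http://example.org/isenes/` namespace are hypothetical, not the CIM schema or the real ESG/IS-ENES ontology, and the literals are taken from the triple-store log shown later in this deck.

```python
import xml.etree.ElementTree as ET

BASE = "http://example.org/isenes/"  # hypothetical namespace
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal (escaping backslashes and quotes)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def xml_to_triples(xml_text: str) -> list[tuple[str, str, str]]:
    """Map each <component id=...> to triples: child text -> literal, child ref -> resource."""
    root = ET.fromstring(xml_text)
    triples = []
    for comp in root.iter("component"):
        subj = f"<{BASE}{comp.get('id')}>"
        triples.append((subj, RDF_TYPE, f"<{BASE}ComponentModel>"))
        for child in comp:
            if child.tag == "component":  # nested components get their own subject
                continue
            pred = f"<{BASE}{child.tag}>"
            ref = child.get("ref")
            obj = f"<{BASE}{ref}>" if ref else literal(child.text or "")
            triples.append((subj, pred, obj))
    return triples


SAMPLE = """<components>
  <component id="echam">
    <title>ECHAM</title>
    <comment>Global circulation model</comment>
    <responsible ref="dkrz"/>
  </component>
</components>"""

for s, p, o in xml_to_triples(SAMPLE):
    print(s, p, o, ".")  # one N-Triples statement per line
```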

(CON)CIM overview: main packages
- Activity: simulations in support of experiments
- Software: hierarchical model components, coupled together
- Data
- Grids
- Quality
- Shared
- ISO
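To make the Activity and Software packages concrete, here is a minimal data-structure sketch: a simulation refers to an experiment and to a model built from nested, coupled components. The class and field names are hypothetical stand-ins chosen for this example, not the actual CONCIM/APPCIM types.

```python
from dataclasses import dataclass, field


@dataclass
class SoftwareComponent:
    """Hypothetical stand-in for a CIM software component; components nest hierarchically."""
    name: str
    children: list["SoftwareComponent"] = field(default_factory=list)
    couplings: list[tuple[str, str]] = field(default_factory=list)  # (source, target) names

    def walk(self, depth: int = 0):
        """Yield (depth, name) for this component and all sub-components, depth-first."""
        yield depth, self.name
        for child in self.children:
            yield from child.walk(depth + 1)


@dataclass
class Simulation:
    """Hypothetical stand-in for a CIM activity: a simulation run in support of an experiment."""
    name: str
    experiment: str
    model: SoftwareComponent


model = SoftwareComponent(
    "coupled model",
    children=[
        SoftwareComponent("atmosphere", [SoftwareComponent("radiation")]),
        SoftwareComponent("ocean"),
    ],
    couplings=[("atmosphere", "ocean")],
)
sim = Simulation("run-1", "historical", model)

for depth, name in sim.model.walk():
    print("  " * depth + name)
```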

B) Metadata standards, IT principles
(IV) Semantic Annotation Level
- Inputs translated to RDF: CIM XML, data object XML, community content from a content management system
- Storage: RDF triple store; evolving OWL model (OWL ontologies); relational DB
- Portal(s): ESG gateways, IS-ENES portal

CMIP5 Quality Control
- Files: data in the prescribed DRS syntax, with metadata (MD) → THREDDS Data Server (MD on data)
- CIM metadata: Metafor / CIM questionnaire (MD on model + simulation)
- Data quality checks (L2) and metadata quality checks (L2) → QC DB (quality MD)
- Metadata repository: data MD, information MD

CMIP5 STD-DOI Publication
- Data node: THREDDS Data Server (MD on data), QC DB (quality MD), data on the filesystem
- Long-term archive: data quality checks L3 (double checks, cross-checks); STD-DOI catalogue (data MD, information MD)
- WDCC as DOI publication agent, TIB as DOI registration agency
- DOI target page: access to data and metadata, including Metafor / CIM MD on model + simulation + data + quality
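The publication chain above implies a gate: a DOI target page combines data, model/simulation, and quality metadata, so it only makes sense once all of them exist. The sketch below encodes that as a hypothetical rule (L3 checks passed and CIM metadata present); the function name, record layout, and dataset ID are illustrative assumptions, not the WDCC or TIB interfaces.

```python
from enum import IntEnum
from typing import Optional


class QCLevel(IntEnum):
    """Quality-control levels; only L2 and L3 are named on the slides."""
    NONE = 0
    L2 = 2  # data and metadata quality checks
    L3 = 3  # double checks and cross-checks in the long-term archive


def build_doi_target_record(dataset_id: str, qc_level: QCLevel,
                            data_md: dict, cim_md: Optional[dict]) -> dict:
    """Assemble DOI target-page content, refusing datasets that are not ready (hypothetical rule)."""
    if qc_level < QCLevel.L3:
        raise ValueError(f"{dataset_id}: QC level {qc_level.name} is below L3, no DOI yet")
    if not cim_md:
        raise ValueError(f"{dataset_id}: CIM model/simulation metadata missing")
    return {
        "dataset": dataset_id,
        "data": data_md,
        "model_simulation": cim_md,
        "quality": qc_level.name,
    }


record = build_doi_target_record(
    "cmip5.output1.MPI-M.MPI-ESM-LR.historical",  # illustrative dataset ID
    QCLevel.L3,
    data_md={"variables": ["tas"]},
    cim_md={"model": "MPI-ESM-LR", "experiment": "historical"},
)
print(record["quality"])  # L3
```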

IS-ENES Info Portal

:49:13 INFO triplestorefill.utility Adding item with ID echam at …
:49:13 INFO triplestorefill.sesameconnector Storing RDF... (1118 byte)
:49:13 INFO triplestorefill.sesameconnector RDF
isenes:… .
isenes:echam rdf:type isenes:ComponentModel .
isenes:echam foaf:page … .
… foaf:topic isenes:echam .
isenes:echam dc:title "ECHAM" .
isenes:echam rdfs:label "ECHAM" .
isenes:echam rdfs:comment "Global circulation model" .
isenes:dkrz isenes:isResponsibleFor isenes:echam .
isenes:echam isenes:hasResponsible isenes:dkrz .
isenes:joachim-biercamp rdfs:label "Joachim Biercamp" .
isenes:joachim-biercamp rdf:type foaf:Person .
isenes:dkrz rdfs:label "DKRZ" .
isenes:dkrz rdf:type foaf:Organization .
isenes:joachim-biercamp isenes:isMemberOf isenes:dkrz .
isenes:dkrz isenes:hasMember isenes:joachim-biercamp .
isenes:dkrz dc:title "DKRZ" .
isenes:joachim-biercamp foaf:mbox … .
"save" → Triple Store
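The log stores each relation in both directions (isResponsibleFor/hasResponsible, isMemberOf/hasMember), so a portal can follow links from either end. The log does not show whether the loader wrote both triples explicitly or derived one from the other. Below is a hedged sketch of how a loader could materialize the inverses itself; an OWL reasoner could instead infer them from `owl:inverseOf` declarations. The function name is hypothetical; the predicate names are copied from the log.

```python
# Inverse predicate pairs as they appear in the log above
INVERSES = {
    "isenes:isResponsibleFor": "isenes:hasResponsible",
    "isenes:isMemberOf": "isenes:hasMember",
}
# Make the mapping symmetric so either direction can be completed
INVERSES.update({v: k for k, v in list(INVERSES.items())})


def materialize_inverses(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Return the input triples plus the inverse of every triple whose predicate has one."""
    closed = set(triples)
    for s, p, o in triples:
        inverse = INVERSES.get(p)
        if inverse:
            closed.add((o, inverse, s))
    return closed


store = {
    ("isenes:dkrz", "isenes:isResponsibleFor", "isenes:echam"),
    ("isenes:joachim-biercamp", "isenes:isMemberOf", "isenes:dkrz"),
}
closed = materialize_inverses(store)
print(("isenes:echam", "isenes:hasResponsible", "isenes:dkrz") in closed)  # True
print(len(closed))  # 4
```

Materializing on write keeps portal queries simple (no reasoning at query time), at the cost of storing every relation twice.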

(B) From a user's perspective [Figure: Plone page with a "related info" portlet]

(B) From a user's perspective [Figure: Plone page after clicking a "related" link: faceted search]

Summary
- The international CMIP5 / IPCC effort is the key driver for collecting and standardizing CVs, metadata, and conceptual models (ontologies)
- Metadata are mainly used for model intercomparison, uniform data search / access, and data processing
- Next: prepare for climate impact community use cases!

Workshop reminder: we would like the presenters to focus on a few points, allowing all of us to draw conclusions at the end:
- Usage and quality of descriptive keyword-type metadata used in your domain to manage data
- Types of usage of this metadata (management, retrieval, research statistics, machine processing, etc.)
- The standards used for your metadata descriptions (structure, elements, vocabularies)
- Adherence to common IT principles (explicit syntax, registered semantics, use of PIDs, etc.)
- Compliance with the recommendations in the report of the e-IRG task force on Data Management

Methodology to create CMIP5 CIM instances

Motivation: a virtual Earth System Modelling Resource Centre (portal)
- Producers: providers of models, tools, model results, HPC ecosystem, grid, ..., community
- Consumers: ENES community, impact community
- E-infrastructure components: ticketing, collaboration, metadata (CIM, ...), protocols, APIs, AAI, CMIP5/AR5/+ data services
- Governance: agreements, commitments, sociology, ...

IS-ENES vERC Portal (requirement: e-infrastructure component; technology used)
(A) Community info presentation (models, tools, descriptions, ...): content management system (CMS, collaboration tool); Plone + IS-ENES "content types"
(B) Community development support: project management / ticketing tool; Redmine
(C) Data portal to AR5 archives: web framework; Zope/Plone plugin(s)
(D) CIM metadata (external): Metafor service(s) (external); ESG gateway
(E) External content / metadata collection: web service (proxies), info (XML) harvester; Python info collector using Atom, OAI-PMH, ... protocols
(F) Additional value provisioning ("cross-selling"): semantic interlinking; RDF triple store (Sesame)
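Component (E) collects external content with a Python collector speaking Atom, OAI-PMH, and similar protocols. The collector's code is not shown in the slides; the sketch below only illustrates the Atom side with the standard library. The feed content, URLs, and function name are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def parse_atom_entries(feed_xml: str) -> list[dict]:
    """Extract id, title and first link href from each entry of an Atom feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM):
        link = entry.find("atom:link", ATOM)
        entries.append({
            "id": entry.findtext("atom:id", namespaces=ATOM),
            "title": entry.findtext("atom:title", namespaces=ATOM),
            "link": link.get("href") if link is not None else None,
        })
    return entries


# Hypothetical feed, as an external site might publish it
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example model news</title>
  <entry>
    <id>urn:example:echam</id>
    <title>ECHAM</title>
    <link href="https://example.org/models/echam"/>
  </entry>
</feed>"""

print(parse_atom_entries(FEED))
```

Each harvested entry could then be handed to the XML → RDF step and stored in the triple store, where it becomes available to the "related info" portlet.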