Sarah Callaghan: British Atmospheric Data Centre
Sylvia Murphy: NOAA/CIRES
CISL Seminar, August 29th, 2011, National Center for Atmospheric Research, Boulder, CO
CMIP5 Metadata and the Metafor project
Sarah Callaghan (Metafor project manager) firstname.lastname@example.org
With many thanks particularly to, but not limited to: V. Balaji, Philip Bentley, Cecelia DeLuca, Sebastien Denvil, Gerry Devine, Mark Elkington, Rupert W. Ford, Eric Guilyardi, Michael Lautenschlager, Bryan Lawrence, Mark Morgan, Marie-Pierre Moine, Sylvia Murphy, Charlotte Pascoe, Hans Ramthun, Paul Slavin, Lois Steenman-Clark, Frank Toussaint, Allyn Treshansky, and Sophie Valcke, and many other colleagues from the Global Organisation for Earth System Science Portals, Earth System Curator, and the Earth System Grid Federation
CMIP5: Fifth Coupled Model Intercomparison Project
Global community activity under the auspices of the World Meteorological Organisation (WMO) via the World Climate Research Programme (WCRP)
Aim: to address outstanding scientific questions that arose as part of the IPCC AR4 process, improve understanding of climate, and provide estimates of future climate change that will be useful to those considering its possible consequences.
Method: a standard set of model simulations in order to:
– evaluate how realistic the models are in simulating the recent past,
– provide projections of future climate change on two time scales, near term (out to about 2035) and long term (out to 2100 and beyond), and
– understand some of the factors responsible for differences in model projections, including quantifying some key feedbacks such as those involving clouds and the carbon cycle
Climate model and experiment documentation
What is it?
– List of climate model properties
– Whys and wherefores of simulations
– Conformance to experimental protocol
– Standard to describe and compare within a Model Intercomparison Project
– aka “metadata”: data describing data
What for?
– Archive, locate, assess, make sense of climate model data
CMIP5 numbers!
Simulations:
– ~90,000 years
– ~60 experiments within CMIP5
– ~20 modelling centres (from around the world), each using several model configurations
– ~2 million output “atomic” datasets
– ~10s of petabytes of output
– ~2 petabytes of CMIP5 requested output
– ~1 petabyte of CMIP5 “replicated” output, which will be replicated at a number of sites (including ours), arriving now!
Of the replicants:
– ~220 TB decadal
– ~540 TB long term
– ~220 TB atmos-only
– ~80 TB of 3-hourly data
– ~215 TB of ocean 3D monthly data!
– ~250 TB for the cloud feedbacks!
– ~10 TB of land-biochemistry (from the long term experiments alone).
(May 2011: all these data output volumes are probably a factor of 2 too low!)
Why the focus on metadata in CMIP5?
From “Data Storage and Distribution: Lessons from the CMIP3”, Karl Taylor, 2009
http://wcrp.ipsl.jussieu.fr/Workshops/Downscaling/Documents/Presentations/Taylor_CMIP3_lessons1.pdf
How can the process be improved?
– Ingest model documentation and experiment details into a searchable database
Summary of lessons learned in previous MIPs
– Require some model documentation prior to accepting model output for distribution.
The CMIP5 questionnaire
Metafor's original objective was: “... to develop a Common Information Model (CIM) to describe climate data and the models that produce it in a standard way, and to ensure the wide adoption of the CIM”
A year into the project, Metafor became “a major international focal point for earth system modelling metadata definition” (Karl Taylor, PCMDI)
Metafor was tasked by WGCM/CMIP to define, collect and provide the CMIP5 model metadata
This is when life really started to get interesting!
What is the CIM?
The CIM (Common Information Model) is a domain model of the concepts and relationships used in climate modeling
– It includes descriptions not only of climate data, but also of the models that generated and/or used that data, the simulations that those models implemented, the experiments for which those simulations were run, the people and institutions that were involved, and why they bothered
– It tries to describe the full provenance of climate modeling artifacts
It's a metadata model that can be paired with climate modeling artifacts
It's an emerging standard
It's the core of a related set of tools and services
It's the structure around which the CMIP5 metadata is based
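To make the idea of a domain model concrete, here is a minimal Python sketch of the concepts listed above (models, simulations, experiments, responsible parties) and how they link together to give provenance. All class names, field names and example values are hypothetical illustrations; they are not the actual CIM schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResponsibleParty:
    name: str
    role: str  # illustrative role label, e.g. "contact"


@dataclass
class Experiment:
    name: str
    rationale: str  # why the experiment is being run


@dataclass
class ModelComponent:
    name: str
    children: List["ModelComponent"] = field(default_factory=list)


@dataclass
class Simulation:
    name: str
    model: ModelComponent
    experiment: Experiment
    parties: List[ResponsibleParty] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)  # dataset identifiers


def provenance(sim: Simulation) -> str:
    """Summarise which model ran which experiment, and who was responsible."""
    who = ", ".join(f"{p.name} ({p.role})" for p in sim.parties) or "unknown"
    return (f"{sim.name}: model={sim.model.name}, "
            f"experiment={sim.experiment.name}, parties={who}")


# Hypothetical example values, for illustration only.
model = ModelComponent("example-esm", children=[ModelComponent("atmosphere")])
experiment = Experiment("historical", "simulate the recent past")
sim = Simulation("run-001", model, experiment,
                 parties=[ResponsibleParty("Example Centre", "contact")])
print(provenance(sim))
```

The point of the sketch is only that each dataset can be traced back through a simulation to a model, an experiment and the people involved.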
Controlled vocabularies and how they were created
Controlled vocabularies The CIM provides the structure for the questionnaire, while the controlled vocabularies provide the content. The controlled vocabularies can be customised for other domains, allowing the CMIP5 questionnaire to be reused for those domains.
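As a rough illustration of the split between structure and content, the Python sketch below checks questionnaire answers against a controlled vocabulary. The property names and terms are invented for the example and are not taken from the real CMIP5 vocabularies.

```python
from typing import Dict, List, Set

# Hypothetical vocabulary: property name -> allowed terms.
VOCABULARY: Dict[str, Set[str]] = {
    "ocean.vertical_coordinate": {"z-coordinate", "sigma", "isopycnal", "hybrid"},
    "atmosphere.dynamical_core": {"spectral", "finite volume", "finite difference"},
}


def validate_answers(answers: Dict[str, str],
                     vocabulary: Dict[str, Set[str]]) -> List[str]:
    """Return one error message per answer that is not a controlled term."""
    errors = []
    for prop, value in sorted(answers.items()):
        allowed = vocabulary.get(prop)
        if allowed is None:
            errors.append(f"{prop}: unknown property")
        elif value not in allowed:
            errors.append(f"{prop}: '{value}' is not a controlled term")
    return errors


good = validate_answers({"ocean.vertical_coordinate": "sigma"}, VOCABULARY)
bad = validate_answers({"ocean.vertical_coordinate": "wavy"}, VOCABULARY)
print(good, bad)
```

Because the checking code never mentions a specific term, passing a different vocabulary dictionary is all it would take to reuse it for another domain.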
Completing the questionnaire Suggested order for filling in the questionnaire
CMIP5 Users
24 modelling groups, 25 platforms being described, 44 models, 65 grids, and 223 simulations
CAWCR - Centre for Australian Weather and Climate Research
CCCMA - Canadian Centre for Climate Modelling and Analysis
CCSM - Community Climate System Model
CMA-BCC - Beijing Climate Center, China Meteorological Administration
CMCC - Centro Euro-Mediterraneo per i Cambiamenti Climatici
CNRM-CERFACS - Centre National de Recherches Meteorologiques - Centre Europeen de Recherche et Formation Avancees en Calcul Scientifique
EC-Earth - Europe
FIO - The First Institute of Oceanography, SOA, China
GCESS - College of Global Change and Earth System Science, Beijing Normal University
GFDL - Geophysical Fluid Dynamics Laboratory
INM - Russian Institute for Numerical Mathematics
IPSL - Institut Pierre Simon Laplace
LASG - Institute of Atmospheric Physics, Chinese Academy of Sciences, China
MIROC - University of Tokyo, National Institute for Environmental Studies, and Japan Agency for Marine-Earth Science and Technology
MOHC - UK Met Office Hadley Centre
MPI-M - Max Planck Institute for Meteorology
MRI - Japanese Meteorological Institute
NASA GISS - NASA Goddard Institute for Space Studies, USA
NCAR - US National Center for Atmospheric Research
NCAS - UK National Centre for Atmospheric Science
NCC - Norwegian Climate Centre
NIMR - Korean National Institute for Meteorological Research
QCCCE-CSIRO - Queensland Climate Change Centre of Excellence and Commonwealth Scientific and Industrial Research Organisation
RSMAS - University of Miami - RSMAS
Getting help with the questionnaire
Contact the questionnaire help team
– email@example.com
– We want to improve the questionnaire, so please tell us how you are getting on and what you would like to change.
Book an online training session for your team
More help documentation is available in the questionnaire
– soon this will include help videos
What other things do Metafor and the CIM do?
CIM Viewer – given an ID, display a document
CIM Query tool – given a query, return a result set
CIM Differencing tool
CIM Document tracking
CIM Document validator
Aim: compose these services into portals which navigate through these options in “user-community-friendly” ways. The services and tools should be usable both within institutional portals and through the Metafor portal.
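The two simplest services above (view by ID, query for a result set) can be sketched in a few lines of Python. This is only an illustration over a hypothetical in-memory store; the document IDs, field names and function names are assumptions and say nothing about how the Metafor services are actually implemented.

```python
from typing import Dict, List, Optional

Document = Dict[str, str]

# Hypothetical in-memory document store keyed by document ID.
STORE: Dict[str, Document] = {
    "doc-1": {"type": "model", "name": "example-esm", "institute": "Example Centre"},
    "doc-2": {"type": "simulation", "name": "run-001", "institute": "Example Centre"},
    "doc-3": {"type": "model", "name": "other-esm", "institute": "Other Centre"},
}


def view(doc_id: str, store: Dict[str, Document] = STORE) -> Optional[Document]:
    """Viewer: given an ID, return the document (or None if it is unknown)."""
    return store.get(doc_id)


def query(criteria: Dict[str, str],
          store: Dict[str, Document] = STORE) -> List[str]:
    """Query tool: return the IDs of documents matching every criterion."""
    return sorted(
        doc_id for doc_id, doc in store.items()
        if all(doc.get(key) == value for key, value in criteria.items())
    )


print(view("doc-2"))
print(query({"type": "model"}))
```

Keeping each service this small is what makes it plausible to compose them inside different portals.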
Differencing Tool
CIM Differencing differs from other types of feature comparison tools because...
– There will be several variants of comparison, depending on the type of CIM instances being compared and the type of information being requested.
– The set of features being compared is potentially an order of magnitude or more larger than that typically found in online catalogs (hundreds vs. tens).
– CIM instances have a very rich structure to draw on. Sometimes this helps; other times it is a hindrance.
So...
– Only small, focused comparisons across the same document type will be supported
– Users should be able to constrain how the results are presented to them in real time
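A minimal Python sketch of the two constraints above: comparisons are refused across document types, and the caller can narrow the comparison to chosen parts of the structure. The documents are plain nested dictionaries with made-up fields; this is an assumption for illustration, not the design of the actual differencing tool.

```python
from typing import Any, Dict, Iterable, Optional, Tuple


def flatten(doc: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn a nested document into {"a.b.c": value} pairs."""
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff(a: Dict[str, Any], b: Dict[str, Any],
         only: Optional[Iterable[str]] = None) -> Dict[str, Tuple[Any, Any]]:
    """Return {path: (value_in_a, value_in_b)} for every differing path."""
    if a.get("type") != b.get("type"):
        raise ValueError("only documents of the same type can be compared")
    flat_a, flat_b = flatten(a), flatten(b)
    paths = set(flat_a) | set(flat_b)
    if only is not None:
        prefixes = tuple(only)  # simple prefix match on the dotted path
        paths = {p for p in paths if p.startswith(prefixes)}
    return {p: (flat_a.get(p), flat_b.get(p))
            for p in sorted(paths) if flat_a.get(p) != flat_b.get(p)}


# Hypothetical documents, for illustration only.
model_a = {"type": "model", "name": "A", "ocean": {"grid": "tripolar", "levels": 40}}
model_b = {"type": "model", "name": "B", "ocean": {"grid": "tripolar", "levels": 50}}
print(diff(model_a, model_b))
print(diff(model_a, model_b, only=["ocean."]))
```

Flattening the structure first is one simple way to cope with the rich nesting; it discards ordering and hierarchy, which is a trade-off a real tool might not accept.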
What can be done in the future
Questionnaire pages are totally driven from the mindmaps:
– Update mindmaps, change content.
– New mindmaps, new questions!
Seeing other applications (beyond CMIP5) with:
– Statistical Downscaling (beyond Metafor)
– Ensembles (collecting EU Ensembles metadata)
– Extending for Impact Assessment Models (UK MIRP project)
– Other extensions: possible new US activity on dynamical cores
Software Improvements:
– Deployment within national infrastructure
– Metafor portal, based on “cleaner” ingestion of CIM documents, and a RESTful web service layer providing search, validation etc.
– Integration with ESGF
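The "update mindmaps, change content" idea can be sketched as follows: questions are generated by walking an outline, so editing the outline is the only step needed to change the questionnaire. The nested dictionary below merely stands in for a mindmap; the real mindmap file format and the questionnaire's generation code are not described in this talk, so everything here is an assumption.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical outline standing in for a mindmap:
# branches are dicts, leaves are lists of allowed answers.
MINDMAP: Dict[str, Any] = {
    "Ocean": {
        "Vertical coordinate": ["z-coordinate", "sigma", "isopycnal"],
        "Mixing": {"Scheme": ["constant", "Richardson-number dependent"]},
    }
}


def questions(outline: Dict[str, Any], path: Tuple[str, ...] = ()) -> List[str]:
    """Walk the outline and emit one question per leaf."""
    result: List[str] = []
    for name, child in outline.items():
        here = path + (name,)
        if isinstance(child, dict):
            result.extend(questions(child, here))
        else:
            result.append(f"{' > '.join(here)}? [{' | '.join(child)}]")
    return result


for line in questions(MINDMAP):
    print(line)
```

Adding a branch to `MINDMAP` adds questions without touching `questions()`, which mirrors the "new mindmaps, new questions" point.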
Life after Metafor
Metafor formally finishes in September 2011
The CIM tools and services will be handed over to IS-ENES to develop and maintain
– We want to develop a community and ecosystem for the tools to flourish
Community governance for CIM and Controlled Vocabularies:
– “Standards Committee” under WCRP/CMIP
CIM use beyond CMIP5:
– statistical downscaling CV
– CMIP5 metrics
– library to ingrain CIM generation within GCMs
Earth System Curator and Model Metadata Discovery and Display for CMIP5
Sylvia Murphy and Cecelia DeLuca (NOAA/CIRES)
CISL Seminar, Boulder, CO, August 29, 2011
Outline
Background (CIM within ESG):
– What is Curator?
– What is the Earth System Grid (ESG)?
– Metadata and the Curator Project
– Trackback display features
Background (CIM within ESMF):
– The Earth System Modeling Framework (ESMF)
– How ESMF is implementing the CIM
Live Demonstration of CMIP5 model metadata in ESG
What is Curator?
The Curator project collaboratively develops software infrastructure to support end-to-end modeling in the Earth sciences.
– Funded initially by NSF in 2005
– Now supported by NASA, NOAA GIP, and NSF CDI and TeraGrid funds
Curator collaborates with many groups across the U.S. and internationally
– ESG, the NOAA Geophysical Fluid Dynamics Laboratory (GFDL), the DOE Program for Climate Model Diagnosis and Intercomparison (PCMDI), METAFOR, and many others.
Project Objectives:
– Span the gaps between modeling and data services.
– Use metadata to document models.
– Automate routine processes with workflow software.
– Develop software infrastructure that can facilitate the governance of community software projects within the Earth sciences.
One important focus is preparing for CMIP5. Curator’s role in CMIP5 is to serve as a liaison between METAFOR and the Earth System Grid (ESG) and to implement the display of CMIP5 metadata in the ESG Gateway.
What is the Earth System Grid (ESG)?
The Earth System Grid (ESG) is a network of nodes for federated data access and related services that supports research on Earth’s climate and its impacts.
Goals
– Make data more useful for researchers and policy makers.
– Meet the needs of international climate projects for distributed databases, data access and data movement.
– Provide a universal, Web-based data access portal for multi-model, observational, and reanalysis data collections.
– Provide a wide range of climate data-analysis tools and diagnostic methods to international and U.S. climate centers.
The Earth System Grid - Center for Enabling Technologies (ESG-CET) is funded by the U.S. Department of Energy as part of the SciDAC (Scientific Discovery through Advanced Computing) program.
Content courtesy of Dean Williams (PCMDI) and Don Middleton (NCAR), from the ESG website and “Cyberinfrastructure and the Global Environmental Data Challenge”, Feb 2011, e-Science Institute, Edinburgh
ESG’s Federated Architecture
Image courtesy of Luca Cinquini (NASA/NOAA) and used with the permission of Don Middleton (NCAR), from “Cyberinfrastructure and the Global Environmental Data Challenge”, Feb 2011, e-Science Institute, Edinburgh
Display Features: Tabs and Component Trees Curator display in ESG showing metadata from a CMIP5 run.
The Earth System Modeling Framework (ESMF)
ESMF is high-performance software infrastructure that is used by a broad spectrum of weather, climate, and related models. It enables models to be organized as sets of components representing physical domains and processes, such as atmospheres, oceans, and land masses. The components can be reused in different contexts and shared by multiple research and operational centers. ESMF also provides toolkits for common modeling functions, so modelers don't need to develop those utilities independently. One of these utilities is an Attribute class that can be used to make models self-describing. It represents metadata as name-value pairs, organized in packages that reflect current community standards (ISO, Climate and Forecast, CIM).
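The idea of name-value pairs grouped into packages can be illustrated with a small Python sketch. This is not the ESMF Attribute API; the class, its methods, and the attribute names and values are hypothetical and chosen only to show the shape of the data.

```python
from typing import Dict, List, Optional


class AttributePackage:
    """Name-value pairs grouped under a convention and a purpose (illustrative)."""

    def __init__(self, convention: str, purpose: str) -> None:
        self.convention = convention
        self.purpose = purpose
        self.values: Dict[str, str] = {}
        self.nested: List["AttributePackage"] = []

    def set(self, name: str, value: str) -> None:
        self.values[name] = value

    def nest(self, package: "AttributePackage") -> None:
        self.nested.append(package)

    def get(self, name: str) -> Optional[str]:
        """Look in this package first, then depth-first in nested packages."""
        if name in self.values:
            return self.values[name]
        for package in self.nested:
            found = package.get(name)
            if found is not None:
                return found
        return None


# Hypothetical attribute names and values, for illustration only.
component = AttributePackage("CIM", "model component")
component.set("ShortName", "example-atmosphere")
citation = AttributePackage("ISO", "citation")
citation.set("Title", "Example model description")
component.nest(citation)
print(component.get("ShortName"), "|", component.get("Title"))
```

A model that carries such packages alongside its components can answer "what are you?" without any external documentation, which is what "self-describing" means here.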
ESMF and the CIM
ESMF is implementing the CIM as a series of nestable Attribute Packages that can then be exported at model initialization as CIM XML. This work is ongoing, but as of ESMF release 5.2.0r, which was released in July 2011, ESMF supports:
– General component description
– Simulation properties
– Responsible parties
– ISO citations
– Platform descriptions
– Couplings/Inputs
– Field descriptions
– Custom attributes
These features are currently being implemented in the Community Earth System Model (CESM).
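As a rough picture of what exporting nested packages to XML involves, the Python sketch below serialises a nested dictionary with the standard library's `xml.etree.ElementTree`. The element and attribute names (`package`, `attribute`, `name`) are placeholders invented for this example; they are not the CIM XML schema, and this is not how ESMF itself writes its output.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def to_xml(package: Dict[str, Any]) -> ET.Element:
    """Recursively convert a nested package description into XML elements."""
    elem = ET.Element("package", {"name": package["name"]})
    for name, value in package.get("attributes", {}).items():
        child = ET.SubElement(elem, "attribute", {"name": name})
        child.text = str(value)
    for nested in package.get("nested", []):
        elem.append(to_xml(nested))
    return elem


# Hypothetical component description with one nested platform package.
description = {
    "name": "component",
    "attributes": {"ShortName": "example-esm"},
    "nested": [
        {"name": "platform", "attributes": {"MachineName": "example-hpc"}},
    ],
}

root = to_xml(description)
print(ET.tostring(root, encoding="unicode"))
```

Because nesting in the data maps directly onto nesting in the XML, the export step stays a simple recursive walk.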
Automated Metadata Workflows in CESM
The XML file that CESM generates using ESMF can be ingested into ESG automatically, since it is in the same format as the metadata generated by the CMIP5 questionnaire.
The advantage of having the model generate metadata is that it can be generated and customized more easily.
Example output at right (screenshot)
Future Work
Finalize display of CMIP5 model metadata
Explore more sustainable technologies for the CIM-to-display conversion
Leverage Curator metadata capabilities in other projects (e.g. a shared data analysis and visualization workspace, the National Climate Prediction and Projections Platform, a dynamical core workshop)
Explore a joint implementation of the CIM portal and Curator trackback interface in future versions of ESG
Live Demo of Curator Display in ESG View at: http://www.earthsystemgrid.org/home.htm