From lab books to computational Earth science. Chris Hill, MIT – Edinburgh, July 2007
Lab books. A lab notebook is a primary record of research. Researchers use a lab notebook to document their hypotheses, experiments and initial analysis or interpretation of these experiments. The notebook serves as an organizational tool, a memory aid, and can also have a role in protecting any intellectual property that comes from the research. The guidelines for lab notebooks vary widely between institutions and between individual labs, but some guidelines are fairly common. The lab notebook is usually written in as the experiments progress, rather than at a later date. Many say that a lab notebook should be thought of as a diary of activities that are described in sufficient detail to allow another scientist to follow the same steps. To ensure that data cannot be easily altered, notebooks with permanently bound pages are often recommended. Researchers are often encouraged to write only with unerasable pen, to sign and date each page, and to have their notebooks inspected periodically by another scientist who can read and understand them. All of these guidelines can be useful in proving exactly when a discovery was made, in the case of a patent dispute. Several companies now offer electronic lab notebooks. This format has gained some popularity, especially in large pharmaceutical companies, which have large numbers of researchers and great need to document their experiments. (Source: Wikipedia)
Lab books. Physical, chemical and biological scientists are taught lab-book discipline from an early age. –Reproducible results are the foundation of scientific and engineering disciplines, e.g. Michelson/Morley. –There is even an infamous Journal of Irreproducible Results. In computational science the lab-book discipline is not so ubiquitous, maybe because –a program is a formal statement of applied mathematical axioms –axioms are deterministic –therefore reproducibility is not an issue. –However, a program, i.e. a complex collection of simple elemental statements, is hard to comprehend. If details are not recorded, reproducibility may well be an issue.
Some example computational Earth science experiments. Aqua-planet. Eddying North Atlantic. Global ocean with eddies and seaice. IPCC
A simple GFD configuration. Water-covered planet, atmosphere-ocean-seaice (Jean-Michel Campin and David Ferreira). Some factors that affect the solution: –Initial conditions. –Atmosphere: clouds, radiation, dynamics, boundary layer, temporal and spatial discretization… –Seaice: thermodynamics, aging, stress-strain relation… –Ocean: dynamics, coordinate system, vertical/horizontal friction and mixing… –Coupling: time stepping, energetics. –External forcings: solar insolation, reference profiles.
An eddying, ocean-only configuration. Ocean-only, forced with atmospheric reanalysis for Jan-Mar. Red/blue shading: ocean heating/cooling. Cyan/magenta line: +/-17.5 O 200m. Streaks: wind stress. Green thickness: ocean mixed layer depth. Some factors that affect the solution: –Initial conditions. –Atmosphere fluxes: planetary boundary layer scheme. –Ocean: dynamics, coordinate system, vertical/horizontal friction and mixing… –Coupling: time stepping, energetics. –External forcings: solar insolation, reference profiles, atmospheric reanalysis. –Non-linear/turbulent flow, so bitwise reproducibility is subject to FP round-off, parallel reduction operations etc…
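The point about floating-point round-off and parallel reductions can be illustrated with a minimal Python sketch. It is not taken from any model code; the function names and the four input values are chosen purely for illustration. It shows that summing the same numbers serially and as per-chunk partial sums (the way a parallel reduction would) can give different answers.

```python
def naive_sum(values):
    """Accumulate left to right in double precision, as a serial loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, n_chunks):
    """Mimic a parallel reduction: one partial sum per chunk, then combine."""
    size = (len(values) + n_chunks - 1) // n_chunks
    partials = [naive_sum(values[i:i + size]) for i in range(0, len(values), size)]
    return naive_sum(partials)


# Floating-point addition is not associative.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

# Illustrative values with very different magnitudes.
values = [1e16, 1.0, -1e16, 1.0]
serial = naive_sum(values)         # one ordering of the additions
parallel = chunked_sum(values, 2)  # a different ordering of the same additions

print(left == right, serial, parallel)
```

In a turbulent flow such small differences grow, which is why a bitwise-reproducible rerun may need the processor count and reduction order on record as well.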
IPCC ocean ACC transports. Could I make this plot without too much difficulty? Yes. Could I rerun an IPCC scenario (possibly with some parameter change)? No. Diagnosing these results is possible today (PCMDI/ESG archives) for the broad scientific community. Rerunning experiments (with or without small changes) is still very hard. Factors affecting the solution range from bottom drag to land-surface formulation to emissions profiles. The models couple atmosphere, ocean, seaice, land, vegetation, chemistry etc…
Examples summary. To reproduce an experiment –a significant quantity of information needs to be stored –it spans broad big-picture information (water-covered planet, atmos+ocean+seaice) to minute details (bitwise reproducibility may require a record of compiler, OS etc…). Way forward: a hand record is neither practical nor ideal (i.e. not as potentially useful as an electronic record). Electronic information should be stored so as to be amenable to machine reasoning. –This requires defined vocabularies, precise formal structure, pattern matching, rules etc. –W3C/semantic web technologies: XML, RDF etc… In theory, using XML, RDF etc…, we could describe model systems and enable reruns for extra outputs (e.g. transport of S 3 by flow) or derived runs (e.g. modified air-sea coupling coefficient or formulation). In practice this is hard work!
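As a rough illustration of what an electronic, machine-readable record might look like, here is a minimal Python sketch that stores both big-picture and minute details of a run as JSON. The function name `run_record`, the key names and the parameter values are assumptions made for this example. They are not part of any schema from the talk.

```python
import json
import platform
import sys
from datetime import datetime, timezone


def run_record(experiment, parameters):
    """Build a dictionary describing one numerical experiment and its platform."""
    return {
        "experiment": experiment,          # big-picture description
        "parameters": parameters,          # model configuration choices
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],  # stand-in for compiler/toolchain version
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
    }


# Hypothetical values, for illustration only.
record = run_record(
    "aqua-planet",
    {"timestep_s": 1200, "components": ["atmos", "ocean", "seaice"]},
)
text = json.dumps(record, indent=2, sort_keys=True)
print(text)
```

A plain JSON dump like this has no defined vocabulary. The schema-based descriptions discussed later are what would make such a record amenable to machine reasoning.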
Baby steps toward a computational Earth science model repository. What is working today – PCMDI/ESG Steps toward future - ESC
PCMDI. Archive of all IPCC model outputs. Stored in a common format (netCDF with standard metadata). Stored on a common mesh. This simplifies things, but can and does degrade information and can even mislead (e.g. conservation in one coordinate system may be inexact in another). Very limited model metadata is held. Very successful and technically impressive; societal utility is a function of model quality! Schmittner et al. (2005, GRL)
Earth System Curator (ESC). Can we (for better or worse!) do for models what PCMDI does for datasets? PCMDI datasets are data wrapped in a common/standard container (netCDF). The PCMDI container is self-describing. This means we can query and even combine (to some degree) the PCMDI datasets. A container analogy for modeling technology is the component architecture supported by systems like ESMF.
Building a coupled model oriented solution – modeling system as a component tree Some mathematics – component M –no side-effects –possible persistent internal state Supports representation as DAG such that e.g
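The component properties listed on this slide (no side effects, possible persistent internal state, representation as a DAG) can be sketched in Python as follows. This is an illustrative toy. The class `Component`, the function `run_dag` and the three step functions are hypothetical names, and the arithmetic inside them has no physical meaning.

```python
from graphlib import TopologicalSorter


class Component:
    """A component owns its state and touches nothing outside itself."""

    def __init__(self, name, step, state=0.0):
        self.name = name
        self._step = step   # pure function: (state, inputs) -> (new_state, output)
        self.state = state  # persistent internal state

    def run(self, inputs):
        self.state, output = self._step(self.state, inputs)
        return output


def run_dag(components, depends_on):
    """Run each component once, in an order that respects the dependency DAG."""
    outputs = {}
    for name in TopologicalSorter(depends_on).static_order():
        inputs = {dep: outputs[dep] for dep in depends_on.get(name, ())}
        outputs[name] = components[name].run(inputs)
    return outputs


def atmos_step(state, inputs):
    new_state = state + 1.0
    return new_state, new_state


def ocean_step(state, inputs):
    new_state = state + 0.5 * inputs["atmos"]
    return new_state, new_state


def coupled_step(state, inputs):
    return state, inputs["atmos"] + inputs["ocean"]


components = {
    "atmos": Component("atmos", atmos_step),
    "ocean": Component("ocean", ocean_step),
    "coupled": Component("coupled", coupled_step),
}
depends_on = {"atmos": set(), "ocean": {"atmos"}, "coupled": {"atmos", "ocean"}}

first = run_dag(components, depends_on)
second = run_dag(components, depends_on)  # internal state persists between calls
print(first, second)
```

Because each component only reads its declared inputs and updates its own state, the execution order is fully determined by the graph.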
Example of an actual component tree. Tree of components from the GEOS-5 modeling system. Each box is an ESMF component. Components adhere to DAG semantics. Suarez et al.
Individual components in ESMF. ESC builds on an ESMF-like component model. –ESMF Component: container for a sequence of computation that implements a particular algorithm (physics simulation, e.g. Navier-Stokes solver, or technical function, e.g. history manager). An ESMF component exposes its external interfaces through an ESMF state. –ESMF State: container data type to transport data between components. –ESMF Field: container data type that can be used to push/pop n-dimensional data with an associated mesh from an ESMF State.
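The relationship between component, state and field can be sketched with a few plain Python classes. This is not the ESMF API. The classes `Field`, `State` and `ScaleComponent`, their methods and the field name `sst` are hypothetical stand-ins that only mirror the roles described on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    """n-dimensional data plus the mesh it lives on (here just a shape tuple)."""
    name: str
    mesh_shape: tuple
    data: list


@dataclass
class State:
    """Container that carries Fields between components."""
    fields: dict = field(default_factory=dict)

    def push(self, fld):
        self.fields[fld.name] = fld

    def pop(self, name):
        return self.fields.pop(name)


class ScaleComponent:
    """Toy component: reads one Field from its import State, writes one to its export State."""

    def __init__(self, factor):
        self.factor = factor

    def run(self, import_state, export_state):
        src = import_state.pop("sst")
        scaled = [self.factor * v for v in src.data]
        export_state.push(Field("sst_scaled", src.mesh_shape, scaled))


imp, exp = State(), State()
imp.push(Field("sst", (3,), [1.0, 2.0, 3.0]))
ScaleComponent(2.0).run(imp, exp)
print(exp.fields["sst_scaled"].data)
```

The component's only contact with the outside world is through the two states, which is what makes its interface describable.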
Given a component model, like the ESMF paradigm, ESC… Describes a component in terms of –parameters that control the computation sequence. –states and fields that are passed into/out of the component. Provides two levels of description –potential and specific. –Potential is a list of all possible parameters and fields. It is a virtualized description in that it is not describing a specific instance. –Specific is a description of an instantiated component in which parameters are bound to specific values and fields and states are bound to specific values.
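The two levels of description can be sketched as a pair of Python dataclasses, where a potential description lists what is possible and binding it produces a specific description. The class names, the `bind` method and the example parameter and field names are assumptions for illustration, not ESC definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecificDescription:
    """An instantiated component: parameters bound to values, fields selected."""
    name: str
    parameters: dict
    fields: frozenset


@dataclass(frozen=True)
class PotentialDescription:
    """All parameters and fields a component could have; no instance implied."""
    name: str
    parameters: frozenset
    fields: frozenset

    def bind(self, parameter_values, active_fields):
        # Reject anything the potential description does not allow.
        unknown_params = set(parameter_values) - self.parameters
        unknown_fields = set(active_fields) - self.fields
        if unknown_params or unknown_fields:
            unknown = sorted(unknown_params | unknown_fields)
            raise ValueError(f"not in potential description: {unknown}")
        return SpecificDescription(
            self.name, dict(parameter_values), frozenset(active_fields)
        )


# Hypothetical names and values.
ocean = PotentialDescription(
    "ocean",
    parameters=frozenset({"timestep", "viscosity"}),
    fields=frozenset({"sst", "windstress"}),
)
run = ocean.bind({"timestep": 1200}, {"sst"})
print(run)
```

Keeping the two levels separate lets a tool validate a specific run against what the component is able to do.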
ESC component descriptions are in terms of XML schema. Curator-NMM –Describes numerical model parameters e.g. timestep, system requirements. Gridspec –Describes numerical mesh. Curator-CIAO –Describes component inputs and outputs. Curator-complete –Describes wiring together of components. –A coupled component is also a component, i.e. the schema is recursive. Some details (more at …..
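The recursive idea (a coupled component is itself a component) can be sketched with a small XML fragment and a recursive walk in Python. The element and attribute names below (`component`, `parameter`, `input`, `output`, `name`, `field`, `value`) are invented for this example and are not the actual Curator schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical description; NOT the real Curator schema.
DOC = """
<component name="coupled">
  <parameter name="timestep" value="1200"/>
  <component name="atmos">
    <output field="windstress"/>
  </component>
  <component name="ocean">
    <input field="windstress"/>
  </component>
</component>
"""


def describe(elem, depth=0):
    """Return (depth, name) for a component and, recursively, its sub-components."""
    rows = [(depth, elem.get("name"))]
    for child in elem.findall("component"):  # direct children only
        rows.extend(describe(child, depth + 1))
    return rows


tree = describe(ET.fromstring(DOC.strip()))
for depth, name in tree:
    print("  " * depth + name)
```

Because the same element type nests inside itself, one function handles a single component and an arbitrarily deep coupled system alike.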
Curator-NMM The Curator-NMM schema describes model components, their content, and their connections. It is a superset of the NMM schema. The main constructs in the Curator-NMM schema are component, potential model, and model. Components are "composable" pieces of code that can be coupled together in various arrangements to form different models. A potential model consists of a group of components, and describes the set of possible models that can be built from those components. A model is a fully specified application based on a potential model and configuration choices.
Mosaic Grid Specification. The Mosaic Grid Specification is a standardized description of multi-patch, structured grids being developed in coordination with CF activities.
Mosaic Grid Specification
Component – component compatibility checking. ESC can describe coupled (multi-component) systems. In principle ESC could support recombination of components from coupled systems e.g. couple component A (atmosphere dynamics) with component B (land-surface). Ideally, for this, compatibility constraints need to be expressed in a standard way.
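One simple form such a compatibility constraint could take is a check that every field a consumer needs is offered by the provider with matching units. The Python sketch below assumes a description format (dictionaries keyed by field name with a `units` entry) and field names that are purely illustrative.

```python
def check_coupling(provider_outputs, consumer_inputs):
    """Return a list of problems; an empty list means the two descriptions match."""
    problems = []
    for name, needed in consumer_inputs.items():
        offered = provider_outputs.get(name)
        if offered is None:
            problems.append(f"missing field: {name}")
        elif offered["units"] != needed["units"]:
            problems.append(
                f"unit mismatch for {name}: {offered['units']} vs {needed['units']}"
            )
    return problems


# Hypothetical component descriptions.
atmos_outputs = {
    "precipitation": {"units": "kg m-2 s-1"},
    "air_temperature": {"units": "K"},
}
land_inputs = {
    "precipitation": {"units": "kg m-2 s-1"},
    "air_temperature": {"units": "degC"},
    "shortwave_down": {"units": "W m-2"},
}

problems = check_coupling(atmos_outputs, land_inputs)
print(problems)
```

A real check would also need to compare meshes, time stepping and conservation properties, which is where a standard vocabulary becomes essential.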
Service architectures Standards services –Developing standardized descriptions is a well-proven method toward a service oriented approach e.g.
Some useful (but an incomplete list of) URLs Component models Metadata & standards metadata/ /Main_Page
Summary. The Earth System Curator project is an activity developing schema and tools to capture semantic information about models. –Such information provides the basis for formally recording numerical experiments: a computational Earth science lab book. –It also provides the basis for a formal approach to reproducible numerical results: fewer Journal of Irreproducible Results candidates. Other efforts: SBML (systems biology), CML (chemistry) - already uploads to Science submissions. Maybe soon a computational Earth science challenge will become how to stop people doing dumb things with easy-to-use modeling services, rather than how to get people to use obtuse legacy modeling systems - maybe!