ECMWF WMO Metadata Workshop – Beijing Sep 2005 Experience with the WMO core metadata in the SIMDAT/VGISC project Baudouin Raoult ECMWF
The SIMDAT/VGISC project
- SIMDAT: EU-funded GRID project
  - 7 technologies: Grid infrastructure, Virtual Organisation, Ontologies, Analysis Services, Workflows, Distributed data access, Knowledge Services
  - 4 activities: Automotive, Aerospace, Pharmacy and Meteorology
- Meteorology activity: build a Virtual GISC (V-GISC)
  - DWD, UKMO, MétéoFrance, EUMETSAT, ECMWF

V-GISC infrastructure

V-GISC Conceptual view
- Through the Distributed Portal, users search for and retrieve data and subscribe to services, subject to authentication and authorization
- The Virtual Database Service provides a single view of the partners' databases

VGISC Distributed Architecture

VGISC Node Functional Design

Why do we need metadata (in this project)?
- Create a catalogue (discovery metadata)
  - Searchable (keyword, geographical location, time range)
  - Browsable (directory hierarchy)
- Implement the V-GISC (service metadata)
  - Describe where the data resides (physical location)
  - Describe how to request the data
  - Describe the data format (useful for offering a list of transformations, e.g. sub-sampling of gridded data, plots or format conversions)
  - Describe associated data policies

Study of the WMO core
- Starting point
  - XML files available on the WMO web site
  - XML files from an earlier DWD prototype
- Trying to describe the ECMWF archive (GRIB fields)

XML root element
- Namespaces are a nightmare to use (especially with XPath when there is a default namespace)

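The default-namespace problem can be reproduced in a few lines. This is a minimal sketch using Python's standard xml.etree.ElementTree; the namespace URI and the element names are placeholders chosen for illustration, not taken from the WMO core files.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the namespace URI and element names are illustrative only.
DOC = """<metadata xmlns="http://example.org/wmo-core">
  <identificationInfo>
    <title>Synop Heathrow</title>
  </identificationInfo>
</metadata>"""

root = ET.fromstring(DOC)

# An unqualified path finds nothing: every element lives in the default namespace.
unqualified = root.findall("identificationInfo/title")

# The same path works once each step is bound to a prefix of our choosing.
ns = {"w": "http://example.org/wmo-core"}
qualified = root.findall("w:identificationInfo/w:title", ns)

titles = [e.text for e in qualified]
print(len(unqualified), titles)
```

Every step of the path has to carry the prefix, which is why expressions written for one root element or namespace declaration tend not to carry over to documents that use another.
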
XML keywords
  Russian Federation Moscow region Temperature Clouds Meteorology Observation Pressure Rainfall Snow Snowfall Weather Wind Phenomenon
Or…
  EARTH SCIENCE > Cryosphere > Sea Ice
  EARTH SCIENCE > Atmosphere
  EARTH SCIENCE > Oceans
  EARTH SCIENCE > Solid Earth
  ocean, atmosphere, ice, land
Or…
  METAR aviation hourly weather observation temperature dew point precipitation amount visibility cloud amount type height weather runway colour state

XML geographical extent
Or… CCCC2
Or…

XML temporal extent
  monthly daily
Or…
  T00:00: T06:00:00
Or…
  creationDate

Repetition of XML elements (means extension)
  mb Global monthly daily

Repetition of XML elements (means redefinition)
  Global Grid 2.5 degree latitude and 2.5 degree longitude steps, 6 sectors, one sector per GRIB bulletin Sector S
  Global Grid 2.5 degree latitude and 2.5 degree longitude steps, 6 sectors, one sector per GRIB bulletin Sector T

Findings
- A flexible format, which leads to a lack of consistency
  - Different ways to encode geographical extent, keywords and temporal extent
- Missing information (for the V-GISC)
  - To create a directory
  - To locate the data
  - To create retrieval requests
  - To describe available transformations
  - To implement data policies

Findings (cont.)
- Seems to be designed for human consumption
  - Free text in XML elements
- Not scalable
  - Some documents may change frequently (hourly?)
  - Some documents are orders of magnitude larger than the data itself
  - Cannot represent very large archives with small granularity

SIMDAT/VGISC problem
- Each site has its own practices
  - We have to be ready for variability in the XML
  - We will have to handle XML from other WMO programmes
- We need to handle tens of thousands of documents
  - A lot of repeated information
  - We need fast search
- We need to automatically
  - Index the keywords, the geographical extent and the temporal extent
  - Create a browsable directory (similar to NCAR’s Community Data Portal)
  - Locate and retrieve the data
  - Implement the data policy

Solution: split XML documents into fragments
- WMO core metadata is structured
- Some parts are shared amongst many documents
  - All metadata share the Core part
  - All UKMO metadata share the Owner part
  - All synops (should) share the same description
  - All observations at Heathrow have the same location
  - The date part is variable but is very small
[Diagram: WMO (Core), UKMO (Owner), Synop (Data type), Heathrow (Station, geographical extent), Date (temporal extent)]

XML fragments are hierarchically linked
[Diagram: hierarchy of fragments with nodes WMO, UKMO, Synop, Heathrow and Heathrow Synop]

Fragments: advantages
- Factorizing commonalities into static fragments
  - Reduces the size of XML documents
  - Indexing is done once
  - Avoids redundancy of information
  - Faster searches
- Frequently updated documents are small
  - Manageable
  - Scalable
- The complete XML document can be rebuilt
  - For exchange outside the V-GISC

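Rebuilding a complete document from its chain of fragments might look like the following. This is only a sketch: the in-memory store, the fragment identifiers, the element names and the date value are all hypothetical, and the real recomposition rules are not described in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment store: id -> (parent id, XML text of the fragment).
FRAGMENTS = {
    "wmo": (None, "<core><standard>WMO core</standard></core>"),
    "ukmo": ("wmo", "<owner><name>UKMO</name></owner>"),
    "synop": ("ukmo", "<dataType><name>Synop</name></dataType>"),
    "heathrow": ("synop", "<station><name>Heathrow</name></station>"),
    "heathrow.date": ("heathrow", "<date>2005-09-01</date>"),
}

def ancestry(fragment_id):
    """Return fragment ids from the root down to fragment_id."""
    chain = []
    while fragment_id is not None:
        chain.append(fragment_id)
        fragment_id = FRAGMENTS[fragment_id][0]
    return list(reversed(chain))

def recompose(fragment_id):
    """Rebuild one complete document from a leaf fragment and its ancestors."""
    root = ET.Element("metadata")
    for fid in ancestry(fragment_id):
        root.append(ET.fromstring(FRAGMENTS[fid][1]))
    return root

doc = recompose("heathrow.date")
print(ET.tostring(doc, encoding="unicode"))
```

Only the small date fragment changes between observations; the four shared fragments above it are stored and indexed once.
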
Indexing of XML fragments
[Diagram: the fragment hierarchy (WMO, UKMO, Synop, Heathrow, Heathrow Synop) indexed by keywords, geographical extent and temporal extent]

Prototype implementation
- XML fragments are stored as “text”
  - Fragment table
  - Hierarchy table
- Indexed at insertion time
  - Keywords table
  - Locations table
  - Periods table
  - Directory table
- Implemented with MySQL
  - With OpenGIS extension
  - With text search extension
- Indexes are “inherited”
  - OO approach

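The table layout and the idea of inherited indexes can be illustrated as follows. This sketch uses Python's built-in sqlite3 module as a stand-in for MySQL, covers only the keyword index, and uses table and column names invented for the example; it is not the prototype's actual schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE fragment  (id TEXT PRIMARY KEY, xml TEXT);
CREATE TABLE hierarchy (child TEXT, parent TEXT);
CREATE TABLE keyword   (fragment TEXT, word TEXT);
""")

def insert(fid, parent, xml, words):
    """Store a fragment as text and index its keywords at insertion time."""
    db.execute("INSERT INTO fragment VALUES (?, ?)", (fid, xml))
    if parent is not None:
        db.execute("INSERT INTO hierarchy VALUES (?, ?)", (fid, parent))
    db.executemany("INSERT INTO keyword VALUES (?, ?)", [(fid, w) for w in words])

insert("ukmo", None, "<owner/>", ["ukmo"])
insert("synop", "ukmo", "<dataType/>", ["synop", "observation"])
insert("heathrow", "synop", "<station/>", ["heathrow"])

def search(word):
    """Fragments indexed under word, plus all their descendants."""
    rows = db.execute("""
        WITH RECURSIVE hit(id) AS (
            SELECT fragment FROM keyword WHERE word = ?
            UNION
            SELECT child FROM hierarchy JOIN hit ON hierarchy.parent = hit.id
        )
        SELECT id FROM hit ORDER BY id
    """, (word,))
    return [r[0] for r in rows]

print(search("synop"))
```

A keyword is stored once, on the fragment that declares it, and the search follows the hierarchy table downwards so that every descendant fragment matches as well.
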
Object Oriented Approach - Behaviours
[Diagram: the fragment hierarchy (WMO, UKMO, Synop, Heathrow, Heathrow Synop) annotated with behaviours: index as geography, index as keyword, index as period]

Fragment properties - Behaviours
- Only the owner of the data knows how to:
  - Describe the data (indexing information)
  - Request the data (create an internal request)
  - Extract a subset of the data (define an interface to extract a subset)
- Ancillary metadata can be associated with each fragment to describe how to index, request and sub-select the data
- Behaviours are inherited
  - Object-oriented approach

Behaviours example: indexing
  //identificationInfo/descriptiveKeywords
  //identificationInfo/dataExtent/geographicElement/boundingBox
  //identificationInfo/dataExtent/geographicElement/polygon
  //identificationInfo/referenceDate/date
  //identificationInfo/dataExtent/temporalElement
  //identificationInfo/referenceDate/period
  //identificationInfo/topicCategory

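One way to picture inherited indexing behaviours is a lookup that walks up the fragment hierarchy and then applies the XPath expressions it finds. In this sketch only the two XPath strings come from the slide; the hierarchy, the assignment of behaviours to fragments, the sample document and its bounding-box values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical hierarchy and behaviours; only the XPath strings come from the slide.
PARENT = {"wmo": None, "ukmo": "wmo", "synop": "ukmo", "heathrow": "synop"}
BEHAVIOURS = {
    "wmo": {"keyword": "//identificationInfo/descriptiveKeywords"},
    "heathrow": {
        "geography": "//identificationInfo/dataExtent/geographicElement/boundingBox"
    },
}

def behaviours_for(fragment_id):
    """Merge behaviours from the root down so that children override parents."""
    chain = []
    while fragment_id is not None:
        chain.append(fragment_id)
        fragment_id = PARENT[fragment_id]
    merged = {}
    for fid in reversed(chain):
        merged.update(BEHAVIOURS.get(fid, {}))
    return merged

def index_terms(xml_text, fragment_id):
    """Apply each inherited XPath and collect the text of the matched elements."""
    root = ET.fromstring(xml_text)
    terms = {}
    for kind, xpath in behaviours_for(fragment_id).items():
        # ElementTree needs a relative path, so "//x" becomes ".//x".
        terms[kind] = [e.text for e in root.findall("." + xpath)]
    return terms

# Hypothetical document with made-up bounding-box values.
DOC = """<metadata><identificationInfo>
  <descriptiveKeywords>synop</descriptiveKeywords>
  <dataExtent><geographicElement>
    <boundingBox>51.4 -0.5 51.5 -0.4</boundingBox>
  </geographicElement></dataExtent>
</identificationInfo></metadata>"""

print(index_terms(DOC, "heathrow"))
```

A fragment with no behaviours of its own, such as the synop fragment here, still gets the keyword rule from the root, which is the sense in which behaviours are inherited.
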
Extension
- An extension element from a dedicated namespace is embedded in all the fragments
- It contains all the information needed to implement the V-GISC that is not defined by the WMO core because it is not relevant outside the scope of the V-GISC
  - Internal unique ID
  - Hierarchy relationship
  - Physical location (which V-GISC node holds the data)
  - Information used to create data requests
  - Information used to create web pages
- It is removed when the full XML document is recomposed for use outside the V-GISC

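Removing the extension before export could be done by filtering on the namespace. In this sketch the namespace URI, the prefix and all element names are hypothetical, since the real ones are not given in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and element names; the real ones are not in the slides.
EXT_NS = "http://example.org/vgisc-extension"

FRAGMENT = f"""<fragment xmlns:x="{EXT_NS}">
  <title>Synop Heathrow</title>
  <x:extension>
    <x:id>urn:example.fragment</x:id>
    <x:node>ecmwf</x:node>
  </x:extension>
</fragment>"""

def strip_extension(element):
    """Remove, in place, every descendant element that lives in EXT_NS."""
    for child in list(element):
        if child.tag.startswith("{" + EXT_NS + "}"):
            element.remove(child)
        else:
            strip_extension(child)
    return element

root = strip_extension(ET.fromstring(FRAGMENT))
remaining = [e.tag for e in root.iter()]
print(remaining)
```

Keeping the local information in its own namespace makes the export step a mechanical filter, with no need to know which individual elements are internal.
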
Fragment example
  urn:akrotiri.synop.land.second.record urn:akrotiri urn:int.wmo.synop.land.second.record ecmwf.obs

Variables and Requests
- Some datasets have too many items
  - Impossible to describe every one of them
  - But describing the whole dataset is simple
- Some datasets are very homogeneous
  - E.g. same parameters for a long period of time
  - This can be described in a compact form ( and )
  - But we still need to specify that individual dates can be requested by the user

Variables and requests (cont.)
- Associate two elements with an XML fragment:
  - One holds the information needed to generate a valid request to the data repository
  - The other holds information on how to create a web interface that lets the user select items from the dataset
- Web portal
  - We use the WMO core for discovery
  - We use the web-interface element to present selection dialogues to the user

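The split between a fixed request part and user-selectable variables can be sketched as below. The dictionary layout, the key names and the sample date are all hypothetical; they only illustrate how one structure can drive both the selection dialogue and the request sent to the repository.

```python
# Hypothetical ancillary metadata attached to one fragment.
REQUEST_TEMPLATE = {"repository": "ecmwf.mars", "dataset": "era40.sfc"}
VARIABLES = {
    "param": {"label": "Parameter", "choices": ["t", "msl"]},
    "date": {"label": "Date", "choices": None},  # None: free-form value
}

def selection_dialogue():
    """Describe the form fields a portal could render for this dataset."""
    return [(name, spec["label"], spec["choices"]) for name, spec in VARIABLES.items()]

def build_request(selection):
    """Combine the fixed template with a validated user selection."""
    request = dict(REQUEST_TEMPLATE)
    for name, value in selection.items():
        if name not in VARIABLES:
            raise KeyError(f"unknown variable: {name}")
        choices = VARIABLES[name]["choices"]
        if choices is not None and value not in choices:
            raise ValueError(f"{value!r} is not a valid {name}")
        request[name] = value
    return request

request = build_request({"param": "msl", "date": "2001-01-01"})
print(request)
```

The dataset is described once as a whole, while individual dates and parameters are supplied by the user at request time.
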
Fragment example: ECMWF Reanalysis
  urn:int.ecmwf.era40.sfc urn:int.wmo.core ecmwf.mars e4 sfc marser t msl ECMWF 40 Years reanalysis ERA40 ERA-40 in GRIB NWP Outputs > ECMWF > 40 years reanalysis …

Directory structure
- Problem: create a browsable hierarchy of topics, like the “Google directory” (see NCAR’s Community Data Portal)
  - Not to be confused with the internal “fragment hierarchy”, which is not exposed to the end user
- Currently using an element whose value is a path, e.g.
  NWP Outputs > ECMWF > 40 years reanalysis
- The same product can appear in several locations of the directory
  Observations > By Type > Profile > Temp Land
  Observations > By Region > Asia > China
- Usage should be recommended by WMO

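Turning such paths into a browsable tree is straightforward. This is a minimal sketch, assuming " > " separates the levels as in the examples on the slide; the nested-dictionary layout, the reserved "_products" key and the identifier urn:example.temp are inventions for the example.

```python
def split_path(path):
    """Split an 'A > B > C' style path into its levels."""
    return [part.strip() for part in path.split(">")]

def add_path(tree, path, product):
    """Insert product under the given path, creating nodes as needed."""
    node = tree
    for part in split_path(path):
        node = node.setdefault(part, {})
    # "_products" is a reserved key in this sketch, not a topic name.
    node.setdefault("_products", []).append(product)

def children(tree, path=""):
    """List the sub-topics found directly under path (top level if empty)."""
    node = tree
    if path:
        for part in split_path(path):
            node = node[part]
    return sorted(key for key in node if key != "_products")

directory = {}
add_path(directory, "NWP Outputs > ECMWF > 40 years reanalysis", "urn:int.ecmwf.era40.sfc")
# The same (hypothetical) product filed under two branches of the directory.
add_path(directory, "Observations > By Type > Profile > Temp Land", "urn:example.temp")
add_path(directory, "Observations > By Region > Asia > China", "urn:example.temp")

print(children(directory))
print(children(directory, "Observations"))
```

Because a product is attached to a path rather than to a single parent, listing it in several places only requires repeating the path element.
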
Conclusion
- The approach taken in the V-GISC should help us support the large variety of XML documents
- Nevertheless, the standard is too flexible
  - A lot of programming is required to support all possible variations
- The WMO must provide “best practices” guidelines
  - How to encode a point in time, how to encode ranges, …
  - A topic hierarchy must be defined, to create the directory
- WMO core metadata need only contain sufficient information for discovery
  - The rest can be implemented as a series of local extensions, as long as they are not exported or exchanged