ECMWF WMO Metadata Workshop – Beijing Sep 2005 Experience with the WMO core metadata in the SIMDAT/VGISC project Baudouin Raoult ECMWF.

1 ECMWF WMO Metadata Workshop – Beijing Sep 2005 Experience with the WMO core metadata in the SIMDAT/VGISC project Baudouin Raoult ECMWF

2 The SIMDAT/VGISC project
SIMDAT
- EU-funded Grid project
- 7 technologies: Grid infrastructure, Virtual Organisation, Ontologies, Analysis Services, Workflows, Distributed data access, Knowledge Services
- 4 activities: Automotive, Aerospace, Pharmacy and Meteorology
Meteorology activity: build a Virtual GISC (V-GISC)
- DWD
- UKMO
- Météo-France
- EUMETSAT
- ECMWF

3 V-GISC infrastructure

4 V-GISC Conceptual view
Through the Distributed Portal, users search for and retrieve data and subscribe to services, subject to authentication and authorization
The Virtual Database Service provides a single view of the partners' databases

5 VGISC Distributed Architecture

6 VGISC Node Functional Design

7 Why do we need metadata (in this project)?
Create a catalogue (discovery metadata)
- Searchable (keyword, geographical location, time range)
- Browsable (directory hierarchy)
Implement the V-GISC (service metadata)
- Describe where the data resides (physical location)
- Describe how to request the data
- Describe the data format (useful for offering a list of transformations, e.g. sub-sampling of gridded data, plots or format conversions)
- Describe associated data policies

8 Study of the WMO core
Starting point
- XML files available on the WMO web site
- XML files from DWD's earlier prototype
- Trying to describe the ECMWF archive (1.3 × 10^10 GRIB fields)

9 XML Root element
Namespaces are a nightmare to use (especially using XPath when there is a default namespace)
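
The default-namespace difficulty can be reproduced in a few lines. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the namespace URI and element names are placeholders chosen for illustration, not taken from the WMO core profile.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the namespace URI and element names are
# placeholders, not part of the WMO core profile.
DOC = """<metadata xmlns="http://example.org/md">
  <title>Synop Heathrow</title>
</metadata>"""

root = ET.fromstring(DOC)

# An unqualified path finds nothing, because every element silently
# belongs to the default namespace.
unqualified = root.find("title")

# The query has to bind a prefix to the namespace URI explicitly,
# even though the document itself uses no prefix.
NS = {"md": "http://example.org/md"}
qualified = root.find("md:title", NS)

print(unqualified)
print(qualified.text)
```

The document never mentions a prefix, yet every query must carry one, which is why paths written against one producer's files tend to fail silently against another's.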

10 XML Keywords
Russian Federation Moscow region Temperature Clouds Meteorology Observation Pressure Rainfall Snow Snowfall Weather Wind Phenomenon
Or…
EARTH SCIENCE > Cryosphere > Sea Ice
EARTH SCIENCE > Atmosphere
EARTH SCIENCE > Oceans
EARTH SCIENCE > Solid Earth
ocean, atmosphere, ice, land
Or…
METAR aviation hourly weather observation temperature dew point precipitation amount visibility cloud amount type height weather runway colour state

11 XML Geographical extent
50.78 6.1
Or…
CCCC2
Or…
-126.3 39.9

12 XML Temporal extent
0100-01-01 0299-12-31 monthly daily
Or…
2004-02-05T00:00:00 2004-02-05T06:00:00
Or…
2004-01-28 creationDate
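
To index such variants uniformly, each temporal extent has to be reduced to one shape. A minimal sketch, assuming the values arrive as ISO 8601 strings like the ones on the slide; the function name `normalise_extent` and the choice to treat a single date as a zero-length range are assumptions made for this example.

```python
from datetime import datetime

def normalise_extent(begin, end=None):
    """Return a (start, stop) pair of datetimes for one temporal extent."""
    start = datetime.fromisoformat(begin)
    # A single date or instant becomes a zero-length range (an assumption).
    stop = datetime.fromisoformat(end) if end is not None else start
    return start, stop

examples = [
    ("0100-01-01", "0299-12-31"),                    # date range
    ("2004-02-05T00:00:00", "2004-02-05T06:00:00"),  # date-time range
    ("2004-01-28", None),                            # single date
]
for begin, end in examples:
    print(normalise_extent(begin, end))
```

Once every extent is a pair of datetimes, one period index can serve all three encodings.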

13 Repetition of XML elements (means extension)
3.5 992.5 mb -180 +180 -90 +90 Global 1900-01-01 1999-12-31 monthly daily

14 Repetition of XML elements (means redefinition)
Global Grid 2.5 degree latitude and 2.5 degree longitude steps, 6 sectors, one sector per GRIB bulletin
Sector S -180 -60 0 90
Global Grid 2.5 degree latitude and 2.5 degree longitude steps, 6 sectors, one sector per GRIB bulletin
Sector T -60 60 0 90

15 Findings
A flexible format, which leads to a lack of consistency
- Different ways to encode geographical extent, keywords and temporal extent
Missing information (for the V-GISC)
- To create a directory
- To locate the data
- To create retrieval requests
- To describe available transformations
- To implement data policies

16 Findings (cont.)
Seems to be designed for human consumption
- Free text in XML elements
Not scalable
- Some documents may change frequently (hourly?)
- Some documents are orders of magnitude larger than the data itself
- Cannot represent very large archives with small granularity

17 SIMDAT/VGISC problem
Each site has its own practices
- We have to be ready for variability in the XML
- We will have to handle XML from other WMO programmes
We need to handle tens of thousands of documents
- Lots of repeated information
- We need fast search
We need to automatically
- Index the keywords, the geographical extent and the temporal extent
- Create a browsable directory (similar to NCAR’s Community Data Portal)
- Locate and retrieve the data
- Implement the data policy

18 Solution: split XML documents into fragments
WMO core metadata is structured
Some parts are shared amongst many documents
- All metadata share the Core part
- All UKMO metadata share the Owner part
- All synops (should) share the same description
- All observations at Heathrow have the same location
- The date part is variable but is very small
Example fragments and their roles
- WMO: Core
- UKMO: Owner
- Synop: Data type
- Heathrow: Station (geographical extent)
- 2005-10-12: Date (temporal extent)

19 XML fragments are hierarchically linked
Diagram nodes: WMO, UKMO, Synop, Heathrow, Heathrow Synop, Heathrow Synop 2005-10-12

20 Fragments: advantages
Factorizing commonalities into static fragments
- Reduces size of XML documents
- Indexing done once
Avoid redundancy of information
- Faster searches
Frequently updated documents are small
- Manageable
- Scalable
Complete XML document can be rebuilt
- For exchange outside the V-GISC
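
The rebuild step can be sketched as a walk up the parent links. This is an illustration only: the fragment identifiers, the dictionary layout and the single-parent chain are assumptions (the slides suggest a fragment may have more than one parent), and the real fragments are XML rather than dictionaries.

```python
# Hypothetical fragment store: each fragment holds only its own part
# of the metadata plus a link to its parent fragment.
FRAGMENTS = {
    "wmo": {"parent": None, "data": {"core": "WMO"}},
    "ukmo": {"parent": "wmo", "data": {"owner": "UKMO"}},
    "synop": {"parent": "ukmo", "data": {"data_type": "Synop"}},
    "heathrow-synop": {"parent": "synop", "data": {"station": "Heathrow"}},
    "heathrow-synop-2005-10-12": {
        "parent": "heathrow-synop",
        "data": {"date": "2005-10-12"},
    },
}

def rebuild(fragment_id, fragments=FRAGMENTS):
    """Merge a fragment with all its ancestors into one full document."""
    chain = []
    current = fragment_id
    while current is not None:
        chain.append(fragments[current])
        current = fragments[current]["parent"]
    document = {}
    # Apply the root first so that more specific fragments can override it.
    for fragment in reversed(chain):
        document.update(fragment["data"])
    return document

print(rebuild("heathrow-synop-2005-10-12"))
```

Only the small date fragment changes from day to day; the four fragments above it stay static and are stored once.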

21 Indexing of XML fragments
Diagram nodes: WMO, UKMO, Synop, Heathrow, Heathrow Synop, Heathrow Synop 2005-10-12
Indexes: Keywords, Geographical Extent, Temporal Extent

22 Prototype implementation
XML fragments are stored as “text”
- Fragment table
- Hierarchy table
Indexed at insertion time
- Keywords table
- Locations table
- Periods table
- Directory table
Implemented with MySQL
- With OpenGIS extension
- With text search extension
Indexes are “inherited”
- OO approach
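
One way to read "inherited" indexes: a keyword stored once on a parent fragment matches every fragment below it. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL; the table and column names, the sample rows and the recursive query are all assumptions for illustration, not the prototype's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fragment (id TEXT PRIMARY KEY, xml TEXT);
CREATE TABLE hierarchy (child TEXT, parent TEXT);
CREATE TABLE keywords (fragment_id TEXT, keyword TEXT);
""")
conn.executemany("INSERT INTO fragment VALUES (?, ?)", [
    ("synop", "<fragment/>"),
    ("heathrow-synop", "<fragment/>"),
    ("heathrow-synop-2005-10-12", "<fragment/>"),
    ("temp", "<fragment/>"),
])
conn.executemany("INSERT INTO hierarchy VALUES (?, ?)", [
    ("heathrow-synop", "synop"),
    ("heathrow-synop-2005-10-12", "heathrow-synop"),
])
# The keyword is indexed once, on the shared parent fragment only.
conn.execute("INSERT INTO keywords VALUES ('synop', 'observation')")

def search(keyword):
    """Return fragments that carry the keyword directly or inherit it."""
    rows = conn.execute("""
        WITH RECURSIVE hits(id) AS (
            SELECT fragment_id FROM keywords WHERE keyword = ?
            UNION
            SELECT h.child FROM hierarchy h JOIN hits ON h.parent = hits.id
        )
        SELECT id FROM hits ORDER BY id
    """, (keyword,))
    return [row[0] for row in rows]

print(search("observation"))
```

The daily fragment is found without ever having been indexed itself, which is the point of indexing static fragments once.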

23 Object Oriented Approach - Behaviours
Diagram nodes: WMO, UKMO, Synop, Heathrow, Heathrow Synop, Heathrow Synop 2005-10-12
Behaviours: Index as geography, Index as keyword, Index as period, Index as keyword

24 Fragment properties - Behaviours
Only the owner of the data knows how to:
- Describe the data (indexing information)
- Request the data (create internal request)
- Extract a subset of the data (define an interface to extract a subset)
Ancillary metadata can be associated with each fragment to describe how to index, request and sub-select the data
Behaviours are inherited
- Object oriented approach

25 Behaviours example: indexing
//identificationInfo/descriptiveKeywords
//identificationInfo/dataExtent/geographicElement/boundingBox
//identificationInfo/dataExtent/geographicElement/polygon
//identificationInfo/referenceDate/date
//identificationInfo/dataExtent/temporalElement
//identificationInfo/referenceDate/period
//identificationInfo/topicCategory
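
A behaviour of this kind can be thought of as a table mapping an index to the paths that feed it. The sketch below applies such a table with `xml.etree.ElementTree`; the sample fragment, the grouping of paths under `keyword` and `period`, and the function name are assumptions. ElementTree only accepts relative paths on an element, hence the leading dot.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment, shaped only to match the paths listed above.
FRAGMENT = """<fragment>
  <identificationInfo>
    <descriptiveKeywords>synop</descriptiveKeywords>
    <descriptiveKeywords>observation</descriptiveKeywords>
    <topicCategory>meteorology</topicCategory>
    <referenceDate><date>2005-10-12</date></referenceDate>
  </identificationInfo>
</fragment>"""

# Assumed grouping: which paths feed which index.
BEHAVIOURS = {
    "keyword": [
        ".//identificationInfo/descriptiveKeywords",
        ".//identificationInfo/topicCategory",
    ],
    "period": [".//identificationInfo/referenceDate/date"],
}

def index_entries(xml_text, behaviours=BEHAVIOURS):
    """Collect the text of every element matched by each index's paths."""
    root = ET.fromstring(xml_text)
    entries = {}
    for index_name, paths in behaviours.items():
        values = []
        for path in paths:
            values.extend(
                el.text.strip() for el in root.findall(path) if el.text
            )
        entries[index_name] = values
    return entries

print(index_entries(FRAGMENT))
```

Because the table is data rather than code, a data owner could supply a different one for fragments that encode their extents differently.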

26 extension
An element from the “http://www.vgisc.org/” namespace is embedded in all the fragments
It contains all the information needed to implement the V-GISC that is not defined by the WMO core because it is not relevant outside the scope of the V-GISC
- Internal unique ID
- Hierarchy relationship
- Physical location (which V-GISC node holds the data)
- Information used to create data requests
- Information used to create web pages
It is removed when the full XML document is recomposed for use outside the V-GISC
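
Removing the extension before export amounts to dropping every element in that namespace. A minimal sketch with `xml.etree.ElementTree`; only the namespace URI comes from the slide, while the element names (`extension`, `id`, `title`) and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

VGISC_NS = "http://www.vgisc.org/"

# Hypothetical fragment: only the namespace URI is taken from the slide.
FRAGMENT = """<fragment xmlns:v="http://www.vgisc.org/">
  <v:extension><v:id>urn:example</v:id></v:extension>
  <title>Synop Heathrow</title>
</fragment>"""

def strip_extension(xml_text):
    """Return the parsed tree with all V-GISC namespace elements removed."""
    root = ET.fromstring(xml_text)
    prefix = "{" + VGISC_NS + "}"
    # Snapshot the elements first, because the tree is modified in the loop.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag.startswith(prefix):
                parent.remove(child)
    return root

exported = strip_extension(FRAGMENT)
print([el.tag for el in exported.iter()])
```

Keeping the local information in its own namespace is what makes this clean-up mechanical.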

27 Fragment example
http://www.vgisc.org/
urn:akrotiri.synop.land.second.record.20050629
urn:akrotiri
urn:int.wmo.synop.land.second.record
ecmwf.obs
2005-06-29

28 Variables and Requests
Some datasets have too many items
- Impossible to describe every one of them
- But describing the whole dataset is simple
Some datasets are very homogeneous
- E.g. same parameters for a long period of time
- This can be described in a compact form ( and )
- But we still need to specify that individual dates can be requested by the user

29 Variables and requests (cont.)
Associate two elements with an XML fragment:
- Holds information on how to generate a valid request to the data repository
- Holds information on how to create a web interface to let the user select items from the dataset
Web portal
- We use WMO core for discovery
- We use the element to present selection dialogues to the user

30 Fragment example: ECMWF Reanalysis
http://www.vgisc.org/
urn:int.ecmwf.era40.sfc
urn:int.wmo.core
ecmwf.mars
e4 sfc marser
1980-01-01 1990-12-31
2t msl
0000 0600 1200 1800
ECMWF 40 Years reanalysis ERA40 ERA-40 in GRIB
NWP Outputs > ECMWF > 40 years reanalysis
1980-01-01 1990-12-31
…
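
The benefit of the compact form can be made concrete with the values shown in this example: a date range, two parameters and four times describe tens of thousands of individually requestable items. The sketch below is illustrative only; the dictionary layout, function names and the shape of the returned request are assumptions and do not reproduce the V-GISC request format or MARS syntax.

```python
from datetime import date

# Values taken from the slide; the dictionary layout is an assumption.
DATASET = {
    "start": date(1980, 1, 1),
    "end": date(1990, 12, 31),
    "params": ["2t", "msl"],
    "times": ["0000", "0600", "1200", "1800"],
}

def count_items(ds):
    """Number of individual fields covered by the compact description."""
    n_days = (ds["end"] - ds["start"]).days + 1
    return n_days * len(ds["params"]) * len(ds["times"])

def build_request(ds, day, param, time):
    """Validate one user selection and return it as a request (hypothetical shape)."""
    if not ds["start"] <= day <= ds["end"]:
        raise ValueError("date outside the described range")
    if param not in ds["params"] or time not in ds["times"]:
        raise ValueError("parameter or time not offered by this dataset")
    return {"date": day.isoformat(), "param": param, "time": time}

print(count_items(DATASET))
print(build_request(DATASET, date(1985, 6, 1), "2t", "1200"))
```

Here 4018 days, 2 parameters and 4 times give 32144 fields, all covered by a single small fragment rather than one metadata document each.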

31 Directory structure
Problem: create a browsable hierarchy of topics, like the “Google directory” (see NCAR’s Community Data Portal)
Not to be confused with the internal “fragment hierarchy”, which is not exposed to the end user
Currently using the element
- NWP Outputs > ECMWF > 40 years reanalysis
The same product can appear in several locations of the directory
- Observations > By Type > Profile > Temp Land
- Observations > By Region > Asia > China
Usage should be recommended by WMO
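
Turning such "A > B > C" strings into a browsable tree is straightforward. A minimal sketch, assuming the topic path arrives as plain text with " > " separators; the nested-dictionary layout, the function name and the product identifier `temp-land-china` are hypothetical.

```python
def add_to_directory(tree, path, product_id):
    """File a product under a 'Topic > Subtopic > ...' path."""
    node = tree
    for topic in (part.strip() for part in path.split(">")):
        node = node.setdefault("children", {}).setdefault(topic, {})
    node.setdefault("products", []).append(product_id)

directory = {}
# The same (hypothetical) product is filed under two branches.
add_to_directory(
    directory, "Observations > By Type > Profile > Temp Land", "temp-land-china"
)
add_to_directory(
    directory, "Observations > By Region > Asia > China", "temp-land-china"
)

observations = directory["children"]["Observations"]["children"]
print(sorted(observations))
```

The tree only works across sites if everyone spells the topics identically, which is why the slide asks for WMO to recommend the usage.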

32 Conclusion
The approach taken in the V-GISC should help us support the large variety of XML documents
Nevertheless, the standard is too flexible
- A lot of programming is required to support all possible variations
The WMO must provide “best practices” guidelines
- How to encode a point in time, how to encode ranges, …
A topic hierarchy must be defined to create the directory
WMO core metadata need only contain sufficient information for discovery
- The rest can be implemented as a series of local extensions, as long as they are not exported or exchanged

