Metadata Semantics and the Earth System Curator Rocky Dunlap Earth System Curator Georgia Tech.


1 Metadata Semantics and the Earth System Curator Rocky Dunlap Earth System Curator Georgia Tech

2 Earth System Curator: a 3-year NSF-funded project. Funded collaborators: Cecelia DeLuca (NCAR, PI), Balaji (GFDL, Co-PI), Don Middleton (NCAR, Co-PI), Chris Hill (MIT, Co-PI), Spencer Rugaber (Ga Tech, Co-PI), Leo Mark (Ga Tech), Julien Chastang (NCAR), Sergey Nikonov (GFDL), Angela Navarro (Ga Tech), Me (Ga Tech). Also working with: Lois and Katherine (NMM), Sophie Valcke (PRISM/OASIS), others...

3 Curator Doctrine: There is currently a gap in the way we treat models and datasets (are they really so different?). The best description of a dataset is a comprehensive description of the model run that created it (plus post-processing). Model components are data objects for exchange. Metadata-centric view: don’t start with a dataset and try to find the metadata... Start with good metadata that leads you to the datasets you want, even if they don’t yet exist! (No, really, that’s how we think.) Haiku are a valid form of model metadata.

4 Earth System Curator Applications (Proofs of Concept): (1) Catalog of modeling components along with comprehensive metadata: CDP Curator (Michael B., Don, Luca, Julien). (2) Demonstrate compatibility checking of components, primarily “technical” compatibility: platforms, compilers, required fields, field data types, calendar/time. (3) Demonstrate auto-generation of a coupler component based on metadata. (4) Demonstrate automation of workflow tasks: model assembly, execution, archive, post-processing.

5 Schema Development Fun: To accomplish these goals, we need comprehensive descriptions of climate models: model metadata. This includes both “semantic” and “syntactic” elements (“discovery” vs. “use”). Semantic: component name, type, owner, description, source code location, component architecture of model, platform, framework. Syntactic: parameter settings, input datasets, boundary conditions, coupling details, grid coordinates.

6 Lots of schemata... Component (NMM); Potential Model (NMM/Curator); Model (NMM); PMIOD/SMIOC (PRISM coupling spec); CRE/Curator Complete (workflow); Application (NMM); Gridspec.

7 Reminiscing on Metadata Development. Observations: (It seems) much of the community is in support of metadata development, although there are different opinions on levels of comprehensiveness. People are using metadata for different reasons: annotating large datasets for retrieval, informing analysis tools, archiving modeling components, automating workflow (runtime environment), exchanging datasets. Each application requires different (but often overlapping) metadata.

8 How should we think about schemata? Schemata are typically written for applications: I have a particular task I want to accomplish; what metadata do I need to accomplish it? Write a schema. But... now we have lots of schemata sitting around. They may contain overlapping information and different ways of expressing the same information. Each schema is used for a small number of tasks and understood by a small number of applications. We may need to reference elements in another schema, or aggregate elements from multiple schemata.

9 A Unified View of Metadata: Given all of the current metadata development efforts, Curator is promoting a unified view of metadata. Metadata reuse must be a priority. Metadata aggregation is key: schemata built (generated!) from a repository of existing metadata elements (let’s call them types). We must think conceptually first and then syntactically; ideally, all groups will agree at both levels.

10 What’s In a Schema? An XML Schema (e.g., gridspec.xsd) contains XML types: GridTile, ContactRegion, Boundary, GridDescriptor. These are both syntactic and conceptual constructs.

11 Re-using schema elements: How do I best use/re-use metadata elements from (multiple) schema(ta) to accomplish my particular application? You need: (1) a conceptual understanding of the “types” (concepts) in the schema → Glossary; (2) the syntactic representation of that type (so you can actually use it in implementations) → XML Type Library. WE ARE HERE

12 Multi-Schema Semantic Glossary: A community-wide glossary of metadata types/concepts from multiple schemata. Concepts are aggregated into a centralized glossary. Schema authors and users can get explanations/definitions of metadata elements. Examples: What does the contact_region tag mean in the Gridspec schema? What goes under the intent tag in the PMIOD? What is a potential model anyway?

13 Multi-Schema Semantic Glossary: For each metadata concept, provide: a human-readable definition; source schema; example usage; change notes/provenance; semantic relationships with other concepts (e.g., broader than, narrower than, part of, parent of, synonym, etc.).
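The per-concept fields listed above could be held in a small record type. This is only a sketch: the class name, the field names, and the "related" link to nmm:Model are illustrative assumptions, not the Curator glossary's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GlossaryEntry:
    """One metadata concept in the glossary (hypothetical field names)."""
    concept_id: str                 # e.g. "esc:PotentialModel"
    definition: str                 # human-readable definition
    source_schema: str              # schema the concept was harvested from
    examples: List[str] = field(default_factory=list)       # example usage
    change_notes: List[str] = field(default_factory=list)   # provenance
    # relation name -> ids of related concepts
    relations: Dict[str, List[str]] = field(default_factory=dict)

    def relate(self, relation: str, other_id: str) -> None:
        """Record a semantic relationship to another concept."""
        self.relations.setdefault(relation, []).append(other_id)


entry = GlossaryEntry(
    concept_id="esc:PotentialModel",
    definition=("A set of components at the source code level that can "
                "potentially form an executable model."),
    source_schema="NMM/Curator",
)
# The relationship below is made up for illustration only.
entry.relate("related", "nmm:Model")
```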

14 Glossary Design: Schema authors embed descriptions directly inside each XML schema. This keeps the human-readable definitions close to the formal syntactic definitions; when a schema is updated, it is easy to update the glossary. Glossary entries from distributed schemata are harvested (nightly?) and placed into a centralized glossary (alternatively, live access?). A simple interface allows users to query the glossary for concepts.
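A minimal sketch of the harvesting step, assuming the descriptions are embedded as standard xs:annotation/xs:documentation elements on each complex type. The schema fragment, the file name "example.xsd", and the harvest() function are hypothetical; the actual Curator harvester may work differently.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical annotated schema fragment; type names are illustrative.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="PotentialModel">
    <xs:annotation>
      <xs:documentation>
        A set of components at the source code level that can
        potentially form an executable model.
      </xs:documentation>
    </xs:annotation>
  </xs:complexType>
  <xs:complexType name="Undocumented"/>
</xs:schema>"""


def harvest(schema_text, source):
    """Return {type name: {"definition", "source"}} for documented types."""
    root = ET.fromstring(schema_text)
    glossary = {}
    for ctype in root.findall(f"{{{XS}}}complexType"):
        doc = ctype.find(f"{{{XS}}}annotation/{{{XS}}}documentation")
        if doc is not None and doc.text and doc.text.strip():
            glossary[ctype.get("name")] = {
                # collapse the whitespace introduced by XML indentation
                "definition": " ".join(doc.text.split()),
                "source": source,
            }
    return glossary


glossary = harvest(SCHEMA, "example.xsd")
```

Types without an embedded description (here, Undocumented) are simply skipped, so a nightly harvest would only pick up what schema authors have actually documented.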

15 Glossary Design: Simple Knowledge Organization System (SKOS) data model for glossary entries: http://www.w3.org/2004/02/skos/ SKOS supports knowledge organization systems like glossaries, thesauri, taxonomies, etc. It is RDF-based: move the community toward languages with higher semantics (eventually get down to dataset level).

16 Sample SKOS RDF (Basic): a concept labeled “potential model” with the definition “A set of components at the source code level that can potentially form an executable model.” ... Where should glossary entries be stored?
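The RDF markup from this slide did not survive in the transcript. As a rough sketch of what a basic SKOS entry for "potential model" might look like, the snippet below builds RDF/XML using skos:Concept, skos:prefLabel, and skos:definition. The concept URI (http://example.org/esc#PotentialModel) is a placeholder assumption, since the real esc: namespace is not given here.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)


def skos_concept(uri, pref_label, definition):
    """Build an rdf:RDF element holding a single skos:Concept."""
    root = ET.Element(f"{{{RDF}}}RDF")
    concept = ET.SubElement(root, f"{{{SKOS}}}Concept",
                            {f"{{{RDF}}}about": uri})
    ET.SubElement(concept, f"{{{SKOS}}}prefLabel").text = pref_label
    ET.SubElement(concept, f"{{{SKOS}}}definition").text = definition
    return root


rdf_root = skos_concept(
    "http://example.org/esc#PotentialModel",  # placeholder URI (assumption)
    "potential model",
    "A set of components at the source code level that can potentially "
    "form an executable model.",
)
rdf_xml = ET.tostring(rdf_root, encoding="unicode")
print(rdf_xml)
```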

17 Example Annotated Schema: ... the embedded annotation carries the label “potential model” and the definition “A set of components at the source code level that can potentially form an executable model.” ...

18 Sample SKOS RDF Triples: esc:PotentialModel rdf:type skos:Concept; esc:PotentialModel skos:prefLabel ‘potential model’; esc:PotentialModel skos:definition ‘A set of components at the source code level that can potentially form an executable model.’

19 Other SKOS Fields (values from the sample entry): “model”; “The root element of an NMM Model description. There is one model per XML file. This model can have one or more related component configurations.”; “simulation job run”; “UK Met Office Unified Model”; change note: “The label 'model' was changed from NMM_Model.” (Katherine Bouton, 2007-02-02)

20 Semantic Relationships (diagram): nodes esc:PotentialModel, nmm:Component, nmm:Model, prism:Model; edge labels skosx:childOf (twice), skos:related, skos:synonym.

21 Putting it all Together. More info: http://glossary.earthsystemcurator.org/ http://www.earthsystemcurator.org/index.php?option=com_content&task=view&id=54&Itemid=84

22 Glossary Interface (screenshot labels): Search; Schemata to Include; Concept List; Concept Details; Links to related concepts.

23 Syntactic Metadata Re-use: So, if we agree on the concepts, what about the syntax (i.e., the XML representation)? Concept = XML Type. How do we share XML types from multiple schemata across the community? One idea: an XML Type Library (or Catalog or Repository), currently “Preliminary Research”. This is NOT the same thing as a single complex schema that describes everything: types are first-class objects and can be manipulated individually.

24 How does an XML Type Library work? Operations (web service?): submit an XML type; get a list of all types; query for types; validate a type (Is my XML fragment a valid X?); type membership (What types does my XML fragment fit?); generate an XML Schema.
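To make the operations concrete, here is a toy in-memory stand-in for such a library. Everything in it is an assumption made for illustration: the class and method names, the keyword-based query, and especially the validation rule, which only checks that a fragment contains a type's required child elements and is not real XML Schema validation. The required child names ("component", "descriptor") are made up.

```python
import xml.etree.ElementTree as ET


class TypeLibrary:
    """Toy type library; a real one might sit behind a web service."""

    def __init__(self):
        # type name -> {"definition", "required", "keywords"}
        self._types = {}

    def submit(self, name, definition, required_children, keywords=()):
        """Submit an XML type (here: a name plus required child elements)."""
        self._types[name] = {
            "definition": definition,
            "required": set(required_children),
            "keywords": {k.lower() for k in keywords},
        }

    def list_types(self):
        """Get a list of all types."""
        return sorted(self._types)

    def query(self, keyword):
        """Query for types by keyword."""
        kw = keyword.lower()
        return sorted(n for n, t in self._types.items() if kw in t["keywords"])

    def validate(self, name, fragment):
        """Crude check: does the fragment have every required child?"""
        children = {child.tag for child in ET.fromstring(fragment)}
        return self._types[name]["required"] <= children

    def membership(self, fragment):
        """Which registered types does the fragment fit?"""
        return [n for n in self.list_types() if self.validate(n, fragment)]


lib = TypeLibrary()
lib.submit("PotentialModel",
           "A set of components at the source code level that can "
           "potentially form an executable model.",
           required_children=["component"], keywords=["model"])
lib.submit("GridTile", "(placeholder definition)",
           required_children=["descriptor"], keywords=["grid"])

fragment = "<tile><descriptor/><name>t1</name></tile>"
is_tile = lib.validate("GridTile", fragment)
fits = lib.membership(fragment)
```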

25 How does an XML Type Library work? What metadata is available per type? Definition (e.g., XML Schema complexType); SKOS glossary entry (for queries); example usage scenarios; dependencies on other types; versioning metadata; available operations/web services (“If you have an XML fragment of type X, you can use the following services...”).

26 Use Case: Submit Type (diagram): Existing Schemata → Extract Types → Submit to Type Library. Each extracted type carries its glossary text, e.g. “potential model: A set of components at the source code...”

27 Use Case: Validation (diagram): XML Fragment → Validate (Type Library) → “Valid” or “Invalid”

28 Use Case: Find Services (diagram): XML Fragment → Find Services (Type Library) → list of available services based on type of fragment: Interpolate_Service(), Extract_Variable(), Massage_Data(), Another_Operation().

29 Some Conclusions: With a large amount of metadata activity already in progress, metadata re-use must be a priority. Conceptual understanding is essential: adoption of a glossary of concepts. Syntactic agreement is desirable: concepts assigned concrete XML types and stored in a library.

30 Some Haiku: Retile the Shower / Tessellated Mosaic / First Write a Gridspec. Forever summer / questions and answers / Curator complete. Potential Model / Like a cool autumn breeze / Potentially mad.

31 Extra Slides...

32 Example Gridspec Applications: Not written for one particular application; general grid metadata has many potential uses: IPCC Model Documentation table; moving variables to a common grid for analysis; regridding vertical from 24 to 40 levels. There are two levels, conceptual and syntactic; ideally, we would agree at both of these levels! If we only have conceptual agreement, we can still interoperate, but must do transformations.

33 Type Reuse Scenario (diagram labels): Full Schema; Partial Schemata.

34 Application: NARCCAP Vertical Interpolation. Partial schema of Gridspec.xsd: description of vertical coordinate scheme. Metadata required for NARCCAP experiment: interpolate from 24 to 40 vertical levels.

35 Schema Aggregation Scenario (diagram labels): Schema A, Schema B, Schema C, Schema D; XML Type; Application Schema.
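One way to picture the aggregation scenario: pick named types out of several source schemata and copy them into a new application schema. The sketch below assumes all types are top-level xs:complexType definitions in schemata without target namespaces, and it ignores dependencies between types. The source schemata, the CouplingField type name, and the aggregate() function are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

# Hypothetical source schemata standing in for "Schema A" and "Schema B".
SCHEMA_A = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="GridDescriptor"/>
  <xs:complexType name="Boundary"/>
</xs:schema>"""
SCHEMA_B = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="CouplingField"/>
</xs:schema>"""


def aggregate(schemata, wanted):
    """Copy the named complex types from several schemata into one schema."""
    app = ET.Element(f"{{{XS}}}schema")
    found = set()
    for text in schemata:
        for ctype in ET.fromstring(text).findall(f"{{{XS}}}complexType"):
            name = ctype.get("name")
            if name in wanted and name not in found:
                app.append(ctype)
                found.add(name)
    missing = set(wanted) - found
    if missing:
        raise KeyError(f"types not found: {sorted(missing)}")
    return app


app_schema = aggregate([SCHEMA_A, SCHEMA_B],
                       ["GridDescriptor", "CouplingField"])
names = [t.get("name") for t in app_schema]
app_xml = ET.tostring(app_schema, encoding="unicode")
```

Only the requested types end up in the application schema; Boundary is left behind in Schema A, which is the point of treating types as individually manipulable objects.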

36 Application: Component Compatibility Checking. Sources: NMM Component, Gridspec, Coupling Spec (PMIOD). Application schema contents: technical details (e.g., supported platforms), required coupling fields, horizontal grid descriptor. Together: all metadata required for compatibility checking of two components.

