Metadata Semantics and the Earth System Curator
Rocky Dunlap, Earth System Curator, Georgia Tech

Earth System Curator
A 3-year, NSF-funded project.
Funded collaborators:
- Cecelia DeLuca (NCAR, PI)
- Balaji (GFDL, Co-PI)
- Don Middleton (NCAR, Co-PI)
- Chris Hill (MIT, Co-PI)
- Spencer Rugaber (Ga Tech, Co-PI)
- Leo Mark (Ga Tech)
- Julien Chastang (NCAR)
- Sergey Nikonov (GFDL)
- Angela Navarro (Ga Tech)
- Me (Ga Tech)
Also working with:
- Lois and Katherine (NMM)
- Sophie Valcke (PRISM/OASIS)
- Others...

Curator Doctrine
- There is currently a gap in the way we treat models and datasets (are they really so different?).
- The best description of a dataset is a comprehensive description of the model run that created it (plus post-processing).
- Model components are data objects for exchange.
- Metadata-centric view: don't start with a dataset and try to find the metadata. Start with good metadata that leads you to the datasets you want, even if they don't yet exist! (No, really, that's how we think.)
- Haiku are a valid form of model metadata.

Earth System Curator Applications (Proofs of Concept)
- A catalog of modeling components along with comprehensive metadata: CDP Curator (Michael B., Don, Luca, Julien).
- Demonstrate compatibility checking of components, primarily "technical" compatibility: platforms, compilers, required fields, field data types, calendar/time.
- Demonstrate auto-generation of a coupler component based on metadata.
- Demonstrate automation of workflow tasks: model assembly, execution, archiving, post-processing.
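To make "technical" compatibility checking concrete, here is a minimal sketch that compares two components' metadata on the criteria listed above. The record layout (`platforms`, `compilers`, `calendar`, `provides`, `requires`) and the sample atmosphere/ocean values are hypothetical illustrations, not the actual Curator or NMM schema.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentMetadata:
    """Hypothetical technical metadata for one model component."""
    name: str
    platforms: set[str]
    compilers: set[str]
    calendar: str
    # Fields this component exports, mapped to their data type.
    provides: dict[str, str] = field(default_factory=dict)
    # Fields this component needs from its coupling partner.
    requires: dict[str, str] = field(default_factory=dict)


def check_compatibility(a: ComponentMetadata, b: ComponentMetadata) -> list[str]:
    """Return human-readable conflicts; an empty list means none were found."""
    problems = []
    if not a.platforms & b.platforms:
        problems.append("no common platform")
    if not a.compilers & b.compilers:
        problems.append("no common compiler")
    if a.calendar != b.calendar:
        problems.append(f"calendar mismatch: {a.calendar} vs {b.calendar}")
    # Every field one side requires must be provided by the other, with the same type.
    for needer, giver in ((a, b), (b, a)):
        for fname, ftype in needer.requires.items():
            if fname not in giver.provides:
                problems.append(f"{needer.name} requires {fname}, which {giver.name} does not provide")
            elif giver.provides[fname] != ftype:
                problems.append(f"{fname}: {needer.name} expects {ftype}, "
                                f"{giver.name} provides {giver.provides[fname]}")
    return problems


atm = ComponentMetadata("atmosphere", {"linux-x86_64"}, {"gfortran", "ifort"}, "noleap",
                        provides={"surface_wind": "real*8"}, requires={"sst": "real*8"})
ocn = ComponentMetadata("ocean", {"linux-x86_64"}, {"ifort"}, "noleap",
                        provides={"sst": "real*4"}, requires={"surface_wind": "real*8"})
print(check_compatibility(atm, ocn))
# ['sst: atmosphere expects real*8, ocean provides real*4']
```

Returning a list of conflicts rather than a single boolean keeps the result useful for a catalog interface, which can show the user why two components do not fit.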

Schema Development Fun
To accomplish these goals, we need comprehensive descriptions of climate models: model metadata.
- Model metadata includes both “semantic” and “syntactic” elements (“discovery” vs. “use”).
- Semantic: component name, type, owner, description, source code location, component architecture of the model, platform, framework.
- Syntactic: parameter settings, input datasets, boundary conditions, coupling details, grid coordinates.

Lots of schemata...
- Component (NMM)
- Potential Model (NMM/Curator)
- Model (NMM)
- PMIOD/SMIOC (PRISM coupling spec)
- CRE/Curator Complete (workflow)
- Application (NMM)
- Gridspec

Reminiscing on Metadata Development
Observations:
- (It seems) much of the community supports metadata development, although opinions differ on how comprehensive it should be.
- People use metadata for different reasons:
  - Annotating large datasets for retrieval
  - Informing analysis tools
  - Archiving modeling components
  - Automating workflow (runtime environment)
  - Exchanging datasets
- Each application requires different (but often overlapping) metadata.

How should we think about schemata?
Schemata are typically written for applications: I have a particular task I want to accomplish. What metadata do I need to accomplish it? Write a schema.
But...
- Now we have lots of schemata sitting around.
- They may contain overlapping information, with different ways of expressing the same information.
- Each schema is used for a small number of tasks and understood by a small number of applications.
- We may need to reference elements in another schema, or aggregate elements from multiple schemata.

A Unified View of Metadata
Given all of the current metadata development efforts, Curator is promoting a unified view of metadata.
- Metadata reuse must be a priority.
- Metadata aggregation is key: schemata are built (generated!) from a repository of existing metadata elements (let's call them types).
- We must think conceptually first and then syntactically; ideally, all groups will agree at both levels.

What’s In a Schema?
An XML Schema (e.g., gridspec.xsd) contains XML types such as GridTile, ContactRegion, Boundary, and GridDescriptor. These are both syntactic and conceptual constructs.
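Since the reusable units are the named types inside a schema, a first step toward reuse is simply enumerating them. The sketch below does that with Python's standard `xml.etree.ElementTree`; the four type names come from the slide, but the schema body is a hypothetical, heavily trimmed stand-in for gridspec.xsd.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical, heavily trimmed stand-in for gridspec.xsd.
GRIDSPEC_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="GridTile"/>
  <xs:complexType name="ContactRegion"/>
  <xs:complexType name="Boundary"/>
  <xs:complexType name="GridDescriptor"/>
  <xs:element name="gridDescriptor" type="GridDescriptor"/>
</xs:schema>"""


def named_types(xsd_text: str) -> list[str]:
    """Return the names of the top-level complexTypes declared in a schema."""
    root = ET.fromstring(xsd_text)
    return [ct.get("name") for ct in root.findall(f"{XS}complexType")]


print(named_types(GRIDSPEC_XSD))
# ['GridTile', 'ContactRegion', 'Boundary', 'GridDescriptor']
```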

Re-using schema elements
How do I best use/re-use metadata elements from (multiple) schema(ta) to accomplish my particular application? You need:
- A conceptual understanding of the “types” (concepts) in the schema → Glossary
- The syntactic representation of that type (so you can actually use it in implementations) → XML Type Library
WE ARE HERE

Multi-Schema Semantic Glossary
A community-wide glossary of metadata types/concepts from multiple schemata.
- Concepts are aggregated into a centralized glossary.
- Schema authors and users can get explanations/definitions of metadata elements. Examples:
  - What does the contact_region tag mean in the Gridspec schema?
  - What goes under the intent tag in the PMIOD?
  - What is a potential model anyway?

Multi-Schema Semantic Glossary
For each metadata concept, provide:
- Human-readable definition
- Source schema
- Example usage
- Change notes/provenance
- Semantic relationships with other concepts (e.g., broader than, narrower than, part of, parent of, synonym, etc.)

Glossary Design
- Schema authors embed descriptions directly inside each XML schema.
  - This keeps the human-readable definitions close to the formal syntactic definitions.
  - When a schema is updated, it is easy to update the glossary.
- Glossary entries from distributed schemata are harvested (nightly?) and placed into a centralized glossary (alternatively, live access?).
- A simple interface allows users to query the glossary for concepts.
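One way to embed definitions in a schema is XML Schema's standard `xs:annotation`/`xs:documentation` mechanism; whether Curator used exactly this is an assumption here. Under that assumption, the harvesting step can be sketched as below. The schema file name `curator.xsd` and its contents are hypothetical; only the definition text comes from the slides.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical annotated schema; the definition text comes from the slides.
CURATOR_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="PotentialModel">
    <xs:annotation>
      <xs:documentation>A set of components at the source code level that can
        potentially form an executable model.</xs:documentation>
    </xs:annotation>
  </xs:complexType>
  <xs:complexType name="Undocumented"/>
</xs:schema>"""


def harvest(xsd_text: str) -> dict[str, str]:
    """Map each documented complexType name to its definition text."""
    entries = {}
    for ct in ET.fromstring(xsd_text).findall(f"{XS}complexType"):
        doc = ct.find(f"{XS}annotation/{XS}documentation")
        # Collapse the whitespace introduced by XML indentation.
        text = " ".join((doc.text or "").split()) if doc is not None else ""
        if text:  # skip types the author has not documented yet
            entries[ct.get("name")] = text
    return entries


def harvest_all(schemata: dict[str, str]) -> dict[tuple[str, str], str]:
    """Centralized glossary keyed by (source schema, type name), so that
    identically named types from different schemata do not collide."""
    glossary = {}
    for schema_name, xsd_text in schemata.items():
        for type_name, definition in harvest(xsd_text).items():
            glossary[(schema_name, type_name)] = definition
    return glossary


# A nightly job could call harvest_all over every registered schema.
glossary = harvest_all({"curator.xsd": CURATOR_XSD})
print(glossary[("curator.xsd", "PotentialModel")])
```

Keying entries by source schema as well as type name matters because several schemata in this talk define a "Model".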

Glossary Design
- Glossary entries use the Simple Knowledge Organization System (SKOS) data model.
- SKOS supports knowledge organization systems such as glossaries, thesauri, taxonomies, etc.
- SKOS is RDF based: it moves the community toward languages with richer semantics (eventually reaching down to the dataset level).

Sample SKOS RDF (Basic)
Label: potential model
Definition: A set of components at the source code level that can potentially form an executable model....
Where should glossary entries be stored?

Example Annotated Schema...
Annotation for potential model: A set of components at the source code level that can potentially form an executable model....

Sample SKOS RDF Triples
esc:PotentialModel rdf:type skos:Concept
esc:PotentialModel skos:prefLabel ‘potential model’
esc:PotentialModel skos:definition ‘A set of components at the source code level that can potentially form an executable model.’
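The three triples on this slide can be written out in N-Triples syntax with nothing but string formatting, as sketched below; in practice an RDF library such as rdflib would normally handle serialization. The RDF and SKOS namespace URIs are the standard W3C ones, but the URI for the `esc` prefix is a hypothetical placeholder, since the slides do not give one.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
# Hypothetical namespace for the "esc" prefix; the slides do not give its URI.
ESC = "http://example.org/esc#"


def literal(text: str, lang: str = "en") -> str:
    """Quote a string as an N-Triples language-tagged literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}'


def concept_triples(local_name: str, pref_label: str, definition: str) -> list[str]:
    """The three triples from the slide, one N-Triples statement per line."""
    subject = f"<{ESC}{local_name}>"
    return [
        f"{subject} <{RDF}type> <{SKOS}Concept> .",
        f"{subject} <{SKOS}prefLabel> {literal(pref_label)} .",
        f"{subject} <{SKOS}definition> {literal(definition)} .",
    ]


triples = concept_triples(
    "PotentialModel",
    "potential model",
    "A set of components at the source code level that can potentially form an executable model.",
)
print("\n".join(triples))
```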

Other SKOS Fields
model: The root element of an NMM Model description. There is one model per XML file. This model can have one or more related component configurations.
simulation, job, run
UK Met Office Unified Model
The label 'model' was changed from NMM_Model. (Katherine Bouton)

Semantic Relationships
Diagram linking esc:PotentialModel, nmm:Component, nmm:Model, and prism:Model through skosx:childOf, skos:related, and skos:synonym relationships.
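The following is a minimal sketch of how such relationships could drive "links to related concepts" in a glossary. The exact edges in the slide's diagram cannot be recovered, so the three edges below are a hypothetical reading, not a transcription. `skos:related` is symmetric in the SKOS specification; treating the slide's `skos:synonym` as symmetric is an assumption, and `skosx:childOf` is kept one-directional.

```python
from collections import defaultdict

# Hypothetical edges: one possible reading of the slide's diagram.
EDGES = [
    ("esc:PotentialModel", "skos:related", "nmm:Model"),
    ("nmm:Model", "skos:synonym", "prism:Model"),
    ("nmm:Component", "skosx:childOf", "esc:PotentialModel"),
]
# Relations assumed to hold in both directions.
SYMMETRIC = {"skos:related", "skos:synonym"}


def build_index(edges):
    """Map each concept to its outgoing (predicate, object) pairs."""
    index = defaultdict(list)
    for subj, pred, obj in edges:
        index[subj].append((pred, obj))
        if pred in SYMMETRIC:
            index[obj].append((pred, subj))
    return index


def links(index, concept):
    """Relationships to display as links for one concept."""
    return sorted(index.get(concept, []))


index = build_index(EDGES)
print(links(index, "nmm:Model"))
# [('skos:related', 'esc:PotentialModel'), ('skos:synonym', 'prism:Model')]
```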

Putting it all Together
More info: id=54&Itemid=84

Glossary Interface
Screenshot panels: Search; Schemata to Include; Concept List; Concept Details; Links to related concepts.
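Behind the Search and "Schemata to Include" panels sits a simple filter. The sketch below assumes case-insensitive substring matching on label or definition, restricted to the selected schemata; the matching rule and the schema names "Curator" and "NMM" as filter keys are illustrative assumptions, while the two definitions are taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    label: str
    definition: str
    source_schema: str


def search(entries: list[Entry], text: str, schemata: set[str]) -> list[Entry]:
    """Case-insensitive substring match on label or definition,
    restricted to the schemata the user chose to include."""
    needle = text.lower()
    return [
        e for e in entries
        if e.source_schema in schemata
        and (needle in e.label.lower() or needle in e.definition.lower())
    ]


ENTRIES = [
    Entry("potential model",
          "A set of components at the source code level that can potentially "
          "form an executable model.", "Curator"),
    Entry("model", "The root element of an NMM Model description.", "NMM"),
]
print([e.label for e in search(ENTRIES, "model", {"NMM"})])
# ['model']
```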

Syntactic Metadata Re-use
So, if we agree on the concepts, what about the syntax (i.e., the XML representation)?
- Concept = XML Type
- How do we share XML types from multiple schemata across the community?
- One idea: an XML Type Library (or Catalog, or Repository). “Preliminary Research”
- This is NOT the same thing as a single complex schema that describes everything: types are first-class objects and can be manipulated individually.

How does an XML Type Library work?
Operations (web service?):
- Submit an XML type
- Get a list of all types
- Query for types
- Validate a type (Is my XML fragment a valid X?)
- Type membership (What types does my XML fragment fit?)
- Generate an XML Schema
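The operations above can be sketched as an in-memory class. This is a deliberate simplification, not the proposed design: a "type" here is just a root element name plus the child elements it must contain, whereas a real library would store full `complexType` definitions and validate with a schema processor. The element names (`potentialModel`, `component`, `componentConfiguration`) are hypothetical; namespaced fragments are not handled.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XSD)


@dataclass(frozen=True)
class XmlType:
    """Simplified type: a root element plus the child elements it must contain."""
    name: str
    root: str
    required_children: tuple[str, ...]
    definition: str = ""


class TypeLibrary:
    def __init__(self) -> None:
        self._types: dict[str, XmlType] = {}

    def submit(self, t: XmlType) -> None:
        if t.name in self._types:
            raise ValueError(f"type {t.name!r} is already registered")
        self._types[t.name] = t

    def list_types(self) -> list[str]:
        return sorted(self._types)

    def query(self, text: str) -> list[str]:
        needle = text.lower()
        return [n for n in self.list_types()
                if needle in n.lower() or needle in self._types[n].definition.lower()]

    def validate(self, fragment: str, type_name: str) -> bool:
        """Is my XML fragment a valid <type_name>?"""
        t = self._types[type_name]
        elem = ET.fromstring(fragment)
        present = {child.tag for child in elem}
        return elem.tag == t.root and set(t.required_children) <= present

    def membership(self, fragment: str) -> list[str]:
        """What types does my XML fragment fit?"""
        return [n for n in self.list_types() if self.validate(fragment, n)]

    def generate_schema(self, type_names: list[str]) -> str:
        """Emit an XML Schema declaring the chosen types (children are typed
        as xs:string purely for illustration)."""
        schema = ET.Element(f"{{{XSD}}}schema")
        for n in type_names:
            t = self._types[n]
            ct = ET.SubElement(schema, f"{{{XSD}}}complexType", name=t.name)
            seq = ET.SubElement(ct, f"{{{XSD}}}sequence")
            for child in t.required_children:
                ET.SubElement(seq, f"{{{XSD}}}element", name=child, type="xs:string")
        return ET.tostring(schema, encoding="unicode")


lib = TypeLibrary()
lib.submit(XmlType("PotentialModel", "potentialModel", ("component",),
                   "A set of components at the source code level that can "
                   "potentially form an executable model."))
lib.submit(XmlType("Model", "model", ("componentConfiguration",),
                   "The root element of an NMM Model description."))
fragment = "<potentialModel><component/><component/></potentialModel>"
print(lib.validate(fragment, "PotentialModel"))  # True
print(lib.membership(fragment))                  # ['PotentialModel']
```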

How does an XML Type Library work?
What metadata is available per type?
- Definition (e.g., XML Schema complexType)
- SKOS glossary entry (for queries)
- Example usage scenarios
- Dependencies on other types
- Versioning metadata
- Available operations/web services: “If you have an XML fragment of type X, you can use the following services...”

Use Case: Submit Type
Existing Schemata → Extract Types → Submit to Type Library
(Each existing schema carries annotated types, e.g., potential model: A set of components at the source code...)

Use Case: Validation
XML Fragment → Validate (Type Library...) → “Valid” or “Invalid”

Use Case: Find Services
XML Fragment → Find Services (Type Library...) → list of available services based on the type of the fragment, e.g.:
- Interpolate_Service()
- Extract_Variable()
- Massage_Data()
- Another_Operation()
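A minimal sketch of the service lookup, assuming each registered service declares which library types it accepts. The service names come from the slide (where they appear to be placeholders); the type `Dataset`, the root-element-to-type mapping, and which service accepts which type are all hypothetical. A real library would classify the fragment with its type-membership operation rather than by root tag alone.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a fragment's root element to a library type.
ROOT_TO_TYPE = {"gridDescriptor": "GridDescriptor", "dataset": "Dataset"}

# Hypothetical registry: which operations accept which types.
SERVICES = {
    "Interpolate_Service": {"GridDescriptor"},
    "Extract_Variable": {"Dataset"},
    "Massage_Data": {"Dataset"},
}


def fragment_types(fragment: str) -> set[str]:
    """Stand-in for type membership: classify by root element only."""
    root = ET.fromstring(fragment)
    t = ROOT_TO_TYPE.get(root.tag)
    return {t} if t else set()


def find_services(fragment: str) -> list[str]:
    """List the services that accept at least one type the fragment fits."""
    types = fragment_types(fragment)
    return sorted(name for name, accepted in SERVICES.items() if accepted & types)


print(find_services("<dataset/>"))
# ['Extract_Variable', 'Massage_Data']
```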

Some Conclusions
- With a large amount of metadata activity already in progress, metadata re-use must be a priority.
- Conceptual understanding is essential: adopt a glossary of concepts.
- Syntactic agreement is desirable: assign concepts concrete XML types and store them in a library.

Some Haiku
Retile the Shower / Tessellated Mosaic / First Write a Gridspec
Forever summer / questions and answers / Curator complete
Potential Model / Like a cool autumn breeze / Potentially mad

Extra Slides...

Example Gridspec Applications
- Gridspec was not written for one particular application: general grid metadata has many potential uses, e.g.:
  - IPCC model documentation table
  - Moving variables to a common grid for analysis
  - Regridding vertically from 24 to 40 levels
- There are two levels, conceptual and syntactic. Ideally, we would agree at both of these levels!
- If we only have conceptual agreement, we can still interoperate, but must perform transformations.

Type Reuse Scenario
Diagram labels: Full Schema; Partial Schemata

Application: NARCCAP Vertical Interpolation
Gridspec.xsd → Partial Schema: description of the vertical coordinate scheme
Metadata required for the NARCCAP experiment: interpolate from 24 to 40 vertical levels.
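To show what the vertical coordinate metadata is ultimately used for, here is a minimal sketch of column-wise interpolation from 24 to 40 levels. It assumes plain linear interpolation in height with end values held constant outside the source range (the same behavior as `numpy.interp` for increasing coordinates); the method NARCCAP actually used is not stated in the slides, and real models often interpolate in pressure or log-pressure instead. The idealized level spacing and temperature profile are illustrative only.

```python
from bisect import bisect_right


def interp_column(src_levels, values, dst_levels):
    """Linearly interpolate one column of values onto new levels.

    src_levels must be strictly increasing. Targets outside the source range
    take the nearest end value (no extrapolation)."""
    if len(src_levels) != len(values):
        raise ValueError("levels and values differ in length")
    if any(b <= a for a, b in zip(src_levels, src_levels[1:])):
        raise ValueError("source levels must be strictly increasing")
    out = []
    for z in dst_levels:
        if z <= src_levels[0]:
            out.append(values[0])
        elif z >= src_levels[-1]:
            out.append(values[-1])
        else:
            i = bisect_right(src_levels, z)  # src_levels[i-1] <= z < src_levels[i]
            z0, z1 = src_levels[i - 1], src_levels[i]
            v0, v1 = values[i - 1], values[i]
            out.append(v0 + (v1 - v0) * (z - z0) / (z1 - z0))
    return out


# 24 source levels every 1000 m; 40 target levels spanning the same range.
src = [1000.0 * k for k in range(24)]
temps = [288.0 - 0.0065 * z for z in src]  # idealized linear lapse rate (K)
dst = [23000.0 * k / 39 for k in range(40)]
new = interp_column(src, temps, dst)
print(len(new))  # 40
```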

Schema Aggregation Scenario
Schema A, Schema B, Schema C, Schema D → XML Types → Application Schema
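The aggregation step can be sketched as picking named `complexType` elements out of several source schemata and copying them into a new application schema. The schema contents and type names below are hypothetical stand-ins for Schema A and Schema B. This sketch does not follow dependencies (types referenced by other types) or resolve name clashes; the per-type dependency metadata described earlier would be needed for that.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XSD)

# Hypothetical source schemata standing in for Schema A and Schema B.
SCHEMA_A = f"""<xs:schema xmlns:xs="{XSD}">
  <xs:complexType name="Component"/>
  <xs:complexType name="Platform"/>
</xs:schema>"""
SCHEMA_B = f"""<xs:schema xmlns:xs="{XSD}">
  <xs:complexType name="GridDescriptor"/>
</xs:schema>"""


def aggregate(schemata: dict[str, str], wanted: list[tuple[str, str]]) -> str:
    """Build an application schema from (schema name, type name) picks."""
    # Index every top-level complexType by (schema name, type name).
    pool = {}
    for schema_name, text in schemata.items():
        for ct in ET.fromstring(text).findall(f"{{{XSD}}}complexType"):
            pool[(schema_name, ct.get("name"))] = ct
    app = ET.Element(f"{{{XSD}}}schema")
    for key in wanted:
        if key not in pool:
            raise KeyError(f"{key[1]} not found in {key[0]}")
        app.append(pool[key])
    return ET.tostring(app, encoding="unicode")


print(aggregate({"A": SCHEMA_A, "B": SCHEMA_B},
                [("A", "Component"), ("B", "GridDescriptor")]))
```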

Application: Component Compatibility Checking
The application schema draws on three sources:
- NMM Component: technical details (e.g., supported platforms)
- Coupling Spec (PMIOD): required coupling fields
- Gridspec: horizontal grid descriptor
Together, these provide all the metadata required for compatibility checking of two components.