TWC Experience in ontology engineering with the Global Change Information System Xiaogang (Marshall) Ma Tetherless World Constellation Rensselaer Polytechnic.


Experience in ontology engineering with the Global Change Information System
Xiaogang (Marshall) Ma
Tetherless World Constellation, Rensselaer Polytechnic Institute
Presentation for the ESIP Semantic Web Cluster, 4/22/2014

Acknowledgements
Project:
– Global Change Information System: Information Model and Semantic Application Prototypes, funded by NSF through UCAR
Collaborators:
– Peter Fox (PI, TWC/RPI)
– Curt Tilmes (Co-PI, NASA/USGCRP)
– Xiaogang (Marshall) Ma (Project lead, TWC/RPI)
– Jin Guang Zheng (TWC/RPI)
– Justin Goldstein (USGCRP/UCAR)
– Stephan Zednik (TWC/RPI)
– Linyun Fu (TWC/RPI)
– Brian Duggan (USGCRP/UCAR)
– Steve Aulenbach (USGCRP/UCAR)
– Patrick West (TWC/RPI)

Contents
1. Ontologies in computer science
2. The GCIS Ontology
3. Experience from ontology engineering practice
4. Additional operations and tools to refine an ontology

1. Ontologies in computer science
An ontology spectrum
[Figure: an ontology spectrum. Italic text in the figure explains typical features of concepts and relationships in each ontology type.]
(from Ma 2011, adapted from Borgo et al., 2005; McGuinness, 2003; Obrst, 2003; Uschold and Gruninger, 2004; Welty, 2002)

A few examples following that spectrum
Catalog/Glossary
– Neuendorf, K.K.E., Mehl, J.J.P., Jackson, J.A., Glossary of Geology, 5th edition. American Geological Institute: Alexandria, VA, USA, 800 pp. See latest version at:
Taxonomy
– BGS Rock Classification Scheme, see:
Thesaurus
– AQSIQ, GB/T The Terminology Classification Codes of Geology and Mineral Resources. General Administration of Quality Supervision, Inspection and Quarantine of P.R. China (AQSIQ). Standards Press of China, Beijing, China pp. (In CN&EN)
Conceptual Schema
– NADM Steering Committee, NADM Conceptual Model 1.0: A conceptual model for geologic map information: U.S. Geological Survey Open-File Report , North American Geologic Map Data Model (NADM) Steering Committee, Reston, VA, USA, 58 pp. See:
Ontologies encoded in RDF format
– Semantic Web for Earth and Environmental Terminology (SWEET). See:

Another dimension of ontologies
Ontologies according to their level of dependence on a particular task or point of view (Guarino, 1997):
– Top-level ontologies describe very general concepts like space, time, matter, object, event, action, etc., which are independent of a particular problem or domain
– Domain ontologies and task ontologies describe, respectively, the vocabulary related to a generic domain (e.g., medicine) or a generic task or activity (e.g., diagnosing)
– Application ontologies describe concepts depending both on a particular domain and task, which are often specializations of both the related ontologies
[Figure: top-level ontology, domain ontology, task ontology and application ontology, connected by "specialization of" arrows]

A few examples following that dimension
Top-level ontology
– DOLCE: Descriptive Ontology for Linguistic and Cognitive Engineering, see:
Domain ontologies and task ontologies
– PROV-O: The W3C PROV Ontology (for representing and interchanging provenance information), see:
– BIBO: The Bibliographic Ontology, see:
– ORG: The Organization Ontology, see:
– DCAT: The Data Catalog Vocabulary, see:
Application ontology
– GCIS: The GCIS Ontology, see: http://tw.rpi.edu/web/project/gcis-imsap/GCISOntology

A few methods for ontology engineering
Ontology Design Patterns
– Widely used are Content Ontology Design Patterns: small ontologies that mediate between use cases and ontology design solutions (Gangemi and Presutti, 2009)
Agile Methods for Software Engineering
– Adaptive planning; evolutionary development; a time-boxed iteration; and rapid and flexible response to change (Cohen et al., 2004)
Use case-driven iterative approach
– Use cases for identifying questions, resources & methods; small team & mixed skills; a context for collaboration between computer scientists & domain scientists; review & iteration; rapid prototype (Fox and McGuinness, 2008)

The use case-driven iterative approach
More details at:

2. The GCIS Ontology
Global Change Information System (GCIS)
– An information system under development through the United States Global Change Research Program (USGCRP) that establishes data interfaces and interoperable repositories of climate and global change data which can be easily and efficiently accessed, integrated with other data sets, maintained over time and expanded as needed into the future
GCIS Ontology
– An application ontology designed for representing and capturing provenance information in GCIS
– Currently focusing on the third National Climate Assessment draft report (draft NCA3)
– More information: http://tw.rpi.edu/web/project/gcis-imsap/GCISOntology

Ontology reuse: improve interoperability
– PROV-O: W3C Provenance Ontology
– DCTerms: Dublin Core Metadata Terms
– DCType: Dublin Core Types
– FOAF: Friend Of A Friend Vocabulary
– BIBO: Bibliographic Ontology
– ORG: Organization Ontology
– SKOS: Simple Knowledge Organization System
– OWL: Web Ontology Language
– RDF: Resource Description Framework
– RDFS: RDF Schema
– XSD: XML Schema

[Slide 12: only the vocabulary labels survive: PROV-O, DCTerms, DCType, FOAF, BIBO, ORG, SKOS, OWL, RDF, RDFS, xsd]
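As a rough illustration of what reusing these vocabularies means in practice, the sketch below maps each prefix to its published namespace IRI and expands compact names such as prov:Entity. The prefix spellings dcterms and dctype are conventional choices made here, not taken from the slides, and the gcis namespace is left out because its IRI is not given in this transcript.

```python
# Published namespace IRIs for the vocabularies listed on the slide.
# The prefix names are conventions chosen for this sketch.
PREFIXES = {
    "prov": "http://www.w3.org/ns/prov#",
    "dcterms": "http://purl.org/dc/terms/",
    "dctype": "http://purl.org/dc/dcmitype/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "bibo": "http://purl.org/ontology/bibo/",
    "org": "http://www.w3.org/ns/org#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(curie, prefixes=PREFIXES):
    """Expand a compact name such as 'prov:Entity' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


def turtle_header(prefixes=PREFIXES):
    """Render the prefix table as Turtle @prefix declarations."""
    return "\n".join(
        f"@prefix {name}: <{iri}> ." for name, iri in sorted(prefixes.items())
    )


if __name__ == "__main__":
    print(expand("prov:Entity"))
    print(turtle_header())
```

The header produced by turtle_header() is the kind of block that typically opens a Turtle file before any classes or properties are declared.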

Ontology engineering: use case analysis
The first use case
Title: Visit data center website of dataset used to generate a report figure
Actor and system: a reader of the draft NCA3 on the GCIS website
Flow of interactions: A reader wishes to identify the source of the data used to produce a particular figure in the draft NCA3. A reference to the paper in which the image contained in this figure was originally published appears in the figure caption. Clicking that reference displays a page of metadata information about the paper, including links to the datasets used in that paper. Pursuing each of those links presents a page of metadata information about the dataset, including a link back to the agency/data center web page describing the dataset in more detail and making the actual data available for order or download.
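The flow of interactions in this use case can be read as a walk over linked records: figure, then paper, then datasets, then data center pages. The sketch below shows that walk over a handful of triples. All identifiers and property names (ex:figure1, ex:hasReference, ex:usesDataset, ex:landingPage) are hypothetical placeholders invented for the example and are not terms of the GCIS ontology.

```python
# Hypothetical records; the ex: names are placeholders, not GCIS terms.
TRIPLES = [
    ("ex:figure1", "ex:hasReference", "ex:paper1"),
    ("ex:paper1", "ex:usesDataset", "ex:datasetA"),
    ("ex:paper1", "ex:usesDataset", "ex:datasetB"),
    ("ex:datasetA", "ex:landingPage", "ex:dataCenterPageA"),
    ("ex:datasetB", "ex:landingPage", "ex:dataCenterPageB"),
]

# The reader's clicks, expressed as a sequence of properties to follow.
PATH = ["ex:hasReference", "ex:usesDataset", "ex:landingPage"]


def follow(triples, start, path):
    """Follow each property in path in turn, starting from one resource."""
    frontier = {start}
    for predicate in path:
        frontier = {
            o for (s, p, o) in triples if s in frontier and p == predicate
        }
    return sorted(frontier)


if __name__ == "__main__":
    for page in follow(TRIPLES, "ex:figure1", PATH):
        print(page)
```

Each step of the path corresponds to one link the reader clicks, which is why the use case is a convenient source of candidate classes and properties.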

Use case analysis: Concept map
Concept map
– Graphical tool for organizing and representing knowledge (Novak and Cañas, 2008)
– Often used as the first step in information models that are precursors to ontology engineering (Starr and de Oliveira, 2013)
The IHMC CmapTools is widely used for use case analysis in Semantic Web applications, see:

An intuitive concept map of the 1st use case

Classes and properties recognized from the use case
[Figure: an intuitive concept map of the use case]

Classes and properties recognized from the use case
[Figure: an intuitive concept map of the use case]
From an intuitive model to an ontology:
(1) A defined class or property should be meaningful and robust enough to meet the requirements of various use cases
(2) An ontology can be extended by adding classes and properties recognized from new use cases through the iterative approach

The second use case
Title: Identify roles of people in the generation of a chapter in the draft NCA3
Actor and system: a viewer of the GCIS website
Flow of interactions: A viewer sees that Chapter 6 (Agriculture) in the draft NCA3 was written by a group of authors mentioned in a list. On the title page of that chapter the reader can view the role of each author, e.g., convening lead author, lead author or contributing author, in the generation of this report chapter.
We decided to use the PROV-O ontology to describe this use case

The three Starting Point classes in the PROV-O ontology and the properties that relate them
Source:

Mapping the use case into PROV-O
[Figure: "Writing of Chapter 6 in NCA3", "Chapter 6 in NCA3" and "Author of Chapter 6" mapped by isA links onto the PROV-O Starting Point classes]

Roles of agents in an activity in PROV-O
Source:

Mapping roles of chapter authors into PROV-O
[Figure: "Writing of Chapter 6 in NCA3" and "Author of Chapter 6" mapped by isA links onto PROV-O classes; "Convening lead author", "Lead author" and "Contributing author" mapped by isA links as roles]

Roles of people in the activity ‘Writing of Chapter 6’
Here only three of the eight authors of this chapter are shown. Each author had a specific role for this chapter.
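One way to picture the role mapping is PROV-O's qualified association pattern, in which an activity points to a prov:Association node that names both the agent and the role it played. The sketch below stores such a pattern as plain triples and reads the roles back. The prov: terms are from PROV-O; every ex: identifier (the authors, the role individuals, the activity) and the _:assoc node labels are hypothetical placeholders, and the slides do not confirm that GCIS uses exactly this pattern.

```python
# prov: terms come from PROV-O; ex: names and _:assoc labels are placeholders.
TRIPLES = [
    ("ex:writingOfChapter6", "rdf:type", "prov:Activity"),
    ("ex:chapter6", "rdf:type", "prov:Entity"),
    ("ex:chapter6", "prov:wasGeneratedBy", "ex:writingOfChapter6"),
    # One prov:Association node per author ties the agent to a role.
    ("ex:writingOfChapter6", "prov:qualifiedAssociation", "_:assoc1"),
    ("_:assoc1", "rdf:type", "prov:Association"),
    ("_:assoc1", "prov:agent", "ex:author1"),
    ("_:assoc1", "prov:hadRole", "ex:conveningLeadAuthor"),
    ("ex:writingOfChapter6", "prov:qualifiedAssociation", "_:assoc2"),
    ("_:assoc2", "rdf:type", "prov:Association"),
    ("_:assoc2", "prov:agent", "ex:author2"),
    ("_:assoc2", "prov:hadRole", "ex:leadAuthor"),
    ("ex:writingOfChapter6", "prov:qualifiedAssociation", "_:assoc3"),
    ("_:assoc3", "rdf:type", "prov:Association"),
    ("_:assoc3", "prov:agent", "ex:author3"),
    ("_:assoc3", "prov:hadRole", "ex:contributingAuthor"),
]


def objects(triples, subject, predicate):
    """Return all objects of triples matching subject and predicate."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


def roles_in(triples, activity):
    """Return {agent: role} for each qualified association of an activity."""
    result = {}
    for assoc in objects(triples, activity, "prov:qualifiedAssociation"):
        for agent in objects(triples, assoc, "prov:agent"):
            for role in objects(triples, assoc, "prov:hadRole"):
                result[agent] = role
    return result


if __name__ == "__main__":
    for agent, role in roles_in(TRIPLES, "ex:writingOfChapter6").items():
        print(agent, "had role", role)
```

Keeping the role on the association node, rather than on the author, lets the same person hold different roles in different chapters.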

Re-using existing ontologies for the GCIS ontology
With such mappings we can use reasoners that work with the PROV-O ontology, and thus retrieve provenance graphs from the established GCIS

We carried out further use case analyses to build the GCIS ontology

3. Experience from ontology engineering practice
Informal message: Sometimes, a method is not a method at all.

3. Experience from ontology engineering practice
For human: A modeling approach
– Transform the knowledge in our brains into a list of concepts and their inter-relationships
– Level of detail: application needs & interoperability (think about the ontology spectrum and the dimension of ontologies)
For machine: An encoding approach
– Record the model in a format that can be used by computers in a specific context: CSV, UML, XML, RDF/XML, Turtle, N3, etc.

For human: concept map helps
– Such as those in preceding slides
For machine: AVOID ontology hijacking
– We should not modify classes/properties that are defined in external ontologies (e.g., those in PROV-O, BIBO, FOAF, ORG, etc.)
For machine: domain and range of properties
– Be careful about this when reusing properties from external ontologies

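For machine: avoid ontology hijacking
For example, we can make such assertions in the GCIS ontology:
[Figure: gcis:Agent linked by rdfs:subClassOf to prov:Agent and foaf:Agent]
And we should avoid such assertions in the GCIS ontology, because they make statements about classes defined in external ontologies:
[Figure: prov:Agent and foaf:Agent linked directly by rdfs:subClassOf]
[Figure: prov:Agent and foaf:Agent linked directly by owl:equivalentClass]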
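A simple screening rule follows from this: in our own ontology file, schema-level statements should have one of our own terms as their subject. The sketch below applies that rule to a few triples. The list of schema-level properties is a working assumption for the example, not an exhaustive definition of hijacking, and the direction of the rdfs:subClassOf link between prov:Agent and foaf:Agent is chosen only for illustration.

```python
# Properties that change the meaning of the class or property they describe.
# This selection is an assumption made for the sketch.
SCHEMA_PREDICATES = {
    "rdfs:subClassOf",
    "rdfs:subPropertyOf",
    "rdfs:domain",
    "rdfs:range",
    "owl:equivalentClass",
    "owl:equivalentProperty",
}

ASSERTIONS = [
    # Acceptable: statements about our own class.
    ("gcis:Agent", "rdfs:subClassOf", "prov:Agent"),
    ("gcis:Agent", "rdfs:subClassOf", "foaf:Agent"),
    # To avoid: statements about classes owned by other ontologies.
    ("prov:Agent", "rdfs:subClassOf", "foaf:Agent"),
    ("prov:Agent", "owl:equivalentClass", "foaf:Agent"),
]


def hijacking_triples(triples, own_prefix="gcis"):
    """Return schema-level triples whose subject is not one of our terms."""
    own = own_prefix + ":"
    return [
        (s, p, o)
        for (s, p, o) in triples
        if p in SCHEMA_PREDICATES and not s.startswith(own)
    ]


if __name__ == "__main__":
    for triple in hijacking_triples(ASSERTIONS):
        print("avoid:", " ".join(triple))
```

Such a check is only a lint: it flags candidates for review and does not replace reading the external ontology's own definitions.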

For machine: domain and range of properties
For example, to use prov:wasGeneratedBy between an instance of gcis:Report and an instance of gcis:ReportGeneration, we should assert that gcis:Report is a subclass of prov:Entity and gcis:ReportGeneration is a subclass of prov:Activity
Definition of :wasGeneratedBy in the W3C PROV Ontology:
    :wasGeneratedBy
        a owl:ObjectProperty ;
        rdfs:domain :Entity ;
        rdfs:range :Activity ;
        rdfs:isDefinedBy ;
        rdfs:subPropertyOf :wasInfluencedBy ;
        …
        :inverse "generated" ;
        :qualifiedForm :Generation, :qualifiedGeneration .
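The reason for these subclass assertions is that a reasoner applies the domain and range of a property to every triple that uses it. The sketch below mimics that step: it derives the types implied by prov:wasGeneratedBy and reports those that the asserted class hierarchy does not already account for. The domain and range values are the ones shown in the PROV-O definition above; the instance names ex:report1 and ex:generation1 are hypothetical.

```python
# Domain and range of prov:wasGeneratedBy, as defined in PROV-O.
DOMAIN = {"prov:wasGeneratedBy": "prov:Entity"}
RANGE = {"prov:wasGeneratedBy": "prov:Activity"}


def superclasses(cls, subclass_of):
    """Return cls together with every class reachable via rdfs:subClassOf."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unexpected_types(triples, types, subclass_of):
    """List (resource, class) pairs entailed by domain/range but not modeled."""
    surprises = []
    for s, p, o in triples:
        for resource, inferred in ((s, DOMAIN.get(p)), (o, RANGE.get(p))):
            if inferred is None:
                continue
            implied = set()
            for cls in types.get(resource, ()):
                implied |= superclasses(cls, subclass_of)
            if inferred not in implied:
                surprises.append((resource, inferred))
    return surprises


# Hypothetical instance data.
TYPES = {
    "ex:report1": ["gcis:Report"],
    "ex:generation1": ["gcis:ReportGeneration"],
}
DATA = [("ex:report1", "prov:wasGeneratedBy", "ex:generation1")]
DECLARED = {
    "gcis:Report": ["prov:Entity"],
    "gcis:ReportGeneration": ["prov:Activity"],
}

if __name__ == "__main__":
    print(unexpected_types(DATA, TYPES, DECLARED))  # hierarchy declared
    print(unexpected_types(DATA, TYPES, {}))        # hierarchy missing
```

With the subclass assertions in place the inferred types match the model; without them the reasoner still infers prov:Entity and prov:Activity, but the ontology gives no hint that this was intended.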

After rounds of use case analysis, we had a concept map for the GCIS ontology:
– http://cmapspublic3.ihmc.us/rid=1MCJMLST0-1G0CSWH-2YH4/GCIS_Ontology_v1_2.cmap
And an RDF file synchronized with the concept map, serialized in Turtle format (.ttl):
– http://escience.rpi.edu/ontology/GCIS-IMSAP/2/GCISOntology_v_1_2.ttl
For more information about the Turtle format, see:

4. Additional operations and tools to refine an ontology
– For machine: ontology syntax check
– For human: ontology documentation
– Namespace prefix: brand your ontology

For machine: ontology syntax check
There are many online tools that help check the syntax of an RDF file:
– Such as the RDF Validator and Converter, see:

For human: ontology documentation
There are several online tools that help generate ontology documentation for humans to read
– Such as the Live OWL Documentation Environment, see:
See a list of similar tools at: roject/SeSF/Working Group/OntologyDocumentation

Namespace prefix: brand your ontology
For the GCIS ontology we use gcis as the namespace prefix
– One can register a namespace prefix and look up existing ones at:

Final output of the GCIS ontology
Ontology documentation
– http://escience.rpi.edu/ontology/GCIS-IMSAP/2/GCISOntology_v_1_2.htm
Concept map
– http://cmapspublic3.ihmc.us/rid=1MCJMLST0-1G0CSWH-2YH4/GCIS_Ontology_v1_2.cmap
Ontology RDF serialized in Turtle format
– http://escience.rpi.edu/ontology/GCIS-IMSAP/2/GCISOntology_v_1_2.ttl

See also
– Ma, X., Fox, P., Tilmes, C., Jacobs, K., Waple, A., Capturing and presenting provenance of global change information. Nature Climate Change. In Press.
– Tilmes, C., Fox, P., Ma, X., McGuinness, D., Privette, A.P., Smith, A., Waple, A., Zednik, S., Zheng, J., Provenance representation for the National Climate Assessment in the Global Change Information System. IEEE Transactions on Geoscience and Remote Sensing 51 (11).
– Ma, X., Fox, P., Recent progress on geologic time ontologies and considerations for future works. Earth Science Informatics 6 (1), 31–46.

Thank you!