1 Observations and Ontologies: Achieving semantic interoperability of environmental and ecological data Mark Schildhauer (1), Shawn Bowers (2), Josh Madin (3), Matt Jones (1) (1) NCEAS, UC Santa Barbara; (2) Gonzaga University; (3) Macquarie University http://sonet.ecoinformatics.org NCEAS-ACEAS Workshop, Brisbane, May 2010

2 Motivation-- Critical questions Need to answer increasingly complex and critical questions about the environment: are the world’s fisheries sustainable? how will climate change impact food production? are GMO crops safe to introduce to the environment? is deforestation accelerating climate change? why are pollinators declining around the world? will nanotech wastes alter ecosystems? what are causes of ocean acidification on reef corals? can we predict the spread of an invasive species? are there tipping points in environmental change?

3 Motivation– Environmental Synthesis Answering complex, critical environmental questions requires integrating and analyzing many types of data: Local to large scale, global coverages Fine-grain, high-resolution Physical context: land-use/land-cover, geology, soils, atmosphere, hydrology, oceanography Biotic context: from genes to ecosystems Socioecology: traditions & customs, economics, governance

4 Good news-- more and more data There is a growing deluge of environmental data to assist in these investigations …

5 Need for ecoinformatics But… locating desired information is already quite difficult: culling through irrelevant information (precision); failing to find all useful information (recall). Using the data you find is problematic: interpretation (units, context, methods); merging, transforming for re-use; manual, ad-hoc, arduous … Why?

6 Environmental Data-- State of Affairs Environmental data are: Stewarded/owned by many groups, individuals Sparsely documented (metadata, data catalog) Variably accessible via the Internet Heterogeneous: broad range of relevant topics

7 The informatics challenge… Environmental data are highly heterogeneous… geospatial data-- point, line, polygon, raster time series/monitoring data tables, spreadsheets/csv grids, matrices normalized DBMS Variable structure Variable syntax (R, MATLAB, MySQL, .xls) Variable semantics (what is “temp”?)

8 Data Integration Combining heterogeneous data is necessary for synthesis Approaches Develop consistent data models within and across entire domains– “standardized schema” “Describe” your data and its contents so that machines can process and integrate– “semantic mediation”

9 Data Integration Combining heterogeneous data is necessary for synthesis Impractical if not impossible to standardize schemas for all data sets being collected Use emerging approaches of the Semantic Web [1] [1] Berners-Lee, Hendler & Lassila 2001. The Semantic Web. http://www.scientificamerican.com/2001/0501issue/0501berners-lee.html

10 Semantic Data Integration Metadata standards are a step in the right direction… Expose data in standard schema for transfer Dublin Core ISO 19115 (geospatial metadata) Darwin Core (biodiversity specimen metadata) EML (Ecological Metadata Language) GeoSciML All have XML implementations for document exchange Can map one format to another to resolve minor differences

11 Importance of semantics Descriptive metadata is insufficient “semantics” are expressed in natural language Inconsistent, imprecise, not standardized The computer can’t “understand”: what is being measured how measurements relate to one another how semantics map to logical structure

12 Importance of semantics Efficient, effective integration and subsequent analysis depends on understanding the semantic contextual relationships of each data measurement, as well as the relationships among measurements in a table structure or other data format. Usually an expert provides this, or a data catalog How to capture and expose for machine processing? Semantic Mediation!

13 Semantic Data Integration Metadata-- Cannot formally express complex constructs: Define Specific Leaf Area What type of weight measurement is involved in its calculation? How is SLA measurement in column 1 related to plot ID measurement in column 2? Cannot provide native reasoning: I measured a specimen with a prehensile tail, extrusible tongue, eats insects, has fused toes What is it? Can I know anything more about it?

14 Semantic Data Integration Ontologies do not have these limitations… Can express complex constructs: SLA is an abbreviation that is a synonym for the functional trait called Specific Leaf Area, a measurement taken from a leaf, which is a part of a plant SLA is computed as an areal measurement divided by a dry weight measurement Can natively reason: The specimen has a prehensile tail, extrusible tongue, eats insects, has fused toes infer: specimen is a chameleon infer: chameleon is a reptile infer: specimen has stereoscopic eyes infer: specimen may be able to change color
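
The chameleon inference on this slide can be imitated with a few lines of forward chaining. This is only a minimal sketch: the rule table below is a hypothetical hand-written stand-in for what an OWL reasoner would derive from class axioms, and the fact strings are invented for illustration.

```python
# Hypothetical rule base for illustration only; a real OWL reasoner
# derives these conclusions from class axioms rather than a rule list.
RULES = [
    (frozenset({"prehensile tail", "extrusible tongue",
                "eats insects", "fused toes"}), "chameleon"),
    (frozenset({"chameleon"}), "reptile"),
    (frozenset({"chameleon"}), "stereoscopic eyes"),
    (frozenset({"chameleon"}), "may change color"),
]


def infer(observed):
    """Return every new fact derivable from `observed` by applying RULES to a fixpoint."""
    known = set(observed)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known - set(observed)


specimen = {"prehensile tail", "extrusible tongue", "eats insects", "fused toes"}
print(sorted(infer(specimen)))
```

A specimen described by only some of the traits yields no inferences, which mirrors the point that the full description is what licenses the classification.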

15 Formal Ontologies and Reasoners Use W3C standard: Semantic Web http://www.w3.org/standards/semanticweb/ Expose data syntax, schema and semantics through a standardized language that computers can parse and interpret: OWL, the Web Ontology Language OWL, RDF, XML Reasoners

16 What is an ontology? A formal specification of concepts, and the relationships that may exist between those concepts.

17 How can ontologies help? Classification and “reasoning” Data discovery Integration/merge Concept mapping Units conversion Spatial & temporal scaling

18 How can ontologies help? Classification and “reasoning” New “facts” derived from ontology Potential emergence ArealDensity requires knowledge of Area and Abundance If have Area and Abundance, might have ArealDensity
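
The "potential emergence" idea can be sketched as a lookup over which characteristics a data set already has. The dependency table and the formula (abundance divided by area) are assumptions made for illustration; the slide only states that ArealDensity requires Area and Abundance.

```python
# Hypothetical dependency table: derived characteristic -> characteristics it requires.
DERIVED = {"ArealDensity": {"Area", "Abundance"}}


def potentially_derivable(available):
    """List derived characteristics whose inputs are all present in `available`."""
    have = set(available)
    return sorted(name for name, needs in DERIVED.items()
                  if needs <= have and name not in have)


def areal_density(abundance, area):
    """Assumed formula: count of individuals per unit area."""
    if area <= 0:
        raise ValueError("area must be positive")
    return abundance / area


print(potentially_derivable({"Area", "Abundance", "Height"}))
print(areal_density(50, 25.0))
```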

19 How can ontologies help? Classification and “reasoning” Data discovery Integration/merge Analytical assistance Statistical inference Data types Data transformations

20 How can ontologies help? Use OWL-DL (OWL2 RL) W3C Recommendation Provides complete and consistent reasoning Standard, free, reasoners available Pellet, FaCT++ Construct and visualize ontologies using free tools Protégé, SWOOP OWLIFIER tool (Josh)

21 How can ontologies help? Can “Define” Objects with equivalence classes Specifies Necessary and Sufficient Conditions Reasoner will classify described Object has Fur locomotes Bipedal native_to Australia births UndevelopedYoung has GoodJumpingAbility
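
Necessary and sufficient conditions can be pictured as a set of property/value pairs that a described object must all satisfy. In this sketch the pairs are taken from the slide, while the function name and the example descriptions are hypothetical; a reasoner such as Pellet would perform this classification from an OWL equivalence-class axiom rather than from a Python set.

```python
# Conditions listed on the slide, treated here as jointly necessary and sufficient.
CONDITIONS = frozenset({
    ("has", "Fur"),
    ("locomotes", "Bipedal"),
    ("native_to", "Australia"),
    ("births", "UndevelopedYoung"),
    ("has", "GoodJumpingAbility"),
})


def is_member(described):
    """True if the described object satisfies every condition of the defined class."""
    return CONDITIONS <= set(described)


# Hypothetical descriptions of two observed objects.
full = set(CONDITIONS) | {("eats", "Grass")}
partial = {("has", "Fur"), ("native_to", "Australia")}
print(is_member(full), is_member(partial))
```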

22 What do ontologies consist of? Objects (terms) Arrange in class (subsumption) hierarchies Can describe objects in terms of properties and relationships to other objects Relationships Specify relationships between Objects Can be reflexive, symmetric, transitive (or not)

23 View of SBC-OBOE ontology in Protégé

24 Beyond SQL… OWL DL
Construct         Symbol  Example
Restrictions:
someValuesFrom    ∃       hasPart some Leaf
allValuesFrom     ∀       isPartof only Plant
hasValue          ∋       hasCountryOfOrigin value Australia
minCardinality    ≥       hasStoma min 1
cardinality       =       hasStem exactly 1
maxCardinality    ≤       hasPetals max 100
Class constructors:
intersectionOf    ⊓       WoodyBark and RiparianHabitat
unionOf           ⊔       Tree or Bush
complementOf      ¬       not Grass
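
To make the restriction vocabulary concrete, the sketch below checks three of the restrictions against the property values of one individual. It is a deliberate simplification: OWL reasoning is open-world, whereas these helpers only look at the values that happen to be listed. The individuals (leaf1, stem1) and the helper names are invented for illustration.

```python
def some_values_from(values, cls):
    """someValuesFrom: at least one value belongs to the class."""
    return any(v in cls for v in values)


def all_values_from(values, cls):
    """allValuesFrom: every value belongs to the class (vacuously true if none)."""
    return all(v in cls for v in values)


def cardinality_between(values, minimum=0, maximum=None):
    """min/max/exact cardinality over the distinct values."""
    n = len(set(values))
    return n >= minimum and (maximum is None or n <= maximum)


# Hypothetical class extensions and one plant individual.
LEAF = {"leaf1", "leaf2"}
plant = {"hasPart": ["leaf1", "stem1"], "hasStem": ["stem1"]}

print(some_values_from(plant["hasPart"], LEAF))        # hasPart some Leaf
print(all_values_from(plant["hasPart"], LEAF))         # hasPart only Leaf
print(cardinality_between(plant["hasStem"], 1, 1))     # hasStem exactly 1
```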

25 Model and define domain science concepts Lots of domain ontologies emerging http://www.biofoundry.org How to use these to advance data integration?

26 Model and define domain science concepts http://www.biofoundry.org Mainly biomedical, genomics

27 Use of Ontologies Genomics have largely homogeneous data Ontologies “unify” vocabularies in model organisms (fruit fly, yeast, mouse, arabidopsis etc.) Many ontologies emerging Are these useful for semantic mediation and data integration?

28 Nature of scientific data sets Scientific data often in tables Tables consist of rows (records) and columns (attributes) The association of specific columns together (tuple) in a scientific data set is often a non-normalized (materialized) view, with special meaning/use for researcher Individual cells contain values that are measurements of a characteristic of some thing

29 Semantic annotation Data set slide from J. Madin computer doesn’t know that “Ht.” represents a “height” measurement computer doesn’t know whether Plot is nested within Site or vice-versa computer can’t determine whether the Temp applies to Site or Plot or Species

30 Observation defined Observations in scientific data sets typically co-occur with other observations Ontologies must assist with describing the inter-relationships among observations within and across datasets Observational Data Model

31 Observation defined An observation represents any measurement of some characteristic (attribute) of some real-world entity or phenomenon. A measurement consists of a realized value of some characteristic of an entity, expressed in some well-specified units (drawn from a measurement standard) Observations can provide context for other observations (e.g. observations of spatial or temporal information would often provide context for some other observation) Measurements are taken using some protocol

32 Another definition for observation An observation is an act that results in the estimation of the value of a feature property, and involves application of a specified procedure, such as a sensor, instrument, algorithm or process chain. The procedure may be applied in-situ, remotely, or ex-situ with respect to the sampling location… The key idea is that the observation result is an estimate of the value of some property of the feature of interest, and the other observation properties provide context or metadata to support evaluation, interpretation and use of the result. (OGC Observations and Measurements, 2010-01-05)

33 Extensible Observation Ontology (OBOE) A scientific Observation is Measurement of the Value of a Characteristic of some Entity in a particular Context using some Protocol

34 OBOE - Extensible Observation Ontology Provides extension points for loading specialized domain ontologies To generically describe the structure of scientific observation and measurement as would be found in a scientific data set Entities represent real-world objects or concepts that can be measured. Measurements assign values and units to characteristics of observed entities. Observations are made about particular entities. Every measurement has a characteristic, which defines the property of the entity being measured. Every measurement has a unit. Observations can provide context for other observations. Entities, through observations, can be associated with one or more measured characteristics. A value is typically a cell in a data set. Extension points
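
The structure described on this slide (observations of entities, measurements with a characteristic, value and unit, and observations giving context to other observations) can be sketched as two small data classes. This is an illustrative model, not the OBOE OWL ontology itself; the class layout, field names, and example values are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Measurement:
    characteristic: str              # property of the entity being measured
    value: float                     # typically one cell in a data set
    unit: str                        # every measurement has a unit
    protocol: Optional[str] = None   # how the measurement was taken, if known


@dataclass
class Observation:
    entity: str                                                    # what is observed
    measurements: List[Measurement] = field(default_factory=list)
    context: List["Observation"] = field(default_factory=list)    # contextual observations


def context_entities(obs):
    """Entities of all observations that directly or transitively give context to obs."""
    found = []
    for ctx in obs.context:
        found.append(ctx.entity)
        found.extend(context_entities(ctx))
    return found


# Hypothetical example: a plant height observed within a study area.
site = Observation("StudyArea", [Measurement("Precipitation", 850.0, "Millimeter")])
plant = Observation("Plant", [Measurement("Height", 1.4, "Meter")], context=[site])
print(context_entities(plant))
```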

35 Linking data values to concepts Extensible Observation Ontology (OBOE) OBOE provides a high-level abstraction of scientific observations and measurements Enables data (or metadata) structures to be linked to domain-specific ontology concepts Can inter-relate values in a tuple Provides clarification of semantics of data set as a whole, not just “independent” values

36 OBOE - Domain concepts Ecological Paleontological

37 OBOE - Units Standard and customized units and their relationships to one another can easily be loaded into OBOE

38 OBOE - Semantic units Measurements can be of one or more characteristics of one or more entities (unit components)

39 OBOE - Context Context provides essential semantic detail by linking Observations [Figure labels: Plant measured in StudyArea; StudyArea is on the Plant]

40 OBOE - Context Experimental design Spatial & temporal scaling “Smart” data merge “Sensible” analysis

42 Data Integration with OBOE Observations can be aligned for data integration... Apply conversions based on alignments, e.g. use common Entity and Characteristic concepts; apply Unit conversions to values; select lowest precision and apply [Figure: Observation of a Tree with a Measurement of Diameter in Meters; has-precision 0.1; has-value 1.3 and 3.2]

43 OBOE: Aligning Observations Observations can be aligned for data integration... Two similar observations of trees [Figure: Observation of Picea rubens with a Measurement of Diameter in Meters, has-precision 0.01, has-value 1.25; Observation of Abies balsa. with a Measurement of DBH in Centimeters, has-precision 10, has-value 320]

44 OBOE: Aligning Observations Observations can be aligned for data integration... Align entities, characteristics, and standards [Figure: Observation of Picea rubens with a Measurement of Diameter in Meters, has-precision 0.01, has-value 1.25; Observation of Abies balsa. with a Measurement of DBH in Centimeters, has-precision 10, has-value 320; both entities isa Tree; both characteristics has-dimension Length]
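
The alignment steps (common entity and characteristic concepts, unit conversion, lowest precision) can be sketched as follows. The numbers are one reading of the flattened figure labels: precision 0.01 and value 1.25 in meters, precision 10 and value 320 in centimeters. The mapping tables and the `align` function are hypothetical, and round-half-up is an assumed rounding rule chosen because it reproduces the 1.3 and 3.2 shown on the earlier slide.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumed alignment tables (entity isa Tree, characteristic has-dimension Length).
ENTITY_ISA = {"Picea rubens": "Tree", "Abies balsa.": "Tree"}
DIMENSION = {"Diameter": "Length", "DBH": "Length"}
TO_METERS = {"Meters": Decimal("1"), "Centimeters": Decimal("0.01")}

observations = [
    {"entity": "Picea rubens", "characteristic": "Diameter",
     "unit": "Meters", "precision": "0.01", "value": "1.25"},
    {"entity": "Abies balsa.", "characteristic": "DBH",
     "unit": "Centimeters", "precision": "10", "value": "320"},
]


def align(obs_list):
    """Convert to meters, then round every value to the coarsest precision."""
    converted = []
    for o in obs_list:
        factor = TO_METERS[o["unit"]]
        converted.append((o, Decimal(o["value"]) * factor,
                          Decimal(o["precision"]) * factor))
    coarsest = max(precision for _, _, precision in converted)
    aligned = []
    for o, value, _ in converted:
        # Round to a whole number of precision steps, then scale back.
        steps = (value / coarsest).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
        aligned.append({
            "entity": ENTITY_ISA[o["entity"]],
            "dimension": DIMENSION[o["characteristic"]],
            "unit": "Meters",
            "precision": coarsest,
            "value": steps * coarsest,
        })
    return aligned


for row in align(observations):
    print(row["entity"], row["dimension"], row["value"], row["unit"])
```

Decimal arithmetic is used instead of floats so that 1.25 rounds up to 1.3 rather than down, which binary floating point with round-half-even would not guarantee.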

45 Observation Based Structured Query Both datasets contain “tree lengths” Annotation search for “tree length” would return both datasets Structured search allows the search to be limited by the observed entity (e.g. a tree or a tree branch) Increase precision and recall
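
The difference between an annotation keyword search and a structured search can be illustrated with two hypothetical annotated datasets, one measuring trees and one measuring tree branches. The dataset names, annotation records, and function names are invented for this sketch.

```python
# Hypothetical annotations: one record per annotated measurement column.
ANNOTATIONS = [
    {"dataset": "dataset_1", "entity": "Tree", "characteristic": "Length"},
    {"dataset": "dataset_2", "entity": "TreeBranch", "characteristic": "Length"},
]


def keyword_search(query):
    """Match datasets whose annotation text contains every query term."""
    terms = query.lower().split()
    hits = []
    for a in ANNOTATIONS:
        text = (a["entity"] + " " + a["characteristic"]).lower()
        if all(term in text for term in terms):
            hits.append(a["dataset"])
    return hits


def structured_search(entity, characteristic):
    """Match only datasets where the characteristic is measured on the given entity."""
    return [a["dataset"] for a in ANNOTATIONS
            if a["entity"] == entity and a["characteristic"] == characteristic]


print(keyword_search("tree length"))
print(structured_search("Tree", "Length"))
```

The keyword query returns both datasets, while constraining the observed entity narrows the result to the one that actually measures whole trees.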

46 Example: “Sensible” data summarization Leveraging annotations Consistency checking NOT sensible to summarize variables by “downstream” factors; e.g., Precipitation in the StudySite by TaxonomicName IS sensible to summarize variables by “upstream” factors; e.g., Plant Height by StudySite or by Precipitation IS sensible to summarize variables by factors in the same Observation; e.g., Plant Height by TaxonomicName or Precipitation by StudySite
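
The upstream/downstream rule can be sketched as a check along the context chain: summarizing a variable by a factor is treated as sensible when the factor belongs to the same observation or to an observation that provides context for it. The two lookup tables are assumptions modeled on the slide's example (Plant observed within StudySite); the function names are hypothetical.

```python
# Which observation each variable or factor belongs to (assumed from the slide's example).
OBSERVATION_OF = {
    "PlantHeight": "Plant",
    "TaxonomicName": "Plant",
    "Precipitation": "StudySite",
    "StudySite": "StudySite",
}
# Observation -> the observation that provides its context ("upstream").
CONTEXT_OF = {"Plant": "StudySite"}


def context_chain(observation):
    """The observation itself followed by every upstream context observation."""
    chain = [observation]
    while chain[-1] in CONTEXT_OF:
        chain.append(CONTEXT_OF[chain[-1]])
    return chain


def is_sensible(variable, factor):
    """True if `factor` is in the same observation as `variable` or upstream of it."""
    return OBSERVATION_OF[factor] in context_chain(OBSERVATION_OF[variable])


print(is_sensible("PlantHeight", "StudySite"))
print(is_sensible("Precipitation", "TaxonomicName"))
```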

47 Our Semantic Approach Climbing the semantic ladder: Ontologies Semantic Annotations Metadata Data

48 Our Semantic Approach Method for linking elements of data objects (e.g., columns in a table) to consistent and potentially rich sets of concepts Semantic Annotations link EML attributes to concepts defined in a Formal Ontology Store and retrieve annotations and ontologies in Metacat

49 Our “Semantic stack”

50 Semantic Annotation Links data structures via metadata, to ontology terms via OBOE Actively working on materializing data result sets from these ontology-based queries Investigating expressiveness of annotation language Annotating to other data stores

51 Metacat Implementation

52 KNB metadata catalog Stores EML (XML) and raw data objects Extend to store Ontologies, domain and OBOE (OWL-DLs serialized in XML) Extend to store Annotations (XML) Jena to facilitate querying ontologies Pellet to reason (consistency of ontologies; class subsumption)

53 Need for data interoperability MANY different “semantic” efforts underway to unify data within earth/biodiversity/environmental disciplines, converging on use of OBSERVATIONAL data construct SPECIALIZED needs and concerns of different domains may drive semantic technology solutions to be diverse and incompatible OPPORTUNITY exists for communicating and coordinating among different domains to achieve greater interoperability of emerging semantic technology solutions BENEFIT is providing cross-disciplinary scientists with more seamless and powerful access to a broad range of relevant data and information

54 USA NSF’s OCI INTEROP This NSF crosscutting program supports community efforts to provide for broad interoperability through the development of mechanisms such as robust data and metadata conventions, ontologies, and taxonomies. Support is provided… for consensus-building activities: community workshops, web resources such as community interaction sites, and task groups. … and for providing the expertise necessary to turn the consensus into technical standards with associated implementation tools and resources: information sciences, software development, and ontology and taxonomy design and implementation.

55 Objectives of SONet Broad Objectives Address semantic interoperability issues in environmental (earth sciences) data [sharing, discovery, integration] Build a network of practitioners (SONet), including domain scientists, computer scientists, and information managers Build generic, cross-disciplinary data interoperability solutions Immediate Goals to Develop An extensible and open observations data model (“core model”) to unify existing domain-specific approaches A semantic (ontology) framework for scientific terminology and corresponding domain extensions Demonstration prototypes using these to address critical data interoperability issues

56 Prospective observation models…
Project        Domain                     Observational data model
TDWG/OSR       Biodiversity               Meta-model to integrate field observational data with specimen data
VSTO           Atmospheric sciences       Ontologies for interoperations among different meteorological metadata standards
ODM            Hydrology                  CUAHSI’s Observational Data Model for storing diverse hydrological data
SERONTO        Socioecological research   Ontology for integrating socio-ecological data
OGC’s O&M      Geospatial                 Observations and Measurements standard for enhancing sensor data interoperability
SEEK’s OBOE    Ecology                    Extensible Observation Ontology for describing data as observations and measurements

57 Variations of Observational Data Models

58 Developing a core model Identify the key observational models in the earth and environmental sciences Are these various observational models easily reconciled and/or harmonized? Are there special capabilities and features enabled by some observational approaches? What services should be developed around these observational models?

59 Working Groups Subgroup 1: Core Data Model for Observations Subgroup 2: Catalog of Common Field Observations Subgroup 3: Scientist-Oriented Term Organization Subgroup 4: Demonstration Projects Subgroup 1 Collect interoperability requirements Define common, unified data model Engage tool & data providers, data consumers Subgroup 2 Identify and catalog common observation types (semantics) Engage data providers and information managers Subgroup 3 Define general extension ontologies of scientific terms Focus work on outputs of group 2 Engage range of domain scientists Subgroup 4 Define and prototype demonstration projects Ensure compatibility of subgroups Each group consists of two team leads Postdoc funded to work on demonstration projects & help ensure compatibility across subgroups Core SONet Team

60 Goals Identify and resolve commonalities and discrepancies among observational efforts Define a common core observational model for data Test with use cases (cross-disciplinary data integration tasks)

61 Where we are at… Identifying and resolving commonalities and discrepancies among observational models— O&M (ISO track) and OBOE Developing best-practices and design patterns for constructing observation-model compliant earth science ontologies, e.g. “measurement type” Developing cross-disciplinary use cases that exercise data integration capabilities of semantic approach

62 Where we are at… SEMTOOLS project Testing and enhancing semantic mediation Leveraging SONet observation data model Building semantic querying and annotation capabilities into Morpho Use Cases include using ontologies for data integration involving: ecology at an LTER site, Salmon Monitoring, and for Vegetation Traits

63 Morpho semantic annotation interface…

64 Future directions… Enabling semantic annotation onto disparate data resources Ontologies for analysis Ontologies for experimental design

65 Acknowledgments Thanks to Chad Berkley, Ben Leinfelder, and Huiping Cao for ideas, implementation and slides. This material is based upon work supported by: The National Science Foundation under Grant Numbers 9980154, 9904777, 0131178, 9905838, 0129792, 0225676, 0619060, 0722079, 0743429. The National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant Number 0072909), the University of California, and the UC Santa Barbara campus. The Andrew W. Mellon Foundation.

