Semantic annotation on the SONet and Semtools projects: Challenges for broad multidisciplinary exchange of observational data Mark Schildhauer, NCEAS/UCSB.


1 Semantic annotation on the SONet and Semtools projects: Challenges for broad multidisciplinary exchange of observational data
Mark Schildhauer, NCEAS/UCSB
TDWG meeting, Woods Hole
Observations Activity Group
Sep. 29, 2010

2 Nature of scientific data sets
Scientific data are often in tables
Tables consist of rows (records) and columns (attributes)
The association of specific columns together (tuple) in a scientific data set is often a non-normalized (materialized) view, with special meaning/use for the researcher
Individual cells contain values that are measurements of a characteristic of some thing

3 SONet/Semtools Semantic Approach
Data -> metadata -> annotations -> ontologies
Ontology: formal knowledge representation in OWL-DL
  – Hierarchical structure of concepts
  – Relationships can link concepts
Annotations link EML metadata elements to concepts in the ontology through the Observation Ontology
EML metadata describe data and its structures

4 Linking data values to concepts
Extensible Observation Ontology (OBOE)
OBOE provides a high-level abstraction of scientific observations and measurements
Enables data (or metadata) structures to be linked to domain-specific ontology concepts
Can inter-relate values in a tuple
Provides clarification of semantics of the data set as a whole, not just "independent" values

5 Concepts of Semantic Search
Annotations give metadata attributes semantic meaning w.r.t. an ontology
Enable structured search against annotations to increase precision
Enable ontological term expansion to increase recall
Precisely define a measured characteristic and the standard used to measure it via OBOE
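The term-expansion idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy subclass hierarchy, the document ids, and the function names `expand_term` and `search` are hypothetical and are not taken from the SONet/Semtools software, which relies on OWL-DL ontologies and a reasoner.

```python
from collections import defaultdict

# Hypothetical toy hierarchy (child concept -> parent concept); a real system
# would read subsumption from an OWL-DL ontology via a reasoner.
SUBCLASS_OF = {
    "Picea rubens": "Picea",
    "Picea": "Conifer",
    "Abies balsamea": "Abies",
    "Abies": "Conifer",
    "Conifer": "Tree",
}

# Hypothetical annotation index: metadata document id -> annotated concepts.
ANNOTATIONS = {
    "doc-1": {"Picea rubens"},
    "doc-2": {"Abies balsamea"},
    "doc-3": {"Plot"},
}


def expand_term(term, subclass_of):
    """Return the term plus every concept that is transitively a subclass of it."""
    children = defaultdict(set)
    for child, parent in subclass_of.items():
        children[parent].add(child)
    expanded, stack = {term}, [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in expanded:
                expanded.add(child)
                stack.append(child)
    return expanded


def search(term, annotations, subclass_of):
    """Match documents whose annotations mention the term or any of its subclasses."""
    wanted = expand_term(term, subclass_of)
    return sorted(doc for doc, concepts in annotations.items() if concepts & wanted)
```

Searching for "Conifer" matches documents annotated with either species (higher recall), while the search still runs against concepts rather than free text (higher precision).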

6 Logical Architecture

7 Annotations
XML schema defines annotation properties
Namespaces to identify sources of terms
Search performed against annotations, not the metadata itself
Returns metadata documents that are linked to the annotation
Reasoning (term expansion, consistency, etc.) through domain ontology

8 XML Links

9 KNB metadata catalog
Stores EML (XML) and raw data objects
Extended to store ontologies, both domain and OBOE (OWL-DL serialized in XML)
Extended to store annotations (XML)
Jena to facilitate querying ontologies
Pellet to reason (consistency of ontologies; class subsumption)

10 Metacat Implementation

11 OBOE Conceptual Model (OWL-DL)
[Diagram. Classes: Observation, Measurement, Entity, Characteristic, Value, Standard, Context, Relationship. Properties: hasMeasurement, ofEntity, ofCharacteristic, hasValue, usesStandard, hasContext, hasContextObservation, hasContextRelationship. Edge multiplicities are 0..* or 1..1; their placement is not recoverable from the transcript.]

12 Annotation Examples (12/18/2009)
[Diagram. Boxes: Annotation, Dataset, OBOE Model (individuals/triples), OBOE Concepts, Query*. Edge labels: Define (view def.), Materialize, instantiates, uses terms from, observation-based representation of.]
* Conceptually, we want to query datasets via annotations

13 Annotation Examples
Annotation Syntax
observation "o1"
  entity "TemporalRange"
  measurement "m1" characteristic "Year" standard "DateTime"
observation "o2"
  entity "Tree"
  measurement "m2" precision: "0.1" characteristic "DBH" standard "Centimeter"
  measurement "m3" characteristic "TaxonomicTypeName" standard "ITIS"
  measurement "m4" characteristic "EntityName" standard "LocalTreeNames"
  context observation "o1" relationship "Within"
map "yr" to "m1"
map "diam" to "m2" if diam > 0
map "spec" to "m4"
map "spp" to "m3" if spp == "piru" value="Picea rubens"
map "spp" to "m3" if spp == "abba" value="Abies balsamea"
* Code exists to read/write annotations using this XML format

14 Annotation Examples
Dataset:
yr    spec  spp   dbh
2007  1     piru  35.8
2007  1     piru  36.2
2008  2     abba  33.2
Annotation:
observation "o1"
  entity "TemporalRange"
  measurement "m1" characteristic "Year" standard "DateTime"
observation "o2"
  entity "Tree"
  measurement "m2" precision: "0.1" characteristic "DBH" standard "Centimeter"
  measurement "m3" characteristic "TaxonomicTypeName" standard "ITIS"
  measurement "m4" characteristic "EntityName" standard "LocalTreeNames"
  context observation "o1" relationship "Within"
map "yr" to "m1"
map "dbh" to "m2" if dbh > 0
map "spec" to "m4"
map "spp" to "m3" if spp == "piru" value="Picea rubens"
map "spp" to "m3" if spp == "abba" value="Abies balsamea"
Basic idea: go row-by-row through the dataset, generating individuals/triples
"External" terms should have a namespacing prefix URI
[Diagram: for each of the three rows, one TemporalRange observation with a Year measurement (standard DateTime) and one Tree observation with EntityName (LocalTreeNames), TaxonomicTypeName (ITIS), and DBH (Centimeter) measurements; each Tree observation is linked to its TemporalRange observation by hasContext.]
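The row-by-row pass described on this slide might look roughly like the following Python sketch. The class names, the `materialize_row` function, and the hard-coded mapping rules are assumptions made for illustration; they mirror the annotation shown above but are not the project's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Measurement:
    characteristic: str
    standard: str
    value: object


@dataclass
class Observation:
    entity: str
    measurements: list = field(default_factory=list)
    context: list = field(default_factory=list)  # (relationship, Observation) pairs


# Conditional value mappings for "spp", copied from the annotation on the slide.
SPP_VALUES = {"piru": "Picea rubens", "abba": "Abies balsamea"}

ROWS = [
    {"yr": 2007, "spec": 1, "spp": "piru", "dbh": 35.8},
    {"yr": 2007, "spec": 1, "spp": "piru", "dbh": 36.2},
    {"yr": 2008, "spec": 2, "spp": "abba", "dbh": 33.2},
]


def materialize_row(row):
    """Create observations o1 (TemporalRange) and o2 (Tree) for a single row."""
    o1 = Observation("TemporalRange", [Measurement("Year", "DateTime", row["yr"])])
    o2 = Observation("Tree")
    if row["dbh"] > 0:  # map "dbh" to "m2" if dbh > 0
        o2.measurements.append(Measurement("DBH", "Centimeter", row["dbh"]))
    if row["spp"] in SPP_VALUES:  # map "spp" to "m3" with a substituted value
        o2.measurements.append(
            Measurement("TaxonomicTypeName", "ITIS", SPP_VALUES[row["spp"]])
        )
    o2.measurements.append(Measurement("EntityName", "LocalTreeNames", row["spec"]))
    o2.context.append(("Within", o1))  # context observation "o1" relationship "Within"
    return [o1, o2]


observations = [obs for row in ROWS for obs in materialize_row(row)]
```

Without any further constraints, this naive pass yields six observations for three rows, including two separate observations of the year 2007. That redundancy is what the key and distinct attributes on the later slides address.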

15 Annotation Examples
Dataset:
yr    spec  spp   dbh
2007  1     piru  35.8
2008  1     piru  36.2
2008  2     abba  33.2
Annotation:
observation "o1"
  entity "TemporalRange"
  measurement "m1" characteristic "Year" standard "DateTime"
observation "o2"
  entity "Tree"
  measurement "m2" precision: "0.1" characteristic "DBH" standard "Centimeter"
  measurement "m3" characteristic "TaxonomicTypeName" standard "ITIS"
  measurement "m4" characteristic "EntityName" standard "LocalTreeNames"
  context observation "o1" relationship "Within"
map "yr" to "m1"
map "dbh" to "m2" if dbh > 0
map "spec" to "m4"
map "spp" to "m3" if spp == "piru" value="Picea rubens"
map "spp" to "m3" if spp == "abba" value="Abies balsamea"
[Diagram: one TemporalRange observation and one Tree observation per row, linked by hasContext, with two callouts:]
Same Trees!! (both have name = 1)
Same Year and year observation!!

16 Annotation Examples
Dataset:
yr    spec  spp   dbh
2007  1     piru  35.8
2008  1     piru  36.2
2008  2     abba  33.2
Annotation:
observation "o1" distinct yes
  entity "TemporalRange"
  measurement "m1" key yes characteristic "Year" standard "DateTime"
observation "o2"
  entity "Tree"
  measurement "m2" precision: "0.1" characteristic "DBH" standard "Centimeter"
  measurement "m3" characteristic "TaxonomicTypeName" standard "ITIS"
  measurement "m4" key yes characteristic "EntityName" standard "LocalTreeNames"
  context observation "o1" relationship "Within"
map "yr" to "m1"
map "dbh" to "m2" if dbh > 0
map "spec" to "m4"
map "spp" to "m3" if spp == "piru" value="Picea rubens"
map "spp" to "m3" if spp == "abba" value="Abies balsamea"
Every observation has an implicit "distinct" attribute (set to "no")
… and every measurement has an implicit "key" attribute (set to "no")
[Diagram: two TemporalRange observations (Year 2007 and 2008) and three Tree observations (DBH 35.8, 36.2, 33.2) over two Tree entity instances, linked by hasContext.]

17 Annotation Examples
Observation measurement keys
  – Like a primary key constraint
  – States that observation instances with the same measurement key values are of the same entity instance
  – Does not imply the same observation instance, unless the observation is declared distinct
  – All key measurements of an observation together form the primary key
Distinct observations
  – Only applies if at least one key measurement is defined
  – States that observation instances with the same entity instance are of the same observation instance
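One way to read these two rules is as a deduplication step during materialization. The sketch below is a simplified interpretation, not the project's algorithm: the `materialize` function, its parameters, and the id strings are hypothetical, and key measurements are identified directly by dataset column name.

```python
ROWS = [
    {"yr": 2007, "spec": 1, "spp": "piru", "dbh": 35.8},
    {"yr": 2008, "spec": 1, "spp": "piru", "dbh": 36.2},
    {"yr": 2008, "spec": 2, "spp": "abba", "dbh": 33.2},
]


def materialize(rows, entity_type, key_columns=(), distinct=False):
    """Return one (observation id, entity instance id) pair per row.

    Rows with equal key values share an entity instance; they also share an
    observation instance only when the observation is declared distinct.
    """
    entity_by_key = {}
    result = []
    for i, row in enumerate(rows):
        if key_columns:
            key = tuple(row[c] for c in key_columns)
            entity = entity_by_key.setdefault(
                key, f"{entity_type}/{len(entity_by_key) + 1}"
            )
        else:
            # No key measurement: every row gets its own entity instance.
            entity = f"{entity_type}/row{i}"
        if distinct and key_columns:
            # Distinct: same entity instance implies same observation instance.
            obs = f"obs:{entity}"
        else:
            obs = f"obs:{entity_type}/row{i}"
        result.append((obs, entity))
    return result


years = materialize(ROWS, "TemporalRange", key_columns=["yr"], distinct=True)
trees = materialize(ROWS, "Tree", key_columns=["spec"])
```

On the three-row dataset used in the key/distinct example, `years` contains two observation instances (2007 and 2008), while `trees` contains three observation instances over two tree entity instances, since tree 1 was measured twice.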

18 Annotation Examples
Dataset:
plt  spp   dbh
A    piru  35.8
A    piru  36.2
B    piru  33.2
Annotation:
observation "o1" distinct yes
  entity "Plot"
  measurement "m1" key yes characteristic "EntityName" standard "Nominal"
observation "o2"
  entity "Tree"
  measurement "m2" precision: "0.1" characteristic "DBH" standard "Centimeter"
  measurement "m3" key yes characteristic "TaxonomicTypeName" standard "ITIS"
  context observation "o1" relationship "Within"
map "plt" to "m1"
map "dbh" to "m2"
map "spp" to "m3" if spp == "piru" value="Picea rubens"
map "spp" to "m3" if spp == "abba" value="Abies balsamea"
Here we don't have unique ids for trees
But, assume each spp name within a plot uniquely identifies a tree
… i.e., at most one tree of a particular type was measured (possibly multiple times) in each plot
[Diagram: two Plot observations (EntityName A and B) and three Tree observations (DBH 35.8, 36.2, 33.2), each Tree observation linked to its Plot observation by hasContext.]

19 Annotation Examples
Dataset:
plt  spp   dbh
A    piru  35.8
A    piru  36.2
B    piru  33.2
Annotation:
observation "o1" distinct yes
  entity "Plot"
  measurement "m1" key yes characteristic "EntityName" standard "Nominal"
observation "o2"
  entity "Tree"
  measurement "m2" precision: "0.1" characteristic "DBH" standard "Centimeter"
  measurement "m3" key yes characteristic "TaxonomicTypeName" standard "ITIS"
  context observation "o1" relationship "Within"
map "plt" to "m1"
map "dbh" to "m2"
map "spp" to "m3" if spp == "piru" value="Picea rubens"
map "spp" to "m3" if spp == "abba" value="Abies balsamea"
The Tree entity instance should depend on the plot it is in!!! (context)
[Diagram: two Plot observations (EntityName A and B) and three Tree observations (DBH 35.8, 36.2, 33.2), each Tree observation linked to its Plot observation by hasContext.]

20 Annotation Examples
Dataset:
plt  spp   dbh
A    piru  35.8
A    piru  36.2
B    piru  33.2
Annotation:
observation "o1" distinct yes
  entity "Plot"
  measurement "m1" key yes characteristic "EntityName" standard "Nominal"
observation "o2"
  entity "Tree"
  measurement "m2" precision: "0.1" characteristic "DBH" standard "Centimeter"
  measurement "m3" key yes characteristic "TaxonomicTypeName" standard "ITIS"
  context identifying yes observation "o1" relationship "Within"
map "plt" to "m1"
map "dbh" to "m2"
map "spp" to "m3" if spp == "piru" value="Picea rubens"
map "spp" to "m3" if spp == "abba" value="Abies balsamea"
Every context relationship has an "identifying" qualifier (set to "no")
Uniqueness within context observation
Similar to a weak-entity constraint (ER)
[Diagram: two Plot observations (EntityName A and B) and three Tree observations (DBH 35.8, 36.2, 33.2) linked by hasContext; the diagram adds a second Tree entity instance.]
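The effect of the "identifying" qualifier can be illustrated by making the context entity part of the tree's key, much like a weak entity borrows its owner's key in ER modeling. This Python sketch is an assumption-laden illustration: the `tree_entities` function and the id strings are hypothetical.

```python
ROWS = [
    {"plt": "A", "spp": "piru", "dbh": 35.8},
    {"plt": "A", "spp": "piru", "dbh": 36.2},
    {"plt": "B", "spp": "piru", "dbh": 33.2},
]


def tree_entities(rows, identifying):
    """Assign a Tree entity instance id to each row.

    With identifying=False the key is the species alone; with identifying=True
    the Plot entity instance of the context observation is part of the key.
    """
    plots = {}
    trees = {}
    assigned = []
    for row in rows:
        plot = plots.setdefault(row["plt"], f"Plot/{len(plots) + 1}")
        key = (plot, row["spp"]) if identifying else (row["spp"],)
        tree = trees.setdefault(key, f"Tree/{len(trees) + 1}")
        assigned.append(tree)
    return assigned
```

With `identifying=False` all three rows collapse onto a single tree, which is the problem the previous slide points out; with `identifying=True` the two measurements in plot A share one tree and the measurement in plot B gets its own.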

21 Annotation Examples
Representing instances …
Annotation(AnnotId, Resource)
Observation(ObsId, AnnotId, EntId)
Measurement(MeasId, ObsId, MeasType, Value)
Context(ObsId1, ObsId2, Rel)
Relationship(RelId, RelType)
Entity(EntId, EntType)
This could be queried itself and/or mapped to triples
Note that ObsIds are unique across annotations
Both ObsIds in a Context row must belong to the same annotation
* Simple relational schema for OBOE models (individuals/triples)
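To show how this schema could be queried directly, here is a small sketch using Python's built-in sqlite3 module. The table and column names come from the slide; the column types, foreign-key references, the sample rows, and the resource name `trees.csv` are assumptions added for illustration.

```python
import sqlite3

# Table and column names follow the slide; types and REFERENCES are assumed.
SCHEMA = """
CREATE TABLE Annotation (AnnotId INTEGER PRIMARY KEY, Resource TEXT);
CREATE TABLE Entity (EntId INTEGER PRIMARY KEY, EntType TEXT);
CREATE TABLE Observation (
    ObsId INTEGER PRIMARY KEY,
    AnnotId INTEGER REFERENCES Annotation(AnnotId),
    EntId INTEGER REFERENCES Entity(EntId)
);
CREATE TABLE Measurement (
    MeasId INTEGER PRIMARY KEY,
    ObsId INTEGER REFERENCES Observation(ObsId),
    MeasType TEXT,
    Value TEXT
);
CREATE TABLE Relationship (RelId INTEGER PRIMARY KEY, RelType TEXT);
CREATE TABLE Context (
    ObsId1 INTEGER REFERENCES Observation(ObsId),
    ObsId2 INTEGER REFERENCES Observation(ObsId),
    Rel INTEGER REFERENCES Relationship(RelId)
);
"""

# Hypothetical sample data: one tree observation within one year observation.
SAMPLE = """
INSERT INTO Annotation VALUES (1, 'trees.csv');
INSERT INTO Entity VALUES (1, 'TemporalRange'), (2, 'Tree');
INSERT INTO Observation VALUES (1, 1, 1), (2, 1, 2);
INSERT INTO Measurement VALUES (1, 1, 'Year', '2007'), (2, 2, 'DBH', '35.8');
INSERT INTO Relationship VALUES (1, 'Within');
INSERT INTO Context VALUES (2, 1, 1);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executescript(SAMPLE)

# Pair each DBH value with the Year of the observation it is "Within".
rows = conn.execute(
    """
    SELECT y.Value, d.Value
    FROM Context c
    JOIN Relationship r ON r.RelId = c.Rel
    JOIN Measurement d ON d.ObsId = c.ObsId1 AND d.MeasType = 'DBH'
    JOIN Measurement y ON y.ObsId = c.ObsId2 AND y.MeasType = 'Year'
    WHERE r.RelType = 'Within'
    """
).fetchall()
```

Storing `Value` as text keeps the sketch simple; a fuller design would need typed values or a separate value table.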

22 Ongoing Activities
Developing compatible domain ontologies (design patterns for use with observation ontology)
Scalability of materialization algorithm from annotations (data result sets)
Testing and developing capabilities motivated by Use Cases (coastal ecosystems and plant traits)
SONet and JWG-ODMS continue to meet and discuss

23 Acknowledgements
Shawn Bowers, Huiping Cao, SEEK KR/SMS working group, and all members of the SONet and Semtools projects
Thanks also to Chad Berkeley and Ben Leinfelder, project software engineers
Work supported by National Science Foundation awards 0225674, 0225676, 0743429, 0733849, 0753144, 0630033

