Directions in observational data organization: from schemas to ontologies Matthew B. Jones 1 Chad Berkley 1 Shawn Bowers 2 Joshua Madin 3 Mark Schildhauer.


1 Directions in observational data organization: from schemas to ontologies
Matthew B. Jones (1), Chad Berkley (1), Shawn Bowers (2), Joshua Madin (3), Mark Schildhauer (1)
(1) National Center for Ecological Analysis and Synthesis (NCEAS), University of California, Santa Barbara
(2) University of California, Davis
(3) Macquarie University

2 Ecological studies
Ecological studies focus on
  – Distribution and abundance of organisms
  – Organism interactions
  – Population and community processes
  – Ecosystem processes
  – Mechanistic understanding of ecosystems
Diverse data sources, e.g.,
  – Biodiversity monitoring
  – Experimental manipulations
  – Environmental monitoring

3 Synthesis over ecological process
Gruner et al. 2008
  – Ecology Letters (2008) 11: 740–755
Meta-analysis of 191 factorial manipulations of nutrients and herbivores
Experimenters manipulated
  – nutrient addition
  – herbivore removal
Effect on producer biomass

4 Synthesis over space Costanza et al. Nature 1997

5 Synthesis over time Jackson et al., Science 2001

6 How did they do it?
As a scientist, could you:
  – Locate the precise data used?
  – Locate the analytical processes used? Reconstruct them?
Today, only a slim chance...
  – Why?

7 Insufficient sharing
Researchers don’t publish their data
Researchers don’t publish their analytical code
In general, we have no way to verify or reproduce the conclusions in papers

8 Preserving data for synthesis
Synthesis requires access to global ecological data
Single-schema databases do not suffice
Loosely-coupled metadata and data collections
  – No constraints on data schemas
Knowledge Network for Biocomplexity (KNB)
National Biological Information Infrastructure (NBII)

9 Ecological Metadata Language
[Module diagram: Identity and Discovery Information; Coverage: Space, Time, Taxa; Methods; Logical Data Model; Physical Data Format; Access and Distribution]
22 independent modules: open, modular, extensible
Grass roots metadata
Describe what data you have... rather than prescribe what to produce.

10 EML: Selected relationships
[Timeline figure, 1990 to 2009. Labeled items: FGDC created; CSDGM 1.0; Michener ’97 paper; ESA FLED Report; NBII BDP; ISO 19115; Dublin Core; XML 1.0; OBOE; EML 1.0.0, 1.3.0, 1.4.x, 2.0.0, 2.0.1, 2.1.0?]

11 Logical Model: Attribute structure
Describes data tables and their attributes
A typical data table with 10 attributes
  – some metadata are likely apparent, others ambiguous
  – missing value code is present
  – definitions need to be explicit, as well as data typing

YEAR  MONTH  DATE        SITE  TRANSECT  SECTION  SP_CODE  SIZE  OBS_CODE  NOTES
2001  8      2001-08-22  ABUR  1         0-20     CLIN     5     06        .
2001  8      2001-08-22  ABUR  1         21-40    OPIC     11    06        .
2001  8      2001-08-22  ABUR  1         21-40    OPIC     10    06        .
2001  8      2001-08-22  ABUR  1         21-40    OPIC     14    06        .
2001  8      2001-08-22  ABUR  1         21-40    OPIC     7     06        .
2001  8      2001-08-22  ABUR  1         21-40    OPIC     19    06        .
2001  8      2001-08-22  ABUR  1         21-40    COTT     5     06        .
2001  8      2001-08-22  ABUR  2         0-20     CLIN     5     06        .
2001  8      2001-08-22  ABUR  2         21-40    NF       0     06        .
2001  8      2001-08-27  AHND  1         0-20     NF       0     03        .

[Callouts on the table: Species Codes; Value bounds; Date Format; Code definitions]
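
A minimal sketch of why explicit attribute metadata matters, assuming a simplified in-memory description of each column. The `Attribute` class and `parse_value` function are hypothetical names for illustration and are not EML's actual schema or any library's API; the column names, the "." missing value code, and the date format come from the table on this slide.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Attribute:
    """Hypothetical, simplified stand-in for per-column metadata."""
    name: str
    definition: str
    missing_code: Optional[str] = None  # e.g. "." in the NOTES column
    date_format: Optional[str] = None   # strptime pattern for date columns
    minimum: Optional[float] = None     # lower value bound, if any
    maximum: Optional[float] = None     # upper value bound, if any


def parse_value(attr: Attribute, raw: str):
    """Convert one raw cell into a typed value using the attribute metadata."""
    if attr.missing_code is not None and raw == attr.missing_code:
        return None  # explicit missing value, not a literal string
    if attr.date_format is not None:
        return datetime.strptime(raw, attr.date_format).date()
    if attr.minimum is not None or attr.maximum is not None:
        value = float(raw)
        if attr.minimum is not None and value < attr.minimum:
            raise ValueError(f"{attr.name}: {value} is below {attr.minimum}")
        if attr.maximum is not None and value > attr.maximum:
            raise ValueError(f"{attr.name}: {value} is above {attr.maximum}")
        return value
    return raw


# Definitions and bounds below are assumptions made for the example.
date_attr = Attribute("DATE", "Date of observation", date_format="%Y-%m-%d")
notes_attr = Attribute("NOTES", "Free-text notes", missing_code=".")
size_attr = Attribute("SIZE", "Size of the organism", minimum=0)

parsed_date = parse_value(date_attr, "2001-08-22")
parsed_notes = parse_value(notes_attr, ".")
parsed_size = parse_value(size_attr, "5")
```

Without the metadata, "." in NOTES and "2001-08-22" in DATE are just strings; with it, a program can tell a missing value from text and a date from a label.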

12 Logical Model: unit Dictionary
Consistent assignment of measurement units
  – Quantitative definitions in terms of SI units
  – ‘unitType’ expresses dimensionality
time, length, mass, energy are all ‘unitType’s
second, meter, gram, pound, joule are all ‘unit’s
[Diagram: UnitType "Mass"; Units "kilogram" and "gram", related by x1000]
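
The unit dictionary idea can be sketched as follows, assuming each unit records its unitType and a multiplier to the SI unit of that type. The `Unit` class, the `UNITS` table, and `convert` are hypothetical names for illustration, not the EML unit dictionary format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical unit entry: a name, its dimensionality, and an SI multiplier."""
    name: str
    unit_type: str           # dimensionality, e.g. "mass", "length", "time"
    multiplier_to_si: float  # value in this unit * multiplier = value in SI unit


UNITS = {
    "kilogram": Unit("kilogram", "mass", 1.0),
    "gram": Unit("gram", "mass", 0.001),
    "pound": Unit("pound", "mass", 0.45359237),
    "meter": Unit("meter", "length", 1.0),
    "second": Unit("second", "time", 1.0),
    "joule": Unit("joule", "energy", 1.0),
}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert between two units, refusing to mix different unitTypes."""
    src, dst = UNITS[from_unit], UNITS[to_unit]
    if src.unit_type != dst.unit_type:
        raise ValueError(f"cannot convert {src.unit_type} to {dst.unit_type}")
    return value * src.multiplier_to_si / dst.multiplier_to_si


grams_per_kilogram = convert(1.0, "kilogram", "gram")
```

Because every unit is defined against SI, any two units of the same unitType can be converted without a pairwise conversion table, and a request such as gram to meter is rejected.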


14 An EML Record at NCEAS

15 Knowledge Network for Biocomplexity (KNB)
Building a data preservation network
Preserve primary data
Rich metadata descriptions
Redundant backup via replication
Access controlled by contributors
[Network diagram nodes: KNB 1, KNB II, PISCO, GCE, LTER, NCEAS, ESA, OBFS, AND... (26)]

16 Knowledge Network for Biocomplexity (KNB)
[Network diagram nodes: KNB 1, KNB II, PISCO, GCE, LTER, NCEAS, ESA, OBFS, AND... (26)]
South African Data Network
[Network diagram nodes: Mozambique, Mapungubwe, Marakele, Kruger, SAEON, Grahamstown, Cape Town, San Parks, Wilderness, Cape Town U, Addo, Karoo, Tsitsikama, Phalabora, Savannah Cluster, Marine Cluster]

17 South African National Parks Metacat


19 Metacat deployments

20 International LTER
Recommendation for producing EML across all ILTER sites
Recommendation for producing continental and regional metadata caches
  – one or more in each ILTER region
  – initial nodes may use Metacat

21 Dynamic Data Retrieval
[Architecture diagram. Components: User Client, Metadata Catalog, Data Storage, Data Manager (Metadata Parser, Data Loader, Data Store DB, Query). Flow labels: Metadata, Data, CREATE TABLE..., SELECT * FROM..., Query, Results (table of att1 | attr2 | attr3 ...)]
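
The flow on this slide (parse metadata, create a table, load the data, run a query) might be sketched as below, assuming an in-memory SQLite database stands in for the data store. The function names and the hand-written attribute list are hypothetical; this is not the Data Manager library's or Metacat's API. The sample rows are taken from the data table on slide 11.

```python
import sqlite3


def create_table_sql(table, attributes):
    """Build a CREATE TABLE statement from (name, SQL type) attribute metadata."""
    columns = ", ".join(f'"{name}" {sql_type}' for name, sql_type in attributes)
    return f'CREATE TABLE "{table}" ({columns})'


def load_data(conn, table, attributes, rows):
    """Create the table described by the metadata, then insert the data rows."""
    conn.execute(create_table_sql(table, attributes))
    placeholders = ", ".join("?" for _ in attributes)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)


# Stand-in for the output of a metadata parser (assumed column types).
attributes = [
    ("SITE", "TEXT"),
    ("TRANSECT", "INTEGER"),
    ("SP_CODE", "TEXT"),
    ("SIZE", "INTEGER"),
]
rows = [
    ("ABUR", 1, "CLIN", 5),
    ("ABUR", 1, "OPIC", 11),
    ("AHND", 1, "NF", 0),
]

conn = sqlite3.connect(":memory:")
load_data(conn, "observations", attributes, rows)

# A client query against the dynamically created table.
result = conn.execute(
    "SELECT SITE, COUNT(*) FROM observations GROUP BY SITE ORDER BY SITE"
).fetchall()
```

The point of the design is that the table schema is derived from the metadata at load time, so no database schema has to be agreed on in advance.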

22 Join Query
[Diagram labels: Client, Query, Request, Results, Response]

23 Importance of semantics
So far we’ve dealt only with the logical data model
  – any semantics in EML are expressed in natural language
The computer doesn’t really understand:
  – what is being measured
  – how measurements relate to one another
  – how semantics map to logical structure
Analysis depends on understanding the semantic and contextual relationships among data measurements
  – e.g., density measured within subplot

24 Semantic annotation
[Diagram: Data set, mapped to the Observation Ontology]
Mapping between data and the ontology via semantic annotation
Relational data lacks critical semantic information
  – no way for computer to determine that “Ht.” represents a “height” measurement
  – no way for computer to determine if Plot is nested within Site or vice-versa
  – no way for computer to determine if the Temp applies to Site or Plot or Species
slide from J. Madin
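
A minimal sketch of what a semantic annotation adds, assuming annotations are stored as a mapping from column name to an entity and a characteristic, plus a nesting relation between entities. All names here (`Annotation`, `ANNOTATIONS`, `CONTEXT`, `is_nested_within`) and the specific choices, such as Temp applying to Plot, are hypothetical illustrations and not the OBOE annotation syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical link from a column to what it measures."""
    column: str
    entity: str          # the thing observed
    characteristic: str  # the property measured


# Assumed answers to the questions the bare table cannot answer.
ANNOTATIONS = {
    a.column: a
    for a in [
        Annotation("Ht.", "Organism", "Height"),
        Annotation("Temp", "Plot", "Temperature"),
    ]
}

# Assumed nesting: each key is observed within the context of its value.
CONTEXT = {"Plot": "Site"}


def is_nested_within(inner: str, outer: str) -> bool:
    """Follow the context relation outward from `inner` looking for `outer`."""
    current = CONTEXT.get(inner)
    while current is not None:
        if current == outer:
            return True
        current = CONTEXT.get(current)
    return False


ht_meaning = ANNOTATIONS["Ht."].characteristic
plot_in_site = is_nested_within("Plot", "Site")
site_in_plot = is_nested_within("Site", "Plot")
```

With the annotation in place, a program can answer that "Ht." is a height, that Temp describes the Plot, and that Plot is nested within Site rather than the reverse.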

25 Scientific Observations An Observation is the Measurement of the Value of a Characteristic of some Entity in a particular Context

26 Observation ontology (OBOE)
Goal: semantically describe the structure of scientific observation and measurement as found in a data set
Provide extension points for loading specialized domain ontologies
Entities represent real-world objects or concepts that can be measured.
Observations are made about particular entities.
Every measurement has a characteristic, which defines the property of the entity being measured.
Observations can provide context for other observations.
slide from J. Madin
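
The observation model described on this slide can be sketched with a few data classes, assuming entities and characteristics are represented as plain strings. The class layout, the `unit` field, and the example values are assumptions made for illustration; this is not the OBOE ontology itself, which is not reproduced in these slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Measurement:
    """The value of one characteristic (hypothetical simplified form)."""
    characteristic: str
    value: object
    unit: Optional[str] = None  # assumed field, not stated on the slide


@dataclass
class Observation:
    """An observation of an entity, optionally made within another observation."""
    entity: str
    measurements: List[Measurement] = field(default_factory=list)
    context: Optional["Observation"] = None


def context_chain(obs: Observation) -> List[str]:
    """List the observed entity followed by each enclosing context entity."""
    chain = []
    current: Optional[Observation] = obs
    while current is not None:
        chain.append(current.entity)
        current = current.context
    return chain


# Example values are invented: a density measured within a subplot within a site.
site = Observation("Site")
subplot = Observation("Subplot", [Measurement("Area", 1.0, "squareMeter")], context=site)
organisms = Observation(
    "Organism",
    [Measurement("Density", 12.0, "numberPerSquareMeter")],
    context=subplot,
)

chain = context_chain(organisms)
```

Here `chain` lists the organism observation and then its enclosing subplot and site, which is the contextual information a flat data table leaves implicit.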


28 Datasets vs. Observations
EML describes “data sets”
  – collections of related observations with relatively unspecified semantics
  – mostly natural language descriptions
OBOE describes “scientific observations”
  – semantically-precise descriptions of scientific measurements
  – allows understanding of relationships among measurements and context of an observation

29 Model correspondences

30 TDWG Observations Task Group
An Observation is the Measurement of the Value of a Characteristic of some Entity in a particular Context
Create: Community-sanctioned, extensible, and unified ontology model for observational data
  – Compatible with existing standards
  – Integrate with metadata standards such as EML, CSDGM, etc.
  – Reduce the “babel” of scientific dialects

31 Questions?
http://www.nceas.ucsb.edu/ecoinformatics/
http://knb.ecoinformatics.org/
http://seek.ecoinformatics.org/
http://kepler-project.org/

32 Acknowledgments
This material is based upon work supported by:
The National Science Foundation under Grant Numbers 9980154, 9904777, 0131178, 9905838, 0129792, and 0225676.
Collaborators: NCEAS (UC Santa Barbara), University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research), University of Vermont, University of North Carolina, Napier University, Arizona State University, UC Davis
The National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant Number 0072909), the University of California, and the UC Santa Barbara campus.
The Andrew W. Mellon Foundation.
Kepler contributors: SEEK, Ptolemy II, SDM/SciDAC, GEON, RoadNet, EOL, Resurgence

