Improving Data Discovery in Metadata Repositories through Semantic Search Chad Berkley 1, Shawn Bowers 2, Matt Jones 1, Mark Schildhauer 1, Josh Madin.

Presentation transcript:

Improving Data Discovery in Metadata Repositories through Semantic Search
Chad Berkley 1, Shawn Bowers 2, Matt Jones 1, Mark Schildhauer 1, Josh Madin 3
CISIS/iSEEK, Fukuoka, Japan, March 18
1 National Center for Ecological Analysis and Synthesis, UC Santa Barbara
2 Genome Center, UC Davis
3 Macquarie University

Motivation
Increasing numbers of data sets are becoming available to scientific researchers
Locating data sets of interest is a problem:
– Researcher needs observations of specific phenomena
– Researcher ideally wants comprehensive data
Must improve precision and recall when searching for data

Definitions
Precision: the number of relevant items retrieved by a search divided by the total number of items retrieved by that search
Recall: the number of relevant items retrieved by a search divided by the total number of existing relevant items (which should have been retrieved)
In this case, items are data objects
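The two definitions above can be sketched as a small function over sets of data-object identifiers. The identifiers (`ds1`, `ds2`, ...) and the function name are illustrative assumptions, not part of the KNB system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: IDs of the data objects the search returned
    relevant:  IDs of all data objects that should have been returned
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 data sets returned, 3 truly relevant, 2 in common.
retrieved = {"ds1", "ds2", "ds3", "ds4"}
relevant = {"ds2", "ds4", "ds7"}
p, r = precision_recall(retrieved, relevant)
```

Here half of the returned data sets are relevant (precision 2/4), and two of the three relevant data sets were found (recall 2/3).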

Test Case
Knowledge Network for Biocomplexity (KNB) is a repository for ecological data
KNB contains > 15,000 entries and is growing rapidly
KNB is used by NCEAS, LTER, PISCO, ILTER, and others
KNB holdings are described in a formal metadata specification, Ecological Metadata Language (EML)

Test Case
KNB offers traditional text-based searching of all or some critical metadata fields (keywords, abstract, author, personnel)
Results often contain extraneous data sets:
– Even keyword matches are often too coarse
– Need more refined methods for searching metadata fields
Test extending search capabilities of KNB with a semantic approach

Our Semantic Approach
Data -> metadata -> annotations -> ontologies
Ontology: formal knowledge representation in OWL-DL
– Hierarchical structure of concepts
– Relationships can link concepts
Annotations link EML metadata elements to concepts in the ontology
EML metadata describe data and its structures

Logical Architecture

Nature of scientific data sets
Scientific data are often in tables
Tables consist of rows (records) and columns (attributes)
The association of specific columns together (tuple) in a scientific data set is often a non-normalized (materialized) view, with special meaning or use for the researcher
Individual cells contain values that are measurements of a characteristic of some thing

Linking data values to concepts
Extensible Observation Ontology (OBOE)
OBOE provides a high-level abstraction of scientific observations and measurements
Enables data (or metadata) structures to be linked to domain-specific ontology concepts
Can inter-relate values in a tuple
Clarifies the semantics of the data set as a whole, not just of “independent” values
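As a rough illustration of the observation and measurement abstraction described above, the sketch below models an observation of one entity whose measurements each tie a table column to a characteristic and a standard. The class names, field names, column names, and concept labels are simplified assumptions for illustration; they are not the actual OBOE class definitions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Measurement:
    column: str          # attribute (column) name in the data table
    characteristic: str  # what is measured, e.g. "Height"
    standard: str        # unit or standard used, e.g. "Meter"


@dataclass
class Observation:
    entity: str  # the observed thing, e.g. "Tree"
    measurements: list = field(default_factory=list)


def describe(row, observation):
    """Turn one table row into (entity, characteristic, value, standard) statements."""
    return [
        (observation.entity, m.characteristic, row[m.column], m.standard)
        for m in observation.measurements
    ]


# Hypothetical annotation of a two-column table about trees.
tree = Observation(entity="Tree", measurements=[
    Measurement(column="ht", characteristic="Height", standard="Meter"),
    Measurement(column="dbh", characteristic="Diameter", standard="Centimeter"),
])
row = {"ht": 12.5, "dbh": 30.0}
statements = describe(row, tree)
```

Because both measurements hang off the same observation, the two values in the row are explicitly related to one observed tree rather than being treated as independent numbers.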

OBOE: Extensible Observation Ontology

Logical Architecture

XML Links

KNB metadata catalog
Stores EML (XML) and raw data objects
Extend to store ontologies, both domain ontologies and OBOE (OWL-DL serialized in XML)
Extend to store annotations (XML)

Metacat Implementation

KNB metadata catalog
Stores EML (XML) and raw data objects
Extend to store ontologies, both domain ontologies and OBOE (OWL-DL serialized in XML)
Extend to store annotations (XML)
Jena to facilitate querying ontologies
Pellet to reason (consistency of ontologies; class subsumption)

Types of Implemented Searches
Simple keyword (baseline)
Keyword-based (ontological) term expansion
Annotation enhanced term expansion
Observation based structured query

Concepts of Semantic Search
Annotations give metadata attributes semantic meaning with respect to an ontology
Enable structured search against annotations to increase precision
Enable ontological term expansion to increase recall
Precisely define a measured characteristic and the standard used to measure it via OBOE

Simple Keyword Search
High false positive rate (low precision)
Metadata structure is often ignored
Project-level metadata often conflicts with attribute-level metadata
Example: a search for “soil” will return frog data because the description of the lake in which the frogs were studied contains the word “soil”
Synonyms for search terms are ignored

Keyword-based Term Expansion
Synonyms and subclasses of the search term are discovered via the ontology
Additional terms are added to the query of metadata documents
Example: a search for “Grasshopper” also searches for “Orchelimum,” “Romaleidae,” etc.
Increases recall, possibly decreases precision
Can help fight “semantic drift”: annotations allow interpretation to evolve
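A minimal sketch of keyword-based term expansion follows, assuming a toy in-memory ontology held in two dictionaries. The subclass and synonym entries, the document texts, and the function names are hypothetical; the system described in these slides queries OWL-DL ontologies with Jena rather than Python dictionaries.

```python
# Toy ontology (illustrative assumption): term -> direct subclasses / synonyms.
SUBCLASSES = {
    "grasshopper": ["romaleidae"],
    "romaleidae": ["romalea"],
}
SYNONYMS = {"grasshopper": ["caelifera"]}


def expand_term(term):
    """Collect the term plus all synonyms and transitive subclasses."""
    expanded, stack = set(), [term.lower()]
    while stack:
        current = stack.pop()
        if current in expanded:
            continue
        expanded.add(current)
        stack.extend(SUBCLASSES.get(current, []))
        stack.extend(SYNONYMS.get(current, []))
    return expanded


def keyword_search(term, docs):
    """Return IDs of documents whose text contains any expanded term."""
    terms = expand_term(term)
    return sorted(
        doc_id for doc_id, text in docs.items()
        if any(t in text.lower() for t in terms)
    )


# Hypothetical metadata documents.
DOCS = {
    "doc1": "Counts of Romalea individuals along transects",
    "doc2": "Grasshopper density in tallgrass prairie",
    "doc3": "Frog surveys in a lake; shoreline soil noted",
}
matches = keyword_search("Grasshopper", DOCS)
```

A plain keyword search for "Grasshopper" would find only `doc2`; the expanded query also finds `doc1`, which is the recall gain. The matching is still free-text, so any document that merely mentions an expanded term in passing would also be returned.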

Annotation Enhanced Term Expansion
Terms are first expanded as in keyword-based term expansion
Search is performed against annotations, not the metadata itself
Returns metadata documents that are linked to the matching annotations
Increases recall through term expansion but also increases precision through explicit assertion of relevance (annotation)
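The difference from the keyword variant can be sketched as follows: the expanded concepts are matched against annotation records rather than free text. The annotation tuples, concept labels, and document identifiers are hypothetical, and the dictionary stands in for a real ontology.

```python
# Toy ontology (illustrative assumption): concept -> direct subclasses.
SUBCLASSES = {"Grasshopper": ["Romaleidae"]}

# Hypothetical annotations: (metadata document, annotated attribute, ontology concept).
ANNOTATIONS = [
    ("doc1", "sp_count", "Romaleidae"),
    ("doc2", "density", "Grasshopper"),
    ("doc3", "frog_count", "Frog"),  # its abstract may mention grasshoppers as prey
]


def expand_concept(concept):
    """Return the concept together with all of its transitive subclasses."""
    found, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(SUBCLASSES.get(current, []))
    return found


def annotation_search(concept, annotations):
    """Return documents with at least one attribute annotated by an expanded concept."""
    concepts = expand_concept(concept)
    return sorted({doc for doc, _attribute, c in annotations if c in concepts})


result = annotation_search("Grasshopper", ANNOTATIONS)
```

`doc3` is excluded even if its free-text description happens to contain the word "grasshopper", because none of its attributes is annotated with a grasshopper concept.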

Observation Based Structured Query
Takes advantage of observation and measurement structures and relationships
Search is based on an observed entity (e.g. a Grasshopper) and the measurement standards and characteristics used to measure it
Observed entity is a “template” on which the measurement characteristic and standard are applied

Observation Based Structured Query
Both datasets contain “tree lengths”
Annotation search for “tree length” would return both datasets
Structured search allows the search to be limited by the observed entity (e.g. a tree or a tree branch)
Increases precision and recall
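The tree-length example can be sketched as a filter over structured annotation records. The record layout, dataset names, and concept labels below are assumptions made for illustration and do not reflect the actual Metacat query interface.

```python
from typing import NamedTuple, Optional


class MeasurementAnnotation(NamedTuple):
    dataset: str
    entity: str          # observed entity, e.g. "Tree"
    characteristic: str  # measured characteristic, e.g. "Length"
    standard: str        # measurement standard, e.g. "Meter"


# Two hypothetical datasets that both record a length.
ANNOTATIONS = [
    MeasurementAnnotation("datasetA", "Tree", "Length", "Meter"),
    MeasurementAnnotation("datasetB", "TreeBranch", "Length", "Centimeter"),
]


def structured_query(annotations, characteristic,
                     entity: Optional[str] = None,
                     standard: Optional[str] = None):
    """Return datasets matching the characteristic, optionally limited by entity and standard."""
    return sorted({
        a.dataset for a in annotations
        if a.characteristic == characteristic
        and (entity is None or a.entity == entity)
        and (standard is None or a.standard == standard)
    })


any_length = structured_query(ANNOTATIONS, "Length")
tree_length = structured_query(ANNOTATIONS, "Length", entity="Tree")
```

Searching on the characteristic alone returns both datasets; adding the observed entity narrows the result to the dataset that measures whole trees.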

Keyword-based Term Expansion

Annotation Enhanced Term Expansion

Structured Search

Conclusions
Simple keyword (baseline)
– (+) precision, (+) recall
Keyword-based (ontological) term expansion
– (+/-) precision, (++) recall
Annotation enhanced term expansion
– (++) precision, (+++) recall
Observation based structured query
– (+++) precision, (+++) recall

Test site:
Continue developing corpus of annotated data sets to better quantify precision/recall advantages
Enable use of “context” structure in OBOE
New award:
– Enhance tools for creating annotations using ontologies
– Improve interfaces for structuring searches
Work supported by National Science Foundation awards