Capturing and Organizing Scientific Annotations

Greg Riccardi
Florida State University
riccardi@cs.fsu.edu
Riccardi: Workshop on Data Management March 17, 2004

What is an Annotation?
An annotation is an assertion of a relationship among objects: someone claims that several objects are connected by a relationship and gives evidence of the connection. It includes a record of the author and the date of the assertion. The objects are often datasets with provenance, and annotations often assert quality characteristics of data objects. Crucial social components include attribution, confidence, and validity; ontologies and compliance with standards; the establishment of an object naming strategy; and security policies.
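The definition above maps naturally onto a record type. The following is a minimal Python sketch, not a schema from any system named in this talk; the field names, the `confidence` scale, and the object identifiers in the usage lines are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Annotation:
    """An assertion that objects are connected by a relationship (hypothetical schema)."""
    relationship: str          # name of the asserted relationship, e.g. "same-object-as"
    objects: tuple[str, ...]   # identifiers of the objects the assertion is about
    author: str                # attribution: who makes the claim
    asserted_on: date          # when the claim was made
    evidence: str = ""         # free text or a pointer to supporting evidence
    confidence: float = 1.0    # author's stated confidence, from 0.0 to 1.0

    def __post_init__(self) -> None:
        # Basic validity checks: the claim must be about something,
        # must be attributable, and must use the stated confidence scale.
        if not self.objects:
            raise ValueError("an annotation must refer to at least one object")
        if not self.author:
            raise ValueError("an annotation must record its author")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")

    def mentions(self, object_id: str) -> bool:
        """True if this annotation makes a claim about `object_id`."""
        return object_id in self.objects


# Hypothetical identifiers for two catalog entries claimed to be the same object.
same = Annotation(
    relationship="same-object-as",
    objects=("skyquery:dataset-1/obj-17", "skyquery:dataset-2/obj-4"),
    author="a.astronomer",
    asserted_on=date(2004, 3, 17),
    evidence="positions agree within the matching radius",
    confidence=0.9,
)
print(same.mentions("skyquery:dataset-1/obj-17"))
```

Making the record frozen treats an annotation as a dated claim that is superseded by new annotations rather than edited in place, which keeps attribution intact.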

Example from SkyServer
[Diagram labels: "These objects are the same"; telescope and catalog info; SkyQuery dataset (twice); query string (twice); analysis.]

Types and Importance of Annotations
There are three types of annotations: systematic, semi-structured, and ad hoc. Annotations are of primary importance in data semantics and analysis: they record the semantics of data and people's opinions about data. We need tools that make annotations easy to create, organize, understand, and search.

Systematic Annotations
Systematic annotations are collected automatically; they are anticipated, organized, and factual. Experimental metadata is the typical case. In the Jefferson Lab run log, for example, a run log entry asserts a relationship between the metadata and the raw data. The run number identifies each object, which appears as rows in the runBegin, runEnd, runFiles, and runComment tables. In most cases, object identification is much more difficult, and, as noted in earlier talks, experimental metadata is not always collected or curated properly.
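To make the run-log example concrete, here is a minimal sketch using Python's built-in sqlite3 module. Only the four table names and the idea that the run number identifies each object come from the Jefferson Lab example; the column names and the sample values are made up for illustration.

```python
import sqlite3

# Hypothetical schema: the table names and the shared run number come from the
# Jefferson Lab example; every other column is an illustrative guess.
SCHEMA = """
CREATE TABLE runBegin   (run INTEGER PRIMARY KEY, start_time TEXT, target TEXT);
CREATE TABLE runEnd     (run INTEGER PRIMARY KEY REFERENCES runBegin(run),
                         end_time TEXT, events INTEGER);
CREATE TABLE runFiles   (run INTEGER REFERENCES runBegin(run), path TEXT);
CREATE TABLE runComment (run INTEGER REFERENCES runBegin(run), comment TEXT);
"""


def run_record(conn: sqlite3.Connection, run: int) -> dict:
    """Gather everything the run log asserts about one run number."""
    begin = conn.execute(
        "SELECT start_time, target FROM runBegin WHERE run = ?", (run,)
    ).fetchone()
    end = conn.execute(
        "SELECT end_time, events FROM runEnd WHERE run = ?", (run,)
    ).fetchone()
    files = [row[0] for row in conn.execute(
        "SELECT path FROM runFiles WHERE run = ? ORDER BY path", (run,))]
    comments = [row[0] for row in conn.execute(
        "SELECT comment FROM runComment WHERE run = ? ORDER BY rowid", (run,))]
    return {"run": run, "begin": begin, "end": end,
            "files": files, "comments": comments}


# Made-up sample values for one run.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO runBegin VALUES (1001, '2004-03-01T08:00', 'target-A')")
conn.execute("INSERT INTO runEnd VALUES (1001, '2004-03-01T09:30', 1000000)")
conn.execute("INSERT INTO runFiles VALUES (1001, '/data/run1001.dat')")
conn.execute("INSERT INTO runComment VALUES (1001, 'stable beam')")
print(run_record(conn, 1001))
```

Because every table shares the run number, one key is enough to reassemble the whole annotation; the difficulty the slide points to arises when no such shared identifier exists.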

Systematic Provenance Annotations
Derivation provenance is the record of the computational creation of data; it must be collected by the computations directly. Query provenance is the record of how a dataset was selected: in SkyQuery, a user submits a query and the results dataset is retained in MyDB, so the query must also be retained to record the semantics of the dataset. The GGF Database Access and Integration Working Group (DAIS) is developing standards for representing queries on databases and other data stores; such a representation provides a data access recipe that can be used to fetch a particular dataset. Another example: Morphbank images of scanning electron micrographs.
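A minimal sketch of query provenance, assuming a MyDB-like store is simply another SQLite database: the query text is saved next to the result rows, so the dataset keeps a record of how it was selected. The table names `photo` and `datasets` and the two helper functions are hypothetical; this is not the SkyQuery or DAIS interface.

```python
import json
import sqlite3
from datetime import datetime, timezone


def save_with_provenance(source: sqlite3.Connection, mydb: sqlite3.Connection,
                         name: str, query: str) -> int:
    """Run `query` on `source` and keep both the rows and the query text in `mydb`.

    The rows alone do not say how they were selected; storing the query
    alongside them records the semantics of the saved dataset.
    """
    rows = source.execute(query).fetchall()
    mydb.execute(
        "CREATE TABLE IF NOT EXISTS datasets ("
        " name TEXT PRIMARY KEY, query TEXT NOT NULL,"
        " saved_at TEXT NOT NULL, rows_json TEXT NOT NULL)"
    )
    mydb.execute(
        "INSERT INTO datasets VALUES (?, ?, ?, ?)",
        (name, query, datetime.now(timezone.utc).isoformat(), json.dumps(rows)),
    )
    mydb.commit()
    return len(rows)


def provenance_of(mydb: sqlite3.Connection, name: str) -> str:
    """Return the query that produced a saved dataset (its data access recipe)."""
    (query,) = mydb.execute(
        "SELECT query FROM datasets WHERE name = ?", (name,)
    ).fetchone()
    return query


# Hypothetical source catalog with two objects.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE photo (objid INTEGER, ra REAL, dec REAL)")
source.executemany("INSERT INTO photo VALUES (?, ?, ?)",
                   [(1, 180.0, 0.5), (2, 181.2, -0.3)])
mydb = sqlite3.connect(":memory:")
saved = save_with_provenance(source, mydb, "north_objects",
                             "SELECT objid FROM photo WHERE dec > 0")
print(saved, provenance_of(mydb, "north_objects"))
```

Rerunning the stored query reproduces the dataset only if the source has not changed since, which is one reason a timestamp is kept with it.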

Semi-Structured Annotations
Semi-structured annotations are anticipated and organized, but collected mostly by hand. Example: the experimental logbook from Jefferson Lab.

Jefferson Lab Logbook
Examples from the logbook: run and log daily summaries; a standard logbook entry, with many standard (expected) fields and a comment field filled with ad hoc annotations ("ADB crate", "voltage"); a complaint about logbook usage; a suggested strategy for creating logbook entries; and an automatically generated logbook entry, where post-processing software creates database entries directly and image tags point to files on some computer.

Semi-Structured Annotations (continued)
A Jefferson Lab logbook entry has specific fields: run id, subject, author, entry_type, and system. The entry also has an ad hoc comment field, and searching that field requires interpretation of words [Ontologies?]. The logbook search page is based on the predefined structure and is created and used by experts.
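The difference between the structured fields and the ad hoc comment field shows up as soon as you search. A minimal Python sketch, assuming entries are plain records with the fields listed above; the field types, the `search` helper, and the sample entries are invented for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class LogEntry:
    # Structured fields named in the Jefferson Lab example; types are assumptions.
    run_id: int
    subject: str
    author: str
    entry_type: str
    system: str
    comment: str = ""  # the ad hoc, free-text part of the entry


def search(entries, keyword=None, **criteria):
    """Exact match on structured fields, then a loose word match on the comment.

    The keyword test is a case-insensitive substring match, so it cannot tell
    that "HV" and "high voltage" refer to the same thing; closing that gap is
    where a controlled vocabulary or ontology would come in.
    """
    known = {f.name for f in fields(LogEntry)}
    unknown = set(criteria) - known
    if unknown:
        raise ValueError(f"unknown field(s): {sorted(unknown)}")
    hits = []
    for entry in entries:
        if any(getattr(entry, name) != value for name, value in criteria.items()):
            continue
        if keyword is not None and keyword.lower() not in entry.comment.lower():
            continue
        hits.append(entry)
    return hits


# Invented sample entries.
log = [
    LogEntry(1001, "crate reset", "shift-a", "problem", "DAQ", "ADB crate rebooted"),
    LogEntry(1002, "HV trip", "shift-b", "problem", "detector", "HV tripped on sector 3"),
    LogEntry(1002, "voltage restored", "shift-b", "note", "detector", "high voltage back on"),
]
print([e.subject for e in search(log, system="DAQ")])
print([e.subject for e in search(log, keyword="voltage")])
```

The structured query on `system` is unambiguous, while the keyword search for "voltage" silently misses the entry that says "HV", which is the interpretation problem the slide raises.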

Ad Hoc Annotations
An ad hoc annotation asserts a connection between arbitrary objects. Example from morphology.

Morphology Publication Example

Ad Hoc Annotations (continued)
Searching ad hoc annotations is difficult: it is ambiguous and inefficient. Google is a search engine for ad hoc annotations, but it is not based on an organized ontology or on document structure.

Annotating Data Quality
Suppose that someone finds an error in a SkyQuery dataset and creates an ad hoc annotation: "Objects X, Y, Z in data catalog D are incorrectly identified." Should that annotation be included in any query that returns those objects? We don't know how to carry quality annotations into query results.
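The talk leaves this as an open question. One naive possibility, sketched here only to make the problem concrete, is to key quality annotations by catalog and object identifier and attach them to any returned row that still carries that identifier. The names `QualityNote`, `annotate_results`, and `objid` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityNote:
    catalog: str     # data catalog the note is about (D in the example)
    object_id: int   # object the note is about (X, Y, or Z)
    text: str
    author: str


def annotate_results(catalog, rows, notes, id_key="objid"):
    """Return copies of `rows` with a `quality_notes` list added to each.

    Only rows that still carry a catalog object identifier under `id_key`
    can be matched; rows without it get an empty list.
    """
    by_object = {}
    for note in notes:
        if note.catalog == catalog:
            by_object.setdefault(note.object_id, []).append(note.text)
    return [
        {**row, "quality_notes": list(by_object.get(row.get(id_key), []))}
        for row in rows
    ]


# Hypothetical ids 10, 11, 12 stand in for objects X, Y, Z in catalog D.
notes = [QualityNote("D", oid, "incorrectly identified", "a.reviewer")
         for oid in (10, 11, 12)]
rows = [{"objid": 10, "mag": 17.2}, {"objid": 20, "mag": 18.0}, {"mag": 16.5}]
for r in annotate_results("D", rows, notes):
    print(r)
```

The last row, which could stand for an aggregate or a join that dropped the identifier, gets no notes even if it was computed from flagged objects; that is the case this naive approach cannot handle.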

Organizing Annotations
We need to find ways to structure ad hoc annotations. When structure emerges, capture it: create specific schemas, and create specific interfaces for collection, display, and search. The main goal is to make this easy enough for scientists, who must see advantages to the extra work of structuring their thoughts and conforming to ontologies.

Querying the Annotation Activity
Use publish/subscribe database strategies: publish the history of updates, and subscribe to queries on that history. Suppose you are the curator of a SkyQuery database and someone claims that the object catalog is wrong: you should be informed.
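A minimal in-memory sketch of the publish/subscribe idea: every annotation update is appended to a history, and each subscription is a standing query (a predicate) evaluated against new updates. The class names and object identifiers are hypothetical, and a database-backed system would evaluate the standing queries inside the store rather than in Python.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Update:
    author: str
    target: str   # identifier of the annotated object, e.g. a catalog
    claim: str


Query = Callable[[Update], bool]
Notify = Callable[[Update], None]


class AnnotationHistory:
    """Append-only history of annotation updates with standing queries."""

    def __init__(self) -> None:
        self.history: list[Update] = []
        self._subscriptions: list[tuple[Query, Notify]] = []

    def subscribe(self, query: Query, notify: Notify, replay: bool = False) -> None:
        """Register a standing query; with replay=True, also run it on past updates."""
        self._subscriptions.append((query, notify))
        if replay:
            for update in self.history:
                if query(update):
                    notify(update)

    def publish(self, update: Update) -> None:
        """Record an update, then notify every subscriber whose query matches it."""
        self.history.append(update)
        for query, notify in self._subscriptions:
            if query(update):
                notify(update)


# The curator subscribes to claims about a (hypothetical) object catalog id.
history = AnnotationHistory()
curator_inbox: list[Update] = []
history.subscribe(lambda u: u.target == "skyquery:object-catalog", curator_inbox.append)
history.publish(Update("a.user", "skyquery:object-catalog", "the object catalog is wrong"))
history.publish(Update("b.user", "jlab:run-1001", "voltage entry is incomplete"))
print([u.claim for u in curator_inbox])
```

Keeping the full history, rather than only current state, lets a subscriber who arrives late replay earlier claims with `replay=True`.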

Example of Annotation Query
[Diagram labels: "These objects are the same"; telescope and catalog info; SkyQuery dataset (twice); query string (twice); analysis; curator.]

Challenges of Ad Hoc Annotations
The challenges include establishing globally unique, persistent data object names; optimizing searches; defining result semantics; ontologies; capturing the structure of frequent annotation styles; and providing user interfaces to define semi-structured annotations.

Annotations Technology: SAM
Scientific Annotation Middleware (SAM), Jim Myers and Al Geist; EMSL Electronic Notebook.

Annotations Technology: Amaya
Amaya supports annotations of HTML and XML documents. The project includes a browser and document editor, and text annotations can be attached to XHTML, XML, MathML, and SVG documents (http://www.w3.org/amaya). Annotea is the associated collaborative annotation technology (http://www.w3.org/2001/Annotea/).

References
SkyQuery and SkyServer: http://www.skyquery.org/ ; http://cas.sdss.org/dr2/en/tools/chart/navi.asp
Jefferson Lab Logbooks:
  Home page: http://clasweb.jlab.org/clasonline/
  Today's runs: http://clasweb.jlab.org/clasonline/servlet/prodruninfo?action=today
  Today's logbook entries: http://clasweb.jlab.org/clasonline/servlet/prodloginfo?action=today
  Run detail page: http://clasweb.jlab.org/clasonline/servlet/prodruninfo?action=detail&run=42331
  Logbook entry: http://clasweb.jlab.org/clasonline/servlet/newloginfo?action=logentry&entryId=17082
Morphbank (Johan Liljeblad & Fredrik Ronquist): http://www.morphbank.net/ ; http://www.csit.fsu.edu/~ronquist/papers/SystEnt1998.pdf
Scientific Annotation Middleware: http://collaboratory.emsl.pnl.gov/
W3C Amaya XML Annotation project: http://www.w3.org/Amaya/ ; http://www.w3.org/2001/Annotea/