

Project Goal (from the proposal)
The overall goal of this two-year project is to establish a comprehensive, easily accessible public resource database of images, videos, and animations of cells from a variety of organisms, including both cell architecture and intracellular functionalities, as well as to stimulate the economy through the creation and retention of 18 positions (7 full-time equivalents) and immediate deployment.

Team
– Caroline Kane, Principal Investigator, University of California, Berkeley
– John Murray, Co-Principal Investigator, University of Pennsylvania
– Janet Iwasa, Co-Principal Investigator, Harvard Medical School
– Joan Goldberg, Executive Director, American Society for Cell Biology
– David Orloff, Manager, Image Library, American Society for Cell Biology
– John Hufnagle, Scientific Informatics Developer, MBL

Expert Annotation: The Value Add
11 annotators
– They often solicit and upload images
– They are often in contact with the scientists who produced the images
Annotators:
– Gregory Antipa, San Francisco State University
– Carrie Baker Brachmann
– Margaret I. Davis, National Institutes of Health, National Institute on Alcohol Abuse and Alcoholism
– Keigi Fujiwara, University of Rochester
– Catherine Galbraith, National Institutes of Health
– Yu-Chen Hwang, University of California, Santa Cruz
– Wallace Ip, University of Cincinnati College of Medicine
– Caroline McKeown, The Scripps Research Institute
– Linda Parysek, University of Cincinnati College of Medicine
– Ginger Withers, Whitman College
– Chris Woodcock, University of Massachusetts Amherst

Annotation Information
– Image Description
– Ontology terms
– Attribution
  1. Names
  2. PubMed IDs
  3. Citations
  4. Links
  5. Dates
– Dimensional

Multiple Categories of Ontologies
Categories including:
– Biological Sources: NCBI, cell type, cellular component
– Biological Context: biological process, molecular function
– Imaging Methods
– Sample Preparation
Ontologies provide a controlled vocabulary
Useful for searching and for browse categorization

Ontologies
– NCBI Organism Classification (NCBITaxon)
– Gene Ontology (GO)
  – biological_process
  – molecular_function
  – cellular_component
– Cell Type (CL)
– Cell Line (MCC)
– Human Development (EHDA)
– Mouse Gross Anatomy (EMAP)
– Plant Growth (PO)
– Teleost Anatomy (TAO)
– Xenopus Anatomy (XAO)
– Zebrafish Anatomy (ZFA/ZFS)
– Human Disease (DOID)
– Mouse Pathology (MPATH)
– Biological Imaging Methods (BIM): the project now controls this ontology

Image Lifecycle: Image Data Upload, Annotation, Publish & Index, Library, Edit/Save, Retract
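The lifecycle slide names the stages and actions but the arrows did not survive in this transcript. The following is a minimal Python sketch of one plausible reading, in which Edit/Save loops within Annotation and Retract returns a published image to Annotation. The transition table, the action names, and the `advance` helper are assumptions made for illustration, not the project's implementation.

```python
from enum import Enum


class Stage(Enum):
    UPLOADED = "Image Data Upload"
    ANNOTATION = "Annotation"
    LIBRARY = "Library"


# (action, current stage) -> next stage.
# ASSUMPTION: these arrows are inferred from the stage names on the slide.
TRANSITIONS = {
    ("annotate", Stage.UPLOADED): Stage.ANNOTATION,
    ("edit_save", Stage.ANNOTATION): Stage.ANNOTATION,
    ("publish_and_index", Stage.ANNOTATION): Stage.LIBRARY,
    ("retract", Stage.LIBRARY): Stage.ANNOTATION,
}


def advance(stage, action):
    """Return the stage reached by applying `action`, or raise if not allowed."""
    try:
        return TRANSITIONS[(action, stage)]
    except KeyError:
        raise ValueError(
            f"{action!r} is not allowed from {stage.value!r}"
        ) from None
```

Keeping the transitions in one table makes it easy to correct the sketch if the real diagram differs.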

System Components
– OMERO Image Repository Server
– DB: PostgreSQL
– Disk: Index, Image Data
– Web Application
– Annotation Web Application
– Server (Harvard)
– Image Upload
– Library Browser Requests
– Annotation Browser Requests

Image Upload Submission: Image Data Upload, Annotation, Publish & Index, Library, Edit/Save, Retract

Image Data Upload
– Submitter downloads the Upload Java application
– Raw image data files are selected (105 image file formats supported)
– Submitter contact information is supplied
– Submitter supplies an image description (not visible in the Library) containing technical image details for use by the annotators
– Submitter chooses a license type

Upload Process & Components
– Java Upload App
– Submitter Machine
– HTTP
– Importer Worker Process
– OMERO Image Repository
– Production Server (Harvard)
– DB: PostgreSQL
– Disk: Index, Image Data

Image Lifecycle: Image Data Upload, Annotation, Publish & Index, Library, Edit/Save, Retract

Annotation Process & Components
– OMERO Image Repository Server
– DB: PostgreSQL
– Disk: Index, Image Data
– Annotation Web Application (Django)
– Server (Harvard)
– Apache Server

Image Lifecycle: Image Data Upload, Annotation, Publish & Index, Library, Edit/Save, Retract

Publish
– OMERO Image Repository Server
– DB: PostgreSQL
– Disk: Index, Image Data
– Annotation Web Application
– Server (Harvard)
– Publish
– Library
– Custom Indexing Plug-in
– Lucene Indexer
– Browser

Indexing
– The OMERO repository lets developers add their own indexing step to generate custom search index fields and values.
– The custom indexing plug-in is written in Java and configured into the OMERO system.
– Each image is presented to the custom plug-in whenever it is modified.

Cell Library Custom Indexing: Generating Index Values
Custom Lucene document index fields:
– id
– Ontology information for each term in each ontology category: term id, parent id, ancestor ids, term description, synonym description
– attribution (names, pubmed, citations, urls)
– is_recommended (for front page/browse poster child image)
– is_video
– description
– license type
– publish date (useful for Recent browsing)
– dimensions
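To make the field list above concrete, here is a minimal Python sketch of how per-image index values might be assembled, with a plain dict standing in for a Lucene document. The actual plug-in is written in Java. The field names (`<prefix>_term_id` and so on), the `build_index_fields` and `ancestor_ids` helpers, and the single-parent simplification are all assumptions for illustration.

```python
def ancestor_ids(term_id, parent_of):
    """Walk parent links upward and return all ancestor ids, nearest first.

    ASSUMPTION: each term has at most one parent; real ontologies may
    have several, which would need a graph traversal instead.
    """
    found = []
    current = parent_of.get(term_id)
    while current is not None and current not in found:
        found.append(current)
        current = parent_of.get(current)
    return found


def build_index_fields(image, parent_of):
    """Flatten one annotated image into index field name/value pairs."""
    fields = {
        "id": image["id"],
        "description": image["description"],
        "is_video": image["is_video"],
        "license": image["license"],
    }
    # One group of fields per ontology category, keyed by a field prefix.
    for prefix, term_id in image["terms"].items():
        fields[f"{prefix}_term_id"] = term_id
        fields[f"{prefix}_parent_id"] = parent_of.get(term_id)
        fields[f"{prefix}_ancestor_ids"] = ancestor_ids(term_id, parent_of)
    return fields
```

Storing the ancestor ids alongside each term is what would let a search for a broad term also find images annotated with any narrower term beneath it.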

Ontology Data Scripting
– Download latest ontology .obo file (Ruby)
– Parse .obo file (custom BioJava)
– JSON data
– Populate PostgreSQL ontology tables (Ruby)
– BioPortal Ontology REST services
– DB: PostgreSQL
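The parse step in this pipeline turns an .obo file into JSON. The project used Ruby scripts and a custom BioJava parser. The sketch below swaps in plain Python purely to illustrate the idea: it reads `[Term]` stanzas and keeps the `id`, `name`, `is_a`, and `synonym` tags. The `parse_obo` function and the output shape are assumptions, not the project's schema.

```python
import json


def parse_obo(text):
    """Parse [Term] stanzas of OBO text into a list of plain dicts."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            if line == "[Term]":
                current = {"id": None, "name": None, "is_a": [], "synonyms": []}
                terms.append(current)
            else:
                current = None
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            # Drop the trailing "! comment" that OBO allows on a line.
            value = value.split(" ! ", 1)[0].strip()
            if tag == "id":
                current["id"] = value
            elif tag == "name":
                current["name"] = value
            elif tag == "is_a":
                current["is_a"].append(value)
            elif tag == "synonym":
                parts = value.split('"')
                if len(parts) >= 3:
                    current["synonyms"].append(parts[1])
    return terms


def obo_to_json(text):
    """Serialize the parsed terms as JSON for a later database load step."""
    return json.dumps(parse_obo(text), indent=2)
```

Header lines before the first stanza and non-Term stanzas such as `[Typedef]` are skipped, so the JSON contains only term records.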

Indexing Ontology Terms
Mapping file:
…
"field_mappings" : [
  {
    "module" : "web_annotation_module",
    "namespace" : "com.glencoesoftware.ilib.ann:ncbi",
    "name" : "NCBIORGANISMALCLASSIFICATION",
    "index_field_name_prefix" : "ncbi",
    "ontologies" : [
      { "db_table_name" : "ncbis", "model_klass" : "Ncbi", "onto_term_regex_pattern" : "NCBITaxon:[0-9]*", "ontology_id" : "1023" }
    ]
  },
  …
Annotation xml fragment:
com.glencoesoftware.ilib.ann:celltype CELLTYPE Ciliated Protist
com.glencoesoftware.ilib.ann:ncbi NCBIORGANISMALCLASSIFICATION NCBITaxon:
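One way to read the mapping file is as a routing table: a term id that matches an entry's `onto_term_regex_pattern` is indexed under that entry's `index_field_name_prefix`. The Python sketch below illustrates that reading. The first entry copies values from the slide. The second entry's pattern and prefix, and the `prefix_for` helper, are hypothetical.

```python
import re

FIELD_MAPPINGS = [
    {
        # Values taken from the mapping file shown on the slide.
        "name": "NCBIORGANISMALCLASSIFICATION",
        "index_field_name_prefix": "ncbi",
        "onto_term_regex_pattern": "NCBITaxon:[0-9]*",
    },
    {
        # HYPOTHETICAL entry: the pattern and prefix are assumed here.
        "name": "CELLTYPE",
        "index_field_name_prefix": "celltype",
        "onto_term_regex_pattern": "CL:[0-9]*",
    },
]


def prefix_for(term_id, mappings=FIELD_MAPPINGS):
    """Return the index field prefix whose pattern matches the whole term id."""
    for mapping in mappings:
        if re.fullmatch(mapping["onto_term_regex_pattern"], term_id):
            return mapping["index_field_name_prefix"]
    return None
```

For example, `prefix_for("NCBITaxon:10090")` selects the `ncbi` prefix, while an id that matches no pattern yields `None` and would be left out of the ontology fields.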

Additional Indexing Artifacts
Generation of db data to support efficient Library browsing
– Entries made for each ontology term in use

Image Lifecycle: Image Data Upload, Annotation, Publish & Index, Library, Edit/Save, Retract

System Components
– OMERO Server
– DB: PostgreSQL
– Disk: Index, Image Data
– Annotation Web Application
– Server (Harvard)
– Passenger Container
– Apache
– Jetty Servlet Container
– Library Web Service

Connecting to the OMERO Server
– OMERO Server (Java)
– Annotation Web Application (Django/Python)
– Server (Harvard)
– Passenger Container
– Jetty Servlet Container (8081,2,3,4,5)
– Library Web Service (Java): search, get image annotation data, convert video-to-flash, get raw image bytes, get OME-TIF image bytes
– OMERO Ice Middleware (Java)
– OMERO Ice Middleware (Python)
– REST-like
– Apache

Library Basic Search: Primary Weighting, Secondary Weighting

Library Advanced Search

Advanced Search
– If the ontology search value exactly matches an existing term, the search returns matches against that term and its descendant terms, e.g. “rodentia” will match rat, mouse, etc.
– If the ontology search value does not match an existing ontology term, a simple text-match search is run against that ontology category.
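The two rules above can be sketched in a few lines of Python. This is an illustration over in-memory data, not the Library's Lucene query. It assumes each indexed image carries its term id, term name, and ancestor ids, so that a descendant lookup reduces to a membership test. The `advanced_search` function and the record layout are hypothetical.

```python
def advanced_search(query, terms, images):
    """Return ids of images matching `query` within one ontology category.

    terms:  {term_id: term_name} for the category.
    images: dicts with 'id', 'term_id', 'term_name', 'ancestor_ids'.
    """
    q = query.strip().lower()
    exact = {tid for tid, name in terms.items() if name.lower() == q}
    if exact:
        # Exact term match: include the term itself and all descendants,
        # i.e. images whose own term or any ancestor is the matched term.
        return [
            img["id"]
            for img in images
            if img["term_id"] in exact or exact & set(img["ancestor_ids"])
        ]
    # No such term: fall back to a simple text match on the term names.
    return [img["id"] for img in images if q in img["term_name"].lower()]
```

With a record for a mouse image and one for a rat image, both listing the rodentia term among their ancestors, a query of "rodentia" returns both, while a query that names no term is matched as plain text.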

Library Browse
Categories:
– Cell Process (GO biological_process)
– Cellular Component (GO cellular_component)
– Cell Type (CL)
– Organism (NCBITaxon)
Sub-categories consist of all ontology terms currently annotated to images, captured during the Indexing phase
Efficiency: only terms in use are listed (NCBI alone has 500K+ terms)
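A minimal Python sketch of how browse sub-categories could be derived from the terms actually in use, rather than from the full ontologies. The per-term image counts, the `browse_entries` helper, and the input shape are assumptions. The slides only state that entries are made for each ontology term in use.

```python
from collections import Counter, defaultdict


def browse_entries(annotations):
    """Group in-use terms by browse category, with an image count per term.

    annotations: iterable of (category, term_name) pairs, one pair per
    term annotated to an image.
    """
    counts = defaultdict(Counter)
    for category, term in annotations:
        counts[category][term] += 1
    # Sort terms alphabetically within each category for stable display.
    return {category: sorted(c.items()) for category, c in counts.items()}
```

Because only annotated terms are recorded, the browse lists stay small even for a very large ontology.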

Some Image Sources
Journals:
– Journal of Cell Biology
– Molecular Biology of the Cell
– The Plant Cell
– Plant Physiology

Some Sources and Contributors
– Don W. Fawcett’s The Cell
– Some images from researchers with MBL ties
  – Clara Franzini-Armstrong
  – Rudolph Oldenburg

Programmatic Access
The Jetty web service interface is externally available:
– Search
– Image metadata
– Raw and OME-TIFF download formats

Statistics
February stats:
– 6,635 Visits
– 5,093 Absolute Unique Visitors
– 31,609 Pageviews

Future Enhancements
– Themed collections with descriptive content
– Image tagging
– Faceted searching (SOLR)

Summary
– Research tool with raw image data available for future image processing
– Image submissions are always accepted; contact David Orloff