The Neuroscience Information Framework: Establishing a practical semantic framework for neuroscience
Maryann Martone, Ph.D., University of California, San Diego

NIF Team
- Amarnath Gupta, UCSD, Co-Investigator
- Jeff Grethe, UCSD, Co-Investigator
- Gordon Shepherd, Yale University; Perry Miller, Luis Marenco
- David Van Essen, Washington University; Erin Reid
- Paul Sternberg, Caltech; Arun Rangarajan, Hans Michael Muller
- Giorgio Ascoli, George Mason University; Sridevi Polavaram
- Anita Bandrowski, NIF Curator
- Fahim Imam, NIF Ontology Engineer
- Karen Skinner, NIH, Program Officer
- Lee Hornbrook, Kara Lu, Vadim Astakhov, Xufei Qian, Chris Condit, Stephen Larson, Sarah Maynard, Bill Bug

What does this mean?
[Figure: neuroscience data in many forms (3D volumes, 2D images, surface meshes, tree structures, ball-and-stick models, "little squiggly lines"), spread across data, people, and information systems.]

The Neuroscience Information Framework: Discovery and utilization of web-based resources for neuroscience
UCSD, Yale, Caltech, George Mason, Washington University. Supported by the NIH Blueprint.
- A portal for finding and using neuroscience resources
- A consistent framework for describing resources
- Simultaneous search of multiple types of information, organized by category
- Supported by an expansive ontology for neuroscience
- Uses advanced technologies to search the "hidden web"

Where do I find… data, software tools, materials, services, training, jobs, funding opportunities, websites, databases, catalogs, literature, supplementary material, information portals… And how many are there?

NIF in action

- Query expansion: synonyms and related concepts
- Boolean queries
- Data sources categorized by "data type" and level of the nervous system
- Simplified views of complex data sources
- Tutorials for using the full resource when getting there from NIF
Example expanded query: Hippocampus OR "Cornu Ammonis" OR "Ammon’s horn"
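The synonym expansion on this slide can be sketched in a few lines. This is a minimal illustration, assuming a hard-coded synonym table (`SYNONYMS` is hypothetical; NIF draws its synonyms from its ontologies) and a search engine that accepts `OR` with quoted phrases.

```python
# Hypothetical synonym table; a real system would pull synonyms from the
# NIF ontologies rather than hard-coding them.
SYNONYMS = {
    "hippocampus": ["Cornu Ammonis", "Ammon's horn"],
}

def quote(term: str) -> str:
    """Quote multi-word terms so a Boolean search engine treats them as phrases."""
    return f'"{term}"' if " " in term else term

def expand_query(term: str, synonyms: dict) -> str:
    """Expand one search term into an OR query over the term and its synonyms."""
    variants = [term] + synonyms.get(term.lower(), [])
    return " OR ".join(quote(v) for v in variants)

print(expand_query("Hippocampus", SYNONYMS))
# Hippocampus OR "Cornu Ammonis" OR "Ammon's horn"
```

Terms with no known synonyms pass through unchanged, so expansion never narrows a search.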

NIF searches across multiple sources of information:
- NIF data federation: independent databases registered with NIF or accessed through web services
- NIF Registry: catalog
- NIF Web: custom web index (work in progress)
- NIF Literature: neuroscience-centered literature corpus (Textpresso)

Guiding principles of NIF
- Builds heavily on existing technologies (open-source tools)
- Information resources come in many sizes and flavors
- The framework has to work with resources as they are, not as we wish them to be
  - Federated system: resources will be independently maintained
  - Very few use standard terminology or map to ontologies
- No single strategy will work for the current diversity of neuroscience resources
- Trying to design the framework to be as broadly applicable as possible to those who are developing technologies
- Interface neuroscience to the broader life science community
- Take advantage of emerging conventions in search and in building web communities

Registering a resource to NIF
- Level 1: NIF Registry: high-level descriptions using NIF vocabularies, supplied by human curators
- Level 2: Access to deeper content; mechanisms for query and discovery; DISCO protocol
- Level 3: Direct query of web-accessible databases; automated registration; mapping of database content to the NIF vocabulary by a human

The NIF Registry
- Very, very simple model
- Annotated with NIF vocabularies: resource type, organism
- Reviewed by NIF curators
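To make "very, very simple model" concrete, here is a sketch of a registry record whose annotations are checked against controlled vocabularies before curator review. The field names and the two small vocabularies are hypothetical stand-ins, not the actual NIF Registry schema.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for two NIF vocabularies; the real ones are much larger.
RESOURCE_TYPES = {"database", "software tool", "atlas"}
ORGANISMS = {"mouse", "rat", "human"}

@dataclass
class RegistryEntry:
    """A deliberately simple registry record (illustrative fields only)."""
    name: str
    url: str
    resource_types: set = field(default_factory=set)
    organisms: set = field(default_factory=set)
    curator_reviewed: bool = False

    def vocabulary_errors(self) -> list:
        """List annotations a curator would need to fix before approval."""
        errors = [f"unknown resource type: {t}"
                  for t in sorted(self.resource_types - RESOURCE_TYPES)]
        errors += [f"unknown organism: {o}"
                   for o in sorted(self.organisms - ORGANISMS)]
        return errors

entry = RegistryEntry("Example Atlas", "https://atlas.example.org",
                      {"atlas"}, {"mouse", "zebra finch"})
print(entry.vocabulary_errors())  # ['unknown organism: zebra finch']
```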

Level 2: Updates and deeper integration
DISCO involves a collection of files that reside on each participating resource. These files store information describing:
- attributes of the resource, e.g., description, contact person, content of the resource (used to update the NIF Registry)
- how to implement DISCO capabilities for the resource
These files are maintained locally by the resource developers and are "harvested" by the central DISCO server. In this way, central NIF capabilities can be updated automatically as resources evolve over time. The developers of each resource choose which DISCO capabilities their resource will use.
Luis Marenco, MD; Rixin Wang, PhD; Perry L. Miller, MD, PhD; Gordon Shepherd, MD, DPhil; Yale University School of Medicine
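The harvesting loop described above can be sketched as follows. The descriptor layout (a dict with a `version` key) and the injected `fetch` function are assumptions made for illustration; the actual DISCO file format is not shown on the slide.

```python
from typing import Callable

def harvest(resource_urls: list,
            fetch: Callable[[str], dict],
            registry: dict) -> list:
    """Pull each resource's locally maintained descriptor and update the
    central registry when its version has changed. Returns updated URLs."""
    updated = []
    for url in resource_urls:
        descriptor = fetch(url)  # in practice: an HTTP GET of the resource's file
        current = registry.get(url)
        if current is None or current.get("version") != descriptor.get("version"):
            registry[url] = descriptor
            updated.append(url)
    return updated

# Simulated remote descriptor files (hypothetical contents).
REMOTE = {"https://resource-a.example.org": {"version": 2, "contact": "curator@example.org"}}
registry = {"https://resource-a.example.org": {"version": 1}}

print(harvest(list(REMOTE), REMOTE.__getitem__, registry))  # ['https://resource-a.example.org']
print(harvest(list(REMOTE), REMOTE.__getitem__, registry))  # []
```

Passing `fetch` in as a parameter keeps the sketch testable offline. A real harvester would also need error handling for unreachable resources.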

DISCO Level 2 interoperation
Level 2 interoperation is designed for resources that have only web interfaces (no database API). Different resources require different approaches to achieve Level 2 interoperation. Examples:
1. CRCNS: requires metadata tagging of web pages
2. DrugBank: requires directed traversal of web pages to extract data into a NIF data repository
3. GeneNetwork: requires web-based queries to achieve "relational-like" views using "wrappers"

DrugBank example: the DrugBank web interface showing data about a specific drug (phenytoin).

DrugBank Example (continued) This DISCO Interoperation file specifies how to extract data from the DrugBank Web interface automatically.
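As a rough illustration of what an extraction specification does, the sketch below applies field-level regex rules to one HTML page. The rule syntax, field names, and page markup are all hypothetical. This is not the DISCO interoperation file format or DrugBank's real markup, and it covers only per-page extraction, not traversal between pages.

```python
import re
from typing import Optional

# Hypothetical extraction rules: field name -> regex with one capture group.
RULES = {
    "name": r"<h1[^>]*>(.*?)</h1>",
    "category": r'<td class="category">(.*?)</td>',
}

def extract(html: str, rules: dict) -> dict:
    """Apply each rule to the page; fields whose pattern is absent map to None."""
    record: dict = {}
    for field_name, pattern in rules.items():
        match: Optional[re.Match] = re.search(pattern, html, flags=re.DOTALL | re.IGNORECASE)
        record[field_name] = match.group(1).strip() if match else None
    return record

# Hypothetical page markup.
page = ('<html><h1>Phenytoin</h1><table><tr>'
        '<td class="category">Anticonvulsant</td></tr></table></html>')
print(extract(page, RULES))  # {'name': 'Phenytoin', 'category': 'Anticonvulsant'}
```

Regex scraping is brittle. Keeping the rules in a separate file, as the slide describes, means a site redesign requires editing only the rules, not the harvesting code.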

DrugBank Example (continued) A NIF user views data retrieved from DrugBank in response to a query in a transparent, integrated fashion.

Level 3
Deep query of federated databases with a programmatic interface
- Register schema with NIF
  - Expose views of the database
  - Map vocabulary to NIFSTD
- Currently works with relational and XML databases
  - RDF capability planned for NIF 2.5 (April 2010)
- Works with the NIF Registry: databases are also annotated according to data type and biological area
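The two registration steps (expose a view, map local vocabulary to shared terms) can be sketched with an in-memory SQLite database. All table names, column names, and data are hypothetical. The `NIF_EXAMPLE:` identifiers are placeholders, not real NIFSTD IDs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- A participating resource's own table, in its local terminology (hypothetical).
CREATE TABLE cell_record (id INTEGER PRIMARY KEY, cell_label TEXT, region TEXT, internal_note TEXT);
INSERT INTO cell_record VALUES (1, 'Purkinje', 'cbl', 'x'), (2, 'granule', 'cbl', 'y');

-- The view the resource exposes to the federation; internal columns stay hidden.
CREATE VIEW nif_cells AS SELECT id, cell_label, region FROM cell_record;

-- Mapping from local labels to shared terms (placeholder identifiers).
CREATE TABLE term_map (local_label TEXT PRIMARY KEY, nif_term TEXT, nif_id TEXT);
INSERT INTO term_map VALUES
  ('Purkinje', 'Purkinje cell', 'NIF_EXAMPLE:0001'),
  ('granule', 'Cerebellar granule cell', 'NIF_EXAMPLE:0002'),
  ('cbl', 'Cerebellum', 'NIF_EXAMPLE:0003');
""")

def query_by_nif_term(term: str) -> list:
    """Query the exposed view using shared vocabulary instead of local labels."""
    return conn.execute("""
        SELECT v.id, c.nif_term, r.nif_term
        FROM nif_cells AS v
        JOIN term_map AS c ON c.local_label = v.cell_label
        JOIN term_map AS r ON r.local_label = v.region
        WHERE c.nif_term = ?""", (term,)).fetchall()

print(query_by_nif_term("Purkinje cell"))  # [(1, 'Purkinje cell', 'Cerebellum')]
```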

Integrated views and gene search

Is GRM1 in cerebral cortex?
- The NIF system allows easy search over multiple sources of information
- Well-known difficulties in search:
  - Inconsistent and sparse annotation of scientific data
  - Many different names for the same thing
  - No standards for data exchange or annotation at the semantic level
  - The lack of standards in data annotation requires a lot of human investment in reconciling information from different sources
Sources shown: Allen Brain Atlas, MGD, GENSAT

Cerebral cortex

| Atlas | Children | Parent |
|---|---|---|
| Genepaint | Neocortex, olfactory cortex (olfactory bulb; piriform cortex), hippocampus | Telencephalon |
| ABA | Cortical plate, olfactory areas, hippocampal formation | Cerebrum |
| MBAT ("cortex") | Hippocampus, olfactory, frontal, perirhinal cortex, entorhinal cortex | Forebrain |
| MBL | Doesn't appear | |
| GENSAT | Not defined | Telencephalon |
| BrainInfo | Frontal lobe, insula, temporal lobe, limbic lobe, occipital lobe | Telencephalon |
| Brainmaps | Entorhinal, insular, 6, 8, 4, A, SII, 17, Prp, SI | Telencephalon |
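One way to answer "is this gene in cerebral cortex?" across sources that disagree, as the table shows, is to map each source's local region label to a shared concept and then walk a single shared partonomy. Everything below is hypothetical: the source names, the gene `GENE_A`, the annotations, and the partonomy. The partonomy picks just one of the competing placements shown in the table. No real expression data is implied.

```python
# Hypothetical shared partonomy: child concept -> parent concept.
PART_OF = {
    "piriform cortex": "olfactory cortex",
    "olfactory cortex": "cerebral cortex",
    "hippocampal formation": "cerebral cortex",
    "cerebral cortex": "telencephalon",
    "cerebellum": "metencephalon",
}

# Hypothetical mappings from each source's local labels to shared concepts.
LOCAL_TO_SHARED = {
    ("source_1", "Piriform cortex"): "piriform cortex",
    ("source_2", "Hippocampal Formation"): "hippocampal formation",
    ("source_3", "Cerebellum"): "cerebellum",
}

# Hypothetical annotations: (source, gene, local region label).
ANNOTATIONS = [
    ("source_1", "GENE_A", "Piriform cortex"),
    ("source_2", "GENE_A", "Hippocampal Formation"),
    ("source_3", "GENE_A", "Cerebellum"),
]

def is_part_of(concept, ancestor):
    """Walk up the partonomy from `concept`; True if `ancestor` is reached."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PART_OF.get(concept)
    return False

def annotations_in(region, gene):
    """Return (source, local label) pairs for `gene` located anywhere within `region`."""
    hits = []
    for source, g, local in ANNOTATIONS:
        shared = LOCAL_TO_SHARED.get((source, local))
        if g == gene and shared is not None and is_part_of(shared, region):
            hits.append((source, local))
    return hits

print(annotations_in("cerebral cortex", "GENE_A"))
# [('source_1', 'Piriform cortex'), ('source_2', 'Hippocampal Formation')]
```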

Modular ontologies for neuroscience
- NIF covers multiple structural scales and domains of relevance to neuroscience
- Incorporated existing ontologies where possible, extending them for neuroscience where necessary
- Normalized under the Basic Formal Ontology (BFO), an upper ontology used by the OBO Foundry
- Based on BIRNLex: neuroscientists didn't like too many choices
- Cross-domain relationships are being built in separate files
- Encoded in OWL-DL, but also maintained in wiki form, relational database form, and any other form needed
[Figure: NIFSTD modules: NS Function, Molecule, Investigation, Subcellular Anatomy, Macromolecule, Gene, Molecule Descriptors, Techniques, Reagent, Protocols, Cell, Instruments, NS Dysfunction, Quality, Macroscopic Anatomy, Organism, Resource (Bill Bug)]

How are ontologies used?
- Search: query expansion
  - Synonyms
  - Related classes
  - "Concept-based queries"
- Annotation:
  - Resource categorization
  - Entity mapping
- Ranking of results
  - NIF Registry; NIF Web

Concept-based search: entity mapping
"Brodmann area 3" maps to "Brodmann.3"
Synonyms and explicit mapping of database content help smooth over terminology differences and custom terminologies.
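Entity mapping can be sketched as a two-stage lookup. An explicit table for database-specific tokens comes first. A case-insensitive synonym table is the fallback. The `Brodmann.3` mapping comes from the slide. The other table entries and the function name are illustrative assumptions.

```python
from typing import Optional

# Explicit mappings from database-specific tokens to ontology labels.
EXPLICIT_MAP = {"Brodmann.3": "Brodmann area 3"}

# Case-insensitive synonyms (keys are lower-cased); illustrative entries.
SYNONYMS = {"ba3": "Brodmann area 3", "brodmann area 3": "Brodmann area 3"}

def map_entity(token: str) -> Optional[str]:
    """Return the ontology label for a database value, or None if unmapped."""
    if token in EXPLICIT_MAP:
        return EXPLICIT_MAP[token]
    return SYNONYMS.get(token.strip().lower())

print(map_entity("Brodmann.3"))  # Brodmann area 3
```

Unmapped values return `None` rather than a guess. That leaves them for the human curator mentioned under Level 3 registration.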

Concept-based query: GABAergic neuron
- A simple search will not return examples of GABAergic neurons unless the data are explicitly tagged as such
- There are too many possible classifications to get them all by keywords
- Classes are logically defined in the NIF ontology, i.e., a GABAergic neuron is any neuron that uses GABA as a neurotransmitter

Reclassification based on logical definitions: a GABAergic neuron is any member of the class Neuron that has neurotransmitter GABA
The NIF Bridge File links NIF Cell and NIF Molecule. It defines a set of properties that relate neuron classes to molecule classes, e.g.:
- Neuron: has neurotransmitter
- Purkinje cell is a Neuron
- Purkinje cell has neurotransmitter GABA
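The reclassification idea can be mimicked with a predicate. A defined class is a test applied to asserted facts, so "GABAergic neuron" never needs to be asserted directly. This is a plain-Python analogy, not an OWL reasoner. The Purkinje cell fact is from the slide, and the other two neurons are standard textbook examples.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Neuron:
    label: str
    neurotransmitters: frozenset  # asserted "has neurotransmitter" values

NEURONS = [
    Neuron("Purkinje cell", frozenset({"GABA"})),
    Neuron("Cerebellar granule cell", frozenset({"glutamate"})),
    Neuron("Cerebellar basket cell", frozenset({"GABA"})),
]

def has_neurotransmitter(molecule: str):
    """Build a defined class: any neuron whose neurotransmitters include `molecule`."""
    return lambda neuron: molecule in neuron.neurotransmitters

gabaergic = has_neurotransmitter("GABA")
print([n.label for n in NEURONS if gabaergic(n)])
# ['Purkinje cell', 'Cerebellar basket cell']
```

In the ontology itself, the same definition would be an OWL equivalent-class axiom, and a reasoner would compute the membership.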

NIF Bridge Files: defining classes to enhance neuroscience search
- Neuron by neurotransmitter
- Neuron by brain region
- Neuron by circuit role
- Neuron by morphology
- Molecule by role: neurotransmitter, drug, drug of abuse

NIF 2.5: Linking NIF entities

NIF Cards: Linked data

NIF 3.0+: NIF annotation standards
Tying keyword search to quantitative definitions: NIF is developing mappings between "data types" (e.g., age) and NIF annotation standards (e.g., adult).
- Query string: adult mouse
- Translation: adult mouse = ≥ 40 days postnatal
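The translation from a keyword to a numeric constraint might look like this. Only the adult-mouse threshold (≥ 40 days postnatal) comes from the slide. The constraint format, field names, and records are hypothetical.

```python
from typing import Optional

# Minimum postnatal age (days) per (stage, organism); only this entry is from the slide.
MIN_POSTNATAL_DAYS = {("adult", "mouse"): 40}

def translate(query: str) -> Optional[dict]:
    """Turn a keyword query like 'adult mouse' into a numeric constraint."""
    words = set(query.lower().split())
    for (stage, organism), min_days in MIN_POSTNATAL_DAYS.items():
        if {stage, organism} <= words:
            return {"organism": organism, "min_postnatal_days": min_days}
    return None

def matches(record: dict, constraint: dict) -> bool:
    """True if a record satisfies the translated constraint."""
    return (record["organism"] == constraint["organism"]
            and record["postnatal_days"] >= constraint["min_postnatal_days"])

# Hypothetical records annotated with numeric ages rather than the word "adult".
RECORDS = [
    {"id": 1, "organism": "mouse", "postnatal_days": 21},
    {"id": 2, "organism": "mouse", "postnatal_days": 60},
    {"id": 3, "organism": "rat", "postnatal_days": 90},
]

constraint = translate("adult mouse")
print([r["id"] for r in RECORDS if matches(r, constraint)])  # [2]
```

The payoff is that a dataset annotated only as "P60" is still found by someone searching for "adult mouse".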

Building NIF ontologies: a balancing act
There are different schools of thought on how to build vocabularies and ontologies. NIF is trying to navigate these waters, keeping in mind:
- NIF is for both humans and machines
- Our primary concern is data
- We have to meet the needs of the community
- We have a budget and deadlines
Building ontologies is difficult even for limited domains, never mind all of neuroscience, but we've learned a few things:
- Reuse what's there: try to reuse URIs rather than map when possible
- Make what you do reusable: adopt best practices where feasible (numerical identifiers, unique labels, single asserted simple hierarchies)
- Engage the community
- Avoid "religious" wars: separate the science from the informatics
- Start simple and add more complexity: create modular building blocks from which other things can be built
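The three best practices named above (numerical identifiers, unique labels, single asserted hierarchies) are easy to check mechanically. This sketch assumes a `PREFIX_digits` identifier convention and a simple tuple representation of classes. Both are hypothetical, as is the sample module.

```python
import re
from collections import Counter

# Assumed identifier convention for this sketch: PREFIX_ followed by digits.
ID_PATTERN = re.compile(r"[A-Za-z]+_\d+")

def check_module(classes):
    """Flag violations of numerical identifiers, unique labels, and a single
    asserted parent. `classes` is a list of (identifier, label, asserted_parent_ids)."""
    label_counts = Counter(label for _, label, _ in classes)
    problems = []
    for ident, label, parents in classes:
        if not ID_PATTERN.fullmatch(ident):
            problems.append(f"{ident}: identifier is not numeric")
        if label_counts[label] > 1:
            problems.append(f"{ident}: label '{label}' is not unique")
        if len(parents) > 1:
            problems.append(f"{ident}: {len(parents)} asserted parents")
    return problems

# Hypothetical module contents.
MODULE = [
    ("EX_0001", "Neuron", []),
    ("EX_0002", "Purkinje cell", ["EX_0001"]),
    ("EX_0003", "Purkinje cell", ["EX_0001"]),
    ("EX_0004", "Cerebellar neuron", ["EX_0001"]),
    ("EX_Basket", "Basket cell", ["EX_0001", "EX_0004"]),
]

for problem in check_module(MODULE):
    print(problem)
```

The second asserted parent of "Basket cell" is the kind of link that, under this strategy, would instead be inferred from properties and restrictions.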

Ontologies, etc. (figure: Mike Bergman)

What we've learned
Strategy: create modular building blocks that can be knit into many things
- Step 1: Build the core lexicon (NeuroLex)
  - Classes and their definitions
  - Simple single inheritance and non-controversial hierarchies
  - Each module covers only a single domain
  - Understandable by an average human
- Step 2: NIFSTD: standardize modules under the same upper ontology
  - OBO-compliant, in OWL
- Step 3: Create intra-domain and more useful hierarchies using properties and restrictions
  - Brain partonomy
- Step 4: Bridge two or more domains using a standard set of relations
  - Neuron to brain region
  - Neuron to molecule, e.g., GABAergic neuron

NeuroLex
- More human-centric
- More stable class structure
  - Ontologies take time; many versions are retired, which is not good for information systems that use identifiers
- Synonyms and abbreviations were essential for users
  - Can't annotate if they can't find it
  - Can't use it for search if they can't find it
  - Facilitates semi-automated mapping
- Contains subsets of ontologies that are useful to neuroscientists
  - e.g., only the classes in ChEBI that neuroscientists use
- Wanted the community to be able to see it and use it
  - Simple, understandable hierarchies
  - Removed the independent continuants, entity, etc.
  - Used labels that humans could understand, even if they were plural (meninges vs. meninx)

Maintaining multiple versions
NIF maintains the NIF vocabularies in different forms for different purposes:
- NeuroLex Wiki: lexicon for community review and comment
- NIFSTD: set of modular OWL files normalized under BFO and available for download
- NCBO BioPortal: for visibility and mapping services
- OntoQuest: NIF's ontology server
  - Relational store customized for OWL ontologies
  - Materialized inferred hierarchies for more efficient queries
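"Materialized inferred hierarchies" means computing the transitive closure once and storing it, so a query for all descendants becomes a single indexed lookup instead of a recursive walk. The sketch uses SQLite's recursive CTE. The table names are hypothetical, and this is not OntoQuest's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subclass_of (child TEXT, parent TEXT);
INSERT INTO subclass_of VALUES
  ('Purkinje cell', 'Cerebellar neuron'),
  ('Cerebellar neuron', 'Neuron'),
  ('Neuron', 'Cell');

-- Materialize the transitive closure once; later queries need no recursion.
CREATE TABLE inferred_ancestor AS
WITH RECURSIVE closure(child, ancestor) AS (
  SELECT child, parent FROM subclass_of
  UNION
  SELECT c.child, s.parent FROM closure AS c JOIN subclass_of AS s ON s.child = c.ancestor
)
SELECT child, ancestor FROM closure;

CREATE INDEX idx_ancestor ON inferred_ancestor(ancestor);
""")

descendants = sorted(row[0] for row in conn.execute(
    "SELECT child FROM inferred_ancestor WHERE ancestor = ?", ("Neuron",)))
print(descendants)  # ['Cerebellar neuron', 'Purkinje cell']
```

The trade-off is that the closure table must be rebuilt whenever the ontology changes. That cost is acceptable when queries vastly outnumber ontology releases.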

NeuroLex Wiki (Larson): "the human interface", built on Semantic MediaWiki

NIF Architecture Gupta et al., Neuroinformatics, 2008 Sep;6(3):205-17

Summary
NIF has tried to adopt a flexible, practical approach to assembling, extending and using community ontologies:
- We use a combination of search strategies: string-based, lexicon-based, ontology-based
- We believe in modularity
- We believe in starting simple and adding complexity
- We believe in single asserted hierarchies and multiple inferred hierarchies
- We believe in balancing practicality and rigor
NIF is working through the International Neuroinformatics Coordinating Facility (INCF) to engage the community to help build out NeuroLex and start adding additional relations:
- Neuron registry task force
- Structural lexicon

Where do we go from here?
- NIF 2.5: automatic expansion of logically defined terms
- More services: NIF vocabulary services are available now
- More content
- More, more, more
NIF Blog: NIF is learning lessons in practical data integration

Musings from the NIF…
- No single approach, technology, philosophy, tool, or platform will solve everything
- The value of making resources discoverable is not appreciated
- Developing resources (tools, databases, data) that are interoperable is, at this point, an act of will
- Decisions can be made at the outset that will make integration easier or harder
- We build resources for ourselves and our constituents, not for automated agents
- We get mad when commercial providers don't make their products interoperable
- Many times the choice of terminology (ontologies, terminologies, lexicons, thesauri) is based on expediency or on who taught you biology rather than on deep philosophical differences
- Developing a semantic framework on top of unstable resources is difficult
- We need to adopt a flexible approach

NIF Evolution
[Figure: NIF V1.0 and its progression: Then, Now, Later.]