Navigating the Neuroscience Data Landscape
Maryann Martone, Ph.D., University of California, San Diego


“Neural Choreography”
“A grand challenge in neuroscience is to elucidate brain function in relation to its multiple layers of organization that operate at different spatial and temporal scales. Central to this effort is tackling ‘neural choreography’ -- the integrated functioning of neurons into brain circuits -- their spatial organization, local and long-distance connections, their temporal orchestration, and their dynamic features. Neural choreography cannot be understood via a purely reductionist approach. Rather, it entails the convergent use of analytical and synthetic tools to gather, analyze and mine information from each level of analysis, and capture the emergence of new layers of function (or dysfunction) as we move from studying genes and proteins, to cells, circuits, thought, and behavior. ... However, the neuroscience community is not yet fully engaged in exploiting the rich array of data currently available, nor is it adequately poised to capitalize on the forthcoming data explosion.”
Akil et al., Science, Feb 11, 2011

NIF is an initiative of the NIH Blueprint consortium of institutes.
- What types of resources (data, tools, materials, services) are available to the neuroscience community?
- How many are there?
- What domains do they cover? What domains do they not cover?
- Where are they? Web sites, databases, literature, supplementary material, PDF files, desk drawers
- Who uses them?
- Who creates them?
- How can we find them?
- How can we make them better in the future?

How many resources are there?
NIF Registry: a catalog of neuroscience-relevant resources
- More than 4,800 currently listed
- More than 2,000 databases
- And we are finding more every day

The Neuroscience Information Framework: discovery and utilization of web-based resources for neuroscience
- A portal for finding and using neuroscience resources
- A consistent framework for describing resources
- Provides simultaneous search of multiple types of information, organized by category
- Supported by an expansive ontology for neuroscience
- Utilizes advanced technologies to search the “hidden web”
Partners: UCSD, Yale, Caltech, George Mason, Washington University. Supported by the NIH Blueprint.
NIF components: Registry, Database Federation, Literature

What are the connections of the hippocampus?
Query: Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”
- Query expansion: synonyms and related concepts
- Boolean queries
- Data sources categorized by “data type” and level of nervous system
- Common views across multiple sources
- Tutorials for using the full resource when getting there from NIF
- Link back to the record in the original source
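The query expansion on this slide can be sketched in a few lines: look up the synonyms of a term and join every variant into one Boolean OR query, quoting multi-word phrases. This is a minimal sketch, assuming a hypothetical in-memory SYNONYMS table; NIF draws its synonyms from its own ontologies, and expand_query is an illustrative name, not a NIF API.

```python
# Hypothetical synonym table; a real system would pull these from an ontology.
SYNONYMS = {
    "hippocampus": ["Cornu Ammonis", "Ammon's horn"],
}


def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are matched as phrases."""
    return f'"{term}"' if " " in term else term


def expand_query(term: str) -> str:
    """Expand a term into a Boolean OR over the term and its known synonyms."""
    variants = [term] + SYNONYMS.get(term.lower(), [])
    return " OR ".join(quote(v) for v in variants)


print(expand_query("Hippocampus"))
# Hippocampus OR "Cornu Ammonis" OR "Ammon's horn"
```

A term with no entry in the table simply comes back unchanged, so expansion never narrows a search.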

Results are organized within a common framework
Relation labels used by the different sources: connects to, synapsed with, synapsed by, input region, innervates, axon innervates, projects to, cellular contact, subcellular contact, source site, target site
Each resource implements a different, though related, model; in many cases these systems are complex and difficult to learn.
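One way to picture the common framework is a table that maps each source-specific relation label onto a single relation with a source site and a target site, reversing subject and object where a label is phrased passively (“synapsed by”). The mapping below is a hypothetical sketch, not NIF’s actual mapping; normalize, RELATION_MAP and the database names are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical mapping from source-specific labels to one common relation.
# The boolean marks labels whose subject and object are reversed relative
# to "source site connects to target site".
RELATION_MAP = {
    "connects to": ("connects_to", False),
    "projects to": ("connects_to", False),
    "innervates": ("connects_to", False),
    "axon innervates": ("connects_to", False),
    "synapsed with": ("connects_to", False),
    "synapsed by": ("connects_to", True),
}


@dataclass(frozen=True)
class Connection:
    source_site: str
    target_site: str
    relation: str
    provenance: str  # which database the statement came from


def normalize(subject: str, label: str, obj: str, database: str) -> Connection:
    """Translate one source-specific statement into the common framework."""
    relation, inverted = RELATION_MAP[label.strip().lower()]
    if inverted:
        subject, obj = obj, subject
    return Connection(subject, obj, relation, database)


rows = [
    ("CA3", "Projects to", "CA1", "Database A"),
    ("CA1", "Synapsed by", "CA3", "Database B"),
]
connections = [normalize(*row) for row in rows]
print({(c.source_site, c.target_site) for c in connections})
# {('CA3', 'CA1')}
```

Keeping provenance on each normalized statement means the common view can still link back to the record in the original source.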

The scourge of neuroanatomical nomenclature
NIF Connectivity: 6 databases containing connectivity primary data or claims
- Brain Architecture Management System (rodent)
- Connectome Wiki (human)
- Brain Maps (various)
- CoCoMac (primate cortex)
- UCLA Multimodal database (human fMRI)
- Avian Brain Connectivity Database (bird)
Total: 1,800 unique brain terms (excluding avian)
- Number of exact terms used in more than 1 database: 42
- Number of synonym matches: 99
- Number of partonomy matches: 385
The INCF is working with NIF to develop semantic and spatial strategies for translating anatomy across information systems.
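The three kinds of match counted on this slide (exact, synonym, partonomy) can be sketched as a small classifier over pairs of brain terms. This is an illustration only, assuming toy SYNONYMS and PART_OF tables; the terms are examples, not the actual vocabularies of the six databases, and match_type and tally are hypothetical names.

```python
from collections import Counter

# Toy vocabularies (illustrative, not NIF data).
SYNONYMS = {"ammon's horn": "hippocampus", "cornu ammonis": "hippocampus"}
PART_OF = {
    "ca1": "hippocampus",
    "ca3": "hippocampus",
    "hippocampus": "hippocampal formation",
    "dentate gyrus": "hippocampal formation",
}


def canonical(term):
    t = term.lower()
    return SYNONYMS.get(t, t)


def ancestors(term):
    """Every structure that `term` is transitively part of."""
    result = []
    while term in PART_OF:
        term = PART_OF[term]
        result.append(term)
    return result


def match_type(a, b):
    """Classify how two brain terms from different databases correspond."""
    if a.lower() == b.lower():
        return "exact"
    ca, cb = canonical(a), canonical(b)
    if ca == cb:
        return "synonym"
    if ca in ancestors(cb) or cb in ancestors(ca):
        return "partonomy"
    return None


def tally(terms_a, terms_b):
    """Count match types between the term lists of two databases."""
    counts = Counter()
    for a in terms_a:
        for b in terms_b:
            kind = match_type(a, b)
            if kind:
                counts[kind] += 1
    return counts


print(tally(["CA1", "Ammon's horn", "Hippocampus"], ["Hippocampus"]))
```

Exact string matches alone would miss most correspondences; synonym and part-whole knowledge recover the rest, which is the pattern the slide's counts suggest.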

What is an ontology?
(Figure: Brain has a Cerebellum; Cerebellum has a Purkinje cell layer; Purkinje cell layer has a Purkinje cell; Purkinje cell is a neuron.)
Ontology: an explicit, formal representation of concepts and the relationships among them within a particular domain, expressing human knowledge in a machine-readable form
Branch of philosophy: a theory of what is
E.g., Gene ontologies
- Provide universals for navigating across different data sources
- Semantic “index”
- Provide the basis for concept-based queries to probe and mine data
- Perform reasoning
- Link data through relationships, not just one-to-one mappings
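The figure's four statements already support simple reasoning. No single statement says “the brain contains a neuron,” but following the has-part and is-a links does. The sketch below stores the triples in a plain Python list and computes the closures by hand. It is an assumption-laden illustration of the idea, not how NIF's ontologies are implemented; production ontologies are typically written in OWL and queried with a reasoner.

```python
# The four statements from the figure, as (subject, predicate, object) triples.
TRIPLES = [
    ("Brain", "has_part", "Cerebellum"),
    ("Cerebellum", "has_part", "Purkinje cell layer"),
    ("Purkinje cell layer", "has_part", "Purkinje cell"),
    ("Purkinje cell", "is_a", "Neuron"),
]


def objects(subject, predicate):
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def located_in(entity):
    """Transitively collect every structure that has `entity` as a part."""
    containers, frontier = set(), {entity}
    while frontier:
        parents = {s for s, p, o in TRIPLES if p == "has_part" and o in frontier}
        frontier = parents - containers
        containers |= parents
    return containers


def is_a(entity, cls):
    """Transitive is_a check (assumes the hierarchy has no cycles)."""
    if entity == cls:
        return True
    return any(is_a(parent, cls) for parent in objects(entity, "is_a"))


def neurons_in(region):
    """Concept-based query: every kind of neuron found anywhere in `region`."""
    entities = {s for s, _, _ in TRIPLES} | {o for _, _, o in TRIPLES}
    return {e for e in entities if is_a(e, "Neuron") and region in located_in(e)}


print(sorted(located_in("Purkinje cell")))  # ['Brain', 'Cerebellum', 'Purkinje cell layer']
print(neurons_in("Brain"))  # {'Purkinje cell'}
```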

PONS program
Structural Lexicon Taskforce
- Concentrate on human, non-human primate, rat and mouse
- Define structural concepts from the level of organ to macromolecular complexes
- Provide a set of criteria by which structures can be identified
Neuronal Registry Taskforce
- Establish conventions for naming new types of neurons
- Establish a standard set of properties to define neurons
- Create a Neuron Registry for registering new types of neurons
Deployment and representation (Alan Ruttenberg)
- Brought together ontologists working across scales
(Figure courtesy of Chris Mungall, Lawrence Berkeley Labs)
It is not about imposing a single view of anatomy; it is about making concepts computable and being able to translate among views.

NeuroLex Wiki (Larson)
- Provides a simple framework for defining the concepts required: cell, part of brain, subcellular structure, molecule
- Community based: avian neuroanatomy; fly neurons (England); neuroimaging terms; brain regions identified by text mining
- Creating a computable index for neuroscience data
- INCF is working to coordinate wiki efforts underway at the Allen Institute, Blue Brain and NeuroLex
Demo D03

Comparison of traffic to the NIF Portal vs. NeuroLex (chart; values shown: 5,000 hits and 15,000 hits)
The wiki is readily indexed by search engines.

Neurons in NeuroLex
- INCF is building a knowledge base of neurons and their properties via the NeuroLex Wiki
- Led by Dr. Gordon Shepherd
- Consistent and parseable naming scheme
- Knowledge is readily accessible, editable and computable
Stephen Larson

NIF data federation
(Figure labels: connectivity, brain activation foci, microarray, 98%)
Contents: primary data, secondary data, claims, repositories
Recently added: BioNOT literature mining tool; Retraction Watch blog

What do you mean by data? Databases come in many shapes and sizes.
- Primary data: data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL)
- Secondary data: data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD), gene expression levels (Allen Brain Atlas), brain connectivity statements (BAMS)
- Tertiary data: claims and assertions about the meaning of data, e.g., gene upregulation/downregulation, brain activation as a function of task
- Registries: metadata; pointers to data sets or materials stored elsewhere
- Data aggregators: aggregate data of the same type from multiple sources, e.g., Cell Image Library, SUMSdb, Brede
- Single source: data acquired within a single context, e.g., Allen Brain Atlas
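The categories on this slide mix two independent axes: the level of the data (primary, secondary, tertiary, registry) and how it was collected (aggregated from many sources or acquired in a single context). A resource can sit at more than one level. The sketch below encodes that as an enum plus a record type, using only the example resources named on the slide. The encoding, and names like with_level, are hypothetical illustrations, not NIF's registry schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataLevel(Enum):
    PRIMARY = "primary"      # available for reanalysis
    SECONDARY = "secondary"  # features extracted by processing/normalization
    TERTIARY = "tertiary"    # claims and assertions about what the data mean
    REGISTRY = "registry"    # metadata pointing to data stored elsewhere


class Scope(Enum):
    AGGREGATOR = "aggregator"        # same data type pooled from many sources
    SINGLE_SOURCE = "single source"  # acquired within a single context


@dataclass(frozen=True)
class Resource:
    name: str
    levels: frozenset  # a resource can hold more than one level
    scope: Optional[Scope] = None


# Classifications follow the slide; the encoding itself is illustrative.
CATALOG = [
    Resource("GEO", frozenset({DataLevel.PRIMARY})),
    Resource("Cell Image Library", frozenset({DataLevel.PRIMARY}), Scope.AGGREGATOR),
    Resource("IBVD", frozenset({DataLevel.SECONDARY})),
    Resource("BAMS", frozenset({DataLevel.SECONDARY})),
    Resource("Allen Brain Atlas", frozenset({DataLevel.SECONDARY}), Scope.SINGLE_SOURCE),
]


def with_level(catalog, level):
    """Names of resources that hold data at the given level."""
    return sorted(r.name for r in catalog if level in r.levels)


print(with_level(CATALOG, DataLevel.SECONDARY))  # ['Allen Brain Atlas', 'BAMS', 'IBVD']
```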

NIF landscape analysis
(Figure: brain regions, e.g., striatum, hypothalamus, olfactory bulb and cerebral cortex under “Brain”, plotted against data sources; axes: brain region, data source. Vadim Astakhov, Kepler workflow engine)
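At its core, a landscape analysis like this is a cross-tabulation: count annotated records for each brain region in each data source, then look for empty rows. The original analysis ran as Kepler workflows. The plain-Python sketch below only shows the counting idea, assuming hypothetical (source, region) records and placeholder source names.

```python
from collections import Counter

# Hypothetical (data source, brain region) annotations; not NIF results.
RECORDS = [
    ("Source A", "Striatum"),
    ("Source A", "Striatum"),
    ("Source A", "Cerebral cortex"),
    ("Source B", "Cerebral cortex"),
    ("Source B", "Olfactory bulb"),
]
REGIONS = ["Striatum", "Hypothalamus", "Olfactory bulb", "Cerebral cortex"]
SOURCES = ["Source A", "Source B"]


def landscape(records, regions, sources):
    """Count records for every (region, source) cell, including empty cells."""
    counts = Counter((region, source) for source, region in records)
    return {r: {s: counts[(r, s)] for s in sources} for r in regions}


def gaps(matrix):
    """Regions with no data in any source."""
    return [region for region, row in matrix.items() if sum(row.values()) == 0]


matrix = landscape(RECORDS, REGIONS, SOURCES)
for region, row in matrix.items():
    print(f"{region:<16}", row)
print("No data:", gaps(matrix))
```

Listing every region explicitly, rather than only those that appear in the records, is what makes the empty cells (the gaps in the landscape) visible.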

How much of the landscape do we have?
Query for “reference” brain structures and their parts in the NIF Connectivity database

NIF Reports: male vs. female (gender bias)
NIF can start to answer interesting questions about neuroscience research, not just about neuroscience.

Embracing duplication: data mash-ups
~300 PMIDs were common between Brede and SUMSdb
- Same information; value added
- Same data; different aspects
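Finding overlap like the shared PMIDs between Brede and SUMSdb amounts to a join on the article identifier. Each shared article then carries whatever each source recorded about it, so the same data can be viewed from different aspects. In the sketch below, the records, field names and PMIDs are placeholders, not real entries from either database, and mash_up is a hypothetical helper.

```python
from collections import defaultdict

# Placeholder records; PMIDs and fields are hypothetical.
BREDE = [
    {"pmid": "10000001", "coordinates": (-24, -8, -20)},
    {"pmid": "10000002", "coordinates": (30, 12, 4)},
]
SUMSDB = [
    {"pmid": "10000001", "surface_map": "map-17"},
    {"pmid": "10000003", "surface_map": "map-42"},
]


def index_by_pmid(records):
    index = defaultdict(list)
    for record in records:
        index[record["pmid"]].append(record)
    return index


def mash_up(named_sources):
    """Join sources on PMID, keeping only articles present in every source."""
    indexes = {name: index_by_pmid(recs) for name, recs in named_sources.items()}
    shared = set.intersection(*(set(ix) for ix in indexes.values()))
    return {
        pmid: {name: ix[pmid] for name, ix in indexes.items()}
        for pmid in sorted(shared)
    }


merged = mash_up({"Brede": BREDE, "SUMSdb": SUMSDB})
print(list(merged))  # ['10000001']
```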

Same data: different analysis
Chronic vs. acute morphine in striatum
- Drug Related Gene database (DRG): extracted statements from figures, tables and supplementary data from the published article
- Gemma: reanalyzed microarray results from GEO using different algorithms
- Both provide results of increased or decreased expression as a function of experimental paradigm
- 4 strains of mice; 3 conditions: chronic morphine, acute morphine, saline
Mined NIF for all references to GEO IDs: found a small number where the same dataset was represented in two or more databases.
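Mining for datasets shared across databases can be sketched as extracting GEO accession numbers from each source's text and keeping the accessions cited by at least two sources. The regular expression covers GEO series (GSE) and dataset (GDS) accessions. The corpus and accession numbers below are placeholders, and datasets_in_multiple_sources is a hypothetical helper, not part of NIF.

```python
import re
from collections import defaultdict

# GEO series (GSE) and dataset (GDS) accessions, e.g. "GSE1234".
GEO_ID = re.compile(r"\bG(?:SE|DS)\d+\b")


def datasets_in_multiple_sources(texts_by_source):
    """Map each GEO accession cited by two or more sources to those sources."""
    sources_for = defaultdict(set)
    for source, texts in texts_by_source.items():
        for text in texts:
            for accession in GEO_ID.findall(text):
                sources_for[accession].add(source)
    return {acc: srcs for acc, srcs in sources_for.items() if len(srcs) >= 2}


# Placeholder snippets; the accessions are not real references.
corpus = {
    "Gemma": ["Reanalysis of GSE1234 (striatum, morphine)"],
    "DRG": ["Statements extracted from a paper depositing GSE1234", "See GSE5678"],
    "Other": ["Unrelated study GDS99"],
}
print(datasets_in_multiple_sources(corpus))
```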

How easy was it to compare?
- Identifiers: Gemma uses gene ID + gene symbol; DRG uses gene name + probe ID
- Direction: both report increased/decreased expression
- But Gemma presented results relative to a chronic morphine baseline and DRG relative to saline, so the direction of change is opposite in the two databases
Analysis: 1,370 statements from Gemma regarding gene expression as a function of chronic morphine
- 617 were consistent with DRG; over half of the claims of the paper were not confirmed in this analysis
- Results for 1 gene were opposite in DRG and Gemma
- 45 did not have enough information provided in the paper to make a judgment
NIF annotation standard
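The two obstacles on this slide, mismatched identifiers and opposite baselines, can both be handled before comparing calls. First, map DRG probe IDs onto gene symbols. Second, flip each Gemma direction so that both sources are expressed relative to saline. The sketch assumes a hypothetical probe-to-symbol table and placeholder gene names; it is not the analysis actually run for the talk. If several probes map to one symbol, the last one wins here, which a real comparison would need to resolve explicitly.

```python
FLIP = {"up": "down", "down": "up"}


def compare(gemma, drg, probe_to_symbol):
    """
    gemma: {gene symbol: direction, relative to a chronic-morphine baseline}
    drg:   {probe ID: direction, relative to saline}
    Returns symbols grouped by agreement after aligning IDs and baselines.
    """
    drg_by_symbol = {}
    for probe, direction in drg.items():
        symbol = probe_to_symbol.get(probe)
        if symbol is not None:
            drg_by_symbol[symbol] = direction  # last probe wins (simplification)

    result = {"consistent": [], "opposite": [], "no_match": []}
    for symbol, direction in gemma.items():
        expected = FLIP[direction]  # re-express Gemma's call relative to saline
        if symbol not in drg_by_symbol:
            result["no_match"].append(symbol)
        elif drg_by_symbol[symbol] == expected:
            result["consistent"].append(symbol)
        else:
            result["opposite"].append(symbol)
    return result


# Placeholder genes and probes, not real results.
gemma = {"GeneA": "down", "GeneB": "up", "GeneC": "up"}
drg = {"probe_1": "up", "probe_2": "up"}
probe_map = {"probe_1": "GeneA", "probe_2": "GeneB"}
print(compare(gemma, drg, probe_map))
# {'consistent': ['GeneA'], 'opposite': ['GeneB'], 'no_match': ['GeneC']}
```

Without the flip, GeneA would wrongly look like a disagreement between the sources, which is exactly the trap the baseline mismatch creates.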

Grabbing the long tail of small data
- Analysis of NIF shows multiple databases with similar scope and content
- Many contain partially overlapping data
- Data “flows” from one resource to the next
- Data is reinterpreted, reanalyzed or added to
- When does it become something else?
- Is duplication good or bad?

Phases of NIF
- A survey of what was out there
- Strategy for resource discovery: NIF Registry vs. NIF data federation; ingestion of data contained within different technology platforms, e.g., XML vs. relational vs. RDF; effective search across semantically diverse sources; NIFSTD ontologies
- Strategy for data integration: unified views across common sources; mapping of content to NIF vocabularies
- 2011-present: data analytics; uniform external data references

Data, not just stories about them!
47 of 53 major preclinical published cancer studies could not be replicated.
“The scientific community assumes that the claims in a preclinical study can be taken at face value -- that although there might be some errors in detail, the main message of the paper can be relied on and the data will, for the most part, stand the test of time. Unfortunately, this is not always the case.”
“There are no guidelines that require all data sets to be reported in a paper; often, original data are removed during the peer review and publication process.”
Begley and Ellis, Nature 483, 531 (29 March 2012)
Getting data out sooner in a form where they can be exposed to many eyes and many analyses, and easily compared, may allow us to expose errors and develop better metrics to evaluate the validity of data.

A global view of data
- You (and the machine) have to be able to find it: accessible through the web; annotations
- You have to be able to use it: data type specified and in a usable form
- You have to know what the data mean: some semantics; context (experimental metadata); provenance (where did the data come from?)
Reporting neuroscience data within a consistent framework helps enormously.

NIF team (past and present)
- Jeff Grethe, UCSD, Co-Investigator, Interim PI
- Amarnath Gupta, UCSD, Co-Investigator
- Anita Bandrowski, NIF Project Leader
- Gordon Shepherd, Yale University; Perry Miller, Luis Marenco, Rixin Wang
- David Van Essen, Washington University; Erin Reid
- Paul Sternberg, Caltech; Arun Rangarajan, Hans Michael Muller, Yuling Li
- Giorgio Ascoli, George Mason University; Sridevi Polavaram
- Fahim Imam, NIF Ontology Engineer
- Larry Lui, Andrea Arnaud Stagg, Jonathan Cachat, Jennifer Lawrence, Lee Hornbrook, Binh Ngo, Vadim Astakhov, Xufei Qian, Chris Condit, Mark Ellisman, Stephen Larson, Willie Wong
- Tim Clark, Harvard University; Paolo Ciccarese
- Karen Skinner, NIH, Program Officer

Concept-based search: search by meaning
- Search Google: GABAergic neuron
- Search NIF: GABAergic neuron
NIF automatically searches for types of GABAergic neurons.
(Figure: types of GABAergic neurons)
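Searching “by meaning” can be sketched as subclass expansion. Collect every class that is transitively a kind of the query concept, then search for all of them. A keyword search for “GABAergic neuron” would miss a record that only says “Purkinje cell.” The IS_A table below is a small illustrative hierarchy, not the NIFSTD ontology, and subclasses and concept_query are hypothetical names.

```python
# Illustrative is_a edges (child -> parents); not the NIFSTD hierarchy.
IS_A = {
    "Cerebellum Purkinje cell": ["GABAergic neuron"],
    "Cerebellum basket cell": ["GABAergic neuron"],
    "Neostriatum medium spiny neuron": ["GABAergic neuron"],
    "GABAergic neuron": ["Neuron"],
}


def subclasses(cls):
    """All classes that are transitively a kind of `cls` (assumes no cycles)."""
    direct = {child for child, parents in IS_A.items() if cls in parents}
    result = set(direct)
    for child in direct:
        result |= subclasses(child)
    return result


def concept_query(cls):
    """Build a Boolean query over a concept and all of its subtypes."""
    terms = [cls] + sorted(subclasses(cls))
    return " OR ".join(f'"{t}"' for t in terms)


print(concept_query("GABAergic neuron"))
# "GABAergic neuron" OR "Cerebellum Purkinje cell" OR "Cerebellum basket cell" OR "Neostriatum medium spiny neuron"
```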