Navigating the Neuroscience Data Landscape Maryann Martone, Ph. D. University of California, San Diego.


1 Navigating the Neuroscience Data Landscape Maryann Martone, Ph. D. University of California, San Diego

2 “Neural Choreography”
“A grand challenge in neuroscience is to elucidate brain function in relation to its multiple layers of organization that operate at different spatial and temporal scales. Central to this effort is tackling ‘neural choreography’ -- the integrated functioning of neurons into brain circuits -- their spatial organization, local and long-distance connections, their temporal orchestration, and their dynamic features. Neural choreography cannot be understood via a purely reductionist approach. Rather, it entails the convergent use of analytical and synthetic tools to gather, analyze and mine information from each level of analysis, and capture the emergence of new layers of function (or dysfunction) as we move from studying genes and proteins, to cells, circuits, thought, and behavior.... However, the neuroscience community is not yet fully engaged in exploiting the rich array of data currently available, nor is it adequately poised to capitalize on the forthcoming data explosion.”
Akil et al., Science, Feb 11, 2011

3 NIF is an initiative of the NIH Blueprint consortium of institutes
• What types of resources (data, tools, materials, services) are available to the neuroscience community?
• How many are there?
• What domains do they cover? What domains do they not cover?
• Where are they? Web sites, databases, literature, supplementary material, PDF files, desk drawers
• Who uses them?
• Who creates them?
• How can we find them?
• How can we make them better in the future?
http://neuinfo.org

4 How many resources are there?
NIF Registry: A catalog of neuroscience-relevant resources
• > 4800 currently listed
• > 2000 databases
And we are finding more every day

5 The Neuroscience Information Framework: Discovery and utilization of web-based resources for neuroscience
A portal for finding and using neuroscience resources
• A consistent framework for describing resources
• Provides simultaneous search of multiple types of information, organized by category
• Supported by an expansive ontology for neuroscience
• Utilizes advanced technologies to search the “hidden web”
http://neuinfo.org
UCSD, Yale, Cal Tech, George Mason, Washington Univ
Supported by NIH Blueprint
Literature, Database Federation, Registry

6 What are the connections of the hippocampus?
Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”
• Query expansion: Synonyms and related concepts
• Boolean queries
• Data sources categorized by “data type” and level of nervous system
• Common views across multiple sources
• Tutorials for using full resource when getting there from NIF
• Link back to record in original source
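The query expansion on this slide can be sketched in a few lines. This is a minimal illustration only: the synonym table and the function name `expand_query` are hypothetical, and NIF's actual expansion draws on its ontology rather than a hard-coded dictionary.

```python
# Hypothetical synonym table for illustration; the two synonyms are the
# ones shown on the slide.
SYNONYMS = {
    "hippocampus": ["Cornu Ammonis", "Ammon's horn"],
}


def expand_query(term, synonyms=SYNONYMS):
    """Return a Boolean OR query over the term and its known synonyms."""
    variants = [term] + synonyms.get(term.lower(), [])
    # Quote multi-word variants so each is searched as a phrase.
    quoted = [f'"{v}"' if " " in v else v for v in variants]
    return " OR ".join(quoted)


print(expand_query("Hippocampus"))
```

A term with no entry in the table is returned unchanged, so the expansion never narrows a search.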

7 Results are organized within a common framework
Relationship and site labels shown: Connects to; Synapsed with; Synapsed by; Input region innervates; Axon innervates; Projects to; Cellular contact; Subcellular contact; Source site; Target site
Each resource implements a different, though related, model; in many cases the systems are complex and difficult to learn

8 The scourge of neuroanatomical nomenclature
NIF Connectivity: 6 databases containing connectivity primary data or claims
• Brain Architecture Management System (rodent)
• Connectome Wiki (human)
• Brain Maps (various)
• CoCoMac (primate cortex)
• UCLA Multimodal database (Human fMRI)
• Avian Brain Connectivity Database (Bird)
Total: 1800 unique brain terms (excluding Avian)
• Number of exact terms used in > 1 database: 42
• Number of synonym matches: 99
• Number of partonomy matches: 385
The INCF is working with NIF to develop semantic and spatial strategies for translating anatomy across information systems
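A count like "exact terms used in > 1 database" can be sketched as follows. The database names and term lists are invented toy data, and treating terms as equal after lowercasing and whitespace cleanup is an assumption; the slide does not say how NIF defined an exact match.

```python
from collections import Counter


def normalize(term):
    """Lowercase a term and collapse whitespace (an assumed matching rule)."""
    return " ".join(term.lower().split())


def shared_terms(databases):
    """Return the normalized terms that appear in more than one database."""
    counts = Counter()
    for terms in databases.values():
        # Use a set so a term repeated within one database counts once.
        counts.update({normalize(t) for t in terms})
    return sorted(t for t, n in counts.items() if n > 1)


# Invented toy vocabularies, not the real contents of any NIF source.
toy = {
    "db_a": ["Hippocampus", "Striatum", "CA1"],
    "db_b": ["hippocampus", "Caudate nucleus"],
    "db_c": ["Striatum ", "Hippocampal formation"],
}
print(shared_terms(toy))
```

Synonym and partonomy matches need an ontology on top of this string comparison, which is why the slide reports them as separate counts.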

9 What is an ontology?
Diagram labels: Brain; Cerebellum; Purkinje Cell Layer; Purkinje cell; neuron (relations: has a, is a)
• Branch of philosophy: a theory of what is
• Ontology: an explicit, formal representation of concepts and the relationships among them within a particular domain, expressing human knowledge in a machine-readable form
• e.g., Gene ontologies
• Provide universals for navigating across different data sources
• Semantic “index”
• Provide the basis for concept-based queries to probe and mine data
• Perform reasoning
• Link data through relationships, not just one-to-one mappings
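To make "machine readable" concrete, the diagram on this slide can be written as subject-relation-object triples. Reading the labels as a single chain (Brain has a Cerebellum, and so on down to Purkinje cell is a Neuron) is an assumption about the original figure, and the function name `containers` is hypothetical.

```python
# Triples assumed from the slide's diagram labels.
TRIPLES = [
    ("Brain", "has_a", "Cerebellum"),
    ("Cerebellum", "has_a", "Purkinje cell layer"),
    ("Purkinje cell layer", "has_a", "Purkinje cell"),
    ("Purkinje cell", "is_a", "Neuron"),
]


def containers(entity, triples=TRIPLES):
    """Follow has_a edges upward and list every structure containing entity."""
    found = []
    current = entity
    while True:
        parents = [s for s, rel, o in triples if rel == "has_a" and o == current]
        if not parents:
            return found
        # Toy graph: each part belongs to exactly one whole.
        current = parents[0]
        found.append(current)


print(containers("Purkinje cell"))
```

This is the simplest form of the reasoning the slide mentions: a record annotated only with "Purkinje cell" can still be found by a query about the cerebellum.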

10 PONS program
Structural Lexicon Taskforce
• Concentrate on Human, Non-human Primate, Rat and Mouse
• Define structural concepts from level of organ to macromolecular complexes
• Provide a set of criteria by which structures can be identified
Neuronal Registry Taskforce
• Establish conventions for naming new types of neurons
• Establish a standard set of properties to define neurons
• Create a Neuron Registry for registering new types of neurons
Deployment and representation (Alan Ruttenberg)
• Brought together ontologists working across scales
Courtesy of Chris Mungall, Lawrence Berkeley Labs
***Not about imposing a single view of anatomy; about making concepts computable and being able to translate among views

11 NeuroLex Wiki
http://neurolex.org
Stephen Larson
• Provide a simple framework for defining the concepts required: Cell, Part of brain, subcellular structure, molecule
• Community based: Avian neuroanatomy; Fly neurons (England); Neuroimaging terms; Brain regions identified by text mining
• Creating a computable index for neuroscience data
• INCF working to coordinate Wiki efforts underway at Allen Institute, Blue Brain and Neurolex
Demo D03

12 Comparison of traffic to NIF Portal vs Neurolex
5000 hits vs 15000 hits
Wiki is readily indexed by search engines

13 Neurons in Neurolex
• INCF building a knowledge base of neurons and their properties via the Neurolex Wiki
• Led by Dr. Gordon Shepherd
• Consistent and parseable naming scheme
• Knowledge is readily accessible, editable and computable
Stephen Larson

14 NIF data federation
Figure labels: connectivity; Brain activation foci; Microarray; 98%
Primary data, secondary data, claims, repositories
Recently added: BioNOT literature mining tool; Retraction Watch blog

15 What do you mean by data?
Databases come in many shapes and sizes
• Primary data: Data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL)
• Secondary data: Data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD), gene expression levels (Allen Brain Atlas); brain connectivity statements (BAMS)
• Tertiary data: Claims and assertions about the meaning of data, e.g., gene upregulation/downregulation, brain activation as a function of task
• Registries: Metadata; pointers to data sets or materials stored elsewhere
• Data aggregators: Aggregate data of the same type from multiple sources, e.g., Cell Image Library, SUMSdb, Brede
• Single source: Data acquired within a single context, e.g., Allen Brain Atlas

16 NIF landscape analysis
Figure labels: Striatum; Hypothalamus; Olfactory bulb; Cerebral cortex; Brain (axes: Brain region, Data source)
Vadim Astakhov, Kepler Workflow Engine

17 How much of the landscape do we have? Query for “reference” brain structures and their parts in NIF Connectivity database

18 NIF Reports: Male vs Female
Gender bias
NIF can start to answer interesting questions about neuroscience research, not just about neuroscience

19 Embracing duplication: Data mashups
~300 PMIDs were common between Brede and SUMSdb
• Same information; value added
• Same data; different aspects

20 Same data: different analysis
Chronic vs acute morphine in striatum
• Mined NIF for all references to GEO IDs: found small number where the same dataset was represented in two or more databases
• 4 strains of mice
• 3 conditions: chronic morphine, acute morphine, saline
• Drug Related Gene database: extracted statements from figures, tables and supplementary data from published article
• Gemma: Reanalyzed microarray results from GEO using different algorithms
• Both provide results of increased or decreased expression as a function of experimental paradigm
http://www.chibi.ubc.ca/Gemma/home.html
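Finding datasets that appear in more than one database amounts to grouping records by their GEO ID. The sketch below assumes each record has already been reduced to a (source, GEO ID) pair; the IDs are invented placeholders and the function name is hypothetical.

```python
from collections import defaultdict


def datasets_in_multiple_sources(records):
    """Map each GEO ID to its sources, keeping IDs seen in two or more sources."""
    by_geo = defaultdict(set)
    for source, geo_id in records:
        by_geo[geo_id].add(source)
    return {g: sorted(s) for g, s in by_geo.items() if len(s) > 1}


# Placeholder identifiers, not real GEO accessions.
records = [
    ("Gemma", "GSE0001"),
    ("DRG", "GSE0001"),
    ("Gemma", "GSE0002"),
]
print(datasets_in_multiple_sources(records))
```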

21 How easy was it to compare?
• Gemma: Gene ID + Gene Symbol
• DRG: Gene name + Probe ID
• Gemma: Increased expression/decreased expression
• DRG: Increased expression/decreased expression
• But... Gemma presented results relative to baseline chronic morphine; DRG with respect to saline, so direction of change is opposite in the 2 databases
Analysis:
• 1370 statements from Gemma regarding gene expression as a function of chronic morphine
• 617 were consistent with DRG; → over half of the claims of the paper were not confirmed in this analysis
• Results for 1 gene were opposite in DRG and Gemma
• 45 did not have enough information provided in the paper to make a judgment
NIF annotation standard
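The baseline mismatch described on this slide means one source's directions must be flipped before the two can be compared. A minimal sketch under stated assumptions: the gene names and values are invented, and the function names are hypothetical rather than part of Gemma or DRG.

```python
FLIP = {"increased": "decreased", "decreased": "increased"}


def relative_to_saline(direction, baseline):
    """Re-express a direction of change as if saline were the baseline."""
    if baseline == "saline":
        return direction
    if baseline == "chronic morphine":
        # Swapping the two compared conditions reverses the direction.
        return FLIP[direction]
    raise ValueError(f"unknown baseline: {baseline}")


def compare(gemma, drg):
    """Tally agreement gene by gene.

    gemma: gene -> direction relative to a chronic-morphine baseline
    drg:   gene -> direction relative to a saline baseline
    """
    tally = {"consistent": 0, "opposite": 0, "missing": 0}
    for gene, direction in gemma.items():
        if gene not in drg:
            tally["missing"] += 1
        elif relative_to_saline(direction, "chronic morphine") == drg[gene]:
            tally["consistent"] += 1
        else:
            tally["opposite"] += 1
    return tally


# Invented example values.
gemma = {"gene_a": "decreased", "gene_b": "increased", "gene_c": "increased"}
drg = {"gene_a": "increased", "gene_b": "increased"}
print(compare(gemma, drg))
```

Without the flip, every agreeing gene would be scored as a contradiction, which is why recording the baseline is part of an annotation standard.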

22 Grabbing the long tail of small data
• Analysis of NIF shows multiple databases with similar scope and content
• Many contain partially overlapping data
• Data “flows” from one resource to the next
• Data is reinterpreted, reanalyzed or added to
• When does it become something else?
• Is duplication good or bad?

23 Phases of NIF
• 2006-2008: A survey of what was out there
• 2008-2009: Strategy for resource discovery
  • NIF Registry vs NIF data federation
  • Ingestion of data contained within different technology platforms, e.g., XML vs relational vs RDF
  • Effective search across semantically diverse sources
  • NIFSTD ontologies
• 2009-2011: Strategy for data integration
  • Unified views across common sources
  • Mapping of content to NIF vocabularies
• 2011-present: Data analytics
  • Uniform external data references

24 Data, not just stories about them!
47/50 major preclinical published cancer studies could not be replicated
“The scientific community assumes that the claims in a preclinical study can be taken at face value -- that although there might be some errors in detail, the main message of the paper can be relied on and the data will, for the most part, stand the test of time. Unfortunately, this is not always the case.”
“There are no guidelines that require all data sets to be reported in a paper; often, original data are removed during the peer review and publication process.”
Begley and Ellis, Nature, vol. 483, p. 531, 29 March 2012
Getting data out sooner in a form where they can be exposed to many eyes and many analyses, and easily compared, may allow us to expose errors and develop better metrics to evaluate the validity of data

25 A global view of data
• You (and the machine) have to be able to find it
  • Accessible through the web
  • Annotations
• You have to be able to use it
  • Data type specified and in a usable form
• You have to know what the data mean
  • Some semantics
  • Context: Experimental metadata
  • Provenance: Where did the data come from?
Reporting neuroscience data within a consistent framework helps enormously

26 NIF team (past and present)
Jeff Grethe, UCSD, Co Investigator, Interim PI
Amarnath Gupta, UCSD, Co Investigator
Anita Bandrowski, NIF Project Leader
Gordon Shepherd, Yale University
Perry Miller; Luis Marenco; Rixin Wang
David Van Essen, Washington University
Erin Reid
Paul Sternberg, Cal Tech
Arun Rangarajan; Hans Michael Muller; Yuling Li
Giorgio Ascoli, George Mason University
Sridevi Polavarum
Fahim Imam, NIF Ontology Engineer
Larry Lui; Andrea Arnaud Stagg; Jonathan Cachat; Jennifer Lawrence; Lee Hornbrook; Binh Ngo; Vadim Astakhov; Xufei Qian; Chris Condit; Mark Ellisman; Stephen Larson; Willie Wong
Tim Clark, Harvard University
Paolo Ciccarese
Karen Skinner, NIH, Program Officer

27 Concept-based search: search by meaning
• Search Google: GABAergic neuron
• Search NIF: GABAergic neuron
• NIF automatically searches for types of GABAergic neurons
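Searching "by meaning" can be sketched as expanding a concept to its subtypes before matching. The tiny is_a table below is illustrative and far from complete, and the function names are hypothetical; NIF's own expansion relies on the NIFSTD ontologies.

```python
# child -> parent; a small illustrative fragment, not the NIF ontology.
IS_A = {
    "GABAergic neuron": "Neuron",
    "Purkinje cell": "GABAergic neuron",
    "Medium spiny neuron": "GABAergic neuron",
}


def subtypes(term, is_a=IS_A):
    """Return every direct and indirect subtype of term."""
    result = []
    for child, parent in is_a.items():
        if parent == term:
            result.append(child)
            result.extend(subtypes(child, is_a))
    return result


def concept_search_terms(term, is_a=IS_A):
    """Return the term itself followed by all of its subtypes."""
    return [term] + subtypes(term, is_a)


print(concept_search_terms("GABAergic neuron"))
```

A plain keyword search would miss a record that mentions only "Purkinje cell"; the expanded term list finds it.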

