Accelerating Candidate Gene Discovery through Ontological Indexing of Large Scale Data Repositories Simon Twigger, Ph.D.

MCW Department of Physiology, Human & Molecular Genetics Center

Meet the client

Rat researchers ask:
What tissue is this gene expressed in?
What expression data is known for SD (aka SD/NHsd, Harlan Sprague Dawley, Sprague Dawley) rats?
Are any of these genes associated with my phenotype?
Has this gene been seen in the brain?
What rat expression studies have been done on Mammary Cancer (aka breast neoplasms, breast cancer, cancer of the breast, breast carcinoma...)?
Has anyone done any expression studies using congenic rats?

Biological Data Warehouse: really important piece of data...

Problem... Where, what, when?

(one) Solution? Where, what, when?

How to create the index?

Examine One by One? Example record: "Analysis of anterior pituitary glands of ACI, Copenhagen, and Brown Norway males following treatment with the synthetic estrogen diethylstilbestrol (DES)." Copenhagen = COP, Brown Norway = BN.
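Reading records one by one amounts to spotting strain names and their abbreviations in free text. A minimal dictionary-matching sketch of that step; the `STRAIN_SYNONYMS` table and `find_strains` function are hypothetical illustrations built from the COP/BN mappings above, not the project's actual code.

```python
import re

# Hypothetical synonym table: canonical strain symbol -> names seen in records.
# Only the COP and BN mappings come from the slide; the table is illustrative.
STRAIN_SYNONYMS = {
    "ACI": ["ACI"],
    "COP": ["Copenhagen", "COP"],
    "BN": ["Brown Norway", "BN"],
}

def find_strains(text):
    """Return canonical strain symbols whose synonyms appear as whole words in text."""
    found = set()
    for canonical, synonyms in STRAIN_SYNONYMS.items():
        for syn in synonyms:
            # \b word boundaries stop "BN" matching inside longer tokens;
            # matching is case-sensitive because short symbols are easily confused.
            if re.search(r"\b" + re.escape(syn) + r"\b", text):
                found.add(canonical)
                break
    return found

record = ("Analysis of anterior pituitary glands of ACI, Copenhagen, and Brown Norway "
          "males following treatment with the synthetic estrogen diethylstilbestrol (DES).")
print(sorted(find_strains(record)))  # ['ACI', 'BN', 'COP']
```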

NCBO ontology services

Open Biomedical Annotator
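To use an annotation service, each record's text is sent along with the ontologies to annotate against. A minimal sketch of building such a request, assuming the present-day BioPortal REST annotator endpoint (`https://data.bioontology.org/annotator` with `text`, `ontologies` and `apikey` parameters); the OBA service used at the time of this talk may have exposed a different interface. The acronyms `MA` and `RS` and the `build_annotator_request` helper are illustrative assumptions, and no network call is made.

```python
from urllib.parse import urlencode

# Assumption: present-day BioPortal annotator endpoint, not necessarily the
# interface of the OBA service used in this work.
ANNOTATOR_URL = "https://data.bioontology.org/annotator"

def build_annotator_request(text, ontologies, api_key):
    """Build a GET URL asking the annotator to tag `text` with terms from `ontologies`."""
    params = {
        "text": text,
        "ontologies": ",".join(ontologies),  # e.g. MA (mouse anatomy), RS (rat strain)
        "apikey": api_key,                   # placeholder; a real key is required
    }
    return ANNOTATOR_URL + "?" + urlencode(params)

url = build_annotator_request("anterior pituitary of Brown Norway rats",
                              ["MA", "RS"], "YOUR_API_KEY")
print(url)
```

`urlencode` escapes spaces as `+` and the comma in the ontology list as `%2C`, so record text can be passed in unmodified.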

Initial Ontologies & Workflow (diagram: Datasets, Series, Samples)

Phase 1 - Small Scale Testing

Initial test load: 30 Rat Dataset records (GDS) out of …; … Series records (GSE) out of …; … Sample records (GSM) out of 7288. Ruby on Rails web application to view data.

Parallel Annotation Workflow

Run times (minutes' seconds"):

#Workers  #Jobs  Time 1  Time 2  Time 3
…         …      …'25"   11'26"  11'13"
…         …      …'14"   10'45"  10'28"
…         …      …'15"   10'53"  10'59"

#Workers  #Jobs  Time 1  Time 2
…         …      …'50"   7'19"
…         …      …'18"   …
…         …      …'33"   6'40"
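The timings compare runs with different numbers of workers and jobs. A minimal sketch of the parallel pattern, assuming a thread pool because each annotation call is network-bound; the real workflow may have used separate processes or machines. The `annotate` function is a hypothetical stand-in for the remote annotator, and the GSM IDs are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

def annotate(record):
    """Hypothetical stand-in for one annotator call: returns (record id, terms found)."""
    rec_id, text = record
    # A real worker would call the remote annotation service here (I/O bound).
    terms = [w for w in ("kidney", "liver", "brain") if w in text.lower()]
    return rec_id, terms

def annotate_all(records, n_workers):
    """Spread records across n_workers threads and collect results keyed by record id."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map yields results in input order, regardless of completion order
        return dict(pool.map(annotate, records))

records = [("GSM_1", "Rat kidney cortex"),
           ("GSM_2", "Whole brain, adult male"),
           ("GSM_3", "Liver")]
print(annotate_all(records, n_workers=2))
```

Changing `n_workers` is the knob the timing table varies; beyond the point where the remote service saturates, extra workers stop helping.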

Concurrent Annotation Results (August to October)

Cloud-enabled Workflow?

Results/Demo

Initial Observations - Synonyms. Searching with synonyms can be great: Ept6 = ACI.COP-(D3Mgh16-D3Rat119)/Shul; DES = diethylstilbestrol.
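Synonym search works by expanding a query term into all of its known names before matching. A minimal sketch using the two examples above; the `SYNONYMS` table and record IDs are illustrative, and a real system would draw synonyms from the ontologies themselves.

```python
# Illustrative synonym table built from the slide's examples.
SYNONYMS = {
    "Ept6": ["ACI.COP-(D3Mgh16-D3Rat119)/Shul"],
    "DES": ["diethylstilbestrol"],
}

def expand_query(term):
    """Return the term plus all known synonyms, for OR-style searching."""
    return [term] + SYNONYMS.get(term, [])

def search(records, term):
    """Return ids of records whose text contains the term or any synonym.

    Uses a plain case-insensitive substring match, which is exactly what
    makes short abbreviations prone to false hits.
    """
    variants = [v.lower() for v in expand_query(term)]
    return [rid for rid, text in records.items()
            if any(v in text.lower() for v in variants)]

records = {
    "GSE_a": "Pituitary from rats treated with diethylstilbestrol",
    "GSE_b": "Congenic ACI.COP-(D3Mgh16-D3Rat119)/Shul females",
}
print(search(records, "DES"))   # ['GSE_a']
print(search(records, "Ept6"))  # ['GSE_b']
```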

Initial Observations - Synonyms. Searching with synonyms can cause problems: Estrogen-induced pituitary tumorigenesis = EPT; Ethanolaminephosphotransferase activity = EPT.
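Collisions like EPT can be found mechanically by inverting the concept-to-synonym table and flagging any synonym that points at more than one concept. A minimal sketch; `CONCEPT_SYNONYMS` is an illustrative table, not drawn from a specific ontology release.

```python
from collections import defaultdict

# Concept -> synonyms, using the slide's EPT and DES examples.
CONCEPT_SYNONYMS = {
    "Estrogen-induced pituitary tumorigenesis": ["EPT"],
    "Ethanolaminephosphotransferase activity": ["EPT"],
    "Diethylstilbestrol": ["DES"],
}

def ambiguous_synonyms(concept_synonyms):
    """Return synonyms that map to more than one concept, with the sorted concepts they hit."""
    index = defaultdict(set)
    for concept, syns in concept_synonyms.items():
        for s in syns:
            index[s].add(concept)
    return {s: sorted(c) for s, c in index.items() if len(c) > 1}

print(ambiguous_synonyms(CONCEPT_SYNONYMS))
```

Flagged synonyms could then be excluded from automatic expansion or routed to a curator.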

Initial Observations 2 - Rat strain symbols
Strain symbols: AT, AN, AS, A, B, CD, G (as in 1000 x g), C (as in °C), TX (also the abbreviation for Texas).
Strain mentions in real records:
...pituitary gland of the ACI, Copenhagen and Brown Norway Rat...
...month-old Sprague-Dawley females that...
...expression data from female SD rats with access to lifelong...
...Strain or Line: F344/NCrl...
...dahl Salt-sensitive (S) rat and S.R(9)x3A congenic rat kidneys from Dahl salt-sensitive males...
Train classifier on real strain phrases? Look for relevant neighboring terms?
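The "look for relevant neighboring terms" idea can be sketched as a context-window filter: a short symbol counts as a strain only if a cue word appears nearby. The `CUES` set, the window size and `strain_mentions` are hypothetical choices for illustration; the slide does not specify them.

```python
import re

# Hypothetical cue words suggesting a nearby token is a rat strain.
CUES = {"rat", "rats", "strain", "line", "congenic", "females", "males"}

def strain_mentions(text, symbols, window=3):
    """Accept a short strain symbol only if a cue word occurs within `window` tokens."""
    tokens = re.findall(r"[\w./()-]+", text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok in symbols:
            nearby = tokens[max(0, i - window): i + window + 1]
            if any(t.lower() in CUES for t in nearby):
                hits.append(tok)
    return hits

symbols = {"SD", "C", "G"}
print(strain_mentions("expression data from female SD rats", symbols))  # ['SD']
print(strain_mentions("spun at 1000 x g for 10 min at 4 C", symbols))   # []
```

A trained classifier, the slide's other suggestion, would replace the hand-picked cue set with features learned from labeled strain phrases.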

Initial Observations - Anatomy

Term in GEO records     Corresponding MA term
White Adipose Tissue    White Fat
Brown Adipose Tissue    Brown Fat
Ulnar bone              Ulna bone
Skeletal Muscle         Set of Skeletal Muscle
Anterior Pituitary      Anterior Pituitary Gland
Calvarial Bone          Chondrocranium
Left Ventricle          Heart Left Ventricle

Potential synonyms that could be added to MA.

Search Records by Terms
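Searching records by term relies on an index that inverts the annotator's output: instead of record to terms, it stores term to records. A minimal sketch with placeholder record IDs and annotations.

```python
from collections import defaultdict

def build_index(annotations):
    """Invert record -> terms annotations into term -> sorted record ids."""
    index = defaultdict(set)
    for record_id, terms in annotations.items():
        for term in terms:
            index[term].add(record_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Placeholder record ids and annotations, for illustration only.
annotations = {
    "GSM_1": ["kidney", "Sprague Dawley"],
    "GSM_2": ["anterior pituitary gland", "Brown Norway"],
    "GSM_3": ["kidney", "Brown Norway"],
}
index = build_index(annotations)
print(index["kidney"])        # ['GSM_1', 'GSM_3']
print(index["Brown Norway"])  # ['GSM_2', 'GSM_3']
```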

Phase 2 - All Rat Affy Samples, 1 ontology (Anatomy)

Larger scale data load: 0 Rat Dataset records (GDS), 479 Series records (GSE), 12,012 Sample records (GSM)

Targeted Indexing: Mouse Adult Gross Anatomy Ontology
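Indexing against an anatomy ontology lets a query such as "Has this gene been seen in the brain?" also retrieve records annotated with more specific parts of the brain. A minimal sketch, assuming a single parent per term; real ontologies such as MA form a graph with multiple parents and both is_a and part_of links. The edges below are illustrative, not copied from MA.

```python
# Toy anatomy hierarchy (child -> parent); edges are illustrative only.
PARENT = {
    "cerebellum": "brain",
    "cerebral cortex": "brain",
    "brain": "nervous system",
    "renal cortex": "kidney",
}

def ancestors(term):
    """Walk parent links upward, returning every ancestor of term."""
    result = []
    while term in PARENT:
        term = PARENT[term]
        result.append(term)
    return result

def records_under(query, annotations):
    """Return records annotated with query or with any term that descends from it."""
    return sorted(rid for rid, term in annotations.items()
                  if term == query or query in ancestors(term))

annotations = {"GSM_1": "cerebellum", "GSM_2": "renal cortex", "GSM_3": "brain"}
print(records_under("brain", annotations))  # ['GSM_1', 'GSM_3']
```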

Results/Demo

Linking annotations to data (figure: RGD; genes Tm2d1, Svs4, Hbb, Scgb2a1, Alb)

Linking annotations to data: Hbb is_expressed_in rat kidney; Tm2d1 is_expressed_in rat kidney. Arrays: Human (U133, U133v2), Mouse (430, U74, U95) and Rat (U34a/b/c, 230, 230v2). 62,000 samples x ca. 25,000 genes/sample = 1.5B data points.
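The scale estimate follows from multiplying the sample count by the approximate number of genes measured per array:

$$
62{,}000 \ \text{samples} \times 25{,}000 \ \frac{\text{genes}}{\text{sample}} = 1.55 \times 10^{9} \approx 1.5\ \text{billion data points}
$$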

Probeset results on GMiner (example: Gabdr)

Probeset results on GMiner

RDF Data Integration: a triple store (OpenRDF Sesame, Virtuoso Open Source) holding rat genes & xrefs, probeset-to-RGD ID mappings, probeset-to-MA mappings and the Mouse Anatomy Ontology.
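With probeset-to-RGD ID and probeset-to-MA mappings stored as triples, a gene-level statement such as "Hbb is_expressed_in rat kidney" falls out of a join on the shared probeset. In Sesame or Virtuoso this would be a SPARQL query; the plain-Python sketch below only shows the join logic. The predicate names `maps_to_gene` and `detected_in` and all identifiers are hypothetical placeholders; only `is_expressed_in` comes from the slides.

```python
# Triples as (subject, predicate, object); identifiers are hypothetical
# placeholders, not real RGD IDs, probeset IDs or MA accessions.
triples = {
    ("probeset:P1", "maps_to_gene", "rgd:GENE_A"),
    ("probeset:P1", "detected_in", "ma:kidney"),
    ("probeset:P2", "maps_to_gene", "rgd:GENE_B"),
    ("probeset:P2", "detected_in", "ma:liver"),
}

def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the store."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def infer_expression():
    """Join probeset->gene with probeset->anatomy to derive gene is_expressed_in anatomy."""
    probesets = {s for s, p, _ in triples if p == "maps_to_gene"}
    return sorted((gene, "is_expressed_in", tissue)
                  for ps in probesets
                  for gene in objects(ps, "maps_to_gene")
                  for tissue in objects(ps, "detected_in"))

for t in infer_expression():
    print(t)
```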

Ongoing Work: term recognition, strains, etc.; evaluation of probeset-to-anatomy results; curation interface to add additional terms; RDF formats and triple store implementation; integration of strain and tissue results into RGD.

Education & Outreach

Meet the student

You! (Diagram: researchers use ontologies as a lever on a heavy scientific problem.) More knowledge through education = bigger lever!

Video #3 is being shot this week

Future Videos: the target is the scientist! Solve common tasks; use annotation tools; evaluate annotations; intro to specific ontologies; interview ontology teams. Ideas? What does your community need?

Acknowledgements: Joey Geiger - development of GMiner; Jennifer Smith - video creation, data curation; Rajni Nigam - Rat Strain Ontology; Clement Jonquet - NCBO OBA tools; Trish Whetzel - video script feedback; Mark Musen & NIH Roadmap Initiative - our funding!

Links: Project webpage; Web application; GMiner code; RDFizer code