Published by Carmella Hines. Modified over 8 years ago.
1
Accelerating Candidate Gene Discovery through Ontological Indexing of Large Scale Data Repositories
Simon Twigger, Ph.D.
2
MCW Department of Physiology
Human & Molecular Genetics Center
http://rgd.mcw.edu
3
Meet the client
4
Rat researchers ask...
What tissue is this gene expressed in?
What expression data is known for SD (aka SD/NHsd, Harlan Sprague Dawley, Sprague Dawley) rats?
Are any of these genes associated with my phenotype?
Has this gene been seen in the brain?
What rat expression studies have been done on Mammary Cancer (aka breast neoplasms, breast cancer, cancer of the breast, breast carcinoma...)?
Has anyone done any expression studies using congenic rats?
5
Biological Data Warehouse Really important piece of data...
6
Problem... Where, what, when? +
7
(one) Solution? Where, what, when? +
8
How to create the index?
9
Examine One by One?
Analysis of anterior pituitary glands of ACI, Copenhagen, and Brown Norway males following treatment with the synthetic estrogen diethylstilbestrol (DES).
Copenhagen = COP
Brown Norway = BN
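The abbreviations on this slide suggest what a term index has to do: map each spelling of a strain name to one canonical symbol. The following is a minimal sketch of that idea, not the NCBO Annotator itself; the `STRAIN_SYNONYMS` table and the `find_strains` helper are hypothetical names introduced for illustration.

```python
import re

# Hypothetical synonym table built from the abbreviations on the slide.
STRAIN_SYNONYMS = {
    "ACI": "ACI",
    "Copenhagen": "COP",
    "COP": "COP",
    "Brown Norway": "BN",
    "BN": "BN",
}

def find_strains(text):
    """Return canonical strain symbols found in text, in order of first appearance."""
    hits = []
    for name, symbol in STRAIN_SYNONYMS.items():
        # Whole-word, case-sensitive match so "COP" does not fire inside "Copenhagen".
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            hits.append((match.start(), symbol))
    found = []
    for _, symbol in sorted(hits):
        if symbol not in found:
            found.append(symbol)
    return found

description = ("Analysis of anterior pituitary glands of ACI, Copenhagen, and "
               "Brown Norway males following treatment with the synthetic "
               "estrogen diethylstilbestrol (DES).")
print(find_strains(description))  # ['ACI', 'COP', 'BN']
```

Doing this by hand for every record does not scale, which is the motivation for an automated annotator.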
10
NCBO ontology services http://bioportal.bioontology.org/annotator
11
Open Biomedical Annotator http://www.bioontology.org/wiki/index.php/Annotator_Web_service
12
Initial Ontologies & Workflow
Datasets, Series, Samples
13
Phase 1 Small Scale Testing
14
http://gminer.mcw.edu/
Initial Test Load:
30 Rat Dataset records (GDS) out of 236
32 Series records (GSE) out of 750
587 Sample records (GSM) out of 7288
Ruby on Rails web application to view data
15
Parallel Annotation Workflow
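The slides do not say how the workers were implemented, so the following is only a sketch of the general pattern: a fixed pool of workers pulling annotation jobs. The `annotate` function is a hypothetical stand-in for a call to an annotation service, and its three-term vocabulary is invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def annotate(record_text):
    """Placeholder annotator: stands in for a request to an annotation service."""
    vocabulary = ("kidney", "brain", "pituitary")  # hypothetical term list
    lowered = record_text.lower()
    return [term for term in vocabulary if term in lowered]

def annotate_all(records, workers=5):
    """Annotate records concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(annotate, records))

records = [
    "Kidney from Dahl salt-sensitive males",
    "Anterior pituitary gland, ACI rat",
    "Liver, SD rat",
]
results = annotate_all(records, workers=5)
for record, terms in zip(records, results):
    print(record, "->", terms)
```

Threads suit this kind of job when each task mostly waits on a remote service; past some point, adding workers no longer helps because the service becomes the bottleneck.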
16
#Workers  #Jobs  Time 1   Time 2   Time 3
5         999    11’ 25”  11’ 26”  11’ 13”
10        999    10’ 14”  10’ 45”  10’ 28”
25        999    10’ 15”  10’ 53”  10’ 59”

#Workers  #Jobs  Time 1   Time 2
5         999    5’ 50”   7’ 19”
10        999    5’ 18”   -
25        999    5’ 33”   6’ 40”
17
Concurrent Annotation Results: August, October
18
Cloud-enabled Workflow?
19
Results/Demo
20
Initial Observations - Synonyms
DES, Ept6
Searching with synonyms can be great:
Ept6 = ACI.COP-(D3Mgh16-D3Rat119)/Shul
DES = Diethylstilbestrol
21
Initial Observations - Synonyms
Searching with synonyms can cause problems:
Estrogen-induced pituitary tumorigenesis = EPT
Ethanolaminephosphotransferase activity = EPT
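One way to surface collisions like EPT before they reach search results is to invert the synonym table and flag any synonym that points at more than one term. This is a sketch under that assumption; the `SYNONYMS` list and the `ambiguous_synonyms` helper are hypothetical and only reuse the two terms named on the slide.

```python
from collections import defaultdict

# (synonym, term) pairs taken from the slide.
SYNONYMS = [
    ("EPT", "Estrogen-induced pituitary tumorigenesis"),
    ("EPT", "Ethanolaminephosphotransferase activity"),
]

def ambiguous_synonyms(pairs):
    """Return synonyms that map to more than one term, with their terms sorted."""
    by_synonym = defaultdict(set)
    for synonym, term in pairs:
        # Compare case-insensitively, since annotators often ignore case.
        by_synonym[synonym.upper()].add(term)
    return {syn: sorted(terms) for syn, terms in by_synonym.items() if len(terms) > 1}

for synonym, terms in ambiguous_synonyms(SYNONYMS).items():
    print(synonym, "is ambiguous:", "; ".join(terms))
```

Flagged synonyms could then be dropped from matching or routed to a curator.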
22
Initial Observations 2
Rat Strain symbols: AT, AN, AS, A, B, CD, G (1000 x g), C (˚C), TX (abbreviation for Texas)
...pituitary gland of the ACI, Copenhagen and Brown Norway Rat...
...16 month-old Sprague-Dawley females that...
...expression data from female SD rats with access to lifelong...
...Strain or Line: F344/NCrl...
...dahl Salt-sensitive (S) rat and S.R(9)x3A congenic rat...
...kidneys from Dahl salt-sensitive males...
Train classifier on real strain phrases?
Look for relevant neighboring terms?
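The "relevant neighboring terms" idea can be sketched as a simple window check: accept a short strain symbol only when a word such as "rat" or "strain" appears close by. This is an illustrative assumption, not the method used in GMiner; the `CONTEXT_WORDS` set, the window size, and the `strain_mentions` helper are all hypothetical.

```python
import re

# Short strain symbols from the slide, plus SD from the example text.
STRAIN_SYMBOLS = {"AT", "AN", "AS", "A", "B", "C", "G", "TX", "SD"}
# Hypothetical context words that make a strain reading plausible.
CONTEXT_WORDS = {"rat", "rats", "strain", "congenic"}

def strain_mentions(text, window=2):
    """Return strain symbols that have a context word within `window` tokens."""
    tokens = re.findall(r"[A-Za-z0-9/]+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.upper() not in STRAIN_SYMBOLS:
            continue
        neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if any(word.lower() in CONTEXT_WORDS for word in neighbours):
            hits.append(token.upper())
    return hits

print(strain_mentions("expression data from female SD rats with access to lifelong"))
print(strain_mentions("centrifuged at 1000 x g for 10 min at 4 C"))
```

The first call keeps SD because "rats" is adjacent; the second rejects "at", "g", and "C" because no context word is nearby. A trained classifier would replace the fixed word list with learned evidence.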
23
Initial Observations - Anatomy
In GEO records         Corresponding MA term
White Adipose Tissue   White Fat
Brown Adipose Tissue   Brown Fat
Ulnar bone             Ulna bone
Skeletal Muscle        Set of Skeletal Muscle
Anterior Pituitary     Anterior Pituitary Gland
Calvarial Bone         Chondrocranium
Left Ventricle         Heart Left Ventricle
Potential synonyms that could be added to MA
24
Search Records by Terms
25
Phase 2 All Rat Affy Samples 1 ontology (Anatomy)
26
Larger scale data load
0 Rat Dataset records (GDS)
479 Series records (GSE)
12,012 Sample records (GSM)
27
Targeted Indexing Mouse Adult Gross Anatomy Ontology
28
Results/Demo
29
Linking annotations to data Tm2d1 RGD1306410 Svs4 Hbb Scgb2a1 Alb
30
Linking annotations to data
Tm2d1, RGD1306410, Svs4, Hbb, Scgb2a1, Alb
+
Hbb is_expressed_in rat kidney
Tm2d1 is_expressed_in rat kidney
Human (U133, U133v2), Mouse (430, U74, U95) and Rat (U34a/b/c, 230, 230v2)
62,000 samples x ca. 25,000 genes/sample = 1.5B data points
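Combining a sample's tissue annotation with the genes detected in that sample yields statements of the form shown on the slide. The sketch below assumes the detected genes per sample are already available; how detection is decided is not stated in the slides. The sample ID, the `expression_triples` helper, and the input dictionaries are hypothetical.

```python
def expression_triples(sample_tissue, detected_genes):
    """Build (gene, 'is_expressed_in', tissue) triples for annotated samples."""
    triples = set()
    for sample_id, tissue in sample_tissue.items():
        for gene in detected_genes.get(sample_id, []):
            triples.add((gene, "is_expressed_in", tissue))
    return sorted(triples)

# Hypothetical inputs: one annotated sample and the genes detected in it.
sample_tissue = {"sample_1": "rat kidney"}
detected_genes = {"sample_1": ["Hbb", "Tm2d1"]}

for triple in expression_triples(sample_tissue, detected_genes):
    print(" ".join(triple))

# Scale estimate from the slide: samples x genes per sample.
samples = 62_000
genes_per_sample = 25_000
data_points = samples * genes_per_sample
print(f"{data_points:,} data points")  # 1,550,000,000
```

The product is 1.55 billion, consistent with the slide's rounded figure given that the per-sample gene count is approximate.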
31
Probeset results on GMiner Gabdr
32
Probeset results on GMiner
33
RDF Data integration
Triple Store: OpenRDF Sesame, Virtuoso Open Source
Rat Genes & xrefs
Probeset to RGD ID
Probeset to MA
Mouse Anatomy Ontology
34
Ongoing Work on term recognition, strains, etc.
Evaluation of Probeset-to-Anatomy results
Curation interface to add additional terms
RDF formats, Triple Store implementation
Integrate Strain and tissue results into RGD
35
Education & Outreach
36
Meet the student
37
You! Heavy Scientific Problem Ontologies More knowledge through education = bigger lever! Researchers
41
Video #3 is being shot this week
42
Future Videos
Target is the scientist!
Solve common tasks
Use annotation tools
Evaluate annotations
Intro to specific ontologies
Interview ontology teams
Ideas? What does your community need?
43
Acknowledgements
Joey Geiger - Development of GMiner
Jennifer Smith - Video creation, data curation
Rajni Nigam - Rat Strain Ontology
Clement Jonquet - NCBO OBA tools
Trish Whetzel - Video script feedback
Mark Musen & NIH Roadmap Initiative - Our Funding!
44
Links
http://twigger.hmgc.mcw.edu/ncbo/ Project webpage
http://gminer.mcw.edu Web application
http://github.com/mcwbbc/gminer Gminer code
http://github.com/simont/MCW-RDF RDFizer code
simont@mcw.edu