Accelerating Candidate Gene Discovery through Ontological Indexing of Large Scale Data Repositories Simon Twigger, Ph.D.


1 Accelerating Candidate Gene Discovery through Ontological Indexing of Large Scale Data Repositories Simon Twigger, Ph.D.

2 MCW Department of Physiology Human & Molecular Genetics Center http://rgd.mcw.edu

3 Meet the client

4 Rat researchers ask...
What tissue is this gene expressed in?
What expression data is known for SD (aka SD/NHsd, Harlan Sprague Dawley, Sprague Dawley) rats?
Are any of these genes associated with my phenotype?
Has this gene been seen in the brain?
What rat expression studies have been done on Mammary Cancer (aka breast neoplasms, breast cancer, cancer of the breast, breast carcinoma...)?
Has anyone done any expression studies using congenic rats?

5 Biological Data Warehouse Really important piece of data...

6 Problem... Where, what, when? +

7 (one) Solution? Where, what, when? +

8 How to create the index?

9 Examine One by One? Analysis of anterior pituitary glands of ACI, Copenhagen, and Brown Norway males following treatment with the synthetic estrogen diethylstilbestrol (DES). Copenhagen = COP Brown Norway = BN
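The record above shows why one-by-one reading does not scale: the strains are named in free text, not by symbol. A minimal sketch of dictionary-based matching follows, assuming a small hand-built synonym table. Only the COP and BN mappings come from the slide; the ACI entry and the function name `annotate_strains` are illustrative assumptions, and this is not the NCBO annotator itself.

```python
import re

# Hypothetical synonym table. COP and BN are from the slide;
# the ACI entry is an assumption added for illustration.
STRAIN_SYNONYMS = {
    "copenhagen": "COP",
    "brown norway": "BN",
    "aci": "ACI",
}

def annotate_strains(text, synonyms=STRAIN_SYNONYMS):
    """Return the sorted strain symbols whose names occur as whole words in text."""
    found = set()
    for name, symbol in synonyms.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(symbol)
    return sorted(found)

description = (
    "Analysis of anterior pituitary glands of ACI, Copenhagen, and "
    "Brown Norway males following treatment with the synthetic "
    "estrogen diethylstilbestrol (DES)."
)
print(annotate_strains(description))
```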

10 NCBO ontology services http://bioportal.bioontology.org/annotator

11 Open Biomedical Annotator http://www.bioontology.org/wiki/index.php/Annotator_Web_service

12 Initial Ontologies & Workflow (diagram: Datasets, Series, Samples)

13 Phase 1 Small Scale Testing

14 http://gminer.mcw.edu/
Initial Test Load:
30 Rat Dataset records (GDS) out of 236
32 Series records (GSE) out of 750
587 Sample records (GSM) out of 7288
Ruby on Rails web application to view data

15 Parallel Annotation Workflow

16
#Workers  #Jobs  Time 1   Time 2   Time 3
5         999    11’ 25”  11’ 26”  11’ 13”
10        999    10’ 14”  10’ 45”  10’ 28”
25        999    10’ 15”  10’ 53”  10’ 59”

#Workers  #Jobs  Time 1   Time 2
5         999    5’ 50”   7’ 19”
10        999    5’ 18”   -
25        999    5’ 33”   6’ 40”
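A parallel annotation run of this shape (a fixed batch of jobs spread over a configurable number of workers) can be sketched as below. This is an illustration under assumptions, not the GMiner implementation: the slides do not say how workers were realised, so a thread pool is used here, and `annotate_record` is a hypothetical stand-in for a call to the annotator service.

```python
from concurrent.futures import ThreadPoolExecutor

def annotate_record(record_id):
    """Hypothetical stand-in for one annotation job.

    A real job would send the record text to an annotator service
    and return the ontology terms it found.
    """
    return record_id, f"annotated-{record_id}"

def run_jobs(record_ids, workers):
    """Run one job per record on a pool of `workers` threads and collect the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(annotate_record, record_ids))

# 999 jobs on 5 workers, mirroring the first row of the timing table.
results = run_jobs(range(999), workers=5)
print(len(results))
```

Changing `workers` to 10 or 25 reproduces the other configurations in the table; when the remote service is the bottleneck, adding workers may not shorten the run much.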

17 Concurrent Annotation Results: August, October

18 Cloud-enabled Workflow?

19 Results/Demo

20 Initial Observations - Synonyms
Searching with synonyms can be great:
Ept6 = ACI.COP-(D3Mgh16-D3Rat119)/Shul
DES = Diethylstilbestrol

21 Initial Observations - Synonyms
Searching with synonyms can cause problems:
Estrogen-induced pituitary tumorigenesis = EPT
Ethanolaminephosphotransferase activity = EPT
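One way to surface the problem case is to index every synonym and flag those that resolve to more than one term. The sketch below uses only the synonym pairs shown on these two slides; the function names are illustrative assumptions.

```python
from collections import defaultdict

# (synonym, term) pairs taken from the two synonym slides.
SYNONYM_PAIRS = [
    ("DES", "Diethylstilbestrol"),
    ("Ept6", "ACI.COP-(D3Mgh16-D3Rat119)/Shul"),
    ("EPT", "Estrogen-induced pituitary tumorigenesis"),
    ("EPT", "Ethanolaminephosphotransferase activity"),
]

def build_index(pairs):
    """Map each synonym to the set of terms it can stand for."""
    index = defaultdict(set)
    for synonym, term in pairs:
        index[synonym].add(term)
    return index

def ambiguous_synonyms(index):
    """Return the synonyms that point at more than one term."""
    return sorted(s for s, terms in index.items() if len(terms) > 1)

index = build_index(SYNONYM_PAIRS)
print(ambiguous_synonyms(index))
```

Flagged synonyms could then be routed to a curator or disambiguated by context instead of being matched blindly.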

22 Initial Observations 2
Rat Strain symbols: AT, AN, AS, A, B, CD
G (1000 x g)
C (˚C)
TX (Abbreviation for Texas)
...pituitary gland of the ACI, Copenhagen and Brown Norway Rat...
...16 month-old Sprague-Dawley females that...
...expression data from female SD rats with access to lifelong...
...Strain or Line: F344/NCrl...
...dahl Salt-sensitive (S) rat and S.R(9)x3A congenic rat...
...kidneys from Dahl salt-sensitive males...
Train classifier on real strain phrases? Look for relevant neighboring terms?
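The "neighboring terms" idea can be sketched as a simple context-window filter: a short strain symbol is accepted only when a strain-related word appears within a few tokens of it. The symbol set, the context words, and the window size below are all assumptions chosen for illustration, not values from the project.

```python
import re

# Short symbols that collide with ordinary text (assumed subset).
AMBIGUOUS_SYMBOLS = {"AT", "AN", "AS", "S", "G", "C", "TX", "SD"}
# Words that suggest a strain is being discussed (assumed list).
CONTEXT_WORDS = {"rat", "rats", "strain", "congenic", "males", "females"}

def strain_mentions(text, window=3):
    """Return symbols that have a strain-related word within `window` tokens."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.upper() not in AMBIGUOUS_SYMBOLS:
            continue
        neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if any(n.lower() in CONTEXT_WORDS for n in neighbours):
            hits.append(token.upper())
    return hits

print(strain_mentions("expression data from female SD rats with access to lifelong"))
print(strain_mentions("samples were centrifuged at 1000 x g for 10 min"))
```

A filter like this is crude: very common words such as "as" or "an" next to "rat" would still slip through, which is one reason a trained classifier is raised as an alternative.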

23 Initial Observations - Anatomy
In GEO records          Corresponding MA term
White Adipose Tissue    White Fat
Brown Adipose Tissue    Brown Fat
Ulnar bone              Ulna bone
Skeletal Muscle         Set of Skeletal Muscle
Anterior Pituitary      Anterior Pituitary Gland
Calvarial Bone          Chondrocranium
Left Ventricle          Heart Left Ventricle
Potential synonyms that could be added to MA
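Until such synonyms are added to the ontology, the pairs from this slide could be held in a local lookup that normalises a GEO phrase to its MA term before searching. A minimal sketch, with `to_ma_term` as a hypothetical helper name:

```python
# GEO phrase -> MA term, using the pairs listed on the slide.
GEO_TO_MA = {
    "white adipose tissue": "White Fat",
    "brown adipose tissue": "Brown Fat",
    "ulnar bone": "Ulna bone",
    "skeletal muscle": "Set of Skeletal Muscle",
    "anterior pituitary": "Anterior Pituitary Gland",
    "calvarial bone": "Chondrocranium",
    "left ventricle": "Heart Left Ventricle",
}

def to_ma_term(phrase):
    """Return the MA term for a GEO phrase, or None if no mapping is known."""
    return GEO_TO_MA.get(phrase.strip().lower())

print(to_ma_term("Anterior Pituitary"))
print(to_ma_term("Spleen"))
```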

24 Search Records by Terms

25 Phase 2 All Rat Affy Samples 1 ontology (Anatomy)

26 Larger scale data load
0 Rat Dataset records (GDS)
479 Series records (GSE)
12,012 Sample records (GSM)

27 Targeted Indexing Mouse Adult Gross Anatomy Ontology

28 Results/Demo

29 Linking annotations to data Tm2d1 RGD1306410 Svs4 Hbb Scgb2a1 Alb

30 Linking annotations to data
Tm2d1 RGD1306410 Svs4 Hbb Scgb2a1 Alb
Hbb is_expressed_in rat kidney
Tm2d1 is_expressed_in rat kidney
Human (U133, U133v2), Mouse (430, U74, U95) and Rat (U34a/b/c, 230, 230v2)
62,000 samples x ca. 25,000 genes/sample = ca. 1.5B data points
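The linking step can be sketched as a join between two tables: the tissue annotated for each sample and the genes detected in that sample. The sample identifier and the function name below are hypothetical; the gene symbols, the `is_expressed_in` predicate, and the sample and gene counts come from the slide.

```python
def expression_triples(sample_tissue, sample_genes):
    """Combine sample-to-tissue annotations with per-sample gene lists
    into (gene, "is_expressed_in", tissue) triples."""
    triples = set()
    for sample, tissue in sample_tissue.items():
        for gene in sample_genes.get(sample, []):
            triples.add((gene, "is_expressed_in", tissue))
    return sorted(triples)

# Hypothetical sample id; the annotation and genes mirror the slide.
sample_tissue = {"GSM_EXAMPLE": "rat kidney"}
sample_genes = {"GSM_EXAMPLE": ["Hbb", "Tm2d1"]}

for triple in expression_triples(sample_tissue, sample_genes):
    print(" ".join(triple))

# Scale estimate from the slide: samples x genes per sample.
data_points = 62_000 * 25_000
print(f"{data_points:,}")
```

The product works out to 1,550,000,000, consistent with the roughly 1.5B data points quoted.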

31 Probeset results on GMiner Gabdr

32 Probeset results on GMiner

33 RDF Data integration Triple Store OpenRDF Sesame Virtuoso Open Source Rat Genes & xrefs Probeset to RGD ID Probeset to MA Mouse Anatomy Ontology

34 Ongoing
Work on term recognition, strains, etc.
Evaluation of Probeset-to-Anatomy results
Curation interface to add additional terms
RDF formats, Triple Store implementation
Integrate Strain and tissue results into RGD

35 Education & Outreach

36 Meet the student

37 You! Heavy Scientific Problem Ontologies More knowledge through education = bigger lever! Researchers


41 Video #3 is being shot this week

42 Future Videos
Target is the scientist!
Solve common tasks
Use annotation tools
Evaluate annotations
Intro to specific ontologies
Interview ontology teams
Ideas? What does your community need?

43 Acknowledgements
Joey Geiger - Development of GMiner
Jennifer Smith - Video creation, data curation
Rajni Nigam - Rat Strain Ontology
Clement Jonquet - NCBO OBA tools
Trish Whetzel - Video script feedback
Mark Musen & NIH Roadmap Initiative - Our Funding!

44 Links
http://twigger.hmgc.mcw.edu/ncbo/ Project webpage
http://gminer.mcw.edu Web application
http://github.com/mcwbbc/gminer Gminer Code
http://github.com/simont/MCW-RDF RDFizer code
simont@mcw.edu

