Developing Accessible Application Software for Individual de novo Genome Projects


1 Developing Accessible Application Software for Individual de novo Genome Projects
Vince Forgetta, PhD Candidate; Ken Dewar, PhD, Supervisor
Department of Human Genetics, McGill University, Montreal, Quebec, Canada
December 8th, 2011

2 Next-Gen Gap
Bacterial genome in < 1 week for ~$3000 (Nature Methods 6, S2–S5 (2009)) + genome assembly:
“Unfortunately, the software and computer hardware demands on these analyses are not much less than those of the large Genome Centers. From this perspective, the gap between large-scale genome centers and individual investigators may seem to be growing, not shrinking, as the next-generation platforms’ apparent promise of a ‘Genome Center in a box’ may have only been half delivered, providing data without a full suite of tools.”
Download data → Learn *NIX → Install software and dependencies → Run software → Wait? → Problems?

3 Three Common Methodologies in de novo Genome Analysis
1. Display and analysis of genome annotations
2. Quality assessment of a genome assembly
3. Comparison and mining of genomic data from public repositories
One or more methodologies were used to address needs in three specific projects; the projects served as a vehicle to develop software:
Project | Software | Methodology
C. difficile 14-genome comparison | cgb | 1. Genome display
Multi-centre WGS of O. novo-ulmi | ContiGo | 2. Assembly QA
E. fergusonii ECD-227 | BLAST in Pivot | 3. Data mining

4 Assembly Quality Assessment

5 Assembly Analysis
Researchers should have easy access to their assemblies to determine quality and perform simple analyses.
[Diagram: Researcher, DNA, Sequencing Centre, Assembly]
Delays and limits on data access exist:
- Viewers need to be installed and have specific software (e.g. Linux) or hardware (e.g. RAM) requirements.
- Assembly data (multiple GB) must be downloaded.

6 Objective
Develop a simple assembly viewer that operates within a web browser, allowing researchers to rapidly access and analyze their data.

7 Method
Parser/Converter: Python is used to parse, analyze, and convert assembly data into web-accessible formats (HTML, JSON, JPG images), which are stored on sequencing centre servers.
Interface: A browser-based interface (HTML) dynamically accesses the data on the servers (JavaScript). It incorporates pre-existing web technologies (jQuery, Seadragon Deep Zoom, AJAX).
Usage:
- After genome assembly, the parser/converter is run on the sequencing centre servers.
- The researcher accesses the interface over the internet using a modern web browser.
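
The parser/converter step can be illustrated with a small Python sketch. It assumes the assembly is available as a FASTA file of contigs (the slides do not name the input formats, which may well differ), and the names `read_fasta`, `contig_stats`, and `convert` are hypothetical, not ContiGo's code. It streams one contig at a time and emits JSON that a browser-side script could load.

```python
import io
import json
from typing import Dict, Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) one record at a time, so memory use is bounded
    by the largest contig rather than the whole assembly."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].strip()
            # Keep only the first word of the header as the contig name.
            name, chunks = (header.split()[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def contig_stats(name: str, seq: str) -> Dict[str, object]:
    """Per-contig values for a contig statistics table."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return {
        "name": name,
        "length": len(seq),
        "gc_percent": round(100.0 * gc / len(seq), 2) if seq else 0.0,
    }


def convert(handle: TextIO) -> str:
    """Convert a FASTA stream into a JSON document for the web interface."""
    rows: List[Dict[str, object]] = [contig_stats(n, s) for n, s in read_fasta(handle)]
    return json.dumps({"contigs": rows}, indent=2)


if __name__ == "__main__":
    demo = io.StringIO(">contig1 len=8\nACGTGGCC\n>contig2\nATAT\nAT\n")
    print(convert(demo))
```

Computing the statistics once on the server means the browser only has to fetch small summaries instead of the multi-GB assembly.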

8 Performance
Parser/Converter:
– Multiple platforms (Windows/OS X/Linux)
– Multi-processor support
– Low memory usage (< 250 MB per processor)
User interface:
– Client-side programming → decreased server load
– Data is downloaded on demand → suits users with limited bandwidth
– Sole system requirement is a modern web browser (Firefox, Opera, Google Chrome) → ease of installation
– Low memory usage (peaks at ~250 MB)
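
On-demand download is what Deep Zoom-style image pyramids (the Seadragon Deep Zoom technology named in slide 7) provide: an image is stored as tiles at successively halved resolutions, and the viewer fetches only the tiles covering the current view at the current zoom. The Python sketch below shows that arithmetic. It follows the Deep Zoom convention that level 0 is 1x1 and the full-resolution level is ceil(log2(max(width, height))), assumes a tile size of 254 px, and ignores tile overlap; the function names are hypothetical and this is not ContiGo's implementation.

```python
from typing import List, Tuple

Viewport = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in full-resolution pixels; x1, y1 exclusive


def max_level(width: int, height: int) -> int:
    """Full-resolution level index, i.e. ceil(log2(max(width, height)))."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without floating point.
    return (max(width, height) - 1).bit_length()


def level_size(width: int, height: int, level: int) -> Tuple[int, int]:
    """Image size at a level: each step down halves both sides, rounding up."""
    scale = 2 ** (max_level(width, height) - level)
    return -(-width // scale), -(-height // scale)


def visible_tiles(width: int, height: int, level: int, view: Viewport,
                  tile_size: int = 254) -> List[Tuple[int, int]]:
    """(column, row) of every tile at `level` that intersects `view`."""
    scale = 2 ** (max_level(width, height) - level)
    w, h = level_size(width, height, level)
    x0, y0, x1, y1 = view
    last_col, last_row = (w - 1) // tile_size, (h - 1) // tile_size
    cols = range(min(x0 // scale // tile_size, last_col),
                 min((x1 - 1) // scale // tile_size, last_col) + 1)
    rows = range(min(y0 // scale // tile_size, last_row),
                 min((y1 - 1) // scale // tile_size, last_row) + 1)
    return [(c, r) for r in rows for c in cols]


if __name__ == "__main__":
    # A hypothetical 1000 x 500 px rendering of a contig assembly.
    print(max_level(1000, 500))                             # 10
    print(level_size(1000, 500, 9))                         # (500, 250)
    print(visible_tiles(1000, 500, 10, (0, 0, 300, 300)))   # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

At full resolution the 300x300 px view needs 4 of the image's 8 tiles, and fully zoomed out the whole image is a single tile, so the data transferred scales with the view rather than with the image.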

9 The Interface
Table of contig/scaffold statistics:
- Sortable and filterable by column
- Access to contig sequence/quality and read sequences
Assembly statistics; batch download of sequence and statistical data.
Dynamic charts:
- Toggle axis values
- Identify points
- Summarize regions
Contig assembly:
- Pan/zoom
- Identify position, read names, and mismatches
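
The assembly statistics and the sortable/filterable contig table can be sketched in Python as follows. N50 is used as a representative assembly statistic (the slide does not list which statistics are reported), and the row dictionaries and function names are hypothetical. Slide 8 suggests the real sorting runs client-side in the browser; Python is used here only to keep the examples in one language.

```python
from typing import Any, Dict, List


def n50(lengths: List[int]) -> int:
    """Largest length L such that contigs of length >= L cover at least
    half of the total assembly length (0 for an empty assembly)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def assembly_summary(lengths: List[int]) -> Dict[str, int]:
    """Headline numbers for an assembly statistics panel."""
    return {
        "contigs": len(lengths),
        "total_length": sum(lengths),
        "largest": max(lengths, default=0),
        "n50": n50(lengths),
    }


def filter_and_sort(rows: List[Dict[str, Any]], column: str, min_value: float,
                    descending: bool = True) -> List[Dict[str, Any]]:
    """Keep rows whose `column` is at least `min_value`, sorted by that column,
    like filtering and then clicking a column header in the contig table."""
    kept = [row for row in rows if row[column] >= min_value]
    return sorted(kept, key=lambda row: row[column], reverse=descending)


if __name__ == "__main__":
    print(assembly_summary([100, 200, 300, 400, 500]))
    table = [{"name": "c1", "length": 100}, {"name": "c2", "length": 500},
             {"name": "c3", "length": 300}]
    print([row["name"] for row in filter_and_sort(table, "length", 200)])
```

For the lengths 100 to 500 the total is 1500; the two largest contigs (500 + 400) are the first to reach half of it, so N50 is 400.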

10 Demo

11 3. Data Mining

12 Microsoft Research Summer Internship
Microsoft Biology Foundation, Redmond, Washington, USA
Mentor: Simon Mercer
BLAST, Pivot (blip.codeplex.com)

13 BLAST
[Diagram: query DNA sequences (ACGTCACTGACTG…) searched with BLAST against NCBI or local databases: species? function? …]
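
A minimal Python sketch of summarizing BLAST results, assuming the searches were run with BLAST+ and its tabular output (`-outfmt 6`, default 12 columns). The slides do not say which output format BL!P consumes, and `best_hits` is a hypothetical helper. It keeps the highest-bitscore hit for each query sequence.

```python
import io
from typing import Dict, TextIO

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle: TextIO) -> Dict[str, Dict[str, str]]:
    """Return the highest-bitscore hit for each query sequence."""
    best: Dict[str, Dict[str, str]] = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):  # skip blanks and -outfmt 7 comments
            continue
        hit = dict(zip(FIELDS, line.split("\t")))
        current = best.get(hit["qseqid"])
        if current is None or float(hit["bitscore"]) > float(current["bitscore"]):
            best[hit["qseqid"]] = hit
    return best


if __name__ == "__main__":
    # Hypothetical hits for two query genes.
    table = io.StringIO(
        "gene1\tsubjA\t99.50\t200\t1\t0\t1\t200\t1\t200\t1e-100\t380\n"
        "gene1\tsubjB\t80.00\t200\t40\t0\t1\t200\t1\t200\t1e-50\t200\n"
        "gene2\tsubjC\t90.00\t100\t10\t0\t1\t100\t1\t100\t1e-30\t150\n"
    )
    for query, hit in best_hits(table).items():
        print(query, hit["sseqid"], hit["evalue"])
```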

14 Limitation
[Diagram: ~5000 genes (E. coli); Scientist + Programmer]
>gi|301326298|ref|ZP_07219671.1| TIM-barrel protein, nifR3 family [Escherichia coli MS 78-1]
Length=321
Score = 583.563 bits (1503), Expect = 8.65371E-165
Identities = 280/281 (100%), Positives = 280/281 (100%), Gaps = 0/281 (0%)
Frame = 0
Query  1    MMSSNPQVWESDKSRLRMVHIDEPGIRTVQIAGSDPKEMADAARINVESGAQIIDINMGC  60
            MMSSNPQVWESDKSRLRMVHIDEPGIRTVQIAGSDPKEMADAARINVESGAQIIDINMGC
Sbjct  41   MMSSNPQVWESDKSRLRMVHIDEPGIRTVQIAGSDPKEMADAARINVESGAQIIDINMGC  100
Query  61   PAKKVNRKLAGSALLQYPDVVKSILTEVVNAVDVPVTLKIRTGWAPEHRNCEEIAQLAED  120
            PAKKVNRKLAGSALLQYPDVVKSILTEVVN VDVPVTLKIRTGWAPEHRNCEEIAQLAED
Sbjct  101  PAKKVNRKLAGSALLQYPDVVKSILTEVVNTVDVPVTLKIRTGWAPEHRNCEEIAQLAED  160
Query  121  CGIQALTIHGRTRACLFNGEAEYDSIRAVKQKVSIPVIANGDITDPLKARAVLDYTGADA  180
            CGIQALTIHGRTRACLFNGEAEYDSIRAVKQKVSIPVIANGDITDPLKARAVLDYTGADA
Sbjct  161  CGIQALTIHGRTRACLFNGEAEYDSIRAVKQKVSIPVIANGDITDPLKARAVLDYTGADA  220
Query  181  LMIGRAAQGRPWIFREIQHYLDTGELLPPLPLAEVKRLLCAHVRELHDFYGPAKGYRIAR  240
            LMIGRAAQGRPWIFREIQHYLDTGELLPPLPLAEVKRLLCAHVRELHDFYGPAKGYRIAR
Sbjct  221  LMIGRAAQGRPWIFREIQHYLDTGELLPPLPLAEVKRLLCAHVRELHDFYGPAKGYRIAR  280
Query  241  KHVSWYLQEHAPNDQFRRTFNAIEDASEQLEALEAYFENFA  281
            KHVSWYLQEHAPNDQFRRTFNAIEDASEQLEALEAYFENFA
Sbjct  281  KHVSWYLQEHAPNDQFRRTFNAIEDASEQLEALEAYFENFA  321
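
Reading ~5000 reports like this by hand does not scale, and extracting their numbers automatically calls for a programmer. As an illustration of that programming step, here is a Python sketch that pulls the score, E-value, and identity out of a hit block formatted like the one above. The regular expressions are written against this sample only; they are assumptions, not a general BLAST report parser, and `hit_summary` is a hypothetical name.

```python
import re
from typing import Dict, Union

# Patterns written for hit blocks laid out like the sample on this slide.
SCORE_RE = re.compile(
    r"Score\s*=\s*([\d.]+)\s*bits\s*\((\d+)\),\s*Expect\s*=\s*([0-9.eE+-]+)")
IDENT_RE = re.compile(r"Identities\s*=\s*(\d+)/(\d+)")


def hit_summary(block: str) -> Dict[str, Union[float, int]]:
    """Extract bit score, raw score, E-value and percent identity from one hit."""
    score = SCORE_RE.search(block)
    ident = IDENT_RE.search(block)
    if score is None or ident is None:
        raise ValueError("no Score/Identities lines found")
    matches, aligned = int(ident.group(1)), int(ident.group(2))
    return {
        "bits": float(score.group(1)),
        "raw_score": int(score.group(2)),
        "evalue": float(score.group(3)),
        "identity_percent": round(100.0 * matches / aligned, 1),
    }


if __name__ == "__main__":
    sample = ("Score = 583.563 bits (1503), Expect = 8.65371E-165\n"
              "Identities = 280/281 (100%), Positives = 280/281 (100%), Gaps = 0/281 (0%)")
    print(hit_summary(sample))
```

Computing 280/281 directly gives 99.6%, whereas the report displays the rounded value (100%).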

15 BLAST in Pivot
[Diagram, steps 1 to 3: query DNA sequences (ACGTCACTGACTG…), BLAST, Pivot]
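
Pivot organizes a collection by facets (for example species or function) so the user can filter and group items visually. Below is a rough Python sketch of deriving such facet counts from BLAST hit descriptions, assuming NCBI-style deflines that end with the organism in square brackets, as in the E. coli hit on slide 14. It does not produce an actual Pivot collection file, and the helper names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# "function [organism]" at the end of an NCBI-style description line.
DEFLINE_RE = re.compile(r"^(?P<function>.*?)\s*\[(?P<organism>[^\]]+)\]\s*$")


def split_defline(description: str) -> Tuple[str, str]:
    """Split a hit description into (function, organism)."""
    match = DEFLINE_RE.match(description)
    if match is None:
        return description.strip(), "unknown"
    return match.group("function"), match.group("organism")


def facet_counts(descriptions: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count hits per species (first two words of the organism) and per function."""
    species, functions = Counter(), Counter()
    for description in descriptions:
        function, organism = split_defline(description)
        species[" ".join(organism.split()[:2])] += 1
        functions[function] += 1
    return species, functions


if __name__ == "__main__":
    hits = [
        "TIM-barrel protein, nifR3 family [Escherichia coli MS 78-1]",
        "hypothetical protein [Escherichia coli K-12]",   # made-up example
        "hypothetical protein [Escherichia fergusonii]",  # made-up example
    ]
    species, functions = facet_counts(hits)
    print(species.most_common())
    print(functions.most_common())
```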

16 E. coli ECD-227
E. coli ECD-227: E. coli? Species? Function? Antibiotic resistant! Divergent strain.
Acknowledgement: Moussa Diarra, Heidi Rempel

17 Demo

18 Conclusions
- ContiGo: used by clients of the Genome Centre at McGill (release soon).
- BL!P: >500 downloads (blip.codeplex.com).

19 Acknowledgements
C. difficile: Ken Dewar, Andre Dascal, Matthew Oughton, Joana Dias, Gary Leveque, Pascale Marquis, Corina Nagy, Amelie Villeneuve, Ivan Brukner, Mark Miller, Vivian Loo, Mike Mulvey, Dale Gerding, Maya Rupnik, Elaine Mardis, V. Magrini, M. Hickenbotham, K. Haub, C. Markovic, J. Nelson
Ophiostoma novo-ulmi: Jan Kieleczawa, Michael Zianni, Robert Steen, Deborah Grove, Anoja Perera, Robert Lyons Jr., Sushmita Singh, Doug Bintzler, Scottie Adams, Deborah Grove, Gregory Grove, Robert Lyons Jr., Suzanne Genik, Chris Wright, Alvaro Hernandez, Sharon Bachman, Lorie Hetrick, Sushmita Singh, Nichole Peterson, Gary Leveque, Joana Dias, Clotilde Teiling, Tim Harkins
E. coli ECD-227: H. Rempel, Andrew Metcalfe, M. S. Diarra
BL!P/Microsoft: Simon Mercer, Xin-Yi Chua, Mauro Luigi Drago, Beatriz Diaz Acosta, Vivek Kumar, Bob Davidson, Mike Zyskowski, Xiaoji Chen, Bob Silverstein, Vikram Bapat, Jared Jackson, Wei Lu, The Pivot Team

