Developing Accessible Application Software for Individual de novo Genome Projects
Vince Forgetta, PhD Candidate
Ken Dewar, PhD, Supervisor
Department of Human Genetics, McGill University, Montreal, Quebec, Canada
December 8th, 2011
Next-Gen Gap
Bacterial genome in < 1 week for ~$3000 + (Genome Assembly) (Nature Methods 6, S2-S5 (2009))
“Unfortunately, the software and computer hardware demands on these analyses are not much less than those of the large Genome Centers. From this perspective, the gap between large-scale genome centers and individual investigators may seem to be growing, not shrinking, as the next-generation platforms’ apparent promise of a ‘Genome Center in a box’ may have only been half delivered, providing data without a full suite of tools.”
Download Data … Learn *NIX … Install Software and Dependencies … Run Software … Wait? … Problems?
Three Common Methodologies in de novo Genome Analysis
1. Display and analysis of genome annotations
2. Quality assessment of a genome assembly
3. Comparison and mining of genomic data from public repositories
One or more methodologies used to address needs in three specific projects; projects used as a vehicle to develop software:
Project | Software | Methodology
C. difficile 14 Genome Comparison | cgb | 1. Genome Display
Multi-centre WGS of O. novo-ulmi | ContiGo | 2. Assembly QA
E. fergusonii ECD-227 | BLAST in Pivot | 3. Data Mining
Assembly Quality Assessment
Assembly Analysis
Researchers should have easy access to their assembly to determine its quality and perform simple analyses.
[Diagram: Researcher, Sequencing Centre, DNA, Assembly]
Delays and limits on data access exist:
- Viewers must be installed and have specific software (e.g. Linux) or hardware (e.g. RAM) requirements.
- Assembly data (multiple GB) must be downloaded.
Objective
Develop a simple assembly viewer that operates within a web browser, allowing a researcher to rapidly analyze and access their data.
Performance
Parser/Converter:
- Multiple platforms (Windows/OS X/Linux).
- Multi-processor support.
- Low memory usage (< 250 MB of memory per processor).
User interface:
- Client-side programming decreases server load.
- Data is downloaded on demand, which suits users with limited bandwidth.
- Sole system requirement is a modern web browser (Firefox, Opera, Google Chrome), for ease of installation.
- Low memory usage (peaks at ~250 MB).
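The slides do not say how the viewer decides which data to download on demand. One common way to get that behaviour is to split each contig into fixed-size tiles and fetch only the tiles that overlap the region currently on screen. The sketch below illustrates that idea only; the tile size, the function names, and the cache structure are assumptions, not ContiGo's actual design.

```python
def tiles_for_window(start, end, tile_size=10_000):
    """Return indices of fixed-size tiles overlapping the half-open window [start, end).

    tile_size is a hypothetical value chosen for illustration.
    """
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    first = start // tile_size
    last = (end - 1) // tile_size  # end is exclusive
    return list(range(first, last + 1))


def tiles_to_fetch(start, end, cache, tile_size=10_000):
    """Return only the tiles that are not already held in the client-side cache."""
    return [t for t in tiles_for_window(start, end, tile_size) if t not in cache]


if __name__ == "__main__":
    cached = {0}  # tile 0 was downloaded by an earlier pan/zoom
    print(tiles_to_fetch(0, 25_000, cached))
```

With this scheme, panning or zooming downloads at most the tiles newly exposed by the viewport, which bounds both bandwidth and browser memory.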
The Interface
Table of contig/scaffold statistics:
- Sort/filter by column
- Access to contig sequence/quality and read sequences
- Assembly statistics, batch download of sequence and statistical data
Dynamic charts:
- Toggle axis value
- Identify points
- Summarize regions
Contig assembly:
- Pan/zoom
- Identify position, read names, mismatches
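As a rough illustration of the statistics such a table might report, the sketch below computes a few standard assembly summary values and a length-filtered, sorted contig list. The slides do not list which statistics ContiGo shows; N50 is included here as an assumption because it is a widely used assembly metric, and all function names are hypothetical.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


def assembly_summary(contigs):
    """Summarize a mapping of contig name -> sequence string."""
    lengths = [len(seq) for seq in contigs.values()]
    return {
        "contigs": len(lengths),
        "total_bases": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


def filter_and_sort(contigs, min_length=0):
    """Mimic a sortable/filterable table: keep contigs >= min_length, longest first."""
    rows = [(name, len(seq)) for name, seq in contigs.items() if len(seq) >= min_length]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    demo = {"c1": "A" * 6, "c2": "A" * 5, "c3": "A" * 4, "c4": "A" * 3, "c5": "A" * 2}
    print(assembly_summary(demo))
    print(filter_and_sort(demo, min_length=4))
```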
3. Data Mining
BLAST Pivot
Microsoft Research Summer Internship
Microsoft Biology Foundation, Redmond, Washington, USA
Mentor: Simon Mercer
blip.codeplex.com
E. coli ECD-227
E. coli ?????
Species? Function?
Antibiotic Resistant! Divergent Strain
Acknowledgement: Moussa Diarra, Heidi Rempel
blip.codeplex.com
Conclusions
- ContiGo: used by clients of the Genome Centre at McGill (release soon).
- BL!P: >500 downloads (blip.codeplex.com).
Acknowledgements
C. difficile: Ken Dewar, Andre Dascal, Matthew Oughton, Joana Dias, Gary Leveque, Pascale Marquis, Corina Nagy, Amelie Villeneuve, Ivan Brukner, Mark Miller, Vivian Loo, Mike Mulvey, Dale Gerding, Maya Rupnik, Elaine Mardis, V. Magrini, M. Hickenbotham, K. Haub, C. Markovic, J. Nelson
Ophiostoma novo-ulmi: Jan Kieleczawa, Michael Zianni, Robert Steen, Deborah Grove, Anoja Perera, Robert Lyons Jr., Sushmita Singh, Doug Bintzler, Scottie Adams, Deborah Grove, Gregory Grove, Robert Lyons Jr., Suzanne Genik, Chris Wright, Alvaro Hernandez, Sharon Bachman, Lorie Hetrick, Sushmita Singh, Nichole Peterson, Gary Leveque, Joana Dias, Clotilde Teiling, Tim Harkins
E. coli ECD-227: H. Rempel, Andrew Metcalfe, M. S. Diarra
BL!P/Microsoft: Simon Mercer, Xin-Yi Chua, Mauro Luigi Drago, Beatriz Diaz Acosta, Vivek Kumar, Bob Davidson, Mike Zyskowski, Xiaoji Chen, Bob Silverstein, Vikram Bapat, Jared Jackson, Wei Lu, The Pivot Team