Developing Accessible Application Software for Individual de novo Genome Projects Vince Forgetta, PhD Candidate Ken Dewar PhD, Supervisor Department of.

Presentation transcript:

Developing Accessible Application Software for Individual de novo Genome Projects
Vince Forgetta, PhD Candidate
Ken Dewar, PhD, Supervisor
Department of Human Genetics, McGill University, Montreal, Quebec, Canada
December 8th, 2011

Next-Gen Gap
Bacterial genome in < 1 week for ~$3000 (Nature Methods 6, S2-S5 (2009))
(Genome Assembly)+
“Unfortunately, the software and computer hardware demands on these analyses are not much less than those of the large Genome Centers. From this perspective, the gap between large-scale genome centers and individual investigators may seem to be growing, not shrinking, as the next-generation platforms’ apparent promise of a ‘Genome Center in a box’ may have only been half delivered, providing data without a full suite of tools.”
- Download Data
- Learn *NIX
- Install Software and Dependencies
- Run Software … Wait? … Problems?

Three Common Methodologies in de novo Genome Analysis
1. Display and analysis of genome annotations
2. Quality assessment of a genome assembly
3. Comparison and mining of genomic data from public repositories

One or more methodologies used to address needs in three specific projects; projects used as a vehicle to develop software:

Project                              Software          Methodology
C. difficile 14 Genome Comparison    cgb               1. Genome Display
Multi-centre WGS of O. novo-ulmi     ContiGo           2. Assembly QA
E. fergusonii ECD-227                BLAST in Pivot    3. Data Mining

2. Assembly Quality Assessment

Assembly Analysis
Researchers should have easy access to their assembly to determine its quality and perform simple analyses.
(Diagram labels: Researcher, Sequencing Centre, DNA, Assembly)
Delays and limits on data access exist:
- Viewers must be installed and have specific software (e.g. Linux) or hardware (e.g. RAM) requirements.
- Assembly data (multiple GBs) must be downloaded.

Objective Develop a simple assembly viewer that operates within a web-browser, allowing a researcher to rapidly analyze and access their data.

Method
Parser/Converter: A Python program parses, analyzes, and converts assembly data into web-accessible formats (HTML, JSON, JPG images), which are stored on sequencing centre servers.
Interface: A browser-based interface (HTML) dynamically accesses the data on the servers (JavaScript). It incorporates pre-existing web technologies (jQuery, Seadragon Deepzoom AJAX).
Usage:
- After genome assembly, the parser/converter is run on the sequencing centre servers.
- The researcher accesses the interface over the internet using a modern web browser.
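As a rough illustration of the parser/converter step, the sketch below reads contigs in FASTA format and writes per-contig records as JSON that a browser interface could load. It is not ContiGo's code: the FASTA input, the function names (`read_fasta`, `contig_record`, `assembly_to_json`) and the JSON field names are all assumptions made for this example.

```python
import json
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Use the first word of the header as the contig name.
            name = header[0] if header else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def contig_record(name: str, seq: str) -> Dict[str, object]:
    """Summarize one contig as a JSON-friendly dictionary (hypothetical fields)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    gc_percent = round(100.0 * gc / len(seq), 2) if seq else 0.0
    return {"name": name, "length": len(seq), "gc_percent": gc_percent}


def assembly_to_json(lines: Iterable[str]) -> str:
    """Convert FASTA lines into a JSON document listing every contig."""
    records = [contig_record(name, seq) for name, seq in read_fasta(lines)]
    return json.dumps({"contigs": records}, indent=2)


example = [">contig1 len=8", "ACGT", "GGCC", ">contig2", "AATT"]
print(assembly_to_json(example))
```

In a setup like the one described on the slide, output of this kind would be written once on the server after assembly, so the browser only has to fetch small static files.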

Performance
Parser/Converter:
- Multiple platforms (Windows/OS X/Linux).
- Multi-processor support.
- Low memory usage (< 250 MB of memory per processor).
User interface:
- Client-side programming -> decreased server load.
- Data is downloaded on demand -> suits users with limited bandwidth.
- Sole system requirement is a modern web browser (Firefox, Opera, Google Chrome) -> ease of installation.
- Low memory usage (peaks at ~250 MB).
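One common way to support on-demand download is to store long sequences as fixed-size chunks and fetch only the chunks that overlap the region being viewed. The slides do not describe how ContiGo does this, so the sketch below is only an assumed scheme; `CHUNK_SIZE`, `split_into_chunks` and `chunks_for_region` are hypothetical names.

```python
from typing import List

CHUNK_SIZE = 10_000  # assumed number of bases per stored chunk


def split_into_chunks(seq: str, chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Cut a sequence into consecutive pieces of at most chunk_size bases."""
    return [seq[i:i + chunk_size] for i in range(0, len(seq), chunk_size)]


def chunks_for_region(start: int, end: int, chunk_size: int = CHUNK_SIZE) -> List[int]:
    """Return the chunk indices that overlap the 0-based half-open region [start, end)."""
    if start < 0 or end <= start:
        raise ValueError("region must satisfy 0 <= start < end")
    first = start // chunk_size
    last = (end - 1) // chunk_size
    return list(range(first, last + 1))


# A viewer showing bases 9,500 to 20,000 needs only three chunks,
# however long the contig is.
print(chunks_for_region(9_500, 20_001))
```

With this layout the amount of data transferred depends on the size of the visible window and not on the size of the assembly.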

The Interface
Table of contig/scaffold statistics:
- Sort/filter by column.
- Access to contig sequence/quality and read sequences.
Assembly statistics, batch download of sequence and statistical data.
Dynamic charts:
- Toggle axis value.
- Identify points.
- Summarize regions.
Contig assembly:
- Pan/zoom.
- Identify position, read names, mismatches.
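The table operations and summary statistics can be sketched in a few lines. The slides do not say which statistics ContiGo reports; N50 is used here only because it is a widely reported assembly statistic, and the row fields and function names are assumptions for illustration.

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def filter_and_sort(rows: List[Dict], column: str,
                    min_length: int = 0, descending: bool = True) -> List[Dict]:
    """Keep rows with length >= min_length, then sort them by the given column."""
    kept = [row for row in rows if row["length"] >= min_length]
    return sorted(kept, key=lambda row: row[column], reverse=descending)


rows = [
    {"name": "contig1", "length": 1200, "reads": 40},
    {"name": "contig2", "length": 300, "reads": 95},
    {"name": "contig3", "length": 5000, "reads": 10},
]
print(n50([row["length"] for row in rows]))
print([row["name"] for row in filter_and_sort(rows, "reads", min_length=500)])
```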

Demo

3. Data Mining

Microsoft Research Summer Internship
Microsoft Biology Foundation, Redmond, Washington, USA
Mentor: Simon Mercer
BLAST Pivot
blip.codeplex.com

BLAST
Query sequences: ACGTCACTGACTG ACTAGCTAGCTAG CTAGCATCGATCG ATCGATCGATCGA TCGACGTAACTAG CACGACTGACTCT
? -> Species, Function, …
(Diagram labels: NCBI, Local)

Limitation
(Diagram labels: ~5000 genes, E. coli, Scientist, Programmer)
BLAST text output for one hit:

>gi| |ref|ZP_ | TIM-barrel protein, nifR3 family [Escherichia coli MS 78-1]
Length=321
Score = bits (1503), Expect = E-165
Identities = 280/281 (100%), Positives = 280/281 (100%), Gaps = 0/281 (0%)
Frame = 0

Query  1    MMSSNPQVWESDKSRLRMVHIDEPGIRTVQIAGSDPKEMADAARINVESGAQIIDINMGC  60
            MMSSNPQVWESDKSRLRMVHIDEPGIRTVQIAGSDPKEMADAARINVESGAQIIDINMGC
Sbjct  41   MMSSNPQVWESDKSRLRMVHIDEPGIRTVQIAGSDPKEMADAARINVESGAQIIDINMGC  100

Query  61   PAKKVNRKLAGSALLQYPDVVKSILTEVVNAVDVPVTLKIRTGWAPEHRNCEEIAQLAED  120
            PAKKVNRKLAGSALLQYPDVVKSILTEVVN VDVPVTLKIRTGWAPEHRNCEEIAQLAED
Sbjct  101  PAKKVNRKLAGSALLQYPDVVKSILTEVVNTVDVPVTLKIRTGWAPEHRNCEEIAQLAED  160

Query  121  CGIQALTIHGRTRACLFNGEAEYDSIRAVKQKVSIPVIANGDITDPLKARAVLDYTGADA  180
            CGIQALTIHGRTRACLFNGEAEYDSIRAVKQKVSIPVIANGDITDPLKARAVLDYTGADA
Sbjct  161  CGIQALTIHGRTRACLFNGEAEYDSIRAVKQKVSIPVIANGDITDPLKARAVLDYTGADA  220

Query  181  LMIGRAAQGRPWIFREIQHYLDTGELLPPLPLAEVKRLLCAHVRELHDFYGPAKGYRIAR  240
            LMIGRAAQGRPWIFREIQHYLDTGELLPPLPLAEVKRLLCAHVRELHDFYGPAKGYRIAR
Sbjct  221  LMIGRAAQGRPWIFREIQHYLDTGELLPPLPLAEVKRLLCAHVRELHDFYGPAKGYRIAR  280

Query  241  KHVSWYLQEHAPNDQFRRTFNAIEDASEQLEALEAYFENFA  281
            KHVSWYLQEHAPNDQFRRTFNAIEDASEQLEALEAYFENFA
Sbjct  281  KHVSWYLQEHAPNDQFRRTFNAIEDASEQLEALEAYFENFA  321
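Reading thousands of reports like this by eye is impractical, which is where some programming normally comes in. The sketch below pulls the summary counts out of the statistics line of a single hit with a regular expression. It illustrates the kind of script a researcher would otherwise need; it is not part of BL!P, and the function names are made up for this example.

```python
import re
from typing import Dict, Tuple

# Matches fields such as "Identities = 280/281" in a BLAST text report.
STAT_PATTERN = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)")


def parse_alignment_stats(report: str) -> Dict[str, Tuple[int, int]]:
    """Return {field: (count, alignment_length)} for one alignment block."""
    stats = {}
    for label, count, total in STAT_PATTERN.findall(report):
        stats[label.lower()] = (int(count), int(total))
    return stats


def percent(count: int, total: int) -> float:
    """Exact percentage, avoiding the rounding used in the text report."""
    return 100.0 * count / total if total else 0.0


line = "Identities = 280/281 (100%), Positives = 280/281 (100%), Gaps = 0/281 (0%)"
stats = parse_alignment_stats(line)
identical, length = stats["identities"]
print(f"{identical}/{length} identical = {percent(identical, length):.2f}%")
```

The report shows 100% for 280/281, although the alignment contains one mismatch (A in the query, T in the subject). Working from the raw counts keeps that difference visible.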

BLAST in Pivot
(Diagram, steps 1 to 3: query sequences, BLAST, Pivot)
Query sequences: ACGTCACTGACTG ACTAGCTAGCTAG CTAGCATCGATCG ATCGATCGATCGA TCGACGTAACTAG CACGACTGACTCT

E. coli ECD-227
E. coli ?????
Species? Function?
Antibiotic Resistant! Divergent Strain
Acknowledgement: Moussa Diarra, Heidi Rempel

Demo

Conclusions
- ContiGo: used by clients of the Genome Centre at McGill (release soon).
- BL!P: >500 downloads (blip.codeplex.com).

Acknowledgements
C. difficile: Ken Dewar, Andre Dascal, Matthew Oughton, Joana Dias, Gary Leveque, Pascale Marquis, Corina Nagy, Amelie Villeneuve, Ivan Brukner, Mark Miller, Vivian Loo, Mike Mulvey, Dale Gerding, Maya Rupnik, Elaine Mardis, V. Magrini, M. Hickenbotham, K. Haub, C. Markovic, J. Nelson
Ophiostoma novo-ulmi: Jan Kieleczawa, Michael Zianni, Robert Steen, Deborah Grove, Anoja Perera, Robert Lyons Jr., Sushmita Singh, Doug Bintzler, Scottie Adams, Deborah Grove, Gregory Grove, Robert Lyons Jr., Suzanne Genik, Chris Wright, Alvaro Hernandez, Sharon Bachman, Lorie Hetrick, Sushmita Singh, Nichole Peterson, Gary Leveque, Joana Dias, Clotilde Teiling, Tim Harkins
E. coli ECD-227: H. Rempel, Andrew Metcalfe, M. S. Diarra
BL!P/Microsoft: Simon Mercer, Xin-Yi Chua, Mauro Luigi Drago, Beatriz Diaz Acosta, Vivek Kumar, Bob Davidson, Mike Zyskowski, Xiaoji Chen, Bob Silverstein, Vikram Bapat, Jared Jackson, Wei Lu, The Pivot Team