MapView: visualization of short reads alignment on a desktop computer

Presentation transcript:

MapView: visualization of short reads alignment on a desktop computer
InCoB 2009
Hua Bao, Sun Yat-sen University
2009-09-09

Next-generation sequencing
  Sequencing by synthesis
  High throughput (tens of millions of reads per lane)
  Short read length (25-50 bp)
  Sequencing error rate is relatively higher than that of Sanger sequencing

Statement of the problem
  1. Alignment results (e.g., 50M reads):
       read1 TATCGCACATAGTTCGCG hhhhhhhllhhhh;hA - Chr1 126609
       read2 CATACGACACTCATGTAG h,abhhhh;hAhhda, + Chr2 94
  2. Reference genome (e.g., 500M bp):
       >Chr1
       CGATCGAGCGACAGACGAGCACACGTAGCACTGTGGGGGAA
  Goal: visualization of large-scale alignment data with super-high computational efficiency.
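The alignment lines on this slide can be read as whitespace-separated records. The following is a minimal Python sketch, assuming the six columns are read name, sequence, quality string, strand, reference name and position. The slide does not label the columns, so the field names and the `Alignment` type are illustrative assumptions.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """One alignment record; field names are assumed, not taken from the slide."""
    read_id: str
    sequence: str
    quality: str
    strand: str
    chrom: str
    position: int


def parse_alignment(line: str) -> Alignment:
    """Split one whitespace-separated alignment line into typed fields."""
    read_id, sequence, quality, strand, chrom, position = line.split()
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand: {strand!r}")
    return Alignment(read_id, sequence, quality, strand, chrom, int(position))


# The two example lines shown on the slide.
example_lines = [
    "read1 TATCGCACATAGTTCGCG hhhhhhhllhhhh;hA - Chr1 126609",
    "read2 CATACGACACTCATGTAG h,abhhhh;hAhhda, + Chr2 94",
]
records = [parse_alignment(line) for line in example_lines]
```

Sorting such records by `(chrom, position)` would give the ordered alignments that an index over reference positions relies on.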

Computational efficiency
  Memory usage: data compression, fractional loading
  CPU time: indexing, pre-computing
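As an illustration of the data compression point, one common way to shrink nucleotide sequences is to pack each base into 2 bits, so that four bases fit in one byte. The slides do not say which compression scheme MapView uses, so the sketch below is an assumed example of the general idea, and `pack` and `unpack` are hypothetical helper names.

```python
# Assumed 2-bit code for the four unambiguous bases; ambiguous bases such as
# "N" are not handled in this sketch.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    return "".join(
        BASES[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(length)
    )


read = "TATCGCACATAGTTCGCG"      # 18 bases, taken from the problem statement
packed = pack(read)              # 5 bytes instead of 18
restored = unpack(packed, len(read))
```

The original length has to be stored alongside the packed bytes, because the last byte may be only partly filled.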

File format design
  MapView format (MVF): Head, Data, Index, Statistics
  Head: basic information about the reference and reads; offsets of Data, Index and Statistics
  Data: compressed sequences; ordered alignments
  Index: the offset address of the data is indexed by reference position
  Statistics: coverage information for each reference site
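A four-section layout of this kind can be sketched with Python's `struct` module. The byte layout below (a 4-byte magic string followed by three 64-bit offsets) and the magic value `MVF0` are assumptions made for illustration. The slide names the sections but does not give the actual MVF encoding.

```python
import struct

# Hypothetical head layout: magic, then offsets of Data, Index and Statistics.
HEAD = struct.Struct("<4sQQQ")
MAGIC = b"MVF0"  # made-up marker, not the real MVF signature


def build_file(data: bytes, index: bytes, stats: bytes) -> bytes:
    """Concatenate the sections behind a head that records where each starts."""
    data_off = HEAD.size
    index_off = data_off + len(data)
    stats_off = index_off + len(index)
    head = HEAD.pack(MAGIC, data_off, index_off, stats_off)
    return head + data + index + stats


def read_sections(blob: bytes) -> dict:
    """Use the offsets stored in the head to slice out each section."""
    magic, data_off, index_off, stats_off = HEAD.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a file in the sketched layout")
    return {
        "data": blob[data_off:index_off],
        "index": blob[index_off:stats_off],
        "statistics": blob[stats_off:],
    }


blob = build_file(b"DATA", b"IDX", b"ST")
sections = read_sections(blob)
```

Because the head stores absolute offsets, a reader can jump straight to the Index or Statistics section without scanning the Data section first.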

Loading algorithms
  [Diagram: when the MapView window jumps to a different region, the genomic position is looked up using the Index to obtain an offset address, and only the Data at that offset is loaded from the MapView file.]
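The index lookup on this slide can be sketched as follows: genomic position to index entry, index entry to offset address, offset address to a seek into the data. The fixed-size record, the bin size of 1000 bp and the function names are all assumptions for illustration. The real MapView record and index layouts are not given in the slides.

```python
import io
import struct

RECORD = struct.Struct("<II")  # hypothetical record: start position, read length
BIN = 1000                     # hypothetical index granularity in bp


def write_data(alignments, ref_length):
    """Return (data bytes, index) for alignments sorted by start position.

    index[b] is the byte offset of the first record starting at or after b * BIN.
    """
    alignments = sorted(alignments)
    data = b"".join(RECORD.pack(pos, length) for pos, length in alignments)
    index = []
    rec = 0
    for b in range(ref_length // BIN + 1):
        while rec < len(alignments) and alignments[rec][0] < b * BIN:
            rec += 1
        index.append(rec * RECORD.size)
    return data, index


def load_window(f, index, start, end):
    """Read only the records whose start position lies in [start, end)."""
    f.seek(index[start // BIN])          # jump close to the requested region
    hits = []
    while True:
        raw = f.read(RECORD.size)
        if len(raw) < RECORD.size:       # end of data
            break
        pos, length = RECORD.unpack(raw)
        if pos >= end:                   # records are ordered, so stop early
            break
        if pos >= start:
            hits.append((pos, length))
    return hits


alignments = [(94, 18), (126609, 18), (2500, 44), (2999, 44), (3000, 44)]
data, index = write_data(alignments, ref_length=130000)
f = io.BytesIO(data)                     # stands in for the file on disk
window = load_window(f, index, 2000, 3000)
```

This simplified version only returns reads that start inside the window. A viewer would also need reads that begin slightly before the window and overlap it, for example by starting the scan one maximum read length earlier.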

Efficiency of MapView
Computational efficiency comparison

  Tool       Version  Memory usage  CPU time
  Consed     18.0     12.06 GB      208 s
  Hawkeye    2.0.8    14.14 GB      296 s
  EagleView  2.2       3.91 GB      207 s
  MapView    3.1       0.04 GB        2 s

The alignment data used for the assessment consist of a 43 million bp reference and 6 million Illumina 44-bp reads.
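To put the figures in this comparison on a common scale, the short Python sketch below divides each tool's memory usage and CPU time by MapView's. The numbers are copied from the slide. The `relative_to` helper is a hypothetical name introduced here.

```python
# (memory in GB, CPU time in seconds), as reported on the slide.
benchmarks = {
    "Consed": (12.06, 208),
    "Hawkeye": (14.14, 296),
    "EagleView": (3.91, 207),
    "MapView": (0.04, 2),
}


def relative_to(baseline, table):
    """Express every tool's memory and CPU time as a multiple of the baseline."""
    base_mem, base_cpu = table[baseline]
    return {
        tool: (mem / base_mem, cpu / base_cpu)
        for tool, (mem, cpu) in table.items()
    }


ratios = relative_to("MapView", benchmarks)
for tool, (mem_ratio, cpu_ratio) in ratios.items():
    print(f"{tool}: {mem_ratio:.1f}x memory, {cpu_ratio:.1f}x CPU time")
```

On these figures the other tools use roughly 100 to 350 times the memory and roughly 100 to 150 times the CPU time of MapView for the same data set.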

User-friendly Interface


Summary
  Super-high computational efficiency: visualization of hundreds of millions of reads with 40 MB of memory in 2 seconds.
  Feature-rich and user-friendly: compact alignment view for both single-end and paired-end short reads, with multiple navigation and zoom modes.

Thank you!