When next-generation sequencing becomes the now-generation (Lisa Zhang, November 6th, 2012)

Outline
1. Introduction: Illumina GA/HiSeq and Ion Torrent PGM
2. Data processing: the accuracy of assembly
3. Application: NGS in cancer genomics

Introduction: platforms compared are 454 GS FLX, HiSeq 2000, SOLiD v4 and Sanger 3730xl. SE: single-end reads; PE: paired-end reads.

Which system is more popular? Illumina HiSeq: instrument $690,000, cost/Mb $0.07. Ion Torrent PGM: instrument $68,000, cost/Mb $0.63 (318 chip).
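One way to read the two price points on this slide is as a break-even question: how much sequence must be produced before the expensive, cheap-per-Mb instrument becomes cheaper overall? The sketch below assumes a deliberately simplified model (total cost = instrument price + per-Mb cost × Mb sequenced) and ignores reagents beyond the per-Mb figure, service contracts and labour; the function names are hypothetical.

```python
# Hypothetical break-even estimate using only the two figures per system on the slide.
# Assumption: total cost = instrument price + cost_per_mb * megabases sequenced.

def total_cost(instrument_usd: float, cost_per_mb_usd: float, mb: float) -> float:
    """Instrument price plus running cost for `mb` megabases."""
    return instrument_usd + cost_per_mb_usd * mb


def break_even_mb(inst_a: float, per_mb_a: float, inst_b: float, per_mb_b: float) -> float:
    """Output (in Mb) at which both systems have the same total cost."""
    if per_mb_a == per_mb_b:
        raise ValueError("equal per-Mb costs: no break-even point")
    return (inst_a - inst_b) / (per_mb_b - per_mb_a)


high_capex = (690_000, 0.07)  # $690,000 instrument, $0.07 per Mb
low_capex = (68_000, 0.63)    # $68,000 instrument, $0.63 per Mb (318 chip)

mb = break_even_mb(*high_capex, *low_capex)
print(f"break-even at about {mb:,.0f} Mb")
```

Under this model the break-even point is roughly 1.1 million Mb (about 1.1 Tb): below it the lower-priced instrument costs less in total, above it the lower per-Mb cost wins.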

Ion Torrent PGM system: emulsion PCR (figure labels: mineral oil; DNA template; beads; polymerase + dNTPs; adapters P1 and P2; beads with amplified product; beads with no product).

Ion Torrent PGM system: The Ion Torrent PGM finished sequencing an isolate (TY2482) from the E. coli O104:H4 outbreak within 72 hours. Seven runs generated 79 Mb of sequence data.

Illumina GA/HiSeq system: Later, the same isolate was sequenced on the Illumina HiSeq to improve accuracy. Runs of three libraries generated 2.1 Gb of sequence data within 2 weeks.
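The data volumes on the two slides (79 Mb from seven PGM runs, 2.1 Gb from three HiSeq libraries) are easier to compare as average sequencing depth, i.e. total sequenced bases divided by genome length. The genome size below is an assumed round figure of 5.0 Mb for illustration only; it is not taken from the slides.

```python
def mean_coverage(total_bases: float, genome_size_bases: float) -> float:
    """Average depth: total sequenced bases divided by genome length."""
    return total_bases / genome_size_bases


# Assumption: an illustrative ~5.0 Mb bacterial genome (not a value from the slides).
GENOME_SIZE = 5.0e6

pgm_depth = mean_coverage(79e6, GENOME_SIZE)     # seven PGM runs, 79 Mb in total
hiseq_depth = mean_coverage(2.1e9, GENOME_SIZE)  # three HiSeq libraries, 2.1 Gb in total
print(f"PGM ~{pgm_depth:.1f}x, HiSeq ~{hiseq_depth:.0f}x")
```

With that assumed genome size the PGM data give about 16x and the HiSeq data about 420x average depth; the ratio between the two (roughly 27-fold) does not depend on the genome size chosen.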

Comparison of alignment between the two systems: A Rhodobacter sample with high GC content (66%) and a 4.2 Mb genome was sequenced on both the Ion Torrent and HiSeq 2000 sequencers. (a) Reads aligned with TMAP. (b) Reads aligned with SOAP2.
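The 66% figure is the genome's GC content: the fraction of G and C among all called bases. A minimal sketch of that calculation follows; treating ambiguous symbols such as N as excluded from the denominator is an assumption of this sketch, not something stated on the slide.

```python
def gc_content(seq: str) -> float:
    """Fraction of G + C among A/C/G/T bases; other symbols (e.g. N) are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


print(gc_content("GCGCGCAT"))  # 6 of 8 bases are G or C
```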

Comparison of sequencing quality for Illumina HiSeq 2000 and Ion Torrent PGM: per-base sequence quality of the samples, generated by FastQC. The yellow boxes show the spread of base-calling quality scores across all reads at each position. The blue line indicates the mean quality score. Q20 = 99% accuracy; Q30 = 99.9% accuracy…
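The Q20/Q30 figures follow from the Phred scale, where the probability that a base call is wrong is P = 10^(-Q/10), so Q20 means a 1% error rate and Q30 a 0.1% error rate. The sketch below converts Q to accuracy, decodes a FASTQ quality string (assuming the common Phred+33 encoding) and computes a per-position mean, which is similar in spirit to the blue line in the FastQC plot; the function names are illustrative, not FastQC's API.

```python
def phred_to_error(q: float) -> float:
    """Phred quality Q -> probability that the base call is wrong."""
    return 10 ** (-q / 10)


def phred_to_accuracy(q: float) -> float:
    return 1 - phred_to_error(q)


def decode_qualities(qual: str, offset: int = 33):
    """Decode a FASTQ quality string; assumes Phred+33 encoding."""
    return [ord(c) - offset for c in qual]


def mean_quality_per_position(qual_strings, offset: int = 33):
    """Mean Q at each read position across all reads (reads may differ in length)."""
    sums, counts = [], []
    for qual in qual_strings:
        for i, q in enumerate(decode_qualities(qual, offset)):
            if i == len(sums):  # first read long enough to reach this position
                sums.append(0)
                counts.append(0)
            sums[i] += q
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


print(phred_to_accuracy(20), phred_to_accuracy(30))
```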

Comparison of homopolymer error rates for Illumina HiSeq 2000 and Ion Torrent PGM: rates of insertions or deletions in homopolymer tracts, normalized by homopolymer length.
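Computing a homopolymer error rate starts with finding the homopolymer tracts (runs of a single repeated base) and grouping them by length. The sketch below does that with `itertools.groupby`. The slide does not give its exact normalisation, so `indel_rate_by_length` uses a hypothetical one: indels observed in tracts of length L divided by the number of reference bases lying in tracts of length L.

```python
from collections import Counter
from itertools import groupby


def homopolymer_runs(seq: str, min_len: int = 2):
    """Yield (start, base, length) for each run of one repeated base of length >= min_len."""
    pos = 0
    for base, group in groupby(seq.upper()):
        length = len(list(group))
        if length >= min_len:
            yield pos, base, length
        pos += length


def run_length_histogram(seq: str, min_len: int = 2) -> Counter:
    """Number of homopolymer tracts for each tract length."""
    return Counter(length for _, _, length in homopolymer_runs(seq, min_len))


def indel_rate_by_length(indel_counts: dict, seq: str, min_len: int = 2) -> dict:
    """Hypothetical normalisation: indels in tracts of length L / bases in such tracts."""
    hist = run_length_histogram(seq, min_len)
    return {L: indel_counts.get(L, 0) / (n * L) for L, n in sorted(hist.items())}


print(list(homopolymer_runs("ACCCGTTA")))
```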

NGS inspired the sequencing of new species: 13 years and $2.7 billion versus 5 months and $1.5 million.

Challenges in de novo genome assembly: How to assemble? It requires high-quality reads and a good assembler.

Framework of data processing:
1. Filter low-quality reads
2. Filter or trim reads containing adapters (optional)
3. Filter PCR duplicate reads
4. Remove contaminating reads (mitochondrion, etc.)
5. Split tandem-repeat reads (di- or tri-nucleotide) (optional)
6. Remove (or correct) reads with low-frequency k-mers
7. De novo assembly
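The first filtering steps of this framework can be chained as generators over (id, sequence, quality) records. This is a minimal sketch under several assumptions: a read passes if its mean Phred score (Phred+33 encoding) is at least 20, adapters are found by exact match, and PCR duplicates are approximated as reads with identical sequences (real pipelines usually compare mapping positions or both mates of a pair). The thresholds and function names are hypothetical, and the contamination, tandem-repeat and k-mer steps are not shown.

```python
def mean_phred(qual: str, offset: int = 33) -> float:
    return sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0


def filter_low_quality(reads, min_mean_q=20):
    """Step 1: drop reads whose mean quality is below min_mean_q."""
    for rid, seq, qual in reads:
        if mean_phred(qual) >= min_mean_q:
            yield rid, seq, qual


def trim_adapter(reads, adapter, min_len=20):
    """Step 2: cut each read at the first exact adapter match; drop reads left too short."""
    for rid, seq, qual in reads:
        cut = seq.find(adapter)
        if cut != -1:
            seq, qual = seq[:cut], qual[:cut]
        if len(seq) >= min_len:
            yield rid, seq, qual


def remove_exact_duplicates(reads):
    """Step 3 (approximation): keep only the first read with each exact sequence."""
    seen = set()
    for rid, seq, qual in reads:
        if seq not in seen:
            seen.add(seq)
            yield rid, seq, qual


def preprocess(reads, adapter, min_mean_q=20, min_len=20):
    reads = filter_low_quality(reads, min_mean_q)
    reads = trim_adapter(reads, adapter, min_len)
    return list(remove_exact_duplicates(reads))
```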

Effects of error correction: Correcting the reads reduces the number of contigs and scaffolds, increases contig sizes, and allows the assembler to include more reads (SOAPdenovo bee assembly). Figure: expected vs. observed error rates (y-axis) against k-mer occurrence.
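Error correction of this kind relies on the k-mer spectrum: a sequencing error typically creates k-mers that occur only once or a few times, while genuine k-mers recur at roughly the sequencing depth. The sketch below counts k-mers, builds the spectrum (occurrence versus number of distinct k-mers) and removes reads containing any k-mer below a threshold. It filters rather than corrects, and `k` and `min_count` are hypothetical parameters; SOAPdenovo's own corrector is not reproduced here.

```python
from collections import Counter


def kmers(seq: str, k: int):
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def count_kmers(reads, k: int) -> Counter:
    counts = Counter()
    for seq in reads:
        counts.update(kmers(seq, k))
    return counts


def kmer_spectrum(counts: Counter) -> Counter:
    """Occurrence -> number of distinct k-mers seen that many times."""
    return Counter(counts.values())


def drop_reads_with_weak_kmers(reads, k: int, min_count: int):
    """Keep reads whose k-mers all occur at least min_count times (filtering, not correction)."""
    counts = count_kmers(reads, k)
    return [s for s in reads if all(counts[km] >= min_count for km in kmers(s, k))]
```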

Assemblers evaluated (name, year launched: algorithm; features):
ABySS, 2009: de Bruijn graph; small memory required.
ALLPATHS-LG, 2011: de Bruijn graph; supports jumping libraries.
Bambus2, 2011: repeat detection; designed for polymorphic and metagenomic scaffolding.
CABOG, 2008: Overlap-Layout-Consensus; designed for 454 data.
MSR-CA, 2011: de Bruijn graph and Overlap-Layout-Consensus; reduces the number of reads by grouping.
SGA, 2012: Burrows-Wheeler Transform; small memory required; mix of short and long reads.
SOAPdenovo, 2010: de Bruijn graph; designed for short reads; supports large k-mer values.
Velvet, 2008: de Bruijn graph; mix of short and long reads.

Illustration of assembly based on de Bruijn graphs (panels a to e): This method of construction ensures that connected nodes overlap by K-1 nucleotides. The same procedure can be repeated to construct graphs for genomes of any size, including large ones.
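A toy version of the construction described above: every k-mer in the reads becomes a node, and node a is connected to node b when the last k-1 bases of a equal the first k-1 bases of b, so connected nodes overlap by k-1 nucleotides. Contigs are then read off as maximal non-branching paths (unitigs). This is a simplified sketch: it ignores reverse complements, coverage and error pruning, and skips isolated cycles; it is not the algorithm of any particular assembler in the table.

```python
def build_graph(reads, k: int):
    """Nodes are k-mers; edges join k-mers whose (k-1)-base suffix/prefix match."""
    kmer_set = {s[i:i + k] for s in reads for i in range(len(s) - k + 1)}
    succ = {km: [km[1:] + b for b in "ACGT" if km[1:] + b in kmer_set] for km in kmer_set}
    pred = {km: [b + km[:-1] for b in "ACGT" if b + km[:-1] in kmer_set] for km in kmer_set}
    return succ, pred


def unitigs(succ, pred):
    """Spell out maximal non-branching paths; pure cycles are skipped for brevity."""
    contigs = []
    for km in sorted(succ):
        # km is interior if it is the only successor of its only predecessor
        if len(pred[km]) == 1 and len(succ[pred[km][0]]) == 1:
            continue
        path, cur = km, km
        # extend while the path cannot branch in either direction
        while len(succ[cur]) == 1 and len(pred[succ[cur][0]]) == 1:
            cur = succ[cur][0]
            path += cur[-1]  # each step adds one new base (k-1 overlap)
        contigs.append(path)
    return contigs


print(unitigs(*build_graph(["ACGTTGCA"], 3)))
```

For the single read "ACGTTGCA" with k = 3 the graph is one unbranched path, so the read is recovered as one contig; when two reads share a k-mer but diverge around it, the graph branches there and the contigs break at the branch point.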

Illustration of mis-assemblies: a rearrangement-style mis-assembly. (a) A three-copy repeat R with interspersed unique sequences B and C, shown with properly sized and oriented mate pairs. (b) The mis-assembled repeat, shown with mis-oriented and expanded mate pairs. The mis-assembly is caused by co-assembling reads from different copies of the repeat.
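The figure relies on mate pairs as evidence: when both mates align to an assembly, their orientation and separation should match the library, so mis-oriented or expanded pairs flag a likely mis-assembly. The sketch below classifies one pair at a time. It assumes both mates lie on the same contig, have the same read length and use 0-based leftmost coordinates; the default inward-facing ("+", "-") orientation and the 3-standard-deviation cut-off are assumptions, since long-insert mate-pair libraries often face outward.

```python
from collections import Counter


def classify_pair(pos1, strand1, pos2, strand2, read_len, mean_insert, sd_insert,
                  n_sd=3.0, expected=("+", "-")):
    """Label a mate pair as concordant, expanded, compressed or mis-oriented."""
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pos1, strand1), (pos2, strand2)])
    if (left_strand, right_strand) != expected:
        return "mis-oriented"
    insert = right_pos + read_len - left_pos  # outer distance spanned by the pair
    if insert > mean_insert + n_sd * sd_insert:
        return "expanded"
    if insert < mean_insert - n_sd * sd_insert:
        return "compressed"
    return "concordant"


def summarise(pairs, **params) -> Counter:
    """Tally labels over (pos1, strand1, pos2, strand2) tuples."""
    return Counter(classify_pair(*p, **params) for p in pairs)
```

A cluster of expanded or mis-oriented pairs over the same region of a contig, as in panel (b), is the pattern that points to co-assembled repeat copies.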

A dot-plot comparison of SOAPdenovo and Velvet scaffolds: the finished reference chromosomes are plotted on the x-axis and the assembly scaffolds on the y-axis. Error types marked: inversion, relocation, insertion, deletion.
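A dot plot places a point wherever a stretch of the reference (x-axis) matches a stretch of the scaffold (y-axis): a correct assembly gives a diagonal line, while inversions appear as anti-diagonal segments and relocations as displaced segments. The naive sketch below uses shared exact k-mers as the matches and records forward and reverse-complement hits separately; real comparison tools use longer anchors and chaining, and `k` is a hypothetical parameter.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def kmer_index(seq: str, k: int):
    """Map each k-mer to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def dot_plot_points(reference: str, scaffold: str, k: int):
    """Return (ref_pos, scaffold_pos) points for forward and reverse-complement k-mer matches."""
    ref_index = kmer_index(reference, k)
    forward, reverse = [], []
    for j in range(len(scaffold) - k + 1):
        km = scaffold[j:j + k]
        forward += [(i, j) for i in ref_index.get(km, [])]
        reverse += [(i, j) for i in ref_index.get(revcomp(km), [])]
    return forward, reverse
```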

Performance of 8 assemblers on the R. sphaeroides genome (~4.60 Mb). The best value for each column is shown in bold. Errors = number of misjoins + indel errors > 5 bp. Corrected N50 values were computed after breaking contigs and scaffolds at each error. SNPs: single nucleotide polymorphisms; Indels: insertions and deletions; Inv: inversions; Reloc: relocations.
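N50 is the length L such that contigs (or scaffolds) of length L or longer contain at least half of all assembled bases; the corrected N50 repeats the calculation after splitting each contig at every error. A minimal sketch of both, assuming errors are given as coordinates within each contig (the input format is hypothetical):

```python
def n50(lengths) -> int:
    """Largest L such that pieces of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def break_at_errors(contig_length: int, error_positions):
    """Split one contig at each error coordinate and return the piece lengths."""
    pieces, start = [], 0
    for pos in sorted(set(error_positions)):
        if 0 < pos < contig_length:
            pieces.append(pos - start)
            start = pos
    pieces.append(contig_length - start)
    return pieces


def corrected_n50(contigs) -> int:
    """contigs: (length, [error positions]) pairs."""
    return n50([p for length, errs in contigs for p in break_at_errors(length, errs)])


print(n50([100, 80, 60, 40, 20]))
```

In the usage line the lengths sum to 300; the two longest contigs already hold 180 bases (more than half), so N50 is 80. One error in a long contig can pull the corrected N50 well below the uncorrected value, which is why the table reports both.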

Summary of assembler performance: average contig (a) and scaffold (b) sizes, measured by N50, vs. error rates, averaged over all three genomes (S. aureus, R. sphaeroides, and human chromosome 14). In both plots, the best assemblers appear in the upper right. A further comparison shows insertion and deletion errors among all eight assemblers for human chromosome 14; indel errors > 5 bp were counted.

BRCA mutations associated with a risk of breast cancer: BRCA sequences from healthy individuals and from patients are mapped to the reference sequence (Ref), and mutations are identified from the mapping.
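The mapping step above can be pictured as a pileup: reads are placed on the reference, the bases covering each position are counted, and a position is reported when enough reads support a non-reference base. The sketch assumes reads are already aligned without gaps (given as offset and sequence), so it only finds substitutions, not indels; `min_depth` and `min_alt_frac` are hypothetical thresholds rather than those of any clinical pipeline.

```python
from collections import Counter, defaultdict


def call_substitutions(reference, aligned_reads, min_depth=10, min_alt_frac=0.2):
    """aligned_reads: (0-based offset, sequence) pairs aligned to the reference without gaps.
    Returns (1-based position, ref base, alt base, alt fraction) tuples."""
    pileup = defaultdict(Counter)
    for offset, seq in aligned_reads:
        for i, base in enumerate(seq):
            pos = offset + i
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1
    calls = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        ref_base = reference[pos]
        # most frequent non-reference base at this position, if any
        alt, alt_n = max(((b, n) for b, n in counts.items() if b != ref_base),
                         key=lambda item: item[1], default=(None, 0))
        if depth >= min_depth and alt is not None and alt_n / depth >= min_alt_frac:
            calls.append((pos + 1, ref_base, alt, round(alt_n / depth, 3)))
    return calls
```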

Distribution of variants across the complete BRCA genes: the number of variants detected in exons, introns, and untranslated regions of BRCA1 (a) and BRCA2 (b).

Accuracy of variant detection in NGS data: discrepant nucleotide substitutions between NGS and Sanger sequencing (panels a to d). The variant position is indicated by an arrow, and the corresponding sample, gene, and nucleotide change are given above each chromatogram.
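Checking NGS calls against Sanger sequencing amounts to comparing two sets of variant calls: calls found by both methods are concordant, and the remainder are the discrepancies examined in the chromatograms. A minimal sketch, assuming each call is represented as a hypothetical (sample, gene, position, ref, alt) tuple:

```python
def compare_calls(ngs_calls, sanger_calls):
    """Split variant calls into concordant, NGS-only and Sanger-only sets."""
    ngs, sanger = set(ngs_calls), set(sanger_calls)
    return {
        "concordant": ngs & sanger,
        "ngs_only": ngs - sanger,
        "sanger_only": sanger - ngs,
    }


def concordance_rate(result) -> float:
    """Fraction of all distinct calls that both methods agree on."""
    total = sum(len(calls) for calls in result.values())
    return len(result["concordant"]) / total if total else 1.0
```

Matching on exact tuples is a simplification: in practice the same indel can be written at different positions, so calls are usually normalised before they are compared.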

Conclusion: Next-generation sequencing is becoming the now-generation. It has shifted the field from simple sequencing to genome-wide analyses. Its broad use in sequencing mRNA and non-coding RNA, as well as in profiling DNA methylation, will help reveal the rules of genomic regulation and help diagnose genetic diseases.

Illustration of mis-assemblies: an inversion-style mis-assembly. (a) A two-copy inverted repeat R flanking the unique sequence B, shown with properly sized and oriented mate pairs. (b) The mis-assembled repeat, shown with mis-oriented mate pairs.

Identification of indels in BRCA genes by Sanger sequencing and NGS: (a) a G deletion in BRCA1; (b) an A insertion in BRCA2; (c) a TG deletion in BRCA2.

Application of NGS in cancer genomics: analysis with DAVID of 430 genes mutated across seven cancer genomes, showing the number of mutated genes per GO term of gene function.