High Throughput Sequencing

Slides:



Advertisements
Similar presentations
Next-Generation Sequencing: Methodology and Application
Advertisements

RNA-seq library prep introduction
The Past, Present, and Future of DNA Sequencing
An Introduction to Studying Expression Data Through RNA-seq
Vanderbilt Center for Quantitative Sciences Summer Institute Sequencing Analysis Yan Guo.
Transcriptomics Jiri Zavadil, PhD Molecular Mechanisms and Biomarkers
Reference mapping and variant detection Peter Tsai Bioinformatics Institute, University of Auckland.
Next–generation DNA sequencing technologies – theory & practice
High Throughput Sequencing
SOLiD Sequencing & Data
Peter Tsai Bioinformatics Institute, University of Auckland
Next-generation sequencing
The 454 and Ion PGM at the Genomics Core Facility Dr. Deborah Grove, Director for Genetic Analysis Genomics Core Facility Huck Institutes of the Life Sciences.
Data Analysis for High-Throughput Sequencing
RNA-Seq An alternative to microarray. Steps Grow cells or isolate tissue (brain, liver, muscle) Isolate total RNA Isolate mRNA from total RNA (poly.
Greg Phillips Veterinary Microbiology
The Challenges Of Sequencing FFPE DNA Using NGS
The SOLiD System: Next-Generation Sequencing Overview of the SOLiD System –  Scalable  Accurate Ultra High Throughput  Flexible  Mate Pairs.
Introduction to Genetics. Chromosomes Chromosomes are made up of DNA wrapped around proteins. Each chromosome codes for several genes. Each Gene codes.
RNA-Seq An alternative to microarray. Steps Grow cells or isolate tissue (brain, liver, muscle) Isolate total RNA Isolate mRNA from total RNA (poly.
Whole Exome Sequencing for Variant Discovery and Prioritisation
Analyzing your clone 1) FISH 2) “Restriction mapping” 3) Southern analysis : DNA 4) Northern analysis: RNA tells size tells which tissues or conditions.
Bioinformatics Core Facility Ernesto Lowy February 2012.
Expression Analysis of RNA-seq Data
Mapping protein-DNA interactions by ChIP-seq Zsolt Szilagyi Institute of Biomedicine.
ARC Biotechnology Platform: Sequencing for Game Genomics Dr Jasper Rees
DNA Technology Chapter 20.
Introduction to next generation sequencing Rolf Sommer Kaas.
Agenda Introduction to microarrays
Next Generation Sequencing and its data analysis challenges Background Alignment and Assembly Applications Genome Epigenome Transcriptome.
Next Generation DNA Sequencing
Chromatin Immunoprecipitation DNA Sequencing (ChIP-seq)
MPL Identification of alternative spliced mRNA variants related to cancers by genome-wide ESTs alignment KIM DAE SOO Oncogene Apr.
I519 Introduction to Bioinformatics, Fall, 2012
RNA Sequencing I: De novo RNAseq
RNA-Seq Assembly 转录组拼接 唐海宝 基因组与生物技术研究中心 2013 年 11 月 23 日.
Serghei Mangul Department of Computer Science Georgia State University Joint work with Irina Astrovskaya, Marius Nicolae, Bassam Tork, Ion Mandoiu and.
Alistair Chalk, Elisabet Andersson Stem Cell Biology and Bioinformatic Tools, DBRM, Karolinska Institutet, September Day 5-2 What bioinformatics.
RNA-Seq Primer Understanding the RNA-Seq evidence tracks on the GEP UCSC Genome Browser Wilson Leung08/2014.
SeqWare for NGS analysis MGI meeting, 12/17/2012 Jianying Li.
Introduction to RNAseq
Computational methods for genomics-guided immunotherapy Sahar Al Seesi Computer Science & Engineering Department, UCONN Immunology Department, UCONN Health.
Tutorial 6 High Throughput Sequencing. HTS tools and analysis Review of resequencing pipeline Visualization - IGV Analysis platform – Galaxy Tuning up.
Lecture-5 ChIP-chip and ChIP-seq
No reference available
Current Data And Future Analysis Thomas Wieland, Thomas Schwarzmayr and Tim M Strom Helmholtz Zentrum München Institute of Human Genetics Geneva, 16/04/12.
Canadian Bioinformatics Workshops
User-friendly Galaxy interface and analysis workflows for deep sequencing data Oskari Timonen and Petri Pölönen.
Canadian Bioinformatics Workshops
Reliable Identification of Genomic Variants from RNA-seq Data Robert Piskol, Gokul Ramaswami, Jin Billy Li PRESENTED BY GAYATHRI RAJAN VINEELA GANGALAPUDI.
Canadian Bioinformatics Workshops
Library QA & QC Day 1, Video 3
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
Introduction to Illumina Sequencing
From Reads to Results Exome-seq analysis at CCBR
Interpreting exomes and genomes: a beginner’s guide
Short Read Sequencing Analysis Workshop
Next generation sequencing
Cancer Genomics Core Lab
Introduction to NGS.
Today… Review a few items from last class
2nd (Next) Generation Sequencing
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
RNA sequencing (RNA-Seq) and its application in ovarian cancer
BF nd (Next) Generation Sequencing
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
Presentation transcript:

High Throughput Sequencing

Agenda Introduction to sequencing Applications Bioinformatics analysis pipelines What should you ask yourself before planning the experiment

Introduction to sequencing

What is sequencing? Finding the sequence of a DNA/ RNA molecule What can we sequence? http://cancergenome.nih.gov/newsevents/multimedialibrary/images/CancerBiology

Sanger sequencing Up to 1,000 bases molecule One molecule at a time Widely used from 1970-2000 First human genome draft was based on Sanger sequencing Still in use for single molecules http://www.genomebc.ca/education/articles/sequencing/

High Throughput Sequencing Next Generation Sequencing (NGS) / Massively parallel sequencing Sequencing millions of molecules in parallel Do not need prior knowledge of what you’re sequencing Platform Read length No. of reads per run 454 sequencing Up to 1,000 bp ~1 M (1 million) SOLiD 50-75 bp ~1 G (1 billion) HiSeq 100-150 bp ~0.5 G We will discuss Illumina’s platform only

Sequencing Workflow Extract tissue cells Extract DNA/RNA from cells Sample preparation for sequencing Sequencing Bioinformatics analysis Why is it important to understand the “wet lab” part?

Sample Prep Random shearing of the DNA Adding adaptors and barcodes Size selection Amplification Sequencing

Sequencing process

Leave only sequences from one direction Sequencing process Leave only sequences from one direction

Sequencing process

Sequencing process

Sequencing process

Applications

DNA sequencing Resequencing – sequencing the genome of an organism with a known genome Exome sequencing / Targeted sequencing – sequencing only selected regions from the genome De-novo sequencing– sequencing the genome of an organism with a unknown genome

RNA-Seq Sequencing of mRNA extracted form the cell to get an estimate of expression levels of genes.

RNA-Seq vs. DNA-sequencing Counting vs. Reading RNA-Seq vs. DNA-sequencing

ChIP-Seq Sequencing the regions in the genome to which a protein binds to.

Basic concepts Insert – the DNA fragment that is used for sequencing. Read – the part of the insert that is sequenced. Single Read (SR) – a sequencing procedure by which the insert is sequenced from one end only. Paired End (PE) – a sequencing procedure by which the insert is sequenced from both ends.

Bioinformatics Analysis Pipelines

Demultiplexing Lane Unknown inserts

Mapping parameters affect the rest of the analysis Demultiplexing Sample Mapping Reference Genome Example of mapping parameters: Number of mismatches per read Scores for mismatch or gaps Mapping parameters affect the rest of the analysis

Removing duplicates and non-unique mappings Demultiplexing Mapping Reference Genome Removing duplicates and non-unique mappings Reference Genome ? 𝒂𝒗𝒆𝒓𝒂𝒈𝒆 𝒄𝒐𝒗𝒆𝒓𝒂𝒈𝒆= 𝑟𝑒𝑎𝑑 𝑙𝑒𝑛𝑔𝑡ℎ ⋅ 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑒𝑎𝑑𝑠 ⋅ % 𝑢𝑛𝑖𝑞𝑢𝑒𝑙𝑦 𝑚𝑎𝑝𝑝𝑒𝑑 𝑟𝑒𝑎𝑑𝑠 𝑔𝑒𝑛𝑜𝑚𝑒 𝑠𝑖𝑧𝑒

Resequencing/ Exome Pipeline

Removing duplicates and non-unique mappings Demultiplexing Mapping Removing duplicates and non-unique mappings Coverage profile and variant calling Reference Genome …ACTTCGTCGAAAGG… G

Removing duplicates and non-unique mappings Demultiplexing Frequency >= 20% Mapping Reference Genome …ACTTCGTCGAAAGG… Removing duplicates and non-unique mappings Coverage profile and variant calling Coverage >= 5 Variant filtering Reference Genome …ACTTCGTCGAAAGG…

Removing duplicates and non-unique mappings Genes and known variants Demultiplexing Mapping Removing duplicates and non-unique mappings Variant calling Reference Genome …ACTTCGTCGAAATG… …GTCCCGTGATACTCCGT… G A Variant filtering Genes and known variants rs230985 Gene X

Resequencing results

Example for further analysis Recessive disease: Variant not in known databases Homozygous variant shared by all affected individuals Same variant appears in healthy parents at heterozygous state Healthy brothers can be heterozygous to the same variant Demultiplexing Mapping Removing duplicates and non-unique mappings Coverage profile and variant calling Dominant disease: Variant not in known databases Heterozygous variant shared by all affected individuals The variant doesn’t appear in healthy individuals Variant filtering Genes and known variants 1 Table per sample Finding suspicious variants 1 Table per project

Quality control steps in the pipeline Demultiplexing QC Mapping QC Removing duplicates and non-unique mappings QC Coverage profile and variant calling QC Variant filtering QC Genes and known variants Finding suspicious variants

How is de-novo assembly different from resequencing analysis ?

RNA-Seq Pipeline

Removing duplicates and non-unique mappings Gene expression levels FPKM Normalization: 𝑟𝑎𝑤 𝑐𝑜𝑢𝑛𝑡 𝑔𝑒𝑛𝑒 𝑙𝑒𝑛𝑔𝑡ℎ⋅𝑠𝑎𝑚𝑝𝑙 𝑒 ′ 𝑠 𝑚𝑎𝑝𝑝𝑒𝑑 𝑟𝑒𝑎𝑑𝑠 𝑖𝑛 𝑚𝑖𝑙𝑙𝑖𝑜𝑛𝑠 Demultiplexing Mapping Removing duplicates and non-unique mappings Gene expression levels Unannotated genes

Differential expression parameters: Demultiplexing Mapping Differential expression parameters: Threshold - Minimum number of reads for pair testing Normalization Replicates Differential expression parameters affect the results Removing duplicates and non-unique mappings Gene expression levels Differential gene expression

RNA-Seq results

Coverage Coverage – resequencing “Coverage” – RNA-Seq Number of bases that cover each base in the genome in average. 𝒂𝒗𝒆𝒓𝒂𝒈𝒆 𝒄𝒐𝒗𝒆𝒓𝒂𝒈𝒆= 𝑟𝑒𝑎𝑑 𝑙𝑒𝑛𝑔𝑡ℎ ⋅ 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑒𝑎𝑑𝑠 ⋅ % 𝑢𝑛𝑖𝑞𝑢𝑒𝑙𝑦 𝑚𝑎𝑝𝑝𝑒𝑑 𝑟𝑒𝑎𝑑𝑠 𝑔𝑒𝑛𝑜𝑚𝑒 𝑠𝑖𝑧𝑒 “Coverage” – RNA-Seq Depends on the expression profile of each sample. Highly expressed genes will be detected with less “coverage” than lowly expressed genes.

What should you ask yourself before sequencing when planning the experiment Reference genome: What is my reference genome? Does it have updated annotations? What annotations are known? Are my samples closely related to the reference genome? Do I expect to have contaminations in my sample? Do I have validations from other technologies? (RT-PCR, SNPchip…) Do I have controls and replicates? RNA-Seq: am I interested in alternative splicing? Resequencing: What kind of mutations do I expect to find?