MCB3895-004 Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly.

Presentation transcript:

MCB Lecture #9 Sept 23/14 Illumina library preparation, de novo genome assembly

Illumina sequencing ube.com/watch?v=womKfikWlxM

Illumina sequencing - summary 1.Template consists of DNA fragments amplified by bridge clustering 2."Sequencing by synthesis" used to generate DNA sequences 3.DNA sequence read as unique fluorescent signatures following base incorporation

Illumina sequencing - summary 4.Adapters at each end of the template molecule bind the flowcell adaptors and facilitate bridge amplification 5."Dual indexing" allows multiple samples to be sequenced on the same flowcell, each having a unique set of indices 6.Paired-end sequencing extends the regular sequencing protocol to read each template molecule in both directions
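Point 5 (dual indexing) amounts to a lookup from an index pair to a sample. A minimal sketch, assuming a hypothetical sample sheet; the index sequences and sample names below are made up for illustration and are not taken from any Illumina kit:

```python
# Hypothetical sample sheet: (i7 index, i5 index) -> sample name.
# All sequences and names are illustrative assumptions.
SAMPLE_SHEET = {
    ("ATCACG", "TATAGC"): "sample_A",
    ("ATCACG", "CTCTCT"): "sample_B",
    ("CGATGT", "TATAGC"): "sample_C",
}

def assign_sample(i7, i5, sheet=SAMPLE_SHEET):
    """Return the sample for an index pair, or None if the pair is unknown."""
    return sheet.get((i7, i5))
```

Because the pair is the key, two samples can share one index as long as the other differs.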

Paired-end sequencing Objective: allows repetitive regions to be sequenced more precisely

Paired-end sequencing Be careful to distinguish terms! Do not confuse adapters with the read or template fragment

Paired-end sequencing "Insert" is even more confusing Refers to entire fragment, including both the reads and the unsequenced "inner mate" region between them Term stems from long-dead plasmid sequencing approaches
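The relationship between insert, reads and inner mate region can be written as simple arithmetic. A small sketch, assuming both reads have the same length and that "insert" means the whole template fragment as defined above (the function name is ours):

```python
def inner_mate_distance(insert_size, read_length):
    """Unsequenced gap between the two reads of a pair.

    insert_size: length of the whole template fragment (adapters excluded).
    read_length: length of each read (assumed equal for both reads).
    A negative result means the two reads overlap by that many bases.
    """
    return insert_size - 2 * read_length
```

For example, a 500 bp insert sequenced with 2 x 150 bp reads leaves 200 bp unsequenced in the middle.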

Paired-end sequencing It is possible to have paired end reads that overlap each other Can assemble to create long, highly accurate contiguous reads
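Merging an overlapping pair can be sketched as follows. This is a toy version that assumes error-free reads and an exact overlap; real read mergers also tolerate mismatches and use base qualities. Function names and the `min_overlap` default are illustrative choices:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(read1, read2, min_overlap=5):
    """Merge read 1 with the reverse complement of read 2.

    Tries the longest exact suffix/prefix overlap first.
    Returns the merged sequence, or None if no overlap of at least
    min_overlap bases is found.
    """
    rc2 = reverse_complement(read2)
    for n in range(min(len(read1), len(rc2)), min_overlap - 1, -1):
        if read1.endswith(rc2[:n]):
            return read1 + rc2[n:]
    return None
```

Read 2 comes off the opposite strand, so it has to be reverse-complemented before it can be compared with read 1.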

Paired-end sequencing If the template fragment is too short, it is possible to read past the end of the fragment Results in adapter region being included in read Needs to be removed computationally.
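Computational adapter removal can be sketched like this. It is a simplified, exact-match version; dedicated trimming tools also handle sequencing errors. The adapter string used in the usage note is an assumption, so substitute the sequence documented for your library kit:

```python
def trim_adapter(read, adapter, min_match=3):
    """Remove adapter read-through from the 3' end of a read.

    First looks for the full adapter anywhere in the read; if absent,
    looks for a prefix of the adapter (at least min_match bases) at the
    very end of the read. Returns the read unchanged if neither is found.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    for n in range(min(len(adapter), len(read)) - 1, min_match - 1, -1):
        if read.endswith(adapter[:n]):
            return read[:-n]
    return read
```

With an assumed adapter `AGATCGGAAGAGC`, calling `trim_adapter("ACGTACGTAGATC", "AGATCGGAAGAGC")` cuts the trailing `AGATC`, since those five bases match the start of the adapter.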

Library preparation How exactly are template fragments generated? Lots of methods, I only present two: TruSeq and Nextera Most common Illumina methods (specific kits available from Illumina) Think about: where might biases arise?

TruSeq library preparation Step #1: Fragment DNA Typically via shearing Produces uniformly sized fragments

TruSeq library preparation Step #2: Create blunt ends using a polymerase to remove 3' overhangs and fill in 5' overhangs Use bead purification to remove smallest fragments, blunt ending reagents

TruSeq library preparation Step #3: Adenylate 3' ends to prevent self-ligation while adding adapters

TruSeq library preparation Step #4: Ligate adapters containing sequencing primer, indices, flowcell capture site

Nextera library preparation Nextera uses engineered transposases to fragment genomic DNA and add sequencing adapters at the same time Low DNA input requirement "Transposome" = transposase + DNA for attachment

Nextera library preparation Step #1: Use "tagmentation" to simultaneously fragment template DNA and add sequencing adapters 300 bp insert size reflects minimum needed by transposases to cut and add adapters

Nextera library preparation Step #2: Purify fragments from transposome (part of Nextera kit) Result: fragment contains both 5' and 3' sequencing adapters

Nextera library preparation Step #3: Use PCR to add indices and flowcell capture sites to the fragment Non-template fragments excluded during bead clean-up following this step

Nextera library preparation Final result: Template fragment Sequencing adapters Dual indices Flowcell capture sites (same structure as TruSeq)

Library prep is not error-free

Library prep is not error-free Regions with lower coverage are GC-rich No method is perfect Also note: Nextera uses low cycle PCR, has potential for bias
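To check whether low-coverage regions are GC-rich in your own data, GC content can be computed per window and compared against coverage. A minimal sketch; the window size and function names are arbitrary choices for illustration:

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def windowed_gc(seq, window=4):
    """GC fraction of consecutive, non-overlapping windows along a sequence.

    A trailing partial window is ignored.
    """
    return [gc_fraction(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]
```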

Mate pairs Paired-end sequencing actually binds each fragment to the flowcell and sequences from each end Size limitations: large fragments are too floppy to sequence well Mate pairs: maintain same philosophy of adding inserts of known sizes, but facilitate larger insert sizes

Nextera mate pair library preparation Step #1: Use Nextera tagmentation to fragment template and add adapters Adapters are biotinylated for later steps

Nextera mate pair library preparation Step #2: Fragment is circularized using a "biotin junction adapter"

Nextera mate pair library preparation Step #3: Circular molecules fragmented, biotin tags used to enrich fragments having junction Recall: junction contains original fragment ends

Nextera mate pair library preparation Step #4: Use TruSeq protocol to end repair, A-tail, and ligate flowcell capture sequences and barcodes Final product has all the normal parts of an Illumina template library but also junction region mid-fragment

Questions?

Digging deeper into the guts of de novo genome assembly Important to know to be able to tune assembly software appropriately! Two paradigms: 1.Overlap/layout/consensus 2.De Bruijn graphs Both find overlaps between sequences, create a network representation, and find the best path through that network to represent the final assembly

Overlap/layout/consensus genome assembly Step #1: Compare all reads to each other to find those that overlap Let's do it together! Reads (5'->3'): TGGCA CAATT ATTTGAC GCATTGCAA TGCAAT

Overlap/layout/consensus genome assembly Step #2: Create overlap graph arranging reads according to their overlaps Step #3: Find unique path through the graph Step #4: Assemble overlapping reads by aligning the reads and deriving consensus
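The four steps can be tried on the five example reads with a toy script. It uses exact suffix/prefix overlaps and a greedy merge (always join the pair with the longest overlap), which is a simplification we chose for brevity and not how production assemblers build the layout. The minimum overlap of 3 is also an assumption:

```python
from itertools import permutations

READS = ["TGGCA", "CAATT", "ATTTGAC", "GCATTGCAA", "TGCAAT"]

def overlap(a, b, min_len=3):
    """Length of the longest suffix of a equal to a prefix of b.

    Returns 0 if the longest such overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap.

    Stops when one sequence is left or no overlaps remain; returns the
    list of remaining sequences (the contigs).
    """
    reads = list(reads)
    while len(reads) > 1:
        best_n, best_a, best_b = 0, None, None
        for a, b in permutations(reads, 2):
            n = overlap(a, b, min_len)
            if n > best_n:
                best_n, best_a, best_b = n, a, b
        if best_n == 0:
            break
        reads.remove(best_a)
        reads.remove(best_b)
        reads.append(best_a + best_b[best_n:])
    return reads

contigs = greedy_assemble(READS)
```

On these reads the script ends with a single contig, `TGGCATTGCAATTTGAC`, which follows the read order TGGCA, GCATTGCAA, TGCAAT, CAATT, ATTTGAC.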

Overlap/layout/consensus genome assembly Requires all-vs-all comparison of reads, which becomes computationally intensive as the number of reads increases Developed and applied for Sanger and 454 sequencing Not dead yet! Has reemerged for PacBio and other long-read techniques

But consider errors Our network was for perfectly accurate reads What happens when you have both the correct TGGCA read and a TGCCA read containing a substitution sequencing error?
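One way to explore this question is to compare exact overlaps with overlaps that tolerate a mismatch. A sketch under the assumption that errors are substitutions only (no indels); the function name and parameters are illustrative:

```python
def overlap_with_mismatches(a, b, min_len=3, max_mismatches=0):
    """Longest suffix of a matching a prefix of b with few mismatches.

    Compares the last n bases of a with the first n bases of b for
    decreasing n and returns the first n whose mismatch count is at most
    max_mismatches. Returns 0 if none of length >= min_len qualifies.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        mismatches = sum(x != y for x, y in zip(a[-n:], b[:n]))
        if mismatches <= max_mismatches:
            return n
    return 0
```

With exact matching, the erroneous TGCCA read no longer overlaps GCATTGCAA at all. If one mismatch is allowed, the 3-base overlap is found again, but the graph now contains two competing reads for the same position.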

De Bruijn graph assembly Instead of comparing all reads with each other, split reads up into kmers, i.e., substrings of each read of a fixed length k Much more computationally efficient than all-vs-all comparison in overlap/layout/consensus

De Bruijn graph assembly Step #1: Tally kmers Let's find all kmers where k=4 for our set of reads from before TGGCA CAATT ATTTGAC GCATTGCAA TGCAAT
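The tally can be checked with a few lines of code. A minimal sketch using the reads above (variable names are ours):

```python
from collections import Counter

READS = ["TGGCA", "CAATT", "ATTTGAC", "GCATTGCAA", "TGCAAT"]

def kmers(read, k):
    """All substrings of length k, in the order they occur in the read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

# Count how often each 4-mer occurs across all reads.
counts = Counter(km for read in READS for km in kmers(read, 4))
repeated = sorted(km for km, c in counts.items() if c > 1)
```

Here `counts` holds 14 distinct 4-mers; TGCA, GCAA and CAAT each occur twice because GCATTGCAA and TGCAAT overlap.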

De Bruijn graph assembly Step #2: Create graph of kmer overlap, where kmers are nodes and overlaps between them are edges More complex than overlap graph Step #3: Find unique path through the graph Can leverage kmers adjacent to each other in reads to reduce complexity Step #4: Synthesize path into a consensus sequence
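Step #2 can be sketched for the example reads with k=4. Following the slide's description, kmers are nodes and an edge joins two kmers that overlap by k-1 bases. The script also collects the edges that are directly supported by adjacent kmers within a read, to illustrate how that information reduces complexity. This is a toy illustration, not how any particular assembler stores its graph:

```python
READS = ["TGGCA", "CAATT", "ATTTGAC", "GCATTGCAA", "TGCAAT"]

def kmers(read, k):
    """All substrings of length k, in the order they occur in the read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def overlap_graph(reads, k):
    """Map each distinct kmer to the kmers it overlaps by k-1 bases."""
    nodes = sorted({km for r in reads for km in kmers(r, k)})
    return {a: [b for b in nodes if a[1:] == b[:-1]] for a in nodes}

def read_supported_edges(reads, k):
    """Edges between kmers that sit next to each other inside some read."""
    edges = set()
    for r in reads:
        ks = kmers(r, k)
        edges.update(zip(ks, ks[1:]))
    return edges

graph = overlap_graph(READS, 4)
all_edges = {(a, b) for a, succ in graph.items() for b in succ}
supported = read_supported_edges(READS, 4)
# Nodes with more than one outgoing edge are where the path is ambiguous.
branching = sorted(a for a, succ in graph.items() if len(succ) > 1)
```

For these reads the graph has 19 edges and six branching nodes, while only 11 edges are seen as adjacent kmers within a read.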

De Bruijn graph assembly Doesn’t need all-vs-all comparison so is much faster Can handle large numbers of reads, e.g., as generated by Illumina technology Graph is much more complicated, RAM intensive More sensitive to errors

De Bruijn graph assembly Consider errors: make the graph even more complicated with bubbles, dead ends Consider repeats: parts of the graph with no unique path through it Graph broken on each side, forming contigs
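The effect of an error can be seen by adding the erroneous TGCCA read from earlier to the example reads and listing the kmers that have no successor. A small sketch (k=4, kmers as nodes, k-1 overlaps as edges, as on the previous slides); function names are ours:

```python
READS = ["TGGCA", "CAATT", "ATTTGAC", "GCATTGCAA", "TGCAAT"]

def kmers(read, k):
    """All substrings of length k, in the order they occur in the read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def dead_ends(reads, k):
    """Kmers whose last k-1 bases are not the start of any other kmer."""
    nodes = {km for r in reads for km in kmers(r, k)}
    prefixes = {km[:-1] for km in nodes}
    return sorted(km for km in nodes if km[1:] not in prefixes)

clean = dead_ends(READS, 4)
with_error = dead_ends(READS + ["TGCCA"], 4)
```

Without the error the only dead end is TGAC, the true end of the sequence. With the erroneous read, GCCA appears as a second dead end, reached through a short side branch TTGC to TGCC to GCCA.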

Next class Quality control of Illumina data Adapter trimming Error correction Next week: de novo genome assembly