Additional references Linnarsson S. Recent advances in DNA sequencing methods - general principles of sample preparation. Exp Cell Res. 2010 May 1;316(8). Epub 2010 Mar 6. Review. Buehler B, Hogrefe HH, Scott G, Ravi H, Pabón-Peña C, O'Brien S, Formosa R, Happe S. Rapid quantification of DNA libraries for next-generation sequencing. Methods. Apr;50(4):S15-8. Review (on the use of real-time PCR).
Stages in the library preparation Steps accompanied by numbers are those for which we suggest alternatives to the standard Illumina protocols. Numbers correspond to those given in the Supplementary Protocols.
Fragmentation Nebulization: uneconomical distribution of fragment sizes; approximately half of the DNA vaporises. Adaptive Focused Acoustics (Covaris): acoustic energy is controllably focused into the aqueous DNA sample by a dish-shaped transducer, resulting in cavitation events within the sample. 17% of the sample is in the 200bp size range, with little DNA loss.
Fragmentation Enzymatic digestion (Linnarsson 2010): two commercial enzymatic fragmentation kits were recently introduced. Fragmentase (New England Biolabs) is based on a V. vulnificus nuclease that generates random nicks and a modified T7 endonuclease that recognises the nicks and cleaves the opposite strand. Nextera (Epicentre) is based on random transposon insertion and introduces adapter sequences simultaneously with fragmentation.
A-tailing, ligation and size selection Artefacts from standard library prep: 1. Bias in base composition 2. High frequency of chimeric sequences 3. Imperfect insert size distribution. Overcome by: 1. Paired-end oligos 2. Gel extraction: melting the gel slice at room temperature reduces GC bias 3. Improved efficiency of end repair and A-tailing 4. Double size selection 5. Paired-end size selection: excise only a 2mm gel slice
Figure 3 A-tailing, ligation and size selection GC plots before (a) and after (b) optimisation of gel extraction. The figures show the total area in which reads with a particular GC content are distributed, with the mean and standard deviation. The greater width of the shaded area in plot a) indicates a wider dispersion of coverage for all values of GC content for which sequences were obtained. Agilent Bioanalyzer 2100 traces for two suboptimal libraries: c) a 60bp insert library with optimised PCR, d) the same 60bp library with excess DNA in the PCR, e) a 200bp insert library showing a shoulder of small fragments. Insert size distribution from sequenced human DNA using f) the standard and g) the modified paired-end library prep protocols.
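The GC plots in Figure 3 rest on a per-read GC content. As a minimal sketch of that first step only, assuming reads are available as plain strings (the function names `gc_percent` and `gc_profile` are hypothetical and not taken from the authors' pipeline), reads can be binned by GC percentage like this:

```python
from collections import Counter


def gc_percent(read: str) -> int:
    """Percentage of G/C among the unambiguous bases (A, C, G, T) of a read."""
    read = read.upper()
    gc = read.count("G") + read.count("C")
    at = read.count("A") + read.count("T")
    if gc + at == 0:
        return 0  # no callable bases; treated as 0% in this sketch
    return round(100 * gc / (gc + at))


def gc_profile(reads):
    """Count how many reads fall at each GC percentage."""
    return Counter(gc_percent(r) for r in reads)


# Toy input (made-up reads, for illustration only)
profile = gc_profile(["ATGC", "GGCC", "AATT", "ATGC"])
for pct in sorted(profile):
    print(f"{pct:3d}% GC: {profile[pct]} read(s)")
```

A wider spread of coverage across these bins would correspond to the broader shaded area described for plot a).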
PCR Template quality: use optimised quantities of DNA template. Use high-fidelity polymerases in an optimised reaction. Solid phase reversible immobilization (SPRI) technology removes a higher proportion of primers and adapter dimers than spin columns. Reduce the number of PCR cycles: 3ng DNA and 14 cycles of PCR amplification for single-end libraries, 25ng DNA and 12 cycles for high-complexity libraries, and 10ng DNA and 18 cycles for lower-complexity samples. These quantities give the optimal compromise between clean libraries and a low frequency of duplicate sequences. It is possible to eliminate the PCR step by ligating on appropriate adaptors after A-tailing. Direct sequencing of short amplicons.
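To see why the cycle number matters for duplicates, it can help to look at the theoretical fold amplification per template. The sketch below assumes ideal exponential PCR (a perfect doubling per cycle when `efficiency` is 1.0), which is only an upper bound because real reactions plateau; the function name and the `efficiency` parameter are illustrative assumptions, while the input amounts and cycle numbers are the ones quoted on the slide.

```python
def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Theoretical copies per starting template after `cycles` of PCR.

    efficiency = 1.0 means perfect doubling each cycle (an idealisation).
    """
    return (1.0 + efficiency) ** cycles


# (input DNA in ng, PCR cycles) as quoted on the slide
conditions = {
    "single-end": (3, 14),
    "high-complexity": (25, 12),
    "lower-complexity": (10, 18),
}

for name, (input_ng, cycles) in conditions.items():
    fold = fold_amplification(cycles)
    print(f"{name}: {input_ng} ng, {cycles} cycles, up to {fold:,.0f}x per template")
```

Under this idealisation, 18 cycles copy each template up to 64 times more often than 12 cycles, which is one way to read the trade-off between yield and duplicate frequency.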
Figure 4 PCR a) A ~200bp fragment library was prepared, and 10ng was amplified for 18 cycles using either standard Illumina conditions or optimised PCR conditions. b) After PCR we divided the library into two: half was purified following the standard Illumina protocol, through a Qiaquick PCR cleanup column, whereas the other half was purified using SPRI technology. Each was then run on an agarose gel alongside a 100bp ladder to view the DNA species that remained.
PCR Example of PCR duplication (figure).
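PCR duplicates typically show up as reads that align to exactly the same position and strand. As a rough sketch, assuming alignments have already been reduced to `(chromosome, start, strand)` tuples (a hypothetical input format, not the authors' tooling), the duplicate fraction can be estimated as follows:

```python
from collections import Counter


def duplicate_fraction(alignments):
    """Fraction of reads that repeat an already-seen (chrom, start, strand)."""
    counts = Counter(alignments)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Each distinct position contributes one "original" read; the rest are duplicates.
    return 1.0 - len(counts) / total


# Toy alignments (made-up coordinates)
reads = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),  # duplicate of the first read
    ("chr1", 100, "-"),  # same position, opposite strand: counted as distinct
    ("chr2", 5, "+"),
]
print(f"duplicate fraction: {duplicate_fraction(reads):.2f}")
```

Dedicated tools also take the mate position and base qualities into account; this sketch ignores both.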
Quantification An optimal concentration range of DNA is needed to yield clusters in the optimal density range. Spectrophotometry is not accurate. Quantitative PCR: quantify unknown libraries against standard libraries that have been sequenced previously and for which the cluster number is known. Electrophoresis with the Agilent Bioanalyzer: gives a check of size distribution, but can be inaccurate for a small proportion of libraries, possibly because single-stranded DNA is not easily quantified when mixed with double-stranded DNA. The Bioanalyzer can be used to check size distribution, and fluorometry to determine the concentration more accurately (e.g. Qubit dsDNA BR Assay). [Bioanalyzer region table; columns: From [bp], To [bp], Corr. Area, % of Total, Average Size [bp], Size distribution in CV [%], Conc. [pg/µl], Molarity [pmol/l]; values not recovered]
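A mass concentration from fluorometry and a mean fragment size from the Bioanalyzer can be combined into a molar concentration. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA; the function name and the example values are illustrative assumptions, not figures from the slides.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (common approximation)


def library_molarity_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA concentration (ng/µl) and mean fragment length (bp) to nM."""
    if mean_fragment_bp <= 0:
        raise ValueError("mean_fragment_bp must be positive")
    # ng/µl equals 1e-3 g/l; dividing by g/mol gives mol/l; 1e9 converts to nM.
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)


# Hypothetical library: 2 ng/µl with a mean fragment size of 300 bp
print(f"{library_molarity_nM(2.0, 300):.2f} nM")
```

The same expression converts pg/µl to pmol/l, the units used in the Bioanalyzer table headers.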
Quantification a) Cluster throughput as a function of total clusters for 200 and 500bp inserts. The 500bp inserts underwent fewer cycles of cluster amplification (28, compared to 35 for the 200bp libraries), resulting in smaller clusters, and so a cluster density of 40-44k / tile (GA1) will produce the maximum yield from either insert size. b) Standardisation of cluster density with qPCR quantification. Runs were grouped into 25-run bins and a boxplot plotted. After some initial problems with degradation of standards, cluster number has levelled out at ~35-40k / tile.
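Quantifying an unknown library against standards by qPCR usually means fitting a standard curve of Ct against log10 concentration and reading the unknown off that line. The following is a minimal sketch of that idea using an ordinary least-squares fit; the function names and the standard values are made up for illustration and are not the authors' data.

```python
import math


def fit_standard_curve(concentrations, ct_values):
    """Least-squares fit of Ct = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate an unknown concentration."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical standards (pM) and their measured Ct values
slope, intercept = fit_standard_curve([100, 10, 1], [13.0, 16.5, 20.0])
unknown = concentration_from_ct(16.5, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, unknown = {unknown:.1f} pM")
```

In practice the standards would be the previously sequenced libraries with known cluster numbers mentioned in the slides, so the curve maps Ct to an expected cluster yield.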
Denaturation For low concentrations of double-stranded DNA, denaturation by heating can damage the DNA and introduce G+C bias. Use modified hybridisation buffers; prefer 0.1M NaOH to heating. Subnanomolar libraries require an alternative buffer. 1. Addition of Tris to the Illumina buffer prevents a rise in pH. 2. Diluting the supplied 2M NaOH and using a greater volume reduces fluctuation caused by pipetting error.
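The benefit of diluting the 2M NaOH stock can be illustrated with simple arithmetic. The sketch below applies C1·V1 = C2·V2 and then compares relative pipetting error for a small and a large volume; the fixed absolute error of 0.1 µl and the volumes are assumptions chosen for illustration, not measured values.

```python
def dilution_volumes(stock_molar: float, target_molar: float, final_volume_ul: float):
    """Return (stock volume, diluent volume) in µl, from C1*V1 = C2*V2."""
    if target_molar > stock_molar:
        raise ValueError("target concentration exceeds stock concentration")
    stock_ul = target_molar * final_volume_ul / stock_molar
    return stock_ul, final_volume_ul - stock_ul


def relative_error_percent(volume_ul: float, absolute_error_ul: float) -> float:
    """Relative pipetting error for a given absolute error (assumed constant)."""
    return 100.0 * absolute_error_ul / volume_ul


stock_ul, water_ul = dilution_volumes(2.0, 0.1, 1000.0)
print(f"0.1 M NaOH: {stock_ul:.0f} µl of 2 M stock + {water_ul:.0f} µl water")

# Same amount of NaOH delivered as 1 µl of 2 M or as 20 µl of 0.1 M
for volume in (1.0, 20.0):
    print(f"{volume:4.0f} µl pipetted: ±{relative_error_percent(volume, 0.1):.1f}% error")
```

With the same assumed absolute error, the larger volume of dilute NaOH carries a much smaller relative error in the amount of base delivered.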
Denaturation a) pH titration of hybridisation buffers. The concentration of NaOH in DNA templates is 0.1M. Adding more than 8μl of this denatured template to the 1ml of Hybridisation Buffer prior to loading DNA onto the flowcell increases the pH to above 10. This prevents efficient hybridisation, and thus the cluster density falls. The addition of Tris-HCl pH 7.3 to the supplied bottles of Hybridisation Buffer dramatically increases buffering capacity, making template hybridisation more robust. b) The addition of 5mM Tris-HCl pH 7.3 to Illumina Hybridisation Buffer allows a greater volume of denatured template to be added before high pH prevents effective annealing of templates to the oligos on the flowcell surface. This increases the robustness of cluster generation by counteracting pipetting errors in the denaturation step.
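As a rough way to picture the titration, the NaOH carried over into the hybridisation buffer can be computed from the template volume. The sketch below only calculates the final NaOH concentration and sets it beside the 5mM Tris-HCl figure (assumed here to be a final concentration); it does not predict pH, because the composition of the supplied buffer is not given.

```python
def final_naoh_mM(template_ul: float, naoh_molar: float = 0.1,
                  buffer_ul: float = 1000.0) -> float:
    """Final NaOH concentration (mM) after adding denatured template to buffer."""
    total_ul = template_ul + buffer_ul
    return 1000.0 * naoh_molar * template_ul / total_ul


TRIS_MM = 5.0  # assumed final Tris-HCl concentration, from the slide's 5mM figure

for volume in (2, 8, 16, 32):
    naoh = final_naoh_mM(volume)
    print(f"{volume:2d} µl template: {naoh:.2f} mM NaOH vs {TRIS_MM:.0f} mM Tris-HCl")
```

At the 8μl threshold mentioned above, the carried-over NaOH is a little under 1 mM, which gives a sense of how little base the unmodified buffer tolerates.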
Amplification Quality control After cluster amplification double stranded DNA on the flow cell can be stained using an intercalating dye to be detected by a fluorescence microscope. Use on flow cells before linearization and blocking to confirm that the cluster density is appropriate.
Additions to the method Careful DNA quantification before fragmentation, and checking for degraded DNA. Use of low-adsorption plasticware (Linnarsson 2010), e.g. Beckman Coulter non-stick or equivalent; adding some detergent (e.g. 0.02% Tween-20) is also advised to reduce adsorption to tube walls. Implementation of SPRI XP beads for all purification steps. Use of the Bioanalyzer to check concentration and size distribution after fragmentation. Cheaper alternatives to Illumina kits, e.g. NEB kits, and making one's own adapters and primers.
Conclusion The Genome Analyzer is a powerful sequencing technology. Here the authors describe a number of modifications that allow for more efficient library preparation and that enable a stable workflow in a production environment. At the Sanger Institute, they have several teams for every stage of sequencing. All steps in the process are recorded using custom-written lab-tracking and run-tracking database software. Taking into account improvements to the image analysis software and a faster run time, they predicted that by Christmas 2008 their output would reach 6-10 terabases of high-quality sequence per year, equivalent to 180 human genomes at 15-fold coverage, or approximately 200,000 bases per second. The improved workflow and high yield should maintain the Genome Analyzer as their next-generation sequencing platform of choice for the immediate future, but how long this remains true depends upon the performance of existing rival technologies and of those on the horizon, for example Oxford Nanopore Technologies and Pacific Biosciences Single Molecule Real Time technology, which promise to bring us closer to the eagerly anticipated $1,000 genome.
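The throughput figures in the conclusion can be cross-checked with a few lines of arithmetic. The sketch below assumes a haploid human genome of roughly 3 gigabases and a 365-day year; both are assumptions introduced here, not values stated in the slides.

```python
HUMAN_GENOME_BASES = 3_000_000_000   # assumed approximate genome size
SECONDS_PER_YEAR = 365 * 24 * 3600   # assumed non-leap year


def bases_for_genomes(genomes: int, coverage: int,
                      genome_bases: int = HUMAN_GENOME_BASES) -> int:
    """Total bases needed to sequence `genomes` genomes at `coverage`-fold."""
    return genomes * coverage * genome_bases


def bases_per_year(bases_per_second: int) -> int:
    """Yearly output for a sustained per-second rate."""
    return bases_per_second * SECONDS_PER_YEAR


print(f"180 genomes at 15x: {bases_for_genomes(180, 15) / 1e12:.1f} terabases")
print(f"200,000 bases/s:    {bases_per_year(200_000) / 1e12:.1f} terabases per year")
```

Both results fall inside the quoted 6-10 terabase range, so the three figures are mutually consistent under these assumptions.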