DNA and genome sequencing
Matthew Hudson, Dept of Crop Sciences, University of Illinois

Genome projects
2,424 ongoing genome projects (696 for eukaryotes)
520 completed genomes (47 from eukaryotes)
Almost every crop now has a genome project

DNA Sequencing Dideoxy sequencing was developed by Fred Sanger at Cambridge in the 1970s. Often called “Sanger sequencing”. Nobel prize number 2 for Fred Sanger in 1980, shared with Walter Gilbert from Harvard (inventor of the now little-used Maxam-Gilbert sequencing method).

Sanger’s dideoxy DNA sequencing method: how it works
1. DNA template is denatured to single strands.
2. DNA primer (with 3’ end near sequence of interest) is annealed to the template DNA and extended with DNA polymerase.
3. Four reactions are set up, each containing:
   a. DNA template, e.g. a plasmid
   b. Primer
   c. DNA polymerase
   d. dNTPs (dATP, dTTP, dCTP, and dGTP)
4. Next, a different radio-labeled dideoxynucleotide (ddATP, ddTTP, ddCTP, or ddGTP) is added to each of the four reaction tubes at 1/100th the concentration of normal dNTPs.

ddNTPs are terminators: they possess a 3’-H instead of a 3’-OH, compete in the reaction with normal dNTPs, and cannot form a phosphodiester bond with the next nucleotide. Whenever a radio-labeled ddNTP is incorporated in the chain, DNA synthesis terminates. Terminators stop further elongation of the DNA deoxyribose-phosphate backbone.

“hasta la vista”

Manual dideoxy DNA sequencing: how it works (cont.)
5. Each of the four reaction mixtures produces a population of DNA molecules with chains terminating at each position of its “terminator” base.
6. Extension products in each of the four reaction mixtures therefore end with a different radio-labeled ddNTP (depending on the base).
7. Next, each reaction mixture is electrophoresed in a separate lane (4 lanes) at high voltage on a polyacrylamide gel.
8. The pattern of bands in each of the four lanes is visualized on X-ray film.
9. The location of the “bands” in each of the four lanes indicates the size of the fragments terminating with the respective radio-labeled ddNTP.
10. The DNA sequence is deduced from the pattern of bands in the 4 lanes.
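The logic of reading a four-lane gel can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of any real software: the function names and the template string are invented for illustration, primer length is ignored, and every possible termination product is assumed to be present and resolved on the gel.

```python
# Toy model of manual dideoxy sequencing. All names and the example template
# are illustrative assumptions, not taken from the slides.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def synthesized_strand(template_3_to_5: str) -> str:
    """Strand the polymerase builds (5'->3') along a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


def terminated_lengths(new_strand: str) -> dict[str, list[int]]:
    """For each ddNTP reaction, list the fragment lengths it can produce.

    A fragment of length n ending in base X appears only in the ddX tube,
    because the chain stops where a ddNTP replaced the normal dNTP.
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for position, base in enumerate(new_strand, start=1):
        lanes[base].append(position)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read bands from the shortest to the longest fragment across all lanes."""
    bands = [(length, base) for base, lengths in lanes.items() for length in lengths]
    return "".join(base for _, base in sorted(bands))


template = "CTATATGACA"  # hypothetical template, written 3'->5'
lanes = terminated_lengths(synthesized_strand(template))
print(lanes)            # band positions per lane
print(read_gel(lanes))  # GATATACTGT
```

The template was chosen so that the read matches the ten-base example sequence used in this deck (G A T A T A C T G T); the single band at position 7 in the C lane is what places the C in the read.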

Vigilant et al. 1989 PNAS 86:9350-9354

[Figure: sequencing gel with four lanes, one per radio-labeled ddNTP reaction, running from short products to long products. Sequence read (5’ to 3’): G A T A T A C T G T]

Manual vs automatic sequencing
Manual sequencing has basically died out. It needs four lanes per template and radioactive gels, and in one day a technician can get four sets of four lanes from one gel, with maybe 300 base pairs of data from each template.
Everyone now uses “automatic sequencing”. The downside is that no single lab can afford the machine, so it is done in a central facility (e.g. the Keck Center).
Most automated DNA sequencers can load robotically and operate around the clock for weeks with minimal labor.

Dye deoxy terminators: one tube, one gel lane or capillary

Robotic 96 capillary machine: ABI 3730 xl

DNA sequence output from ABI 377 (a gel-based sequencer)
1. Trace files (dye signals) are analyzed and bases called to create chromatograms.
2. Chromatograms from opposite strands are reconciled with software to create double-stranded sequence data.
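Reconciling reads from opposite strands comes down to reverse-complementing one read and comparing it with the other. The sketch below is a deliberately simplified illustration: it assumes the two reads cover exactly the same region and have equal length, and it marks any disagreement as N. Real base-calling and assembly software also weighs per-base quality values, which this example ignores. The function names are hypothetical.

```python
# Simplified strand reconciliation: names and inputs are illustrative only.

_COMPLEMENT_TABLE = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5'->3'."""
    return seq.translate(_COMPLEMENT_TABLE)[::-1]


def reconcile(forward_read: str, reverse_read: str) -> str:
    """Build a consensus from two fully overlapping reads of opposite strands.

    Positions where the strands disagree are reported as 'N'.
    """
    reverse_as_forward = reverse_complement(reverse_read)
    if len(reverse_as_forward) != len(forward_read):
        raise ValueError("this sketch assumes reads of equal length")
    return "".join(
        f if f == r else "N" for f, r in zip(forward_read, reverse_as_forward)
    )


print(reverse_complement("GATATACTGT"))       # ACAGTATATC
print(reconcile("GATATACTGT", "ACAGTATATC"))  # GATATACTGT
print(reconcile("GATATACTGT", "ACAGTTTATC"))  # GATANACTGT (one miscalled base)
```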

Genome sequencing
How do you use these chunks of sequence to make a “whole genome” sequence?

The “traditional” genome
A physical map is made.
A BAC “tiling path” is created.
BACs are farmed out to hundreds of collaborating laboratories; each lab does a few BACs.
Arabidopsis, E. coli etc. were done this way, but since Craig Venter got interested, everything is “going shotgun”.

Shotgun genome sequencing
Clone-by-clone approach: slow and expensive, but accurate and complete, and assembly is straightforward.
Whole-genome shotgun: much faster and cheaper, but it is very hard to get a complete genome assembly of large (>10 Mb) genomes.

Finished genome: whole chromosome sequences; done clone by clone; e.g. human, Arabidopsis
Shotgun genome: 100 kb average chunks; need physical map; e.g. poplar
Maize now: some BAC contigs; MAGIs

Shotgun sequencing
~700 bases per read; one or two reads per clone.
Shotgun sequence of mouse: ~2.6 Gb genome at 7x coverage. That’s 26,000,000 sequencing reactions and 13,000,000 minipreps…
Workflow: extract DNA, shear, ligate into library, pick clones, grow clones, extract vector DNA, sequence using ddNTPs, read fragments with gel or capillary.
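The mouse figures follow directly from the coverage arithmetic. A minimal check, assuming two reads per clone (the slide says "one or two") and one miniprep per clone; the variable names are illustrative:

```python
# Back-of-envelope shotgun sizing using the numbers on the slide.
genome_size_bp = 2_600_000_000  # mouse, ~2.6 Gb
coverage = 7                    # each base sequenced 7 times on average
read_length_bp = 700            # ~700 bases per Sanger read
reads_per_clone = 2             # assumption: both ends of each clone are read

reads_needed = genome_size_bp * coverage // read_length_bp
clones_needed = reads_needed // reads_per_clone  # one miniprep per clone

print(f"{reads_needed:,} sequencing reactions")  # 26,000,000
print(f"{clones_needed:,} minipreps")            # 13,000,000
```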

The genome factory
There are a few centers around the world that have a “factory” big enough to do shotgun sequence of a large eukaryotic genome:
Broad Institute, MIT
Baylor College of Medicine, Houston
Washington University, St Louis
DoE Joint Genome Institute, Walnut Creek, CA
Sanger Centre, Cambridge
Beijing Genomics Institute, Chinese Academy of Sciences

Pictures from JGI

Qpix robot – picks colonies

Biomek – PCR / cleanup robot

PCR – 384 x 4 x 48 x 3

About 150 sequencers, at $200,000 each…

Sequence analysis

Bioinformatics
Armies of programmers and large supercomputers are necessary to assemble and annotate the sequence.

Assembly and annotation
Assembly: we have to compare those 30,000,000 sequences with each other and work out how they fit together. Nasty mathematical problem…
Annotation: when we have the sequence, we have to work out where the genes are and what they do. Mostly a computational problem, with very large databases.
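To give a feel for why assembly is hard, here is a toy greedy overlap assembler. It is a sketch only and not how production assemblers work: it assumes error-free reads from a single strand, ignores repeats and read quality, and compares every pair of reads, which would be hopeless at the scale of 30,000,000 reads. The function names and the example reads are invented for illustration.

```python
# Toy greedy assembler: repeatedly merge the two sequences with the longest
# exact suffix/prefix overlap. Illustrative only.


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if none)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads greedily until no overlap of at least min_len remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [
            c for k, c in enumerate(contigs) if k not in (best_i, best_j)
        ] + [merged]


reads = ["GTACGTTA", "GTTAGC", "ATGCGTAC"]  # hypothetical reads
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

The all-against-all comparison in the inner loops grows with the square of the number of reads, and a repeated sequence longer than a read makes the "best" overlap ambiguous. Both problems are part of what makes large shotgun assemblies difficult.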

Whole-genome resequencing
Wouldn’t it be great to have the whole genome of each line you work with? Then the whole genome would be haplotyped.
Whole plant or metazoan genomes still cost $40-50m.
NIH has targets for the cost of a human genome: $100,000 in 2010 and $1,000 in 2020. This is likely to be achieved ahead of schedule.
Human resequencing technology is likely to have a big impact on plant biology also.

Cost of sequencing is falling exponentially

DNA sequence output from ABI 377 (a gel-based sequencer)
1. Trace files (~350 KB / run)
2. Analyzed and bases called to create sequence and quality files (~2 KB / run)
3. One run is about 700 base pairs (bp)
4. Typical genome project, soybean: 6M runs so far
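Multiplying these per-run figures out gives a rough idea of the data volume for a project of that size. The calculation below uses only the numbers on the slide, treats them as exact, and uses decimal units (1 TB = 10^9 KB); the variable names are illustrative.

```python
# Rough data volume for a 6M-run Sanger project, from the per-run figures above.
runs = 6_000_000
trace_kb_per_run = 350   # raw trace file
called_kb_per_run = 2    # sequence plus quality files
bases_per_run = 700

trace_total_tb = runs * trace_kb_per_run / 1e9    # KB -> TB
called_total_gb = runs * called_kb_per_run / 1e6  # KB -> GB
total_bases = runs * bases_per_run

print(f"traces: ~{trace_total_tb:.1f} TB")         # ~2.1 TB
print(f"called data: ~{called_total_gb:.0f} GB")   # ~12 GB
print(f"raw sequence: {total_bases / 1e9:.1f} Gb") # 4.2 Gb
```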

Limits to how cheap sequencing can get using the Sanger method
~700 bases per read; one or two reads per clone.
Cost: $2 per read at high throughput, plus costs of clone generation, ~$1.
Total current lowest cost: ~$5/kb, 0.5c per Q20 base.
Workflow: extract DNA, shear, ligate into library, pick clones, grow clones, extract vector DNA, sequence using ddNTPs, read fragments with gel or capillary.
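The per-kilobase figure can be checked from the per-read costs, and "Q20" can be made concrete with the standard Phred definition, Q = -10 log10(p), where p is the probability that a base call is wrong. The sketch below assumes every base of a 700-base read is usable, which is optimistic and is one reason the slide's rounded figures are somewhat higher than the raw arithmetic.

```python
# Cost per kb from the per-read costs on the slide, plus the Phred scale.


def phred_error_probability(q: float) -> float:
    """Probability that a base call with Phred quality q is wrong."""
    return 10 ** (-q / 10)


sequencing_cost_per_read = 2.00  # USD, high throughput
clone_cost_per_read = 1.00       # USD, clone generation
read_length_bp = 700

cost_per_read = sequencing_cost_per_read + clone_cost_per_read
cost_per_kb = cost_per_read / read_length_bp * 1000
cents_per_base = cost_per_read / read_length_bp * 100

print(f"${cost_per_kb:.2f} per kb")                          # $4.29 per kb
print(f"{cents_per_base:.2f} cents per base")                # 0.43 cents per base
print(f"Q20 error rate: {phred_error_probability(20):.0%}")  # 1%
```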

Next-generation sequencing
A number of proprietary technologies, most based on the manipulation of microbeads and/or nanobeads, where sequencing is performed without gels or capillaries.
First on the market was a company called 454 (now Roche), now on its second generation of instruments.
454 has a major competitor in Solexa (now Illumina).
Recently AB announced its own next-generation platform, SOLiD (AB acquired Agencourt).

Next-generation sequencing approach
Extract and shear DNA
Isolate clonal molecules on beads
“Polony” amplification
Immobilize on solid support
Fluorescent or luminescent readout in situ
No E. coli, no plasmids, no freezers, no hydras, no gels, no capillaries

454 Sequencing technology

Picowell (~50 µm) technology

Sequencing by synthesis using chemiluminescence
GS20: 20 Mb of sequence for ~$5,000 in running costs.
Quality is similar to early ESTs (97-98% at best).
We have no clone information, so no read pairings.
Homopolymer…

Data output
“Flowgram file”: binary SFF format, about 250 MB per run.
Similar to a trace file: contains luminosity readings for each of 1.6M wells from a photomultiplier, for each of four bases, for each of 42 flow cycles.
Processed using the instrument’s on-board FPGA.
Others have tried to improve the software, but 454’s is still best all round.
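How a flowgram becomes a sequence can be illustrated with a very simplified base caller. This is a sketch under stated assumptions, not 454's algorithm: it assumes the nucleotides are flowed in the repeating order T, A, C, G, that the signal for each flow has already been normalized so that 1.0 corresponds to one incorporated base, and that rounding to the nearest integer is good enough. The function name and the example signal values are invented.

```python
# Simplified flowgram base calling. The flow order, the normalization, and the
# example signal values are assumptions made for illustration.

FLOW_ORDER = "TACG"  # assumed repeating nucleotide flow order


def call_bases(flowgram: list[float]) -> str:
    """Turn normalized per-flow light signals into a base sequence.

    Each signal is rounded to the nearest whole number of incorporated bases,
    so a signal near 3.0 in an A flow is called as 'AAA'.
    """
    bases = []
    for flow_index, signal in enumerate(flowgram):
        nucleotide = FLOW_ORDER[flow_index % len(FLOW_ORDER)]
        run_length = max(0, int(signal + 0.5))  # round half up, never negative
        bases.append(nucleotide * run_length)
    return "".join(bases)


# Two flow cycles (8 flows): T, A, C, G, T, A, C, G
flowgram = [1.02, 0.05, 0.98, 2.10, 0.03, 2.96, 0.10, 0.04]
print(call_bases(flowgram))  # TCGGAAA
```

Because the length of a run of identical bases is inferred from signal intensity, a reading such as 4.5 could be four or five bases. This is the usual explanation for errors in homopolymer runs with this kind of readout.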

454 “FLX”
Claimed: 100 Mb per run, 200+ base reads.
Cost: ~$12,000 / run in reagents & basic maintenance.
Ours delivered Tues June 12; no data yet.

1Gb of sequence for < $3,000 in running costs

Data output
No access to data yet. Reportedly:
A series of huge image files; each is color.
Analysis uses image analysis techniques.
Raw data output is ~500 GB per run.
Current customers say compute infrastructure cannot cope: 100s of CPU hours to process one run.
Raw data currently must be discarded.

Polony sequencing / ABI SOLiD
George Church’s group invented the “polony” method; it has since been developed by Agencourt, which was then bought by ABI.
Similar to Solexa: no wells, small beads, 4-color fluorescent detection, about 1 Gb per run, about $3,000 per run.
Uses ligation of nucleotide-specific probes rather than reversible terminators.

Summary of NGS technologies
