Genome Characterization: DNA sequence (the ULTIMATE map), DNA sequencing methods, assembly/sequencing. BIO520 Bioinformatics, Jim Lund. Assigned reading: Service 2006.


1 Genome Characterization
DNA sequence: the ULTIMATE map
DNA sequencing methods
Assembly/sequencing
BIO520 Bioinformatics, Jim Lund
Assigned reading: Service 2006 review paper
Assigned listening: Eric Lander genomics lecture

2 DNA Sequence Project Size/Type
500 bases: 1 EST, STS
2500 bases: whole cDNA/EST
10 kbp: gene, virus
150 kbp: BAC, big virus
3 Mbp: bacterial genome, YAC-size
–simple
–repeats
3 Gbp: human, mouse
31 Gbp: salamander

3 Metazoan genome sizes
Nematode (Caenorhabditis elegans): 100 Mb
Thale cress (Arabidopsis thaliana): 160 Mb
Fruit fly (Drosophila melanogaster): 180 Mb
Puffer fish (Takifugu rubripes): 400 Mb
Rice (Oryza sativa): 490 Mb
Human (Homo sapiens): 3.5 Gb
Leopard frog (Rana pipiens): 6.5 Gb
Onion (Allium cepa): 16.4 Gb
Mountain grasshopper (Podisma pedestris): 16.5 Gb
Tiger salamander (Ambystoma tigrinum): 31 Gb
Easter lily (Lilium longiflorum): 34 Gb
Marbled lungfish (Protopterus aethiopicus): 130 Gb

4 DNA Sequencing Methods
Chain termination / Dideoxy / Sanger
–Fluorescence paradigm, ABI
–Main method
Next generation sequencing
–Polymerase addition sequencing
–454 Sequencing, Illumina
–Chips: Affymetrix

5 Dideoxy / Chain Terminator / Sanger
Template
Primer
Extension Chemistry
–polymerase
–termination
–labeling
Separation
Detection

6 Chain Terminator Basics
Target: template-primer (template TGCA)
Extend with labeled terminators: ddA, ddG, ddC, ddT
Terminated products: ddA, AddC, ACddG, ACGddT
dN : ddN = 100 : 1
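The chain-terminator idea on this slide can be sketched as a small simulation. This is an illustrative toy, not the lab protocol: it assumes each extension step incorporates a terminator with probability 1/101 (from the 100 : 1 dN : ddN ratio), and the function names and the example strand are hypothetical.

```python
import random
from collections import Counter

def simulate_chain_termination(new_strand, n_molecules=20_000, dn_to_ddn=100, seed=0):
    """Count terminated fragments by length.

    Assumption: at every extension step a ddNTP is incorporated with
    probability 1 / (dn_to_ddn + 1), which ends that molecule's extension.
    """
    rng = random.Random(seed)
    p_terminate = 1.0 / (dn_to_ddn + 1)
    fragments_by_length = Counter()
    for _ in range(n_molecules):
        for length in range(1, len(new_strand) + 1):
            if rng.random() < p_terminate:
                fragments_by_length[length] += 1
                break  # chain terminated; no further extension
    return fragments_by_length

def read_ladder(new_strand, fragments_by_length):
    """Read the sequence by sorting fragments by length (as separation does)
    and reporting the labeled terminal base of each length class."""
    return "".join(new_strand[length - 1] for length in sorted(fragments_by_length))

strand = "ACGTTGCAAGCTTAGGCTAC"  # hypothetical newly synthesized strand
ladder = simulate_chain_termination(strand)
print(read_ladder(strand, ladder))
```

Because termination is rare at each step, fragments of every length are produced, and sorting them by size recovers the sequence one base at a time.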

7 Electrophoresis
Sequencing reaction products
Polyacrylamide Gel Electrophoresis (PAGE)

8 DNA sequencing trace file

9 Separation
Gel Electrophoresis
Capillary Electrophoresis
–suited to automation
–rapid (2 hrs vs 12 hrs)
–re-usable
–simple temperature control
–96 well format

10 Paradigm Instrument
Applied Biosystems http://www.appliedbiosystems.com/
–ABI3730XL (2002, 96 samples, 1000 base reads, ~$350,000, higher sensitivity, lower reagent cost, ~$1/reaction)
–700 kbp / 24 hours
384 capillary sequencers
–5700 sequences / 24 hr day
–2.8 Mbp / 24 hours
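To put these throughput figures in perspective, here is a rough back-of-the-envelope sketch. It assumes a 3 Gbp genome sequenced to 10X coverage (both figures appear elsewhere in this lecture) and ignores failed reads, library preparation, and finishing; the function names are hypothetical.

```python
def days_to_sequence(genome_bp, coverage, bp_per_day, n_instruments=1):
    """Instrument-days of raw sequencing needed for a given coverage."""
    return genome_bp * coverage / (bp_per_day * n_instruments)

def implied_read_length(bp_per_day, reads_per_day):
    """Average usable bases per read implied by the daily totals."""
    return bp_per_day / reads_per_day

# 384-capillary figures from the slide: 5700 reads and 2.8 Mbp per day.
one_machine = days_to_sequence(3_000_000_000, 10, 2_800_000)
hundred_machines = days_to_sequence(3_000_000_000, 10, 2_800_000, n_instruments=100)
print(f"1 instrument:    {one_machine:,.0f} days")
print(f"100 instruments: {hundred_machines:,.0f} days")
print(f"implied read length: {implied_read_length(2_800_000, 5700):.0f} bp")
```

A single instrument would need decades at this rate, which is why large projects ran many sequencers in parallel.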

11 384-well capillary sequencing
Results are shown as an electropherogram showing a peak for each base. From the peak heights and widths, a Phred score is assigned to each individual base. A high Phred score indicates a high certainty as to the identity of that particular base.
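The relationship between a Phred score and base-call certainty follows the standard definition Q = -10 log10(P), where P is the estimated probability that the call is wrong. A minimal sketch of the conversion (the function names are hypothetical; this does not reproduce how Phred derives P from peak shapes):

```python
import math

def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred score for an estimated error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)

for q in (10, 20, 30, 40):
    p = phred_to_error_prob(q)
    print(f"Q{q}: error probability {p:.4g}, accuracy {100 * (1 - p):.2f}%")
```

Each step of 10 in the score corresponds to a tenfold drop in error probability, so Q20 means 1 error in 100 and Q30 means 1 in 1000.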

12 Sample Output 1 lane

13 1 trace = 1000 bases or less
–ABI: 1000 bp reads
–Illumina: 50-100 bp reads
–454 Sequencing: 300-400 bp reads
How do we cover a genome?
–DIVIDE AND CONQUER: assemble these short sequence fragments.
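A quick way to see what "divide and conquer" implies is to count how many reads each platform needs for a target coverage, using reads = genome size x coverage / read length. The sketch below assumes a 3 Mbp bacterial genome at 8X coverage and takes the upper end of each read-length range from the slide; the names are illustrative.

```python
def reads_needed(genome_bp, coverage, read_length_bp):
    """Smallest whole number of reads giving at least the target coverage."""
    total_bases = genome_bp * coverage
    return -(-total_bases // read_length_bp)  # integer ceiling division

# Read lengths taken from the slide (upper end of each quoted range).
platforms = {"ABI": 1000, "454 Sequencing": 400, "Illumina": 100}

for name, read_length in platforms.items():
    n = reads_needed(3_000_000, 8, read_length)
    print(f"{name:>15}: {n:,} reads of {read_length} bp")
```

Shorter reads mean proportionally more fragments to assemble, which is where the assembly problem comes from.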

14 Assembly/Trace Editing
Consed
–UNIX
EBI's Phusion
EditView (ABI PRISM)
–Mac
Chromas (free/pay versions)
–Windows

15 Sequencing Strategies
Ordered
–Divide and Conquer
Random Sequence
–Brute Force
The random approach now predominates for big projects.

16 Random Method (details for Sanger seq)
Shear DNA (nebulize)
–finish ends, ligate into vector
Produce template
Sequence to 8X – 10X coverage
–Sequence both ends of templates.
–Read length (1,000 bp typical)
–Accuracy (99% good)
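Why 8X to 10X? One common way to reason about it is the Lander-Waterman model, which treats read start positions as random (Poisson). Under that assumption, the chance that a given base is missed at coverage c is about e^(-c). The sketch below applies that approximation; it ignores cloning bias, repeats, and the minimum overlap needed to join reads, so treat the numbers as rough guides only.

```python
import math

def uncovered_fraction(coverage):
    """Poisson approximation: probability that a base is hit by no read."""
    return math.exp(-coverage)

def expected_contigs(genome_bp, read_length_bp, coverage):
    """Lander-Waterman estimate N * e^(-c), with N = c * G / L reads.
    Assumption: any overlap at all is enough to join two reads."""
    n_reads = coverage * genome_bp / read_length_bp
    return n_reads * math.exp(-coverage)

genome_bp = 3_000_000      # bacterial-size genome, as on the project-size slide
read_length_bp = 1000      # typical Sanger read length from the slide
for c in (1, 4, 8, 10):
    missed_bp = uncovered_fraction(c) * genome_bp
    contigs = expected_contigs(genome_bp, read_length_bp, c)
    print(f"{c:>2}X: ~{missed_bp:,.0f} bp uncovered, ~{contigs:,.1f} contigs expected")
```

Under these assumptions the uncovered fraction falls exponentially with coverage, so going from 1X to 8X changes the picture far more than going from 8X to 10X.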

17 Assembly Problem CONTIG

18 Contigs, Islands
[Figure labels: contigs, island]
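How overlapping reads collapse into contigs, while reads with no overlap stay separate, can be illustrated with a toy greedy merger. This is only a sketch: it assumes error-free reads from a single strand and exact suffix-prefix overlaps, and it is not the algorithm used by any particular assembler. The reads and function names are made up for illustration.

```python
def overlap_len(a, b, min_overlap):
    """Length of the longest suffix of a that equals a prefix of b,
    or 0 if it is shorter than min_overlap."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap_len(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to join
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]

# Hypothetical reads: three overlap into one contig, one stands alone.
reads = ["ACGTTG", "TTGCAA", "CAAGCT", "GGGGCCCC"]
print(sorted(greedy_assemble(reads)))
```

The three overlapping reads merge into one contig, and the read that overlaps nothing remains on its own.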

19 Assembling random sequences
[Figure labels: no coverage; only 1 strand; DISAGREEMENT (T, T, C)]
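The situations labeled on this slide (no coverage, thin coverage, and reads that disagree) can be illustrated with a simple majority-vote consensus over one alignment column. This is a deliberately naive sketch: real trace editors weigh base quality and strand as well, and the function name and the "N" convention for unresolved positions are assumptions made here.

```python
from collections import Counter

def consensus_base(column):
    """Majority-vote base for one alignment column.

    `column` holds the base each read reports at this position; "-" marks
    a read that does not cover it. Returns "N" when there is no coverage
    or when the top two bases are tied.
    """
    calls = [base for base in column if base != "-"]
    if not calls:
        return "N"  # no coverage at this position
    ranked = Counter(calls).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "N"  # disagreement with no majority
    return ranked[0][0]

print(consensus_base("TTC"))   # two reads say T, one says C
print(consensus_base("---"))   # no read covers the position
print(consensus_base("AC"))    # two reads disagree, no majority
```

With deeper coverage, an occasional miscalled base is outvoted, which is one reason for sequencing well beyond 1X.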

20 Assembly programs
Celera Assembler (Eugene Myers et al.)
Arachne (Serafim Batzoglou et al.)
PCAP (Xiaoqiu Huang, Iowa State University)
Phusion (EBI)

21 Continuing rapid improvement in sequencing technology

22 1990s: Human genome (3 Gbp): $300 million (just sequencing)
Current: Mammalian genome (3 Gbp): $1 million
Goal: $100,000 genome, 10X cheaper (and faster), likely 2012!
New goal! $1,000 genome.
UK's (University of Kentucky) sequencing center has one: http://www.uky.edu/Centers/AGTC/

23 454 Sequencing's Genome Sequencer FLX
Pyrosequencing (sequencing by detection of nucleotides added during DNA synthesis).
350-400 million bases per run (10 hrs).
400 bp sequence reads.
1,000,000 reads per run.
$6,600 per run, 60 kb/$1, or $0.0000165/bp (0.00165 cents/bp).
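The per-base cost follows directly from the run price and yield quoted on this slide. The sketch below works it through, assuming the upper yield of 400 million bases per run; it also estimates the raw run cost of covering a 3 Gbp genome at 10X, ignoring labor, sample preparation, and assembly. Function names are hypothetical.

```python
def cost_per_bp(run_cost_usd, bases_per_run):
    """Dollars per raw base for one run."""
    return run_cost_usd / bases_per_run

def runs_needed(genome_bp, coverage, bases_per_run):
    """Whole runs required to reach the target coverage."""
    return -(-genome_bp * coverage // bases_per_run)  # integer ceiling division

run_cost = 6_600             # USD per run, from the slide
bases_per_run = 400_000_000  # upper end of the 350-400 million range

per_bp = cost_per_bp(run_cost, bases_per_run)
print(f"cost per base: ${per_bp:.7f} ({bases_per_run / run_cost:,.0f} bases per dollar)")

n_runs = runs_needed(3_000_000_000, 10, bases_per_run)
print(f"3 Gbp at 10X: {n_runs} runs, about ${n_runs * run_cost:,} in run costs")
```

At 10 hours per run, the same calculation also gives a rough sense of instrument time for a genome of that size.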

