“First generation” sequencing technologies and genome assembly


1 “First generation” sequencing technologies and genome assembly
Roger Bumgarner Associate Professor, Microbiology, UW

2 Overview
How to sequence any DNA
How to sequence a lot of DNA
What have we learned from 20 years of the genome project?
What’s next?

3 Intended outcomes
An understanding of:
The process of DNA sequencing
The types/rates of errors in DNA sequence data
A historical perspective of genome sequencing
An understanding of the outcomes of the genome project and the post-genome challenges
An introduction to some of the related ethical issues

4 Automated DNA Sequencing

5 Goal - To Read the Sequence of the Basepairs in a region of DNA

6 DNA Structure

7 DNA Sequencing: Process Overview
Generation of a nested set of fragments
Separation of the fragments
Detection
Analysis or base calling

8 Maxam-Gilbert Sequencing

9 DNA Replication helicase 5’ 3’ 5’ single stranded DNA binding proteins
primosome primase 3’ 5’ 3’ 5’ 3’ 5’ 3’ 5’ replicating DNA polymerase III active sites ligase RNA primer DNA polymerase I

10 The 3’ hydroxyl group is the point of attachment of the next base
What happens if the 3’ OH is not there? X

11 Sanger Sequencing

12 An “AutoRad” of a Sequencing Gel
ACGT

13 With 4 colors, all reactions can be run in one lane
Label each with a different color
Mix all reactions prior to loading

14 The Principle of 4-color Fluorescent DNA Sequencing

15 The Perkin Elmer/ABI 373 Fluorescence Based DNA Sequencer

16 A Sequencing Gel Image

17 Automated DNA Sequencing
[Figure: nested fragments A, AC, ACG, ACGT, ACGTT separated on a gel between the - and + electrodes and read as ACGTT….]

The technology for four-color automated DNA sequencing was developed in Dr. Leroy Hood’s lab at the California Institute of Technology in the early to mid 1980s. The DNA to be sequenced is copied in such a way as to generate fragments of all lengths starting at one end. Furthermore, the fragments are labeled with a fluorescent “tag” whose color depends on the base at which the fragment ends, e.g. green for fragments that end on A, blue for fragments that end on C.

The fragments are generated as a mixture. They are then separated by size in a process called gel electrophoresis. The mixture is placed on top of a gel, which acts like a sieve. A voltage is applied to drive the mixture through the gel (DNA is negatively charged and will travel towards the “plus” end of the gel). Small fragments travel through the gel more quickly than large fragments. A scanner near the bottom of the gel reads the fluorescence and stores the information in a computer. Software then processes this information to yield the sequence of bases in the original DNA.

Automated sequencing helped build the company Applied BioSystems Inc. (ABI) into a large and powerful supplier of technology to the biotech industry. ABI was acquired by Perkin Elmer in 199_.
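The read-out logic described in these notes can be sketched in a few lines of Python. This is only a toy model, not the instrument software: the function names are made up for illustration, each fragment is represented as a (sequence, terminal base) pair, and "electrophoresis" is reduced to sorting by length.

```python
import random


def nested_fragments(product):
    """Return every prefix of the synthesized strand with its terminal base.

    The terminal base stands in for the dye color on the terminator.
    """
    return [(product[:i], product[i - 1]) for i in range(1, len(product) + 1)]


def read_gel(fragments):
    """Order fragments from shortest to longest and read their labels.

    Short fragments reach the scanner first, so the labels appear in
    sequence order.
    """
    ordered = sorted(fragments, key=lambda frag: len(frag[0]))
    return "".join(label for _, label in ordered)


fragments = nested_fragments("ACGTT")
random.shuffle(fragments)   # the reaction yields an unordered mixture
print(read_gel(fragments))  # prints ACGTT
```

The sort is the whole trick: because every fragment starts at the same end, its length alone fixes the position of its labeled base.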

18 Gel Analysis Process
Lane finding - look for local correlations in the vertical dimension
Lane extraction - sum up pixels across the lanes, straighten if necessary
Transform from wavelength domain to concentration domain
Apply mobility and spacing correction
Filter noise from data - low and high pass filters
Find and identify peaks - numerical derivative
Output called data

19 Raw Sequencing Data

20 Idealized Dye Spectra

21 Actual Dye Spectra

22 “Chromaticity” Transformation
Measured - Signal in four filters (channels)
Want - Signal in four concentrations, [dye] = [fragments]
4 equations, 4 unknowns:
I_1 = a_1[A] + c_1[C] + g_1[G] + t_1[T]
I_2 = a_2[A] + c_2[C] + g_2[G] + t_2[T]
I_3 = a_3[A] + c_3[C] + g_3[G] + t_3[T]
I_4 = a_4[A] + c_4[C] + g_4[G] + t_4[T]
Matrix formulation: I = X · Conc, hence Conc = X^{-1} · I
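As a rough illustration of the 4 equations / 4 unknowns step, the sketch below solves the system with plain Gauss-Jordan elimination. The crosstalk matrix values are invented for the example (real coefficients come from calibrating the dyes and filters), and `solve_linear` is a hypothetical helper, not part of any sequencer software.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("crosstalk matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Hypothetical crosstalk matrix: rows are filter channels 1-4,
# columns are the contributions of the A, C, G and T dyes.
X = [
    [1.00, 0.30, 0.05, 0.00],
    [0.20, 1.00, 0.30, 0.05],
    [0.05, 0.20, 1.00, 0.30],
    [0.00, 0.05, 0.20, 1.00],
]

true_conc = [0.0, 2.0, 0.0, 0.5]  # [A], [C], [G], [T] at one scan point
measured = [sum(x * c for x, c in zip(row, true_conc)) for row in X]

recovered = solve_linear(X, measured)
print([round(v, 6) for v in recovered])
```

In practice the same matrix is reused for every scan point, so inverting it once and applying the inverse repeatedly would be the more economical choice.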

23 Gel Analysis Process
Lane finding - look for local correlations in the vertical dimension
Lane extraction - sum up pixels across the lanes, straighten if necessary
Transform from wavelength domain to concentration domain
Apply mobility and spacing correction
Filter noise from data - low and high pass filters
Find and identify peaks - numerical derivative
Output called data
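The "find and identify peaks - numerical derivative" step might look roughly like the following. It is a simplified sketch that assumes the four traces are already in the concentration domain, mobility-corrected and noise-filtered; the threshold and the synthetic traces are illustrative assumptions.

```python
def find_peaks(trace, threshold=0.5):
    """Return indices where the first difference turns from positive to non-positive."""
    peaks = []
    for i in range(1, len(trace) - 1):
        rising = trace[i] - trace[i - 1] > 0
        falling = trace[i + 1] - trace[i] <= 0
        if rising and falling and trace[i] > threshold:
            peaks.append(i)
    return peaks


def call_bases(traces, threshold=0.5):
    """Merge the peaks of all four traces and read the bases in scan order."""
    calls = []
    for base, trace in traces.items():
        for index in find_peaks(trace, threshold):
            calls.append((index, base))
    return "".join(base for _, base in sorted(calls))


def synthetic_trace(length, centers):
    """Build a toy trace with a small triangular peak at each center."""
    trace = [0.0] * length
    for c in centers:
        trace[c - 1], trace[c], trace[c + 1] = 0.4, 1.0, 0.4
    return trace


traces = {
    "A": synthetic_trace(17, [2]),
    "C": synthetic_trace(17, [5]),
    "G": synthetic_trace(17, [8]),
    "T": synthetic_trace(17, [11, 14]),
}
print(call_bases(traces))  # prints ACGTT
```

Real base callers also have to handle overlapping peaks, uneven spacing and declining signal late in the read, none of which this sketch attempts.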

24 Processed Electropherogram

25 Higher Voltages Produce Faster rates of Electrophoresis
Speed is proportional to voltage (V)
Current (I) depends on the resistance of the gel: I = V/R
Power in watts is P = V*I, and it is dissipated as heat in the gel
Thinner gels give higher R, so they draw less current and generate less heat at a given voltage. Hence, thin or otherwise small gels must be used for higher voltages.
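A small worked calculation makes the trade-off concrete. The voltages and resistances below are hypothetical round numbers chosen only to show the scaling, and `joule_heating_watts` is an illustrative helper name.

```python
def joule_heating_watts(voltage, resistance):
    """Power dissipated in the gel: P = V * I with I = V / R."""
    current = voltage / resistance
    return voltage * current


# Hypothetical values: a thick slab gel and a thin gel with 4x the resistance.
thick = joule_heating_watts(voltage=1000, resistance=50_000)
thin_same_voltage = joule_heating_watts(voltage=1000, resistance=200_000)
thin_double_voltage = joule_heating_watts(voltage=2000, resistance=200_000)

print(thick, thin_same_voltage, thin_double_voltage)
```

With these numbers the thin gel can run at twice the voltage, and so roughly twice the speed, for the same heat load as the thick gel, because P = V^2 / R.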

26

27 8 Capillary Array

28 Beckman CEQ 8000 DNA Sequencer
8 Capillary Array
Linear polyacrylamide separation matrix
4-color terminator sequencing chemistry
Windows NT based operating system

29 Beckman CEQ 8000

30 Different Labeling Chemistries can be used
Dye Primer - dye is attached to the 5’ end of the sequencing primer.
Dye Terminator - dye is attached to the ddNTP; allows all 4 reactions to be run in the same tube.
Internal Labeling - dye is attached to a dNTP; signal per molecule increases with length.

31 Large Scale Sequencing

32 The (Human) Genome Project.
The ultimate goal of the Human Genome Project is to decode, letter by letter, the exact sequence of the roughly 3 billion nucleotide bases that make up the human genome. Just a single misplaced letter can be sufficient to cause disease.
GCTTACTGAGTACATGTGCTAATCGT
3,400,000,000 letters total

33 The (Human) Genome Project.
Begun in 1990 with a 15-year budget of $3.0B overall.
Goals:
To obtain the sequences of human and model organisms - E. coli, Drosophila (fruit fly), C. elegans (a worm), yeast, mouse
To develop the necessary technologies to obtain the above.

34 Sizes and status of a sampling of Genomes

35 Overview of the goal

36 How do we begin to analyze a genome?
We want DNA sequence for the entire genome (3.5 Gbp for human, 4 Mbp for a bacterium).
Sequencing allows one to read about 750 base pairs/sample.
We need a method to sequence bigger pieces.
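To get a feel for the scale, the arithmetic can be written out as below. The genome sizes (3.5 billion bp for human, 4 million bp for a bacterium) and the 750 bp read length come from the slide; the `coverage` parameter is an added assumption for illustration, since random reads overlap and each base must be sampled several times.

```python
def reads_needed(genome_bp, read_bp=750, coverage=1):
    """Minimum number of reads to sample the genome `coverage` times (rounded up)."""
    total_bp = genome_bp * coverage
    return -(-total_bp // read_bp)  # integer ceiling division


human_bp = 3_500_000_000
bacterium_bp = 4_000_000

print(reads_needed(human_bp))       # end-to-end tiling, no overlap
print(reads_needed(bacterium_bp))
print(reads_needed(human_bp, coverage=8))  # hypothetical 8-fold redundancy
```

Even the idealized no-overlap case needs millions of reads for a human-sized genome, which is why the strategies for organizing those reads matter.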

37 Primer Walking Vector Clone to sequence Primer Sequence New Primer
Repeat
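Primer walking can be simulated as a loop in which each new primer is taken from the end of the sequence known so far. This is a sketch under simplifying assumptions: reads are error-free, the primer region is counted as part of the read, and the short read and primer lengths are chosen only to keep the example small.

```python
import random


def primer_walk(clone, read_length=750, primer_length=20):
    """Sequence a clone by repeated walking; return (sequence, number of reads)."""
    if read_length <= primer_length:
        raise ValueError("read_length must exceed primer_length to make progress")
    known = clone[:read_length]  # first read, primed from the vector
    rounds = 1
    while len(known) < len(clone):
        # Design the next primer from the last bases already known ...
        start = len(known) - primer_length
        # ... and read onward from where that primer anneals.
        read = clone[start:start + read_length]
        known += read[primer_length:]
        rounds += 1
    return known, rounds


random.seed(0)
clone = "".join(random.choice("ACGT") for _ in range(2000))
sequence, rounds = primer_walk(clone)
print(sequence == clone, rounds)
```

Each round has to wait for the previous read before a primer can be made, so the number of rounds grows linearly with clone length and the rounds cannot be run in parallel.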

38 “Shotgun” sequencing
Clone to sequence: ….GTCTACCTGTACTGATCTAGC...
Copy, subclone, then sequence and “assemble”:
….GTCTACCTGTACTGATCTAGC...
…. CCTGTACTGATCTAGCATTA...
…. GTACTGATCTAGCATTACG...
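The “assemble” step can be illustrated with a naive greedy merger applied to the three reads shown on this slide. This is a teaching sketch, not how production assemblers work: it assumes error-free reads from one strand, ignores repeats, and the `min_len` cutoff is an arbitrary choice.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps; leave separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = [
    "GTCTACCTGTACTGATCTAGC",
    "CCTGTACTGATCTAGCATTA",
    "GTACTGATCTAGCATTACG",
]
print(greedy_assemble(reads))  # prints ['GTCTACCTGTACTGATCTAGCATTACG']
```

Greedy merging is easily misled when the same sequence occurs at more than one place in the genome, which is why repeats are a central concern for whole-genome shotgun approaches.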

39 Shotgun vs. walking

40 Methods for very large scale sequencing
A hierarchical approach
  Map on a large scale (physical mapping), then sequence specific clones whose position in the genome is known
Shotgun sequencing
  “Tear up” the genome and sequence random fragments until it is done
Sequence tagged connectors (STC)
  Sequence the ends of many clones and use this information to pick overlapping clones

41 Making a genomic “library”
[Figure: cells, isolate DNA, fragment DNA, clone, giving the “library”]

42 Library Types
Chromosome-specific libraries - chromosomes can be sorted from one another based on size and GC content.
Genomic libraries - made from the entire genome.
Large insert/small insert: combination of vector choice (YAC, BAC, plasmid, M13), fragmentation method (enzymatic, shearing, sonication), and size selection (by gel or other method).

43 Another view of a library
Multiple copies of the genome (stretched out)
Randomly fragment and clone
Can we order these fragments relative to one another?

44 Restriction Enzymes - 1970 Copyright 1998 Access Excellence

45 Physical Mapping : Digest and look for common features in clones
B A B

46 Repeat many times to construct a physical map
Pick a “minimal tiling path”
Sequence these mapped clones (typically by the shotgun method).
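Once clones have been placed on a physical map, picking a minimal tiling path is essentially an interval-cover problem. The sketch below uses the standard greedy rule (always take the clone that reaches furthest among those touching the covered region); the clone names and coordinates are invented, and a real project would also require a minimum overlap between neighbors.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Greedy interval cover over mapped clones given as name -> (start, end)."""
    path = []
    covered = region_start
    remaining = dict(clones)
    while covered < region_end:
        candidates = {
            name: span for name, span in remaining.items()
            if span[0] <= covered < span[1]
        }
        if not candidates:
            raise ValueError(f"gap in the map at position {covered}")
        best = max(candidates, key=lambda name: candidates[name][1])
        path.append(best)
        covered = candidates[best][1]
        del remaining[best]
    return path


# Hypothetical map positions in kbp.
clones = {
    "A": (0, 150), "B": (50, 180), "C": (100, 300), "D": (140, 260),
    "E": (250, 420), "F": (290, 400), "G": (380, 500),
}
print(minimal_tiling_path(clones, 0, 500))  # prints ['A', 'C', 'E', 'G']
```

Here four of the seven clones cover the region, so only those four would go on to shotgun sequencing.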

47 Path that was used for genome sequencing
YACs - map (Mbp)
BACs or cosmids - map (200 kbp)
M13, plasmid - sequence (kbp)

48 “Shotgun” the genome
Genome to sequence
Subclone
Sequence and “assemble”:
….GTCTACCTGTACTGATCTAGC...
…. CCTGTACTGATCTAGCATTA...
…. GTACTGATCTAGCATTACG...

49 Sequence tagged connectors (STC)
Genome to sequence
Subclone
Sequence the ends and store them in a database
Sequence a clone, then look for overlaps in the database

50 Which method?
Whole genome shotgun
  Very successful for bacteria
  Celera’s approach to the human genome, but what about repeats?
Physical mapping
  Traditional method
STC
  Hybrid method, not as difficult as physical mapping; can resolve some issues with repeats.

51 Issues with genome sequencing
Whose genome?
Quality
Contiguity
Publication
Patenting

52 What are the fruits of genome sequencing?
A nearly complete list of genes
A reference against which to compare other sequences
  Identify polymorphisms in the population
Comparative genomics
  Identify highly conserved regions
  Evolutionary inferences
A tremendously enabling reagent resource
  PCR primers
  Microarrays for expression
  SNPs for genetic mapping

