DNA Sequencing.

Slides:



Advertisements
Similar presentations
Connection Between Alignment and HMMs. A state model for alignment -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC--- TAG-CTATCAC--GACCGC-GGTCGATTTGCCCGACC IMMJMMMMMMMJJMMMMMMJMMMMMMMIIMMMMMIII.
Advertisements

Physical Mapping I CIS 667 February 26, Physical Mapping A physical map of a piece of DNA tells us the location of certain markers  A marker is.
DNA Sequencing. CS273a Lecture 3, Spring 07, Batzoglou DNA sequencing How we obtain the sequence of nucleotides of a species …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT.
DNA Sequencing. Next few topics DNA Sequencing  Sequencing strategies Hierarchical Online (Walking) Whole Genome Shotgun  Sequencing Assembly Gene Recognition.
DNA Sequencing and Assembly
DNA Sequencing.
DNA Sequencing. CS262 Lecture 9, Win07, Batzoglou DNA sequencing How we obtain the sequence of nucleotides of a species …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT.
DNA Sequencing and Assembly. DNA sequencing How we obtain the sequence of nucleotides of a species …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGACTACGTTTTA.
DNA Sequencing.
DNA Sequencing. DNA sequencing How we obtain the sequence of nucleotides of a species …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGACTACGTTTTA TATATATATACGTCGTCGT.
Conditional Random Fields 1 2 K … 1 2 K … 1 2 K … … … … 1 2 K … x1x1 x2x2 x3x3 xKxK 2 1 K 2.
DNA Sequencing. DNA sequencing … ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGACTACGTTTTA TATATATATACGTCGTCGT ACTGATGACTAGATTACAG ACTGATTTAGATACCTGAC.
DNA Sequencing. CS262 Lecture 9, Win06, Batzoglou DNA sequencing How we obtain the sequence of nucleotides of a species …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT.
CS273a Lecture 1, Autumn 10, Batzoglou DNA Sequencing.
CS273a Lecture 2, Autumn 10, Batzoglou DNA Sequencing (cont.)
DNA Sequencing. CS273a Lecture 3, Autumn 08, Batzoglou DNA sequencing How we obtain the sequence of nucleotides of a species …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT.
CS273a Lecture 4, Autumn 08, Batzoglou DNA Sequencing.
DNA Sequencing. Next few topics DNA Sequencing  Sequencing strategies Hierarchical Online (Walking) Whole Genome Shotgun  Sequencing Assembly Gene Recognition.
Genetic technology Unit 4 Chapter 13.
DNA basics DNA is a molecule located in the nucleus of a cell Every cell in an organism contains the same DNA Characteristics of DNA varies between individuals.
DNA Technology- Cloning, Libraries, and PCR 17 November, 2003 Text Chapter 20.
Objective 2: TSWBAT describe the basic process of genetic engineering and the applications of it.
CHAPTER 20 BIOTECHNOLOGY: PART I. BIOTECHNOLOGY Biotechnology – the manipulation of organisms or their components to make useful products Biotechnology.
The Clone Age Human Genome Project Recombinant DNA Gel Electrophoresis DNA fingerprints
Graphs and DNA sequencing CS 466 Saurabh Sinha. Three problems in graph theory.
Applications of DNA technology
Technological Solutions. In 1977 Sanger et al. were able to work out the complete nucleotide sequence in a virus – (Phage 0X174) This breakthrough allowed.
Manipulating DNA.
Biotechnology Methods Producing Recombinant DNAProducing Recombinant DNA Locating Specific GenesLocating Specific Genes Studying DNA SequencesStudying.
Restriction Nucleases Cut at specific recognition sequence Fragments with same cohesive ends can be joined.
CS273a Lecture 4, Autumn 08, Batzoglou CS273a 2011 DNA Sequencing.
Genome sequencing Haixu Tang School of Informatics.
A Sequenciação em Análises Clínicas Polymerase Chain Reaction.
DNA Technology Chapter 12. Transgenic Organisms Contain recombinant DNA – Nucleotide sequences from 2+ different sources Cells express original AND newly.
Review from last week. The Making of a Plasmid Plasmid: - a small circular piece of extra-chromosomal bacterial DNA, able to replicate - bacteria exchange.
Chapter 5: Exploring Genes and Genomes Copyright © 2007 by W. H. Freeman and Company Berg Tymoczko Stryer Biochemistry Sixth Edition.
DNA TECHNOLOGY AND GENOMICS CHAPTER 20 P
Stratton Nature 45: 719, 2009 Evolution of DNA sequencing technologies to present day DNA SEQUENCING & ASSEMBLY.
DNA Technology Chapter 11. Genetic Technology- Terms to Know Genetic engineering- Genetic engineering- Recombinant DNA- DNA made from 2 or more organisms.
CS273a Lecture 4, Autumn 08, Batzoglou CS273a 2013 DNA Structure.
Human Genome.
GENE SEQUENCING. INTRODUCTION CELL The cells contain the nucleus. The chromosomes are present within the nucleus.
KEY CONCEPT Biotechnology relies on cutting DNA at specific places.
Biotechnology and Genomics Chapter 16. Biotechnology and Genomics 2Outline DNA Cloning  Recombinant DNA Technology ­Restriction Enzyme ­DNA Ligase 
Human Genomics. Writing in RED indicates the SQA outcomes. Writing in BLACK explains these outcomes in depth.
Chapter 10: Genetic Engineering- A Revolution in Molecular Biology.
Recombinant DNA Technology. DNA replication refers to the scientific process in which a specific sequence of DNA is replicated in vitro, to produce multiple.
MOLECULAR GENETICS TECHNIQUES. Molecular Genetics Technologies i.Polymerase chain reaction ii.DNA/Genomic sequencing iii.Gel electrophoresis iv.Restriction.
Chapter 20 DNA Technology and Genomics. Biotechnology is the manipulation of organisms or their components to make useful products. Recombinant DNA is.
Whole Genome Sequencing (Lecture for CS498-CXZ Algorithms in Bioinformatics) Sept. 13, 2005 ChengXiang Zhai Department of Computer Science University of.
Biotechnology You Will Learn About… Transformation Cloning DNA Fingerprinting by Restriction Fragment Length Polymorphism (RFLP) What is the name of the.
Genetic Changes  Humans have changed the genetics of other species for thousands of years by selective breeding  Causing Artificial Selection  Natural.
Title: Studying whole genomes Homework: learning package 14 for Thursday 21 June 2016.
Cse587A/Bio 5747: L2 1/19/06 1 DNA sequencing: Basic idea Background: test tube DNA synthesis DNA polymerase (a natural enzyme) extends 2-stranded DNA.
Biotechnology.
DNA Sequencing.
DNA Sequencing Project
DNA Sequencing -sayed Mohammad Amin Nourion -A’Kia Buford
DNA Technologies (Introduction)
21.8 Recombinant DNA DNA can be used in
Alu insert, PV92 locus, chromosome 16
Human Cells Human genomics
Chapter 14 Bioinformatics—the study of a genome
DNA Sequencing The DNA from the genome is chopped into bits- whole chromosomes are too large to deal with, so the DNA is broken into manageably-sized overlapping.
CHAPTER 12 DNA Technology and the Human Genome
DNA and the Genome Key Area 8a Genomic Sequencing.
A Sequenciação em Análises Clínicas
CSCI 1810 Computational Molecular Biology 2018
Introduction to Sequencing
Tools for Molecular Biology
Presentation transcript:

DNA Sequencing

DNA sequencing How we obtain the sequence of nucleotides of a species …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGACTACGTTTTA TATATATATACGTCGTCGT ACTGATGACTAGATTACAG ACTGATTTAGATACCTGAC TGATTTTAAAAAAATATT…

Human Genome Project 1990: Start 2000: Bill Clinton: 2001: Draft “most important scientific discovery in the 20th century” 2000: Bill Clinton: 2001: Draft 3 billion basepairs 2003: Finished $3 billion now what?

Which representative of the species? Which human? Answer one: Answer two: it doesn’t matter Polymorphism rate: number of letter changes between two different members of a species Humans: ~1/1,000 Other organisms have much higher polymorphism rates Population size! Just beneath the surface of the ocean floats a tiny egg. Within it's hull of follicle cells the embryo of Ciona, a sea squirt or Ascidian, is beginning to develop. The shape of the egg does not give any clue of the creature that it is going to be. Now about a third of a millimeter it is waiting to become a larva, and this larva is a rather surprising creature. The larva is developing within a few weeks. A head and a tail are evolving. When you look carefully you can see a rod that stiffens the tail. We know such a device as a notochord. A trained zoologist would identify the animal as belonging to the Phylum Chordata, and that is the same group we humans belong to. (See footnote). Also the internal organs are beginning to show. But the image is a bit deceiving. What seems to be an eye is in fact a device for equilibrium called the 'Otolith'. Above it we find the light sense organ, the 'Ocellus' So there is something suspicious about these so-called sea squirts. Freed from it's shell a familiar form has appeared. Clearly the resemblance with a tadpole larva is seen. The little creature swims for a few hours to find a good spot somewhere on a solid surface. Then a surprising thing happens. This larva is not going to evolve into a fish, amphibian or anything like that. With the front of it's head it attaches itself to a surface. Within minutes resorption of the larva tail commences and the sea squirt will stay on that same spot for all it's life

Why humans are so similar Out of Africa N A small population that interbred reduced the genetic variation Out of Africa ~ 40,000 years ago Heterozygosity: H H = 4Nu/(1 + 4Nu) u ~ 10-8, N ~ 104  H ~ 410-4

There is never “enough” sequencing Somatic mutations (e.g., HIV, cancer) 100 million species Sequencing is a functional assay 7 billion individuals

Sequencing Growth Cost of one human genome 2004: $30,000,000 2004: $30,000,000 2008: $100,000 2010: $10,000 2014: “$1,000” (???) ???: $300 How much would you pay for a smartphone?

Ancient sequencing technology – Sanger Vectors DNA Shake DNA fragments Known location (restriction site) Vector Circular genome (bacterium, plasmid) + =

Ancient sequencing technology – Sanger Gel Electrophoresis Start at primer (restriction site) Grow DNA chain Include dideoxynucleoside (modified a, c, g, t) Stops reaction at all possible points Separate products with length, using gel electrophoresis

Fluorescent Sanger sequencing trace Lane signal (Real fluorescent signals from a lane/capillary are much uglier than this). A bunch of magic to boost signal/noise, correct for dye-effects, mobility differences, etc, generates the ‘final’ trace (for each capillary of the run) Trace Recombinant DNA: Genes and Genomes. 3rd Edition (Dec06). WH Freeman Press. Slide Credit: Arend Sidow

Making a Library (present) shear to ~500 bases put on linkers Left handle: amplification, sequencing “Insert” Right handle: amplification, sequencing eventual forward and reverse sequence PCR to obtain preparative quantities size selection on preparative gel Final library (~600 bp incl linkers) after size selection Slide Credit: Arend Sidow

Library Library is a massively complex mix of -initially- individual, unique fragments Library amplification mildly amplifies each fragment to retain the complexity of the mix while obtaining preparative amounts (how many-fold do 10 cycles of PCR amplify the sample?) Slide Credit: Arend Sidow

Fragment vs Mate pair (‘jumping’) (Illumina has new kits/methods with which mate pair libraries can be built with less material) Slide Credit: Arend Sidow

Illumina cluster concept Slide Credit: Arend Sidow

Cluster generation (‘bridge amplification’) Slide Credit: Arend Sidow

Clonally Amplified Molecules on Flow Cell Slide Credit: Arend Sidow

Illumina Sequencing: Reversible Terminators fluorophore Detection O HN N 3’ DNA Incorporate Ready for Next Cycle O DNA HN N 3’ free 3’ end OH Deblock and Cleave off Dye cleavage site O HN O N PPP O 3’ 3’ OH is blocked Slide Credit: Arend Sidow

Sequencing by Synthesis, One Base at a Time 3’- …-5’ G T C T A G A G A C T T C T G Cycle 1: Add sequencing reagents First base incorporated Remove unincorporated bases Detect signal Cycle 2-n: Add sequencing reagents and repeat Slide Credit: Arend Sidow

Read Mapping Slide Credit: Arend Sidow

Variation Discovery Slide Credit: Arend Sidow

Amount of variation – types of lesions Mutation Types 1000 Genomes consortium pilot paper, Nature, 2010 3,000,000 “we’re heterozygous in every thousandth base of our genome” Slide Credit: Arend Sidow

Method to sequence longer regions genomic segment cut many times at random (Shotgun) Get one or two reads from each segment ~900 bp ~900 bp

Two main assembly problems De Novo Assembly Resequencing

Reconstructing the Sequence (De Novo Assembly) reads Cover region with high redundancy Overlap & extend reads to reconstruct the original genomic region

Definition of Coverage Length of genomic segment: G Number of reads: N Length of each read: L Definition: Coverage C = N L / G How much coverage is enough? Lander-Waterman model: Prob[ not covered bp ] = e-C Assuming uniform distribution of reads, C=10 results in 1 gapped region /1,000,000 nucleotides

Repeats Bacterial genomes: 5% Mammals: 50% Repeat types: Low-Complexity DNA (e.g. ATATATATACATA…) Microsatellite repeats (a1…ak)N where k ~ 3-6 (e.g. CAGCAGTAGCAGCACCAG) Transposons SINE (Short Interspersed Nuclear Elements) e.g., ALU: ~300-long, 106 copies LINE (Long Interspersed Nuclear Elements) ~4000-long, 200,000 copies LTR retroposons (Long Terminal Repeats (~700 bp) at each end) cousins of HIV Gene Families genes duplicate & then diverge (paralogs) Recent duplications ~100,000-long, very similar copies

Sequencing and Fragment Assembly AGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCT 3x109 nucleotides 50% of human DNA is composed of repeats Error! Glued together two distant regions

What can we do about repeats? Two main approaches: Cluster the reads Link the reads

What can we do about repeats? Two main approaches: Cluster the reads Link the reads

What can we do about repeats? Two main approaches: Cluster the reads Link the reads