Sequence the 3 billion base pairs of human DNA


Sequence the 3 billion base pairs of human DNA and identify the 100,000 genes contained in the human genome

Goals of the Human Genome Project
1. Sequence (genome size in base pairs):
- Human: 3.0 x 10^9
- Mouse: 3.0 x 10^9
- Drosophila: 1.1 x 10^8
- Worm: 1.0 x 10^8
- Dictyostelium: 3.4 x 10^7
- Yeast: 1.2 x 10^7
- Bacteria: 1.0 - 5.0 x 10^6
BCM-HGSC

Goals of the Human Genome Project
2. Characterize all genes and enable studies of genetics, evolution and function.

BERMUDA 1996
- ‘Primary Genomic Sequence Should be in the Public Domain’
- ‘Primary Genomic Sequence Should be Rapidly Released’

Quality
- < 1 error/10,000 (Polymorphism rate is 1/1,000)
- No gaps or ‘mis-assemblies’
- Merit for high quality data only
- ‘Slippery Slope’ Arguments
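The quality target above can be put on a logarithmic scale. The sketch below converts an error rate to a Phred-style score, Q = -10 log10(P), and estimates the expected number of errors in a finished clone. The slides do not say which metric was used, so the Phred scale is an assumption here, and the 150 kb clone length is an illustrative value.

```python
import math


def phred_quality(error_rate: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality score."""
    if not 0.0 < error_rate <= 1.0:
        raise ValueError("error_rate must be in (0, 1]")
    return -10.0 * math.log10(error_rate)


def expected_errors(error_rate: float, length_bp: int) -> float:
    """Expected number of wrong bases in a sequence of the given length."""
    return error_rate * length_bp


FINISHED_TARGET = 1 / 10_000   # from the slide: < 1 error per 10,000 bases
POLYMORPHISM_RATE = 1 / 1_000  # from the slide: about 1 difference per 1,000 bases
CLONE_BP = 150_000             # hypothetical clone length, for illustration only

if __name__ == "__main__":
    print("Phred-style score of the finishing target:", phred_quality(FINISHED_TARGET))
    print("Expected errors per clone at the target:", expected_errors(FINISHED_TARGET, CLONE_BP))
    print("Expected polymorphic sites per clone:", expected_errors(POLYMORPHISM_RATE, CLONE_BP))
```

At the target rate, sequencing errors are about ten times rarer than real polymorphisms, which is presumably why the slide quotes the two rates side by side.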

Technology
- ABD 4 color Fluorescence
- Mapped-Clone Approach
- Random Phase
- Directed Phase
- Modular, 96 well
- Automation

3.0 Gb by Oct 2005? (Feb ’98)
[Chart; data points not recoverable]

May 98: P/E ‘Celera’ Scheme
- New Capillary Instrument
- >10 runs/day x 96 samples
- Total 230 Instruments
- $330M Private Funds
- Total 250,000 reads/day
- Whole Genome Shotgun
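The throughput figures on this slide can be checked with a little arithmetic. The sketch below multiplies out instruments, runs and samples, then estimates how long 10X coverage of a 3.0 Gb genome would take. The 10X coverage and 500 bp read length are taken from other slides in this deck. Treating every read as successful and full-length is a simplifying assumption.

```python
INSTRUMENTS = 230            # from the slide
RUNS_PER_DAY = 10            # the slide says ">10", so this is a lower bound
SAMPLES_PER_RUN = 96         # from the slide
QUOTED_READS_PER_DAY = 250_000

GENOME_BP = 3_000_000_000    # 3.0 Gb
COVERAGE = 10                # "10 X coverage", quoted elsewhere in the deck
READ_LENGTH_BP = 500         # "500 bp reads", quoted elsewhere in the deck


def reads_per_day(instruments: int, runs_per_day: int, samples_per_run: int) -> int:
    """Reads produced per day if every sample yields one read."""
    return instruments * runs_per_day * samples_per_run


def reads_needed(genome_bp: int, coverage: int, read_length_bp: int) -> int:
    """Number of reads whose total length equals coverage x genome size (rounded up)."""
    return -(-genome_bp * coverage // read_length_bp)


def days_to_coverage(total_reads: int, daily_reads: int) -> float:
    """Sequencing days needed at a constant daily read rate."""
    return total_reads / daily_reads


if __name__ == "__main__":
    floor_rate = reads_per_day(INSTRUMENTS, RUNS_PER_DAY, SAMPLES_PER_RUN)
    total = reads_needed(GENOME_BP, COVERAGE, READ_LENGTH_BP)
    print("Reads/day at exactly 10 runs/day:", floor_rate)
    print("Reads needed for 10X:", total)
    print("Days at the quoted 250,000 reads/day:", days_to_coverage(total, QUOTED_READS_PER_DAY))
```

At exactly 10 runs per day the product is 220,800 reads, a little under the quoted 250,000, which fits the ">10" on the slide. Under these assumptions the raw sequencing takes well under a year of instrument time, so the three-year schedule presumably leaves room for ramp-up, failed reads and assembly.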

P/E ‘Celera’ Scheme: Release Policy
- ‘Public Release’, 3 months Delay
- Consensus sequence only
- All SNPs held
- Drosophila, Mouse

Regional mapping

Regional mapping: minimal tiling path selected for sequencing.
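Choosing a minimal tiling path can be viewed as an interval-cover problem: pick the fewest mapped clones that together span the region. The sketch below uses the standard greedy approach. The `Clone` type, the coordinates and the function name are hypothetical, and real tiling-path selection also weighs overlap size and clone quality, which this ignores.

```python
from typing import List, NamedTuple


class Clone(NamedTuple):
    name: str
    start: int  # map position of the left end (e.g. in kb)
    end: int    # map position of the right end


def minimal_tiling_path(clones: List[Clone], region_start: int, region_end: int) -> List[Clone]:
    """Greedy interval cover: at each step take the clone that starts at or
    before the covered position and reaches furthest to the right."""
    ordered = sorted(clones, key=lambda c: c.start)
    path: List[Clone] = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Scan every not-yet-seen clone that begins inside the covered part.
        while i < len(ordered) and ordered[i].start <= covered_to:
            if best is None or ordered[i].end > best.end:
                best = ordered[i]
            i += 1
        if best is None or best.end <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best)
        covered_to = best.end
    return path


if __name__ == "__main__":
    # Hypothetical clone positions on a 500 kb region.
    library = [
        Clone("A", 0, 150), Clone("B", 50, 200), Clone("C", 100, 300),
        Clone("D", 180, 350), Clone("E", 290, 450), Clone("F", 320, 500),
    ]
    print([c.name for c in minimal_tiling_path(library, 0, 500)])
```

With these made-up coordinates the path is A, C, E, F. Clones B and D are redundant because a neighbour already reaches further.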

Restriction fragment fingerprinting
- BAC clones are grown in 96-well format
- Hind III digest
- 1% agarose
- Molecular weight marker every 5th lane
[Gel image; size labels: >20 kbp, ~300 bp]

Contig assembly
- FPC*
- Overlap identification by restriction pattern similarities
- Facilitated contig assembly
*Sanger Centre; C. Soderlund, I. Longden and R. Mott
[Figure: clones A, B, C, D, E, F, G; legend: insert fragments, vector fragments]
* All restriction fragments within a clone selected for the tiling path must be verified by their presence in overlapping clones.
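Overlap identification by restriction pattern similarity can be illustrated by counting fragments of matching size between two fingerprints. This is a minimal sketch and not FPC's actual scoring. The function names, the 10 bp tolerance, the threshold and the fragment sizes are all assumptions made for illustration. The second function mirrors the verification rule in the footnote, but ignores the distinction between insert and vector fragments.

```python
from typing import Dict, List, Tuple


def shared_bands(a: List[int], b: List[int], tolerance_bp: int = 10) -> int:
    """Count fragments of a that pair one-to-one with a fragment of b
    whose size differs by at most tolerance_bp (two-pointer scan)."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    i = j = shared = 0
    while i < len(a_sorted) and j < len(b_sorted):
        diff = a_sorted[i] - b_sorted[j]
        if abs(diff) <= tolerance_bp:
            shared += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1
        else:
            j += 1
    return shared


def likely_overlaps(fingerprints: Dict[str, List[int]], min_shared: int = 2,
                    tolerance_bp: int = 10) -> List[Tuple[str, str, int]]:
    """Report clone pairs sharing at least min_shared bands."""
    names = sorted(fingerprints)
    pairs = []
    for x in range(len(names)):
        for y in range(x + 1, len(names)):
            n = shared_bands(fingerprints[names[x]], fingerprints[names[y]], tolerance_bp)
            if n >= min_shared:
                pairs.append((names[x], names[y], n))
    return pairs


def unverified_fragments(clone: List[int], neighbours: List[List[int]],
                         tolerance_bp: int = 10) -> List[int]:
    """Fragments of a clone that are not seen in any overlapping neighbour."""
    return [f for f in clone
            if not any(abs(f - g) <= tolerance_bp for n in neighbours for g in n)]


if __name__ == "__main__":
    # Hypothetical fragment sizes in bp.
    fp = {
        "A": [500, 1200, 2300, 4100, 7800],
        "B": [1205, 2295, 4100, 9000, 12000],
        "C": [9005, 12010, 15000, 640, 3300],
    }
    print(likely_overlaps(fp))
    print(unverified_fragments(fp["B"], [fp["A"], fp["C"]]))
```

In this made-up data, B overlaps both A and C, and every fragment of B is confirmed by a neighbour, so B would pass the verification rule.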

Shotgun Sequencing I: RANDOM PHASE
- BAC Clone: 100-200 kb
- Sheared DNA: 1.0-2.0 kb
- Sequencing Templates
- Random Reads
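The random phase can be mimicked with a small simulation: place reads at uniformly random positions on a clone and look at the resulting depth. This is a rough sketch. It samples reads directly from the clone rather than from the ends of 1.0-2.0 kb subclones, and the 150 kb clone, 500 bp read length and 8X target are illustrative assumptions, not figures from the slide.

```python
import random
from typing import List


def simulate_shotgun(insert_bp: int, read_bp: int, n_reads: int, seed: int = 0) -> List[int]:
    """Return per-base read depth after placing n_reads uniformly at random."""
    rng = random.Random(seed)
    depth = [0] * insert_bp
    for _ in range(n_reads):
        start = rng.randrange(insert_bp - read_bp + 1)
        for pos in range(start, start + read_bp):
            depth[pos] += 1
    return depth


def count_gaps(depth: List[int]) -> int:
    """Number of maximal runs of uncovered (depth 0) bases."""
    gaps = 0
    in_gap = False
    for d in depth:
        if d == 0 and not in_gap:
            gaps += 1
        in_gap = d == 0
    return gaps


if __name__ == "__main__":
    clone_bp, read_bp, coverage = 150_000, 500, 8  # hypothetical values
    n_reads = clone_bp * coverage // read_bp
    depth = simulate_shotgun(clone_bp, read_bp, n_reads)
    print("reads:", n_reads)
    print("mean depth:", sum(depth) / clone_bp)
    print("uncovered bases:", depth.count(0))
    print("gaps:", count_gaps(depth))
```

Even at several-fold coverage, random placement tends to leave a few uncovered stretches and some thinly covered ones, which is what the later assembly and finishing slides deal with.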

Shotgun Sequencing II: ASSEMBLY
[Figure labels: Consensus, Low Base Quality, Single Stranded Region, Mis-Assembly (Inverted), Sequence Gap]
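The idea of building a consensus from overlapping reads can be shown with a toy greedy assembler: repeatedly merge the two sequences with the longest suffix-to-prefix overlap. This is only a sketch under strong assumptions (error-free reads, one strand, no repeats) and is not the assembler used in the project. The function names and the example reads are made up.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b
    (at least min_len long), or 0 if there is none."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Merge the best-overlapping pair until no overlap of min_len remains.
    Each string left at the end is a contig; more than one means a gap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    reads = ["ATGCGTAC", "GTACGTTA", "CGTTAGC"]
    print(greedy_assemble(reads))                    # one contig
    print(greedy_assemble(["ATGCGTAC", "CGTTAGC"]))  # middle read missing: a gap
```

Dropping the middle read leaves two contigs, the toy analogue of the "Sequence Gap" label in the figure. Inverted mis-assemblies and low-quality bases would need reverse-complement handling and quality scores, which this sketch leaves out.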

Shotgun Sequencing III: FINISHING
[Figure sequence: starting from the assembly, the problem regions are resolved one at a time (Low Base Quality, then Single Stranded Region, then Sequence Gap, then Mis-Assembly (Inverted)) until only the Consensus remains.]
High Accuracy Sequence: < 1 error/10,000 bases

Whole Genome Shotgun Sequencing
- Whole Genome: 3,000 Mb
- Sheared DNA: 1.0-2.0 kb
- Sequencing Templates
- Random Reads

Whole Genome Shotgun Sequencing: Assembly
[Figure labels: Consensus, Low Base Quality, Single Stranded Region, Mis-Assembly (Inverted), Sequence Gap; a second view shows only Sequence Gap, Low Base Quality and Consensus]

P/E ‘Celera’ Scheme: 10X coverage in three years
- Regions not covered
- Regions very densely covered
- Contigs 1.0-15 kb
- # Gaps? >100,000?
- Base Quality High or Low?
- Mis-Assemblies?
- Duplications?
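The questions about uncovered regions and gap counts can be framed with the Lander-Waterman model of random sequencing. The sketch below computes the coverage c = N L / G, the expected uncovered fraction exp(-c), and the expected number of contigs N exp(-c (1 - T/L)), where T is the minimum detectable overlap. The 500 bp read length comes from another slide in this deck. The 50 bp overlap threshold is an assumption. The model assumes uniformly random reads and no repeats, so it is an idealized baseline and does not settle the slide's question about real gap counts.

```python
import math


def coverage(n_reads: float, read_bp: float, genome_bp: float) -> float:
    """Average depth c = N * L / G."""
    return n_reads * read_bp / genome_bp


def uncovered_fraction(c: float) -> float:
    """Expected fraction of bases hit by no read: exp(-c)."""
    return math.exp(-c)


def expected_contigs(n_reads: float, read_bp: float, genome_bp: float,
                     min_overlap_bp: float = 0.0) -> float:
    """Lander-Waterman estimate N * exp(-c * sigma), with sigma = 1 - T / L."""
    c = coverage(n_reads, read_bp, genome_bp)
    sigma = 1.0 - min_overlap_bp / read_bp
    return n_reads * math.exp(-c * sigma)


if __name__ == "__main__":
    G = 3.0e9   # genome size in bp
    L = 500     # read length in bp
    N = 60e6    # reads giving 10X coverage at this read length
    print("coverage:", coverage(N, L, G))
    print("uncovered fraction:", uncovered_fraction(coverage(N, L, G)))
    print("expected contigs, T = 0:", expected_contigs(N, L, G))
    print("expected contigs, T = 50:", expected_contigs(N, L, G, min_overlap_bp=50))
```

Repeats, cloning bias and read failures are not modeled here, so real assemblies can be expected to fragment more than this estimate suggests.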

‘Complete an accurate, high quality sequence of the human genome by the end of 2003, …….a working draft can be completed…within the next three years…’
‘That (draft) sequence will be of lower accuracy and contiguity….. …will be useful for finding genes…and other features….’

Human Genome Sequencing Project: Integrating Multiple Sources of Data
- Chromosome
- Map location
- Clone
- Fingerprint
- Project XYZ ???
- Celera: random sequences, 500 bp reads, consensus (3-5 kb)
- NHGRI: mapped projects (~100 kb), 5-20 contigs (10-20 kb)
How to use Celera data in NHGRI assemblies?
Lichtarge Lab - HGSC, Baylor College of Medicine