Genome Sequencing and Assembly High throughput Sequencing Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520.



2 Competing Sequencing Strategies
Clone-by-clone and whole-genome shotgun

3 Clone-by-Clone Shotgun Sequencing
E.g. Human Genome Project
Map construction
Clone selection
Subclone library construction
Random shotgun phase
Directed finishing phase and sequence authentication

4 Map Construction
Clone genomic DNA in YACs (~1 Mb) or BACs (~200 kb)
Map the relative location of clones
–Sequence-tagged site (STS, e.g. EST) mapping
–PCR or probe hybridization to screen for STSs
–Restriction site fingerprinting
Generating the physical map was the most time-consuming step for the human genome

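5 Resolve Clone Relative Location
Find a column permutation of the binary hybridization matrix (rows: clones, columns: STSs) such that all the ones in each row are located in one contiguous block
[Figure: a DNA segment with overlapping clones 1-4 and their STSs, alongside the clone-by-STS hybridization matrix]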
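The column-permutation problem on this slide can be sketched in a few lines. This is only an illustrative brute-force search, not the method used in any real mapping project: the function names and the 4 x 5 matrix are hypothetical, and the search is factorial in the number of STSs, so it is usable only on toy inputs (PQ-tree algorithms solve the same consecutive-ones problem in linear time).

```python
from itertools import permutations

def rows_are_blocks(matrix, order):
    """True if, with columns taken in `order`, the ones in every row are contiguous."""
    for row in matrix:
        positions = [i for i, col in enumerate(order) if row[col] == 1]
        if positions and positions[-1] - positions[0] + 1 != len(positions):
            return False
    return True

def find_sts_order(matrix):
    """Brute-force search for a column (STS) order with the consecutive-ones property."""
    n_cols = len(matrix[0])
    for order in permutations(range(n_cols)):
        if rows_are_blocks(matrix, order):
            return order
    return None  # no ordering works, e.g. because of hybridization errors

# Hypothetical data: 4 clones (rows) x 5 STSs (columns), columns in scrambled order
hybridization = [
    [1, 0, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1],
]
order = find_sts_order(hybridization)
print(order)  # (0, 2, 3, 1, 4)
```

The reverse order is equally valid, since hybridization data alone cannot distinguish the two orientations of the map.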

6 Clone Selection
Based on the clone map, select authentic clones to generate a minimum tiling path
–Most important criterion: authenticity
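Once clones have map coordinates, a minimum tiling path can be found with a greedy interval cover. The sketch below assumes each clone is a hypothetical `(name, start, end)` tuple in kb and that the list has already been filtered for authenticity; the clone names and coordinates are made up for illustration.

```python
def minimum_tiling_path(clones, region_start, region_end):
    """Greedy interval cover: repeatedly pick the clone that starts at or before
    the covered position and extends furthest to the right."""
    path = []
    covered = region_start
    remaining = sorted(clones, key=lambda c: c[1])  # sort by start coordinate
    i = 0
    while covered < region_end:
        best = None
        # Scan every not-yet-seen clone that overlaps or abuts the covered region
        while i < len(remaining) and remaining[i][1] <= covered:
            if best is None or remaining[i][2] > best[2]:
                best = remaining[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone map at position {covered}")
        path.append(best)
        covered = best[2]
    return path

# Hypothetical BAC clones on a 500 kb region: (name, start_kb, end_kb)
clones = [("A", 0, 180), ("B", 50, 220), ("C", 150, 360),
          ("D", 200, 380), ("E", 340, 500)]
tiling = minimum_tiling_path(clones, 0, 500)
print([name for name, _, _ in tiling])  # ['A', 'C', 'E']
```

A real selection would also weigh clone quality and the size of the overlaps, which this sketch ignores.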

7 Subclone Library Construction
DNA fragmented by sonication or restriction enzyme (RE) digestion
Fragment size ~2-5 kb

8 Random Shotgun Phase
Dideoxy termination reaction
Informatics programs
Coverage and contigs

9 Dideoxy Termination
Method invented by Fred Sanger
Automated sequencing developed by Leroy Hood (Caltech) and Michael Hunkapiller (ABI)
Sequencing primers

10 Bioinformatics Programs
Developed at the University of Washington
Phred
–Base calling
–Brent Ewing and Phil Green
Phrap
–Assembly
–Phil Green
Consed
–Viewing and editing
–David Gordon

11 Coverage and Contigs
Coverage: total sequenced bp / size of the fragment being sequenced
–E.g. a 200 kb BAC sequenced as 1000 subclone reads of 500 bp each: coverage = 1000 x 500 bp / 200 kb = 2.5X
Lander-Waterman curve
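The coverage arithmetic and the Lander-Waterman expectations can be checked numerically. This sketch assumes the standard Lander-Waterman model (reads of equal length placed uniformly at random); the 40 bp minimum detectable overlap is an assumption chosen for illustration, and the function names are hypothetical.

```python
import math

def coverage(n_reads, read_len, target_len):
    """Coverage c = N * L / G."""
    return n_reads * read_len / target_len

def fraction_uncovered(c):
    """Expected fraction of the target covered by no read: e^(-c)."""
    return math.exp(-c)

def expected_contigs(n_reads, read_len, target_len, min_overlap):
    """Lander-Waterman expected number of contigs: N * e^(-c * sigma),
    where sigma = 1 - T / L and T is the minimum detectable overlap."""
    c = coverage(n_reads, read_len, target_len)
    sigma = 1 - min_overlap / read_len
    return n_reads * math.exp(-c * sigma)

# The slide's example: 200 kb BAC, 1000 reads of 500 bp
c = coverage(1000, 500, 200_000)
print(c)                                   # 2.5
print(round(fraction_uncovered(c), 3))     # 0.082
print(round(expected_contigs(1000, 500, 200_000, min_overlap=40)))  # 100
```

At 2.5X roughly 8% of the BAC is still unsequenced and the reads fall into about 100 contigs, which is why much higher coverage is used before finishing.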

12 Directed Finishing Phase
David Gordon: auto-finish
–Design primers at both ends of each gap, PCR amplify, and sequence from the two ends until the reads meet
Sequence authentication: verify STS and RE sites
–Finished: < 1 error (or ambiguity) in 10,000 bp, in the right order and orientation along a chromosome, almost no gaps
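The finishing standard of fewer than 1 error in 10,000 bp can be related to the Phred quality scale, where Q = -10 log10(P) and P is the estimated error probability of a base call. A minimal sketch, with hypothetical function names:

```python
import math

def phred_quality(error_probability):
    """Phred score Q = -10 * log10(P)."""
    return -10 * math.log10(error_probability)

def error_probability(quality):
    """Inverse mapping: P = 10^(-Q / 10)."""
    return 10 ** (-quality / 10)

# 1 error in 10,000 bp corresponds to Q40
print(round(phred_quality(1 / 10_000)))   # 40
# A Q20 base call has a 1% chance of being wrong
print(round(error_probability(20), 4))    # 0.01
```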

13 Genome-Shotgun Sequencing
Celera human and Drosophila genomes
No physical map
Jigsaw puzzle assembly
Coverage ~7-10X

14 Shotgun Assembly
Screener
–Identify low-quality reads, contamination, and repeats
Overlapper
–>= 40 bp overlap with <= 6% mismatches
Unitigger
–Combine the easy (unique assembly) subset first
Scaffolder & repeat resolution
–Generate clone libraries of different insert sizes, and sequence just the clone ends (read pairs)
–Use physical map information if available
Consensus: ..ACGATTACAATAGGTT..
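The overlapper criterion (at least 40 bp of overlap with at most 6% mismatches) can be illustrated with a naive suffix-prefix comparison. This is a sketch under simplifying assumptions, not the Celera implementation: it ignores indels, reverse complements, and base qualities, and it checks every candidate length directly instead of using a seed index. The function name and the two reads are hypothetical.

```python
def best_overlap(a, b, min_overlap=40, max_mismatch_rate=0.06):
    """Length of the longest suffix of `a` that matches a prefix of `b`
    within the mismatch budget, or 0 if no overlap qualifies."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        mismatches = sum(x != y for x, y in zip(a[-k:], b[:k]))
        if mismatches <= max_mismatch_rate * k:
            return k
    return 0

# Hypothetical reads: read_a ends with the 45 bases that read_b starts with
shared = "ACGATTACAATAGGTTCCGGATATCGCGTAACCTGAGTCATGCAA"
read_a = "TTGACC" + shared
read_b = shared + "GGTACA"
overlap_len = best_overlap(read_a, read_b)
print(overlap_len)
```

Repeats longer than the overlap threshold produce qualifying overlaps between reads from different genomic locations, which is why the unitigger and the read-pair scaffolder are needed afterwards.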

15 Hybrid Method

16 Hybrid Method
Optimal mixture of clone-by-clone vs whole-genome shotgun (WGS) not established
–Still need 8-10X overall coverage
–Bacterial genomes can be sequenced by WGS alone
–Higher eukaryotes need more clone-by-clone
–Comparative genomics can reduce the need for physical mapping (clone-by-clone)

First Generation Sequencing

18 Human Genome Project
Cost: $3,000,000,000
Thousands of scientists in six countries
Triumph of automation and bioinformatics
More significant than the Manhattan Project and moon landing

Second Generation Sequencing

20 2nd Gen Sequencing Tech
Traditional sequencing: 384 reads of ~1 kb / 3 hours
454 (Roche):
–1M reads of [...] bp / [...] hours
HiSeq (Illumina):
–[...]M reads of [...] bp / 3-8 days * 16 samples
SOLiD (Applied Biosystems):
–>100M reads of 50-60 bp / 2-8 days * 12 samples
Ion Torrent (Life Technologies):
–5-10M reads of [...] bp / < 2 hours

21 Illumina HiSeq2000
Throughput:
–$[...] / lane (depends on read length, single-end (SE) / paired-end (PE))
–[...] bp / read
–16 lanes (2 flow cells) / run
–[...] million reads / lane
–Sequencing a human genome: $3000, 1 week
Bioinfo challenges
–Very large files
–CPU and RAM hungry
–Sequence quality filtering
–Mapping and downstream analysis
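As an illustration of the sequence quality filtering step, the sketch below keeps only reads whose mean base quality reaches a threshold. It assumes four-line FASTQ records with Phred+33 quality encoding; the threshold of 20, the function names, and the two example reads are hypothetical, and production pipelines typically use dedicated trimming tools instead.

```python
def mean_quality(quality_string, offset=33):
    """Mean Phred score of a read, assuming Phred+33 ASCII encoding."""
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)

def filter_fastq(lines, min_mean_quality=20):
    """Yield (header, sequence, quality) for reads whose mean quality passes.
    `lines` is any iterable of FASTQ lines (a list or an open file)."""
    records = zip(*[iter(lines)] * 4)  # group lines into 4-line records
    for header, seq, _plus, qual in records:
        if mean_quality(qual.strip()) >= min_mean_quality:
            yield header.strip(), seq.strip(), qual.strip()

# Hypothetical reads: 'I' encodes Q40, '#' encodes Q2
fastq = ["@read1", "ACGT", "+", "IIII",
         "@read2", "ACGT", "+", "####"]
kept = [header for header, _, _ in filter_fastq(fastq)]
print(kept)  # ['@read1']
```

Because the function is a generator, it processes one record at a time and never holds the whole file in memory, which matters for the very large files mentioned above.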

22 (Potential) Applications
Metagenomics and infectious disease
Ancient DNA, recreate extinct species
Comparative genomics (between species) and personal genomes (within species)
Genetic tests and forensics
Circulating nucleic acids
Risk, diagnosis, and prognosis prediction
Transcriptome and transcriptional regulation
–More later in the semester…

23 Third Generation Sequencing
Single-molecule sequencing (no amplification needed)
Oxford Nanopore: read fewer but longer sequences
In 1-2 years, the cost of sequencing a human genome will drop below $1000; storage will cost more than sequencing
Personal genome sequencing might become a key component of public health in every developed country
Bioinformatics will be key to converting data into knowledge

24 Summary
Genome sequencing and assembly
–Clone-by-clone: HGP
Map big clones, find path, shotgun sequence subclones, assemble and finish
Sequencing: dideoxy termination
–Whole genome shotgun: Celera
Massively Parallel Sequencing
–454, Solexa, SOLiD, Ion Torrent, Oxford Nanopore…
–Many opportunities and many challenges

25 Acknowledgement
Fritz Roth, Dannie Durand, Larry Hunter, Richard Davis, Wei Li, Jarek Meller, Stefan Bekiranov, Stuart M. Brown, Rob Mitra