SIZE SELECT SHEAR Shotgun DNA Sequencing (Technology) DNA target sample LIGATE & CLONE Vector End Reads (Mates) SEQUENCE Primer.

Slides:



Advertisements
Similar presentations
Genomics: READING genome sequences ASSEMBLY of the sequence ANNOTATION of the sequence carry out dideoxy sequencing connect seqs. to make whole chromosomes.
Advertisements

Celera Assembler Arthur L. Delcher Senior Research Scientist CBCB University of Maryland.
Sequencing a genome. Definition Determining the identity and order of nucleotides in the genetic material – usually DNA, sometimes RNA, of an organism.
Genome Assembly: a brief introduction
SEQUENCING-related topics 1. chain-termination sequencing 2. the polymerase chain reaction (PCR) 3. cycle sequencing 4. large scale sequencing stefanie.hartmann.
1 Computational Molecular Biology MPI for Molecular Genetics DNA sequence analysis Gene prediction Gene prediction methods Gene indices Mapping cDNA on.
Lecture 14 Genome sequencing projects
CS273a Lecture 4, Autumn 08, Batzoglou Some Terminology insert a fragment that was incorporated in a circular genome, and can be copied (cloned) vector.
DNA Sequencing Lecture 9, Tuesday April 29, 2003.
Genome Sequence Assembly: Algorithms and Issues Fiona Wong Jan. 22, 2003 ECS 289A.
DNA Sequencing – “Plus and Minus” Plus –Incubate with T4 DNA Polymerase and single dNTP –T4 Polymerase degrades 3’ ends in absence of dNTP –Fractionated.
Class 02: Whole genome sequencing. The seminal papers ``Is Whole Genome Sequencing Feasible?'' ``Whole-Genome DNA.
Genome sequence assembly
CS262 Lecture 11, Win07, Batzoglou Some Terminology insert a fragment that was incorporated in a circular genome, and can be copied (cloned) vector the.
DNA Sequencing. The Walking Method 1.Build a very redundant library of BACs with sequenced clone- ends (cheap to build) 2.Sequence some “seed” clones.
Mining SNPs from EST Databases Picoult-Newberg et al. (1999)
Assembly.
Whole Genome Sequencing, Comparative Genomics, & Systems Biology Gene Myers University of California Berkeley.
The Human Genome Race. Collins vs. Venter Collins Venter.
Sequencing and Assembly Cont’d. CS273a Lecture 5, Win07, Batzoglou Steps to Assemble a Genome 1. Find overlapping reads 4. Derive consensus sequence..ACGATTACAATAGGTT..
CS273a Lecture 4, Autumn 08, Batzoglou Hierarchical Sequencing.
Whole Genome Assembly. WGA 1. Screener 2. Overlapper 3. Unitigger, 4. Scaffolder, 5. Repeat Resolver.
DNA Sequencing and Assembly. DNA sequencing How we obtain the sequence of nucleotides of a species …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGACTACGTTTTA.
Human Genome Project. Basic Strategy How to determine the sequence of the roughly 3 billion base pairs of the human genome. Started in Various side.
CS273a Lecture 2, Autumn 10, Batzoglou DNA Sequencing (cont.)
Genome sequencing and assembling
Compartmentalized Shotgun Assembly ? ? ? CSA Two stated motivations? ?
Goals of the Human Genome Project determine the entire sequence of human DNA identify all the genes in human DNA store this information in databases improve.
Genome sequencing. Vocabulary Bac: Bacterial Artificial Chromosome: cloning vector for yeast Pac, cosmid, fosmid, plasmid: cloning vectors for E. coli.
Genome Assembly Bonnie Hurwitz Graduate student TMPL.
Last lecture summary. recombinant DNA technology DNA polymerase (copy DNA), restriction endonucleases (cut DNA), ligases (join DNA) DNA cloning – vector.
Presentation on genome sequencing. Genome: the complete set of gene of an organism Genome annotation: the process by which the genes, control sequences.
De-novo Assembly Day 4.
How to Build a Horse Megan Smedinghoff.
Genomic sequencing and its data analysis Dong Xu Digital Biology Laboratory Computer Science Department Christopher S. Life Sciences Center University.
CS 394C March 19, 2012 Tandy Warnow.
A hierarchical approach to building contig scaffolds Mihai Pop Dan Kosack Steven L. Salzberg Genome Research 14(1), pp , 2004.
What is comparative genomics? Analyzing & comparing genetic material from different species to study evolution, gene function, and inherited disease Understand.
Next generation sequence data and de novo assembly For human genetics By Jaap van der Heijden.
Sequence assembly using paired- end short tags Pramila Ariyaratne Genome Institute of Singapore SOC-FOS-SICS Joint Workshop on Computational Analysis of.
Genome Sequencing in the Legumes Le et al Phylogeny Major sequencing efforts Minor sequencing efforts ~14 MY ~45 MY.
Steps in a genome sequencing project Funding and sequencing strategy source of funding identified / community drive development of sequencing strategy.
CS CM124/224 & HG CM124/224 DISCUSSION SECTION (JUN 6, 2013) TA: Farhad Hormozdiari.
Biological Motivation for Fragment Assembly Rhys Price Jones Anne R. Haake.
A Sequenciação em Análises Clínicas Polymerase Chain Reaction.
Recombinant DNA Technology and Genomics A.Overview: B.Creating a DNA Library C.Recover the clone of interest D.Analyzing/characterizing the DNA - create.
Linkage and Mapping. Figure 4-8 For linked genes, recombinant frequencies are less than 50 percent.
Applied Bioinformatics Week 5. Topics Cleaning of Nucleotide Sequences Assembly of Nucleotide Reads.
Human Genome.
Automatic DNA and Genome Sequencing
Today Please read… Science 291: Human Genome Project Dissenters My Brush with Greatness? 1992: Two years into the HGP, two of the projects.
Mojavensis: Issues of Polymorphisms Chris Shaffer GEP 2009 Washington University.
Chapter 5 Sequence Assembly: Assembling the Human Genome.
454 Genome Sequence Assembly and Analysis HC70AL S Brandon Le & Min Chen.
Genome Analysis. This involves finding out the: order of the bases in the DNA location of genes parts of the DNA that controls the activity of the genes.
Virginia Commonwealth University
Human Genome Project.
DNA Sequencing Project
Genomics Sequencing genomes.
Jeong-Hyeon Choi, Sun Kim, Haixu Tang, Justen Andrews, Don G. Gilbert
Genome sequence assembly
Pre-genomic era: finding your own clones
Introduction to Genome Assembly
CS 598AGB Genome Assembly Tandy Warnow.
How to Build a Horse: Final Report
Discovery tools for human genetic variations
CSCI 1810 Computational Molecular Biology 2018
Introduction to Sequencing
Sequence the 3 billion base pairs of human
Presentation transcript:

SIZE SELECT SHEAR Shotgun DNA Sequencing (Technology) DNA target sample LIGATE & CLONE Vector End Reads (Mates) SEQUENCE Primer

Shotgun DNA Sequencing (Computation) Unknown “ Target ” DNA Sequence Layout Consensus Contigs Fragments Randomly Sample ( “ Shotgun ” ) Fragments l UNKNOWN ORIENTATION l SEQUENCING ERRORS l INCOMPLETE COVERAGE l CONSTRAINTS (MATES) l REPEATS G = 100KbpTarget Length (e.g., BAC, P1, PAC) (e.g., BAC, P1, PAC) F = 1600# of Fragments L = 500Avg. Fragment Length N = FL = 800KbpTotal Bases Sequenced c = N/G = 8Avg. Coverage

Physical Mapping Whole Genome Sequencing Approaches u Hierarchical HGP Approach: Target – 2 separate processes – maps very hard to complete, libraries unstable – must make shotgun library of each BAC + infrastructure is already developed + quality of outcome is known Minimum Tiling Set (~30,000 BACs for human) for human) Shotgun Sequencing

– Early simulations showed that if repeats were considered black boxes, one could still cover 99.7% of the genome unambiguously. BAC 5 ’ BAC 3 ’ – Collect 10-15x BAC inserts and end sequence: ~ 300K pairs for Human. – Collect 10x sequence in a 1 -1 ratio of two types of read pairs: ~ 70million reads for Human. Short Long 2Kbp 10Kbp u Whole Genome Shotgun Sequencing: Whole Genome Sequencing Approaches + single process, three library constructions – assembly is much more difficult

Sequencing Factory  300 ABI 3700 DNA Sequencers installed  50 Production Staff  40 Support Staff (R&D, QC/QA, Service)  20,000 sq. ft. of wet lab  20,000 sq. ft. of sequencing space  800 tons of A/C (160,000 cfm)  4,000 amps electrical service

True vs. Repeat-Induced Overlaps implies A B A BTRUE ORA B REPEAT- INDUCED

Assembly Pipeline Screener Mask heterochromatin and ribo-DNA, Tag known interspersed repeats. Overlapper Find all overlaps  40bp allowing 6% mismatch. (1000X Blast) Unitiger ASSEMBLER CORE: Compute all consistent sub-assemblies = unitigs Compute all consistent sub-assemblies = unitigs Identify those that cover unique DNA = U-unitigs Identify those that cover unique DNA = U-unitigs Scaffold U-unitigs with confirmed shorts & longs Scaffold U-unitigs with confirmed shorts & longs Then with BAC ends Then with BAC ends Fill repeat gaps with: Fill repeat gaps with: I. Doubly anchored mates Scaffolder Repeat Rez I 8:37 86:25 38:29 4:12 4:12 5:44+4:21+19:53 Consensus Bayesian “ SNP ” consensus using quality values. Occurs throughout assembler core. (~25) (~25) 167:41 cpu hrs. for Dros Repeat Rez I, II, III II. O-path confirmed singly-anchored mates III. Greedy path completion using QVs

Assembly Progression (Macro View)

Proteomics Discovery (insert browser slide) 3.0 Homology based exon predictions Consensus gene structure (both strands) start and stop site predictions Splice site predictions computational exon predictions Tracking information Unique identifiers