Today Please read… Science 291: 1304-1315. Human Genome Project Dissenters My Brush with Greatness? 1992: Two years into the HGP, two of the projects.

Presentation transcript:

Today Please read… Science 291:

Human Genome Project Dissenters My Brush with Greatness? 1992: Two years into the HGP, two of the project’s biggest critics were… –Sydney Brenner: believed that the HGP should focus on human EST collections, and sequence the genome of a simple vertebrate (Fugu). –Craig Venter: believed that the clone-by-clone approach was not the most efficient way to proceed, and suggested that shotgun approaches, even a whole-genome approach, were feasible. …they were both right.

Sydney Brenner 2002 Nobel Prize (Medicine/Physiology) Sydney Brenner and John E. Sulston, Britain; H. Robert Horvitz, United States –for discoveries concerning how genes regulate organ development and a process of programmed cell death.

Expressed Sequence Tags (ESTs): end-sequenced cDNAs (complementary DNA). cDNA: synthetic DNA transcribed from an mRNA template, –through the action of an RNA-dependent DNA polymerase called reverse transcriptase. Online Primer: est.html Brenner was right….

Still Sequencing cDNAs, - first and easiest look into any genome, - useful in understanding genomic sequence (gene finding), - helps determine splice site variants, - shorter than genomic clones, fits in plasmids, - etc.

…tissue-specific ESTs are very useful. Used for microarrays… …an array of DNA that can be hybridized with probes to study patterns of gene expression.

Whole Genome Assembly 1995: 1.8 Mbp Haemophilus influenzae genome sequenced, on : Mycoplasma, E. coli and others*, 1999: Chromosome 2 of Arabidopsis, 2000: Drosophila (120 Mbp) genome, …Human, Mosquito, etc… Lots of genomes, several applications... *WGA of bacterial, viral populations... Venter was right…. J. Craig Venter

1 year, 120 megabases, Assembly algorithms could generate accurate genomic sequences, Interim assemblies (or mapping) were not necessary. 24 MARCH 2000 VOL 287 SCIENCE

Big Biology

Think About This… …the plasmid library construction is the first critical step in WGA sequencing, –“if the DNA libraries are not uniform in size, non-chimeric, and do not randomly represent the genome, then the subsequent steps cannot accurately reconstruct the genome sequence.” –“We used automated high-throughput DNA sequencing and the computational infrastructure to enable efficient tracking of enormous amounts of sequence information (27.3 million sequence reads; 14.9 billion bp of sequence).”
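
The two totals quoted above are enough for a quick back-of-the-envelope check of mean read length and sequence coverage. The sketch below is only illustrative: the genome size is an assumption (roughly 2.9 billion bp, a figure not given on the slide), and the function names are made up for this example.

```python
# Totals quoted on the slide.
TOTAL_READS = 27_300_000
TOTAL_BASES = 14_900_000_000

# Assumption (not stated on the slide): approximate haploid human genome size in bp.
ASSUMED_GENOME_SIZE = 2_900_000_000


def mean_read_length(total_bases, total_reads):
    """Average number of bases per sequence read."""
    return total_bases / total_reads


def sequence_coverage(total_bases, genome_size):
    """Average number of times each base of the genome is sequenced."""
    return total_bases / genome_size
```

Calling `mean_read_length(TOTAL_BASES, TOTAL_READS)` and `sequence_coverage(TOTAL_BASES, ASSUMED_GENOME_SIZE)` gives the average read length in bp and the fold coverage implied by these totals; the coverage value moves with whatever genome size is assumed.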

Whose DNA? 21 enrolled donors, –age, sex, ethnographic group, –one African-American, –one Asian-Chinese, –one Hispanic-Mexican, –two Caucasians*.

Whose, Mostly? J. Craig Venter

8, September , June bp average sequence read …back to humans… What to know? Individuals, Libraries, Sequence coverage, Clone coverage, Other?

WGA Outline Online Primer: snps.html

[Cartoons: DNA in sized libraries… DNA sequence in mate-pairs… An insert of approximately known size sits in the vector between the two sequencing primers. The two sequenced ends (~543 bp each; e.g. 5’-actgtacgtgtagctgaca…-3’ and 5’-tagcgtagttattttgc…-3’) flank the unsequenced middle of the insert.]

Whole Genome Assembly What does Shredder Do? Why? 1. Screener 2. Overlapper 3. Unitigger/Discriminator, 4. Scaffolder, 5. Repeat Resolver.

Screener ...finds and “masks” microsatellite repeats, known repeated regions and ribosomal DNA, –“masked” regions not used to make contigs, –“marks” the rest for overlapping. read: atgacttacttactgcatatttatttatttatttatttatttatttatttatttatttatttatttatttatttatttgacgtgtacgtgtacgtgtagctgtacgtgtacgtgacgggccgcattatcgtgatgctacgtgtacgttatatctgatcgtgcatgtga [The slide shows the same read twice more, labelled “masked:” and “marked:”; the highlighting that distinguished the masked and marked regions is not preserved in this transcript.]
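
The masking step for microsatellites can be sketched with a regular expression. This is a minimal illustration, not the Screener itself: the unit length (1 to 6 bp), the minimum of 5 tandem copies, and the mask character are all assumptions chosen for the example, and matching against known repeat families or ribosomal DNA is left out.

```python
import re


def mask_microsatellites(read, max_unit=6, min_copies=5, mask_char="x"):
    """Replace short tandem repeats in a read with mask_char.

    A repeat is a unit of 1..max_unit bases occurring at least
    min_copies times in a row. All thresholds are illustrative.
    """
    # ([acgt]{1,6}?) captures the shortest repeat unit that works;
    # \1{4,} then requires at least (min_copies - 1) further copies of it.
    pattern = re.compile(r"([acgt]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    return pattern.sub(lambda m: mask_char * len(m.group(0)), read.lower())
```

Applied to the read on the slide, the long run of “attt”-type repeats would be replaced by the mask character, while the flanking sequence is left intact for the Overlapper.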

Overlapper ...looks for end-to-end overlaps of at least 40 bp with no more than 6% differences in match, What’s the significance? ...a one in event. <--tactgtacgtagctgtgatgttcctcggatatagcgggcatatttattacgctattgtacgtgt-3’ 5’- gttcctcggatatagcgggcatatttattacgctattgtacgtgtaaagtatcgt--> > 40 bp, < 6% mismatch …given perfect randomness.
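
The overlap criterion (at least 40 bp, no more than 6% differences) can be illustrated with a simple suffix-prefix comparison. The sketch below makes simplifying assumptions: both reads are treated as plain strings on the same strand, only substitutions count as differences (no insertions or deletions), and the function name and brute-force search are hypothetical, not the assembler's actual algorithm. The example reads are built from the sequences shown on the slide.

```python
def find_overlap(left, right, min_len=40, max_diff=0.06):
    """Return (overlap_length, mismatches) for the longest suffix of `left`
    that matches a prefix of `right` within the allowed difference rate,
    or None if no overlap of at least min_len qualifies."""
    longest = min(len(left), len(right))
    for length in range(longest, min_len - 1, -1):
        suffix = left[-length:]
        prefix = right[:length]
        # Count positions where the two aligned stretches disagree.
        mismatches = sum(1 for a, b in zip(suffix, prefix) if a != b)
        if mismatches <= max_diff * length:
            return length, mismatches
    return None


# Shared stretch from the slide's two reads, plus each read's unique flank.
OVERLAP = "gttcctcggatatagcgggcatatttattacgctattgtacgtgt"
LEFT_READ = "tactgtacgtagctgtgat" + OVERLAP
RIGHT_READ = OVERLAP + "aaagtatcgt"
```

`find_overlap(LEFT_READ, RIGHT_READ)` reports the length of the shared stretch and the number of mismatches within it. Reads shorter than 40 bp can never qualify, so the function returns `None` for them.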

Good News... uniquely assembled contigs (unitigs) are readily identifiable, –all of the assembled sequences match over all of the known sequence, - and -...are consistent with an 8x sequence coverage.

Unitigs ...contig cluster is consistent with expected size (+8), ...no dissimilar sequences between any members. But(t): ...the Screener doesn’t include all of the “low frequency” level repeats, ...so, a majority of the Overlapper outputs turned out to be bogus.

What Now? –“over-collapsed” assemblies are identified and broken down into unitigs when possible... –…these “too-large” contig sets are sent to the Unitigger/Discriminator.

...over-collapsed. ...in a world where real data matches expected data, each locus would have 8X coverage, ...if there are genomic repeats, then sequences would be “over-represented”, on average, 8 more per repeat, per contig. Unitigger ...differentiates between a true overlap, and an overlap that includes more than one locus.
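
The coverage argument above can be turned into a rough test: a contig whose read depth is well above the expected 8X probably merges reads from more than one locus. The sketch below is a simplified illustration only. The 1.5-fold threshold, the fixed read length, and the function names are assumptions, and it is not the statistic the Unitigger/Discriminator actually computes.

```python
EXPECTED_COVERAGE = 8.0  # expected depth per locus, from the slide


def read_depth(n_reads, read_len, contig_len):
    """Average number of reads covering each base of the contig."""
    return n_reads * read_len / contig_len


def is_over_collapsed(n_reads, read_len, contig_len, factor=1.5):
    """Flag a contig whose depth exceeds factor x the expected coverage.

    The factor of 1.5 is an illustrative threshold, not a published value.
    """
    return read_depth(n_reads, read_len, contig_len) > factor * EXPECTED_COVERAGE


def estimated_copies(n_reads, read_len, contig_len):
    """Rough number of genomic loci collapsed into this contig."""
    depth = read_depth(n_reads, read_len, contig_len)
    return max(1, round(depth / EXPECTED_COVERAGE))
```

For a 10 kb contig built from 543 bp reads, about 147 reads correspond to the expected 8X depth, while 300 reads give roughly twice that, consistent with a two-copy repeat collapsed into one contig.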

Discriminator...parses the “over- collapsed” contig by using sequence outside of the overlap region

Discriminator...may yield u-unitigs. Unitigger/Discriminator Output: correctly assembled contigs covering 73.6% of the genome.

Scaffolder ...contigs the contigs, –uses mate-pair information, two or more consistent mate-pair matches yield 1 in odds of being chance.
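
The “two or more consistent mate-pair matches” rule can be sketched as a link count between contigs. This is a minimal illustration under stated assumptions: reads are identified by hypothetical names, each read is already assigned to one contig, and read orientation and insert-size consistency (which a real scaffolder would check) are ignored.

```python
from collections import Counter


def scaffold_links(mate_pairs, read_to_contig, min_links=2):
    """Return {(contig_a, contig_b): n_links} for contig pairs joined by
    at least min_links mate pairs whose two reads sit in different contigs."""
    counts = Counter()
    for read_1, read_2 in mate_pairs:
        contig_1 = read_to_contig.get(read_1)
        contig_2 = read_to_contig.get(read_2)
        # Skip mates that are unplaced or that fall in the same contig.
        if contig_1 is None or contig_2 is None or contig_1 == contig_2:
            continue
        counts[tuple(sorted((contig_1, contig_2)))] += 1
    return {pair: n for pair, n in counts.items() if n >= min_links}
```

A contig pair supported by a single mate pair is dropped, since one link could easily be a chance or chimeric match; pairs with two or more links are kept as candidate scaffold joins.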

Repeat Resolver ...most of the remaining gaps were due to repeats. “Rocks” Use “low Discriminator Value” contig sets to fill gaps, - find two or more mate pairs with unambiguous matches in the scaffold near the gap (2 kb, 10 kb or 50 kb), (1 in 10^7), “Stones” - find mate pair matches 2 kb, 10 kb, and 50 kb from gap, place the mate in the gap, check to see if it’s consistent with other “placed” sequences. confirm matches

If that Doesn’t Work...find a mate-pair that spans the gap, and sequence it, Chromosome Walking...make sequencing primer from BES...

Today/Friday Questions about WGA, CSA, Comparisons, Quality Control, etc.