
1 Today Please read… Science 291: 1304-1315

2 Human Genome Project Dissenters: My Brush with Greatness? 1992: Two years into the HGP, two of the project's biggest critics were… –Sydney Brenner: believed that the HGP should focus on human EST collections and on sequencing the genome of a simple vertebrate (Fugu). –Craig Venter: believed that the clone-by-clone approach was not the most efficient way to proceed, and suggested that shotgun approaches, even a whole-genome approach, were feasible. …they were both right.

3 Sydney Brenner 2002 Nobel Prize (Physiology or Medicine): Sydney Brenner and John E. Sulston (Britain) and H. Robert Horvitz (United States), –for discoveries concerning how genes regulate organ development and a process of programmed cell death.

4 Expressed Sequence Tags (ESTs): end-sequenced cDNAs (complementary DNA). cDNA: synthetic DNA transcribed from an mRNA template, –through the action of an RNA-dependent DNA polymerase called reverse transcriptase. Online Primer: est.html Brenner was right….

5 Still Sequencing cDNAs, - first and easiest look into any genome, - useful in understanding genomic sequence (gene finding), - helps determine splice site variants, - shorter than genomic clones, fits in plasmids, - etc.

6 …tissue specific ESTs are very useful. Used for microarrays… …an array of DNA that can be hybridized with probes to study patterns of gene expression.

7 Whole Genome Assembly 1995: 1.8 Mbp Haemophilus influenzae genome sequenced, 1996 on: Mycoplasma, E. coli and others*, 1999: Chromosome 2 of Arabidopsis, 2000: Drosophila (120 Mbp) genome, …Human, Mosquito, etc… Lots of genomes, several applications... *WGA of bacterial, viral populations... Venter was right…. J. Craig Venter

8

9 1 year, 120 megabases, Assembly algorithms could generate accurate genomic sequences, Interim assemblies (or mapping) were not necessary. 24 MARCH 2000 VOL 287 SCIENCE

10 Big Biology

11 Think About This… …the plasmid library construction is the first critical step in WGA sequencing, –“if the DNA libraries are not uniform in size, non-chimeric, and do not randomly represent the genome, then the subsequent steps cannot accurately reconstruct the genome sequence.” –“We used automated high-throughput DNA sequencing and the computational infrastructure to enable efficient tracking of enormous amounts of sequence information (27.3 million sequence reads; 14.9 billion bp of sequence).”

12 Whose DNA? 21 enrolled donors, –age, sex, ethnographic group, –one African-American, –one Asian-Chinese, –one Hispanic-Mexican, –two Caucasians*.

13 Whose, Mostly? J. Craig Venter

14

15 8 September 1999 - 25 June 2000; 543 bp average sequence read. …back to humans… What to know? Individuals, Libraries, Sequence coverage, Clone coverage, Other?
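
The two coverage terms on this slide can be made concrete with a little arithmetic. The sketch below is illustrative only: the read count (27.3 million) and the 543 bp average read length come from the slides, but the genome size, the number of mate pairs, and the insert length are assumptions chosen for the example, so the resulting coverage values should not be read as the project's reported figures.

```python
def sequence_coverage(num_reads: int, mean_read_len: float, genome_size: float) -> float:
    """Average number of times each base is covered by a sequenced read."""
    return num_reads * mean_read_len / genome_size


def clone_coverage(num_clones: int, mean_insert_len: float, genome_size: float) -> float:
    """Average number of cloned inserts spanning each base (sequenced or not)."""
    return num_clones * mean_insert_len / genome_size


NUM_READS = 27_300_000   # from the slides
MEAN_READ_LEN = 543      # bp, from the slides
GENOME_SIZE = 2.9e9      # bp, ASSUMED for illustration (not on the slides)

total_bp = NUM_READS * MEAN_READ_LEN
seq_cov = sequence_coverage(NUM_READS, MEAN_READ_LEN, GENOME_SIZE)

# Hypothetical library: one million clones with 10 kb inserts.
clone_cov = clone_coverage(1_000_000, 10_000, GENOME_SIZE)

print(f"total sequence: {total_bp:,} bp")
print(f"sequence coverage: {seq_cov:.2f}x")
print(f"clone coverage (hypothetical library): {clone_cov:.2f}x")
```

Clone coverage can exceed sequence coverage because only the two ends of each insert are read, while the whole insert still links its ends at a known distance.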

16

17 WGA Outline Online Primer: snps.html

18 DNA in sized libraries… DNA sequence in mate-pairs… (cartoons) Each insert of ~known size sits in a vector between sequencing primers; the two ends are sequenced (~543 bp each) and the middle of the insert is left unsequenced: 5’- actgtacgtgtagctgaca… - 3’ [unsequenced insert] 5’- tagcgtagttattttgc… - 3’

19 8 September 1999 - 25 June 2000; 543 bp average sequence read. …back to humans… What to know? Individuals, Libraries, Sequence coverage, Clone coverage, Other?

20 Whole Genome Assembly: What does Shredder Do? Why? 1. Screener, 2. Overlapper, 3. Unitigger/Discriminator, 4. Scaffolder, 5. Repeat Resolver.

21 Screener ...finds and “masks” microsatellite repeats, known repeated regions and ribosomal DNA, –“masked” regions not used to make contigs, –“marks” the rest for overlapping.
read: atgacttacttactgcatatttatttatttatttatttatttatttatttatttatttatttatttatttatttatttgacgtgtacgtgtacgtgtagctgtacgtgtacgtgacgggccgcattatcgtgatgctacgtgtacgttatatctgatcgtgcatgtga
masked, marked: the same read (shown three times on the slide, once per label).
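
A minimal sketch of the masking idea, not the assembler's actual Screener: it only handles microsatellites, and it assumes a microsatellite is a 1-6 bp unit repeated at least five times in tandem. The thresholds, the function name `mask_microsatellites`, and the toy read are all hypothetical.

```python
import re

# ASSUMED definition for this sketch: a 1-6 bp unit repeated at least
# 5 times in tandem. These thresholds are hypothetical.
MICROSATELLITE = re.compile(r"([acgt]{1,6}?)\1{4,}")


def mask_microsatellites(read: str, mask_char: str = "x") -> str:
    """Replace every tandem-repeat run with mask_char, keeping read length."""
    return MICROSATELLITE.sub(lambda m: mask_char * len(m.group(0)), read.lower())


# Toy read (hypothetical): unique flank + (attt) x 6 + unique flank.
toy_read = "atgactgcag" + "attt" * 6 + "gacgtgtacgt"
masked = mask_microsatellites(toy_read)
print(toy_read)
print(masked)
```

Masking with a placeholder character rather than deleting the repeat keeps read coordinates intact, so the unmasked flanks can still be "marked" for overlapping at their original positions.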

22 Overlapper ...looks for end-to-end overlaps of at least 40 bp with no more than 6% differences in match. What’s the significance? ...a 1 in 10^17 event, …given perfect randomness.
<--tactgtacgtagctgtgatgttcctcggatatagcgggcatatttattacgctattgtacgtgt-3’
5’- gttcctcggatatagcgggcatatttattacgctattgtacgtgtaaagtatcgt-->
(> 40 bp, < 6% mismatch)
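
The overlap rule on this slide can be sketched as a suffix/prefix comparison. This is a simplified illustration, not the real Overlapper: it counts substitutions only (no insertions or deletions), ignores reverse-complement orientation, and uses a brute-force scan. The function name `best_overlap` is hypothetical; the 40 bp and 6% thresholds and the two example reads are taken from the slide.

```python
def best_overlap(a: str, b: str, min_len: int = 40, max_mismatch: float = 0.06) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`
    with at most `max_mismatch` fraction of mismatched positions.
    Returns 0 if no overlap of at least `min_len` qualifies."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        suffix, prefix = a[-length:], b[:length]
        mismatches = sum(x != y for x, y in zip(suffix, prefix))
        if mismatches <= max_mismatch * length:
            return length
    return 0


# The two reads drawn on the slide.
read_a = "tactgtacgtagctgtgatgttcctcggatatagcgggcatatttattacgctattgtacgtgt"
read_b = "gttcctcggatatagcgggcatatttattacgctattgtacgtgtaaagtatcgt"

overlap_len = best_overlap(read_a, read_b)
print(overlap_len, read_a[-overlap_len:])
```

For these two reads the qualifying overlap is 45 bp with no mismatches, which clears the 40 bp minimum.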

23 Good News... uniquely assembled contigs (unitigs) are readily identifiable, –all of the assembled sequences match over all of the known sequence, - and -...are consistent with an 8x sequence coverage.

24 Whole Genome Assembly: What does Shredder Do? Why? 1. Screener, 2. Overlapper, 3. Unitigger/Discriminator, 4. Scaffolder, 5. Repeat Resolver.

25 Unitigs ...contig cluster is consistent with expected size (+8), ...no dissimilar sequences between any members. But(t): ...the Screener doesn’t include all of the “low frequency” level repeats, ...so, a majority of the Overlapper outputs turned out to be bogus.

26 What Now? –“over-collapsed” assemblies are identified and broken down into unitigs when possible... –…these “too-large” contig sets are sent to the Unitigger/Discriminator.

27 ...over-collapsed. ...in a world where real data matches expected data, each locus would have 8X coverage, ...if there are genomic repeats, then sequences would be “over-represented”: on average, 8 more per repeat, per contig. Unitigger ...differentiates between a true overlap and an overlap that includes more than one locus.
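
The "8 more per repeat" reasoning can be sketched as a read-depth check. This is a deliberately naive illustration, not the statistic the Unitigger/Discriminator actually uses: it simply divides observed depth by the expected 8X and flags contigs above a hypothetical 1.5-copy threshold. The function names and the example contigs are assumptions; the 8X coverage and 543 bp read length are from the slides.

```python
EXPECTED_COVERAGE = 8.0  # from the slides
MEAN_READ_LEN = 543      # bp, from the slides


def observed_coverage(num_reads: int, contig_len: int, read_len: int = MEAN_READ_LEN) -> float:
    """Average read depth across a contig."""
    return num_reads * read_len / contig_len


def estimated_copies(num_reads: int, contig_len: int) -> float:
    """Rough number of genomic loci collapsed into this contig."""
    return observed_coverage(num_reads, contig_len) / EXPECTED_COVERAGE


def looks_over_collapsed(num_reads: int, contig_len: int, threshold: float = 1.5) -> bool:
    """Flag a contig whose depth suggests more than one locus.
    The 1.5 threshold is a hypothetical cut-off for this sketch."""
    return estimated_copies(num_reads, contig_len) >= threshold


# Hypothetical 10 kb contigs: one at about 8X, one at about 16X.
unique_like = looks_over_collapsed(147, 10_000)
repeat_like = looks_over_collapsed(295, 10_000)
print(unique_like, repeat_like)
```

A contig built from a two-copy repeat collects reads from both loci, so its depth is roughly double the expected value, which is what the second example imitates.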

28 Discriminator ...parses the “over-collapsed” contig by using sequence outside of the overlap region.

29 Discriminator...may yield u-unitigs. Unitigger/Discriminator Output: correctly assembled contigs covering 73.6% of the genome.

30 Scaffolder ...contigs the contigs, –uses mate-pair information; two or more consistent mate-pair matches yield 1 in 10^10 odds of being chance.
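
A minimal sketch of the "two or more mate pairs" rule, under simplifying assumptions: each read has already been placed in exactly one contig, and a link counts as consistent whenever the two mates land in different contigs. A real scaffolder would also check read orientation and whether the implied distance fits the library's insert size. The function name `scaffold_links`, the read IDs, and the contig names are hypothetical.

```python
from collections import Counter


def scaffold_links(mate_pairs, read_to_contig, min_links=2):
    """Count mate pairs joining each pair of contigs and keep the
    contig pairs supported by at least `min_links` mate pairs."""
    counts = Counter()
    for left, right in mate_pairs:
        c1, c2 = read_to_contig.get(left), read_to_contig.get(right)
        if c1 is None or c2 is None or c1 == c2:
            continue  # unplaced read, or both mates inside one contig
        counts[tuple(sorted((c1, c2)))] += 1
    return {pair: n for pair, n in counts.items() if n >= min_links}


# Hypothetical placements of reads in contigs A, B and C.
read_to_contig = {
    "r1a": "A", "r1b": "B",
    "r2a": "A", "r2b": "B",
    "r3a": "B", "r3b": "C",
    "r4a": "A", "r4b": "A",
}
mate_pairs = [("r1a", "r1b"), ("r2a", "r2b"), ("r3a", "r3b"), ("r4a", "r4b")]

links = scaffold_links(mate_pairs, read_to_contig)
print(links)
```

Contigs A and B are joined because two independent mate pairs agree; the single pair between B and C is not enough on its own.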

31 Repeat Resolver ...most of the remaining gaps were due to repeats. “Rocks”: use “low Discriminator Value” contig sets to fill gaps, - find two or more mate pairs with unambiguous matches in the scaffold near the gap (2 kb, 10 kb or 50 kb) (1 in 10^7). “Stones”: - find mate pair matches 2 kb, 10 kb, and 50 kb from gap, place the mate in the gap, check to see if it’s consistent with other “placed” sequences. Confirm matches.

32 Repeat Resolver ...most of the remaining gaps were due to repeats. “Rocks”: use “low Discriminator Value” contig sets to fill gaps, - find two or more mate pairs with unambiguous matches in the scaffold near the gap (2 kb, 10 kb or 50 kb) (1 in 10^7). “Stones”: - find mate pair matches 2 kb, 10 kb, and 50 kb from gap, place the mate in the gap, check to see if it’s consistent with other “placed” sequences.

33 If that Doesn’t Work...find a mate-pair that spans the gap, and sequence it, Chromosome Walking...make sequencing primer from BES...

34 Today/Friday Questions about WGA, CSA, Comparisons, Quality Control, etc.

