Reminder: Class on Friday, Discussion of Li et al. Proposal/Projects CAMERA feedback?


1 Reminder: Class on Friday, Discussion of Li et al. Proposal/Projects CAMERA feedback?

2 Eukaryotes
- Large
- Have organelles
- Diploid (mostly)
- Linear chromosomes
- Lower % coding
- Genes have introns

3 Genomes: How Big?

Organism          Genome Size   # of Genes
H. influenzae     1.8 Mb        1,700
E. coli           4.7 Mb        4,400
Yeast             12 Mb         6,300
Diatom (Thaps)    34 Mb         11,000
Fruit Fly         180 Mb        13,600
Fugu              400 Mb        30,000
Human             3000 Mb       30,000
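To make the trend in these numbers concrete, here is a minimal sketch that turns the slide's figures into an average gene density (genes per Mb). The function and variable names are illustrative and not part of the original deck; the data are copied from the table as given.

```python
# Genome sizes (Mb) and gene counts as listed on the slide.
GENOMES = {
    "H. influenzae": (1.8, 1700),
    "E. coli": (4.7, 4400),
    "Yeast": (12, 6300),
    "Diatom (Thaps)": (34, 11000),
    "Fruit Fly": (180, 13600),
    "Fugu": (400, 30000),
    "Human": (3000, 30000),
}


def genes_per_mb(size_mb, n_genes):
    """Average number of genes per megabase of genome."""
    return n_genes / size_mb


def density_table(genomes=GENOMES):
    """Map each organism to its gene density, rounded to one decimal place."""
    return {
        name: round(genes_per_mb(size_mb, n_genes), 1)
        for name, (size_mb, n_genes) in genomes.items()
    }


if __name__ == "__main__":
    for name, density in density_table().items():
        print(f"{name:15s} {density:8.1f} genes/Mb")
```

On these figures the two bacteria come out at roughly 940 genes per Mb and human at 10 genes per Mb, which is the "lower % coding" point in numerical form.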

4 http://www.genomesize.com/
Gregory, 2004, Paleobiology 30:179-202
1 pg ≈ 1 billion base pairs (1000 Mbp)
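The rule of thumb on this slide can be written as a one-line conversion. This is a sketch only: the function name is made up for illustration, and the default factor is the slide's rounded value (the commonly cited figure is about 978 Mbp per pg, which can be passed in instead).

```python
# Slide's rounded rule of thumb: 1 pg of DNA is about 1000 Mbp.
MBP_PER_PG = 1000.0


def c_value_to_mbp(picograms, mbp_per_pg=MBP_PER_PG):
    """Convert a DNA mass in picograms to an approximate genome size in Mbp."""
    return picograms * mbp_per_pg


if __name__ == "__main__":
    # A hypothetical C-value of 3 pg, using the slide's factor.
    print(c_value_to_mbp(3.0), "Mbp")
    # The same mass using the more precise factor (assumption: 978 Mbp/pg).
    print(c_value_to_mbp(3.0, mbp_per_pg=978.0), "Mbp")
```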

5 Eukaryotic genomes are big
What does this mean for sequencing?
- Strategies are similar
  - Low coverage of large insert library (BACs, fosmids)
  - Higher coverage of small insert library
- Finishing is harder
  - Often additional mapping tools (RE maps, optical maps) are employed to map scaffolds to chromosomes
  - Genomes released in “versions” (Thaps 3.0)
  - Publications often based on draft versions
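To see why genome size matters for the sequencing effort, the sketch below computes fold coverage and the number of reads needed for a target coverage. It assumes reads land uniformly at random on the genome (the usual Lander-Waterman idealization, not something stated on the slide), and the read length and target coverage in the example are hypothetical.

```python
import math


def fold_coverage(n_reads, read_length_bp, genome_size_bp):
    """Average number of times each base is sequenced."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage, read_length_bp, genome_size_bp):
    """Number of reads required to reach a target fold coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


def expected_fraction_uncovered(coverage):
    """Expected fraction of bases hit by no read, assuming uniformly random reads."""
    return math.exp(-coverage)


if __name__ == "__main__":
    read_length = 700   # hypothetical read length in bp
    target = 10         # hypothetical target fold coverage
    for name, size_mb in [("E. coli", 4.7), ("Diatom (Thaps)", 34), ("Human", 3000)]:
        n = reads_needed(target, read_length, size_mb * 1_000_000)
        print(f"{name:15s} {n:>12,d} reads for {target}x")
```

The read count scales linearly with genome size, so the same target coverage costs hundreds of times more reads for a human-sized genome than for a bacterium.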

6 Where are draft versions in GenBank?
- Model organisms have their own web sites
  - YeastDB
  - WormDB
  - FlyBase

7 Eukaryotic genomes are diploid
What does this mean for sequencing?
- Finishing is harder
  - Will never get a 100% consensus
  - Instead identify “high quality discrepancies”
  - What is the sequence in the released genome?
  - How to find where the SNPs are?
  - T. pseudonana: 0.75% of nuclear genome polymorphic
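One way to picture a "high quality discrepancy" is a position in the assembly where two different bases are each supported by several high-quality reads, as expected at a heterozygous site. The sketch below is an illustration under that assumption, not the procedure used by any particular genome project; the function name and both thresholds are hypothetical.

```python
from collections import Counter


def high_quality_discrepancy(column, min_qual=30, min_reads=2):
    """Return the competing bases at one aligned position, or [] if there is a consensus.

    column: list of (base, phred_quality) tuples, one per read covering the position.
    A base counts as supported if at least `min_reads` reads with quality
    >= `min_qual` carry it. Both thresholds are illustrative defaults.
    """
    counts = Counter(base for base, qual in column if qual >= min_qual)
    supported = sorted(base for base, n in counts.items() if n >= min_reads)
    return supported if len(supported) >= 2 else []


if __name__ == "__main__":
    # Two alleles, each backed by two high-quality reads: flagged.
    print(high_quality_discrepancy([("A", 40), ("A", 35), ("G", 38), ("G", 32), ("T", 10)]))
    # The G calls are low quality, so this looks like sequencing error: not flagged.
    print(high_quality_discrepancy([("A", 40), ("A", 40), ("G", 15), ("G", 12)]))
```

Requiring multiple high-quality reads per allele is what separates a likely polymorphism from an isolated base-calling error.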

8 Eukaryotic genomes are arranged in linear chromosomes
- Finishing is harder
  - Need to use additional maps to decide if contigs should be joined or belong on their own chromosomes
- Additional mechanisms of gene duplication available/common

9 Eukaryotic genomes have low % coding
- Finishing is harder
  - Much of noncoding DNA is made up of “selfish DNA”
  - Repeats make assembly problematic
  - Thaps: 2% of genome is retrotransposons
- Mammalian cells: less than 1% of genomic DNA is coding

10 Eukaryotic gene structure

11 Gene finding in eukaryotic genomes
- Relies on both signal sequences and coding statistics
  - Signals: promoters, start and stop codons, splice sites, poly A sites
  - These are all relatively weak signals
  - Need to combine with codon statistics
- Organism-specific training set is crucial
  - Generated from cDNA library sequenced in conjunction with genome project
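The idea of combining signals with coding statistics can be illustrated with a deliberately simplified toy. The sketch below uses only start and stop codons as signals and a codon log-odds score trained on known coding sequences. It ignores introns, splice sites, promoters, and the reverse strand, so it is far from a real eukaryotic gene finder; all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

STOPS = {"TAA", "TAG", "TGA"}
ALL_CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codons(seq):
    """Split a sequence into consecutive codons, dropping any trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def train_codon_model(coding_seqs, pseudocount=1.0):
    """Log-odds of each codon in the training set versus a uniform background."""
    counts = Counter(c for s in coding_seqs for c in codons(s))
    total = sum(counts[c] for c in ALL_CODONS) + 64 * pseudocount
    return {c: math.log((counts[c] + pseudocount) / total * 64) for c in ALL_CODONS}


def find_orfs(seq, min_codons=3):
    """Signal step: (start, end) of ATG...stop reading frames on the forward strand."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


def coding_score(seq, model):
    """Statistics step: summed codon log-odds; higher means more coding-like."""
    return sum(model.get(c, 0.0) for c in codons(seq))


def rank_candidates(seq, model):
    """Combine both steps: score every candidate reading frame, best first."""
    return sorted(
        ((coding_score(seq[s:e], model), s, e) for s, e in find_orfs(seq)),
        reverse=True,
    )


if __name__ == "__main__":
    # Stand-in for an organism-specific training set (e.g. from cDNA sequences).
    model = train_codon_model(["ATGGCTGCTTAA"])
    for score, start, end in rank_candidates("CCATGGCTGCTTAAGG", model):
        print(f"{start}-{end} score={score:.2f}")
```

Because the score depends entirely on the training sequences, a model trained on one organism's codon usage will rank candidates poorly for another, which is why the organism-specific training set matters.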

12 Implications for environmental genomics
- Need even more sequencing to get adequate coverage
- For any given piece of DNA, likely to have fewer genes than if it were prokaryotic in origin
- Current state of gene finding and available genomes for comparison mean gene finders likely have very poor performance on DNA of unknown origin

