chromosome organization, what about genome organization?


1 We have talked about chromosome organization; what about genome organization?

2 Eukaryotic genomes are complex and DNA amounts and organization vary widely between species.

3 C value paradox: the amount of DNA in the haploid cell of an organism is not related to its evolutionary complexity or number of genes.

4

5

6 There are different classes of eukaryotic DNA based on sequence complexity.

7 Reassociation Kinetics

8 3 Main Components in Eukaryotic Genomes

9

10 The human genome
- Two versions of the human genome sequence were published in February
- DNA sequences that encode proteins make up only 5% of the genome
- ~50% of the sequence consists of transposable elements; clusters of gene-rich regions are separated by gene deserts
- Chromosome 19 has the highest gene density; chromosomes 13 and Y show the lowest gene density

11

12 The human genome
- Gene total estimated at 30,000-40,000, with an average gene size of 27 kb
- Hundreds of genes share homology with those of bacteria
- The number of introns varies greatly (from 0 for histone genes to 234 for titin)

13 The human genome
- Genes are larger and contain more and larger introns than those in invertebrates (the dystrophin gene is 2.5 Mb)
- Genes are not evenly spaced on chromosomes
- The most common genes include those involved in nucleic acid metabolism (7.5%), receptors (5%), protein kinases (2.8%), and cytoskeletal structural proteins (2.8%)

14

15 Genome organization in plants
- Size of genome varies widely (100 Mb-5,500 Mb)
- Many tandem gene duplications and larger duplications; some interchromosomal duplications also observed
- Large-genome plants also have genes clustered with long stretches of intergenic DNA
- In maize, the intergenic sequences are composed mainly of transposons

16

17

18 Single Copy Sequences

19 Genes can be difficult to identify/predict. Why?

20 The human genome turns out to have fewer than half as many genes (30,000 to 40,000) as we predicted (100,000). Why?
Drosophila – 13,000
Nematode – 19,000

21

22

23

24 Problems? It is more complicated than that.
Some gene products are RNA (tRNA, rRNA, others) instead of protein.
Some nucleic acid sequences that do not encode gene products (noncoding regions) are necessary for production of the gene product (protein or RNA).

25 Coding region

26 Noncoding regions
Regulatory regions: RNA polymerase binding site, transcription factor binding sites
Introns
Polyadenylation [poly(A)] sites

27 Unique genes

28

29

30 Promoters
Sequences can be quite distant from the coding region

31 Introns/exons
Most eukaryotic genes have introns
Introns are often much longer than exons
Often many introns
mRNA is much shorter than genomic DNA
Can vary between the same gene in different species

32

33

34

35 Splice Sites
Eukaryotes only
Removal of internal parts of the newly transcribed RNA
Takes place in the cell nucleus
Splice sites are difficult to predict

36 Alternative splicing
Different splice patterns from the same sequence, and therefore different products from the same gene.
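A toy illustration of the point on this slide: the same pre-mRNA can yield more than one mature transcript depending on which exons are retained. The sequence, the exon coordinates, and the function name `splice_isoforms` are all hypothetical, chosen only to make the idea concrete; real splicing is governed by splice-site signals and regulatory factors that this sketch ignores.

```python
from itertools import combinations

def splice_isoforms(pre_mrna, exons, optional):
    """Join exons into mature transcripts.

    exons    : list of (start, end) half-open coordinates, in order
    optional : set of exon indices that may be skipped (cassette exons)
    Returns a dict mapping the tuple of included exon indices to the
    spliced sequence, one entry per subset of optional exons.
    """
    optional = sorted(optional)
    isoforms = {}
    for r in range(len(optional) + 1):
        for kept in combinations(optional, r):
            included = tuple(
                i for i in range(len(exons))
                if i not in optional or i in kept
            )
            isoforms[included] = "".join(
                pre_mrna[exons[i][0]:exons[i][1]] for i in included
            )
    return isoforms

# Hypothetical pre-mRNA: exons in upper case, introns in lower case.
pre_mrna = "AUGGCCguaaguuuagGAUUACguacccagUGGUAA"
exons = [(0, 6), (16, 22), (30, 36)]

# Treat the middle exon as a cassette exon that may be skipped.
isoforms = splice_isoforms(pre_mrna, exons, optional={1})
for included, seq in sorted(isoforms.items()):
    print(included, seq)
```

With one optional exon the sketch produces two transcripts from one gene; with n independent optional exons it would produce up to 2^n, which is one reason the proteome can be much larger than the gene count.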

37

38

39 Alternative splicing
Multiple promoters
Multiple terminators
Alternatively spliced introns
59% of genes
Average of ~3 forms

40

41 Exon Shuffling

42 Why genome size doesn’t matter
More sophisticated regulation of expression?
Proteome vastly larger than genome?
Alternative splicing
RNA editing
Posttranslational modifications?
Cellular location?
Moonlighting

43 Gene Identification
Open reading frames
Sequence conservation: database searches, synteny
Sequence features: CpG islands
Evidence for transcription: ESTs, microarrays, SAGE
Gene inactivation: transformation, TEs, RNAi

44 Open reading frames
5'                                                      3'
   atgcccaagctgaatagcgtagaggggttttcatcatttgaggacgatgtataa
1  atg ccc aag ctg aat agc gta gag ggg ttt tca tca ttt gag gac gat gta taa
    M   P   K   L   N   S   V   E   G   F   S   S   F   E   D   D   V   *
2   tgc cca agc tga ata gcg tag agg ggt ttt cat cat ttg agg acg atg tat
     C   P   S   *   I   A   *   R   G   F   H   H   L   R   T   M   Y
3    gcc caa gct gaa tag cgt aga ggg gtt ttc atc att tga gga cga tgt ata
      A   Q   A   E   *   R   R   G   V   F   I   I   *   G   R   C   I
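The three-frame reading on this slide can be reproduced with a few lines of code. The sketch below is a minimal illustration, assuming the standard genetic code and the forward strand only; the names `CODON_TABLE` and `translate_frame` are illustrative, not from the slides. A real gene finder would also scan the reverse complement and look for long stop-free stretches.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate_frame(dna, frame):
    """Translate one forward reading frame (frame = 0, 1 or 2).

    Stop codons are shown as '*'; a trailing partial codon is dropped.
    """
    seq = dna.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)

# The sequence shown on the slide.
dna = "atgcccaagctgaatagcgtagaggggttttcatcatttgaggacgatgtataa"
frames = [translate_frame(dna, f) for f in range(3)]
for number, protein in enumerate(frames, start=1):
    print(number, protein)
```

Only frame 1 runs from a start codon to a single terminal stop without interruption; frames 2 and 3 hit stop codons early, which is the signal an ORF search relies on.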

45 Database searches

46 Synteny

47 CpG islands
CpG is subject to methylation, and most eukaryotes (not Drosophila) show less of this nonmethylated dinucleotide than base composition would indicate.
Concentrations of CpG may be detected using restriction enzymes whose recognition sequences include CpG.

48 CpG islands
Defined as regions of DNA at least 200 bp long that have a G+C content above 50% and an observed-to-expected CpG ratio close to or above 0.6.
Used to help predict gene sequences, especially promoter regions.
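The three criteria on this slide translate directly into a small check. The slide does not say how the "expected" CpG count is computed; the sketch below assumes the commonly used convention, expected = (number of C × number of G) / length. The function names and the use of a single fixed window are illustrative assumptions; practical tools slide a window along the sequence.

```python
def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CpG ratio) for a sequence.

    Expected CpG count is assumed to be (#C * #G) / length.
    """
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    observed = seq.count("CG")
    gc_fraction = (c_count + g_count) / n if n else 0.0
    expected = c_count * g_count / n if n else 0.0
    ratio = observed / expected if expected else 0.0
    return gc_fraction, ratio

def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the slide's thresholds: length, G+C content, obs/exp CpG."""
    gc_fraction, ratio = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and ratio >= min_ratio

# Artificial test sequences, not real genomic DNA.
cpg_rich = "CG" * 100        # 200 bp, every dinucleotide step includes CpG
gc_rich_no_cpg = "CAGG" * 50 # 200 bp, 75% G+C but no CpG at all
at_only = "AT" * 100         # 200 bp, no G or C

print(cpg_stats(cpg_rich), looks_like_cpg_island(cpg_rich))
print(cpg_stats(gc_rich_no_cpg), looks_like_cpg_island(gc_rich_no_cpg))
print(cpg_stats(at_only), looks_like_cpg_island(at_only))
```

The second test sequence shows why the ratio matters: high G+C content alone does not make a CpG island if the CpG dinucleotide itself is depleted.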

49

50

51

52 Evidence for Transcription
cDNAs, ESTs (expressed sequence tags), microarrays

53 Gene families
E.g. globins, actin, myosin
Clustered or dispersed
Pseudogenes

54 Pseudogenes
Nonfunctional copies of genes
Formed by duplication of an ancestral gene, or by reverse transcription (and integration)
Not expressed due to mutations that produce a stop codon (nonsense or frameshift) or prevent mRNA processing, or due to lack of regulatory sequences

55 Duplicated genes
Encode closely related (homologous) proteins
Formed by duplication of an ancestral gene followed by mutation
Five functional genes and two pseudogenes

56

57 Coding sequences less than 5% of the genome!

58 Noncoding RNAs
Do not have translated ORFs
Small
Not polyadenylated

59 Noncoding RNAs
Transfer RNAs: < 500
Ribosomal RNAs: tandem arrays on several chromosomes
Small nucleolar RNAs (snoRNAs): single genes
Small nuclear RNAs (snRNAs): spliceosomes; multiple dispersed copies; many pseudogenes

60

61

62

63

64

65 Some noncoding sequences are being found to be highly evolutionarily conserved across diverse species over millions of years. Some of them are in “gene deserts”. They must have a function to be maintained. What is it?

66 Repetitive DNA
Moderately repeated DNA
- Tandemly repeated rRNA, tRNA and histone genes (gene products needed in high amounts)
- Large duplicated gene families
- Mobile DNA
Simple-sequence DNA
- Tandemly repeated short sequences
- Found in centromeres and telomeres (and others)
- Used in DNA fingerprinting to identify individuals

67 Segmental duplications
Found especially around centromeres and telomeres
Often come from nonhomologous chromosomes
Many can come from the same source
Tend to be large (10 to 50 kb)
Unique to humans?

68

69 Repeat sequences – 50% or more of the genome

70 Mobile DNA
Moves within genomes
Most of the moderately repeated DNA sequences found throughout higher eukaryotic genomes
L1 LINE is ~5% of human DNA (~50,000 copies)
Alu is ~5% of human DNA (>500,000 copies)
Some encode enzymes that catalyze movement

71 Transposon-derived repeats
Long interspersed elements (LINEs)
Short interspersed elements (SINEs)
LTR (long terminal repeat) retrotransposons
DNA transposons
45% or more of genome

72

73

74 LINEs
LINE1 – active
LINE2 – inactive
LINE3 – inactive
Many truncated inactive sequences

75

76 Exception – Alu elements
Derived from signal recognition particle 7SL
Does not share its 3’ end with a LINE
Only active SINE in the human genome

77 LTR (long terminal repeat)
Flank viral retrotransposons and retroviruses
Repeats contain genes necessary for movement and replication
Retroviruses have acquired a CP gene
Many fossils

78

79 DNA transposons
Terminal inverted repeats
Transposase
7 major classes
Transposition doesn’t occur in humans anymore
Horizontal transfer

80 Different regions of the genome differ in density of repeats
Most LINEs accumulate in AT-rich regions
Alu elements accumulate in GC-rich regions – why?
Promote protein translation under stress?

81 Simple sequence repeats
Tandem repeats of a particular k-mer
1–13 base repeat unit – microsatellites
Trinucleotide repeats
14–500 base repeat unit – minisatellites (“variable numbers of tandem repeats”)
3% of genome
Used in mapping
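To make "tandem repeats of a particular k-mer" concrete, here is a minimal sketch that scans a sequence for perfect tandem repeats using a regular-expression backreference. The function name `find_tandem_repeats`, the default unit limit of 13 bases (the microsatellite range), and the minimum of three copies are illustrative assumptions; real repeat finders also tolerate mismatches and indels.

```python
import re

def find_tandem_repeats(seq, max_unit=13, min_copies=3):
    """Find perfect tandem repeats in a DNA sequence.

    Returns a list of (start, unit, copies) tuples. The repeat unit is
    matched lazily, so the shortest unit that explains a run is reported
    (e.g. 'A' rather than 'AA' for a poly-A run).
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits

# Artificial examples: a trinucleotide repeat and a dinucleotide repeat.
trinucleotide_hits = find_tandem_repeats("TTCAGCAGCAGCAGTT")
dinucleotide_hits = find_tandem_repeats("ACACACACAC")
print(trinucleotide_hits)
print(dinucleotide_hits)
```

Because the number of copies at a given locus varies between individuals, counting copies in this way is the basis for using such repeats as mapping markers.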

82

83

