Major insights from the HGP. Nature (2001) 15 Feb, Vol 409 special issue; pp 814 & 875-914. 1) Gene content 2) Proteome content 3) SNP identification 4) Distribution of GC content 5) CpG islands 6) Recombination rates 7) Repeat content
1) Gene content
30,000-40,000 protein-coding genes estimated, based on known genes and predictions
                  IHGSC     Celera
definite genes    24,500    26,383
possible genes     5,000    12,000
Genes encode either protein or noncoding RNAs (rRNA, tRNA, snRNA, snoRNA)
Nature (2001) 15 Feb, Vol 409 special issue; pp 814-816 and 860-914.
Gene content (continued). More genes: twice as many as Drosophila / C. elegans. Uneven gene distribution: gene-rich and gene-poor regions. More paralogs: some gene families have expanded their number of paralogs, e.g. the olfactory gene family has 1000 genes. More alternative transcripts: increased RNA splice variants are produced, expanding the primary proteins 5-fold (e.g. neurexin genes). Nature (2001) 409: pp 892
Gene content: uneven gene distribution. Gene-rich regions: e.g. the MHC on chromosome 6 has 60 genes with a GC content of 54%. Gene-poor regions: 82 gene deserts identified (? large or unidentified genes). What is the functional significance of these variations? Genetics by Hartwell: pp 341-347
2) Proteome content: proteome more complex than invertebrates. Protein domains (sections with identifiable shape/function). Domain arrangements in humans: largest total number of domains is 130; largest number of domain types per protein is 9; mostly identical arrangement of domains. [Figure: Protein X with domain arrangement AABBCBCCCC] Nature (2001) 15 Feb, Vol 409 special issue; pg 847
2) Proteome content (continued): proteome more complex than invertebrates. No huge difference in domain number in humans, BUT the frequency of domain sharing is very high in human proteins (structural proteins and proteins involved in signal transduction and immune function). However, there are only 3 cases where a combination of 3 domain types is shared by human & yeast proteins, e.g. carbamoyl-phosphate synthase (involved in the first 3 steps of de novo pyrimidine biosynthesis) has 7 domain types, and occurs once in human and yeast but twice in Drosophila. Nature (2001) 15 Feb, Vol 409 special issue; pg 847
3) SNPs (single nucleotide polymorphisms). More than 1.4 million SNPs identified. One every 1.9 kb on average. Densities vary over regions and chromosomes, e.g. the HLA region has a high SNP density, reflecting maintenance of diverse haplotypes over many millions of years. Nature (2001) 15 Feb, Vol 409 special issue; pp 821-823 & 928
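The two headline figures on this slide can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes the SNPs are spread evenly at the quoted mean spacing, which the slide itself notes is not true region by region.

```python
# Figures quoted on the slide.
SNP_COUNT = 1_400_000      # "more than 1.4 million SNPs"
MEAN_SPACING_BP = 1_900    # "one every 1.9 kb on average"


def implied_sequence_length(n_snps: int, spacing_bp: int) -> int:
    """Sequence length (bp) implied by n_snps at a given mean spacing."""
    return n_snps * spacing_bp


def snps_per_mb(spacing_bp: int) -> float:
    """Mean SNP density per megabase for a given mean spacing."""
    return 1_000_000 / spacing_bp


length_bp = implied_sequence_length(SNP_COUNT, MEAN_SPACING_BP)
print(f"{length_bp / 1e9:.2f} Gb")                         # 2.66 Gb
print(f"{snps_per_mb(MEAN_SPACING_BP):.0f} SNPs per Mb")   # 526 SNPs per Mb
```

The implied length of about 2.66 Gb is only what the two quoted numbers jointly imply, not a separately sourced genome size.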
How does one distinguish sequence errors from polymorphisms? Sequence errors: each piece of genome is sequenced at least 10 times to reduce the error rate (0.01%). Polymorphisms: sequence variation between individuals is 0.1%. To be defined as a polymorphism, the altered sequence must be present in a significant fraction of the population. Rate of polymorphism in the diploid human genome is about 1 in 500 bp. Nature (2001) 15 Feb, Vol 409 special issue; pp 821-823 & 928
3) SNPs (continued). Sites that result from point mutations in individual base pairs; biallelic. ~60,000 SNPs lie within exons and untranslated regions (85% of exons lie within 5 kb of a SNP). May or may not affect the ORF. Most SNPs may be regulatory. Nature (2001) 15 Feb, Vol 409 special issue; pp 821 & 928 http://www.genetics.gsk.com/kids/medicine01.htm
4) Distribution of GC content. Genome-wide average of 41%. Huge regional variations exist, e.g. the distal 48 Mb of chromosome 1p has 47%, but chromosome 13 has only 36%. Confirms cytogenetic staining with G-bands (Giemsa): dark G-bands have low GC content (37%); light G-bands have high GC content (45%). Nature (2001) 15 Feb, Vol 409 special issue; pp 876-877
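Regional GC content of the kind quoted above is usually measured as the fraction of G and C bases in fixed-size windows along a sequence. The following is a minimal sketch of that calculation, not the procedure used in the Nature paper; the function names, the window size and the toy sequence are assumptions made for illustration.

```python
from typing import List, Optional


def gc_fraction(seq: str) -> float:
    """Fraction of called bases (A, C, G, T) that are G or C; N is ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    called = gc + seq.count("A") + seq.count("T")
    return gc / called if called else 0.0


def windowed_gc(seq: str, window: int, step: Optional[int] = None) -> List[float]:
    """GC fraction of each full window; windows do not overlap by default."""
    step = step or window
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


toy = "GGGGAAAA"                 # hypothetical toy sequence
print(gc_fraction(toy))          # 0.5
print(windowed_gc(toy, 4))       # [1.0, 0.0]
```

The toy output shows how an unremarkable overall average (50% here) can hide sharp regional differences, which is the slide's point about the 41% genome-wide figure.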
5) CpG islands. Significance of CpG islands: 1) Non-methylated CpG islands are associated with the 5' ends of genes. 2) Aberrant methylation of CpG islands is one mechanism of inactivating tumor suppressor genes (TSGs) in neoplasia. [Figure: CpG is methylated at C to give methyl-CpG, which is converted to TpG by deamination; CpG islands show no methylation] http://www.sanger.ac.uk/HGP/cgi.shtml
CpG islands (continued). CpG dinucleotides are greatly under-represented in the human genome. ~28,890 CpG islands in number. Variable density, e.g. Y has 2.9/Mb but chromosomes 16, 17 & 22 have 19-22/Mb. Average is 10.5/Mb. Nature (2001) 15 Feb, Vol 409 special issue; pp 877-888
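Under-representation of CpG is usually expressed as an observed/expected ratio: the number of CG dinucleotides seen, divided by the number expected from the C and G counts alone. The sketch below assumes the widely used Gardiner-Garden and Frommer thresholds (length at least 200 bp, GC content at least 50%, observed/expected at least 0.6); these thresholds and the function names are not taken from the slides, and the exact island-calling procedure behind the ~28,890 figure is not reproduced here.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_oe: float = 0.6) -> bool:
    """Apply the assumed length, GC-content and obs/exp thresholds to one sequence."""
    seq = seq.upper()
    if len(seq) < min_len:
        return False
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return gc >= min_gc and cpg_obs_exp(seq) >= min_oe


print(cpg_obs_exp("CCGG"))                  # 1.0
print(looks_like_cpg_island("CG" * 100))    # True
print(looks_like_cpg_island("AT" * 100))    # False
```

A ratio well below 1 across bulk genomic sequence is what "under-represented" means here; islands are the stretches where the ratio stays high.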
6) Recombination rates. 2 main observations: recombination rate increases with decreasing arm length; recombination rate is suppressed near the centromeres and increases towards the distal 20-35 Mb.
7) Repeat content: a) Age distribution b) Comparison with other genomes c) Variation in distribution of repeats d) Distribution by GC content e) Y chromosome. Nature (2001) 409: pp 881-891