BB30055: Genes and genomes Major insights from the HGP.


1 BB30055: Genes and genomes Major insights from the HGP

2 What makes us human? How does this … become this?

3 What makes us human? Single-nucleotide differences between the human and chimpanzee genomes occur at a mean rate of 1.23%. Nature 437, 50-51 (1 September 2005)

4 Major insights from the HGP
Nature (2001) 15 Feb, Vol 409 special issue; pp 814 & 875-914
1) Gene size, content and distribution
2) Proteome content
3) SNP identification
4) Distribution of GC content
5) CpG islands
6) Recombination rates
7) Repeat content

5 1) Gene size

6 Gene content
- More genes: twice as many as Drosophila / C. elegans
- Uneven gene distribution: gene-rich and gene-poor regions
- More paralogs: some gene families have expanded their number of paralogs, e.g. the olfactory gene family has 1000 genes
- More alternative transcripts: increased RNA splice variants, expanding the primary proteins 5-fold (e.g. neurexin genes)

7 Uneven gene distribution
- Gene-rich regions: e.g. MHC on chromosome 6 has 60 genes with GC content of 54%
- Gene-poor regions: 82 gene deserts identified (large or unidentified genes?)
What is the functional significance of these variations?

8 2) Proteome content
Proteome more complex than invertebrates
Protein domains (sections with identifiable shape/function)
Domain arrangements in humans:
- largest total number of domains is 130
- largest number of domain types per protein is 9
- mostly identical arrangement of domains

9 Proteome more complex than invertebrates...
- No huge difference in domain number in humans
- BUT the frequency of domain sharing is very high in human proteins (structural proteins and proteins involved in signal transduction and immune function)
- However, there are only 3 cases where a combination of 3 domain types is shared by human and yeast proteins, e.g. carbamoyl-phosphate synthase (involved in the first 3 steps of de novo pyrimidine biosynthesis) has 7 domain types, which occurs once in human and yeast but twice in Drosophila

10 3) SNPs (single nucleotide polymorphisms)
- Point mutations in single base pairs
- > 1.4 million SNPs identified (~1 in every 1.9 kb on average)
- ~60,000 SNPs lie within exons and untranslated regions (85% of exons lie within 5 kb of a SNP)
- May or may not affect the ORF (synonymous or non-synonymous)
- Most SNPs may be regulatory
Densities vary over regions and chromosomes, e.g. the HLA region has a high SNP density, reflecting maintenance of diverse haplotypes over many millions of years
Nature (2001) 15 Feb, Vol 409 special issue; pp 821-823 & 928
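The point that SNP density varies by region can be made concrete with a small windowed count. The following is only a minimal sketch: the function name `snp_density_per_kb`, the window size and the SNP positions are hypothetical and not taken from the HGP data. For comparison, the genome-wide average quoted on the slide (1 SNP per 1.9 kb) corresponds to roughly 0.53 SNPs per kb.

```python
from collections import Counter


def snp_density_per_kb(snp_positions, region_length, window=10_000):
    """Return SNPs per kb for each consecutive window of a region.

    snp_positions: 0-based coordinates of SNPs within the region.
    region_length: length of the region in bp.
    window: window size in bp (an arbitrary choice for illustration).
    """
    # Count how many SNPs fall into each window index.
    counts = Counter(
        pos // window for pos in snp_positions if 0 <= pos < region_length
    )
    n_windows = -(-region_length // window)  # ceiling division
    densities = []
    for i in range(n_windows):
        start = i * window
        size = min(window, region_length - start)  # last window may be short
        densities.append(counts.get(i, 0) / (size / 1000))
    return densities


# Hypothetical SNP positions in a 30 kb region (illustrative values only).
positions = [120, 4_500, 9_800, 15_000, 29_999]
densities = snp_density_per_kb(positions, region_length=30_000)
print(densities)
```

A region such as HLA would show up in this kind of scan as a run of windows with densities well above the genome-wide average.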

11 How does one distinguish sequence errors from polymorphisms?
Sequence errors
- Each piece of genome sequenced at least 10 times to reduce error rate (0.01%)
Polymorphisms
- Sequence variation between individuals (0.1%)
- To be defined as a polymorphism, the altered sequence must be present in a significant fraction of the population
- Rate of polymorphisms in diploid human genome is about 1 in 500 bp
Nature (2001) 15 Feb, Vol 409 special issue; pp 821-823 & 928

12 Haplotype (haploid genotype)
A haplotype is a set of single nucleotide polymorphisms (SNPs) on a single chromatid that are statistically associated.
Haplotypes are generally shared between populations but their frequency can vary.
International HapMap Project (www.hapmap.org)
- identifying common haplotypes in four populations from different parts of the world
- identifying "tag" SNPs with unique haplotype identities

13 Copy number variants (CNVs) challenge SNP concept
- DNA segment > 1 kb, with a variable copy number compared with a reference genome
- Variations in the copy number of sequences (> 500 bp)
- Caused by insertions/deletions ('indels'), inversions / translocations
Nature, Vol 447, 10 May 2007, pp 161

14 4) Distribution of GC content
- Genome-wide average of 41%
- Huge regional variations exist, e.g. the distal 48 Mb of chromosome 1p has 47% GC but chromosome 13 has only 36%
- Confirms cytogenetic staining with G-bands (Giemsa): dark G-bands have low GC content (37%), light G-bands have high GC content (45%)
Nature (2001) 15 Feb, Vol 409 special issue; pp 876-877
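Regional GC variation of this kind is usually measured by computing the GC fraction in windows along a sequence. A minimal sketch follows; the helper names `gc_fraction` and `windowed_gc` and the toy sequence are assumptions for illustration, not the method used in the Nature paper.

```python
def gc_fraction(seq):
    """Fraction of G and C among unambiguous A/C/G/T bases (0.0 if none)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at  # ambiguous bases such as N are ignored
    return gc / total if total else 0.0


def windowed_gc(seq, window, step=None):
    """GC fraction of each full window; windows do not overlap by default."""
    step = step or window
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence: an AT-only block, a GC-only block, then a mixed block.
toy = "ATATATATGCGCGCGCATGCATGC"
overall = gc_fraction(toy)
per_window = windowed_gc(toy, window=8)
print(overall, per_window)
```

On real chromosomes the window would be far larger (tens of kb or more), and the per-window values would range around the 41% genome-wide average quoted above.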

15 5) CpG islands
Significance of CpG islands
1) Non-methylated CpG islands associated with the 5' ends of genes
2) Usually overlap the promoter region
3) Aberrant methylation of CpG islands linked to pathologies like cancer or epigenetic diseases like Rett syndrome
[Diagram: CpG is methylated at C to give methyl-CpG; deamination converts methyl-CpG to TpG (C to T). CpG islands show no methylation.]
http://www.sanger.ac.uk/HGP/cgi.shtml

16 CpG islands
- Greatly under-represented in human genome: ~28,890 in number (5 times fewer than expected)
- ~56% of human genes and 47% of mouse genes have CpG islands
- Variable density, e.g. Y has 2.9/Mb but chromosomes 16, 17 and 22 have 19-22/Mb
- Average is 10.5/Mb
Nature (2001) 15 Feb, Vol 409 special issue; pp 877-888
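"Expected" here refers to how often the dinucleotide CG should occur given the C and G content of a sequence. A minimal sketch of the observed/expected CpG ratio is shown below. The thresholds (length of at least 200 bp, GC of at least 50%, ratio of at least 0.6) are the widely used Gardiner-Garden and Frommer criteria and are an assumption here, since the slides do not state which definition was applied; the function names are hypothetical.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * N) / (#C * #G)."""
    seq = seq.upper()
    c = seq.count("C")
    g = seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply assumed length, GC-content and obs/exp thresholds to one sequence."""
    seq = seq.upper()
    if len(seq) < min_len:
        return False
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return gc >= min_gc and cpg_obs_exp(seq) >= min_ratio


# Toy sequences: one saturated with CpG, one with 50% GC but no CpG at all.
cpg_rich = "CG" * 100
cpg_depleted = "CAGT" * 50
print(cpg_obs_exp(cpg_rich), looks_like_cpg_island(cpg_rich))
print(cpg_obs_exp(cpg_depleted), looks_like_cpg_island(cpg_depleted))
```

The second toy sequence shows why GC content alone is not enough: it meets the GC threshold but fails on the CpG ratio, which is the pattern produced by methylation followed by deamination of CpG to TpG.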

17 6) Recombination rates
2 main observations:
- Recombination rate increases with decreasing arm length
- Recombination rate is suppressed near the centromeres and increases towards the distal 20-35 Mb

18 7) Repeat content
a) Age distribution
b) Comparison with other genomes
c) Variation in distribution of repeats
d) Distribution by GC content
e) Y chromosome
Nature (2001) 409: pp 881-891

19 a) Age distribution
Overall decline in interspersed repeat activity in the hominid lineage in the past 35-40 Myr, compared to the mouse genome, which shows a younger and more dynamic genome

20 Repeat content... a) Age distribution
- Most interspersed repeats predate the eutherian radiation (confirms the slow rate of clearance of nonfunctional sequence from vertebrate genomes)
- LINEs and SINEs have extremely long lives
- 2 major peaks of transposon activity
- No DNA transposition in the past 50 Myr
- LTR retroposons teetering on the brink of extinction

21 b) Comparison with other genomes
- Higher density of transposable elements in euchromatic portion of genome
- Higher abundance of ancient transposons
- 60% of interspersed repeats (IR) are made up of LINE1 and Alu repeats, whereas DNA transposons represent only 6%

22 c) Variation in distribution of repeats
Some regions show either:
- High repeat density, e.g. chromosome Xp11, where a 525 kb region shows 89% repeat density
- Low repeat density, e.g. HOX homeobox gene cluster (< 2% repeats), indicative of regulatory elements which have low tolerance for insertions

23 d) Distribution by GC content
- High GC: gene rich; high AT: gene poor
- LINEs abundant in AT-rich regions
- SINEs lower in AT-rich regions
- Alu repeats in particular retained in actively transcribed GC-rich regions, e.g. chromosome 19 has 5% Alus compared to Y chromosome

24 e) The Y chromosome
- Unusually young genome (high tolerance to gaining insertions)
- Mutation rate is 2.1x higher in male germline

25 Working draft published: Feb 2001
Finished sequence: April 2003
Annotation of genes ongoing (refer: International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature, 21 October 2004, doi: 10.1038/nature03001)

26 References
- Chapter 9, pp 265-268, HMG 3 by Strachan and Read
- Chapter 10, pp 339-348, Genetics: from genes to genomes by Hartwell et al (3/e)
- Nature (2001) 409: pp 879-891
- Nature (2005) for Chimp genome

