Cryptic Variation in the Human Mutation Rate
Alan Hodgkinson, Adam Eyre-Walker, Manolis Ladoukakis

1 Cryptic Variation in the Human Mutation Rate
Alan Hodgkinson, Adam Eyre-Walker, Manolis Ladoukakis

2 Variation in the mutation rate:
- Between different chromosomes
- Between regions on chromosomes
- Neighbouring nucleotides

3 Simple context effects
Hwang and Green (2004) PNAS 101: 13994-14001

6 Cryptic Variation
- Remote context: AGTCGGTTACCGTGACGTTGAACGTGT
- Degenerate context: AGTCGGTTACCGTGYSRGYGAACGTGT
- No context / complex context
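A degenerate context is written with IUPAC ambiguity codes (Y = C/T, S = C/G, R = A/G). As an illustration only, here is a minimal Python sketch of testing a sequence against such a motif. The function names are hypothetical and are not part of the analysis described in these slides.

```python
# IUPAC nucleotide codes: each code lists the bases it allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def matches_degenerate(seq: str, motif: str) -> bool:
    """True if every base in seq is allowed by the IUPAC code at the same position in motif."""
    if len(seq) != len(motif):
        return False
    return all(base in IUPAC[code] for base, code in zip(seq.upper(), motif.upper()))

def find_degenerate(seq: str, motif: str) -> list[int]:
    """0-based start positions of every window of seq that matches motif."""
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if matches_degenerate(seq[i:i + k], motif)]
```

For example, `CGAGT` matches `YSRGY`, whereas the remote-context bases `ACGTT` at the same position do not (A is not allowed by Y). A degenerate context of this kind could still be detected by a scan; cryptic variation is the case where no such motif exists.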

8 Our approach to the problem
Search for SNPs in human sequences that also have a SNP at the orthologous position in chimp.
(Figure: aligned human and chimp sequences.)
Do we see more coincident SNPs than expected by chance?

13 The method
- Extract all human SNPs from dbSNP and construct a BLAST database on a chromosome-by-chromosome basis.
- Extract all chimp SNPs from dbSNP with 50 bp either side of each SNP.
- BLAST the chimp SNPs against the human database.
- Keep hits above a set level of homology in which both sequences carry a SNP, and trim them to 40 bp either side of the central position.
- Repeat the analysis both including and excluding CpG sites.
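The post-BLAST steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each hit is taken to be a pair of equal-length gapped alignment strings, trimming counts alignment columns rather than ungapped bases, and the homology threshold (not given on the slide) is left to the caller. All function names are hypothetical.

```python
FLANK = 40  # columns kept either side of the central (chimp SNP) position: 81 in total

def percent_identity(a: str, b: str) -> float:
    """Percentage of alignment columns where both sequences carry the same base."""
    if len(a) != len(b) or not a:
        raise ValueError("expected two non-empty aligned strings of equal length")
    same = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * same / len(a)

def trim_window(a: str, b: str, centre: int, flank: int = FLANK):
    """Cut both aligned sequences to `flank` columns either side of `centre`.
    Returns None when the alignment does not extend far enough on both sides."""
    if centre - flank < 0 or centre + flank >= len(a):
        return None
    return a[centre - flank:centre + flank + 1], b[centre - flank:centre + flank + 1]

def is_cpg_site(seq: str, i: int, alleles: str = "") -> bool:
    """True if position i can be the C or the G of a CpG dinucleotide, considering
    the base in seq and any SNP alleles supplied (e.g. alleles="CT")."""
    s = seq.upper()
    bases = {s[i]} | set(alleles.upper())
    left = s[i - 1] if i > 0 else ""
    right = s[i + 1] if i + 1 < len(s) else ""
    return ("C" in bases and right == "G") or ("G" in bases and left == "C")
```

Passing the SNP alleles to `is_cpg_site` matters because a C/T SNP followed by G is a CpG site even if the reference base shown is T; excluding such sites gives the "No-CpG" rows reported later.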

15 Results
- ~1.5 million chimp SNPs.
- ~310,000 81 bp alignments containing both a human and a chimp SNP.
- Observe the number of coincident SNPs.
- Calculate the expected number, taking into account the effects of neighbouring nucleotides.
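One way to compute an expected number that accounts for neighbouring nucleotides is sketched below. It is an assumption, not the authors' exact method: each human SNP in an 81 bp window is treated as landing on a site with probability proportional to a context weight, here a hypothetical trinucleotide rate table standing in for a Hwang and Green style context model.

```python
from typing import Mapping, Optional, Sequence

CENTRE = 40  # index of the chimp SNP in an 81 bp window (40 bp either side)

def site_weight(seq: str, i: int, rates: Mapping[str, float]) -> float:
    """Relative mutability of position i from its trinucleotide context.
    Contexts missing from `rates`, and the two window edges, get weight 1.0."""
    if 0 < i < len(seq) - 1:
        return rates.get(seq[i - 1:i + 2], 1.0)
    return 1.0

def observed_and_expected(
    windows: Sequence[tuple[str, list[int]]],
    rates: Optional[Mapping[str, float]] = None,
) -> tuple[int, float]:
    """Observed and expected numbers of coincident SNPs.

    Each window is (human sequence, 0-based positions of human SNPs in it), with
    the chimp SNP at CENTRE. Under the null, a human SNP falls on a site with
    probability proportional to that site's context weight.
    """
    rates = rates or {}
    observed, expected = 0, 0.0
    for seq, human_snps in windows:
        if CENTRE in human_snps:
            observed += 1
        weights = [site_weight(seq, i, rates) for i in range(len(seq))]
        expected += len(human_snps) * weights[CENTRE] / sum(weights)
    return observed, expected
```

With every weight equal to 1 this reduces to "number of human SNPs / 81" per window; `observed / expected` is the ratio reported in the results table.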

16 Results
        Obs    Exp   Ratio (95% CI)
All     11571  6592  1.76 (1.72, 1.79)
No-CpG  5028   2533  1.98 (1.93, 2.04)

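17 Results
Observed/expected ratios of coincident SNPs by SNP type.
SNP types: C/T, G/A, C/A, G/T, C/G, A/T
C/T: 1.91, 1.04, 1.19, 1.21, 0.96
G/A: 1.83, 1.24, 1.02, 1.14, 1.40
C/A: 1.23, 1.08, 4.81, 1.28, 1.39
G/T: 1.15, 1.38, 4.95, 1.27, 0.77
C/G: 1.09, 1.14, 1.24, 1.40, 2.79
A/T: 0.94, 1.06, 1.79, 0.99, 15.43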

18 Alternative Explanations
- Bias in the method
- Selection
- Ancestral polymorphism
- Paralogous SNPs

20 Methodological Bias
Simulated data with the same density of human and chimp SNPs as dbSNP, under different levels of divergence and different mutation patterns. The method worked well under realistic conditions.
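The slides do not give the simulation details, so the following is a much simpler null model, offered only as a sketch: human SNPs fall independently and uniformly (no context effects, no divergence) around a chimp SNP placed at the centre of every 81 bp window. Under this model the observed/expected ratio should be close to 1, as in the simulation results. The function name and parameters are hypothetical.

```python
import random

WINDOW = 81
CENTRE = WINDOW // 2  # chimp SNP sits at the centre of every window

def simulate_ratio(n_windows: int, p_human: float, seed: int = 0) -> float:
    """Obs/exp ratio of coincident SNPs when human SNPs occur independently
    at each site with probability p_human, regardless of the chimp SNP."""
    rng = random.Random(seed)
    observed, expected = 0, 0.0
    for _ in range(n_windows):
        snps = [i for i in range(WINDOW) if rng.random() < p_human]
        if not snps:
            continue  # only windows containing a human SNP are analysed
        if CENTRE in snps:
            observed += 1
        expected += len(snps) / WINDOW  # uniform chance of landing on the centre
    return observed / expected
```

A ratio well above 1 in real data, when this kind of simulation gives about 1, is what points away from a methodological artefact.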

21 Methodological Bias
All sites (H&G):
Div  Obs   Exp   Ratio  95% CI
0    839   812   1.033  (0.963, 1.103)
1    2419  2316  1.040  (1.003, 1.086)
2    681   685   0.995  (0.920, 1.069)
Non-CpG sites (H&G):
Div  Obs   Exp   Ratio  95% CI
0    401   428   0.936  (0.844, 1.028)
1    1182  1228  0.963  (0.908, 1.018)
2    374   400   0.935  (0.840, 1.030)

23 Alternative Explanations
- Bias in the method
- Selection
- Ancestral polymorphism
- Paralogous SNPs

25 Selection
Areas of low SNP density result in clustering of SNPs:
(Figure: aligned human and chimp sequences.)
This produces an apparent excess of coincident SNPs.

26 Selection
No clustering:

27 Alternative Explanations
- Bias in the method
- Selection
- Ancestral polymorphism
- Paralogous SNPs

29 Ancestral Polymorphism
A SNP inherited from the common ancestor of chimp and human:
(Figure: a T/A polymorphism in the common ancestor retained in both human and chimp.)
This would increase the number of coincident SNPs.

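30 Ancestral Polymorphism
If shared ancestral polymorphism explained the excess, we would expect the observed/expected ratio to be the same for all types of mutation:
SNP types: C/T, G/A, C/A, G/T, C/G, A/T
C/T: 1.91, 1.04, 1.19, 1.21, 0.96
G/A: 1.83, 1.24, 1.02, 1.14, 1.40
C/A: 1.23, 1.08, 4.81, 1.28, 1.39
G/T: 1.15, 1.38, 4.95, 1.27, 0.77
C/G: 1.09, 1.14, 1.24, 1.40, 2.79
A/T: 0.94, 1.06, 1.79, 0.99, 15.43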

32 Ancestral Polymorphism
Repeated the initial analysis with macaque data. Humans and macaques split ~23-24 million years ago, so we expect no shared polymorphisms.
        Obs  Exp  Ratio (95% CI)
All     77   47   1.64 (1.27, 2.00)
No-CpG  34   23   1.51 (1.001, 2.02)

33 Alternative Explanations
- Bias in the method
- Selection
- Ancestral polymorphism
- Paralogous SNPs

36 Paralogous SNPs
The excess of coincident SNPs could be a consequence of artifactual SNPs called as a result of substitutions between paralogous regions.
Musumeci et al. (2010): 8.32% of human variation in dbSNP may be due to paralogy.
AGCTGCACGT (T/A) CGGCATCCAA   SNP call
AGCTGCACGT T CGGCATCCAA       Chromosome 1
AGCTGCACGT A CGGCATCCAA       Chromosome 7
Result: an artifactual SNP.
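A simple screen for the situation in the figure is to ask whether every allele of a SNP, together with its flanks, already occurs somewhere in the reference genome. The sketch below assumes exact matching on both strands of a toy genome held in a dict; real screens would use an aligner that tolerates mismatches, and this is not the procedure of Musumeci et al. All names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def occurrences(probe: str, genome: dict[str, str]) -> list[tuple[str, int]]:
    """All (chromosome, 0-based start) positions where probe occurs exactly."""
    hits = []
    for chrom, seq in genome.items():
        start = seq.find(probe)
        while start != -1:
            hits.append((chrom, start))
            start = seq.find(probe, start + 1)
    return hits

def paralog_candidate(left: str, right: str, alleles: str, genome: dict[str, str]) -> bool:
    """Flag a SNP if every allele, with identical flanks, is already present in the
    reference genome on either strand: the 'alleles' may be two paralogous copies."""
    if len(alleles) < 2:
        return False
    for allele in alleles:
        probe = left + allele + right
        if not (occurrences(probe, genome) or occurrences(revcomp(probe), genome)):
            return False
    return True
```

Using the slide's example, `paralog_candidate("AGCTGCACGT", "CGGCATCCAA", "TA", genome)` flags the SNP when one copy carrying T and another carrying A are both present, as on chromosomes 1 and 7.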

38 Paralogous SNPs
AGCTGCACGT (T/A) CGGCATCCAA
AGCTGCACGT T CGGCATCCAA
AGCTGCACGT (T/A) CGGCATCCAA
AGCTGCACGT T CGGCATCCAA
AGCTGCACGT A CGGCATCCAA
3.6% of coincident SNPs are possibly a consequence of paralogous sequences.

39 Alternative Explanations
- Bias in the method
- Selection
- Ancestral polymorphism
- Paralogous SNPs
- Cryptic variation in the mutation rate

40 Context Analysis
4,517 sequences, each containing a non-CpG coincident SNP flanked by 200 bp. Tabulate triplet frequencies at each position in the surrounding sequences. Test whether the proportions of triplets observed at each position differ significantly from the proportions in the sequences as a whole.
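The per-position comparison can be sketched as below. The slide does not name the test, so Pearson's chi-square statistic is an assumption here; the sketch also assumes equal-length sequences aligned on the coincident SNP and counts only triplets made of A, C, G and T.

```python
from collections import Counter
from itertools import product

TRIPLETS = ["".join(t) for t in product("ACGT", repeat=3)]

def triplet_counts_by_position(seqs: list[str]) -> list[Counter]:
    """For each start position, count the triplets found there across all sequences."""
    length = len(seqs[0])
    return [Counter(s[i:i + 3] for s in seqs) for i in range(length - 2)]

def chi_square_by_position(seqs: list[str]) -> list[float]:
    """Pearson chi-square statistic per position, comparing that position's triplet
    counts with the triplet proportions pooled over all positions."""
    per_pos = triplet_counts_by_position(seqs)
    pooled = Counter()
    for counts in per_pos:
        pooled.update(counts)
    total = sum(pooled[t] for t in TRIPLETS)
    stats = []
    for counts in per_pos:
        n = sum(counts[t] for t in TRIPLETS)
        stat = 0.0
        for t in TRIPLETS:
            expected = n * pooled[t] / total
            if expected > 0:
                stat += (counts[t] - expected) ** 2 / expected
        stats.append(stat)
    return stats
```

A p-value could then be taken from the chi-square distribution (for example with `scipy.stats.chi2.sf`), using the number of triplet classes with non-zero expectation minus one as the degrees of freedom and correcting for the several hundred positions tested.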

42 Context Analysis
Coincident SNP in central position:
No obvious context surrounding coincident SNPs.

45 Genomic Distribution
Tallied the number of coincident SNPs per Mb:
- 3.91 coincident SNPs per Mb.
- 1.68 non-CpG coincident SNPs per Mb.
If coincident SNPs were randomly distributed, we would expect a Poisson distribution with $\mu = \sigma^2 = 3.91$.
Observed $\sigma^2 = 13.27$ ($p < 0.001$), so sampling variance explains approximately 30% of the total variance ($3.91 / 13.27 \approx 0.29$).
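The over-dispersion check can be sketched from a list of per-Mb counts. The slide does not state which test produced p < 0.001; the index-of-dispersion statistic used here, $(n-1)s^2/\bar{x}$ compared with a chi-square distribution on $n-1$ degrees of freedom, is one standard choice and is an assumption.

```python
from statistics import mean, variance

def dispersion_summary(counts: list[int]) -> dict:
    """Compare per-window counts with a Poisson expectation (variance equal to mean)."""
    m = mean(counts)
    v = variance(counts)  # sample variance, n - 1 denominator
    n = len(counts)
    return {
        "mean": m,
        "variance": v,
        # share of the total variance that Poisson sampling alone would produce
        "sampling_fraction": m / v,
        # index of dispersion: ~ chi-square with n - 1 df if counts are Poisson
        "dispersion_stat": (n - 1) * v / m,
        "df": n - 1,
    }
```

With the genome-wide figures, `sampling_fraction` is 3.91 / 13.27, about 0.29, which is the "approximately 30%" on the slide.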

46 Genomic Distribution
Feature                  r       r²      p
SNP density              0.256   0.0655  <0.001**
Distance to telomere    -0.022   0.0004  0.226
Distance to centromere   0.011   0.0001  0.565
Recombination rate       0.107   0.0114  <0.001**
Nucleosome association   0.004   0.0000  0.832
Gene density            -0.022   0.0004  0.230
GC content              -0.006   0.0000  0.741

49 Genomic Distribution
- SNP density must drive coincident SNP density to some extent, as approximately half of coincident SNPs arise by chance alone.
- Recombination rate is positively correlated with SNP density (r = 0.242, p < 0.001). Partial correlation between recombination rate and coincident SNP density, controlling for SNP density: r = 0.048, p = 0.011**.
- SNP density explains 6.5% of the variance in coincident SNP density; recombination rate explains 0.2%.
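The partial correlation follows from the standard first-order formula. Assuming the correlations on slide 46 are with coincident SNP density (r = 0.107 with recombination rate, r = 0.256 with SNP density) and taking r = 0.242 between recombination rate and SNP density from this slide, the sketch below reproduces the reported r of about 0.048 and the 0.2% of variance.

```python
import math

def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# x = coincident SNP density, y = recombination rate, z = SNP density
r = partial_corr(r_xy=0.107, r_xz=0.256, r_yz=0.242)
print(f"partial r = {r:.3f}, variance explained = {r ** 2:.1%}")
# prints: partial r = 0.048, variance explained = 0.2%
```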

50 Genomic Distribution
Feature                  r       r²      p
Coincident SNP density   0.256   0.0655  <0.001**
Distance to telomere    -0.171   0.0292  <0.001**
Distance to centromere  -0.047   0.0022  0.012**
Recombination rate       0.234   0.0548  <0.001**
Nucleosome association   0.187   0.0350  <0.001**
Gene density             0.064   0.0041  0.001**
GC content               0.184   0.0339  <0.001**

52 Quantification
- Use a log-normal distribution of relative mutation rates to describe cryptic variation.
- Model the number of coincident SNPs under the effects of cryptic variation.
- Incorporate the effects of divergence.
What level of variation in the log-normal distribution explains our results?
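The core of such a model can be sketched under strong simplifying assumptions that are not the authors' full model: mutation is rare, each site has the same relative rate r in both lineages, r is log-normal with log-scale standard deviation sigma, and divergence and context effects are ignored. The chance of a SNP at a site in both species then scales with r², so the observed/expected ratio of coincident SNPs is E[r²]/E[r]² = exp(sigma²).

```python
import math
import random

def expected_excess(sigma: float) -> float:
    """Obs/exp ratio of coincident SNPs implied by log-normal rate variation
    (closed form E[r^2] / E[r]^2 = exp(sigma^2); no divergence correction)."""
    return math.exp(sigma ** 2)

def simulated_excess(sigma: float, n_sites: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo check of the same quantity from simulated site rates."""
    rng = random.Random(seed)
    rates = [rng.lognormvariate(0.0, sigma) for _ in range(n_sites)]
    mean_r = sum(rates) / n_sites
    mean_r2 = sum(r * r for r in rates) / n_sites
    return mean_r2 / mean_r ** 2
```

Under these assumptions, fitting sigma amounts to finding the value whose predicted excess matches the observed ratio; the full model also has to account for divergence.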

53 Log-normal model
The fastest 5% of sites mutate ~16.4 times faster than the slowest 5% of sites.
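If "fastest 5% versus slowest 5%" is read as the ratio of the 95th to the 5th percentile of the log-normal (an interpretation, not stated on the slide), that ratio is exp(2 × 1.645 × sigma). The sketch converts between the two, giving a log-scale standard deviation of about 0.85 for a ratio of 16.4.

```python
import math
from statistics import NormalDist

# z-score of the 95th percentile of a standard normal (about 1.645)
Z95 = NormalDist().inv_cdf(0.95)

def percentile_ratio(sigma: float) -> float:
    """95th / 5th percentile of a log-normal whose logarithm has s.d. sigma."""
    return math.exp(2 * Z95 * sigma)

def sigma_for_ratio(ratio: float) -> float:
    """Log-scale standard deviation giving the requested 95th/5th percentile ratio."""
    return math.log(ratio) / (2 * Z95)

sigma = sigma_for_ratio(16.4)
print(f"sigma = {sigma:.2f}")
# prints: sigma = 0.85
```

Comparing mean rates within the top and bottom 5% instead would give a larger ratio for the same sigma, so the interpretation matters when quoting the figure.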

58 Summary
- Cryptic variation in the mutation rate.
- No obvious context surrounding coincident SNPs: the variation is truly cryptic.
- The genomic distribution of coincident SNPs is over-dispersed.
- Variation in the mutation rate is substantial.

59 Acknowledgments
People: Adam Eyre-Walker, Manolis Ladoukakis
BBSRC

