Linkage Disequilibrium Granovsky Ilana and Berliner Yaniv Computational Genetics 19.06.03.

1 Linkage Disequilibrium Granovsky Ilana and Berliner Yaniv Computational Genetics 19.06.03

2 What is Linkage Disequilibrium?
When the occurrences of specific alleles at different loci on the same haplotype are not independent, the deviation from independence is termed linkage disequilibrium. Linkage disequilibrium is usually seen as an association between one specific allele at one locus and another specific allele at a second locus.

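3 Linkage Disequilibrium Coefficient Definitions

| Marker 1 \ Marker 2 | Allele 1 (probability $p_2$) | Allele 2 (probability $1-p_2$) |
|---|---|---|
| Allele 1 (probability $p_1$) | $X_1$: $p_1 p_2 + D_{11}$ | $X_2$: $p_1(1-p_2) - D_{11}$ |
| Allele 2 (probability $1-p_1$) | $X_3$: $(1-p_1)p_2 - D_{11}$ | $X_4$: $(1-p_1)(1-p_2) + D_{11}$ |

$X_i$: number of observations in cell $i$, with $X_1+X_2+X_3+X_4 = n$. Each cell also shows its expected frequency.
$D_{11}$: coefficient of gametic linkage disequilibrium between allele 1 at locus 1 and allele 1 at locus 2:
$D_{11} = E[X_1 \mid n=1]\,E[X_4 \mid n=1] - E[X_2 \mid n=1]\,E[X_3 \mid n=1]$
Substituting the four cell frequencies from the table into this product difference leaves exactly $D_{11}$.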
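The two equivalent forms of the coefficient (the cross-product difference and "observed minus expected") can be checked numerically. The helpers below are hypothetical names of our own, not part of EH; they take the four haplotype counts $X_1..X_4$ laid out as in the table.

```python
def ld_coefficient(x1: int, x2: int, x3: int, x4: int) -> float:
    """D11 as pi1*pi4 - pi2*pi3 from a 2x2 table of haplotype counts.

    Hypothetical helper. Layout (marker 1 allele / marker 2 allele):
    x1 = 1/1, x2 = 1/2, x3 = 2/1, x4 = 2/2.
    """
    n = x1 + x2 + x3 + x4
    pi1, pi2, pi3, pi4 = x1 / n, x2 / n, x3 / n, x4 / n  # observed cell frequencies
    return pi1 * pi4 - pi2 * pi3


def ld_from_margins(x1: int, x2: int, x3: int, x4: int) -> float:
    """The same quantity written as pi1 - p1 * p2 (observed minus expected)."""
    n = x1 + x2 + x3 + x4
    p1 = (x1 + x2) / n  # frequency of allele 1 at marker 1
    p2 = (x1 + x3) / n  # frequency of allele 1 at marker 2
    return x1 / n - p1 * p2
```

With the unambiguous haplotype counts of the example on slide 9 (AB = 45, Ab = 38, aB = 29, ab = 46), both functions give about 0.0388.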

4 Population-based sampling and the EH program
We wish to test for the absence of disequilibrium between allele A at locus 1 and allele B at locus 2 ($D_{AB} = 0$). Our sample consists of unphased genotype data, so not all of the haplotypes carried by each individual can be distinguished.

5 Table of all possible two-locus genotypes

| Locus 2 \ Locus 1 | AA | Aa | aa |
|---|---|---|---|
| BB | k1 | k2 | k3 |
| Bb | k4 | k5 | k6 |
| bb | k7 | k8 | k9 |

Cell 5 (Aa/Bb) can contain either of two phases, AB/ab or Ab/aB.
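A small hypothetical helper that places a two-locus genotype into its cell and tallies a sample into the nine counts k1..k9. The function names and the genotype spellings ("AA", "Aa", "aa", "BB", "Bb", "bb", heterozygotes always written uppercase first) are assumptions for illustration.

```python
from collections import Counter

COLUMNS = {"AA": 0, "Aa": 1, "aa": 2}  # locus 1 genotype -> column
ROWS = {"BB": 0, "Bb": 1, "bb": 2}     # locus 2 genotype -> row


def genotype_cell(locus1: str, locus2: str) -> int:
    """Return the cell number (1..9) of a two-locus genotype, numbered row by row."""
    return 3 * ROWS[locus2] + COLUMNS[locus1] + 1


def tally(individuals):
    """Count (locus1, locus2) genotype pairs into the list [k1, ..., k9]."""
    cells = Counter(genotype_cell(g1, g2) for g1, g2 in individuals)
    return [cells[i] for i in range(1, 10)]
```

Only `genotype_cell("Aa", "Bb") == 5` lands in the phase-ambiguous cell.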

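6 Analysis of likelihood
We maximize the log likelihood of the observed data, $\ln L = \sum_{i=1}^{9} k_i \ln p_i$ (up to an additive constant), where $p_i$ is the probability of genotype cell $i$:
For cell 1: $p_1 = [P(AB)]^2$
For cell 4: $p_4 = 2P(AB)P(Ab)$
For cell 5: $p_5 = P(AB/ab) + P(Ab/aB) = 2P(AB)P(ab) + 2P(Ab)P(aB)$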

7 Table of probabilities in each cell

| Locus 2 \ Locus 1 | AA | Aa | aa |
|---|---|---|---|
| BB | $P(AB)^2$ | $2P(AB)P(aB)$ | $P(aB)^2$ |
| Bb | $2P(AB)P(Ab)$ | $2P(AB)P(ab) + 2P(Ab)P(aB)$ | $2P(aB)P(ab)$ |
| bb | $P(Ab)^2$ | $2P(Ab)P(ab)$ | $P(ab)^2$ |
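The probability table translates directly into a log-likelihood function. This is a sketch, not EH's source: the function names are ours, counts are in k1..k9 order (BB row, then Bb, then bb; columns AA, Aa, aa), and the multinomial constant is omitted. Under no association the haplotype frequencies are products of allele frequencies; with the example counts and allele frequencies computed from them (133/258 and 124/258), this gives about -252.68, the H0 value shown in the chi-square slide.

```python
import math


def cell_probabilities(h_AB, h_Ab, h_aB, h_ab):
    """Probabilities of the nine genotype cells, in k1..k9 order."""
    return [
        h_AB ** 2, 2 * h_AB * h_aB, h_aB ** 2,                           # BB: AA, Aa, aa
        2 * h_AB * h_Ab, 2 * (h_AB * h_ab + h_Ab * h_aB), 2 * h_aB * h_ab,  # Bb
        h_Ab ** 2, 2 * h_Ab * h_ab, h_ab ** 2,                           # bb
    ]


def log_likelihood(counts, haplotype_freqs):
    """ln L = sum_i k_i ln p_i; empty cells contribute nothing (avoids log(0))."""
    probs = cell_probabilities(*haplotype_freqs)
    return sum(k * math.log(p) for k, p in zip(counts, probs) if k > 0)
```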

8 Analysis of likelihood
We maximize the likelihood above over the possible haplotype frequencies, parameterized by $p(A)$, $p(B)$ and $D_{AB}$. This maximum is then compared with the maximum likelihood obtained when $D_{AB}$ is set equal to 0 (absence of linkage disequilibrium).
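The slides do not say how EH carries out this maximization. One standard method, swapped in here as an illustration rather than claimed to be EH's code, is the EM (gene-counting) algorithm. Only the double heterozygotes in cell 5 have unknown phase, so each iteration splits k5 between AB/ab and Ab/aB in proportion to $P(AB)P(ab)$ and $P(Ab)P(aB)$ and then recounts haplotypes. Allele frequencies do not depend on phase, so only $D_{AB}$ changes between iterations. The function name, starting point and tolerances are assumptions.

```python
def em_haplotypes(counts, tol=1e-10, max_iter=1000):
    """Estimate (h_AB, h_Ab, h_aB, h_ab) from genotype counts k1..k9 by EM.

    Illustrative sketch, not EH's source. Counts are ordered row by row
    (BB, Bb, bb), with columns AA, Aa, aa; only k5 (Aa/Bb) has unknown phase.
    """
    k1, k2, k3, k4, k5, k6, k7, k8, k9 = counts
    total = 2 * sum(counts)  # number of haplotypes in the sample
    # Haplotypes whose phase is known (every cell except k5).
    known = [
        2 * k1 + k2 + k4,  # AB
        k4 + 2 * k7 + k8,  # Ab
        k2 + 2 * k3 + k6,  # aB
        k6 + k8 + 2 * k9,  # ab
    ]
    # Start at linkage equilibrium: products of the allele frequencies.
    p_A = (2 * (k1 + k4 + k7) + (k2 + k5 + k8)) / total
    p_B = (2 * (k1 + k2 + k3) + (k4 + k5 + k6)) / total
    h = [p_A * p_B, p_A * (1 - p_B), (1 - p_A) * p_B, (1 - p_A) * (1 - p_B)]
    for _ in range(max_iter):
        h_AB, h_Ab, h_aB, h_ab = h
        # E-step: posterior probability that a k5 individual is AB/ab.
        w = h_AB * h_ab / (h_AB * h_ab + h_Ab * h_aB)
        # M-step: recount haplotypes, splitting k5 between the two phases.
        new = [
            (known[0] + k5 * w) / total,
            (known[1] + k5 * (1 - w)) / total,
            (known[2] + k5 * (1 - w)) / total,
            (known[3] + k5 * w) / total,
        ]
        converged = max(abs(a - b) for a, b in zip(new, h)) < tol
        h = new
        if converged:
            break
    return tuple(h)
```

On the example counts `[10, 10, 3, 15, 50, 13, 5, 13, 10]` this converges to roughly 0.328, 0.188, 0.153 and 0.332, in line with the "w/association" column of the EH output.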

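9 Example
Genotype counts:

| Locus 2 \ Locus 1 | AA | Aa | aa |
|---|---|---|---|
| BB | k1 = 10 | k2 = 10 | k3 = 3 |
| Bb | k4 = 15 | k5 = 50 | k6 = 13 |
| bb | k7 = 5 | k8 = 13 | k9 = 10 |

Haplotype counts with k5 censored:

| | A | a |
|---|---|---|
| B | 45 | 29 |
| b | 38 | 46 |

Haplotype frequencies with k5 censored:

| | A | a |
|---|---|---|
| B | 0.28 | 0.18 |
| b | 0.24 | 0.29 |

*When k5 is censored, all of the remaining haplotypes can be determined uniquely.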
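With k5 set aside, every remaining genotype has at most one heterozygous locus, so both of its haplotypes are fixed. A hypothetical helper (the name is ours) that reproduces the haplotype counts above from k1..k9:

```python
def unambiguous_haplotype_counts(counts):
    """Haplotype counts from all cells except k5 (Aa/Bb), whose phase is unknown."""
    k1, k2, k3, k4, _k5, k6, k7, k8, k9 = counts
    return {
        "AB": 2 * k1 + k2 + k4,  # AA/BB gives two AB; Aa/BB and AA/Bb give one each
        "Ab": k4 + 2 * k7 + k8,
        "aB": k2 + 2 * k3 + k6,
        "ab": k6 + k8 + 2 * k9,
    }
```

For `[10, 10, 3, 15, 50, 13, 5, 13, 10]` the counts total 158 haplotypes from the 79 individuals outside k5.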

10 Example cont.
$P(A) = (45 + 38)/158 \approx 0.525$
$P(B) = (45 + 29)/158 \approx 0.468$
$D_{AB} = p(AB) - p(A)p(B) = 45/158 - 0.525 \times 0.468 \approx 0.2848 - 0.2460 \approx 0.0388$
* This is a biased estimate, because the 50 observations in k5 were eliminated.
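The same arithmetic in exact fractions, using the 158 haplotypes left after censoring k5, shows where rounding enters the numbers above:

```python
from fractions import Fraction

# Unambiguous haplotype counts from the example (k5 censored).
n_AB, n_Ab, n_aB, n_ab = 45, 38, 29, 46
n = n_AB + n_Ab + n_aB + n_ab  # 158 haplotypes from 79 individuals

p_AB = Fraction(n_AB, n)
p_A = Fraction(n_AB + n_Ab, n)  # allele A = AB + Ab
p_B = Fraction(n_AB + n_aB, n)  # allele B = AB + aB
d_AB = p_AB - p_A * p_B

print(f"P(A) = {float(p_A):.3f}, P(B) = {float(p_B):.3f}, D_AB = {d_AB} = {float(d_AB):.4f}")
# prints: P(A) = 0.525, P(B) = 0.468, D_AB = 242/6241 = 0.0388
```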

11 EH program input file format
EH = estimated haplotype.
Input file: EH.dat
Line 1: number of alleles at each of the two loci
Line 2: k1 k4 k7
Line 3: k2 k5 k8
Line 4: k3 k6 k9
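A hypothetical helper (our own name) that formats the example counts in this layout. It assumes whitespace-separated integers and exactly the line order listed above; any further formatting rules EH may have are not covered by the slide.

```python
def eh_input(counts, n_alleles=(2, 2)):
    """Format the nine two-locus genotype counts k1..k9 as EH.dat text."""
    k = counts  # k[0] is k1, ..., k[8] is k9
    lines = [" ".join(str(a) for a in n_alleles)]
    # Each following line holds one locus-1 genotype (AA, Aa, aa) across BB, Bb, bb.
    for col in range(3):
        lines.append(" ".join(str(k[col + 3 * row]) for row in range(3)))
    return "\n".join(lines) + "\n"
```

Saving it is then a matter of `pathlib.Path("EH.dat").write_text(eh_input(counts))`.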

12 EH program output file
Output: estimates of gene frequencies (including k5)

| Locus | Allele 1 | Allele 2 |
|---|---|---|
| 1 | 0.515 | 0.484 |
| 2 | 0.480 | 0.519 |

Number of typed individuals: 129
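The gene frequencies can be reproduced by plain allele counting over all 129 individuals. The double heterozygotes in k5 are included here, since allele counts do not depend on phase. A hypothetical helper:

```python
def allele_frequencies(counts):
    """Sample size and (allele 1, allele 2) frequencies at each locus from k1..k9."""
    k1, k2, k3, k4, k5, k6, k7, k8, k9 = counts
    n = sum(counts)
    AA, Aa = k1 + k4 + k7, k2 + k5 + k8  # locus 1 genotype totals (columns)
    BB, Bb = k1 + k2 + k3, k4 + k5 + k6  # locus 2 genotype totals (rows)
    p_A = (2 * AA + Aa) / (2 * n)
    p_B = (2 * BB + Bb) / (2 * n)
    return n, (p_A, 1 - p_A), (p_B, 1 - p_B)
```

For the example counts this gives 133/258 ≈ 0.5155 and 124/258 ≈ 0.4806, which agree with the printed values to within 0.001.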

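13 EH program output file

| Allele at locus 1 | Allele at locus 2 | Haplotype frequency (independent) | Haplotype frequency (with association) |
|---|---|---|---|
| 1 | 1 | 0.248 | 0.328 |
| 1 | 2 | 0.268 | 0.188 |
| 2 | 1 | 0.232 | 0.153 |
| 2 | 2 | 0.252 | 0.332 |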
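The "independent" column is the product of the gene frequencies, and subtracting it from the "with association" frequency of haplotype 1-1 gives the disequilibrium coefficient. A short check, with allele frequencies computed from the counts (133/258 and 124/258) and 0.328 copied from the table:

```python
p1, p2 = 133 / 258, 124 / 258  # allele 1 frequency at locus 1 and at locus 2

# Haplotype frequencies expected under no association: products of allele frequencies.
independent = {
    (1, 1): p1 * p2,
    (1, 2): p1 * (1 - p2),
    (2, 1): (1 - p1) * p2,
    (2, 2): (1 - p1) * (1 - p2),
}

associated_11 = 0.328  # "with association" estimate for haplotype 1-1, from the EH output
d_11 = associated_11 - independent[(1, 1)]
```

This gives $D_{AB} \approx 0.080$, close to the 0.079 listed in the summary slide.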

14 Chi-square test

| Hypothesis | df | ln(L) | Chi-square |
|---|---|---|---|
| H0: no association | 2 | -252.68 | 0.00 |
| H1: allelic association allowed | 3 | -248.23 | 8.89 |

The difference between the two chi-square values is 8.89. The P-value associated with this chi-square (with 1 df) is 0.002873. It is clear that k5 contributes significant information.
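The likelihood-ratio statistic is twice the difference in ln(L), and for a chi-square variable with 1 df the upper-tail probability is $\operatorname{erfc}(\sqrt{x/2})$, so the P-value can be checked with the Python standard library alone (the function name is ours):

```python
import math


def lrt_p_value_1df(lnL_h0: float, lnL_h1: float):
    """Likelihood-ratio chi-square statistic and its 1-df upper-tail P-value."""
    stat = 2.0 * (lnL_h1 - lnL_h0)
    # If X ~ chi-square(1), X = Z**2, so P(X > x) = 2 P(Z > sqrt(x)) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

From the rounded ln(L) values, `lrt_p_value_1df(-252.68, -248.23)` gives a statistic of 8.90 and a P-value of about 0.0029; the small gap from 8.89 comes from rounding ln(L) to two decimals.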

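15 Haplotype frequencies: summary

| Haplotype | Without k5, independent | Without k5, associated | With k5, independent | With k5, associated |
|---|---|---|---|---|
| A B | 0.246 | 0.284 | 0.247 | 0.327 |
| A b | 0.279 | 0.240 | 0.267 | 0.187 |
| a B | 0.222 | 0.183 | 0.232 | 0.152 |
| a b | 0.252 | 0.291 | 0.251 | 0.331 |

| | Without k5 | With k5 |
|---|---|---|
| p(A) | 0.525 | 0.515 |
| p(B) | 0.468 | 0.480 |
| $D_{AB}$ | 0.038 | 0.079 |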

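16 Multiallelic genotype information in the EH program

| Locus 1 \ Locus 2 | 1/1 | 1/2 | 2/2 | 1/3 | 2/3 | 3/3 |
|---|---|---|---|---|---|---|
| 1/1 | a1 | b1 | c1 | d1 | e1 | f1 |
| 1/2 | a2 | b2 | c2 | d2 | e2 | f2 |
| 2/2 | a3 | b3 | c3 | d3 | e3 | f3 |
| 1/3 | a4 | b4 | c4 | d4 | e4 | f4 |
| 2/3 | a5 | b5 | c5 | d5 | e5 | f5 |
| 3/3 | a6 | b6 | c6 | d6 | e6 | f6 |

Line 1: number of alleles at each locus
Subsequent lines: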
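The genotype labels in this table follow a fixed pattern: for each new allele j, the genotypes 1/j, 2/j, ..., j/j are appended. A one-line generator under that assumption (the name is ours, and we have not confirmed that EH requires this order beyond what the table shows):

```python
def eh_genotype_order(n_alleles: int) -> list:
    """Genotype labels 1/1, 1/2, 2/2, 1/3, 2/3, 3/3, ... for one locus."""
    return [f"{i}/{j}" for j in range(1, n_alleles + 1) for i in range(1, j + 1)]
```

A locus with n alleles therefore has n(n+1)/2 genotype rows or columns, e.g. 6 for three alleles and 10 for four.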

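17 Multilocus genotype data

| Locus 1 | Locus 2 | Locus 3: 1/1 | Locus 3: 1/2 | Locus 3: 2/2 |
|---|---|---|---|---|
| 1/1 | 1/1 | a1 | b1 | c1 |
| 1/1 | 1/2 | a2 | b2 | c2 |
| 1/1 | 2/2 | a3 | b3 | c3 |
| 1/2 | 1/1 | a4 | b4 | c4 |
| 1/2 | 1/2 | a5 | b5 | c5 |
| 1/2 | 2/2 | a6 | b6 | c6 |
| 2/2 | 1/1 | a7 | b7 | c7 |
| 2/2 | 1/2 | a8 | b8 | c8 |
| 2/2 | 2/2 | a9 | b9 | c9 |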
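The row order here nests the loci: locus 1 is outermost and locus 2 varies fastest, while the last locus runs across the columns. A sketch that generates this layout for any number of loci, assuming the per-locus genotype order of the multiallelic slide (function names are hypothetical):

```python
from itertools import product


def genotypes(n_alleles: int) -> list:
    """Genotype labels 1/1, 1/2, 2/2, 1/3, ... for one locus."""
    return [f"{i}/{j}" for j in range(1, n_alleles + 1) for i in range(1, j + 1)]


def multilocus_layout(alleles_per_locus):
    """Row labels (all loci but the last) and column labels (the last locus)."""
    *row_loci, last = alleles_per_locus
    # product() varies the rightmost factor fastest, matching the table's row order.
    rows = list(product(*(genotypes(n) for n in row_loci)))
    return rows, genotypes(last)
```

For three biallelic loci, `multilocus_layout([2, 2, 2])` yields the nine rows and three columns of the table.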

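18 Ex. 23: Full data; Solution file; Censored data; Censored data solution file.
Genotype counts (full data):

| Locus 1 \ Locus 2 | 1/1 | 1/2 | 1/3 | 1/4 | 2/2 | 2/3 | 2/4 | 3/3 | 3/4 | 4/4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1/1 | 10 | 5 | 6 | 4 | 1 | 2 | 3 | 1 | 2 | 0 |
| 1/2 | 6 | 3 | 3 | 3 | 1 | 2 | 1 | 1 | 2 | 1 |
| 2/2 | 12 | 9 | 8 | 11 | 3 | 2 | 5 | 1 | 0 | 3 |
| 1/3 | 1 | 2 | 2 | 1 | 1 | 1 | 1 | 0 | 4 | 2 |
| 2/3 | 0 | 2 | 2 | 8 | 2 | 2 | 9 | 3 | 6 | 8 |
| 3/3 | 8 | 6 | 4 | 10 | 3 | 3 | 8 | 5 | 9 | 13 |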

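19 Haplotypes from censored genotype data
Haplotype counts:

| Allele at locus 1 \ Allele at locus 2 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 1 | 42 | 14 | 13 | 12 |
| 2 | 58 | 25 | 16 | 31 |
| 3 | 37 | 26 | 29 | 63 |

Haplotype frequencies:

| Allele at locus 1 \ Allele at locus 2 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 1 | 0.11 | 0.038 | 0.035 | 0.032 |
| 2 | 0.158 | 0.068 | 0.044 | 0.085 |
| 3 | 0.10 | 0.07 | 0.079 | 0.172 |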
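The censored haplotype counts can be recomputed from the genotype counts of slide 18. An individual homozygous at either locus has a known phase and contributes its two haplotypes, while double heterozygotes are dropped. The sketch below (names are ours) assumes the slide-18 layout, with rows as locus-1 genotypes and columns as locus-2 genotypes, and reproduces all twelve counts above, 366 haplotypes in total.

```python
from collections import Counter

# Slide-18 genotype table: rows are locus-1 genotypes, columns locus-2 genotypes.
ROW_GENOTYPES = [(1, 1), (1, 2), (2, 2), (1, 3), (2, 3), (3, 3)]
COL_GENOTYPES = [(1, 1), (1, 2), (1, 3), (1, 4), (2, 2),
                 (2, 3), (2, 4), (3, 3), (3, 4), (4, 4)]
COUNTS = [
    [10, 5, 6, 4, 1, 2, 3, 1, 2, 0],
    [6, 3, 3, 3, 1, 2, 1, 1, 2, 1],
    [12, 9, 8, 11, 3, 2, 5, 1, 0, 3],
    [1, 2, 2, 1, 1, 1, 1, 0, 4, 2],
    [0, 2, 2, 8, 2, 2, 9, 3, 6, 8],
    [8, 6, 4, 10, 3, 3, 8, 5, 9, 13],
]


def censored_haplotype_counts(counts, rows, cols):
    """Tally (locus-1 allele, locus-2 allele) haplotypes from phase-known genotypes."""
    tally = Counter()
    for (a1, a2), row in zip(rows, counts):
        for (b1, b2), n in zip(cols, row):
            if a1 != a2 and b1 != b2:
                continue  # double heterozygote: phase unknown, censored
            # With at least one homozygous locus, the haplotypes are (a1, b1) and (a2, b2).
            tally[(a1, b1)] += n
            tally[(a2, b2)] += n
    return tally
```

Dividing each count by 366 gives the frequency table above.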

20 Thank you very much!!!

