Calculation of IBD probabilities David Evans University of Oxford Wellcome Trust Centre for Human Genetics.

1 Calculation of IBD probabilities David Evans University of Oxford Wellcome Trust Centre for Human Genetics

2 This Session …
  Identity by Descent (IBD) vs Identity by State (IBS)
  Why is IBD important?
  Calculating IBD probabilities
    Lander-Green Algorithm (MERLIN)
      Single locus probabilities
      Hidden Markov Model => Multipoint IBD
    Other ways of calculating IBD status
      Elston-Stewart Algorithm
      MCMC approaches
  MERLIN Practical Example
    IBD determination
    Information content mapping
    SNPs vs micro-satellite markers?

3 Identity By Descent (IBD)
  Two alleles are IBD if they are descended from the same ancestral allele.
  [Figure: two pedigrees with marker genotypes, one labelled "Identical by Descent", the other "Identical by state only"]

4 Example: IBD in Siblings
  Consider a mating between mother AB x father CD. Number of alleles shared IBD for each pair of sib genotypes:

            Sib 2
  Sib 1   AC  AD  BC  BD
  AC       2   1   1   0
  AD       1   2   0   1
  BC       1   0   2   1
  BD       0   1   1   2

  IBD 0 : 1 : 2 = 25% : 50% : 25%
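
The 25% : 50% : 25% split can be checked by enumerating the 16 equally likely pairs of sib genotypes. The following is a minimal Python sketch; the function name is illustrative and not from the slides, and it assumes all four parental alleles are distinct, as in the AB x CD mating.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sib_ibd_distribution(mother=("A", "B"), father=("C", "D")):
    """Enumerate all equally likely transmissions to two sibs and tally IBD."""
    # Each child receives one maternal and one paternal allele: 4 genotypes.
    child_genotypes = list(product(mother, father))
    counts = Counter()
    for sib1, sib2 in product(child_genotypes, repeat=2):
        # All parental alleles are distinct, so matching labels means IBD.
        ibd = int(sib1[0] == sib2[0]) + int(sib1[1] == sib2[1])
        counts[ibd] += 1
    total = sum(counts.values())
    return {k: Fraction(counts[k], total) for k in (0, 1, 2)}


distribution = sib_ibd_distribution()
print(distribution)
```

Exact fractions are used so the result can be compared directly with the prior IBD probabilities of 1/4, 1/2 and 1/4.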

5 Why is IBD Sharing Important?
  Affected relatives not only share disease alleles IBD, but also tend to share marker alleles close to the disease locus IBD more often than chance.
  IBD sharing forms the basis of non-parametric linkage statistics.
  [Figure: pedigree with marker genotypes 1/2, 3/4, 4/4, 1/4, 2/4, 1/3, 3/4, 4/4, 1/4]

6 Crossing over between homologous chromosomes

7 Cosegregation => Linkage
  Alleles close together on the same chromosome tend to stay together in meiosis; therefore they tend to be co-transmitted.
  [Figure: parental genotype with marker alleles A1, A2 and trait alleles Q1, Q2; non-recombinant (parental) genotypes are many (1 - θ), recombinant genotypes are few (θ)]

8 Segregating Chromosomes MARKER DISEASE GENE

9 Marker Shared Among Affecteds
  Genotypes for a marker with alleles {1,2,3,4}
  [Figure: pedigree with marker genotypes 1/2, 3/4, 4/4, 1/4, 2/4, 1/3, 3/4, 4/4, 1/4]

10 Linkage between QTL and marker
  [Figure: marker and QTL; panels for IBD 0, IBD 1 and IBD 2]

11 NO Linkage between QTL and marker
  [Figure: marker]

12 IBD can be trivial…
  [Figure: pedigree with marker genotypes; IBD=0]

13 Two Other Simple Cases…
  [Figure: two pedigrees with marker genotypes; IBD=2]

14 A little more complicated…
  [Figure: pedigree with marker genotypes 1/2, 2/2, 1/2, 1/2; IBD=1 (50% chance) or IBD=2 (50% chance)]

15 And even more complicated…
  [Figure: sib pair, both genotyped 1/1; IBD=?]

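16 Bayes Theorem for IBD Probabilities
  $P(\mathrm{IBD} = i \mid G) = \frac{P(\mathrm{IBD} = i,\, G)}{P(G)} = \frac{P(\mathrm{IBD} = i)\, P(G \mid \mathrm{IBD} = i)}{P(G)} = \frac{P(\mathrm{IBD} = i)\, P(G \mid \mathrm{IBD} = i)}{\sum_{j=0,1,2} P(\mathrm{IBD} = j)\, P(G \mid \mathrm{IBD} = j)}$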

17 P(Genotype | IBD State)
  P(observing genotypes | k alleles IBD), where $p_1$ and $p_2$ are the frequencies of alleles A1 and A2:

  Sib 1   Sib 2   k=0               k=1            k=2
  A1A1    A1A1    $p_1^4$           $p_1^3$        $p_1^2$
  A1A1    A1A2    $2p_1^3 p_2$      $p_1^2 p_2$    0
  A1A1    A2A2    $p_1^2 p_2^2$     0              0
  A1A2    A1A1    $2p_1^3 p_2$      $p_1^2 p_2$    0
  A1A2    A1A2    $4p_1^2 p_2^2$    $p_1 p_2$      $2p_1 p_2$
  A1A2    A2A2    $2p_1 p_2^3$      $p_1 p_2^2$    0
  A2A2    A1A1    $p_1^2 p_2^2$     0              0
  A2A2    A1A2    $2p_1 p_2^3$      $p_1 p_2^2$    0
  A2A2    A2A2    $p_2^4$           $p_2^3$        $p_2^2$

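18 Worked Example
  Sib pair, both genotyped 1/1, with allele frequency $p_1 = 0.5$:
  $P(G \mid \mathrm{IBD} = 0) = p_1^4 = 1/16$
  $P(G \mid \mathrm{IBD} = 1) = p_1^3 = 1/8$
  $P(G \mid \mathrm{IBD} = 2) = p_1^2 = 1/4$
  $P(G) = \tfrac{1}{4} \cdot \tfrac{1}{16} + \tfrac{1}{2} \cdot \tfrac{1}{8} + \tfrac{1}{4} \cdot \tfrac{1}{4} = 9/64$
  $P(\mathrm{IBD} = 0 \mid G) = 1/9$
  $P(\mathrm{IBD} = 1 \mid G) = 4/9$
  $P(\mathrm{IBD} = 2 \mid G) = 4/9$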
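
The worked example can be reproduced numerically by applying Bayes' theorem with the sib-pair prior IBD 0 : 1 : 2 = 1/4 : 1/2 : 1/4. This is a small Python sketch rather than anything taken from MERLIN; the function name is illustrative and it handles only the case where both sibs are homozygous for allele 1.

```python
from fractions import Fraction

# Prior IBD probabilities for a sib pair (IBD 0 : 1 : 2 = 25% : 50% : 25%).
PRIOR = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}


def posterior_ibd_both_homozygous(p1):
    """Posterior IBD distribution for a sib pair who are both 1/1."""
    # P(G | IBD = k) for the genotype pair (1/1, 1/1): p1^4, p1^3, p1^2.
    likelihood = {0: p1 ** 4, 1: p1 ** 3, 2: p1 ** 2}
    joint = {k: PRIOR[k] * likelihood[k] for k in PRIOR}
    p_g = sum(joint.values())  # P(G), the denominator in Bayes' theorem
    return {k: joint[k] / p_g for k in PRIOR}


posterior = posterior_ibd_both_homozygous(Fraction(1, 2))
print(posterior)
```

With $p_1 = 0.5$ the posterior is 1/9, 4/9 and 4/9 for IBD = 0, 1 and 2, so observing two 1/1 sibs shifts probability away from IBD = 0.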

19 Worked Example
  [Figure: sib pair, both genotyped 1/1]

20 For ANY PEDIGREE the inheritance pattern at any point in the genome can be completely described by a binary inheritance vector of length 2n:
  $v(x) = (p_1, m_1, p_2, m_2, \ldots, p_n, m_n)$
  whose coordinates describe the outcome of the paternal and maternal meioses giving rise to the n non-founders in the pedigree.
  $p_i$ ($m_i$) is 0 if the grandpaternal allele is transmitted
  $p_i$ ($m_i$) is 1 if the grandmaternal allele is transmitted
  [Figure: pedigree with genotypes a/c, b/d, a/b, c/d; v(x) = [0,0,1,1]]
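
For a sib pair (n = 2), the IBD count follows directly from the inheritance vector: the sibs share a parent's allele IBD exactly when the two meioses from that parent had the same outcome. A minimal Python sketch, assuming the coordinate order (p1, m1, p2, m2) given above; the function name is illustrative.

```python
from itertools import product


def sib_pair_ibd(vector):
    """Number of alleles shared IBD by two sibs, from v = (p1, m1, p2, m2)."""
    p1, m1, p2, m2 = vector
    # The paternal allele is shared IBD when both paternal meioses transmitted
    # the same grandparental allele; likewise for the maternal meioses.
    return int(p1 == p2) + int(m1 == m2)


# The vector shown on the slide: the sibs share nothing IBD.
slide_example = sib_pair_ibd((0, 0, 1, 1))

# All inheritance vectors under which the sib pair is IBD = 2.
ibd2_vectors = [v for v in product((0, 1), repeat=4) if sib_pair_ibd(v) == 2]
print(slide_example, ibd2_vectors)
```

The four IBD = 2 vectors are 0000, 0101, 1010 and 1111, the same four whose likelihoods are summed for P(IBD = 2) later in the talk.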

21 Inheritance Vector
  In practice, it is not possible to determine the true inheritance vector at every point in the genome; rather, we represent partial information as a probability distribution over the possible inheritance vectors.

  Inheritance vector   Prior   Posterior
  ---------------------------------------
  0000                 1/16    1/8
  0001                 1/16    1/8
  0010                 1/16    0
  0011                 1/16    0
  0100                 1/16    1/8
  0101                 1/16    1/8
  0110                 1/16    0
  0111                 1/16    0
  1000                 1/16    1/8
  1001                 1/16    1/8
  1010                 1/16    0
  1011                 1/16    0
  1100                 1/16    1/8
  1101                 1/16    1/8
  1110                 1/16    0
  1111                 1/16    0

  [Figure: five-person pedigree (individuals 1 to 5) with marker genotypes and meioses labelled p1, m1, p2, m2]

22 Computer Representation
  At each marker location ℓ
    Define inheritance vector $v_ℓ$
      Meiotic outcomes specified in index bit
    Likelihood for each gene flow pattern
      Conditional on observed genotypes at location ℓ
  $2^{2n}$ elements !!!
  [Figure: array indexed by the 16 inheritance vectors 0000, 0001, 0010, …, 1111, each cell holding a likelihood L]

23 Abecasis et al (2002) Nat Genet 30:97-101

24 Multipoint IBD
  IBD status cannot always be ascertained with certainty, e.g. because the mating is not informative or parental information is not available.
  IBD information at uninformative loci can be made more precise by examining nearby linked loci.

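25 Multipoint IBD
  [Figure: nuclear family typed at two linked markers; the first marker (alleles a, b, c, d) shows IBD = 0, the second marker (alleles 1, 2) raises the question IBD = 0 or IBD = 1?]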

26 Complexity of the Problem in Larger Pedigrees
  For each person
    2n meioses in pedigree with n non-founders
    Each meiosis has 2 possible outcomes
    Therefore $2^{2n}$ possibilities for each locus
  For each genetic locus
    One location for each of m genetic markers
    Distinct, non-independent meiotic outcomes
  Up to $4^{nm}$ distinct outcomes!!!

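27 Example: Sib-pair Genotyped at 10 Markers
  $(2^{2n})^m = (2^{2 \times 2})^{10} \approx 10^{12}$ possible paths !!!
  [Figure: lattice of inheritance vectors (0000, 0001, 0010, …, 1111) against markers 1, 2, 3, 4, …, m = 10; one path annotated with P(G | 0000) and $(1 - \theta)^4$]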

28 P(IBD = 2) at Marker Three
  P(IBD = 2) = (L[0000] + L[0101] + L[1010] + L[1111]) / L[ALL]
  [Figure: lattice of inheritance vectors (0000, 0001, 0010, …, 1111) against markers 1, 2, 3, 4, …, m = 10, with the IBD state of each vector marked at marker three]

29 P(IBD = 2) at arbitrary position on the chromosome
  P(IBD = 2) = (L[0000] + L[0101] + L[1010] + L[1111]) / L[ALL]
  [Figure: lattice of inheritance vectors (0000, 0001, 0010, …, 1111) against markers 1, 2, 3, 4, …, m = 10]

30 Lander-Green Algorithm
  The inheritance vector at a locus is conditionally independent of the inheritance vectors at all preceding loci given the inheritance vector at the immediately preceding locus ("Hidden Markov chain").
  The conditional probability of an inheritance vector $v_{i+1}$ at locus i+1, given the inheritance vector $v_i$ at locus i, is $\theta_i^{\,j} (1 - \theta_i)^{2n - j}$, where $\theta_i$ is the recombination fraction between the two loci and j is the number of changes in elements of the inheritance vector.
  Example: [0000] at Locus 1, [0001] at Locus 2: conditional probability = $(1 - \theta)^3 \theta$
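
The transition probability can be written as a short function that counts how many coordinates of the inheritance vector differ between the two loci. This is an illustrative Python sketch (the function name and the string encoding of vectors are assumptions, not MERLIN's internal representation).

```python
from itertools import product


def transition_probability(v_from, v_to, theta):
    """P(v_to at locus i+1 | v_from at locus i) = theta^j * (1 - theta)^(2n - j)."""
    if len(v_from) != len(v_to):
        raise ValueError("inheritance vectors must have the same length")
    # j = number of meioses whose outcome changed between the two loci.
    j = sum(a != b for a, b in zip(v_from, v_to))
    return theta ** j * (1 - theta) ** (len(v_from) - j)


theta = 0.1
example = transition_probability("0000", "0001", theta)  # (1 - theta)^3 * theta

# Probabilities out of any vector sum to 1 over all 2^(2n) target vectors.
row_sum = sum(
    transition_probability("0000", "".join(bits), theta)
    for bits in product("01", repeat=4)
)
print(example, row_sum)
```

The row sum equals $(\theta + (1 - \theta))^{2n} = 1$ by the binomial theorem, which is a quick sanity check on the formula.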

31 Lander-Green Algorithm
  $m (2^{2n})^2 = 10 \times 16^2 = 2560$ calculations
  [Figure: lattice of inheritance vectors (0000, 0001, 0010, …, 1111) against markers 1, 2, 3, 4, …, m = 10]

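32 Total Likelihood = $\mathbf{1}' Q_1 T_1 Q_2 T_2 \cdots T_{m-1} Q_m \mathbf{1}$
  $Q_i = \mathrm{diag}\big(P(G \mid [0000]),\ P(G \mid [0001]),\ \ldots,\ P(G \mid [1111])\big)$
    $2^{2n} \times 2^{2n}$ diagonal matrix of single locus probabilities at locus i
  $T_i = \begin{pmatrix} (1-\theta)^4 & (1-\theta)^3\theta & \cdots & \theta^4 \\ (1-\theta)^3\theta & (1-\theta)^4 & \cdots & (1-\theta)\theta^3 \\ \vdots & \vdots & \ddots & \vdots \\ \theta^4 & (1-\theta)\theta^3 & \cdots & (1-\theta)^4 \end{pmatrix}$
    $2^{2n} \times 2^{2n}$ matrix of transitional probabilities between locus i and locus i+1
  ~$m (2^{2n})^2$ operations = 2560 for this case !!!
  [Figure: lattice of inheritance vectors (0000, 0001, 0010, …, 1111) against markers 1, 2, 3, …, m]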
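
The matrix product can be evaluated left to right as a forward recursion, which is where the roughly $m (2^{2n})^2$ operation count comes from. The following is a plain-Python sketch under stated assumptions: it is not MERLIN's implementation, the function names are illustrative, the single-locus probabilities P(G | v) are supplied by the caller, and a uniform prior of $1/2^{2n}$ over inheritance vectors is applied at the first marker (a constant factor not shown in the slide's formula).

```python
from itertools import product


def transition(v, w, theta):
    """theta^j * (1 - theta)^(2n - j), with j the number of differing bits."""
    j = sum(a != b for a, b in zip(v, w))
    return theta ** j * (1 - theta) ** (len(v) - j)


def lander_green_likelihood(single_locus, thetas, n_bits=4):
    """Likelihood of the marker data via a forward pass over markers.

    single_locus: one list per marker holding P(G_i | v) for every
                  inheritance vector v, in lexicographic order.
    thetas:       recombination fractions between consecutive markers.
    """
    vectors = list(product((0, 1), repeat=n_bits))
    size = len(vectors)
    prior = 1.0 / size  # assumed uniform prior over inheritance vectors
    # forward[v] = P(G_1, ..., G_i, v_i = v)
    forward = [prior * q for q in single_locus[0]]
    for q_next, theta in zip(single_locus[1:], thetas):
        forward = [
            q_next[w] * sum(
                forward[v] * transition(vectors[v], vectors[w], theta)
                for v in range(size)
            )
            for w in range(size)
        ]
    return sum(forward)


# Three completely uninformative markers: every vector has P(G | v) = 1.
uninformative = [[1.0] * 16 for _ in range(3)]
likelihood_uninformative = lander_green_likelihood(uninformative, [0.1, 0.2])

# One marker compatible with exactly half of the 16 vectors.
half = [[1.0, 1.0, 0.0, 0.0] * 4]
likelihood_half = lander_green_likelihood(half, [])
print(likelihood_uninformative, likelihood_half)
```

Each step between markers costs $16^2$ multiplications for a sib pair, so the work grows linearly in the number of markers instead of as $16^m$.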

33 Further speedups…
  Trees summarize redundant information
    Portions of vector that are repeated
    Portions of vector that are constant or zero
  Speeding up convolution
    Use sparse-matrix by vector multiplication
    Use symmetries in divide and conquer algorithm (Idury & Elston, 1997)

34 Lander-Green Algorithm Summary
  Factorize likelihood by marker
    Complexity ∝ $m \cdot e^n$
  Strengths
    Large number of markers
    Relatively small pedigrees

35 Elston-Stewart Algorithm
  Factorize likelihood by individual
    Complexity ∝ $n \cdot e^m$
  Small number of markers
  Large pedigrees
    With little inbreeding
  VITESSE, FASTLINK etc

36 Other methods
  Number of MCMC methods proposed
    ~Linear on # markers
    ~Linear on # people
  Hard to guarantee convergence on very large datasets
    Many widely separated local minima
  E.g. SIMWALK

37 MERLIN: Multipoint Engine for Rapid Likelihood Inference

38 Capabilities
  Linkage Analysis
    NPL and K&C LOD
    Variance Components
  Haplotypes
    Most likely
    Sampling
    All
  IBD and info content
  Error Detection
    Most SNP typing errors are Mendelian consistent
  Recombination
    No. of recombinants per family per interval can be controlled
  Simulation

39 MERLIN Website
  www.sph.umich.edu/csg/abecasis/Merlin
  Reference
  FAQ
  Source
  Binaries
  Tutorial
    Linkage
    Haplotyping
    Simulation
    Error detection
    IBD calculation

40 Input Files
  Pedigree File
    Relationships
    Genotype data
    Phenotype data
  Data File
    Describes contents of pedigree file
  Map File
    Records location of genetic markers

41 Example Pedigree File
  1 1 0 0 1 1 x     3 3 x x
  1 2 0 0 2 1 x     4 4 x x
  1 3 0 0 1 1 x     1 2 x x
  1 4 1 2 2 1 x     4 3 x x
  1 5 3 4 2 2 1.234 1 3 2 2
  1 6 3 4 1 2 4.321 2 4 2 2
  Encodes family relationships, marker and phenotype information

42 Data File Field Codes
  Code   Description
  M      Marker Genotype
  A      Affection Status
  T      Quantitative Trait
  C      Covariate
  Z      Zygosity
  S[n]   Skip n columns

43 Example Data File
  T some_trait_of_interest
  M some_marker
  M another_marker
  Provides information necessary to decode pedigree file

44 Example Map File
  CHROMOSOME   MARKER   POSITION
  2            D2S160   160.0
  2            D2S308   165.0
  …
  Indicates location of individual markers, necessary to derive recombination fractions between them
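
To illustrate how map positions become recombination fractions, the sketch below converts the distance between the two example markers into θ. It assumes the positions are in centimorgans and uses Haldane's map function, $\theta = \tfrac{1}{2}(1 - e^{-2d})$ with d in Morgans; the slides do not say which map function MERLIN applies, so treat that choice as an assumption. The function name is illustrative.

```python
import math


def haldane_theta(distance_cm):
    """Recombination fraction for a map distance in cM (Haldane's map function)."""
    d_morgans = abs(distance_cm) / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


# Marker positions taken from the example map file (assumed to be in cM).
positions = [("D2S160", 160.0), ("D2S308", 165.0)]
theta = haldane_theta(positions[1][1] - positions[0][1])
print(round(theta, 4))
```

For the 5 cM gap in the example, θ is a little under 0.05; θ approaches 0.5 as the distance grows, so distant markers segregate almost independently.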

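45 Worked Example
  Sib pair, both genotyped 1/1, with allele frequency $p_1 = 0.5$:
  $P(\mathrm{IBD} = 0 \mid G) = 1/9$
  $P(\mathrm{IBD} = 1 \mid G) = 4/9$
  $P(\mathrm{IBD} = 2 \mid G) = 4/9$
  merlin -d example.dat -p example.ped -m example.map --ibd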

46 Application: Information Content Mapping
  Information content: provides a measure of how well a marker set approaches the goal of completely determining the inheritance outcome
  Based on concept of entropy
    $E = -\sum_i P_i \log_2 P_i$, where $P_i$ is the probability of the ith outcome
    $I_E(x) = 1 - E(x)/E_0$
  Always lies between 0 and 1
  Does not depend on test for linkage
  Scales linearly with power
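
The entropy-based measure is easy to compute for a given distribution over inheritance vectors. In this Python sketch the function names are illustrative, and $E_0$ is taken to be the entropy of the uniform prior over all vectors (the entropy when no genotype data are available); the slide does not define $E_0$, so that reading is an assumption.

```python
import math


def entropy_bits(probabilities):
    """E = -sum(P_i * log2(P_i)), skipping outcomes with zero probability."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def information_content(posterior):
    """I_E = 1 - E / E_0, with E_0 the entropy of a uniform prior (assumed)."""
    e0 = math.log2(len(posterior))
    return 1.0 - entropy_bits(posterior) / e0


# Posterior from the inheritance vector table earlier in the talk:
# 8 of the 16 vectors have probability 1/8, the rest 0.
table_posterior = [1 / 8] * 8 + [0.0] * 8
info_table = information_content(table_posterior)

info_none = information_content([1 / 16] * 16)    # no information
info_full = information_content([1.0] + [0.0] * 15)  # vector fully determined
print(info_table, info_none, info_full)
```

Ruling out half of the 16 vectors removes one of four bits of uncertainty, giving an information content of 0.25.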

47 Application: Information Content Mapping
  Simulations
    ABI (1 microsatellite per 10 cM)
    deCODE (1 microsatellite per 3 cM)
    Illumina (1 SNP per 0.5 cM)
    Affymetrix (1 SNP per 0.2 cM)
  Which panel performs best in terms of extracting marker information?
  merlin -d file.dat -p file.ped -m file.map --information

48 SNPs vs Microsatellites
  [Figure: information content (0.0 to 1.0) against position (0 to 100 cM); curves labelled SNPs + parents, microsat + parents, SNP, microsat; densities 0.2 cM, 3 cM, 0.5 cM, 10 cM]

