Presentation on theme: "Gene Frequency and LINKAGE Gregory Kovriga & Alex Ratt."— Presentation transcript:
Gene Frequency and LINKAGE Gregory Kovriga & Alex Ratt
Outline: ● What are gene frequencies for? ● Consequences of incorrect frequencies ● Estimation techniques ● Example ● Estimation with ILINK ● Exercise
What are gene frequencies for? ● Consider a pedigree whose founders have unknown genotypes or phenotypes. ● This is especially important for disease alleles, where the correspondence between genotype and phenotype is rarely 1:1. ● To evaluate the likelihood function, values must be provided for the allele frequencies. [Figure: pedigree diagram with genotype labels ?, a/b, b/a, a/c]
Example of analysis with different gene frequencies ● We will demonstrate this with an example: a pedigree with a recessive disease and only a single affected individual (θ = 0). [Figure: pedigree diagram with genotype label 1/1]
Example (cont.) ● The table shows that when the disease gene frequency is less than 90%, the marker allele frequency has more effect on the analysis than the disease allele frequency, because the penetrances at the disease locus tell us more about the untyped individuals' disease-locus genotypes than we know about their marker-locus genotypes. ● In practice: if your analysis gives drastically different results depending on the gene frequencies, the significance of the results should be seriously questioned.
Wrong frequencies? ● It is difficult to choose correct frequencies, whether for the population or for a pedigree. One technique is to assume equal allele frequencies. Q: What are the effects of using wrong gene frequencies? A: In general, using equal gene frequencies has been shown to lead to a systematic bias in favor of linkage; in other words, it tends to give false positives in linkage analysis.
Estimation techniques ● Published frequencies, based on random samples, exist for many markers, but they may differ strongly between populations. ● In large pedigrees: treat the unrelated individuals as a sample and apply counting methods. ● ILINK (part of the LINKAGE package) is another powerful approach.
Estimation techniques (cont.) ● Unlike a simple counting method, ILINK can extract additional information about the untyped individuals from the pedigree structure. ● The estimation step can be repeated to refine the results further (EM). ● How much the approach helps depends on the number of untyped pedigree members.
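To illustrate what kind of information a pedigree carries about an untyped member, here is a minimal Python sketch. It is not how ILINK works internally (ILINK evaluates the full pedigree likelihood); it only shows the simplest case, where one parent is untyped, the other parent is typed, and the children's genotypes reveal alleles the untyped parent must carry. The function name `obligate_alleles` and the genotypes are hypothetical.

```python
def obligate_alleles(typed_parent, children):
    """Alleles the untyped parent must carry.

    typed_parent: genotype of the typed parent, e.g. (1, 2)
    children:     list of child genotypes, each a 2-tuple of allele labels
    """
    required = set()
    for first, second in children:
        # A child receives one allele from each parent. Collect what the
        # untyped parent could have transmitted under both assignments.
        candidates = set()
        if first in typed_parent:
            candidates.add(second)
        if second in typed_parent:
            candidates.add(first)
        if not candidates:
            raise ValueError(
                f"child {first}/{second} is inconsistent with the typed parent"
            )
        # Only an unambiguous transmission tells us something for certain.
        if len(candidates) == 1:
            required |= candidates
    return required


# Hypothetical family: typed parent 1/2, children 1/3 and 2/4.
# The untyped parent must have transmitted 3 and 4.
print(obligate_alleles((1, 2), [(1, 3), (2, 4)]))
```

A counting method would ignore this untyped parent entirely, whereas a likelihood-based method can use the implied genotype when estimating allele frequencies.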
Estimation techniques (cont.) ● Note that in such an estimation the recombination fraction is an active parameter in determining the gene frequencies. ● Although the difference in allele frequencies might not be significant, the effect on the lod score can be notable in some situations. ● A way to balance the computations: estimate the frequencies separately for θ = 0.5 and for θ = θ', then compute $Z(\theta') = \log_{10}\left( L(\theta', p'_i) / L(0.5, p''_i) \right)$, where $p'_i$ are the frequencies estimated at θ' and $p''_i$ are those estimated at θ = 0.5.
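The lod score compares the likelihood at a candidate recombination fraction with the likelihood at θ = 0.5, each evaluated with its own allele-frequency estimates. A minimal Python sketch of that ratio, assuming the two maximized natural-log likelihoods are already available; the function name and the numbers are made up for illustration and do not come from the example pedigree.

```python
import math


def lod_score(ln_like_linked, ln_like_unlinked):
    """Lod score from two natural-log likelihoods.

    ln_like_linked:   ln L(theta', p')  with frequencies estimated at theta'
    ln_like_unlinked: ln L(0.5, p'')    with frequencies estimated at theta = 0.5
    """
    # log10(L1 / L0) = (ln L1 - ln L0) / ln 10
    return (ln_like_linked - ln_like_unlinked) / math.log(10)


# Hypothetical log-likelihoods, chosen so the ratio is 100:1.
example = lod_score(-100.0, -104.60517018598809)
print(round(example, 3))
```

Working with log-likelihoods rather than raw likelihoods avoids underflow, since pedigree likelihoods are typically very small numbers.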
Gene Frequencies Estimation ● Published estimates for the gene frequencies may be used as a first approximation. ● But it is advisable to estimate marker allele frequencies on your own from unrelated individuals taken from the same genetic population as your disease pedigrees. ● Another approach is to use the ILINK program to estimate the allele frequencies from the pedigree data.
Gene Frequencies With ILINK: Example ● In our pedigree there are eight founders, two of whom are untyped. ● Directly estimating the allele frequencies from the six typed founders gives: 4 copies of allele 1, 2 copies of allele 2, 5 copies of allele 3, and 1 copy of allele 4. ● Gene frequency estimates: 1 (0.3333), 2 (0.1667), 3 (0.4167), 4 (0.0833).
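The counting estimates on this slide can be reproduced with a few lines of Python. This is only a sketch of the counting method, using the allele tallies given above; the function name `counting_estimates` is illustrative.

```python
from collections import Counter
from fractions import Fraction

# Allele copies among the six typed founders
# (6 founders x 2 alleles each = 12 alleles in total).
allele_counts = Counter({1: 4, 2: 2, 3: 5, 4: 1})


def counting_estimates(counts):
    """Return allele -> relative frequency (copies / total alleles)."""
    total = sum(counts.values())
    return {allele: Fraction(n, total) for allele, n in counts.items()}


estimates = counting_estimates(allele_counts)
for allele, freq in sorted(estimates.items()):
    print(f"allele {allele}: {float(freq):.4f}")
```

The two untyped founders contribute nothing to these counts, which is exactly the information the ILINK-based estimate tries to recover.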
Gene Frequencies With ILINK: Example (cont.) ● However, the pedigree contains some information about the genotypes of the two untyped founders. ● To take advantage of it, we use the ILINK program. ● First, prepare the parameter file for the example.
Parameter File For The Example: Assumptions ● The disease locus is fully penetrant. ● The disease locus is autosomal dominant. ● Gene frequency for the disease allele: [value missing] ● The estimated values are used as starting values for the gene frequencies.
Parameter File For The Example: Datafile.dat
<< No. of loci, risk locus, sex linked, program
<< Mut locus, mut male, mut fem, hap freq.
1 2 << Affection, No. of alleles
E E-05 << Gene Frequencies
1 << No. of liability classes
<< Penetrances
3 4 << Allele numbers, No. of alleles
<< Gene Frequencies
0 0 << Sex difference, interference (if 1 or 2)
<< Recombination values
2 << This locus may have iterated pars
<< Estimate 3 free gene frequencies
Running ILINK Program: Final.dat
CHROMOSOME ORDER OF LOCI : 1 2
****************** FINAL VALUES ********************
PROVIDED FOR LOCUS 2 (CHROMOSOME ORDER)
*****************************************************
GENE FREQUENCIES :
*****************************************************
THETAS:
LN(LIKE) = e+02
LOD SCORE = e+00
NUMBER OF ITERATIONS = 6
NUMBER OF FUNCTION EVALUATIONS = 37
PTG = e-06
Gene Frequencies With ILINK: Estimation 2 ● We did the estimation conditional on there being linkage between marker and disease. ● What happens to the estimates if we assume that the recombination fraction between disease and marker is 50%? ● This amounts to estimating the marker allele frequencies while ignoring all information about linkage.
Gene Frequencies With ILINK: Estimation 2 (cont.) ● Now we set the recombination values to 0.5 and run the ILINK program again. ● The estimates change slightly to the following numbers: 1 ( ) 2 ( ) 3 ( ) 4 ( )
Gene Frequencies With ILINK: Estimation 3 ● We can also estimate the recombination fraction jointly with the gene frequencies. ● This is done by setting the bottom line of the parameter file so that all four parameters are estimated. ● ILINK results: θ = ; 1 ( ) 2 ( ) 3 ( ) 4 ( )
Gene Frequencies With ILINK: Gene Frequency Estimates Under Different Hypotheses [Table: rows p1 to p4; columns θ = 0.079, θ = 0.500, θ = θ' (jointly estimated), Counting]
Gene Frequencies With ILINK: Conclusion ● Estimating gene frequencies under different hypotheses leads to slightly different estimates. ● Fortunately, the differences are not large, though they may have a significant influence on the lod scores in some situations. ● Because most pedigree members were typed in this example, the gene frequencies are not critical here; in other examples the results may vary dramatically.
The Exercise 1. Go back to Exercise 8 and estimate gene frequencies for the ABO blood group in this same pedigree. 2. Does the lod score change when these frequencies are estimated instead of using population gene frequency estimates? 3. Consider the incomplete penetrance model on this same family. 4. Does incorporating this reduced penetrance affect your estimates of marker allele frequencies? 5. How does the gene frequency information affect the lod score between ABO and the disease?
ABO Blood Group [Figure: pedigree diagram labeled with each member's ABO blood type (A, B, AB, or O)]
The Exercise - Solution: Estimating Allele Frequencies
Estimation 1: We set the recombination fraction between disease and ABO to 0.5 and estimate the allele frequencies. Results: A (0.288), B (0.343), O (0.369).
Estimation 2: We estimate the allele frequencies jointly with the recombination fraction. Results: A (0.277), B (0.341), O (0.382), θ (0.001).
The Exercise - Solution: Computing Lod Score
1. Allele frequencies estimated jointly with the recombination fraction: Z(θ=0) =
2. Allele frequencies estimated with the disease considered unlinked to the marker: Z(θ=0) =
3. Gene frequency estimates treated as nuisance parameters: Z(θ=0) =
In our case the lod scores are not greatly affected by the changes in the gene frequency estimates at ABO.
The Exercise - Solution: Incomplete penetrance model ● Define a penetrance for each age class. ● For individuals younger than 10, the penetrance is 0.1. ● For individuals older than 60, the penetrance is 0.9. ● For individuals in between, use the line connecting the points (10, 0.1) and (60, 0.9). ● Estimate the allele frequencies under this model. Results: θ = 0.5: A (0.288), B (0.343), O (0.369); θ = θ': A (0.277), B (0.341), O (0.382).
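The age-dependent penetrance described on this slide can be written as a small piecewise-linear function. The following Python sketch assumes age is given in years and that ages of exactly 10 and 60 take the endpoint values; the function name `penetrance` and the sample ages are illustrative.

```python
def penetrance(age):
    """Penetrance as a function of age (in years).

    0.1 up to age 10, 0.9 from age 60, and in between the straight
    line through the points (10, 0.1) and (60, 0.9).
    """
    if age <= 10:
        return 0.1
    if age >= 60:
        return 0.9
    slope = (0.9 - 0.1) / (60 - 10)  # increase in penetrance per year
    return 0.1 + slope * (age - 10)


# Hypothetical ages, one per liability class one might define.
for age in (5, 20, 35, 50, 75):
    print(age, round(penetrance(age), 3))
```

Since LINKAGE works with a finite number of liability classes, one would evaluate such a function at a representative age for each class rather than for every individual age.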
The Exercise - Solution: Incomplete penetrance model (cont.) ● The estimates are the same as in the full-penetrance model. ● This is because the allele frequencies are estimated independently of the disease phenotypes in the pedigree. ● Another reason is that there is little ambiguity about the disease-locus genotypes of the founders.
The Exercise - Solution: Computing Lod Score
● The lod scores are now as follows:
Frequencies estimated at θ = θ': Z(θ=0) =
Frequencies estimated at θ = 0.5: Z(θ=0) =
Frequencies treated as nuisance parameters: Z(θ=0) =
● The last lod score is again right between the two lod scores computed with fixed gene frequency estimates.