Gene Frequency and LINKAGE Gregory Kovriga & Alex Ratt.


Outline: ● What are gene frequencies for? ● Consequences of incorrect frequencies ● Estimation techniques ● Example ● Estimation with ILINK ● Exercise

What are gene frequencies for? ● Consider a pedigree whose founders have unknown genotypes or phenotypes. ● This is especially important for disease alleles, where the correspondence between genotype and phenotype is rarely 1:1. ● To evaluate the likelihood function, values must be provided for the allele frequencies. [Figure: pedigree with an untyped founder (?) and marker genotypes a/b, b/a, a/c]

Example of analysis with different gene frequencies ● We will demonstrate this with an example: a pedigree with a recessive disease and only a single affected individual (θ = 0). [Figure: pedigree; marker genotype 1/1]

Example... The table shows that when the disease gene frequency is less than 90%, the frequency of the marker has more effect on the analysis than the frequency of the disease allele (because the penetrances at the disease locus tell us more about the untyped individuals' disease locus genotypes than we know about their marker locus genotypes). In practice: if your analysis gives drastically different results depending on the gene frequencies, the significance of the results should be seriously questioned.

Wrong frequencies? ● It is difficult to choose correct frequencies, whether for the population or for a pedigree. One common technique is to assume equal allele frequencies. Q: What are the effects of using wrong gene frequencies? A: In general, using equal gene frequencies has been shown to lead to a systematic bias in favor of linkage; in other words, it tends to give false positives in linkage analysis.

Estimation techniques ● There are published frequencies for many markers, based on random samples, but those frequencies may differ strongly between populations. ● In large pedigrees: treat unrelated individuals as a sample and apply counting methods. ● ILINK (LINKAGE package) is another powerful approach.

Estimation techniques (cont.) ● Unlike a simple counting method, ILINK can extract additional information from the pedigree structure about the untyped individuals. ● The estimation step can be repeated to get even more refined results (EM). ● The significance of the approach depends on the number of untyped pedigree members.
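To illustrate the idea of repeating an estimation step (EM), here is a minimal sketch of classical EM gene counting for ABO allele frequencies from phenotype counts of unrelated individuals. This is not what ILINK does on a pedigree; it is a simpler, swapped-in setting chosen only to show the iterate-until-stable pattern. The function name `em_abo` and the phenotype counts are hypothetical.

```python
def em_abo(n_a, n_b, n_ab, n_o, iterations=200):
    """EM gene counting for ABO allele frequencies (p=A, q=B, r=O).

    Assumes unrelated individuals in Hardy-Weinberg equilibrium.
    This is an illustration only, not the ILINK pedigree algorithm.
    """
    n = n_a + n_b + n_ab + n_o
    p = q = r = 1.0 / 3.0  # arbitrary starting values
    for _ in range(iterations):
        # E-step: split phenotype A into expected AA and AO genotype counts,
        # and phenotype B into expected BB and BO, given current frequencies.
        denom_a = p * p + 2 * p * r
        denom_b = q * q + 2 * q * r
        aa = n_a * p * p / denom_a if denom_a > 0 else 0.0
        ao = n_a - aa
        bb = n_b * q * q / denom_b if denom_b > 0 else 0.0
        bo = n_b - bb
        # M-step: count alleles in the expected genotype counts.
        p = (2 * aa + ao + n_ab) / (2 * n)
        q = (2 * bb + bo + n_ab) / (2 * n)
        r = (ao + bo + 2 * n_o) / (2 * n)
    return p, q, r


# Hypothetical phenotype counts (A, B, AB, O) for 10,000 unrelated people.
p, q, r = em_abo(4500, 1300, 600, 3600)
print(f"A: {p:.4f}  B: {q:.4f}  O: {r:.4f}")
```

Each pass uses the current frequencies to guess the hidden genotypes, then re-counts alleles from those guesses, which is the same refine-and-repeat logic the slide refers to.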

Estimation techniques (cont.) ● Note that in such an estimation the recombination fraction is an active parameter in determining the gene frequency. ● Though the difference in allele frequencies might not be significant, the effect on the lod score might be notable in some situations. ● A way to balance the computations: estimate the frequencies separately for θ = 0.5 and for θ = θ', then compute Z(θ') = log10( L(θ', p'_i) / L(θ = 0.5, p''_i) ), where p'_i and p''_i are the allele frequencies estimated under θ' and under θ = 0.5 respectively.
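The lod score above is a base-10 log of a likelihood ratio, so it can be computed from two log-likelihoods. The sketch below assumes you have natural-log likelihoods from two separate runs (one at θ', one at θ = 0.5, each with its own estimated allele frequencies). The function name and the numeric values are hypothetical.

```python
import math


def lod_from_ln_likelihoods(ln_like_linked, ln_like_unlinked):
    """Convert two natural-log likelihoods into a lod score.

    ln_like_linked:   ln L(theta', p'_i), frequencies estimated at theta'
    ln_like_unlinked: ln L(0.5, p''_i),   frequencies estimated at theta = 0.5
    """
    # log10(L1 / L0) = (ln L1 - ln L0) / ln 10
    return (ln_like_linked - ln_like_unlinked) / math.log(10)


# Hypothetical log-likelihoods, for illustration only.
z = lod_from_ln_likelihoods(-120.40, -125.10)
print(f"Z = {z:.3f}")
```

Working with the difference of log-likelihoods avoids underflow, since pedigree likelihoods themselves are often extremely small numbers.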

Gene Frequencies Estimation ● Published estimates for the gene frequencies may be used as a first approximation. ● But it is advisable to estimate marker allele frequencies on your own from unrelated individuals taken from the same genetic population as your disease pedigrees. ● Another approach is to use the ILINK program to estimate the allele frequencies from the pedigree data.

Gene Frequencies With ILINK ● In our pedigree there are eight founders, two of whom are untyped. ● Directly estimating the allele frequencies based on the six typed founders produces: ● 4 copies of the 1 allele ● 2 copies of the 2 allele ● 5 copies of the 3 allele ● 1 copy of the 4 allele ● Gene frequency estimates: – 1 (0.3333) 2 (0.1667) 3 (0.4167) 4 (0.0833) Example
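The counting estimate above can be reproduced with a short sketch. The six founder genotypes below are hypothetical: they were chosen only so that the allele tallies match the slide (4, 2, 5 and 1 copies out of 12). The function name is likewise an assumption.

```python
from collections import Counter


def count_allele_frequencies(genotypes):
    """Estimate allele frequencies by direct counting over typed founders."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: counts[allele] / total for allele in sorted(counts)}


# Hypothetical genotypes for the six typed founders (tallies: 4, 2, 5, 1).
founders = [(1, 1), (1, 3), (1, 3), (2, 3), (2, 3), (3, 4)]
freqs = count_allele_frequencies(founders)
for allele, freq in freqs.items():
    print(f"allele {allele}: {freq:.4f}")
```

Counting ignores the two untyped founders entirely, which is the information ILINK tries to recover from the rest of the pedigree.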

Gene Frequencies With ILINK ● However, there is some information in the pedigree about the genotypes of the two untyped founders. ● To take advantage of it, we use the ILINK program. ● Prepare the parameter file for the example. Example (cont.)

● Disease locus is fully penetrant. ● Disease locus is autosomal dominant. ● Gene frequency for the disease allele equal to ● Estimated values used as starting values for gene frequencies. Assumptions Parameter File For The Example

Parameter File For The Example (Datafile.dat)
<< No. of loci, risk locus, sex linked, program
<< Mut locus, mut male, mut fem, hap freq.
1 2 << Affection, No. of alleles
E E-05 << Gene Frequencies
1 << No. of liability classes
<< Penetrances

Parameter File For The Example (Datafile.dat, cont.)
3 4 << Allele numbers, No. of alleles
<< Gene Frequencies
0 0 << Sex difference, interference (if 1 or 2)
<< Recombination values
2 << This locus may have iterated pars
<< Estimate 3 free gene frequencies

Running ILINK Program (Final.dat)
CHROMOSOME ORDER OF LOCI : 1 2
****************** FINAL VALUES ********************
PROVIDED FOR LOCUS 2 (CHROMOSOME ORDER)
*****************************************************
GENE FREQUENCIES :
*****************************************************
THETAS:
LN(LIKE) = e+02 LOD SCORE = e+00
NUMBER OF ITERATIONS = 6
NUMBER OF FUNCTION EVALUATIONS = 37
PTG = e-06

Gene Frequencies With ILINK ● We did the estimation conditional on there being linkage between marker and disease. ● What happens to the estimates if we assume that the recombination between disease and marker is 50%? ● This involves estimating marker allele frequencies ignoring all information about linkage. Estimation 2

Gene Frequencies With ILINK ● Now we set recombination values to 0.5 and run the ILINK program again. ● The estimates change slightly to the following numbers: – 1 ( ) 2 ( ) 3 ( ) 4 ( ) Estimation 2 (cont.)

Gene Frequencies With ILINK ● Another option is to estimate the recombination fraction jointly with the gene frequencies. ● This can be done by setting the bottom line of the parameter file so that all 4 parameters are estimated. ● ILINK results: θ = – 1 ( ) 2 ( ) 3 ( ) 4 ( ) Estimation 3

Gene Frequencies With ILINK: Gene Frequency Estimates Under Different Hypotheses [Table: rows p1 to p4; columns θ = 0.079, θ = 0.500, θ = θ̂, Counting]

Gene Frequencies With ILINK ● Estimating gene frequencies under different hypotheses leads to slightly different estimates. ● Fortunately, the difference is not huge, though it may have a significant influence on the lod scores in some situations. ● Because most pedigree members were typed in this example, the gene frequencies are not critical, whereas in other examples the results may vary dramatically. Conclusion

The Exercise 1. Go back to Exercise 8 and estimate gene frequencies for the ABO blood group in this same pedigree. 2. Does the lod score change when these frequencies are estimated instead of using population gene frequency estimates? 3. Consider the incomplete penetrance model on this same family. 4. Does incorporating this reduced penetrance affect your estimates of marker allele frequencies? 5. How does the gene frequency information affect the lod score between ABO and the disease?

ABO Blood Group [Figure: pedigree labeled with ABO blood group phenotypes (A, B, AB, O)]

The Exercise - Solution: Estimating Allele Frequencies ● Estimation 1: We set the recombination fraction between disease and ABO to 0.5 and estimate allele frequencies. Results: A (0.288) B (0.343) O (0.369) ● Estimation 2: We estimate allele frequencies jointly with the recombination fraction. Results: A (0.277) B (0.341) O (0.382) θ (0.001)

The Exercise - Solution: Computing Lod Score 1. Allele frequencies estimated jointly with recombination fraction: Z(θ=0) = 2. Allele frequencies estimated when disease considered to be unlinked to the marker: Z(θ=0) = 3. Treat gene frequency estimates as nuisance parameters: Z(θ=0) = ● In our case the lod scores are not greatly affected by the changes in gene frequency estimates at ABO.

The Exercise - Solution Incomplete penetrance model ● Define penetrance for each age class. ● For individuals younger than 10, the penetrance is 0.1 ● For individuals older than 60, the penetrance is 0.9 ● For individuals in the middle use formula for the line connecting the points (10,0.1) and (60,0.9) ● Estimating allele frequencies based on this model. Results: θ=0.5 A (0.288) B(0.343) O (0.369) θ=θ’ A (0.277) B(0.341) O (0.382)
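The age-dependent penetrance described above can be written as a small piecewise-linear function. This is a minimal sketch: the function name is an assumption, and in practice the continuous ages would still need to be binned into the liability classes that the parameter file expects.

```python
def penetrance(age):
    """Penetrance as a function of age.

    0.1 below age 10, 0.9 above age 60, and in between the value on the
    straight line through the points (10, 0.1) and (60, 0.9).
    """
    if age < 10:
        return 0.1
    if age > 60:
        return 0.9
    slope = (0.9 - 0.1) / (60 - 10)  # 0.016 per year
    return 0.1 + slope * (age - 10)


for age in (5, 10, 35, 60, 70):
    print(f"age {age}: penetrance {penetrance(age):.3f}")
```

At the midpoint, age 35, the line gives a penetrance of 0.5.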

The Exercise - Solution: Incomplete penetrance model (cont.) ● The estimates are the same as in the full penetrance model. ● This is because the estimation of allele frequencies is done independently of the disease phenotypes in the pedigree. ● Another reason is that there is little ambiguity as to the disease locus genotypes of the founders.

The Exercise - Solution: Computing Lod Score ● The lod scores are now as follows: Z(θ=0) = (frequencies estimated at θ = θ'); Z(θ=0) = (frequencies estimated at θ = 0.5); Z(θ=0) = ● The last lod score is again right between the two lod scores computed with fixed gene frequency estimates.