Genetic design. Testing Mendelian segregation Consider marker A with two alleles A and a BackcrossF 2 AaaaAAAaaa Observationn 1 n 0 n 2 n 1 n 0 Expected.

Slides:



Advertisements
Similar presentations
Planning breeding programs for impact
Advertisements

Mapping genes with LOD score method
Gene linkage seminar No 405 Heredity. Key words: complete and incomplete gene linkage, linkage group, Morgan´s laws, crossing-over, recombination, cis.
LINKAGE AND CHROMOSOME MAPPNG
Tutorial #2 by Ma’ayan Fishelson. Crossing Over Sometimes in meiosis, homologous chromosomes exchange parts in a process called crossing-over. New combinations.
Concepts and Connections
Chromosome Mapping in Eukaryotes
Biology 2250 Principles of Genetics Announcements Lab 3 Information: B2250 (Innes) webpage Lab 3 Information: B2250 (Innes) webpage download and print.
AN INTRODUCTION TO RECOMBINATION AND LINKAGE ANALYSIS Mary Sara McPeek Presented by: Yue Wang and Zheng Yin 11/25/2002.
Linkage Mapping Physical basis of linkage mapping
Basics of Linkage Analysis
Linkage Analysis: An Introduction Pak Sham Twin Workshop 2001.
Linkage Genes linked on the same chromosome may segregate together.
Joint Linkage and Linkage Disequilibrium Mapping
QTL Mapping R. M. Sundaram.
. Learning – EM in ABO locus Tutorial #08 © Ydo Wexler & Dan Geiger.
1 QTL mapping in mice Lecture 10, Statistics 246 February 24, 2004.
. Learning – EM in The ABO locus Tutorial #8 © Ilan Gronau. Based on original slides of Ydo Wexler & Dan Geiger.
Mendelian Genetics.
1 How many genes? Mapping mouse traits, cont. Lecture 2B, Statistics 246 January 22, 2004.
An quick survey of human 2-point linkage analysis Terry Speed Division of Genetics & Bioinformatics, WEHI Department of Statistics, UCB NWO/IOP Genomics.
Molecular Mapping. Terminology Gene: a particular sequence of nucleotides along a molecule of DNA which represents a functional unit of inheritance. (
. Learning – EM in The ABO locus Tutorial #9 © Ilan Gronau.
Review Question If two genes are 13 map units apart on a linkage map, what proportion of recombinant offspring will be seen in a testcross? What proportion.
Mapping Basics MUPGRET Workshop June 18, Randomly Intermated P1 x P2  F1  SELF F …… One seed from each used for next generation.
Genetic Recombination in Eukaryotes
Gene linkage seminar No 405 Heredity. Key words: complete and incomplete gene linkage, linkage group, Morgan´s laws, crossing-over, recombination, cis.
. Learning – EM in The ABO locus Tutorial #9 © Ilan Gronau. Based on original slides of Ydo Wexler & Dan Geiger.
Lecture 5: Segregation Analysis I Date: 9/10/02  Counting number of genotypes, mating types  Segregation analysis: dominant, codominant, estimating segregation.
Textbook. Textbook Grading 30% Homework (one per two weeks) 70% Research project - Class presentation (20%) - Written report (50%)
DATA ANALYSIS Module Code: CA660 Lecture Block 4.
Genetic Mapping Oregon Wolfe Barley Map (Szucs et al., The Plant Genome 2, )
Chi-square Statistical Analysis You and a sibling flip a coin to see who has to take out the trash. Your sibling grows skeptical of the legitimacy of.
Mapping populations Controlled crosses between two parents –two alleles/locus, gene frequencies = 0.5 –gametic phase disequilibrium is due to linkage,
1 MAXIMUM LIKELIHOOD ESTIMATION Recall general discussion on Estimation, definition of Likelihood function for a vector of parameters  and set of values.
Class 3 1. Construction of genetic maps 2. Single marker QTL analysis 3. QTL cartographer.
Lecture 19: Association Studies II Date: 10/29/02  Finish case-control  TDT  Relative Risk.
Welcome to the Presentation
Experimental Design and Data Structure Supplement to Lecture 8 Fall
Joint Linkage and Linkage Disequilibrium Mapping Key Reference Li, Q., and R. L. Wu, 2009 A multilocus model for constructing a linkage disequilibrium.
Quantitative Genetics
Lecture 4: Statistics Review II Date: 9/5/02  Hypothesis tests: power  Estimation: likelihood, moment estimation, least square  Statistical properties.
Grouping loci Criteria Maximum two-point recombination fraction –Example -r ij ≤ 0.40 Minimum LOD score - Z ij –For n loci, there are n(n-1)/2 possible.
Lecture 13: Linkage Analysis VI Date: 10/08/02  Complex models  Pedigrees  Elston-Stewart Algorithm  Lander-Green Algorithm.
Lecture 12: Linkage Analysis V Date: 10/03/02  Least squares  An EM algorithm  Simulated distribution  Marker coverage and density.
Lecture 15: Linkage Analysis VII
Lecture 3: Statistics Review I Date: 9/3/02  Distributions  Likelihood  Hypothesis tests.
Recombination Frequency and Gene Linkage Mapping
Computational Issues on Statistical Genetics Develop Methods Data Collection Analyze Data Write Reports/Papers Research Questions Review the Literature.
Statistical Genetics Instructor: Rongling Wu.
Lecture 22: Quantitative Traits II
Chapter 7 Outline 7.1 Linked Genes Do Not Assort Independently, Linked Genes Segregate Together, and Crossing Over Produces Recombination between.
Lecture 23: Quantitative Traits III Date: 11/12/02  Single locus backcross regression  Single locus backcross likelihood  F2 – regression, likelihood,
1 Genetic Mapping Establishing relative positions of genes along chromosomes using recombination frequencies Enables location of important disease genes.
Types of genome maps Physical – based on bp Genetic/ linkage – based on recombination from Thomas Hunt Morgan's 1916 ''A Critique of the Theory of Evolution'',
Step one Two gene loci: A & B What will your first cross be in an experiment to test for possible meiotic crossing over? Hint: what condition do you have.
Lecture 11: Linkage Analysis IV Date: 10/01/02  linkage grouping  locus ordering  confidence in locus ordering.
I. Allelic, Genic, and Environmental Interactions
Bioinformatic tools for Genome Mapping Avraham Korol (2449), room 217 in multipurpose building Course Assistant: Irit.
I. Allelic, Genic, and Environmental Interactions
The Chromosomal Basis of Inheritance GENE MAPPING AP Biology/ Ms. Day
PLANT BIOTECHNOLOGY & GENETIC ENGINEERING (3 CREDIT HOURS)
Gene Linkage and Genetic Mapping
Linkage, Recombination, and Eukaryotic Gene Mapping
Basic concepts on population genetics
How to Solve Linkage Map Problems
Why we look the way we look...
The Chromosomal Basis of Inheritance GENE MAPPING AP Biology/ Ms. Day
Lecture 9: QTL Mapping II: Outbred Populations
Introduction to Genetics
Presentation transcript:

Genetic design

Testing Mendelian segregation Consider marker A with two alleles A and a BackcrossF 2 AaaaAAAaaa Observationn 1 n 0 n 2 n 1 n 0 Expected frequency½½¼ ½¼ Expected numbern/2n/2n/4n/2n/4 The x 2 test statistic is calculated by x 2 =  (obs – exp) 2 /exp = (n 1 -n/2) 2 /(n/2) + (n 0 -n/2) 2 /(n/2) =(n 1 -n 0 )2/n ~x 2 df=1, for BC, (n 2 -n/4) 2 /(n/4)+(n 1 -n/2) 2 /(n/2)+(n 0 -n/4) 2 /(n/4)~x 2 df=2, for F 2

Examples BackcrossF2 AaaaAAAaaa Observation Expected frequency½½¼ ½¼ Expected number The x 2 test statistic is calculated by x 2 =  (obs – exp) 2 /exp = (44-59) 2 /103 = < x 2 df=1 = 3.841, for BC, ( ) 2 /42.75+( ) 2 /85.5+( ) 2 /42.75=0.018 < x 2 df=2 =5.991, for F2 The marker under study does not deviate from Mendelian segregation in both the BC and F2.

Linkage analysis Backcross ParentsAABB x aabb AB ab F1 AaBb x aabb ABAbaBab ab BCAaBbAabbaaBbaabb Obsn 11 n 10 n 01 n 00  n =  n ij Freq½(1-r)½r ½r ½(1-r) r is the recombination fraction between two markers A and B. The maximum likelihood estimate (MLE) of r is r^ = (n 10 +n 01 )/n. r has interval [0,0.5]: r=0 complete linkage, r=0.5, no linkage

Proof of r^ = (n 10 +n 01 )/n The likelihood function of r given the observations: L(r|n ij ) = n!/(n 11 !n 10 !n 01 !n 00 !)  [ ½(1-r) ] n11 [ ½r ] n10 [ ½r ] n01 [ ½(1-r) ] n00 = n!/(n 11 !n 10 !n 01 !n 00 !)  [ ½(1-r) ] n11+n00 [ ½r ] n10+n01 log L(r|n ij ) = C+(n 11 +n 00 )log[ ½(1-r) ] +(n 10 +n 01 )log[½r] = C + (n 11 +n 00 )log(1-r) + (n 10 +n 01 )log r + nlog(½) Let the score  logL(r|n ij )/  r = (n 11 +n 00 )[-1/(1-r)] +(n 10 +n 01 )(1/r) = 0, we have (n 11 +n 00 )[1/(1-r)]=(n 10 +n 01 )(1/r)  r^ = (n 10 +n 01 )/n

Testing for linkage BCAaBbaabbAabbaaBb Obsn 11 n 00 n 10 n 01  n=  n ij Freq½(1-r) ½(1-r) ½r ½r Gamete type n NR = n 11 +n 00 n R = n 10 +n 01 Freq with no linkage ½ ½ Exp ½n ½n x 2 =  (obs – exp) 2 /exp = (n NR - n R ) 2 /n ~ x 2 df=1 ExampleAaBbaabbAabbaaBb n NR = 49+47=96 n R = = 7 n=96+7=103 x 2 =  (obs – exp) 2 /exp = (96-7)2/103 = > x 2 df=1 = These two markers are statistically linked. r^ = 7/103 = 0.068

Linkage analysis in the F2 BBBbbb AAObsn 22 n 21 n 20 Freq¼(1-r) 2 ½r(1-r) ¼r 2 AaObs n 12 n 11 n 10 Freq ½r(1-r) ½(1-r) 2 +½r 2 ½r(1-r) aaObs n 02 n 01 n 00 Freq ¼r 2 ½r(1-r) ¼(1-r) 2 Likelihood function L(r|n ij ) = n!/(n 22 !...n 00 !)  [¼(1-r) 2 ] n22+n00 [¼r 2 ] n20+n02 [½r(1-r)] n21+n12+n10+n01  [½(1-r) 2 +½r 2 ] n11 Let the score = 0 so as to obtain the MLE of r, but this will be difficult because AaBb contains a mix of two genotype formation types (in the dominator we will have ½(1-r) 2 +½r 2 ).

I will propose a shortcut EM algorithm for obtain the MLE of r BBBbbb AAObsn 22 n 21 n 20 Freq¼(1-r) 2 ½r(1-r) ¼r 2 Recombinant012 AaObs n 12 n 11 n 10 Freq ½r(1-r) ½(1-r) 2 +½r 2 ½r(1-r) Recombinant1r 2 /[(1-r) 2 +r 2 ]1 aaObs n 02 n 01 n 00 Freq ¼r 2 ½r(1-r) ¼(1-r) 2 Recombinant210

Based on the distribution of the recombinants (i.e., r), we have r = 1/(2n)[2(n 20 +n 02 )+(n 21 +n 12 +n 10 +n 01 )+r 2 /[(1-r) 2 +r 2 ]n 11 (1) = 1/(2n)(2n 2R + n 1R +  n 11 ) where n 2R = n 20 +n 02, n 1R = n 21 +n 12 +n 10 +n 01, n 0R = n 22 +n 00. The EM algorithm is formulated as follows E step: Calculate  = r 2 /[(1-r) 2 +r 2 ] (expected the number of recombination events for the double heterozygote AaBb) M step: Calculate r^ by substituting the calculated  from the E step into Equation 1 Repeat the E and M step until the estimate of r is stable

Example BBBbbb AAn 22 =20n 21 =17n 20 =3 Aan 12 =20 n 11 =49n 10 =19 aan 02 =3 n 01 =21n 00 =19 Calculating steps: 1.Give an initiate value for r, r (1) =0.1, 2.Calculate  (1) =(r (1) ) 2 /[(1- r (1) ) 2 +(r (1) ) 2 ] = /[(1-0.1) ] = x; 3.Estimate r using Equation 1, r (2) = y; 4.Repeat steps 2 and 3 until the estimate of r is stable (converges). The MLE of r = How to determine that r has converged? |r (t+1) – r (r) | < a very small number, e.g., e -8

Testing the linkage in the F2 BBBbbb AAObsn 22 =20n 21 =17n 20 =3 Exp with no linkage1/16n1/8n1/16n AaObsn 12 =20 n 11 =49n 10 =19 Exp with no linkage1/8n¼n1/8n aaObsn 02 =3 n 01 =21n 00 =19 Exp with no linkage1/16n1/8n1/16n n =  n ij = 191 x 2 =  (obs – exp) 2 /exp ~ x 2 df=1 = (20-1/16×191)/(1/16×191) + … = a > x 2 df=1 =3.381 Therefore, the two markers are significantly linked.

Log-likelihood ratio test statistic Two alternative hypotheses H0: r = 0 vs. H1: r  0 Likelihood value under H1 L 1 (r|n ij ) = n!/(n 22 !...n 00 !)  [¼(1-r) 2 ] n22+n00 [¼r 2 ] n20+n02 [½r(1-r)] n21+n12+n10+n01 [½(1-r) 2 +½r 2 ] n11 Likelihood value under H0 L 0 (r=0.5|n ij ) = n!/(n 22 !...n 00 !)  [¼(1-0.5) 2 ] n22+n00 [¼0.5 2 ] n20+n02 [½0.5(1-0.5)] n21+n12+n10+n01 [½(1-0.5) 2 +½0.5 2 ] n11 LOD = log 10 [L 1 (r|n ij )/L 0 (r=0.5|n ij )] = {(n 22 +n 00 )2[log 10 (1-r)-log 10 (1-0.5)+…} = 6.08 > critical LOD=3

Three-point analysis Determine a most likely gene order; Make full use of information about gene segregation and recombination Consider three genes A, B and C. Three possible orders A-B-C, A-C-B, or B-A-C

AaBbCc produces 8 types of gametes (haplotypes) which are classified into four groups Recombinant # between ObservationFrequency A and BB and C ABC and abc 00n 00 =n ABC +n abc g 00 Abc and abC01n 01 =n Abc +n abC g 01 aBC and Abc10n 10 =n aBC +n Abc g 10 AbC and aBc11n 11 =n AbC +n aBc g 11 Note that the first subscript of n or g denotes the number of recombinant between A and B, whereas the second subscript of n or g denotes the number of recombinant between B and C (assuming order A-B-C)

Matrix notations Markers B and C Markers A and BRecombinantNon-recombinantTotal Recombinantn 11 n 10 Non-recombinantn 01 n 00 Totaln Recombinantg 11 g 10 r AB Non-recombinantg 01 g 00 1-r AB Total r BC 1-r BC 1 What is the recombination fraction between A and C? r AC = g 01 + g 10 Thus, we have r AB = g 11 + g 10 r BC = g 11 + g 01 r AC = g 01 + g 10

The data log-likelihood (complete data, it is easy to derive the MLEs of g ij ’s) log L(g 00, g 01, g 10, g 11 | n 00, n 01, n 10, n 11, n) = log n! – (log n 00 ! + log n 01 ! + log n 10 ! + log n 11 !) + n 00 log g 00 + n 01 log g 01 + n 10 log g 10 + n 11 log g 11 The MLE of g ij is: g ij ^ = n ij /n Based on the invariance property of the MLE, we obtain the MLE of r AB, r AC and r BC. A relation: 0  g 11 = ½(r AB + r BC - r AC )  r AC  r AB + r BC 0  g 10 = ½(r AB - r BC + r AC )  r BC  r AB + r AC 0  g 01 = ½(-r AB + r BC + r AC )  r AB  r AC + r BC

Advantages of three-point (and generally multi-point) analysis Determine the gene order, Increase the estimation precision of the recombination fractions (for partially informative markers).

Real-life example – AoC/oBo  ABC/ooo Eight groups of offspring genotypes A_B_C_ A_B_cc A_bbC_ A_bbcc aaB_C_ aaV_cc aabbC_ aabbcc Obs Order A -B-C Two-point analysis 0.38    Three-point analysis 0.20    0.059

Multilocus likelihood – determination of a most likely gene order Consider three markers A, B, C, with no particular order assumed. A triply heterozygous F1 ABC/abc backcrossed to a pure parent abc/abc GenotypeABC or abc ABc or abC Abc or aBC AbC or aBc Obs. n 00 n 01 n 10 n 11 Frequency under Order A-B-C (1-r AB )(1- r BC ) (1-r AB ) r BC r AB (1- r BC ) r AB r BC Order A-C-B (1-r AC )(1- r BC ) r AC r BC r AC (1-r BC ) (1-r AC )r BC Order B-A-C (1-r AB )(1- r AC ) (1-r AB ) r AC r AB r AC r AB (1-r AC ) r AB = the recombination fraction between A and B r BC = the recombination fraction between B and C r AC = the recombination fraction between A and C

It is obvious thatr AB = (n 10 + n 11 )/n r BC = (n 01 + n 11 )/n r AC = (n 01 + n 10 )/n What order is the mostly likely? L ABC  (1-r AB ) n00+n01 (1-r BC ) n00+n10 (r AB ) n10+n11 (r BC ) n01+n11 L ACB  (1-r AC ) n00+n11 (1-r BC ) n00+n10 (r AC ) n01+n10 (r BC ) n01+n11 L BAC  (1-r AB ) n00+n01 (1-r AC ) n00+n11 (r AB ) n10+n11 (r AC ) n01+n10 According to the maximum likelihood principle, the linkage order that gives the maximum likelihood for a data set is the best linkage order supported by the data. This can be extended to include many markers for searching for the best linkage order.

Map function Transfer the recombination fraction (non-additivity) between two genes into their corresponding genetic map distance (additivity) Map distance is defined as the mean number of crossovers The unit of map distance is Morgan (in honor of T. H. Morgan who obtained the Novel prize in 1930s) 1 Morgan or M = 100 centiMorgan or cM

The Haldane map function (Haldane 1919) Assumptions: No interference (the occurrence of one crossover is independent of that of next) Crossover events follow the Poisson distribution. Consider three markers with an order A-B-C A triply heterozygous F1 ABC/abc backcrossed to a pure parent abc/abc EventGameteFrequency No crossoverABC or abc(1-r AB )(1-r BC ) Crossover between B&CABc or abC(1-r AB )r BC Crossover between A&B Abc or aBCr AB (1-r BC ) Crossovers between A&B and B&C AbC or aBcr AB r BC The recombination fraction between A and C is expected to be r AC = (1-r AB )r BC + r AB (1-r BC ) = r AB +r BC -2r AB r BC  (1-2r AC )=(1-2r AB )(1-2r BC )

Map distance: A genetic length (map distance) x of a chromosome is defined as the mean number of crossovers. Poisson distribution (x = genetic length): Crossover event0123…t… Probabilitye -x xe -x x 2 e -x x 3 e -x …x t e -x … 2! 3! t!

The value of r (recombination fraction) for a genetic length of x is the sum of the probabilities of all odd numbers of crossovers: r = e -x (x 1 /1! + x 3 /3! + x 5 /5! + x 7 /7! + …) = ½(1- e -2x ) x = -ln(1-2r) We have x AC = x AB + x BC for a given order A-B-C, but generally, r AC  r AB + r BC

Proof of x AC = x AB + x BC For order A-B-C, we have r AB = ½(1- e -2xAB ), r BC = ½(1- e -2xBC ), r AC = ½(1- e -2xAC ) r AC = r AB + r BC – 2r AB r BC = ½(1- e -2xAB ) + ½(1- e -2xBC ) - 2 ½(1- e -2xAB ) ½(1- e -2xBC ) = ½[1- e -2xAB +1- e -2xBC -1+ e -2xAB + e -2xBC - e -2xAB e -2xBC = ½(1- e -2(xAB+xBC) ) = ½(1- e -2xAC ), which means x AC = x AB + x BC

The Kosambi map function (Kosambi 1943) The Kosambi map function is an extension of the Haldane map function For gene order A-B-C [1] r AC = r AB + r BC – 2r AB r BC [2] r AC  r AB + r BC, for small r’s [3] r AC  r AB + r BC – r AB r BC, for intermediate r’s The Kosambi map function attempts to find a general expression that covers all the above relationships

Map Function HaldaneKosambi x = -½ln(1-2r)¼ln(1+2r)/(1-2r) r =½(1-e -2x )½(e 2x -e -2x )/(e 2x +e -2x ) r AB =r A +r B -2r A r B (r A +r B )/(1+4r A r B ) Reference Ott, J, 1991 Analysis of Human Genetic Linkage. The Johns Hopkins University Press, Baltimore and London

Construction of genetic maps The Lander-Green algorithm -- a hidden Markov chain Genetic algorithm

Linkage analysis between two dominant markers Dominant marker - AA and AO cannot be separated from each other in phenotype. But both of them are different from third genotype OO. Two codominant markers A and B 1/4AA + 1/2Aa + 1/4aa = 1AA: 2Aa: 1aa 1/4BB + 1/2Bb + 1/4bb = 1BB: 2Bb: 1bb Two dominant markers A and B Mix(1/4AA + 1/2AO): 1/4OO = 3A_ : 1OO Mix(1/4BB + 1/2BO): 1/4OO = 3B_ : 1OO

Let r be the recombination fraction between the two markers For two codominant markers, we have 9 (=3x3) groups of genotypes in the F 2, whose genotype frequencies are expressed, in matrix notation, as AAAaaatotal BB¼(1-r) 2 ½r(1-r)¼r 2 1/4 Bb½r(1-r)½[(1-r) 2 +r 2 ]¼r(1-r) 1/2 bb¼r 2 ½r(1-r)¼(1-r) 2 1/4 Total1/41/21/41

For one codominant (B) and one dominant marker (A), 9 groups of genotypes will be collapsed into 6 (=3  2) distinguishable groups: A_aatotal BB¼(1-r) 2 + ½r(1-r)¼r 2 1/4 Bb½r(1-r) + ½[(1-r) 2 +r 2 ]¼r(1-r) 1/2 bb¼r 2 + ½r(1-r)¼(1-r) 2 ¼ Total3/41/41

For two dominant markers, 9 groups of genotypes will be collapsed into 4 (=2  2) distinguishable groups. The 2x2 probability matrix is A_aatotal B_¼(1-r) 2 + ½r(1-r)¼r 2 + ½r(1-r) + ½[(1-r) 2 +r 2 ] + ¼r(1-r) 3/4 bb¼r 2 + ½r(1-r)¼(1-r) 2 1/4 Total3/41/41

Observations counted from the poplar data set provided: A_aaTotal B_n 1 n 2 bbn 3 n 4 Totaln

Expected numbers of recombinant gametes (the number of r) A_aa B_c 1 c 2 bbc 2 0 where c 1 = [2  ½r 2 +1  r(1-r)]/[½r 2 +r(1-r)+¾(1-r) 2 ](1) c 2 = [2  ¼r 2 +1  ½r(1-r)]/[¼r 2 +½r(1-r)](2) Based on the definition of r (the proportion of recombinant gametes over all gametes), we have r^ = (c 1 n 1 + c 2 n 2 + c 3 n 3 ) /2n(3) Equations (1) and (2) are for the E step, whereas Equation (3) is for the M step.