Lecture 11: Linkage Analysis IV Date: 10/01/02  linkage grouping  locus ordering  confidence in locus ordering.


1 Lecture 11: Linkage Analysis IV Date: 10/01/02  linkage grouping  locus ordering  confidence in locus ordering

2 Linkage Grouping and Locus Ordering  Definition: Linkage grouping is the combining of loci into linkage groups based on their linkage to each other.  Definition: Locus ordering is the arrangement of the loci within a linkage group in a linear order.

3 Linkage Group  Biologically speaking, a linkage group is a group of loci located on the same chromosome.  Statistically speaking, a linkage group is a group of loci grouped together because they satisfy some statistical criteria.  The two definitions may differ if there are large segments of the genome without markers, which statistically isolate multiple groups of loci on the same chromosome.

4 The Basis for Linkage Grouping  By this time, you are familiar with recombination fractions θ_ij, lod scores z_ij, and p-values for linkage p_ij, where i and j indicate two different markers.  Any or all of these statistics can be used to group loci.

5 Overview of Linkage Grouping  Iterate through grouping criteria until a grouping with acceptable properties is encountered.  Given particular grouping criteria, iterate through loci to determine grouping pattern.
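The two nested iterations can be sketched as follows. This is a minimal illustration, not the lecture's own procedure: it assumes that two loci are "linked" when their recombination fraction is at most `max_theta` and their lod score is at least `min_lod`, and that linkage is transitive (single-linkage grouping). The marker names, pairwise values, thresholds, and the function name `linkage_groups` are all hypothetical.

```python
def linkage_groups(loci, theta, lod, max_theta=0.3, min_lod=3.0):
    """Group loci by transitive pairwise linkage (single-linkage grouping)."""
    parent = {locus: locus for locus in loci}

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Inner iteration: visit every pair of loci and merge groups when linked.
    for i, a in enumerate(loci):
        for b in loci[i + 1:]:
            if theta[(a, b)] <= max_theta and lod[(a, b)] >= min_lod:
                parent[find(a)] = find(b)

    groups = {}
    for locus in loci:
        groups.setdefault(find(locus), []).append(locus)
    return sorted(groups.values())


# Hypothetical pairwise statistics for five markers.
loci = ["m1", "m2", "m3", "m4", "m5"]
theta = {("m1", "m2"): 0.05, ("m1", "m3"): 0.20, ("m1", "m4"): 0.48,
         ("m1", "m5"): 0.50, ("m2", "m3"): 0.12, ("m2", "m4"): 0.45,
         ("m2", "m5"): 0.49, ("m3", "m4"): 0.47, ("m3", "m5"): 0.50,
         ("m4", "m5"): 0.08}
lod = {("m1", "m2"): 12.0, ("m1", "m3"): 2.5, ("m1", "m4"): 0.1,
       ("m1", "m5"): 0.0, ("m2", "m3"): 7.0, ("m2", "m4"): 0.2,
       ("m2", "m5"): 0.0, ("m3", "m4"): 0.1, ("m3", "m5"): 0.0,
       ("m4", "m5"): 9.5}

# Outer iteration: try more than one set of grouping criteria.
loose = linkage_groups(loci, theta, lod, max_theta=0.3, min_lod=3.0)
strict = linkage_groups(loci, theta, lod, max_theta=0.1, min_lod=3.0)
print(loose)   # m1, m2, m3 together; m4, m5 together
print(strict)  # m3 becomes an isolated marker
```

Comparing the groupings obtained under several criteria is one simple way to see how robust a grouping is.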

6 Identifying an Acceptable Grouping  A grouping that is robust to changes in grouping criteria is desirable.  A grouping pattern that matches biological expectation (e.g. the number of groups equals the number of chromosomes) is desirable.  Very large linkage groups are a sign that the grouping criteria are too loose.  Many isolated markers suggest bad data, a small population, or a small number of markers.

7 Log Likelihood in Terms of Gamete Frequencies

8 Log-Likelihood in Terms of Recombination Fractions

9 Log Likelihood in Terms of Interference

10 Determining Order: Two-Locus Recombination Fraction  When order is unknown, assume ABC and calculate the three pairwise recombination fractions.  Order from largest to smallest: the pair with the largest fraction lies at the two ends.  Or order from smallest to largest: the pair with the smallest fraction is adjacent.

11 Backcross Example
Gamete           Code   Count
ABC and abc      00     69
ABc and abC      01     21
AbC and aBc      10     9
Abc and aBC      11     1

12 Backcross Example (contd)  The largest recombination fraction is between B and C (0.30). Therefore, B and C are located at the ends and A is in the middle: BAC.  The smallest recombination fraction is between A and B (0.10). Place AB together. Then C is closer to A (0.22) than to B (0.30), so place C next to A to get CAB.  Both methods arrive at the same order. The second method is more easily extended to more than three loci.
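The pairwise fractions used in this example can be recomputed from the gamete counts. The sketch below assumes each gamete class is written as a three-letter string in which uppercase and lowercase mark the two parental alleles at A, B and C, with complementary gametes pooled; the function name `pairwise_theta` is illustrative only.

```python
from itertools import combinations

# Gamete classes from the backcross example (complementary types pooled).
counts = {"ABC": 69, "ABc": 21, "AbC": 9, "Abc": 1}


def pairwise_theta(counts, loci="ABC"):
    """Estimate the recombination fraction for every pair of loci."""
    total = sum(counts.values())
    theta = {}
    for (i, x), (j, y) in combinations(enumerate(loci), 2):
        # A gamete is recombinant for (x, y) if the two alleles differ in case.
        rec = sum(n for g, n in counts.items()
                  if g[i].isupper() != g[j].isupper())
        theta[x + y] = rec / total
    return theta


theta = pairwise_theta(counts)

# Largest-fraction rule: the most distant pair forms the two ends.
ends = max(theta, key=theta.get)
middle = (set("ABC") - set(ends)).pop()
order = ends[0] + middle + ends[1]
print(theta, order)
```

The script gives 0.10 for AB, 0.22 for AC and 0.30 for BC, so B and C are the ends and the order is BAC.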

13 Determining Order: Log Likelihood Approach
Order          ABC       ACB       BAC
C              3         3.1818    0.455
l(θ, C)        -84.65 (same for all three orders)
l(θ, C=0)      -86.82    -91.81    -87.20
l(θ, C=1)      -93.60    -113.78   -87.20
l(θ, C ≤ 1)    -93.60    -113.78   -84.65

14 Log Likelihood Approach (contd)  When the likelihoods are fully parameterized, there is no difference in log likelihood between locus orders. Cannot use log likelihood to order!  One must constrain the parameters in order to use the log likelihood to order. The most general and probably biologically reasonable restriction is C ≤ 1.

15 Log Likelihood Approach (contd)  Even when the parameters are constrained, there is no degree of freedom to undertake a statistical test (i.e. log likelihood ratio test statistic).  One can resort to the odds ratio, but interpretation is difficult.

16 Log Likelihood Approach (contd)  The interpretation is that order BAC is 7708 times more likely than ABC.  But, you get different odds with different assumptions on C. In addition there is no distributional information, no way to determine significance.
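Several of the quantities in this approach can be checked numerically from the backcross counts. The sketch below is an illustration under stated assumptions: C is taken to be the coefficient of coincidence (observed double recombinants divided by the number expected without interference), natural logarithms are used, and the C ≤ 1 fit is taken to be the unconstrained fit when the estimated C is at most 1 and the C = 1 fit otherwise. The function names are hypothetical.

```python
import math

counts = {"ABC": 69, "ABc": 21, "AbC": 9, "Abc": 1}
N = sum(counts.values())


def recomb(gamete, x, y):
    """True if loci x and y carry alleles from different parents."""
    i, j = "ABC".index(x), "ABC".index(y)
    return gamete[i].isupper() != gamete[j].isupper()


def analyse(order):
    """Return (C estimate, full log lik, C=1 log lik, C<=1 log lik) for an order."""
    # Classify gametes by recombination in the first and second interval.
    n = {(r1, r2): 0 for r1 in (False, True) for r2 in (False, True)}
    for g, c in counts.items():
        n[(recomb(g, order[0], order[1]), recomb(g, order[1], order[2]))] += c
    t1 = (n[(True, False)] + n[(True, True)]) / N   # first interval
    t2 = (n[(False, True)] + n[(True, True)]) / N   # second interval
    c_hat = (n[(True, True)] / N) / (t1 * t2)
    # Fully parameterized model: fitted class frequencies equal observed ones.
    full = sum(c * math.log(c / N) for c in n.values() if c > 0)
    # C = 1: recombination events in the two intervals are independent.
    p = {(False, False): (1 - t1) * (1 - t2), (True, False): t1 * (1 - t2),
         (False, True): (1 - t1) * t2, (True, True): t1 * t2}
    no_interference = sum(n[k] * math.log(p[k]) for k in n)
    constrained = full if c_hat <= 1 else no_interference
    return c_hat, full, no_interference, constrained


results = {order: analyse(order) for order in ("ABC", "ACB", "BAC")}
odds = math.exp(results["BAC"][3] - results["ABC"][3])
for order, values in results.items():
    print(order, [round(v, 4) for v in values])
print(round(odds))
```

The fully parameterized value is about -84.65 for every order, which is why it cannot discriminate between orders. With unrounded log likelihoods the odds of BAC against ABC under C ≤ 1 come out at roughly 7,600; the figure 7708 corresponds to exp(8.95), that is, to the difference of the rounded values -84.65 and -93.60.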

17 Multilocus Ordering  Three-Locus Approach  Maximum Likelihood Approach  SARF and PARF  SALOD  Least squares method based on map distances (covered later).

18 Three-Locus Approach  Find the triplet with the highest linkage.  Add the next tightest locus by comparing it with each of the 3 pairs possible from the established triplet.  Problems: Loci for which there is a contradiction cannot be placed. The resulting order is locally optimal, not necessarily globally optimal. Information is potentially lost by considering triplets only.

19 Three-Locus Approach (example)  Suppose you order markers 1, 2, and 3 in order 1-2-3, using recombination fraction ranking or log-likelihood.  You are now hoping to place marker 4.  You find 1-2-4, 1-4-3, 2-4-3 are supported.  What is the four-locus order?  What if instead you find 1-2-4, 1-4-3, and 2-3-4 are supported?
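One way to answer both questions is to enumerate every four-locus order and keep those consistent with all supported triplets. The sketch assumes that a supported triplet a-b-c means "b lies between a and c", and that an order and its mirror image are the same map; the function name `consistent_orders` is illustrative.

```python
from itertools import permutations


def consistent_orders(loci, triplets):
    """Return the orders in which, for each (a, b, c), b lies between a and c."""
    found = []
    for perm in permutations(loci):
        if perm[0] > perm[-1]:
            continue  # skip mirror images: a map has no direction
        pos = {locus: k for k, locus in enumerate(perm)}
        if all(min(pos[a], pos[c]) < pos[b] < max(pos[a], pos[c])
               for a, b, c in triplets):
            found.append(perm)
    return found


known = (1, 2, 3)
first = consistent_orders([1, 2, 3, 4], [known, (1, 2, 4), (1, 4, 3), (2, 4, 3)])
second = consistent_orders([1, 2, 3, 4], [known, (1, 2, 4), (1, 4, 3), (2, 3, 4)])
print(first)   # the single order 1-2-4-3
print(second)  # empty: the triplets contradict one another
```

In the first case marker 4 falls between 2 and 3. In the second case no order satisfies every triplet, which is the contradiction problem noted for the three-locus approach.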

20 Multilocus Ordering: Maximum Likelihood  Assume that there is no interference. Then we can write the likelihood of the data for all neighboring pairs of loci.  Let there be l loci. Let j=i+1, θ_ij be the recombination fraction and n_ij the number of informative gametes between loci i and j.  Then the likelihood for a particular order is given by:
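One standard way to write this likelihood under the no-interference assumption is sketched here. The symbol r_ij, the number of recombinant gametes observed between adjacent loci i and j, is an assumption introduced for illustration and is not defined on the slide.

```latex
L(\text{order}) = \prod_{i=1}^{l-1} \theta_{ij}^{\,r_{ij}} \,(1-\theta_{ij})^{\,n_{ij}-r_{ij}}, \qquad j = i+1

\log L(\text{order}) = \sum_{i=1}^{l-1} \Bigl[\, r_{ij}\log\theta_{ij} + (n_{ij}-r_{ij})\log(1-\theta_{ij}) \Bigr]
```

Substituting the estimate r_ij / n_ij for each θ_ij gives the maximized log likelihood for that order, and the order with the largest value is preferred.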

21 SARF or PARF  SARF is the Sum of Adjacent Recombination Fractions.  PARF is the Product of Adjacent Recombination Fractions.  The goal is to find the order that minimizes SARF or PARF.

22 SALOD  SALOD stands for Sum of Adjacent Lod Scores. One tries to maximize SALOD in order to find the best order.
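The three criteria can be compared on the backcross example from the earlier slides. The sketch below is illustrative only: the two-point lod formula assumes phase-known backcross gametes with an estimate strictly between 0 and 0.5, every pair is assumed to have the same number of informative gametes `n`, and all function names are hypothetical.

```python
import math

# Pairwise recombination fractions from the backcross example (100 gametes).
theta = {frozenset("AB"): 0.10, frozenset("AC"): 0.22, frozenset("BC"): 0.30}


def adjacent_pairs(order):
    return list(zip(order, order[1:]))


def sarf(order):
    """Sum of adjacent recombination fractions (to be minimized)."""
    return sum(theta[frozenset(p)] for p in adjacent_pairs(order))


def parf(order):
    """Product of adjacent recombination fractions (to be minimized)."""
    return math.prod(theta[frozenset(p)] for p in adjacent_pairs(order))


def lod(theta_hat, n):
    """Two-point lod score for n informative gametes, 0 < theta_hat < 0.5."""
    return n * (theta_hat * math.log10(theta_hat)
                + (1 - theta_hat) * math.log10(1 - theta_hat)
                + math.log10(2))


def salod(order, n=100):
    """Sum of adjacent lod scores (to be maximized)."""
    return sum(lod(theta[frozenset(p)], n) for p in adjacent_pairs(order))


orders = ["ABC", "ACB", "BAC"]  # the three distinct orders, ignoring mirror images
best_sarf = min(orders, key=sarf)
best_parf = min(orders, key=parf)
best_salod = max(orders, key=salod)
print(best_sarf, best_parf, best_salod)
```

For these data all three criteria select BAC, with a SARF of 0.32 against 0.40 for ABC and 0.52 for ACB.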

23 SALOD PIC

24 SALOD EN

25 Traveling Salesman Problem (TSP)  Locus ordering is a special case of TSP. Given l cities with distances x_ij between them, find the shortest route such that every city is visited exactly once.

26 Seriation  Used to obtain minimum SARF.  Start: Select the ith locus. Place the most tightly linked locus next to it.  Iterate: Find the remaining unplaced locus that is most closely linked to the ith locus. Place it beside whichever of the two external loci of the current order it is closest to.  Finish: Repeating this for each starting locus gives l locus orders. Find the order with minimum CI.
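The start, iterate and finish steps can be sketched as below. Two assumptions are made loudly: the candidate orders are compared by SARF rather than by the CI criterion named on the slide, and the test data are hypothetical map positions (in Morgans) converted to recombination fractions with the Haldane mapping function. The function name `seriate` is illustrative.

```python
import math
from itertools import combinations

# Hypothetical map positions in Morgans; true order is a-b-c-d-e.
positions = {"a": 0.0, "b": 0.1, "c": 0.25, "d": 0.45, "e": 0.6}
theta = {frozenset((x, y)): 0.5 * (1 - math.exp(-2 * abs(positions[x] - positions[y])))
         for x, y in combinations(positions, 2)}  # Haldane mapping function


def seriate(loci, theta):
    """Build one order per starting locus and return the one with minimum SARF."""
    def t(a, b):
        return theta[frozenset((a, b))]

    def sarf(order):
        return sum(t(a, b) for a, b in zip(order, order[1:]))

    candidates = []
    for start in loci:
        order = [start]
        # Unplaced loci, most tightly linked to the starting locus first.
        remaining = sorted((x for x in loci if x != start),
                           key=lambda x: t(start, x))
        for locus in remaining:
            # Attach to whichever end of the current order is closer.
            if t(locus, order[0]) < t(locus, order[-1]):
                order.insert(0, locus)
            else:
                order.append(locus)
        candidates.append(order)
    return min(candidates, key=sarf)


result = seriate(["c", "a", "e", "b", "d"], theta)
print("-".join(result))
```

With clean data such as these, every starting locus recovers the true order or its mirror image; with noisy estimates the l candidate orders can differ, which is why the final comparison step is needed.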

27 Simulated Annealing Algorithm  Let the current state of the system be i with energy E_i.  Perturb the system to generate the next state j with energy E_j.  If E_i − E_j ≥ 0, then the new state j is accepted as being better.  If E_i − E_j < 0, then state j is accepted with probability given by

28 Simulated Annealing Algorithm (cont)  For a given temperature T, a thermal equilibrium is eventually obtained and the distribution of states is given by  Once thermal equilibrium is established, lower T and perturb state until a new thermal equilibrium is reached.
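The acceptance probability and the equilibrium distribution referred to on these two slides are usually written in the Metropolis and Boltzmann forms sketched here. The constant k_B (Boltzmann's constant, often absorbed into T in optimization applications) and the sum over all states k are assumptions of this sketch.

```latex
P(\text{accept } j \mid E_i - E_j < 0) = \exp\!\left(\frac{E_i - E_j}{k_B T}\right)

P(\text{state } i) = \frac{\exp\!\left(-E_i / (k_B T)\right)}{\sum_{k} \exp\!\left(-E_k / (k_B T)\right)}
```

At high T almost every perturbation is accepted; as T falls, uphill moves become increasingly unlikely and the system settles into low-energy states.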

29 Simulated Annealing for SARF  Choose a starting order of l loci.  Perturb the arrangement. Sample perturbations include selecting a random segment and inverting its order, or randomly replacing one locus with another.  Acceptance of new orders is based on the preceding equations, where E_i is the SARF for gene order i.  T is chosen to be larger than the largest changes of E_i in each stage. Cooling speed of 0.9 has proven useful in simulation.
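A minimal sketch of these steps follows. It is not the lecture's implementation: it assumes segment inversion as the only perturbation, interprets the cooling speed of 0.9 as multiplying T by 0.9 after each stage, absorbs any physical constant into T, and uses hypothetical map positions converted with the Haldane mapping function as test data. The starting temperature, the number of stages and steps, and the function name `anneal_sarf` are illustrative choices.

```python
import math
import random
from itertools import combinations

# Hypothetical map positions in Morgans; true order is a-b-c-d-e.
positions = {"a": 0.0, "b": 0.1, "c": 0.25, "d": 0.45, "e": 0.6}
theta = {frozenset((x, y)): 0.5 * (1 - math.exp(-2 * abs(positions[x] - positions[y])))
         for x, y in combinations(positions, 2)}  # Haldane mapping function


def anneal_sarf(loci, theta, t_start=2.0, cooling=0.9, stages=60, steps=100, seed=0):
    """Search for a minimum-SARF order; returns the best order seen and its SARF."""
    rng = random.Random(seed)

    def sarf(order):
        return sum(theta[frozenset(p)] for p in zip(order, order[1:]))

    current = list(loci)
    energy = sarf(current)
    best, best_energy = current[:], energy
    temp = t_start
    for _ in range(stages):
        for _ in range(steps):
            # Perturbation: invert a randomly chosen segment.
            i, j = sorted(rng.sample(range(len(current)), 2))
            candidate = current[:i] + current[i:j + 1][::-1] + current[j + 1:]
            cand_energy = sarf(candidate)
            delta = energy - cand_energy  # E_i - E_j in the slide's notation
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                current, energy = candidate, cand_energy
                if energy < best_energy:
                    best, best_energy = current[:], energy
        temp *= cooling  # lower T once the stage is finished
    return best, best_energy


loci = ["c", "a", "e", "b", "d"]
best, best_energy = anneal_sarf(loci, theta)
print("-".join(best), round(best_energy, 4))
```

Keeping track of the best order seen, rather than returning the final state, guarantees that the result is never worse than the starting order.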

30 Statistical Support for Locus Order  We know we can calculate the likelihood ratio when using likelihood-based methods. However, the likelihood ratio is difficult to interpret.  A bootstrap approach is also possible.

31 Bootstrap Confidence in Locus Order
          Locus 1   Locus 2   Locus 3   Locus 4   Locus 5
Gene 1    p_11      p_12      p_13      p_14      p_15
Gene 2    p_21      p_22      p_23      p_24      p_25
Gene 3    p_31      p_32      p_33      p_34      p_35
Gene 4    p_41      p_42      p_43      p_44      p_45
Gene 5    p_51      p_52      p_53      p_54      p_55

32 Confidence Interval for Gene Order
Locus Name   P_ii   95% Interval
A            1.0    1-1
B            0.98   2-2
C            0.99   3-3
D            0.53   4-5
E            0.53   4-5
F            0.98   6-6
G            0.83   7-8
H            0.81   9-12
...
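A table of position proportions like the ones on the two preceding slides can be produced along the following lines. This is a sketch under stated assumptions: gametes are resampled with replacement, each bootstrap sample is ordered by brute-force minimum SARF (practical only for a handful of loci), and mirror-image orders are merged by putting the alphabetically lower end locus first. The data are the backcross example; the function names, the number of replicates and the seed are hypothetical.

```python
import random
from itertools import combinations, permutations

# One string per gamete: uppercase and lowercase mark the two parental alleles.
gametes = ["ABC"] * 69 + ["ABc"] * 21 + ["AbC"] * 9 + ["Abc"] * 1
LOCI = "ABC"


def best_order(sample):
    """Minimum-SARF order for a sample, oriented so the lower end locus is first."""
    idx = {x: k for k, x in enumerate(LOCI)}
    theta = {}
    for x, y in combinations(LOCI, 2):
        rec = sum(g[idx[x]].isupper() != g[idx[y]].isupper() for g in sample)
        theta[frozenset((x, y))] = rec / len(sample)
    orders = [p for p in permutations(LOCI) if p[0] < p[-1]]
    return min(orders, key=lambda o: sum(theta[frozenset(q)] for q in zip(o, o[1:])))


def bootstrap_positions(gametes, replicates=200, seed=1):
    """Proportion of bootstrap replicates placing each locus at each position."""
    rng = random.Random(seed)
    tally = {x: [0] * len(LOCI) for x in LOCI}
    for _ in range(replicates):
        sample = rng.choices(gametes, k=len(gametes))  # resample with replacement
        for position, locus in enumerate(best_order(sample)):
            tally[locus][position] += 1
    return {x: [c / replicates for c in row] for x, row in tally.items()}


proportions = bootstrap_positions(gametes)
for locus, row in proportions.items():
    print(locus, row)
```

Each row corresponds to one locus and each column to a map position, so a locus whose proportion is concentrated in a single column has a well-supported position.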

33 Dropping Problematic Loci  Sometimes loci that are particularly difficult to order can affect the order of other loci, and removing them can improve confidence in the map. These tend to be tightly linked loci that would need a larger sample size to resolve.  A framework map is one where certain loci are dropped based on some criterion (e.g. log10(L_1/L_2) > 3).

34 Bootstrap Plus Jackknife  Drop each locus in turn and estimate PCO.  If PCO increases significantly when a locus is discarded, that locus is considered “bad”.  If PCO decreases significantly when a locus is discarded, then that locus is considered essential.

35 Summary  How to make linkage groups.  How to calculate the log likelihood for the three-locus model.  Determining order for the three-locus model.  Computational approaches to multilocus ordering.  Statistical support for locus order.

