Lecture 12: Linkage Analysis V Date: 10/03/02  Least squares  An EM algorithm  Simulated distribution  Marker coverage and density.

1 Lecture 12: Linkage Analysis V
Date: 10/03/02
• Least squares
• An EM algorithm
• Simulated distribution
• Marker coverage and density

2 True Multi-Locus Mapping
• True multi-locus mapping would use all the data to estimate both the order of the loci and the distances between them. BUT...
• Large number of unknown parameters.
• For $l$ loci there are $2^{l-1}$ gamete types, and the sample size is usually not large enough to populate all of these types.
• Computationally intensive, as there are $l!/2$ possible orders.
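To get a feel for how quickly these two counts grow, here is a small sketch. The function names are illustrative and not part of the lecture.

```python
from math import factorial

def gamete_types(l: int) -> int:
    """Number of gamete types for l loci, as stated on the slide: 2^(l-1)."""
    return 2 ** (l - 1)

def locus_orders(l: int) -> int:
    """Number of distinct locus orders: l!/2 (an order and its reverse are the same map)."""
    return factorial(l) // 2

for l in (3, 5, 10):
    print(l, gamete_types(l), locus_orders(l))
```

Already at 10 loci there are 512 gamete types and more than 1.8 million candidate orders.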

3 Least Squares Method
• $r_{ij}$ is the recombination fraction between loci $i$ and $j$.
• $M_{ij}$ is the map distance between loci $i$ and $j$.
• $s_{r_{ij}}$ is the standard deviation of $r_{ij}$.
• $m_i$ is the map distance between loci $i$ and $i+1$.
[Figure: loci 1 to 7 along a chromosome, with the adjacent recombination fractions $r_{12}, r_{23}, \ldots, r_{67}$ and the corresponding interval map distances $m_1, \ldots, m_6$.]

4 Least Squares Method (cont)

5 Least Squares: Haldane Map Function
• Recall the map function.
• Find the inverse map function $F(\cdot)$.
• Take the first derivative of $F(\cdot)$.
• Plug the first derivative into the approximate formula for $S_M$.

6 Least Squares: Kosambi Map Function
• Recall the inverse map function $F(\cdot)$.
• Take the first derivative of $F(\cdot)$.
• Plug the first derivative into the approximate formula for $S_M$.
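The steps on the Haldane and Kosambi slides can be sketched as follows. This assumes the standard inverse map functions (Haldane: $M = -\tfrac{1}{2}\ln(1-2r)$; Kosambi: $M = \tfrac{1}{4}\ln\frac{1+2r}{1-2r}$) and assumes that the "approximate formula" is the first-order approximation $S_M \approx |F'(r)|\, s_r$. The function names are illustrative.

```python
import math

def haldane_distance(r):
    """Haldane inverse map function: map distance (Morgans) from recombination fraction r."""
    return -0.5 * math.log(1.0 - 2.0 * r)

def haldane_sd(r, s_r):
    """First-order SD of the Haldane distance: F'(r) = 1 / (1 - 2r)."""
    return s_r / (1.0 - 2.0 * r)

def kosambi_distance(r):
    """Kosambi inverse map function."""
    return 0.25 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

def kosambi_sd(r, s_r):
    """First-order SD of the Kosambi distance: F'(r) = 1 / (1 - 4 r^2)."""
    return s_r / (1.0 - 4.0 * r * r)

# r = 0.10 with SD 0.03, the first marker pair in the lecture's data slide
r, s_r = 0.10, 0.03
print(haldane_distance(r), haldane_sd(r, s_r))
print(kosambi_distance(r), kosambi_sd(r, s_r))
```

For the Haldane case this gives roughly 0.11 (0.038), which matches the values listed for that marker pair on the data slide.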

7 Least Squares Method (cont)

8

9 Least Squares: Data
Markers   r (SD)        M, Haldane (SD)   Expected
1, 2      0.10 (0.03)   0.11 (0.038)      $m_1$
2, 3      0.15 (0.04)   0.18 (0.057)      $m_2$
1, 3      0.30 (0.13)   0.46 (0.325)      $m_1 + m_2$

10 Least Squares: Calculation
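One way to carry out this calculation on the slide 9 data is weighted least squares with inverse-variance weights $1/S_M^2$. That weighting is an assumption here (the slide's own equations are not in the transcript), and the helper name `weighted_least_squares_2` is illustrative. The sketch minimizes $\sum_k (M_k - x_k^\top m)^2 / S_k^2$ over $m = (m_1, m_2)$.

```python
def weighted_least_squares_2(X, y, sd):
    """Weighted least squares for two unknowns.

    X  : rows (x1, x2) saying which intervals make up each observed distance
    y  : observed map distances
    sd : their standard deviations (weights are 1 / sd^2, an assumption)
    Returns the estimates and their approximate standard errors.
    """
    w = [1.0 / s ** 2 for s in sd]
    # Normal equations A m = b with A = X^T W X and b = X^T W y
    a11 = sum(wk * x[0] * x[0] for wk, x in zip(w, X))
    a12 = sum(wk * x[0] * x[1] for wk, x in zip(w, X))
    a22 = sum(wk * x[1] * x[1] for wk, x in zip(w, X))
    b1 = sum(wk * x[0] * yk for wk, x, yk in zip(w, X, y))
    b2 = sum(wk * x[1] * yk for wk, x, yk in zip(w, X, y))
    det = a11 * a22 - a12 * a12
    m1 = (a22 * b1 - a12 * b2) / det
    m2 = (a11 * b2 - a12 * b1) / det
    # With inverse-variance weights, A^{-1} approximates the covariance of (m1, m2)
    se1 = (a22 / det) ** 0.5
    se2 = (a11 / det) ** 0.5
    return (m1, m2), (se1, se2)

# Slide 9 data: M_12 estimates m1, M_23 estimates m2, M_13 estimates m1 + m2
X = [(1, 0), (0, 1), (1, 1)]
y = [0.11, 0.18, 0.46]
sd = [0.038, 0.057, 0.325]
estimates, std_errors = weighted_least_squares_2(X, y, sd)
print(estimates, std_errors)
```

Because the 1, 3 distance has a much larger standard deviation than the other two, it barely moves the estimates away from the directly observed 0.11 and 0.18.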

11 Least Squares: Variance Estimation

12 Least Squares: Variance Calculation

13 Why is this Least Squares?

14 Alternative Weighting
• Use the LOD score for linkage as the weight. Then the equation becomes:

15 EM Algorithm (Lander-Green)
• Make an initial guess $\theta^{0} = (\theta_1, \theta_2, \ldots, \theta_{l-1})$.
• E step: compute the expected number of recombinants for each interval, assuming the current $\theta^{\text{old}}$.
• M step: treating the expected values as true, compute the maximum likelihood estimate $\theta^{\text{new}}$.
• Iterate the E and M steps until the likelihood converges.

16 EM Algorithm
                                  AB           BC           AC
True recombination fraction       $\theta_1$   $\theta_2$
True number of recombinants       $t_1$        $t_2$
Total observed gametes            $N_{12}$     $N_{23}$     $N_{13}$
Number of observed recombinants   $R_{12}$     $R_{23}$     $R_{13}$

17 EM Algorithm: E Step
• $t_1 = R_{12} + P(\text{rec. in AB} \mid \text{rec. in AC})\,R_{13} + P(\text{rec. in AB} \mid \text{no rec. in AC})\,(N_{13} - R_{13})$
• $t_2 = R_{23} + P(\text{rec. in BC} \mid \text{rec. in AC})\,R_{13} + P(\text{rec. in BC} \mid \text{no rec. in AC})\,(N_{13} - R_{13})$

18 EM Algorithm: E Step (cont)

19 EM Algorithm: M Step
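A minimal sketch of the three-locus EM iteration follows. It assumes no interference, so that the conditional probabilities in the E step follow from independent recombination in AB and BC. It also assumes the M step divides each expected count by the number of gametes informative for that interval ($N_{12} + N_{13}$ and $N_{23} + N_{13}$). Both assumptions, the name `em_three_loci`, and the counts in the usage line are illustrative rather than taken from the lecture. The loop stops when the parameters stop changing, as a simple stand-in for monitoring the likelihood.

```python
def em_three_loci(R12, N12, R23, N23, R13, N13,
                  theta=(0.1, 0.1), tol=1e-10, max_iter=1000):
    """EM estimates of (theta1, theta2) for loci A-B-C under no interference."""
    th1, th2 = theta
    for _ in range(max_iter):
        # E step: split the AC gametes into expected AB and BC recombinants
        rec_ac = th1 * (1 - th2) + th2 * (1 - th1)   # P(rec. in AC)
        p_ab_given_rec = th1 * (1 - th2) / rec_ac
        p_bc_given_rec = th2 * (1 - th1) / rec_ac
        # No rec. in AC means either no crossover or a double crossover;
        # a double crossover recombines both AB and BC
        p_double_given_norec = th1 * th2 / (1 - rec_ac)
        t1 = R12 + p_ab_given_rec * R13 + p_double_given_norec * (N13 - R13)
        t2 = R23 + p_bc_given_rec * R13 + p_double_given_norec * (N13 - R13)
        # M step: expected recombinants over informative gametes
        new1 = t1 / (N12 + N13)
        new2 = t2 / (N23 + N13)
        converged = abs(new1 - th1) + abs(new2 - th2) < tol
        th1, th2 = new1, new2
        if converged:
            break
    return th1, th2

# Hypothetical counts, for illustration only
print(em_three_loci(R12=100, N12=1000, R23=200, N23=1000, R13=260, N13=1000))
```

The sketch does not guard against estimates reaching exactly 0, where the conditional probabilities are undefined.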

20 Simulation
• Find a map function that fits the data well by comparing the likelihoods of the data under the candidate map functions.
• The distribution of the likelihood difference is unknown, so simulation is needed to obtain it empirically.

21 Simulation: Evidence for Interference
• Recall that, given the pairwise recombination fractions $\theta_{ij}$ and a map function, you know how to find the gametic frequencies.
• Then the log likelihood is given by ($m = 2^{l-1}$)

22 Simulation: Implementation
• To simulate under the null hypothesis of no interference, we take the recombination fractions between neighboring loci as given and simulate gametes under the assumption of no interference.
[Figure: the four gamete types 00, 10, 01, 11.]
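Simulating under no interference can be sketched as below: each interval recombines independently with its own recombination fraction. The representation (one 0/1 recombination indicator per interval) and the names `simulate_gametes` and `pairwise_rf` are assumptions made for illustration.

```python
import random

def simulate_gametes(thetas, n, seed=None):
    """Simulate n gametes; each is a tuple of 0/1 recombination indicators,
    one per interval, drawn independently (no interference)."""
    rng = random.Random(seed)
    return [tuple(int(rng.random() < th) for th in thetas) for _ in range(n)]

def pairwise_rf(gametes, i, j):
    """Observed recombination fraction between loci i and j (0-based, i < j):
    the loci are recombinant when an odd number of intervals between them recombined."""
    recombinants = sum(sum(g[i:j]) % 2 for g in gametes)
    return recombinants / len(gametes)

# Two intervals with hypothetical recombination fractions 0.1 and 0.2
gametes = simulate_gametes([0.1, 0.2], n=20000, seed=1)
print(pairwise_rf(gametes, 0, 1), pairwise_rf(gametes, 1, 2), pairwise_rf(gametes, 0, 2))
```

Under no interference, the flanking recombination fraction should be close to $\theta_1(1-\theta_2) + \theta_2(1-\theta_1) = 0.26$ for these values. Repeating the simulation many times and recomputing the likelihood difference each time would give the empirical null distribution.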

23 Marker Coverage and Map Density
• Marker coverage is the proportion of the genome covered by markers: the genomic map length divided by the total genome length.
• Map density is indicated by the spacing of adjacent markers, measured as the average or maximum map distance between two adjacent markers.
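As a small illustration of these two definitions, the sketch below computes coverage and both density measures from marker positions grouped by linkage group. Treating the map length as the sum of the linkage-group spans is an assumption, and the positions are hypothetical.

```python
def coverage_and_density(linkage_groups, genome_length):
    """Return (coverage, average adjacent gap, maximum adjacent gap).

    linkage_groups : list of lists of marker positions (same unit as genome_length)
    """
    map_length = 0.0
    gaps = []
    for positions in linkage_groups:
        ordered = sorted(positions)
        map_length += ordered[-1] - ordered[0]          # span covered by this group
        gaps.extend(b - a for a, b in zip(ordered, ordered[1:]))
    coverage = map_length / genome_length
    return coverage, sum(gaps) / len(gaps), max(gaps)

# Hypothetical example: two linkage groups on a 100 cM genome
print(coverage_and_density([[0, 10, 30], [0, 20]], genome_length=100))
```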

24 Random Distribution of Markers
• Markers are generally assumed to be distributed randomly throughout the genome.
• Nonrandom distribution will generally decrease coverage and lower density.
• Unfortunately, markers may be non-randomly distributed. Name some reasons.

25 Mapping Population
• Even if you have many markers, a small sample may give insufficient information to achieve high coverage and density.
• Unattached genome segments are the most common coverage problem.
• Solutions: increase the sample size or use a mapping population with more information (greater polymorphism).

26 Data Analysis and Models
• A wrong gene order can lead to an overestimated map length, thus overestimating map coverage and underestimating density.
• The wrong mapping function may convert recombination fractions into the wrong map distances, causing over- or underestimation.
• Different grouping criteria can lead to different linkage groups. The more stringent the criterion, the more linkage groups, the lower the coverage, and the higher the density.

27 Prediction of Marker Coverage and Density
• A method for predicting marker coverage and density is based on the assumption of random marker distribution: the confidence probability $P$ is the probability that at least one marker is located in a genome segment of length $2d$ M.

28 Calculations
• Suppose the total genome length is $L$.
• P(a marker does not fall on the $2d$ segment) $= 1 - 2d/L$.
• P($n$ markers all miss the $2d$ segment) $= (1 - 2d/L)^n$.

29 Calculations
• P(at least one marker on the $2d$ segment) $= 1 - (1 - 2d/L)^n$

30 Calculations
• When $2d/L < 0.1$, then
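A standard small-$x$ approximation fits at this point. It is offered as a reconstruction, on the assumption that the slide continues this way, using $\ln(1 - x) \approx -x$ with $x = 2d/L$:

```latex
\begin{align*}
(1 - 2d/L)^n &= \exp\{\, n \ln(1 - 2d/L) \,\} \approx \exp(-2dn/L), \\
P = 1 - (1 - 2d/L)^n &\approx 1 - e^{-2dn/L}.
\end{align*}
```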

31 Predicted Number of Markers Needed
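Solving $P = 1 - (1 - 2d/L)^n$ for $n$ gives the number of markers needed for a target confidence probability. The sketch below does this exactly and also with the small-$2d/L$ approximation $n \approx -L \ln(1 - P) / (2d)$. The function names and the example genome length are hypothetical.

```python
import math

def coverage_probability(n, d, L):
    """P(at least one of n random markers falls on a given 2d segment of a genome of length L)."""
    return 1.0 - (1.0 - 2.0 * d / L) ** n

def markers_needed(P, d, L):
    """Smallest n with coverage_probability(n, d, L) >= P."""
    return math.ceil(math.log(1.0 - P) / math.log(1.0 - 2.0 * d / L))

def markers_needed_approx(P, d, L):
    """Same quantity using ln(1 - x) ~ -x, reasonable when 2d/L is small."""
    return math.ceil(-L * math.log(1.0 - P) / (2.0 * d))

# Hypothetical genome of 20 M, d = 0.1 M (so 2d/L = 0.01), target P = 0.95
print(markers_needed(0.95, 0.1, 20.0), markers_needed_approx(0.95, 0.1, 20.0))
```

With these inputs the exact and approximate answers differ by a single marker, as expected when $2d/L$ is well below 0.1.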

32 Prediction when Genome Length Unknown
• Use all (500) markers to estimate a genetic map and assume the genome length is the length of this map, say $L_{500}$.
• Randomly draw 100 markers from the dataset with replacement. Estimate the genome length from these 100 markers only, say $L_{100}$.
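The resampling idea can be sketched as below. Two things here are assumptions. First, `map_length` is a crude stand-in (the span of known marker positions) for re-estimating a genetic map from the drawn markers. Second, the ratio $L_{100}/L_{500}$ is taken as the coverage estimate for 100 markers, which the slide does not state. The marker positions are simulated, not real data.

```python
import random

def map_length(positions):
    """Stand-in for map estimation: the span of the marker positions."""
    return max(positions) - min(positions)

def resampled_coverage(positions, n_draw=100, n_rep=1000, seed=None):
    """Repeatedly draw n_draw markers with replacement and return the
    ratios L_draw / L_all, one per replicate."""
    rng = random.Random(seed)
    full_length = map_length(positions)            # plays the role of L_500
    ratios = []
    for _ in range(n_rep):
        sample = rng.choices(positions, k=n_draw)  # sampling with replacement
        ratios.append(map_length(sample) / full_length)
    return ratios

# Hypothetical data: 500 markers placed uniformly on a 2000 cM genome
data_rng = random.Random(0)
markers = [data_rng.uniform(0, 2000) for _ in range(500)]
ratios = resampled_coverage(markers, seed=1)
print(sum(ratios) / len(ratios))
```

The spread of `ratios` across replicates gives an empirical picture of how much coverage to expect from 100 markers.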

33 Advantages of the Simulation Approach
• No assumptions on marker distribution are needed.
• No prior information about the actual genome length is needed.
• The approach can be used to test other factors that might affect marker coverage, as long as those factors can be resampled.

34 Summary
• Least squares method for building genetic maps.
• EM algorithm method for building genetic maps.
• Simulated likelihood ratio statistic distribution for hypothesis tests.
• Predicting marker coverage and density.

