David M. Evans Multivariate QTL Linkage Analysis Queensland Institute of Medical Research Brisbane Australia Twin Workshop Boulder 2003.


1 David M. Evans Multivariate QTL Linkage Analysis Queensland Institute of Medical Research Brisbane Australia Twin Workshop Boulder 2003

2

3

4

5 Introduction
How best to analyze data from n correlated variables?
(1) Run n univariate tests of linkage?
-Increased type I error
-How to interpret the results?
-Doesn't take advantage of the multivariate structure of the data
(2) Multivariate analysis
-Increased power vs. the univariate case
-Type I error controlled
-Advantages in interpretation:
  -See the effect of the QTL on each variable in the context of all the other measures
  -Less prone to stochastic variation?

6 Methods
(1) Linear composite techniques (e.g. Amos et al., 1990)
(2) Factor score method (Boomsma, 1996)
(3) Fitting the full multivariate model to the data (Eaves, Neale & Maes, 1996; Martin, Boomsma & Machin, 1997)

7 (1) Linear Composites
The central idea is to create a linear composite of the multivariate phenotypes, and then to perform the linkage analysis on this composite.
-e.g. Amos et al. (1990) suggested an extension to Haseman-Elston regression:
\[ (Y_{i1} - Y_{i2})^2 = \beta_0 + \beta_1 \hat{\pi}_i + e_i \]
i.e. estimate the linear composite of pair differences in trait measurements whose square has the strongest correlation with the proportion of alleles shared IBD at the marker locus:
\[ \left[ \alpha_1 (y_{11} - y_{12}) + \alpha_2 (y_{21} - y_{22}) + \dots + \alpha_k (y_{k1} - y_{k2}) \right]^2 = \beta_0 + \beta_1 \hat{\pi}_i + e_i \]
with the constraint \( \sum_{j=1}^{k} \alpha_j = 1 \).
-e.g. Marlow et al. (2003): perform a principal components analysis, then analyze the first principal component as in a univariate analysis.
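The regression step on the slide above can be sketched in a few lines. This is a minimal illustration, not the Amos et al. procedure itself: the function names `haseman_elston` and `composite_sq_diffs` are hypothetical, the weights α are taken as given rather than estimated, and the toy numbers are invented purely to show the mechanics.

```python
def composite_sq_diffs(diffs_by_trait, alphas):
    """Squared linear composite of sib-pair differences.

    diffs_by_trait[j][i] is (y_j1 - y_j2) for trait j in pair i.
    alphas are fixed weights assumed to sum to 1 (assumption: the
    published method estimates them; here they are supplied).
    """
    n_pairs = len(diffs_by_trait[0])
    return [
        sum(a * d[i] for a, d in zip(alphas, diffs_by_trait)) ** 2
        for i in range(n_pairs)
    ]


def haseman_elston(sq_diffs, pihat):
    """Ordinary least squares of squared pair differences on pi-hat.

    Returns (beta0, beta1). A negative beta1 is the pattern expected
    under linkage: pairs sharing more alleles IBD are more alike.
    """
    n = len(sq_diffs)
    mean_x = sum(pihat) / n
    mean_y = sum(sq_diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in pihat)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pihat, sq_diffs))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


# Toy data (invented): three pairs sharing 0, 1 and 2 alleles IBD.
pihat = [0.0, 0.5, 1.0]
sq = [4.0, 3.0, 2.0]
print(haseman_elston(sq, pihat))
```

In practice the regression would be run over many pairs, and the composite weights would be chosen to maximize the association with π̂ rather than fixed in advance.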

8 (1) Linear Composites
Problems:
-The Amos et al. (1990) method has all the problems associated with Haseman-Elston regression
-Generalization to complex pedigrees?
-Power of these approaches relative to fitting the full multivariate model??

9 (2) Factor score method (Boomsma et al., 1996)
Calculate factor scores on a pleiotropic genetic factor and then perform a linkage analysis on these genetic factor scores.
\[ V_{ip} = \lambda_{pg} G_i + \lambda_{pe} E_i + \varepsilon_{ip} \]
p = 4 variables, m = 2 pleiotropic factors
[Path diagram: for each twin i, a genetic factor G_i and an environmental factor E_i load on the four variables V_i1 to V_i4 (genetic loadings λ_1g to λ_4g); each variable has a residual ε_ip; the genetic factors of the two twins are correlated 0.5.]

10 (1) Fit a common factor model to the data and estimate the factor loadings:
\[ \Sigma = \Lambda \Psi \Lambda' + \Theta \]
-Σ is the 2p x 2p expected covariance matrix
-Λ is the 2p x 2m matrix of factor loadings
-Θ is the estimated 2p x 2p matrix of unique variances
-Ψ is the 2m x 2m matrix of factor correlations, specified a priori
p = 4 variables, m = 2 pleiotropic factors
[Path diagram: the twin factor model, with a genetic and an environmental factor per twin loading on V_i1 to V_i4, residuals ε_ip, and the genetic factors of the two twins correlated 0.5.]

11 (2) Calculate a weight matrix A (Thurstone):
\[ A = \Psi \Lambda' \Sigma^{-1} = \Psi \Lambda' (\Lambda \Psi \Lambda' + \Theta)^{-1} \]
The weight matrix is obtained by minimizing the sum of squared differences between estimated and true factor scores. This is equivalent to finding the linear regression of factor scores on phenotypes.
(3) Estimate the genetic factor scores by premultiplying the matrix of multivariate phenotypes P by the weight matrix:
\[ f = A'P \]
-Σ is the 2p x 2p expected covariance matrix
-Λ is the 2p x 2m matrix of factor loadings
-Θ is the estimated 2p x 2p matrix of unique variances
-Ψ is the 2m x 2m matrix of factor correlations
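A small numerical sketch of steps (2) and (3) may help. It assumes NumPy, and every loading and variance below is an invented illustrative value. One further assumption: the weight matrix is stored with shape 2m x 2p, exactly as Ψ Λ' Σ⁻¹ produces it, so the scores are computed as the conformable product `A @ P`; the slide writes this step as A'P, which would correspond to storing the weights transposed.

```python
import numpy as np


def factor_score_weights(Lambda, Psi, Theta):
    """Regression (Thurstone) weights A = Psi Lambda' Sigma^{-1}.

    Returns (A, Sigma) where Sigma = Lambda Psi Lambda' + Theta.
    """
    Sigma = Lambda @ Psi @ Lambda.T + Theta
    # Solve Sigma X = Lambda Psi, then transpose; this equals
    # Psi Lambda' Sigma^{-1} because Psi and Sigma are symmetric.
    A = np.linalg.solve(Sigma, Lambda @ Psi).T
    return A, Sigma


# Hypothetical loadings for p = 4 variables on m = 2 factors (G, E).
lam = np.array([[0.6, 0.5],
                [0.7, 0.4],
                [0.5, 0.6],
                [0.8, 0.3]])
Lambda = np.kron(np.eye(2), lam)           # 2p x 2m, columns G1, E1, G2, E2
Psi = np.array([[1.0, 0.0, 0.5, 0.0],      # G1-G2 correlation fixed at 0.5
                [0.0, 1.0, 0.0, 0.0],
                [0.5, 0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0, 1.0]])
unique = 1.0 - (lam ** 2).sum(axis=1)      # unique variances, unit total variance
Theta = np.diag(np.tile(unique, 2))

A, Sigma = factor_score_weights(Lambda, Psi, Theta)

# Factor scores for one twin pair with a made-up 2p-vector of phenotypes.
P = np.array([0.3, -0.1, 0.5, 0.2, -0.4, 0.0, 0.1, -0.2])
scores = A @ P                             # order: G1, E1, G2, E2
print(scores)
```

The rows of `scores` corresponding to G1 and G2 would then be carried forward as the phenotype in a univariate linkage analysis.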

12 Advantages:
-Partitions out environmental and background genetic noise
-Selective genotyping of subjects
Disadvantage:
-The weight matrix used to calculate the factor scores is estimated from the same sample as the factor scores themselves, i.e. the information used to calculate the weight matrix and the information used to test for linkage are not independent

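13 (3) Fitting the Full Multivariate Model
-The QTL is parameterized as a latent factor which pleiotropically affects the phenotypes of interest
-The correlation between the QTL factors of the two twins is set to \( \hat{\pi} \)
-NB. The QTL within-twin cross-trait covariance is the square root of the product of the QTL variances
[Path diagram: one latent QTL factor per twin loads on that twin's three phenotypes (P_11 to P_13 for twin one, P_21 to P_23 for twin two) through paths q_1, q_2, q_3; the two QTL factors are correlated \( \hat{\pi} \).]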

14 [Path diagram: the QTL model of the previous slide with residual additive genetic factors (A_11 to A_13, A_21 to A_23; paths a_11 to a_33) and unique environmental factors (E_11 to E_13, E_21 to E_23; paths e_11 to e_33) added for each twin; the additive genetic factors of the two twins are correlated 0.5 and the QTL factors \( \hat{\pi} \).]

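15
\[ \Sigma_{(6 \times 6)} = \begin{bmatrix} AA' + EE' + QQ' & (0.5 \times AA') + \hat{\pi}_i QQ' \\ (0.5 \times AA') + \hat{\pi}_i QQ' & AA' + EE' + QQ' \end{bmatrix} \]
\[ Q' = [q_1, q_2, q_3], \quad A = \begin{bmatrix} a_{11} & 0 & 0 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & a_{33} \end{bmatrix}, \quad E = \begin{bmatrix} e_{11} & 0 & 0 \\ e_{21} & e_{22} & 0 \\ e_{31} & e_{32} & e_{33} \end{bmatrix} \]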
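The block structure of this expected covariance matrix can be assembled directly. The following is a sketch assuming NumPy; the function name `expected_covariance` and all path values are hypothetical, chosen only to show how the pieces fit together for three traits measured on a sib pair.

```python
import numpy as np


def expected_covariance(A, E, Q, pihat, r_a=0.5):
    """Expected 2k x 2k covariance matrix for one sib pair.

    A, E  : k x k lower-triangular path matrices
    Q     : length-k vector of QTL paths
    pihat : estimated proportion of alleles shared IBD for this pair
    r_a   : correlation of residual additive genetic factors (0.5 for sibs)
    """
    Q = np.asarray(Q, dtype=float).reshape(-1, 1)
    AA, EE, QQ = A @ A.T, E @ E.T, Q @ Q.T
    within = AA + EE + QQ                  # diagonal blocks
    between = r_a * AA + pihat * QQ        # off-diagonal blocks
    return np.block([[within, between],
                     [between, within]])


# Illustrative (invented) parameter values for k = 3 traits.
A = np.tril([[0.6, 0.0, 0.0],
             [0.3, 0.5, 0.0],
             [0.2, 0.2, 0.4]])
E = np.tril([[0.5, 0.0, 0.0],
             [0.1, 0.6, 0.0],
             [0.1, 0.1, 0.7]])
Q = [0.4, 0.3, 0.2]

Sigma = expected_covariance(A, E, Q, pihat=0.5)
print(Sigma.shape)
```

Only the off-diagonal blocks change with π̂, so pairs with different IBD sharing differ in their expected cross-sib covariances by a multiple of QQ'.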

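16 [Path diagram: longitudinal model for phenotypes P_1 to P_3 with additive genetic (A_1 to A_3), common environmental (C_1 to C_3) and unique environmental (E_1 to E_3) factors at each occasion; factor loadings λ, transmission paths β, innovations ζ and measurement errors ε_1 to ε_3; a QTL factor Q_1 loads on the three phenotypes through q_1, q_2, q_3.]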

17 "Pi hat" method
-The likelihood for each pedigree (i) is calculated as:
\[ L(\theta) = (2\pi)^{-k} |\Sigma_i|^{-1/2} \exp\left[ -\tfrac{1}{2} (y_i - \mu)' \Sigma_i^{-1} (y_i - \mu) \right] \]
-where
\[ \Sigma_i = \begin{bmatrix} AA' + EE' + QQ' & (0.5 \times AA') + \hat{\pi}_i QQ' \\ (0.5 \times AA') + \hat{\pi}_i QQ' & AA' + EE' + QQ' \end{bmatrix} \]
-Easy to specify, especially in large pedigrees, but...
-Computationally intensive
-Bias in selected samples

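18 "Full IBD Method"
-The likelihood for each pedigree (i) is calculated as:
\[ \begin{aligned} L(\theta) = {} & P(\mathrm{IBD} = 0)\,(2\pi)^{-k} |\Sigma_0|^{-1/2} \exp\left[ -\tfrac{1}{2} (y_i - \mu)' \Sigma_0^{-1} (y_i - \mu) \right] \\ {} + {} & P(\mathrm{IBD} = 1)\,(2\pi)^{-k} |\Sigma_1|^{-1/2} \exp\left[ -\tfrac{1}{2} (y_i - \mu)' \Sigma_1^{-1} (y_i - \mu) \right] \\ {} + {} & P(\mathrm{IBD} = 2)\,(2\pi)^{-k} |\Sigma_2|^{-1/2} \exp\left[ -\tfrac{1}{2} (y_i - \mu)' \Sigma_2^{-1} (y_i - \mu) \right] \end{aligned} \]
-where
\[ \Sigma_0 = \begin{bmatrix} AA' + EE' + QQ' & (0.5 \times AA') \\ (0.5 \times AA') & AA' + EE' + QQ' \end{bmatrix} \]
\[ \Sigma_1 = \begin{bmatrix} AA' + EE' + QQ' & (0.5 \times AA') + 0.5\,QQ' \\ (0.5 \times AA') + 0.5\,QQ' & AA' + EE' + QQ' \end{bmatrix} \]
\[ \Sigma_2 = \begin{bmatrix} AA' + EE' + QQ' & (0.5 \times AA') + QQ' \\ (0.5 \times AA') + QQ' & AA' + EE' + QQ' \end{bmatrix} \]
-Computationally efficient
-Difficult to specify in large sibships/pedigrees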
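The mixture likelihood above can be evaluated on the log scale, which avoids underflow when contributions are multiplied across many pedigrees. This sketch assumes NumPy; `mvn_logpdf` and `full_ibd_loglik` are hypothetical helper names, and the example uses a single trait (k = 1) with invented variance components so that the three covariance matrices are easy to read.

```python
import numpy as np


def mvn_logpdf(y, mu, Sigma):
    """Log density of a multivariate normal evaluated at y."""
    d = y - mu
    n = len(y)
    _, logdet = np.linalg.slogdet(Sigma)
    quad = d @ np.linalg.solve(Sigma, d)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)


def full_ibd_loglik(y, mu, sigmas, ibd_probs):
    """log of sum_j P(IBD = j) * N(y; mu, Sigma_j), via log-sum-exp.

    sigmas    : covariance matrices for IBD = 0, 1, 2
    ibd_probs : the pair's IBD probabilities at the test position
    """
    terms = np.array([
        np.log(p) + mvn_logpdf(y, mu, S)
        for p, S in zip(ibd_probs, sigmas)
        if p > 0.0                         # skip impossible IBD states
    ])
    m = terms.max()
    return m + np.log(np.exp(terms - m).sum())


# Invented univariate example: a^2 = 0.4, q^2 = 0.3, e^2 = 0.3.
a2, q2 = 0.4, 0.3
sigmas = [
    np.array([[1.0, 0.5 * a2 + share * q2],
              [0.5 * a2 + share * q2, 1.0]])
    for share in (0.0, 0.5, 1.0)           # IBD = 0, 1, 2
]
y = np.array([0.8, 0.6])
mu = np.zeros(2)
print(full_ibd_loglik(y, mu, sigmas, ibd_probs=[0.25, 0.5, 0.25]))
```

Summing this quantity over pedigrees gives the sample log-likelihood that would be maximized over the model parameters.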

19 Likelihood Ratio Test
Under standard conditions, twice the difference in natural log-likelihood between models is distributed asymptotically as a χ² distribution with degrees of freedom equal to the difference in the number of parameters between the models.
BUT in linkage analysis, the likelihood ratio test is conducted under non-standard conditions. That is, the true values of some of the parameters under the null hypothesis (i.e. σ_q² = 0) are located on the boundary of the parameter space defined by the alternative hypothesis.
Under these conditions, the likelihood ratio statistic is distributed as a mixture of χ² distributions, with the mixing proportions determined by the geometry of the parameter space.
For example, in the case of a univariate VC linkage analysis, the test statistic is asymptotically distributed as a 50:50 mixture of χ² with 1 degree of freedom and a point mass at zero (Self & Liang, 1987).

20 Likelihood Ratio Test
In multivariate tests of linkage, the situation is even more complicated, and determining the asymptotic distribution of the test statistic is difficult.
For example, in the case of a bivariate test for linkage, Amos et al. (2002) suggest that the test statistic is distributed asymptotically as the mixture
\[ \tfrac{1}{4}\chi^2_0 : \tfrac{1}{2}\chi^2_1 : \tfrac{1}{4}\chi^2_2 \]
Therefore evaluating the test against χ² with 2 degrees of freedom will result in a conservative test, although as the number of variables increases this difference may become small (Marlow et al., 2003).
Perhaps the best strategy at the present time is to evaluate significance using empirically derived significance values.
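To make the effect of the mixture concrete, the p-values can be computed with closed-form χ² tail probabilities for 0, 1 and 2 degrees of freedom. This is a sketch using only the standard library; `chi2_sf` and `mixture_pvalue` are hypothetical helper names, and the mixing weights are simply those quoted on the slides.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of chi-square for df in {0, 1, 2}."""
    x = max(x, 0.0)
    if df == 0:
        return 1.0 if x == 0.0 else 0.0    # point mass at zero
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 0, 1, 2 are implemented in this sketch")


def mixture_pvalue(stat, weights):
    """P-value under a mixture of chi-squares; weights maps df -> proportion."""
    return sum(w * chi2_sf(stat, df) for df, w in weights.items())


UNIVARIATE = {0: 0.5, 1: 0.5}              # Self & Liang (1987)
BIVARIATE = {0: 0.25, 1: 0.5, 2: 0.25}     # as suggested by Amos et al. (2002)

stat = 5.99                                # example likelihood ratio statistic
print(mixture_pvalue(stat, BIVARIATE), chi2_sf(stat, 2))
```

For any positive statistic the bivariate mixture p-value is smaller than the plain 2 df p-value, which is the sense in which referring the statistic to χ² with 2 degrees of freedom is conservative.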

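21 Why is power increased?
[Path diagram, univariate case: in each sib a QTL factor (Q_1 in sib one, Q_2 in sib two) loads on the trait V through path q_1; the two QTL factors are correlated \( \hat{\pi} \).]
QTL contribution to the sib-pair covariance matrix:
\[ \Sigma = \begin{bmatrix} q_1^2 & \hat{\pi} q_1^2 \\ \hat{\pi} q_1^2 & q_1^2 \end{bmatrix} \]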

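22 Why is power increased?
[Path diagram, bivariate case: in each sib a QTL factor (Q_1 in sib one, Q_2 in sib two) loads on two traits (V_11, V_12 in sib one; V_21, V_22 in sib two) through paths q_1 and q_2; the two QTL factors are correlated \( \hat{\pi} \).]
QTL contribution to the sib-pair covariance matrix:
\[ \Sigma = \begin{bmatrix} q_1^2 & q_1 q_2 & \hat{\pi} q_1^2 & \hat{\pi} q_1 q_2 \\ q_1 q_2 & q_2^2 & \hat{\pi} q_1 q_2 & \hat{\pi} q_2^2 \\ \hat{\pi} q_1^2 & \hat{\pi} q_1 q_2 & q_1^2 & q_1 q_2 \\ \hat{\pi} q_1 q_2 & \hat{\pi} q_2^2 & q_1 q_2 & q_2^2 \end{bmatrix} \]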
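The two matrices on these slides share one structure: a 2 x 2 sib-sharing matrix combined with the outer product of the QTL paths. The sketch below, assuming NumPy and an invented helper name `qtl_covariance`, builds both cases with a Kronecker product and counts how many distinct elements carry π̂ in each.

```python
import numpy as np


def qtl_covariance(q, pihat):
    """QTL part of the sib-pair covariance: [[1, pihat], [pihat, 1]] (x) qq'."""
    q = np.asarray(q, dtype=float).reshape(-1, 1)
    sib = np.array([[1.0, pihat],
                    [pihat, 1.0]])
    return np.kron(sib, q @ q.T)


def n_linkage_elements(k):
    """Distinct cross-sib elements that depend on pihat for k traits."""
    return k * (k + 1) // 2


# Invented path values for illustration.
uni = qtl_covariance([0.5], pihat=0.5)         # 2 x 2, univariate
biv = qtl_covariance([0.5, 0.4], pihat=0.5)    # 4 x 4, bivariate
print(uni)
print(biv)
print(n_linkage_elements(1), n_linkage_elements(2))
```

In the univariate case a single element depends on π̂; in the bivariate case there are three, including the cross-trait cross-sib term π̂ q_1 q_2, so the same two QTL parameters are informed by more of the data.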

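23 Under what Conditions is Power Greatest?
-q_1 and q_2 are large
-s_1 and s_2 are large
-QTL and residual sources of covariation operate in opposite directions
[Path diagram: in each sibling, two variables are influenced by a QTL factor Q (paths q_1, q_2), residual factors S_1 and S_2 (paths s_1, s_2) and unique environmental factors E_1 and E_2 (paths e_1, e_2); the QTL factors of sibling one and sibling two are correlated \( \hat{\pi} \); further correlations in the diagram are labelled α and β.]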

24 Power depends on direction and source of residual phenotypic correlation (q 1 = q 2 = 20%; s 1 = s 2 = 40%; e 1 = e 2 = 40%)

25 Implications
-One is most likely to detect a QTL that induces a correlation between variables in the opposite direction to the residual correlation => variables with low phenotypic correlations
-Therefore a simple inspection of correlation matrices may not reveal which variables would be best to combine in a multivariate analysis. Perhaps the decision is best guided by information on the biological system being considered.
-Can this fact be taken advantage of experimentally?

26 Example: CD4/CD8 ratio
Project:
-Australian monozygotic and dizygotic twins bled at twelve, fourteen and sixteen years of age
-Measured longitudinally on a variety of hematological and immunological indices, including CD4/CD8 ratio, which is a measure of immune function
Significance:
-CD4/CD8 ratio is depressed in a variety of conditions including AIDS, graft-versus-host disease, and some viral infections
-CD4/CD8 ratio predicts the course of HIV infection
-Localising a QTL for CD4/CD8 ratio is of therapeutic significance

27

28 Example: Platelet count

29

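30 [Path diagram: longitudinal model for phenotypes P_1 to P_3 with additive genetic (A_1 to A_3), dominance (D_1 to D_3) and unique environmental (E_1 to E_3) factors at each occasion; transmission paths β, innovations ζ and measurement errors ε_1 to ε_3; all factor loadings fixed to 1; a QTL factor Q_1 loads on the three phenotypes through q_1, q_2, q_3.]
-ζ: variances of the innovations are estimated
-λ: factor loadings are constrained to unity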

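31 The residual structures can be expressed compactly in matrix algebra form, e.g.:
\[ A = (I - B)^{-1} \, \Psi \, \left[ (I - B)^{-1} \right]' + \Theta_{\varepsilon} \]
-I is an identity matrix
-B is the matrix of transmission coefficients
-Ψ is the matrix of innovation variances
\[ \Psi = \begin{bmatrix} \mathrm{var}(\zeta_{1a}) & 0 & 0 \\ 0 & \mathrm{var}(\zeta_{2a}) & 0 \\ 0 & 0 & \mathrm{var}(\zeta_{3a}) \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 0 & 0 \\ \beta_2 & 0 & 0 \\ 0 & \beta_3 & 0 \end{bmatrix} \]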
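The matrix expression above is straightforward to evaluate numerically. The sketch below assumes NumPy; the function name `transmission_covariance` and the parameter values are invented for illustration, and the error term Θ_ε is taken to be diagonal.

```python
import numpy as np


def transmission_covariance(betas, innovation_vars, error_vars):
    """Evaluate (I - B)^{-1} Psi (I - B)^{-1}' + Theta for T occasions.

    betas           : T - 1 transmission coefficients (beta_2, ..., beta_T)
    innovation_vars : T innovation variances (diagonal of Psi)
    error_vars      : T error variances (diagonal of Theta; assumed diagonal)
    """
    t = len(innovation_vars)
    B = np.zeros((t, t))
    for k, b in enumerate(betas, start=1):
        B[k, k - 1] = b                    # occasion k regressed on occasion k - 1
    Psi = np.diag(innovation_vars)
    inv = np.linalg.inv(np.eye(t) - B)
    return inv @ Psi @ inv.T + np.diag(error_vars)


# Invented values for three occasions (e.g. ages 12, 14 and 16).
cov_a = transmission_covariance(betas=[0.9, 0.8],
                                innovation_vars=[1.0, 0.3, 0.2],
                                error_vars=[0.0, 0.0, 0.0])
print(cov_a)
```

With these values the variance at the second occasion is 0.9² x 1.0 + 0.3 = 1.11: the transmitted part of the first occasion's variance plus the new innovation.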

32 Mx Script Platelet count

33

34 Issues:
-Has the method increased power??? If not, why?
-Equating factor loadings?
-Likelihood of base model?

