
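1 Introduction: We consider data on ~1,800 phenotype measurements. Each mouse has a given probability distribution of descending from one of 8 possible progenitor strains. We model $Y = X\beta + \varepsilon$, where Y is the phenotype, X is the “genotype” matrix whose i-th row is $X_i = e_j$ (the j-th unit vector) if mouse i descends from strain j, π gives the probabilities of descending from each strain, $\pi_{ij} = P(X_i = e_j \mid z_i)$, and z denotes the flanking markers. The objective is to estimate the β's and to test hypotheses about them.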

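2 Approaches: linear regression. Replace the true design matrix X with its expectation E(X); the estimator is then $\hat{\beta} = (E(X)^\top E(X))^{-1} E(X)^\top Y$. This estimator is unbiased, normally distributed and linear in Y, but it has a large variance due to collinearity among the columns of E(X).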
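To make the E(X) estimator concrete, here is a minimal pure-Python sketch. It assumes a single locus, so that E(X) is just the n × 8 matrix of HMM probabilities π_ij, and it solves the normal equations by Gaussian elimination; the helper names `solve` and `ols_expected_design` are hypothetical. When the columns of E(X) are (nearly) collinear the pivots become (nearly) zero, which is where the large variance comes from.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            # a zero pivot means the columns of E(X) are collinear
            raise ValueError("singular normal equations (collinear design)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols_expected_design(pi, y):
    """Hypothetical helper: beta_hat = (E(X)^T E(X))^{-1} E(X)^T Y, with E(X) = pi (one locus)."""
    k = len(pi[0])
    xtx = [[sum(row[a] * row[b] for row in pi) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(pi, y)) for a in range(k)]
    return solve(xtx, xty)
```

For example, `ols_expected_design([[1, 0], [0, 1], [0.5, 0.5]], [2, -1, 0.5])` recovers β = (2, −1), while two identical rows of π make the normal equations singular.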

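3 Approaches (2): maximum likelihood estimator. Maximize with respect to β the likelihood $L(\beta) = \prod_{i=1}^{n} \sum_{j=1}^{8} \pi_{ij}\, f(y_i \mid X_i = e_j, \beta)$:
- the expression simplifies;
- it is easy to evaluate point-wise;
- its functional form is not known, hence it is difficult to optimize;
- the properties of the MLE are unknown.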
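The mixture likelihood above can indeed be evaluated point-wise in a few lines. This sketch assumes a normal phenotype with strain means β_j and a common residual standard deviation σ; the function name `log_likelihood` is illustrative, not taken from the slides. Working on the log scale with log-sum-exp avoids underflow when the contributions of many mice are multiplied together.

```python
import math


def log_likelihood(beta, sigma, probs, y):
    """Observed-data log-likelihood of the strain mixture (assumed normal case).

    probs[i][j] is the HMM probability that mouse i descends from strain j,
    beta[j] the assumed mean phenotype of strain j, sigma the residual s.d.
    """
    total = 0.0
    for p_i, y_i in zip(probs, y):
        # log(pi_ij * N(y_i; beta_j, sigma^2)) for every strain with pi_ij > 0
        terms = [
            math.log(p) - 0.5 * math.log(2 * math.pi * sigma ** 2)
            - (y_i - b) ** 2 / (2 * sigma ** 2)
            for p, b in zip(p_i, beta) if p > 0
        ]
        # log-sum-exp keeps the sum over strains numerically stable
        m = max(terms)
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total
```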

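4 Approaches (3): use an iterative optimiser for finding the MLE, the EM algorithm. The two steps are:
- E-step: for the i-th mouse, calculate the posterior strain probabilities $w_{ij} = \pi_{ij} f(y_i \mid X_i = e_j, \beta^{(t)}) / \sum_{l=1}^{8} \pi_{il} f(y_i \mid X_i = e_l, \beta^{(t)})$ (only for the categorical covariates);
- M-step: maximize $Q(\beta \mid \beta^{(t)}) = \sum_{i} \sum_{j} w_{ij} \log f(y_i \mid X_i = e_j, \beta)$ with respect to β.
Advantages: automatic; fast; the approximate distribution of the estimates makes testing possible; easily generalised to GLMs.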
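A sketch of the E-step for the normal case, under the same assumptions as before (a single categorical covariate, strain means β_j, common σ); `e_step` is a hypothetical name. Each row of the returned weights is the posterior distribution of strain descent for one mouse and sums to 1; strains with π_ij = 0 always get weight 0.

```python
import math


def e_step(beta, sigma, probs, y):
    """Posterior strain probabilities w_ij for each mouse (assumed normal phenotype).

    w_ij is proportional to pi_ij * N(y_i; beta_j, sigma^2); the Gaussian
    normalising constant is shared by all strains and cancels.
    """
    weights = []
    for p_i, y_i in zip(probs, y):
        logs = [
            math.log(p) - (y_i - b) ** 2 / (2 * sigma ** 2) if p > 0 else -math.inf
            for p, b in zip(p_i, beta)
        ]
        m = max(logs)
        unnorm = [math.exp(v - m) for v in logs]  # exp(-inf) gives exactly 0.0
        s = sum(unnorm)
        weights.append([u / s for u in unnorm])
    return weights
```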

5 Implementation of the EM: The M-step becomes equivalent to a weighted least squares (WLS) fit or a weighted GLM fit (fitting routines are available in R and Matlab); in the WLS case $\beta^{(t+1)} = (\tilde{X}^\top W \tilde{X})^{-1} \tilde{X}^\top W \tilde{Y}$, where $\tilde{Y}$ and $\tilde{X}$ are the augmented response vector and design matrix, and the weight matrix W is constructed from the HMM output. Only results for a normally distributed Y are shown below, but the EM was also applied to the binomial and exponential cases.
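For the normal case with only the strain covariate, the augmented design rows are unit vectors, so $\tilde{X}^\top W \tilde{X}$ is diagonal and the WLS solution reduces to a weighted mean per strain. The sketch below assumes exactly that (the name `m_step` is hypothetical) and also returns the updated residual standard deviation, the weighted residual sum of squares divided by n. With additional covariates one would instead call a general weighted fitting routine such as R's `lm(..., weights = w)` or `glm(..., weights = w)`, or Matlab's `lscov(A, b, w)`.

```python
import math


def m_step(weights, y):
    """WLS M-step for one-hot augmented rows: a weighted mean per strain.

    Assumes weights[i][j] = w_ij from the E-step (each row sums to 1).
    Returns (beta, sigma).
    """
    k, n = len(weights[0]), len(y)
    beta = []
    for j in range(k):
        wj = sum(w[j] for w in weights)
        # a strain with zero total weight is not identified; 0.0 is an arbitrary choice
        beta.append(sum(w[j] * v for w, v in zip(weights, y)) / wj if wj > 0 else 0.0)
    # weighted residual sum of squares; the total weight equals n
    rss = sum(w[j] * (v - beta[j]) ** 2 for w, v in zip(weights, y) for j in range(k))
    return beta, math.sqrt(rss / n)
```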

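6 Augmenting the model with corresponding weights: Given the phenotypes Y and the weights W, we create the augmented model, in which each mouse i appears once for every strain j, with response $y_i$, design row $e_j$ and weight $w_{ij}$.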
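The augmentation itself can be sketched as follows, assuming one locus with K candidate strains (the names `augment` and `weighted_normal_equations` are illustrative). Building $\tilde{X}^\top W \tilde{X}$ explicitly shows that, for unit-vector design rows, it is diagonal with the summed weights of each strain.

```python
def augment(y, weights):
    """Stack each mouse once per candidate strain: rows (y_i, e_j, w_ij)."""
    y_aug, x_aug, w_aug = [], [], []
    for y_i, w_i in zip(y, weights):
        k = len(w_i)
        for j, w_ij in enumerate(w_i):
            y_aug.append(y_i)
            x_aug.append([1.0 if c == j else 0.0 for c in range(k)])  # unit vector e_j
            w_aug.append(w_ij)
    return y_aug, x_aug, w_aug


def weighted_normal_equations(y_aug, x_aug, w_aug):
    """Return (X~^T W X~, X~^T W Y~) for the augmented data, W = diag(w_aug)."""
    k = len(x_aug[0])
    xtwx = [[sum(w * x[a] * x[b] for x, w in zip(x_aug, w_aug)) for b in range(k)]
            for a in range(k)]
    xtwy = [sum(w * x[a] * v for x, w, v in zip(x_aug, w_aug, y_aug)) for a in range(k)]
    return xtwx, xtwy
```

Dividing each entry of `xtwy` by the matching diagonal entry of `xtwx` gives the WLS estimate of each strain effect.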

7 Simulated example: generated phenotypes. The response was generated with the variance set to 0.3 and β = (1,0,0,0,0,0,0,0).
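A possible way to generate such a simulated data set, as a sketch: the slides do not say how the strain probabilities were produced, so this code draws each mouse's π_i from a Dirichlet distribution (an assumption, as are the concentration `alpha`, the seed and the name `simulate`), samples the true strain from π_i, and adds Gaussian noise with variance 0.3.

```python
import math
import random


def simulate(n, beta, variance, alpha=0.3, seed=0):
    """Simulate n mice under Y = X beta + eps with hypothetical Dirichlet strain probabilities.

    Returns (probs, strains, y): probs[i] is pi_i, strains[i] the true strain
    index (the unobserved row X_i) and y[i] the phenotype.
    """
    rng = random.Random(seed)
    k = len(beta)
    probs, strains, y = [], [], []
    for _ in range(n):
        # Dirichlet(alpha, ..., alpha) draw via normalised gamma variates (assumption)
        g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
        total = sum(g)
        p = [v / total for v in g]
        # true strain drawn from pi_i, then Gaussian noise with the given variance
        j = rng.choices(range(k), weights=p)[0]
        probs.append(p)
        strains.append(j)
        y.append(beta[j] + rng.gauss(0.0, math.sqrt(variance)))
    return probs, strains, y
```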

8 Running the EM: Values of the β parameters over the EM iterations. The true values are (1,0,0,0,0,0,0,0).
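Putting the pieces together, a self-contained sketch of the EM run on data simulated as in the previous example (Dirichlet strain probabilities, β = (1,0,0,0,0,0,0,0), variance 0.3). All helper names, the sample size and the symmetric starting point are assumptions; this is not the authors' Matlab code. It records β and the observed-data log-likelihood at each iteration; the log-likelihood should never decrease, which is a useful check on any EM implementation.

```python
import math
import random


def simulate(n, beta, variance, alpha=0.3, seed=1):
    """Hypothetical simulated data: Dirichlet(alpha) strain probabilities per mouse."""
    rng = random.Random(seed)
    k = len(beta)
    probs, y = [], []
    for _ in range(n):
        g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
        s = sum(g)
        p = [v / s for v in g]
        j = rng.choices(range(k), weights=p)[0]
        probs.append(p)
        y.append(beta[j] + rng.gauss(0.0, math.sqrt(variance)))
    return probs, y


def e_step(beta, sigma, probs, y):
    """Posterior weights w_ij and the observed-data log-likelihood at (beta, sigma)."""
    weights, loglik = [], 0.0
    for p_i, y_i in zip(probs, y):
        logs = [
            math.log(p) - 0.5 * math.log(2 * math.pi * sigma ** 2)
            - (y_i - b) ** 2 / (2 * sigma ** 2) if p > 0 else -math.inf
            for p, b in zip(p_i, beta)
        ]
        m = max(logs)
        unnorm = [math.exp(v - m) for v in logs]
        s = sum(unnorm)
        loglik += m + math.log(s)  # log-sum-exp over strains
        weights.append([u / s for u in unnorm])
    return weights, loglik


def m_step(weights, y):
    """WLS for one-hot augmented rows: weighted mean per strain, pooled sigma."""
    k, n = len(weights[0]), len(y)
    beta = []
    for j in range(k):
        wj = sum(w[j] for w in weights)
        beta.append(sum(w[j] * v for w, v in zip(weights, y)) / wj if wj > 0 else 0.0)
    rss = sum(w[j] * (v - beta[j]) ** 2 for w, v in zip(weights, y) for j in range(k))
    return beta, math.sqrt(rss / n)


def run_em(probs, y, k, iterations=100):
    """Run a fixed number of EM iterations from a symmetric start."""
    n = len(y)
    mean = sum(y) / n
    beta = [mean] * k  # all strains start at the overall mean
    sigma = math.sqrt(sum((v - mean) ** 2 for v in y) / n)
    history = []  # (beta, log-likelihood) at the start of each iteration
    for _ in range(iterations):
        weights, loglik = e_step(beta, sigma, probs, y)
        history.append((list(beta), loglik))
        beta, sigma = m_step(weights, y)
    return beta, sigma, history
```

Plotting each component of the recorded β against the iteration number gives a trace of the estimates converging over the EM iterations.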

9 Running time:
- 10 seconds: approximate running time for the WLS case on 1,649 mice, implemented in Matlab, with convergence achieved after 15 iterations for some starting points;
- 60 seconds: for 3,298 mice.

10 Testing under collinearity: A likelihood ratio (LR) test was performed for:
- the EM;
- linear regression with the known design matrix;
- linear regression with the expectation of the design matrix.
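Whichever of the three fits is used, the LR statistic is twice the difference in maximized log-likelihoods, referred to a chi-square distribution whose degrees of freedom equal the number of restrictions (for example 7 if the hypothesis is that all 8 strain effects are equal, which is an assumption here). The pure-Python sketch below uses closed-form chi-square tail probabilities for integer degrees of freedom; the names `chi2_sf` and `lr_test` are hypothetical. Under collinearity this asymptotic approximation may be poor, which is presumably why empirical null distributions are examined on the next slide.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for a positive integer df (closed forms)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # odd df: 2*(1 - Phi(sqrt x)) + 2*phi(sqrt x) * sum_j x^(j - 1/2) / (1*3*...*(2j-1))
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, series = math.sqrt(x), 0.0
    for j in range(1, (df - 1) // 2 + 1):
        series += term
        term *= x / (2 * j + 1)
    return math.erfc(math.sqrt(x / 2)) + 2 * phi * series


def lr_test(loglik_full, loglik_null, df):
    """LR statistic 2*(l_full - l_null) and its asymptotic chi-square p-value."""
    stat = 2.0 * (loglik_full - loglik_null)
    return stat, chi2_sf(stat, df)
```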

11 Empirical null distributions: [Figure panels: E(X) case null distribution; EM algorithm null distribution]

12 Power curves: Description of the power of the LR test. All β's are set to 0 except the first one; data sets are simulated and the number of rejections is plotted. For each value of the first β, 500 simulations were performed.
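A Monte Carlo sketch of such a power curve for the simplest of the three fits, linear regression with the known design matrix; the other two would replace `lr_known_design` with the corresponding fit. Here the strains are drawn uniformly, the sample size n = 200 is arbitrary and the variance 0.3 is borrowed from the simulated example; all of these, and the function names, are assumptions. With known X, the Gaussian LR statistic for equal strain effects is n log(RSS0/RSS1), compared with 14.067, the 95% quantile of the chi-square distribution with 7 degrees of freedom.

```python
import math
import random

# 95% quantile of chi-square with 7 df (8 strains, H0: all strain effects equal)
CHI2_7_95 = 14.067


def lr_known_design(strains, y, k):
    """LR statistic n*log(RSS0/RSS1) for H0: all k strain effects equal (X known)."""
    n = len(y)
    grand = sum(y) / n
    rss0 = sum((v - grand) ** 2 for v in y)  # null model: one common mean
    sums, counts = [0.0] * k, [0] * k
    for j, v in zip(strains, y):
        sums[j] += v
        counts[j] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    rss1 = sum((v - means[j]) ** 2 for j, v in zip(strains, y))  # one mean per strain
    return n * math.log(rss0 / rss1)


def power(beta1, n=200, k=8, variance=0.3, reps=500, seed=0):
    """Fraction of reps in which H0 is rejected when beta = (beta1, 0, ..., 0)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        strains = [rng.randrange(k) for _ in range(n)]  # hypothetical: uniform strains
        y = [(beta1 if j == 0 else 0.0) + rng.gauss(0.0, math.sqrt(variance))
             for j in strains]
        if lr_known_design(strains, y, k) > CHI2_7_95:
            rejections += 1
    return rejections / reps
```

Evaluating `power(b)` over a grid of values of b traces out one power curve; at b = 0 it estimates the size of the test.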

13 Simulated power curves: [Figure panels: most likely combination of progenitor strains; randomly drawn combination of progenitor strains; least likely combination of progenitor strains]

14 Data: We considered the OpenArmTime phenotype; ~200 mice had zero records and were removed. Is it a mixture of normal distributions?

15 Future development:
- time-to-event models: censored data, Cox proportional hazards model;
- Bayesian models;
- implementation in R;
- models for multivariate phenotypes;
- multiple hypothesis testing;
- HMM improvement.

