Published by Abigail Mooney. Modified over 3 years ago.

1
Mixture models for classifying differentially expressed genes
Alex Lewin, Centre for Biostatistics, Imperial College, London
Joint work with Natalia Bochkina, Sylvia Richardson
BBSRC Exploiting Genomics grant

2
Modelling differential expression
Many different methods/models for differential expression:
– t-test
– t-test with stabilised variances (EB)
– Bayesian hierarchical models
– mixture models
Choice whether to model alternative hypothesis or not
Our model:
– Model the alternative hypothesis
– Fully Bayesian

3
Mixture model features
Gene means and fold differences: linear model on the log scale
Gene variances: borrow information across genes by assuming exchangeable variances
Mixture prior on fold difference parameters
Point mass prior for null hypothesis

4
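Fully Bayesian mixture model for differential expression
1st level
$y_{g1r} \mid \alpha_g, d_g, \sigma_{g1} \sim N(\alpha_g - \tfrac{1}{2} d_g, \sigma_{g1}^2)$
$y_{g2r} \mid \alpha_g, d_g, \sigma_{g2} \sim N(\alpha_g + \tfrac{1}{2} d_g, \sigma_{g2}^2)$
2nd level
$\sigma_{gs}^2 \mid a_s, b_s \sim IG(a_s, b_s)$
$d_g \sim \pi_0 \delta_0 + \pi_1 G_{-}(1.5, \eta_1) + \pi_2 G_{+}(1.5, \eta_2)$
3rd level
Gamma hyper prior for $\eta_1, \eta_2, a_s, b_s$
Dirichlet distribution for $(\pi_0, \pi_1, \pi_2)$
Annotations: $H_0$ (point mass $\delta_0$); explicit modelling of the alternative (Gamma components)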
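To make the mixture prior on the fold differences concrete, here is a minimal sketch that draws d_g from a point mass at zero plus a negative and a positive Gamma component. Several things are assumptions rather than statements from the slides: the second Gamma argument is treated as a rate, G_- is taken to be a Gamma distribution reflected onto the negative axis, and the function name, weights, and rates in the usage lines are hypothetical.

```python
import numpy as np

def sample_fold_differences(n_genes, weights, rates, shape=1.5, rng=None):
    """Draw d_g from w0*delta_0 + w1*G_-(shape, rate1) + w2*G_+(shape, rate2).

    Assumption: the Gamma components are parameterised by (shape, rate),
    and G_- is a Gamma draw with its sign flipped.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Allocation: 0 = null, 1 = under-expressed, 2 = over-expressed
    z = rng.choice(3, size=n_genes, p=weights)
    d = np.zeros(n_genes)
    n_neg = int(np.sum(z == 1))
    n_pos = int(np.sum(z == 2))
    # NumPy's gamma takes (shape, scale), so scale = 1 / rate
    d[z == 1] = -rng.gamma(shape, 1.0 / rates[0], size=n_neg)
    d[z == 2] = rng.gamma(shape, 1.0 / rates[1], size=n_pos)
    return d, z

# Hypothetical weights and rates, chosen only for illustration
d, z = sample_fold_differences(
    2500, weights=[0.8, 0.1, 0.1], rates=(2.0, 2.0),
    rng=np.random.default_rng(0),
)
```

Genes allocated to the null component get a fold difference of exactly zero, which is what the point mass encodes.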

5
Decision Rules
In full Bayesian framework, introduce latent allocation variable $z_g = 0, 1$ for gene g in the null, alternative.
For each gene, calculate posterior probability of belonging to unmodified component: $p_g = \Pr(z_g = 0 \mid \text{data})$
Classify using cut-off on $p_g$ (Bayes rule corresponds to 0.5).
For any given cut-off on $p_g$, can estimate FDR, FNR.
For gene-list S, est. (FDR | data) $= \sum_{g \in S} p_g / |S|$
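The decision rule can be sketched in a few lines. The FDR estimate is the formula on the slide; the FNR estimate (the average of 1 - p_g over genes not in the list) is an assumed analogue, since the slide does not spell it out. The function name and the example probabilities are hypothetical.

```python
import numpy as np

def classify_and_estimate(p_null, cutoff=0.5):
    """Select genes with p_g below the cut-off and estimate FDR and FNR.

    p_null[g] is the posterior probability that gene g is in the null
    (unmodified) component.
    """
    p_null = np.asarray(p_null, dtype=float)
    selected = p_null < cutoff  # gene list S
    # est. FDR = sum over S of p_g, divided by |S| (formula from the slide)
    est_fdr = float(p_null[selected].mean()) if selected.any() else float("nan")
    # est. FNR = sum over genes outside S of (1 - p_g), divided by their count
    # (assumed analogue of the FDR formula)
    rest = ~selected
    est_fnr = float((1.0 - p_null[rest]).mean()) if rest.any() else float("nan")
    return selected, est_fdr, est_fnr

# Hypothetical posterior probabilities for five genes
selected, est_fdr, est_fnr = classify_and_estimate([0.01, 0.2, 0.6, 0.9, 0.99])
```

With these five values and the 0.5 cut-off, the first two genes are selected, so the estimated FDR is (0.01 + 0.2) / 2.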

6
Simulation Study
Explore performance of fully Bayesian mixture in different situations:
– Non-standard distribution of DE genes
– Small number of DE genes
– Small number of replicate arrays
– Asymmetric distributions of over- and under-expressed genes
Simulated data: 50 simulated data sets for each of several different set-ups.

7
Simulation Study
2500 genes, 8 replicates in each experimental condition
$d_g \sim \pi_0 \delta_0 + \pi_1 (\nu\, \mathrm{Unif}() + (1 - \nu)\, N()) + \pi_2 (\nu\, \mathrm{Unif}() + (1 - \nu)\, N())$
$\sigma_{gs} \sim \mathrm{logNorm}(-1.8, 0.5)$ (logNorm based on data)
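A minimal sketch of generating replicate arrays for this set-up, given a vector of fold differences. It assumes that logNorm(-1.8, 0.5) gives the mean and standard deviation of log sigma, and that sigma (not the variance) is what is drawn. The gene means are not specified on the slide, so the N(7, 1) draw for them is purely hypothetical, as are the function name and the toy fold differences in the usage lines.

```python
import numpy as np

def simulate_arrays(d, n_rep=8, mean_log_sigma=-1.8, sd_log_sigma=0.5, rng=None):
    """Simulate log-expression for two conditions from fold differences d."""
    rng = np.random.default_rng() if rng is None else rng
    d = np.asarray(d, dtype=float)
    n_genes = d.size
    # Hypothetical gene-level mean expression (not given in the slides)
    alpha = rng.normal(7.0, 1.0, size=n_genes)
    # One standard deviation per gene and condition
    sigma = rng.lognormal(mean_log_sigma, sd_log_sigma, size=(n_genes, 2))
    # Condition 1 is centred at alpha - d/2, condition 2 at alpha + d/2
    y1 = rng.normal((alpha - 0.5 * d)[:, None], sigma[:, [0]], size=(n_genes, n_rep))
    y2 = rng.normal((alpha + 0.5 * d)[:, None], sigma[:, [1]], size=(n_genes, n_rep))
    return y1, y2

# Toy fold differences: 250 genes shifted by 1 on the log scale, the rest null
d_toy = np.zeros(2500)
d_toy[:250] = 1.0
y1, y2 = simulate_arrays(d_toy, rng=np.random.default_rng(1))
```

Each returned array has one row per gene and one column per replicate.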

8
Non-standard distributions of DE genes
π_0 = 0.8
[Figure: three panels of simulated DE gene distributions with Gamma distributions superimposed; panel labels = 0.3, = 0.5, = 0.8]
Av. est. π_0 = 0.805 ± 0.010
Av. est. π_0 = 0.797 ± 0.010
Av. est. π_0 = 0.781 ± 0.010

9
Small number of DE genes / Small number of replicate arrays

True π_0   Replicates   Av. FDR   Av. FNR   Av. est. π_0
0.95       8            7.0 %     2.0 %     0.947 ± 0.007
0.95       3            17.9 %    3.6 %     0.956 ± 0.009
0.99       8            9.2 %     0.6 %     0.990 ± 0.003
0.99       3            17.6 %    0.9 %     0.995 ± 0.007

10
Asymmetric distributions of over/under-expressed genes
$d_g \sim \pi_0 \delta_0 + \pi_1 (0.6\, \mathrm{Unif}(0.01, 1.7) + 0.4\, N(1.7, 0.8)) + \pi_2 (0.6\, \mathrm{Unif}(-0.7, -0.01) + 0.4\, N(-0.7, 0.8))$

Parameter   True    Av. est.
π_0         0.9     0.897 ± 0.007
π_1         0.09    0.093 ± 0.003
π_2         0.01    0.011 ± 0.006
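The asymmetric set-up can be sketched as a sampler for d_g, using the weights and the Unif / N parameters given on this slide. One assumption: the second argument of N(., 0.8) is read as a standard deviation, which the slide does not state. The function name is hypothetical.

```python
import numpy as np

def sample_asymmetric_d(n_genes, pi=(0.9, 0.09, 0.01), unif_weight=0.6, rng=None):
    """Draw d_g from pi0*delta_0 + pi1*(0.6 Unif + 0.4 N) + pi2*(0.6 Unif + 0.4 N)."""
    rng = np.random.default_rng() if rng is None else rng
    # Allocation: 0 = null, 1 = first DE component, 2 = second DE component
    z = rng.choice(3, size=n_genes, p=pi)
    # Within a DE component, pick the uniform part with probability unif_weight
    use_unif = rng.random(n_genes) < unif_weight
    d = np.zeros(n_genes)
    # Component 1: 0.6 Unif(0.01, 1.7) + 0.4 N(1.7, 0.8)
    m = (z == 1) & use_unif
    d[m] = rng.uniform(0.01, 1.7, size=int(m.sum()))
    m = (z == 1) & ~use_unif
    d[m] = rng.normal(1.7, 0.8, size=int(m.sum()))  # 0.8 assumed to be an SD
    # Component 2: 0.6 Unif(-0.7, -0.01) + 0.4 N(-0.7, 0.8)
    m = (z == 2) & use_unif
    d[m] = rng.uniform(-0.7, -0.01, size=int(m.sum()))
    m = (z == 2) & ~use_unif
    d[m] = rng.normal(-0.7, 0.8, size=int(m.sum()))
    return d, z

d_asym, z_asym = sample_asymmetric_d(20000, rng=np.random.default_rng(2))
```

Because the normal parts are not truncated in this sketch, a small share of draws in each DE component can land on the opposite side of zero.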

11
Additional Checks
1) FDR / FNR can be estimated well
[Figure with labels: True FDR, Est. FDR, True FNR, Est. FNR]
2) Model works when there are no DE genes
50 simulations of same set-up: Av. est. π_0 = 0.999
No genes are declared to be DE.
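In a simulation the true status of every gene is known, so the true FDR and FNR of a gene list can be computed and set against the estimated values. The sketch below assumes the usual definitions (false discoveries over list size, missed DE genes over the number of genes left out) and the convention that an empty list has FDR 0. The function name and the toy labels are hypothetical.

```python
import numpy as np

def true_error_rates(is_null, selected):
    """True FDR and FNR of a gene list, given the simulated truth."""
    is_null = np.asarray(is_null, dtype=bool)
    selected = np.asarray(selected, dtype=bool)
    n_sel = int(selected.sum())
    n_unsel = int((~selected).sum())
    # FDR: fraction of selected genes that are truly null (0 if none selected)
    fdr = (is_null & selected).sum() / n_sel if n_sel else 0.0
    # FNR: fraction of unselected genes that are truly DE (0 if all selected)
    fnr = (~is_null & ~selected).sum() / n_unsel if n_unsel else 0.0
    return float(fdr), float(fnr)

# Toy example: 3 null genes, 2 DE genes; one false discovery, one miss
fdr, fnr = true_error_rates(
    is_null=[True, True, True, False, False],
    selected=[False, False, True, True, False],
)
```

When no genes are declared DE, as in the second check, the gene list is empty and this sketch returns a true FDR of 0 by convention.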

12
Comparison with conjugate mixture prior
Replace $d_g \sim \pi_0 \delta_0 + \pi_1 G_{-}(1.5, \eta_1) + \pi_2 G_{+}(1.5, \eta_2)$
with $d_g \sim \pi_0 \delta_0 + \pi_1 N(0, c \sigma_g^2)$
NB: We estimate both c and π_0 in fully Bayesian way.

True π_0   Est. π_0 with Gamma prior   Est. π_0 with conjugate prior
0.8        0.781 ± 0.010               0.796 ± 0.010
0.95       0.947 ± 0.007               0.955 ± 0.006
0.99       0.990 ± 0.003               0.991 ± 0.003
1          0.999 ± 0.001

13
Application to Mouse data
Mouse wildtype (WT) and knock-out (KO) data (Affymetrix)
~ 22700 genes, 8 replicates in each of WT and KO
Gamma prior: Est. π_0 = 0.996 ± 0.001
Declares 59 genes DE

14
Summary
Good performance of fully Bayesian mixture model
– can estimate proportion of DE genes in variety of situations
– accurate estimation of FDR / FNR
Different mixture priors give similar classification results
Gives reasonable results for real data
