Mixture models for classifying differentially expressed genes. Alex Lewin, Centre for Biostatistics, Imperial College, London. Joint work with Natalia Bochkina and Sylvia Richardson. BBSRC Exploiting Genomics grant.

Presentation transcript:

1 Alex Lewin Centre for Biostatistics Imperial College, London Joint work with Natalia Bochkina, Sylvia Richardson BBSRC Exploiting Genomics grant Mixture models for classifying differentially expressed genes

2 Modelling differential expression. Many different methods/models for differential expression: t-test; t-test with stabilised variances (EB); Bayesian hierarchical models; mixture models. Choice whether to model the alternative hypothesis or not. Our model: models the alternative hypothesis; fully Bayesian.

3 Mixture model features. Gene means and fold differences: linear model on the log scale. Gene variances: borrow information across genes by assuming exchangeable variances. Mixture prior on fold difference parameters. Point mass prior for null hypothesis.

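4 Fully Bayesian mixture model for differential expression
1st level: $y_{g1r} \mid \alpha_g, d_g, \sigma_{g1} \sim N(\alpha_g - \tfrac{1}{2} d_g,\ \sigma_{g1}^2)$, $y_{g2r} \mid \alpha_g, d_g, \sigma_{g2} \sim N(\alpha_g + \tfrac{1}{2} d_g,\ \sigma_{g2}^2)$
2nd level: $\sigma_{gs}^2 \mid a_s, b_s \sim IG(a_s, b_s)$; $d_g \sim \pi_0 \delta_0 + \pi_1 G_-(1.5, \eta_1) + \pi_2 G_+(1.5, \eta_2)$, where the point mass $\delta_0$ represents $H_0$ and the two Gamma components give the explicit modelling of the alternative
3rd level: Gamma hyperprior for $\eta_1, \eta_2, a_s, b_s$; Dirichlet distribution for $(\pi_0, \pi_1, \pi_2)$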
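
The three-component prior on the fold difference can be illustrated by drawing from it. This is only a sketch under stated assumptions: the names `eta1` and `eta2` for the Gamma rate parameters are hypothetical, the Gamma components are taken to be shape/rate with shape 1.5, and the negative component is taken to be a Gamma variable reflected onto the negative axis. The slide does not confirm any of these conventions.

```python
import random


def sample_fold_difference(pi, eta1, eta2, shape=1.5, rng=random):
    """Draw one fold difference d_g and its component label.

    pi    : weights (null, negative Gamma, positive Gamma), summing to 1
    eta1  : assumed rate of the negative Gamma component (hypothetical name)
    eta2  : assumed rate of the positive Gamma component (hypothetical name)
    Returns (d_g, z_g) with z_g in {0, 1, 2}.
    """
    z = rng.choices([0, 1, 2], weights=pi)[0]
    if z == 0:
        return 0.0, 0  # point mass at zero: the null hypothesis
    if z == 1:
        # gammavariate takes (shape, scale), so scale = 1 / rate
        return -rng.gammavariate(shape, 1.0 / eta1), 1
    return rng.gammavariate(shape, 1.0 / eta2), 2


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        print(sample_fold_difference((0.8, 0.1, 0.1), 2.0, 2.0, rng=rng))
```

Returning the component label alongside the draw mirrors the latent allocation idea: in the fitted model the label is unobserved and has to be inferred from the data.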

5 Decision Rules. In the full Bayesian framework, introduce a latent allocation variable $z_g = 0, 1$ for gene $g$ in the null or the alternative. For each gene, calculate the posterior probability of belonging to the unmodified component: $p_g = \Pr(z_g = 0 \mid \text{data})$. Classify using a cut-off on $p_g$ (Bayes rule corresponds to 0.5). For any given cut-off on $p_g$, can estimate FDR, FNR. For gene-list $S$, est. $(\mathrm{FDR} \mid \text{data}) = \sum_{g \in S} p_g / |S|$.
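
The decision rule and the FDR estimate above can be sketched in a few lines. The FDR formula is the one on the slide; the FNR estimate (the average of 1 - p_g over genes not in the list) is the analogous quantity and is an assumption here, since the slide does not spell it out. Function names are illustrative, and genes with p_g strictly below the cut-off are declared DE.

```python
def classify(post_null, cutoff=0.5):
    """Indices of genes declared DE: posterior null probability below cutoff."""
    return [g for g, p in enumerate(post_null) if p < cutoff]


def estimated_fdr(post_null, selected):
    """Estimated FDR given data: mean of p_g over the selected gene list S."""
    if not selected:
        return 0.0
    return sum(post_null[g] for g in selected) / len(selected)


def estimated_fnr(post_null, selected):
    """Assumed analogue for FNR: mean of (1 - p_g) over genes not in S."""
    chosen = set(selected)
    rest = [p for g, p in enumerate(post_null) if g not in chosen]
    if not rest:
        return 0.0
    return sum(1.0 - p for p in rest) / len(rest)


if __name__ == "__main__":
    p = [0.01, 0.2, 0.6, 0.95, 0.4]  # made-up posterior null probabilities
    s = classify(p)                  # genes 0, 1 and 4 fall below 0.5
    print(s, estimated_fdr(p, s), estimated_fnr(p, s))
```

Because both estimates are simple averages of posterior probabilities, they can be recomputed cheaply for any cut-off, which is what makes it practical to pick the cut-off to hit a target FDR.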

6 Simulation Study. Explore performance of the fully Bayesian mixture in different situations: non-standard distribution of DE genes; small number of DE genes; small number of replicate arrays; asymmetric distributions of over- and under-expressed genes. Simulated data: 50 simulated data sets for each of several different set-ups.

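7 Simulation Study. … genes, 8 replicates in each experimental condition. $d_g \sim \pi_0 \delta_0 + \pi_1 \left( \nu\, \mathrm{Unif}() + (1 - \nu)\, N() \right) + \pi_2 \left( \nu\, \mathrm{Unif}() + (1 - \nu)\, N() \right)$; $\sigma_{gs} \sim \mathrm{logNorm}(-1.8, 0.5)$ (logNorm based on data).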

8 Non-standard distributions of DE genes. [Figure: three panels, for Uniform mixing weight = 0.3, 0.5 and 0.8, each with true π0 = 0.8 and the fitted Gamma distributions superimposed; each panel reports Av. est. π0 = … ± …]

9 Small number of DE genes / Small number of replicate arrays. Two settings: True π0 = 0.95 and True π0 = …. Results by number of replicates:
… replicates: Av. FDR = 7.0 %, Av. FNR = 2.0 %, Av. est. π0 = … ± …
… replicates: Av. FDR = 17.9 %, Av. FNR = 3.6 %, Av. est. π0 = … ± …
… replicates: Av. FDR = 9.2 %, Av. FNR = 0.6 %, Av. est. π0 = … ± …
… replicates: Av. FDR = 17.6 %, Av. FNR = 0.9 %, Av. est. π0 = … ± 0.007

10 Asymmetric distributions of over/under-expressed genes. True π0 = 0.9, True π1 = 0.09, True π2 = 0.01. Av. est. π0 = … ± …, Av. est. π1 = … ± …, Av. est. π2 = … ± …. $d_g \sim \pi_0 \delta_0 + \pi_1 \left( 0.6\, \mathrm{Unif}(0.01, 1.7) + 0.4\, N(1.7, 0.8) \right) + \pi_2 \left( 0.6\, \mathrm{Unif}(-0.7, \ldots) + 0.4\, N(-0.7, 0.8) \right)$
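
A data set in the spirit of this asymmetric set-up could be generated as follows. Several values are assumptions rather than facts from the slides: the number of genes (1000 here), the upper bound of the negative Uniform (taken as -0.01 by symmetry), the Normal weight 0.4 = 1 - 0.6, reading 0.8 as a standard deviation, reading logNorm(-1.8, 0.5) as the mean and standard deviation of the log, and a gene mean of 0. The function name is illustrative.

```python
import random


def simulate_dataset(n_genes=1000, n_reps=8, pi=(0.9, 0.09, 0.01), seed=0):
    """Simulate two-condition log-expression data with asymmetric DE genes."""
    rng = random.Random(seed)
    genes = []
    for _ in range(n_genes):
        z = rng.choices([0, 1, 2], weights=pi)[0]
        if z == 0:
            d = 0.0  # null gene: no fold difference
        elif z == 1:  # over-expressed: 0.6 Uniform + 0.4 Normal (assumed weight)
            d = rng.uniform(0.01, 1.7) if rng.random() < 0.6 else rng.gauss(1.7, 0.8)
        else:  # under-expressed; upper bound -0.01 is an assumption
            d = rng.uniform(-0.7, -0.01) if rng.random() < 0.6 else rng.gauss(-0.7, 0.8)
        # one standard deviation per condition, log-normal on the log scale
        sd1 = rng.lognormvariate(-1.8, 0.5)
        sd2 = rng.lognormvariate(-1.8, 0.5)
        alpha = 0.0  # gene mean, assumed zero for illustration
        y1 = [rng.gauss(alpha - 0.5 * d, sd1) for _ in range(n_reps)]
        y2 = [rng.gauss(alpha + 0.5 * d, sd2) for _ in range(n_reps)]
        genes.append({"z": z, "d": d, "y1": y1, "y2": y2})
    return genes


if __name__ == "__main__":
    data = simulate_dataset()
    print(sum(g["z"] == 0 for g in data) / len(data))  # close to 0.9
```

Keeping the true label `z` with each gene makes it possible to compare the true FDR and FNR of a gene list against their posterior estimates.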

11 Additional Checks. 1) FDR / FNR can be estimated well. [Figure: true FDR vs. est. FDR, and true FNR vs. est. FNR] 2) Model works when there are no DE genes. 50 simulations of same set-up: Av. est. π0 = …. No genes are declared to be DE.

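12 Comparison with conjugate mixture prior. Replace $d_g \sim \pi_0 \delta_0 + \pi_1 G_-(1.5, \eta_1) + \pi_2 G_+(1.5, \eta_2)$ with $d_g \sim \pi_0 \delta_0 + \pi_1 N(0, c\, \sigma_g^2)$. NB: We estimate both $c$ and $\pi_0$ in a fully Bayesian way. [Table: True π0; Est. π0 with Gamma prior; Est. π0 with conjugate prior. Entries are of the form … ± …; the last entry reads … ± 0.001]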

13 Application to Mouse data. Mouse wildtype (WT) and knock-out (KO) data (Affymetrix): ~ … genes, 8 replicates in each of WT and KO. Gamma prior: Est. π0 = … ± …; declares 59 genes DE.

14 Summary. Good performance of the fully Bayesian mixture model: can estimate the proportion of DE genes in a variety of situations; accurate estimation of FDR / FNR. Different mixture priors give similar classification results. Gives reasonable results for real data.