Darya Chudova, Alexander Ihler, Kevin K. Lin, Bogi Andersen and Padhraic Smyth BIOINFORMATICS Gene expression Vol. 25 no. 23 2009, pages 3114-3120.



Outline: Introduction, Methodology, Experimental Results, Conclusion

Introduction Cyclical biological processes (the cell cycle, hair growth cycle, mammary cycle and circadian rhythms) produce coordinated periodic expression of thousands of genes. Existing computational methods are biased toward discovering genes that follow sine-wave patterns. The objective is to identify, or rank, the genes that are most likely to be periodically regulated.

Introduction Existing methods fall into two major categories. Frequency-domain methods compute the spectrum of the average expression profile for each probe and test the significance of the dominant frequency against a suitable null hypothesis, such as uncorrelated noise; they are not well suited to short time courses. Time-domain methods identify sinusoidal expression patterns; they are simple and computationally efficient, but not effective at finding periodic signals that violate the sinusoidal assumption.
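To make the limitation of sinusoid matching concrete, here is a minimal sketch of a time-domain sinusoid fit. It is not the algorithm used in the paper or by any cited method; the function name `sinusoid_r2` and the toy profiles are assumptions made for illustration. It fits a single sinusoid at the known cycle frequency and reports the fraction of variance explained.

```python
import math

def sinusoid_r2(y, points_per_cycle):
    """R^2 of a least-squares fit y ~ c + A*cos(w*t) + B*sin(w*t).

    Assumes uniform sampling over a whole number of cycles with at least
    three points per cycle. Under that assumption the regressors are
    orthogonal, so the least-squares coefficients are simple projections.
    """
    n = len(y)
    w = 2.0 * math.pi / points_per_cycle
    mean = sum(y) / n
    a = 2.0 / n * sum(v * math.cos(w * t) for t, v in enumerate(y))
    b = 2.0 / n * sum(v * math.sin(w * t) for t, v in enumerate(y))
    fitted = [mean + a * math.cos(w * t) + b * math.sin(w * t) for t in range(n)]
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    ss_tot = sum((v - mean) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot

# Two cycles of six time points each (hypothetical toy data).
sine_profile = [math.sin(2.0 * math.pi * t / 6) for t in range(12)]
spike_profile = [0.0, 0.0, 5.0, 0.0, 0.0, 0.0] * 2  # perfectly periodic, not sinusoidal

r2_sine = sinusoid_r2(sine_profile, 6)
r2_spike = sinusoid_r2(spike_profile, 6)
```

The spike profile repeats exactly from one cycle to the next, yet a sinusoid explains well under half of its variance, which is the kind of pattern a sinusoid-based detector tends to rank low.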

Introduction This article presents a general statistical framework for detecting periodic profiles from time-course data. It analyzes the similarity of observed profiles across cycles, discovers periodic transcripts of arbitrary shape from replicated gene expression profiles, provides an empirical Bayes procedure for estimating the parameters of the prior distribution, and derives closed-form expressions for the posterior probability of periodicity.

Introduction Figure: expression profiles from the murine liver time course data set. Two of these probe sets (Nr1d1 and Arntl) correspond to well-established clock-control genes.

Methodology Probabilistic mixture model: differentially expressed genes change their expression level in response to changes in experimental conditions, whereas background genes remain constant throughout the experiment. Periodic phenomena are modeled as coordinated expression across multiple cycles.

Methodology Model the data using a mixture of three components, for background, differentially expressed and periodically expressed profiles. Compute the posterior probability that a given probe set was generated by the periodic component.

Methodology A probabilistic model for periodicity: N probe sets are observed over C cycles of known length. Each cycle is represented by the same grid of T time points, indexed from 1 to T. Each probe set i has a number of replicate observations at each time point j of each cycle c. An expression intensity value is indexed by probe set i, time point j, replicate k and cycle c, and Y_i denotes the entire set of observations for probe set i.

Methodology Our probabilistic model for expression then consists of three components: background (b), differentially expressed but aperiodic (d), and periodically expressed (p) profiles. Let Z_i denote the component associated with probe set i. Each of the three component models consists of a Normal/Inverse Gamma (NIG) prior distribution on the latent profile and additional Normal noise on the observations.

Methodology The Normal/Inverse Gamma (NIG) prior is a flexible and computationally convenient distribution, commonly used as a prior model for latent expression levels and replicate variability. The scalar variables μ and σ² are distributed as NIG with four parameters: a location mean, a location scale λ, and the inverse Gamma parameters (a, b). IG(x | a, b) denotes the inverse Gamma distribution with a degrees of freedom and scale parameter b, evaluated at x.
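For reference, one common textbook parameterization of the NIG prior is written out below. The symbol \mu_0 for the location mean is introduced here for illustration, and the slides do not confirm that the paper uses exactly this form (in particular, "degrees of freedom" may correspond to 2a rather than a in some conventions), so treat it as a sketch under that assumption.

```latex
\mathrm{NIG}(\mu, \sigma^2 \mid \mu_0, \lambda, a, b)
  = \mathcal{N}\!\left(\mu \,\middle|\, \mu_0, \frac{\sigma^2}{\lambda}\right)
    \mathrm{IG}(\sigma^2 \mid a, b),
\qquad
\mathrm{IG}(x \mid a, b) = \frac{b^{a}}{\Gamma(a)}\, x^{-a-1} e^{-b/x}, \quad x > 0 .
```

In this form the prior on the mean is tied to the variance, which is what makes the marginal likelihood of Normal observations available in closed form.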

Methodology Three types of unknown quantities: (1) the prior parameters, denoted Θ, which are determined via an empirical Bayes procedure and subsequently treated as known and fixed; (2) the probe set-specific hidden variables, that is, the latent profiles (consisting of a mean and variance) for each component; (3) the component identity, indicating from which component the data were generated.

Methodology Graphical model: the observed profiles Y, the latent component identity Z and the latent profiles {μ, σ²}, repeated N times, once per probe set.

Methodology The background component model: an NIG prior is shared by all background probe sets and parameterized by four scalars. The observations Y_i are modeled as independent samples from a Gaussian distribution with the latent mean and variance.

Methodology The differentially expressed component model: the latent mean and variance are (C × T)-dimensional vectors. The prior distribution for this component is defined by four (C × T)-dimensional parameters. Observations are modeled as independent given the latent profile.

Methodology The periodic component model: assume repeated expression of the same pattern across multiple cycles. The latent mean and variance are T-dimensional variables encoding expression levels and replicate variability in the 'ideal' cycle.
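The three component models differ mainly in how observations share a latent mean and variance: one shared pair for the whole profile (background), one pair per cycle and time point (differential), and one pair per time point shared across cycles (periodic). The sketch below uses the standard conjugate result for the marginal likelihood of Normal data under an NIG prior. It assumes one scalar prior (mu0, lam, a, b) per component, whereas the slides describe vector-valued prior parameters, and all function and variable names are hypothetical.

```python
import math

def nig_log_marginal(y, mu0, lam, a, b):
    """Log marginal likelihood of i.i.d. Normal data y under an NIG prior.

    Prior (assumed textbook form): mu | s2 ~ N(mu0, s2/lam), s2 ~ IG(a, b).
    """
    n = len(y)
    ybar = sum(y) / n
    ss = sum((v - ybar) ** 2 for v in y)
    lam_n = lam + n
    a_n = a + 0.5 * n
    b_n = b + 0.5 * ss + 0.5 * lam * n * (ybar - mu0) ** 2 / lam_n
    return (math.lgamma(a_n) - math.lgamma(a)
            + a * math.log(b) - a_n * math.log(b_n)
            + 0.5 * (math.log(lam) - math.log(lam_n))
            - 0.5 * n * math.log(2.0 * math.pi))

def component_log_marginals(y, prior):
    """y[c][j] is the list of replicates for cycle c and time point j.

    prior[z] = (mu0, lam, a, b) is a hypothetical scalar prior for
    component z in {"b", "d", "p"}.
    """
    n_cycles, n_times = len(y), len(y[0])
    # Background: every observation shares one latent mean and variance.
    pooled = [v for cycle in y for reps in cycle for v in reps]
    background = nig_log_marginal(pooled, *prior["b"])
    # Differential: a separate latent pair for each (cycle, time point).
    differential = sum(nig_log_marginal(y[c][j], *prior["d"])
                       for c in range(n_cycles) for j in range(n_times))
    # Periodic: one latent pair per time point, shared across cycles.
    periodic = sum(
        nig_log_marginal([v for c in range(n_cycles) for v in y[c][j]], *prior["p"])
        for j in range(n_times))
    return {"b": background, "d": differential, "p": periodic}

# Toy profile: two cycles, three time points, one replicate each.
toy = [[[-3.0], [0.0], [3.0]], [[-3.1], [0.1], [2.9]]]
toy_prior = {z: (0.0, 1.0, 2.0, 1.0) for z in ("b", "d", "p")}
toy_scores = component_log_marginals(toy, toy_prior)
```

On the toy profile, which repeats almost exactly across the two cycles, the periodic grouping receives the highest log marginal likelihood of the three.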

Methodology The complete set of prior parameters Θ includes the prior component probabilities π_z (corresponding to the relative frequencies of background, differentially expressed, and periodic probe sets).

Methodology Inference: detect periodic expression by computing the posterior probability of the periodic component.
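Given a marginal log-likelihood for each component and the prior component probabilities, the posterior probability of each component follows from Bayes' rule. The sketch below shows only that normalization step; the input numbers are made up, and the function name is an assumption rather than anything from the paper.

```python
import math

def component_posteriors(log_lik, prior):
    """Posterior P(Z = z | Y) from per-component log-likelihoods and priors.

    Uses the log-sum-exp trick so that very negative log-likelihoods
    do not underflow to zero before normalization.
    """
    scores = {z: math.log(prior[z]) + log_lik[z] for z in log_lik}
    top = max(scores.values())
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return {z: math.exp(s - log_norm) for z, s in scores.items()}

# Hypothetical values for one probe set.
example_log_lik = {"b": -120.0, "d": -100.0, "p": -98.0}
example_prior = {"b": 0.8, "d": 0.1, "p": 0.1}
example_posterior = component_posteriors(example_log_lik, example_prior)
```

Ranking probe sets by the "p" entry of this posterior gives the ordering that the periodicity detector would report.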

Methodology An analysis of variance periodicity detector: the resulting inferential test for periodicity is quite close to a simplified, non-Bayesian test based on analysis of variance (ANOVA). To construct the ANOVA test, divide the data into groups by their associated time points, regardless of cycle number: for a given time point, all replicates from all cycles c = 1, ..., C fall into the same group.

Methodology The test asks whether the data support separation into these groups, that is, whether the amount of variation between groups is significantly larger than the variation found within the groups. High values of the ratio of these quantities indicate that most of the variability in the observations can be explained by a time-dependent, cycle-independent profile.
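The grouping described above can be sketched as a one-way ANOVA F statistic in which each time point is a group that pools replicates from every cycle. This is a plain textbook F ratio, not necessarily the exact detector used in the paper; the data layout `y[c][j]` and the function name are assumptions. Converting F to a P-value would additionally need the F distribution, which is omitted here.

```python
def periodicity_f_statistic(y):
    """One-way ANOVA F ratio with time points as groups.

    y[c][j] is the list of replicate values for cycle c and time point j.
    Requires some within-group variation (otherwise the ratio is undefined).
    """
    n_times = len(y[0])
    # Pool replicates from all cycles that share the same time point.
    groups = [[v for cycle in y for v in cycle[j]] for j in range(n_times)]
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (n_times - 1)) / (ss_within / (n_total - n_times))

# Two cycles, three time points, one replicate each (toy values).
repeating = [[[1.0], [5.0], [2.0]], [[1.2], [4.8], [2.2]]]
shuffled = [[[1.0], [5.0], [2.0]], [[4.8], [2.2], [1.2]]]
f_repeating = periodicity_f_statistic(repeating)
f_shuffled = periodicity_f_statistic(shuffled)
```

The profile that repeats across cycles gives a large ratio, while the same values shuffled in the second cycle give a ratio below 1, because the time-point groups no longer explain the variation.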

Methodology Estimating parameters of the prior distribution: an empirical Bayes procedure determines the prior parameters Θ. First, determine a tentative assignment of probe sets to each component. Then use this assignment to find approximate maximum likelihood estimates of the location scale λ and of the inverse Gamma parameters (a, b). The location mean is set to 0 in all three components.
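The slides do not spell out the approximate maximum likelihood step, so the sketch below deliberately substitutes a simpler technique: moment matching for the inverse Gamma parameters (a, b). It treats per-probe variance estimates as if they were direct draws from IG(a, b) in the shape/scale form with mean b/(a-1) and variance b²/((a-1)²(a-2)), which ignores sampling noise in those estimates. The function name is hypothetical.

```python
def inverse_gamma_moment_match(variances):
    """Moment-matching estimate of inverse Gamma parameters (a, b).

    A stand-in for the approximate maximum likelihood step, not the
    paper's procedure. Solves mean = b/(a-1) and
    var = b^2 / ((a-1)^2 (a-2)) for a and b.
    """
    n = len(variances)
    mean = sum(variances) / n
    var = sum((x - mean) ** 2 for x in variances) / (n - 1)
    a = mean * mean / var + 2.0
    b = mean * (a - 1.0)
    return a, b

# Toy variance estimates for probe sets tentatively assigned to one component.
a_hat, b_hat = inverse_gamma_moment_match([1.0, 2.0, 3.0])
```

In an empirical Bayes workflow this would be run once per component on the tentatively assigned probe sets, and the resulting parameters would then be held fixed during inference.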

Methodology To find a tentative initial assignment of probe sets for estimating the prior parameters, run the ANOVA detectors of differential expression and of periodicity. The parameters of the differential expression component are defined from probe sets that vary significantly over time (P < 0.01). The parameters of the background component are defined from probe sets that fail this test (P > 0.1). For the periodic component, choosing probe sets with P < 0.001 yields a number of probe sets similar to that previously identified in the literature.
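The thresholding step can be sketched as below. The cutoffs are the ones stated on the slide; the assumption that the periodic cutoff applies to the periodicity ANOVA P-value, the dictionary layout and the function name are all illustrative choices rather than details confirmed by the source.

```python
def tentative_assignment(p_differential, p_periodicity,
                         diff_cut=0.01, background_cut=0.1, periodic_cut=0.001):
    """Pick probe sets used to estimate each component's prior parameters.

    p_differential and p_periodicity map probe set id -> ANOVA P-value.
    A probe set may appear in more than one list or in none.
    """
    chosen = {"b": [], "d": [], "p": []}
    for probe, p_diff in p_differential.items():
        if p_diff < diff_cut:
            chosen["d"].append(probe)   # varies significantly over time
        if p_diff > background_cut:
            chosen["b"].append(probe)   # clearly fails the differential test
        if p_periodicity[probe] < periodic_cut:
            chosen["p"].append(probe)   # strongly consistent across cycles
    return chosen

# Hypothetical P-values for three probe sets.
example_groups = tentative_assignment(
    {"g1": 0.0005, "g2": 0.5, "g3": 0.05},
    {"g1": 0.0001, "g2": 0.9, "g3": 0.2},
)
```

Probe sets with intermediate P-values, such as g3 here, are left out of every list, so the prior parameters are estimated only from relatively unambiguous examples.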

Experimental Results The experiments demonstrate that the model can effectively identify both sinusoidal and non-sinusoidal periodic expression patterns. It is widely believed that 5-10% of transcribed genes may be under circadian regulation, with some studies suggesting a higher proportion, up to 50% in murine liver. The datasets analyzed in this article contain gene expression profiles of liver and skeletal muscle tissues in mice.

Experimental Results Sine-wave detection: the sine-wave matching algorithm of Straume (2004) identifies 848 distinct rhythmic probe sets in liver and 383 such probe sets in skeletal muscle. Model-based detection: among the top 25 probe sets, nine were not among the top 400 ranked by sine-wave matching. Profiles that peak or drop at a single time point are poorly matched to a sinusoidal shape.

Experimental Results

Tns3 is the only probe set ranked in the top 25 by the sine-wave method but below the top 400 by the model. It conforms to the sine-wave pattern but has a very small amplitude, and the model assigns it to the background component. All of the other probe sets ranked this highly by the sine-wave method received posterior probabilities of periodicity > 0.9 from our model.

Conclusion We argue that in typical experiments with only a small number of samples per cycle, we should test for arbitrary patterns which are repeated between cycles, rather than parametric shapes. To this end, we propose a Bayesian mixture model for identifying patterns of unconstrained shape, which stand out as both differentially and periodically expressed.