Darya Chudova, Alexander Ihler, Kevin K. Lin, Bogi Andersen and Padhraic Smyth BIOINFORMATICS Gene expression Vol. 25 no. 23 2009, pages 3114-3120.


1 Darya Chudova, Alexander Ihler, Kevin K. Lin, Bogi Andersen and Padhraic Smyth BIOINFORMATICS Gene expression Vol. 25 no. 23 2009, pages 3114-3120

2 Outline: Introduction, Methodology, Experimental Results, Conclusion

3 Introduction Cyclical biological processes such as the cell cycle, hair growth cycle, mammary cycle and circadian rhythms produce coordinated periodic expression of thousands of genes. Existing computational methods are biased toward discovering genes that follow sine-wave patterns. The objective is to identify or rank which of these genes are most likely to be periodically regulated.

4 Introduction Existing methods fall into two major categories. Frequency domain: compute the spectrum of the average expression profile for each probe, then test the significance of the dominant frequency against a suitable null hypothesis such as uncorrelated noise; not well suited for short time courses. Time domain: identify sinusoidal expression patterns; simple and computationally efficient, but not effective at finding periodic signals which violate the sinusoidal assumption.
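The frequency-domain idea above can be sketched as a plain periodogram. This is a minimal illustration, not the procedure of any specific published method: the function names `periodogram` and `dominant_frequency` are hypothetical, the time points are assumed to be evenly spaced, and the significance test against a noise null is left out.

```python
import cmath
import math


def periodogram(profile):
    """Power at each nonzero DFT frequency index k = 1 .. n // 2.

    Assumes evenly spaced time points (an assumption of this sketch).
    """
    n = len(profile)
    mean = sum(profile) / n
    centered = [y - mean for y in profile]
    powers = []
    for k in range(1, n // 2 + 1):
        coeff = sum(y * cmath.exp(-2j * math.pi * k * t / n)
                    for t, y in enumerate(centered))
        powers.append(abs(coeff) ** 2 / n)
    return powers


def dominant_frequency(profile):
    """Return (k, power): the frequency index with the largest power.

    k counts full oscillations over the whole time course.
    """
    powers = periodogram(profile)
    best = max(range(len(powers)), key=powers.__getitem__)
    return best + 1, powers[best]


# Example: a sine wave completing two cycles over 12 samples.
profile = [math.sin(2 * math.pi * 2 * t / 12) for t in range(12)]
k, power = dominant_frequency(profile)
```

With only a handful of samples per cycle the spectrum has very few frequency bins, which is one way to see why this approach is poorly suited to short time courses.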

5 Introduction This article presents a general statistical framework for detecting periodic profiles from time-course data: Analyze the similarity of observed profiles across cycles. Discover periodic transcripts of arbitrary shape from replicated gene expression profiles. Provide an empirical Bayes procedure for estimating parameters of the prior distribution. Derive closed-form expressions for the posterior probability of periodicity.

6 Introduction Expression profiles from the murine liver time course data set. Two of these probe sets (Nr1d1 and Arntl) correspond to well-established clock-control genes.

7 Methodology Probabilistic mixture model: Differentially expressed genes change their expression level in response to changes in experimental conditions. Background genes remain constant throughout the experiment. Coordinated expression across multiple cycles is used to model periodic phenomena.

8 Methodology Model the data using a mixture of three components for background, differentially expressed and periodically expressed profiles. Compute the posterior probability that a given probe set was generated by the periodic component.

9 Methodology A probabilistic model for periodicity: N probe sets are observed over C cycles of known length. Each cycle is represented by the same grid of T time points, indexed from 1 to T. The number of replicate observations may depend on the probe set i, the time point j and the cycle c. Y_cijk: the expression intensity value for a particular probe set i, time point j and replicate k for cycle c. Y_i: the entire set of observations for probe set i.

10 Methodology Our probabilistic model for expression then consists of three components: background (b), differentially expressed but aperiodic (d), and periodically expressed (p) profiles. Let z_i denote the component associated with probe set i. Each of the three component models consists of a Normal/Inverse Gamma (NIG) prior distribution on the latent profile and additional Normal noise on the observations.

11 Methodology The Normal/Inverse Gamma (NIG) prior is a flexible and computationally convenient distribution commonly used as a prior model for latent expression levels and replicate variability. Scalar variables μ and σ² are distributed as NIG with four parameters: a location mean, a location scale λ, and the inverse Gamma parameters (a, b). IG(x | a, b): inverse Gamma distribution with a degrees of freedom and scale parameter b, evaluated at x.
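For orientation, one common way to write a Normal/Inverse Gamma density is given below. The slide's own formula did not survive in the transcript, so this parameterization is an assumption: the symbol \mu_0 for the location mean, the placement of the location scale \lambda (here dividing the variance), and the shape/scale form of the inverse Gamma may all differ from the paper's convention.

```latex
\mathrm{NIG}(\mu, \sigma^2 \mid \mu_0, \lambda, a, b)
  = \mathcal{N}\!\left(\mu \,\middle|\, \mu_0, \frac{\sigma^2}{\lambda}\right)
    \, \mathrm{IG}(\sigma^2 \mid a, b),
\qquad
\mathrm{IG}(x \mid a, b)
  = \frac{b^{a}}{\Gamma(a)} \, x^{-(a+1)} \exp\!\left(-\frac{b}{x}\right),
  \quad x > 0 .
```

The practical appeal of this family is conjugacy with Normal observations: the mean and variance can be integrated out analytically, which is what makes closed-form posterior probabilities possible.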

12 Methodology Three types of unknown quantities: The prior parameters, denoted Θ, are determined via an empirical Bayesian procedure and subsequently treated as known and fixed. Probe set-specific hidden variables: the latent profiles (consisting of a mean and variance) for each component. The component identity, indicating from which component the data were generated.

13 Methodology The observed profiles Y and the latent variables Z (component identity) and {μ, σ²}, repeated N times for the N probe sets.

14 Methodology The background component model: The NIG prior is shared by all background probe sets and parameterized by four scalars. The observations Y_i are modeled as independent samples from a Gaussian distribution with mean μ and variance σ².

15 Methodology The differentially expressed component model: μ and σ² are (C × T)-dimensional vectors. The prior distribution for this component is defined by four (C × T)-dimensional parameters. Model observations as being independent given μ and σ².

16 Methodology The periodic component model: Assume repeated expression of the same pattern across multiple cycles. μ and σ² are T-dimensional variables encoding expression levels and replicate variability in the ‘ideal’ cycle.

17 Methodology The complete set of prior parameters Θ includes the prior component probabilities π_z (corresponding to the relative frequencies of background, differentially expressed, and periodic probe sets).

18 Methodology Inference: Detect periodic expression by computing the posterior probability of the periodic component.
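The inference step is an application of Bayes' rule over the three components. The sketch below assumes the per-component log marginal likelihoods of a probe set have already been computed (the closed-form expressions themselves are not reproduced in the slides); the function name `posterior_periodic` and the numbers in the example are purely illustrative.

```python
import math


def posterior_periodic(log_lik, prior):
    """Posterior probability of the periodic component 'p'.

    log_lik: dict mapping 'b', 'd', 'p' to log marginal likelihoods
             of one probe set's data under each component (assumed given).
    prior:   dict mapping 'b', 'd', 'p' to prior component probabilities.
    """
    log_joint = {z: math.log(prior[z]) + log_lik[z] for z in prior}
    # Log-sum-exp for numerical stability with very negative log values.
    m = max(log_joint.values())
    log_norm = m + math.log(sum(math.exp(v - m) for v in log_joint.values()))
    return math.exp(log_joint['p'] - log_norm)


# Illustrative (made-up) values for a single probe set.
prior = {'b': 0.8, 'd': 0.1, 'p': 0.1}
log_lik = {'b': -50.0, 'd': -40.0, 'p': -38.0}
post = posterior_periodic(log_lik, prior)
```

Working in log space matters here because marginal likelihoods of whole profiles are products of many small densities and would underflow if multiplied directly.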

19 Methodology An analysis of variance periodicity detector: The resulting inferential test for periodicity is quite close to a simplified, non-Bayesian test based on analysis of variance (ANOVA). Construct the ANOVA test by dividing the data into groups by their associated time points regardless of cycle number. All replicates k from all cycles c = 1, ..., C at a given time point fall into the same group.

20 Methodology Test whether the data support separation into these groups, that is, whether the amount of variation between groups is significantly larger than the variation found within the groups. High values of the ratio of these quantities indicate that most of the variability in observations can be explained using a time-dependent, cycle-independent profile.
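The grouping described above leads to a standard one-way ANOVA F ratio. The following is a minimal sketch under an assumed data layout (`data[c][j]` holds the replicate values at time point j of cycle c); the function name is hypothetical, and converting the ratio to a P-value would additionally require the F distribution, which is not included here.

```python
def periodicity_f_statistic(data):
    """One-way ANOVA F ratio with one group per time point.

    data[c][j] is the list of replicate values at time point j of cycle c.
    Observations from all cycles at the same time point are pooled.
    Raises ZeroDivisionError if there is no within-group variation.
    """
    n_time = len(data[0])
    groups = [[y for cycle in data for y in cycle[j]] for j in range(n_time)]
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2
                    for g, m in zip(groups, means) for y in g)
    df_between = n_time - 1
    df_within = n_total - n_time
    return (ss_between / df_between) / (ss_within / df_within)


# Two cycles, three time points, one replicate each (made-up values).
periodic = [[[1.0], [5.0], [1.2]], [[1.1], [4.8], [0.9]]]
aperiodic = [[[1.0], [5.0], [1.2]], [[5.1], [1.0], [3.0]]]
f_periodic = periodicity_f_statistic(periodic)
f_aperiodic = periodicity_f_statistic(aperiodic)
```

The first profile repeats its peak at the same time point in both cycles and yields a large ratio; the second varies just as much but not consistently across cycles, so most of its variance ends up within groups.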

21 Methodology Estimating parameters of the prior distribution: Develop an empirical Bayes procedure to determine the prior parameters Θ. Determine a tentative assignment of probe sets to each component. Use this assignment to find approximate maximum likelihood estimates of the location scale λ and the parameters (a, b) of the inverse Gamma distribution; we set the location mean to 0 in all three components.

22 Methodology To find a tentative initial assignment of probe sets for estimating prior parameters: Run the ANOVA detectors of differential expression and periodicity. Differentially expressed component: use probe sets that vary significantly over time (P<0.01). Background component: use probe sets which fail this test (P>0.1). Periodic component: choosing those probe sets with P<0.001 results in a number of probe sets similar to that previously identified in the literature.
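The thresholding step can be sketched as a small helper. The thresholds come from the slide; everything else is an assumption of this sketch: the function name, the use of two separate P-values per probe set, and the choice to let a probe set contribute to both the differential and the periodic training sets (the slides do not say whether the sets are disjoint).

```python
def tentative_assignment(p_diff, p_periodic):
    """Return the set of components a probe set helps to initialize.

    p_diff:     P-value from the ANOVA test of differential expression.
    p_periodic: P-value from the ANOVA test of periodicity.
    Probe sets with 0.01 <= p_diff <= 0.1 are ambiguous and are used
    for neither the background nor the differential component.
    """
    labels = set()
    if p_periodic < 0.001:
        labels.add('p')
    if p_diff < 0.01:
        labels.add('d')
    elif p_diff > 0.1:
        labels.add('b')
    return labels


# Made-up P-values for four probe sets.
examples = [
    tentative_assignment(0.5, 0.5),
    tentative_assignment(0.005, 0.2),
    tentative_assignment(0.0001, 0.00001),
    tentative_assignment(0.05, 0.5),
]
```

Leaving a gap between the two differential-expression thresholds keeps borderline probe sets out of both training sets, so the prior estimates are based only on comparatively clear cases.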

23 Experimental Results Demonstrate that the model can effectively identify both sinusoidal and non-sinusoidal periodic expression patterns. It is widely believed that 5-10% of transcribed genes may be under circadian regulation, with some studies suggesting a higher proportion, up to 50% in murine liver. The datasets analyzed in this article contain gene expression profiles of liver and skeletal muscle tissues in mice.

24 Experimental Results Sine-wave detection: Use the sine-wave matching algorithm of Straume (2004). It identifies 848 distinct rhythmic probe sets in liver and 383 such probe sets in skeletal muscle. Model-based detection: Among the top 25 probe sets there are nine that were not among the top 400 ranked by sine-wave matching. Profiles that peak or drop at a single time point are poorly matched to a sinusoidal shape.
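The last point can be illustrated with a generic least-squares sinusoid fit. This is not the algorithm of Straume (2004); it is a simplified stand-in that reports the fraction of variance (R²) explained by the best-fitting sinusoid of a known period, assuming evenly spaced samples covering a whole number of cycles. The function name and the example profiles are hypothetical.

```python
import math


def sinusoid_r2(profile, period):
    """Fraction of variance explained by the best-fitting sinusoid.

    Assumes samples at t = 0, 1, ..., n - 1 with n a multiple of `period`
    and period >= 3, so the sine and cosine regressors are orthogonal
    and each has squared norm n / 2. Fails on a constant profile.
    """
    n = len(profile)
    if period < 3 or n % period != 0:
        raise ValueError("need period >= 3 and a whole number of cycles")
    mean = sum(profile) / n
    centered = [y - mean for y in profile]
    ss_total = sum(y * y for y in centered)
    a = sum(y * math.cos(2 * math.pi * t / period)
            for t, y in enumerate(centered))
    b = sum(y * math.sin(2 * math.pi * t / period)
            for t, y in enumerate(centered))
    return (2.0 / n) * (a * a + b * b) / ss_total


# Two cycles of six time points each.
sine = [math.sin(2 * math.pi * t / 6) for t in range(12)]
spike = [0, 0, 0, 1, 0, 0] * 2  # peaks at one time point in every cycle
r2_sine = sinusoid_r2(sine, 6)
r2_spike = sinusoid_r2(spike, 6)
```

The spike profile repeats exactly from cycle to cycle, yet a sinusoid explains well under half of its variance, which is the kind of profile a sine-wave ranking tends to push down the list.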

25 Experimental Results

26 Tns3 is the only probe set that ranked in the top 25 by the sine-wave method but below 400 by the model. It conforms to the sine-wave pattern but has a very small amplitude, and is assigned to the background component by the model. All of the other probe sets ranked that highly by the sine-wave method received posterior probabilities of periodicity >0.9 from our model.

27 Conclusion We argue that in typical experiments with only a small number of samples per cycle, we should test for arbitrary patterns which are repeated between cycles, rather than parametric shapes. To this end, we propose a Bayesian mixture model for identifying patterns of unconstrained shape, which stand out as both differentially and periodically expressed.

