Analyzing circadian expression data by harmonic regression based on autoregressive spectral estimation Rendong Yang and Zhen Su Division of Bioinformatics,


1 Analyzing circadian expression data by harmonic regression based on autoregressive spectral estimation Rendong Yang and Zhen Su Division of Bioinformatics, State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, China Vol. 26 2010 pages i168~i174 BIOINFORMATICS

2 Outline Introduction Methods Results and Discussion Conclusion

3 Introduction Circadian rhythmic analysis of temporal microarray data is very challenging: – Short time-series – Low sampling frequency – High level of noise

4 Introduction Two major categories of methods: time-domain and frequency-domain – Time domain: typically sinusoid-based, model-dependent and noise-sensitive, e.g. COSOPT (Straume, 2004) – Frequency domain: spectral analysis, noise-tolerant, e.g. Fisher’s G-test (Wichert et al., 2004)

5 Introduction The proposed algorithm, named ARSER: – Combines time-domain and frequency-domain analysis – Employs autoregressive spectral estimation to predict an expression profile’s periodicity – Models the rhythmic patterns by fitting a harmonic regression model to the time-series

6 Methods Overview

7 Methods Period detection – Circadian rhythm has approximately (but never exactly) 24h periodicity – The actual period length therefore has to be measured for each gene expression profile – ARSER estimates the period by AR spectral estimation, which is a high-resolution spectral analysis method

8 Methods – ARSER implements the AR model-fitting Setting Compute the AR coefficients – AR spectral analysis After AR modeling, the spectrum is estimated from the model parameters instead of the original data If periodic signals are present in the time-series, then the spectrum will show peaks at dominant frequencies. At high frequencies the noise signals may also show peaks, known as pseudo-periods.
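The idea of reading a period off an AR spectrum can be sketched in a few lines of standard-library Python. This is a minimal illustration, not ARSER's implementation: the slide does not say which AR estimator is used, so the Yule-Walker equations (solved with the Levinson-Durbin recursion) are an assumption here, and the function names `autocovariance`, `levinson_durbin`, `ar_spectrum` and `dominant_period` are hypothetical.

```python
import math


def autocovariance(x, max_lag):
    """Biased sample autocovariance r[0..max_lag] of a mean-centred series."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[t] * d[t + k] for t in range(n - k)) / n
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for x_t = sum_k a_k x_{t-k} + e_t.

    Returns the AR coefficients [a_1, ..., a_order] and the noise variance.
    """
    a, err = [], r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[j] * r[m - 1 - j] for j in range(m - 1))
        k = acc / err  # reflection coefficient of step m
        a = [a[j] - k * a[m - 2 - j] for j in range(m - 1)] + [k]
        err *= 1.0 - k * k
    return a, err


def ar_spectrum(a, sigma2, freqs, dt=1.0):
    """AR power spectrum sigma2*dt / |1 - sum_k a_k exp(-i 2 pi f k dt)|^2."""
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f * dt
        re = 1.0 - sum(ak * math.cos(w * (k + 1)) for k, ak in enumerate(a))
        im = sum(ak * math.sin(w * (k + 1)) for k, ak in enumerate(a))
        power.append(sigma2 * dt / (re * re + im * im))
    return power


def dominant_period(x, dt, order, n_grid=500):
    """Period (in units of dt) at the highest peak of the AR spectrum."""
    a, sigma2 = levinson_durbin(autocovariance(x, order), order)
    nyquist = 1.0 / (2.0 * dt)
    freqs = [k * nyquist / n_grid for k in range(1, n_grid)]
    power = ar_spectrum(a, sigma2, freqs, dt)
    return 1.0 / freqs[power.index(max(power))]
```

Only the single highest peak is returned here. A fuller treatment would inspect all peaks and discard those at high frequencies, since the slide notes that noise can produce pseudo-periods there.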

9 Methods – Step-by-step procedure:

10 Methods Rhythm modeling and gene selection – ARSER employs the harmonic regression model to represent the cycle trends – This reduces to a simpler multiple linear regression model:
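The equation itself did not survive in this transcript. As a hedged illustration of the general idea only: a cosine with unknown amplitude and phase, A cos(2πt/T + φ), can be rewritten as a cos(2πt/T) + b sin(2πt/T), which is linear in a and b once the period T is fixed (for example by the spectral step). The sketch below fits that textbook form by ordinary least squares; the function names and the returned dictionary layout are hypothetical, and the exact parameterization used by ARSER is not confirmed by the slide.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def harmonic_regression(times, values, periods):
    """Least-squares fit of mean + sum_i A_i cos(2 pi t / T_i + phi_i).

    Each harmonic is fitted as a_i cos(.) + b_i sin(.), which is linear in
    (a_i, b_i); amplitude and phase are recovered afterwards.
    """
    X = [[1.0] + [g(2.0 * math.pi * t / T)
                  for T in periods for g in (math.cos, math.sin)]
         for t in times]
    p = len(X[0])
    # Normal equations (X^T X) beta = X^T y
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)]
           for i in range(p)]
    Xty = [sum(row[i] * y for row, y in zip(X, values)) for i in range(p)]
    beta = solve_linear(XtX, Xty)
    components = []
    for k, T in enumerate(periods):
        a, b = beta[1 + 2 * k], beta[2 + 2 * k]
        components.append({"period": T,
                           "amplitude": math.hypot(a, b),
                           "phase": math.atan2(-b, a)})
    return beta[0], components
```

Solving the normal equations directly is adequate for a handful of harmonics; a QR-based solver would be the more robust choice for larger designs.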

11 Methods Generating simulation data – The simulated data consist of periodic and non-periodic time-series – Periodic time-series: stationary model and non-stationary model – In the non-stationary model, the mean level and amplitude decay exponentially over time – The non-stationary model is more likely to approximate the circadian rhythm
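A rough sketch of how such profiles could be generated is given below. The slide does not give the simulation formulas, so everything here is an assumption for illustration: a cosine around a baseline, a single exponential envelope applied to both mean level and amplitude, additive Gaussian noise, and the hypothetical function name `simulate_profile` with its parameters.

```python
import math
import random


def simulate_profile(times, period=24.0, amplitude=1.0, phase=0.0,
                     baseline=0.0, decay_rate=0.0, noise_sd=0.0, rng=None):
    """Simulate one periodic expression profile.

    decay_rate == 0 gives a stationary cosine around `baseline`;
    decay_rate > 0 damps both the mean level and the amplitude by
    exp(-decay_rate * t), giving a non-stationary profile.
    """
    rng = rng or random.Random()
    profile = []
    for t in times:
        envelope = math.exp(-decay_rate * t)
        signal = baseline + amplitude * math.cos(2.0 * math.pi * t / period + phase)
        profile.append(envelope * signal + rng.gauss(0.0, noise_sd))
    return profile
```

For example, `simulate_profile(range(0, 49, 4), decay_rate=0.02, noise_sd=0.2, rng=random.Random(0))` yields 13 points over 48h at 4h spacing. How `noise_sd` maps onto the SNR levels used in the evaluation depends on the SNR definition, which this transcript does not state.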

12 Results and Discussion Robustness to noise – Compare the robustness of the three algorithms to noise Generate 10000 stationary and 10000 non-stationary periodic signals – There were 2000 periodic time-series under each SNR: 5:1, 4:1, 3:1, 2:1, 1:1

13 Results and Discussion – Accuracy of ARSER, COSOPT and Fisher’s G-test at identifying (A) stationary and (B) non-stationary periodic signals under decreasing signal-to-noise ratio Matthews correlation coefficient (MCC) is in essence a correlation coefficient between the observed and the predicted binary classifications, with values between -1 and +1.
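For reference, MCC is computed from the four cells of the confusion matrix. The helper below is a small sketch of the standard formula; the function name `matthews_corrcoef` and the convention of returning 0.0 when the denominator vanishes are choices made for this illustration.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0.0:
        # An empty row or column of the confusion matrix: report 0 by convention.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

A value of +1 means every signal was classified correctly, 0 is no better than chance, and -1 means every prediction was wrong.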

14 Results and Discussion Correctness for predicting wavelength – Distribution of the difference between the predicted wavelength and the actual wavelength of stationary periodic signals and non-stationary periodic signals Log transformation for the difference (Δ) was carried out using ln(1+|Δ|)

15 Results and Discussion Periodicity detection with random background models – The aim is to separate periodic from non-periodic signals, which measures the sensitivity and specificity of the predictions

16 Results and Discussion – Receiver operating characteristic (ROC) curves for identifying periodic signals from four datasets (A) 10000 stationary periodic signals and 10000 white noise signals (B) 10000 non-stationary periodic signals and 10000 white noise signals (C) 10000 stationary periodic signals and 10000 AR(1)-based random signals (D) 10000 non-stationary periodic signals and 10000 AR(1)-based random signals
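An ROC curve of this kind is obtained by sweeping a decision threshold over the per-signal scores and recording the false positive and true positive rates at each step. The sketch below assumes that a higher score means "more likely periodic" (with p-values one could pass, for instance, 1 - p); the function names `roc_curve` and `auc` are hypothetical and not taken from the presentation.

```python
def roc_curve(scores, labels):
    """Return ROC points (FPR, TPR), from (0, 0) to (1, 1).

    labels are truthy for periodic signals and falsy for background
    signals; tied scores are handled as a single threshold step.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = len(pairs) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Both classes must be present in `labels`, otherwise one of the rates is undefined.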

17 Results and Discussion Detection of non-sinusoidal periodic waveforms – The testing dataset was downloaded from the HAYSTACK web site (http://haystack.cgrb.oregonstate.edu) – This dataset included five cycling patterns based on circadian time-course studies: rigid, spike, sine and two box-like patterns – The dataset contains a total of 120 time-series.

18 Results and Discussion – Comparison of detecting multiple waveforms (A) Number of samples from five periodic waveforms and random signal (B) Waveforms include sinusoidal (red) and non-sinusoidal: rigid waves, box1 waves, box2 waves and spike waves.

19 Results and Discussion Analysis of Arabidopsis circadian expression data – Apply ARSER to analyze a real microarray dataset from the Arabidopsis circadian system (http://millar.cio.ed.ac.uk/data.htm) – 13 data points, representing 48h of observation obtained at 4h sampling intervals

20 Results and Discussion – Area-proportional Venn diagram showing the predictive power of the three algorithms for identifying Arabidopsis circadian-regulated genes The microarray data were originally analyzed by COSOPT in the study of Edwards et al. (2006), which scored 3504 genes as rhythmic (pMMC-β < 0.05) A total of 4929 genes were identified by ARSER (FDR q<0.05) Only 536 were found by Fisher’s G-test (FDR q<0.05)

21 Results and Discussion – A set of 1549 transcripts was uniquely identified as rhythmic by ARSER – Principal component analysis of the newly found rhythmic transcripts in Arabidopsis identified by ARSER (A) Relative variance for the first nine components (B) The first eigengene (C) The second eigengene (D) The third eigengene.

22 Results and Discussion – The first three principal components account for 78% of the variance – The first and second eigengenes are cyclic with spike-like patterns – The third shows a linear trend, indicating the non-stationary feature of the data – These data reveal that non-sinusoidal and non-stationary periodic transcripts could be found by applying ARSER

23 Results and Discussion – Dodd et al. (2007) reported 27 well-known clock-associated genes in Arabidopsis – Two of these genes were found among the newly identified genes CR1: functions as a photoreceptor PRR9: plays an important role in response to temperature signals – The 22810 genes of the entire Arabidopsis genome were ranked in order of the statistical significance of their expression profiles

24 Conclusion Employ harmonic regression based on AR spectral analysis to identify and model circadian rhythms Analyze the time-series through both frequency and time domains

