Analyzing circadian expression data by harmonic regression based on autoregressive spectral estimation Rendong Yang and Zhen Su Division of Bioinformatics,

Slides:



Advertisements
Similar presentations
Yinyin Yuan and Chang-Tsun Li Computer Science Department
Advertisements

Gene Set Enrichment Analysis Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
DSCI 5340: Predictive Modeling and Business Forecasting Spring 2013 – Dr. Nick Evangelopoulos Exam 1 review: Quizzes 1-6.
Uncertainty in fall time surrogate Prediction variance vs. data sensitivity – Non-uniform noise – Example Uncertainty in fall time data Bootstrapping.
Introduction Relative weights can be estimated by fitting a linear model using responses from individual trials: where g is the linking function. Relative.
Gene Set Enrichment Analysis Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
Darya Chudova, Alexander Ihler, Kevin K. Lin, Bogi Andersen and Padhraic Smyth BIOINFORMATICS Gene expression Vol. 25 no , pages
Correlation Aware Feature Selection Annalisa Barla Cesare Furlanello Giuseppe Jurman Stefano Merler Silvano Paoli Berlin – 8/10/2005.
1 Detection and Analysis of Impulse Point Sequences on Correlated Disturbance Phone G. Filaretov, A. Avshalumov Moscow Power Engineering Institute, Moscow.
LHC Collimation Working Group – 19 December 2011 Modeling and Simulation of Beam Losses during Collimator Alignment (Preliminary Work) G. Valentino With.
Clustered alignments of gene- expression time series data Adam A. Smith, Aaron Vollrath, Cristopher A. Bradfield and Mark Craven Department of Biosatatistics.
Volkan Cevher, Marco F. Duarte, and Richard G. Baraniuk European Signal Processing Conference 2008.
Clustering short time series gene expression data Jason Ernst, Gerard J. Nau and Ziv Bar-Joseph BIOINFORMATICS, vol
Using Gene Ontology Models and Tests Mark Reimers, NCI.
‘Gene Shaving’ as a method for identifying distinct sets of genes with similar expression patterns Tim Randolph & Garth Tan Presentation for Stat 593E.
Chapter 11 Multiple Regression.
Significance Tests P-values and Q-values. Outline Statistical significance in multiple testing Statistical significance in multiple testing Empirical.
Today Concepts underlying inferential statistics
CH#3 Fourier Series and Transform
Zbigniew LEONOWICZ, Tadeusz LOBOS Wroclaw University of Technology Wroclaw University of Technology, Poland International Conference.
BOX JENKINS METHODOLOGY
Traffic modeling and Prediction ----Linear Models
Normalization of the Speech Modulation Spectra for Robust Speech Recognition Xiong Xiao, Eng Siong Chng, and Haizhou Li Wen-Yi Chu Department of Computer.
(2) Ratio statistics of gene expression levels and applications to microarray data analysis Bioinformatics, Vol. 18, no. 9, 2002 Yidong Chen, Vishnu Kamat,
Lecture 1 Signals in the Time and Frequency Domains
2015 AprilUNIVERSITY OF HAIFA, DEPARTMENT OF STATISTICS, SEMINAR FOR M.A 1 Hastie, Tibshirani and Friedman.The Elements of Statistical Learning (2nd edition,
TIME SERIES by H.V.S. DE SILVA DEPARTMENT OF MATHEMATICS
10 IMSC, August 2007, Beijing Page 1 An assessment of global, regional and local record-breaking statistics in annual mean temperature Eduardo Zorita.
Improved Gene Expression Programming to Solve the Inverse Problem for Ordinary Differential Equations Kangshun Li Professor, Ph.D Professor, Ph.D College.
1 Enviromatics Environmental time series Environmental time series Вонр. проф. д-р Александар Маркоски Технички факултет – Битола 2008 год.
Controlling FDR in Second Stage Analysis Catherine Tuglus Work with Mark van der Laan UC Berkeley Biostatistics.
Signals CY2G2/SE2A2 Information Theory and Signals Aims: To discuss further concepts in information theory and to introduce signal theory. Outcomes:
Bioinformatics Expression profiling and functional genomics Part II: Differential expression Ad 27/11/2006.
A A R H U S U N I V E R S I T E T Faculty of Agricultural Sciences Introduction to analysis of microarray data David Edwards.
CS5263 Bioinformatics Lecture 20 Practical issues in motif finding Final project.
Discriminant Analysis Discriminant analysis is a technique for analyzing data when the criterion or dependent variable is categorical and the predictor.
Identification of cell cycle-related regulatory motifs using a kernel canonical correlation analysis Presented by Rhee, Je-Keun Graduate Program in Bioinformatics.
1 Searching for Periodic Gene Expression Patterns Using Lomb-Scargle Periodograms Critical Assessment.
Peng Lei Beijing University of Aeronautics and Astronautics IGARSS 2011, Vancouver, Canada July 26, 2011 Radar Micro-Doppler Analysis and Rotation Parameter.
CCN COMPLEX COMPUTING NETWORKS1 This research has been supported in part by European Commission FP6 IYTE-Wireless Project (Contract No: )
Class 23, 2001 CBCl/AI MIT Bioinformatics Applications and Feature Selection for SVMs S. Mukherjee.
LIGO- G Z Analyzing Event Data Lee Samuel Finn Penn State University Reference: T030017, T
Innovative Paths to Better Medicines Design Considerations in Molecular Biomarker Discovery Studies Doris Damian and Robert McBurney June 6, 2007.
Z bigniew Leonowicz, Wroclaw University of Technology Z bigniew Leonowicz, Wroclaw University of Technology, Poland XXIX  IC-SPETO.
LIGO-G Z Status of MNFTmon Soma Mukherjee +, Roberto Grosso* + University of Texas at Brownsville * University of Nuernberg, Germany LSC Detector.
Cluster validation Integration ICES Bioinformatics.
A comparative study of survival models for breast cancer prognostication based on microarray data: a single gene beat them all? B. Haibe-Kains, C. Desmedt,
Evaluation of gene-expression clustering via mutual information distance measure Ido Priness, Oded Maimon and Irad Ben-Gal BMC Bioinformatics, 2007.
Autoregressive (AR) Spectral Estimation
WCRP Extremes Workshop Sept 2010 Detecting human influence on extreme daily temperature at regional scales Photo: F. Zwiers (Long-tailed Jaeger)
Analysis of Traction System Time-Varying Signals using ESPRIT Subspace Spectrum Estimation Method Z. Leonowicz, T. Lobos
Jump to first page Inferring Sample Findings to the Population and Testing for Differences.
Detecting Signal from Data with Noise Xianyao Chen Meng Wang, Yuanling Zhang, Ying Feng Zhaohua Wu, Norden E. Huang Laboratory of Data Analysis and Applications,
Performance of Digital Communications System
An unsupervised conditional random fields approach for clustering gene expression time series Chang-Tsun Li, Yinyin Yuan and Roland Wilson Bioinformatics,
Outline Random variables –Histogram, Mean, Variances, Moments, Correlation, types, multiple random variables Random functions –Correlation, stationarity,
2 June 2003Soumya D. Mohanty et al, LIGO-G D1 LASER INTERFEROMETER GRAVITATIONAL WAVE OBSERVATORY - LIGO – CALIFORNIA INSTITUTE OF TECHNOLOGY.
CLASSIFICATION OF ECG SIGNAL USING WAVELET ANALYSIS
CH#3 Fourier Series and Transform 1 st semester King Saud University College of Applied studies and Community Service 1301CT By: Nour Alhariqi.
Loco: Distributing Ridge Regression with Random Projections Yang Song Department of Statistics.
Biomathematics seminar Application of Fourier to Bioinformatics Girolamo Giudice.
Part II Physical Layer Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Signal, Noise, and Variation in Neural and Sensory-Motor Latency
Volume 11, Issue 3, Pages (March 2018)
15.1 The Role of Statistics in the Research Process
Ultradian Rhythms in the Transcriptome of Neurospora crassa
Volume 11, Issue 3, Pages (March 2018)
Geology 491 Spectral Analysis
Maria S. Robles, Sean J. Humphrey, Matthias Mann  Cell Metabolism 
Chap 7: Seasonal ARIMA Models
Presentation transcript:

Analyzing circadian expression data by harmonic regression based on autoregressive spectral estimation Rendong Yang and Zhen Su Division of Bioinformatics, State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, China Vol pages i168~i174 BIOINFORMATICS

Outline Introduction Methods Results and Discussion Conclusion

Introduction Circadian rhythmic analysis of temporal microarray data is very challenging: – Short time-series – Low sampling frequency – High level of noise

Introduction Two major categories : time-domain and frequency-domain – Time domain Typically, sinusoid-based. Model dependent. Noise sensitive COSOPT (Straume, 2004) – Frequency domain Spectral analysis Noise-tolerant Fisher’s G-test (Wichert et al., 2004)

Introduction The proposed algorithm named ARSER – Combines time domain and frequency domain analysis – Employ autoregressive spectral estimation to predict an expression profile’s periodicity – Model the rhythmic patterns by using a harmonic regression model to fit the time-series

Methods Overview

Methods Period detection – Circadian rhythm has approximately (but never exactly) 24h periodicity – To measure the length of actual period for each gene expression profile – ARSER estimate the period by AR spectral estimation, which is a high-resolution spectral analysis

Methods – ARSER implements the AR model-fitting Setting Compute the AR coefficients – AR spectral analysis After AR modeling, estimates the spectrum, with model parameters instead of original data If periodic signals are present in the time-series, then the spectrum will show peaks at dominant frequencies. At high frequencies the noise signals may also show peaks known a s pseudo-periods.

Methods – Step-by-step procedure:

Methods Rhythm modeling and gene selection – ARSER employs the harmonic regression model to represent the cycle trends – Reduce to an simpler multiple linear regression model :

Methods Generating simulation data – Consist of periodic and non-periodic time-series data – Periodic time-series Stationary model Non-stationary model – The mean level and amplitude exponentially decay over time – Non-stationary model is more likely to approximate the circadian rhythm

Results and Discussion Robustness to noise – Compare the robustness of three algorithm to noise Generate stationary and non-stationary periodic signals – There were 2000 periodic time-series under each SNR, 5:1, 4:1, 3:1, 2:1, 1:1

Results and Discussion – Accuracy of ARSER, COSOPT and Fisher’s G-test at identifying (A) stationary and (B) non-stationary periodic signals under decreasing signal-to-noise ratio Matthew’s correlation coefficient (MCC) is in essence a correlation coefficient between the observed the predicted binary classifications, with values between -1 and +1.

Results and Discussion Correctness for predicting wavelength – Distribution of the difference between the predicted wavelength and the actual wavelength of stationary periodic signals and non- stationary periodic signals Log transformation for the difference (  ) was carried out using ln(1+|  |)

Results and Discussion Periodicity detection with random background models – To separate periodic from non-periodic signals, which measure the sensitivity and specificity in the predictions

Results and Discussion – (receiver operating characteristic) ROC curves for identifying periodic signals from four datasets (A)10000 stationary periodic signals and white noise signals (B)10000 non-stationary periodic signal and white noise signals (C)10000 stationary periodic signals and AR(1)-based random signals (D)10000 non-statioinary periodic signals and AR(1)-based random signals

Results and Discussion Detection of non-sinusoidal periodic waveforms – The testing dataset was downloaded from the HAYSTACJ web site ( – This dataset included five cycling patterns based on circadian time-course studies : rigid, spike, sine and two box-like patterns – A total of 120 time-series in the dataset.

Results and Discussion – Comparison of detecting multiple waveforms (A) Number of samples from five periodic waveforms and random signal (B) Waveforms include sinusoidal (red) and non- sinusoidal :rigid waves, box1 waves, box2 wave and spike waves.

Results and Discussion Analysis of Arabidopsis circadian expression data – Apply ARSER to analyze a real microarray dataset Arabidopsis circadian system ( – 13 data points, representing 48h of observation obtained at 4h samploing intervals

Results and Discussion – Area-proportional Venn diagram addresses the predictive power of three algorithm for identifying Arabidopsis circadian-regulated genes The microarray data were originally analyzed by COSOPT in the study of Edwards et al. (2006) and scored 3504 genes as rhythmic(pMMC-  <0.05) A total of 4929 genes were identified by ARSER (FDR q<0.05) Only 536 were found by Fisher’s G-test (FDR q<0.05)

Results and Discussion – A set of 1549 transcripts were uniquely identified as rhythmic by ARSER – Principal component analysis of the newly found rhythmic transcripts in Arabidopsis identified by ARSER (A)Relative variance for the first nine components (B)The First eigengenes (C)The second eigengenes (D)The Third eigengenes.

Results and Discussion – The first three principal components account for 78% of the variance, – The first and second eigengenes are cyclic with spike-like patterns – The third show a linear trend, indicating the non- stationary feature of the data. – These data reveal that non-sinusoidal and non- stationary periodic transcripts could be found by applying ARSER

Results and Discussion – Dodd et al. (2007) reported 27 well-known clock- associated genes in Arabidopsis – Two of these genes were found among the newly identified genes CR1 : functions as a photoreceptor PRR9 : play an important role in response to temperature signals Rank the genes of the entire Arabidopsis genome in order of the statistical significance of their expression profiles

Conclusion Employ harmonic regression based on AR spectral analysis to identify and model circadian rhythms Analyze the time-series through both frequency and time domains