Analysis of time-course gene expression data
Shyamal D. Peddada
Biostatistics Branch
National Institute of Environmental Health Sciences (NIH)
Research Triangle Park, NC
Outline of the talk
Some objectives for performing “long series” time-course experiments
A. Single cell-cycle experiment
– A nonlinear regression model
– Phase angle of a cell-cycle gene
– Inference
– Open research problems
B. Multiple cell-cycle experiments
– “Coherence” between multiple cell-cycle experiments
– Illustration
– Open research problems
Objectives
Some genes play an important role during the cell division cycle. They are known as “cell-cycle genes”.
Objective: investigate various characteristics of cell-cycle and/or circadian genes, such as:
– Amplitude of initial expression
– Period
– Phase angle of expression (the angle of maximum expression for a cell-cycle gene)
A brief description
G1 phase (“Gap 1”): for many cells, this phase is the major period of cell growth during their lifespan.
S (“Synthesis”) phase: DNA replication occurs.
A brief description
G2 phase (“Gap 2”): cells prepare for M phase. The G2 checkpoint prevents cells from entering mitosis if their DNA has been damaged since the last division. This provides an opportunity for DNA repair and stops the proliferation of damaged cells.
M (“Mitosis”) phase: nuclear division (chromosomes separate) and cytoplasmic division (cytokinesis) occur. Mitosis is further divided into 4 phases.
Whitfield et al. (Molecular Biology of the Cell, 2002)
Basic design:
– Experimental units: human cancer cells (HeLa)
– Microarray platform: cDNA chips with approximately 43,000 probes (roughly 29,000 genes)
– 3 different patterns of time points (i.e. 3 different experiments)
One of the goals of these experiments was to identify periodically expressed genes.
Whitfield et al. (Molecular Biology of the Cell, 2002)
Experiment 1 (26 time points): HeLa cancer cells arrested in S phase using a double thymidine block.
Sampling times after arrest (h): 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 18, 20, 22, 24, 26, 28, 32, 36, 40, 44.
Whitfield et al. (2002)
Experiment 2 (47 time points): HeLa cancer cells arrested in S phase using a double thymidine block.
Sampling times after arrest (h): every hour from 0 to 46.
Whitfield et al. (2002)
Experiment 3 (19 time points): HeLa cancer cells arrested in M phase using thymidine followed by nocodazole.
Sampling times after arrest (h): 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36.
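To make the three sampling designs easier to compare, the schedules can be written down and checked directly. This Python sketch only transcribes the sampling times listed on the three preceding slides. The variable and function names are our own. It confirms the stated numbers of time points and shows that only Experiment 1 uses unequal spacing.

```python
# Sampling times (hours after arrest) for the three Whitfield et al. (2002)
# experiments, transcribed from the slides.
EXP1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
        14, 15, 16, 18, 20, 22, 24, 26, 28, 32, 36, 40, 44]
EXP2 = list(range(0, 47))      # every hour from 0 to 46
EXP3 = list(range(0, 37, 2))   # every 2 hours from 0 to 36


def spacing(times):
    """Sorted list of the distinct gaps between consecutive sampling times."""
    return sorted({b - a for a, b in zip(times, times[1:])})


for name, t in [("Exp 1", EXP1), ("Exp 2", EXP2), ("Exp 3", EXP3)]:
    print(name, len(t), "time points; gaps (h):", spacing(t))
```

The differing lengths and spacings are one reason the experiments cannot be treated as exact replicates of one another.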
Questions
– Can we describe the gene expression of a cell-cycle gene as a function of time?
– Can we determine the phase angle of a given cell-cycle gene? That is, can we quantify the previous table in terms of angles on a circle?
– What is the period of expression for a given gene?
– Can we test the hypothesis that all cell-cycle genes share the same period?
– Etc.
Some important observations
1. Gene expression has a sinusoidal shape.
2. The measured expression of a given gene is an average of mRNA levels across a large number of cells.
3. The duration of the cell cycle varies stochastically across cells.
4. Initially the cells are synchronized, but over time they fall out of synchrony.
5. Consequently, the expression of a cell-cycle gene is expected to “decay” over time. This follows from items 2 and 4: as the cells desynchronize, their individual peaks no longer coincide, so the population average is damped.
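Random Periods Model (PNAS, 2004)
$$f(t) = a + bt + K\exp\left(-\frac{2\pi^2\sigma^2 t}{T^2}\right)\cos\left(\frac{2\pi t}{T} + \phi\right)$$
– a and b: background drift parameters
– K: the initial amplitude
– T: the average period
– σ: the attenuation parameter
– φ: the phase angle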
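The following is a minimal Python sketch of the RPM mean curve. It assumes the form $f(t) = a + bt + K e^{-2\pi^2\sigma^2 t/T^2}\cos(2\pi t/T + \phi)$, with a, b, K, T, σ and φ as defined on this slide. The function name and the example values are hypothetical. The sketch only evaluates the curve; it does not fit the model to data.

```python
import math


def rpm_mean(t, a, b, K, T, sigma, phi):
    """Expected expression at time t under the assumed random-periods form.

    a + b*t is the background drift. The cosine has initial amplitude K,
    average period T and phase angle phi. Its amplitude is damped by
    exp(-2*pi^2*sigma^2*t/T^2) as cells drift out of synchrony.
    """
    damping = math.exp(-2 * math.pi**2 * sigma**2 * t / T**2)
    return a + b * t + K * damping * math.cos(2 * math.pi * t / T + phi)


# Hypothetical parameter values, for illustration only.
print([round(rpm_mean(t, 0.0, 0.0, 1.0, 15.0, 1.0, 0.0), 3) for t in (0, 15, 30)])
```

With sigma = 0 the damping factor is 1 and the curve is a pure sinusoid plus drift. The attenuation parameter therefore carries the loss of synchrony on its own.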
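A hypothesis of biological interest
Do all cell-cycle genes share the same T and the same σ, while the other 4 parameters (a, b, K, φ) are gene specific? That is, for genes g = 1, …, G,
$$H_0: T_1 = T_2 = \cdots = T_G \quad\text{and}\quad \sigma_1 = \sigma_2 = \cdots = \sigma_G.$$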
An Important Feature
Correlated data:
– Temporal correlation within a gene
– Gene-to-gene correlations
Test Statistic
Wald statistic for heteroscedastic linear and nonlinear models:
– Zhang, Peddada and Rogol (2000)
– Shao (1992)
– Wu (1986)
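Below is a minimal numpy sketch of the generic Wald statistic for a linear hypothesis $R\theta = r$:

$$W = (R\hat\theta - r)^\top (R\hat V R^\top)^{-1} (R\hat\theta - r).$$

The covariance estimate $\hat V$ is taken as an input. The heteroscedasticity-robust estimators of the cited papers are not reproduced here, and the function and argument names are our own. For the hypothesis of a common period, the rows of R would encode differences such as $T_g - T_{g+1}$.

```python
import numpy as np


def wald_statistic(theta_hat, cov_hat, R, r=None):
    """Wald statistic for H0: R @ theta = r.

    theta_hat: (p,) parameter estimates.
    cov_hat:   (p, p) estimated covariance of theta_hat (e.g. a
               heteroscedasticity-robust estimate supplied by the caller).
    R:         (q, p) constraint matrix.
    r:         (q,) right-hand side, zero by default.
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    R = np.atleast_2d(np.asarray(R, dtype=float))
    r = np.zeros(R.shape[0]) if r is None else np.asarray(r, dtype=float)
    diff = R @ theta_hat - r                          # departure from H0
    middle = R @ np.asarray(cov_hat, dtype=float) @ R.T
    return float(diff @ np.linalg.solve(middle, diff))


# Two genes with hypothetical period estimates; H0: T1 = T2.
print(wald_statistic([16.0, 15.0], np.diag([0.25, 0.25]), [[1.0, -1.0]]))
```

Using `np.linalg.solve` avoids forming an explicit inverse of $R\hat V R^\top$.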
The Null Distribution
Due to the underlying correlation structure:
– The asymptotic approximation is not appropriate.
– Use the moving-blocks bootstrap technique on the residuals of the nonlinear model (Künsch, 1989).
Moving-blocks Bootstrap
Step 1: Fit the null model to the data and compute the residuals.
Step 2: Draw a simple random sample, with replacement, from all possible blocks of consecutive residuals of a fixed length (the block size).
Moving-blocks Bootstrap
Step 3: Add these resampled residuals to the fitted curve under the null hypothesis to obtain a bootstrap data set.
Step 4: Using the bootstrap data, fit the model under the alternative hypothesis and compute the Wald statistic.
Moving-blocks Bootstrap
Step 5: Repeat Steps 2 to 4 a large number of times.
Step 6: The bootstrap p-value is the proportion of these bootstrap Wald statistics that exceed the Wald statistic computed from the actual data.
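Steps 2 to 6 can be sketched in Python as follows. Step 1 (fitting the null model) is assumed to have been done already. `stat_fn` is a hypothetical callback that refits the alternative model to a bootstrap data set and returns its Wald statistic. We also assume, as a design choice not stated on the slides, that residuals are stored as a time × gene array and resampled by whole rows. This keeps the gene-to-gene correlation at each time point, while consecutive rows within a block keep the temporal correlation.

```python
import numpy as np


def moving_block_resample(residuals, block_len, rng):
    """Concatenate randomly chosen blocks of consecutive rows (Step 2).

    residuals: array of shape (n_times,) or (n_times, n_genes);
    requires block_len <= n_times.
    """
    residuals = np.asarray(residuals, dtype=float)
    n = residuals.shape[0]
    n_starts = n - block_len + 1            # number of possible blocks
    n_blocks = -(-n // block_len)           # ceil(n / block_len)
    starts = rng.integers(0, n_starts, size=n_blocks)
    blocks = [residuals[s:s + block_len] for s in starts]
    return np.concatenate(blocks, axis=0)[:n]   # trim to original length


def bootstrap_pvalue(fitted_null, residuals, stat_fn, observed_stat,
                     block_len, n_boot=999, seed=0):
    """Moving-blocks bootstrap p-value (Steps 2 to 6).

    fitted_null: fitted values under H0, same shape as residuals.
    stat_fn: hypothetical callback returning the Wald statistic
             for one bootstrap data set.
    """
    rng = np.random.default_rng(seed)
    fitted_null = np.asarray(fitted_null, dtype=float)
    exceed = 0
    for _ in range(n_boot):
        y_star = fitted_null + moving_block_resample(residuals, block_len, rng)  # Step 3
        if stat_fn(y_star) > observed_stat:                                    # Step 4
            exceed += 1
    return exceed / n_boot                                                     # Step 6
```

The block length trades off two things. Longer blocks preserve more of the serial correlation, while shorter blocks give more distinct resamples.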
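Analysis of experiment 2
The bootstrap p-value for testing the hypothesis of a common period T and attenuation parameter across genes, using the Experiment 2 data of Whitfield et al. (2002), is 0.12. Since this hypothesis is not rejected, the data are consistent with all cell-cycle genes sharing the same period and attenuation, which supports the biological plausibility of our model.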
Statistical inferences on the phase angle: multiple experiments
Some questions of interest
– How can we evaluate or combine results from multiple cell division cycle experiments?
– Are the results “consistent” across experiments? How can this be evaluated, and what would be a suitable criterion?
Data: $\hat{\phi}_{gi}$, the RPM estimate of the phase angle of cell-cycle gene g from the i-th experiment.
Representation using a circle
Consider 4 cell-cycle genes A, B, C, D. The vertical line in the circle denotes the reference line, and angles are measured counter-clockwise from it. Thus the sequential order of expression in this example is A, B, D, C.
[Figure: genes A, B, C, D marked on a circle]
“Coherence” in multiple cell-cycle experiments
A group of cell-cycle genes is said to be coherent across experiments if the sequential order of their phase angles is preserved across experiments.
[Figure: genes A, B, C, D on three circles, one each for Exp 1, Exp 2 and Exp 3]
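One way to check coherence of a set of (estimated) phase angles is sketched below in Python, under our own reading of the definition. Genes are sorted by angle measured counter-clockwise from the reference line. Orders are then compared across experiments up to a cyclic shift, since the starting gene on a circle is arbitrary. Treating cyclic shifts as equivalent is our assumption. The angles are assumed distinct, and the function names are hypothetical.

```python
import math


def cyclic_order(angles):
    """Genes sorted by phase angle, counter-clockwise from the reference line.

    angles: dict mapping gene name -> angle in radians (assumed distinct).
    """
    return sorted(angles, key=lambda g: angles[g] % (2 * math.pi))


def same_cyclic_order(order1, order2):
    """True if order2 is a cyclic shift of order1."""
    if sorted(order1) != sorted(order2):
        return False
    if not order1:
        return True
    k = order2.index(order1[0])
    return order2[k:] + order2[:k] == order1


def coherent(experiments):
    """True if every experiment (dict gene -> angle) shows the same cyclic order."""
    orders = [cyclic_order(e) for e in experiments]
    return all(same_cyclic_order(orders[0], o) for o in orders[1:])


exp1 = {"A": 0.1, "B": 1.0, "D": 2.0, "C": 3.0}
print(cyclic_order(exp1))
```

Applied to hypothetical angles matching the four-gene example, `cyclic_order` returns A, B, D, C. A second experiment that starts the cycle at C (C, A, B, D) would still count as coherent.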
Geometric Representation
We represent phase angles from multiple cell-cycle experiments using concentric circles, one circle per experiment. The same gene in a pair of experiments is connected by a line segment.
– A figure with non-intersecting lines indicates perfect coherence.
– If there is no coherence at all, there will be many intersecting lines.
Estimated Phase Angles
Due to estimation error, the estimated phase angles from multiple cell-cycle experiments need not preserve the sequential order, even when the true phase angles do.
Some background on regression for circular data
[Figure: two circles, labeled Experiment A and Experiment B]
Question: Can we determine a rotation matrix A such that rotating the circle representing Experiment A yields the circle representing Experiment B?
Angle of rotation for a rigid body
Yes, by solving the following minimization problem:
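As one concrete choice, assume the criterion is a least-squares fit of unit vectors $u(\phi) = (\cos\phi, \sin\phi)^\top$:

$$\min_\theta \sum_g \|u(\phi_{gB}) - A(\theta)\,u(\phi_{gA})\|^2,$$

where $A(\theta)$ is the rotation by $\theta$. This may not be the exact criterion used in the talk. Each term equals $2 - 2\cos(d_g - \theta)$ with $d_g = \phi_{gB} - \phi_{gA}$. Minimizing is therefore the same as maximizing $C\cos\theta + S\sin\theta$, where $C = \sum_g \cos d_g$ and $S = \sum_g \sin d_g$. The maximum is at $\hat\theta = \operatorname{atan2}(S, C)$, the mean direction of the differences. A Python sketch follows; the helper name is hypothetical.

```python
import math


def rotation_angle(phi_a, phi_b):
    """Least-squares rotation angle carrying angles phi_a onto phi_b.

    Returns the mean direction of the differences phi_b - phi_a, in [0, 2*pi).
    Undefined (atan2(0, 0)) if the differences have no preferred direction.
    """
    d = [b - a for a, b in zip(phi_a, phi_b)]
    s = sum(math.sin(x) for x in d)
    c = sum(math.cos(x) for x in d)
    return math.atan2(s, c) % (2 * math.pi)


phi_a = [0.1, 1.0, 2.0, 5.9]
phi_b = [(x + 0.7) % (2 * math.pi) for x in phi_a]   # rigid rotation by 0.7
print(rotation_angle(phi_a, phi_b))
```

Working with sines and cosines of the differences makes the estimate insensitive to how individual angles wrap around 2π.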
Determination of Coherence Across “k” Experiments
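The Basic Idea
Consider a rigid body rotating in a plane, and suppose the body is perfectly rigid, with no deformations. Let $A_{i,i+1}$, $i = 1, \dots, k$, denote the $2 \times 2$ rotation matrices from experiment i to experiment i+1 (with k+1 = 1). Then
$$A_{12}A_{23}\cdots A_{k1} = I.$$
Alternatively,
$$A_{12}A_{23}\cdots A_{k-1,k} = A_{k1}^{-1} = A_{1k}.$$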
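The Basic Idea
Equivalently, if the rotation from experiment i to experiment i+1 is through the angle $\theta_{i,i+1}$, i.e.
$$A_{i,i+1} = \begin{pmatrix} \cos\theta_{i,i+1} & -\sin\theta_{i,i+1} \\ \sin\theta_{i,i+1} & \cos\theta_{i,i+1} \end{pmatrix},$$
then under perfect rigid-body motion we should have
$$\theta_{12} + \theta_{23} + \cdots + \theta_{k1} \equiv 0 \pmod{2\pi}.$$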
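Here is a small numerical check of the rigid-body identity, written as a Python sketch with our own function names. Rotations taken around the cycle of experiments compose to the identity exactly when their angles sum to a multiple of 2π. Because planar rotations commute, the product depends only on the sum of the angles.

```python
import numpy as np


def rot(theta):
    """2x2 counter-clockwise rotation matrix through angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def cycle_product(angles):
    """Product A_12 A_23 ... A_k1 of the rotations around the cycle."""
    out = np.eye(2)
    for theta in angles:
        out = out @ rot(theta)
    return out


def closure_angle(angles):
    """Net rotation around the cycle, reduced to [-pi, pi]; zero under rigid motion."""
    total = sum(angles)
    return float(np.arctan2(np.sin(total), np.cos(total)))


print(np.round(cycle_product([0.4, 1.1, -1.5]), 12))   # angles sum to 0
print(closure_angle([0.4, 1.1, -1.0]))                  # net rotation 0.5
```

With estimated angles, the closure angle will not be exactly zero even when the experiments agree. This is why the slides that follow turn to a statistical criterion.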
Problem!
In the present context we do NOT necessarily have a rigid body:
– Not all experiments are performed with the same precision.
– The time axis may not be the same across experiments.
– The number of time points may differ across experiments.
– Etc.
Example: Not a rigid motion but perfectly coherent
Consequence
A rotation matrix A alone may not be enough to bring two circles into congruence. An additional “association/scaling” parameter may be needed, as seen in the previous figure.
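Circular-Circular regression model for a pair of experiments (Downs and Mardia, 2002)
For genes g = 1, …, G, let $(\theta_{1g}, \theta_{2g})$ denote a pair of angular variables. Suppose that, given $\theta_{1g}$, $\theta_{2g}$ is von Mises distributed with mean direction $\mu(\theta_{1g})$ and concentration parameter $\kappa$.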
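Circular-Circular Regression Model (Downs and Mardia, 2002)
The regression model is given by the link function
$$\tan\left(\frac{\mu(\theta_{1g}) - \beta_0}{2}\right) = \omega \tan\left(\frac{\theta_{1g} - \beta_1}{2}\right),$$
where $\beta_0$ and $\beta_1$ are angular location parameters and $\omega \in [-1, 1]$ is the slope (association) parameter.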
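Below is a Python sketch of the mean direction implied by the Downs and Mardia link $\tan\{(\mu - \beta_0)/2\} = \omega\tan\{(\theta_1 - \beta_1)/2\}$. Here ω plays the role of the “association/scaling” parameter. With ω = 1 the map is a pure rotation by $\beta_0 - \beta_1$. With 0 < ω < 1 it warps angles unevenly but preserves their cyclic order, i.e. the relationship is coherent without being rigid. The function name is ours. The atan2 form is our choice to avoid dividing by zero when $\theta_1 - \beta_1 = \pi$; it agrees with the tan/arctan form modulo 2π.

```python
import math


def dm_mean_direction(theta1, beta0, beta1, omega):
    """Mean direction mu(theta1) under the Downs-Mardia link, in [0, 2*pi).

    Satisfies tan((mu - beta0)/2) = omega * tan((theta1 - beta1)/2).
    """
    half = (theta1 - beta1) / 2
    mu = beta0 + 2 * math.atan2(omega * math.sin(half), math.cos(half))
    return mu % (2 * math.pi)


# omega = 1: rotation by beta0 - beta1 = 0.3
print(dm_mean_direction(1.0, 0.5, 0.2, 1.0))
# omega = 0.5: angles are warped but stay in the same circular order
print([round(dm_mean_direction(x, 0.0, 0.0, 0.5), 3) for x in [0.1, 1, 2, 3, 4, 5, 6]])
```

Fitting β0, β1, ω and κ (e.g. by maximum likelihood) is not shown here.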
Determination Of Coherence Suppose we have K experiments, labeled as 1, 2, 3, …, K. Let denote the angle of rotation for the regression of i on j for a group of g genes. Compute Note.
Determination Of Coherence
We expect the statistic computed above to be “stochastically” larger under no coherence than under coherence.
Comparison of Cumulative Distribution Functions
[Figure: cumulative distribution functions under coherence (blue line) and under no coherence (pink line)]
Determination Of Coherence
For the given data, compute the statistic. Then generate its bootstrap distribution under the null hypothesis of no coherence.
Bootstrap P-value For Coherence Let denote the angle of rotation using the bootstrap sample. Then the P-value is:
Illustration: Whitfield et al. data
There are 3 experiments. The phase angle of each gene was estimated using the model of Liu et al. (2004). A total of 47 cell-cycle genes common to the three experiments were selected.
Estimates The estimated values of interest are Note that
Conclusion
Since the bootstrap P-value is < 0.05, we reject the null hypothesis of no coherence and conclude that the three experiments are coherent.
Statistical inferences on the phase angle - Some open problems
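Estimation subject to inequality constraints
It is reasonable to hypothesize that, for a normal cell division cycle, the p phase-marker genes must peak in a fixed order around the unit circle. Thus their phase angles must satisfy the circular simple order
$$\phi_1 \preceq \phi_2 \preceq \cdots \preceq \phi_p \preceq \phi_1,$$
i.e. starting from $\phi_1$ and moving counter-clockwise, the angles are encountered in the order $\phi_2, \ldots, \phi_p$ before returning to $\phi_1$.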
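Whether a vector of distinct angles satisfies such a circular order can be checked by summing counter-clockwise increments. The increments $(\phi_{i+1} - \phi_i) \bmod 2\pi$, including the step from $\phi_p$ back to $\phi_1$, add up to 2π times the number of turns around the circle. The order holds exactly when that is a single turn. This Python sketch (our own helper, assuming p ≥ 2 distinct angles in radians) only tests the constraint; it does not estimate angles subject to it.

```python
import math

TWO_PI = 2 * math.pi


def satisfies_circular_order(phis):
    """True if phis[0], ..., phis[p-1] occur in this order going counter-clockwise.

    Assumes p >= 2 distinct angles in radians.
    """
    p = len(phis)
    # Sum of counter-clockwise steps phi_i -> phi_{i+1}, wrapping phi_p -> phi_1.
    total = sum((phis[(i + 1) % p] - phis[i]) % TWO_PI for i in range(p))
    return math.isclose(total, TWO_PI, rel_tol=1e-9)   # exactly one full turn


print(satisfies_circular_order([0.1, 1.0, 2.0, 3.0]))   # in order
print(satisfies_circular_order([0.1, 2.0, 1.0, 3.0]))   # two genes swapped
```

Because only increments modulo 2π enter, the check does not depend on where the reference line is placed.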
Open problems: data from a single experiment
– How to estimate the phase angles subject to the simple order restriction?
– More generally, how to estimate the phase angles subject to an isotropic simple order restriction?
– How to test the above hypothesis? What are the null and alternative hypotheses?
Open problems: data from multiple experiments
– How do we estimate the phase angles from multiple experiments under the order restriction on the phase angles of cell-cycle genes?
– What are the statistical errors associated with such an estimator?
– How do we construct confidence intervals and test hypotheses?
Acknowledgments
Delong Liu (former postdoc at NIEHS)
David Umbach (NIEHS)
Leping Li (NIEHS)
Clare Weinberg (NIEHS)
Pat Crockett (Constella Group)
Cristina Rueda (Univ. of Valladolid, Spain)
Miguel Fernandez (Univ. of Valladolid, Spain)