Analyzing Time Series Gene Expression Data

1 Analyzing Time Series Gene Expression Data
Ziv Bar-Joseph Center for Automated Learning and Discovery Carnegie Mellon University 06/2004 Analyzing Time Series Expression Data

2 Expression Experiments
Time series: Multiple arrays at various temporal intervals.
Static: Snapshot of the activity in the cell.

3 Abundance of time series expression datasets
Over 30% of the 170 papers perform time series experiments. A total of 220 time series datasets. More arrays used for time series than for static expression experiments.

5 Unique features of time series expression experiments
Successive points are autocorrelated. Can identify the complete set of acting genes. Allows causality to be inferred.

6 Time Series Examples: Development
Development of fruit flies [Arbeitman, Science 02]

7 Time Series Examples (cont)
Function: infectious diseases, response to external stimulus.
Interactions and systems: transcription factor knockouts.

8 Time Series Examples: Systems
The cell cycle system in yeast [Simon et al, Cell 01]

9 Computational challenges
- Biological Networks: dynamic regulatory networks, information fusion
- Pattern Recognition: function, response programs; clustering temporal signals
- Data Analysis: alignment, diff. expressed genes; continuous representation, missing values
- Experimental Design: transcription, decay rates; sampling rates, duration

10 Sampling Rates
Non-uniform. Differ between experiments.

11 Cell Cycle Datasets
Dataset | Method of arrest | Duration | Cell cycle length | Sampling | Repeats
alpha (Spellman 98) | alpha mating factor | 0-119m | 64m | every 7 minutes | 1
cdc15 (Spellman 98) | temp. sensitive cdc15 | 10-290m | 112m | every 20m for 1 hr, every 10m for 3 hr, every 20m for final hr |
cdc28 (Cho98) | temp. sensitive cdc28 | 0-160m | 85m | every 10 minutes |
fkh1/fkh2 knockout (Zhu00) | | 0-215m | 105m | every 15m until 165m, then after 45m | 2
yox1/yhp1 knockout (Pramila02) | | 0-120m | 60m | |

12 Analyzing Time Series Expression Data
Networks, Pattern Recognition, Data Analysis, Experimental Design

13 Representing time series expression data
We are capturing a continuous process with a few samples. We need a way to convert our samples for each gene to an expression profile. Some simple techniques:
- Linear interpolation
- Spline interpolation
- Functional assignment
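
As an illustration of the simplest technique listed above, here is a minimal sketch of linear interpolation on non-uniformly sampled data. The function name `linear_interpolate` and the sample values are hypothetical and not taken from the talk.

```python
from bisect import bisect_right

def linear_interpolate(times, values, t):
    """Estimate expression at time t from sorted sample times by linear interpolation."""
    if not times[0] <= t <= times[-1]:
        raise ValueError("t is outside the sampled range")
    # Index of the first sample strictly after t, clamped so a right neighbor exists.
    hi = min(bisect_right(times, t), len(times) - 1)
    lo = hi - 1
    frac = (t - times[lo]) / (times[hi] - times[lo])
    return values[lo] + frac * (values[hi] - values[lo])

# Hypothetical gene sampled at non-uniform intervals (minutes, log ratio).
times = [0, 10, 30, 40]
values = [0.0, 1.0, -1.0, 0.0]

# Resample onto a uniform 5-minute grid to obtain an expression profile.
profile = [linear_interpolate(times, values, t) for t in range(0, 41, 5)]
```

Each estimate depends only on the two neighboring samples, so a single noisy or missing measurement directly distorts the profile around it.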

14 Standard interpolation
If we have missing values and noise, linear interpolation will fail to produce an accurate representation.

15 Splines
Instead of linear interpolation, we can use splines: piecewise polynomials. Still, they will overfit when faced with missing values and noise.
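
To make "piecewise polynomials" concrete, the sketch below fits a natural cubic spline (second derivative set to zero at both ends) through every sample. This is one standard spline variant chosen for illustration; the talk does not say which spline basis was used, and the function names and data are hypothetical.

```python
from bisect import bisect_right

def second_derivatives(x, y):
    """Second derivatives of the natural cubic spline at each knot (zero at both ends)."""
    n = len(x) - 1
    h = [x[i + 1] - x[i] for i in range(n)]
    # Tridiagonal system for the interior knots 1..n-1.
    diag = [2 * (h[i - 1] + h[i]) for i in range(1, n)]
    rhs = [6 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
           for i in range(1, n)]
    # Forward elimination (Thomas algorithm).
    for k in range(1, len(diag)):
        w = h[k] / diag[k - 1]
        diag[k] -= w * h[k]
        rhs[k] -= w * rhs[k - 1]
    # Back substitution; m[0] and m[n] stay zero.
    m = [0.0] * (n + 1)
    for k in range(len(diag) - 1, -1, -1):
        m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]
    return m

def spline_eval(x, y, m, t):
    """Evaluate the spline at t using the cubic piece that contains t."""
    i = min(max(bisect_right(x, t) - 1, 0), len(x) - 2)
    h = x[i + 1] - x[i]
    a, b = x[i + 1] - t, t - x[i]
    return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6 * h)
            + (y[i] / h - m[i] * h / 6) * a
            + (y[i + 1] / h - m[i + 1] * h / 6) * b)

# Hypothetical gene sampled at non-uniform intervals.
times = [0, 10, 30, 40]
values = [0.0, 1.0, -1.0, 0.0]
m = second_derivatives(times, values)
profile = [spline_eval(times, values, m, t) for t in range(0, 41, 5)]
```

Because the curve is forced through every sample, noise in a single measurement bends the whole neighborhood of the curve, which is the overfitting the slide warns about.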

16 The power of co-expression
We can modify our splines to take into account the fact that many genes are co-expressed.

17 Avoiding overfitting
Require that, for each gene i in class j, the gene-specific spline coefficients satisfy γ_i ~ N(0, Γ_j). Add a noise term.

18 Class Assignment
In some cases the biological classes are known in advance. The algorithm can be modified and combined with a Gaussian mixture algorithm to cluster the continuous representation of the expression data.
- Unlike previous algorithms, it works on the continuous representation of the expression profile.
- It performs much better than k-means on non-uniformly sampled data.

19 Missing values

20 Interpolation

21 Alignment
Difference in the timing of similar biological processes.
Figure panels: FKH1, cdc15, alpha, cdc28.

22 Continuous Alignment
Using the estimated splines, we continuously align two expression datasets by minimizing a global error function. This can be solved using numeric methods. It is a hard problem because of the spline segments and the different durations. We also have an algorithm for removing genes that should not align. RECOMB 2002
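
The idea of minimizing a global error over a time mapping can be sketched as follows. Everything specific here is an assumption made for illustration: the mapping is linear (`stretch * s + shift`), piecewise-linear curves stand in for the fitted splines, the error is a mean squared difference over the overlap, and a coarse grid search replaces the numeric optimizer. All names and data are hypothetical.

```python
import math

def interp(times, values, t):
    """Piecewise-linear stand-in for a fitted spline; expects times[0] <= t."""
    for k in range(1, len(times)):
        if t <= times[k]:
            frac = (t - times[k - 1]) / (times[k] - times[k - 1])
            return values[k - 1] + frac * (values[k] - values[k - 1])
    return values[-1]

def alignment_error(ref, other, stretch, shift, steps=200):
    """Mean squared difference between ref(s) and other(stretch*s + shift) on the overlap."""
    (rt, rv), (ot, ov) = ref, other
    # Reference-time interval on which both curves are defined.
    start = max(rt[0], (ot[0] - shift) / stretch)
    end = min(rt[-1], (ot[-1] - shift) / stretch)
    if end - start < 0.5 * (rt[-1] - rt[0]):
        return math.inf  # reject mappings that leave too little overlap
    total = 0.0
    for k in range(steps + 1):
        s = start + (end - start) * k / steps
        u = min(max(stretch * s + shift, ot[0]), ot[-1])  # guard against rounding
        total += (interp(rt, rv, s) - interp(ot, ov, u)) ** 2
    return total / (steps + 1)

def align(ref, other, stretches, shifts):
    """Grid search for the (stretch, shift) pair with the smallest error."""
    return min((alignment_error(ref, other, a, b), a, b)
               for a in stretches for b in shifts)[1:]

# Hypothetical data: the second experiment runs at half speed and starts 10 minutes late.
ref_times = list(range(0, 121, 5))
ref = (ref_times, [math.sin(2 * math.pi * t / 60) for t in ref_times])
other_times = list(range(10, 251, 10))
other = (other_times, [math.sin(2 * math.pi * ((u - 10) / 2) / 60) for u in other_times])

best = align(ref, other, [1.0, 1.5, 2.0, 2.5], [0, 5, 10, 15, 20])
```

A single mapping is shared across the whole dataset here; summing the error over many genes would make the estimate more robust than aligning one gene at a time.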

23 Identifying differentially expressed genes
Wild type vs. knockout: hard to perform manual comparison. Sampling rates and different timing prevent direct comparison.
Local vs. global methods:
- Using the original samples is infeasible since they are not on the same scale.
- Sampling the curves might lead to errors since it does not take systematic biases into account.
- Need a global measure and a probability distribution.
Zhu et al, Nature 2000

24 Using Global Error to Determine Significance
Key idea: Combine individual noise model with a global error (area between curves) that correctly captures the temporal difference between the two profiles.
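
The global error term can be illustrated with a simple numerical estimate of the area between two curves. This sketch uses the absolute difference and the trapezoid rule, both assumptions; it does not include the noise model or the significance test, and the two profiles are hypothetical.

```python
import math

def area_between(f, g, start, end, steps=1000):
    """Trapezoid-rule estimate of the integral of |f(t) - g(t)| over [start, end]."""
    width = (end - start) / steps
    gaps = [abs(f(start + k * width) - g(start + k * width)) for k in range(steps + 1)]
    return width * (sum(gaps) - 0.5 * (gaps[0] + gaps[-1]))

def wild_type(t):
    """Hypothetical wild-type profile: one cycle every 60 minutes."""
    return math.sin(2 * math.pi * t / 60)

def knockout(t):
    """Hypothetical knockout profile: same timing, half the amplitude."""
    return 0.5 * math.sin(2 * math.pi * t / 60)

global_error = area_between(wild_type, knockout, 0, 60)
```

Unlike a point-by-point comparison, this measure accumulates the difference over the whole time course, so it does not depend on where the original samples happened to fall.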

25 Comparing the continuous representation
WT Knockout

26 Enrichment for the Cell Cycle Factors

27 Overcoming population effects
Microarray experiments profile a population of cells. Initially the cells are synchronized, but they lose their synchronization over time. We need to compensate for synchronization loss in order to recover single-cell values.
Figure: Smc3, observed values.
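
A toy simulation shows why the population average drifts away from the single-cell signal. The model is an assumption made only for illustration: every cell follows the same sinusoidal profile, but cycle lengths differ by up to 10% between cells. It is not the compensation method from the talk.

```python
import math

def population_signal(t, cycle=60.0, spread=0.1, cells=201):
    """Average expression at time t over cells whose cycle lengths differ slightly."""
    total = 0.0
    for k in range(cells):
        # Cycle length of cell k, evenly spaced within +/- spread of the nominal length.
        length = cycle * (1 + spread * (2 * k / (cells - 1) - 1))
        total += math.sin(2 * math.pi * t / length)
    return total / cells

# A single cell with the nominal cycle peaks at 15 minutes and again at 255 minutes.
first_peak = population_signal(15.0)
later_peak = population_signal(255.0)
```

In this model the first peak is almost fully preserved in the population average, while the peak four cycles later is strongly damped because the cells have drifted out of phase.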

28 Analyzing Time Series Expression Data
Networks, Pattern Recognition, Individual Gene, Experimental Design

29 Pattern recognition and clustering
Identifying relationships between genes based on expression profiles. Handling non-uniform sampling rates. Determining relationships between clusters.

30 Time Shifted and Inverted Profiles
Qian et al, Journal of Molecular Biology 2001

31 Results
- Simultaneous expression profile relationships
- Inverted expression profile relationships
- Time-delayed expression profile relationships
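
The three relationship types can be illustrated with a lagged Pearson correlation. This is a simplified stand-in chosen for illustration, not the scoring method of the cited paper, and it assumes uniformly sampled profiles. The function names and data are hypothetical.

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def strongest_lag(a, b, max_lag=3):
    """Return (lag, r) with the largest |r| between a[t] and b[t + lag]."""
    scored = [(lag, pearson(a[:len(a) - lag], b[lag:])) for lag in range(max_lag + 1)]
    return max(scored, key=lambda item: abs(item[1]))

# Hypothetical profiles: `delayed` repeats `gene` two time points later,
# `inverted` mirrors it.
gene = [0, 1, 3, 2, 0, -1, -2, 0, 1, 0]
delayed = [0, 0] + gene[:-2]
inverted = [-x for x in gene]

lag_delayed, r_delayed = strongest_lag(gene, delayed)
lag_inverted, r_inverted = strongest_lag(gene, inverted, max_lag=0)
```

A lag of zero with positive r corresponds to a simultaneous relationship, a positive lag to a time-delayed one, and a negative r to an inverted one.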

32 Hierarchical clustering
For n leaves there are n-1 internal nodes. Each flip in an internal node creates a new linear ordering. There are 2^(n-1) possible linear orderings of the leaves of the tree.
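
The count of orderings can be checked by enumerating every combination of flips on a small tree. The nested-tuple representation and the function name are assumptions made for this sketch.

```python
def leaf_orderings(tree):
    """Yield every leaf ordering obtainable by flipping internal nodes of a binary tree."""
    if not isinstance(tree, tuple):
        yield [tree]  # a single leaf has exactly one ordering
        return
    left, right = tree
    for a in leaf_orderings(left):
        for b in leaf_orderings(right):
            yield a + b  # children in their original order
            yield b + a  # children flipped

# Hypothetical tree with n = 5 leaves and therefore 4 internal nodes.
tree = (((1, 2), 3), (4, 5))
orderings = list(leaf_orderings(tree))
```

Each internal node contributes an independent factor of two, so this tree yields 2^4 = 16 distinct orderings.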

33 Determine Relations Between Clusters
Optimal leaf ordering selects the ordering that maximizes the sum of the similarities of adjacent leaves in the clustering tree.
Figure panels: initial structure, permuted, hierarchical clustering.
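
The objective can be demonstrated by brute force on a tiny tree: enumerate every ordering reachable by flips and keep the one with the largest adjacent-similarity sum. This exhaustive search is exponential in the number of leaves and is shown only to make the objective concrete; practical implementations use dynamic programming. The tree, similarity matrix, and function names are hypothetical.

```python
def flipped_orderings(tree):
    """Yield every leaf ordering obtainable by flipping internal nodes of a binary tree."""
    if not isinstance(tree, tuple):
        yield [tree]
        return
    left, right = tree
    for a in flipped_orderings(left):
        for b in flipped_orderings(right):
            yield a + b
            yield b + a

def adjacent_similarity(order, sim):
    """Sum of similarities between neighboring leaves in a linear ordering."""
    return sum(sim[a][b] for a, b in zip(order, order[1:]))

def optimal_leaf_ordering(tree, sim):
    """Exhaustively pick the tree-consistent ordering with maximal adjacent similarity."""
    return max(flipped_orderings(tree), key=lambda order: adjacent_similarity(order, sim))

# Hypothetical leaves 0..3 whose similarity falls off with the difference of their labels.
sim = [[-abs(a - b) for b in range(4)] for a in range(4)]
tree = ((1, 0), (3, 2))  # initial ordering 1, 0, 3, 2
best_order = optimal_leaf_ordering(tree, sim)
```

The tree structure is unchanged; only the left and right children are swapped, so the clustering itself stays the same while the displayed order becomes easier to read.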

34 Results – Synthetic Data
Figure panels: input, hierarchical clustering, optimal ordering.

35 Analyzing Time Series Expression Data
24 cell cycle experiments

36 Short Time Series
60% of the time series datasets are short (fewer than 7 time points). A large number of signals are measured, the data is very noisy, and experiments are compared across all time points. Most clustering algorithms will miss small sets and, in addition, cannot be used to compare datasets.

37 Taking advantage of the small number of points

38 Analyzing Time Series Expression Data
Networks, Pattern Recognition, Individual Gene, Experimental Design

39 Systems Biology
Different types of data provide partial information about the activity in the cell. By integrating these data sources we can obtain a better picture of the activity in the cell. There is a lot of current interest, though relatively few methods construct temporal models.

40 Dynamic Bayesian Networks
Bayesian networks are graphical models that can account for the stochasticity in the data. They can be extended to handle time series data (dynamic Bayesian networks). So far they have been used for small-scale modeling.

41 Modeling tryptophan metabolism in E. coli
Ong et al, Bioinformatics 2002

42 Genetic RegulAtory Modules (GRAM)
Gene module: a set of genes that are co-regulated and co-expressed.
Functional module: a collection of gene modules with related function.

43 Assembly of the Cell Cycle Transcriptional Regulatory Network
Blue boxes: gene modules.
We combine GRAM with our continuous alignment algorithms to construct a dynamic model for a sub-network.

44 Assembly of the Cell Cycle Transcriptional Regulatory Network
Blue boxes: gene modules.
Individual regulators: ovals, connected to their modules.
Dashed line: extends from the module encoding a regulator to the regulator protein oval.

45 Comparing the Continuous Representation
WT Knockout

46 Assembly of the Cell Cycle Transcriptional Regulatory Network
Blue boxes: gene modules.
Individual regulators: ovals, connected to their modules.
Dashed line: extends from the module encoding a regulator to the regulator protein oval.

47 Summary
Time series expression data can be used to answer important biological questions.
Pros: autocorrelation, allows for causal inference, provides a better view of cellular activity.
Cons: large number of signals but small number of time points, noise, lack of repeats.
By using methods specifically developed for this data we can overcome the above problems and take advantage of its unique properties.

48 Want to know more?
Z. Bar-Joseph, "Analyzing time series gene expression data", Bioinformatics, in press.
