A New Approach to Analyzing Gene Expression Time Series Data Ziv Bar-Joseph Georg Gerber David K. Gifford Tommi S. Jaakkola Itamar Simon Learning Seminar:


1 A New Approach to Analyzing Gene Expression Time Series Data Ziv Bar-Joseph Georg Gerber David K. Gifford Tommi S. Jaakkola Itamar Simon Learning Seminar: Bioinformatics & Other Applications Prof. Nathan Intrator Presented By: Adam Segoli Schubert May 16, 2005

2 Overview Gene Expression Time Series Statistical Analysis of Time-Series DNA Microarray Gene Expression Time-Series Analyzing Gene Expression Time-Series Data Estimating Unobserved Expression Values and Time Points What is a Spline? Using the Splines Parameters Analysis Aligning Time-Series Data Aligning Temporal Data Using Splines Results – Unobserved Data Estimation Result - Aligning Temporal Data References

3 Gene Expression

4 Time-Series A series of values of variables taken in successive periods of time Time Points Sampling Intervals (constant / inconstant) A well established area in statistical analysis of data is dedicated to the study of time-series

5 Statistical Analysis of Time-Series Two main goals: Identifying the nature of the phenomenon Predicting unobserved values of the time-series variable

6 DNA Microarray Allows the monitoring of expression levels of thousands of genes under a variety of conditions. The data from microarray experiments is usually in the form of a large matrix. Very expensive.

7 Gene Expression Time-Series Determined by measuring mRNA levels or protein concentrations Commonly very short (e.g. 4 to 20 samples) Usually unevenly sampled The measuring techniques are extremely noise-prone and/or subject to bias in the biological measurements.

8 Analyzing Gene Expression Time-Series Data Estimating Unobserved Expression Values and Time Points Aligning Time-Series Data

9 Estimating Unobserved Expression Values and Time Points Row Average or Filling with Zeros Singular Value Decomposition (SVD) Weighted K-Nearest Neighbors Linear Interpolation
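Two of the baseline estimators listed on this slide, row average and linear interpolation, can be sketched in a few lines. This is only an illustration: the tiny expression matrix, the sampling times, and the function names `fill_row_average` and `fill_linear` are assumptions made for the example, with missing values marked as NaN.

```python
import numpy as np

def fill_row_average(expr):
    """Replace each NaN with the mean of the observed values in its row (gene)."""
    filled = expr.copy()
    row_means = np.nanmean(expr, axis=1)
    rows, cols = np.where(np.isnan(expr))
    filled[rows, cols] = row_means[rows]
    return filled

def fill_linear(expr, times):
    """Replace each NaN by linear interpolation between observed neighbours in time."""
    filled = expr.copy()
    for row in filled:  # each row is a view, so the assignment edits `filled`
        missing = np.isnan(row)
        row[missing] = np.interp(times[missing], times[~missing], row[~missing])
    return filled

# Hypothetical data: 2 genes, 4 unevenly spaced time points, one gap per gene.
times = np.array([0.0, 1.0, 2.0, 4.0])
expr = np.array([[1.0, np.nan, 3.0, 5.0],
                 [2.0, 2.0, np.nan, 8.0]])

print(fill_row_average(expr))
print(fill_linear(expr, times))
```

Row averaging ignores the time axis entirely, while linear interpolation uses the sampling times but treats each gene in isolation.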

10 A New Analysis Approach By using Cubic Splines.

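11 What is a Spline? A special curve defined piecewise by polynomials. Given k points $t_i$, called knots, in an interval $[a,b]$ with $a = t_0 \le t_1 \le \dots \le t_{k-1} = b$, the parametric curve $S : [a,b] \to \mathbb{R}$ is called a Spline of degree n if $S \in C^{n-1}[a,b]$ and $S|_{[t_i, t_{i+1}]} \in P_n$ for $i = 0, \dots, k-2$, that is, each piece is a polynomial of degree at most n. A Cubic Spline if n = 3.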
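One common way to build such a curve is as a weighted sum of B-spline basis functions, each of which is itself a piecewise cubic. The sketch below evaluates the basis with the Cox-de Boor recursion. The B-spline representation, the uniform knot vector, and the coefficient values are assumptions chosen for illustration; the slide itself only gives the general definition.

```python
import numpy as np

def bspline_basis(j, degree, knots, t):
    """Cox-de Boor recursion: value at t of the j-th B-spline of the given degree."""
    if degree == 0:
        return 1.0 if knots[j] <= t < knots[j + 1] else 0.0
    left = right = 0.0
    span_left = knots[j + degree] - knots[j]
    span_right = knots[j + degree + 1] - knots[j + 1]
    if span_left > 0:
        left = (t - knots[j]) / span_left * bspline_basis(j, degree - 1, knots, t)
    if span_right > 0:
        right = ((knots[j + degree + 1] - t) / span_right
                 * bspline_basis(j + 1, degree - 1, knots, t))
    return left + right

def spline_value(coeffs, knots, t, degree=3):
    """Evaluate the spline sum_j coeffs[j] * B_j(t)."""
    return sum(c * bspline_basis(j, degree, knots, t) for j, c in enumerate(coeffs))

# Uniform knots -3, -2, ..., 7 give 11 - 3 - 1 = 7 cubic basis functions,
# and the spline is fully defined on the interval [0, 4).
knots = np.arange(-3.0, 8.0)
coeffs = np.array([0.0, 1.0, 3.0, 2.0, 0.5, 1.5, 1.0])  # hypothetical values

for t in (0.0, 0.5, 1.7, 3.9):
    print(t, spline_value(coeffs, knots, t))
```

On the interval where all basis functions are available, the basis sums to one, so the curve stays within the range of its coefficients.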

12 Using the Splines We obtain a continuous-time formulation by using cubic splines to represent gene expression curves. Spline control points are uniformly spaced. We constrain spline coefficients of co-expressed genes to have the same covariance matrix.

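13 Estimating Unobserved Data Using Splines Given c gene classes, the value of gene i (of class j) as observed at time t can be written as $Y_i(t) = s(t)(\mu_j + \gamma_i) + \epsilon_i$, where $s(t)$ is the vector of spline basis functions evaluated at t, $\mu_j$ is the class average of the spline coefficients, $\gamma_i$ is the gene-specific variation, and $\epsilon_i$ is measurement noise.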

14 Estimating Unobserved Data Using Splines Resampling: gene i can be evaluated at any unobserved time point t’. Estimating missing values: averaging of the observed values using the class covariance matrix, class average and the gene-specific variation, where these parameters are determined by a probabilistic model.

15 Estimating Unobserved Data Using Splines Parameters Analysis $Y_i$ – Vector of observed expression values for gene i. $S_i$ – m × q matrix for the m observations of gene i.
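The role of the class average, class covariance matrix and gene-specific variation can be illustrated with a small numerical sketch. It assumes the linear mixed-effects form $Y_i = S_i(\mu_j + \gamma_i) + \epsilon_i$ with Gaussian $\gamma_i$ (covariance `Gamma`) and noise variance `sigma2`, in which case the posterior mean of the gene-specific variation is $(\sigma^2 \Gamma^{-1} + S_i^T S_i)^{-1} S_i^T (Y_i - S_i \mu_j)$. Everything concrete here is an assumption: the parameters are fixed by hand rather than learned by the probabilistic model, and a truncated-power cubic basis stands in for whatever spline basis the authors use.

```python
import numpy as np

def cubic_basis(times, knots):
    """Truncated-power cubic spline basis: columns 1, t, t^2, t^3, (t - k)_+^3."""
    t = np.asarray(times, dtype=float)
    cols = [np.ones_like(t), t, t**2, t**3]
    cols += [np.clip(t - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def gene_specific_variation(y, S, mu, Gamma, sigma2):
    """Posterior mean of gamma_i given observations y and fixed class parameters."""
    A = sigma2 * np.linalg.inv(Gamma) + S.T @ S
    return np.linalg.solve(A, S.T @ (y - S @ mu))

def resample(t_new, knots, mu, gamma):
    """Evaluate the fitted curve of one gene at arbitrary time points."""
    return cubic_basis(t_new, knots) @ (mu + gamma)

# Hypothetical class parameters (q = 5 coefficients) and one gene with a gap at t = 2.
knots = [2.0]
mu = np.array([1.0, 0.5, -0.2, 0.02, 0.1])
Gamma = 0.5 * np.eye(5)
sigma2 = 0.1
times_obs = [0.0, 1.0, 3.0, 4.0]
S = cubic_basis(times_obs, knots)                   # the m x q matrix S_i
y = S @ mu + np.array([0.3, 0.1, -0.2, 0.4])        # observed values Y_i

gamma_hat = gene_specific_variation(y, S, mu, Gamma, sigma2)
print(resample([2.0], knots, mu, gamma_hat))        # estimate at the missing time
```

With only four observations and five coefficients, an individual least-squares fit would be underdetermined; the class covariance term is what keeps the system solvable and pulls the gene toward its class average.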

16 Aligning Time-Series Data Dynamic Time Warping Developed for voice recognition purposes in the 1970s. Dynamic Programming The method of John Aach & George M. Church operates on individual genes.
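The dynamic-programming idea behind dynamic time warping can be sketched as follows. This is the generic textbook recurrence with an absolute-difference local cost, not the specific variant of Aach & Church; the function name `dtw_distance` is an assumption for the example.

```python
import numpy as np

def dtw_distance(a, b):
    """Minimal cumulative cost of warping sequence a onto sequence b."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# The second series repeats one value; warping absorbs the repeat at no cost.
print(dtw_distance([0.0, 1.0, 2.0], [0.0, 1.0, 1.0, 2.0]))
```

Because the warp is computed from the samples of a single pair of sequences, short and noisy series give it little to work with, which is the motivation for aligning a whole set of genes at once.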

17 Aligning Temporal Data Using Splines Operates on a set of genes. Assume we have two spline curves for gene i, one from each time series. We define a mapping function T(s) = t.

18 Aligning Temporal Data Using Splines We define the alignment error for each gene. Alignment limits: starting point, ending point.

19 Aligning Temporal Data Using Splines We define the error for a set of genes S of size n as a weighted sum of the per-gene alignment errors, with weighting coefficients that sum to one (uniform / nonuniform).

20 Aligning Temporal Data Using Splines The mapping function (T(s) = t) can then be found by minimizing the alignment error over the gene set, using standard non-linear optimization techniques.
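The alignment procedure on the preceding slides can be sketched end to end under some simplifying assumptions, all of them introduced here for illustration: the mapping is taken to be linear, T(s) = a*s + b; the per-gene error is the mean squared difference between the two curves on a fixed grid (a discrete stand-in for a normalized integral between the alignment limits); sine and cosine play the role of fitted spline curves; and a plain grid search replaces the non-linear optimizer.

```python
import numpy as np

def alignment_error(g_ref, g_other, a, b, s_grid):
    """Mean squared difference between g_other(s) and g_ref(T(s)), T(s) = a*s + b."""
    diff = g_other(s_grid) - g_ref(a * s_grid + b)
    return np.mean(diff ** 2)

def set_error(pairs, weights, a, b, s_grid):
    """Weighted sum of per-gene alignment errors; weights are assumed to sum to one."""
    return sum(w * alignment_error(g1, g2, a, b, s_grid)
               for w, (g1, g2) in zip(weights, pairs))

def fit_mapping(pairs, weights, s_grid, a_values, b_values):
    """Grid search for the (a, b) that minimizes the set error."""
    candidates = ((set_error(pairs, weights, a, b, s_grid), a, b)
                  for a in a_values for b in b_values)
    _, a_best, b_best = min(candidates, key=lambda c: c[0])
    return a_best, b_best

# Hypothetical gene curves: the second series is the first, stretched and shifted.
pairs = [(np.sin, lambda s: np.sin(1.5 * s + 0.5)),
         (np.cos, lambda s: np.cos(1.5 * s + 0.5))]
weights = [0.5, 0.5]                       # uniform weights
s_grid = np.linspace(0.0, 4.0, 81)
a_values = np.arange(0.5, 2.01, 0.25)
b_values = np.arange(-1.0, 1.01, 0.25)

a_hat, b_hat = fit_mapping(pairs, weights, s_grid, a_values, b_values)
print(a_hat, b_hat)
```

Sharing one mapping across all genes in the set means every gene contributes evidence about the same two parameters, which is what makes the estimate more robust than a per-gene warp.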

21 Results – Unobserved Data Estimation Comparison of the new approach with: Linear Interpolation Spline interpolation using individual genes K-Nearest neighbors (KNN) k = 20


24 Result - Aligning Temporal Data Aligned three yeast cell-cycle gene expression time series


26 Thank You! Any Questions?

27 References
C. S. Moller-Levet. Clustering of Gene Expression Time-Series Data.
N. A. Campbell, J. B. Reece, and L. G. Mitchell. Biology. Fifth Edition.
J. Aach and G. M. Church. Aligning gene expression time series with time warping algorithms. Bioinformatics, 17:495-508, 2001.
C. de Boor. A practical guide to splines. Springer, 1978.
P. D’haeseleer, X. Wen, S. Fuhrman, and R. Somogyi. Linear modeling of mRNA expression levels during CNS development and injury. In PSB99, 1999.
G. James and T. Hastie. Functional linear discriminant analysis for irregularly sampled curves. Journal of the Royal Statistical Society, to appear, 2001.
R. Sharan and R. Shamir. Algorithmic approaches to clustering gene expression data. Current Topics in Computational Biology, to appear.
O. Troyanskaya, M. Cantor, et al. Missing value estimation methods for DNA microarrays. Bioinformatics, 17:520-525, 2001.

