Clustered alignments of gene-expression time series data Adam A. Smith, Aaron Vollrath, Christopher A. Bradfield and Mark Craven Department of Biostatistics.

Presentation transcript:

Clustered alignments of gene-expression time series data
Adam A. Smith, Aaron Vollrath, Christopher A. Bradfield and Mark Craven
Department of Biostatistics & Medical Informatics, Department of Computer Sciences and Department of Oncology, University of Wisconsin, Madison, USA
BIOINFORMATICS Vol. 25, pages i119-i127, 2009

Outline
– Introduction
– Method
  – SCOW
  – Clustered alignments
– Results and Discussion
– Conclusion

Introduction
– Characterizing and comparing temporal gene-expression responses is an important computational task for answering a variety of questions in biological studies.
– One application: toxicogenomics, characterizing the potential toxicity of chemicals.

Introduction
– Answering similarity queries: assess similarity by determining the temporal correspondence between the query and each treatment.

Introduction
Two issues:
– First: (Treatment B) all genes should be aligned together; (Treatment C) some genes need to be warped separately.
– Second: the best alignment does not always account for the complete extent of both time series. We allow a type of local alignment in which the end of one series is left unaligned ("shorting" the alignment).

Introduction
– Multi-segment alignment method.
– Shorting: an alignment path that represents shorting ends in the top row or the right column of the alignment-space diagram, but not in the top-right cell.

Introduction
To solve "all genes are assumed to be aligned in lockstep with one another":
– Calculate clustered alignments.
– Find clusters of genes such that the genes within a cluster share a common alignment.
– Each cluster is aligned independently of the others.
– Similar to k-means: the method alternates between assigning genes to clusters and recomputing the alignment for each cluster using the genes assigned to it.
To solve "alignment for the complete extent of both time series":
– Multi-segment alignment
– Shorting

Method – SCOW (Shorting COW)
COW (Nielsen et al., 1998):
– A dynamic programming algorithm designed to find an optimal alignment between two series with multiple channels of information (such as genes).
– Briefly, it aligns and scores two given time series based on their similarity.
– The two series are denoted q (the query series) and d (the database series).
– The series are partitioned into m segments, in which the i-th segments of the two series correspond to each other.
– The score of a given alignment is the sum of the correlations between corresponding segments.
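The scoring rule on this slide (sum of correlations between corresponding segments) can be illustrated with a minimal single-channel sketch. It assumes that each segment of d is linearly resampled to the length of the matching segment of q before correlating; the function names and the inclusive knot convention are hypothetical choices made for illustration, not taken from the paper.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation of two equal-length 1-D arrays (0.0 if either is constant)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    denom = np.sqrt((xc @ xc) * (yc @ yc))
    return float(xc @ yc / denom) if denom > 0 else 0.0

def resample(series, n):
    """Linearly interpolate a 1-D series onto n evenly spaced points."""
    series = np.asarray(series, dtype=float)
    old_grid = np.linspace(0.0, 1.0, len(series))
    new_grid = np.linspace(0.0, 1.0, n)
    return np.interp(new_grid, old_grid, series)

def alignment_score(q, d, q_knots, d_knots):
    """Sum of per-segment correlations.

    Assumption: segment i spans knots[i]..knots[i+1], endpoints included.
    """
    total = 0.0
    for i in range(len(q_knots) - 1):
        q_seg = q[q_knots[i]:q_knots[i + 1] + 1]
        d_seg = d[d_knots[i]:d_knots[i + 1] + 1]
        total += pearson(q_seg, resample(d_seg, len(q_seg)))
    return total

q = [0, 1, 2, 3, 4, 3, 2, 1, 0]
d = [0, 2, 4, 2, 0]  # same shape as q, sampled half as densely
score = alignment_score(q, d, q_knots=[0, 4, 8], d_knots=[0, 2, 4])
```

With m = 2 perfectly matching segments, the score reaches its maximum of 2. A multi-channel version would differ only in how the per-segment correlation is computed.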

Method – SCOW
– The segments are assumed to be of constant length, and usually evenly spaced, in q; their lengths are variable in d.
– COW searches for good segment boundaries in only a limited area of the alignment space.
– The vector K contains the coordinates of the knots (segment endpoints) in q.

Method – SCOW
– COW fills a zero-indexed matrix of dimensions (m+1) by (|d|+1).
– Each element contains the score of the best alignment of d from zero to x and q from zero to k.
– Segments are scored with the Pearson correlation.
– q(a,b): the subseries of q from a to b; d(a,b) is defined likewise.
– The predecessor function lists the valid starting locations in d for segments ending at x.
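The dynamic program described on this slide might be sketched as follows. This is a single-channel illustration under stated assumptions, not the authors' implementation: the table here has one column per time point of d (rather than |d|+1 columns), the predecessor function is a simple length window controlled by the hypothetical parameters min_len and max_len, and segments of d are linearly resampled before correlating.

```python
import numpy as np

def segment_corr(q_seg, d_seg):
    """Pearson correlation after linearly resampling d_seg to len(q_seg)."""
    q_seg = np.asarray(q_seg, dtype=float)
    d_seg = np.asarray(d_seg, dtype=float)
    grid = np.linspace(0.0, 1.0, len(q_seg))
    d_res = np.interp(grid, np.linspace(0.0, 1.0, len(d_seg)), d_seg)
    qc, dc = q_seg - q_seg.mean(), d_res - d_res.mean()
    denom = np.sqrt((qc @ qc) * (dc @ dc))
    return float(qc @ dc / denom) if denom > 0 else 0.0

def predecessors(x, min_len, max_len):
    """Hypothetical predecessor function: start indices in d for a segment ending at x."""
    return range(max(0, x - max_len), x - min_len + 1)

def cow_align(q, d, knots, min_len=1, max_len=None):
    """Return (best score, knots in d) for fixed knots in q."""
    m, n = len(knots) - 1, len(d)
    max_len = n if max_len is None else max_len
    score = np.full((m + 1, n), -np.inf)  # score[i, x]: best for i segments ending at d[x]
    back = np.zeros((m + 1, n), dtype=int)  # start index of the i-th segment in d
    score[0, 0] = 0.0  # no segments aligned yet, both series at their first point
    for i in range(1, m + 1):
        q_seg = q[knots[i - 1]:knots[i] + 1]
        for x in range(1, n):
            for p in predecessors(x, min_len, max_len):
                if np.isinf(score[i - 1, p]):
                    continue  # no valid alignment of i-1 segments ends at p
                s = score[i - 1, p] + segment_corr(q_seg, d[p:x + 1])
                if s > score[i, x]:
                    score[i, x], back[i, x] = s, p
    # Global alignment: the last segment must end at the last point of d.
    d_knots = [n - 1]
    for i in range(m, 0, -1):
        d_knots.append(int(back[i, d_knots[-1]]))
    return float(score[m, n - 1]), d_knots[::-1]

best, d_knots = cow_align([0, 1, 2, 3, 4, 3, 2, 1, 0], [0, 2, 4, 2, 0], knots=[0, 4, 8])
```

In this toy case the recovered knots in d are the first point, the peak and the last point. Shorting, as described earlier in the slides, would instead allow the path to end before the final cell; that variant is not shown here.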

Method – SCOW
– The best score
– A one-channel time series is the expression profile of a single gene; a multi-channel time series is the expression profile of a set of genes. The only difference between these two cases is in how the correlations are calculated.
– COW is apt to align segments that differ greatly in magnitude.

Method – SCOW
SCOW:
– Searches for optimal knots in both dimensions.
– First step: search independently in both dimensions.
– Second step: SCOW alternates horizontal and vertical movement of each knot until it converges.

Method – SCOW
[Figure: the first step and the second step of the SCOW knot search]

Method – SCOW
– One matrix is calculated when the algorithm searches for knots with respect to q and holds them constant with respect to d, while the other is calculated in the opposite case.
– The predecessor function defines a cone-shaped search space.

Method – SCOW
– Score function: includes terms that incur penalties for segments that involve stretching or a significant difference in amplitude.
– The stretching s_i is defined as the ratio of lengths between q_i and d_i, and a_i is the amplitude ratio between the two, as determined by a weighted least-squares fitting procedure.
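As a rough illustration of the two quantities named on this slide, the sketch below computes a stretch ratio and a least-squares amplitude ratio for one pair of segments. The direction of each ratio (q over d), the no-intercept fit, and the requirement that d_seg is already resampled to the length of q_seg are assumptions; the slides do not give the form of the penalty terms, so none is shown.

```python
import numpy as np

def stretch_ratio(q_seg_len, d_seg_len):
    """s_i: ratio of segment lengths (assumed here to be q over d)."""
    return q_seg_len / d_seg_len

def amplitude_ratio(q_seg, d_seg, weights=None):
    """a_i: the scale a minimising sum_t w_t * (q_t - a * d_t)**2.

    Assumes d_seg has the same number of points as q_seg.
    Closed form: a = sum(w * q * d) / sum(w * d * d).
    """
    q_seg = np.asarray(q_seg, dtype=float)
    d_seg = np.asarray(d_seg, dtype=float)
    w = np.ones_like(q_seg) if weights is None else np.asarray(weights, dtype=float)
    denom = np.sum(w * d_seg * d_seg)
    if denom == 0:
        raise ValueError("d_seg is all zeros under the given weights")
    return float(np.sum(w * q_seg * d_seg) / denom)

s = stretch_ratio(4, 2)                       # q segment is twice as long as d segment
a = amplitude_ratio([2, 4, 6], [1, 2, 3])     # q is d scaled by a factor of two
```

A penalised score would then subtract some function of s_i and a_i from each segment's correlation, so that segments that are heavily stretched or that differ strongly in amplitude are discouraged.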

Method – Clustered alignment
– Find sets of genes that would have very similar alignments if they were aligned independently.
– A variant of traditional k-means clustering:
  – It identifies clusters in which the genes have similar warpings.
  – The genes in one of these clusters may have very different expression profiles.

Method – Clustered alignment
– The first step is to assign the initial alignment centroids: a representative set of gene alignments is selected as the centroids.
– Subroutine Align returns the best alignment between two series based on a given set of genes.
– ScoreGene returns the score of two series when aligned using a given alignment and a specified gene.
– For each gene, record the best score obtained so far using one of the current centroids.

Method – Clustered alignment
– The algorithm alternates between assigning genes to clusters and recomputing the alignment for each cluster using the genes assigned to it.
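The k-means-like alternation can be sketched generically. In this illustration, align and score_gene are hypothetical callables standing in for the Align and ScoreGene subroutines named in the slides; their signatures, the stopping rule, and the toy one-number "alignments" in the usage example are assumptions made only to keep the sketch runnable.

```python
def clustered_alignment(genes, initial_centroids, align, score_gene, max_iter=100):
    """Alternate between assigning genes to centroids and re-aligning each cluster.

    align(members)            -> alignment computed from a list of genes (hypothetical)
    score_gene(alignment, g)  -> score of gene g under that alignment (hypothetical)
    """
    centroids = list(initial_centroids)  # copy so the caller's list is not modified
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each gene goes to the centroid alignment that scores it best.
        new_assignment = {
            g: max(range(len(centroids)), key=lambda c: score_gene(centroids[c], g))
            for g in genes
        }
        if new_assignment == assignment:
            break  # converged: no gene changed cluster
        assignment = new_assignment
        # Update step: recompute each cluster's alignment from its member genes.
        for c in range(len(centroids)):
            members = [g for g in genes if assignment[g] == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = align(members)
    return assignment, centroids

# Toy usage: an "alignment" is a single time shift, and each gene prefers one shift.
preferred = {"a": 0, "b": 1, "c": 10, "d": 11}
toy_align = lambda members: sum(preferred[g] for g in members) // len(members)
toy_score = lambda shift, g: -abs(shift - preferred[g])
assignment, centroids = clustered_alignment(list(preferred), [0, 10], toy_align, toy_score)
```

As in k-means, the result depends on the initial centroids, which is why the slides describe selecting a representative set of gene alignments first.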

Results and Discussion
SCOW experiments:
– We construct queries for which we know the correct matching database treatments and their correct alignments.
– The data we use come from the EDGE toxicology database.
– The dataset consists of 216 unique observations of microarray data, each of which represents the values for 1600 different genes.
– Time points range from 6 h up to 96 h.
– The data span 11 different treatments.

Results and Discussion
– We assemble 10 queries for each treatment by randomly sub-sampling the time series in our dataset.
– We measure two kinds of accuracy:
  – Treatment accuracy: identifying the treatment from which each query series was extracted.
  – Alignment accuracy: aligning the query points to their actual time points in the treatment.

Results and Discussion
– The top line: treatment accuracy with different orders of splines.
– The middle line: alignment accuracy, obtained by adding the criterion that the average time error in the mapping is less than or equal to 24 h.
– The bottom line: alignment accuracy where this tolerance is decreased to 12 h.
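The two accuracy measures can be written down in a few lines. The sketch below assumes that a query counts toward alignment accuracy only if its treatment is identified correctly and its mean absolute time error is within the tolerance; the function names and the toy data are hypothetical and are not results from the paper.

```python
def treatment_accuracy(predicted, actual):
    """Fraction of queries whose predicted treatment matches the true one."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def alignment_accuracy(predicted, actual, mapped_times, true_times, tol_hours):
    """Fraction of queries with the right treatment AND mean time error <= tol_hours."""
    hits = 0
    for p, a, mapped, true in zip(predicted, actual, mapped_times, true_times):
        mean_err = sum(abs(m - t) for m, t in zip(mapped, true)) / len(true)
        if p == a and mean_err <= tol_hours:
            hits += 1
    return hits / len(actual)

# Toy data (hypothetical): three queries, two time points each, times in hours.
predicted = ["A", "B", "C"]
actual = ["A", "B", "B"]
mapped_times = [[6, 12], [24, 60], [6, 6]]
true_times = [[6, 12], [12, 36], [6, 6]]

acc_treatment = treatment_accuracy(predicted, actual)
acc_24h = alignment_accuracy(predicted, actual, mapped_times, true_times, 24)
acc_12h = alignment_accuracy(predicted, actual, mapped_times, true_times, 12)
```

Because the alignment criterion is added on top of the treatment criterion, alignment accuracy can never exceed treatment accuracy, and tightening the tolerance from 24 h to 12 h can only lower it.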

Results and Discussion
Conclusion:
– The multi-segment alignments computed by SCOW, COW and the generative multi-segment method are superior to the alignments determined by ordinary dynamic time warping and by the linear alignment method.
– SCOW finds more accurate alignments than the other two multi-segment algorithms.

Results and Discussion
– Clustered alignment experiments

Conclusion
The presented method advances on previous work in two ways:
– It computes clustered alignments.
– It introduces a new multi-segment alignment method, called SCOW.