University at Buffalo, The State University of New York: Visualization and Microarray


1 Visualization and Microarray
Complement to numerical analysis
Offers insightful information
Detects the structure of the dataset
Used at early and late stages of data mining
Challenges of Microarray Visualization
–High dimensionality
–Large data size
–Intuitive layout
–Low time complexity

2 An Example – Early Stage

3 General Approaches
Global Visualizations
–Encode each dimension uniformly by the same visual cue
–Example: parallel coordinates

4 General Approaches, cont'd
Optimal Visualizations
–Estimate the parameters and assess the fit of various spatial distance models for proximity data
–Multidimensional scaling (MDS)
–Sammon's mapping: topology preservation. Two samples that are close to each other must stay close when projected.

5 Sammon's mapping
Sammon's mapping is a classical case of MDS
MDS optimizes the 2-D presentation to preserve distances in the original N-dimensional space
Sammon's mapping iteratively minimizes the mapping error E, where
–d_ij* is the distance between points i and j in the N-dimensional space
–d_ij is the distance between points i and j in the visualization
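The error formula itself is not preserved in this transcript. For orientation, Sammon's stress is usually written as below, taking d^*_{ij} as the distance in the original N-dimensional space and d_{ij} as the distance in the projection; this is the textbook form, not a copy of the slide.

```latex
E = \frac{1}{\sum_{i<j} d^{*}_{ij}} \sum_{i<j} \frac{\left(d^{*}_{ij} - d_{ij}\right)^{2}}{d^{*}_{ij}}
```

Dividing each squared difference by d^*_{ij} gives small original distances more weight, which is what favors preserving local neighborhoods.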

6 2D to 1D

7 A method for achieving this projection
1. D1, D2 and D3 (the interpoint distances in the higher dimensional space) are calculated.
2. P1', P2' and P3' are generated randomly in the lower dimensional space.
3. The mapping error, E, is calculated from all the interpoint distances in the lower dimensional space.
4. The gradient of E is calculated; its negative gives the direction that reduces the error.
5. The points in the lower dimensional space are moved in that descent direction.
6. Steps 3 to 5 are repeated until E is below a given limit.
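The six steps can be sketched in NumPy as follows. This is a minimal illustration, not the presenters' implementation: it assumes the textbook Sammon stress for E, plain fixed-step gradient descent (Sammon's original paper uses a second-order update), and distinct input points. The names `sammon`, `step`, `limit` and `max_iter` are hypothetical.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance matrix for an (n, dim) array."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def sammon_stress(d_high, d_low):
    """Mapping error E summed over all pairs i < j (assumes no duplicate points)."""
    iu = np.triu_indices_from(d_high, k=1)
    return ((d_high[iu] - d_low[iu]) ** 2 / d_high[iu]).sum() / d_high[iu].sum()

def sammon(data, out_dim=2, step=0.3, limit=1e-3, max_iter=500, seed=0):
    """Fixed-step gradient descent on Sammon's stress (illustrative only)."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    d_high = pairwise_distances(data)                    # step 1
    y = rng.normal(scale=1e-2, size=(n, out_dim))        # step 2
    c = d_high[np.triu_indices(n, k=1)].sum()            # normalizing constant
    eye = np.eye(n, dtype=bool)
    safe_high = np.where(eye, 1.0, d_high)               # dummy diagonal, avoids 0/0
    errors = []                                          # errors[k] is E before update k
    for _ in range(max_iter):
        d_low = pairwise_distances(y)
        errors.append(sammon_stress(d_high, d_low))      # step 3
        if errors[-1] < limit:                           # step 6
            break
        safe_low = np.where(eye, 1.0, np.maximum(d_low, 1e-12))
        w = (safe_high - safe_low) / (safe_high * safe_low)  # zero on the diagonal
        diff = y[:, None, :] - y[None, :, :]
        grad = (-2.0 / c) * (w[:, :, None] * diff).sum(axis=1)  # step 4
        y = y - step * grad                              # step 5
    return y, errors
```

Every iteration touches all n(n-1)/2 pairs, and the result depends on the random start chosen through `seed`.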

8 Sammon's mapping, cont'd
Some drawbacks
–Computationally intensive, time complexity O(n^2)
–How to determine the best initialization
–No user interaction is permitted
–Adding new data points requires rerunning the process to get a new minimized projection
–Information loss

9 General Approaches, cont'd
Projective Visualizations
–Use projection functions to achieve a low dimensional display
–Radial Visualizations: RadViz, Star Coordinates, VizStruct

10 Comparison of Approaches
Approach | Advantages | Disadvantages
Global visualization | Display all dimensional information, no computation | Severe overlapping, large space to display
Optimal visualization | Achieve optimal result, sound theoretical basis | Lack user interaction, heavy computation
Projection visualization | Concise display, little computation | Lack rigorous proof, may not be optimal

11 Challenges of Microarray Visualization
High dimensionality
Large data size
Intuitive layout
Low time complexity

12 Density or Heat Plots
[Figure: heat plot; labels: Genes, Sample, Increased, Before IFN, After IFN]
Widely used with arrays
Works well only for structured data
Quantitative information is lost
Gets easily cluttered

13 TreeView Visualization

14 Principal component analysis
PCA: linear projection of data onto the major principal components, defined by the eigenvectors of the covariance matrix.
PCA is also used for reducing the dimensionality of the data.
Criterion to be minimised: the square of the distance between the original and projected data. This is fulfilled by the Karhunen-Loève transformation, where P is composed of eigenvectors of the covariance matrix.
Example: Leukemia data sets by Golub et al.: classification of ALL and AML
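A small NumPy sketch of this projection, assuming samples are rows and genes are columns. The function name `pca_project` and its return values are illustrative, not taken from the presentation.

```python
import numpy as np

def pca_project(data, n_components=2):
    """Project rows of `data` onto the leading eigenvectors of the covariance matrix."""
    centered = data - data.mean(axis=0)           # remove the per-column mean
    cov = np.cov(centered, rowvar=False)          # covariance matrix (columns are variables)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    P = eigvecs[:, order]                         # columns of P are the principal directions
    return centered @ P, P, eigvals[order]
```

For data with thousands of genes and few samples, forming the full covariance matrix is wasteful; an SVD of the centered data yields the same directions at lower cost.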

15 Sammon's mapping
Non-linear multi-dimensional scaling methods such as Sammon's mapping aim to optimally conserve the distances of a higher dimensional space in the 2/3-dimensional space.
Mathematically: minimisation of the error function E by the steepest descent method
Multi-linear scaling
Example: DLBCL prognosis – cured vs fatal cases

16 Our Visualization Approach
Gene Space
Sample Space
Fourier Harmonic Projection

17 Geometric Interpretation
N-dimensional space
Two-dimensional space

18 An Example of the Mapping
P=[a,a,…a] -> ?

19 First Fourier Harmonic Projection
N-dimensional space
Two-dimensional space
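The projection formula did not survive in this transcript. The sketch below assumes that the first Fourier harmonic projection is the k = 1 coefficient of the discrete Fourier transform of the N-dimensional point, with its real and imaginary parts used as the two plot coordinates. The function name `first_harmonic` is hypothetical.

```python
import numpy as np

def first_harmonic(x):
    """Map an N-dimensional point to 2-D via its first DFT coefficient (assumed definition)."""
    x = np.asarray(x, dtype=float)
    n = np.arange(x.size)
    f1 = np.sum(x * np.exp(-2j * np.pi * n / x.size))   # k = 1 term of the DFT
    return np.array([f1.real, f1.imag])

# A constant point P = [a, a, ..., a] lands at the origin (up to rounding),
# because the N-th roots of unity sum to zero.
print(first_harmonic(np.full(6, 2.5)))
```

Under this assumption the map is linear, so scaling a point scales its image by the same factor, and a circular shift of the coordinates rotates the image about the origin without changing its distance from it.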

20 Analytical Properties

21 Scaling and Transpose Property
[Figure panels: Original, Shift, Scaling, Transpose]

22 Time Shifting Property

23 Visual Exploration Framework
Explorative Visualization – Sample space
Confirmative Visualization – Gene space

24 VizStruct Architecture
[Diagram labels: Web Browser, Internet, Client, Web Server, Matlab Web Server, Matlab Libraries, Intranet, Matlab Applications]

25 VizStruct User Interface

26 VizStruct User Interface (3)
Cartesian plot
Polar plot

27 VizStruct User Interface (2)
EM Mixture
Density contour

28 Sample Classification

29 Binary Classification
Leukemia-A
–72 samples with 7129 genes
–38 (27+11) training, 34 (20+14) testing, hold out evaluation
Multiple Sclerosis
–44 samples, 4132 genes
–MS_IFN (28), MS_CON (30), cross validation evaluation
Binary classification: two sample classes
Evaluation: hold out and cross validation

30 Multiple Classification
Breast Cancer
–22 samples with 3226 genes
–3 classes: BRCA1 (7), BRCA2 (8), Sporadic (7)
–Cross validation evaluation
SRBCT
–88 samples with 2308 genes
–4 classes: RMS, BL, NB, EWS
–63 training and 25 testing

31 Classification Summary

32 Temporal Pattern (1)
[Figure labels: 10-OH Nortriptyline, Nortriptyline]

33 Temporal Pattern (2)
The rat kidney data set of Stuart et al. (2001) contains 873 genes at 7 time points during kidney development
There are 5 patterns, or gene groups, classified by the authors
Parallel coordinates show that the actual data comply with the profiles, but with some noise
[Figure panels: Parallel coordinates for each of the gene groups; Idealized temporal gene expression profiles]

34 Temporal Pattern (3)
Genes having very high relative levels of expression in early development
Genes having a relatively steady increase in expression throughout development
The first Fourier harmonic projection
Genes are somewhat symmetric about the middle time point, i.e., they are transposes of each other
Genes are very similar except at the last time point

35 VizStruct vs. Sammon's Mapping
VizStruct is similar to Sammon's mapping

36 VizStruct - Dimension Tour
Interactively adjust dimension parameters
Manually or automatically
May cause false clusters to break
Create dynamic visualization

37 Visualized Results for a Time Series Data Set

38 Interrelated Dimensional Clustering
The approach is applied to classifying multiple-sclerosis patients and IFN-drug treated patients.
–(A) Shows the original distribution of the 28 samples. Each point represents a sample, mapped from the intensity vector of the sample's 4132 genes.
–(B) Shows the distribution of the 28 samples on 2015 genes.
–(C) Shows the distribution of the 28 samples on 312 genes.
–(D) Shows the distribution of the same 28 samples after using our approach. We reduce 4132 genes to 96 genes.

39 References
Li Zhang, Aidong Zhang, and Murali Ramanathan. VizStruct: Exploratory Visualization for Gene Expression Profiling. Bioinformatics, 20: 85-92, 2004.
Li Zhang, Chun Tang, Yuqing Song, Aidong Zhang, and Murali Ramanathan. VizCluster and Its Application on Clustering Gene Expression Data. International Journal of Distributed and Parallel Database, 13(1): 73-97, 2003.
Li Zhang, Aidong Zhang, and Murali Ramanathan. Enhanced Visualization of Time Series through Higher Fourier Harmonics. In Proceedings of BIOKDD 2003, Washington DC, August 2003, pp 49-56.
Li Zhang, Aidong Zhang, and Murali Ramanathan. Fourier Harmonic Approach for Visualizing Temporal Patterns of Gene Expression Data. In Proceedings of the IEEE Computer Society Bioinformatics Conference (CSB 2003), Stanford, CA, August 2003, pp 131-141.
Li Zhang, Aidong Zhang, and Murali Ramanathan. Visualized Classification of Multiple Sample Types. In Proceedings of BIOKDD 2002, Edmonton, Alberta, Canada, July 2002, pp 55-62.
Li Zhang, Chun Tang, Yong Shi, Yuqing Song, Aidong Zhang, and Murali Ramanathan. VizCluster: An Interactive Visualization Approach to Cluster Analysis and Its Application on Microarray Data. In Proceedings of the Second SIAM International Conference on Data Mining (SDM02), Arlington, VA, April 2002, pp 29-51.

