Published by Michael Lambert. Modified over 8 years ago.
1
University at Buffalo, The State University of New York
Visualization and Microarray
Complement to numerical analysis
Offers insightful information
Detects the structure of the dataset
Early / late stage of data mining
Challenges of Microarray Visualization
– High dimensionality
– Large data size
– Intuitive layout
– Low time complexity
2
An Example – Early Stage
3
General Approaches
Global Visualizations
– Encode each dimension uniformly by the same visual cue
  Parallel coordinates
4
General Approaches, cont’d
Optimal Visualizations
– Estimate the parameters and assess the fit of various spatial distance models for proximity data
– Multidimensional scaling (MDS)
  Sammon’s mapping: topology preservation. Two samples that are close to each other must stay close when projected.
5
Sammon’s mapping
Sammon’s mapping is a classical case of MDS
MDS optimizes a 2-D representation to preserve the distances of the original N-dimensional space
Sammon’s mapping iteratively minimizes the mapping error
$E = \frac{1}{\sum_{i<j} d_{ij}^{*}} \sum_{i<j} \frac{(d_{ij}^{*} - d_{ij})^{2}}{d_{ij}^{*}}$
where $d_{ij}^{*}$ is the distance between points i and j in the N-dimensional space and $d_{ij}$ is the distance between points i and j in the visualization.
6
2D to 1D
7
A method for achieving this projection
1. D1, D2 and D3 (the interpoint distances in the higher dimensional space) are calculated.
2. P1', P2' and P3' are generated randomly in the lower dimensional space.
3. The mapping error, E, is calculated for all the interpoint distances in the lower dimensional space.
4. The gradient showing the direction which minimizes the error is calculated.
5. The points in the lower dimensional space are moved according to the direction given by the gradient.
6. Steps 3 to 5 are repeated until E is below a given limit.
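The six steps can be sketched in Python with NumPy. This is a minimal illustration, not the implementation behind the slides: it assumes the standard Sammon error as E, uses plain gradient descent with a simple step-size adjustment (halve the step when E does not decrease), and assumes all input points are distinct. The function names, the default parameters and the random test data are hypothetical.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance matrix between all rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def sammon_error(d_high, d_low):
    """Mapping error E, summed over all pairs i < j (assumes d_high > 0)."""
    iu = np.triu_indices_from(d_high, k=1)
    dh, dl = d_high[iu], d_low[iu]
    return ((dh - dl) ** 2 / dh).sum() / dh.sum()

def sammon(data, n_components=2, max_iter=300, step=1.0, limit=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    d_high = pairwise_distances(data)                  # step 1
    scale = d_high[np.triu_indices(n, k=1)].sum()
    low = rng.normal(size=(n, n_components))           # step 2
    d_low = pairwise_distances(low)
    error = sammon_error(d_high, d_low)                # step 3
    for _ in range(max_iter):
        if error < limit:                              # step 6: stop criterion
            break
        # Step 4: gradient of E with respect to every low-dimensional point.
        ratio = np.zeros_like(d_high)
        ratio[off_diag] = (d_high[off_diag] - d_low[off_diag]) / (
            d_high[off_diag] * np.maximum(d_low[off_diag], 1e-12))
        diff = low[:, None, :] - low[None, :, :]
        grad = (-2.0 / scale) * (ratio[:, :, None] * diff).sum(axis=1)
        # Step 5: move the points against the gradient; keep the move only
        # if it lowers E, otherwise retry later with a smaller step.
        candidate = low - step * grad
        cand_d = pairwise_distances(candidate)
        cand_error = sammon_error(d_high, cand_d)
        if cand_error < error:
            low, d_low, error = candidate, cand_d, cand_error
            step *= 1.2
        else:
            step *= 0.5
    return low, error

if __name__ == "__main__":
    data = np.random.default_rng(1).normal(size=(20, 5))  # made-up data
    _, start_error = sammon(data, max_iter=0)
    layout, final_error = sammon(data)
    print(start_error, final_error)
```

Calling `sammon` with `max_iter=0` returns the random starting layout and its error, which makes it easy to compare the error before and after optimization.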
8
Sammon’s mapping, cont’d
Some drawbacks
– Computationally intensive, time complexity O(n²)
– Unclear how to determine the best initialization
– No user interaction is permitted
– Adding new data points requires rerunning the process to obtain a new minimized projection
– Information loss
9
General Approaches, cont’d
Projective Visualizations
– Use projection functions to achieve a low-dimensional display
– Radial Visualizations
  RadViz
  Star Coordinates
  VizStruct
10
Comparison of Approaches
Approach | Advantages | Disadvantages
Global visualization | Displays all dimensional information, no computation | Severe overlapping, needs a large space to display
Optimal visualization | Achieves an optimal result, sound theoretical basis | Lacks user interaction, heavy computation
Projective visualization | Concise display, little computation | Lacks rigorous proof, may not be optimal
11
Challenges of Microarray Visualization
High dimensionality
Large data size
Intuitive layout
Low time complexity
12
Density or Heat Plots
[Figure: heat plot; labels: Genes, Sample, Increased, Before IFN, After IFN]
Widely used with arrays
Works well only for structured data
Quantitative information is lost
Gets easily cluttered
13
TreeView Visualization
14
Principal component analysis
PCA: linear projection of the data onto the major principal components, defined by the eigenvectors of the covariance matrix.
PCA is also used for reducing the dimensionality of the data.
Criterion to be minimised: the square of the distance between the original and projected data. This is fulfilled by the Karhunen-Loève transformation.
P is composed of the eigenvectors of the covariance matrix.
Example: Leukemia data sets by Golub et al.: classification of ALL and AML
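The projection described on this slide can be sketched with NumPy as follows. It is an illustration under stated assumptions: rows are samples, columns are genes, and the function name and the random test data are hypothetical. For real microarray data with thousands of genes the covariance matrix becomes large, so an SVD of the centered data is often preferred in practice.

```python
import numpy as np

def pca_project(data, n_components=2):
    """Project rows of `data` onto the leading eigenvectors of the covariance matrix."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)         # genes x genes covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]               # columns form the matrix P
    return centered @ components, components, eigvals[order]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = rng.normal(size=(50, 6))           # made-up data: 50 samples, 6 genes
    projected, components, variances = pca_project(samples)
    print(projected.shape, variances)
```

The variance of each projected coordinate equals the corresponding eigenvalue, which is why the largest eigenvalues identify the directions that lose the least information.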
15
Sammon’s mapping
Non-linear multi-dimensional scaling methods such as Sammon’s mapping aim to optimally preserve the distances of a higher dimensional space in a 2- or 3-dimensional space.
Mathematically: minimisation of the error function E by the steepest descent method.
Multi-linear scaling
Example: DLBCL prognosis – cured vs. fatal cases
16
Our Visualization Approach
Gene Space
Sample Space
Fourier Harmonic Projection
17
Geometric Interpretation
[Figure labels: N-dimensional space, Two-dimensional space]
18
An Example of the Mapping
P=[a,a,…a] -> ?
19
First Fourier Harmonic Projection
[Figure labels: N-dimensional space, Two-dimensional space]
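The projection formula did not survive in this transcript. A minimal sketch, assuming the first Fourier harmonic projection is the k = 1 coefficient of the discrete Fourier transform, F(x) = Σ x[n]·e^(−2πi·n/N), which maps an N-dimensional point to one complex number, that is, a point in the plane. The function names are hypothetical. Under this assumption a constant point P = [a, a, …, a] lands on the origin, because the N-th roots of unity sum to zero.

```python
import cmath

def first_fourier_harmonic(point):
    """Map an N-dimensional point to the plane (assumed: k = 1 DFT coefficient)."""
    n = len(point)
    return sum(x * cmath.exp(-2j * cmath.pi * k / n) for k, x in enumerate(point))

def to_xy(z):
    """Read the complex image as Cartesian (x, y) coordinates."""
    return (z.real, z.imag)

if __name__ == "__main__":
    constant = [3.7] * 6            # every coordinate equal
    spike = [1.0, 0.0, 0.0, 0.0]    # only the first coordinate is non-zero
    print(to_xy(first_fourier_harmonic(constant)))
    print(to_xy(first_fourier_harmonic(spike)))
```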
20
Analytical Properties
21
Scaling and Transpose Property
[Figure panels: Original, Shift, Scaling, Transpose]
22
Time Shifting Property
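The property formulas are missing from this transcript. The sketch below checks numerically what they would look like under several assumptions: the projection is the k = 1 DFT coefficient, "shift" means adding a constant to every coordinate, "scaling" means multiplying every coordinate by a constant, "transpose" means reversing the coordinate order, and "time shifting" means a circular shift. These readings, the helper names and the sample values are all hypothetical.

```python
import cmath

def ffhp(point):
    """Assumed first Fourier harmonic projection: the k = 1 DFT coefficient."""
    n = len(point)
    return sum(x * cmath.exp(-2j * cmath.pi * k / n) for k, x in enumerate(point))

def close(a, b, tol=1e-9):
    return abs(a - b) < tol

x = [0.5, 2.0, -1.0, 3.0, 1.5]   # made-up real-valued profile
n = len(x)
base = ffhp(x)

# Shift: adding a constant to every coordinate leaves the image unchanged.
shift_ok = close(ffhp([v + 4.0 for v in x]), base)

# Scaling: multiplying every coordinate by c scales the image by c.
scaling_ok = close(ffhp([3.0 * v for v in x]), 3.0 * base)

# Transpose: reversing a real profile mirrors the image (complex conjugate)
# and rotates it by one step of 2*pi/N.
transpose_ok = close(ffhp(x[::-1]),
                     cmath.exp(2j * cmath.pi / n) * base.conjugate())

# Time shifting: a circular delay by m steps rotates the image by
# -2*pi*m/N and keeps its distance from the origin.
m = 2
delayed = x[-m:] + x[:-m]
time_shift_ok = close(ffhp(delayed), cmath.exp(-2j * cmath.pi * m / n) * base)

if __name__ == "__main__":
    print(shift_ok, scaling_ok, transpose_ok, time_shift_ok)
```

Under these assumptions, profiles that differ only by a constant offset coincide in the plane, and profiles that differ only by a circular time shift lie on the same circle around the origin.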
23
Visual Exploration Framework
Explorative Visualization – Sample space
Confirmative Visualization – Gene space
24
VizStruct Architecture
[Diagram labels: Web Browser, Internet, Client, Web Server, Matlab Web Server, Matlab Libraries, Intranet, Matlab Applications]
25
VizStruct User Interface
26
VizStruct User Interface (3)
[Figure panels: Cartesian plot, Polar plot]
27
VizStruct User Interface (2)
EM Mixture Density contour
28
Sample Classification
29
Binary Classification
Leukemia-A
– 72 samples with 7129 genes
– 38 (27+11) training, 34 (20+14) testing, hold-out evaluation
Multiple Sclerosis
– 44 samples, 4132 genes
– MS_IFN (28), MS_CON (30), cross-validation evaluation
Binary classification: two sample classes
Evaluation: hold-out and cross-validation
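The two evaluation schemes differ only in how sample indices are divided. The sketch below illustrates this with the Python standard library. The function names, the random shuffling and the choice of 5 folds are assumptions for illustration; the slides do not state the number of folds, and the actual Leukemia training and test sets may be fixed by the data set rather than drawn at random.

```python
import random

def hold_out_split(n_samples, n_train, seed=0):
    """One fixed split: train on n_train samples, test once on the rest."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return sorted(indices[:n_train]), sorted(indices[n_train:])

def k_fold_splits(n_samples, k, seed=0):
    """Cross-validation: every sample is tested exactly once across k rounds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)

if __name__ == "__main__":
    # Sizes taken from the Leukemia-A slide: 72 samples, 38 for training.
    train, test = hold_out_split(72, 38)
    print(len(train), len(test))
    # Hypothetical 5-fold cross-validation over 44 samples.
    for fold_train, fold_test in k_fold_splits(44, 5):
        print(len(fold_train), len(fold_test))
```

Hold-out evaluation is cheaper but depends on a single split, while cross-validation uses every sample for testing, which matters when only a few dozen samples are available.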
30
Multiple Classification
Breast Cancer
– 22 samples with 3226 genes
– 3 classes: BRCA1 (7), BRCA2 (8), Sporadic (7)
– Cross-validation evaluation
SRBCT
– 88 samples with 2308 genes
– 4 classes: RMS, BL, NB, EWS
– 63 training and 25 testing
31
Classification Summary
32
Temporal Pattern (1)
[Figure panels: 10-OH Nortriptyline, Nortriptyline]
33
Temporal Pattern (2)
The rat kidney data set of Stuart et al. (2001) contains 873 genes measured at 7 time points during kidney development
The authors classified the genes into 5 patterns, or gene groups
Parallel coordinates show that the actual data comply with the profiles, but with some noise
[Figure panels: Parallel coordinates for each of the gene groups; Idealized temporal gene expression profiles]
34
Temporal Pattern (3)
Genes having very high relative levels of expression in early development
Genes having a relatively steady increase in expression throughout development
The first Fourier harmonic projection
Genes are somewhat symmetric about the middle time point, i.e., they are transposes of each other
Genes are very similar except at the last time point
35
VizStruct vs. Sammon’s Mapping
VizStruct is similar to Sammon’s mapping
36
VizStruct - Dimension Tour
Interactively adjust dimension parameters
Manually or automatically
May cause false clusters to break
Create dynamic visualization
37
Visualized Results for a Time Series Data Set
38
Interrelated Dimensional Clustering
The approach is applied to classifying multiple-sclerosis patients and IFN-drug treated patients.
– (A) Shows the distribution of the original 28 samples. Each point represents a sample, mapped from that sample's vector of 4132 gene intensities.
– (B) Shows the distribution of the 28 samples on 2015 genes.
– (C) Shows the distribution of the 28 samples on 312 genes.
– (D) Shows the distribution of the same 28 samples after using our approach. We reduce 4132 genes to 96 genes.
39
References
Li Zhang, Aidong Zhang, and Murali Ramanathan. VizStruct: Exploratory Visualization for Gene Expression Profiling. Bioinformatics 20: 85-92, 2004.
Li Zhang, Chun Tang, Yuqing Song, Aidong Zhang, and Murali Ramanathan. VizCluster and Its Application on Clustering Gene Expression Data. International Journal of Distributed and Parallel Database, 13(1): 73-97, 2003.
Li Zhang, Aidong Zhang, and Murali Ramanathan. Enhanced Visualization of Time Series through Higher Fourier Harmonics. In Proceedings of BIOKDD 2003, Washington DC, August 2003, pp. 49-56.
Li Zhang, Aidong Zhang, and Murali Ramanathan. Fourier Harmonic Approach for Visualizing Temporal Patterns of Gene Expression Data. In Proceedings of the IEEE Computer Society Bioinformatics Conference (CSB 2003), Stanford, CA, August 2003, pp. 131-141.
Li Zhang, Aidong Zhang, and Murali Ramanathan. Visualized Classification of Multiple Sample Types. In Proceedings of BIOKDD 2002, Edmonton, Alberta, Canada, July 2002, pp. 55-62.
Li Zhang, Chun Tang, Yong Shi, Yuqing Song, Aidong Zhang, and Murali Ramanathan. VizCluster: An Interactive Visualization Approach to Cluster Analysis and Its Application on Microarray Data. In Proceedings of the Second SIAM International Conference on Data Mining (SDM02), Arlington, VA, April 2002, pp. 29-51.