1
Non-linear Principal Manifolds: a Useful Tool in Bioinformatics and Medical Applications
Andrei Zinovyev, Institut des Hautes Études Scientifiques, France
2
Plan of the talk
  Object of study
  Definition of principal manifold (PM)
  Constructing PMs: elastic maps
  Examples of biomedical applications
3
Non-linear data-mining methods
  [Diagram: principal manifolds and the elastic maps framework placed among tasks (regression and approximation, supervised classification, clustering, visualization) and methods (SVM, K-means, SOM, multidimensional scaling, PCA, factor analysis, LLE, ISOMAP)]
4
Finite set of objects in R^N: X_i, i = 1..m
IRIS database
  Petal height  Petal width  Sepal width  Sepal height  SPECIES
  4.9           3            1.4          0.2           Iris-setosa
  4.7           3.2          1.3          0.3           Iris-setosa
  4.6           3.1          1.5          0.2           Iris-setosa
  7             3.2          4.7          1.4           Iris-versicolor
  6.4           3.2          4.5          1.5           Iris-versicolor
  6.9           3.1          4.9          1.5           Iris-versicolor
  6.3           3.3          6            2.5           Iris-virginica
  5.8           2.7          X            1.9           Iris-virginica
  7.1           3            5.9          2.1           Iris-virginica
  6.3           2.9          5.6          1.8           Iris-virginica
5
Mean point K-means clustering
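The slide only names the two ideas: the mean point as the simplest "principal object", and K-means as its extension to several centers. A minimal sketch of the standard Lloyd iteration follows. The function name `kmeans`, its parameters and the random initialization are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm (illustrative sketch): alternate nearest-centroid
    assignment and centroid-as-mean update until the centroids stop moving."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: initialize centroids with k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distances from every point to every centroid, shape (m, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):            # keep an empty cluster's centroid as is
                new[j] = members.mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    # Final assignment consistent with the returned centroids.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d2.argmin(axis=1)
```

With `k=1` the single centroid is exactly the mean point of the data set.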
6
Principal “Object”
7
Principal Component Analysis
  Maximal dispersion
  1st principal axis
  2nd principal axis
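As a rough illustration of "maximal dispersion", the principal axes can be computed from the singular value decomposition of the centered data. The sketch below uses NumPy; the function name `pca` and its return layout are assumptions made for this example.

```python
import numpy as np

def pca(X, n_components=2):
    """Principal axes of X (rows are objects) via SVD of the centered data."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are orthonormal directions sorted by decreasing dispersion.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    axes = Vt[:n_components]
    variances = s[:n_components] ** 2 / (len(X) - 1)   # variance along each axis
    scores = Xc @ axes.T                               # coordinates on the axes
    return mean, axes, variances, scores
```

The first row of `axes` is the direction of maximal dispersion (1st principal axis); the second row is the direction of maximal dispersion orthogonal to it.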
8
Principal manifold
9
What do we want?
  Non-linear surface (1D, 2D, 3D …)
  Smooth and not twisted
  The data model is unknown
  Speed (time linear with Nm)
  Uniqueness
  Fast way to project datapoints
10
Metaphor of elasticity
  Data points
  Graph nodes
  Energy terms: U^(Y), U^(E), U^(R)
11
Constructing elastic nets
  [Figure: net node y; edge E with end nodes E(0), E(1); rib R with central node R(0) and end nodes R(1), R(2)]
12
Definition of elastic energy
  [Figure: data point X_j attached to net node y; edge E with end nodes E(0), E(1); rib R with central node R(0) and end nodes R(1), R(2)]
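The formulas on this slide did not survive in the transcript. The sketch below assumes the form commonly used in the elastic-map literature: an approximation term U^(Y) (mean squared distance from each data point to its closest node), a stretching term U^(E) (squared edge lengths) and a bending term U^(R) (squared deviation of each rib from a straight line). It is written for the simplest net, an open 1D chain, with uniform coefficients `lam` and `mu`; all names are illustrative.

```python
import numpy as np

def elastic_energy(X, Y, lam, mu):
    """Elastic energy of an open chain of nodes Y (p, n) for data X (m, n).

    Assumed form, with uniform coefficients:
      U^(Y) = (1/m) * sum_j ||x_j - y_closest(j)||^2
      U^(E) = lam   * sum over edges ||E(1) - E(0)||^2
      U^(R) = mu    * sum over ribs  ||R(1) + R(2) - 2 R(0)||^2
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)    # (m, p)
    u_y = d2.min(axis=1).sum() / len(X)
    edges = Y[1:] - Y[:-1]                  # consecutive nodes form the edges
    u_e = lam * (edges ** 2).sum()
    ribs = Y[:-2] + Y[2:] - 2.0 * Y[1:-1]   # node triples form the ribs
    u_r = mu * (ribs ** 2).sum()
    return u_y + u_e + u_r, (u_y, u_e, u_r)
```

For a straight, evenly spaced chain the rib term vanishes, so only stretching and approximation contribute; bending the chain makes U^(R) positive.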
13
Elastic manifold
14
Global minimum and softening
  [Figure: four panels showing the net as the elasticity coefficients decrease: 10^3, 10^2, 10^1, 10^-1]
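One way to read "softening" is as an annealing-like schedule: start with a very stiff net, minimize the energy, then repeat with smaller elasticity coefficients, each time starting from the previous result. The sketch below applies this to an open 1D chain and assumes the quadratic energy form commonly used for elastic maps. For a fixed assignment of points to nodes the minimum is found by solving one linear system. The function names, the initialization along the first principal axis and the default schedule (the four orders of magnitude from the slide, applied to both coefficients) are assumptions made for illustration.

```python
import numpy as np

def mean_sq_dist(X, Y):
    """Mean squared distance from each data point to its closest node."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()

def fit_elastic_chain(X, p=10,
                      schedule=((1e3, 1e3), (1e2, 1e2), (1e1, 1e1), (1e-1, 1e-1)),
                      n_iter=50):
    """Fit an open elastic chain of p nodes, softening (lam, mu) stage by stage."""
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    # Assumed initialization: nodes spread along the first principal axis.
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    Y = mean + np.linspace(-1.0, 1.0, p)[:, None] * (s[0] / np.sqrt(m)) * Vt[0]
    E = np.diff(np.eye(p), axis=0)         # first differences: edges
    R = np.diff(np.eye(p), n=2, axis=0)    # second differences: ribs
    for lam, mu in schedule:
        A_elastic = lam * E.T @ E + mu * R.T @ R
        for _ in range(n_iter):
            # Step 1: assign every data point to its closest node.
            d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
            nearest = d2.argmin(axis=1)
            counts = np.bincount(nearest, minlength=p)
            S = np.zeros((p, n))
            np.add.at(S, nearest, X)       # per-node sums of assigned points
            # Step 2: with the assignment fixed, the energy is quadratic in Y.
            Y_new = np.linalg.solve(np.diag(counts / m) + A_elastic, S / m)
            if np.allclose(Y_new, Y, atol=1e-9):
                break
            Y = Y_new
    return Y
```

Running only the stiff first stage leaves the chain short and nearly straight; the full schedule lets it relax toward the data while keeping the ordering inherited from the stiff stages.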
15
Adaptive algorithms
  Growing net
  Adaptive net
  Refining net
  Idea of scaling
16
Projection onto the manifold
  Closest node of the net
  Closest point of the manifold
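The two projection options can be contrasted on a 1D net treated as a piecewise-linear curve. This is a sketch under that assumption; the function names are illustrative, and a 2D net would need projection onto triangles or cells instead of segments.

```python
import numpy as np

def project_to_nearest_node(x, Y):
    """Piecewise-constant projection: index and position of the closest node."""
    d2 = ((Y - x) ** 2).sum(axis=1)
    i = int(d2.argmin())
    return i, Y[i]

def project_to_polyline(x, Y):
    """Piecewise-linear projection: closest point on the chain of segments.

    Returns (segment index, position t in [0, 1] along it, projected point).
    """
    best = (None, None, None, np.inf)
    for i in range(len(Y) - 1):
        a, b = Y[i], Y[i + 1]
        ab = b - a
        denom = float(ab @ ab)
        # Orthogonal projection onto the segment, clipped to its end points.
        t = 0.0 if denom == 0.0 else float(np.clip((x - a) @ ab / denom, 0.0, 1.0))
        q = a + t * ab
        d2 = float(((x - q) ** 2).sum())
        if d2 < best[3]:
            best = (i, t, q, d2)
    return best[:3]
```

The nearest-node projection is cheaper but maps all points of a cell to the same position; the segment projection gives continuous internal coordinates.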
17
Colorings: visualize any function
18
Density visualization
19
Example: different topologies
  [Figure: nets of different topologies shown in the data space R^N and in the internal coordinates R^2]
20
VIDAExpert tool and elmap C++ package
21
Regression and principal manifolds
  [Figure: regression curve F(x) and principal component plotted against x]
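In the linear case the contrast is easy to reproduce: least-squares regression of y on x minimizes vertical distances to the line, while the first principal component minimizes orthogonal distances. The sketch below computes both slopes for 2D data; the function names are illustrative, and the principal-axis slope is undefined for a vertical axis.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of the least-squares regression of y on x (vertical distances)."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / (xc @ xc))

def principal_axis_slope(x, y):
    """Slope of the first principal axis (orthogonal distances)."""
    Z = np.column_stack([x - x.mean(), y - y.mean()])
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    dx, dy = Vt[0]
    return float(dy / dx)   # assumes the axis is not vertical (dx != 0)
```

On exactly collinear data the two slopes coincide; with noise in y the regression slope is shrunk toward zero relative to the principal-axis slope.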
22
Image skeletonization or clustering around curves
23
Approximation of molecular surfaces
24
Application: economic data
  Gross output
  Density
  Profit
  Growth rate
25
Medical table: 1700 patients with myocardial infarction
  Lethal cases
  Patients map, density
26
Medical table: 1700 patients with myocardial infarction, 128 indicators
  Age
  Number of infarctions in anamnesis
  Stenocardia (angina) functional class
27
Codon usage in all genes of one genome
  Genomes: Escherichia coli, Bacillus subtilis
  Gene groups: majority of genes, highly expressed genes, “foreign” genes, “hydrophobic” genes
28
Golub’s leukemia dataset
  3051 genes, 38 samples (ALL/B-cell, ALL/T-cell, AML)
  Map of genes: ALL sample, AML sample; vote for ALL, vote for AML; used by T. Golub, used by W. Lie
29
Golub’s leukemia dataset
  Map of samples: AML, ALL/B-cell, ALL/T-cell; density
  Genes shown: Cystatin C; Retinoblastoma binding protein P48; CA2 Carbonic anhydrase II; X-linked Helicase II
30
Thank you for your attention! Questions?