Self-Organizing Maps
- Projection of p-dimensional observations onto a two- (or one-) dimensional grid space
- Constrained version of K-means clustering: the prototypes lie on a one- or two-dimensional manifold (constrained topological map; Teuvo Kohonen, 1993)
- K prototypes arranged on a rectangular or hexagonal grid
- Each prototype is indexed by an integer pair l_j ∈ Q1 × Q2, where Q1 = {1, ..., q1}, Q2 = {1, ..., q2}, and K = q1 × q2
- High-dimensional observations are projected onto this two-dimensional coordinate system

SOM Algorithm
- Prototypes m_j, j = 1, ..., K, are initialized
- Each observation x_i is processed one at a time; find the closest prototype m_j in Euclidean distance in the p-dimensional space
- All neighbors m_k of m_j move toward x_i: m_k ← m_k + α (x_i − m_k)
- Neighbors are all m_k whose distance from m_j is smaller than a threshold r (a prototype counts as its own neighbor); the distance is defined on the grid Q1 × Q2, not in the p-dimensional space
- SOM performance depends on the learning rate α and the threshold r; typically α is decreased from 1 to 0 and r from R (a predefined value) to 1 over, say, 3000 iterations
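A minimal sketch of this loop in NumPy. The grid size, decay schedules, and random initialization below are illustrative choices, not taken from the slides.

```python
import numpy as np

def som_fit(X, q1=6, q2=5, n_iter=3000, r0=3.0, seed=0):
    """Minimal SOM sketch: q1 x q2 grid, learning rate alpha and
    neighborhood radius r both decay linearly over the iterations."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Integer grid coordinates l_j in Q1 x Q2 for the K = q1*q2 prototypes.
    grid = np.array([(a, b) for a in range(q1) for b in range(q2)], dtype=float)
    # Initialize prototypes m_j with randomly chosen observations (assumes n >= q1*q2).
    m = X[rng.choice(n, q1 * q2, replace=False)].astype(float)
    for t in range(n_iter):
        alpha = 1.0 - t / n_iter                # learning rate decays 1 -> 0
        r = 1.0 + r0 * (1.0 - t / n_iter)       # grid radius decays r0+1 -> 1
        x = X[rng.integers(n)]                  # one observation at a time
        j = np.argmin(((m - x) ** 2).sum(axis=1))    # closest prototype in R^p
        # Neighbors: prototypes whose *grid* distance to l_j is below r.
        nb = np.linalg.norm(grid - grid[j], axis=1) < r
        m[nb] += alpha * (x - m[nb])            # m_k <- m_k + alpha (x_i - m_k)
    return m, grid
```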

SOM Properties
- If r is small enough, each neighborhood contains only one prototype, so the spatial connection between prototypes is lost and the algorithm converges to a local minimum of K-means clustering
- To check that the constraint is reasonable, compute and compare the reconstruction error E = Σ ||x − m||² (summed over observations and their closest prototypes) for both methods; the SOM error is larger, but the two should be similar
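A short helper for that comparison, assuming the prototypes come from the sketch above or from K-means; each point is assigned to its closest prototype by brute force.

```python
import numpy as np

def reconstruction_error(X, prototypes):
    """E = sum_i ||x_i - m_{c(i)}||^2, where c(i) is the closest prototype
    to x_i; the same quantity is comparable across SOM and K-means."""
    d2 = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()
```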

Tamayo et al. (1999; GeneCluster)
- Self-organizing maps (SOM) applied to microarray data
- Hematopoietic cell lines (HL60, U937, Jurkat, and NB4): 4×3 SOM
- Yeast data from Eisen et al. reanalyzed with a 6×5 SOM

Principal Component Analysis
- Data x_i, i = 1, ..., n, come from the p-dimensional space (n ≥ p); the data matrix X is n × p (assumed column-centered)
- Singular value decomposition: X = U Σ V^T, where
  - Σ is a non-negative diagonal matrix with decreasing diagonal entries, the singular values σ_i,
  - U is n × p with orthonormal columns (u_i^T u_j = 1 if i = j, 0 otherwise), and
  - V is a p × p orthogonal matrix
- The principal components are the columns of XV (= UΣ)
- X and V have the same rank, so there are at most p non-zero singular values
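A small NumPy sketch on synthetic data; it verifies that the principal components XV coincide with UΣ. The data here are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # n = 100 observations, p = 5 dimensions
Xc = X - X.mean(axis=0)                # center the columns before the SVD

# Thin SVD: Xc = U @ np.diag(s) @ Vt, singular values s in decreasing order.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

pcs = Xc @ Vt.T                        # principal components, the columns of XV
assert np.allclose(pcs, U * s)         # same as U @ np.diag(s), i.e. U Sigma
```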

PCA Properties
- The first column of XV (= UΣ) is the 1st principal component; it represents the direction of largest variance, and the first singular value measures its magnitude
- The second column captures the largest variance uncorrelated with the first, and so on
- The first q columns (q < p) of XV are the linear projection of X into q dimensions with the largest variance
- Let X_q = U Σ_q V^T, where Σ_q keeps the q largest diagonal entries of Σ and sets the rest to zero; then X_q is the best possible approximation of X with rank q
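A sketch of the rank-q approximation on synthetic data. The check uses the fact (not stated on the slide) that the squared Frobenius error of the best rank-q approximation equals the sum of the squared discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
Xc = rng.normal(size=(100, 5))
Xc -= Xc.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

q = 2
Xq = U[:, :q] @ np.diag(s[:q]) @ Vt[:q]          # X_q = U Sigma_q V^T
err = np.linalg.norm(Xc - Xq, "fro") ** 2        # squared approximation error
assert np.allclose(err, (s[q:] ** 2).sum())      # sum of discarded sigma_i^2
```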

Traditional PCA
- Start from the variance-covariance matrix S of the column-centered data X (n × p)
- Eigenvalue decomposition: S = C D C^T, with C an orthogonal matrix and D diagonal
- (n − 1) S = X^T X = (U Σ V^T)^T (U Σ V^T) = V Σ U^T U Σ V^T = V Σ² V^T
- Thus D = Σ² / (n − 1) and C = V
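A numerical check of these relations on synthetic data; eigenvectors are compared up to sign, since both decompositions determine them only up to a sign flip.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 4
X = rng.normal(size=(n, p))
Xc = X - X.mean(axis=0)                      # the identity needs centered data

S = np.cov(Xc, rowvar=False)                 # sample covariance, divisor n - 1
evals, C = np.linalg.eigh(S)                 # S = C D C^T, ascending eigenvalues
evals, C = evals[::-1], C[:, ::-1]           # reorder to decreasing

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

assert np.allclose(evals, s ** 2 / (n - 1))  # D = Sigma^2 / (n - 1)
assert np.allclose(np.abs(C), np.abs(Vt.T))  # C = V, up to column signs
```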

Principal Curves and Surfaces
- Let f(λ) be a parameterized smooth curve in the p-dimensional space
- For a data point x, let λ_f(x) be the parameter value of the point on the curve closest to x
- Then f(λ) is the principal curve for the random vector X if f(λ) = E[X | λ_f(X) = λ]
- In other words, f(λ) is the average of all data points that project onto it

Algorithm for Finding the Principal Curve
- Write f(λ) in coordinates as f(λ) = [f_1(λ), f_2(λ), ..., f_p(λ)], where the random vector is X = [X_1, X_2, ..., X_p]
- Iterate the following alternating steps until convergence:
  - (a) f_j(λ) ← E[X_j | λ_f(X) = λ], j = 1, ..., p
  - (b) λ_f(x) ← argmin_λ ||x − f(λ)||²
- The solution is the principal curve for the distribution of X
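A rough sketch of the alternation in NumPy. The conditional expectation in step (a) is replaced by a crude moving-average smoother of each coordinate against λ, and step (b) projects onto the discrete set of fitted curve points; the window size, iteration count, and edge handling are arbitrary simplifications.

```python
import numpy as np

def principal_curve(X, n_iter=10, frac=0.2):
    """Very rough principal-curve sketch with a moving-average smoother."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    lam = Xc @ Vt[0]                          # initialize lambda with the 1st PC
    f = np.empty_like(X, dtype=float)
    k = max(3, int(frac * len(X)))            # smoother window size
    for _ in range(n_iter):
        order = np.argsort(lam)
        # (a) f_j(lambda) <- E[X_j | lambda(X) = lambda], via local averaging.
        for j in range(X.shape[1]):
            f[order, j] = np.convolve(X[order, j], np.ones(k) / k, mode="same")
        # (b) lambda(x) <- argmin_lambda ||x - f(lambda)||^2, over fitted points.
        d2 = ((X[:, None, :] - f[None, order, :]) ** 2).sum(axis=2)
        lam = np.sort(lam)[d2.argmin(axis=1)]
    return f, lam
```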

Multidimensional Scaling (MDS)
- Observations x_1, x_2, ..., x_n in the p-dimensional space, with all pairwise distances (or dissimilarity measures) d_ij
- MDS tries to preserve the structure of the original pairwise distances as much as possible
- It seeks vectors z_1, z_2, ..., z_n in the k-dimensional space (k << p) that minimize the "stress function"
  S_D(z_1, ..., z_n) = [ Σ_{i≠j} (d_ij − ||z_i − z_j||)² ]^{1/2}
- This is Kruskal-Shephard scaling (least-squares scaling); a gradient descent algorithm is used to find the solution
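A sketch of a plain gradient-descent solver. It minimizes the sum of squared discrepancies (the outer square root does not change the minimizer); the step size, iteration count, and initialization scale are illustrative, and practical implementations use smarter schemes such as SMACOF.

```python
import numpy as np

def kruskal_mds(D, k=2, n_iter=500, lr=0.05, seed=0):
    """Least-squares (Kruskal-Shephard) scaling by fixed-step gradient descent."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    Z = rng.normal(scale=D.mean(), size=(n, k))       # rough initial spread
    for _ in range(n_iter):
        diff = Z[:, None, :] - Z[None, :, :]          # z_i - z_j
        dist = np.sqrt((diff ** 2).sum(axis=2))       # ||z_i - z_j||
        np.fill_diagonal(dist, 1.0)                   # avoid division by zero
        coef = (D - dist) / dist
        np.fill_diagonal(coef, 0.0)
        # Gradient of the squared stress w.r.t. z_i is proportional to
        # -sum_j coef_ij (z_i - z_j); constants are absorbed into the step size.
        grad = -(coef[:, :, None] * diff).sum(axis=1)
        Z -= (lr / n) * grad
    return Z
```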

Other MDS Approaches
- Sammon mapping minimizes Σ_{i≠j} (d_ij − ||z_i − z_j||)² / d_ij, which puts more weight on preserving the smaller distances
- Classical scaling is based on a similarity measure s_ij; often the centered inner product s_ij = ⟨x_i − x̄, x_j − x̄⟩ is used
- It then minimizes Σ_{i,j} (s_ij − ⟨z_i − z̄, z_j − z̄⟩)²
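Classical scaling has a closed-form solution: assuming the d_ij are Euclidean distances, double-centering the squared-distance matrix recovers the centered inner products, and the top-k eigenvectors scaled by the square roots of their eigenvalues give the configuration. A minimal sketch (the function name and k = 2 are arbitrary):

```python
import numpy as np

def classical_scaling(D, k=2):
    """Classical (Torgerson) scaling from a pairwise distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # B_ij = <x_i - xbar, x_j - xbar>
    evals, evecs = np.linalg.eigh(B)
    idx = np.argsort(evals)[::-1][:k]            # k largest eigenvalues
    return evecs[:, idx] * np.sqrt(np.maximum(evals[idx], 0.0))
```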