
1 Jan Kamenický

2  Many features ⇒ many dimensions  Dimensionality reduction ◦ Feature extraction (useful representation) ◦ Classification ◦ Visualization

3  What manifold? ◦ A low-dimensional embedding of high-dimensional data lying on a smooth nonlinear manifold  Linear methods fail ◦ e.g. PCA

4  Unsupervised methods ◦ Without any a priori knowledge  ISOMAP ◦ Isometric mapping  LLE ◦ Locally linear embedding

5  Core idea ◦ Use geodesic distances on the manifold instead of Euclidean ones  Classical MDS ◦ Maps the data to the lower-dimensional space

6  Select neighbours ◦ K-nearest neighbours ◦ ε-distance neighbourhood  Create weighted neighbourhood graph ◦ Weights = Euclidean distances  Estimate the geodesic distances as shortest paths in the weighted graph ◦ Dijkstra’s algorithm
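
A minimal sketch of this construction in Python (numpy/scipy assumed available; the function name isomap_geodesics is illustrative):

    import numpy as np
    from scipy.spatial.distance import cdist
    from scipy.sparse.csgraph import shortest_path

    def isomap_geodesics(X, k=10):
        """Estimate geodesic distances on the manifold sampled by X (N x D)."""
        D = cdist(X, X)                          # pairwise Euclidean distances
        N = D.shape[0]
        W = np.full((N, N), np.inf)              # inf = no edge
        idx = np.argsort(D, axis=1)[:, 1:k + 1]  # k nearest neighbours (skip self)
        for i in range(N):
            W[i, idx[i]] = D[i, idx[i]]          # weights = Euclidean distances
        W = np.minimum(W, W.T)                   # symmetrise the graph
        np.fill_diagonal(W, 0.0)
        # shortest paths in the weighted graph approximate geodesic distances
        return shortest_path(W, method='D', directed=False)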

7  1) Set distances (0 for the initial node, ∞ for all other nodes), mark all nodes as unvisited  2) Select the unvisited node with the smallest distance as active  3) Update all unvisited neighbours of the active node (if the newly computed distance is smaller)  4) Mark the active node as visited (its distance is now final), repeat from 2) as necessary
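
The same four steps in code (a sketch; the adjacency-dict graph format and function name are assumptions, with Python's heapq standing in for the priority queue):

    import heapq

    def dijkstra(graph, source):
        """graph: {node: [(neighbour, weight), ...]}. Returns shortest distances."""
        # 1) 0 for the initial node, infinity for all others; all unvisited
        dist = {v: float('inf') for v in graph}
        dist[source] = 0.0
        visited = set()
        heap = [(0.0, source)]
        while heap:
            # 2) select the unvisited node with the smallest distance as active
            d, u = heapq.heappop(heap)
            if u in visited:
                continue
            # 4) mark the active node as visited: its distance is now final
            visited.add(u)
            # 3) update all unvisited neighbours if the new distance is smaller
            for v, w in graph[u]:
                if v not in visited and d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (dist[v], v))
        return dist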

8  Time complexity ◦ O(|E|·T_dec + |V|·T_min), where T_dec and T_min are the costs of the decrease-key and extract-min operations of the priority queue  Implementation ◦ Sparse edges ◦ Fibonacci heap as a priority queue ◦ O(|E| + |V| log |V|)  Geodesic distances in ISOMAP ◦ O(N² log N) (one Dijkstra run from each of the N points)

9  Input ◦ Dissimilarities (distances)  Output ◦ Data in a low-dimensional embedding, with distances corresponding to the dissimilarities  Many types of MDS ◦ Classical ◦ Metric / non-metric (variants differ in the number of dissimilarity matrices, symmetry assumptions, etc.)

10  Quantitative dissimilarities  Euclidean distances (in the output)  One distance matrix (symmetric)  Minimizing the stress function
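
For concreteness, one standard form of this objective (a sketch; the exact formula from the slide did not survive the transcript): metric MDS minimizes the raw stress

    Stress(Y) = Σ_{i<j} ( d_ij(Y) − δ_ij )²

over the embedding Y, where δ_ij are the input dissimilarities and d_ij(Y) the Euclidean distances in the embedding; classical MDS equivalently minimizes the strain ‖B − Y Yᵀ‖²_F, with B the double-centered matrix defined on the next slide.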

11  We can optimize directly ◦ Compute the double-centered distance matrix B = −(1/2) J D^(2) J, where D^(2) contains the squared distances and J = I − (1/N) 1 1ᵀ ◦ Note: for Euclidean distances B equals the Gram matrix Xcᵀ Xc of the centered data ◦ Perform the eigendecomposition (SVD) B = V Λ Vᵀ ◦ Compute the final data Y = V_q Λ_q^(1/2)
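
A compact numpy sketch of these four steps (eigh on the symmetric B stands in for a full SVD; the function name is illustrative):

    import numpy as np

    def classical_mds(D, q=2):
        """Embed points in R^q from a symmetric distance matrix D (N x N)."""
        N = D.shape[0]
        J = np.eye(N) - np.ones((N, N)) / N       # centering matrix
        B = -0.5 * J @ (D ** 2) @ J               # double-centered matrix
        evals, evecs = np.linalg.eigh(B)          # eigendecomposition (B symmetric)
        order = np.argsort(evals)[::-1][:q]       # q largest eigenvalues
        L = np.sqrt(np.maximum(evals[order], 0))  # guard against tiny negatives
        return evecs[:, order] * L                # Y = V_q * Lambda_q^(1/2)

In ISOMAP, D would be the matrix of estimated geodesic distances from slide 6.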

12  Covariance matrix S = (1/N) Xc Xcᵀ (Xc is the centered data)  Projection of the centered X onto the eigenvectors of NS (i.e. the result of the PCA of X) coincides with the classical MDS embedding
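
Why the two coincide (a short sketch, assuming Xc is the D×N centered data matrix): with the SVD Xc = U Σ Vᵀ we get NS = Xc Xcᵀ = U Σ² Uᵀ and B = Xcᵀ Xc = V Σ² Vᵀ, so projecting onto the top q eigenvectors of NS gives U_qᵀ Xc = Σ_q V_qᵀ, which is exactly the classical-MDS result Y = V_q Λ_q^(1/2) = V_q Σ_q, transposed.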

13 [figure slide]

14 [figure slide]

15  How many dimensions to use? ◦ Residual variance (plotted against the number of dimensions; look for the elbow)  Short-circuiting ◦ Too large a neighbourhood (not enough data) ◦ Non-isometric mapping ◦ Can completely destroy the final embedding
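
Residual variance is conventionally computed as 1 − R² between the geodesic distances and the distances in the q-dimensional embedding; a small sketch under that assumption:

    import numpy as np
    from scipy.spatial.distance import pdist

    def residual_variance(D_geo, Y):
        """1 - R^2 between geodesic distances D_geo and embedding Y (N x q)."""
        d_geo = D_geo[np.triu_indices_from(D_geo, k=1)]  # condensed upper triangle
        d_emb = pdist(Y)                                 # embedding distances
        r = np.corrcoef(d_geo, d_emb)[0, 1]
        return 1.0 - r ** 2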

16  Conformal ISOMAP ◦ Modified weights in the geodesic distance estimate: w(i,j) = d(i,j) / √(M(i)·M(j)), where M(i) is the mean distance from point i to its k nearest neighbours ◦ Magnifies regions with high density ◦ Shrinks regions with low density
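
The only change to the graph construction of slide 6 is this edge reweighting; a self-contained sketch (D and idx as in the earlier kNN code, function name illustrative):

    import numpy as np

    def conformal_weights(D, idx):
        """C-ISOMAP edge weights: w_ij = d_ij / sqrt(M_i * M_j)."""
        N = D.shape[0]
        M = D[np.arange(N)[:, None], idx].mean(axis=1)  # mean distance to the k neighbours
        W = np.full((N, N), np.inf)
        for i in range(N):
            W[i, idx[i]] = D[i, idx[i]] / np.sqrt(M[i] * M[idx[i]])
        np.fill_diagonal(W, 0.0)
        return np.minimum(W, W.T)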

17 [figure slide]

18  Landmark ISOMAP ◦ Use only the geodesic distances from several landmark points (on the manifold) ◦ Use Landmark-MDS for finding the embedding  Involves triangulation of the non-landmark data ◦ Significantly faster, but a higher chance of short-circuiting; the number of landmarks has to be chosen carefully
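
A hedged sketch of Landmark-MDS with the triangulation step (matrix layout and the function name are assumptions; the triangulation follows y = ½ L#(δ̄ − δ), with the rows of L# being v_i/√λ_i):

    import numpy as np

    def landmark_mds(D_land, D_rest, q=2):
        """D_land: n x n distances among landmarks; D_rest: m x n distances
        from the remaining points to the landmarks. Returns both embeddings."""
        n = D_land.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n
        B = -0.5 * J @ (D_land ** 2) @ J             # classical MDS on landmarks
        evals, evecs = np.linalg.eigh(B)
        order = np.argsort(evals)[::-1][:q]
        lam, V = evals[order], evecs[:, order]       # assumes top-q eigenvalues > 0
        L = V * np.sqrt(lam)                         # landmark embedding (n x q)
        Lsharp = V / np.sqrt(lam)                    # pseudoinverse transpose (n x q)
        mean_sq = (D_land ** 2).mean(axis=0)         # mean squared distance per landmark
        Y_rest = 0.5 * (mean_sq - D_rest ** 2) @ Lsharp  # triangulate non-landmarks
        return L, Y_rest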

19  Kernel ISOMAP ◦ Ensures that B (the double-centered distance matrix) is positive semidefinite by the constant-shifting method

20  Core idea ◦ Estimate each point as a linear combination of its neighbours – find the best such weights ◦ The same linear representation should hold in the low-dimensional space

21  Find the weights W_ij by constrained minimization of E(W) = Σ_i ‖x_i − Σ_j W_ij x_j‖², subject to Σ_j W_ij = 1 and W_ij = 0 whenever x_j is not a neighbour of x_i  Neighbourhood-preserving mapping
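
Row-wise, this reduces to a small linear system on the local Gram matrix; a sketch (the regularization term is a standard practical addition, not from the slide):

    import numpy as np

    def lle_weights(X, idx, reg=1e-3):
        """X: N x D data; idx: N x k neighbour indices. Returns N x N weights W."""
        N, k = idx.shape
        W = np.zeros((N, N))
        for i in range(N):
            Z = X[idx[i]] - X[i]                   # neighbours centered on x_i
            C = Z @ Z.T                            # local k x k Gram matrix
            C += np.eye(k) * reg * np.trace(C)     # regularize (C may be singular)
            w = np.linalg.solve(C, np.ones(k))
            W[i, idx[i]] = w / w.sum()             # enforce sum-to-one constraint
        return W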

22  Low-dimensional representation Y minimizing Φ(Y) = Σ_i ‖y_i − Σ_j W_ij y_j‖² with the weights W fixed  We take the eigenvectors of M = (I − W)ᵀ(I − W) corresponding to its q+1 smallest eigenvalues and discard the smallest one (a constant eigenvector)  Actually, different algebra is used to improve numerical stability and speed
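
A direct dense version of this eigenvector step (real implementations use sparse eigensolvers instead, as the slide notes):

    import numpy as np

    def lle_embedding(W, q=2):
        """Embed in R^q from the LLE weight matrix W (N x N)."""
        N = W.shape[0]
        M = (np.eye(N) - W).T @ (np.eye(N) - W)
        evals, evecs = np.linalg.eigh(M)       # eigenvalues in ascending order
        # skip the smallest (constant) eigenvector; take the next q
        return evecs[:, 1:q + 1]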

23 [figure slide]

24 [figure slide]

25  ISOMAP ◦ Preserves global geometric properties (geodesic distances), especially for far-away points  LLE ◦ Preserves only the local neighbourhood correspondence ◦ Copes with non-isometric mappings ◦ The manifold is not explicitly required ◦ Difficult to estimate q (the number of dimensions)

26 The end

