
1 Atul Singh, Junior Undergraduate, CSE, IIT Kanpur

2  Dimension reduction is a technique used to represent high-dimensional data more compactly using some alternate representation  Many data-generation processes produce large data sets that are embedded in a low-dimensional manifold of a high-dimensional space. Dimension reduction can be applied to such data sets

3  In statistical pattern recognition, we often encounter data sets in high-dimensional spaces  Often the data is correlated in such a manner that there are very few independent dimensions  It is therefore possible to represent the data using much lower dimensions. Some benefits are – ◦ Compact representation ◦ Less processing time ◦ Visualization of high-dimensional data ◦ Interpolation along meaningful dimensions

4  Linear Methods ◦ Principal Component Analysis (PCA) ◦ Independent Component Analysis (ICA) ◦ Multi-dimensional Scaling (MDS)  Non-linear Methods ◦ Global  Isomap and its variants ◦ Local  Locally Linear Embedding (LLE)  Laplacian Eigenmaps

5  Principal Component Analysis ◦ Involves finding the directions of largest covariance ◦ Expresses the data as a linear combination of the eigenvectors along those directions  Multidimensional Scaling ◦ Keeps inter-point distances invariant ◦ Again a linear method
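
Since the slide only names the PCA recipe, here is a minimal sketch of it (not from the original talk; the function name `pca` and the NumPy-based implementation are illustrative assumptions): center the data, eigendecompose the covariance matrix, and project onto the top-d eigenvectors.

```python
import numpy as np

def pca(X, d=2):
    """Minimal PCA sketch: project n points in R^D onto the top-d
    directions of largest covariance."""
    Xc = X - X.mean(axis=0)           # center the data
    C = Xc.T @ Xc / len(X)            # D x D covariance matrix
    w, V = np.linalg.eigh(C)          # eigenvalues in ascending order
    top = np.argsort(w)[::-1][:d]     # indices of the d largest eigenvalues
    return Xc @ V[:, top]             # coordinates in the eigenbasis
```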

6  Many data sets can’t be represented as linear combinations of some basis vectors ◦ Examples – Swiss roll, face images, etc.  In general the low-dimensional data is embedded in some non-linear manifold  It is not possible to transform these manifolds to a low-dimensional space using only translation, rotation, and rescaling

7  Linear methods have a preconceived notion of dimension reduction  The goal is to automate the estimation (infer the degrees of freedom of the data manifold) and classification (embed the data in a low-dimensional space) process  So we need to go beyond linear methods  Non-linear Methods ◦ Isomap ◦ Locally Linear Embedding

8  A global manifold learning algorithm  A Swiss roll (fig. a) embedded as a manifold in a high-dimensional space. The goal is to reduce it to two dimensions (fig. c)

9  Consider a human face ◦ How is it represented/stored in the brain? ◦ How is it represented/stored in a computer?  Do we need to store all the information (every pixel)?  We just need to capture some important structures

10  The basic steps involved are ◦ Construct neighborhood graph: Determine the neighbors of each point and assign edge weights d_X(i, j) to the graph thus formed ◦ Compute geodesic distances: Estimate the geodesic distances d_M(i, j) using Dijkstra’s algorithm ◦ Use MDS to lower dimensions: Apply MDS on the computed shortest-path distance matrix and thus reduce the dimensionality  So basically similar to MDS, just using geodesic distances rather than Euclidean ones, as sketched below
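
A minimal sketch of the three steps above, assuming NumPy/SciPy and a connected neighborhood graph; the function name `isomap` and the parameter defaults are illustrative, not from the original slides.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import shortest_path

def isomap(X, k=10, d=2):
    """Sketch of Isomap's three steps; assumes the k-NN graph is connected."""
    n = X.shape[0]
    D = cdist(X, X)                           # Euclidean distances d_X(i, j)
    # Step 1: neighborhood graph -- keep edges to the k nearest neighbors
    G = np.full((n, n), np.inf)               # inf marks a non-edge
    nbrs = np.argsort(D, axis=1)[:, 1:k + 1]  # skip self at column 0
    for i in range(n):
        G[i, nbrs[i]] = D[i, nbrs[i]]
    # Step 2: geodesic distances d_M(i, j) via shortest paths (Dijkstra)
    DM = shortest_path(G, method='D', directed=False)
    # Step 3: classical MDS on the geodesic distance matrix
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * H @ (DM ** 2) @ H              # double-centered squared distances
    w, V = np.linalg.eigh(B)                  # eigenvalues in ascending order
    top = np.argsort(w)[::-1][:d]
    return V[:, top] * np.sqrt(np.maximum(w[top], 0))
```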

12  Another non-linear dimension reduction algorithm  Follows a local approach to reducing the dimensions  The idea is based on the assumption that a point on a manifold resides on a hyperplane (a locally linear patch) determined by the point and some of its nearest neighbors  Key question – how to combine these pieces of hyperplanes and map them to a low-dimensional space?

13  The basic steps involved are ◦ Assign neighbors: To each point assign neighbors using a nearest-neighbor approach ◦ Weight calculation: Compute weights W_ij such that each X_i is best reconstructed from its neighbors ◦ Compute low-dimensional embedding: Using the computed weight matrix, find the corresponding embedding vectors Y_i in the lower-dimensional space by minimizing the error function, as in the sketch below
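
A hedged sketch of these three steps, again assuming NumPy/SciPy; the function name `lle`, the regularization constant, and the parameter defaults are illustrative choices, not the authors’ code.

```python
import numpy as np
from scipy.spatial.distance import cdist

def lle(X, k=10, d=2, reg=1e-3):
    """Minimal LLE sketch following the three steps above."""
    n = X.shape[0]
    # Step 1: k nearest neighbors of each point (skip self at column 0)
    nbrs = np.argsort(cdist(X, X), axis=1)[:, 1:k + 1]
    # Step 2: weights W_ij that best reconstruct X_i from its neighbors
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                  # neighbors shifted to the origin
        C = Z @ Z.T                            # local Gram matrix
        C += reg * np.trace(C) * np.eye(k)     # regularize (C can be singular)
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()            # weights constrained to sum to 1
    # Step 3: embedding Y minimizing sum_i |Y_i - sum_j W_ij Y_j|^2
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:d + 1]                    # skip the constant bottom eigenvector
```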

14  The weights computed for reconstruction are invariant with respect to translation, rotation, and rescaling  The same weights should therefore reconstruct the map in the reduced dimensions  So we can conclude that the local geometry is preserved (a quick numerical check follows)
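
A quick numerical check of this invariance claim (my own illustration, not from the slides): the reconstruction weights of a point are unchanged when the point and its neighbors are rotated, translated, and rescaled together.

```python
import numpy as np

rng = np.random.default_rng(0)
x, Z = rng.normal(size=3), rng.normal(size=(5, 3))  # a point and 5 neighbors

def weights(x, Z, reg=1e-3):
    """LLE-style reconstruction weights of x from neighbor rows Z."""
    G = (Z - x) @ (Z - x).T
    G += reg * np.trace(G) * np.eye(len(Z))
    w = np.linalg.solve(G, np.ones(len(Z)))
    return w / w.sum()

R = np.linalg.qr(rng.normal(size=(3, 3)))[0]  # random orthogonal transform
t, s = rng.normal(size=3), 2.5                # translation and scale
w1 = weights(x, Z)
w2 = weights(s * (x @ R) + t, s * (Z @ R) + t)
print(np.allclose(w1, w2))                    # True: weights are invariant
```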

16  Shortcomings of Isomap ◦ Needs a dense sampling of data on the manifold ◦ If k is chosen too small, the residual error will be too large ◦ If k is chosen too large, short-circuiting may happen  Shortcomings of LLE ◦ Due to its local nature, it doesn’t give a complete picture of the data ◦ Again, problems with the selection of k

17  Short-circuiting ◦ “When the distance between the folds is very small, or there is noise such that a point from a different fold is chosen to be a neighbour of the point, the computed distance does not represent the geodesic distance and hence the algorithm fails”  Insight ◦ This problem arises from selecting neighbors solely on the basis of their Euclidean distance. The basic selection criteria are  Select all the points within a ball of radius ε  Select the K nearest neighbors ◦ Locally Linear Isomap overcomes this problem by modifying the neighbor-selection criterion  Proposed algorithm – K_LL Isomap

18  Similar to Tenenbaum’s Isomap except for the selection of nearest neighbors  Previous algorithms (Isomap, LLE) consider only the Euclidean distance as the neighborhood criterion  The proposed algorithm is ◦ Find a candidate neighborhood using the K-nn approach ◦ Reconstruct each data point from its candidate neighbors (as in LLE) so as to minimize the reconstruction error ◦ Neighbors whose Euclidean distance is small and which lie on the locally linear patch of the manifold get higher weights and hence are selected preferentially ◦ Finally K_LL ≤ K neighbors are chosen based on the reconstruction weights, as in the sketch below
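
A sketch of this neighbor-selection step as I read it from the description above (the paper’s actual implementation may differ; `ll_neighbors` and all parameter names are hypothetical): compute LLE-style reconstruction weights over the K candidates and keep the K_LL neighbors with the largest weights. Geodesic distances and MDS would then be computed on this pruned graph exactly as in Isomap.

```python
import numpy as np
from scipy.spatial.distance import cdist

def ll_neighbors(X, K=12, K_LL=8, reg=1e-3):
    """For each point: K candidate neighbors -> LLE reconstruction
    weights -> keep the K_LL neighbors with the largest weights."""
    n = X.shape[0]
    cand = np.argsort(cdist(X, X), axis=1)[:, 1:K + 1]
    selected = np.zeros((n, K_LL), dtype=int)
    for i in range(n):
        Z = X[cand[i]] - X[i]                 # candidates shifted to the origin
        C = Z @ Z.T
        C += reg * np.trace(C) * np.eye(K)    # regularize the Gram matrix
        w = np.linalg.solve(C, np.ones(K))
        w /= w.sum()
        # Points on the locally linear patch receive higher weights
        selected[i] = cand[i][np.argsort(w)[::-1][:K_LL]]
    return selected
```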

19  K_LL Isomap has been demonstrated to perform better than Isomap on ◦ Sparsely sampled data ◦ Noisy data ◦ Dense data without noise  The metrics used for the analysis are “short-circuit edges” and “residual variance”  Future work – to establish a formal proof of the better performance of this algorithm

20  Tenenbaum J.B., de Silva V., Langford J.C.: A Global Geometric Framework for Nonlinear Dimensionality Reduction  Roweis S.T., Saul L.K.: Nonlinear Dimensionality Reduction by Locally Linear Embedding  Balasubramanian M., Schwartz E.L., Tenenbaum J.B., de Silva V., Langford J.C.: The Isomap Algorithm and Topological Stability  de Silva V., Tenenbaum J.B.: Global versus local methods in nonlinear dimensionality reduction  Roweis S.T., Saul L.K.: An Introduction to Locally Linear Embedding  Saxena A., Gupta A., Mukherjee A.: Non-linear Dimensionality Reduction by Locally Linear Isomaps

