# RANDOM PROJECTIONS IN DIMENSIONALITY REDUCTION: APPLICATIONS TO IMAGE AND TEXT DATA

Ella Bingham and Heikki Mannila

Ângelo Cardoso, IST/UTL, November 2009

Outline

1. Dimensionality Reduction: Motivation
2. Methods for dimensionality reduction
   1. PCA
   2. DCT
   3. Random Projection
3. Results on Image Data
4. Results on Text Data
5. Conclusions

Dimensionality Reduction: Motivation

- Many applications have high-dimensional data
  - Market basket analysis: wealth of alternative products
  - Text: large vocabulary
  - Image: large image window
- We want to process the data
- High dimensionality of data restricts the choice of data processing methods
  - The time needed to run processing methods is too long
  - Memory requirements make it impossible to use some methods

Dimensionality Reduction: Motivation (continued)

- We want to visualize high-dimensional data
- Some features may be irrelevant
- Some dimensions may be highly correlated with others, e.g. height and foot size
- The "intrinsic" dimensionality may be smaller than the number of features
- The data can be best described and understood with a smaller number of dimensions

Methods for dimensionality reduction

- The main idea is to project the high-dimensional (d) space into a lower-dimensional (k) space
- A statistically optimal way is to project into a lower-dimensional orthogonal subspace that captures as much of the variation of the data as possible for the chosen k
- The best (in terms of mean squared error) and most widely used way to do this is PCA
- How do we compare different methods?
  - Amount of distortion caused
  - Computational complexity

Principal Components Analysis (PCA): Intuition

- Given points in an original d-dimensional space (d = 2 in the figure)
- How can we represent those points in a k-dimensional space (k <= d) while preserving as much information as possible?

[Figure: data points plotted against the original axes, with the first and second principal components overlaid]

Principal Components Analysis (PCA): Algorithm

- Eigenvalues
  - A measure of how much data variance is explained by each eigenvector
- Singular Value Decomposition (SVD)
  - Can be used to find the eigenvectors and eigenvalues of the covariance matrix
- To project into the lower-dimensional space
  - Subtract the mean of X in each dimension, then multiply the centered X by the principal components (PCs)
- To restore into the original space
  - Multiply the projection by the principal components and add the mean of X in each dimension
- Algorithm
  1. X ← create the N x d data matrix, with one row vector x_n per data point
  2. X ← subtract the mean x̄ from each dimension in X
  3. Σ ← covariance matrix of X
  4. Find the eigenvectors and eigenvalues of Σ
  5. PCs ← the k eigenvectors with the largest eigenvalues
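The five steps above can be sketched in NumPy as follows. This is a minimal illustration, not the presenters' implementation: the function names `pca_fit`, `pca_project` and `pca_restore` and the synthetic data are assumptions made for the example, and the eigendecomposition uses `np.linalg.eigh` because the covariance matrix is symmetric.

```python
import numpy as np

def pca_fit(X, k):
    """Return the mean, the top-k principal components (d x k) and their eigenvalues."""
    mean = X.mean(axis=0)                    # per-dimension mean
    Xc = X - mean                            # step 2: center the data
    cov = np.cov(Xc, rowvar=False)           # step 3: d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # step 4: eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]    # step 5: indices of the k largest eigenvalues
    return mean, eigvecs[:, order], eigvals[order]

def pca_project(X, mean, pcs):
    """Center X, then multiply by the principal components."""
    return (X - mean) @ pcs

def pca_restore(Z, mean, pcs):
    """Map the projection back to the original space and add the mean."""
    return Z @ pcs.T + mean

# Hypothetical data: 200 correlated points in 5 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

mean, pcs, eigvals = pca_fit(X, k=2)
Z = pca_project(X, mean, pcs)        # 200 x 2 reduced representation
X_hat = pca_restore(Z, mean, pcs)    # 200 x 5 approximation of X
```

With k equal to d the restoration is exact up to rounding; for smaller k, `X_hat` is the least-squares approximation of `X` within the chosen subspace.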

Random Projection (RP): Idea

- PCA, even when calculated using SVD, is computationally expensive
  - Complexity is O(dcN), where d is the number of dimensions, c is the average number of non-zero entries per column and N is the number of points
- Idea
  - What if we constructed the principal component vectors randomly?
- Johnson-Lindenstrauss lemma
  - If points in a vector space are projected onto a randomly selected subspace of suitably high dimension, then the distances between the points are approximately preserved
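The lemma quoted above is often written in the following form. This is a standard textbook statement, given here as background and not taken from the slides; the tolerance $\varepsilon$ and the map $f$ are symbols introduced for this statement, and the constant hidden in the $O(\cdot)$ is left unspecified. For any $0 < \varepsilon < 1$ and any set $S$ of $N$ points in $\mathbb{R}^d$, there is a linear map $f : \mathbb{R}^d \to \mathbb{R}^k$ with $k = O(\varepsilon^{-2} \log N)$ such that, for all $u, v \in S$,

$$
(1 - \varepsilon)\,\lVert u - v \rVert^2 \;\le\; \lVert f(u) - f(v) \rVert^2 \;\le\; (1 + \varepsilon)\,\lVert u - v \rVert^2 .
$$

Note that the required k depends on the number of points N and the tolerance, not on the original dimensionality d.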

Random Projection (RP): Idea (continued)

- Use a random matrix R in place of the principal components matrix
  - R is usually Gaussian distributed
  - Complexity is O(kcN)
- The generated random matrix R is usually not orthogonal
  - Making R orthogonal is computationally expensive
  - However, we can rely on a result by Hecht-Nielsen: in a high-dimensional space, there exists a much larger number of almost orthogonal than orthogonal directions
  - Thus vectors with random directions are close enough to orthogonal
- Euclidean distances in the projected space can be rescaled to approximate the distances in the original space
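A minimal sketch of a Gaussian random projection, assuming the row-vector convention of the PCA slide (one data point per row). The function name `gaussian_random_projection` and the synthetic data are assumptions for the example. The columns of R are normalized to unit length but deliberately not orthogonalized, and the rescaling factor sqrt(d/k) used at the end is the one that follows from that normalization; it is stated here as an assumption, since the formula itself did not survive in the transcript.

```python
import numpy as np

def gaussian_random_projection(X, k, rng):
    """Project the rows of X (N x d) onto k random unit-length directions."""
    d = X.shape[1]
    R = rng.normal(size=(d, k))        # i.i.d. N(0, 1) entries
    R /= np.linalg.norm(R, axis=0)     # unit-length columns, not made orthogonal
    return X @ R

rng = np.random.default_rng(0)
d, k = 2500, 50
X = rng.normal(size=(20, d))           # hypothetical data, 20 points in 2500-d
Y = gaussian_random_projection(X, k, rng)

# Compare one original distance with its rescaled counterpart in the reduced space.
orig = np.linalg.norm(X[0] - X[1])
approx = np.sqrt(d / k) * np.linalg.norm(Y[0] - Y[1])
```

The projection is a single matrix product, which is where the cost advantage over computing an eigendecomposition comes from.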

Simplified Random Projection (SRP)

- The random matrix is usually Gaussian distributed
  - Mean: 0; standard deviation: 1
- Achlioptas showed that a much simpler distribution can be used
- This implies further computational savings, since the matrix is sparse and the computations can be performed using integer arithmetic
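The distribution itself is missing from the transcript. The sketch below assumes the sparse variant commonly attributed to Achlioptas: each entry is sqrt(3) times +1, 0 or -1, drawn with probabilities 1/6, 2/3 and 1/6, which gives zero mean and unit variance. The function names are hypothetical. The matrix is stored as small integers and the sqrt(3) factor is applied once after the product, together with a 1/sqrt(k) scaling so that distances are comparable with the original space.

```python
import numpy as np

def achlioptas_signs(d, k, rng):
    """d x k matrix with entries -1, 0, +1 drawn with probabilities 1/6, 2/3, 1/6."""
    values = np.array([-1, 0, 1], dtype=np.int8)
    return rng.choice(values, size=(d, k), p=[1 / 6, 2 / 3, 1 / 6])

def sparse_random_projection(X, S):
    """Project with the sign matrix S, then apply the sqrt(3/k) scale once."""
    k = S.shape[1]
    return (X @ S) * np.sqrt(3.0 / k)

rng = np.random.default_rng(0)
d, k = 2500, 50
S = achlioptas_signs(d, k, rng)
X = rng.normal(size=(20, d))           # hypothetical data
Y = sparse_random_projection(X, S)

orig = np.linalg.norm(X[0] - X[1])
approx = np.linalg.norm(Y[0] - Y[1])   # already on the scale of the original distances
```

About two thirds of the entries of `S` are zero, so a sparse matrix format would skip most of the multiplications; if `X` holds integers, `X @ S` can be computed entirely in integer arithmetic.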

Discrete Cosine Transform (DCT)

- Widely used method for image compression
- Optimal for the human eye
  - Distortions are introduced at the highest frequencies, which humans tend to neglect as noise
- DCT is not data-dependent, in contrast to PCA, which needs the eigenvalue decomposition
  - This makes DCT orders of magnitude cheaper to compute
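A minimal sketch of DCT-based reduction, assuming a one-dimensional orthonormal DCT-II applied to a flattened signal and keeping only the k lowest-frequency coefficients. The slides do not say which DCT variant or coefficient ordering was used for the image windows, so this is an illustration of the idea only; `dct_matrix`, `dct_reduce` and `dct_restore` are hypothetical names.

```python
import numpy as np

def dct_matrix(d):
    """Orthonormal DCT-II basis: row m holds the cosine with frequency index m."""
    n = np.arange(d)
    m = n[:, None]
    C = np.sqrt(2.0 / d) * np.cos(np.pi * (2 * n + 1) * m / (2 * d))
    C[0] /= np.sqrt(2.0)               # rescale the constant row so that C @ C.T = I
    return C

def dct_reduce(X, k, C):
    """Keep the k lowest-frequency DCT coefficients of each row of X."""
    return X @ C[:k].T

def dct_restore(Z, C):
    """Invert the truncated transform, treating the dropped coefficients as zero."""
    return Z @ C[:Z.shape[1]]

d = 64
C = dct_matrix(d)                      # fixed basis, independent of the data
t = np.linspace(0.0, 1.0, d)
x = np.exp(-((t - 0.5) ** 2) / 0.05)[None, :]   # smooth hypothetical signal, 1 x d

z = dct_reduce(x, 8, C)
x_hat = dct_restore(z, C)
rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
```

Because the basis depends only on d, it can be built once and reused for every data set, which is the data-independence the slide contrasts with PCA.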

Results: Noiseless Images

- Original space: 2500-d (100 image pairs with 50x50 pixels)
- Error measurement
  - Average error in the Euclidean distance between 100 pairs of images in the original and reduced spaces
- Amount of distortion
  - RP and SRP give accurate results even for very small k (k > 10)
    - Distance scaling might be an explanation for the success
  - PCA gives accurate results for k > 600
    - In PCA such scaling is not straightforward
  - DCT still has a significant error even for k > 600
- Computational complexity
  - The number of floating point operations for RP and SRP is on the order of 100 times less than for PCA
- RP and SRP clearly outperform PCA and DCT at the smallest dimensions
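The error measurement described above can be sketched as follows. The data here is synthetic (uniform random vectors standing in for 50x50 image windows), so the numbers it produces say nothing about the reported results. The helper `mean_distance_error` is a hypothetical name, and it assumes the error is the mean absolute difference between the rescaled reduced-space distance and the original distance; the slides do not say whether a signed or absolute difference was averaged.

```python
import numpy as np

def mean_distance_error(X, Y, pairs, scale=1.0):
    """Mean absolute difference between original and rescaled reduced-space distances."""
    i, j = pairs[:, 0], pairs[:, 1]
    d_orig = np.linalg.norm(X[i] - X[j], axis=1)
    d_red = scale * np.linalg.norm(Y[i] - Y[j], axis=1)
    return float(np.mean(np.abs(d_red - d_orig)))

rng = np.random.default_rng(0)
d, k = 2500, 100
X = rng.random(size=(200, d))          # synthetic stand-in for 200 flattened 50x50 windows
pairs = np.column_stack([np.arange(100), np.arange(100, 200)])   # 100 disjoint pairs

R = rng.normal(size=(d, k))
R /= np.linalg.norm(R, axis=0)         # unit-length random directions
err = mean_distance_error(X, X @ R, pairs, scale=np.sqrt(d / k))
```

Repeating the call for a range of k values, and for each reduction method, gives an error-versus-k comparison of the kind the slide summarizes.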

Results: Noisy Images

- Images were corrupted by salt-and-pepper impulse noise with probability 0.2
- The error is computed in the high-dimensional noiseless space
- RP, SRP, PCA and DCT perform quite similarly to the noiseless case

Results: Text Data

- Data set
  - Newsgroups corpus: sci.crypt, sci.med, sci.space, soc.religion
- Pre-processing
  - Term frequency vectors
  - Some common terms were removed, but no stemming was used
  - Document vectors were normalized to unit length
  - Data was not made zero mean
- Size
  - 5000 terms
  - 2262 newsgroup documents
- Error measurement
  - 100 pairs of documents were randomly selected, and the error between their cosine before and after the dimensionality reduction was calculated
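A sketch of the cosine-based error measurement, using synthetic term-frequency vectors (Poisson counts) in place of the newsgroup documents, so the resulting number is purely illustrative. The helper `pair_cosines` is a hypothetical name, and averaging the absolute cosine difference over the pairs is an assumption; the slide only says that the error between the cosines was calculated.

```python
import numpy as np

def pair_cosines(A, pairs):
    """Cosine similarity between the two rows of A named by each pair."""
    i, j = pairs[:, 0], pairs[:, 1]
    num = np.sum(A[i] * A[j], axis=1)
    den = np.linalg.norm(A[i], axis=1) * np.linalg.norm(A[j], axis=1)
    return num / den

rng = np.random.default_rng(0)
n_docs, n_terms, k = 200, 5000, 500
X = rng.poisson(0.05, size=(n_docs, n_terms)).astype(float)   # synthetic term counts
X /= np.linalg.norm(X, axis=1, keepdims=True)                 # unit-length documents, not centered
pairs = np.column_stack([np.arange(100), np.arange(100, 200)])

R = rng.normal(size=(n_terms, k))      # Gaussian random projection matrix
cos_before = pair_cosines(X, pairs)
cos_after = pair_cosines(X @ R, pairs)
cos_err = float(np.mean(np.abs(cos_after - cos_before)))
```

No distance rescaling is needed here, because the cosine is unchanged when the projected vectors are multiplied by a constant.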

Results: Text Data (continued)

- The cosine was used as the similarity measure since it is more common for this task
- RP is not as accurate as SVD
  - The Johnson-Lindenstrauss result states that Euclidean distances, not cosines, are retained well under random projection
- The RP error may be neglected in most applications
- RP can be used on large document collections with less computational complexity than SVD

Conclusion

- Random Projection is an effective dimensionality reduction method for high-dimensional real-world data sets
- RP preserves the similarities even if the data is projected into a moderate number of dimensions
- RP is beneficial in applications where the distances of the original space are meaningful
- RP is a good alternative to traditional dimensionality reduction methods, which are infeasible for high-dimensional data, since it does not suffer from the curse of dimensionality

