Dimensionality Reduction


1 Dimensionality Reduction

2 Multimedia DBs
Many multimedia applications require efficient indexing in high dimensions (time series, images, videos, etc.)
Answering similarity queries in high dimensions is a difficult problem due to the "curse of dimensionality"
A solution is to use dimensionality reduction

3 High-dimensional datasets
Range queries have very small selectivity
Surface is everything
Partitioning the space is not so easy: 2^d cells if we divide each dimension once
Pairwise distances of points are very skewed (see the simulation below)
(figure: histogram of pairwise distances, frequency vs. distance)
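
As a quick illustration of the last point, here is a small simulation (my own sketch, assuming Python with numpy and scipy; nothing below comes from the slides): as the dimensionality grows, the pairwise distances between random points concentrate around their mean, so a fixed query radius tends to select either almost nothing or almost everything.

    import numpy as np
    from scipy.spatial.distance import pdist

    rng = np.random.default_rng(0)
    for d in (2, 10, 100, 1000):
        X = rng.random((500, d))                 # 500 uniform random points in [0, 1]^d
        D = pdist(X)                             # all unique pairwise Euclidean distances
        print(d, round(D.std() / D.mean(), 3))   # relative spread shrinks as d grows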

4 Dimensionality Reduction
The main idea: reduce the dimensionality of the space
Project the d-dimensional points into a k-dimensional space so that:
k << d
distances are preserved as well as possible
Solve the problem in low dimensions (the GEMINI idea of course…)

5 MDS
Map the items into a k-dimensional space trying to minimize the stress
But the running time is O(N^2), and adding a new item takes O(N)
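
A minimal sketch of the idea (assuming scikit-learn is available; the data and parameters are illustrative, not from the slides): metric MDS takes the full N x N distance matrix as input, which is where the O(N^2) cost comes from.

    import numpy as np
    from sklearn.manifold import MDS

    rng = np.random.default_rng(0)
    items = rng.normal(size=(50, 16))                              # 50 items in a 16-d space
    D = np.linalg.norm(items[:, None] - items[None, :], axis=-1)   # N x N pairwise distances

    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    X = mds.fit_transform(D)                                       # 50 x 2 coordinates
    print(mds.stress_)                                             # residual stress after optimization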

6 FastMap
Find two objects (pivots) that are far away from each other
Project all points onto the line the two pivots define to get the first coordinate
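
A minimal sketch of this step (my own illustration, not the authors' code; the helper names are made up): pick the pivots with the usual farthest-pair heuristic, then compute the first coordinate of every object with the cosine law, using only pairwise distances.

    import numpy as np

    def choose_pivots(D, start=0, iters=5):
        """Heuristic pivot choice: repeatedly jump to the object farthest
        from the current anchor, so the returned pair (a, b) is far apart."""
        a, b = start, start
        for _ in range(iters):
            a, b = b, int(np.argmax(D[b]))
        return a, b

    def first_coordinate(D, a, b):
        """x_i = (d(a,i)^2 + d(a,b)^2 - d(b,i)^2) / (2 * d(a,b))   (cosine law)"""
        dab = D[a, b]
        return (D[a] ** 2 + dab ** 2 - D[b] ** 2) / (2.0 * dab)

The point of the formula is that it needs only the distance function, never explicit coordinates in the original space.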

7 FastMap - next iteration
Project the points onto the hyperplane perpendicular to the line of the previous pivots, and repeat using the residual distances in that hyperplane (figure)

8 Example
(figure: five objects O1 to O5; distances within each group are about 1, distances between the two groups are about 100)

9 Example
Pivot objects: O1 and O4
X1: O1: 0, O2: 0.005, O3: 0.005, O4: 100, O5: 99
For the second iteration the pivots are O2 and O5

10 Results
Documents / cosine similarity -> Euclidean distance (how? see the sketch below)
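
One standard answer to the "how?" above (my assumption about what was intended): if the document vectors are normalized to unit length, then ||a - b||^2 = 2 - 2*cos(a, b), so cosine similarity converts directly into a Euclidean distance.

    import numpy as np

    def cosine_to_euclidean(cos_sim):
        """Euclidean distance between unit-length vectors with the given cosine similarity."""
        cos_sim = np.asarray(cos_sim, dtype=float)
        return np.sqrt(np.maximum(0.0, 2.0 - 2.0 * cos_sim))   # clamp tiny negative round-off

    print(cosine_to_euclidean([1.0, 0.5, 0.0]))                 # approximately [0, 1, 1.414]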

11 Results
(figure: projection of three document collections: bb, reports, recipes)

12 FastMap Extensions
If the original space is not a Euclidean space, then we may have a problem: the projected distance may be a complex number!
A solution to that problem is to define:
d_i(a,b) = sign(d_i^2(a,b)) * |d_i^2(a,b)|^(1/2), where d_i^2(a,b) = d_{i-1}(a,b)^2 - (x_a^i - x_b^i)^2
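
A small sketch of this fix (a hypothetical helper, continuing the Python illustration used earlier): compute the squared residual distances for the next iteration and take a signed square root, so a negative residual never turns into a complex number.

    import numpy as np

    def residual_distances(D, x):
        """Distances for the next FastMap iteration, given the current
        N x N distance matrix D and the coordinates x from this iteration."""
        diff = x[:, None] - x[None, :]
        sq = D ** 2 - diff ** 2                    # can dip below zero in non-Euclidean spaces
        return np.sign(sq) * np.sqrt(np.abs(sq))   # sign(d_i^2) * |d_i^2|^(1/2)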

13 Other DR methods
Use SVD and project the items onto the first k eigenvectors
Random projections

14 SVD decomposition - the Karhunen-Loeve transform
Intuition: find the axis that shows the greatest variation, and project all points onto this axis
(figure: original axes f1, f2 and principal axes e1, e2)

15 SVD: The mathematical formulation
Find the eigenvectors of the covariance matrix
These define the new space
Sort the eigenvalues in "goodness" order (largest variance first)
(figure: original axes f1, f2 and eigenvector axes e1, e2)
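
A minimal sketch of this formulation (my own illustration, assuming numpy; the function name is made up): center the data, eigen-decompose the covariance matrix, sort by eigenvalue, and project onto the top k eigenvectors.

    import numpy as np

    def kl_transform(A, k):
        """A is N x M (N points, M dimensions); returns the N x k projection."""
        A = A - A.mean(axis=0)                  # center the points
        C = np.cov(A, rowvar=False)             # M x M covariance matrix
        evals, evecs = np.linalg.eigh(C)        # eigenvalues in ascending order
        order = np.argsort(evals)[::-1][:k]     # "goodness" order: largest variance first
        return A @ evecs[:, order]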

16 SVD: The mathematical formulation
Let A be the N x M matrix of N M-dimensional points
The SVD decomposition of A is: A = U x L x V^T
But the running time is O(N M^2). How many I/Os?
Idea: find the eigenvectors of C = A^T x A (M x M)
If M << N, then C can be kept in main memory and can be constructed in one pass (so we can find V and L)
One more pass to construct U
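
A sketch of that two-pass idea (my reading of the slide; illustrative code only): accumulate C = A^T A row by row in the first pass, eigen-decompose it in memory to obtain V and the singular values, then a second pass over the rows produces the reduced coordinates (the rows of U x L).

    import numpy as np

    def svd_via_gram(row_iter, M, k):
        """row_iter() must return a fresh iterator over the N rows (M-dimensional points)."""
        C = np.zeros((M, M))
        for r in row_iter():                            # pass 1: build C = A^T A in memory
            r = np.asarray(r, dtype=float)
            C += np.outer(r, r)
        evals, V = np.linalg.eigh(C)
        order = np.argsort(evals)[::-1][:k]
        V = V[:, order]                                 # top-k right singular vectors
        L = np.sqrt(np.maximum(evals[order], 0.0))      # top-k singular values
        coords = np.array([np.asarray(r, dtype=float) @ V for r in row_iter()])   # pass 2: rows of U x L
        return coords, L, V

    A = np.random.default_rng(0).normal(size=(1000, 8))
    coords, L, V = svd_via_gram(lambda: iter(A), M=8, k=2)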

17 SVD Cont'd
Advantages:
Optimal dimensionality reduction (for linear projections)
Disadvantages:
Computationally hard, especially if the time series are very long
Does not work for subsequence indexing

18 SVD Extensions
On-line approximation algorithm [Ravi Kanth et al, 1998]
Local dimensionality reduction: cluster the time series, then solve for each cluster [Chakrabarti and Mehrotra, 2000], [Thomasian et al]

19 Random Projections
Based on the Johnson-Lindenstrauss lemma:
For any (sufficiently large) set S of M points in R^n and k = O(ε^-2 ln M),
there exists a linear map f: S -> R^k such that
(1 - ε) D(u,v) < D(f(u), f(v)) < (1 + ε) D(u,v) for all u, v in S
A random projection is good with constant probability

20 Random Projection: Application
Set k = O(ε^-2 ln M)
Select k random n-dimensional vectors (one approach: select k Gaussian-distributed vectors with mean 0 and variance 1, i.e. with N(0,1) entries)
Project the time series onto the k vectors
The resulting k-dimensional space approximately preserves the distances with high probability
Monte-Carlo algorithm: we do not know if the result is correct (see the sketch below)
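
A minimal sketch of this procedure (my own illustration; the constant in the choice of k is an assumption, not from the slides): draw an n x k matrix with i.i.d. N(0, 1) entries, project, and rescale so that distances are preserved in expectation.

    import numpy as np

    def random_project(A, eps=0.5, seed=0):
        """A is N x n; returns an N x k projection with k = O(eps^-2 * ln N)."""
        N, n = A.shape
        k = int(np.ceil(4.0 * np.log(N) / eps ** 2))            # illustrative constant
        R = np.random.default_rng(seed).normal(size=(n, k))     # i.i.d. N(0, 1) entries
        return (A @ R) / np.sqrt(k)                             # rescaling preserves distances in expectation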

21 Random Projection
A very useful technique, especially when used in conjunction with another technique (for example SVD)
Use random projection to reduce the dimensionality from thousands to hundreds, then apply SVD to reduce the dimensionality further
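
A sketch of that pipeline, assuming scikit-learn (the sizes are made-up toy numbers): random projection cheaply takes the data from thousands of dimensions down to a few hundred, and SVD then performs the final reduction.

    import numpy as np
    from sklearn.random_projection import GaussianRandomProjection
    from sklearn.decomposition import TruncatedSVD

    X = np.random.default_rng(0).normal(size=(2000, 5000))        # toy high-dimensional data
    X_rp = GaussianRandomProjection(n_components=300, random_state=0).fit_transform(X)
    X_low = TruncatedSVD(n_components=20, random_state=0).fit_transform(X_rp)
    print(X_low.shape)                                             # (2000, 20)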

