FastMap: Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets


1 FastMap: Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets

2 Abstract
- Describes a fast algorithm that maps objects into points in some k-dimensional space such that the dissimilarities are preserved.

3 Abstract
- Thus, we can subsequently use fine-tuned spatial access methods (SAMs) to answer queries such as “query by example” or “all pairs query”.

4 Introduction
- It is not easy to design k feature-extraction functions that map objects to k-dimensional points
- For instance, with typed English words, what distance function should we use to measure the cost of transforming one string into the other?

5 Solutions
- Old: Multi-Dimensional Scaling (MDS)
  - Unsuitable for indexing
- Proposed: a fast algorithm (FastMap)
  - Much faster
  - Allows indexing

6 Applications
- Image and multimedia databases
- Medical databases

7 Applications
- String databases, e.g. OCR
- Time series, e.g. financial data

8 Applications
- Data mining and visualization applications

9 Desirable types of queries
- Query-by-example: search a collection of objects to find the ones that are within a user-defined distance of the query object
- All pairs query: find the pairs of objects that are within a given distance of each other
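
Once objects have k-d images, both query types reduce to distance tests between points. The following is a minimal sketch using a plain linear scan; a SAM such as an R*-tree would replace the scan in practice. The function names and the sample points are illustrative assumptions, not part of the presentation.

```python
from itertools import combinations
from math import dist

def query_by_example(points, query, eps):
    """Return indices of points within distance eps of the query point."""
    return [i for i, p in enumerate(points) if dist(p, query) <= eps]

def all_pairs(points, eps):
    """Return index pairs (i, j), i < j, whose points lie within eps of each other."""
    return [(i, j)
            for (i, p), (j, q) in combinations(enumerate(points), 2)
            if dist(p, q) <= eps]

# Hypothetical 2-d images of three objects.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
near_origin = query_by_example(points, (0.0, 0.0), 1.5)
close_pairs = all_pairs(points, 1.5)
```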

10 Benefits of mapping objects
- Reduces search time for queries by employing SAMs such as R*-trees and z-ordering
- Helps with visualization, clustering and data mining

11 An ideal mapping …
- Is fast to compute: $O(N)$ or $O(N \log N)$, but not $O(N^2)$
- Preserves distances with little discrepancy
- Is very fast at mapping a new object

12 MDS
- Used to discover the underlying (spatial) structure of a set of data items from the (dis)similarity information
- Maps objects to a k-dimensional space so as to minimize the stress function

13 MDS
- Stress function: the average difference between the distances of the “images” and the actual distances
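
The slide states the stress function only informally. A minimal sketch, assuming the normalized form stress = sqrt(sum (d'_ij - d_ij)^2 / sum d_ij^2), where d_ij is the original dissimilarity and d'_ij the distance between the images. This exact form, the function name, and the matrix layout are assumptions rather than something given on the slides.

```python
from math import sqrt

def stress(d_orig, d_image):
    """Normalized stress between two symmetric distance matrices (lists of lists).

    d_orig[i][j]  : original dissimilarity between objects i and j
    d_image[i][j] : distance between their images in the target space
    """
    n = len(d_orig)
    num = sum((d_image[i][j] - d_orig[i][j]) ** 2
              for i in range(n) for j in range(n))
    den = sum(d_orig[i][j] ** 2 for i in range(n) for j in range(n))
    return sqrt(num / den)

d = [[0.0, 3.0], [3.0, 0.0]]          # two objects at distance 3
collapsed = [[0.0, 0.0], [0.0, 0.0]]  # a mapping that puts both at one point
perfect = stress(d, d)
worst = stress(d, collapsed)
```

A stress of 0 means every distance is preserved exactly; larger values mean more distortion.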

14 Drawbacks of MDS
- Requires $O(N^2)$ time, which is impractical for large databases
- Fast retrieval is questionable, as MDS is not designed for the “query-by-example” operation

15 Definitions
- The k-d point $P_i$ that corresponds to the object $O_i$ will be called the ‘image’ of object $O_i$. That is, $P_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,k})$
- The k-d space containing the ‘images’ will be called the target space

16 Proposed algorithm
- Assumption: a domain expert has provided us only with a distance/dissimilarity function D(*, *)
- For instance, the Euclidean distance between two feature vectors can serve as the distance function between the corresponding objects

17 Proposed algorithm
- Pretend that the objects are indeed points in some unknown n-dimensional space, and try to project these points onto k mutually orthogonal directions
- The challenge is to compute these projections from the distance matrix only

18 Proposed algorithm
- Project the objects onto a carefully selected “line”
- Choose two objects $O_a$ and $O_b$ as the “pivot objects”

19 Proposed algorithm
- Compute the distance of each point from the pivot points using only the information we know, i.e., the distances between objects

20 Proposed algorithm
[Figure: objects $O_a$, $O_b$, $O_i$ and the coordinate $x_i$]

21 Proposed algorithm
- By the cosine law, in any triangle $O_a O_i O_b$: $d_{b,i}^2 = d_{a,i}^2 + d_{a,b}^2 - 2 x_i d_{a,b}$
- $d_{i,j}$ is shorthand for the distance $D(O_i, O_j)$

22 Proposed algorithm
- Solving for $x_i$: $x_i = \frac{d_{a,i}^2 + d_{a,b}^2 - d_{b,i}^2}{2 d_{a,b}}$
- We can thus map objects to points on a line, preserving some of the distance information
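
The projection formula needs only three distances. A minimal sketch follows; the function name `project` and the sample coordinates are illustrative assumptions, and Euclidean points are used only so the result can be checked by eye.

```python
from math import dist

def project(d_ai, d_bi, d_ab):
    """Coordinate x_i of object O_i on the line through pivots O_a and O_b.

    Uses only distances: x_i = (d_ai^2 + d_ab^2 - d_bi^2) / (2 * d_ab).
    """
    return (d_ai ** 2 + d_ab ** 2 - d_bi ** 2) / (2 * d_ab)

# Hypothetical example: the pivots lie on the x-axis, so the projection of
# O_i onto the pivot line should equal its x-coordinate.
o_a, o_b, o_i = (0.0, 0.0), (4.0, 0.0), (1.0, 2.0)
x_i = project(dist(o_a, o_i), dist(o_b, o_i), dist(o_a, o_b))
```

By construction the pivot $O_a$ maps to 0 and the pivot $O_b$ maps to $d_{a,b}$.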

23 Proposed algorithm
- The 2-d case is solved
- Extend to higher dimensions
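
To go beyond one coordinate, the original FastMap paper (Faloutsos and Lin) projects the objects onto the hyperplane perpendicular to the line $O_a O_b$ and repeats the one-dimensional step there. As recalled from that paper rather than from these slides, the distance between the projections $O_i'$ and $O_j'$ follows from the Pythagorean theorem:

```latex
\left( D'(O_i', O_j') \right)^2 = \left( D(O_i, O_j) \right)^2 - (x_i - x_j)^2
```

Here $x_i$ and $x_j$ are the coordinates already computed on the line $O_a O_b$, so $D'$ again depends only on known distances.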

24 Proposed algorithm
- Determines the coordinates of the N objects on a new axis in each of the k recursive calls
- Records the “pivot objects” of each recursive call, to facilitate queries
- Chooses the pivot objects with a heuristic algorithm
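
The slides do not spell out the pivot heuristic. A minimal sketch, assuming the "choose distant objects" idea from the original FastMap paper: start from an arbitrary object, jump to the object farthest from it, then jump again. The function name, parameters, and sample data are illustrative assumptions.

```python
def choose_distant_objects(objects, distance, start=0, iterations=1):
    """Heuristically pick two far-apart objects; returns their indices (a, b).

    Each iteration scans all objects twice, so the cost is linear in N.
    `iterations` must be at least 1.
    """
    indices = range(len(objects))
    b = start  # arbitrary initial second pivot
    for _ in range(iterations):
        # a: the object farthest from the current b
        a = max(indices, key=lambda i: distance(objects[i], objects[b]))
        # b: the object farthest from that a
        b = max(indices, key=lambda i: distance(objects[i], objects[a]))
    return a, b

# Hypothetical 1-d example with absolute difference as the distance.
values = [0.0, 1.0, 10.0, 4.0]
pivot_a, pivot_b = choose_distant_objects(values, lambda x, y: abs(x - y), start=1)
```

The result is not guaranteed to be the farthest pair overall; it trades optimality for a linear number of distance computations.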

25 Proposed algorithm
- All steps are linear
- Complexity is $O(Nk)$
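
Putting the steps together, the following is a minimal sketch of the whole mapping, written as a loop over the k axes instead of k recursive calls. It assumes the pivot heuristic and the residual-distance rule from the original FastMap paper; the function name `fastmap`, its return shape, and the sample points are illustrative assumptions.

```python
from math import dist, sqrt

def fastmap(objects, distance, k):
    """Map each object to a k-d point; returns (images, pivots)."""
    n = len(objects)
    images = [[0.0] * k for _ in range(n)]
    pivots = []  # pivot index pair used for each axis, kept for mapping queries

    def d2(i, j, col):
        # Squared distance left over after the first `col` coordinates are
        # accounted for. Clamped at 0 to absorb round-off and non-Euclidean input.
        s = distance(objects[i], objects[j]) ** 2
        s -= sum((images[i][c] - images[j][c]) ** 2 for c in range(col))
        return max(s, 0.0)

    for col in range(k):
        # Pivot heuristic: start at object 0, jump to the farthest object twice.
        b = 0
        a = max(range(n), key=lambda i: d2(i, b, col))
        b = max(range(n), key=lambda i: d2(i, a, col))
        d_ab2 = d2(a, b, col)
        if d_ab2 == 0.0:
            break  # nothing left to separate; remaining coordinates stay 0
        pivots.append((a, b))
        d_ab = sqrt(d_ab2)
        for i in range(n):
            # Cosine-law projection onto the line through the two pivots.
            images[i][col] = (d2(i, a, col) + d_ab2 - d2(i, b, col)) / (2 * d_ab)
    return images, pivots

# Hypothetical check: 2-d Euclidean points mapped into a 2-d target space.
points = [(0.0, 0.0), (4.0, 0.0), (1.0, 2.0), (3.0, -1.0)]
images, pivots = fastmap(points, dist, 2)
```

Each axis calls the distance function a number of times proportional to N, which is where the $O(Nk)$ count of distance computations comes from. For these sample points the images reproduce all pairwise distances up to floating-point error, since the data is already Euclidean and 2-d.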

26 Experiments
- Compare FastMap with MDS
  - Speed and quality
- Illustrate the visualization and clustering abilities
  - Real and synthetic datasets

27 Comparison with MDS
- Response time vs. database size

28 Comparison with MDS
- Response time vs. number of dimensions k

29 Comparison with MDS
- Response time vs. stress

30 Clustering/visualization properties of FastMap

32 Conclusion
- A fast algorithm to map objects into points in k-d space
- Accelerates searching through highly optimized SAMs, e.g. R-trees and R*-trees
- Applications of the algorithm include multimedia databases, data mining, clustering and document retrieval

33 References
- Christos Faloutsos, King-Ip (David) Lin. FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets
- Joseph B. Kruskal, Myron Wish. Multidimensional Scaling

