1 NONLINEAR MAPPING: APPROACHES BASED ON OPTIMIZING AN INDEX OF CONTINUITY AND APPLYING CLASSICAL METRIC MDS TO REVISED DISTANCES By Ulas Akkucuk & J.


1 NONLINEAR MAPPING: APPROACHES BASED ON OPTIMIZING AN INDEX OF CONTINUITY AND APPLYING CLASSICAL METRIC MDS TO REVISED DISTANCES
By Ulas Akkucuk & J. Douglas Carroll
Rutgers Business School – Newark and New Brunswick

2 Outline
Introduction
Nonlinear Mapping Algorithms
–Parametric Mapping Approach
–ISOMAP Approach
–Other Approaches
Experimental Design and Methods
–Error Levels
–Evaluation of Mapping Performance
–Problem of Similarity Transformations
Results
Discussion and Future Direction

3 Introduction
Problem: To determine a smaller set of variables necessary to account for a larger number of observed variables
PCA and MDS are useful when the relationship is linear
Alternative approaches are needed when the relationship is highly nonlinear

4 Shepard and Carroll (1966)
–Locally monotone analysis of proximities: Nonmetric MDS treating large distances as missing
  Worked well if the nonlinearities were not too severe (in particular if the surface is not closed, such as a circle or sphere)
–Optimization of an index of “continuity” or “smoothness”
  Incorporated into a computer program called “PARAMAP” and tested on various sets of data

5 Figure: 20 points on a circle

6 Figure: 62 regularly spaced points on a sphere, and the azimuthal equidistant projection of the world

7 Figure: 49 points regularly spaced on a torus embedded in four dimensions

8 In all cases the local structure is preserved, except at points where the shape is “cut open” or “punctured”
Results were successful, but a severe local minimum problem existed
Adding error to the regular spacing made the local minimum problem worse
Current work is stimulated by two articles on nonlinear mapping (Tenenbaum, de Silva, & Langford, 2000; Roweis & Saul, 2000)

9 Nonlinear Mapping Algorithms
–n : number of objects
–M : dimensionality of the input coordinates, in other words of the configuration for which we would like to find an underlying lower-dimensional embedding
–R : dimensionality of the space of the recovered configuration, where R < M
–Y : n × M input matrix
–X : n × R output matrix

10 –The distances between point i and point j in the input and output spaces, $\delta_{ij}$ and $d_{ij}$ respectively, form the matrices $\Delta \equiv [\delta_{ij}]$ and $D \equiv [d_{ij}]$
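The two distance matrices can be illustrated with a short sketch. It assumes the distances are ordinary Euclidean distances between rows of Y and X (the slide's own formulas did not survive in the transcript); the helper name `distance_matrix` and the toy coordinates are hypothetical.

```python
import math

def distance_matrix(points):
    """Return the n x n matrix of Euclidean distances between the rows of `points`."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

# Hypothetical toy data: Y is n x M (here M = 3), X is n x R (here R = 2).
Y = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 2.0)]
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

Delta = distance_matrix(Y)  # input-space distances, Delta = [delta_ij]
D = distance_matrix(X)      # output-space distances, D = [d_ij]
```

Both matrices are symmetric with a zero diagonal, so only the entries with i < j carry information.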

11 Parametric Mapping Approach
Works via optimizing an index of “continuity” or “smoothness”
Early application in the context of time-series data (von Neumann, Kent, Bellinson, & Hart, 1941; von Neumann, 1941)

12 A more general expression for the numerator is: (equation not recovered)
Generalizing to the multidimensional case we reach κ (equation not recovered)

13 Several modifications are needed for the minimization procedure:
–$d_{ij}^2 + Ce^2$ is substituted for $d_{ij}^2$, where C is a constant equal to 2 / (n - 1) and e takes on values between 0 and 1
–e has the practical effect of accelerating the numerical process
–e can be thought of as an extra “specific” dimension; as e gets closer to 0, points are made to approach the “common” part of the space
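A rough sketch of how such an index might be evaluated is given below. It assumes the form usually attributed to Shepard and Carroll (1966), a sum of $\delta_{ij}^2 / d_{ij}^4$ divided by the squared sum of $1 / d_{ij}^2$, with $d_{ij}^2 + Ce^2$ substituted for $d_{ij}^2$ as described above. The function name `continuity_index` is hypothetical, and the additional normalizing constants mentioned in the deck are left out because their exact placement is not recoverable from the transcript.

```python
import math

def continuity_index(Y, X, e=0.0):
    """Assumed continuity index: small when nearby outputs have nearby inputs.

    Y holds the input coordinates, X the recovered coordinates.
    With e = 0 the points of X must be distinct, or a division by zero occurs.
    """
    n = len(Y)
    C = 2.0 / (n - 1)  # constant stated on the slide
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            delta_sq = math.dist(Y[i], Y[j]) ** 2
            d_sq = math.dist(X[i], X[j]) ** 2 + C * e ** 2  # d^2 + C e^2
            numerator += delta_sq / d_sq ** 2
            denominator += 1.0 / d_sq
    return numerator / denominator ** 2

# Toy check: four points on a line, recovered in order versus scrambled.
Y = [(0.0,), (1.0,), (2.0,), (3.0,)]
X_ordered = [(0.0,), (1.0,), (2.0,), (3.0,)]
X_scrambled = [(0.0,), (2.0,), (1.0,), (3.0,)]
kappa_ordered = continuity_index(Y, X_ordered)
kappa_scrambled = continuity_index(Y, X_scrambled)
```

In this toy case the scrambled configuration yields the larger value, which matches the idea that the index is minimized by a smooth mapping.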

14 –In the numerator the constant z, and in the denominator $[2/n(n-1)]^2$
Final form of function: (equation not recovered)

15 Implemented in C++ (GNU GCC compiler)
Program takes as input e, number of repetitions, dimensionality R to be recovered, and number of random starts or a starting input configuration
200 iterations each for 100 different random configurations yield reasonable solutions
The resulting best solution can then be fine-tuned by performing more iterations
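The multi-start schedule described on this slide (many short runs from random configurations, then refinement of the best one) can be sketched as follows. This is only an illustration: `gradient_descent`, `multistart`, the one-dimensional toy objective, and the step size are hypothetical stand-ins, not PARAMAP's actual objective or optimizer.

```python
import random

def gradient_descent(grad, x0, step, n_iters):
    """Plain (straight) gradient descent from the starting point x0."""
    x = list(x0)
    for _ in range(n_iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

def multistart(objective, grad, dim, n_starts=100, n_iters=200, step=0.05, seed=0):
    """Run short descents from random starts and keep the best result."""
    rng = random.Random(seed)
    best_x, best_val = None, float("inf")
    for _ in range(n_starts):
        x0 = [rng.uniform(-2.0, 2.0) for _ in range(dim)]
        x = gradient_descent(grad, x0, step, n_iters)
        val = objective(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Toy objective with two local minima; the one near x = -1 is the global minimum.
def f(x):
    return (x[0] ** 2 - 1.0) ** 2 + 0.3 * x[0]

def f_grad(x):
    return [4.0 * x[0] * (x[0] ** 2 - 1.0) + 0.3]

best_x, best_val = multistart(f, f_grad, dim=1)
# Fine-tune the best solution with additional iterations.
refined_x = gradient_descent(f_grad, best_x, 0.05, 1000)
```

A single run started on the wrong side of the hump would settle in the poorer minimum near x = 1, which is the kind of local minimum problem the random starts are meant to mitigate.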

16 ISOMAP Approach
Tries to overcome difficulties in MDS by replacing the Euclidean metric with a new metric
Figure (Lee, Lendasse, & Verleysen, 2002)

17 To approximate the “geodesic” distances, ISOMAP constructs a neighborhood graph that connects the closer points
–This is done by connecting the k closest neighbors, or points that are within a distance ε of each other
A shortest-path procedure is then applied to the resulting matrix of modified distances
Finally, classical metric MDS is applied to obtain the configuration in the lower dimensionality
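The first two steps (neighborhood graph, then shortest paths) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the k-nearest-neighbor rule is used rather than the ε rule, the graph is symmetrized, Floyd-Warshall is chosen as the shortest-path procedure, and the function names and toy points are hypothetical. The final classical MDS step is not shown.

```python
import math

def knn_graph(points, k):
    """Weighted graph linking each point to its k nearest neighbors (symmetrized)."""
    n = len(points)
    inf = float("inf")
    graph = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for i in range(n):
        ranked = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for dist, j in ranked[:k]:
            graph[i][j] = dist
            graph[j][i] = dist
    return graph

def geodesic_distances(graph):
    """All-pairs shortest path lengths via Floyd-Warshall."""
    n = len(graph)
    g = [row[:] for row in graph]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if g[i][m] + g[m][j] < g[i][j]:
                    g[i][j] = g[i][m] + g[m][j]
    return g

# Toy example: five points along an L-shaped path.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (2.0, 2.0)]
G = geodesic_distances(knn_graph(points, k=2))
```

For the two endpoints of the L, the graph distance follows the bend and is longer than the straight-line Euclidean distance, which is the effect the revised distances are meant to capture. If k is too small the graph can be disconnected, leaving infinite entries in the result.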

18 Other Approaches
Nonmetric MDS: Minimizes a cost function
Needed to implement the locally monotone MDS approach of Shepard (Shepard & Carroll, 1966)

19 Sammon’s mapping: Minimizes a mapping error function
Kruskal (1971) indicated that certain options used with nonmetric MDS programs would give the same results
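The error function itself is not preserved in the transcript. For orientation, the form commonly given for Sammon's mapping, written here in the deck's notation ($\delta_{ij}$ for input distances, $d_{ij}$ for output distances), is shown below; whether the slide used exactly this normalization is an assumption.

```latex
E = \frac{1}{\sum_{i<j} \delta_{ij}} \sum_{i<j} \frac{(\delta_{ij} - d_{ij})^2}{\delta_{ij}}
```

Dividing each squared discrepancy by $\delta_{ij}$ gives small input distances more weight, so local structure is emphasized.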

20 Multidimensional scaling by iterative majorization (Webb, 1995)
Curvilinear Distance Analysis (CDA) (Lee et al., 2002), an analogue of ISOMAP, omits the MDS step, replacing it with a minimization step
Self-organizing map (SOM) (Kohonen, 1990, 1995)
Auto-associative feedforward neural networks (AFN) (Baldi & Hornik, 1989; Kramer, 1991)

21 Experimental Design and Methods
Primary focus: 62 points located at the intersections of 5 equally spaced parallels and 12 equally spaced meridians
Two types of error, A and B
–A: 0%, 10%, 20%
–B: ±0.00, ±0.01, ±0.05, ±0.10, ±0.20
A and B control the points being irregularly spaced and being inside or outside the sphere, respectively
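One way to generate such a configuration is sketched below. Since 5 parallels times 12 meridians gives 60 intersections, the sketch assumes the remaining two of the 62 points are the poles. It also assumes a unit sphere and parallels at latitudes of -60, -30, 0, 30, and 60 degrees. The function name `sphere_grid` is hypothetical, and the two error types are not modeled.

```python
import math

def sphere_grid(n_parallels=5, n_meridians=12):
    """Points on a unit sphere: two poles (assumed) plus a parallel/meridian grid."""
    points = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]  # assumption: the poles are included
    for p in range(1, n_parallels + 1):
        # Equally spaced latitudes strictly between the poles.
        lat = -math.pi / 2 + p * math.pi / (n_parallels + 1)
        for m in range(n_meridians):
            lon = 2.0 * math.pi * m / n_meridians
            points.append((math.cos(lat) * math.cos(lon),
                           math.cos(lat) * math.sin(lon),
                           math.sin(lat)))
    return points

sphere_points = sphere_grid()
```

The resulting list has 62 rows with M = 3 coordinates each, which would play the role of the input matrix Y.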


23 To evaluate mapping performance:
We calculate the “rate of agreement in local structure”, abbreviated “agreement rate” or A
–Similar to the Rand index used to compare partitions (Rand, 1971; Hubert & Arabie, 1985)
–Let $a_i$ stand for the number of points that are in the k-nearest neighbor list for point i in both X and Y. A will be equal to $A = \frac{1}{nk}\sum_{i=1}^{n} a_i$

24 Example of calculating agreement rate
First configuration
Point       1 2 3 4 5
Neighbors   2 1 1 2 1
            3 4 4 3 2
Second configuration
Point       1 2 3 4 5
Neighbors   4 3 2 1 3
            5 4 4 5 4
k=2, Agreement rate = 2/10 or 20%
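The example can be reproduced with a short sketch. The neighbor lists are read off the slide (two neighbors per point in each configuration); the function names `agreement_rate` and `knn_lists` are hypothetical, and `knn_lists` assumes Euclidean distances when neighbor lists have to be derived from coordinates.

```python
import math

def agreement_rate(neighbors_a, neighbors_b, k):
    """Share of k-nearest-neighbor entries common to both configurations."""
    n = len(neighbors_a)
    shared = sum(len(set(neighbors_a[i]) & set(neighbors_b[i])) for i in range(n))
    return shared / (n * k)

def knn_lists(points, k):
    """k nearest neighbors of each point, as 1-based labels."""
    n = len(points)
    lists = []
    for i in range(n):
        ranked = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        lists.append([j + 1 for _, j in ranked[:k]])
    return lists

# Neighbor lists from the slide, for points 1 to 5 with k = 2.
first = [[2, 3], [1, 4], [1, 4], [2, 3], [1, 2]]
second = [[4, 5], [3, 4], [2, 4], [1, 5], [3, 4]]
rate = agreement_rate(first, second, k=2)
```

Only points 2 and 3 share a neighbor (point 4 in both cases), so the shared count is 2 out of n·k = 10 entries.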

25 Problem of similarity transformations:
We use standard software to rotate the different solutions into optimal congruence with a landmark solution (Rohlf & Slice, 1989)
We use the solution for the error-free and regularly spaced sphere as the landmark
We also report VAF (variance accounted for)

26 The VAF results may not be very good
The similarity transformation step is not enough
An alternating algorithm is needed that reorders the points on each of the five parallels and then finds the optimal similarity transformation
We also provide Shepard-like diagrams

27 Why is a similarity transformation not enough?

28 Results
Agreement rate for the regularly spaced and errorless sphere: 82.9%, k=5
Over 1000 randomizations of the solution: average and standard deviation of the agreement rate are 8.1% and 1.9% respectively; minimum and maximum are 3.5% and 16.7%


30 We can use Chebychev’s inequality, stated as: $P(|X - \mu| \ge t\sigma) \le 1/t^2$
82.9 is about 40 standard deviations away from the mean, so an upper bound on the probability that this event happens by chance is $1/40^2$ or 0.000625, very low!
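The arithmetic behind this bound can be checked with a few lines. The figures are those reported in the deck (observed rate 82.9%, randomization mean 8.1%, standard deviation 1.9%); the function name `chebyshev_bound` is hypothetical.

```python
def chebyshev_bound(observed, mean, sd):
    """Return (t, bound): distance from the mean in SD units and the bound 1/t^2."""
    t = abs(observed - mean) / sd
    return t, 1.0 / t ** 2

t, bound = chebyshev_bound(82.9, 8.1, 1.9)
```

Without rounding, t comes out slightly below 40 (a little over 39 standard deviations), so the bound is marginally larger than the rounded 0.000625 but of the same order.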

31-34 Figure panels (a) through (o)


36 Figure: ISOMAP, A = 48.1%; PARAMAP, A = 82.9%

37 Shepard-like Diagrams

38 Swiss Roll Data – 130 points
Agreement rate: ISOMAP 59.7%, PARAMAP 70.5%

39 Discussion and Future Direction
Disadvantage of PARAMAP: Run time
Advantage of ISOMAP: Noniterative procedure, can be applied to very large data sets with ease
Disadvantage of ISOMAP: Bad performance on closed data sets like the sphere

40 Improvements in computational efficiency of PARAMAP should be explored:
–Use of a conjugate gradient algorithm instead of a straight gradient algorithm
–Use of a conjugate gradient algorithm with restarts
–Possible combination of straight gradient and conjugate gradient approaches
Improvements that could benefit both ISOMAP and PARAMAP:
–A wise selection of landmarks and an interpolation or extrapolation scheme to recover the rest of the data

