Download presentation

Presentation is loading. Please wait.

Published byWill Doidge Modified over 2 years ago

1
Structure-Based Distance Metric for High-Dimensional Space Exploration with Multi-Dimensional Scaling Jenny Hyunjung Lee , Kevin T. McDonnell, Alla Zelenyuk , Dan Imre, and Klaus Mueller Visual Analytics and Imaging Lab Computer Science Department Stony Brook University and SUNY Korea Computer Science and Mathematics Department Dowling College Chemical and Material Sciences Division Pacific Northwest National Lab

2
MDS and Parallel Coordinates Often used in conjunction MDS gives the overview parallel coordinates allows an inspection of the raw data MDS Parallel Coordinates

3
Cognition-Equivalent Mapping (CEM) MDSParallel Coordinates Not a cognition-equivalent mapping

4
Cognition-Equivalent Mapping (CEM) MDSParallel Coordinates Cognition-equivalent mappingNot a cognition-equivalent mapping

5
CEM In Practice – Same or Different? Different

6
CEM In Practice – Same or Different? Not Different

7
Our Method – Same or Different? Different

8
What’s the Magic? A New, Perceptual Distance Metric

9
Distance in High-D Space MDS optimization function: Euclidian distance measures point-pair error sums all distance in ND distance in 2D

10
Why is The Euclidian Distance less ideal? Perceptual (dis)similarity is not gauged by a Euclidian metric our cognitive faculties look for pattern similarity poly lines with similar pattern signature are deemed closer the equivalent points need to also be closer in the MDS plot need a new perceptual distance metric that gauges this pattern similarity

11
Structural Similarity Index (SSIM) luminance contrast structure x y

12
Structural Similarity Index (SSIM) frequently pooled over 11×11 sliding window luminance contrast structure

13
Example from Image Processing Both images have the same MSE but different SSIM SSIM = 0.91, 0.71, 0.77 Euclidian distance expression is similar to that of MSE hence it can be expected that SSIM will do better when ported original blurred salt + pepper contrast stretched

14
Analogy of SSIM to ND Distance Just like images, polylines have luminance mean contrast structure evaluates the structural similarity after the differences in mean and contrast have been accounted for

15
Cases 1234567812345678

16
Effect of Windowing Procedure order dimensions such that sum of pairwise correlations is maximized use 11-point window Observations purple cluster has higher local variance than blue with windowing it has an equivalent spread in the MDS plot without windowing this effect is averaged out and likewise in MDS not windowedwindowed clusters in G1local variance

17
Achieves Better Cluster Separability Euclidian SSIM 6 40 100 800 # dimensions 8 clusters, 800 data points

18
Mass Spectra Data EuclidianSSIMParallel Coordinates Mass Spectra of Aerosol Particles, 450 D, 2,000 data points noisier (more skew)

19
Mass Spectra Data EuclidianSSIMParallel Coordinates Mass Spectra of Aerosol Particles, 450 D, 2,000 data points third peak)

20
In-Layout Manual Clustering SSIM-based layout clustering has better neighborhood definition Euclidian SSIM Parallel Coordinates

21
Cluster Compression SSIM slightly compresses local neighborhoods but much less than LDA Solution: use Euclidian at the local level EuclidianSSIM LDA

22
Bi-Scale Layout Bi-scale layout algorithm lay out cluster centers via SSIM then lay out cluster points via Euclidian plot onto tiles (partially) remove tile overlaps using a proximity stress algorithm original20% overlap5% overlap tiles resized

23
Effect on Curse of Dimensionality Relative contrast metric: Terms: dist max : maximum distance in a given N-D data distribution dist min : minimum distance in this N-D data distribution m: number of dimensions Curse of dimensionality as m increases, the distances between pairs of data points become increasingly indistinguishable this adversely affects the MDS layout.

24
Effect on Curse of Dimensionality SSIM pushes the curse of dimensionality relative contrast consistently and significantly higher wider spread of distinct distances better separates inter-cluster distances from intra-cluster distances can distinguish clusters easier and more accurately Euclidian SSIM one Gaussian with l,000 points 5 Gaussians with 250 points each

25
Future Work Use SSIM in cluster analysis applications, such as k-means Incorporate into multi-scale and multi-resolution analysis to compare patterns at different levels of scale Confirm our currently more empirical successes with rigorous psycho-physical experiments

26
Questions? Funding provided by: NSF grants 1050477, 0959979, and 1117132 US Department of Energy (DOE) Office of Basic Energy Sciences, Division of Chemical Sciences, Geosciences, and Biosciences The IT Consilience Creative Project through the Ministry of Knowledge Economy, Republic of Korea

Similar presentations

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google