Data Projections & Visualization Rajmonda Caceres MIT Lincoln Laboratory.


1 Data Projections & Visualization Rajmonda Caceres MIT Lincoln Laboratory

2 Reduce complexity: visual, computational. Identify the intrinsic dimensionality of data. Identify the most relevant aspects of data given a task.

3 Lower Dimension Higher Dimension

4 Not all projections are equal (panels a and b)

5 Desired properties: a reduced, compressed representation; preserves useful/intrinsic properties of the data; amplifies patterns of interest (e.g., outliers); simple and interpretable. There is a trade-off between simplicity and preservation of structure.

6 Helps us organize the data. Helps us discriminate patterns.

7 Manhattan distance (1-norm, taxicab distance). Euclidean distance (2-norm).
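The two distances named on slide 7 can be written out directly. This is a minimal sketch using only the Python standard library; the function names and the sample points are illustrative assumptions, not taken from the slides.

```python
import math

def manhattan(x, y):
    """1-norm (taxicab) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    """2-norm distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical sample points, chosen so the arithmetic is easy to check by hand.
p, q = (0.0, 0.0), (3.0, 4.0)
print(manhattan(p, q))  # 7.0  (3 + 4)
print(euclidean(p, q))  # 5.0  (sqrt(9 + 16))
```

The same pair of points is farther apart under the 1-norm than under the 2-norm, which is one reason the choice of distance function changes which points count as neighbors.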

8 L-p distance: as p grows, the largest coordinate distance tends to dominate the global distance.
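The claim on slide 8 can be checked numerically. The sketch below assumes the usual Minkowski form of the L-p distance, (sum of |x_i - y_i|^p)^(1/p), and compares it with the largest single coordinate difference; the function names and sample vectors are illustrative assumptions.

```python
def lp_distance(x, y, p):
    """Minkowski (L-p) distance for p >= 1."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def max_coordinate_distance(x, y):
    """Largest single-coordinate difference (the limit of L-p as p grows)."""
    return max(abs(a - b) for a, b in zip(x, y))

# Hypothetical vectors whose coordinate differences are 1, 2 and 5.
x, y = (0.0, 0.0, 0.0), (1.0, 2.0, 5.0)
for p in (1, 2, 4, 16, 64):
    print(p, round(lp_distance(x, y, p), 4))
print("largest coordinate difference:", max_coordinate_distance(x, y))
```

As p increases, the printed distances shrink toward the largest coordinate difference, so the smaller differences contribute less and less.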

9

10 Projective methods: preserve a property of the data. Principal Component Analysis (PCA). Many others: ICA, Factor Analysis. Manifold learning: Multidimensional Scaling (MDS), LLE, Isomap.

11 Goal: find a linear projection that captures most of the variance. Figure labels: 1st Principal Component, 2nd Principal Component.

12 PCA pseudocode: (1) Center the data by subtracting the mean. (2) Calculate the covariance matrix. (3) Calculate the eigenvectors (principal components) of the covariance matrix. (4) Select the top few (2-3) eigenvectors, those with the highest eigenvalues. (5) Project the data using these eigenvectors as axes.
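The PCA steps on slide 12 can be sketched in NumPy as follows. This is a minimal sketch, not the presenter's code: the function name `pca_project`, the synthetic data, and the n-1 normalization of the covariance (the default of `np.cov`) are assumptions.

```python
import numpy as np

def pca_project(X, k=2):
    """Project the rows of X onto the top-k principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center: subtract the column means
    cov = np.cov(Xc, rowvar=False)           # covariance matrix (features x features)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest eigenvalues
    components = eigvecs[:, order]           # columns are the principal axes
    return Xc @ components, components, eigvals[order]

# Synthetic 3-D data (an assumption) that varies mostly along the direction (1, 2, 0).
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t,
                     2 * t + 0.1 * rng.normal(size=200),
                     0.1 * rng.normal(size=200)])

Z, components, variances = pca_project(X, k=2)
print(Z.shape)           # (200, 2)
print(components[:, 0])  # first principal axis, up to sign
print(variances)         # variance captured by each retained component
```

The sign of each eigenvector is arbitrary, so the projected coordinates may come out mirrored between runs or libraries without changing the structure they show.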

13 Scree plot. Biplot.
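A scree plot shows the variance captured by each principal component, in decreasing order. The sketch below computes those values and prints them as a text bar chart rather than drawing a figure; the function name, the synthetic data, and the text rendering are assumptions made for illustration.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component, largest first."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending order
    eigvals = np.clip(eigvals, 0.0, None)   # guard against tiny negative round-off
    return eigvals / eigvals.sum()

# Synthetic data (an assumption): one strong direction, two weak ones.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3)) * np.array([5.0, 1.0, 0.2])

ratios = explained_variance_ratio(X)
for i, r in enumerate(ratios, start=1):
    print(f"PC{i}: {r:.3f} " + "#" * int(round(40 * r)))
```

A sharp drop after the first bar or two suggests that a low-dimensional projection keeps most of the variance.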

14 Goal: find a lower-dimensional embedding of the data that preserves pairwise distances. Formally: choose the output distance values to match the input distance values as closely as possible.
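One concrete way to build such an embedding is classical (Torgerson) MDS, sketched below. The slides do not say which MDS variant was used, so this choice is an assumption, as are the function names and the sample points. Classical MDS double-centers the squared distance matrix and keeps the leading eigenvectors.

```python
import numpy as np

def pairwise_distances(P):
    """Euclidean distance matrix between the rows of P."""
    P = np.asarray(P, dtype=float)
    diff = P[:, None, :] - P[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: k-dimensional coordinates from a distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)     # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]    # keep the k largest
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Hypothetical input: distances between five points that really live in 2-D.
pts = np.array([[0, 0], [1, 0], [0, 1], [2, 2], [3, 1]], dtype=float)
D = pairwise_distances(pts)
Y = classical_mds(D, k=2)

# The embedding is only determined up to rotation and reflection,
# so compare distances rather than coordinates.
print(np.allclose(pairwise_distances(Y), D))
```

When the input distances come from higher-dimensional data, the output distances can only approximate them, and the size of that mismatch is what MDS tries to keep small.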

15

16 Shepard diagram. Axis labels: MDS Distances, Data Distances.
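A Shepard diagram plots, for every pair of points, the distance in the original data against the distance in the embedding. The sketch below only computes those pairs (plotting is left out); the function name and the toy projection that drops the third coordinate are assumptions made for illustration.

```python
import math
from itertools import combinations

def shepard_pairs(original, embedded):
    """Return (data distance, embedding distance) for every pair of points."""
    pairs = []
    for i, j in combinations(range(len(original)), 2):
        d_data = math.dist(original[i], original[j])
        d_embed = math.dist(embedded[i], embedded[j])
        pairs.append((d_data, d_embed))
    return pairs

# Hypothetical 3-D points and a crude 2-D "embedding" that drops the z coordinate.
original = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 2)]
embedded = [(x, y) for x, y, _ in original]

for d_data, d_embed in shepard_pairs(original, embedded):
    print(f"data={d_data:.3f}  embedding={d_embed:.3f}")
```

Points that fall on the diagonal of the diagram correspond to pairs whose distance is preserved; here the pairs involving the last point fall off the diagonal because the dropped coordinate carried part of their separation.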

17 More features are not necessarily better. Understand the assumptions of different modeling choices. When choosing distance functions and projection methods: consider the characteristics of the data; consider the learning objective. Explore multiple choices simultaneously to gain better insight.

18 http://statweb.stanford.edu/~jtaylo/courses/stats202/mds.html
https://planspacedotorg.wordpress.com/2013/02/03/pca-3d-visualization-and-clustering-in-r/
Multidimensional Scaling, Leland Wilkinson
Dimension Reduction: A Guided Tour, Christopher J.C. Burges
When is “nearest neighbor” meaningful?, Beyer, K.S., Goldstein, J., Ramakrishnan, R. & Shaft, U.

19 The effect of concentration of distances. Figure labels: Lower Dimension, Higher Dimension.
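The concentration effect can be observed with a small simulation. The sketch below draws uniform random points in the unit cube and measures the relative contrast, (farthest - nearest) / nearest, from a random query point. The uniform data, the contrast measure, the point count, and the function name are all assumptions chosen for illustration.

```python
import math
import random

def relative_contrast(n_points, dim, seed=0):
    """(max distance - min distance) / min distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(200, dim), 3))
```

The contrast falls as the dimension rises: in high dimensions the nearest and farthest points sit at nearly the same distance, so distance-based notions of "neighbor" carry less information.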

