
1 Dimensionality Reduction Motivation I: Data Compression Machine Learning

2 Data Compression. Reduce data from 2D to 1D. [Figure: the same quantity measured twice, x1 in inches and x2 in cm, plotted against each other.]

3 Data Compression. Reduce data from 2D to 1D. [Figure: the 2D points (inches vs. cm) projected onto a line, giving a single new feature z1.]

4 Data Compression. Reduce data from 3D to 2D. [Figure: 3D points projected onto a 2D plane, giving new features z1 and z2.]

5 Dimensionality Reduction Motivation II: Data Visualization Machine Learning

6 Data Visualization [resources from en.wikipedia.org]

Country    | GDP (trillions of US$) | Per capita GDP (thousands of intl. $) | Human Development Index | Life expectancy | Poverty Index (Gini, as %) | Mean household income (thousands of US$) | ...
Canada     | 1.577  | 39.17 | 0.908 | 80.7 | 32.6 | 67.293 | ...
China      | 5.878  | 7.54  | 0.687 | 73   | 46.9 | 10.22  | ...
India      | 1.632  | 3.41  | 0.547 | 64.7 | 36.8 | 0.735  | ...
Russia     | 1.48   | 19.84 | 0.755 | 65.5 | 39.9 | 0.72   | ...
Singapore  | 0.223  | 56.69 | 0.866 | 80   | 42.5 | 67.1   | ...
USA        | 14.527 | 46.86 | 0.91  | 78.3 | 40.8 | 84.3   | ...
...        | ...    | ...   | ...   | ...  | ...  | ...    | ...

7 Data Visualization (each country reduced to two features, z1 and z2)

Country    | z1  | z2
Canada     | 1.6 | 1.2
China      | 1.7 | 0.3
India      | 1.6 | 0.2
Russia     | 1.4 | 0.5
Singapore  | 0.5 | 1.7
USA        | 2   | 1.5
...        | ... | ...

8 Data Visualization

9 Dimensionality Reduction Principal Component Analysis problem formulation Machine Learning

10 Principal Component Analysis (PCA) problem formulation. The red lines show the projection errors; PCA tries to find the direction that minimizes the projection error.

11 Principal Component Analysis (PCA) problem formulation. The red lines show the projection errors; this is an example of a direction with large projection error.

12 Principal Component Analysis (PCA) problem formulation. Reduce from 2 dimensions to 1 dimension: find a direction (a vector u^(1)) onto which to project the data so as to minimize the projection error. Reduce from n dimensions to k dimensions: find k vectors u^(1), u^(2), ..., u^(k) onto which to project the data, so as to minimize the projection error.

13 PCA is not linear regression: linear regression minimizes the vertical distances between the predictions and a distinguished output variable y, whereas PCA has no output variable and minimizes the orthogonal projection distances, treating all features equally.


15 Dimensionality Reduction Principal Component Analysis algorithm Machine Learning

16 Data preprocessing. Training set: x^(1), x^(2), ..., x^(m). Preprocessing (feature scaling/mean normalization): compute the mean of each feature, mu_j = (1/m) * sum_{i=1}^{m} x_j^(i), and replace each x_j^(i) with x_j^(i) - mu_j. If different features are on different scales (e.g., size of house, number of bedrooms), also scale features (e.g., divide by the standard deviation s_j) so they have a comparable range of values.
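
A minimal Octave sketch of this preprocessing step, assuming the training examples are stored as the rows of an m x n matrix X:

    % mean normalization and (optional) feature scaling
    mu = mean(X);            % 1 x n row vector of feature means
    X_norm = X - mu;         % subtract the mean from every example (broadcasting)
    s = std(X_norm);         % per-feature standard deviations
    X_norm = X_norm ./ s;    % scale only if features have very different ranges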

17 Principal Component Analysis (PCA) algorithm. Reduce data from 2D to 1D: find a direction u^(1) onto which to project, giving a single new feature z1. Reduce data from 3D to 2D: find two directions u^(1) and u^(2), giving new features z1 and z2.

18 Principal Component Analysis (PCA) algorithm. Reduce data from n dimensions to k dimensions. Compute the "covariance matrix": Sigma = (1/m) * sum_{i=1}^{m} (x^(i)) (x^(i))^T. Compute the "eigenvectors" of matrix Sigma: [U,S,V] = svd(Sigma);
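
A vectorized Octave sketch of this step, assuming X_norm holds the preprocessed examples as rows (since Sigma is symmetric positive semidefinite, svd returns its eigenvectors as the columns of U):

    Sigma = (1/m) * (X_norm' * X_norm);   % n x n covariance matrix
    [U, S, V] = svd(Sigma);               % columns of U are the principal directions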

19 Covariance Matrix Covariance measures the degree to which two variables change or vary together (i.e. co-vary). On the one hand, the covariance of two variables is positive if they vary together in the same direction relative to their expected values (i.e. if one variable moves above its expected value, then the other variable also moves above its expected value). On the other hand, if one variable tends to be above its expected value when the other is below its expected value, then the covariance between the two variables is negative. If there is no linear dependency between the two variables, then the covariance is 0.
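
As a small illustration of this definition (the numbers here are made up), the sample covariance of two variables in Octave:

    x = [1 2 3 4 5]';
    y = [2 4 5 4 5]';
    c = mean((x - mean(x)) .* (y - mean(y)));   % positive: x and y tend to move together
    % cov(x, y) gives the same quantity up to the 1/(m-1) vs. 1/m normalization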

20 Principal Component Analysis (PCA) algorithm. From [U,S,V] = svd(Sigma) we get U = [u^(1) u^(2) ... u^(n)], an n x n matrix whose columns are the principal directions; taking its first k columns and projecting gives z = U(:,1:k)' * x, a k-dimensional vector.

21 Principal Component Analysis (PCA) algorithm summary. After mean normalization (ensure every feature has zero mean) and optionally feature scaling: Sigma = (1/m) * X' * X; [U,S,V] = svd(Sigma); Ureduce = U(:,1:k); z = Ureduce' * x;
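
A small end-to-end usage sketch in Octave (the data values are made up for illustration):

    % toy 2D data: two strongly correlated features, one example per row
    X = [2.5 2.4; 0.5 0.7; 2.2 2.9; 1.9 2.2; 3.1 3.0; 2.3 2.7];
    [m, n] = size(X);
    mu = mean(X);
    X_norm = X - mu;                        % mean normalization
    Sigma = (1/m) * (X_norm' * X_norm);     % covariance matrix
    [U, S, V] = svd(Sigma);
    k = 1;
    Ureduce = U(:, 1:k);                    % n x k matrix of principal directions
    Z = X_norm * Ureduce;                   % m x k compressed representation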

22 Dimensionality Reduction Reconstruction from compressed representation Machine Learning

23 Reconstruction from compressed representation: given z = Ureduce' * x, the approximate reconstruction is x_approx = Ureduce * z, which is close to the original x when most of the variance is retained.
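
A sketch of the reconstruction in Octave, reusing Ureduce and Z from the summary sketch above:

    x_approx = Ureduce * z;      % n x 1 approximation of one (mean-normalized) example
    X_approx = Z * Ureduce';     % matrix form for all m examples at once
    % add the feature means back (X_approx + mu) to return to the original units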

24 Dimensionality Reduction Choosing the number of principal components Machine Learning

25 Choosing k (number of principal components). Average squared projection error: (1/m) * sum_{i=1}^{m} ||x^(i) - x_approx^(i)||^2. Total variation in the data: (1/m) * sum_{i=1}^{m} ||x^(i)||^2. Typically, choose k to be the smallest value so that (average squared projection error) / (total variation) <= 0.01 (1%), i.e. "99% of variance is retained".

26 Choosing k (number of principal components). Algorithm: try PCA with k = 1, 2, 3, ...; compute Ureduce, z^(1), ..., z^(m), x_approx^(1), ..., x_approx^(m); check whether the ratio above is <= 0.01. Repeating this for every k is expensive; instead, run [U,S,V] = svd(Sigma) once and use the diagonal entries of S (next slide).

27 Choosing k (number of principal components). With [U,S,V] = svd(Sigma), the quantity (sum_{i=1}^{k} S_ii) / (sum_{i=1}^{n} S_ii) is the fraction of variance retained for a given k. Pick the smallest value of k for which this ratio is >= 0.99 (99% of variance retained).
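
A sketch of this check in Octave, reusing Sigma from the algorithm summary:

    [U, S, V] = svd(Sigma);
    s = diag(S);                              % variances along each principal direction
    variance_retained = cumsum(s) / sum(s);   % fraction retained for k = 1, 2, ..., n
    k = find(variance_retained >= 0.99, 1);   % smallest k retaining 99% of the variance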

28 Dimensionality Reduction Advice for applying PCA Machine Learning

29 Supervised learning speedup. Given a labeled training set (x^(1), y^(1)), ..., (x^(m), y^(m)), extract the inputs to get an unlabeled dataset x^(1), ..., x^(m), run PCA on it to obtain z^(1), ..., z^(m), and form the new training set (z^(1), y^(1)), ..., (z^(m), y^(m)). Note: the mapping x^(i) -> z^(i) should be defined by running PCA only on the training set. The same mapping can then be applied to the examples x_cv^(i) and x_test^(i) in the cross validation and test sets.
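
A sketch in Octave of fitting the mapping on the training set only and reusing it (Xtrain, Xcv, Xtest are hypothetical matrices with one example per row):

    k = 100;                             % hypothetical choice; pick k via the 99%-variance rule
    mu = mean(Xtrain);
    Xn = Xtrain - mu;
    Sigma = (1/size(Xn, 1)) * (Xn' * Xn);
    [U, S, V] = svd(Sigma);
    Ureduce = U(:, 1:k);                 % mu and Ureduce come from the training set only
    Ztrain = (Xtrain - mu) * Ureduce;
    Zcv    = (Xcv    - mu) * Ureduce;    % same mapping reused, not refit
    Ztest  = (Xtest  - mu) * Ureduce;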

30 Applications of PCA: compression (reduce the memory/disk needed to store data, speed up a learning algorithm) and visualization.

31 Bad use of PCA: to prevent overfitting. The idea is to use z^(i) instead of x^(i), reducing the number of features from n to k < n, so that with fewer features the model is less likely to overfit. This might work OK, but it isn't a good way to address overfitting; use regularization instead.

32 PCA is sometimes used where it shouldn't be. Design of an ML system: get the training set (x^(1), y^(1)), ..., (x^(m), y^(m)); run PCA to reduce x^(i) in dimension, getting z^(i); train logistic regression on (z^(1), y^(1)), ..., (z^(m), y^(m)); test on the test set by mapping x_test^(i) to z_test^(i) and running the hypothesis on (z_test^(1), y_test^(1)), .... How about doing the whole thing without using PCA? Before implementing PCA, first try running whatever you want to do with the original/raw data x^(i). Only if that doesn't do what you want should you implement PCA and consider using z^(i).

