Dimensionality Reduction Motivation I: Data Compression (Machine Learning)


Dimensionality Reduction Motivation I: Data Compression

Data Compression: reduce data from 2D to 1D. [Figure: data plotted against two axes labeled (inches) and (cm).]

Data Compression: reduce data from 2D to 1D. [Figure: the same data projected onto a line, so each two-dimensional example is replaced by a single number z^(i).]

Data Compression: reduce data from 3D to 2D.
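As a rough illustration of this idea (a sketch, not part of the original slides), the Octave snippet below generates two highly redundant features, the same lengths measured in centimeters and in inches, and compresses each 2D example to a single number by projecting onto the principal direction; all variable names here are illustrative.

% Sketch: compress two redundant features (cm and inches) into one number per example.
m = 50;
cm = 100 + 30 * randn(m, 1);              % lengths in centimeters
inches = cm / 2.54 + 0.5 * randn(m, 1);   % the same lengths in inches, plus measurement noise
X = [cm, inches];                         % m x 2 data matrix, one example per row

mu = mean(X);                             % mean of each feature
X_norm = X - mu;                          % mean normalization
Sigma = (1/m) * (X_norm' * X_norm);       % 2 x 2 covariance matrix
[U, S, V] = svd(Sigma);                   % principal directions
z = X_norm * U(:, 1);                     % 2D -> 1D: one number z^(i) per example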

Dimensionality Reduction Motivation II: Data Visualization

Data Visualization [resources from en.wikipedia.org]

Country   | GDP (trillions of US$) | Per capita GDP (thousands of intl. $) | Human Development Index | Life expectancy | Poverty Index (Gini as percentage) | Mean household income (thousands of US$) | ...
Canada    | ...
China     | ...
India     | ...
Russia    | ...
Singapore | ...
USA       | ...
...       | ...

Data Visualization: each country represented by a reduced 2-dimensional feature vector z = (z1, z2).

Country   | z1  | z2
Canada    | ... | ...
China     | ... | ...
India     | ... | ...
Russia    | ... | ...
Singapore | ... | ...
USA       | ... | ...

Data Visualization

Dimensionality Reduction: Principal Component Analysis problem formulation

Principal Component Analysis (PCA) problem formulation. The red lines show the projection error: the distance from each point to its projection onto the line. PCA tries to find the direction that minimizes this projection error.

Principal Component Analysis (PCA) problem formulation. The red lines again show the projection error; this is an example of a poorly chosen direction with large projection error.

Principal Component Analysis (PCA) problem formulation. Reduce from 2 dimensions to 1 dimension: find a direction (a vector u^(1)) onto which to project the data so as to minimize the projection error. Reduce from n dimensions to k dimensions: find k vectors u^(1), u^(2), ..., u^(k) onto which to project the data, so as to minimize the projection error, i.e. the average squared distance between each x^(i) and its projection onto the subspace spanned by u^(1), ..., u^(k).

PCA is not linear regression. Linear regression minimizes the vertical distances from the points to the fitted line (the errors in predicting y from x), while PCA minimizes the orthogonal (shortest) distances from the points to the line and treats all features symmetrically; there is no distinguished output variable y to predict.

Dimensionality Reduction: Principal Component Analysis algorithm

Data preprocessing. Training set: x^(1), x^(2), ..., x^(m). Preprocessing (feature scaling / mean normalization): compute the mean of each feature, mu_j = (1/m) * sum_{i=1}^m x_j^(i), and replace each x_j^(i) with x_j^(i) - mu_j. If different features are on different scales (e.g., size of a house vs. number of bedrooms), also scale the features to have a comparable range of values, e.g., x_j^(i) <- (x_j^(i) - mu_j) / s_j, where s_j is the standard deviation (or the range) of feature j.
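In Octave, this preprocessing might look like the following sketch (the assumption that X is an m x n matrix with one example per row is ours, not from the slide):

% Sketch: mean normalization and feature scaling.
mu = mean(X);                  % 1 x n vector of feature means
sigma = std(X);                % 1 x n vector of feature standard deviations
sigma(sigma == 0) = 1;         % guard against dividing by zero for constant features
X_norm = (X - mu) ./ sigma;    % every feature now has zero mean and comparable scale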

Principal Component Analysis (PCA) algorithm. Goal: reduce data from 2D to 1D by computing a single number z^(i) for each x^(i) in R^2, or from 3D to 2D by computing z^(i) in R^2 for each x^(i) in R^3.

Principal Component Analysis (PCA) algorithm. Reduce data from n dimensions to k dimensions. Compute the "covariance matrix": Sigma = (1/m) * sum_{i=1}^m x^(i) (x^(i))^T, an n x n matrix. Compute the "eigenvectors" of the matrix Sigma: [U,S,V] = svd(Sigma);
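In vectorized form, with the mean-normalized examples stacked as the rows of an m x n matrix X (a notational assumption, not from the slide), these two steps might be written as:

% Sketch: vectorized covariance matrix and its eigenvectors.
m = size(X, 1);
Sigma = (1/m) * (X' * X);      % n x n covariance matrix of the mean-normalized data
[U, S, V] = svd(Sigma);        % columns of U are the principal directions u^(1), ..., u^(n)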

Covariance Matrix. Covariance measures the degree to which two variables change or vary together (i.e. co-vary). The covariance of two variables is positive if they vary in the same direction relative to their expected values (when one variable is above its expected value, the other tends to be above its expected value too). Conversely, if one variable tends to be above its expected value when the other is below its expected value, the covariance between the two variables is negative. If there is no linear dependency between the two variables, the covariance is 0.
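For instance (an illustrative example, not from the slides), covariance can be computed directly from this definition:

% Sketch: covariance as the mean product of deviations from the expected values.
x = [1; 2; 3; 4; 5];
y_pos = [2; 4; 5; 8; 11];      % tends to move in the same direction as x
y_neg = [10; 8; 7; 4; 1];      % tends to move in the opposite direction
cov_pos = mean((x - mean(x)) .* (y_pos - mean(y_pos)));   % positive (4.4)
cov_neg = mean((x - mean(x)) .* (y_neg - mean(y_neg)));   % negative (-4.4)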

Principal Component Analysis (PCA) algorithm. From [U,S,V] = svd(Sigma) we get U = [u^(1) u^(2) ... u^(n)], an n x n matrix whose columns are the principal directions. To reduce to k dimensions, take the first k columns, Ureduce = U(:,1:k), and compute z = Ureduce' * x in R^k for each example x.

Principal Component Analysis (PCA) algorithm summary. After mean normalization (ensure every feature has zero mean) and optionally feature scaling:
Sigma = (1/m) * X' * X;
[U,S,V] = svd(Sigma);
Ureduce = U(:,1:k);
z = Ureduce' * x;
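Putting the steps together, a self-contained version might look like the sketch below; the function name run_pca and the assumption that X holds one example per row are illustrative, not from the slides.

% Sketch: PCA pipeline (save as run_pca.m).
function [z, Ureduce, mu] = run_pca(X, k)
  m = size(X, 1);
  mu = mean(X);                          % feature means, kept so new examples can be mapped later
  X_norm = X - mu;                       % mean normalization (add feature scaling if scales differ)
  Sigma = (1/m) * (X_norm' * X_norm);    % n x n covariance matrix
  [U, S, V] = svd(Sigma);                % principal directions
  Ureduce = U(:, 1:k);                   % keep the first k directions
  z = X_norm * Ureduce;                  % m x k matrix of compressed examples
end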

Dimensionality Reduction: Reconstruction from compressed representation

Reconstruction from compressed representation. Given the compressed representation z = Ureduce' * x in R^k, an approximation to the original point can be recovered as x_approx = Ureduce * z in R^n, with x_approx ≈ x when the projection error is small.
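Continuing the earlier sketch (z, Ureduce, and mu are the illustrative names returned by run_pca above), the reconstruction is one matrix product:

% Sketch: reconstruct approximate original features from the compressed representation.
X_approx = z * Ureduce' + mu;   % m x n; undoes the projection and the mean normalization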

Dimensionality Reduction: Choosing the number of principal components

Choosing k (the number of principal components). Average squared projection error: (1/m) * sum_{i=1}^m ||x^(i) - x_approx^(i)||^2. Total variation in the data: (1/m) * sum_{i=1}^m ||x^(i)||^2. Typically, choose k to be the smallest value such that
[average squared projection error] / [total variation] <= 0.01 (1%),
i.e. so that "99% of variance is retained".

Choosing k (the number of principal components). Algorithm: try PCA with k = 1; compute Ureduce, z^(1), ..., z^(m), x_approx^(1), ..., x_approx^(m); check whether the ratio above is <= 0.01. If not, repeat with k = 2, 3, ... until the check passes. This is expensive; fortunately, a single call [U,S,V] = svd(Sigma) gives a cheaper way to make the same check.

Choosing k (the number of principal components). With [U,S,V] = svd(Sigma), the matrix S is diagonal, and for a given k the ratio above equals 1 - (sum_{i=1}^k S_ii) / (sum_{i=1}^n S_ii). So pick the smallest value of k for which
(sum_{i=1}^k S_ii) / (sum_{i=1}^n S_ii) >= 0.99
(99% of variance retained).
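As a sketch of this check in code (S is the diagonal matrix returned by the svd(Sigma) call shown earlier):

% Sketch: pick the smallest k that retains at least 99% of the variance.
s = diag(S);                       % eigenvalues of Sigma, in decreasing order
retained = cumsum(s) / sum(s);     % variance retained for k = 1, 2, ..., n
k = find(retained >= 0.99, 1);     % smallest k with at least 99% retained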

Dimensionality Reduction: Advice for applying PCA

Supervised learning speedup. Given a labeled training set (x^(1), y^(1)), ..., (x^(m), y^(m)) with high-dimensional inputs x^(i) in R^n: extract the inputs to form an unlabeled dataset x^(1), ..., x^(m); run PCA on it to get z^(1), ..., z^(m) in R^k with k < n; then train on the new training set (z^(1), y^(1)), ..., (z^(m), y^(m)). Note: the mapping x^(i) -> z^(i) should be defined by running PCA only on the training set. The same mapping can then be applied to the examples x_cv^(i) and x_test^(i) in the cross-validation and test sets.
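A sketch of this workflow, reusing the illustrative run_pca from above (the matrix names Xtrain, Xcv, and Xtest are also assumptions):

% Sketch: fit the PCA mapping on the training set only, then reuse it elsewhere.
[Ztrain, Ureduce, mu] = run_pca(Xtrain, k);   % mu and Ureduce come from the training set alone
Zcv   = (Xcv   - mu) * Ureduce;               % apply the same mapping to the cross-validation set
Ztest = (Xtest - mu) * Ureduce;               % ... and to the test set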

Applications of PCA:
- Compression
  - Reduce the memory/disk space needed to store data
  - Speed up a learning algorithm
- Visualization

Bad use of PCA: trying to prevent overfitting. The idea: use z^(i) instead of x^(i) to reduce the number of features from n to k < n; thus, with fewer features, the model is less likely to overfit. This might work OK, but it isn't a good way to address overfitting: PCA throws away information without looking at the labels y^(i). Use regularization instead.

PCA is sometimes used where it shouldn't be. Design of an ML system:
- Get the training set (x^(1), y^(1)), ..., (x^(m), y^(m))
- Run PCA to reduce x^(i) in dimension, getting z^(i)
- Train logistic regression on (z^(1), y^(1)), ..., (z^(m), y^(m))
- Test on the test set: map each x_test^(i) to z_test^(i) and run the hypothesis h_theta(z_test^(i))
How about doing the whole thing without using PCA? Before implementing PCA, first try running whatever you want to do with the original/raw data x^(i). Only if that doesn't do what you want should you implement PCA and consider using z^(i).