Background on Classification



2 Background on Classification

3 What is Classification?
Objects of Different Types: Classes
Sensing and Digitizing
Calculating Properties: Features
Mapping Features to Classes

4 Features (term means different things to different people)
A Feature is a quantity used by a classifier.
A Feature Vector is an ordered list of features.
Examples:
A full spectrum
A dimensionality reduced spectrum
A differentiated spectrum
Vegetation indices

5 Features (term means different things to different people)
For the math behind classifiers, feature vectors are thought of as points in n-dimensional space:
Spectra in 426-dimensional (or 426D) space
(NDVI, Nitrogen, ChlorA, ChlorB) in 4D space
Dimensionality reduction is used to:
Visualize in 2D or 3D
Mitigate the Curse of Dimensionality

6 What is a Classifier A bit more formal
x = Feature Vector (x1, x2, …, xB)'
L = set of class labels: L = {L1, L2, …, LC}, e.g. {Pinus palustris, Quercus laevis, …}
A (Discrete) classifier is a function f : R^B -> L, assigning each feature vector a class label.

7 What is a Classifier A bit more formal
x = Feature Vector (x1, x2, …, xB)'
L = set of class labels: L = {L1, L2, …, LC}, e.g. {Pinus palustris, Quercus laevis, …}
A (Continuous) classifier is a function f : R^B -> R^C, assigning each feature vector a score (or probability) for each class.

8 Linear Discriminants
Discriminant classifiers are designed to discriminate between classes.
Generative classifiers model the classes themselves.

9 Linear Discriminant or Linear Discriminant Analysis
There are many different types. Here are some:
Ordinary Least Squares
Ridge Regression
Lasso
Canonical
Perceptron
Support Vector Machine (without kernels)
Relevance Vector Machine
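For concreteness, here is a minimal sketch (not from the slides) fitting a few of these linear discriminants; the use of scikit-learn and the toy data are assumptions.

```python
# A minimal sketch, assuming scikit-learn; the toy data below is made up.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, Perceptron
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))              # 100 samples, 4 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # two classes split by a line

for model in [LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.1),
              Perceptron(), LinearSVC()]:
    model.fit(X, y)                            # each learns a linear discriminant
    print(type(model).__name__, np.ravel(model.coef_)[:2])
```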

10 Linear Discriminants
Points on the left side of the lines are in the blue class.
Points on the right side of the lines are in the red class.
Which line is best? What does best mean?
[Figure: several candidate separating lines between the Blue Class and the Red Class.]

11 Linear Discriminants – 2 Classes
[Diagram: weighted-sum discriminant. Features x1, …, xm are multiplied by weights wk1, …, wkm, a bias wk0 enters via x0 = 1, and the products are summed: yk(x) = wk0 + wk1 x1 + … + wkm xm. BIG numbers for Class 1, small numbers for Class 2.]
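A minimal sketch of the weighted sum in the diagram above; the function name and toy numbers are illustrative, not from the slides.

```python
# Minimal sketch of the weighted-sum discriminant; names and numbers are illustrative.
import numpy as np

def discriminant(x, w):
    """y_k(x) = w_k0 + w_k1*x_1 + ... + w_km*x_m, with x_0 = 1 carrying the bias."""
    x_aug = np.concatenate(([1.0], x))        # prepend x_0 = 1
    return float(np.dot(w, x_aug))

w = np.array([0.5, 2.0, -1.0])                # bias w_k0, then weights w_k1, w_k2
print(discriminant(np.array([1.0, 0.2]), w))  # BIG => Class 1, small => Class 2
```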

12 Example of “Best” Support Vector Machine
Pairwise (2 classes at a time)
Maximizes the Margin Between Classes
Minimizes the Objective Function by Solving a Quadratic Program
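A sketch of a maximum-margin linear SVM, assuming scikit-learn (the slides do not name a library); the two-blob data is made up.

```python
# Sketch of a maximum-margin linear SVM; library choice and blob data are assumptions.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((50, 2)) - 2,    # one class of points
               rng.standard_normal((50, 2)) + 2])   # the other class
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1.0)    # linear kernel: no kernel trick, just a line
clf.fit(X, y)                        # internally solves a quadratic program
print(clf.coef_, clf.intercept_)     # the maximum-margin line w.x + b = 0
```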

13 Back to Classifiers
Definition: Training
Given a data set X = {x1, x2, …, xN},
corresponding desired, or target, outputs Y = {y1, y2, …, yN},
and a user-defined functional form of a classifier, e.g. yn = w0 + w1 xn,1 + w2 xn,2 + … + wB xn,B,
estimate the parameters {w0, w1, …, wB}.
X is called the training set.

14 Linear Classifiers Ordinary Least Squares
Continuous Classifier
Target Outputs usually {0,1} or {-1,1}
Minimize the squared error: E(w) = sum over n of (tn - (w0 + w1 xn,1 + … + wB xn,B))^2

15 Linear Classifiers – Least Squares
How do we minimize? Take the derivative and set it to zero: w = (X'X)^-1 X't, or equivalently w = pinv(X) t.

16 Example
Rows ~ Spectra. The slide shows a numeric data matrix X, target vector t, the weights w = pinv(X)*t, and the resulting outputs X*w.
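The numeric matrices on the original slide did not survive the transcript, so the toy numbers below are made up; the computation, though, mirrors the slide's w = pinv(X)*t and X*w.

```python
# Toy version of the slide's computation; the numbers are made up, the steps match.
import numpy as np

X = np.array([[1.0, 0.2, 0.4],      # each row ~ a (very short) spectrum
              [1.0, 0.8, 0.6],
              [1.0, 0.3, 0.9],
              [1.0, 0.7, 0.1]])
t = np.array([0.0, 1.0, 0.0, 1.0])  # target outputs

w = np.linalg.pinv(X) @ t            # w = pinv(X)*t
print(w)
print(X @ w)                         # X*w: continuous outputs, threshold to classify
```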

17 Ridge Regression Ordinary Least Squares + Regularization
Diagonal Loading: X'X + lambda*I
Solution: w = (X'X + lambda*I)^-1 X't

18 Ridge Regression
Ordinary Least Squares Solution: w = (X'X)^-1 X't
Ridge Regression Solution: w = (X'X + lambda*I)^-1 X't
Diagonal Loading (adding lambda*I to X'X) can be crucial for Numerical Stability.
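A minimal NumPy sketch of the two closed-form solutions above; the toy data and the choice of lambda are assumptions.

```python
# The two closed-form solutions above in NumPy; toy data, lambda chosen arbitrarily.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
t = rng.standard_normal(20)
lam = 0.1                                              # diagonal loading strength

w_ols = np.linalg.pinv(X) @ t                          # (X'X)^-1 X't via pseudo-inverse
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(5),   # (X'X + lambda*I)^-1 X't
                          X.T @ t)
print(w_ols)
print(w_ridge)
```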

19 Illustrative Example (we'll see its value later)

20 Notes on Methodology
When developing a "Machine Learning" algorithm, one should test it on simulated data first.
This is necessary but not sufficient:
Necessary: If it doesn't work on simulated data, then it almost certainly will not work with real data.
Not sufficient: If it works on simulated data, then it may or may not work on real data.
Question: How do we simulate?

21 Simulating Data
We usually use Gaussians because they are often assumed (although they are not nearly as often accurate).
Multivariate Gaussians are completely determined by their Mean and Covariance Matrix.

22 Some Single Gaussians in One Dimension

23 Fisher Canonical LDA Gaussians: Trickery of Displays (Same X-Axis, Same Y-Axis)

24 Fisher Canonical LDA Gaussians: Trickery of Displays (Same X-Axis, Different Y-Axis)

25 Fisher Canonical LDA Gaussians: Trickery of Displays (Same X-Axis, Different Y-Axis)

26 Some Single Gaussians in Two Dimensions

27 Formulas for Gaussians
To generate simulated data, we need to draw samples from these distributions.
Univariate Gaussian: N(x | mu, sigma^2) = (1 / sqrt(2*pi*sigma^2)) * exp(-(x - mu)^2 / (2*sigma^2))
Multivariate Gaussian (e.g. x is a spectrum): N(x | mu, S) = (1 / ((2*pi)^(B/2) * |S|^(1/2))) * exp(-(1/2) * (x - mu)' S^-1 (x - mu)), where S is the Covariance Matrix.
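A short sketch of drawing samples from these distributions with NumPy; the means and covariance below are made-up toy parameters.

```python
# Drawing simulated samples from univariate and multivariate Gaussians; toy parameters.
import numpy as np

rng = np.random.default_rng(0)

# Univariate: mean mu = 1, standard deviation sigma = 2
x1 = rng.normal(loc=1.0, scale=2.0, size=1000)

# Multivariate: a mean vector and a (positive definite) covariance matrix
mu = np.array([0.0, 1.0, 2.0])
S = np.array([[1.0, 0.5, 0.0],
              [0.5, 2.0, 0.3],
              [0.0, 0.3, 1.5]])
x2 = rng.multivariate_normal(mu, S, size=1000)   # each row is one simulated "spectrum"

print(x1.mean(), x1.std())
print(x2.mean(axis=0))
```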

28 Sample Covariance
Definition: S = (1 / (N - 1)) * sum over n of (xn - xbar)(xn - xbar)'
The term (xn - xbar)(xn - xbar)' is called an Outer Product: the outer product of column vectors u and v is the matrix u v'.
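A small sketch showing an outer product and the sample covariance as a sum of outer products; the data are random toy values.

```python
# Outer products and the sample covariance as a sum of outer products; toy data.
import numpy as np

u = np.array([1.0, 2.0, 3.0])
print(np.outer(u, u))                           # the outer product u u': a 3x3 matrix

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))               # 100 samples, 3 "bands"
xbar = X.mean(axis=0)
S = sum(np.outer(x - xbar, x - xbar) for x in X) / (X.shape[0] - 1)
print(np.allclose(S, np.cov(X, rowvar=False)))  # matches NumPy's covariance estimate
```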

29 Covariance Matrices
If S is a covariance matrix, then one can calculate matrices U and D such that S = U'DU, where:
S is Diagonalized
U is Orthogonal (Like a Rotation)
D is Diagonal

30 Generating Covariance Matrices
(1) Any matrix of the form A'A is a covariance matrix for some distribution, so we can:
Set A = random square matrix
S = A'A
(2) We can also:
Make a diagonal matrix D
Make a rotation matrix U
Make a covariance matrix by setting S = U'DU
We will generate covariance matrices S using Python.
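A minimal Python sketch of the two constructions above; the sizes and random draws are arbitrary.

```python
# The two constructions above in NumPy; sizes and random draws are arbitrary.
import numpy as np

rng = np.random.default_rng(1)
B = 4                                              # number of bands / dimensions

# (1) S = A'A for a random square matrix A
A = rng.standard_normal((B, B))
S1 = A.T @ A

# (2) S = U'DU with D diagonal (non-negative) and U orthogonal (a rotation)
D = np.diag(rng.uniform(0.5, 3.0, size=B))
U, _ = np.linalg.qr(rng.standard_normal((B, B)))   # QR gives an orthogonal matrix
S2 = U.T @ D @ U

print(np.linalg.eigvalsh(S1))                      # all eigenvalues >= 0
print(np.linalg.eigvalsh(S2))                      # exactly the diagonal of D (reordered)
```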

31 Go To Python

32 Linear Dimensionality Reduction
PCA: Principal Components Analysis
Maximizes the amount of variance in the first k bands compared to all other linear (orthogonal) transforms.
MNF: Minimum Noise Fraction
Minimizes an estimate of Noise/Signal, or equivalently maximizes an estimate of Signal/Noise.

33 PCA
Start with a data set of spectra or other samples, implicitly assumed to be drawn from the same distribution.
Compute the sample mean over all spectra: xbar = (1/N) * sum over n of xn
Compute the Sample Covariance: S = (1 / (N - 1)) * sum over n of (xn - xbar)(xn - xbar)'
Diagonalize S: S = U'DU
PCA is defined to be: yn = U (xn - xbar)
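A NumPy sketch of these PCA steps on toy data; numpy.linalg.eigh is one way to do the diagonalization, not necessarily the lecture's.

```python
# The PCA steps above in NumPy: center, covariance, diagonalize, rotate; toy data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))              # rows ~ spectra
xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)                    # sample covariance

evals, evecs = np.linalg.eigh(S)               # eigh returns ascending eigenvalues
order = np.argsort(evals)[::-1]                # sort Big to Little
evals, evecs = evals[order], evecs[:, order]

Y = (X - xbar) @ evecs                         # shift then rotate into PC coordinates
print(Y[:, :2].shape)                          # keep the first 2 bands for a 2D view
print(np.cov(Y, rowvar=False).round(2))        # (approximately) diagonal
```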

34 PCA – Easy Examples
Eigenvectors (columns of V) determine the major and minor axes.
The new coordinate system is a rotation (U) and shift (x - xbar) of the original coordinate system.
Assumes elliptical contours, as for a Gaussian.
Eigenvalues determine the lengths of the major and minor axes.

35 PCA, “Dark Points”, and BRDF
[Scatter plot: the black points are from Oak Trees; the red points are from Soil in a Baseball Field.]

36 Go To Python

37 MNF
Assumption: Observed Spectrum = Signal + Noise, i.e. x = s + n.
We want to transform x so that the Noise/Signal ratio is minimized.
How do we represent this ratio?

38 MNF
Assume the signal and noise are both random vectors with multivariate Gaussian distributions.
Assume the noise is zero mean: it is equally likely to add or subtract by the same amounts.
The noise variance uniquely determines how much the signal is modified by noise.
Therefore, we should try to minimize the ratio Noise Variance / Signal Variance.

39 MNF – Noise/Signal Ratio
How do we compute it for spectra? 426 bands -> 426 variances and 426*425/2 covariances.
Dividing element-wise won't work. What should we do? Diagonalize!
Covariance of n is diagonalizable: Sn = Un' Dn Un
Covariance of x is diagonalizable: Sx = Ux' Dx Ux

40 MNF – Noise/Signal Ratio
Covariance of n is diagonalizable: Sn = Un' Dn Un
Covariance of x is diagonalizable: Sx = Ux' Dx Ux
GOOD NEWS! They can be simultaneously diagonalized. It's a little complicated, but basically there is a single matrix V for which V' Sn V and V' Sx V are both diagonal.
So the Noise/Signal ratio in each transformed band is just the ratio of the corresponding diagonal entries.

41 MNF: Algorithm
Estimate n
Calculate the Covariance of n: Sn
Calculate the Covariance of x: Sx
Calculate the Left Eigenvectors and Eigenvalues of Sn Sx^-1 (equivalently, solve Sn v = lambda Sx v)
Make sure the eigenvalues are sorted in order: Big to Little if maximizing, Little to Big if minimizing
Only keep the Eigenvectors that come early in the sort order.
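A sketch of the MNF steps above on simulated data (in the spirit of the methodology slides). Here the noise is known because we simulate it, whereas in practice it must be estimated in step 1; using scipy.linalg.eigh for the generalized eigenproblem is an implementation choice, not necessarily the lecture's.

```python
# Sketch of the MNF steps on simulated data; the noise is known here only because
# we simulate it ourselves (in practice it must be estimated, step 1 above).
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
B, N = 20, 500
signal = rng.multivariate_normal(np.zeros(B), 4.0 * np.eye(B), size=N)
noise = rng.multivariate_normal(np.zeros(B), np.diag(rng.uniform(0.1, 1.0, B)), size=N)
X = signal + noise                              # observed spectrum = signal + noise

S_x = np.cov(X, rowvar=False)                   # covariance of x
S_n = np.cov(noise, rowvar=False)               # covariance of n

# Generalized eigenproblem S_n v = lambda S_x v; lambda ~ noise/signal fraction
evals, evecs = eigh(S_n, S_x)                   # eigh sorts Little to Big (minimizing)
V = evecs[:, :5]                                # keep the eigenvectors early in the sort
Y = X @ V                                       # low-noise MNF components
print(evals[:5], Y.shape)
```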

