Background on Classification



2 Background on Classification

3 What is Classification?
Objects of Different Types: Classes
Sensing and Digitizing
Calculating Properties: Features
Mapping Features to Classes

4 Features (term means different things to different people)
A Feature is a quantity used by a classifier.
A Feature Vector is an ordered list of features.
Examples:
A full spectrum
A dimensionality reduced spectrum
A differentiated spectrum
Vegetation indices

5 Features (term means different things to different people)
For the math behind classifiers, feature vectors are thought of as points in n-dimensional space:
Spectra in 426-dimensional (or 426D) space
(NDVI, Nitrogen, ChlorA, ChlorB) in 4D space
Dimensionality reduction is used to:
Visualize in 2D or 3D
Mitigate the Curse of Dimensionality

6 What is a Classifier A bit more formal
x = Feature Vector (x1, x2, …, xB)'
L = set of class labels: L = {L1, L2, …, LC}, e.g. {Pinus palustris, Quercus laevis, …}
A (Discrete) classifier is a function f : R^B -> L, assigning each feature vector a class label.

7 What is a Classifier A bit more formal
x = Feature Vector (x1, x2, …, xB)'
L = set of class labels: L = {L1, L2, …, LC}, e.g. {Pinus palustris, Quercus laevis, …}
A (Continuous) classifier is a function f : R^B -> R^C, assigning each feature vector a score (or probability) for each class.

8 Linear Discriminants
Discriminant classifiers are designed to discriminate between classes.
Generative classifiers model the classes themselves.

9 Linear Discriminant or Linear Discriminant Analysis
There are many different types. Here are some:
Ordinary Least Squares
Ridge Regression
Lasso
Canonical
Perceptron
Support Vector Machine (without kernels)
Relevance Vector Machine
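For concreteness, here is a minimal sketch (not from the slides) fitting a few of these linear discriminants; the use of scikit-learn and the toy data are assumptions.

```python
# A minimal sketch, assuming scikit-learn; the toy data below is made up.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, Perceptron
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))              # 100 samples, 4 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # two classes split by a line

for model in [LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.1),
              Perceptron(), LinearSVC()]:
    model.fit(X, y)                            # each learns a linear discriminant
    print(type(model).__name__, np.ravel(model.coef_)[:2])
```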

10 Linear Discriminants
Points on the left side of the lines are in the blue class.
Points on the right side of the lines are in the red class.
Which line is best? What does best mean?
[Figure: several candidate separating lines between the Blue Class and the Red Class.]

11 Linear Discriminants – 2 Classes
[Diagram: weighted-sum discriminant. Features x1, …, xm are multiplied by weights wk1, …, wkm, a bias wk0 enters via x0 = 1, and the products are summed: yk(x) = wk0 + wk1 x1 + … + wkm xm. BIG numbers for Class 1, small numbers for Class 2.]
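A minimal sketch of the weighted sum in the diagram above; the function name and toy numbers are illustrative, not from the slides.

```python
# Minimal sketch of the weighted-sum discriminant; names and numbers are illustrative.
import numpy as np

def discriminant(x, w):
    """y_k(x) = w_k0 + w_k1*x_1 + ... + w_km*x_m, with x_0 = 1 carrying the bias."""
    x_aug = np.concatenate(([1.0], x))        # prepend x_0 = 1
    return float(np.dot(w, x_aug))

w = np.array([0.5, 2.0, -1.0])                # bias w_k0, then weights w_k1, w_k2
print(discriminant(np.array([1.0, 0.2]), w))  # BIG => Class 1, small => Class 2
```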

12 Example of “Best” Support Vector Machine
Pairwise (2 classes at a time)
Maximizes the Margin Between Classes
Minimizes the Objective Function by Solving a Quadratic Program
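A sketch of a maximum-margin linear SVM, assuming scikit-learn (the slides do not name a library); the two-blob data is made up.

```python
# Sketch of a maximum-margin linear SVM; library choice and blob data are assumptions.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((50, 2)) - 2,    # one class of points
               rng.standard_normal((50, 2)) + 2])   # the other class
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1.0)    # linear kernel: no kernel trick, just a line
clf.fit(X, y)                        # internally solves a quadratic program
print(clf.coef_, clf.intercept_)     # the maximum-margin line w.x + b = 0
```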

13 Back to Classifiers
Definition: Training
Given a data set X = {x1, x2, …, xN},
corresponding desired, or target, outputs Y = {y1, y2, …, yN},
and a user-defined functional form of a classifier, e.g. yn = w0 + w1 xn,1 + w2 xn,2 + … + wB xn,B,
estimate the parameters {w0, w1, …, wB}.
X is called the training set.

14 Linear Classifiers Ordinary Least Squares
Continuous Classifier
Target Outputs usually {0,1} or {-1,1}
Minimize the squared error: E(w) = sum over n of (tn - (w0 + w1 xn,1 + … + wB xn,B))^2

15 Linear Classifiers – Least Squares
How do we minimize? Take the derivative and set it to zero: w = (X'X)^-1 X't, or equivalently w = pinv(X) t.

16 Example
Rows ~ Spectra. The slide shows a numeric data matrix X, target vector t, the weights w = pinv(X)*t, and the resulting outputs X*w.
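The numeric matrices on the original slide did not survive the transcript, so the toy numbers below are made up; the computation, though, mirrors the slide's w = pinv(X)*t and X*w.

```python
# Toy version of the slide's computation; the numbers are made up, the steps match.
import numpy as np

X = np.array([[1.0, 0.2, 0.4],      # each row ~ a (very short) spectrum
              [1.0, 0.8, 0.6],
              [1.0, 0.3, 0.9],
              [1.0, 0.7, 0.1]])
t = np.array([0.0, 1.0, 0.0, 1.0])  # target outputs

w = np.linalg.pinv(X) @ t            # w = pinv(X)*t
print(w)
print(X @ w)                         # X*w: continuous outputs, threshold to classify
```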

17 Ridge Regression Ordinary Least Squares + Regularization
Diagonal Loading: X'X + lambda*I
Solution: w = (X'X + lambda*I)^-1 X't

18 Ridge Regression
Ordinary Least Squares Solution: w = (X'X)^-1 X't
Ridge Regression Solution: w = (X'X + lambda*I)^-1 X't
Diagonal Loading (adding lambda*I to X'X) can be crucial for Numerical Stability.
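A minimal NumPy sketch of the two closed-form solutions above; the toy data and the choice of lambda are assumptions.

```python
# The two closed-form solutions above in NumPy; toy data, lambda chosen arbitrarily.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
t = rng.standard_normal(20)
lam = 0.1                                              # diagonal loading strength

w_ols = np.linalg.pinv(X) @ t                          # (X'X)^-1 X't via pseudo-inverse
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(5),   # (X'X + lambda*I)^-1 X't
                          X.T @ t)
print(w_ols)
print(w_ridge)
```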

19 Illustrative Example (we'll see its value later)

20 Notes on Methodology
When developing a "Machine Learning" algorithm, one should test it on simulated data first.
This is necessary but not sufficient:
Necessary: If it doesn't work on simulated data, then it almost certainly will not work with real data.
Not sufficient: If it works on simulated data, then it may or may not work on real data.
Question: How do we simulate?

21 Simulating Data
We usually use Gaussians because they are often assumed (although they are not nearly as often accurate).
Multivariate Gaussians are completely determined by their Mean and Covariance Matrix.

22 Some Single Gaussians in One Dimension

23 Fisher Canonical LDA Gaussians: Trickery of Displays (Same X-Axis, Same Y-Axis)

24 Fisher Canonical LDA Gaussians: Trickery of Displays (Same X-Axis, Different Y-Axis)

25 Fisher Canonical LDA Gaussians: Trickery of Displays (Same X-Axis, Different Y-Axis)

26 Some Single Gaussians in Two Dimensions

27 Formulas for Gaussians
To generate simulated data, we need to draw samples from these distributions.
Univariate Gaussian: N(x | mu, sigma^2) = (1 / sqrt(2*pi*sigma^2)) * exp(-(x - mu)^2 / (2*sigma^2))
Multivariate Gaussian (e.g. x is a spectrum): N(x | mu, S) = (1 / ((2*pi)^(B/2) * |S|^(1/2))) * exp(-(1/2) * (x - mu)' S^-1 (x - mu)), where S is the Covariance Matrix.
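A short sketch of drawing samples from these distributions with NumPy; the means and covariance below are made-up toy parameters.

```python
# Drawing simulated samples from univariate and multivariate Gaussians; toy parameters.
import numpy as np

rng = np.random.default_rng(0)

# Univariate: mean mu = 1, standard deviation sigma = 2
x1 = rng.normal(loc=1.0, scale=2.0, size=1000)

# Multivariate: a mean vector and a (positive definite) covariance matrix
mu = np.array([0.0, 1.0, 2.0])
S = np.array([[1.0, 0.5, 0.0],
              [0.5, 2.0, 0.3],
              [0.0, 0.3, 1.5]])
x2 = rng.multivariate_normal(mu, S, size=1000)   # each row is one simulated "spectrum"

print(x1.mean(), x1.std())
print(x2.mean(axis=0))
```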

28 Sample Covariance
Definition: S = (1 / (N - 1)) * sum over n of (xn - xbar)(xn - xbar)'
The term (xn - xbar)(xn - xbar)' is called an Outer Product: the outer product of column vectors u and v is the matrix u v'.
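A small sketch showing an outer product and the sample covariance as a sum of outer products; the data are random toy values.

```python
# Outer products and the sample covariance as a sum of outer products; toy data.
import numpy as np

u = np.array([1.0, 2.0, 3.0])
print(np.outer(u, u))                           # the outer product u u': a 3x3 matrix

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))               # 100 samples, 3 "bands"
xbar = X.mean(axis=0)
S = sum(np.outer(x - xbar, x - xbar) for x in X) / (X.shape[0] - 1)
print(np.allclose(S, np.cov(X, rowvar=False)))  # matches NumPy's covariance estimate
```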

29 Covariance Matrices
If S is a covariance matrix, then one can calculate matrices U and D such that S = U'DU, where:
S is Diagonalized
U is Orthogonal (Like a Rotation)
D is Diagonal

30 Generating Covariance Matrices
(1) Any matrix of the form A'A is a covariance matrix for some distribution, so we can:
Set A = random square matrix
S = A'A
(2) We can also:
Make a diagonal matrix D
Make a rotation matrix U
Make a covariance matrix by setting S = U'DU
We will generate covariance matrices S using Python.
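A minimal Python sketch of the two constructions above; the sizes and random draws are arbitrary.

```python
# The two constructions above in NumPy; sizes and random draws are arbitrary.
import numpy as np

rng = np.random.default_rng(1)
B = 4                                              # number of bands / dimensions

# (1) S = A'A for a random square matrix A
A = rng.standard_normal((B, B))
S1 = A.T @ A

# (2) S = U'DU with D diagonal (non-negative) and U orthogonal (a rotation)
D = np.diag(rng.uniform(0.5, 3.0, size=B))
U, _ = np.linalg.qr(rng.standard_normal((B, B)))   # QR gives an orthogonal matrix
S2 = U.T @ D @ U

print(np.linalg.eigvalsh(S1))                      # all eigenvalues >= 0
print(np.linalg.eigvalsh(S2))                      # exactly the diagonal of D (reordered)
```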

31 Go To Python

32 Linear Dimensionality Reduction
PCA: Principal Components Analysis
Maximizes the amount of variance in the first k bands compared to all other linear (orthogonal) transforms.
MNF: Minimum Noise Fraction
Minimizes an estimate of Noise/Signal, or equivalently maximizes an estimate of Signal/Noise.

33 PCA
Start with a data set of spectra or other samples, implicitly assumed to be drawn from the same distribution.
Compute the sample mean over all spectra: xbar = (1/N) * sum over n of xn
Compute the Sample Covariance: S = (1 / (N - 1)) * sum over n of (xn - xbar)(xn - xbar)'
Diagonalize S: S = U'DU
PCA is defined to be: yn = U (xn - xbar)
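A NumPy sketch of these PCA steps on toy data; numpy.linalg.eigh is one way to do the diagonalization, not necessarily the lecture's.

```python
# The PCA steps above in NumPy: center, covariance, diagonalize, rotate; toy data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))              # rows ~ spectra
xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)                    # sample covariance

evals, evecs = np.linalg.eigh(S)               # eigh returns ascending eigenvalues
order = np.argsort(evals)[::-1]                # sort Big to Little
evals, evecs = evals[order], evecs[:, order]

Y = (X - xbar) @ evecs                         # shift then rotate into PC coordinates
print(Y[:, :2].shape)                          # keep the first 2 bands for a 2D view
print(np.cov(Y, rowvar=False).round(2))        # (approximately) diagonal
```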

34 PCA – Easy Examples
Eigenvectors (columns of V) determine the major and minor axes.
The new coordinate system is a rotation (U) and shift (x - xbar) of the original coordinate system.
Assumes elliptical contours, as for a Gaussian.
Eigenvalues determine the lengths of the major and minor axes.

35 PCA, “Dark Points”, and BRDF
[Scatter plot: the black points are from Oak Trees; the red points are from Soil in a Baseball Field.]

36 Go To Python

37 MNF
Assumption: Observed Spectrum = Signal + Noise, i.e. x = s + n.
We want to transform x so that the Noise/Signal ratio is minimized.
How do we represent this ratio?

38 MNF
Assume the signal and noise are both random vectors with multivariate Gaussian distributions.
Assume the noise is zero mean: it is equally likely to add or subtract by the same amounts.
The noise variance uniquely determines how much the signal is modified by noise.
Therefore, we should try to minimize the ratio Noise Variance / Signal Variance.

39 MNF – Noise/Signal Ratio
How do we compute it for spectra? 426 bands -> 426 variances and 426*425/2 covariances.
Dividing element-wise won't work. What should we do? Diagonalize!
Covariance of n is diagonalizable: Sn = Un' Dn Un
Covariance of x is diagonalizable: Sx = Ux' Dx Ux

40 MNF – Noise/Signal Ratio
Covariance of n is diagonalizable: Sn = Un' Dn Un
Covariance of x is diagonalizable: Sx = Ux' Dx Ux
GOOD NEWS! They can be simultaneously diagonalized. It's a little complicated, but basically there is a single matrix V for which V' Sn V and V' Sx V are both diagonal.
So the Noise/Signal ratio in each transformed band is just the ratio of the corresponding diagonal entries.

41 MNF: Algorithm
Estimate n
Calculate the Covariance of n: Sn
Calculate the Covariance of x: Sx
Calculate the Left Eigenvectors and Eigenvalues of Sn Sx^-1 (equivalently, solve Sn v = lambda Sx v)
Make sure the eigenvalues are sorted in order: Big to Little if maximizing, Little to Big if minimizing
Only keep the Eigenvectors that come early in the sort order.
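A sketch of the MNF steps above on simulated data (in the spirit of the methodology slides). Here the noise is known because we simulate it, whereas in practice it must be estimated in step 1; using scipy.linalg.eigh for the generalized eigenproblem is an implementation choice, not necessarily the lecture's.

```python
# Sketch of the MNF steps on simulated data; the noise is known here only because
# we simulate it ourselves (in practice it must be estimated, step 1 above).
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
B, N = 20, 500
signal = rng.multivariate_normal(np.zeros(B), 4.0 * np.eye(B), size=N)
noise = rng.multivariate_normal(np.zeros(B), np.diag(rng.uniform(0.1, 1.0, B)), size=N)
X = signal + noise                              # observed spectrum = signal + noise

S_x = np.cov(X, rowvar=False)                   # covariance of x
S_n = np.cov(noise, rowvar=False)               # covariance of n

# Generalized eigenproblem S_n v = lambda S_x v; lambda ~ noise/signal fraction
evals, evecs = eigh(S_n, S_x)                   # eigh sorts Little to Big (minimizing)
V = evecs[:, :5]                                # keep the eigenvectors early in the sort
Y = X @ V                                       # low-noise MNF components
print(evals[:5], Y.shape)
```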

