Presentation transcript:

1 Bayesian Discriminant Analysis
This supervised learning technique uses Bayes’ rule but is different in philosophy from the well-known work of Aitken, Taroni, et al.
Bayes’ rule: Pr(G_i | x) = Pr(x | G_i) Pr(G_i) / Pr(x), where Pr is probability.
The equation means: “How does the probability of an item being a member of group G_i change, given evidence x?”
The prior probability Pr(G_i) must be chosen before any evidence is seen. This can be a problem!
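A minimal numeric sketch of Bayes’ rule in R (the likelihoods and priors below are invented for illustration):

prior <- c(G1 = 0.5, G2 = 0.5)           # Pr(G_i), chosen before seeing evidence
likelihood <- c(G1 = 0.20, G2 = 0.05)    # Pr(x | G_i), e.g. read off fitted densities
evidence <- sum(likelihood * prior)      # Pr(x), the normalizing constant
posterior <- likelihood * prior / evidence
posterior                                # Pr(G_i | x): here G1 = 0.8, G2 = 0.2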

2 Bayesian Discriminant Analysis
Bayes’ rule can be turned into a classification rule:
Pr(G_1 | x) > Pr(G_2 | x) => choose group 1
*If the priors are both 0.5, the decision boundaries are where the likelihood curves cross.
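A sketch of that rule for two 1-D Gaussian groups (the means, sds, and test points are invented); with equal priors the assignment flips exactly where the two density curves cross:

# Choose the group with the largest posterior probability
classify <- function(x, prior = c(0.5, 0.5)) {
  lik <- c(dnorm(x, mean = 0, sd = 1),    # Pr(x | G_1)
           dnorm(x, mean = 3, sd = 1))    # Pr(x | G_2)
  post <- lik * prior / sum(lik * prior)  # Bayes' rule
  which.max(post)                         # 1 => choose group 1
}
classify(0.5)  # 1: left of the crossing point at x = 1.5
classify(2.0)  # 2: right of it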

3 Bayes-Gaussian Discriminant Analysis
If the data are multivariate normal and the groups are drawn from the same population (a common covariance structure), the decision rule becomes: assign x to the group with the smallest “distance”,
with the “distance” defined as: d_i(x) = (x - mu_i)' S_p^{-1} (x - mu_i) - 2 ln Pr(G_i)
and S_p the pooled covariance matrix, like an average of the group covariance matrices.
Because the x' S_p^{-1} x term is common to every group, comparing groups is linear in x; note that if the data are just 1-D this is just an equation for a line (a slope and an intercept).
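In R this linear rule is implemented by MASS::lda; a minimal sketch, using the built-in iris data as a stand-in since the slides’ own data set is not included:

# Linear (pooled-covariance) Bayes-Gaussian rule via MASS::lda
library(MASS)
fit <- lda(Species ~ ., data = iris, prior = c(1, 1, 1) / 3)  # equal priors
pred <- predict(fit, iris)
table(predicted = pred$class, actual = iris$Species)  # resubstitution results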

4 Bayes-Gaussian Discriminant Analysis
If the data are multivariate normal but drawn from different populations (different covariance structures), the decision rule is the same but the “decision distance” becomes:
d_i(x) = (x - mu_i)' S_i^{-1} (x - mu_i) + ln|S_i| - 2 ln Pr(G_i)
with S_i each group’s own covariance matrix. Note that if the data are just 1-D this is an equation for a parabola, a x^2 + b x + c: the quadratic term a is new.
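The corresponding quadratic rule via MASS::qda, again with iris as a stand-in:

# Quadratic (per-group covariance) rule via MASS::qda
library(MASS)
qfit <- qda(Species ~ ., data = iris)
qpred <- predict(qfit, iris)
table(predicted = qpred$class, actual = iris$Species)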

5 Bayes-Gaussian Discriminant Analysis
The “quadratic” version is always called quadratic discriminant analysis (QDA).
The “linear” version is called by a number of names: linear discriminant analysis (LDA), or some combination of the above with the words “Gaussian” or “classification”.
A number of techniques use the name LDA! It is important to specify the equations used in order to tell them apart!

6 Groups have similar covariance structure: the linear discriminant rule should work well.
Groups have different covariance structure: the quadratic discriminant rule may work better.

7 Canonical Variate Analysis
This supervised technique is called Linear Discriminant Analysis (LDA) in R; it is also called Fisher linear discriminant analysis.
CVA is closely related to linear Bayes-Gaussian discriminant analysis.
It works on a principle similar to PCA: look for “interesting directions in data space”.
CVA: find the directions in space which best separate the groups. Technically: find the directions which maximize the ratio of between-group to within-group variation.

8 Project onto PC1: not necessarily good group separation!
Project onto CV1: good group separation!
Note: there are (#groups - 1) or p CVs, whichever is smaller.

9 Canonical Variate Analysis
Use the between-group to within-group covariance matrix, W^{-1}B, to find the directions of best group separation (the CVA loadings, A_cv, are its leading eigenvectors; see the sketch below).
CVA can be used for dimension reduction.
Caution! These “dimensions” are not at right angles (i.e. not orthogonal), so CVA plots can be distorted from reality. Always check the loading angles!
Caution! CVA will not work well with very correlated data.
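A hand-rolled sketch of that decomposition (X and g are assumed inputs; iris is used only as a stand-in):

# CVA loadings as eigenvectors of W^-1 B
cva_loadings <- function(X, g) {
  X <- as.matrix(X); g <- as.factor(g)
  grand <- colMeans(X)
  W <- matrix(0, ncol(X), ncol(X))   # within-group scatter
  B <- matrix(0, ncol(X), ncol(X))   # between-group scatter
  for (lev in levels(g)) {
    Xi <- X[g == lev, , drop = FALSE]
    mi <- colMeans(Xi)
    W <- W + crossprod(sweep(Xi, 2, mi))        # sum of (x - m_i)(x - m_i)'
    B <- B + nrow(Xi) * tcrossprod(mi - grand)  # n_i (m_i - m)(m_i - m)'
  }
  ev <- eigen(solve(W) %*% B)                   # W^-1 B is not symmetric
  ncv <- min(nlevels(g) - 1, ncol(X))           # #groups - 1 or p, whichever is smaller
  Re(ev$vectors[, 1:ncv, drop = FALSE])         # CVA loadings A_cv
}
A_cv <- cva_loadings(iris[, 1:4], iris$Species)
scores <- as.matrix(iris[, 1:4]) %*% A_cv       # CV scores for plotting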

10 [Figure: 2D CVA of the gasoline data set vs. 2D PCA of the gasoline data set]

11 Canonical Variate Analysis
The distance metric used in CVA to assign the group i.d. of an unknown data point is the distance from the projected point to each group mean in CV space: assign the point to the nearest group, as sketched below.
If the data are Gaussian and the group covariance structures are the same, then CVA classification is the same as Bayes-Gaussian classification.
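Continuing the hand-rolled sketch from slide 9 (A_cv and scores as computed there; the first iris row stands in for an “unknown” point):

# Assign an unknown point to the nearest group mean in CV space
cv_means <- apply(scores, 2, tapply, iris$Species, mean)   # group means in CV space
x_new <- as.numeric(iris[1, 1:4]) %*% A_cv                 # project the unknown point
d <- apply(cv_means, 1, function(m) sum((x_new - m)^2))    # squared distances
names(which.min(d))                                        # predicted group label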

12 Partial Least Squares Discriminant Analysis
PLS-DA is a supervised discrimination technique, and very popular in chemometrics.
It works well with highly correlated variables (like in spectroscopy); lots of correlation causes CVA to fail!
Group labels are coded into a “response matrix” Y (one indicator column per group).
PLS searches for directions of maximum covariance in X and Y.
The loadings for X can be used like PCA loadings: dimension reduction, loading plots.

13 PLS-DA theory: find an (approximate) linear relationship between the experimental (explanatory) variables X and the group labels (response variables) Y:
Y = XB + E, where E is the “error” or “residuals” matrix.
Decompose both blocks: X = TP^T + E_X and Y = UQ^T + E_Y, with T, U the PLS-scores and P, Q the PLS-loadings.
So, substituting: UQ^T = TP^T B + E.
*Use these “Y-scores” with a “soft-max” or “Bayes” rule to pick the “most-likely” group label.

14 Partial Least Squares Discriminant Analysis
How do we solve this for T, P and U?
Objective: maximize the covariance between the X and Y scores, T and U.
There are various procedures to do this: kernel-PLS, SIMPLS, NIPALS. They give close, but slightly different, numerical results.
In R, the functions are: plsr (pls package), spls (spls package), and, easiest, plsda (caret package); see the sketch below.
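A minimal sketch with caret::plsda (assumes the caret and pls packages are installed; iris again stands in for the slides’ data set):

# PLS-DA via caret::plsda
library(caret)
x <- as.matrix(iris[, 1:4])
y <- iris$Species                        # factor labels; plsda builds the Y indicator matrix
fit <- plsda(x, y, ncomp = 2, probMethod = "softmax")
predict(fit, x[1:5, ])                   # predicted group labels
probs <- predict(fit, x, type = "prob")  # soft-max'd Y-scores as class probabilities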

15 Partial Least Squares Discriminant Analysis
[Figure: 2D PLS of the gasoline data set vs. 2D PCA of the gasoline data set]

16 Partial Least Squares Discriminant Analysis
Group assignments of observation vectors are made by interpreting the Y-scores; typically a “soft-max” function is used, as in the sketch below.
[Table: observation vectors and their Y-scores]
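A one-line soft-max in R, mapping a vector of Y-scores (invented values) to group “probabilities”:

# Soft-max: exponentiate, then normalize (max-shift for numerical stability)
softmax <- function(s) exp(s - max(s)) / sum(exp(s - max(s)))
y_scores <- c(G1 = 0.9, G2 = 0.2, G3 = -0.1)
softmax(y_scores)            # sums to 1; the largest Y-score wins
names(which.max(y_scores))   # most-likely group label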

