# Tópicos Especiais em Aprendizagem Reinaldo Bianchi Centro Universitário da FEI 2010.

## Presentation on theme: "Tópicos Especiais em Aprendizagem Reinaldo Bianchi Centro Universitário da FEI 2010."— Presentation transcript:

Tópicos Especiais em Aprendizagem Reinaldo Bianchi Centro Universitário da FEI 2010

3a. Aula Parte B

Objetivos desta aula n Apresentar mais duas técnicas de Statistical Machine Learning: –PCA. –LDA e MLDA. n Aula de hoje: –Capítulos 3 e 4 do Hastie. –A Tutorial on Principal Components Analysis - Lindsay I Smith. –Wikipedia.

Linear Discriminant Analysis

Introduction n Ronald A. Fisher, 1936. “The elaborate mechanism built on the theory of infinitely large samples is not accurate enough for simple laboratory data. Only by systematically tackling small sample problems on their merits does it seem possible to apply accurate tests to practical data.”

Introduction n What is LDA? n Linear Discriminant Analysis, or simply LDA, is a well-known feature extraction technique that has been used successfully in many statistical pattern recognition problems. n LDA is often called Fisher Discriminant Analysis (FDA).

Motivation n The primary purpose of LDA is to separate samples of distinct groups by: –maximising their between-class separability while, –minimising their within-class variability. n It assumes that the true covariance matrices of each class are equal because the same within-class scatter matrix is used for all the classes.

Geometric Idea n Minimise n Maximise

LDA Method n First, let’s define: –The sample mean: –Sample covariance: –Grand mean vector:

Method (cont.) Let the between-class scatter matrix S b be defined as: and the within-class scatter matrix S w be defined as: where x i,j is the n -dimensional pattern j from class  i, N i is the number of training examples from class  i, and g is the total number of classes or groups.

Method (cont.) The main objective of LDA is to find a projection matrix P lda that maximises the ratio of the determinant of S b to the determinant of S w (Fisher’s criterion), that is:

Method (cont.) It has been shown that P lda is in fact the solution of the following eigensystem problem: Multiplying both sides by the inverse of S w

Standard LDA If S w is a non-singular matrix then the Fisher’s criterion is maximised when the projection matrix P lda is composed of the eigenvectors of with at most (g-1) nonzero corresponding eigenvalues.

Geometric Idea 14

LDA versus Regression 15

LDA versus PCA n LDA seeks directions that are efficient for discriminating data whereas PCA seeks directions that are efficient for representing data. n The directions that are discarded by PCA might be exactly the directions that are necessary for distinguishing between groups.

LDA versus PCA PCA:     ) LDA: 

18 Example: Vowel training

19 Example: Vowel training

Limited Sample Size Problem n The performance of the standard LDA can be seriously degraded if there are only a limited number of total training observations N compared to the dimension of the feature space n. –Since S w is a function of (N - g) or fewer linearly independent vectors, its rank is (N - g) or less. Therefore, S w is a singular matrix if N is less than (n+g), or, analogously might be unstable if N >> n.

Limited Sample Size Problem n LDA has a drawback: –It requires the within-class covariance matrix to be nonsingular. n For this reason, when the number of features is greater or equal to the number of examples LDA cannot be applied without dimension reduction.

So… n Any idea of how we can overcome that?

Two-stage feature extraction technique n First the n-dimensional training samples from the original vector space are projected to a lower dimensional space using PCA. n Then LDA is applied next to find the best linear discriminant features on that PCA subspace. –This is often called the Most Discriminant Features (MDF) method.

Two-stage feature extraction technique n Then LDA is applied next to find the best linear discriminant features on that PCA subspace. –This is often called the Most Discriminant Features (MDF) method.

Two-stage feature extraction technique (cont.) n Thus, the Fisher’s criterion is maximised when the projection matrix P lda is composed of the eigenvectors of: with at most (g – 1) nonzero eigenvalues. n Therefore the singularity of S w is overcome if

Maximum uncertainty Linear Discriminant Analysis -MLDA n Thomaz et al. suggested expanding the smaller eigenvalues of the within-class covariance matrix in LDA, keeping unchanged most of the larger eigenvalues (which contain most of the relevant information). –This expansion is carried out by replacing the eigenvalues that are less than the average of all eigenvalues by the latter.

Maximum uncertainty Linear Discriminant Analysis -MLDA n Let us consider the issue of stabilising the Sw estimate with a multiple of the (n x n) identity matrix I. n Since the estimation errors of the non- dominant or small eigenvalues are much greater than those of the dominant or large eigenvalues, we can propose the following selection algorithm.

MLDA Algorithm The algorithm expands the smaller (less reliable) eigenvalues of S w and keeps most of its larger eigenvalues unchanged.

Geometric Idea It is reasonable to expect that the Fisher’s linear basis found by minimising a more difficult “inflated” S w estimate would also minimise a less reliable “shrivelled” S w.

Example: Neonatal Brain Analysis n Given a neonatal MR brain data set that contains images of 67 preterm infants and 12 term control ones.

PCA Analysis

PCA + MLDA Analysis

Matlab Example: Fisher Iris n The Iris flower data set or Fisher's Iris data set is a multivariate data set introduced by Sir Ronald Aylmer Fisher (1936) as an example of discriminant analysis. –Quantify the geographic variation of Iris flowers in the Gaspé Peninsula –The output is qualitative (species of Iris) finite set G = {Virginica, Setosa and Versicolor}. 33

Iris Setosa 34

Iris Versicolor 35

Iris Virginica 36 http://en.wikipedia.org/wiki/Iris_flower_data_set

Fisher's Iris Data – Classification Problem 37 http://en.wikipedia.org/wiki/Iris_flower_data_set

Classify n class = classify(sample,training,group) –Classifies each row of the data in sample into one of the groups in training. –sample and training must be matrices with the same number of columns. –group is a grouping variable for training. 38

Classify n class = classify(sample,training,group,'ty pe') –Specify the type of discriminant function. –type is one of: linear: LDA - This is the default. diaglinear: Similar to linear, but with a diagonal covariance matrix (Naive Bayes classifiers). quadratic diagquadratic mahalanobis. 39

Matlab code load fisheriris; SL = meas(51:end,1); SW = meas(51:end,2); group = species(51:end); h1 = gscatter(SL,SW,group,'rb','v^',[], 'off'); set(h1,'LineWidth',2); legend('Fisher versicolor','Fisher virginica', 'Location','NW') 40

Fisher’s Iris 41

Classify [X,Y] = meshgrid(linspace(4.5,8),linspace( 2,4)); X = X(:); Y = Y(:); [C,err,P,logp,coeff] = classify([X Y],[SL SW], group,'linear'); 42

Plotting the result hold on; gscatter(X,Y,C,'rb','.',1,'off'); K = coeff(1,2).const; L = coeff(1,2).linear; Q = coeff(1,2).quadratic; f = @(x,y) K + [x y]*L + sum(([x y]*Q).* [x y], 2); 43

Plotting the result h2 = ezplot(f,[4.5 8 2 4]); set(h2,'Color','m','LineWidth',2) axis([4.5 8 2 4]) xlabel('Sepal Length') ylabel('Sepal Width') title('{\bf Classification with Fisher Training Data}') 44

LDA 45

Diglinear (Naive Bayes) 46

Quadratic Discriminant Analysis 47 http://www.mathworks.com/hel p/toolbox/stats/classify.html