Presentation on theme: "Generalized Discriminant Analysis" — Presentation transcript:

1 Generalized Discriminant Analysis
Presenter: 張志豪   Date: 2004/10/29

2 Outline
Introduction
Classification
Feature Space Transformation
Criterion

3 Introduction
Discriminant Analysis
Discriminant (classification, predictor): when the class distributions are known and a new sample arrives, the chosen discrimination method is used to decide which class the new sample belongs to. In reality we cannot know the true class distributions, so we estimate them from the training data; the more training data we have, the more accurate the estimates. In the purely statistical setting, discriminant analysis is used only as a predictor and makes no use of feature space transformation. ?? These are all two-class discriminations.

4 Introduction
Components of Discriminant Analysis:
Classification: Linear, Quadratic, Nonlinear (Kernel)
Feature Space Transformation: Linear
Criterion: ML, MMI, MCE

5 Introduction
Exposition:
First, estimate the class distributions from the labelled data.
In the feature space the class distributions overlap, so a feature space transformation is applied to change the feature space.
Some criterion is used to find the most suitable transformation basis.
When a new pattern arrives, the estimated (transformed) distributions are used as the predictor.
Feature space transformation, taking LDA as the example.

6 Introduction
Criterion vs. classification type:
Criterion   Linear     Quadratic   Nonlinear
ML          LDA        HDA         FDA, Kernel LDA
MMI         MMI-LDA
MCE         MCE-LDA

7 Classification Outline
Linear Discriminant Analysis & Quadratic Discriminant Analysis
  Linear Discriminant Analysis
  Quadratic Discriminant Analysis
  Problem
  Practice
Flexible Discriminant Analysis
  Linear Discriminant Analysis → Multi-Response Linear Regression
  Parametric → Non-Parametric
Kernel LDA

8 Classification Linear Discriminant Analysis
A simple application of Bayes' theorem gives the class posterior: Pr(G = k | X = x) = f_k(x) pi_k / sum_l f_l(x) pi_l.
Assumption: each class density f_k(x) is a single Gaussian distribution.

9 Classification Linear Discriminant Analysis
Classification (cont.): in comparing two classes k and l, it is sufficient to look at the log-ratio log[Pr(G = k | X = x) / Pr(G = l | X = x)] = log(pi_k / pi_l) - (1/2)(mu_k + mu_l)' Sigma^-1 (mu_k - mu_l) + x' Sigma^-1 (mu_k - mu_l).
Assumption: common covariance Sigma for all classes.
Intuition: classification between two classes.

10 Classification Linear Discriminant Analysis
Classification (cont.): this linear log-odds function implies that the decision boundary between classes k and l is linear in x. This is of course true for any pair of classes, so all the decision boundaries are linear.
The linear discriminant functions: delta_k(x) = x' Sigma^-1 mu_k - (1/2) mu_k' Sigma^-1 mu_k + log(pi_k).
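As a concrete illustration of these linear discriminant functions, here is a minimal Python/NumPy sketch (not from the presentation; the function names and the pooled-covariance estimate are illustrative assumptions):

```python
import numpy as np

def fit_lda(X, y):
    """Estimate class priors, class means, and the pooled (common) covariance."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == k) for k in classes])
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    # Pooled within-class covariance: the common-covariance assumption of LDA.
    cov = sum(np.cov(X[y == k], rowvar=False, bias=True) * np.sum(y == k)
              for k in classes) / len(X)
    return classes, priors, means, np.linalg.inv(cov)

def lda_scores(x, priors, means, cov_inv):
    """Linear discriminant functions:
       delta_k(x) = x' S^-1 mu_k - 0.5 * mu_k' S^-1 mu_k + log(pi_k)."""
    return np.array([x @ cov_inv @ m - 0.5 * m @ cov_inv @ m + np.log(p)
                     for m, p in zip(means, priors)])

# Classify x to the class whose discriminant score is largest, e.g.:
# classes, priors, means, cov_inv = fit_lda(X_train, y_train)
# y_hat = classes[np.argmax(lda_scores(x_new, priors, means, cov_inv))]
```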

11 Classification Quadratic Discriminant Analysis
If the covariance matrices are not assumed to be equal, then the convenient cancellations do not occur; in particular, the pieces quadratic in x remain.
The quadratic discriminant functions: delta_k(x) = -(1/2) log|Sigma_k| - (1/2)(x - mu_k)' Sigma_k^-1 (x - mu_k) + log(pi_k).
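A matching hedged sketch for the quadratic case, where each class keeps its own covariance (again illustrative, not the presenter's code):

```python
import numpy as np

def qda_scores(x, priors, means, covs):
    """Quadratic discriminant functions:
       delta_k(x) = -0.5*log|S_k| - 0.5*(x-mu_k)' S_k^-1 (x-mu_k) + log(pi_k)."""
    scores = []
    for p, m, S in zip(priors, means, covs):
        diff = x - m
        scores.append(-0.5 * np.linalg.slogdet(S)[1]          # -0.5 * log|S_k|
                      - 0.5 * diff @ np.linalg.solve(S, diff) # Mahalanobis term
                      + np.log(p))                            # log prior
    return np.array(scores)
```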

12 Classification LDA & QDA
Number of parameters: LDA is on the order of P^2, QDA on the order of J*P^2 (J classes, P dimensions).
Accuracy: LDA is mismatched when the class covariances actually differ.
(Figure: LDA vs. QDA decision boundaries; the size of each circle indicates how widely that class distribution is spread.)

13 Classification Problem
How do we use a linear discriminant when we have more than two classes? There are two approaches:
Learn one discriminant function for each class.
Learn a discriminant function for all pairs of classes.
=> If c is the number of classes, in the first case we have c functions and in the second c(c-1)/2 functions.
=> In both cases we are left with ambiguous regions.

14 Classification Problem

15 Classification Problem
To avoid the ambiguous regions we can use a linear machine: we define c linear discriminant functions and, for a given x, choose the one with the highest value.
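A minimal sketch of such a linear machine, assuming the weight vectors and biases have already been trained somehow (all names are placeholders):

```python
import numpy as np

def linear_machine_predict(x, W, b):
    """W is a (c, p) matrix of weight vectors, b a length-c bias vector.
    Pick the class whose linear discriminant g_k(x) = w_k' x + b_k is largest,
    which removes the ambiguous regions of the one-vs-rest / pairwise schemes."""
    return int(np.argmax(W @ x + b))
```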

16 Classification Conclusion : LDA & QDA
LDA: classify x to the k-th class whose posterior probability, equivalently whose linear discriminant function score, is largest (common covariance).
QDA: classify x to the k-th class whose quadratic discriminant function score is largest (class-specific covariances).

17 Classification Practice : LDA
Idea: after the feature space transformation, the classes should become easier to discriminate linearly.
Components:
Classification: linear decision boundaries
Feature Space Transformation: linear
Criterion: ML

18 Classification Practice : LDA
Under a linear (invertible) transformation the likelihood stays the same apart from a scale factor (the Jacobian of the transformation).

19 Classification Practice : LDA
Maximum likelihood criterion => assumptions:
Single Gaussian distribution per class.
Class prior probabilities are the same.
Diagonal and common (within-class) covariance.
The dimensions lacking classification information share an equivalent distribution (total-class covariance).
Lack of classification information: assume that some of the dimensions carry no classification information; under this assumption those dimensions can be dropped. JHU give a proof (Appendix C).

20 Classification Practice : LDA
Intuition: T = B + W, where B is the between-class covariance, W is the within-class covariance, and T is the total covariance; the transformation matrix maps the original features to the new space.
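One common way to realize this intuition, assumed here rather than taken from the slides, is to use the leading generalized eigenvectors of B with respect to W as the transformation; a hedged NumPy/SciPy sketch:

```python
import numpy as np
from scipy.linalg import eigh

def lda_transform(X, y, K):
    """Return a (p, K) projection built from the top-K generalized
    eigenvectors of B w = lambda W w (between- vs. within-class scatter)."""
    mean_total = X.mean(axis=0)
    p = X.shape[1]
    B = np.zeros((p, p))
    W = np.zeros((p, p))
    for k in np.unique(y):
        Xk = X[y == k]
        mk = Xk.mean(axis=0)
        B += len(Xk) * np.outer(mk - mean_total, mk - mean_total)
        W += (Xk - mk).T @ (Xk - mk)
    # Note T = B + W, the total scatter, as on the slide.
    vals, vecs = eigh(B, W)              # generalized symmetric eigenproblem
    order = np.argsort(vals)[::-1][:K]   # keep the largest eigenvalues
    return vecs[:, order]
```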

21 Classification Practice : HDA
Idea: after the feature space transformation, the classes should become easier to discriminate quadratically.
Components:
Classification: quadratic decision boundaries
Feature Space Transformation: linear
Criterion: ML

22 Classification Practice : HDA
Maximum likelihood criterion => assumptions:
Single Gaussian distribution per class.
Class prior probabilities are the same.
Diagonal (class-specific) covariance.
The dimensions lacking classification information share an equivalent distribution.
JHU use the steepest-descent algorithm; Cambridge use the semi-tied update, which is guaranteed to find a locally optimal solution and to be stable.
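The slide does not give the objective explicitly; the sketch below assumes a Kumar/Andreou-style HDA log-likelihood with class-specific diagonal covariances on the first K (useful) dimensions and a shared total covariance on the remaining dimensions, so the exact terms and constants may differ from the JHU or Cambridge formulations:

```python
import numpy as np

def hda_objective(theta, class_covs, class_counts, total_cov, K):
    """Assumed HDA-style ML objective to be maximized (e.g. by gradient
    ascent, as JHU does): the first K rows of theta are the discriminative
    dimensions (diagonal, class-specific covariance); the remaining rows
    model the 'no classification information' dimensions with the shared
    total covariance."""
    theta_K, theta_rest = theta[:K], theta[K:]
    N = sum(class_counts)
    obj = N * np.linalg.slogdet(theta)[1]          # Jacobian term, log|det theta|
    for Nj, Sj in zip(class_counts, class_covs):
        dj = np.diag(theta_K @ Sj @ theta_K.T)     # projected class covariance (diag)
        obj -= 0.5 * Nj * np.sum(np.log(dj))
    d_rest = np.diag(theta_rest @ total_cov @ theta_rest.T)
    obj -= 0.5 * N * np.sum(np.log(d_rest))
    return obj
```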

23 Classification Practice : HDA
Intuition: T = B + W, where B is the between-class covariance, W is the within-class covariance, and T is the total covariance; the transformation matrix maps the original features to the new space.
In the reported results, HDA performs worse than LDA.

24 Classification Practice
Problem: why is a linear transformation effective?
Information theory: it is impossible to create new information by transforming data; transformations can only lead to information loss. => One finds the K < P dimensional subspace of R^P in which the group centroids are most separated.
Single multi-dimensional Gaussian: if each class can already be classified well when modelled by a single Gaussian, then, intuitively, modelling each class with a mixture of Gaussians should classify even better ?? Is observation probability the same thing as classification?

25 Classification LDA : Linear Regression
Idea: linear discriminant analysis is equivalent to multi-response linear regression using optimal scorings to represent the groups. In this way, any multi-response regression technique can be post-processed to improve its classification performance. We obtain nonparametric versions of discriminant analysis by replacing linear regression with any nonparametric regression method.
Regression analysis: explores the relationships among variables, finds an appropriate mathematical equation to express those relationships, and then uses that equation to predict, i.e. to predict one variable from another. Regression analysis is based on correlation analysis, since the reliability of any prediction depends on the strength of the relationship between the variables.

26 Classification LDA : Linear Regression
Suppose theta is a function that assigns scores to the classes, such that the transformed class labels are optimally predicted by linear regression on X. So we have to choose theta and beta to minimize the averaged squared residual ASR = (1/N) sum_i (theta(g_i) - x_i' beta)^2. This gives a one-dimensional separation between the classes. For fixed scores, beta is the least squares estimator.
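A small sketch of this least-squares step, assuming the class scores theta(g_i) are given as a vector aligned with the rows of X (the names are illustrative only):

```python
import numpy as np

def asr(scores, X, beta):
    """Averaged squared residual between the scored labels and the linear fit."""
    return np.mean((scores - X @ beta) ** 2)

def fit_beta(scores, X):
    """For fixed class scores theta(g_i), the minimizing beta is the ordinary
    least-squares estimator (the slide's 'least squares estimator')."""
    beta, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return beta

# scores = np.array([theta[g] for g in y])   # theta maps class label -> score
# beta = fit_beta(scores, X); print(asr(scores, X, beta))
```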

27 Classification LDA : Linear Regression
Multi-Response Linear Regression: K independent scorings of the class labels, theta_1, ..., theta_K, and K corresponding linear maps eta_k(x) = x' beta_k; the scores and maps are chosen to minimize ASR = (1/N) sum_k sum_i (theta_k(g_i) - x_i' beta_k)^2.   (1)
Here x_i' beta_k is the value of the i-th observation projected onto the k-th dimension, and theta_k(g_i) is the score of the i-th observation's label in the k-th dimension (its mean ??).
The set of scores is assumed to be mutually orthogonal and normalized.

28 Classification LDA : Linear Regression
Multi-Response Linear Regression (cont.). It can be shown [Mardia79, Hastie95] that:
The coefficient vectors beta_k are equivalent, up to a constant, to the Fisher discriminant coordinates.
The Mahalanobis distances can be derived from the ASR solutions.
LDA can therefore be performed by a sequence of linear regressions, followed by a classification in the space of fits (Mardia, Kent and Bibby, 1979).

29 Classification LDA : Linear Regression
Multi-Response Linear Regression (cont.). Let Y be the N*J indicator matrix corresponding to the dummy-variable coding for the classes; that is, the ij-th element of Y is 1 if the i-th observation falls in class j, and 0 otherwise. Let Theta be the J*K matrix of K score vectors for the J classes, and let Y*Theta be the N*K matrix of transformed values of the classes, with ik-th element theta_k(g_i).

30 Classification LDA : Linear Regression
Solution 1: looking at (1), it is clear that if the scores were fixed we could minimize ASR by regressing the scored labels on X. If we let P_X denote the projection onto the column space of the predictors, this says ASR(Theta) = (1/N) tr[ Theta' Y' (I - P_X) Y Theta ].   (2)

31 Classification LDA : Linear Regression
Solution 1 (cont.). Assume the scores have mean zero, unit variance, and are uncorrelated over the N observations. Minimizing (2) then amounts to finding the K largest eigenvectors Theta of Y' P_X Y, with normalization Theta' D_p Theta = I, where D_p = Y'Y / N is the diagonal matrix of the sample class proportions. We could do this by constructing the matrix P_X, computing Y' P_X Y, and then calculating its eigenvectors. But a more convenient approach avoids explicit construction of P_X, which is an (N*N) matrix and far too large to build, and takes advantage of the fact that applying P_X amounts to computing a linear regression.

32 Classification LDA : Linear Regression
Solution 2:
Y : (N*J), the ground-truth labels -> the relation between observations and classes.
Yhat : (N*J), the fitted values -> the relation between observations and classes.
Y'Yhat : (J*J), a covariance-like matrix ?? -> the relation between classes.
B : (P*J), the coefficient matrix from regressing Y on X -> the relation between dimensions and classes.
X : (N*P), the training data -> the relation between observations and dimensions.
It turns out that LDA amounts to this regression followed by an eigen-decomposition of Y'Yhat.

33 Classification LDA : Linear Regression
Solution 2 (cont.). The final coefficient matrix B is, up to a diagonal scale matrix, the same as the discriminant analysis coefficient matrix; the k-th scale factor depends on the k-th largest eigenvalue computed in step 3 above. This yields the LDA transformation matrix.
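Putting Solution 2 together as a hedged sketch (indicator coding, multi-response regression, eigen-decomposition of Y'Yhat); it omits the diagonal rescaling by the eigenvalues mentioned above, and assumes X has been centered:

```python
import numpy as np

def lda_via_regression(X, y, K):
    """Sketch of 'Solution 2': (1) code the classes as an N*J indicator
    matrix Y, (2) multi-response linear regression of Y on X, (3) eigen-
    decompose Y'Yhat; the leading eigenvectors combined with B give the
    LDA-like directions (up to the diagonal rescaling on the slide)."""
    classes = np.unique(y)
    N, J = len(y), len(classes)
    Y = np.zeros((N, J))
    Y[np.arange(N), np.searchsorted(classes, y)] = 1.0
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)   # (P*J) coefficient matrix
    Y_hat = X @ B                               # fitted values, (N*J)
    vals, vecs = np.linalg.eigh(Y.T @ Y_hat)    # (J*J) eigenproblem
    order = np.argsort(vals)[::-1][:K]
    return B @ vecs[:, order]                   # (P*K) projection directions
```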

34 Classification Flexible Discriminant Analysis
Nonparametric version: we replace the linear projection operator P_X by a nonparametric regression procedure, which we denote by the linear operator S. One simple and effective approach to this end is to expand X into a larger set of basis variables h(X), and then simply use the projection onto h(X) in place of P_X. Wherever an inner product appears, a kernel function can be applied.
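One way to read the basis-expansion idea as code: build h(X) and feed it to the same regression-based procedure. The quadratic expansion below is just one illustrative choice of basis, not the one used in the presentation:

```python
import numpy as np

def quadratic_basis(X):
    """h(X): augment the raw features with all pairwise products, so the
    'linear' regression step yields quadratic boundaries in the original
    space (one possible basis among many)."""
    cross = np.einsum('ni,nj->nij', X, X).reshape(len(X), -1)
    return np.hstack([X, cross])

# FDA-style use with the regression/optimal-scoring sketch above, e.g.:
# directions = lda_via_regression(quadratic_basis(X), y, K)
```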

35 Classification Flexible Discriminant Analysis

36 Classification Flexible Discriminant Analysis
Non-Parametric Algorithm (cont.)

37 Classification Flexible Discriminant Analysis

38 Classification Kernel LDA
Linear Discriminant Analysis

39 Classification Kernel LDA
Kernel Linear Discriminant Analysis

40 Classification Kernel LDA
Kernel Linear Discriminant Analysis (cont.)

41 Classification Kernel LDA
Kernel Linear Discriminant Analysis (cont.). This problem can be solved by finding the leading eigenvector of N^-1 M. The projection of a new pattern x onto w is then given by w . phi(x) = sum_i alpha_i k(x_i, x).
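A hedged two-class sketch of this kernel discriminant (the RBF kernel, its gamma parameter, and the regularizer added to the N matrix are assumptions, not taken from the slides):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """k(a, b) = exp(-gamma * ||a - b||^2), computed for all pairs."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_lda_2class(X, y, gamma=1.0, reg=1e-3):
    """Two-class kernel Fisher discriminant sketch: build the M and N
    matrices in the kernel-induced space, take alpha as the leading
    eigenvector of N^-1 M (regularized for numerical stability), and
    project new patterns via sum_i alpha_i k(x_i, x)."""
    K = rbf_kernel(X, X, gamma)
    idx = [np.where(y == c)[0] for c in np.unique(y)]       # assumes 2 classes
    M_i = [K[:, ix].mean(axis=1) for ix in idx]              # class kernel means
    M = np.outer(M_i[0] - M_i[1], M_i[0] - M_i[1])
    N = np.zeros_like(K)
    for ix in idx:
        Ki = K[:, ix]
        center = np.eye(len(ix)) - np.full((len(ix), len(ix)), 1.0 / len(ix))
        N += Ki @ center @ Ki.T
    N += reg * np.eye(len(X))                                # regularization
    vals, vecs = np.linalg.eig(np.linalg.solve(N, M))
    alpha = np.real(vecs[:, np.argmax(np.real(vals))])
    project = lambda x_new: rbf_kernel(x_new, X, gamma) @ alpha
    return alpha, project
```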

