Linear Methods for Classification 20.04.2015: Presentation for MA seminar in statistics Eli Dahan.

1 Linear Methods for Classification 20.04.2015: Presentation for MA seminar in statistics Eli Dahan

2 Outline: Introduction (problem and solution); LDA: Linear Discriminant Analysis; LR: Logistic Regression; Linear Regression; LDA vs. LR; In a word: separating hyperplanes

3 Introduction: the problem. Given an observation X = x, does it belong to group k or to group l? We can think of G as the "group label". Posterior probability: $p_j = P(G = j \mid X = x)$.

4 Introduction: the solution. Linear decision boundary: $\{x : p_k = p_l\}$. If $p_k > p_l$, choose k; if $p_l > p_k$, choose l.

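5 Linear Discriminant Analysis. Let $P(G = k) = \pi_k$ and $P(X = x \mid G = k) = f_k(x)$. Then by Bayes' rule: $P(G = k \mid X = x) = \frac{f_k(x)\,\pi_k}{\sum_{l=1}^{K} f_l(x)\,\pi_l}$. Decision boundary between classes k and l: $\{x : P(G = k \mid X = x) = P(G = l \mid X = x)\}$, i.e. $\log\frac{f_k(x)}{f_l(x)} + \log\frac{\pi_k}{\pi_l} = 0$.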
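Bayes' rule above can be sketched numerically. This is a minimal illustration, assuming NumPy and Gaussian class densities; the helper names `gaussian_density`, `posterior` and `classify` are hypothetical and not from the slides. Classes are indexed from 0 in the code.

```python
import numpy as np

def gaussian_density(x, mean, cov):
    """Multivariate normal density N(mean, cov) evaluated at x."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    p = x.shape[0]
    diff = x - mean
    # Solve cov @ z = diff instead of forming the inverse explicitly.
    maha = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2 * np.pi) ** p * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm

def posterior(x, priors, densities):
    """Bayes' rule: P(G=k | X=x) = f_k(x) pi_k / sum_l f_l(x) pi_l."""
    joint = np.array([pi_k * f_k(x) for pi_k, f_k in zip(priors, densities)])
    return joint / joint.sum()

def classify(x, priors, densities):
    """Choose the class with the largest posterior (0-based index)."""
    return int(np.argmax(posterior(x, priors, densities)))
```

With equal priors and two unit-covariance Gaussians centred at (0, 0) and (2, 0), a point such as (1, 0) lies on the decision boundary and gets posterior 0.5 for each class.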

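6 Linear Discriminant Analysis. Assuming $f_k(x) \sim N(\mu_k, \Sigma_k)$ and $\Sigma_1 = \Sigma_2 = \dots = \Sigma_K = \Sigma$, we get a decision boundary that is linear in x. Without a common $\Sigma$ we get QDA (quadratic discriminant analysis); RDA (regularized discriminant analysis) lies between the two.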
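Why a common covariance gives linearity can be seen by writing out $\log f_k(x) + \log \pi_k$. This is the standard derivation, sketched here rather than taken from the slides: the normalising constant and the quadratic term $x^T\Sigma^{-1}x$ are shared by all classes, so only terms linear in x remain.

```latex
\[ \log f_k(x) = -\tfrac12 \log\lvert 2\pi\Sigma\rvert - \tfrac12 (x-\mu_k)^T \Sigma^{-1} (x-\mu_k) \]
% Dropping the terms common to every class leaves the linear discriminant function:
\[ \delta_k(x) = x^T \Sigma^{-1}\mu_k - \tfrac12 \mu_k^T \Sigma^{-1}\mu_k + \log \pi_k,
   \qquad \hat G(x) = \arg\max_k \delta_k(x) \]
% Boundary between k and l: \delta_k(x) = \delta_l(x), which is linear in x.
% With class-specific \Sigma_k the quadratic term does not cancel (QDA):
\[ \delta_k(x) = -\tfrac12 \log\lvert\Sigma_k\rvert - \tfrac12 (x-\mu_k)^T \Sigma_k^{-1} (x-\mu_k) + \log \pi_k \]
```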

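7 Linear Discriminant Analysis. In practice the parameters are estimated from the training data: $\hat\pi_k = N_k/N$, $\hat\mu_k = \sum_{g_i = k} x_i / N_k$ and $\hat\Sigma = \sum_{k=1}^{K}\sum_{g_i = k}(x_i - \hat\mu_k)(x_i - \hat\mu_k)^T/(N - K)$, where $N_k$ is the number of class-k observations. LDA was a top classifier in the comparison of Michie et al. (1994), presumably because the data support linear boundaries and because the estimates are stable.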
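A minimal sketch of LDA with plug-in estimates, assuming NumPy: class proportions as priors, class means, and the pooled within-class covariance with denominator N - K, followed by classification via the linear discriminant $\delta_k(x) = x^T\Sigma^{-1}\mu_k - \frac12\mu_k^T\Sigma^{-1}\mu_k + \log\pi_k$. The function names `fit_lda` and `lda_predict` are hypothetical.

```python
import numpy as np

def fit_lda(X, g):
    """Plug-in LDA estimates: class priors, class means, pooled covariance."""
    X = np.asarray(X, dtype=float)
    g = np.asarray(g)
    classes = np.unique(g)                                         # sorted labels
    N, K = X.shape[0], classes.size
    priors = np.array([np.mean(g == k) for k in classes])          # N_k / N
    means = np.array([X[g == k].mean(axis=0) for k in classes])    # mu_k hat
    # Subtract each row's own class mean, then pool with the N - K denominator.
    resid = X - means[np.searchsorted(classes, g)]
    cov = resid.T @ resid / (N - K)
    return classes, priors, means, cov

def lda_predict(X, classes, priors, means, cov):
    """Assign each row of X to argmax_k of the linear discriminant delta_k(x)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    cov_inv_means = np.linalg.solve(cov, means.T)                  # Sigma^{-1} mu_k, shape (p, K)
    const = -0.5 * np.sum(means.T * cov_inv_means, axis=0) + np.log(priors)
    scores = X @ cov_inv_means + const                             # delta_k(x), shape (n, K)
    return classes[np.argmax(scores, axis=1)]
```

Solving with the covariance rather than inverting it is only a numerical-stability choice; the scores are the same $\delta_k(x)$ either way.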

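8 Logistic Regression. Models the posterior probabilities of the K classes via linear functions in x, while ensuring that they sum to one and remain in [0, 1]: $\log\frac{P(G = k \mid X = x)}{P(G = K \mid X = x)} = \beta_{k0} + \beta_k^T x$ for $k = 1, \dots, K-1$, equivalently $P(G = k \mid X = x) = \frac{\exp(\beta_{k0} + \beta_k^T x)}{1 + \sum_{l=1}^{K-1}\exp(\beta_{l0} + \beta_l^T x)}$. Linear decision boundary between classes k and l: $\{x : (\beta_{k0} - \beta_{l0}) + (\beta_k - \beta_l)^T x = 0\}$.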
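The multiclass logistic posteriors can be computed as below. This is a sketch assuming NumPy, with class K as the reference class (linear predictor fixed at 0); the name `logistic_posteriors` is hypothetical. Subtracting the maximum before exponentiating leaves the ratios unchanged and only avoids overflow.

```python
import numpy as np

def logistic_posteriors(x, intercepts, coefs):
    """Multiclass logistic posteriors with class K as the reference class.

    intercepts: shape (K-1,), holding beta_{k0}
    coefs:      shape (K-1, p), holding beta_k
    Returns P(G=1|x), ..., P(G=K|x).
    """
    eta = np.asarray(intercepts, float) + np.asarray(coefs, float) @ np.asarray(x, float)
    # Append the reference class with linear predictor 0, then apply a stable softmax.
    eta = np.append(eta, 0.0)
    eta -= eta.max()
    w = np.exp(eta)
    return w / w.sum()
```

By construction the outputs are positive, sum to one, and satisfy $\log(p_k/p_K) = \beta_{k0} + \beta_k^T x$.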

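9 Logistic Regression. Model fit: the parameters are fit by maximum likelihood (the conditional likelihood of G given X), and the Newton-Raphson algorithm is used to maximize it.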
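A minimal sketch of the Newton-Raphson fit for the two-class case, assuming NumPy and a 0/1-coded response; `fit_logistic_newton` is a hypothetical name. Each step solves $(X^T W X)\,\delta = X^T(y - p)$ with $W = \mathrm{diag}(p_i(1 - p_i))$, which is why the procedure is also known as iteratively reweighted least squares. The sketch assumes the classes are not perfectly separable; otherwise the maximum-likelihood coefficients do not exist and the iterates diverge.

```python
import numpy as np

def fit_logistic_newton(X, y, n_iter=25, tol=1e-10):
    """Two-class logistic regression by Newton-Raphson on the log-likelihood.

    X: (N, p) inputs; an intercept column is prepended here.
    y: (N,) 0/1 responses.
    Returns beta of length p + 1 (intercept first).
    """
    X1 = np.column_stack([np.ones(len(X)), np.asarray(X, float)])
    y = np.asarray(y, float)
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X1 @ beta)))     # fitted P(G=1 | x_i)
        grad = X1.T @ (y - p)                      # score vector dl/dbeta
        W = p * (1.0 - p)                          # diagonal IRLS weights
        hess = X1.T @ (X1 * W[:, None])            # negative Hessian X^T W X
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

At convergence the score equations $X^T(y - p) = 0$ hold, which is a convenient check on any fit.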

10 Linear Regression. Recall the usual requirements of multivariate regression (e.g., lack of multicollinearity). Here, assuming N instances (an N × p observation matrix X), Y is an N × K indicator response matrix (K classes).
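A minimal sketch of classification by linear regression of the indicator matrix, assuming NumPy and the standard approach: regress Y on X (with an intercept column) by least squares and assign each x to the class with the largest fitted value. The function names are hypothetical, and labels are assumed to be the integers 0..K-1.

```python
import numpy as np

def fit_indicator_regression(X, g, K):
    """Least-squares fit of an N x K indicator matrix Y on X (with intercept).

    g holds integer labels 0..K-1. Returns the (p+1) x K coefficient matrix B_hat.
    """
    X1 = np.column_stack([np.ones(len(X)), np.asarray(X, float)])
    Y = np.eye(K)[np.asarray(g)]                  # row i is the indicator of class g_i
    B_hat, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return B_hat

def predict_indicator_regression(X, B_hat):
    """Classify to the largest fitted value: argmax_k f_k(x)."""
    X1 = np.column_stack([np.ones(len(X)), np.asarray(X, float)])
    return np.argmax(X1 @ B_hat, axis=1)
```

With an intercept, the fitted values sum to one across classes for any x, but unlike logistic posteriors they can fall outside [0, 1].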

11 Linear Regression

12

13 LDA vs. LR. The results are similar, with LDA slightly better (56% vs. 67% error rate for LR). Presumably they are so close because the decision boundaries of both methods end up with the same linear form (see the earlier slides).
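The shared end form can be made explicit. Taking class K as the reference class, as in the logistic model, the LDA log-posterior odds work out as follows (a standard derivation under the Gaussian, common-covariance assumption, not reproduced from the slides):

```latex
% LDA: derived from Gaussian classes with common covariance \Sigma
\[ \log\frac{P(G=k\mid X=x)}{P(G=K\mid X=x)}
   = \log\frac{\pi_k}{\pi_K} - \tfrac12(\mu_k+\mu_K)^T\Sigma^{-1}(\mu_k-\mu_K) + x^T\Sigma^{-1}(\mu_k-\mu_K)
   = \alpha_{k0} + \alpha_k^T x \]
% Logistic regression: this linear form is assumed directly
\[ \log\frac{P(G=k\mid X=x)}{P(G=K\mid X=x)} = \beta_{k0} + \beta_k^T x \]
```

Both are linear in x; the methods differ in how the coefficients are estimated.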

14 LDA vs. LR. LDA: the parameters are fit by maximizing the full log-likelihood, based on the joint density of X and G, which assumes Gaussian class densities (Efron, 1975: when the data really are Gaussian, ignoring this, as LR does, costs at worst about 30% efficiency). Linearity is derived. LR: P(X) is left arbitrary (an advantage in model selection and in the ability to absorb extreme X values), and the parameters of P(G|X) are fit by maximizing the conditional likelihood. Linearity is assumed.

15 In a word – separating hyperplanes

