Midterm Exam: 03/01 (Thursday), take-home; turn in by noon on 03/02 (Friday)



Presentation on theme: "Midterm Exam: 03/01 (Thursday), take-home; turn in by noon on 03/02 (Friday)" — Presentation transcript:

1 Midterm Exam: 03/01 (Thursday), take-home; turn in by noon on 03/02 (Friday)

2 Project
03/15 (Phase 1): 10% of the training data is available for algorithm development
04/05 (Phase 2): full training data and test examples are available
04/18 (submission): submit your predictions before 11:59pm Apr. 18 (Wednesday)
04/24 and 04/26: project presentations; competition results announced
04/30: project report due

3 Logistic Regression Rong Jin

4 Logistic Regression
Generative models often lead to a linear decision boundary.
Logistic regression is a linear discriminative model: it directly models the linear decision boundary.
w is the parameter to be learned.
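The model on this slide can be sketched in a few lines. This is a minimal illustration, not the lecture's exact notation: it assumes labels y in {-1, +1} and the standard logistic form p(y | x; w) = 1 / (1 + exp(-y w·x)), with the decision boundary given by the sign of w·x.

```python
import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) function.
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, x, y=1):
    # Probability of label y (+1 or -1) under weight vector w:
    # p(y | x; w) = sigma(y * w.x).
    return sigmoid(y * np.dot(w, x))

def predict(w, x):
    # Linear decision boundary: classify by the sign of w.x.
    return 1 if np.dot(w, x) >= 0 else -1
```

Note that p(+1 | x) + p(-1 | x) = 1 by construction, so a single weight vector suffices for two classes.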

5 Logistic Regression

6 Learn the parameter w by Maximum Likelihood Estimation (MLE), given the training data.
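Under the model above, maximizing the likelihood of the training data is equivalent to minimizing the negative log-likelihood, which for y in {-1, +1} takes the form L(w) = Σ_i log(1 + exp(-y_i w·x_i)). A small sketch of that objective (the variable names are illustrative):

```python
import numpy as np

def neg_log_likelihood(w, X, y):
    # X: (n, d) data matrix; y: (n,) labels in {-1, +1}.
    margins = y * (X @ w)                    # y_i * w.x_i for each example
    # log1p(exp(-m)) = log(1 + exp(-m)), computed stably for large m.
    return np.sum(np.log1p(np.exp(-margins)))
```

At w = 0 every example has probability 1/2, so the loss is n·log 2; MLE drives it below that.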

7 Logistic Regression
Convex objective function, so gradient descent reaches the global optimum.
Classification error.
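The gradient-descent procedure referenced here can be sketched as follows. This assumes the negative log-likelihood objective above; the gradient is -Σ_i y_i x_i σ(-y_i w·x_i), and the fixed step size and iteration count are illustrative choices, not values from the lecture.

```python
import numpy as np

def grad(w, X, y):
    # Gradient of the negative log-likelihood:
    # dL/dw = -sum_i y_i x_i * sigma(-y_i w.x_i)
    s = 1.0 / (1.0 + np.exp(y * (X @ w)))    # sigma(-margin) per example
    return -(X.T @ (y * s))

def gradient_descent(X, y, step=0.1, iters=200):
    # Because the objective is convex, this converges toward the
    # global optimum for a suitable step size.
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w -= step * grad(w, X, y)
    return w
```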

9 Illustration of Gradient Descent

10 How to Decide the Step Size? Backtracking line search.
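Backtracking line search can be sketched as below. This is a generic version using the Armijo sufficient-decrease condition; the constants alpha and beta are common defaults, not values specified in the lecture.

```python
import numpy as np

def backtracking_step(f, grad_f, w, alpha=0.3, beta=0.5, t0=1.0):
    # Shrink the step t by a factor beta until the Armijo condition holds:
    #   f(w - t*g) <= f(w) - alpha * t * ||g||^2
    # i.e., the step achieves sufficient decrease along -g.
    g = grad_f(w)
    t = t0
    while f(w - t * g) > f(w) - alpha * t * np.dot(g, g):
        t *= beta
    return t
```

Each gradient-descent iteration then uses the returned t instead of a fixed step size.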

11 Example: Heart Disease
Input feature x: age group id (1: 25-29, 2: 30-34, 3: 35-39, 4: 40-44, 5: 45-49, 6: 50-54, 7: 55-59, 8: 60-64)
Output y: whether the patient has heart disease (y = 1: heart disease; y = -1: no heart disease)

12 Example: Heart Disease

13 Example: Text Categorization
Learn to classify text into two categories.
Input d: a document, represented by a word histogram.
Output y = ±1: +1 for a political document, -1 for a non-political document.
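The word-histogram (bag-of-words) representation of d mentioned here can be illustrated in one line; the example sentence is made up for illustration.

```python
from collections import Counter

def word_histogram(document):
    # Map a document to its word-count histogram (bag of words).
    return Counter(document.lower().split())

h = word_histogram("the senate passed the bill")
```

Each distinct word becomes one feature dimension, and its count is the feature value fed to the classifier.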

14 Example: Text Categorization Training data

15 Example 2: Text Classification
Dataset: Reuters-21578
Classification accuracy: Naïve Bayes 77%, logistic regression 88%

16 Logistic Regression vs. Naïve Bayes
Both produce linear decision boundaries.
Naïve Bayes: weights follow from the estimated class-conditional probabilities. Logistic regression: weights learned by MLE.
Both can be viewed as modeling p(d|y): Naïve Bayes makes a conditional independence assumption, while logistic regression assumes an exponential-family distribution for p(d|y) (a much broader assumption).

17 Logistic Regression vs. Naïve Bayes

18 Discriminative vs. Generative
Discriminative models: model P(y|x). Pros: usually good performance. Cons: slow convergence, expensive computation, sensitive to noisy data.
Generative models: model P(x|y). Pros: usually fast convergence, cheap computation, robust to noisy data. Cons: usually worse performance.

19 Overfitting Problem
Consider text categorization: what is the weight for a word j that appears in only one training document d_k?

20 Overfitting Problem

21 Overfitting Problem
[Figure: classification accuracy on test data vs. iteration, with and without regularization; without regularization, test accuracy decreases as training proceeds.]

22 Solution: Regularization
Regularized log-likelihood.
Effects of the regularizer: favors small weights, guarantees a bounded norm of w, and guarantees a unique solution.
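A common concrete form of the regularized objective is the negative log-likelihood plus an L2 penalty, minimize Σ_i log(1 + exp(-y_i w·x_i)) + (λ/2)||w||². This sketch assumes that form; the lecture's exact regularizer may differ.

```python
import numpy as np

def regularized_loss(w, X, y, lam):
    # Negative log-likelihood plus an L2 penalty (lam/2) * ||w||^2.
    # The penalty favors small weights and makes the minimizer unique.
    margins = y * (X @ w)
    return np.sum(np.log1p(np.exp(-margins))) + 0.5 * lam * np.dot(w, w)
```

Adding a strictly convex penalty to a convex loss makes the whole objective strictly convex, which is what guarantees the unique solution mentioned on the slide.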

23 Regularized Logistic Regression
[Figure: classification performance vs. iteration, with and without regularization.]

24 Regularization as Robust Optimization
Assume each data point is unknown but bounded in a sphere of radius r centered at x_i.

25 Sparse Solution by Lasso Regularization
RCV1 collection: 800K documents, 47K unique words.

26 Sparse Solution by Lasso Regularization
How to solve the optimization problem? Subgradient descent; minimax formulation.
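One step of the subgradient-descent approach mentioned here can be sketched as below. It assumes the L1-regularized logistic loss Σ_i log(1 + exp(-y_i w·x_i)) + λ||w||₁, and uses sign(w) as a subgradient of ||w||₁ (taking 0 at w_j = 0); the function name and signature are illustrative.

```python
import numpy as np

def l1_subgradient_step(w, X, y, lam, step):
    # One subgradient step for the L1 (lasso) regularized logistic loss.
    s = 1.0 / (1.0 + np.exp(y * (X @ w)))     # sigma(-y_i w.x_i) per example
    g = -(X.T @ (y * s)) + lam * np.sign(w)   # loss gradient + L1 subgradient
    return w - step * g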

27 Bayesian Treatment
Compute the posterior distribution of w; Laplace approximation.

28 Bayesian Treatment
Laplace approximation.

29 Multi-class Logistic Regression
How can the logistic regression model be extended to multi-class classification?

30 Conditional Exponential Model
Let the classes be 1, …, K; need to learn one weight vector per class.
Normalization factor (partition function).
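The conditional exponential (softmax) model on this slide can be sketched as follows: one weight vector w_k per class, with p(y = k | x) = exp(w_k·x) / Z(x), where the partition function Z(x) = Σ_j exp(w_j·x) normalizes over classes. The function name is illustrative.

```python
import numpy as np

def class_probs(W, x):
    # W: (num_classes, dim) matrix of per-class weight vectors w_k.
    # Returns p(y = k | x) = exp(w_k.x) / Z(x) for all k.
    scores = W @ x
    scores = scores - scores.max()   # shift for numerical stability
    expv = np.exp(scores)
    return expv / expv.sum()         # divide by the partition function Z(x)
```

With K = 2 this reduces to the binary logistic model, since only the difference of the two weight vectors matters.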

31 Conditional Exponential Model
Learn the weights w_k by maximum likelihood estimation. Any problem?

32 Modified Conditional Exponential Model

