Midterm Exam
03/01 (Thursday): take-home exam; turn in at noon on 03/02 (Friday)
Project
03/15 (Phase 1): 10% of the training data is available for algorithm development
04/05 (Phase 2): full training data and test examples are available
04/18 (submission): submit your predictions before 11:59pm on Apr. 18 (Wednesday)
04/24 and 04/26: project presentations; competition results announced
04/30: project report is due
Example 2: Text Classification
Dataset: Reuters-21578
Classification accuracy
  Naïve Bayes: 77%
  Logistic regression: 88%
Logistic Regression vs. Naïve Bayes
Both have linear decision boundaries
  Naïve Bayes: [weight formula missing from the transcript]
  Logistic regression: learn weights by MLE
Both can be viewed as modeling p(d|y)
  Naïve Bayes: independence assumption
  Logistic regression: assumes an exponential family distribution for p(d|y) (a broad assumption)
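The claim that Naïve Bayes has a linear decision boundary can be checked numerically. The sketch below assumes a multinomial Naïve Bayes model with Laplace smoothing and two classes; the slide does not say which variant is meant, and the toy word-count data and all function names are hypothetical. Under those assumptions the log-odds log p(y=1|d) - log p(y=0|d) equals b + Σ_j w_j x_j, with w_j the log ratio of the per-class word probabilities and b the log ratio of the class priors.

```python
import math

def train_multinomial_nb(docs, labels, vocab_size, alpha=1.0):
    """Estimate class priors and Laplace-smoothed word probabilities.

    Assumption: multinomial Naive Bayes over word counts, classes 0 and 1.
    """
    counts = {0: [0.0] * vocab_size, 1: [0.0] * vocab_size}
    n_docs = {0: 0, 1: 0}
    for x, y in zip(docs, labels):
        n_docs[y] += 1
        for j, c in enumerate(x):
            counts[y][j] += c
    priors = {y: n_docs[y] / len(docs) for y in (0, 1)}
    word_probs = {}
    for y in (0, 1):
        total = sum(counts[y]) + alpha * vocab_size
        word_probs[y] = [(c + alpha) / total for c in counts[y]]
    return priors, word_probs

def nb_linear_weights(priors, word_probs):
    """Rewrite the Naive Bayes log-odds as a bias plus one weight per word."""
    w = [math.log(p1 / p0) for p1, p0 in zip(word_probs[1], word_probs[0])]
    b = math.log(priors[1] / priors[0])
    return w, b

def nb_log_odds_direct(x, priors, word_probs):
    """Log-odds computed class by class, without the linear rewrite."""
    score = {}
    for y in (0, 1):
        score[y] = math.log(priors[y]) + sum(
            c * math.log(p) for c, p in zip(x, word_probs[y])
        )
    return score[1] - score[0]

# Hypothetical toy corpus: each row holds the counts of 3 vocabulary words.
docs = [[2, 0, 1], [1, 0, 0], [0, 3, 1], [0, 1, 2]]
labels = [1, 1, 0, 0]

priors, word_probs = train_multinomial_nb(docs, labels, vocab_size=3)
w, b = nb_linear_weights(priors, word_probs)

x_new = [1, 1, 0]
linear_score = b + sum(wj * xj for wj, xj in zip(w, x_new))
direct_score = nb_log_odds_direct(x_new, priors, word_probs)
print(linear_score, direct_score)
```

The two printed scores agree, so the decision rule "predict class 1 when the log-odds is positive" is a linear threshold on the word counts, the same functional form that logistic regression fits by MLE.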
Discriminative vs. Generative
Discriminative Models
  Model P(y|x)
  Pros
    Usually good performance
  Cons
    Slow convergence
    Expensive computation
    Sensitive to noisy data
Generative Models
  Model P(x|y)
  Pros
    Usually fast convergence
    Cheap computation
    Robust to noisy data
  Cons
    Usually worse performance
Overfitting Problem
Consider text categorization
What is the weight for a word j that appears in only one training document d_k?
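One way to explore this question is to fit an unregularized logistic regression by plain gradient ascent and watch the weight of such a word. The sketch below is only an illustration under stated assumptions: the toy corpus, the learning rate, and the function names are hypothetical, and the third feature plays the role of word j, occurring only in the first document d_k (labeled positive). The gradient of the log-likelihood with respect to that weight is x_kj (y_k - p_k), which stays positive as long as p_k < 1, so the MLE keeps pushing the weight upward.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(docs, labels, n_steps, lr=0.5):
    """Batch gradient ascent on the log-likelihood, with no regularization."""
    w = [0.0] * len(docs[0])
    for _ in range(n_steps):
        grad = [0.0] * len(w)
        for x, y in zip(docs, labels):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            for j, xj in enumerate(x):
                grad[j] += (y - p) * xj
        w = [wj + lr * gj for wj, gj in zip(w, grad)]
    return w

# Hypothetical toy corpus with binary word indicators.
# Words 0 and 1 occur in documents of both classes; word 2 occurs
# only in the first document (d_k), which has label 1.
docs = [
    [1, 0, 1],  # d_k: the only document containing word 2
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [1, 1, 0],
]
labels = [1, 1, 0, 0, 1, 0]

w_short = fit_logistic(docs, labels, n_steps=100)
w_long = fit_logistic(docs, labels, n_steps=2000)
print("weight of word 2 after  100 steps:", w_short[2])
print("weight of word 2 after 2000 steps:", w_long[2])
```

In this sketch the weight of the rare word is larger after more training steps and never settles, because a larger weight always raises the likelihood of d_k without affecting any other document. The model in effect memorizes d_k through that one word.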