
1 First Lecture of Machine Learning Hung-yi Lee

2 Learning to say “yes/no” Binary Classification

3 Learning to say yes/no. Spam filtering: is an e-mail spam or not? Recommendation systems: recommend the product to the customer or not? Malware detection: is the software malicious or not? Stock prediction: will the future value of a stock increase or not with respect to its current value? All of these are binary classification problems.

4 Example Application: Spam filtering (http://spam-filter-review.toptenreviews.com/). Each incoming e-mail is classified as spam or not spam.

5 Example Application: Spam filtering How to estimate P(yes|x)?  What does the function f look like?

6 Example Application: Spam filtering. To estimate P(yes|x), collect examples first. Training e-mails: x₁ = "Earn … free …… free" → yes (spam); x₂ = "Talk … Meeting …" → no (not spam); x₃ = "Win … free……" → yes (spam). Some words frequently appear in spam, e.g., "free", so use the frequency of "free" to decide if an e-mail is spam. Estimate P(yes | x_free = k), where x_free is the number of times "free" appears in e-mail x.
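A minimal counting sketch of this estimate (the tiny labeled dataset below is invented for illustration):

```python
# Estimate P(yes | x_free = k) by counting labeled examples.
from collections import Counter

emails = [
    ("earn free cash free", 1),   # spam
    ("talk about meeting", 0),    # not spam
    ("win a free prize", 1),      # spam
    ("lunch tomorrow", 0),        # not spam
]

total = Counter()   # how many e-mails have x_free = k
spam = Counter()    # how many of those are spam

for text, label in emails:
    k = text.split().count("free")  # x_free: occurrences of "free"
    total[k] += 1
    spam[k] += label

for k in sorted(total):
    print(f"P(yes | x_free = {k}) = {spam[k] / total[k]:.2f}")
```

With so little data, many counts k never appear at all, which is exactly the problem the next slide raises.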

7 Regression. Plot p(yes|x_free) against the frequency of "free" (x_free) in an e-mail x. From the training data: p(yes | x_free = 0) = 0.1, p(yes | x_free = 1) = 0.4. Problem: what if one day you receive an e-mail with 3 "free"? In the training data, there is no e-mail containing 3 "free".

8 Regression. Fit f(x_free) = w·x_free + b, where f(x_free) is an estimate of p(yes|x_free). Store w and b.
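As a sketch of this step, one can fit w and b by least squares; the data points below extend the slide's two values with an invented third point:

```python
# Fit f(x_free) = w * x_free + b to empirical probabilities.
import numpy as np

x = np.array([0.0, 1.0, 2.0])   # x_free values seen in training
p = np.array([0.1, 0.4, 0.8])   # p(yes | x_free); the 0.8 is made up

w, b = np.polyfit(x, p, deg=1)  # least-squares line: slope, intercept
print("w =", w, "b =", b)
print("f(3) =", w * 3 + b)      # now an unseen count can be scored
print("f(6) =", w * 6 + b)      # ...but this exceeds 1 (slide 9's problem)
```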

9 Regression. Problem: what if one day you receive an e-mail with 6 "free"? With f(x_free) = w·x_free + b, the output of f is not guaranteed to be between 0 and 1.

10 Logit. Figure: one plot shows the probability to be spam, p = p(yes|x_free), against x_free (vertical axis: p); p is always between 0 and 1. The other plot shows logit(p) = ln(p/(1−p)) against x_free (vertical axis: logit(p)); the logit is not confined to [0, 1].

11 Logit. Instead of fitting p directly, fit f′(x_free) = w′·x_free + b′, where f′(x_free) is an estimate of logit(p).

12 Logit. Store w′ and b′. For a new e-mail, compute f′(x_free) = w′·x_free + b′ as an estimate of logit(p), then convert it back to a probability: if the probability is > 0.5 (equivalently, if f′(x_free) > 0), say "yes".
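Putting slides 10-12 together as a sketch (the empirical probabilities are invented, and np.polyfit stands in for whatever regression procedure the lecture uses):

```python
# Fit a line to logit(p) instead of p, then invert at prediction time.
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

x = np.array([0.0, 1.0, 2.0])
p = np.array([0.1, 0.4, 0.8])            # must be strictly between 0 and 1

w_, b_ = np.polyfit(x, logit(p), deg=1)  # f'(x_free) = w' * x_free + b'

def predict(x_free):
    z = w_ * x_free + b_                 # estimate of logit(p)
    prob = 1 / (1 + np.exp(-z))          # invert the logit
    return "yes" if prob > 0.5 else "no"

print(predict(6))   # any real-valued count maps back into (0, 1)
```

Because the sigmoid inverts the logit, the recovered probability stays in (0, 1) for any count.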

13 Multiple Variables. Consider two words, "free" and "hello", and compute p(yes | x_free, x_hello).

14 Multiple Variables. Fit a regression with both variables, x_free and x_hello, to estimate p(yes | x_free, x_hello).

15 Multiple Variables. Of course, we can consider all words {t_1, t_2, …, t_N} in a dictionary: z = w_1·x_1 + w_2·x_2 + … + w_N·x_N + b, where x_i is the number of times word t_i appears in the e-mail; z is to approximate logit(p).
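A sketch of that computation with an assumed toy dictionary and made-up weights, just to show its shape:

```python
# Represent an e-mail as its word-count vector x over the dictionary,
# then compute z = w . x + b as an approximation of logit(p).
import numpy as np

dictionary = ["free", "win", "meeting", "hello"]  # toy dictionary (N = 4)
w = np.array([2.0, 1.5, -1.0, -0.5])              # one weight per word (made up)
b = -1.0                                          # bias (made up)

def featurize(text):
    words = text.split()
    return np.array([words.count(t) for t in dictionary], dtype=float)

x = featurize("win a free free prize")
z = w @ x + b                                     # z approximates logit(p)
print(z)
```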

16 Logistic Regression. Represent each e-mail x by its word counts (t_1 appears 3 times, t_2 appears 0 times, …, t_N appears 1 time). For a single training e-mail, the probability to be spam p is always 1 or 0, and if p = 1 or 0 then logit(p) = ln(p/(1−p)) is +infinity or −infinity, so we cannot do regression on logit(p) directly.

17 Logistic Regression. Instead, pass z through the sigmoid function σ(z) = 1/(1 + e^(−z)), which maps any real z to a value between 0 and 1.
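The sigmoid itself is one line of code; the numerically stable split below is an added implementation detail, not something the slides discuss:

```python
# Sigmoid: maps any real z into (0, 1).
import numpy as np

def sigmoid(z):
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1 / (1 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])          # avoids overflow for very negative z
    out[~pos] = ez / (1 + ez)
    return out

print(sigmoid(np.array([-100.0, 0.0, 100.0])))   # -> ~0, 0.5, ~1
```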

18 Logistic Regression. For a spam training example x₁ (yes), the output should be close to 1; for a non-spam example x₂ (no), the output should be close to 0.

19 Logistic Regression. The word-count features are weighted, a bias term is added, and the sum passes through the sigmoid, giving an output near 1 for "yes" and near 0 for "no". This is a neuron in a neural network.
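The neuron in code, with invented weights standing in for trained ones:

```python
# A single logistic-regression "neuron": weighted inputs plus a bias,
# squashed by the sigmoid.
import numpy as np

def neuron(x, w, b):
    z = np.dot(w, x) + b          # linear part
    return 1 / (1 + np.exp(-z))   # sigmoid: output in (0, 1)

x = np.array([3.0, 0.0, 1.0])     # word counts for t_1, t_2, t_3
w = np.array([1.2, 0.4, -0.8])    # weights (made up, not learned)
b = -0.5
print(neuron(x, w, b))            # close to 1 means "yes" (spam)
```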

20 More than saying “yes/no” Multiclass Classification

21 More than saying "yes/no". Handwritten digit classification is a multiclass classification problem.

22 More than saying "yes/no". Handwritten digit classification. Simplify the question: is an image a "2" or not? The feature of an image describes the characteristics of the input object; each pixel corresponds to one dimension of the feature vector.

23 More than saying "yes/no". Handwritten digit classification. Simplify the question: is an image a "2" or not? Feed the image's feature vector to a binary classifier whose output answers "2" or not.
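A sketch of that pipeline; the 16×16 image size and the random weights are placeholders, since the slides do not fix them:

```python
# Flatten an image into a pixel vector and feed it to one "2 or not" neuron.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((16, 16))          # stand-in for a grayscale digit image
x = image.reshape(-1)                 # each pixel = one feature dimension (256)

w = rng.normal(size=x.size) * 0.01    # weights would come from training
b = 0.0

p_two = 1 / (1 + np.exp(-(w @ x + b)))
print("P(image is a '2') ~", p_two)
```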

24 More than saying "yes/no". Handwritten digit classification. Train one binary classifier per digit: "1" or not (output y₁), "2" or not (y₂), "3" or not (y₃), and so on. If y₂ is the max, then the image is "2".
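One-vs-rest as a sketch (random weights as placeholders; row d of W plays the role of the "digit d or not" classifier):

```python
# One binary neuron per class; predict the class whose output is largest.
import numpy as np

rng = np.random.default_rng(1)
n_pixels, n_classes = 256, 10
W = rng.normal(size=(n_classes, n_pixels)) * 0.01   # row d scores digit d
b = np.zeros(n_classes)

x = rng.random(n_pixels)                 # flattened image
y = 1 / (1 + np.exp(-(W @ x + b)))       # y_0 ... y_9, each in (0, 1)
print("predicted digit:", np.argmax(y))  # if y_2 is the max, say "2"
```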

25 This is not good enough …

26 Limitation of Logistic Regression. Consider the following input/output pairs:
x₁ = 0, x₂ = 0 → No
x₁ = 0, x₂ = 1 → Yes
x₁ = 1, x₂ = 0 → Yes
x₁ = 1, x₂ = 1 → No
This is XOR: no single line separates the Yes cases from the No cases, so one logistic-regression neuron cannot classify them.
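To see both the limitation and the fix, the sketch below runs a two-layer network with hand-picked (not learned) weights over the table; a single neuron draws one line and cannot produce this output pattern, but two layers can:

```python
# XOR with two layers of logistic neurons and hand-chosen weights.
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h1 = sigmoid(20 * x1 + 20 * x2 - 10)   # ~ OR(x1, x2)
    h2 = sigmoid(20 * x1 + 20 * x2 - 30)   # ~ AND(x1, x2)
    y = sigmoid(20 * h1 - 20 * h2 - 10)    # ~ OR and not AND = XOR
    print(x1, x2, "->", "Yes" if y > 0.5 else "No")
```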

27 So we need a neural network: the input passes through Layer 1, Layer 2, …, Layer L to produce outputs y₁, y₂, …, y_M. "Deep" means many layers.
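A minimal forward pass matching that picture; the layer sizes and random weights are placeholders:

```python
# Stack of layers: each is a linear map plus sigmoid; "deep" = many of them.
import numpy as np

rng = np.random.default_rng(2)
sizes = [4, 5, 5, 3]                      # input, Layer 1, Layer 2, outputs y_1..y_M
layers = [(rng.normal(size=(m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

def forward(x):
    for W, b in layers:
        x = 1 / (1 + np.exp(-(W @ x + b)))   # one layer of neurons in parallel
    return x

print(forward(rng.random(sizes[0])))         # outputs y_1 ... y_M
```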

28 Thank you for listening!

29 Appendix

30 More references:
http://www.ccs.neu.edu/home/vip/teach/MLcourse/2_GD_REG_pton_NN/lecture_notes/logistic_regression_loss_function/logistic_regression_loss.pdf
http://mathgotchas.blogspot.tw/2011/10/why-is-error-function-minimized-in.html
https://cs.nyu.edu/~yann/talks/lecun-20071207-nonconvex.pdf
http://www.cs.columbia.edu/~blei/fogm/lectures/glms.pdf
http://grzegorz.chrupala.me/papers/ml4nlp/linear-classifiers.pdf

