
Dear SIR, I am Mr. John Coleman and my sister is Miss Rose Colemen, we are the children of late Chief Paul Colemen from Sierra Leone. I am writing you.




1 Dear SIR, I am Mr. John Coleman and my sister is Miss Rose Colemen, we are the children of late Chief Paul Colemen from Sierra Leone. I am writing you in absolute confidence primarily to seek your assistance to transfer our cash of twenty one Million Dollars ($21,000.000.00) now in the custody of a private Security trust firm in Europe the money is in trunk boxes deposited and declared as family valuables by my late father as a matter of fact the company does not know the content as money, although my father made them to under stand that the boxes belongs to his foreign partner. …

2 This mail is probably spam. The original message has been attached along with this report, so you can recognize or block similar unwanted mail in future. See http://spamassassin.org/tag/ for more details.

Content analysis details: (12.20 points, 5 required)
NIGERIAN_SUBJECT2 (1.4 points) Subject is indicative of a Nigerian spam
FROM_ENDS_IN_NUMS (0.7 points) From: ends in numbers
MIME_BOUND_MANY_HEX (2.9 points) Spam tool pattern in MIME boundary
URGENT_BIZ (2.7 points) BODY: Contains urgent matter
US_DOLLARS_3 (1.5 points) BODY: Nigerian scam key phrase ($NN,NNN,NNN.NN)
DEAR_SOMETHING (1.8 points) BODY: Contains 'Dear (something)'
BAYES_30 (1.6 points) BODY: Bayesian classifier says spam probability is 30 to 40% [score: 0.3728]

3 Bayes Classifiers

Bayesian classifiers use Bayes theorem, which says

$$p(c_j \mid d) = \frac{p(d \mid c_j)\, p(c_j)}{p(d)}$$

where
$p(c_j \mid d)$ = probability of instance $d$ being in class $c_j$,
$p(d \mid c_j)$ = probability of generating instance $d$ given class $c_j$,
$p(c_j)$ = probability of occurrence of class $c_j$, and
$p(d)$ = probability of instance $d$ occurring.
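As a minimal sketch of the theorem, the function below turns per-class likelihoods and priors into posteriors. The function name `posterior` and the two-class numbers in the usage line are illustrative assumptions, not taken from the slides. It also assumes the classes are exhaustive and mutually exclusive, so that p(d) can be obtained by summing p(d | c_j) p(c_j) over all classes.

```python
def posterior(likelihoods, priors):
    """Return p(c_j | d) for every class c_j.

    likelihoods[c] is p(d | c) and priors[c] is p(c). The evidence p(d)
    is computed as the sum of p(d | c) * p(c) over all classes, which
    assumes the classes are exhaustive and mutually exclusive.
    """
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())  # p(d)
    return {c: joint[c] / evidence for c in joint}


# Hypothetical numbers, for illustration only.
example = posterior({"a": 0.5, "b": 0.25}, {"a": 0.5, "b": 0.5})
```

Because every class shares the same denominator, comparing the numerators alone already identifies the most probable class.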

4 Bayesian classifiers use Bayes theorem, which says

$$p(c_j \mid d) = \frac{p(d \mid c_j)\, p(c_j)}{p(d)}$$

Assume that we have two classes, $c_1$ = male and $c_2$ = female. We have a person whose sex we do not know, say "drew" or $d$. Classifying drew as male or female is equivalent to asking which is more probable, i.e. which is greater, $p(\text{male} \mid \text{drew})$ or $p(\text{female} \mid \text{drew})$?

$$p(\text{male} \mid \text{drew}) = \frac{p(\text{drew} \mid \text{male})\, p(\text{male})}{p(\text{drew})}$$

5 Officer Drew

$$p(c_j \mid d) = \frac{p(d \mid c_j)\, p(c_j)}{p(d)}$$

$$p(\text{male} \mid \text{drew}) = \frac{p(\text{drew} \mid \text{male})\, p(\text{male})}{p(\text{drew})}$$

Name    Sex
Drew    Male
Susan   Female
Drew    Female
Drew    Female
Mick    Male
Prita   Female
Li      Female
Joe     Male

$$p(\text{male} \mid \text{drew}) = \frac{1/3 \times 3/8}{3/8} = \frac{0.125}{3/8}$$

$$p(\text{female} \mid \text{drew}) = \frac{2/5 \times 5/8}{3/8} = \frac{0.250}{3/8}$$

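6 Officer Drew

$$p(\text{male} \mid \text{drew}) = \frac{1/3 \times 3/8}{3/8} = \frac{0.125}{3/8} \approx 0.333$$

$$p(\text{female} \mid \text{drew}) = \frac{2/5 \times 5/8}{3/8} = \frac{0.250}{3/8} \approx 0.667$$

Officer Drew IS a female!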
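The Officer Drew calculation can be reproduced directly from the eight-row name table. The sketch below is one way to do it; the helper name `posterior_for_name` is an assumption for illustration, and it expects the queried name to appear at least once in the records (otherwise p(d) is zero and the division fails).

```python
from collections import Counter
from fractions import Fraction

# The name/sex table from the Officer Drew slide.
records = [
    ("Drew", "Male"), ("Susan", "Female"), ("Drew", "Female"), ("Drew", "Female"),
    ("Mick", "Male"), ("Prita", "Female"), ("Li", "Female"), ("Joe", "Male"),
]


def posterior_for_name(name, records):
    """Return p(sex | name) for each sex, estimated from raw counts."""
    total = len(records)
    class_counts = Counter(sex for _, sex in records)
    name_count = sum(1 for n, _ in records if n == name)
    evidence = Fraction(name_count, total)  # p(d)
    result = {}
    for sex, count in class_counts.items():
        matches = sum(1 for n, s in records if n == name and s == sex)
        likelihood = Fraction(matches, count)  # p(d | c_j)
        prior = Fraction(count, total)         # p(c_j)
        result[sex] = likelihood * prior / evidence
    return result


drew = posterior_for_name("Drew", records)
```

Using `Fraction` keeps the arithmetic exact, so the result can be compared against the fractions on the slide without rounding.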

7 Naïve Bayesian Classifiers

Bayesian classifiers require computation of $p(d \mid c_j)$ and computation of $p(c_j)$; $p(d)$ can be ignored since it is the same for all classes.

To simplify the task, naïve Bayesian classifiers assume attributes have independent distributions given the class, and thereby estimate

$$p(d \mid c_j) = p(d_1 \mid c_j) \times p(d_2 \mid c_j) \times \dots \times p(d_n \mid c_j)$$

Each of the $p(d_i \mid c_j)$ can be estimated from the training data.

Example attributes: Height, Eye-color, …, Long-hair
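To make the estimation step concrete, here is a small sketch that counts attribute values per class and multiplies the resulting estimates of p(d_i | c_j). The six training rows, the attribute order (height, eye color, long hair) and the function names are all made up for illustration. No smoothing is applied, so an attribute value never seen with a class drives that class's product to zero.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count classes and, per (class, attribute index), attribute values."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
    return class_counts, value_counts


def likelihood(row, label, class_counts, value_counts):
    """Naive estimate of p(d | c_j) as a product of p(d_i | c_j)."""
    p = 1.0
    for i, value in enumerate(row):
        p *= value_counts[(label, i)][value] / class_counts[label]
    return p


def classify(row, class_counts, value_counts):
    """Pick the class maximising p(d | c_j) * p(c_j); p(d) is ignored."""
    total = sum(class_counts.values())
    scores = {
        c: likelihood(row, c, class_counts, value_counts) * n / total
        for c, n in class_counts.items()
    }
    return max(scores, key=scores.get)


# Hypothetical training data: (height, eye color, long hair).
rows = [
    ("tall", "blue", "no"), ("tall", "brown", "no"), ("short", "blue", "no"),
    ("short", "brown", "yes"), ("short", "blue", "yes"), ("tall", "brown", "yes"),
]
labels = ["male", "male", "male", "female", "female", "female"]

class_counts, value_counts = train(rows, labels)
query = ("tall", "blue", "no")
male_likelihood = likelihood(query, "male", class_counts, value_counts)
predicted = classify(query, class_counts, value_counts)
```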

8 Naïve Bayesian Classifier

[Diagram: $p(d \mid c_j)$ built from the factors $p(d_1 \mid c_j)$, $p(d_2 \mid c_j)$, …, $p(d_n \mid c_j)$]

$$p(d \mid c_j) = p(d_1 \mid c_j) \times p(d_2 \mid c_j) \times \dots \times p(d_n \mid c_j)$$

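9 Naïve Bayesian Classifier

[Diagram: $p(d \mid c_j)$ built from the factors $p(d_1 \mid c_j)$, $p(d_2 \mid c_j)$, …, $p(d_n \mid c_j)$]

$$p(d \mid c_j) = p(d_1 \mid c_j) \times p(d_2 \mid c_j) \times \dots \times p(d_n \mid c_j)$$

Naïve Bayes is NOT sensitive to irrelevant features. Suppose we are trying to classify sex based on eye color…

$$p(\text{drew} \mid \text{male}) = p(d_1 \mid c_j) \times \dots \times p(\text{blue\_eyes} \mid \text{male})$$

$$p(\text{drew} \mid \text{female}) = p(d_1 \mid c_j) \times \dots \times p(\text{blue\_eyes} \mid \text{female})$$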
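The insensitivity claim can be checked numerically: if an attribute has the same conditional probability in every class, its factor cancels when the class scores are normalised. In the sketch below all probabilities are hypothetical, the equal priors are an assumption, and `p_long_hair` stands in for an informative attribute while `p_blue_eyes` is the irrelevant one.

```python
def normalize(scores):
    """Scale class scores so they sum to 1."""
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# All numbers below are hypothetical and chosen only for illustration.
priors = {"male": 0.5, "female": 0.5}
p_long_hair = {"male": 0.2, "female": 0.6}  # informative attribute
p_blue_eyes = {"male": 0.3, "female": 0.3}  # irrelevant: same in both classes

without_eyes = normalize({c: priors[c] * p_long_hair[c] for c in priors})
with_eyes = normalize(
    {c: priors[c] * p_long_hair[c] * p_blue_eyes[c] for c in priors}
)
```

In practice the two estimates of an irrelevant attribute will rarely be exactly equal, so the posteriors would be expected to shift slightly rather than not at all.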

10 Naïve Bayesian Classifier

[Diagram: $p(d \mid c_j)$ built from the factors $p(d_1 \mid c_j)$, $p(d_2 \mid c_j)$, …, $p(d_n \mid c_j)$]

Naïve Bayes is fast and does not need much space. We can look up all the probabilities once and store them in a table.

Sex      Over 6 foot   Probability
Male     Yes           0.15
         No            0.85
Female   Yes           0.01
         No            0.99
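A sketch of the stored-table idea: the probabilities from the "Over 6 foot" table sit in a nested dictionary, and classification is a lookup followed by a multiplication. The equal class priors and the names `OVER_6_FOOT` and `class_scores` are assumptions made for this example.

```python
# Conditional probabilities p(over 6 foot | sex), copied from the slide's table.
OVER_6_FOOT = {
    "male": {"yes": 0.15, "no": 0.85},
    "female": {"yes": 0.01, "no": 0.99},
}


def class_scores(over_6_foot, priors):
    """Return p(d | c_j) * p(c_j) per class using a single table lookup."""
    return {
        sex: OVER_6_FOOT[sex][over_6_foot] * priors[sex]
        for sex in priors
    }


# Equal priors are an assumption; the slides do not state them here.
scores = class_scores("yes", {"male": 0.5, "female": 0.5})
best = max(scores, key=scores.get)
```

With one such table per attribute, storage grows with the number of attributes times the number of values times the number of classes, which is why the approach needs little space.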

11 Naïve Bayesian Classifier

[Diagram: $p(d \mid c_j)$ built from the factors $p(d_1 \mid c_j)$, $p(d_2 \mid c_j)$, …, $p(d_n \mid c_j)$]

Problem! Naïve Bayes assumes independence of features…

Sex      Over 6 foot   Probability
Male     Yes           0.15
         No            0.85
Female   Yes           0.01
         No            0.99

Sex      Over 200 pounds   Probability
Male     Yes               0.11
         No                0.80
Female   Yes               0.05
         No                0.95

12 Naïve Bayesian Classifier

[Diagram: $p(d \mid c_j)$ built from the factors $p(d_1 \mid c_j)$, $p(d_2 \mid c_j)$, …, $p(d_n \mid c_j)$]

Solution: Consider the relationships between attributes…

Sex      Over 6 foot   Probability
Male     Yes           0.15
         No            0.85
Female   Yes           0.01
         No            0.99

Sex      Over 200 pounds            Probability
Male     Yes and Over 6 foot        0.11
         No and Over 6 foot         0.59
         Yes and NOT Over 6 foot    0.05
         No and NOT Over 6 foot     0.35
Female   Yes and Over 6 foot        0.01

13 Naïve Bayesian Classifier

[Diagram: $p(d \mid c_j)$ built from the factors $p(d_1 \mid c_j)$, $p(d_2 \mid c_j)$, …, $p(d_n \mid c_j)$]

Solution: Consider the relationships between attributes…

But how do we find the set of connecting arcs?

Read: Keogh, E. & Pazzani, M. (1999). Learning augmented Bayesian classifiers: A comparison of distribution-based and classification-based approaches. In Uncertainty 99, 7th. Int'l Workshop on AI and Statistics, Ft. Lauderdale, FL, pp. 225--230.

Don’t bother writing a reaction paper, but if we had a pop quiz…

14 Naïve Bayesian Classifiers Visual Intuition I

[Figure: height axis with marks at 4 foot 8, 5 foot 8 and 6 foot 6]

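15 Naïve Bayesian Classifiers Visual Intuition II

[Figure: values 10 (male) and 2 (female) read off at 5 foot 8]

$$P(\text{male} \mid \text{5 foot 8}) = 10 / (10 + 2) = 0.833$$

$$P(\text{female} \mid \text{5 foot 8}) = 2 / (10 + 2) = 0.167$$

where $p(c_j \mid d)$ = probability of instance $d$ being in class $c_j$.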
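The ratio above can be sketched in code in two ways: first with the two values read off the figure, then with class-conditional densities evaluated at the query height. The Gaussian means and standard deviations (in inches) are hypothetical, equal class priors are assumed, and the slides do not say which distribution the curves follow.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def posterior_from_heights(heights):
    """Normalise per-class curve heights at one point into probabilities."""
    total = sum(heights.values())
    return {c: h / total for c, h in heights.items()}


# Values taken from the slide: 10 for male and 2 for female at 5 foot 8.
slide = posterior_from_heights({"male": 10, "female": 2})

# Hypothetical Gaussian parameters (mean, std) in inches; 5 foot 8 = 68 inches.
params = {"male": (70.0, 3.0), "female": (64.5, 2.5)}
densities = {c: normal_pdf(68.0, m, s) for c, (m, s) in params.items()}
gaussian = posterior_from_heights(densities)
```

Dividing by the sum of the curve heights plays the role of p(d) in Bayes theorem when both classes are equally likely a priori.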

