Presentation on theme: "Text Processing and Naïve Bayes" - Presentation transcript:

1 Text Processing and Naïve Bayes
ICCM

2 Using Naïve Bayes
A classification algorithm.
Naïve Bayes is popular due to its simplicity of implementation and overall effectiveness.
Based on (of course) Bayes' theorem.
"Naïve" because it assumes there is no dependency between words.
Well suited for:
Sifting out spam from email
Predictive analysis
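In symbols (the standard Naïve Bayes formulation, not spelled out on the slide), the independence assumption lets the classifier score each class c for a document with words w1 … wn as:

P(c | w1, …, wn) ∝ P(c) × P(w1|c) × P(w2|c) × … × P(wn|c)

The class with the higher score is the prediction, which is exactly the procedure worked through on the next slides.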

3 Naïve Bayes Example

4 Naïve Bayes Learning Phase
Outlook      Play=Yes  Play=No
Sunny          2/9       3/5
Overcast       4/9       0/5
Rain           3/9       2/5

Temperature  Play=Yes  Play=No
Hot            2/9       2/5
Mild           4/9       2/5
Cool           3/9       1/5

Humidity     Play=Yes  Play=No
High           3/9       4/5
Normal         6/9       1/5

Wind         Play=Yes  Play=No
Strong         3/9       3/5
Weak           6/9       2/5

P(Play=Yes) = 9/14
P(Play=No) = 5/14
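As a rough illustration, the learning-phase tables above can be computed with a few lines of Python. The slide shows only the resulting frequency tables, so the 14-row dataset below is a reconstruction (the standard "play tennis" example, which reproduces every count in the tables):

```python
from collections import Counter, defaultdict

# Reconstructed 14-row "play tennis" dataset behind the tables on this slide.
# Each row: (Outlook, Temperature, Humidity, Wind, Play)
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
features = ["Outlook", "Temperature", "Humidity", "Wind"]

# Learning phase: class priors and per-feature conditional frequency tables.
label_counts = Counter(row[-1] for row in data)
priors = {label: count / len(data) for label, count in label_counts.items()}

cond = defaultdict(Counter)  # (feature, label) -> Counter of feature values
for row in data:
    label = row[-1]
    for feature, value in zip(features, row[:-1]):
        cond[(feature, label)][value] += 1

def p(feature, value, label):
    """P(feature=value | Play=label), e.g. P(Outlook=Sunny | Play=Yes) = 2/9."""
    return cond[(feature, label)][value] / label_counts[label]

print(priors)                        # {'No': 0.357..., 'Yes': 0.642...}
print(p("Outlook", "Sunny", "Yes"))  # 0.222... = 2/9
```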

5 Naïve Bayes Test Phase
Given a new instance, predict its label:
x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
Look up the tables built in the learning phase.
Calculation:
P(Outlook=Sunny|Play=Yes) = 2/9
P(Temperature=Cool|Play=Yes) = 3/9
P(Humidity=High|Play=Yes) = 3/9
P(Wind=Strong|Play=Yes) = 3/9
P(Play=Yes) = 9/14
P(Outlook=Sunny|Play=No) = 3/5
P(Temperature=Cool|Play=No) = 1/5
P(Humidity=High|Play=No) = 4/5
P(Wind=Strong|Play=No) = 3/5
P(Play=No) = 5/14
P(Yes|x') ≈ [P(Sunny|Yes)P(Cool|Yes)P(High|Yes)P(Strong|Yes)] P(Play=Yes) = (2/9)(3/9)(3/9)(3/9)(9/14) ≈ 0.0053
P(No|x') ≈ [P(Sunny|No)P(Cool|No)P(High|No)P(Strong|No)] P(Play=No) = (3/5)(1/5)(4/5)(3/5)(5/14) ≈ 0.0206
Given the fact that P(Yes|x') < P(No|x'), we label x' as "No".
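A minimal sketch of the same test-phase arithmetic, with the probabilities looked up from the learning-phase tables hard-coded:

```python
from fractions import Fraction as F

# Score each class for x' = (Outlook=Sunny, Temperature=Cool,
# Humidity=High, Wind=Strong) using the table values above.
score_yes = F(2, 9) * F(3, 9) * F(3, 9) * F(3, 9) * F(9, 14)
score_no  = F(3, 5) * F(1, 5) * F(4, 5) * F(3, 5) * F(5, 14)

print(float(score_yes))  # ~0.0053
print(float(score_no))   # ~0.0206
print("Yes" if score_yes > score_no else "No")  # "No"
```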

6 Our working example
Comparison of emails from colleagues vs. commercial/sales emails.

7 General Steps
Have samples of emails from colleagues, and another set of emails that are known to be about other things (spam).
Data cleansing:
Clean out small words and articles.
Consistently use either upper or lower case.
Break out extraneous punctuation (judgment call).
Within each category, count how many times each word is used among all the emails.
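A minimal Python sketch of the cleansing and counting steps. The stopword list, the "small word" cutoff, and the sample email bodies are all illustrative assumptions, not part of the original presentation:

```python
import re
from collections import Counter

# Illustrative stopword list; articles and other small filler words.
STOPWORDS = {"the", "and", "for", "you", "with", "this", "that"}

def tokenize(text):
    """Lowercase, strip punctuation, drop small words and stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if len(w) > 2 and w not in STOPWORDS]

def word_counts(emails):
    """Count how many times each word is used across all emails in a category."""
    counts = Counter()
    for body in emails:
        counts.update(tokenize(body))
    return counts

# Made-up sample emails for each category.
colleague_counts = word_counts([
    "Meeting moved to noon tomorrow",
    "Can you review the draft report?",
])
spam_counts = word_counts([
    "WINNER!! Claim your free prize now",
    "Limited time offer: buy now and save",
])
print(colleague_counts.most_common(5))
print(spam_counts.most_common(5))
```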

8 Testing the model
Reuse known emails.
For each email, parse out each token just as before.
Within each individual email, count the tokens.
KNIME's Naive Bayes (NB) module will:
For each token found, look up the probability value recorded for that token during the learning phase. Do this separately for each category.
Combine the per-token values into one value for each category for the email.
Compare the two values: the category with the higher value is the most likely source of the email.
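The comparison described above can be sketched by hand as follows. The word counts, priors, and example tokens are made up, and this is not the actual KNIME implementation, just the same per-category scoring done directly:

```python
from collections import Counter

# Made-up per-category word counts from a tiny training set.
# Tokens absent from a category zero out the product here; slide 9's
# additive smoothing and log trick address that.
colleague_counts = Counter({"meeting": 5, "report": 4, "review": 3, "lunch": 2})
spam_counts = Counter({"free": 6, "offer": 5, "prize": 3, "meeting": 1})
prior = {"colleague": 0.5, "spam": 0.5}  # assume equal class priors

def score(tokens, counts, prior_p):
    total = sum(counts.values())
    p = prior_p
    for token in tokens:
        p *= counts[token] / total  # P(token | category), no smoothing yet
    return p

tokens = ["free", "meeting", "offer"]
scores = {
    "colleague": score(tokens, colleague_counts, prior["colleague"]),
    "spam": score(tokens, spam_counts, prior["spam"]),
}
print(max(scores, key=scores.get))  # category with the higher score wins
```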

9 Data issues
Additive smoothing: when you eventually evaluate a new message, it may contain a token that was not in the training set, which would give that token a probability of zero. To avoid this, consistently add a small increment to every count.
Dealing with floating-point underflow: because the product of many small probabilities can become an extremely small decimal value, an option is to take the natural logarithm of each probability and sum the logarithms instead.
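A minimal sketch of both fixes, with made-up counts and an assumed vocabulary size:

```python
import math
from collections import Counter

# Made-up word counts for one category.
counts = Counter({"meeting": 5, "report": 4, "review": 3})
vocabulary_size = 1000   # assumed size of the combined training vocabulary
alpha = 1                # add-one (Laplace) smoothing increment

def log_prob(token):
    """log P(token | category) with additive smoothing; never log(0)."""
    return math.log((counts[token] + alpha) /
                    (sum(counts.values()) + alpha * vocabulary_size))

def log_score(tokens, log_prior):
    # Summing logs is monotonic, so comparing log scores picks the same
    # category as comparing the raw probability products would.
    return log_prior + sum(log_prob(t) for t in tokens)

# "jackpot" never appeared in training, yet the score stays finite.
print(log_score(["meeting", "jackpot", "review"], math.log(0.5)))
```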


