1 Classification Naïve Bayes Business Intelligence
2 Naïve Bayes: The concept Bayes' Theorem is used to calculate a conditional probability in the presence of some information. The conditional probability typically has the following form:

$$P(C \mid X_1, X_2, \ldots, X_p) = \frac{P(X_1, X_2, \ldots, X_p \mid C)\,P(C)}{P(X_1, X_2, \ldots, X_p \mid C)\,P(C) + P(X_1, X_2, \ldots, X_p \mid \bar{C})\,P(\bar{C})}$$

where $P(C \mid X_1, X_2, \ldots, X_p)$ is the probability of event C given the conditions/information $X_1, X_2, \ldots, X_p$, and $\bar{C}$ ("C bar") is the complement of event C. Example: Let C denote the event that the 405 will be moving really slowly. Without any other information, we can estimate this probability from prior experience and put a value of 30% on it. Now let X1, X2, and X3 denote the information that it is raining, there is an accident, and one of the left lanes is closed. Clearly, the probability of C given this new information will change dramatically, and Bayes' Theorem provides a precise way to calculate it.
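A minimal Python sketch of the two-class formula above. The function name `bayes_posterior` and the likelihood values in the usage line are hypothetical, chosen only to illustrate the 405 example; they are not estimates from real traffic data.

```python
def bayes_posterior(prior_c, lik_c, lik_not_c):
    """Two-class Bayes' Theorem.

    prior_c   -- P(C), the prior probability of event C
    lik_c     -- P(X1, ..., Xp | C), joint likelihood of the evidence under C
    lik_not_c -- P(X1, ..., Xp | C bar), joint likelihood under the complement
    Returns P(C | X1, ..., Xp).
    """
    prior_not_c = 1.0 - prior_c          # P(C bar)
    numerator = lik_c * prior_c
    denominator = numerator + lik_not_c * prior_not_c
    return numerator / denominator


# Hypothetical numbers: a 30% prior that the 405 is slow, plus assumed joint
# likelihoods of "raining, accident, left lane closed" under each case.
posterior = bayes_posterior(prior_c=0.30, lik_c=0.02, lik_not_c=0.001)
print(f"P(slow | rain, accident, lane closed) = {posterior:.3f}")
```

With these made-up inputs the probability of slow traffic rises from the 0.30 prior to roughly 0.90.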
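3 Naïve Bayes contd. However, when many pieces of conditional information are present, calculating the posterior probability with exact Bayes can be very involved, because it needs the joint probability $P(X_1, \ldots, X_p \mid C)$ for every combination of values. We then use a simplified version of Bayes' Theorem that assumes the predictors are conditionally independent given the class, so that each joint probability becomes a product of individual conditional probabilities. In this case we use the formula

$$P(C \mid X_1, \ldots, X_p) = \frac{P(X_1 \mid C)\,P(X_2 \mid C)\cdots P(X_p \mid C)\,P(C)}{P(X_1 \mid C)\,P(X_2 \mid C)\cdots P(X_p \mid C)\,P(C) + P(X_1 \mid \bar{C})\,P(X_2 \mid \bar{C})\cdots P(X_p \mid \bar{C})\,P(\bar{C})}$$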
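The naive version can be sketched in Python as follows, assuming each predictor's conditional probability has already been estimated. `naive_bayes_posterior` and the numbers in the usage line are hypothetical; the only change from exact Bayes is that each joint likelihood is replaced by a product of per-predictor conditional probabilities.

```python
from math import prod


def naive_bayes_posterior(cond_c, cond_not_c, prior_c):
    """Two-class naive Bayes posterior.

    cond_c     -- [P(X1 | C), P(X2 | C), ...] for the observed values
    cond_not_c -- [P(X1 | C bar), P(X2 | C bar), ...]
    prior_c    -- P(C)
    Assumes X1, X2, ... are conditionally independent given the class,
    so each joint likelihood is the product of the individual terms.
    """
    score_c = prod(cond_c) * prior_c
    score_not_c = prod(cond_not_c) * (1.0 - prior_c)
    return score_c / (score_c + score_not_c)


# Hypothetical conditional probabilities for rain, accident, lane closure.
p = naive_bayes_posterior(cond_c=[0.5, 0.1, 0.2],
                          cond_not_c=[0.2, 0.01, 0.05],
                          prior_c=0.30)
print(f"P(slow | X1, X2, X3) = {p:.3f}")
```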
4 Naïve Bayes: Example We have a list with the following information about a set of companies: their size, their audit status, and whether charges were filed against them.
5 Naïve Bayes Suppose we want to know the probability that a company is fraudulent given that it is small and a charge has been filed against it, P(fraudulent | size = small, charges = y). From the crosstab/pivot tables we can see that this probability is 1/2: there are 2 companies that are small and have charges filed against them, and 1 of them is fraudulent. Similarly,

P(fraudulent | small, n) = 0/3 = 0
P(fraudulent | large, y) = 2/2 = 1
P(fraudulent | large, n) = 1/3 = 0.33

Using Naïve Bayes we get

$$P(\text{fraudulent} \mid \text{small}, y) = \frac{P(\text{small} \mid \text{fraudulent})\,P(y \mid \text{fraudulent})\,P(\text{fraudulent})}{P(\text{small} \mid \text{fraudulent})\,P(y \mid \text{fraudulent})\,P(\text{fraudulent}) + P(\text{small} \mid \text{truthful})\,P(y \mid \text{truthful})\,P(\text{truthful})}$$

$$= \frac{(1/4)(3/4)(4/10)}{(1/4)(3/4)(4/10) + (4/6)(1/6)(6/10)} \approx 0.53$$

which is very close to the value of 0.5 that we obtained from the exact calculation!
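The slide's arithmetic can be reproduced in Python. The cell counts in `COUNTS` are derived from the numbers quoted on this slide (2 small/y, 3 small/n, 2 large/y and 3 large/n companies, of which 1, 0, 2 and 1 are fraudulent); the audit-status column of the original list is not used, and the function names are illustrative. `Fraction` keeps the arithmetic exact, so the naive Bayes answer comes out as 9/17, about 0.53.

```python
from fractions import Fraction

# (size, charges) -> (number fraudulent, number truthful),
# derived from the counts quoted on the slide.
COUNTS = {
    ("small", "y"): (1, 1),
    ("small", "n"): (0, 3),
    ("large", "y"): (2, 0),
    ("large", "n"): (1, 2),
}
FRAUD, TRUTHFUL = 0, 1  # index into each count pair


def exact_bayes(size, charges):
    """P(fraudulent | size, charges) read directly from the crosstab."""
    fraud, truthful = COUNTS[(size, charges)]
    return Fraction(fraud, fraud + truthful)


def cond_prob(position, value, cls):
    """P(predictor = value | class); position 0 = size, 1 = charges."""
    in_class = sum(pair[cls] for pair in COUNTS.values())
    matching = sum(pair[cls] for key, pair in COUNTS.items()
                   if key[position] == value)
    return Fraction(matching, in_class)


def naive_bayes(size, charges):
    """P(fraudulent | size, charges) using the naive Bayes formula."""
    n_fraud = sum(pair[FRAUD] for pair in COUNTS.values())
    n_truth = sum(pair[TRUTHFUL] for pair in COUNTS.values())
    n = n_fraud + n_truth
    score_f = (cond_prob(0, size, FRAUD) * cond_prob(1, charges, FRAUD)
               * Fraction(n_fraud, n))
    score_t = (cond_prob(0, size, TRUTHFUL) * cond_prob(1, charges, TRUTHFUL)
               * Fraction(n_truth, n))
    return score_f / (score_f + score_t)


for size, charges in COUNTS:
    nb = naive_bayes(size, charges)
    print(size, charges, "exact:", exact_bayes(size, charges),
          "naive:", nb, f"({float(nb):.2f})")
```

The loop also prints the other three cells, so the exact and naive values can be compared side by side.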
6 Naïve Bayes: Flight Delay Example Let us use XLMiner to create the conditional probabilities (each of the individual probability terms) for flight delay. We will use only the following predictors:
- Carrier
- Day of the week
- Departure time, in one-hour blocks
- Destination
- Origin
- Weather
Run XLMiner and look at the conditional probabilities computed from the training set. For any record in the validation set, the score for each class is computed by multiplying the corresponding conditional probabilities by the prior probability of that class; the scores are then normalized across the classes to give probabilities. Let us work through two examples.
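XLMiner builds the conditional probability tables itself. As a rough sketch of what those tables contain, the Python below estimates them by counting on a training set and then scores one validation record. The function names `fit_tables` and `score`, the tiny training set and its values are hypothetical (not the course data), and every predictor is treated as categorical.

```python
from collections import Counter, defaultdict


def fit_tables(records, label_key):
    """Estimate class priors and conditional probability tables by counting.

    Returns (priors, tables), where priors[cls] = P(cls) and
    tables[predictor][cls][value] = P(predictor = value | cls).
    """
    class_counts = Counter(r[label_key] for r in records)
    priors = {cls: k / len(records) for cls, k in class_counts.items()}

    # counts[(predictor, cls)][value] = number of records of class cls
    # whose predictor equals value
    counts = defaultdict(Counter)
    for r in records:
        for predictor, value in r.items():
            if predictor != label_key:
                counts[(predictor, r[label_key])][value] += 1

    tables = defaultdict(dict)
    for (predictor, cls), value_counts in counts.items():
        tables[predictor][cls] = {
            value: k / class_counts[cls] for value, k in value_counts.items()
        }
    return priors, tables


def score(record, priors, tables):
    """Posterior probability of each class for one validation record."""
    weighted = {}
    for cls, prior in priors.items():
        p = 1.0
        for predictor, value in record.items():
            # A value never seen with this class contributes probability 0.
            p *= tables[predictor].get(cls, {}).get(value, 0.0)
        weighted[cls] = prior * p
    total = sum(weighted.values())  # assumed > 0 in this sketch
    return {cls: w / total for cls, w in weighted.items()}


# Hypothetical training records.
train = [
    {"CARRIER": "DH", "Weather": 0, "status": "ontime"},
    {"CARRIER": "DH", "Weather": 0, "status": "ontime"},
    {"CARRIER": "US", "Weather": 0, "status": "ontime"},
    {"CARRIER": "DH", "Weather": 1, "status": "delayed"},
    {"CARRIER": "US", "Weather": 0, "status": "delayed"},
]
priors, tables = fit_tables(train, label_key="status")
print(score({"CARRIER": "DH", "Weather": 0}, priors, tables))
```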
7 Examples Example 1: Record details (row 633 in …NNBforlecture.xlsx)
1. Multiply all the relevant conditional probabilities for ontime to get p1.
2. Multiply all the relevant conditional probabilities for delayed to get p2.
3. Weight each product by the corresponding prior class probability (w1 for ontime, w2 for delayed) and add the two numbers: w1*p1 + w2*p2.
4. The probability for class i is wi*pi / (w1*p1 + w2*p2).
5. Classify based on whether this probability exceeds the cutoff.
Example 2: Record details (row 610 in …NNBforlecture.xlsx)
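The five steps map directly onto a few lines of Python. The function `classify_record`, the choice of `delayed` as the class compared with the cutoff, and the values of p_i and w_i in the usage line are all hypothetical; in the lecture workbook the conditional probabilities come from XLMiner's tables.

```python
def classify_record(p, w, target, cutoff=0.5):
    """Naive Bayes classification of one record with a cutoff.

    p      -- {class: product of the record's conditional probabilities}
    w      -- {class: prior class probability}
    target -- the class whose probability is compared with the cutoff
    Returns ({class: probability}, predicted class).
    """
    denom = sum(w[c] * p[c] for c in p)          # w1*p1 + w2*p2
    prob = {c: w[c] * p[c] / denom for c in p}   # wi*pi / (w1*p1 + w2*p2)
    if prob[target] > cutoff:
        return prob, target
    # Two-class case: otherwise predict the other class.
    other = next(c for c in p if c != target)
    return prob, other


# Hypothetical p_i and w_i for one validation record.
probs, label = classify_record(
    p={"ontime": 2.0e-6, "delayed": 1.0e-6},
    w={"ontime": 0.8, "delayed": 0.2},
    target="delayed",
)
print(probs, label)
```

Here the delayed probability is about 0.11, so with the default cutoff of 0.5 the record is classified as ontime; lowering the cutoff to 0.1 would flag it as delayed.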
8 Details Record 1. Let us list the conditions:
CARRIER = DH, DEP_TIME = 1640, DEST = JFK, DISTANCE = 213, ORIGIN = DCA, Weather = 0, DAY_WEEK = 4
The corresponding conditional probabilities for ontime are extracted from the ontime side of the conditional probability tables given by XLMiner (I used a VLOOKUP). Calculate p1 by multiplying these numbers. For p1*w1, multiply p1 by the prior probability of ontime, w1.
[Table: Ontime conditional probabilities for record 1; columns Title, Condition, Conditional Prob; rows CARRIER, DEP_TIME, DEST, DISTANCE, ORIGIN, Weather, DAY_WEEK]
Day of the week is not listed to save space.
9 Details By following exactly the same method and the subsequent calculations for p1*w1 and p2*w2, we can easily get the following results. Verify the results for record 2.
[Table: for Ontime and Delayed, the product of conditional probabilities pi, the weighted value pi*wi, and the sum of pi*wi]
Day of the week is not listed to save space.
10 Notes
- Quite simple and useful.
- Better than the exact Bayes approach, because not all combinations of predictor values may be present in the data (exact Bayes fails for a missing combination, since there is no conditional probability for that particular combination).
- However, it depends on the data and can therefore give erroneous results for small data sets.
- If an association makes sense but is not present in the data, the classification scheme will not work.
  - Example: yacht owners may be a target for high-value life insurance. However, the collected data has no incidence of high-value life insurance!
Next: Other classification schemes!
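The second note can be illustrated with a small Python sketch on hypothetical records (not the lecture data). The combination size = small, charges = y never occurs, so the exact-Bayes estimate has no records to count, while naive Bayes still returns an answer because small and y each occur within both classes. The function names are illustrative.

```python
from fractions import Fraction

# Hypothetical records: (size, charges, label). No record is (small, y).
RECORDS = [
    ("small", "n", "fraudulent"),
    ("large", "y", "fraudulent"),
    ("large", "y", "fraudulent"),
    ("small", "n", "truthful"),
    ("small", "n", "truthful"),
    ("large", "y", "truthful"),
    ("large", "n", "truthful"),
]


def exact_bayes(size, charges, label="fraudulent"):
    """P(label | size, charges) by counting matching records; None if none match."""
    matches = [r[2] for r in RECORDS if r[0] == size and r[1] == charges]
    if not matches:
        return None  # combination never observed: no estimate possible
    return Fraction(matches.count(label), len(matches))


def naive_bayes(size, charges, label="fraudulent"):
    """P(label | size, charges) with the naive (independence) formula."""
    scores = {}
    for cls in {r[2] for r in RECORDS}:
        in_cls = [r for r in RECORDS if r[2] == cls]
        p_size = Fraction(sum(r[0] == size for r in in_cls), len(in_cls))
        p_charges = Fraction(sum(r[1] == charges for r in in_cls), len(in_cls))
        prior = Fraction(len(in_cls), len(RECORDS))
        scores[cls] = p_size * p_charges * prior
    return scores[label] / sum(scores.values())


print("exact:", exact_bayes("small", "y"))  # no matching records
print("naive:", naive_bayes("small", "y"))
```

The same sketch also shows the weakness in the last note: if a value had never appeared with a class, its conditional probability would be 0 and would zero out that class's whole product.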