
1 Test-Cost Sensitive Naïve Bayes Classification X. Chai, L. Deng, Q. Yang Dept. of Computer Science The Hong Kong University of Science and Technology.


1 Test-Cost Sensitive Naïve Bayes Classification
X. Chai, L. Deng, Q. Yang, Dept. of Computer Science, The Hong Kong University of Science and Technology
C. Ling, Dept. of Computer Science, The University of Western Ontario

2 Example – Medical Diagnosis
temperature: 39 °C; pressure: ?; blood test: ?; cardiogram: ?; essay: ?
Is the patient healthy? Which test should be taken first? Which test to perform next?
Concern: cost the patient as little as possible while maintaining a low misdiagnosis risk.

3 Test-Cost Sensitive Learning
Traditional inductive learning techniques (decision trees, NB) have had great success
– but they do not handle different types of costs during classification.
Misclassification costs ($C_{mc}$): the costs incurred by classification errors. Considering $C_{mc}$ alone:
– distinguishes different types of classification errors
– neglects the possibility of obtaining missing values in a test case by performing attribute tests
Test costs ($C_{test}$): the costs incurred by obtaining missing values of attributes.
Goal: minimize the total cost $C_{total} = C_{mc} + C_{test}$.

4 Some Related Work
MDP-based cost-sensitive learning (Zubek and Dietterich 2002)
– Cast as a Markov decision process
– Solutions are given in terms of optimal policies
– Drawback: very high computational cost to conduct the search
Decision trees with minimal cost (Ling et al. 2004)
– Consider both misclassification and test costs in tree building
– Splitting criterion: minimal total cost instead of InfoGain
– Drawback: attributes not appearing on the testing branch are ignored, although they are still informative for classification
– Drawback: not suitable for batch tests due to its sequential nature

5 Decision trees with minimal cost (Ling et al. 2004)
Attribute selection criterion: minimal total cost ($C_{total} = C_{mc} + C_{test}$) instead of minimal entropy as in C4.5.
If growing the tree gives a smaller total cost, choose the attribute with minimal total cost. Otherwise, stop and form a leaf.
The leaf is also labelled according to minimal total cost:
– Suppose the leaf has P positive examples and N negative examples.
– FP denotes the cost of a false positive example and FN the cost of a false negative.
– IF (P×FN ≥ N×FP) THEN label = positive ELSE label = negative
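
The leaf-labelling rule can be sketched in a few lines of Python. This is only an illustration: the function name `label_leaf` and the cost values in the usage lines are hypothetical, not taken from the slides.

```python
def label_leaf(p, n, fp_cost, fn_cost):
    """Label a leaf holding p positive and n negative examples.

    Returns (label, misclassification cost of the leaf).
    """
    cost_if_negative = p * fn_cost  # every positive becomes a false negative
    cost_if_positive = n * fp_cost  # every negative becomes a false positive
    # Label positive when calling the leaf negative would cost at least as much.
    if cost_if_negative >= cost_if_positive:
        return "positive", cost_if_positive
    return "negative", cost_if_negative


# Hypothetical costs: FP = 150, FN = 450.
few_positives = label_leaf(10, 40, fp_cost=150, fn_cost=450)
more_positives = label_leaf(20, 30, fp_cost=150, fn_cost=450)
```

Note that the second leaf is labelled positive even though negatives are in the majority, because a false negative is assumed to be three times as costly as a false positive.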

6 A Tree Building Example
[Figure: a node with P:N examples is split on attribute A, which has test cost C, into branch A = $v_1$ with $P_1$:$N_1$ examples and branch A = $v_2$ with $P_2$:$N_2$ examples.]
Before splitting: $C_{mc} = \min(P \times FN, N \times FP)$, $C_{test} = 0$, $C_{total} = C_{mc} + C_{test}$.
Consider attribute A as a potential splitting attribute:
– $C'_{mc} = \min(P_1 \times FN, N_1 \times FP) + \min(P_2 \times FN, N_2 \times FP)$
– $C'_{test} = (P_1 + N_1 + P_2 + N_2) \times C$
– $C'_{total} = C'_{mc} + C'_{test}$
If $C'_{total} < C_{total}$, splitting on A would reduce the total cost.
– Choose the attribute with the minimal total cost for splitting.
If $C'_{total} \ge C_{total}$ for all remaining attributes, no further sub-tree is built and the set becomes a leaf.
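
A minimal sketch of the split test, assuming a two-class problem and made-up counts and costs (the function names and all numbers are hypothetical). It compares the total cost of keeping the node as a leaf with the total cost after splitting on one attribute.

```python
def leaf_cost(p, n, fp_cost, fn_cost):
    """Misclassification cost of a leaf with p positives and n negatives."""
    return min(p * fn_cost, n * fp_cost)


def split_total_cost(branches, test_cost, fp_cost, fn_cost):
    """Total cost after splitting; branches is a list of (p_i, n_i) pairs."""
    c_mc = sum(leaf_cost(p, n, fp_cost, fn_cost) for p, n in branches)
    # Every example reaching the node has to pay for the attribute test.
    c_test = sum(p + n for p, n in branches) * test_cost
    return c_mc + c_test


def should_split(p, n, branches, test_cost, fp_cost, fn_cost):
    """Split only if it strictly lowers the total cost (C_test = 0 for a leaf)."""
    return split_total_cost(branches, test_cost, fp_cost, fn_cost) < leaf_cost(
        p, n, fp_cost, fn_cost
    )


# Hypothetical node with 50:50 examples, split into 45:5 and 5:45.
branches = [(45, 5), (5, 45)]
cheap_test = should_split(50, 50, branches, test_cost=10, fp_cost=100, fn_cost=100)
costly_test = should_split(50, 50, branches, test_cost=50, fp_cost=100, fn_cost=100)
```

With these numbers the same split is accepted when the test costs 10 per example and rejected when it costs 50, which is the trade-off the criterion is meant to capture.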

7 Sequential Test Strategy
Optimal Sequential Test (OST): each test example goes down the tree until it reaches an attribute whose value is unknown in the test example. The test is then performed and the missing value is revealed. The process continues until the example falls into a leaf node. The leaf node's label is used as the prediction. The total cost is the sum of the misclassification cost and the test cost.
Problems with the OST strategy:
– The algorithm chooses a locally optimal attribute without backtracking, so the OST strategy is not globally optimal.
– Attributes not appearing on the testing branch are ignored, although they are still informative for classification.
– Not suitable for batch tests due to its sequential nature.

8 Problem Formulation
Given:
– D: a training dataset of N samples $\{x_1, \dots, x_N\}$ from P classes $\{c_1, \dots, c_P\}$, where each sample $x_i$ is described by M attributes $(A_1, \dots, A_M)$, among which there can be missing values.
– C: a misclassification cost matrix. $C_{ij} = C(i,j)$ specifies the cost of classifying a sample from $c_i$ as belonging to class $c_j$.
– T: a test-cost vector. $T_k = T(k)$ specifies the cost of taking a test on attribute $A_k$ ($1 \le k \le M$).
Build:
– csNB: a cost-sensitive naïve Bayes classifier
– S: a test strategy for every new case, with the aim of minimizing the sum of the misclassification cost $C_{mc}$ and the test cost $C_{test}$.

9 csNB classification
Two procedures: learning and prediction.
Learning a csNB classifier
– Same as learning a traditional NB classifier
– Estimate the prior probabilities $P(c_j)$ and the likelihoods $P(A_m = v_{m,k} \mid c_j)$ from the training dataset D.
– Missing values are simply ignored in the likelihood computation.
Prediction
– Sequential test strategy
– Batch test strategy
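
The learning step can be sketched as plain frequency counting. The snippet below is an assumption-laden illustration: missing values are encoded as `None`, no smoothing is applied (the slides do not say whether any is used), and the name `learn_nb` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def learn_nb(samples, labels):
    """Estimate P(c) and P(A_m = v | c) by relative frequency.

    samples: list of tuples, with None marking a missing attribute value.
    Returns (priors, likelihoods), where likelihoods[(m, c)][v] = P(A_m = v | c).
    """
    class_counts = Counter(labels)
    priors = {c: k / len(labels) for c, k in class_counts.items()}

    counts = defaultdict(Counter)
    for x, c in zip(samples, labels):
        for m, v in enumerate(x):
            if v is not None:  # missing values are skipped, not imputed
                counts[(m, c)][v] += 1

    likelihoods = {}
    for key, counter in counts.items():
        observed = sum(counter.values())
        likelihoods[key] = {v: k / observed for v, k in counter.items()}
    return priors, likelihoods


# Toy data with two attributes and two missing entries.
samples = [("pos", "neg"), ("pos", None), ("neg", "neg"), (None, "pos")]
labels = ["c1", "c1", "c2", "c2"]
priors, likelihoods = learn_nb(samples, labels)
```

Each likelihood is normalized by the number of observed values of that attribute within the class, so a missing entry lowers the sample size for that attribute only.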

10 Sequential Test Strategy vs. Batch Test Strategy
What is a sequential test strategy?
– Decisions are made sequentially on whether a further test on an unknown attribute should be performed and, if so, which attribute to select, based on the values of the attributes initially known or previously tested.
– A test strategy that is designed on the fly during classification.
What is a batch test strategy?
– The selection of tests on unknown attributes must be determined in advance, before any test is carried out.
– A test strategy that is designed beforehand.
Both aim to minimize the sum of misclassification and test costs.

11 Example: Diagnosis of Hepatitis
Assume:
– 21% of patients are positive ($c_1$, have hepatitis): $P(c_1) = 21\%$
– 79% of patients are negative ($c_2$, healthy): $P(c_2) = 79\%$
– Classification costs: $C_{12} = 450$, $C_{21} = 150$, $C_{11} = C_{22} = 0$
– Four attributes describe a patient
Test costs and likelihoods of each attribute: [table not preserved in the transcript]
Suppose a patient comes with all attribute values unknown: (?,?,?,?)
Sequential test: (?,?,?,?) → test ascites → (?,?,?,pos): test spiders …; (?,?,?,neg): test spleen …
Batch test: (?,?,?,?) → test {spleen, spiders, ascites} → (?,neg,neg,pos) → classify
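
As a worked check of the numbers on this slide, the sketch below computes the expected misclassification cost of each possible prediction before any test is taken. It assumes that correct classifications cost nothing; the variable and function names are hypothetical.

```python
priors = {"c1": 0.21, "c2": 0.79}  # c1 = hepatitis, c2 = healthy

# cost[true][predicted]: C_12 = 450 (missed hepatitis), C_21 = 150 (false alarm).
cost = {
    "c1": {"c1": 0, "c2": 450},
    "c2": {"c1": 150, "c2": 0},
}


def expected_risk(prob, cost, predicted):
    """Expected cost of predicting `predicted` under class probabilities `prob`."""
    return sum(prob[true] * cost[true][predicted] for true in prob)


risks = {c: expected_risk(priors, cost, c) for c in priors}
cheapest = min(risks, key=risks.get)
```

With no attribute known, predicting c1 costs about 0.79 × 150 = 118.5 in expectation and predicting c2 costs about 0.21 × 450 = 94.5, so the cheapest untested prediction is "healthy". Any test is worth taking only if it lowers this expected cost by more than the test itself costs.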

12 Prediction with Sequential Test Strategy
Suppose x is a test example. Let $\tilde{A}$ denote the set of known attributes and $\bar{A}$ the set of unknown attributes. The utility of testing an unknown attribute $A_i \in \bar{A}$ is defined as:
$$Util(A_i) = Gain(\tilde{A}, A_i) - C_{test}(A_i)$$
where:
– $C_{test}(A_i)$ is the test cost of attribute $A_i$, given by $T_i$
– $Gain(\tilde{A}, A_i)$ is the reduction in the expected misclassification cost if we know $A_i$'s true value

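13 Prediction with Sequential Test Strategy
For the set of known attributes $\tilde{A}$ and an unknown attribute $A_i$, $Gain(\tilde{A}, A_i)$ is defined as:
$$Gain(\tilde{A}, A_i) = C_{mc}(\tilde{A}) - E_{A_i}\left[C_{mc}(\tilde{A} \cup \{A_i\})\right]$$
where:
– $C_{mc}(\tilde{A})$ is the expected $C_{mc}$ based on $\tilde{A}$
– $E_{A_i}[\cdot]$ takes the expectation over all possible values of $A_i$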
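
One way to compute the utility under naïve Bayes is sketched below. Two assumptions are made that the transcript does not state explicitly: the expected misclassification cost is taken to be the minimum Bayes risk over all candidate predictions, and the probability of each attribute value is obtained by marginalizing over the class posterior. The "ascites" likelihoods and the test cost of 10 are made-up illustration values, since the original table is not available.

```python
def posterior(priors, lik, known):
    """P(c | known attributes) under the naive Bayes independence assumption."""
    scores = dict(priors)
    for attr, value in known.items():
        for c in scores:
            scores[c] *= lik[attr][c][value]
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}


def expected_mc_cost(post, cost):
    """Smallest expected misclassification cost over all candidate predictions."""
    return min(sum(post[i] * cost[i][j] for i in post) for j in post)


def utility(attr, known, priors, lik, cost, test_cost):
    """Util(attr) = Gain(known, attr) - test cost of attr."""
    post = posterior(priors, lik, known)
    cost_now = expected_mc_cost(post, cost)
    values = next(iter(lik[attr].values()))  # value set of attr, read from one class
    cost_after = 0.0
    for v in values:
        p_v = sum(post[c] * lik[attr][c][v] for c in post)  # P(attr = v | known)
        if p_v > 0:
            new_post = posterior(priors, lik, {**known, attr: v})
            cost_after += p_v * expected_mc_cost(new_post, cost)
    return (cost_now - cost_after) - test_cost[attr]


priors = {"c1": 0.21, "c2": 0.79}
cost = {"c1": {"c1": 0, "c2": 450}, "c2": {"c1": 150, "c2": 0}}  # cost[true][predicted]
# Hypothetical likelihoods lik[attr][class][value] and test cost.
lik = {"ascites": {"c1": {"pos": 0.4, "neg": 0.6}, "c2": {"pos": 0.05, "neg": 0.95}}}
test_cost = {"ascites": 10}

util_ascites = utility("ascites", {}, priors, lik, cost, test_cost)
```

With these invented numbers the expected misclassification cost drops from 94.5 to 62.625 once the attribute is known, so the gain (31.875) exceeds the test cost and the utility is positive.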

14 Prediction with Sequential Test Strategy
Overall, an attribute is worth testing if testing it offers more gain than the cost it brings.
By calculating the utilities of testing all attributes in the unknown set $\bar{A}$, we can decide:
– whether a further test is needed
– which attribute to test
After an attribute $A_i$ is tested, its true value is revealed and it is moved from the unknown set $\bar{A}$ to the known set $\tilde{A}$.
The same procedure continues until:
– no unknown attribute is left ($\bar{A} = \emptyset$), or
– the utility of testing any unknown attribute is non-positive.
Finally, the example is predicted as the class with the minimum expected misclassification cost given $\tilde{A}$, and $C_{test}$ is the total cost of the tests performed.

15 csNB-sequential-predict Algorithm
[Flowchart] Compute the utility of testing every unknown attribute → further test? → Yes: select the unknown attribute with the highest utility to test, then repeat … → No: classify
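
The loop in the flowchart can be sketched as follows. This is not the authors' code: the helper names, the `reveal` callback that stands in for actually performing a test, and all likelihoods and test costs are assumptions made for illustration, and the expected misclassification cost is taken to be the minimum Bayes risk.

```python
def posterior(priors, lik, known):
    """P(c | known attributes) under the naive Bayes independence assumption."""
    scores = dict(priors)
    for attr, value in known.items():
        for c in scores:
            scores[c] *= lik[attr][c][value]
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}


def risks(post, cost):
    """Expected cost of each candidate prediction j, with cost[true][predicted]."""
    return {j: sum(post[i] * cost[i][j] for i in post) for j in post}


def utility(attr, known, priors, lik, cost, test_cost):
    """Expected reduction in misclassification cost minus the cost of the test."""
    post = posterior(priors, lik, known)
    cost_now = min(risks(post, cost).values())
    cost_after = 0.0
    for v in next(iter(lik[attr].values())):
        p_v = sum(post[c] * lik[attr][c][v] for c in post)
        if p_v > 0:
            new_post = posterior(priors, lik, {**known, attr: v})
            cost_after += p_v * min(risks(new_post, cost).values())
    return cost_now - cost_after - test_cost[attr]


def sequential_predict(known, unknown, reveal, priors, lik, cost, test_cost):
    """Test attributes one at a time while the best utility is positive."""
    known, unknown, spent = dict(known), set(unknown), 0
    while unknown:
        utils = {a: utility(a, known, priors, lik, cost, test_cost)
                 for a in sorted(unknown)}
        best = max(utils, key=utils.get)
        if utils[best] <= 0:  # no remaining test pays for itself
            break
        known[best] = reveal(best)  # perform the test, observe the true value
        unknown.remove(best)
        spent += test_cost[best]
    r = risks(posterior(priors, lik, known), cost)
    return min(r, key=r.get), spent


priors = {"c1": 0.21, "c2": 0.79}
cost = {"c1": {"c1": 0, "c2": 450}, "c2": {"c1": 150, "c2": 0}}
lik = {  # hypothetical likelihoods; "spleen" is deliberately uninformative
    "ascites": {"c1": {"pos": 0.4, "neg": 0.6}, "c2": {"pos": 0.05, "neg": 0.95}},
    "spleen": {"c1": {"pos": 0.5, "neg": 0.5}, "c2": {"pos": 0.5, "neg": 0.5}},
}
test_cost = {"ascites": 10, "spleen": 5}
patient = {"ascites": "pos", "spleen": "neg"}  # hidden true values

label, spent = sequential_predict({}, ["ascites", "spleen"], patient.get,
                                  priors, lik, cost, test_cost)
```

In this toy run only the informative attribute is tested; the uninformative one has zero gain, so its utility is negative and the loop stops before paying for it.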

16 Prediction with Batch Test Strategy
A natural extension of the sequential test algorithm of csNB.
All the unknown attributes with non-negative utility are selected.
The batch of selected attributes is $\bar{A}' = \{A_i \in \bar{A} : Util(A_i) \ge 0\}$, where $\bar{A}$ is the set of unknown attributes, and the test cost is $C_{test} = \sum_{A_i \in \bar{A}'} T_i$.
After $\bar{A}'$ is selected, the values of these attributes are revealed and the class label is then predicted.
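
The batch selection step itself is short once the utilities are available. The sketch below assumes they have already been computed from the initially known attributes; the function name `select_batch` and every number are hypothetical.

```python
def select_batch(utilities, test_cost):
    """Pick every unknown attribute with non-negative utility.

    utilities: attribute -> utility computed before any test is carried out.
    Returns (selected attributes, total test cost of the batch).
    """
    batch = sorted(a for a, u in utilities.items() if u >= 0)
    return batch, sum(test_cost[a] for a in batch)


# Hypothetical utilities and test costs for three unknown attributes.
utilities = {"ascites": 21.875, "spiders": 0.0, "spleen": -5.0}
test_cost = {"ascites": 10, "spiders": 8, "spleen": 5}
batch, batch_cost = select_batch(utilities, test_cost)
```

Unlike the sequential loop, the utilities are not recomputed after each revealed value, so the whole batch is fixed before any test result is seen.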

17 Experiments
Experiments were carried out on eight datasets from the UCI ML repository (Ecoli, Heart, Australia, Voting, Breast, …).
Four algorithms were implemented for comparison:
– csNB: the test-cost sensitive naïve Bayes
– csDT: the cost-sensitive decision trees proposed in Ling et al. 2004
– LNB: lazy naïve Bayes, which predicts based only on the known attributes and requires no tests on any unknown attribute
– ENB: exacting naïve Bayes, which requires all missing values to be obtained by testing before prediction
The performance of the algorithms is measured in terms of the total cost $C_{total} = C_{mc} + C_{test}$, where $C_{mc}$ is obtained by comparing the predicted and true labels of the test examples.
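
A minimal sketch of how such a total-cost measure could be computed per test example and averaged. The function name, the cost matrix and the three example predictions are hypothetical and do not come from the reported experiments.

```python
def average_total_cost(true_labels, predicted_labels, test_costs_spent, cost):
    """Mean of C_mc + C_test over the test examples.

    cost[true][predicted] is the misclassification cost;
    test_costs_spent[k] is what was paid for tests on example k.
    """
    total = 0.0
    for true, predicted, spent in zip(true_labels, predicted_labels, test_costs_spent):
        total += cost[true][predicted] + spent
    return total / len(true_labels)


cost = {"c1": {"c1": 0, "c2": 450}, "c2": {"c1": 150, "c2": 0}}
avg = average_total_cost(
    true_labels=["c1", "c2", "c1"],
    predicted_labels=["c1", "c1", "c2"],
    test_costs_spent=[10, 0, 5],
    cost=cost,
)
```

A lazy strategy always has zero test cost but may pay more for errors, while an exacting strategy pays for every test; the measure puts both on the same scale.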

18 Experimental Results – Sequential Test
[Figure: average total cost comparison of LNB, ENB, csNB and csDT on the Ecoli, Breast, Heart and Thyroid datasets]

19 Experimental Results – Sequential Test
[Figure: average total cost comparison on the Australia, Cars, Voting and Mushroom datasets]

20 Experimental Results – Sequential Test
[Figure: comparison of LNB, csNB and csDT with an increasing percentage of unknown attributes on the Mushroom dataset]

21 Experimental Results – Sequential Test
Compared with csDT, csNB is more effective at balancing the misclassification and test costs.
[Figure: comparison of csNB and csDT with varying test costs (missing rates set to 20% and 60%) on the Mushroom dataset]

22 Experimental Results – Batch Test
Overall, csNB incurs 29.6% less total cost than csDT.
csDT is ill-suited to deriving batch test strategies because its tree building is sequential by nature.
csNB has no such constraint, and all the attributes can be evaluated at the same level.

23 Conclusion and future work
We proposed a test-cost sensitive naïve Bayes algorithm for designing classifiers that minimize the sum of the misclassification and test costs.
In the framework of csNB, attributes can be intelligently selected to design both sequential and batch test strategies.
In the future, we plan to develop more effective algorithms and to consider more complicated situations where the test cost of an attribute may be conditional on other attributes.
It would also be interesting to consider the cost of finding the missing values for the training data.

24 THANK YOU! Q & A

