Active Learning based on Bayesian Networks Luis M. de Campos, Silvia Acid and Moisés Fernández
Index of Contents
1. Introduction: the scenario is a pool-based active learning cycle.
2. Data and evaluation: we participated in five of the six datasets considered. Evaluation uses AUC and ALC.
3. Methods: features, implemented modules, general procedure, how labels are queried, and a practical example.
4. Results: our best result is a sixth position.
5. Conclusions
6. Acknowledgments
2. Data and evaluation
The final test phase has six datasets. We participated in five of the six: A, C, D, E and F.
The datasets come from different application domains:
  Chemoinformatics.
  Embryology.
  Marketing.
  Text ranking.
Evaluation measures:
  Area under the ROC curve (AUC).
  Area under the Learning Curve (ALC).
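The two measures can be illustrated with a minimal Python sketch. The function names `auc` and `alc` are hypothetical. The AUC is computed from pairwise comparisons of positive and negative scores. For the ALC, the log2 axis on the number of labels and the division by the axis range are assumptions of this sketch, and the challenge's exact normalisation may differ.

```python
import math
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve for labels in {-1, 1}.

    Equals the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == -1]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def alc(num_labels: Sequence[int], aucs: Sequence[float]) -> float:
    """Area under the learning curve (AUC versus number of labels).

    Assumption: trapezoid rule on a log2 axis, divided by the axis
    range so that a constant AUC of 1.0 yields 1.0.
    """
    if len(num_labels) < 2 or len(num_labels) != len(aucs):
        raise ValueError("need at least two matching points")
    xs = [math.log2(n) for n in num_labels]
    area = sum((xs[i + 1] - xs[i]) * (aucs[i] + aucs[i + 1]) / 2.0
               for i in range(len(xs) - 1))
    return area / (xs[-1] - xs[0])
```

A learning curve that reaches a high AUC with few labels gets a larger ALC than one that reaches the same AUC late, which is why the number of labels queried per iteration matters.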
3. Methods. Features
Hardware used: a laptop running Ubuntu 8.10, with 4 GB of memory and an Intel Core Duo at 2.53 GHz.
We used three Bayesian network base classifiers:
  Naive Bayes. Used on dataset D.
  TAN (Tree Augmented Network) with the BDeu score. Used on dataset F.
  CHillClimber. A new classifier that searches in a reduced search space centered on the class node. Used with the BDeu score on datasets A, C and E.
Discretization method for numerical variables:
  Fayyad & Irani MDL for TAN and CHillClimber.
  None for Naive Bayes.
3. Methods. Features and Modules
Active learning method: uncertainty sampling.
We did not use unlabeled data for training.
Software implemented (several modules):
  Matlab: main module. It calls the C++ module.
  C++: intermediate module. It calls the Weka-Java module.
  Weka-Java: final module. It is implemented in Java within Weka, with several modifications.
3. Methods. Procedure
The procedure is as follows:
1. The algorithm trains on all known instances; initially it only has the seed.
2. It selects new examples to query using one particular method (a, b or c). See the next slide.
3. It adds the newly labeled instances to the known ones.
4. Are all instances known? No: go to 1. Yes: end.
The number of instances to query in each iteration is fixed in advance, in one of three ways:
  Exponential: 1, 2, 4, 8, 16, 32, 64, ... labels in iterations 1, 2, 3, ...
  Equal10-All: built from batches of (n/2)/10 labels and one batch of n/2 labels.
  All-Equal10: built from the same two batch sizes, n/2 and (n/2)/10.
n is the total number of labels in the dataset.
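The three query schedules can be sketched in Python as follows. The function names are hypothetical. The doubling schedule follows the slide directly. For the other two, the ordering is an assumption read from their names: Equal10-All is taken to mean ten equal batches of (n/2)/10 followed by everything that remains, and All-Equal10 one batch of n/2 followed by ten equal batches. Integer rounding and the handling of leftovers are also assumptions.

```python
from typing import List


def _fit(batches: List[int], n: int) -> List[int]:
    """Clip batch sizes so that they sum to exactly n labels."""
    out, left = [], n
    for b in batches:
        if left <= 0:
            break
        if b <= 0:
            continue
        b = min(b, left)
        out.append(b)
        left -= b
    if left > 0:
        out.append(left)  # any remainder goes into a final batch
    return out


def exponential(n: int) -> List[int]:
    """Batch sizes 1, 2, 4, 8, ... until n labels have been queried."""
    sizes, b, total = [], 1, 0
    while total < n:
        sizes.append(b)
        total += b
        b *= 2
    return _fit(sizes, n)


def equal10_all(n: int) -> List[int]:
    """Assumed reading: ten batches of (n/2)/10, then all the rest."""
    step = max(1, (n // 2) // 10)
    return _fit([step] * 10 + [n], n)


def all_equal10(n: int) -> List[int]:
    """Assumed reading: one batch of n/2, then ten batches of (n/2)/10."""
    step = max(1, (n // 2) // 10)
    return _fit([n // 2] + [step] * 10, n)


print(exponential(10))   # [1, 2, 4, 3]
print(equal10_all(100))  # [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 50]
print(all_equal10(100))  # [50, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
```

Because each schedule is fixed before learning starts, the batch size of every iteration is known in advance and does not react to how the classifier is performing.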
3. Methods. How to query examples (a, b or c)
In each iteration we sort the examples in increasing order of the probability of their most probable class. Then we choose x examples with the selected method:
a. We query the x examples with the lowest probabilities.
b. We query the x1 and x2 examples with the lowest probabilities among those assigned to class -1 and class 1 respectively, keeping the proportion of examples of each class known so far; x = x1 + x2.
c. Like method b, but x1 and x2 are calculated from the proportion of examples of each class estimated from both the labels returned by the oracle and the values returned by our classifier.
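A minimal Python sketch of the selection step is given here. The names `ranked`, `query_a` and `query_by_class` are hypothetical, and the code is not the Matlab/C++/Weka implementation. Methods b and c share one function and differ only in the class proportion `frac_neg` passed to it. Rounding x1 half-up is an assumption, since the slides do not state the rounding rule.

```python
import math
from typing import Dict, List, Tuple

# example id -> (P(class -1), P(class 1)) as returned by the classifier
Probs = Dict[int, Tuple[float, float]]


def ranked(probs: Probs) -> List[Tuple[int, int, float]]:
    """Rows (id, most probable class, its probability), least confident first."""
    rows = []
    for ex, (p_neg, p_pos) in probs.items():
        cls, p = (-1, p_neg) if p_neg >= p_pos else (1, p_pos)
        rows.append((ex, cls, p))
    return sorted(rows, key=lambda r: r[2])


def query_a(probs: Probs, x: int) -> List[int]:
    """Method a: the x examples with the lowest maximum probability."""
    return [ex for ex, _, _ in ranked(probs)[:x]]


def query_by_class(probs: Probs, x: int, frac_neg: float) -> List[int]:
    """Methods b and c: split x into per-class quotas x1 + x2.

    frac_neg is the proportion of class -1: taken from the known labels
    for method b, or from known labels plus predictions for method c.
    Assumption: x1 is rounded half-up. If a class has fewer candidates
    than its quota, fewer than x examples are returned.
    """
    x1 = min(x, math.floor(x * frac_neg + 0.5))
    x2 = x - x1
    rows = ranked(probs)
    neg = [ex for ex, cls, _ in rows if cls == -1][:x1]
    pos = [ex for ex, cls, _ in rows if cls == 1][:x2]
    return neg + pos
```

Method a can concentrate all queries on one class when the classifier is uncertain mostly about that class; the per-class quotas of b and c force the batch to follow the class proportions instead.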
3. Methods. An example
Prior knowledge: 6 examples of class -1 and 4 of class 1. In addition, our classifier returns the following probabilities:

Example  Class -1  Class 1
1        0.10      0.90
2        0.20      0.80
3        0.60      0.40
4        0.70      0.30
5        0.65      0.35
6        0.75      0.25
...      ...       ...

Step 1, select the maximum probability of each example:

Example  MaxProb  Class
1        0.90      1
2        0.80      1
3        0.60     -1
4        0.70     -1
5        0.65     -1
6        0.75     -1
...      ...      ...

Step 2, sort by increasing maximum probability:

Example  MaxProb  Class
3        0.60     -1
5        0.65     -1
4        0.70     -1
6        0.75     -1
2        0.80      1
1        0.90      1
...      ...      ...

Our exponential strategy indicates that we have to choose 4 examples (we are in iteration three):
With method a we would choose examples 3, 5, 4, 6.
With method b we would choose examples 3, 5, 2, 1.
With method c we would choose examples 3, 5, 4, 2.
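The per-class quotas behind these selections can be checked with a short Python sketch. Two assumptions are made: the six listed examples are treated as the whole pool (the table on the slide is elided), and x1 is rounded half-up. The slide states neither; this rounding rule is chosen only because it reproduces the selections listed for methods b and c. The function name `quotas` is hypothetical.

```python
import math
from typing import Tuple


def quotas(x: int, n_neg: int, n_pos: int) -> Tuple[int, int]:
    """Split x queries into (x1, x2) following the ratio n_neg : n_pos.

    Assumption: x1 is rounded half-up and x2 takes the remainder.
    """
    x1 = math.floor(x * n_neg / (n_neg + n_pos) + 0.5)
    return x1, x - x1


known_neg, known_pos = 6, 4  # labels already returned by the oracle
pred_neg, pred_pos = 4, 2    # predicted classes of the six listed examples

# Method b: proportion from known labels only, 6/10 of 4 = 2.4
print(quotas(4, known_neg, known_pos))                        # (2, 2)

# Method c: known labels plus predictions, 10/16 of 4 = 2.5
print(quotas(4, known_neg + pred_neg, known_pos + pred_pos))  # (3, 1)
```

With quotas (2, 2), method b takes the two least confident examples of each class: 3 and 5 for class -1, 2 and 1 for class 1. With quotas (3, 1), method c takes 3, 5 and 4 for class -1 and 2 for class 1.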
4. Results
Our results are rather modest, with reasonable performance on only two datasets, C and E.
To the left we can see the plot of dataset E and to the right the plot of dataset C.

Dataset  Method                          Ranking
A        CHillClimber, exponential, a)   20/22
C        TAN, equal10-all, c)            6/14
D        NaiveBayes, all-equal10, a)     15/19
E        CHillClimber, exponential, b)   12/20
F        TAN, exponential, b)            13/16
5. Conclusions
The process could be improved by adding a clustering step while only a few labeled instances are available.
Advantages:
  Simple.
  Not time-consuming.
Disadvantages:
  Static behavior.
  Lack of knowledge in the early stages of the process.
6. Acknowledgments
This work has been supported by the Spanish research programme Consolider Ingenio 2010: MIPRCV (CSD2007-00018).