
Feature Selection for Multi-Purpose Predictive Models: a Many-Objective Task

Alan P. Reynolds*, David W. Corne and Michael J. Chantler
School of Mathematical and Computer Sciences (MACS), Heriot-Watt University, Edinburgh, Scotland
*A.Reynolds@hw.ac.uk

Introduction

Feature subset selection – the elimination of features from a data set to reduce costs and improve the performance of machine learning algorithms – has been treated previously as a multiobjective optimization problem, minimizing complexity while maximizing accuracy (e.g. [1]) or maximizing sensitivity and specificity [2]. We show how attempting to satisfy each potential user of the resulting data or application leads us to consider the problem as having infinitely many objectives.

Two objectives for binary classification

Binary classification is the art of creating a model, or classifier, that predicts, based on an item's features, whether the item belongs to a class of interest (positive) or not (negative). A classifier (e.g. a spam filter) is trained on items with predetermined class labels and then used to predict the labels of unlabelled items. Counts of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) are used to create the following two objectives:

- Sensitivity: the proportion of items in the 'positive' class that are correctly identified, i.e. TP / (TP + FN).
- Specificity: the proportion of items in the 'negative' class that are correctly identified, i.e. TN / (TN + FP).

Classification algorithms

Some classification algorithms simply generate a class prediction. Others generate a probability that the item belongs to the class of interest, which is converted into a class prediction by setting a probability threshold. By allowing the threshold to vary between 0 and 1, we obtain a range of classifiers with different trade-offs between sensitivity and specificity. In this sense, the algorithm is multiobjective.

Feature subset selection

Why reduce the number of features in a data set?
- Improve algorithm efficiency and speed up the learning process.
- Produce simpler classifiers that can be more easily comprehended.
- Reduce the cost of obtaining or generating the data.
- Prevent over-fitting.

In the 'wrapper' approach to feature selection, feature subset quality is estimated by applying a simple classification algorithm. Here we wrap a 'multiobjective' classification algorithm, e.g. Naïve Bayes. The performance of a single feature set is given by a graph of specificity against sensitivity (see Fig. 1).

Fig. 1: The performance characteristics of a single feature subset, evaluated using a 'multiobjective' classifier. The objective trade-off preferences of three users are shown.

Infinitely many objectives for feature selection!

We wish to perform feature subset selection to produce a reduced data set to be used in multiple applications, or in a single application used by multiple users, such as a texture search engine. How do we optimize feature subset quality when different users may require different trade-offs between sensitivity and specificity?

Answer: we maximize performance for each potential user, or equivalently, we maximize specificity at every value of sensitivity – an uncountable set of objectives!

A bad idea?

It has been suggested that dominance-based algorithms such as NSGA-II [3] perform poorly with more than 4 objectives [4]. Can NSGA-II be successfully applied to a problem with an infinite set of objectives? There are reasons for hope:
- In practice, the graph is piecewise horizontal – the number of unique objectives is bounded by the number of items in the training set that are in the class of interest.
- A feature set with a good value of specificity at a sensitivity of 0.5 is likely to have a good value of specificity at a sensitivity of 0.48 – the objectives are highly correlated.
- Modifications to the dominance relation can be made to further improve algorithm convergence (see below).

Modified dominance

Using standard dominance, if feature set A is to dominate set B, the sensitivity-specificity graph for A must be at least as high as that for B at all points, and higher at at least one point. Our modified dominance relation considers the areas between the two graphs, as shown in Fig. 2.

Fig. 2: Feature subset A dominates feature subset B if there is an orange area but no green area (standard dominance) or if the orange area divided by the green area exceeds a given dominance threshold. A dominance threshold of 1 results in maximization of the dominated area. An infinite threshold produces the standard dominance relation. Varying the threshold between these values allows us to control the strength of the dominance relation.
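As a concrete illustration of this area-based comparison, the sketch below checks whether one feature subset dominates another when each subset's specificity-versus-sensitivity graph has been sampled at a common, evenly spaced set of sensitivity values. The poster gives no code, so the sampled-curve representation, the function name modified_dominates, the choice of Python and the example curves are illustrative assumptions, not the authors' implementation.

```python
from typing import Sequence


def modified_dominates(spec_a: Sequence[float],
                       spec_b: Sequence[float],
                       threshold: float = 5.0) -> bool:
    """Does curve A dominate curve B under the area-ratio ('modified') dominance?

    spec_a, spec_b: specificity values of the two feature subsets, sampled at the
    same, evenly spaced sensitivity values (an assumed representation).
    threshold: the dominance threshold; a very large threshold approaches standard
    dominance, while a threshold of 1 amounts to maximizing the dominated area.
    """
    # Area where A lies above B ("orange") and where B lies above A ("green"),
    # approximated by summing the positive differences over the sampling grid.
    orange = sum(max(a - b, 0.0) for a, b in zip(spec_a, spec_b))
    green = sum(max(b - a, 0.0) for a, b in zip(spec_a, spec_b))

    if green == 0.0:
        # Standard dominance: A is never below B and is above it somewhere.
        return orange > 0.0
    # Modified dominance: the area ratio must exceed the threshold.
    return orange / green > threshold


# Illustrative, made-up curves sampled at sensitivities 0.0, 0.1, ..., 1.0.
curve_a = [1.00, 0.98, 0.97, 0.95, 0.92, 0.90, 0.85, 0.80, 0.70, 0.55, 0.30]
curve_b = [1.00, 0.97, 0.95, 0.93, 0.90, 0.86, 0.86, 0.78, 0.65, 0.50, 0.25]
print(modified_dominates(curve_a, curve_b, threshold=5.0))  # True in this example
```

With the threshold set extremely high the check collapses to the standard dominance test described above; lowering it strengthens the relation by allowing a small green area to be outweighed by a much larger orange one.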
Results and conclusions

A variant of NSGA-II was applied to feature selection from three data sets. Here we show results from the Ionosphere data set, using Naïve Bayes as the core classification algorithm and a dominance factor of 5. Recall that the quality of each solution is represented by a graph, so the non-dominated set gives a set of graphs. For clarity, we present the envelope of these graphs, i.e. only points non-dominated with respect to sensitivity and specificity.

Fig. 3: Performance on the Ionosphere data set: training data.
Fig. 4: Performance on the Ionosphere data set: test data.

In conclusion, we have demonstrated that feature selection may be successfully approached as a problem with an infinite set of objectives. This approach is most useful when the resulting feature set or application is to be used by many users with different sensitivity-specificity trade-off preferences.

References

[1] Oliveira, L.S., Sabourin, R., Bortolozzi, F., Suen, C.Y.: A methodology for feature selection using multi-objective genetic algorithms for handwritten digit string recognition. International Journal of Pattern Recognition and Artificial Intelligence 17(6), 903-929 (2003)
[2] Emmanouilidis, C.: Evolutionary multi-objective feature selection and ROC analysis with application to industrial machinery fault diagnosis. In: Evolutionary Methods for Design, Optimisation and Control (2002)
[3] Deb, K., Agrawal, S., Pratab, A., Meyarivan, T.: A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II. In: PPSN 2000, LNCS 1917, 849-858 (2000)
[4] Hughes, E.J.: Evolutionary many-objective optimisation: Many once or one many? In: Proc. 2005 IEEE Congress on Evolutionary Computation (CEC 2005) 1, 222-227 (2005)
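Returning to the envelope mentioned under "Results and conclusions": keeping only the sensitivity-specificity points that are not dominated by any other point can be computed with a simple filter. The sketch below is an illustrative assumption about the representation (a plain list of (sensitivity, specificity) pairs) rather than code from the poster.

```python
def envelope(points):
    """Return the sensitivity-specificity points not dominated by any other point.

    A point (s1, p1) is dominated if some other point (s2, p2) is at least as good
    in both objectives and strictly better in one. 'points' is a list of
    (sensitivity, specificity) pairs (an assumed, illustrative representation).
    """
    front = []
    for s1, p1 in points:
        dominated = any(
            (s2 >= s1 and p2 >= p1) and (s2 > s1 or p2 > p1)
            for s2, p2 in points
        )
        if not dominated:
            front.append((s1, p1))
    # Sort by sensitivity so the envelope reads as a curve from left to right.
    return sorted(front)


# Illustrative, made-up points pooled from several solutions' graphs.
pts = [(0.60, 0.95), (0.70, 0.90), (0.70, 0.85),
       (0.80, 0.80), (0.75, 0.92), (0.90, 0.60)]
print(envelope(pts))  # [(0.6, 0.95), (0.75, 0.92), (0.8, 0.8), (0.9, 0.6)]
```

This quadratic-time filter is adequate for small point sets and is shown only to make precise what "non-dominated with respect to sensitivity and specificity" means when plotting Figs. 3 and 4.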