
1 Full model selection with heuristic search: a first approach with PSO Hugo Jair Escalante Computer Science Department, Instituto Nacional de Astrofísica, Óptica y Electrónica CHALEARN’s reading group on model selection, June 14, 2012

2 Design cycle of a classifier Data → preprocessing → feature selection → classification method → model training → model evaluation (error estimation). [Diagram: a data matrix of instances × features feeding the design cycle]

3 Full model selection [Diagram: data fed directly into a full model selection process]

4 Wrappers for model selection In most related work a single model type is considered and its hyperparameters are optimized via heuristic optimization: – swarm optimization, – genetic algorithms, – pattern search, – genetic programming, – … M. Momma, K. Bennett. A Pattern Search Method for Model Selection of Support Vector Regression. Proceedings of the SIAM Conference on Data Mining, pp. 261–274, 2002. T. Howley, M. Madden. The Genetic Kernel Support Vector Machine: Description and Evaluation. Artificial Intelligence Review, Vol. 24(3–4): 379–395, 2005. B. Zhang, H. Mühlenbein. Evolving optimal neural networks using genetic algorithms with Occam's razor. Complex Systems, Vol. 7, pp. 199–220, 1993.
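The wrapper idea above can be sketched in a few lines: sample hyperparameters, evaluate the resulting model on validation data, keep the best configuration. This is a minimal random-search sketch with a toy stand-in for the validation error (the function and its minimum at C = 1.0, gamma = 0.1 are invented for illustration; a real wrapper would train, e.g., an SVM and measure held-out error):

```python
import random

# Toy "validation error" surface for one model family (an RBF-SVM stand-in).
# Invented for illustration: minimum at C = 1.0, gamma = 0.1. A real wrapper
# would train the model and measure error on held-out data here.
def validation_error(C, gamma):
    return (C - 1.0) ** 2 + 10 * (gamma - 0.1) ** 2

def wrapper_search(n_trials=200, seed=0):
    """Wrapper-style model selection: repeatedly sample hyperparameters,
    evaluate the model, and keep the best configuration found."""
    rng = random.Random(seed)
    best_params, best_err = None, float("inf")
    for _ in range(n_trials):
        C = rng.uniform(0.01, 10.0)
        gamma = rng.uniform(0.001, 1.0)
        err = validation_error(C, gamma)
        if err < best_err:
            best_params, best_err = (C, gamma), err
    return best_params, best_err

params, err = wrapper_search()
```

The heuristic optimizers listed above (PSO, genetic algorithms, pattern search) replace the blind sampling loop with guided search, but the wrapper structure is the same.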

5 Model type selection for regression Genetic algorithms are used to select the model type (learning method, feature selection, preprocessing) and to optimize its parameters for regression problems. D. Gorissen, T. Dhaene, F. de Turck. Evolutionary Model Type Selection for Global Surrogate Modeling. Journal of Machine Learning Research, 10(Jul):2039–2078, 2009. http://www.sumo.intec.ugent.be/

6 PSMS: PSO for full model selection Particle swarm model selection: use particle swarm optimization to explore the search space of full models in a particular ML toolbox. Example full models: – Normalize + RBF-SVM (γ = 0.01) – PCA + neural net (10 hidden units) – Relief (feat. sel.) + naïve Bayes – Normalize + poly-SVM (d = 2) – Neural net (3 units) – s2n (feat. sel.) + kernel ridge

7 Particle swarm optimization 1. Randomly initialize a population of particles (i.e., the swarm) 2. Repeat the following iterative process until the stop criterion is met: a) Evaluate the fitness of each particle b) Find the personal best (p_i) and global best (p_g) c) Update the particles d) Update the best solution found (if needed) 3. Return the best particle (solution) [Diagram: fitness of the swarm from initialization through t = 1, 2, …, max_it]
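The steps above can be sketched as a minimal PSO minimizer. The parameter values (inertia w = 0.7, acceleration constants c1 = c2 = 1.5, 20 particles) are common textbook defaults, not the settings used in PSMS:

```python
import random

def pso_minimize(fitness, dim, n_particles=20, max_it=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal PSO following the steps on the slide: initialize the swarm,
    then iterate (evaluate fitness, update bests, update particles)."""
    rng = random.Random(seed)
    # 1. Randomly initialize positions and velocities
    X = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    P = [x[:] for x in X]                      # personal bests p_i
    p_fit = [fitness(x) for x in X]
    g = min(range(n_particles), key=lambda i: p_fit[i])
    G, g_fit = P[g][:], p_fit[g]               # global best p_g
    # 2. Iterate until the stop criterion is met (fixed budget here)
    for _ in range(max_it):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                V[i][d] = (w * V[i][d]                         # inertia
                           + c1 * r1 * (P[i][d] - X[i][d])     # local information
                           + c2 * r2 * (G[d] - X[i][d]))       # global information
                X[i][d] += V[i][d]
            f = fitness(X[i])
            if f < p_fit[i]:                   # update personal best
                P[i], p_fit[i] = X[i][:], f
                if f < g_fit:                  # update global best
                    G, g_fit = X[i][:], f
    # 3. Return the best solution found
    return G, g_fit

best, best_fit = pso_minimize(lambda x: sum(v * v for v in x), dim=3)
```

On this toy sphere function the swarm converges close to the origin; in PSMS the fitness would instead be the cross-validation error of the full model encoded by each particle.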

8 PSMS: PSO for full model selection Solutions are encoded as real-valued vectors specifying: – the choice of methods for preprocessing, feature selection, and classification, – the hyperparameters of the selected methods, – whether preprocessing is applied before feature selection.
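One way to picture the encoding is a decoder from a real-valued particle to a full model. The method lists, vector layout, and flooring scheme below are illustrative only; PSMS uses the methods of a specific ML toolbox (CLOP) with its own codification:

```python
# Hypothetical decoding of a real-valued particle into a full model.
# The method lists and vector layout are invented for illustration.
PREPROCESSORS = ["none", "normalize", "standardize"]
FEATURE_SELECTORS = ["none", "s2n", "relief"]
CLASSIFIERS = ["naive_bayes", "neural_net", "rbf_svm"]

def decode(particle):
    """particle = [pre, fs, clf, pre_before_fs, hyper1, hyper2], all real-valued.
    Categorical choices are obtained by flooring into the method lists, so the
    whole model is searchable with continuous PSO updates."""
    pre = PREPROCESSORS[int(particle[0]) % len(PREPROCESSORS)]
    fs = FEATURE_SELECTORS[int(particle[1]) % len(FEATURE_SELECTORS)]
    clf = CLASSIFIERS[int(particle[2]) % len(CLASSIFIERS)]
    return {
        "preprocessing": pre,
        "feature_selection": fs,
        "classifier": clf,
        "preprocess_before_fs": particle[3] >= 0.5,  # order of the two stages
        "hyperparams": particle[4:],                 # e.g. gamma, no. of features
    }

model = decode([1.3, 2.8, 2.1, 0.7, 0.01, 25.0])
```

Decoding this example particle yields normalize + Relief + RBF-SVM with preprocessing applied before feature selection.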

9 Particle swarm optimization Each individual (particle) i has: – a position in the search space (X_i^t), which represents a solution to the problem at hand, – a velocity vector (V_i^t), which determines how the particle explores the search space. After random initialization, particles update their positions according to:
V_i^(t+1) = w V_i^t + c1 r1 (p_i - X_i^t) + c2 r2 (p_g - X_i^t)
X_i^(t+1) = X_i^t + V_i^(t+1)
where w is the inertia weight, the c1 r1 term exploits local information (the personal best p_i), the c2 r2 term exploits global information (the global best p_g), and r1, r2 are uniform random numbers in [0, 1]. A. P. Engelbrecht. Fundamentals of Computational Swarm Intelligence. Wiley, 2006.

10 Some results in benchmark data Comparison of PSMS and pattern search H. J. Escalante, E. Sucar, M. Montes. Particle Swarm Model Selection. Journal of Machine Learning Research, 10(Feb):405–440, 2009.

11 Some results in benchmark data Comparison of PSMS and pattern search H. J. Escalante, E. Sucar, M. Montes. Particle Swarm Model Selection. Journal of Machine Learning Research, 10(Feb):405–440, 2009.

12 Some results in benchmark data The inertia weight may help to avoid overfitting and to direct the search

13 Some results in benchmark data Global and local information also help! (Here, no local information is considered.)

14 Some results in benchmark data Global and local information also help! (Here, no global information is considered.)

15 Some results in benchmark data Global and local information also help! (Here, both global and local information are considered.)

16 Some results in benchmark data Mechanisms of PSO that may help to avoid overfitting: – Each solution is updated using local and global information – Inertia weight – Early stopping – Randomness – Heterogeneous representation

17 PSMS in the ALvsPK challenge Five data sets for binary classification Goal: to obtain the best classification model for each data set Two tracks: – Prior knowledge – Agnostic learning http://www.agnostic.inf.ethz.ch/

18 PSMS in the ALvsPK challenge Best configuration of PSMS: http://www.agnostic.inf.ethz.ch/results.php Comparison of the performance of models selected with PSMS with that obtained by other techniques in the ALvsPK challenge:

Entry              Description  Ada    Gina  Hiva   Nova  Sylva  Overall  Rank
Interim-all-prior  Best PK      17.0   2.33  27.1   4.7   0.59   10.35    1st
psmsx_jmlr_run_I   PSMS         16.86  2.41  28.01  5.27  0.62   10.63    2nd
Logitboost-trees   Best AL      16.6   3.53  30.1   4.69  0.78   11.15    8th

Models selected with PSMS for the different data sets [table shown in the slide]

19 Ensemble PSMS Idea: take advantage of the large number of models evaluated during the search to build ensemble classifiers. Problem: how to select partial solutions from PSMS so that they are accurate and diverse from each other. Motivation: the success of ensemble classifiers depends mainly on two key properties of the individual models: accuracy and diversity.

20 Ensemble PSMS How to select potential models for building ensembles? – BS: store the global best model in each iteration – BI: store the best model in each iteration – SE: combine the outputs of the final swarm How to fuse the outputs of the selected models? Simple (unweighted) voting [Diagram: fitness of the swarm across iterations I = 1, …, max_it]
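The unweighted voting fusion used by all three selection strategies (BS, BI, SE) can be sketched directly; the three sets of predictions below are invented for illustration:

```python
from collections import Counter

def majority_vote(predictions):
    """Unweighted vote fusion: each selected model casts one vote per
    example, and the ensemble outputs the most frequent label
    (ties broken by the first label seen, via Counter ordering)."""
    n_examples = len(predictions[0])
    fused = []
    for j in range(n_examples):
        votes = Counter(model_preds[j] for model_preds in predictions)
        fused.append(votes.most_common(1)[0][0])
    return fused

# Three hypothetical selected models (e.g. BI: best model per iteration)
# predicting binary labels for four test examples:
preds = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
fused = majority_vote(preds)
```

With these toy predictions the ensemble output is [1, 0, 1, 1]: the fused label disagrees with each individual model somewhere, which is exactly where diversity pays off.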

23 Experimental results Data: – 9 benchmark machine learning data sets (binary classification) – 1 object recognition data set (multiclass, 10 classes)

ID  Data set       Training  Testing  Features
1   Breast-cancer  200       77       9
2   Diabetes       468       300      8
3   Flare solar    666       400      9
4   German         700       300      20
5   Heart          170       100      13
6   Image          1300      1010     20
7   Splice         1000      2175     60
8   Thyroid        140       75       5
9   Titanic        150       2051     3
OR  SCEF           2378      3300     50

Hugo Jair Escalante, Manuel Montes, Enrique Sucar. Ensemble Particle Swarm Model Selection. Proceedings of the International Joint Conference on Neural Networks (IJCNN 2010 – WCCI 2010), pp. 1814–1821, IEEE, 2010 [Best Student Paper Award].

24 Experimental results: performance Benchmark data sets: better performance is obtained by the ensemble methods. Average accuracy over 10 trials of PSMS and EPSMS on the benchmark data:

ID    PSMS         EPSMS-BS     EPSMS-SE     EPSMS-BI
1     72.03±2.24   73.40±0.78   74.05±0.91   74.35±0.49
2     82.11±1.29   82.60±1.52   74.07±13.7   83.42±0.46
3     68.81±4.31   69.38±4.53   70.13±7.48   72.16±1.42
4     73.92±1.23   73.84±1.53   74.70±0.72   74.77±0.69
5     85.55±5.48   87.40±2.01   87.07±0.75   88.36±0.88
6     97.21±3.15   98.85±1.45   95.27±3.04   99.58±0.33
7     97.26±0.55   98.02±0.64   96.99±1.21   98.84±0.26
8     96.00±4.75   98.18±0.94   97.29±1.54   99.22±0.45
9     73.24±1.16   73.50±0.95   75.37±1.05   74.40±0.91
Avg.  82.90±2.68   83.91±1.59   82.77±3.38   85.01±0.65

Hugo Jair Escalante, Manuel Montes, Enrique Sucar. Ensemble Particle Swarm Model Selection. Proceedings of the International Joint Conference on Neural Networks (IJCNN 2010 – WCCI 2010), pp. 1814–1821, IEEE, 2010 [Best Student Paper Award].

25 Experimental results: diversity of the ensemble Diversity results:

ID    EPSMS-BS       EPSMS-SE       EPSMS-BI
1     0.2055±0.1498  0.5422±0.0550  0.5017±0.1149
2     0.3547±0.1711  0.6241±0.0619  0.5081±0.0728
3     0.1295±0.1704  0.4208±0.1357  0.4012±0.1071
4     0.3019±0.1732  0.5159±0.0596  0.4296±0.0490
5     0.2733±0.1714  0.5993±0.0925  0.5647±0.0655
6     0.7801±0.0818  0.7555±0.0524  0.8427±0.0408
7     0.5427±0.3230  0.7807±0.0585  0.8050±0.0294
8     0.6933±0.1558  0.8173±0.0626  0.8514±0.0403
9     0.7473±0.0089  0.7473±0.0089  0.7473±0.0089
Avg.  0.4476±0.1562  0.6448±0.0603  0.6280±0.0588

EPSMS-SE models are more diverse. Hugo Jair Escalante, Manuel Montes, Enrique Sucar. Ensemble Particle Swarm Model Selection. Proceedings of the International Joint Conference on Neural Networks (IJCNN 2010 – WCCI 2010), pp. 1814–1821, IEEE, 2010 [Best Student Paper Award].

26 Experimental results: region labeling Per-concept improvement of EPSMS variants over straight PSMS:

      PSMS        EPSMS-BS    EPSMS-SE    EPSMS-BI
AUC   91.53±6.8   93.27±5.6   92.79±7.4   94.05±5.3
MCC   69.58%      76.59%      79.13%      81.49%

27 Lessons learned Ensembles generated with EPSMS outperformed individual classifiers, including those selected with PSMS. The models evaluated by PSMS are both accurate and diverse from each other. More stable predictions are obtained with the ensemble version of PSMS.

28 Other applications of PSMS/EPSMS Successful: – Acute leukemia classification – Authorship verification (Spanish/English) – Authorship attribution – Region labeling – ML challenges Not successful: – Review recommendation (14 features) – Region labeling (~90 classes) – Sentiment analysis on speech signals (high p, small n) – Plagiarism detection (few samples) – ML challenges

29 Thank you!
