RESULTS OF THE NIPS 2006 MODEL SELECTION GAME Isabelle Guyon, Amir Saffari, Gideon Dror, Gavin Cawley, Olivier Guyon, and many other volunteers, see

Thanks

Part I INTRODUCTION

Model selection
– Selecting models (neural net, decision tree, SVM, …)
– Selecting hyperparameters (number of hidden units, weight decay/ridge, kernel parameters, …)
– Selecting variables or features (space dimensionality reduction).
– Selecting patterns (data cleaning, data reduction, e.g. by clustering).

Performance prediction challenge
– How good are you at predicting how good you are?
– Practically important in pilot studies.
– Good performance predictions render model selection trivial.

Model Selection Game
– Find which model works best in a well-controlled environment.
– A given "sandbox": the CLOP Matlab® toolbox.
– Focus only on devising the model selection strategy.
– Same datasets as the performance prediction challenge, but "reshuffled".
– Two $500 prizes offered.

Agnostic Learning vs. Prior Knowledge challenge
When everything else fails, ask for additional domain knowledge… Two tracks:
– Agnostic learning: preprocessed datasets in a nice "feature-based" representation, but no knowledge about the identity of the features.
– Prior knowledge: raw data, sometimes not in a feature-based representation. Information given about the nature and structure of the data.

Game rules
– Date started: October 1st. Date ended: December 1st, 2006. Duration: 3 months.
– Submit in the Agnostic track only.
– Optionally use CLOP or the Spider.
– Five last complete entries ranked:
  – Total ALvsPK challenge entrants: 22.
  – Total ALvsPK development entries: 546.
  – Number of game ranked participants: 10.
  – Number of game ranked submissions: 39.

Datasets
– ADA: marketing, dense features
– GINA: digit recognition, dense features
– HIVA: drug discovery, dense features
– NOVA: text classification, sparse binary features
– SYLVA: ecology, dense features

Baseline BER distribution (performance prediction challenge, 145 entrants); x-axis: test BER.

Agnostic track on Dec. 1st, 2006
Yellow: used a CLOP model.
CLOP prize winner: Juha Reunanen (both ave. rank and ave. BER).
Best ave. BER still held by Reference (Gavin Cawley) with the_bad.

Part II PROTOCOL and SCORING

Protocol
– Data split: training/validation/test.
– Data proportions: 10/1/100.
– Online feedback on validation data.
– Validation label release: not yet released; scheduled one month before the end of the challenge.
– Final ranking on test data, using the five last complete submissions of each entrant.

Performance metrics
– Balanced Error Rate (BER): the average of the error rates on the positive class and the negative class.
– Area Under the ROC Curve (AUC).
– Guess error (for the performance prediction challenge only): ΔBER = |testBER – guessedBER|.
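As a concrete illustration (not the official challenge scoring code), a minimal Matlab sketch of the BER computation, assuming the true labels y and the predictions yhat are vectors coded as -1/+1:

function ber = balanced_error_rate(y, yhat)
  % Balanced Error Rate: average of the per-class error rates.
  err_pos = mean(yhat(y == +1) ~= +1);   % error rate on the positive class
  err_neg = mean(yhat(y == -1) ~= -1);   % error rate on the negative class
  ber = 0.5 * (err_pos + err_neg);
end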

CLOP
CLOP = Challenge Learning Object Package.
Based on the Spider package developed at the Max Planck Institute.
Two basic abstractions:
– Data object
– Model object

CLOP tutorial  D=data(X,Y);  hyper = {'degree=3', 'shrinkage=0.1'};  model = kridge(hyper);  [resu, model] = train(model, D);  tresu = test(model, testD);  model = chain({standardize,kridge(hyper)}); At the Matlab prompt:

CLOP models

Preprocessing and FS

Model grouping
for k = 1:10
  base_model{k} = chain({standardize, naive});  % base model: standardization followed by the naive (Bayes) classifier
end
my_model = ensemble(base_model);                % group the ten base models into an ensemble
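The grouped model can then be trained and tested like any single model; a short sketch reusing the train and test calls from the CLOP tutorial above (D and testD as defined there), assuming the ensemble object supports the same interface:

> [resu, my_model] = train(my_model, D);   % trains each base model and the ensemble on D
> tresu = test(my_model, testD);           % applies the trained ensemble to the test set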

Part III RESULT ANALYSIS

What did we expect?
– Learn about new competitive machine learning techniques.
– Identify competitive methods of performance prediction, model selection, and ensemble learning (theory put into practice).
– Drive research in the direction of refining such methods (on-going benchmark).

Method comparison (PPC): ΔBER vs. test BER. Agnostic track: no significant improvement so far.

LS-SVM Gavin Cawley, July 2006

Logitboost Roman Lutz, July 2006

CLOP models (best entrant)
Juha Reunanen, cross-indexing-7
– ADA: 2*{sns, std, norm, gentleboost(neural), bias}; 2*{std, norm, gentleboost(kridge), bias}; 1*{rf, bias}
– GINA: 6*{std, gs, svc(degree=1)}; 3*{std, svc(degree=2)}
– HIVA: 3*{norm, svc(degree=1), bias}
– NOVA: 5*{norm, gentleboost(kridge), bias}
– SYLVA: 4*{std, norm, gentleboost(neural), bias}; 4*{std, neural}; 1*{rf, bias}
sns = shift'n'scale, std = standardize, norm = normalize (some details of hyperparameters not shown)

CLOP models (2nd best entrant)
Hugo Jair Escalante Balderas, BRun
– ADA: {sns, std, norm, neural(units=5), bias}
– GINA: {norm, svc(degree=5, shrinkage=0.01), bias}
– HIVA: {std, norm, gentleboost(kridge), bias}
– NOVA: {norm, gentleboost(neural), bias}
– SYLVA: {std, norm, neural(units=1), bias}
sns = shift'n'scale, std = standardize, norm = normalize (some details of hyperparameters not shown)
Note: entry Boosting_1_001_x900 gave better results, but was older.
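For concreteness, a hedged sketch of how a chain like the ADA model above might be written at the Matlab prompt, following the calling convention of the tutorial; the constructor names shift_n_scale, normalize, neural and bias, and the hyperparameter string 'units=5', are assumptions inferred from the abbreviations above rather than verified CLOP signatures:

> model = chain({shift_n_scale, standardize, normalize, neural({'units=5'}), bias});
> [resu, model] = train(model, D);   % D: a CLOP data object holding the ADA training set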

Danger of overfitting (PPC): BER vs. time (days) for ADA, GINA, HIVA, NOVA, and SYLVA. Full line: test BER; dashed line: validation BER.

Two best CLOP entrants (game): average test BER vs. time for H. Jair Escalante and Juha Reunanen. Statistically significant difference for 3/5 datasets.

Stats / CV / bounds ???

Top ranking methods
Performance prediction:
– CV with many splits, 90% train / 10% validation (see the sketch after this slide)
– Nested CV loops
Model selection:
– Performance prediction challenge:
  Use of a single model family
  Regularized risk / Bayesian priors
  Ensemble methods
  Nested CV loops, computationally efficient with VLOO
– Model selection game:
  Cross-indexing
  Particle swarm
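As an illustration of the repeated-split idea in the first bullet (not any entrant's actual code), a minimal Matlab sketch of model selection with many random 90%/10% splits; candidate_fits is a hypothetical cell array of function handles, each training one candidate model and returning its validation BER:

function best_idx = select_by_repeated_splits(candidate_fits, X, Y, n_splits)
  % Each handle in candidate_fits takes (Xtr, Ytr, Xva, Yva) and returns the
  % BER of one candidate model trained on the 90% split, evaluated on the 10% split.
  n = size(X, 1);
  avg_ber = zeros(1, numel(candidate_fits));
  for s = 1:n_splits
    perm = randperm(n);
    n_val = round(0.1 * n);                        % 10% held out for validation
    va = perm(1:n_val);  tr = perm(n_val+1:end);   % remaining 90% for training
    for m = 1:numel(candidate_fits)
      ber = candidate_fits{m}(X(tr,:), Y(tr), X(va,:), Y(va));
      avg_ber(m) = avg_ber(m) + ber / n_splits;    % accumulate the average BER
    end
  end
  [~, best_idx] = min(avg_ber);                    % candidate with lowest average validation BER
end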

Part IV COMPETE NOW in the PRIOR KNOWLEDGE TRACK

ADA
ADA is the marketing database.
Task: discover high-revenue people from census data. Two-class problem.
Source: Census Bureau, "Adult" database from the UCI machine-learning repository.
Features: 14 original attributes including age, workclass, education, marital status, occupation, native country. Continuous, binary and categorical features.

GINA
GINA is the digit database.
Task: handwritten digit recognition; separate the odd from the even digits. Two-class problem with heterogeneous classes.
Source: MNIST database formatted by LeCun and Cortes.
Features: 28x28 pixel map.

HIVA
HIVA is the HIV database.
Task: find compounds active against the AIDS HIV infection. We brought it back to a two-class problem (active vs. inactive), but provide the original labels (active, moderately active, and inactive).
Data source: National Cancer Institute.
Data representation: the compounds are represented by their 3D molecular structure.

NOVA
NOVA is the text classification database.
Task: classify newsgroup emails into politics or religion vs. other topics.
Source: the 20-Newsgroup dataset from the UCI machine-learning repository.
Data representation: the raw text, with an estimated … words of vocabulary.
Example message:
Subject: Re: Goalie masks
Lines: 21
Tom Barrasso wore a great mask, one time, last season. He unveiled it at a game in Boston. It was all black, with Pgh city scenes on it. The "Golden Triangle" graced the top, along with a steel mill on one side and the Civic Arena on the other. On the back of the helmet was the old Pens' logo, the current (at the time) Pens logo, and a space for the "new" logo. A great mask done in by a goalie's superstition.
Lori

SYLVA
SYLVA is the ecology database.
Task: classify forest cover types into Ponderosa pine vs. everything else.
Source: US Forest Service (USFS).
Data representation: forest cover type for 30 x 30 meter cells, encoded with 108 features (elevation, hill shade, wilderness type, soil type, etc.).

How to enter?
– Enter results on any dataset in either track until March 1st, 2007, on the challenge website.
– Only "complete" entries (on 5 datasets) will be ranked. The 5 last will count.
– Seven prizes:
  – Best overall agnostic entry.
  – Best overall prior knowledge entry.
  – Best prior knowledge result on each dataset (5 prizes).
  – Best paper.

Conclusions
– Lower participation volume than in the previous challenges:
  – Higher entry level
  – Other on-going competitions
– Top methods in the agnostic track as before:
  – LS-SVMs and boosted logistic trees
– Top ranking entries closely followed by CLOP entries, showing great advances in model selection.
– To do: upgrade CLOP with LS-SVMs and logitboost.

Open problems
Bridge the gap between theory and practice…
– What are the best estimators of the variance of CV?
– What should k be in k-fold?
– Are other cross-validation methods better than k-fold (e.g. bootstrap, 5x2CV)?
– Are there better "hybrid" methods?
– What search strategies are best?
– More than 2 levels of inference?