CLOP: A MATLAB® learning object package



What is CLOP? CLOP stands for Challenge Learning Object Package. It was developed for use in machine-learning challenges with hundreds of thousands of features and/or examples.

What is CLOP? CLOP is an object-oriented Matlab package using the “Spider” interface
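
Every object in the package follows the same pattern: construct it, then call train and test on it. Below is a minimal sketch of that workflow on toy data (a sketch only, assuming CLOP and the Spider are on the Matlab path and that kridge can be constructed with default hyperparameters):

% Toy data: 100 examples, 20 features, binary +/-1 labels:
X = rand(100, 20);
Y = sign(randn(100, 1));
dat = data(X, Y);                    % wrap inputs and labels into a data object
model = kridge;                      % any learning object works the same way
[resu, Model] = train(model, dat);   % fit; returns results and the trained model
tresu = test(Model, dat);            % apply the trained model (here, to the same data)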

DATA OBJECTS

data(X, Y)
% Load data:
X = load([data_dir 'gisette_train.data']);
Y = load([data_dir 'gisette_train.labels']);
% Create a data object and examine it:
dat = data(X, Y);
browse(dat, 2);
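
Once wrapped, the inputs and labels can be read back from the data object; a short sketch, assuming the standard Spider accessors get_x and get_y provided by the underlying Spider data class:

Xb = get_x(dat);   % retrieve the input matrix
Yb = get_y(dat);   % retrieve the label vector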

ALGORITHM OBJECTS

algo(hyperparam)
% Create data objects:
trainD = data(X, Y);
testD = data(Xt, Yt);
% Define some hyperparameters:
hyper = {'degree=3', 'shrinkage=0.1'};
% Create a kernel ridge regression model:
model = kridge(hyper);
% Train it and test it:
[resu, Model] = train(model, trainD);
tresu = test(Model, testD);
% Visualize the results:
roc(tresu);
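
The same two calls work unchanged for any other learning object. For instance, a sketch that swaps in the naive Bayes classifier used later in this deck, on the same data objects:

nb = naive;                               % naive Bayes, default hyperparameters
[nb_resu, nb_Model] = train(nb, trainD);  % same train call as for kridge
nb_tresu = test(nb_Model, testD);
roc(nb_tresu);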

COMPOUND MODELS

Preprocessing
% For example, create a smoothing kernel:
my_ker = gauss_ker({'dim1=11', 'dim2=11', 'sigma1=2', 'sigma2=2'});
show(my_ker);
% Create a preprocessing object of type convolve:
my_prepro = convolve(my_ker);
% Perform the preprocessing and visualize the results:
d = train(my_prepro, dat);
browse(d, 2);

chain({model1, model2, ...})
ensemble({model1, model2, ...})
% Combine preprocessing and kernel ridge regression:
model = chain({my_prepro, kridge(hyper)});
% Combine replicas of a base learner:
for k = 1:10
    base_model{k} = chain({my_prepro, naive});
end
my_model = ensemble(base_model);
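
Chains are not limited to two elements. The benchmark models later in this deck put normalization, feature selection, and a classifier into a single chain; for example, the GISETTE baseline shown below:

my_classif = svc({'coef0=1', 'degree=3', 'gamma=0', 'shrinkage=1'});
my_model = chain({normalize, s2n('f_max=1000'), my_classif});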

BASIC METHODS

train(model, trainD)
test(Model, testD)
% After creating your complex model, training takes just one command: train
model = ensemble({chain({standardize, kridge(hyper)}), chain({normalize, naive})});
[resu, Model] = train(model, trainD);
% After training your complex model, testing takes just one command: test
tresu = test(Model, testD);
% You can wrap your model in a "cv" object to perform cross-validation:
cv_model = cv(my_model);
% Just call train and test on it!
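
A sketch of that cross-validation call, assuming the cv object keeps its default number of folds and follows the same train interface as every other CLOP object:

cv_model = cv(my_model);
[cv_resu, cv_Model] = train(cv_model, trainD);  % runs the cross-validation loop
% cv_resu holds the per-fold results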

BASIC OBJECTS

Some CLOP objects:
- Basic learning machines: e.g. kridge, naive, svc, neural, rf, gentleboost
- Feature selection, pre- and post-processing: e.g. s2n, relief, probe, standardize, normalize, convolve, bias
- Compound models: chain, ensemble, cv
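
One object from each category, constructed and combined (object names and hyperparameters are taken from elsewhere in this deck; the particular combination and the value f_max=100 are only illustrative):

learner = kridge({'degree=3', 'shrinkage=0.1'});  % basic learning machine
prepro  = standardize;                            % preprocessing
fselect = s2n('f_max=100');                       % feature selection
model   = chain({prepro, fselect, learner});      % compound model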

BENCHMARKS

NIPS 2003 Feature Selection Challenge

Class taught at ETH Zurich, winter 2005. Task of the students: a baseline method was provided, with performance BER0 using n0 features; obtain BER < BER0, or BER = BER0 with n < n0. Extra credit for beating the best challenge entry.

Dataset    Size     Type            Features  Training ex.  Validation ex.  Test ex.
Arcene     8.7 MB   Dense           10,000    100           100             700
Gisette    22.5 MB  Dense           5,000     6,000         1,000           6,500
Dexter     0.9 MB   Sparse integer  20,000    300           300             2,000
Dorothea   4.7 MB   Sparse binary   100,000   800           350             800
Madelon    2.9 MB   Dense           500       2,000         600             1,800

ARCENE: Best BER = 11.9 ± 1.2% - n0=1100 (11%) - BER0=14.7%
my_svc=svc({'coef0=1', 'degree=3', 'gamma=0', 'shrinkage=0.1'});
my_model=chain({standardize, s2n('f_max=1100'), normalize, my_svc});

GISETTE: Best BER = 1.26 ± 0.14% - n0=1000 (20%) - BER0=1.80%
my_classif=svc({'coef0=1', 'degree=3', 'gamma=0', 'shrinkage=1'});
my_model=chain({normalize, s2n('f_max=1000'), my_classif});

DEXTER: Best BER = 3.30 ± 0.40% - n0=300 (1.5%) - BER0=5%
my_classif=svc({'coef0=1', 'degree=1', 'gamma=0', 'shrinkage=0.5'});
my_model=chain({s2n('f_max=300'), normalize, my_classif});

DOROTHEA: Best BER = 8.54 ± 0.99% - n0=1000 (1%) - BER0=12.37%
my_model=chain({TP('f_max=1000'), naive, bias});

MADELON: Best BER = 6.22 ± 0.57% - n0=20 (4%) - BER0=7.33%
my_classif=svc({'coef0=1', 'degree=0', 'gamma=1', 'shrinkage=1'});
my_model=chain({probe(relief,{'p_num=2000', 'pval_max=0'}), standardize, my_classif});

(The slide also shows a sample of raw data for each dataset; e.g. DEXTER, a text-classification task, is illustrated with a truncated news-wire snippet: "NEW YORK, October 2, 2001 - Instinet Group Incorporated (Nasdaq: INET), the world's largest electronic agency securities broker, today announced tha")

Reference: Competitive baseline methods set new standards for the NIPS 2003 feature selection benchmark, Isabelle Guyon, Jiwen Li, Theodor Mader, Patrick A. Pletscher, Georg Schneider and Markus Uhr, Pattern Recognition Letters, Volume 28, Issue 12, 1 September 2007.

NIPS 2006 Model Selection Game

Dataset  Domain
ADA      Marketing
GINA     Digit recognition
HIVA     Drug discovery
NOVA     Text classification
SYLVA    Ecology

First place: Juha Reunanen, cross-indexing-7

Dataset  CLOP models selected
ADA      2*{sns,std,norm,gentleboost(neural),bias}; 2*{std,norm,gentleboost(kridge),bias}; 1*{rf,bias}
GINA     6*{std,gs,svc(degree=1)}; 3*{std,svc(degree=2)}
HIVA     3*{norm,svc(degree=1),bias}
NOVA     5*{norm,gentleboost(kridge),bias}
SYLVA    4*{std,norm,gentleboost(neural),bias}; 4*{std,neural}; 1*{rf,bias}

Second place: Hugo Jair Escalante Balderas, BRun

Dataset  CLOP model selected
ADA      {sns, std, norm, neural(units=5), bias}
GINA     {norm, svc(degree=5, shrinkage=0.01), bias}
HIVA     {std, norm, gentleboost(kridge), bias}
NOVA     {norm, gentleboost(neural), bias}
SYLVA    {std, norm, neural(units=1), bias}
Note: the entry Boosting_1_001_x900 gave better results, but was older.

sns = shift'n'scale, std = standardize, norm = normalize (some details of hyperparameters not shown).

(The slide also shows a sample of NOVA text-classification data: a Usenet post about goalie masks.)

References: PSMS for Neural Networks, H. Jair Escalante, Manuel Montes y Gómez, and Luis Enrique Sucar, Proc. IJCNN07, Orlando, FL, Aug. 2007; Model Selection and Assessment Using Cross-indexing, Juha Reunanen.

Credits. The Challenge Learning Object Package (CLOP) is based on code to which many people have contributed:
- The developers of CLOP: Isabelle Guyon and Amir Reza Saffari Azar.
- The creators of The Spider: Jason Weston, André Elisseeff, Gökhan BakIr, Fabian Sinz.
- The developers of the packages attached to CLOP: Olivier Chapelle, Hugo Jair Escalante Balderas (PSMS), Gavin Cawley (LSSVM), Chih-Chung Chang and Chih-Jen Lin (LIBSVM), Jun-Cheng Chen, Kuan-Jen Peng, Chih-Yuan Yang, Chih-Huai Cheng, and Rong-En Fan (LIBSVM Matlab interface), Junshui Ma and Yi Zhao (second LIBSVM Matlab interface), Leo Breiman and Adele Cutler (Random Forests), Ting Wang (RF Matlab interface), Ian Nabney and Christopher Bishop (NETLAB).
- The contributors to other Spider functions or packages: Thorsten Joachims (SVMLight), Chih-Chung Chang and Chih-Jen Lin (LIBSVM), Ronan Collobert (SVM Torch II), Jez Hill, Jan Eichhorn, Rodrigo Fernandez, Holger Froehlich, Gorden Jemwa, Kiyoung Yang, Chirag Patel, Sergio Rojas.
- The authors of the Weka package and the R project, who made code available that was interfaced to Matlab and made accessible to CLOP.

Book with CLOP and datasets: Feature Extraction: Foundations and Applications, Isabelle Guyon, Steve Gunn, et al., Eds., Springer, 2006.
- CD including CLOP and the data of the NIPS 2003 challenge
- Tutorial chapters
- Invited papers on the best results of the challenge