Baseline Methods for the Feature Extraction Class
Isabelle Guyon

BACKGROUND
We present supplementary course material complementing the book "Feature Extraction, Fundamentals and Applications", I. Guyon et al., Eds., to appear in Springer. Classical feature extraction algorithms were reviewed in class. More attention was given to feature selection than to feature construction, because of the recent success of methods involving a large number of "low-level" features. The book includes the results of the NIPS 2003 feature selection challenge. The students learned the techniques employed by the best challengers and tried to match the best performances. A Matlab® toolbox was provided with sample code, and the students could make post-challenge entries to the challenge website.

Challenge: good performance + few features.
Tasks: two-class classification.
Data split: training/validation/test.
Valid entry: results on all 5 datasets.

DATASETS
Dataset    Size (MB)  Type    Features  Training  Validation  Test
Arcene     8.7        Dense   10000     100       100         700
Gisette    22.5       Dense   5000      6000      1000        6500
Dexter     0.9        Sparse  20000     300       300         2000
Dorothea   4.7        Sparse  100000    800       350         800
Madelon    2.9        Dense   500       2000      600         1800

METHODS
Scoring: ranking according to the test set balanced error rate (BER), i.e. the average of the positive-class and negative-class error rates (a minimal sketch of the computation is given under EXAMPLES below). Ties are broken by the feature set size.
Learning objects: CLOP learning objects, implemented in Matlab®. Two simple abstractions: data and algorithm.
Download:
Task of the students: a baseline method is provided, with performance BER0 and n0 features. Get BER < BER0, or BER = BER0 with n < n0. Extra credit for beating the best challenge entry. It is OK to use the validation set labels for training.

BASELINE MODELS
Sketches of the s2n and relief selection criteria and of the train/test cycle for these chains are given under EXAMPLES below.

ARCENE (cancer diagnosis)
Best BER = 11.9 ± 1.2% - n0 = 1100 (11%) - BER0 = 14.7%
my_svc=svc({'coef0=1', 'degree=3', 'gamma=0', 'shrinkage=0.1'});
my_model=chain({standardize, s2n('f_max=1100'), normalize, my_svc})

DEXTER (text categorization)
Best BER = 3.30 ± 0.40% - n0 = 300 (1.5%) - BER0 = 5%
my_classif=svc({'coef0=1', 'degree=1', 'gamma=0', 'shrinkage=0.5'});
my_model=chain({s2n('f_max=300'), normalize, my_classif})

DOROTHEA (drug discovery)
Best BER = 8.54 ± 0.99% - n0 = 1000 (1%) - BER0 = 12.37%
my_model=chain({TP('f_max=1000'), naive, bias});

GISETTE (digit recognition)
Best BER = 1.26 ± 0.14% - n0 = 1000 (20%) - BER0 = 1.80%
my_classif=svc({'coef0=1', 'degree=3', 'gamma=0', 'shrinkage=1'});
my_model=chain({normalize, s2n('f_max=1000'), my_classif});

MADELON (artificial data)
Best BER = 6.22 ± 0.57% - n0 = 20 (4%) - BER0 = 7.33%
my_classif=svc({'coef0=1', 'degree=0', 'gamma=1', 'shrinkage=1'});
my_model=chain({probe(relief,{'p_num=2000', 'pval_max=0'}), standardize, my_classif})

RESULTS
[Result figures, one per dataset: ARCENE (cancer diagnosis), DEXTER (text categorization; the slide shows an excerpt of an October 2001 Instinet press release as an example document), GISETTE (digit recognition), DOROTHEA (drug discovery), MADELON (artificial data).]

CONCLUSIONS
The performances of the challengers could be matched with the CLOP library. Simple filter methods (S2N and Relief) were sufficient to obtain a dimensionality reduction comparable to what the winners achieved. SVMs are easy to use and generally work better than other methods. For Gisette, we experimented with adding prior knowledge about the task and could outperform the winners. Further work includes using prior knowledge for the other datasets.
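
EXAMPLES

Scoring: the balanced error rate used above can be reproduced in a few lines of Matlab. A minimal sketch, assuming the +/-1 label convention; the function name ber is illustrative and not part of CLOP:

    function b = ber(y_true, y_pred)
    % Balanced error rate: the average of the per-class error rates.
    % y_true, y_pred: vectors of +/-1 labels (assumed convention).
    err_pos = mean(y_pred(y_true == 1) ~= 1);    % positive-class error rate
    err_neg = mean(y_pred(y_true == -1) ~= -1);  % negative-class error rate
    b = 0.5 * (err_pos + err_neg);
    end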
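Feature ranking: the s2n filter used for Arcene, Dexter and Gisette ranks features by the signal-to-noise criterion |mu(+) - mu(-)| / (sigma(+) + sigma(-)), computed per feature, and keeps the f_max top-ranked ones. A sketch of the criterion only, not CLOP's actual implementation:

    function idx = s2n_select(X, Y, f_max)
    % X: (examples x features) matrix, Y: vector of +/-1 labels.
    Xp = X(Y == 1, :);   % positive-class examples
    Xm = X(Y == -1, :);  % negative-class examples
    s = abs(mean(Xp, 1) - mean(Xm, 1)) ./ (std(Xp, 0, 1) + std(Xm, 0, 1) + eps);
    [~, order] = sort(s, 'descend');  % best features first
    idx = order(1:f_max);             % indices of the selected features
    end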
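Relief: the relief object used for Madelon scores a feature highly if it separates each example from its nearest example of the other class (nearest miss) more than from its nearest example of the same class (nearest hit). A simplified single-neighbor sketch with L1 distances, not CLOP's implementation:

    function score = relief_score(X, y)
    % X: (examples x features) matrix, y: vector of +/-1 labels.
    [n, d] = size(X);
    score = zeros(1, d);
    for i = 1:n
      dist = sum(abs(X - X(i, :)), 2);   % L1 distance to every example
      dist(i) = inf;                     % exclude the example itself
      same = (y == y(i));
      dhit = dist;  dhit(~same) = inf;   % candidates: same class only
      dmiss = dist; dmiss(same) = inf;   % candidates: other class only
      [~, hit] = min(dhit);
      [~, miss] = min(dmiss);
      score = score + abs(X(i, :) - X(miss, :)) - abs(X(i, :) - X(hit, :));
    end
    end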
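Putting it together: a baseline chain is trained and tested like any other CLOP learning object. A sketch of the train/test cycle, assuming the Spider-style data/train/test interface that CLOP inherits; the variables Xtrain, Ytrain, Xvalid, Yvalid are placeholders for the loaded matrices:

    % Gisette baseline from above
    my_classif = svc({'coef0=1', 'degree=3', 'gamma=0', 'shrinkage=1'});
    my_model = chain({normalize, s2n('f_max=1000'), my_classif});
    Dtrain = data(Xtrain, Ytrain);             % wrap matrices in a data object
    Dvalid = data(Xvalid, Yvalid);
    [res, trained] = train(my_model, Dtrain);  % fit the whole chain
    out = test(trained, Dvalid);               % data object carrying predictions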