MILESTONE RESULTS Mar. 1st, 2007


1 MILESTONE RESULTS Mar. 1st, 2007
Agnostic Learning vs. Prior Knowledge challenge Isabelle Guyon, Amir Saffari, Gideon Dror, Gavin Cawley, Olivier Guyon, and many other volunteers, see

2 Thanks

3 Part I DATASETS

4 Datasets http://www.agnostic.inf.ethz.ch

Dataset  Domain           Type           Features  Training examples  Validation examples  Test examples
ADA      Marketing        Dense                48               4147                  415          41471
GINA     Digits           Dense               970               3153                  315          31532
HIVA     Drug discovery   Dense              1617               3845                  384          38449
NOVA     Text classif.    Sparse binary     16969               1754                  175          17537
SYLVA    Ecology          Dense               216              13086                 1308         130858

5 ADA is the marketing database
Task: Discover high revenue people from census data. Two-class pb. Source: Census bureau, “Adult” database from the UCI machine-learning repository. Features: 14 original attributes including age, workclass, education, marital status, occupation, native country. Continuous, binary and categorical features.

6 GINA is the digit database
Task: Handwritten digit recognition. Separate the odd from the even digits. Two-class pb. with heterogeneous classes. Source: MNIST database formatted by LeCun and Cortes. Features: 28x28 pixel map.

7 HIVA is the HIV database
Task: Find compounds active against the AIDS HIV infection. We brought it back to a two-class pb. (active vs. inactive), but provide the original labels (active, moderately active, and inactive). Data source: National Cancer Inst. Data representation: The compounds are represented by their 3d molecular structure.

8 NOVA is the text classification database
Example message: Subject: Re: Goalie masks Lines: 21 Tom Barrasso wore a great mask, one time, last season.  He unveiled it at a game in Boston.  It was all black, with Pgh city scenes on it. The "Golden Triangle" graced the top, along with a steel mill on one side and the Civic Arena on the other.   On the back of the helmet was the old Pens' logo the current (at the time) Pens logo, and a space for the "new" logo. A great mask done in by a goalie's superstition. Lori
Task: Classify newsgroup emails into politics or religion vs. other topics. Source: The 20-Newsgroup dataset from the UCI machine-learning repository. Data representation: The raw text with an estimated words of vocabulary.

9 SYLVA is the ecology database
Task: Classify forest cover types into Ponderosa pine vs. everything else. Source: US Forest Service (USFS). Data representation: Forest cover type for 30 x 30 meter cells encoded with 108 features (elevation, hill shade, wilderness type, soil type, etc.)

10 Previous Challenges
The same datasets were used in:
The WCCI 2006 performance prediction challenge. How good are you at predicting how good you are? Practically important in pilot studies. Good performance predictions render model selection trivial. Nature of datasets and features unknown to participants.
The NIPS 2006 model selection game. Which model works best in a well controlled environment? A given “sandbox”: the CLOP Matlab® toolbox. Focus only on devising model selection strategy. Same datasets as WCCI 2006 challenge, different shuffling.

11 Agnostic Learning vs. Prior Knowledge challenge
When everything else fails, ask for additional domain knowledge… Two tracks: Agnostic learning: Preprocessed datasets in a nice “feature-based” representation, but no knowledge about the identity of the features. Prior knowledge: Raw data, sometimes not in a feature-based representation. Information given about the nature and structure of the data.

12 Part II PROTOCOL and SCORING

13 Protocol
Data split: training/validation/test. Data proportions: 10/1/100. Online feedback on validation data (1st phase). Validation labels released in February 2007. Challenge extended until August 1st, 2007. Final ranking on test data using the last five complete submissions of each entrant.
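The 10/1/100 proportions can be illustrated with a minimal sketch. The function name `split_indices`, the random shuffling, and the rounding rule are assumptions made for illustration; the slides do not say how the organizers actually produced the split.

```python
import random


def split_indices(n_examples, proportions=(10, 1, 100), seed=0):
    """Cut shuffled example indices into training/validation/test parts.

    Hypothetical helper: shuffling and rounding are illustrative assumptions,
    not the organizers' documented procedure.
    """
    total = sum(proportions)
    n_train = round(n_examples * proportions[0] / total)
    n_valid = round(n_examples * proportions[1] / total)
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]  # everything that is left
    return train, valid, test


if __name__ == "__main__":
    # ADA has 4147 + 415 + 41471 = 46033 examples in the datasets table.
    train, valid, test = split_indices(46033)
    print(len(train), len(valid), len(test))
```

With the ADA total of 46033 examples, these proportions give parts of the same sizes as the ADA row of the datasets table.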

14 Performance metrics
Balanced Error Rate (BER): average of error rates of positive class and negative class. Area Under the ROC Curve (AUC). Guess error (for the performance prediction challenge only): dBER = abs(testBER – guessedBER)
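The three metrics can be written down as a short sketch. The +1/-1 label encoding and the function names are assumptions for illustration, and the AUC is computed with the pairwise-comparison definition (ties count one half), which is one standard way to obtain it and not necessarily the organizers' code.

```python
def balanced_error_rate(y_true, y_pred):
    """Average of the error rates on the positive and the negative class."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == -1]
    err_pos = sum(1 for p in pos if p != 1) / len(pos)
    err_neg = sum(1 for p in neg if p != -1) / len(neg)
    return 0.5 * (err_pos + err_neg)


def auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly by the scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == -1]
    wins = 0.0
    for s_pos in pos:
        for s_neg in neg:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5  # a tie counts as half a correct pair
    return wins / (len(pos) * len(neg))


def guess_error(test_ber, guessed_ber):
    """dBER = abs(testBER - guessedBER)."""
    return abs(test_ber - guessed_ber)
```

Unlike the plain error rate, the BER gives both classes the same weight regardless of how many examples each class has.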

15 Ranking
Compute an overall score: For each dataset, regardless of the track, rank all the entries with “test BER”. Score=entry_rank/max_rank. Overall_score=average score over datasets. Keep only the last five complete entries of each participant, regardless of track.
Individual dataset ranking: For each dataset, make one ranking for each track using “test BER”.
Overall ranking: Rank the entries separately in each track with their overall score. Entries having “prior knowledge” results for at least one dataset are entered in the “prior knowledge” track.
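The overall score described above can be sketched as follows. The function name `overall_scores`, the input layout, and the tie handling (ties keep their input order) are assumptions; the slides do not specify how ties are ranked. Every entry is assumed to be complete, that is, to have a test BER for every dataset.

```python
def overall_scores(test_ber):
    """Map each entry to its average of entry_rank / max_rank over datasets.

    test_ber: dict of entry name -> dict of dataset name -> test BER.
    Lower BER gives a lower (better) rank, so a lower score is better.
    """
    entries = list(test_ber)
    datasets = sorted({d for e in entries for d in test_ber[e]})
    totals = {e: 0.0 for e in entries}
    for dataset in datasets:
        # Rank 1 goes to the entry with the smallest test BER on this dataset.
        ordered = sorted(entries, key=lambda e: test_ber[e][dataset])
        max_rank = len(ordered)
        for rank, entry in enumerate(ordered, start=1):
            totals[entry] += rank / max_rank
    return {e: totals[e] / len(datasets) for e in entries}


if __name__ == "__main__":
    # Made-up BER values for three hypothetical entries on two datasets.
    example = {
        "entry_a": {"ADA": 0.17, "GINA": 0.05},
        "entry_b": {"ADA": 0.18, "GINA": 0.03},
        "entry_c": {"ADA": 0.20, "GINA": 0.06},
    }
    print(overall_scores(example))
```

Because the score averages normalized ranks and not BER values, an entry can have the best overall score without being the best on any single dataset.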

16 Part III RESULT ANALYSIS

17 Challenge statistics
Date started: October 1st, 2006. Milestone (NIPS 06): December 1st, 2006. Milestone: March 1st, 2007. Date will end: August 1st, 2007. Duration up to now: 5 months. Five last complete entries ranked (March 1st): Total ALvsPK challenge entrants: 35. Total ALvsPK development entries: 77 prior agnos. Number of ranked participants: 11 (prior), 15 (agnos). Number of ranked submissions: 22 prior + 28 agnos. The fact that there are fewer prior knowledge entries makes the results on prior knowledge stronger.

18 BER distribution (March 1st)
Panels: Agnostic learning, Prior knowledge. Since fewer submissions were made for PK, its superiority is more pronounced. The black vertical line indicates the best ranked entry (only the last 5 entries of each participant were ranked). Beware of overfitting!

19 Milestone results
Agnostic learning ranks as of December 1st, 2006. Yellow: CLOP model. CLOP prize winner: Juha Reunanen (both ave. rank and ave. BER). Best ave. BER held by Reference (Gavin Cawley) with “the bad”.

20 Milestone results (cont.)
Agnostic learning best ranked entries as of March 1st, 2007 Best ave. BER still held by Reference (Gavin Cawley) with “the bad”. Note that the best entry for each dataset is not necessarily the best entry overall. Some of the best agnostic entries of individual datasets were made as part of prior knowledge entries (the bottom four); there is no corresponding overall agnostic ranking.

21 Milestone results (cont.)
Prior knowledge best ranked entries as of March 1st, 2007 Best ave. BER held by Reference (Gavin Cawley) with “interim all prior”. Note that the overall entry ranking is performed with the overall score (average rank over all datasets). The best performing complete entry may not contain all the best performing entries on the individual datasets.

22 Individual dataset leaders
Agnostic learning:
ADA: Roman Lutz with LogitBoost with trees
GINA: Roman Lutz with Doubleboost
HIVA: Vojtech Franc with SVM-RBF
NOVA: Roman Lutz with Doubleboost
SYLVA: Roman Lutz with LogitBoost with trees
Prior knowledge:
ADA: Marc Boulle with Data Grid (Coclustering)
GINA: Vojtech Franc with SVM-RBF
HIVA: Chloe Azencott with final svm # 2
NOVA: Jorge Sueiras with Boost mix
SYLVA: Roman Lutz with Doubleboost

23 AL vs. PK, who wins?
We compare the best results of the ranked entries for entrants who entered both tracks. If the Agnostic Learning BER is larger than the Prior Knowledge BER, “1” is shown in the table. The p-value of the sign test reveals that PK is not significantly better than AL, except for SYLVA. We need more entrants who enter both tracks to get conclusive results from that test.
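A sign test of this kind can be sketched as below. The one-sided alternative (PK better than AL), the dropping of ties, and the function name are assumptions; the slides do not state which variant of the test was used. Any BER values passed in are per-entrant best results on one dataset.

```python
from math import comb


def sign_test_pvalue(al_ber, pk_ber):
    """One-sided sign test: P(at least k wins for PK out of n) under p = 0.5.

    A "win" is an entrant whose agnostic learning BER is larger than the
    prior knowledge BER (the "1" in the table). Ties are dropped (assumption).
    """
    signs = [1 if a > p else 0 for a, p in zip(al_ber, pk_ber) if a != p]
    n = len(signs)
    k = sum(signs)
    if n == 0:
        return 1.0  # no informative pairs, nothing can be concluded
    # Binomial(n, 0.5) upper tail: sum of C(n, i) for i = k..n, divided by 2^n.
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
```

With few entrants in both tracks the smallest attainable p-value is 1 / 2^n, which is why more paired entrants are needed before the test can be conclusive.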

24 Learning Curves (Oct 1st – Mar 1st)
[Figure: best BER on test data as a function of time; panels ADA and GINA. Blue: agnostic learning. Red: prior knowledge.]

25 Learning Curves (Oct 1st – Mar 1st)
[Figure: best BER on test data as a function of time; panels HIVA, NOVA and SYLVA. Blue: agnostic learning. Red: prior knowledge.]

26 How to enter? Enter results on any dataset in either track until August 1st 2007 at Only “complete” entries (on 5 datasets) will be ranked. The 5 last will count. Prizes: Best overall agnostic entry. Best prior knowledge result in each dataset.

