Download presentation
Presentation is loading. Please wait.
1
AutoML challenge Isabelle Guyon automl@ChaLearn.org
2
Challenges in Machine Learninghttp://chalearn.org ChaLearn & Codalab Codalab management: Evelyne Viegas Percy Liang Erick Watson Software development: Eric Carmichael Ivan Judson Christophe Poulain Percy Liang Arthur Pesah Xavier Baro Sole Michael Zyskowski Advisors and beta testers: Kristin Bennett Cecile Capponi Richard Caruana Gideon Dror Sergio Escalera Tin Kam Ho Hugo Larochelle Víctor Ponce López Nuria Macia Simon Mercer Florin Popescu Danny Silver, Data providers: Yindalon Aphinyanaphongs Olivier Chapelle Hugo Jair Escalante Sergio Escalera Zainab Iftikhar Malhi Vincent Lemaire Chih Jen Lin Meysam Madani Bisakha Ray Mehreen Saeed Alexander Statnikov Gustavo Stolovitzky H-J. Thiesen Ioannis Tsamardinos
3
Big data growth ChaLearn AutoML challengehttp://automl.chalearn.org
4
ChaLearn AutoML challengehttp://automl.chalearn.org Data scientist is the sexiest job of the 21 st century
5
Davenport and Patil, Harvard Business Review, 2012 ChaLearn AutoML challengehttp://automl.chalearn.org
6
The dream ChaLearn AutoML challengehttp://automl.chalearn.org TRAINING DATA AutoML black box Trained model Answer y Query x
7
The dream ChaLearn AutoML challengehttp://automl.chalearn.org TRAINING DATA AutoML black box Trained model Answer y Query x
8
The REALITY ChaLearn AutoML challengehttp://automl.chalearn.org TRAINING DATA Trained model Answer y Query x Hyper-parameter tuning
9
ChaLearn ML challenges ChaLearn AutoML challengehttp://automl.chalearn.org TRAINING DATA Crowd Trained model Answer y Query x
10
AutoML challenge ChaLearn AutoML challengehttp://automl.chalearn.org TRAINING DATA Trained model Answer y Query x Crowd AutoML black box
11
Why would the crowd do that? ChaLearn AutoML challengehttp://automl.chalearn.org $30000 Fame Learning Fun Workshop Dream
12
Challenges in Machine Learninghttp://chalearn.org What tasks? -INPUT = I.I.D. data in feature representation, but: Sparse or full matrices. Numerical/categorical/binary variables. Missing values or not. Noisy data or not. Various proportions Ntrain / Nfeat. -OUPUT = one target, but: Binary (two-classes, balanced or not). Categorical (multi-class: tens, hundreds of classes) Multi-label. Regression. -OBJECTIVE = miscellaneous loss funtions. -COMPUTATIONAL RESOURCES = fixed.
13
Challenges in Machine Learninghttp://chalearn.org What data? -30 datasets spanning various domains: pharmacology, medicine, marketing, ecology, text, image, video and speech processing. -6 rounds (x 5 datasets). -Early phases: partially “recycled data”. -Late phases: novel data. -Medium hard tasks (10-20% error) to separate algorithms best. -Large test sets (~>10,000 examples) to ensure statistical significance.
14
Round O: sample data ChaLearn AutoML challengehttp://automl.chalearn.org US census 1994 (Kohavi and Becker) MNIST HWR. LeCun, Cortes, and Burges (1998) House prices. Pace and Barry (1997) Pharmacology. DuPont Pharmaceuticals (2001) 20 NewsGroups. Lang and Mitchell (1997)
15
Next rounds 1.NOVICE: Binary classification.. 2.INTERMEDIATE: Multiclass classification. 3.ADVANCED: Multiclass and multilabel. 4.EXPERT: Classification and regression. 5.MASTER: All of the above. ChaLearn AutoML challengehttp://automl.chalearn.org
16
What’s exciting? -Intellectually challenging: Completely autonomous learner Beat the “no free lunch theorem” Model selection Meta learning Two-level objectives Any overall objective Any time -Practically important: Improve cost effectiveness Improve reliability Reach out to more applications Challenges in Machine Learninghttp://chalearn.org
17
How? Challenges in Machine Learninghttp://chalearn.org o AutoML: Automatic code execution on Codalab platform. o Tweakathon: Result or code submission. o To earn prizes: code should be made open source.
18
How? Challenges in Machine Learninghttp://chalearn.org n-1 n+1 ROUND n PHASESUBMISSION /EVALUATION
19
How? Challenges in Machine Learninghttp://chalearn.org AutoML [+] n-1 n+1 ROUND n Blind test on fresh data
20
How? Challenges in Machine Learninghttp://chalearn.org AutoML [+] n-1 n+1 ROUND n PHASESUBMISSION /EVALUATION Code Blind test on fresh data
21
How? Challenges in Machine Learninghttp://chalearn.org AutoML [+] n-1 n+1 Data release ROUND n PHASESUBMISSION /EVALUATION Code Blind test on fresh data
22
How? Challenges in Machine Learninghttp://chalearn.org AutoML [+] Tweakathon n-1 n+1 Data release ROUND n PHASESUBMISSION /EVALUATION Code Blind test on fresh data
23
How? Challenges in Machine Learninghttp://chalearn.org AutoML [+] Tweakathon n-1 n+1 Data release ROUND n PHASESUBMISSION /EVALUATION Code Code and/or Results Validation set Blind test on fresh data
24
How? Challenges in Machine Learninghttp://chalearn.org AutoML [+] Tweakathon n-1 n+1 Data release ROUND n PHASESUBMISSION /EVALUATION Code and/or Results Validation set Test set Final [+] Blind test on fresh data Code
25
How? Challenges in Machine Learninghttp://chalearn.org AutoML [+] Tweakathon AutoML [+] n-1 n+1 Data release ROUND n PHASESUBMISSION /EVALUATION Code and/or Results Validation set Test set Final [+] Blind test on fresh data Code
26
How? Challenges in Machine Learninghttp://chalearn.org Tweakathon AutoML [+] Tweakathon AutoML [+] n-1 n+1 Data release ROUND n PHASESUBMISSION /EVALUATION Data release Final [+] Code and/or Results Validation set Test set Code and/or Results Validation set Test set Final [+] Blind test on fresh data
27
How? We provide sample submissions. We provide a starting kit in Python using the scikit-learn library. The platform accepts: –Python scripts. –Linux executables. –Java JRE executables. ChaLearn AutoML challengehttp://automl.chalearn.org
28
When? Challenges in Machine Learninghttp://chalearn.org 2014 -Start: December 2014. ------ NIPS 2014, Montreal (December) ------ 2015 -Phase 0 ends: Mid February. -Phase 1 ends: Mid March. -Phase 2 ends: Mid April. -Phase 3 ends: Mid May. -Phase 4/5 end: Mid June. ------ IJCNN 2015, Ireland (July) (and CiML2 Paris, and ICML Lille?) ------
29
Data scientist is the sexiest job of the 21 st century Davenport and Patil, Harvard Business Review 2012. ChaLearn AutoML challengehttp://automl.chalearn.org
30
How to lie with statistics Darrell Huff, 1954 ChaLearn AutoML challengehttp://automl.chalearn.org
31
Recent trends ChaLearn AutoML challengehttp://automl.chalearn.org
32
We are not AutoML (yet) ChaLearn AutoML challengehttp://automl.chalearn.org Videolectures word cloud, Grant Marshall, KDnuggests, Sept 2014
33
ChaLearn AutoML challengehttp://automl.chalearn.org
34
AutoML challenge http://automl.chalearn.org Thank you
Similar presentations
© 2024 SlidePlayer.com Inc.
All rights reserved.