Presentation is loading. Please wait.

Presentation is loading. Please wait.

AutoML challenge Isabelle Guyon

Similar presentations


Presentation on theme: "AutoML challenge Isabelle Guyon"— Presentation transcript:

1 AutoML challenge Isabelle Guyon automl@ChaLearn.org

2 Challenges in Machine Learninghttp://chalearn.org ChaLearn & Codalab Codalab management: Evelyne Viegas Percy Liang Erick Watson Software development: Eric Carmichael Ivan Judson Christophe Poulain Percy Liang Arthur Pesah Xavier Baro Sole Michael Zyskowski Advisors and beta testers: Kristin Bennett Cecile Capponi Richard Caruana Gideon Dror Sergio Escalera Tin Kam Ho Hugo Larochelle Víctor Ponce López Nuria Macia Simon Mercer Florin Popescu Danny Silver, Data providers: Yindalon Aphinyanaphongs Olivier Chapelle Hugo Jair Escalante Sergio Escalera Zainab Iftikhar Malhi Vincent Lemaire Chih Jen Lin Meysam Madani Bisakha Ray Mehreen Saeed Alexander Statnikov Gustavo Stolovitzky H-J. Thiesen Ioannis Tsamardinos

3 Big data growth ChaLearn AutoML challengehttp://automl.chalearn.org

4 ChaLearn AutoML challengehttp://automl.chalearn.org Data scientist is the sexiest job of the 21 st century

5 Davenport and Patil, Harvard Business Review, 2012 ChaLearn AutoML challengehttp://automl.chalearn.org

6 The dream ChaLearn AutoML challengehttp://automl.chalearn.org TRAINING DATA AutoML black box Trained model Answer y Query x

7 The dream ChaLearn AutoML challengehttp://automl.chalearn.org TRAINING DATA AutoML black box Trained model Answer y Query x

8 The REALITY ChaLearn AutoML challengehttp://automl.chalearn.org TRAINING DATA Trained model Answer y Query x Hyper-parameter tuning

9 ChaLearn ML challenges ChaLearn AutoML challengehttp://automl.chalearn.org TRAINING DATA Crowd Trained model Answer y Query x

10 AutoML challenge ChaLearn AutoML challengehttp://automl.chalearn.org TRAINING DATA Trained model Answer y Query x Crowd AutoML black box

11 Why would the crowd do that? ChaLearn AutoML challengehttp://automl.chalearn.org $30000 Fame Learning Fun Workshop Dream

12 Challenges in Machine Learninghttp://chalearn.org What tasks? -INPUT = I.I.D. data in feature representation, but: Sparse or full matrices. Numerical/categorical/binary variables. Missing values or not. Noisy data or not. Various proportions Ntrain / Nfeat. -OUPUT = one target, but: Binary (two-classes, balanced or not). Categorical (multi-class: tens, hundreds of classes) Multi-label. Regression. -OBJECTIVE = miscellaneous loss funtions. -COMPUTATIONAL RESOURCES = fixed.

13 Challenges in Machine Learninghttp://chalearn.org What data? -30 datasets spanning various domains: pharmacology, medicine, marketing, ecology, text, image, video and speech processing. -6 rounds (x 5 datasets). -Early phases: partially “recycled data”. -Late phases: novel data. -Medium hard tasks (10-20% error) to separate algorithms best. -Large test sets (~>10,000 examples) to ensure statistical significance.

14 Round O: sample data ChaLearn AutoML challengehttp://automl.chalearn.org US census 1994 (Kohavi and Becker) MNIST HWR. LeCun, Cortes, and Burges (1998) House prices. Pace and Barry (1997) Pharmacology. DuPont Pharmaceuticals (2001) 20 NewsGroups. Lang and Mitchell (1997)

15 Next rounds 1.NOVICE: Binary classification.. 2.INTERMEDIATE: Multiclass classification. 3.ADVANCED: Multiclass and multilabel. 4.EXPERT: Classification and regression. 5.MASTER: All of the above. ChaLearn AutoML challengehttp://automl.chalearn.org

16 What’s exciting? -Intellectually challenging: Completely autonomous learner Beat the “no free lunch theorem” Model selection Meta learning Two-level objectives Any overall objective Any time -Practically important: Improve cost effectiveness Improve reliability Reach out to more applications Challenges in Machine Learninghttp://chalearn.org

17 How? Challenges in Machine Learninghttp://chalearn.org o AutoML: Automatic code execution on Codalab platform. o Tweakathon: Result or code submission. o To earn prizes: code should be made open source.

18 How? Challenges in Machine Learninghttp://chalearn.org n-1 n+1 ROUND n PHASESUBMISSION /EVALUATION

19 How? Challenges in Machine Learninghttp://chalearn.org AutoML [+] n-1 n+1 ROUND n Blind test on fresh data

20 How? Challenges in Machine Learninghttp://chalearn.org AutoML [+] n-1 n+1 ROUND n PHASESUBMISSION /EVALUATION Code Blind test on fresh data

21 How? Challenges in Machine Learninghttp://chalearn.org AutoML [+] n-1 n+1 Data release ROUND n PHASESUBMISSION /EVALUATION Code Blind test on fresh data

22 How? Challenges in Machine Learninghttp://chalearn.org AutoML [+] Tweakathon n-1 n+1 Data release ROUND n PHASESUBMISSION /EVALUATION Code Blind test on fresh data

23 How? Challenges in Machine Learninghttp://chalearn.org AutoML [+] Tweakathon n-1 n+1 Data release ROUND n PHASESUBMISSION /EVALUATION Code Code and/or Results Validation set Blind test on fresh data

24 How? Challenges in Machine Learninghttp://chalearn.org AutoML [+] Tweakathon n-1 n+1 Data release ROUND n PHASESUBMISSION /EVALUATION Code and/or Results Validation set Test set Final [+] Blind test on fresh data Code

25 How? Challenges in Machine Learninghttp://chalearn.org AutoML [+] Tweakathon AutoML [+] n-1 n+1 Data release ROUND n PHASESUBMISSION /EVALUATION Code and/or Results Validation set Test set Final [+] Blind test on fresh data Code

26 How? Challenges in Machine Learninghttp://chalearn.org Tweakathon AutoML [+] Tweakathon AutoML [+] n-1 n+1 Data release ROUND n PHASESUBMISSION /EVALUATION Data release Final [+] Code and/or Results Validation set Test set Code and/or Results Validation set Test set Final [+] Blind test on fresh data

27 How? We provide sample submissions. We provide a starting kit in Python using the scikit-learn library. The platform accepts: –Python scripts. –Linux executables. –Java JRE executables. ChaLearn AutoML challengehttp://automl.chalearn.org

28 When? Challenges in Machine Learninghttp://chalearn.org 2014 -Start: December 2014. ------ NIPS 2014, Montreal (December) ------ 2015 -Phase 0 ends: Mid February. -Phase 1 ends: Mid March. -Phase 2 ends: Mid April. -Phase 3 ends: Mid May. -Phase 4/5 end: Mid June. ------ IJCNN 2015, Ireland (July) (and CiML2 Paris, and ICML Lille?) ------

29 Data scientist is the sexiest job of the 21 st century Davenport and Patil, Harvard Business Review 2012. ChaLearn AutoML challengehttp://automl.chalearn.org

30 How to lie with statistics Darrell Huff, 1954 ChaLearn AutoML challengehttp://automl.chalearn.org

31 Recent trends ChaLearn AutoML challengehttp://automl.chalearn.org

32 We are not AutoML (yet) ChaLearn AutoML challengehttp://automl.chalearn.org Videolectures word cloud, Grant Marshall, KDnuggests, Sept 2014

33 ChaLearn AutoML challengehttp://automl.chalearn.org

34 AutoML challenge http://automl.chalearn.org Thank you


Download ppt "AutoML challenge Isabelle Guyon"

Similar presentations


Ads by Google