Slide 1: Unsupervised and Transfer Learning Challenge
Isabelle Guyon, Clopinet, California
IJCNN 2011, San Jose, California, Jul. 31 to Aug. 5, 2011
Slide 2: Credits
Challenge protocol and implementation:
Web platform: server made available by Prof. Joachim Buhmann, ETH Zurich, Switzerland. Computer admin.: Peter Schueffler. Webmaster: Olivier Guyon, MisterP.net, France.
Protocol review and advising: David W. Aha, Naval Research Laboratory, USA; Gideon Dror, Academic College of Tel-Aviv Yaffo, Israel; Vincent Lemaire, Orange Research Labs, France; Gavin Cawley, University of East Anglia, UK; Olivier Chapelle, Yahoo!, California, USA; Gerard Rinkus, Brandeis University, USA; Ulrike von Luxburg, MPI, Germany; David Grangier, NEC Labs, USA; Andrew Ng, Stanford University, Palo Alto, California, USA; Graham Taylor, NYU, New York, USA; Quoc V. Le, Stanford University, USA; Yann LeCun, NYU, New York, USA; Danny Silver, Acadia University, Canada.
Beta testing and baseline methods: Gideon Dror, Academic College of Tel-Aviv Yaffo, Israel; Vincent Lemaire, Orange Research Labs, France; Gregoire Montavon, TU Berlin, Germany.
Data donors:
Handwriting recognition (AVICENNA): Reza Farrahi Moghaddam, Mathias Adankon, Kostyantyn Filonenko, Robert Wisnovsky, and Mohamed Chériet (Ecole de technologie supérieure de Montréal, Quebec) contributed the dataset of Arabic manuscripts. The toy example (ULE) is the MNIST handwritten digit database made available by Yann LeCun and Corinna Cortes.
Object recognition (RITA): Antonio Torralba, Rob Fergus, and William T. Freeman collected and made publicly available the 80 million tiny images dataset. Vinod Nair and Geoffrey Hinton collected and made publicly available the CIFAR datasets. See the tech report "Learning Multiple Layers of Features from Tiny Images" by Alex Krizhevsky, 2009, for details.
Human action recognition (HARRY): Ivan Laptev and Barbara Caputo collected and made publicly available the KTH human action recognition dataset. Marcin Marszałek, Ivan Laptev, and Cordelia Schmid collected and made publicly available the Hollywood 2 dataset of human actions and scenes.
Text processing (TERRY): David Lewis formatted and made publicly available the RCV1-v2 Text Categorization Test Collection.
Ecology (SYLVESTER): Jock A. Blackard, Denis J. Dean, and Charles W. Anderson of the US Forest Service, USA, collected and made available the Forest Cover Type dataset.
Slide 3: What is the problem?
Slide 4: Can learning about...
Slide 5: ...help us learn about...
Slide 6: What is Transfer Learning?
Slides 7-9: Vocabulary
Target task labels; source task labels. The key questions: Are the domains the same? Are labels available? Are the tasks the same?
Slide 10: Taxonomy of transfer learning (adapted from "A survey on transfer learning" by Pan and Yang)
- No labels in either the source or the target domain: Unsupervised TL.
- Labels available ONLY in the source domain (semi-supervised TL):
  - Same source and target task: Transductive TL.
  - Different source and target tasks: Cross-task TL.
- Labels available in the target domain: Inductive TL.
  - No labels in the source domain: Self-taught TL.
  - Labels available in the source domain: Multi-task TL.
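The branching of the taxonomy can be sketched as a small decision function; the boolean arguments and returned category strings are illustrative, not notation from the survey:

```python
def tl_category(source_labels, target_labels, same_task):
    """Return the transfer-learning category for a setting, following the
    Pan & Yang taxonomy: which domains have labels, and whether the source
    and target tasks coincide. (Name and signature are ours.)"""
    if not source_labels and not target_labels:
        return "unsupervised TL"
    if source_labels and not target_labels:
        # semi-supervised branch: labels only in the source domain
        return "transductive TL" if same_task else "cross-task TL"
    # labels available in the target domain: inductive TL
    return "multi-task TL" if source_labels else "self-taught TL"
```

For instance, phase 2 of the challenge (labels available only in the source domain, different tasks) falls in the cross-task TL branch.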
Slide 11: Challenge setting
Slide 12: Challenge setting, situated in the Pan and Yang taxonomy: unsupervised TL (phase 1) and cross-task TL (phase 2).
Slides 13-14: Challenge overview (Dec. 2010 to April)
Goal: learning data representations or kernels.
Phase 1: Unsupervised learning (Dec. 25, 2010 to Mar. 3, 2011).
Phase 2: Cross-task transfer learning (Mar. 4, 2011 to Apr. 15, 2011); in this phase, source task labels are also provided to the competitors.
Prizes: $ free registrations + travel awards.
Dissemination: ICML and IJCNN; proceedings in JMLR W&CP.
Protocol diagram: competitors build data representations from the development, validation, and challenge data; the evaluators score them using the validation and challenge target task labels.
Slide 15: Datasets of the challenge
Slide 16: Evaluation
Slide 17: AUC score
For each set of samples queried, we assess the predictions of the learning machine with the area under the ROC curve (AUC).
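The AUC equals the probability that a randomly drawn positive example is scored above a randomly drawn negative one, counting ties as one half; a minimal O(p+ x p-) sketch under that rank-statistic view:

```python
def auc(scores, labels):
    """Area under the ROC curve via the Wilcoxon rank-sum identity:
    fraction of (positive, negative) pairs ranked correctly, ties = 1/2."""
    pos = [s for s, y in zip(scores, labels) if y > 0]
    neg = [s for s, y in zip(scores, labels) if y <= 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A perfect ranking gives 1.0, a reversed ranking 0.0, and random scoring about 0.5.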
Slide 18: Area under the Learning Curve (ALC)
The learning curve (AUC as a function of the number of labeled training examples) is summarized by its area, using linear interpolation between measured points and horizontal extrapolation beyond the last point.
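A minimal sketch of the area computation from (number of examples, AUC) points, using the trapezoid rule for the linear interpolation and a flat segment from the last point to the end of the x-axis; the challenge additionally used a log-scaled x-axis and normalized the score, which this sketch omits:

```python
def alc(points, x_max):
    """Area under a learning curve given as (n_examples, auc) points:
    trapezoids between points, horizontal extrapolation out to x_max."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += 0.5 * (y0 + y1) * (x1 - x0)  # linear interpolation
    x_last, y_last = pts[-1]
    area += y_last * (x_max - x_last)        # horizontal extrapolation
    return area
```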
Slide 19: Classifier used
Linear discriminant: f(x) = w · x = Σ_i w_i x_i
Hebbian learning: let X be the (p, N) training data matrix and Y ∈ {-1/p-, +1/p+}^p the target vector, where p+ and p- are the numbers of positive and negative examples. Then
w = X'Y = (1/p+) Σ_{k∈pos} x_k - (1/p-) Σ_{k∈neg} x_k
Slide 20: Kernel version
Kernel classifier: f(x) = Σ_k α_k k(x_k, x), with a linear kernel k(x_k, x) = x_k · x and coefficients α_k = -1/p- if x_k is negative, α_k = +1/p+ if x_k is positive.
Equivalent linear discriminant:
f(x) = (1/p+) Σ_{k∈pos} x_k · x - (1/p-) Σ_{k∈neg} x_k · x = w · x
with w = (1/p+) Σ_{k∈pos} x_k - (1/p-) Σ_{k∈neg} x_k
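The equivalence of the two formulations can be checked numerically; a minimal numpy sketch (function and variable names are ours):

```python
import numpy as np

def hebbian_weights(X, y):
    """w = X'Y with Y in {-1/p-, +1/p+}^p: mean of the positive
    examples minus mean of the negative examples."""
    return X[y > 0].mean(axis=0) - X[y <= 0].mean(axis=0)

def f_linear(X, y, x):
    """Linear discriminant f(x) = w . x."""
    return hebbian_weights(X, y) @ x

def f_kernel(X, y, x):
    """Same classifier in kernel form: f(x) = sum_k alpha_k (x_k . x)."""
    alpha = np.where(y > 0, 1.0 / (y > 0).sum(), -1.0 / (y <= 0).sum())
    return alpha @ (X @ x)
```

Both functions return the same score for any test point, which is why the organizers could evaluate submitted data representations or kernels interchangeably with this simple classifier.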
Slide 21: Methods used
Slides 22-25: No learning
1) Apply a fixed preprocessor P to the validation data; on the challenge platform, a classifier C is trained with the task labels on the preprocessed data to produce predictions. Select the best preprocessing based on performance on the validation tasks.
2) Use the same preprocessor on the challenge data for the final evaluation.
Slides 26-28: Unsupervised transfer learning
1) On the source domain, simultaneously train a preprocessor P and a re-constructor R using unlabeled data.
2) Use the same preprocessor for the evaluation on the target domains, where a classifier C is trained with the task labels.
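Jointly training a preprocessor P and a re-constructor R on unlabeled data is what an auto-encoder does: the encoder plays P, the decoder plays R. A minimal numpy sketch with one hidden layer, trained by batch gradient descent on the squared reconstruction error (architecture and hyper-parameters are illustrative, not the competitors' models; constant factors of the gradient are folded into the learning rate):

```python
import numpy as np

def train_autoencoder(X, n_hidden=8, lr=0.1, epochs=200, seed=0):
    """Train encoder (preprocessor P) and decoder (re-constructor R)
    jointly on unlabeled data X; return the encoder and the final loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # code produced by P
        Xr = H @ W2 + b2                    # reconstruction produced by R
        err = Xr - X
        dH = (err @ W2.T) * (1.0 - H ** 2)  # back-propagate through tanh
        W2 -= lr * (H.T @ err) / n; b2 -= lr * err.mean(axis=0)
        W1 -= lr * (X.T @ dH) / n;  b1 -= lr * dH.mean(axis=0)
    encode = lambda Z: np.tanh(Z @ W1 + b1)
    loss = ((np.tanh(X @ W1 + b1) @ W2 + b2 - X) ** 2).mean()
    return encode, loss
```

After training, only `encode` is kept: it is the learned representation handed to the evaluation classifier on the target domains.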
Slides 29-31: Supervised data representation learning
1) On the source domain, simultaneously train a preprocessor P and a classifier C with labeled source-domain data (source task labels).
2) Use the same preprocessor for the evaluation on the target domains, where a classifier C is trained with the task labels.
Slide 32: Variants
Use all or subsets of the data for training (development/validation/challenge data).
Learn which preprocessing steps to apply from the validation data (not the preprocessor itself), then apply the method to the challenge data.
Learn to reconstruct noisy versions of the data.
Train a kernel instead of a preprocessor.
Slide 33: Results
Slide 34: Questions
Can transfer learning beat raw data (or simple preprocessing)?
Does deep learning work?
Do labels help (does cross-task TL beat unsupervised TL)?
Is model selection possible in TL?
Did consistent TL methodologies emerge?
Do the results make sense? Is there code available?
Slide 35: Can transfer learning beat raw data?
Phase 1: 6933 jobs submitted, 41 complete final entries. Phase 2: 1141 jobs submitted, 14 complete final entries.
Slide 36: Results (ALC)
Slide 37: Does "deep learning" work?
Evolution of performance as a function of depth on SYLVESTER (LISA team, 1st in phase 2, 4th in phase 1).
Slide 38: Do labels help in TL?
Slide 39: Is model selection possible?
Phase 1 vs. phase 2. Use of "transfer labels": the - criterion (LISA team).
Slide 40: Did consistent methodologies emerge?
Slide 41: Results (ALC)
Slide 42: Bottom layers: preprocessing and feature selection
Slide 43: Middle layers
Slide 44: Top layer
Slide 45: Implementation
Slide 46: A few things that worked well
Learn the preprocessing steps (not the preprocessor): Aiolli, 1st in phase 1.
As first steps: eliminate low-information features, or keep the largest principal components and sphere the data; normalize and/or standardize.
Learn denoising or contractive auto-encoders or RBMs: LISA team, 1st in phase 2.
Use cluster memberships from multiple K-means runs: 1055A team, 2nd in phase 1 and 3rd in phase 2.
Transductive PCA (as a last step): LISA.
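One of the first steps listed above, keeping the largest principal components and sphering the data, can be sketched with a plain SVD; a minimal numpy version (function name and the `eps` guard are ours):

```python
import numpy as np

def pca_sphere(X, n_components, eps=1e-8):
    """Project X onto its top principal components and sphere (whiten)
    them so each retained component has unit variance."""
    Xc = X - X.mean(axis=0)
    # Thin SVD of the centered data; rows of Vt are principal directions.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_components].T                       # project on top PCs
    Z /= s[:n_components] / np.sqrt(len(X) - 1) + eps  # unit variance
    return Z
```

Such a step equalizes feature scales before the Hebbian classifier, whose weights are simple class means and therefore sensitive to per-feature variance.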
Slide 47: Conclusion
UL: this challenge demonstrated the potential of unsupervised learning methods used as preprocessing for supervised learning tasks.
UTL: model selection of UL hyper-parameters can be carried out with "source tasks" similar to the "target tasks".
DL: multi-step preprocessing leading to deep architectures can be trained in a greedy, bottom-up, step-wise manner.
Favorite methods include normalizations, PCA, clustering, and auto-encoders. A kernel method won phase 1 and a deep learning method won phase 2.
Slide 48: June 2011-June Challenge
STEP 1: Develop a "generic" gesture recognition system that can learn new signs from a few examples.
STEP 2: At the conference: teach the system new signs.
STEP 3: Live evaluation in front of the audience.