Source-Selection-Free Transfer Learning


Source-Selection-Free Transfer Learning Evan Xiang, Sinno Pan, Weike Pan, Jian Su, Qiang Yang HKUST - IJCAI 2011

Transfer Learning: supervised learning often suffers from a lack of labeled training data; transfer learning helps when we have some related source domains. HKUST - IJCAI 2011

Where are the “right” source data? We may have an extremely large number of choices of potential sources to use. HKUST - IJCAI 2011

Outline of Source-Selection-Free Transfer Learning (SSFTL)
Stage 1: Building base models
Stage 2: Label bridging via Laplacian graph embedding
Stage 3: Mapping the target instance using the base classifiers & the projection matrix
Stage 4: Learning a matrix W to directly project the target instance to the latent space
Stage 5: Making predictions for the incoming test data using W
HKUST - IJCAI 2011

SSFTL – Building base models. From the taxonomy of an online information source, we can "compile" a large number of binary base classification models, one for each sampled pair of categories. HKUST - IJCAI 2011

SSFTL – Label Bridging via Laplacian Graph Embedding. Problem: the label spaces of the base classification models and the target task can be different, so there is a label mismatch. Since the label names are usually short and sparse, we turn to social media such as Delicious to uncover the intrinsic relationships between the target and source labels and bridge the different label sets together. From a neighborhood matrix for the label graph, a Laplacian eigenmap [Belkin & Niyogi, 2003] produces a projection matrix V that places each label as a prototype in an m-dimensional latent space. The relationships between labels, e.g., similar or dissimilar, can then be represented by the distance between their corresponding prototypes in the latent space, e.g., close to or far away from each other. HKUST - IJCAI 2011
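A minimal sketch of this label-bridging step, assuming a symmetric label co-occurrence (neighborhood) matrix A has already been built from the Delicious tagging log; the matrix values and function name below are illustrative, not the authors' code.

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmap(A, m):
    """Embed each label as an m-dimensional prototype (Belkin & Niyogi, 2003)."""
    D = np.diag(A.sum(axis=1))   # degree matrix of the label graph
    L = D - A                    # unnormalized graph Laplacian
    # Generalized eigenproblem L v = lambda D v; eigenvalues come back ascending
    eigvals, eigvecs = eigh(L, D)
    # Drop the trivial constant eigenvector, keep the next m
    return eigvecs[:, 1:m + 1]   # rows of V: one prototype per label

# Toy label graph: 4 labels with symmetric co-occurrence weights
A = np.array([[0., 5., 1., 0.],
              [5., 0., 0., 1.],
              [1., 0., 0., 4.],
              [0., 1., 4., 0.]])
V = laplacian_eigenmap(A, m=2)
print(V.shape)   # (4, 2): one latent prototype per label
```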

SSFTL – Mapping the target instance using the base classifiers & the projection matrix V. For each target instance, e.g., "Ipad2 is released in March, ...", we obtain a combined result on the source label space by aggregating the predictions of all the base classifiers (e.g., 0.1:0.9, 0.6:0.4, 0.3:0.7, 0.7:0.3, 0.2:0.8). We then use the projection matrix V to transform this combined result from the label space to the m-dimensional latent space, giving a representation <Z1, Z2, Z3, ..., Zm>. However, do we need to recall the base classifiers during the prediction phase? The answer is no! HKUST - IJCAI 2011
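The aggregation-then-projection step could look like the sketch below, where base_classifiers, label_index, and predict_proba are assumed interfaces (each base classifier scores its own pair of source labels); this is an illustration, not the paper's implementation.

```python
import numpy as np

def map_to_latent(x, base_classifiers, label_index, V):
    """Aggregate base predictions over the source label space, then project with V."""
    p = np.zeros(V.shape[0])                    # one slot per source label
    for clf, (label_a, label_b) in base_classifiers:
        prob_a = clf.predict_proba(x)           # assumed: P(label_a | x), e.g. 0.1 vs. 0.9
        p[label_index[label_a]] += prob_a
        p[label_index[label_b]] += 1.0 - prob_a
    p /= len(base_classifiers)                  # combined result on the label space
    return p @ V                                # latent representation <Z1, ..., Zm>
```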

SSFTL – Learning a matrix W to directly project the target instance to the latent space. Using the labeled and unlabeled data of the target domain, we first aggregate each instance's predictions on the base label space and project them onto the latent space with V. We then learn a d x m projection matrix W with a regression model whose objective combines a loss on the labeled data and a loss on the unlabeled data, so that target instances can later be mapped to the latent space directly from the feature space. HKUST - IJCAI 2011
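A hedged sketch of this regression step, under the simplifying assumption that the objective is a ridge-style least squares fit from features X to the latent targets Z produced in the previous stage, with the unlabeled loss weighted by lambda_2; the closed form below is illustrative, not the paper's exact objective.

```python
import numpy as np

def learn_W(X_lab, Z_lab, X_unlab, Z_unlab, lam=1.0, lambda_2=0.01):
    """Learn a d x m matrix W mapping the feature space directly to the latent space."""
    # Weight the unlabeled loss by lambda_2 and stack both parts
    X = np.vstack([X_lab, np.sqrt(lambda_2) * X_unlab])
    Z = np.vstack([Z_lab, np.sqrt(lambda_2) * Z_unlab])
    d = X.shape[1]
    # Ridge-regularized least squares: (X^T X + lam * I) W = X^T Z
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Z)
```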

SSFTL – Making predictions for the incoming test data. The learned projection matrix W (d x m) can transform any target-domain instance directly from the feature space to the latent space. Therefore, we can make a prediction for any incoming test instance based on its distance to the label prototypes, without calling the base classification models. HKUST - IJCAI 2011
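Prediction then reduces to a nearest-prototype lookup in the latent space; a minimal sketch with illustrative names follows.

```python
import numpy as np

def predict(x, W, prototypes, labels):
    """Project x with W and return the target label whose prototype is closest."""
    z = x @ W                                        # feature space -> latent space
    dists = np.linalg.norm(prototypes - z, axis=1)   # distance to each label prototype
    return labels[int(np.argmin(dists))]
```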

Experiments - Datasets
Building source classifiers with Wikipedia: 3M articles, 500K categories (mirror of Aug 2009); 50,000 pairs of categories are sampled for source models.
Building the label graph with Delicious: 800-day historical tagging log (Jan 2005 ~ March 2007); 50M tagging logs of 200K tags on 5M Web pages.
Benchmark target tasks: 20 Newsgroups (190 tasks), Google Snippets (28 tasks), AOL Web queries (126 tasks), AG Reuters corpus (10 tasks).
HKUST - IJCAI 2011

SSFTL - Building base classifiers in parallel using MapReduce. Building 50,000 base classifiers on a single server would take about two days, so we distributed the training process to a cluster with 30 cores using MapReduce and finished the training within two hours. In the map phase, the training data are replicated and assigned to different bins; within each bin, the training data are paired for building binary base classifiers. These pre-trained source base classifiers are stored and reused for different incoming target tasks. HKUST - IJCAI 2011
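The slide describes a MapReduce job; a rough single-machine analogue using Python multiprocessing is sketched below, where train_binary_classifier is a hypothetical helper that fits one pairwise model from two categories' documents.

```python
from multiprocessing import Pool

def train_pair(pair):
    cat_a, cat_b = pair
    # One task per category pair: fit a binary base classifier for this pair
    model = train_binary_classifier(cat_a, cat_b)   # hypothetical helper
    return (cat_a, cat_b), model

def build_base_models(category_pairs, workers=30):
    # Distribute the sampled category pairs across workers, collect the models
    with Pool(workers) as pool:
        return dict(pool.map(train_pair, category_pairs))
```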

Experiments - Results. As Table 1 shows, compared to SVM and TSVM, SSFTL achieves much better classification accuracy on the target test data. Interestingly, unsupervised SSFTL also achieves satisfactory classification performance without any labeled data, much higher than random guessing (RG). Parameter settings: source models: 5,000; unlabeled target data: 100%; lambda_2: 0.01. HKUST - IJCAI 2011

Experiments - Results. In the second experiment, we verify the impact of the number of source classifiers on the overall performance of SSFTL, setting λ2 = 0.01 and using 20 labeled target data. From Table 2, we find that the performance of SSFTL increases as the number of source classifiers grows; once the number reaches 5,000 or more, the improvement levels off. Parameter settings: mode: semi-supervised; labeled target data: 20; unlabeled target data: 100%; lambda_2: 0.01. HKUST - IJCAI 2011

Experiments - Results. In the third experiment, we further verify the performance of SSFTL when the proportion of unlabeled data involved in learning varies, as shown in Table 3. In this experiment, we use 5,000 source classifiers, 20 labeled target data, and set λ2 = 0.01. The results suggest that the classification performance of SSFTL increases as the amount of unlabeled data grows. Parameter settings: mode: semi-supervised; labeled target data: 20; source models: 5,000; lambda_2: 0.01. HKUST - IJCAI 2011

Experiments - Results. In the fourth experiment, we verify the impact of different values of λ2 on the overall classification performance of SSFTL; the result is shown in Table 4. In this experiment, we use 5,000 source classifiers and 20 labeled data. As can be seen, the proposed SSFTL performs best and is stable when λ2 falls in the range [0.001, 0.1]. When λ2 = 0, the semi-supervised SSFTL method reduces to a supervised regularized least squares regression (RLSR) model, and when λ2 is large, e.g., λ2 = 100, the result of SSFTL is similar to that of unsupervised SSFTL in Table 1. Parameter settings: labeled target data: 20; unlabeled target data: 100%; source models: 5,000. HKUST - IJCAI 2011

Experiments - Results. In the last experiment, we verify the effectiveness of the proposed weighting strategy over auxiliary source classifiers introduced at the end of Section 2, comparing the classification performance of SSFTL with the weighted strategy against a uniform weighting strategy. In this experiment, we set λ2 = 0.01, use 5,000 source classifiers, and vary the number of labeled target data. As can be seen from Table 5, SSFTL with the weighted strategy performs much better than with uniform weighting. With this simple weighting strategy, we are able to "filter" unrelated source classifiers and identify useful ones for transfer. Parameter settings: mode: semi-supervised; source models: 5,000; unlabeled target data: 100%; lambda_2: 0.01. HKUST - IJCAI 2011

Related Works HKUST - IJCAI 2011

Conclusion
Source-Selection-Free Transfer Learning applies when the potential auxiliary data is embedded in very large online information sources.
No need for task-specific source-domain data.
We compile the label sets into a graph Laplacian for automatic label bridging.
SSFTL is highly scalable: processing of the online information source can be done offline and reused for different tasks.
HKUST - IJCAI 2011

Q & A HKUST - IJCAI 2011