Exploratory Learning: Semi-supervised Learning in the Presence of Unanticipated Classes
Bhavana Dalvi, William W. Cohen, Jamie Callan
School of Computer Science, Carnegie Mellon University

Acknowledgements: This work is supported by Google and the Intelligence Advanced Research Projects Activity (IARPA) via Air Force Research Laboratory (AFRL) contract number FA C.

Motivation
- Multi-class semi-supervised learning: the number of natural classes present in the data might not be known, and there may be no labeled data for some of the classes.
- Exploratory Learning extends the semi-supervised EM algorithm by dynamically adding new classes when appropriate.
- It thus uses existing knowledge in the form of seeds, and discovers clusters belonging to unknown classes.

Example of Semantic Drift
Seeds: (Country: USA, Japan, India, ...), (State: CA, PA, MN, ...). The unlabeled data contains instances of Country, State, City, Museum, etc. A model seeded with "States" might end up collecting "Cities" or even other kinds of locations.

Conclusions
- We investigate and improve the robustness of SSL methods in a setting in which seeds are available for only a subset of the classes.
- Our proposed approach, called Exploratory EM, introduces new classes on-the-fly during learning, based on the intuition that hard-to-classify examples, specifically examples with a nearly uniform posterior class distribution, belong to new classes.
- We showed that this approach outperforms standard semi-supervised EM approaches on three different publicly available datasets.
- We also showed performance improvements over a Gibbs sampling baseline that uses the Chinese Restaurant Process (CRP) to induce new clusters.
- In the future, we plan to extend this technique to multi-label, hierarchical, and multi-view classification problems.

The Exploratory EM Algorithm
- Initialize the model with a few seeds per class.
- Iterate until convergence (of the data likelihood and the number of classes):
  - E step: predict labels for the unlabeled points.
    For i = 1 to n:
      if P(C_j | X_i) is nearly uniform over j = 1 to k, create a new class C_{k+1} and assign X_i to it;
      else assign X_i to argmax_{C_j} P(C_j | X_i).
  - M step: re-compute the model parameters using the seeds and the predicted labels of the unlabeled data-points. The number of classes may increase in each iteration.
  - Check whether the model selection criterion is satisfied; if not, revert to the model from iteration t-1.
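Below is a minimal Python sketch of the exploratory E step described above. The names (`exploratory_e_step`, `is_nearly_uniform`) are illustrative rather than taken from the authors' code, and the near-uniformity test shown is the MinMax criterion defined later on the poster; treat it as an assumption-laden sketch, not the reference implementation.

```python
import numpy as np

def is_nearly_uniform(posterior, ratio_threshold=2.0):
    """MinMax test: the posterior over existing classes is 'nearly uniform'
    when the largest probability is less than twice the smallest.
    Assumes strictly positive probabilities (e.g., after smoothing)."""
    return posterior.max() / posterior.min() < ratio_threshold

def exploratory_e_step(posteriors):
    """Label each unlabeled point with an existing class, or with a freshly
    created class when its posterior is nearly uniform.

    posteriors: (n, k) array; row i holds P(C_j | X_i) for the current k classes.
    Returns a label list; labels >= k denote classes created during this E step.
    (In the full algorithm, later points would also be scored against classes
    created earlier in the same E step; this sketch omits that for brevity.)
    """
    n, k = posteriors.shape
    next_class = k
    labels = []
    for i in range(n):
        if is_nearly_uniform(posteriors[i]):
            labels.append(next_class)  # create C_{k+1} and assign X_i to it
            next_class += 1
        else:
            labels.append(int(np.argmax(posteriors[i])))
    return labels
```

In the outer EM loop, the M step and the AICc-based revert check described below would wrap around this function.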
Extending Existing SSL Methods

Naïve Bayes (multinomial model)
- Semi-supervised version: label(X_i) = argmax_{j=1..k} P(C_j | X_i).
- Exploratory version: if P(C_j | X_i) is nearly uniform, label(X_i) = C_{k+1}; else label(X_i) = argmax_{j=1..k} P(C_j | X_i).

Seeded K-Means (features: L1-normalized TF-IDF vectors; similarity: dot product of centroid and data-point)
- Semi-supervised version: assign X_i to the closest centroid C_j.
- Exploratory version: if X_i is nearly equidistant from all centroids, create a new cluster C_{k+1} and put X_i in it; else assign X_i to the closest centroid.

Seeded von Mises-Fisher (models the distribution of data on the unit hypersphere)
- Semi-supervised version: label(X_i) = argmax_{j=1..k} P(C_j | X_i).
- Exploratory version: extension similar to Naïve Bayes, based on the near-uniformity of P(C_j | X_i).

When New Classes Are Created
Hypothesis: if P(C_j | X_i) is nearly uniform, then X_i does not belong to any of the existing classes, and hence a new class/cluster needs to be created.
For each data-point X_i, we compute the posterior distribution P(C_j | X_i) of X_i belonging to each of the existing classes C_1 ... C_k.
- Criterion 1 (MinMax): with maxP = max_j P(C_j | X_i) and minP = min_j P(C_j | X_i), create a new class/cluster if maxP / minP < 2.
- Criterion 2 (JS): with uniP the uniform distribution over k classes, {1/k, ..., 1/k}, and jsDiv = JS-divergence(uniP, P(C_j | X_i)), create a new class/cluster if jsDiv < 1/k.

Model Selection Criterion
We tried the BIC, AIC, and AICc criteria, and AICc worked best:
  BIC(g) = -2 * L(g) + v * ln(n)
  AIC(g) = -2 * L(g) + 2 * v
  AICc(g) = AIC(g) + 2 * v * (v + 1) / (n - v - 1)
Here g is the model being evaluated, L(g) the log-likelihood of the data given g, v the number of free parameters of the model, and n the number of data-points.

Comparison to Chinese Restaurant Process
Gibbs sampling baseline that induces new clusters with the CRP:
  Initialize the model using the seed data.
  for epoch in 1 to numEpochs:
    for item in unlabeled data:
      decrement the data counts for item and label[epoch-1, item]
      sample a label from P(label | item), creating a new class via the CRP when sampled
      increment the data counts for item and register label[epoch, item]

Experimental Results
Semantic drift hypothesis: dynamically inducing clusters for data-points that do not belong to any of the seeded classes will reduce semantic drift on the seeded classes.
Comparison metric: macro-averaged seeded-class F1.
- Exploratory EM improves seed-class F1 (over semi-supervised EM) on all three publicly available datasets, discovering unseeded clusters while improving seed-class F1.
- Varying the number of seed classes and the number of seeds per class: as either increases, both methods improve; Exploratory EM is especially beneficial when the amount of supervision is small.
- Exploratory EM is better than Gibbs+CRP in terms of seed-class F1, run-time, and the number of classes produced, and it requires no parameter tuning.
[Figures omitted: objective using AICc on 20-Newsgroups; best-case performance of the improved baseline vs. the proposed method on Delicious_Sports and 20-Newsgroups.]
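For concreteness, here is a minimal Python sketch of the two new-class criteria and the AICc score defined above. The helper names (`minmax_triggers_new_class`, `js_triggers_new_class`, `aicc`) are illustrative and not taken from the authors' code, and the JS divergence is computed with natural logarithms since the poster does not specify a base.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions (natural log)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def minmax_triggers_new_class(posterior, ratio=2.0):
    """Criterion 1 (MinMax): nearly uniform if the max/min posterior ratio is below 2."""
    posterior = np.asarray(posterior, dtype=float)
    return posterior.max() / posterior.min() < ratio

def js_triggers_new_class(posterior):
    """Criterion 2 (JS): nearly uniform if the JS divergence from the uniform
    distribution over the current k classes is below 1/k."""
    posterior = np.asarray(posterior, dtype=float)
    k = len(posterior)
    uniform = np.full(k, 1.0 / k)
    return js_divergence(uniform, posterior) < 1.0 / k

def aicc(log_likelihood, v, n):
    """AICc score as given on the poster (lower is better):
    AIC(g) = -2 * L(g) + 2 * v,  AICc(g) = AIC(g) + 2 * v * (v + 1) / (n - v - 1)."""
    aic = -2.0 * log_likelihood + 2.0 * v
    return aic + 2.0 * v * (v + 1) / (n - v - 1)
```

For example, a posterior of [0.30, 0.35, 0.35] over k = 3 classes satisfies both criteria (max/min is about 1.17 < 2, and its JS divergence from uniform is far below 1/3), so the corresponding point would seed a new class.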