Exploratory Learning: Semi-supervised Learning in the Presence of Unanticipated Classes
Bhavana Dalvi, William W. Cohen, Jamie Callan
School of Computer Science, Carnegie Mellon University

Motivation
- Multi-class semi-supervised learning: the number of natural classes present in the data might not be known, and there may be no labeled data for some of the classes.
- Exploratory Learning extends the semi-supervised EM algorithm by dynamically adding new classes when appropriate. It thus uses existing knowledge in the form of seeds while also discovering clusters that belong to unanticipated classes.

Semantic drift
- Example: with seeds (Country: USA, Japan, India, ...) and (State: CA, PA, MN, ...), the unlabeled data contains instances of Country, State, City, Museum, and other types. A model seeded with "State" might end up collecting "City" instances, or even other kinds of locations.
- Hypothesis: dynamically inducing clusters for data points that do not belong to any of the seeded classes will reduce semantic drift on the seeded classes.

The Exploratory EM Algorithm
- Hypothesis: if P(C_j | X_i) is nearly uniform, then X_i does not belong to any of the existing classes, so a new class/cluster needs to be created.
- Initialize the model with a few seeds per class.
- Iterate until convergence (of the data likelihood and the number of classes):
  - E step: predict labels for the unlabeled points.
    For i = 1 to n:
      if P(C_j | X_i), j = 1..k, is nearly uniform for data point X_i, create a new class C_{k+1} and assign X_i to it;
      else assign X_i to argmax_{C_j} P(C_j | X_i).
  - M step: re-compute the model parameters using the seeds and the predicted labels of the unlabeled data points. The number of classes may increase in each iteration.
  - Check whether the model selection criterion is satisfied; if not, revert to the model from iteration t-1.

When New Classes Are Created
- For each data point X_i, we compute the posterior distribution P(C_j | X_i) of X_i belonging to each of the existing classes C_1 ... C_k.
- Criterion 1 (MinMax): let maxP = max_j P(C_j | X_i) and minP = min_j P(C_j | X_i); if maxP / minP < 2, create a new class/cluster.
- Criterion 2 (JS): let uniP be the uniform distribution over the k classes, {1/k, ..., 1/k}, and jsDiv = JS-divergence(uniP, P(C_j | X_i)); if jsDiv < 1/k, create a new class/cluster.
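To make the E step and the two near-uniformity criteria concrete, here is a minimal Python sketch. It assumes the per-point posteriors P(C_j | X_i) are already available as rows of a NumPy array; the thresholds (posterior ratio below 2, JS divergence below 1/k) follow the transcript, while the function names and the use of scipy's Jensen-Shannon routine are illustrative choices.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def is_nearly_uniform(posterior, criterion="minmax"):
    """True if the posterior over the k existing classes is nearly uniform."""
    k = len(posterior)
    if criterion == "minmax":
        # Criterion 1 (MinMax): open a new class when the max/min posterior ratio is below 2
        return posterior.max() / posterior.min() < 2.0
    # Criterion 2 (JS): open a new class when the JS divergence to uniform is below 1/k
    # (scipy's jensenshannon returns the JS *distance*, so square it to get the divergence)
    uniform = np.full(k, 1.0 / k)
    return jensenshannon(uniform, posterior) ** 2 < 1.0 / k

def exploratory_e_step(posteriors, criterion="minmax"):
    """E step: label each point with an existing class or with a newly created one."""
    n, k = posteriors.shape
    labels = np.empty(n, dtype=int)
    for i, p in enumerate(posteriors):
        if is_nearly_uniform(p, criterion):
            labels[i] = k      # assign X_i to a brand-new class C_{k+1}
            k += 1             # the number of classes may grow within an iteration
        else:
            labels[i] = np.argmax(p)
    return labels, k
```

Whether every near-uniform point opens its own class or all such points share a single new cluster within an iteration is a detail the transcript leaves open; the sketch above opens one per point.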
Extending existing SSL methods

Model: Naïve Bayes (multinomial model)
- Semi-supervised version: label(X_i) = argmax_{j=1..k} P(C_j | X_i)
- Exploratory version: if P(C_j | X_i) is nearly uniform, label(X_i) = C_{k+1}; else label(X_i) = argmax_{j=1..k} P(C_j | X_i)

Model: Seeded K-Means (features: L1-normalized TF-IDF vectors; similarity: dot product of centroid and data point)
- Semi-supervised version: assign X_i to the closest centroid C_j
- Exploratory version: if X_i is nearly equidistant from all centroids, create a new cluster C_{k+1} and put X_i in it; else assign X_i to the closest centroid (a sketch of this assignment rule appears at the end of this transcript)

Model: Seeded von Mises-Fisher (models the distribution of the data on the unit hypersphere)
- Semi-supervised version: label(X_i) = argmax_{j=1..k} P(C_j | X_i)
- Exploratory version: extension similar to Naïve Bayes, based on the near-uniformity of P(C_j | X_i)

Model Selection Criterion
- We tried the BIC, AIC, and AICc criteria; AICc worked best (a computational sketch appears at the end of this transcript).
  BIC(g) = -2 L(g) + v ln(n)
  AIC(g) = -2 L(g) + 2 v
  AICc(g) = AIC(g) + 2 v (v + 1) / (n - v - 1)
  Here g is the model being evaluated, L(g) the log-likelihood of the data given g, v the number of free parameters of the model, and n the number of data points.

Comparison to Chinese Restaurant Process
- Gibbs sampling baseline that uses the CRP to induce new clusters (a sketch appears at the end of this transcript):
  Initialize the model using the seed data.
  for epoch in 1..numEpochs:
    for item in unlabeled data:
      decrement the data counts for item and label[epoch-1, item]
      sample a label from P(label | item), creating a new class via the CRP when appropriate
      increment the data counts for item and record label[epoch, item]

Experimental Results
- Comparison metric: macro-averaged seeded-class F1.
- Exploratory EM improves seed-class F1 over semi-supervised EM on all three publicly available datasets, and it discovers unseeded clusters while doing so.
- Varying the number of seed classes and the number of seeds per class: as either increases, both methods improve; Exploratory EM is especially beneficial when the amount of supervision is small.
- Exploratory EM is also better than the Gibbs+CRP baseline in terms of seed-class F1, run time, and the number of classes produced, and it requires no parameter tuning.
- [Figures omitted: the objective (using AICc) and seed-class F1 for the proposed method versus the best-case performance of the improved baseline on 20-Newsgroups and Delicious_Sports.]

Conclusions
- We investigate and improve the robustness of SSL methods in a setting in which seeds are available for only a subset of the classes.
- Our proposed approach, called Exploratory EM, introduces new classes on the fly during learning, based on the intuition that hard-to-classify examples, specifically examples with a nearly uniform posterior class distribution, belong to new classes.
- We showed that this approach outperforms standard semi-supervised EM approaches on three different publicly available datasets.
- We also showed performance improvements over a Gibbs sampling baseline that uses the Chinese Restaurant Process (CRP) to induce new clusters.
- In the future, we plan to extend this technique to multi-label, hierarchical, and multi-view classification problems.

Acknowledgements: This work is supported by Google and the Intelligence Advanced Research Projects Activity (IARPA) via Air Force Research Laboratory (AFRL) contract number FA8650-10-C-7058.
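For the seeded k-means variant in the table above, the exploratory step replaces the near-uniform-posterior test with a "nearly equidistant from all centroids" test. The following is a hypothetical sketch, assuming L1-normalized TF-IDF row vectors and dot-product similarity as stated in the table; the ratio threshold mirrors the MinMax criterion and is an assumption, not a value given in the transcript.

```python
import numpy as np

def exploratory_kmeans_assign(x, centroids, ratio_threshold=2.0):
    """Assign x to its closest centroid, or open a new cluster when x is
    'nearly equidistant' (here: nearly equally similar) to all centroids."""
    sims = centroids @ x                 # dot-product similarity to each centroid
    k = len(sims)
    # Hypothetical near-equidistance test: a MinMax-style ratio check on the similarities
    if sims.min() > 0 and sims.max() / sims.min() < ratio_threshold:
        return k                         # index k stands for the new cluster C_{k+1}
    return int(np.argmax(sims))          # otherwise, the closest centroid
```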
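The model-selection check referenced in the algorithm and in the Model Selection Criterion section can be sketched as follows; log_likelihood, v, and n come from whichever model is being fit, so the function signatures are illustrative rather than the authors' API.

```python
import math

def bic(log_likelihood, v, n):
    # BIC(g) = -2 L(g) + v ln(n)
    return -2.0 * log_likelihood + v * math.log(n)

def aic(log_likelihood, v):
    # AIC(g) = -2 L(g) + 2 v
    return -2.0 * log_likelihood + 2.0 * v

def aicc(log_likelihood, v, n):
    # AICc(g) = AIC(g) + 2 v (v + 1) / (n - v - 1); the poster reports AICc worked best
    return aic(log_likelihood, v) + 2.0 * v * (v + 1) / (n - v - 1)

def keep_extended_model(curr_ll, curr_v, prev_ll, prev_v, n):
    """Keep the model with the extra classes only if it lowers AICc;
    otherwise revert to the model from iteration t-1."""
    return aicc(curr_ll, curr_v, n) < aicc(prev_ll, prev_v, n)
```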
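Finally, a schematic sketch of the Gibbs+CRP baseline from the comparison section. The class_counts list, the likelihood(item, c) callback, and the concentration parameter alpha are stand-ins for the model-specific pieces the transcript does not spell out, so this is an outline of the sampler rather than the exact baseline implementation.

```python
import numpy as np

def gibbs_crp_epoch(items, labels, class_counts, likelihood, alpha=1.0, rng=None):
    """One Gibbs sweep: resample each item's class, letting the CRP open new classes."""
    rng = rng or np.random.default_rng()
    for i, item in enumerate(items):
        class_counts[labels[i]] -= 1                 # decrement counts for the old label
        k = len(class_counts)
        # P(label | item) is proportional to the CRP prior times the item likelihood;
        # classes that become empty are simply kept around to keep the sketch short
        weights = [class_counts[c] * likelihood(item, c) for c in range(k)]
        weights.append(alpha * likelihood(item, k))  # mass reserved for a brand-new class
        weights = np.asarray(weights, dtype=float)
        labels[i] = rng.choice(k + 1, p=weights / weights.sum())
        if labels[i] == k:                           # a new class was actually opened
            class_counts.append(0)
        class_counts[labels[i]] += 1                 # increment counts for the new label
    return labels, class_counts
```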

