Hierarchical Semi-supervised Classification with Incomplete Class Hierarchies Bhavana Dalvi ¶*, Aditya Mishra †, and William W. Cohen * ¶ Allen Institute.


Hierarchical Semi-supervised Classification with Incomplete Class Hierarchies

Bhavana Dalvi ¶*, Aditya Mishra †, and William W. Cohen *
¶ Allen Institute for Artificial Intelligence; * School of Computer Science, Carnegie Mellon University; † Department of Computer Science & Software Engineering, Seattle University

Poster panels: Motivation; Datasets; Method: OptDAC Exploratory EM; Experimental Results; Conclusions; Acknowledgements

Acknowledgements: This work is supported in part by a Google PhD fellowship in Information Extraction and by NSF grant No. IIS1250956-NSFCOHEN.

Motivation
• In an entity classification task, topic or concept hierarchies are often incomplete. This can lead to semantic drift of known classes or topics.
• Our previous work on Exploratory Learning (Dalvi et al., ECML 2013) extends the semi-supervised EM algorithm by dynamically adding new classes when appropriate. In this paper, we present exploratory learning techniques for hierarchical semi-supervised learning tasks.
• We focus on the entity classification task, where each entity is represented by either text context or table co-occurrence features. Given a few seed examples per Knowledge Base (KB) category, the task is to classify unlabeled entities into KB categories.
• KB categories are arranged in an ontology. Subset and disjointness constraints are defined between these classes. Further, the class hierarchy can be incomplete.
• Our proposed method (OptDAC) can learn new examples of existing classes, as well as extend the class hierarchy, in a single unified framework.

Method: OptDAC Exploratory EM
Optimal Label Assignment given Class Constraints

Conclusions
• In this paper, we propose the Hierarchical Exploratory EM approach, which takes an incomplete class ontology as input, along with a few seed examples of each class, to populate new instances of seeded classes and extend the ontology with newly discovered classes.
• Our proposed hierarchical exploratory EM method, named OptDAC-ExploreEM, performs better than flat classification and hierarchical semi-supervised EM methods at all levels of the hierarchy, especially further down the hierarchy.
• Experiments show that OptDAC-ExploreEM outperforms its semi-supervised variant by 13% on average in terms of seed class F1 scores. It also outperforms both previously proposed exploratory learning approaches, FLAT-ExploreEM and DAC-ExploreEM, in terms of seed class F1 by 10% and 7% on average, respectively.
• In the future, we would like to apply our method to datasets with non-tree-structured class hierarchies.

Experimental Results
Figure captions: "Comparison: macro averaged seeded-class F1"; "OptDAC reduces semantic drift of seeded classes"; "Evaluation of extended class hierarchies"; "OptDAC with varying amount of training data". Panel titles: Text-Small, Table-Small.

Runtime of Flat vs. OptDAC method on different datasets. The first column is the average runtime in seconds; the remaining columns give the average runtime as a multiple of Flat Semi-supervised EM.

Dataset        Flat Semi-supervised EM (sec.)   Flat Exploratory EM   OptDAC Semi-supervised EM   OptDAC Exploratory EM
Text-Small     53.5                             8                     7                           17
Table-Small    50.7                             3                     10                          21
Text-Medium    524.7                            5                     11                          25
Table-Medium   5932.4                           4                     7                           10

Datasets

Dataset        #Entities   #Features   #(Entity, label) pairs
Text-Small     2.5K        3.4M        7.2K
Text-Medium    12.9K       6.7M        42.2K
Table-Small    4.3K        0.96M       12.2K
Table-Medium   33.4K       2.2M        126.K

Statistic                   Small Ontology   Medium Ontology
#Classes                    11               39
#Levels in the hierarchy    3                4
#Classes per level          1, 3, 7          1, 4, 24, 10

This dataset is made publicly available at http://rtw.ml.cmu.edu/wk/WebSets/hierarchical_ExploratoryLearning_WSDM2016/index.html

Method: OptDAC Exploratory EM
Annotations on the label-assignment objective: score of label assignment, subset constraint penalty, mutex constraint penalty. Constraint groups: subset constraint, mutex constraint.

Figure: When Are New Classes Created? (example hierarchy with nodes 1 to 11 and a candidate class C_new; check: near uniform?)
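The "near uniform?" check in the new-class figure asks whether an entity's class posterior is too flat to favor any existing class. The poster does not spell out the test, so the following is only a minimal sketch of two plausible heuristics: a max/min ratio test and a Jensen-Shannon divergence from the uniform distribution. The function names and the default threshold `max_ratio=2.0` are assumptions made for illustration.

```python
import math


def is_near_uniform_minmax(probs, max_ratio=2.0):
    """Return True when max(probs) / min(probs) is below max_ratio.

    max_ratio is a hypothetical threshold, not a value from the poster.
    """
    if not probs:
        raise ValueError("probs must be non-empty")
    lo, hi = min(probs), max(probs)
    if lo <= 0.0:
        # A class with zero probability means the distribution is peaked.
        return False
    return hi / lo < max_ratio


def js_divergence_from_uniform(probs):
    """Jensen-Shannon divergence (natural log) between probs and uniform."""
    if not probs:
        raise ValueError("probs must be non-empty")
    u = 1.0 / len(probs)
    total = 0.0
    for p in probs:
        m = 0.5 * (p + u)  # mixture of the two distributions, always > 0
        if p > 0.0:
            total += 0.5 * p * math.log(p / m)
        total += 0.5 * u * math.log(u / m)
    return total


# Hypothetical posteriors over four existing classes.
flat_posterior = [0.3, 0.3, 0.2, 0.2]
peaked_posterior = [0.85, 0.05, 0.05, 0.05]
print(is_near_uniform_minmax(flat_posterior))
print(is_near_uniform_minmax(peaked_posterior))
```

A flat posterior would make the entity a candidate for a new class; a peaked one would leave it with the existing classes. A divergence-based variant would compare `js_divergence_from_uniform(probs)` against a small threshold instead.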
Test: the best assignment found by the mixed integer program should pick C_new.

Plot axis: Level = 2, 3, 4. Panels: Small Ontology, Medium Ontology.

• An example text pattern feature for the entity "Pittsburgh" is ("lives in ARG", 1000), indicating that the entity appeared in position ARG of the text context "lives in ARG" 1000 times in sentences from the ClueWeb09 dataset.
• An example table context feature for the entity "Pittsburgh" is ("clueweb09-en0011-94-04::2:1", 1), indicating that the entity appeared once in HTML table 2, column 1 of the ClueWeb09 document with id "clueweb09-en0011-94-04".

A marker in the results denotes statistically significant improvements (0.05 significance level) w.r.t. FLAT-ExploreEM.
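The method panel on optimal label assignment scores a candidate assignment and subtracts penalties for violated subset and mutex constraints. As a rough illustration of that objective, the sketch below enumerates every binary label vector for one entity instead of calling a mixed integer programming solver, which is only practical for a handful of classes. The class names, the scores, and the unit penalty are hypothetical, and `best_assignment` is not a function from the authors' code.

```python
from itertools import product


def best_assignment(scores, subset, mutex, penalty=1.0):
    """Pick the label set maximizing score minus constraint penalties.

    scores:  dict mapping class name -> score of assigning that class
    subset:  list of (child, parent) pairs; child without parent is a violation
    mutex:   list of (a, b) pairs; assigning both a and b is a violation
    penalty: hypothetical cost charged per violated constraint
    """
    classes = sorted(scores)
    best_labels, best_value = set(), float("-inf")
    for bits in product((0, 1), repeat=len(classes)):
        y = dict(zip(classes, bits))
        # Score of the label assignment.
        value = sum(scores[c] for c in classes if y[c])
        # Subset constraint penalty.
        value -= penalty * sum(1 for child, parent in subset
                               if y[child] and not y[parent])
        # Mutex constraint penalty.
        value -= penalty * sum(1 for a, b in mutex if y[a] and y[b])
        if value > best_value:
            best_labels = {c for c in classes if y[c]}
            best_value = value
    return best_labels, best_value


# Hypothetical scores for the entity "Pittsburgh".
scores = {"location": 0.6, "city": 0.5, "person": 0.3}
subset = [("city", "location")]    # every city is a location
mutex = [("location", "person")]   # locations and persons are disjoint

labels, value = best_assignment(scores, subset, mutex)
print(sorted(labels), value)
```

With these made-up numbers the best assignment is {city, location}: adding person would trigger the mutex penalty, and choosing city without location would trigger the subset penalty.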

