On Dataless Hierarchical Text Classification


Yangqiu Song and Dan Roth, Computer Science Department, University of Illinois at Urbana-Champaign

Document Classification
Example document: "On Feb. 8, Dong Nguyen announced that he would be removing his hit game Flappy Bird from both the iOS and Android app stores, saying that the success of the game is something he never wanted. Some fans of the game took it personally, replying that they would either kill Nguyen or kill themselves if he followed through with his decision."
Pick a label: Class or Class2?
Pick a label: Mobile Game or Sports?
Labels carry a lot of information!

Dataless Classification
Dataless definition: no labeled data is used for training; classification depends on understanding the labels.
The importance of semantic representation: given a news article and the question "Mobile Game or Sports?", a sports article may contain names of players, teams, and activities of a match without ever mentioning the word "sport". A new representation should be used.
Examples from:
Chang, M.; Ratinov, L.; Roth, D.; and Srikumar, V. Importance of semantic representation: Dataless classification. In AAAI.
Elhoseiny, M.; Saleh, B.; and Elgammal, A. Write a classifier: Zero shot learning using purely textual descriptions. In ICCV, 2013.

Text Representation
Explicit Semantic Analysis (ESA): Gabrilovich and Markovitch. AAAI, 2006; IJCAI, 2007.
Example of a word representation: the ESA representation of "Barack Obama" includes the Wikipedia concepts "Timeline of the presidency of Barack Obama (2009)", "Family of Barack Obama", "Barack Obama citizenship conspiracy theories", and "Barack Obama presidential primary campaign 2008". Similarity is computed between these representations.
Word Brown Clusters: Brown et al., Computational Linguistics, 1992.
Neural Network Word Embedding: Mikolov et al. NIPS and HLT-NAACL; Collobert et al. JMLR (Senna); Turian et al., ACL, 2010.

Document Representation
Method           Representation   Feature
Bag-of-words     Sparse vector    Words’ TFIDF in non-zero elements
ESA              Sparse vector    Wikipedia concept weights in non-zero elements
Brown Cluster    Sparse vector    Cluster ID from root to path
Word Embedding   Dense vector     Compact real values

Label Representation
Labels are represented in the same way as documents.

Classification Procedure
Hierarchical Classification
Top-down classification.
Bottom-up classification (flat classification).

Bootstrapping
Dataless + semi-supervised learning.
Initialize N documents for each label (pure dataless classification).
Train a classifier to label N more documents for each label.
Continue until no unlabeled document remains.

Results
20newsgroups (Pure Dataless)
[Figure: results plots. X-axis: for ESA, the number of concepts; for BC (Brown clusters), the depth in the word hierarchy; for WE (word embedding), the number of dimensions of the word embedding.]
Unsupervised baseline: OHLDA (Ha-Thuc, V., and Renders, J.-M. Large-scale hierarchical text classification without labeled data. In WSDM).
Supervised baseline.

Conclusions
It is possible to classify documents into multiple (hierarchical) categories.
Systematic comparison: different representations; bottom-up vs. top-down; bootstrapping.
Possibility of classification in a larger label space.

This work is supported by the Army Research Laboratory (ARL) under agreement W911NF , by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number D11PC20155, and by DARPA under agreement number FA
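The pure dataless procedure described on the poster (put labels and documents in the same representation space, then pick the most similar label, either top-down through the hierarchy or bottom-up over the leaves) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `embed` is a toy bag-of-words stand-in for a semantic representation such as ESA, and the hierarchy and the label descriptions in `HIERARCHY` and `DESCRIPTIONS` are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a semantic representation (assumption): word counts."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical two-level label hierarchy (not taken from the poster).
HIERARCHY = {
    "sports": {"baseball": {}, "hockey": {}},
    "computers": {"graphics": {}, "hardware": {}},
}

# Hypothetical label descriptions; a real system would embed the label names.
DESCRIPTIONS = {
    "sports": "sports game team player match",
    "baseball": "baseball pitcher inning bat",
    "hockey": "hockey ice puck goalie",
    "computers": "computer software hardware graphics",
    "graphics": "graphics image rendering pixel",
    "hardware": "hardware disk drive memory",
}


def label_score(doc_vec, label):
    return cosine(doc_vec, embed(DESCRIPTIONS[label]))


def classify_top_down(text):
    """Pick the most similar child at each level, from the root to a leaf."""
    doc_vec = embed(text)
    path, children = [], HIERARCHY
    while children:
        label = max(sorted(children), key=lambda l: label_score(doc_vec, l))
        path.append(label)
        children = children[label]
    return path


def leaf_paths(tree, prefix=()):
    """Yield the root-to-leaf path of every leaf label."""
    for label, subtree in tree.items():
        if subtree:
            yield from leaf_paths(subtree, prefix + (label,))
        else:
            yield prefix + (label,)


def classify_bottom_up(text):
    """Flat classification over the leaves; ancestors follow from the leaf."""
    doc_vec = embed(text)
    best = max(sorted(leaf_paths(HIERARCHY)), key=lambda p: label_score(doc_vec, p[-1]))
    return list(best)
```

With word counts, a document only matches a label when they share surface words, which is exactly the limitation the poster's "Mobile Game or Sports?" example points at; swapping `embed` for a semantic representation is what would let an article match "sports" without containing that word.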
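The bootstrapping loop on the poster (initialize N documents per label with pure dataless classification, train a classifier to label N more per label, repeat until nothing is unlabeled) might look like the sketch below. Several choices here are assumptions made for illustration: the poster does not say which classifier is trained, so a simple centroid model built from the label description plus the documents labeled so far stands in for it; `embed` is a toy word-count representation; and documents that match no label are held back until a later round.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a semantic representation (assumption): word counts."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def bootstrap(docs, label_descriptions, n=1):
    """Label every document; returns {document index: label}."""
    vecs = [embed(d) for d in docs]
    # Round 0 models are the label descriptions alone (pure dataless).
    models = {l: embed(desc) for l, desc in label_descriptions.items()}
    assigned = {}
    while len(assigned) < len(docs):
        unlabeled = [i for i in range(len(docs)) if i not in assigned]
        # Best (label, score) for each unlabeled document under current models.
        best = {
            i: max(
                ((l, cosine(vecs[i], models[l])) for l in sorted(models)),
                key=lambda pair: pair[1],
            )
            for i in unlabeled
        }
        newly = []
        for label in sorted(models):
            # Candidates: documents whose best label is this one, score > 0.
            cands = [i for i in unlabeled if best[i][0] == label and best[i][1] > 0]
            cands.sort(key=lambda i: best[i][1], reverse=True)
            newly.extend((i, label) for i in cands[:n])
        if not newly:
            # Nothing matches any model: take the best guesses and finish.
            newly = [(i, best[i][0]) for i in unlabeled]
        for i, label in newly:
            assigned[i] = label
        # "Train": rebuild each model from its description plus its documents.
        models = {l: embed(desc) for l, desc in label_descriptions.items()}
        for i, label in assigned.items():
            models[label].update(vecs[i])
    return assigned


# Hypothetical toy data, not from the poster.
LABELS = {"hockey": "hockey puck goalie", "baseball": "baseball pitcher inning"}
DOCS = [
    "goalie stopped puck",
    "goalie save overtime",
    "pitcher finished inning",
    "crowd cheered overtime save",
]
RESULT = bootstrap(DOCS, LABELS, n=1)
```

In this toy run the last document shares no words with either label description, so pure dataless matching cannot place it; it is labeled only after "save" and "overtime" enter the hockey model through an earlier bootstrapped document.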

