Word sense induction using continuous vector space models


1 Word sense induction using continuous vector space models
Mikael Kågebäck, Fredrik Johansson, Richard Johansson*, Devdatt Dubhashi. LAB, Chalmers University of Technology; *Språkbanken, University of Gothenburg

2 Word Sense Induction (WSI)
Automatic discovery of word senses: given a corpus, discover the senses of a given word, e.g. rock (the stone vs. the music genre).

3 Applications of WSI
Novel sense detection
Temporal/geographical word sense drift
Localized word sense lexicons
Machine translation
Text understanding
…and more.

4 Context clustering
Compute embeddings for word instances in a corpus, based on their contexts. Cluster this embedding space and let the centroids represent the senses (a sketch follows below). Pioneered by Hinrich Schütze (1998). Assumption: the distributional hypothesis holds.
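A minimal sketch of this clustering step, assuming instance embeddings have already been computed; the array shape, the choice of three clusters, and all names are illustrative:

    import numpy as np
    from sklearn.cluster import KMeans

    # Each row is the embedding of one instance of the target word,
    # derived from the words surrounding that instance (random stand-in data).
    rng = np.random.default_rng(0)
    instance_vectors = rng.normal(size=(100, 50))

    # Cluster the instance space; each centroid then stands in for one sense.
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(instance_vectors)
    sense_centroids = kmeans.cluster_centers_  # one vector per induced sense
    sense_labels = kmeans.labels_              # induced sense of each instance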

5 Instance-context Embeddings (ICE)
Based on word embeddings computed with the skip-gram model, which can be viewed as a low-rank approximate factorization of a normalized co-occurrence matrix C, with context-word embeddings in V and word embeddings in U.
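One concrete reading of this factorization is the result of Levy and Goldberg (2014) for skip-gram with negative sampling; identifying the slide's normalized co-occurrence matrix with the shifted PMI matrix below is an interpretation, with k the number of negative samples:

    C \approx U V^{\top}, \qquad
    C_{ij} = \operatorname{PMI}(w_i, c_j) - \log k
           = \log \frac{P(w_i, c_j)}{P(w_i)\, P(c_j)} - \log k

Row u_i of U is the embedding of word w_i and row v_j of V the embedding of context word c_j, so each skip-gram score u_i^T v_j approximates their shifted PMI.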

6 Instance-context Embeddings (ICE)
Let a weighted mean of the skip-gram context vectors form the instance vector, but:
Apply a triangular window function over word positions.
Weight each context word by its association with the target word, a quantity related to PMI (Levy and Goldberg, 2014).
This weighting naturally removes stop words.
A sketch of the weighting follows below.
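A minimal sketch of the ICE weighting under the assumptions above; the window size, the clipping of negative associations to zero, and all names are illustrative rather than the paper's exact formulation:

    import numpy as np

    def ice_vector(target_u, context, window=10):
        # target_u: skip-gram word vector u_w of the target word
        # context:  list of (offset, v_c) pairs, where offset is the absolute
        #           position distance from the target and v_c the context vector
        acc = np.zeros_like(target_u)
        for offset, v_c in context:
            tri = max(0.0, 1.0 - offset / (window + 1))  # triangular window
            assoc = max(float(target_u @ v_c), 0.0)      # ~ shifted PMI score;
                                                         # clipping is an assumption
            acc += tri * assoc * v_c
        norm = np.linalg.norm(acc)
        return acc / norm if norm > 0 else acc

Because stop words have low association with almost any target word, their weights are small and they effectively drop out of the instance vector.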

7 Plotted instances for ‘paper’
[Figure: instance vectors for ‘paper’ plotted with t-SNE, comparing ICE to the unweighted mean vector.]

8 Proposed algorithm
1. Train a skip-gram model on the corpus.
2. Compute instance representations using ICE, one for each instance of a word in the corpus.
3. Cluster using (nonparametric) k-means, selecting the number of clusters as in Pham et al. (2005).
4. (Evaluation) Disambiguate test data using the obtained cluster centroids.
An end-to-end sketch of this pipeline follows below.
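A sketch of the pipeline, assuming gensim and scikit-learn; the toy corpus, the plain-mean stand-in for ICE, and the fixed number of clusters (the paper selects it nonparametrically via Pham et al., 2005) are all illustrative:

    import numpy as np
    from gensim.models import Word2Vec
    from sklearn.cluster import KMeans

    # 1. Train a skip-gram model (sg=1) on the corpus.
    corpus = [["the", "rock", "band", "played"], ["a", "rock", "fell"]] * 50
    model = Word2Vec(corpus, vector_size=50, window=10, sg=1, min_count=1)

    def instance_vector(tokens, i, window=10):
        # Stand-in for ICE: unweighted mean of the context vectors around
        # position i (the ICE weighting sketched above would replace this).
        ctx = [model.wv[t] for j, t in enumerate(tokens)
               if j != i and abs(j - i) <= window and t in model.wv]
        return np.mean(ctx, axis=0)

    # 2. One representation per instance of the target word in the corpus.
    X = np.array([instance_vector(s, s.index("rock"))
                  for s in corpus if "rock" in s])

    # 3. Cluster the instances; the centroids are the induced senses.
    senses = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)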

9 SemEval 2013, task 13
WSI: identify senses in ukWaC.
WSD: disambiguate test words to one of the induced senses.
Evaluation: compare to the annotated WordNet labels. A sketch of the nearest-centroid assignment follows below.
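The disambiguation step reduces to nearest-centroid assignment; a minimal sketch, where instance_vector and senses are the names assumed in the pipeline sketch above:

    import numpy as np

    def disambiguate(instance_vec, sense_centroids):
        # Assign a test instance to the closest induced sense centroid.
        dists = np.linalg.norm(sense_centroids - instance_vec, axis=1)
        return int(np.argmin(dists))

    # e.g. disambiguate(instance_vector(tokens, i), senses.cluster_centers_)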

10–12 Detailed results: SemEval 2013, task 13
[Result tables for the SemEval 2013 task 13 evaluation; the table contents were not preserved in this transcript.]

13 Conclusions
Using skip-gram word embeddings clearly boosts WSI performance: the embeddings provide a semantic representation for each word and indicate which context words are most important.

14 ICE profile
[Figure slide; the figure content was not preserved in this transcript.]

15 Evaluation: SemEval 2013, task 13
Corpus: ukWaC. 50 lemmas with 100 instances per lemma, annotated with WordNet senses.

