Word sense induction using continuous vector space models


1 Word sense induction using continuous vector space models
Mikael Kågebäck, Fredrik Johansson, Richard Johansson*, Devdatt Dubhashi. LAB, Chalmers University of Technology; *Språkbanken, University of Gothenburg

2 Word Sense Induction (WSI)
Automatic discovery of word senses: given a corpus, discover the senses of a given word, e.g. rock (the stone vs. the music genre).

3 Applications of WSI
Novel sense detection
Temporal/geographical word sense drift
Localized word sense lexicons
Machine translation
Text understanding
…and more.

4 Context clustering
Compute embeddings for word instances in a corpus, based on their contexts. Cluster this embedding space and let the centroids represent the senses (a sketch follows below). Pioneered by Hinrich Schütze (1998). Assumption: the distributional hypothesis holds.
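A minimal sketch of this clustering step, assuming instance embeddings have already been computed; the array shape, the choice of three clusters, and all names are illustrative:

    import numpy as np
    from sklearn.cluster import KMeans

    # Each row is the embedding of one instance of the target word,
    # derived from the words surrounding that instance (random stand-in data).
    rng = np.random.default_rng(0)
    instance_vectors = rng.normal(size=(100, 50))

    # Cluster the instance space; each centroid then stands in for one sense.
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(instance_vectors)
    sense_centroids = kmeans.cluster_centers_  # one vector per induced sense
    sense_labels = kmeans.labels_              # induced sense of each instance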

5 Instance-context Embeddings (ICE)
Based on word embeddings computed with the skip-gram model, which can be viewed as a low-rank approximate factorization of a normalized co-occurrence matrix C, with context-word embeddings in V and word embeddings in U.
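One concrete reading of this factorization is the result of Levy and Goldberg (2014) for skip-gram with negative sampling; identifying the slide's normalized co-occurrence matrix with the shifted PMI matrix below is an interpretation, with k the number of negative samples:

    C \approx U V^{\top}, \qquad
    C_{ij} = \operatorname{PMI}(w_i, c_j) - \log k
           = \log \frac{P(w_i, c_j)}{P(w_i)\, P(c_j)} - \log k

Row u_i of U is the embedding of word w_i and row v_j of V the embedding of context word c_j, so each skip-gram score u_i^T v_j approximates their shifted PMI.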

6 Instance-context Embeddings (ICE)
Let a weighted mean of the skip-gram context vectors form the instance vector, but:
Apply a triangular window function over word positions.
Weight each context word by its association with the target word, a quantity related to PMI (Levy and Goldberg, 2014).
This weighting naturally removes stop words.
A sketch of the weighting follows below.
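A minimal sketch of the ICE weighting under the assumptions above; the window size, the clipping of negative associations to zero, and all names are illustrative rather than the paper's exact formulation:

    import numpy as np

    def ice_vector(target_u, context, window=10):
        # target_u: skip-gram word vector u_w of the target word
        # context:  list of (offset, v_c) pairs, where offset is the absolute
        #           position distance from the target and v_c the context vector
        acc = np.zeros_like(target_u)
        for offset, v_c in context:
            tri = max(0.0, 1.0 - offset / (window + 1))  # triangular window
            assoc = max(float(target_u @ v_c), 0.0)      # ~ shifted PMI score;
                                                         # clipping is an assumption
            acc += tri * assoc * v_c
        norm = np.linalg.norm(acc)
        return acc / norm if norm > 0 else acc

Because stop words have low association with almost any target word, their weights are small and they effectively drop out of the instance vector.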

7 Plotted instances for ‘paper’
[Figure: instance vectors for ‘paper’ plotted with t-SNE, comparing ICE to the unweighted mean vector.]

8 Proposed algorithm
1. Train a skip-gram model on the corpus.
2. Compute instance representations using ICE, one for each instance of a word in the corpus.
3. Cluster using (nonparametric) k-means, selecting the number of clusters as in Pham et al. (2005).
4. (Evaluation) Disambiguate test data using the obtained cluster centroids.
An end-to-end sketch of this pipeline follows below.
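A sketch of the pipeline, assuming gensim and scikit-learn; the toy corpus, the plain-mean stand-in for ICE, and the fixed number of clusters (the paper selects it nonparametrically via Pham et al., 2005) are all illustrative:

    import numpy as np
    from gensim.models import Word2Vec
    from sklearn.cluster import KMeans

    # 1. Train a skip-gram model (sg=1) on the corpus.
    corpus = [["the", "rock", "band", "played"], ["a", "rock", "fell"]] * 50
    model = Word2Vec(corpus, vector_size=50, window=10, sg=1, min_count=1)

    def instance_vector(tokens, i, window=10):
        # Stand-in for ICE: unweighted mean of the context vectors around
        # position i (the ICE weighting sketched above would replace this).
        ctx = [model.wv[t] for j, t in enumerate(tokens)
               if j != i and abs(j - i) <= window and t in model.wv]
        return np.mean(ctx, axis=0)

    # 2. One representation per instance of the target word in the corpus.
    X = np.array([instance_vector(s, s.index("rock"))
                  for s in corpus if "rock" in s])

    # 3. Cluster the instances; the centroids are the induced senses.
    senses = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)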

9 SemEval 2013, task 13
WSI: identify senses in ukWaC.
WSD: disambiguate test words to one of the induced senses.
Evaluation: compare to the annotated WordNet labels. A sketch of the nearest-centroid assignment follows below.
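The disambiguation step reduces to nearest-centroid assignment; a minimal sketch, where instance_vector and senses are the names assumed in the pipeline sketch above:

    import numpy as np

    def disambiguate(instance_vec, sense_centroids):
        # Assign a test instance to the closest induced sense centroid.
        dists = np.linalg.norm(sense_centroids - instance_vec, axis=1)
        return int(np.argmin(dists))

    # e.g. disambiguate(instance_vector(tokens, i), senses.cluster_centers_)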

10–12 Detailed results: SemEval 2013, task 13
[Result tables for the SemEval 2013 task 13 evaluation; the table contents were not preserved in this transcript.]

13 Conclusions
Using skip-gram word embeddings clearly boosts WSI performance: the embeddings provide a semantic representation for each word and indicate which context words are most important.

14 ICE profile
[Figure slide; the figure content was not preserved in this transcript.]

15 Evaluation: SemEval 2013, task 13
Corpus: ukWaC. 50 lemmas with 100 instances per lemma, annotated with WordNet senses.

