
1 Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures
Presenter: Cosmin Adrian Bejan
Alexander Budanitsky and Graeme Hirst
Department of Computer Science
University of Toronto

2 Overview
- The purpose of the paper is to compare the performance of several measures of semantic relatedness that have been proposed for use in NLP applications.
- There are three kinds of approaches to evaluating measures of similarity or semantic distance:
  - The first is theoretical examination of a given measure for properties thought desirable.
  - The second is comparison with human judgments.
  - The third is evaluation of the measures with respect to their performance within a particular NLP application.

3 Network-based measures of semantic distance
- Hirst-St-Onge: two lexicalized concepts are semantically close if their WordNet synsets are connected by a path that is not too long and that “does not change direction too often”.
- Leacock-Chodorow: also relies on the length len(c1, c2) of the shortest path between two synsets, but limits attention to IS-A links and scales the path length by the overall depth D of the taxonomy.
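For reference, the two measures on this slide are usually written as follows. This is a reconstruction from the paper by Budanitsky and Hirst, not text recovered from the slide. C and k are constants, d is the number of changes of direction in the path, and the Hirst-St-Onge score is taken to be zero when no qualifying path exists.

```latex
\begin{align*}
\mathrm{rel}_{HS}(c_1, c_2) &= C - \text{path length} - k \times d \\
\mathrm{sim}_{LC}(c_1, c_2) &= -\log \frac{\mathrm{len}(c_1, c_2)}{2D}
\end{align*}
```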

4 Network-based measures of semantic distance
- Resnik: defines the similarity between two concepts lexicalized in WordNet to be the information content of their most specific common subsumer lso(c1, c2).
- Jiang-Conrath: also uses information content, but in the form of the conditional probability of encountering an instance of a child synset given an instance of a parent synset.
- Lin: uses the same information-content terms as Jiang-Conrath, but combines them as a ratio rather than a difference.
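The three information-content measures on this slide can be sketched in a few lines. The taxonomy, the concept names, and the probabilities below are all hypothetical values chosen for illustration; in practice p(c) is estimated from corpus counts over WordNet synsets. The formulas follow the standard definitions, with information content IC(c) = -log p(c).

```python
import math

# Hypothetical concept probabilities p(c) for a toy IS-A taxonomy.
# These numbers are made up; real values come from corpus frequencies.
P = {"entity": 1.0, "vehicle": 0.1, "car": 0.02, "bicycle": 0.01}

# Hypothetical child -> parent links for the same toy taxonomy.
PARENT = {"vehicle": "entity", "car": "vehicle", "bicycle": "vehicle"}


def ancestors(c):
    """Return c followed by its ancestors, most specific first."""
    chain = [c]
    while c in PARENT:
        c = PARENT[c]
        chain.append(c)
    return chain


def lso(c1, c2):
    """Most specific common subsumer of c1 and c2 in the toy taxonomy."""
    seen = set(ancestors(c1))
    for a in ancestors(c2):
        if a in seen:
            return a
    raise ValueError("concepts share no common subsumer")


def ic(c):
    """Information content: IC(c) = -log p(c)."""
    return -math.log(P[c])


def sim_resnik(c1, c2):
    """Resnik: information content of the most specific common subsumer."""
    return ic(lso(c1, c2))


def dist_jiang_conrath(c1, c2):
    """Jiang-Conrath distance: IC(c1) + IC(c2) - 2 * IC(lso(c1, c2))."""
    return ic(c1) + ic(c2) - 2 * ic(lso(c1, c2))


def sim_lin(c1, c2):
    """Lin similarity: 2 * IC(lso(c1, c2)) / (IC(c1) + IC(c2)).

    Undefined (division by zero) when both concepts have p(c) = 1.
    """
    return 2 * ic(lso(c1, c2)) / (ic(c1) + ic(c2))
```

Note that Resnik and Lin are similarity scores (larger means closer), while Jiang-Conrath is a distance (smaller means closer), so the three cannot be compared without accounting for direction.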

5 Comparison with human ratings of similarity
- Rubenstein and Goodenough: 65 pairs of words ranging from “highly synonymous” to “semantically unrelated”; 51 subjects were asked to rate them on a scale of 0.0 to 4.0.
- Miller and Charles: extracted 30 pairs from the original 65 (10 from the high level, 3-4; 10 from the intermediate level, 1-3; and 10 from the low level, 0-1).
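One common way to compare a measure against such ratings is to correlate the measure's scores with the mean human ratings over the same word pairs. The slide does not name the statistic, so the use of Pearson correlation here is an assumption, and the two short lists are hypothetical placeholder values, not data from either study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical mean human ratings (0.0 to 4.0) for four word pairs,
# and hypothetical scores from some similarity measure for the same pairs.
human_ratings = [3.9, 3.1, 1.2, 0.1]
measure_scores = [0.95, 0.70, 0.40, 0.05]

r = pearson(human_ratings, measure_scores)
```

For a distance measure such as Jiang-Conrath, a strong negative correlation with the ratings is the desirable outcome.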

6 An application-based evaluation of measures of relatedness
- Evaluate the measures with respect to their performance within a particular NLP application: detection and correction of real-word spelling errors in open-class words, that is, malapropisms.
- Malapropism detection was viewed as a retrieval task, evaluated in terms of precision, recall, and F-measure, and divided into two stages:
  - In the first stage, a word is suspected of being a malapropism (and is called a suspect) if it is judged to be unrelated to other words nearby; the suspect is a true suspect if it is indeed a malapropism.
  - In the second stage, an alarm is raised when a spelling variation of a suspect is judged to be related to a nearby word; if an alarm word is a malapropism, then the alarm is a true alarm and the malapropism has been detected.
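The retrieval-style scoring described above can be sketched as follows. The function name and the representation of words as integer positions are assumptions made for illustration; the same function applies to either stage, with `flagged` holding the suspects (stage one) or the alarms (stage two).

```python
def precision_recall_f(flagged, actual):
    """Score a set of flagged word positions against the true malapropisms.

    flagged: positions the detector marked (suspects or alarms).
    actual:  positions that really are malapropisms.
    Returns (precision, recall, F-measure), using the balanced F-measure
    F = 2 * P * R / (P + R).
    """
    flagged, actual = set(flagged), set(actual)
    true_positives = len(flagged & actual)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: four alarms raised, three malapropisms present,
# two of the alarms are true alarms.
p, r, f = precision_recall_f({1, 2, 3, 4}, {2, 4, 6})
```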

7 Malapropism detection
- Method:
  - 500 articles from the Wall Street Journal corpus
  - remove proper nouns and stop-list words
  - replace one word in every 200 with a spelling variation
- For each measure, use four different search scopes:
  - scope 1: just the paragraph containing the target word
  - scopes 3 and 5: the paragraph plus one or two adjacent paragraphs on each side
  - scope MAX: the entire article
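The four search scopes can be read as a window of paragraphs around the target word. The sketch below is one way to express that; the function name, the zero-based paragraph indexing, and the clipping of the window at the start and end of the article are assumptions not stated on the slide.

```python
def paragraphs_in_scope(target_index, num_paragraphs, scope):
    """Return the indices of the paragraphs searched for a target word.

    target_index:   zero-based index of the paragraph containing the word.
    num_paragraphs: total number of paragraphs in the article.
    scope:          1, 3, 5, or the string "MAX".
    """
    if not 0 <= target_index < num_paragraphs:
        raise ValueError("target_index is outside the article")
    if scope == "MAX":
        # The entire article.
        return list(range(num_paragraphs))
    if scope not in (1, 3, 5):
        raise ValueError("scope must be 1, 3, 5, or 'MAX'")
    # Scope 1 -> 0 neighbours, scope 3 -> 1 per side, scope 5 -> 2 per side.
    radius = (scope - 1) // 2
    first = max(0, target_index - radius)
    last = min(num_paragraphs - 1, target_index + radius)
    return list(range(first, last + 1))
```

For example, with a ten-paragraph article and the target word in paragraph 4, scope 3 covers paragraphs 3 through 5, while a target in paragraph 0 with scope 5 covers only paragraphs 0 through 2 because the window is clipped at the start of the article.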

8 Suspicion

9 Detection

