
1 Distributional Part-of-Speech Tagging
Hinrich Schütze, CSLI, Ventura Hall, Stanford, CA 94305-4115, USA
email: schuetze@csli.stanford.edu
NLP Applications, presented by Masood Ghayoomi, Oct 15, 2007

2 Outline of the Talk
- Introduction
- Brief review of the literature
- Presenting a hypothesis
- Introducing the induction experiments
- Results
- Conclusions
- Discussion

3 Abstract of the Talk
The paper presents an algorithm for tagging words whose part-of-speech properties are unknown. The algorithm categorizes word tokens in context.

4 Introduction
Why is it needed? The increasing amount of online text calls for automatic techniques of text analysis.

5 Related Work
Stochastic tagging:
- Bigram or trigram models: require a relatively large tagged training text (Church, 1989; Charniak et al., 1993)
- Hidden Markov Models: require no pretagged text (Jelinek, 1985; Cutting et al., 1991; Kupiec, 1992)
Rule-based tagging:
- Transformation-based tagging as introduced by Brill (1993): requires a hand-tagged text for training

6 Other Related Work
- Using a connectionist net to predict words, reflecting grammatical categories (Elman, 1990)
- Inferring grammatical category from bigram statistics (Brill et al., 1990)
- Using vector models in which words are clustered according to the similarity of their close neighbors in a corpus (Finch and Chater, 1992; Finch, 1993)
- Presenting a probabilistic model for entropy maximization that relies on the immediate neighbors of words in a corpus (Kneser and Ney, 1993)
- Applying factor analysis to collocations of two target words with their immediate neighbors (Biber, 1993)

7 Hypothesis for the New Tagging Algorithm
The syntactic behavior of a word is represented with respect to its left and right context: the word's left neighbors are summarized in a left context vector, and its right neighbors in a right context vector (a code sketch follows).

Left neighbor | WORD | Right neighbor
      ↓                      ↓
Left context vector   Right context vector
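To make the hypothesis concrete, here is a minimal sketch in Python of how such context vectors could be collected from a tokenized corpus. The function name, the raw-count weighting, and the choice of the 250 most frequent words as vector dimensions are illustrative assumptions, not details given on the slide.

```python
# Minimal sketch: collect left/right context vectors from a token list.
# Assumptions (not from the slide): dimensions are the 250 most frequent
# words; entries are raw neighbor co-occurrence counts.
from collections import Counter
import numpy as np

def context_vectors(tokens, k=250):
    freq = Counter(tokens)
    dims = [w for w, _ in freq.most_common(k)]      # vector dimensions
    dim_idx = {w: i for i, w in enumerate(dims)}
    vocab = {w: i for i, w in enumerate(freq)}      # one row per word type
    left = np.zeros((len(vocab), len(dims)))
    right = np.zeros((len(vocab), len(dims)))
    for i, w in enumerate(tokens):
        row = vocab[w]
        if i > 0 and tokens[i - 1] in dim_idx:              # left neighbor
            left[row, dim_idx[tokens[i - 1]]] += 1
        if i + 1 < len(tokens) and tokens[i + 1] in dim_idx:  # right neighbor
            right[row, dim_idx[tokens[i + 1]]] += 1
    return vocab, left, right
```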

8 Four POS Tag Induction Experiments
- Based on word type only
- Based on word type and context
- Based on word type and context, restricted to "natural" contexts
- Based on word type and context, using generalized left and right context vectors

9 Word Type Only
- A baseline to evaluate the performance of distributional POS taggers.
- Word types from the Brown corpus are clustered into 200 classes by the similarity of their left and right context vectors; all occurrences of a word are assigned to one class (sketched below).
- Drawback: problematic for ambiguous words, e.g. "work", "book".
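A possible sketch of this type-level baseline, reusing the context vectors from the previous sketch. K-means is used here as a stand-in for the paper's clustering procedure, and L2-normalizing the rows makes Euclidean k-means approximate the cosine-style similarity the slide refers to.

```python
# Sketch of the type-level baseline: one of 200 classes per word TYPE,
# so every token of a word gets the same class (hence the ambiguity
# problem noted above). K-means stands in for the paper's clusterer.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

def induce_type_classes(left, right, n_classes=200, seed=0):
    X = normalize(np.hstack([left, right]))   # unit rows ~ cosine similarity
    km = KMeans(n_clusters=n_classes, random_state=seed, n_init=10)
    return km.fit_predict(X)                  # one class id per word type
```

With type classes in hand, tagging a token is just a dictionary lookup, which is exactly why ambiguous types like "work" are forced into a single class.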

10 Word Type and Context
A word's syntactic role depends on:
- the syntactic properties of its neighbors,
- its own potential relationships with those neighbors.
To take context into account for distributional tagging, a token of word w is represented by (sketched below):
- the right context vector of the preceding word,
- the left context vector of w,
- the right context vector of w,
- the left context vector of the following word.
Drawback: fails for words whose neighbors are punctuation marks, since there are no grammatical dependencies between words and punctuation marks, in contrast to the strong dependencies between neighboring words.
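A minimal sketch of this token-level representation, building on the earlier sketches; the concatenation order follows the four items listed above, and the function name is illustrative.

```python
# Sketch: represent the token at position i by four concatenated
# vectors, as listed above. Clustering THESE vectors (rather than the
# type vectors) lets different tokens of one word land in different
# classes.
import numpy as np

def token_vector(tokens, i, vocab, left, right):
    prev_row = vocab[tokens[i - 1]]   # preceding word
    row = vocab[tokens[i]]            # the word w itself
    next_row = vocab[tokens[i + 1]]   # following word
    return np.concatenate([right[prev_row],   # right vector of preceding word
                           left[row],         # left vector of w
                           right[row],        # right vector of w
                           left[next_row]])   # left vector of following word
```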

11 Word Type and Context, Restricted to "Natural" Contexts
To address this drawback, only words with informative contexts were considered: words next to punctuation marks and words with rare words as neighbors (fewer than ten occurrences) were excluded (a sketch of such a filter follows).
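A sketch of such an informativeness filter, under the assumption that punctuation means ASCII punctuation tokens and "rare" means fewer than ten corpus occurrences, as stated above; the details are illustrative.

```python
# Sketch: keep only token positions whose immediate neighbors are
# informative, i.e. neither punctuation nor rare (< 10 occurrences).
from collections import Counter
import string

PUNCT = set(string.punctuation)  # illustrative punctuation set

def natural_positions(tokens, min_freq=10):
    freq = Counter(tokens)
    def informative(w):
        return w not in PUNCT and freq[w] >= min_freq
    return [i for i in range(1, len(tokens) - 1)
            if informative(tokens[i - 1]) and informative(tokens[i + 1])]
```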

12 Word Type and Context, Using Generalized Left and Right Context Vectors
Generalization: the generalized right context vector of a word records which classes of left context vectors occur to its right, and vice versa.
In this method, the information about a word's left and right context vectors is kept separate in the computation, whereas the previous methods always use the word-based left and right context vectors directly.
The method is applied in two steps (sketched below):
- a generalized right context vector is formed for each word by considering the 200 classes of left context vectors;
- generalized left context vectors are formed analogously, using the word-based right context vectors.
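One way to read the two steps in code, continuing the earlier sketches: cluster the word-based left context vectors into 200 classes, then count those classes over each word's right neighbors; the symmetric construction gives the generalized left vectors. K-means again stands in for the paper's clusterer.

```python
# Sketch of steps 1 and 2 for the generalized RIGHT context vector;
# generalized LEFT vectors are built symmetrically from the word-based
# right context vectors.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

def generalized_right_vectors(tokens, vocab, left, n_classes=200, seed=0):
    # Step 1: 200 classes of word-based left context vectors.
    km = KMeans(n_clusters=n_classes, random_state=seed, n_init=10)
    left_class = km.fit_predict(normalize(left))     # class per word type
    # Step 2: for each word, count the left-vector classes of its
    # right neighbors.
    gen_right = np.zeros((len(vocab), n_classes))
    for i in range(len(tokens) - 1):
        gen_right[vocab[tokens[i]], left_class[vocab[tokens[i + 1]]]] += 1
    return gen_right
```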

13 Two Examples
- "seemed" and "would" have similar left contexts, and they characterize the right contexts of "he" and "the firefighter": the words following these noun phrases are verbs that potentially belong to one syntactic category.
- Transitive verbs and prepositions belong to different syntactic categories, but their right contexts are identical: both require a noun phrase.

14 Results
The Penn Treebank parses of the Brown corpus were used. The results of the four experiments are evaluated by forming 16 classes of tags from the Penn Treebank. The columns of the result tables are (a sketch of the computation follows):
- t: tag
- frequency: the frequency of t in the corpus
- # classes: the number of induced tags i0, i1, ..., il assigned to t
- correct: the number of times an occurrence of t was correctly labeled as belonging to one of i0, i1, ..., il
- incorrect: the number of times that a token of a different tag t' was miscategorized as an instance of i0, i1, ..., il
- precision: the number of correct tokens divided by the sum of correct and incorrect tokens
- recall: the number of correct tokens divided by the total number of tokens of t
- F: an aggregate score computed from precision and recall
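For concreteness, here is a sketch of how these quantities could be computed, under the assumptions that each induced class is mapped to the gold tag it most frequently co-occurs with and that F is the usual harmonic mean of precision and recall; the mapping rule is an illustrative choice, not spelled out on the slide.

```python
# Sketch: per-tag precision, recall, and F for induced classes, using a
# majority-vote mapping from induced class to gold tag (an assumption).
from collections import Counter, defaultdict

def evaluate(gold_tags, induced_classes):
    # Map each induced class to its majority gold tag.
    by_class = defaultdict(Counter)
    for g, c in zip(gold_tags, induced_classes):
        by_class[c][g] += 1
    cls2tag = {c: cnt.most_common(1)[0][0] for c, cnt in by_class.items()}
    correct, incorrect, total = Counter(), Counter(), Counter(gold_tags)
    for g, c in zip(gold_tags, induced_classes):
        pred = cls2tag[c]
        if pred == g:
            correct[g] += 1       # token of g labeled with one of g's classes
        else:
            incorrect[pred] += 1  # token of another tag miscategorized as pred
    scores = {}
    for t in total:
        denom = correct[t] + incorrect[t]
        p = correct[t] / denom if denom else 0.0
        r = correct[t] / total[t]
        f = 2 * p * r / (p + r) if p + r else 0.0
        scores[t] = (p, r, f)
    return scores
```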

15 Result: Word Type Only
Table 1: Precision and recall for induction based on word type.

16 Result: Word Type and Context
Table 2: Precision and recall for induction based on word type and context.

17 Result: Word Type and Context; Generalized Left and Right Context Vectors
Table 3: Precision and recall for induction based on generalized context vectors.

18 Result: Word Type and Context; Restricted to "Natural" Contexts
Table 4: Precision and recall for induction for natural contexts.

19 Conclusions
- Taking context into account improves the performance of distributional tagging, as the F score increases across the experiments: 0.49 < 0.72 < 0.74 < 0.79.
- Performance with generalized context vectors is better than with word-based context vectors (0.74 vs. 0.72).

20 Discussion
- The "natural" contexts setting performs best (F = 0.79), even though the low quality of the distributional information about punctuation marks and rare words is a difficulty for this kind of tag induction.
- Induction performs fairly well for typical and frequent contexts: prepositions, determiners, pronouns, conjunctions, the infinitive marker, modals, and the possessive marker.
- Tag induction fails for punctuation, rare words, and "-ing" forms; present participles and gerunds are difficult because both exhibit verbal and nominal properties.

21 Thanks for listening!

