Simple Features for Chinese Word Sense Disambiguation
Hoa Trang Dang, Ching-yi Chia, Martha Palmer, Fu-Dong Chiou
Computer and Information Science, University of Pennsylvania


1 Simple Features for Chinese Word Sense Disambiguation
Hoa Trang Dang, Ching-yi Chia, Martha Palmer, Fu-Dong Chiou
Computer and Information Science, University of Pennsylvania
{htd, chingyc, mpalmer, chioufd}@unagi.cis.upenn.edu

2 Overview
Maximum entropy WSD feature types
English Senseval2 verbs
Chinese
  – Penn Chinese Treebank
  – People's Daily News
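The slides do not spell out the maximum entropy model itself. The sketch below is one minimal way such a classifier could look, assuming binary string features, plain batch gradient ascent, and no regularization. The function names, the toy features, and the sense labels are all hypothetical and are not taken from the authors' system.

```python
import math
from collections import defaultdict

def predict_proba(weights, feats, senses):
    """Return P(sense | features) under a log-linear (maximum entropy) model."""
    scores = {s: sum(weights.get((f, s), 0.0) for f in feats) for s in senses}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {s: math.exp(v - top) for s, v in scores.items()}
    total = sum(exps.values())
    return {s: e / total for s, e in exps.items()}

def train_maxent(examples, senses, epochs=200, learning_rate=0.5):
    """Fit one weight per (feature, sense) pair by batch gradient ascent.

    The gradient of the log-likelihood is the observed feature count
    minus the feature count expected under the current model.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        gradient = defaultdict(float)
        for feats, gold in examples:
            probs = predict_proba(weights, feats, senses)
            for f in feats:
                gradient[(f, gold)] += 1.0        # observed count
                for s in senses:
                    gradient[(f, s)] -= probs[s]  # expected count
        for key, g in gradient.items():
            weights[key] += learning_rate * g / len(examples)
    return weights

def classify(weights, feats, senses):
    """Pick the sense with the highest model probability."""
    probs = predict_proba(weights, feats, senses)
    return max(probs, key=probs.get)

# Hypothetical toy data: two senses of one verb, two feature types.
SENSES = ["sense_A", "sense_B"]
TOY_EXAMPLES = [
    (["word+1=up", "pos-1=PRP"], "sense_A"),
    (["word+1=up", "pos-1=NN"], "sense_A"),
    (["word+1=for", "pos-1=PRP"], "sense_B"),
    (["word+1=for", "pos-1=NN"], "sense_B"),
]
model = train_maxent(TOY_EXAMPLES, SENSES)
predicted = classify(model, ["word+1=up", "pos-1=NN"], SENSES)
```

In this toy set only the `word+1` feature separates the senses, so the model learns to rely on it while the `pos-1` weights stay near zero.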

3 English Senseval2 verbs
Primarily Penn Treebank WSJ corpus
WordNet 1.7 sense inventory
29 verbs
15.6 senses/verb in corpus
Baseline (most frequent sense): 40%
Best system performance: 60%
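The most-frequent-sense baseline quoted above can be sketched as follows. The sketch assumes the most frequent sense is chosen from training labels and scored on held-out labels for a single word; the slides do not state the exact protocol, and the sense labels here are hypothetical.

```python
from collections import Counter

def most_frequent_sense_baseline(train_senses, test_senses):
    """Always predict the sense seen most often in training.

    Returns the chosen sense and its accuracy on the test labels.
    """
    most_common_sense, _ = Counter(train_senses).most_common(1)[0]
    correct = sum(1 for s in test_senses if s == most_common_sense)
    return most_common_sense, correct / len(test_senses)

# Hypothetical sense labels for one target word.
train = ["s1", "s1", "s2", "s1", "s3"]
test = ["s1", "s2", "s1", "s1"]
sense, accuracy = most_frequent_sense_baseline(train, test)
```

A corpus-level baseline would repeat this per target word and average over all test instances.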

4 Local Collocational Features (English)
Collocational features for w
  – word w
  – pos of w
  – pos of words at positions +1, -1 relative to w
  – words at positions -2, -1, +1, +2 relative to w
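The feature list above can be sketched as a small extraction function over a POS-tagged sentence. The feature names and the `<pad>` token for positions outside the sentence are assumptions made for illustration.

```python
def collocational_features(tagged, i):
    """Extract local collocational features for the target at index i.

    tagged is a list of (word, pos) pairs for one sentence.
    """
    def word_at(offset):
        j = i + offset
        return tagged[j][0] if 0 <= j < len(tagged) else "<pad>"

    def pos_at(offset):
        j = i + offset
        return tagged[j][1] if 0 <= j < len(tagged) else "<pad>"

    feats = {"word": word_at(0), "pos": pos_at(0)}
    for offset in (-1, 1):          # pos of the immediate neighbours
        feats[f"pos{offset:+d}"] = pos_at(offset)
    for offset in (-2, -1, 1, 2):   # words in a two-word window
        feats[f"word{offset:+d}"] = word_at(offset)
    return feats

# Hypothetical tagged sentence; the target verb is at index 1.
sentence = [("He", "PRP"), ("called", "VBD"), ("the", "DT"), ("office", "NN")]
features = collocational_features(sentence, 1)
```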

5 Local Syntactic Features (English)
Syntactic features
  – whether or not the sentence is passive
  – whether there is a subject, direct object, indirect object, or clausal complement
  – the words (if any) in the positions of subject, direct object, indirect object, particle, prepositional complement (and its object)
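As a rough illustration, the syntactic features above could be read off a parse summary like this. The dictionary layout, the role names, and the feature names are hypothetical; the slides do not describe how the parse was represented.

```python
# Roles whose presence is recorded as a yes/no feature.
PRESENCE_ROLES = ("subject", "direct_object", "indirect_object", "clausal_complement")
# Roles whose head word is recorded when present.
WORD_ROLES = ("subject", "direct_object", "indirect_object",
              "particle", "prep_complement", "prep_object")

def syntactic_features(parse):
    """Turn a parse summary (role -> word, plus a 'passive' flag) into features."""
    feats = {"passive": bool(parse.get("passive", False))}
    for role in PRESENCE_ROLES:
        feats["has_" + role] = parse.get(role) is not None
    for role in WORD_ROLES:
        if parse.get(role) is not None:
            feats[role + "_word"] = parse[role]
    return feats

# Hypothetical summary for "He called the office."
example = syntactic_features(
    {"passive": False, "subject": "He", "direct_object": "office"}
)
```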

6 Local Semantic Features (English)
Semantic features
  – a Named Entity tag (PERSON, ORGANIZATION, LOCATION) for proper nouns
  – WordNet synsets and hypernyms for the nouns

7 Overall Accuracy of System (English)
Feature Type                              Accuracy
Collocation                               48.3
Collocation + Syntax                      53.9
Collocation + Syntax + Semantics          59.0
Collocation + Topic                       52.9
Collocation + Syntax + Topic              54.2
Collocation + Syntax + Semantics + Topic  60.2

8 Data Preparation (Chinese)
Penn Chinese Treebank (100K words)
CETA (Chinese-English Translation Assistance) Dictionary
28 words (multiple verb senses, possibly other pos)
3.5 senses/word in corpus
Baseline (most frequent sense): 77%

9 Local Collocational Features (Chinese)
Collocational Features:
  – word
  – pos
  – word-2, word-1, word+1, word+2
  – pos-1, pos+1
  – followsVerb
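A sketch of how these Chinese features might be computed from a segmented, POS-tagged sentence is given below. The slides do not define followsVerb; here it is assumed to mean that the word immediately before the target carries a Penn Chinese Treebank verb tag (VV, VA, VC, or VE). The `<pad>` token and the example sentence are also illustrative assumptions.

```python
# Verb tags of the Penn Chinese Treebank tag set.
CTB_VERB_TAGS = {"VV", "VA", "VC", "VE"}

def chinese_collocational_features(tagged, i):
    """Collocational features for the target word at index i.

    tagged is a list of (word, pos) pairs from a segmented sentence.
    """
    def token_at(offset):
        j = i + offset
        return tagged[j] if 0 <= j < len(tagged) else ("<pad>", "<pad>")

    feats = {"word": tagged[i][0], "pos": tagged[i][1]}
    for offset in (-2, -1, 1, 2):
        feats[f"word{offset:+d}"] = token_at(offset)[0]
    for offset in (-1, 1):
        feats[f"pos{offset:+d}"] = token_at(offset)[1]
    # Assumed definition: the preceding token is tagged as a verb.
    feats["followsVerb"] = token_at(-1)[1] in CTB_VERB_TAGS
    return feats

# Hypothetical segmented sentence; the target is at index 2.
sentence = [("他", "PN"), ("想", "VV"), ("出", "VV"), ("办法", "NN")]
features = chinese_collocational_features(sentence, 2)
```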

10 Local Syntactic Features (Chinese)
Syntactic Features:
  – hassubj
  – subj
  – hasobj
  – obj-p
  – obj
  – hasinobj
  – Comp-VP
  – VPComp
  – Comp-IP
  – hasprd

11 Local Semantic Features (Chinese)
Semantic Features (for verbs only): generated by assigning a HowNet noun category to each subject and object
  – subjsem
  – objsem
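The subjsem and objsem features could be produced by a simple lookup, sketched below. The lookup table stands in for HowNet, and its category labels are placeholders rather than real HowNet categories. Leaving the feature out when a noun has no entry is also an assumption.

```python
def semantic_features(subj, obj, noun_category):
    """Map the verb's subject and object nouns to semantic categories.

    noun_category is a dict from noun to category label; nouns missing
    from it (or absent arguments) produce no feature.
    """
    feats = {}
    if subj is not None and subj in noun_category:
        feats["subjsem"] = noun_category[subj]
    if obj is not None and obj in noun_category:
        feats["objsem"] = noun_category[obj]
    return feats

# Placeholder lookup table; the labels are NOT real HowNet categories.
toy_categories = {"办法": "CATEGORY_X", "老师": "CATEGORY_Y"}
with_both = semantic_features("老师", "办法", toy_categories)
with_unknown_subject = semantic_features("他", "办法", toy_categories)
```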

12 Overall Accuracy of Maximum Entropy System (CTB)
Feature Type                              Accuracy  Std Dev
Collocation (no pos)                      86.8      1.0
Collocation                               93.4      0.5
Collocation + Syntax                      94.4      0.4
Collocation + Syntax + Semantics          94.4      0.6
Collocation + Topic                       90.3      1.0
Collocation + Syntax + Topic              92.7      0.9
Collocation + Syntax + Semantics + Topic  92.8      0.8
Baseline                                  76.7

13 Data Preparation (PDN)
People's Daily News (PDN)
  – Five words with low accuracy and counts in CTB subsequently sense-tagged in PDN (1M words).
  – About 200 sentences/word from PDN.
  – 8.2 senses/verb in corpus
  – Baseline (most frequent sense) 58%
  – Automatic segmentation, pos-tagging, parsing

14 Overall Accuracy of Maximum Entropy System (PDN)
Feature Type                              Accuracy  Std Dev
Collocation (no pos)                      72.3      2.2
Collocation                               70.3      2.9
Collocation + Syntax                      71.7      3.9
Collocation + Syntax + Semantics          71.7      4.2
Collocation + Topic                       73.3      3.2
Collocation + Syntax + Topic              72.6      2.9
Collocation + Syntax + Semantics + Topic  73.0      3.4
Baseline                                  57.6

15 Conclusion
Types of features that are important for English and Chinese are different.
  – Parse information is useful for English WSD.
  – Lexical collocational information may be sufficient for Chinese.
Chinese word sense disambiguation addressed at segmentation level
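The first conclusion can be checked with simple arithmetic on the accuracies reported in the three result tables. The sketch below computes the gain from adding syntactic features to the collocational ones in each setting. The dictionary layout and function name are illustrative; the numbers are copied from the slides.

```python
# Accuracies copied from the English, CTB, and PDN result tables.
ACCURACY = {
    "English (Senseval2 verbs)": {"Collocation": 48.3, "Collocation + Syntax": 53.9},
    "Chinese (CTB)": {"Collocation": 93.4, "Collocation + Syntax": 94.4},
    "Chinese (PDN)": {"Collocation": 70.3, "Collocation + Syntax": 71.7},
}

def syntax_gain(table):
    """Accuracy points gained by adding syntactic features to collocations."""
    return round(table["Collocation + Syntax"] - table["Collocation"], 1)

gains = {name: syntax_gain(table) for name, table in ACCURACY.items()}
for name, gain in gains.items():
    print(f"{name}: +{gain} points from syntax")
```

Syntax adds 5.6 points for English but only 1.0 (CTB) and 1.4 (PDN) for Chinese. The PDN gain is smaller than the standard deviations reported for those two rows (2.9 and 3.9).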

