Hindi POS tagging and chunking: An MEMM approach. Aniket Dalal, Kumar Nagaraj, Uma Sawant, Sandeep Shelke. Under the guidance of Prof. P. Bhattacharyya.

Goal. Lexical analysis: part-of-speech (POS) tagging, i.e. assigning a part of speech (noun, verb, ...) to each word. Syntactic analysis: chunking, i.e. identifying and labelling phrases as verb phrase, noun phrase, etc. Language: Hindi. Approach: MEMM.

Outline: Maximum Entropy Markov Model (MEMM); principle; mathematical formulation; system overview; parameter estimation and classification; POS tagging features; chunking features; results and error analysis; future work; conclusion.

Maximum Entropy Markov Model: maximum entropy principle. The least biased model that is consistent with all known information is the one that maximizes entropy.
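For reference, a standard definition of the quantity being maximized in conditional maximum entropy tagging is given below. The notation is an assumption, not taken from the slide: $h$ is a history (the context of a word), $t$ a candidate tag, and $\tilde{p}(h)$ the empirical distribution of histories in the training corpus.

```latex
H(p) = - \sum_{h,\,t} \tilde{p}(h)\, p(t \mid h)\, \log p(t \mid h)
```

The model is chosen to maximize $H(p)$ subject to the constraint that each feature's expected value under $p$ matches its empirical average in the training data.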

Maximum Entropy Markov Model: mathematical formulation. The distribution with the maximum entropy is equivalent to a log-linear (exponential) model over the feature functions.
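A common way to write this model is sketched below. The symbols are assumptions rather than the slide's own notation: $f_j$ are feature functions, $\lambda_j$ their weights, $h_i$ the history available at position $i$, and $Z(h)$ a normalizer.

```latex
p(t \mid h) = \frac{1}{Z(h)} \exp\Big( \sum_{j} \lambda_j f_j(h, t) \Big),
\qquad
Z(h) = \sum_{t'} \exp\Big( \sum_{j} \lambda_j f_j(h, t') \Big)

P(t_1, \dots, t_n \mid w_1, \dots, w_n) \approx \prod_{i=1}^{n} p(t_i \mid h_i)
```

The second line is the Markov part: a tag sequence is scored as a product of per-position conditional distributions, each of which may look at the previously assigned tag.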

System overview: parameter estimation and classification. GIS (Generalized Iterative Scaling) finds the model parameters that define the maximum entropy classifier for a given feature set and training corpus. Beam search is a heuristic search algorithm, an optimization of best-first search that expands only the m most promising nodes at each depth.
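To make the parameter estimation step concrete, here is a minimal sketch of GIS for a conditional maximum entropy classifier with binary features. It is an illustration under stated assumptions, not the authors' implementation: the names `train_gis`, `feature_fn` and `CORRECTION` are hypothetical, feature names returned by `feature_fn` are assumed to be distinct, and features never observed in training simply keep weight 0.

```python
import math
from collections import defaultdict

CORRECTION = "__correction__"  # hypothetical name for the GIS slack feature

def train_gis(samples, labels, feature_fn, iterations=50):
    """samples: list of (context, label); feature_fn(context, label) -> list of
    distinct active binary feature names. Returns a function context -> {label: prob}."""
    # GIS needs every (context, label) event to have the same total feature mass C.
    C = max(len(feature_fn(x, y)) for x, _y in samples for y in labels)

    def vector(x, y):
        active = feature_fn(x, y)
        vec = {name: 1.0 for name in active}
        if len(active) < C:
            vec[CORRECTION] = float(C - len(active))  # pad the event up to C
        return vec

    n = len(samples)
    # Empirical expectation of each feature over the training events.
    empirical = defaultdict(float)
    for x, y in samples:
        for name, value in vector(x, y).items():
            empirical[name] += value / n

    weights = defaultdict(float)  # all weights start at 0 (uniform model)

    def distribution(x):
        scores = {y: math.exp(sum(weights[k] * v for k, v in vector(x, y).items()))
                  for y in labels}
        z = sum(scores.values())
        return {y: s / z for y, s in scores.items()}

    for _ in range(iterations):
        # Expectation of each feature under the current model.
        expected = defaultdict(float)
        for x, _y in samples:
            for y, p in distribution(x).items():
                for name, value in vector(x, y).items():
                    expected[name] += p * value / n
        # GIS update: move each weight by (1/C) * log(empirical / expected).
        for name, target in empirical.items():
            weights[name] += math.log(target / expected[name]) / C
    return distribution
```

Calling the returned function on a context yields a probability for every label, which is what the beam search then consumes during classification.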

What are features? A feature function is an indicator function that captures a useful fact about the modelling task.
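As an illustration of such indicator functions, the sketch below defines two of them over a history and a candidate tag. The function names and the dictionary layout of `history` are hypothetical; the tags PN and CC are the ones used in the deck's tagged example sentence.

```python
def prev_pn_curr_cc(history, tag):
    """Return 1 when the previous tag is PN and the candidate tag is CC, else 0."""
    return 1 if history["prev_tag"] == "PN" and tag == "CC" else 0

def word_aur_curr_cc(history, tag):
    """Return 1 when the current word is 'aur' and the candidate tag is CC, else 0."""
    return 1 if history["word"] == "aur" and tag == "CC" else 0
```

Each feature fires only for one specific combination of context and tag, so its learned weight expresses how strongly that combination is supported by the training data.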

POS tagging features. Context-based: POS tag of the previous word; the current word. Word-dependent: suffixes; digits; special characters; English words.
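The context-based and word-dependent features listed above could be extracted roughly as follows. This is a sketch with assumptions: the function name, the feature strings and the maximum suffix length of 3 are illustrative choices, and the English-word test assumes the Hindi text is in Devanagari, so a romanized word such as "aur" would also trigger it.

```python
import unicodedata

def pos_features(words, i, prev_tag, max_suffix=3):
    """Return the active feature names for word i given the previous tag."""
    word = words[i]
    # Context-based features: previous POS tag and the current word itself.
    feats = [f"prev_tag={prev_tag}", f"word={word}"]
    # Suffix features, leaving at least one character as the stem.
    for k in range(1, min(max_suffix, len(word) - 1) + 1):
        feats.append(f"suffix{k}={word[-k:]}")
    if any(ch.isdigit() for ch in word):
        feats.append("has_digit")
    # Punctuation (P*) and symbol (S*) categories count as special characters;
    # Devanagari vowel signs are marks (M*), so they are not flagged.
    if any(unicodedata.category(ch)[0] in "PS" for ch in word):
        feats.append("has_special")
    # ASCII-only alphabetic tokens are treated as English words.
    if word.isascii() and word.isalpha():
        feats.append("is_english")
    return feats
```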

POS tagging features. Dictionary-based: the possible tags for the word, according to the dictionary. Corpus-driven: occurrences of a word with its tag(s) in the training data.
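One way to realize these two feature groups is sketched below. The names `build_tag_lexicon` and `lexical_features`, the feature strings, and the representation of the dictionary as a mapping from word to a set of tags are all assumptions made for illustration.

```python
from collections import defaultdict

def build_tag_lexicon(tagged_sentences):
    """Collect, for every word, the set of tags it carries in the training data."""
    lexicon = defaultdict(set)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            lexicon[word].add(tag)
    return lexicon

def lexical_features(word, dictionary, lexicon):
    """Dictionary-based and corpus-driven features for one word."""
    # Tags the external dictionary allows for this word.
    feats = [f"dict_tag={t}" for t in sorted(dictionary.get(word, ()))]
    # Tags this word was observed with in the training corpus.
    feats += [f"seen_tag={t}" for t in sorted(lexicon.get(word, ()))]
    return feats
```

A word missing from both resources simply produces no features here, which leaves the decision to the context-based and word-dependent features.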

Chunking features. Context-based features: the word itself (conditionally); POS tag; chunk label of the previous word. Feature based on the current POS tag: tag class.
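The chunking features above can be sketched in the same style. The deck does not say under which condition the word itself is used or how tags are grouped into classes, so `LEXICALIZED_TAGS` and `TAG_CLASS` below are hypothetical placeholders, as are the function name and the chunk label used in the usage note.

```python
# Hypothetical coarse classes over tags that appear in the deck's example sentence.
TAG_CLASS = {"N": "nominal", "PN": "nominal", "VM": "verbal",
             "VAUX": "verbal", "GRND": "verbal", "CC": "function"}
# Hypothetical condition: include the word only for these POS tags.
LEXICALIZED_TAGS = {"CC"}

def chunk_features(words, pos_tags, i, prev_chunk):
    """Return the active chunking feature names for position i."""
    tag = pos_tags[i]
    feats = [
        f"pos={tag}",                                # POS tag of the current word
        f"prev_chunk={prev_chunk}",                  # chunk label of the previous word
        f"tag_class={TAG_CLASS.get(tag, 'other')}",  # coarse class of the current tag
    ]
    if tag in LEXICALIZED_TAGS:                      # the word itself, conditionally
        feats.append(f"word={words[i]}")
    return feats
```

For example, `chunk_features(["Ram", "aur", "Sita"], ["PN", "CC", "PN"], 1, "B-NP")` includes `word=aur`, while the same call for position 0 contains no word feature; `B-NP` is only a placeholder label.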

Experimental setup: 26 POS tags; 6 chunk labels; 75-25 split of training and test data; results averaged over 10 data sets.
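An evaluation loop matching this setup might look like the sketch below. It assumes the 10 data sets are random 75-25 re-splits of one tagged corpus, which the deck does not state; `train_fn` is a hypothetical callable that takes training sentences and returns a tagger mapping a word list to a tag list.

```python
import random

def per_word_accuracy(gold, predicted):
    """Fraction of words whose predicted label equals the gold label."""
    pairs = [(g, p) for gs, ps in zip(gold, predicted) for g, p in zip(gs, ps)]
    return sum(g == p for g, p in pairs) / len(pairs)

def averaged_accuracy(sentences, train_fn, runs=10, train_fraction=0.75, seed=0):
    """Return (best, average) per-word accuracy over several random splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        shuffled = sentences[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        train, test = shuffled[:cut], shuffled[cut:]
        tagger = train_fn(train)
        gold = [[tag for _word, tag in s] for s in test]
        pred = [tagger([word for word, _tag in s]) for s in test]
        scores.append(per_word_accuracy(gold, pred))
    return max(scores), sum(scores) / len(scores)
```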

Results. POS tagging accuracy: best 89.346%, average 88.4%. Chunk labelling accuracy (per-word basis): best 87.399%, average 86.45%.

[Figure: accuracy across runs]

Error analysis: POS tagging. Good performance for: VAUX, VFM, VNN; postpositions. Needs improvement: compound tags; proper nouns.

Error analysis: chunking. Good performance for: noun phrases. Needs improvement: verb phrases.

Future work: morphological features; enriching the dictionary; hybrid models.

Thank you! Questions?

Example Ram/PN aur/CC Sita/PN Shaadi/N karne/GRND ja/VM rahen/VAUX hain/VAUX

Beam Search [Figure: beam search tree over the words "Ram" and "aur"; candidate tag scores shown: N:0.3, CC:0.005, PN:0.4, CC:0.2, CC:0.15, CC:0.25, INJ:0.10, VA:0.05]
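The search over tag sequences can be sketched as follows. This is a generic beam search, not the authors' code: `score_fn` is a hypothetical callable returning a tag-to-probability mapping for a word given the previous tag (for instance the output of the maximum entropy classifier), and `"START"` is an assumed marker for the sentence-initial position.

```python
import math

def beam_search(words, score_fn, tags, beam_width=3):
    """Return the highest-scoring tag sequence found with a beam of the given width."""
    beam = [([], 0.0)]  # (partial tag sequence, log-probability)
    for i in range(len(words)):
        candidates = []
        for seq, logp in beam:
            prev = seq[-1] if seq else "START"
            probs = score_fn(words, i, prev)
            for tag in tags:
                p = probs.get(tag, 0.0)
                if p > 0.0:
                    candidates.append((seq + [tag], logp + math.log(p)))
        # Keep only the beam_width most promising partial sequences at this depth.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:beam_width]
    return beam[0][0] if beam else []
```

Because only `beam_width` hypotheses survive each step, the cost grows linearly with sentence length, at the price of possibly discarding the globally best sequence.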
