1 Part of Speech Tagging of Indian languages using Hidden Markov Model
Ph.D. Seminar Report by Manish Shrivastava
Roll no
Under the guidance of Dr. Pushpak Bhattacharyya

2 Presentation Outline
Part of Speech Tagging
Motivation
Existing Taggers
Need for Part of Speech Taggers for Indian languages
Part of Speech Tagging of Indian languages
The Morphological Perspective
Morphological Advantages
Hidden Markov Model
Conclusions
Future work

3 Part of Speech Tagging
The task of assigning POS tags to words
Selecting among the tags when more than one applies to a word
Can be used for further NLP tasks: information extraction, question answering, etc.
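The "more than one tag applies" case can be made concrete with a small sketch. The lexicon below is a made-up toy (English words and tag names chosen only for illustration, not taken from the presentation); it simply shows which words a tagger would have to disambiguate.

```python
# Toy lexicon (hypothetical): each word maps to the set of tags it can take.
LEXICON = {
    "the": {"DET"},
    "book": {"NOUN", "VERB"},
    "flight": {"NOUN"},
}

def ambiguous_words(sentence):
    """Return the words in the sentence that have more than one candidate tag."""
    return [w for w in sentence if len(LEXICON.get(w, ())) > 1]

# "book" can be a noun or a verb, so the tagger must choose using context.
print(ambiguous_words(["book", "the", "flight"]))
```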

4 Example of POS tagging

5 Motivation
Lack of significant tools for Indian languages
Dependence of other NLP activities on PoS tagging
Failure of existing techniques on Indian languages

6 Existing Taggers
Techniques used for foreign languages:
  Rule Based Tagging
  Stochastic Tagging

7 Overview of PoS tagging

8 Existing Taggers
Rule Based Taggers
Stochastic Taggers
Brill tagger
CLAWS tagger
Tree tagger

9 Need for New Taggers for Hindi
The existing taggers fail on Indian languages
  The grammatical structure differs
  Free word order of Hindi
Stochastic taggers cannot give good performance
  Morphological information is not taken into account

10 Example of free word order

11 Part of Speech Tagging of Indian Languages
To make efficient taggers:
  Get morphological information
  Use heuristics to exploit the morphological information

12 Morphological Perspective
Three kinds of word morphology:
  Verb
  Noun
  Adjective

13 Morphological Perspective
Noun Morphology
Depicting possession: लड़का ("boy"); possessive: लड़के का ("of the boy")
Depicting number: लड़का (singular); plural: लड़के ("boys")

14 Morphological Perspective
Verb Morphology: Tense
Root: खेल ("play")
लड़के खेल रहे हैं। (The boys are playing.)
लड़के खेलते थे। (The boys used to play.)
लड़के खेलना चाहते हैं। (The boys want to play.)

15 Morphological Advantage
POS tag heuristic
Noun
  लड़कों: suffix -oM (ों)
  सहेलियों: suffix -iyoN (इयों)
Verb
  पढ़ूँगा: suffix -UMgA (ऊँगा)
  पढ़ता: suffix -wA (ता)
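A suffix heuristic of this kind could be sketched roughly as follows. This is only an illustration: the suffix table is hypothetical, the informal romanization (ladkon, saheliyon, padhunga, padhta) is an assumption made for readability, and the presentation does not say how its heuristics are actually implemented.

```python
# Hypothetical suffix table, loosely modelled on the noun and verb examples
# on this slide. The romanization is an informal one chosen for this sketch.
SUFFIX_TAGS = {
    "iyon": "NOUN",  # e.g. saheliyon
    "on": "NOUN",    # e.g. ladkon
    "unga": "VERB",  # e.g. padhunga
    "ta": "VERB",    # e.g. padhta
}

def candidate_tags(word):
    """Return the tags suggested by matching suffixes, longest suffix first."""
    # Require a non-empty stem so a word is never treated as "all suffix".
    matches = [s for s in SUFFIX_TAGS if word.endswith(s) and len(word) > len(s)]
    matches.sort(key=len, reverse=True)
    tags = []
    for suffix in matches:
        tag = SUFFIX_TAGS[suffix]
        if tag not in tags:
            tags.append(tag)
    return tags

for w in ["ladkon", "saheliyon", "padhunga", "padhta", "ghar"]:
    print(w, candidate_tags(w))
```

A heuristic like this only narrows the candidate tags for a word; a word with no matching suffix gets an empty list and would have to be handled by some other means.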

16 Morphological Advantages
The rich morphology of Hindi helps in efficient tagging
The morphological information can be used for further tasks

17 The Tool: Hidden Markov Model
Why HMM
Underlying (hidden) events probabilistically generate the surface events
The models can be trained using the Expectation Maximization algorithm
Easy to port to other languages

18 Example of a Hidden Markov Model

19 Hidden Markov Model
The Parameters
$\pi_i$ = initial state probabilities
$a_{ij}$ = state transition probability from state i to state j
$b_{ij}(k)$ = probability of emitting (recognizing) the kth symbol in the transition from i to j
Estimation
Initial estimation done with training data
Re-estimation done using the Baum-Welch algorithm
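To show how these three parameter sets fit together, here is a minimal sketch of the forward computation for an HMM whose symbols are emitted on transitions, as in the definition of the emission probability above. It computes the probability of an observation sequence, which is the quantity Baum-Welch re-estimation tries to increase. The names `pi`, `a`, `b` and all numbers in the toy model are assumptions made for illustration, not values from the presentation.

```python
def forward_probability(pi, a, b, observations):
    """Probability of an observation sequence under an arc-emission HMM.

    pi[i]      : initial probability of state i
    a[i][j]    : probability of moving from state i to state j
    b[i][j][k] : probability of emitting symbol k on the move i -> j
    """
    n = len(pi)
    # alpha[i] = P(symbols seen so far, currently in state i)
    alpha = list(pi)
    for k in observations:
        alpha = [
            sum(alpha[i] * a[i][j] * b[i][j][k] for i in range(n))
            for j in range(n)
        ]
    return sum(alpha)

# Toy two-state, two-symbol model; every number here is made up.
pi = [0.6, 0.4]
a = [[0.7, 0.3],
     [0.4, 0.6]]
b = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.3, 0.7]]]

print(forward_probability(pi, a, b, [0, 1]))
```

Because each row of `a` and each `b[i][j]` sums to 1, the probabilities of all observation sequences of a given length add up to 1, which is a quick sanity check on any parameter set.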

20 Conclusions
Part of Speech taggers for Hindi should use morphological information
To make efficient taggers we must allow the use of heuristics
Hidden Markov Models can be used for portable taggers

