Comparative study of various Machine Learning methods For Telugu Part of Speech tagging -By Avinesh.PVS, Sudheer, Karthik IIIT - Hyderabad

Introduction POS tagging is the process of marking up words with their corresponding part of speech. It is not as simple as looking words up in a list of word-tag pairs, because many words can take more than one tag. This ambiguity is widespread: a large fraction of word forms admit multiple tags.

Introduction (cont.) Building a POS tagger for Telugu is complicated by the fact that Telugu words can be freely formed by agglutinating morphemes, which increases the number of distinct word forms. Ex: tinu (eat), tinAli (have to eat), tintunnADu ({he} is eating), tinAlianukuntunnADu ({he} wants to eat), ... The root tinu is common to all of these words; we can observe that new words are formed by combining morphemes with it.

Introduction (cont.) Such combination is especially common for verbs (multiple morphemes joining to form a single verb form). Because of this, the number of distinct words increases, which in turn decreases tagging accuracy.

Types of Taggers Taggers can be characterized as rule-based or statistical. Rule-based taggers use hand-written rules to resolve tag ambiguity. Stochastic taggers use the probabilities with which words occur with particular tags. Since Indian languages are morphologically rich, developing rule-based taggers is a cumbersome process; stochastic taggers, on the other hand, require a large amount of annotated data to train on.

Models of Statistical Taggers We have tried out four different models of statistical taggers:
1. Hidden Markov Model (HMM)
2. Conditional Random Fields (CRF)
3. Maximum Entropy Model (MEMM)
4. Memory Based Learning (MBL)

Hidden Markov Model A Hidden Markov Model (HMM) is a finite set of states, each of which is associated with a (generally multidimensional) probability distribution. To implement an HMM we need three kinds of probabilities:
- Initial state probabilities
- State transition probabilities P(t2 | t1)
- Emission probabilities P(w | t)
Statistical methods based on hidden Markov modeling have become increasingly popular over the last several years.
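To make the three probability tables concrete, here is a minimal bigram Viterbi decoder in Python. This is an illustrative sketch only (the experiments below use the TnT tagger, not this code); the constant 1e-12 merely stands in for proper smoothing of unseen events.

```python
import math

def viterbi(words, tags, init_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words`.

    init_p[t]      : P(t starts the sentence)
    trans_p[t1][t2]: P(t2 | t1), state transition probability
    emit_p[t][w]   : P(w | t), emission probability
    """
    # best[i][t] = log-prob of the best path ending in tag t at position i
    best = [{t: math.log(init_p.get(t, 1e-12)) +
                math.log(emit_p[t].get(words[0], 1e-12)) for t in tags}]
    back = [{}]
    for i, w in enumerate(words[1:], start=1):
        best.append({})
        back.append({})
        for t2 in tags:
            # choose the previous tag that maximizes the path probability
            t1 = max(tags, key=lambda t: best[i - 1][t] +
                     math.log(trans_p[t].get(t2, 1e-12)))
            best[i][t2] = (best[i - 1][t1] +
                           math.log(trans_p[t1].get(t2, 1e-12)) +
                           math.log(emit_p[t2].get(w, 1e-12)))
            back[i][t2] = t1
    # trace back from the best final tag
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))
```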

Experiments using HMMs The HMM is implemented using the Trigrams'n'Tags (TnT) tagger. Experiment 1 (Trigram): This experiment uses the previous two tags, i.e. P(t2 | t0 t1) is used to calculate the best tag sequence, where t2 is the current tag, t1 the previous tag, and t0 the tag before that. Correct tags: 81.59%, incorrect tags: 18.41%.
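For illustration, trigram transition probabilities can be estimated from tagged data roughly as below. This sketch uses fixed interpolation weights; TnT itself chooses its weights by deleted interpolation, so the lambdas here are placeholders only.

```python
from collections import Counter

def trigram_model(tag_sequences, l1=0.1, l2=0.3, l3=0.6):
    """Build an interpolated estimate of P(t2 | t0, t1) from tagged data."""
    uni, bi, tri = Counter(), Counter(), Counter()
    total = 0
    for tags in tag_sequences:
        total += len(tags)
        uni.update(tags)
        bi.update(zip(tags, tags[1:]))
        tri.update(zip(tags, tags[1:], tags[2:]))

    def p(t0, t1, t2):
        # interpolate unigram, bigram, and trigram relative frequencies
        p_uni = uni[t2] / total
        p_bi = bi[(t1, t2)] / uni[t1] if uni[t1] else 0.0
        p_tri = tri[(t0, t1, t2)] / bi[(t0, t1)] if bi[(t0, t1)] else 0.0
        return l1 * p_uni + l2 * p_bi + l3 * p_tri

    return p
```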

Experiments using HMMs (cont.) Experiment 2 (Bigram): This experiment uses only the previous tag, i.e. P(t2 | t1) is used to calculate the best tag sequence, where t2 is the current tag and t1 the previous tag. Correct tags: 82.32%, incorrect tags: 17.68%.

Experiments using HMMs (cont.) Experiment 3 (Bigram with 1/100th approximation): With the 1/100th approximation, the tagger also outputs an alternate tag whenever its probability is at least 1/100th of the best tag's probability. Correct tags: 82.47%, incorrect tags: 17.53%.
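A minimal sketch of this selection rule, assuming per-token tag probabilities are already available (the probability values in the usage comment are hypothetical):

```python
def tags_within_beam(tag_probs, beam=100.0):
    """Return the best tag plus any alternate whose probability is at
    least 1/beam of the best tag's probability."""
    best_p = max(tag_probs.values())
    return [t for t, p in sorted(tag_probs.items(), key=lambda kv: -kv[1])
            if p * beam >= best_p]

# e.g. tags_within_beam({"NN": 0.60, "JJ": 0.012, "VFM": 0.0001})
# -> ["NN", "JJ"]: JJ is kept because 0.012 * 100 >= 0.60.
```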

Conditional Random Fields A conditional random field (CRF) is a probabilistic framework for segmenting and labeling sequence data. A conditional model specifies the probabilities of possible label sequences given an observation sequence. The need to segment and label sequences arises in many different problems across several scientific fields.
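For reference, a linear-chain CRF defines the conditional probability of a tag sequence t = t_1 ... t_n given a word sequence w in the standard exponential form (Lafferty et al., 2001), where the f_k are feature functions, the lambda_k their learned weights, and Z(w) the normalizer over all tag sequences:

```latex
P(t \mid w) = \frac{1}{Z(w)} \exp\!\Big( \sum_{i=1}^{n} \sum_{k} \lambda_k \, f_k(t_{i-1}, t_i, w, i) \Big),
\qquad
Z(w) = \sum_{t'} \exp\!\Big( \sum_{i=1}^{n} \sum_{k} \lambda_k \, f_k(t'_{i-1}, t'_i, w, i) \Big)
```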

Experiments using CRFs The conditional random field is implemented using the CRF toolkit. Experiment 1: Features: Unigram: current word, previous 2nd word, previous word, next word, next 2nd word. Bigram: previous 2nd word and previous word; previous word and current word; current word and next word. Correct tags: 75.11%, incorrect tags: 24.89%.
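The same window features can be written out as a feature-extraction function; a sketch in Python (the experiments used the toolkit's template mechanism, and the feature names and "<PAD>" boundary token here are invented for readability):

```python
def crf_features(words, i):
    """Experiment 1 feature set for the token at position i."""
    def w(j):
        k = i + j
        return words[k] if 0 <= k < len(words) else "<PAD>"
    return {
        # unigram features: a +/-2 word window around the current position
        "w-2": w(-2), "w-1": w(-1), "w0": w(0), "w+1": w(+1), "w+2": w(+2),
        # bigram features over adjacent words in the window
        "w-2|w-1": w(-2) + "|" + w(-1),
        "w-1|w0": w(-1) + "|" + w(0),
        "w0|w+1": w(0) + "|" + w(+1),
    }
```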

Experiments using CRFs (cont.) Experiment 2: Features: Unigram: current word, previous 2nd word, previous word, next word, next 2nd word, and their corresponding first 4 and last 4 letters. Bigram: previous 2nd word and previous word; previous word and current word. Correct tags: 80.26%, incorrect tags: 19.74%.

Experiments using CRFs (cont.) Experiment 3: Features: Unigram: current word, previous 2nd word, previous word, and their corresponding first 3 and last 3 letters. Bigram: previous 2nd word and previous word; previous word and current word. Correct tags: 80.55%, incorrect tags: 19.45%.

Maximum Entropy Model The principle of maximum entropy (Ratnaparkhi, 1999) states that when one has only partial information about the probabilities of possible outcomes of an experiment, one should choose the probabilities so as to maximize the uncertainty about the missing information. Put another way, since entropy is a measure of randomness, one should choose the most random distribution subject to whatever constraints the problem imposes.
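Under feature-expectation constraints, the maximum entropy distribution takes an exponential (log-linear) form. For tagging, the conditional probability of tag t given history (context) h is written with feature functions f_j, weights lambda_j, and normalizer Z(h):

```latex
P(t \mid h) = \frac{1}{Z(h)} \exp\!\Big( \sum_{j} \lambda_j \, f_j(h, t) \Big),
\qquad
Z(h) = \sum_{t'} \exp\!\Big( \sum_{j} \lambda_j \, f_j(h, t') \Big)
```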

Experiments using MEMMs The Maximum Entropy Markov Model is implemented using Maxent. Experiment 1: Features (Unigram): current word, previous 2nd word, previous word, previous 2nd word's tag, previous word's tag, next word, next 2nd word, and the first 4 and last 4 letters of the current word. Correct tags: 80.3%, incorrect tags: 19.69%.

Experiments using MEMMs (cont.) Experiment 2: Features: current word, previous 2nd word with its tag, previous word with its tag, previous 2nd word's tag with previous word's tag, and suffixes and prefixes for rare words (those with frequency less than 2). Correct tags: 80.09%, incorrect tags: 19.9%.
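A sketch of how such rare-word affix features might be generated (the affix length cap of 4 mirrors Experiment 1 and, like the feature names, is an assumption rather than the exact Maxent setup):

```python
def affix_features(word, counts, max_len=4, rare_threshold=2):
    """Prefix/suffix features, fired only for rare training words.

    `counts` maps each word to its frequency in the training data,
    e.g. Counter(w for sent in train for w in sent).
    """
    feats = {}
    if counts.get(word, 0) < rare_threshold:  # only fire for rare words
        for k in range(1, max_len + 1):
            feats["prefix%d" % k] = word[:k]
            feats["suffix%d" % k] = word[-k:]
    return feats

# e.g. affix_features("tintunnADu", counts) -> prefixes t, ti, tin, tint
# and suffixes u, Du, ADu, nADu (if the word is rare in the training data).
```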

Experiments using MEMMs (cont.) Experiment 3: Features: current word, previous 2nd word with its tag, previous word with its tag, previous 2nd word's tag with previous word's tag, and suffixes and prefixes for all words. Correct tags: 82.27%, incorrect tags: 17.72%.

Memory Based Learning Memory-based tagging is based on the idea that words occurring in similar contexts will have the same POS tag. MBL consists of two components: a learning component, which is memory-based, and a performance component, which is similarity-based. The learning component is memory-based in that it simply adds training examples to memory. At performance time, the stored examples are used as the basis for mapping input to output.
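The IB1 algorithm used in the first experiment below is essentially nearest-neighbour classification with an overlap metric. A bare-bones sketch of the core idea (real TiMBL adds feature weighting, tie resolution, and k > 1 neighbours; the triple-of-words instance format in the usage comment is a hypothetical example):

```python
def overlap_distance(a, b):
    # number of feature positions on which the two instances disagree
    return sum(1 for x, y in zip(a, b) if x != y)

class IB1:
    def fit(self, instances, tags):
        # "learning" is just memorizing the training examples
        self.instances, self.tags = list(instances), list(tags)
        return self

    def predict(self, instance):
        # similarity-based performance: nearest neighbour by overlap
        i = min(range(len(self.instances)),
                key=lambda j: overlap_distance(instance, self.instances[j]))
        return self.tags[i]

# e.g. instances could be (prev_word, word, next_word) triples and tags
# their POS labels: IB1().fit(train_X, train_y).predict(("oVka", "lo", "ani"))
```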

Experiments using MBL Memory based learning is implemented using TiMBL. Experiment 1 (IB1 algorithm): Correct tags: 75.39%, incorrect tags: 24.61%. Experiment 2 (IGTREE algorithm): Correct tags: 75.75%, incorrect tags: 24.25%.

Overall results

Model   Result (%)
HMM     82.47
MEMM    82.27
CRF     80.55
MBL     75.75

The results for the Telugu data are low compared to other languages due to the limited amount of annotated data available (27,336 words).

Observations on Telugu corpus: Error analysis (most frequent confusions):

Actual tag   Assigned tag   Count
NN           JJ             87
VFM          NN             34
PRP          NN             31
VRB          VFM            25
JJ           NN             23
NNP          NN             18
NLOC         PREP           10
VJJ          JJ             14
QF           JJ             7
RP           RB             7

Observations on Telugu corpus (cont.) The Telugu corpus of 27,336 words contains 9,801 distinct word forms. If we count the words with low frequency, say frequency 1, we find 7,143 such words. This is due to the morphological richness of the language.

Observations on Telugu corpus (cont.) Word-frequency table (most frequent words):

Word    Frequency
A       322
I       280
oVka    199
lo      189
Ayana   183
ani     178
kUdA    89

Conclusion The accuracy of Telugu POS tagging is low compared to other Indian languages due to the agglutinative nature of the language. One could explore using a morphological analyzer to split verbs into their root and constituent morphemes, reducing the number of distinct word forms.

Thank you