Dept. of Computer Science & Engg. Indian Institute of Technology Kharagpur Part-of-Speech Tagging and Chunking with Maximum Entropy Model Sandipan Dandapat.

Presentation transcript:

Part-of-Speech Tagging and Chunking with Maximum Entropy Model
Sandipan Dandapat
Department of Computer Science & Engineering, Indian Institute of Technology Kharagpur

Goal
• Lexical Analysis
  - Part-of-Speech (POS) Tagging: assigning a part of speech (e.g. noun, verb) to each word
• Syntactic Analysis
  - Chunking: identifying and labeling phrases such as verb phrases and noun phrases

Machine Learning to Resolve POS Tagging and Chunking
• HMM
  - Supervised (DeRose, 88; Meteer, 91; Brants, 2000; etc.)
  - Semi-supervised (Cutting, 92; Merialdo, 94; Kupiec, 92; etc.)
• Maximum Entropy (Ratnaparkhi, 96; etc.)
• TB(ED)L (Brill, 92, 94, 95; etc.)
• Decision Tree (Black, 92; Marquez, 97; etc.)

Our Approach
• Maximum Entropy based
  - Diverse and overlapping features
  - Language independence
  - Reasonably good accuracy
  - Data intensive
  - Absence of sequence information

POS Tagging Schema
[Diagram: Raw text → POS tagging (Language Model, Disambiguation Algorithm, Possible POS Class Restriction) → Tagged text]

POS Tagging: Our Approach
[Diagram: Raw text → POS tagging (ME Model, Disambiguation Algorithm, Possible POS Class Restriction) → Tagged text]
ME Model: the current state depends on the history (features)

Learning ME Model
• GIS (Generalized Iterative Scaling)
  - Finds the model parameters that define the maximum entropy classifier for a given feature set and training corpus
• The parameters of the ME model are estimated using an off-the-shelf toolkit ( )
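To make the role of the estimated parameters concrete, here is a minimal sketch of how a maximum entropy classifier turns feature weights into tag probabilities. It does not reproduce GIS or the toolkit used in the slides; the function name, the feature-name strings, and the weight values are all hypothetical.

```python
import math

def me_probability(active_features, weights, tags):
    """Return p(tag | history) for every candidate tag.

    active_features(tag) -> feature names that fire for (history, tag).
    weights: feature name -> parameter value (features not listed count as 0.0).
    """
    scores = {
        t: math.exp(sum(weights.get(f, 0.0) for f in active_features(t)))
        for t in tags
    }
    z = sum(scores.values())  # normalising constant for this history
    return {t: s / z for t, s in scores.items()}

# Hypothetical history: current word "boi", previous tag JJ.
def toy_features(tag):
    return [f"w=boi&t={tag}", f"t-1=JJ&t={tag}"]

# Hypothetical weights, standing in for what a trainer such as GIS would output.
toy_weights = {"w=boi&t=NN": 1.2, "w=boi&t=VFM": 0.9, "t-1=JJ&t=NN": 1.5}

probs = me_probability(toy_features, toy_weights, ["NN", "JJ", "VFM"])
best_tag = max(probs, key=probs.get)
```

Training only changes the numbers in the weight table; the normalisation step stays the same for any feature set.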

POS Tagging: Our Approach
[Diagram: Raw text → POS tagging (ME Model, Disambiguation Algorithm) → Tagged text]
$t_i \in \{T\}$ or $t_i \in T_{MA}(w_i)$
$\{T\}$: set of all tags
$T_{MA}(w_i)$: set of tags computed by the Morphological Analyzer for $w_i$

POS Tagging: Our Approach
[Diagram: Raw text → POS tagging (ME Model, Beam Search) → Tagged text]
$t_i \in \{T\}$ or $t_i \in T_{MA}(w_i)$
$\{T\}$: set of all tags
$T_{MA}(w_i)$: set of tags computed by the Morphological Analyzer for $w_i$
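A minimal sketch of beam search over tag sequences, assuming the model exposes a per-word conditional probability and that candidate tags come either from the full tagset or from a morphological analyzer. The function names, the beam width, the toy words, and the toy probabilities are hypothetical and are not taken from the slides.

```python
import math

def beam_search(words, tag_prob, allowed_tags, beam_width=3):
    """Return the highest-scoring tag sequence found by beam search.

    tag_prob(words, i, prev_tag, tag) -> p(tag | history) from the model.
    allowed_tags(word) -> candidate tags for the word (full tagset or the
    restricted set from a morphological analyzer).
    """
    beam = [(0.0, [])]  # (log-probability, tags chosen so far)
    for i, word in enumerate(words):
        candidates = []
        for logp, seq in beam:
            prev_tag = seq[-1] if seq else "<s>"
            for tag in allowed_tags(word):
                p = tag_prob(words, i, prev_tag, tag)
                if p > 0.0:
                    candidates.append((logp + math.log(p), seq + [tag]))
        # Keep only the best partial sequences.
        beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return beam[0][1] if beam else []

# Toy stand-ins (hypothetical) for the analyzer output and the trained model.
TOY_TAGS = {"lal": ["JJ", "NN"], "boi": ["NN"]}

def toy_allowed_tags(word):
    return TOY_TAGS.get(word, ["NN", "JJ", "VFM"])

def toy_tag_prob(words, i, prev_tag, tag):
    if prev_tag == "JJ":
        return 0.8 if tag == "NN" else 0.1
    return 0.6 if tag == "JJ" else 0.2

best = beam_search(["lal", "boi"], toy_tag_prob, toy_allowed_tags)
```

Restricting `allowed_tags` to the analyzer's output shrinks the search space at every position, which is the effect the $T_{MA}(w_i)$ variant relies on.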

Disambiguation Algorithm
Text:
Tags:
where $t_i \in \{T\}$, $\forall w_i$
$\{T\}$ = set of tags

Disambiguation Algorithm
Text:
Tags:
where $t_i \in T_{MA}(w_i)$, $\forall w_i$
$\{T\}$ = set of tags
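As a hedged reference point, the usual maximum entropy tagging objective can be written as below. This assumes the standard formulation rather than the slides' own notation: $h_i$ (the history at position $i$) and the hatted tags are symbols introduced here, and $p(t_i \mid h_i)$ is the ME model's conditional probability.

```latex
\hat{t}_1 \dots \hat{t}_n
  = \arg\max_{t_1 \dots t_n} \prod_{i=1}^{n} p(t_i \mid h_i),
\qquad t_i \in \{T\} \ \text{or} \ t_i \in T_{MA}(w_i) \quad \forall w_i
```

The two variants of the algorithm differ only in the set each $t_i$ is drawn from.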

What are Features?
• Feature function
  - Binary function of the history and target
  - Example,
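One illustrative binary feature function, written as a sketch. The particular condition (previous tag JJ, candidate tag NN) and the dictionary layout of the history are assumptions made for illustration, not the example from the original slide.

```python
def f_jj_then_nn(history, tag):
    """Binary feature: 1 if the previous tag is JJ and the candidate tag is NN, else 0."""
    return 1 if history.get("prev_tag") == "JJ" and tag == "NN" else 0

# Hypothetical history for one position in a sentence.
example_history = {"word": "boi", "prev_tag": "JJ"}
fires = f_jj_then_nn(example_history, "NN")
```

Each such function pairs one property of the history with one target tag, and the model learns one weight per function.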

POS Tagging Features
[Diagram: window of positions i-3 … i+3 with rows pos, word and POS_Tag; the estimated tag is T4 at position i, and the feature set is drawn from the surrounding words and tags]
• 40 different experiments were conducted, taking several combinations of features from the set ‘F’

POS Tagging Features
[Diagram: window of positions i-3 … i+3 with rows pos, word (W1 … W7) and POS_Tag (T1 … T7), marking the estimated tag and the selected feature set]
Condition                        Features
Static features for all words    Current word ($w_i$)
                                 Previous word ($w_{i-1}$)
                                 Next word ($w_{i+1}$)
                                 |prefix| ≤ 4
                                 |suffix| ≤ 4
Dynamic features for all words   POS tag of previous word ($t_{i-1}$)
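The feature set in the table above can be sketched as a small extraction function. The feature-name strings, the boundary markers, and the romanized toy words are hypothetical; only the list of features (current, previous and next word, prefixes and suffixes up to length 4, previous POS tag) comes from the slide.

```python
def pos_features(words, i, prev_tag):
    """Collect the static and dynamic features for the word at position i."""
    word = words[i]
    prev_word = words[i - 1] if i > 0 else "<s>"
    next_word = words[i + 1] if i + 1 < len(words) else "</s>"
    feats = [f"w={word}", f"w-1={prev_word}", f"w+1={next_word}"]
    # Prefixes and suffixes of length 1..4 (shorter words yield fewer).
    for k in range(1, min(4, len(word)) + 1):
        feats.append(f"pre={word[:k]}")
        feats.append(f"suf={word[-k:]}")
    # Dynamic feature: the tag already assigned to the previous word.
    feats.append(f"t-1={prev_tag}")
    return feats

example = pos_features(["ami", "bhalo", "achi"], 1, "NN")
```

The static features can be computed before tagging starts, while the dynamic feature depends on the tag chosen for the previous word during decoding.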

Chunking Features
[Diagram: window of positions i-3 … i+3 with rows pos, word (W1 … W7), POS_Tag (T1 … T7) and Chunk_Tag (C1 … C7), marking the estimated tag and the selected feature set]
Static features for all words
  - Current word ($w_i$)
  - POS tag of the current word ($t_i$)
  - POS tags of previous two words ($t_{i-1}$ and $t_{i-2}$)
  - POS tags of next two words ($t_{i+1}$ and $t_{i+2}$)
Dynamic features for all words
  - Chunk tags of previous two words ($C_{i-1}$ and $C_{i-2}$)
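A sketch of the chunking feature set listed above. The feature-name strings, the padding symbols, the toy sentence, and the chunk label format are hypothetical; the list of features is the one on the slide.

```python
def chunk_features(words, pos_tags, i, prev_chunks):
    """Collect chunking features for position i.

    prev_chunks holds the chunk tags already assigned to positions 0..i-1.
    """
    def pos_at(j):
        # POS tag at position j, or a padding symbol outside the sentence.
        return pos_tags[j] if 0 <= j < len(pos_tags) else "<pad>"

    c1 = prev_chunks[-1] if len(prev_chunks) >= 1 else "<s>"
    c2 = prev_chunks[-2] if len(prev_chunks) >= 2 else "<s>"
    return [
        f"w={words[i]}",
        f"t={pos_at(i)}",
        f"t-1={pos_at(i - 1)}",
        f"t-2={pos_at(i - 2)}",
        f"t+1={pos_at(i + 1)}",
        f"t+2={pos_at(i + 2)}",
        f"c-1={c1}",
        f"c-2={c2}",
    ]

example = chunk_features(["lal", "boi", "ache"], ["JJ", "NN", "VFM"], 1, ["B-NP"])
```

Unlike the POS tagger, the chunker sees POS tags on both sides of the current word, because POS tagging has already finished when chunking starts.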

Experiments: POS tagging
• Baseline Model
• Maximum Entropy Model
  - ME (Bengali, Hindi and Telugu)
  - ME + IMA (Bengali)
  - ME + CMA (Bengali)
• Data Used
Language              Bengali   Hindi    Telugu
Training data         20,396    21,470   21,416
Development data      5,023     5,681    6,098
Test data             5,226     4,924    5,193
No. of Chunk labels   6         7        6
No. of POS tags: 27, 25

Tagset and Corpus Ambiguity
• Tagset consists of 27 grammatical classes
• Corpus Ambiguity
  - Mean number of possible tags for each word
  - Measured in the training tagged data
Language           Dutch   German   English   French   Bengali   Hindi   Telugu
Corpus Ambiguity
Accuracy           96%     97%      96.5%     94.5%    ?         ?       ?
Unknown Words      13%     9%       11%       5%       33%       21%     56%
(Dermatas et al 1995)
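A sketch of how corpus ambiguity, as defined above, could be measured on tagged training data. Whether the mean is taken over running words (tokens) or over distinct word forms is not stated on the slide, so the `per_token` switch, like the function name and toy data, is an assumption.

```python
from collections import defaultdict

def corpus_ambiguity(tagged_tokens, per_token=True):
    """Mean number of possible tags per word in a tagged corpus.

    tagged_tokens: list of (word, tag) pairs.
    per_token=True averages over running words; False averages over word types.
    """
    tags_of = defaultdict(set)
    for word, tag in tagged_tokens:
        tags_of[word].add(tag)
    if per_token:
        counts = [len(tags_of[word]) for word, _ in tagged_tokens]
    else:
        counts = [len(tags) for tags in tags_of.values()]
    return sum(counts) / len(counts) if counts else 0.0

# Hypothetical toy corpus: "lal" is seen with two tags, "boi" with one.
toy_corpus = [("lal", "JJ"), ("lal", "NN"), ("lal", "JJ"), ("boi", "NN")]
token_ambiguity = corpus_ambiguity(toy_corpus)
type_ambiguity = corpus_ambiguity(toy_corpus, per_token=False)
```

The token-based figure weights frequent ambiguous words more heavily, so it tends to be higher than the type-based one.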

POS Tagging Results on Development Set: Overall Accuracy
Language           Bengali   Hindi    Telugu
Corpus Ambiguity
Accuracy           79.74%    83.10%   67.12%
Unknown Words      33%       21%      56%

POS Tagging Results on Development Set
[Chart: accuracy for known words, unknown words, and overall]

POS Tagging Results - Bengali

Results on Development Set
Method     Bengali        Hindi          Telugu
Baseline
ME         (89.3, 60.5)   (90.9, 53.7)   ( )
ME + IMA   (84.2, 82.1)   -              -
ME + CMA   (89.3, 86.2)   -              -

Chunking Results
• Two different measures
  - Per word basis
  - Per chunk basis
• Correctly identified groups along with correctly labeled groups
Evaluation Criteria   Method       Bengali   Hindi    Telugu
Per word basis        ME + I_POS
Per chunk basis       ME + I_POS   87.3,     ,        , 56.7
Per chunk basis       ME + C_POS   93.3,     , 74.4   -
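The two measures can be sketched as follows. The slides do not state the chunk label format, so B-/I- style labels are assumed here, and per-chunk accuracy is taken to mean the fraction of gold chunks whose boundaries and label are both reproduced; the function names and toy data are hypothetical.

```python
def chunk_spans(chunk_tags):
    """Convert B-/I- style chunk tags into a set of (start, end, label) spans."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(chunk_tags) + ["O"]):  # sentinel closes the last chunk
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if start is not None and (tag == "O" or starts_new):
            spans.add((start, i, label))
            start, label = None, None
        if starts_new:
            start, label = i, tag[2:]
    return spans

def per_word_accuracy(gold, predicted):
    """Fraction of words whose chunk tag matches the gold tag."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

def per_chunk_accuracy(gold, predicted):
    """Fraction of gold chunks found with the same boundaries and label."""
    gold_spans = chunk_spans(gold)
    return len(gold_spans & chunk_spans(predicted)) / len(gold_spans)

gold_tags = ["B-NP", "I-NP", "B-VP", "B-NP"]
pred_tags = ["B-NP", "I-NP", "B-VP", "B-VP"]
word_acc = per_word_accuracy(gold_tags, pred_tags)
chunk_acc = per_chunk_accuracy(gold_tags, pred_tags)
```

A single wrong tag costs one word on the per-word measure but a whole chunk on the per-chunk measure, which is why per-chunk figures are usually lower.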

Assessment of Error Types
Bengali
Predicted Class   Actual Class   % of total error   % of class error
NN                NNC
NN                JJ
NN                NNP
VFM               VRB
NNP               NNPC
Hindi
Predicted Class   Actual Class   % of total error   % of class error
NN                NNP
NN                JJ
NN                NNC
JJ                NN
VFM               VAUX
Telugu
Predicted Class   Actual Class   % of total error   % of class error
NN                JJ
NN                NNP
PREP              NLOC
NN                RB

Results on Test Set
• Bengali data has been tagged using the ME+IMA model
• Hindi and Telugu data have been tagged with the simple ME model
Language   Number of Words   POS Tagging Accuracy   Chunking Accuracy
Bengali
Hindi
Telugu
• Chunking accuracy has been measured on a per-word basis

Conclusion and Future Scope
• Morphological restriction on tags gives an efficient tagging model even when only a small amount of labeled text is available
• The performance for Hindi and Telugu can be improved by using morphological analyzers for those languages
• Linguistic prefix and suffix information can be adopted
• More features can be explored for chunking

Thank You