A Statistical Model for Multilingual Entity Detection and Tracking R. Florian, H. Hassan, A. Ittycheriah, H. Jing, N. Kambhatla, X. Luo, N. Nicolov, S. Roukos


A Statistical Model for Multilingual Entity Detection and Tracking R. Florian, H. Hassan, A. Ittycheriah, H. Jing, N. Kambhatla, X. Luo, N. Nicolov, S. Roukos I.B.M. T.J. Watson Research Center Yorktown Heights, NY HLT-NAACL 2004

2/37 Abstract The Entity Detection and Tracking task (EDT) covers the Named Entity Recognition (NER) and coreference resolution tasks. A statistical language-independent framework for identifying and tracking named, nominal and pronominal references to entities, and chaining them into clusters corresponding to each logical entity present in the text. Uses arbitrary features: lexical, syntactic and semantic. A mention detection model and a novel entity tracking model. Experiments run on Arabic, Chinese and English texts.

3/37 Introduction (1/3) We will instead adopt the nomenclature of the Automatic Content Extraction program (NIST, 2003a): Mention: can be either named (e.g. John Mayor), nominal (e.g. the president) or pronominal (e.g. she, it). Entity: consists of all the mentions (of any level) which refer to one conceptual entity. EX: "President John Smith said he has no comments." Two mentions: John Smith and he, but one entity, formed by the set {John Smith, he}.

4/37 Introduction (2/3) Models for mention detection: Maximum Entropy (MaxEnt) and Robust Risk Minimization (RRM). Model for predicting whether a mention should or should not be linked to an existing entity, and for how to build entity chains: a novel MaxEnt-based model.

5/37 Introduction (3/3) The framework is language-universal. Most of the feature types are shared across the languages, but there are a small number of useful feature types which are language-specific, especially for the mention detection task.

6/37 Mention Detection Assign to each token in the text a label indicating whether it starts a specific mention, is inside a specific mention, or is outside any mention (the IOB2 scheme). Since the models integrate many feature types, we are interested in algorithms that can easily combine and make effective use of diverse input types: a linear classifier (the Robust Risk Minimization classifier) and a log-linear classifier (the Maximum Entropy classifier).
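As an illustration of the IOB2 scheme described above, here is a minimal sketch (not the paper's code; token and type names are made up): each token receives B-<type> if it begins a mention, I-<type> if it continues one, and O otherwise.

```python
# Hypothetical sketch of IOB2 encoding for mention detection.
def iob2_encode(tokens, mentions):
    """mentions: list of (start, end, type) spans, end exclusive."""
    labels = ["O"] * len(tokens)
    for start, end, mtype in mentions:
        labels[start] = "B-" + mtype           # first token of the mention
        for i in range(start + 1, end):
            labels[i] = "I-" + mtype           # tokens inside the mention
    return labels

tokens = ["President", "John", "Smith", "said", "he", "has", "no", "comments", "."]
mentions = [(1, 3, "PER"), (4, 5, "PER")]
print(iob2_encode(tokens, mentions))
# ['O', 'B-PER', 'I-PER', 'O', 'B-PER', 'O', 'O', 'O', 'O']
```

This turns the span-labeling problem into per-token classification, which is what both the RRM and MaxEnt classifiers below consume.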

7/37 Notations The set of predicted classes is denoted C, the example space X and the feature space F. Each example x in X has an associated vector of binary features. We also assume the existence of a training data set and a test set.

8/37 Robust Risk Minimization Algorithm (1/2) n linear classifiers (one for each predicted class), each predicting whether the current example belongs to its class or not. Every such classifier has an associated feature weight vector w_c, which is learned during the training phase. At test time, for each example x, the model computes a score for every class c: s_c(x) = w_c · x.

9/37 Robust Risk Minimization Algorithm (2/2) The example is labeled with the class corresponding to the highest-scoring classifier if that score is above 0, and with outside otherwise.
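A minimal sketch of this one-vs-all decision rule, with made-up feature names and weights (this covers only the scoring/labeling step, not IBM's RRM training procedure):

```python
# Sketch of the RRM decision rule: score each class with its linear
# classifier over the fired binary features; take the best class if its
# score is positive, otherwise label the token "outside".
def rrm_decide(weight_vectors, active_features):
    """weight_vectors: {class: {feature: weight}}; active_features: set of fired features."""
    best_class, best_score = "outside", 0.0
    for cls, w in weight_vectors.items():
        score = sum(w.get(f, 0.0) for f in active_features)
        if score > best_score:
            best_class, best_score = cls, score
    return best_class

# Illustrative weights (not learned).
weights = {"B-PER": {"cap": 1.2, "prev=Mr.": 2.0},
           "I-PER": {"cap": 0.4, "prev_tag=B-PER": 1.5}}
print(rrm_decide(weights, {"cap", "prev=Mr."}))  # B-PER (score 3.2 > 0)
print(rrm_decide(weights, {"lower"}))            # outside (no positive score)
```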

10/37 Maximum Entropy Algorithm (1/2) The MaxEnt algorithm has an associated set of weights, which are estimated during the training phase so as to maximize the likelihood of the data. Given these weights, the model computes the probability distribution of a particular example x as P(c|x) = (1/Z) exp(Σ_i λ_{c,i} f_i(x)), where Z is a normalization factor.

11/37 Maximum Entropy Algorithm (2/2)
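The MaxEnt distribution P(c|x) = exp(Σ_i λ_{c,i} f_i(x)) / Z can be sketched as follows; the weights below are illustrative, not learned:

```python
import math

# Sketch of the MaxEnt distribution: exponentiate the weighted sum of
# active binary features for each class, then normalize by Z.
def maxent_probs(weights, active_features):
    scores = {c: math.exp(sum(w.get(f, 0.0) for f in active_features))
              for c, w in weights.items()}
    z = sum(scores.values())            # normalization factor Z
    return {c: s / z for c, s in scores.items()}

# Made-up weights for two classes and one feature.
weights = {"B-PER": {"cap": 1.0}, "O": {"cap": -0.5}}
probs = maxent_probs(weights, {"cap"})
print(probs)   # probabilities sum to 1; "B-PER" gets the larger mass
```

Unlike the RRM scores, these outputs are normalized probabilities, which is what lets the same machinery drive the entity tracking model later on.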

12/37 The Combination Hypothesis In addition to using rich lexical, syntactic, and semantic features, we leveraged several pre-existing mention taggers. Our hypothesis, the combination hypothesis, is that combining pre-existing classifiers from diverse sources will boost performance by injecting complementary information into the mention detection models. Hence, we took the output of these pre-existing taggers and used it as additional feature streams for the mention detection models.

13/37 Language-Independent Features tokens in a window of 5; the part-of-speech associated with the token; dictionary information (whether the current token is part of a large collection of dictionaries, one boolean value for each dictionary); the output of named mention detectors trained on different styles of entities; the previously assigned classification tags.

14/37 Arabic Features (1/4) Words are composed of zero or more prefixes, followed by a stem and zero or more suffixes. Each prefix, stem or suffix will be called a token; any contiguous sequence of tokens can represent a mention. EX: the word "trwmAn" (Truman) could be segmented into 3 tokens: trwmAn → t + rwm + An. EX (ambiguity): tmnEh → t (label: pronominal) + mnE + h (label: pronominal).

15/37 Arabic Features (2/4) Pragmatically, we found segmenting Arabic text to be a necessary and beneficial process, due mainly to two facts: 1. some prefixes/suffixes can receive a different mention type than the stem they are glued to (for instance, in the case of pronouns); 2. keeping words together results in significant data sparseness, because of the inflected nature of the language.

16/37 Arabic Features (3/4) Given these observations, we decided to "condition" the output of the system on the segmented data: the text is first segmented into tokens, and the classification is then performed on tokens. The segmentation model is similar to the one presented by Lee et al. (2003), and obtains an accuracy of about 98%.

17/37 Arabic Features (4/4) For the language-general features, the Arabic system implements a feature specifying for each token its original stem. The gazetteer features (12,000 person names and 3,000 location and country names, collected in a few man-hours of web browsing), however, are computed on words, not on tokens. Additional classifier features: an RRM model predicting 48 mention categories and an HMM model predicting 32 mention categories.

18/37 Chinese Mention Detection Character-based model. Segmentation information is still useful and is integrated as an additional feature stream. Gazetteers include dictionaries of 10k person names, 8k location and country names, and 3k organization names, compiled from annotated corpora. There are four additional classifiers whose output is used as features: an RRM model which outputs 32 named categories; an RRM model identifying 49 categories; an RRM model identifying 45 mention categories; an RRM model that classifies whether a character is an English character, a numeral or other.

19/37 English Mention Detection Shallow parsing information associated with the tokens in a window of 3; prefixes/suffixes of length up to 4; a capitalization/word-type flag; gazetteer information: location (55k entries), person name (30k) and organization (5k) dictionaries; a combination of gazetteer, POS and capitalization information, obtained as follows: if the word is a closed-class word, select its class; else if it's in a dictionary, select that class; otherwise back off to its capitalization information (we call this feature gap); WordNet information (the synsets and hypernyms of the two most frequent senses of the word); the outputs of three systems (HMM, RRM and MaxEnt) trained on 32-category named entity data, the output of an RRM system trained on the MUC-6 data, and the output of an RRM model identifying 49 categories.

20/37

21/37 Entity Tracking Algorithm (1/6) Assume n mentions m_1, …, m_n in a document, and let g be the map from mention index i to entity index j = g(i). For a mention index k (1 ≤ k ≤ n), define E_k, the set of indices of the partially-established entities to the left of m_k: E_k = {g(i) : 1 ≤ i < k}. Note that E_1 is empty.

22/37 Entity Tracking Algorithm (2/6) The set of the partially-established entities to the left of m_k is {e_t : t ∈ E_k}. Given that E_k has been formed to the left of the active mention m_k, m_k can take two possible actions: if g(k) ∈ E_k, m_k links with the existing entity e_g(k); otherwise it starts a new entity e_g(k).

23/37 Entity Tracking Algorithm (3/6) At training time, the action is known to us; at testing time, both hypotheses will be kept during the search. A binary model P(L | E_k, m_k, A = t) is used to compute the link probability, where L is 1 iff m_k links with e_t, and A is the index of the partial entity to which m_k is linking.

24/37 Entity Tracking Algorithm (4/6) Since starting a new entity means that m_k does not link with any entity in E_k, the probability of starting a new entity can be computed from the link probabilities of the entities in E_k.
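A greedy, single-hypothesis sketch of this link-or-start decision (the paper keeps multiple hypotheses during search; link_prob and the threshold here are hypothetical stand-ins for the trained binary model):

```python
# Greedy sketch of the link-or-start decision for entity tracking.
def track(mentions, link_prob, threshold=0.5):
    entities = []                       # each entity is a list of mentions
    for m in mentions:
        scores = [link_prob(e, m) for e in entities]
        if scores and max(scores) > threshold:
            entities[scores.index(max(scores))].append(m)   # link to best entity
        else:
            entities.append([m])                            # start a new entity
    return entities

# Toy stand-in scorer: link only on exact string match.
def toy_link_prob(entity, mention):
    return 1.0 if mention in entity else 0.0

print(track(["John Smith", "Mary", "John Smith"], toy_link_prob))
# [['John Smith', 'John Smith'], ['Mary']]
```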

25/37 Entity Tracking Algorithm (5/6)

26/37 Entity Tracking Algorithm (6/6)

27/37 Entity Tracking Features String match; context; mention count; distance; editing distance; mention information; acronym; syntactic features. Another category of features is created by taking conjunctions of the atomic features.
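A sketch of computing a few of these atomic features between a candidate mention and a partial entity; the feature names and the difflib-based similarity are my assumptions for illustration, not the paper's exact definitions:

```python
import difflib

# Sketch of three atomic tracking features: exact string match,
# an edit-distance-style similarity, and an acronym test.
def tracking_features(entity_mentions, mention):
    exact = any(mention == m for m in entity_mentions)
    # similarity against the closest mention already in the entity
    sim = max(difflib.SequenceMatcher(None, mention, m).ratio()
              for m in entity_mentions)
    # does the mention spell the initials of some mention in the entity?
    acronym = any(mention == "".join(w[0] for w in m.split())
                  for m in entity_mentions)
    return {"string_match": exact, "edit_sim": round(sim, 2), "acronym": acronym}

print(tracking_features(["International Business Machines"], "IBM"))
```

Conjunction features would then pair these atoms, e.g. (acronym AND short distance).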

28/37 Experimental Data (1/3) The data used in all experiments is provided by the Linguistic Data Consortium and is distributed by NIST to all participants in the ACE evaluation. The training data for the English system consists of the training data from both the 2002 and the 2003 evaluations, while for Arabic and Chinese (new additions to the ACE task in 2003) it consists of 80% of the provided training data.

29/37 Experimental Data (2/3)

30/37 Experimental Data (3/3) The data is annotated with five types of entities: person, organization, geo-political entity, location and facility. Each mention can be either named, nominal or pronominal, and can be either generic (not referring to a clearly described entity) or specific. The models for all three languages are built as joint models, simultaneously predicting the type, level and genericity of a mention; basically, each mention is labeled with a 3-pronged tag. To transform the problem into a classification task, we use the IOB2 classification scheme.

31/37 The ACE Value It estimates the normalized weighted cost of detection of specific-only entities in terms of misses, false alarms and substitution errors. The normalized cost is subtracted from 100% to obtain the ACE value; a value of 100% corresponds to perfect entity detection. A system can obtain a negative score if it proposes too many incorrect entities. Results are also presented using F-measure.
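A deliberately simplified sketch of the ACE-value idea; the official NIST scorer applies per-class and per-error-type weights, which are assumed uniform here:

```python
# Simplified ACE-value sketch: normalized cost of misses, false alarms
# and substitutions, subtracted from 100%. Uniform weights assumed.
def ace_value(misses, false_alarms, substitutions, total_reference_entities):
    cost = misses + false_alarms + substitutions
    normalized_cost = cost / total_reference_entities
    return 100.0 * (1.0 - normalized_cost)   # 100% = perfect; can go negative

print(ace_value(misses=2, false_alarms=1, substitutions=1,
                total_reference_entities=20))   # 80.0
print(ace_value(5, 20, 5, 20))   # -50.0: too many false alarms drive it negative
```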

32/37 EDT Results (1/5) To better assess the contribution of the different types of features to the final performance, we have grouped them into 4 categories: 1. surface features: lexical features that can be derived from investigating the words: words, morphs, prefix/suffix, capitalization/word-form flags; 2. features derived from processing the data with NLP techniques: POS tags, text chunks, word segmentation, etc.; 3. gazetteer/dictionary features; 4. features obtained by running other named-entity classifiers (with different tag sets): HMM, MaxEnt and RRM output on the 32-category, 49-category and MUC data sets.

33/37

34/37 EDT Results (3/5) Table 4 presents the results obtained by running the entity tracking algorithm on true mentions. It is interesting to compare the entity tracking results with inter-annotator agreement. The inter-annotator agreements (computed as ACE values) between annotators are 73.5%, 87.3% and 87.8% for Arabic, Chinese and English, respectively. The system performance is very close to human performance on this task.

35/37 EDT Results (4/5) Presents the results obtained by running mention detection followed by entity tracking on the ACE '03 evaluation data.

36/37 Figure 1: Performance of the English mention detection system on different sets of features (uniformly penalized F-measure), September '02 data. The lower part of each box describes the particular combination of feature types; the arrows show an inclusion relationship between the feature sets.

37/37 Conclusion A language-independent framework for the entity detection and tracking task, applied to three radically different languages: Arabic, Chinese and English. The task is separated into two sub-tasks: a mention detection part, which is modeled through a named-entity-like approach, and an entity tracking part, for which a novel modeling approach is proposed. This statistical framework is general and can incorporate heterogeneous feature types. Additional feature types help improve performance significantly, especially in terms of ACE value.