Exploiting Diverse Knowledge Sources via Maximum Entropy in Named Entity Recognition. Authors: Andrew Borthwick, John Sterling, Eugene Agichtein, Ralph Grishman.

Presentation transcript:

Exploiting Diverse Knowledge Sources via Maximum Entropy in Named Entity Recognition. Authors: Andrew Borthwick, John Sterling, Eugene Agichtein, Ralph Grishman. Speaker: Shasha Liao

Content
– Named Entity Recognition (NER)
– Maximum Entropy (ME)
– System Architecture
– Results
– Conclusions

Named Entity Recognition (NER)
Given a tokenization of a test corpus and a set of n (n = 7) name categories, NER is the problem of assigning one of 4n + 1 tags to each token:
– for each category x: x_begin, x_continue, x_end, x_unique, plus the single tag "other"
MUC-7 categories:
– Proper names (people, organizations, locations)
– Expressions of time
– Quantities
– Monetary values
– Percentages
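As an illustration (not part of the original slides), a minimal Python sketch that enumerates this 4n + 1 tag set; the category names here are assumptions standing in for the seven MUC-7 types:

```python
# Hypothetical sketch: enumerate the 4n+1 MENE-style tags for n categories.
CATEGORIES = ["person", "organization", "location",
              "date", "time", "money", "percent"]  # n = 7 (MUC-7)
STATES = ["begin", "continue", "end", "unique"]

TAGS = [f"{c}_{s}" for c in CATEGORIES for s in STATES] + ["other"]
print(len(TAGS))  # 4 * 7 + 1 = 29
```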

Named Entity Recognition (NER)
Example: "Jim bought 300 shares of Acme Corp. in 2006."
Jim/per_unique bought/other 300/qua_unique shares/other of/other Acme/org_begin Corp./org_end in/other 2006/time_unique ./other

Maximum Entropy (ME)
– Statistical modeling technique
– Estimates a probability distribution from partial knowledge
– Principle: the correct probability distribution is the one that maximizes entropy (uncertainty) subject to what is known
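Stated formally (this is the standard maximum-entropy formulation; the slide's own formula image is not in the transcript): among all conditional distributions that match the empirical expectation of every feature g_i, choose the one with maximal conditional entropy.

```latex
\max_{p}\; H(p) = -\sum_{h}\tilde{p}(h)\sum_{f} p(f \mid h)\,\log p(f \mid h)
\quad\text{s.t.}\quad
E_{p}[g_i] = E_{\tilde{p}}[g_i],\;\; i = 1,\dots,k
```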

Maximum Entropy (ME) --- Build ME Model
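The formula on this slide did not survive the transcript; in the MENE paper the model takes the log-linear form below, where each feature g_i(h, f) over a history h and future f carries a weight α_i and Z(h) normalizes over all futures:

```latex
p(f \mid h) = \frac{1}{Z(h)} \prod_{i=1}^{k} \alpha_i^{\,g_i(h,f)},
\qquad
Z(h) = \sum_{f'} \prod_{i=1}^{k} \alpha_i^{\,g_i(h,f')}
```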

Maximum Entropy (ME) --- Initialize Features
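This slide's content is likewise missing from the transcript; MENE's features are binary indicator functions on (history, future) pairs, along the lines of this representative (illustrative, not verbatim) example:

```latex
g(h,f) =
\begin{cases}
1 & \text{if the current token in } h \text{ is capitalized and } f = \text{person\_unique}\\
0 & \text{otherwise}
\end{cases}
```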

Maximum Entropy (ME) --- ME Estimation
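Estimation (formula not preserved in the transcript) chooses the weights so that each feature's expectation under the model matches its empirical expectation, which for log-linear models is equivalent to maximizing the conditional log-likelihood of the training data:

```latex
E_{p}[g_i] = \sum_{h}\tilde{p}(h)\sum_{f} p(f \mid h)\,g_i(h,f)
\;=\; E_{\tilde{p}}[g_i]
\quad\Longleftrightarrow\quad
\alpha = \arg\max_{\alpha} \sum_{h,f}\tilde{p}(h,f)\,\log p_{\alpha}(f \mid h)
```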

Maximum Entropy (ME) --- Generalized Iterative Scaling
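GIS solves the estimation problem by repeatedly rescaling each weight by the ratio of empirical to model expectation; the standard update (assuming the features sum to a constant C on every (h, f) pair, padded with a correction feature if necessary) is:

```latex
\alpha_i^{(t+1)} = \alpha_i^{(t)} \left( \frac{E_{\tilde{p}}[g_i]}{E_{p^{(t)}}[g_i]} \right)^{1/C}
```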

System Architecture --- Features (1)
Feature set:
– Binary: similar to BBN's Nymble/IdentiFinder system
– Lexical: all tokens with a count of 3 or more
– Section: date, preamble, text, ...
– Dictionary: name lists
– External system: the futures produced by other NE systems are treated as part of MENE's history
– Compound: conjunctions of the above, e.g. external system × section feature
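A minimal sketch (hypothetical names, not the authors' code) of how a lexical feature and a compound external-system × section feature could be written as binary predicates over a (history, future) pair:

```python
# Hypothetical feature functions over (history, future) pairs.
# `history` is assumed to carry the current token, an external tagger's
# prediction, and the document section; these names are illustrative.

def lexical_feature(history, future):
    """Fires when the current token is 'Corp.' and the future is org_end."""
    return 1 if history["token"] == "Corp." and future == "org_end" else 0

def compound_feature(history, future):
    """Conjunction: an external tagger's prediction AND the section feature."""
    return 1 if (history["external_tag"] == "org_end"
                 and history["section"] == "text"
                 and future == "org_end") else 0
```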

System Architecture --- Features (2)
Feature selection (features discarded):
– Features which activate on the default value of a history view (99% of cases are not names)
– Lexical features which predict the future "other" fewer than 6 times (instead of the usual threshold of 3)
– Features which predict "other" at token positions -2 and +2
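As a sketch of the count-threshold rule only (the thresholds come from the slide; the data structures are assumptions), pruning can be phrased as a filter over candidate features annotated with the future they predict and their training-set count:

```python
# Hypothetical pruning pass following the slide's thresholds: lexical
# features predicting "other" need count >= 6, everything else >= 3.

def keep_feature(predicted_future, count, is_lexical):
    threshold = 6 if (is_lexical and predicted_future == "other") else 3
    return count >= threshold

candidates = [("other", 4, True), ("org_end", 3, True), ("other", 7, True)]
kept = [c for c in candidates if keep_feature(*c)]
print(kept)  # [('org_end', 3, True), ('other', 7, True)]
```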

System Architecture --- Decoding and Viterbi Search
Viterbi search: dynamic programming
– Find the highest-probability legal path through the lattice of conditional probabilities
– Example: for "Mike England", the locally best tags are person_start (0.66) for "Mike" and gpe_unique (0.6) for "England", but the transition person_start → gpe_unique is illegal: p(gpe_unique | person_start) = 0. The legal path person_start (0.66) → person_end (0.3), with p(person_end | person_start) = 0.7, wins instead. See the sketch below.
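A minimal Viterbi sketch (illustrative Python, not the authors' implementation) over the two-token example, combining local tag probabilities with transition probabilities that zero out illegal sequences:

```python
# Minimal Viterbi sketch over a tag lattice; numbers follow the slide's example.
tokens = ["Mike", "England"]
local = [  # local tag probabilities per position
    {"person_start": 0.66},
    {"gpe_unique": 0.6, "person_end": 0.3},
]
trans = {  # p(next_tag | prev_tag); illegal transitions get probability 0
    ("person_start", "gpe_unique"): 0.0,
    ("person_start", "person_end"): 0.7,
}

# Forward pass: best score and backpointer for each tag at each position.
best = [{t: (p, None) for t, p in local[0].items()}]
for i in range(1, len(tokens)):
    layer = {}
    for t, p in local[i].items():
        score, prev = max(
            (best[i - 1][pt][0] * trans.get((pt, t), 0.0) * p, pt)
            for pt in best[i - 1]
        )
        layer[t] = (score, prev)
    best.append(layer)

# Backtrace from the best final tag.
tag, (score, prev) = max(best[-1].items(), key=lambda kv: kv[1][0])
path = [tag]
for layer in reversed(best[:-1]):
    path.append(prev)
    prev = layer[prev][1]
path.reverse()
print(path, score)  # ['person_start', 'person_end'] ~0.1386
```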

Results (1)

Results (2)
Probable reasons:
– Dynamic updating of the vocabulary during decoding (reference resolution): once a full name such as "Andrew Borthwick" is tagged as a person, later partial mentions can be tagged consistently
– Binary model vs. multi-class model

Conclusion
Future work:
– Incorporating long-range reference resolution
– Using more general compound features
– Using acronyms
Advantages of MENE:
– Can incorporate information from previous tokens
– Features can overlap
– Highly portable
– Easy to combine with other systems