Exploiting Diverse Knowledge Sources via Maximum Entropy in Named Entity Recognition. Authors: Andrew Borthwick, John Sterling, Eugene Agichtein, Ralph Grishman.

Presentation transcript:

Exploiting Diverse Knowledge Sources via Maximum Entropy in Named Entity Recognition. Authors: Andrew Borthwick, John Sterling, Eugene Agichtein, Ralph Grishman. Speaker: Shasha Liao

Content
– Named Entity Recognition (NER)
– Maximum Entropy (ME)
– System Architecture
– Results
– Conclusions

Named Entity Recognition (NER)
Given a tokenization of a test corpus and a set of n (n = 7) name categories, NER is the problem of assigning one of 4n + 1 tags to each token:
– for each category x: x_begin, x_continue, x_end, x_unique, plus the single tag "other"
MUC-7 categories:
– Proper names (people, organizations, locations)
– Expressions of time
– Quantities
– Monetary values
– Percentages
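As an illustration (not part of the original slides), a minimal Python sketch that enumerates this 4n + 1 tag set; the category names here are assumptions standing in for the seven MUC-7 types:

```python
# Hypothetical sketch: enumerate the 4n+1 MENE-style tags for n categories.
CATEGORIES = ["person", "organization", "location",
              "date", "time", "money", "percent"]  # n = 7 (MUC-7)
STATES = ["begin", "continue", "end", "unique"]

TAGS = [f"{c}_{s}" for c in CATEGORIES for s in STATES] + ["other"]
print(len(TAGS))  # 4 * 7 + 1 = 29
```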

Named Entity Recognition (NER)
Example: "Jim bought 300 shares of Acme Corp. in 2006."
Jim/per_unique bought/other 300/qua_unique shares/other of/other Acme/org_begin Corp./org_end in/other 2006/time_unique ./other

Maximum Entropy (ME)
– Statistical modeling technique
– Estimates a probability distribution from partial knowledge
– Principle: the correct probability distribution is the one that maximizes entropy (uncertainty) subject to what is known
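Stated formally (this is the standard maximum-entropy formulation; the slide's own formula image is not in the transcript): among all conditional distributions that match the empirical expectation of every feature g_i, choose the one with maximal conditional entropy.

```latex
\max_{p}\; H(p) = -\sum_{h}\tilde{p}(h)\sum_{f} p(f \mid h)\,\log p(f \mid h)
\quad\text{s.t.}\quad
E_{p}[g_i] = E_{\tilde{p}}[g_i],\;\; i = 1,\dots,k
```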

Maximum Entropy (ME) --- Build ME Model
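The formula on this slide did not survive the transcript; in the MENE paper the model takes the log-linear form below, where each feature g_i(h, f) over a history h and future f carries a weight α_i and Z(h) normalizes over all futures:

```latex
p(f \mid h) = \frac{1}{Z(h)} \prod_{i=1}^{k} \alpha_i^{\,g_i(h,f)},
\qquad
Z(h) = \sum_{f'} \prod_{i=1}^{k} \alpha_i^{\,g_i(h,f')}
```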

Maximum Entropy (ME) --- Initialize Features
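This slide's content is likewise missing from the transcript; MENE's features are binary indicator functions on (history, future) pairs, along the lines of this representative (illustrative, not verbatim) example:

```latex
g(h,f) =
\begin{cases}
1 & \text{if the current token in } h \text{ is capitalized and } f = \text{person\_unique}\\
0 & \text{otherwise}
\end{cases}
```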

Maximum Entropy (ME) --- ME Estimation
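Estimation (formula not preserved in the transcript) chooses the weights so that each feature's expectation under the model matches its empirical expectation, which for log-linear models is equivalent to maximizing the conditional log-likelihood of the training data:

```latex
E_{p}[g_i] = \sum_{h}\tilde{p}(h)\sum_{f} p(f \mid h)\,g_i(h,f)
\;=\; E_{\tilde{p}}[g_i]
\quad\Longleftrightarrow\quad
\alpha = \arg\max_{\alpha} \sum_{h,f}\tilde{p}(h,f)\,\log p_{\alpha}(f \mid h)
```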

Maximum Entropy (ME) --- Generalized Iterative Scaling
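GIS solves the estimation problem by repeatedly rescaling each weight by the ratio of empirical to model expectation; the standard update (assuming the features sum to a constant C on every (h, f) pair, padded with a correction feature if necessary) is:

```latex
\alpha_i^{(t+1)} = \alpha_i^{(t)} \left( \frac{E_{\tilde{p}}[g_i]}{E_{p^{(t)}}[g_i]} \right)^{1/C}
```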

System Architecture --- Features (1)
Feature set:
– Binary: similar to BBN's Nymble/IdentiFinder system
– Lexical: all tokens with a count of 3 or more
– Section: date, preamble, text, ...
– Dictionary: name lists
– External system: the futures produced by other NE systems are treated as part of MENE's history
– Compound: conjunctions of the above, e.g. external system × section feature
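A minimal sketch (hypothetical names, not the authors' code) of how a lexical feature and a compound external-system × section feature could be written as binary predicates over a (history, future) pair:

```python
# Hypothetical feature functions over (history, future) pairs.
# `history` is assumed to carry the current token, an external tagger's
# prediction, and the document section; these names are illustrative.

def lexical_feature(history, future):
    """Fires when the current token is 'Corp.' and the future is org_end."""
    return 1 if history["token"] == "Corp." and future == "org_end" else 0

def compound_feature(history, future):
    """Conjunction: an external tagger's prediction AND the section feature."""
    return 1 if (history["external_tag"] == "org_end"
                 and history["section"] == "text"
                 and future == "org_end") else 0
```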

System Architecture --- Features (2)
Feature selection (features discarded):
– Features which activate on the default value of a history view (99% of cases are not names)
– Lexical features which predict the future "other" fewer than 6 times (instead of the usual threshold of 3)
– Features which predict "other" at token positions -2 and +2
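As a sketch of the count-threshold rule only (the thresholds come from the slide; the data structures are assumptions), pruning can be phrased as a filter over candidate features annotated with the future they predict and their training-set count:

```python
# Hypothetical pruning pass following the slide's thresholds: lexical
# features predicting "other" need count >= 6, everything else >= 3.

def keep_feature(predicted_future, count, is_lexical):
    threshold = 6 if (is_lexical and predicted_future == "other") else 3
    return count >= threshold

candidates = [("other", 4, True), ("org_end", 3, True), ("other", 7, True)]
kept = [c for c in candidates if keep_feature(*c)]
print(kept)  # [('org_end', 3, True), ('other', 7, True)]
```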

System Architecture --- Decoding and Viterbi Search
Viterbi search: dynamic programming
– Find the highest-probability legal path through the lattice of conditional probabilities
– Example: for "Mike England", the locally best tags are person_start (0.66) for "Mike" and gpe_unique (0.6) for "England", but the transition person_start → gpe_unique is illegal: p(gpe_unique | person_start) = 0. The legal path person_start (0.66) → person_end (0.3), with p(person_end | person_start) = 0.7, wins instead. See the sketch below.
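A minimal Viterbi sketch (illustrative Python, not the authors' implementation) over the two-token example, combining local tag probabilities with transition probabilities that zero out illegal sequences:

```python
# Minimal Viterbi sketch over a tag lattice; numbers follow the slide's example.
tokens = ["Mike", "England"]
local = [  # local tag probabilities per position
    {"person_start": 0.66},
    {"gpe_unique": 0.6, "person_end": 0.3},
]
trans = {  # p(next_tag | prev_tag); illegal transitions get probability 0
    ("person_start", "gpe_unique"): 0.0,
    ("person_start", "person_end"): 0.7,
}

# Forward pass: best score and backpointer for each tag at each position.
best = [{t: (p, None) for t, p in local[0].items()}]
for i in range(1, len(tokens)):
    layer = {}
    for t, p in local[i].items():
        score, prev = max(
            (best[i - 1][pt][0] * trans.get((pt, t), 0.0) * p, pt)
            for pt in best[i - 1]
        )
        layer[t] = (score, prev)
    best.append(layer)

# Backtrace from the best final tag.
tag, (score, prev) = max(best[-1].items(), key=lambda kv: kv[1][0])
path = [tag]
for layer in reversed(best[:-1]):
    path.append(prev)
    prev = layer[prev][1]
path.reverse()
print(path, score)  # ['person_start', 'person_end'] ~0.1386
```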

Results (1)

Results (2)
Probable reasons:
– Dynamic updating of the vocabulary during decoding (reference resolution): once a full name such as "Andrew Borthwick" is tagged as a person, later partial mentions can be tagged consistently
– Binary model vs. multi-class model

Conclusion
Future work:
– Incorporating long-range reference resolution
– Using more general compound features
– Using acronyms
Advantages of MENE:
– Can incorporate information from previous tokens
– Features can overlap
– Highly portable
– Easy to combine with other systems