Design Challenges and Misconceptions in Named Entity Recognition
Lev Ratinov and Dan Roth

The named entity recognition (NER) problem: identify people, locations, organizations and other entities in text.

Demo: Download:

A typical solution models NER as a structured, sequential problem, e.g. with an HMM/CRF, labeling "... Reggie Blinker ... Feyenoord ..." as Reggie/B-PER Blinker/I-PER ... Feyenoord/B-ORG.

How to represent NE's?

There are at least 5 representation schemes, which vary in how they mark the beginning, the end, and single-token entities. The letters used are B(egin), E(nd), L(ast), I(nside), U(nit), and O(ther). The more states we use, the more expressive the model is, but it may require more training data to converge.

            BIO1    BIO2    IOE1    IOE2    BILOU
  retain    O       O       O       O       O
  the       O       O       O       O       O
  Golan     I-loc   B-loc   I-loc   I-loc   B-loc
  Heights   I-loc   I-loc   E-loc   E-loc   L-loc
  Israel    B-loc   B-loc   I-loc   E-loc   U-loc
  captured  O       O       O       O       O
  from      O       O       O       O       O
  Syria     I-loc   B-loc   I-loc   E-loc   U-loc

Key result: BILOU performs best overall on 5 datasets, and better than BIO2, the scheme most commonly used for NER.

Choice of inference algorithm

Given that NER is a sequential problem, Viterbi is the most popular inference algorithm for it. It computes the most likely label assignment in a first-order HMM/CRF in time quadratic in the number of states and linear in the input size. When non-local features are used, exact dynamic programming with Viterbi becomes intractable, and beam search is usually used instead. We show that greedy left-to-right decoding (beam search of size one) performs comparably to Viterbi and beam search while being much faster. The reason is that in NER, short NE chunks are separated by many O-labeled tokens, which "break" the likelihood-maximization problem into short independent chunks on which classifiers with local features perform well. The non-local dependencies in NER have a very long range, which second-order state-transition features fail to model. With the label set Per/Org/Loc/Misc/O and the BILOU encoding, there are 17 states (B-Per, I-Per, U-Per, etc.), so with second-order Viterbi we must consider 17³ label combinations at each step, in contrast to only 17 in greedy decoding.

How to model non-local information?

Common intuition: multiple occurrences of the same token should be assigned the same label. We compared three approaches proposed in the literature and found that no single one consistently outperformed the rest on a collection of 5 datasets; however, the combination of the approaches was the most stable and generally performed best.

Context aggregation: collect the contexts of all instances of a token, extract features from all these contexts, and use them for every instance. For example, all instances of FIFA in the sample text receive context-aggregation features extracted from contexts such as "... suspension lifted by FIFA on Friday and ..." and "... two games after FIFA slapped a worldwide ...".

Two-stage prediction aggregation: the model tags the text in a first round of inference. Then, for each token, we record the most frequent label it was assigned across its occurrences, and whether any named entity found in the document contained the token. These statistics are used as features in a second round of inference.

Extended prediction history: the previous two techniques treat all tokens in a document identically, yet the easiest named entities to identify often occur at the beginning of the document. When using beam search or greedy decoding, we therefore collect, for each token, statistics on how often it was previously assigned each of the labels.

How to inject knowledge?

We investigated two knowledge resources: word class models extracted from unlabeled text, and gazetteers extracted from Wikipedia. The two approaches are independent and their contributions accumulate.

Word class models hierarchically cluster words by a distributional-similarity measure: two clusters are merged if their words tend to appear in similar contexts. For example, clustering Friday and Monday together lets us abstract both as "day of the week" and helps deal with the data sparsity problem. (The poster shows a sample binary clustering here.) Deciding which level of abstraction to use is challenging; however, similarly to Huffman coding, each word can be assigned its path from the root of the cluster tree to its leaf. For example, the encoding of IBM is 011. By taking path substrings of different lengths, we can consider different levels of abstraction.

Gazetteers: we use a collection of 14 high-precision, low-recall lists extracted from the web that cover common names, countries, monetary units, temporal expressions, etc. While these gazetteers have excellent accuracy, they do not provide sufficient coverage. To improve coverage, we extracted 16 additional gazetteers from Wikipedia, which collectively contain over 1.5M entities. Wikipedia is an open, collaborative encyclopedia with several attractive properties: (1) it is kept up to date manually by its contributors, so new entities are constantly added; (2) it contains redirect pages that map several spelling variations of the same name to one canonical entry (for example, Suker redirects to the entry about Davor Šuker, the Croatian footballer); (3) Wikipedia entries are manually tagged with categories (for example, the entry for Microsoft carries categories such as "Companies listed on NASDAQ" and "Cloud computing vendors"). We used Wikipedia's category structure to group the titles and their redirects into higher-level concepts such as locations, organizations, and artworks.

Short sketches illustrating each of these design decisions follow.
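As a concrete illustration of the table above, here is a minimal sketch (Python, not the authors' LBJ code) that converts entity spans into BILOU tags:

```python
# A minimal sketch of producing BILOU tags from entity spans.
# Spans are (start, end_exclusive, type) over the token list.

def bilou_encode(tokens, spans):
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        if end - start == 1:
            tags[start] = "U-" + etype            # U(nit): single-token entity
        else:
            tags[start] = "B-" + etype            # B(egin)
            for i in range(start + 1, end - 1):
                tags[i] = "I-" + etype            # I(nside)
            tags[end - 1] = "L-" + etype          # L(ast)
    return tags

tokens = "retain the Golan Heights Israel captured from Syria".split()
spans = [(2, 4, "loc"), (4, 5, "loc"), (7, 8, "loc")]
print(list(zip(tokens, bilou_encode(tokens, spans))))
# ... ('Golan', 'B-loc'), ('Heights', 'L-loc'), ('Israel', 'U-loc'), ...
```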
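Greedy left-to-right decoding is simple enough to sketch in a few lines; `extract_features` and `score` are hypothetical stand-ins for the feature extractor and the trained (e.g. averaged-perceptron) scoring function:

```python
# A sketch of greedy left-to-right decoding (beam search of size one).
# `labels` is the 17-state BILOU label set; each token is assigned the
# locally best label, conditioning only on the previous prediction.

def greedy_decode(tokens, labels, extract_features, score):
    predictions = []
    for i in range(len(tokens)):
        prev = predictions[-1] if predictions else "O"
        feats = extract_features(tokens, i, prev)   # local context + previous tag
        predictions.append(max(labels, key=lambda y: score(feats, y)))
    return predictions
```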
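A sketch of two-stage prediction aggregation as described above; the feature names are illustrative, not the paper's:

```python
# After a first inference pass, each token contributes its document-level
# majority label (and whether it appeared inside any predicted entity)
# as features for the second pass.
from collections import Counter, defaultdict

def aggregation_features(tokens, first_pass_tags):
    label_counts = defaultdict(Counter)
    in_entity = defaultdict(bool)
    for tok, tag in zip(tokens, first_pass_tags):
        label_counts[tok][tag] += 1
        if tag != "O":
            in_entity[tok] = True
    features = []
    for tok in tokens:
        majority = label_counts[tok].most_common(1)[0][0]
        feats = {"majority-label=" + majority}       # illustrative feature name
        if in_entity[tok]:
            feats.add("token-appeared-in-entity")
        features.append(feats)
    return features
```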
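A sketch of the word-class features; the cluster paths and prefix lengths below are toy values chosen for illustration:

```python
# Each word's path in the binary cluster tree is a bit string; prefixes of
# several lengths give features at different levels of abstraction.

CLUSTER_PATH = {"IBM": "011", "Friday": "1100", "Monday": "1101"}  # toy table
PREFIX_LENGTHS = (1, 2, 3, 4)   # illustrative choice of abstraction levels

def word_class_features(word):
    path = CLUSTER_PATH.get(word)
    if path is None:
        return set()
    return {"path-prefix=" + path[:k] for k in PREFIX_LENGTHS if k <= len(path)}

print(word_class_features("Friday") & word_class_features("Monday"))
# {'path-prefix=1', 'path-prefix=11', 'path-prefix=110'} -- the shared
# prefixes act as the "day of the week"-like abstraction.
```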
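Finally, a sketch of gazetteer matching; the lists and the BILOU-style match-position features are illustrative assumptions, not the paper's exact feature set:

```python
# Longest n-gram matching of tokens against entity lists. The toy lists
# below stand in for the 14 web lists and 16 Wikipedia-derived gazetteers.

GAZETTEERS = {
    "Loc": {("golan", "heights"), ("israel",), ("syria",)},
    "Org": {("fifa",), ("feyenoord",)},
}

def gazetteer_features(tokens, max_len=4):
    feats = [set() for _ in tokens]
    lowered = [t.lower() for t in tokens]
    for i in range(len(tokens)):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            ngram = tuple(lowered[i:i + n])
            for category, entries in GAZETTEERS.items():
                if ngram in entries:
                    for j in range(i, i + n):   # mark BILOU-style positions
                        pos = ("U" if n == 1 else
                               "B" if j == i else
                               "L" if j == i + n - 1 else "I")
                        feats[j].add("gaz:%s-%s" % (pos, category))
    return feats

print(gazetteer_features("captured from Syria".split()))
# [set(), set(), {'gaz:U-Loc'}]
```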
Main Results: (the results table from the poster is not reproduced in this transcript)

Conclusions

We have presented a conditional model for NER that uses an averaged perceptron (implementation: Learning Based Java (LBJ)) with expressive features to achieve new state-of-the-art performance on the named entity recognition task. We explored four fundamental design decisions: text-chunk representation, inference algorithm, non-local features, and external knowledge.

Supported by NSF grant SoD-HCER, by MIAS, a DHS-IDS Center for Multimodal Information Access and Synthesis at UIUC, and by the Library of Congress project NDIIPP.