Effective Entity Recognition and Typing by Relation Phrase-Based Clustering 20151116.



Content: Motivation; Definition; Problem model; ClusType Algorithm; Experiments

Motivation
Fine-grained type information is useful for downstream applications (e.g., it improved the F1 score of a relation extraction system by 93% [1]).
Traditional named entity recognition systems are designed for a few major types (e.g., person, organization, location) and general domains (e.g., news), and require additional steps to adapt to a new domain or new types.
Entity linking techniques suffer from limited coverage and freshness (e.g., over 50% of the entities mentioned in Web documents are unlinkable [2]).
Previous methods have difficulty handling entity mentions with sparse context.
There are often many ways to describe the same relation between two entities (e.g., “beat” and “won the game 34-28 over”).

Definition

Problem model
Based on several hypotheses.
Hypothesis 1: Entity-Relation Co-occurrences. If surface name c often appears as the left (right) argument of relation phrase p, then c's type indicator tends to be similar to the corresponding type indicator in p's type signature.
Hypothesis 2: Mention correlation. If there exists a strong correlation (i.e., within sentence, common neighbor mentions) between two candidate mentions that share the same name, then their type indicators tend to be similar.

Problem model
Hypothesis 3: Type signature consistency. If two relation phrases have similar cluster memberships, the type indicators of their left and right arguments (type signature) tend to be similar, respectively.
Hypothesis 4: Relation phrase similarity. Two relation phrases tend to have similar cluster memberships if they have similar (1) strings; (2) context words; and (3) left and right argument type indicators.

[Figure: illustration labeled with hypotheses H1, H3, H4, and H2]

ClusType Algorithm
Framework Overview
1. Perform phrase mining on a POS-tagged corpus to extract candidate entity mentions and relation phrases, and construct a heterogeneous graph G.
2. Collect seed entity mentions M_L as labels by linking extracted candidate mentions M to the KB Ψ.
3. Estimate the type indicator y for each unlinkable candidate mention m ∈ M_U with G using clustering-integrated type propagation.

Candidate Generation [4]
1. Mine frequent contiguous patterns up to a fixed length.
2. Use a greedy agglomerative algorithm to generate longer phrases; it terminates when the next highest-scoring merge does not meet a predefined significance threshold.
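The two candidate generation steps above can be sketched roughly as follows. This is a minimal illustration, not the implementation from [4]: the function names, the default parameters, and the exact significance score (observed merged count versus the count expected under independence) are assumptions made for the example.

```python
from collections import Counter
from math import sqrt

def frequent_contiguous_patterns(docs, max_len=3, min_support=2):
    """Count contiguous token n-grams (length 1..max_len), keep the frequent ones."""
    counts = Counter()
    for tokens in docs:
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return {p: c for p, c in counts.items() if c >= min_support}

def significance(left, right, counts, total):
    """Assumed score: how far the merged count exceeds the count expected
    if `left` and `right` occurred independently."""
    merged = counts.get(left + right, 0)
    if merged == 0:
        return float("-inf")
    expected = counts[left] * counts[right] / total
    return (merged - expected) / sqrt(merged)

def segment(tokens, counts, total, threshold=1.0):
    """Greedily merge the best-scoring adjacent pair; stop when the best
    remaining merge falls below the significance threshold."""
    phrases = [(t,) for t in tokens]
    while len(phrases) > 1:
        scores = [significance(phrases[i], phrases[i + 1], counts, total)
                  for i in range(len(phrases) - 1)]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] < threshold:
            break
        phrases[best:best + 2] = [phrases[best] + phrases[best + 1]]
    return phrases

# Toy corpus (hypothetical data, already tokenized).
docs = [
    "support vector machine learns fast".split(),
    "a support vector machine works".split(),
    "the machine works fast".split(),
]
counts = frequent_contiguous_patterns(docs)
total = sum(len(d) for d in docs)
print(segment(docs[1], counts, total))
# [('a',), ('support', 'vector', 'machine'), ('works',)]
```

The threshold controls how aggressively tokens are merged: raising it keeps only strongly associated phrases, lowering it produces longer candidates.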

Construction of Graph G Name-Relation Phrase Subgraph

Construction of Graph G
Mention Correlation Subgraph
Mention-Name Subgraph: Washington <-> 76_Washington
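A small sketch of how two of these subgraphs might be assembled from extracted triples. The input format, the function name `build_subgraphs`, and the reading of a mention ID as `<id>_<surface name>` (inferred from the `76_Washington` example) are assumptions; the mention correlation subgraph is left out for brevity.

```python
from collections import defaultdict

def build_subgraphs(extractions):
    """extractions: iterable of (left_mention_id, relation_phrase, right_mention_id).
    A mention ID is assumed to look like '<id>_<surface name>'."""
    left_links = defaultdict(int)   # (name, relation) -> count as left argument
    right_links = defaultdict(int)  # (name, relation) -> count as right argument
    mention_to_name = {}            # mention-name subgraph: mention ID -> surface name
    for left_id, relation, right_id in extractions:
        for mention_id, links in ((left_id, left_links), (right_id, right_links)):
            name = mention_id.split("_", 1)[1]
            mention_to_name[mention_id] = name
            links[(name, relation)] += 1
    return dict(left_links), dict(right_links), mention_to_name

# Hypothetical extractions for illustration only.
extractions = [
    ("76_Washington", "announced", "76_budget"),
    ("12_Washington", "announced", "12_plan"),
    ("12_Seattle", "is located in", "12_Washington"),
]
left, right, mention_to_name = build_subgraphs(extractions)
print(left[("Washington", "announced")])   # 2
print(mention_to_name["76_Washington"])    # Washington
```

Keeping left and right argument counts separate matches Hypothesis 1, which treats the two argument positions of a relation phrase independently.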

Clustering-integrated Type Propagation
1. Seed Mention Generation: use an entity name disambiguation tool (http://spotlight.dbpedia.org/) and keep only entities mapped with high confidence scores (η > 0.8).
2. Joint Optimization over the type indicators of entity names C and the type signatures of relation phrases {P_L, P_R}. The term F follows from Hypothesis 1 to model type propagation.
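The seed selection rule in step 1 amounts to a confidence filter. The sketch below assumes the linker's output has already been collected into simple records; the `LinkedMention` type and `select_seeds` function are hypothetical and do not call the DBpedia Spotlight service.

```python
from typing import NamedTuple

class LinkedMention(NamedTuple):
    mention_id: str
    entity: str
    entity_type: str
    confidence: float

def select_seeds(linked, eta=0.8):
    """Split linker output into seed mentions (confidence > eta) and the rest."""
    seeds = {m.mention_id: m.entity_type for m in linked if m.confidence > eta}
    unlinkable = [m.mention_id for m in linked if m.confidence <= eta]
    return seeds, unlinkable

# Hypothetical linker output for illustration only.
linked = [
    LinkedMention("76_Washington", "Washington,_D.C.", "location", 0.93),
    LinkedMention("12_Washington", "George_Washington", "person", 0.55),
]
seeds, unlinkable = select_seeds(linked)
print(seeds)       # {'76_Washington': 'location'}
print(unlinkable)  # ['12_Washington']
```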

Clustering-integrated Type Propagation
One term follows Hypotheses 3 and 4 to model the multi-view relation phrase clustering.
Another term models the type indicator for each entity mention candidate, the mention-mention link, and the supervision from seed mentions.
Finally, solve the real-valued relaxation of (2) and predict the exact type of each candidate mention using
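As a rough illustration of the propagation idea behind Hypothesis 1 only, the sketch below runs a single pass: relation phrases collect a type distribution from their seed left arguments, and unlabeled names then take the type favored by the relation phrases they occur with. This is a simplified stand-in, not the joint optimization described on the slides; it ignores right arguments, relation phrase clustering, and mention-level links, and all names and data are hypothetical.

```python
from collections import defaultdict

def propagate_types(left_links, seed_types):
    """left_links: {(name, relation): count}; seed_types: {name: type}."""
    # Step 1: relation phrase -> type distribution over its seed left arguments.
    signature = defaultdict(lambda: defaultdict(float))
    for (name, relation), count in left_links.items():
        if name in seed_types:
            signature[relation][seed_types[name]] += count
    for dist in signature.values():
        total = sum(dist.values())
        for t in dist:
            dist[t] /= total
    # Step 2: unlabeled names accumulate type scores from their relation phrases.
    scores = defaultdict(lambda: defaultdict(float))
    for (name, relation), count in left_links.items():
        if name not in seed_types and relation in signature:
            for t, weight in signature[relation].items():
                scores[name][t] += count * weight
    # Step 3: predict the highest-scoring type for each unlabeled name.
    return {name: max(dist, key=dist.get) for name, dist in scores.items()}

left_links = {
    ("Obama", "was born in"): 3,
    ("Merkel", "was born in"): 2,
    ("Google", "acquired"): 4,
    ("Xyz Corp", "acquired"): 1,
    ("John Doe", "was born in"): 1,
}
seed_types = {"Obama": "person", "Merkel": "person", "Google": "organization"}
print(propagate_types(left_links, seed_types))
# {'Xyz Corp': 'organization', 'John Doe': 'person'}
```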

Experiments

Reference
[1] X. Ling and D. S. Weld. Fine-Grained Entity Recognition. AAAI, 2012.
[2] T. Lin, Mausam, and O. Etzioni. No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities. EMNLP-CoNLL, 2012.
[3] X. Ren, A. El-Kishky, C. Wang, et al. ClusType: Effective Entity Recognition and Typing by Relation Phrase-Based Clustering. KDD, 2015.
[4] A. El-Kishky, Y. Song, C. Wang, C. R. Voss, and J. Han. Scalable Topical Phrase Mining from Text Corpora. VLDB, 2015.