Effective Entity Recognition and Typing by Relation Phrase-Based Clustering 20151116.


Contents: Motivation, Definition, Problem Model, ClusType Algorithm, Experiments

Motivation
1. Fine-grained type information is useful for downstream applications (e.g., it improved the F1 score of a relation extraction system by 93% [1]).
2. Traditional named entity recognition systems are designed for a few major types (e.g., person, organization, location) and general domains (e.g., news). They require additional steps to adapt to new domains and new types.
3. Entity linking techniques suffer from limited coverage and freshness (e.g., over 50% of entities mentioned in Web documents are unlinkable [2]).
4. Previous methods have difficulty handling entity mentions with sparse context.
5. There are often many ways to describe the same relation between two entities (e.g., "beat" and "won the game 34-28 over").

Definition

Problem Model
The model is based on four hypotheses.
Hypothesis 1 (Entity-relation co-occurrence): If surface name c often appears as the left (right) argument of relation phrase p, then c's type indicator tends to be similar to the corresponding type indicator in p's type signature.
Hypothesis 2 (Mention correlation): Consider two candidate mentions that share the same name and are strongly correlated (e.g., they occur in the same sentence or share common neighbor mentions). Their type indicators tend to be similar.
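
Hypothesis 1 can be read as a smoothness penalty: a name's type indicator should be close to the matching slot of the type signature of every relation phrase it co-occurs with. The sketch below is a simplified, unnormalized version written under that assumption. It is not the exact term F from [3], and it omits any normalization that term may apply. The function name, the dict-based inputs, and the team/person type vectors are all hypothetical.

```python
def h1_penalty(C, P_left, P_right, w_left, w_right):
    """Unnormalized penalty illustrating Hypothesis 1 (hypothetical simplification).

    C:        {name: type-indicator vector}
    P_left:   {phrase: type vector of the phrase's left-argument slot}
    P_right:  {phrase: type vector of the phrase's right-argument slot}
    w_left:   {(name, phrase): times name appears as the phrase's left argument}
    w_right:  {(name, phrase): times name appears as the phrase's right argument}
    """
    def sq_dist(u, v):
        # Squared Euclidean distance between two equal-length vectors.
        return sum((a - b) ** 2 for a, b in zip(u, v))

    total = 0.0
    for (name, phrase), w in w_left.items():
        total += w * sq_dist(C[name], P_left[phrase])
    for (name, phrase), w in w_right.items():
        total += w * sq_dist(C[name], P_right[phrase])
    return total
```

With type vectors ordered as [team, person], suppose "Patriots" and "Jets" are both typed as team and "beat" expects a team in both slots. The penalty is then 0.0. Relabeling "Jets" as a person raises it in proportion to how often "Jets" appears as the right argument of "beat".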

Problem Model (cont.)
Hypothesis 3 (Type signature consistency): If two relation phrases have similar cluster memberships, the type indicators of their left arguments tend to be similar, and likewise for their right arguments (i.e., their type signatures tend to be similar).
Hypothesis 4 (Relation phrase similarity): Two relation phrases tend to have similar cluster memberships if they have similar (1) strings, (2) context words, and (3) left and right argument type indicators.
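
Hypothesis 4 names three views of a relation phrase but does not say how they are compared. The sketch below scores each view with cosine similarity and averages the three. It uses character trigrams for the string view and sparse count vectors for the context words and argument types. The equal weights, the field names "text", "context", and "arg_types", and the "L:"/"R:" type keys are hypothetical. The slides instead combine the views inside a multi-view clustering objective, which this sketch does not reproduce.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {feature: weight} dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def char_ngrams(s, n=3):
    """Character n-gram counts of a phrase string (the 'string' view)."""
    s = f" {s} "
    grams = {}
    for i in range(len(s) - n + 1):
        grams[s[i:i + n]] = grams.get(s[i:i + n], 0) + 1
    return grams


def phrase_similarity(p, q, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted average of string, context-word, and argument-type similarities."""
    views = [
        cosine(char_ngrams(p["text"]), char_ngrams(q["text"])),
        cosine(p["context"], q["context"]),
        cosine(p["arg_types"], q["arg_types"]),
    ]
    return sum(w * s for w, s in zip(weights, views))
```

As in the Motivation example, "beat" and "won the game over" share little surface string. They can still score as similar because their context words and argument types overlap.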

[Figure: illustration of Hypotheses H1, H2, H3, and H4]

ClusType Algorithm: Framework Overview
1. Perform phrase mining on a POS-tagged corpus to extract candidate entity mentions and relation phrases, and construct a heterogeneous graph G.
2. Collect seed entity mentions M_L as labels by linking the extracted candidate mentions M to the knowledge base (KB) Ψ.
3. Estimate the type indicator y of each unlinkable candidate mention m ∈ M_U on G using clustering-integrated type propagation.

Candidate Generation [4]
1. Mine frequent contiguous patterns up to a fixed length.
2. Use a greedy agglomerative algorithm to merge them into longer phrases. The algorithm terminates when the next highest-scoring merge does not meet a predefined significance threshold.
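
The two steps above can be sketched on a toy corpus. The slides do not give the significance score. This sketch assumes a z-score-style statistic in the spirit of [4]: the observed count of the merged phrase minus the count expected if the two parts occurred independently, divided by the square root of the observed count. It ignores the POS-tag constraints used in the framework. The function names and the `min_support` and `threshold` parameters are hypothetical.

```python
from collections import Counter
from math import sqrt


def count_patterns(docs, max_len, min_support):
    """Step 1: count contiguous n-grams (1 <= n <= max_len) seen >= min_support times."""
    counts = Counter()
    for tokens in docs:
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


def significance(left, right, counts, total_tokens):
    """Assumed z-score-style significance of merging two adjacent phrases."""
    f = counts.get(left + right, 0)
    if f == 0:
        return float("-inf")  # merged phrase is not frequent, so never merge
    p_left = counts.get(left, 0) / total_tokens
    p_right = counts.get(right, 0) / total_tokens
    expected = total_tokens * p_left * p_right  # expected count under independence
    return (f - expected) / sqrt(f)


def segment(tokens, counts, total_tokens, threshold):
    """Step 2: greedily merge the best adjacent pair until no merge passes threshold."""
    phrases = [(t,) for t in tokens]
    while len(phrases) > 1:
        scores = [significance(phrases[i], phrases[i + 1], counts, total_tokens)
                  for i in range(len(phrases) - 1)]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] < threshold:
            break  # the highest-scoring merge is not significant: stop
        phrases[best:best + 2] = [phrases[best] + phrases[best + 1]]
    return phrases
```

Consider a toy corpus of three sentences that each contain "support vector machine" (12 tokens in total), with `max_len=3`, `min_support=2`, and `threshold=1.0`. Segmenting "support vector machine is good" yields the phrase ("support", "vector", "machine") followed by the single tokens "is" and "good".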

Construction of Graph G: Name-Relation Phrase Subgraph
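
The subgraph's figure is not preserved. Hypothesis 1 suggests the signal it encodes: how often each entity name appears as the left or right argument of each relation phrase. The sketch below assumes exactly that and stores the two bipartite weight maps as dicts. Extracting the (left name, relation phrase, right name) triples is out of scope, and the input format and function name are hypothetical.

```python
from collections import defaultdict


def build_name_relation_subgraph(triples):
    """Build left/right name-phrase edge weights from extracted triples.

    triples: iterable of (left_name, relation_phrase, right_name).
    Returns (w_left, w_right): dicts mapping (name, phrase) -> co-occurrence count.
    """
    w_left = defaultdict(int)
    w_right = defaultdict(int)
    for left, phrase, right in triples:
        w_left[(left, phrase)] += 1    # name is the phrase's left argument
        w_right[(right, phrase)] += 1  # name is the phrase's right argument
    return dict(w_left), dict(w_right)
```

Keeping separate left and right weights lets the type signature of a phrase such as "beat" have distinct left and right slots.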

Construction of Graph G: Mention Correlation Subgraph and Mention-Name Subgraph (example edge: Washington <-> 76_Washington)
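
This sketch builds both subgraphs from a list of candidate mentions. Each mention ID is linked to its surface name, following the slide's "Washington <-> 76_Washington" example. The "<id>_<name>" ID format is an assumption. For the mention correlation edges, it follows Hypothesis 2 and connects only mentions that share a name. It weights each edge by the Jaccard overlap of the other names in their sentences. That weighting is a hypothetical simple choice, not necessarily how [3] weights these edges.

```python
from itertools import combinations


def build_mention_subgraphs(mentions):
    """mentions: list of (mention_id, name, neighbor_names) tuples, where
    neighbor_names is the set of other candidate names in the same sentence.

    Returns:
      mention_name: list of (mention_id, name) edges
      correlation:  {(m1, m2): weight} for same-name mention pairs with weight > 0
    """
    mention_name = [(mid, name) for mid, name, _ in mentions]
    correlation = {}
    for (m1, n1, nb1), (m2, n2, nb2) in combinations(mentions, 2):
        if n1 != n2:
            continue  # Hypothesis 2 only relates mentions sharing a name
        union = nb1 | nb2
        weight = len(nb1 & nb2) / len(union) if union else 0.0
        if weight > 0:
            correlation[(m1, m2)] = weight
    return mention_name, correlation
```

Two "Washington" mentions that both co-occur with "Redskins" get a correlation edge. A third mention that co-occurs only with "Obama" stays unconnected, so it can end up with a different type.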

Clustering-integrated Type Propagation
1. Seed mention generation: use an entity name disambiguation tool (DBpedia Spotlight, http://spotlight.dbpedia.org/) and keep only entities mapped with high confidence scores (η > 0.8).
2. Joint optimization over the type indicators of entity names, C, and the type signatures of relation phrases, {P_L, P_R}. The term F follows Hypothesis 1 and models type propagation.
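
The seed filtering in step 1 amounts to a confidence cutoff. The sketch below does not call DBpedia Spotlight. It assumes the linker's output has already been collected as (mention ID, KB entity, confidence) triples, together with a lookup from KB entities to target types. Both input formats and the function name are hypothetical. The cutoff η > 0.8 comes from the slide.

```python
def select_seed_mentions(linked, type_of_entity, eta=0.8):
    """Keep only high-confidence KB links as seed labels.

    linked:         list of (mention_id, kb_entity, confidence) from an entity linker
    type_of_entity: {kb_entity: target type}
    Returns {mention_id: type} for links with confidence > eta and a known type.
    """
    seeds = {}
    for mention_id, entity, confidence in linked:
        if confidence > eta and entity in type_of_entity:
            seeds[mention_id] = type_of_entity[entity]
    return seeds
```

Mentions dropped here are not discarded. They remain unlabeled candidates whose types are inferred by propagation.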

Clustering-integrated Type Propagation (cont.)
Another term follows Hypotheses 3 and 4 to model multi-view relation phrase clustering. A further term models the type indicator of each candidate entity mention, the mention-mention links, and the supervision from seed mentions. Finally, solve the real-valued relaxation of (2) and predict the exact type of each candidate mention from its relaxed type indicator.
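
The slide's exact prediction rule is not preserved. A common way to turn a real-valued relaxed indicator into one type, assumed here, is to pick the type with the largest score. The type list (including a catch-all "Other" column), the mention IDs, and the scores in the usage below are hypothetical.

```python
def predict_types(scores, type_names):
    """Map each mention's relaxed type-indicator row to its highest-scoring type.

    scores:     {mention_id: list of real-valued scores, one per type}
    type_names: type labels in the same column order as the scores
    """
    return {
        mid: type_names[max(range(len(row)), key=row.__getitem__)]
        for mid, row in scores.items()
    }
```

For example, with columns [person, location, organization, Other], a row of [0.2, 0.6, 0.1, 0.1] is predicted as location.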

Experiments

References
[1] X. Ling and D. S. Weld. Fine-Grained Entity Recognition. AAAI, 2012.
[2] T. Lin, Mausam, and O. Etzioni. No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities. EMNLP-CoNLL, 2012.
[3] X. Ren, A. El-Kishky, C. Wang, et al. ClusType: Effective Entity Recognition and Typing by Relation Phrase-Based Clustering. KDD, 2015.
[4] A. El-Kishky, Y. Song, C. Wang, C. R. Voss, and J. Han. Scalable Topical Phrase Mining from Text Corpora. VLDB, 2015.