Relation Extraction CSCI-GA.2591

Slides:



Advertisements
Similar presentations
Image classification Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?
Advertisements

Supervised Learning Techniques over Twitter Data Kleisarchaki Sofia.
Deep Learning in NLP Word representation and how to use it for Parsing
Ang Sun Ralph Grishman Wei Xu Bonan Min November 15, 2011 TAC 2011 Workshop Gaithersburg, Maryland USA.
Improving Machine Learning Approaches to Coreference Resolution Vincent Ng and Claire Cardie Cornell Univ. ACL 2002 slides prepared by Ralph Grishman.
Duyu Tang, Furu Wei, Nan Yang, Ming Zhou, Ting Liu, Bing Qin
Review Rong Jin. Comparison of Different Classification Models  The goal of all classifiers Predicating class label y for an input x Estimate p(y|x)
Richard Socher Cliff Chiung-Yu Lin Andrew Y. Ng Christopher D. Manning
Mastering the Pipeline CSCI-GA.2590 Ralph Grishman NYU.
Support Vector Machines Reading: Ben-Hur and Weston, “A User’s Guide to Support Vector Machines” (linked from class web page)
A Systematic Exploration of the Feature Space for Relation Extraction Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois,
1 Intelligente Analyse- und Informationssysteme Frank Reichartz, Hannes Korte & Gerhard Paass Fraunhofer IAIS, Sankt Augustin, Germany Dependency Tree.
Maximum Entropy Models and Feature Engineering CSCI-GA.2590 – Lecture 6B Ralph Grishman NYU.
Deep Learning for Efficient Discriminative Parsing Niranjan Balasubramanian September 2 nd, 2015 Slides based on Ronan Collobert’s Paper and video from.
Support vector machine LING 572 Fei Xia Week 8: 2/23/2010 TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A 1.
Support Vector Machines. Notation Assume a binary classification problem. –Instances are represented by vector x   n. –Training examples: x = (x 1,
4. Relationship Extraction Part 4 of Information Extraction Sunita Sarawagi 9/7/2012CS 652, Peter Lindes1.
FILTERED RANKING FOR BOOTSTRAPPING IN EVENT EXTRACTION Shasha Liao Ralph York University.
Support Vector Machines Reading: Ben-Hur and Weston, “A User’s Guide to Support Vector Machines” (linked from class web page)
Department of Computer Science The University of Texas at Austin USA Joint Entity and Relation Extraction using Card-Pyramid Parsing Rohit J. Kate Raymond.
Learning Event Durations from Event Descriptions Feng Pan, Rutu Mulkar, Jerry R. Hobbs University of Southern California ACL ’ 06.
Relation Extraction: Rule-based Approaches CSCI-GA.2590 Ralph Grishman NYU.
Learning to Extract CSCI-GA.2590 Ralph Grishman NYU.
Short Text Similarity with Word Embedding Date: 2016/03/28 Author: Tom Kenter, Maarten de Rijke Source: CIKM’15 Advisor: Jia-Ling Koh Speaker: Chih-Hsuan.
Relation Extraction (RE) via Supervised Classification See: Jurafsky & Martin SLP book, Chapter 22 Exploring Various Knowledge in Relation Extraction.
Parsing Natural Scenes and Natural Language with Recursive Neural Networks INTERNATIONAL CONFERENCE ON MACHINE LEARNING (ICML 2011) RICHARD SOCHER CLIFF.
Mastering the Pipeline CSCI-GA.2590 Ralph Grishman NYU.
Graph-based WSD の続き DMLA /7/10 小町守.
Combining Models Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya.
Data Mining Practical Machine Learning Tools and Techniques
Neural Machine Translation
Convolutional Sequence to Sequence Learning
Support Feature Machine for DNA microarray data
CS 388: Natural Language Processing: LSTM Recurrent Neural Networks
Maximum Entropy Models and Feature Engineering CSCI-GA.2591
Sentence Modeling Representation of sentences is the heart of Natural Language Processing A sentence model is a representation and analysis of semantic.
MIRA, SVM, k-NN Lirong Xia. MIRA, SVM, k-NN Lirong Xia.
Preliminaries CSCI-GA.2591
Tokenizer and Sentence Splitter CSCI-GA.2591
Table 1. Advantages and Disadvantages of Traditional DM/ML Methods
Entity- & Topic-Based Information Ordering
NYU Coreference CSCI-GA.2591 Ralph Grishman.
(Entity and) Event Extraction CSCI-GA.2591
Giuseppe Attardi Dipartimento di Informatica Università di Pisa
Training and Evaluation CSCI-GA.2591
Neural networks (3) Regularization Autoencoder
Are End-to-end Systems the Ultimate Solutions for NLP?
Deep learning and applications to Natural language processing
Introduction Feature Extraction Discussions Conclusions Results
CSC 594 Topics in AI – Natural Language Processing
Machine Learning Week 1.
convolutional neural networkS
Support Vector Machines
Lei Sha, Jing Liu, Chin-Yew Lin, Sujian Li, Baobao Chang, Zhifang Sui
convolutional neural networkS
Introduction Task: extracting relational facts from text
COSC 4335: Other Classification Techniques
Word embeddings based mapping
CSCI 5832 Natural Language Processing
Word embeddings based mapping
Automatic Extraction of Hierarchical Relations from Text
CSCI 5832 Natural Language Processing
Attention.
Giuseppe Attardi Dipartimento di Informatica Università di Pisa
Attention for translation
Rachit Saluja 03/20/2019 Relation Extraction with Matrix Factorization and Universal Schemas Sebastian Riedel, Limin Yao, Andrew.
CSE 291G : Deep Learning for Sequences
MIRA, SVM, k-NN Lirong Xia. MIRA, SVM, k-NN Lirong Xia.
Modeling IDS using hybrid intelligent systems
Support Vector Machines 2
Presentation transcript:

Ralph Grishman, NYU

ACE Relations
An ACE relation mention connects two entity mentions in the same sentence:
- the CEO of Microsoft → OrgAff:Employment(the CEO of Microsoft, Microsoft)
- in the West Bank, a passenger was wounded → Phys:Located(a passenger, the West Bank)
ACE 2005 had 6 types of relations and 18 subtypes; most papers report results on types only.
Most relations are local: in roughly 70% of relations the arguments are adjacent or separated by one word, so chunking is important but full parsing is not critical.
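To make the data model concrete, here is a minimal Python sketch of an ACE-style relation mention pairing two entity mentions; the class and field names are illustrative, not ACE's official schema:

```python
from dataclasses import dataclass

@dataclass
class EntityMention:
    text: str           # surface string, e.g. "the CEO of Microsoft"
    semantic_type: str  # e.g. "PER", "ORG", "GPE"
    start: int          # token offsets within the sentence
    end: int

@dataclass
class RelationMention:
    rel_type: str       # one of the 6 ACE 2005 relation types, e.g. "ORG-AFF"
    subtype: str        # e.g. "Employment"
    arg1: EntityMention
    arg2: EntityMention

# OrgAff:Employment(the CEO of Microsoft, Microsoft)
ceo = EntityMention("the CEO of Microsoft", "PER", 0, 4)
msft = EntityMention("Microsoft", "ORG", 3, 4)
rel = RelationMention("ORG-AFF", "Employment", ceo, msft)
```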

Benchmarks
ACE 2003 / 2004 / 2005 corpora
- generally assuming perfect entity mentions on input
- some work assumes only the position (and not the semantic type) of the mentions is given
SemEval-2010 Task 8
- carefully selected examples of 10 relations
- a classification task

Using MaxEnt
First description of an ACE relation extractor: IBM system [Kambhatla ACL 2004]
Used features:
- words
- entity type
- mention level
- overlap
- dependency tree
- parse tree
Used the 2003 ACE data.
F = 55 (perfect mentions), 23 (system mentions)
→ good system mentions are important
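A minimal sketch of this kind of feature-based (MaxEnt-style) classifier, using scikit-learn's logistic regression as the maximum-entropy model; the feature names are illustrative stand-ins, not Kambhatla's exact feature set:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def extract_features(sentence_tokens, arg1, arg2):
    """Categorical features for one candidate mention pair (arg1 preceding arg2).
    Stand-ins for the word / entity-type / overlap features used by feature-based
    ACE systems; arg1 and arg2 carry token offsets and a semantic type."""
    between = sentence_tokens[arg1.end:arg2.start]
    return {
        "arg1_head": sentence_tokens[arg1.end - 1],
        "arg2_head": sentence_tokens[arg2.end - 1],
        "arg1_type": arg1.semantic_type,
        "arg2_type": arg2.semantic_type,
        "type_pair": arg1.semantic_type + "-" + arg2.semantic_type,
        "num_words_between": str(len(between)),
        "words_between": " ".join(between),
    }

vectorizer = DictVectorizer()
model = LogisticRegression(max_iter=1000)

def train(feature_dicts, labels):
    # feature_dicts: one dict per candidate pair; labels include "NONE" for no relation
    X = vectorizer.fit_transform(feature_dicts)
    model.fit(X, labels)

def predict(feature_dict):
    return model.predict(vectorizer.transform([feature_dict]))[0]
```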

Lots of features
Singapore system [Zhou et al. ACL 2005] used a very rich feature set, including:
- 11 chunk-based features
- a family-relative feature
- 2 country-name features
- 7 dependency-based features
- . . .
Highly tuned to the ACE task.
F = 68 (relation type), F = 55 (subtype)
Reported a gain of several points over the IBM system (using perfect mentions).
Further extended at NYU; on ACE 2004: F = 70.1

Kernel methods and SVMs
As an alternative to a feature-based model, one can provide a kernel function: a similarity function between pairs of the objects being classified.
The kernel can be used directly by a k-NN (nearest-neighbor) classifier, or can be used in training an SVM (Support Vector Machine).
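A minimal sketch of how a kernel plugs into an SVM: scikit-learn's SVC accepts a precomputed Gram matrix, so any pairwise similarity function can be used without defining explicit feature vectors. The kernel below is a placeholder, not any particular published relation kernel:

```python
import numpy as np
from sklearn.svm import SVC

def my_kernel(a, b):
    """Placeholder similarity between two relation instances;
    a real system would compare dependency paths or parse trees."""
    return float(len(set(a) & set(b)))

def gram_matrix(instances_a, instances_b):
    # K[i, j] = kernel(instances_a[i], instances_b[j])
    return np.array([[my_kernel(a, b) for b in instances_b] for a in instances_a])

def train_and_predict(train_insts, train_labels, test_insts):
    K_train = gram_matrix(train_insts, train_insts)
    clf = SVC(kernel="precomputed")
    clf.fit(K_train, train_labels)
    K_test = gram_matrix(test_insts, train_insts)  # rows: test items, cols: training items
    return clf.predict(K_test)
```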

SVM
The SVM, when trained, creates a separating hyperplane.
If the data are fully separable, all data on one side of the hyperplane are classified +, and all data on the other side −.
It is inherently a binary classifier.

Benefit of kernel methods
Kernels provide a natural way of handling structured input of variable size: sequences and trees.
A feature-based system may require a large number of features to achieve the same effect.

Shortest-path kernel
[Bunescu & Mooney EMNLP 2005]; Sept. 2002 corpus.
Based on the dependency path between the arguments.
Kernel function between two paths x and y of lengths m and n:
  K(x, y) = 0 if m ≠ n, otherwise ∏ i=1..n c(x_i, y_i)
where c = degree of match (lexical / POS) between corresponding path elements.
Train an SVM.
F = 52.5
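A sketch of this kernel in Python, assuming each dependency-path element is represented as a set of word classes (word, POS, entity type, ...); c counts the classes shared at each position, following the product form above:

```python
def c(xi, yi):
    """Degree of match between corresponding path elements:
    number of word classes (lexical item, POS, entity type, ...) they share."""
    return len(xi & yi)

def shortest_path_kernel(x, y):
    """Kernel between two dependency paths x and y.
    Each path is a list of sets of word classes, one set per path element.
    Paths of different lengths have zero similarity."""
    if len(x) != len(y):
        return 0
    k = 1
    for xi, yi in zip(x, y):
        k *= c(xi, yi)
    return k

# toy example: "protesters -> seized <- stations" vs "troops -> raided <- churches"
path1 = [{"protesters", "NNS", "PERSON"}, {"->"}, {"seized", "VBD"},
         {"<-"}, {"stations", "NNS", "FACILITY"}]
path2 = [{"troops", "NNS", "PERSON"}, {"->"}, {"raided", "VBD"},
         {"<-"}, {"churches", "NNS", "FACILITY"}]
print(shortest_path_kernel(path1, path2))  # product of per-position overlaps
```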

Tree kernel
To take account of more of the tree than the dependency path, use the PET (path-enclosed tree):
- PET = the portion of the parse tree enclosed by the shortest path
- using the entire sentence tree introduces too much irrelevant data
Use a tree kernel which recursively compares the two trees, for example by counting the number of shared subtrees.
The best kernel is a composite kernel: tree kernel + entity kernel.
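A much-simplified toy sketch of "counting shared subtrees" (real convolution tree kernels count shared subtree fragments with a dynamic program; this version only counts complete subtrees that occur in both trees):

```python
from collections import Counter

def render(tree):
    """Canonical string for a tree given as (label, [children])."""
    label, children = tree
    if not children:
        return label
    return "(" + label + " " + " ".join(render(c) for c in children) + ")"

def all_subtrees(tree):
    """Multiset of all complete subtrees, each rendered as a string."""
    _label, children = tree
    counts = Counter([render(tree)])
    for child in children:
        counts.update(all_subtrees(child))
    return counts

def toy_tree_kernel(t1, t2):
    """Number of matching pairs of complete subtrees shared by t1 and t2."""
    c1, c2 = all_subtrees(t1), all_subtrees(t2)
    return sum(c1[s] * c2[s] for s in c1 if s in c2)

# tiny path-enclosed trees for two candidate relation mentions
pet1 = ("NP", [("NP", [("DT", []), ("NN", [])]),
               ("PP", [("IN", []), ("NP", [("NNP", [])])])])
pet2 = ("NP", [("NP", [("DT", []), ("NN", [])]),
               ("PP", [("IN", []), ("NP", [("NN", [])])])])
print(toy_tree_kernel(pet1, pet2))  # counts the sub-structure the two PETs share
```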

Lexical Generalization
Test data will include words not seen in training.
Remedies:
- use lemmas
- use Brown clusters
- use word embeddings
These can be used with feature-based or kernel-based methods.
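A sketch of how such generalized lexical features might be added to a feature-based system; the cluster table and embedding lookup below are hypothetical stand-ins for a real Brown-clustering output and a real pretrained embedding matrix:

```python
import numpy as np

# hypothetical resources: Brown cluster bit-strings and pretrained embeddings
brown_clusters = {"wounded": "110100", "injured": "110100", "hired": "011010"}
embeddings = {"wounded": np.random.rand(50), "injured": np.random.rand(50)}

def cluster_features(word, prefix_lengths=(4, 6)):
    """Discrete features that back off from the word to its Brown cluster prefixes."""
    feats = {"word=" + word: 1.0}
    cluster = brown_clusters.get(word)
    if cluster is not None:
        for p in prefix_lengths:
            feats["brown_prefix%d=%s" % (p, cluster[:p])] = 1.0
    return feats

def embedding_features(word, dim=50):
    """Real-valued features: the word's embedding (zeros if unseen)."""
    vec = embeddings.get(word, np.zeros(dim))
    return {"emb_%d" % i: float(v) for i, v in enumerate(vec)}

# "injured" gets the same cluster features as "wounded", even if unseen in training
print(cluster_features("injured"))
```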

FCM
Feature-rich Compositional Embedding Model.
Combines word embeddings and hand-made discrete features:
  score(y) = Σ_i f_i^T T_y e_i
where
- e_i is the word embedding vector for word i
- f_i is a vector of hand-coded features for word i
- T is a matrix of weights (one per relation label y)
If e is fixed during training, this is a feature-rich log-linear model.
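A minimal numpy sketch of this scoring function, assuming one weight matrix per relation label and a softmax over the summed bilinear scores; the dimensions are illustrative:

```python
import numpy as np

n_labels, feat_dim, emb_dim = 7, 20, 50                  # illustrative sizes
T = np.random.randn(n_labels, feat_dim, emb_dim) * 0.01  # one weight matrix per label

def fcm_scores(f, e):
    """f: (n_words, feat_dim) hand-coded feature vectors, one per word
       e: (n_words, emb_dim) word embeddings, one per word
       returns unnormalized scores, one per relation label"""
    scores = np.zeros(n_labels)
    for y in range(n_labels):
        # sum_i  f_i^T  T_y  e_i
        scores[y] = sum(f[i] @ T[y] @ e[i] for i in range(f.shape[0]))
    return scores

def fcm_predict(f, e):
    scores = fcm_scores(f, e)
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                                  # softmax over relation labels
    return int(probs.argmax()), probs
```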

Neural Networks
Neural networks provide a richer model than log-linear models:
- they reduce the need for feature engineering (although it may help to add features to the embeddings)
- but they are slow to train and hard to inspect
Several types of networks have been used:
- convolutional NNs
- recurrent NNs
An ensemble of different NN types appears most effective, and may even include a log-linear model in the ensemble.
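A minimal PyTorch sketch of a convolutional relation classifier of this kind: word and position embeddings, a 1-D convolution over the sentence, max-pooling, and a linear layer over relation labels. All names and dimensions are illustrative, not those of any specific published system:

```python
import torch
import torch.nn as nn

class CNNRelationClassifier(nn.Module):
    def __init__(self, vocab_size, n_labels, emb_dim=100, pos_dim=25,
                 n_filters=150, window=3, max_dist=100):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        # position embeddings: distance of each token to arg1 and to arg2
        self.pos_emb1 = nn.Embedding(2 * max_dist + 1, pos_dim)
        self.pos_emb2 = nn.Embedding(2 * max_dist + 1, pos_dim)
        self.conv = nn.Conv1d(emb_dim + 2 * pos_dim, n_filters,
                              kernel_size=window, padding=window // 2)
        self.out = nn.Linear(n_filters, n_labels)

    def forward(self, words, dist1, dist2):
        # words, dist1, dist2: (batch, sent_len) index tensors
        x = torch.cat([self.word_emb(words),
                       self.pos_emb1(dist1),
                       self.pos_emb2(dist2)], dim=-1)  # (batch, len, dim)
        x = torch.relu(self.conv(x.transpose(1, 2)))   # (batch, filters, len)
        x = x.max(dim=2).values                        # max-pool over positions
        return self.out(x)                             # unnormalized label scores

# usage sketch:
#   logits = model(word_ids, dist_to_arg1, dist_to_arg2)
#   loss = nn.CrossEntropyLoss()(logits, gold_labels)
```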

Some comparisons
ACE 2005; train on nw + bn, test on bc; perfect mentions, including entity types.
F scores:
  LogLinear system   57.8
  FCM                61.9
  Hybrid FCM         63.5
  CNN                63.0
  NN ensemble        67.0
The richer model of even a simple NN beats a log-linear (MaxEnt) system.
[Nguyen and Grishman, IJCAI Workshop 2016]

Comparing scores
Using a subset of ACE 2005 (news); feature-based system; perfect mention positions but no type information.
F scores:
  Baseline                                  51.4
  Single Brown cluster                      52.3
  Multiple clusters                         53.7
  Word embedding (WE)                       54.1
  Multiple clusters + WE                    55.5
  Multiple clusters + WE + regularization   59.4
Moral: lexical generalization and regularization are worthwhile (probably for all ACE tasks).
[Nguyen & Grishman ACL 2014]

Distant Supervision
We have focused on supervised methods, which produce the best performance.
If we have a large database with instances of the relations of interest, we can use distant supervision:
- use the database to tag a corpus: if the DB has relation R(x, y), tag all sentences in the corpus containing x and y as examples of R
- train a model from the tagged corpus
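A minimal sketch of the tagging step, assuming the knowledge base is just a dictionary from entity pairs to relation names and that sentences come with their entity mentions already identified (all names below are illustrative):

```python
# hypothetical knowledge base: (arg1, arg2) -> relation name
kb = {
    ("Barack Obama", "Hawaii"): "born_in",
    ("Microsoft", "Redmond"): "headquartered_in",
}

def distant_label(sentences):
    """sentences: list of (text, entity_mentions) pairs, where entity_mentions
    is the list of entity strings found in that sentence.
    Returns (sentence, arg1, arg2, relation) training examples."""
    examples = []
    for text, mentions in sentences:
        for x in mentions:
            for y in mentions:
                if x != y and (x, y) in kb:
                    examples.append((text, x, y, kb[(x, y)]))
    return examples

corpus = [
    ("Barack Obama was born in Hawaii in 1961.", ["Barack Obama", "Hawaii"]),
    ("Barack Obama visited Hawaii last week.", ["Barack Obama", "Hawaii"]),  # noisy match
]
print(distant_label(corpus))
```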

Distant Supervision (continued)
By itself, distant supervision is too noisy: if the same pair <x, y> is connected by several relations, which one do we label?
But it can be combined with selective manual annotation to produce a satisfactory result.