A Survey on Information Extraction from Documents Using Structures of Sentences
Chikayama-Taura Lab. M1, Mitsuharu Kurita

Introduction

Current search systems rest on two assumptions:
1. Users send words, not sentences.
2. The aim is to find documents related to the query words.

We unconsciously learn to choose query words that are likely to appear near the target information. In some cases this clue does not work well.

Introduction

For more convenient access to information, two kinds of analysis are needed:
- Analysis of the details of the question, to identify the target information
- Analysis of the information in the retrieved documents, to find the requested information

The latter is the job of information extraction.

Outline
- Introduction
- Overview of Information Extraction (IE)
- IE with pattern matching
- IE with sentence structures
  - Frequent substructures
  - Shortest path between two words
  - Applying the kernel method for structured data
- Conclusion

Information Extraction

What is information extraction?
- A task in natural language processing
- Addresses the extraction of information from texts, not the retrieval of documents

It originated with an international conference named MUC:
- Message Understanding Conference (MUC)
- A competition in IE among research groups
- Set a new information extraction task each year

MUC Competition

An example of a MUC task (MUC-3, terrorism domain):
- Input: news articles (some of which describe terrorism events)
- Output: the instances involved in each incident

MUC Competition

Pattern matching or linguistic analysis? At that time there were many obstacles to using advanced natural language processing, so most competitors adopted pattern matching to find the instances.

Example of Pattern Matching

CIRCUS [Lehnert et al., 1992]. Each pattern consists of a trigger word and a linguistic pattern.

Pattern: kidnap-passive
- Trigger: "kidnap"
- Linguistic pattern: "<subject> passive-verb"
- Variable: "target"

Applied to "The mayor was kidnapped by terrorists.":
1. "kidnap" activates the pattern.
2. "was kidnapped" is a passive verb phrase.
3. The subject, "mayor", fills the target slot.
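The trigger-plus-linguistic-pattern mechanism can be sketched as follows. This is a minimal illustrative approximation, not the original CIRCUS implementation: the linguistic pattern is simulated with a regular expression, and all names (`PATTERNS`, `extract`) are hypothetical.

```python
import re

# One CIRCUS-style pattern: a trigger word activates it, a linguistic
# pattern (approximated here by a regex for a passive verb phrase)
# checks the context, and a variable slot captures the extracted word.
PATTERNS = [
    {
        "name": "kidnap-passive",
        "trigger": "kidnap",
        # "<subject> was/were kidnapped" -> subject fills the "target" slot
        "regex": re.compile(r"(?:the\s+)?(\w+)\s+(?:was|were)\s+kidnapped", re.I),
        "slot": "target",
    },
]

def extract(sentence):
    """Return slot fillers for every pattern whose trigger appears."""
    results = []
    for p in PATTERNS:
        if p["trigger"] in sentence.lower():        # 1. trigger activates the pattern
            m = p["regex"].search(sentence)          # 2. linguistic pattern matches
            if m:
                results.append({p["slot"]: m.group(1)})  # 3. fill the variable
    return results

print(extract("The mayor was kidnapped by terrorists."))
# -> [{'target': 'mayor'}]
```

Real systems replace the regex with a shallow parse of the sentence, which is what makes the "passive-verb" condition robust to word order and intervening modifiers.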

Problems of Pattern Matching
- Creating patterns takes a huge amount of time; in many cases they were handwritten.
- The patterns depend heavily on the target domain, so it is difficult to adapt them to a new task.

This motivates the automatic construction of patterns.

The Earliest Automatic Pattern Generation

AutoSlog [Riloff, 1993] creates the patterns for CIRCUS automatically.
- Training data: articles with the target words tagged
- Created 1,237 patterns from 1,500 tagged texts; only 450 of them were judged valid by a human

Example: from "The mayor was kidnapped by terrorists." it generates:

Pattern: kidnap-passive
- Trigger: "kidnap"
- Linguistic pattern: "<subject> passive-verb"
- Variable: "target"

Recently it has become possible to use deeper linguistic analysis. Some studies address new IE tasks using these linguistic resources together with machine learning approaches.

Sentence Structures

Dependency structure:
- Describes modification relations between words
- One sentence forms a tree structure

Predicate-argument structure:
- Describes the semantic relations between predicates and arguments
- One sentence forms a graph structure

Difficulties in Using Structured Data

Most machine learning algorithms treat data as feature vectors, and it is difficult to express structured data (e.g. trees, graphs) as vectors. Three ways to use sentence structures for IE:
- Frequent substructures
- Shortest paths between two words
- Applying the kernel method for structured data

IE with Subgraphs of Sentence Structures

On-Demand Information Extraction (ODIE) [Sekine, 2006] creates extraction patterns on demand and extracts information with them. Pipeline: a query retrieves relevant articles from the article database; a dependency analyzer converts them into dependency trees; frequent subtree mining over those trees yields subtree patterns, which are used to fill a table of information.
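The support-threshold idea behind the mining step can be sketched as follows. A real system runs a proper frequent-subtree miner (e.g. FREQT-style) over parsed articles; to keep the sketch short it counts only single dependency edges (size-2 subtrees) and abstracts the dependent into an entity slot. All data and names here are invented for illustration.

```python
from collections import Counter

# Toy dependency trees, each represented as a set of (head, dependent) edges.
trees = [
    {("acquire", "CompanyA"), ("acquire", "CompanyB")},
    {("acquire", "CompanyC"), ("announce", "CompanyC")},
    {("buy", "CompanyE"), ("acquire", "CompanyF")},
]

def frequent_edges(trees, min_support):
    """Return edge patterns occurring in at least min_support trees."""
    counts = Counter()
    for t in trees:
        # Abstract the concrete filler into a slot, and count each
        # pattern once per tree (support, not raw frequency).
        patterns = {(head, "<ENTITY>") for head, _dep in t}
        for p in patterns:
            counts[p] += 1
    return {p for p, c in counts.items() if c >= min_support}

print(frequent_edges(trees, min_support=3))
# -> {('acquire', '<ENTITY>')}
```

The surviving patterns play the role of ODIE's subtree patterns: structures that recur across many retrieved articles, with the varying words generalized into slots to be filled.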

Experimental Results

Generated patterns: patterns were found for the query "merger and acquisition" (M&A). Extracted information: shown for the query "acquire, acquisition, merger, buy, purchase".

Experimental Results
- Very quick construction of patterns: in MUC, participants were allowed a month; ODIE takes only a few minutes to return a result.
- No training corpus is needed: ODIE learns the extraction patterns from the data.
- Information about recurring events can be extracted well (e.g. mergers and acquisitions, Nobel prize winners).

IE with the Shortest Path between Words

Extraction of interacting protein pairs [Yakushiji et al., 2006]:
- Extract interacting protein pairs from biomedical articles
- Focus on the shortest path between the two protein names in the predicate-argument structure
- Discriminate with a Support Vector Machine (SVM)

Example: "ENTITY1 is interacted with a hydrophilic loop region of ENTITY2." The shortest path on the predicate-argument graph runs ENTITY1 - interact - with - region - of - ENTITY2, skipping modifiers such as "a hydrophilic loop".
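Extracting that path is a standard shortest-path search over the sentence graph. Below is a minimal sketch using breadth-first search; the graph hand-encodes a simplified predicate-argument structure of the example sentence (in a real system it would come from a deep parser), and all node labels are illustrative.

```python
from collections import deque

# Simplified predicate-argument structure of the example sentence,
# as an undirected adjacency list.
graph = {
    "ENTITY1": ["interact"],
    "interact": ["ENTITY1", "with"],
    "with": ["interact", "region"],
    "region": ["with", "of", "hydrophilic_loop"],
    "of": ["region", "ENTITY2"],
    "ENTITY2": ["of"],
    "hydrophilic_loop": ["region"],   # modifier: off the shortest path
}

def shortest_path(graph, src, dst):
    """Breadth-first search: shortest node sequence from src to dst."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no connection between the two entities

print(shortest_path(graph, "ENTITY1", "ENTITY2"))
# -> ['ENTITY1', 'interact', 'with', 'region', 'of', 'ENTITY2']
```

The resulting path keeps exactly the words that connect the two protein names, which is why it makes a compact, modifier-free input for the downstream classifier.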

Pattern Generation

Variation of patterns: the extracted patterns alone are not enough, so each pattern is divided into components (Main, Prep, Entity, e.g. "interact", "with", "region of ...") and the components are recombined into new patterns such as "X interact with Y".

Pattern Generation

Validation of patterns: some of the generated patterns are inappropriate, so each pattern is scored by its adequacy on the learning data, using a feature vector built from counts such as TP (true positives) and FP (false positives).

Support Vector Machine (SVM)
- A two-class linear classifier
- Divides the data space with a hyperplane
- Chooses the hyperplane that maximizes the margin
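The margin-maximization objective can be sketched with a tiny hinge-loss SVM trained by subgradient descent on toy 2-D data. This is an illustrative sketch only (in practice one would use a library such as scikit-learn's `sklearn.svm.SVC`); the data and hyperparameters are invented.

```python
# Toy 2-D points with labels +1 / -1, linearly separable.
data = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((2.5, 3.5), 1),
        ((-2.0, -2.0), -1), ((-3.0, -1.0), -1), ((-1.5, -3.0), -1)]

def train(points, lam=0.01, eta=0.05, epochs=500):
    """SGD on the regularized hinge loss: lam/2 |w|^2 + hinge(y(w.x+b))."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in points:
            if y * (w[0] * x[0] + w[1] * x[1] + b) < 1:
                # Inside the margin: step toward satisfying it.
                w = [wi + eta * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += eta * y
            else:
                # Outside the margin: only the regularizer acts,
                # shrinking w -- this is what maximizes the margin.
                w = [wi * (1 - eta * lam) for wi in w]
    return w, b

w, b = train(data)
predict = lambda x: 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1
print([predict(x) == y for x, y in data])
```

The key design point is the two-case update: violated-margin points pull the hyperplane toward them, while the weight-decay term keeps `w` as small as possible, and a small `w` with all margins at least 1 is exactly a large geometric margin.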

Experimental Results

Learning:
- AImed corpus: 225 abstracts of biomedical papers, annotated with protein names and interactions

Extraction:
- MEDLINE: 14 million titles and 8 million abstracts
- Extracted 7,775 protein pairs, with 64.0% precision and 83.8% recall

IE with the Kernel Method on Sentence Structures

Kernel method (e.g. SVM):
- The data are used only in the form of dot products.
- If you can calculate the dot product directly, you do not have to construct the vectors.
- Furthermore, you can use other functions as the kernel as long as they satisfy certain conditions (Mercer's condition).

A kernel function maps the raw data into an implicit vector space, on which the classifier operates.
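The "dot product without the vectors" point can be checked concretely. For the quadratic kernel k(x, z) = (x . z)^2 over 2-D inputs, the implicit feature map is phi(x) = (x1^2, sqrt(2) x1 x2, x2^2); the kernel evaluates the dot product in that 3-D space without ever building it.

```python
import math

def kernel(x, z):
    """Quadratic kernel: dot product in feature space, computed directly."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    """The explicit feature map that the kernel implicitly uses."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, z = (1.0, 2.0), (3.0, 0.5)
print(kernel(x, z), dot(phi(x), phi(z)))
# The two values agree (up to floating-point rounding): 16.0
```

For tree and graph kernels the feature space is the (huge) space of all substructures, so the explicit map is infeasible, and computing the kernel directly is the only practical option.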

Relation Extraction

Relation extraction with a tree kernel [Culotta et al., 2004]:
- Classify the relation between two entities
- 5 entity types (person, organization, geo-political entity, location, facility)
- 5 major types of relations (at, near, part, role, social)
- Classify the smallest subtree of the dependency tree that includes both entities

Tree Kernel

Represents the similarity between two tree-shaped data items, calculated as a sum of node-pair similarities:
1. Enqueue the root node pair.
2. Dequeue a node pair and add its similarity to the total.
3. Find all child node sequence pairs whose main features match.
4. Enqueue those child node pairs.
5. If the queue is not empty, go to step 2; otherwise return the accumulated similarity.

Calculation of Tree Kernel

Each node carries a set of features, some of which are designated as main features. The similarity between two nodes is defined as the number of common features (excluding the main features).

Calculation of Tree Kernel

(Worked example on two small trees; X and X' denote nodes whose main features are common.)
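The node-matching computation can be sketched recursively. This is a much-simplified version: two nodes contribute only if their main features match, their similarity is the number of remaining common features, and matchable child pairs are added recursively. The actual kernel of Culotta et al. also enumerates child *subsequences* with decay factors, which this sketch omits; the example trees are invented.

```python
def tree_kernel(a, b):
    """Simplified node-matching tree similarity."""
    if a["main"] != b["main"]:
        return 0                                 # main features must match
    sim = len(a["feats"] & b["feats"])           # shared non-main features
    for ca in a["children"]:
        for cb in b["children"]:
            sim += tree_kernel(ca, cb)           # recurse into child pairs
    return sim

def node(main, feats, children=()):
    return {"main": main, "feats": set(feats), "children": list(children)}

t1 = node("kidnap", {"verb", "past"}, [node("PERSON", {"noun", "subject"})])
t2 = node("kidnap", {"verb", "past"}, [node("PERSON", {"noun", "object"})])
print(tree_kernel(t1, t2))
# -> 3  (roots share 2 features; the child pair shares 1)
```

Because the value decomposes over matching node pairs, two dependency subtrees score highly exactly when they share both structure and word-level features, which is what lets an SVM separate relation types without an explicit feature vector.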

Experimental Results

Data set: ACE corpus
- 800 annotated documents (gathered from newspapers and broadcasts)
- 5 entity types (person, organization, geo-political entity, location, facility)
- 5 major types of relations (at, near, part, role, social)

Precision (%) and recall (%) were compared for a bag-of-words kernel and the tree kernel.

Conclusion
- Overview of information extraction: the aim of IE, and the recent movement toward using deep linguistic resources
- Ways to use sentence structures for IE: the difficulties of using structured data in machine learning, and three different approaches to exploiting them