Automatic Acquisition of Lexical Classes and Extraction Patterns for Information Extraction. Kiyoshi Sudo, Ph.D. Research Proposal, New York University

Presentation transcript:

Automatic Acquisition of Lexical Classes and Extraction Patterns for Information Extraction
Kiyoshi Sudo
Ph.D. Research Proposal, New York University
Committee: Ralph Grishman, Satoshi Sekine, I. Dan Melamed

August 9, 2002, Kiyoshi Sudo, Thesis Proposal Presentation

Outline
Introduction
Research Proposal
– Problem Setting
– Approach
– Application to Information Extraction
Discussion

MUC Scenario Template Task
MURREE, Pakistan (AP) -- Masked gunmen firing Kalashnikov rifles burst through the front gates of a Christian school Monday, killing six people and wounding three in the latest attack against Western interests since Pakistan joined the war against terrorism.
Template slots: Date, Perpetrator, Weapon, Victim, Location

MUC Scenario Template Task: filled template
Date: Monday
Perpetrator: Masked gunmen
Weapon: Kalashnikov rifles
Victim: six people; three
Location: a Christian school

High Cost for Acquiring Knowledge-Base
Find extraction patterns
– Find relevant documents
– Find relevant events
– Analyze sentences
Find domain-specific lexicon
– Find existing KB (e.g. thesaurus, gazetteers)

Prior Work
Automatic Knowledge Acquisition: Lexical Acquisition and Pattern Acquisition
– Mutual Bootstrapping (Riloff and Jones 1999)
– Simultaneous Multi-Semantic Class (Thelen and Riloff 2002) (Yangarber et al. 2002)
– Pattern Discovery with Document Re-ranking (Yangarber et al. 2000)
– Pattern Acquisition for QA (Ravichandran and Hovy 2002)

Challenge
[Diagram: User; Seed Lexicon; Seed Pattern; Expanded Lexicon; Expanded Pattern Set; Knowledge Base. Example: MUC-3 Terrorism Event]

Meeting the Challenge
[Diagram: User; Seed Lexicon; Seed Pattern; Expanded Lexicon; Expanded Pattern Set; Knowledge Base; Scenario Description, Semantic Clustering, Semantic Cluster]

Semantic Clustering
[Diagram: Scenario Description, Semantic Clustering, Semantic Cluster (Semantic Lexicon, Extraction Patterns)]
Input: Scenario Description
– Description specific enough to define the scenario
– (terrorism, bombing, kidnapping)
– “Tell me about the terrorism action, such as bombing and kidnapping.”
Goal:
– Find scenario-specific Semantic Clusters, each of which consists of
  – Semantic Lexicon
  – Extraction Patterns

Benefit for User
[Diagram: Scenario Description, Semantic Clustering, Semantic Cluster]
– Simplify Domain Analysis
– Low-cost Knowledge-base Acquisition for IE systems

Extraction Patterns
Definition: where c unifies with the context that is defined by semantic class L
context =
– Case Frame: (bomb (v), x (subj), himself (obj))
– Sequential: (x, bombs, himself)
– Dependency: bomb, with dependents x (V:subj) and himself (V:obj) (cf. Sudo et al. 2001)

Outline
Introduction
Research Proposal
– Problem Setting
– Approach
– Information Extraction
Evaluation

Overview
[Diagram: Semantic Clustering takes a Scenario Description and produces a Semantic Cluster; components: Source, Information Retrieval, Bootstrapping, Query Expansion]

Information Retrieval
– Get relevant document set R
– Get list of lexical items and extraction patterns ordered by relevance to the scenario
  – TF/IDF scoring
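The ranking step can be illustrated with a minimal sketch. It assumes a plain TF/IDF weighting (frequency of an item in the retrieved set R, times the log inverse document frequency in the whole collection); the slides only say "TF/IDF scoring", so the exact formula, the function name `rank_by_tfidf`, and the toy documents are all hypothetical.

```python
import math
from collections import Counter

def rank_by_tfidf(relevant_docs, all_docs):
    """Rank items (terms or patterns) that occur in the relevant set R.

    Each document is a list of items. The score is TF(item in R) * log(N / DF(item)),
    where DF counts documents of the whole collection that contain the item.
    This weighting is an assumption, not the formula from the proposal.
    relevant_docs must be drawn from all_docs so that DF is never zero.
    """
    n_docs = len(all_docs)
    df = Counter()
    for doc in all_docs:
        df.update(set(doc))          # count each item once per document
    tf = Counter()
    for doc in relevant_docs:
        tf.update(doc)               # raw frequency inside R
    scores = {item: tf[item] * math.log(n_docs / df[item]) for item in tf}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy collection (hypothetical); the first two documents play the role of R.
docs = [
    ["president", "resign", "company"],
    ["president", "succeed", "company"],
    ["company", "stock", "rise"],
    ["weather", "rain", "company"],
    ["rain", "wind"],
]
for item, score in rank_by_tfidf(docs[:2], docs):
    print(f"{item}\t{score:.3f}")
```

Items that are frequent in R but rare elsewhere ("president") rise to the top, while items that are common across the collection ("company") sink, which is the ordering the later bootstrapping step relies on.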

Example of TF/IDF scoring (Management Succession: Business)
– 300 documents retrieved from WSJ (7/94 - 8/94)
– Extracted by MINIPAR (Lin 1998)

Overview
[Diagram: Semantic Clustering takes a Scenario Description and produces a Semantic Cluster; components: Source, Information Retrieval, Bootstrapping, Query Expansion; labels: extraction patterns, lexicon]

Bootstrapping
Assumption:
– Patterns provide Lexical Classes.
– Lexicon provides contextual information.
(Riloff and Jones 1999; Agichtein and Gravano 2000)
Find one cluster that consists of Lexicon and Extraction Patterns

Bootstrapping (Cont.)
Algorithm (cf. Riloff and Jones 1999)
– Given
  – the ordered list of terms
  – the ordered list of extraction patterns
  – Lexicon = (), Pattern = ()
– w ← the most relevant term in the list; add it into Lexicon
1. p ← the most relevant pattern among those that extract w
2. Add p into Pattern
3. w ← the most relevant term among those that are extracted by p
4. Add w into Lexicon
5. Go to 1
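The loop above can be sketched in a few lines. Two details are not stated on the slide and are assumptions here: items already selected are skipped (otherwise step 1 would pick the same pattern forever), and the loop stops when no new candidate is found or after a hypothetical `max_iters` limit. The `extracts` mapping (pattern to the set of terms it extracts) and the toy data are also hypothetical.

```python
def bootstrap(ranked_terms, ranked_patterns, extracts, max_iters=10):
    """Alternately grow a lexicon and a pattern set.

    ranked_terms / ranked_patterns: lists ordered by relevance, best first.
    extracts: dict mapping each pattern to the set of terms it extracts.
    Skipping already-chosen items and the stopping rule are assumptions.
    """
    lexicon, patterns = [], []
    if not ranked_terms:
        return lexicon, patterns
    w = ranked_terms[0]                      # most relevant term seeds the lexicon
    lexicon.append(w)
    for _ in range(max_iters):
        # Steps 1-2: most relevant unused pattern among those that extract w.
        p = next((q for q in ranked_patterns
                  if q not in patterns and w in extracts.get(q, ())), None)
        if p is None:
            break
        patterns.append(p)
        # Steps 3-4: most relevant new term among those extracted by p.
        w_next = next((t for t in ranked_terms
                       if t not in lexicon and t in extracts[p]), None)
        if w_next is None:
            break
        w = w_next
        lexicon.append(w)
    return lexicon, patterns

# Toy input (hypothetical), loosely in the management succession domain.
terms = ["president", "chairman", "executive", "company"]
pats = ["x resign", "x succeed y", "x say"]
extracts = {
    "x resign": {"president", "chairman"},
    "x succeed y": {"chairman", "executive"},
    "x say": {"company", "president"},
}
print(bootstrap(terms, pats, extracts))
```

On this toy input the lexicon grows from "president" to "chairman" to "executive" through the patterns "x resign" and "x succeed y", and the loop stops when no unused pattern extracts the latest term.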

Example of Bootstrapping (Management Succession: Business)
– From WSJ (7/94 - 8/94)
– Extracted by MINIPAR (Lin 1998)

Problem: Polysemous Lexicon, Pattern
Lexicon can be ambiguous
– e.g. Clinton (Person, Organization, Location …)
Extraction patterns can be ambiguous
– e.g. be killed in (x: Location, Date …)
Needs more study
– more restriction
– Probabilistic Model ??

Overview
[Diagram: Semantic Clustering takes a Scenario Description and produces a Semantic Cluster; components: Source, Information Retrieval, Bootstrapping, Query Expansion; labels: pattern, lexicon, pt, lex]

Query Expansion
Generalize terms in a query with a newly discovered cluster
– cf. Rocchio 1971 (Vector model)
– Zhai and Lafferty 2001 (Language-modeling)
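One way to picture the vector-model variant is a Rocchio-style update restricted to positive feedback: the new query is alpha times the old query plus beta times the normalized cluster vector. This is a simplified sketch, not the method of the proposal; treating the cluster lexicon as a single uniformly weighted pseudo-document, the weights `alpha` and `beta`, and the function name `expand_query` are all assumptions.

```python
from collections import defaultdict

def expand_query(query, cluster_terms, alpha=1.0, beta=0.5):
    """Rocchio-style expansion with positive feedback only.

    query: dict mapping term -> weight.
    cluster_terms: terms of a newly discovered cluster, treated here as one
    pseudo-document with uniform weights (an assumption of this sketch).
    """
    expanded = defaultdict(float)
    for term, weight in query.items():
        expanded[term] += alpha * weight
    if cluster_terms:
        share = beta / len(cluster_terms)    # spread beta evenly over the cluster
        for term in cluster_terms:
            expanded[term] += share
    return dict(expanded)

# Hypothetical query and cluster.
print(expand_query({"terrorism": 1.0, "bombing": 1.0}, ["bombing", "attack"]))
```

Terms already in the query are reinforced and new cluster terms enter with a smaller weight, so the next retrieval round is pulled toward documents that mention the cluster.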

Application to Information Extraction
[Diagram: Semantic Clustering takes a Scenario Description and produces a Semantic Cluster (Semantic Lexicon, Extraction Patterns); IE components: Preprocessing, Entity Recognition, Event Recognition, Role Assignment, Merging; Pattern Matching]

Human Intervention
Extraction patterns
– Event pattern
  – Context contains a verb or nominalization of a verb
  – Used for event extraction and role assignment
  – e.g. (terrorist, fire, x)
– Local pattern
  – Context contains only enough information to recognize semantic class
  – Used for entity recognition only
  – e.g. (x, Inc.)
Association of Event Pattern to Role
– e.g. (company, hire, x) → PersonIn and (company, fire, x) → PersonOut
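The association of event patterns to roles could be stored as a small lookup table that a person fills in by hand. The sketch below uses the two examples from the slide; the table name, the function `assign_role`, and the representation of a match as (subject class, verb, filler) are hypothetical.

```python
# Hand-written association of event patterns to template roles
# (the two entries are the examples given on the slide).
EVENT_PATTERN_ROLES = {
    ("company", "hire", "x"): "PersonIn",
    ("company", "fire", "x"): "PersonOut",
}

def assign_role(subject_class, verb, filler):
    """Return (role, filler) if an event pattern matches, otherwise None."""
    role = EVENT_PATTERN_ROLES.get((subject_class, verb, "x"))
    return (role, filler) if role is not None else None

print(assign_role("company", "hire", "John Smith"))
print(assign_role("company", "promote", "John Smith"))
```

Local patterns would not appear in such a table, since they are used for entity recognition only.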

Discussion
Domain Portability
– User only needs to specify the scenario
Language Portability
– Language-dependent Tools
  – Segmentation (Lemmatization)
  – Dependency Parsing

Evaluation
MUC-style (Scenario-Template task)
– Slot-based Precision, Recall, F-measure
– Domain Portability
  – Several pre-defined tasks that differ in difficulty
– Language Portability
  – Japanese, English
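For orientation, slot-based scores can be computed as below. This is a simplified sketch that counts exact matches only; actual MUC scoring also handles partial matches and template alignment, which are omitted here. The function name `slot_scores`, the tuple layout (document id, slot, filler), and the example system output are hypothetical.

```python
def slot_scores(predicted, gold):
    """Exact-match slot precision, recall, and balanced F-measure.

    predicted, gold: sets of (doc_id, slot, filler) tuples.
    """
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Gold fillers follow the terrorism example earlier in the talk;
# the system output is invented for illustration.
gold = {
    ("d1", "Perpetrator", "Masked gunmen"),
    ("d1", "Weapon", "Kalashnikov rifles"),
    ("d1", "Victim", "six people"),
    ("d1", "Location", "a Christian school"),
}
predicted = {
    ("d1", "Perpetrator", "Masked gunmen"),
    ("d1", "Weapon", "Kalashnikov rifles"),
    ("d1", "Victim", "Western interests"),
    ("d1", "Location", "Pakistan"),
}
print(slot_scores(predicted, gold))
```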

Contribution
– Tool for Domain Analysis
– Low-cost Knowledge-base Acquisition
– Towards Open-domain Information Extraction

Conclusion
– Proposed New Approach for Knowledge-base Acquisition (Semantic Clustering)
– Discussed Application of Acquired KB to Information Extraction (Human Intervention and Local vs. Event patterns)
– Discussed Evaluation with several predefined MUC-style tasks different in difficulty and across languages (Domain portability and Language portability)

ToDo
– Implementation
– Preparation for Evaluation
– Evaluation

Time for Questions