Automatically Constructing a Dictionary for Information Extraction Tasks Ellen Riloff Proceedings of the 11 th National Conference on Artificial Intelligence,

Slides:



Advertisements
Similar presentations
Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2007) Learning for Semantic Parsing Advisor: Hsin-His.
Advertisements

1 Relational Learning of Pattern-Match Rules for Information Extraction Presentation by Tim Chartrand of A paper bypaper Mary Elaine Califf and Raymond.
Information extraction from text Spring 2003, Part 2 Helena Ahonen-Myka.
Plain Text Information Extraction (based on Machine Learning ) Chia-Hui Chang Department of Computer Science & Information Engineering National Central.
NYU ANLP-00 1 Automatic Discovery of Scenario-Level Patterns for Information Extraction Roman Yangarber Ralph Grishman Pasi Tapanainen Silja Huttunen.
Comparing Methods to Improve Information Extraction System using Subjectivity Analysis Prepared by: Heena Waghwani Guided by: Dr. M. B. Chandak.
Evaluation of representations in AI problem solving Eugene Fink.
Research Basics PE 357. What is Research? Can be diverse General definition is “finding answers to questions in an organized and logical and systematic.
Predicting Text Quality for Scientific Articles AAAI/SIGART-11 Doctoral Consortium Annie Louis : Louis A. and Nenkova A Automatically.
Information Extraction CS 652 Information Extraction and Integration.
Annotation Free Information Extraction Chia-Hui Chang Department of Computer Science & Information Engineering National Central University
CS4705.  Idea: ‘extract’ or tag particular types of information from arbitrary text or transcribed speech.
Aki Hecht Seminar in Databases (236826) January 2009
Information Extraction CS 652 Information Extraction and Integration.
CIS392Semester Projects1 CIS392 Text Processing, Retrieval, and Mining Overview of Semester Projects.
An Overview of Text Mining Rebecca Hwa 4/25/2002 References M. Hearst, “Untangling Text Data Mining,” in the Proceedings of the 37 th Annual Meeting of.
Inducing Information Extraction Systems for New Languages via Cross-Language Projection Ellen Riloff University of Utah Charles Schafer, David Yarowksy.
1 Noun Homograph Disambiguation Using Local Context in Large Text Corpora Marti A. Hearst Presented by: Heng Ji Mar. 29, 2004.
Empirical Methods in Information Extraction - Claire Cardie 자연어처리연구실 한 경 수
A New Web Semantic Annotator Enabling A Machine Understandable Web BYU Spring Research Conference 2005 Yihong Ding Sponsored by NSF.
Automatic Acquisition of Lexical Classes and Extraction Patterns for Information Extraction Kiyoshi Sudo Ph.D. Research Proposal New York University Committee:
Text Mining: Finding Nuggets in Mountains of Textual Data Jochen Dörre, Peter Gerstl, and Roland Seiffert Presented By: Jake Happs,
Methodology Conceptual Database Design
Enhance legal retrieval applications with an automatically induced knowledge base Ka Kan Lo.
Artificial Intelligence Research Centre Program Systems Institute Russian Academy of Science Pereslavl-Zalessky Russia.
Intelligent Tutoring Systems Traditional CAI Fully specified presentation text Canned questions and associated answers Lack the ability to adapt to students.
Semantic Web Technologies Lecture # 2 Faculty of Computer Science, IBA.
Environment Change Information Request Change Definition has subtype of Business Case based upon ConceptPopulation Gives context for Statistical Program.
Information Extraction Sources: Sarawagi, S. (2008). Information extraction. Foundations and Trends in Databases, 1(3), 261–377. Hobbs, J. R., & Riloff,
Chapter 6 System Engineering - Computer-based system - System engineering process - “Business process” engineering - Product engineering (Source: Pressman,
Information Extraction
Processing of large document collections Part 10 (Information extraction: learning extraction patterns) Helena Ahonen-Myka Spring 2005.
Learning Information Extraction Patterns Using WordNet Mark Stevenson and Mark A. Greenwood Natural Language Processing Group University of Sheffield,
Empirical Methods in Information Extraction Claire Cardie Appeared in AI Magazine, 18:4, Summarized by Seong-Bae Park.
VTT-STUK assessment method for safety evaluation of safety-critical computer based systems - application in BE-SECBS project.
Applying Belief Change to Ontology Evolution PhD Student Computer Science Department University of Crete Giorgos Flouris Research Assistant.
GLOSSARY COMPILATION Alex Kotov (akotov2) Hanna Zhong (hzhong) Hoa Nguyen (hnguyen4) Zhenyu Yang (zyang2)
A Semantic Approach to IE Pattern Induction Mark Stevenson and Mark Greenwood Natural Language Processing Group University of Sheffield, UK.
BMAN Integrative Team Project Week 2 Professor Linda A Macaulay.
Intelligent Database Systems Lab Presenter: WU, JHEN-WEI Authors: Rodrigo RizziStarr, Jose´ Maria Parente de Oliveira IS Concept maps as the first.
Ontology-Based Information Extraction: Current Approaches.
CSA2050: Introduction to Computational Linguistics Part of Speech (POS) Tagging II Transformation Based Tagging Brill (1995)
Dimitrios Skoutas Alkis Simitsis
A S URVEY ON I NFORMATION E XTRACTION FROM D OCUMENTS U SING S TRUCTURES OF S ENTENCES Chikayama Taura Lab. M1 Mitsuharu Kurita 1.
METADATA WORKSHOP Conclusions Keith Jeffery Peter Wittenburg.
Mining Topic-Specific Concepts and Definitions on the Web Bing Liu, etc KDD03 CS591CXZ CS591CXZ Web mining: Lexical relationship mining.
A Semantic Approach to IE Pattern Induction Mark Stevenson and Mark A. Greenwood Natural Language Processing Group University of Sheffield, UK.
Bootstrapping for Text Learning Tasks Ramya Nagarajan AIML Seminar March 6, 2001.
Benchmarking ontology-based annotation tools for the Semantic Web Diana Maynard University of Sheffield, UK.
Artificial Intelligence Research Center Pereslavl-Zalessky, Russia Program Systems Institute, RAS.
THE SUPPORTING ROLE OF ONTOLOGY IN A SIMULATION SYSTEM FOR COUNTERMEASURE EVALUATION Nelia Lombard DPSS, CSIR.
LOGO 1 Corroborate and Learn Facts from the Web Advisor : Dr. Koh Jia-Ling Speaker : Tu Yi-Lang Date : Shubin Zhao, Jonathan Betz (KDD '07 )
Supertagging CMSC Natural Language Processing January 31, 2006.
Quick Write Reflection How will you implement the Engineering Design Process with your students in your classes?
CS 4705 Lecture 17 Semantic Analysis: Robust Semantics.
Intelligent Database Systems Lab N.Y.U.S.T. I. M. 1 Mining knowledge from natural language texts using fuzzy associated concept mapping Presenter : Wu,
1 Learning through Interactive Behavior Specifications Tolga Konik CSLI, Stanford University Douglas Pearson Three Penny Software John Laird University.
Write a function rule for a graph EXAMPLE 3 Write a rule for the function represented by the graph. Identify the domain and the range of the function.
Processing of large document collections Part 9 (Information extraction: learning extraction patterns) Helena Ahonen-Myka Spring 2006.
An evolutionary approach for improving the quality of automatic summaries Constantin Orasan Research Group in Computational Linguistics School of Humanities,
CSA2050: Introduction to Computational Linguistics Part of Speech (POS) Tagging II Transformation Based Tagging Brill (1995)
Understanding Naturally Conveyed Explanations of Device Behavior Michael Oltmans and Randall Davis MIT Artificial Intelligence Lab.
Shock Progress & Direction. MetaMap Tokenized words for Mohammed – Enables him to test his new models for Pattern matcher Mallet Training Data for Laura.
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
Semantic Wiki: Automating the Read, Write, and Reporting functions Chuck Rehberg, Semantic Insights.
MDD-Kurs / MDA Cortex Brainware Consulting & Training GmbH Copyright © 2007 Cortex Brainware GmbH Bild 1Ver.: 1.0 How does intelligent functionality implemented.
Chapter 2 Bring systems into being April Aims of this Lecture To explain what is “System Life-Cycle” To understand the systems engineering process.
Social Knowledge Mining
Plain Text Information Extraction (based on Machine Learning)
Precise Condition Synthesis for Program Repair
Presentation transcript:

Automatically Constructing a Dictionary for Information Extraction Tasks Ellen Riloff Proceedings of the 11 th National Conference on Artificial Intelligence, 1993.

Overview Problem: Domain specific dictionary Portability Practicality Solution: AutoSlog Automatic Training Approach Representative corpus Terrorist event descriptions

Information Extraction Selective Concept Extraction CIRCUS: Sentence Analyzer Domain-specific Dictionary Concept node definition Enabling conditions Slots (e.g., victims, perpetrators, …)  Syntactic expectation  Hard and soft constraints

Extraction Procedure Sentence CIRCUS Concept Node Dictionary Concept Nodes

Observations News reports follow certain stylistic conventions. Assumption: the first reference to a targeted piece of information is most likely where the relationship between that information and the event is made explicit. The immediate linguistic context surrounding the targeted information usually contains the words or phrases that describe its role in the events.

Conceptual Anchor Points A conceptual anchor point is a word that should activate a concept CIRCUS: trigger word 13 Heuristics

Automated Dictionary Construction Input A set of training text Answer keys Output: a set of concept node definitions Procedure: how to generate a concept node definition?

A Good Concept Node Def. passive-verb

Domain Specifications A set of mappings from template slots to concept node slots Hard and soft constraints for each type of concept node slot A set of mappings from template types to concept node types

Another Good Concept Node Def.

A Bad Concept Node Def.

Reasons of Bad Definitions When a sentence contains the targeted string but does not describe the event. When a heuristic proposes the wrong conceptual anchor point. When CIRCUS incorrectly analyzes the sentence.

Evaluation Terrorist Event Description MUC-4 Corpus 1500 texts 1258 answer keys A human (5 hours) Keeps: 450 edits 4280 string fillers 1237 concept node definitions

Comparative Results System/ Test Set RecallPrecisionF-measure MUC-4/TST AutoSlog/TST MUC-4/TST AutoSlog/TST

Conclusions Achieve Above 90% performance of the hand-crafted dictionary (1500 person hours) Improve performance Create dictionary from scratch No explicit domain theory is needed Need an annotated training corpus Make progress for scalable and portable IE systems