International Technology Alliance in Network & Information Sciences Using the English Resource Grammar to extend fact extraction capabilities v1.1 David.

Presentation transcript:

International Technology Alliance in Network & Information Sciences
Using the English Resource Grammar to extend fact extraction capabilities v1.1
David Mott, IBM UK
Stephen Poteet, Anne Kao, Ping Xue, Boeing Research & Technology
Ann Copestake, University of Cambridge
ITA Fall Meeting October 2013

Research Objectives
 Extraction of facts in Controlled English (CE) from Natural Language (NL) documents
   express the document in a formal but still readable way
   extracted facts can be used to infer new information
 Facilitate configuration of NL processing tools in CE
   human analyst can be more involved in the NL processing
   a common model of linguistics, grammar, and semantics
 Provide rationale for linguistic and analytic processing
   human can better understand and review the reasoning
   facilitate evaluation of the quality of the reasoning
We are not tasked with creating fundamental breakthroughs in the theory of NL processing

Supporting the analyst
[Diagram labels: doc27, Reference data, Structured data, Linked data web, other data, NLP, CE Tools, CE Facts, Conceptual Model, Inference, Rationale, Argumentation, Assumptions, Uncertainty, Query, Requirements, Product, Analysts]
The analyst does not have time to read all the reports

Working Scenario
 Imagine you are an analyst in a team, being asked to provide high value information about events on the ground
 Based upon reports and background reference material:
   You want to extract basic facts from these reports and to infer new information
   You want to have “new ideas” and implement them quickly without IT involvement
   You want to understand and review the collaborative reasoning of the team, whose members may have differing skills

Example report:
02/03/10 - ET: 0855hrs -- Cell phone call from unidentified male ( ) in Bayaa to an unidentified male ( ) in Saydiyah //MGRSCOOR: 38S MB 37 77//. The caller stated: “I will need new carpet for my house.” The receiver asked: “How big is the house?” The reply was: “I have a large family.” The receiver said, “I will see what I can do.” The call lasted 15 seconds

Source: SYNCOIN simulated reports. Graham, Rimland, & Hall (2011). A COIN-inspired Synthetic Dataset for Qualitative Evaluation of Hard and Soft Fusion Systems. Proc. 14th International Conference on Information Fusion, Chicago, IL.

The state of the BPP11 research
 We are using CE
   as the target language for expressing facts
   as the shared model of the concepts being expressed
   as the language to configure NL systems
    - Detecting structures in phrases
    - Mapping language expressions to concepts
   as the way to reveal reasoning performed by a collaborative team
[Diagram labels: Text, Phrase structures, Generic Semantics, Domain Semantics, Facts, Analysts Reasoning, High Value Facts, Controlled English]

Motivation for using DELPH-IN linguistics
 Collaborate with the DELPH-IN consortium, to extend our NL and fact extraction capabilities
   ERG is a high-coverage, high-precision English grammar, developed over 20 years
   MRS is the representation of semantics
   PET parser is an efficient parser
 Explore Controlled English as a possible facilitator for the use of DELPH-IN linguistic resources
 Provide opportunity to research into deeper semantic processing
   contribute to the NL research community
[Diagram labels: Typed Feature Structures; English Resource Grammar, Stanford; Linguistic Knowledge Builder, Cambridge; PET parser; Minimal Recursion Semantics, Cambridge; Japanese, German, Norwegian, Thai, Chinese, Spanish, ...; Translation]

Integrating CE and the ERG
 Use ERG (and PET) to parse sentences and provide phrase structures
 Use MRS to express generic semantics
 Represent domain semantics in MRS, by extending generic semantics
 Research into the integration of domain semantics and linguistic processing
[Diagram labels: Text, Phrase structures (ERG), Generic Semantics (MRS), Domain Semantics (?), Facts, Analyst’s Reasoning, High Value Facts, Controlled English]

Raw ERG system output
[Figure: PARSE TREE (syntax) and MRS (semantics)]
We will turn this into CE

Defining the ERG lexicon in CE
 Transformation between the ERG structures (Typed Feature Structures) and CE

ERG lexical entry:
checkpoint_n1 := n_-_c_le &
  [ ORTH < "checkpoint" >,
    SYNSEM [ LKEYS.KEYREL.PRED "_checkpoint_n_1_rel",
             PHON.ONSET con ] ].

CE equivalent:
there is a count noun named checkpoint_n1 that is written as the word |checkpoint| and is a form of the noun sense ‘_checkpoint_n_1_rel’.

Is this easier to understand?

 Mapping between generic semantics and specific semantics
the noun sense ‘_checkpoint_n_1_rel’ expresses the entity concept ‘checkpoint’.
the noun sense ‘_carpet_n_1_rel’ expresses the entity concept ‘carpet’.

The user has to define this link
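The lexicon transformation described on this slide could be sketched as follows. This is only an illustration in Python, not the project's actual Prolog generator: the `LexicalEntry` class and both function names are hypothetical, and the sketch assumes every entry is a count noun (the `n_-_c_le` type in the example).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalEntry:
    """Simplified, hypothetical view of one ERG lexicon entry."""
    name: str       # entry identifier, e.g. "checkpoint_n1"
    word: str       # orthography, e.g. "checkpoint"
    predicate: str  # MRS predicate, e.g. "_checkpoint_n_1_rel"


def lexicon_sentence(entry: LexicalEntry) -> str:
    """Render the entry as a CE sentence in the style shown on the slide.

    Assumption: the entry is a count noun.
    """
    return (
        f"there is a count noun named {entry.name} "
        f"that is written as the word |{entry.word}| "
        f"and is a form of the noun sense '{entry.predicate}'."
    )


def concept_link_sentence(predicate: str, concept: str) -> str:
    """Render the user-defined link from a noun sense to a domain concept."""
    return f"the noun sense '{predicate}' expresses the entity concept '{concept}'."


entry = LexicalEntry("checkpoint_n1", "checkpoint", "_checkpoint_n_1_rel")
print(lexicon_sentence(entry))
print(concept_link_sentence(entry.predicate, "checkpoint"))
```

Only the first sentence can be generated mechanically from the grammar; the second encodes the link that the slide says the user has to supply.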

Defining ERG grammar rules in CE
Subcomponents of the phrase are the “head daughter” followed by the “non-head daughter”

ERG rule:
basic_head_initial := basic_binary_headed_phrase &
  [ HD-DTR #head,
    NH-DTR #non-head,
    ARGS < #head, #non-head > ].

CE equivalent:
there is a linguistic frame named f1 that defines the basic-head-initial PH and has the sequence ( the sign A0, and the sign A1 ) as subcomponents and has the statement that ( the basic-head-initial PH has the sign A0 as HD-DTR and has the sign A1 as NH-DTR ) as semantics.

[Diagram labels: a basic-head-initial; ARGS: a list (0TH: a sign, 1ST: a sign); HD-DTR: a thing; NH-DTR: a thing]

Three stage approach to defining MRS in CE
1. Generate raw representation of:
   elementary predications (EPs) as objects with predicate and arguments
   scope information between EPs
   features of the entities involved
2. Extract intermediate, but generic, concepts describing the raw MRS:
   patterns of quantification
   …
3. Transform into domain specific CE concepts
   using links between the predicate and the CE concept.
   …
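As an illustration of stage 1, an elementary predication can be held as a small object with a predicate and ordered arguments and then rendered as a raw CE sentence. This is a minimal Python sketch under stated assumptions: the class name, field names and the `to_ce` method are hypothetical, and scope information and entity features are left out.

```python
from dataclasses import dataclass
from typing import List

# CE wording for argument positions; position 0 is the "zeroth argument".
ORDINALS = ["zeroth", "first", "second", "third"]


@dataclass
class ElementaryPredication:
    """Hypothetical raw representation of one MRS elementary predication."""
    ident: str            # e.g. "#ep7_5"
    predicate: str        # e.g. "_carpet_n_1_rel"
    arguments: List[str]  # e.g. ["x9_8"], ordered by argument position

    def to_ce(self) -> str:
        """Render this EP as a raw CE sentence (up to four arguments)."""
        parts = [
            f"the mrs elementary predication {self.ident} "
            f"is an instance of the mrs predicate '{self.predicate}'"
        ]
        for position, arg in enumerate(self.arguments):
            parts.append(f"has the thing {arg} as {ORDINALS[position]} argument")
        return " and ".join(parts) + "."


ep = ElementaryPredication("#ep7_5", "_carpet_n_1_rel", ["x9_8"])
print(ep.to_ce())
```

Keeping stage 1 this literal leaves all interpretation to the later stages, where CE rules operate on the raw sentences.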

Step 1 - CE version of raw MRS
[Figure annotations: x5 – “I”; x9 – “new carpet”; x5 “needs” x9]
Still needs to be turned into more understandable concepts …

3 Steps to Domain Semantics

Raw:
the mrs elementary predication #ep7_3 is an instance of the mrs predicate ‘_udef_q_rel’ and has the thing x9_8 as zeroth argument.
the mrs elementary predication #ep7_5 is an instance of the mrs predicate ‘_carpet_n_1_rel’ and has the thing x9_8 as zeroth argument.
the mrs elementary predication #ep7_3 equals modulo quantifiers the mrs elementary predication #ep7_5.

Intermediate:
there is an indefinite quantification named q2 that is on the thing x9_8 and has the mrs predicate ‘_carpet_n_1_rel’ as sense.

Domain:
the mrs predicate ‘_carpet_n_1_rel’ expresses the entity concept ‘carpet’.
the thing x9_8 is a carpet.

Rule to detect quantifier pattern in MRS:
if ( there is an indefinite quantification Q that is on the thing T and has the mrs predicate MRS as sense ) and ( the mrs predicate MRS expresses the entity concept EC ) then ( the thing T is an EC ).
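The quantifier rule on this slide can be imitated outside CE to show how the raw, intermediate and domain facts relate. The Python sketch below is an assumption-laden illustration, not the project's rule engine: the function names are hypothetical, EPs are reduced to (identifier, predicate, zeroth argument) triples, and only ‘_udef_q_rel’ is treated as signalling an indefinite quantification.

```python
from typing import Dict, List, Tuple

# (identifier, predicate, zeroth argument), mirroring the raw CE sentences
RawEP = Tuple[str, str, str]

# Assumption: only this predicate marks an indefinite quantification.
INDEFINITE_QUANTIFIERS = {"_udef_q_rel"}


def find_quantifications(eps: List[RawEP]) -> List[Tuple[str, str]]:
    """Intermediate step: pair each quantifier EP with the non-quantifier
    EP that shares its zeroth argument, giving (thing, sense predicate)."""
    found = []
    for _, q_pred, q_thing in eps:
        if q_pred not in INDEFINITE_QUANTIFIERS:
            continue
        for _, pred, thing in eps:
            if thing == q_thing and pred not in INDEFINITE_QUANTIFIERS:
                found.append((thing, pred))
    return found


def infer_categories(quantifications: List[Tuple[str, str]],
                     expresses: Dict[str, str]) -> List[str]:
    """Domain step: if the sense predicate expresses an entity concept,
    conclude that the quantified thing is an instance of that concept."""
    facts = []
    for thing, predicate in quantifications:
        concept = expresses.get(predicate)
        if concept:
            article = "an" if concept[0].lower() in "aeiou" else "a"
            facts.append(f"the thing {thing} is {article} {concept}.")
    return facts


raw = [("#ep7_3", "_udef_q_rel", "x9_8"), ("#ep7_5", "_carpet_n_1_rel", "x9_8")]
links = {"_carpet_n_1_rel": "carpet"}
print(infer_categories(find_quantifications(raw), links))
# → ['the thing x9_8 is a carpet.']
```

If no link from the predicate to a concept has been defined, no domain fact is produced, which matches the slide's point that the link must be supplied by the user.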

Facts extracted from example sentence

02/03/10 - ET: 0855hrs -- Cell phone call from unidentified male ( ) in Bayaa to an unidentified male ( ) in Saydiyah //MGRSCOOR: 38S MB 37 77//. The caller stated: “I will need new carpet for my house.” The receiver asked: “How big is the house?” The reply was: “I have a large family.” The receiver said, “I will see what I can do.” The call lasted 15 seconds

If other reports can add to information on the man x5_8 then we may know who is requiring new carpets, and could predict future events?
This requires a number of linguistic and domain specific steps

Discussion
 The DELPH-IN community has developed excellent Natural Language capabilities
 We are integrating the “ERG system” and expressing lexicon, grammar rules and semantics in CE
 However, in the ERG system the semantics are not completely separated from the linguistic structures
   we propose intermediate semantic structures in CE, for bridging the gap between generic and domain semantics
 We are introducing domain semantics to represent facts in CE
   provides a “target” for output of the ERG system
   opportunity to explore how this can affect parsing of sentences
 Much needs to be done
   improve integration
   extend intermediate MRS
   obtain rationale
   feedback of semantic reasoning into the parsing
   mechanisms to help adding/understanding of rules

Extra

Information Flow
[Diagram labels: ERG rules & types, ERG lexicon, CE lexicon, CE linguistic frames, Conceptual model, Text, PET parser, PET parse tree, Parse tree as CE, MRS, Raw MRS as CE, Stanford Parser, shallow processing, CE facts]
Use same transformation to be consistent
Red links have been partially implemented

Rationale
“the group of things x10 has the entity concept survey as categorisation.”
 The rationale from the elementary predicates is:
 How do we get the rationale FOR the elementary predicates?
   could follow the parse tree + the TFS definitions, but we need a link between parse tree and MRS, which is so far not available

A layered Conceptual Model
Meta Model. Concepts: Concept, Entity Concept, Relation Concept, Conceptual Model. Relations: belongs to, has as domain.
Semiotics. Concepts: Thing, Meaning, Symbol. Relations: stands for, expresses.
General Semantics. Concepts: Agent, Spatial Entity, Temporal Entity, Situation, Container. Relations: has as agent role, is contained in.
Linguistic. Concepts: Sentence, Phrase, Word, Noun, Fragment, Linguistic Frame. Relations: has as dependent, is parsed from, expresses.
Analysts Domain Model. Concepts: Place, Person, Village, Communication, IED, Facility, .... Relations: is located in, monitors.
Our Semiotic Triangle, based on [Ogden, C. K. and Richards, I. A. (1923). ]

The ERG system architecture
 PET is run under Linux (Debian) in an Oracle VirtualBox image
 A Prolog program provides a web service for parsing sentences and turning the result into CE
 Aiming to integrate with our CE Store
[Diagram labels: sentence; PET parser with ERG; parse tree and MRS; Prolog CE generator; Prolog web service; CE parse tree and MRS]

Feedback of domain reasoning to the parsing?
 We want the domain to affect the parse, e.g.:
   creating new lexical entries and grammar rules prior to parsing
 But we also want arbitrary domain reasoning to affect the parse at runtime
 Could this:
   rule out inconsistent parses
   provide disambiguations, and dialog context?
[Diagram labels: ERG/PET; DOMAIN REASONER; facts; constraints on linguistic phenomena; ERG; DOMAIN MODEL; lexical entries, grammar rules]
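One way the idea of ruling out inconsistent parses might look is sketched below in Python. Everything here is hypothetical and goes beyond what the slides describe as implemented: candidate parses are reduced to sets of (thing, concept) facts, the domain model is reduced to known facts plus pairs of mutually exclusive concepts, and a parse is discarded when it categorises a thing in a way that clashes with what is already known.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Fact = Tuple[str, str]  # (thing, concept), e.g. ("x5", "person")


def is_consistent(facts: Set[Fact], known: Set[Fact],
                  disjoint: Set[FrozenSet[str]]) -> bool:
    """Return False if any fact assigns a thing a concept that is declared
    disjoint with a concept the domain already knows for that thing."""
    for thing, concept in facts:
        for known_thing, known_concept in known:
            if thing == known_thing and frozenset((concept, known_concept)) in disjoint:
                return False
    return True


def filter_parses(parses: Dict[str, Set[Fact]], known: Set[Fact],
                  disjoint: Set[FrozenSet[str]]) -> List[str]:
    """Keep the names of candidate parses whose facts fit the domain."""
    return [name for name, facts in parses.items()
            if is_consistent(facts, known, disjoint)]


# Hypothetical example: the domain knows x5 is a person, and a person
# cannot also be a place.
known = {("x5", "person")}
disjoint = {frozenset(("person", "place"))}
parses = {
    "parse1": {("x5", "person"), ("x9", "carpet")},
    "parse2": {("x5", "place"), ("x9", "carpet")},
}
print(filter_parses(parses, known, disjoint))
# → ['parse1']
```

This filters parses after the event; affecting the parser at runtime, as the slide asks, would need the constraints to be available during parsing itself.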

Linking text to domain situations

Working out the “requirer”
 This can only be done by analysis of the communications as a whole (including anaphoric reference)

02/03/10 - ET: 0855hrs -- Cell phone call from unidentified male ( ) in Bayaa to an unidentified male ( ) in Saydiyah //MGRSCOOR: 38S MB 37 77//. The caller stated: “I will need new carpet for my house.” The receiver asked: “How big is the house?” The reply was: “I have a large family.” The receiver said, “I will see what I can do.” The call lasted 15 seconds

[Figure annotations: STEP C, STEP A]
Step C needs knowledge of the structure of the report and of communications
Step A needs linguistic knowledge

Example CE rules

DOMAIN RULE:
if ( the communication C has the agent A as initiator ) and ( the agent A is located in the place P ) then ( the communication C is from the place P ).

LINGUISTIC RULE:
if ( the mrs elementary predication EP is an instance of the mrs predicate ‘_in_p_rel’ and has the thing T as first argument and has the thing C as second argument ) then ( the thing T is contained in the container C ).
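For readers more used to code than CE, the two rules on this slide can be read as simple functions over facts. This Python sketch is only a paraphrase under stated assumptions: the function names and the tuple and dictionary shapes are hypothetical, and argument position 0 of an EP is taken to be its zeroth argument.

```python
from typing import Dict, List, Set, Tuple

# An EP reduced to (predicate, ordered arguments); index 0 is the zeroth argument.
EP = Tuple[str, List[str]]


def contained_in(eps: List[EP]) -> Set[Tuple[str, str]]:
    """Linguistic rule: an '_in_p_rel' EP with first argument T and second
    argument C yields the fact that T is contained in the container C."""
    return {
        (args[1], args[2])
        for predicate, args in eps
        if predicate == "_in_p_rel" and len(args) >= 3
    }


def communication_origins(initiators: Dict[str, str],
                          locations: Dict[str, str]) -> Dict[str, str]:
    """Domain rule: a communication is from the place where its initiating
    agent is located, when that location is known."""
    return {
        communication: locations[agent]
        for communication, agent in initiators.items()
        if agent in locations
    }


# Hypothetical identifiers, for illustration only.
print(contained_in([("_in_p_rel", ["e3", "x5", "x7"])]))
# → {('x5', 'x7')}
print(communication_origins({"call1": "caller1"}, {"caller1": "Bayaa"}))
# → {'call1': 'Bayaa'}
```

The linguistic rule works on MRS structures while the domain rule works on already extracted facts, which is why the slide labels them separately.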

Domain Situations
[Diagram labels: situations: a requirement, a production, a delivery, a usage; entities: an agent, the material; relations: has as material, is requested by, is requested from, is produced by, is delivered by, is delivered to, is performed by, needs]
Are these the same agent?

CE representation for parse tree

Defining ERG grammar rules in CE
Analysis of the rules for hd_cmp_u_c

basic_head_initial := basic_binary_headed_phrase &
  [ HD-DTR #head,
    NH-DTR #non-head,
    ARGS < #head, #non-head > ].

Ordered sequence of subcomponents: head daughter followed by non-head daughter

headed_phrase := phrase &
  [ SYNSEM.LOCAL [ CAT [ HEAD head & #head,
                         HC-LEX #hclex ],
                   AGR #agr,
                   CONJ #conj ],
    HD-DTR.SYNSEM.LOCAL local &
                 [ CAT [ HEAD #head,
                         HC-LEX #hclex ],
                   AGR #agr,
                   CONJ #conj ] ].

Some info is passed up from the head daughter to “this” phrase

Calling ERG system from Word