SWG Strategy (C) Copyright IBM Corp. 2006, 2011. All Rights Reserved. v1 ACITA 2011 demonstration of ongoing NLP work Dave Braines, David Mott, ETS, Hursley,

Presentation transcript:

SWG Strategy (C) Copyright IBM Corp. 2006, All Rights Reserved.
v1 ACITA 2011 demonstration of ongoing NLP work
Dave Braines, David Mott, ETS, Hursley, IBM UK
Steve Poteet, Boeing

SWG Strategy – Emerging Technology Services, Hursley (C) Copyright IBM Corp. 2006, All Rights Reserved.
2 Supporting the analyst
[Diagram labels: doc27, CE Facts, Inference, Rationale, Argumentation, Search, Analysts Conceptual Model, Assumptions, Uncertainty, CE Tools, NLP, Requirements, Product]

3 Controlled English
A Controlled Natural Language, being a subset of English
– limited syntax, but still readable as English
– meanings of the expressions unambiguously defined
Avoids the complexity of a real Natural Language
– computer systems can read, interpret and apply it
Retains the appearance of a real language
– humans can naturally use it, without learning "computer speak"
The analyst may use Controlled English to construct their Conceptual Model:
the person John is married to the person Jane and has red as hair colour.

4 Current NLP Research Objectives
Improve Natural Language Processing of facts from documents
– analyst may utilise more information when inferencing
Allow the humans to be part of the NL processing
– hybrid reasoning about ambiguities, incomplete parsing, etc.
Facilitate configuration of NLP tools in CE
Define a model of linguistics, grammar, semantics
Improve Expressibility of CE
– much interest, but needs a more powerful grammar
How is the Analyst's Conceptual Model related to Natural Language?

5 We have used CE to model: [5]
Collaborative Planning
Analysis of IED activities and societal influences
Matching Sensors to Missions
Provenance
Social Networks (Twitter)
UK Government data (crimes, accidents, schools)
NL processing itself

6 Our design principles for CE enhancement
Retain existing principles of a CE conceptual model
Based on full English grammar
Chart parser for efficient syntax parsing
Formal semantics, based upon scientific theory
Higher level extensions handled in same theory
Parser configurable in CE, based on linguistic model
Modelling of Sentence Context
Aim to significantly enhance CE expressivity

7 Parallel NL and CNL parsers
[Diagram labels: NL Parser, CNL Parser, lexicon, conceptual model, Reference English Grammar, Semantic Theory, expressive CE, basic CE or predicate logic, expressive CE, NLP]
Increase expressibility of CE
Better understanding of linguistics

8 Control of ambiguity
We start from basic CE and move towards full English
How do we handle crossing the ambiguity barrier?
[Diagram: an Ambiguity axis running from Basic CE, across the Ambiguity Barrier, to Full English; features along the way: anaphoric reference, sub clauses, prepositional phrases, flexible identities, verb inflections, domain specific syntax]

9 Stanford parser as reference
But only provides syntax, what about semantics?
there is a person named Joe.
[Diagram labels: Stanford, CE parser]

10 Extended CE Parser
Semantics (based on Full English Syntax)
[Parse tree for "there is a person named Joe": phrase nodes S, NP, VP; tagged words EX there, VBZ is, DT a, NN person, VBN named, NNP Joe]
[Semantic annotations on the tree: person(Joe); v(A), A=Joe, person(A); v(A), A=Joe; exists(A); v(A), person(A)]
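One way to read the annotations on this slide is that the equality A=Joe is substituted into person(A) to give person(Joe). The Python sketch below illustrates that reading only. The tuple encoding, the function name eliminate_equalities, and the treatment of v(A) as a variable declaration are assumptions made for illustration, not the authors' implementation.

```python
def eliminate_equalities(atoms):
    """Substitute 'X = value' bindings into the remaining atoms.

    Each atom is a tuple: ("v", "A") declares variable A (assumed meaning),
    ("=", "A", "Joe") binds A to Joe, and ("person", "A") is a predicate.
    """
    # Collect the bindings stated by equality atoms.
    bindings = {atom[1]: atom[2] for atom in atoms if atom[0] == "="}
    result = []
    for atom in atoms:
        if atom[0] == "=":
            continue  # the equality itself is consumed by the substitution
        if atom[0] == "v" and atom[1] in bindings:
            continue  # a bound variable no longer needs a declaration
        args = tuple(bindings.get(arg, arg) for arg in atom[1:])
        result.append((atom[0],) + args)
    return result


# "v(A), A=Joe, person(A)" reduces to "person(Joe)".
semantics = [("v", "A"), ("=", "A", "Joe"), ("person", "A")]
simplified = eliminate_equalities(semantics)
```

Atoms whose variables have no binding are left unchanged, so "v(A), person(A)" stays as it is.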

11 Linguistic Frame
there is a linguistic frame named vp0 that
  has 'is the dog Fido' as example and
  defines the verb phrase VP_vp0 and
  has the sequence ( the copula BE_vp0, and the noun phrase OBJ_vp0 ) as syntactic pattern and
  is predicated on the thing T and
  has the statement that ( the noun phrase OBJ_vp0 is predicated on the thing OBJ ) and ( the thing T is the same as the thing OBJ ) as semantic statement.
the word |is| belongs to the linguistic category 'copula'.
the word |dog| is a noun.
the entity concept ce:Dog is expressed by the word |dog| and has 'dog' as concept term.
[Diagram labels: semantics, syntax, copula, noun phrase, verb phrase, "is the dog fido", v(OBJ), dog(OBJ).. v(T) T=OBJ,..., Analyst's Conceptual Model, Linguistic Model]
Makes explicit a semantic theory

12 Allowing analyst to define how words express concepts
[Diagram labels: Analyst Helper, Conceptual Model, wordnet, itanet, Entity Extractor, Stanford parser, Document]
the concept C has the same meaning as the synset S.
the noun phrase NP has the word W as head/modifier and stands for the thing T.
the thing T is categorised as the concept C.

13 Mapping CE concepts to words via WordNet synsets
[Diagram labels: meaning, synset, concept, word sense, word; lexicographer, analyst; {tank, armoured combat vehicle}; armoured combat vehicle/1, tank/1; armoured combat vehicle, tank; meeting of minds]
the synset {tank, armoured combat vehicle} means the same as the concept tank.
conceptualise a ~ tank ~ T.
the synset {tank, armoured combat vehicle} has the word sense tank/1 as component.
the word |tank| expresses the concept tank.
The Analyst STILL has to decide the lexical relations, since only he knows what his concept is

14 CE rules to use WordNet to relate words to concepts
if
  ( the synset S means the same as the concept C ) and
  ( the synset S has the word sense WS as component ) and
  ( the word sense WS has the word W as word )
then
  ( the word W expresses the concept C )
Analyst provides the link between his meaning and a standard meaning
Now the parser can link words to concepts
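The rule on this slide joins three kinds of fact to derive a fourth. As a rough illustration, the join can be sketched in Python over plain triples. The triple encoding, the relation strings, and the function name words_expressing_concepts are hypothetical; the CE Store's actual rule engine is not described in the slides.

```python
# Facts as (subject, relation, object) triples. The relation names are
# illustrative stand-ins for the CE phrases used on the slide.
FACTS = {
    ("{tank, armoured combat vehicle}", "means the same as", "tank"),
    ("{tank, armoured combat vehicle}", "has word sense", "tank/1"),
    ("{tank, armoured combat vehicle}", "has word sense", "armoured combat vehicle/1"),
    ("tank/1", "has word", "tank"),
    ("armoured combat vehicle/1", "has word", "armoured combat vehicle"),
}


def words_expressing_concepts(facts):
    """Apply the slide's rule once: synset -> word sense -> word."""
    derived = set()
    for synset, rel, concept in facts:
        if rel != "means the same as":
            continue
        for subject, rel2, sense in facts:
            if subject != synset or rel2 != "has word sense":
                continue
            for subject3, rel3, word in facts:
                if subject3 == sense and rel3 == "has word":
                    derived.add((word, "expresses", concept))
    return derived


new_facts = words_expressing_concepts(FACTS)
```

With the facts above, both |tank| and |armoured combat vehicle| end up expressing the concept tank, which is the link the parser needs.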

15 Rationale for entity extraction
[Diagram: rationale graph linking the CE statements below to their sources]
the concept C has the same meaning as the synset S.
the noun phrase NP has the word W as head/modifier
the word sense WS adds meaning to the wordnet synset S.
the thing T is categorised as the concept C
the noun phrase NP stands for the thing T.
the word W expresses the concept C.
the word W expresses the word sense WS
the word sense WS adds meaning to the ita synset S.
the word W expresses the word sense WS
there is an ita synset named S.
[Source labels: Stanford Parser, wordnet, Document, Entity Extractor, Analyst Helper, Wordnet Inference, (General Semantics)]

16 Hierarchy of linguistic frames
[Diagram levels, each with semantics and syntax: predicate CE, linguistic CE, domain CE, specialist CE; Predicate Logic]
John attends a meeting with Jane.
the person John attends the meeting X.
the person Jane attends the meeting X.
there is a situation X that is categorised as the concept meeting and has the person John as agent role and has the person Jane as patient role.
the formula f3 has the statement that ( there is a meeting situation [123] that has the person Jane as patient role and has the person John as agent role ) as semantic expression

17 Combining Linguistic and Analytic Rationale
A fact extracted by a parser may lead to conclusions via the analyst's reasoning
– may include assumptions and uncertainty
The extraction of the fact may itself include assumptions and uncertainty
The total rationale graph of linguistic and analyst's reasoning shows all sources of uncertainty
– removing a linguistic assumption may lead to no support for the analyst's conclusions
Argumentation may need to occur at both the linguistic and analytic level
– but different skills (and people) needed for the different levels

18 CE Store and Agents
[Diagram labels: CE Store, pre-processing, Analysts Model, Documents, Reports, Analysis product, dialog context, grammar parsing1, semantic1, semantic2, semanticN, analysts inference, semantic models, Metadata structure, grammar parsing2, semantic3, Entities and relations, Lexicon/Grammar rules, Parses, Rules, Metadata structure]

19 Extractor/Anaphor Agent
[Diagram labels: CE Store, Analysts Model, Stanford Parser, Entity Extraction, Entities and "same as" relations, Parse Tree, Rules, SYNCOIN sentences, Anaphor Resolution, Java Agent, Linguistic Model, Analysts Model, Linguistically Identified, Linguistic Model]
Stanford Parser reads SYNCOIN data and generates parse trees
Anaphor/Extractor Agent reads parse information and uses rules + models to:
– turn noun phrases into entities ("market")
– link noun phrases that are anaphoric references ("he")

20 Sample Entity Extraction Rules in CE
if
  ( the noun phrase NP stands for the thing T and has the noun N as head ) and
  ( the noun N expresses the concept C )
then
  ( the thing T is categorised as the concept C ).
if
  ( the noun phrase NP stands for the thing T and has the adjective A as modifier ) and
  ( the adjective A expresses the concept C )
then
  ( the thing T is categorised as the concept C ).
if
  ( the noun phrase NP stands for the thing T and has the personal pronoun |he| as head )
then
  ( the thing T is a man ).
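The three rules on this slide can be approximated in a few lines of Python. This is a minimal sketch, not the Java agent described elsewhere in the deck: the NounPhrase class, the EXPRESSES table, and the concept names in it are hypothetical, and the sketch does not check part of speech (noun versus adjective) as the CE rules do.

```python
from dataclasses import dataclass, field


@dataclass
class NounPhrase:
    thing: str                      # identifier of the thing the NP stands for
    head: str                       # head word of the NP
    modifiers: list = field(default_factory=list)


# Hypothetical word-to-concept links, as the analyst might define them.
EXPRESSES = {"market": "market", "tank": "tank", "armoured": "armoured thing"}


def categorise(np, expresses):
    """Return (thing, concept) pairs derived from one noun phrase."""
    concepts = set()
    if np.head == "he":             # rule 3: pronoun |he| implies a man
        concepts.add("man")
    if np.head in expresses:        # rule 1: head word expresses a concept
        concepts.add(expresses[np.head])
    for modifier in np.modifiers:   # rule 2: modifier expresses a concept
        if modifier in expresses:
            concepts.add(expresses[modifier])
    return {(np.thing, concept) for concept in concepts}


tank_facts = categorise(NounPhrase("t1", "tank", ["armoured"]), EXPRESSES)
he_facts = categorise(NounPhrase("t2", "he"), EXPRESSES)
```

Each derived pair corresponds to a CE sentence of the form "the thing T is categorised as the concept C".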

21 Simplistic Anaphor Rules in CE
if
  ( the noun phrase NP has the personal pronoun PRP as head )
then
  ( the noun phrase NP is an anaphor ).
if
  ( the noun phrase NPA is an anaphor ) and
  ( the noun phrase NPA follows the noun phrase NP ) and
  ( the noun phrase NP stands for the man T ) and
  ( the noun phrase NPA stands for the man TA )
then
  ( the noun phrase NPA is coreferent with the noun phrase NP ).
if
  ( the noun phrase NP1 is coreferent with the noun phrase NP2 ) and
  ( the noun phrase NP1 stands for the thing T1 ) and
  ( the noun phrase NP2 stands for the thing T2 )
then
  ( the thing T1 is the same as the thing T2 ).
Needs many more rules, with selection constraints on the target NP
Needs to handle more categories
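To make the chain of these three rules concrete, here is a small Python sketch. The NP tuple, the PERSONAL_PRONOUNS set, and the reading of "follows" as "appears anywhere later in the text" are assumptions for illustration; the slides do not say how the agent implements them.

```python
from typing import NamedTuple


class NP(NamedTuple):
    name: str      # identifier of the noun phrase
    head: str      # head word
    thing: str     # identifier of the thing the NP stands for
    is_man: bool   # whether that thing is already categorised as a man


PERSONAL_PRONOUNS = {"he", "she", "it", "they"}  # illustrative list only


def resolve_anaphors(noun_phrases):
    """noun_phrases must be in document order. Returns (coreferent, same_as)."""
    coreferent, same_as = [], []
    for index, npa in enumerate(noun_phrases):
        if npa.head not in PERSONAL_PRONOUNS:   # rule 1: pronoun head => anaphor
            continue
        if not npa.is_man:
            continue
        for earlier in noun_phrases[:index]:    # rule 2: NPA follows NP, both men
            if earlier.is_man:
                coreferent.append((npa.name, earlier.name))
                same_as.append((npa.thing, earlier.thing))  # rule 3
    return coreferent, same_as


phrases = [
    NP("np1", "man", "t1", True),
    NP("np2", "market", "t2", False),
    NP("np3", "he", "t3", True),
]
links, identities = resolve_anaphors(phrases)
```

As written, an anaphor is linked to every earlier noun phrase that stands for a man, which is exactly the over-generation that selection constraints on the target NP would need to prune.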

22 Extended CE Parser Agent
[Diagram labels: CE Store, CE parser, CE semantics, semantic statement, Entities, Lexicon, SYNCOIN sentences, Grammar pattern, Linguistic Frame, mapping to concepts, Predicate Logic Model, SYNCOIN Model, Java Agent, Analysts Model]
CE Parser agent reads SYNCOIN data and runs simple CE linguistic frames
Agent extracts the "best" parse and turns it into low level CE
This is simple entity extraction when the noun phrase is at the start ("the man...")

23 Extended CE Parser
Chart Parser
– Phrase structure grammar, lexical categories, annotations
– lexicon of words, categories and syntactic features
Semantic processor
– Semantic representation and combination
[Diagram labels: lock-step, Parse Trees, Logical Representation, Documents, Reports, CE, mapping to concepts, semantic statement (1-1), syntactic pattern, linguistic frame, Linguistic Model, Analyst's Conceptual Model, Predicate Logic Model]
Mapping assumes a simple 1-1 word to concept relation
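For readers unfamiliar with chart parsing, the following Python sketch shows the basic idea with a CYK-style chart over a toy grammar. It is not the CE parser: the grammar, the lexicon, and the function name chart_parse are invented for illustration, and the grammar is restricted to binary rules to keep the sketch short.

```python
from collections import defaultdict

# Toy lexicon and binary phrase structure rules (hypothetical).
LEXICON = {"the": {"DT"}, "dog": {"NN"}, "is": {"VBZ"}, "fido": {"NP"}}
RULES = {
    ("NP", "VP"): {"S"},
    ("DT", "NN"): {"NP"},
    ("VBZ", "NP"): {"VP"},
}


def chart_parse(words):
    """Fill a chart mapping (start, end) spans to the categories found there."""
    n = len(words)
    chart = defaultdict(set)
    # Spans of length 1 come straight from the lexicon.
    for i, word in enumerate(words):
        chart[(i, i + 1)] |= LEXICON.get(word.lower(), set())
    # Longer spans combine two adjacent, already-filled spans.
    for length in range(2, n + 1):
        for start in range(0, n - length + 1):
            end = start + length
            for mid in range(start + 1, end):
                for left in chart[(start, mid)]:
                    for right in chart[(mid, end)]:
                        chart[(start, end)] |= RULES.get((left, right), set())
    return chart


sentence = "Fido is the dog".split()
chart = chart_parse(sentence)
is_sentence = "S" in chart[(0, len(sentence))]
```

The chart stores every sub-analysis once, which is why chart parsers stay efficient on ambiguous input; a semantic processor working in lock-step could attach a logical form to each chart entry as it is created.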

24 CE fact extraction framework
[Diagram labels:]
SYNCOIN Sentence as parsed by Stanford Parser + CE semantic extraction rules
SYNCOIN Sentence as parsed by CE Parser + CE semantic extraction rules
Basic syntactic parse tree information from Stanford Parser
Basic syntactic parse tree information from CE Parser
Semantic information more general than the ACM
Semantic information added from Analysts Conceptual Model
CE facts extracted from sentence

25 Applying rules to find entities

26 Prepositional phrase "in" as a container

27 Backup

28 Using WordNet to extend the linguistic mappings
[Diagram labels: meaning, synset, concept, word sense, word; lexicographer, analyst; {tank, armoured combat vehicle}; armoured combat vehicle/1, tank/1; armoured combat vehicle, tank; meeting of minds; synset {military vehicle}; word military vehicle]
the synset {tank, armoured combat vehicle} means the same as the concept tank.
conceptualise a ~ tank ~ T.
the synset {tank, armoured combat vehicle} has the word sense tank/1 as component.
the synset {tank, armoured combat vehicle} is a hyponym of the synset {military vehicle}.
the synset {military vehicle} means the same as the concept tank.
the word |military vehicle| expresses the concept tank.

29 CE rules to use WordNet to extend word-to-concept relations
if
  ( the synset S means the same as the concept C ) and
  ( the synset S is a hyponym of the synset Super )
then
  ( the synset Super means the same as the concept C ).
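A short Python sketch of this rule follows. The pair encoding and the function name extend_meanings are hypothetical. The sketch applies the rule repeatedly until nothing new is derived, on the assumption that a rule engine would do the same, so a meaning propagates up every hypernym link above the starting synset.

```python
def extend_meanings(means_same_as, hyponym_of):
    """Propagate synset-to-concept links up the hyponym hierarchy.

    means_same_as: set of (synset, concept) pairs
    hyponym_of:    set of (synset, super_synset) pairs
    """
    derived = set(means_same_as)
    changed = True
    while changed:                      # repeat until no new fact is added
        changed = False
        for synset, concept in list(derived):
            for sub, super_synset in hyponym_of:
                if sub == synset and (super_synset, concept) not in derived:
                    derived.add((super_synset, concept))
                    changed = True
    return derived


meanings = {("{tank, armoured combat vehicle}", "tank")}
hierarchy = {("{tank, armoured combat vehicle}", "{military vehicle}")}
extended = extend_meanings(meanings, hierarchy)
```

Here the synset {military vehicle} comes to mean the same as the concept tank, matching the example on the previous slide; in a deep hierarchy the same rule would also reach very general synsets, so the analyst may want to limit how far it is applied.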