1 International Technology Alliance Programme: Fact Extraction using a Controlled Natural Language
David Mott, Dave Braines, ETS, Hursley, IBM UK
SWG Strategy (C) Copyright IBM Corp. 2006, 2012. All Rights Reserved.

2 Team
Dave Braines, David Mott – IBM, Hursley
Steve Poteet, Ping Xue, Anne Kao – Boeing, Seattle
Paul Smart, Antonio Penta, Ron Tasker – University of Southampton
SWG Strategy – Emerging Technology Services, Hursley (C) Copyright IBM Corp. 2006, 2011. All Rights Reserved.

3 International Technology Alliance (ITA) in network and information sciences
How can coalition operations be assisted by networks of computer systems?
US/UK Academic/Industry collaboration
10-year programme ending in May 2016
– Sponsored by UK MOD and US ARL
– Research must be scientific, fundamental, reviewed by academic peers, and published

4 ITA Consortium Members

5 Fundamental Research Issues
How do we assist people to create and use applications that reason?
– Modelling concepts, relationships and rules of inference
– Grasping the basic logic of the model and rules
– Understanding the reasoning performed by others
– Sharing understanding across the human team
– Sharing reasoning and artefacts across different systems

6 Supporting the "analyst"
[Diagram labels: doc27, CE Facts, Inference, Rationale, Argumentation, Query, Analysts, Conceptual Model, Assumptions, Uncertainty, CNL Tools, NLP, Requirements, Product]

7 Analyst's "Conceptual Model"
Analyst represents specialist knowledge as concepts, facts and rules for inference
– a conceptual model
– a common set of concepts
The system must "understand" the conceptual model
– assist analyst to search for patterns, deduce information
A language to build the conceptual model
– analyst: easy to understand
– system: readable, unambiguous and formal
We use Controlled English to express the model

8 Controlled English
A Controlled Natural Language, being a subset of English
– limited syntax, but still readable as English
– meanings of the expressions unambiguously defined
Avoids the complexity of a real Natural Language
– computer systems can read, interpret and apply it
Retains the appearance of a real language
– humans can naturally use it, without learning "computer speak"
The analyst may use Controlled English to construct their Conceptual Model
Example: the person John is married to the person Jane and has red as hair colour.
Based on work by John Sowa
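To make "computer systems can read, interpret and apply it" concrete, here is a minimal Python sketch that turns the example sentence into subject/relation/object facts. It is not the project's CE parser: the function name parse_ce_sentence and the three regular expressions are assumptions for illustration, and they cover only the two clause shapes in the example.

```python
import re

# Hypothetical patterns: they handle only the clause shapes in the slide's example,
# not the full Controlled English grammar.
SUBJECT = re.compile(r"^the (\w+) (\w+) (.*)\.$")    # "the person John <rest>."
ATTRIBUTE = re.compile(r"^has (\w+) as (.+)$")       # "has red as hair colour"
RELATION = re.compile(r"^(.+?) the (\w+) (\w+)$")    # "is married to the person Jane"


def parse_ce_sentence(sentence):
    """Return a list of (subject, relation, object) facts for one CE sentence."""
    match = SUBJECT.match(sentence.strip())
    if match is None:
        raise ValueError("sentence does not start with 'the <concept> <name>'")
    concept, name, rest = match.groups()
    facts = [(name, "is a", concept)]
    for clause in rest.split(" and "):
        clause = clause.strip()
        attribute = ATTRIBUTE.match(clause)
        if attribute:
            value, prop = attribute.groups()
            facts.append((name, prop, value))
            continue
        relation = RELATION.match(clause)
        if relation:
            verb, other_concept, other_name = relation.groups()
            facts.append((other_name, "is a", other_concept))
            facts.append((name, verb, other_name))
            continue
        raise ValueError("unsupported clause: " + clause)
    return facts


facts = parse_ce_sentence(
    "the person John is married to the person Jane and has red as hair colour."
)
for fact in facts:
    print(fact)
```

Because the syntax is limited, each clause maps to exactly one fact shape, which is why a few patterns suffice here; unrestricted English would not allow this.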

9 CE for Reasoning
CE used to define:
– "propositions", facts, assumptions
– logical rules
– queries
– meta model of concepts
Inference engines constructed to apply logical rules
– Specific Prolog implementations
– CE Store based on Java and SQL
Rationale may be constructed:
– presented to users for hybrid man/machine reasoning
– to determine dependencies
Formal semantics for CE
– (partially defined) in FOPL
Applications
– analysis of information
– societal and open government data
– planning and resource allocation
– (in progress) NLP

10 Fact Extraction using Controlled Natural Language
As the target of the NL processing
– facts in documents can be used for further reasoning
As a means of describing the NL processing
– to share understanding of the linguistic processing
– to help configure NL tooling

11 Controlled English is "Curiously Useful" – Why?
– perhaps because humans are naturally good at using language to model, understand and reason
– we can build upon "literary devices" already developed to solve problems in expressing knowledge

12 Conceptual Model(s)
Meta Model – concepts: Concept, Entity Concept, Relation Concept, Conceptual Model; relations: belongs to, has as domain
Semiotic Triangle – concepts: Thing, Meaning, Symbol; relations: stands for, expresses
General – concepts: Agent, Spatial Entity, Temporal Entity, Situation, Container; relations: has as agent role, is contained in
Linguistic – concepts: Sentence, Phrase, Word, Noun, Linguistic Category, Linguistic Frame; relations: has as dependent, is parsed from
ACM – concepts: Place, Church, Person, Village, IED, Facility,....; relations: is located in
[Diagram: triangle with corners meaning, symbol, thing and edges conceptualises, stands for, expresses]
"Our" Semiotic Triangle, based on the original [Ogden, C. K. and Richards, I. A. (1923). ]

13 Current NL Processing
[Diagram labels: SYNCOIN Reports, Message PreProcessor, Stanford Parser, Entity Extractor, Situation Extractor, Names, CE Aggregator, CEStore, "Stylistic" CE, Conceptual Model (concepts, logical rules, linguistic expression), Proper Nouns (places, units), For Analysis]
Our focus is on the semantics of the conceptual model

14 General Semantics: Containers
Example: "the patrol in East Rashid discovers the facility."
if ( the prepositional phrase PP has the word '|in|' as head and has the noun phrase NP2 as object ) and ( the noun phrase NP2 stands for the thing T2 ) then ( the thing T2 is a container ).
if ( the noun phrase NP1 stands for the thing T1 and has the prepositional phrase PP as dependent ) and ( the prepositional phrase PP has the word '|in|' as head and has the noun phrase NP2 as object ) and ( the noun phrase NP2 stands for the container T2 ) then ( the thing T1 is contained in the container T2 ).
[Diagram for the example: the noun phrase np1 stands for the thing t1 and has the prepositional phrase pp1 as dependent; pp1 has the word |in| as head and the noun phrase np2 as object; np2 stands for the thing t2; t2 is a container; t1 is contained in t2]
Least Commitment approach – don't say what sort of container
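The two container rules can be sketched as forward chaining over subject/relation/object triples. This is an illustrative Python sketch, not the CE Store or Prolog implementation; the triple encoding and the names objects and infer_containers are assumptions.

```python
def objects(facts, subject, relation):
    """All objects o such that (subject, relation, o) is a known fact."""
    return {o for (s, r, o) in facts if s == subject and r == relation}


def infer_containers(facts):
    """Apply the two 'in' rules once each and return the enlarged fact set."""
    facts = set(facts)
    # Prepositional phrases whose head is the word |in|.
    in_phrases = {s for (s, r, o) in facts if r == "has as head" and o == "|in|"}

    # Rule 1: the thing that the object of 'in' stands for is a container.
    for pp in in_phrases:
        for np2 in objects(facts, pp, "has as object"):
            for t2 in objects(facts, np2, "stands for"):
                facts.add((t2, "is a", "container"))

    # Rule 2: a thing whose noun phrase has that 'in' phrase as dependent
    # is contained in the container.
    for (np1, relation, pp) in list(facts):
        if relation != "has as dependent" or pp not in in_phrases:
            continue
        for t1 in objects(facts, np1, "stands for"):
            for np2 in objects(facts, pp, "has as object"):
                for t2 in objects(facts, np2, "stands for"):
                    if (t2, "is a", "container") in facts:
                        facts.add((t1, "is contained in", t2))
    return facts


# Parse facts for "the patrol in East Rashid discovers the facility."
parse_facts = {
    ("np1", "stands for", "t1"),
    ("np1", "has as dependent", "pp1"),
    ("pp1", "has as head", "|in|"),
    ("pp1", "has as object", "np2"),
    ("np2", "stands for", "t2"),
}
inferred = infer_containers(parse_facts) - parse_facts
print(sorted(inferred))
```

In line with the least commitment approach, the sketch only concludes that t2 is some container; it says nothing about what kind.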

15 Specific Semantics: Entities from Noun Phrases
Example: "the patrol in East Rashid discovers the facility."
if ( the noun phrase NP has the noun N as head and stands for the thing T ) and ( the noun N expresses the entity concept EC ) then ( the thing T realises the entity concept EC ).
[Diagram for the example: the noun phrase np1 has the noun |patrol| as head and stands for the thing s1; the noun |patrol| expresses the entity concept 'patrol unit'; s1 realises 'patrol unit', so s1 is a patrol unit; label: Analyst's helper]
Requires "expresses" link between words and concepts
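The noun phrase rule can be sketched in a few lines of Python. The dictionary encoding and the function name infer_entity_concepts are assumptions for illustration; the point is that nothing is inferred unless the "expresses" link exists for the head noun.

```python
def infer_entity_concepts(head_of, stands_for, expresses):
    """Map each thing to the entity concepts it realises.

    head_of:    noun phrase -> head noun
    stands_for: noun phrase -> thing
    expresses:  noun -> set of entity concepts (the "expresses" link)
    """
    realises = {}
    for noun_phrase, noun in head_of.items():
        thing = stands_for.get(noun_phrase)
        concepts = expresses.get(noun, set())
        if thing is not None and concepts:
            realises.setdefault(thing, set()).update(concepts)
    return realises


head_of = {"np1": "|patrol|"}
stands_for = {"np1": "s1"}
expresses = {"|patrol|": {"patrol unit"}}

print(infer_entity_concepts(head_of, stands_for, expresses))
# With no "expresses" link for the noun, the rule cannot fire.
print(infer_entity_concepts(head_of, stands_for, {}))
```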

16 "Analyst's Helper"
[Diagram labels: Analyst, Analyst Helper, NL parser, "expresses", conceptual model, Proper Names, wordnet/etc, gazetteers etc, meta information, ITAnet, MetaModel generator, translate, semantic rules]
Analyst Helper reports: the word |xxx| is an unrecognised word
Analyst states: the word |www| expresses the concept yyy
Only the analyst knows what the concepts mean
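The helper's reporting step might look like the following Python sketch. It is an assumption about the behaviour implied by the slide, not the actual tool: the function name report_unrecognised, the lower-casing of words, and the proper_names argument are all hypothetical.

```python
def report_unrecognised(words, expresses, proper_names=frozenset()):
    """Return one CE sentence per word that has no 'expresses' link.

    words:        tokens from the parsed text
    expresses:    word -> concept, as stated so far by the analyst
    proper_names: names already known (for example from a gazetteer)
    """
    seen = set()
    reports = []
    for word in words:
        key = word.lower()
        if key in seen or key in expresses or word in proper_names:
            continue
        seen.add(key)
        reports.append(f"the word |{key}| is an unrecognised word.")
    return reports


expresses = {"patrol": "patrol unit"}
words = ["patrol", "Rashid", "discovers", "facility", "facility"]
for line in report_unrecognised(words, expresses, proper_names={"Rashid"}):
    print(line)
```

The analyst would answer each report with a sentence of the form "the word |www| expresses the concept yyy", which extends the expresses mapping for the next run.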

17 Current question
How should the "expresses" link be made more expressive?
– conditional rules to handle ambiguous words
– selectional constraints based on semantics of models?
– introduce verbnet, etc?
– ...

18 The ambiguity barrier
We start from basic CE and move towards full English
Can we control the crossing of the ambiguity barrier?
[Diagram labels: Basic CE, Ambiguity, Ambiguity Barrier, Full English; features: anaphoric reference, sub clauses, prepositional phrases, flexible identities, verb inflections, domain specific syntax]
CE needs to be enhanced

19 "Identical" NL and CNL parsers
[Diagram labels: NL Parser, CNL Parser, lexicon, conceptual model, Reference English Grammar, Semantic Theory, NLP, stylistically expressive CE, basic CE or predicate logic or CE-in-Java]
Increase stylistic expressibility of CE
Better understanding of linguistics

20 Linguistic Frame for semantics
there is a linguistic frame named vp0 that
  has 'is the dog Fido' as example and
  defines the verb phrase VP_vp0 and
  has the sequence ( the copula BE_vp0, and the noun phrase OBJ_vp0 ) as syntactic pattern and
  is predicated on the thing T and
  has the statement that
    ( the noun phrase OBJ_vp0 is predicated on the thing OBJ ) and
    ( the thing T is the same as the thing OBJ )
  as semantic statement.
the word |is| belongs to the linguistic category 'copula'.
the word |dog| is a noun.
the entity concept ce:Dog is expressed by the word |dog| and has 'dog' as concept term.
[Diagram labels: syntax: verb phrase = copula + noun phrase ("is the dog fido"); semantics: v(OBJ), dog(OBJ).. and v(T) T=OBJ,...; Analyst's Conceptual Model; Linguistic Model]
We want exactly the same logic here as in the real NL processing
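As a rough illustration of how the vp0 frame pairs a syntactic pattern with a semantic statement, here is a Python sketch. It is not the project's implementation: the function apply_vp0, the two dictionaries, and the assumed noun phrase shape ('the' noun, optionally followed by a name) are all hypothetical simplifications.

```python
# Lexical facts taken from the slide's CE sentences.
LEXICON = {"is": "copula", "dog": "noun"}
EXPRESSED_BY = {"dog": "ce:Dog"}


def apply_vp0(tokens, subject="T"):
    """Match the pattern (copula, noun phrase); return semantic statements or None."""
    if len(tokens) < 3 or LEXICON.get(tokens[0]) != "copula":
        return None
    noun_phrase = tokens[1:]
    # Assumed noun phrase shape: 'the' <noun> [<name>].
    if (noun_phrase[0] != "the"
            or LEXICON.get(noun_phrase[1]) != "noun"
            or len(noun_phrase) > 3):
        return None
    obj = noun_phrase[2] if len(noun_phrase) == 3 else "OBJ"
    statements = []
    concept = EXPRESSED_BY.get(noun_phrase[1])
    if concept is not None:
        # The noun phrase is predicated on the thing OBJ.
        statements.append((obj, "is a", concept))
    # The thing T is the same as the thing OBJ.
    statements.append((subject, "is the same as", obj))
    return statements


print(apply_vp0("is the dog Fido".split()))
print(apply_vp0("barks at the dog".split()))
```

The first call yields the two statements of the frame's semantic statement with OBJ bound to Fido; the second returns None because the phrase does not start with a copula.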

21 Could we?
use LKB instead of the Stanford Parser?
use the ERG instead of WordNet etc?
– where does the Analyst's Helper fit in?
improve our linguistic model to take account of LKB semantic theory?
represent MRS in CE?
represent linguistic rules in CE?

