International Technology Alliance in Network & Information Sciences
NL Processing and Fact Extraction, 11th May 2013
David Mott (IBM UK), Stephen Poteet, Ping Xue, Anne Kao (Boeing)

Summary

This document summarises the state of the NL processing and fact extraction work as at 11th May 2013. The key areas addressed are significant extensions to the work demonstrated at ACITA:
- extending the lexicon, based upon WordNet and VerbNet resources, which have been turned into CE
- generalising the semantic rules, allowing automatic processing based upon the linguistic structures for verbs defined in VerbNet
- extending the analysis of complex verb phrases, including auxiliary verbs (have/be etc.), providing information about tense, aspect and active/passive voice
- extending CE by use of "linguistic frames", allowing the structure of CE to be changed by configuration
- initial analysis of the LKB structures and how they might be represented in CE, including consideration of "complements" to verb phrases

References

WordNet:
- George A. Miller (1995). WordNet: A Lexical Database for English. Communications of the ACM, Vol. 38, No. 11: 39-41.
- Christiane Fellbaum (1998, ed.). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

VerbNet:
- Karin Kipper, Anna Korhonen, Neville Ryant, Martha Palmer. A Large-scale Classification of English Verbs. Language Resources and Evaluation Journal, 42(1), pp. 21-40, Springer Netherlands, 2008.
- Karin Kipper Schuler, Anna Korhonen, Susan W. Brown. VerbNet Overview, Extensions, Mappings and Apps. Tutorial, NAACL-HLT 2009, Boulder, Colorado.

Progress since ACITA12

- More comprehensive common model of language
- Handling compound verb phrases: "is being loved"
  - attempting to get tense, voice, aspect, case
  - constructing temporal relations between situations and the utterance
- Using WordNet
  - defining a lexicon of nouns and senses
  - adding in inflections
  - adding in links to domain-specific concepts
- Using VerbNet
  - defining a lexicon of verbs with typical grammatical patterns (NP V NP) and senses
- Extensions to CE, based on a semantic chart parser and linguistic frames
- Exploring the LKB, based on linguistic frames

Architecture

[Architecture diagram: SYNCOIN reports flow through a Message PreProcessor, the Stanford Parser, an Entity Extractor and a Situation Extractor into the CEStore, where Domain Specific Reasoning, a CE "Styliser" and an Analyst's "Helper" (for missing word-concept links) operate. Supporting resources include reference data (places, organisations), WordNet (domain noun sense to concept) and VerbNet (domain verb sense to concept).]

Common Model of language

Purpose

A comprehensive CE model of how language works, consistent with linguistic practice:
- allows us to build fact extraction applications configured by CE
- allows specification of a rich lexicon (or dictionary):
  - how words express concepts in the CE domain model
  - how different forms of words are related (e.g. singular/plural)
  - representing semantic information to guide parsing and handle ambiguities
- to be consistent in the future with existing lexical resources
- allows us to use Cambridge technology and lexicon (ERG, LKB)
- to facilitate use of shallow and deep semantics in a consistent manner: we can build "simple" lexical processing but using the same model as more complex lexical processing

Some Key Concepts in the Lexicon

[Diagram: a word as written on the page (e.g. the word |hits|) is written as a grammatical form, i.e. words seen as inflections and parts of speech (the plural noun |hits_NNS|, the present third singular verb |hits_VBZ|). Each grammatical form is an inflection of a base form (the singular noun |hit_NN|, the base form verb |hit_VB|), which is a form of a word sense, i.e. a unique meaning (the noun sense |HIT_n_1|, the verb sense |HIT_v_8|). A word sense expresses a CE concept (the entity concept hit, the entity concept 'hit situation').]

conceptualise a ~ hit ~ H that is a force.
conceptualise a ~ hit situation ~ S that is an attack.

Building Meaning

[Diagram: the same lexical chain as the previous slide, annotated with where disambiguation happens. Grammatical disambiguation (POS tagging) selects the grammatical form of a word; semantic disambiguation (using context, phrase structures, constraints and the avoidance of inconsistencies) selects the word sense and hence the CE concept.]

Other Concepts in the Lexical Model

- Parts of speech
- Phrases: head/dependent
- Features: tense/aspect/voice, number/person
- Roles: agent/patient/theme/location...
- Selectional restrictions (as rules)
- Ambiguities: assumption-based reasoning
- Linguistic frames

Information Flow

[Diagram: from the parse tree, NPs yield noun dependency fragments which, via WordNet, the conceptual model ("expresses") and reference entities, become typed entities and then specific entities. VPs and verb chains yield verb dependency fragments which, via VerbNet and the conceptual model ("expresses"), become situations with generic roles and then specific roles and relations. Tenses, together with the dialog context, yield temporal relations; PPs yield containment relations.]

Analysing Tense, Aspect, Voice, Mood, Case

- Tense: the relationship between a situation and the time of utterance (before, after, during)
- Aspect: an indication of the time "profile", e.g. an instant in time (punctual) vs a duration, completed or ongoing
- Combining tense and aspect: past simple (tense = before, aspect = punctual or habitual), e.g. "he drank decaffeinated coffee"
- Voice: active or passive
- Mood: should/must/would/can/could
- Case: singular/plural, male/female, 1st/2nd/3rd
- Negation?: "not loved"

This is a complex area! We are just starting here.
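As a rough illustration of how such features might be derived from an auxiliary chain, here is a minimal Python sketch. It is not the project's implementation (the rules are actually expressed in CE over verb phrase chains, as shown later); the rule table and function names are assumptions, aligned where possible with the CE rules in this document (e.g. "has ...ed" is treated as past, active).

# Minimal sketch: derive tense/aspect/voice features from a POS-tagged verb chain.
def chain_features(chain):
    """chain is a list of (word, tag) pairs, e.g. [("are", "VBP"), ("conducting", "VBG")]."""
    words = [w.lower() for w, _ in chain]
    tags = [t for _, t in chain]

    if len(chain) == 1:
        tense = "past" if tags[0] == "VBD" else "present"
        return {"tense": tense, "aspect": "simple", "voice": "active"}

    first, last_tag = words[0], tags[-1]
    if first in ("am", "is", "are") and last_tag == "VBG":       # "are conducting"
        return {"tense": "present", "aspect": "progressive", "voice": "active"}
    if first in ("was", "were") and last_tag == "VBG":           # "was conducting"
        return {"tense": "past", "aspect": "progressive", "voice": "active"}
    if first in ("has", "have", "had") and last_tag == "VBN":    # "has conducted"
        return {"tense": "past", "voice": "active"}              # as in the CE rule later
    if first in ("is", "are", "was", "were") and last_tag == "VBN":  # "was conducted"
        tense = "past" if first in ("was", "were") else "present"
        return {"tense": tense, "aspect": "simple", "voice": "passive"}
    return {}  # unknown pattern

print(chain_features([("are", "VBP"), ("conducting", "VBG")]))
# {'tense': 'present', 'aspect': 'progressive', 'voice': 'active'}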

Represent as Features

the verb phrase #1
  has the person category 'first' as feature and
  has the number category 'singular' as feature and
  has the tense category 'past' as feature and
  has the aspect category 'progressive' as feature
  ...

Actually we will assign features to verb chains, not to individual VPs (see later).

Detecting chains of nested VPs: "are conducting"

[Diagram: the verb phrase #101 has the present tense verb |are_VBP| as head and has the verb phrase #102 as dependent; the verb phrase #102 has the gerund participle |conducting_VBG| as head.] From this nesting we construct:

there is a verb phrase chain named #27 that
  has '2' as length and
  has the verb phrase #101 as first phrase and
  has the verb phrase #102 as last phrase and
  has the present tense verb |are_VBP| as first item and
  has the gerund participle |conducting_VBG| as main verb.

Assigning features to chains

if ( the verb phrase chain F has '2' as length and
     has the verb |are_VBP| as first item and
     has the gerund participle GP as main verb )
then ( the verb phrase chain F has the tense category 'present' as feature and
       has the aspect category 'progressive' as feature and
       has the voice category 'active' as feature ).

Note we cannot tell the person or number features, since the subject could be "you-sg/you-pl/they". Should we also assign features to all/some VPs in the chain?

Sample Rules

Find a 2-verb phrase chain:

if ( the verb phrase P1 has the verb phrase P2 as dependent and
     has the verb V1 as head ) and
   ( the verb phrase P2 has the verb V2 as head ) and
   ( it is false that there is a verb phrase named P0 that has the verb phrase P1 as dependent ) and
   ( it is false that the verb phrase P2 has the verb phrase P3 as dependent )
then ( there is a verb phrase chain named F that
       has '2' as length and
       has the verb phrase P1 as first phrase and
       has the verb phrase P2 as last phrase and
       has the verb V1 as first item and
       has the verb V2 as main verb ).

If the chain is "has ...ed" then it is active and past:

if ( the verb phrase chain F has '2' as length and
     has the verb |has_VBZ| as first item and
     has the past participle PP as main verb )
then ( the verb phrase chain F has the tense category 'past' as feature and
       has the voice category 'active' as feature ).

Timing constraints in CE

We already have a conceptualisation of time in CE from the CPM work. There is a set of temporal constraints that can be applied to temporal entities, based on Allen's interval logic. A situation is a type of temporal entity. Examples:

- the situation s1 occurs after the situation s2 (unfortunately this reverses the time layout; maybe "occurs before" is better)
- the situation s1 occurs immediately after the situation s2
- the situation s1 occurs within the situation s2
- the situation s1 is ended by the situation s2
- etc.
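For illustration only, here is a minimal Python sketch (an assumption, not the CPM implementation) of how a few of these Allen-style relations could be checked over intervals with start and end times:

# Minimal sketch of a few Allen-style interval relations; the CE/CPM work
# represents these as temporal constraints between temporal entities.
from dataclasses import dataclass

@dataclass
class Interval:
    start: float
    end: float

def occurs_after(s1: Interval, s2: Interval) -> bool:
    # s1 starts after s2 has finished
    return s1.start > s2.end

def occurs_immediately_after(s1: Interval, s2: Interval) -> bool:
    # s1 starts exactly when s2 ends (Allen's "meets", reversed)
    return s1.start == s2.end

def occurs_within(s1: Interval, s2: Interval) -> bool:
    # s1 is contained in s2 (Allen's "during")
    return s1.start >= s2.start and s1.end <= s2.end

def is_ended_by(s1: Interval, s2: Interval) -> bool:
    # s1 and s2 finish together, s2 starts later (Allen's "finishes", inverted)
    return s1.end == s2.end and s2.start > s1.start

survey = Interval(10, 12)
report = Interval(14, 15)
print(occurs_after(report, survey))   # True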

Assign timing constraints to situations based on the utterance

Utterance: "they are conducting surveys". [Diagram: the sentence "they are conducting surveys" is parsed into a verb phrase that has a tense category as feature and stands for the conduct situation; the situation is referenced in the sentence; the utterance utters the sentence and, on the time line, occurs within the conduct situation.]

if ( the verb phrase VP has the tense category 'past' and
     stands for the situation SIT ) and
   ( the situation SIT is referenced in the sentence S ) and
   ( the utterance UT utters the sentence S )
then ( the utterance UT occurs after the situation SIT ).

This is over-simplified.

More complex case!

Utterance: "he said the train would come". [Diagram: the utterance utters the sentence "he said the train would come"; the say situation s1 (from verb phrase v1, with tam category 'past simple' as feature) occurs before the utterance; the come situation s2 (from verb phrase v2, with tam category 'future simple' as feature, parsed from "the train will come") occurs after the say situation s1. The step marked "MAGIC HERE" is the open question: how do we get from "he said the train would come" to "the train will come"?]

Timings are Useful

Given the temporal relations we can now calculate the ordering of events and calculate precise times of events. This is useful for story telling and for "forensic" analysis.

Dependency structures

Verb/Noun dependency fragments

Packaging up all of the grammatical "positions" from the parse tree (specifier, complement, modifier) into a single structure.

For nouns:

the noun dependency fragment #1
  has the singular noun |lack_NN| as noun and
  has the prepositional phrase #2 as first complement phrase.

For verbs:

the verb dependency fragment #3
  has the gerund participle |conducting_VBG| as verb and
  has the noun phrase #208_NP as subject noun phrase and
  has the verb phrase #210_VP as verb phrase and
  has the noun phrase #211_NP as first object noun phrase and
  has the prepositional phrase #218_PP as first complement prepositional phrase.

Is it better to just have "first complement phrase" etc.?
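As an illustration only (the field names mirror the CE properties above but are otherwise assumptions), the same packaging could be sketched in Python as plain data structures:

# Illustrative sketch of dependency fragments as data structures.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class NounDependencyFragment:
    noun: str                                    # e.g. "lack_NN"
    complement_phrases: List[str] = field(default_factory=list)

@dataclass
class VerbDependencyFragment:
    verb: str                                    # e.g. "conducting_VBG"
    subject_np: Optional[str] = None
    verb_phrase: Optional[str] = None
    object_nps: List[str] = field(default_factory=list)
    complement_pps: List[str] = field(default_factory=list)

frag = VerbDependencyFragment(
    verb="conducting_VBG",
    subject_np="#208_NP",
    verb_phrase="#210_VP",
    object_nps=["#211_NP"],
    complement_pps=["#218_PP"],
)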

Building a CE lexicon from VerbNet and WordNet

Using WordNet

Generate a list of nouns and their inflections:

the plural noun |surveys_NNS| is an inflection of the singular noun |survey_NN|.
the singular noun |survey_NN| is an inflection of the singular noun |survey_NN|.

Create links to the noun sense:

the singular noun |survey_NN| is a form of the noun sense |SURVEY_N_1|.

The link from noun sense to conceptual model must be made by the user:

the noun sense |SURVEY_N_1| expresses the entity concept 'survey'.

The Analyst Helper can suggest links based on NPs with missing semantics. But which sense? We could use the gloss to tell the user which is which; work was done in the past to construct synsets, hyponyms etc. to help make better suggestions via the Analyst Helper.
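A minimal sketch of how such CE lexicon sentences might be generated from WordNet using NLTK. The use of NLTK here is an assumption about tooling; the project converted WordNet into CE by its own process, not necessarily this way, and the function name is hypothetical.

# Sketch: emit CE lexicon sentences for a plural noun using NLTK's WordNet interface.
# Assumes the WordNet corpus has been downloaded (nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

def ce_for_plural_noun(inflected):
    base = wn.morphy(inflected, wn.NOUN) or inflected   # e.g. "surveys" -> "survey"
    ce = [
        f"the plural noun |{inflected}_NNS| is an inflection of the singular noun |{base}_NN|.",
        f"the singular noun |{base}_NN| is an inflection of the singular noun |{base}_NN|.",
    ]
    for i, syn in enumerate(wn.synsets(base, pos=wn.NOUN), start=1):
        sense = f"{base.upper()}_N_{i}"
        ce.append(f"the singular noun |{base}_NN| is a form of the noun sense |{sense}|.")
        # syn.definition() is the gloss; it could be shown to the user to pick the right sense
    return ce

for line in ce_for_plural_noun("surveys"):
    print(line)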

Using VerbNet

Generate a list of verbs and their inflections:

the past tense verb |conducted_VBD| is an inflection of the base form verb |conduct_VB|.
...

Create links to the verb sense:

the base form verb |conduct_VB| is a form of the verb sense |CONDUCT_V_1|.

The link from verb sense to conceptual model must be made by the user:

the verb sense |CONDUCT_V_1| expresses the entity concept 'conduct situation'.

The Analyst Helper can suggest links based on VPs with missing semantics.

VerbNet for computing roles

We can compute situation ROLES from information in VerbNet together with the parse structure. Use VerbNet to provide the grammatical patterns and roles for each verb:

the verbnet frame 'hit-18.1_1'
  has 'NP V NP' as grammatical description and
  has the verbnet noun pattern 'NP' as specifier pattern and
  has the attribute concept 'agent role' as specifier role and
  has the verbnet noun pattern 'NP' as first complement pattern and
  has the attribute concept 'patient role' as first complement role.

For example, the verb "hit" in one sense:

the verb sense |HIT_V_1| is constrained by the verbnet frame 'hit-18.1_1'.

Use this to calculate the roles for things described in parse trees, based upon the verb: for each grammatical "position", define the type and the role.
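A minimal Python sketch of the matching idea (the frame and fragment structures are assumptions; the real system does this with CE rules, as shown in the following slides):

# Sketch: assign VerbNet-style roles to the fillers of grammatical positions
# in a verb dependency fragment, using a frame keyed on the verb sense.
VERBNET_FRAMES = {
    "HIT_V_1": {
        "description": "NP V NP",
        "specifier_role": "agent role",           # filled by the subject NP
        "first_complement_role": "patient role",  # filled by the object NP
    },
}

def compute_roles(verb_sense, subject_entity, object_entity):
    frame = VERBNET_FRAMES.get(verb_sense)
    if frame is None or frame["description"] != "NP V NP":
        return {}
    return {
        frame["specifier_role"]: subject_entity,
        frame["first_complement_role"]: object_entity,
    }

print(compute_roles("HIT_V_1", "the thing T1", "the thing T2"))
# {'agent role': 'the thing T1', 'patient role': 'the thing T2'}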

Steps

1. Grammatical patterns: rules to turn specific parse tree patterns into "fragments", e.g. an NP_V_NP fragment.
2. Finding the verb sense for a given verb: the base form verb V expresses the verb sense VS.
3. VerbNet frames:

the verbnet frame 'hit-18.1_1'
  has 'NP V NP' as grammatical description and
  has the verbnet noun pattern 'NP' as specifier pattern and
  has the attribute concept 'agent role' as specifier role and
  has the verbnet noun pattern 'NP' as first complement pattern and
  has the attribute concept 'patient role' as first complement role.

4. Linking the verb sense to the VerbNet frame: the verb sense |HIT_V_1| is constrained by the verbnet frame 'hit-18.1_1'.
5. Mapping from the verb sense to the situation type: the verb sense VS expresses the entity concept 'XXX'.
6. Mapping to situation attributes and relations (just the template, to be overridden by the domain user; the "-ifications" are open to discussion!):

the hit situation S
  agentifies the entity concept hitter and
  patientifies the entity concept 'thing hit' and
  is viewed relationally as the relation concept hits.

RESULT:

the hit situation S1 has the thing T1 as hitter and has the thing T2 as thing hit.
the thing T1 hits the thing T2.

In more detail

[Diagram: the parse tree SP -> NP VP, VP -> V NP for "hit", annotated with the CE derived at each step.]

Fragment patterns in rules:

the verb dependency fragment f1
  has the past tense verb |hit_VBD| as verb and
  has the noun phrase NP1 as subject noun phrase and
  has the verb phrase VP as verb phrase and
  has the noun phrase NP2 as first object noun phrase.

Lexical information:

the past tense verb |hit_VBD| is an inflection of the verb |hit_VB|.
the verb |hit_VB| is a form of the verb sense |HIT_V_1|.
the noun phrase NP1 stands for the thing T1.
the noun phrase NP2 stands for the thing T2.
the verb phrase VP stands for the situation S1.
the verb sense |HIT_V_1| is constrained by the verbnet frame 'hit-18.1_1'.

LEXICAL CURVE: matching the grammatical "positions" against the verbnet frame patterns:

the verbnet frame 'hit-18.1_1'
  has the verbnet pattern 'NP' as specifier pattern and
  has 'agent role' as specifier role and
  has the verbnet noun pattern 'NP' as first complement pattern and
  has 'patient role' as first complement role.

the situation S1 has the thing T1 as agent role and has the thing T2 as patient role.
the noun sense |HIT_N_1| nominalises the verb sense |HIT_V_1|.
the noun sense |HIT_N_1| expresses the entity concept 'hit situation'.

DOMAIN CURVE: specialising the roles into domain-specific roles and relations:

the hit situation S
  agentifies the entity concept 'hitter' and
  patientifies the entity concept 'thing hit' and
  is viewed relationally as the relation concept 'hits'.

the hit situation S1 has the thing T1 as hitter and has the thing T2 as thing hit.
the thing T1 hits the thing T2.

Sample Rules – Lexical Curve

Find a fragment with a subject from an active-voice verb chain, in the context of a sentence phrase:

if ( the sentence phrase SP has the noun phrase NP as dependent and
     has the verb phrase VP as head ) and
   ( the verb phrase chain VBC has the verb phrase VP as first phrase and
     has the verb phrase VP1 as last phrase and
     has the verb V as main verb and
     has the voice category 'active' as feature )
then ( there is a verb dependency fragment named F that
       has the verb V as verb and
       has the noun phrase NP as subject noun phrase and
       has the verb phrase VP1 as verb phrase and
       has the voice category 'active' as feature ).

For a given active verb fragment, get a role of a situation from the verb sense associated with the subject verb phrase:

if ( the verb dependency fragment F has the noun phrase NP as subject noun phrase and
     has the verb phrase VP as verb phrase and
     has the voice category 'active' as feature ) and
   ( the noun phrase NP stands for the thing T ) and
   ( the verb phrase VP is associated with the verb sense VS and
     stands for the situation S ) and
   ( the verb sense VS is constrained by the verbnet frame VNF ) and
   ( the verbnet frame VNF has the verbnet pattern 'NP' as specifier pattern and
     has the value ROLE as specifier role )
then ( the situation S has the thing T as #ROLE ).

Sample Rule – Domain Curve

Apply the domain-specific role name to the agent of a situation:

if ( the situation S is an #EC and
     has the thing A as agent role ) and
   ( the situation concept EC agentifies the attribute concept AC )
then ( the situation S has the thing A as #AC ).

Issue

Some verbs (e.g. "hit") have different patterns in VerbNet:
- some are grammatical alternatives: "John hit the dog" / "John hit the dog with the stick"
- some are under different senses: "John hit the dog" (filed under "hit") / "John hit the hill" (filed under "reach")

Do we:
- just encode the basic ones for now? (this is the current approach, although other work has been reported to suggest how assumptions could be used)
- handle this using assumptions?
- add selectional restrictions to remove/reduce alternatives? e.g. the agent of the sense |HIT_V_1| must be volitional; applying selectional restrictions needs to involve both the lexicon and the domain

We can extract more from VerbNet, e.g. selectional restrictions and event semantics.
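Purely as an illustration of how a selectional restriction could prune sense alternatives (the table contents and names below are assumptions, not VerbNet data), a small Python sketch:

# Sketch: prune candidate verb senses for "hit" using a simple selectional
# restriction, e.g. the agent of |HIT_V_1| must be volitional.
SELECTIONAL_RESTRICTIONS = {
    "HIT_V_1": {"agent role": "volitional"},
}

ENTITY_FEATURES = {
    "John": {"volitional"},
    "the rock": set(),
}

def senses_allowed(candidate_senses, agent_entity):
    allowed = []
    for sense in candidate_senses:
        required = SELECTIONAL_RESTRICTIONS.get(sense, {}).get("agent role")
        if required is None or required in ENTITY_FEATURES.get(agent_entity, set()):
            allowed.append(sense)
    return allowed

print(senses_allowed(["HIT_V_1", "REACH_V_1"], "John"))      # ['HIT_V_1', 'REACH_V_1']
print(senses_allowed(["HIT_V_1", "REACH_V_1"], "the rock"))  # ['REACH_V_1']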

Sample Sentence

"HTT are conducting surveys in Adhamiya to judge the level of support for Bath'est return."

the organisation #64 known as |Htt| conducts the survey #59.

[The organisation comes from a reference entity, the "conducts" relation from the VerbNet NP V NP frame plus the domain roles, and the survey concept from WordNet plus the domain model.]

Summary of Generic Semantic Processing

Basic Semantics - Nouns

Syntax: noun phrase. Conceptual model / via:
- thing: "an NP stands for a thing"
- specific type: head noun, inflections, noun sense [WordNet], "expresses"
- "same as" reference entities (places, organisations etc.): based on proper names, propagation of information

Basic Semantics - Verbs

Syntax: verb phrase. Conceptual model / via:
- situation: "a VP stands for a situation"
- specific situation (e.g. "conduct situation"): head verb, inflections, verb sense [VerbNet], "expresses"
- timing information: compound verb analysis
- general roles: grammatical patterns (e.g. NP V NP) [VerbNet], compound verb analysis (active/passive), limited selectional restriction
- specific relations: "agentifies", "patientifies" etc.

Basic Semantics - Prepositions

Syntax: prepositions. Conceptual model / via:
- relations: specific rules, e.g. "in" is a container (too simplistic!)

Not yet handled

- mutual exclusivity of alternatives
- full selectional restrictions
- the full set of VerbNet: grammatical patterns, selectional restrictions, event semantics
- more types of prepositional phrases
- adjectives
- relational clauses, subclauses

Domain Reasoning

Using the facts extracted to perform reasoning tasks. This section is provided to show the complete picture of the current work, but was taken from the ACITA12 paper.

Domain Semantics: "Communications"

SYNCOIN reports speak about monitoring communications between people, together with the things that were said. For example:

"02/24/10 - Cell call is monitored between unknown caller (7678112233) in Amin to Amir Mahallati (7115452376) in Bayaa. The unidentified caller stated: 'The team is a failure! The carpet doesn't match! The carpet maker needs to be replaced.' The recipient said: 'The measurements were perfect, the installers must have failed.'"

conceptualise a ~ communication ~ C that
  has the agent A as ~ caller ~ and
  has the agent B as ~ recipient ~ and
  has the value D as ~ date ~ and
  has the value T as ~ time ~ and
  has the value V1 as ~ caller utterance ~ and
  has the value V2 as ~ recipient utterance ~.
the communication C ~ is from ~ the place FROM and ~ is to ~ the place TO.

This could be used to analyse connections between agents over time and over geography.

Specific communications examples

"The thing being 'done to' in a communications monitoring is actually a communication":

if ( the situation S is a communications monitoring and
     has the thing T as patient role )
then ( the communications monitoring S monitors the communication T ).

"The communication comes from the place where the caller is":

if ( the communication C has the agent A as caller ) and
   ( the agent A is located in the place P )
then ( the communication C is from the place P ).

There is no mention of syntax or phrase structure: you don't need to be a linguist! Each of these is a simple step in interpreting the situation, but requires information derived from other steps: how do we know the caller agent in a communication? How do we know where the agent is located? This is where rationale comes in.
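Purely as an illustration of the second rule (the data and field names are assumptions), the same inference in Python:

# Sketch of the "communication is from the place where the caller is" rule.
communications = [{"id": "#3", "caller": "#5", "recipient": "Amir Mahallati"}]
located_in = {"#5": "Amin", "Amir Mahallati": "Bayaa"}

for comm in communications:
    place = located_in.get(comm["caller"])
    if place is not None:
        comm["is from"] = place

print(communications[0])
# {'id': '#3', 'caller': '#5', 'recipient': 'Amir Mahallati', 'is from': 'Amin'}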

Rationale shows the steps leading to a fact. It looks complex, but each step is simple. It is possible to keep everything in mind by focussing on a small area at once.

(To get the diagram: run qk_debug, Load Model (basic), load inSen3, runrules; describe th_cestore0000015, find the "is from" fact and select it; then switch to poshce, then r^^)

Example communication

there is a call named '#3' that
  has the agent #5 as caller and
  has the agent #15 known as |Amir Mahallati| as recipient and
  has '02/24/10' as date and
  has 'The team is a failure! The carpet doesn\'t match! The carpet maker needs to be replaced@' as caller utterance and
  has 'The measurements were perfect, the installers must have failed@' as recipient utterance and
  is from the place #13 known as |Amin| and
  is to the place #21 known as |Bayaa|.

The CE Store can display communications on maps, from CE extracted facts.

The analyst has a new idea!

Why don't we look for "back-to-back communications with the same middle man"? Are they passing on the same information to each other? Perhaps we can track linkages between groups?

The analyst creates a new concept:

conceptualise a ~ replay conversation ~ RC that
  is a thing and
  has the agent A as ~ first caller ~ and
  has the agent B as ~ middle man ~ and
  has the agent C as ~ second recipient ~ and
  has the communication COM1 as ~ first communication ~ and
  has the communication COM2 as ~ second communication ~ and
  has the sequence ( the value ~ first communication ~ and the value ~ second communication ~ ) as identifier.

(The sequence means concatenation: the identifier of the replay conversation is formed from the ids of the two communications, which is unique.)

The analyst creates a new rule to define them:

if ( there is a communication COM1 that
     has the agent CALLER1 as caller and
     has the agent RECIPIENT1 as recipient ) and
   ( there is a communication COM2 that
     has the agent CALLER2 as caller and
     has the agent RECIPIENT2 as recipient ) and
   ( the agent RECIPIENT1 is the same as the agent CALLER2 )
then ( there is a replay conversation named RC that
       has the communication COM1 as first communication and
       has the communication COM2 as second communication and
       has the agent CALLER1 as first caller and
       has the agent RECIPIENT1 as middle man and
       has the agent RECIPIENT2 as second recipient ).
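For illustration, a small Python sketch of the same join (the data and field names are assumptions), pairing communications whose recipient is the caller of another:

# Sketch: find "replay conversations" - back-to-back communications that
# share a middle man (recipient of the first == caller of the second).
communications = [
    {"id": "c1", "caller": "A", "recipient": "B"},
    {"id": "c2", "caller": "B", "recipient": "C"},
    {"id": "c3", "caller": "D", "recipient": "E"},
]

def replay_conversations(comms):
    found = []
    for first in comms:
        for second in comms:
            if first is not second and first["recipient"] == second["caller"]:
                found.append({
                    "identifier": first["id"] + second["id"],  # concatenated ids, unique
                    "first caller": first["caller"],
                    "middle man": first["recipient"],
                    "second recipient": second["recipient"],
                })
    return found

print(replay_conversations(communications))
# [{'identifier': 'c1c2', 'first caller': 'A', 'middle man': 'B', 'second recipient': 'C'}]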

Exploring the analyst's concepts

"Add the utterances of the caller and the recipient to see what they are saying." It looks like some form of code: carpet = device? Now we can create the concept of a code with code words, and rules to infer information: communications contain code words.

More possibilities:
- showing the spread of codewords on a map
- using codes to suggest organisational structure

Holistic View of rules

Multiple skills are involved: language, semantics, and domain-specific reasoning.

Extensions to CE

We need to extend the CE language for greater expressibility:
- adjectives
- prepositions
- more readable names in context
- ...

We can use "linguistic frames" to define CE extensions.

Investigating "Linguistic Frames"

A linguistic frame is a CE-based structure that defines a step in the NL processing by specifying:
- the syntax pattern
- the preconditions/constraints
- the resulting semantics

By applying all of the steps to a sentence we can generate a parse tree and construct the semantics of the sentence: the semantics of one part is composed from the semantics of its subcomponents, using a chart parser. This can be applied to parsing both CE and NL, and it can define CE extensions. [Diagram: a language complexity scale running from Basic CE, through Extended CE, to Natural Language.]
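A toy Python sketch of the compositional idea (not the CE Store chart parser; the frame content and names are assumptions): a frame matches a sequence of constituents, checks a precondition, and builds the semantics of the new, larger constituent.

# Toy sketch: a "linguistic frame" matches a sequence of constituent
# categories, checks a precondition and builds the semantics of the parent.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Constituent:
    category: str      # e.g. "DT", "NN", "NP"
    semantics: dict

@dataclass
class Frame:
    name: str
    defines: str                                   # category of the new constituent
    syntax: List[str]                              # sequence of child categories
    precondition: Callable[[List[Constituent]], bool]
    semantics: Callable[[List[Constituent]], dict]

np1 = Frame(
    name="np1",
    defines="NP",
    syntax=["DT", "NN"],
    precondition=lambda kids: "sense" in kids[1].semantics,
    semantics=lambda kids: {"entity": kids[1].semantics["sense"]},
)

def apply_frame(frame, kids):
    # If the children match the frame's syntax and precondition, build the parent.
    if [k.category for k in kids] == frame.syntax and frame.precondition(kids):
        return Constituent(frame.defines, frame.semantics(kids))
    return None

the = Constituent("DT", {})
survey = Constituent("NN", {"sense": "SURVEY_N_1"})
print(apply_frame(np1, [the, survey]))
# Constituent(category='NP', semantics={'entity': 'SURVEY_N_1'})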

Structure of a Linguistic Frame

there is a linguistic frame named F that
  defines the <phrase type> S and
  has the sequence SEQ as syntax and
  has the statement that CE_STATEMENT as preconditions and
  has the statement that CE_STATEMENT as semantics and
  has the thing T as portal variable.

- "defines": the frame finds an instance of this component in the sentence
- "syntax": WHEN this sequence of words and phrases is present in the sentence
- "preconditions": AND these statements are true
- "semantics": IN WHICH CASE these statements define the semantics of the component
- "portal variable": the result is passed back up the tree through this variable

Defining Basic CE in linguistic frames!

there is a linguistic frame named np1 that
  defines the noun phrase NP and
  has the sequence ( the determiner '|the_DT|' , the singular noun COMMON , and the proper noun PN ) as syntax and
  has the statement that
    ( the proper noun PN has the value NAME as written form ) and
    ( the singular noun COMMON is a form of the noun sense NS ) and
    ( the noun sense NS expresses the entity concept EC )
  as preconditions and
  ( there is a thing named NAME that is an #EC ) as semantics and
  has the thing NAME as portal variable.

Extensions to CE can be defined…

Intuition: we can represent adjectives as "XXX thing", e.g. convert "John is red" to "John is a red thing".

there is a linguistic frame named adjective_predicate that
  defines the verb phrase VP and
  has the sequence ( the present third singular verb '|is_VBZ|' , and the adjective ADJ ) as syntax and
  has the statement that
    ( the adjective ADJ has the value WF as written form ) and
    ( the value ECTERM = the value WF <> the constant ' thing' ) and
    ( the entity concept EC has the value ECTERM as concept name )
  as preconditions and
  ( the thing X is a #EC ) as semantics and
  has the thing X as portal variable.

(This is a clunky way to say that the new concept is called, for example, 'red thing'.)

Representing the LKB in CE

The Linguistic Knowledge Builder (LKB) is a syntactic/semantic system developed by Prof Ann Copestake, University of Cambridge. In BPP13 we are planning to use resources from the LKB, such as lexicons, parsing rules and semantic processing. As an initial step, we are exploring the use of linguistic frames to represent LKB grammars in CE.

LKB "Rules"

Some simple rules in LKB to force agreement of number between a subject noun phrase and the verb, as in "the dogs sleep" (number = plural):

s_rule := phrase &
  [ CATEG s,
    NUMAGR #1,
    ARGS [ FIRST [ CATEG np, NUMAGR #1 ],
           REST [ FIRST [ CATEG vp,
                          REST *null* ]]].

np_rule := phrase &
  [ CATEG np,
    ARGS [ FIRST [ CATEG det,
                   REST [ FIRST [ CATEG n,

Equivalent in CE linguistic frames

there is a linguistic frame named s1 that
  defines the sentence phrase S and
  has the sequence ( the noun phrase NP , and the verb V ) as syntax and
  has the statement that
    the noun phrase NP has the agreement AGR as feature and
    the verb V has the agreement AGR as feature
  as precondition and
  the sentence phrase S has the agreement AGR as semantics.

there is a linguistic frame named np1 that
  defines the noun phrase NP and
  has the sequence ( the determiner DT , and the noun N ) as syntax and
  has the statement that
    the determiner DT has the agreement AGR as feature and
    the noun N has the agreement AGR as feature
  as precondition and
  the noun phrase NP has the agreement AGR as semantics.

This is just a start; more complex LKB grammars must be addressed.
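To make the agreement idea concrete, here is a tiny Python sketch (an illustration only, not the LKB or the CE Store) in which a shared variable must take the same value in the noun phrase and the verb, just as the #1 coreference does in the LKB rule and the AGR variable does in the CE frame above:

# Tiny sketch of number agreement: the NP and the verb must carry the
# same "numagr" value, mirroring the shared #1 / AGR variable.
def unify_agreement(np_feats, verb_feats):
    """Return the agreed feature value, or None if the two disagree."""
    np_agr, verb_agr = np_feats.get("numagr"), verb_feats.get("numagr")
    if np_agr is None:
        return verb_agr
    if verb_agr is None or np_agr == verb_agr:
        return np_agr
    return None  # unification failure: "the dogs sleeps" is rejected

print(unify_agreement({"numagr": "plural"}, {"numagr": "plural"}))    # plural
print(unify_agreement({"numagr": "plural"}, {"numagr": "singular"}))  # None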

Converging NL and CNL The work on CE extensions and on the representation of the LKB in CE is starting a convergence of Natural Language and Controlled Natural Language parsing, as per the BPP13 proposal…

Converging NL and CNL parsers

[Diagram: a CNL parser (for Basic CE and Extended CE) and an NL parser (for Natural Language) share a lexicon, a reference English grammar, a semantic theory and the conceptual model.]

- Is the lexicon just a set of linguistic frames?
- Better understanding of linguistics
- Increase the stylistic expressibility of CE