Natural Language Understanding Difficulties: Large amount of human knowledge assumed – Context is key. Language is pattern-based. Patterns can restrict possible interpretations. Language is purposeful. There is a goal behind an utterance.
Other Difficulties Ambiguity (at different levels): word meanings, syntactic structure, referential ambiguity, intentional ambiguity. Imprecision. Idioms, jargon, slang. Language changes.
Analysis of Language Analysis of language occurs at different levels: Prosody – rhythm and intonation; Phonology – sound formation (from phonemes); Morphology – word formation (from morphemes); Syntax – phrase and sentence formation; Semantics – applying meaning to expressions; Pragmatics – how language is used; World knowledge – contextual information.
Processing Language Parsing – analyzing the syntactic structure of sentences, often resulting in a parse tree. Semantics – analyzing the meaning of sentences, resulting in semantic networks, logical statements, or another knowledge representation (KR). Integration of world knowledge – adding appropriate knowledge from the domain of discourse. Use of knowledge learned from the discourse.
Processing (cont'd) Often, the steps are done sequentially (parse the syntax of sentences, make semantic inferences, add domain knowledge, use the result), with the output of one stage becoming the input to the next stage. Alternatively, fragments may be passed along as soon as they are determined (incremental parsing). Feedback from later stages may be necessary to resolve references (“I shot the bear in my pajamas”) – blackboard systems.
Context-Free Grammars A good deal of syntax can be represented using context-free grammars (CFGs). Rules are of the form A <- B1, B2, ..., Bn, where A is a single non-terminal and each Bi is a terminal or a non-terminal. Non-terminals are syntactic categories; terminals are words (and punctuation). One non-terminal is “sentence”.
CFG Example
sent <- np, vp.
np <- noun.
np <- art, noun.
np <- art, adj, noun.
vp <- verb.
vp <- verb, np.
noun <- boy.
noun <- dog.
art <- a.
art <- the.
adj <- yellow.
verb <- runs.
verb <- pets.
Parsing Example
| ?- sent([the,boy,pets,a,dog],[],M).
M = s(np(art(the),noun(boy)),vp(verb(pets),np(art(a),noun(dog)))) ? ;
(1 ms) no
| ?- sent([the,yellow,dog,runs],[],M).
M = s(np(art(the),adj(yellow),noun(dog)),vp(verb(runs))) ? ;
no
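To make the parsing step concrete outside Prolog, here is a minimal Python sketch of a top-down, backtracking parser for the grammar in the CFG example. It is not the code behind the slides: the names GRAMMAR, parse, parse_sequence and parse_sentence are assumptions made for illustration, and trees are plain tuples labelled "sent" rather than the s(...) terms shown in the queries. Each call yields a tree together with the words left over, which plays the role of the second argument ([]) in the Prolog queries.

```python
# Grammar from the CFG example: each non-terminal maps to its alternatives.
GRAMMAR = {
    "sent": [["np", "vp"]],
    "np": [["noun"], ["art", "noun"], ["art", "adj", "noun"]],
    "vp": [["verb"], ["verb", "np"]],
    "noun": [["boy"], ["dog"]],
    "art": [["a"], ["the"]],
    "adj": [["yellow"]],
    "verb": [["runs"], ["pets"]],
}


def parse(symbol, tokens):
    """Yield (tree, remaining_tokens) for every way symbol can start tokens."""
    if symbol not in GRAMMAR:
        # Terminal: it must match the next word exactly.
        if tokens and tokens[0] == symbol:
            yield symbol, tokens[1:]
        return
    for rhs in GRAMMAR[symbol]:
        for children, rest in parse_sequence(rhs, tokens):
            yield (symbol, *children), rest


def parse_sequence(symbols, tokens):
    """Yield (list_of_trees, remaining_tokens) for a sequence of symbols."""
    if not symbols:
        yield [], tokens
        return
    for tree, rest in parse(symbols[0], tokens):
        for trees, final in parse_sequence(symbols[1:], rest):
            yield [tree] + trees, final


def parse_sentence(words):
    """Return every parse tree that consumes the whole word list."""
    return [tree for tree, rest in parse("sent", words) if not rest]


if __name__ == "__main__":
    for tree in parse_sentence(["the", "boy", "pets", "a", "dog"]):
        print(tree)
```

The generators give the same backtracking behaviour as typing ; at the Prolog prompt: each further result is an alternative parse, and an exhausted generator corresponds to the answer no.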
Semantics Since we can use arbitrary Prolog code, it is possible to add tests to the grammar rules. For example, we could include a type system and only allow parses that are consistent with the types (for example, only animate actors). In addition, we could return the meaning of the phrase or sentence instead of just a parse tree.
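The type test mentioned here might look roughly like the following Python sketch, which filters parse trees by checking that the subject noun is animate. Everything in it is an assumption for illustration: the NOUN_TYPES table, the word "rock" (which is not in the slides' grammar), the helper names, and the tuple layout of the trees.

```python
# Hypothetical type lexicon; "rock" is added only to show a rejected case.
NOUN_TYPES = {"boy": "animate", "dog": "animate", "rock": "inanimate"}


def head_noun(np_tree):
    """Return the noun inside an np tree such as ("np", ("art", "the"), ("noun", "boy"))."""
    for child in np_tree[1:]:
        if child[0] == "noun":
            return child[1]
    return None


def agent_is_animate(sent_tree):
    """Accept a ("sent", np, vp) tree only if its subject noun is typed animate."""
    _, subject_np, _ = sent_tree
    return NOUN_TYPES.get(head_noun(subject_np)) == "animate"


if __name__ == "__main__":
    tree = ("sent", ("np", ("art", "the"), ("noun", "boy")), ("vp", ("verb", "runs")))
    print(agent_is_animate(tree))
```

In a Prolog grammar the same check would sit inside the rule body, so an ill-typed parse fails during parsing instead of being filtered afterwards.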
Frame and Slot Notation In this simple example, we will use a frame and slot notation for the meaning of words, phrases, and sentences. A meaning will consist of a pair containing a head item and a list of slots, each of which is an attribute/value pair. Values may be variables to be instantiated at a later time.
Notation Examples For example, a verb in Simmons's semantic representation scheme has the attributes agent and object. The meaning of a verb, say, likes, could be represented by the term meaning([likes,[agent,X],[object,Y]], [[agent,X],[object,Y]]). X and Y will be instantiated by the meanings of other words and phrases of the sentence.
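As a rough illustration of slots that are filled in later, the Python sketch below models a verb frame with unbound agent and object slots. It only approximates the Prolog term: None stands in for an uninstantiated logic variable, and the names verb_meaning and fill are hypothetical.

```python
def verb_meaning(verb):
    """Frame for a verb: a head item plus agent/object slots (None = unbound)."""
    return {"head": verb, "slots": {"agent": None, "object": None}}


def fill(frame, attribute, value):
    """Return a copy of the frame with one slot filled; never overwrite a bound slot."""
    if attribute not in frame["slots"]:
        raise KeyError(attribute)
    if frame["slots"][attribute] is not None:
        raise ValueError(f"slot {attribute!r} is already bound")
    slots = dict(frame["slots"])
    slots[attribute] = value
    return {"head": frame["head"], "slots": slots}


if __name__ == "__main__":
    likes = verb_meaning("likes")
    print(fill(fill(likes, "agent", "boy"), "object", "dog"))
```

Refusing to overwrite a bound slot mimics the fact that a Prolog variable, once instantiated, cannot be rebound on the same proof path.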
More on Slots The attribute names may be semantic relationships (agent, object), or surface semantic relationships (adjmod – the thing modified by an adjective, or pobj – the object of a preposition). The slot filler must come from an appropriate part of the sentence as indicated by the grammar.
Another Example prep([over|R], R, meaning([V,[location,[over,X]]|A], [[pmod,[V|A]],[pobj,X]])). The preposition over will modify the subject of the preposition, V (indicated by pmod), which may already have a list of attributes A. The object of the preposition, X, is added to the list of attributes under the attribute name location with the value [over,X].
Semantics - Example
| ?- sent([i,shot,the,bear,in,my,pajamas],[],M).
M = meaning([shoot,[location,[in,[pajamas,[owner,me]]]],[agent,[i]],[object,[bear]],[time,past]],[]) ? ;
no
Phrase Structure Grammar These kinds of grammars are called phrase structure grammars. As implemented in Prolog, they have computing power equivalent to that of any Turing-complete system, yet they are simple to follow.
Alternative Methods Chart Parsing (Earley) – see book. Transition Network Parser: The grammar is represented as a set of finite state machines (transition diagrams). Each FSM implements a non-terminal. Arcs are labeled with non-terminals or terminals. In the former case, a subprogram is invoked (jump to the network for that non-terminal). A path from the start node to the end node indicates acceptance.
Augmented Transition Networks Procedures may be attached to arcs and are triggered when the arcs are traversed. A procedure may perform a test or set a variable to a value for later use. ATNs are often combined with KR schemes to produce a meaning for the sentence or phrase (semantics).
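A transition network parser of this kind can be sketched in Python as follows. This is an illustrative encoding, not taken from the slides or the book: the NETWORKS and LEXICON tables, the state numbers, and the function names are all assumptions, and the networks simply re-express the small grammar from the CFG example. Arcs labeled with a network name cause a jump into that network; arcs labeled with a word category consume one word.

```python
# One finite state machine per non-terminal (hypothetical encoding).
NETWORKS = {
    "sent": {"start": 0, "final": {2},
             "arcs": {0: [("np", 1)], 1: [("vp", 2)]}},
    "np": {"start": 0, "final": {3},
           "arcs": {0: [("art", 1), ("noun", 3)],
                    1: [("adj", 2), ("noun", 3)],
                    2: [("noun", 3)]}},
    "vp": {"start": 0, "final": {1, 2},
           "arcs": {0: [("verb", 1)], 1: [("np", 2)]}},
}

# Word categories used as terminal arc labels.
LEXICON = {
    "noun": {"boy", "dog"},
    "art": {"a", "the"},
    "adj": {"yellow"},
    "verb": {"runs", "pets"},
}


def traverse(name, words, pos):
    """Yield each position reachable once network `name` accepts words from pos."""
    net = NETWORKS[name]
    stack = [(net["start"], pos)]
    while stack:
        state, i = stack.pop()
        if state in net["final"]:
            yield i
        for label, nxt in net["arcs"].get(state, []):
            if label in NETWORKS:
                # Non-terminal arc: run the sub-network, then continue from nxt.
                for j in traverse(label, words, i):
                    stack.append((nxt, j))
            elif i < len(words) and words[i] in LEXICON[label]:
                # Terminal arc: consume one word of the right category.
                stack.append((nxt, i + 1))


def accepts(words):
    """True if some path through the sent network consumes every word."""
    return any(end == len(words) for end in traverse("sent", words, 0))


if __name__ == "__main__":
    print(accepts(["the", "yellow", "dog", "runs"]))
```

This sketch only recognizes sentences. Building a parse tree or a meaning would require recording information as arcs are traversed.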
Uses of Natural Language Database front-end. Question answering. Information extraction and summarization (Web). Next-generation computing. Better than keyword search – incorporates context.