Natural Language Processing CS480/580. Levels of Linguistic Analysis Phonology---recognize speech sounds Morphology---analysis of word forms (e.g., adding.

Slides:



Advertisements
Similar presentations
Artificial Intelligence: Natural Language and Prolog
Advertisements

Computational language: week 10 Lexical Knowledge Representation concluded Syntax-based computational language Sentence structure: syntax Context free.
CSA2050: DCG I1 CSA2050 Introduction to Computational Linguistics Lecture 8 Definite Clause Grammars.
07/05/2005CSA2050: DCG31 CSA2050 Introduction to Computational Linguistics Lecture DCG3 Handling Subcategorisation Handling Relative Clauses.
C O N T E X T - F R E E LANGUAGES ( use a grammar to describe a language) 1.
Syntax. Definition: a set of rules that govern how words are combined to form longer strings of meaning meaning like sentences.
Syntactic analysis using Context Free Grammars. Analysis of language Morphological analysis – Chairs, Part Of Speech (POS) tagging – The/DT man/NN left/VBD.
Grammars, constituency and order A grammar describes the legal strings of a language in terms of constituency and order. For example, a grammar for a fragment.
1 Grammars and Parsing Allen ’ s Chapters 3, Jurafski & Martin ’ s Chapters 8-9.
DEFINITE CLAUSE GRAMMARS Ivan Bratko University of Ljubljana Faculty of Computer and Information Sc.
Chapter Chapter Summary Languages and Grammars Finite-State Machines with Output Finite-State Machines with No Output Language Recognition Turing.
For Monday Read Chapter 23, sections 3-4 Homework –Chapter 23, exercises 1, 6, 14, 19 –Do them in order. Do NOT read ahead.
NLP and Speech Course Review. Morphological Analyzer Lexicon Part-of-Speech (POS) Tagging Grammar Rules Parser thethe – determiner Det NP → Det.
LING 438/538 Computational Linguistics Sandiway Fong Lecture 7: 9/12.
Natural Language Processing AI - Weeks 19 Natural Language Processing PART 2 Lee McCluskey, room 2/07
PZ02A - Language translation
Introduction to the Theory of Computation John Paxton Montana State University Summer 2003.
1 CONTEXT-FREE GRAMMARS. NLE 2 Syntactic analysis (Parsing) S NPVP ATNNSVBD NP AT NNthechildrenate thecake.
1 Introduction: syntax and semantics Syntax: a formal description of the structure of programs in a given language. Semantics: a formal description of.
Chapter 3: Formal Translation Models
Natural Language Processing AI - Weeks 19 Natural Language Processing PART 2 Lee McCluskey, room 2/07
UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Using Definite Knowledge: NLP and nl_interface.pl Notes for Ch.3 of Poole et.
LING 388: Language and Computers Sandiway Fong Lecture 13: 10/10.
Context-Free Grammar CSCI-GA.2590 – Lecture 3 Ralph Grishman NYU.
Models of Generative Grammar Smriti Singh. Generative Grammar  A Generative Grammar is a set of formal rules that can generate an infinite set of sentences.
LING 388: Language and Computers Sandiway Fong Lecture 8.
LING 388: Language and Computers Sandiway Fong 10/4 Lecture 12.
11 CS 388: Natural Language Processing: Syntactic Parsing Raymond J. Mooney University of Texas at Austin.
Context Free Grammars Reading: Chap 12-13, Jurafsky & Martin This slide set was adapted from J. Martin, U. Colorado Instructor: Paul Tarau, based on Rada.
Computing Science, University of Aberdeen1 CS4025: Grammars l Grammars for English l Features l Definite Clause Grammars See J&M Chapter 9 in 1 st ed,
Prolog and grammars Prolog’s execution strategy is useful for writing parsers for natural language (eg. English) and programming languages (interpreters.
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
Computer Science 112 Fundamentals of Programming II Recursive Processing of Languages.
ICS611 Introduction to Compilers Set 1. What is a Compiler? A compiler is software (a program) that translates a high-level programming language to machine.
1 CS 385 Fall 2006 Chapter 14 Understanding Natural Language (omit 14.4)
For Friday Finish chapter 23 Homework: –Chapter 22, exercise 9.
LING 388: Language and Computers Sandiway Fong Lecture 7.
GRAMMARS David Kauchak CS159 – Fall 2014 some slides adapted from Ray Mooney.
Syntax and Backus Naur Form
Syntax: 10/18/2015IT 3271 Semantics: Describe the structures of programs Describe the meaning of programs Programming Languages (formal languages) -- How.
A sentence (S) is composed of a noun phrase (NP) and a verb phrase (VP). A noun phrase may be composed of a determiner (D/DET) and a noun (N). A noun phrase.
Context Free Grammars Reading: Chap 9, Jurafsky & Martin This slide set was adapted from J. Martin, U. Colorado Instructor: Rada Mihalcea.
Syntax Why is the structure of language (syntax) important? How do we represent syntax? What does an example grammar for English look like? What strategies.
Rules, Movement, Ambiguity
Artificial Intelligence: Natural Language
The man bites the dog man bites the dog bites the dog the dog dog Parse Tree NP A N the man bites the dog V N NP S VP A 1. Sentence  noun-phrase verb-phrase.
CSA2050 Introduction to Computational Linguistics Parsing I.
Syntax The Structure of a Language. Lexical Structure The structure of the tokens of a programming language The scanner takes a sequence of characters.
1 Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Natural Language Processing
CSE 105 Theory of Computation Alexander Tsiatas Spring 2012 Theory of Computation Lecture Slides by Alexander Tsiatas is licensed under a Creative Commons.
Computing Science, University of Aberdeen1 CS4025: Grammars l Grammars for English l Features l Definite Clause Grammars See J&M Chapter 9 in 1 st ed,
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
CPSC 422, Lecture 27Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 27 Nov, 16, 2015.
SYNTAX.
English Syntax Read J & M Chapter 9.. Two Kinds of Issues Linguistic – what are the facts about language? The rules of syntax (grammar) Algorithmic –
CSC312 Automata Theory Lecture # 26 Chapter # 12 by Cohen Context Free Grammars.
Syntax By WJQ. Syntax : Syntax is the study of the rules governing the way words are combined to form sentences in a language, or simply, the study of.
Formal grammars A formal grammar is a system for defining the syntax of a language by specifying sequences of symbols or sentences that are considered.
Natural Language Processing Vasile Rus
Definite Clause Grammars
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
David Kauchak CS159 – Spring 2019
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Presentation transcript:

Natural Language Processing CS480/580

Levels of Linguistic Analysis Phonology---recognize speech sounds Morphology---analysis of word forms (e.g., adding s to make a plural etc.) Syntax---sentence structure Semantics---meaning Pragmatics---relation of language to context

Tokenization A string broken into words, punctuations removed, and key information represented as a sequence of words or tokens. E.g., “How are you today?” is converted to [how, are, you, today].

Tokenize.pl lower_case(A, B) :- A>=65, A=<90, !, B is A+32. lower_case(A, A). tokenize([], []) :- !. tokenize(A, [B|E]) :- grab_word(A, C, D), name(B, C), tokenize(D, E). punctuation_mark(A) :- A=<47. punctuation_mark(A) :- A>=58, A=<64. punctuation_mark(A) :- A>=91, A=<96. punctuation_mark(A) :- A>=123. grab_word([32|A], [], A) :- !. grab_word([], [], []). grab_word([A|B], C, D) :- punctuation_mark(A), !, grab_word(B, C, D). grab_word([D|A], [E|B], C) :- grab_word(A, B, C), lower_case(D, E). tokenize("This is CS480/580 course", X). X = [this, is, cs480580, course]. name(john,X). X = [106, 111, 104, 110].

Template System Templates --- stored sentence patterns Each template is accompanied by a translation schema E.g., [X, is, a, Y] is translated to Y(X). process([X, is, a, Y]) :- Fact =.. [Y, X], assert(Fact). Process([is, X, a T]) :- Query =.. [Y, X], call(Query).

Template.pl grab_word([32|A], [], A) :- !. grab_word([], [], []). grab_word([A|B], C, D) :- punctuation_mark(A), !, grab_word(B, C, D). grab_word([D|A], [E|B], C) :- grab_word(A, B, C), lower_case(D, E). punctuation_mark(A) :- A=<47. punctuation_mark(A) :- A>=58, A=<64. punctuation_mark(A) :- A>=91, A=<96. punctuation_mark(A) :- A>=123. lower_case(A, B) :- A>=65, A=<90, !, B is A+32. lower_case(A, A). write_str([A|B]) :- put(A), write_str(B). write_str([]). read_str_aux(-1, []) :- !. read_str_aux(10, []) :- !. read_str_aux(13, []) :- !. read_str_aux(A, [A|B]) :- read_str(B). do_one_sentence :- write(>), read_str(A), tokenize(A, B), process(B). note(A) :- asserta(A), write('OK'), nl. read_atom(A) :- read_str(B), name(A, B). start :- write('TEMPLATE.PL at your service.'), nl, write('Terminate by pressing Break.'), nl, repeat, do_one_sentence, fail. check(A) :- call(A), !, write('Yes.'), nl. check(_) :- write('Not as far as I know.'), nl. read_num(A) :- read_str(B), name(A, B).

remove_s(A, C) :- name(A, B), remove_s_list(B, D), name(C, D). read_str(B) :- get0(A), read_str_aux(A, B). remove_s_list([115], []). remove_s_list([A|B], [A|C]) :- remove_s_list(B, C). process([B, is, a, A]) :- !, C=..[A, B], note(C). process([A, is, an, B]) :- !, process([A, is, a, B]). process([is, B, a, A]) :- !, C=.. [A, B], check(C). process([is, A, an, B]) :- !, process([is, A, a, B]). process([A, are, B]) :- !, remove_s(A, D), remove_s(B, C), F=..[C, E], G=..[D, E], note((F:-G)). process([does, B, A]) :- !, C=..[A, B], check(C). process([A, B]) :- \+ remove_s(A, _), remove_s(B, C), !, D=..[C, A], note(D). process([A, B]) :- remove_s(A, C), \+ remove_s(B, _), !, E=..[B, D], F=..[C, D], note((E:-F)). process(_) :- write('I do not understand.'), nl. tokenize([], []) :- !. tokenize(A, [B|E]) :- grab_word(A, C, D), name(B, C), tokenize(D, E). start. TEMPLATE.PL at your service. Terminate by pressing Break. >CS480 is a course. OK >is CS480 a course? Yes. >is cs471 a course? Not as far as I know. >cs471 is a course. OK >is cs471 a course? Yes.

Generative Grammars Templates are inadequate to describe human language (in the last example only sentences that were allowed was X is a Y.) John arrived Max said John arrived Bill claimed Max said John arrived Mary thought Bill claimed Max said John arrived Chomsky’s suggestion: Treat syntax as a problem in set theory---express infinite set as a finite description

Context Free Grammars Phrase Structure Rules – S  NP VP – NP  Det N – N  N PP – N  N N – PP  P NP – VP  IV VP  TV NP VP  DV NP NP Lexical Entries – N  book, cow, course, … – P  in, on, with, … – Det  the, every, … – IV  ran, hid, … – TV  likes, hit, … – DV  gave, showed Noam Chomsky

Context-Free Derivations S  NP VP  Det N VP  the N VP  the kid VP  the kid IV  the kid ran Penn TreeBank bracketing notation (Lisp-like) – (S (NP (Det the) (N kid)) (VP (IV ran))) Theorem: A sequence has a derivation if and only if it has a parse tree

“Standard” Parse Tree Notation

A simple Parser verb_phrase(A, C) :- verb(A, B), noun_phrase(B, C). verb_phrase(A, C) :- verb(A, B), sentence(B, C). determiner([the|A], A). determiner([a|A], A). sentence(A, C) :- noun_phrase(A, B), verb_phrase(B, C). noun_phrase(A, C) :- determiner(A, B), noun(B, C). noun([dog|A], A). noun([cat|A], A). noun([boy|A], A). noun([girl|A], A). verb([chased|A], A). verb([saw|A], A). verb([said|A], A). verb([believed|A], A). 2 ?- sentence([the, cat, saw, the, dog], []). true. 3 ?- sentence([the, dog, saw, the, dog], []). true. 4 ?- sentence([a, dog, chased, the, cat], []). true. 5 ?- sentence([that, dog, chased, the, cat], []). false.

Definite Clause Grammar (DCG) This is a Prolog notation to provide an easy way to write grammar rules. E.g., sentence  non_phrase, verb_phrase. This is equivalent to the rule: – sentence(X,Z) :- noun_phrase(X,Y), verb_phrase(Y,Z). Also, noun  [dog] or noun  [dog] [cat]; [boy]; [girl] or verb  [gives, up] where “gives up” is a single verb. A query to the above sentence rule will be sentence/2 E.g., sentence([the dog, chased, the, cat],[]). Try sentence([A,B,C,D,E],[]) or sentence([the, A, B, C, cat|E],[]). Non-terminal symbols can also take arguments: e.g., sentence(N)  noun_phrase(N), verb_phrase(N).

Parser2.pl based on DCG sentence --> noun_phrase, verb_phrase. noun_phrase --> determiner, noun. verb_phrase --> verb, noun_phrase. verb_phrase --> verb, sentence. determiner --> [the]. determiner --> [a]. noun --> [dog]; [cat]; [boy]; [girl]. verb --> [chased]; [saw]; [said]; [believed]. verb --> [saw]. verb --> [said]. verb --> [believed].

Grammatical Features How to handle agreement in tense and number between the noun and the verb? sentence(N) --> noun_phrase(N), verb_phrase(N). noun_phrase(N) --> determiner(N), noun(N). verb_phrase(N) --> verb(N), noun_phrase(_). verb_phrase(N) --> verb(N), sentence. determiner(singular) --> [a]. determiner(_) --> [the]. determiner(plural) --> []. noun(singular) --> [dog];[cat];[boy];[girl]. noun(plural) --> [dogs];[cats];[boys];[girls]. verb(singular) --> [chases];[sees];[says];[believes]. verb(plural) --> [chase];[see];[say];[believe].

sentence(plural, [the, dogs, A, B, C],[]). A = chase, B = a, C = dog ; A = chase, B = a, C = cat ; A = chase, B = a, C = boy ; A = chase, B = a, C = girl ; A = chase, B = the, C = dog

Morphology How to generate plural nouns from singular? How to generate third person singular verbs from plural verbs? Mostly by adding: s

Sentence(N) --> noun_phrase(N), verb_phrase(N). noun_phrase(N) --> determiner(N), noun(N). verb_phrase(N) --> verb(N), noun_phrase(_). verb_phrase(N) --> verb(N), sentence. determiner(singular) --> [a]. determiner(_) --> [the]. determiner(plural) --> []. noun(N) --> [X], { morph(noun(N),X) }. verb(N) --> [X], { morph(verb(N),X) }. morph(noun(singular),dog). % Singular nouns morph(noun(singular),cat). morph(noun(singular),boy). morph(noun(singular),girl). morph(noun(singular),child). morph(noun(plural),children). % Irregular plural nouns morph(noun(plural),X) :- % Rule for regular plural nouns remove_s(X,Y), morph(noun(singular),Y). morph(verb(plural),chase). % Plural verbs morph(verb(plural),see). morph(verb(plural),say). morph(verb(plural),believe). morph(verb(singular),X) :- % Rule for singular verbs remove_s(X,Y), morph(verb(plural),Y). % remove_s(+X,-X1) [lifted from TEMPLATE.PL] % removes final S from X giving X1, % or fails if X does not end in S. remove_s(X,X1) :- name(X,XList), remove_s_list(XList,X1List), name(X1,X1List). remove_s_list("s",[]). remove_s_list([Head|Tail],[Head|NewTail]) :- remove_s_list(Tail,NewTail).

morph(verb(plural),chase). % Plural verbs morph(verb(plural),see). morph(verb(plural),say). morph(verb(plural),believe). morph(verb(singular),X) :- % Rule for singular verbs remove_s(X,Y), morph(verb(plural),Y). % remove_s(+X,-X1) [lifted from TEMPLATE.PL] % removes final S from X giving X1, % or fails if X does not end in S. remove_s(X,X1) :- name(X,XList), remove_s_list(XList,X1List), name(X1,X1List). remove_s_list("s",[]). remove_s_list([Head|Tail],[Head|NewTail]) :- remove_s_list(Tail,NewTail).