Presentation is loading. Please wait.

Presentation is loading. Please wait.

Izmit, June 16, 2012 Man-Machine Communication Parsing and Semantic analysis of Natural Language Adam Mickiewicz University in Poznań Dept of Computer.

Similar presentations

Presentation on theme: "Izmit, June 16, 2012 Man-Machine Communication Parsing and Semantic analysis of Natural Language Adam Mickiewicz University in Poznań Dept of Computer."— Presentation transcript:

1 Izmit, June 16, 2012 Man-Machine Communication Parsing and Semantic analysis of Natural Language Adam Mickiewicz University in Poznań Dept of Computer Linguistics and Artificial Intelligence Zygmunt Vetulani

2 Izmit, June 12, 2014

3 Formal Language Terminal symbols : Polish (Turksh ?) words Non-terminal symbols : grammar category symbols (S, Sb, Ob,...) Initial symbol : S Context free rules (rewritting rules, production rules): ordered pairs, where LS is a single non-terminal symbol, RS is a finite sequence of terminal and/or non-terminal symbols Notation : instead we use to write LS  RS The re-writing operation consists in replacement od the non-terminal symbol LS by the sequence PS for some rule LS  PS Context free grammar is an arbitrary set of context-free rules Izmit, June 12, 2014

4 Derivation: P  */G W : „W is the result of a finite iteration of the rewriting operations (first applied to P) on the basis of the context-free rewriting rules from G: By the „language defined by the grammar G” we mean the following set L(G): L(G) = df {W: for S being the initial symbol S  */G W, where W is a sequence of terminal symbols} IDEA: Consider Polish (Turkish etc.) as formal language (see Montague) Computational PROBLEM : Find such a grammar that: L(G) = the set of all correct sentences of Polish (similar for Turkish, English, …) Izmit, June 12, 2014

5 For the CF grammars there exist effective parsing algorithms BUT some (rare) natural languages are not context-free. (as e.g. one of the Swiss-German dialects (by Zürich), where constructions like NP 1 NP 2 NP 3 NP 4 NP 5 V 1 V 2 V 3 V 4 V 5 are in use. In (Gazdar & Mellish, 1989), p. 134, one can find an example (in English transcription) where the sentence with the meaning equivallent to Claudia watched Helmut (who) let Eva to help Hans make Ulrike work will have the surface form as follows Claudia Helmut Eva Hans Ulrike watched let help make work. Izmit, June 12, 2014

6 An example of a CF grammar for a small fragment of Polish: a grammar for the sentence „Ala owns a nice cat” ("Ala ma ładnego kota") (and some other). S  Sb Pred Sb  NG NG  Name NG  Adj Noun NG  NG Conective NG Pred  Verb NG Verb  "ma" Verb  "miał" Verb  "miała”. Name  "Ala" Name  "Olek" Adj  "ładnego" Adj  "złego" Noun  "kota" Noun  "psa" Connective  "oraz " Connective  ”i " Izmit, June 12, 2014 S stands for „sentence” Sb stands for „subject” NG stands for „noun group” Pred stands for „predication phrase”

7 DCG notation ( DCG – Definite Clause Grammar) The DCG notation allows the usage of parameters in combination with the non-terminal and terminal symbols. These symbols must open special argument positions for parameters Nonterminal symbols with parameters (arguments) are terms in which argument positions are occupied by these parameters (constants or variables). Example : definition of a „sentence” in the DCG notation : S  Sb(Number,Number,nominative) Pred(Number,Gender) Pred(Number,Gender)  Verb(Number,Gender) NG(_,_,accusative)... Izmit, June 12, 2014

8 Translating grammars to PROLOG CF grammars may be „translated” to PROLOG using the method of difference lists By difference list we mean a pair of two lists, the second of them is the n-th tail of the first one. Let (A,B) be a difference list. Then what remains from A after the substraction of B is called difference determined by (A,B). For the non-terminal symbol K and the difference list (A,B) the predicate K(A,B) may be interpreted as follows: ”the category K is attributed to the difference determined by the difference list (A,B)” Izmit, June 12, 2014

9 The context free rule C  A B may be interpreted as follows „the given list is of category C if its beginning is of the category A and the rest is of the category B”. In terms of difference lists we may say it as follows : „if the difference (X0,X1) is of the category A, and the difference (X1,X2) is of the category B, then the difference (X0,X2) is of the category C". This last statement is expressed in PROLOG as follows: c(X0,X2) :- a(X0,X1), b(X1,X2). The rule C  A t B where t is a terminal symbol (or a sequence of terminal symbols) may be expressed in PROLOG as follows: c(X0,X2) :- a(X0,[t|X1]), b(X1,X2). The terminal rule C  t is to be translated into PROLOG as: c([t|X],X). Izmit, June 12, 2014

10 The CF rule K  t 0 M 0 t 1 M 1...t n M n t n+1 where t i is a sequence of terminal symbols may be translated into PROLOGU as: k([t 0 |X 0 ],X n+1 ) :- m 0 (X 0,[t 1 |X 1 ]), m 1 (X 1,[t 2 |X 2 ]),..., m n (X n,[t n+1 |X n+1 ]) The translation operation may be extended to context depending rules of the following form Kt  t 0 M 0 t 1 M 1...t n M n t n+1 In this case the correct translation will be: k([t 0 |X 0 ],[t|X n+1 ]) :- m 0 (X 0,[t 1 |X 1 ]), m 1 (X 1,[t 2 |X 2 ]),..., m n (X n,[t n+1 |X n+1 ]) (In this notation [t|X] stands for concatenation of the list t with the list X. ) Izmit, June 12, 2014

11 Let us notice, that the following is a conditional statement about sequences of words in terms of relations (predicates) associating categories to difference lists. It is possible to generalise this formalisation considering category symbols as parameters denoting relations (rather then predicate symbols). For this purpose we introduce a general relation segment. We then translate S  Sb Pred into segment(s,X 0,X 2 ) :- segment(sb,X 0,X 1 ),segment(pred,X 1,X 2 ). We may of course impose further constraints on the composition of larger fragments from the smaller ones (as parts). E.g. the clause: segment(s,X 0,X 2 ) :- segment(sb,L,R,X 0,X 1 ),segment(pred,L,R,X 1,X 2 ). may be considered as the PROLOG translation of the CF-like parametrized rule S  Sb(L,R) Pred(L,R) Izmit, June 12, 2014

12 Conversion program (in PROLOG) of CF rules into PROLOG (according: Gazdar, Mellish, p. 114) konwersja((LSwe->PSwe),(LSwy:-PSwy)) :- LSwy=..[LSwe,S0,SN], dodzmienne(PSwe,S0,SN,PSwy). dodzmienne((PSwe1,PSwe2),S0,SN,PSwy) :- !, dodzmienne(PSwe1,S0,S1,PSwy1), dodzmienne(PSwe2,S1,SN,PSwy2), kombinacja(PSwy1,PSwy2,PSwy). dodzmienne(PSwe,S0,SN,true) :- islist(PSwe),!, append(PSwe,SN,S0). dodzmienne(PSwe,S0,SN,PSwy) :- atom(PSwe), PSwy=..[PSwe,S0,SN]. kombinacja(true,PSwy2,PSwy2):-!. kombinacja(PSwy1,true,PSwy1):-!. kombinacja(PSwy1,PSwy2,(PSwy1,PSwy2)). Izmit, June 12, 2014

13 islist([]). islist([_|_]). % conversion of CF rules to Prolog konw_gram(G,P):-islist(G),konw_gram0(G,P). konw_gram0([],[]):-!. konw_gram0([R|G],[C|P]):-konwersja(R,C),konw_gram0(G,P). % conversion with the tnstalation procedure (in PROLOG), where assertz_prog performs the asert of the translated clauses install_gram(G):-konw_gram(G,P),assertz_prog(P). assertz_prog(P):-islist(G),assertz_prog0(P). assertz_prog0([]):-!. assertz_prog0([C|P]):-assertz(C),assertz_prog0(P). Notice: the above conversion procedure is simplified and the produced code may not behave correctly because of the „left-recursion” (it may need to be corrected manually) Izmit, June 12, 2014

14 PROLOG translation of the grammar"Ala-ma-kota" into a RECOGNIZER s(X0,X2) :- sb(X0,X1), pred(X1,X2). sb(X0,X1) :- ng(X0,X1). ng(X0,X1) :- name(X0,X1). ng(X0,X1) :- noun(X0,X1). ng(X0,X2):- adj(X0,X1), noun(X1,X2). ng(X0,X2):- adj(X0,X1), name(X1,X2). ng(X0,X2):- ng(X0,["oraz"|X1]), ng(X1,X2). %left-recursion in „ng” pred(X0,X2):- verb(X0,X1), ng(X1,X2). verb(["ma"|X0],X0). verb(["miała"|X0],X0). verb(["kupił"|X0],X0). verb(["zobaczył"|X0],X0). verb(["zobaczyła"|X0],X0). adj(["ładna"|X0],X0). adj(["ładnego"|X0],X0). adj(["zły"|X0],X0). adj(["zła"|X0],X0). adj(["złego"|X0],X0). name(["ala"|X0],X0). name(["alę"|X0],X0). name(["olek"|X0],X0). name(["olka"|X0],X0). noun(["kot"|X0],X0). noun(["kota"|X0],X0). noun(["pies"|X0],X0). noun(["psa"|X0],X0)............ Izmit, June 12, 2014

15 Left-recursion solving ng(X0,X1) :- ng0(X0,X1). ng(X0,X1) :- ng1(X0,X1). ng0(X0,X1) :- name(X0,X1). ng0(X0,X1) :- noun(X0,X1). ng0(X0,X2):- adj(X0,X1), noun(X1,X2). ng0(X0,X2):- adj(X0,X1), name(X1,X2). ng1(X0,X2):- ng0(X0,["oraz"|X1]), ng(X1,X2). %left-recursion eliminated pred(X0,X2):- verb(X0,X1), ng(X1,X2). %Conjunction ng1(X0,X3):- ng0(X0,X1]), connective(X1,X2), ng(X2,X3). connective(["i"|X0],X0). connective(["oraz"|X0],X0). connective(["lub"|X0],X0). connective([","|X0],X0). Izmit, June 12, 2014

16 PROLOG translation of the grammar "Ala-ma-kota" into a PARSER s(sentence(A,B),X0,X2) :- sb(A,X0,X1),pred(B,X1,X2). pred(predicate_group(A,B),X0,X2):- verb(A,X0,X1), ng(B,X1,X2). sb(subject(A),X0,X1) :- ng(A,X0,X1). ng(noun_gr_simple(A),X0,X1) :- ng0(A,X0,X1). ng(noun_gr_comp(A),X0,X1) :- ng1(A,X0,X1). ng0(name(A),X0,X1) :- name(A,X0,X1). ng0(noun(A),X0,X1) :- noun(A,X0,X1). ng0(adj_noun(A,B),X0,X2):- adj(A,X0,X1), noun(B,X1,X2). ng0(adj_name(A,B),X0,X2):- adj(A,X0,X1), name(B,X1,X2). ng1(noun_gr_con(A,B,C),X0,X3):- ng0(A,X0,X1]), connective(B,X1,X2), ng(C,X2,X3). Izmit, June 12, 2014

17 verb(verb(['mieć',3,sing,_,present]),["ma"|X0],X0). verb(verb(['mieć',3,sing,masc.prze]),["miał"|X0],X0). verb(verb(['mieć',3,sing,fem,past]),["miała"|X0],X0). verb(verb(['kupić',3,sing,masc.prze]),["kupił"|X0],X0). verb(verb(['zobaczyć',3,sing,masc.prze]),["zobaczył"|X0],X0). verb(verb(['zobaczyć',3,sing,fem,past]),["zobaczyła"|X0],X0). adj(adjective(['ładny',nom,sing,fem]),["ładna"|X0],X0). adj(adjective(['ładny',gen,sing,fem]),["ładnej"|X0],X0). adj(adjective(['ładny',acc,sing,fem]),["ładną"|X0],X0). adj(adjective(['ładny',nom,sing,m]),["ładny"|X0],X0). adj(adjective(['ładny',gen,sing,m]),["ładnego"|X0],X0). adj(adjective(['ładny',acc,sing,m]),["ładnego"|X0],X0). adj(adjective(['zły',nom,sing,m]),["zły"|X0],X0). adj(adjective(['zły',nom,sing,fem]),["zła"|X0],X0). adj(adjective(['zły',gen,sing,m]),["złego"|X0],X0). adj(adjective(['zły',acc,sing,m]),["złego"|X0],X0). name(name(['Ala',nom,sing,fem]),["ala"|X0],X0). name(name(['Ala',nom,acc,fem]),["alę"|X0],X0). name(name(['Olek',nom,sing,m]),["olek"|X0],X0). name(name(['Olek',acc,sing,m]),["olka"|X0],X0). noun(noun(['kot',nom,sing,m]),["kot"|X0],X0). noun(noun(['kot',acc,sing,m]),["kota"|X0],X0). noun(noun(['pies',nom,sing,m]),["pies"|X0],X0). noun(noun(['pies',acc,sing,m]),["psa"|X0],X0). connective(connective(['i']),["i"|X0],X0). connective(connective(['oraz']),["oraz"|X0],X0). connective(connective(['lub']),["lub"|X0],X0). connective(connective(['przecinek']),[","|X0],X0). Izmit, June 12, 2014

18 Parsing of a sentence " Ala ma ładnego kota " Prolog question: ?- s(T,['ala', 'ma', 'ładnego', 'kota'],[ ]). Answer: T=sentence(subject(noun_gr_simple(name(['Ala',nom,sing,fem]))), predicate_group(verb(['mieć',3,sing,_,present]),noun_gr_simple(ad j_noun(adjective(['ładny',gen,sing,m])),noun(['kot',acc,sing,m] )) )) Notice: modification of a DCG resulted with a self-parsing grammar. Izmit, June 12, 2014

19 A self-parsing grammar may be extended to a semantic grammar through a modification of syntactic production rules with semantic evaluation procedures. The rule structure(...) :- substructure_a(...), substructure_b(...). will be transformed to structure(..., Structure_meaning)):- substructure_a(...,Substructure_meaning_a), substructure_b(..., Substructure_meaning_b), semantics(Substructure_meaning_a, Substructure_meaning_b, Structure_meaning). ‘semantics’ performs meaning composition Example below: application to „Ala ma kota” Izmit, June 12, 2014

20 Assumption: semantic representation of an affirmative sentence (=semantic content) provides information which support what the sentence tell us about the world („world”= database) Remark: Answering questions „czy...?” consists in application of a semantic procedure which calculates the semantic content of the affirmative text following the question word „czy” („whether”) Izmit, June 12, 2014

21 An example of a „world description” in PROLOG owns(ala,kot1). owns(ala,kot2). owns(ola,kot1). owns(ola,pies1). cats(kot1). cats(kot2). dogs(pies1). beatiful(kot1). Izmit, June 12, 2014

22 The Ala-ma-kota grammar with semantics % semantics of an affirmative sentence reports on how the sentence is validated in the world s(sentence(A,B),S,X0,X2) :- sb(A,S1,X0,X1),sb(B,S2,X1,X2),semantics(sentence,S1,S2,S). go(predicate_group(A,B),S,X0,X2):- verb(A,S1,X0,X1), ng(B,S2,X1,X2), semantics(predicate_group,S1,S2,S). sb(subject(A),S,X0,X1) :- ng(A,S,X0,X1). ng(noun_gr_simple(A),S,X0,X1) :- ng0(A,S,X0,X1). ng(noun_gr_comp(A),S,X0,X1) :- ng1(A,S,X0,X1). ng0(name(A),S,X0,X1) :- name(A,S1,X0,X1),semantics(name,S1,S). ng0(noun(A),S,X0,X1) :- noun(A,S1,X0,X1),semantics(rz,S1,S). ng0(adj_noun(adjective([F1|C]),noun([F2|C])),S,X0,X2):- adj(adjective([F1|C]),S1,X0,X1), noun(noun([F2|C]),S2,X1,X2), semantics(adj_noun,S1,S2,S). ng0(adj_name(adjective([F1|C]),prop_name([F2|C])),S,X0,X2):- adj(adjective([F1|C]),S1,X0,X1), name(prop_name([F2|C]),S2,X1,X2), semantics(adj_name,S1,S2,S). ng1(gr_noun_conn(A,B,C),S,X0,X3):- ng0(A,S1,X0,X1), connective(B,S2,X1,X2), ng(C,S3,X2,X3), semantics(noun_comp,S1,S2,S3,S). Izmit, June 12, 2014

23 Ala-ma-kota dictionary verb(verb(['miecx',3,sing,_,present]),owns,['ma'|X0],X0). verb(verb(['miecx',3,sing,masc.prze]),owns,['mialx'|X0],X0). verb(verb(['miecx',3,sing,fem,past]),owns,['mialxa'|X0],X0). verb(verb(['miecx',3,mn,_,present]),owns,['majax'|X0],X0). przym(adjective(['lxadny',nom,sing,fem]),beatiful,['lxadna'|X0],X0). adj(adjective(['lxadny',gen,sing,fem]),beatiful,['lxadnej'|X0],X0). adj(adjective(['lxadny',acc,sing,fem]),beatiful,['zxadnax'|X0],X0). adj(adjective(['lxadny',nom,sing,m]),beatiful,['lxadny'|X0],X0). adj(adjective(['lxadny',gen,sing,m]),beatiful,['lxadnego'|X0],X0). adj(adjective(['lxadny',acc,sing,m]),beatiful,['lxadnego'|X0],X0). name(prop_name(['Ala',nom,sing,fem]),ala,[’Ala'|X0],X0). name(prop_name(['Ala',nom,acc,fem]),ala,[’Alex'|X0],X0). name(prop_name(['Ola',nom,sing,fem]),ola,[’Ola'|X0],X0). name(prop_name(['Ola',acc,sing,fem]),ola,[’Olex'|X0],X0). zaimek_pytajny(zaim_pyt([’kogo',acc,sing,m]),kogo,['kogo'|X0],X0). noun(noun(['kot',nom,sing,m]),cats,['kot'|X0],X0). noun(noun(['kot',acc,sing,m]),cats,['kota'|X0],X0). noun(noun(['pies',nom,sing,m]),dogs,['pies'|X0],X0). noun(noun(['pies',acc,sing,m]),dogs,['psa'|X0],X0). spojnik(connective(['i']),i,['i'|X0],X0). connective(connective(['oraz']),i,['oraz'|X0],X0). Izmit, June 12, 2014

24 Semantic and auxiliary procedures: semantics(adj_noun,S1,S2,S) :- ekstensja_konceptu(S1,E1),ekstensja_konceptu(S2,E2), przekroj(E2,E1,S). semantics(name_z_prz,S1,S2,S) :- ekstensja_konceptu(S1,E1), ekstensja_nazwy(S2,E2), przekroj(E2,E1,S). semantics(rz,S,E) :- ekstensja_konceptu(S,E). semantics(name,S,E) :- ekstensja_nazwy(S,E). semantics(noun_comp,S1,i,S3,S) :- suma_zbiorow(S1,S3,S). semantics(predicate_group,S1,S2,[S1,S2]). semantics(sentence,S1,[S2,S3],S):-lista_faktow(S2,S1,S3,S). semantics(sentence,[ala],[owns,[kot1,kot2]],S) ekstensja_konceptu(S,E):-T=..[S,X], setof(X,T,E). ekstensja_nazwy(S,[S]). lista_faktow(P,D1,D2,F) :- T=..[P,X,Y],setof(T,(nal-do(X,D1),nal-do(Y,D2),T), F). lista_faktow(owns,[ala],[kot1,kot2],F) suma_zbiorow(X,Y,Z) :- append(X,Y,Z0),bez_powtorzen(Z0,Z). nal-do(E,[E|_]). nal-do(E,[_|X]):-nal-do(E,X). Izmit, June 12, 2014

25 Example of a query: ?- s(X,Y,[‘Ala’,’i’,’Ola’,’mają’,’kota’],[ ]). Answer: X=sentence(subject(noun_gr_comp(gr_noun_connectiv e(name(prop_name([‘Ala’,nom,sing,fem])),connective([i] ),noun_gr_simple(name(name([‘Ola’,nom,sing,fem])))))), predicate_group(verb([miecx,3,mn,_0458,past]),noun_ gr_simple(noun(noun([kot,acc,sing,m]))))) Y= [owns(ala,kot1),owns(ala,kot2),owns(ola,kot1)] Izmit, June 12, 2014

26 The 2-run analysis of a sentence. The first run (N=0) is non-deterministic, the second run (N=1) is deterministic, as it is controlled by the syntactic tree calculated at the first run (X). ?- s(0,X,_,[‘Ala’,’i’,’Ola’,’mają’,’kota’],[ ]), s(1,X,Y,[‘Ala’,’i’,’Ola’,’mają’,’kota’],[ ]). Answer: X=sentence(subject(noun_gr_comp(gr_noun_connective(name(prop_n ame([‘Ala’,nom,sing,fem])),connective([i]),noun_gr_simple(name(prop_ name([‘Ola’,nom,sing,fem])))))),predicate_group(verb([miecx,3,mn,_04 58,past]),noun_gr_simple(noun(noun([kot,acc,sing,m]))))) Y= [owns(ala,kot1),owns(ala,kot2),owns(ola,kot1)] Izmit, June 12, 2014

27 Transformaction to a 2-runs grammar (addition of the variable N with values 0 or 1) s(N,sentence(A,B),S,X0,X2) :- sb(N,A,S1,X0,X1),sb(N,B,S2,X1,X2),semantics(N,sentence,S1,S2,S). question_about_subject(A,B),S,X0,X2) :- interrogative_subject_phrase(N,A,S1,X0,X1),sb(N,B,S2,X1,X2),semantics(N,p yt_o_podmiot,S1,S2,S). sb(N,predicate_group(A,B),S,X0,X2):- verb(A,S1,X0,X1), ng(B,S2,X1,X2), semantics(N,predicate_group,S1,S2,S). sb(N,subject(A),S,X0,X1) :- ng(N,A,S,X0,X1). ng(N,noun_gr_simple(A),S,X0,X1) :- ng0(N,A,S,X0,X1). ng(N,noun_gr_comp(A),S,X0,X1) :- ng1(N,A,S,X0,X1). ng0(N,name(A),S,X0,X1) :- name(A,S1,X0,X1),semantics(N,name,S1,S). ng0(N,noun(A),S,X0,X1) :- noun(A,S1,X0,X1),semantics(N,rz,S1,S). ng0(N,adj_noun(adjective([F1|C]),noun([F2|C])),S,X0,X2):- adj(adjective([F1|C]),S1,X0,X1), noun(noun([F2|C]),S2,X1,X2), semantics(N,adj_noun,S1,S2,S). ng0(N,adj_name(adjective([F1|C]),prop_name([F2|C])),S,X0,X2):- adj(adjective([F1|C]),S1,X0,X1), name(prop_name([F2|C]),S2,X1,X2), semantics(N,adj_name,S1,S2,S). ng1(N,gr_noun_connective(A,B,C),S,X0,X3):- ng0(N,A,S1,X0,X1), connective(B,S2,X1,X2), ng(N,C,S3,X2,X3), semantics(N,noun_comp,S1,S2,S3,S). Izmit, June 12, 2014

28 Transformation of the semantic procedures: semantics(0,_,_,_). semantics(0,_,_,_,_). semantics(0,_,_,_,_,_). semantics(1,adj_noun,S1,S2,S) :- ekstensja_konceptu(S1,E1),ekstensja_konceptu(S2,E2), przekroj(E2,E1,S). semantics(1,name_z_prz,S1,S2,S) :- ekstensja_konceptu(S1,E1), ekstensja_nazwy(S2,E2), przekroj(E2,E1,S). semantics(1,rz,S,E) :- ekstensja_konceptu(S,E). semantics(1,name,S,E) :- ekstensja_nazwy(S,E). semantics(1,noun_comp,S1,i,S3,S) :- suma_zbiorow(S1,S3,S). semantics(1,predicate_group,S1,S2,[S1,S2]). semantics(1,sentence,S1,[S2,S3],S):- lista_faktow(S2,S1,S3,S). Izmit, June 12, 2014

29 Further exercices: take into account accord and government constraints within a sentence processing of ‘czy’-questions processing of questions about subject and complements Izmit, June 12, 2014

Download ppt "Izmit, June 16, 2012 Man-Machine Communication Parsing and Semantic analysis of Natural Language Adam Mickiewicz University in Poznań Dept of Computer."

Similar presentations

Ads by Google