CSA3050: Natural Language Algorithms Finite State Devices.


1 CSA3050: Natural Language Algorithms Finite State Devices

2 Sources
Blackburn & Striegnitz, Ch. 2 (CSA3050 NLP Algorithms, October 2004)

3 Parsers vs. Recognisers
Recognizers tell us whether a given input is accepted by some finite state automaton. Often we would like an explanation of why it was accepted. Parsers give us that kind of explanation. What form does it take?

4 Finite State Parser
The output of a finite state parser is a sequence of nodes and arcs. If we gave the input [h,a,h,a,!] to a parser for our first laughing automaton, it should give us [1,h,2,a,3,h,2,a,3,!,4]. In Prolog, a recognizer is turned into a parser by adding one or more extra arguments that keep track of the structure that was found.

5 Base Case
Recogniser:
recognize1(Node, []) :- final(Node).
Parser:
parse1(Node, [], [Node]) :- final(Node).

6 Recursive Case
Recogniser:
recognize1(Node1, String) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    recognize1(Node2, NewString).
Parser:
parse1(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    parse1(Node2, NewString, Path).
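Slides 5 and 6 call traverse1/3 without showing it. The following is a minimal, self-contained sketch that assumes the usual one-clause definition (the arc label must be the next input symbol) together with a laughing automaton for ha(ha)*!, whose states and arcs are assumed here because the earlier slides defining it are not part of this section. Loaded into SWI-Prolog, it reproduces the parse quoted on slide 4.

```prolog
% Hypothetical laughing automaton for ha(ha)*! (assumed, not shown in this section).
initial(1).
final(4).
arc(1, 2, h).
arc(2, 3, a).
arc(3, 4, !).
arc(3, 2, h).

% Assumed traverse1/3: succeeds if Label is the next input symbol,
% returning the rest of the input.
traverse1(Label, [Label|Rest], Rest).

% Base case (slide 5): at a final node with no input left, the path is that node.
parse1(Node, [], [Node]) :-
    final(Node).
% Recursive case (slide 6): record the node and the arc label, then continue.
parse1(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    parse1(Node2, NewString, Path).

% Driver as on slide 10.
testparse1(Symbols, Parse) :-
    initial(Node),
    parse1(Node, Symbols, Parse).
```

The query testparse1([h,a,h,a,!], P) yields P = [1,h,2,a,3,h,2,a,3,!,4], while an incomplete laugh such as [h,a,h] fails, exactly as the recogniser would reject it.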

7 Complex Labels
So far we have only considered transitions with single-character labels. More complex labels are possible, e.g. symbols comprising several characters. We can construct an FSA recognizing English noun phrases built from the words: the, a, wizard, witch, broomstick, rat, hermione, harry, ron, with, fast, brave.

8 FSA for Noun Phrases

9 FSA for NPs in Prolog
initial(1).
final(3).
arc(1,2,a).
arc(1,2,the).
arc(2,2,brave).
arc(2,2,fast).
arc(2,3,witch).
arc(2,3,wizard).
arc(2,3,broomstick).
arc(2,3,rat).
arc(1,3,harry).
arc(1,3,ron).
arc(1,3,hermione).
arc(3,1,with).

10 Parsing a Noun Phrase
testparse1(Symbols, Parse) :-
    initial(Node),
    parse1(Node, Symbols, Parse).
?- testparse1([the,fast,wizard], Z).
Z = [1, the, 2, fast, 2, wizard, 3]
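Putting slides 6, 9 and 10 together gives a complete program. This sketch assumes the same one-clause traverse1/3 as for single characters: with word-sized labels nothing in the parser changes, since each label now simply matches a whole word in the input list.

```prolog
% NP automaton from slide 9.
initial(1).
final(3).
arc(1, 2, a).
arc(1, 2, the).
arc(2, 2, brave).
arc(2, 2, fast).
arc(2, 3, witch).
arc(2, 3, wizard).
arc(2, 3, broomstick).
arc(2, 3, rat).
arc(1, 3, harry).
arc(1, 3, ron).
arc(1, 3, hermione).
arc(3, 1, with).

% Assumed traverse1/3: the arc label must equal the next word.
traverse1(Label, [Label|Rest], Rest).

% parse1/3 from slides 5 and 6.
parse1(Node, [], [Node]) :-
    final(Node).
parse1(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    parse1(Node2, NewString, Path).

% testparse1/2 from slide 10.
testparse1(Symbols, Parse) :-
    initial(Node),
    parse1(Node, Symbols, Parse).
```

Besides the slide's example, the arc(3,1,with) loop lets testparse1([harry,with,the,brave,witch], Z) succeed with Z = [1,harry,3,with,1,the,2,brave,2,witch,3], whereas [fast,wizard] is rejected because no arc leaves node 1 with the label fast.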

11 Rewriting Categories
It is also possible to obtain a more abstract parse, in which each word is replaced by its category, e.g.
?- testparse2([the,fast,wizard], Z).
Z = [1, det, 2, adj, 2, cn, 3]
What changes are required to obtain this behaviour?

12 1. Changes to the FSA
%FSA
initial(1).
final(3).
arc(1,2,det).
arc(2,2,adj).
arc(2,3,cn).
arc(1,3,pn).
arc(3,1,prep).
%Lexicon
lex(a,det).
lex(the,det).
lex(fast,adj).
lex(brave,adj).
lex(witch,cn).
lex(wizard,cn).
lex(broomstick,cn).
lex(rat,cn).
lex(harry,pn).
lex(hermione,pn).
lex(ron,pn).
lex(with,prep).

13 2. Changes to the Parser
Parse1:
parse1(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    parse1(Node2, NewString, Path).
Parse2:
parse2(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse2(Label, String, NewString),
    parse2(Node2, NewString, Path).
traverse2(Label, [Symbol|Symbols], Symbols) :-
    lex(Symbol, Label).
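The FSA and lexicon of slide 12 and the parser of slide 13 can be combined into one runnable file. Slide 13 shows only the recursive clause of parse2/3, so the base case and testparse2/2 below are assumptions, written by analogy with parse1/3 and testparse1/2.

```prolog
% Category-level FSA from slide 12.
initial(1).
final(3).
arc(1, 2, det).
arc(2, 2, adj).
arc(2, 3, cn).
arc(1, 3, pn).
arc(3, 1, prep).

% Lexicon from slide 12: lex(Word, Category).
lex(a, det).
lex(the, det).
lex(fast, adj).
lex(brave, adj).
lex(witch, cn).
lex(wizard, cn).
lex(broomstick, cn).
lex(rat, cn).
lex(harry, pn).
lex(hermione, pn).
lex(ron, pn).
lex(with, prep).

% traverse2/3 from slide 13: the next word must belong to the arc's category.
traverse2(Label, [Symbol|Symbols], Symbols) :-
    lex(Symbol, Label).

% Assumed base case, mirroring parse1/3.
parse2(Node, [], [Node]) :-
    final(Node).
% Recursive case from slide 13.
parse2(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse2(Label, String, NewString),
    parse2(Node2, NewString, Path).

% Assumed driver, mirroring testparse1/2.
testparse2(Symbols, Parse) :-
    initial(Node),
    parse2(Node, Symbols, Parse).
```

The design choice is that the automaton now has one arc per category instead of one per word, so adding a noun only requires a new lex/2 fact. For example, testparse2([ron,with,a,rat], Z) gives Z = [1,pn,3,prep,1,det,2,cn,3].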

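14 Handling Jumps
A jump arc is labelled '#' and consumes no input, so traverse3 leaves the string unchanged for it; any other label is a category to which the next word must belong.
traverse3('#', String, String).
traverse3(Cat, [Word|Words], Words) :-
    lex(Word, Cat).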
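To see traverse3/3 at work, the sketch below adds a hypothetical jump arc arc(1,2,'#') to the category FSA of slide 12 (this arc is not on the slides) so that the determiner becomes optional. The predicates parse3/3 and testparse3/2 are also assumptions, written to mirror parse2/3 and testparse2/2 with traverse3/3 swapped in.

```prolog
% Category FSA from slide 12 plus a hypothetical jump arc (1 -> 2 on '#').
initial(1).
final(3).
arc(1, 2, det).
arc(1, 2, '#').
arc(2, 2, adj).
arc(2, 3, cn).
arc(1, 3, pn).
arc(3, 1, prep).

lex(a, det).
lex(the, det).
lex(fast, adj).
lex(brave, adj).
lex(witch, cn).
lex(wizard, cn).
lex(broomstick, cn).
lex(rat, cn).
lex(harry, pn).
lex(hermione, pn).
lex(ron, pn).
lex(with, prep).

% traverse3/3 from slide 14: a '#' arc consumes no input.
traverse3('#', String, String).
traverse3(Cat, [Word|Words], Words) :-
    lex(Word, Cat).

% Hypothetical parse3/3 and testparse3/2, mirroring parse2/3 and testparse2/2.
parse3(Node, [], [Node]) :-
    final(Node).
parse3(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse3(Label, String, NewString),
    parse3(Node2, NewString, Path).

testparse3(Symbols, Parse) :-
    initial(Node),
    parse3(Node, Symbols, Parse).
```

With the jump, testparse3([fast,wizard], Z) gives Z = [1,'#',2,adj,2,cn,3], and phrases with a determiner still parse as before. A cycle made only of '#' arcs would let the parser loop without consuming input, which is why this sketch adds just one non-cyclic jump.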

15 Finite State Transducers
A finite state transducer is essentially a finite state automaton that works on two (or more) tapes. The most common way to think about transducers is as a kind of "translating machine" which works by reading from one tape and writing onto the other.

16 A Translator from a to b
[Figure: state diagram of the a:b transducer.]
Initial state: marked by an arrowhead. Final state: drawn as a double circle. a:b: read a from the first tape and write b to the second tape.

17 Prolog Representation
:- op(250,xfx,:).   % declare ':' as an infix operator so that labels such as a:b can be written
initial(1).
final(1).
arc(1,1,a:b).

18 Modes of Operation
– generation mode: it writes a string of a's on one tape and a string of b's on the other tape. Both strings have the same length.
– recognition mode: it accepts when the first tape consists only of a's, the second only of b's, and there are exactly as many a's as b's.
– translation mode (left to right): it reads a's from the first tape and writes a b onto the second tape for every a that it reads.
– translation mode (right to left): it reads b's from the second tape and writes an a onto the first tape for every b that it reads.

19 Transducers and Jumps
Transducers can make jumps, going from one state to another without reading or writing anything on one or both of the tapes. So transitions of the form a:#, #:a or #:# are possible.

20 Simple Transducer in Prolog
transduce1(Node, [], []) :-
    final(Node).
transduce1(Node1, Tape1, Tape2) :-
    arc(Node1, Node2, Label),
    traverse1(Label, Tape1, NewTape1, Tape2, NewTape2),
    transduce1(Node2, NewTape1, NewTape2).

21 Traverse for FST
traverse1(L1:L2, [L1|RestTape1], RestTape1, [L2|RestTape2], RestTape2).
testtrans1(Tape1, Tape2) :-
    initial(Node),
    transduce1(Node, Tape1, Tape2).
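The four modes of operation listed on slide 18 are all the same program run with different arguments instantiated. The sketch below assembles slides 17, 20 and 21; it assumes SWI-Prolog, where ':' is already predefined as an infix operator, so the op/3 directive of slide 17 is omitted.

```prolog
% The a:b transducer from slide 17 (':' relies on SWI-Prolog's predefined operator).
initial(1).
final(1).
arc(1, 1, a:b).

% traverse1/5 from slide 21: consume L1 on tape 1 and L2 on tape 2.
traverse1(L1:L2, [L1|RestTape1], RestTape1, [L2|RestTape2], RestTape2).

% transduce1/3 from slide 20.
transduce1(Node, [], []) :-
    final(Node).
transduce1(Node1, Tape1, Tape2) :-
    arc(Node1, Node2, Label),
    traverse1(Label, Tape1, NewTape1, Tape2, NewTape2),
    transduce1(Node2, NewTape1, NewTape2).

% testtrans1/2 from slide 21.
testtrans1(Tape1, Tape2) :-
    initial(Node),
    transduce1(Node, Tape1, Tape2).
```

Left-to-right translation: testtrans1([a,a,a], T) gives T = [b,b,b]. Right-to-left: testtrans1(S, [b,b]) gives S = [a,a]. Recognition: testtrans1([a,a], [b,b]) succeeds and testtrans1([a,a], [b]) fails. Generation: with both tapes unbound, testtrans1(S, T) enumerates [] and [], then [a] and [b], and so on without end on backtracking, so bounding the length first (length(S, 2), testtrans1(S, T)) is a convenient way to ask for a single pair.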

22 Handling Jumps: 4 cases
– Jump on both tapes.
– Jump on the first but not on the second tape.
– Jump on the second but not on the first tape.
– Jump on neither tape (this is what traverse1 does).

23 4 Corresponding Clauses
traverse2('#':'#', Tape1, Tape1, Tape2, Tape2).
traverse2('#':L2, Tape1, Tape1, [L2|RestTape2], RestTape2).
traverse2(L1:'#', [L1|RestTape1], RestTape1, Tape2, Tape2).
traverse2(L1:L2, [L1|RestTape1], RestTape1, [L2|RestTape2], RestTape2).
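One caveat applies when the four clauses are used exactly as written: a label such as b:'#' also unifies with the general L1:L2 clause, so on backtracking a literal '#' can end up written to an unbound output tape. The sketch below adds \== guards (a refinement that is not on the slide) so that exactly one clause applies to each label, and runs it on a hypothetical transducer that copies every a and deletes every b. The names transduce2/3 and testtrans2/2 are assumptions, mirroring transduce1/3 and testtrans1/2.

```prolog
% Hypothetical transducer: copy a's, delete b's (b:'#' jumps on tape 2).
initial(1).
final(1).
arc(1, 1, a:a).
arc(1, 1, b:'#').

% Slide 23's four cases, guarded so that '#' is never read or written literally.
traverse2('#':'#', Tape1, Tape1, Tape2, Tape2).
traverse2('#':L2, Tape1, Tape1, [L2|RestTape2], RestTape2) :-
    L2 \== '#'.
traverse2(L1:'#', [L1|RestTape1], RestTape1, Tape2, Tape2) :-
    L1 \== '#'.
traverse2(L1:L2, [L1|RestTape1], RestTape1, [L2|RestTape2], RestTape2) :-
    L1 \== '#',
    L2 \== '#'.

transduce2(Node, [], []) :-
    final(Node).
transduce2(Node1, Tape1, Tape2) :-
    arc(Node1, Node2, Label),
    traverse2(Label, Tape1, NewTape1, Tape2, NewTape2),
    transduce2(Node2, NewTape1, NewTape2).

testtrans2(Tape1, Tape2) :-
    initial(Node),
    transduce2(Node, Tape1, Tape2).
```

Here testtrans2([a,b,a], Y) has the single solution Y = [a,a]. Running it right to left, testtrans2(X, [a,a]) first returns X = [a,a], but on backtracking it can insert any number of b's, since the deletion arc becomes an insertion arc in that direction.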

24 Morphological Analysis with FSTs
Morphology is concerned with the internal structure of words.
– How can a word be decomposed into morphemes?
– How do the morphemes combine?
– What are legitimate combinations?
The basic idea is to write FSTs that map the surface form of a word to a description of the morphemes that constitute that word, or vice versa. Example: wizard+s to wizard+PL, or kiss+ed to kiss+PAST.

25 Plural Nouns in English
Regular forms
– add an s, as in wizard+s.
– add -es, as in witch+es.
Handled with morpho-phonological rules that insert an e whenever the morpheme preceding the s ends in s, x, ch or another fricative.
Irregular forms
– mouse/mice
– automaton/automata
Handled on a case-by-case basis.
We require a transducer that translates wizard+s into wizard+PL, witch+es into witch+PL, mice into mouse+PL and automata into automaton+PL.
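The e-insertion rule for regular plurals can be illustrated with a few lines of Prolog. This is a hypothetical sketch written as a plain predicate over atoms rather than as a transducer; the endings s, x and ch come from the slide, while z and sh are assumed stand-ins for "another fricative". The names needs_e/1 and plural_surface/2 are illustrative only.

```prolog
% needs_e(+Stem): Stem ends in a sibilant that triggers e-insertion.
% s, x, ch are from the slide; z and sh are assumed additions.
needs_e(Stem) :-
    member(Ending, [s, x, z, ch, sh]),
    sub_atom(Stem, _, _, 0, Ending),
    !.

% plural_surface(+Stem, -Surface): spell out Stem+s, inserting e if needed.
plural_surface(Stem, Surface) :-
    (   needs_e(Stem)
    ->  atom_concat(Stem, es, Surface)
    ;   atom_concat(Stem, s, Surface)
    ).
```

So plural_surface(wizard, S) gives S = wizards and plural_surface(witch, S) gives S = witches. Irregular forms such as mice are outside this rule and need their own lexical entries.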

26 FST for English Plurals

27 FST in Prolog
lex(wizard:wizard, 'STEM-REG1').
lex(witch:witch, 'STEM-REG2').
lex(automaton:automaton, 'IRREG-SG').
lex(automata:'automaton-PL', 'IRREG-PL').
lex(mouse:mouse, 'IRREG-SG').
lex(mice:'mouse-PL', 'IRREG-PL').
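Slide 27 gives only the lexicon; the arcs of the plural FST drawn on slide 26 are not in the transcript. The following sketch assumes a simple layout in which category labels such as 'STEM-REG1' consult lex/2 and pair labels such as s:'PL' handle the suffixes. Tapes hold lists of morphemes, so wizard+s is written [wizard,s]. The states, arcs, the 'PL' tag and the names traverse/5, transduce/3 and analyse/2 are all hypothetical, and it assumes SWI-Prolog's predefined ':' operator.

```prolog
% Lexicon from slide 27.
lex(wizard:wizard, 'STEM-REG1').
lex(witch:witch, 'STEM-REG2').
lex(automaton:automaton, 'IRREG-SG').
lex(automata:'automaton-PL', 'IRREG-PL').
lex(mouse:mouse, 'IRREG-SG').
lex(mice:'mouse-PL', 'IRREG-PL').

% Hypothetical automaton: stems that take -s, stems that take -es, irregulars.
initial(1).
final(2).
final(3).
final(4).
arc(1, 2, 'STEM-REG1').
arc(2, 4, s:'PL').
arc(1, 3, 'STEM-REG2').
arc(3, 4, es:'PL').
arc(1, 4, 'IRREG-SG').
arc(1, 4, 'IRREG-PL').

% A category label maps one word via the lexicon; a pair label L1:L2 maps L1 to L2.
traverse(Cat, [Word|Rest1], Rest1, [Out|Rest2], Rest2) :-
    atom(Cat),
    lex(Word:Out, Cat).
traverse(L1:L2, [L1|Rest1], Rest1, [L2|Rest2], Rest2).

transduce(Node, [], []) :-
    final(Node).
transduce(Node1, Tape1, Tape2) :-
    arc(Node1, Node2, Label),
    traverse(Label, Tape1, NewTape1, Tape2, NewTape2),
    transduce(Node2, NewTape1, NewTape2).

analyse(Surface, Analysis) :-
    initial(Node),
    transduce(Node, Surface, Analysis).
```

With this, analyse([wizard,s], A) gives A = [wizard,'PL'], analyse([witch,es], A) gives A = [witch,'PL'], and analyse([mice], A) gives A = ['mouse-PL'], the single output token listed in the lexicon. Because the program is relational, analyse(S, [witch,'PL']) runs in the generation direction and returns S = [witch,es], while the ill-formed [witch,s] is rejected.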

