Finite State Machinery - I Fundamentals Recognisers and Transducers.


1 Finite State Machinery - I Fundamentals Recognisers and Transducers

2 Reference Outline
Websites
– Xerox: www.xrce.xerox.com/research/mltt/fst/
– Groningen: grid.let.rug.nl/~vannoord/FSA/fsa.html
– AT&T: www.research.att.com/sw/tools/fsm
Books/Collections
– Karttunen & Oflazer (2000)
– Jurafsky & Martin (2000)
– Hopcroft and Ullman (1979)
– Roche and Schabes (1997)
Classic Articles
– Kaplan and Kay (1994)
– Koskenniemi (1983)
– Johnson (1972)
Tools
– Van Noord et al.
– Mohri et al.
– Daciuk
– Karttunen & Beesley

3 Acknowledgements
To Lauri Karttunen, Ken Beesley and colleagues at Xerox. Most materials in this tutorial are from their website. Forthcoming book: Finite State Morphology – Xerox Tools and Techniques.

4 FS Motivation
The Chomsky hierarchy of language classes is based on classes of descriptive notation, and also on associated classes of machine.
Chomsky (1957) dismissed FS grammars, and the associated machinery, as fundamentally inadequate for the description of NL.

5 Embedding
The basic problem is not that sentences can grow to arbitrary length; it is that the description of a syntactic constituent may embed any other constituent, including the sentence itself.
– The dog bit the cat.
– The dog that the man saw bit the cat.
– The dog that the man that the horse kicked saw bit the cat.
– etc.
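The pattern in these examples can be sketched in a few lines of Python. This is only an illustration: the function name `centre_embed` and its arguments are hypothetical, not part of the original slides. The nouns are stacked up first and their verbs then appear in reverse order, so each verb has to be paired with a noun seen arbitrarily far back. That pairing needs unbounded memory, whereas a finite-state machine has only a fixed number of states.

```python
def centre_embed(nouns, verbs, obj="the cat"):
    """Build a centre-embedded sentence.

    nouns[i] is the subject of verbs[i]. The nouns are emitted first,
    joined by "that", and the verbs follow in reverse order.
    """
    if len(nouns) != len(verbs):
        raise ValueError("every noun needs exactly one verb")
    subject = " that ".join(nouns)
    predicate = " ".join(reversed(verbs))
    sentence = f"{subject} {predicate} {obj}."
    return sentence[0].upper() + sentence[1:]


print(centre_embed(["the dog"], ["bit"]))
print(centre_embed(["the dog", "the man", "the horse"], ["bit", "saw", "kicked"]))
```

The `len(nouns) == len(verbs)` check is exactly the kind of count that cannot be kept for arbitrary depth with a fixed, finite set of states.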

6 On the other hand ...
Plenty of language just ain't like that.
Words
– Orthographic spelling.
– Phonological spelling.
– Morphology.
Fixed expression types (e.g. dates).
Gross constituent structures (e.g. the big, bad, blue wolf).

7 Recent Application Areas for FS Technology Include
– POS Tagging
– Spell Checking
– Information Extraction
– Speech Recognition
– Text to Speech
– Spoken Dialogue
– Parsing

8 Recognition of Italian Words
The coke machine recognises words in the coke machine language. The following machine recognises two words in Italian. The recognition mechanism is language independent.
[Figure: network recognising CASA and CINQUE]

9 The Process of Analysis
Start in the initial state and at the first symbol of the word. If there is an arc labelled with that symbol, the machine transitions to the next state, and the symbol is consumed. The process continues with successive symbols until ...

10 The Process of Analysis
... one or more of these conditions holds:
A. A final state is reached.
B. All symbols are consumed.
C. There are no transitions out of a state for the current symbol.
– If both A and B hold, analysis succeeds and the word is recognised.
– Otherwise recognition fails.
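The analysis procedure and its three stopping conditions can be sketched as a small recogniser. This is a minimal sketch, not the tutorial's own code: the transition table, the state numbers and the name `recognise` are assumptions made for illustration, built to accept the two Italian words CASA and CINQUE.

```python
# Hypothetical transition table for a network accepting CASA and CINQUE.
# State numbers are arbitrary; 0 is the initial state.
TRANSITIONS = {
    (0, "C"): 1,
    (1, "A"): 2, (2, "S"): 3, (3, "A"): 4,
    (1, "I"): 5, (5, "N"): 6, (6, "Q"): 7, (7, "U"): 8, (8, "E"): 4,
}
FINAL_STATES = {4}


def recognise(word, transitions=TRANSITIONS, finals=FINAL_STATES, start=0):
    """Return True if the network accepts `word`, False otherwise."""
    state = start
    for symbol in word:
        next_state = transitions.get((state, symbol))
        if next_state is None:
            # Condition C: no transition for the current symbol.
            return False
        # Take the arc; the symbol is consumed.
        state = next_state
    # Condition B holds here (all symbols consumed).
    # Succeed only if condition A holds as well.
    return state in finals


for word in ["CASA", "CINQUE", "CINQUANTA", "CAS"]:
    print(word, recognise(word))
```

Nothing in `recognise` refers to Italian: swapping in a different transition table changes the language recognised without touching the matching code.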

11 Success and Failure
[Figure: recogniser network; the surviving arc labels spell CASA, CINQUE and LENTE]
LE; CASA; CINQUANTA; LENTEMENTE

12 Transducers
Recognisers either accept or reject a word. Although this is useful, networks can actually return more substantial information. This is achieved by providing networks with the ability to write as well as to read.

13 Basic Transducer
Each transition of a transducer is labelled with a pair of symbols rather than with a single symbol. Analysis proceeds as before, except that input symbols are matched against the lower-side symbols on transitions. If analysis succeeds, return the string of upper-side symbols on the path to the final state.

14 Confusing Terminology
Lower side = surface side.
Upper side = "deep" side.
Analysis proceeds from lower to upper.
Synthesis (generation) proceeds from upper to lower.

15 Lexical Transducers
In common parlance, a transducer is a device which converts one form of energy into another, e.g. a microphone converts sound into electrical signals. Next we look at lexical transducers, which convert one string of symbols into another.

16 Lexical Transducer Example
[Figure: transducer with upper side CASA (lexical string) and lower side CASE (surface string)]
Input: CASE
Output: CASA

17 Morphological Analysis
[Figure: transducer with upper side C O N T A R E +V +1P +SG and lower side C O N T O, padded with ε]
Input: CONTO
Output: CONTARE +V +1P +SG

18 Remarks
ε stands for "epsilon". During analysis, epsilon transitions are taken freely without consuming any input. Note also the single symbols with multi-character print names (e.g. +SG). The order of these symbols, and the choice of the infinitive as baseform, is determined by linguists.
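Analysis with a transducer, including free epsilon moves and multi-character symbols, can be sketched as follows. The arc list is an assumption: in particular the alignment of upper A with lower O is only one plausible reading of the figure, and the names `ARCS`, `FINALS` and `analyse` are hypothetical. Epsilon is represented by the empty string.

```python
EPS = ""  # stands for epsilon

# Hypothetical arcs (source, upper, lower, target) pairing
# CONTARE +V +1P +SG (upper side) with CONTO (lower side).
ARCS = [
    (0, "C", "C", 1), (1, "O", "O", 2), (2, "N", "N", 3), (3, "T", "T", 4),
    (4, "A", "O", 5), (5, "R", EPS, 6), (6, "E", EPS, 7),
    (7, "+V", EPS, 8), (8, "+1P", EPS, 9), (9, "+SG", EPS, 10),
]
FINALS = {10}


def analyse(surface, arcs=ARCS, finals=FINALS, start=0):
    """Return every upper-side symbol list whose lower side spells `surface`.

    Assumes the network has no loops made only of lower-side epsilons.
    """
    results = []

    def walk(state, pos, output):
        if pos == len(surface) and state in finals:
            results.append(output)
        for src, upper, lower, dst in arcs:
            if src != state:
                continue
            emitted = output + ([upper] if upper != EPS else [])
            if lower == EPS:
                # Epsilon on the lower side: taken freely, no input consumed.
                walk(dst, pos, emitted)
            elif pos < len(surface) and surface[pos] == lower:
                # Lower-side symbol matches the input: consume it.
                walk(dst, pos + 1, emitted)

    walk(start, 0, [])
    return results


print(analyse("CONTO"))
```

The output is kept as a list of symbols rather than a single string, so that multi-character symbols such as `+SG` stay single units.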

19 Exercise
The word "conto" in Italian is also a masculine noun meaning (a) story and (b) bank account.
Draw the corresponding 2-level networks.
How can the different meanings be incorporated into the same network?

20 Conto
[Figure: transducer with upper side C O N T O +N +SG and lower side C O N T O, padded with ε]
Input: CONTO
Output: CONTO +N +SG

21 Synthesis
Transducers are reversible. This means that the same network can be used to perform the inverse transduction. The process of synthesis is the inverse of analysis.

22 The Process of Synthesis
Start at the start state and at the beginning of the input string. Match the input symbols against the upper-side symbols of the arcs, consuming symbols until a final state is reached. If successful, return the string of lower-side symbols (else nothing).
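Synthesis can be sketched with the same kind of traversal, matching on the upper side and collecting the lower side. The sketch below also illustrates reversibility: swapping the two labels on every arc turns the synthesis routine into an analyser. The arc list and the names `synthesise` and `invert` are assumptions for illustration. Epsilon is the empty string, so it disappears when the output symbols are joined.

```python
EPS = ""  # stands for epsilon

# Hypothetical arcs (source, upper, lower, target) pairing
# CONTARE +V +1P +SG (upper side) with CONTO (lower side).
ARCS = [
    (0, "C", "C", 1), (1, "O", "O", 2), (2, "N", "N", 3), (3, "T", "T", 4),
    (4, "A", "O", 5), (5, "R", EPS, 6), (6, "E", EPS, 7),
    (7, "+V", EPS, 8), (8, "+1P", EPS, 9), (9, "+SG", EPS, 10),
]
FINALS = {10}


def invert(arcs):
    """Swap the two labels on every arc, giving the inverse transducer."""
    return [(src, lower, upper, dst) for src, upper, lower, dst in arcs]


def synthesise(symbols, arcs=ARCS, finals=FINALS, start=0):
    """Match `symbols` against the first label of each arc and return the
    concatenated second labels for every successful path.

    Assumes the network has no loops made only of epsilon match labels.
    """
    results = []

    def walk(state, pos, output):
        if pos == len(symbols) and state in finals:
            results.append("".join(output))  # EPS is "", so it vanishes here
        for src, match_label, emit_label, dst in arcs:
            if src != state:
                continue
            if match_label == EPS:
                walk(dst, pos, output + [emit_label])
            elif pos < len(symbols) and symbols[pos] == match_label:
                walk(dst, pos + 1, output + [emit_label])

    walk(start, 0, [])
    return results


lexical = list("CONTARE") + ["+V", "+1P", "+SG"]
print(synthesise(lexical))                          # upper to lower
print(synthesise(list("CONTO"), arcs=invert(ARCS)))  # lower to upper
```

The input is passed as a list of symbols so that `+V`, `+1P` and `+SG` each count as one symbol during matching.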

23 Morphological Synthesis
[Figure: transducer with upper side C O N T A R E +V +1P +SG and lower side C O N T O, padded with ε]
Input: CONTARE +V +1P +SG
Output: CONTO
N.B. ε symbols are ignored on output.

24 Analysis and Synthesis
Upper Side Language (Lexical Strings).
Lower Side Language (Surface Strings).
Transducer maps between the two.
However large the lexical transducer may become, analysis and synthesis are performed by the same language-independent matching techniques.

