CSA405: Advanced Topics in NLP. Computational Morphology III: Xerox Notation (November 2003)


1 CSA405: Advanced Topics in NLP. Xerox Notation

2 Three Related Perspectives
LANGUAGE, NOTATION, MECHANISM

3 Three Related Perspectives
[Diagram: REGULAR EXPRESSIONS denote REGULAR LANGUAGES; FINITE STATE NETWORKS encode them; regular expressions compile into finite state networks.]

4 Xerox R.E. Notation
Atomic Expressions
– Normal Symbol
– Special Symbol
Complex Expressions
– Union
– Intersection
– Concatenation
– Closure
Other Operators
Abbreviations

5 Atomic Expressions
The simplest kind of RE is a symbol. Typically, a symbol is the sort of item that can appear on the arc of a network. For example, the symbol a is an RE that designates the language containing the string "a" and nothing else. Multicharacter symbols such as Plur are also symbols; they simply have multicharacter print names.

6 Special Atomic Expressions
The epsilon (ε) symbol 0 denotes the empty-string language {""}.
The ANY symbol ? denotes the language of all single-symbol strings. The empty string is not included in ?.
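The atomic expressions above can be modelled as finite sets of strings. In this sketch a string is a Python tuple of symbols, so a multicharacter symbol such as Plur occupies one position, just as it labels one arc. The names `sym`, `EPSILON` and `any_symbol` are illustrative, and `?` is approximated over an assumed finite alphabet (in xfst it is not limited to a fixed list of symbols).

```python
from typing import FrozenSet, Iterable, Tuple

# A string is a tuple of symbols; a multicharacter symbol is one element.
String = Tuple[str, ...]
Language = FrozenSet[String]


def sym(s: str) -> Language:
    """An atomic RE: the language containing exactly one one-symbol string."""
    return frozenset({(s,)})


# 0 (epsilon): the language whose only member is the empty string.
EPSILON: Language = frozenset({()})


def any_symbol(alphabet: Iterable[str]) -> Language:
    """?: every single-symbol string, over an assumed finite alphabet."""
    return frozenset((s,) for s in alphabet)


a = sym("a")
plur = sym("Plur")
anything = any_symbol(["a", "b", "Plur"])

print(sorted(a))         # [('a',)]
print(sorted(plur))      # [('Plur',)]  one symbol, not four
print(() in anything)    # False: ? excludes the empty string
print(() in EPSILON)     # True
```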

7 Empty and Universal Language Machines
[Figure: the networks for the universal language (with an arc labelled ?) and for the empty language.]

8 Brackets
[A] denotes the same language as A.
[ ] can also be used to denote the empty-string language.
Brackets make the grouping of an expression unambiguous but can sometimes be dropped.
Brackets are not the same as parentheses ( ), which mark optional elements: (A) = [A | 0].
Checkpoint: what are the FSAs
– for a
– for (a)
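One possible answer to the checkpoint, sketched as deterministic acceptors. The `FSA` class is a hypothetical helper, not part of any Xerox tool: state 0 is the start state and arcs are keyed by (state, symbol). The only difference between the two networks is whether the start state is final.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Sequence, Tuple


@dataclass
class FSA:
    """A deterministic acceptor: start state 0, arcs keyed by (state, symbol)."""
    arcs: Dict[Tuple[int, str], int]
    finals: FrozenSet[int]

    def accepts(self, string: Sequence[str]) -> bool:
        state = 0
        for symbol in string:
            if (state, symbol) not in self.arcs:
                return False
            state = self.arcs[(state, symbol)]
        return state in self.finals


# a: one arc labelled a from the start state to a final state.
fsa_a = FSA(arcs={(0, "a"): 1}, finals=frozenset({1}))

# (a) = [a | 0]: the same arc, but the start state is also final,
# so the empty string is accepted too.
fsa_opt_a = FSA(arcs={(0, "a"): 1}, finals=frozenset({0, 1}))

for name, fsa in [("a", fsa_a), ("(a)", fsa_opt_a)]:
    print(name, fsa.accepts([]), fsa.accepts(["a"]), fsa.accepts(["a", "a"]))
# a False True False
# (a) True True False
```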

9 Complex REs: Union
If A and B are arbitrary REs, [A | B] is the union of A and B, which denotes the union of the languages denoted by A and B respectively. Union is associative and commutative.
Checkpoint: write down the strings in the language denoted by [a | b | ab].
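The union checkpoint can be checked with ordinary set union. This sketch assumes the convention that slide 11's checkpoint hints at: an unspaced sequence such as ab is one multicharacter symbol. `sym` and `union` are illustrative helpers.

```python
from typing import FrozenSet, Tuple

Language = FrozenSet[Tuple[str, ...]]


def sym(s: str) -> Language:
    return frozenset({(s,)})


def union(*langs: Language) -> Language:
    """[A | B | ...]: the set union of the denoted languages."""
    result: Language = frozenset()
    for lang in langs:
        result = result | lang
    return result


# ab is treated here as a single multicharacter symbol.
checkpoint = union(sym("a"), sym("b"), sym("ab"))
print(sorted(checkpoint))   # [('a',), ('ab',), ('b',)]

# Union is commutative and associative, so order and grouping do not matter.
assert union(sym("b"), union(sym("ab"), sym("a"))) == checkpoint
```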

10 Complex REs: Intersection
If A and B are arbitrary REs, [A & B] is the intersection of A and B, which denotes the intersection of the languages denoted by A and B respectively. Intersection is associative and commutative.
Checkpoint: write down the strings in the language denoted by [a | b | c | d | e] & [ab | d | e | f | g].
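A set-based check of the intersection checkpoint, again assuming ab is one multicharacter symbol (so it does not match a or b on the left). The helper names are illustrative.

```python
from typing import FrozenSet, Tuple

Language = FrozenSet[Tuple[str, ...]]


def syms(*names: str) -> Language:
    """[n1 | n2 | ...] where each name is an atomic symbol."""
    return frozenset((n,) for n in names)


def intersect(a: Language, b: Language) -> Language:
    """[A & B]: the strings that belong to both languages."""
    return a & b


left = syms("a", "b", "c", "d", "e")
right = syms("ab", "d", "e", "f", "g")   # ab is one multicharacter symbol
print(sorted(intersect(left, right)))     # [('d',), ('e',)]
assert intersect(left, right) == intersect(right, left)   # commutative
```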

11 Complex REs: Concatenation
If A and B are arbitrary REs, [A B] is the concatenation of A and B.
Checkpoint: do the following denote the same language?
– [d o g]
– dog
– [d og]
What are the strings in the language denoted by [[a|b] [c|d]]?
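A sketch of concatenation as "every way of joining one string from each language". It assumes, as the checkpoint suggests, that dog and og written without spaces are single multicharacter symbols, so the three expressions all spell "dog" but are different symbol sequences. The helpers `sym`, `union` and `concat` are illustrative.

```python
from itertools import product
from typing import FrozenSet, Tuple

Language = FrozenSet[Tuple[str, ...]]


def sym(s: str) -> Language:
    return frozenset({(s,)})


def union(a: Language, b: Language) -> Language:
    return a | b


def concat(*langs: Language) -> Language:
    """[A B ...]: join one string from each language, in order."""
    result: Language = frozenset({()})          # start from {""}
    for lang in langs:
        result = frozenset(x + y for x, y in product(result, lang))
    return result


d_o_g = concat(sym("d"), sym("o"), sym("g"))   # [d o g]
dog = sym("dog")                                # one multicharacter symbol
d_og = concat(sym("d"), sym("og"))             # [d og]
print(d_o_g == dog, d_o_g == d_og, dog == d_og)   # False False False

ab_cd = concat(union(sym("a"), sym("b")), union(sym("c"), sym("d")))
print(sorted("".join(s) for s in ab_cd))        # ['ac', 'ad', 'bc', 'bd']
```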

12 Concatenation of Two Networks
[Figure: two networks, with arcs a, b and c, d, and the network for their concatenation.]

13 Complex REs: Closures
A+ denotes the concatenation of A with itself one or more times. What is the FSA for a+?
A* (Kleene star) denotes [A+ | 0]. What is the FSA for a*?

14 Complex REs: Closures
A+ denotes the concatenation of A with itself one or more times.
A* (Kleene star) denotes [A+ | 0].
[Figure: the networks for a+ and a*, with arcs labelled a.]
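The two closure networks can be sketched as deterministic acceptors. The `accepts` helper and the dictionary encoding of arcs are hypothetical; the point is that a+ needs one obligatory a before its loop, while a* makes the start state final and loops there, which is exactly [a+ | 0].

```python
from typing import Dict, FrozenSet, Sequence, Tuple

Arcs = Dict[Tuple[int, str], int]


def accepts(arcs: Arcs, finals: FrozenSet[int], string: Sequence[str]) -> bool:
    """Run a deterministic network from start state 0."""
    state = 0
    for symbol in string:
        if (state, symbol) not in arcs:
            return False
        state = arcs[(state, symbol)]
    return state in finals


# a+: one obligatory a, then a loop on the final state.
a_plus_arcs: Arcs = {(0, "a"): 1, (1, "a"): 1}
a_plus_finals = frozenset({1})

# a* = [a+ | 0]: a single final start state with an a loop.
a_star_arcs: Arcs = {(0, "a"): 0}
a_star_finals = frozenset({0})

for n in range(4):
    s = ["a"] * n
    print(n, accepts(a_plus_arcs, a_plus_finals, s), accepts(a_star_arcs, a_star_finals, s))
# 0 False True
# 1 True True
# 2 True True
# 3 True True
```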

15 Other Operations
Complementation: ~A denotes the complement of the language of A, i.e. the set of strings not in A.
Minus: [A - B] denotes the set difference of the languages denoted by A and B ([A - B] = [A & ~B]).
Checkpoint: write a definition of complementation that uses minus.
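Because ?* is infinite, this sketch approximates it with every string up to an assumed length bound over an assumed two-symbol alphabet. Within that bounded universe it defines complementation with minus (one answer to the checkpoint) and checks the identity [A - B] = [A & ~B]. All names are illustrative.

```python
from itertools import product
from typing import FrozenSet, Iterable, Tuple

Language = FrozenSet[Tuple[str, ...]]


def universe(alphabet: Iterable[str], max_len: int) -> Language:
    """A finite stand-in for ?*: every string over `alphabet` of length <= max_len."""
    symbols = list(alphabet)
    return frozenset(s for n in range(max_len + 1) for s in product(symbols, repeat=n))


# Assumed alphabet {a, b} and length bound 3; the real ?* is infinite.
SIGMA_STAR = universe(["a", "b"], 3)


def complement(a: Language) -> Language:
    """~A, defined via minus as [?* - A] (within the bounded universe)."""
    return SIGMA_STAR - a


def minus(a: Language, b: Language) -> Language:
    """[A - B]: the strings of A that are not in B."""
    return a - b


A = frozenset({("a",), ("a", "b")})
B = frozenset({("a", "b"), ("b",)})
print(sorted(minus(A, B)))                 # [('a',)]
print(minus(A, B) == A & complement(B))    # True: [A - B] = [A & ~B]
print(() in complement(A))                 # True: "" is not in A
```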

16 Abbreviations
A*    Closure (Kleene star)
(A)   Optional element
?     Any symbol
\b    Any symbol other than b
~A    Complement (= [?* - A])
0     Empty-string language
$A    Contains A (= [?* A ?*])
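Three of these abbreviations written out over a bounded universe. The alphabet {a, b, c} and the length bound 3 are assumptions that keep the sets finite; `optional`, `term_complement` and `contains` are hypothetical helper names that simply expand (A), \b and $A as the table defines them.

```python
from itertools import product
from typing import FrozenSet, Tuple

Language = FrozenSet[Tuple[str, ...]]

ALPHABET = ["a", "b", "c"]   # assumed finite alphabet standing in for ?
MAX_LEN = 3                  # assumed bound standing in for ?*

ANY: Language = frozenset((s,) for s in ALPHABET)
SIGMA_STAR: Language = frozenset(
    s for n in range(MAX_LEN + 1) for s in product(ALPHABET, repeat=n)
)


def optional(a: Language) -> Language:
    """(A) abbreviates [A | 0]."""
    return a | frozenset({()})


def term_complement(symbol: str) -> Language:
    r"""\b: any single symbol other than b, i.e. [? - b]."""
    return ANY - frozenset({(symbol,)})


def contains(a: Language) -> Language:
    """$A abbreviates [?* A ?*]: strings with some substring in A (within the bound)."""
    return frozenset(
        s for s in SIGMA_STAR
        if any(s[i:j] in a for i in range(len(s) + 1) for j in range(i, len(s) + 1))
    )


print(sorted(optional(frozenset({("a",)}))))   # [(), ('a',)]
print(sorted(term_complement("b")))            # [('a',), ('c',)]
has_ab = contains(frozenset({("a", "b")}))
print(("c", "a", "b") in has_ab, ("b", "a") in has_ab)   # True False
```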

17 String Relations
Ordered pair: a pair <x, y> whose two members come in a fixed order, so <x, y> is distinct from <y, x>.
A relation is simply a set of ordered pairs.
Some familiar relations over integers:
– {…}

18 Relations and Morphology
In morphological analysis and generation, we are typically interested in relations made up of ordered pairs of strings over lexical and surface languages, e.g. {…}.

19 Describing Relations
Notation: regular expressions with extra operations, including
– Cross Product
– Composition
Mechanism: Finite State Transducers (i.e. FS networks whose arcs are labelled with ordered pairs of symbols).

20 Three Related Perspectives
[Diagram: XEROX R.E. NOTATION denotes REGULAR RELATIONS; FINITE STATE TRANSDUCERS encode them; the notation compiles into finite state transducers.]

21 Cross Product
[A .x. B] denotes the relation that pairs every string of language A with every string of language B.
Example: [[c a t] .x. [c h a t]]
Special case, with special notation, when A and B are single symbols: [a .x. b] = a:b
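A relation can be sketched as a set of (upper, lower) string pairs, which makes the cross product a plain Cartesian product. The helpers `cross` and `symbol_pair` are illustrative; note that the two sides of a pair need not have the same length.

```python
from itertools import product
from typing import FrozenSet, Tuple

String = Tuple[str, ...]
Language = FrozenSet[String]
Relation = FrozenSet[Tuple[String, String]]   # set of (upper, lower) pairs


def cross(a: Language, b: Language) -> Relation:
    """[A .x. B]: pair every string of A with every string of B."""
    return frozenset(product(a, b))


def symbol_pair(upper: str, lower: str) -> Relation:
    """a:b is shorthand for [a .x. b] when both sides are single symbols."""
    return cross(frozenset({(upper,)}), frozenset({(lower,)}))


cat_chat = cross(frozenset({tuple("cat")}), frozenset({tuple("chat")}))
print(sorted(cat_chat))       # [(('c', 'a', 't'), ('c', 'h', 'a', 't'))]
print(symbol_pair("a", "b"))  # frozenset({(('a',), ('b',))})
```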

22 Symbol Pairs
Any pair of symbols a:b denotes a relation that consists of the corresponding pair of strings, i.e. {<"a", "b">}.
The left symbol a is the upper (lexical) symbol; the right symbol b is the lower (surface) symbol.
No distinction is made between a:a and a.

23 Checkpoint
A = [a|b|c]
B = [b|c|d]
What are the elements of the relation [A .x. B]?
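Since every string in A and B is a single symbol, each element of [A .x. B] can be written as a symbol pair. This enumeration is a quick way to check an answer; pairs such as b:b are the same as plain b.

```python
from itertools import product

A = ["a", "b", "c"]
B = ["b", "c", "d"]

# [A .x. B] with single-symbol strings on both sides: every pair x:y.
pairs = [f"{x}:{y}" for x, y in product(A, B)]
print(pairs)
# ['a:b', 'a:c', 'a:d', 'b:b', 'b:c', 'b:d', 'c:b', 'c:c', 'c:d']
print(len(pairs))   # 9 = 3 * 3
```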

24 Composition
[A .o. B] denotes the composition of relations A and B.
Definition: if A contains <x, y> and B contains <y, z>, then [A .o. B] contains <x, z>.
A and B must be relations. If either is just a language, it is taken to abbreviate the identity relation on that language.
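The definition translates directly into a join on the shared middle string. In this sketch the word pairs are purely hypothetical examples, and `spell`, `identity` and `compose` are illustrative helpers; `identity` shows how a plain language is treated when it appears where a relation is expected.

```python
from typing import FrozenSet, Tuple

String = Tuple[str, ...]
Relation = FrozenSet[Tuple[String, String]]   # set of (upper, lower) pairs


def spell(text: str) -> String:
    """Spell a word as a string of one-character symbols."""
    return tuple(text)


def identity(language: FrozenSet[String]) -> Relation:
    """Treat a language as the identity relation: pair each string with itself."""
    return frozenset((s, s) for s in language)


def compose(a: Relation, b: Relation) -> Relation:
    """[A .o. B]: contains <x, z> whenever A has <x, y> and B has <y, z>."""
    return frozenset((x, z) for (x, y1) in a for (y2, z) in b if y1 == y2)


# Hypothetical word pairs, not taken from any real lexicon.
A = frozenset({(spell("cat"), spell("chat")), (spell("dog"), spell("chien"))})
B = frozenset({(spell("chat"), spell("gatto"))})

print(compose(A, B) == frozenset({(spell("cat"), spell("gatto"))}))   # True

# A plain language on the left acts as a filter on the upper side of A.
dog_only = identity(frozenset({spell("dog")}))
print(compose(dog_only, A) == frozenset({(spell("dog"), spell("chien"))}))  # True
```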

25 Examples
Expression   Language/Relation
0            {""}
a            {"a"}
(a)          {"", "a"}
[Figure: the corresponding networks.]

26 Exercise: fill in the blanks
[Table with columns Expression, Language/Relation and Network. Recoverable entries: [a:0 b:a], [a 0 b], a* with language {"", "a", "aa", …}, a+, b:a, a:0, a, b:0.]

27 Issue
The class of regular languages is closed under union, intersection, concatenation and complementation. Is the same true of regular relations? Is it still true if we include the operations of cross product and composition?

28 Closure Properties: Definition of P
Consider P = [a:b]* [0:c]*.
P is a relation that maps a string of zero or more a's to an equal-length string of b's followed by zero or more c's:
{<"", "">, <"", "c">, <"", "cc">, …, <"a", "b">, <"a", "bc">, …, <"aa", "bb">, <"aa", "bbc">, …}
P is a regular relation.

29 Closure Properties: Definition of Q
Consider Q = [0:b]* [a:c]*.
Q is a relation that maps a string of zero or more a's to an equal-length string of c's preceded by zero or more b's:
{<"", "">, <"", "b">, <"", "bb">, …, <"a", "c">, <"a", "bc">, …, <"aa", "cc">, <"aa", "bcc">, …}
N.B. Q is a regular relation.

30 Closure Properties: [P & Q]
Let us now consider the intersection of P and Q:
{<"", "">, <"a", "bc">, <"aa", "bbcc">, <"aaa", "bbbccc">, …}
The lower-side language is clearly bⁿcⁿ. Is this finite state?
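Enumerating both relations up to an assumed bound N shows why [P & Q] pairs aⁿ with bⁿcⁿ: a pair of P fixes the number of b's, a pair of Q fixes the number of c's, and only pairs where both equal n survive. Since every symbol here is a single character, plain Python strings are used; the bound and the function names are illustrative.

```python
from typing import FrozenSet, Tuple

Relation = FrozenSet[Tuple[str, str]]
N = 4   # assumed bound on repetitions, so the sets stay finite


def p_relation(bound: int) -> Relation:
    """Pairs of P = [a:b]* [0:c]*: a^n on the upper side, b^n c^m on the lower."""
    return frozenset(
        ("a" * n, "b" * n + "c" * m) for n in range(bound + 1) for m in range(bound + 1)
    )


def q_relation(bound: int) -> Relation:
    """Pairs of Q = [0:b]* [a:c]*: a^n on the upper side, b^m c^n on the lower."""
    return frozenset(
        ("a" * n, "b" * m + "c" * n) for n in range(bound + 1) for m in range(bound + 1)
    )


both = p_relation(N) & q_relation(N)
print(sorted(both))
# [('', ''), ('a', 'bc'), ('aa', 'bbcc'), ('aaa', 'bbbccc'), ('aaaa', 'bbbbcccc')]
assert all(lower == "b" * len(upper) + "c" * len(upper) for upper, lower in both)
```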

31 bⁿcⁿ
This language is generated by the following context-free grammar:
S → ε
S → bSc
It cannot be generated by a regular grammar, whose rules must take the form A → a or A → aB; the pumping lemma for regular languages shows that no such grammar can generate it.
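A small sketch contrasting the grammar with a recogniser. `derive` applies the rules S → bSc and S → ε; `in_bncn` recognises bⁿcⁿ using an integer counter that can grow without limit, which is precisely the unbounded memory a finite-state machine lacks. Both function names are hypothetical.

```python
def derive(depth: int) -> str:
    """Apply S -> b S c `depth` times, then finish with S -> ε."""
    if depth == 0:
        return ""                          # S -> ε
    return "b" + derive(depth - 1) + "c"   # S -> b S c


def in_bncn(s: str) -> bool:
    """Recognise b^n c^n with an unbounded counter (not a finite-state method)."""
    count = 0
    i = 0
    while i < len(s) and s[i] == "b":
        count += 1
        i += 1
    while i < len(s) and s[i] == "c":
        count -= 1
        i += 1
    return i == len(s) and count == 0


print([derive(n) for n in range(4)])                    # ['', 'bc', 'bbcc', 'bbbccc']
print(in_bncn("bbcc"), in_bncn("bbc"), in_bncn("cb"))   # True False False
```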

32 Closure Properties: bⁿcⁿ
Consequently, the language cannot be recognised by an FSA, and no regular relation can have it as its lower-side language (the projections of a regular relation are regular languages). Therefore, there is no operation on the FSTs for P and Q that yields their intersection.
If regular relations are not closed under intersection, then they are not closed under complementation or subtraction either, because intersection can be defined from each: [A & B] = ~[~A | ~B] (union is closed), and [A & B] = [A - [A - B]].

33 Closure Properties of Regular Languages and Relations
Operation         Regular Languages   Regular Relations
Union             yes                 yes
Concatenation     yes                 yes
Iteration         yes                 yes
Intersection      yes                 no
Subtraction       yes                 no
Complementation   yes                 no
Composition       n/a                 yes

