Published by Jarred Lemmon. Modified about 1 year ago.

1
Finite-State Transducers
Shallow Processing Techniques for NLP
Ling570
October 10, 2011

2
Announcements
- Wednesday online GP meeting scheduling
- Seminar on Friday: Luke Zettlemoyer (CSE), Automatic grammar induction
- Treehouse Friday: Classifiers – Memory Lane

3
Roadmap
- Motivation: FST applications
- FST perspectives
- FSTs and Regular Relations
- FST Operations

4
FSTs
- Finite automaton that maps between two strings
- Automaton with two labels per arc: input:output

5
FST Applications
- Tokenization
- Segmentation
- Morphological analysis
- Transliteration
- Parsing
- Translation
- Speech recognition
- Spoken language understanding
- …

9
Approaches to FSTs
- FST as recognizer: takes a pair of input:output strings; accepts if the pair is in the language, otherwise rejects
- FST as generator: outputs pairs of strings in the language
- FST as translator: reads an input string and prints an output string
- FST as set relator: computes relations between sets

12
FSTs & Regular Relations
- FSAs: equivalent to regular languages
- FSTs: equivalent to regular relations
  - Sets of pairs of strings
- Regular relations:
  - For all (x,y) in Σ₁ × Σ₂, {(x,y)} is a regular relation
  - The empty set is a regular relation
  - If R₁, R₂ are regular relations, then R₁R₂, R₁ ∪ R₂, and R₁* are regular relations

17
Regular Relation Closures
- By definition, regular relations are closed under:
  - Concatenation: R₁R₂
  - Union: R₁ ∪ R₂
  - Kleene *: R₁*
- Like regular languages
- Unlike regular languages, they are NOT closed under:
  - Intersection: R₁ = {(aⁿb*, cⁿ)} and R₂ = {(a*bⁿ, cⁿ)}; their intersection is {(aⁿbⁿ, cⁿ)}, which is not regular
  - Difference
  - Complementation
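
The intersection counterexample can be spelled out step by step. The following is a sketch of the standard argument; the index k and the projection symbol π₁ (the set of input strings of a relation) are notation introduced here, not taken from the slides.

```latex
\begin{align*}
R_1 &= \{(a^n b^k,\; c^n) : n, k \ge 0\}
    && \text{(count the $a$'s, ignore the $b$'s)} \\
R_2 &= \{(a^k b^n,\; c^n) : n, k \ge 0\}
    && \text{(ignore the $a$'s, count the $b$'s)} \\
R_1 \cap R_2 &= \{(a^n b^n,\; c^n) : n \ge 0\}
    && \text{(both counts must equal $|c^n| = n$)} \\
\pi_1(R_1 \cap R_2) &= \{a^n b^n : n \ge 0\}
    && \text{(not a regular language)}
\end{align*}
```

Each of R₁ and R₂ is regular: R₁, for example, can be written as an a:c loop followed by a b:ε loop. The projection of a regular relation is always a regular language, so if the intersection were regular, {aⁿbⁿ} would be regular too, which it is not.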

21
Regular Relation Closures
- Regular relations are also closed under:
  - Composition
  - Inversion
- Operations:
  - Projection
  - Identity & cross-product of regular languages

26
FST Formal Definition
A Finite-State Transducer is a 6-tuple (Q, Σ, Γ, I, F, δ):
- A finite set of states: Q
- A finite set of input symbols: Σ
- A finite set of output symbols: Γ
- A finite set of initial states: I
- A finite set of final states: F
- A set of transition relations between states: δ ⊆ Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) × Q
FSAs are a special case of FSTs
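
One way to make the definition concrete is to store the six listed components directly. This is a minimal sketch, not a standard library API: the class name `FST`, its field names, the helper `fsa_as_fst`, and the use of the empty string for ε are all assumptions made for illustration.

```python
from dataclasses import dataclass

EPS = ""  # assumption: the empty string stands in for ε


@dataclass(frozen=True)
class FST:
    states: frozenset        # Q
    in_alphabet: frozenset   # Σ
    out_alphabet: frozenset  # Γ
    initial: frozenset       # I
    final: frozenset         # F
    delta: frozenset         # δ ⊆ Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) × Q

    def is_well_formed(self) -> bool:
        """Check that I, F and every arc in δ respect the definition."""
        return (
            self.initial <= self.states
            and self.final <= self.states
            and all(
                p in self.states
                and q in self.states
                and (a == EPS or a in self.in_alphabet)
                and (b == EPS or b in self.out_alphabet)
                for (p, a, b, q) in self.delta
            )
        )


def fsa_as_fst(states, alphabet, initial, final, arcs) -> FST:
    """View an FSA as an FST: each arc (p, a, q) becomes (p, a, a, q)."""
    delta = frozenset((p, a, a, q) for (p, a, q) in arcs)
    return FST(frozenset(states), frozenset(alphabet), frozenset(alphabet),
               frozenset(initial), frozenset(final), delta)


# A single-state transducer that rewrites every a as b.
t = FST(frozenset({0}), frozenset({"a"}), frozenset({"b"}),
        frozenset({0}), frozenset({0}), frozenset({(0, "a", "b", 0)}))
print(t.is_well_formed())
```

The `fsa_as_fst` helper illustrates the "special case" remark: an FSA behaves like an FST whose arcs all carry the same symbol on the input and output side.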

28
FST Operations Union: Concatenation:

30
FST Operations
- Inversion: switching input and output labels
  - If T maps from I to O, T⁻¹ maps from O to I
- Composition:
  - If T₁ is a transducer from I₁ to O₂ and T₂ is a transducer from O₂ to O₃, then T₁ ∘ T₂ is a transducer from I₁ to O₃
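
Both operations can be sketched on bare arc sets. The representation is an assumption for illustration: an arc is a tuple (source, input label, output label, destination), and the composition shown handles only transducers without ε labels. The function names `invert` and `compose` are hypothetical.

```python
def invert(arcs):
    """Swap the input and output label on every arc: T becomes its inverse."""
    return {(src, out, inp, dst) for (src, inp, out, dst) in arcs}


def compose(arcs1, arcs2):
    """Arcs of T1 composed with T2, for transducers without ε labels.

    An arc of T1 is paired with an arc of T2 whenever T1's output label
    equals T2's input label. States of the result are (T1 state, T2 state)
    pairs.
    """
    return {
        ((p1, p2), a, c, (q1, q2))
        for (p1, a, b, q1) in arcs1
        for (p2, b2, c, q2) in arcs2
        if b == b2
    }


# Hypothetical single-state examples: T1 rewrites a as b, T2 rewrites b as c.
t1 = {(0, "a", "b", 0)}
t2 = {(0, "b", "c", 0)}

print(compose(t1, t2))  # arcs that rewrite a as c
print(invert(t1))       # arcs that rewrite b as a
```

For a complete composed machine, the initial and final states would be the pairs drawn from the initial and final states of the two operands. Handling ε labels correctly needs extra care that this sketch leaves out.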

32
FST Examples
R(T) = {(ε,ε), (a,b), (aa,bb), (aaa,bbb), …}

35
FST Examples
- R(T) = {(ε,ε), (a,b), (aa,bb), (aaa,bbb), …}
- R(T) = {(a,x), (ab,xy), (abb,xyy), …}
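
The two relations above can be reproduced by treating an FST as a generator and enumerating its short paths. The transducer diagrams are not part of this transcript, so the arc sets below are assumed machines that happen to produce the listed pairs. The function name `generate` and the tuple layout (source, input, output, destination) are also illustrative choices.

```python
def generate(arcs, initial, final, max_arcs):
    """Collect (input, output) pairs along paths of at most max_arcs arcs."""
    pairs = set()
    frontier = {(q, "", "") for q in initial}
    for _ in range(max_arcs + 1):
        # Any path that currently sits in a final state contributes a pair.
        pairs |= {(x, y) for (q, x, y) in frontier if q in final}
        # Extend every path by one more arc.
        frontier = {
            (dst, x + a, y + b)
            for (q, x, y) in frontier
            for (src, a, b, dst) in arcs
            if src == q
        }
    return pairs


# Assumed machine for the first relation: one state, a:b self-loop.
first = {(0, "a", "b", 0)}
# Assumed machine for the second relation: a:x into state 1, then b:y loops.
second = {(0, "a", "x", 1), (1, "b", "y", 1)}

print(sorted(generate(first, {0}, {0}, 3)))
# [('', ''), ('a', 'b'), ('aa', 'bb'), ('aaa', 'bbb')]
print(sorted(generate(second, {0}, {1}, 3)))
# [('a', 'x'), ('ab', 'xy'), ('abb', 'xyy')]
```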

38
FST Application Examples
- Case folding: He said → he said
- Tokenization: “He ran.” → “ He ran. “
- POS tagging: They can fish → PRO VERB NOUN

39
FST Application Examples
- Pronunciation: B AH T EH R → B AH DX EH R
- Morphological generation: Fox s → Foxes
- Morphological analysis: cats → cat s
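
The morphological generation example (fox plus s yields foxes) involves a spelling change that a transducer would encode in its arcs. As a rough stand-in, the rule can be sketched with a regular expression. This is not an FST implementation: the function name `pluralize` and the list of endings are assumptions, and the rule covers only e-insertion, not irregular or y-final nouns.

```python
import re

# Assumption: e-insertion applies after these stem endings.
SIBILANT_ENDING = re.compile(r"(s|x|z|ch|sh)$")


def pluralize(stem: str) -> str:
    """Attach the plural morpheme, inserting e after a sibilant ending."""
    if SIBILANT_ENDING.search(stem):
        return stem + "es"
    return stem + "s"


print(pluralize("fox"))  # foxes
print(pluralize("cat"))  # cats
```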

45
FST Algorithms
- Recognition: Is a given string pair (x,y) accepted by the FST?
  - (x,y) → yes/no
- Composition: Given a pair of transducers T₁ and T₂, create a new transducer T₁ ∘ T₂.
- Transduction: Given an input string and an FST, compute the output string.
  - x → y
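
Transduction and recognition can be sketched together for a restricted case. The code below assumes that no arc has an empty input label (output labels may be empty), which keeps the search finite. The arc layout (source, input, output, destination), the function names, and the example machine are illustrative assumptions.

```python
def transduce(arcs, initial, final, x):
    """Return every output y such that the pair (x, y) is accepted.

    Assumes no arc has an empty input label, so exactly one arc is
    taken per input symbol.
    """
    configs = {(q, "") for q in initial}  # (current state, output so far)
    for sym in x:
        configs = {
            (dst, out + b)
            for (q, out) in configs
            for (src, a, b, dst) in arcs
            if src == q and a == sym
        }
    return {out for (q, out) in configs if q in final}


def recognize(arcs, initial, final, x, y):
    """Is the string pair (x, y) accepted? Reduces recognition to transduction."""
    return y in transduce(arcs, initial, final, x)


# Assumed machine: reads a then any number of b's, writes x then y's.
arcs = {(0, "a", "x", 1), (1, "b", "y", 1)}

print(transduce(arcs, {0}, {1}, "abb"))        # {'xyy'}
print(recognize(arcs, {0}, {1}, "ab", "xy"))   # True
print(recognize(arcs, {0}, {1}, "ab", "xx"))   # False
```

Returning a set rather than a single string reflects that an FST may be nondeterministic and relate one input to several outputs, or to none.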

49
WFST Definition
A probabilistic (weighted) finite-state transducer is defined by:
- A finite set of states: Q
- A finite set of input symbols: Σ
- A finite set of output symbols: Γ
- A finite set of initial states: I
- A finite set of final states: F
- A set of transitions: δ ⊆ Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) × Q
- Initial state probabilities: Q → ℝ⁺
- Transition probabilities: δ → ℝ⁺
- Final state probabilities: Q → ℝ⁺
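
The three probability functions combine along a path: the initial probability of the start state, times each transition probability, times the final probability of the end state. The sketch below sums that product over all paths that read a given input. It assumes no arc has an empty input label, and the function name, dictionary layout, and probability values are made up for illustration.

```python
from collections import defaultdict


def weighted_transduce(init_prob, trans_prob, final_prob, x):
    """Map each output y to the total probability of paths reading x, writing y.

    init_prob:  {state: probability}
    trans_prob: {(src, input, output, dst): probability}
    final_prob: {state: probability}
    """
    configs = defaultdict(float)  # (state, output so far) -> probability mass
    for q, p in init_prob.items():
        configs[(q, "")] += p
    for sym in x:
        nxt = defaultdict(float)
        for (q, out), p in configs.items():
            for (src, a, b, dst), w in trans_prob.items():
                if src == q and a == sym:
                    nxt[(dst, out + b)] += p * w
        configs = nxt
    result = defaultdict(float)
    for (q, out), p in configs.items():
        if final_prob.get(q, 0.0) > 0.0:
            result[out] += p * final_prob[q]
    return dict(result)


# Made-up one-state example: a is rewritten as b (0.5) or c (0.25);
# the machine stops with probability 0.25.
init = {0: 1.0}
trans = {(0, "a", "b", 0): 0.5, (0, "a", "c", 0): 0.25}
final = {0: 0.25}

print(weighted_transduce(init, trans, final, "a"))
# {'b': 0.125, 'c': 0.0625}
```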

53
Summary
- FSTs
  - Equivalent to regular relations
  - Transduce strings to strings
  - Useful for a range of applications
- Closed under union, concatenation, Kleene*, inversion, composition
  - Project to FSAs
- Not closed under intersection, complementation, difference
- Algorithms: recognition, composition, transduction

54
Morphology and FSTs

55
Roadmap
- Motivation: Representing words
- A little (mostly English) Morphology
- Stemming
- FSTs & Morphology
- FSTs & Phonology

59
Surface Variation & Morphology
- Searching (a la Google) for documents about: Televised sports
- Many possible surface forms:
  - Televised, television, televise, …
  - Sports, sport, sporting, …
- How can we match?
  - Convert surface forms to common base form
  - Stemming or morphological analysis

64
The Lexicon
- Goal: Represent all the words in a language
- Approach?
  - Enumerate all words?
    - Doable for English
    - Typical for ASR (Automatic Speech Recognition)
    - English is morphologically relatively impoverished
  - Other languages?
    - Wildly impractical
    - Turkish: 40,000 forms/verb; uygarlaştıramadıklarımızdanmışsınızcasına “(behaving) as if you are among those whom we could not civilize”

71
Morphological Parsing
- Goal: Take a surface word form and generate a linguistic structure of component morphemes
- A morpheme is the minimal meaning-bearing unit in a language.
  - Stem: the morpheme that forms the central meaning unit in a word
  - Affix: prefix, suffix, infix, circumfix
    - Prefix: e.g., possible → impossible
    - Suffix: e.g., walk → walking
    - Infix: e.g., hingi → humingi (Tagalog)
    - Circumfix: e.g., sagen → gesagt (German)

76
Two Perspectives
- Stemming:
  - writing → write (or writ)
  - Beijing → Beije
- Morphological Analysis:
  - writing → write+V+prog
  - cats → cat+N+pl
  - writes → write+V+3rdpers+Sg
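
The contrast between the two perspectives can be shown with a toy example. The stemmer below applies a single made-up rule (replace a final "ing" with "e") without consulting a lexicon, which is why it mangles Beijing. The analyzer is a plain lookup table holding only the three analyses listed above. Both function names and the rule are illustrative assumptions, not a real stemming algorithm.

```python
def crude_stem(word: str) -> str:
    """Lexicon-free stemming with one rule: final 'ing' becomes 'e'."""
    if word.endswith("ing"):
        return word[:-3] + "e"
    return word


# Toy analysis table; a real analyzer would derive these from a lexicon and rules.
ANALYSES = {
    "writing": "write+V+prog",
    "cats": "cat+N+pl",
    "writes": "write+V+3rdpers+Sg",
}


def analyze(word: str):
    """Return the stored analysis, or None for words outside the toy table."""
    return ANALYSES.get(word.lower())


print(crude_stem("writing"))  # write
print(crude_stem("Beijing"))  # Beije
print(analyze("writing"))     # write+V+prog
print(analyze("Beijing"))     # None
```

The stemmer always returns something, right or wrong. The analyzer returns structure but only for words it knows.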

80
Ambiguity in Morphology
- Alternative analyses:
  - Flies → fly+N+Pl
  - Flies → fly+V+3rdpers+Sg
  - Saw → see+V+past
  - Saw → saw+N
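
Because one surface form can have several analyses, an analyzer naturally returns a list rather than a single answer. The sketch below uses a tiny hand-made lexicon and two suffix rules. The word lists, function names, and rule coverage are assumptions chosen to reproduce the examples above, not a general analyzer.

```python
# Toy lexicon, chosen only to cover the examples.
NOUNS = {"fly", "saw", "cat"}
VERBS = {"fly", "see", "write"}
IRREGULAR_PAST = {"saw": "see"}


def candidate_stems(word):
    """Undo -s suffixation: handles plain -s and the y -> ies spelling change."""
    stems = []
    if word.endswith("ies"):
        stems.append(word[:-3] + "y")
    if word.endswith("s"):
        stems.append(word[:-1])
    return stems


def analyses(word):
    """Return every analysis the toy lexicon and rules license for the word."""
    word = word.lower()
    results = []
    if word in NOUNS:
        results.append(f"{word}+N")
    if word in VERBS:
        results.append(f"{word}+V")
    if word in IRREGULAR_PAST:
        results.append(f"{IRREGULAR_PAST[word]}+V+past")
    for stem in candidate_stems(word):
        if stem in NOUNS:
            results.append(f"{stem}+N+Pl")
        if stem in VERBS:
            results.append(f"{stem}+V+3rdpers+Sg")
    return results


print(analyses("Flies"))  # ['fly+N+Pl', 'fly+V+3rdpers+Sg']
print(analyses("Saw"))    # ['saw+N', 'see+V+past']
```

Choosing among the returned analyses is a separate disambiguation step, typically driven by context such as the part-of-speech tags of neighboring words.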

83
Multi-linguality in Morphology
- Morphologically impoverished languages
  - E.g., English
- Isolating languages
  - E.g., Chinese
- Morphologically rich languages
  - E.g., Turkish

87
Combining Morphemes
- Inflection: stem + gram. morpheme → same class
  - E.g., help + ed → helped
- Derivation: stem + gram. morpheme → new class
  - E.g., walk + er → walker (N)
- Compounding: multiple stems → new word
  - E.g., doghouse, catwalk, …
- Clitics: stem + clitic
  - E.g., I + ll → I’ll; he + is → he’s
