
Tree Automata for Automatic Language Translation. Kevin Knight, Information Sciences Institute, University of Southern California.




1 Tree Automata for Automatic Language Translation. Kevin Knight, Information Sciences Institute, University of Southern California

2 Outline
History of the World (of Automata in NLP)
Weighted string automata in NLP
– Applications: transliteration, machine translation, language modeling, speech, lexical processing, tagging, summarization, optical character recognition, …
– Generic algorithms and toolkits
Weighted tree automata in NLP
– Applications
– Generic algorithms and toolkits
Some connections with theory

3 History of the World
[Markov 1913] consonant/vowel sequences in Pushkin novels
[Shannon 1948] noisy channel model, cryptography
[Chomsky 1956] context-free grammars
[Chomsky 1957] transformational grammars
[Rounds 1970] & [Thatcher 1970] tree transducers, to formalize transformational grammars

4 Transformational Grammar
[tree diagrams: the active sentence "the boy saw the door" transformed into the passive "the door was seen by the boy"]



7 History of the World
[Markov 1913] consonant/vowel sequences in Pushkin novels
[Shannon 1948] noisy channel model, cryptography
[Chomsky 1956] context-free grammars
[Chomsky 1957] transformational grammars
[Rounds 1970] & [Thatcher 1970] tree transducers, to formalize transformational grammars
[Thatcher 1973] tree automata survey article:
“The number one priority in the area [of tree automata theory] is a careful assessment of the significant problems concerning natural language and programming language semantics and translation. If such problems can be found and formulated, I am convinced that the approach informally surveyed here can provide a unifying framework within which to study them.”

8 History of the World
[diagram: Linguistics, Tree Automata Theory, Computers]

9 History of the World
LINGUISTICS: “Let’s drop formalism until we understand things better!”
NATURAL LANGUAGE PROCESSING: “Let’s build demo systems!”
THEORY: “Let’s prove theorems!”

10 Natural Language Processing
1970s–80s
– models of English syntax, demonstration grammars
– beyond CFG: augmented transition networks (ATN); unification-based grammars (HPSG, LFG, ...), which mostly turned out to be formally equivalent to each other … and to Turing machines; tree-adjoining grammar (TAG) and categorial grammar (mildly context-sensitive grammars)
Meanwhile, in speech recognition…
– probabilistic finite-state grammars of English, built automatically from training data (corpus)
– word n-grams
– a successful paradigm

11 Natural Language Processing
1993 – US agency DARPA presided over a forced marriage of speech and language research
1990s – NLP dominated by probabilistic finite-state string formalisms and automatic training; weighted FSA/FST toolkits
2000s – re-awakened interest in tree formalisms for modeling syntax-sensitive operations

12 Back to the Outline
History of the World (of Automata for NLP)
Weighted string automata in NLP
– Applications: transliteration, machine translation, language modeling, speech, lexical processing, tagging, summarization, optical character recognition, …
– Generic algorithms and toolkits
Weighted tree automata in NLP
– Applications
– Generic algorithms and toolkits
Some connections with theory

13 Natural Language Transformations (each maps an input to an output): Machine Translation, Name Transliteration, Compression, Question Answering, Spelling Correction, Speech Recognition, Language Generation, Text to Speech

14 Finite-State Transducer (FST)
Transitions: q -k:*e*-> q2, q2 -n:N-> q, q -i:AY-> q, q -g:*e*-> q3, q3 -h:*e*-> q4, q4 -t:T-> qfinal
Original input: k n i g h t

15 Finite-State (String) Transducer: state q2, remaining input “n i g h t”, output so far: (empty)

16 Finite-State (String) Transducer: state q, remaining input “i g h t”, output so far: N

17 Finite-State (String) Transducer: state q, remaining input “g h t”, output so far: N AY

18 Finite-State (String) Transducer: state q3, remaining input “h t”, output so far: N AY

19 Finite-State (String) Transducer: state q4, remaining input “t”, output so far: N AY

20 Finite-State (String) Transducer: input consumed, state qfinal, final output: N AY T
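The walk in slides 14–20 is easy to mirror in code. Below is a minimal sketch (ours, not from the talk) of the deterministic case in Python; the states and transitions follow the slides, while the name `apply_fst` is invented for illustration.

```python
# Minimal sketch of the FST walk from slides 14-20 (illustrative, not from the talk).
# Transitions map (state, input symbol) -> (next state, output symbol or None for *e*).
TRANSITIONS = {
    ("q", "k"): ("q2", None),   # k : *e*  (silent k)
    ("q2", "n"): ("q", "N"),
    ("q", "i"): ("q", "AY"),
    ("q", "g"): ("q3", None),   # g : *e*
    ("q3", "h"): ("q4", None),  # h : *e*
    ("q4", "t"): ("qfinal", "T"),
}

def apply_fst(symbols, start="q", final="qfinal"):
    """Run the transducer over a list of symbols; return output or None if rejected."""
    state, output = start, []
    for sym in symbols:
        if (state, sym) not in TRANSITIONS:
            return None  # no matching transition: input rejected
        state, out = TRANSITIONS[(state, sym)]
        if out is not None:
            output.append(out)
    return output if state == final else None

print(apply_fst(list("knight")))  # ['N', 'AY', 'T']
```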

21 Transliteration
“Angela Knight” ↔ “a n ji ra na i to”
A frequently occurring translation problem across languages with different sound systems and character sets (Japanese, Chinese, Arabic, Russian, English, …). Can’t be solved by dictionary lookup.

22 Forward and Backward Transliteration
Forward transliteration (some variation allowed): “Angela Knight” -> “a n ji ra na i to”
Backward transliteration (no variation allowed): “a n ji ra na i to” -> “Angela Knight”

23 Practical Problem

24 Transliteration: Angela Knight, through a WFST (7 input symbols, 13 output symbols)

25 Transliteration: Angela Knight, through a WFST (7 input symbols, 13 output symbols; highlighted symbol: “ra”)

26 Transliteration: noisy channel framework
WFSA P(e): generate/accept well-formed English sequences
WFST P(k | e): make transformations w/o worrying too much about context

27 Transliteration: noisy channel framework
WFSA P(e): generate/accept well-formed English sequences
WFST P(k | e): make transformations w/o worrying too much about context
DECODE: argmax_e P(e | k) = argmax_e P(e) P(k | e)
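As a toy illustration of slide 27's decode rule (the candidate strings and probabilities below are invented; real systems search over WFSA/WFST compositions rather than explicit lists):

```python
# Toy noisy-channel decoding sketch: argmax_e P(e) * P(k|e).
# Candidates and probabilities are invented for illustration only.
candidates = {
    # e : (P(e) from the WFSA language model, P(k|e) from the WFST channel model)
    "ANGELA KNIGHT":  (1e-8, 1e-4),
    "ANGELA NIGHT":   (1e-9, 1e-4),
    "ANGELA KNIGHTO": (1e-12, 1e-3),
}

def decode(candidates):
    return max(candidates, key=lambda e: candidates[e][0] * candidates[e][1])

print(decode(candidates))  # ANGELA KNIGHT
```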

28 Transliteration: “generative story”
WFSA A -> “Angela Knight” -> WFST B -> “AE N J EH L UH N AY T” -> WFST C -> WFST D -> “a n j i r a n a i t o”

29 DECODE: given “a n j i r a n a i t o”, the cascade (WFSA A; WFSTs B, C, D) scores candidate English analyses: “AE N J IH R UH N AY T”, “AH N J IH L UH N AY T OH”, + millions more

30 Machine Translation
美国关岛国际机场及其办公室均接获一名自称沙地阿拉伯富商拉登等发出的电子邮件,威胁将会向机场等公众地方发动生化袭击後,关岛经保持高度戒备。
The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport.

31 Machine Translation
“I see a Spanish sentence on the page. How did it get there?”
direct model vs. noisy channel model

32–37 Machine Translation [Brown et al 93] [Knight & Al-Onaizan 98]
[diagram: the translation model as a cascade of weighted automata, stepping through WFSA A, B, C, D, and E in turn]

38 Other Applications of Weighted String Automata in NLP
speech recognition [Pereira, Riley, Sproat 94]
lexical processing
– word segmentation [Sproat et al 96]
– morphological analysis/generation [Kaplan and Kay 94; Clark 02]
tagging
– part-of-speech tagging [Church 88]
– name finding
summarization [Zajic, Dorr, Schwartz 02]
optical character recognition [Kolak, Byrne, Resnik 03]
decipherment [Knight et al 06]

39 Algorithms for String Automata
N-best: … paths through a WFSA (Viterbi, 1967; Eppstein, 1998)
EM training: forward-backward EM (Baum & Welch, 1971; Eisner, 2001)
Determinization: … of weighted string acceptors (Mohri, 1997)
Intersection: WFSA intersection
Application: string -> WFST -> WFSA
Transducer composition: WFST composition (Pereira & Riley, 1996)
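As one concrete instance of the first row, here is a minimal Viterbi best-path sketch over an acyclic WFSA in the probability semiring (the arc-list representation is our own, assumed for illustration):

```python
# Viterbi 1-best path through an acyclic WFSA (sketch; representation assumed).
# Each arc: (source state, target state, symbol, probability).
arcs = [
    (0, 1, "n", 0.9), (0, 1, "m", 0.1),
    (1, 2, "ay", 0.6), (1, 2, "ih", 0.4),
    (2, 3, "t", 1.0),
]
start, final = 0, 3

def viterbi(arcs, start, final):
    best = {start: (1.0, [])}           # state -> (best prob, symbols so far)
    for src, dst, sym, p in arcs:       # arcs assumed topologically ordered
        if src in best:
            prob, path = best[src]
            cand = (prob * p, path + [sym])
            if dst not in best or cand[0] > best[dst][0]:
                best[dst] = cand
    return best.get(final)

print(viterbi(arcs, start, final))  # (0.54, ['n', 'ay', 't'])
```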

40 String Automata Toolkits Used in NLP
Unweighted: Xerox finite-state calculus, plus many children
Weighted: AT&T FSM, plus many children: Google OpenFST, ISI Carmel, Aachen FSA, DFKI FSM toolkit, MIT FST toolkit, …

41 String Automata Toolkits Used in NLP
% echo 'a n ji ra ho re su te ru na i to' | carmel -rsi -k 5 -IEQ
    word.names.50000wds.transducer /* wfsa */
    word-epron.names.55000wds.transducer /* wfst */
    epron-jpron.1.transducer /* wfst */
    jpron.transducer /* wfst */
    vowel-separator.transducer /* wfst */
    jpron-asciikana.transducer /* wfst */
ANGELA FORRESTAL KNIGHT 2.60e-20
ANGELA FORRESTER KNIGHT 6.00e-21
ANGELA FOREST EL KNIGHT 1.91e-21
ANGELA FORESTER KNIGHT 1.77e-21
ANGELA HOLLISTER KNIGHT 1.33e-21

42 The Beautiful World of Composable Transducers
[diagram: transducers linking English word sequences, foreign word sequences, English phoneme sequences, foreign phoneme sequences, and long English word sequences, via models such as P(e), P(f|e), P(e|f), P(p|e), P(e|p), P(r|e), P(f|r), P(r), P(l|e)]


48 Finite-State String Transducers
Nice properties -> nice toolkits
[chart: Translation Accuracy in the NIST Common Evaluations, 2002–2006, for phrase substitution/transposition systems]

49 Finite-State String Transducers
Not expressive enough for many problems! For example, machine translation:
– Arabic to English: move the verb from the beginning of the sentence to the middle (in between the subject and object)
– Chinese to English: when translating noun-phrase “de” noun-phrase, flip the order of the noun-phrases & substitute “of” for “de”

50 Experimental Progress in Statistical Machine Translation
[chart: Translation Accuracy in the NIST Common Evaluations, 2002–2006; phrase substitution with no linguistic categories vs. tree transformation with linguistic categories]

51 Syntax Started to Be Helpful in 2006
[chart: Translation Accuracy (30–45), apr 2005 through feb 2007, Chinese/English; string-based system on all sentences (NIST-2003) vs. sentences < 16 words (NIST-03/04)]

52 String-Based Output
Chinese input: 枪手 被 警方 击毙 (gunman / [passive marker] / police / killed)
Decoder Hypothesis #1: “Gunman of police killed.”

53 String-Based Output
Decoder Hypothesis #7: “Gunman of police attack.”

54 String-Based Output
Decoder Hypothesis #12: “Gunman by police killed.”

55 String-Based Output
Decoder Hypothesis #134: “Killed gunman by police.”

56 String-Based Output
Decoder Hypothesis #9,329: “Gunman killed the police.”

57 String-Based Output
Decoder Hypothesis #50,654 (highest-scoring output, phrase-based model): “Gunman killed by police.”
Problematic:
– VBD “killed” needs a direct object
– VBN “killed” needs an auxiliary verb (“was”)
– countable “gunman” needs an article (“a”, “the”)
– the “passive marker” 被 in Chinese controls re-ordering
Can’t enforce/encourage any of this!

58 Tree-Based Output
Chinese input: 枪手 被 警方 击毙
Decoder Hypothesis #1: “The gunman killed by police.” [parse tree: preterminals DT NN VBD IN NN; constituents NPB, PP, NP-C, VP, S]

59 Tree-Based Output
Decoder Hypothesis #16: “Gunman by police shot.” [parse tree: preterminals NN IN NN VBD; constituents NPB, PP, NP-C, VP, S]

60 Tree-Based Output
Decoder Hypothesis #1923 (highest-scoring output, syntax-based model): “The gunman was killed by police.” [parse tree: preterminals DT NN AUX VBN IN NN; constituents NPB, PP, NP-C, VP, S]
OK, so how does a Chinese string transform into an English tree, or vice-versa?

61 Back to the Outline
History of the World (of Automata for NLP)
Weighted string automata in NLP
– Applications: transliteration, machine translation, language modeling, speech, lexical processing, tagging, summarization, optical character recognition, …
– Generic algorithms and toolkits
Weighted tree automata in NLP
– Applications
– Generic algorithms and toolkits
Some connections with theory

62 Top-Down Tree Transducer (W. Rounds 1970; J. Thatcher 1970)
Original input: the English parse tree for “he enjoys listening to music”:
S(NP(PRO he), VP(VBZ enjoys, NP(SBAR(VP(VBG listening, P to, NP music)))))

63 Top-Down Tree Transducer: processing starts at the root of the input tree.

64 Transformation: NP(PRO he), “wa”, NP(SBAR “listening to music”), “ga”, VBZ(enjoys)

65 Transformation: “kare”, “wa”, NP(SBAR “listening to music”), “ga”, VBZ(enjoys)

66 Final output: kare, wa, ongaku, o, kiku, no, ga, daisuki, desu

67 The same transduction, now with explicit states: the start state q is applied to the whole input tree.

68 Rule applied (weight 0.2): q S(x0:NP, VP(x1:VBZ, x2:NP)) -> s x0, “wa”, r x2, “ga”, q x1

69 Transformation: s NP(PRO he), “wa”, r NP(SBAR “listening to music”), “ga”, q VBZ(enjoys)

70 Rule applied (weight 0.7): s NP(PRO(he)) -> “kare”

71 Transformation: “kare”, “wa”, r NP(SBAR “listening to music”), “ga”, q VBZ(enjoys)

72 Final output: kare, wa, ongaku, o, kiku, no, ga, daisuki, desu. To get the total probability, multiply the probabilities of the individual steps.
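The rule application in slides 67–72 can be sketched in a few lines of Python (the encoding is ours, not Tiburon's): trees as nested tuples, rules keyed by state and root label, with LHS variables bound via child-index paths. The r and q rules finishing the example are invented placeholders with made-up weights.

```python
# Sketch of weighted top-down tree transducer application (encoding is ours).
# Tree = (label, child, child, ...); leaf = (label,).
# Rule: (state, root label) -> (prob, variable paths, rhs); rhs mixes output
# tokens with (state, var) instructions that recurse on the bound subtrees.

RULES = {
    # q S(x0:NP, VP(x1:VBZ, x2:NP)) -> s x0, "wa", r x2, "ga", q x1   (0.2)
    ("q", "S"): (0.2, {0: (0,), 1: (1, 0), 2: (1, 1)},
                 [("s", 0), "wa", ("r", 2), "ga", ("q", 1)]),
    # s NP(PRO(he)) -> "kare"   (0.7)
    ("s", "NP"): (0.7, {}, ["kare"]),
    # Invented placeholder rules (weights made up) finishing the example:
    ("r", "NP"): (1.0, {}, ["ongaku", "o", "kiku", "no"]),
    ("q", "VBZ"): (1.0, {}, ["daisuki", "desu"]),
}

def subtree(tree, path):
    for i in path:
        tree = tree[i + 1]  # +1 skips the label slot
    return tree

def transduce(state, tree):
    """Return (output token list, total probability)."""
    prob, var_paths, rhs = RULES[(state, tree[0])]
    out = []
    for item in rhs:
        if isinstance(item, tuple):  # recurse on the subtree bound to this variable
            sub_out, sub_p = transduce(item[0], subtree(tree, var_paths[item[1]]))
            out.extend(sub_out)
            prob *= sub_p            # multiply probabilities of individual steps
        else:
            out.append(item)
    return out, prob

english = ("S",
           ("NP", ("PRO", ("he",))),
           ("VP", ("VBZ", ("enjoys",)),
                  ("NP", ("SBAR", ("music",)))))
print(transduce("q", english))
# (['kare', 'wa', 'ongaku', 'o', 'kiku', 'no', 'ga', 'daisuki', 'desu'], ~0.14)
```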

73 Top-Down Tree Transducer
Introduced by Rounds (1970) & Thatcher (1970):
“Recent developments in the theory of automata have pointed to an extension of the domain of definition of automata from strings to trees … parts of mathematical linguistics can be formalized easily in a tree-automaton setting … Our results should clarify the nature of syntax-directed translations and transformational grammars …” (Rounds 1970, “Mappings on Grammars and Trees”, Math. Systems Theory 4(3))
Large theory literature, e.g., Gécseg & Steinby (1984), Comon et al (1997)
Once again re-connecting with NLP practice, e.g., Knight & Graehl (2005), Galley et al (2004, 2006)

74 Tree Transducers Can Be Extracted from Bilingual Data (Galley, Hopkins, Knight, Marcu, 2004)
English: “i felt obliged to do my part”, parsed roughly as S(NP-C(NPB(PRP)), VP(VBD, VP-C(VBN, SG-C(VP(TO, VP-C(VB, NP-C(NPB(PRP$, NN))))))))
Chinese: 我 有 责任 尽 一份 力
RULES ACQUIRED:
VBD(felt) -> 有
VBN(obliged) -> 责任
VB(do) -> 尽
NN(part) -> 一份
NN(part) -> 一份 力
VP-C(x0:VBN x1:SG-C) -> x0 x1
VP(TO(to) x0:VP-C) -> x0
…
S(x0:NP-C x1:VP) -> x0 x1
A tree-to-string transducer, used (noisy-channel-wise) to do string-to-tree translation.


77 Tree Transducers Can Be Extracted from Bilingual Data (continued)
Additional extraction methods: (Galley et al, 2006), (Marcu et al, 2006). Current systems learn ~500m rules.
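The heart of the extraction method in slides 74–77 can be sketched compactly (a simplified toy re-implementation of the frontier-node idea, not the authors' code): a tree node is a "frontier node" exactly when the contiguous foreign span of its aligned words does not clash with the alignments of any English word outside the node; minimal rules are then cut at frontier nodes.

```python
# Toy sketch of GHKM-style frontier-node identification (after Galley et al. 2004;
# simplified re-implementation for illustration).
from itertools import count

def annotate(tree, counter):
    """Tree = (label, children...); leaf = (word,). Returns (label, e-span, kids)."""
    label, kids = tree[0], tree[1:]
    if not kids:
        return (label, {next(counter)}, [])
    anns = [annotate(k, counter) for k in kids]
    return (label, set().union(*(a[1] for a in anns)), anns)

def frontier_nodes(ann, align, all_e, out=None):
    if out is None:
        out = []
    label, espan, kids = ann
    fset = {f for (e, f) in align if e in espan}
    outside = {f for (e, f) in align if e in all_e - espan}
    if fset and not set(range(min(fset), max(fset) + 1)) & outside:
        out.append((label, (min(fset), max(fset))))  # node and its Chinese span
    for k in kids:
        frontier_nodes(k, align, all_e, out)
    return out

# "i felt obliged" / 我 有 责任, aligned word-for-word (a fragment of slide 74):
etree = ("S", ("NP-C", ("i",)), ("VP", ("VBD", ("felt",)), ("VBN", ("obliged",))))
align = {(0, 0), (1, 1), (2, 2)}
print(frontier_nodes(annotate(etree, count()), align, all_e={0, 1, 2}))
# All constituents are frontier nodes under this monotone alignment.
```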

78 Sample “said that” rules
0.57 VP(VBD("said") SBAR-C(IN("that") x0:S-C)) -> 说, x0
0.09 VP(VBD("said") SBAR-C(IN("that") x0:S-C)) -> 说 x0
0.02 VP(VBD("said") SBAR-C(IN("that") x0:S-C)) -> 他 说, x0
0.02 VP(VBD("said") SBAR-C(IN("that") x0:S-C)) -> 指出, x0
0.02 VP(VBD("said") SBAR-C(IN("that") x0:S-C)) -> x0
0.01 VP(VBD("said") SBAR-C(IN("that") x0:S-C)) -> 表示 x0
0.01 VP(VBD("said") SBAR-C(IN("that") x0:S-C)) -> 说, x0 的

79 Sample Subject-Verb-Object Rules
CHINESE / ENGLISH
0.82 S(x0:NP-C VP(x1:VBD x2:NP-C) x3:.) -> x0 x1 x2 x3
0.02 S(x0:NP-C VP(x1:VBD x2:NP-C) x3:.) -> x0 x1, x2 x3
0.01 S(x0:NP-C VP(x1:VBD x2:NP-C) x3:.) -> x0, x1 x2 x3
ARABIC / ENGLISH
0.54 S(x0:NP-C VP(x1:VBD x2:NP-C) x3:.) -> x0 x1 x2 x3
0.44 S(x0:NP-C VP(x1:VBD x2:NP-C) x3:.) -> x1 x0 x2 x3

80 Decoding
argmax_etree P(etree | cstring)
Difficult search problem (a toy sketch of the bottom-up rule matching follows this slide):
– bottom-up CKY parser
– builds English constituents on top of Chinese spans
– record of rule applications (the derivation) provides information to construct the English tree
– returns k-best trees
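Here is a toy bottom-up decoder over a chart of Chinese spans, using a fragment of the rules from slides 81–87; the chart/template encoding is ours, and k-best extraction and model scores are omitted.

```python
# Toy bottom-up decoder in the spirit of slide 80 (our sketch, not the real system).
# Each rule: (English LHS label, Chinese RHS pattern, English template).
# RHS items are Chinese words or ("NT", label) slots filled by smaller chart cells.

def matches(words, rhs, i, j, chart):
    """Yield lists of English strings, one per NT slot, matching rhs over span [i, j)."""
    if not rhs:
        if i == j:
            yield []
        return
    head, rest = rhs[0], rhs[1:]
    if isinstance(head, str):                      # Chinese word must match exactly
        if i < j and words[i] == head:
            yield from matches(words, rest, i + 1, j, chart)
    else:                                          # ("NT", label): try every split point
        for k in range(i + 1, j + 1):
            cell = chart.get((i, k), {})
            if head[1] in cell:
                for tail in matches(words, rest, k, j, chart):
                    yield [cell[head[1]]] + tail

def decode(words, rules):
    """Fill a CKY-style chart bottom-up (one pass per cell; unary chains omitted)."""
    chart = {}
    for span in range(1, len(words) + 1):
        for i in range(len(words) - span + 1):
            j = i + span
            cell = chart.setdefault((i, j), {})
            for lhs, rhs, template in rules:
                for binding in matches(words, rhs, i, j, chart):
                    cell.setdefault(lhs, template.format(*binding))
    return chart

words = "来自 法国 和 俄罗斯".split()
rules = [
    ("NNP", ["法国"], "France"),
    ("NNP", ["俄罗斯"], "Russia"),
    ("CC",  ["和"],   "and"),
    ("NP",  [("NT", "NNP"), ("NT", "CC"), ("NT", "NNP")], "{0} {1} {2}"),  # RULE 13
    ("VP",  ["来自", ("NT", "NP")], "coming from {0}"),                    # RULE 11
]
print(decode(words, rules)[(0, 4)]["VP"])   # coming from France and Russia
```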

81 Syntax-Based Decoding
Input: 这 7 人 中包括 来自 法国 和 俄罗斯 的 宇航 员 .
Rules apply when their right-hand sides (RHS) match some portion of the input.

82 Syntax-Based Decoding: lexical rules fire first
RULE 1: DT(these) -> 这
RULE 2: VBP(include) -> 中包括
RULE 4: NNP(France) -> 法国
RULE 5: CC(and) -> 和
RULE 6: NNP(Russia) -> 俄罗斯
RULE 8: NP(NNS(astronauts)) -> 宇航, 员
RULE 9: PUNC(.) -> .
yielding “these”, “include”, “France”, “and”, “Russia”, “astronauts”, “.”

83 RULE 13: NP(x0:NNP, x1:CC, x2:NNP) -> x0, x1, x2 builds “France and Russia”

84 RULE 11: VP(VBG(coming), PP(IN(from), x0:NP)) -> 来自, x0 builds “coming from France and Russia”

85 RULE 16: NP(x0:NP, x1:VP) -> x1, 的, x0 builds “astronauts coming from France and Russia”

86 RULE 14: VP(x0:VBP, x1:NP) -> x0, x1 builds “include astronauts coming from France and Russia”

87 RULE 10: NP(x0:DT, CD(7), NNS(people)) -> x0, 7 人 builds “these 7 people”; then RULE 15: S(x0:NP, x1:VP, x2:PUNC) -> x0, x1, x2 completes the derivation tree: “These 7 people include astronauts coming from France and Russia”

88 Derived English Tree
[tree over “These 7 people include astronauts coming from France and Russia .”: preterminals DT CD NNS VBP NNS VBG IN NNP CC NNP PUNC; constituents NP, NP, PP, NP, VP, NP, VP, S]

89 Chinese/English Translation Examples
Chinese gloss: six unit Iraq civilian today in Iraq south part possessive protest in, police and UK troops shot killed.
Machine translation: Police and British troops shot and killed six Iraqi civilians in protests in southern Iraq today.

90 Chinese/English Translation Examples
Chinese: 印度 目前 共 有 74 种 控价 药, 增加 后 的 控价 药品 将 占 印度 所售 药品 的 40% 以上 。
Machine translation: Currently, a total of 74 types of medicine prices increased after the price of medicines will account for more than 40 per cent of medicines sold by India.

91 Arabic/English Translation
First, this is not a sentence: the VP below is not finite (e.g., “visited Iran”). Second, even if the S-C really were a sentence, the verb “discussed” doesn’t take an S argument, so this is a bogus VP. Third, even if the lower VP weren’t bogus, “confirms” only takes a certain type of VP, namely a gerund (“confirms discussing the idea”).

92 Tree Automata Operations for Machine Translation?
argmax_etree P(etree | cstring)
e = yield(best-tree(intersect(lm.rtg, b-apply(cstring, tm.tt))))
lm.rtg: a weighted tree grammar that accepts/scores English trees
tm.tt: a weighted tree-to-string transducer that turns English trees into Chinese strings
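To make slide 92's one-liner concrete, here is a runnable caricature (ours, with an invented two-tree forest and a toy language model): the "forest" is an explicit list of candidate English trees with channel scores. Real systems represent the forest as a packed regular tree grammar and intersect with the tree language model without ever enumerating trees.

```python
# Toy, runnable caricature of slide 92's pipeline. The forest, trees, and LM
# below are invented; real systems use packed forests and RTG intersection.

def tree_yield(tree):
    """Leaves of a (label, children...) tuple tree, left to right."""
    return [tree[0]] if len(tree) == 1 else [w for c in tree[1:] for w in tree_yield(c)]

def translate(forest, lm_score):
    """forest: [(etree, P(cstring|etree))]; lm_score: etree -> P(etree)."""
    best = max(forest, key=lambda te: lm_score(te[0]) * te[1])
    return " ".join(tree_yield(best[0]))

# Two candidate trees for the example from slides 52-60:
t1 = ("S", ("NP", ("Gunman",)),
      ("VP", ("killed",), ("PP", ("by",), ("NP", ("police",)))))
t2 = ("S", ("NP", ("The",), ("gunman",)),
      ("VP", ("was",), ("killed",), ("PP", ("by",), ("NP", ("police",)))))
forest = [(t1, 0.4), (t2, 0.3)]
lm = lambda t: 0.001 if ("was",) in t[2][1:] else 0.0001  # toy LM preferring "was"
print(translate(forest, lm))  # The gunman was killed by police
```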

93 String Automata Algorithms vs. Tree Automata Algorithms
N-best: … paths through a WFSA (Viterbi, 1967; Eppstein, 1998) / … trees in a weighted forest (Jiménez & Marzal, 2000; Huang & Chiang, 2005)
EM training: forward-backward EM (Baum/Welch, 1971; Eisner 2003) / tree transducer EM training (Graehl & Knight, 2004)
Determinization: … of weighted string acceptors (Mohri, 1997) / … of weighted tree acceptors (Borchardt & Vogler, 2003; May & Knight, 2005)
Intersection: WFSA intersection / tree acceptor intersection (even though CFGs are not closed under intersection)
Applying transducers: string -> WFST -> WFSA / tree -> TT -> weighted tree acceptor
Transducer composition: WFST composition (Pereira & Riley, 1996) / many tree transducers are not closed under composition (Rounds 70; Engelfriet 75)

94 Tree Automata Toolkits Used in NLP
Tiburon: weighted tree automata toolkit [May & Knight 06]
– developed by Jonathan May, USC/ISI
– first version distributed in April 2006
– includes tutorial
– inspired by string automata toolkits
www.isi.edu/licensed-sw/tiburon

95 Tree Automata Toolkits Used in NLP
% echo "A(B(C) B(B(C)))" | tiburon -k 1 - even.rtg three.rtg
A(B(C) B(B(C))): 3.16E-9
% echo "A(B(C) B(C))" | tiburon -k 1 - even.rtg three.rtg
Warning: returning fewer trees than requested
0

96 Back to the Outline
History of the World (of Automata for NLP)
Weighted string automata in NLP
– Applications: transliteration, machine translation, language modeling, speech, lexical processing, tagging, summarization, optical character recognition, …
– Generic algorithms and toolkits
Weighted tree automata in NLP
– Applications
– Generic algorithms and toolkits
Some connections with theory

97 Desirable Properties of a Transducer Formalism
Expressiveness – can express the knowledge needed to capture the transformation & solve the linguistic problem
Modularity – can integrate smaller components into bigger systems, co-ordinate search
Inclusiveness – encompasses simpler formalisms
Teachability – can learn from input/output examples

98 Desirable Formal Properties of a Transformation Formalism
Expressiveness: see the next few slides
Modularity: be closed under composition
Inclusiveness: capture any transformation that a string-based FST can
Teachability: given input/output tree pairs, find locally optimal rule probabilities in low-polynomial time

99 Expressiveness
Some necessary things for machine translation:
– re-ordering
– non-constituent phrases (e.g., “there are” / “hay”)
– lexicalized re-ordering (e.g., “X of Y” / “Y X”)
– phrasal translation (e.g., “is singing” / “está cantando”)
– non-contiguous phrases (e.g., “put X on” / “poner X”)
– context-sensitive word insertion/deletion (e.g., “the X”)

100 Expressiveness: Local rotation
[tree diagrams: English SVO tree for “the boy saw the door” rotated into an Arabic VSO tree, “wa- [and] ra’aa [saw] atefl [the boy] albab [the door]”]

101 Desirable Formal Properties of a Transformation Formalism
How do different tree formalisms fare?
Expressiveness: do local rotation
Modularity: be closed under composition
Inclusiveness: capture any transformation that a string-based FST can
Teachability: given input/output pairs, find locally optimal rule probabilities in low-polynomial time

102 Top-down Tree Transducers
T – top-down; L – linear (non-copying); N – non-deleting
Every LNT rule has a one-level LHS and a multilevel RHS, e.g. (roughly, from the slide's Arabic verb/subject/object example):
q S(x0, x1, x2) -> S(x1, VP(x0, x2))

103 LT can also delete subtrees, e.g. q S(x0, x1, x2) -> S(VP(x2, x0)) (one subtree dropped)

104 T can copy & delete subtrees, e.g. q S(x0, x1, x2) -> S(x0, VP(x2, x0)) (x0 copied, x1 dropped)

105 T, LT, and LNT all employ states, e.g. q S(x0, x1, x2) -> S(r x1, VP(s x2, q x0)): each RHS variable is processed in its own state

106 [diagram: the classes LNT, LT, T arranged along two axes, copying vs. non-copying and deleting vs. non-deleting; T – top-down, L – linear (non-copying), N – non-deleting]

107 Expressiveness: can these classes do the local rotation S(NP(PRO), VP(V, NP)) <-> S(V, NP, NP)? With a one-level LHS, a rule q S(x0, x1) -> ? cannot see the verb inside x1.

108 With copying and deleting (full T), rotation is possible:
q S(x0, x1) -> S(r x1, s x1, q x0)
r VP(x0, x1) -> q x0
s VP(x0, x1) -> q x1
(x1 is copied; each copy then keeps only one of its subtrees)

109 Extended (x-) Transducers
x – extended LHS: rules have a multilevel LHS as well as a multilevel RHS, so a rule can grab more structure at once, e.g. (English verb/subject/object):
q S(x0, VP(x1, x2)) -> S(x1, x0, x2)
Possibility mentioned in [Rounds 70]; defined in [Graehl & Knight 04]; used for practical MT by [Galley et al 04, 06]

110 [diagram: the LNT / LT / T hierarchy, copying vs. non-copying, deleting vs. non-deleting, with expressiveness results from GS’84 (Gécseg & Steinby)]

111 [diagram: the hierarchy extended with xLNT, xLT, xT, and xT^R = T^R (GK’04) alongside LNT, LT, T (GS’84); the extended classes add local rotation and finite-check-before-delete]
Expressive power theorems in [Maletti, Graehl, Hopkins, Knight, submitted]

112 [same diagram, marking which classes are expressive enough for local rotation]
Expressive power theorems in [Maletti, Graehl, Hopkins, Knight, to appear SIAM J. Comput]

113 [diagram adds bottom-up transducers: B, with LB = LT^R]

114 [same diagram, marking which classes are closed under composition and which are expressive enough for local rotation]

115 Inclusiveness
Tree transducers are described as generalizing FSTs (strings are “long skinny trees”):
FST transition q -A/B-> r corresponds to tree transducer rule q A(x0) -> B(r x0)
FST transition q -A/*e*-> r corresponds to q A(x0) -> r x0
FST transition q -*e*/B-> r corresponds to q x -> B(r x)
But the last two transitions have no counterpart in traditional tree transducers, which must consume a symbol at each step.

116 [diagram: the hierarchy refined by variants of LNT and xLNT allowing extended RHS, input-epsilon, and output-epsilon rules; MBOT, FST, and GSM placed in the picture, with annotations for "closed under composition", "expressive enough", and "generalizes FST"]

117 More Theory Connections
Other desirable properties:
– more expressivity
– other types of teachability
– processing trees horizontally as well as vertically
– graph transduction
Papers:
– overview of tree automata in NLP [Knight & Graehl 05]
– MT Journal [Knight 07]
– SIAM J. Comput. [Maletti et al, forthcoming]
– CIAA, e.g., the Tiburon paper [May & Knight 06]
– WATA (Weighted Automata: Theory and Applications)
– FSMNLP (Finite-State Methods and Natural Language Processing); subworkshop “Tree Automata and Transducers” (papers due 4/13/09)

118 Conclusion
Weighted string automata for NLP: well understood and exploited
Weighted tree automata for NLP: just starting
Some connections with theory: of continuing interest
Good news from the empirical front: making good progress on machine translation

