Spoken Language Translation: Intelligent Robot Lecture Note


Spoken Language Translation

Spoken language translation (SLT) directly translates spoken utterances in one language into another. Major components
► Automatic Speech Recognition (ASR)
► Machine Translation (MT)
► Text-to-Speech (TTS)

Pipeline: Source Speech → [ASR] → Source Sentence → [MT] → Target Sentence → [TTS] → Target Speech
Example: 버스 정류장이 어디에 있나요? → Where is the bus stop?

Spoken Language Translation

In comparison with written language,
► speech, and especially spontaneous speech, poses additional difficulties for the task of automatic translation.
► Typically, these difficulties are caused by errors in the speech recognition step, which is carried out before the translation process.
► As a result, the sentence to be translated is not necessarily well-formed from a syntactic point of view.

Why a statistical approach for machine translation?
► Even without recognition errors, the structures of spontaneous speech differ from those of written language.
► The statistical approach
◦ avoids hard decisions at any level of the translation process
◦ guarantees that, for any source sentence, a translated sentence in the target language is generated.

Coupling ASR to MT

Motivation
► ASR cannot guarantee an error-free transcription
◦ The 1-best ASR hypothesis may be wrong
◦ SLT must be designed to be robust to speech recognition errors
◦ MT can benefit from the wide range of supplementary information provided by ASR
► MT quality may depend on the WER of the ASR
◦ There is a strong correlation between recognition and translation quality
◦ The (oracle) WER decreases when a larger set of hypotheses is considered
◦ Idea: exploit multiple transcriptions

SLT systems vary in the degree to which SMT and ASR are integrated within the overall translation process.

Coupling ASR to MT

Loose coupling
► SMT uses the ASR output (1-best, N-best, lattice, or confusion network) as its input; one-way module communication
► Source Speech → [ASR] → 1-best, N-best, lattice, or CN → [SMT] → Target Sentence → [TTS] → Target Speech

Tight coupling
► The whole search space of ASR and MT is integrated
► Source Speech → [ASR + SMT] → Target Sentence → [TTS] → Target Speech

Coupling ASR to MT

Statistical spoken language translation
► Given a speech input x in the source language, find the best translation ê
► F(x) is the set of possible transcriptions considered
◦ Loose coupling: 1-best, N-best, lattice, or confusion network
◦ Tight coupling: the full search space
► Pr(f, e | x): speech translation model
◦ Acoustic and translation features
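The decision rule on this slide was lost in transcription; a reconstruction consistent with the definitions above (maximize over translations e, marginalizing and then maximizing over transcriptions f ∈ F(x)) would be:

```latex
\hat{e} = \operatorname*{argmax}_{e} \Pr(e \mid x)
        = \operatorname*{argmax}_{e} \sum_{f \in F(x)} \Pr(f, e \mid x)
        \approx \operatorname*{argmax}_{e} \max_{f \in F(x)} \Pr(f, e \mid x)
```

The sum-to-max approximation is what makes the search tractable; loose and tight coupling then differ only in how large the set F(x) is.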

Coupling ASR to MT

Loose coupling vs. tight coupling

                                   Loose Coupling                   Tight Coupling
  Modularity of knowledge sources  Each KS is a stand-alone module  All KSs integrated in a single model
  Inter-module communication       Typically one-way (pipelined)    N/A
  Scalability                      Easy                             Not easy
  Complexity                       Feasible                         Feasible only for very small domains

ASR Outputs

Automatic speech recognition (ASR) is the process by which an acoustic speech signal is converted into a sequence of words.

Architecture
► Runtime: Speech Signals → Feature Extraction → Decoding → ASR outputs (1-best, N-best, lattice, or CN)
► Decoding searches a network constructed from three knowledge sources:
◦ Acoustic model: HMMs estimated from a speech DB
◦ Pronunciation model: obtained by G2P (grapheme-to-phoneme conversion)
◦ Language model: estimated from text corpora

ASR Outputs

Network structure
► The decoding network is built hierarchically: a sentence HMM over word sequences (e.g., ONE TWO THREE), word HMMs over phone sequences (e.g., ONE = W AH N), and phone HMMs over states

Decoding of HMM-based ASR
► Searching for the best path in a huge HMM-state lattice

ASR Outputs

1-best
► The best path can be recovered by backtracking
► Why a 1-best "word" sequence?
◦ Storing the backtracking pointer table for full state sequences takes a lot of memory
◦ Usually a backtrack pointer stores only the previous word before the current word

N-best
► Trace back not only from the 1st-best path, but also from the 2nd-best, 3rd-best, etc.
► Methods
◦ Directly from the search backtrack pointer table: exact N-best algorithm, word-pair N-best algorithm, A* search using the Viterbi score as a heuristic
◦ Generate a lattice first, then generate the N-best list from the lattice
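The "N-best from a lattice" idea can be sketched as a best-first search over a toy word lattice. The lattice encoding (a dict of arcs with log-probabilities) is an illustrative assumption, not the format of any particular ASR toolkit:

```python
import heapq

def nbest_paths(lattice, start, end, n):
    """Enumerate the n highest-scoring paths through an acyclic word lattice.

    lattice: dict node -> list of (next_node, word, log_prob) arcs.
    Scores are log-probabilities (higher is better), so they are negated
    for Python's min-heap; paths pop off the heap in best-first order.
    """
    heap = [(0.0, start, [])]  # (negated score, current node, words so far)
    results = []
    while heap and len(results) < n:
        neg_score, node, words = heapq.heappop(heap)
        if node == end:
            results.append((-neg_score, words))
            continue
        for nxt, word, logp in lattice.get(node, []):
            heapq.heappush(heap, (neg_score - logp, nxt, words + [word]))
    return results
```

Because a lattice is acyclic, popping partial paths in score order yields complete paths in exact N-best order without a separate backtrack pass.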

ASR Outputs

Lattice
► A word-based lattice
◦ A compact representation of the state lattice
◦ Only word nodes are involved
► From the decoding backtrack pointer table
◦ Record only the links between word nodes
► From an N-best list
◦ Becomes a compact representation of the N-best list

ASR Outputs

Confusion network (L. Mangu et al., 2000)
► Also called a "sausage network" or "consensus network"
► A weighted directed graph with a start node, an end node, and word labels over its edges
► Each path from the start node to the end node goes through all the other nodes
► Built from a lattice by multiple alignment
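The multiple-alignment step can be illustrated with a much simpler pivot alignment than Mangu et al.'s lattice clustering: align each N-best hypothesis against the top hypothesis by edit distance and accumulate per-column word probabilities. This is a toy approximation (insertions relative to the pivot are simply dropped), not the published algorithm:

```python
from collections import Counter

EPS = "*eps*"

def align(ref, hyp):
    """Levenshtein alignment of hyp against ref, as (ref_word, hyp_word)
    pairs where EPS marks an insertion or deletion."""
    R, H = len(ref), len(hyp)
    cost = [[0] * (H + 1) for _ in range(R + 1)]
    for i in range(1, R + 1):
        cost[i][0] = i
    for j in range(1, H + 1):
        cost[0][j] = j
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            sub = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    pairs, i, j = [], R, H
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1])); i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], EPS)); i -= 1   # deletion in hyp
        else:
            pairs.append((EPS, hyp[j - 1])); j -= 1   # insertion in hyp
    return list(reversed(pairs))

def confusion_network(nbest):
    """Build CN columns by pivot-aligning every hypothesis to the 1-best.
    Hypotheses are weighted uniformly; insertions are dropped for simplicity."""
    pivot = nbest[0]
    columns = [Counter() for _ in pivot]
    for hyp in nbest:
        pos = 0
        for r, h in align(pivot, hyp):
            if r != EPS:
                columns[pos][h] += 1
                pos += 1
    total = len(nbest)
    return [{w: c / total for w, c in col.items()} for col in columns]
```

Every hypothesis contributes to every column, which is exactly the "each path goes through all nodes" property of the CN.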

Loose Coupling: 1-best

The best hypothesis produced by the ASR system is passed as text to the MT system.
► Baseline
► Simple structure
► Fast translation

The speech recognition module and the translation module run rather independently
► Lacks joint optimality

No use of multiple transcriptions
► Supplementary information easily available from the ASR system is not exploited in the translation process

Loose Coupling: 1-best

Structure
► Recognition followed by translation
► Source Speech → [ASR] → 1-best → [SMT] → Target Sentence → [TTS] → Target Speech

Loose Coupling: N-best

N hypotheses are translated by a text MT decoder and re-ranked according to ASR and SMT scores (R. Zhang et al., 2004)

Structure
► Source Speech → [ASR] → N-best → [SMT] → N×M translations → [Rescore] → Best translation

Loose Coupling: N-best

ASR module
► Generates the N-best speech recognition hypotheses
► f_n: the n-th best speech recognition hypothesis

SMT module
► Generates the M-best translation hypotheses for each f_n
► e_{n,m}: the m-th best translation hypothesis produced from f_n

Rescore module
► Rescores all N×M translations
► The key component
► Log-linear model
◦ Features derived from ASR and SMT are combined in this module to rescore the translation candidates.

Loose Coupling: N-best

Rescore: log-linear models
► E: the set of all possible translation hypotheses
► h_m: the m-th feature (in the log domain)
◦ ASR features: acoustic model, source language model
◦ SMT features: target language model, phrase translation model, distortion model, length model, …
► λ_m: the weight of each feature
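The rescoring step above reduces to a weighted sum of log-domain features per candidate. A minimal sketch (the feature names and values below are invented for illustration):

```python
def rescore(candidates, weights):
    """Pick the translation maximizing a weighted sum of log-domain features.

    candidates: list of (translation, feature_dict), where each feature value
    is already a log-score (acoustic model, source LM, target LM, ...).
    weights: feature name -> lambda weight.
    """
    def score(feats):
        return sum(weights[name] * value for name, value in feats.items())
    return max(candidates, key=lambda cand: score(cand[1]))
```

Changing the lambdas can flip which candidate wins, which is why the weights themselves must be tuned on a development set (next slide).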

Loose Coupling: N-best

Parameter optimization (F. J. Och, 2003)
► Objective function: maximize an automatic translation quality metric on a development set
► ê: the translation output after log-linear model rescoring
► r: the reference English sentences
► Automatic translation quality metrics
◦ BLEU: a weighted geometric mean of the n-gram matches between test and reference sentences, multiplied by a short-sentence (brevity) penalty
◦ NIST: an arithmetic mean of the n-gram matches between test and reference sentences
◦ mWER: multiple-reference word error rate
◦ mPER: multiple-reference position-independent word error rate
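The BLEU definition above can be made concrete with a simplified single-reference, sentence-level version (real BLEU is corpus-level and usually smoothed; this unsmoothed sketch returns 0 whenever any n-gram order has no match):

```python
import math
from collections import Counter

def sentence_bleu(candidate, reference, max_n=4):
    """Geometric mean of modified n-gram precisions times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        # "modified" precision: clip each candidate n-gram count by the reference count
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    geo = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # brevity penalty punishes candidates shorter than the reference
    brevity = 1.0 if len(candidate) >= len(reference) else math.exp(1 - len(reference) / len(candidate))
    return brevity * geo
```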

Loose Coupling: N-best

Parameter optimization: direction set methods
► Starting from an initial lambda, perform local optimization along a direction, change the direction, and repeat until a local lambda is found
► Restart from different initial lambdas and keep the best lambda found
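The inner loop above can be sketched with cyclic coordinate search, the simplest member of the direction-set family (Powell's method would additionally update the direction set; the step schedule here is an arbitrary choice for illustration):

```python
def coordinate_search(objective, lam, step=0.5, shrink=0.5, iters=10):
    """Maximize `objective` by line search along the coordinate directions.

    Along each coordinate, keep stepping while the objective improves
    (a crude line search), then move to the next coordinate; after a full
    sweep, shrink the step size. Assumes the objective is bounded above.
    """
    lam = list(lam)
    for _ in range(iters):
        for d in range(len(lam)):
            improved = True
            while improved:
                improved = False
                for delta in (step, -step):
                    trial = list(lam)
                    trial[d] += delta
                    if objective(trial) > objective(lam):
                        lam = trial
                        improved = True
        step *= shrink
    return lam
```

In MERT the objective would be BLEU (or mWER, etc.) of the rescored dev-set translations as a function of the lambdas; here any function works.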

Loose Coupling: Lattice

Lattice-based MT
► Input
◦ Word lattices produced by the ASR system
► Directly integrates all models in the decoding process
◦ Phrase-based lexica, single-word-based lexica, recognition features
► Problem
◦ How can word lattices be translated?

Approach
► Joint probability approach
◦ WFST (E. Matusov et al., 2005)
► Phrase-based approach
◦ Log-linear model (E. Matusov et al., 2005)
◦ WFST (L. Mathias et al., 2006)

Loose Coupling: Lattice

Structure
► Source Speech → [ASR] → Word lattice → [SMT decoder with rescoring] → Best translation

Loose Coupling: Lattice

From the derived decision rule
► Pr(x|f): standard acoustic model
► Pr(e): target language model
► Pr(f|e): translation model

Source language model?
► To take into account the requirement for well-formedness of the source sentence, the translation model has to include context dependency on the previous source words
► This dependency over the whole sentence can be approximated by including a source language model

Loose Coupling: Lattice (Joint Probability Approach: WFST)

Joint probability approach
► The conditional probability terms Pr(e) and Pr(f|e) can be rewritten as a joint probability translation model Pr(f, e)
► This simplifies coupling the systems
◦ The joint probability translation model can be used in place of the usual LM in ASR

Loose Coupling: Lattice (Joint Probability Approach: WFST)

WFST-based joint probability system
► The joint probability MT system is implemented with WFSTs
► First, the training corpus is transformed into bilingual token sequences based on a word alignment, e.g.
  vorrei|I'd_like del|some gelato|ice_cream per|ε favore|please
► Then, a statistical m-gram model is trained on this bilingual corpus
► This language model is represented as a finite-state transducer, which is the final translation model
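The corpus transformation can be sketched directly: pair each source word with its aligned target word (or ε) to form bilingual tokens, then collect m-gram counts over those token sequences. The alignment encoding (one target index or None per source word) is a simplifying assumption; real word alignments can be many-to-many:

```python
from collections import Counter

def bilingual_tokens(source, target, alignment):
    """Turn an aligned sentence pair into 'src|tgt' bilingual tokens
    ('src|ε' for unaligned source words), the representation on which
    the joint m-gram translation model is trained.

    alignment: dict source_index -> target_index (unaligned words absent).
    """
    tokens = []
    for i, s in enumerate(source):
        j = alignment.get(i)
        tokens.append(f"{s}|{target[j]}" if j is not None else f"{s}|ε")
    return tokens

def bigram_counts(corpus_tokens):
    """Counts for a bilingual bigram (m = 2) model over token sequences."""
    counts = Counter()
    for toks in corpus_tokens:
        seq = ["<s>"] + toks + ["</s>"]
        counts.update(zip(seq, seq[1:]))
    return counts
```

Estimating an m-gram LM over these tokens scores source and target jointly, which is why it can replace the source LM in the ASR search.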

Loose Coupling: Lattice (Joint Probability Approach: WFST)

WFST-based joint probability system
► Searching for the best target sentence is done by composing the input, represented as a WFST, with the translation transducer.
► Coupling the FSA system with ASR is simple
◦ The ASR output, represented as a WFST, can be used directly as input to the MT search
◦ Features: only the acoustic and translation probabilities
◦ Source LM scores are not included: the joint m-gram translation probability serves as a source LM

Loose Coupling: Lattice (Phrase-based Approach: Log-linear Model)

Probability distributions are represented as features in a log-linear model
► The translation model probability is decomposed into several probabilities
► The acoustic model and source language model probabilities are also included
► For a hypothesized recognized source sentence f_1^J and a hypothesized translation e_1^I, let k → (j_k, i_k), k = 1, …, K be a monotone segmentation of the sentence pair into K bilingual phrases

Loose Coupling: Lattice (Phrase-based Approach: Log-linear Model)

Features
► The m-gram target language model
► The phrasal lexicon models
◦ The phrase translation probabilities are computed as a log-linear interpolation of the relative frequencies
► The single-word-based lexicon models

Loose Coupling: Lattice (Phrase-based Approach: Log-linear Model)

Features (cont'd)
► c1, c2: word and phrase penalty features
► The recognition model
◦ The acoustic model probability
◦ The m-gram source language model probability

Optimization
► All features are scaled with a set of exponents λ = {λ_1, …, λ_7} and μ = {μ_1, μ_2}
► The scaling factors are optimized iteratively in a minimum error training framework by performing 100 to 200 translations of a development set
► Criteria: WER, BLEU, mWER, mPER

Loose Coupling: Lattice (Phrase-based Approach: Log-linear Model)

Practical aspects of lattice translation
► Generation of word lattices
◦ In a first step, all entities that are not spoken words are mapped onto the empty arc label ε
◦ The time information is not used, so it is removed from the lattices
◦ The structure is compressed by applying ε-removal, determinization, and minimization
◦ This step significantly reduces runtime without changing the results
► Phrase extraction
◦ The number of different phrase pairs is very large
◦ Candidate phrase pairs have to be kept in main memory
◦ In the case of ASR word lattice input, the lattice for each test utterance is traversed, and only phrases that match sequences of arcs in the lattice are extracted
◦ Thus only phrases that can be used in translation are loaded

Loose Coupling: Lattice (Phrase-based Approach: Log-linear Model)

Practical aspects of lattice translation (cont'd)
► Pruning
◦ A high-density word lattice as input leads to an enormous search space, so pruning is necessary
◦ Coverage pruning and histogram pruning
◦ Based on the total cost of a hypothesis
◦ It may also be necessary to prune the input word lattices

Advantages
► The utilization of multiple features
► Direct optimization for an objective error measure

Disadvantages
► A less efficient search
► Heavy pruning is unavoidable

Loose Coupling: Lattice (Phrase-based Approach: WFST)

Statistical modeling for text translation
► Ω: all foreign phrase sequences that could have generated the foreign text
► The translation system effectively translates phrase sequences, rather than word sequences
◦ This is done by first mapping the sentence into all of its phrase sequences

Loose Coupling: Lattice (Phrase-based Approach: WFST)

The phrase sequence lattice contains the phrase sequences that can be extracted from the text
► All phrase sequences correspond to the unique foreign sentence
► Here, a phrase is a sequence of words that can be translated
► Different phrase sequences lead to different translations
► The lattice is unweighted

Loose Coupling: Lattice (Phrase-based Approach: WFST)

Statistical modeling for speech translation
► The target phrase mapping transducer is applied to the foreign-language ASR word lattice
► L·Ω: the likely foreign phrase sequences that could have generated the foreign speech
► The translation system still effectively translates phrase sequences, rather than word sequences
◦ These are extracted from the ASR lattice, with ASR scores, rather than from a text sentence

Loose Coupling: Lattice (Phrase-based Approach: WFST)

The phrase sequence lattice contains the phrase sequences that can be extracted from the ASR word lattice
► Phrase sequences correspond to the translatable word sequences in the lattice
► The lattice contains weights from the ASR system
► Translating this foreign phrase lattice is MAP translation of the foreign speech under the generative model

Loose Coupling: Lattice (Phrase-based Approach: WFST)

Spoken language translation is recast as an ASR analysis problem in which the goal is to extract translatable foreign-language phrases from ASR word lattices
► Step 1. Perform foreign-language ASR to generate a foreign-language word lattice L
► Step 2. Analyze the foreign-language word lattice and extract the phrases to be translated
► Step 3. Build the target-language phrase mapping transducer Ω
► Step 4. Compose L and Ω to create the foreign-language ASR phrase lattice
► Step 5. Translate the foreign-language phrase lattice

ASR and MT must be highly compatible for this approach.

Loose Coupling: Confusion Network

CN-based decoder (N. Bertoldi et al., 2005)
► Input
◦ A confusion network represented as a matrix of word/probability columns
◦ Text vs. CN, e.g. the text 나는 소년입니다. ("I am a boy.") vs. the CN
  나 1.0 | 는 0.7, 은 0.3 | 소녀 0.6, 소년 0.4 | 입니다 0.5, 입니까 0.3, 합니다 0.2 | . 0.8, ? 0.2
► Problem
◦ How can confusion network input be translated?
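The matrix representation above can be encoded directly as a list of columns, each a list of (word, probability) pairs. As a warm-up before translation, the consensus transcription falls out trivially, because every path visits every column (the list-of-columns encoding is an illustrative choice, not Bertoldi et al.'s data structure):

```python
def consensus(cn):
    """Best path through a confusion network: the highest-probability
    word per column, skipping ε (empty-word) entries."""
    words = []
    for column in cn:
        word, _ = max(column, key=lambda wp: wp[1])
        if word != "ε":
            words.append(word)
    return words
```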

Loose Coupling: Confusion Network

Solution
► Simple!
► A CN-based SLT decoder can be developed starting from a phrase-based SMT decoder
► The CN-based SLT decoder is substantially the same as the phrase-based SMT decoder, apart from the way the input is managed

Compared to N-best methods
► N-best decoder
◦ Does not take advantage of the overlaps among the N-best hypotheses
► CN decoder
◦ Exploits the overlaps among hypotheses

Loose Coupling: Confusion Network

Phrase-based translation model
► Phrase
◦ A sequence of consecutive words
► Alignment
◦ A map between the CN and target phrases: one word per column, aligned with a target phrase
► Search criterion
◦ Find the translation maximizing Pr(e | CN), where Pr(e | CN) is a log-linear phrase-based model

Loose Coupling: Confusion Network

Log-linear phrase-based translation model
► The conditional distribution is determined through suitable real-valued feature functions and takes a parametric (log-linear) form
► Feature functions
◦ Language model
◦ Fertility models
◦ Distortion models
◦ Lexicon model
◦ Likelihood of the path within the CN
◦ True length of the path

Loose Coupling: Confusion Network

Step-wise translation process
► Translation is performed as a step-wise process
► Each step translates a sub-CN and produces a target phrase
► The process starts with an empty translation
► After each step, we get a partial translation
► A partial translation is complete when the whole input CN has been translated

Complexity reduction
► Recombining theories (merging equivalent partial hypotheses)
► Beam search
► Reordering constraints
► Lexicon pruning
► Confusion network pruning
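The step-wise process can be sketched with a toy greedy monotone decoder: at each step, consume a span of CN columns, pick a word path through it, and look up a target phrase, scoring by CN-path probability times phrase probability. A real decoder keeps many partial translations in a beam and allows reordering; the phrase table and probabilities below are invented for illustration:

```python
import itertools

def translate_cn(cn, phrase_table, max_span=3):
    """Greedy monotone step-wise decoding over a confusion network.

    cn: list of columns, each a list of (word, prob) pairs.
    phrase_table: tuple of source words -> list of (target phrase, prob).
    Each step commits to the best (span, path, target phrase) at the
    current position; a beam decoder would keep alternatives instead.
    """
    pos, output = 0, []
    while pos < len(cn):
        best = None  # (score, span length, target phrase)
        for span in range(1, min(max_span, len(cn) - pos) + 1):
            for path in itertools.product(*cn[pos:pos + span]):
                words = tuple(w for w, _ in path)
                path_prob = 1.0
                for _, p in path:
                    path_prob *= p
                for tgt, tprob in phrase_table.get(words, []):
                    score = path_prob * tprob
                    if best is None or score > best[0]:
                        best = (score, span, tgt)
        if best is None:
            pos += 1  # no phrase covers this column; skip it
            continue
        output.append(best[2])
        pos += best[1]
    return " ".join(output)
```

Note how a single phrase table lookup covers several competing word paths at once; this is the overlap sharing an N-best decoder cannot exploit.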

Loose Coupling: Confusion Network

Algorithms (the decoding pseudo-code appears as a figure in the original slides)

Loose Coupling: Confusion Network

Step-wise translation process (illustrated as a figure in the original slides)

Loose Coupling

                                  1-best  N-best  Lattice  CN
  Multiple hypotheses?              X       O       O      O
  ASR features into MT decoding?    X       X       O      O
  Overlaps among hypotheses?        X       O       O      X
  Approximation for word lattice?   X       O       X      O

Tight Coupling

Theory (H. Ney, 1999)
► Derivation steps: apply Bayes' rule; introduce the source sentence f as a hidden variable; apply Bayes' rule again; assume that x does not depend on the target language given f; replace the sum over f by a maximum
► Three factors
◦ Pr(e): target language model
◦ Pr(f|e): translation model
◦ Pr(x|f): acoustic model
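The equations of this derivation were lost in transcription; following H. Ney (1999) and the step labels on this slide, a reconstruction is:

```latex
\begin{aligned}
\hat{e} &= \operatorname*{argmax}_{e} \Pr(e \mid x)
         = \operatorname*{argmax}_{e} \Pr(e)\,\Pr(x \mid e) \\
        &= \operatorname*{argmax}_{e} \Pr(e) \sum_{f} \Pr(f \mid e)\,\Pr(x \mid f, e) \\
        &\approx \operatorname*{argmax}_{e} \Pr(e) \max_{f} \Pr(f \mid e)\,\Pr(x \mid f)
\end{aligned}
```

The last line uses the assumption Pr(x | f, e) = Pr(x | f) (the acoustics depend only on the source sentence) and the sum-to-max approximation, leaving exactly the three factors Pr(e), Pr(f|e), and Pr(x|f).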

Tight Coupling

ASR vs. tight coupling (SLT)
► ASR combines an acoustic model with a source LM; tightly coupled SLT combines an acoustic model with a target LM and a translation model
► Brute-force method
◦ Instead of incorporating the source LM into the standard Viterbi algorithm, incorporate Pr(e) and Pr(f|e)
◦ Very complicated
◦ Not feasible

Tight Coupling

WFST-based joint probability system (full integration)
► The ASR search network
◦ A composition of WFSTs: H ∘ C ∘ L ∘ G
◦ H: the HMM topology
◦ C: the context dependency
◦ L: the lexicon
◦ G: the LM
◦ Only the source LM G needs to be replaced by the translation model
► Speech translation search network ST
► Result
◦ A small improvement in translation quality
◦ But very slow

Tight Coupling

BLEU scores against lattice density (S. Saleem et al., 2004)
► Improvements from tighter coupling may only be observed when ASR lattices are sparse, i.e., when there are only a few hypothesized words per spoken word in the lattice
► This would mean that a fully integrated speech translation system would not work at all.

Tight Coupling

Possible issues of tight coupling
► In ASR, the source n-gram LM is already very close to the best configuration
► The complexity of the algorithm is too high; approximation is still necessary to make it work
► The current approaches still have not really implemented tight coupling

Conclusion
► The approach seems to be hampered by the very high complexity of constructing the search algorithm

Reading List

L. Mangu, E. Brill, A. Stolcke. 2000. Finding consensus in speech recognition: word error minimization and other applications of confusion networks. Computer Speech and Language 14(4), 373-400.
V. H. Quan, M. Federico, M. Cettolo. 2005. Integrated N-best re-ranking for spoken language translation. In Proc. of Eurospeech.
R. Zhang, G. Kikui, H. Yamamoto, T. Watanabe, F. Soong, W. K. Lo. 2004. A unified approach in speech-to-speech translation: integrating features of speech recognition and machine translation. In Proc. of COLING.
F. J. Och. 2003. Minimum error rate training in statistical machine translation. In Proc. of ACL.
E. Matusov, S. Kanthak, H. Ney. 2005. On the integration of speech recognition and statistical machine translation. In Proc. of Interspeech.
E. Matusov, H. Ney, R. Schlüter. 2005. Phrase-based translation of speech recognizer word lattices using log-linear model combination. In Proc. of ASRU.

Reading List (cont'd)

E. Matusov, H. Ney, R. Schlüter. 2006. Integrating speech recognition and machine translation: where do we stand? In Proc. of ICASSP.
L. Mathias, W. Byrne. 2006. Statistical phrase-based speech translation. In Proc. of ICASSP.
N. Bertoldi, M. Federico. 2005. A new decoder for spoken language translation based on confusion networks. In IEEE ASRU Workshop.
H. Ney. 1999. Speech translation: coupling of recognition and translation. In Proc. of ICASSP.
S. Saleem, S. C. Jou, S. Vogel, T. Schultz. 2004. Using word lattice information for a tighter coupling in speech translation systems. In Proc. of ICSLP.

