Speech Translation on a PDA. By: Santan Challa. Instructor: Dr. Christel Kemke.
Introduction: Based on the article “PDA Translates Speech” by Kimberley Patch. A combined effort of researchers from CMU, Cepstral, LLC, Multimodal Technologies Inc. and Mobile Technologies Inc. What is the aim? Two-way translation of medical information, from English to Arabic and from Arabic to English. System used: iPaq handheld computer.
System: iPaq handheld computer with 64 MB of memory. Requirements: two recognizers, translators, and synthesizers.
Automatic Speech Recognition (ASR): Technology that recognizes and executes voice commands. Steps in ASR: feature extraction, acoustic modeling, language modeling, pattern classification, utterance verification, decision.
Speech Recognition Process. [Figure: functions of a speech recognizer, with blocks for feature extraction, pattern classification, acoustic modeling, language modeling, utterance verification, and decision.]
Feature Extraction: Features are attributes of a person's speech that enable a speech recognizer to distinguish the phonemes in each word. Energy:
Visual Display of Frequencies: Spectrogram. The energy levels are decoded to extract the features, which are stored in a feature vector for further processing.
Feature Extraction: Speech signal -> microphone -> analog signal. The analog signal is digitized so that it can be stored in the computer. Digitization involves sampling (common sampling rates range from 8,000 Hz to 16,000 Hz). Features are extracted from the digitized speech. The result is a feature vector (numerical measurements of speech attributes). The speech recognizer uses the feature vectors to decode the digitized speech signal.
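The sampling and feature-vector steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not the Speechalator's front end: the synthetic tone, the 25 ms frame length, the 10 ms hop and the function names `sample_tone` and `frame_energies` are all hypothetical, and per-frame energy stands in for the richer features a real recognizer would use.

```python
import math

SAMPLE_RATE_HZ = 8000  # lower end of the sampling range quoted on the slide


def sample_tone(freq_hz, duration_s, amplitude=1.0, rate=SAMPLE_RATE_HZ):
    """Digitize a pure tone: one sample every 1/rate seconds."""
    n = round(duration_s * rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def frame_energies(samples, frame_len=200, hop=80):
    """Cut the samples into overlapping frames and return each frame's mean energy.

    frame_len=200 and hop=80 correspond to 25 ms and 10 ms at 8000 Hz
    (assumed values, chosen only for this sketch).
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


# 0.1 s of a quiet tone followed by 0.1 s of a louder tone (synthetic stand-in for speech)
signal = sample_tone(440, 0.1, amplitude=0.1) + sample_tone(440, 0.1, amplitude=1.0)
features = frame_energies(signal)  # one energy value per frame: a tiny feature vector
```

The energy values rise when the louder tone begins, which is the kind of change a recognizer picks up from its feature vectors.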
Acoustic Modeling: A numerical representation of sound (utterances of words in a language). The speech features of the digitized speech signal are compared with the features of existing models. Determination of a sound is probabilistic by nature. The Hidden Markov Model (HMM) is a statistical technique that forms the basis for the development of acoustic models. HMMs give the statistical likelihood of a particular sequence of words or phonemes. HMMs are used in both speech training and speech recognition.
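As a rough illustration of how an HMM assigns a likelihood to an observed sequence, the sketch below implements the standard forward algorithm on a toy model. The two states, the two observation symbols and every probability are made-up values, and `forward_likelihood` is a hypothetical name; none of this is taken from the Speechalator recognizer.

```python
def forward_likelihood(observations, states, start_p, trans_p, emit_p):
    """Return P(observations | model) computed with the forward algorithm."""
    # alpha[s] = probability of the observations seen so far, ending in state s
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs] * sum(alpha[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())


# Toy model (all numbers are assumptions): two phoneme-like hidden states
# emitting two coarse acoustic symbols.
states = ("s1", "s2")
start_p = {"s1": 0.6, "s2": 0.4}
trans_p = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
emit_p = {"s1": {"low": 0.9, "high": 0.1}, "s2": {"low": 0.2, "high": 0.8}}

likelihood = forward_likelihood(["low", "high"], states, start_p, trans_p, emit_p)
```

In recognition, the same computation would be run against competing models, and the model giving the highest likelihood would be preferred.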
HMMs Cont’d: HMMs depend on the Markov chain, a sequence of random variables in which the next value depends on the previous values (in the first-order case, only on the immediately preceding value).
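A minimal sketch of the first-order Markov chain idea: the probability of a whole sequence is the start probability multiplied by one transition probability per step. The state names and numbers are invented for illustration, and `sequence_probability` is a hypothetical helper.

```python
def sequence_probability(sequence, start_p, trans_p):
    """Probability of a state sequence under a first-order Markov chain."""
    prob = start_p[sequence[0]]
    # Each step depends only on the state immediately before it.
    for prev, nxt in zip(sequence, sequence[1:]):
        prob *= trans_p[prev][nxt]
    return prob


# Toy chain with two states (all probabilities are assumptions)
start_p = {"a": 0.5, "b": 0.5}
trans_p = {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.5, "b": 0.5}}

p = sequence_probability(["a", "a", "b"], start_p, trans_p)  # 0.5 * 0.9 * 0.1
```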
Other Speech Recognition Components: Pattern classifier: the pattern classification component groups the patterns generated by the acoustic modeling component. Speech patterns with similar speech features are grouped together. The correctness of the words generated by the pattern classifier is measured by the utterance verification component. What the Speechalator prototype uses: an HMM-based recognizer, designed and developed by Multimodal Technologies Inc. The speech recognizer needs 1 MB of memory and the acoustic models occupy 3 MB of memory.
What is Machine Translation (MT)? Translation of speech from one language to another with the help of software. Types of MT: direct translation (word-to-word), transfer-based translation, interlingua translation.
Why MT is difficult: Ambiguity: sentences and words can have more than one meaning (lexical, structural, and semantic ambiguity). Structural differences between languages. Idioms cannot be translated word for word.
Approaches in Machine Translation. [Figure: the machine translation triangle, or Vauquois triangle, labeled with source language, target language, analysis, synthesis, direct translation, transfer, and IL.]
Differences between the three translation architectures: Direct translation: word-to-word translation. Transfer-based translation: requires knowledge of both the source and the target language, so it is suited to bilingual translation. Its intermediate representations are language dependent. It parses the source-language sentence and applies transfer rules that map grammatical segments of the source language onto the target language.
Differences between the three translation architectures cont’d: Interlingual translation: generates a language-independent representation, called an interlingua (IL), for the meaning of sentences or segments of sentences in the source language. A text in the source language can then be converted into any target language, so this approach is suited to multilingual translation.
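The contrast between direct and transfer-based translation can be sketched with a toy example. English-to-Spanish is used here purely for illustration (the slides concern English and Arabic), and the three-word lexicon, the part-of-speech table and the single reordering rule are all assumptions made for the sketch.

```python
# Hypothetical toy resources: a bilingual lexicon and part-of-speech tags
LEXICON = {"the": "el", "red": "rojo", "car": "coche"}
POS = {"the": "DET", "red": "ADJ", "car": "NOUN"}


def direct_translate(words):
    """Direct translation: replace each word, keeping source word order."""
    return [LEXICON[w] for w in words]


def transfer_translate(words):
    """Transfer-based translation: analyze, apply a transfer rule, then generate."""
    tagged = [(w, POS[w]) for w in words]  # analysis of the source sentence
    reordered = []
    i = 0
    while i < len(tagged):
        # Transfer rule (assumed): ADJ NOUN in the source becomes NOUN ADJ in the target
        if i + 1 < len(tagged) and tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN":
            reordered += [tagged[i + 1], tagged[i]]
            i += 2
        else:
            reordered.append(tagged[i])
            i += 1
    return [LEXICON[w] for w, _ in reordered]  # generation in the target language


sentence = ["the", "red", "car"]
direct = direct_translate(sentence)      # keeps English word order
transfer = transfer_translate(sentence)  # adjective moved after the noun
```

The direct version yields "el rojo coche", while the transfer rule yields the grammatical "el coche rojo", which shows why the transfer approach needs knowledge of both languages.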
More on Machine Translation: Knowledge-Based MT (KBMT): completely analyzes and understands the meaning of the source text, then translates it into target-language text. Performance relies heavily on the amount of world knowledge available for analyzing the source language. Knowledge is represented in the form of frames, e.g. [Event: Murder, is-a: Crime].
Machine Translation Cont’d: Example-Based MT (EBMT): sentences are analyzed on the basis of similar example sentences analyzed previously. What does the Speechalator prototype use? Statistical MT (SBMT): uses corpora that have been analyzed previously. No linguistic information is required. N-gram modeling is used.
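To make the N-gram idea concrete, here is a minimal bigram (N = 2) model estimated from a three-sentence corpus. The sentences are invented medical-style examples, the function names are hypothetical, and no smoothing is applied; it is a sketch of the general technique, not of the Speechalator's models.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count bigrams and their left-hand contexts in a whitespace-tokenized corpus."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()  # <s> marks the sentence start
        contexts.update(tokens[:-1])         # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams


def bigram_prob(prev, word, contexts, bigrams):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for unseen contexts."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]


# Tiny made-up corpus (an assumption for this sketch)
corpus = ["where does it hurt", "where is the pain", "does it hurt here"]
contexts, bigrams = train_bigrams(corpus)

p_does_given_where = bigram_prob("where", "does", contexts, bigrams)
p_hurt_given_it = bigram_prob("it", "hurt", contexts, bigrams)
```

Because the estimates come only from counts over the corpus, no hand-written linguistic rules are involved, which matches the "no linguistic information required" point above.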
Conclusions: Speechalator is a good achievement in both mobile technology and NLP. Simple push-to-talk button interface. Uses optimized speech recognizers and speech synthesizers. The architecture allows components to be placed both on the device and on a server. Presently most of the components are ported to the device. Performance: 80% accuracy. Takes 2-3 seconds for translation. Presently restricted to a domain…
Future Work: Increase the accuracy of the device so that it can deal with noisy environments. Build more learning algorithms. Build a multilingual speech recognizer. Achieve domain independence.
References
1. Kimberley Patch. PDA Translates Speech. Technology and Research News (TRN), 17/24 December 2003.
2. Richard V. Cox, Lawrence R. Rabiner, Candace A. Kamm. Speech and Language Processing for next-millennium communication services. Proceedings of the IEEE, 88(8):1314-1337, Feb 2000.
3. ASR Home page. http://www.isip.msstate.edu/projects/speech/
4. Speechalator: Two-Way Speech-To-Speech Translation on a Consumer PDA. Eurospeech 2003, Geneva, Switzerland, pages 1-4.
5. Joseph Seaseley. Machine Translation: A survey of approaches. University of Michigan, Ann Arbor.
6. Thierry Dutoit. A short introduction to Text-to-Speech Synthesis (TTS). http://tcts.fpms.ac.be/synthesis/introtts.html