
1 MULTI LINGUAL ISSUES IN SPEECH SYNTHESIS AND RECOGNITION IN INDIAN LANGUAGES NIXON PATEL Bhrigus Inc npatel@bhrigus.com Multilingual & International Speech Applications, SpeechTek West 2007, Hilton San Francisco

2 ABSTRACT This paper describes our work in developing multilingual speech recognition and speech synthesis systems for Indian languages. Existing speech technologies (TTS and ASR) cover US English, Indian English and Hindi; no such systems exist for any other Indian language.

3 Introduction Voice-enabled services are a rapidly growing, high-margin opportunity, particularly in a multilingual country such as India. It is very difficult to build a separate speech synthesizer for each language. The focus is also on developing common multilingual corpora that support multiple Indian languages and on building appropriate language-specific linguistic analysis modules for text-to-speech synthesis.

4 Important issues involved Enumerating a phone set to represent Indian languages. Selecting the basic unit for synthesis: half-phones, diphones or syllables. Creating a generic acoustic database that covers language variations. Modeling language-specific prosody.

5 Our approaches A common notation for graphemes is developed using IT-3 transliteration. Diphone-based speech synthesis. Data-driven prosody modeling using Classification and Regression Trees (CART). Concatenative synthesis using cluster unit selection techniques with syllable-like units.
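To make the diphone unit concrete, here is a minimal sketch of how a phone sequence could be cut into diphone labels. The function name `to_diphones`, the `pau` silence symbol and the example phones are illustrative assumptions, not taken from the Bhrigus system.

```python
def to_diphones(phones, silence="pau"):
    """Turn a phone sequence into diphone labels (hypothetical naming scheme).

    A diphone spans the transition from one phone to the next, so the
    utterance is padded with silence on both sides before pairing.
    """
    if not phones:
        return []
    padded = [silence] + list(phones) + [silence]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


# Example with an assumed three-phone word.
print(to_diphones(["r", "aa", "m"]))
```

A sequence of N phones yields N + 1 diphones, which is one reason diphone inventories stay small compared with syllable inventories.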

6 Our current research work Text-to-speech synthesis: TTS is a multilingual text-to-speech engine that enables speech applications to be built in local Indian languages using a unit selection algorithm and a large corpus. A Telugu TTS system has been built, and a voice portal that reads out local-language news in Telugu has been developed. Speech recognition: ASR is a multilingual automatic speech recognition system that, in conjunction with our TTS, will enable full-fledged speech solutions. The advanced features of this engine would allow customization to a vertical within a few hours.

7 Search engine: This is a cross-lingual search engine capable of searching through content in all Indian languages. It makes use of several features of Indian language scripts, including their phonetic nature, their common phonetic base and their syllabic structure. Its other novelty is that it uses phonetic-level units for indexing, which enables seamless cross-lingual search across the languages. Phonetic typing tool: This tool makes use of an intuitive, readable transliteration scheme and of phonetic properties to key in scripts in Indian languages. The Bhrigus phonetic typing tool comes with a friendly user interface as well as APIs for integration into applications such as email and blogging frameworks.
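The idea of indexing on phonetic-level units, as described for the search engine, can be sketched as follows. This is not the Bhrigus implementation: as a stand-in for a real phonetic unit, the sketch assumes that a character's offset inside its Unicode script block is a usable key, relying on the fact that the Devanagari and Telugu blocks are laid out largely in parallel. All function names are hypothetical.

```python
from collections import defaultdict

# Start code points of two Unicode Indic blocks (each is 0x80 code points wide).
BLOCK_STARTS = {"devanagari": 0x0900, "telugu": 0x0C00}


def phonetic_key(word):
    """Map each character to a script-independent token (assumed stand-in)."""
    key = []
    for ch in word:
        cp = ord(ch)
        for start in BLOCK_STARTS.values():
            if start <= cp < start + 0x80:
                key.append(f"{cp - start:02x}")  # offset within the block
                break
        else:
            key.append(f"U+{cp:04X}")  # unknown script: keep the code point
    return tuple(key)


def build_index(docs):
    """docs maps a document id to its text; returns key -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[phonetic_key(word)].add(doc_id)
    return index


def search(index, query_word):
    return sorted(index.get(phonetic_key(query_word), set()))


docs = {
    "hindi_doc": "\u0930\u093e\u092e",   # RA + vowel sign AA + MA in Devanagari
    "telugu_doc": "\u0c30\u0c3e\u0c2e",  # the same three letters in Telugu
}
index = build_index(docs)
print(search(index, "\u0930\u093e\u092e"))  # a Devanagari query finds both
```

A production system would presumably map to a proper common phone set rather than raw block offsets, since the parallel layout has script-specific exceptions.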

8 Font converters: Indian-language text in electronic form is in a chaotic state. One can neither exchange notes in Indian languages as conveniently as in English, nor search Indian-language texts available on the web. This is because the texts are stored in font-dependent glyph codes, and the glyph coding scheme typically differs from font to font. To view the content of these sites, one needs the corresponding fonts on the local machine. We are building font converters for almost all Indian languages. Multilingual dictionary: We are developing a multilingual dictionary with English as the source language and Indian languages such as Telugu, Tamil, Gujarati and Hindi as the target languages.
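As an illustration of what a font converter has to do, the sketch below maps font-dependent glyph codes to Unicode with a longest-match table lookup, then moves a vowel sign that legacy fonts store in visual order (before the consonant) to its Unicode position (after the consonant). The font table is entirely made up for this example ("DemoFont" is not a real font), and the function name is hypothetical.

```python
# Hypothetical glyph table for an imaginary legacy font, "DemoFont".
DEMO_FONT_TABLE = {
    "k": "\u0915",   # DEVANAGARI LETTER KA
    "kh": "\u0916",  # DEVANAGARI LETTER KHA
    "m": "\u092e",   # DEVANAGARI LETTER MA
    "A": "\u093e",   # DEVANAGARI VOWEL SIGN AA
    "i": "\u093f",   # DEVANAGARI VOWEL SIGN I (drawn left of the consonant)
}

# Signs that appear before the consonant visually but follow it in Unicode.
PRE_BASE_SIGNS = {"\u093f"}


def glyphs_to_unicode(text, table):
    """Convert glyph-coded text to Unicode (simplified sketch)."""
    max_len = max(len(code) for code in table)
    out = []
    pos = 0
    while pos < len(text):
        # Try the longest glyph code first so "kh" wins over "k".
        for n in range(max_len, 0, -1):
            chunk = text[pos:pos + n]
            if chunk in table:
                out.append(table[chunk])
                pos += len(chunk)
                break
        else:
            out.append(text[pos])  # unknown code: pass through unchanged
            pos += 1
    # Reorder pre-base vowel signs from visual order to Unicode order.
    i = 0
    while i < len(out) - 1:
        if out[i] in PRE_BASE_SIGNS:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return "".join(out)


print(glyphs_to_unicode("kAm", DEMO_FONT_TABLE))
```

Real converters need one such table per font and considerably more reordering logic (conjuncts, multi-part vowel signs); this sketch only shows the general shape of the problem.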

9 Bhrigus ASR and TTS Process Framework The components of a TTS system can be divided into a language-independent component (LIC) and a language-dependent component (LDC). The LIC consists of the speech synthesis engine, which handles the unit selection algorithm and signal processing. The LDC covers the language-specific resources needed to build a synthetic voice, such as the pronunciation dictionary and the unit selection database.
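The unit selection step inside the language-independent engine is commonly formulated as a dynamic-programming search that balances a target cost (how well a candidate unit matches the requested unit) against a join cost (how smoothly two consecutive units concatenate). The sketch below follows that general formulation; the cost functions, the `Unit` fields and the numbers are assumptions chosen for illustration and do not describe the Bhrigus engine.

```python
from collections import namedtuple

# Hypothetical unit record: label, mean pitch, position in the recording.
Unit = namedtuple("Unit", "label pitch position")


def select_units(targets, candidates, target_cost, join_cost):
    """Pick one candidate per target, minimizing total target + join cost.

    targets: list of target specifications.
    candidates: list (one entry per target) of non-empty candidate lists.
    """
    # best[i][j] = (cheapest cost of a path ending in candidates[i][j],
    #               index of the previous candidate on that path)
    best = [[(target_cost(targets[0], u), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            costs = [best[i - 1][k][0] + join_cost(p, u)
                     for k, p in enumerate(candidates[i - 1])]
            k = min(range(len(costs)), key=costs.__getitem__)
            row.append((costs[k] + target_cost(targets[i], u), k))
        best.append(row)
    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path


def pitch_target_cost(target, unit):
    # Assumed target cost: distance from the requested pitch.
    return abs(target[1] - unit.pitch)


def adjacency_join_cost(prev, unit):
    # Assumed join cost: free if the units were adjacent in the recording.
    return 0 if unit.position == prev.position + 1 else 5


targets = [("a-b", 100), ("b-c", 100)]
candidates = [
    [Unit("a-b", 100, 10), Unit("a-b", 102, 20)],
    [Unit("b-c", 110, 11), Unit("b-c", 100, 21)],
]
chosen = select_units(targets, candidates, pitch_target_cost, adjacency_join_cost)
print([u.position for u in chosen])
```

In this toy example the search prefers the pair recorded contiguously (positions 20 and 21) over the individually best first unit, which is the behavior the join cost is meant to produce.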

10 [Diagram] Language-dependent (LDC) and language-independent (LIC) components of a TTS system. LDC, linguistic resources: text data collection, text normalization, pronunciation dictionary, letter-to-sound rules, syllabification, stress, prosodic pause prediction. LDC, speech resources: (1) unit-selection database, (2) prosodic modeling. LIC (Bhrigus TTS): unit-selection synthesis engine.

11 [Diagram] Language-dependent (LDC) and language-independent (LIC) components of an ASR system. LDC, linguistic resources: (1) text data collection, (2) pronunciation dictionary, (3) letter-to-sound rules, (4) language model. LDC, speech resources: (1) acoustic models. LIC (Bhrigus ASR): speech recognition engine.

12 The development time for building a TTS and an ASR system consists of the time to develop the LIC components and the LDC components. The LIC component of the ASR system is the Bhrigus ASR speech recognition engine, while the LIC component of the TTS system is the Bhrigus TTS unit-selection engine. We suggest building the LDC components for ASR and TTS together, because sharing language-dependent resources across the two systems reduces development time.
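One way to picture the sharing of language-dependent resources is a single resource object that both a TTS front end and an ASR lexicon builder consume. The class and function names below, and the toy romanized entries, are hypothetical and serve only to show the shape of the design.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageResources:
    """Language-dependent resources shared by TTS and ASR (assumed layout)."""
    lexicon: dict = field(default_factory=dict)    # word -> list of phones
    lts_rules: dict = field(default_factory=dict)  # letter -> list of phones

    def pronounce(self, word):
        # The pronunciation dictionary takes priority over the rules.
        if word in self.lexicon:
            return list(self.lexicon[word])
        phones = []
        for letter in word:
            phones.extend(self.lts_rules.get(letter, []))
        return phones


def tts_front_end(text, resources):
    """TTS side: turn input text into the phone string to synthesize."""
    return [p for word in text.split() for p in resources.pronounce(word)]


def asr_lexicon(vocabulary, resources):
    """ASR side: build the recognizer's word-to-phones lexicon."""
    return {word: resources.pronounce(word) for word in vocabulary}


# Toy romanized entries, invented for the example.
shared = LanguageResources(
    lexicon={"namaste": ["n", "a", "m", "a", "s", "t", "e"]},
    lts_rules={"r": ["r"], "a": ["a"], "m": ["m"]},
)
print(tts_front_end("namaste ram", shared))
print(asr_lexicon(["ram"], shared))
```

Because both sides call the same `pronounce` method, a correction to the dictionary or the rules benefits synthesis and recognition at once.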

13 The LDC resources that can be shared across TTS and ASR systems are the text data, the pronunciation dictionary and the letter-to-sound rules. The collected text is used to build language models for ASR and, at the same time, to extract an optimal set of sentences to be recorded for the TTS system. Similarly, the pronunciation dictionary and letter-to-sound rules can be shared across the TTS and ASR systems. Note also that several modules inside the TTS and ASR engines could be shared as well.
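The double use of the collected text can be sketched in a few lines: the same sentences feed bigram counts for an ASR language model and a greedy selection of sentences to record for TTS. The greedy coverage criterion and the use of character pairs as a stand-in for diphones are assumptions made for this example; the slides do not say how the optimal sentences are chosen.

```python
from collections import Counter


def bigram_counts(sentences):
    """ASR side: raw word-bigram counts, the basis of a bigram language model."""
    counts = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(words, words[1:]))
    return counts


def char_pairs(sentence):
    """Assumed stand-in for the diphones in a sentence: adjacent letter pairs."""
    letters = sentence.replace(" ", "")
    return set(zip(letters, letters[1:]))


def select_recording_script(sentences, units_of, max_sentences):
    """TTS side: greedily pick sentences that add the most uncovered units."""
    covered = set()
    chosen = []
    remaining = list(sentences)
    while remaining and len(chosen) < max_sentences:
        best = max(remaining, key=lambda s: len(units_of(s) - covered))
        gain = units_of(best) - covered
        if not gain:
            break  # nothing left would add new units
        chosen.append(best)
        covered |= gain
        remaining.remove(best)
    return chosen


corpus = ["ab", "abc", "cd"]
print(bigram_counts(["a b", "a c"])[("<s>", "a")])
print(select_recording_script(corpus, char_pairs, max_sentences=3))
```

Here the sentence "ab" is never selected because everything it contains is already covered by "abc", which is the kind of redundancy a recording script tries to avoid.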

14 Demos Demos are at http://196.12.38.23/index.html

15 Conclusion Our four basic principles are: to create and sustain the leading market solution for professional services in text-to-speech, speech-to-text, search, machine translation and natural dialogue management for Indian languages, including Indian-English; to interface that solution into the vast majority of technical environments relevant to these types of applications; to provide skilled services; and to provide services at differentiated low rates.
