Role of Speech Technology in Enhancing Human-Computer Interactions in Tamil G. Anushiya Rachel Project Officer Speech Lab, SSN College of Engineering

Introduction
- Human-computer interaction: graphical user interfaces and voice user interfaces
- A voice user interface lets users communicate with computers/machines through speech, as with a human being
- Examples: personal assistants such as Siri, Google Assistant, Cortana, and Alexa
- Requirements of the computer: recognize the user's voice/speech (speaker/speech recognition) and respond appropriately through speech (speech synthesis)
- Possible applications of speech technology: speech recognition/synthesis, speaker verification/identification, language identification, emotion recognition/synthesis
- Presently, text-to-speech (TTS) synthesis systems, a restricted-vocabulary speech recognition system, and a speech-enabled enquiry system have been developed for Tamil

Text-to-Speech Synthesis
- Converts any given text in a language to speech
- Basic components: text pre-processing, text-to-phonetic/prosodic translation, and a signal-processing component to generate speech
- Language-dependent and language-independent modules
- Restricted-domain and unrestricted-domain synthesizers
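The three components above can be sketched as a simple pipeline. Everything here is a hypothetical stand-in (the function names and the toy numeral expansion are invented for illustration), not the actual system described in the talk:

```python
# Minimal sketch of a TTS front end; every function is a placeholder.

def preprocess_text(text):
    """Normalize the input: expand numerals, strip punctuation."""
    expansions = {"3": "muunru"}  # toy numeral expansion (Tamil "three")
    words = text.replace(".", "").split()
    return [expansions.get(w, w.lower()) for w in words]

def to_phonetic(words):
    """Map each word to a phoneme string (toy letter-to-sound stand-in)."""
    return ["-".join(w) for w in words]

def synthesize(phonemes):
    """Stand-in for the signal-processing back end."""
    return f"<waveform for {' '.join(phonemes)}>"

def tts(text):
    return synthesize(to_phonetic(preprocess_text(text)))

print(tts("Vanakkam 3"))
```

A real front end would also handle abbreviations, punctuation-driven prosody, and language-specific letter-to-sound rules, which are discussed below.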

Unit Selection Synthesis (USS)
- Waveform-concatenation approach
- Pre-recorded speech units are combined according to the given text, such that target and concatenation costs are minimized
- Speech units can be words or sub-word units (e.g., phonemes, CV units, syllables)
- Synthesized speech is natural, but contains glitches at the concatenation points
- Larger speech units give better quality, but require more data
- Large footprint (of the order of GBs)
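Minimizing the summed target and concatenation costs over candidate units is a dynamic-programming search. A toy sketch, with illustrative unit ids and cost functions rather than real acoustic distances:

```python
# Toy unit selection: pick one candidate unit per target position so that
# the total of target costs plus concatenation costs is minimized.

def select_units(candidates, target_cost, concat_cost):
    # candidates: one list of candidate unit ids per target position
    best = {u: (target_cost(0, u), [u]) for u in candidates[0]}
    for t in range(1, len(candidates)):
        new_best = {}
        for u in candidates[t]:
            # best predecessor for unit u at position t
            prev_u, (prev_cost, path) = min(
                best.items(),
                key=lambda kv: kv[1][0] + concat_cost(kv[0], u))
            new_best[u] = (prev_cost + concat_cost(prev_u, u) + target_cost(t, u),
                           path + [u])
        best = new_best
    return min(best.values())  # (total cost, chosen unit sequence)

# Invented example: units ending in "1" match the target better.
candidates = [["a1", "a2"], ["b1", "b2"]]
cost, path = select_units(candidates,
                          lambda t, u: 0 if u.endswith("1") else 1,
                          lambda u, v: 0.0)
print(cost, path)
```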

Unit Selection Synthesis (USS)

HMM-Based Speech Synthesis System (HTS)
- Statistical parametric approach
- Uses a source-filter model to synthesize speech
- Synthesized speech is highly intelligible, but slightly less natural than USS
- Small footprint (of the order of a few kB)
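The source-filter idea can be illustrated by passing a periodic pulse train (the source, setting the pitch) through a crude recursive filter (a vocal-tract stand-in). A real HTS system instead generates excitation and spectral parameters from trained HMMs, so this is only a conceptual sketch:

```python
# Source-filter sketch: pulse-train excitation + simple recursive filter.

def pulse_train(n_samples, period):
    """Periodic impulse excitation; period controls the pitch."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]

def one_pole_filter(x, a=0.9):
    """y[n] = x[n] + a*y[n-1]: crude spectral shaping of the source."""
    y, prev = [], 0.0
    for sample in x:
        prev = sample + a * prev
        y.append(prev)
    return y

speech = one_pole_filter(pulse_train(100, period=25))
```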

Requirements
- Text data: domain-specific or unrestricted
- Speech data: recorded in a quiet/studio environment
- Amount of data: the basic unit can be a phone, diphone, syllable, etc.; the larger the unit, the greater the amount of training data required
- Letter-to-sound rules of the language
- Time-aligned transcriptions

Letter-to-sound rules
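To give a flavour of rule-based letter-to-sound conversion: a direct grapheme-to-phoneme table plus one context rule (voicing an intervocalic stop, in the spirit of Tamil allophony). The romanized mappings here are invented for illustration:

```python
# Toy letter-to-sound sketch: table lookup plus one context-sensitive rule.

G2P = {"k": "k", "a": "a", "m": "m", "i": "i"}  # toy grapheme table
VOWELS = {"a", "i"}

def letter_to_sound(word):
    phones = [G2P.get(ch, ch) for ch in word]
    for i in range(1, len(phones) - 1):
        # voice /k/ between vowels -> /g/ (illustrative Tamil-style rule)
        if phones[i] == "k" and phones[i-1] in VOWELS and phones[i+1] in VOWELS:
            phones[i] = "g"
    return phones

print(letter_to_sound("makam"))
```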

Time-Aligned Transcriptions
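Time-aligned transcriptions are commonly stored as HTK-style label files (start time, end time, phone, with times in units of 100 ns), the convention typically used when training HMM synthesizers. A minimal parser for that format, with made-up sample content:

```python
# Parse an HTK-style label file: "start end phone", times in 100 ns units.

LAB = """0 2500000 sil
2500000 4200000 aa
4200000 6100000 g
"""

def parse_lab(text):
    segments = []
    for line in text.strip().splitlines():
        start, end, phone = line.split()
        # convert 100 ns units to seconds
        segments.append((int(start) / 1e7, int(end) / 1e7, phone))
    return segments

for start, end, phone in parse_lab(LAB):
    print(f"{phone}: {start:.2f}-{end:.2f} s")
```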

HMM-Based Speech Synthesis
- Effect of context information on quality:
  - Monophone: context-independent (/aa/ /g/ /aa/ /y/ /a/ /m/)
  - Triphone: right and left contexts (/x-aa+g/ /aa-g+aa/ /g-aa+y/ ...)
  - Pentaphone: two contexts to the right and left
  - Pentaphone with additional features
- Web demo: http://speech.ssn.edu.in
- Prosody modification: to improve the naturalness of speech, the pitch contour can be modified; emotions can also be incorporated
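The context-dependent labels above can be generated mechanically from a phone sequence. A small sketch using the slide's notation (left-centre+right, with "x" marking a missing context):

```python
# Expand a phone sequence into triphone labels: left-centre+right.

def to_triphones(phones):
    labels = []
    for i, p in enumerate(phones):
        left = phones[i-1] if i > 0 else "x"
        right = phones[i+1] if i < len(phones) - 1 else "x"
        labels.append(f"{left}-{p}+{right}")
    return labels

print(to_triphones(["aa", "g", "aa", "y", "a", "m"]))
```

Pentaphone labels extend the same idea to two phones of context on each side, at the cost of many more distinct models to train.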

Polyglot HTS
- Bilingual synthesizers for Tamil and Indian English:
  - Tamil phonemes mapped to similar Indian English phonemes
  - Separate synthesizers for Tamil and English
  - Perceptually similar phonemes merged
  - Acoustically similar phonemes merged
- Polyglot synthesizers for Tamil, Hindi, Malayalam, and Telugu:
  - GMM-based voice conversion used
  - Characteristics of each speaker adapted to the desired speaker's characteristics

Mobile Application and Screen Reader
- An Android mobile application allows the user to type the desired text and synthesizes it
- The Tamil TTS system is integrated with the "TalkBack" feature of Android phones, which serves as a screen reader
- A Linux-based screen reader that synthesizes selected text has also been developed

Speech-Enabled Interactive Enquiry System
- Communicates with the user entirely through speech
- Consists of three components: a speech recognition system, a TTS synthesis system, and a database containing the relevant information

Speech-Enabled Interactive Enquiry System
- Developed to provide information on agriculture, specifically paddy, sugarcane, and ragi
- Obtains the user's query through a series of questions
- Questions are formulated to elicit one- to three-word responses
- Garbage models are used to reject out-of-vocabulary words
- The recognized result is verified with the user in case of doubt
- Information relevant to the user's query is fetched from the database and synthesized by the TTS system
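The control flow above can be sketched as a toy loop: a stand-in recognizer rejects out-of-vocabulary words (playing the role of the garbage model), doubtful results are confirmed with the user, and the answer is fetched from a database. The vocabulary, confidence values, and database entry are all invented:

```python
# Toy enquiry-system control flow; not the actual deployed system.

VOCAB = {"paddy", "sugarcane", "ragi"}
DATABASE = {"paddy": "Sow paddy at the onset of the monsoon."}

def recognize(utterance):
    """Pretend recognizer: (word, confidence), or None for out-of-vocabulary."""
    word = utterance.strip().lower()
    return (word, 0.9) if word in VOCAB else None  # garbage-model stand-in

def answer_query(utterance, confirm=lambda w: True):
    result = recognize(utterance)
    if result is None:
        return "Sorry, please repeat."
    word, conf = result
    if conf < 0.8 and not confirm(word):  # verify doubtful results with user
        return "Sorry, please repeat."
    return DATABASE.get(word, f"No information on {word}.")

print(answer_query("paddy"))
```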

Future Directions
- Development of an unrestricted-vocabulary speech recognition system for Tamil
- Identification of emotions from speech
- Synthesis of emotional speech

Demo