Speech recognition in mobile environment: robust ASR with dual mic

Presentation transcript:

Speech recognition in mobile environment: robust ASR with dual mic. Université d'Oran 1 Ahmed Ben Bella. Presented by: Yacine IKKACHE. Supervised by: Pr. Med SENOUCI, Dr. B KOUNINEF

What is ASR? Applications: command and control, automatic transcription, automatic translation, home automation, voice dialing.

How it works

HMM-based recognizer, pattern classification: mathematical formulation
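
For reference, the standard textbook formulation of statistical speech recognition is given below. It is stated as general background rather than as the exact formula shown on the slide, and the symbols W (word sequence) and X (sequence of acoustic feature vectors) are chosen here for illustration.

```latex
\hat{W} = \arg\max_{W} P(W \mid X)
        = \arg\max_{W} \frac{P(X \mid W)\, P(W)}{P(X)}
        = \arg\max_{W} P(X \mid W)\, P(W)
```

In this reading, P(X | W) is the acoustic model, P(W) is the language model, and the arg max over W is the search problem. P(X) can be dropped because it does not depend on W.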

HMM-based recognizer, pattern classification: acoustic model

HMM-based recognizer, pattern classification: language model

HMM-based recognizer, pattern classification: search problem
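
The search problem in an HMM recognizer is commonly solved with the Viterbi algorithm. The following is a minimal sketch on a toy two-state model, not the decoder used by Sphinx. The state names, observation symbols, and all probabilities are hypothetical values chosen for illustration.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for an observation sequence."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximizes the path score
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][obs[t]]
            back[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


def logs(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {key: math.log(value) for key, value in table.items()}


# Hypothetical toy model: two states, two discrete observation symbols
STATES = ["sil", "speech"]
LOG_START = logs({"sil": 0.8, "speech": 0.2})
LOG_TRANS = {
    "sil": logs({"sil": 0.7, "speech": 0.3}),
    "speech": logs({"sil": 0.2, "speech": 0.8}),
}
LOG_EMIT = {
    "sil": logs({"low": 0.9, "high": 0.1}),
    "speech": logs({"low": 0.2, "high": 0.8}),
}

if __name__ == "__main__":
    log_prob, path = viterbi(
        ["low", "high", "high", "low"], STATES, LOG_START, LOG_TRANS, LOG_EMIT
    )
    print(path, math.exp(log_prob))
```

Working in the log domain avoids numerical underflow on long utterances. A real recognizer searches over word and phone sequences with continuous emission densities, but the dynamic-programming recursion has the same shape.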

Building a Quran reader controlled by speech: ASR with Sphinx. Sphinx4 is a software implementation of an HMM-based speech recognizer; its architecture is highly flexible.

Acoustic model for the Quranic reader: data collection. Speech collection: we prepared a text file containing the names of the 114 suras and the names of famous reciters.

Acoustic model for the Quranic reader: data collection. The audio files were recorded at a sampling rate of 16 kHz with 16 bits per sample. Each file was named using the convention speakername-commandID.wav. The audio files were divided into two sets.
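
The naming convention and recording format can be checked automatically before training. The sketch below uses only the Python standard library. The function names are hypothetical, and the random 80/20 train/test split is an assumption, since the slides do not say how the two sets were formed.

```python
import random
import wave
from pathlib import Path


def parse_recording_name(path):
    """Split 'speakername-commandID.wav' into (speaker, command_id)."""
    stem = Path(path).stem
    # Split on the last hyphen so speaker names may themselves contain hyphens
    speaker, sep, command_id = stem.rpartition("-")
    if not sep or not speaker or not command_id:
        raise ValueError(f"unexpected file name: {path}")
    return speaker, command_id


def has_expected_format(path, rate=16000, sample_width_bytes=2):
    """Check that a WAV file is 16 kHz with 16 bits (2 bytes) per sample."""
    with wave.open(str(path), "rb") as wav:
        return (
            wav.getframerate() == rate
            and wav.getsampwidth() == sample_width_bytes
        )


def split_recordings(paths, test_fraction=0.2, seed=0):
    """Shuffle reproducibly and return (train_paths, test_paths).

    The 80/20 ratio is an illustrative assumption, not taken from the slides.
    """
    paths = sorted(str(p) for p in paths)
    random.Random(seed).shuffle(paths)
    n_test = int(round(len(paths) * test_fraction))
    return paths[n_test:], paths[:n_test]
```

For example, `parse_recording_name("ahmed-017.wav")` returns `("ahmed", "017")`; the speaker name and command ID here are made up. A split by speaker rather than by file may be preferable when speaker-independent accuracy is the goal.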

Building a Quran reader controlled by speech

Publications: "Building Quranic reader voice interface using sphinx toolkit", in the Journal of American Sciences (November 2013); "Toward Quranic reader controlled by speech", in the International Journal of Advanced Computer Science & Application (April 2012).

Speech recognition in mobile environment

Speech recognition in mobile environment: architecture. The choice of architecture is driven by factors including device and network resources, the complexity of the ASR components, and the application.

Speech recognition in mobile environment: NSR (network speech recognition). Issues: coding and transmission errors.

Speech recognition in mobile environment: DSR (distributed speech recognition). Advantages: no coding and transcoding problems; robustness against communication channel and acoustic noise; thin client, easy to update, no limits on ASR complexity. Drawbacks: the front-end must be implemented in the device; network dependency and transmission errors.

Robust speech recognition on mobile environments. Main research lines of the group: Robust speech recognition on mobile environments. Robust ASR on mobile devices with small microphone array. Robust transmission of speech and video. Ultrasonic non-destructive testing. Signal processing in proteomics.

Robust speech recognition on mobile environments. Noise reduction with a single microphone

Robust speech recognition on mobile environments. Noise reduction with dual mic

Noise reduction with dual mic: DNN to extract a binary mask; marginalization; frame reconstruction

Noise reduction with dual mic: DNN to extract a soft mask, Y' = ES * Y1
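
The mask relation Y' = ES * Y1 can be illustrated with a short NumPy sketch. It assumes that ES is the soft mask estimated by the DNN (values between 0 and 1 per time-frequency bin), that Y1 is the spectrum of the primary microphone, and that the product is element-wise; these readings are assumptions, as the slide does not define the symbols. The DNN itself is not modeled, and the function names and the 0.5 threshold are hypothetical.

```python
import numpy as np


def apply_soft_mask(mask, primary_spectrum):
    """Element-wise Y' = ES * Y1 on (frames x frequency-bins) arrays."""
    # Clip to [0, 1] so that a slightly out-of-range estimate cannot amplify bins
    mask = np.clip(np.asarray(mask, dtype=float), 0.0, 1.0)
    primary_spectrum = np.asarray(primary_spectrum)
    if mask.shape != primary_spectrum.shape:
        raise ValueError("mask and spectrum must have the same shape")
    return mask * primary_spectrum


def to_binary_mask(soft_mask, threshold=0.5):
    """Threshold a soft mask into a binary (0/1) mask.

    The threshold value is an illustrative assumption.
    """
    return (np.asarray(soft_mask) >= threshold).astype(float)
```

A soft mask attenuates each bin in proportion to how noise-dominated it is estimated to be, whereas the binary variant keeps or discards whole bins.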

Noise reduction with dual mic: dual-mic database creation

Conclusion: Multichannel information can be exploited to improve ASR performance. We are working on implementing a novel technique (DNN-based soft mask estimation for robust ASR, in Matlab). The extracted features will be used in Sphinx for recognition.

Thank you for your attention