Speech Recognition Application

Speech Recognition Application: Voice-Enabled Phone Directory
Yousef Rabah

Process of Speech Recognition
- Speaker dependent vs. speaker independent
- Vocabulary
  - Isolated vs. continuous
- Frequency changes
- Pronunciation
- Speech processing
- HMM: probabilities, parameters, training
- Phonemes to words

Problem
- Provide phone directory assistance through automatic speech recognition, with no human operator involved.

Automatic Speech Recognition - Sphinx
- Acoustic modeling
- Language model
  - Unigrams: <s> and </s>
  - Bigrams: P(word2 | word1)
  - Trigrams: P(word3 | word1, word2)
- Lexicon structure
    ZERO   Z IH R OW
    ONE    W AH N
    TWO    T UW
    <sil>
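The bigram entry on this slide can be illustrated with a small worked example. The sketch below estimates P(word2 | word1) by plain counting over a made-up corpus of digit words padded with <s> and </s>. The corpus and the function name are assumptions for illustration and are not taken from the slides. A real Sphinx language model would also apply smoothing and back-off, which this sketch leaves out.

```python
from collections import Counter

# Hypothetical training sentences (not from the slides), built from the
# digit words shown in the lexicon.
corpus = [
    ["ONE", "TWO", "ZERO"],
    ["ONE", "TWO", "TWO"],
    ["ZERO", "ONE", "TWO"],
]


def bigram_probabilities(sentences):
    """Return maximum-likelihood estimates of P(word2 | word1), unsmoothed."""
    history_counts = Counter()
    pair_counts = Counter()
    for words in sentences:
        # Pad each sentence with the start and end markers.
        padded = ["<s>"] + words + ["</s>"]
        for first, second in zip(padded, padded[1:]):
            history_counts[first] += 1
            pair_counts[(first, second)] += 1
    return {
        pair: count / history_counts[pair[0]]
        for pair, count in pair_counts.items()
    }


probs = bigram_probabilities(corpus)
print("P(TWO | ONE)  =", probs[("ONE", "TWO")])
print("P(ONE | <s>)  =", probs[("<s>", "ONE")])
print("P(</s> | TWO) =", probs[("TWO", "</s>")])
```

In this toy corpus "TWO" always follows "ONE", so that bigram gets probability 1. Bigrams that never occur get no entry at all, which is the gap smoothing is meant to fill.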

Input / Output

FWDVIT: H E L L (null)
24003 samples in file /usr/local/share/sphinx3/model/lm/an4/hell.raw
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil>
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> A(2)
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> EIGHTH
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L OH

Backtrace(null)
LatID   SFrm   EFrm      AScr      LScr   Type
  254      0     45   -391470    -74100     -1   <sil>
  594     46     81   -472155   -148846      0   H
 1291     82    102   -288621   -148846      0   E
 1850    103    126   -235274   -148846      0   L
 2599    127    147   -430694   -148846      0   L
 2650    148    148         0   -148846      0   </s>
           0    148  -1818214   -818330          (Total)
FWDVIT: H E L L (null)
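The (Total) row of the backtrace can be checked by hand. The sketch below copies the rows from the log, sums the AScr and LScr columns, and rebuilds the final hypothesis by dropping the silence and end-of-sentence markers. Reading AScr and LScr as acoustic and language-model scores is an interpretation of the column headers, and the function name is made up for this example.

```python
# Rows copied from the backtrace in the log:
# (LatID, SFrm, EFrm, AScr, LScr, word). The Type column is omitted.
backtrace = [
    (254, 0, 45, -391470, -74100, "<sil>"),
    (594, 46, 81, -472155, -148846, "H"),
    (1291, 82, 102, -288621, -148846, "E"),
    (1850, 103, 126, -235274, -148846, "L"),
    (2599, 127, 147, -430694, -148846, "L"),
    (2650, 148, 148, 0, -148846, "</s>"),
]


def summarize(rows):
    """Return (total AScr, total LScr, hypothesis without filler tokens)."""
    ascr_total = sum(row[3] for row in rows)
    lscr_total = sum(row[4] for row in rows)
    # Keep only real words; <sil> and </s> are markers, not letters.
    words = [row[5] for row in rows if row[5] not in ("<sil>", "</s>")]
    return ascr_total, lscr_total, " ".join(words)


ascr_total, lscr_total, hypothesis = summarize(backtrace)
print(ascr_total, lscr_total, hypothesis)
```

The sums come out to -1818214 and -818330, matching the (Total) row, and the remaining words give "H E L L", matching the FWDVIT line.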

Difficulties
- Hardware issues
- ASR software issues
- Letter phonemes: the "e-set"
- Time
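The "e-set" refers to the letters whose spoken names rhyme with "E" (B, C, D, E, G, P, T, V and, in American English, Z), which a recognizer easily confuses with each other. The slides do not say how the application dealt with this. One possible mitigation, sketched below purely as an assumption, is to treat e-set letters as interchangeable when matching a recognized spelling against the names in the directory. The directory entries and function names are hypothetical.

```python
# Letters whose spoken names rhyme with "E" (American English pronunciation).
E_SET = set("BCDEGPTVZ")


def could_match(recognized, name):
    """True if every position agrees exactly or both letters are in the e-set."""
    if len(recognized) != len(name):
        return False
    return all(
        r == n or (r in E_SET and n in E_SET)
        for r, n in zip(recognized, name)
    )


def candidates(recognized_letters, directory_names):
    """Return the directory names compatible with the recognized letters."""
    recognized = [letter.upper() for letter in recognized_letters]
    return [
        name for name in directory_names
        if could_match(recognized, name.upper())
    ]


# Hypothetical directory entries, for illustration only.
names = ["PETE", "TED", "DEB", "SAM"]
print(candidates(["T", "E", "D"], names))
print(candidates(["S", "A", "M"], names))
```

When several names survive, the application would still need to ask the user to confirm, in the same way as the "Did you say ...?" prompt in the example dialogue.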

Solution
- Database (PostgreSQL)
  - Names
  - Numbers
  - Phone number
- Fast access
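A name-to-number lookup of the kind this slide describes might look like the sketch below. To keep the example self-contained it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL. The table name, column names and entries are all assumptions, since the slides do not show the schema.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL database named on the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE directory (name TEXT NOT NULL, phone TEXT NOT NULL)")
# An index on the lookup column keeps name searches fast as the table grows.
conn.execute("CREATE INDEX idx_directory_name ON directory (name)")
conn.executemany(
    "INSERT INTO directory (name, phone) VALUES (?, ?)",
    [("SAM", "555-0100"), ("ANN", "555-0101")],  # made-up entries
)
conn.commit()


def lookup_phone(connection, name):
    """Return the phone number stored for name, or None if it is absent."""
    row = connection.execute(
        "SELECT phone FROM directory WHERE name = ?", (name.upper(),)
    ).fetchone()
    return row[0] if row else None


print(lookup_phone(conn, "sam"))
print(lookup_phone(conn, "bob"))
```

The query is parameterized, so the recognized name is passed as data and never spliced into the SQL string. The same pattern carries over to a PostgreSQL client library, although the placeholder syntax may differ.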

Solution
Example (general idea):
…
PC: Say the letters of the first name. Press the space bar before and after you speak.
User: S AA EM
PC: Did you say SAM?

Architecture of application
- User interaction
- Connects to the database
- Communicates with Sphinx
- Uses C, Perl, and shell scripts

Solution

Check List
- Reading
- ASR system
- Database - PSQL
- Applications in C, Perl, PHP, VXML, shell

Timeline