A STUDY ON SPEECH RECOGNITION USING DYNAMIC TIME WARPING
CS 525: Project Presentation
PALDEN LAMA and MOUNIKA NAMBURU

GOALS
Learn how it works!
Focus: pre-processing; Dynamic Time Warping / dynamic programming
Verify using MATLAB
Build a simple voice-to-text converter application.

HOW DOES IT WORK?
Record a voice → digitized speech signal (.wav file) → acoustic preprocessing (DFT + MFCC) → extract feature vectors → speech recognizer (Dynamic Time Warping)
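To make the first stages of this pipeline concrete, here is a minimal MATLAB sketch that reads a recorded word and cuts it into short overlapping frames (one feature vector is later computed per frame). The file name, 25 ms window and 10 ms hop are illustrative assumptions, not values taken from the project.

    [s, fs] = audioread('speech.wav');   % digitized speech signal (wavread in older MATLAB)
    s = s(:, 1);                         % keep a single channel
    frameLen = round(0.025 * fs);        % 25 ms analysis window
    hop      = round(0.010 * fs);        % a new frame every 10 ms
    win = 0.54 - 0.46 * cos(2*pi*(0:frameLen-1)' / (frameLen-1));   % Hamming window
    nFrames = floor((length(s) - frameLen) / hop) + 1;
    frames  = zeros(frameLen, nFrames);
    for i = 1:nFrames
        seg = s((i-1)*hop + (1:frameLen));       % grab the i-th segment
        frames(:, i) = seg .* win;               % windowed frame, ready for the DFT
    end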

SPEECH SIGNAL
Voiced excitation → fundamental frequency (speaker dependent)
Loudness → signal amplitude
Vocal tract shape → spectral shaping (most important for recognizing words)
[Figure: time signal of the vowel /a:/ (fs = 11 kHz, length = 100 ms)]

ACOUSTIC PRE-PROCESSING
DFT → log power spectrum of the vowel /a:/
IDFT → cepstrum of the vowel /a:/
Liftering → power spectrum of the vowel /a:/ after cepstral smoothing
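A minimal MATLAB sketch of that chain (DFT, log, IDFT, liftering, DFT) applied to one frame; frames comes from the sketch above, and the liftering cutoff L is an assumed illustrative value, not one used in the project.

    frame   = frames(:, 10);                    % one windowed frame from the earlier sketch
    logspec = log(abs(fft(frame)).^2 + eps);    % DFT -> log power spectrum
    cep     = real(ifft(logspec));              % IDFT -> cepstrum
    L = 30;                                     % keep only the low quefrencies (vocal-tract shape)
    lifter  = zeros(size(cep));
    lifter([1:L, end-L+2:end]) = 1;             % symmetric low-pass lifter
    smoothSpec = real(fft(cep .* lifter));      % smoothed log power spectrum after liftering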

MFCC (MEL-FREQUENCY CEPSTRAL COEFFICIENTS)
The difference between the cepstrum and the mel-frequency cepstrum is that in the MFC, the frequency bands are equally spaced on the mel scale, which approximates the human auditory system's response more closely than the linearly spaced frequency bands used in the normal cepstrum.
Coefficients of the power spectrum → mel spectral coefficients (FEATURE VECTOR)
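A minimal MATLAB sketch of computing one MFCC feature vector with a triangular filterbank whose bands are equally spaced on the mel scale. The FFT size, number of filters and number of coefficients are assumed values, and this is not a reconstruction of the project's Melfiltermatrix / computeMelSpectrum functions.

    N = 512; K = 24; M = 12;                       % assumed FFT size, mel filters, coefficients kept
    pspec = abs(fft(frames(:, 10), N)).^2;          % power spectrum of one windowed frame
    pspec = pspec(1:N/2 + 1);
    mel  = @(f) 2595 * log10(1 + f/700);            % Hz -> mel
    imel = @(m) 700 * (10.^(m/2595) - 1);           % mel -> Hz
    edges = imel(linspace(0, mel(fs/2), K + 2));    % filter edges equally spaced on the mel scale
    bins  = floor(edges / fs * N) + 1;              % fs from the framing sketch above
    H = zeros(K, N/2 + 1);
    for k = 1:K                                     % triangular mel filterbank
        H(k, bins(k):bins(k+1))   = linspace(0, 1, bins(k+1) - bins(k) + 1);
        H(k, bins(k+1):bins(k+2)) = linspace(1, 0, bins(k+2) - bins(k+1) + 1);
    end
    melspec = H * pspec;                            % mel spectral coefficients
    m = (0:M-1)'; k = 0:K-1;
    c = cos(pi/K * m * (k + 0.5)) * log(melspec + eps);   % DCT-II of log mel energies -> feature vector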

RECOGNIZER
One spoken word contains dozens of feature vectors (preprocessing every 10 ms of signal).
Compute a "distance" between this unknown sequence of vectors (the unknown word) and each known sequence of vectors (the prototypes of the words to recognize).
PROBLEM!! The vector sequences have unequal lengths.

DYNAMIC TIME WARPING: FIND OPTIMAL ASSIGNMENT PATH
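A minimal MATLAB sketch of the DTW recursion that finds the cost of the optimal assignment path between two feature sequences of unequal length (one feature vector per column). It is a generic textbook formulation, not the project's dp_asym routine.

    function d = dtw_dist(X, Y)
    % DTW_DIST  accumulated dynamic-time-warping distance between the feature
    % sequences X and Y (one feature vector per column).
    n = size(X, 2); m = size(Y, 2);
    D = inf(n + 1, m + 1);                 % accumulated-cost matrix
    D(1, 1) = 0;
    for i = 1:n
        for j = 1:m
            cost = norm(X(:, i) - Y(:, j));                             % local frame distance
            D(i+1, j+1) = cost + min([D(i, j), D(i, j+1), D(i+1, j)]);  % best predecessor
        end
    end
    d = D(n + 1, m + 1);                   % cost of the optimal warping path
    end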

DTW: RECOGNIZING CONNECTED WORDS

MATLAB FUNCTIONS
PRE-PROCESSING:
    recordMelMatrix(3)
    S = wavread('speech.wav')
    C = Melfiltermatrix(S, N, K)
    computeMelSpectrum(C, S);
DISPLAY FEATURES:
    Featuredisp.m
WORD RECOGNITION:
    dp_asym(vector1, vector2)
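For illustration only, a sketch of how these pieces might be tied together into an isolated-word recognizer: the unknown word is compared against each stored prototype and the smallest DTW distance wins. The helper mfcc_frames (returning one MFCC vector per column for a wav file) and the file names are hypothetical, and dtw_dist is the sketch shown earlier, not the project's dp_asym.

    templates = {'hello', 'library', 'computer'};   % prototype words (illustrative)
    U = mfcc_frames('unknown.wav');                 % hypothetical helper: MFCC matrix of the unknown word
    best = ''; bestDist = inf;
    for t = 1:numel(templates)
        T = mfcc_frames([templates{t} '.wav']);     % MFCC matrix of a stored prototype
        d = dtw_dist(U, T);                         % DTW distance despite unequal lengths
        if d < bestDist
            bestDist = d; best = templates{t};
        end
    end
    fprintf('Recognized word: %s (DTW distance %.4g)\n', best, bestDist);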

RESULTS
[Figures: DTW alignments and distances for the isolated-word pairs hello vs. hello1, library vs. hello, and computer vs. hello]
[Figures: DTW alignments and distances for the phrase pairs Welcome home (male) vs. Welcome home (female), Welcome home vs. Welcome back, Welcome home vs. Computer Science, and Welcome back vs. Computer Science]

THANKS! ANY QUESTIONS?