
SIGNAL PROCESSING TOOLS FOR SPEECH RECOGNITION
Presented by Richard Duncan, Tablet PC, Microsoft Corporation
in collaboration with Hualin Gao, Richard Duncan, Julie A. Baca, Joseph Picone, Human and Systems Engineering, Center for Advanced Vehicular Systems, Mississippi State University

Page 1 of 38 Signal Processing Tools for Speech Recognition WHICH TWO ARE THE SAME PHONEME? We need to extract meaningful information from the signal so that a speech recognition system can model it.

Page 2 of 38 WHICH TWO ARE THE SAME PHONEME? a: “ow”; b: “aa”; c: “ow”

Page 3 of 38 WHAT IS AN ACOUSTIC FRONT-END? It encapsulates the signal processing of a speech recognition system. It computes a sequence of feature vectors from an audio stream. These vectors are then processed by HMMs, neural networks, or other classifiers.

Page 4 of 38 WHY REINVENT THE WHEEL? A front-end has many areas of complexity: run-time efficiency; file I/O; data management (framing); DSP algorithm complexity; algorithm re-use. Our system shields the researcher or student from these mundane issues so that he or she can focus on the algorithms.

Page 5 of 38 DATA FRAMING (diagram labels: frame n, frame n+1, window n, window n+1, new data, shared data)
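The framing idea can be sketched in standard C++, under stated assumptions: each frame contributes `frame_len` new samples, each analysis window is `window_len >= frame_len` samples long and extends its frame on both sides, so consecutive windows share data, and samples outside the signal are zero-padded. The function name `frameSignal` and its parameters are hypothetical and are not part of the ISIP API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: split a signal into overlapping analysis windows.
// Frame n holds the new samples [n * frame_len, (n + 1) * frame_len).
// Window n extends frame n by pad = (window_len - frame_len) / 2 samples
// on the left and the remaining samples on the right, so neighboring
// windows share data. Samples outside the signal are zero-padded.
std::vector<std::vector<float>> frameSignal(const std::vector<float>& signal,
                                            std::size_t frame_len,
                                            std::size_t window_len) {
  assert(frame_len > 0 && window_len >= frame_len);
  const long pad = static_cast<long>((window_len - frame_len) / 2);
  const long length = static_cast<long>(signal.size());
  // number of frames, rounded up so trailing samples are not dropped
  const std::size_t num_frames = (signal.size() + frame_len - 1) / frame_len;

  std::vector<std::vector<float>> windows;
  windows.reserve(num_frames);
  for (std::size_t n = 0; n < num_frames; ++n) {
    const long start = static_cast<long>(n * frame_len) - pad;
    std::vector<float> window(window_len, 0.0f);
    for (std::size_t i = 0; i < window_len; ++i) {
      const long idx = start + static_cast<long>(i);
      if (idx >= 0 && idx < length) {
        window[i] = signal[static_cast<std::size_t>(idx)];
      }
    }
    windows.push_back(window);
  }
  return windows;
}
```

For example, with `frame_len = 2` and `window_len = 4`, the signal 1, 2, 3, 4, 5, 6 yields the windows {0, 1, 2, 3}, {2, 3, 4, 5}, and {4, 5, 6, 0}; the last two samples of each window are the first two samples of the next, which corresponds to the shared data in the diagram.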

Page 6 of 38 FEATURES OF ISIP FOUNDATION CLASSES Efficient memory management and tracking; System and I/O libraries that abstract details of the operating system; Math classes that provide basic linear algebra and efficient matrix manipulations; Generic data structures; Built-in unit tests to verify component correctness.

Page 7 of 38 DESIGN REQUIREMENTS A library of standard algorithms provides basic digital signal processing (DSP) functions; New algorithms can be added without modifying existing classes; A block diagram tool allows rapid prototyping without programming or recompiling; The same system is used for offline feature extraction, recognition, and general DSP work.

Page 8 of 38 BASIC DIGITAL PROCESSING FUNCTIONS This example shows how to use the basic digital signal processing functions. It computes the energy of an input vector in dB using the SUM algorithm:

    // declare an Energy object, input vector, and output vector
    Energy egy;
    VectorFloat output;
    VectorFloat input(L"0, 1, 2");
    // choose algorithm
    egy.setAlgorithm(Energy::SUM);
    // choose implementation
    egy.setImplementation(Energy::DB);
    // compute the energy of input data
    egy.compute(output, input);
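As a worked check of what this example should produce, assume that the DB implementation takes 10 log10 of the sum of squares with no normalization by the vector length (the slide does not state the exact scaling). The three-sample input 0, 1, 2 then gives:

```latex
E = \sum_{i=0}^{N-1} x_i^2 = 0^2 + 1^2 + 2^2 = 5,
\qquad
E_{\mathrm{dB}} = 10 \log_{10} E = 10 \log_{10} 5 \approx 6.99\ \mathrm{dB}
```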

Page 9 of 38 ADDING NEW ALGORITHMS

    class AlgorithmBase {
     public:
      // Processing:
      virtual boolean init();
      virtual boolean apply();
      // Configuration:
      virtual const String& className() const;
      virtual long getLeadingPad() const;
      virtual long getTrailingPad() const;
      virtual CMODE getOutputMode() const;
      virtual float getOutputSampleFrequency() const;
      virtual boolean setParser();
      // Debugging:
      boolean displayStart();
      boolean displayFinish();
      boolean displayChannel();
      boolean display();
    };

This interface contract makes the system extensible to new algorithms: all algorithms are classes that implement this interface, and most of its methods have a default implementation.
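The interface-contract idea can be illustrated with a small, self-contained C++ analogue. It is not the ISIP class: `Algorithm`, `Scale`, and `runAlgorithm` are hypothetical names, and `std::vector<float>` stands in for the ISIP data types. The base class supplies defaults, so a new algorithm overrides only what differs, and client code calls every algorithm through the same base-class interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical analogue of an algorithm interface; not the ISIP AlgorithmBase.
class Algorithm {
 public:
  virtual ~Algorithm() = default;

  // Processing: init() has a default that does nothing and reports success;
  // apply() must be provided by every algorithm.
  virtual bool init() { return true; }
  virtual bool apply(std::vector<float>& output,
                     const std::vector<float>& input) = 0;

  // Configuration: className() is required; padding defaults to zero.
  virtual std::string className() const = 0;
  virtual long getLeadingPad() const { return 0; }
  virtual long getTrailingPad() const { return 0; }
};

// Adding an algorithm means deriving a new class; no existing class changes.
class Scale : public Algorithm {
 public:
  explicit Scale(float gain) : gain_(gain) {}

  bool apply(std::vector<float>& output,
             const std::vector<float>& input) override {
    output.resize(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
      output[i] = gain_ * input[i];
    }
    return true;
  }

  std::string className() const override { return "Scale"; }

 private:
  float gain_;
};

// Client code only sees the base-class interface.
bool runAlgorithm(Algorithm& alg, std::vector<float>& output,
                  const std::vector<float>& input) {
  assert(alg.getLeadingPad() >= 0 && alg.getTrailingPad() >= 0);
  return alg.init() && alg.apply(output, input);
}
```

Because `runAlgorithm` depends only on `Algorithm`, a new class such as `Scale` plugs in without any change to the caller, which is the extensibility the slide attributes to the interface contract.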

Page 10 of 38 ADDING NEW ALGORITHMS

    boolean Energy::init() {
      // initialization details are not shown on the slide
      return true;
    }
    const String& Energy::className() const { return CLASS_NAME; }
    long Energy::getLeadingPad() const { return 0; }
    long Energy::getTrailingPad() const { return 0; }
    boolean Energy::apply(Vector& output, const Vector& input) {
      // determine what channel to operate on
      …
      if (algorithm_d == SUM) {
        computeSum(output(0).makeVectorFloat(), input(0).getVectorFloat());
      }
      …
    }

Page 11 of 38 ADDING NEW ALGORITHMS

    boolean Energy::computeSum(VectorFloat& output_a, const VectorFloat& input_a) {
      // compute the sum of squares
      Float e = input_a.sumSquare();
      // scale the energy according to the specified implementation
      float scaled_energy = scale(e, input_a.length());
      // the output vector contains only the energy, so its length is 1
      output_a.setLength(1);
      // assign the energy to the output, clipped from below at floor_d
      output_a(0) = Integral::max(floor_d, scaled_energy);
      // exit gracefully
      return true;
    }
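For readers without the ISIP libraries, the same steps can be approximated in standard C++. This is a sketch, not the ISIP code: the `Implementation` enum, the use of 10 log10 for dB, and the handling of zero energy are assumptions, and `floor_value` plays the role of `floor_d`. The slide's `scale` also receives the input length; this sketch assumes the length is not needed for the two implementations shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical stand-ins for the slide's implementation options.
enum class Implementation { IDENTITY, DB };

// Scale a raw sum-of-squares energy. Assumption: DB means 10 * log10(e).
float scaleEnergy(float e, Implementation impl) {
  assert(e >= 0.0f);  // a sum of squares is never negative
  if (impl == Implementation::DB) {
    // log10(0) is undefined; return -infinity and let the floor clip it
    return e > 0.0f ? 10.0f * std::log10(e)
                    : -std::numeric_limits<float>::infinity();
  }
  return e;
}

// Same steps as Energy::computeSum: sum of squares, scaling,
// a floor, and a one-element output vector.
bool computeSum(std::vector<float>& output, const std::vector<float>& input,
                Implementation impl, float floor_value) {
  float e = 0.0f;
  for (float x : input) {
    e += x * x;
  }
  output.assign(1, std::max(floor_value, scaleEnergy(e, impl)));
  return true;
}
```

The floor matters for silent input: an all-zero frame has no finite dB value, so the result is clipped to `floor_value` instead of propagating negative infinity to later stages.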

Page 12 of 38 DEFINITIONS Algorithm: its input and output are arrays of floating-point numbers; each algorithm corresponds to a basic DSP principle. Recipe: a collection of algorithms that are run serially, so the output of algorithm A(n-1) is the input to algorithm A(n); recipes have named inputs and outputs and allow processing blocks to be reused between systems.
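A recipe, as defined above, can be sketched as an ordered list of algorithms applied one after another. The `Recipe` class and the use of `std::function` for each algorithm are illustrative assumptions, and the named inputs and outputs of real recipes are omitted.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Each algorithm maps an array of floats to an array of floats.
using AlgorithmFn =
    std::function<std::vector<float>(const std::vector<float>&)>;

// Hypothetical recipe: algorithms run serially, so the output of
// algorithm n-1 becomes the input of algorithm n.
class Recipe {
 public:
  Recipe& add(AlgorithmFn alg) {
    assert(static_cast<bool>(alg));  // every step must be callable
    steps_.push_back(std::move(alg));
    return *this;
  }

  std::vector<float> run(std::vector<float> data) const {
    for (const AlgorithmFn& step : steps_) {
      data = step(data);
    }
    return data;
  }

 private:
  std::vector<AlgorithmFn> steps_;
};
```

Because each step sees only an array of floats, the same step can be added to several recipes, which is the reuse of processing blocks the definition describes.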

Page 13 of 38 HIERARCHY OF ALGORITHM CLASSES (slide content not reproduced in this transcript)

Page 14 of 38 FRONT-END CONFIGURATION TOOL A front-end is designed by creating a block diagram; this allows rapid prototyping of ideas; new modules can easily be added to the system; the resulting parameter file is then the input to a full speech recognition system.

Pages 15-33 of 38 FRONT-END CONFIGURATION TOOL (slide content not reproduced in this transcript)

Page 34 of 38 RESPONSIBILITIES OF THE UTILITY Parses the file containing the recipe created in the configuration tool; Synchronizes different paths along the block flow diagram contained in the recipe; Prepares input and output data buffers for each algorithm; Schedules the sequence of required signal processing operations; Processes data through the recipe; Manages large collections of data files.
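One common way to schedule the blocks of a block diagram, so that every algorithm runs only after the blocks that feed it, is a topological sort (Kahn's algorithm). The source does not say which method the ISIP utility uses; this sketch only illustrates the scheduling responsibility, with blocks numbered 0 to n-1 and connections given as hypothetical (from, to) pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <stdexcept>
#include <utility>
#include <vector>

// Return an execution order for num_blocks blocks, given directed edges
// (from, to) meaning "the output of block from feeds the input of block to".
// Throws if the diagram contains a cycle.
std::vector<std::size_t> scheduleBlocks(
    std::size_t num_blocks,
    const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
  std::vector<std::vector<std::size_t>> successors(num_blocks);
  std::vector<std::size_t> in_degree(num_blocks, 0);
  for (const auto& e : edges) {
    assert(e.first < num_blocks && e.second < num_blocks);
    successors[e.first].push_back(e.second);
    ++in_degree[e.second];
  }

  // start with blocks that receive no input from other blocks
  std::queue<std::size_t> ready;
  for (std::size_t b = 0; b < num_blocks; ++b) {
    if (in_degree[b] == 0) ready.push(b);
  }

  std::vector<std::size_t> order;
  while (!ready.empty()) {
    const std::size_t b = ready.front();
    ready.pop();
    order.push_back(b);
    // a successor becomes ready once all of its inputs are scheduled
    for (std::size_t next : successors[b]) {
      if (--in_degree[next] == 0) ready.push(next);
    }
  }

  if (order.size() != num_blocks) {
    throw std::runtime_error("block diagram contains a cycle");
  }
  return order;
}
```

For a diamond-shaped diagram in which block 0 feeds blocks 1 and 2, and both feed block 3, this returns the order 0, 1, 2, 3, so block 3 runs only after both of its parallel input paths are complete.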

Page 35 of 38 VERIFICATION STRATEGY Correctness: the implementation of each algorithm was verified manually or with other tools such as MATLAB. Usability: we assessed and improved the usability of our tools through extensive user testing over several training sessions. Speech recognition experiments: the correctness of the tools was also verified through speech recognition experiments.

Page 36 of 38 STATE-OF-THE-ART FEATURES Mel-frequency cepstral coefficients (MFCCs); Cepstral mean subtraction; Energy normalization; First- and second-order differential features; These features are used by most commercial speech recognition systems.
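Two of the listed post-processing steps are easy to state precisely. The sketch below applies cepstral mean subtraction (removing each coefficient's mean over the utterance) and computes first-order differential features with a common regression formula, d_t = sum_{k=1..K} k (c_{t+k} - c_{t-k}) / (2 sum_{k=1..K} k^2), repeating the edge frames where t+k or t-k falls outside the utterance. The half-width `K`, the edge handling, and the function names are assumptions; the ISIP front-end may use different settings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Frames = std::vector<std::vector<float>>;  // frames[t][d]

// Cepstral mean subtraction: remove the per-dimension mean over all frames.
void cepstralMeanSubtraction(Frames& frames) {
  if (frames.empty()) return;
  const std::size_t dims = frames[0].size();
  std::vector<float> mean(dims, 0.0f);
  for (const auto& f : frames) {
    for (std::size_t d = 0; d < dims; ++d) mean[d] += f[d];
  }
  for (float& m : mean) m /= static_cast<float>(frames.size());
  for (auto& f : frames) {
    for (std::size_t d = 0; d < dims; ++d) f[d] -= mean[d];
  }
}

// Regression-based differential features with half-width K; frames
// before the start or past the end are replaced by the edge frame.
Frames deltas(const Frames& c, int K) {
  assert(K >= 1);
  const int T = static_cast<int>(c.size());
  Frames d(c.size(), std::vector<float>(c.empty() ? 0 : c[0].size(), 0.0f));
  float denom = 0.0f;
  for (int k = 1; k <= K; ++k) denom += 2.0f * static_cast<float>(k * k);
  for (int t = 0; t < T; ++t) {
    for (int k = 1; k <= K; ++k) {
      const auto& next = c[std::min(t + k, T - 1)];
      const auto& prev = c[std::max(t - k, 0)];
      for (std::size_t j = 0; j < c[t].size(); ++j) {
        d[t][j] += static_cast<float>(k) * (next[j] - prev[j]);
      }
    }
    for (float& v : d[t]) v /= denom;
  }
  return d;
}
```

Applying `deltas` to its own output gives second-order (acceleration) features. For a coefficient that rises by exactly 1 per frame, the interior deltas equal 1, while the edge frames come out smaller because of the repeated edge values.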

Page 37 of 38 EXPERIMENTAL RESULTS (slide content not reproduced in this transcript)

Page 38 of 38 CONCLUSION The front-end performs the signal processing for speech recognition systems; The ISIP front-end is built on an extensible library of basic DSP building blocks; A block diagram interface is used to configure the front-end data flow; The tool’s usability was optimized through multiple training sessions with new users; The system’s correctness was verified through speech recognition experiments.