In collaboration with Hualin Gao, Richard Duncan, Julie A. Baca, Joseph Picone Human and Systems Engineering Center of Advanced Vehicular System Mississippi.

In collaboration with Hualin Gao, Richard Duncan, Julie A. Baca, Joseph Picone
Human and Systems Engineering, Center of Advanced Vehicular System, Mississippi State University
SIGNAL PROCESSING TOOLS FOR SPEECH RECOGNITION
Presented by Richard Duncan, Tablet PC, Microsoft Corporation

WHICH TWO ARE THE SAME PHONEME?
We need to extract meaningful information from the signal for a speech recognition system to model.

WHICH TWO ARE THE SAME PHONEME?
a: “ow”; b: “aa”; c: “ow”

WHAT IS AN ACOUSTIC FRONT-END?
It encapsulates the signal processing of a speech recognition system. It computes a sequence of feature vectors from an audio stream. These vectors are then processed by HMMs, neural networks, or other classifiers.

WHY REINVENT THE WHEEL?
A front-end has many areas of complexity: run-time efficiency; file I/O; data management (framing); DSP algorithm complexity; algorithm re-use. Our system shields the researcher or student from these mundane issues so that he or she can focus on the algorithms.

DATA FRAMING
[Figure: frames n and n+1 and windows n and n+1, with regions labeled "New data" and "Shared data".]
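The framing idea in the figure can be sketched as follows. This is a minimal illustration, not the ISIP implementation: the function name `makeWindows` and its parameters are hypothetical, and it assumes each window starts at the beginning of its frame (a real front-end may center the window on the frame instead). Each frame contributes `frame_len` new samples, and consecutive windows share `window_len - frame_len` samples.

```cpp
#include <cstddef>
#include <vector>

// Split a signal into overlapping analysis windows (hypothetical helper).
// frame_len:  number of new samples contributed by each frame (the hop size).
// window_len: number of samples in each window; must be >= frame_len.
// Trailing samples that do not fill a whole window are dropped.
std::vector<std::vector<float>> makeWindows(const std::vector<float>& signal,
                                            std::size_t frame_len,
                                            std::size_t window_len) {
  std::vector<std::vector<float>> windows;
  if (frame_len == 0 || window_len < frame_len) {
    return windows;  // invalid configuration: return no windows
  }
  for (std::size_t start = 0; start + window_len <= signal.size();
       start += frame_len) {
    // Copy window_len samples beginning at the start of this frame.
    windows.emplace_back(signal.begin() + start,
                         signal.begin() + start + window_len);
  }
  return windows;
}
```

With a frame length of 2 and a window length of 4, adjacent windows share two samples, which corresponds to the "Shared data" region in the figure.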

FEATURES OF ISIP FOUNDATION CLASSES
Efficient memory management and tracking; system and I/O libraries that abstract details of the operating system; math classes that provide basic linear algebra and efficient matrix manipulations; generic data structures; built-in unit tests to verify component correctness.

DESIGN REQUIREMENTS
A library of standard algorithms provides basic digital signal processing (DSP) functions; new algorithms can be added without modifying existing classes; a block diagram tool allows rapid prototyping without programming or recompiling; the same system is used for offline feature extraction, recognition, and general DSP work.

BASIC DIGITAL PROCESSING FUNCTIONS
This example shows how to use the basic digital signal processing functions. It computes the energy of the input vector in dB using the SUM algorithm:

    // declare an Energy object, input vector, and output vector
    Energy egy;
    VectorFloat output;
    VectorFloat input(L"0, 1, 2");

    // choose algorithm
    egy.setAlgorithm(Energy::SUM);

    // choose implementation
    egy.setImplementation(Energy::DB);

    // compute the energy of input data
    egy.compute(output, input);
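For readers without the ISIP classes at hand, the same computation can be sketched in standard C++. This is an illustration under stated assumptions, not the library's code: `energyDb` and `floor_db` are hypothetical names, and the dB value is assumed to be 10 log10 of the sum of squares, whereas the ISIP `DB` implementation may apply a different scale factor.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Energy of a vector in dB, assumed here to be 10*log10(sum of squares).
// floor_db is a hypothetical lower bound returned for silent input.
float energyDb(const std::vector<float>& input, float floor_db = -100.0f) {
  double sum_sq = 0.0;
  for (float x : input) {
    sum_sq += static_cast<double>(x) * x;  // accumulate the sum of squares
  }
  if (sum_sq <= 0.0) {
    return floor_db;  // avoid log10(0) for all-zero or empty input
  }
  const float db = static_cast<float>(10.0 * std::log10(sum_sq));
  return std::max(floor_db, db);
}
```

For the input 0, 1, 2 used on the slide, the sum of squares is 5, so under this assumption the result is 10 log10(5), roughly 6.99 dB.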

ADDING NEW ALGORITHMS

    class AlgorithmBase {
    public:
      // Processing:
      virtual boolean init();
      virtual boolean apply();

      // Configuration:
      virtual const String& className() const;
      virtual long getLeadingPad() const;
      virtual long getTrailingPad() const;
      virtual CMODE getOutputMode() const;
      virtual float getOutputSampleFrequency() const;
      virtual boolean setParser();

      // Debugging:
      boolean displayStart();
      boolean displayFinish();
      boolean displayChannel();
      boolean display();
    };

The interface contract allows extensibility to new algorithms; all algorithms are classes that implement this interface; most methods have a default implementation.
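The extensibility pattern described here can be sketched in standard C++ as follows. The names `AlgorithmSketch` and `AbsoluteValue` are hypothetical stand-ins chosen for illustration, and the interface is reduced to a few methods; it is not the ISIP `AlgorithmBase` class. The point is only that a new algorithm overrides what it needs and inherits defaults for the rest.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, reduced stand-in for an algorithm interface.
class AlgorithmSketch {
 public:
  virtual ~AlgorithmSketch() = default;

  // Default implementations: an algorithm that needs no padding
  // does not have to override these.
  virtual long getLeadingPad() const { return 0; }
  virtual long getTrailingPad() const { return 0; }

  // Every algorithm must name itself and process data.
  virtual std::string className() const = 0;
  virtual bool apply(std::vector<float>& output,
                     const std::vector<float>& input) = 0;
};

// A new algorithm, added without modifying any existing class.
class AbsoluteValue : public AlgorithmSketch {
 public:
  std::string className() const override { return "AbsoluteValue"; }

  bool apply(std::vector<float>& output,
             const std::vector<float>& input) override {
    output.resize(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
      output[i] = std::fabs(input[i]);  // element-wise magnitude
    }
    return true;
  }
};
```

Code that drives algorithms through a pointer or reference to the base class keeps working when `AbsoluteValue` is added, which is the property the slide calls an interface contract.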

ADDING NEW ALGORITHMS

    boolean Energy::init() { }

    const String& Energy::className() const { return CLASS_NAME; }

    long Energy::getLeadingPad() const { return 0; }

    long Energy::getTrailingPad() const { return 0; }

    boolean Energy::apply(Vector output, Vector input) {
      // determine what channel to operate on
      …
      if (algorithm_d == SUM) {
        computeSum(output(0).makeVectorFloat(), input(0).getVectorFloat());
      }
      …
    }

ADDING NEW ALGORITHMS

    boolean Energy::computeSum(VectorFloat& output_a, const VectorFloat& input_a) {

      // compute the sum of squares
      Float e = input_a.sumSquare();

      // compute the scale factor according to specified implementation
      float scaled_energy = scale(e, input_a.length());

      // the length of the output vector should be 1 as it only contains the energy
      output_a.setLength(1);

      // assign the value of energy to the output
      output_a(0) = Integral::max(floor_d, scaled_energy);

      // exit gracefully
      return true;
    }

DEFINITIONS
Algorithm: input and output are arrays of floating-point numbers; corresponds to basic DSP principles.
Recipe: a collection of algorithms that are run serially, so that the output of A_{n-1} is the input to A_n; named inputs and outputs; allows reuse of processing blocks between systems.
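The two definitions can be illustrated with a short sketch in standard C++. The type aliases `Signal` and `Stage` and the functions `squareSamples`, `sumSamples`, and `runRecipe` are hypothetical names introduced here; the sketch models only a purely serial recipe and leaves out named inputs and outputs.

```cpp
#include <functional>
#include <vector>

// An algorithm maps an array of floats to an array of floats.
using Signal = std::vector<float>;
using Stage = std::function<Signal(const Signal&)>;

// Example algorithm: square every sample.
Signal squareSamples(const Signal& input) {
  Signal output(input);
  for (float& v : output) v *= v;
  return output;
}

// Example algorithm: reduce the input to a single value, its sum.
Signal sumSamples(const Signal& input) {
  float total = 0.0f;
  for (float v : input) total += v;
  return Signal{total};
}

// A recipe runs its algorithms serially: the output of stage n-1
// becomes the input of stage n.
Signal runRecipe(const std::vector<Stage>& recipe, const Signal& input) {
  Signal data = input;
  for (const Stage& stage : recipe) {
    data = stage(data);
  }
  return data;
}
```

Chaining `squareSamples` and `sumSamples` yields the sum of squares of the input, so the same two blocks could be reused in any recipe that needs an energy-like quantity.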

HIERARCHY OF ALGORITHM CLASSES

FRONT-END CONFIGURATION TOOL
Design a front-end by creating a block diagram; allows rapid prototyping of ideas; new modules can easily be added to the system; the resulting parameter file is then the input to a full speech recognition system.

FRONT-END CONFIGURATION TOOL

RESPONSIBILITIES OF THE UTILITY
Parses the file containing the recipe created in the configuration tool; synchronizes different paths along the block flow diagram contained in the recipe; prepares input and output data buffers for each algorithm; schedules the sequence of required signal processing operations; processes data through the recipe; manages large collections of data files.
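One common way to schedule the blocks of a flow diagram so that every block runs after all of its inputs are ready is a topological sort. The sketch below uses Kahn's algorithm in standard C++; it is an assumption about how such scheduling could be done, not a description of the ISIP utility, and the function name `scheduleBlocks` and the adjacency-list representation are hypothetical.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// edges[i] lists the blocks that consume the output of block i
// (all indices are assumed to be smaller than edges.size()).
// Returns an order in which every block appears after all of its inputs,
// or an empty vector if the diagram contains a cycle.
std::vector<std::size_t> scheduleBlocks(
    const std::vector<std::vector<std::size_t>>& edges) {
  const std::size_t n = edges.size();

  // Count how many inputs each block is still waiting for.
  std::vector<std::size_t> pending_inputs(n, 0);
  for (const auto& consumers : edges) {
    for (std::size_t c : consumers) ++pending_inputs[c];
  }

  // Blocks with no inputs (for example, the audio source) can run first.
  std::queue<std::size_t> ready;
  for (std::size_t i = 0; i < n; ++i) {
    if (pending_inputs[i] == 0) ready.push(i);
  }

  std::vector<std::size_t> order;
  while (!ready.empty()) {
    const std::size_t block = ready.front();
    ready.pop();
    order.push_back(block);
    // This block's output is now available to its consumers.
    for (std::size_t c : edges[block]) {
      if (--pending_inputs[c] == 0) ready.push(c);
    }
  }

  if (order.size() != n) order.clear();  // a cycle left some blocks unscheduled
  return order;
}
```

In a diagram where block 0 feeds blocks 1 and 2, and both feed block 3, block 3 is scheduled only after both paths have produced their output, which is one way to read "synchronizes different paths".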

VERIFICATION STRATEGY
Correctness: the implementation of each algorithm is verified manually or against other tools such as MATLAB. Usability: we assessed and improved the usability of our tools through extensive user testing conducted over the course of several training sessions. Speech recognition experiments: the correctness of the tools was also verified by speech recognition experiments.

STATE-OF-THE-ART FEATURES
Mel-frequency cepstral coefficients (MFCCs); cepstral mean subtraction; energy normalization; 1st and 2nd order differential features. These features are used by most commercial speech recognition systems.
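Two of the listed features are simple enough to sketch directly. The code below is a minimal illustration in standard C++, assuming every frame has the same number of coefficients; the names `Features`, `subtractCepstralMean`, and `firstDifference` are hypothetical. The differential feature uses a plain central difference with the edge frames repeated, whereas production front-ends often use a longer regression window.

```cpp
#include <cstddef>
#include <vector>

// One row per frame, one column per cepstral coefficient.
using Features = std::vector<std::vector<float>>;

// Cepstral mean subtraction: remove the per-coefficient mean over all frames.
void subtractCepstralMean(Features& frames) {
  if (frames.empty()) return;
  const std::size_t dim = frames[0].size();
  std::vector<double> mean(dim, 0.0);
  for (const auto& frame : frames) {
    for (std::size_t k = 0; k < dim; ++k) mean[k] += frame[k];
  }
  for (double& m : mean) m /= static_cast<double>(frames.size());
  for (auto& frame : frames) {
    for (std::size_t k = 0; k < dim; ++k) {
      frame[k] -= static_cast<float>(mean[k]);
    }
  }
}

// First-order differential features: half the difference between the
// next and previous frame, repeating the first and last frames at the edges.
Features firstDifference(const Features& frames) {
  Features delta(frames.size());
  for (std::size_t t = 0; t < frames.size(); ++t) {
    const auto& prev = frames[t == 0 ? 0 : t - 1];
    const auto& next = frames[t + 1 < frames.size() ? t + 1 : t];
    delta[t].resize(frames[t].size());
    for (std::size_t k = 0; k < frames[t].size(); ++k) {
      delta[t][k] = 0.5f * (next[k] - prev[k]);
    }
  }
  return delta;
}
```

Applying `firstDifference` to its own output gives a second-order differential feature in the same simplified sense.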

EXPERIMENTAL RESULTS

CONCLUSION
The front-end performs signal processing for speech recognition systems; the ISIP front-end is implemented on an extensible library of basic DSP building blocks; a block diagram interface is used to configure the front-end data flow; the tool’s usability was optimized through multiple training sessions with new users; the system’s correctness was verified through speech recognition experiments.
