In collaboration with Hualin Gao, Richard Duncan, Julie A. Baca, Joseph Picone Human and Systems Engineering Center of Advanced Vehicular System Mississippi.

Presentation transcript:

1 SIGNAL PROCESSING TOOLS FOR SPEECH RECOGNITION. Presented by Richard Duncan, Tablet PC, Microsoft Corporation, in collaboration with Hualin Gao, Richard Duncan, Julie A. Baca, and Joseph Picone, Human and Systems Engineering, Center for Advanced Vehicular Systems, Mississippi State University.

2 WHICH TWO ARE THE SAME PHONEME? We need to extract meaningful information from the signal so that a speech recognition system can model it.

3 WHICH TWO ARE THE SAME PHONEME? a: “ow”; b: “aa”; c: “ow”

4 WHAT IS AN ACOUSTIC FRONT-END? It encapsulates the signal processing of a speech recognition system. It computes a sequence of feature vectors from an audio stream. These vectors are then processed by HMMs, neural networks, or other classifiers.

5 WHY REINVENT THE WHEEL? A front-end has many areas of complexity: run-time efficiency; file I/O; data management (framing); DSP algorithm complexity; algorithm reuse. Our system shields the researcher or student from these mundane issues so that he or she can focus on the algorithms.

6 DATA FRAMING (figure labels: frame n, frame n+1; window n, window n+1; new data; shared data)
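The framing diagram can be read as follows: frames advance through the signal without overlap, while each analysis window is at least as long as its frame, so consecutive windows overlap (the "shared data") and each step contributes only one frame of "new data". The sketch below is a minimal illustration under stated assumptions, not the ISIP implementation: the helper name `extractWindow`, centering each window on its frame, and zero-padding past the signal edges are all hypothetical choices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical framing helper (not the ISIP API).
// Frame n covers samples [n*frameLen, (n+1)*frameLen). Window n is centered
// on frame n and is windowLen samples long, so neighboring windows share
// (windowLen - frameLen) samples. Samples outside the signal are zero-padded.
std::vector<float> extractWindow(const std::vector<float>& signal,
                                 std::size_t n, std::size_t frameLen,
                                 std::size_t windowLen) {
  assert(windowLen >= frameLen);
  std::vector<float> window(windowLen, 0.0f);
  // first sample of window n; negative when the window starts before the signal
  const long start = static_cast<long>(n * frameLen) -
                     static_cast<long>((windowLen - frameLen) / 2);
  for (std::size_t i = 0; i < windowLen; ++i) {
    const long idx = start + static_cast<long>(i);
    if (idx >= 0 && idx < static_cast<long>(signal.size())) {
      window[i] = signal[static_cast<std::size_t>(idx)];
    }
  }
  return window;
}
```

For the signal `{1, 2, 3, 4, 5, 6, 7, 8}` with a frame of 2 samples and a window of 4, windows 0 and 1 are `{0, 1, 2, 3}` and `{2, 3, 4, 5}`: samples 2 and 3 are shared, and samples 4 and 5 are new.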

7 FEATURES OF ISIP FOUNDATION CLASSES Efficient memory management and tracking; System and I/O libraries that abstract details of the operating system; Math classes that provide basic linear algebra and efficient matrix manipulations; Generic data structures; Built-in unit tests to verify component correctness.

8 DESIGN REQUIREMENTS A library of standard algorithms provides basic digital signal processing (DSP) functions; New algorithms can be added without modifying existing classes; A block diagram tool allows rapid prototyping without programming or recompiling; The same system is used for offline feature extraction, recognition, and general DSP work.
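One common way to meet the requirement that new algorithms can be added without modifying existing classes is a registry keyed by class name, into which each algorithm registers itself. The sketch below shows that pattern only; the slides do not say the ISIP library uses it, and the names `Algorithm`, `AlgorithmRegistry`, and `Gain` are hypothetical.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal base class; not the ISIP AlgorithmBase.
class Algorithm {
 public:
  virtual ~Algorithm() = default;
  virtual std::vector<float> apply(const std::vector<float>& input) const = 0;
};

// Registry keyed by class name. Code that builds a processing chain asks the
// registry for an algorithm by name, so it does not change when a new
// algorithm is added.
class AlgorithmRegistry {
 public:
  using Factory = std::function<std::unique_ptr<Algorithm>()>;

  static AlgorithmRegistry& instance() {
    static AlgorithmRegistry registry;  // constructed on first use
    return registry;
  }

  // Returns false if the name is already registered.
  bool add(const std::string& name, Factory factory) {
    return factories_.emplace(name, std::move(factory)).second;
  }

  // Returns nullptr for unknown names.
  std::unique_ptr<Algorithm> create(const std::string& name) const {
    const auto it = factories_.find(name);
    if (it == factories_.end()) return nullptr;
    return it->second();
  }

 private:
  std::map<std::string, Factory> factories_;
};

// A new algorithm (hypothetical), typically in its own source file. It
// registers itself during static initialization; no existing class is edited.
class Gain : public Algorithm {
 public:
  std::vector<float> apply(const std::vector<float>& input) const override {
    std::vector<float> output(input);
    for (float& x : output) x *= 2.0f;  // fixed gain of 2, for illustration
    return output;
  }
};

namespace {
const bool gainRegistered = AlgorithmRegistry::instance().add(
    "Gain", []() -> std::unique_ptr<Algorithm> { return std::make_unique<Gain>(); });
}  // namespace
```

A caveat of self-registration: if such a class is linked from a static library and nothing references its object file, the linker may drop it, and its registration never runs.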

9 BASIC DIGITAL PROCESSING FUNCTIONS This example shows how to invoke the basic digital signal processing functions. It computes the energy of an input vector in dB using the SUM algorithm:
    // declare an Energy object, an input vector, and an output vector
    Energy egy;
    VectorFloat output;
    VectorFloat input(L"0, 1, 2");
    // choose the algorithm
    egy.setAlgorithm(Energy::SUM);
    // choose the implementation
    egy.setImplementation(Energy::DB);
    // compute the energy of the input data
    egy.compute(output, input);
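To show what this call sequence computes, here is a self-contained stand-in for the `Energy` class written against `std::vector<float>` instead of the ISIP `VectorFloat`. It is a sketch, not the library's code: only the `SUM` algorithm and `DB` implementation come from the slide, while the `IDENTITY` option and all internals are assumptions. With the slide's input `0, 1, 2`, the sum of squares is 5, so the energy is 10·log10(5), about 6.99 dB.

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-in for the slide's Energy class (not the ISIP code).
class Energy {
 public:
  enum Algorithm { SUM };
  enum Implementation { IDENTITY, DB };  // IDENTITY is an assumed option

  void setAlgorithm(Algorithm a) { algorithm_ = a; }
  void setImplementation(Implementation i) { implementation_ = i; }

  // Writes a single energy value into output.
  bool compute(std::vector<float>& output,
               const std::vector<float>& input) const {
    if (algorithm_ != SUM) return false;
    double e = 0.0;
    for (float x : input) e += static_cast<double>(x) * x;  // sum of squares
    const double value =
        (implementation_ == DB) ? 10.0 * std::log10(e) : e;  // dB = 10 log10(e)
    output.assign(1, static_cast<float>(value));
    return true;
  }

 private:
  Algorithm algorithm_ = SUM;
  Implementation implementation_ = IDENTITY;
};
```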

10 ADDING NEW ALGORITHMS
    class AlgorithmBase {
    public:
      // processing:
      virtual boolean init();
      virtual boolean apply();
      // configuration:
      virtual const String& className() const;
      virtual long getLeadingPad() const;
      virtual long getTrailingPad() const;
      virtual CMODE getOutputMode() const;
      virtual float getOutputSampleFrequency() const;
      virtual boolean setParser();
      // debugging:
      boolean displayStart();
      boolean displayFinish();
      boolean displayChannel();
      boolean display();
    };
The interface contract makes the system extensible to new algorithms: all algorithms are classes that implement this interface, and most of its methods have a default implementation.
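The benefit of default implementations is that a new algorithm only overrides what it actually changes. The simplified sketch below illustrates that idea with standard types; the reduced method set, the `std::vector<float>` signature for `apply`, the default values, and the example class `Square` are assumptions, not the ISIP interface.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified version of the interface contract.
class AlgorithmBase {
 public:
  virtual ~AlgorithmBase() = default;

  // processing: init() has a default, apply() must be implemented
  virtual bool init() { return true; }
  virtual bool apply(std::vector<float>& output,
                     const std::vector<float>& input) = 0;

  // configuration: every algorithm names itself; padding defaults to zero
  virtual const std::string& className() const = 0;
  virtual long getLeadingPad() const { return 0; }
  virtual long getTrailingPad() const { return 0; }
};

// A new algorithm overrides only apply() and className(); it inherits
// the default init() and padding methods unchanged.
class Square : public AlgorithmBase {
 public:
  bool apply(std::vector<float>& output,
             const std::vector<float>& input) override {
    output.resize(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
      output[i] = input[i] * input[i];  // element-wise square
    }
    return true;
  }
  const std::string& className() const override {
    static const std::string name = "Square";
    return name;
  }
};
```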

11 ADDING NEW ALGORITHMS
    boolean Energy::init() {
      // initialization details are not shown on the slide
      return true;
    }
    const String& Energy::className() const {
      return CLASS_NAME;
    }
    long Energy::getLeadingPad() const {
      return 0;
    }
    long Energy::getTrailingPad() const {
      return 0;
    }
    boolean Energy::apply(Vector& output, const Vector& input) {
      // determine what channel to operate on
      …
      if (algorithm_d == SUM) {
        computeSum(output(0).makeVectorFloat(), input(0).getVectorFloat());
      }
      …
    }

12 ADDING NEW ALGORITHMS
    boolean Energy::computeSum(VectorFloat& output_a, const VectorFloat& input_a) {
      // compute the sum of squares
      float e = input_a.sumSquare();
      // scale the energy according to the specified implementation
      float scaled_energy = scale(e, input_a.length());
      // the output vector contains only the energy, so its length is 1
      output_a.setLength(1);
      // assign the energy, bounded below by floor_d, to the output
      output_a(0) = Integral::max(floor_d, scaled_energy);
      // exit gracefully
      return true;
    }
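The `Integral::max(floor_d, scaled_energy)` step matters for logarithmic scaling. For an all-zero (silent) frame the sum of squares is 0, and log10(0) is negative infinity, so clamping to a floor keeps the feature finite. The sketch below is an assumption-labeled free-function version with dB scaling only; the name `computeSumDb` and the `floorDb` parameter, which stands in for the slide's `floor_d` member, are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical free-function version of Energy::computeSum with dB scaling.
// floorDb plays the role of the slide's floor_d member.
bool computeSumDb(std::vector<float>& output, const std::vector<float>& input,
                  float floorDb) {
  double e = 0.0;
  for (float x : input) e += static_cast<double>(x) * x;  // sum of squares
  // log10(0) returns -infinity; std::max replaces it (and any very small
  // energy) with the floor value.
  const float scaled = static_cast<float>(10.0 * std::log10(e));
  output.assign(1, std::max(floorDb, scaled));  // output holds one value
  return true;
}
```

With a floor of -100 dB, an all-zero frame yields exactly -100 instead of negative infinity, which would otherwise propagate into later stages such as mean subtraction.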

13 DEFINITIONS Algorithm: input and output are arrays of floating-point numbers; each algorithm corresponds to a basic DSP operation. Recipe: a collection of algorithms that are run serially, so that the output of $A_{n-1}$ is the input to $A_n$; inputs and outputs are named; recipes allow processing blocks to be reused between systems.
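A recipe in its simplest, linear form is function composition: the output of each algorithm becomes the input of the next. The sketch below models that serial case only; the `Recipe` class, its `add`/`run` methods, and the use of `std::function` for algorithms are hypothetical. Steps carry names here, but a real recipe also names its inputs and outputs so that it can describe branching block diagrams, which this sketch omits.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: an algorithm maps one float array to another.
using Signal = std::vector<float>;
using AlgorithmFn = std::function<Signal(const Signal&)>;

// A linear recipe: algorithms run serially, and the output of A_{n-1}
// becomes the input of A_n.
class Recipe {
 public:
  Recipe& add(std::string name, AlgorithmFn fn) {
    steps_.emplace_back(std::move(name), std::move(fn));
    return *this;  // allows chaining add() calls
  }

  Signal run(Signal data) const {
    for (const auto& step : steps_) data = step.second(data);
    return data;
  }

  std::vector<std::string> names() const {
    std::vector<std::string> out;
    for (const auto& step : steps_) out.push_back(step.first);
    return out;
  }

 private:
  std::vector<std::pair<std::string, AlgorithmFn>> steps_;
};
```

For example, a recipe with a "square" step followed by a "sum" step maps `{1, 2}` to `{5}`, the same sum of squares computed by the energy algorithm.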

14 HIERARCHY OF ALGORITHM CLASSES

15 FRONT-END CONFIGURATION TOOL Design a front-end by creating a block diagram; this allows rapid prototyping of ideas. New modules can easily be added to the system. The resulting parameter file is then the input to a full speech recognition system.

16-34 FRONT-END CONFIGURATION TOOL (slides 16 through 34 share this title; their content is not reproduced in this transcript)

35 RESPONSIBILITIES OF THE UTILITY Parses the file containing the recipe created in the configuration tool; Synchronizes different paths along the block flow diagram contained in the recipe; Prepares input and output data buffers for each algorithm; Schedules the sequence of required signal processing operations; Processes data through the recipe; Manages large collections of data files.
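Scheduling a block diagram means finding an order in which every block runs only after all blocks feeding it have produced their output. A standard way to do this is a topological sort. The sketch below uses Kahn's algorithm; the slides do not say how the utility actually schedules, and the function name `scheduleBlocks`, the edge representation, and the block names in the usage note are hypothetical. It assumes every edge endpoint also appears in the block list.

```cpp
#include <map>
#include <queue>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical scheduler: an edge (from, to) means "the output of `from`
// feeds `to`". Returns an execution order (Kahn's algorithm) and throws if
// the diagram contains a cycle. Assumes every edge endpoint is in `blocks`.
std::vector<std::string> scheduleBlocks(
    const std::vector<std::string>& blocks,
    const std::vector<std::pair<std::string, std::string>>& edges) {
  std::map<std::string, int> inDegree;  // number of unfinished inputs
  std::map<std::string, std::vector<std::string>> successors;
  for (const auto& b : blocks) inDegree[b] = 0;
  for (const auto& e : edges) {
    successors[e.first].push_back(e.second);
    ++inDegree[e.second];
  }

  std::queue<std::string> ready;  // blocks whose inputs are all available
  for (const auto& b : blocks) {
    if (inDegree[b] == 0) ready.push(b);
  }

  std::vector<std::string> order;
  while (!ready.empty()) {
    const std::string b = ready.front();
    ready.pop();
    order.push_back(b);
    for (const auto& next : successors[b]) {
      if (--inDegree[next] == 0) ready.push(next);
    }
  }
  if (order.size() != inDegree.size()) {
    throw std::runtime_error("block diagram contains a cycle");
  }
  return order;
}
```

For a diagram where `input` feeds both `window` and `energy`, `window` feeds `fft`, and both `fft` and `energy` feed `concat`, the result is `input, window, energy, fft, concat`: the two parallel paths are both complete before `concat` runs, which is one way to read "synchronizes different paths".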

36 VERIFICATION STRATEGY Correctness: the implementation of each algorithm was verified manually or with other tools such as MATLAB. Usability: we assessed and improved the usability of our tools through extensive user testing conducted over several training sessions. Speech recognition experiments: the correctness of the tools was also verified through speech recognition experiments.
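Checking an implementation against reference values, such as numbers computed in MATLAB, usually comes down to comparing within a tolerance, since two correct implementations rarely agree bit for bit. The sketch below shows that check for frame energy in dB. The function names are hypothetical, and the reference values are hard-coded from the closed forms 10·log10(5), 10·log10(4), and 10·log10(25), rounded to 11 decimal places; in the workflow described above they would come from the external tool instead.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Energy of one frame in dB: 10 * log10(sum of squares), in double precision.
double energyDb(const std::vector<float>& frame) {
  double e = 0.0;
  for (float x : frame) e += static_cast<double>(x) * x;
  return 10.0 * std::log10(e);
}

// True if every value matches its reference within an absolute tolerance.
bool matchesReference(const std::vector<double>& actual,
                      const std::vector<double>& reference, double tol) {
  if (actual.size() != reference.size()) return false;
  for (std::size_t i = 0; i < actual.size(); ++i) {
    if (std::fabs(actual[i] - reference[i]) > tol) return false;
  }
  return true;
}

// Hypothetical regression check with hard-coded reference values.
bool verifyEnergyDb() {
  const std::vector<std::vector<float>> inputs = {{0, 1, 2}, {1, 1, 1, 1}, {3, 4}};
  const std::vector<double> reference = {6.98970004336, 6.02059991328,
                                         13.97940008672};
  std::vector<double> actual;
  for (const auto& frame : inputs) actual.push_back(energyDb(frame));
  return matchesReference(actual, reference, 1e-9);
}
```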

37 STATE-OF-THE-ART FEATURES Mel-frequency cepstral coefficients (MFCCs); cepstral mean subtraction; energy normalization; first- and second-order differential (delta) features. These features are used by most commercial speech recognition systems.
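Two of these post-processing steps are short enough to sketch. Cepstral mean subtraction removes each coefficient's mean over the utterance, and first-order differential (delta) features are commonly computed with the regression formula used, for example, in HTK: $d_t = \sum_{k=1}^{K} k\,(c_{t+k} - c_{t-k}) / (2\sum_{k=1}^{K} k^2)$. The code below implements those two textbook definitions; it is not taken from the ISIP front-end. The window size `K = 2` and replicating the first and last frames at the utterance edges are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// frames[t][i] is coefficient i of frame t.
using Frames = std::vector<std::vector<double>>;

// Cepstral mean subtraction: subtract each coefficient's utterance mean.
Frames cepstralMeanSubtraction(Frames c) {
  if (c.empty()) return c;
  const std::size_t dim = c[0].size();
  std::vector<double> mean(dim, 0.0);
  for (const auto& frame : c)
    for (std::size_t i = 0; i < dim; ++i) mean[i] += frame[i];
  for (double& m : mean) m /= static_cast<double>(c.size());
  for (auto& frame : c)
    for (std::size_t i = 0; i < dim; ++i) frame[i] -= mean[i];
  return c;
}

// Regression deltas over +/-K frames (K >= 1); edge frames are replicated.
Frames deltas(const Frames& c, int K = 2) {
  const std::size_t T = c.size();
  const std::size_t dim = T == 0 ? 0 : c[0].size();
  Frames d(T, std::vector<double>(dim, 0.0));
  double denom = 0.0;
  for (int k = 1; k <= K; ++k) denom += 2.0 * k * k;  // 2 * sum of k^2
  for (std::size_t t = 0; t < T; ++t) {
    for (int k = 1; k <= K; ++k) {
      const std::size_t uk = static_cast<std::size_t>(k);
      const std::size_t ahead = std::min(t + uk, T - 1);  // clamp at the end
      const std::size_t behind = t >= uk ? t - uk : 0;    // clamp at the start
      for (std::size_t i = 0; i < dim; ++i)
        d[t][i] += k * (c[ahead][i] - c[behind][i]) / denom;
    }
  }
  return d;
}
```

Second-order features follow by applying the same function to the deltas, `deltas(deltas(c))`. For a linear ramp `1, 2, 3, 4, 5`, interior deltas equal the slope 1, while the replicated edges give 0.5 at the first and last frames.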

38 EXPERIMENTAL RESULTS

39 CONCLUSION The front-end performs signal processing for speech recognition systems; the ISIP front-end is built on an extensible library of basic DSP building blocks; a block diagram interface is used to configure the front-end data flow; the tool’s usability was improved through multiple training sessions with new users; the system’s correctness was verified through speech recognition experiments.

