CMSC Assignment 1 Audio signal processing

CMSC5707 Topics in A.I. Assignment 1: Audio signal processing (Assignment 1 of CMSC5707 V4c)

Task 1 (5%) Recording of the templates: Use your own sound recording device (e.g. mobile phone, windows-sound-recorder or http://www.goldwave.com/) to record the numbers 1, 2, 3, 4 and name these files s1A.wav, s2A.wav, s3A.wav and s4A.wav, respectively. Each word should last about 0.6 to 0.8 seconds. Use http://format-factory.en.softonic.com/ to convert your files to .wav if necessary. (You may pronounce these words in English, Cantonese or Mandarin.) These four files are called set A and are used as the templates of our speech recognition system. You may use any sampling rate (Fs) and bits per sample (bps) value; typical values are Fs=22050 Hz (or lower) and bps=16 bits per sample.

Task 2 (5%) Recording of the testing data: Repeat the above recording procedure for the same four numbers, 1, 2, 3 and 4, and save the four files as s1B.wav, s2B.wav, s3B.wav and s4B.wav, respectively. These four files are called set B and are used as the testing data of our speech recognition system.

Task 3 (5%) Plotting: Pick one wav file out of your sound files (call it x.wav), read the file and plot the time domain signal. (Hint: you may use “wavread”, or “audioread” in recent MATLAB/OCTAVE releases, and “plot”. Type “>help wavread” or “>help plot” in MATLAB to learn how to use them.) Save the plot of x.wav in a picture file “x.jpg”.
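The reading and plotting step can be sketched as follows. This is only an illustration, assuming a MATLAB/Octave release that provides `audioread` and `audiowrite`. A short stand-in tone is written to a temporary file first so that the script runs without a real recording; the file names and the variable names (`wav_name`, `t_ms`) are hypothetical.

```matlab
% Sketch for Task 3: read a wav file and plot its time-domain signal.
% Assumption: a stand-in tone is written first so the script runs on its own;
% replace wav_name with your own recording (e.g. 's1A.wav').
fs_demo  = 22050;                                % sampling rate of the stand-in file
t_demo   = (0:round(0.7 * fs_demo) - 1)' / fs_demo;
wav_name = [tempname() '.wav'];                  % hypothetical temporary file name
audiowrite(wav_name, 0.5 * sin(2*pi*220*t_demo), fs_demo);

[x, fs] = audioread(wav_name);                   % x: samples in [-1, 1], fs: rate in Hz
x    = x(:, 1);                                  % keep the first channel if stereo
t_ms = (0:numel(x) - 1)' * 1000 / fs;            % time axis in milliseconds

fig = figure('Visible', 'off');                  % no window needed just to save a file
plot(t_ms, x);
xlabel('Time (ms)');
ylabel('Amplitude');
title('Time-domain signal');
print(fig, fullfile(tempdir(), 'x.jpg'), '-djpeg');  % picture file for submission
close(fig);
delete(wav_name);                                % remove the stand-in recording
```

For a real recording, drop the `audiowrite` and `delete` lines and save the picture next to your programs instead of in the temporary folder.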

Task 4 (35%) Signal analysis:
- From “x.wav”, write a program to find the start (T1) and stop (T2) locations in time (ms) of your four recorded sounds automatically.
- Extract one segment called Seg1 (20 ms, at a location of your choice) from the voiced vowel part of x.wav between T1 and T2. Seg1 can be saved as an array in C++ or a vector in MATLAB/OCTAVE. You may choose the segment by manual inspection and hardcode the locations in your program.
- Find and plot the Fourier transform (energy against frequency) of Seg1. The energy is equal to sqrt([real]^2+[imaginary]^2), i.e. the magnitude of each Fourier coefficient. The horizontal axis is frequency and the vertical axis is energy. Label the axes of the plot. Save the plot as “fourier_x.jpg”.
- Find the pre-emphasis signal (Pem_Seg1) of Seg1 if the pre-emphasis constant α is 0.945. Plot Seg1 and Pem_Seg1. Submit your program.
- Find the 10 LPC parameters if the order of LPC for Pem_Seg1 is 10. You should write your own autocorrelation code, but you may use the inverse function (inv) in MATLAB/OCTAVE to solve the linear matrix equation.
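The spectrum, pre-emphasis and LPC steps of Task 4 might look like the sketch below. It is not a reference solution: Seg1 here is a synthetic stand-in (two sinusoids plus a little deterministic noise from a small generator), the first pre-emphasis sample assumes a zero sample before the segment, and the LPC parameters are obtained with the autocorrelation method, i.e. by solving R a = r with R(i,j) = r(|i-j|). All variable names other than Seg1 and Pem_Seg1 are illustrative.

```matlab
% Sketch for Task 4 on a synthetic 20 ms segment (stand-in for a real Seg1).
fs = 22050;                              % assumed sampling rate
N  = round(0.020 * fs);                  % 20 ms -> 441 samples
n  = (0:N-1)';
noise = zeros(N, 1);  state = 1;         % small deterministic noise generator
for i = 1:N
    state    = mod(16807 * state, 2147483647);
    noise(i) = state / 2147483647 - 0.5;
end
Seg1 = sin(2*pi*200*n/fs) + 0.5*sin(2*pi*1200*n/fs) + 0.1*noise;

% "Energy" as defined in the task: sqrt(real^2 + imag^2) of each coefficient.
X      = fft(Seg1);
half   = floor(N/2) + 1;                 % keep non-negative frequencies only
energy = sqrt(real(X(1:half)).^2 + imag(X(1:half)).^2);
freq   = (0:half-1)' * fs / N;           % frequency axis in Hz

% Pre-emphasis: Pem_Seg1(n) = Seg1(n) - alpha * Seg1(n-1), with Seg1(0) = 0.
alpha    = 0.945;
Pem_Seg1 = Seg1 - alpha * [0; Seg1(1:end-1)];

% Hand-written autocorrelation r(k) = sum_n s(n) * s(n-k), k = 0..p.
p = 10;
r = zeros(p + 1, 1);
for k = 0:p
    r(k + 1) = sum(Pem_Seg1(k+1:N) .* Pem_Seg1(1:N-k));
end

% Normal equations R * a = [r(1) ... r(p)]' with R(i,j) = r(|i-j|).
R = zeros(p, p);
for i = 1:p
    for j = 1:p
        R(i, j) = r(abs(i - j) + 1);
    end
end
a = inv(R) * r(2:p+1);                   % the 10 LPC parameters
```

Plotting `energy` against `freq` with labelled axes gives the picture requested as “fourier_x.jpg”; plotting Seg1 and Pem_Seg1 against `n` covers the pre-emphasis plot.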

Task 5 (50%) Build a speech recognition system: You may use any MATLAB/OCTAVE functions you like in this part. Use the tool at http://www.mathworks.com/matlabcentral/fileexchange/32849-htk-mfcc-matlab to extract the MFCC parameters (Mel-frequency cepstrum, http://en.wikipedia.org/wiki/Mel-frequency_cepstrum) from your sound files. Each sound file (.wav) gives one set of MFCC parameters. See “A tutorial of using the htk-mfcc tool” in the appendix for how to extract the MFCC parameters. Build a dynamic programming (DP) based four-numeral speech recognition system. Use set A as templates and set B as testing inputs. You may follow the steps below to complete your assignment.

MFCC parameter extraction (from http://en. wikipedia). MFCCs are very popular in music and speech analysis. They are derived as follows:
1. Take the Fourier transform of (a windowed excerpt of) a signal.
2. Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
3. Take the logs of the powers at each of the mel frequencies.
4. Take the discrete cosine transform of the list of mel log powers, as if it were a signal.
5. The MFCCs are the amplitudes of the resulting spectrum.
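The mel-scale mapping in the second step can be illustrated with a small sketch. The warping below is the natural-log form of the widely used 2595*log10(1 + f/700) formula (1127 is approximately 2595/ln 10); the sampling rate, the number of filters M and the variable names are assumptions made for this example only.

```matlab
% Hertz <-> mel warping functions (a common convention, assumed here).
hz2mel = @(hz)  1127 * log(1 + hz / 700);
mel2hz = @(mel) 700 * (exp(mel / 1127) - 1);

fs = 22050;                              % assumed sampling rate
M  = 20;                                 % assumed number of triangular filters

% M + 2 points uniformly spaced on the mel scale between 0 Hz and fs/2.
% Filter m starts at point m, peaks at point m+1 and ends at point m+2,
% so neighbouring triangles overlap.
mel_points = linspace(hz2mel(0), hz2mel(fs / 2), M + 2);
hz_points  = mel2hz(mel_points);         % the same points expressed in Hz
```

Because the points are uniform in mel but not in Hz, the filters are narrow at low frequencies and become wider as frequency increases.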

MFCC (inside MFCC.m): pre-emphasize the whole signal, then:

    % Framing and windowing (frames as columns)
    frames = vec2frames( speech, Nw, Ns, 'cols', window, false );

    % Magnitude spectrum computation (as column vectors)
    MAG = abs( fft(frames,nfft,1) );

    % Triangular filterbank with uniformly spaced filters on mel scale
    H = trifbank( M, K, R, fs, hz2mel, mel2hz ); % size of H is M x K

    % Filterbank application to unique part of the magnitude spectrum
    FBE = H * MAG(1:K,:); % FBE( FBE<1.0 ) = 1.0; % apply mel floor

    % DCT matrix computation
    DCT = dctm( N, M );

    % Conversion of logFBEs to cepstral coefficients through DCT
    CC = DCT * log( FBE );

    % Cepstral lifter computation
    lifter = ceplifter( N, L );

    % Cepstral liftering gives liftered cepstral coefficients
    CC = diag( lifter ) * CC; % ~ HTK's MFCCs

Step (a) of task 5 Convert the sound files in set A and set B into MFCC parameters, so that each sound file gives an MFCC matrix of size 13x70 (no_of_MFCC_parameters=13 and no_of_frame_segments=70). This is because, if the time shift is 10 ms, a 0.7-second sound has 70 frame segments, and there are 13 MFCC parameters for each frame. Here M(j,t) represents the MFCC parameters, where j is the index of the MFCC parameter, ranging from 1 to 13, and t is the index of the time segment, ranging from 1 to 70. Therefore the (13-parameter) sound segment at time index t is M(1:13,t).
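The frame-count arithmetic can be checked with a few lines. The 25 ms frame length is an assumption for this sketch; the exact count a tool returns depends on how it treats the last, partially filled frame, so a real recording may give slightly fewer or more than 70 columns.

```matlab
% Frame-count arithmetic for a 0.7 s sound with a 10 ms shift.
dur_ms = 700;                            % duration of the sound in ms
Tw     = 25;                             % assumed frame length in ms
Ts     = 10;                             % frame shift in ms

approx_frames = dur_ms / Ts;                      % 70, the figure used in the text
full_frames   = floor((dur_ms - Tw) / Ts) + 1;    % frames that fit entirely: 68

% Indexing convention M(j, t): row j = MFCC parameter, column t = time segment.
M_demo  = zeros(13, approx_frames);      % placeholder matrix with the stated size
segment = M_demo(1:13, 28);              % the 13-parameter segment at t = 28
```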

Step (b) of task 5 Assume we have two short time segments (e.g. 25 ms each), one from the t-th (t=28) segment of sound X, represented by the 13 MFCC parameters Mx(1:13,t=28), and another from the t'-th (t'=32) segment of sound Y, represented by the MFCC parameters My(1:13,t'=32). The distortion (dist) between these two segments is

$$\mathrm{dist} = \sqrt{\sum_{j=2}^{13} \bigl(M_x(j,t) - M_y(j,t')\bigr)^2}$$

Note: The first row of the MFCC matrix, M(1,t), is the energy term. It is not recommended for use in the comparison because it does not contain the relevant spectral information, so the summation starts from j=2. Use dynamic programming to find the minimum accumulated distance (minimum accumulated score) between sound X and sound Y.
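A minimal dynamic programming sketch of the minimum accumulated distance is given below. It assumes a Euclidean frame distortion over coefficients 2 to 13, a path that starts at the first frame pair and ends at the last one, and the three predecessor moves (t-1,u), (t,u-1) and (t-1,u-1); the course notes may prescribe different path constraints. The toy matrices stand in for real MFCC matrices, and all names are illustrative.

```matlab
% Toy "MFCC" matrices (13 x frames) in place of real htk-mfcc output.
Mx = (1:13)' * [1 2 4 3 1];                       % template, 5 frames
candidates = { Mx(:, [1 1 2 3 3 4 5]), ...        % same sound, stretched in time
               (1:13)' * [3 1 1 2 4] };           % a different sound
score = zeros(1, numel(candidates));

for c = 1:numel(candidates)
    My = candidates{c};
    Tx = size(Mx, 2);
    Ty = size(My, 2);
    A  = inf(Tx, Ty);                             % accumulated distance matrix
    for t = 1:Tx
        for u = 1:Ty
            % Frame distortion, skipping the energy term in row 1.
            diff_vec = Mx(2:13, t) - My(2:13, u);
            d = sqrt(sum(diff_vec .^ 2));
            if t == 1 && u == 1
                best_prev = 0;                    % the path starts here
            else
                best_prev = inf;
                if t > 1,          best_prev = min(best_prev, A(t-1, u));   end
                if u > 1,          best_prev = min(best_prev, A(t, u-1));   end
                if t > 1 && u > 1, best_prev = min(best_prev, A(t-1, u-1)); end
            end
            A(t, u) = d + best_prev;
        end
    end
    score(c) = A(Tx, Ty);                         % minimum accumulated distance
end
```

The time-stretched copy aligns perfectly with the template, so its score is zero, while the different sound gets a positive score.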

Step (c) of task 5 Build a speech recognition system: You should show a 4x4 comparison-matrix-table as the result. Each entry of this table is the minimum accumulated distance between a sound in set A and a sound in set B. You may use the above steps to find the minimum accumulated distance for each sound pair (there are 4x4 pairs, because there are four sound files in set A and four sound files in set B) and fill in the comparison-matrix-table manually or by a program.
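Once the table is filled in, recognition amounts to picking, for each test sound, the template with the smallest accumulated distance. The sketch below uses made-up numbers purely to show the bookkeeping; they are not results from any recording, and the layout (rows for set A, columns for set B) is an assumption.

```matlab
% Hypothetical 4x4 comparison table (illustration values only).
% Rows: templates s1A..s4A.  Columns: test sounds s1B..s4B.
T = [ 12.1  40.3  38.7  45.0;
      39.5  10.8  36.2  41.9;
      37.4  35.1  11.6  33.3;
      44.2  42.7  34.8  13.0 ];

% For each test sound (column), find the template (row) with the lowest score.
[best_score, recognised] = min(T, [], 1);

n_correct = sum(recognised == 1:4);      % test k is correct if template k wins
accuracy  = n_correct / 4;

for k = 1:4
    fprintf('s%dB.wav recognised as "%d" (score %.1f)\n', ...
            k, recognised(k), best_score(k));
end
```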

Step (d) of task 5 Pick any one sound file from set A (e.g. the sound of ‘one’) and the corresponding sound file from set B (e.g. the sound of ‘one’), compare these two files using dynamic programming, and plot the optimal path on the accumulated distance matrix diagram.
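The optimal path can be recovered by backtracking through the accumulated matrix, as in the sketch below. It makes the same assumptions as a basic dynamic programming comparison: Euclidean frame distortion over coefficients 2 to 13, a path from the first frame pair to the last, and moves from (t-1,u), (t,u-1) or (t-1,u-1). The toy matrices and the name `opt_path` are illustrative.

```matlab
% Toy "MFCC" matrices: My is a time-stretched copy of Mx.
Mx = (1:13)' * [1 2 4 3 1];
My = Mx(:, [1 1 2 3 3 4 5]);
Tx = size(Mx, 2);
Ty = size(My, 2);

% Accumulated distance matrix.
A = zeros(Tx, Ty);
for t = 1:Tx
    for u = 1:Ty
        d = sqrt(sum((Mx(2:13, t) - My(2:13, u)) .^ 2));
        prev = [];
        if t > 1,          prev(end+1) = A(t-1, u);   end
        if u > 1,          prev(end+1) = A(t, u-1);   end
        if t > 1 && u > 1, prev(end+1) = A(t-1, u-1); end
        if isempty(prev),  prev = 0;                  end   % start cell (1,1)
        A(t, u) = d + min(prev);
    end
end

% Backtrack from the end cell to (1,1), always moving to the cheapest
% allowed predecessor (the diagonal is listed first, so it wins ties).
opt_path = [Tx, Ty];
t = Tx;
u = Ty;
while t > 1 || u > 1
    cand = [t-1, u-1; t-1, u; t, u-1];
    cand = cand(cand(:, 1) >= 1 & cand(:, 2) >= 1, :);
    vals = A(sub2ind(size(A), cand(:, 1), cand(:, 2)));
    [~, k] = min(vals);
    t = cand(k, 1);
    u = cand(k, 2);
    opt_path = [t, u; opt_path];         % prepend so the path runs start to end
end
```

To draw the diagram, one option is `imagesc(A); hold on; plot(opt_path(:,2), opt_path(:,1), 'w-o');` with the axes labelled by the frame indices of the two sounds.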

What to submit:
- All your programs, with a readme file showing how to run them
- All sound files of your recordings
- The picture files
- The 4x4 comparison-matrix-table of the speech recognition system
Zip everything (as student_number.zip) and submit it to cmsc5707.14@gmail.com
