Voiceprint System Development: Design, implement, and test a unique voiceprint biometric system. Research Day Presentation, May 3rd, 2013. Rahul Raj (Team Lead)

Presentation transcript:

Voiceprint System Development: Design, implement, and test a unique voiceprint biometric system. Research Day Presentation, May 3rd, 2013. Rahul Raj (Team Lead), Geeta Bothe, Mahesh Sooryambylu, Ravi Ray, Sreeram Vancheeswaran (IBM India). Customer: Jonathan Leet (DPS 2013). Instructor: Dr. Charles Tappert

Common Passphrase

Background: four possible types of passphrases:
1. User-specified phrase, such as the user's name
2. Specified phrase common to all users, e.g. "My name is" from the phrase "My name is <user's name>"
3. Random phrase displayed on the computer screen
4. Random phrase that can vary at the user's discretion

Advantages of a common passphrase:
- Simplifies the segmentation problem
- Allows careful selection of the common phrase to optimize the variety of phonetic units for their authentication value
- Facilitates testing for impostors
- Permits measurement of true voice-authentication biometric performance
- Avoids potential experimental flaws

Software Used: Audacity & Matlab

Audacity
- Open-source audio editing software that supports sound recording and editing
- Supports resampling and stereo-to-mono conversion
- Available on all platforms: Windows, Linux, Mac

Matlab
- The Signal Processing Toolbox provides industry-standard algorithms and apps for analog and digital signal processing
- Supports visualizing signals in the time and frequency domains, FFT computation for spectral analysis, resampling, and other signal-processing techniques

System Architecture

1. Collection and management of speech samples in a repository
2. Preprocessing and spectrogram generation
3. Mel filter banks and MFCC calculation
4. Automatic segmentation of the "My name is" portion
5. Automatic segmentation of phonemes using DTW
6. Feature vector extraction
7. Pace's Biometric Authentication System obtains performance results from the feature vectors

Voice Sample to Spectrogram using Matlab

- Input speech sample (mono, 44100 samples/sec)
- The voice stream is collected into frames of 1024 samples; samples are read by sliding through the stream 512 samples at a time, maintaining a 50% overlap between frames
- One frame is ~23 ms, since the frame rate = 44100/1024 ≈ 43 frames/sec and the length of one frame = 1000 ms / frame rate

(Figure: samples of one frame.)
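The framing arithmetic above can be sketched as follows (a minimal pure-Python illustration, not the project's MATLAB code; the 44100 Hz rate, 1024-sample frame, and 512-sample hop are taken from the slide):

```python
# Frame bookkeeping for a 44100 Hz mono stream: 1024-sample frames,
# 512-sample hop (50% overlap), as described on the slide.
SAMPLE_RATE = 44100
FRAME_SIZE = 1024
HOP_SIZE = 512

frame_duration_ms = 1000.0 * FRAME_SIZE / SAMPLE_RATE   # ~23.2 ms per frame
frame_rate = SAMPLE_RATE / FRAME_SIZE                   # ~43 frames per second

def frame_signal(samples, frame_size=FRAME_SIZE, hop=HOP_SIZE):
    """Split a list of samples into overlapping frames."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frames.append(samples[start:start + frame_size])
    return frames

# One second of audio yields ~85 overlapping frames.
frames = frame_signal([0.0] * SAMPLE_RATE)
```

Each frame would then be windowed and passed to the FFT, as the next slide describes.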

Voice Sample to Spectrogram using Matlab (continued)

- Applying the FFT to each frame yields its component frequencies, giving frequency-vs-time data
- Together, the frames represent the complete spectral data available for processing
- The spectrogram is constructed from these values

(Slide footer: Voiceprint Systems, CS Spring batch)

Mel-Frequency Bands Space Filters Appropriately

- The mel scale corresponds to the frequency transform performed by the cochlea of the human ear
- The 13 lower mel filter bands are used for processing

(Figure: triangular mel filter bank.)
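The mel warping itself is a simple formula; here is a pure-Python sketch of the Hz-to-mel mapping and the band edges of a 13-band filter bank (illustrative only; the constants 2595 and 700 are the common HTK-style values, which the slide does not specify):

```python
import math

def hz_to_mel(f_hz):
    """Common mel-scale formula: warps linear frequency (Hz) to mels."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(low_hz, high_hz, n_bands):
    """Equally spaced band edges on the mel scale, mapped back to Hz.
    Triangular filters would be centered on the interior edges."""
    lo, hi = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]

# 13 bands up to the Nyquist frequency of 44.1 kHz audio.
edges = mel_band_edges(0.0, 22050.0, 13)
```

Because the mel scale is logarithmic above ~1 kHz, the resulting bands are narrow at low frequencies and progressively wider at high frequencies, mimicking cochlear resolution.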

Segmenting "My Name Is"

- The speech waveform indicates the voiced and unvoiced segments
- Energy and zero-crossing rate are plotted for the same speech sample
- Unvoiced segments show a high zero-crossing rate (red) and low energy (green)
- Voiced segments show a low zero-crossing rate and high energy
- The higher-frequency components of the 'z' sound have higher energy than the other phonemes
- The diagram shows the automatically marked spectrum in Matlab; vertical lines demarcate the beginning of speech and the end of the 'z' sound
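The two cues above, short-time energy and zero-crossing rate, can be computed per frame as in this pure-Python sketch (illustrative, not the project's MATLAB implementation):

```python
import math

def short_time_energy(frame):
    """Energy of one frame: sum of squared samples."""
    return sum(x * x for x in frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

# A slow sinusoid behaves like a voiced frame (few crossings);
# a sample-by-sample alternating signal behaves like a fricative (many crossings).
voiced = [math.sin(2 * math.pi * 5 * i / 100) for i in range(100)]
unvoiced = [(-1) ** i for i in range(100)]
```

Thresholding these two measures per frame is a classic way to mark the voiced/unvoiced boundaries shown on the slide.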

Seven sound units of "My name is"

Dynamic Time Warping (DTW) Algorithm Segments a Sample into Seven Sounds

- DTW operates on spectrographic data: amplitude x frequency x time
- To segment a speech sample into the seven sound units, the sample's time sequence is "warped" non-linearly against a manually segmented sample
- The sample warp path runs through the cost matrix of the two time series represented along the axes
- If the warp path passes through D(i, j), then sample point X_i is warped to point Y_j
- A vertical section in the warp path means a single point of series X is warped to multiple points of series Y
- The next point of the warp path follows the standard DTW recurrence: D(i, j) = dist(X_i, Y_j) + min(D(i-1, j), D(i, j-1), D(i-1, j-1))
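That recurrence can be sketched in a few lines of pure Python (illustrative only; the project applied DTW to spectrogram frames in MATLAB, while `dist` here is a simple absolute difference on scalar series):

```python
def dtw_cost(x, y, dist=lambda a, b: abs(a - b)):
    """Fill the DTW cost matrix D using
    D(i,j) = dist(x[i], y[j]) + min(D(i-1,j), D(i,j-1), D(i-1,j-1))
    and return the total alignment cost D(n, m)."""
    n, m = len(x), len(y)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(x[i - 1], y[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # vertical step
                                 D[i][j - 1],      # horizontal step
                                 D[i - 1][j - 1])  # diagonal step
    return D[n][m]

# A series aligned against a stretched copy of itself incurs zero cost:
# the vertical/horizontal steps absorb the repeated points.
a = [0, 1, 2, 3, 2, 1, 0]
b = [0, 0, 1, 2, 2, 3, 3, 2, 1, 1, 0]
```

Backtracking through the filled matrix from D(n, m) recovers the warp path, which is what maps the manually placed sound boundaries onto the new sample.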

Feature Extraction: Feature Measurements Reduce Data & Characterize the Speaker

The features extracted (13 is the number of frequency bands):
- Energy mean and variance in each frequency band over the entire utterance (13 x 2 = 26 features)
- Energy mean in each frequency band within each of the 7 phonetic sounds (13 x 7 = 91 features)
- Voice fundamental frequency (F0) (not completed)
- Voice formant frequencies (F1-F3) (not completed)

The feature extractor output is a fixed-length vector suitable as input to the Pace University Biometric Authentication System.
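Assembling the per-band energy statistics into that fixed-length vector could look like the following hypothetical sketch (the per-frame band energies and the seven segment boundaries are assumed to be supplied by the earlier stages):

```python
def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def feature_vector(band_energies, segments):
    """band_energies: one list of 13 band energies per frame.
    segments: seven (start, end) frame ranges, one per sound unit.
    Returns 13*2 + 13*7 = 117 features."""
    n_bands = len(band_energies[0])
    features = []
    # 26 features: per-band mean and variance over the whole utterance.
    for b in range(n_bands):
        series = [frame[b] for frame in band_energies]
        features.append(mean(series))
        features.append(variance(series))
    # 91 features: per-band mean within each of the seven sounds.
    for start, end in segments:
        for b in range(n_bands):
            features.append(mean([frame[b] for frame in band_energies[start:end]]))
    return features
```

Because every utterance yields the same 117-dimensional vector regardless of its duration, the output plugs directly into a generic vector-based classifier.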

System Performance

Feature set                  | Performance
Features from entire phrase  | 98.05%
Features from seven sounds   | 98.95%

Performance was measured on 20 sample utterances from each of 30 speakers, manually segmented into the seven sounds. Receiver Operating Characteristic (ROC) curves were obtained to find the Equal Error Rate (EER) and system performance for the two feature sets.
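The EER is the operating point on the ROC curve where the false-accept and false-reject rates are equal. A simple way to estimate it from score lists is sketched below (illustrative only, not the Pace system's evaluation code; the `genuine` and `impostor` score lists are hypothetical):

```python
def equal_error_rate(genuine, impostor):
    """Scan candidate thresholds over all observed scores and return the
    error rate where the false-reject rate (genuine scores below the
    threshold) is closest to the false-accept rate (impostor scores at
    or above the threshold)."""
    best = None
    for t in sorted(set(genuine + impostor)):
        frr = sum(1 for s in genuine if s < t) / len(genuine)
        far = sum(1 for s in impostor if s >= t) / len(impostor)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2)
    return best[1]

# Perfectly separated score distributions give an EER of zero.
eer = equal_error_rate(genuine=[0.9, 0.8, 0.85, 0.95],
                       impostor=[0.1, 0.2, 0.15, 0.3])
```

On real data the two score distributions overlap, and 1 - EER is one conventional single-number summary of system performance.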