Automatic Speaker Recognition for Series 60 Mobile Devices
University of Joensuu, Dept. of Computer Science, P.O. Box 111, FIN-80101 Joensuu, Tel. +358 13 251 7959, fax +358 13 251 7955, www.cs.joensuu.fi

Presentation transcript:

Automatic Speaker Recognition for Series 60 Mobile Devices
University of Joensuu, Department of Computer Science
Specom’2004, Sep 20, 2004
Juhani Saastamoinen, Evgeny Karpov, Ville Hautamäki, and Pasi Fränti

Background
• Project in National FENIX programme
  – New Methods and Applications in Speech Technology
• 7 research institutes
• Project partners: NRC, Lingsoft, National Bureau of Investigation, etc.
• Joensuu: Speaker Recognition

Research Group
• Pasi Fränti: Professor
• Juhani Saastamoinen: Project manager
• Evgeny Karpov: Project researcher
• Ville Hautamäki: Project researcher
• Tomi Kinnunen: Researcher
• Ismo Kärkkäinen: Clustering algorithms
PUMS project

Application Scenarios
• Speaker Recognition
  – Speaker Identification: Whose voice is this?
  – Speaker Verification: Is this Bob’s voice? (Claim + verification)
[Diagram labels: "?" for identification, "Imposter!" for verification]

Project Goal
• Port speaker recognition to a Series 60 mobile phone

Symbian Phones
• Series 60 phone features:
  – 16 MB ROM
  – 8 MB RAM
  – 176 x 208 display
  – ARM processor
  – No floating-point unit!!!
[Pictured platforms: Series 80, Series 60, UIQ]

Symbian OS
• Defined by the Symbian consortium
• Based on EPOC
• Operating system for mobile phones
  – Real-time system
  – Long uptime required
• Multitasking, multithreading

Problems of Porting
• Usual considerations when porting to a phone
  – GUI event-driven program(ming)
  – Platform-specific programming model
  – Real-time system, exceptions
• Application-specific porting problems
  – Number crunching without a floating-point unit!!!
  – Signal processing is numerically challenging

Identification System
• Signal Processing (Feature Extraction): Speech Audio → Feature Vectors
• Speaker Modelling: Create speaker profile
  – Add speaker profiles to the Speaker Profile Database during training
• Speaker Recognition: Classify input speech based on existing profiles
  – Read and use all profiles during recognition
  – Output: Decision
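
The recognition path of the diagram can be sketched as follows. This is a minimal illustration, assuming that each speaker profile is a vector quantization (VQ) codebook and that the decision picks the profile with the lowest average quantization distortion. The function names and the toy profiles are hypothetical and are not taken from the presented system.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def average_distortion(features, codebook):
    """Mean distance from each feature vector to its nearest code vector."""
    total = sum(min(squared_distance(f, c) for c in codebook) for f in features)
    return total / len(features)


def identify(features, profiles):
    """Return the name of the profile that matches the features best."""
    return min(profiles, key=lambda name: average_distortion(features, profiles[name]))


# Hypothetical two-dimensional profiles; real feature vectors would be MFCCs.
profiles = {
    "alice": [[0.0, 0.0], [1.0, 1.0]],
    "bob": [[10.0, 10.0], [11.0, 11.0]],
}
test_features = [[0.1, 0.2], [0.9, 1.1]]
print(identify(test_features, profiles))
```

Training would add one codebook per speaker to `profiles`; recognition reads all of them, as the slide states.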

MFCC Signal Processing
• Pipeline: Digital speech signal frame → Pre-emphasis → Time windowing → DFT → Abs → Filter bank → Log → DCT → Feature vector
• Parameters: pre-emphasis coefficient 0.97, Hamming window, 30 triangular mel filters, base-2 logarithm, output 12 MFCCs
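
The first two and the last two stages of this pipeline can be sketched in floating point as below. It is an illustrative sketch only: the handling of the first sample, the DCT-II form, and the choice to return coefficients 1 to 12 are assumptions, and the function names are hypothetical. The DFT, magnitude, and mel filter bank stages are left out.

```python
import math


def pre_emphasis(frame, coeff=0.97):
    """y[n] = x[n] - coeff * x[n-1]; the first sample is passed through (assumption)."""
    if not frame:
        return []
    return [frame[0]] + [frame[n] - coeff * frame[n - 1] for n in range(1, len(frame))]


def hamming_window(frame):
    """Multiply the frame by a Hamming window of the same length."""
    size = len(frame)
    if size < 2:
        return list(frame)
    return [
        s * (0.54 - 0.46 * math.cos(2.0 * math.pi * n / (size - 1)))
        for n, s in enumerate(frame)
    ]


def log2_dct(filter_energies, n_coeffs=12):
    """Base-2 log of positive filter bank energies followed by a DCT-II.

    Returns coefficients 1..n_coeffs; skipping coefficient 0 is an assumption.
    """
    logs = [math.log2(e) for e in filter_energies]
    m = len(logs)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / m) for i, v in enumerate(logs))
        for k in range(1, n_coeffs + 1)
    ]
```

With 30 filter outputs, `log2_dct` returns the 12-element feature vector named on the slide.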

Fixed-Point Implementation
• Numerical analysis is needed for a fixed-point arithmetic implementation
• Truncation and re-scaling to avoid overflows in the converted algorithm
• Minimize the information loss caused by computation in fixed-point arithmetic
  – Minimize relative error

FFT, Fixed-Point
• Frequency spectrum of speech
  – Biggest source of numerical error
  – Butterflies have multiplications
  – Layers repeat truncation errors
• Fixed number of bits per element
  – 32, native integer size in many systems
• Reference implementation: FFTGEN

FFTGEN (16/16)
• Multiplication: the result of a 32 x 32-bit multiplication must fit in 32 bits, so the inputs are truncated
• FFTGEN: Truncate inputs to 16/16 bits
[Diagram: FFT layer input (16-bit integer) x FFT twiddle factor (16-bit integer) = 32-bit multiplication result; 16 bits are used as (part of) the FFT layer output and 16 bits are cropped off for the next layer]
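
The truncate-then-multiply idea can be sketched as follows. This is not FFTGEN code: it assumes both operands start as signed 32-bit values whose low-order bits are cropped, and the function names are hypothetical. Python integers do not overflow, so the 32-bit limits are checked explicitly.

```python
def fits_signed(value, bits):
    """True if value is representable as a signed integer of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def truncated_multiply(a32, b32, a_bits=16, b_bits=16):
    """Multiply two signed 32-bit operands after cropping their low-order bits.

    Keeping a_bits + b_bits <= 32 guarantees that the product fits in a signed
    32-bit word. With a_bits + b_bits == 32 the result approximates
    (a32 * b32) >> 32.
    """
    assert fits_signed(a32, 32) and fits_signed(b32, 32)
    assert a_bits + b_bits <= 32
    a = a32 >> (32 - a_bits)  # keep the a_bits most significant bits
    b = b32 >> (32 - b_bits)  # keep the b_bits most significant bits
    product = a * b
    assert fits_signed(product, 32)
    return product


x = 0x12345678
w = 0x40000000                     # 0.5 if read as a Q31 fraction (assumed format)
approx = truncated_multiply(x, w)  # 16/16 split
exact = (x * w) >> 32              # needs a 64-bit intermediate on real hardware
print(approx, exact)
```

The gap between `approx` and `exact` is the information lost by cropping the inputs before the multiplication.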

Info Preserving FFT (22/10)
• Approximate the DFT operator F with G
• Accept a larger operator error ||F-G|| in exchange for preserving more signal information
  – Choose the twiddle factor scale to minimize the maximum relative error in the scaled sine values; a scale of 980 is good for FFT sizes up to 1024
  – Truncate multiplication inputs to 22/10 bits (signal/operator)
[Diagram: FFT layer input (32-bit integer, 22 bits used) x FFT twiddle factor (16-bit integer, 10 bits used) = 32-bit multiplication result; 22 bits are used as (part of) the FFT layer output (32-bit integer) and 10 bits are cropped off for the next layer]
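
The effect of shifting bits from the twiddle factor to the signal can be illustrated with a single multiplication. The sketch below assumes signed 32-bit operands whose low-order bits are cropped and a Q31 twiddle value; the sample value and the function names are hypothetical, and the scale of 980 from the slide is not modelled.

```python
def crop_multiply(signal32, twiddle32, signal_bits, twiddle_bits):
    """Crop both signed 32-bit operands, then multiply.

    With signal_bits + twiddle_bits == 32 the product fits in 32 bits and
    approximates (signal32 * twiddle32) >> 32.
    """
    s = signal32 >> (32 - signal_bits)
    t = twiddle32 >> (32 - twiddle_bits)
    return s * t


def relative_error(approx, exact):
    return abs(approx - exact) / abs(exact)


quiet_sample = 74565     # hypothetical low-amplitude value stored in a 32-bit word
twiddle = 0x5A82799A     # about cos(pi/4) as a Q31 fraction (assumed format)
exact = (quiet_sample * twiddle) / 2.0 ** 32

err_16_16 = relative_error(crop_multiply(quiet_sample, twiddle, 16, 16), exact)
err_22_10 = relative_error(crop_multiply(quiet_sample, twiddle, 22, 10), exact)
print(f"16/16: {err_16_16:.4f}  22/10: {err_22_10:.4f}")
```

For this quiet sample the 22/10 split gives the smaller relative error, because the signal keeps six more bits. For a full-scale sample the coarser 10-bit twiddle factor would dominate instead, which is the price expressed by the larger ||F-G||.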

FFT Spectrum, Fixed-Point
[Figure: plots of 16/16 abs values and 22/10 abs values for the original TIMIT signal and for the TIMIT signal x 4]
• x-axis: fixed-point FFT element abs. values
• y-axis: correct FFT element abs. values

Scale of Error in Proposed FFT
[Table: log10 of relative error in FFT elements, average and standard deviation, for 16/16 and 22/10]

Magnitude Spectrum, Fixed-Point
• Compute complex absolute values using the maximum coordinate and the coordinate ratio
• Suppose |x| > |y| for z = x + i y, then $|z| = |x| \sqrt{1 + (y/x)^2}$
• Denote the squared ratio $(y/x)^2$ by t
• Approximate the square root $\sqrt{1 + t}$ by a polynomial P(t)
• Constant time algorithm (vs. Newton)
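
A possible integer-only version of this magnitude computation is sketched below. The slides do not give P(t), so the quadratic used here, which matches sqrt(1 + t) at t = 0, 0.5 and 1, is an illustrative assumption, as are the Q15 format and the function name.

```python
import math

Q = 15                      # fractional bits (assumed Q15 format)
ONE = 1 << Q

# Quadratic P(t) = 1 + b*t + c*t^2 matching sqrt(1 + t) at t = 0, 0.5 and 1.
# These coefficients are an illustrative choice, not the ones from the slides.
_b = 4.0 * (math.sqrt(1.5) - 1.0) - (math.sqrt(2.0) - 1.0)
_c = (math.sqrt(2.0) - 1.0) - _b
B_Q = round(_b * ONE)       # coefficients converted to Q15 once, offline
C_Q = round(_c * ONE)


def magnitude(x, y):
    """Approximate |x + iy| for integers using integer operations only."""
    big, small = max(abs(x), abs(y)), min(abs(x), abs(y))
    if big == 0:
        return 0
    ratio = (small << Q) // big          # small / big in Q15, between 0 and ONE
    t = (ratio * ratio) >> Q             # squared ratio in Q15
    t2 = (t * t) >> Q                    # t squared in Q15
    p = ONE + ((B_Q * t) >> Q) + ((C_Q * t2) >> Q)   # P(t) in Q15
    return (big * p) >> Q


print(magnitude(3000, 4000))
```

Unlike a Newton iteration, the function executes the same fixed sequence of operations for every input.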

Logarithm, Fixed-Point
• Use base 2 instead of base 10
  – Changing the base only multiplies the output by a constant
• Standard technique:
  – Reduce the problem to the interval [1, 2)
  – Use linear interpolation between values stored in a look-up table
  – 8 bits are used for indexing the look-up table values
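
The standard technique listed above can be sketched as follows. The Q16 input and output format, the table layout with one extra entry, and the function name are assumptions made for the illustration; only the 8-bit index and the reduction to [1, 2) come from the slide.

```python
import math

FRAC_BITS = 16              # mantissa and output precision (assumed Q16)
INDEX_BITS = 8              # 8-bit table index, as on the slide
STEP_BITS = FRAC_BITS - INDEX_BITS

# Table of log2(1 + i/256) in Q16 with one extra entry for interpolation.
# Built with floating point here; on a phone it would be a precomputed constant.
LOG2_TABLE = [
    round(math.log2(1.0 + i / (1 << INDEX_BITS)) * (1 << FRAC_BITS))
    for i in range((1 << INDEX_BITS) + 1)
]


def log2_fixed(x):
    """Return log2(x) in Q16 for a positive integer x."""
    if x <= 0:
        raise ValueError("x must be positive")
    exponent = x.bit_length() - 1                # x = 2**exponent * m, m in [1, 2)
    mantissa = (x << FRAC_BITS) >> exponent      # m in Q16
    fraction = mantissa - (1 << FRAC_BITS)       # m - 1 in Q16
    index = fraction >> STEP_BITS                # top 8 bits select the table entry
    remainder = fraction & ((1 << STEP_BITS) - 1)
    low, high = LOG2_TABLE[index], LOG2_TABLE[index + 1]
    interpolated = low + (((high - low) * remainder) >> STEP_BITS)
    return (exponent << FRAC_BITS) + interpolated


print(log2_fixed(1000) / (1 << FRAC_BITS), math.log2(1000))
```

The integer part of the result comes from the position of the leading bit, so only the fractional part depends on the table.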

Rest of System, Fixed-Point
• No improvement needed in VQ/GLA
• A technique similar to the one used for the FFT should be applied to the other signal processing steps
  – Pre-emphasis: utilize the full 32 bits
  – Time windowing: use fewer bits in the windowing function
  – Filter bank (FB): use fewer bits in the frequency responses
  – DCT: use fewer bits for the cosines

Effect of Signal Processing
• TIMIT data sets, varying number of speakers (N)
• For each N, repeat (6x, 5x, 2x) train/recognize cycles (to eliminate the randomness of the GLA initial solution)
• FFTGEN: FFT with 16/16 multiplication
• Fixed-point: use the proposed 22/10 FFT
• Mixed: floating-point DSP, fixed-point GLA/VQ

Effect of Signal Quality
• GSM/PC data: 16 aligned dual recordings
• All computations in floating-point arithmetic
• Signal recorded with a laptop and a PC microphone gives an average recognition rate of 100%
• Signal recorded with a Nokia 3660 results in an average recognition rate of 84.9%

Conclusion
• Speaker identification was ported to a Symbian Series 60 mobile phone
• 22/10 bit usage in multiplication is proposed instead of the “standard” 16/16
• Experiments indicate that recognition accuracy improves from 68% to 95%