1 Automatic Speaker Recognition for Series 60 Mobile Devices
Juhani Saastamoinen, Evgeny Karpov, Ville Hautamäki, and Pasi Fränti
University of Joensuu, Department of Computer Science
Specom’2004, Sep 20, 2004
University of Joensuu, Dept. of Computer Science, P.O. Box 111, FIN-80101 Joensuu, Tel. +358 13 251 7959, fax +358 13 251 7955, www.cs.joensuu.fi

2 Background
• Project in the national FENIX programme
  – New Methods and Applications in Speech Technology
• 7 research institutes
• Project partners: NRC, Lingsoft, National Bureau of Investigation, etc.
• Joensuu: Speaker Recognition
• http://cs.joensuu.fi/pages/pums

3 Research Group (PUMS project)
• Pasi Fränti: Professor
• Juhani Saastamoinen: Project manager
• Evgeny Karpov: Project researcher
• Ville Hautamäki: Project researcher
• Tomi Kinnunen: Researcher
• Ismo Kärkkäinen: Clustering algorithms

4 Application Scenarios
Speaker Recognition covers two tasks:
• Speaker Identification: "Whose voice is this?"
• Speaker Verification: "Is this Bob’s voice?" (claim + verification; possible outcome: "Imposter!")

5 Project Goal
• Port speaker recognition to a Series 60 mobile phone

6 Symbian Phones
• Series 60 phone features:
  – 16 MB ROM
  – 8 MB RAM
  – 176 x 208 display
  – ARM processor
  – No floating-point unit!
[Figure labels: Series 80, Series 60, UIQ]

7 Symbian OS
• Defined by the Symbian consortium
• Based on EPOC
• Operating system for mobile phones
  – Real-time system
  – Long uptime required
• Multitasking, multithreading

8 Problems of Porting
• Usual considerations when porting to a phone
  – Event-driven GUI programming
  – Platform-specific programming model
  – Real-time system, exceptions
• Application-specific porting problems
  – Number crunching without a floating-point unit!
  – Signal processing is numerically challenging

9 Identification System
• Signal processing (feature extraction): speech audio -> feature vectors
• Speaker modelling: create a speaker profile from the feature vectors
  – Speaker profiles are added to the speaker profile database during training
• Speaker recognition: classify input speech based on existing profiles
  – All profiles are read and used during recognition
  – Output: decision

10 MFCC Signal Processing
Digital speech signal frame -> Pre-emphasis -> Time windowing -> DFT -> Abs -> Filter bank -> Log -> DCT -> Feature vector
• Pre-emphasis coefficient 0.97, Hamming window, 30 triangular mel filters, base-2 logarithm, output 12 MFCCs
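
The first two stages of this chain can be sketched in floating point as a reference. This is a minimal sketch, not the authors' code: the function names are hypothetical, and passing the first sample through unchanged is an assumed boundary convention. Only the coefficient 0.97 and the choice of a Hamming window come from the slide.

```python
import math


def pre_emphasize(frame, coeff=0.97):
    """Apply y[n] = x[n] - coeff * x[n-1].

    Assumption: the first sample has no predecessor and is passed through.
    """
    if not frame:
        return []
    return [frame[0]] + [frame[n] - coeff * frame[n - 1]
                         for n in range(1, len(frame))]


def hamming(n_samples):
    """Symmetric Hamming window w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if n_samples == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def window_frame(frame):
    """Multiply a frame sample by sample with a Hamming window of equal length."""
    return [x * w for x, w in zip(frame, hamming(len(frame)))]
```

A frame would be passed through `pre_emphasize` and then `window_frame` before the DFT stage. A fixed-point port would replace the float multiplications with scaled integer ones.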

11 Fixed-Point Implementation
• Numerical analysis is needed for a fixed-point arithmetic implementation
• Truncation and re-scaling avoid overflows in the converted algorithm
• Minimize the information loss caused by computing in fixed-point arithmetic
  – Minimize relative error
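
The interplay of truncation, re-scaling and relative error can be illustrated with a small simulation. This is a sketch under stated assumptions: Python integers do not overflow, so the 32-bit limit is checked explicitly, and the helper names and the choice of 15 fractional bits in the usage note are hypothetical, not taken from the slides.

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def to_fixed(x, frac_bits):
    """Quantize a real number to an integer with frac_bits fractional bits."""
    return int(round(x * (1 << frac_bits)))


def to_float(q, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fits_int32(v):
    """True if v is representable as a signed 32-bit integer."""
    return INT32_MIN <= v <= INT32_MAX


def fixed_mul(a, b, frac_bits, pre_shift):
    """Multiply two fixed-point values on a simulated 32-bit machine.

    Each input is first truncated by pre_shift bits so that the raw product
    fits in 32 bits; the product is then re-scaled to frac_bits fractional bits.
    """
    raw = (a >> pre_shift) * (b >> pre_shift)
    if not fits_int32(raw):
        raise OverflowError("product does not fit in 32 bits")
    # raw carries 2 * (frac_bits - pre_shift) fractional bits.
    shift = frac_bits - 2 * pre_shift
    return raw >> shift if shift >= 0 else raw << -shift


def relative_error(exact, approx):
    """Relative error of approx with respect to a non-zero exact value."""
    return abs(approx - exact) / abs(exact)
```

With 15 fractional bits, multiplying the fixed-point forms of 3.1416 and 2.7183 without truncation (`pre_shift=0`) raises `OverflowError`, while `pre_shift=4` keeps the product inside 32 bits at the cost of a small relative error.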

12 FFT, Fixed-Point
• Frequency spectrum of speech
  – Biggest source of numerical error
  – Butterflies have multiplications
  – Layers repeat truncation errors
• Fixed number of bits per element
  – 32, the native integer size in many systems
• Reference implementation: FFTGEN
  – http://www.jjj.de/fft/fftgen.tgz

13 FFTGEN (16/16)
• Multiplication: the result of a 32 x 32-bit product must fit in 32 bits, so the inputs are truncated
• FFTGEN: truncate both inputs to 16/16 bits
  – FFT layer input (16-bit integer) x FFT twiddle factor (16-bit integer) = 32-bit multiplication result
  – Of the 32-bit result, 16 bits are used and 16 bits are cropped off
  – FFT layer output (part of it): crop-off for the next layer is 16 bits!

14 Info Preserving FFT (22/10)
• Approximate the DFT operator F with G
• Allow ||F-G|| to increase in order to preserve more signal information
  – Minimize the maximum relative error in scaled sine values with respect to the scale; 980 is good for FFT sizes up to 1024
  – Truncate multiplication inputs to 22/10 bits (signal/op)
  – FFT layer input (32-bit integer, 22 bits used) x FFT twiddle factor (16-bit integer, 10 bits used) = 32-bit multiplication result
  – Of the 32-bit result, 22 bits are used and 10 bits are cropped off
  – FFT layer output (part of it, 32-bit integer): crop-off for the next layer is 10 bits
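
The difference between the 16/16 and 22/10 bit allocations can be illustrated on a single multiplication. The sketch below is one possible reading, not the FFTGEN code or the authors' implementation: the slides do not give the exact scaling conventions, so the signal scales (2^15 - 1 and 2^21 - 1), the 16/16 twiddle scale and all function names are assumptions. Only the bit counts, the crop-off widths and the sine scale 980 come from the slides.

```python
def quantize(value, scale):
    """Round a real value to an integer after scaling."""
    return int(round(value * scale))


def split_multiply(x, w, signal_scale, op_scale, crop_bits):
    """Simulate one fixed-point product of a signal sample and a twiddle factor.

    x and w are real values in [-1, 1]. Returns the result with the known
    scaling undone, so it can be compared directly with x * w.
    """
    raw = quantize(x, signal_scale) * quantize(w, op_scale)
    assert -(1 << 31) <= raw < (1 << 31), "product must fit in 32 bits"
    out = raw >> crop_bits  # integer that would be passed to the next layer
    return out * (1 << crop_bits) / (signal_scale * op_scale)


def mul_16_16(x, w):
    """Assumed 16/16 split: 16 bits for the signal, 16 for the twiddle factor."""
    return split_multiply(x, w, (1 << 15) - 1, (1 << 15) - 1, 16)


def mul_22_10(x, w):
    """Assumed 22/10 split: 22 bits for the signal, sine values scaled by 980."""
    return split_multiply(x, w, (1 << 21) - 1, 980, 10)


def relative_error(exact, approx):
    """Relative error of approx with respect to a non-zero exact value."""
    return abs(approx - exact) / abs(exact)
```

For a low-amplitude sample such as x = 0.001 with w = 0.7071, this toy model gives a much smaller relative error for the 22/10 split than for 16/16, because the signal keeps more significant bits. It models one multiplication only and does not reproduce the error statistics reported for the full FFT.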

15 FFT Spectrum, Fixed-Point
[Figure: plots for the original TIMIT signal and the TIMIT signal x 4, with panels for 16/16 abs values and 22/10 abs values]
• x-axis: fixed-point FFT element abs. values
• y-axis: correct FFT element abs. values

16 Scale of Error in Proposed FFT
Log10 of relative error in FFT elements:
                      16/16    22/10
average              -0.775   -2.118
standard deviation    0.797    0.590

17 Magnitude Spectrum, Fixed-Point
• Compute complex absolute values using the maximum coordinate and the coordinate ratio
• Suppose |x| > |y| for z = x + i y, then |z| = |x| sqrt(1 + (y/x)^2)
• Denote the squared ratio (y/x)^2 by t
• Approximate the square root sqrt(1 + t) by a polynomial P(t)
• Constant-time algorithm (vs. Newton)
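
The idea can be sketched with integer arithmetic as follows. The slides do not state which polynomial P(t) was used, so the quadratic below, which interpolates sqrt(1 + t) at t = 0, 0.5 and 1, is a hypothetical stand-in, and the 15 fractional bits and the function name are assumptions as well.

```python
import math

Q = 15          # fractional bits for t and P(t); an assumption
ONE = 1 << Q

# Hypothetical P(t) = 1 + A1*t + A2*t^2, matching sqrt(1 + t) at t = 0, 0.5, 1.
_F1 = math.sqrt(1.5) - 1.0
_F2 = math.sqrt(2.0) - 1.0
A1 = int(round((4.0 * _F1 - _F2) * ONE))
A2 = int(round((2.0 * _F2 - 4.0 * _F1) * ONE))


def magnitude(x, y):
    """Approximate |x + iy| for integers x, y without a square-root routine."""
    big, small = max(abs(x), abs(y)), min(abs(x), abs(y))
    if big == 0:
        return 0
    ratio = (small << Q) // big      # small / big in Q15, between 0 and ONE
    t = (ratio * ratio) >> Q         # t = (small / big)^2 in Q15
    # Horner evaluation of P(t); >> on negative values rounds toward -infinity.
    p = ONE + ((t * (A1 + ((A2 * t) >> Q))) >> Q)
    # On a 32-bit target, big * p needs care for large inputs; Python
    # integers do not overflow, so this sketch ignores that.
    return (big * p) >> Q
```

For example, `magnitude(3000, 4000)` lands within about 0.1% of 5000. The work per element is fixed (one division, a few multiplications and shifts), in contrast to an iterative Newton square root.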

18 Logarithm, Fixed-Point
• Use base 2 instead of base 10
  – Changing the base corresponds to multiplying the output by a constant
• Standard technique:
  – Reduce the problem to the interval [1,2)
  – Use linear interpolation between values stored in a look-up table
  – 8 bits are used for indexing the look-up table values
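
A minimal sketch of this standard technique is shown below. The 8-bit table index is from the slide; the Q16 output format, the 24-bit working mantissa and the function name are assumptions made for illustration. The table is built with floating point here, whereas on a device without an FPU it would be precomputed and stored as constants.

```python
import math

INDEX_BITS = 8    # from the slide: 8 bits index the look-up table
OUT_FRAC = 16     # fractional bits of the result; an assumption
MANT_BITS = 24    # working precision of the normalized mantissa; an assumption

# TABLE[i] = log2(1 + i/256) in Q16; one extra entry so TABLE[i + 1] exists.
TABLE = [int(round(math.log2(1.0 + i / (1 << INDEX_BITS)) * (1 << OUT_FRAC)))
         for i in range((1 << INDEX_BITS) + 1)]


def log2_fixed(n):
    """Return log2(n) in Q16 for a positive integer n."""
    if n <= 0:
        raise ValueError("log2 is defined for positive inputs only")
    exponent = n.bit_length() - 1    # n = m * 2^exponent with m in [1, 2)
    # Fractional part of the mantissa, m - 1, scaled to MANT_BITS bits.
    if exponent >= MANT_BITS:
        frac = (n >> (exponent - MANT_BITS)) - (1 << MANT_BITS)
    else:
        frac = (n << (MANT_BITS - exponent)) - (1 << MANT_BITS)
    rest_bits = MANT_BITS - INDEX_BITS
    index = frac >> rest_bits                 # top 8 bits select the table cell
    rest = frac & ((1 << rest_bits) - 1)      # low bits drive the interpolation
    lo, hi = TABLE[index], TABLE[index + 1]
    interp = lo + (((hi - lo) * rest) >> rest_bits)
    return (exponent << OUT_FRAC) + interp
```

Dividing the result by 65536 gives the logarithm as a real number. Adjacent table entries differ by less than 2^9 and `rest` is below 2^16, so the interpolation product stays well within 32 bits.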

19 Rest of System, Fixed-Point
• No improvement needed in VQ/GLA
• A technique similar to the one used for the FFT should be applied to the other signal processing steps
  – Pre-emphasis: utilize the full 32 bits
  – Time windowing: use fewer bits in the windowing function
  – Filter bank (FB): use fewer bits in the frequency responses
  – DCT: use fewer bits for the cosines

20 Effect of Signal Processing
• TIMIT data sets, varying number of speakers (N)
• For each N, repeat the train/recognize cycle (6x, 5x, 2x) to eliminate the randomness of the GLA initial solution
• FFTGEN: FFT with 16/16 multiplication
• Fixed-point: uses the proposed 22/10 FFT
• Mixed: floating-point DSP, fixed-point GLA/VQ

21 Effect of Signal Quality
• GSM/PC data: 16 aligned dual recordings
• All computations in floating-point arithmetic
• Signal recorded with a laptop and PC microphone gives an average recognition rate of 100%
• Signal recorded with a Nokia 3660 results in an average recognition rate of 84.9%

22 Conclusion
• Speaker identification was ported to a Symbian Series 60 mobile phone
• 22/10 bit usage in multiplication is proposed instead of the “standard” 16/16
• Experiments indicate that recognition accuracy improves from 68% to 95%

