University of Joensuu Dept. of Computer Science P.O. Box 111 FIN- 80101 Joensuu Tel. +358 13 251 7959 fax +358 13 251 7955 www.cs.joensuu.fi Speaker Recognition.


2 Speaker Recognition Research in Joensuu
Speech and Image Processing Unit (SIPU), http://cs.joensuu.fi/sipu/
Puheteknologian talviseminaari (Speech Technology Winter Seminar)
Pasi Fränti, Joensuu, 10.3.2006

3 Goals for PUMS season 3 (1/2)
1. Usability of automatic speaker identification in forensic applications
2. Compatibility with large databases
3. Automation of LTAS + fusion with MFCC
4. Voice activity detection

4 Goals for PUMS season 3 (2/2)
5. Speaker verification in real (noisy) environment
6. Prototype for access control
7. Solving technical requirements for prototype in elevator
8. Usability for detecting sound sources in general
9. Keyword search (using HTK or Lingsoft Recognizer)

5 Research Group
Pasi Fränti, Professor
Juhani Saastamoinen, PhLic
Tomi Kinnunen, PhD (Singapore)
Ville Hautamäki, MSc
Ismo Kärkkäinen, MSc
Marko Tuononen, BSc
Rosa Gonzalez-Hautamäki, MSc
Ilja Sidoroff
Victoria Yanulevskaya
Evgeny Karpov, MSc (NRC)
Group labels on the slide: PUMS personnel, Doctoral researchers, Collaborators

6 1. Applicability to forensic applications
•A study of automatic speaker recognition has been done.
•Results are not reported, but actions were taken within tasks 3 and 4.
•Material can be found in Kinnunen’s PhD thesis [4] and Niemi-Laitinen’s presentation.

7 2. Support for large databases
- Not yet done -

8 3. LTAS and other features
•Automatic calculation of LTAS is done. Integration into WinSprofiler is in progress. Reporting is in progress.
•The only benefit of LTAS is its speed and ease of use: there are no difficult control parameters.
•No additional benefit to recognition accuracy. MFCC includes the same information.
•Could be used for preliminary pruning in the case of large datasets.
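As a rough illustration of why LTAS needs few control parameters, the sketch below averages the magnitude spectra of all frames in a signal. It is not the WinSprofiler implementation: the function names, the Hamming window, the frame length and the hop size are assumptions made for this example, and a naive DFT is used only to keep it free of external dependencies.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitude of the DFT of one frame (naive O(n^2) DFT, bins 0..n/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc))
    return spectrum


def ltas(signal, frame_len=64, hop=32):
    """Average the magnitude spectra of all Hamming-windowed frames."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / (frame_len - 1))
              for t in range(frame_len)]
    total = [0.0] * (frame_len // 2 + 1)
    count = 0
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(frame_len)]
        for k, mag in enumerate(magnitude_spectrum(frame)):
            total[k] += mag
        count += 1
    if count == 0:
        raise ValueError("signal is shorter than one frame")
    return [v / count for v in total]


# Toy input (assumption): a pure tone that falls exactly on DFT bin 8
# of a 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(512)]
average_spectrum = ltas(tone)
peak_bin = max(range(len(average_spectrum)), key=average_spectrum.__getitem__)
print(peak_bin)
```

The result is a single fixed-length vector per recording, which is what would make it cheap to compare against a large set of speakers before running a heavier MFCC-based match.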

9 Noise robustness of F0 feature
Results reported in [3, 5].

10 4. Voice activity detection
•Software for speech segmentation (VoiceGrep).
•Command line version for Linux.
•Windows version in WinSprofiler.
•Testing done in SIPU laboratory.
–Labtec® pc mic 333, 44.1 kHz
–Recordings were amplified by 24 dB with the Audacity audio editor
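The slides do not describe how VoiceGrep decides what is speech. For orientation only, here is a minimal sketch of the simplest kind of voice activity detector, one that thresholds short-time energy. The function names, frame length, threshold and minimum run length are all assumptions for this example and are not taken from VoiceGrep.

```python
def frame_energies(signal, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def detect_segments(signal, frame_len=100, threshold=0.01, min_frames=2):
    """Return (start_sample, end_sample) spans whose frame energy exceeds threshold."""
    segments = []
    run_start = None
    energies = frame_energies(signal, frame_len)
    for idx, energy in enumerate(energies + [0.0]):  # sentinel closes a trailing run
        if energy > threshold and run_start is None:
            run_start = idx
        elif energy <= threshold and run_start is not None:
            if idx - run_start >= min_frames:
                segments.append((run_start * frame_len, idx * frame_len))
            run_start = None
    return segments


# Toy signal (assumption): silence, a loud burst, silence again.
toy = [0.0] * 300 + [0.5, -0.5] * 200 + [0.0] * 300
print(detect_segments(toy))
```

A detector of this kind reacts to any sufficiently loud sound, so on its own it cannot separate speech from other sound sources; practical detectors add spectral or periodicity cues on top of energy.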

11 4a. Test material and results
•Material
–4 hours in total.
–Bad quality recordings: 11-bit data, of which 4-5 bits are information and the rest noise.
•VoiceGrep made 168 detections:
–56 speech (33%)
–112 non-speech (67%)
•Material included 71 real speech segments:
–Average segment length 16 s.
–VoiceGrep found 25 of these (35%)
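The percentages on this slide can be re-derived from the raw counts. The short calculation below does that; the terms "precision" and "recall" are this example's naming for the two ratios and do not appear on the slide.

```python
# Counts taken from the slide.
detections = 168          # all VoiceGrep detections
speech_detections = 56    # detections that contained speech
true_segments = 71        # real speech segments in the material
segments_found = 25       # real segments that VoiceGrep found

precision = speech_detections / detections
recall = segments_found / true_segments
false_alarm_share = (detections - speech_detections) / detections

print(f"precision {precision:.1%}, recall {recall:.1%}, "
      f"false alarms {false_alarm_share:.1%}")
```

Note that the 56 speech detections correspond to only 25 found segments, so presumably some segments were detected more than once.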

12 4b. VoiceGrep overall results

13 4c. VoiceGrep example (correct detection)
Start of the speech is detected correctly.
End of the speech is missed.

14 4d. VoiceGrep example (false detections)
Figure labels: door opening, running water, walking, door

15 4e. VoiceGrep example (missed speech segment)
Figure labels: door, speech and walking, door

16 4f. Entire data set (4 hours)
Figure rows: speech segments, result of VoiceGrep, data

17 5. Speaker verification in noisy environment
•Systematic testing of the parameters that affect accuracy has been reported in [1].
•Applicability of speaker verification in a real environment has been reported in [2] and in Kinnunen’s PhD thesis [4].
•Additional testing will be done if time permits.

18 5a. Text-dependent verification in access control
•Utilizing time series information improves recognition.
•Best results are obtained when every user has their own password.
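The slide does not say how the time series information is used. One common way to compare two utterances of the same password while respecting frame order is dynamic time warping (DTW); the sketch below shows that technique as an assumption, not as the method used in the project. The function name and the toy one-dimensional "features" are hypothetical.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping cost between two sequences of feature vectors."""
    inf = float("inf")
    rows, cols = len(seq_a), len(seq_b)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            # Euclidean distance between the two frames being aligned.
            local = sum((a - b) ** 2 for a, b in zip(seq_a[i - 1], seq_b[j - 1])) ** 0.5
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[rows][cols]


# Toy one-dimensional sequences standing in for per-frame feature vectors.
enrolled = [[0.0], [1.0], [2.0], [1.0], [0.0]]
same_phrase_slower = [[0.0], [0.0], [1.0], [2.0], [2.0], [1.0], [0.0]]
other_phrase = [[2.0], [0.0], [2.0], [0.0], [2.0]]

print(dtw_distance(enrolled, same_phrase_slower))
print(dtw_distance(enrolled, other_phrase))
```

The slower repetition of the same pattern aligns at zero cost, while the different pattern does not, which is the property an order-aware matcher adds over a bag-of-frames comparison.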

19 6. Prototype for access control
Figure labels: microphone, motion detector, emergency button

20 7. Calling elevator (technical requirements)
•Communication with OPC server:
–Implemented with Matrikon server.
•Program logic for the elevator implemented:
–Reads variables from OPC server.
–Interprets and shows elevator status.
–Includes recording logic.
•Speaker and voice functionality:
–Not yet implemented.
–Main window does not show anything yet.

21 8. Usability for detecting sound sources in general
- Not yet done -

22 9. Keyword search
- Not yet done -

23 Publications (season 3)
1. J. Saastamoinen, Z. Fiedler, T. Kinnunen and P. Fränti, "On factors affecting MFCC-based speaker recognition accuracy", Int. Conf. on Speech and Computer (SPECOM'05), Patras, Greece, 503-506, October 2005.
2. H. Gupta, V. Hautamäki, T. Kinnunen and P. Fränti, "Field evaluation of text-dependent speaker recognition in an access control application", Int. Conf. on Speech and Computer (SPECOM'05), Patras, Greece, 551-554, October 2005.
3. T. Kinnunen and R. Gonzalez-Hautamäki, "Long-Term F0 Modeling for Text-Independent Speaker Recognition", Int. Conf. on Speech and Computer (SPECOM'05), Patras, Greece, 567-570, October 2005.

24 Theses (season 3)
4. T. Kinnunen, "Optimizing Spectral Feature Based Text Independent Speaker Recognition", PhD thesis, University of Joensuu, June 2005.
5. R. Gonzalez-Hautamäki, "Fundamental Frequency Estimation and Modeling for Speaker Recognition", MSc thesis, University of Joensuu, July 2005.

25 Application scenarios
Speaker recognition covers two tasks:
•Speaker identification: "Whose voice is this?"
•Speaker verification: "Is this Bob’s voice?" The speaker makes an identity claim, which is either verified or rejected ("Imposter!").
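The difference between the two scenarios can be made concrete with a small sketch. It scores test vectors against per-speaker codebooks by mean squared quantization error, in the spirit of the VQ and MSE components listed for SRLIB on a later slide, but the code, function names, codebooks and threshold are all assumptions for this example and not SRLIB's API.

```python
def distortion(vectors, codebook):
    """Mean squared error between each vector and its nearest code vector."""
    total = 0.0
    for v in vectors:
        total += min(sum((x - c) ** 2 for x, c in zip(v, code)) for code in codebook)
    return total / len(vectors)


def identify(vectors, models):
    """Identification: return the enrolled speaker with the lowest distortion."""
    return min(models, key=lambda name: distortion(vectors, models[name]))


def verify(vectors, claimed, models, threshold):
    """Verification: accept the identity claim only if distortion is below threshold."""
    return distortion(vectors, models[claimed]) < threshold


# Hypothetical two-dimensional codebooks for two enrolled speakers.
models = {
    "Alice": [[0.0, 0.0], [1.0, 0.0]],
    "Bob": [[5.0, 5.0], [6.0, 5.0]],
}
test_vectors = [[5.1, 5.0], [5.9, 5.1]]

print(identify(test_vectors, models))                        # whose voice is this?
print(verify(test_vectors, "Bob", models, threshold=0.5))    # is this Bob's voice?
print(verify(test_vectors, "Alice", models, threshold=0.5))  # false claim
```

Identification compares against every enrolled model and always returns someone, whereas verification compares against one claimed model and depends on a threshold, which is the parameter that trades false acceptances against false rejections.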

26 Software 1: Console program

27 Software 2: WinSprofiler

28 Software 3: Symbian
Port to Symbian OS with Series 60 UI platform

29 Software 4: Door SProfiler
Opening laboratory door by speaking

30 Software 5: Lift SProfiler
(to appear in season 4 perhaps…)

31 Future development (1): Software integration
Diagram labels: VAD; WinSprofiler; Windows (JoY); Mobile Series 60 (JoY); SRLIB: MSE GMM MFCC VQ; DB support; LTAS; F0 extraction; fusion by weighted MSE; keyword search

32 Future development (2): Applications
Diagram labels: classifier fusion; srlib; DB; access control; speech analyzer tool; forensic applications; segmentation; VAD; common speaker recognition app. interface; verification; calling elevator; keyword search; call center

33 Future development (3): Technical development
•Implement and integrate F0, and possibly also formants (F1, F2).
•Automatic voiced/unvoiced segmentation.
•User enrollment.
•Use of sequence information (triplets).
•Develop the WinSprofiler software toward a voice profiler and speech analyzer tool.

34 Future development (4): Elevator prototype
Diagram labels: OPC server; machine room; CAN; Ethernet TCP/IP; microphone; display; OPC client; LiftCaller; SRLIB 3.0; approach detection; DCOM; lift car & hardware; our PC; GW box

35 Vision 1: Teleconferencing
Diagram labels: Alice, Bob, Minna, Paul, Unknown; VPN; Speaker Recognition (one per participant); Verified & allowed; Not registered

36 Vision 2: Call center
•Speech is the main tool for people working in a call center.
•Voice login of personnel.
•Removes the need for manual entry.

37 Vision 3: Language recognition
•A problem related to speaker recognition: the same research groups usually study both.
•Not trivial to solve.
•Studied extensively for Asian languages, even for rare languages that have no "written form".

38 Vision 4: Medical applications
•Doctors use voice to record summaries of patient meetings.
•Access by keyword search.
•Annotation.
•Authentication of speaker.

39 Thank you for your patience! Questions?

