University of Joensuu Dept. of Computer Science P.O. Box 111 FIN- 80101 Joensuu Tel. +358 13 251 7959 fax +358 13 251 7955 www.cs.joensuu.fi Speaker Recognition.

Presentation transcript:

Speaker Recognition Research in Joensuu
Speech and Image Processing Unit (SIPU)
Speech Technology Winter Seminar (Puheteknologian talviseminaari)
Pasi Fränti, Joensuu

Goals for PUMS season 3 (1/2)
1. Usability of automatic speaker identification in forensic applications
2. Compatibility with large databases
3. Automation of LTAS + fusion with MFCC
4. Voice activity detection

Goals for PUMS season 3 (2/2)
5. Speaker verification in real (noisy) environments
6. Prototype for access control
7. Solving the technical requirements for the elevator prototype
8. Usability for detecting sound sources in general
9. Keyword search (using HTK or Lingsoft Recognizer)

Research Group
Pasi Fränti, Professor
Juhani Saastamoinen, PhLic
Tomi Kinnunen, PhD (Singapore)
Ville Hautamäki, MSc
Ismo Kärkkäinen, MSc
Marko Tuononen, BSc
Rosa Gonzalez-Hautamäki, MSc
Ilja Sidoroff
Victoria Yanulevskaya
Evgeny Karpov, MSc (NRC)
Group labels on the slide: PUMS personnel, Doctoral researchers, Collaborators

Applicability to forensic applications
• A study of automatic speaker recognition has been carried out.
• Results are not reported, but actions were taken within tasks 3 and 4.
• Material can be found in Kinnunen's PhD thesis [4] and Niemi-Laitinen's presentation.

Support for large databases
- Not yet done -

LTAS (long-term average spectrum) and other features
• Automatic calculation of LTAS is done. Integration into WinSprofiler is in progress. Reporting is in progress.
• The benefit of LTAS is merely its speed and ease of use: there are no difficult control parameters.
• No additional benefit to recognition accuracy: MFCC includes the same information.
• Could be used for preliminary pruning with large datasets.
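As a rough illustration of why LTAS needs few control parameters, the sketch below averages the magnitude spectra of overlapping frames. It is not the WinSprofiler or SRLIB implementation: the function name `ltas`, the frame length, the hop size and the Hann window are all assumptions chosen for the example, and a naive DFT is used so that the code depends only on the standard library.

```python
import cmath
import math


def ltas(signal, frame_len=64, hop=32):
    """Average the magnitude spectra of overlapping, Hann-windowed frames.

    frame_len and hop are illustrative defaults, not values from the slides.
    Returns frame_len // 2 + 1 averaged magnitudes (bins 0 .. Nyquist).
    """
    n_bins = frame_len // 2 + 1
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    sums = [0.0] * n_bins
    n_frames = 0
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        for k in range(n_bins):
            # Naive DFT of one bin; fine for a sketch, slow for real audio.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            sums[k] += abs(acc)
        n_frames += 1
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    return [s / n_frames for s in sums]


# Usage: a 1 kHz tone sampled at 8 kHz falls exactly on bin 1000 * 64 / 8000 = 8.
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(1024)]
spectrum = ltas(tone)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print(peak_bin)
```

The only choices are the frame length and the hop size, which is consistent with the slide's point that LTAS is simple to apply.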

Noise robustness of F0 feature
Results reported in [3, 5].

Voice activity detection
• Software for speech segmentation (VoiceGrep).
• Command-line version for Linux.
• Windows version in WinSprofiler.
• Testing done in the SIPU laboratory:
–Labtec® PC Mic 333, 44.1 kHz
–Recordings were amplified by 24 dB with the Audacity audio editor
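To make the 24 dB figure concrete, the sketch below converts a gain in decibels to a linear amplitude factor using the standard relation factor = 10^(dB/20). The helper names and the clipping limit are assumptions for illustration; this is not how Audacity applies the gain internally.

```python
def db_to_amplitude_ratio(gain_db):
    """Convert a gain in dB to a linear amplitude factor (20 * log10 convention)."""
    return 10.0 ** (gain_db / 20.0)


def amplify(samples, gain_db, limit=1.0):
    """Scale samples by gain_db and clip to [-limit, limit] (hypothetical helper)."""
    factor = db_to_amplitude_ratio(gain_db)
    return [max(-limit, min(limit, s * factor)) for s in samples]


factor = db_to_amplitude_ratio(24.0)
print(f"{factor:.2f}")  # prints 15.85
```

A 24 dB gain therefore multiplies sample amplitudes by roughly 16, which is why weak recordings benefit from it and why loud passages may clip.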

a. Test material and results
• Material:
–4 hours in total.
–Poor-quality recordings: 11 bits of data, of which 4-5 bits are information and the rest is noise.
• VoiceGrep made 168 detections:
–56 speech (33%)
–112 non-speech (67%)
• The material included 71 real speech segments:
–Average segment length 16 s.
–VoiceGrep found 25 of these (35%).
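The percentages above can be checked with a few lines of arithmetic. The counts are taken from the slide; the variable names are illustrative. The share of detections that contain speech behaves like a precision figure, and the share of real segments found behaves like a recall figure, although the slide does not say how individual detections map to segments.

```python
def percent(part, whole):
    """Return part / whole as a percentage."""
    return 100.0 * part / whole


detections = 168          # all VoiceGrep detections
speech_detections = 56    # detections that contained speech
non_speech_detections = 112
real_segments = 71        # speech segments actually present in the material
segments_found = 25       # real segments that VoiceGrep detected

speech_share = percent(speech_detections, detections)
non_speech_share = percent(non_speech_detections, detections)
found_share = percent(segments_found, real_segments)

print(f"speech detections:     {speech_share:.1f} %")
print(f"non-speech detections: {non_speech_share:.1f} %")
print(f"real segments found:   {found_share:.1f} %")
```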

b. VoiceGrep overall results

c. VoiceGrep example (correct detection)
• Start of the speech is detected correctly.
• End of the speech is missed.
[Audio sample #1]

d. VoiceGrep example (false detections)
Figure labels: Door opening, Running water, Walking, Door
[Audio samples #2 and #3]

e. VoiceGrep example (missed speech segment)
Figure labels: Door, Speech and walking, Door
[Audio sample #4]

f. Entire data set (4 hours)
Figure rows: Speech segments, Result of VoiceGrep, Data

Speaker verification in noisy environment
• Systematic testing of the factors affecting accuracy has been reported in [1].
• Applicability of speaker verification in a real environment has been reported in [2] and in Kinnunen's PhD thesis [4].
• Additional testing will be done if time permits.

a. Text-dependent verification in access control
• Utilizing time-series information improves recognition.
• Best results are obtained if everyone has their own password.

Prototype for access control
Figure labels: Microphone, Motion detector, Emergency button

Calling elevator (technical requirements)
• Communication with the OPC server:
–Implemented with a Matrikon server.
• Program logic for the elevator implemented:
–Reads variables from the OPC server.
–Interprets and shows elevator status.
–Includes recording logic.
• Speaker and voice related functionality:
–Not yet implemented.
–Main window does not show anything yet.

Usability for detecting sound sources in general
- Not yet done -

Keyword search
- Not yet done -

Publications (season 3)
1. J. Saastamoinen, Z. Fiedler, T. Kinnunen and P. Fränti, "On factors affecting MFCC-based speaker recognition accuracy", Int. Conf. on Speech and Computer (SPECOM'05), Patras, Greece, October 2005.
2. H. Gupta, V. Hautamäki, T. Kinnunen and P. Fränti, "Field evaluation of text-dependent speaker recognition in an access control application", Int. Conf. on Speech and Computer (SPECOM'05), Patras, Greece, October 2005.
3. T. Kinnunen and R. Gonzalez-Hautamäki, "Long-Term F0 Modeling for Text-Independent Speaker Recognition", Int. Conf. on Speech and Computer (SPECOM'05), Patras, Greece, October 2005.

Theses (season 3)
4. T. Kinnunen, "Optimizing Spectral Feature Based Text Independent Speaker Recognition", PhD thesis, University of Joensuu, June 2005.
5. R. Gonzalez-Hautamäki, "Fundamental Frequency Estimation and Modeling for Speaker Recognition", MSc thesis, University of Joensuu, July 2005.

Application scenarios
Speaker recognition covers two tasks:
• Speaker identification: "Whose voice is this?"
• Speaker verification: "Is this Bob's voice?" The input is a voice sample plus an identity claim; a false claim is rejected ("Imposter!").
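The difference between the two tasks can be sketched in a few lines. The example below assumes each enrolled speaker is represented by a small VQ codebook and that a test sample is scored by its average quantization distortion (mean squared error to the nearest code vector). The codebooks, the feature vectors, the threshold and the function names are all made up for illustration; this is not the SRLIB interface.

```python
def avg_distortion(features, codebook):
    """Mean squared distance from each feature vector to its nearest code vector."""
    total = 0.0
    for x in features:
        total += min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in codebook)
    return total / len(features)


def identify(features, models):
    """Identification: pick the enrolled speaker with the lowest distortion."""
    return min(models, key=lambda name: avg_distortion(features, models[name]))


def verify(features, claimed, models, threshold):
    """Verification: accept the claim only if distortion is below a threshold."""
    return avg_distortion(features, models[claimed]) <= threshold


# Hypothetical 2-D codebooks and test features.
models = {
    "Alice": [(0.0, 0.0), (1.0, 1.0)],
    "Bob": [(5.0, 5.0), (6.0, 6.0)],
}
sample = [(5.1, 4.9), (6.2, 5.8)]

print(identify(sample, models))              # Bob
print(verify(sample, "Bob", models, 0.5))    # True
print(verify(sample, "Alice", models, 0.5))  # False
```

Identification compares the sample against every enrolled model, whereas verification compares it against one claimed model and needs a decision threshold.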

Software 1: Console program

Software 2: WinSprofiler

Software 3: Symbian
Port to Symbian OS with the Series 60 UI platform

Software 4: Door SProfiler
Opening the laboratory door by speaking

Software 5: Lift SProfiler
(to appear in season 4 perhaps…)

Future development (1): Software integration
Diagram labels: VAD, WinSprofiler, Windows (JoY), Mobile Series 60 (JoY), SRLIB: MSE GMM MFCC VQ, DB support, LTAS, F0 extraction, fusion by weighted MSE, Keyword search
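The slide does not describe how "fusion by weighted MSE" is computed. One possible reading, sketched below purely as an assumption, is a weighted average of the per-feature distortion scores (for example MFCC, LTAS and F0) for one candidate speaker. The function name, the scores and the weights are hypothetical.

```python
def fuse_scores(distortions, weights):
    """Weighted average of per-feature distortion scores (lower is better).

    distortions and weights are dicts keyed by feature name; both the keys
    and the values used below are illustrative, not taken from the slides.
    """
    if distortions.keys() != weights.keys():
        raise ValueError("distortions and weights must cover the same features")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[k] * distortions[k] for k in distortions) / total_weight


# Hypothetical distortion scores of one candidate speaker per feature set.
scores = {"MFCC": 0.20, "LTAS": 0.50, "F0": 0.80}
weights = {"MFCC": 0.6, "LTAS": 0.2, "F0": 0.2}
fused = fuse_scores(scores, weights)
print(f"{fused:.2f}")
```

With a scheme of this kind, a more reliable feature can be given a larger weight so that it dominates the fused decision.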

Future development (2): Applications
Diagram labels: Classifier fusion, srlib, DB, Access control, Speech analyzer tool, Forensic applications, Segmentation, VAD, common speaker recognition app. interface, Verification, Calling elevator, Keyword search, Call center

Future development (3): Technical development
• Implement and integrate F0, and possibly formants (F1, F2).
• Automatic voiced/unvoiced segmentation.
• User enrollment.
• Use of sequence information (triplets).
• Develop the WinSprofiler software toward a voice profiler and speech analyzer tool.

Future development (4): Elevator prototype
Diagram labels: OPC server, Machine room, CAN, Ethernet, TCP/IP, Microphone, Display, OPC client, LiftCaller, SRLIB 3.0, Approach detection, DCOM, Lift car & hardware, Our PC, GW box

Vision 1: Teleconferencing
Diagram labels: Alice, Bob, Minna, Paul, Unknown; VPN; Speaker Recognition (×5); Verified & allowed; Not registered

Vision 2: Call center
• Speech is the main tool for people in a call center.
• Voice login of personnel.
• Removes the need for manual entry.

Vision 3: Language recognition
• A problem related to speaker recognition: the same research groups usually study both.
• Not trivial to solve.
• Studied extensively for Asian languages, even for rare languages that have no written form.

Vision 4: Medical applications
• Doctors use voice to record summaries of patient meetings.
• Access by keyword search.
• Annotation.
• Authentication of the speaker.

Thank you for your patience! Questions?