Presentation on theme: "Geoffrey Zweig Microsoft Research 4/2/2009"— Presentation transcript:
1 EE 516 Lecture 1: Geoffrey Zweig, Microsoft Research, 4/2/2009
2 Our Topics Introducing today! From JHU 2002 SuperSID Final Presentation – Reynolds et al.
3 Topic Coverage By Day. Data Representations and Models (4/23): Vector Quantization, Gaussian Mixtures, The EM Algorithm. Speaker Identification (5/7). Language Identification (5/7). Hidden Markov Models (5/14): Dynamic Programming. Building a Speech Recognizer (5/14).
4 Language Identification – Why Do It? Multi-lingual society: applications should be able to deal with anyone. Businesses: automated help systems (reservations, account access, etc.). Travel: airport kiosks, train stations. Government: funds research to identify languages; runs evaluations in it.
5 How Do You Do It? Score the speech with an English Acoustic Model, a French Acoustic Model, a Tamil Acoustic Model, …, and output the likeliest. Gaussian Mixture Models - 4/23.
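A minimal sketch of the "one acoustic model per language, output the likeliest" idea, assuming each acoustic model is a diagonal-covariance Gaussian mixture whose parameters are already trained. The function names and the toy model parameters are hypothetical and chosen only for illustration; real systems would train the mixtures on speech features with EM.

```python
import math

def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of feature frames under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        comp_logs = []
        for w, mu, var in zip(weights, means, variances):
            log_p = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp_logs.append(log_p)
        # log-sum-exp over mixture components for numerical stability
        m = max(comp_logs)
        total += m + math.log(sum(math.exp(c - m) for c in comp_logs))
    return total

def identify_language(frames, models):
    """Score the frames with every language model and return the likeliest."""
    scores = {lang: gmm_log_likelihood(frames, *params)
              for lang, params in models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical toy models: (weights, means, variances) per language.
TOY_MODELS = {
    "english": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "french": ([0.5, 0.5], [[4.0, 4.0], [5.0, 5.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "tamil": ([1.0], [[-4.0, 4.0]], [[1.0, 1.0]]),
}

best, _ = identify_language([[4.2, 4.6], [4.9, 5.1]], TOY_MODELS)
print(best)
```

Summing per-frame log-likelihoods treats frames as independent, which is the usual simplification when scoring with a GMM.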
6 How Do You Do It? (2) “p ih n s” – probably English… “k r p s t” – probably Czech… Simple HMMs – 5/14. Language Models – 4/30. After Zissman 1996.
7 How Do You Do It? (3) Same methods multiple times. Acero et al., Chapter 4 – 4/23. After Zissman 1996.
8 How Do You Do It? (4) Run a complete speech recognizer in each language. And we will see several other ways, and combinations! After Zissman 1996.
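The phone-sequence intuition above can be sketched with a per-language bigram model over phones. This covers only the language-model scoring step and assumes a phone sequence has already been decoded by some recognizer. The training strings, function names, and the add-one smoothing are illustrative assumptions, not taken from the lecture.

```python
import math
from collections import Counter

def train_bigram(sequences):
    """Count phone bigrams and their left contexts, with sentence boundaries."""
    bigrams, contexts = Counter(), Counter()
    for seq in sequences:
        padded = ["<s>"] + seq + ["</s>"]
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    return bigrams, contexts

def log_prob(phones, model, vocab_size):
    """Add-one smoothed bigram log-probability of a phone sequence."""
    bigrams, contexts = model
    padded = ["<s>"] + phones + ["</s>"]
    return sum(math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
               for a, b in zip(padded, padded[1:]))

def classify(phones, models, vocab_size):
    """Return the language whose phone bigram model scores the sequence highest."""
    return max(models, key=lambda lang: log_prob(phones, models[lang], vocab_size))

# Hypothetical toy training data (made-up phone strings, not a real corpus).
TOY_DATA = {
    "english": ["p ih n s", "s p ih n", "t ih n s", "p ih t"],
    "czech": ["k r p s t", "s t r k", "p r s t", "k r k"],
}
VOCAB = {p for seqs in TOY_DATA.values() for s in seqs for p in s.split()} | {"</s>"}
MODELS = {lang: train_bigram([s.split() for s in seqs])
          for lang, seqs in TOY_DATA.items()}

print(classify("p ih n s".split(), MODELS, len(VOCAB)))
print(classify("k r p s t".split(), MODELS, len(VOCAB)))
```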
9 Gauging Progress – The NIST Evaluations. National Institute of Standards and Technology: has sponsored benchmark tests in multiple language processing areas for over a decade (Topic Detection & Tracking, Content Extraction, Video Analysis, Speech Recognition, Language Identification, Speaker Identification, Machine Translation). Coordination with site funding by Defense Advanced Research Projects Agency (DARPA). Along with business interest, the driving force in advancing the State-of-the-Art.
11 Language Identification - How Well Can It Be Done – Who Salutes? Organization (Location): Beijing Naphoo Technology Company+ (China); Brno University of Technology (Czech Republic); Georgia Institute of Technology (USA); Groupe des Ecoles des Telecommunication, Ecole Nationale Superieure des Telecommunications (France); IBM; IKERLAN Technological Research Center (Spain); Institut de Recherche en Informatique de Toulouse; Institute for Infocomm Research (Singapore); Institute of Acoustics, Chinese Academy of Sciences+; Institut National de Recherche sur les Transports et Leur Securite; International Computer Science Institute (USA); Laboratoire d'Informatique pour la Mecanique et les Sciences de l'Ingenieur; MIT Lincoln Laboratory; Nanyang Technological University; Politecnico di Torino (Italy); Spescom Datavoice (South Africa); Telefonica I & D; TNO Human Factors (The Netherlands); Tsinghua University; Universidad Autónoma de Madrid; University of the Basque Country; University of Stellenbosch; University of Science and Technology of China+. From NIST 2007 LRE Website.
12 How Well Can it Be Done – What Languages? From NIST 2007 LRE Website
13 How Well Can It Be Done? – Testing Conditions. 26 languages and dialects. Telephone speech. Multiple duration conditions: 3, 10, 30 seconds. Detection Error Tradeoff (DET) Curves used to measure performance.
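A DET curve plots the miss rate against the false-alarm rate as the decision threshold sweeps over the detector's scores. The sketch below computes those operating points from two score lists. The function name and the toy scores are hypothetical, and the normal-deviate axis scaling that DET plots usually apply is left out.

```python
def det_points(target_scores, nontarget_scores):
    """Return (threshold, miss_rate, false_alarm_rate) for every observed score.

    A trial is accepted when its score is >= the threshold, so a target trial
    below the threshold is a miss and a non-target trial at or above it is a
    false alarm.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    points = []
    for t in thresholds:
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        points.append((t, p_miss, p_fa))
    return points

# Hypothetical toy scores: higher means "more likely the target language".
for threshold, p_miss, p_fa in det_points([2.0, 3.0], [0.0, 1.0]):
    print(threshold, p_miss, p_fa)
```

Raising the threshold trades false alarms for misses, which is the tradeoff the curve makes visible.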
14 How Well Can it Be Done – Some Numbers From NIST 2007 LRE Website
15 Language Identification Project. Build a language ID system with the Call Friend data set. Implement several of the main techniques. Set up a demo on your laptop that will recognize someone’s language.
16 Flavors of Speaker Recognition. Our Focus! From JHU 2002 SuperSID Final Presentation – Reynolds et al.
17 Speaker Recognition – Why Do It? Personal Applications: voice-print passwords; voicemail transcription – who left that message? Business Applications: calling your bank. Government: Is that Osama calling from Pakistan? Prison call monitoring. Automated parolee calling – is he where you think?
18 How Do You Do It? The most basic approach: Gaussian Mixture Models - 4/23. More recently: Support vector machines operating on GMMs (!)
19 How Do You Do It? (2) Also use high-level information! From JHU 2002 SuperSID Final Presentation – Reynolds et al.
20 How Well Can It Be Done – Who Salutes? From NIST 2008 SRE Presentation, Martin & Greenberg
21 More Salutes. From NIST 2008 SRE Presentation, Martin & Greenberg
22 From Europe. From NIST 2008 SRE Presentation, Martin & Greenberg
23 More From Europe. From NIST 2008 SRE Presentation, Martin & Greenberg
24 U.S. Entries. From NIST 2008 SRE Presentation, Martin & Greenberg
25 How Well Can It Be Done – Testing Conditions. Conditions for different amounts of data: 10 sec., 3-5 minutes, 8 minutes. Separate channel and summed channel conditions. English speakers, non-English speakers, multilingual speakers.
27 Speaker Verification Project. Implement a Speaker-ID system: template based, GMM based, SVM based, vector space model. Demonstrate it: NIST data, e.g. Evaluation; your own voice – implement on laptop.
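One simple reading of a "template based" system is a vector-quantization codebook per speaker: each test frame is matched to its nearest codeword, and the speaker whose codebook gives the lowest average distortion wins. This is an assumed interpretation for illustration, not necessarily the method intended for the project. The codebooks and names below are made up.

```python
def nearest_distortion(frame, codebook):
    """Squared Euclidean distance from a frame to its closest codeword."""
    return min(sum((x - c) ** 2 for x, c in zip(frame, codeword))
               for codeword in codebook)

def average_distortion(frames, codebook):
    """Mean quantization distortion of the frames against one codebook."""
    return sum(nearest_distortion(f, codebook) for f in frames) / len(frames)

def identify_speaker(frames, codebooks):
    """Return the speaker whose codebook quantizes the frames with least distortion."""
    scores = {spk: average_distortion(frames, cb) for spk, cb in codebooks.items()}
    return min(scores, key=scores.get), scores

# Hypothetical toy codebooks: a few 2-D codewords per enrolled speaker.
TOY_CODEBOOKS = {
    "alice": [[0.0, 0.0], [1.0, 2.0]],
    "bob": [[5.0, 5.0], [6.0, 4.0]],
}

speaker, _ = identify_speaker([[0.9, 2.1], [0.1, -0.2]], TOY_CODEBOOKS)
print(speaker)
```

For verification rather than identification, the same distortion could be compared against a threshold for the claimed speaker.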
28 Speech Recognition Project. Implement an HMM based recognition system. Use, e.g., Phonebook isolated word data set or Aurora digit set. Write features with existing front-end. Build your own HMM trainer/decoder. Set it up on your laptop for online word recognition (?!)
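The decoder half of an HMM trainer/decoder is typically a dynamic-programming search. The sketch below is a minimal log-domain Viterbi decoder for a discrete-observation HMM. The two-state model and its probabilities are hypothetical, and a real recognizer would use continuous feature vectors and word or phone models.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state path and its log-probability."""
    # best[t][s]: best log score of any path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]

def log_table(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}

# Hypothetical toy HMM: two states, two discrete observation symbols.
STATES = ["A", "B"]
LOG_START = {s: math.log(0.5) for s in STATES}
LOG_TRANS = log_table({"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.3, "B": 0.7}})
LOG_EMIT = log_table({"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.1, "y": 0.9}})

path, score = viterbi(list("xxyy"), STATES, LOG_START, LOG_TRANS, LOG_EMIT)
print(path)
```

Working in the log domain avoids the underflow that products of many small probabilities would cause on longer utterances.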
29 Highlights of Syllabus. Required Texts: Huang, Acero, Hon: Spoken Language Processing; Deng and O’Shaughnessy: Speech Processing; EE516 Reader, at Professional Copy ‘n Print, University Way. Grading: Projects: 50%; Final Exam: 30%; Homework: 20%. Projects: small team or individual; teams are self-forming; presentation times TBD; read ahead & pick an area!!!; talk to relevant instructor; suggest deciding no later than 4/30. Office Hours at end of class and by appointment. Please sign in on list!