Geoffrey Zweig Microsoft Research 4/2/2009

1 EE 516 Lecture 1
Geoffrey Zweig, Microsoft Research, 4/2/2009

2 Our Topics
Introducing today!
From JHU 2002 SuperSID Final Presentation – Reynolds et al.

3 Topic Coverage By Day
Data Representations and Models (4/23)
  Vector Quantization
  Gaussian Mixtures
  The EM Algorithm
Speaker Identification (5/7)
Language Identification (5/7)
Hidden Markov Models (5/14)
  Dynamic Programming
Building a Speech Recognizer (5/14)

4 Language Identification – Why Do It?
Multilingual society: applications should be able to deal with anyone
Businesses: automated help systems (reservations, account access, etc.)
Travel: airport kiosks, train stations
Government: funds research to identify languages and runs evaluations in it

5 How Do You Do It?
Score the speech with an English acoustic model, a French acoustic model, and a Tamil acoustic model; output the likeliest language.
Gaussian Mixture Models – 4/23
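The recipe on this slide (one acoustic model per language, output the likeliest) can be sketched with diagonal-covariance Gaussian mixtures. This is a minimal sketch under stated assumptions: the helper names (`log_gauss_diag`, `gmm_frame_loglik`, `identify_language`) are hypothetical, frames are treated as independent, and the toy two-dimensional parameters are made up rather than trained (a real system would train each GMM with EM on cepstral feature vectors).

```python
import math

# Hypothetical helpers. Each GMM is a list of (weight, mean, var) tuples with
# diagonal covariances; each frame is a list of floats.

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def gmm_frame_loglik(x, gmm):
    """log p(x) under one GMM: log-sum over components of weight * density."""
    return logsumexp([math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm])

def identify_language(frames, models):
    """Score frames (assumed independent) under every language GMM; return the likeliest."""
    scores = {
        lang: sum(gmm_frame_loglik(x, gmm) for x in frames) / len(frames)
        for lang, gmm in models.items()
    }
    return max(scores, key=scores.get), scores

# Toy 2-D models with made-up parameters, standing in for trained GMMs.
models = {
    "English": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "French": [(1.0, [5.0, 5.0], [1.0, 1.0])],
    "Tamil": [(1.0, [-5.0, 5.0], [1.0, 1.0])],
}
frames = [[0.1, 0.2], [0.9, 1.1], [0.5, 0.4]]
best, scores = identify_language(frames, models)
print(best)  # English
```

Averaging the per-frame log-likelihood instead of summing it keeps scores comparable across utterances of different lengths; for a single utterance the ranking of languages is the same either way.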

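6 How Do You Do It? (2)
Decode the phone string with simple HMMs, then ask which language's phone language model finds it most likely:
“p ih n s” – probably English…
“k r p s t” – probably Czech…
Simple HMMs – 5/14; Language Models – 4/30
After Zissman 1996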
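The phone-string idea can be sketched as phone recognition followed by one phone bigram language model per language. In this minimal sketch the phone recognizer is assumed to have already produced the strings, the function names are hypothetical, add-one smoothing is an assumed (deliberately simple) choice, and the tiny "English" and "Czech" training strings are invented for illustration.

```python
import math
from collections import Counter

def train_bigram(phone_strings):
    """Count phone bigrams, with boundary markers, from one language's training strings."""
    bigrams, unigrams, vocab = Counter(), Counter(), set()
    for s in phone_strings:
        phones = ["<s>"] + s.split() + ["</s>"]
        vocab.update(phones)
        for a, b in zip(phones, phones[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1  # count of a as a left context
    return bigrams, unigrams, vocab

def bigram_logprob(phone_string, model, vocab_size):
    """Add-one smoothed log P(phone sequence) under one bigram model."""
    bigrams, unigrams, _ = model
    phones = ["<s>"] + phone_string.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(phones, phones[1:])
    )

def classify(phone_string, models):
    """Return the language whose phone bigram model gives the highest log probability."""
    # A shared vocabulary size keeps the smoothed scores comparable across languages.
    vocab_size = len(set().union(*(m[2] for m in models.values())))
    scores = {lang: bigram_logprob(phone_string, m, vocab_size) for lang, m in models.items()}
    return max(scores, key=scores.get)

# Invented toy training data (hypothetical phone strings).
english_train = ["p ih n s", "s p ih n", "p ih n"]
czech_train = ["k r p s t", "s t r k", "k r k"]
models = {"English": train_bigram(english_train), "Czech": train_bigram(czech_train)}
print(classify("p ih n s", models))   # English
print(classify("k r p s t", models))  # Czech
```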

7 How Do You Do It? (3)
Same methods, multiple times
Acero et al., Chapter 4 – 4/23
After Zissman 1996

8 How Do You Do It? (4)
Run a complete speech recognizer in each language.
And we will see several other ways, and combinations!
After Zissman 1996

9 Gauging Progress – The NIST Evaluations
National Institute of Standards and Technology (NIST)
Has sponsored benchmark tests in multiple language processing areas for over a decade:
  Topic Detection & Tracking
  Content Extraction
  Video Analysis
  Speech Recognition
  Language Identification
  Speaker Identification
  Machine Translation
Coordinated with site funding by the Defense Advanced Research Projects Agency (DARPA)
Along with business interest, the driving force in advancing the state of the art

10 For Example, Progress in Speech Recognition

11 Language Identification – How Well Can It Be Done – Who Salutes?
Organization (Location)
Beijing Naphoo Technology Company+ (China)
Brno University of Technology (Czech Republic)
Georgia Institute of Technology (USA)
Groupe des Ecoles des Telecommunication, Ecole Nationale Superieure des Telecommunications (France)
IBM
IKERLAN Technological Research Center (Spain)
Institut de Recherche en Informatique de Toulouse
Institute for Infocomm Research (Singapore)
Institute of Acoustics, Chinese Academy of Sciences+
Institut National de Recherche sur les Transports et Leur Securite
International Computer Science Institute (USA)
Laboratoire d'Informatique pour la Mecanique et les Sciences de l'Ingenieur
MIT Lincoln Laboratory
Nanyang Technological University
Politecnico di Torino (Italy)
Spescom Datavoice (South Africa)
Telefonica I & D
TNO Human Factors (The Netherlands)
Tsinghua University
Universidad Autónoma de Madrid
University of the Basque Country
University of Stellenbosch
University of Science and Technology of China+
From NIST 2007 LRE Website

12 How Well Can It Be Done – What Languages?
From NIST 2007 LRE Website

13 How Well Can It Be Done? – Testing Conditions
26 languages and dialects
Telephone speech
Multiple duration conditions: 3, 10, and 30 seconds
Detection Error Tradeoff (DET) curves used to measure performance
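A DET curve comes from sweeping a decision threshold over detection scores and recording the miss and false-alarm probabilities at each setting; the two are then plotted against each other on normal-deviate axes. The sketch below computes the raw points and a simple equal error rate (EER) as a one-number summary. The function names and toy scores are hypothetical, and a trial is assumed to be accepted when its score is at or above the threshold.

```python
def det_points(target_scores, nontarget_scores):
    """(threshold, P_miss, P_fa) at every distinct score used as a threshold.

    A trial is accepted when its score >= threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    points = []
    for t in thresholds:
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        points.append((t, p_miss, p_fa))
    return points

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: average of P_miss and P_fa where they are closest."""
    _, p_miss, p_fa = min(
        det_points(target_scores, nontarget_scores),
        key=lambda point: abs(point[1] - point[2]),
    )
    return (p_miss + p_fa) / 2

# Toy detection scores (hypothetical).
targets = [2.1, 1.5, 0.9, 0.3]
nontargets = [-1.2, -0.4, 0.5, -2.0]
for t, p_miss, p_fa in det_points(targets, nontargets):
    print(f"threshold {t:+.1f}: P_miss={p_miss:.2f} P_fa={p_fa:.2f}")
print(equal_error_rate(targets, nontargets))  # 0.25
```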

14 How Well Can It Be Done – Some Numbers
From NIST 2007 LRE Website

15 Language Identification Project
Build a language ID system with the CallFriend data set
Implement several of the main techniques
Set up a demo on your laptop that will recognize someone’s language

16 Flavors of Speaker Recognition
Our Focus!
From JHU 2002 SuperSID Final Presentation – Reynolds et al.

17 Speaker Recognition – Why Do It?
Personal applications
  Voice-print passwords
  Voicemail transcription – who left that message?
Business applications
  Calling your bank
Government
  Is that Osama calling from Pakistan?
  Prison call monitoring
  Automated parolee calling – is he where you think?

18 How Do You Do It?
The most basic approach: Gaussian Mixture Models – 4/23
More recently: support vector machines operating on GMMs (!)
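A common way to use GMMs for verification (assumed here, not stated on the slide) is to compare the claimed speaker's GMM against a universal background model (UBM) trained on many speakers, and accept when the average per-frame log-likelihood ratio clears a threshold. In this minimal sketch the names, the zero threshold, and the one-dimensional toy models are all hypothetical; in practice the speaker model is usually adapted from the UBM rather than written by hand.

```python
import math

def gmm_loglik(x, gmm):
    """log p(x) for a diagonal-covariance GMM given as (weight, mean, var) tuples."""
    terms = [
        math.log(w) - 0.5 * sum(
            math.log(2 * math.pi * v) + (xi - m) ** 2 / v
            for xi, m, v in zip(x, mean, var)
        )
        for w, mean, var in gmm
    ]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def verify(frames, speaker_gmm, ubm, threshold=0.0):
    """Average per-frame log-likelihood ratio (speaker vs. UBM) and accept/reject decision."""
    llr = sum(gmm_loglik(x, speaker_gmm) - gmm_loglik(x, ubm) for x in frames) / len(frames)
    return llr, llr >= threshold

# Toy 1-D models with made-up parameters.
ubm = [(0.5, [0.0], [1.0]), (0.5, [3.0], [1.0])]
speaker = [(1.0, [3.0], [0.5])]
genuine_score, genuine_accept = verify([[2.9], [3.1], [3.0]], speaker, ubm)
impostor_score, impostor_accept = verify([[0.0], [0.1], [-0.2]], speaker, ubm)
print(genuine_accept, impostor_accept)  # True False
```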

19 How Do You Do It? (2)
Also use high-level information!
From JHU 2002 SuperSID Final Presentation – Reynolds et al.

20 How Well Can It Be Done – Who Salutes?
From NIST 2008 SRE Presentation, Martin & Greenberg

21 More Salutes
From NIST 2008 SRE Presentation, Martin & Greenberg

22 From Europe
From NIST 2008 SRE Presentation, Martin & Greenberg

23 More From Europe
From NIST 2008 SRE Presentation, Martin & Greenberg

24 U.S. Entries
From NIST 2008 SRE Presentation, Martin & Greenberg

25 How Well Can It Be Done – Testing Conditions
Conditions for different amounts of data: 10 sec., 3–5 minutes, 8 minutes
Separate-channel and summed-channel conditions
English speakers, non-English speakers, multilingual speakers

26 How Well Can It Be Done?

27 Speaker Verification Project
Implement a speaker ID system:
  Template based
  GMM based
  SVM based
  Vector space model
Demonstrate it:
  NIST data, e.g., evaluation data
  Your own voice – implement on laptop
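One way to realize the "template based" option is a vector-quantization codebook per speaker: train each codebook with k-means on that speaker's frames, then identify a test utterance by the codebook that quantizes it with the lowest average distortion. This is an assumed reading of the slide (template matching with dynamic time warping is another), and the function names and toy data are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(frames, size, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm) codebook for one speaker's training frames."""
    rng = random.Random(seed)
    codebook = [list(c) for c in rng.sample(frames, size)]
    for _ in range(iters):
        clusters = [[] for _ in range(size)]
        for x in frames:
            nearest = min(range(size), key=lambda k: sq_dist(x, codebook[k]))
            clusters[nearest].append(x)
        for k, members in enumerate(clusters):
            if members:  # keep the old codeword if its cluster emptied out
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

def distortion(frames, codebook):
    """Average squared distance from each frame to its nearest codeword."""
    return sum(min(sq_dist(x, c) for c in codebook) for x in frames) / len(frames)

def identify_speaker(frames, codebooks):
    """Pick the enrolled speaker whose codebook quantizes the test frames best."""
    return min(codebooks, key=lambda spk: distortion(frames, codebooks[spk]))

# Toy 2-D training frames (hypothetical).
alice_train = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9]]
bob_train = [[5.0, 5.0], [5.2, 4.9], [6.0, 6.0], [6.1, 5.8]]
codebooks = {"alice": train_codebook(alice_train, 2), "bob": train_codebook(bob_train, 2)}
print(identify_speaker([[0.1, 0.0], [1.0, 1.1]], codebooks))  # alice
```

The same distortion score could serve verification by thresholding it for a claimed identity, though how to set that threshold is left open here.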

28 Speech Recognition Project
Implement an HMM-based recognition system
Use, e.g., the Phonebook isolated-word data set or the Aurora digit set
Write features with an existing front end
Build your own HMM trainer/decoder
Set it up on your laptop for online word recognition (?!)
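The decoder half of this project centers on the Viterbi algorithm, a dynamic programming search for the best state path. This minimal sketch assumes discrete observation symbols (for example, vector-quantized frames) and a hypothetical three-state left-to-right word model; a Gaussian-mixture recognizer would replace the `emit` table with per-state GMM log-likelihoods. For isolated-word recognition, one would run it once per word model and pick the word with the highest score.

```python
import math

def lg(p):
    """log(p), with log(0) mapped to -inf so impossible transitions are never chosen."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(obs, states, start, trans, emit):
    """Best state path and its log probability for a discrete-observation HMM."""
    # delta[s]: log score of the best partial path ending in state s at this frame
    delta = {s: lg(start[s]) + lg(emit[s][obs[0]]) for s in states}
    backptrs = []
    for o in obs[1:]:
        prev, delta, ptr = delta, {}, {}
        for t in states:
            best = max(states, key=lambda s: prev[s] + lg(trans[s][t]))
            delta[t] = prev[best] + lg(trans[best][t]) + lg(emit[t][o])
            ptr[t] = best
        backptrs.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1], delta[last]

# Hypothetical left-to-right word model over symbols "a", "b", "c".
states = ["s1", "s2", "s3"]
start = {"s1": 1.0, "s2": 0.0, "s3": 0.0}
trans = {
    "s1": {"s1": 0.6, "s2": 0.4, "s3": 0.0},
    "s2": {"s1": 0.0, "s2": 0.6, "s3": 0.4},
    "s3": {"s1": 0.0, "s2": 0.0, "s3": 1.0},
}
emit = {
    "s1": {"a": 0.8, "b": 0.1, "c": 0.1},
    "s2": {"a": 0.1, "b": 0.8, "c": 0.1},
    "s3": {"a": 0.1, "b": 0.1, "c": 0.8},
}
path, score = viterbi(["a", "a", "b", "c", "c"], states, start, trans, emit)
print(path)  # ['s1', 's1', 's2', 's3', 's3']
```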

29 Highlights of Syllabus
Required texts:
  Huang, Acero, Hon: Spoken Language Processing
  Deng and O’Shaughnessy: Speech Processing
  EE516 Reader, at Professional Copy ‘n Print, University Way
Grading:
  Projects: 50%
  Final exam: 30%
  Homework: 20%
Projects:
  Small team or individual
  Teams are self-forming
  Presentation times TBD
  Read ahead & pick an area!!!
  Talk to relevant instructor
  Suggest deciding no later than 4/30
Office hours at end of class and by appointment
Please sign in on list!

