The CUED Speech Group Dr Mark Gales Machine Intelligence Laboratory Cambridge University Engineering Department.


1 The CUED Speech Group
Dr Mark Gales
Machine Intelligence Laboratory
Cambridge University Engineering Department

2 1. CUED Organisation
CUED: 6 Divisions
- A. ThermoFluids
- B. Electrical Eng
- C. Mechanics
- D. Structures
- E. Management
- F. Information Engineering Division
130 Academic Staff, 1100 Undergrads, 450 Postgrads
Information Engineering Division
- Control Lab
- Signal Processing Lab
- Computational and Biological Learning Lab
- Machine Intelligence Lab
  - Speech Group
  - Vision Group
  - Medical Imaging Group
Speech Group
- 4 Staff: Bill Byrne, Mark Gales, Phil Woodland, Steve Young
- 9 RAs
- 12 PhDs

3 2. Speech Group Overview
Primary research interests in speech processing
- 4 members of Academic Staff
- 9 Research Assistants/Associates
- 12 PhD students
Diagram labels
- PhD Projects in Fundamental Speech Technology Development (10-15 students)
- Funded Projects in Recognition/Translation/Synthesis (5-10 RAs)
- MPhil in Computer Speech, Text and Internet Technology
- Computer Laboratory NLIP Group
- HTK Software Tools Development
- Computer Speech and Language
- International Community

4 Principal Staff and Research Interests
Dr Bill Byrne
- Statistical machine translation
- Automatic speech recognition
- Cross-lingual adaptation and synthesis
Dr Mark Gales
- Large vocabulary speech recognition
- Speaker and environment adaptation
- Kernel methods for speech processing
Professor Phil Woodland
- Large vocabulary speech recognition/meta-data extraction
- Information retrieval from audio
- ASR and SMT integration
Professor Steve Young
- Statistical dialogue modelling
- Voice conversion

5 Research Interests
- data driven semantic processing
- statistical modelling
- data driven techniques
- voice transformation
- HMM-based techniques
- large vocabulary systems [Eng, Chinese, Arabic]
- acoustic model training and adaptation
- language model training and adaptation
- rich text transcription & spoken document retrieval
- fundamental theory of statistical modelling and pattern processing
- statistical machine translation
- finite state transducer framework

6 Example Current and Recent Projects
- Global Autonomous Language Exploitation
  - DARPA GALE funded (collab with BBN, LIMSI, ISI …)
- HTK Rich Audio Transcription Project (finished 2004)
  - DARPA EARS funded
- CLASSiC: Computational Learning in Adaptive Systems for Spoken Conversation
  - EU (collab with Edinburgh, France Telecom, …)
- EMIME: Effective Multilingual Interaction in Mobile Environments
  - EU (collab with Edinburgh, IDIAP, Nagoya Institute of Technology …)
- R2EAP: Rapid and Reliable Environment Aware Processing
  - TREL funded
Also active collaborations with IBM, Google, Microsoft, …

7 3. Rich Audio Transcription Project
DARPA-funded project
- Effective Affordable Reusable Speech-to-text (EARS) program
Transform natural speech into human readable form
- Need to add meta-data to the ASR output
- For example speaker-terms/handle disfluencies
Diagram labels: New algorithms; Natural Speech; Rich Transcript; English/Mandarin

8 Rich Text Transcription
ASR Output
  okay carl uh do you exercise yeah actually um i belong to a gym down here gold’s gym and uh i try to exercise five days a week um and now and then i’ll i’ll get it interrupted by work or just full of crazy hours you know
Meta-Data Extraction (MDE) Markup
  Speaker1: / okay carl {F uh} do you exercise /
  Speaker2: / {DM yeah actually} {F um} i belong to a gym down here / / gold’s gym / / and {F uh} i try to exercise five days a week {F um} / / and now and then [REP i’ll + i’ll] get it interrupted by work or just full of crazy hours {DM you know } /
Final Text
  Speaker1: Okay Carl do you exercise?
  Speaker2: I belong to a gym down here, Gold’s Gym, and I try to exercise five days a week and now and then I’ll get it interrupted by work or just full of crazy hours.
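The step from MDE markup to cleaned text can be illustrated with a minimal sketch. It assumes the markup conventions visible in the example, read here as {F ...} for fillers, {DM ...} for discourse markers, [REP a + b] for a repetition where only the second part is kept, and / for unit boundaries. The function name mde_to_text is hypothetical, and the sketch does not attempt the capitalisation and punctuation shown in the final text.

```python
import re


def mde_to_text(marked_up):
    """Strip MDE-style markup from one speaker turn.

    Assumed conventions (taken from the slide's example only):
      {F ...}        filler, dropped
      {DM ...}       discourse marker, dropped
      [REP a + b]    repetition, only the part after '+' is kept
      /              boundary between sentence-like units
    Returns the list of cleaned units, lower-cased as in the input.
    """
    # Drop fillers and discourse markers.
    text = re.sub(r"\{(?:F|DM)\s+[^}]*\}", " ", marked_up)
    # Keep only the second half of a repetition.
    text = re.sub(r"\[REP\s+[^+\]]*\+\s*([^\]]*)\]", r"\1", text)
    # Split on unit boundaries and normalise whitespace.
    units = [" ".join(unit.split()) for unit in text.split("/")]
    return [unit for unit in units if unit]


if __name__ == "__main__":
    turn = "/ and now and then [REP i'll + i'll] get it interrupted {DM you know } /"
    print(mde_to_text(turn))
```

A real system would have to predict this markup from the ASR output first; the sketch only covers the final, deterministic clean-up.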

9 4. Statistical Machine Translation
Aim is to translate from one language to another
- For example translate text from Chinese to English
Process involves collecting parallel (bitext) corpora
- Align at document/sentence/word level
Use statistical approaches to obtain most probable translation
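The slide does not give a formula for "most probable translation". One common way to write it, shown here as an assumption rather than as the group's specific model, is the noisy-channel decomposition, where f is the source sentence (for example Chinese), e is a candidate English sentence, P(f | e) is a translation model estimated from the aligned bitext, and P(e) is a language model:

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e} \frac{P(f \mid e)\, P(e)}{P(f)}
        \;=\; \arg\max_{e} P(f \mid e)\, P(e)
```

The last step holds because P(f) does not depend on the candidate e.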

10 GALE: Integrated ASR and SMT
The DARPA Global Autonomous Language Exploitation (GALE) program aims to develop speech and language processing technologies to recognise, analyse, and translate speech and text into readable English.
- Member of the AGILE team (led by BBN)
- Primary languages for STT/SMT: Chinese and Arabic

11 5. Statistical Dialogue Modelling
Use a statistical framework for all stages
Diagram labels: Speech Understanding; Speech Generation; Dialogue Manager; System; Waveforms; Words/Concepts; Dialogue Acts

12 CLASSiC: Project Architecture
Pipeline: Speech Input (s_t) -> ASR -> u_t -> NLU -> h_t -> DM -> a_t -> NLG -> w_t -> TTS -> r_t -> Speech output
Other diagram labels: Context t-1; 1-Best Signal Selection; X between stages
Legend
- ASR: Automatic Speech Recognition
- NLU: Natural Language Understanding
- DM: Dialogue Management
- NLG: Natural Language Generation
- TTS: Text To Speech
- s_t: Input Sound Signal
- u_t: Utterance Hypotheses
- h_t: Conceptual Interpretation Hypotheses
- a_t: Action Hypotheses
- w_t: Word String Hypotheses
- r_t: Speech Synthesis Hypotheses
- X: possible elimination of hypotheses
See http://classic-project.org
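The idea of passing scored hypotheses from stage to stage, with possible elimination at each "X" and a 1-best selection at the end, can be sketched as follows. This is a toy illustration only: the stage functions, the scores, and the names run_stage and run_pipeline are hypothetical and are not taken from the CLASSiC system.

```python
def run_stage(stage, inputs, n_best):
    """Expand each input hypothesis through one stage and prune.

    `inputs` is a list of (content, score) pairs; `stage` maps a content
    string to a list of (output, score) pairs. Scores are multiplied along
    the path, and only the n_best highest-scoring outputs survive (the "X"
    in the diagram: possible elimination of hypotheses).
    """
    expanded = [
        (output, score_in * score_out)
        for content, score_in in inputs
        for output, score_out in stage(content)
    ]
    expanded.sort(key=lambda hyp: hyp[1], reverse=True)
    return expanded[:n_best]


def run_pipeline(stages, signal, n_best=3):
    """Run the signal through all stages, then pick the 1-best hypothesis."""
    hypotheses = [(signal, 1.0)]
    for stage in stages:
        hypotheses = run_stage(stage, hypotheses, n_best)
    return hypotheses[0]


# Toy stand-ins for ASR and NLU (assumed values, for illustration only).
def toy_asr(signal):
    return [("book a table", 0.75), ("look a table", 0.25)]


def toy_nlu(words):
    table = {
        "book a table": [("inform(task=book)", 0.75), ("null()", 0.25)],
        "look a table": [("null()", 1.0)],
    }
    return table[words]


if __name__ == "__main__":
    print(run_pipeline([toy_asr, toy_nlu], "s_t"))
```

Keeping several hypotheses alive until the final selection lets a later stage recover from an early error, which a pipeline that commits to the 1-best output at every stage cannot do.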

13 6. EMIME: Speech-to-Speech Translation
Personalised speech-to-speech translation
- Learn characteristics of a user’s speech
- Reproduce user’s speech in synthesis
Cross-lingual capability
- Map speaker characteristics across languages
Unified approach for recognition and synthesis
- Common statistical model: hidden Markov models
- Simplifies adaptation (common to both synthesis and recognition)
Improve understanding of recognition/synthesis
See http://emime.org

14 7. R2EAP: Robust Speech Recognition
- Current ASR performance degrades with changing noise
- Major limitation on deploying speech recognition systems

15 Project Overview
Aims of the project
1. To develop techniques that allow an ASR system to rapidly respond to changing acoustic conditions;
2. While maintaining high levels of recognition accuracy over a wide range of conditions;
3. And be flexible so they are applicable to a wide range of tasks and computational requirements.
Project started in January 2008, 3-year duration
Close collaboration with TREL Cambridge Lab.
- Common development code-base: extended HTK
- Common evaluation sets
- Builds on current (and previous) PhD studentships
- Monthly joint meetings

16 Approach: Model Compensation
- Model compensation schemes highly effective BUT
  - Slow compared to feature compensation schemes
- Need schemes to improve speed while maintaining performance
- Also automatically detect/track changing noise conditions
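To see why model compensation is costly, consider a minimal sketch. The slide does not name a specific scheme; the log-add approximation used below is one simple, well-known way to combine a clean-speech mean with an additive-noise mean in the log-spectral domain, and is assumed here purely for illustration. It adjusts means only and ignores variances. The function names are hypothetical.

```python
import math


def log_add_compensate(clean_mean, noise_mean):
    """Log-add approximation for one Gaussian mean (log-spectral domain).

    Each dimension becomes log(exp(clean) + exp(noise)), i.e. speech and
    noise are assumed to add in the linear spectral domain.
    """
    return [
        math.log(math.exp(x) + math.exp(n))
        for x, n in zip(clean_mean, noise_mean)
    ]


def compensate_model(component_means, noise_mean):
    """Apply the compensation to every Gaussian component in the model.

    The work grows with the number of components, and has to be repeated
    whenever the noise estimate changes. Feature compensation instead
    modifies each incoming feature vector and leaves the model untouched.
    """
    return [log_add_compensate(mean, noise_mean) for mean in component_means]


if __name__ == "__main__":
    means = [[1.0, 2.0], [3.0, 4.0]]   # toy model with two components
    noise = [0.5, 0.5]                 # toy noise estimate
    print(compensate_model(means, noise))
```

A large vocabulary system can contain a very large number of Gaussian components, so any per-component update of this kind quickly dominates the cost of responding to a new noise condition.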

17 8. Toshiba-CUED PhD Collaborations
To date 5 Research studentships (partly) funded by Toshiba
- Shared software: code transfer in both directions
- Shared data sets: both (emotional) synthesis and ASR
- 6-monthly reports and review meetings
Students and topics
- Hank Liao (2003-2007): Uncertainty decoding for Noise Robust ASR
- Catherine Breslin (2004-2008): Complementary System Generation and Combination
- Zeynep Inanoglu (2004-2008): Recognition and Synthesis of Emotion
- Rogier van Dalen (2007-2010): Noise Robust ASR
- Stuart Moore (2007-2010): Number Sense Disambiguation
Very useful and successful collaboration

18 9. HTK Version 3.0 Development
HTK is a free software toolkit for developing HMM-based systems
- 1000’s of users worldwide
- Widely used for research by universities and industry
Period        Versions      Development
1989 - 1992   V1.0 - 1.4    Initial development at CUED
1993 - 1999   V1.5 - 2.3    Commercial development by Entropic
2000 - date   V3.0 - V3.4   Academic development at CUED
2004 - date: the ATK Real-time HTK-based recognition system
- Development partly funded by Microsoft and DARPA EARS Project
- Primary dissemination route for CU research output

19 10. Summary
Speech Group works on many aspects of speech processing
- Large vocabulary speech recognition
- Statistical machine translation
- Statistical dialogue systems
- Speech synthesis and voice conversion
Statistical machine learning approach to all applications
World-wide reputation for research
- CUED systems have defined state-of-the-art for the past decade
- Developed a number of techniques widely used by industry
Hidden Markov Model Toolkit (HTK)
- Freely-available software, 1000’s of users worldwide
- State-of-the-art features (discriminative training, adaptation …)
- HMM Synthesis extension (HTS) from Nagoya Institute of Technology
