The CUED Speech Group Dr Mark Gales Machine Intelligence Laboratory


The CUED Speech Group
Dr Mark Gales
Machine Intelligence Laboratory
Cambridge University Engineering Department

1. CUED Organisation

CUED: 130 academic staff, 1100 undergrads, 450 postgrads

CUED: 6 Divisions
- A. ThermoFluids
- B. Electrical Eng
- C. Mechanics
- D. Structures
- E. Management
- F. Information Engineering Division
  - Control Lab
  - Signal Processing Lab
  - Computational and Biological Learning Lab
  - Machine Intelligence Lab
    - Speech Group (4 staff: Bill Byrne, Mark Gales, Phil Woodland, Steve Young; 9 RAs; 12 PhDs)
    - Vision
    - Medical Imaging

2. Speech Group Overview

- Primary research interests in speech processing
- 4 members of academic staff, 9 Research Assistants/Associates, 12 PhD students
- Funded projects in recognition/translation/synthesis (5-10 RAs)
- PhD projects in fundamental speech technology development (10-15 students)
- MPhil in Computer Speech, Text and Internet Technology (with the Computer Laboratory NLIP Group)
- Computer Speech and Language
- HTK software tools development for the international community

Principal Staff and Research Interests

Dr Bill Byrne
- Statistical machine translation
- Automatic speech recognition
- Cross-lingual adaptation and synthesis

Dr Mark Gales
- Large vocabulary speech recognition
- Speaker and environment adaptation
- Kernel methods for speech processing

Professor Phil Woodland
- Large vocabulary speech recognition / meta-data extraction
- Information retrieval from audio
- ASR and SMT integration

Professor Steve Young
- Statistical dialogue modelling
- Voice conversion

Research Interests

Synthesis
- data-driven techniques
- voice transformation
- HMM-based techniques

Translation
- statistical machine translation
- finite state transducer framework

Dialogue
- data-driven semantic processing
- statistical modelling

Recognition
- large vocabulary systems [Eng, Chinese, Arabic]
- acoustic model training and adaptation
- language model training and adaptation
- rich text transcription & spoken document retrieval

Machine Learning
- fundamental theory of statistical modelling and pattern processing

Example Current and Recent Projects

- Global Autonomous Language Exploitation: DARPA GALE funded (collab. with BBN, LIMSI, ISI, …)
- HTK Rich Audio Transcription Project (finished 2004): DARPA EARS funded
- CLASSIC: Computational Learning in Adaptive Systems for Spoken Conversation: EU funded (collab. with Edinburgh, France Telecom, …)
- EMIME: Effective Multilingual Interaction in Mobile Environments: EU funded (collab. with Edinburgh, IDIAP, Nagoya Institute of Technology, …)
- R2EAP: Rapid and Reliable Environment Aware Processing: TREL funded
- Also active collaborations with IBM, Google, Microsoft, …

3. Rich Audio Transcription Project

- DARPA-funded project under the Effective Affordable Reusable Speech-to-text (EARS) program
- Transform natural speech (English/Mandarin) into human-readable form
- New algorithms produce a rich transcript from natural speech
- Need to add meta-data to the ASR output, e.g. speaker turns, handling of disfluencies
- See http://mi.eng.cam.ac.uk/research/projects/EARS/index.html

Rich Text Transcription

ASR Output:
okay carl uh do you exercise yeah actually um i belong to a gym down here gold's gym and uh i try to exercise five days a week um and now and then i'll i'll get it interrupted by work or just full of crazy hours you know

Meta-Data Extraction (MDE) Markup:
Speaker1: / okay carl {F uh} do you exercise /
Speaker2: / {DM yeah actually} {F um} i belong to a gym down here / / gold's gym / / and {F uh} i try to exercise five days a week {F um} / / and now and then [REP i'll + i'll] get it interrupted by work or just full of crazy hours {DM you know } /

Final Text:
Speaker1: Okay Carl do you exercise?
Speaker2: I belong to a gym down here, Gold's Gym, and I try to exercise five days a week and now and then I'll get it interrupted by work or just full of crazy hours.
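The step from MDE markup to final text can be automated by stripping the annotations. A minimal sketch, assuming the markup conventions shown in the example ({F …} for fillers, {DM …} for discourse markers, [REP a + b] for repetitions, and "/" as a sentence-unit delimiter); the function name is invented for illustration:

```python
import re

def mde_to_text(markup: str) -> str:
    """Convert MDE-annotated transcript markup into readable text."""
    # Drop filled pauses {F uh} and discourse markers {DM you know}
    text = re.sub(r"\{(?:F|DM)[^}]*\}", "", markup)
    # For repetitions [REP i'll + i'll], keep only the text after "+"
    text = re.sub(r"\[REP[^+\]]*\+([^\]]*)\]", r"\1", text)
    # Remove sentence-unit delimiters and collapse whitespace
    text = text.replace("/", " ")
    return re.sub(r"\s+", " ", text).strip()

print(mde_to_text("/ okay carl {F uh} do you exercise /"))
# → okay carl do you exercise
```

A real MDE system must of course detect these phenomena in the first place; this sketch only covers the final markup-removal step.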

4. Statistical Machine Translation

- Aim is to translate from one language to another, e.g. translate text from Chinese to English
- Process involves collecting parallel (bitext) corpora
- Align at document/sentence/word level
- Use statistical approaches to obtain the most probable translation
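The "most probable translation" criterion is conventionally written as a noisy-channel argmax over English hypotheses e for a foreign sentence f: ê = argmax_e P(f|e)P(e). A toy sketch of that decision rule, with all probability values invented for illustration:

```python
# Toy noisy-channel decoder: pick the English hypothesis e that
# maximises P(f | e) * P(e). All numbers below are invented.
translation_model = {  # P(foreign sentence | English hypothesis)
    "thank you": 0.6,
    "thanks": 0.3,
}
language_model = {  # P(English hypothesis)
    "thank you": 0.5,
    "thanks": 0.4,
}

def decode(candidates):
    """Return the candidate maximising translation * language model score."""
    return max(candidates, key=lambda e: translation_model[e] * language_model[e])

print(decode(["thank you", "thanks"]))
# → thank you   (0.6 * 0.5 = 0.30 beats 0.3 * 0.4 = 0.12)
```

In practice the model probabilities are estimated from the aligned bitext corpora described above, and the search is over an enormous hypothesis space rather than a short list.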

GALE: Integrated ASR and SMT

- Member of the AGILE team (led by BBN)
- The DARPA Global Autonomous Language Exploitation (GALE) program aims to develop speech and language processing technologies to recognise, analyse, and translate speech and text into readable English
- Primary languages for STT/SMT: Chinese and Arabic
- See http://mi.eng.cam.ac.uk/research/projects/AGILE/index.html

5. Statistical Dialogue Modelling

- Pipeline: Speech Understanding → Dialogue Manager → Generation → System output
- Representations: Waveforms → Words/Concepts → Dialogue Acts
- Use a statistical framework for all stages
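In a statistical dialogue manager the user's goal is never observed directly, so the system maintains a belief (a distribution over goals) and updates it from the understanding module's output. A minimal sketch of that Bayesian belief update; the goal names and likelihood values are invented for illustration:

```python
# Minimal belief-tracking sketch: b'(g) ∝ P(observation | g) * b(g).
# Goal names and all probabilities are invented for illustration.

def update_belief(belief, obs_likelihood):
    """One Bayesian belief update over user goals g."""
    unnorm = {g: obs_likelihood.get(g, 1e-6) * p for g, p in belief.items()}
    z = sum(unnorm.values())  # normalising constant
    return {g: p / z for g, p in unnorm.items()}

belief = {"want_flight": 0.5, "want_hotel": 0.5}   # uniform prior
# Understanding output: likelihood of the user's words under each goal
belief = update_belief(belief, {"want_flight": 0.8, "want_hotel": 0.1})
print(max(belief, key=belief.get))
# → want_flight
```

The dialogue manager then chooses its next dialogue act as a function of this belief rather than of a single 1-best hypothesis, which is what makes the framework robust to recognition errors.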

CLASSiC: Project Architecture

Speech input (st) → ASR → (ut) → NLU → (ht) → DM → (at) → NLG → (wt) → TTS → (rt) → Speech output, with 1-best signal selection at each stage and context carried over from turn t-1.

Legend:
- ASR: Automatic Speech Recognition
- NLU: Natural Language Understanding
- DM: Dialogue Management
- NLG: Natural Language Generation
- TTS: Text-To-Speech
- st: input sound signal
- ut: utterance hypotheses
- ht: conceptual interpretation hypotheses
- at: action hypotheses
- wt: word string hypotheses
- rt: speech synthesis hypotheses
- x: possible elimination of hypotheses

See http://classic-project.org

6. EMIME: Speech-to-Speech Translation

- Personalised speech-to-speech translation
  - learn the characteristics of a user's speech
  - reproduce the user's speech in synthesis
- Cross-lingual capability: map speaker characteristics across languages
- Unified approach for recognition and synthesis
  - common statistical model: hidden Markov models
  - simplifies adaptation (common to both synthesis and recognition)
  - improves understanding of recognition/synthesis
- See http://emime.org

7. R2EAP: Robust Speech Recognition

- Current ASR performance degrades with changing noise
- This is a major limitation on deploying speech recognition systems

Project Overview

Aims of the project:
- develop techniques that allow an ASR system to respond rapidly to changing acoustic conditions;
- while maintaining high recognition accuracy over a wide range of conditions;
- and remaining flexible, so the techniques are applicable to a wide range of tasks and computational requirements.

- Project started in January 2008; 3-year duration
- Close collaboration with the TREL Cambridge Lab:
  - common development code-base (extended HTK)
  - common evaluation sets
  - monthly joint meetings
- Builds on current (and previous) PhD studentships
- See http://mi.eng.cam.ac.uk/~mjfg/REAP/index.html

Approach: Model Compensation

- Model compensation schemes are highly effective, BUT slow compared to feature compensation schemes
- Need schemes that improve speed while maintaining performance
- Also need to automatically detect/track changing noise conditions
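Model compensation adapts the acoustic model's parameters to the observed noise, rather than cleaning the incoming features. A minimal sketch of the log-add approximation for additive noise in the log-spectral domain (a strong simplification of full compensation schemes such as VTS; all mean values are invented for illustration):

```python
import math

def log_add_compensate(clean_mean, noise_mean):
    """Log-add approximation: mu_y = log(exp(mu_x) + exp(mu_n)).

    Shifts a clean-speech Gaussian mean (log-spectral domain) to match
    speech corrupted by additive noise, instead of enhancing features.
    """
    return [math.log(math.exp(x) + math.exp(n))
            for x, n in zip(clean_mean, noise_mean)]

clean = [2.0, 1.0, 0.5]   # clean-speech HMM mean vector (invented)
noise = [0.1, 0.8, 1.5]   # estimated noise mean vector (invented)
print(log_add_compensate(clean, noise))
```

The cost issue noted above arises because this transformation (and its more accurate variants) must be applied to every Gaussian in a large-vocabulary system each time the noise estimate changes, whereas feature compensation touches only the single incoming feature vector.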

8. Toshiba-CUED PhD Collaborations

- To date, 5 research studentships (partly) funded by Toshiba
- Shared software: code transfer in both directions
- Shared data sets: both (emotional) synthesis and ASR
- 6-monthly reports and review meetings

Students and topics:
- Hank Liao (2003-2007): Uncertainty Decoding for Noise Robust ASR
- Catherine Breslin (2004-2008): Complementary System Generation and Combination
- Zeynep Inanoglu (2004-2008): Recognition and Synthesis of Emotion
- Rogier van Dalen (2007-2010): Noise Robust ASR
- Stuart Moore (2007-2010): Number Sense Disambiguation

A very useful and successful collaboration.

9. HTK Version 3.0 Development

- HTK is a free software toolkit for developing HMM-based systems
- 1000s of users worldwide; widely used for research by universities and industry

History:
- 1989-1992: V1.0-1.4, initial development at CUED
- 1993-1999: V1.5-2.3, commercial development by Entropic
- 2000-date: V3.0-V3.4, academic development at CUED, partly funded by Microsoft and the DARPA EARS Project
- 2004-date: ATK, a real-time HTK-based recognition system

- Primary dissemination route for CU research output
- See http://htk.eng.cam.ac.uk

10. Summary

- The Speech Group works on many aspects of speech processing:
  - large vocabulary speech recognition
  - statistical machine translation
  - statistical dialogue systems
  - speech synthesis and voice conversion
- Statistical machine learning approach to all applications
- World-wide reputation for research:
  - CUED systems have defined the state of the art for the past decade
  - developed a number of techniques widely used by industry
- Hidden Markov Model Toolkit (HTK):
  - freely-available software, 1000s of users worldwide
  - state-of-the-art features (discriminative training, adaptation, …)
  - HMM synthesis extension (HTS) from the Nagoya Institute of Technology
- See http://mi.eng.cam.ac.uk/research/speech