Advanced NLP: Speech Research and Technologies


1 Advanced NLP: Speech Research and Technologies
Julia Hirschberg CS 6998 11/24/2018

2 Spoken Natural Language Processing
NLP/Computational Linguistics: historically text-oriented
Speech research: the domain of EE and Linguistics
1980s: DARPA efforts to bring the two together
Today: applications motivate collaboration
  Automatic Speech Recognition (ASR)
  Text/Concept-to-Speech (TTS/CTS)
  Spoken Dialogue Systems (SDS), Speech-to-Speech Translation, Speech Search/Data Mining

3 Studying Speech is Different
Understanding input and generating output are more complicated
  ASR errors and lack of formatting cues
  TTS/CTS naturalness issues
But there is also more information to take advantage of
  Pitch variation, loudness, rate, voice quality
  Filled pauses, self-repairs

4 Labeled Waveform and F0 Contour

5 Current Approaches
Corpus-based studies
  Hand-labeled data (ToBI etc.)
  Tools: analysis (pitch tracks, spectrograms, …), ASR toolkits, TTS systems
  Machine learning
Laboratory studies
Evaluation

6 Prosodic Generation for TTS
Corpus-based approaches
  Train prosodic variation on large labeled corpora using machine learning techniques
  Accent and phrasing decisions
  Associate prosodic labels with simple features of transcripts
To do:
  Contour variation

Default prosodic assignment in TTS was developed to be independent of domains and tasks. It uses simple text analysis to vary phrasing, accent, and possibly pitch range. While hand-built rule sets are still used for particular application domains, most systems have moved toward automatically trained prosodic assignment.
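
To make "simple text analysis" concrete, here is a minimal hand-written baseline, not the trained approach the slide describes and not the method of any particular TTS system. It assumes that content words are accented, function words are not, and phrase boundaries fall at punctuation. The names `FUNCTION_WORDS` and `assign_prosody` and the word list itself are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical closed-class word list (an assumption for this sketch);
# a real system would more likely rely on a part-of-speech tagger.
FUNCTION_WORDS = {
    "a", "an", "the", "to", "of", "in", "on", "at", "and", "or", "but",
    "is", "are", "was", "i", "you", "he", "she", "it", "we", "they",
    "do", "did", "so", "for", "with", "that", "this",
}

PUNCT = re.compile(r"[.,;:?!]")


def assign_prosody(text):
    """Return (word, accented, boundary_after) triples for each word.

    accented: True when the word is not in FUNCTION_WORDS.
    boundary_after: True when the word is followed by punctuation
    or ends the input.
    """
    tokens = re.findall(r"[A-Za-z']+|[.,;:?!]", text)
    result = []
    for i, tok in enumerate(tokens):
        if PUNCT.fullmatch(tok):
            continue  # punctuation only marks boundaries; it is not a word
        accented = tok.lower() not in FUNCTION_WORDS
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        boundary_after = nxt is None or PUNCT.fullmatch(nxt) is not None
        result.append((tok, accented, boundary_after))
    return result


if __name__ == "__main__":
    for word, accented, boundary in assign_prosody("The ball touches the circle."):
        print(word, "ACCENT" if accented else "-", "| BOUNDARY" if boundary else "")
```

A trained system would replace the two hard-coded rules with a classifier over features such as part of speech, word position, and distance from the previous boundary, fitted to a hand-labeled corpus.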

7 Timing and backchanneling
Disfluencies?
Emotion and ‘personality’
Personalized voices

Work in spoken language generation is only beginning as a serious topic of research and development. Along the way there are large questions to answer, both for dialogue and monologue generation:

8 Concept to Speech
Decisions in TTS depend on text analysis
Concept-to-Speech (CTS) systems should be able to do better
  System knows what it wants to say and can specify how
But…
  Still need labeled corpora to train on
  CTS features may be hard to label (focus, given/new, …)
  How to decide how to realize these?

In principle, the information TTS systems lack to support natural prosodic assignment is readily available to CTS systems. So the initial hope in the NLG community was that prosodic assignment would be a simple problem. It has proven, however, to be fairly hard. Why?

9 Prosody in ASRU
Little success in improving ASR transcription
More promise in other areas:
  Improving rejection
  Shrinking search space
  Automatic topic segmentation for browsing/retrieval
  Identifying ‘salient’ words in turns
  Disambiguating speech/dialogue acts: okay

10 Recognizing communicative ‘problems’
ASR errors
User corrections
‘Aware’ turns
‘Problematic’ dialogues
Disfluencies and self-repairs
Recognizing speaker emotion

11 My Research
Meaning of intonational contours:
  Rise/fall/rise (L*+H L-H%)
    A: Did you take out the garbage?
    B: Sort of.
    A: Sort of!
  High rise questions (H* H-H%)
    This is the chicken Chermula?
    I’m from Skokie?
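
The contour labels on this slide follow ToBI notation: `*` marks a pitch accent, `-` a phrase accent, and `%` a boundary tone. As a rough sketch of how such a label decomposes, the function below splits a tune string into those three parts. The name `parse_tune` and the dictionary keys are hypothetical, and the parser covers only simple tunes like the two shown here, not the full ToBI inventory.

```python
import re

# Matches a phrase accent optionally fused with a boundary tone, e.g. "L-H%".
PHRASE_END = re.compile(r"([LH]-)([LH]%)?")


def parse_tune(tune):
    """Split a ToBI-style tune such as 'L*+H L-H%' into its components."""
    pitch_accents = []
    phrase_accent = None
    boundary_tone = None
    for part in tune.split():
        if "*" in part:
            # Any token containing a starred tone is treated as a pitch accent.
            pitch_accents.append(part)
            continue
        match = PHRASE_END.fullmatch(part)
        if match is None:
            raise ValueError(f"unrecognized tune element: {part!r}")
        phrase_accent = match.group(1)
        boundary_tone = match.group(2)  # None when no boundary tone is given
    return {
        "pitch_accents": pitch_accents,
        "phrase_accent": phrase_accent,
        "boundary_tone": boundary_tone,
    }


if __name__ == "__main__":
    print(parse_tune("L*+H L-H%"))  # rise/fall/rise
    print(parse_tune("H* H-H%"))    # high-rise question
```

Read this way, the rise/fall/rise contour is an L*+H pitch accent followed by a low phrase accent and a high boundary tone, while the high-rise question keeps every tone high.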

12 Compositional theory of intonational meaning (w/Pierrehumbert)
Intonational disambiguation across languages: Spanish, Italian and English (w/Avesani & Prieto)
  William isn’t drinking because he’s unhappy
Disfluencies: self-repairs (w/Nakatani)
  I want to go to Ba- Baltimore.
Cue phrases (w/Litman)
  Now let’s go to work.

Get a3 and a4 for disambig gw for other

13 Accent and strict/sloppy interpretations of ellipsis (w/Ward)
  People who live in Los Angeles adore its beaches and so do people who live in New York

14 Accent and given/new (w/Terken)
  The ball touches the circle.
  The ball touches the triangle.
  The ball touches the cone.
  The square touches the ball.
Intonation and discourse structure (w/Grosz & Nakatani)
  Boston Directions Corpus
Automatic assignment of accent and phrasing for TTS (w/Wang, Sproat, Koehn, Abney, Collins, Rambow)

15 ToBI prosodic labeling conventions (w/many)
Prosody in dialogue systems (w/Litman & Swerts): generation and understanding (TOOT)
Audio browsing and retrieval: SCAN and SCANMail (w/many)

16 CS 6998 Requirements
Class participation:
  Questions for class discussion
  Helping lead a class
Lab exercises
Project
  Literature review
  Data collection and/or analysis from a corpus

17 Building a system or system component (e.g. a preprocessor to assign intonation in a generation system)

18 Next Week
Read Hirschberg 2003 and ToBI conventions
Make sure you have access to supplementary readings if you need them
Bring 3 discussion questions to class
Check access on cs servers to corpora and /proj/nlp/tools/mathTools/
  Xwaves (solaris and linux) esps531.sol, esps531.linux (also downloadable from KTH)
  wavesurfer (win, linux, mac) available at KTH

19 Projects
Start thinking about what area you want to work in for your project and what type of project you’d like to do

