RUNDKAST: An Annotated Norwegian Broadcast News Speech Corpus. Ingunn Amdal, Ole Morten Strand, Jørn Almberg, and Torbjørn Svendsen. LREC 2008, Marrakech, 2008-05-29. www.ntnu.no


Rundkast at LREC 2008, Marrakech. Ingunn Amdal, Ole Morten Strand, Jørn Almberg, and Torbjørn Svendsen: RUNDKAST: An Annotated Norwegian Broadcast News Speech Corpus

Overview
– Purpose of Rundkast
– An overview of the database Rundkast
– Structure of annotation
– Orthographic transcription
– Broad phonetic annotation

Purpose of Rundkast
Databases of broadcast news can be used for a number of research topics in speech technology, such as:
– Supplementing existing databases of read speech for training and testing automatic speech recognition and speaker adaptation
– Research on recognition of spontaneous speech
– Research on automatic indexing of audio data
– Research on topic and/or speaker segmentation
– Research on speech/non-speech detection (e.g. background music)
– International research cooperation involving speech technology for broadcast news applications
A corpus of this kind is necessary for language technology research, but has not previously been available for Norwegian.

Overview of Rundkast
A database of 77 hours of radio broadcast news from the Norwegian Broadcasting Corporation (NRK):
– Read and spontaneous speech, as well as spontaneous dialogues and multipart discussions
– Large variation between speakers, speaking styles, and topics
– Speaker turns may be rapid, and several speakers may talk simultaneously
– Recording quality includes studio and telephone (mobile, satellite, etc.)
– Frequent occurrences of background noise, jingles, music, and audio illustrations
Funded by the Norwegian University of Science and Technology (NTNU).

Structure of annotation
Rundkast is hierarchically organized and orthographically annotated:
– Name of programme, type, and date
– Name of speaker (if known) and dialect (5 regions)
– Type of speech: spontaneity, channel, recording quality
– Segmented into speaker turns and segments of approx. 2-5 seconds
– Orthographic transcription (standard Norwegian)
– Labels for noise (speaker noise, background noise, etc.)
– Labels for pronunciation mistakes, foreign words, unintelligible speech, etc.
Annotation took roughly 70 hours of work per hour of recording. The "standard" tool Transcriber was used for the annotation.
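Since Transcriber stores its annotations as XML, the hierarchy above can be read programmatically. A minimal sketch of parsing a Transcriber-style .trs file follows; the element names (Episode, Section, Turn, Sync) follow the Transcriber DTD, but the sample content and attribute values below are invented for illustration, not taken from RUNDKAST.

```python
# Illustrative sketch: reading the annotation hierarchy from a
# Transcriber-style .trs file (XML). The sample content is invented.
import xml.etree.ElementTree as ET

TRS_SAMPLE = """<Trans scribe="annotator1" audio_filename="news_example">
  <Episode>
    <Section type="report" startTime="0.0" endTime="5.2">
      <Turn speaker="spk1" startTime="0.0" endTime="5.2">
        <Sync time="0.0"/>eit døme på ei ytring
        <Sync time="2.6"/>neste segment
      </Turn>
    </Section>
  </Episode>
</Trans>"""

def list_segments(trs_text):
    """Return (section_type, speaker, sync_time, text) for each segment."""
    root = ET.fromstring(trs_text)
    rows = []
    for section in root.iter("Section"):
        for turn in section.iter("Turn"):
            for sync in turn.iter("Sync"):
                text = (sync.tail or "").strip()
                rows.append((section.get("type"), turn.get("speaker"),
                             float(sync.get("time")), text))
    return rows

for row in list_segments(TRS_SAMPLE):
    print(row)
```

The three nesting levels of the XML correspond directly to the annotation levels described on the next slide: section, speaker turn, and segment.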

Hierarchy of annotation levels
[Diagram: an episode file is divided into sections (report, filler, nontrans), sections into speaker turns (speaker 1, speaker 2, no speaker), and turns into transcribed segments carrying event labels such as [i], [lp], and [b-]...[-b] for noisy stretches. Annotation levels: 1 = section, 2 = speaker turn, 3 = segment.]

Orthographic transcription
The lowest level in the annotation hierarchy, the segments, is transcribed orthographically. Orthographic transcription of spoken language is a challenge, especially for Norwegian: using dialect even in official settings is increasingly accepted, and the majority of RUNDKAST does not comply with any standard pronunciation. The aim of the RUNDKAST orthographic transcription conventions is to minimize uncertainty about pronunciations and to facilitate consistency.

Orthographic transcription: Main conventions
– Words are transcribed with the written forms closest to the actual pronunciations.
– A limited number of interjections are allowed.
– Text codes are used to mark mispronunciations, truncations, and unknown words.
– Numbers and symbols are written out as words. Abbreviations are not used.
– Punctuation marks are restricted to comma, period, and question mark.
– A space is used between spelled letters, also when acronyms have a spelled pronunciation.
– Capital letters are used in proper names, spellings, and acronyms, but not at the start of sentences.
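Several of these conventions lend themselves to automatic consistency checking. The sketch below flags two of them (digits must be written out as words; punctuation is restricted to comma, period, and question mark); the rule set, the extra allowed characters (hyphen, apostrophe), and the function itself are our own illustration, not the validation tooling actually used for RUNDKAST.

```python
# Minimal sketch of a convention checker for orthographic segments.
# Assumption: hyphen and apostrophe are permitted word-internal marks.
import re

ALLOWED = {",", ".", "?", "-", "'"}

def check_segment(text):
    """Return a list of convention violations found in one segment."""
    problems = []
    if re.search(r"\d", text):
        problems.append("digits must be written out as words")
    for ch in text:
        if not ch.isalnum() and not ch.isspace() and ch not in ALLOWED:
            problems.append(f"disallowed symbol: {ch!r}")
    return problems

print(check_segment("klokka er 7"))        # digit -> one violation
print(check_segment("klokka er sju, ja"))  # conforms -> []
```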

Example annotation in Transcriber

Broad phonetic annotation
Part of the data was to be phonetically annotated:
– for low-level experiments in ASR (new methods), as a smaller Norwegian counterpart to TIMIT
– for auto-segmentation, e.g. for unit selection TTS
The annotation was to be based on existing standards, with necessary adjustments, exploiting experience and specifications from the development of Norwegian speech synthesis databases. A "suitable" level of detail was chosen: acoustic boundaries should be labeled, but the annotation is more phonemic than phonetic. Consistency is of utmost importance!

Broad phonetic annotation: Selected data
– 10 speakers (5 male and 5 female)
– Amount of speech per speaker: approx. 5 min of "planned" speech and 1 min of spontaneous speech
– Noisy parts discarded (as far as possible)
– Material drawn from more than one programme
– Turn segmentation reused from the orthographic annotation
In all, 1 hour of speech, requiring approximately 1000 hours of work.

Broad phonetic annotation: Main principles
– The annotation is mainly phonemic, using the phoneme symbols closest to the perceived sound.
– Acoustic boundaries should be marked; some acoustically motivated symbols are included.
– A transcription as close as possible to the citation form is preferred.
– Norwegian standard SAMPA is preferred. Some English phonemes are included, as well as dialect variants. Example: 3 variants of the /r/-sound: /r/ (tap/trill), /R/ (uvular fricative), /r\/ (approximant).
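SAMPA symbols can span more than one character (the approximant /r\/, long vowels like /A:/), so any tool consuming these transcriptions needs longest-match tokenization rather than character-by-character splitting. A sketch under that assumption, with a tiny illustrative inventory rather than the full RUNDKAST symbol set:

```python
# Greedy longest-match tokenizer for SAMPA strings, so that
# multi-character symbols such as "r\" and "A:" stay intact.
# The inventory is a small invented subset for illustration.
INVENTORY = {"r", "R", "r\\", "A:", "A", "e", "n", "d", "t", "s"}
MAX_LEN = max(len(sym) for sym in INVENTORY)

def tokenize_sampa(s):
    """Split a SAMPA string into known symbols, longest match first."""
    tokens, i = [], 0
    while i < len(s):
        for n in range(min(MAX_LEN, len(s) - i), 0, -1):
            if s[i:i + n] in INVENTORY:
                tokens.append(s[i:i + n])
                i += n
                break
        else:
            raise ValueError(f"unknown symbol at position {i}: {s[i:]!r}")
    return tokens

print(tokenize_sampa("r\\A:d"))  # approximant /r\/, long /A:/, /d/
```

Greedy matching is what keeps /r\/ from being misread as the tap/trill /r/ followed by a stray backslash.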

Broad phonetic annotation: Annotation procedure
1. Conversion of the orthographic transcription to a format suitable for automatic transcription.
2. Automatic segmentation with a phonotypical transcription using a speech recognizer.
3. Manual correction of both segments and labels by four phonetics students using Praat.
4. Format check.
5. Control of all annotations by one supervisor.
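Step 1 above amounts to stripping the event labels and restricted punctuation from each orthographic segment so that only plain words reach the recognizer. The sketch below assumes the bracketed marker syntax seen in the hierarchy diagram ([i], [lp], [b-]...[-b]); the exact code set and conversion tool used for RUNDKAST are not documented here, so this is only a plausible reconstruction.

```python
# Hedged sketch of step 1: remove bracketed labels and the allowed
# punctuation, leaving the bare word sequence for the aligner.
# The marker syntax is assumed from the talk's hierarchy diagram.
import re

def to_word_sequence(segment):
    """Strip bracketed labels and punctuation; return the word list."""
    no_labels = re.sub(r"\[[^\]]*\]", " ", segment)  # [i], [lp], [b-], [-b], ...
    no_punct = re.sub(r"[,.?]", " ", no_labels)      # comma, period, question mark
    return no_punct.split()

print(to_word_sequence("[i] god kveld, [b-]dette er nyheitene[-b]."))
```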

Broad phonetic annotation: Comments on deviations
There are always cases of uncertainty, and these need a log. Problem: will the log ever be read? Solution: codes for deviations!
– An additional Praat tier for deviations
– Synchronous with the phoneme tier
– Easy to utilize automatically
Examples:
– creaky voice
– unexpected voicing/devoicing
– uncertain boundary or symbol
In addition, a log file records whatever deviations remain.
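Because the deviation tier is time-synchronous with the phoneme tier, pairing a phoneme with its deviation code is a simple interval lookup. A sketch of that idea follows; the tier data and the deviation codes ("cr" for creaky voice) are invented for illustration, and real data would be read from a Praat TextGrid rather than hard-coded.

```python
# Sketch: attach to each phoneme interval the deviation code (if any)
# from a synchronous deviation tier. Tiers are lists of
# (start, end, label) tuples; codes here are invented examples.
def pair_tiers(phoneme_tier, deviation_tier):
    """Return (start, end, phoneme, deviation_code_or_None) tuples."""
    paired = []
    for start, end, phone in phoneme_tier:
        code = next((c for s, e, c in deviation_tier
                     if s <= start and end <= e and c), None)
        paired.append((start, end, phone, code))
    return paired

phonemes = [(0.00, 0.08, "r"), (0.08, 0.21, "A:"), (0.21, 0.29, "d")]
deviations = [(0.00, 0.08, ""), (0.08, 0.21, "cr"), (0.21, 0.29, "")]
for row in pair_tiers(phonemes, deviations):
    print(row)
```

This kind of lookup is exactly what makes a coded tier "easy to utilize automatically", compared with a free-text log file.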

Example annotation in Praat

Concluding remarks
Availability:
– Planned to be included, for non-commercial use, in a future Norwegian language bank
– Will complement other corpora also intended for inclusion
– To be validated by SPEX
Planned use at NTNU: the SIRKUS project
– Investigation of new paradigms for ASR
– Low-level phone recognition experiments initially; multilinguality aspects
– Spoken information retrieval