© 2007 IBM Corporation 1 Speech Transcription for Broadcast Activities: The science, the art, and business realities
Sara H. Basson, Michael Picheny, Bhuvana Ramabhadran
IBM T.J. Watson Research Center
© 2007 IBM Corporation 2 Agenda
Captioning and Transcription:
–The need
–The options
Automated speech transcription: state of the art
Is it ready for prime time? – samples from network transcripts
Quality control
Near-term solutions
The future
© 2007 IBM Corporation 3 Lack of Captioning and Transcription – The Problem
Proliferation of multimedia information
Audio: not always the medium of choice
–Violates accessibility
–22,000,000 Americans listed as deaf or hard of hearing
–Aging users
US Federal Gov’t: 2001 amendment to Section 508 of the Rehabilitation Act mandates that information that federal agencies provide to the public or to their employees be accessible.
Time for editing (= cost of captioning) decreases as speech recognition accuracy improves.
© 2007 IBM Corporation 4 Transcription of Audio Material: It’s the Law
Telecommunications Act of 1996:
–100% of new English-language programming must be captioned by 2006
–100% of Spanish-language programming must be captioned by 2010
© 2007 IBM Corporation 5 Transcription Contrasted with Other Speech Recognition
Transcription: Closed Captioning; General dictation; Call center data mining; Government intelligence applications. Unconstrained Speech, Conversational, Large Vocabulary, High Resource. Telephone, Broadcast, Speeches.
Transaction: “For mortgage rates, say or press 1…”; “Please say your tracking number…”; Name Dialer. More constrained, More directed, Large Vocabulary, Lower Resource. Telephone.
Embedded: Direction giving in car; Spoken commands in car; Phrase translation on a PDA. Most constrained, Most directed, Smaller Vocabulary, Lowest Resource. Embedded in a device.
© 2007 IBM Corporation 6 Audio requiring transcription/captioning
Webcasts, Podcasts, Television programming, Movies, Digitized lectures, e-Learning materials, Corporate training, Meetings, Conferences, Tourist information, Medical transcription, Legal transcription, Call center data
= Strong accessibility requirement (user demand, and corporate/legal mandates)
© 2007 IBM Corporation 7 Speech Recognition Challenges Over Time
In order of increasing complexity:
–Connected Digit Sequences (TI Digits)
–TIMIT Acoustic-Phonetic Continuous Speech Corpus
–Broadcast News (BN)
–Speech in Noisy Environments (SPINE)
–Switchboard (SWB): Telephone conversations (about 70 topics)
–MALACH Corpus
© 2007 IBM Corporation 8 Progress in Base Technology
Base speech recognition technology has improved steadily over the last 15 years. Current error rates are low enough for many practical applications.
[Figure labels: Research Progress in Conversational Speech (NIST Benchmarks, IBM Superhuman Speech Project); Progress in IBM Speech Products (IBM Embedded ViaVoice in Car, IBM WebSphere Voice Server - Telephony); Human Performance – Conversational Telephony]
The NIST benchmark uses different test datasets each year, focusing on conversational speech.
Average error rates for 10 simple tasks (digits, name dialing, etc.)
In-car tests are performed at several speed/noise levels.
© 2007 IBM Corporation 9 MALACH: A challenging speech corpus
Multimedia digital archive: 116,000 hours of interviews with over 52,000 survivors, liberators, rescuers and witnesses of the Nazi Holocaust, recorded in 32 languages.
Goal: improved access to large multilingual spoken archives
Challenges:
–Emotional speech: young man they ripped his teeth and beard out they beat him
–Disfluencies: A- a- a- a- band with on- our- on- our- arm
–Frequent interruptions: CHURCH TWO DAYS these were the people who were to go to march TO MARCH and your brother smuggled himself SMUGGLED IN IN IN IN
© 2007 IBM Corporation 10 Named Entity Detection in Segmentation
31 named entity tags: Person, Location, Organization, Country, Cardinal number, Money, Date, Duration, Age, Ordinal number, Percentage, Animal, Plant, Substance, Occupation, Disease, …
© 2007 IBM Corporation 11 Captioning audio: What are the options?
Options and Issues:
–Stenographers: Cost, availability
–Automatic speech recognition: Performance for speaker independent, any topic, multiple speakers, noisy backgrounds…
Captioning and transcribing audio material: Additional Advantages
–Text-based search vs. audio-based search
–Reading text: faster than listening to the auditory equivalent
–Second language learners
–Individuals with certain learning disabilities
© 2007 IBM Corporation 12 Understandability….ASR vs. stenocaptioning: Manageable errors
ASR: a picture perfect landing for the space shuttle atlantis this morning the shuttle touched down at the kennedy space center in florida about six twenty one this morning IN ending a twelve day mission
TRUTH: a picture perfect landing for the space shuttle atlantis this morning the shuttle touched down at the kennedy space center in florida about six twenty one this morning ** ending a twelve day mission
ASR: since the diet drug combination FEN fen was pulled off the market some dieters **** been looking for something that would work as well we will see what's in the works
TRUTH: since the diet drug combination PHEN fen was pulled off the market some dieters HAVE been looking for something that would work as well we will see what's in the works
© 2007 IBM Corporation 13 Understandability….ASR vs. stenocaptioning: Distracting/confusing
ASR: ** TOOK IT makes a lot of FOLKS and also ** THAT e. mail volleys more than twice pick up the phone
TRUTH: O. K. THAT makes a lot of SENSE and also IF AN e. mail volleys more than twice pick up the phone
ASR: STAY connected through e. mail has become very common in a lot of homes IN on the job but ********* on how it's used it can be terrific FOR disastrous we will look at some e. mail problems THAT possible solutions
TRUTH: STAYING connected through e. mail has become very common in a lot of homes AND on the job but DEPENDING on how it's used it can be terrific OR disastrous we will look at some e. mail problems AND possible solutions
ASR: so they do not have to make their own interpretation makes a lot of THINGS another tip TO write an e. mail IS WHAT IT a news paper article in other words state the most pertinent information first we always say in the news business do not bury the lead
TRUTH: so they do not have to make their own interpretation makes a lot of SENSE another tip TOO write an e. mail AS YOU WOULD a news paper article in other words state the most pertinent information first we always say in the news business do not bury the lead
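The ASR/TRUTH pairs above mark substituted words in capitals and inserted or deleted words with asterisks. One common way to summarize such a comparison is word error rate (WER): the minimum number of word substitutions, insertions and deletions needed to turn the ASR output into the reference, divided by the number of reference words. The slides do not say which scoring tool was used, so the following is only a minimal sketch; the function name `word_error_rate` and the lower-casing step are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative sketch only: the function name and the case-folding
    are assumptions, not the scoring procedure used for the slides.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions align i reference words with an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the ASR output
            insertion = curr[j - 1] + 1  # extra word in the ASR output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Fragment of the diet-drug example: the ASR output drops the word "have".
truth = "some dieters have been looking for something that would work as well"
asr = "some dieters been looking for something that would work as well"
wer = word_error_rate(truth, asr)
print(f"WER: {wer:.3f}")
```

A single figure like this treats every error equally, which is exactly the limitation the "manageable" versus "distracting/confusing" slides illustrate: two transcripts with similar error counts can differ greatly in how understandable they are.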
© 2007 IBM Corporation 14 Text and punctuation
© 2007 IBM Corporation 15 Quality control for broadcast captioning Thursday, July 05, 2007 Closed Captions On Ohio TV: 24/7 Gibberish Dished To The Disabled
© 2007 IBM Corporation 16 Quality control for Broadcast Captioning Q: Do captions have to meet accuracy requirements, such as having only so many spelling errors per program? A: At present, captions are not required to meet any particular quality or accuracy standards. The Federal Communications Commission concluded that program providers have incentives to offer high quality captions, in keeping with the overall quality of the programs they offer. The FCC also concluded that it would be difficult to develop and monitor quality standards at this time. However, viewers may let video providers know whether they are satisfied with the captions through purchases of advertised products, subscriptions to program services, or contacts with providers concerning the programs. The above information has been excerpted from the FCC guidelines and the Captioned Media Program of the National Association of the Deaf.
© 2007 IBM Corporation 17 Using ASR for captioning….incrementally…UK Media and re-speaking
© 2007 IBM Corporation 18 Using ASR for Broadcast Captioning..incrementally…Protitle Live System
Enables creation of subtitles in all major languages, using speech recognition
Functions
–Correction in real time
–Validation in real time
Timing
–Total cycle time between 2 and 7 seconds
–5 seconds on average
Economics
–Re-speaking: 1/10th the cost of a real-time stenographer
© 2007 IBM Corporation 19 Using ASR for Broadcast captioning…incrementally…Real-time editing
Assume: speaker obtains 80 percent ASR accuracy when speaking at a rate of 150 words a minute, i.e. about 30 errors a minute
Editor needs to correct 15 words a minute to increase the accuracy to 90 percent.
–By choosing the 15 most important errors, the editor leaves 15 errors, some of which may not detract significantly from understanding.
In classrooms in the UK and in other countries, disabled students have note takers who try to type or write much faster than 15 words a minute to record as much as possible.
If the speaker used speech recognition instead, the note taker would need to type only the corrections.
People can read four or more times faster than somebody speaks. Therefore: possible to do ‘something else’ when reading words displayed at speaking speeds
Real-time editing can be separated into three activities:
–Finding the error and highlighting it
–Entering the correction
–Replacing the error with the correction
Using foot pedals to move the highlight to the exact position and trigger the replacement could leave the hands free for entering the corrections.
Source: Professor M. Wald, Southampton University
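The arithmetic behind the real-time editing assumption can be checked with a short calculation. This is a sketch under the slide's own figures (150 words a minute, 80 percent accuracy, 15 corrections a minute); the function names are hypothetical and chosen only for this example.

```python
def errors_per_minute(words_per_minute: int, accuracy_percent: int) -> float:
    """Number of misrecognized words per minute at a given ASR accuracy."""
    return words_per_minute * (100 - accuracy_percent) / 100


def accuracy_after_editing(words_per_minute: int,
                           accuracy_percent: int,
                           corrections_per_minute: int) -> float:
    """Accuracy (in percent) once the editor has fixed some errors each minute."""
    errors = errors_per_minute(words_per_minute, accuracy_percent)
    # The editor cannot fix more errors than the recognizer made.
    remaining = max(errors - corrections_per_minute, 0)
    return 100 * (words_per_minute - remaining) / words_per_minute


# Figures taken from the slide.
raw_errors = errors_per_minute(150, 80)
edited_accuracy = accuracy_after_editing(150, 80, 15)
print(f"errors per minute before editing: {raw_errors:.0f}")
print(f"accuracy after 15 corrections per minute: {edited_accuracy:.0f} percent")
```

The calculation makes the editor's workload explicit: halving the error count takes 15 corrections a minute, well below the rate at which a note taker would otherwise have to type.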
© 2007 IBM Corporation 20 Automated measures of accuracy
Proposal from the WGBH National Center for Accessible Media (NCAM)
Use language-processing tools to develop an automated caption accuracy assessment system for real-time captions on live news programming
Can text-based data mining and speech-to-text technologies produce meaningful data about stenocaption accuracy?
–Explore the capabilities of data mining software agents to identify discrepancies between errors contained within stenocaption data sets and speech-to-text data sets, and generate a caption accuracy analysis of the data set under review.
Through these methods, the goal is to:
–Improve the ability of the television community to monitor and maintain the quality of live captioning they offer to viewers who are deaf or hard of hearing
–Ease the current burden on caption viewers to document and advocate for comprehensible captions.
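As a rough illustration of what identifying discrepancies between a stenocaption data set and a speech-to-text data set could look like, the sketch below aligns two word sequences with Python's standard `difflib` module and lists the places where they disagree. It is not the NCAM system, whose design the slide does not describe; the function name `find_discrepancies` and both sample strings are hypothetical.

```python
from difflib import SequenceMatcher


def find_discrepancies(stenocaption: str, speech_to_text: str):
    """Return (tag, caption_words, asr_words) for every point of disagreement.

    Hypothetical helper for illustration. A disagreement only flags a
    candidate error: either source may be the one that is wrong.
    """
    caption_words = stenocaption.lower().split()
    asr_words = speech_to_text.lower().split()
    matcher = SequenceMatcher(a=caption_words, b=asr_words, autojunk=False)
    discrepancies = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert'
            discrepancies.append((tag, caption_words[i1:i2], asr_words[j1:j2]))
    return discrepancies


# Invented sample texts, not taken from any real caption file.
caption = "the shuttle touched down at the kennedy space center"
asr_output = "the shuttle touched down at kennedy space centre"
flagged = find_discrepancies(caption, asr_output)
for tag, caption_part, asr_part in flagged:
    print(tag, caption_part, asr_part)
```

Because neither transcript is a verified reference, the flagged spans would still need a rule or a human reviewer to decide which source erred before any accuracy figure is reported.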
© 2007 IBM Corporation 21 Future vision…
Automatic Speech Transcription for less regulated arenas
–Captioning podcasts, lectures, meetings, presentations…
Easier tools to modify and customize
Easier and more cost-effective mechanisms to deliver
Understanding quality control issues: what is accuracy, what is the cost of an error
Back-up options
More pervasive usage
Higher quality deliverables