Published by Beverly Todd. Modified over 8 years ago.
1
Transcription System using Automatic Speech Recognition (ASR) for the Japanese Parliament (Diet) Tatsuya Kawahara (Kyoto University, Japan)
2
Brief Biography
1995: Ph.D. (Information Science), Kyoto Univ.
1995: Associate Professor, Kyoto Univ.
1995-96: Visiting Researcher, Bell Labs., USA
2003-: Professor, Kyoto Univ.
2003-06: IEEE SPS Speech TC member
2006-: Technical Consultant, The House of Representatives, Japan
Published over 150 papers on automatic speech recognition (ASR) and its applications
Web: http://www.ar.media.kyoto-u.ac.jp/~kawahara/
3
Contents
1. Review of ASR technology
2. ASR system for the Japanese Diet
3. Next-generation transcription system of the Japanese Diet
4
Trend of ASR
[Figure: ASR tasks arranged by speaking style (formal to informal) and number of speakers (one to multiple). Labels: reading/re-speaking, broadcast news, formal presentation, classroom lectures, phone conversation, business meetings, spontaneous speech, parliament.]
5
Review of ASR technology (1/2)
Broadcast News [world-wide]
–Professional anchors, mostly reading manuscripts
–Accuracy over 90%
Public speaking, oral presentations [Japan]
–Ordinary people making fluent speech
–Accuracy ~80% (close-talking mic.)
Classroom lectures [world-wide]
–More informal speaking
–Accuracy ~60% (pin mic.)
6
Review of ASR technology (2/2)
Telephone conversations [US]
–Ordinary people, speaking casually
–Accuracy 60% → 85%
Business meetings [Europe/US]
–Ordinary people, speaking less formally
–Accuracy 70% (close mic.), 60% (distant mic.)
Parliamentary meetings [Europe/Japan]
–Politicians speaking formally
–EU plenary sessions: 90%
–Japan committee meetings: 85%
7
Deployment of ASR in Parliaments & Courts
Some countries
–Steno-mask & voice writing
–Re-speaking
  Commercial dictation software
Some local governments in Japan
–Direct recognition of politicians' speech
Japanese courts
–ASR for efficient retrieval from recorded sessions
Japanese Parliament (= Diet)
–To introduce ASR; direct recognition of politicians' speech
–Mostly in committee meetings
  …interactive, spontaneous, sometimes excited
8
Language-specific Issues in Japanese
Need to convert kana (phonetic symbols) to kanji
–Conversion is ambiguous: many homonyms
  (ex.) KAWAHARA (カワハラ) → 河原 (not 川原)
–Very hard to type in real time
–Only a limited number of stenographers, using special keyboards, can do it
Difference between verbatim style and transcript style
  (ex.) おききしたいのですが → ききたい(のです)
–Re-speaking is not so simple
–Need to rephrase in many cases
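The kana-to-kanji ambiguity above can be illustrated with a tiny lookup. This is only a sketch: the table `CANDIDATES` and the helper `convert` are hypothetical, and only the カワハラ entry comes from the slide. A real input method or ASR dictionary would rank candidates with context rather than list order.

```python
# Hypothetical reading-to-kanji table; only the カワハラ entry is from the slide.
CANDIDATES = {"カワハラ": ["河原", "川原"]}

def convert(reading, preferred=None):
    """Return (chosen spelling, remaining alternatives) for a kana reading."""
    # Unknown readings fall back to the kana itself.
    options = CANDIDATES.get(reading, [reading])
    # Use the caller's preference when it is a valid candidate, else the first entry.
    chosen = preferred if preferred in options else options[0]
    return chosen, [o for o in options if o != chosen]

print(convert("カワハラ"))
print(convert("カワハラ", preferred="川原"))
```

The point of the sketch is that one reading maps to several valid spellings, so a typist or a recognizer has to make a choice that the sound alone does not determine.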
9
ASR Architecture
Input X → Signal processing → Recognition Engine (decoder) → output: W = argmax P(W|X)
P(W|X) ∝ P(W) · P(X|W)
Acoustic model: P(X|P), e.g. /a, i, u, e, o…/
Dictionary: P(P|W), e.g. 京都 → ky o: t o
Language model: P(W), e.g. 京都 + の + 天気
Diagram annotations: depend on input condition; depend on application
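The decoding rule on this slide, choosing the word sequence W that maximizes P(W) times P(X given W), can be illustrated with a toy example. The sketch below is not Julius or any real decoder: the candidate sentences and all probabilities are invented, and `decode` is a hypothetical helper. It only shows that the argmax can be taken over the sum of the log language-model score and the log acoustic score.

```python
import math

# Hypothetical scores for three candidate word sequences W given one input X.
# "lm" stands for P(W) and "am" for P(X|W); all numbers are made up.
CANDIDATES = {
    "京都 の 天気": {"lm": 0.020, "am": 0.30},
    "京都 の 電気": {"lm": 0.002, "am": 0.35},
    "今日 の 天気": {"lm": 0.030, "am": 0.05},
}

def decode(candidates):
    """Return the W that maximizes log P(W) + log P(X|W)."""
    def score(w):
        s = candidates[w]
        return math.log(s["lm"]) + math.log(s["am"])
    return max(candidates, key=score)

best = decode(CANDIDATES)
print(best)
```

In this toy setting the second candidate has the best acoustic score, but the language model outweighs it, which is the role P(W) plays in the formula.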
10
Current Status of ASR
Problems unsolved
–Spontaneous/conversational speech
–Noisy environments
  Including distant microphones
Solutions ad hoc
–Collect large-scale "matched" data (corpus)
  Same acoustic environment, speakers (10 hours or more)
  Cover same topics, vocabulary (~M words)
–Prepare dedicated acoustic & language models
  Huge cost in development & maintenance
11
Contents
1. Review of ASR technology
2. ASR system for the Japanese Diet
3. Next-generation transcription system of the Japanese Diet
12
ASR Research at Kyoto Univ.
Since the 1960s, one of the pioneers
Development of free software Julius
Research in spontaneous speech recognition
–1999-: Oral presentations
–2001-: TV discussions
–2004-: Classroom lectures
–2003-: Parliamentary meetings
13
Free ASR Software: Julius
Developed since 1997 at Kyoto-U & other sites
Open-source, multi-platform (Linux, Mac, Windows, iPhone)
Open architecture
–Independent of acoustic & language models
  Ported to many languages
  Ported to many applications (telephony, robot…)
Standard model for Japanese
Widely used research platform
http://julius.sourceforge.jp
14
Corpus of Parliamentary Meetings
Covers all major committees and plenary sessions
200 hours, 2.4M words
Faithful transcripts of utterances including fillers, which are aligned with official minutes

{えー}それでは少し、今{そのー}最初に大臣からも、{そのー}貯蓄から投資へという流れの中に{ま}資するんじゃないだろうかとかいうような話もありましたけれども、{だけど / だけれども}、{まあ}あなたが言うと本当にうそらしくなる{んで / ので}{ですね、えー}もう少し{ですね、あのー}これは{あー}財務大臣に{えー}お尋ねをしたいんです{が}。

{ま}その{あの}見通しはどうかということでありますけれども、これについては、{あのー}委員御承知の{その}「改革と展望」の中で{ですね}、我々の今{あのー}予測可能な範囲で{えー}見通せるものについてはかなりはっきりと書かせていただいて(い)るつもりでございます。
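The sample suggests how a faithful transcript and the official minutes can be kept in one aligned text. The sketch below rests on an assumed reading of the annotation, which the slide does not spell out: `{a}` is taken to mark spoken material that the minutes omit (such as fillers), and `{a / b}` to mark a spoken form `a` that the minutes write as `b`. The parenthesized `(い)` is not handled. The function names `to_spoken` and `to_minutes` are hypothetical.

```python
import re

# Matches one brace group with no nested braces inside it.
BRACE = re.compile(r"\{([^{}]*)\}")

def to_spoken(text):
    """Keep what was said: {a / b} becomes a, {a} becomes a."""
    def repl(m):
        return m.group(1).split("/")[0].strip()
    return BRACE.sub(repl, text)

def to_minutes(text):
    """Approximate the minutes: {a / b} becomes b, plain {a} is dropped."""
    def repl(m):
        parts = m.group(1).split("/")
        return parts[1].strip() if len(parts) == 2 else ""
    return BRACE.sub(repl, text)

# A fragment of the sample transcript, with spaces around braces removed.
fragment = "{まあ}あなたが言うと本当にうそらしくなる{んで / ので}"
print(to_spoken(fragment))
print(to_minutes(fragment))
```

Under that assumed reading, the same annotated text yields both the verbatim wording used for acoustic and language model training and a cleaned wording closer to the minutes.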
15
ASR modules oriented for Spontaneous Speech
Modules: Signal processing, Acoustic model, Dictionary, Language model, Recognition Engine (decoder)
P(W|X) ∝ P(W) · P(X|W)
–Cover pronunciation variations
–Cover poor articulation
–Cover disfluencies & colloquial expressions
Diagram labels: X, P(X|W), P(W), Corpus, Innovative techniques
16
ASR Performance
Accuracy
–Word accuracy 85% (character accuracy 87%)
  Plenary sessions 90%
  Committee meetings 80-87%
–90% seems almost perfect
–No commercial software can achieve this!!
Real-time factor 1-3
–Latency in 10 min.
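The slide does not say how the accuracy figures were computed. A common definition, assumed here, is one minus the edit distance (substitutions, deletions and insertions) divided by the reference length, counted over words or over characters. The sketch below uses invented reference and hypothesis sentences, and the helper names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (r != h)))    # substitution or match
        prev = cur
    return prev[-1]

def accuracy(ref, hyp):
    """Accuracy = 1 - (S + D + I) / N, with N the reference length."""
    return 1.0 - edit_distance(ref, hyp) / len(ref)

# Made-up example: one word out of five is misrecognized.
ref_words = ["京都", "の", "天気", "は", "晴れ"]
hyp_words = ["京都", "の", "電気", "は", "晴れ"]

word_acc = accuracy(ref_words, hyp_words)
char_acc = accuracy("".join(ref_words), "".join(hyp_words))
print(word_acc, char_acc)
```

In this example a single wrong word changes only one of its two characters, so character accuracy comes out higher than word accuracy, which is the same direction as the 85% versus 87% on the slide.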
17
Related Techniques
Noise suppression & dereverberation
–Not serious once matched training data is available
Speaker change detection
–Preferred
–Current technology level seems insufficient
Auto-edit
–Filler removal: easy
–Colloquial expression replacement: non-trivial
–Period insertion: still at the research stage
18
Contents
1. Review of ASR technology
2. ASR system for the Japanese Diet
3. Next-generation transcription system of the Japanese Diet
19
The House of Representatives in Japan
2005: terminated recruiting stenographers
2006: investigated ASR technology for the new transcription system
2007: developed a prototype system and made preliminary evaluations
2008: system design
2009: system implementation
2010: trial and deployment
20
ASR system: Kyoto Univ. model integrated into NTT engine
Input X → Signal processing → Recognition Engine (decoder)
P(W|X) ∝ P(W) · P(X|W)
Acoustic model: P(X|P), e.g. /a, i, u, e, o…/
Dictionary: P(P|W), e.g. 京都 → ky o: t o
Language model: P(W), e.g. 京都 + の + 天気
Parties labeled in the diagram: NTT Corp., Kyoto Univ., House
21
Issues in Post-Editor
For efficient correction of ASR errors and cleaning the transcript into document style
Easy reference to original speech (+video)
–By time, by utterance, by character (cursor)
–Can speed up & slow down speech replay
Word-processor interface (screen editor); not a line editor
–To concentrate on making correct sentences
–Serious misunderstanding between system developers and stenographers!!
22
System Evaluation (@Kyoto)
Subjects: 18 students
Post-editing ASR output is more efficient than typing from scratch, regardless of the accuracy
Segments that are hard for ASR are also hard for humans
[Figure: edit time (min), axis 3 to 10, against ASR accuracy, axis 50 to 95, for two conditions: type from scratch and post-edit ASR output]
23
System Evaluation (@Kyoto)
Subjective evaluation correlates with ASR accuracy
Threshold of 75% accuracy for ASR to be preferred
[Figure: usability score of ASR, axis 1 to 7, against ASR accuracy, axis 50 to 95]
24
System Evaluation (@House)
Subjects: 8 stenographers
System: prototype
ASR-based system reduced the edit time compared with the current shorthand system
–78 min. → 68 min. (for a 5 min. segment)
Threshold in ASR accuracy of 80%
–At 75%: degradation in edit time; half of the subjects were negative about using ASR
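The edit-time figures on this slide can be put in relative terms with a few lines of arithmetic. The sketch assumes that 78 min. is the time with the current shorthand system and 68 min. the time with the ASR-based prototype, both for the same 5 min. segment. The variable names are illustrative only.

```python
segment_min = 5.0      # length of the speech segment
shorthand_min = 78.0   # edit time with the current shorthand system (assumed reading)
asr_min = 68.0         # edit time with the ASR-based prototype (assumed reading)

saved_min = shorthand_min - asr_min          # minutes saved per segment
reduction = saved_min / shorthand_min        # relative reduction in edit time

# Edit time as a multiple of the speech duration.
ratio_shorthand = shorthand_min / segment_min
ratio_asr = asr_min / segment_min

print(f"saved {saved_min:.0f} min ({reduction:.1%})")
print(f"{ratio_shorthand:.1f}x -> {ratio_asr:.1f}x real time")
```

Under these assumptions the prototype saves 10 minutes per segment, roughly 13% of the edit time, and editing still takes more than thirteen times the duration of the speech.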
25
Side effects of ASR-based system
Everything (text/speech/video) digitized and hyper-linked
–Efficient search & retrieval
Less burden? May work on longer segments??
Significantly less special training needed compared with the current shorthand system
26
Conclusions
ASR of parliamentary meetings is feasible, given a large collection of data
–~100 hours of speech
–~1G words of text (minutes)
–Accuracy 85-90%
Effective post-processing is still under investigation
Automatic translation research is also ongoing