Transcription System using Automatic Speech Recognition (ASR) for the Japanese Parliament (Diet) Tatsuya Kawahara (Kyoto University, Japan)


1 Transcription System using Automatic Speech Recognition (ASR) for the Japanese Parliament (Diet) Tatsuya Kawahara (Kyoto University, Japan)

2 Brief Biography
1995 Ph.D. (Information Science), Kyoto Univ.
1995 Associate Professor, Kyoto Univ.
1995-96 Visiting Researcher, Bell Labs., USA
2003- Professor, Kyoto Univ.
2003-06 IEEE SPS Speech TC member
2006- Technical Consultant, The House of Representatives, Japan
Published over 150 papers on automatic speech recognition (ASR) and its applications
Web: http://www.ar.media.kyoto-u.ac.jp/~kawahara/

3 Contents
1. Review of ASR technology
2. ASR system for the Japanese Diet
3. Next-generation transcription system of the Japanese Diet

4 Trend of ASR
[Figure: ASR tasks arranged by style (formal to informal) and number of speakers (one to multiple). Labels: reading/re-speaking, broadcast news, formal presentation, classroom lectures, phone conversation, business meetings, Parliament, spontaneous speech.]

5 Review of ASR technology (1/2)
Broadcast News [world-wide]
– Professional anchors, mostly reading manuscripts
– Accuracy over 90%
Public speaking, oral presentations [Japan]
– Ordinary people making fluent speech
– Accuracy ~80% (close-talking mic.)
Classroom lectures [world-wide]
– More informal speaking
– Accuracy ~60% (pin mic.)

6 Review of ASR technology (2/2)
Telephone conversations [US]
– Ordinary people, speaking casually
– Accuracy 60% → 85%
Business meetings [Europe/US]
– Ordinary people, speaking less formally
– Accuracy 70% (close mic.), 60% (distant mic.)
Parliamentary meetings [Europe/Japan]
– Politicians speaking formally
– EU: plenary sessions: 90%
– Japan: committee meetings: 85%

7 Deployment of ASR in Parliaments & Courts
Some countries
– Steno-mask & voice writing
– Re-speaking → commercial dictation software
Some local governments in Japan
– Direct recognition of politicians' speech
Japanese Courts
– ASR for efficient retrieval from recorded sessions
Japanese Parliament (= Diet)
– Plans to introduce ASR: direct recognition of politicians' speech
– Mostly in committee meetings … interactive, spontaneous, sometimes excited

8 Language-specific Issues in Japanese
Need to convert kana (phonetic symbols) to kanji
Conversion is ambiguous → many homonyms
(ex.) KAWAHARA (カワハラ) → 河原 (not 川原)
– Very hard to type in real time
– Only a limited number of stenographers using special keyboards can do so
Difference between verbatim style and transcript style
(ex.) おききしたいのですが → ききたい(のです) [polite spoken "I would like to ask" → plain "I want to ask"]
– Re-speaking is not so simple
– Need to rephrase in many cases

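9 ASR Architecture
[Figure: block diagram. Input speech X passes through signal processing to the recognition engine (decoder), which combines the acoustic model, dictionary and language model.]
P(W/X) ∝ P(W)・P(X/W)
output: W = argmax P(W/X)
Acoustic model P(X/P): phonemes /a, i, u, e, o…/; depends on input condition
Dictionary P(P/W): pronunciations, e.g. 京都 → ky o: t o
Language model P(W): word sequences, e.g. 京都 + の + 天気; depends on application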
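The decision rule on this slide, W = argmax P(W/X) with P(W/X) ∝ P(W)・P(X/W), can be sketched in a few lines. This is only an illustration: the candidate list and every log-probability below are invented, the words are romanized versions of the slide's example ("Kyoto no tenki"), and the `lm_weight` parameter is an assumption that does not appear on the slide. A real decoder searches a very large hypothesis space rather than scoring a short list.

```python
import math

def decode(candidates, lm_weight=1.0):
    """Pick the word sequence W with the highest combined log score.

    candidates maps a word tuple W to (log P(W), log P(X|W)).
    Working in the log domain turns the product P(W) * P(X|W) into a sum.
    """
    best_words, best_score = None, -math.inf
    for words, (log_p_w, log_p_x_given_w) in candidates.items():
        score = lm_weight * log_p_w + log_p_x_given_w
        if score > best_score:
            best_words, best_score = words, score
    return best_words, best_score

# Hypothetical scores for three competing hypotheses of the same utterance.
CANDIDATES = {
    ("kyoto", "no", "tenki"): (-4.0, -20.0),   # language model likes it
    ("kyoto", "no", "denki"): (-7.0, -19.5),   # slightly better acoustic fit
    ("kyoto", "no", "genki"): (-9.0, -20.5),
}

best_words, best_score = decode(CANDIDATES)
print(" ".join(best_words), best_score)
```

The second hypothesis has the best acoustic score, yet the first one wins once the language model score is added, which is the point of combining the two models.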

10 Current Status of ASR
Problems unsolved
– Spontaneous/conversational speech
– Noisy environments, including distant microphones
Solutions ad-hoc
– Collect large-scale “matched” data (corpus)
  Same acoustic environment, speakers (10 hours~)
  Cover same topics, vocabulary (~M words)
– Prepare dedicated acoustic & language models
Huge cost in development & maintenance

11 Contents
1. Review of ASR technology
2. ASR system for the Japanese Diet
3. Next-generation transcription system of the Japanese Diet

12 ASR Research in Kyoto Univ.
Since 1960s, one of the pioneers
Development of free software Julius
Research in spontaneous speech recognition
– 1999- Oral presentations
– 2001- TV discussions
– 2004- Classroom lectures
– 2003- Parliamentary meetings

13 Free ASR Software: Julius
Developed since 1997 in Kyoto-U & other sites
Open-source → multi-platform (Linux, Mac, Windows, iPhone)
Open architecture
– Independent from acoustic & language models
  → Ported to many languages
  → Ported to many applications (telephony, robot…)
Standard model for Japanese
Widely-used research platform
http://julius.sourceforge.jp

14 Corpus of Parliamentary Meetings
Cover all major committees and plenary sessions
200 hours, 2.4M words
Faithful transcripts of utterances including fillers, which are aligned with official minutes
{えー}それでは少し、今{そのー}最初に大臣からも、{そのー}貯蓄から投資へという流れの中に{ま}資するんじゃないだろうかとかいうような話もありましたけれども、{だけど/だけれども}、{まあ}あなたが言うと本当にうそらしくなる{んで/ので}{ですね、えー}もう少し{ですね、あのー}これは{あー}財務大臣に{えー}お尋ねをしたいんです{が}。
{ま}その{あの}見通しはどうかということでありますけれども、これについては、{あのー}委員御承知の{その}「改革と展望」の中で{ですね}、我々の今{あのー}予測可能な範囲で{えー}見通せるものについてはかなりはっきりと書かせていただいて(い)るつもりでございます。
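The sample above suggests how a single annotated line can carry both the faithful transcript and the official minutes. The sketch below is an interpretation of that sample and not a documented format: it assumes that {x} marks spoken material absent from the minutes, {a/b} marks spoken form a that the minutes render as b, and (x) marks text present only in the minutes. The function name `split_annotation` is hypothetical.

```python
import re

# One alternative per bracket type; nested brackets are not handled.
TOKEN = re.compile(r"\{([^{}]*)\}|\(([^()]*)\)")

def split_annotation(text):
    """Return (spoken, minutes) versions of an annotated transcript line.

    Assumed convention (inferred from the slide's sample):
      {x}    spoken only, dropped from the minutes (e.g. fillers)
      {a/b}  spoken as a, written as b in the minutes
      (x)    written in the minutes only
    """
    spoken, minutes = [], []
    pos = 0
    for m in TOKEN.finditer(text):
        plain = text[pos:m.start()]          # text shared by both versions
        spoken.append(plain)
        minutes.append(plain)
        if m.group(1) is not None:           # {...}
            body = m.group(1).strip()
            if "/" in body:
                said, written = body.split("/", 1)
                spoken.append(said.strip())
                minutes.append(written.strip())
            else:
                spoken.append(body)
        else:                                # (...)
            minutes.append(m.group(2).strip())
        pos = m.end()
    tail = text[pos:]
    spoken.append(tail)
    minutes.append(tail)
    return "".join(spoken), "".join(minutes)

line = "{えー}それでは{だけど/だけれども}書かせていただいて(い)るつもりでございます。"
spoken_text, minutes_text = split_annotation(line)
```

Keeping both versions in one aligned line is what makes the corpus usable for training on spoken forms while still evaluating against the minutes.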

15 ASR modules oriented for Spontaneous Speech
– Cover pronunciation variations
– Cover poor articulation
– Cover disfluencies & colloquial expressions
[Figure: the ASR block diagram (signal processing, acoustic model, dictionary, language model, recognition engine (decoder); P(W/X) ∝ P(W)・P(X/W)), annotated with "Corpus" and "Innovative techniques".]

16 ASR Performance
Accuracy
– Word accuracy 85% (character accuracy 87%)
  Plenary sessions 90%
  Committee meetings 80~87%
– 90% seems almost perfect
– No commercial software can achieve this!!
Real-time factor 1-3
– Latency in 10 min.
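The slide does not say how the figures were computed. As a rough guide, the sketch below uses the common definitions, which are assumed here rather than taken from the talk: accuracy = 1 - (substitutions + deletions + insertions) / reference length, obtained from an edit distance, and real-time factor = processing time / audio duration. Passing word lists gives word accuracy; passing strings gives character accuracy.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]

def accuracy(ref, hyp):
    """1 - (S + D + I) / N, where N is the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return 1.0 - edit_distance(ref, hyp) / len(ref)

def real_time_factor(processing_sec, audio_sec):
    """Processing time divided by audio duration (1.0 = as fast as real time)."""
    return processing_sec / audio_sec

ref = "a b c d e f g h i j".split()
hyp = "a b x d e f g i j".split()      # one substitution, one deletion
print(accuracy(ref, hyp))
print(real_time_factor(600.0, 300.0))  # 10 min of processing for 5 min of audio
```

With this definition, insertions can push accuracy below the plain percentage of correctly recognized words, so the two numbers should not be confused.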

17 Related Techniques
Noise suppression & dereverberation
– Not serious once matched training data available
Speaker change detection
– Preferred
– Current technology level seems not sufficient
Auto-edit
– Filler removal → easy
– Colloquial expression replacement → non-trivial
– Period insertion → still research stage
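Of the auto-edit steps listed above, filler removal is rated easy, and a minimal sketch shows why: on already tokenized ASR output it reduces to a lexicon lookup. The filler list here is a hypothetical one, assembled from fillers visible in the corpus sample on slide 14, and the function name is invented for illustration.

```python
# Hypothetical filler lexicon (taken from the slide 14 sample, not from the system).
FILLERS = frozenset({"えー", "そのー", "あのー", "あー", "まあ", "ま"})

def remove_fillers(tokens, fillers=FILLERS):
    """Drop filler tokens and keep all other tokens in their original order."""
    return [t for t in tokens if t not in fillers]

tokens = ["えー", "それでは", "少し", "そのー", "最初", "に", "大臣", "から", "も"]
cleaned = remove_fillers(tokens)
```

Words such as その and あの are deliberately left out of the lexicon: they are fillers in some contexts and ordinary demonstratives in others, which is the kind of ambiguity that makes the remaining auto-edit steps harder.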

18 Contents
1. Review of ASR technology
2. ASR system for the Japanese Diet
3. Next-generation transcription system of the Japanese Diet

19 The House of Representatives in Japan
2005: terminated recruiting stenographers
2006: investigated ASR technology for the new transcription system
2007: developed a prototype system and made preliminary evaluations
2008: system design
2009: system implementation
2010: trial and deployment

20 ASR system: Kyoto Univ. model integrated into NTT engine
[Figure: the ASR block diagram of slide 9 (signal processing, acoustic model, dictionary, language model, recognition engine (decoder); P(W/X) ∝ P(W)・P(X/W)), labelled "NTT Corp." and "Kyoto Univ. → House".]

21 Issues in Post-Editor
For efficient correction of ASR errors and cleaning of the transcript into document style
Easy reference to original speech (+video)
– by time, by utterance, by character (cursor)
– Can speed up & slow down speech replay
Word-processor interface (screen editor), not line editor
– to concentrate on making correct sentences
– Serious misunderstanding between system developers and stenographers!!
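Reference to the original speech "by character (cursor)" implies some mapping from a cursor position in the text to a point in the recording. The slide gives no implementation details, so the following is only one possible sketch: it assumes the recognizer supplies a start and end time for each word, and the class and field names are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the recording
    end: float

class AlignedTranscript:
    """Maps a character offset in the concatenated text to a playback time."""

    def __init__(self, words):
        self.words = list(words)
        self.offsets = []          # character offset at which each word begins
        pos = 0
        for w in self.words:
            self.offsets.append(pos)
            pos += len(w.text)
        self.length = pos

    def text(self):
        # Japanese text is written without spaces, so words are simply joined.
        return "".join(w.text for w in self.words)

    def time_at_cursor(self, cursor):
        """Start time of the word that contains the given character offset."""
        if not self.words:
            raise ValueError("empty transcript")
        cursor = max(0, min(cursor, self.length - 1))
        idx = bisect_right(self.offsets, cursor) - 1
        return self.words[idx].start

doc = AlignedTranscript([
    Word("kyoto", 0.0, 0.6),
    Word("no", 0.6, 0.8),
    Word("tenki", 0.8, 1.5),
])
print(doc.text(), doc.time_at_cursor(6))
```

This sketch covers only the unedited ASR output; once an editor inserts or deletes characters, the offsets would have to be updated, which is part of what makes a screen-editor interface harder to build than a line editor.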

22 System Evaluation (@Kyoto)
Subjects: 18 students
Post-editing ASR outputs is more efficient than typing from scratch, regardless of the accuracy
→ Those hard for ASR are also hard for human
[Figure: edit time (min) versus ASR accuracy, for "Type from scratch" and "Post-edit ASR output"]

23 System Evaluation (@Kyoto)
Subjective evaluation correlates with ASR accuracy
Threshold of 75% accuracy for ASR to be preferred
[Figure: usability score of ASR versus ASR accuracy]

24 System Evaluation (@House)
Subjects: 8 stenographers
System: prototype
The ASR-based system reduced the edit time compared with the current shorthand system
– 78 min. → 68 min. (for a 5 min. segment)
Threshold at ASR accuracy of 80%
– 75% → degradation in edit time; half of the subjects were negative about using ASR
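To put the 78 min. and 68 min. figures in proportion, they can be expressed relative to the 5 min. segment length. The short calculation below uses only the three numbers from the slide; the function name and the derived quantities are added for illustration.

```python
def edit_stats(before_min, after_min, segment_min):
    """Edit time per minute of speech, and the relative saving in percent."""
    return {
        "before_ratio": before_min / segment_min,
        "after_ratio": after_min / segment_min,
        "saving_pct": 100.0 * (before_min - after_min) / before_min,
    }

stats = edit_stats(78.0, 68.0, 5.0)
for name, value in stats.items():
    print(f"{name}: {value:.1f}")
```

In other words, editing takes roughly 15.6 times the speech duration with the shorthand-based workflow and roughly 13.6 times with the ASR-based prototype, a saving of about 13%.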

25 Side effects of ASR-based system
Everything (text/speech/video) digitized and hyper-linked → efficient search & retrieval
Less burden? → may work on longer segments??
Significantly less special training needed compared with the current shorthand system

26 Conclusions
ASR of parliamentary meetings is feasible, given a large collection of data
– ~100 hours of speech
– ~1G words of text (minutes)
– Accuracy 85-90%
Effective post-processing is still under investigation
Automatic translation research is also ongoing

