Update on Transcription of Fisher Phase II Data
Owen Kimball, Chia-lin Kao, Tresi Arvizo, John Makhoul


2 Current Transcription Effort
• Transcribing 1400 hours of Fisher data, including
  – ~1560 calls from Phase I collection
  – ~6840 calls from Phase II (more recent) collection
• Phase II collection replaces original 40 topics with expanded set of 69 (http://www.ldc.upenn.edu/Fisher/new_topics.html)
• As before, WordWave transcribing, BBN post processing
• From the 1400 hours, LDC will hold back calls that include any speaker or phone number that overlaps with test sets that NIST has defined
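The hold-back rule in the last bullet can be sketched as a simple set-overlap filter. This is only an illustration of the rule as stated on the slide, not LDC's actual procedure: the `Call` record, its field names, and the `split_held_back` function are hypothetical, and the identifiers in the usage example are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Hypothetical record for one conversation (field names are assumptions)."""
    call_id: str
    speakers: frozenset
    phone_numbers: frozenset


def split_held_back(calls, test_speakers, test_phones):
    """Split calls into (released, held_back).

    A call is held back if any of its speakers or phone numbers
    also appears in the test-set speakers or phone numbers.
    """
    released, held_back = [], []
    for call in calls:
        overlaps = bool(call.speakers & test_speakers) or bool(
            call.phone_numbers & test_phones
        )
        (held_back if overlaps else released).append(call)
    return released, held_back


# Usage with made-up identifiers
calls = [
    Call("c1", frozenset({"spk1", "spk2"}), frozenset({"555-0100"})),
    Call("c2", frozenset({"spk3", "spk4"}), frozenset({"555-0101"})),
]
released, held_back = split_held_back(calls, {"spk3"}, set())
```

Checking both speakers and phone numbers follows the slide's wording ("any speaker or phone number"); a single overlap on either is enough to hold a call back.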

3 New Transcription Guidelines
• Eliminated incorrect forms (e.g. British spellings) from dictionary used to filter transcripts
• Changes to Style Guide to clarify items that led to inconsistencies
  – Primarily to increase efficiency of manual post processing
• Added [BN] and [/BN] for sustained background noise
• Changes to punctuation guidelines to support better future Rich Transcription research
  – Clarification of double dash (“ -- ”) for discontinuities
  – Ellipsis (“…”) to indicate continued speaking across interrupt

4 Sample Transcript, Revised Style Guide
R: Yeah. And then when you're reading it, you know, it's like, okay, um, you know, people -- people still view things different.
L: Right.
R: You know? We could be reading the same thing and -- and see it two different ways and...
L: Oh, obviously.
R:... he shouldn't have said that. [LAUGH] But -- and see I don't -- I don't get the newspaper at all. I just --
L: Yeah. Unfortunately I have to say I don't really either.
R: I don't --...
L: I used to.
R:... I don't even have time to even sit down and...
L: [LAUGH]
R:... you know, really read a newspaper, you know? [LAUGH]
L: Right.
R: [SIGH] Everything has gotten to be so quick that you can't, you know --?
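As a rough illustration of how the markers in this sample could be read by a program, the sketch below parses one turn and reports its bracketed tags, whether it begins or ends with an ellipsis (speech continued across an interruption), and how many double dashes (discontinuities) it contains. The function name, the returned field names, and the assumption that each turn sits on one line prefixed by `L:` or `R:` are illustrative choices, not part of the Style Guide.

```python
import re

# One turn per line: channel label, colon, then the transcribed text.
TURN_RE = re.compile(r"^(?P<speaker>[LR]):\s*(?P<text>.*)$")
# Bracketed tags such as [LAUGH], [SIGH], [BN], [/BN].
TAG_RE = re.compile(r"\[/?[A-Z]+\]")


def parse_turn(line):
    """Parse a single 'L: ...' or 'R: ...' turn into a small dict."""
    match = TURN_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a transcript turn: {line!r}")
    # Treat the single-character ellipsis and three dots alike.
    text = match.group("text").replace("\u2026", "...")
    # Ignore tags when checking where the ellipsis falls.
    spoken = TAG_RE.sub("", text).strip()
    return {
        "speaker": match.group("speaker"),
        "text": text,
        "tags": TAG_RE.findall(text),
        "continues_previous": spoken.startswith("..."),
        "continues_next": spoken.endswith("..."),
        "discontinuities": spoken.count("--"),
    }


turn = parse_turn("R:... I don't even have time to even sit down and...")
```

Here `turn["continues_previous"]` and `turn["continues_next"]` are both true, matching the way speaker R's sentence is split around L's interjections in the sample.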

5 Current Status
• Sent 492 hours of processed transcripts to LDC on 12/2/04
• LDC released 465 hours of this in Feb 05
• As of 3/15/05
  – 1288 hours (7734 conversations) received from WordWave
  – 1055 hours (6631 conversations) post processed by BBN
• WordWave is committed to finishing by end of March 05
• BBN has reserved EARS funding to finish post processing
  – Will send to LDC as soon as all transcripts processed
  – Hoping for mid-April 05
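The figures above imply a few derived quantities, worked out in the short sketch below. The input numbers are taken from the slides; the 1400-hour total is the rounded target from the "Current Transcription Effort" slide, so the remaining-hours figure is approximate. The function and variable names are illustrative.

```python
TARGET_HOURS = 1400                            # rounded overall target
RECEIVED_HOURS, RECEIVED_CONVS = 1288, 7734    # from WordWave as of 3/15/05
PROCESSED_HOURS, PROCESSED_CONVS = 1055, 6631  # post processed by BBN


def status_summary(target, received, processed):
    """Derive backlog figures (in hours) from the reported totals."""
    return {
        "awaiting_transcription": target - received,
        "awaiting_post_processing": received - processed,
        "fraction_post_processed": processed / received,
    }


summary = status_summary(TARGET_HOURS, RECEIVED_HOURS, PROCESSED_HOURS)
# Average length of a received conversation, in minutes.
avg_minutes = RECEIVED_HOURS * 60 / RECEIVED_CONVS
```

On these numbers, roughly 112 hours were still to come from WordWave and 233 received hours were awaiting post processing, with a received conversation averaging about 10 minutes.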

