Automatic Transcription Reconstruction System (ATRS)
Serguei Pakhomov, Michael Schonwetter, Joan Bachenko
Lernout & Hauspie Healthcare Systems Group
"I can't believe it’s not literal!"

Outline of Talk
–Start Demo Processing
–Define Problem
–Describe ATRS Components
–Display ATRS Demo Results

Start Demo Processing

Medical Transcription Operation
Partial Transcriptions are the commercial product of the operation
–Partial Transcripts are plentiful
–Can be paired with speech files
[Diagram labels: Tel. Speech, Human Transcriptionist, Partial Transcription]

Sample Partial Transcription

Literal Transcription Generation

Summary
Problems Addressed
–Partial Transcriptions are Available but Inadequate
–Literal Transcriptions are Essential
–Human Generated Literal Transcriptions are Expensive & Error Prone
Suggested Solution
–Recycle Partial Transcriptions with ASR to Generate Semi-Literal Transcriptions

ATRS I/O
–Inputs:
  –Partial transcript (RTF)
  –Digitized telephony speech (8 kHz mu-law)
–Outputs:
  –Semi-literal transcript in its several variants
  –Speech-text alignment for assisting in generating literal truth
[Diagram labels: Partial Transcription, Speech, ATRS, Semi-Literal for AM, Semi-Literal for LM, Aligned Semi-Literal with Digitized Speech]

Description of DPTRS
[Diagram labels: APFSM, Dictionary, Partial Transcript, Recognizer, Rec. Output, Integrator, Semi-Literal Transcript, Speech]

Supporting Models
–Probabilistic Finite State Model (PFSM)
–Filled Pause Model
–Background Model
–Augmented Probabilistic Finite State Model (APFSM)

Filled Pause Model
[Diagram labels, in order:]
–Training corpus with natural Filled Pauses (FP)
–FP distribution extractor
–FP distribution model
–Partial transcription corpus with no FPs
–FP distributor
–Partial transcription corpus with artificial FPs
–Language modeling software
–Filled Pause Model
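The extractor and distributor stages above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the ATRS implementation: the slide does not say what the FP distribution is conditioned on, so the sketch assumes the probability that a filled pause follows a given word. The FP token set, the function names, and the single inserted token are all hypothetical.

```python
import random
from collections import Counter

# Assumption: the slide does not list the filled-pause tokens; these are illustrative.
FILLED_PAUSES = {"ah", "uh", "um"}


def extract_fp_distribution(corpus):
    """Estimate P(a filled pause follows word w) from sentences with natural FPs."""
    follows = Counter()  # times w was immediately followed by an FP
    totals = Counter()   # times w was followed by anything
    for sentence in corpus:
        for prev, nxt in zip(sentence, sentence[1:]):
            if prev in FILLED_PAUSES:
                continue
            totals[prev] += 1
            if nxt in FILLED_PAUSES:
                follows[prev] += 1
    return {w: follows[w] / totals[w] for w in totals}


def distribute_fps(sentence, dist, rng, fp_token="ah"):
    """Insert artificial FPs into an FP-free sentence according to dist."""
    out = []
    for word in sentence:
        out.append(word)
        # Words unseen in the training corpus never receive an FP in this sketch.
        if rng.random() < dist.get(word, 0.0):
            out.append(fp_token)
    return out


natural = [
    ["the", "ah", "patient", "is", "uh", "stable"],
    ["the", "patient", "is", "ah", "stable"],
]
dist = extract_fp_distribution(natural)
rng = random.Random(0)  # seeded so repeated runs give the same corpus
print(distribute_fps(["patient", "is", "stable"], dist, rng))
```

The resulting corpus with artificial FPs would then be passed to whatever language modeling software builds the Filled Pause Model; that step is outside this sketch.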

Background Model
[Diagram labels, in order:]
–Literal Transcriptions Corpus
–Partial Transcriptions Corpus
–Difference Extractor
–Corpus of phrases spoken but not transcribed (Out Of Transcription (OOT) corpus)
–Language modeling software
–Background Model
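One way to picture the Difference Extractor is as a word-level diff between a partial transcript and its literal counterpart, keeping the word runs that appear only on the literal side. The sketch below uses Python's standard difflib for the diff; this is an assumption made for illustration, since the slide does not say how the extractor works. The function name is hypothetical, and word runs that replace partial-transcript words (rather than being purely extra) are deliberately left out.

```python
from difflib import SequenceMatcher


def extract_oot_phrases(partial, literal):
    """Return phrases present in the literal transcript but absent from the partial one."""
    matcher = SequenceMatcher(a=partial, b=literal, autojunk=False)
    phrases = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # 'insert' marks words found only on the literal side.
        # Assumption: 'replace' spans are ignored in this sketch.
        if tag == "insert":
            phrases.append(" ".join(literal[j1:j2]))
    return phrases


partial = "patient is stable".split()
literal = "um the patient is ah you know stable thank you".split()
print(extract_oot_phrases(partial, literal))
# ['um the', 'ah you know', 'thank you']
```

Collected over many transcript pairs, such phrases would form the OOT corpus from which the Background Model is trained.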

Generate Dictionary
Reduce phonetic confusability
–Limit entries to those items in the transcription (and supporting models).
Dynamically generate pronunciations
–For items in the partial transcript which are out of vocabulary.
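The two dictionary steps can be sketched as a filter over a base lexicon plus a fallback for out-of-vocabulary items. Everything named here is hypothetical: the base lexicon format, the function names, and in particular `naive_g2p`, which merely spells a word letter by letter and stands in for whatever pronunciation generator the system actually uses.

```python
def naive_g2p(word):
    """Placeholder pronunciation generator (assumption): spell the word out letter by letter."""
    return " ".join(word.upper())


def build_custom_dictionary(transcript_words, support_words, base_lexicon, g2p=naive_g2p):
    """Keep only the needed entries and generate pronunciations for OOV words."""
    needed = set(transcript_words) | set(support_words)
    custom = {}
    for word in sorted(needed):
        if word in base_lexicon:
            custom[word] = base_lexicon[word]
        else:
            # Out of vocabulary: generate a pronunciation dynamically.
            custom[word] = [g2p(word)]
    return custom


# Toy lexicon with made-up phone strings, for illustration only.
base_lexicon = {
    "three": ["TH R IY"],
    "warts": ["W AO R T S"],
    "heart": ["HH AA R T"],
}
custom = build_custom_dictionary(["three", "plantar", "warts"], ["ah"], base_lexicon)
for word, prons in custom.items():
    print(word, prons)
```

Because "heart" does not occur in the transcript or the supporting models, it is dropped, which is the confusability reduction the slide describes; "plantar" and "ah" receive generated entries.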

Recognition Pass
Dictation processed by recognition engine using:
–APFSM
–Custom Dictionary
–SI Acoustic Model

Integration
Recognizer output (HYP) is compared to Partial transcript (REF).
–For Acoustic Modeling:
  –Matches
  –Substitutions: Use REF portion
  –Insertions: Filled-Pauses, Punctuation
–For Language Modeling:
  –Matches
  –Substitutions: Use REF portion
  –Insertions: Use ALL Insertions
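The comparison of HYP against REF amounts to a word-level alignment that labels each position as a match, substitution, deletion, or insertion. A minimal sketch follows, assuming Python's difflib as the aligner; the slides do not name the alignment algorithm, so this is a stand-in, and `label_alignment` is a hypothetical helper.

```python
from difflib import SequenceMatcher
from itertools import zip_longest


def label_alignment(ref, hyp):
    """Align REF and HYP word lists and label each aligned position."""
    rows = []
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            rows.extend((w, w, "MATCH") for w in ref[i1:i2])
            continue
        # 'replace', 'delete' and 'insert' spans: pair words position by position.
        for r, h in zip_longest(ref[i1:i2], hyp[j1:j2]):
            if r is not None and h is not None:
                rows.append((r, h, "SUBSTITUTION"))
            elif r is not None:
                rows.append((r, None, "DELETION"))
            else:
                rows.append((None, h, "INSERTION"))
    return rows


ref = "the patient is stable".split()
hyp = "the patient ah was stable".split()
for row in label_alignment(ref, hyp):
    print(row)
```

A production aligner would more likely use a weighted edit distance, possibly with timing information from the recognizer; difflib is used here only because it is in the standard library.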

Integrator

1        2        3             4        5
REF      HYP      LABEL         AM       LM       (columns 4 and 5: semi-lit)
that     that     MATCH         that     that
she      she      MATCH         she      she
be       me       SUBSTITUTION  be       be
treated  treated  MATCH         treated  treated
for      --       DELETION      --       for
twelve   twelve   MATCH         twelve   twelve
weeks    weeks    MATCH         weeks    weeks
--       ah       INSERTION     ah       ah
--       period   INSERTION     period   period
--       on       INSERTION     --       on
three    three    MATCH         three    three
--       excuse   INSERTION     --       excuse
--       me       INSERTION     --       me
plantar  plantar  MATCH         plantar  plantar
warts    warts    MATCH         warts    warts
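The selection rules behind the AM and LM columns can be written down as a short function over the labeled rows. This is a sketch, not the ATRS code. The handling of deletions (dropped for AM, REF word kept for LM) is read off the example table rather than stated on the Integration slide, and the filled-pause and punctuation token sets are assumptions chosen to cover the example.

```python
# Assumed token sets; the slides do not enumerate them.
FILLED_PAUSES = {"ah", "uh", "um"}
SPOKEN_PUNCTUATION = {"period", "comma"}


def integrate(rows):
    """Build the AM and LM semi-literal word sequences from (ref, hyp, label) rows."""
    am, lm = [], []
    for ref, hyp, label in rows:
        if label in ("MATCH", "SUBSTITUTION"):
            # Trust the partial transcript (REF) for both variants.
            am.append(ref)
            lm.append(ref)
        elif label == "DELETION":
            # Inferred from the example table: LM keeps the REF word, AM drops it.
            lm.append(ref)
        elif label == "INSERTION":
            lm.append(hyp)  # LM keeps all insertions
            if hyp in FILLED_PAUSES or hyp in SPOKEN_PUNCTUATION:
                am.append(hyp)  # AM keeps only filled pauses and punctuation
    return am, lm


rows = [
    ("that", "that", "MATCH"), ("she", "she", "MATCH"),
    ("be", "me", "SUBSTITUTION"), ("treated", "treated", "MATCH"),
    ("for", None, "DELETION"), ("twelve", "twelve", "MATCH"),
    ("weeks", "weeks", "MATCH"), (None, "ah", "INSERTION"),
    (None, "period", "INSERTION"), (None, "on", "INSERTION"),
    ("three", "three", "MATCH"), (None, "excuse", "INSERTION"),
    (None, "me", "INSERTION"), ("plantar", "plantar", "MATCH"),
    ("warts", "warts", "MATCH"),
]
am, lm = integrate(rows)
print("AM:", " ".join(am))
print("LM:", " ".join(lm))
# AM: that she be treated twelve weeks ah period three plantar warts
# LM: that she be treated for twelve weeks ah period on three excuse me plantar warts
```

The AM variant is the more conservative of the two: it admits recognizer-only words only when they belong to a small closed class, whereas the LM variant accepts every insertion.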

Semi-Literal Transcript

Results: Compare to Literal Transcripts (n=774)
–Alignment of Partial vs. Literal
–Alignment of Semi-Lit vs. Literal yields 4.4% (absolute) better alignment

View Demo Results

Contact Information
–Serguei Pakhomov: Spakhomov@LHSL.com
–Michael Schonwetter: Mschonwetter@LHSL.com
–Joan Bachenko: Joan-B@LHSL.com
