Training Statistical Language Models from Grammar-Generated Data: A Comparative Case-Study
Manny Rayner, Geneva University
(joint work with Beth Ann Hockey and Gwen Christian)
Structure of talk
- Background: Regulus and MedSLT
- Grammar-based language models and statistical language models
What is MedSLT?
- Open Source medical speech translation system for doctor-patient dialogues
- Medium vocabulary (400-1500 words)
- Grammar-based: uses the Regulus platform
- Multilingual: translates through an interlingua
MedSLT
- Open Source medical speech translator for doctor-patient examinations
- Main system is unidirectional (the patient answers non-verbally, e.g. nods or points)
  - Also an experimental bidirectional system
- Two main purposes:
  - Potentially useful (could save lives!)
  - Vehicle for experimenting with the underlying Regulus spoken dialogue engineering toolkit
Regulus: central goals
- Reusable grammar-based language models
  - Compile into recognisers
- Infrastructure for using them in applications
  - Speech translation
  - Spoken dialogue
- Multilingual
- Efficient development environment
- Open Source
The full story…
$25 (paperback edition) from amazon.com
What kind of applications?
- Grammar-based is:
  - Good on in-coverage data
  - Good for complex, structured utterances
- Users need to:
  - Know what they can say
  - Be concerned about accuracy
- Good target applications:
  - Safety-critical
  - Medium vocabulary (~200-2000 words)
In particular…
- Clarissa
  - NASA procedure assistant for astronauts
  - ~250 word vocabulary, ~75 command types
- MedSLT
  - Multilingual medical speech translator
  - ~400-1000 words, ~30 question types
- SDS
  - Experimental in-car system from Ford Research
  - First prize, Ford internal demo fair, 2007
  - ~750 words
Key technical ideas
- Reusable grammar resources
- Use grammars for multiple purposes:
  - Parsing
  - Generation
  - Recognition
- Appropriate use of statistical methods
Reusable grammar resources
- Building a good grammar from scratch is very challenging
- Need a methodology for rational reuse of existing grammar structure
- Use a small corpus of examples to extract structure from a large resource grammar
The Regulus picture
[Diagram: a training corpus and operationality criteria drive EBL specialization of the general unification grammar (UG) and lexicon into an application-specific UG; a UG-to-CFG compiler produces a CFG grammar, a CFG-to-PCFG compiler (using the training corpus) produces a PCFG grammar, and the (P)CFG-to-recogniser compiler produces a Nuance recognizer.]
The general English grammar
- Loosely based on the SRI Core Language Engine grammar
- Compositional semantics (4 different versions)
- ~200 unification grammar rules
- ~75 features
- Core lexicon, ~450 words
- (Also resource grammars for French, Spanish, Catalan, Japanese, Arabic, Finnish, Greek)
General grammar → domain-specific grammar
- "Macro-rule learning"
- Corpus-based process
- Remove unused rules and lexicon items
- Flatten parsed examples to remove structure
- Simpler structure → less ambiguity → smaller search space
EBL example (1)
[Parse tree for "when do you get headaches" under the full general grammar: preterminals PP, V, PRO, V, N over the words, with NP, NBAR, NP, VBAR, VP, S, S, UTTERANCE above.]
EBL example (2)
[The same parse tree for "when do you get headaches" (PP, V, PRO, V, N; NP, NBAR, NP, VBAR, VP, S, S, UTTERANCE), shown again as the input to flattening.]
EBL example (3)
[Flattened parse tree for "when do you get headaches": preterminals PP, V, PRO, V, N, with only NP, VBAR, S, UTTERANCE above.]
Main new rules:
- S → PP VBAR
- VBAR → V PRO V NP
- NP → N
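The flattening step can be sketched in a few lines. This is an illustrative toy, not the Regulus implementation: which nodes survive flattening is marked by hand on the tree here, whereas Regulus derives it from declared operationality criteria, so the exact macro-rules extracted depend on those criteria.

```python
# Toy sketch of EBL tree flattening (not the Regulus implementation).
# A node is (category, operational?, children); leaves are words.
# Non-operational nodes are spliced out, so each operational node
# yields one flat "macro-rule" over its operational frontier.

def flatten(node, rules):
    """Return the symbols this node contributes to its parent's RHS,
    appending one macro-rule per operational node to `rules`."""
    cat, operational, children = node
    if len(children) == 1 and isinstance(children[0], str):
        return [cat]                      # preterminal: contributes its category
    rhs = []
    for child in children:
        rhs.extend(flatten(child, rules))
    if operational:
        rules.append((cat, tuple(rhs)))   # one flat rule for this node
        return [cat]
    return rhs                            # spliced out: expose frontier to parent

# Hand-annotated parse of "when do you get headaches"; the inner S,
# the NP over PRO, VP and NBAR are marked for removal.
tree = ("UTTERANCE", True,
        [("S", True,
          [("PP", True, ["when"]),
           ("S", False,
            [("V", True, ["do"]),
             ("NP", False, [("PRO", True, ["you"])]),
             ("VP", False,
              [("VBAR", True,
                [("V", True, ["get"]),
                 ("NP", True,
                  [("NBAR", False, [("N", True, ["headaches"])])])])])])])])

rules = []
flatten(tree, rules)
for lhs, rhs in rules:
    print(lhs, "->", " ".join(rhs))
```

With this annotation the extracted rules include NP → N and a flat rule for each remaining node; fewer surviving nodes means flatter rules and a smaller search space.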
Using grammars for multiple purposes
- Parsing: surface words → logical form
- Generation: logical form → surface words
- Recognition: speech → surface words
Building a speech translator
Combine Regulus-based components:
- Source-language recognizer (speech → words)
- Source-language parser (words → logical form)
- Transfer from source to target, via interlingua (logical form → logical form)
- Target-language generator (logical form → words)
- (3rd-party text-to-speech)
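In code, the combination is just function chaining. A toy end-to-end sketch with hypothetical stand-in stages (the stage bodies, the mini transfer lexicon, and the French output are all invented for illustration; in Regulus each stage is compiled from a grammar):

```python
# Hypothetical toy pipeline: every stage below is a hand-written
# stand-in; real Regulus components are compiled from grammars.

def recognise(audio):                     # speech -> source words
    return audio["transcript"]

def parse(words):                         # source words -> logical form
    return {"utterance_type": "ynq", "symptom": words[-1]}

def transfer(lf):                         # source LF -> interlingua -> target LF
    lexicon = {"headaches": "maux_de_tete"}   # hypothetical transfer lexicon
    return {**lf, "symptom": lexicon[lf["symptom"]]}

def generate(lf):                         # target LF -> target words
    return ["avez", "vous", "des", lf["symptom"]]

def translate(audio):                     # chain the components
    return generate(transfer(parse(recognise(audio))))

print(" ".join(translate({"transcript": ["do", "you", "get", "headaches"]})))
```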
Adding statistical methods
Two different ways to use statistical methods:
- Statistical tuning of the grammar
- Intelligent help system
Impact of statistical tuning (Regulus book, chapter 11)
- Base recogniser:
  - MedSLT with English recogniser
  - Training corpus: 650 utterances
  - Vocabulary: 429 surface words
- Test data:
  - 801 spoken and transcribed utterances
Vary vocabulary size
- Add lexical items (11 different versions)
- Total vocabulary: 429-3788 surface words
- New vocabulary not used in test data
- Expect degradation in performance:
  - Larger search space
  - New possibilities are just a distraction
Impact of statistical tuning for different vocabulary sizes
[Graph: semantic error rate against vocabulary size.]
Intelligent help system
- Need robustness somewhere
- Add a backup statistical recogniser
- Use it to advise the user:
  - Approximate match with in-coverage examples
  - Show the user similar things they could say
- Original paper: Gorrell, Lewin and Rayner, ICSLP 2002
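One simple way to realise the approximate match is to rank the in-coverage examples by word overlap with the backup recogniser's output. A minimal sketch (Gorrell, Lewin and Rayner describe their own matching method; the example sentences below are hypothetical):

```python
def suggest(recognised, examples, n=2):
    """Rank in-coverage example sentences by unigram Jaccard overlap
    with the backup SLM's (possibly out-of-coverage) recognition."""
    rec = set(recognised.lower().split())
    def score(example):
        words = set(example.lower().split())
        return len(rec & words) / len(rec | words)
    return sorted(examples, key=score, reverse=True)[:n]

# Hypothetical in-coverage examples, for illustration only.
examples = ["do you get headaches in the morning",
            "does bright light cause the attacks",
            "is the pain in the face"]
print(suggest("when headaches morning", examples, n=1))
```

The top-ranked examples are then shown to the user as "similar things you could say".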
MedSLT experiments (Chatzichrisafis et al., HLT workshop 2006)
- French → English version of the system
- Basic questions:
  - How quickly do novices become experts?
  - Can people adapt to limited coverage?
- Let subjects use the system several times, and track performance
Experimental setup
- Subjects:
  - 8 medical students, no previous knowledge of the system
- Scenario:
  - Experimenter simulates a headache
  - Subject must diagnose it
  - 3 sessions, 3 tasks per session
- Instruction:
  - ~20 min of instructions and video (headset, push-to-talk)
  - All other instruction from the help system
Results: number of interactions
[Graph; y-axis: interactions.]
Results: time per diagnosis
[Graph.]
Questionnaire results
- I quickly learned how to use the system. (4.4)
- System response times were generally satisfactory. (4.5)
- When the system did not understand me, the help system usually showed me another way to ask the question. (4.6)
- When I knew what I could say, the system usually recognized me correctly. (4.3)
- I was often unable to ask the questions I wanted. (3.8)
- I could ask enough questions that I was sure of my diagnosis. (4.3)
- This system is more effective than non-verbal communication using gestures. (4.3)
- I would use this system again in a similar situation. (4.1)
Summary
- After 1.5 hours of use, subjects complete the task in an average of 4 minutes
  - System implementers average 3 minutes
- All coverage learned from the help system
- Subjects' impressions very positive
A few words about the interlingua
- Coverage in different languages diverges if left to itself
  - Want to enforce uniform coverage
- Many-to-many translation: the "N²" problem
- Solution: translate through an interlingua
  - Tight interlingua definition
Interlingua grammar
- Think of the interlingua as a language
- Define it using Regulus
  - Mostly for constraining representations
  - Also get a surface form
- "Semantic grammar": not linguistic, all about domain constraints
Example of interlingua
Surface form:
  YN-QUESTION pain become-better sc-when [ you sleep PRESENT ] PRESENT
Representation:
  [[utterance_type, ynq], [symptom, pain], [event, become_better],
   [tense, present], [sc, when],
   [clause, [[utterance_type, dcl], [pronoun, you],
             [action, sleep], [tense, present]]]]
Constraints from the interlingua
- Source-language sentences licensed by the grammar may not produce valid interlingua
- The interlingua can act as a knowledge source to improve language modelling
Structure of talk
- Background: Regulus and MedSLT
- Grammar-based language models and statistical language models
Language models
Two kinds of language models:
- Statistical (SLM)
  - Trainable, robust
  - Requires a lot of corpus data
- Grammar-based (GLM)
  - Requires little corpus data
  - Brittle
Compromises between SLM and GLM
- Put weights on the GLM (CFG → PCFG)
  - Powerful technique, see earlier
  - Doesn't address robustness
- Put GLMs inside SLMs (Wang et al., 2002)
- Use the GLM to generate training data for an SLM (Jurafsky et al., 1995; Jonson, 2005)
Generating SLM training data with a GLM
- Optimistic view:
  - Need only a small seed corpus, to build the GLM
  - Will be robust, since the final model is an SLM
- Pessimistic view:
  - "Something for nothing"
  - The data used for the GLM could be used directly to build an SLM
- Hard to decide:
  - We usually don't know what data went into the GLM
  - Often it is just in the grammar writer's head
Regulus permits a comparison
- Use Regulus to build the GLM
- Data-driven process with an explicit corpus
- The same corpus can be used to build an SLM
- The comparison is meaningful
Two ways to build an SLM
- Direct: seed corpus → SLM
- Indirect: seed corpus → GLM → generated corpus → SLM
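Either way, the SLM itself can be as simple as a smoothed n-gram model. A minimal add-one-smoothed bigram sketch (not the actual recogniser's SLM training tooling), which could be fed either the seed corpus or GLM-generated data:

```python
from collections import Counter

def train_bigram(corpus):
    """Tiny add-one-smoothed bigram model: the 'SLM' in either
    pipeline, trained on whichever corpus it is handed."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sent in corpus:
        toks = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(toks)
        unigrams.update(toks[:-1])            # contexts
        bigrams.update(zip(toks, toks[1:]))   # adjacent pairs
    V = len(vocab)
    def prob(prev, word):
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)
    return prob

p = train_bigram(["do you get headaches", "do you get nausea"])
# "you" always follows "do" in this toy corpus, so it should be far
# more likely than an unseen continuation.
assert p("do", "you") > p("do", "nausea")
```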
Parameters for the indirect method
- Size of the generated corpus
  - Can generate any amount of data
- Method of generating the corpus
  - CFG versus PCFG
- Filtering
  - Use the interlingua to filter the generated corpus
CFG versus PCFG generation
- CFG
  - Use the plain GLM to do random generation
- PCFG
  - Use the seed corpus to weight the GLM rules
  - The weights are then used in random generation
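The difference can be sketched with a hypothetical toy grammar: CFG generation samples rules uniformly, while PCFG generation samples them in proportion to weights estimated from seed-corpus rule counts (everything below, including the weight of 9, is invented for illustration):

```python
import random

# Hypothetical toy grammar: uppercase symbols are nonterminals,
# lowercase tuples elements are terminal words.
GRAMMAR = {
    "S":   [("do", "NP", "VP"), ("is", "NP", "ADJ")],
    "NP":  [("you",), ("the", "pain")],
    "VP":  [("get", "headaches"), ("sleep",)],
    "ADJ": [("severe",), ("frequent",)],
}

def generate(sym, weights, rng):
    """Top-down random generation; `weights` maps (lhs, rhs) pairs
    to sampling weights."""
    if sym not in GRAMMAR:                    # terminal symbol
        return [sym]
    rhss = GRAMMAR[sym]
    rhs = rng.choices(rhss, weights=[weights[(sym, r)] for r in rhss])[0]
    return [w for s in rhs for w in generate(s, weights, rng)]

# CFG generation: every expansion of a nonterminal equally likely.
uniform = {(lhs, r): 1 for lhs, rhss in GRAMMAR.items() for r in rhss}

# PCFG generation: weights from (hypothetical) seed-corpus rule counts,
# e.g. "you" was seen far more often as an NP than "the pain".
pcfg = dict(uniform)
pcfg[("NP", ("you",))] = 9

rng = random.Random(0)
print(" ".join(generate("S", uniform, rng)))
print(" ".join(generate("S", pcfg, rng)))
```

With corpus-derived weights, generated sentences concentrate on constructions the seed corpus actually uses, which is why the PCFG-generated data above looks so much more sensible than the CFG-generated data.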
Interlingua filtering
- Impossible to make the GLM completely tight
- Many in-coverage sentences make no sense
- Some of these don't produce valid interlingua
- Use the interlingua grammar as a filter
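As a sketch, the filter is just: analyse each generated sentence and keep it only if the result is an interlingua the interlingua grammar licenses. Everything below (the toy analyser and the licensed set) is a hypothetical stand-in for the real interlingua machinery:

```python
def interlingua_filter(sentences, analyse, licensed):
    """Keep only sentences whose interlingua representation is licensed."""
    return [s for s in sentences if analyse(s) in licensed]

# Hypothetical stand-in analyser: the "interlingua" is just the set of
# known content words; None means the analysis fails entirely.
CONTENT = {"pain", "headaches", "worse"}
def analyse(sentence):
    found = frozenset(w for w in sentence.split() if w in CONTENT)
    return found or None

# Hypothetical licensed interlingua values.
licensed = {frozenset({"pain", "worse"}), frozenset({"headaches"})}

generated = ["is the pain worse",
             "do you get headaches",
             "what attacks of them 're your duration"]
print(interlingua_filter(generated, analyse, licensed))
```

The third sentence produces no valid interlingua and is dropped, mirroring how nonsense survivors of grammar-based generation are removed before SLM training.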
Example: CFG-generated data
  what attacks of them 're your duration all day
  have a few sides of the right sides regularly frequently hurt
  where 's it increased
  what previously helped this headache
  have not any often ever helped
  are you usually made drowsy at home
  what sometimes relieved any gradually during its night 's this severity frequently increased before helping
  when are you usually at home
  how many kind of changes in temperature help a history
Example: PCFG-generated data
  does bright light cause the attacks
  are there its cigarettes
  does a persistent pain last several hours
  is your pain usually the same before
  were there them when this kind of large meal helped joint pain
  do sudden head movements usually help to usually relieve the pain
  are you thirsty
  does nervousness aggravate light sensitivity
  is the pain sometimes in the face
  is the pain associated with your headaches
Example: PCFG-generated data with interlingua filtering
  does a persistent pain last several hours
  do sudden head movements usually help to usually relieve the pain
  are you thirsty
  does nervousness aggravate light sensitivity
  is the pain sometimes in the face
  have you regularly experienced the pain
  do you get the attacks hours
  is the headache pain better
  are headaches worse
  is neck trauma unchanging
Experiments
- Start with the same English seed corpus: 948 utterances
- Generate the GLM recogniser
- Generate different types of training corpus
  - Train an SLM from each corpus
- Compare recognition performance
  - Word Error Rate (WER)
  - Sentence Error Rate (SER)
- McNemar sign test on SER to determine significance
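The significance test is small enough to show in full. A sketch of the exact two-sided sign (McNemar) test on paired per-utterance correctness; only utterances on which the two systems disagree contribute:

```python
from math import comb

def sign_test(a_correct, b_correct):
    """Exact two-sided sign (McNemar) test on paired sentence-level
    outcomes; returns the p-value."""
    b = sum(1 for x, y in zip(a_correct, b_correct) if x and not y)
    c = sum(1 for x, y in zip(a_correct, b_correct) if y and not x)
    n = b + c                         # discordant pairs only
    if n == 0:
        return 1.0
    # Two-sided binomial tail under the null P(either direction) = 1/2.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(2 * tail, 1.0)

# Toy data: system A gets 8 utterances right that B misses, B gets 1
# right that A misses, and they agree on the rest.
a = [True] * 8 + [False] + [True, True]
b = [False] * 8 + [True] + [True, True]
print(sign_test(a, b))    # 2 * (C(9,0) + C(9,1)) / 2**9 = 0.0390625
```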
Experiment 1: different methods

Version                  Corpus   WER      SER
GLM                      948      21.96%   50.62%
SLM, seed corpus         948      27.74%   58.40%
SLM, CFG, no filter      4281     49.0%    88.4%
SLM, CFG, filter         4281     44.68%   85.68%
SLM, PCFG, no filter     4281     25.98%   65.31%
SLM, PCFG, filter        4281     25.81%   63.70%
Experiment 1: significant differences
- GLM >> all SLMs
- seed corpus >> all generated corpora
- PCFG generation >> CFG generation
- filtered > not filtered
- However, the generated corpora are small…
Experiment 2: different sizes of corpus

Version                  Corpus    WER      SER
GLM                      948       21.96%   50.62%
SLM, seed corpus         948       27.74%   58.40%
SLM, PCFG, no filter     16 619    24.84%   62.47%
SLM, PCFG, filter        16 619    23.80%   59.51%
SLM, PCFG, no filter     497 798   24.38%   59.88%
SLM, PCFG, filter        497 798   23.76%   57.16%
Experiment 2: significant differences
- GLM >> all SLMs
- large corpus > small corpus
- large unfiltered generated corpus ~ seed corpus
  - SER for the large unfiltered corpus is about the same
- large filtered generated corpus ~/> seed corpus
  - SER for the large filtered corpus is better, but not significantly
- filtered > not filtered
Experiment 3: like Experiment 2, but only in-coverage data

Version                  Corpus    WER      SER
GLM                      948       7.00%    22.37%
SLM, seed corpus         948       14.40%   42.02%
SLM, PCFG, no filter     16 619    14.13%   46.11%
SLM, PCFG, filter        16 619    12.76%   40.86%
SLM, PCFG, no filter     497 798   12.35%   40.66%
SLM, PCFG, filter        497 798   11.25%   36.19%
Experiment 3: significant differences
- GLM >> all SLMs
- large corpus > small corpus
- large unfiltered generated corpus ~/> seed corpus
  - SER for the large unfiltered corpus is better, but not significantly
- large filtered generated corpus > seed corpus
- filtered > not filtered
Using GLMs to make SLMs: conclusions
- Regulus lets us evaluate fairly
- The indirect method for building an SLM is only slightly better than the direct one
- The GLM is better than all SLM variants
  - Especially clear on in-coverage data
- PCFG generation is much better than CFG generation
Summary
- MedSLT
  - Potentially useful tool for doctors in the future
  - Good test-bed for research now
- Using GLMs to build SLMs
  - Example of how Regulus lets us evaluate a grammar-based method objectively
For more information
- Regulus websites:
  - http://sourceforge.net/projects/regulus/
  - http://www.issco.unige.ch/projects/regulus/
- Rayner, Hockey and Bouillon, "Putting Linguistics Into Speech Recognition" (CSLI Press, June 2006)