1. BabyTalk: Generating English Summaries of Clinical Data
Ehud Reiter, Univ of Aberdeen, CS Dept
2. Structure
- Background: data-to-text
- BabyTalk project
- Results of first evaluation
- Current work
3. What is data-to-text?
- Goal: generate English summaries of non-linguistic data
  - Numerical weather predictions
  - Medical records
  - Statistics
  - Etc.
4. Simple Example: Weather Forecasts
- Input: numerical weather predictions
  - From a supercomputer running a numerical weather simulation
- Output: textual weather forecast
- We’ve developed several systems
  - Two used commercially (oil rigs, road gritting)
  - Users prefer some generated texts to human-written texts!
  - Demo of pollen system on our webpage
- So have others (FoG, MultiMeteo, …)
5. Pollen forecasts
Grass pollen levels for Tuesday have decreased from the high levels of yesterday with values of around 4 to 5 across most parts of the country. However, in South Eastern areas, pollen levels will be high with values of 6.
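To make the data-to-text idea concrete, here is a toy template that produces text in the style of the pollen forecast above. It is a minimal sketch, not the Aberdeen pollen system: the 1-10 scale, the "high"/"moderate"/"low" bands, the rule that regions in a different band from the typical value are reported separately, and the region names are all hypothetical.

```python
from statistics import median_high


# Hypothetical banding on a 1-10 scale: 6+ is "high", 4-5 "moderate".
def band(value: int) -> str:
    if value >= 6:
        return "high"
    if value >= 4:
        return "moderate"
    return "low"


def pollen_summary(day: str, yesterday: int, regions: dict[str, int]) -> str:
    """Summarise regional grass pollen values against yesterday's level."""
    typical = median_high(regions.values())
    # Regions in the same band as the typical value form "most parts";
    # regions in a different band are reported separately.
    main = [v for v in regions.values() if band(v) == band(typical)]
    outliers = {r: v for r, v in regions.items() if band(v) != band(typical)}

    if typical < yesterday:
        trend = f"have decreased from the {band(yesterday)} levels of yesterday"
    elif typical > yesterday:
        trend = f"have increased from the {band(yesterday)} levels of yesterday"
    else:
        trend = "are unchanged from yesterday"

    low, high = min(main), max(main)
    values = f"{low}" if low == high else f"{low} to {high}"
    text = (f"Grass pollen levels for {day} {trend} with values of around "
            f"{values} across most parts of the country.")
    for region, value in outliers.items():
        text += (f" However, in {region}, pollen levels will be "
                 f"{band(value)} with values of {value}.")
    return text


regions = {"Scotland": 4, "Wales": 5, "Northern England": 4,
           "Midlands": 5, "South Eastern areas": 6}
print(pollen_summary("Tuesday", 7, regions))
```

With these made-up inputs the function reproduces the two sentences of the example forecast; a real system would need far richer content selection and wording.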
6. Other data-to-text apps
- Medical: to be discussed
- Assistive technology: help blind people access statistical data
- Financial: summarise stock-market data
- Education: summarise assessment results, help write stories
- Engineering: summarise gas-turbine data
- Etc.
7. Why is data-to-text useful?
- The world is drowning in data
- NLP researchers talk about the problem of too much text, but data problems are worse
  - Texts are at least read by someone (the writer)
  - Most data is automatically collected and never looked at by a human
8. Data overload
- Sensor recording 2 bytes/second
  - 170KB/day
  - 63MB/year
  - Millions of sensors in hospitals, jet engines, …
- Simulations
  - Weather: 30MB for one day in one UK county, from one model
  - Climate models: petabytes of data
- Too much data; we need better tools for utilising it!
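The per-sensor figures follow directly from the sampling rate, taking 1 KB = 1,000 bytes and rounding loosely:

```latex
\begin{aligned}
2~\text{bytes/s} \times 86\,400~\text{s/day} &= 172\,800~\text{bytes/day} \approx 170~\text{KB/day}\\
172\,800~\text{bytes/day} \times 365~\text{days/year} &= 63\,072\,000~\text{bytes/year} \approx 63~\text{MB/year}
\end{aligned}
```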
9. Decision Support
- Data often used for decision support
  - Medical: help doctors make decisions
  - Weather: help staff on offshore oil rigs plan their operations
  - Engineering: help plan maintenance
  - Etc.
- Often under time pressure
  - “Make a decision in 3 min; here is 30MB of data to help you”
10. Using data for decision support
- Alarming
  - Trigger an alarm if a value exceeds a threshold
  - Or some other simple rule
  - Works, but doesn’t get the full value from the data
- Visualisation
  - Show data to experts visually
  - People like this, but it is unclear how much it helps, especially with massive amounts of data
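The alarming approach can be sketched as a single threshold rule. This is a generic illustration, not any particular monitor's logic; the heart-rate limits and sample values below are hypothetical.

```python
def threshold_alarms(samples, low, high):
    """Return (time, value, kind) for every sample outside [low, high].

    `samples` is a list of (time, value) pairs; `low` and `high` are
    hypothetical alarm limits chosen by the caller.
    """
    alarms = []
    for time, value in samples:
        if value < low:
            alarms.append((time, value, "low"))
        elif value > high:
            alarms.append((time, value, "high"))
    return alarms


# Heart rate in bpm, one sample per minute (made-up values).
heart_rate = [(0, 150), (1, 90), (2, 160), (3, 210)]
print(threshold_alarms(heart_rate, low=100, high=200))
# prints [(1, 90, 'low'), (3, 210, 'high')]
```

Each alarm reports only that a limit was crossed; nothing links it to other events (handling, ventilator changes) that might explain it, which is one way such rules fail to get the full value from the data.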
11. Using data for decision support (cont.)
- Knowledge-based systems (KBS)
  - Feed data into an expert system which makes recommendations based on it
  - Can work in some contexts, but there are problems
    - Domain experts dislike being told what to do
    - Key data is often not available to the KBS
    - Can be brittle, fragile
12. Data-to-text for decision support
- Idea: use KBS and NLP technology to generate a short text summary of a data set
- Intermediate between KBS and visualisation
  - Use domain reasoning to highlight key info, infer causal links, add background knowledge
  - But stick to describing the data; don’t tell experts what to do!
13. Data-to-text for decision support
- vs alarms: deeper info
- vs visualisation
  - Just key facts, not everything
  - Supplemented with causal links, etc.
- vs KBS
  - More acceptable to users
  - More robust, since not useless if some key data or knowledge is missing
14. Data-to-text for decision support
- The above is still somewhat speculative
- But people in many domains are interested in exploring the concept to see if it works
  - Especially since the current situation is so bad!
- Of course, there are other uses of data-to-text
  - Assistive technology, education
15. Language and World
- How does language relate to the world?
- Data-to-text is a great way of exploring this
  - The real reason I got into this…
16. BabyTalk
- Goal: summarise clinical data about premature babies in a neonatal ICU
- Input: sensor data; records of actions/observations by medical staff
- Output: multi-paragraph texts summarising the data
- BT45: 45 mins of data, for doctors (completed)
- BT-Nurse: 12 hrs of data, for nurses
- BT-Family: 24 hrs of data, for parents
- BT-Clan: 24 hrs of data, for other friends and family
- BT-Doc: several hrs of data, for doctors
20. Input: Action Records
FullDescriptor                              Time
SETTING;VENTILATOR;FiO2 (36%)               10.30
MEDICATION;Morphine                         10.44
ACTION;CARE;TURN/CHANGE POSITION;SUPINE
ACTION;RESPIRATION;HAND-BAG BABY
SETTING;VENTILATOR;FiO2 (60%)               10.47
ACTION;RESPIRATION;INTUBATE
21. BT45 texts
Human corpus text:
At 1046 the baby is turned for re-intubation and re-intubation is complete by 1100 the baby being bagged with 60% oxygen between tubes. During the re-intubation there have been some significant bradycardias down to 60/min, but the sats have remained OK. The mean BP has varied between 23 and 56, but has now settled at 30. The central temperature has fallen to 36.1°C and the peripheral temperature to 33.7°C. The baby has needed up to 80% oxygen to keep the sats up.
Computer-generated text:
By 11:00 the baby had been hand-bagged a number of times causing 2 successive bradycardias. She was successfully re-intubated after 2 attempts. The baby was sucked out twice. At 11:02 FIO2 was raised to 79%.
22. BabyTalk architecture
- Signal analysis: patterns, trends
- Data interpretation: based on medical knowledge (like an expert system)
- Document planning: select and structure the events to be mentioned
- Microplanning: choose words, syntactic structures, referring expressions
- Realisation: generate the actual text
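The five stages above can be sketched as a chain of functions, each consuming the previous stage's output. This is a toy, not BT45: the 100 bpm threshold, the aggregation rule, and the output wording are illustrative assumptions, and the real system uses the simplenlg package for realisation rather than string formatting.

```python
# Toy stand-ins for the five stages; thresholds and wording are hypothetical.

def signal_analysis(hr):
    """Find samples below 100 bpm in a list of (minute, bpm) pairs."""
    return [{"time": t, "value": v} for t, v in hr if v < 100]


def data_interpretation(dips):
    """Label each dip as a bradycardia (the medical-knowledge step)."""
    return [{"type": "bradycardia", **d} for d in dips]


def document_planning(events):
    """Select and order events; here, simply keep them all in time order."""
    return sorted(events, key=lambda e: e["time"])


def microplanning(events):
    """Choose words and structure: aggregate all bradycardias into one clause."""
    if not events:
        return None
    n = len(events)
    lowest = min(e["value"] for e in events)
    noun = "bradycardia" if n == 1 else "bradycardias"
    return {"subject": "there", "verb": "was" if n == 1 else "were",
            "object": f"{n} {noun} down to {lowest}"}


def realisation(spec):
    """Turn the sentence specification into a string."""
    if spec is None:
        return ""
    return f"{spec['subject'].capitalize()} {spec['verb']} {spec['object']}."


STAGES = [signal_analysis, data_interpretation, document_planning,
          microplanning, realisation]


def run_pipeline(data, stages=STAGES):
    # Each stage consumes the previous stage's output.
    for stage in stages:
        data = stage(data)
    return data


print(run_pipeline([(0, 150), (1, 95), (2, 140), (3, 60), (4, 145)]))
# prints: There were 2 bradycardias down to 60.
```

Keeping the stages as separate functions mirrors the architecture: each one can be replaced (for example, a better signal analyser) without touching the others.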
23. Signal Analysis
- Detect trends, patterns, events, etc.
  - Blood oxygen levels increasing
  - Downward spike in heart rate
- Detect artefacts
  - Changes due to sensor problems
- Plenty of algorithms exist for this
- Will not discuss further here
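As a naive stand-in for the many existing signal-analysis algorithms, the sketch below detects single-sample downward spikes and estimates a trend with a least-squares slope. The `drop` threshold is a hypothetical parameter, and no artefact detection is attempted.

```python
def downward_spikes(values, drop):
    """Indices i where values[i] is at least `drop` below both neighbours."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] - values[i] >= drop
            and values[i + 1] - values[i] >= drop]


def slope(values):
    """Least-squares slope per sample: > 0 rising trend, < 0 falling trend."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


heart_rate = [150, 152, 90, 151, 149]  # made-up bpm values
print(downward_spikes(heart_rate, drop=30))  # prints [2]
print(slope([90, 91, 92, 93]))  # prints 1.0 (e.g. oxygen rising steadily)
```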
24. Data Abstraction
- Detect higher-level events in the data
  - Sequence of bradycardias (downward spikes in heart rate)
- Determine medical importance
  - A bradycardia is more important if there is a simultaneous desaturation (downward spike in oxygen saturation)
- Medical KBS
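Both abstraction steps can be sketched over simple interval events: grouping nearby bradycardias into a sequence, and raising a bradycardia's importance when it overlaps a desaturation. The scores (50, +30) and the `max_gap` value are hypothetical; BabyTalk's actual medical knowledge base is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str     # e.g. "bradycardia", "desaturation"
    start: float  # minutes from start of data
    end: float


def overlaps(a, b):
    return a.start <= b.end and b.start <= a.end


def importance(event, others):
    """Hypothetical scores: a bradycardia with a simultaneous desaturation
    is rated more important than an isolated one."""
    score = {"bradycardia": 50, "desaturation": 50}.get(event.kind, 10)
    if event.kind == "bradycardia" and any(
            o.kind == "desaturation" and overlaps(event, o) for o in others):
        score += 30
    return score


def sequences(events, kind, max_gap):
    """Group same-kind events separated by at most `max_gap` minutes
    into higher-level 'sequence' events (runs of two or more)."""
    runs = []
    for e in sorted((e for e in events if e.kind == kind), key=lambda e: e.start):
        if runs and e.start - runs[-1][-1].end <= max_gap:
            runs[-1].append(e)
        else:
            runs.append([e])
    return [run for run in runs if len(run) > 1]


b1, b2 = Event("bradycardia", 0, 1), Event("bradycardia", 3, 4)
b3, desat = Event("bradycardia", 20, 21), Event("desaturation", 0.5, 2)
print(importance(b1, [desat]), importance(b3, [desat]))  # prints 80 50
print(len(sequences([b1, b2, b3, desat], "bradycardia", max_gap=5)))  # prints 1
```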
25. Data Abstraction: Links Between Events
- Infer links between events
  - Blood O2 falls, therefore O2 level in incubator is increased
  - HR up because baby is being handled
  - Morphine given as part of the intubation procedure
- Very important: much of the value added by the text
  - Helps readers build a good mental model of what is happening to the baby
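One simple way to infer such links is with rules that pair an earlier event kind with a later one inside a time window. The rule list, event kind names, and windows below are hypothetical illustrations of the three example links, not the knowledge BabyTalk actually uses.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str
    time: float  # minutes from start of data


# Hypothetical rules: (earlier kind, later kind, relation, max gap in minutes).
LINK_RULES = [
    ("desaturation", "fio2_increase", "causes", 10),
    ("handling", "hr_increase", "causes", 5),
    ("morphine", "intubation", "part_of", 30),
]


def infer_links(events, rules=LINK_RULES):
    """Return (earlier, relation, later) triples licensed by the rules."""
    links = []
    for a in events:
        for b in events:
            for first, second, relation, max_gap in rules:
                if (a.kind == first and b.kind == second
                        and 0 <= b.time - a.time <= max_gap):
                    links.append((a, relation, b))
    return links


events = [Event("desaturation", 0), Event("fio2_increase", 3),
          Event("morphine", 14), Event("intubation", 21)]
for a, relation, b in infer_links(events):
    print(a.kind, relation, b.kind)
# prints:
# desaturation causes fio2_increase
# morphine part_of intubation
```

Pure time-window rules will over-generate links in busy periods; real link inference needs medical knowledge about which explanations are plausible.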
26. Document Planning
- First NLP stage
- Decide what events to mention
- Decide how these are ordered and organised
27. Content Determination
- First approach:
  - Include the most medically important events
  - Also include moderately important events which are linked to very important events
- Doesn’t always work
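The first approach can be sketched in two steps: take every very important event, then add moderately important events linked to one of them. The thresholds (70 and 40), event IDs, and scores are hypothetical.

```python
def select_content(importance, links, high=70, moderate=40):
    """Pick events to mention.

    importance: {event_id: score}; links: iterable of (id_a, id_b) pairs
    produced by data abstraction. Thresholds are hypothetical.
    """
    # Step 1: every very important event is included.
    core = {e for e, score in importance.items() if score >= high}
    # Step 2: add moderately important events linked to a core event
    # (links are treated as undirected).
    extra = {y
             for a, b in links
             for x, y in ((a, b), (b, a))
             if x in core and moderate <= importance[y] < high}
    return core | extra


importance = {"brady_seq": 85, "desat": 75, "fio2_up": 50,
              "temp_drift": 45, "probe_moved": 20}
links = {("desat", "fio2_up"), ("temp_drift", "probe_moved")}
print(sorted(select_content(importance, links)))
# prints ['brady_seq', 'desat', 'fio2_up']
```

Note that `temp_drift` is dropped because it is linked only to an unimportant event; omissions like this are one reason the approach "doesn’t always work".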
28. Problem: Continuity
- Omitting intermediate events confuses readers
- Example: “TcPO2 suddenly decreased to 8.1. SaO2 increased to 92. TcPO2 suddenly decreased to 9.3.”
  - There is a gradual rise in TcPO2 between the sudden falls
  - This is less important medically
  - But important for the reader’s comprehension
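A minimal continuity check, assuming each event records the channel value it ends at: if two selected falls on the same channel end with the second value higher than the first, there must have been an unreported rise, so any omitted rise between them is put back. The rule and the event times are hypothetical; the values follow the TcPO2/SaO2 example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    channel: str
    kind: str     # "decrease" or "increase"
    time: float   # minutes from start of data
    value: float  # channel value at the end of the event


def add_continuity(selected, all_events):
    """Re-insert omitted rises needed to explain consecutive same-channel falls."""
    result = set(selected)
    by_channel = {}
    for e in sorted(selected, key=lambda e: e.time):
        by_channel.setdefault(e.channel, []).append(e)
    for channel, evs in by_channel.items():
        for first, second in zip(evs, evs[1:]):
            # Two falls where the second ends higher imply a rise in between.
            if first.kind == second.kind == "decrease" and second.value > first.value:
                result.update(e for e in all_events
                              if e.channel == channel and e.kind == "increase"
                              and first.time < e.time < second.time)
    return sorted(result, key=lambda e: e.time)


fall1 = Event("TcPO2", "decrease", 10, 8.1)
rise = Event("TcPO2", "increase", 20, 12.0)   # omitted as medically minor
fall2 = Event("TcPO2", "decrease", 30, 9.3)
sat = Event("SaO2", "increase", 15, 92)
for e in add_continuity([fall1, sat, fall2], [fall1, rise, fall2, sat]):
    print(e.channel, e.kind, e.value)
```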
29. Document Structure
- How do we order/group events?
  - By time
  - By medical importance
  - By body subsystem (e.g., respiration)
- Initially focused on time, but users want more emphasis on subsystem
  - E.g., first a “scene” about respiration, then a “scene” about thermoregulation
  - Not constant shifting between the two
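Subsystem “scenes” can be sketched by grouping time-sorted events by subsystem, so that each scene keeps its internal time order and scenes appear in the order of their first event. That ordering rule is an assumption (ordering scenes by importance would be equally plausible), and the event descriptions are made up.

```python
def scenes(events):
    """Group (time, subsystem, description) events into per-subsystem scenes.

    Scenes are ordered by their earliest event; events within a scene stay
    in time order.
    """
    grouped = {}
    for event in sorted(events):
        grouped.setdefault(event[1], []).append(event)
    return list(grouped.items())


events = [(10.35, "thermoregulation", "central temperature fell"),
          (10.30, "respiration", "FiO2 set to 36%"),
          (10.47, "respiration", "FiO2 raised to 60%"),
          (10.50, "thermoregulation", "peripheral temperature fell")]
for subsystem, scene in scenes(events):
    print(subsystem, [description for _, _, description in scene])
```

Compared with a purely chronological ordering, this avoids switching back and forth between respiration and thermoregulation.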
30. Document Planning: Narrative
- High-level analysis: need to do a better job of generating a “story” from the data
  - Link events together
  - Include events needed for story progression even if not important
  - “Scene” structure
- Qualitative observation by users
31. Microplanning
- Second NLP stage
- Choose words and syntactic structure to express information
- Aggregation
- Reference
32. Challenge: Time
- Need to communicate temporal info
  - Enough so that readers can interpret the data
  - Not too much, or the text becomes unreadable
  - Imagine a story with “At John left home. At he met Mary in the pub. At 10.39…”
33. Tenses
- Use the Reichenbach model
  - Speech time: time of report being read
  - Event time: time of event being described
  - Reference time: determined using a salience model
    - Similar to resolving anaphoric reference
- Usually worked, sometimes failed
  - Need a better model for reference time
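Given the three times, Reichenbach-style tense selection reduces to two comparisons: reference vs speech time gives past/present/future, and event vs reference time gives anterior/simple/posterior. The sketch takes the reference time as an input; the salience model that chooses it is not modelled. The example times are hypothetical, chosen to match the “By 11:00 the baby had been hand-bagged” sentence.

```python
def reichenbach(event, reference, speech):
    """Classify a tense from event (E), reference (R) and speech (S) times."""
    if reference < speech:
        r = "past"
    elif reference == speech:
        r = "present"
    else:
        r = "future"
    if event < reference:
        e = "anterior"
    elif event == reference:
        e = "simple"
    else:
        e = "posterior"
    return f"{e} {r}"


# English forms for a few of the nine combinations.
ENGLISH_FORM = {
    "anterior past": "past perfect (had been hand-bagged)",
    "simple past": "simple past (was hand-bagged)",
    "anterior present": "present perfect (has been hand-bagged)",
    "simple present": "simple present (is hand-bagged)",
}


def minutes(hhmm):
    """Convert an 'HH:MM' string to minutes since midnight."""
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)


# Hypothetical: event at 10:46, reference "By 11:00", report read at 12:00.
tense = reichenbach(minutes("10:46"), minutes("11:00"), minutes("12:00"))
print(tense, "->", ENGLISH_FORM[tense])
# prints: anterior past -> past perfect (had been hand-bagged)
```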
34. What does event time mean?
- Sometimes an explicit time is given for an event
  - Supposed to be the start time of the event, but sometimes misinterpreted
- Ex: “After three attempts, at 13.53 a peripheral venous line was inserted successfully.”
  - 13.53 refers to the time of the first (failed) attempt
    - Start of the LINE-INSERT-ATTEMPTS event
  - Readers interpret it as the time of the final (successful) attempt
- Need a better linguistic model of time
  - Linguistic temporal ontology (Moens & Steedman)?
35. Lexical Choice
- Need a mechanism to map domain events (instances in a Protégé ontology) to linguistic structures
  - Use JESS rules
  - Lexical info from VerbNet, NIH lexicon
- Engineering challenge
  - Relate to Sheffield work on NLG/ontologies
36. Vague language
- Human texts are full of vague language
  - Ex: “There is a momentary bradycardia”
  - What does “momentary” mean?
- Our models of this are very crude and need to be improved!
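A crude model of the kind described maps an event's duration to a vague adjective through fixed bands. The band limits and the words other than “momentary” are hypothetical, not BabyTalk's actual values.

```python
# Hypothetical upper bounds in seconds for each vague duration word.
DURATION_WORDS = [
    (10, "momentary"),
    (60, "brief"),
    (300, "short"),
    (float("inf"), "prolonged"),
]


def duration_word(seconds):
    """Map an event's duration to a vague adjective."""
    for upper, word in DURATION_WORDS:
        if seconds <= upper:
            return word
    raise ValueError("duration must be a number")


print(f"There is a {duration_word(6)} bradycardia")
# prints: There is a momentary bradycardia
```

Fixed bands ignore context: the same duration might be described differently depending on the signal, the baby’s condition, or the reader, which is part of why such models count as crude.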
37. Realisation
- Last NLG stage
- Generate the actual text, once choices are made
- Use the Aberdeen simplenlg package
- Will not discuss further here
38. BT45 Evaluation
- Showed 35 medical professionals 24 scenarios in 3 conditions (8 of each)
  - Visualisation of medical data
  - Textual summary (manually written)
  - Textual summary (from BT45)
- Asked to make a treatment decision
  - Limited to 3 minutes
  - Measured correctness (against a gold standard)
- Off-ward, using historical data
  - So no other knowledge about the baby
39. Free-text comments
- Comments were not solicited, but were recorded if made
- Most important were:
  - Better layout (e.g., bullet lists)
  - Continuity (as mentioned before)
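40. Decision-Support results
- No significant difference in time taken
- Average decision quality (scale -1 to 1):
  - Human texts: 0.39
  - Computer texts: 0.34
  - Visualisation: 0.33
- Human texts significantly better than computer texts and visualisation
- No significant difference between computer texts and visualisation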
41. Results by subject type
- Analysis by type of subject
- Human texts especially good for junior nurses (i.e., the least experienced subjects)
42. Results by scenario
- Each scenario had a main target action
  - 8 different ones
- Computer texts as good as human texts for five of these; worse for three:
  - No action, manage temperature, monitor equipment
- These relate to specific problems in the system, which can be fixed
43. Target Actions with Poor Performance
- No action: needs a high-level summary, not a blow-by-blow event description
- Manage temperature: two temperature channels, need to describe them together
- Monitor equipment: need to mention (not ignore) sensor artefacts
44. Summary
- Good performance with human texts shows textual presentation is effective
  - Also seen in a previous study
- BabyTalk as good as visualisation; could be made better by addressing the above issues
- Even now, giving users BabyTalk text as a supplement to visualisations could help
45. Current Work
- BT-Nurse: shift summaries for nurses
  - Use live data from current babies
  - Evaluate on ward, using babies that subjects (nurses) are actually looking after
  - Focus on info relevant to nurse shift planning, not real-time decision support
  - Longer time period (12 hrs)
    - Need more sensor abstraction
    - Longer texts (multi-page)
46. Current Work
- BT-Family: information for parents
  - Estimate how stressed parents are, and use this to control content and phrasing
    - High stress means less content
    - Relate to Sheffield work on personality?
  - Express information in language which parents can understand, not medicalese
47. Current Work
- BT-Clan: information for friends and family
  - Social networking perspective: encourage useful support, minimise the hassle of dealing with numerous inquiries
  - Parents decide what to tell people
    - Intentional deceit: if granny is frail, don’t tell her bad news
  - Info about parents as well as baby
48. Research agenda
- Detecting complex events in the data
- Integration with medical guidelines
- Better use of vague language
- Better stories
- Role of text in interactive multimodal information presentation systems
- Try in the domain of assisted living