
Applying Natural Language Generation to Electronic Health Records in an e-Science context. Donia Scott, Centre for Research in Computing, The Open University.



2 Outline: background: the CLEF project; patient records as data-encoded patient histories; the role of NLG in CLEF; intuitive querying with natural language; generating tailored reports from CLEF data.

3 Background: the CLEF project. CLEF (Clinical E-Science Framework) is an MRC-funded project that aims to provide a repository of well-organised, data-encoded clinical histories. Aim: to provide the framework for a new type of medical research, in silico experiments. Partners: NLP: OU, Sheffield; medical informatics: Manchester; Electronic Health Records: Royal Marsden Hospital, UCL; privacy/confidentiality: Cambridge.

4 Collect clinical information from multiple sites; analyse, structure and integrate it; make it available, using GRID tools, to authorised clinicians and e-Health scientists, in a secure and ethical collaborative framework. [Diagram: GRID]

5 The CLEF repository. Chronicle Repository: organised data on individual patients. Data from: referral letters, review notes, lab results, nurse notes, hospital admission notes, hospital discharge notes, treatment notes, surgery reports.

6 The CLEF Chronicle: representing the story of a patient over time

7 The story of an illness [Diagram: patient events laid out along a time axis, each with a numeric ID (Human, Mass, Pain, Breast, Ulcer, Cancer, Biopsy, Clinic, Radio, Chemo), linked by relations such as locus, finding, reason, plans, treats, target and attends]

8 A typical cancer patient
[Tables: repository extract for one patient; approximate row counts and column headers]
Problems (~200; e.g. primary cancer, recurrent cancer, lymphadenopathy, enlargement): SimID, ID, EventStartDate, EventEndDate, Name, Status, Existence, Clinical Course, Genotype, mmSize, HistologyGrade, TumourMarker, NodesCounted, NodesInvolved
Interventions (15; e.g. lumpectomy, radical mastectomy, chemotherapy and radiotherapy cycles): SimID, ID, EventStartDate, EventEndDate, Name, Status, Outcome
Investigations (~100; e.g. examination, testing, Xray, excision biopsy, histopathology): SimID, ID, EventStartDate, EventEndDate, Name, Status
Drugs (~5; e.g. epirubicin, doxorubicin, 5-fluorouracil, cyclophosphamide): SimID, ID, Name, Regime
Consults (~10; e.g. mammography screening, follow up): SimID, ID, EventStartDate, EventEndDate, Status, Type, Location
Loci (~20; e.g. right breast, right axillary lymphnodes, liver, spleen): SimID, ID, Name, Laterality
Relations (~600; e.g. HAS_FINDING, HAS_LOCUS, HAS_TARGET, INDICATED_BY, RECOMMENDED_BY, ARRANGED): SimID, Item1Type, Item1ID, Relation, Item2Type, Item2ID

9 The role of NLG: (1) an intuitive query interface providing efficient access to aggregated data-encoded patient histories, for assisting in diagnosis and treatment, identifying patterns in treatment, and selecting subjects for clinical trials; (2) generating reports from the data-encoded histories, for clinicians to use at the point of care.

10 Intuitive querying of the CLEF repository

11 What does the CLEF database provide? Evidence from about 20,000 patient records, comprising 3.5 million record components (about 5 GB of data), all in the area of cancer: 162 queryable fields and various text-only (non-queryable) records. There are two types of data: structured data, and data extracted from narratives by IE. Queryable data is encoded according to various medical terminologies (SNOMED, ICD, UMLS); approximately 19,500 different medical codes are currently used in the database (a relatively small subset of SNOMED and ICD).

12 Queryable data. Structured data: demographics (age, gender, postal district, ethnic group, occupation); laboratory findings (32 types of haematology findings, 51 types of chemistry findings, cytology reports, histopathology reports); imaging studies (radiology procedure, site, diagnosis, morphology, topography, report, indication, department); treatments (prescription drugs, chemotherapy protocol, IV chemotherapy, radiotherapy, surgical procedures); diagnoses (clinical diagnosis, cause(s) of death). Data extracted from narratives.

13 Query interface requirements. Designed for casual and moderate users who are familiar with the semantic domain of the repository but not with its technical implementation (typically clinicians or medical researchers). The interface should allow the construction of complex queries with nested structures and temporal expressions, minimise the risk of ambiguities, and offer good coverage of the data types in the CLEF database. It should be usable with minimal training and without prior knowledge of medical terminologies, formal query languages or databases.

14 Typical queries How many patients with AML have had a normal count after two cycles of treatment? How many patients with primary breast cancer have relapsed in the last five years? What is the median time between first drug treatment for metastatic breast cancer and death? In breast cancer patients, what is the incidence of lymphoedema of the arm that persists more than two years after primary surgical treatment? What is the average number of x-rays for patients with prostate cancer? What is the average time between first treatment for cervical cancer and death for patients aged less than 60 at death compared with those aged over 60? How many patients between the ages of 40 and 60 when they were first diagnosed with lung cancer had a platelet count higher than 300 but a white cell count lower than 3 before the 4th cycle of any course of chemotherapy they received during treatment?

15 Querying alternatives. SQL: not appropriate for the typical CLEF user, since it requires deep knowledge of the database structure and content and of the medical terminologies used in the database. Graphical interfaces: have to cope with a large number of parameters, and nested structures and temporal restrictions are difficult to express. Natural language interfaces: more natural and more expressive than formal query languages, but sensitive to errors in composition, spelling and vocabulary; they normally understand only a subset of natural language; complex queries are difficult to process; and it is difficult to trace the source of errors in the result.

16 The CLEF approach. Similar to natural language interfaces, except that the user edits the conceptual meaning of a query rather than its surface text. This allows users to construct unambiguous queries easily, and guides them towards constructing only correct queries (queries compatible with the content of the database). The approach is partly database-independent but very domain-specific. It is based on the Conceptual Authoring (aka WYSIWYM) technique (Power and Scott, 1998): the query is presented to the user as an interactive text, which is edited by making selections on its various components. Each selection triggers a regeneration of the text, resulting in a new feedback text that incorporates the selection the user made.
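A minimal sketch of the edit-then-regenerate loop, assuming a query model made of typed slots. `Slot`, `DiagnosisQuery` and `select` are hypothetical names, not the CLEF implementation; the point is that the feedback text is always generated from the model and never parsed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Slot:
    """An editable anchor in the feedback text (hypothetical)."""
    placeholder: str              # shown while the slot is empty, e.g. "some cancer"
    value: Optional[str] = None   # the user's selection, once made

    def render(self) -> str:
        # Empty slots appear in square brackets, as in the slides' examples.
        return self.value if self.value is not None else f"[{self.placeholder}]"


@dataclass
class DiagnosisQuery:
    """Hypothetical semantic model behind 'Patients diagnosed with X in Y'."""
    cancer: Slot = field(default_factory=lambda: Slot("some cancer"))
    site: Slot = field(default_factory=lambda: Slot("some body part"))

    def feedback_text(self) -> str:
        # The text is always generated from the model; it is never parsed.
        return f"Patients diagnosed with {self.cancer.render()} in {self.site.render()}"

    def is_complete(self) -> bool:
        return self.cancer.value is not None and self.site.value is not None


def select(query: DiagnosisQuery, slot_name: str, value: str) -> str:
    """Apply one user selection to the model, then regenerate the feedback text."""
    getattr(query, slot_name).value = value
    return query.feedback_text()
```

Because the user only ever changes the model, each feedback text has exactly one underlying meaning; the string is a view of the model, not an input to be interpreted.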

17 Query editing

18 Modelling queries. A query has four distinct sections: a description of the subjects (in terms of demographic information and basic diagnosis); a description of the treatments the subjects received; a description of laboratory findings; and an outcome section (what we want to know about the group of patients just described). Each query element can be expressed as a conjunction or disjunction of same-type query elements, e.g. cancer of the breast and of the lung; patients who received chemotherapy and radiotherapy. Some query elements can be temporally related to each other, e.g. patients who received chemotherapy within 5 months of surgery; patients alive 5 years after the diagnosis.
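The four-section query structure can be sketched as plain data classes. This is an assumed representation for illustration (all class and field names are hypothetical): a `Group` enforces that conjunctions and disjunctions combine only same-type elements, and `Temporal` links two elements with an interval.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Optional


@dataclass
class Element:
    """A single query element (hypothetical), e.g. a diagnosis or a treatment."""
    kind: str    # e.g. "diagnosis", "treatment", "finding"
    value: str   # e.g. "cancer of the breast", "chemotherapy"


@dataclass
class Group:
    """Conjunction or disjunction of same-type query elements."""
    op: Literal["and", "or"]
    items: List[Element]

    def __post_init__(self) -> None:
        kinds = {e.kind for e in self.items}
        if len(kinds) > 1:
            raise ValueError(f"mixed element types in one group: {sorted(kinds)}")


@dataclass
class Temporal:
    """Temporal link between two elements, e.g. 'within 5 months of'."""
    first: Element
    relation: str   # e.g. "within", "after"
    amount: int
    unit: str       # e.g. "months", "years"
    second: Element


@dataclass
class Query:
    subjects: Optional[Group] = None     # demographics and basic diagnosis
    treatments: Optional[Group] = None
    findings: Optional[Group] = None     # laboratory findings
    outcome: str = "number of patients"  # what to report about the group
    temporal: List[Temporal] = field(default_factory=list)
```

For example, "patients with cancer of the breast and of the lung who received chemotherapy within 5 months of surgery" becomes a `subjects` group with `op="and"` plus one `Temporal` entry.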

19 Constraining user choices. At each step, users are offered only valid choices, and choices are context-dependent. For example, in "Patients diagnosed with [some cancer] in [some body part]", if the user sets [some cancer] to squamous cell carcinoma, the interface restricts the choices available for [some body part] to those sites where squamous cell carcinoma can develop.
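One simple way to realise context-dependent choices is a compatibility table consulted whenever a slot menu is opened. The table below is illustrative and deliberately incomplete, an assumption rather than the CLEF ontology, and `site_choices` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Set

# Illustrative, deliberately incomplete compatibility table (an assumption,
# not the CLEF ontology): sites in which each cancer type can develop.
CANCER_SITES: Dict[str, Set[str]] = {
    "squamous cell carcinoma": {"skin", "lung", "oesophagus", "cervix"},
    "adenocarcinoma": {"breast", "lung", "colon", "prostate"},
}
ALL_SITES: Set[str] = set().union(*CANCER_SITES.values())


def site_choices(selected_cancer: Optional[str] = None) -> List[str]:
    """Body-part options to offer in the [some body part] slot."""
    if selected_cancer is None:
        # Nothing chosen yet: every known site is still a valid choice.
        return sorted(ALL_SITES)
    # Otherwise offer only the sites compatible with the chosen cancer.
    return sorted(CANCER_SITES.get(selected_cancer, set()))
```

Since invalid combinations never appear in the menu, the user cannot build a query that the database could not answer.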

20 Dealing with ambiguities. Once a query is constructed, it can be interpreted in only one way, so there is no disambiguation task to perform. However, users may be misled into constructing a different query from the one they intended.

21 Answer generation. The answer set consists of an age/gender breakdown of the patients who fulfil the query requirements. Each additional clinical feature is combined with the age/gender breakdown to provide more detailed information. There are three types of rendering: text, charts and tables.
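The age/gender breakdown is essentially a grouped count. A minimal sketch, assuming each matching patient is a dict with `age` and `gender` keys and that ages are reported in 10-year bands (the band width is an assumption; the slides do not give one):

```python
from collections import Counter
from typing import Dict, Iterable


def age_band(age: int, width: int = 10) -> str:
    """Map an age to a band label such as '40-49' (band width is an assumption)."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"


def breakdown(patients: Iterable[Dict]) -> Counter:
    """Count matching patients by (age band, gender)."""
    return Counter((age_band(p["age"]), p["gender"]) for p in patients)


def breakdown_with(patients: Iterable[Dict], feature: str) -> Counter:
    """Combine the age/gender breakdown with one additional clinical feature."""
    return Counter((age_band(p["age"]), p["gender"], p.get(feature)) for p in patients)
```

The same counter can then be rendered as text, a chart or a table.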

22 Evaluation Research questions: Can the WYSIWYM query formulation method be easily learned by users of CLEF? Is it easier to formulate CLEF queries in SQL or with the WYSIWYM query formulation method? Are the interactive feedback texts ambiguous?

23 Evaluation results show that the CLEF Conceptual Authoring query interface works! The method is easily learned. Investigation shows that it is much easier to use than current alternatives (viz. SQL). The feedback texts tend to be easily understood. It is a viable solution for querying the CLEF repository. However, …

24 Unresolved issues Are the queries we currently support really the ones users will want to ask? Does the query interface provide sufficient data coverage?

25 Generating reports from the CLEF repository

26 The context. We aim to generate reports from data-encoded Electronic Patient Records. Our reports are aimed at clinicians, for use at the point of care. Various types of report work on the same input (roughly the same content) but express the information from different viewpoints. We address the problem of conceptual restatement in generating summarised reports.

27 Typical input [Tables: the same repository extract for a typical cancer patient as on slide 8 (Problems, Interventions, Investigations, Drugs, Consults, Loci, Relations)]

28 The story of an illness [Diagram: the event timeline shown on slide 7]

29 Why are textual reports needed? Clinicians and other health professionals use patient health summaries at the point of care, where time is a critical resource. Reports provide quick access to an overview of a patient's medical history. Typically, an electronic patient record contains around 1,000 messages; even when structured, this volume of data is very large, and access to relevant information about particular patients is difficult. Textual reports are easy to read and understand, can be customised to the type of information needed, provide a quick way of identifying errors in the patient record, and remove the need to know the structure of the underlying database in detail.

30 Why are paraphrases needed? To give alternative views of the patient record, i.e. reports from various viewpoints: full chronological reports; summaries of investigations, interventions and treatments. Same content, different textual representation. Potted summaries are also important (a 30-second overview of the patient's history).

31 Content selection. Two notions: spine events, the main concepts in the summary (depending on the user-defined type of summary); and skeleton events, linked to the spine by various relations. Basic procedure. Step 1: group linked events into clusters and remove small clusters. Typically there is a small number of very large clusters and a small number of small clusters; small clusters are assumed not to be related to the main topic of the summary. Step 2: identify spine events according to the type of summary (Longitudinal, Investigations, Interventions, Problems). Step 3: identify the skeleton events, e.g. if a problem is a spine event and an investigation has_indication that problem, then select the investigation (unless already selected). Repeat step 2 a certain number of times (given by a threshold parameter).
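The three steps can be sketched over a toy event graph. This is an approximation under stated assumptions: events are a dict from id to type, relations are `(source, relation, target)` triples, `min_cluster` and `max_hops` are hypothetical thresholds, and, unlike the slide's typed rule (investigation has_indication problem), the sketch follows every relation in both directions.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

Relation = Tuple[str, str, str]  # (source event, relation name, target event)


def clusters(events: Dict[str, str], relations: List[Relation]) -> List[Set[str]]:
    """Step 1: connected components of the event graph, relations taken as undirected."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for src, _, dst in relations:
        adj[src].add(dst)
        adj[dst].add(src)
    seen: Set[str] = set()
    comps: List[Set[str]] = []
    for start in events:
        if start in seen:
            continue
        comp: Set[str] = set()
        queue = deque([start])
        seen.add(start)
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            comp.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        comps.append(comp)
    return comps


def select_content(events: Dict[str, str], relations: List[Relation], spine_type: str,
                   min_cluster: int = 3, max_hops: int = 1) -> Tuple[Set[str], Set[str]]:
    """Return (spine, skeleton) event ids for a summary focused on `spine_type`."""
    # Step 1 (cont.): discard clusters smaller than the hypothetical `min_cluster`.
    kept: Set[str] = set()
    for comp in clusters(events, relations):
        if len(comp) >= min_cluster:
            kept |= comp
    # Step 2: spine events are the kept events of the summary's focus type.
    spine = {e for e in kept if events.get(e) == spine_type}
    # Step 3: add skeleton events reachable from the spine, up to `max_hops` steps.
    selected, frontier = set(spine), set(spine)
    for _ in range(max_hops):
        reached: Set[str] = set()
        for src, _, dst in relations:
            if src in frontier and dst in kept and dst not in selected:
                reached.add(dst)
            if dst in frontier and src in kept and src not in selected:
                reached.add(src)
        selected |= reached
        frontier = reached
    return spine, selected - spine
```

On a toy version of the breast-cancer example, an Investigations summary with one hop selects mammogram and biopsy as the spine and pain, lump and cancer as the skeleton; raising `max_hops` pulls in more distant events such as the radiotherapy.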

32 Spine of Problem events

33 Problems [Diagram: event graph with nodes pain, breast, lump, mammogram, biopsy, cancer, radiotherapy, radiotherapy cycle, ulcer, hyperbaric oxygenation] The patient identifies pain in the left breast. A lump in the breast is found through a mammogram. A biopsy performed on the breast reveals cancer in the left breast. The patient receives radiotherapy to treat the cancer. Skin ulceration develops in the left breast as a result of radiotherapy, which is treated with hyperbaric oxygenation.

34 Interventions [Diagram: the same event graph] Radiotherapy on the breast is initiated to treat cancer in the breast. A first radiotherapy cycle is performed. The radiotherapy causes skin ulceration. The patient receives hyperbaric oxygenation to treat the ulcer.

35 Investigations [Diagram: the same event graph] A mammogram is performed because of pain in the left breast, which identifies a lump in the breast. A biopsy of the lump identifies cancer in the left breast.

36 [Diagram: the three event graphs side by side, labelled Problem, Interventions and Investigations]

37 Discourse structuring. The structure is mostly given by relations in the EPR. There are 19 different types of relation, which can be attributive (Problem has_locus Locus) or rhetorical (Problem caused_by Intervention). Attributive relations do not contribute to the discourse structure. In a first step, events linked through attributive relations are combined: Message_Problem + Message_Locus => Message_Problem_Locus. Messages are then grouped according to the type of summary. Longitudinal: events occurring in the same week are grouped together, and these groups are further grouped into years. Logical: events are arranged chronologically and then similar events are grouped (e.g. liver panels, screening consults).
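A minimal sketch of the first two steps, under several assumptions: messages are dicts with `name` and `week` keys, `has_locus` is the only attributive relation handled, and a year is computed as `week // 52 + 1`. That year convention is a labelled assumption and need not match the year labels in the sample output.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumption: attributive relations and the attribute each one fills.
ATTRIBUTIVE = {"has_locus": "locus"}


def merge_attributive(messages: Dict[str, dict],
                      relations: List[Tuple[str, str, str]]) -> Dict[str, dict]:
    """Fold each attributive target into its source, e.g. Problem + Locus -> Problem_Locus."""
    merged = {mid: dict(msg) for mid, msg in messages.items()}
    absorbed = set()
    for src, rel, dst in relations:
        if rel in ATTRIBUTIVE and src in messages and dst in messages:
            merged[src][ATTRIBUTIVE[rel]] = messages[dst]["name"]
            absorbed.add(dst)
    for mid in absorbed:
        # A locus may be shared by several problems, so remove it only at the end.
        merged.pop(mid, None)
    return merged


def group_longitudinal(messages: Dict[str, dict]) -> Dict[int, Dict[int, List[str]]]:
    """Group message ids by week, and weeks by year (year = week // 52 + 1 is assumed)."""
    years: Dict[int, Dict[int, List[str]]] = defaultdict(lambda: defaultdict(list))
    for mid, msg in sorted(messages.items(), key=lambda kv: kv[1]["week"]):
        years[msg["week"] // 52 + 1][msg["week"]].append(mid)
    return {year: dict(weeks) for year, weeks in years.items()}
```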

38 Discourse structuring. Within each group, messages are linked by discourse relations inferred from EPR relations (Cause, Result, Sequence); a List relation is assumed if no relation is specified. Between groups: if all events in one group are linked to events in another group by some EPR relation, the groups are linked through the corresponding discourse relation; otherwise, a List relation is assumed.
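The within-group and between-group rules can be sketched as follows. The EPR-to-discourse mapping here is hypothetical and covers only two relations. Two design choices are assumptions as well: within a group, consecutive messages are linked in order; between groups, a relation is used only when every event is linked by the same discourse relation.

```python
from typing import List, Optional, Set, Tuple

Relation = Tuple[str, str, str]  # (source event, EPR relation, target event)

# Hypothetical mapping from EPR relations to discourse relations.
EPR_TO_DISCOURSE = {"caused_by": "Cause", "has_finding": "Result"}


def discourse_relation(a: str, b: str, relations: List[Relation]) -> Optional[str]:
    """Discourse relation inferred from any EPR relation between a and b, if one maps."""
    for src, rel, dst in relations:
        if {src, dst} == {a, b} and rel in EPR_TO_DISCOURSE:
            return EPR_TO_DISCOURSE[rel]
    return None


def link_within(group: List[str], relations: List[Relation]) -> List[Tuple[str, str, str]]:
    """Link consecutive messages of a group, defaulting to List."""
    return [(a, discourse_relation(a, b, relations) or "List", b)
            for a, b in zip(group, group[1:])]


def link_between(g1: Set[str], g2: Set[str], relations: List[Relation]) -> str:
    """Use a discourse relation only if every event of g1 is linked into g2 by it."""
    found: Set[str] = set()
    for a in g1:
        rels = {EPR_TO_DISCOURSE[r] for s, r, d in relations
                if r in EPR_TO_DISCOURSE and ((s == a and d in g2) or (d == a and s in g2))}
        if len(rels) != 1:
            return "List"  # unlinked (or ambiguously linked) event: fall back to List
        found |= rels
    return found.pop() if len(found) == 1 else "List"
```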

39 Aggregation. Problems: Problem_1:name HAS_LOCUS Locus_1 + Problem_2:name HAS_LOCUS Locus_2 => (text structuring) Problem_3 HAS_LOCUS {Locus_1, Locus_2}; e.g. "Enlargement of the liver" + "Enlargement of the spleen" => "Enlargement of the liver and/but not of the spleen". Investigations: Investigation_1:name HAS_INDICATION Problem_1 HAS_LOCUS Locus_1 + Investigation_2:name HAS_INDICATION Problem_2 HAS_LOCUS Locus_2 => (text structuring) Investigation_3 HAS_INDICATION {Problem_1, Problem_2}; e.g. "Examination of the abdomen revealed no enlargement of the liver" + "Examination of the lymphnodes revealed no lymphadenopathy" => "Examination revealed no enlargement of the liver and no lymphadenopathy".
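A sketch of the Problem aggregation, assuming each message is a `(problem name, locus, present?)` triple. Messages sharing a name are merged into one locus set and realised with "and", "but not" or "or" depending on polarity. The function names and the phrasing rules are illustrative assumptions, chosen to reproduce phrases such as "no enlargement of the liver or of the spleen" from the sample output.

```python
from typing import Dict, List, Tuple


def aggregate(messages: List[Tuple[str, str, bool]]) -> Dict[str, Tuple[List[str], List[str]]]:
    """Group (problem name, locus, present?) messages by name: name -> (present, absent)."""
    groups: Dict[str, Tuple[List[str], List[str]]] = {}
    for name, locus, present in messages:
        pos, neg = groups.setdefault(name, ([], []))
        (pos if present else neg).append(locus)
    return groups


def realise(name: str, present: List[str], absent: List[str]) -> str:
    """Realise one aggregated problem as a noun phrase."""
    if present:
        text = f"{name} " + " and ".join(f"of the {loc}" for loc in present)
        if absent:
            text += " but not " + " or ".join(f"of the {loc}" for loc in absent)
        return text
    if absent:
        return f"no {name} " + " or ".join(f"of the {loc}" for loc in absent)
    raise ValueError("nothing to realise")
```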

40 Aggregation. Interventions: Intervention_1 PART_OF Intervention_0 + Intervention_2 PART_OF Intervention_0 => (text structuring) {count} Intervention_1; e.g. [ID01] Chemotherapy cycle PART_OF [ID0] Chemotherapy, [ID02] Chemotherapy cycle PART_OF [ID0] Chemotherapy, [ID03] Chemotherapy cycle PART_OF [ID0] Chemotherapy => 3 chemotherapy cycles. Ellipsis: "Examination of the left breast revealed no recurrent cancer in the left breast" => "Examination of the left breast revealed no recurrent cancer".
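Both operations on this slide are small enough to sketch directly. Assumptions: PART_OF facts arrive as `(part id, part name, whole id)` triples, pluralisation is naive (adequate only for regular nouns such as "cycle"), and ellipsis is decided on the message (comparing loci) rather than on the surface string. `count_parts` and `realise_finding` are hypothetical names.

```python
from collections import Counter
from typing import List, Tuple


def count_parts(part_of: List[Tuple[str, str, str]]) -> List[str]:
    """Collapse (part id, part name, whole id) PART_OF triples into counted phrases."""
    counts = Counter((whole, name) for _, name, whole in part_of)
    # Naive English plural ("cycle" -> "cycles"), adequate only for regular nouns.
    return [f"{n} {name}s" if n > 1 else f"1 {name}" for (_, name), n in counts.items()]


def realise_finding(investigation: str, inv_locus: str, finding: str, finding_locus: str) -> str:
    """Ellipsis: omit the finding's locus when it repeats the investigation's locus."""
    text = f"{investigation} of the {inv_locus} revealed {finding}"
    if finding_locus != inv_locus:
        text += f" in the {finding_locus}"
    return text
```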

41 Text structuring. Events can be compacted according to domain-specific rules. Clinical examination comprises examination of the liver, examination of the spleen and examination of the abdomen: "Clinical examination was normal"; "Clinical examination was normal apart from an enlargement of the spleen"; "Clinical examination revealed enlargement of the spleen". Liver panel comprises bilirubin concentration, ESR concentration and GGT concentration: "The liver panel was in the normal range (apart from a very high level of GGT)".
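A sketch of panel compaction, using the panel membership listed on the slide. The input format (a dict from panel member to an abnormal-finding phrase) and the rule of switching to "revealed" when no member is normal are assumptions for illustration.

```python
from typing import Dict

# Panel membership as listed on the slide.
PANELS = {
    "Clinical examination": ["liver", "spleen", "abdomen"],
    "Liver panel": ["bilirubin concentration", "ESR concentration", "GGT concentration"],
}


def compact(panel: str, abnormal: Dict[str, str]) -> str:
    """Summarise a panel; `abnormal` maps a member to its abnormal-finding phrase."""
    members = PANELS[panel]
    exceptions = [abnormal[m] for m in members if m in abnormal]
    if not exceptions:
        return f"{panel} was normal"
    if len(exceptions) == len(members):
        # Hypothetical rule: nothing normal to report, so list the findings directly.
        return f"{panel} revealed " + " and ".join(exceptions)
    return f"{panel} was normal apart from " + " and ".join(exceptions)
```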

42 Maintaining the thread of discourse. The textual representation should reflect the relative importance of events. At discourse level, spine events are preferably realised in nuclear units and skeleton events in satellite units. At sentence level, spine events are assigned salient syntactic roles. Whether an event is on the spine or on the skeleton determines whether it is realised as a sentence, a main or subordinate clause, or a phrase.

43 Typical output of the NL generator: long chronological report
Year 1
Week 0 A mammography screening was scheduled at the clinic.
Week 1 Primary cancer of the right breast; histopathology: invasive tubular adenocarcinoma.
YEAR 2
Week 131 Xray revealed no cancer of the right breast.
YEAR 5
Week 287 Xray revealed no cancer of the right breast.
YEAR 8
Week 443 Xray revealed cancer of the right breast.
Week 446 Examination (indicated by primary cancer of the right breast) revealed no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymphnodes. Testing (indicated by primary cancer of the right breast) revealed no abnormality of the haemoglobin concentration and no abnormality of the leucocyte count. An Xray (indicated by primary cancer of the right breast) was performed. Very high level of the ESR concentration. Very high level of the Creatinine concentration. Very high level of the Alkaline Phosphatase concentration. Very high level of the Bilirubin concentration. Very high level of the GGT concentration. No abnormality of the platelet count.
Week 449 An initial treatment planning was completed at the clinic. Excision biopsy revealed no metastatic lymphnode count of the right axilla. Histopathology revealed primary cancer of the right breast. Cancer staging revealed stage1 cancer. Hormone anatagonist therapy was started to treat primary cancer of the right breast. Lumpectomy was performed on the breast to treat primary cancer of the right breast. Primary treatment package was started to treat primary cancer of the right breast.
………………….
YEAR 17
Week 893 Xray revealed no cancer of the right breast.

44 Typical output of the NL generator: compact reports
Focus on Problems: In week 0, the patient is diagnosed with primary cancer of the right breast, histopathology: invasive tubular adenocarcinoma. In weeks 131 and 287 Xray revealed no cancer of the right breast. In week 446, there was no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymphnodes revealed by examination. There was no abnormality of the haemoglobin concentration or of the leucocyte count, no abnormality of the platelet count, very high level of the GGT concentration, of the Bilirubin concentration, of the Alkaline Phosphatase concentration, of the Creatinine concentration or of the ESR concentration. In week 449, excision biopsy revealed no metastatic lymphnode count of the right axilla. Histopathology revealed primary cancer of the right breast. Lumpectomy was performed on the right breast. Hormone anatagonist therapy was initiated to treat primary cancer of the right breast. In weeks 457 to 737, there was no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymphnodes. There was no abnormality of the haemoglobin concentration or of the leucocyte count, no abnormality of the platelet count, very high level of the GGT concentration, of the Bilirubin concentration, of the Alkaline Phosphatase concentration, of the Creatinine concentration and of the ESR concentration. In weeks 457 to 893, Xray revealed no cancer of the right breast.
Focus on Interventions: In week 0, the patient is diagnosed with primary cancer of the right breast, histopathology: invasive tubular adenocarcinoma. In week 449, excision biopsy revealed no metastatic lymphnode count of the right axilla. Histopathology revealed primary cancer of the right breast. Lumpectomy was performed on the right breast. Hormone anatagonist therapy was started to treat primary cancer of the right breast.
Focus on Investigations: In week 0, the patient is diagnosed with primary cancer of the right breast, histopathology: invasive tubular adenocarcinoma. In weeks 131 and 287 Xray revealed no cancer of the right breast. In week 446, examinations revealed no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymphnodes. Testing revealed no abnormality of the haemoglobin concentration or of the leucocyte count, no abnormality of the platelet count, very high level of the GGT concentration, of the Bilirubin concentration, of the Alkaline Phosphatase concentration, of the Creatinine concentration or of the ESR concentration. In week 449, excision biopsy revealed no metastatic lymphnode count of the right axilla. Histopathology revealed primary cancer of the right breast. In weeks 457 to 737, examinations revealed no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymphnodes. Testing revealed no abnormality of the haemoglobin concentration or of the leucocyte count, no abnormality of the platelet count, very high level of the GGT concentration, of the Bilirubin concentration, of the Alkaline Phosphatase concentration, of the Creatinine concentration and of the ESR concentration. In weeks 457 to 893, Xray revealed no cancer of the right breast.

45 Conclusions We produce summarised reports that are loose paraphrases of each other at the discourse level. Although the reports work on the same input, their content may vary slightly with the type of report. There is little paraphrasing at lower levels (lexical, sentential); what there is results mainly from the realisation of rhetorical relations, aggregation and generalisation.
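Aggregation is visible in the sample output, where separate negative findings are merged into phrases such as "no enlargement of the liver or of the spleen". The sketch below illustrates only that aggregation step (not generalisation), assuming findings arrive as hypothetical `(attribute, site)` pairs that are all negative and already ordered so that findings sharing an attribute are adjacent. It is an illustration of the idea, not the CLEF generator's implementation.

```python
from itertools import groupby


def aggregate_negative_findings(findings):
    """Merge adjacent negative findings that share an attribute.

    findings: list of (attribute, site) pairs, e.g. ("enlargement", "liver").
    groupby only merges *adjacent* pairs, so findings are assumed pre-ordered.
    """
    phrases = []
    for attribute, group in groupby(findings, key=lambda f: f[0]):
        sites = " or ".join(f"of the {site}" for _, site in group)
        phrases.append(f"no {attribute} {sites}")
    return phrases


def coordinate(phrases):
    """Join clauses as 'a, b and c'."""
    if len(phrases) <= 1:
        return "".join(phrases)
    return ", ".join(phrases[:-1]) + " and " + phrases[-1]
```

Applied to `[("enlargement", "liver"), ("enlargement", "spleen"), ("recurrent cancer", "right breast"), ("lymphadenopathy", "right axillary lymph nodes")]`, the two functions produce "no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymph nodes", which mirrors the sentence pattern in the sample reports.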

46 Ongoing work on report generation Add domain-specific knowledge to improve content selection: some events become important depending on context. Change the (sub-)domain to test whether the generation method is easily portable. Link NLG to IR to improve IR. Produce reports for patients.

47 Summary and Conclusions CLEF is now entering the integration phase, moving towards testing and deployment. Major emphases at this point are on privacy and security. Informing patients is a major thread for future work. Integrating IE and NLG.

48 Thank You! Collaborators: Catalina Hallett, Richard Power


50 Evaluation procedure Subjects: We tested the performance of 15 subjects. Subjects had a range of expertise in the CLEF domain -- from expert (oncologist) to novice (computer scientist), but most subjects had some medical training. Subjects had no previous experience with the CLEF WYSIWYM query interface, but most were aware of its fundamental principles. Methodology: Subjects were given a set of four fixed queries to formulate using the CLEF WYSIWYM query interface. The queries were expressed in language as different as possible from the language in the query interface. Each subject received the queries in a different order.

51 Evaluation – data analysis We recorded the time taken to compose each query and the number of operations used to construct it, which we compared with the optimal (pre-computed) number of operations. We analysed whether performance, as indicated by speed and efficiency, improves with training (experience).

52 Evaluation results Time to completion Subjects' performance improved dramatically with experience. After composing their first query, subjects' completion time halved and then levelled off.

53 Evaluation results Performance over time: performance normalised over complexity After just one attempt with the CLEF interface, subjects are highly proficient at composing complex queries. By their fourth query, subjects' performance is almost perfect. Mean: 0.18. Optimal operations = the minimum number of operations needed to compose the query perfectly; this is a measure of the complexity of the query.
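Because each subject received the queries in a different order, comparing performance across attempts needs a score that does not depend on which query was attempted. One plausible way to normalise over complexity is sketched below: extra operations divided by the optimal number of operations, so that 0.0 means a perfect construction. This formula is an assumption; the slides do not state the exact normalisation behind the reported mean of 0.18, and the trial numbers in the usage note are made up.

```python
from statistics import mean


def normalised_excess(ops_used: int, ops_optimal: int) -> float:
    """Extra operations relative to the optimum (the optimum measures query complexity).

    0.0 means the query was built with the minimum number of operations.
    """
    if ops_optimal <= 0:
        raise ValueError("optimal operation count must be positive")
    if ops_used < ops_optimal:
        raise ValueError("ops_used cannot be below the optimum")
    return (ops_used - ops_optimal) / ops_optimal


def mean_score_by_position(trials):
    """trials: (position, ops_used, ops_optimal) tuples, where position is 1 for a
    subject's first query, 2 for the second, and so on."""
    scores = {}
    for position, used, optimal in trials:
        scores.setdefault(position, []).append(normalised_excess(used, optimal))
    return {position: mean(values) for position, values in sorted(scores.items())}
```

For hypothetical trials `[(1, 16, 10), (1, 20, 10), (2, 11, 10), (2, 10, 10)]`, the mean score falls from 0.8 on the first attempt to 0.05 on the second, which is the kind of learning curve the slide describes.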

54 Evaluation – comparison with SQL A very small-scale experiment with two subjects, who had expert knowledge of the structure, organisation and content of the CLEF database, were highly skilled users of SQL, had minimal experience with WYSIWYM, and were given access to the SNOMED and ICD codes required to build the SQL. Each subject composed a query first in the CLEF WYSIWYM interface and then in SQL.

55 Evaluation – comparison with SQL Subject 1, Query 1: WYSIWYM 2.3 mins; SQL 8.5 mins (incomplete). Subject 2, Query 2: WYSIWYM 4.5 mins; SQL 12 mins (incomplete). Even with a slowly reacting interface, the subjects were much faster at composing queries in WYSIWYM than in SQL.

56 Are the feedback texts ambiguous to the users? We identified 6 types of ambiguity, with 4 examples of each, and collected forced-choice judgements from 15 subjects. Random judgements would give a score of 33%; the results show 84% correct judgements.
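To see how far 84% lies above the 33% chance level, one can compute an exact binomial tail probability. The sketch assumes a three-way forced choice (implied by the 33% baseline) and that every subject judged all 24 examples, giving 360 judgements; both are assumptions, and treating judgements from the same subject as independent makes this only a rough check, not the analysis reported in the presentation.

```python
from math import comb


def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumed design: 6 ambiguity types x 4 examples, each judged by all 15 subjects.
n_judgements = 6 * 4 * 15                 # 360 judgements
n_correct = round(0.84 * n_judgements)    # 84% reported; 302 after rounding
chance = 1 / 3                            # three-way forced choice, matching the 33% baseline
p_value = binomial_upper_tail(n_judgements, n_correct, chance)
```

Under chance, about 120 of 360 judgements would be correct, with a standard deviation of roughly 9, so 302 correct answers gives a vanishingly small tail probability.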

57 [Diagram: repository; summary patient records for clinicians and medical researchers; summarisation for patients; summary patient records as linear text, hypertext, animation or dialogue]

58 Sample report for Clinicians In the weeks 195 to 196, self examination revealed lump of the right breast. In week 197, self examination revealed lump of the right breast. Excision biopsy revealed metastatic lymphnode count of the right axilla. Histopathology revealed cancer of the right breast. Cancer staging revealed stage2 cancer. Radical mastectomy was performed on the breast to treat the primary cancer. The patient was diagnosed with metastatic lymphnode count of the right axilla; 19 nodes involved out of 24. The patient was diagnosed with metastatic cancer of the right axilla; histopathology: invasive undifferentiated adenocarcinoma. The patient was diagnosed with cancer of the right breast; histopathology: invasive undifferentiated adenocarcinoma. The patient was diagnosed with stage2 cancer; histopathology: invasive undifferentiated adenocarcinoma. Primary treatment package was initiated to treat primary cancer of the right breast.


60 Sample report for Patients You had a consultation with your doctor on September 20th 1993. On September 27th you did a self examination and you found that you had a lump in your right breast. A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel. On October 4th you did another self examination and you found that you still had a lump in your right breast. On October 11th you had a radical mastectomy to treat cancer in your right breast. A radical mastectomy is an operation to remove the breast, along with the lymph glands under the arm and the muscles of the chest wall. Cancer is a tumour that tends to spread, both locally and to other parts of the body. …

61 Presenting patient records in hypertext: dividing the text into related units Cancer is a tumour that tends to spread, both locally and to other parts of the body. You had a consultation with your doctor on September 20th 1993. On September 27th you did a self examination. A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel. On October 4th you did another self examination. you found that you had a lump in your right breast. On October 11th you had a radical mastectomy. to treat cancer in your right breast. A radical mastectomy is an operation to remove the breast, along with the lymph glands under the arm and the muscles of the chest wall. (Relations: SEQUENCE, HAS-FINDING, SEQUENCE, MOTIVATION, EXPLANATION)
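Dividing the report into related units can be pictured as a small graph: text units are nodes, and the relations named on the slide (SEQUENCE, HAS-FINDING, MOTIVATION, EXPLANATION) are labelled edges. The sketch below treats SEQUENCE edges as the chronological backbone and hangs every other relation off its source unit as a satellite. The unit ids and the exact attachment of each relation are hypothetical; the slide lists the relation labels but not their precise structure.

```python
# Hypothetical unit ids; texts taken from the slide.
UNITS = {
    "consult": "You had a consultation with your doctor on September 20th 1993.",
    "exam1": "On September 27th you did a self examination.",
    "exam2": "On October 4th you did another self examination.",
    "lump": "you found that you had a lump in your right breast.",
    "mastectomy": "On October 11th you had a radical mastectomy.",
    "treat": "to treat cancer in your right breast.",
    "mastectomy_def": "A radical mastectomy is an operation to remove the breast, "
                      "along with the lymph glands under the arm and the muscles of the chest wall.",
}

# (relation, from_unit, to_unit); relation names from the slide, attachments assumed.
RELATIONS = [
    ("SEQUENCE", "consult", "exam1"),
    ("SEQUENCE", "exam1", "exam2"),
    ("HAS-FINDING", "exam2", "lump"),
    ("SEQUENCE", "exam2", "mastectomy"),
    ("MOTIVATION", "mastectomy", "treat"),
    ("EXPLANATION", "mastectomy", "mastectomy_def"),
]


def attach_satellites(relations, backbone_relation="SEQUENCE"):
    """Split units into a backbone chain and satellite units hanging off it."""
    backbone = []
    for rel, a, b in relations:
        if rel == backbone_relation:
            for uid in (a, b):
                if uid not in backbone:
                    backbone.append(uid)
    satellites = {}
    for rel, a, b in relations:
        if rel != backbone_relation:
            satellites.setdefault(a, []).append((rel, b))
    return backbone, satellites


def outline(units, relations):
    """Render the backbone in order, with each satellite indented under its unit."""
    backbone, satellites = attach_satellites(relations)
    lines = []
    for uid in backbone:
        lines.append(units[uid])
        for rel, sat in satellites.get(uid, []):
            lines.append(f"    [{rel}] {units[sat]}")
    return "\n".join(lines)
```

In a hypertext presentation, each indented satellite would be a candidate for a link or pop-up rather than inline text.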

62 Presenting patient records in hypertext: giving graphical attributes to the text units Cancer is a tumour that tends to spread, both locally and to other parts of the body. You had a consultation with your doctor on September 20th 1993. On September 27th you did a self examination. SEQUENCE A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel. On October 4th you did another self examination. SEQUENCE you found that you had a lump in your right breast. HAS-FINDING SEQUENCE On October 11th you had a radical mastectomy. to treat cancer in your right breast. MOTIVATION A radical mastectomy is an operation to remove the breast, along with the lymph glands under the arm and the muscles of the chest wall. EXPLANATION
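Giving graphical attributes to text units amounts to mapping each unit's rhetorical relation to a visual style. A minimal sketch, assuming the hypertext is rendered as HTML with one hypothetical CSS class per relation (the class names and the `to_html` helper are illustrative, not part of CLEF):

```python
import html

# Hypothetical styling: one CSS class per rhetorical relation.
RELATION_STYLE = {
    "SEQUENCE": "rel-sequence",
    "HAS-FINDING": "rel-finding",
    "MOTIVATION": "rel-motivation",
    "EXPLANATION": "rel-explanation",
}


def to_html(units):
    """units: list of (text, relation or None); returns one <span> per text unit."""
    parts = []
    for text, relation in units:
        css_class = RELATION_STYLE.get(relation, "rel-none")
        # Expose the relation name as a tooltip; html.escape also escapes quotes.
        title = f' title="{html.escape(relation)}"' if relation else ""
        parts.append(f'<span class="unit {css_class}"{title}>{html.escape(text)}</span>')
    return "\n".join(parts)
```

Keeping the relation as a class rather than inline styling leaves the choice of colour, border or typeface to a stylesheet, so the same generated markup can be restyled for different readers.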

63 Presenting patient records in hypertext: using animation to represent discourse patterns dynamically You found that you had a lump in your right breast. The radical mastectomy was done to treat cancer in your right breast. You had a consultation with your doctor on September 20th 1993. On September 27th you did a self examination. On October 4th you did another self examination. A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel. On October 11th you had a radical mastectomy. Cancer is a tumour that tends to spread, both locally and to other parts of the body. A radical mastectomy is an operation to remove the breast, along with the lymph glands under the arm and the muscles of the chest wall.

64 You had a consultation with your doctor on September 20th 1993.

65 On September 27th you did a self examination.

66 You found that you had a lump in your right breast. You had a consultation with your doctor on September 20th 1993. On September 27th you did a self examination.

67 You had a consultation with your doctor on September 20th 1993. On September 27th you did a self examination. You found that you had a lump in your right breast. A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel.

68 You had a consultation with your doctor on September 20th 1993. On September 27th you did a self examination. You found that you had a lump in your right breast. On October 4th you did another self examination.

69 A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel. You had a consultation with your doctor on September 20th 1993. On September 27th you did a self examination. You found that you had a lump in your right breast. On October 4th you did another self examination. On October 11th you had a radical mastectomy.

70 A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel. You had a consultation with your doctor on September 20th 1993. On September 27th you did a self examination. You found that you had a lump in your right breast. On October 4th you did another self examination. On October 11th you had a radical mastectomy. A radical mastectomy is an operation to remove the breast, along with the lymph glands under the arm and the muscles of the chest wall.

71 The radical mastectomy was done to treat cancer in your right breast. A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel. You had a consultation with your doctor on September 20th 1993. On September 27th you did a self examination. You found that you had a lump in your right breast. On October 4th you did another self examination. On October 11th you had a radical mastectomy. A radical mastectomy is an operation to remove the breast, along with the lymph glands under the arm and the muscles of the chest wall.
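The frames in slides 64 to 71 build the report up one unit at a time. The sketch below generates strictly cumulative frames from a reveal order, keeping already-shown units in reading order. This is a simplification: the slides are not strictly cumulative (the definition of self examination is absent again in slide 68), and the `animation_frames` helper is hypothetical.

```python
def animation_frames(units, reveal_order):
    """Build one frame per step of a progressive reveal.

    units: text units in reading order.
    reveal_order: indices into units, in the order they appear on screen.
    Each frame lists the revealed units in reading order, so text already
    shown keeps its relative position when a new unit appears.
    """
    if sorted(reveal_order) != list(range(len(units))):
        raise ValueError("reveal_order must be a permutation of the unit indices")
    shown = set()
    frames = []
    for index in reveal_order:
        shown.add(index)
        frames.append([units[i] for i in sorted(shown)])
    return frames
```

For example, `animation_frames(["A", "B", "C"], [0, 2, 1])` yields `[["A"], ["A", "C"], ["A", "B", "C"]]`: unit C appears before B, but B is slotted into its reading position when it arrives.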

72 Monologues/Dialogues Monologue: an autonomous agent reads the generated report. Aims: accessibility, education (not translation). Dialogue: the report is generated as a script that two agents act out. Aims: accessibility, vicarious learning. Example (video clip)
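Generating the report as a script for two agents can be sketched by letting one agent voice the report sentences while the other asks a prompt chosen by the rhetorical relation that links each sentence to the previous one. The prompt wording, the agent names and the `to_dialogue` helper are all hypothetical; the slide does not describe how CLEF builds its scripts.

```python
# Hypothetical prompts keyed by the relation linking a sentence to what came before.
PROMPTS = {
    "SEQUENCE": "What happened next?",
    "HAS-FINDING": "And what was found?",
    "MOTIVATION": "Why was that done?",
    "EXPLANATION": "What does that mean?",
}


def to_dialogue(report, asker="Agent B", teller="Agent A"):
    """report: list of (relation or None, sentence); returns (speaker, line) pairs."""
    script = []
    for relation, sentence in report:
        prompt = PROMPTS.get(relation)
        if prompt is not None:
            script.append((asker, prompt))  # the "learner" agent asks a question
        script.append((teller, sentence))   # the other agent answers with report text
    return script
```

Having one agent play the learner is one way to support the vicarious-learning aim: the viewer hears the questions a patient might ask answered in the report's own words.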

