
Applying Natural Language Generation to Electronic Health Records in an e-Science context Donia Scott Centre for Research in Computing The Open University.


2 Outline
- Background: the CLEF project
- Patient records as data-encoded patient histories
- Role of NLG in CLEF
- Intuitive querying with natural language
- Generating tailored reports from CLEF data

3 Background: the CLEF project
- CLEF (Clinical E-Science Framework) is an MRC-funded project that aims to provide a repository of well-organised, data-encoded clinical histories
- Aim: to provide the framework for a new type of medical research: in silico experiments
- Partners:
  - NLP: OU, Sheffield
  - Medical informatics: Manchester
  - Electronic Health Records: Royal Marsden Hospital, UCL
  - Privacy/confidentiality: Cambridge

4 Collect clinical information from multiple sites; analyse, structure and integrate it; make it available, using GRID tools, to authorised clinicians and e-Health scientists, in a secure and ethical collaborative framework. [Diagram label: GRID]

5 The CLEF repository
- Chronicle Repository: organised data on individual patients
- Data from: referral letters, review notes, lab results, nurse notes, hospital admission notes, hospital discharge notes, treatment notes, surgery reports

6 The CLEF Chronicle Representing the story of a patient over time

7 [Figure: The story of an illness. A timeline of event instances for one patient (Human, Mass, Pain, Ulcer, Cancer, Breast, Clinic, Biopsy, Radio, Chemo), linked by relations such as locus, finding, reason, plans, treats, target and attends.]

8 A typical cancer patient
[Figure: tables of data-encoded record components for one patient. Problems (~200; e.g. primary, recurrent and metastatic breast cancer, lymphadenopathy, enlargement), Interventions (15; e.g. lumpectomy, radical mastectomy, chemotherapy and radiotherapy cycles, packed red cell transfusion, hormone antagonist therapy), Investigations (~100; e.g. examination, testing, X-ray, excision biopsy, histopathology, cancer staging), Drugs (~5; epirubicin, doxorubicin, 5-fluorouracil, cyclophosphamide), Consults (~10; mammography screening, follow-up, initial treatment planning), Loci (~20; e.g. right breast, right axillary lymph nodes, liver, spleen, brain, lung, and blood measures such as platelet count and GGT concentration), and Relations between items (HAS_FINDING, HAS_TARGET, HAS_LOCUS, INDICATED_BY, RECOMMENDED_BY, ARRANGED).]

9 The role of NLG
- An intuitive query interface, to provide efficient access to aggregated data-encoded patient histories for:
  - assisting in diagnosis and treatment
  - identifying patterns in treatment
  - selecting subjects for clinical trials
- Generating reports from the data-encoded histories, for clinicians to use at the point of care

10 Intuitive querying of the CLEF repository

11 What does the CLEF database provide?
- Evidence from about 20,000 patient records, comprising 3.5 million record components (about 5 GB of data), all in the area of cancer
- 162 queriable fields
- Various text-only records (not queriable)
- Two types of data:
  - structured
  - extracted from narratives by information extraction (IE)
- Queriable data is encoded according to various medical terminologies (SNOMED, ICD, UMLS)
- Approximately 19,500 different medical codes are currently used in the database (a relatively small subset of SNOMED and ICD)

12 Queriable data
- Structured data:
  - Demographics: age, gender, postal district, ethnic group, occupation
  - Laboratory findings: 32 types of haematology findings; 51 types of chemistry findings
  - Cytology reports
  - Histopathology reports
  - Imaging studies: radiology procedure, site, diagnosis, morphology, topography, report, indication, department
  - Treatments: prescription drugs, chemotherapy protocol, IV chemotherapy, radiotherapy, surgical procedures
  - Diagnoses: clinical diagnosis, cause(s) of death
- Data extracted from narratives

13 Query interface requirements
- Designed for casual and moderate users, who are familiar with the semantic domain of the repository but not with its technical implementation; typically clinicians or medical researchers
- Should:
  - allow the construction of complex queries with nested structures and temporal expressions
  - minimise the risk of ambiguity
  - offer good coverage of the data types in the CLEF database
- Should be usable with:
  - minimal training
  - no prior knowledge of medical terminologies, formal query languages or databases

14 Typical queries
- How many patients with AML have had a normal count after two cycles of treatment?
- How many patients with primary breast cancer have relapsed in the last five years?
- What is the median time between first drug treatment for metastatic breast cancer and death?
- In breast cancer patients, what is the incidence of lymphoedema of the arm that persists more than two years after primary surgical treatment?
- What is the average number of x-rays for patients with prostate cancer?
- What is the average time between first treatment for cervical cancer and death for patients aged less than 60 at death compared with those aged over 60?
- How many patients between the ages of 40 and 60 when they were first diagnosed with lung cancer had a platelet count higher than 300 but a white cell count lower than 3 before the 4th cycle of any course of chemotherapy they received during treatment?

15 Querying alternatives
- SQL: not appropriate for the typical CLEF user
  - requires deep knowledge of the database structure and content, and of the medical terminologies used in the database
- Graphical interfaces:
  - have to cope with a large number of parameters
  - nested structures and temporal restrictions are difficult to express
- Natural language interfaces: more natural and more expressive than formal query languages, but…
  - sensitive to errors in composition, spelling and vocabulary
  - normally understand only a subset of natural language
  - complex queries are difficult to process
  - it is difficult to trace the source of errors in the result

16 The CLEF approach
- Similar to natural language interfaces, except that the user edits the conceptual meaning of a query instead of its surface text
- Allows users to construct unambiguous queries easily
- Guides users towards constructing only correct queries (queries compatible with the content of the database)
- Semi-database-independent, but very domain-specific
- Based on the Conceptual Authoring (aka WYSIWYM) technique (Power and Scott, 1998)
- The query is presented to the user as an interactive text, which is edited by making selections on its various components
- Each selection triggers regeneration of the text, producing a new feedback text that incorporates the user's selection
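The conceptual-authoring loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the CLEF implementation: the `QueryModel` and `Slot` classes and the template string are hypothetical. What it shows is the key design point: a selection edits the semantic model, never the text, and the feedback text is then regenerated in full, with unfilled components shown as bracketed anchors such as [some cancer].

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Slot:
    """One editable component of the query; value is None until the user chooses."""
    placeholder: str               # anchor text shown while unfilled, e.g. "some cancer"
    value: Optional[str] = None


@dataclass
class QueryModel:
    """Hypothetical semantic model: a sentence template plus named slots."""
    template: str
    slots: Dict[str, Slot] = field(default_factory=dict)

    def render(self) -> str:
        # Unfilled slots appear as bracketed anchors the user can click to edit.
        words = {
            name: slot.value if slot.value is not None else f"[{slot.placeholder}]"
            for name, slot in self.slots.items()
        }
        return self.template.format(**words)

    def select(self, name: str, value: str) -> str:
        # A selection edits the model; the whole feedback text is then
        # regenerated from the updated model rather than patched in place.
        self.slots[name].value = value
        return self.render()


query = QueryModel(
    template="Patients diagnosed with {cancer} in {site}",
    slots={"cancer": Slot("some cancer"), "site": Slot("some body part")},
)
first_text = query.render()
second_text = query.select("cancer", "squamous cell carcinoma")
print(first_text)
print(second_text)
```

Because the text is always generated from the model, the user can never type an ill-formed query; the price is that every supported query shape needs a template (or, in a real system, a grammar) behind it.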

17 Query editing

18 Modelling queries
- A query has 4 distinct sections:
  - a description of the subjects (in terms of demographic information and basic diagnosis)
  - a description of the treatments that the subjects received
  - a description of laboratory findings
  - an outcome section (what we want from the group of patients just described)
- Each query element can be expressed as a conjunction or disjunction of same-type query elements, e.g.:
  - cancer of the breast and of the lung
  - patients who received chemotherapy and radiotherapy
- Some query elements can be temporally related to each other, e.g.:
  - patients who received chemotherapy within 5 months of surgery
  - patients alive 5 years after the diagnosis
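A minimal sketch of the four-section query model, with conjunction/disjunction of same-type elements and one temporal restriction. The class names (`Group`, `TemporalLink`, `Query`) and the English templates are hypothetical; CLEF's actual model is richer (nested structures, several temporal anchors, many more element types).

```python
from dataclasses import dataclass
from typing import List, Literal, Optional


@dataclass
class Group:
    """A conjunction or disjunction of same-type query elements."""
    op: Literal["and", "or"]
    items: List[str]

    def render(self) -> str:
        return f" {self.op} ".join(self.items)


@dataclass
class TemporalLink:
    """A temporal restriction relating one query element to another."""
    relation: str   # e.g. "within 5 months of"
    anchor: str     # e.g. "surgery"


@dataclass
class Query:
    outcome: str                                   # section 4: what we want back
    subjects: Group                                # section 1: demographics, diagnosis
    treatments: Optional[Group] = None             # section 2: treatments received
    findings: Optional[Group] = None               # section 3: laboratory findings
    treatment_timing: Optional[TemporalLink] = None

    def render(self) -> str:
        text = f"{self.outcome} patients with {self.subjects.render()}"
        clauses = []
        if self.treatments is not None:
            clause = f"received {self.treatments.render()}"
            if self.treatment_timing is not None:
                clause += f" {self.treatment_timing.relation} {self.treatment_timing.anchor}"
            clauses.append(clause)
        if self.findings is not None:
            clauses.append(f"had {self.findings.render()}")
        if clauses:
            text += " " + " and ".join(clauses)
        return text + "?"


q = Query(
    outcome="How many",
    subjects=Group("and", ["cancer of the breast", "cancer of the lung"]),
    treatments=Group("and", ["chemotherapy", "radiotherapy"]),
    treatment_timing=TemporalLink("within 5 months of", "surgery"),
)
print(q.render())
```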

19 Constraining user choices
- At each step, users are only given correct choices
- Choices are context-dependent, e.g.:
  - Patients diagnosed with [some cancer] in [some body part]
  - The user selects [some cancer] => squamous cell carcinoma
  - The interface restricts the choices available for [some body part] to those sites where squamous cell carcinoma can develop
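Context-dependent restriction of choices amounts to filtering the options for one slot by the value already chosen for another. In the sketch below, the compatibility table `COMPATIBLE_SITES` and the function `site_choices` are hypothetical; the table is an illustrative, incomplete assumption, not CLEF's ontology and not a clinical reference.

```python
from typing import List, Optional

# Illustrative, incomplete compatibility table (an assumption for this sketch).
COMPATIBLE_SITES = {
    "squamous cell carcinoma": {"skin", "lung", "oesophagus", "cervix", "head and neck"},
    "adenocarcinoma": {"breast", "lung", "prostate", "colon", "pancreas"},
}
# Before any cancer is chosen, every known site is offered.
ALL_SITES = sorted(set().union(*COMPATIBLE_SITES.values()))


def site_choices(selected_cancer: Optional[str] = None) -> List[str]:
    """Options offered for [some body part], given the current [some cancer] choice."""
    if selected_cancer is None:
        return ALL_SITES
    return sorted(COMPATIBLE_SITES.get(selected_cancer, set()))


print(site_choices("squamous cell carcinoma"))
```

Since only compatible sites are ever offered, the user cannot build a query that the database could never satisfy, which is what "users are only given correct choices" requires.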

20 Dealing with ambiguities
- Once a query is constructed, there is only one way it can be interpreted: there is no disambiguation task to be performed
- … but users may be misled into constructing a different query from the one they intended

21 Answer generation
- The answer set consists of an age/gender breakdown of the patients who fulfil the query requirements
- Each additional clinical feature is combined with the age/gender breakdown to provide more detailed information
- 3 types of rendering: text, charts, tables
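A rough sketch of the age/gender breakdown and its text rendering. The 10-year age bands, the patient dictionary fields and the sentence template are assumptions made for illustration; chart and table renderings would consume the same counts.

```python
from collections import Counter
from typing import Iterable


def age_band(age: int, width: int = 10) -> str:
    """Map an age to a band such as '40-49' (10-year bands are an assumption)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def breakdown(patients: Iterable[dict]) -> Counter:
    """Count the patients matching a query by (gender, age band)."""
    return Counter((p["gender"], age_band(p["age"])) for p in patients)


def render_text(counts: Counter) -> str:
    """Text rendering of the breakdown, one phrase per (gender, band) cell."""
    parts = []
    for (gender, band), n in sorted(counts.items()):
        noun = "patient" if n == 1 else "patients"
        parts.append(f"{n} {gender} {noun} aged {band}")
    return "; ".join(parts) + "."


matches = [
    {"gender": "female", "age": 47},
    {"gender": "female", "age": 43},
    {"gender": "male", "age": 61},
]
print(render_text(breakdown(matches)))
```

Combining an additional clinical feature with the breakdown would, under this sketch, simply extend the counting key, e.g. to `(gender, band, feature)`.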

22 Evaluation
Research questions:
- Can the WYSIWYM query formulation method be easily learned by users of CLEF?
- Is it easier to formulate CLEF queries in SQL or with the WYSIWYM query formulation method?
- Are the interactive feedback texts ambiguous?

23 Evaluation results show that…
- The CLEF Conceptual Authoring query interface works!
- The method is easily acquired.
- Investigation shows that it is much easier to use than the current alternative (viz. SQL).
- The feedback texts tend to be easily understood.
- It is a viable solution to querying the CLEF repository.
However…

24 Unresolved issues
- Are the queries we currently support really the ones users will want to ask?
- Does the query interface provide sufficient data coverage?

25 Generating reports from the CLEF repository

26 The context
- We aim to generate reports from the data-encoded electronic patient records
- Our reports are aimed at clinicians, for use at the point of care
- Various types of report work on the same input (roughly the same content) but express information from different viewpoints
- We address the problem of conceptual restatement in generating summarised reports

27 Typical input
[Figure: the same data-encoded record tables as in slide 8 (Problems, Interventions, Investigations, Drugs, Consults, Loci and Relations).]

28 [Figure: The story of an illness, the same patient-event timeline as in slide 7.]

29 Why are textual reports needed?
- Clinicians and other health professionals use patient health summaries at the point of care, where time is a critical resource
- Reports provide quick access to an overview of a patient's medical history
- Typically, an electronic patient record contains around 1000 messages
  - Even when structured, this volume of data is very large
  - Access to relevant information about particular patients is difficult
- Textual reports:
  - are easy to read and understand
  - can be customised to the type of information needed
  - provide a quick way of identifying errors in the patient record
  - remove the need to know the structure of the underlying database in detail

30 Why are paraphrases needed?
- Alternative views of the patient record, i.e. reports from various viewpoints:
  - full chronological reports
  - summaries of investigations, interventions, treatments
- Same content, different textual representation
- Potted summaries are also important (a 30-second overview of a patient's history)

31 Content selection
- Two notions:
  - Spine events: the main concepts in the summary (depending on the user-defined type of summary)
  - Skeleton events: linked to the spine by various relations
- Basic procedure:
  - Step 1: group linked events into clusters and remove small clusters
    - Typically, there is a small number of very large clusters and a small number of small clusters
    - Small clusters are assumed not to be related to the main topic of the summary
  - Step 2: identify spine events according to the type of summary: Longitudinal, Investigations, Interventions, Problems
  - Step 3: identify the skeleton events, e.g. if a problem is a spine event and an investigation has_indication that problem, then select the investigation (unless already selected)
  - Repeat step 2 a certain number of times (given by a threshold parameter)
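The three steps above can be sketched over a small event graph. This is a minimal sketch under assumptions, not CLEF's code: clusters are taken to be connected components, `min_cluster` stands in for the notion of a "small" cluster, and `hops` stands in for the threshold parameter, read here as the number of rounds by which the skeleton grows outwards from the spine. HAS_INDICATION and HAS_FINDING come from the slides; the TREATS relation name is a hypothetical rendering of the "treats" link in the illness timeline.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str, str]  # (source event, EPR relation, target event)


def clusters(events: Dict[str, str], links: List[Link]) -> List[Set[str]]:
    """Step 1a: connected components of the (undirected) event graph."""
    neighbours = defaultdict(set)
    for src, _rel, dst in links:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    seen: Set[str] = set()
    components = []
    for start in events:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            current = stack.pop()
            component.add(current)
            for nxt in neighbours[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


def select_content(events, links, spine_type, rules, min_cluster=2, hops=1):
    """Return (spine, skeleton); rules are (src_type, relation, dst_type) triples."""
    # Step 1b: drop small clusters, assumed unrelated to the summary topic.
    kept = set().union(*(c for c in clusters(events, links) if len(c) >= min_cluster))
    # Step 2: spine events are the kept events of the requested type.
    spine = {e for e in kept if events[e] == spine_type}
    # Step 3: grow the skeleton from the spine, one hop per round.
    selected, frontier = set(spine), set(spine)
    for _ in range(hops):
        added = set()
        for src, rel, dst in links:
            if (events[src], rel, events[dst]) not in rules:
                continue
            for here, there in ((src, dst), (dst, src)):
                if here in frontier and there not in selected:
                    added.add(there)
        selected |= added
        frontier = added
    return spine, selected - spine


events = {
    "pain": "PROBLEM", "lump": "PROBLEM", "cancer": "PROBLEM",
    "mammogram": "INVESTIGATION", "biopsy": "INVESTIGATION",
    "radiotherapy": "INTERVENTION",
    "xray": "INVESTIGATION",  # unlinked, so it forms a cluster of size 1
}
links = [
    ("mammogram", "HAS_INDICATION", "pain"),
    ("mammogram", "HAS_FINDING", "lump"),
    ("biopsy", "HAS_INDICATION", "lump"),
    ("biopsy", "HAS_FINDING", "cancer"),
    ("radiotherapy", "TREATS", "cancer"),
]
RULES = {("INVESTIGATION", "HAS_INDICATION", "PROBLEM")}
spine, skeleton = select_content(events, links, "PROBLEM", RULES)
print(sorted(spine), sorted(skeleton))
```

Changing `spine_type` and `RULES` is how, in this sketch, the same input yields the Problem, Interventions or Investigations views.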

32 Spine of Problem events

33 [Figure: event graph (pain, lump, mammogram, biopsy, cancer, breast, radiotherapy, radiotherapy cycle, ulcer, hyperbaric oxygenation) with the Problem spine highlighted]
Problem: The patient identifies pain in the left breast. A lump in the breast is found through a mammogram. A biopsy performed on the breast reveals cancer in the left breast. The patient receives radiotherapy to treat the cancer. Skin ulceration develops in the left breast as a result of radiotherapy, which is treated with hyperbaric oxygenation.

34 [Figure: the same event graph with the Interventions spine highlighted]
Interventions: Radiotherapy on the breast is initiated to treat cancer in the breast. A first radiotherapy cycle is performed. The radiotherapy causes skin ulceration. The patient receives hyperbaric oxygenation to treat the ulcer.

35 [Figure: the same event graph with the Investigations spine highlighted]
Investigations: A mammogram is performed because of pain in the left breast, which identifies a lump in the breast. A biopsy of the lump identifies cancer in the left breast.

36 [Figure: the Problem, Interventions and Investigations spines shown side by side over the same event graph]

37 Discourse structuring
- Mostly given by relations in the EPR: 19 different types of relation, which can be:
  - Attributive: Problem has_locus Locus
  - Rhetorical: Problem caused_by Intervention
- Attributive relations do not contribute to the discourse structure
- In a first step, events linked through attributive relations are combined: Message_Problem + Message_Locus => Message_Problem_Locus
- Messages are grouped according to the type of summary:
  - Longitudinal: events occurring in the same week are grouped together, and weeks are further grouped into years
  - Logical: arrange chronologically and then group similar events (e.g. liver panels, screening consults)
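The two preparatory steps above (merging attributively linked messages, then longitudinal grouping) might look like this. It is a sketch under assumptions: only HAS_LOCUS is treated as attributive, messages are plain dictionaries with hypothetical `name` and `week` fields, and year numbering (year 1 = weeks 0 to 51) is an assumed convention.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

ATTRIBUTIVE = {"HAS_LOCUS"}  # assumed subset of the attributive EPR relations


def merge_attributive(messages: Dict[str, dict],
                      links: List[Tuple[str, str, str]]) -> Dict[str, dict]:
    """Fold attributive relations into the message they describe,
    e.g. Message_Problem + Message_Locus => Message_Problem_Locus."""
    merged = {mid: dict(msg) for mid, msg in messages.items()}
    absorbed = set()
    for src, rel, dst in links:
        if rel in ATTRIBUTIVE:
            merged[src].setdefault("attributes", []).append(messages[dst]["name"])
            absorbed.add(dst)
    # Absorbed messages (e.g. loci) no longer stand on their own.
    return {mid: msg for mid, msg in merged.items() if mid not in absorbed}


def group_longitudinal(messages: Dict[str, dict],
                       weeks_per_year: int = 52) -> Dict[int, Dict[int, List[str]]]:
    """Group message ids by week, and weeks by year (numbering is an assumption)."""
    years = defaultdict(lambda: defaultdict(list))
    for mid, msg in sorted(messages.items(), key=lambda item: item[1]["week"]):
        week = msg["week"]
        years[week // weeks_per_year + 1][week].append(mid)
    return {year: dict(weeks) for year, weeks in years.items()}


messages = {
    "p1": {"type": "PROBLEM", "name": "primary cancer", "week": 1},
    "l1": {"type": "LOCUS", "name": "right breast"},
    "i1": {"type": "INVESTIGATION", "name": "Xray", "week": 131},
}
merged = merge_attributive(messages, [("p1", "HAS_LOCUS", "l1")])
grouped = group_longitudinal(merged)
print(grouped)
```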

38 Discourse structuring
- Within each group:
  - link messages by discourse relations inferred from EPR relations: Cause, Result, Sequence
  - assume a List relation if no relation is specified
- Between groups:
  - if all events in one group are linked to events in another group by some EPR relation, link the groups through the corresponding discourse relation
  - otherwise, assume a List relation
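A minimal sketch of the within-group and between-group rules. The EPR-to-discourse mapping is an assumption: CAUSED_BY echoes the caused_by relation of slide 37, the HAS_FINDING-to-Result pairing is a guess, and FOLLOWED_BY is a hypothetical relation name standing in for whatever yields Sequence. Function names are illustrative.

```python
from typing import List, Sequence, Tuple

Link = Tuple[str, str, str]  # (source event, EPR relation, target event)

# Assumed mapping from EPR relations to discourse relations.
EPR_TO_DISCOURSE = {
    "CAUSED_BY": "Cause",
    "HAS_FINDING": "Result",
    "FOLLOWED_BY": "Sequence",  # hypothetical relation name
}


def relate(a: str, b: str, links: List[Link]) -> str:
    """Discourse relation between two messages in one group (List by default)."""
    for src, rel, dst in links:
        if {src, dst} == {a, b} and rel in EPR_TO_DISCOURSE:
            return EPR_TO_DISCOURSE[rel]
    return "List"


def relate_within(group: Sequence[str], links: List[Link]) -> List[Tuple[str, str, str]]:
    """Link each message to the next one in its group."""
    return [(a, relate(a, b, links), b) for a, b in zip(group, group[1:])]


def relate_groups(group_a: Sequence[str], group_b: Sequence[str], links: List[Link]) -> str:
    """Use a discourse relation only if every event in group_a is linked to
    group_b by the same mapped relation; otherwise fall back to List."""
    per_event = []
    for a in group_a:
        rels = {
            EPR_TO_DISCOURSE[rel]
            for src, rel, dst in links
            if rel in EPR_TO_DISCOURSE
            and ((src == a and dst in group_b) or (dst == a and src in group_b))
        }
        if not rels:
            return "List"  # some event is unlinked, so the groups are only listed
        per_event.append(rels)
    common = set.intersection(*per_event) if per_event else set()
    return next(iter(common)) if len(common) == 1 else "List"


links = [("ulcer", "CAUSED_BY", "radiotherapy")]
print(relate_within(["radiotherapy", "ulcer", "xray"], links))
print(relate_groups(["ulcer"], ["radiotherapy"], links))
```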

39 Aggregation
- Problems:
  Problem_1:name HAS_LOCUS Locus_1
  Problem_2:name HAS_LOCUS Locus_2
  Enlargement of the liver + Enlargement of the spleen => Enlargement of the liver and/but not of the spleen
- Investigations:
  Investigation_1:name HAS_INDICATION Problem_1 HAS_LOCUS Locus_1
  Investigation_2:name HAS_INDICATION Problem_2 HAS_LOCUS Locus_2
  Examination of the abdomen revealed no enlargement of the liver + Examination of the lymphnodes revealed no lymphadenopathy => Examination revealed no enlargement of the liver and no lymphadenopathy
- Text structuring: Problem_3 HAS_LOCUS {Locus_1, Locus_2}; Investigation_3 HAS_INDICATION {Problem_1, Problem_2}
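Both aggregation patterns can be sketched as grouping followed by coordination. The assumptions here are loud: findings are hypothetical `(name, locus, negated)` triples, the English templates are illustrative, and the and/but contrast case ("and/but not of the spleen") is not handled. Negated findings are coordinated with "or", mirroring phrasing in the sample output ("no enlargement of the liver or of the spleen").

```python
from collections import defaultdict
from typing import List, Tuple

Finding = Tuple[str, str, bool]  # (problem name, locus, negated?)


def locus_phrase(loci: List[str], conj: str) -> str:
    parts = [f"of the {locus}" for locus in loci]
    if len(parts) == 1:
        return parts[0]
    return ", ".join(parts[:-1]) + f" {conj} " + parts[-1]


def aggregate_findings(findings: List[Finding]) -> List[str]:
    """Problem aggregation: same name and polarity, different loci => one phrase."""
    grouped = defaultdict(list)  # (name, negated) -> loci, in first-seen order
    for name, locus, negated in findings:
        grouped[(name, negated)].append(locus)
    phrases = []
    for (name, negated), loci in grouped.items():
        conj = "or" if negated else "and"
        prefix = "no " if negated else ""
        phrases.append(f"{prefix}{name} {locus_phrase(loci, conj)}")
    return phrases


def join_and(items: List[str]) -> str:
    return items[0] if len(items) == 1 else ", ".join(items[:-1]) + " and " + items[-1]


def investigation_sentence(investigation: str, findings: List[Finding]) -> str:
    """Investigation aggregation: one sentence for several findings."""
    return f"{investigation} revealed {join_and(aggregate_findings(findings))}."


findings = [
    ("enlargement", "liver", True),
    ("enlargement", "spleen", True),
    ("lymphadenopathy", "right axillary lymphnodes", True),
]
print(investigation_sentence("Examination", findings))
```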

40 Aggregation
- Interventions:
  Intervention_1 PART_OF Intervention_0
  Intervention_2 PART_OF Intervention_0
  [ID01]Chemotherapy cycle PART_OF [ID0]Chemotherapy
  [ID02]Chemotherapy cycle PART_OF [ID0]Chemotherapy
  [ID03]Chemotherapy cycle PART_OF [ID0]Chemotherapy
  => 3 chemotherapy cycles
  Text structuring: {count} Intervention_1
- Ellipsis:
  Examination of the left breast revealed no recurrent cancer in the left breast => Examination of the left breast revealed no recurrent cancer
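The counting aggregation and the locus ellipsis can be sketched as follows. The function names are hypothetical, the pluralisation is deliberately naive (fine for "cycle", wrong for irregular nouns), and the ellipsis rule is an assumed reading of the slide: drop the finding's locus when it repeats the investigation's locus.

```python
from collections import Counter
from typing import Dict, List, Tuple


def count_parts(names: Dict[str, str], links: List[Tuple[str, str, str]]) -> Counter:
    """Count PART_OF children per (parent id, child name)."""
    counts = Counter()
    for child, rel, parent in links:
        if rel == "PART_OF":
            counts[(parent, names[child])] += 1
    return counts


def realise_count(n: int, name: str) -> str:
    # Naive English pluralisation, adequate only for regular nouns.
    return f"{n} {name}" if n == 1 else f"{n} {name}s"


def elide_locus(investigation: str, inv_locus: str, finding: str, finding_locus: str) -> str:
    """Ellipsis: omit the finding's locus when it repeats the investigation's locus."""
    sentence = f"{investigation} of the {inv_locus} revealed {finding}"
    if finding_locus != inv_locus:
        sentence += f" in the {finding_locus}"
    return sentence + "."


names = {"ID0": "chemotherapy", "ID01": "chemotherapy cycle",
         "ID02": "chemotherapy cycle", "ID03": "chemotherapy cycle"}
links = [("ID01", "PART_OF", "ID0"), ("ID02", "PART_OF", "ID0"), ("ID03", "PART_OF", "ID0")]
for (parent, child_name), n in count_parts(names, links).items():
    print(realise_count(n, child_name))
print(elide_locus("Examination", "left breast", "no recurrent cancer", "left breast"))
```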

41 Text structuring
- Events can be compacted according to domain-specific rules:
  - Clinical examination is: examination of the liver, examination of the spleen, examination of the abdomen
    - Clinical examination was normal
    - Clinical examination was normal apart from an enlargement of the spleen
    - Clinical examination revealed enlargement of the spleen
  - Liver panel is: bilirubin concentration, ESR concentration, GGT concentration
    - The liver panel was in the normal range (apart from a very high level of GGT)
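A minimal sketch of rule-based compaction. The panel memberships are taken from the slide. The function `compact`, its input format and the sentence templates are assumptions; the slide's own wordings (e.g. "was in the normal range", or "revealed ..." instead of "was normal apart from ...") would be alternative templates. If any component of a panel is missing, the sketch declines to compact and the events would be reported separately.

```python
from typing import Dict, Optional

# Panel memberships from the slide; the sentence templates below are assumptions.
PANELS = {
    "clinical examination": {
        "examination of the liver",
        "examination of the spleen",
        "examination of the abdomen",
    },
    "liver panel": {
        "bilirubin concentration",
        "ESR concentration",
        "GGT concentration",
    },
}


def compact(panel: str, results: Dict[str, Optional[str]]) -> Optional[str]:
    """Compact a panel's component events into one sentence.

    results maps each component event to an abnormal finding, or None if normal.
    Returns None when a component is missing, so no compaction is applied.
    """
    members = PANELS[panel]
    if not members.issubset(results):
        return None
    abnormal = [finding for event, finding in results.items() if event in members and finding]
    subject = panel[0].upper() + panel[1:]
    if not abnormal:
        return f"{subject} was normal."
    return f"{subject} was normal apart from {' and '.join(abnormal)}."


print(compact("clinical examination", {
    "examination of the liver": None,
    "examination of the spleen": "an enlargement of the spleen",
    "examination of the abdomen": None,
}))
```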

42 Maintaining the thread of discourse
- The textual representation should reflect the relative importance of events
- At discourse level: spine concepts are preferably realised in nuclear units and skeleton events in satellite units
- At sentence level: spine events are assigned salient syntactic roles
- Whether an event is on the spine or on the skeleton determines whether it is realised as a sentence, a main or subordinate clause, or a phrase

43 Typical output of the NL generator (long chronological report)
Year 1
Week 0: A mammography screening was scheduled at the clinic.
Week 1: Primary cancer of the right breast; histopathology: invasive tubular adenocarcinoma.
YEAR 2
Week 131: Xray revealed no cancer of the right breast.
YEAR 5
Week 287: Xray revealed no cancer of the right breast.
YEAR 8
Week 443: Xray revealed cancer of the right breast.
Week 446: Examination (indicated by primary cancer of the right breast) revealed no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymphnodes. Testing (indicated by primary cancer of the right breast) revealed no abnormality of the haemoglobin concentration and no abnormality of the leucocyte count. An Xray (indicated by primary cancer of the right breast) was performed. Very high level of the ESR concentration. Very high level of the Creatinine concentration. Very high level of the Alkaline Phosphatase concentration. Very high level of the Bilirubin concentration. Very high level of the GGT concentration. No abnormality of the platelet count.
Week 449: An initial treatment planning was completed at the clinic. Excision biopsy revealed no metastatic lymphnode count of the right axilla. Histopathology revealed primary cancer of the right breast. Cancer staging revealed stage1 cancer. Hormone anatagonist therapy was started to treat primary cancer of the right breast. Lumpectomy was performed on the breast to treat primary cancer of the right breast. Primary treatment package was started to treat primary cancer of the right breast.
………………….
YEAR 17
Week 893: Xray revealed no cancer of the right breast.

44 Typical output of the NL generator (compact reports)
Focus on Problems
In week 0, the patient is diagnosed with primary cancer of the right breast, histopathology: invasive tubular adenocarcinoma. In weeks 131 and 287 Xray revealed no cancer of the right breast. In week 446, there was no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymphnodes revealed by examination. There was no abnormality of the haemoglobin concentration or of the leucocyte count, no abnormality of the platelet count, very high level of the GGT concentration, of the Bilirubin concentration, of the Alkaline Phosphatase concentration, of the Creatinine concentration or of the ESR concentration. In week 449, excision biopsy revealed no metastatic lymphnode count of the right axilla. Histopathology revealed primary cancer of the right breast. Lumpectomy was performed on the right breast. Hormone anatagonist therapy was initiated to treat primary cancer of the right breast. In weeks 457 to 737, there was no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymphnodes. There was no abnormality of the haemoglobin concentration or of the leucocyte count, no abnormality of the platelet count, very high level of the GGT concentration, of the Bilirubin concentration, of the Alkaline Phosphatase concentration, of the Creatinine concentration and of the ESR concentration. In weeks 457 to 893, Xray revealed no cancer of the right breast.
Focus on Interventions
In week 0, the patient is diagnosed with primary cancer of the right breast, histopathology: invasive tubular adenocarcinoma. In week 449, excision biopsy revealed no metastatic lymphnode count of the right axilla. Histopathology revealed primary cancer of the right breast. Lumpectomy was performed on the right breast. Hormone anatagonist therapy was started to treat primary cancer of the right breast.
Focus on Investigations
In week 0, the patient is diagnosed with primary cancer of the right breast, histopathology: invasive tubular adenocarcinoma. In weeks 131 and 287 Xray revealed no cancer of the right breast. In week 446, examinations revealed no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymphnodes. Testing revealed no abnormality of the haemoglobin concentration or of the leucocyte count, no abnormality of the platelet count, very high level of the GGT concentration, of the Bilirubin concentration, of the Alkaline Phosphatase concentration, of the Creatinine concentration or of the ESR concentration. In week 449, excision biopsy revealed no metastatic lymphnode count of the right axilla. Histopathology revealed primary cancer of the right breast. In weeks 457 to 737, examinations revealed no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymphnodes. Testing revealed no abnormality of the haemoglobin concentration or of the leucocyte count, no abnormality of the platelet count, very high level of the GGT concentration, of the Bilirubin concentration, of the Alkaline Phosphatase concentration, of the Creatinine concentration and of the ESR concentration. In weeks 457 to 893, Xray revealed no cancer of the right breast

45 Conclusions
- We produce summarised reports that are loose paraphrases of each other at the discourse level
- Although the reports work on the same input, the content may vary slightly with the type of report
- There is little paraphrasing at lower levels (lexical, sentential); what there is mainly results from the realisation of rhetorical relations, aggregation and generalisation

46 Ongoing work on report generation
- Add domain-specific knowledge to improve content selection: some events become important only in certain contexts
- Change the (sub-)domain: test whether the generation method is easily portable
- Link NLG to IR to improve IR
- Produce reports for patients

47 Summary and conclusions
- CLEF is now entering the integration phase, moving towards testing and deployment
- Major emphases at this point are on privacy and security
- Informing patients is a major thread for future work
- Integrating IE and NLG

48 Thank You!
Collaborators: Catalina Hallett, Richard Power


50 Evaluation procedure
- Subjects:
  - We tested the performance of 15 subjects.
  - Subjects had a range of expertise in the CLEF domain, from expert (oncologist) to novice (computer scientist), but most had some medical training.
  - Subjects had no previous experience with the CLEF WYSIWYM query interface, but most were aware of its fundamental principles.
- Methodology:
  - Subjects were given a set of four fixed queries to formulate using the CLEF WYSIWYM query interface.
  - The queries were worded as differently as possible from the language of the query interface.
  - Each subject received the queries in a different order.

51 Evaluation: data analysis
- We recorded:
  - the time taken to compose each query
  - the number of operations used to construct each query, and compared it with the optimal number of operations (pre-computed)
- We analysed whether performance, as indicated by speed and efficiency, improves with training (experience).

52 Evaluation results: time to completion
- Subjects' performance improved dramatically with experience.
- After their first experience of composing a query, subjects' completion time halved and then remained at roughly that level.

53 Evaluation results: performance over time, normalised over complexity
- After just one go with the CLEF interface, subjects are highly proficient in their ability to compose complex queries.
- By the time they get to their fourth query, subjects' performance is almost perfect.
- Mean: 0.18
- Optimal operations = minimum number of operations needed to compose the query perfectly; this is a measure of the complexity of the query.

54 Evaluation: comparison with SQL
- Very small-scale experiment
- Two subjects:
  - with expert knowledge of the structure, organisation and content of the CLEF database
  - highly skilled users of SQL
  - with minimal experience of WYSIWYM
  - given access to the SNOMED and ICD codes required to build the SQL queries
- Each subject composed a query first in the CLEF WYSIWYM interface and then in SQL

55 Evaluation: comparison with SQL
- Subject 1, Query 1: WYSIWYM 2.3 mins; SQL 8.5 mins (incomplete)
- Subject 2, Query 2: WYSIWYM 4.5 mins; SQL 12 mins (incomplete)
- Even with a slowly reacting interface, the subjects were much faster composing queries in WYSIWYM than in SQL

56 Are the feedback texts ambiguous to the users?
- Identified 6 types of ambiguity
- 4 examples of each, with forced-choice judgements by 15 subjects
- Random judgements would give a score of 33%
- Results show 84% correct judgements

57 [Figure: from the repository, summary patient records for clinicians and medical researchers, and summarisation for patients; presentation as linear text, hypertext or animated dialogue]

58 Sample report for Clinicians
In the weeks 195 to 196, self examination revealed lump of the right breast. In week 197, self examination revealed lump of the right breast. Excision biopsy revealed metastatic lymphnode count of the right axilla. Histopathology revealed cancer of the right breast. Cancer staging revealed stage2 cancer. Radical mastectomy was performed on the breast to treat the primary cancer. The patient was diagnosed with metastatic lymphnode count of the right axilla; 19 nodes involved out of 24. The patient was diagnosed with metastatic cancer of the right axilla; histopathology: invasive undifferentiated adenocarcinoma. The patient was diagnosed with cancer of the right breast; histopathology: invasive undifferentiated adenocarcinoma. The patient was diagnosed with stage2 cancer; histopathology: invasive undifferentiated adenocarcinoma. Primary treatment package was initiated to treat primary cancer of the right breast.


60 Sample report for Patients You had a consultation with your doctor on September 20th. On September 27th you did a self examination and you found that you had a lump in your right breast. A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel. On October 4th you did another self examination and you found that you still had a lump in your right breast. On October 11th you had a radical mastectomy to treat cancer in your right breast. A radical mastectomy is an operation to remove the breast, along with the lymph glands under the arm and the muscles of the chest wall. Cancer is a tumour that tends to spread, both locally and to other parts of the body. …
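The patient version differs visibly from the clinician version in two ways. Events are dated rather than numbered by week, and each medical term is followed by a lay definition the first time it appears. Below is a minimal sketch of both behaviours. It assumes events already carry calendar dates. The two definitions are copied from the sample report. `GLOSSARY`, `ordinal` and `patient_report` are hypothetical names. Month names come from `strftime('%B')`, so the sketch assumes an English (C) locale.

```python
from datetime import date
from typing import Dict, List, Tuple

# Lay definitions copied from the sample patient report (hypothetical registry).
GLOSSARY: Dict[str, str] = {
    "self examination": "A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel.",
    "radical mastectomy": "A radical mastectomy is an operation to remove the breast, along with the lymph glands under the arm and the muscles of the chest wall.",
}


def ordinal(n: int) -> str:
    """1 -> '1st', 11 -> '11th', 27 -> '27th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def patient_report(events: List[Tuple[date, str]]) -> str:
    """Date each event and define each glossary term after its first mention."""
    explained = set()
    out = []
    for day, sentence in sorted(events):
        out.append(f"On {day.strftime('%B')} {ordinal(day.day)} {sentence}.")
        for term, definition in GLOSSARY.items():
            if term in sentence and term not in explained:
                out.append(definition)
                explained.add(term)
    return " ".join(out)
```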

61 Presenting patient records in hypertext: dividing the text into related units. Text units: Cancer is a tumour that tends to spread, both locally and to other parts of the body. | You had a consultation with your doctor on September 20th. | On September 27th you did a self examination. | A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel. | On October 4th you did another self examination. | you found that you had a lump in your right breast. | On October 11th you had a radical mastectomy | to treat cancer in your right breast. | A radical mastectomy is an operation to remove the breast, along with the lymph glands under the arm and the muscles of the chest wall. Relations: SEQUENCE, HAS-FINDING, SEQUENCE, MOTIVATION, EXPLANATION.

62 Presenting patient records in hypertext: giving graphical attributes to the text units. Cancer is a tumour that tends to spread, both locally and to other parts of the body. | You had a consultation with your doctor on September 20th. | On September 27th you did a self examination. [SEQUENCE] | A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel. | On October 4th you did another self examination. [SEQUENCE] | you found that you had a lump in your right breast. [HAS-FINDING] [SEQUENCE] | On October 11th you had a radical mastectomy | to treat cancer in your right breast. [MOTIVATION] | A radical mastectomy is an operation to remove the breast, along with the lymph glands under the arm and the muscles of the chest wall. [EXPLANATION]
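Slides 61 and 62 treat the report as text units joined by rhetorical relations (SEQUENCE, HAS-FINDING, MOTIVATION, EXPLANATION), with each unit given its own graphical attributes. Below is a minimal sketch of one way to do that. It assumes that each unit is the satellite of at most one relation and that CSS classes carry the graphical attributes. `Relation` and `to_hypertext` are hypothetical names, and linking a satellite back to its nucleus is an illustrative choice.

```python
from dataclasses import dataclass
from html import escape
from typing import Dict, List


@dataclass(frozen=True)
class Relation:
    name: str       # e.g. "SEQUENCE", "HAS-FINDING", "MOTIVATION", "EXPLANATION"
    nucleus: str    # id of the unit the relation attaches to
    satellite: str  # id of the related unit


def to_hypertext(units: Dict[str, str], relations: List[Relation]) -> str:
    """Render each unit as a <span>; a satellite gets a CSS class named after
    its relation and a link back to its nucleus."""
    # Assumes each unit is the satellite of at most one relation.
    role = {r.satellite: r for r in relations}
    parts = []
    for uid, text in units.items():
        r = role.get(uid)
        if r is None:
            parts.append(f'<span id="{uid}">{escape(text)}</span>')
        else:
            parts.append(
                f'<span id="{uid}" class="{r.name.lower()}">'
                f'<a href="#{r.nucleus}">{escape(text)}</a></span>'
            )
    return "\n".join(parts)
```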

63 Presenting patient records in hypertext: using animation to represent discourse patterns dynamically. You found that you had a lump in your right breast. The radical mastectomy was done to treat cancer in your right breast. You had a consultation with your doctor on September 20th. On September 27th you did a self examination. On October 4th you did another self examination. A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel. On October 11th you had a radical mastectomy. Cancer is a tumour that tends to spread, both locally and to other parts of the body. A radical mastectomy is an operation to remove the breast, along with the lymph glands under the arm and the muscles of the chest wall.

64 You had a consultation with your doctor on September 20th 1993.

65 On September 27th you did a self examination.

66 You found that you had a lump in your right breast. You had a consultation with your doctor on September 20th. On September 27th you did a self examination.

67 You had a consultation with your doctor on September 20th. On September 27th you did a self examination. You found that you had a lump in your right breast. A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel.

68 You had a consultation with your doctor on September 20th. On September 27th you did a self examination. You found that you had a lump in your right breast. On October 4th you did another self examination.

69 A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel. You had a consultation with your doctor on September 20th. On September 27th you did a self examination. You found that you had a lump in your right breast. On October 4th you did another self examination. On October 11th you had a radical mastectomy.

70 A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel. You had a consultation with your doctor on September 20th. On September 27th you did a self examination. You found that you had a lump in your right breast. On October 4th you did another self examination. On October 11th you had a radical mastectomy. A radical mastectomy is an operation to remove the breast, along with the lymph glands under the arm and the muscles of the chest wall.

71 The radical mastectomy was done to treat cancer in your right breast. A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel. You had a consultation with your doctor on September 20th. On September 27th you did a self examination. You found that you had a lump in your right breast. On October 4th you did another self examination. On October 11th you had a radical mastectomy. A radical mastectomy is an operation to remove the breast, along with the lymph glands under the arm and the muscles of the chest wall.
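Slides 64 to 71 reveal the units one at a time. The order shown is consistent with walking the main sequence of events and revealing each event's related units (finding, definition, motivation) before moving on to the next event. Below is a minimal sketch of that ordering. It assumes the relations form a tree with no cycles. `reveal_order`, `animation_frames` and the unit ids used with them are hypothetical.

```python
from typing import Dict, Iterator, List


def reveal_order(main: List[str], satellites: Dict[str, List[str]]) -> List[str]:
    """Main-line unit ids in order, each followed depth-first by its satellites."""
    order: List[str] = []

    def visit(uid: str) -> None:
        order.append(uid)
        for sat in satellites.get(uid, []):
            visit(sat)  # assumes the relations contain no cycles

    for uid in main:
        visit(uid)
    return order


def animation_frames(order: List[str]) -> Iterator[List[str]]:
    """Yield one frame per step; each frame lists every unit revealed so far."""
    for step in range(1, len(order) + 1):
        yield order[:step]
```

For example, suppose the finding and the definition are satellites of the first self examination, and the definition and the motivation are satellites of the mastectomy. The sketch then produces eight frames, one per slide from 64 to 71.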

72 Monologues/Dialogues. Monologue: an autonomous agent reads the generated report; aims: accessibility, education (not translation). Dialogue: the report is generated as a script that two agents act out; aims: accessibility, vicarious learning. Example (video clip).
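Slide 72 describes generating the report as a script that two agents act out. Below is a minimal sketch of one possible scripting rule. It assumes that an "Expert" agent narrates each sentence and that a "Learner" agent asks about each glossary term the first time it comes up, which is meant as a nod to the vicarious-learning aim. The speaker names, the question template and `dialogue_script` are all hypothetical.

```python
from typing import Dict, List, Tuple

Line = Tuple[str, str]  # (speaker, utterance)


def dialogue_script(sentences: List[str], glossary: Dict[str, str],
                    asker: str = "Learner", expert: str = "Expert") -> List[Line]:
    """The expert narrates the report; the learner asks about each glossary
    term the first time it appears, and the expert answers with its definition."""
    script: List[Line] = []
    asked = set()
    for sentence in sentences:
        script.append((expert, sentence))
        for term, definition in glossary.items():
            if term in sentence and term not in asked:
                # Hypothetical question template; assumes the term takes "a".
                script.append((asker, f"What is a {term}?"))
                script.append((expert, definition))
                asked.add(term)
    return script
```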

