
1 CS5545: Natural Language Generation
Ehud Reiter, Computing Science, University of Aberdeen
Background Reading: Reiter and Dale, Building Natural Language Generation Systems, chaps 1, 2

2 Words instead of Pictures
• Natural Language Generation (NLG): Generate English sentences to communicate data, instead of visualisations (or tables)
• A research focus of Aberdeen CS Dept

3 Example: FoG
• Produces textual weather reports in English and French
• Input:
  » Graphical/numerical weather depiction
• User:
  » Environment Canada (Canadian Weather Service)

4 FoG: Input

5 FoG: Output

6 Why use words?
• Many potential reasons
  » Media restrictions (eg, text messages)
  » Users not knowledgeable enough to interpret a graph correctly
  » Words also communicate background info, emphasis, interpretation, …
  » People (in some cases) make better decisions from words than graphs

7 Too hard for 1/3 of patients

8 Easier for many people?
• I’m afraid to say that you have a 1 in 3 chance of dying from a heart attack before your 65th birthday if you carry on as you are. But if you stop smoking, take your medicine, and eat better, a fatal heart attack will be much less likely (only a 1 in 12 chance).

9 Text vs Graph
• Focus on key info (absolute risk, optimum risk)
• Integrate with explanation (optimum risk means if you stop smoking, eat better, take medicine)
• Add emphasis, perspective, “spin” (eg, “I’m afraid to say” indicates this is a serious problem)

10 Experiment: Decision Making
• Showed 40 medical professionals (from junior nurses to senior doctors) data from a baby in neonatal ICU
  » Text summary of graphical depiction
• Asked to make a treatment decision
  » Better decision when shown text
  » But said they preferred the graphic

11 Graphic Depiction

12 Text Summary

13 What is NLG?
• NLG systems are computer systems which produce understandable and appropriate texts in English or other human languages
  » Input is data (raw, analysed)
  » Output is documents, reports, explanations, help messages, and other kinds of texts
• Requires
  » Knowledge of language
  » Knowledge of the domain

14 Language Technology
[Diagram: Natural Language Understanding maps text to meaning; Natural Language Generation maps meaning to text; Speech Recognition maps speech to text; Speech Synthesis maps text to speech.]

15 Aberdeen NLG Systems
• STOP (smoking cessation letters) (demo)
• SumTime (weather forecasts) (demos)
• Ilex (museum description) (demo?)
• SkillSum (feedback on assessment)
• StandUp (help children make puns)
• BabyTalk (summary of patient data)
  » Looking for a PhD student…


17 How do NLG Systems Work?
• Usually three stages
• Document planning: decide on content and structure of text
• Microplanning: decide how to linguistically express text (which words, sentences, etc to use)
• Realisation: actually produce text, conforming to rules of grammar

18 Scuba: example input
• Input: three types
  » Raw data: eg dive data in scuba.mdb
  » Trends: segmented data (as in pract 2)
  » Patterns: eg, rapid ascent, sawtooth, reverse dive profile, etc (as in pract 2)

19 Scuba: target (human) output
• Your first ascent was a bit rapid; you ascended from 33m to the surface in 5 minutes, you should have taken more time to make this ascent. You also did not stop at 5m, we recommend that anyone diving beneath 12m should stop for 3 minutes at 5m. Your second ascent was fine.

20 Document Planning
• Content selection: Of the zillions of things I could say, which should I say?
  » Depends on what is important
  » Also depends on what is easy to say
• Structure: How should I organise this content as a text?
  » What order do I say things in?
  » Rhetorical structure?

21 Scuba: content
• Probably focus on patterns indicating dangerous activities
  » Most important thing to mention
• How much should we say about these?
  » Detail? Explanations?
• Should we say anything for safe dives?
  » Maybe just acknowledge them?

22 Scuba: structure
• Order by time (first event first)
  » Or should we mention the most dangerous patterns first?
• Linking words (cue phrases)
  » Also, but, because, …
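
The content and structure questions on the document planning slides can be sketched in a few lines of code. The example below is only an illustration: the `Pattern` record, the numeric danger scores, and the `min_danger` threshold are hypothetical and are not specified in the slides. It drops safe patterns and then orders the rest either by time or with the most dangerous first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """A detected dive pattern (hypothetical representation)."""
    name: str          # e.g. "rapid ascent"
    start_min: float   # minutes into the dive when the pattern starts
    danger: int        # assumed score: 0 = safe, higher = more dangerous


def select_content(patterns, min_danger=1):
    """Content selection: keep only patterns dangerous enough to mention."""
    return [p for p in patterns if p.danger >= min_danger]


def order_content(patterns, by="time"):
    """Structure: order by time, or with the most dangerous pattern first."""
    if by == "time":
        return sorted(patterns, key=lambda p: p.start_min)
    if by == "danger":
        return sorted(patterns, key=lambda p: -p.danger)
    raise ValueError(f"unknown ordering: {by}")


# Made-up example dive, for illustration only.
dive = [
    Pattern("normal ascent", 40.0, 0),
    Pattern("rapid ascent", 12.0, 2),
    Pattern("missed safety stop", 17.0, 3),
]
selected = select_content(dive)
plan_by_time = [p.name for p in order_content(selected, by="time")]
plan_by_danger = [p.name for p in order_content(selected, by="danger")]
```

With these made-up scores, ordering by time puts the rapid ascent first, while ordering by danger puts the missed safety stop first. Which ordering reads better is exactly the kind of knowledge the later slides suggest taking from a corpus.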

23 Microplanning
• Lexical choice: Which words to use?
• Aggregation: How should information be distributed across sentences and paragraphs?
• Reference: How should the text refer to objects and entities?

24 Scuba: microplanning
• Lexical choice:
  » A bit rapid vs too fast vs unwise vs …
  » Ascended vs rose vs rose to surface vs …
• Aggregation: 1 sentence or 3 sentences?
  » “Your first ascent was a bit rapid; you ascended from 33m to the surface in 5 minutes, it would have been better if you had taken more time to make this ascent.”

25 Scuba: Microplanning
• Aggregation (continued)
  » Phrase merging
    – “Your first ascent was fine. Your second ascent was fine” vs
    – “Your first and second ascents were fine.”
  » Reference
    – Your ascent vs
    – Your first ascent vs
    – Your ascent from 33m at 3 min
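
Phrase merging can be illustrated with a small sketch. It assumes, hypothetically, that each message is an `(ordinal, assessment)` pair such as `("first", "fine")`; the slides do not define a message format. Adjacent messages with the same assessment are merged into one, and a helper builds the referring phrase for the merged subject.

```python
from itertools import groupby


def aggregate(messages):
    """Merge adjacent (ordinal, assessment) messages that share an assessment.

    Returns a list of (ordinals, assessment) pairs, where ordinals is a list.
    Only neighbouring messages are merged, so the original order is preserved.
    """
    merged = []
    for assessment, group in groupby(messages, key=lambda m: m[1]):
        ordinals = [ordinal for ordinal, _ in group]
        merged.append((ordinals, assessment))
    return merged


def subject_phrase(ordinals):
    """Build the subject, e.g. 'your first and second ascents'."""
    if len(ordinals) == 1:
        return f"your {ordinals[0]} ascent"
    joined = ", ".join(ordinals[:-1]) + " and " + ordinals[-1]
    return f"your {joined} ascents"


same = aggregate([("first", "fine"), ("second", "fine")])
different = aggregate([("first", "a bit rapid"), ("second", "fine")])
```

Here `same` holds a single merged message covering both ascents, while `different` keeps two separate messages because the assessments differ.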

26 Realisation
• Grammars (linguistic): Form legal English sentences based on decisions made in previous stages
  » Obey sublanguage, genre constraints
• Structure: Form legal HTML, RTF, or whatever output format is desired

27 Scuba: Realisation
• Simple linguistic processing
  » Capitalise first word of sentence
  » Subject-verb agreement
    – Your first ascent was fine
    – Your first and second ascents were fine
• Structure
  » Inserting line breaks in text (pouring)
  » Add HTML markups, eg,
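
A minimal sketch of these realisation steps follows. The function names and the choice of a `<p>` element are assumptions made for illustration; the slides do not say which markup the system adds. The code picks the verb form that agrees with the subject, capitalises the first word, and wraps the result in HTML.

```python
import html


def realise(subject, plural, complement):
    """Form a sentence with subject-verb agreement and initial capital."""
    verb = "were" if plural else "was"   # agreement with the subject's number
    sentence = f"{subject} {verb} {complement}."
    return sentence[0].upper() + sentence[1:]


def to_html(sentences):
    """Structure: wrap sentences in a paragraph element (assumed markup)."""
    return "<p>" + html.escape(" ".join(sentences)) + "</p>"


single = realise("your first ascent", False, "fine")
merged = realise("your first and second ascents", True, "fine")
paragraph = to_html([single])
```

Using `html.escape` keeps the output legal HTML even if a sentence contains characters such as `<` or `&`.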

28 Multimodal NLG
• Speech output
  » Feed text output into a speech synthesiser
  » Tight integration with synthesiser
    – Higher quality voice
• Text and visualisations
  » Produce separately
  » Tight integration
    – Eg, text refers to graphic, graph has text popup

29 Building NLG Systems
• Knowledge
• Representations
• Algorithms
• Systems

30 Building NLG Systems: Knowledge
• Need knowledge
  » Which patterns most important?
  » What order to use?
  » Which words to use?
  » When to merge phrases?
  » How to form plurals
  » Etc
• Where does this come from?

31 Knowledge Sources
• Imitate a corpus of human-written texts
  » Most straightforward, will focus on
• Ask domain experts
  » Useful, but experts often not very good at explaining what they are doing
• Experiments with users
  » Very nice in principle, but a lot of work

32 Scuba: Corpus
• See which patterns humans mention in the corpus, and have the system mention these
• See the order used by humans, and have the system imitate this
• etc
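
One simple way to see which patterns humans mention is to count keyword matches across the corpus texts. The sketch below is only an assumption about how this might be done: the keyword lists are hypothetical, and the three corpus texts are made up for illustration (the first echoes the target output on slide 19).

```python
from collections import Counter

# Hypothetical mapping from pattern names to phrases that signal a mention.
PATTERN_KEYWORDS = {
    "rapid ascent": ["rapid", "too fast"],
    "safety stop": ["stop at 5m", "safety stop"],
    "sawtooth": ["sawtooth"],
}


def mention_counts(corpus):
    """Count how many corpus texts mention each pattern at least once."""
    counts = Counter()
    for text in corpus:
        lowered = text.lower()
        for pattern, keywords in PATTERN_KEYWORDS.items():
            if any(keyword in lowered for keyword in keywords):
                counts[pattern] += 1
    return counts


# Illustrative texts only, not real corpus data.
corpus = [
    "Your first ascent was a bit rapid. You also did not stop at 5m.",
    "Your ascent was too fast.",
    "Both ascents were fine.",
]
counts = mention_counts(corpus)
```

Patterns that human writers mention often are candidates for the system to mention too; patterns that never appear may not be worth reporting.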

33 Building NLG Systems: Algorithms and Representations
• Various algorithms and representations have been designed for NLG tasks
  » Will discuss in later lectures
• But often can simply code NLG systems straightforwardly in Java, without special algorithms
  » Knowledge is more important

34 Building NLG Systems: Systems
• Ideally should be able to plug knowledge into downloadable systems
• Unfortunately very little in the way of downloadable NLG systems
  » Mostly specialised stuff primarily of interest to academics, eg http://openccg.sourceforge.net/
• I would like to improve the situation

35 Aberdeen NLG group
• 15 academic staff, researchers, PhD students
  » Leader: Prof Chris Mellish
• (one of) the best NLG groups in the world
• Looking for more researchers and PhD students… (esp BabyTalk project)
  » Let me know if interested!

