Ehud Reiter, Computing Science, University of Aberdeen. CS4025: Content Determination and Document Planning.


1 Ehud Reiter, Computing Science, University of Aberdeen. CS4025: Content Determination and Document Planning

2 Braun’s SaferDriver
• Aberdeen MSc by Research
• App which gives drivers feedback about speeding, etc
  » Gathers data from GPS, acceleration
  » Produces map as well as text
  » Evaluated
• https://aclweb.org/anthology/W/W15/W15-4726.pdf

3 Document Planning
• First stage of NLG
• Two tasks
  » Decide on content (our focus)
  » Decide on rhetorical structure
• Can be interleaved

4 Document Planning
• Problem: Usually the output text can only communicate a small portion of the input data
  » Which bits should be communicated?
  » Should the text also communicate summaries, perspectives, etc?
  » How should information be ordered and structured?

5 ScubaText: Imitate Corpus
• One approach to document planning is to analyse corpus texts (after aligning them to data), and manually infer content and structure rules.
• ScubaText example

6 Example Input

7 Input Segments
diveNo  segNo  itime  ivalue  ftime  fvalue
1460    1      0      1.3     60     6.3
1460    2      60     6.3     140    32.2
1460    3      140    32.2    480    0
1460    4      480    0       760    0
1460    5      760    0       920    38.9
1460    6      920    38.9    1300   41.6
1460    7      1300   41.6    1500   15.5
1460    8      1500   15.5    1860   27.2
1460    9      1860   27.2    2160   9.2
1460    10     2160   9.2     2600   2.7

8 Corresponding Corpus Text
• Your first ascent was a bit rapid; you ascended from 33m to the surface in 5 minutes, it would have been better if you had taken more time to make this ascent. You also did not stop at 5m, we recommend that anyone diving beneath 12m should stop for 3 minutes at 5m. Your second ascent was fine.

9 Align corpus text with data
Input: 1460 3 140 32.2 480 0
Output: (representation of) Your first ascent was a bit rapid; you ascended from 33m to the surface in 5 minutes, it would have been better if you had taken more time to make this ascent
Input: 1460 10 2160 9.2 2600 2.7
Output: (representation of) Your second ascent was fine

10 Possible content rules
• Describe segments that end (near) 0
  » And that don’t start at 0
  » Also segment at end of dive
• Give additional info about such segments whose slope is too high
  » Explain risk
  » Say what should have happened
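A minimal sketch of how these content rules might look as code. It is written in Python for brevity, and it is not the ScubaText implementation. The segment values are read from the Input Segments slide; the time unit (seconds), the tolerance for "(near) 0" and the ascent-rate threshold are all assumptions chosen only so that the example reproduces the corpus judgement, not diving guidance.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seg_no: int
    itime: float   # start time, assumed to be seconds
    ivalue: float  # start depth in metres
    ftime: float   # end time, assumed to be seconds
    fvalue: float  # end depth in metres


# Segments of dive 1460, as read from the Input Segments slide.
SEGMENTS = [
    Segment(1, 0, 1.3, 60, 6.3),
    Segment(2, 60, 6.3, 140, 32.2),
    Segment(3, 140, 32.2, 480, 0),
    Segment(4, 480, 0, 760, 0),
    Segment(5, 760, 0, 920, 38.9),
    Segment(6, 920, 38.9, 1300, 41.6),
    Segment(7, 1300, 41.6, 1500, 15.5),
    Segment(8, 1500, 15.5, 1860, 27.2),
    Segment(9, 1860, 27.2, 2160, 9.2),
    Segment(10, 2160, 9.2, 2600, 2.7),
]

SURFACE_TOLERANCE_M = 0.5        # hypothetical reading of "(near) 0"
MAX_ASCENT_RATE_M_PER_MIN = 5.0  # hypothetical threshold for "slope is too high"


def select_segments(segments):
    """Keep segments that end near the surface without starting there,
    plus the final segment of the dive."""
    chosen = []
    for index, seg in enumerate(segments):
        ends_near_surface = seg.fvalue <= SURFACE_TOLERANCE_M
        starts_at_surface = seg.ivalue <= SURFACE_TOLERANCE_M
        is_last = index == len(segments) - 1
        if (ends_near_surface and not starts_at_surface) or is_last:
            chosen.append(seg)
    return chosen


def ascent_rate_m_per_min(seg):
    """Average ascent rate over the segment (positive when ascending)."""
    minutes = (seg.ftime - seg.itime) / 60.0
    return (seg.ivalue - seg.fvalue) / minutes


def is_too_fast(seg):
    """True if the segment needs the extra 'risk' and 'what should have happened' content."""
    return ascent_rate_m_per_min(seg) > MAX_ASCENT_RATE_M_PER_MIN
```

With these assumed thresholds, the rules select segments 3 and 10, the same two segments that the corpus text was aligned with, and only segment 3 is flagged as too fast.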

11 Possible ordering rules
• Break up dive into sections, where each section starts and ends at surface
• For each section, start with most important safety issue (or say dive was fine if no safety issue)
• Then add less important safety issues
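The ordering rules could be sketched roughly as follows, again in Python and only as an illustration. The surface tolerance, the numeric severity scores and the wording of the fallback message are assumptions; the depth pairs are the start and end depths from the Input Segments slide.

```python
SURFACE_TOLERANCE_M = 0.5  # hypothetical: depths at or below this count as "at surface"

# (start depth, end depth) in metres for each segment of dive 1460.
DEPTHS = [
    (1.3, 6.3), (6.3, 32.2), (32.2, 0), (0, 0), (0, 38.9),
    (38.9, 41.6), (41.6, 15.5), (15.5, 27.2), (27.2, 9.2), (9.2, 2.7),
]


def split_into_sections(depths):
    """Group segments into sections that start and end at the surface.
    Segments spent entirely at the surface are skipped; a trailing
    section is kept even if the data stops before the surface."""
    sections, current = [], []
    for start, end in depths:
        if start <= SURFACE_TOLERANCE_M and end <= SURFACE_TOLERANCE_M:
            continue  # surface interval between two sections
        current.append((start, end))
        if end <= SURFACE_TOLERANCE_M:
            sections.append(current)
            current = []
    if current:
        sections.append(current)
    return sections


def order_messages(issues):
    """issues: list of (severity, message) pairs for one section.
    Most important issue first; a 'fine' message if there are none."""
    if not issues:
        return ["Your dive was fine."]
    ranked = sorted(issues, key=lambda issue: issue[0], reverse=True)
    return [message for _, message in ranked]
```

For the example dive this gives two sections (segments 1-3 and 5-10), matching the "first ascent" and "second ascent" of the corpus text.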

12 More examples
• We’ve just looked at one example here!
• Need to repeat process for at least 20-30 examples, which cover spread of possible cases (including special cases)
• Merge rules and deal with conflicts
  » Often caused by different corpus authors writing differently; may give priority to one particular author, and imitate his style

13 Content Determination
• The most important aspect of NLG!
  » If we get content right, users may not be too fussed if language isn’t perfect
  » If we get content wrong, users will be unhappy even if language is perfect
• Also the most domain-dependent aspect
  » Based on domain, user, tasks more than general knowledge about language

14 Text Structure
• Output of DP also shows text structure
  » Tree, leaves represent sentences, phrases
    – For now, assume leaves are strings
    – Look at other representations next week
  » Internal nodes group leaves, lower nodes
    – Can mark sentence, paragraph, etc breaks
    – Can express rhetorical relations between nodes

15 Example tree
• Your first ascent was too rapid, you ascended from 33m to the surface in 5 minutes. However, your second ascent was fine.

N1: contrast (paragraph)
  N2: elaboration
    your first ascent was too rapid
    you ascended from 33m to the surface in 5 minutes
  your second ascent was fine

16 Why tree?
• Tree shows where constituents can be merged (children of same parent)
  » Your first ascent was a bit rapid. You ascended from 33m to the surface in 5 minutes. However, your second ascent was fine. (OK)
  » Your first ascent was a bit rapid, you ascended from 33m to the surface in 5 minutes. However, your second ascent was fine. (OK)
  » Your first ascent was a bit rapid. You ascended from 33m to the surface in 5 minutes, however, your second ascent was fine. (NOT OK)
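A minimal sketch of the tree and of the "children of same parent" test, assuming (as the Text Structure slide does) that leaves are plain strings. The `Node` class and the helper names are hypothetical and written in Python for brevity; the check only handles merging two leaves, not whole subtrees.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Node:
    relation: str  # rhetorical relation holding between the children
    children: List[Union["Node", str]] = field(default_factory=list)


def parent_of(root: Node, leaf: str) -> Optional[Node]:
    """Return the node that has `leaf` as a direct child, or None."""
    for child in root.children:
        if isinstance(child, str):
            if child == leaf:
                return root
        else:
            found = parent_of(child, leaf)
            if found is not None:
                return found
    return None


def can_merge(root: Node, first: str, second: str) -> bool:
    """Two leaves may share a sentence only if they have the same parent."""
    parent = parent_of(root, first)
    return parent is not None and parent is parent_of(root, second)


# The example tree: N1 (contrast) groups N2 (elaboration) and a third leaf.
RAPID = "your first ascent was a bit rapid"
DETAIL = "you ascended from 33m to the surface in 5 minutes"
FINE = "your second ascent was fine"
N2 = Node("elaboration", [RAPID, DETAIL])
N1 = Node("contrast", [N2, FINE])
```

Here `can_merge(N1, RAPID, DETAIL)` holds, which corresponds to the second (OK) variant, while `can_merge(N1, DETAIL, FINE)` does not, which corresponds to the NOT OK variant.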

17 Rhetorical Rel
• Shows how messages relate to each other
• RR can be expressed via cue phrases.
  » Best cue phrase for RR depends on context
    – Your first ascent was a bit rapid. However, your second ascent was fine.
    – Your first ascent was a bit rapid, but your second ascent was fine.
  » Also readers like cue phrases to be varied, not same one used again and again
    – Eg, don’t overuse “for example”
  » Hence better to specify abstract RR in text spec

18 Common Rhetorical Rels
• CONCESSION (although, despite)
• CONTRAST (but, however)
• ELABORATION (usually no cue)
• EXAMPLE (for example, for instance)
• REASON (because, since)
• SEQUENCE (and, also)
• Research community does not agree
  » Many different sets of rhetorical rels proposed
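One way to keep the rhetorical relation abstract in the text spec and only pick a cue phrase late is sketched below in Python. The table copies the cue phrases listed above; the "least used so far" selection policy is an assumption standing in for the context-dependent choice described on the previous slide.

```python
# Cue phrases per rhetorical relation, copied from the list above.
CUE_PHRASES = {
    "CONCESSION": ["although", "despite"],
    "CONTRAST": ["but", "however"],
    "ELABORATION": [],  # usually no cue
    "EXAMPLE": ["for example", "for instance"],
    "REASON": ["because", "since"],
    "SEQUENCE": ["and", "also"],
}


def choose_cue(relation, history):
    """Pick the cue phrase for `relation` that has been used least often
    so far (ties go to the first listed), and record the choice.
    Returns None when the relation is normally expressed without a cue."""
    candidates = CUE_PHRASES.get(relation, [])
    if not candidates:
        return None
    cue = min(candidates, key=history.count)
    history.append(cue)
    return cue
```

Calling `choose_cue("EXAMPLE", history)` twice with the same `history` list alternates between the two phrases rather than repeating "for example".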

19 How Choose Content
• Theoretical approach: deep reasoning based on deep knowledge of user, task, context, etc
• Pragmatic approach: write schemas which try to imitate human-written texts in a corpus
• Statistical approach: Use learning tech to learn content rules from corpus

20 Theoretical Approach
• Deduce what the user needs to know, and communicate this
• Based on in-depth knowledge
  » User (knowledge, task, etc)
  » Context, domain, world
• Use AI reasoning engine
  » AI Planner, plan recognition system

21 Example
• User (on rig) asks for weather report
  » User’s task is unloading supply boats
  » Supply boat SeaHorse is approaching rig
• Inference: user wants to unload SeaHorse when it arrives
  » Tell him what affects choice of unloading procedure (eg, wind dir, wave height)
    – Use knowledge of rig, SeaHorse, cargo, …

22 Theoretical Approach
• Not feasible in practice (at least in my experience)
  » Lack knowledge about user
    – Maybe wants to land supply helicopter
  » Lack knowledge of context
    – Eg, which supply boats are approaching
  » Very hard to maintain knowledge base
    – New users, boats, tasks, regulations…

23 Pragmatic Approach: Schema
• Templates, recipes, programs for text content
• Typically based on imitating patterns seen in human-written texts (corpus)
  » Revised based on user feedback
• Specify structure as well as content

24 Schema Implementation
• Usually just written as code in Java or other standard programming languages
  » Some special languages, but publicly available ones not very useful

25 Pseudocode example
Schema ScubaSchema
    for each ascent A in data set
        if ascent is too fast
            add unsafeAscentSchema(A)
        else
            add safeAscentSchema(A)
    set rhetorical relation

26 Pseudocode example
Schema unsafeAscentSchema(Ascent A)
    create DocumentElement
    add sentence “Your ascent was too fast”
    add sentence “You ascended from [A.startValue()] to the surface in [A.duration()] minutes”
    return DocumentElement
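The two pseudocode schemas could be made runnable along the following lines. The slides say schemas are usually written in Java; this sketch uses Python only for brevity. The `Ascent` and `DocumentElement` classes, the rate threshold, the wording of the safe-ascent sentence and the rule for setting the rhetorical relation are all assumptions, since the pseudocode does not spell them out.

```python
from dataclasses import dataclass, field
from typing import List, Union

MAX_ASCENT_RATE_M_PER_MIN = 5.0  # hypothetical threshold for "too fast"


@dataclass
class Ascent:
    start_depth_m: float
    duration_min: float


@dataclass
class DocumentElement:
    relation: str
    children: List[Union["DocumentElement", str]] = field(default_factory=list)


def is_too_fast(ascent: Ascent) -> bool:
    return ascent.start_depth_m / ascent.duration_min > MAX_ASCENT_RATE_M_PER_MIN


def unsafe_ascent_schema(ascent: Ascent) -> DocumentElement:
    """Two sentences: the problem, then the supporting data."""
    element = DocumentElement("elaboration")
    element.children.append("Your ascent was too fast")
    element.children.append(
        f"You ascended from {ascent.start_depth_m:g}m to the surface "
        f"in {ascent.duration_min:g} minutes"
    )
    return element


def safe_ascent_schema(ascent: Ascent) -> DocumentElement:
    # Assumed wording; the slides do not show this schema.
    return DocumentElement("elaboration", ["Your ascent was fine"])


def scuba_schema(ascents: List[Ascent]) -> DocumentElement:
    """Top-level schema: one child element per ascent."""
    flags = [is_too_fast(a) for a in ascents]
    # Assumed rule: contrast if safe and unsafe ascents are mixed.
    relation = "contrast" if len(set(flags)) > 1 else "sequence"
    document = DocumentElement(relation)
    for ascent, too_fast in zip(ascents, flags):
        schema = unsafe_ascent_schema if too_fast else safe_ascent_schema
        document.children.append(schema(ascent))
    return document
```

For example, `scuba_schema([Ascent(33, 5), Ascent(9, 7)])` yields a contrast node whose first child describes the rapid ascent and whose second child says the ascent was fine.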

27 Creating Schemas
• Creating schemas is an art, no solid methodology (yet)
• Williams and Reiter (2005)
  » Create top-down, based on corpus texts
  » First identify high-level structure of corpus doc
    – Eg, each ascent described in temporal sequence
    – Build schemas based on this
  » Then create low-level schemas (rules)
    – Eg, for describing a single ascent

28 Williams and Reiter
• Problems
  » Corpus texts likely to be inconsistent
    – Especially if several authors wrote texts
  » Some cases not covered in the corpus
    – Unusual cases, boundary cases
• Developer needs to use intuition for such cases
  » Check with experts, users!

29 Williams and Reiter
• Evaluation/Testing
  » Essential!
  » Developer-based:
    – Compare to corpora
    – Check boundary cases for reasonableness
  » Expert-based: Ask experts to revise (post-edit) texts
  » User-based: Ask users for comments

30 Statistical content det
• Statistical/learning techniques
  » Parse corpus, align with source data, use machine learning algorithms to learn content selection rules/schemas/cases
    – Barzilay and Lapata, 2005
    – Kondadadi et al, 2013
• Worth considering if large corpora available

31 Reuters approach
• http://www.aclweb.org/anthology/P/P13/P13-1138.pdf
• Training: Parse corpus of old newswire articles
  » Turn sentences, phrases into templates
  » Cluster templates into “conceptual unit”
  » Train model which chooses template for 1st, 2nd, 3rd sentence
    – Features such as “prior conceptual unit”
• Generation
  » Eliminate templates/conceptual units which don’t fit the input data
  » Use model to choose templates for 1st, 2nd, etc sentences
  » Fill in holes in template from data

32 Example
• Corpus: Mr Jones has a Bachelors in Finance from Harvard
• Template: [person] has [degree] in [subject] from [university]
• Conceptual unit: Person/degree/subject/university
  » Also covers “[person] was awarded a [degree] in [subject] from [university]”

33 Example
• Data includes tuple
• Ranking module chooses previous template as top candidate for sentence 2
• Include sentence “Miss Smith has an MSc in Computing from Aberdeen”
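The generation-time steps (eliminate templates that do not fit the data, choose one, fill in the holes) can be sketched as below. This is a Python illustration, not the system described in the paper: the trained ranking model is replaced by a fixed preference order, and the record is a hypothetical one reconstructed from the output sentence above, with the article kept inside the degree value.

```python
import re

SLOT = re.compile(r"\[(\w+)\]")

# Templates of one conceptual unit, in an assumed preference order
# (a stand-in for the trained ranking model).
TEMPLATES = [
    "[person] has [degree] in [subject] from [university]",
    "[person] was awarded a [degree] in [subject] from [university]",
]


def slots(template):
    """Names of the holes in a template, e.g. ['person', 'degree', ...]."""
    return SLOT.findall(template)


def fits(template, record):
    """A template fits if the data can fill every one of its holes."""
    return all(name in record for name in slots(template))


def fill(template, record):
    """Replace each [slot] with the corresponding value from the data."""
    return SLOT.sub(lambda match: record[match.group(1)], template)


def generate(record, templates=TEMPLATES):
    """Drop templates that do not fit, then fill the top-ranked survivor."""
    candidates = [t for t in templates if fits(t, record)]
    if not candidates:
        return None
    return fill(candidates[0], record)


# Hypothetical record; the actual tuple is not shown on the slide.
RECORD = {
    "person": "Miss Smith",
    "degree": "an MSc",
    "subject": "Computing",
    "university": "Aberdeen",
}
```

With this record, `generate(RECORD)` produces the sentence from the slide; a record without a `university` field fits neither template and yields `None`.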

34 Advanced Topic: Computing Rhetorical Relation
• In theory, can compute the rhetorical relation between nodes
  » There are formal definitions of rhet rel
    – See p103 of Reiter and Dale
  » Reason about which definition best fits the current context
• Not easy to do in practice
  » Alternative is to imitate corpus

35 Advanced Topic: Perspectives
• Text can communicate perspectives, “spin”, as well as raw data. Eg,
  » Smoking is killing you
  » If you keep on smoking, your health may keep on getting worse
  » If you stop smoking, your health is likely to improve
  » If you stop smoking, you’ll feel better

36 Perspectives
• How to choose between these?
• Depends on personality of reader
  » Some people react better to positive messages, others to negative messages
  » Some react better to short direct messages, others want these weakened (“may”, “is likely to”)
  » Hard to predict…

37 Advanced: User-Adaptation
• Texts should depend on
  » User’s personality (previous)
  » User’s domain knowledge (how much do we need to explain)
  » User’s vocabulary (can we use technical terms in the text)
  » User’s task (what does he need to know)
• Hard to get this information…

38 Advanced: Continuity
• Babytalk: users complained about
  » TcPO2 suddenly decreased to 8.1. TcPO2 suddenly decreased to 9.3.
• Intermediate rise of TcPO2 not mentioned because slow and less medically important
• But users have “continuity” assumption that behaviour remains as last mentioned
  » Need to add material if continuity violated
  » TcPO2 suddenly decreased to 8.1. After rising back to 11.2, TcPO2 suddenly decreased to 9.3.
• Narrative issue (Reiter et al, 2008)?
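A rough Python sketch of a continuity check of this kind. It is not the Babytalk implementation: the event representation and the repair rule (re-insert the most extreme omitted event between two mentions that contradict each other) are assumptions made for illustration.

```python
def add_continuity_events(events, selected):
    """events: list of (kind, value) pairs in time order, where kind is
    'decrease' or 'increase' and value is the level reached.
    selected: indices chosen by the content rules.
    Returns the indices to mention, with an omitted event re-inserted
    wherever a mention would contradict the last mentioned value."""
    result = []
    prev_index, last_value = None, None
    for index in sorted(selected):
        kind, value = events[index]
        if last_value is not None:
            violates = (
                (kind == "decrease" and value >= last_value)
                or (kind == "increase" and value <= last_value)
            )
            omitted = range(prev_index + 1, index)
            if violates and omitted:
                # Mention the omitted peak (or trough) that explains the change.
                pick = max if kind == "decrease" else min
                result.append(pick(omitted, key=lambda i: events[i][1]))
        result.append(index)
        prev_index, last_value = index, value
    return result


# The TcPO2 example: the slow rise to 11.2 was not selected originally.
TCPO2_EVENTS = [("decrease", 8.1), ("increase", 11.2), ("decrease", 9.3)]
```

With only the two decreases selected, `add_continuity_events(TCPO2_EVENTS, {0, 2})` also returns the rise to 11.2, because "decreased to 9.3" contradicts a last mentioned value of 8.1.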

39 Conclusion
• Content determination is the first and most important aspect of NLG
  » What information should we communicate?
• Mostly based on imitating what is observed in human-written texts
  » Using schemas, written in Java
• Also decide on structure
  » Tree structure, rhetorical relations

