Share and Share Alike: Resources for Language Generation Prof. Marilyn Walker University of Sheffield NSF- 20 April 2007.


1 Share and Share Alike: Resources for Language Generation Prof. Marilyn Walker University of Sheffield NSF- 20 April 2007

2 Cognitive Systems University of Sheffield 2 What type of resource is needed for generation? What type of scientific problem is generation? An essential difference between language generation and language interpretation problems (parsing, WSD, relation extraction, coreference) is that there is no single right answer for language generation. Language Productivity Assumption: an optimal generation resource will represent multiple outputs for each input, with a human-generated quality metric associated with each output.
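The contrast between the two kinds of problem can be made concrete with a small sketch. Assuming a hypothetical resource keyed by input (the key `"RECOMMEND(Babbo)"`, the helper names and the exact-match scorer are all illustrative, not part of the proposal), interpretation-style scoring against a single gold answer gives zero credit to a valid alternative, whereas a resource that stores several rated outputs per input can return that alternative's human score. The two sentences and their scores are taken from the Babbo example on slide 15.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical resource entry: one input mapped to several rated realizations.
# Sentences and human scores are two rows of the Babbo example (slide 15).
PLAIN = ("Babbo has excellent service. It has superb food quality. "
         "It has excellent decor. It has the best overall quality "
         "among the selected restaurants.")
FRONTED = ("With excellent decor, excellent service and superb food quality, "
           "Babbo has the best overall quality among the selected restaurants.")

resource: Dict[str, List[Tuple[str, float]]] = {
    "RECOMMEND(Babbo)": [(PLAIN, 2.0), (FRONTED, 5.0)],
}

def single_reference_score(candidate: str, reference: str) -> float:
    """Interpretation-style scoring: exact match against one gold answer."""
    return 1.0 if candidate == reference else 0.0

def multi_output_score(key: str, candidate: str) -> Optional[float]:
    """Generation-style scoring: look up the human rating for this alternative."""
    for text, rating in resource.get(key, []):
        if text == candidate:
            return rating
    return None  # alternative not covered by the resource

print(single_reference_score(FRONTED, PLAIN))           # 0.0
print(multi_output_score("RECOMMEND(Babbo)", FRONTED))  # 5.0
```

Exact match is deliberately the crudest possible scorer here; the point is only that a single reference cannot represent the productivity of generation.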

3 Dialogue vs. generation? Dialogue is like generation in that there is no single right answer for how to do a task in dialogue; information gathering and information presentation in dialogue systems are generation problems. DARPA evaluations of dialogue systems used a fixed domain, “TRAVEL PLANNING”. First: the ATIS evaluations compared dialogue system behaviour against human behaviour in a corpus of human-wizard dialogues (Hirschman 2000); this left no room for “mixed initiative”, different dialogue strategies, divergence of context, or user modeling.

4 Dialogue vs. generation? Second: define the context, then evaluate the system's response to a user utterance in that particular context. This is much more like generation: the context is defined, and the system's ‘communicative goal’ is defined. Form: how is ‘the same response’ defined? Some forms for identical content may be better than others. Content: user models, definitions of context. Also, a dialogue system should be able to decide on its communicative goal itself.

5 Dialogue vs. generation? Third: the Communicator evaluation. Given a user task (NYC to LHR, Continental, April 22nd, 2007), collect metrics (time to completion, ASR error, utterance output quality, concept understanding, user satisfaction). The corpus was semi-automatically labelled with dialogue acts (quality/strategy metrics) for system utterances (8 or more different instantiations from different systems for particular communicative goals). Try to understand which metrics are contributors to user satisfaction (PARADISE). User utterances were labelled subsequently and used in RL experiments comparing dialogue strategies. It remained hard to compare particular scientific techniques for particular modules in systems; plug and play never worked.
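PARADISE treats user satisfaction as a weighted linear combination of normalized task-success and dialogue-cost measures, with the weights estimated by multiple linear regression; the metrics with large, reliable weights are the contributors to user satisfaction. The sketch below shows only that regression step under stated assumptions: the metric names and per-dialogue values are invented, the satisfaction scores are generated from known weights so the fit can be checked, and the helpers (`zscore`, `solve`, `fit_paradise`) are hypothetical names using ordinary least squares in the standard library.

```python
from statistics import mean, pstdev
from typing import List

def zscore(column: List[float]) -> List[float]:
    """Normalize one metric to mean 0 and standard deviation 1."""
    mu, sd = mean(column), pstdev(column)
    return [(v - mu) / sd for v in column]

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_paradise(metrics: List[List[float]], user_sat: List[float]) -> List[float]:
    """Least-squares fit of user satisfaction on z-scored metrics.
    metrics[j] holds metric j across all dialogues.
    Returns [intercept, w_1, ..., w_k]."""
    cols = [[1.0] * len(user_sat)] + [zscore(c) for c in metrics]
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * y for p, y in zip(ci, user_sat)) for ci in cols]
    return solve(xtx, xty)

# Hypothetical per-dialogue measurements (invented, not Communicator data).
task_success = [1, 0, 1, 1, 0, 1, 1, 0]
elapsed_time = [120, 300, 150, 90, 400, 200, 110, 350]
asr_errors = [2, 8, 3, 5, 6, 1, 4, 7]
metrics = [task_success, elapsed_time, asr_errors]

# Synthetic satisfaction built from known weights so the fit can be checked.
true_w = [20.0, 3.0, -1.5, -0.5]
z = [zscore(c) for c in metrics]
user_sat = [true_w[0] + sum(w * zc[i] for w, zc in zip(true_w[1:], z))
            for i in range(len(task_success))]

weights = fit_paradise(metrics, user_sat)
print([round(w, 3) for w in weights])  # should be close to true_w
```

Z-scoring each metric first is what makes the fitted weights comparable across metrics measured in different units (seconds, error counts, binary success).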

6 Dialogue vs. generation: Conclusions? Just having a fixed task (TRAVEL) does not by itself necessarily lead to scientific progress. We want to compare particular scientific techniques for particular modules in systems, and plug and play is the only way to do this; BUT it is very hard to define, for a whole community, what the interfaces between modules should be.
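What "plug and play" requires can be illustrated with a toy pipeline. The sketch assumes a hypothetical agreed interface in which a sentence planner maps content propositions to sentence groupings and a realizer maps groupings to text; the class names and the interface itself are illustrative only, not a proposal from the talk. Once the interface is fixed, two different planners can be compared while everything else stays constant.

```python
from typing import List, Protocol

# Hypothetical module boundaries agreed by a community.
class SentencePlanner(Protocol):
    def plan(self, propositions: List[str]) -> List[List[str]]: ...

class Realizer(Protocol):
    def realize(self, sentence_plan: List[List[str]]) -> str: ...

class OnePerSentence:
    """Toy planner: every proposition becomes its own sentence."""
    def plan(self, propositions: List[str]) -> List[List[str]]:
        return [[p] for p in propositions]

class AllInOne:
    """Toy planner: aggregate all propositions into a single sentence."""
    def plan(self, propositions: List[str]) -> List[List[str]]:
        return [list(propositions)] if propositions else []

class ConjunctionRealizer:
    """Toy realizer: join each group with 'and', one capitalized sentence per group."""
    def realize(self, sentence_plan: List[List[str]]) -> str:
        sentences = [" and ".join(group) + "." for group in sentence_plan]
        return " ".join(s[0].upper() + s[1:] for s in sentences)

def generate(propositions: List[str], planner: SentencePlanner,
             realizer: Realizer) -> str:
    # Any planner/realizer pair honouring the interface can be swapped in.
    return realizer.realize(planner.plan(propositions))

props = ["Babbo has superb food quality", "it has excellent decor"]
print(generate(props, OnePerSentence(), ConjunctionRealizer()))
print(generate(props, AllInOne(), ConjunctionRealizer()))
```

The hard part the slide points to is not the code but agreeing on what a "proposition" and a "sentence plan" are; here they are just strings and lists of strings.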

7 Position: What type of resources would be useful for scientific advancement in language generation? Almost anything! “If you build it, they will come”; “If it's useful, people will use it”. Can we leverage what we already have in our own research groups, share it, and make it better?

8 What is needed to incentivize data sharing? Many different domains/problems/modules => NEED LOTS OF DIFFERENT RESOURCES; resources are costly (developing group not ‘finished’ yet) => FINANCIAL INCENTIVE, SCIENTIFIC INCENTIVE, CITATION INCENTIVE; it costs too much to support resource preparation, maintenance, distribution and re-use => NSF/LDC FINANCIAL/SUPPORT. NOTE: MANY LDC RESOURCES ARE “FOUND DATA” (not explicitly commissioned).

9 A proposal for one shared resource

10 Information presentation of one or more database entities. Natural Language Interfaces/SDS (McKeown 85, McCoy 89, Cooperative Response literature, Carenini & Moore 01, Polifroni et al. 03, COGENTEX with active buyers website, Walker et al. 04, Demberg & Moore 06, etc.). Different communicative goals: Summarize, Recommend, Compare, Describe (DB entities). Representation not controversial (attributes and values for DB entities, relations between entity and attribute). Application not dependent on NLU.

12 We could make available a resource of:
INPUT-1: speech act and set of DB entities: SUMMARIZE(SET), DESCRIBE(ENTITY), RECOMMEND(ENTITY, SET), COMPARE(SET)
INPUT-2: user model, discourse/dialogue context, style parameters, etc.
OUTPUT-1: a set of alternative outputs, possibly with TTS markup
OUTPUT-2: human-generated ratings or rankings for the outputs, oriented to the criteria specified by INPUT-2
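One way to pin down a single item of such a resource is as a typed record. The sketch below is an assumption about layout, not a published format: the class and field names are hypothetical, an entity is modelled as attribute-value pairs (as slide 10 suggests), and the arity check simply encodes that DESCRIBE and RECOMMEND take a distinguished ENTITY argument drawn from the set.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional

class SpeechAct(Enum):
    SUMMARIZE = "SUMMARIZE"   # SUMMARIZE(SET)
    DESCRIBE = "DESCRIBE"     # DESCRIBE(ENTITY)
    RECOMMEND = "RECOMMEND"   # RECOMMEND(ENTITY, SET)
    COMPARE = "COMPARE"       # COMPARE(SET)

# A DB entity as attribute -> value pairs (hypothetical encoding).
Entity = Dict[str, str]

@dataclass
class Input1:
    act: SpeechAct
    entities: List[Entity]          # the SET argument
    focus: Optional[str] = None     # name of the ENTITY argument, if any

    def __post_init__(self) -> None:
        needs_focus = self.act in (SpeechAct.DESCRIBE, SpeechAct.RECOMMEND)
        names = {e.get("name") for e in self.entities}
        if needs_focus and (self.focus is None or self.focus not in names):
            raise ValueError(f"{self.act.value} needs a focus entity from the set")

@dataclass
class Input2:
    # User model, discourse/dialogue context, style parameters, etc.
    params: Dict[str, str] = field(default_factory=dict)

@dataclass
class RatedOutput:
    text: str                       # OUTPUT-1: one alternative (may carry TTS markup)
    ratings: Dict[str, float]       # OUTPUT-2: criterion -> human rating

@dataclass
class ResourceItem:
    input1: Input1
    input2: Input2
    outputs: List[RatedOutput]

    def ranked(self, criterion: str) -> List[RatedOutput]:
        """Alternatives sorted best-first on one human rating criterion."""
        return sorted(self.outputs,
                      key=lambda o: o.ratings.get(criterion, float("-inf")),
                      reverse=True)

# Usage with invented ratings.
babbo = {"name": "Babbo", "foodquality": "superb", "decor": "excellent"}
item = ResourceItem(
    Input1(SpeechAct.RECOMMEND, [babbo], focus="Babbo"),
    Input2({"user_model": "values food quality"}),
    [RatedOutput("Babbo has excellent decor.", {"coherence": 2.0}),
     RatedOutput("Babbo has superb food quality.", {"coherence": 4.5})],
)
print(item.ranked("coherence")[0].text)
```

Keeping ratings as a criterion-to-score map lets one item carry several of the judgments discussed later (coherence, personality) without changing the schema.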

13 A Content Plan for a Recommend strategy:
recommend relations: justify(nuc:1, sat:2); justify(nuc:1, sat:3); justify(nuc:1, sat:4)
content:
1. assert(best(Babbo))
2. assert(has-att(Babbo, foodquality(superb)))
3. assert(has-att(Babbo, decor(excellent)))
4. assert(has-att(Babbo, service(excellent)))
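To show how such a plan maps to text, here is a deliberately naive template realizer. It is a sketch under stated assumptions: the tuple encoding of assertions, the `ATTR_WORDS` lexicon and the choice to realize every justify relation as a single "because" clause are all hypothetical, and this is not how SPARKY or any other generator mentioned in the talk works. A real sentence planner would choose among many orderings and aggregations, which is exactly what the variants on the next slides show.

```python
from typing import Dict, List, Tuple

# Content plan from the slide: assertions 1..4 and justify(nucleus, satellite).
assertions: Dict[int, Tuple[str, ...]] = {
    1: ("best", "Babbo"),
    2: ("has-att", "Babbo", "foodquality", "superb"),
    3: ("has-att", "Babbo", "decor", "excellent"),
    4: ("has-att", "Babbo", "service", "excellent"),
}
relations: List[Tuple[str, int, int]] = [
    ("justify", 1, 2), ("justify", 1, 3), ("justify", 1, 4),
]

# Hypothetical lexicalization of attribute names.
ATTR_WORDS = {"foodquality": "food quality", "decor": "decor", "service": "service"}

def realize_assertion(a: Tuple[str, ...], pronoun: bool = False) -> str:
    subj = "it" if pronoun else a[1]
    if a[0] == "best":
        return f"{subj} has the best overall quality among the selected restaurants"
    if a[0] == "has-att":
        _, _, attr, value = a
        return f"{subj} has {value} {ATTR_WORDS[attr]}"
    raise ValueError(f"unknown predicate {a[0]}")

def join_clauses(clauses: List[str]) -> str:
    if len(clauses) == 1:
        return clauses[0]
    return ", ".join(clauses[:-1]) + ", and " + clauses[-1]

def realize_justify(nucleus: int) -> str:
    """Nucleus as main clause; all its justify satellites in one 'because' clause."""
    sats = [s for rel, n, s in relations if rel == "justify" and n == nucleus]
    main = realize_assertion(assertions[nucleus])
    reasons = [realize_assertion(assertions[s], pronoun=True) for s in sats]
    return f"{main} because {join_clauses(reasons)}."

print(realize_justify(1))
```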

14 Human Feedback for Ranking. The ratings can represent any metric associated with the possible response, e.g. coherence, information quality, social appropriateness, personality. Informational coherence: SPARKY, a generator for MATCH; SPOT, a generator for AT&T COMMUNICATOR. Users are shown response variants and then told: “For each variant, please rate to what extent you agree with this statement: The utterance is easy to understand, well-formed and appropriate to the dialogue context.”
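Ranking learners such as RankBoost are usually trained on pairwise preferences rather than on raw scores. A minimal sketch of turning per-variant human ratings into (better, worse) training pairs follows; the function name and the `min_gap` threshold for discarding near-ties are assumptions, and this is not necessarily the procedure used for SPARKY or SPOT.

```python
from itertools import combinations
from typing import List, Tuple

def preference_pairs(ratings: List[float], min_gap: float = 0.0) -> List[Tuple[int, int]]:
    """Return (better, worse) index pairs for every pair of variants whose
    ratings differ by more than min_gap; tied pairs produce nothing."""
    pairs = []
    for i, j in combinations(range(len(ratings)), 2):
        diff = ratings[i] - ratings[j]
        if diff > min_gap:
            pairs.append((i, j))
        elif -diff > min_gap:
            pairs.append((j, i))
    return pairs

print(preference_pairs([3, 3, 5]))  # [(2, 0), (2, 1)]
```

Raising `min_gap` trades training data for reliability: pairs whose ratings barely differ are the ones judges are least likely to agree on.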

15 Examples: Learned Rules applied to test fold
Realization | Human | RankBoost
Babbo has the best overall quality among the selected restaurants because it has superb food quality, with excellent service, and it has excellent decor. | 1.5 | 0.45
Babbo has excellent service. It has superb food quality. It has excellent decor. It has the best overall quality among the selected restaurants. | 2.0 | 0.21
Since Babbo has excellent service and superb food quality, with excellent decor, it has the best overall quality among the selected restaurants. | 3.5 | 0.77
Babbo has excellent service and superb food quality, with excellent decor. It has the best overall quality among the selected restaurants. | 4 | 0.88
With excellent decor, excellent service and superb food quality, Babbo has the best overall quality among the selected restaurants. | 5 | 0.91
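One quick way to read this table is to count how many pairs of realizations the learned ranker orders the same way as the human scores. The sketch computes Kendall's tau-a over the five rows, assuming both columns are oriented so that higher means better; the choice of tau as the agreement measure is ours, not the talk's. Only the first two rows are swapped, so 9 of 10 pairs agree.

```python
from itertools import combinations
from typing import Sequence

def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / all pairs.
    Pairs tied in either list count as neither."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

human = [1.5, 2.0, 3.5, 4.0, 5.0]          # Human column above
rankboost = [0.45, 0.21, 0.77, 0.88, 0.91]  # RankBoost column above
print(kendall_tau(human, rankboost))  # 0.8
```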

16 Individual Differences (Sentence Planning Preferences) (Mairesse & Walker 05, Stent et al. 04)
Realization | Judge A | Judge B
Chanpen Thai has the best overall quality among the selected restaurants since it is a Thai restaurant, with good service, its price is 24 dollars, and it has good food quality. | 1 | 4
Chanpen Thai has the best overall quality among the selected restaurants because it has good service, it has good food quality, it is a Thai restaurant, and its price is 24 dollars. | 2 | 5
Chanpen Thai has the best overall quality among the selected restaurants. Its price is 24 dollars. It is a Thai restaurant, with good service. It has good food quality. | 3 | 3
Chanpen Thai has the best overall quality among the selected restaurants. This Thai restaurant has good food quality. Its price is 24 dollars, and it has good service. | 4 | 3
Chanpen Thai is a Thai restaurant, with good food quality. It has good service. Its price is 24 dollars. It has the best overall quality among the selected restaurants. | 4 | 2
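The size of the disagreement can be quantified with a Pearson correlation between the two judges' scores for these five realizations. This is an illustrative calculation on the table above, not an analysis reported in the talk; with so few items it says nothing about significance, only that the two judges' preferences run largely in opposite directions here (r is roughly -0.77).

```python
from math import sqrt
from typing import Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length samples."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

judge_a = [1, 2, 3, 4, 4]  # Judge A column above
judge_b = [4, 5, 3, 3, 2]  # Judge B column above
print(round(pearson(judge_a, judge_b), 2))
```

A negative correlation like this is one argument for storing per-judge ratings in the resource rather than a single averaged score.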

17 Human Feedback for Ranking (2). Ten Item Personality Inventory questionnaire (Gosling 2003); PERSONAGE. Users are shown response variants and then told: “For each variant, rate on a scale of 1 to 7 whether: The speaker is quiet, reserved; The speaker is enthusiastic.”
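Under standard TIPI scoring, the extraversion score on the 1-7 scale is the mean of the "extraverted, enthusiastic" item and the reverse-scored "reserved, quiet" item, where reversing maps a rating x to 8 - x. The sketch assumes that scoring applies to the two statements above, and that an utterance's score is the average over its judges; both are assumptions about the study design, and the function names and example judgments are hypothetical.

```python
from statistics import mean
from typing import List, Tuple

def tipi_extraversion(enthusiastic: int, quiet_reserved: int) -> float:
    """Extraversion for one judge: mean of the 'enthusiastic' rating and the
    reverse-scored 'quiet, reserved' rating (8 - x on a 1-7 scale)."""
    for v in (enthusiastic, quiet_reserved):
        if not 1 <= v <= 7:
            raise ValueError("TIPI items are rated from 1 to 7")
    return (enthusiastic + (8 - quiet_reserved)) / 2

def utterance_extraversion(judgments: List[Tuple[int, int]]) -> float:
    """Average the per-judge extraversion scores for one utterance.
    Each judgment is (enthusiastic rating, quiet/reserved rating)."""
    return mean(tipi_extraversion(e, q) for e, q in judgments)

# Hypothetical ratings from three judges for one utterance.
print(utterance_extraversion([(6, 2), (7, 1), (5, 3)]))  # 6.0
```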

18 Personality judgments: ‘Recommend Le Marais’
Realization | Extraversion
Err... it seems to me that Le Marais isn’t as bad as the others. | 1.83
Right, I mean, Le Marais is the only restaurant that is any good. | 2.83
Ok, I mean, Le Marais is a quite french, kosher and steak house place, you know and the atmosphere isn’t nasty, it has nice atmosphere. It has friendly service. It seems to me that the service is nice. It isn’t as bad as the others, is it? | 5.17
Le Marais has the best overall quality among the selected restaurants. It has decent decor, it has decent service, and its price is 44 dollars. This French, Kosher, Steak House restaurant has very good food quality. | 5.67
Well, it seems to me that I am sure you would like Le Marais. It has good food, the food is sort of rather tasty, the ambience is nice, the atmosphere isn’t sort of nasty, it features rather friendly servers and its price is around 44 dollars. | 5.83
I am sure you would like Le Marais, you know. The atmosphere is acceptable, the servers are nice and it’s a french, kosher and steak house place. Actually, the food is good, even if its price is 44 dollars. | 6.00
Basically, actually, I am sure you would like Le Marais. It features friendly service and acceptable atmosphere and it’s a french, kosher and steak house place. Even if its price is 44 dollars, it just has really good food, nice food. | 6.17

19 What else is out there? Coconut corpus: referring expression generation, but could we add alternatives and ratings? Boston directions corpus (NSF funded, early 1990s). Communicator corpus (8 different system outputs for dialogue contexts that can be characterized). Tools: Halogen, Penman, FUF-SURGE, RealPro. A library of text plans, content plans, sentence planners?
