Share and Share Alike: Resources for Language Generation
Prof. Marilyn Walker, University of Sheffield
NSF, 20 April 2007

Cognitive Systems, University of Sheffield

What type of resource is needed for generation? What type of scientific problem is generation?
An essential difference between language generation and language interpretation problems (parsing, WSD, relation extraction, coreference) is that there is no single right answer for language generation.
Language Productivity Assumption: an optimal generation resource will represent multiple outputs for each input, with a human-generated quality metric associated with each output.
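A minimal sketch of the Language Productivity Assumption as a data structure. The class names `RatedOutput` and `GenerationItem`, the string encoding of the input and the numeric rating scale are all hypothetical; the point is only that one input maps to many outputs, each carrying a human score, rather than to a single gold answer.

```python
from dataclasses import dataclass, field


@dataclass
class RatedOutput:
    """One candidate realization plus a human-assigned quality score."""
    text: str
    rating: float  # e.g. a mean judge score; the scale is an assumption


@dataclass
class GenerationItem:
    """One generation input paired with many acceptable outputs.

    There is no single gold answer: every output is kept, and human
    ratings say how good each one is.
    """
    input_repr: str  # hypothetical: the input encoded as a plain string
    outputs: list[RatedOutput] = field(default_factory=list)

    def ranked(self) -> list[RatedOutput]:
        """Outputs from best to worst according to the human ratings."""
        return sorted(self.outputs, key=lambda o: o.rating, reverse=True)

    def acceptable(self, threshold: float) -> list[str]:
        """Every output rated at or above `threshold`; several may qualify."""
        return [o.text for o in self.outputs if o.rating >= threshold]
```

Keeping all rated outputs, instead of one reference string, is what lets an evaluation reward any of several good realizations.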

Dialogue vs. generation?
Dialogue is like generation in that there is no single right answer for how to do a task in dialogue. Information gathering and information presentation in dialogue systems are generation problems.
DARPA evaluations for dialogue systems used a fixed domain: "TRAVEL PLANNING".
First: the ATIS evaluations compared dialogue system behaviour against human behaviour in a corpus of human-wizard dialogues (Hirschman 2000). This allowed no mixed initiative, different dialogue strategies, divergence of context or user modeling.

Dialogue vs. generation?
Second: define the context, then evaluate the system's response to a user utterance in that particular context. This is much more like generation: the context is defined, and the system's communicative goal is defined.
Form: how is "the same response" defined? Some forms for identical content may be better than others.
Content: user models, definitions of context. A dialogue system should also be able to decide on its own communicative goal.

Dialogue vs. generation?
Third: the Communicator evaluation. Given a user task (NYC to LHR, Continental, April 22nd, 2007), collect metrics: time to completion, ASR error, utterance output quality, concept understanding, user satisfaction.
The corpus was semi-automatically labelled with dialogue acts (quality/strategy metrics) for system utterances, giving 8 or more different instantiations from different systems for particular communicative goals.
Goal: understand which metrics contribute to user satisfaction (PARADISE).
User utterances were labelled subsequently and used in RL experiments comparing dialogue strategies.
It was hard to compare particular scientific techniques for particular modules in systems; plug and play never worked.
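A rough sketch of the PARADISE idea in plain Python: normalize each per-dialogue metric to z-scores, then fit a multiple linear regression with user satisfaction as the dependent variable. The fitted coefficients indicate how strongly each metric contributes. The function names, the metric names in any usage and the tiny normal-equation solver are illustrative assumptions, not the Communicator evaluation's actual code.

```python
import math


def zscore(column: list[float]) -> list[float]:
    """Normalize a metric to mean 0, standard deviation 1 (population SD)."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    return [(x - mean) / sd for x in column]


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def paradise_weights(metrics: dict[str, list[float]],
                     satisfaction: list[float]) -> dict[str, float]:
    """Regress user satisfaction on z-normalized metrics (one value per dialogue).

    Returns one coefficient per metric; a larger magnitude means a
    stronger contributor to satisfaction in this data.
    """
    names = list(metrics)
    cols = [zscore(metrics[k]) for k in names]
    rows = [[1.0] + [col[i] for col in cols] for i in range(len(satisfaction))]
    k = len(names) + 1  # intercept + one weight per metric
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, satisfaction)) for i in range(k)]
    beta = solve(xtx, xty)
    return dict(zip(names, beta[1:]))
```

With real data a statistics package would normally do the regression; the hand-rolled solver only keeps the sketch dependency-free.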

Dialogue vs. generation: Conclusions?
Just having a fixed task (TRAVEL) does not by itself necessarily lead to scientific progress.
We want to compare particular scientific techniques for particular modules in systems, and plug and play is the only way to do this.
BUT: it is very hard to define, for a whole community, what the interfaces between modules should be.
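One way to read "plug and play" is as a set of agreed module interfaces, so that one module can be swapped while the rest of the pipeline is held fixed. The sketch below assumes a two-stage pipeline (content planner, surface realizer) written as Python `typing.Protocol` classes; the stage names, signatures and toy implementations are hypothetical and only illustrate why the interface definition is the hard, community-wide decision.

```python
from typing import Protocol

Plan = list[tuple[str, str]]  # hypothetical plan format: (attribute, value) pairs


class ContentPlanner(Protocol):
    def plan(self, goal: str, entity: dict[str, str]) -> Plan: ...


class SurfaceRealizer(Protocol):
    def realize(self, name: str, plan: Plan) -> str: ...


class AllAttributes:
    """Toy planner: selects every attribute of the entity, in order."""
    def plan(self, goal: str, entity: dict[str, str]) -> Plan:
        return [(a, v) for a, v in entity.items() if a != "name"]


class OneSentencePerFact:
    """Toy realizer: one simple sentence per selected attribute."""
    def realize(self, name: str, plan: Plan) -> str:
        return " ".join(f"{name} has {v} {a}." for a, v in plan)


class Conjoined:
    """Toy realizer: a single sentence with a conjoined object list."""
    def realize(self, name: str, plan: Plan) -> str:
        parts = [f"{v} {a}" for a, v in plan]
        if len(parts) > 1:
            parts = [", ".join(parts[:-1]) + " and " + parts[-1]]
        return f"{name} has {parts[0]}."


def generate(planner: ContentPlanner, realizer: SurfaceRealizer,
             goal: str, entity: dict[str, str]) -> str:
    """Run the pipeline; either module can be swapped without touching the other."""
    return realizer.realize(entity["name"], planner.plan(goal, entity))
```

Running `OneSentencePerFact` and `Conjoined` on the same plan isolates the realizer. That controlled comparison is exactly what breaks down when groups cannot agree on the `Plan` format.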

Position
What type of resources would be useful for scientific advancement in language generation? Almost anything!
"If you build it, they will come": if it's useful, people will use it.
Can we leverage what we already have in our own research groups, share it, and make it better?

What is needed to incentivize data sharing?
Many different domains/problems/modules => NEED LOTS OF DIFFERENT RESOURCES.
Resources are costly, and the developing group is not "finished" with them yet => FINANCIAL INCENTIVE; SCIENTIFIC INCENTIVE; CITATION INCENTIVE.
It costs too much to support resource preparation, maintenance, distribution and re-use => NSF/LDC FINANCIAL/SUPPORT.
NOTE: MANY LDC RESOURCES ARE "FOUND DATA" (not explicitly commissioned).

A proposal for one shared resource

Information presentation of one or more database entities
Natural language interfaces/SDS (McKeown85, McCoy89, the cooperative response literature, Carenini&Moore01, Polifroni et al. 03, COGENTEX with the active buyers website, Walker et al. 04, Demberg&Moore06, etc.).
Different communicative goals: Summarize, Recommend, Compare, Describe (DB entities).
The representation is not controversial: attributes and values for DB entities, and relations between entity and attribute.
The application does not depend on NLU.
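Because the input is just entities with attribute-value pairs, content selection for some communicative goals can be stated very simply. The sketch below, with hypothetical function names, picks the attributes on which entities differ (a possible starting point for COMPARE) and the attribute-value pairs they share (a possible starting point for SUMMARIZE).

```python
Entity = dict[str, str]  # attribute -> value; the "name" key identifies the entity


def differing_attributes(entities: list[Entity]) -> list[str]:
    """Attributes whose values are not identical across all entities.

    A missing attribute counts as a distinct value (None).
    """
    attrs = sorted({a for e in entities for a in e if a != "name"})
    return [a for a in attrs if len({e.get(a) for e in entities}) > 1]


def shared_attributes(entities: list[Entity]) -> Entity:
    """Attribute-value pairs that every entity has in common."""
    first, *rest = entities
    return {a: v for a, v in first.items()
            if a != "name" and all(e.get(a) == v for e in rest)}
```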

We could make available a resource of:
INPUT-1: a speech act and a SET of DB entities: SUMMARIZE(SET), DESCRIBE(ENTITY), RECOMMEND(ENTITY, SET), COMPARE(SET)
INPUT-2: user model, discourse/dialogue context, style parameters, etc.
OUTPUT-1: a set of alternative outputs, possibly with TTS markup
OUTPUT-2: human-generated ratings or rankings for the outputs, oriented to the criteria specified by INPUT-2
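A sketch of one record in the proposed resource, as Python dataclasses. The four speech acts and the INPUT-1/INPUT-2/OUTPUT-1/OUTPUT-2 split follow the slide; every field name, the use of string ids for DB entities, and the choice to key OUTPUT-2 ratings by a criterion declared in INPUT-2 are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class SpeechAct(Enum):
    SUMMARIZE = "summarize"  # SUMMARIZE(SET)
    DESCRIBE = "describe"    # DESCRIBE(ENTITY)
    RECOMMEND = "recommend"  # RECOMMEND(ENTITY, SET)
    COMPARE = "compare"      # COMPARE(SET)


@dataclass
class Input1:
    """Speech act plus the DB entities it is about (ids are hypothetical)."""
    act: SpeechAct
    entity_set: list[str] = field(default_factory=list)  # SET argument
    entity: Optional[str] = None                          # ENTITY argument

    def __post_init__(self) -> None:
        if self.act in (SpeechAct.DESCRIBE, SpeechAct.RECOMMEND) and self.entity is None:
            raise ValueError(f"{self.act.name} needs an ENTITY argument")
        if self.act in (SpeechAct.SUMMARIZE, SpeechAct.RECOMMEND,
                        SpeechAct.COMPARE) and not self.entity_set:
            raise ValueError(f"{self.act.name} needs a SET argument")


@dataclass
class Input2:
    """Context the outputs are judged against (all fields are assumptions)."""
    user_model: dict[str, float] = field(default_factory=dict)
    context: str = ""
    style: dict[str, str] = field(default_factory=dict)
    criteria: list[str] = field(default_factory=list)  # what OUTPUT-2 measures


@dataclass
class ResourceItem:
    input1: Input1
    input2: Input2
    outputs: list[str]  # OUTPUT-1: alternatives, possibly with TTS markup
    ratings: dict[str, list[float]] = field(default_factory=dict)  # OUTPUT-2

    def __post_init__(self) -> None:
        for criterion, scores in self.ratings.items():
            if criterion not in self.input2.criteria:
                raise ValueError(f"criterion {criterion!r} not declared in INPUT-2")
            if len(scores) != len(self.outputs):
                raise ValueError(f"{criterion!r}: need exactly one score per output")
```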

A Content Plan for a Recommend strategy
recommend
relations: justify(nuc:1; sat:2); justify(nuc:1; sat:3); justify(nuc:1; sat:4)
content:
1. assert(best(Babbo))
2. assert(has-att(Babbo, foodquality(superb)))
3. assert(has-att(Babbo, decor(excellent)))
4. assert(has-att(Babbo, service(excellent)))
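A minimal sketch of how such a content plan might be stored and realized. It encodes the four assertions and the three justify(nuc; sat) relations, then produces two orderings similar to the Babbo realizations that follow: nucleus first with "because", or satellites first with "Since". The class names, the `ATTR_WORDS` lexicon and the fixed templates are hypothetical; a real sentence planner would choose among many more clause-combining options.

```python
from dataclasses import dataclass

# Hypothetical lexicon: plan attribute names -> English noun phrases.
ATTR_WORDS = {"foodquality": "food quality", "decor": "decor", "service": "service"}
BEST = "the best overall quality among the selected restaurants"


@dataclass(frozen=True)
class Assertion:
    pred: str          # "best" or "has-att"
    entity: str
    attr: str = ""     # e.g. "foodquality" (has-att only)
    value: str = ""    # e.g. "superb" (has-att only)


@dataclass(frozen=True)
class Justify:
    nuc: int           # content index of the nucleus
    sat: int           # content index of a satellite that justifies it


def conjoin(items: list[str]) -> str:
    """'a', 'a and b', 'a, b and c'."""
    return items[0] if len(items) == 1 else ", ".join(items[:-1]) + " and " + items[-1]


def realize(content: dict[int, Assertion], relations: list[Justify],
            since_first: bool = False) -> str:
    """Realize a plan whose justify relations all share one best(...) nucleus."""
    nuclei = {r.nuc for r in relations}
    if len(nuclei) != 1:
        raise ValueError("this sketch handles exactly one shared nucleus")
    nucleus = content[nuclei.pop()]
    if nucleus.pred != "best":
        raise ValueError("this sketch only realizes a best(...) nucleus")
    sats = [content[r.sat] for r in relations]
    reasons = conjoin([f"{s.value} {ATTR_WORDS[s.attr]}" for s in sats])
    if since_first:  # satellites first, nucleus last
        return f"Since {nucleus.entity} has {reasons}, it has {BEST}."
    return f"{nucleus.entity} has {BEST} because it has {reasons}."
```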

Human Feedback for Ranking
The ratings can represent any metric associated with the possible response, e.g. coherence, information quality, social appropriateness, personality.
Informational coherence: SPARKY, a generator for MATCH; SPOT, a generator for AT&T COMMUNICATOR.
Users are shown response variants, then told: "For each variant, please rate to what extent you agree with this statement: The utterance is easy to understand, well-formed and appropriate to the dialogue context."
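Ratings like these are often turned into pairwise preferences before a ranker is trained on them. A minimal sketch, assuming each judge gives one numeric score per variant and that ties are simply discarded; the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def mean_ratings(judgments: dict[str, list[float]]) -> list[float]:
    """Average each variant's score over judges (judge -> one score per variant)."""
    lengths = {len(scores) for scores in judgments.values()}
    if len(lengths) != 1:
        raise ValueError("every judge must rate every variant")
    return [mean(scores) for scores in zip(*judgments.values())]


def preference_pairs(scores: list[float]) -> list[tuple[int, int]]:
    """(better, worse) index pairs for every pair of variants; ties are dropped."""
    pairs = []
    for i, j in combinations(range(len(scores)), 2):
        if scores[i] > scores[j]:
            pairs.append((i, j))
        elif scores[j] > scores[i]:
            pairs.append((j, i))
    return pairs
```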

Examples: Learned Rules Applied to the Test Fold
Realizations (the original table also had Human and RankBoost score columns, which are not preserved):
1. Babbo has the best overall quality among the selected restaurants because it has superb food quality, with excellent service, and it has excellent decor.
2. Babbo has excellent service. It has superb food quality. It has excellent decor. It has the best overall quality among the selected restaurants.
3. Since Babbo has excellent service and superb food quality, with excellent decor, it has the best overall quality among the selected restaurants.
4. Babbo has excellent service and superb food quality, with excellent decor. It has the best overall quality among the selected restaurants.
5. With excellent decor, excellent service and superb food quality, Babbo has the best overall quality among the selected restaurants.
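The slide compares human ratings with scores from a ranker trained with RankBoost. Below is a minimal sketch of RankBoost (Freund et al.'s boosting algorithm for ranking) in its simplest form: each weak ranker is a single binary feature, and each round picks the feature that minimizes the normalizer Z and gives it weight alpha = 0.5 ln(W+/W-). The feature strings and training pairs are hypothetical; the features actually used in the experiments on this slide are not given here.

```python
import math


def rankboost(examples: list[set[str]], pairs: list[tuple[int, int]],
              rounds: int = 10, eps: float = 1e-6) -> dict[str, float]:
    """RankBoost with weak rankers h_f(x) = 1 if feature f is in x, else 0.

    `pairs` holds (worse, better) index pairs: the second should outrank the first.
    Returns feature -> accumulated weight alpha.
    """
    features = sorted(set().union(*examples))
    d = [1.0 / len(pairs)] * len(pairs)  # distribution over crucial pairs
    model: dict[str, float] = {}
    for _ in range(rounds):
        best = None
        for f in features:
            # W+: weight of pairs h_f orders correctly; W-: pairs it orders wrongly.
            w_plus = sum(dp for dp, (lo, hi) in zip(d, pairs)
                         if f in examples[hi] and f not in examples[lo])
            w_minus = sum(dp for dp, (lo, hi) in zip(d, pairs)
                          if f in examples[lo] and f not in examples[hi])
            z = (1.0 - w_plus - w_minus) + 2 * math.sqrt(w_plus * w_minus)
            if best is None or z < best[0]:
                best = (z, f, w_plus, w_minus)
        _, f, w_plus, w_minus = best
        alpha = 0.5 * math.log((w_plus + eps) / (w_minus + eps))  # eps avoids log(0)
        model[f] = model.get(f, 0.0) + alpha
        # Misordered pairs gain weight, correctly ordered pairs lose it.
        d = [dp * math.exp(alpha * ((f in examples[lo]) - (f in examples[hi])))
             for dp, (lo, hi) in zip(d, pairs)]
        total = sum(d)
        d = [dp / total for dp in d]
    return model


def score(model: dict[str, float], x: set[str]) -> float:
    """Final ranking score H(x): sum of alphas of the features x contains."""
    return sum(w for f, w in model.items() if f in x)
```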

Individual Differences (Sentence Planning Preferences) (Mairesse&Walker05, Stent et al. 04)
Realization (Judge A rating, Judge B rating):
1. Chanpen Thai has the best overall quality among the selected restaurants since it is a Thai restaurant, with good service, its price is 24 dollars, and it has good food quality. (Judge A: 1, Judge B: 4)
2. Chanpen Thai has the best overall quality among the selected restaurants because it has good service, it has good food quality, it is a Thai restaurant, and its price is 24 dollars. (Judge A: 2, Judge B: 5)
3. Chanpen Thai has the best overall quality among the selected restaurants. Its price is 24 dollars. It is a Thai restaurant, with good service. It has good food quality. (Judge A: 3, Judge B: 3)
4. Chanpen Thai has the best overall quality among the selected restaurants. This Thai restaurant has good food quality. Its price is 24 dollars, and it has good service. (Judge A: 4, Judge B: 3)
5. Chanpen Thai is a Thai restaurant, with good food quality. It has good service. Its price is 24 dollars. It has the best overall quality among the selected restaurants. (Judge A: 4, Judge B: 2)
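Disagreement between judges can be quantified directly. A minimal sketch, assuming one rating per realization per judge: the Pearson correlation between two judges' scores, plus each judge's preferred realization. The helper names are hypothetical.

```python
import math


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation between two judges' ratings of the same items."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def favourite(ratings: list[float]) -> int:
    """Index of a judge's top-rated item (the first one wins ties)."""
    return max(range(len(ratings)), key=ratings.__getitem__)
```

Reading the Judge A scores as 1, 2, 3, 4, 4 and the Judge B scores as 4, 5, 3, 3, 2 (the last two rows are reconstructed from garbled text), r comes out at about -0.77: Judge A favours realizations 4 and 5, Judge B favours realization 2. A single ranker trained on pooled ratings would satisfy neither judge well.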

Human Feedback for Ranking (2)
Ten Item Personality Inventory questionnaire (Gosling 2003)
PERSONAGE
Users are shown response variants, then told: "For each variant, rate on a scale of 1 to 7 whether: The speaker is quiet, reserved; The speaker is enthusiastic."
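The two items above form an extraversion pair. Under the usual TIPI scoring, assumed here, the "quiet, reserved" item is reverse-keyed on the 1 to 7 scale (8 minus the score) and averaged with the "enthusiastic" item. A minimal sketch with hypothetical function names:

```python
def reverse_keyed(score: int, scale_max: int = 7) -> int:
    """Reverse a 1..scale_max Likert score (7 -> 1, 1 -> 7)."""
    if not 1 <= score <= scale_max:
        raise ValueError(f"score must be between 1 and {scale_max}")
    return scale_max + 1 - score


def extraversion(enthusiastic: int, quiet_reserved: int) -> float:
    """TIPI-style extraversion: mean of the positive item and the reverse-keyed one."""
    return (enthusiastic + reverse_keyed(quiet_reserved)) / 2


def mean_over_judges(judgments: list[tuple[int, int]]) -> float:
    """Average extraversion over (enthusiastic, quiet_reserved) pairs, one per judge."""
    return sum(extraversion(e, q) for e, q in judgments) / len(judgments)
```

Averaging over several judges produces fractional scores on the 1 to 7 scale.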

Personality judgments: 'Recommend Le Marais'
Realization (Extra = extraversion rating, where preserved):
1. Err... it seems to me that Le Marais isn't as bad as the others
2. Right, I mean, Le Marais is the only restaurant that is any good
3. Ok, I mean, Le Marais is a quite french, kosher and steak house place, you know and the atmosphere isn't nasty, it has nice atmosphere. It has friendly service. It seems to me that the service is nice. It isn't as bad as the others, is it? (Extra: 5.17)
4. Le Marais has the best overall quality among the selected restaurants. It has decent decor, it has decent service, and its price is 44 dollars. This French, Kosher, Steak House restaurant has very good food quality
5. Well, it seems to me that I am sure you would like Le Marais. It has good food, the food is sort of rather tasty, the ambience is nice, the atmosphere isn't sort of nasty, it features rather friendly servers and its price is around 44 dollars
6. I am sure you would like Le Marais, you know. The atmosphere is acceptable, the servers are nice and it's a french, kosher and steak house place. Actually, the food is good, even if its price is 44 dollars
7. Basically, actually, I am sure you would like Le Marais. It features friendly service and acceptable atmosphere and it's a french, kosher and steak house place. Even if its price is 44 dollars, it just has really good food, nice food. (Extra: 6.17)

What else is out there?
Coconut corpus: referring expression generation (but add alternatives and ratings?)
Boston directions corpus (NSF funded, early 1990s)
Communicator corpus (8 different system outputs for dialogue contexts that can be characterized)
Tools: Halogen, Penman, FUF-SURGE, RealPro
Library of text plans, content plans, sentence planners?