1 To Share a Task or Not: Some Ramblings from a Mad (i.e., crazy) INLGer
Kathy McCoy
CIS Department, University of Delaware

2 What is intended by Shared Task?
- A competition for money?
- A funded activity in itself?
- A competition just for the fun of it?
- A competition or a cooperation?
  - A cooperation would entail groups of researchers collaborating on a larger system (need agreed-upon architecture)
  - A competition would entail different groups working “against each other” on the same problem

3 What is the desired outcome?
- An advance in technology that may be applicable in lots of different places?
- An advance in NLG technology that will allow more commercialization? bigger web presence? more excitement?
- More funding for INLG research?
- More publications of INLG research?
- Bring more people into the field?
- Get some important task done (that needs INLG)?

4 What about Comparative Evaluations?
- Major problem here is that we must agree on what is to be evaluated and how.
- Must have a number of different groups working on precisely the same problem with same assumptions.
- What is the desired outcome of comparative evaluations?
  - We get to name a system “winner”?
  - Presumably we would learn something about the task, but it isn’t quite clear to me what that something is.

5 Two Ends of the Spectrum in Shared Task/Evaluations
1. The killer application
  - Text summarization should have been it!
  - Generates excitement in the field
  - Generates funding opportunities
2. Component pieces
  - Referring expression generation is an example
  - What will be accomplished?
  - Someone gets a gold star?

6
- The kind you want depends on your ultimate goal.
- Both share some dangers revolving around choice of evaluation methods.

7 Dangers in Shared Task
- Exclusion
  - Shared task metrics become the de facto standard for evaluating research in the field
  - Doesn’t allow one to do research that doesn’t do well with the metrics (and the metrics are going to be prejudiced)
- May leave generation behind: Killer Apps may find such interesting problems that generation becomes secondary.
- Emphasis on shallow processing excluding theoretical benefits

8 Multiple or Human-Based Metrics Don’t Help

9 The Killer App Story
- The application itself must define the appropriate metric(s): does the application work?
- Many of the things we hold near and dear have a significantly smaller influence than some other things
  - Discourse coherence
  - Complicated syntax/variation in syntax
  - Lexical choice
  - Referring expression generation

10 But…
- We KNOW these things are important!
- Problem becomes:
  - Other “more important” aspects are deemed to make more of a difference
  - By the time these issues come up, people have invested too much time into a particular kind of solution

11 Comparative Evaluations
- The nature of the shared/agreed upon evaluation methods placed a judgment on importance of some aspects over others
- Evaluation is necessarily prejudiced with respect to which issues are stressed
- Referring expressions: distinguishing descriptions with concrete knowledge base
  - What about referring expressions in news stories? Pronoun use? Conjunctions? Influence of surrounding text?

