Natural versus Standardized Approaches to Spoken System Design Ellen Campana Arts, Media and Engineering Psychology Arizona State University Talk: 25/05/2007.


1 Natural versus Standardized Approaches to Spoken System Design Ellen Campana Arts, Media and Engineering Psychology Arizona State University Talk: 25/05/2007

2 A Comparison Using the Dual-Task Paradigm Ellen Campana Arts, Media and Engineering Psychology Arizona State University Talk: 25/05/2007

3 February 6th, 2006 Ellen Campana Slide 3
BCS and CS
• Convergence
  - Increasingly rich domains
  - Multidimensional data
  - Continuous models
• Synergy
  - Theoretical
    - Dialog Systems offer explicit models for rich domains
    - Study of human language offers models of continuous processing
  - Methodological
    - Dialog Systems can track & manipulate variables during interaction
    - Study of humans provides methods for comparison & evaluation
• Will Result In
  - Much more detailed models of the mechanisms underlying human language in real-world contexts
  - Improved Human-Computer Interaction for real users

4 Outline of Talk
• Background
  - Dialogue Systems (Natural & Standardized)
  - System Evaluation
  - Referring Expressions
• Experiments
  - Discourse Context & Referring Expressions
  - Visual Context & Referring Expressions
• Summary and Implications

5 Dialogue Systems
• Computer Interfaces
  - Produce speech output
  - Understand speech input
• Used by people to achieve real-world goals
• Should be easy to use
• Different approaches to making things “easy”
  - Naturalness
  - Standardization

6 Natural vs. Standardized Systems
Standardized
  Above all, systems must be consistent:
  - throughout a single use
  - across different applications
  With enough practice, users will learn how the systems work, and the systems will become easier and easier for them to use.
  People are good at learning.
Natural
  Above all, systems must be natural. The goal should be to approximate human-human communication as closely as possible in all respects. If the approximation is close enough, using the system will be as easy as talking with another person.
  People are good at talking to each other.

7 Example: Natural Approach
User: We need to get the woman in Penfield to Strong
System: OK
User: What vehicles are available?
System: There are ambulances in Pittsford and Webster
User: OK. Use one from Pittsford
System: Did you know that Route 96 is blocked due to construction?
User: Oh… Let’s use the interstate instead.
System: OK. I’ll dispatch the crew.
Emergency dispatch using the TRIPS system (Allen et al., 2001)

8 Example: Natural Approach
• Advantages
  - Say what comes naturally
  - Understanding is natural
  - Users don’t need training
• Disadvantages
  - Domain-specific
  - Porting is time-consuming and costly
  - Requires trained developers
  - Still error-prone
  - Still slow in real use

9 Example: Standardized Approach
User: Theater is Showcase Cinemas Pittsburgh East
……
User: What are the titles
System: Five titles: My Big Fat Greek Wedding, Abandon, Brown Sugar {…}
User: Title is My Big Fat Greek Wedding
System: My Big Fat Greek Wedding
User: What are the show times?
System: 4 Showtimes: 3:45, 5:45, 7:45, 9:45
Data lookup using the Universal Speech Interface (Tomko et al., 2005)

10 Example: Standardized Approach
• Advantages
  - Domain-independent structure
  - Porting cheap and easy
  - Low error rate
  - Fast to use
• Disadvantages
  - Users need to learn interaction style
    - Keywords
    - Constraints
  - Users need to adhere to system constraints
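
The keyword-and-constraint style of the standardized exchange can be illustrated with a very small parser. This is only a sketch based on the example utterances shown on these slides ("<slot> is <value>", "What are the <slot>", "<slot> options"). The function name `parse_utterance` and the returned tuples are hypothetical, and this is not the actual grammar of the Universal Speech Interface.

```python
import re


def parse_utterance(text):
    """Classify an utterance by the fixed phrasings seen in the slide examples.

    Hypothetical return values:
      ("options", slot_or_None)  for "Options" / "Container options"
      ("query", slot)            for "What are the titles"
      ("set", slot, value)       for "Title is My Big Fat Greek Wedding"
      ("unknown", text)          for anything outside the fixed phrasings
    """
    text = text.strip().rstrip("?.")

    # "Options" or "<slot> options": ask what can be said next.
    m = re.fullmatch(r"(?:(.+) )?options", text, re.IGNORECASE)
    if m:
        slot = m.group(1)
        return ("options", slot.lower() if slot else None)

    # "What is/are the <slot>": query a slot. Checked before the "set"
    # pattern, which would otherwise also match "What is the ...".
    m = re.fullmatch(r"what (?:is|are) the (.+)", text, re.IGNORECASE)
    if m:
        return ("query", m.group(1).lower())

    # "<slot> is <value>": constrain a slot.
    m = re.fullmatch(r"(.+?) is (.+)", text, re.IGNORECASE)
    if m:
        return ("set", m.group(1).lower(), m.group(2))

    # Anything else violates the interaction style the user must learn.
    return ("unknown", text)


if __name__ == "__main__":
    for line in ["Theater is Showcase Cinemas Pittsburgh East",
                 "What are the titles",
                 "Container options",
                 "Could you maybe tell me what's playing"]:
        print(parse_utterance(line))
```

The last utterance falls through to "unknown", which is the cost listed under Disadvantages: users must adhere to the system's constraints.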

11 System Evaluation and Comparison
• Developer time & effort
• Effects on user
  - Subjective surveys
    - Holistic, interaction-level ratings
  - Component accuracy measures
    - Confounded with system design
  - Efficiency measures
    - Often conflict
    - Efficiency is not the same as ease-of-use
• Need: fine-grained objective ease-of-use metric

12 Measuring Ease-of-use Directly
Ease of use: the degree to which a task or process consumes limited-capacity human cognitive resources
  - Vision
  - Attention
  - Memory
  - Central Executive
• Can be measured using Dual-Task Methodology
  - Used in GUI design (but not in speech so far)
  - Deeper understanding of contributing factors
  - Findings that generalize to new domains / systems

13 Case Study: Referring Expressions
• Referring Expressions = words or phrases that are used to talk about things in the world
• Central to dialog systems / speech interfaces
  - Generated & understood by systems and users
• Errors lead to miscommunication
• System design approaches differ
  - Natural: Form depends on context
  - Standardized: Context-independent form
• Complicated (ambiguity)

14 Ambiguity is Everywhere “the orange mug” “the mug” “it” or “that” “Ellen’s favorite coffee mug” “the mug I used to keep rubber bands in” “whatever I left in the other room” “all the dirty dishes”

15 Ambiguity is Everywhere “whatever I left in the other room”

16 Natural Approach “the orange mug” “the mug” “it” or “that” “Ellen’s favorite coffee mug” “the mug I used to keep rubber bands in” “whatever I left in the other room” “all the dirty dishes”

17 Standardized Approach “the orange mug” “the mug” “it” or “that” “Ellen’s favorite coffee mug” “the mug I used to keep rubber bands in” “whatever I left in the other room” “all the dirty dishes”

18 Standardized Approach
“the orange mug”
User: Options
System: You can specify or ask about container, utensil, food {…}
User: Container options
System: Container can be the blue bowl, the orange mug, the fat mug {…}

19 Natural vs. Standardized Systems
• Natural Approach
  - Distribution over a Lifetime
  - Specialized Language Abilities
  - Perception (low-level)
• Standardized Approach

20 Natural vs. Standardized Systems
• Natural Approach
  - Distribution over a Lifetime
  - Specialized Language Abilities
  - Perception (low-level)
• Standardized Approach
  - Local Distribution

21 Natural vs. Standardized Systems
• Natural Approach
  - Distribution over a Lifetime
  - Specialized Language Abilities
  - Perception (low-level)
• Standardized Approach
  - Local Distribution
  - Learning and Adaptation

22 Natural vs. Standardized Systems
• Natural Approach
  - Distribution over a Lifetime
  - Specialized Language Abilities
  - Perception (low-level)
• Standardized Approach
  - Local Distribution
  - Learning and Adaptation
  - Perspective-taking (high-level)

23 Natural vs. Standardized Systems
• Natural Approach
  - Distribution over a Lifetime
  - Specialized Language Abilities
  - Perception (low-level)
• Standardized Approach
  - Local Distribution
  - Learning and Adaptation
  - Perspective-taking (high-level)
Best system might have elements of both!

24 Experiments

25 Experiment Structure
Two experiments with the same basic structure
• Comparison: Natural vs. Standardized systems
• Design: Between-Subjects
• Focus: System generation / user understanding of referring expressions in a visual environment
• Measure: Dual-task comparison

26 Dual-Task Methodology
Users are asked to do two things at once
  - Primary Task: the one we’re measuring (Using Spoken Interface)
  - Secondary Task: simple, well-understood (Flicker Detection)
To compare systems, manipulate the primary task while holding the secondary task constant across the conditions.

27 Dual-Task Methodology
Ease of use: the degree to which a task or process consumes limited-capacity human cognitive resources
[Diagram labels: Primary Task (Using Spoken Interface): ????; Secondary Task (Flicker Detection): Visual Attention]

28 Dual-Task Measures
• Secondary task (detecting flickers)
  - Reaction Times
  - Precision (# hits / # flashes)
  - Recall (# hits / # responses)
• Primary task (using spoken interface)
  - Reaction Times
  - Percent Correct
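
The secondary-task measures can be computed from two lists of timestamps. The sketch below assumes one participant's flicker onsets and button presses are available in seconds. The function name, the one-second response `window`, and the rule for matching a response to a flash are hypothetical choices, not details from the talk. The slides label hits per flash "precision" and hits per response "recall"; because the more common convention assigns those two names the other way round, the code uses neutral key names.

```python
from statistics import mean


def secondary_task_measures(flash_times, response_times, window=1.0):
    """Score one flicker-detection run.

    flash_times:    onset times (s) of each flicker
    response_times: times (s) at which the participant responded
    window:         hypothetical maximum delay (s) for a response to count as a hit
    """
    unused = sorted(response_times)
    hit_rts = []  # reaction times of responses matched to a flash
    for flash in sorted(flash_times):
        # Take the earliest unmatched response that falls inside the window.
        match = next((r for r in unused if 0 <= r - flash <= window), None)
        if match is not None:
            unused.remove(match)
            hit_rts.append(match - flash)

    n_hits = len(hit_rts)
    return {
        "mean_rt": mean(hit_rts) if hit_rts else None,
        # "# hits / # flashes" on the slide
        "hits_per_flash": n_hits / len(flash_times) if flash_times else None,
        # "# hits / # responses" on the slide
        "hits_per_response": n_hits / len(response_times) if response_times else None,
    }


if __name__ == "__main__":
    flashes = [1.0, 5.0, 9.0]
    responses = [1.4, 5.6, 7.0, 8.0]  # two hits, two false alarms, one miss
    print(secondary_task_measures(flashes, responses))
```

Comparing these values between the Natural and Standardized groups, with the secondary task held constant, is what makes the measure an index of the resources consumed by the primary task.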

29 Put the small blue square above the big blue triangle.

30 Experiment 1: Discourse Context

31 Discourse Context and Reference
• Discourse Context = history of entities that have been previously referred to within a conversation or discourse, when, and how
• Discourse context affects referring form

Salience of entity                           Referring form         Example
Most “Salient” (Topic of Prev. Sent.)        Pronoun                “it”
Intermediate                                 Reduced Noun Phrase    “the mug”
Least “Salient” (New / Never Mentioned)      Full Noun Phrase       “the big orange mug”

32 Natural use of Discourse Context

33 Natural use of Discourse Context Please put the big blue bowl to the right of the big orange mug. Now put the mug to the right of the small yellow bowl. Even though there are 4 mugs, listeners don’t tend to get confused.

34 Experimental Trials
“Put [entity1] above [entity2]”
“Now put [entity1] / [entity2] / [entity3] in the center.”
“Now click on [entity4].”

Entity in second command                               Referring form
[entity1]: Most “Salient” (Topic of Prev. Sent.)       Pronoun
[entity2]                                              Reduced Noun Phrase
[entity3]: Least “Salient” (New / Never Mentioned)     Full Noun Phrase

35 Experimental Trials
• Natural Condition
  Put the small blue square above the big blue triangle.
  Now put the triangle in the center.
  Now click on the big red square.
• Standardized Condition
  Put the small blue square above the big blue triangle.
  Now put the big blue triangle in the center.
  Now click on the big red square.

36 Experimental Trials
• Natural Condition
  Put the small blue square above the big blue triangle.
  Now put the triangle in the center.
  Now click on the big red square.
• Standardized Condition
  Put the small blue square above the big blue triangle.
  Now put the big blue triangle in the center.
  Now click on the big red square.
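
The contrast between the two conditions can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the generation code of the systems used in the experiment. The `Entity` class and the `referring_expression` function are hypothetical, and so is the simplification that the caller supplies the topic of the previous sentence and the set of already-mentioned entities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    size: str
    color: str
    shape: str


def referring_expression(entity, previous_topic, mentioned, natural=True):
    """Pick a referring form for `entity`.

    previous_topic: the entity treated as the topic of the previous sentence
                    (assumption: supplied by the caller)
    mentioned:      set of entities already referred to in the discourse
    natural:        False gives the standardized, context-independent form
    """
    full_np = f"the {entity.size} {entity.color} {entity.shape}"
    if not natural:
        return full_np                    # standardized: always the full noun phrase
    if entity == previous_topic:
        return "it"                       # most salient: pronoun
    if entity in mentioned:
        return f"the {entity.shape}"      # previously mentioned: reduced noun phrase
    return full_np                        # new / never mentioned: full noun phrase


if __name__ == "__main__":
    square = Entity("small", "blue", "square")
    triangle = Entity("big", "blue", "triangle")
    # Discourse state after "Put the small blue square above the big blue triangle."
    mentioned = {square, triangle}
    for natural in (True, False):
        np = referring_expression(triangle, square, mentioned, natural=natural)
        print(f"Now put {np} in the center.")
```

With `natural=True` the second command comes out as "Now put the triangle in the center."; with `natural=False` it is "Now put the big blue triangle in the center.", matching the two conditions shown above.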

37 Experimental Design
• 136 total trials
  - 36 Experimental trials (flicker at NP offset in 2nd command)
    - 12 REDUCED NP
    - 12 PRONOUN
    - 12 FULL NP
  - 36 Matched Filler trials (no flicker in 2nd instruction)
  - 64 Misc Filler trials
• 3 instructions per trial
• Flickers in 50% of instructions overall
• 20 total participants (10 in each condition)
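
As a quick check on the trial counts, the design can be written down as a table and expanded into a shuffled trial list. The dictionary layout, the label strings, and the fixed random seed are illustrative assumptions; the slides do not say how trials were ordered.

```python
import random
from collections import Counter

# Trial counts as listed on the slide (Experiment 1).
EXPERIMENT_1_DESIGN = {
    "experimental/REDUCED_NP": 12,
    "experimental/PRONOUN": 12,
    "experimental/FULL_NP": 12,
    "matched_filler": 36,
    "misc_filler": 64,
}
INSTRUCTIONS_PER_TRIAL = 3


def build_trial_list(design, seed=0):
    """Expand {label: count} into one shuffled list of trial labels."""
    trials = [label for label, count in design.items() for _ in range(count)]
    random.Random(seed).shuffle(trials)  # seeded only to make the sketch repeatable
    return trials


if __name__ == "__main__":
    trials = build_trial_list(EXPERIMENT_1_DESIGN)
    print(len(trials), "trials")                      # 12 + 12 + 12 + 36 + 64
    print(len(trials) * INSTRUCTIONS_PER_TRIAL, "instructions")
    print(Counter(trials))
```

The counts sum to the 136 trials stated on the slide, which at 3 instructions per trial gives 408 instructions in total.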

38 Results
[Chart: Secondary Task Reaction Times in milliseconds, by System Type; marked with *]

39 Results
[Chart: Secondary Task Reaction Times in milliseconds, by Trial Type; marked with * *]

40 Results
[Chart: Average Secondary Task Precision, by System Type (precision = # hits / # flashes); marked with *]

41 Results
[Chart: Average Secondary Task Recall, by System Type (recall = # hits / # responses); marked with *]

42 Results
• In NATURAL condition
  - Participants faster in secondary task
  - Participants more accurate in secondary task
• NOT due to sacrificing primary task performance
  - No differences in primary task RTs
  - No differences in primary task accuracy
• NOT due to lack of practice in STANDARDIZED condition
  - If anything, participants got worse over time

43 Results
[Chart: Primary Task Accuracy (Percent Correct), by Experiment Half; marked with *]

44 Experiment 1 Conclusions
• Referring expressions that were NATURAL with respect to discourse context required fewer resources for users to understand
• Entire distribution of system utterances was important (not just individual instructions)
• Systems should not systematically ignore discourse context

45 Experiment 2: Visual Context

46 Visual Context and Reference
• Visual Context = set of entities that are available for referring to
• Balance between informativity and redundancy
  - Include adjectives when necessary
  - Don’t include adjectives when redundant
  - Especially true for scalar adjectives (less for color)
• Violations (over-/under-specification) = problem
  - Inferences about set of entities
  - Inferences about speaker
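
The rule "include adjectives when necessary, not when redundant" can be made concrete with a small search for the shortest description that picks out the target in the visual context. This is a sketch under assumptions. Entities are plain dictionaries with hypothetical keys `size`, `color`, and `shape`, and the order in which adjectives are tried (size before color) is an arbitrary choice, not a claim from the talk.

```python
from itertools import combinations


def minimal_description(target, context, attributes=("size", "color")):
    """Return the shortest noun phrase that uniquely identifies `target`.

    target:  dict with a 'shape' key plus the keys in `attributes`
    context: list of such dicts for every visible entity, including the target
    Returns None if even the full description is ambiguous.
    """
    # Try zero adjectives first, then one, then two, and so on.
    for k in range(len(attributes) + 1):
        for subset in combinations(attributes, k):
            keys = subset + ("shape",)
            matches = [e for e in context
                       if all(e[key] == target[key] for key in keys)]
            if len(matches) == 1:  # only the target fits this description
                words = [target[a] for a in subset] + [target["shape"]]
                return "the " + " ".join(words)
    return None


if __name__ == "__main__":
    target = {"size": "big", "color": "blue", "shape": "triangle"}
    alone = [target, {"size": "small", "color": "red", "shape": "square"}]
    crowded = alone + [
        {"size": "small", "color": "blue", "shape": "triangle"},
        {"size": "big", "color": "red", "shape": "triangle"},
    ]
    print(minimal_description(target, alone))    # no competing triangles
    print(minimal_description(target, crowded))  # size and color both needed
```

A standardized system would instead always produce the full form ("the big blue triangle"), which over-specifies whenever the context contains only one triangle.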

47 Natural Reference Examples Look! A bowl!

48 Natural Reference Examples Look! A bowl! Look! A big bowl!

49 Natural Reference Examples Look! A bowl! Look! A big bowl! Look! A big yellow bowl!

50 Natural Reference Examples Look! A big yellow bowl!

51 Experimental Design
• FULL NP
  - Size and color necessary

52 Put the big blue square in the top left corner

53 Experimental Design
• FULL NP
  - Size and color necessary
• REDUCED_MISLEADING
  - Size and color both redundant
  - Size and color actually misleading

54 Put the triangle in the top left corner

55 Put the big blue triangle in the top left corner

56 Put the triangle

57 Put the big blue triangle

58 Experimental Design
• FULL NP
  - Size and color necessary
• REDUCED_MISLEADING
  - Size and color both redundant
  - Size and color actually misleading
• REDUCED_EQUALINFO
  - Size and color both redundant
  - Information content equal across conditions

59 Put the triangle in the top left corner

60 Put the big blue triangle in the top left corner

61 Put the triangle

62 Put the big blue triangle

63 Experimental Design
• 168 total trials
  - 36 Experimental trials
    - 12 FULL NP
    - 12 REDUCED MISLEADING
    - 12 REDUCED EQUAL INFO
  - 36 Matched Filler trials (no flicker)
  - 96 Misc Filler trials
• Just one instruction per trial
• Flickers in 50% of trials
• 24 total participants (12 in each condition)

64 Results
• Only differences found in EQUAL_INFO trials

65 Results
[Chart: Secondary Task Reaction Times in milliseconds, by System Type, REDUCED_EQUAL_INFO trials; marked with * *]

66 Results
• Only differences found in EQUAL_INFO trials
  - Participants in NATURAL condition were faster in secondary task
  - NOT due to speed-accuracy tradeoff
  - NOT due to sacrificing primary task performance

67 Results
[Chart: Secondary Task Reaction Times in milliseconds, by Experiment Half, REDUCED_EQUAL_INFO trials; marked with *]

68 Results
• Only differences found in EQUAL_INFO trials
  - Participants in NATURAL condition were faster in secondary task
  - NOT due to speed-accuracy tradeoff
  - NOT due to sacrificing primary task performance
• Practice made even that difference disappear

69 Experiment 2 Conclusions
• Referring expressions that were NATURAL with respect to visual context did not result in robust benefits in terms of ease-of-use
• Users seem to be able to adapt to systematic over-specification in system productions
• Visual context may be something that developers can streamline or ignore

70 Summary and Implications

71 Contributions of this Work
• Theoretical
  - Articulated the two approaches and gave them names
  - Necessary to bring debate back(?) to empirical realm
• Methodological
  - Introduced dual-task methodology as a means for comparing dialogue system design approaches
  - Demonstrated concretely how to use it
• Empirical
  - Discourse context = worth the effort for natural
  - Visual context = candidate for standardization

72 Thank You! Questions?

