Utility of Human-Computer Interactions: Toward a Science of Preference Measurement
Michael Toomim, Travis Kriplean, Claus Pörtner, and James A. Landay
University of Washington, dub Group
CHI 2011

Discretionary Use of Interfaces
The CHI research community grew out of the discretionary use of computer interfaces (starting in the 1980s), meaning free choice: people choose which interfaces to use to accomplish their tasks
Now the task (and its goal) is itself a choice (e.g., blogs, web browsing, SNS, Wikipedia), as is the use of ubiquitous applications (e.g., smartphones, Nike+iPod)
Widely accepted evaluation metrics in CHI research:
– Indirect predictors of whether an interface will be preferred over the alternatives
– Examples: time on task, number of errors, subjective interpretations of think-aloud sessions, survey reports

Evaluating “User Choices”
Industry: A/B testing (also called split testing or bucket testing)
– A marketing test method in which multiple versions of one element are compared on a metric to determine which is more successful
– The versions are tested simultaneously to determine which is better
– Conversions are measured across different sets of users (between-subjects)
Yet A/B testing is challenging: it requires a large up-front investment and a large existing user base to deploy and test on (say, thousands of people)
Sample size matters
[Figure: users split among Control (baseline), Treatment A, and Treatment B, compared with a statistical significance test (e.g., t-test or chi-square)]
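The significance test named on this slide can be sketched as a Pearson chi-square test on a 2×2 table of conversions. This is a minimal illustration, not a procedure taken from the talk: the function name `chi_square_2x2` and the conversion counts are hypothetical, no continuity correction is applied, and the p-value relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
import math

def chi_square_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson chi-square test comparing two conversion rates.

    conv_a, conv_b: number of converting users in each variant
    n_a, n_b: number of users shown each variant
    Returns (chi2 statistic, p-value) with 1 degree of freedom.
    Assumes at least one conversion and one non-conversion overall.
    """
    observed = [
        [conv_a, n_a - conv_a],
        [conv_b, n_b - conv_b],
    ]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, total - conv_a - conv_b]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: 1,000 users per variant.
chi2, p = chi_square_2x2(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

With a few dozen users per variant instead of a thousand, the same difference in conversion rate would give a much smaller statistic, which is the sense in which sample size matters.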

Measuring User Preference
Proposal: a semi-automated approach
– Post thousands of “interface test tasks” to M-Turk
– Observe how workers choose to complete the tasks (and how many times they do so)
– Analyze the data to measure preference
How?

Example: Fitts’ law test
Fitts’ law models the time required to click a widget of a given width at a given distance; the proposed technique can model how much people prefer to use a widget
Difficulty = f(width, distance)
For a given job, subjects are asked to click on a blue rectangle 60 times
Each time they click on the bar, it moves to the opposite side of the screen
[Figure: a bar labeled with its width and distance; after a click, the bar moves]
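The slide only states that difficulty is a function of width and distance. A common concrete choice is the Shannon formulation of Fitts’ law, sketched below; whether the study used exactly this form is an assumption, and the function names and the constants `a` and `b` are hypothetical.

```python
import math

def index_of_difficulty(distance, width):
    """Shannon formulation: ID = log2(distance / width + 1), in bits."""
    return math.log2(distance / width + 1)

def predicted_time(distance, width, a, b):
    """Fitts' law: movement time = a + b * ID.

    a and b are empirically fitted constants; the values used
    below are made up for illustration.
    """
    return a + b * index_of_difficulty(distance, width)

# Hypothetical targets: same distance, shrinking width.
for width in (100, 25, 5):
    bits = index_of_difficulty(distance=700, width=width)
    seconds = predicted_time(distance=700, width=width, a=0.1, b=0.15)
    print(f"width={width:>3}px  ID={bits:.2f} bits  time={seconds:.2f}s")
```

Narrower bars at the same distance yield a higher index of difficulty, which is how an easy, medium, and hard condition can be produced from the same layout.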

Example: Fitts’ law test
Participants were assigned to one of three index-of-difficulty conditions
Each point is the number of clicks a participant completed before quitting (points are jittered to show spread)
Participants preferred big buttons to small buttons (p < 0.10)
Participants were allowed a maximum of 3,060 clicks each
The regression line accounts for this maximum using a Tobit analysis
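A Tobit analysis treats participants who hit the click maximum as censored: their true willingness to continue is at least the cap, not equal to it. The sketch below shows only the right-censored log-likelihood that such a model maximizes. It is not the authors' analysis code; the function names, the data, and the parameter values are all hypothetical, and a real analysis would fit the parameters with a numerical optimizer.

```python
import math

def normal_logpdf(z):
    """Log density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)

def normal_sf(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def tobit_loglik(xs, ys, intercept, slope, sigma, upper):
    """Log-likelihood of a linear model with outcomes right-censored at `upper`."""
    total = 0.0
    for x, y in zip(xs, ys):
        mu = intercept + slope * x
        if y >= upper:
            # Censored: we only know the latent outcome is at least `upper`.
            total += math.log(normal_sf((upper - mu) / sigma))
        else:
            # Uncensored: ordinary normal density of the residual.
            total += normal_logpdf((y - mu) / sigma) - math.log(sigma)
    return total

# Hypothetical data: index of difficulty vs. clicks before quitting.
difficulty = [2.0, 2.0, 4.0, 4.0, 6.0, 6.0]
clicks = [3060, 1800, 3060, 900, 600, 300]

flat = tobit_loglik(difficulty, clicks, 1500, 0, 1000, upper=3060)
sloped = tobit_loglik(difficulty, clicks, 3500, -500, 1000, upper=3060)
print(flat, sloped)
```

An ordinary least-squares fit would treat the 3,060-click observations as exact values and so tend to flatten the estimated slope.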

Utility
Utility in economics:
– The degree to which a person prefers a particular choice among the available options
When a user chooses to use system A instead of B, we say that Utility(A) > Utility(B)
Use economic utility to quantify aggregate user preference
– Example: a user has no preference between (1) being paid $0.25 for using system A and (2) being paid $0.50 for using system B, so the extra $0.25 exactly offsets that user’s preference for A
– Money metric of utility: |Utility(A) – Utility(B)| = $0.25

Measuring Utility
Utility = f(task, interface, context)
– A user finds value in completing a task, but must take some actions on a computer through some interface to do so
– The user’s context also matters (e.g., demographics, social and moral status)
Preference measurement begins with determining how much you must pay people to convince them to use an interface for a task

Measuring Utility
Reservation wage: the wage below which a worker will not take a task
Present a worker with a job at a price and observe their behavior: the worker either completes the task at that price or does not
Gather and analyze all the data as tuples: (Interface ID, Worker ID, Wage, Number of Completions)
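One way to begin analyzing tuples of this shape is to group them by interface and wage, then look at how many workers accepted the job at all and how many tasks they completed. The sketch below assumes each worker appears once, under a single interface and wage; the function name `summarize`, the summary fields, and the records are hypothetical and do not come from the study.

```python
from collections import defaultdict

def summarize(records):
    """Group (interface_id, worker_id, wage, completions) tuples.

    Returns a dict keyed by (interface_id, wage) with the number of
    workers, the share who completed at least one task, and the mean
    number of completions.
    """
    groups = defaultdict(list)
    for interface_id, _worker_id, wage, completions in records:
        groups[(interface_id, wage)].append(completions)

    summary = {}
    for key, counts in groups.items():
        summary[key] = {
            "workers": len(counts),
            "accept_rate": sum(1 for c in counts if c > 0) / len(counts),
            "mean_completions": sum(counts) / len(counts),
        }
    return summary

# Hypothetical observations for two interfaces at two wages.
records = [
    ("A", "w1", 0.01, 0),
    ("A", "w2", 0.01, 4),
    ("A", "w3", 0.03, 10),
    ("B", "w4", 0.01, 0),
    ("B", "w5", 0.01, 0),
    ("B", "w6", 0.03, 6),
]

for (interface_id, wage), stats in sorted(summarize(records).items()):
    print(interface_id, wage, stats)
```

If one interface needs a higher wage than another to reach the same acceptance rate, that wage gap is a rough money-metric estimate of the preference between them.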

Measuring Utility
Post all scenarios/conditions to M-Turk simultaneously
Handle selection bias via a mystery task with a “??? price”
Set a limit on the number of sub-tasks a single worker can complete (e.g., 50)
Handle market price fluctuations (as people like to take high-paying tasks)

Fitts’ Law Study
Subjects clicked on a blue rectangle 60 times
Each time they clicked on the bar, it moved to the opposite side of the screen
Difficulty = f(width, distance)
[Figure labels: Width, Distance]

Fitts’ Law Study
Price range: $0.01–$0.06
Difficulty: easy, medium, hard
Each task: 60 clicks
Upper limit on number of tasks: 51
5 hours 15 minutes, $970

Aesthetics: CAPTCHAs

The survival graph shows how many workers made it through how many tasks, for each of the four experimental conditions
The pretty and ugly lines are separated at the left but converge toward the right
– This suggests either that the utility effect of aesthetics fades over time, or that the users who complete many CAPTCHAs are more concerned with pay than with aesthetics
The shaded regions are 95% confidence intervals
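A survival curve of this kind can be computed directly from per-worker completion counts: for each task number k, take the share of workers who completed at least k tasks. The sketch below is a minimal illustration with hypothetical data and a hypothetical function name; it does not compute the confidence intervals shown in the graph.

```python
def survival_curve(completions, max_tasks):
    """Share of workers who completed at least k tasks, for k = 1..max_tasks.

    completions: list with one completed-task count per worker.
    """
    n = len(completions)
    return [
        sum(1 for c in completions if c >= k) / n
        for k in range(1, max_tasks + 1)
    ]

# Hypothetical completion counts for two of the conditions.
pretty = [1, 3, 3, 5, 8, 8]
ugly = [1, 1, 2, 3, 8, 8]

print("pretty:", survival_curve(pretty, max_tasks=8))
print("ugly:  ", survival_curve(ugly, max_tasks=8))
```

In this made-up data the two curves differ for small k and coincide for large k, the same shape of convergence the slide describes.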
