Lecture 6: How to Design a Good Usability Evaluation
Brad Myers, 05-863 / 08-763 / 46-863: Introduction to Human Computer Interaction for Technology Executives
Fall 2012, Mini 2



HW #3
- Note: due on MONDAY, November 19, 2012, because of the Thanksgiving holiday
- Note: TA office hours change accordingly

Why Evaluate with "Usability Evaluations"?
- Following guidelines is never sufficient for good UIs
- Need both good design and user studies (similar to working with users in CI)
- Note: "users" and "subjects" are now called "participants"
- [Chart: interface quality before and after user studies, for good designers and for average designers]

"Don'ts" of Usability Evaluations
- Don't evaluate whether it works (that is quality assurance)
- Don't have experimenters evaluate it; get users
- Don't (just) ask users questions; it is not an "opinion survey." Instead, watch their behavior
- Don't evaluate with groups: see how well the site works for each person individually (not a "focus group")
- Don't train users: we want to see if they can figure it out themselves
- Don't test the user; evaluate the system. It is not a "user test"; call it a usability evaluation instead
- Don't put your ego as a designer on the line

Issue: Reliability
- Do the results generalize to other people?
- Individual differences: up to a factor of 10 in performance
- If comparing two systems, use statistics for confidence intervals, e.g., p < .01 (a minimal sketch follows)
- But we are rarely doing A vs. B studies
- Also, a small number of users cannot evaluate an entire site; it is just a sample
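For the rare case where two designs are compared directly, the confidence check might look like the sketch below. The task times, group sizes, and the p < .01 threshold applied to them are illustrative assumptions, not data from the lecture.

```python
# Minimal sketch: comparing task-completion times (seconds) for two designs
# in a between-subjects study. All numbers are made up for illustration.
from scipy import stats

design_a = [52, 61, 48, 70, 55, 63, 59, 66]   # hypothetical times, design A
design_b = [41, 45, 50, 38, 47, 44, 52, 40]   # hypothetical times, design B

result = stats.ttest_ind(design_a, design_b)  # independent-samples t-test
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
if result.pvalue < 0.01:                      # threshold mentioned on the slide
    print("Difference is statistically reliable at p < .01")
else:
    print("Cannot conclude the designs differ")
```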

Issue: Validity
- Did the evaluation measure what we wanted?
- Wrong users
- "Confounding" factors, etc.: issues which were not controlled but are not relevant to the evaluation, such as other usability problems, the setting, etc.
- Ordering effects
- Learning effects
- Too much help given to some users

Make an Evaluation Plan
- Goals:
  - Formative: help decide features and design (the role of CIs)
  - Summative: evaluate the system (what we are doing now)
- Pilot evaluations
  - Preliminary evaluations to check materials, look for bugs, etc.
  - Evaluate the instructions and the timing
  - Users do not have to be representative

Evaluation Design
- "Between subjects" vs. "within subjects": for comparing different conditions
- Within: each user does all conditions
  - Removes individual differences
  - Adds ordering effects
- Between: each user does one condition
  - Quicker for each user
  - But needs more users, due to the huge variation between people
- Randomized assignment of conditions to people, or of their order (see the sketch below)
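A minimal sketch of how randomized assignment might be done, assuming two conditions named "A" and "B" and eight participants (the names and counts are illustrative, not from the lecture):

```python
import random

conditions = ["A", "B"]                          # hypothetical condition names
participants = [f"P{i}" for i in range(1, 9)]    # hypothetical participant IDs

# Between-subjects: shuffle participants, then assign conditions in a balanced,
# random way (each participant sees exactly one condition).
random.shuffle(participants)
between = {p: conditions[i % len(conditions)] for i, p in enumerate(participants)}

# Within-subjects: each participant sees every condition; randomizing the order
# per participant spreads out ordering and learning effects.
within = {}
for p in participants:
    order = conditions[:]
    random.shuffle(order)
    within[p] = order

print(between)
print(within)
```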

Performance Measurements
- Efficiency, learnability, user's preference
- Time, number of tasks completed, number of errors, severity of errors, number of times help was needed, quality of results, emotions, etc.
- Decide in advance what is relevant
- Can get quantifiable, objective numbers: "Usability Engineering" (lecture 9)
- Can instrument the software to take measurements, or try to log results "live" or from videotape (see the sketch below)
- Emotions and preferences come from questionnaires and from apparent frustration or happiness with the system
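One way such instrumentation is often done is to log timestamped events per participant and task, then derive task times and error counts afterwards. The event names and file format below are assumptions for illustration, not anything prescribed in the lecture.

```python
import csv
import time

class UsabilityLog:
    """Records timestamped events (task start/end, errors, help requests) to a CSV file."""
    def __init__(self, participant_id, path="session_log.csv"):
        self.participant = participant_id
        self.path = path
        self.rows = []

    def event(self, task, kind, detail=""):
        # kind might be "start", "end", "error", or "help" -- assumed labels
        self.rows.append({"participant": self.participant, "task": task,
                          "time": time.time(), "kind": kind, "detail": detail})

    def save(self):
        with open(self.path, "w", newline="") as f:
            writer = csv.DictWriter(
                f, fieldnames=["participant", "task", "time", "kind", "detail"])
            writer.writeheader()
            writer.writerows(self.rows)

# Task time = "end" timestamp minus "start" timestamp; errors = count of "error" rows.
log = UsabilityLog("P1")
log.event("task1", "start")
log.event("task1", "error", "clicked the wrong menu")
log.event("task1", "end")
log.save()
```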

Questionnaire Design
- Collect general demographic information that may be relevant: age, sex, computer experience, etc.
- Evaluate feelings towards your product and other products
- Important to design the questionnaire carefully
  - Users may find the questions confusing
  - It may not answer the question you think you are asking
  - It may not measure what you are interested in

Questionnaire, 2
- "Likert scale": propose something and let people agree or disagree, e.g., "The system was easy to use:" rated on a scale from agree to disagree
- "Semantic differential scale": two opposite feelings, e.g., "Finding the right information was:" rated on a scale from difficult to easy
- If there are multiple choices, rank order them, e.g., "Rank the choices in order of preference (with 1 being most preferred and 4 being least): Interface #1, Interface #2, Interface #3, Interface #4" (in a real survey, describe the interfaces)
- (A sketch of how such responses can be tallied follows)
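As a small illustration of how such responses are often tallied, the following sketch converts Likert answers to numbers and averages them per statement. The 1-5 coding and the response values are assumptions for the example, not data from the lecture.

```python
# Hypothetical responses: each participant rates each statement on a 1-5 scale,
# where 1 = "strongly disagree" and 5 = "strongly agree" (an assumed coding).
responses = {
    "The system was easy to use": [4, 5, 3, 4, 2],
    "Finding the right information was easy": [3, 4, 4, 5, 3],
}

for statement, scores in responses.items():
    mean = sum(scores) / len(scores)
    print(f"{statement}: mean = {mean:.1f} (n = {len(scores)})")
```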

Survey example: Hartson & Pyla, p. 446

Videotaping
- Often useful for taking measurements after the evaluation, but very slow to analyze and transcribe
- Useful for demonstrating problems to developers and management: it is compelling to see someone struggling
- Facilitates impact analysis: which problems will be most important to fix? How many users were affected, and how much time was wasted, on each problem?
- But careful notetaking will often suffice when usability problems are noticed

"Think Aloud" Protocols
- The "single most valuable usability engineering method"
- Get the user to continuously verbalize their thoughts
- Find out why the user does things: what they thought would happen, why they are stuck or frustrated, etc.
- Encourage users to expand on whatever is interesting
- But it interferes with timings
- May need to "coach" the user to keep talking: it is unnatural to describe what you are thinking
- Ask general questions: "What did you expect?", "What are you thinking now?"
- Not: "What do you think that button is for?", "Why didn't you click here?" These will "give away" the answer or bias the user
- Alternative: have two users and encourage discussion

Getting Users
- Should be representative
- If there are multiple groups of users, get representatives of each group, if possible
- Issues:
  - Managers will pick their most able people as participants
  - Getting users who are specialists (e.g., doctors, dental assistants); maybe you can get students or retirees
  - Paying users
  - Novices vs. experts: very different behaviors, performance, etc.

Number of Participants
- About 10 for statistical studies
- As few as 5 for a usability evaluation (see the aside below on why 5 is often enough)
- Can update the system after each user to correct problems
- But can be misled by "spurious behavior" of a single person: accidents, or someone who is just not representative
- Five users cannot evaluate all of a system
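As an aside not on the slide: a common justification for the "about 5" figure is Nielsen and Landauer's problem-discovery model, where the proportion of problems found with n users is roughly 1 - (1 - L)^n, and L (the chance that one user reveals a given problem) averaged about 0.31 in their data. A quick computation:

```python
# Problem-discovery curve (Nielsen & Landauer): proportion found = 1 - (1 - L)^n
L = 0.31   # average probability that one participant reveals a given problem
for n in (1, 3, 5, 10, 15):
    found = 1 - (1 - L) ** n
    print(f"{n:2d} users -> about {found:.0%} of problems found")
```

With these assumptions, five users find roughly 85% of the problems, which is why small iterative evaluations are usually preferred over one large study.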

Ethical Considerations
- No harm to the users
  - Emotional distress: highly trained people are especially concerned about looking foolish
  - Emphasize that the system is being evaluated, not the user
- Results of the evaluation and users' identities are kept confidential
- Stop the evaluation if the user gets too upset
- At the end, ask for comments, explain any deceptions, and thank the participants
- At universities, studies are reviewed by an "Institutional Review Board" (IRB)

Milgram Psychology Experiments
- Stanley Milgram
- The subject ("teacher", T) was told by the experimenter (E) to shock another person ("learner", L, an actor) whenever L got answers wrong
- Over 65% of subjects were willing to give apparently harmful electric shocks, up to 450 volts, to a pitifully protesting victim
- The study created emotional distress; some subjects needed significant counseling afterward
- (Image from Wikipedia)

Prepare for the Evaluation
- Set up a realistic situation
- Write up task scenarios
- Write a detailed script of what you will say
- PRACTICE
- Recruit users

Who Runs the Experiment?
- Trained usability engineers, called "facilitators", know how to run a valid usability evaluation
- Good methodology is important: 2-3 vs. 5-6 of 8 usability problems found
- But it is useful for developers and designers to watch, and to be available if the system crashes or the user gets completely stuck
- But you have to keep them from interfering (Randy Pausch's strategy)
- Having at least one observer (notetaker) is useful
- Common error: helping too early!

Where to Evaluate?
- Usability labs
  - Cameras, two-way mirrors, specialists
  - Separate observation and control rooms
  - Should disclose who is watching
  - Having one may increase the amount of usability evaluation done in an organization
- But you can usually perform an evaluation anywhere: use a portable video recorder, etc.

Stages of an Evaluation
- Preparation: make sure the evaluation is ready to go before the user arrives
- Introduction: say that the purpose is to evaluate the software; consent form; give instructions; pre-test questionnaire; write down an outline so the introduction is consistent for all users
- Running the evaluation
- Debriefing after the evaluation: post-test questionnaire, explain the purpose, thanks

Conduct the Observation
- Introduce the observation phase
- Instruct them on how to do a think-aloud
- Final instructions ("rules"):
  - "I won't be able to answer questions during the evaluation, but if questions cross your mind, say them aloud"
  - "If you forget to think aloud, I'll say 'Please keep talking'"

Cleaning up After an Evaluation
- For desktop applications: remove old files, recent-file lists, etc.
- Harder for evaluations of web sites: in real evaluations of web sites, you need to remove history to avoid giving hints to the next user (browser history, "cookies", etc.); a sketch of one approach follows
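One way to guarantee a clean slate between participants is to run each session in a throwaway browser profile. This sketch assumes Chrome and its --user-data-dir flag; the binary name and URL are placeholders, not part of the lecture.

```python
import shutil
import subprocess
import tempfile

# Create a fresh, empty profile for this participant so no history, cookies,
# or autocomplete entries from earlier sessions can give hints.
profile_dir = tempfile.mkdtemp(prefix="usability_session_")

subprocess.run([
    "google-chrome",                     # assumed browser binary name
    f"--user-data-dir={profile_dir}",    # isolate all browsing data in the temp profile
    "https://example.com/prototype",     # hypothetical URL of the site being evaluated
])

# After the session ends, delete the profile so nothing carries over.
shutil.rmtree(profile_dir, ignore_errors=True)
```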

Analyze Think-Aloud Data
- NOT just a transcription of the tape
- Establish criteria for critical incidents
- Record breakdowns and other observations (previously with a UAR template; now with a form that has one row per observation); a sketch of such a record follows
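The lecture only names the recording form, so the fields below are assumptions about what one row might hold (participant, task, time, description, severity), shown as a small data-structure sketch:

```python
from dataclasses import dataclass

@dataclass
class CriticalIncident:
    """One row of a critical-incident / breakdown log (fields are assumed, not prescribed)."""
    participant: str
    task: str
    timestamp: str        # e.g., video time code or wall-clock time
    description: str      # what happened and what the user said
    severity: int         # e.g., 0-4 on Nielsen's scale (see a later slide)

incidents = [
    CriticalIncident("P2", "task1", "00:04:12",
                     "Could not find the search box; scrolled up and down twice", 3),
]
```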

Analyzing the Data
- Numeric data, for example times, number of errors, etc.
  - Tables and plots using a spreadsheet
  - Look for trends and outliers (see the sketch below)
- Organize problems by scope and severity
  - Scope: how widespread is the problem?
  - Severity: how critical is the problem?
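The same trend-and-outlier check can also be done outside a spreadsheet; the task times below are made-up values just to show the idea:

```python
import statistics

# Hypothetical task-completion times in seconds, one per participant
times = [48, 52, 55, 61, 50, 120, 58, 53]

mean = statistics.mean(times)
stdev = statistics.stdev(times)
print(f"mean = {mean:.1f}s, stdev = {stdev:.1f}s")

# Flag values more than 2 standard deviations from the mean as possible outliers
outliers = [t for t in times if abs(t - mean) > 2 * stdev]
print("possible outliers:", outliers)
```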

Scope and Severity Separately

                            Few users affected     Many users affected
  Small impact on users     Low severity           Medium severity
  Large impact on users     Medium severity        High severity

(Scope = proportion of users experiencing the problem; impact = effect on the users who experience it.)
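If you want to derive the composite rating mechanically, the 2x2 grid can be encoded as a small lookup table; the "small"/"large" and "few"/"many" labels follow the slide, while the exact cutoffs are left to the evaluator:

```python
# Composite severity from the 2x2 scope-by-impact grid on the slide.
SEVERITY = {
    ("small", "few"):  "low",
    ("small", "many"): "medium",
    ("large", "few"):  "medium",
    ("large", "many"): "high",
}

def composite_severity(impact, proportion):
    """Look up the composite rating for an (impact, proportion) pair."""
    return SEVERITY[(impact, proportion)]

print(composite_severity("large", "many"))   # -> "high"
```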

Composite Severity Ratings (from Nielsen):
- 0: not a real usability problem
- 1: cosmetic problem only; need not be fixed
- 2: minor usability problem; low priority
- 3: major usability problem; important to fix
- 4: usability catastrophe; imperative to fix before releasing the product

Write a Summarizing Report
- "Executive" summary
- Conceptual re-designs are the most important part
- If just "tuning", then a "top ten" list
- Levels of severity help rank the problems
- A "highlights" video is often a helpful communications device

What to Do with the Results
- Modify the system to fix the most important problems
- Can modify after each user, if you don't need statistical results: no need for other users to "suffer"
- But remember: the user is not a designer