Lecture 6: How to Design a Good Usability Evaluation


Lecture 6: How to Design a Good Usability Evaluation
Brad Myers
05-863 / 45-888: Introduction to Human Computer Interaction for Technology Executives
Fall, 2017, Mini 2
© 2017 - Brad Myers

Homeworks
- HW #1 graded on Canvas
  - Can download the annotated pdf files
  - Can redo if more than 10 points off
- HW #2 due today
  - Generous late policy for #1, #2, and #3 (but not #4-6)
- Start on HW #3 – note that it is due on Monday
  - Shorter time to do it
  - Warning: it cannot be done quickly – there are two phases
  - Not diagrams this time – instead use the table in the template document

Why Evaluate with "Usability Evaluations"?
- Following guidelines is never sufficient for a good UI
- Need both good design and user studies (similar to working with users in CI)
- Note on terminology: "users", "subjects" → "participants"
(Chart: UI quality before and after user studies, for good designers vs. average designers)

"Don'ts" of Usability Evaluations
- Don't evaluate whether it works – that is quality assurance
- Don't have the experimenters evaluate it – get users
- Don't (just) ask users questions – it is not an "opinion survey"; instead, watch their behavior
- Don't evaluate with groups – see how well the system works for each person individually (not a "focus group")
- Don't train the users – you want to see if they can figure it out themselves
- Don't "test the user" – you are evaluating the system, so call it a "usability evaluation", not a "user test"
- Don't put your ego as a designer on the line

Issue: Reliability
- Do the results generalize to other people?
- Individual differences: up to a factor of 10 in performance
- If comparing two systems, use statistics for confidence intervals, p < .01
  - But we are rarely doing A vs. B studies
- Also, a small number of users cannot evaluate an entire site – they see just a sample
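
For the occasional A-vs-B comparison, a standard significance test over task times is enough. A minimal sketch in Python using SciPy (the timing data is invented for illustration):

    # Compare task-completion times (seconds) between two designs.
    # Hypothetical data for illustration only.
    from scipy import stats

    times_a = [38, 52, 44, 61, 40, 47, 55, 43, 50, 59]
    times_b = [33, 41, 36, 45, 38, 30, 42, 35, 39, 44]

    # Welch's t-test (does not assume equal variances)
    t, p = stats.ttest_ind(times_a, times_b, equal_var=False)
    print(f"t = {t:.2f}, p = {p:.4f}")  # judge against p < .01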

Issue: Validity
- Did the evaluation measure what we wanted?
- Wrong users
- "Confounding" factors: things that were not controlled but are not what the evaluation is meant to measure
  - Other usability problems, the setting, etc.
- Ordering effects
- Learning effects
- Too much help given to some users

Plan your Evaluation
- Goals:
  - Formative – help decide features and design → CIs
  - Summative – evaluate the system → now
- Pilot evaluations
  - Preliminary evaluations to check the materials, look for bugs, etc.
  - Evaluate the instructions and the timing
  - Users do not have to be representative

Evaluation Design
- "Between subjects" vs. "within subjects", for comparing different conditions
- Within: each user does all conditions
  - Removes individual differences
  - Adds ordering effects
- Between: each user does one condition
  - Quicker for each user
  - But needs more users, due to the huge variation among people
- Randomize the assignment of conditions to people, or their order (see the sketch below)
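
A hedged sketch of both designs in Python – random balanced assignment for between-subjects, and fully counterbalanced orders for within-subjects (participant IDs and condition names are made up):

    import itertools
    import random

    conditions = ["design_A", "design_B"]
    participants = [f"P{i}" for i in range(1, 9)]

    # Between subjects: shuffle, then alternate conditions so groups stay balanced.
    random.shuffle(participants)
    between = {p: conditions[i % len(conditions)] for i, p in enumerate(participants)}

    # Within subjects: rotate through every condition order so that
    # ordering effects average out (full counterbalancing).
    orders = list(itertools.permutations(conditions))
    within = {p: orders[i % len(orders)] for i, p in enumerate(participants)}

    print(between)
    print(within)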

Some Measurements
- Learnability: time to learn how to do specific tasks (to a specific proficiency)
- Efficiency: (expert) time to execute benchmark (typical) tasks; throughput
- Errors: error rate per task, time spent on errors, error severity
- Lots of measures from web analytics: abandonment rates, completion rates, clickthroughs, % completions, etc.
- Subjective satisfaction: questionnaire

Performance Measurements
- Time, number of tasks completed, number of errors, severity of errors, number of times help was needed, quality of results, emotions, etc.
- Decide in advance what is relevant
- Can get quantifiable, objective numbers – "usability engineering"
- Can instrument the software to take measurements, or try to log results "live" or from videotape
- Some measures are available from web analytics
- Emotions and preferences come from questionnaires and from apparent frustration or happiness with the system
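
Instrumenting the software can be as simple as appending timestamped events to a file. A minimal sketch (the event names and the CSV format are my assumptions, not from the lecture):

    # Append timestamped events; task times and error counts fall out later.
    import csv
    import time

    log_file = open("session_P1.csv", "a", newline="")
    writer = csv.writer(log_file)

    def log_event(participant, task, event):
        """Record one event: e.g. task_start, task_end, error, help_request."""
        writer.writerow([time.time(), participant, task, event])
        log_file.flush()

    # Called from the UI code at the relevant moments:
    log_event("P1", "task1", "task_start")
    log_event("P1", "task1", "error")
    log_event("P1", "task1", "task_end")
    # Task time = task_end timestamp - task_start timestamp;
    # errors per task = count of "error" rows.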

Goal Levels
- Pick levels for your system:
  - Minimum acceptable level
  - Desired (planned) level
  - Theoretical best level
  - Current level, or a competitor's level
(Chart: the four levels marked on a scale of errors per task)

Questionnaire Design
- Collect general demographic information that may be relevant: age, sex, computer experience, etc.
- Evaluate feelings towards your product and other products
- Important to design the questionnaire carefully:
  - Users may find the questions confusing
  - A question may not ask what you think it is asking
  - It may not measure what you are interested in

Questionnaire, 2
- "Likert scale": propose something and let people agree or disagree:
    The system was easy to use:
    agree 1 .. 2 .. 3 .. 4 .. 5 disagree
- "Semantic differential scale": two opposite feelings:
    Finding the right information was:
    difficult -2 .. -1 .. 0 .. 1 .. 2 easy
- If there are multiple choices, have people rank order them:
    Rank the choices in order of preference (with 1 being most preferred and 4 being least):
    Interface #1, Interface #2, Interface #3, Interface #4
    (in a real survey, describe the interfaces)
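
Tabulating such responses is straightforward; a small Python sketch (the item names and answers are fabricated, and the reverse-keyed item is flipped so that higher always means more positive):

    from statistics import mean, stdev

    responses = {                          # fabricated 5-point Likert data
        "easy_to_use":       [4, 5, 3, 4, 5],
        "hard_to_find_info": [2, 1, 2, 3, 1],   # reverse-keyed item
    }
    reverse_keyed = {"hard_to_find_info"}

    for item, scores in responses.items():
        if item in reverse_keyed:
            scores = [6 - s for s in scores]    # flip 1<->5, 2<->4
        print(f"{item}: mean = {mean(scores):.2f}, sd = {stdev(scores):.2f}")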

Question Design
- It is very hard to design questions that cannot be misunderstood or misread
  - Use clear writing and simple sentences
  - If users make mistakes, the questionnaire is invalid
- For example, keep all positive answers in one column; do not alternate positive and negative wording (ref: http://www.measuringu.com/positive-negative.php):
    "This website was easy to use."
    "It was difficult to find what I needed on this website."
- User confusion overrides trying to make people pay attention (it doesn't work)

Questionnaire, examples
- Example of a problem: "How big is the codebase for this project?"
  - Free-text answers come back wildly inconsistent: "300", "50k SLOC", "342658", "2000 files", "Big"
- Revised: offer ranges instead of a textbox:
  - Up to 1,000 LOC
  - 1,000 – 10,000
  - 10,000 – 100,000
  - 100,000 – 1,000,000
  - Bigger than 1,000,000

Standard (Validated) Questionnaires
- "Questionnaire for User Interface Satisfaction" (QUIS)
  - Chin, J.P., Diehl, V.A., and Norman, K.L. (1988). "Development of an Instrument Measuring User Satisfaction of the Human-Computer Interface." ACM CHI'88 Proceedings, 213-218. http://lap.umd.edu/quis/
  - Short version: http://hcibib.org/perlman/question.cgi?form=QUIS
  - The license costs money, but the short version can be downloaded for free; the long version has 16 pages of questions
- "System Usability Scale" (SUS) – a short questionnaire (10 items)
  - Brooke, J. (1996). "SUS: a 'quick and dirty' usability scale." In P. W. Jordan, B. Thomas, B. A. Weerdmeester, and A. L. McClelland (eds.), Usability Evaluation in Industry. London: Taylor and Francis.
  - http://www.usabilitest.com/uxxc4jP
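
SUS scoring is mechanical enough to automate: odd (positively worded) items contribute score − 1, even (negatively worded) items contribute 5 − score, and the sum is multiplied by 2.5 to give a 0-100 score. A sketch, with invented answers:

    def sus_score(responses):
        """Score one completed SUS form.
        responses: the ten answers, each 1-5, in questionnaire order."""
        assert len(responses) == 10
        total = 0
        for i, r in enumerate(responses, start=1):
            if i % 2 == 1:
                total += r - 1      # odd items are positively worded
            else:
                total += 5 - r      # even items are negatively worded
        return total * 2.5          # scale the 0-40 sum to 0-100

    print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # fabricated answers -> 85.0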

Survey example
(Figure: sample survey from Hartson & Pyla, p. 446)

(Figure: the SUS questionnaire at http://www.usabilitest.com/uxxc4jP)

Videotaping
- Often useful for taking measurements after the evaluation
  - But very slow to analyze and transcribe
- Useful for demonstrating problems to developers and management
  - It is compelling to see someone struggling
- Facilitates impact analysis: which problems will be most important to fix?
  - Based on how many users hit each problem, and how much time was wasted on it (see the sketch below)
- But careful notetaking will often suffice when usability problems are noticed
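
Impact analysis reduces to simple arithmetic once the problems are tallied. A small illustration (the problem names and numbers are invented):

    # Rank problems by impact = (users affected) x (average minutes wasted).
    # All data below is fabricated for illustration.
    problems = [
        ("confusing checkout button", 4, 3.5),
        ("hidden search field",       2, 6.0),
        ("unclear error message",     5, 1.0),
    ]
    for name, users, minutes in sorted(problems, key=lambda p: p[1] * p[2], reverse=True):
        print(f"{name}: impact = {users * minutes:.1f} user-minutes")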

"Think Aloud" Protocols
- The "single most valuable usability engineering method" – Nielsen
- Get the user to continuously verbalize their thoughts
  - Find out why the user does things: what they thought would happen, why they are stuck or frustrated, etc.
  - Encourage users to expand on whatever is interesting
  - But it interferes with timings
- May need to "coach" the user to keep talking – it is unnatural to describe what you are thinking
- Ask general questions: "What did you expect?", "What are you thinking now?"
  - Not: "What do you think that button is for?" or "Why didn't you click here?" – these "give away" the answer or bias the user
- Alternative: have two users work together and encourage discussion

Getting Users
- Should be representative
  - If there are multiple groups of users, get representatives of each group if possible
- Issues:
  - Managers will pick their most able people as participants
  - Getting users who are specialists (e.g., doctors, dental assistants) is hard
    - Maybe you can get students or retirees
  - Paying users
  - Novices vs. experts: very different behaviors, performance, etc.

Ideas for getting users
- If you want people who are not like you:
  - Laundromats
  - Coffee shops
- Paying people
  - Good for the general public
  - Not for companies
  - Not for "highly compensated" people (doctors)
  - But everyone likes a gift, like a Starbucks card

Number of participants
- About 20 for statistical studies
- As few as 5 for a usability evaluation
  - Can update the design after each user to correct problems
  - But you can be misled by the "spurious behavior" of a single person – accidents, or someone who is just not representative
  - (cite: https://www.nngroup.com/articles/how-many-test-users/)
- Five users cannot evaluate all of a system
  - Run different tests for different parts; you can't just make the tests longer
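
The cited nngroup article is based on the Nielsen-Landauer model: with n participants, the expected share of problems found is 1 − (1 − L)^n, where L ≈ 31% of problems are found per user in their data. A quick check of the numbers:

    # Expected fraction of usability problems found with n participants,
    # per Nielsen & Landauer: found(n) = 1 - (1 - L)^n, with L ~ 0.31.
    L = 0.31
    for n in range(1, 11):
        print(f"{n:2d} users: {1 - (1 - L) ** n:.0%} of problems found")
    # n = 5 already yields about 85%, hence "as few as 5" above.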

Another chart for number of participants
(Chart from https://www.nngroup.com/articles/how-many-test-users/: testing more users did not result in appreciably more insights)

Ethical Considerations
- No harm to the users, including emotional distress
  - Highly trained people are especially concerned about looking foolish
  - Emphasize that the system is being evaluated, not the user
- Keep the results of the evaluation and the users' identities secret
- Stop the evaluation if the user becomes too upset
- At the end, ask for comments, explain any deceptions, and thank the participants
- At universities, studies go through an "Institutional Review Board" (IRB)

Milgram Psychology Experiments
- Stanley Milgram, 1961-1962
- The subject (the "teacher", T) was told by the experimenter (E) to shock another person (the "learner", L, actually an actor) when L got answers wrong
- Over 65% of subjects were willing to give apparently harmful electric shocks – up to 450 volts – to a pitifully protesting victim
- The study created emotional distress; some subjects needed significant counseling afterward
- http://www.stanleymilgram.com/
(Image from Wikipedia: the experimental setup)

Authoring the Evaluation
- Set up a realistic situation
- Write up the task scenarios
- Write a detailed script of what you will say
- PRACTICE
- Recruit users

Who runs the experiment?
- Trained usability engineers ("facilitators") know how to run a valid usability evaluation
  - Good methodology matters: 2-3 vs. 5-6 of 8 usability problems found
- But it is useful for developers and designers to watch
  - They can be available if the system crashes or the user gets completely stuck
  - But you have to keep them from interfering (Randy Pausch's strategy)
- Having at least one observer (notetaker) is useful
- Common error: helping too early!

Where Evaluate? Usability Labs
- Cameras, 2-way mirrors, specialists
- Separate observation and control rooms
- Should disclose who is watching
- Having a lab may increase the amount of usability evaluation in an organization
- But you can usually perform an evaluation anywhere
  - Can use a portable video recorder, screen recorder, etc.

Stages of an Evaluation
1. Preparation
2. Introduction
3. Running the evaluation
4. Cleanup after the evaluation

Preparation and Introduction
- Make sure the evaluation is ready to go before the user arrives
- Introduce the observation phase:
  - Say that the purpose is to evaluate the software
  - Consent form
  - Pre-test questionnaire
- Give instructions
  - Instruct them on how to do a think-aloud
  - Write down a script, to make sure it is consistent for all users
- Final instructions ("rules"):
  - Say that you won't be able to answer questions during the evaluation, but if questions cross their mind, they should say them aloud
  - "If you forget to think aloud, I'll say 'Please keep talking'"
(Audio examples played in lecture: beginning and concluding an observation)

Running the Evaluation
- Run the think-aloud
- At the end:
  - Post-test questionnaire
  - Explain the purpose and any deceptions
  - Thank the participant

Cleaning up after an evaluation
- For desktop applications: remove old files, recent-file lists, etc.
- Harder for evaluations of web sites: need to remove the history so it gives no hints to the next user
  - Browser history, "cookies", etc.
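
One low-effort tactic (my suggestion, not from the lecture) is to never dirty the real profile at all: give each participant a throwaway browser profile. Chrome's --user-data-dir flag supports this:

    # Launch Chrome with a fresh, disposable profile per participant so no
    # history, cookies, or cache leak between sessions. The browser binary
    # name ("google-chrome") varies by platform.
    import shutil
    import subprocess
    import tempfile

    profile = tempfile.mkdtemp(prefix="usability_p1_")
    try:
        subprocess.run(["google-chrome",
                        f"--user-data-dir={profile}",
                        "https://example.com/task-start"])  # hypothetical start page
    finally:
        shutil.rmtree(profile)   # everything the session stored is discarded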

Analyzing the data
- Numeric data: times, number of errors, etc.
  - Tables and plots using a spreadsheet
  - Look for trends and outliers
- Organize the problems by scope and severity
  - Scope: how widespread is the problem?
  - Severity: how critical is the problem?
- Report template: http://www.cs.cmu.edu/~bam/uicourse/UsabilityEvalReport_template2016.docx
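
A spreadsheet is fine; the same trends-and-outliers pass can also be scripted. A minimal sketch (the times are fabricated; the 1.5-SD cutoff is a crude rule of thumb for such tiny samples):

    from statistics import mean, stdev

    task1_times = [48, 52, 45, 130, 50]   # seconds; fabricated, P4 struggled
    m, sd = mean(task1_times), stdev(task1_times)
    print(f"task1: mean = {m:.1f}s, sd = {sd:.1f}s")
    for i, t in enumerate(task1_times, start=1):
        if abs(t - m) > 1.5 * sd:          # flag participants far from the mean
            print(f"  outlier: P{i} took {t}s")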

Scope and Severity Separately
Severity combines the proportion of users experiencing the problem (few vs. many) with the impact of the problem on the users who experience it (small vs. large):

    Impact \ Proportion | Few users     | Many users
    Small               | Low severity  | Medium severity
    Large               | High severity | High severity
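
Encoded directly, the matrix becomes a tiny classifier (the slide shows a single "High" cell for large impact; this sketch assumes it applies regardless of how many users are affected):

    def severity(impact, proportion):
        """Classify a usability problem.
        impact: 'small' or 'large'; proportion: 'few' or 'many'."""
        if impact == "large":
            return "high"        # assumed to hold for both 'few' and 'many'
        return "medium" if proportion == "many" else "low"

    print(severity("small", "many"))   # -> medium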

Write a Summarizing Report
- Include an "executive" summary
- Conceptual re-designs are the most important findings
  - If the system just needs "tuning", give a "top ten" list
- Levels of severity help rank the problems
- A "highlights" video is often a helpful communications device

What to do with Results
- Modify the system to fix the most important problems
  - Can modify after each user, if you don't need statistical results – no need for other users to "suffer"
- But remember: the user is not a designer
  - Don't necessarily adopt the user's proposed fixes