Intro to Evaluation
See how (un)usable your software really is…

Why is evaluation done?
- Summative: assess an existing system; judge whether it meets some criteria.
- Formative: assess a system being designed; gather input to inform the design.
Summative or formative? It depends on the maturity of the system and on how the evaluation results will be used. The same technique can be used for either.

Other distinctions
- Form of results obtained: quantitative or qualitative
- Who is experimenting with the design: end users or HCI experts
- Approach: experimental, naturalistic, or predictive

Evaluation techniques
- Predictive modeling
- Questionnaires
- Empirical user studies (experiments)
- Heuristic evaluation
- Cognitive walkthrough
- Think-aloud (protocol analysis)
- Interviews
- Experience sampling
- Focus groups

Evaluation techniques
- Predictive evaluation: Fitts's law, Hick's law, etc. (a small worked example follows)
- Observation: think-aloud and cooperative evaluation; watch users perform tasks with your interface. We'll start talking about this today.
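
For a concrete sense of predictive evaluation, here is a minimal Python sketch of Fitts's law in its common Shannon formulation, MT = a + b * log2(D/W + 1); the constants a and b below are made up for illustration rather than fitted from real data.

    import math

    def fitts_mt(a, b, distance, width):
        # Predicted movement time (s) to a target of the given width at
        # the given distance; a and b come from a regression fit for a
        # particular pointing device (values here are invented).
        return a + b * math.log2(distance / width + 1)

    print(fitts_mt(0.1, 0.15, distance=200, width=20))  # about 0.62 s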

More techniques
- Empirical user studies (experiments): test hypotheses about your interface; examine dependent variables against independent variables. More next lecture…
- Interviews, questionnaires, focus groups: get user feedback. More next week…

Still more techniques
Discount usability techniques: use HCI experts instead of users; a fast and cheap way to get broad feedback.
- Heuristic evaluation: several experts examine the interface using guiding heuristics (like the ones we used in design).
- Cognitive walkthrough: several experts assess the learnability of the interface for novices.
You will do one of each of these.

And still more techniques
- Diary studies: users relate their experiences on a regular basis; they can write them down, call them in, etc.
- Experience Sampling Technique: interrupt users with a very short questionnaire on a random-ish schedule (a scheduling sketch follows).
Both are good for getting an idea of regular, long-term use in the field (the real world). We won't talk more about these…
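
The "random-ish" part is often stratified random sampling: one prompt at a random moment within each block of the day. A minimal sketch, with an invented 9:00-21:00 schedule:

    import random

    # One prompt at a random minute inside each 2-hour block from 9:00
    # to 21:00 (times are minutes after 9:00); the block size is made up.
    prompts = sorted(block * 120 + random.randrange(120) for block in range(6))
    print([f"{9 + t // 60:02d}:{t % 60:02d}" for t in prompts])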

Evaluation is Detective Work
Goal: gather evidence that can help you determine whether your usability goals are being met. Evidence (data) should be:
- Relevant
- Diagnostic
- Credible
- Corroborated

Data as Evidence
- Relevant: appropriate for addressing the hypotheses. E.g., does measuring "number of errors" provide insight into how effectively your new air traffic control system supports the users' tasks?
- Diagnostic: the data unambiguously provide evidence one way or the other. E.g., does asking users their preferences clearly tell you whether the system performs better? (Maybe.)

Data as Evidence
- Credible: are the data trustworthy? Gather data carefully, and gather enough of it.
- Corroborated: does more than one source of evidence support the hypotheses? E.g., both accuracy and user opinions indicate that the new system is better than the previous one. But what if completion time is slower?

General Recommendations
- Include both objective and subjective data, e.g. completion time and preference.
- Use multiple measures within a type, e.g. reaction time and accuracy.
- Use quantitative measures where possible, e.g. a preference score on a scale of 1-7. (A summary sketch follows.)
Note: only gather the data required, and do so with minimum interruption, hassle, time, etc.
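
To make this concrete, a minimal sketch that summarizes one objective and one subjective measure with descriptive statistics; the numbers are invented.

    import statistics as st

    completion_s = [38.2, 41.5, 35.9, 52.3, 40.1]  # objective: task time (s)
    preference = [6, 5, 7, 4, 6]                   # subjective: 1-7 scale

    for name, data in [("completion (s)", completion_s),
                       ("preference (1-7)", preference)]:
        print(f"{name}: mean={st.mean(data):.1f}, sd={st.stdev(data):.1f}")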

Making an evaluation plan
- What criteria are important?
- What resources are available? Evaluators, prototype, subjects, time.
- How much authenticity is required of the system?

Evaluation planning
- Decide on techniques, tasks, and materials: how many people, how long, how to record data, how to analyze data.
- Prepare materials: interfaces, storyboards, questionnaires, etc.
- Pilot the entire evaluation: test all materials, tasks, questionnaires, etc.; find and fix problems with wording and assumptions; get a good feel for the length of the study.

Recruiting Participants
Various "subject pools":
- Volunteers
- Paid participants
- Students (e.g., psych undergrads) for course credit
- Friends, acquaintances, family, lab members
- "Public space" participants, e.g., observing people walking through a museum; newsgroup lists
Participants must fit the user population (validity).
Note: ethics, IRB, and consent apply to *all* participants, including friends and "pilot subjects".

Consent
Why is it important?
- People can be sensitive about this process and its issues.
- Errors will likely be made, and the participant may feel inadequate.
- Sessions may be mentally or physically strenuous.
What are the potential risks? (There are always risks.)

Performing the Study
- Be well prepared so the participant's time is not wasted.
- Explain procedures without compromising the results.
- The session should not be too long, and the subject can quit at any time.
- Never express displeasure or anger.
- Data are to be stored anonymously and securely, and/or destroyed.
- Expect anything and everything to go wrong!! (a little story)

Data Inspection
Look at the results. First look at each participant's data: were there outliers, people who fell asleep, anyone who tried to mess up the study, etc.? (A screening sketch follows.) Then look at aggregate results and descriptive statistics.
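
One simple way to screen per-participant data before aggregating, sketched with hypothetical completion times and a crude z-score flag (the threshold of 2 is a judgment call, not a rule):

    import statistics as st

    # Hypothetical per-participant completion times (s)
    times = [38.2, 41.5, 35.9, 52.3, 40.1, 190.0]

    mean, sd = st.mean(times), st.stdev(times)
    for i, t in enumerate(times, 1):
        z = (t - mean) / sd
        flag = "  <-- inspect before including" if abs(z) > 2 else ""
        print(f"P{i:02d}: {t:6.1f} s (z={z:+.2f}){flag}")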

Inspecting Your Data
"What happened in this study?" Keep in mind the goals or hypotheses you had at the beginning. Questions:
- Overall, how did people do?
- The "5 W's": where, what, why, when, and for whom were the problems?

Making Conclusions
- Where did you meet your criteria? Where didn't you?
- What were the problems, and how serious are they?
- What design changes should be made? But don't make things worse…
- Prioritize and plan changes to the design.
- Iterate on the entire process.

Example: Heather's study
- Software: MeetingViewer interface, fully functional.
- Criteria: learnability, efficiency; see what aspects of the interface get used and what might be missing.
- Resources: subjects were students in a research group, just me as evaluator, plenty of time.
- Wanted a completely authentic experience.

Heather's evaluation
- Task: answer questions from a recorded meeting, using my software as desired.
- Think-aloud; videotaped, plus software logs; also had a post-study questionnaire.
- Wrote my own code for log analysis.
- Watched the video and matched behavior to the software logs.

Example materials

Data analysis
Basic data compiled:
- Time to answer a question (or give up)
- Number of clicks on each type of item
- Number of times audio played
- Length of audio played
- User's stated difficulty with the task
- User's suggestions for improvements
More complicated:
- Overall patterns of behavior in using the interface
- User strategies for finding information

Data representation example

Data presentation

Some usability conclusions
- Need fast-forward and reverse buttons (minor impact)
- Audio too slow to load (minor impact)
- Target labels are confusing; need something different that shows dynamics (medium impact)
- Need more labeling on the timeline (medium impact)
- Need different places for notes vs. presentations (major impact)

Observing Users
Not as easy as you think, but one of the best ways to gather feedback about your interface. Watch, listen, and learn as a person interacts with your system. Can be qualitative or quantitative, with end users, experimental or naturalistic.

Conducting an Observation
1. Determine the tasks.
2. Determine what data you will gather.
3. Obtain IRB approval if necessary.
4. Recruit participants.
5. Collect the data.
6. Inspect and analyze the data.
7. Draw conclusions to resolve design problems.
8. Redesign and implement the revised interface.

Observation
Direct (in the same room):
- Can be intrusive; users are aware of your presence.
- You only see it one time.
- May use a one-way mirror to reduce intrusiveness.
Indirect (video recording):
- Reduces intrusiveness, but doesn't eliminate it.
- Cameras focused on screen, face, and keyboard.
- Gives an archival record, but you can spend a lot of time reviewing it.

Location
Observations may be:
- In the lab, maybe a specially built usability lab: easier to control; can have the user complete a set of tasks.
- In the field: watch users' everyday actions; more realistic, but harder to control other factors.

Understanding what you see
In simple observation, you observe actions but don't know what's going on in the user's head. So observations often utilize some form of verbal protocol, in which users describe their thoughts.

Engaging Users in Evaluation
Qualitative techniques:
- Think-aloud: can be very helpful.
- Post-hoc verbal protocol: review the video.
- Critical incident logging: positive and negative.
- Structured interviews: good questions, e.g. "What did you like best/least?", "How would you change..?"
Identifying errors can be difficult.

Verbal Protocol
One technique: think-aloud. The user verbally describes what s/he is thinking and doing:
- What they believe is happening
- Why they take an action
- What they are trying to do

Think Aloud
A very widely used, useful technique that allows you to understand the user's thought processes better. Potential problems:
- Can be awkward for the participant.
- Thinking aloud can modify the way the user performs the task.

Cooperative approach
Another technique: co-discovery learning (constructive interaction).
- Join pairs of participants to work together, using think-aloud.
- Perhaps have one person be a semi-expert (the coach) and one a novice.
- More natural (like conversation), so it removes some of the awkwardness of individual think-aloud.
- Variant: let the coach be from the design team (cooperative evaluation).

Alternative
What if thinking aloud during the session would be too disruptive? You can use a post-event protocol: the user performs the session, then watches the video afterwards and describes what s/he was thinking. It is sometimes difficult to recall, and it opens the door to interpretation.

Issues
What if the user gets stuck on a task? You can ask (in cooperative evaluation):
- "What are you trying to do..?"
- "What made you think..?"
- "How would you like to perform..?"
- "What would make this easier to accomplish..?"
Maybe offer hints. This is why cooperative approaches are used; they can provide design ideas.

Inputs / Outcomes
What you need: an operational prototype (could use a Wizard of Oz simulation).
What you get out:
- "Process" or "how-to" information
- Errors and problems with the interface
- A comparison of the user's (verbalized) mental model to the designer's intended model

Historical Record
In observing users, how do you capture events in the session for later analysis?

Capturing a Session
1. Paper & pencil
- Can be slow
- May miss things
- Is definitely cheap and easy
Example log sheet (tasks across the top, times marked as they occur):

Time  | Task 1 | Task 2 | Task 3 | …
10:00 |
10:03 |
10:08 |
10:22 |

Capturing a Session
2. Recording (audio and/or video)
- Good for think-aloud
- Hard to tie to the interface; multiple cameras may be needed
- Good, rich record of the session
- Can be intrusive
- Can be painful to transcribe and analyze

Capturing a Session
3. Software logging
- Modify the software to log user actions; it can give time-stamped key presses or mouse events (a sketch follows).
- Two problems: the events are too low-level when you want higher-level events, and the massive amount of data requires analysis tools.
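
A minimal sketch of what such instrumentation might look like; the logger, file name, and pipe-delimited format here are invented for illustration, not the MeetingViewer's actual code.

    import time

    def log_event(logfile, user, app, action, *details):
        # One time-stamped, pipe-delimited event per line, e.g.
        # |hrichter|1130428800|MV|TAB|AGENDA|
        fields = [user, str(int(time.time())), app, action,
                  *map(str, details)]
        logfile.write("|" + "|".join(fields) + "|\n")

    with open("mv.log", "a") as f:
        log_event(f, "hrichter", "MV", "TAB", "AGENDA")
        log_event(f, "hrichter", "MV", "PLAY", 566)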

Example logs
|hrichter| |MV|START|
|hrichter| |MV|QUESTION|false|false|false|false|false|false|
|hrichter| |MV|TAB|AGENDA
|hrichter| |MV|TAB|PRESENTATION
|hrichter| |MV|SLIDECHANGE|
|hrichter| |MV|SEEK|PRESENTATION-A|566|604189|
|hrichter| |MV|SEEK|PRESENTATION-A|566|604189|
|hrichter| |MV|SEEK|PRESENTATION-A|566|604189|
|hrichter| |MV|TAB|AGENDA
|hrichter| |MV|SEEK|AGENDA|566|149613|
|hrichter| |MV|TAB|PRESENTATION
|hrichter| |MV|SLIDECHANGE|
|hrichter| |MV|SEEK|PRESENTATION|566|315796|
|hrichter| |MV|PLAY|566|
|hrichter| |MV|TAB|AV
|hrichter| |MV|TAB|PRESENTATION
|hrichter| |MV|SLIDECHANGE|
|hrichter| |MV|SEEK|PRESENTATION|566|271191|
|hrichter| |MV|TAB|AV
|hrichter| |MV|TAB|PRESENTATION
|hrichter| |MV|TAB|AGENDA
|hrichter| |MV|TAB|PRESENTATION
|hrichter| |MV|TAB|AV
|hrichter| |MV|TAB|AGENDA
|hrichter| |MV|TAB|AV
|hrichter| |MV|STOP|566|
|hrichter| |MV|END

Analysis
Many approaches:
- Task based: how do users approach the problem? What problems do users have? Need not be exhaustive; look for interesting cases.
- Performance based: frequency and timing of actions, errors, task completion, etc. (a counting sketch follows).
Very time consuming!!
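
On the performance-based side, even a simple frequency count over the logs is informative. A minimal sketch that parses the invented pipe-delimited format from the earlier logging sketch (user, timestamp, app, action, details):

    from collections import Counter

    counts = Counter()
    with open("mv.log") as f:
        for line in f:
            fields = [p.strip() for p in line.split("|") if p.strip()]
            if len(fields) >= 4:
                counts[fields[3]] += 1  # action follows user, timestamp, app

    for action, n in counts.most_common():
        print(f"{action:12s} {n}")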

Usability Lab
A large viewing area behind a one-way mirror, which includes an angled sheet of glass that improves light capture and prevents sound transmission between rooms. Doors for the participant and observation rooms are located such that participants are unaware of observers' movements in and out of the observation room.

Observation Room
A state-of-the-art observation room equipped with three monitors to view the participant, the participant's monitor, and a composite picture-in-picture. The one-way mirror plus angled glass captures light and isolates sound between rooms. Comfortable and spacious for three people, but with room enough for six seated observers. A digital mixer allows unlimited mixing of input images and recording to VHS, S-VHS, or MiniDV recorders.