An Introduction to Usability Testing
Bill Killam, MA CHFP
Adjunct Professor, University of Maryland
bkillam@user-centereddesign.com
User-Centered Design www.user-centereddesign.com
The Background
Definitions
“Usability testing” is the common name for several forms of both user-based and non-user-based system evaluation
Practiced for many years prior, but popularized in the media by Jakob Nielsen in the 1990s
What does “usability” mean?
ISO 9126
– A set of attributes that bear on the effort needed for use, and on the individual assessment of such use, by a stated or implied set of users
ISO 9241
– “Extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use.”
What does “usability” mean?
Jakob Nielsen
– Satisfaction
– Efficiency
– Learnability
– Low errors
– Memorability
Ben Shneiderman
– Ease of learning
– Speed of task completion
– Low error rate
– Retention of knowledge over time
– User satisfaction
Usability Defined
Accessibility
– A precursor to usability: if users cannot gain access to the product, its usability is a moot point
Functional Suitability
– Does the product contain the functionality required by the user?
Functional Discoverability
– Can the user “discover” the functions of a product?
Ease-of-learning
– Can the user figure out how to exercise the functionality provided?
Ease-of-use
– Can the user exercise the functionality accurately and efficiently once it is learned (includes accessibility issues)?
– Can users use it safely?
Ease-of-recall
– Can the knowledge of operation be easily maintained over time?
Safety
– Can the user operate the system in relative safety, and recover from errors?
Subjective Preference
– Do users like using it?
Usability Testing is…
…any of a number of methodologies used to assess, based on our understanding of human capabilities and limitations, how a product’s design contributes to or hinders its use when used by the intended users to perform the intended tasks in the intended environment
Usability Testing is not…
…basic or even applied research
…a large-scale effort that can demonstrate internal and external reliability and validity
…something that requires statistical analysis to determine the significance of the results
…the user-centered design process (it is only one of the standard tools in the process)
Usability Testing is not…
…Market Research
– Qualitative & quantitative market research is an attempt to understand the actual or potential user population for a product or service
– Size of markets
– Reasons for purchasing/not purchasing
Usability Testing is not…
…User Acceptance Testing
– A process to ensure the functional specifications have been met
– It occurs after development, just before a product is shipped
– It tells you the functionality exists, regardless of whether it is “usable”
– It may capture some subjective preference data, but that data is not truly based on the product’s usability
When is usability assessed?
On an existing product, to determine if usability problems exist
During the design phase of a product
During the development phase of a product, to assess proposed changes
Once a design is completed, to determine if goals were met
(Each of these represents a different purpose with a different approach, and they are often funded from different sources.)
The Methods
Types of Usability Testing
Non-User-Based
– Compliance Reviews
– Expert Reviews (Heuristic Evaluations)
– Cognitive Walkthroughs
User-Based
– Ethnographic Observation
– User Surveys/Questionnaires
– Think Aloud Protocols
– Performance-based Protocols
– Co-Discovery Protocols
Non-User-Based Testing
Compliance Testing
Possible (within limits) to be performed by anyone
Can remove the low-level usability issues that often mask more significant usability issues
Compliance Testing (concluded)
Style Guide-based Testing
– Checklists
– Interpretation issues
– Scope limitations
Available Standards
– Commercial GUI & Web standards and style guides
– Domain-specific GUI & Web standards and style guides
– Internal standards and style guides
Interface Specification Testing
– May revert to user acceptance testing
Expert Review
a.k.a. Heuristic Evaluation
One or two usability experts review a product, application, etc.
Free-format review, or structured review based on heuristics
Subjective, but based on sound usability and design principles
Highly dependent on the qualifications of the reviewer(s)
Expert Review (concluded)
Nielsen’s 10 Most Common Mistakes Made by Web Developers (three versions)
Shneiderman’s 8 Golden Rules
Constantine & Lockwood Heuristics
Forrester Group Heuristics
Norman’s 4 Principles of Usability
Cognitive Walkthrough
Team approach
Best with a diverse population of reviewers
Issues related to cognition (understanding) more than presentation
Also low-cost usability testing
Highly dependent on the qualifications of the reviewer(s)
User-Based Testing
User Surveys
Standardized vs. custom-made
Formats
– Closed-ended
– Open-ended
– Scaled questions
Cognitive testing of the survey itself
User bias in evaluations
– Leniency effect
– Central tendency
– Strictness bias
Strategic responding
– The intent of the survey is so obvious that the participants attempt to respond in a way they believe will help
Self-selection versus directed surveys
User Interviews
Formats
– Closed-ended
– Open-ended
– Likert scale questions
Strategic responding
– The intent of the interview is so obvious that the participants attempt to respond in a way they believe will help
“Ethnographic Observation”
Field study
Natural environment for the user
Can be time consuming and logistically prohibitive
How to Design & Conduct a User-Based Test
The Rules
You can do “some friends and a six-pack” testing
User-based testing should be developed using the model of a scientific experiment (not all are)
User-based testing should follow the ethical guidelines for the treatment of human subjects (but not all do)
User-based testing should not be constrained by the rules of reliability and repeatability (more on this later)
Test Set-up
What’s the hypothesis?
– Required for research
– Required for usability testing?
Define Your Variables
– Dependent and independent variables
– Confounding variables
– Operationalize your variables
Participant Issues
User-types
– Users versus user surrogates
– All profiles or specific user profiles/personas?
– Critical segments?
How many?
– Relationship to statistical significance
– “Discount Usability” – whose rule?
Participant Issues (concluded)
Selecting subjects
– Screeners
– Getting subjects: convenience sampling, recruiting, participant stipends, over-recruiting
Scheduling
Test Set-up (concluded)
Select a Protocol (Part A)
– Within versus between subject designs
– Mostly based on the number of items to be tested
– Might be based on time commitment
Select a Protocol (Part B)
– Non-interrupted think aloud (with or without a critical incident analysis)
– Interrupted think aloud
– Interrupted performance-based
– Non-interrupted performance-based (with or without a critical incident analysis)
Special Cases
– Co-discovery
– “Group”
Defining Task Scenarios
Areas of concern, redesign, or interest
Interrelationship issues
Short, unambiguous tasks to be performed
Wording is critical
– In the user’s own terms
– Does not contain “seeds” of the correct solution
Enough to form a complete test, but able to stay within the time limit
– Flexibility is key
– Variations ARE allowed
Defining Task Scenarios (concluded)
Scenarios are contrived for testing, may not be representative of real-world usage patterns, and are NOT always required
Preparing Test Materials
Consent form!
Video release form!
Receipt and confidentiality agreement!
Demographic survey
Facilitator’s Guide
– Introductory comments
– Participant task descriptions
– Questionnaires: SUS, Cooper-Harper, etc.
Note Taker’s Forms
Piloting the Design
Getting subjects
– Convenience sampling
– Almost anyone will do
Collect data
Check task wording
Check timing
Facilitating
Rogerian principles apply
– Unconditional positive regard
– Empathy
– Congruence
Rogerian techniques are used
– Minimal encouragers
– Reflections
– Summarization
– Open-ended questions
Objectiveness
Collecting Data
The data is NOT in the interface; the data is in the user!
Collecting observed data
– Behavior
– Reactions
Collecting participant comments
Collecting subjective data
– Pre-test data
– Post-scenario data
– Post-test data
What are we looking for in a usability evaluation?
Perceptual issues (our brains versus our senses)…
Gestalt principles (each illustrated with a figure in the original deck):
Unity
Continuation
Proximity
Prägnanz (Common fate)
Similarity
Closure
Before: Where do you start?
After: Notice two Gestalt principles in use – similarity and visually assisted groupings
…our perceptual abilities are limited in the presence of noise…
(Figures: “THE QUICK BROWN FOX JUMPED OVER THE LAZY DOG’S BACK.” rendered in all caps, in visual noise, and in mixed case)
…our presumptions affect what and how we “see” things…
(Figures: the Müller-Lyer Illusion and the Ames Room)
…our attention and cognitive abilities are limited and specialized…
Jack and Jill went went up the Hill to fetch a a pail of milk
A Bruner and Minturn (1955) study showed users either L, M, Y, & K or 16, 17, 10, & 12, then asked them to identify: I3
FINISHED FILES ARE THE RE- SULT OF YEARS OF SCIENTIF- IC STUDY COMBINED WITH THE EXPERIENCE OF MANY YEARS
Stroop Test
Red Green Blue Orange Yellow Black
Orange Yellow Green Black Blue Red
(In the original deck these word lists are printed in mismatched ink colors; the task is to name the ink color rather than read the word)
Test Your Attention
Person Swap “Experiment”
…our memory affects our abilities…
…our psychology affects our abilities…
…and we need protection from ourselves…
4 Heuristics
Reporting the Results
People don’t understand statistics
– “There’s a 20% chance of snow on the ground on Christmas day. And since we haven’t had snow on the ground for 4 years, we should have it this year.”
– 30% chance of rain in the DC area
– There’s as much of a chance of the lottery producing 0-0-0-0-0 as any other number
– There’s a greater chance of a tree falling on you in a thunderstorm than of being struck by lightning. But there’s a greater chance of being struck by lightning than of winning the Virginia lottery
People don’t provide useful statistics
– 4 out of 5 dentists surveyed say…
– The average number of clicks to accomplish [a task in a test] was 5
– The average time to complete a task was 20 seconds
– “40% of the people who try to find data on [the site] can’t find what they are looking for” (based on the results of a single study)
Predictive Statistics
Predictive (or inferential) statistics is a means of drawing conclusions about a population based on data from a sample
This type of statistic can only be reported when sampling is truly representative of the underlying population, and specific procedures must be followed to ensure the results support correct conclusions about the population
To use predictive statistics, the test must be both valid and reliable
Validity
Validity is the degree to which the results of a research study provide trustworthy information about the truth or falsity of the hypothesis (from Cherulnik, P.D. (2001). Methods for Behavioural Research: A Systematic Approach)
Validity (continued)
Internal validity refers to the situation where “experimental treatments make a difference in this specific experimental instance” (from Campbell, D.T. & Stanley, J.C. (1963). Experimental and Quasi-experimental Designs for Research)
– Failure to counterbalance
– Learning curve effects (ease-of-learning versus ease-of-use)
– Disruptive protocols
External validity asks the question of “generalizability”
– Sample size issues
– Representativeness issues
– Nonrepresentative research context (e.g., the Hawthorne effect)
Validity (concluded)
Construct validity issues
– Content validity and face validity are closely related. Content validity is an assessment that the test is valid as judged by experts. Face validity is an assessment that the test is valid as judged by the casual observer (a non-expert)
– Inadequate operationalization refers to a failure to define what will be measured in a way that is necessary and sufficient
– Measurement artifacts refer to measurement biases such as experimenter expectancy or strategic responding during think-aloud protocols
– Concurrent validity refers to the relationship between this test and other tests that purport to measure the same data
Reliability
Reliability is the ability of a test to show the same results if conducted multiple times
– Test-retest reliability
– Repeatability
– Reproducibility
Use of Confidence Intervals
“You just finished a usability test. You had 5 participants attempt a task in a new version of your software. All 5 out of 5 participants completed the task. You rush excitedly to tell your manager the results so you can communicate this success to the development team. Your manager asks, ‘OK, this is great with 5 users, but what are the chances that 50 or 1000 will have a 100% completion rate?’” – Jeff Sauro
The confidence level tells the informed reader the likelihood that another sample will provide the same results. In other words, if you ran the test again, what value are you likely to get next time?
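Sauro's manager question can be made concrete with a little arithmetic: if the true completion rate in the population were p, the chance of an independent run of 5 participants all succeeding is p^5. The sketch below (Python, hypothetical rates chosen for illustration) shows that even a mediocre true rate produces a clean 5/5 result surprisingly often:

```python
def p_all_succeed(true_rate, n=5):
    """Chance that all n participants complete the task,
    assuming independent attempts at the given true rate."""
    return true_rate ** n

for rate in (0.5, 0.7, 0.9):
    print(f"true rate {rate:.0%}: chance of a 5/5 result is {p_all_succeed(rate):.1%}")
```

With a 70% true completion rate, roughly one test in six still yields a perfect 5/5, which is exactly why 5/5 alone does not justify a claim about 1000 users.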
Use of Confidence Intervals (concluded)
The confidence interval is determined (in large part) by the sample size: the more people you test, the more of the population’s variability your sample captures, and the better your confidence (i.e., the smaller the confidence interval)
Since time and cost are directly proportional to the sample size, there is a trade-off between the margin of error/confidence interval and the cost of the evaluation
Descriptive Statistics
Descriptive statistics summarize data without implying or inferring anything beyond the sample tested
– This type of statistic can always be reported
– This includes the mean, median, mode, variance, standard deviation, and covariance
– But it should be reported in an appropriate way
Descriptive Statistics
Providing the average (e.g., 5.5) is not sufficient…
(Figure: a distribution of scores with Average = 5.5)
Descriptive Statistics
Adding the standard deviation is only slightly more “helpful”
(Figures: Average = 5.5, SD = 3.9 versus Average = 5.5, SD = 0.5)
Descriptive Statistics
But the data often shows other patterns, such as bimodal distributions. In these cases, the average and standard deviation are not adequate…
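The bimodal point can be demonstrated with Python's statistics module on two hypothetical samples (invented for illustration) that share the same mean and median but have completely different shapes:

```python
from statistics import mean, median, multimode, stdev

# two hypothetical rating samples with the same mean and median
tight   = [5, 6, 5, 6, 5, 6, 5, 6, 5, 6]       # everyone scored near 5.5
bimodal = [1, 1, 1, 1, 1, 10, 10, 10, 10, 10]  # nobody scored near 5.5

for name, data in (("tight", tight), ("bimodal", bimodal)):
    print(f"{name}: mean={mean(data)} median={median(data)} "
          f"sd={round(stdev(data), 2)} modes={multimode(data)}")
```

Both samples report a mean and median of 5.5, yet in the bimodal sample no participant ever gave a score anywhere near 5.5; only the modes (and a histogram) reveal that.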
Means, Medians, and Modes
Some people suggest the use of the median in lieu of the average (or mean)
Some people suggest the use of the mode
Effectiveness Data
Effectiveness data can be operationalized in a number of ways, but is generally operationalized as success or failure to complete a task
Completion rate as a pass/fail criterion can be measured objectively if the criterion is pre-determined and is not subjective
Best estimates, error rate, and the confidence interval can be calculated easily for a pass/fail measure of completion rate using a binomial calculation
Completion Rate (concluded)
Example:
– If 4 out of 8 complete a task, you can say, with 95% certainty, that somewhere between 21% and 78% of all users will complete the task
– If 8 out of 8 complete a task, you can only say, with 95% certainty, that somewhere between 71% and 100% of all users will complete the task
– If 7 out of 8 complete the task, it’s 51% to 99%
– If 6 out of 8 complete the task, it’s 40% to 93%
If you want to state that 90% of users will complete a specific task (+/- 5%), and all conditions of validity and reliability have been met…
– You need to test 156 people and have 142 of them complete the task
– (For a confidence interval of 5% (+/- 2.5%), you’d need 450 people)
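One common way to compute such completion-rate intervals is the Wilson score interval. The sketch below reproduces the 4-through-8-of-8 examples to within a point or two; the deck's exact figures likely come from a different (exact binomial) calculation, so small differences are expected:

```python
import math

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score interval for a binomial completion rate."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

for k in (4, 6, 7, 8):
    lo, hi = wilson_ci(k, 8)
    print(f"{k}/8 completed: {lo:.0%} to {hi:.0%}")
```

Note how wide every interval is at n = 8: even a perfect 8/8 run only supports a lower bound in the high 60s.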
Efficiency Data – Time on Task
Efficiency data can be operationalized in a number of ways, time on task being the most common
Time on task can be measured objectively
External time is important to management, but is not necessarily important to users, and time on task does not correlate with effectiveness (except in extreme cases)
Though the data is not necessarily normally distributed, even with fairly large samples, a t-test can be used to compare times provided it is valid
– It is not valid in think-aloud or exploratory protocols
– It can be invalidated by sampling issues
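Where a t-test on task times is valid, Welch's unequal-variance form is a common choice since two designs rarely produce equally variable times. A minimal sketch on hypothetical timing data (all numbers invented for illustration; the deck does not specify a particular t-test variant):

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two
    independent samples with possibly unequal variances."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# hypothetical task times in seconds, old design vs. new design
old = [30, 35, 40, 45, 50]
new = [20, 25, 30, 35, 40]
t, df = welch_t(old, new)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The resulting t and degrees of freedom would then be checked against a t table; with samples this small, the test has little power, which echoes the deck's caution about when timing comparisons are worth making at all.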
Satisfaction Data
Satisfaction data can be operationalized in a number of ways, but is always opinion data
– Standardized survey instrument (e.g., SUS)
– Simple Likert scale assessments
Questionnaires suffer from numerous issues that threaten their validity
– Halo effect, leniency effect, strictness effect, bias
Satisfaction data does not correlate with performance except in extreme situations
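For reference, the standard SUS scoring rule: each of the 10 items is rated 1 to 5; odd-numbered items contribute (rating − 1), even-numbered items contribute (5 − rating), and the sum is multiplied by 2.5 to give a 0-100 score. A sketch with hypothetical responses:

```python
def sus_score(responses):
    """Standard SUS scoring: 10 items rated 1-5, result on a 0-100 scale."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 item responses")
    total = sum((r - 1) if i % 2 == 0 else (5 - r)  # items 1,3,5,... are positive-toned
                for i, r in enumerate(responses))
    return total * 2.5

# hypothetical response sets
print(sus_score([5, 1] * 5))   # best possible answers -> 100.0
print(sus_score([3] * 10))     # all-neutral answers   -> 50.0
```

The alternating scoring is what lets SUS mix positively and negatively worded items, which partly counters the leniency and strictness effects listed above.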
Satisfaction Data (continued)
Take a poll of participants comparing the new product against the old version of the product. People might be asked to comment on the statement “The new design is an improvement over the old design.” and given a choice of answers from “Definitely, it’s the tops.” to “No, definitely not, it’s awful.”
The data would be a collection of opinions. Assume the following scale and results…
Satisfaction Data (continued)
Though it appears to be one, this is not an interval scale, since you cannot state that an opinion rated as 3 is three times weaker than an opinion rated as 1. There is, however, an order to the different opinions (it’s an ordinal scale).
Descriptive statistics (average, standard deviation) cannot be meaningfully calculated, nor can parametric statistics be used to analyze the results, since the data is not normally distributed
You can use a non-parametric analysis such as a rank sum analysis
Rank Sum Analysis
The hypothesis you would want to test would be: “The participants consider the new product an improvement.”
A quick ‘eyeball’ test shows that none of those questioned thought it was awful and only one person thought it not very good, so a first impression is that people generally approve.
If you start by assuming that in the population there is no opinion one way or the other, and that people’s responses are symmetrically distributed about ‘no opinion’, you can test the hypothesis that people think the new design is an improvement, with the null hypothesis that people have no opinion about it, their response being the median value 4.
Rank Sum Analysis (continued)
You need to be careful to choose the appropriate test statistic for the problem you are tackling
– For a one-tailed test, where the alternative hypothesis is that the median is greater than a given value, the test statistic is W-. For a one-tailed test, where the alternative hypothesis is that the median is less than a given value, the test statistic is W+.
– For a two-tailed test, the test statistic is the smaller of W+ and W-
As people who think it an improvement will give a rating of less than 4, the null and alternative hypotheses can be stated as follows:
– H0: the median response is 4
– H1: the median response is less than 4
– One-tailed test, significance level 5%
Rank Sum Analysis (continued)
List the values
Find the difference between each value and the median
Ignore the zeros and rank the absolute values of the remaining scores: ignoring the signs, start with the smallest difference and give it rank 1. Where two or more differences have the same value, find their mean rank and use this.
Now check that W+ + W- is the same as ½n(n+1), where n is the number in the sample (having ignored the zeros). In this case n = 10.
– ½n(n+1) = ½ x 10 x 11 = 55
– W+ + W- = 9.5 + 45.5 = 55
Rank Sum Analysis (concluded)
Compare the test statistic with the critical value in the tables. If the null hypothesis were true, and the median is 4, you would expect W+ and W- to have roughly the same value.
There are two possible test statistics here, W+ = 9.5 and W- = 45.5, and you have to decide which one to use. We are interested in W+, the sum of the ranks of ratings greater than 4.
W+ is much less than W-, which suggests that more people felt the new design was an improvement. It could also suggest that those who expressed a negative view expressed a very strong one, with lots of high numbers in the ratings.
Now you need to compare the value of W+, the test statistic, with the critical value from the table. Given that W+ is small, the key question becomes “Is W+ significantly smaller than would happen by chance?” The table helps you decide this by supplying the critical value. For a sample of 10, at the 5% significance level for a one-tailed test, the value is 10. As W+ is 9.5, which is less than this, the evidence suggests that we can reject the null hypothesis.
Your conclusion is that the evidence shows, at the 5% significance level, that users think the new design is better than the old design
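The W+/W- computation walked through above can be reproduced in a few lines. The ratings below are hypothetical, chosen so that they yield the deck's worked figures (W+ = 9.5, W- = 45.5, n = 10 after dropping responses equal to the median of 4):

```python
def signed_rank(values, median):
    """W+ and W- for a Wilcoxon signed-rank test against a
    hypothesised median, using mean ranks for tied differences."""
    diffs = [v - median for v in values if v != median]   # drop zeros
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and abs(diffs[order[j]]) == abs(diffs[order[i]]):
            j += 1
        mean_rank = (i + j + 1) / 2        # average of 1-based ranks i+1 .. j
        for k in order[i:j]:
            ranks[k] = mean_rank
        i = j
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus

# hypothetical ratings on the 1-7 scale (4 = no opinion)
ratings = [5, 6, 3, 3, 3, 3, 2, 1, 1, 1, 4]
wp, wm = signed_rank(ratings, 4)
print(wp, wm)     # 9.5 45.5; sanity check: wp + wm = 55 = 1/2 * 10 * 11
```

The returned W+ would then be compared against the one-tailed critical value from a signed-rank table, exactly as the slide describes.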
Summary
With discount usability, or any exploratory approach, any predictive statistics calculated are unlikely to be valid (or will not be worth the effort)
Descriptive statistics are valid with small numbers, but histograms are a better way to show data from small samples
Summary (continued)
The “real” data from small-number testing comes from the “Principle of Inter-ocular Drama” (results so obvious they hit you between the eyes) and the observer’s ability to explain it
Histograms
A histogram is probably the best way to present the data, since it shows the most data of interest…
(Figure: a histogram of score versus the number of participants who got that score)
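For small samples, even a text histogram conveys the shape of the data. A sketch with hypothetical scores; note the bimodal pattern that an average alone would hide:

```python
from collections import Counter

def hist_lines(scores):
    """Render a score distribution as rows of '#' marks."""
    counts = Counter(scores)
    return [f"{s:2d} | {'#' * counts[s]}"
            for s in range(min(scores), max(scores) + 1)]

# hypothetical scores: mean ~5.8, yet no one scored 5 or 6
for line in hist_lines([2, 3, 3, 4, 7, 8, 8, 8, 9]):
    print(line)
```

The empty rows in the middle of the output are precisely the information that a reported mean of 5.8 would conceal.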
Histograms (continued)
(Figures: SUS and Cooper-Harper histograms)
Histograms (concluded)
(Figures: MCH and SUS data histograms for DC and Memphis)
Bold form labels draw users’ eyes away from the form and reduce usability. Consider removing the bold, and possibly bolding the content instead.
There are 50 hyperlinks on the home page (not including primary navigation), representing four levels within the clinical trial section and direct links to other parts of NCI
The redesigned page as it looks today.
State Pages
The placement of this group suggests secondary content, not primary content. Consider moving this to the left side of the page.
Participants had difficulty understanding what content was searched
– Many thought all content in Clinical Trials would be searched, not just ongoing trials
– A few participants wanted to use the global NCI search to search Clinical Trials (consider labelling this “Search NCI” or “NCI Search”)
– Some participants responded to the term “Find” even when the search form was on the page
“Disciplines”
Participants (without prior exposure) failed to recognize the five primary disciplines as navigational elements. The most common expectation (if noticed at all) was that the links would provide definitions of the terms.
Summary (concluded)
Effectiveness, efficiency, and satisfaction are not correlated with each other
Effectiveness (as defined here) is correlated with the root definition of usability, so efficiency and satisfaction should only be used to add supplementary information after acceptable levels of effectiveness have been demonstrated or are presumed to be acceptable
Conclusions
Lack of understanding is on both sides…
Actual contract requirement: The selected vendor will be required to demonstrate that one design (or one version of a design) is better than another design (or version) using 8 participants per study
Test a design of a system using 32 subjects total, representing 8 user groups, to show the new design is better than the old design. Use a think-aloud protocol and collect mouse clicks and time data. The old design will not be made available for comparison.
Conclusions (concluded)
Any testing is better than no testing
The more you know about experimental design, the better your testing will be; but the more you know about users, the better the data you can get from any testing
Testing with human subjects is serious business and should not be conducted lightly