An Introduction to Usability Testing
Bill Killam, MA CHFP
Adjunct Professor, University of Maryland
bkillam@user-centereddesign.com
User-Centered Design www.user-centereddesign.com
The Background
Definitions
“Usability testing” is the common name for several forms of both user-based and non-user-based system evaluation
Practiced for many years prior, but popularized in the media by Jakob Nielsen in the 1990s
What does “usability” mean?
ISO 9126
– A set of attributes that bear on the effort needed for use, and on the individual assessment of such use, by a stated or implied set of users
ISO 9241
– “Extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use.”
What does “usability” mean?
Jakob Nielsen
– Satisfaction
– Efficiency
– Learnability
– Low errors
– Memorability
Ben Shneiderman
– Ease of learning
– Speed of task completion
– Low error rate
– Retention of knowledge over time
– User satisfaction
Usability Defined
Accessibility
– A precursor to usability: if users cannot gain access to the product, its usability is a moot point
Functional Suitability
– Does the product contain the functionality required by the user?
Functional Discoverability
– Can the user “discover” the functions of a product?
Ease-of-learning
– Can the user figure out how to exercise the functionality provided?
Ease-of-use
– Can the user exercise the functionality accurately and efficiently once it is learned (includes accessibility issues)?
– Can users use it safely?
Ease-of-recall
– Can the knowledge of operation be easily maintained over time?
Safety
– Can the user operate the system in relative safety, and recover from errors?
Subjective Preference
– Do users like using it?
Usability Testing is…
…any of a number of methodologies used to assess, based on our understanding of human capabilities and limitations, how a product’s design contributes to or hinders its use when used by the intended users to perform the intended tasks in the intended environment
Usability Testing is not…
…basic or even applied research
…a large-scale effort that can demonstrate internal and external reliability and validity
…something that requires statistical analysis to determine the significance of the results
…the user-centered design process (it is only one of the standard tools in the process)
Usability Testing is not…
…Market Research
– Qualitative & quantitative market research is an attempt to understand the actual or potential user population for a product or service
– Size of markets
– Reasons for purchasing/not purchasing
Usability Testing is not…
…User Acceptance Testing
– A process to ensure the functional specifications have been met
– It occurs after development, just before a product is shipped
– It tells you the functionality exists, regardless of whether it is “usable”
– It may capture some subjective preference data, but that data is not truly based on the product’s usability
When is usability assessed?
On an existing product, to determine if usability problems exist
During the design phase of a product
During the development phase of a product, to assess proposed changes
Once a design is completed, to determine if goals were met
(Each of these represents a different purpose with a different approach, and they are often funded from different sources.)
The Methods
Types of Usability Testing
Non-User-Based
– Compliance Reviews
– Expert Reviews (Heuristic Evaluations)
– Cognitive Walkthroughs
User-Based
– Ethnographic Observation
– User Surveys/Questionnaires
– Think Aloud Protocols
– Performance-based Protocols
– Co-Discovery Protocols
Non-User-Based Testing
Compliance Testing
Possible (within limits) to be performed by anyone
Can remove the low-level usability issues that often mask more significant usability issues
Compliance Testing (concluded)
Style Guide-based Testing
– Checklists
– Interpretation issues
– Scope limitations
Available Standards
– Commercial GUI & Web standards and style guides
– Domain-specific GUI & Web standards and style guides
– Internal standards and style guides
Interface Specification Testing
– May revert to user acceptance testing
Expert Review
a.k.a. Heuristic Evaluation
One or two usability experts review a product, application, etc.
Free-format review, or structured review based on heuristics
Subjective, but based on sound usability and design principles
Highly dependent on the qualifications of the reviewer(s)
Expert Review (concluded)
Nielsen’s 10 Most Common Mistakes Made by Web Developers (three versions)
Shneiderman’s 8 Golden Rules
Constantine & Lockwood Heuristics
Forrester Group Heuristics
Norman’s 4 Principles of Usability
Cognitive Walkthrough
Team approach
Best with a diverse population of reviewers
Issues related to cognition (understanding) more than presentation
Also low-cost usability testing
Highly dependent on the qualifications of the reviewer(s)
User-Based Testing
User Surveys
Standardized vs. custom-made
Formats
– Closed-ended
– Open-ended
– Scaled questions
Cognitive testing of the survey itself
User bias in evaluations
– Leniency effect
– Central tendency
– Strictness bias
Strategic responding
– The intent of the survey is so obvious that the participants attempt to respond in a way they believe will help
Self-selection versus directed surveys
User Interviews
Formats
– Closed-ended
– Open-ended
– Likert scale questions
Strategic responding
– The intent of the interview is so obvious that the participants attempt to respond in a way they believe will help
“Ethnographic Observation”
Field study
Natural environment for the user
Can be time consuming and logistically prohibitive
How to Design & Conduct a User-Based Test
The Rules
You can do “some friends and a six-pack” testing
User-based testing should be developed using the model of a scientific experiment (not all are)
User-based testing should follow the ethical guidelines for the treatment of human subjects (but not all do)
User-based testing should not be constrained by the rules of reliability and repeatability (more on this later)
Test Set-up
What’s the hypothesis?
– Required for research
– Required for usability testing?
Define Your Variables
– Dependent and independent variables
– Confounding variables
– Operationalize your variables
Participant Issues
User-types
– Users versus user surrogates
– All profiles or specific user profiles/personas?
– Critical segments?
How many?
– Relationship to statistical significance
– “Discount Usability” – whose rule?
Participant Issues (concluded)
Selecting subjects
– Screeners
– Getting subjects: convenience sampling, recruiting, participant stipends, over-recruiting
Scheduling
Test Set-up (concluded)
Select a Protocol (Part A)
– Within versus between subject designs
– Mostly based on the number of items to be tested
– Might be based on time commitment
Select a Protocol (Part B)
– Non-interrupted think aloud (with or without a critical incident analysis)
– Interrupted think aloud
– Interrupted performance-based
– Non-interrupted performance-based (with or without a critical incident analysis)
Special Cases
– Co-discovery
– “Group”
Defining Task Scenarios
Areas of concern, redesign, or interest
Interrelationship issues
Short, unambiguous tasks to be performed
Wording is critical
– In the user’s own terms
– Does not contain “seeds” of the correct solution
Enough to form a complete test, but able to stay within the time limit
– Flexibility is key
– Variations ARE allowed
Defining Task Scenarios (concluded)
Scenarios are contrived for testing, may not be representative of real-world usage patterns, and are NOT always required
Preparing Test Materials
Consent form!
Video release form!
Receipt and confidentiality agreement!
Demographic survey
Facilitator’s Guide
– Introductory comments
– Participant task descriptions
– Questionnaires: SUS, Cooper-Harper, etc.
Note Taker’s Forms
Piloting the Design
Getting subjects
– Convenience sampling
– Almost anyone will do
Collect data
Check task wording
Check timing
Facilitating
Rogerian principles apply
– Unconditional positive regard
– Empathy
– Congruence
Rogerian techniques are used
– Minimal encouragers
– Reflections
– Summarization
– Open-ended questions
Objectiveness
Collecting Data
The data is NOT in the interface; the data is in the user!
Collecting observed data
– Behavior
– Reactions
Collecting participant comments
Collecting subjective data
– Pre-test data
– Post-scenario data
– Post-test data
What are we looking for in a usability evaluation?
Perceptual issues (our brains versus our senses)…
Gestalt principles (each illustrated with a figure in the original deck):
Unity
Continuation
Proximity
Prägnanz (Common fate)
Similarity
Closure
Before: Where do you start?
After: Notice two Gestalt principles in use – similarity and visually assisted groupings
…our perceptual abilities are limited in the presence of noise…
(Figures: “THE QUICK BROWN FOX JUMPED OVER THE LAZY DOG’S BACK.” rendered in all caps, in visual noise, and in mixed case)
…our presumptions affect what and how we “see” things…
(Figures: the Müller-Lyer Illusion and the Ames Room)
…our attention and cognitive abilities are limited and specialized…
Jack and Jill went went up the Hill to fetch a a pail of milk
A Bruner and Minturn (1955) study showed users either L, M, Y, & K or 16, 17, 10, & 12, then asked them to identify: I3
FINISHED FILES ARE THE RE- SULT OF YEARS OF SCIENTIF- IC STUDY COMBINED WITH THE EXPERIENCE OF MANY YEARS
Stroop Test
Red Green Blue Orange Yellow Black
Orange Yellow Green Black Blue Red
(In the original deck these word lists are printed in mismatched ink colors; the task is to name the ink color rather than read the word)
Test Your Attention
Person Swap “Experiment”
…our memory affects our abilities…
…our psychology affects our abilities…
…and we need protection from ourselves…
4 Heuristics
Reporting the Results
People don’t understand statistics
– “There’s a 20% chance of snow on the ground on Christmas day. And since we haven’t had snow on the ground for 4 years, we should have it this year.”
– 30% chance of rain in the DC area
– There’s as much of a chance of the lottery producing 0-0-0-0-0 as any other number
– There’s a greater chance of a tree falling on you in a thunderstorm than of being struck by lightning. But there’s a greater chance of being struck by lightning than of winning the Virginia lottery
People don’t provide useful statistics
– 4 out of 5 dentists surveyed say…
– The average number of clicks to accomplish [a task in a test] was 5
– The average time to complete a task was 20 seconds
– “40% of the people who try to find data on [the site] can’t find what they are looking for” (based on the results of a single study)
Predictive Statistics
Predictive (or inferential) statistics is a means of drawing conclusions about a population based on data from a sample
This type of statistic can only be reported when sampling is truly representative of the underlying population, and specific procedures must be followed to ensure the results support correct conclusions about the population
To use predictive statistics, the test must be both valid and reliable
Validity
Validity is the degree to which the results of a research study provide trustworthy information about the truth or falsity of the hypothesis (from Cherulnik, P.D. (2001). Methods for Behavioural Research: A Systematic Approach)
Validity (continued)
Internal validity refers to the situation where “experimental treatments make a difference in this specific experimental instance” (from Campbell, D.T. & Stanley, J.C. (1963). Experimental and Quasi-experimental Designs for Research)
– Failure to counterbalance
– Learning curve effects (ease-of-learning versus ease-of-use)
– Disruptive protocols
External validity asks the question of “generalizability”
– Sample size issues
– Representativeness issues
– Nonrepresentative research context (e.g., the Hawthorne effect)
Validity (concluded)
Construct validity issues
– Content validity and face validity are closely related. Content validity is an assessment that the test is valid as judged by experts. Face validity is an assessment that the test is valid as judged by the casual observer (a non-expert)
– Inadequate operationalization refers to a failure to define what will be measured in a way that is necessary and sufficient
– Measurement artifacts refer to measurement biases such as experimenter expectancy or strategic responding during think-aloud protocols
– Concurrent validity refers to the relationship between this test and other tests that purport to measure the same data
Reliability
Reliability is the ability of a test to show the same results if conducted multiple times
– Test-retest reliability
– Repeatability
– Reproducibility
Use of Confidence Intervals
“You just finished a usability test. You had 5 participants attempt a task in a new version of your software. All 5 out of 5 participants completed the task. You rush excitedly to tell your manager the results so you can communicate this success to the development team. Your manager asks, ‘OK, this is great with 5 users, but what are the chances that 50 or 1000 will have a 100% completion rate?’” – Jeff Sauro
The confidence level tells the informed reader the likelihood that another sample will provide the same results. In other words, if you ran the test again, what value are you likely to get next time?
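Sauro's manager question can be made concrete with a little arithmetic: if the true completion rate in the population were p, the chance of an independent run of 5 participants all succeeding is p^5. The sketch below (Python, hypothetical rates chosen for illustration) shows that even a mediocre true rate produces a clean 5/5 result surprisingly often:

```python
def p_all_succeed(true_rate, n=5):
    """Chance that all n participants complete the task,
    assuming independent attempts at the given true rate."""
    return true_rate ** n

for rate in (0.5, 0.7, 0.9):
    print(f"true rate {rate:.0%}: chance of a 5/5 result is {p_all_succeed(rate):.1%}")
```

With a 70% true completion rate, roughly one test in six still yields a perfect 5/5, which is exactly why 5/5 alone does not justify a claim about 1000 users.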
Use of Confidence Intervals (concluded)
The confidence interval is determined (in large part) by the sample size: the more people you test, the more of the population’s variability your sample captures, and the better your confidence (i.e., the smaller the confidence interval)
Since time and cost are directly proportional to the sample size, there is a trade-off between the margin of error/confidence interval and the cost of the evaluation
Descriptive Statistics
Descriptive statistics summarize data without implying or inferring anything beyond the sample tested
– This type of statistic can always be reported
– This includes the mean, median, mode, variance, standard deviation, and covariance
– But it should be reported in an appropriate way
Descriptive Statistics
Providing the average (e.g., 5.5) is not sufficient…
(Figure: a distribution of scores with Average = 5.5)
Descriptive Statistics
Adding the standard deviation is only slightly more “helpful”
(Figures: Average = 5.5, SD = 3.9 versus Average = 5.5, SD = 0.5)
Descriptive Statistics
But the data often shows other patterns, such as bimodal distributions. In these cases, the average and standard deviation are not adequate…
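The bimodal point can be demonstrated with Python's statistics module on two hypothetical samples (invented for illustration) that share the same mean and median but have completely different shapes:

```python
from statistics import mean, median, multimode, stdev

# two hypothetical rating samples with the same mean and median
tight   = [5, 6, 5, 6, 5, 6, 5, 6, 5, 6]       # everyone scored near 5.5
bimodal = [1, 1, 1, 1, 1, 10, 10, 10, 10, 10]  # nobody scored near 5.5

for name, data in (("tight", tight), ("bimodal", bimodal)):
    print(f"{name}: mean={mean(data)} median={median(data)} "
          f"sd={round(stdev(data), 2)} modes={multimode(data)}")
```

Both samples report a mean and median of 5.5, yet in the bimodal sample no participant ever gave a score anywhere near 5.5; only the modes (and a histogram) reveal that.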
Means, Medians, and Modes
Some people suggest the use of the median in lieu of the average (or mean)
Some people suggest the use of the mode
Effectiveness Data
Effectiveness data can be operationalized in a number of ways, but is generally operationalized as success or failure to complete a task
Completion rate as a pass/fail criterion can be measured objectively if the criterion is pre-determined and is not subjective
Best estimates, error rate, and the confidence interval can be calculated easily for a pass/fail measure of completion rate using a binomial calculation
Completion Rate (concluded)
Example:
– If 4 out of 8 complete a task, you can say, with 95% certainty, that somewhere between 21% and 78% of all users will complete the task
– If 8 out of 8 complete a task, you can only say, with 95% certainty, that somewhere between 71% and 100% of all users will complete the task
– If 7 out of 8 complete the task, it’s 51% to 99%
– If 6 out of 8 complete the task, it’s 40% to 93%
If you want to state that 90% of users will complete a specific task (+/- 5%), and all conditions of validity and reliability have been met…
– You need to test 156 people and have 142 of them complete the task
– (For a confidence interval of 5% (+/- 2.5%), you’d need 450 people)
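One common way to compute such completion-rate intervals is the Wilson score interval. The sketch below reproduces the 4-through-8-of-8 examples to within a point or two; the deck's exact figures likely come from a different (exact binomial) calculation, so small differences are expected:

```python
import math

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score interval for a binomial completion rate."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

for k in (4, 6, 7, 8):
    lo, hi = wilson_ci(k, 8)
    print(f"{k}/8 completed: {lo:.0%} to {hi:.0%}")
```

Note how wide every interval is at n = 8: even a perfect 8/8 run only supports a lower bound in the high 60s.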
Efficiency Data – Time on Task
Efficiency data can be operationalized in a number of ways, time on task being the most common
Time on task can be measured objectively
External time is important to management, but is not necessarily important to users, and time on task does not correlate with effectiveness (except in extreme cases)
Though the data is not necessarily normally distributed, even with fairly large samples, a t-test can be used to compare times provided it is valid
– It is not valid in think-aloud or exploratory protocols
– It can be invalidated by sampling issues
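Where a t-test on task times is valid, Welch's unequal-variance form is a common choice since two designs rarely produce equally variable times. A minimal sketch on hypothetical timing data (all numbers invented for illustration; the deck does not specify a particular t-test variant):

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two
    independent samples with possibly unequal variances."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# hypothetical task times in seconds, old design vs. new design
old = [30, 35, 40, 45, 50]
new = [20, 25, 30, 35, 40]
t, df = welch_t(old, new)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The resulting t and degrees of freedom would then be checked against a t table; with samples this small, the test has little power, which echoes the deck's caution about when timing comparisons are worth making at all.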
Satisfaction Data
Satisfaction data can be operationalized in a number of ways, but is always opinion data
– Standardized survey instrument (e.g., SUS)
– Simple Likert scale assessments
Questionnaires suffer from numerous issues that threaten their validity
– Halo effect, leniency effect, strictness effect, bias
Satisfaction data does not correlate with performance except in extreme situations
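For reference, the standard SUS scoring rule: each of the 10 items is rated 1 to 5; odd-numbered items contribute (rating − 1), even-numbered items contribute (5 − rating), and the sum is multiplied by 2.5 to give a 0-100 score. A sketch with hypothetical responses:

```python
def sus_score(responses):
    """Standard SUS scoring: 10 items rated 1-5, result on a 0-100 scale."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 item responses")
    total = sum((r - 1) if i % 2 == 0 else (5 - r)  # items 1,3,5,... are positive-toned
                for i, r in enumerate(responses))
    return total * 2.5

# hypothetical response sets
print(sus_score([5, 1] * 5))   # best possible answers -> 100.0
print(sus_score([3] * 10))     # all-neutral answers   -> 50.0
```

The alternating scoring is what lets SUS mix positively and negatively worded items, which partly counters the leniency and strictness effects listed above.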
Satisfaction Data (continued)
Take a poll of participants comparing the new product against the old version of the product. People might be asked to comment on the statement “The new design is an improvement over the old design.” and given a choice of answers from “Definitely, it’s the tops.” to “No, definitely not, it’s awful.”
The data would be a collection of opinions. Assume the following scale and results…
Satisfaction Data (continued)
Though it appears to be one, this is not an interval scale, since you cannot state that an opinion rated as 3 is three times weaker than an opinion rated as 1. There is, however, an order to the different opinions (it’s an ordinal scale).
Descriptive statistics (average, standard deviation) cannot be meaningfully calculated, nor can parametric statistics be used to analyze the results, since the data is not normally distributed
You can use a non-parametric analysis such as a rank sum analysis
Rank Sum Analysis
The hypothesis you would want to test would be: “The participants consider the new product an improvement.”
A quick ‘eyeball’ test shows that none of those questioned thought it was awful and only one person thought it not very good, so a first impression is that people generally approve.
If you start by assuming that in the population there is no opinion one way or the other, and that people’s responses are symmetrically distributed about ‘no opinion’, you can test the hypothesis that people think the new design is an improvement, with the null hypothesis that people have no opinion about it, their response being the median value 4.
Rank Sum Analysis (continued)
You need to be careful to choose the appropriate test statistic for the problem you are tackling
– For a one-tailed test, where the alternative hypothesis is that the median is greater than a given value, the test statistic is W-. For a one-tailed test, where the alternative hypothesis is that the median is less than a given value, the test statistic is W+.
– For a two-tailed test, the test statistic is the smaller of W+ and W-
As people who think it an improvement will give a rating of less than 4, the null and alternative hypotheses can be stated as follows:
– H0: the median response is 4
– H1: the median response is less than 4
– One-tailed test, significance level 5%
Rank Sum Analysis (continued)
List the values
Find the difference between each value and the median
Ignore the zeros and rank the absolute values of the remaining scores: ignoring the signs, start with the smallest difference and give it rank 1. Where two or more differences have the same value, find their mean rank and use this.
Now check that W+ + W- is the same as ½n(n+1), where n is the number in the sample (having ignored the zeros). In this case n = 10.
– ½n(n+1) = ½ x 10 x 11 = 55
– W+ + W- = 9.5 + 45.5 = 55
Rank Sum Analysis (concluded)
Compare the test statistic with the critical value in the tables. If the null hypothesis were true, and the median is 4, you would expect W+ and W- to have roughly the same value.
There are two possible test statistics here, W+ = 9.5 and W- = 45.5, and you have to decide which one to use. We are interested in W+, the sum of the ranks of ratings greater than 4.
W+ is much less than W-, which suggests that more people felt the new design was an improvement. It could also suggest that those who expressed a negative view expressed a very strong one, with lots of high numbers in the ratings.
Now you need to compare the value of W+, the test statistic, with the critical value from the table. Given that W+ is small, the key question becomes “Is W+ significantly smaller than would happen by chance?” The table helps you decide this by supplying the critical value. For a sample of 10, at the 5% significance level for a one-tailed test, the value is 10. As W+ is 9.5, which is less than this, the evidence suggests that we can reject the null hypothesis.
Your conclusion is that the evidence shows, at the 5% significance level, that users think the new design is better than the old design
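The W+/W- computation walked through above can be reproduced in a few lines. The ratings below are hypothetical, chosen so that they yield the deck's worked figures (W+ = 9.5, W- = 45.5, n = 10 after dropping responses equal to the median of 4):

```python
def signed_rank(values, median):
    """W+ and W- for a Wilcoxon signed-rank test against a
    hypothesised median, using mean ranks for tied differences."""
    diffs = [v - median for v in values if v != median]   # drop zeros
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and abs(diffs[order[j]]) == abs(diffs[order[i]]):
            j += 1
        mean_rank = (i + j + 1) / 2        # average of 1-based ranks i+1 .. j
        for k in order[i:j]:
            ranks[k] = mean_rank
        i = j
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus

# hypothetical ratings on the 1-7 scale (4 = no opinion)
ratings = [5, 6, 3, 3, 3, 3, 2, 1, 1, 1, 4]
wp, wm = signed_rank(ratings, 4)
print(wp, wm)     # 9.5 45.5; sanity check: wp + wm = 55 = 1/2 * 10 * 11
```

The returned W+ would then be compared against the one-tailed critical value from a signed-rank table, exactly as the slide describes.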
Summary
With discount usability, or any exploratory approach, any predictive statistics calculated are unlikely to be valid (or will not be worth the effort)
Descriptive statistics are valid with small numbers, but histograms are a better way to show data from small samples
Summary (continued)
The “real” data from small-number testing comes from the “Principle of Inter-ocular Drama” (results so obvious they hit you between the eyes) and the observer’s ability to explain it
Histograms
A histogram is probably the best way to present the data, since it shows the most data of interest…
(Figure: a histogram of score versus the number of participants who got that score)
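For small samples, even a text histogram conveys the shape of the data. A sketch with hypothetical scores; note the bimodal pattern that an average alone would hide:

```python
from collections import Counter

def hist_lines(scores):
    """Render a score distribution as rows of '#' marks."""
    counts = Counter(scores)
    return [f"{s:2d} | {'#' * counts[s]}"
            for s in range(min(scores), max(scores) + 1)]

# hypothetical scores: mean ~5.8, yet no one scored 5 or 6
for line in hist_lines([2, 3, 3, 4, 7, 8, 8, 8, 9]):
    print(line)
```

The empty rows in the middle of the output are precisely the information that a reported mean of 5.8 would conceal.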
Histograms (continued)
(Figures: SUS and Cooper-Harper histograms)
Histograms (concluded)
(Figures: MCH and SUS data histograms for DC and Memphis)
Bold form labels draw users’ eyes away from the form and reduce usability. Consider removing the bold, and possibly bolding the content instead.
There are 50 hyperlinks on the home page (not including primary navigation), representing four levels within the clinical trial section and direct links to other parts of NCI
The redesigned page as it looks today.
State Pages
The placement of this group suggests secondary content, not primary content. Consider moving this to the left side of the page.
Participants had difficulty understanding what content was searched
– Many thought all content in Clinical Trials would be searched, not just ongoing trials
– A few participants wanted to use the global NCI search to search Clinical Trials (consider labelling this “Search NCI” or “NCI Search”)
– Some participants responded to the term “Find” even when the search form was on the page
“Disciplines”
Participants (without prior exposure) failed to recognize the five primary disciplines as navigational elements. The most common expectation (if noticed at all) was that the links would provide definitions of the terms.
Summary (concluded)
Effectiveness, efficiency, and satisfaction are not correlated with each other
Effectiveness (as defined here) is correlated with the root definition of usability, so efficiency and satisfaction should only be used to add supplementary information after acceptable levels of effectiveness have been demonstrated or are presumed to be acceptable
Conclusions
Lack of understanding is on both sides…
Actual contract requirement: The selected vendor will be required to demonstrate that one design (or one version of a design) is better than another design (or version) using 8 participants per study
Test a design of a system using 32 subjects total, representing 8 user groups, to show the new design is better than the old design. Use a think-aloud protocol and collect mouse clicks and time data. The old design will not be made available for comparison.
Conclusions (concluded)
Any testing is better than no testing
The more you know about experimental design, the better your testing will be; but the more you know about users, the better the data you can get from any testing
Testing with human subjects is serious business and should not be conducted lightly