SBD: Usability Evaluation (presentation transcript)

1 SBD: Usability Evaluation
Chris North cs3724: HCI

2 Scenario-Based Design
ANALYZE: analysis of stakeholders, field studies; claims about current practice; Problem scenarios
DESIGN: Activity scenarios, Information scenarios, Interaction scenarios; metaphors, information technology, HCI theory, guidelines; iterative analysis of usability claims and re-design
PROTOTYPE & EVALUATE: Usability specifications; formative evaluation; summative evaluation

3 Evaluation
Formative vs. Summative
Analytic vs. Empirical

4 Usability Engineering
Iterative cycle of requirements analysis, design, development, and evaluation (many iterations)

5 Usability Engineering
Formative evaluation
Summative evaluation

6 Usability Evaluation
Analytic methods:
Usability inspection, expert review
Heuristic evaluation
Cognitive walkthrough
GOMS analysis
Empirical methods:
Usability testing: field or lab; observation, problem identification
Controlled experiment: formal controlled scientific experiment; comparisons, statistical analysis

7 User Interface Metrics
Ease of learning: learning time, …
Ease of use: performance time, error rates, …
User satisfaction: surveys, …
Not "user friendly"
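To make these metrics concrete, here is a minimal Python sketch (not from the lecture) that summarizes them from hypothetical per-user session logs; the field names and numbers are invented for illustration.

# Hypothetical usability-metric summary (illustrative data only).
sessions = [  # one record per user session
    {"user": "u1", "learning_time_s": 310, "task_time_s": 42, "errors": 1, "satisfaction_1to5": 4},
    {"user": "u2", "learning_time_s": 275, "task_time_s": 55, "errors": 3, "satisfaction_1to5": 3},
    {"user": "u3", "learning_time_s": 340, "task_time_s": 48, "errors": 0, "satisfaction_1to5": 5},
]

n = len(sessions)
print("ease of learning (mean learning time, s):", sum(s["learning_time_s"] for s in sessions) / n)
print("ease of use (mean task time, s):", sum(s["task_time_s"] for s in sessions) / n)
print("ease of use (mean errors per task):", sum(s["errors"] for s in sessions) / n)
print("satisfaction (mean survey rating, 1-5):", sum(s["satisfaction_1to5"] for s in sessions) / n)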

8 Usability Testing

9 Usability Testing
Formative: helps guide design
Early in the design process; once the architecture is finalized, it's too late!
A few users
Usability problems, incidents
Qualitative feedback from users
Quantitative usability specification

10 Usability Specification Table
Scenario task                          Worst case   Planned target   Best case (expert)   Observed
Find most expensive house for sale     1 min        10 sec           3 sec                ??? sec
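Such a specification can be checked mechanically once observed times are collected. The Python sketch below is an assumption about how that check might look; the threshold values mirror the table above, and the observed time is invented.

# Hypothetical check of an observed task time against the usability specification.
spec = {
    "find most expensive house for sale": {"worst_s": 60, "planned_s": 10, "best_s": 3},
}
observed = {"find most expensive house for sale": 14}  # seconds; made-up observation

for task, levels in spec.items():
    obs = observed.get(task)
    if obs is None:
        print(f"{task}: not yet observed")      # the '???' column
    elif obs <= levels["planned_s"]:
        print(f"{task}: {obs}s meets the planned target")
    elif obs <= levels["worst_s"]:
        print(f"{task}: {obs}s is acceptable but misses the planned target")
    else:
        print(f"{task}: {obs}s is worse than the worst-case level; redesign needed")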

11 Usability Test Setup
Set of benchmark tasks:
Easy to hard, specific to open-ended
Coverage of different UI features
E.g. "find the 5 most expensive houses for sale"
Different types: learnability vs. performance
Consent forms:
Not needed unless video-taping the user's face (new rule)
Experimenters:
Facilitator: instructs the user
Observers: take notes, collect data, video-tape the screen
Executor: runs the prototype if it is faked
Users:
3-5 users; quality, not quantity

12 Usability Test Procedure
Goal: mimic real life
Do not cheat by showing users how to use the UI!
Initial instructions: "We are evaluating the system, not you."
Repeat:
Give the user a task
Ask the user to "think aloud"
Observe; note mistakes and problems
Avoid interfering; hint only if the user is completely stuck
Interview:
Verbal feedback
Questionnaire
~1 hour per user

13 Usability Lab E.g. McBryde 102

14 Data
Note taking: e.g. the user keeps clicking on the wrong button…
Verbal protocol (think aloud): e.g. the user thinks that button does something else…
Rough quantitative measures: HCI metrics, e.g. task completion time, …
Interview feedback and surveys
Video-tape of screen & mouse
Eye tracking, biometrics?

15 Analyze
Initial reaction: "stupid user!", "that's developer X's fault!", "this sucks"
Mature reaction: "how can we redesign the UI to solve that usability problem?" (the user is always right)
Identify usability problems:
Learning issues: e.g. can't figure out or didn't notice a feature
Performance issues: e.g. arduous, tiring to solve tasks
Subjective issues: e.g. annoying, ugly
Problem severity: critical vs. minor

16 Cost-Importance Analysis
Importance 1-5 (task effect, frequency):
5 = critical, major impact on user, frequent occurrence
3 = user can complete the task, but with difficulty
1 = minor problem, small speed bump, infrequent
Ratio = importance / cost; sort by this ratio
3 categories: must fix, next version, ignored
Table columns: Problem | Importance | Solutions | Cost | Ratio I/C
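As an illustration of the bookkeeping (not part of the slides), here is a minimal Python sketch of cost-importance analysis; the problems, importance scores, cost estimates, and category cut-offs are all invented.

# Hypothetical cost-importance analysis: sort usability problems by importance/cost.
problems = [
    {"problem": "zoom feature not discovered",      "importance": 5, "cost": 2},
    {"problem": "confusing label on search button", "importance": 3, "cost": 1},
    {"problem": "ugly splash screen",                "importance": 1, "cost": 4},
]

for p in problems:
    p["ratio"] = p["importance"] / p["cost"]          # Ratio = importance / cost

for p in sorted(problems, key=lambda p: p["ratio"], reverse=True):
    if p["ratio"] >= 2:                               # cut-offs are arbitrary for the example
        category = "must fix"
    elif p["ratio"] >= 1:
        category = "next version"
    else:
        category = "ignored"
    print(f'{p["problem"]}: ratio {p["ratio"]:.1f} -> {category}')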

17 Refine UI
Simple solutions vs. major redesigns
Solve problems in order of importance/cost
Example:
Problem: the user didn't know he could zoom in to see more…
Potential solutions:
Better zoom button icon, tooltip
Add a zoom bar slider (like moosburg)
Icons for different zoom levels: boundaries, roads, buildings
NOT: more "help" documentation!!! You can do better.
Iterate: test, refine, test, refine, test, refine, …
Until? The design meets the usability specification

18 Project: Usability Evaluation
>= 3 users: not (tainted) HCI students
Simple data collection (biometrics optional!)
Exploit this opportunity to improve your design
Report:
Procedure (users, tasks, specs, data collection)
Usability problems identified, specs not met
Design modifications

19 Controlled Experiments

20 Usability Test vs. Controlled Experiment
Usability test:
Formative: helps guide design
Single UI, early in the design process
Few users
Usability problems, incidents
Qualitative feedback from users
Controlled experiment:
Summative: measures the final result
Compare multiple UIs
Many users, strict protocol
Independent & dependent variables
Quantitative results, statistical significance

21 What is Science?
Measurement
Modeling

22 Scientific Method
Form hypothesis, collect data, analyze, accept/reject hypothesis
How to "prove" a hypothesis in science?
It is easier to disprove things, by counterexample
Null hypothesis = opposite of the hypothesis
Disprove the null hypothesis
Hence, the hypothesis is "proved"

23 Empirical Experiment
Typical question: which visualization is better in which situations?
Spotfire vs. TableLens

24 Cause and Effect
Goal: determine "cause and effect"
Cause = visualization tool (Spotfire vs. TableLens)
Effect = user performance time on task T
Procedure:
Vary the cause
Measure the effect
Problem: random variation
Cause = vis tool OR random variation?
Real world + random variation → collected data → uncertain conclusions

25 Stats to the Rescue
Goal: show that the measured effect is unlikely to result from random variation
Hypothesis: cause = visualization tool (e.g. Spotfire ≠ TableLens)
Null hypothesis: the visualization tool has no effect (e.g. Spotfire = TableLens); hence cause = random variation
Stats: if the null hypothesis were true, the measured effect would occur with probability < 5% (e.g. measured effect >> random variation)
Hence: the null hypothesis is unlikely to be true
Hence: the hypothesis is likely to be true

26 Variables
Independent variables (what you vary), and treatments (the variable values):
Visualization tool: Spotfire, TableLens, Excel
Task type: find, count, pattern, compare
Data size (# of items): 100, 1000, …
Dependent variables (what you measure):
User performance time
Errors
Subjective satisfaction (survey)
HCI metrics

27 Example: 2 x 3 Design
Ind Var 1: Vis. Tool (Spotfire, TableLens) x Ind Var 2: Task Type (Task1, Task2, Task3)
n users per cell; each cell collects measured user performance times (dep var)
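A purely illustrative Python sketch of this factorial design follows; the tool and task names come from the slide, but the data structure, users-per-cell count, and sample value are assumptions.

# Hypothetical 2 x 3 design: (vis tool) x (task type), n users per cell.
import itertools

tools = ["Spotfire", "TableLens"]       # independent variable 1
tasks = ["Task1", "Task2", "Task3"]     # independent variable 2
n_per_cell = 20                         # example value only

# Each cell collects the dependent variable: one performance time per user.
cells = {(tool, task): [] for tool, task in itertools.product(tools, tasks)}

cells[("Spotfire", "Task1")].append(37.2)   # record one made-up observation (seconds)
print(len(cells), "cells in the design")    # 6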

28 Groups
"Between subjects" variable:
1 group of users for each variable treatment
Group 1: 20 users, Spotfire
Group 2: 20 users, TableLens
Total: 40 users, 20 per cell
"Within subjects" (repeated measures) variable:
All users perform all treatments
Counter-balancing the order effect
Group 1: 20 users, Spotfire then TableLens
Group 2: 20 users, TableLens then Spotfire
Total: 40 users, 40 per cell
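A minimal sketch of counter-balancing the order for the within-subjects case; the user IDs and the simple alternating scheme are assumptions for illustration.

# Hypothetical counterbalancing: alternate presentation order across users.
orders = [("Spotfire", "TableLens"), ("TableLens", "Spotfire")]

users = [f"u{i}" for i in range(1, 41)]     # 40 hypothetical participants
assignment = {user: orders[i % len(orders)] for i, user in enumerate(users)}

print(assignment["u1"])   # ('Spotfire', 'TableLens')
print(assignment["u2"])   # ('TableLens', 'Spotfire')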

29 Issues
Eliminate or measure extraneous factors
Randomization
Fairness: identical procedures, …
Bias
User privacy, data security
IRB (Institutional Review Board)

30 Procedure
For each user (× n users):
Sign legal forms
Pre-survey: demographics
Instructions: do not reveal the true purpose of the experiment
Training runs
Actual runs: give task, measure performance
Post-survey: subjective measures

31 Data
Measured dependent variables
Spreadsheet: one row per user, with performance times under Spotfire and TableLens for task 1, task 2, and task 3
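One possible shape for that spreadsheet, sketched in Python with pandas (all values invented):

# Hypothetical results table: one row per (user, tool, task) observation.
import pandas as pd

data = pd.DataFrame([
    {"user": "u1", "tool": "Spotfire",  "task": "task1", "time_s": 37.0},
    {"user": "u1", "tool": "TableLens", "task": "task1", "time_s": 30.5},
    {"user": "u2", "tool": "Spotfire",  "task": "task1", "time_s": 41.2},
    {"user": "u2", "tool": "TableLens", "task": "task1", "time_s": 28.9},
])

print(data.groupby(["tool", "task"])["time_s"].mean())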

32 Step 1: Visualize It
Dig out interesting facts
Qualitative conclusions
Guide the stats
Guide future experiments
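For example, one might plot every individual time rather than only the averages; the matplotlib sketch below uses invented numbers.

# Hypothetical plot of individual task times per tool (show the data, not just averages).
import matplotlib.pyplot as plt

times = {
    "Spotfire":  [37.0, 41.2, 35.8, 52.3, 44.1],
    "TableLens": [30.5, 28.9, 61.7, 33.0, 29.4],
}

for i, (tool, values) in enumerate(times.items()):
    plt.scatter([i] * len(values), values)

plt.xticks(range(len(times)), list(times))
plt.ylabel("task completion time (s)")
plt.title("Per-user times (hypothetical data)")
plt.show()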

33 Step 2: Stats
Average user performance times (dep var); rows = Ind Var 1 (vis. tool), columns = Ind Var 2 (task type):

            Task1   Task2   Task3
Spotfire     37.2    54.5   103.7
TableLens    29.8    53.2   145.4

34 TableLens Better than Spotfire?
Problem with averages: they are lossy
Compares only 2 numbers
What about the 40 data values? (Show me the data!)
(Bar chart: avg perf time (secs), Spotfire vs. TableLens)

35 The Real Picture
Need stats that compare all the data
(Chart: individual perf times (secs), Spotfire vs. TableLens)

36 Statistics
t-test: compares 1 dep var on 2 treatments of 1 ind var
ANOVA (Analysis of Variance): compares 1 dep var on n treatments of m ind vars
Result: p = probability of seeing a difference this large if the null hypothesis (random variation only) were true
"Statistical significance" level; typical cut-off: p < 0.05
Hypothesis confidence = 1 - p
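As a concrete illustration (data invented, not from the lecture), the Python sketch below runs a t-test and a one-way ANOVA with SciPy:

# Hypothetical significance tests on made-up performance times (seconds).
from scipy import stats

spotfire  = [37.0, 41.2, 35.8, 52.3, 44.1, 39.9]
tablelens = [30.5, 28.9, 61.7, 33.0, 29.4, 31.8]
excel     = [55.2, 48.7, 60.1, 52.9, 58.3, 49.5]

# t-test: 1 dep var, 2 treatments of 1 ind var.
t, p = stats.ttest_ind(spotfire, tablelens)
print(f"t-test: t = {t:.2f}, p = {p:.3f}")

# One-way ANOVA: 1 dep var, 3 treatments of 1 ind var.
f, p = stats.f_oneway(spotfire, tablelens, excel)
print(f"ANOVA: F = {f:.2f}, p = {p:.3f}")

# Typical cut-off: p < 0.05 counts as "statistically significant".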

37 In Excel

38 p < 0.05
Woohoo! Found a "statistically significant" difference
The averages determine which is 'better'
Conclusion: cause = visualization tool (e.g. Spotfire ≠ TableLens)
The vis tool has an effect on user performance for task T…
"95% confident that TableLens is better than Spotfire…"
NOT "TableLens beats Spotfire 95% of the time"
5% chance of being wrong!
Be careful about generalizing

39 p > 0.05
Hence, no difference? NOT!
The vis tool has no effect on user performance for task T…? Spotfire = TableLens? NOT!
Did not detect a difference, but the tools could still be different
A potential real effect did not overcome random variation
Provides evidence for Spotfire = TableLens, but not proof
Boring: basically found nothing
How does this happen?
Not enough users
Need better tasks, data, …

40 Data Mountain Robertson, “Data Mountain” (Microsoft)

41 Data Mountain: Experiment
Data Mountain vs. IE favorites
32 subjects
Organize 100 pages, then retrieve them based on cues
Independent variables:
UI: Data Mountain (old, new), IE
Cue: title, summary, thumbnail, all 3
Dependent variables:
User performance time
Error rates: wrong pages, failed to find in 2 min
Subjective ratings

42 Data Mountain: Results
Spatial Memory! Limited scalability?

