Shneiderman and Plaisant Chapter 4


1 Shneiderman and Plaisant Chapter 4
Evaluation Shneiderman and Plaisant Chapter 4

2 Introduction Design Evaluate Implement Iterative design
Current “best practice” Specialization of Boehm’s spiral model Cost increases in radial dimension Earlier, about design … “Radically transformational” So, count on multiple passes Requirements Know task and user Guidelines, principles This time, evaluation How, after all, to know if usable Design Evaluate Implement

3 Overview Overview and orientation Then, to Shneiderman …
Evaluation Plans, Acceptance Testing, and Life Cycle Expert reviews Usability testing and techniques Goal is to “engineer” a good interface, constrained by time and cost Acceptance tests Evaluation during active use “Controlled psychologically oriented experiments” Elements of science, as applied to interface evaluation

4 Introduction Usability of interface design – a key component …from 2nd week! Evaluation required to know how/if “usable” By whatever means … reviews, surveys, etc. Again, what makes sense (is appropriate for) programmer/expert not right for general user population … an early point “Know thy user” and know how “thy user” performs with the system and how the system performs with “thy user” and the way to know is by evaluation In Shneiderman-ese: “Designers can become so entranced with their creations that they may fail to evaluate them adequately.” “Experienced designers have attained the wisdom and humility to know that extensive testing is a necessity.” “If feedback is the ‘breakfast of champions’, then testing is the ‘dinner of gods’.”

5 Overview and “orientation”, then to Shneiderman presentation

6 Evaluation: Naturalistic approach
Observation occurs in realistic setting Real life Saw in task analysis Watch user do task and catalog activities Ethnographic observation Problems Hard to arrange and do Time consuming May not generalize beyond one organization Which may be fine

7 Evaluation: Experimental Approach
Experimenter controls all environmental factors study relations by manipulating independent variables observe effect on one or more dependent variables nothing else changes E.g., hypothesis: There is no difference in user performance (time and error rate) when selecting an item from a pull-down or a pull-right menu of 4 items (the slide shows two versions of a File / Edit / View / Insert menu bar whose File menu contains New, Open, Close, Save)
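A minimal, illustrative sketch (not from the slides) of how data from such an experiment might be analyzed: an independent-samples t-test on made-up selection times for the two menu designs, using scipy.stats.

```python
# Minimal sketch (illustrative): analyzing the pull-down vs. pull-right menu
# experiment with an independent-samples t-test. The times below are invented.
from scipy import stats

# Selection times in seconds for two independent groups of participants.
pull_down_times  = [1.8, 2.1, 1.9, 2.4, 2.0, 2.2, 1.7, 2.3]
pull_right_times = [2.6, 2.9, 2.4, 3.1, 2.7, 2.8, 2.5, 3.0]

# Null hypothesis: no difference in mean selection time between the two menus.
t_stat, p_value = stats.ttest_ind(pull_down_times, pull_right_times)

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null hypothesis: menu style affects selection time.")
else:
    print("No significant difference detected at the 0.05 level.")
```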

8 Some terms: Validity External validity Internal validity
Do results of “study” (data gathering) apply, or generalize, beyond the specific setting and with the specific conditions under which the data were gathered? Confidence that results apply to real situations usually good in natural settings Internal validity Are results of “study” (data gathering) “robust” and repeatable? For the particular setting or type of data gathering, are inferences sound? Confidence in our explanation of experimental results Usually good in experimental settings Trade-off: Natural vs. Experimental Precision and direct control over experimental design vs. desire for maximum generalizability in real life situations

9 Usability Engineering Approach
Observe people using systems in simulated settings People brought in to artificial setting that simulates aspects of real world setting People given specific tasks to do Observations / measures made as people do their tasks Look for problem areas / successes Good for uncovering ‘big effects’

10 Usability Engineering Approach
Is the test result relevant to the usability of real products in real use outside of lab? Problems non-typical users tested non-typical tasks different physical environment different social context motivation towards experimenter vs. motivation towards boss Partial Solution use real users task-centered system design tasks environment similar to real situation

11 Usability Engineering Approach
How many users to observe? Observing many users is expensive But individual differences matter best user 10x faster than slowest best 25% of users ~2x faster than slowest 25% Partial solution Reasonable number of users tested Reasonable range of users Big problems usually detected with handful of users Small problems / fine measures need many users

12 “Discount Usability Evaluation” (or, “cheap”, “quick and dirty”)
Low cost methods to gather usability problems Approximate: capture most large and many minor problems How? Qualitative: Simply observe user interactions Gather user explanations and opinions Produces a description, usually in non-numeric terms Anecdotes, transcripts, problem areas, critical incidents… Quantitative Count, log, measure something of interest in user actions Speed, error rate, counts of activities, …
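A minimal, illustrative sketch (not from the slides) of the quantitative side: a small, hypothetical helper that logs task time, errors, and event counts during one session.

```python
# Minimal sketch (illustrative): logging simple quantitative measures --
# task time, error count, and number of events -- during a usability session.
import time

class TaskLog:
    def __init__(self, task_name):
        self.task_name = task_name
        self.start = None
        self.errors = 0
        self.events = []          # (seconds since start, description) pairs

    def begin(self):
        self.start = time.time()

    def record(self, description, is_error=False):
        self.events.append((time.time() - self.start, description))
        if is_error:
            self.errors += 1

    def finish(self):
        elapsed = time.time() - self.start
        print(f"Task '{self.task_name}': {elapsed:.1f}s, "
              f"{self.errors} errors, {len(self.events)} events")

# Usage: the observer (or an instrumented prototype) records events as they happen.
log = TaskLog("add item to shopping cart")
log.begin()
log.record("opened wrong menu", is_error=True)
log.record("clicked Add to Cart")
log.finish()
```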

13 “Discount Usability Evaluation”
Methods: Inspection Extracting the conceptual model Direct observation think-aloud constructive interaction Query techniques (interviews and questionnaires) Continuous evaluation (user feedback and field studies)

14 Inspection Designer, or some other atypical person tries the system (or prototype) Does the system “feel right”? Benefits can catch some major problems in early versions Problems not reliable as completely subjective not valid as introspector is a non-typical user intuitions and introspection are often wrong Inspection methods help task centered walkthroughs (more later) heuristic evaluation (much more later)

15 Conceptual Model Extraction Learning about the user’s mental model
Method show the user static images of the prototype or screens during use ask the user to explain the function of each screen element how they would perform a particular task What? Initial conceptual model how person perceives a screen the very first time it is viewed Formative conceptual model How person perceives a screen after it’s been used for a while Value? good for eliciting people’s understanding before & after use poor for examining system exploration and learning

16 Direct Observations

17 Direct Observations Evaluator observes users interacting with system
in lab: user asked to complete a set of pre-determined tasks in field: user goes through normal duties Value excellent at identifying big design/interface problems validity depends on how controlled/contrived the situation is

18 Simple Observation Method
User is given the task Evaluator just watches the user Problem: Does not give insight into the user’s decision process or attitude

19 Think Aloud Method Users speak their thoughts while doing the task what they are trying to do why they took an action how they interpret what the system did gives insight into what the user is thinking most widely used evaluation method in industry (more later) may alter the way users do the task unnatural (awkward and uncomfortable) hard to talk if they are concentrating “Hmm, what does this do? I’ll try it… Oops, now what happened?”

20 Constructive Interaction Method
Two people work together on a task monitor their normal conversations removes awkwardness of think-aloud Co-discovery learning use semi-knowledgeable “coach” and novice only novice uses the interface novice asks questions coach responds gives insights into two user groups “Oh, I think you clicked on the wrong icon” “Now, why did it do that?”

21 Recording Observations
How do we record user actions for later analysis? otherwise risk forgetting, missing, or misinterpreting events paper and pencil primitive but cheap observer records events, comments, and interpretations hard to get detail (writing is slow) 2nd observer helps… audio recording good for recording think aloud talk hard to tie into on-screen user actions video recording can see and hear what a user is doing one camera for screen, rear view mirror useful… initially intrusive

22 Coding sheet example... E.g., tracking a person’s use of an editor
The sheet’s columns group the activities being coded: General actions (text editing, scrolling, image editing), Graph editing (new node, delete node, modify node), and Errors (corrected error, missed error). Each row is a timestamped observation (09:00, 09:02, 09:05, 09:10, 09:13) with an “x” marked under whichever activity was seen at that moment.
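A minimal, illustrative sketch (not from the slides) of keeping such a coding sheet as machine-readable CSV rows and tallying events per category for data reduction; the rows shown are hypothetical.

```python
# Minimal sketch (illustrative): a coding sheet stored as CSV rows of
# (time, category, activity), then tallied per category for data reduction.
import csv
from collections import Counter
from io import StringIO

# Hypothetical observations transcribed from the paper coding sheet.
SHEET = """time,category,activity
09:00,General actions,text editing
09:02,Graph editing,new node
09:05,Graph editing,modify node
09:10,Errors,missed error
09:13,General actions,scrolling
"""

rows = list(csv.DictReader(StringIO(SHEET)))
by_category = Counter(row["category"] for row in rows)

for category, count in by_category.items():
    print(f"{category}: {count} event(s)")
```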

23 Interviews

24 Interviews Good for pursuing specific issues Problems:
vary questions to suit the context probe more deeply on interesting issues as they arise good for exploratory studies via open-ended questioning often leads to specific constructive suggestions Problems: accounts are subjective time consuming evaluator can easily bias the interview prone to rationalization of events/thoughts by user user’s reconstruction may be wrong

25 How to Interview Plan a set of central questions
a few good questions get things started avoid leading questions focuses the interview could be based on results of user observations Let user responses lead follow-up questions follow interesting leads vs. bulldozing through question list

26 Retrospective Testing Interviews
Post-observation interview to perform an observational test create a video record of it have users view the video and comment on what they did clarify events that occurred during system use excellent for grounding a post-test interview avoids erroneous reconstruction users often offer concrete suggestions Do you know why you never tried that option? I didn’t see it. Why don’t you make it look like a button?

27 Critical Incident Interviews
People talk about incidents that stood out usually discuss extremely annoying problems with fervor not representative, but important to them often raises issues not seen in lab tests Tell me about the last big problem you had with Word I can never get my figures in the right place. It’s really annoying. I spent hours on it and I had to…

28 Now, to Shneiderman … … and the (large) organizational orientation

29 Evaluation Plan Metrics of Usability
“Evaluation plan” should be part of system development … and life cycle … in larger projects Also, part of “acceptance tests”: Objective measurable goals for hardware and software performance So, for system performance Response time, functionality, reliability, … And for usability, user-experience Values on specific metrics Also, part of “maintenance” after deployment As noted earlier, metrics of usability include: Time to learn specific tasks Speed of task performance Rate of errors Retention of commands or task sequences over time Frequency of help/assistance requests Subjective user satisfaction

30 Evaluation Plan – Depends on Project
Evaluation plan varies, depending on project Range of costs might be from 10%-20% of a project down to 5% Range of evaluation plans might be from years down to a few days of testing Will have different kinds of elements depending on: Stage of design (early, middle, late) Key screens, prototype, final system Novelty of project Well defined vs. exploratory Number of expected users Criticality of the interface E.g., life-critical medical system vs. museum exhibit support Costs of product and finances allocated for testing Time available Experience of design and evaluation team

31 “Step-by-Step Usability Guide” from Shneiderman, web site example – find the evaluation elements …

32 But, Testing is not a Panacea, so …
Nonetheless, testing can’t eliminate all problems In essence, plan for remaining problems/challenges As part of evaluation plan! Cost of eliminating error (or enhancing performance) does not increase linearly I.e., the obvious things are easy; the hard ones require more resources to refine A design/cost decision is made about how much to allocate Still, some things extremely hard to test E.g., user performance under stress

33 Expert Reviews, 1 Recall, “discount usability” …
Expert reviews can be from “just asking for feedback” to structured techniques, e.g., heuristic review (evaluation), guidelines review … of course, you have to have an expert, and large organizations do Expert needs to be familiar with domain and design goals One-half day to one week effort But lengthy training period may sometimes be required to explain task domain or operational procedures Even informal demos to colleagues or customers can provide some useful feedback More formal expert reviews have proven to be effective

34 Expert Reviews, 2 Can be scheduled at several points in development process When experts are available When design team is ready for feedback Different experts tend to find different problems in an interface 3-5 expert reviewers can be highly productive, as can complementary usability testing Caveats: Experts may not have an adequate understanding of task domain or user communities Conflicting advice Even experienced expert reviewers have great difficulty knowing how typical users, especially first-time users will really behave

35 Expert Review Techniques
Heuristic evaluation General review for adherence of interface to principles of successful design, e.g., Nielsen E.g., “error messages should be informative”, “feedback provided” Adherence to some theory or model, e.g., object-action model Guidelines review Check for conformance with guidelines Given complexity of guidelines, can be significant effort Consistency inspection E.g., of interface terminology, fonts, color schemes, input/output format Formal usability inspection/review – as part of SE process Structured forum for critiquing (if not courtroom style …) Might be occasion to request exception for guideline deviation Bird’s-eye view of interface By, e.g., full set of printed screens on wall Inconsistencies, organization, etc. more evident Cognitive walkthrough …

36 Heuristic Evaluation Recall, Nielsen’s Heuristics
Meet expectations 1. Match the real world 2. Consistency & standards 3. Help & documentation User is boss 4. User control & freedom 5. Visibility of system status 6. Flexibility & efficiency Errors 7. Error prevention 8. Recognition, not recall 9. Error reporting, diagnosis, and recovery Keep it simple 10. Aesthetic & minimalistic design

37 Heuristic Evaluation cf. Nielsen, useit.com article
A small number of experts / evaluators either use or observe use of system and provide list of problems, based on heuristics A type of “discount usability testing” Recall, principles of Nielsen, Shneiderman, Tognazzini, and others Some evaluators find some problems, others find others Nielsen recommends 3-5 evaluators Steps: Inspect UI thoroughly Compare UI against heuristics List usability problems Explain and justify each problem with heuristics

38 How To Do Heuristic Evaluation Details
Justify every problem with a heuristic “Too many choices on the home page - Aesthetic & minimalistic Design” Can’t just say “I don’t like the colors”, but can justify List every problem Even if an interface element has multiple problems Go through the interface at least twice Once to get the feel of the system Again to focus on particular interface elements Don’t limit to a single heuristic set (“8 Golden Rules”, Nielsen, etc.) Others: affordances, visibility, perceptual elements, color principles But, a particular heuristic set, e.g., Nielsen’s, is easier to compare against

39 Example - Shopping cart icon is not balanced with its background whitespace (Aesthetic & minimalist design) - Good: user is greeted by name (Visibility of system status) - Red is used both for help messages and for error messages (Consistency, Match real world) - “There is a problem with your order”, but no explanation or suggestions for resolution (Error reporting) - ExtPrice and UnitPrice are strange labels (Match real world) - Remove Hardware button inconsistent with Remove checkbox (Consistency) - “Click here” is unnecessary (Aesthetic & minimalist design) - No “Continue shopping” button (User control & freedom) - Recalculate is very close to Clear Cart (Error prevention) - “Check Out” button doesn’t look like other buttons (Consistency, both internal & external) - Uses “Cart Title” and “Cart Name” for the same concept (Consistency) - Must recall and type in cart title to load (Recognition not recall, Error prevention, Flexibility & efficiency)

40 Example Shopping cart icon not balanced with its background whitespace
(Aesthetic & minimalist design) Good: user is greeted by name (Visibility of system status) Red is used both for help messages and for error messages (Consistency, Match real world) “There is a problem with your order”, but no explanation or suggestions for resolution (Error reporting) ExtPrice and UnitPrice are strange labels (Match real world) Remove Hardware button inconsistent with Remove checkbox (Consistency)

41 Example "Click here“ is unnecessary (Aesthetic & minimalist design)
There is no “Continue shopping" button (User control & freedom) Recalculate is very close to Clear Cart (Error prevention) “Check Out” button doesn’t look like other buttons – which may be ok, trade-off ` (Consistency, both internal & external) Uses “Cart Title” and “Cart Name” for the same concept (Consistency) Must recall and type in cart title to load (Recognition not recall, Error prevention, Flexibility & efficiency)

42 Heuristic Evaluation is Not User Testing
Evaluators not the user either Maybe closer to being a typical user than you are, though Analogy: code inspection vs. testing Heuristic evaluation finds problems that user testing often misses E.g., inconsistent fonts But user testing is the “gold standard” for usability

43 Hints for Better Heuristic Evaluation
Use multiple evaluators Different evaluators find different problems The more the better, but diminishing returns Nielsen recommends 3-5 evaluators Alternate heuristic evaluation with user testing Each method finds different problems Heuristic evaluation is cheaper Use “observer” with evaluator Adds cost, but cheap enough anyway Take notes Provide domain guidance, where needed It’s OK for observer to help evaluator As long as the problem has already been noted This wouldn’t be OK in a user test

44 Writing Good Heuristic Evaluations (fyi)
Heuristic evaluations must communicate well to developers and managers Include positive comments as well as criticisms “Good: Toolbar icons are simple, with good contrast and few colors (minimalist design)” Be tactful Not: “the menu organization is a complete mess” Better: “menus are not organized by function” Be specific Not: “text is unreadable” Better: “text is too small, and has poor contrast (black text on dark green background)”

45 Suggested Report Format (fyi)
What to include: Problem Heuristic Description Severity Recommendation (if any) Screenshot (if helpful)
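A minimal, illustrative sketch (not part of the slides) of one finding captured in this report format as structured data; the field values are invented.

```python
# Minimal sketch (illustrative): one heuristic-evaluation finding recorded in
# the suggested report format. All field values below are made up.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Finding:
    problem: str
    heuristic: str
    description: str
    severity: int                      # 1 = cosmetic ... 4 = catastrophic
    recommendation: Optional[str] = None
    screenshot: Optional[str] = None   # path to an image, if helpful

example = Finding(
    problem="Recalculate button is adjacent to Clear Cart",
    heuristic="Error prevention",
    description="A slip of the mouse can destroy the cart contents.",
    severity=3,
    recommendation="Separate the buttons or add an undo/confirmation step.",
)
print(example)
```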

46 Formal Evaluation Formal evaluation typically much larger effort
Again, consider a large scale SE project Will look at some elements of formal evaluation Training, evaluation, severity ratings, debriefing

47 Formal Evaluation Process
1. Training Meeting for design team & evaluators Introduce application Explain user population, domain, scenarios 2. Evaluation Evaluators work separately Generate written report, or oral comments recorded by an observer Focus on generating problems, not on ranking their severity yet 1-2 hours per evaluator 3. Severity Rating Evaluators prioritize all problems found (not just their own) Take the mean of the evaluators’ ratings 4. Debriefing Evaluators & design team discuss results, brainstorm solutions

48 Severity Ratings Contributing factors Severity scale
Frequency: how common? Impact: how hard to overcome? Persistence: how often to overcome? Severity scale 1. Cosmetic: need not be fixed 2. Minor: needs fixing but low priority 3. Major: needs fixing and high priority 4. Catastrophic: imperative to fix
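A minimal, illustrative sketch (not from the slides) of the severity-rating step: averaging each problem's ratings across evaluators and sorting by mean severity; the problems and scores are invented.

```python
# Minimal sketch (illustrative): averaging severity ratings across evaluators
# and prioritizing problems by mean severity (1 = cosmetic ... 4 = catastrophic).
from statistics import mean

ratings = {
    "Recalculate next to Clear Cart": [4, 3, 3],
    "Inconsistent Check Out button":  [2, 2, 1],
    "Cart Title vs. Cart Name":       [2, 3, 2],
}

# Highest mean severity first.
for problem, scores in sorted(ratings.items(),
                              key=lambda kv: mean(kv[1]), reverse=True):
    print(f"{mean(scores):.1f}  {problem}")
```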

49 Evaluating Prototypes
Heuristic evaluation works on: Sketches Paper prototypes Unstable prototypes “Missing-element” problems are harder to find on sketches Because you’re not actually using the interface, you aren’t blocked by a feature’s absence Look harder for them

50 Cognitive Walkthrough
Expert “walks through” the design, as a user would, to carry out specific tasks Identifies potential problems using psychological principles E.g., “user has to remember action too long to successfully recall” Principle that short term memory is limited – more later Usually performed by expert in cognitive psychology Evaluates design on how well it supports user in learning task, etc. Analysis focuses on goals and knowledge: Does the interface design lead user to generate correct goals? E.g., at low level, having arrow in list box helps user form goal to click and select among alternatives For each task walkthrough considers What impact will interaction have on user? What cognitive processes are required? What learning problems may occur?

51 Kinds of User Tests Formative evaluation Field study
Find problems for next iteration of design Evaluates prototype or implementation, in lab, on chosen tasks Qualitative observations (usability problems) Field study Find problems in context Evaluates working implementation, in real context, on real tasks Mostly qualitative observations Controlled experiment Tests a hypothesis, e.g., interface X is faster than interface Y Evaluates working implementation, in controlled lab environment, on chosen tasks Mostly quantitative observations (time, error rate, satisfaction)

52 Usability Testing and Laboratories, 1
Usability testing and laboratories since early 1980s Speeded up projects and cut costs – which led to acceptance and implementation Usability testing is a unique practice Roots of techniques in experimental psychology Again, interface design is like engineering … which draws on science, but is a practice Not testing hypotheses about theories Rather, goal is to refine interfaces rapidly Varying only one variable (or a few) at a time is not appropriate Too slow, costly “User interface architect” Works with usability laboratory Carry out elements of evaluation plan “Pilot test”

53 Usability Testing and Laboratories, 2
Small usability lab Two areas: one for the participants to perform the tasks another, separated by a half-silvered mirror, for the testers and observers Participants should be chosen to represent intended user communities Consider … background in computing, experience with task, motivation, education, ability with natural language used in interface

54 Usability Testing – Techniques, 1
“Thinking aloud protocols” Surprisingly straightforward and effective technique Users simply say what they are doing User observed performing task user asked to describe what he is doing and why, what he thinks is happening, etc. Well studied methodology Advantages simplicity - requires little expertise can provide useful insight can show how system is actually used Disadvantages subjective selective act of describing may alter task performance

55 Usability Testing – Techniques, 2
Videotaping Useful for later review and showing designers or managers problems users encounter Sessions can be “coded” by observers for data reduction Paper mockups Sketches, story-boards Actually, used often! Different skills (and costs) for programming and sketching “Discount usability testing” – overview earlier Shneiderman’s, and others, term for “quick and dirty” “Rapid usability testing” Rapid, perhaps lo fidelity, prototype “Global” task performance Competitive usability testing Compares new design with existing or others Essentially, “incremental” testing of changes with existing as baseline

56 Usability Testing – Techniques, 3
Universal usability testing Considers diversity of hardware platforms and users E.g., ambient light levels, network speed, age groups, color-blindness Field test and portable labs Puts logging software, usability task, video equipment, etc. where the system will be used Cost benefits and validity Remote usability testing Web-based application natural, e-feedback in general Can-you-break-this tests … like it says …

57 Usability Testing – Limitations
Emphasizes first-time users After all, bringing in people to laboratory In fact, large labs often solicit participants via newspaper ads Short sessions show only first part of learning curve … Only possible to see part of complete system functionality Testing should be performed in the environment in which the system is to be used Office, home, outside … not in laboratory Misses context of use Testing should be performed for long duration And such long duration testing should be part of plan, but is often not

58 Ethics of User Testing Users are human beings
Human subjects have been seriously abused in past Research involving user testing is now subject to close scrutiny Institutional Review Board (IRB) must approve user studies Pressures on a user Performance anxiety Feel like an intelligence test Comparing self with other subjects Feeling stupid in front of observers Competing with other subjects Informed consent statement: I have freely volunteered to participate in this experiment. I have been informed in advance what my task(s) will be and what procedures will be followed. I have been given the opportunity to ask questions, and have had my questions answered to my satisfaction. I am aware that I have the right to withdraw consent and to discontinue participation at any time, without prejudice to my future treatment. My signature below may be taken as affirmation of all the above statements; it was given prior to my participation in this study.

59 Ethics of User Testing Treat the User With Respect
Time Don’t waste it Comfort Make the user comfortable Informed consent Inform the user as fully as possible Privacy Preserve the user’s privacy Control The user can stop at any time

60 Ethics of User Testing Before a Test
Time Pilot-test all materials and tasks Comfort (psychological and physical) “We’re testing the system; we’re not testing you” “Any difficulties you encounter are the system’s fault. We need your help to find these problems.” Privacy “Your test results will be completely confidential” Information Brief about purpose of study Inform about audio taping, videotaping, other observers Answer any questions beforehand (unless biasing) Control “You can stop at any time.”

61 Ethics of User Testing During the Test
Time Eliminate unnecessary tasks Comfort, calm, relaxed atmosphere Take breaks in long session Never act disappointed Give tasks one at a time First task should be easy, for an early success experience Privacy E.g., user’s boss shouldn’t be watching Information Answer questions (where won’t bias) Control User can give up a task and go on to the next User can quit entirely

62 Ethics of User Testing After the Test
Comfort Say what they’ve helped you do Information Answer questions that you had to defer to avoid biasing the experiment Privacy Don’t publish user-identifying information Don’t show video or audio without user permission

63 Formative Evaluation Integral part of spiral model
… vs. summative evaluation Used in design cycle, not just for end product Find some users Should be representative of the target user class(es), based on user analysis Give each user some tasks Should be representative of important tasks, based on task analysis Watch user do the tasks Roles in formative evaluation User Facilitator Observers

64 Thinking Aloud Protocol User’s Role
E.g., user should think aloud What they think is happening What they’re trying to do Why they took an action Problems Feels odd Thinking aloud may alter behavior Disrupts concentration Another approach: pairs of users Two users working together are more likely to converse naturally Also called co-discovery, constructive interaction

65 Thinking Aloud Protocol Facilitator’s Role
Does the briefing Provides the tasks Coaches the user to think aloud by asking questions “What are you thinking?” “Why did you try that?” Controls the session and prevents interruptions by observers

66 Thinking Aloud Protocol Observer’s Role
Be quiet Don’t help, don’t explain, don’t point out mistakes Take notes Watch for critical incidents: events that strongly affect task performance or satisfaction Usually negative Errors Repeated attempts Curses May be positive “Cool” “Oh, now I see”

67 Thinking Aloud Protocol Recording Observations
Pen & paper notes Prepared forms can help Audio recording For think-aloud Video recording Usability labs often set up with two cameras, one for user’s face, one for screen User may be self-conscious Good for closed-circuit view by observers in another room Generates too much data Retrospective testing: go back through the video with the user, discussing critical incidents Screen capture & event logging Cheap and unobtrusive

68 How Many Users? (supplementary)
Landauer-Nielsen model Every tested user finds a fraction L of usability problems (typical L = 31%) If user tests are independent, then n users will find a fraction 1-(1-L)^n of the problems So 5 users will find about 85% of the problems Which is better: Using 15 users to find 99% of problems with one design iteration Using 5 users to find 85% of problems with each of three design iterations For multiple user classes, get 3-5 users from each class
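A minimal sketch of the Landauer-Nielsen estimate as a function; the printed figures reproduce the slide's 5-user and 15-user cases and the Spool & Schroeder L = 8% case mentioned on the next slide.

```python
# Minimal sketch (illustrative): the Landauer-Nielsen estimate of the fraction
# of usability problems found by n independent users, each finding fraction L.
def fraction_found(n_users: int, L: float = 0.31) -> float:
    """Expected fraction of problems found: 1 - (1 - L)^n."""
    return 1 - (1 - L) ** n_users

print(f"5 users,  L=0.31: {fraction_found(5):.1%}")         # ~84.4%, the slide's "about 85%"
print(f"15 users, L=0.31: {fraction_found(15):.1%}")        # ~99.6%
print(f"5 users,  L=0.08: {fraction_found(5, L=0.08):.1%}") # ~34%, Spool & Schroeder case
```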

69 Flaws in Nielsen-Landauer Model (supplementary)
L may be much smaller than 31% Spool & Schroeder study of a CD-purchasing web site found L=8%, so 5 users only find 35% of problems L may vary from problem to problem Different problems have different probabilities of being found, caused by: Individual differences Interface diversity Task complexity Lesson: you can’t predict with confidence how many users may be needed

70 Usability Testing - Other Techniques
Physiological methods Eye tracking Head or desk mounted equipment tracks position of eye Eye movement reflects amount of cognitive processing a display requires Measurements include Fixations: eye maintains stable position. Number and duration indicate level of difficulty with display Saccades: rapid eye movement from one point of interest to another Scan paths: moving straight to a target with a short fixation at the target is optimal

71 Usability Testing – “Heat Maps”, 1 Eye tracking
Eye movement (gaze) data mapped to false color “Eyetracking Web Usability”, Nielsen, 2009

72 Usability Testing – “Heat Maps”, 2 Eye tracking
Eye movement (gaze) data mapped to false color “Search Engine Optimization for Dummies”, Koneka

73 Usability Testing - Other Techniques
Physiological measurements Emotional response linked to physical changes These may help determine a user’s reaction to an interface Measurements include: heart activity, including blood pressure, volume and pulse. activity of sweat glands: Galvanic Skin Response (GSR) electrical activity in muscle: electromyogram (EMG) electrical activity in brain: electroencephalogram (EEG) Difficulty in interpreting these physiological responses more research needed

74 Survey Instruments and Questionnaires (briefly)
Familiar, inexpensive and generally acceptable companion for usability tests and expert reviews Advantages quick and reaches large user group can be analyzed more rigorously Disadvantages less flexible less probing Long, detailed example in text Keys to successful surveys: Clear goals in advance – what information is required Development of focused items that help attain the goals Styles of question General, open-ended, scalar, multiple-choice, ranked Users could be asked for their subjective impressions about specific aspects of interface such as representations of: task domain objects and actions syntax of inputs and design of displays.

75 Surveys and Questionnaires, 2 (briefly)
Other goals would be to ascertain: users’ background (age, gender, origins, education, income) experience with computers (specific applications or software packages, length of time, depth of knowledge) job responsibilities (decision-making influence, managerial roles, motivation) personality style (introvert vs. extrovert, risk-taking vs. risk-averse, early vs. late adopter, systematic vs. opportunistic) reasons for not using an interface (inadequate services, too complex, too slow) familiarity with features (printing, macros, shortcuts, tutorials) their feeling state after using an interface (confused vs. clear, frustrated vs. in-control, bored vs. excited) Online surveys avoid cost of printing and extra effort needed for distribution and collection of paper forms Many people prefer to answer a brief survey displayed on a screen, instead of filling in and returning a printed form, although there is a potential bias in the sample.

76 Acceptance Test (briefly)
As noted at outset: For large implementation projects, customer or manager usually sets objective and measurable goals for hardware and software performance If completed product fails to meet these acceptance criteria, system must be reworked until success is demonstrated; the project “deliverables” include these criteria Again, measurable criteria for user interface can be established and might include: Time to learn specific functions Speed of task performance Rate of errors by users Human retention of commands over time Subjective user satisfaction In large system, there may be 8 or 10 such tests to carry out on different components of interface and with different user communities Once acceptance testing has been successful, there may be a period of field testing before national or international distribution

77 Evaluation During Active Use (briefly)
Recall, evaluation plan should include evaluation throughout software’s life cycle Successful active use requires constant attention from dedicated managers, user-services personnel, and maintenance staff “Perfection is not attainable, but percentage improvements are possible” Idea of “gradual interface dissemination” useful for minimal disruption Continue to fix problems and refine design (including user interface) Taken further by alpha and beta testing Many techniques available: Interviews and focus group discussions Continuous user-performance data logging Online suggestion box or trouble reporting Discussion group and newsgroup Interviews with individual users can be productive because the interviewer can pursue specific issues of concern. Group discussions are valuable to ascertain the universality of comments

78 Evaluation During Active Use, 2 (briefly)
Continuous user-performance data logging The software architecture should make it easy for system managers to collect data about The patterns of system usage Speed of user performance Rate of errors Frequency of requests for online assistance A major benefit is guidance to system maintainers in optimizing performance and reducing costs for all participants Online or telephone consultants Many users feel reassured if they know there is human assistance available On some network systems, the consultants can monitor the user's computer and see the same displays that the user sees Online suggestion box or trouble reporting Electronic mail to the maintainers or designers. For some users, writing a letter may be seen as requiring too much effort Discussion group and newsgroup Permit postings of open messages and questions Some are independent, e.g. America Online and Yahoo! Topic list Sometimes moderators Social systems Comments and suggestions should be encouraged.
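A minimal, illustrative sketch (not from the text) of summarizing continuously logged user-performance data for maintainers; the log format and event names are assumptions.

```python
# Minimal sketch (illustrative): summarizing a user-performance log per day --
# errors, help requests, and completed tasks. Log format/events are assumed.
from collections import defaultdict

LOG = [
    ("2024-05-01", "u17", "task_complete"),
    ("2024-05-01", "u17", "error"),
    ("2024-05-01", "u22", "help_request"),
    ("2024-05-02", "u09", "task_complete"),
    ("2024-05-02", "u09", "help_request"),
    ("2024-05-02", "u22", "error"),
]

daily = defaultdict(lambda: defaultdict(int))
for day, user, event in LOG:
    daily[day][event] += 1

for day in sorted(daily):
    counts = daily[day]
    print(f"{day}: {counts['error']} errors, "
          f"{counts['help_request']} help requests, "
          f"{counts['task_complete']} tasks completed")
```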

79 End?

80 Controlled Psychologically-oriented Experiments - Context

81 Controlled Psychologically-oriented Experiments - Context
Recall, … goals for the engineering practice of interface design and implementation differ from goals for science …psychology, physics (or, for that matter a science of HCI) Goal of interface design is to design and implement (“good”) interfaces rapidly, or, in a pragmatic, cost effective manner Hence, different techniques appropriate

82 Controlled Psychologically-oriented Experiments - Context
Goal of interface design is to design and implement (“good”) interfaces rapidly, or, in a pragmatic, cost effective manner Hence, different techniques appropriate Following from this, Shneiderman suggests that: As scientific and engineering progress is often stimulated by improved techniques for precise measurement, Rapid progress in the designs of interfaces will be stimulated as researchers and practitioners evolve suitable human-performance measures and techniques For example: Appliances have energy efficiency ratings, and Interfaces might have measures such as learning time for tasks, user satisfaction ratings

83 Controlled Psychologically-oriented Experiments - Context
Goal of interface design is to design and implement (“good”) interfaces rapidly, or, in a pragmatic, cost effective manner Hence, different techniques appropriate A second principle Shneiderman suggests is to “adapt” elements of “the scientific method” to HCI, or interface design Bears looking at, but understand that he is speaking of both “science” and “empirical investigation” In fact, this is how you, as students educated in science and engineering should think!

84 Controlled Psychologically-oriented Experiments Shneiderman - “Empirical Investigation for Interface Design” How to “adapt” the scientific method, or empirical investigation, as applied to HCI and interface design Deal with a practical problem and consider the theoretical framework To help the user learn how to navigate through the information presented, he/she should be shown which items to select as objects and where committing to some action will take him or her State a lucid and testable hypothesis “By changing (color, font) of the item it will be more easily selected as shown by the time to perform the task decreasing by 2 seconds” Identify a small number of independent variables that are to be manipulated Those things to change (manipulate), e.g., color, font Carefully choose the dependent variables that will be measured Those things to measure, e.g., time to complete task

85 Controlled Psychologically-oriented Experiments Shneiderman - “Empirical Investigation for Interface Design” How to “adapt” the scientific method, or empirical investigation, as applied to HCI and interface design Judiciously select subjects and carefully or randomly assign subjects to groups As noted below – one of several “biasing factors” Control for biasing factors (non-representative sample of subjects or selection of tasks, inconsistent testing procedures) So that any change in value of dependent variable is not attributable to anything except the difference in independent variable Apply statistical methods to data analysis So you know what to expect by chance, measurement error, etc. Resolve the practical problem, refine the theory, and give advice to future researchers
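A minimal, illustrative sketch (not from the slides) of one way to control a biasing factor: randomly assigning recruited subjects to the two menu conditions from the earlier example. Subject IDs are invented.

```python
# Minimal sketch (illustrative): random assignment of subjects to two
# experimental groups, one way to control for biasing factors.
import random

subjects = ["s01", "s02", "s03", "s04", "s05", "s06", "s07", "s08"]

random.seed(42)            # fixed seed only so the example output is reproducible
random.shuffle(subjects)

half = len(subjects) // 2
groups = {
    "pull-down menu":  subjects[:half],
    "pull-right menu": subjects[half:],
}
for condition, members in groups.items():
    print(f"{condition}: {members}")
```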

86 End

87 Controlled Psychologically-oriented Experiments - Context
Recall, idea that goals for the engineering practice of interface design and implementation differ from goals for science of psychology (or, for that matter a science of HCI) Goal of interface design is to design and implement (“good”) interfaces rapidly, or, in a pragmatic, cost effective manner Hence, different techniques appropriate Following from this, Shneiderman suggests that: As scientific and engineering progress is often stimulated by improved techniques for precise measurement, Rapid progress in the designs of interfaces will be stimulated as researchers and practitioners evolve suitable human-performance measures and techniques For example: Appliances have energy efficiency ratings, and Interfaces might have measures such as learning time for tasks, user satisfaction ratings A second principle Shneiderman suggests is to “adapt” elements of “the scientific method” to HCI, or interface design Bears looking at, but understand that he is speaking of both “science” and “empirical investigation” In fact, this is how you, as students educated in science and engineering should think!

88 Controlled Psychologically-oriented Experiments - “Empirical Investigation for Interface Design” (supplemental) “The scientific method (Shneiderman)”, or empirical investigation, as applied to HCI and interface design Deal with a practical problem and consider the theoretical framework To help the user learn how to navigate through the information presented, he/she should be shown which items to select as objects and where committing to some action will take him or her State a lucid and testable hypothesis By changing (color, font) of the item it will be more easily selected as shown by the time to perform the task decreasing by 2 seconds Identify a small number of independent variables that are to be manipulated Those things to change (manipulate), e.g., color, font Carefully choose the dependent variables that will be measured Those things to measure, e.g., time to complete task Judiciously select subjects and carefully or randomly assign subjects to groups As noted below – one of several “biasing factors” Control for biasing factors (non-representative sample of subjects or selection of tasks, inconsistent testing procedures) So that any change in value of dependent variable is not attributable to anything except the difference in independent variable Apply statistical methods to data analysis So you know what to expect by chance, measurement error, etc. Resolve the practical problem, refine the theory, and give advice to future researchers

89 End Materials from: Shneiderman publisher site:
John Klemmer’s Intro. to HCI Design course; MIT OpenCourseWare, Robert Miller’s User Interface Design and Implementation; lecture materials by Saul Greenberg, University of Calgary, AB, Canada.

