
Presentation on theme: "Intro to Evaluation See how (un)usable your software really is…"— Presentation transcript:

1 Intro to Evaluation See how (un)usable your software really is…

2 Why is evaluation done? Summative: assess an existing system; judge whether it meets some criteria. Formative: assess a system being designed; gather input to inform design. Summative or formative? Depends on the maturity of the system and on how the evaluation results will be used. The same technique can be used for either.

3 Other distinctions. Form of results obtained: quantitative or qualitative. Who is experimenting with the design: end users or HCI experts. Approach: experimental, naturalistic, or predictive.

4 Evaluation techniques: predictive evaluation, interviews, questionnaires, observation, experiments. Discount evaluation techniques: heuristic evaluation, cognitive walkthrough.

5 Predictive Evaluation Predict user performance and usability Rules or formulas based on experimentation Quantitative Predictive In a bit…

6 Interviews & Questionnaires Ask users what they think about your prototype / design / ideas Qualitative & quantitative Subjective End users or other stakeholders Often accompanies other methods to get subjective feedback

7 Observation Watch users perform tasks with your interface Qualitative & quantitative Objective Experimental or naturalistic Variations Think-aloud Cooperative evaluation

8 Experiments Test hypotheses about your interface Quantitative Objective Experimental Examine dependent variables against independent variables Often used to compare two designs or compare performance between groups Next week…

9 Discount usability techniques Fast and cheap method to get broad feedback Use HCI experts instead of users Qualitative mostly Heuristic evaluation Several experts examine interface using guiding heuristics (like the ones we used in design) Cognitive Walkthrough Several experts assess learnability of interface for novices

10 And still more techniques. Diary studies: users relate experiences on a regular basis; they can write them down, call them in, etc. Experience Sampling Technique: interrupt users with a very short questionnaire on a random-ish basis. Good for getting an idea of regular and long-term use in the field (the real world).

11 A “typical” usability study. Bring users into a lab. Introduce them to your interface. Give them a script or several tasks and ask them to complete them. Look for errors & problems, performance, etc. Interview or questionnaire afterwards to get additional feedback.

12 Usability Lab http://www.surgeworks.com/services/observation_room2.htm Large viewing area in this one-way mirror, which includes an angled sheet of glass that improves light capture and prevents sound transmission between rooms. Doors for the participant and observation rooms are located such that participants are unaware of observers' movements in and out of the observation room.

13 A “typical” usability study Questionnaire (biographical data) Observation of several tasks Sometimes as part of an experiment Interview (for additional feedback)

14 Evaluation is Detective Work Goal: gather evidence that can help you determine whether your usability goals are being met Evidence (data) should be: Relevant Diagnostic Credible Corroborated

15 Data as Evidence. Relevant: appropriate to address the hypotheses. e.g., Does measuring “number of errors” provide insight into how effectively your new air traffic control system supports the users’ tasks? Diagnostic: the data unambiguously provide evidence one way or the other. e.g., Does asking the users’ preferences clearly tell you whether the system performs better? (Maybe)

16 Data as Evidence Credible Are the data trustworthy? Gather data carefully; gather enough data Corroborated Do more than one source of evidence support the hypotheses? e.g. Both accuracy and user opinions indicate that the new system is better than the previous system. But what if completion time is slower?

17 General Recommendations Identify evaluation goals Include both objective & subjective data e.g. “completion time” and “preference” Use multiple measures, within a type e.g. “reaction time” and “accuracy” Use quantitative measures where possible e.g. preference score (on a scale of 1-7) Note: Only gather the data required; do so with minimum interruption, hassle, time, etc.

18 Evaluation planning. Decide on techniques, tasks, materials. What are the usability criteria? How much authenticity is required? How many people, and for how long? How to record data, how to analyze data. Prepare materials – interfaces, storyboards, questionnaires, etc. Pilot the entire evaluation: test all materials, tasks, questionnaires, etc.; find and fix problems with wording and assumptions; get a good feel for the length of the study.

19 Recruiting Participants Various “subject pools” Volunteers Paid participants Students (e.g., psych undergrads) for course credit Friends, acquaintances, family, lab members “Public space” participants - e.g., observing people walking through a museum Email, newsgroup lists Must fit user population (validity) Note: Ethics, Consent apply to *all* participants, including friends & “pilot subjects”

20 Performing the Study Be well prepared so participant’s time is not wasted Explain procedures without compromising results Session should not be too long, subject can quit anytime Never express displeasure or anger Data to be stored anonymously, securely, and/or destroyed Expect anything and everything to go wrong!! (a little story)

21 Consent Why important? People can be sensitive about this process and issues Errors will likely be made, participant may feel inadequate May be mentally or physically strenuous What are the potential risks (there are always risks)?

22 Data Analysis Start just looking at the data Were there outliers, people who fell asleep, anyone who tried to mess up the study, etc.? Identify issues: Overall, how did people do? “5 W’s” (Where, what, why, when, and for whom were the problems?) Compile aggregate results and descriptive statistics

23 Making Conclusions Where did you meet your criteria? Where didn’t you? What were the problems? How serious are these problems? What design changes should be made? But don’t make things worse… Prioritize and plan changes to the design Iterate on entire process

24 Example: Heather’s study Software: MeetingViewer interface fully functional Criteria – learnability, efficiency, see what aspects of interface get used, what might be missing Resources – subjects were students in a research group, just me as evaluator, plenty of time Wanted completely authentic experience

25 Heather’s software

26 Heather’s evaluation. Task: answer questions from a recorded meeting, using my software as desired. Think-aloud. Videotaped, plus software logs. Also had a post-questionnaire. Wrote my own code for log analysis. Watched the video and matched behavior to the software logs.

27 Example materials

28 Data analysis Basic data compiled: Time to answer a question (or give up) Number of clicks on each type of item Number of times audio played Length of audio played User’s stated difficulty with task User’s suggestions for improvements More complicated: Overall patterns of behavior in using the interface User strategies for finding information

29 Data representation example

30 Data presentation

31 Some usability conclusions Need fast forward and reverse buttons (minor impact) Audio too slow to load (minor impact) Target labels are confusing, need something different that shows dynamics (medium impact) Need more labeling on timeline (medium impact) Need different place for notes vs. presentations (major impact)

32 Your turn: during break In your project groups Which usability goals are important for you? How might you measure each one? Which techniques would help get those measurements?

33 Reminder: (some) usability goals Learnability Predictability Synthesizability Familiarity Generalizability Consistency Error prevention Recoverability Observability Responsiveness Task conformance Flexibility Customizability Substitutivity Satisfying Engaging Motivating Efficient Aesthetic

34 Predictive Models Translate empirical evidence into theories and models that can influence design. Performance measures Quantitative Time prediction Working memory constraints Competence measures Focus on certain details, others obscured

35 Two Types of User Modeling. Stimulus-response: power law of practice, Fitts’ law. Cognitive – human as interpreter/predictor – based on the Model Human Processor (MHP): Keystroke-Level Model (low-level, simple), GOMS and similar models (higher-level: Goals, Operators, Methods, Selection rules).

36 Power law of practice: T_n = T_1 * n^(-a). The time T_n to complete the nth trial is the time T_1 on the first trial times n to the power -a; a is about 0.4 (typically between 0.2 and 0.6). Applies to skilled behavior (stimulus-response and routine cognitive actions): typing speed improvement, learning to use a mouse, pushing buttons in response to stimuli. It is NOT a model of learning.

37 Power Law: T_n = T_1 * n^(-a). If the first trial (T_1) takes 5 seconds, how long will future trials take? When will improvements level off? (a = 0.4)
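A minimal sketch answering the question above, assuming T_1 = 5 s and a = 0.4 as given on the slide:

```python
def trial_time(t1, n, a=0.4):
    """Power law of practice: T_n = T_1 * n**(-a)."""
    return t1 * n ** (-a)

for n in (1, 2, 5, 10, 50, 100):
    print(n, round(trial_time(5.0, n), 2))
# -> 5.0, 3.79, 2.63, 1.99, 1.05, 0.79: the improvement flattens out quickly
```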

38 Uses for Power Law of Practice. Use the measured time T_1 on trial 1 to predict whether time with practice will meet the usability criteria after a reasonable number of trials (how many trials are reasonable?). Predict how many practices will be needed for the user to meet the usability criteria. Determine whether the usability criteria are realistic.
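The second use above amounts to inverting the law: solving T_n = T_1 * n^(-a) <= T_c for n gives n >= (T_1 / T_c)^(1/a). A small sketch, again assuming a = 0.4:

```python
import math

def trials_to_reach(t1, criterion, a=0.4):
    """Smallest n with T_1 * n**(-a) <= criterion."""
    return math.ceil((t1 / criterion) ** (1 / a))

print(trials_to_reach(5.0, 2.0))   # -> 10 trials to get from 5 s down to 2 s
```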

39 Fitts’ Law. Models movement times for selection tasks. Paul Fitts: war-time human factors pioneer. Basic idea: movement time for a well-rehearsed selection task increases as the distance to the target increases, and decreases as the size of the target increases.

40 Moving. Move from START to STOP: D is the distance to the target, W is the width of the target. Index of Difficulty: ID = log2(2D/W) (in unitless bits).

41 Movement Time: MT = a + b*ID, or MT = a + b*log2(2D/W). Empirical measurement establishes the constants a and b; they are different for different devices and for different ways the same device is used.
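A minimal sketch of the prediction; the constants a and b here are hypothetical placeholders standing in for values that would come from empirical measurement of a particular device:

```python
import math

def movement_time(distance, width, a=0.1, b=0.1):
    """Fitts' law: MT = a + b * log2(2D / W)."""
    index_of_difficulty = math.log2(2 * distance / width)   # in bits
    return a + b * index_of_difficulty

# e.g. a 16-pixel-wide target 256 pixels away: ID = 5 bits
print(round(movement_time(256, 16), 2))   # -> 0.6 s with these illustrative constants
```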

42 Questions. What do you do in 2D? For a w x l rectangle, one way is ID = log2(d / min(w, l) + 1). It should also take the direction of approach into account.
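A sketch of that 2D variant, with the same caveat that it ignores the direction of approach:

```python
import math

def rect_index_of_difficulty(d, w, l):
    """One 2D variant: ID = log2(d / min(w, l) + 1) for a w-by-l target."""
    return math.log2(d / min(w, l) + 1)

print(round(rect_index_of_difficulty(256, 64, 16), 2))   # -> 4.09 bits
```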

43 Applications When does it apply? How used in interface design?

44 GOMS Goals, Operators, Methods, Selection Rules Card, Moran, & Newell (1983) Assumptions Human activity is problem solving Decompose into subproblems Determine goals to “attack” problem Know sequence of operations used to achieve the goals Timing values for each operation

45 GOMS: Components. Goals: states to be achieved. Operators: elementary perceptual, cognitive, and motor acts (not as fine-grained as the Model Human Processor). Methods: procedures for accomplishing a (sub)goal, e.g., move the cursor via the mouse or via keys. Selection Rules: if-then rules that determine which method to use.
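A toy, purely hypothetical way of representing such a fragment in code; the goal, methods, and rule below are made up for illustration, not taken from any published GOMS model:

```python
# Hypothetical GOMS fragment: one goal, two methods, one selection rule.
goms_fragment = {
    "goal": "move-cursor-to-word",
    "methods": {
        "mouse": ["home-hand-on-mouse", "point-to-word", "click"],
        "keys":  ["press-arrow-keys-until-at-word"],
    },
    # If-then selection rule choosing between the methods.
    "selection_rule": lambda ctx: "keys" if ctx["distance_chars"] <= 3 else "mouse",
}

method = goms_fragment["selection_rule"]({"distance_chars": 12})
print(method, goms_fragment["methods"][method])   # -> mouse [...]
```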

46 GOMS: Limitations GOMS is not so well suited for: Tasks where steps are not well understood Inexperienced users Why?

47 GOMS: Application. NYNEX telephone operator workstation. A GOMS analysis was used to determine the critical path and the time to complete a typical task. It determined that the new system would actually be slower. The system was abandoned, saving millions of dollars.

48 Keystroke Level Model (KLM) Chapter 12.5 Low-level GOMS variant Also developed by Card, Moran, and Newell (1983) Skilled users performing routine tasks Assumes error-free performance Analyze only observable behaviors Keystrokes, mouse movements Assigns times to basic human operations - experimentally verified

49 KLM accounts for: keystroking T_K; mouse button press T_B; pointing (typically with a mouse) T_P; hand movement between keyboard and mouse T_H; drawing straight line segments T_D; “mental preparation” T_M; system response time T_R.

50 Step One: MS Word Find Command. Use the Find command to locate a six-character word. H (Home on mouse) P (Edit) B (click on mouse button - press/release) P (Find) B (click on mouse button) H (Home on keyboard) 6K (Type six characters into Find dialogue box) K (Return key on dialogue box starts the find)

51 Using KLM - Step Two: place M operators. Rule 0a: in front of all K's that are NOT part of argument strings (i.e., not part of text or numbers). Rule 0b: in front of all P's that select commands (not arguments).

52 Step Two: MS Word Find Command. H (Home on mouse) MP (Edit) B (click on mouse button) MP (Find) B (click on mouse button) H (Home on keyboard) 6K (Type six characters) MK (Return key on dialogue box starts the find). (Rule 0b: the P's select commands, so each gets an M; Rule 0a: the six K's are argument text, so they get no M.)

53 Using KLM - Step 3: remove M's according to heuristic rules (the rules relate to chunking of actions). Rule 1: anticipated by the prior operation – H MP -> HP (pointing to a menu item is anticipated by moving the hand to the mouse). Rule 2: if a string of MKs is a single cognitive unit (such as a command name), delete all but the first – MKMKMK -> MKKK (same as M3K; typing "run" is a chunk). Rule 3: redundant terminator, such as )) or rtn rtn. Rule 4: if K terminates a constant string, such as command-rtn, then delete the M – M2K(ls)MK(rtn) -> M2K(ls)K(rtn) (typing the "ls" command in Unix followed by rtn is a chunk).

54 Step 3: MS Word Find Command. H (Home on mouse) MP (Edit) [Rule 1: delete this M; H anticipates P] B (click on mouse button) MP (Find) B (click on mouse button) H (Home on keyboard) 6K (Type six characters) MK (Return key on dialogue box starts the find) [Rule 4: keep this M].
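A toy sketch of how Rules 0a/0b and Rule 1 might be mechanized; it is deliberately simplified (the other rules are omitted) and the operator encoding is made up for illustration:

```python
# Each operator is a (symbol, kind) pair; kind marks whether a K or P is part
# of an argument string ("arg") or selects a command ("cmd").

def place_mental_operators(ops):
    """Rules 0a/0b: insert M before every command K and every command P."""
    out = []
    for sym, kind in ops:
        if sym in ("K", "P") and kind == "cmd":
            out.append(("M", None))
        out.append((sym, kind))
    return out

def drop_anticipated_ms(ops):
    """Rule 1 (simplified): delete an M between an H and a P, since pointing
    is anticipated by moving the hand to the mouse."""
    out, i = [], 0
    while i < len(ops):
        if (ops[i][0] == "M" and out and out[-1][0] == "H"
                and i + 1 < len(ops) and ops[i + 1][0] == "P"):
            i += 1          # skip this anticipated M
            continue
        out.append(ops[i])
        i += 1
    return out

# The MS Word Find command from the slides:
find_cmd = ([("H", None), ("P", "cmd"), ("B", None), ("P", "cmd"), ("B", None),
             ("H", None)] + [("K", "arg")] * 6 + [("K", "cmd")])
seq = drop_anticipated_ms(place_mental_operators(find_cmd))
print("".join(sym for sym, _ in seq))   # -> HPBMPBHKKKKKKMK, as in Steps 3 and 4
```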

55 Using KLM - Step 4: plug in real numbers from experiments. K: 0.08 sec for the best typists, 0.28 average, 1.2 if unfamiliar with the keyboard. B: down or up, 0.1 sec; click, 0.2 sec. P: 1.1 sec. H: 0.4 sec. M: 1.35 sec. R: depends on the system; often less than 0.05 sec.

56 Step 4: MS Word Find Command. H (Home on mouse) P (Edit) B (click on mouse button - press/release) MP (Find) B (click on mouse button) H (Home on keyboard) 6K (Type six characters into Find dialogue box) MK (Return key on dialogue box starts the find). Timings: H = 0.40, P = 1.10, B = 0.20, M = 1.35, K = 0.28. Totals: 2H, 2P, 2B, 2M, 7K. Predicted time = 8.06 secs.
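A minimal sketch that just sums the operator times above for a sequence of symbols; klm_time is a hypothetical helper name, and K is fixed at the 0.28 s average-typist value:

```python
# Textbook operator times in seconds; D and R are omitted since they vary.
KLM_TIMES = {"K": 0.28, "B": 0.20, "P": 1.10, "H": 0.40, "M": 1.35}

def klm_time(sequence):
    """Predicted execution time for a string of KLM operator symbols."""
    return sum(KLM_TIMES[op] for op in sequence)

# The Find-command sequence after M placement and removal:
print(round(klm_time("HPBMPBHKKKKKKMK"), 2))   # -> 8.06
```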

57 Example: MS Windows Menu Selection Get hands on mouse Select from menu bar with click of mouse button The “pull down” menu appears Select desired item from the pull down menu

58 Step 1: MS Windows Menu H (Home on mouse) P (point to menu bar item) B (left-click with mouse button) P (point to menu item) B (left-click with mouse button)

59 Step 2: MS Windows Menu - Add M’s H (get hand on mouse) MP (point to menu bar item) B (left-click with mouse button) MP (point to menu item) B (left-click with mouse button) Rule 0b: P selects command

60 Step 3: MS Windows Menu - Delete M's. H (get hand on mouse) MP (point to menu bar item) [Rule 1: delete this M; pointing is anticipated by homing on the mouse] B (left-click with mouse button) MP (point to menu item) [keep this M] B (left-click with mouse button).

61 Step 4: MS Windows Menu - Calculate Time. H (get hand on mouse) P (point to menu bar item) B (left-click with mouse button) MP (point to menu item) B (left-click with mouse button). Textbook timings (all in seconds): H = 0.40, P = 1.10, B = 0.20, M = 1.35. Totals: H, 2P, 2B, 1M. Total predicted time = 4.35 sec.
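The same hypothetical klm_time helper from the earlier sketch reproduces this figure:

```python
print(round(klm_time("HPBMPB"), 2))   # -> 4.35, matching the hand calculation
```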

62 Alternative Menu Selection. Operator sequence: H(mouse) P(to menu item) B(down) P B(up). Now place Ms (Rule 0b): H(mouse) MP(to menu item) B(down) MP B(up). Selectively remove Ms (Rule 1: delete the first M, since H anticipates P): H(mouse) P(to menu item) B(down) MP B(up). Textbook timings (all in seconds): H = 0.40, P = 1.10, B = 0.10 for up or down, M = 1.35. Totals: H, 2P, 2B, 1M. Total predicted time = 4.15 sec. The alternative is predicted to be 0.2 secs faster than the typical method, about 5%.

63 KLM Comparison Problem. Are keyboard accelerators always faster than menu selection? Use MS Windows to compare: menu selection of File/Print (the previous example estimated 4.35 secs) vs. the keyboard accelerator ALT-F to open the File pull-down menu, then the P key to select the Print menu item. Assume hands start on the keyboard.

64 KLM Comparison: Keyboard Accelerator for Print. Use the keyboard for ALT-F, P (hands are already there). Raw operators: K(ALT) K(F) K(P). Place Ms: MK(ALT) MK(F) MK(P). Remove Ms (the first K anticipates the second, so its M is deleted): MK(ALT) K(F) MK(P). 2M + 3K = 2.7 + 3K. Times for K based on typing speed: good typist, K = 0.12 s, total time = 3.06 s; poor typist, K = 0.28 s, total time = 3.54 s; non-typist, K = 1.20 s, total time = 6.30 s. Time with the mouse was 4.35 sec. Conclusion: accelerator keys are not faster than the mouse for non-typists.

65 What if you started on the mouse? Use the keyboard for ALT-F, P, but the hand must first move to the keyboard: H K(ALT) K(F) K(P), which becomes H MK(ALT) K(F) MK(P). H + 2M + 3K = 3.1 + 3K. Times for K based on typing speed: good typist, K = 0.12 s, total time = 3.46 s; poor typist, K = 0.28 s, total time = 3.94 s. Time with the mouse was 4.35 sec. Conclusion: yup, still faster for reasonable typists.
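A short sketch of the arithmetic on the last two slides, comparing the accelerator against the 4.35 s menu path for different typing speeds:

```python
M, H = 1.35, 0.40                      # mental preparation, homing (seconds)
for label, K in [("good typist", 0.12), ("average typist", 0.28), ("non-typist", 1.20)]:
    from_keyboard = 2 * M + 3 * K      # hands start on the keyboard
    from_mouse = H + 2 * M + 3 * K     # hands start on the mouse
    print(f"{label}: {from_keyboard:.2f} s / {from_mouse:.2f} s (menu path: 4.35 s)")
```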

66 Comparison. Consider: compare selecting a menu item in a right-click popup menu vs. selecting the same menu item from a menu in the menu bar. What would Fitts’ Law say? What would the KLM say?

67 One more practice. Drag through text and make it bold: by pointing to the BOLD icon in a floating palette, or by selecting BOLD from a pull-down menu.

68 Next week Observation and Experiments Evaluation plan feedback Highly recommended: start on your evaluation plan, put on a Wiki page or slide We’ll give each other feedback The more you have, the more feedback you get

