Intro to Evaluation See how (un)usable your software really is…


1 Intro to Evaluation See how (un)usable your software really is…

2 Why is evaluation done? Summative – assess an existing system – judge whether it meets some criteria Formative – assess a system being designed – gather input to inform the design Summative or formative? – Depends on the maturity of the system and on how evaluation results will be used – The same technique can be used for either

3 Other distinctions Form of results obtained – Quantitative – Qualitative Who is experimenting with the design – End users – HCI experts Approach – Experimental – Naturalistic – Predictive

4 Evaluation techniques Predictive Evaluation Interviews Questionnaires Observation Experiment Discount Evaluation techniques – Heuristic eval – Cognitive walkthrough

5 Predictive Evaluation Predict user performance and usability Rules or formulas based on experimentation – Quantitative – Predictive Fitts' Law Keystroke Level Model

6 Fitts' Law (p. 713) Models movement times for selection tasks Paul Fitts: war-time human factors pioneer Basic idea: movement time for a well-rehearsed selection task – Increases as the distance to the target increases – Decreases as the size of the target increases

7 Movement Time MT = a + b log₂(2D/W) Empirical measurement establishes the constants a and b, which differ for different devices and for different ways the same device is used. [Figure: cursor travels a distance D from START to STOP to hit a target of width W]
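The formula above is easy to sketch in code. The constants below are made-up placeholders (in practice a and b are fit empirically per device and per style of use), and the function name is my own:

```python
import math

# Fitts' law movement-time sketch: MT = a + b * log2(2D / W).
# a and b here are hypothetical placeholder values, not measured ones.
def movement_time(d, w, a=0.1, b=0.15):
    """Predicted time (seconds) to select a target of width w at distance d."""
    return a + b * math.log2(2 * d / w)

# Doubling the distance (or halving the width) raises the index of
# difficulty by exactly one bit, so it adds exactly b seconds:
t_near = movement_time(d=100, w=20)
t_far = movement_time(d=200, w=20)
```

This logarithmic relationship is why a target twice as far away costs only a constant extra time, not twice the time.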

8 Questions What do you do in 2D? – For an h × l rectangle, one approach is ID = log₂(d / min(w, l) + 1) – Should take the direction of approach into account
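The rectangular variant above can be coded directly; the helper name is my own:

```python
import math

# 2D variant from the slide: for a w-by-l rectangular target at
# distance d, use the smaller dimension as the effective width.
def id_2d(d, w, l):
    """Index of difficulty (bits) for a rectangular target."""
    return math.log2(d / min(w, l) + 1)
```

Note this ignores approach direction: a wide, short menu item is much easier to hit when approached from the side than from below, which this one-number formula cannot express.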

9 Applications When does it apply? How used in interface design?

10 Keystroke Level Model (KLM) Page 708 Skilled users performing routine tasks – Assumes error-free performance Analyze observable behaviors – Keystrokes, mouse movements Assigns times to basic human operations - experimentally verified

11 Experimentally verified timings Keystroking: 0.22 s for good typists, 0.28 s average, 1.20 s if unfamiliar with the keyboard Mouse button press: down or up – 0.1 s; click – 0.2 s Pointing (typically with mouse): 1.1 s Hand movement between keyboard and mouse: 0.4 s Drawing straight line segments "Mental preparation": 1.35 s System response time: depends on system
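These operator times turn a KLM analysis into simple arithmetic. A minimal sketch (the dict and helper names are my own; only the numbers come from the slide):

```python
# KLM operator times in seconds, as listed on the slide.
KLM_TIMES = {
    "K": 0.28,  # keystroke, average typist (0.22 good, 1.20 unfamiliar)
    "B": 0.10,  # mouse button press or release (a full click is 0.20)
    "P": 1.10,  # pointing, typically with the mouse
    "H": 0.40,  # homing the hand between keyboard and mouse
    "M": 1.35,  # mental preparation
}

def klm_estimate(operators):
    """Total predicted time for an operator string such as 'HPBMPB'."""
    return sum(KLM_TIMES[op] for op in operators)
```

With this in place, comparing two command sequences is just comparing two strings of operators.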

12 Example : MS Word Find Command Use Find Command to locate a six character word – H (Hand to mouse) – P (Edit) – B (click on mouse button - press/release) – M (mental prep) – P (Find) – B (click on mouse button) – H (Hand on keyboard) – 6K (Type six characters into Find dialogue box) – M (mental prep) – K (Return key on dialogue box starts the find) 2H + 2P + 2B + 2M + 7K Predicted time = 7.86 secs (for average typists)
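The slide's total can be re-derived as follows. One subtlety worth flagging: the 7.86 s figure comes out when B is counted as a single press/release (0.10 s) rather than a full click:

```python
# Re-deriving the Find-command estimate with the slide's operator times.
T = {"H": 0.40, "P": 1.10, "B": 0.10, "M": 1.35, "K": 0.28}

# H P B M P B H, six keystrokes, then M K for the Return key.
find_command = "HPBMPBH" + "K" * 6 + "MK"
predicted = sum(T[op] for op in find_command)
print(f"{predicted:.2f} s")  # 7.86 s for an average typist
```

Swapping in K = 0.22 s or K = 1.20 s for the seven keystrokes shows how strongly the estimate depends on typing skill.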

13 KLM Comparison Problem Are keyboard accelerators always faster than menu selection? Assume hands start on keyboard So let’s compare – Menu selection of File/Print – Keyboard accelerator ALT-F to open the File pull down menu P key to select the Print menu item

14 File: Print Menu selection H (get hand on mouse) P (point to File menu bar item) B (left-click with mouse button) M (mental prep) P (point to Print menu item) B (left-click with mouse button) Timings – H + 2P + 2B + 1 M Total predicted time = 4.35 sec

15 Keyboard Accelerator for Print Use the keyboard for ALT-F then P (hands already there) – M – K(ALT) – K(F) – M – K(P) – = 2M + 3K = 2.7 + 3K Times for K based on typing speed – Average typist, K = 0.28 s, total time = 3.54 s – Non-typist, K = 1.20 s, total time = 6.30 s – Time with mouse was 4.35 s Conclusion: Accelerator keys may not be faster than the mouse for non-typists

16 What if you started on the mouse? H M K(ALT) K(F) M K(P) H + 2M + 3K = 3.1 + 3K Times for K based on typing speed – Good typist, K = 0.22 s, total time = 3.76 s – Average typist, K = 0.28 s, total time = 3.94 s Menu selection was 4.35 s when starting on the keyboard – Starting on the mouse it's 4.35 – H = 3.95 s Conclusion: Hmmm… not faster for average typists
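All four cases of this comparison can be computed in one place. Here B is taken as a full click (0.20 s), which is what reproduces the slide's 4.35 s menu figure, and K = 0.28 s for an average typist; the variable names are my own:

```python
# Menu selection vs. keyboard accelerator for Print, from both
# starting hand positions, using the slide's operator times.
T = {"H": 0.40, "P": 1.10, "B": 0.20, "M": 1.35, "K": 0.28}

def t(ops):
    return sum(T[op] for op in ops)

menu_from_keyboard = t("HPBMPB")     # 4.35 s
menu_from_mouse = t("PBMPB")         # 3.95 s (no homing needed)
accel_from_keyboard = t("MKKMK")     # 3.54 s
accel_from_mouse = t("HMKKMK")       # 3.94 s
```

The margins are a few tenths of a second either way, which is exactly why KLM is best used for ballpark comparisons rather than precise predictions.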

17 What to remember about KLM Command execution can be broken down into smallest human movements/actions – Such as key press, mouse click, move mouse, etc. Can estimate performance time for different ways of executing commands – For experts, on a well rehearsed task, with no errors Good for ballpark estimates, and for comparing command options

18 Interviews & Questionnaires Ask users what they think about your prototype / design / ideas – Qualitative & quantitative – Subjective – End users or other stakeholders Often accompanies other methods to get subjective feedback

19 Observation Watch users perform tasks with your interface – Qualitative & quantitative – Objective – Experimental or naturalistic Variations – Think-aloud – Cooperative evaluation

20 Experiments Test hypotheses about your interface – Quantitative – Objective – Experimental Examine dependent variables against independent variables – Often used to compare two designs or compare performance between groups Next time…

21 Discount usability techniques Fast and cheap method to get broad feedback – Use HCI experts instead of users – Qualitative mostly Heuristic evaluation – Several experts examine interface using guiding heuristics (like the ones we used in design) Cognitive Walkthrough – Several experts assess learnability of interface for novices

22 And still more techniques Diary studies – Users relate experiences on a regular basis – Can write them down, call them in, etc. Experience Sampling Technique – Interrupt users with a very short questionnaire on a random-ish basis Good for getting an idea of regular and long-term use in the field (the real world)

23 A "typical" usability study Bring users into a lab Questionnaire to get demographic info Introduce them to your interface Give them a script or several tasks and ask them to complete the tasks – Look for errors & problems, performance, etc. Interview or questionnaire afterward to get additional feedback

24 Usability Lab http://www.surgeworks.com/services/observation_room2.htm A large viewing area behind this one-way mirror, which includes an angled sheet of glass that improves light capture and prevents sound transmission between rooms. Doors for the participant and observation rooms are located so that participants are unaware of observers' movements in and out of the observation room.

25 General Recommendations Identify evaluation goals Include both objective & subjective data – e.g. “completion time” and “preference” Use multiple measures, within a type – e.g. “reaction time” and “accuracy” Use quantitative measures where possible – e.g. preference score (on a scale of 1-7) Note: Only gather the data required; do so with minimum interruption, hassle, time, etc.

26 Making Conclusions Where did you meet your criteria? Where didn’t you? What were the problems? How serious are these problems? What design changes should be made? – But don’t make things worse… Prioritize and plan changes to the design Iterate on entire process

27 Example: Heather’s study Software: MeetingViewer interface fully functional Criteria – learnability, efficiency, see what aspects of interface get used, what might be missing Resources – subjects were students in a research group, just me as evaluator, plenty of time Wanted completely authentic experience

28 Heather’s software

29 Heather's evaluation Task: answer questions from a recorded meeting, using my software as desired Think-aloud Videotaped, software logs Also had a post-study questionnaire Wrote my own code for log analysis Watched the video and matched behavior to the software logs

30 Example materials

31 Data analysis Basic data compiled: – Time to answer a question (or give up) – Number of clicks on each type of item – Number of times audio played – Length of audio played – User’s stated difficulty with task – User’s suggestions for improvements More complicated: – Overall patterns of behavior in using the interface – User strategies for finding information

32 Data presentation

33 Some usability conclusions Need fast forward and reverse buttons (minor impact) Audio too slow to load (minor impact) Target labels are confusing, need something different that shows dynamics (medium impact) Need more labeling on timeline (medium impact) Need different place for notes vs. presentations (major impact)

34 Movie ticket kiosk A Flash prototype, with 4 sample movies and a couple of showtimes for each. Have a touch-screen laptop. Plan to do a standard usability study Usability principles? What metrics to look for?

35 Reminder: (some) usability goals Learnability Predictability Familiarity Generalizability Consistency Error prevention Recoverability Observability Responsiveness Visibility Flexibility Customizability Satisfying Engaging Motivating Efficient Aesthetic

36 Evaluation planning Decide on techniques, tasks, materials – What are usability criteria? – How do you measure? How many people, how long How to record data, how to analyze data Prepare materials – interfaces, storyboards, questionnaires, etc. Pilot the entire evaluation – Test all materials, tasks, questionnaires, etc. – Find and fix the problems with wording, assumptions – Get good feel for length of study

37 Your turn: assignment Due Wednesday: evaluation plan – What are your goals? – How you will test each one – basic idea – Early drafts of any materials: tasks you want people to do, questionnaires, interview questions, etc. Completed draft goes in the part 2 writeup

38 Your turn: in class In your project groups Think about the usability goals you identified previously: – What is the best way to measure each one? Ask the user? Observe the user? What measurement?

39 Reminder: (some) usability goals Learnability Predictability Familiarity Generalizability Consistency Error prevention Recoverability Observability Responsiveness Visibility Flexibility Customizability Satisfying Engaging Motivating Efficient Aesthetic

