
1 User testing and evaluation
Lecturer: Smilen Dimitrov
Applied statistics for testing and evaluation – MED4

2 Introduction
We previously discussed measures of central tendency and dispersion as descriptive statistics, and we performed a couple of inferential statistical tests. We will now look more closely at usability and user testing, and at how statistics is applied there.

3 Usability testing
Historical developments:
–1911: Taylor's study of 'which is the best way to do a job?' and 'what should constitute a day's work?', to determine time standards for basic tasks
–1911: Frank (an industrial engineer) and Lillian (a psychologist) Gilbreth studied the motions involved in bricklaying, reducing the motions from 18 to 5 and developing the 'therblig' unit of motion
–Late 1940s: psychologists at Wright-Patterson Air Force Base studied crashes to determine their cause; it was not what was expected
–A study on the effect of a redundant, high-centered taillight on rear-end car crashes

4 Usability testing
Usability testing is a means for measuring how well people can use some human-made object (such as a web page, a computer interface, a document, or a device) for its intended purpose, i.e. usability testing measures the usability of the object.
–Usability testing is the common name for user-based system evaluation. It was popularized in the media by Jakob Nielsen and is usually thought of as related to web site design in the 1990s.
Usability is defined as "the ability of a specific group of users to perform a specific set of activities within a specific environment with effectiveness, efficiency, and satisfaction" (ISO 9241 standard).
Systems are made up of users performing some activity within a context. We can't redesign users, but we can design equipment, so our goal as designers is to 'design equipment to optimize system performance'.

5 Usability testing
If usability testing uncovers difficulties, such as people having trouble understanding instructions, manipulating parts, or interpreting feedback, then developers should improve the design and test it again. During usability testing, the aim is to observe people using the product in as realistic a situation as possible, to discover errors and areas for improvement.
With respect to usability, we look at the following factors:
–Functional suitability - does the product contain the functionality required by the user?
–Ease-of-learning - can the user figure out how to exercise the functionality provided?
–Ease-of-use - can the user exercise the functionality accurately and efficiently (this includes accessibility issues)?
–Ease-of-recall - can the knowledge of operation be easily maintained over time?
–Subjective preference

6 Usability testing
Four principles of usability:
–Functional 'visibility' through an obvious 'visible' structure and adequate feedback
–A good (conceptual) model to predict the effects of our actions (examples: real-world metaphors such as the Trash Can icon and shopping carts)
–Design for the intended user (and not for yourself)
–Design for errors and slips (or 'don't blame the user')

7 Usability testing
Setting up a usability test involves:
–carefully creating a scenario, or realistic situation, in which the person performs a list of tasks using the product being tested while observers watch and take notes;
–several other test instruments, such as scripted instructions, paper prototypes, and pre- and post-test questionnaires, which are also used to gather feedback on the product being tested.
For example, to test the attachment function of an e-mail program, a scenario would describe a situation where a person needs to send an e-mail attachment, and would ask him or her to undertake this task. The aim is to observe how people function in a realistic manner, so that developers can see problem areas and what people like. Techniques popularly used to gather data during a usability test include the think-aloud protocol and eye tracking.

8 Usability testing
Two types of usability testing during design:
Non-user-based:
–Expert review
–Compliance reviews
–Heuristic evaluations
–Cognitive walkthroughs
User-based:
–User surveys
–Ethnographic observation
–Performance-based testing
–Think-aloud
–Co-discovery

9 User-based usability testing
Several types of user-based tests:
–User surveys, which are inexpensive, can be conducted remotely, and can provide trend data
–Ethnographic observation, which can be understood as contextual inquiry, which "places methodological emphasis on the rigorous description of the ways in which situated action is produced every day"
–Performance-based testing, which sets a goal for the users to perform and measures their performance, such as the time taken to complete the task; it thus provides an 'objective' measure - good for comparative evaluations, but it is not a complete picture of usability and can be misleading
–Think-aloud protocols, probably the most common form of usability testing in use today, designed to capture participants' understanding
–Co-discovery protocols, a variation on the think-aloud protocol with a multiple-participant perspective

10 User-based usability testing
Subjective assessment tells the evaluator how the users feel about the software being tested. This is distinct from how efficiently or effectively they perform with the software. The usual method of assessment is to use a standardized opinion questionnaire to avoid criticisms of subjectivity.
Subjective measures that are part of user-based evaluation:
–Self-reported ease-of-use measures (summative evaluations) - standard questionnaires such as SUS, QUIS, and SUMI
–Aesthetic value
–User preferences
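
The questionnaires named above all reduce a set of rated statements to a single score. As an illustration, here is a minimal Python sketch of the usual SUS scoring rule (10 items answered on a 1-5 scale, odd items positively worded, even items negatively worded, total rescaled to 0-100); the example answers are invented.

# Minimal sketch: scoring one participant's SUS answers.
def sus_score(responses):
    """Return the 0-100 SUS score for a list of 10 answers, each 1-5."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs 10 answers, each between 1 and 5")
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd (positively worded) items contribute r - 1 points,
        # even (negatively worded) items contribute 5 - r points.
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

print(sus_score([4, 2, 5, 1, 4, 2, 4, 1, 5, 2]))  # 85.0 for these made-up answers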

12 User-based usability testing
Usability metrics or benchmarks - formal measurements that are used as guides to the level of usability of a product. Metrics include:
–how fast a user can perform a task,
–number of errors made on a task,
–learning time, and
–subjective ratings.
Criterion testing - user testing that measures user performance to determine whether a target performance level has been reached.
–A criterion test measures whether the game meets the requirements of the test.
–For example: 'starting from the options button, the average player will be able to locate the screen resolution option within 10 seconds' (a statistical sketch of such a check follows below).
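
A criterion such as the 10-second example above can be checked statistically once completion times have been measured. Below is a minimal sketch, using invented timing data and a one-sided, single-sample t-test (SciPy 1.6+ is assumed for the 'alternative' argument), of asking whether the mean time to locate the option is below the 10-second criterion.

from scipy import stats

# Hypothetical times (seconds) taken to locate the screen resolution option
times = [7.8, 9.1, 6.5, 11.2, 8.4, 9.9, 7.2, 10.5, 8.8, 9.3]
criterion = 10.0

# H0: mean time >= 10 s; H1: mean time < 10 s (one-sided test)
t_stat, p_value = stats.ttest_1samp(times, popmean=criterion, alternative='less')
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Mean time is significantly below the 10-second criterion")
else:
    print("Cannot conclude that the criterion is met")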

13 User-based usability testing
Another division of user-based testing:
–Formative evaluation - a user test performed during iterative design, with the goal of finding usability problems in order to fix them in the next design iteration. Formative evaluation doesn't need a full working implementation, but can be done on a variety of prototypes.
–Field study - running a test in a lab environment on tasks of your own invention may not tell you enough about how well your interface will work in a real context on real tasks. A field study can answer these questions.
–Controlled experiment - the goal is to test a quantifiable hypothesis about one or more interfaces. Controlled experiments happen under carefully controlled conditions using carefully designed tasks - often more carefully chosen than formative evaluation tasks. Hypotheses can only be tested by quantitative measurements of usability, like time elapsed, number of errors, or subjective satisfaction. A controlled experiment tests a hypothesis (e.g., interface X is faster than interface Y), evaluates a working implementation in a controlled lab environment on chosen tasks, and yields mostly quantitative observations (time, error rate, satisfaction) - see the sketch after this list.
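
As a concrete illustration of the controlled-experiment case, here is a minimal sketch of testing 'interface X is faster than interface Y' with an independent two-sample t-test (Welch's variant, not assuming equal variances; SciPy 1.6+ is assumed for the 'alternative' argument). The completion times are invented.

from scipy import stats

times_x = [42.1, 38.7, 45.0, 40.3, 36.9, 43.5, 39.8, 41.2]  # seconds, hypothetical
times_y = [48.4, 51.2, 46.7, 53.0, 49.9, 47.5, 52.3, 50.1]  # seconds, hypothetical

# H0: mean(X) >= mean(Y); H1: mean(X) < mean(Y), i.e. interface X is faster
t_stat, p_value = stats.ttest_ind(times_x, times_y, equal_var=False, alternative='less')
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")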

14 Formative evaluation
The basic steps:
–(1) find some representative users (should be representative of the target user class[es], based on user analysis);
–(2) give each user some representative tasks (should be representative of important tasks, based on task analysis); and
–(3) watch the user do the tasks.
There are three roles in a formative evaluation test:
–a user,
–a facilitator, and
–some observers.

15 Review
Descriptive statistics:
–Measures of central tendency (location): arithmetic mean, median, mode
–Measures of statistical variability (dispersion - spread): range, variance, standard deviation, quantiles, interquartile range
Probability distributions – uniform, normal (Gaussian) and t
Inferential statistics:
–Standard error – inference about unreliability
–Confidence interval
–Single-sample t-test
–Two-sample F-test and t-test
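
As a compact reminder of how these review quantities are obtained in practice, here is a minimal Python sketch (NumPy/SciPy) that computes most of the descriptive measures, the standard error, and a t-based 95% confidence interval for the mean of an invented sample of task-completion times.

import numpy as np
from scipy import stats

times = np.array([12.3, 9.8, 11.5, 14.2, 10.1, 13.7, 12.9, 11.0, 10.6, 12.4])

mean = times.mean()
median = np.median(times)
data_range = times.max() - times.min()
variance = times.var(ddof=1)                 # sample variance
sd = times.std(ddof=1)                       # sample standard deviation
q1, q3 = np.percentile(times, [25, 75])
iqr = q3 - q1                                # interquartile range
se = sd / np.sqrt(len(times))                # standard error of the mean

# 95% confidence interval for the mean, based on the t distribution
ci_low, ci_high = stats.t.interval(0.95, df=len(times) - 1, loc=mean, scale=se)

print(f"mean={mean:.2f}  median={median:.2f}  range={data_range:.2f}")
print(f"variance={variance:.2f}  sd={sd:.2f}  IQR={iqr:.2f}  SE={se:.2f}")
print(f"95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")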

