CIF Usability Test Detailed Method Outline


Authors: Dilip Chetan, Sam Rajkumar
Reviewers: Velynda Prakhantree (group lead), Amy Chen, Somnath Lokesh
March 2006

Overview
- What is a CIF Usability Test?
- Features of a CIF Test
- When to conduct a CIF Test
- When not to conduct a CIF Test
- Pros & Cons
- Metrics & Definitions
- Investment
- Methodology
- Data Analysis
- Reporting
- Follow-up
- Appendix & Resources

What is a CIF Usability Test?
- A formal, lab-based test that benchmarks the usability of a product using performance and subjective data.
- Goal: measure user performance on a set of core tasks to predict how usable the product will be with real customers.
- Users should be domain experts but should not be experienced with the application.
- The name CIF, or Common Industry Format, refers to the standardized format for reporting usability test findings that is used across the software industry.
- CIF was developed by a group of 300+ software organizations led by NIST.
- CIF became an ANSI standard in December 2001 (ANSI/NCITS 354-2001).

Features of a CIF Test
- Always tested on beta or live released code. The code must be stable enough to collect meaningful timing data.
- Conducted in a controlled environment, almost always in a usability lab.
- Users should not be experienced with the application being tested.
- Number of users: 10 or more (8 is the absolute minimum).
- Deliverables: a quick findings presentation and a report in the standard CIF format.

When to conduct a CIF Test
- When your focus is on gauging the overall usability of a product (live beta or released code), not on probing users' thought processes.
- When you want to create a standardized benchmark for comparing future versions of the product, or possibly similar products.
- When you have sufficient time. This is a time-intensive activity that takes at least 3 months, and typically longer.

When NOT to conduct a CIF Test
- When the goal is to get some measures of user performance across participants while keeping the ability to probe the user's thought process as needed during tasks.
- When you are not concerned with measuring performance.
- When the product is not sufficiently interactive and needs to be driven by the test administrator.
- When more than one participant is providing feedback on the prototype at the same time.
- When there is a desire (and the resources) to iterate rapidly through designs in a short time.
- When your primary questions do not require a task-based method.
- When there is a desire to elicit feedback from users experienced with the application (a demo with a structured interview is most common, though other methods may be used).
Alternative methods for these situations: User Evaluation, Formative Assessment, RITE (Rapid Iterative Testing and Evaluation), and Demo with Structured Interview.

Pros
- Provides quantitative measures of software usability, including time on task.
- Allows for benchmarking and future comparisons.
- Uses an industry-standard format for reporting findings.
- Is a highly structured method for conducting the test, giving the most experimental control for drawing conclusions across participants.

Cons
- High investment of time and resources.
- Because time on task is measured, the usability engineer (UE) cannot probe the participant during a task to get richer feedback.
- Feedback about usability issues can only be incorporated into future releases.
- Not a means of gathering feedback from users experienced with the application.

Required Quantitative Metrics
- Time on task
- Number of errors per participant per task
  - Any click or action off the successful path to task completion is an error.
  - There may be more than one successful path.
- Number of accesses to help documentation (online or other help sources), when available
- Number of assists per participant per task
  - Any help or guidance offered by the moderator is an assist. See the "When to Assist Guidelines" document.
  - Users are specifically instructed to "try everything they can" to complete tasks, and to say "I'm stuck!" once they have exhausted all ideas about how to continue. See the "Introductory Script".
- Unassisted task completion rate: the participant successfully completes the task with 0 assists and within the timeout criterion
- Assisted task completion rate: the participant successfully completes the task with 2 or fewer assists and within the timeout criterion
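The two completion-rate definitions above can be expressed as simple checks on per-task attempt records. This is a minimal Python sketch, not part of the CIF method itself. The `TaskAttempt` record, its field names and the sample data are hypothetical. It also assumes that a time exactly equal to the timeout criterion still counts as within it.

```python
from dataclasses import dataclass


@dataclass
class TaskAttempt:
    """One participant's attempt at one task (hypothetical record layout)."""
    participant: str
    task: str
    completed: bool         # reached the task goal
    time_s: float           # time on task, in seconds
    errors: int             # clicks/actions off every successful path
    assists: int            # help or guidance offered by the moderator
    help_accesses: int = 0  # lookups in online or other help, if available


def unassisted_success(a: TaskAttempt, timeout_s: float) -> bool:
    # Completed with 0 assists and within the timeout criterion.
    return a.completed and a.assists == 0 and a.time_s <= timeout_s


def assisted_success(a: TaskAttempt, timeout_s: float) -> bool:
    # Completed with 2 or fewer assists and within the timeout criterion.
    return a.completed and a.assists <= 2 and a.time_s <= timeout_s


def completion_rates(attempts, timeouts):
    """Return (unassisted_rate, assisted_rate) over all attempts.

    `timeouts` maps each task name to its timeout criterion in seconds.
    """
    n = len(attempts)
    unassisted = sum(unassisted_success(a, timeouts[a.task]) for a in attempts)
    assisted = sum(assisted_success(a, timeouts[a.task]) for a in attempts)
    return unassisted / n, assisted / n


# Hypothetical data for one task with a 125-second timeout criterion.
attempts = [
    TaskAttempt("P1", "A", True, 90, 0, 0),
    TaskAttempt("P2", "A", True, 100, 2, 1),   # one assist
    TaskAttempt("P3", "A", True, 110, 1, 3),   # too many assists
    TaskAttempt("P4", "A", True, 200, 0, 0),   # exceeded the timeout
]
print(completion_rates(attempts, {"A": 125}))  # (0.25, 0.5)
```

The assisted rate includes the unassisted successes, because 0 assists is also "2 or fewer".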

Required Qualitative Metrics
- SUMI (Software Usability Measurement Inventory): a standardized tool for measuring user perceptions of usability. See Appendix 1 for more details.
- Oracle "4 Universal Scales": a 4-item Likert scale measuring how "Easy to Use," "Attractive," "Useful," and "Clear and Understandable" users perceive the product to be.

Investment
Activity: Time*
- Kick-off with product team and proposal (example): ½ week
- Secure testable environment: 3-8 weeks
- Agree on user profile and create screener: 1 week
- Prepare task script/scenarios**: 1-3 weeks
- Seed necessary data for tasks
- Recruit participants: 3 weeks
- Pilot test with 2-3 UX or product team members
- Conduct benchmark session with expert user: 1 day
- Conduct CIF test
- Preliminary data analysis
- Give quick findings presentation to key team members
- Finish analysis and give complete findings to entire product team: 1-2 weeks
- Publish final report
Notes
- Total time investment averages 3-6 months from kickoff to publishing a report.
- Securing a testable environment is often the biggest variable.
- Securing appropriate participants can also take much longer than anticipated, depending on the user profile.
* Times for steps will vary for each test. Some steps may occur in parallel.
** For non-self-service applications, a 5-10 minute introductory video is also required.

Methodology - 1
The general methodology involves the following steps:
1. Define a target population.
2. Prepare a task list, scenario, and test script.
3. Recruit and schedule participants.
4. Determine expert times and compute the timeout criterion.
5. Conduct pilot testing.
6. Conduct tests and gather data.
7. Analyze the data.
8. Prepare a list of issues and recommendations.
9. Prepare the report.

Methodology - Benchmark Times and Timeout Criterion Definitions
- Benchmark Time typically refers to the time an expert user takes to complete a task. Another possible way to arrive at a benchmark time is to use the Product Development team's or the customer's expectation of a reasonable time to complete the task.
- Timeout Criterion refers to the maximum reasonable time for a user to complete a task. The task is regarded as a failure if the user exceeds the timeout criterion.

Methodology - Timeout Criterion
Use one of the following methods to arrive at the Timeout Criterion:
1. Multiplier approach (see the next two slides for details): Timeout Criterion = Benchmark Time (expert time) × multiplier between 2 and 5.
2. The Product Management/Development team's expectation of the reasonable maximum time for a first-time user to complete a task. E.g., "Product install must be completed in X minutes."
3. A combination of the above two, applied to different tasks.
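The three methods above can be sketched as a single helper that is called once per task. Method 3, a mix of methods across tasks, then follows naturally. This Python sketch is illustrative only:
- The function name, its parameters and the install target are hypothetical.
- All times are in seconds.
- When both inputs are supplied, the sketch gives precedence to the team's stated maximum. That precedence is an assumption, not a rule from this outline.

```python
def timeout_criterion(benchmark_s=None, multiplier=None, team_max_s=None):
    """Return a task's timeout criterion in seconds (hypothetical helper).

    Method 1: benchmark (expert) time x an agreed multiplier between 2 and 5.
    Method 2: the product team's expected maximum for a first-time user.
    If both are supplied, this sketch uses the team's figure (an assumption).
    """
    if team_max_s is not None:
        return float(team_max_s)
    if benchmark_s is None or multiplier is None:
        raise ValueError("need benchmark_s and multiplier, or team_max_s")
    if not 2 <= multiplier <= 5:
        raise ValueError("multiplier should be between 2 and 5")
    return float(benchmark_s * multiplier)


# Method 3: different methods for different tasks.
timeouts = {
    "Define an authorized delegate": timeout_criterion(benchmark_s=25, multiplier=5),
    "Product install": timeout_criterion(team_max_s=30 * 60),  # hypothetical 30-minute target
}
print(timeouts)
```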

Methodology – Timeout Criterion
Multiplier approach: deciding on the multiplier
- The multiplier should be a number mutually agreed upon by the UX Team and the Product Team.
- It is typically based on the complexity and/or length of the tasks.
- The multiplier can be the same for all tasks or differ across tasks. For example, if the tasks vary in complexity or length, a multiplier of 3 can be applied to some tasks and a multiplier of 5 to others.
- Use a higher multiplier when a task or set of tasks is short compared to other tasks, for example:
  - Tasks under a minute long: use a multiplier of 5
  - Tasks over a minute long: use a multiplier of 3
- Historically, UEs have used a multiplier of 3 for most CIF tests.
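The example rule for choosing a multiplier reduces to a one-line function. This is a minimal Python sketch, and the function name is hypothetical. The slide does not say what to do with a benchmark of exactly one minute, so treating 60 seconds as "over a minute" is an assumption. The agreed multiplier remains a negotiation between the UX and Product teams; this only encodes the example.

```python
def choose_multiplier(benchmark_s: float) -> int:
    """Example rule from the slide: short tasks get a larger multiplier.

    Under a minute -> 5; a minute or more -> 3 (exactly 60 s is an assumption).
    """
    return 5 if benchmark_s < 60 else 3


for seconds in (25, 78, 520):
    m = choose_multiplier(seconds)
    print(seconds, "s ->", m, "x ->", seconds * m, "s timeout")
```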

Methodology – Timeout Criterion
Multiplier approach: process
- Again, the goal is to arrive at a reasonable maximum time for a first-time user to complete the tasks. The task is regarded as a failure if the user exceeds the timeout criterion.
- Have an expert complete the tasks as efficiently as possible under controlled conditions in the test environment. The expert is someone familiar with the application, typically a Product Developer or Product Manager.
- For each task, multiply the expert time by a mutually agreed-upon factor between 2 and 5 to arrive at the timeout criterion.
Example
Task | Description | Benchmark Time (hh:mm:ss) | Multiplier | Timeout Criterion (hh:mm:ss)
A | Define an authorized delegate | 00:00:25 | 5 | 00:02:05
B | Create an Expense Report | 00:08:40 | 3 | 00:26:00
C | Update and submit a saved Expense Report | 00:01:18 | 3 | 00:03:54
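The arithmetic in the example table can be checked with a short script. It converts hh:mm:ss to seconds, applies the multiplier, and formats the result back. This is an illustrative Python sketch; the helper names are hypothetical.

```python
def parse_hms(text: str) -> int:
    """Convert 'hh:mm:ss' to a number of seconds."""
    h, m, s = (int(part) for part in text.split(":"))
    return h * 3600 + m * 60 + s


def format_hms(total_s: int) -> str:
    """Convert a number of seconds back to 'hh:mm:ss'."""
    h, rem = divmod(total_s, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


# (task, description, benchmark time, agreed multiplier) from the example table
tasks = [
    ("A", "Define an authorized delegate", "00:00:25", 5),
    ("B", "Create an Expense Report", "00:08:40", 3),
    ("C", "Update and submit a saved Expense Report", "00:01:18", 3),
]

for task_id, desc, bench, mult in tasks:
    timeout = format_hms(parse_hms(bench) * mult)
    print(f"{task_id}  {desc:<42} {bench}  x{mult}  {timeout}")
```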

Quantitative Data Analysis
Mean performance measures across all tasks and participants:
- Assisted Task Completion Rate
- Unassisted Task Completion Rate
- Mean Total Task Time
- Mean Total Errors
- Mean Total Assists
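The following Python sketch shows one reading of the summary measures above:
- Completion rates are computed over all task attempts.
- Each "mean total" is the mean, across participants, of that participant's total over all tasks.

That interpretation, the record fields and the sample values are assumptions, not definitions taken from this outline.

```python
from statistics import mean

# One record per participant per task (hypothetical fields and values).
# The *_ok flags are per-attempt completion outcomes, already judged
# against the assist limits and the timeout criterion.
records = [
    {"participant": "P1", "task": "A", "time_s": 90,  "errors": 0, "assists": 0, "unassisted_ok": True,  "assisted_ok": True},
    {"participant": "P1", "task": "B", "time_s": 600, "errors": 2, "assists": 1, "unassisted_ok": False, "assisted_ok": True},
    {"participant": "P2", "task": "A", "time_s": 110, "errors": 1, "assists": 0, "unassisted_ok": True,  "assisted_ok": True},
    {"participant": "P2", "task": "B", "time_s": 700, "errors": 3, "assists": 3, "unassisted_ok": False, "assisted_ok": False},
]


def overall_summary(records):
    n = len(records)
    # Sum time, errors and assists over all tasks for each participant.
    totals = {}
    for r in records:
        t = totals.setdefault(r["participant"], {"time_s": 0, "errors": 0, "assists": 0})
        for key in t:
            t[key] += r[key]
    return {
        "assisted_completion_rate": sum(r["assisted_ok"] for r in records) / n,
        "unassisted_completion_rate": sum(r["unassisted_ok"] for r in records) / n,
        "mean_total_task_time_s": mean(t["time_s"] for t in totals.values()),
        "mean_total_errors": mean(t["errors"] for t in totals.values()),
        "mean_total_assists": mean(t["assists"] for t in totals.values()),
    }


print(overall_summary(records))
```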

Quantitative Data Analysis (continued)
Performance results by task across all participants (an example table is shown on the following slide):
- Assisted Task Completion Rate
- Unassisted Task Completion Rate
- Task Time
- Errors
- Assists
Descriptive statistics: the above results are reported using the following descriptive statistics:
- Mean
- SD (standard deviation)
- Standard error
- Min and Max
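The per-task descriptive statistics can be sketched with Python's standard `statistics` module. Assumptions in this sketch:
- SD is the sample standard deviation (n - 1 denominator).
- The standard error is SD / sqrt(n).
- Completion outcomes would be coded as 1/0 before being passed in.

The function names and the sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def describe(values):
    """Mean, sample SD, standard error of the mean, min and max."""
    values = list(values)
    sd = stdev(values) if len(values) > 1 else 0.0
    return {
        "mean": mean(values),
        "sd": sd,
        "se": sd / sqrt(len(values)),
        "min": min(values),
        "max": max(values),
    }


def results_by_task(records, metric):
    """Group one metric by task and describe it across participants."""
    groups = {}
    for r in records:
        groups.setdefault(r["task"], []).append(r[metric])
    return {task: describe(vals) for task, vals in groups.items()}


# Hypothetical task times (seconds) for three participants on task "A".
records = [
    {"participant": "P1", "task": "A", "time_s": 10},
    {"participant": "P2", "task": "A", "time_s": 20},
    {"participant": "P3", "task": "A", "time_s": 30},
]
print(results_by_task(records, "time_s"))
```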

Example – Results by Task across all Participants
[Table: Performance results for Task x, by participant]

Quantitative Data Analysis (continued)
Performance results by participant across all tasks (an example is shown on the following slide):
- Assisted Task Completion Rate
- Unassisted Task Completion Rate
- Task Time
- Errors
- Assists
Descriptive statistics:
- Mean
- SD (standard deviation)
- Standard error
- Min and Max
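The per-participant view can be sketched by adding up each participant's outcomes over all tasks. One row per participant results, and the descriptive statistics above can then be computed across those rows. This is a Python sketch; the field names and sample values are hypothetical.

```python
def results_by_participant(records):
    """Summarize each participant's performance across all tasks."""
    rows = {}
    for r in records:
        row = rows.setdefault(r["participant"], {
            "tasks": 0, "assisted_ok": 0, "unassisted_ok": 0,
            "time_s": 0, "errors": 0, "assists": 0,
        })
        row["tasks"] += 1
        for key in ("assisted_ok", "unassisted_ok", "time_s", "errors", "assists"):
            row[key] += r[key]  # booleans add as 0/1
    for row in rows.values():
        row["assisted_rate"] = row["assisted_ok"] / row["tasks"]
        row["unassisted_rate"] = row["unassisted_ok"] / row["tasks"]
    return rows


# Hypothetical records: two participants, two tasks each.
records = [
    {"participant": "P1", "task": "A", "time_s": 90,  "errors": 0, "assists": 0, "unassisted_ok": True,  "assisted_ok": True},
    {"participant": "P1", "task": "B", "time_s": 600, "errors": 2, "assists": 1, "unassisted_ok": False, "assisted_ok": True},
    {"participant": "P2", "task": "A", "time_s": 110, "errors": 1, "assists": 0, "unassisted_ok": True,  "assisted_ok": True},
    {"participant": "P2", "task": "B", "time_s": 700, "errors": 3, "assists": 3, "unassisted_ok": False, "assisted_ok": False},
]
for participant, row in results_by_participant(records).items():
    print(participant, row)
```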

Example – Results by Participant across all Tasks
[Table: Summary performance results across all tasks, by participant]

Qualitative Data Analysis - SUMI
Example for SUMI: this graph shows medians (the middle score when the scores are arranged in numerical order) and the upper (UCL) and lower (LCL) confidence limits. The confidence limits mark the range within which the true score is expected to lie, at 95% confidence, for this sample of users.
Excerpt from Oracle Enterprise Manager 10g Grid Control Release 1 (10.1.0.3), Out of the Box Experience (OOBE) in Red Hat Enterprise Linux 3.0. Tested by: Arin Bhowmick, Darcy Menard, March 2005.

Qualitative Data Analysis – Oracle Universal Scales
Example for Oracle Universal Scales: Time and Labor
Excerpt from Oracle Time and Labor 11.5.10. Tested by: Frank Y. Guo and Sajitha Narayan, December 2004-March 2005.

Qualitative Data Analysis - Table of Usability Issues and Design Recommendations
1. Issue: Eight users needed to be assisted to see and go to the Tracking subtab.
   Priority / # users affected: HIGH, 8/10
   Design recommendation: Incorporate tracking functionality into what is currently termed the Workbench. Remove the Tracking subtab.
   Status / Bug #: Accepted, #37654897
2. Issue: Four users requested that the system automatically re-price the delivery when a new order was added.
   Priority / # users affected: MEDIUM, 4/10
   Design recommendation: Incorporate automatic repricing.
   Status / Bug #: Not accepted; the technology does not support it.
3. Issue: Two users did not remember to click the Audit and Payment button and needed assistance.
   Priority / # users affected: LOW, 2/10
   Design recommendation: More design discussion is needed for the entire Audit and Payment section.
   Status / Bug #: Pending design discussion

Report Sections
- Executive Summary
- Product Description
- Test Objectives
- Methods
  - Participants (user profile and screener)
  - Context of product use in the test
    - Task order
    - Test facility
    - Participants' computing environment
    - Test administrator tools
  - Experimental design
    - Procedure
    - Participant general instructions
    - Participant task instructions
  - Usability metrics: Effectiveness, Efficiency, and Satisfaction

Report (Continued)
- Results
  - Data Analysis
    - Data Scoring
    - Data Reduction
    - Statistical Analysis
  - Presentation of Results
    - Performance Results
    - Satisfaction Results
    - SUMI Results
  - Design Issues
    - What worked
    - What did not work
    - Table of usability issues and design recommendations
- Appendices

Usability Issues Reporting
- The body of the report (in the Results section) will include complete descriptions of the most significant issues, along with screenshots and callouts as needed to describe the issues adequately.
- A separate table (see the example on slide 22) will include ALL the issues and design recommendations, with a column for tracking the status of each recommendation going forward.
- Additional screenshots and callouts (as needed) may follow this table and be referenced within it.

Follow-up
- Results presentation to the product team
  - Commonly, a "Quick Findings" presentation is given to key team members within a week of testing.
  - A complete findings presentation to all team members may follow after 1-2 weeks.
- Logging bugs for the agreed-upon usability recommendations is required.
  - A member of the product team should ideally agree to do this before the test is run (negotiable with the team; the UE may also file bugs if desired).
  - See the "Bug logging instructions" document.
- Send the final report to the product team, including director-level stakeholders and above.
- Publish the final report to the Apps UX website and the UI lab doc drop.

Appendix – Introduction to SUMI
The SUMI questionnaire is a professionally produced, standardized tool for measuring users' perceptions of the usability of commercial software, based on their hands-on exposure to the software.
SUMI output scores are standard scores on a normal distribution with a mean of 50 and a standard deviation of 10. Scores can therefore range from 0 to 100, but the majority (about 68%) of SUMI scores lie between 40 and 60. On any SUMI scale, products scoring below 50 are considered substandard for the software industry. For a score of 40, only about 16% of software scores are worse; for a score of 60, only about 16% of software scores are better.
SUMI questionnaire results are displayed as 6 dimensions plotted in a profile graph:
- Global: the usability of the software as a whole; a weighted combination of the other dimensions.
- Efficiency: the degree to which the software enables users to get their work done.
- Affect: how likeable and enjoyable it is to interact with the software.
- Helpfulness: how well the software assists users, particularly when they encounter errors.
- Control: how well the software lets users make it do what they want or expect.
- Learnability: how easy it is to master the software, including the ability to grasp new functions and features.
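The percentages above follow from the normal curve with mean 50 and standard deviation 10. This minimal Python sketch uses the standard library's `statistics.NormalDist` (Python 3.8+). It treats the reference distribution as exactly normal, which is the idealization the appendix describes, and the helper name is hypothetical.

```python
from statistics import NormalDist

# SUMI scales are standardized to mean 50, standard deviation 10.
SUMI_SCALE = NormalDist(mu=50, sigma=10)


def share_scoring_below(score: float) -> float:
    """Approximate share of software products with a lower SUMI score."""
    return SUMI_SCALE.cdf(score)


print(f"below 40: {share_scoring_below(40):.1%}")      # below 40: 15.9%
print(f"above 60: {1 - share_scoring_below(60):.1%}")  # above 60: 15.9%
print(f"40 to 60: {share_scoring_below(60) - share_scoring_below(40):.1%}")  # 40 to 60: 68.3%
```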

Resources
- CIF Testing Process
- CIF Report Template (in progress)
- Sample Report 1
- Sample Report 2, including Help access (old format)
- Introductory Script
- Introductory Video Script
- When To Assist document (under review)
- Oracle Universal Scales
- SUMI (scroll down to the SUMI information section)
- CIF FAQs (in progress)