1 Automating Assessment of Web Site Usability
Marti Hearst, University of California, Berkeley
IBM Almaden, Oct 2000

2 The Usability Gap
- 196M new Web sites in the next 5 years [Nielsen99]
- ~20,000 user interface professionals [Nielsen99]

3 The Usability Gap
- Most sites have inadequate usability [Forrester, Spool, Hurst]: users can't find what they want 39-66% of the time
- 196M new Web sites in the next 5 years [Nielsen99]
- A shortage of user interface professionals [Nielsen99]

4 Usability affects the bottom line
IBM case study [1999]:
- Spent $millions to redesign the site
- 84% decrease in help usage
- 400% increase in sales
- Attributed to improvements in information architecture

5 Usability affects the bottom line
IBM case study [1999]:
- Spent $millions to redesign the site
- 84% decrease in help usage
- 400% increase in sales
- Attributed to improvements in information architecture
Creative Good study [1999]:
- Studied 10 e-commerce sites
- 59% of attempts failed
- If 25% of these had succeeded -> an estimated additional $3.9B in sales

6 Talk Outline
- Web Site Design
- Automated Usability Evaluation
- Our approach: WebTANGO
- Some Empirical Results
- Wrap-up
Joint work with Melody Ivory & Rashmi Sinha

7 Web Site Design (Newman et al. 00)
- Information design: structure, categories of information
- Navigation design: interaction with the information structure
- Graphic design: visual presentation of information and navigation (color, typography, etc.)
Courtesy of Mark Newman

8 Web Site Design (Newman et al. 00)
- Information Architecture: includes management and more responsibility for content
- User Interface Design: includes testing and evaluation
Courtesy of Mark Newman

9 Web Site Design Process
- Discovery: assemble information relevant to the project
- Design Exploration: explore alternative design approaches (information, navigation, and graphic)
- Design Refinement: select one approach and iteratively refine it
- Production: create prototypes and specifications
Courtesy of Mark Newman

10 Iteration: Design -> Prototype -> Evaluate

11 Usability Evaluation: Standard Techniques
- User studies: potential users use the interface to complete some tasks; requires an implemented interface
- "Discount" usability evaluation (heuristic evaluation): a usability expert assesses the design against guidelines

12 Automated UE
We looked at 124 methods; automated usability evaluation is greatly under-explored:
- Only 36% of all methods are automated
- Fewer methods exist for the Web (28%)
- Most techniques require some testing: only 18% are free from user testing, and only 6% for the Web

13 Survey of Automated UE
Predominant methods (Web):
- Structural analysis (4): Bobby, Scholtz & Laskowski 98, Stein 97
- Guideline reviews (11)
- Log file analysis (9): Chi et al. 00, Drott 98, Fuller & de Graaff 96, Guzdial et al., Sullivan 97, Theng & Marsden 98
- Simulation (2): WebCriteria (Max), Chi et al. 00

14 Existing Metrics
Web metric analysis tools report on what is easy to measure:
- Predicted download time
- Depth/breadth of site
We want to worry about:
- Content
- User goals/tasks
We also want to compare alternative designs.

15 WebTANGO: Tool for Assessing NaviGation & Organization
Goal: automated support for comparing design alternatives
How:
- Assess the usability of the information architecture
- Approximate information-seeking behavior
- Output quantitative usability metrics

16 Benefits/Tradeoffs
Benefits:
- Less expensive than traditional methods
- Usable early in the design process
Tradeoffs:
- Accuracy? Validate the methodology with user studies
- Illustrates different problems than traditional methods; for comparison purposes only
- Does not capture subjective measures

17 Information-Centric Sites
Examples: museums, history, news, magazines, government info

18 Guidelines
There are many usability guidelines, yet a survey of 21 sets of Web guidelines found little overlap (Ratner et al. 96). Why? Our hypothesis: they are not empirically validated. So... let's figure out what works!

19 An Empirical Study: Which features distinguish well-designed Web pages?

20 Methodology
Collect quantitative measures from 2 groups:
- Ranked: sites rated favorably via expert review or user ratings
- Unranked: sites that have not been rated favorably
Then statistically compare the groups and try to predict group membership.

21 Quantitative Measures
Identified 42 aspects from the literature:
- Page composition (e.g., words, links, images)
- Page formatting (e.g., fonts, lists, colors)
- Overall page characteristics (e.g., information & layout quality, download speed)

22 Metrics
Word count, body text percentage, emphasized body text percentage, text positioning count, text cluster count, link count, page size, graphic percentage, graphics count, color count, font count, reading complexity
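
A minimal sketch of how a few of these page-level metrics might be computed from raw HTML, using only Python's standard library. The metric definitions here (counting <font> and bgcolor attributes for colors, and approximating reading complexity with a Flesch-Kincaid-style grade level) are my assumptions, not the exact definitions used in the study.

```python
from html.parser import HTMLParser
import re

class PageMetrics(HTMLParser):
    """Accumulates simple page-composition counts while parsing HTML."""
    def __init__(self):
        super().__init__()
        self.words = self.links = self.images = 0
        self.sentences = self.syllables = 0
        self.colors, self.fonts = set(), set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links += 1
        elif tag == "img":
            self.images += 1
        elif tag == "font":                      # 2000-vintage HTML styling
            self.fonts.add(attrs.get("face", "default"))
            if "color" in attrs:
                self.colors.add(attrs["color"])
        if "bgcolor" in attrs:
            self.colors.add(attrs["bgcolor"])

    def handle_data(self, data):
        tokens = re.findall(r"[A-Za-z']+", data)
        self.words += len(tokens)
        self.sentences += len(re.findall(r"[.!?]+", data))
        # crude syllable estimate: vowel groups per word, at least one
        self.syllables += sum(max(len(re.findall(r"[aeiouy]+", t.lower())), 1)
                              for t in tokens)

    def reading_complexity(self):
        """Flesch-Kincaid grade level, one plausible 'reading complexity'."""
        s, w = max(self.sentences, 1), max(self.words, 1)
        return 0.39 * (w / s) + 11.8 * (self.syllables / w) - 15.59

pm = PageMetrics()
pm.feed("<html><body><font color='red'>Usable sites.</font> "
        "<a href='/help'>Help</a> <img src='logo.gif'></body></html>")
print(pm.words, pm.links, pm.images, len(pm.colors),
      round(pm.reading_complexity(), 1))
```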

23 Data Collection
Collected data for 2,015 information-centric pages from 463 sites (education, government, newspaper, etc.)
Data constraints:
- At least 30 words
- No e-commerce pages
- High self-containment (i.e., no style sheets, scripts, applets, etc.)
1,054 pages fit the constraints (52%)
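
A sketch of how those constraints might be applied as a filter over crawled pages. The marker-string checks below are a hypothetical stand-in for whatever tests the real pipeline used; only the thresholds come from the slide.

```python
import re

def word_count(html: str) -> int:
    text = re.sub(r"<[^>]+>", " ", html)         # strip tags crudely
    return len(re.findall(r"[A-Za-z']+", text))

def is_self_contained(html: str) -> bool:
    # reject pages that rely on style sheets, scripts, or applets
    lowered = html.lower()
    return not any(m in lowered
                   for m in ("<script", "<style", "<applet", "stylesheet"))

def keep_page(html: str) -> bool:
    """The slide's stated constraints; e-commerce pages excluded upstream."""
    return word_count(html) >= 30 and is_self_contained(html)

pages = {"http://example.edu/": "<html><body>" + "word " * 40 + "</body></html>"}
kept = {url: html for url, html in pages.items() if keep_page(html)}
print(f"{len(kept)} of {len(pages)} pages fit the constraints")
```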

24 Data Collection: Ranked Pages
Favorably assessed by expert review (ER) or user rating (UR) on expert-chosen sites.
Sources: Yahoo! 101 (ER), Web 100 (UR), PC Mag Top 100 (ER), WiseCat's Top 100 (ER), Webby Awards (ER) & People's Voice (UR)

25 Data Collection: Unranked Pages
Not favorably assessed by expert review or user rating on expert-chosen sites. (Do not assume unranked = unfavorable.)
Sources: WebCriteria's Industry Benchmark, Yahoo Business & Economy category, others

26 Data Analysis
428 pages analyzed: the 214 ranked pages, plus 214 chosen randomly from the 840 unranked pages.

27 Findings
- Several features are significantly associated with ranked sites
- Several pairs of features correlate strongly
- Correlations mean different things in ranked vs. unranked pages
- The significant features are partially successful at predicting whether a site is ranked
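
A sketch of the group comparison this implies, using a two-sample t-test per metric over a balanced random subsample (matching the 214-vs-214 design above). The synthetic data and the choice of scipy's ttest_ind are illustrative, not a claim about the study's exact statistical procedure.

```python
import random
from scipy import stats

random.seed(0)

def fake_page(shift):
    # toy stand-in for real measurements: one dict of metrics per page
    return {"word_count": random.gauss(300 + shift, 80),
            "link_count": random.gauss(20 + shift / 10, 5)}

ranked = [fake_page(50) for _ in range(214)]
unranked = [fake_page(0) for _ in range(840)]
sample = random.sample(unranked, len(ranked))    # balance the two groups

for metric in ("word_count", "link_count"):
    a = [p[metric] for p in ranked]
    b = [p[metric] for p in sample]
    t, p_val = stats.ttest_ind(a, b)
    mark = " *" if p_val < 0.05 else ""
    print(f"{metric:12s} t={t:+.2f} p={p_val:.4g}{mark}")
```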

28 Significant Differences

29 Significant Differences
Ranked pages have:
- More text clustering (facilitates scanning)
- More links (facilitates info-seeking)
- More bytes (more content -> facilitates info-seeking)
- More images (clustered graphics -> facilitates scanning)
- More colors (facilitates scanning)
- Lower reading complexity (close to the best numbers in the Spool study -> facilitates scanning)

30 Metric Correlations

31 Metric Correlations
Created hypotheses based on the correlations, then confirmed them by sampling:
Ranked pages:
- Colored display text
- Link clustering
-> Both patterns appeared on all pages in a random sample
Unranked pages:
- Display text coloring plus body text emphasis or clustering
- Link coloring or clustering
- Image links, simulated image maps, bulleted links
-> At least 2 of these patterns appeared in 70% of a random sample

32 Two Examples

33 Ranked Page
- Colored display text
- Link clustering

34 Unranked Page
- Body text emphasis
- Image links

35 Predicting Web Page Rating
Linear regression:
- Explains 10% of the difference between the groups
- 63% accuracy (better at predicting unranked pages)
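
A sketch of regression used as a ranked/unranked classifier: fit ordinary least squares to a 0/1 label and threshold the fitted values at 0.5. The toy feature matrix and this particular reading of "linear regression" are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(428, 12))                   # toy: 428 pages x 12 metrics
y = ((0.8 * X[:, 0] + rng.normal(size=428)) > 0).astype(float)  # 1 = ranked

A = np.hstack([np.ones((len(X), 1)), X])         # add an intercept column
beta, *_ = np.linalg.lstsq(A, y, rcond=None)     # ordinary least squares

pred = (A @ beta) > 0.5                          # threshold to classify
accuracy = (pred == y.astype(bool)).mean()
r2 = 1 - ((A @ beta - y) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(f"accuracy={accuracy:.0%}  R^2={r2:.2f}")
```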

36 Predicting Web Page Rating
Home vs. non-home pages:
- Text cluster count predicts home page ranking with 66% accuracy, consistent with the primary goal of home pages
- Non-home page prediction is consistent with the full-sample results: 4 of 6 metrics (link count, text positioning count, color count, reading complexity)

37 Another Rating System
Web site ratings from RateItAll.com: user ratings on a 5-point scale (1 = Terrible!, 5 = Great!), with no rating criteria. Small set of 59 pages (61% ranked).
- 54% of pages were classified consistently
- Only 17% of unranked pages got a high rating -> unranked sites were properly labeled
- 29% of ranked pages got a medium rating -> a difference between expert and non-expert review
- Ranking predicted by graphics count with 70% accuracy -> carefully design studies with non-experts

38 Second Study (new results)
- Better rating data: Webby Awards, with sites organized into categories
- New metrics computation tool: more quantitative measures; processes style sheets and inline frames
- Larger sample of pages

39 Webby Awards 2000
- 27 categories; we used finance, education, community, living, health, and services
- 100 judges, 6 criteria
- 3 rounds of judging; we used the first round only
- 2000 sites initially

40 Webby Awards 2000
6 criteria: content; structure & navigation; visual design; functionality; interactivity; overall experience
Factor analysis: the first factor accounted for 91% of the variance
Judgements were somewhat normally distributed, with skew
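
A sketch of how one might verify that a single factor dominates a set of criterion scores, here using PCA on the correlation matrix as a stand-in for the factor analysis. The toy score matrix, built around one latent quality dimension, is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
quality = rng.normal(size=(2000, 1))                 # one latent "site quality"
scores = quality + 0.3 * rng.normal(size=(2000, 6))  # 6 noisy criterion scores

corr = np.corrcoef(scores, rowvar=False)             # 6x6 correlation matrix
eigvals = np.linalg.eigvalsh(corr)[::-1]             # eigenvalues, descending
explained = eigvals / eigvals.sum()
print(f"first factor explains {explained[0]:.0%} of the variance")
```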

41 New Metrics

42 Methodology
Data collection: 1,108 pages from 163 sites, 3 levels per site
14 metrics, about 85% accurate (text cluster and text positioning counts are less accurate)

43 Preliminary Results
Linear regression to predict the Webby judges' ratings (top 30% vs. bottom 30%)
Prediction accuracy:
- 72% if categories are not taken into account
- 83% if categories are assessed separately
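
A sketch of the per-category variant, assuming each page carries a Webby category label: fit one model per category rather than one pooled model. The fit_and_score helper reuses the regression-as-classifier idea from the earlier sketch; all data here is synthetic.

```python
import numpy as np

def fit_and_score(X, y):
    """OLS on a 0/1 label, thresholded at 0.5; returns training accuracy."""
    A = np.hstack([np.ones((len(X), 1)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return (((A @ beta) > 0.5) == y.astype(bool)).mean()

rng = np.random.default_rng(2)
cats = ["finance", "education", "community", "living", "health", "services"]
data = {c: (rng.normal(size=(100, 14)),                 # 14 metrics per page
            rng.integers(0, 2, 100).astype(float))      # top vs. bottom label
        for c in cats}

X_all = np.vstack([X for X, _ in data.values()])
y_all = np.concatenate([y for _, y in data.values()])
print(f"pooled model: {fit_and_score(X_all, y_all):.0%}")
for c, (X, y) in data.items():                          # one model per category
    print(f"{c:10s} {fit_and_score(X, y):.0%}")
```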

44 Significant Metrics by Category

45 Category-Based Profiles
K-means clustering of good sites, according to the metrics:
- Preliminary results suggest the sites do cluster
- Clusters can be used to create profiles of good and poor sites for each category
- These can serve as empirically verified guidelines
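
A minimal k-means sketch over a site-metrics matrix, using scikit-learn; the standardization step, the choice of k=3, and the synthetic data are my assumptions about how such profiles might be built.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.normal(size=(163, 14))          # toy: one row of 14 metrics per good site

X_std = StandardScaler().fit_transform(X)   # put the metrics on a common scale
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_std)

# each centroid is a candidate "profile" of metric values for its cluster
for i, center in enumerate(km.cluster_centers_):
    size = int(np.sum(km.labels_ == i))
    top = np.argsort(-np.abs(center))[:3]
    print(f"profile {i}: {size} sites, most distinctive metric indices {top}")
```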

46 Ramifications
- It is remarkable that such simple metrics predict so well
- Perhaps good design is good overall; there may be other factors
- A foundation for a new methodology: empirical, bottom-up
- Does this reflect cognitive principles?
- But there is no one path to good design

47 Longer-Term Goal: A Simulator for Comparing Site Designs

48 Monte Carlo Simulation
Given:
- A model of the information structure
- A set of user goals
Want to assess the navigation structure:
- Compare alternatives/tradeoffs
- Identify bottlenecks
- Identify critically important pages/links
- Check all pairs of start/end points
- Check overall reachability before and after a change

49 [Figure] One Monte Carlo simulation step for Design 1, Task 1. The simulation starts from the home page, and the target information is at Renter Support.

50 [Figure] Monte Carlo simulation results for Design 1, Task 1. Simulation runs start from all pages in the site; average navigation times are shown for Tasks 2 & 3.

51 Monte Carlo Simulation
At each step in the simulation, assume a probability distribution over the set of next choices. The next choice is a function of:
- The current goal
- The understandability of the choice
- Prior interaction history
- The overall complexity of the page
Varying the distribution corresponds to varying properties of the links; spot-check important choices.
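
A minimal sketch of such a simulation over a toy site graph. Sampling the next link from a softmax over per-link "scent" scores is one plausible realization of the distribution described above; the graph, scores, and temperature parameter are all invented for illustration.

```python
import math
import random

random.seed(0)

# toy site graph: page -> {linked page: scent score for the current goal}
site = {
    "home":     {"products": 0.6, "support": 1.4, "about": 0.2},
    "products": {"home": 0.3, "support": 1.0},
    "support":  {},                              # target page for this task
    "about":    {"home": 0.5},
}

def choose_next(page, temperature=1.0):
    """Sample the next link from a softmax over scent scores."""
    links = site[page]
    weights = [math.exp(s / temperature) for s in links.values()]
    return random.choices(list(links), weights=weights)[0]

def navigation_time(start="home", target="support", max_steps=50):
    page, steps = start, 0
    while page != target and steps < max_steps:
        page = choose_next(page)
        steps += 1
    return steps

runs = [navigation_time() for _ in range(1000)]
print(f"average steps, home -> support: {sum(runs) / len(runs):.2f}")
```

Sweeping the temperature (or the scent scores themselves) is the "varying the distribution" knob: crisp, well-labeled links correspond to a peaked distribution, ambiguous labels to a nearly uniform one.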

53 In Summary
- Automated usability assessment should help close the Web usability gap
- We can empirically distinguish highly rated Web pages from other pages: empirical validation of design guidelines
- We can build profiles of good vs. poor sites
- We are validating the expert judgements with usability assessments via a user study
- Web use simulation is an under-explored and promising new approach

54 Current Projects
- Automating Web Usability (Tango): Melody Ivory, Rashmi Sinha
- Text Data Mining (Lindi): Barbara Rosario, Steve Tu
- Metadata in Search Interfaces (Flamenco): Ame Elliott, Andy Chou
- Web Intranet Search (Cha-Cha): Mike Chen, Jamie Laflen

55 More information:
http://www.cs.berkeley.edu/~ivory/web
http://www.sims.berkeley.edu/~hearst

57 Automated Usability Evaluation
- Logging/capture
  - Pro: easy
  - Con: requires an implemented system
  - Con: don't know the user's task (Web)
  - Con: doesn't present alternatives
  - Con: doesn't distinguish error from success
- Analytical modeling
  - Pro: doable at the design phase
  - Con: models an expert
  - Con: academic exercise
- Simulation

58 Research Issues: Navigation Predictions
Develop a model for predicting link selection. Requirements:
- Information need (task metadata)
- Representation of pages (page metadata)
- Method for selecting links (relevance ranking)
- Maintaining the user's conceptual model during site traversal (scent [Fur97, LC98, Pir97])
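
A sketch of how those requirements could fit together: score each outgoing link's anchor text against the information need with a bag-of-words overlap (a crude stand-in for real relevance ranking or scent models), then rank the links. The task string and example links are invented.

```python
import re
from collections import Counter

def tokens(text):
    return Counter(re.findall(r"[a-z']+", text.lower()))

def relevance(need, anchor_text):
    """Crude overlap between the task description and a link's anchor text."""
    a, b = tokens(need), tokens(anchor_text)
    return sum(min(a[w], b[w]) for w in a)

need = "find renter support contact information"     # task metadata
links = {                                            # page metadata: anchor text
    "/renters": "Renter Support and FAQs",
    "/sales":   "New Home Sales",
    "/contact": "Contact Us",
}
for url in sorted(links, key=lambda u: relevance(need, links[u]), reverse=True):
    print(f"{relevance(need, links[url])}  {url}")
```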

