Data Triangulation in a User Evaluation of the Sealife Semantic Web Browsers Helen Oliver Patty Kostkova Ed de Quincey City eHealth Research Centre (CeRC) City University London
User-Centred Evaluation of Semantic Web Browsers The Semantic Web for Life Sciences –Browse for meaning –Find answers to critical questions faster –Computer scientists love SWBs! First-ever user-centred evaluation of SWBs recruiting REAL-WORLD users –Do real users love SWBs too? Realistic user-centred evaluation has been neglected for SWBs!
User-Centred Evaluation of Semantic Web Browsers Use Triangulation to consider all angles –Essential to our innovative evaluation framework ( Quantitative data: Web server logs Questionnaire results + Qualitative data: Semi-structured interviews ) = (Validation AND Completeness) Triangulation has been neglected in user-centred evaluations of SWBs!
Group A1: Infectious Disease Professionals CORESE-based SWB vs NeLI COHSE vs NeLI
Group A2: Microbiologists GoPubMed/GoGene vs PubMed
Use of Triangulation for Semantic Web Quantitative Data Sources: –Web Form Questionnaires Pre-questionnaire Post-task questionnaires Post-questionnaire –Web Server Logs Qualitative Data Sources: –Semi-Structured Interviews (subset of participants) Evaluation Settings: –Online –Workshops
Value of Data Triangulation in Interpreting the Results Questionnaires –Findability –Usability –System Speed –Relevance –Likeability Web Server Logs –Task Completion Time –Usage of Semantic Links –# of External Pages Viewed –Views of Target Documents Semi-Structured Interviews –Answers to questions we didn’t think to ask… –Observe participants to assess system intuitiveness
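The four log-based measures listed on this slide can be sketched in code. This is a minimal illustration only: the slides do not give the Sealife log format, so the record layout, the event names (`task_start`, `semantic_link`, `external_page`, `target_view`, `task_end`) and the sample values are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical, hand-made log records: (user, ISO timestamp, event, detail).
# The real Sealife web server log format is not given in the slides.
LOG = [
    ("u1", "2000-01-01T10:00:00", "task_start", "task-1"),
    ("u1", "2000-01-01T10:00:40", "semantic_link", "link-box-3"),
    ("u1", "2000-01-01T10:01:10", "external_page", "page-a"),
    ("u1", "2000-01-01T10:01:30", "target_view", "doc-7"),
    ("u1", "2000-01-01T10:01:30", "task_end", "task-1"),
]


def summarise(log, user):
    """Derive the four log-based measures for one user on one task."""
    rows = [r for r in log if r[0] == user]
    # Keep the timestamps of the start and end markers only.
    times = {
        event: datetime.fromisoformat(ts)
        for _, ts, event, _ in rows
        if event in ("task_start", "task_end")
    }
    events = [event for _, _, event, _ in rows]
    return {
        "seconds": (times["task_end"] - times["task_start"]).total_seconds(),
        "semantic_links": events.count("semantic_link"),
        "external_pages": events.count("external_page"),
        "found_target": "target_view" in events,
    }


print(summarise(LOG, "u1"))
```

A per-user summary of this shape is what makes the later comparison with questionnaire and interview data possible.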
Sealife Results COHSE: 67 respondents (39 online, 28 in workshops) CORESE: 14 respondents (2 online, of whom only 1 completed; 12 in workshops) GoPubMed: 137 online, 4 in workshop GoGene + Extended GoPubMed: 14 in workshop Qualitative results not statistically significant (few interviews conducted)
Web Server Logs PubMed was faster than GoGene Faster => Better… So, users liked PubMed better than GoGene – right? Web Server Logs Don’t Lie!
Questionnaires Best for: –Likeability, Information Findability, Relevance, System Speed: GoPubMed/GoGene –Usability: COHSE Highest Number of Positive Ratings: –GoPubMed/GoGene Largest Positive Mode Differences Between Control and Intervention: –GoPubMed/GoGene Fewest Negative Mode Ratings Compared to Control: –GoPubMed/GoGene NEVER had worse mode scores than PubMed!
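A "mode difference between control and intervention" can be computed along the following lines. This is a sketch under stated assumptions: the slides do not say which rating scale was used or how ties were handled, so the 1 to 5 scale, the tie-breaking rule and the sample ratings below are hypothetical.

```python
from collections import Counter


def rating_mode(ratings):
    """Most frequent rating; ties go to the lower rating (an assumption)."""
    counts = Counter(ratings)
    top = max(counts.values())
    return min(r for r, n in counts.items() if n == top)


def mode_difference(intervention, control):
    """Positive result: the intervention's modal rating beats the control's."""
    return rating_mode(intervention) - rating_mode(control)


# Hypothetical ratings on an assumed 1-5 scale, not data from the study.
intervention_ratings = [4, 4, 5, 3]
control_ratings = [3, 3, 4, 2]
print(mode_difference(intervention_ratings, control_ratings))
```

The mode suits ordinal questionnaire ratings because it does not assume that the distance between adjacent ratings is equal.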
Semi-Structured Interviews So the winner is GoPubMed/GoGene COHSE was rated the most usable –what more could we want? Well… –Critiques in GoPubMed/GoGene interviews were about the details –Critiques in COHSE/CORESE interviews were about being able to use the systems at all At first, it turned out that some could not tell control from intervention! When asked for critiques of COHSE or CORESE, users gave abundant detail… about NeLI! –Yes, but what about COHSE? “Those awful little boxes? They were really distracting, I didn’t really understand what they were.” Presentations explaining the SWBs improved users’ understanding
Validation We were expecting discrepancy between logs, questionnaires, and interviews –True for COHSE’s findability ratings Workshop users rated it as adequate or good Logs showed that none of these users had found the answer –Triangulation revealed discrepancies in plausible results –Otherwise users were generally consistent We suspected one user of giving fake answers because she was exceptionally positive in her questionnaires and interview –Task logs showed that she was one of the fastest (1-2 min per task) »…but 2 others were faster! –Logs showed that she activated 4 link boxes »…matching the median for all respondents –Logs showed that she viewed only 1 external page »…but some users didn’t view any and of those who did, 1 page was the mode –Triangulation validated suspicious results
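The check applied to the suspiciously positive respondent can be sketched as a comparison of her log figures against the rest of the group. The per-user numbers below are invented so that they reproduce the pattern described on the slide (two faster users, link boxes at the median, one external page as the mode); they are not the study's data, and the field names are assumptions.

```python
from statistics import median, mode

# Illustrative per-user log summaries; values are hypothetical.
USERS = {
    "a": {"seconds": 90, "links": 4, "pages": 1},   # the suspected respondent
    "b": {"seconds": 60, "links": 2, "pages": 0},
    "c": {"seconds": 75, "links": 4, "pages": 1},
    "d": {"seconds": 200, "links": 5, "pages": 1},
    "e": {"seconds": 300, "links": 6, "pages": 3},
}


def check_against_group(users, suspect):
    """Compare one respondent's log figures with the whole group."""
    me = users[suspect]
    # How many other respondents completed the task faster?
    faster = sum(1 for u in users.values() if u["seconds"] < me["seconds"])
    # Median link-box activations over all respondents.
    links_median = median(u["links"] for u in users.values())
    # Mode of external pages, counting only users who viewed at least one.
    pages_mode = mode([u["pages"] for u in users.values() if u["pages"] > 0])
    return {
        "faster_users": faster,
        "links_minus_median": me["links"] - links_median,
        "pages_match_mode": me["pages"] == pages_mode,
    }


print(check_against_group(USERS, "a"))
```

A respondent whose figures sit at or near the group's central values, as here, gives no log-based reason to discard her questionnaire and interview answers.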
Completeness Logs showed that interviewees who spoke negatively about COHSE often had spent a long time on it –Longer than 5 minutes –Longer than they spent on the control platform Several users spent more time on GoGene than on PubMed or the extended GoPubMed, but: –Said GoGene was their favourite –Rated it highly on the questionnaires Triangulation shows the whole picture –Faster !=> better –Slower !=> worse
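The point that speed alone does not indicate preference can be illustrated by joining log timings with questionnaire ratings and listing the respondents for whom the two disagree. This is a sketch only: the record fields and the values are hypothetical, and the slides do not describe how the authors actually joined the two data sources.

```python
# Hypothetical joined records: time on each system (from logs) and the
# rating given to each system (from questionnaires). Not study data.
SESSIONS = [
    {"user": "p1", "sec_control": 120, "sec_intervention": 300,
     "rating_control": 3, "rating_intervention": 5},
    {"user": "p2", "sec_control": 200, "sec_intervention": 100,
     "rating_control": 2, "rating_intervention": 4},
    {"user": "p3", "sec_control": 240, "sec_intervention": 90,
     "rating_control": 4, "rating_intervention": 2},
]


def contradicts_speed_assumption(s):
    """True when speed and stated preference point in opposite directions."""
    slower = s["sec_intervention"] > s["sec_control"]
    faster = s["sec_intervention"] < s["sec_control"]
    liked_more = s["rating_intervention"] > s["rating_control"]
    liked_less = s["rating_intervention"] < s["rating_control"]
    return (slower and liked_more) or (faster and liked_less)


flagged = [s["user"] for s in SESSIONS if contradicts_speed_assumption(s)]
print(flagged)
```

Respondents flagged this way are the ones for whom log data read in isolation would have suggested the wrong conclusion, which makes them natural candidates for follow-up interviews.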
Discussion –GoPubMed/GoGene workshop confirmed positive impressions –CORESE workshop confirmed negative questionnaire results –GoPubMed/GoGene workshop also confirmed: That problems with this SWB were the most trivial That somewhat higher questionnaire results masked dramatically better user experiences –Impressions that COHSE was more usable were quashed by contact with users at workshop Severity of problems would have gone undetected without interviews –Low number of interviews means triangulation was not complete Recruitment difficult given time pressures on user base Workshops are resource-intensive Future work: carefully sample a subset for interview –Time constraints prevented gathering of observational data in situ Future work: use video and/or eye tracking software
Conclusion –We have developed a method of triangulating quantitative and qualitative data in user-centred evaluation of SWBs This addresses a need for greater attention to a technique which is essential for accurate interpretation of data –Having applied our evaluation framework we triangulated: Quantitative data from the web server logs and from questionnaires Qualitative data from semi-structured interviews eliciting users’ opinions on matters they identified as important
Conclusion –Triangulation was indispensable for an accurate view of the results Log data gave system speed –Questionnaires and interviews gave the meaning of the log data Log data showed usage of semantic links Log data showed whether users found the answers –Questionnaires and interviews revealed discrepancies between what users said and what they did Questionnaires showed system intuitiveness –Only the interviews showed the full significance of the questionnaire results –Only triangulation could answer the ultimate questions about user satisfaction If any one data source had been left out, the results could have been misinterpreted