CrowdLogger as a Community Platform for Searcher Behavior Experiments Henry Feild Center for Intelligent Information Retrieval University of Massachusetts.

Presentation transcript:

CrowdLogger as a Community Platform for Searcher Behavior Experiments Henry Feild Center for Intelligent Information Retrieval University of Massachusetts Amherst November 28, 2012

Things we like to do in IR
- Observe and model user behavior
- Compare search algorithms / interfaces
  - which do users prefer?
  - time to completion
  - which result in more/fewer clicks, etc.

Example papers:
- Optimized Interleaving for Online Retrieval Evaluation
- Absence time and user engagement: Evaluating Ranking Functions
- Modeling and Measuring the Impact of Short and Long-Term Behavior on Search Personalization
- Personalization of Search Results Using Interaction Behaviors in Search Sessions
- Search, Interrupted: Understanding and Predicting Search Task Continuation
- User Evaluation of Query Quality
- Improving Searcher Models Using Mouse Cursor Activity

What's currently done
Software:
- make a toolbar from scratch
- modify the Lemur Search Log Toolbar
Study:
- recruit some users and conduct a controlled lab study
- install on campus computers and observe users in situ (well, in situ specifically in a library setting)
This is slow, expensive, and generally a lot of effort.

What we want
- a common, open source platform that deals with the basics
  - interaction data collection
  - data storage
  - privacy
- a common user base
  - can recruit some new users, but already have a significant pool of participants
- an interface for implementing novel studies

CrowdLogger overview
[Diagram labels: Web, User, Researchers, User's computer, User Log, Experiment Router, Encrypt, Anonymizers, CrowdLogging Server, Aggregate and Decrypt, Crowd Log, Mine Data]
Current: a private data aggregation system
- query reformulation pair frequencies
- query-click pair frequencies
- query frequencies
- …

Explicit feedback Interactive experiments System comparisons Labeling

Challenges
- data management across multiple experiments
- an API that allows researchers sufficient control over accessing user data and implementing experiments
- controlling what data is shared with researchers
- incentivizing users to download the extension and participate in experiments

Data management

What to log
- browser interactions
  - add/remove/move tab
  - back/forward buttons
  - favorites/home/minimize/exit …
- web page interactions
  - page un/loads
  - page focus
  - clicks
  - scrolls
  - mouse movements
- SERP interactions
  - query
  - top 10 results (urls, summaries, etc.)
- complex interactions
  - opening links in new tabs
  - search tasks
  - study data

How to log it
- Lemur Toolbar format: a tab-delimited text file
- CrowdLogger format: JSON stored in IndexedDB

    …
    {event: search, time: , query: wikipedia, se: google}
    {event: click, time: , destUrl: wikipedia.org, srcUrl: google.com/search?q=wikipedia}
    {event: load, time: , url: wikipedia.org}
    …

Data management

A logged search entry:
    {event: search, time: , query: wikipedia, se: google}
The same entry, extended with the results shown for the search:
    {event: search, time: , query: wikipedia, se: google, results: [{rank: 1, title: Wikipedia, url: wikipedia.org, snippet: “Wikipedia, the free encyclopedia that anyone can edit.”}, {rank: 2,…}]}

Benefits of JSON
- easily extensible
Benefits of IndexedDB
- versioning built in
- entries can be updated in place; no need to rewrite the entire log file
- can build multiple indexes over the data store
- HTML5 standard
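The two benefits above, extensible entries and in-place updates, can be illustrated with a small sketch. It uses an in-memory stand-in rather than IndexedDB itself so that it runs outside a browser. The `LogStore` class, its method names, and the numeric `time` values are hypothetical placeholders, not CrowdLogger code.

```javascript
// In-memory stand-in for an IndexedDB object store (hypothetical helper,
// not part of CrowdLogger). Entries are keyed by an auto-incremented id.
class LogStore {
  constructor() {
    this.entries = new Map(); // id -> entry
    this.nextId = 1;
  }

  // Append a new interaction event and return its id.
  add(entry) {
    const id = this.nextId++;
    this.entries.set(id, { id, ...entry });
    return id;
  }

  // Merge extra fields into one existing entry; other entries are untouched.
  update(id, fields) {
    const entry = this.entries.get(id);
    if (!entry) throw new Error(`no entry with id ${id}`);
    Object.assign(entry, fields);
    return entry;
  }

  // Look up entries by any field: the kind of query a secondary index serves.
  findBy(field, value) {
    return [...this.entries.values()].filter((e) => e[field] === value);
  }
}

const store = new LogStore();

// Placeholder timestamps; the slides leave the time values blank.
const searchId = store.add({ event: "search", time: 1000, query: "wikipedia", se: "google" });
store.add({
  event: "click",
  time: 1005,
  destUrl: "wikipedia.org",
  srcUrl: "google.com/search?q=wikipedia",
});

// Later, extend the search entry with its result list, in place.
store.update(searchId, {
  results: [{ rank: 1, title: "Wikipedia", url: "wikipedia.org" }],
});

const searches = store.findBy("event", "search");
console.log(JSON.stringify(searches, null, 2));
```

With a tab-delimited log, adding the `results` field would mean changing the column layout and rewriting the file. Here only the one entry changes.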

API Categories
User Data
- Historical data
  - get all clicks
  - get all searches
- Real time data
  - on new search, do …
Aggregate User Data
- Already collected data
  - get all query rewrites
  - get all query-click pairs
User Interface
- Add to CrowdLogger interface
  - add widget to tools page
- Stand-alone windows/pages
  - present dialog when user searches
  - modify search page ranking
Client-server communication
- Request server-side computation
  - run retrieval algorithm for query
- Access server-side data
  - send me synonyms for …

API Layer Options

Option 1: API + study extension (communicate via inter-extension event calls)
Pros:
- flexible study extension
Cons:
- limited communication
- no control over extension
- user has to download separate extension

Option 2: API + study code module (CrowdLogger executes the study code module)
Pros:
- single module
- browser independent
Cons:
- remote JavaScript execution
- requires code approval
- potentially complex study code formulation
- less study code flexibility

Privacy controls
- What data gets shared with researchers?
- Under what conditions?

Three things to balance:
- what data is being collected and how it will be used
- what is minimally useful to researchers
- what users are comfortable with

Example 1
- Collected and used: query rewrites for public release
- Minimally useful: whatever users are comfortable with
- Users are comfortable with: User 1: only if shared by 9+ other users (k=10); User 2: k=1 rewrites

Example 2
- Collected and used: feedback on retrieval system preference, for researcher use only
- Minimally useful: k=1 anonymized feedback from users
- Users are comfortable with: User 1: k=5 feedback; User 2: k=1 feedback
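The per-user threshold can be sketched as a simple counting rule. This is a plain, unencrypted illustration only; the overview slide indicates that the actual pipeline encrypts data and routes it through anonymizers, which is not modeled here. The function name `releasableCounts`, the input shape, and the rule that a user's contribution counts only when the total number of distinct users sharing the artifact reaches that user's k are assumptions. As in the "k=10 means 9+ other users" example, k here counts the contributing user too.

```javascript
// contributions: [{ user, artifact, k }], where k is the minimum number of
// distinct users (including this one) who must share the artifact before this
// user's copy may be counted. Hypothetical sketch, not CrowdLogger's protocol.
function releasableCounts(contributions) {
  // artifact -> (user -> strictest k that user asked for)
  const usersByArtifact = new Map();
  for (const { user, artifact, k } of contributions) {
    if (!usersByArtifact.has(artifact)) usersByArtifact.set(artifact, new Map());
    const users = usersByArtifact.get(artifact);
    users.set(user, Math.max(k, users.get(user) ?? 0));
  }

  // An artifact is released with the number of users whose threshold is met.
  const released = new Map();
  for (const [artifact, users] of usersByArtifact) {
    const total = users.size;
    const eligible = [...users.values()].filter((k) => total >= k).length;
    if (eligible > 0) released.set(artifact, eligible);
  }
  return released;
}

const pie = "blueberry pie -> blueberry pie recipes";
const laptops = "laptops -> dell laptops";

const released = releasableCounts([
  { user: "u1", artifact: pie, k: 10 }, // needs 9+ other users: not met
  { user: "u2", artifact: pie, k: 1 },  // always willing to share
  { user: "u3", artifact: pie, k: 3 },  // met: three users share this rewrite
  { user: "u3", artifact: laptops, k: 3 }, // only one user: withheld
]);

console.log([...released]);
```

In this run the pie rewrite is released with a count of 2 (users u2 and u3), while u1's copy and the laptops rewrite are withheld.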

Privacy controls
What will be collected: All search reformulations. For example, if you search for “blueberry pie” and then “blueberry pie recipes”, the pair (“blueberry pie”, “blueberry pie recipes”) will be collected.
How the collected data will be used: Reformulations will be anonymized, made publicly accessible, and used to, for example, generate search suggestions for you and other users.
Privacy settings: For each search reformulation collected from you, select the anonymization level, that is, the number of other users that must also share the same reformulation for it to be included in the final data set: 4
I have read the consent form and agree to participate in this study.
Cancel / Continue

Incentivization
Provide a service
- research prototypes
- visualizations
- re-finding tools
- citizen scientist
Financial incentives
- gift cards
- virtual currency to ‘buy’ research apps
Gamification
- study-specific
- could also be a service
Examples: EPS game, Google-a-day, Search Task Assistant, Google Search History

Service: Search Task Assistant As you search, your most recent searches are organized by task Related tasks and searches are updated for each new search

Service: Search Task Assistant

Use case example: evaluate a query recommendation algorithm
APIs:
- Aggregate User Data: extract query reformulations from users
- User Data / User Interface: when a user enters a new query, modify the search page to include a list of query suggestions; ask the user to rate the suggestions
- Client-server communication: on each new query, contact a server to compute the recommendations
- Uploading/Privacy: k=5 for query reformulations; k=1 for feedback

Use case example: Privacy consent
What will be collected: Your feedback about the quality of search suggestions generated by our algorithm. In addition, you will have the option of providing the search you entered along with the feedback.
How the collected data will be used: The feedback will be used only by the researchers in charge of this study (listed in the consent form).
Privacy settings: For each set of feedback collected from you, select the anonymization level, that is, the number of other users that must share the same feedback for it to be included in the final data set (max to participate: 1): 1
I have read the consent form and agree to participate in this study.
Cancel / Continue

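Use case example: code snippet

    // On each new query, fetch suggestions and overlay them on the search page.
    api.userData.onQuery(function(data){
        var suggestions = getSuggestions(data.query);
        api.ui.addOverlay(data.page, function(page){
            showSuggestions(page, suggestions);
        });
    });

    // Ask the study server to compute suggestions for the query.
    function getSuggestions(query){
        return api.clientServer.callServer(QUERY_SUGG_SERVER_URL, query);
    }

    // Build the overlay and wire up the feedback button.
    function showSuggestions(page, suggestions){
        var overlay = page.jQuery(" ");
        …
        button.click(onFeedbackSubmitted);
    }

    // Upload the feedback privately with anonymization level k=1.
    function onFeedbackSubmitted(data){
        api.upload.private(data.toString(), {k: 1});
    }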
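To trace the control flow of the snippet without a browser, the sketch below runs the same sequence (query event, server call, overlay, private upload) against a stubbed `api` object. Everything here is hypothetical: the stub's behavior, the callback signature of `addOverlay`, the synchronous `callServer`, the canned suggestions, and the placeholder server URL are assumptions made so the example runs. The slides do not define them.

```javascript
// Stubbed study API (hypothetical). It records calls instead of touching a
// browser, so the order of operations can be inspected.
const calls = [];
const uploads = [];
const api = {
  userData: {
    handlers: [],
    onQuery(handler) { this.handlers.push(handler); },
  },
  ui: {
    // Assumed behavior: invoke the render callback with the search page.
    addOverlay(page, render) { calls.push("overlay"); render(page); },
  },
  clientServer: {
    // Assumed synchronous; returns canned suggestions instead of a real call.
    callServer(url, query) {
      calls.push(`server:${query}`);
      return [`${query} deals`, `cheap ${query}`];
    },
  },
  upload: {
    private(payload, options) {
      calls.push(`upload:k=${options.k}`);
      uploads.push({ payload, k: options.k });
    },
  },
};

const QUERY_SUGG_SERVER_URL = "https://example.org/suggest"; // placeholder URL

// Study code: same flow as the slide's snippet.
api.userData.onQuery((data) => {
  const suggestions = api.clientServer.callServer(QUERY_SUGG_SERVER_URL, data.query);
  api.ui.addOverlay(data.page, (page) => {
    page.shown = suggestions; // stands in for rendering the overlay
    // Simulate the user answering "Useful? Yes" and including the query.
    const feedback = { useful: true, query: data.query };
    api.upload.private(JSON.stringify(feedback), { k: 1 });
  });
});

// Simulate the user issuing a search.
const page = {};
for (const handler of api.userData.handlers) {
  handler({ query: "laptops", page });
}

console.log(calls);
```

The recorded call order (server request, overlay, upload with k=1) matches the API categories listed for the use case.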

Use case example: interface
[Interface mock-up: a “Suggestions” overlay listing MacBook Air, MacBook Pro, Dell laptops, and Windows 8 Laptops, with a “Useful?” Yes/No choice, an option to include the query (“laptops”), and a Continue button]

Other challenges
- cross-device synchronization
- mobile device support
  - requires browsers to allow extensions on mobile platforms
- neutral code review panel