
1 Alexander Kotov (UIUC); Paul N. Bennett, Ryen W. White, Susan T. Dumais, Jaime Teevan (Microsoft Research)

2 Simple vs. Cross-Session Tasks

Simple search tasks:
- Composed of one or two continuous queries
- Short time intervals between related queries
- Completed within one search session

Cross-session search tasks:
- Composed of multiple non-continuous queries
- Cover several aspects
- Long time intervals between related queries
- Continue across several search sessions
- Constitute 10% of all sessions and 25% of all queries [Donato et al. WWW10]

3 Typical Cross-Session Tasks

- Event planning (e.g. wedding, vacation)
- Shopping research (e.g. electronics, real estate)
- Academic research
- Political/personality research
- How-to (how-do-I) research (e.g. fix a car)
- Medical self-diagnosis and treatment

4 Example of Cross-Session Tasks

Time             | Query
1/22/2011 1:10pm | peanut butter recipes
1/22/2011 1:13pm | peanut butter cookies
1/22/2011 1:25pm | calories peanut butter cookies
1/22/2011 3:10pm | weather nyc
1/22/2011 3:11pm | peanut butter sandwiches
1/22/2011 3:15pm | nyc 10-day weather forecast
1/22/2011 3:16pm | pb&j
1/22/2011 3:18pm | fluffanutter
1/22/2011 3:19pm | fluffernutter
1/22/2011 6:15pm | sigir 2011
1/22/2011 6:17pm | sigir 2010 schedule
1/23/2011 3:17pm | nytimes
1/24/2011 3:00pm | flight status united 123
1/25/2011 3:29pm | foodtv
1/25/2011 3:31pm | famous pb&j drop recipe

Annotations on the slide: long time gaps between related queries; the task is interleaved with other tasks; the task continues after long periods focused on other tasks; later related queries share no terms with earlier ones (e.g. "pb&j" vs. "peanut butter recipes").

5 Supporting Cross-Session Tasks

Motivation:
- Relieve the cognitive burden of maintaining the context of search tasks
- Improve search results by reflecting long-term intent
- Improve efficiency by pre-fetching relevant content
- Provide support for task resumption

Prediction tasks:
- Same Task: given a query, identify all previous queries on the same cross-session search task (models for Same Task have been proposed before [Jones & Klinkner CIKM08; Donato et al. WWW10])
- Task Continuation: given a user's search task and the user's last query on that task, predict whether the user will return to it in the future

Solution: a classification framework with Logistic Regression and MART classifiers.
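The slides do not detail the classifiers beyond naming Logistic Regression and MART. As a rough illustration of the LR side only, a minimal gradient-descent trainer over hypothetical feature vectors (e.g. [Jaccard similarity, same-session flag]) might look like the sketch below; the function names and features are assumptions, not the paper's implementation.

```python
# Toy logistic-regression trainer (stochastic gradient descent on log-loss).
# Feature vectors and hyperparameters are illustrative only.
import math

def train_lr(X, y, lr=0.5, epochs=500):
    w = [0.0] * (len(X[0]) + 1)                     # bias + one weight per feature
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))          # sigmoid
            g = p - yi                              # gradient of log-loss w.r.t. z
            w[0] -= lr * g
            for k, xj in enumerate(xi):
                w[k + 1] -= lr * g * xj
    return w

def predict(w, xi):
    """Probability that the pair/query belongs to the positive class."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1.0 / (1.0 + math.exp(-z))
```

On a separable toy set such as `X = [[0.9, 1], [0.8, 1], [0.0, 0], [0.1, 0]]`, `y = [1, 1, 0, 0]`, the learned weights assign high probability to pairs with high similarity in the same session.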

6 Labeling

Time             | Query                          | Automatic Label       | AutoDom | HumDom
1/22/2011 1:10pm | peanut butter recipes          |                       | x       | x
1/22/2011 1:13pm | peanut butter cookies          | peanut butter recipes | x       | x
1/22/2011 1:25pm | calories peanut butter cookies | peanut butter recipes | x       | x
1/22/2011 3:10pm | weather nyc                    |                       |         |
1/22/2011 3:11pm | peanut butter sandwiches       | peanut butter recipes | x       | x
1/22/2011 3:15pm | nyc 10-day weather forecast    | weather nyc           |         |
1/22/2011 3:16pm | pb&j                           |                       |         | x
1/22/2011 3:18pm | fluffanutter                   |                       |         | x
1/22/2011 3:19pm | fluffernutter                  |                       |         | x
1/22/2011 6:15pm | sigir 2011                     |                       |         |
1/22/2011 6:17pm | sigir 2010 schedule            | sigir 2011            |         |
1/23/2011 3:17pm | nytimes                        |                       |         |
1/24/2011 3:00pm | flight status united 123       |                       |         |
1/25/2011 3:29pm | foodtv                         |                       |         | x
1/25/2011 3:31pm | famous pb&j drop recipe        |                       |         | x

Use query refinement clusters and a query graph with a similarity threshold to produce the automatic labels.
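The slide's automatic labeling uses query refinement clusters and a query graph with a similarity threshold. A minimal sketch of the threshold-linking idea, simplified to term-set Jaccard similarity with connected components as tasks (the threshold value and the Jaccard-only similarity are assumptions, not the paper's method):

```python
# Link query pairs whose Jaccard similarity exceeds a threshold,
# then treat connected components (via union-find) as tasks.
from itertools import combinations

def label_tasks(queries, threshold=0.3):
    parent = list(range(len(queries)))          # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]       # path halving
            i = parent[i]
        return i

    def jaccard(a, b):
        ta, tb = set(a.split()), set(b.split())
        return len(ta & tb) / len(ta | tb)

    for i, j in combinations(range(len(queries)), 2):
        if jaccard(queries[i], queries[j]) > threshold:
            parent[find(i)] = find(j)           # merge the two clusters

    return [find(i) for i in range(len(queries))]
```

For the example log, "peanut butter recipes", "peanut butter cookies", and "calories peanut butter cookies" end up with one label, while "weather nyc" gets another; like the slide's automatic labeler, this misses "pb&j", which shares no terms with the task.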

7 Labeling (continued; same table as slide 6)

Focus on early dominant tasks: two distinct queries labeled with the same task within the first two days.

8 Labeling (continued; same table as slide 6)

Human annotators correct the automatic labels for the dominant task (Cohen's kappa ranges from 0.86 to 0.92).

9 Datasets

Dataset                     | 10k       | 3k      | Human
Number of users             | 10,852    | 3,376   | 1,218
Users returning to dominant | 1,694     | 1,688   | 701
Number of queries           | 119,814   | 66,219  | 28,474
Query pairs                 | 1,486,492 | 866,860 | 660,120

- Sampled 10k users from one week of browser-based logs of browsing and searching episodes
- 10k: 15% of users return to the dominant task
- 3k: 50% return to the dominant task (negatives downsampled)
- Human: editorial labels for a random sample

10 Prediction Tasks

11 Same Task Features

Query-based:
- Descriptiveness: query length (terms/chars)
- Engagement: # clicks on top-10 results
- Examination: min/max position of clicked results

Session-based:
- Activity/Engagement: # queries/clicks/time since the beginning of the session
- Similarity: presence of the same/subset/superset query in the session

History-based:
- Activity/Engagement: # sessions/queries/clicks in history
- Similarity: presence of the same/subset/superset query in history

Pair-wise:
- Similarity: # overlapping terms, Jaccard coefficient, Levenshtein edit distance, equal/subset/superset relation, co-clicked URLs
- Time: time between the two queries, same session
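The pair-wise similarity features above (term overlap, Jaccard coefficient, Levenshtein edit distance) can be sketched directly; the function names are illustrative, not from the paper's code.

```python
# Pair-wise query similarity features over whitespace-tokenized queries.

def term_overlap(q1: str, q2: str) -> int:
    """Number of terms shared by the two queries."""
    return len(set(q1.split()) & set(q2.split()))

def jaccard(q1: str, q2: str) -> float:
    """Jaccard coefficient over the queries' term sets."""
    t1, t2 = set(q1.split()), set(q2.split())
    return len(t1 & t2) / len(t1 | t2) if t1 | t2 else 0.0

def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance via dynamic programming (two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]
```

For the log above, "peanut butter recipes" and "peanut butter cookies" share two terms (Jaccard 0.5), while the misspelling pair "fluffanutter"/"fluffernutter" is two edits apart, illustrating why morphological similarity is a strong same-task signal.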

12 Same Task Results

Macro statistics:

       | 3k                       | 10k                      | Human
       | BASE   | LR     | MART   | BASE   | LR     | MART   | BASE   | LR     | MART
Recall | 0.4157 | 0.6431 | 0.7258 | 0.4300 | 0.6693 | 0.7556 | 0.3267 | 0.6067 | 0.5617
Prec.  | 0.8382 | 0.8135 | 0.8183 | 0.8240 | 0.8268 | 0.8233 | 0.6138 | 0.7857 | 0.7325
Acc.   | 0.8970 | 0.9227 | 0.9331 | 0.9245 | 0.9646 | 0.9534 | 0.6104 | 0.7629 | 0.7118
F1     | 0.5495 | 0.7160 | 0.7681 | 0.5520 | 0.7383 | 0.7876 | 0.3957 | 0.6670 | 0.6156

- Baseline (BASE): LR using only Levenshtein distance
- The two classifiers show similar levels of accuracy on auto-labeled data
- Performance decreases on human-labeled data
- LR notably dominates MART on human-labeled data
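The metrics in the table follow the standard definitions; a minimal sketch of the per-class (positive-class) computation from binary predictions is below. The slide's "macro" statistics average such per-class values; this helper shows one class only.

```python
# Precision, recall, F1, and accuracy from binary labels/predictions.

def binary_metrics(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    acc = (tp + tn) / len(y_true)
    return prec, rec, f1, acc
```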

13 Same Task P-R Curves

- LR dominates MART at the low-recall/high-precision end of the curves
- LR outperforms MART on the human-labeled data in the area of optimal F1

14 Same Task Feature Importance

Feature         | Weight
QueryTermsJac   |  1.44
NumQueryChars1  |  1.05
NumTermsOver    |  0.93
NumQueryChars2  |  0.79
SameSess        |  0.52
HaveCoClickDom  |  0.40
NumQueriesSess1 |  0.31
SubQuerySess2   | -0.30
NumQueryTerms2  | -0.47
NumQueriesHist1 | -0.52
NumQueryTerms1  | -0.68
LevenDist       | -0.84

- Pair-wise features are the most prevalent
- Term overlap features are among the strongest signals
- Same-task queries are morphologically similar
- Long, descriptive query terms are indicative of cross-session tasks

15 Task Continuation Features

- Same query, session, and history features as Same Task
- Session-based and history-based versions of:
  - Engagement: avg. time between pairs of queries
  - Satisfaction: # queries with dwell time over 30 secs; # clicked queries / # queries
  - Complexity: avg. number of unique terms per query
  - Task relatedness: # co-clicked URLs with the same domain
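Two of the continuation features above can be sketched concretely: the average time between consecutive queries, and the count of queries with a result click dwelled on for more than 30 seconds. The event-dict field names are assumptions, not the paper's log schema.

```python
# Illustrative session/history feature computation over query logs.

def avg_inter_query_time(timestamps):
    """Mean seconds between consecutive queries (0.0 if fewer than two)."""
    if len(timestamps) < 2:
        return 0.0
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return sum(gaps) / len(gaps)

def num_dwell_over_30(query_events):
    """Count queries with at least one result click dwelled on > 30 seconds."""
    return sum(1 for q in query_events
               if any(d > 30 for d in q.get("dwell_times", [])))
```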

16 Task Continuation Results

       | 3k                       | 10k                      | Human
       | BASE   | LR     | MART   | BASE   | LR     | MART   | BASE   | LR     | MART
Recall | 0.4038 | 0.6593 | 0.6546 | 0.0084 | 0.2777 | 0.3096 | 0.8154 | 0.7342 | 0.6971
Prec.  | 0.5906 | 0.7555 | 0.7708 | 0.6750 | 0.6784 | 0.7219 | 0.5961 | 0.7538 | 0.7766
Acc.   | 0.5601 | 0.7228 | 0.7308 | 0.8441 | 0.8664 | 0.8726 | 0.5706 | 0.7074 | 0.7100
F1     | 0.4775 | 0.7029 | 0.7072 | 0.0166 | 0.3933 | 0.4305 | 0.6844 | 0.7428 | 0.7326

- Baseline (BASE): LR using only NumQueriesHist
- The two classifiers perform similarly on all datasets
- Recall and precision substantially decrease when moving from the smaller balanced auto-labeled dataset (3k) to the larger unbalanced one (10k)
- Recall significantly improves for both classifiers on manually corrected labels

17 Task Continuation P-R Curves

- Performance significantly decreases on the 10k dataset
- On human-labeled data, MART has a slight advantage in the low-recall/high-precision region

18 Task Continuation Feature Importance

Feature            | Weight
SameQueryHist      |  1.11
NumSessHist        |  0.60
NumDomQueriesHist  |  0.39
AvgInterQTimeHist  |  0.24
FreqDomQueriesHist |  0.24
NumDwell30Hist     |  0.22
NumQueryHist       |  0.21
NumTop10Clicks     | -0.16
NumClicksHist      | -0.18
NumQueryChars      | -0.21
SubQueryHist       | -0.23
SupQuerySess       | -0.40

- Re-finding is common
- Dominant-task-related features are important
- Complexity of the information need and close examination of results are indicative of task continuation
- Users who search frequently and deeply likely use search for complex tasks

19 Summary

- Addressed the important problem of predicting cross-session task continuation
- Designed large feature sets reflecting query descriptiveness, user engagement, examination depth, user activity, query similarity to previous history, time dependency, task complexity, and user satisfaction
- Developed feature representations and learning techniques that can accurately predict:
  - whether two queries are on the same task
  - whether a user will resume a task in a future session
- Analyzed feature contributions; of particular note:
  - long, descriptive query terms are indicative of cross-session tasks
  - the complexity of the information need and close examination of results are indicative of task resumption


