Understanding Web Browsing Behaviors through Weibull Analysis of Dwell Time Chao Liu, Ryen White, Susan Dumais Microsoft Research at Redmond.

2 Understanding Web Browsing Behaviors through Weibull Analysis of Dwell Time Chao Liu, Ryen White, Susan Dumais Microsoft Research at Redmond

3 Dwell Time as Implicit User Feedback
- The most significant indicator of document relevance besides clickthroughs [Kelly and Belkin, SIGIR’01, SIGIR’04]
- Leveraged in various applications
  - Learning to rank [Agichtein et al., SIGIR’06]
  - Query expansion [Buscher et al., SIGIR’09]
  - BrowseRank, assuming an exponential distribution [Liu et al., SIGIR’08]
  - …

4 Questions Addressed in this Study
- Questions
  - How do we model the dwell time distribution Pr(t|d)?
  - What does Pr(t|d) tell us about user browsing behaviors?
  - How is the distribution related to page-level features, and can we predict the distribution based on page-level features?
- Takeaways
  - We propose to model Pr(t|d) using Weibull distributions
  - The fitted Weibull distribution exhibits a strong negative aging effect, which indicates a “screen-and-glean” browsing behavior
  - We can predict Pr(t|d) based on page features, which effectively extends the application of dwell time to scenarios where dwell time data is not available

5 Outline
- A Primer on Weibull Analysis
  - Weibull distribution and analysis
  - Hazard function and aging effects
- Weibull Analysis on Dwell Time
  - Goodness-of-Fit
  - Screen-and-glean browsing pattern
  - Screening by categories
- Predicting Dwell Time Distribution
  - Prediction performance
  - Feature importance
- Conclusions

6 Weibull Analysis
- Weibull analysis is a method for modeling positive data sets, such as time-to-failure data
  - Predicting product life
  - Comparing the reliability of competing product designs
  - Establishing warranty policies or proactively managing spare parts inventories
- Success beyond reliability engineering
  - Survival analysis, weather forecasting, fading channels in wireless communication, the length of labor strikes, AIDS mortality, earthquake probabilities, etc.
- No prior Weibull analysis of Web data, although the Web abounds with temporal data
  - Page dwell time, session length, time-to-first-click, etc.

7 Weibull Distribution
- 2-parameter Weibull distribution
  - λ: scale parameter
  - k: shape parameter
- Exponential distribution when k = 1
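For reference, the standard two-parameter Weibull density and distribution function, written in the λ (scale) and k (shape) notation of this slide, are sketched below. Setting k = 1 reduces the density to the exponential form with mean λ.

```latex
f(x;\lambda,k) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^{k}},
\qquad
F(x;\lambda,k) = 1 - e^{-(x/\lambda)^{k}},
\qquad x \ge 0,\ \lambda > 0,\ k > 0

k = 1:\quad f(x;\lambda,1) = \frac{1}{\lambda}\, e^{-x/\lambda}
```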

8 Weibull Analysis
- Hazard function at time x
  - Instantaneous failure rate (or hazard rate) at time x
  - Amount of risk associated with an x-survivor at time x
- Hazard function for Weibull distributions
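As a short worked derivation, the hazard function is the density divided by the survival probability. Substituting the standard two-parameter Weibull density f and distribution function F (textbook forms, stated here as an assumption about the notation the slide used) gives a power law in x:

```latex
h(x) = \frac{f(x)}{1 - F(x)}
     = \frac{\frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^{k}}}{e^{-(x/\lambda)^{k}}}
     = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}
```

The exponent k − 1 determines whether the rate falls (k < 1), stays constant (k = 1), or rises (k > 1) over time.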

9 Aging Effects from Hazard Function
- k = 1: No aging
  - Constant failure rate
  - Exponential distribution
- 0 < k < 1: Negative aging
  - Decreasing failure rate
  - An initial screening has to be passed in order to survive longer
  - Smaller k means harsher screening
- k > 1: Positive aging
  - Increasing failure rate
  - Little to no screening at the beginning, but life becomes tougher as time goes by
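The three aging regimes can be checked numerically. The following is a minimal sketch that evaluates the standard Weibull hazard h(x) = (k/λ)(x/λ)^(k−1) at a few time points; the function names and the values λ = 10 and k ∈ {0.5, 1, 2} are illustrative assumptions, not figures from the study.

```python
def weibull_hazard(x: float, lam: float, k: float) -> float:
    """Standard Weibull hazard rate h(x) = (k / lam) * (x / lam) ** (k - 1)."""
    if x <= 0 or lam <= 0 or k <= 0:
        raise ValueError("x, lam and k must all be positive")
    return (k / lam) * (x / lam) ** (k - 1)


def aging_effect(k: float) -> str:
    """Classify the aging regime implied by the shape parameter k."""
    if k < 1:
        return "negative aging"   # decreasing failure rate
    if k > 1:
        return "positive aging"   # increasing failure rate
    return "no aging"             # constant failure rate (exponential)


# Illustrative values only: scale lam = 10 and three shape parameters.
times = [1.0, 5.0, 25.0]
for k in (0.5, 1.0, 2.0):
    rates = [weibull_hazard(t, lam=10.0, k=k) for t in times]
    print(f"k={k}: {aging_effect(k)}, hazard at {times} = {[round(r, 4) for r in rates]}")
```

For k = 0.5 the printed rates shrink as time grows, which is the "initial screening" pattern: the risk of abandonment is highest at the start and drops for those who stay.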

10 Weibull Analysis on Dwell Time and Beyond
- The Web abounds with temporal data
  - Time to first click, session length, eye fixation, …
- Weibull analysis goes well beyond hazard functions
  - Failure forecasting, corrective actions, …

              | Reliability Analysis | Dwell Time Analysis        | Click Analysis      | …
Data          | Time-to-failure      | Time-to-abandon            | Time-to-first-click | …
Hazard        | Failure rate         | Abandon rate               | Click rate          | …
E(t | t > t0) | Mean residual life   | Mean residual time on page | How soon to click   | …
…             | …                    | …                          | …                   | …
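The mean residual quantity in the E(t|t>t0) row can be approximated numerically. The sketch below assumes the standard Weibull survival function S(t) = exp(−(t/λ)^k) and uses the identity E[T − t0 | T > t0] = ∫ S(t) dt / S(t0), integrated from t0 with a plain trapezoid rule; the function name, step count, and truncation tolerance are illustrative choices. Adding t0 to the result gives E(t|t>t0).

```python
import math


def mean_residual_time(t0: float, lam: float, k: float,
                       steps: int = 200_000, eps: float = 1e-12) -> float:
    """Approximate E[T - t0 | T > t0] for a Weibull(lam, k) variable.

    Integrates the conditional survival S(t) / S(t0) from t0 up to the
    point where it falls below eps, using the trapezoid rule.
    """
    base = (t0 / lam) ** k
    # Upper limit t_max solves S(t_max) / S(t0) = eps.
    t_max = lam * (base - math.log(eps)) ** (1.0 / k)
    h = (t_max - t0) / steps

    def cond_survival(t: float) -> float:
        return math.exp(base - (t / lam) ** k)

    total = 0.5 * (cond_survival(t0) + cond_survival(t_max))
    for i in range(1, steps):
        total += cond_survival(t0 + i * h)
    return total * h


# Illustrative parameters: with k < 1 the expected remaining time grows
# the longer a visitor has already stayed; with k = 1 it stays at lam.
for t0 in (0.0, 10.0, 40.0):
    print(t0, round(mean_residual_time(t0, lam=10.0, k=0.5), 2),
          round(mean_residual_time(t0, lam=10.0, k=1.0), 2))
```

Under negative aging (k < 1), a visitor who has passed the initial screening is expected to stay longer than a fresh arrival, which is what makes this quantity informative for dwell time.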

12 Goodness-of-Fit Comparison
- Dwell time collected for 205,873 pages (URLs) in the English (US) market, each of which has a minimum of 10k dwell times
- Comparison on Goodness-of-Fit (GoF)
  - Dwell times for each page are split into training (80%) and testing (20%)
  - Models are fitted on the training split and evaluated on the testing split
  - Metrics: log-likelihood and Kolmogorov–Smirnov distance
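The comparison protocol can be sketched for a single page as follows. This is a minimal illustration on synthetic dwell times, not the authors' code: it assumes maximum-likelihood fitting (the slide does not name the estimator), solves the Weibull shape equation by bisection, and treats the exponential model as the k = 1 special case with scale equal to the training mean. All function names and the synthetic parameters (λ = 30, k = 0.6) are hypothetical.

```python
import math
import random


def fit_weibull(xs, iters=60):
    """Maximum-likelihood (lam, k) via bisection on the shape score equation."""
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(xs)
    x_max = max(xs)

    def score(k):
        # Rescaling by x_max avoids overflow; the ratio is scale-invariant.
        w = [(x / x_max) ** k for x in xs]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    lo, hi = 1e-3, 100.0          # score(lo) < 0 < score(hi) for non-constant data
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    lam = x_max * (sum((x / x_max) ** k for x in xs) / len(xs)) ** (1.0 / k)
    return lam, k


def log_likelihood(xs, lam, k):
    """Weibull log-likelihood of xs; k = 1 gives the exponential model."""
    return sum(math.log(k / lam) + (k - 1) * math.log(x / lam) - (x / lam) ** k
               for x in xs)


def ks_distance(xs, lam, k):
    """Kolmogorov-Smirnov distance between the empirical and Weibull CDFs."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = 1.0 - math.exp(-((x / lam) ** k))
        d = max(d, abs(cdf - i / n), abs(cdf - (i - 1) / n))
    return d


# Synthetic dwell times for one hypothetical page (scale 30, shape 0.6).
random.seed(0)
data = [random.weibullvariate(30.0, 0.6) for _ in range(5000)]
split = int(0.8 * len(data))
train, test = data[:split], data[split:]      # 80% / 20% split

lam_hat, k_hat = fit_weibull(train)
exp_scale = sum(train) / len(train)           # exponential MLE (k fixed at 1)

print("Weibull     LL, KS:", log_likelihood(test, lam_hat, k_hat),
      ks_distance(test, lam_hat, k_hat))
print("Exponential LL, KS:", log_likelihood(test, exp_scale, 1.0),
      ks_distance(test, exp_scale, 1.0))
```

A higher held-out log-likelihood and a smaller Kolmogorov–Smirnov distance both indicate a better fit; on data generated with k well below 1, the Weibull model should win on both.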

13 Fitting λ and k
- Strong negative aging
- What’s the initial screening?
- Screen-and-glean browsing pattern?

14 P(k|Category): Aging Effect w.r.t. Categories
- Screening is harsher for less-entertaining topics

16 Dwell Time Prediction from Page Features
- Why predict dwell time?
  - Extend dwell time to pages with little or no dwell time data
  - Enable third parties to leverage dwell time even if they don’t have access to real dwell time data
  - Gain insights into what elements affect dwell time
- Why use only page-level features?
  - Users decide how long to stay on a page based on their experience and perception of it, rather than on PageRank, for example
  - Advanced features like PageRank and inlink counts may not be available to all parties

17 Experiment Setup
- 5000 randomly sampled pages with fitted λ and k as the target values
  - Pages are crawled using a dynamic crawler, which parses the HTML, executes all dynamic components (e.g., redirections, Flash, JavaScript, etc.), and finally renders the page
  - “login” pages are removed as they are likely due to time-out redirection
  - 4771 pages left
- Page-level features
  - HtmlTag: frequencies of 93 HTML tags
  - Content: frequencies of top-1000 terms
  - Dynamic: statistics from dynamic crawling
- Regressor: Multiple Additive Regression Trees (MART)
  - Effectiveness and feature interpretability
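As an illustration of what an HtmlTag-style feature vector might look like, the sketch below counts start tags in a page with Python's standard `html.parser` module. The class name, the helper function, the short tag list, and the use of raw counts (rather than normalized frequencies) are assumptions for illustration; the slide specifies neither the 93 tags nor how frequencies were computed.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how often each HTML start tag appears in a document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names; self-closing tags such as
        # <br/> also reach this handler via handle_startendtag.
        self.counts[tag] += 1


def html_tag_features(html: str, tags: list) -> list:
    """Return one count per tag in `tags`, in the same order."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return [parser.counts[t] for t in tags]


# Hypothetical tag vocabulary and page, for illustration only.
vocabulary = ["p", "img", "br", "table"]
page = "<html><body><p>a</p><P>b</P><img src='x.png'><br/></body></html>"
print(html_tag_features(page, vocabulary))
```

A fixed tag order keeps the feature vector aligned across pages, which is what a tree-based regressor such as MART needs as input.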

18 Prediction Results
- Baseline returns the mean λ and k
- Comparisons with various feature configurations
  - Prediction outperforms the baseline
  - HtmlTag and Dynamic are similarly effective on their own, and complementary to each other when combined
  - Content > HtmlTag+Dynamic
  - Content+Dynamic is the best: Dynamic captures what users experience after clicks, whereas Content shows what users would see in the end

19 Important Features

21 Conclusions
- The first Weibull analysis on Web dwell time
  - Draws an analogy between dwell time and lifetime
  - Opens the door to Weibull analysis for temporal implicit feedback
- Dwell time exhibits a strong negative aging effect, which hints at a prevalent “screen and glean” browsing pattern
  - Harsher screening for less-entertaining topics
- Feasible to predict dwell time based on page-level features
  - Extending applicability to less-visited pages and parties without dwell time data
- Future work
  - Improving prediction accuracy through better feature engineering
  - Weibull analysis for IR

22 Acknowledgments
- Yutaka Suzue
- Krysta Svore
- Qiang Wu
- Wen-tau Yih
- Xiaoxin Yin
- Alice Zheng

23 Q&A Thank You!

