Studies of the Onset & Persistence of Medical Concerns in Search Logs Ryen White and Eric Horvitz Microsoft Research, Redmond

Presentation transcript:


Motivation
People use the Web to diagnose and understand medical conditions
  80% of U.S. Web users perform medical search (Pew, 2011)
Studies to date have focused on the impact of the Web on medical search
  Including cyberchondria (White & Horvitz, TOIS 2009): escalations in concerns within a single session (e.g., [headache] → [brain tumor])
Search behavior over time reflects in-world events such as professional diagnosis, etc. [Our Focus]

Our Focus: Logs as a Lens on Health
We explore how medical concerns emerge over time and how concerns persist post onset
A long-term view is needed to understand medical search
  60% of medical sessions start directly with a condition query
  Searchers come to search engines with a condition in mind!
  95% of these sessions had a medical query in prior session(s)
A long-term view helps us understand medical trajectories
  Predict emerging concerns, personalize search, guide healthcare use
Note: Long-term behavior may be influenced by external events (e.g., diagnosis); the Web is not always the cause

Contributions
Large-scale log analysis to:
  1. Characterize how health-related concerns emerge in search activity over time
  2. Analyze persistence of concerns post onset, including interruptions to future search tasks
  3. Build models to predict condition onset, escalations in concerns, and future interruptions
Will focus on #1 and #2 in this talk

Data
Anonymized toolbar logs from millions of users
  3-month period, Jan to Mar 2011
Extracted search sessions (30 min timeout)
  Included queries from Google, Yahoo!, Bing
Assigned topical labels from ODP to URLs
  Content-based classifier
  Identify which URLs are medical
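
The session extraction step can be sketched as follows. This is a minimal illustration, assuming the 30-minute timeout is an inactivity gap between consecutive events from one user; the function name `split_sessions` and the `(timestamp, query)` event layout are hypothetical and not taken from the talk.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # timeout value stated on the slide


def split_sessions(events, timeout=SESSION_TIMEOUT):
    """Group one user's (timestamp, query) events into sessions.

    Assumption: a new session starts whenever the gap since the
    previous event is longer than `timeout`.
    """
    sessions = []
    previous_time = None
    for timestamp, query in sorted(events, key=lambda event: event[0]):
        if previous_time is None or timestamp - previous_time > timeout:
            sessions.append([])  # open a new session
        sessions[-1].append((timestamp, query))
        previous_time = timestamp
    return sessions


# Made-up example events for a single user.
events = [
    (datetime(2011, 1, 5, 9, 0), "headache"),
    (datetime(2011, 1, 5, 9, 10), "headache causes"),
    (datetime(2011, 1, 5, 11, 0), "weather redmond"),
]
sessions = split_sessions(events)
```

With these events, the first two queries fall into one session and the third, nearly two hours later, opens a second one.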

Long-Term Profile Generation
Identify searchers with evidence of health seeking intent
  Searchers who visited at least one URL in Health category
  Removed outlier users
170k users, 25M sessions, 1B+ URLs
  150 search sessions / user, spanning avg. of 76 days / user
Automatically labeled queries based on:
  Symptoms: Merck medical dictionary + synonyms from click graph (e.g., headache)
  Benign explanations: ICD10 list + synonyms from click graph (e.g., caffeine withdrawal)
  Serious illnesses: ICD10 list + synonyms from click graph (e.g., brain tumor)
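
A rough sketch of wordlist-based query labeling is shown below. The tiny wordlists are placeholders built from the slide's own examples; the study drew its terms from the Merck dictionary, ICD10 lists, and click-graph synonyms. Whole-phrase matching and the name `label_query` are assumptions made for illustration, since the slides do not say how matching was done.

```python
# Placeholder wordlists (assumption): only the examples named on the slide.
WORDLISTS = {
    "symptom": {"headache"},
    "benign": {"caffeine withdrawal"},
    "serious": {"brain tumor"},
}


def label_query(query, wordlists=WORDLISTS):
    """Return the sorted labels whose terms occur as whole phrases in the query."""
    normalized = " ".join(query.lower().split())
    padded = f" {normalized} "  # padding makes phrase matches respect word boundaries
    return sorted(
        label
        for label, terms in wordlists.items()
        if any(f" {term} " in padded for term in terms)
    )


labels = label_query("Headache and brain tumor")
```

A query can carry several labels at once, which matches the idea of a profile containing both symptom and condition searches.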

What are some key characteristics of the long-term health-search profiles?

Medical Search Over Time
3% of queries are medical related
Users search more often for symptoms and serious illnesses than benign, common explanations
  Statistic (n=169,513)                    Mean    SD
  # medical queries (based on clicks)
  % queries medical                        3.4%    4.3%
  # medical sessions
  # medical queries (wordlist method)
  # queries with serious illnesses
  # queries with benign explanations
  # queries with symptoms
Labeling methods: ODP classes of SERP clicks (click-based); dictionary lookup (wordlist)

What are characteristics of how health-related concerns emerge in activity over time?

Condition Onset
Explore search behavior before the onset of the first occurrence of a condition (randomly chosen)
Pre-onset history can contain other conditions
  83% searched for at most one other condition prior to onset
  61% searched for no conditions pre-onset
Also extracted behavior after the condition onset
Overall: 159K pre-onset histories, one per user
[Figure: timeline of symptom queries (q) and condition queries (Q), divided into pre-onset and post-onset periods around the onset condition]
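
One way to read the onset definition is sketched below: pick one of the user's conditions at random, find the first query that mentions it, and split the history there. The record layout `(day, query, condition_or_None)`, the function name `split_at_onset`, and the seeded random generator are all assumptions for illustration; the slides do not describe the actual procedure in this detail.

```python
import random


def split_at_onset(history, rng=None):
    """Split a chronological history at the first query for a randomly chosen condition.

    `history` is a list of (day, query, condition_or_None) tuples (hypothetical layout).
    Returns (onset_condition, pre_onset, post_onset), or None when the
    history contains no condition query. The onset query itself is
    excluded from both halves.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    conditions = sorted({cond for _, _, cond in history if cond is not None})
    if not conditions:
        return None
    onset_condition = rng.choice(conditions)
    onset_index = next(
        i for i, (_, _, cond) in enumerate(history) if cond == onset_condition
    )
    return onset_condition, history[:onset_index], history[onset_index + 1:]


# Made-up history with a single condition.
history = [
    (1, "headache", None),
    (3, "headache causes", None),
    (7, "migraine", "migraine"),
    (9, "migraine treatment", "migraine"),
]
onset, pre_onset, post_onset = split_at_onset(history)
```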

Pre-Onset Behavior
Extracted features of search behavior before onset
We computed statistics of search behavior, e.g.,
  20 medical search sessions over 49 days before condition
  3% of user’s time online was spent viewing medical content
Additionally, we wanted to check if onset condition and the symptoms searched previously were related
  Extracted symptom-onset pairs, compared with known symptom-condition relationships
  E.g., [headache] → {migraine, caffeine withdrawal, brain tumor}
  79.5% of prior symptoms were related to onset
Emergence of conditions appears to extend back over time
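
The symptom-onset comparison amounts to checking each observed pair against a table of known relationships and reporting the related fraction. The sketch below assumes a hypothetical lookup table seeded with the slide's headache example; the pairs in the usage example are made up, and the real knowledge source is not specified in the slides.

```python
# Hypothetical lookup table; only the slide's own example is included.
KNOWN_RELATED = {
    "headache": {"migraine", "caffeine withdrawal", "brain tumor"},
}


def related_fraction(pairs, known=KNOWN_RELATED):
    """Fraction of (symptom, onset_condition) pairs found in the known table."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    related = sum(
        1 for symptom, condition in pairs if condition in known.get(symptom, set())
    )
    return related / len(pairs)


# Made-up pairs: one related, one unrelated.
fraction = related_fraction([
    ("headache", "migraine"),
    ("headache", "food poisoning"),
])
```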

Condition Severity
Study the relationship between pre-onset behavior and severity
Many search / Web usage behaviors are similar, but …
  Feature               Onset condition is serious (n=117389)   Onset condition is benign (n=34424)
  % medical queries     3.0% (4.0%)                             4.0% (4.2%)
  # medical queries     5.30 (12.12)                            7.69 (15.48)
  # symptom queries     1.81 (6.43)                             2.34 (7.58)
  # condition queries   1.74 (5.36)                             2.27 (6.36)
Onset condition = benign: more pre-onset medical searching
  Web may be contributing as a causal influence
  Users may be better informed and less likely to escalate
  Narrow lens of log analysis means that we cannot be sure why

Feature Trends
Compute gradient of line of best fit of the features over time before the condition onset
Observations:
  Condition acuity increases top-to-bottom
  Benign explanations on bottom are acute, e.g., food poisoning and panic attacks
  Serious illnesses on top are chronic, e.g., chronic fatigue syndrome
[Figure: number of symptom queries over time before the onset condition, with line of best fit; average best-fit gradient per condition, with serious illnesses at the top and benign explanations at the bottom]
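
The gradient of a line of best fit can be computed with ordinary least squares. The sketch below is one plausible reading, assuming the feature (for example, symptom queries per day) is regressed on time before onset; the function name `best_fit_gradient` and the sample values are illustrative only.

```python
def best_fit_gradient(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs[i], ys[i])."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Made-up data: days relative to onset and symptom-query counts per day.
days = [-4, -3, -2, -1]
symptom_queries = [0, 1, 2, 3]
gradient = best_fit_gradient(days, symptom_queries)
```

A positive gradient indicates that the feature grows as onset approaches, and a steeper gradient would correspond to a more abrupt build-up.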

How do searches for the condition persist following evidence of initial concern onset?

Post-Onset Search Behavior
Computed the same features for search behavior after the onset condition rather than before
Medical search increases, symptom searching decreases, and condition searching increases (more focused)
  Feature                            % or Avg (SD)    % change from pre-onset
  % URLs medical                     4.9%
  % queries medical                  4.2%
  % online time on medical pages     8.0%
  # unique symptoms                  0.50 (1.01)
  Symptom persistence (days)         2.46 (3.42)
  # unique conditions                1.04 (1.29)
  Condition persistence (days)       7.57 (12.49)
Most people searched for zero conditions pre-onset

Interruption: Effect of Health Concerns
Suspension of ongoing search activity
Given a search session, an interruption comprises at least one search for the onset condition sandwiched between two series of 1+ non-medical searches
[Figure: session timeline in which onset-condition searches (Q) interrupt runs of non-medical searches (n)]
We show that interruption happens often:
  30.3% of users were interrupted in this way
  2.39 interruptions / user, totaling 2 hours and 16 mins. searching
  6% of future search sessions interrupted, on 2.68 total days
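
The interruption definition can be sketched as a scan over per-query labels within a session. The sketch assumes each query is already labeled `"Q"` (onset condition), `"n"` (non-medical), or something else (other medical), and that the onset-condition run must be directly adjacent to non-medical runs on both sides; that strict adjacency and the name `count_interruptions` are assumptions, not details from the talk.

```python
from itertools import groupby


def count_interruptions(session_labels):
    """Count runs of onset-condition searches bounded by non-medical runs.

    `session_labels` is a sequence of per-query labels: "Q" for the onset
    condition, "n" for non-medical, anything else for other medical queries.
    """
    # Collapse consecutive identical labels into one entry per run.
    runs = [label for label, _ in groupby(session_labels)]
    return sum(
        1
        for i in range(1, len(runs) - 1)
        if runs[i] == "Q" and runs[i - 1] == "n" and runs[i + 1] == "n"
    )


# Made-up session: two separate interruptions of non-medical searching.
interruptions = count_interruptions(list("nnQnnnQQn"))
```

A session that begins or ends with the onset condition does not count, because the searches are not sandwiched on both sides.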

Summary and Takeaways

Summary of Presented Findings
80% of pre-onset symptoms relate to onset condition
  True even though the search queries are weeks apart
Medical search increases near to onset
  Related to acuity of condition
Onset type is related to pre-onset behavior
  e.g., Benign explanations follow more medical searching
Medical search behavior post-onset ≠ pre-onset
  e.g., Symptom search drops, condition search increases
Post-onset interruptions occur frequently

Other Things (see paper for more info)
Between-session escalations in medical concerns
Focus-of-attention
  Users narrow focus over time, concentrating on particular conditions of interest
Periodicities in medical search
  Medical search behavior is clustered, not spread out over time
Accurate prediction of onset, escalation, interruption
Callouts on slide: (White & Horvitz, 2009); (New in this paper)

Implications & Limitations
Onset as cue to use previous clicks for related symptoms
Use combination of symptoms searched over time to estimate the onset condition (early warning signs!)
  Observed over time: [twitching], [twitching eye], [twitch], …
  Unobserved (yet!): [ALS] [Lou Gehrig’s Disease]
Given accurate prediction of:
  Onset: Adjust search experience based on severity or prevalence
  Escalation: Provide links to broader content (benign explanations)
  Interruptions: Provide support for more effective task switching
Limitations of log analysis:
  Machines may be shared (not all queries from same user)
  Limited definitions of concepts (e.g., interruptions)

Conclusions
Log-based longitudinal study of online health search
Characterized aspects of that behavior before condition onset, including relationship with condition type and escalations
Studied post-onset behavior and found differences in search behavior, and evidence of interruptions
Also built predictive models of onset, escalations, impact
Search engines could mine long-term health seeking behavior to provide better medical-search support