1 Internet Search Term Surveillance for Influenza
Philip M. Polgreen 1 (philip-polgreen@uiowa.edu), Yiling Chen 3, Forrest Nelson 2, David M. Pennock 3
Departments of 1 Internal Medicine and 2 Economics, The University of Iowa, Iowa City, IA; 3 Yahoo! Research, New York, NY

2 Disclosures
- PMP: Influenza Advisory Board member for Roche
- YC and DMP were employees of Yahoo! Research
- Funding: RWJF, CDC, NIH

3 Motivation
- There are multiple surveillance system components for influenza in the U.S., including:
  - Mortality from influenza and pneumonia
  - Influenza-like illness (ILI)
  - Culture data
- However, they all report disease activity after it occurs
- The only local (i.e., state-level) data is a weekly influenza activity report from each state

4 Motivation
- Influenza occurs in regular seasonal cycles, but the character and timing of each season vary
- Historically, despite the seriousness of the disease and the potential benefit from advance warning, forecasts of influenza activity have not been routinely available in the U.S.

5 Motivation
Benefits of an influenza forecast (even a few weeks in advance) include extra time for:
- Preparing for an increased number of patients admitted for influenza complications
- Administering prophylactic medications to persons in high-risk groups
- Vaccinating high-risk individuals and healthcare workers

6 Motivation
- The Internet is an increasingly important source of medical information for:
  - Patients and families
  - Medical providers
- Thus, analysis of the volume of Internet search traffic may provide information about disease activity over time
- An analysis of search terms can produce accurate and useful statistics about the unemployment rate
Ettredge M, Gerdes J, Karuga G. Using web-based search data to predict macroeconomic statistics. Commun ACM, 2005; 48(11):87-92.

7 Goals
The purpose of this project was to:
(1) determine the temporal relationship between search terms for influenza and actual disease occurrence
(2) determine if, and to what extent, an increase in search frequency precedes official measures of influenza activity
(3) explore the feasibility of building a search-based prediction market for infectious diseases

8 Methods
- De-identified search query logs were obtained daily from http://search.yahoo.com starting 3/2004
- Unique queries originating from the U.S. and containing influenza-related search terms were counted daily
- Searches had to include either FLU or INFLUENZA
- Searches were excluded if they included BIRD, AVIAN, or PANDEMIC
- We also excluded searches containing SHOT, VACCINATION, or VACCINE, to avoid capturing queries about influenza vaccination
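The inclusion and exclusion rules above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the slides do not say whether terms were matched as whole words or as substrings, so whole-word, case-insensitive matching is an assumption here, and the function names are hypothetical. De-identification, U.S. filtering, and query de-duplication are not modeled.

```python
import re

# Term lists taken from the slide; matching on whole words is an assumption.
INCLUDE = {"flu", "influenza"}
EXCLUDE = {"bird", "avian", "pandemic", "shot", "vaccination", "vaccine"}


def is_influenza_query(query: str) -> bool:
    """Return True if the query has an include term and no exclude term."""
    tokens = set(re.findall(r"[a-z]+", query.lower()))
    return bool(tokens & INCLUDE) and not (tokens & EXCLUDE)


def count_influenza_queries(queries) -> int:
    """Count the queries in one day's log that pass the filter."""
    return sum(1 for q in queries if is_influenza_query(q))
```

With whole-word matching, a query such as "fluent speaker" is not counted, whereas substring matching on FLU would count it; which behavior the study used is not stated.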

9 Methods
- Daily search counts were divided by the total number of U.S. searches to get the daily fraction of influenza-related searches
- We then averaged the fraction over the week for every week of the year
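As a rough sketch of this normalization, the following Python computes a daily fraction and then averages it within each week. The input layout (a dict from date to a pair of counts) and the use of ISO weeks are assumptions made for illustration; the slides do not specify how weeks were delimited.

```python
from collections import defaultdict
from datetime import date


def weekly_search_fraction(daily_counts):
    """Average the daily influenza search fraction within each week.

    daily_counts maps a date to (influenza_count, total_us_count).
    Weeks are keyed by (ISO year, ISO week number), which is an assumption.
    """
    by_week = defaultdict(list)
    for day, (flu_count, total_count) in daily_counts.items():
        iso = day.isocalendar()
        by_week[(iso[0], iso[1])].append(flu_count / total_count)
    return {week: sum(vals) / len(vals) for week, vals in sorted(by_week.items())}
```

Averaging daily fractions, rather than dividing weekly totals, weights each day equally regardless of overall search volume, which matches the order of operations described on the slide.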

10 Methods
Influenza surveillance data from March 2004 to August 2007:
1. Weekly influenza culture data (proportion of positive cultures): Clinical laboratories throughout the U.S. that are either World Health Organization (WHO) Collaborating Laboratories or National Respiratory and Enteric Virus Surveillance System (NREVSS) laboratories report the total number of respiratory specimens tested and the number positive for influenza types
2. 122 Cities Mortality Reporting System: Each week, participating cities report the total number of death certificates received and the number that list pneumonia or influenza as the underlying and/or contributing cause of death. Based on the city data, we obtained influenza mortality data for 9 U.S. census regions and the whole country

11 Searches for Influenza and Positive Influenza Cultures by Week

12 Searches for Influenza and Mortality from Influenza and Pneumonia by Week

13 Search and Positive Cultures
- We fit a linear model to test how well search frequency predicts the percentage of positive influenza cultures, where t is a time trend (measured in weeks), C_t is the rate of positive cultures in week t, and s_{t-x} is the search frequency in week t-x
- To determine the appropriate lag, we examined lags x of 0-10 weeks
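The model equation itself is not preserved in this transcript. One form consistent with the variable definitions on the slide would be the following; the coefficient names and the error term are assumptions, not taken from the original:

```latex
C_t = \beta_0 + \beta_1\, s_{t-x} + \beta_2\, t + \varepsilon_t
```

Under this reading, \beta_1 is the coefficient on lagged search frequency reported in the regression results, and one model is fit for each lag x.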

14 Searches and Mortality
- Using the mortality data, we fit another linear model of the same form, where m_t is the number of deaths during week t and all other variables are as defined earlier
- To determine the appropriate lag, we examined lags x of 0-10 weeks
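The lag search can be sketched as fitting one ordinary least squares model per lag and comparing R². The code below is an illustrative sketch in plain Python, assuming the outcome (positive culture rate or deaths) is regressed on an intercept, the search fraction lagged by x weeks, and a linear time trend; the exact specification used by the authors is not shown in the transcript, and all function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def lagged_fit(y, s, lag):
    """OLS of y[t] on (1, s[t - lag], t); returns (coefficients, R^2)."""
    rows = [(1.0, s[t - lag], float(t)) for t in range(lag, len(y))]
    target = y[lag:]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(3)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    mean = sum(target) / len(target)
    ss_res = sum((v - f) ** 2 for v, f in zip(target, fitted))
    ss_tot = sum((v - mean) ** 2 for v in target)
    return beta, 1.0 - ss_res / ss_tot


def scan_lags(y, s, max_lag=10):
    """Return {lag: R^2} for lags 0..max_lag."""
    return {lag: lagged_fit(y, s, lag)[1] for lag in range(max_lag + 1)}
```

In practice a statistics package would also report standard errors and p-values; the hand-rolled solver is used here only to keep the sketch free of dependencies.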

15 Predicted Values for Positive Influenza Cultures Based on Searches and Actual Values by Week

16 Predicted Values for Mortality from Influenza and Pneumonia Based on Searches and Actual Values by Week

17 Culture Results
Positive Influenza Culture Regression Results

Lag x (weeks)   Coefficient on s_{t-x}   Std. Error   t       P > |t|   R²
0               239636.2                 18301.99     13.09   <0.001    0.4672
1               242579.5                 18218.11     13.32   <0.001    0.4723
2               239568.6                 18487.33     12.96   <0.001    0.4568
3               234749.1                 18848.97     12.45   <0.001    0.4356
4               229446.4                 19225.16     11.93   <0.001    0.4134
5               223257.3                 19628.85     11.37   <0.001    0.3890
6               215900.2                 20064.8      10.76   <0.001    0.3618
7               206683.5                 20565.4      10.05   <0.001    0.3300
8               195520.6                 21118.44     9.26    <0.001    0.2943
9               184502.1                 21619.25     8.53    <0.001    0.2610
10              173491.3                 22164.1      7.83    <0.001    0.2305

18 Mortality Results
Influenza Mortality Regression Results

Lag x (weeks)   Coefficient on s_{t-x}   Std. Error   t       P > |t|   R²
0               3300788                  436385.8     7.56    <0.001    0.2075
1               3810620                  415148.2     9.18    <0.001    0.2787
2               4194847                  394455.2     10.63   <0.001    0.3418
3               4445665                  378633.3     11.74   <0.001    0.3882
4               4604043                  367573.4     12.53   <0.001    0.4198
5               4625652                  368166.3     12.56   <0.001    0.4229
6               4461079                  379889.1     11.74   <0.001    0.3919
7               4314867                  390405       11.05   <0.001    0.3649
8               4248610                  396362.5     10.72   <0.001    0.3523
9               3992864                  410770.2     9.72    <0.001    0.3111
10              3767351                  422055.3     8.93    <0.001    0.2765
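As a quick consistency check on the two regression tables, the short Python sketch below recomputes the t statistic as the coefficient divided by its standard error and finds the lag with the largest R². The numbers are copied from the tables; the dictionary layout and function names are illustrative only.

```python
# lag -> (coefficient, standard error, R^2), copied from the two tables
culture = {
    0: (239636.2, 18301.99, 0.4672), 1: (242579.5, 18218.11, 0.4723),
    2: (239568.6, 18487.33, 0.4568), 3: (234749.1, 18848.97, 0.4356),
    4: (229446.4, 19225.16, 0.4134), 5: (223257.3, 19628.85, 0.3890),
    6: (215900.2, 20064.8, 0.3618), 7: (206683.5, 20565.4, 0.3300),
    8: (195520.6, 21118.44, 0.2943), 9: (184502.1, 21619.25, 0.2610),
    10: (173491.3, 22164.1, 0.2305),
}
mortality = {
    0: (3300788, 436385.8, 0.2075), 1: (3810620, 415148.2, 0.2787),
    2: (4194847, 394455.2, 0.3418), 3: (4445665, 378633.3, 0.3882),
    4: (4604043, 367573.4, 0.4198), 5: (4625652, 368166.3, 0.4229),
    6: (4461079, 379889.1, 0.3919), 7: (4314867, 390405, 0.3649),
    8: (4248610, 396362.5, 0.3523), 9: (3992864, 410770.2, 0.3111),
    10: (3767351, 422055.3, 0.2765),
}


def t_statistic(coefficient, std_error):
    """t statistic for a single regression coefficient."""
    return coefficient / std_error


def best_lag(table):
    """Lag whose model has the largest R^2."""
    return max(table, key=lambda lag: table[lag][2])


print(best_lag(culture))    # 1
print(best_lag(mortality))  # 5
```

By R², the culture model fits best at a 1-week lag and the mortality model at a 5-week lag.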

19 Limitations
- With only four years of data, the inferential conclusions that we can make are limited
- Some proportion of searches may be generated by news reports rather than actual disease activity (celebrity effect)
- Other searches might be for topics that do not reflect influenza activity (e.g., influenza vaccination)

20 Limitations Two U.S. influenza search fraction series: one that excludes vaccination-related terms and one that does not.

21 Limitations
- Lack of availability of this data to researchers, owing to privacy and proprietary concerns
- The geographic data gleaned from search terms is extracted from IP addresses and may not always represent actual geographic location
- We could reproduce our results at a census region level
- There is a lack of generally available surveillance data against which to compare search data

22 Summary & Conclusions
- A temporal association exists between search term frequency and influenza disease activity
- Influenza-related search term activity seems to precede an increase in influenza culture data by at least 4 weeks, and deaths from pneumonia and influenza by at least 7 weeks
- "Search-term surveillance" may provide an inexpensive supplement to more traditional disease-surveillance systems

23 Future Work
- Search term surveillance is not limited to influenza
- It could also be used for emerging infectious diseases, re-emerging infectious diseases, and to detect changes in phenomena related to chronic diseases
- Search term surveillance of symptom-based searches (e.g., diarrhea) may help detect outbreaks if search levels rise above an established baseline
- Search-based prediction markets (how this experiment started)
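The idea of flagging search levels that rise above an established baseline could be sketched as a simple threshold rule. The slides do not define the baseline, so the rule below (historical mean plus k standard deviations, with k = 2) is purely an assumption for illustration, and the function name is hypothetical.

```python
from statistics import mean, stdev


def exceeds_baseline(history, current, k=2.0):
    """Flag `current` if it is above mean(history) + k * stdev(history).

    `history` is a list of past weekly search fractions for a symptom term;
    both the threshold rule and the default k are illustrative assumptions.
    """
    threshold = mean(history) + k * stdev(history)
    return current > threshold
```

A real surveillance baseline would likely need to account for seasonality and news-driven spikes, which this sketch ignores.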

24 Future Directions (Search Markets)
- Experimental markets called prediction (or decision) markets are created for the sole purpose of making forecasts and have been used successfully in a number of contexts
- In situations involving uncertainty about future events, markets can be used to aggregate information from various individuals to predict those events (i.e., information can be extracted from the prices derived in experimental markets)

25 Future Directions
- The Iowa Electronic Market (the first prediction market) has a consistent track record of making more accurate forecasts of political elections than any national poll. For 6 presidential elections, the average prediction error has been under 1.5%, while opinion polls for those same elections have had an average error of 2.5%.
- HEWLETT-PACKARD has used experimental markets to forecast the sales of its printers more accurately than its statisticians.
- ELI LILLY has designed markets to predict which developmental drugs have the best chance of advancing through clinical trials.
- GOOGLE has used markets (based on IEM research) to successfully forecast product launch dates, new office openings, and other events of strategic importance.
- The Iowa Influenza Prediction Market has predicted influenza activity 2-4 weeks in advance.
- The ProMED-mail Iowa H5N1 Market has predicted the number of human cases of avian influenza months in advance.

26 Search-Based Prediction Markets for Health Topics
- Yahoo! Tech Buzz Game: a fantasy (i.e., not real money) prediction market for high-tech products, concepts, and trends.
- The participants' goal was to predict how popular various technologies would be in the future.
- Popularity, or buzz, was measured by Yahoo! Search frequency over time.
- Predictions were made by buying stock in the products or technologies you believed would succeed, and selling stock in the technologies you thought would flop. In other words, you "put your fantasy dollars where your mouth is."
- Thus, our original (and current) goal is to build a search market for diseases

