Forecasting World Wide Pandemics: Using Google Flu Data to Forecast the Flu. Brian Abe, Dan Helling, Eric Howard, Ting Zheng, Laura Braeutigam, Noelle Hirneise.


1 Forecasting World Wide Pandemics: Using Google Flu Data to Forecast the Flu. Brian Abe, Dan Helling, Eric Howard, Ting Zheng, Laura Braeutigam, Noelle Hirneise. Group C

2 Preview of Coming Attractions  Introduction  The Data  Our Best Model  Verifying The Model  Comparing Our Model with Real Data  Conclusions

3 Background  The Flu –General term for illness caused by several different viruses –Responsible for  Respiratory illness  Up to 500,000 deaths worldwide per year –Virus is able to flourish in those with weaker immune systems  Young  Elderly  Sick

4 Background  Conventional methods for forecasting possible medical catastrophes –Step 1 – Patient realizes they are sick –Step 2 – Patient makes a medical appointment –Step 3 – Patient goes to the appointment and is diagnosed –Step 4 – Medical professional sends data to the CDC

5 The Future is Google  Future methods for predicting local epidemics and world wide pandemics lie within the hands of Google –Google Flu Trends  Weekly data with collection date starting on June 1, 2003  Data is a normalized aggregate of number of searches for “flu” or similar queries in a given area. –Source: http://www.google.org/about/flutrends/download.html –http://www.cdc.gov/flu/weekly/fluactivity.htm

6 Google Flu Trends  Idea behind the project was to predict pandemics and epidemics faster than conventional methods  Early detection could lead to a lower rate of infection and, in turn, fewer deaths  Could save your life and your family's lives someday

7 Google Flu Trends  The Future of Forecasting –Step 1: The sick realize they are sick –Step 2: Patient “Googles” their symptoms –Step 3: Data is aggregated and sent to the CDC

8 Pitfalls of the Data  Not everyone has internet access  Not everyone knows how to use the internet  Not everyone uses Google (≈18%)  New strains of virus, such as H1N1, may not behave like former strains –This may or may not be an issue

9 Hypothesis  Google data on the flu can be used to forecast future outbreaks of the flu

10 The Data  Trace shows serious seasonality  Notice the spike in 2003 from increased number of searches due to bird flu scare

11 The Data  Histogram of the data – clearly not normally distributed, with a very large Jarque-Bera statistic
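
For reference, the Jarque-Bera statistic combines sample skewness S and kurtosis K as JB = n/6 · (S² + (K − 3)²/4); under normality it is approximately chi-square with 2 degrees of freedom, so a very large value rejects normality. A minimal Python sketch, assuming the series is a plain list of numbers (the function name is illustrative and not from the presentation):

```python
def jarque_bera(values):
    """Return (JB statistic, skewness, kurtosis) for a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    # Central moments with the 1/n convention used in the JB statistic.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    return jb, skewness, kurtosis
```

A series with one large spike, like the search data around a flu scare, produces strong positive skewness and therefore a large statistic.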

12 Correlogram of the Data – looks like a possible AR(2) or AR(3)

13 The Data  Unit-Root test – significant at the 1% level but not conclusive

14 Seasonal Differencing was done to make the data more stationary: SDUS=US-US(-52)
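
The seasonal difference SDUS = US − US(−52) subtracts the value from the same week one year earlier. A minimal Python sketch, assuming the weekly US series is held in a plain list (the function name is illustrative):

```python
def seasonal_difference(series, lag=52):
    """Return series[t] - series[t - lag] for every t that has a value lag steps back."""
    if len(series) <= lag:
        raise ValueError("series must be longer than the seasonal lag")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]
```

The first 52 observations are lost, because they have no value from a year earlier to subtract.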

15  Histogram of the seasonally differenced data: still not normal, but closer to normal, with less skewness and a single peak.

16 The Data  Correlogram of the seasonal difference – looks like an AR(2)

17 The Data  Unit Root Test – Further evidence of stationarity:

18 The Data  First modeled using OLS: Tried AR(1) AR(2) first
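
An OLS fit of an autoregression regresses each value on a constant and its own lags. The sketch below is a minimal pure-Python illustration, not the software output shown on the slide; `fit_ar` and `solve_linear_system` are hypothetical helper names, and the input is assumed to be the seasonally differenced series as a list.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ar(series, order=2):
    """OLS fit of y_t = c + a_1 y_{t-1} + ... + a_p y_{t-p} + e_t.

    Returns ([c, a_1, ..., a_p], residuals).
    """
    rows = [[1.0] + [series[t - k] for k in range(1, order + 1)]
            for t in range(order, len(series))]
    targets = series[order:]
    k = order + 1
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    coefs = solve_linear_system(xtx, xty)
    residuals = [y - sum(c * x for c, x in zip(coefs, r))
                 for r, y in zip(rows, targets)]
    return coefs, residuals
```

The residuals returned here are what the later diagnostic slides examine for serial correlation and ARCH effects.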

19 The Data  Correlogram – orthogonal
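
A correlogram reports the autocorrelation of the residuals at each lag together with a cumulative Q-statistic; residuals are called orthogonal when none of these is significant. The sketch below assumes the Ljung-Box form of the Q-statistic, Q = n(n + 2) Σ r_k² / (n − k); the function names are illustrative.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of a list of numbers at the given lag."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    num = sum((values[t] - mean) * (values[t - lag] - mean)
              for t in range(lag, n))
    return num / denom


def ljung_box_q(values, max_lag):
    """Ljung-Box Q-statistic over lags 1..max_lag."""
    n = len(values)
    return n * (n + 2) * sum(
        autocorrelation(values, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )
```

Under the null of no serial correlation, Q is compared against a chi-square distribution with degrees of freedom equal to the number of lags (reduced by the number of fitted ARMA terms when applied to model residuals).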

20 The Data  Histogram of the residuals – highly kurtotic and negatively skewed.

21 The Data  Serial correlation test – no serial correlation detected.

22 The Data  Correlogram of SQ residuals – shows some significance:

23 The Data  Test for Autoregressive Conditional Heteroskedasticity (ARCH) – positive for ARCH:
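
Engle's ARCH LM test regresses the squared residuals on their own lags and uses n · R² from that auxiliary regression as the test statistic. The sketch below covers only the one-lag case and is an illustration under that assumption; the slide does not state how many lags were used, and the function name is hypothetical.

```python
def arch_lm_statistic(residuals):
    """One-lag ARCH LM statistic: n * R^2 from regressing e_t^2 on e_{t-1}^2."""
    squared = [e * e for e in residuals]
    x = squared[:-1]   # lagged squared residuals
    y = squared[1:]    # current squared residuals
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # For a regression with one regressor and a constant, R^2 = corr(x, y)^2.
    r_squared = sxy * sxy / (sxx * syy)
    return n * r_squared
```

With one lag the statistic is compared against a chi-square distribution with 1 degree of freedom, whose 5% critical value is about 3.84. A calm stretch followed by a volatile stretch gives a statistic far above that threshold.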

24 The Data  Trace of the squared residuals – shows spikes meaning ARCH is present:

25 The Data  ARCH GARCH model used:
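
The slide does not state the order of the ARCH/GARCH specification. Assuming a GARCH(1,1) form purely for illustration, the conditional variance follows h_t = ω + α e²_{t−1} + β h_{t−1}, and the standardized residuals are e_t / √h_t. The parameter values in the test data are hypothetical, not the estimates from the presentation.

```python
import math


def garch_conditional_variance(residuals, omega, alpha, beta):
    """Conditional variances h_t for a GARCH(1,1) model (assumed order).

    The recursion starts from the unconditional variance
    omega / (1 - alpha - beta), which requires alpha + beta < 1.
    """
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be below 1 for a finite variance")
    variances = [omega / (1.0 - alpha - beta)]
    for t in range(1, len(residuals)):
        variances.append(
            omega + alpha * residuals[t - 1] ** 2 + beta * variances[t - 1]
        )
    return variances


def standardize(residuals, variances):
    """Divide each residual by its conditional standard deviation."""
    return [e / math.sqrt(h) for e, h in zip(residuals, variances)]
```

If the variance model is adequate, the squared standardized residuals should show no remaining autocorrelation, which is what the follow-up correlogram and ARCH test check.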

26 The Data  Correlogram of the residuals – now not orthogonal:

27 The Data  Correlogram of squared residuals – now orthogonal:

28 The Data  Histogram of ARCH GARCH residuals – far less kurtosis and skewness and closer to being normally distributed than before:

29 The Data  Test for ARCH is no longer significant:

30 The Data  Looking back at the OLS estimates and correlogram, there is a spike at lag 9 which could be significant so we added an MA(9) term to see if it would orthogonalize the correlogram in the ARCH GARCH model.

31 The Data

32  We still have highly significant Q-statistics, showing that the residuals are not orthogonal:

33 The Data  Still a positive test for ARCH:

34 The Data  ARCH GARCH model estimated:

35 The Data  Now the residuals are orthogonal at all visible lags:

36 The Data  Squared residual correlogram is also significant:

37 The Data  Histogram of the residuals – still single peaked, slightly skewed and kurtotic:

38 The Data  No longer a positive test for ARCH:

39 Correlogram of Standardized Residuals

40 Correlogram of Resid Squared

41 Garch Trace

42 Garch Histogram

43 Ordinary residuals

44 Standardized residuals  Lower Kurtosis

45 1 Forecast with 1 Year Time Saved  Good fit

46 95% Confidence Interval Included

47 Recolored Forecast With One Year Saved  Looks like a really good fit!

48 Few Months Ahead Forecast

49 Few months ahead forecast with 95% confidence interval included:

50 Recolored forecast with confidence interval included:

51 1 Year Ahead Forecast  Standard error becomes huge at the end of the time horizon
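
One reason the interval widens with the horizon is that forecast errors accumulate. The sketch below illustrates this for a plain AR(2) model with constant error variance, where the h-step error variance is σ² Σ ψ_j² over j < h and the 95% interval is the point forecast ± 1.96 standard errors. It ignores the MA and GARCH terms of the presentation's model, and all names and parameter values are hypothetical.

```python
import math


def ar2_forecast_intervals(history, const, a1, a2, sigma, horizon, z=1.96):
    """Multi-step AR(2) forecasts with approximate 95% intervals.

    Returns a list of (point, lower, upper) tuples for steps 1..horizon.
    """
    y_prev2, y_prev1 = history[-2], history[-1]
    psi_prev2, psi_prev1 = 0.0, 1.0  # psi_{-1} = 0, psi_0 = 1
    error_variance = 0.0
    intervals = []
    for _ in range(horizon):
        point = const + a1 * y_prev1 + a2 * y_prev2
        # Each extra step adds sigma^2 * psi_{h-1}^2 to the error variance.
        error_variance += (sigma * psi_prev1) ** 2
        se = math.sqrt(error_variance)
        intervals.append((point, point - z * se, point + z * se))
        # Feed the forecast back in and advance the psi-weight recursion.
        y_prev2, y_prev1 = y_prev1, point
        psi_prev2, psi_prev1 = psi_prev1, a1 * psi_prev1 + a2 * psi_prev2
    return intervals
```

In this simplified setting the standard error never shrinks as the horizon grows, so the band around a one-year-ahead forecast is always at least as wide as the band around a forecast a few weeks ahead.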

52 Forecast With Actual Data:

53 Forecast and data with 95% confidence interval:

54 Recolored forecast  Looks the same as the previous year but is actually slightly different:

55 The Google search data and actual flu cases  The trace of Google search data and actual cases:

56 The correlation matrix and Granger Test  Highly correlated to the actual flu cases  Both significant at 5% level in Granger Causality Test
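
A Granger causality test asks whether past values of one series improve the prediction of another beyond that series' own past. The sketch below is a minimal pure-Python illustration that compares a restricted regression (own lags only) with an unrestricted one (own lags plus lags of the other series) through an F-statistic. The lag length and all function names are assumptions; the slide does not state the lag length used.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols_rss(rows, targets):
    """Residual sum of squares from an OLS fit of targets on rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    coefs = solve_linear_system(xtx, xty)
    return sum((y - sum(c * x for c, x in zip(coefs, r))) ** 2
               for r, y in zip(rows, targets))


def granger_f_statistic(y, x, lags=1):
    """F-statistic for H0: lags of x do not help predict y."""
    restricted, unrestricted, targets = [], [], []
    for t in range(lags, len(y)):
        own = [y[t - k] for k in range(1, lags + 1)]
        other = [x[t - k] for k in range(1, lags + 1)]
        restricted.append([1.0] + own)
        unrestricted.append([1.0] + own + other)
        targets.append(y[t])
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    dof = len(targets) - (2 * lags + 1)
    return ((rss_r - rss_u) / lags) / (rss_u / dof)
```

Running the function in both directions, searches on cases and cases on searches, mirrors the two-way test reported on the slide; a large F-statistic rejects the null that the other series adds no predictive information.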

57 Vector Autoregression Model  Lab-confirmed cases Granger-cause Google searches – significant at lags 1, 3, 4, 6, 7, 8, 9  Google searches Granger-cause lab-confirmed cases – significant only at lag 2

58 The response graph

59 Conclusions  Model fits very well  The forecasting approach can be used for more than just the flu: it applies to any easily contracted medical ailment. –Could be especially useful in the coming months when H1N1 mutates and returns this coming Fall.

