Forecasting World Wide Pandemics Using Google Flu Data to Forecast the Flu Brian Abe Dan Helling Eric Howard Ting Zheng Laura Braeutigam Noelle Hirneise.

Presentation transcript:

Forecasting World Wide Pandemics
Using Google Flu Data to Forecast the Flu
Brian Abe, Dan Helling, Eric Howard, Ting Zheng, Laura Braeutigam, Noelle Hirneise
Group C

Preview of Coming Attractions
• Introduction
• The Data
• Our Best Model
• Verifying The Model
• Comparing Our Model with Real Data
• Conclusions

Background
• The Flu
  – General term for illness caused by several different viruses
  – Responsible for
    • Respiratory illness
    • Up to 500,000 deaths worldwide per year
  – The virus flourishes in people with weaker immune systems
    • Young
    • Elderly
    • Sick

Background
• Conventional methods for forecasting possible medical catastrophes
  – Step 1: Patient realizes they are sick
  – Step 2: Patient makes a medical appointment
  – Step 3: Patient goes to the appointment and is diagnosed
  – Step 4: Medical professional sends data to the CDC

The Future is Google
• Future methods for predicting local epidemics and worldwide pandemics lie in the hands of Google
  – Google Flu Trends
    • Weekly data, with collection starting on June 1, 2003
    • Data is a normalized aggregate of the number of searches for “flu” or similar queries in a given area
  – Source:

Google Flu Trends
• The idea behind the project was to predict pandemics and epidemics faster than conventional methods
• Early detection could lead to a lower rate of infection and fewer subsequent deaths
• Could save your life and your family's lives someday

Google Flu Trends
• The Future of Forecasting
  – Step 1: The sick realize they are sick
  – Step 2: Patient “Googles” their symptoms
  – Step 3: Data is aggregated and sent to the CDC

Pitfalls of the Data
• Not everyone has internet access
• Not everyone knows how to use the internet
• Not everyone uses Google (≈18%)
• New strains of the virus, such as H1N1, may not behave like former strains
  – This may or may not be an issue

Hypothesis
• Google data on the flu can be used to forecast future outbreaks of the flu

The Data
• Trace shows strong seasonality
• Notice the spike in 2003, caused by an increased number of searches during the bird flu scare

The Data
• Histogram of the data: clearly not normally distributed, with a very large Jarque-Bera statistic
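The Jarque-Bera statistic cited on this slide combines sample skewness S and kurtosis K as JB = n/6 · (S² + (K − 3)²/4). The following is a minimal sketch, assuming the series is available as a plain list of numbers; the function name `jarque_bera` is illustrative and does not come from the presentation.

```python
from math import fsum


def jarque_bera(values):
    """Return (skewness, kurtosis, JB statistic) for a sample."""
    n = len(values)
    mean = fsum(values) / n
    # Central moments with divisor n, as used in the JB definition.
    m2 = fsum((v - mean) ** 2 for v in values) / n
    m3 = fsum((v - mean) ** 3 for v in values) / n
    m4 = fsum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    return skewness, kurtosis, jb
```

Under normality the statistic is asymptotically chi-squared with 2 degrees of freedom, so values far above roughly 5.99 reject normality at the 5% level.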

Correlogram of the Data – looks like a possible AR(2) or AR(3)
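A correlogram plots sample autocorrelations against the lag. As a rough sketch of what sits behind such a plot, assuming the weekly series is a plain list of numbers (the helper name `sample_acf` is hypothetical):

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]
```

Autocorrelations outside about ±2/√n are commonly read as significant. Choosing an AR order, as the slide does, normally also relies on the partial autocorrelations, which this sketch does not compute.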

The Data
• Unit-root test: significant at the 1% level but not conclusive

Seasonal differencing was applied to make the data more stationary: SDUS = US - US(-52)
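With weekly data, US(-52) is the value of the series 52 weeks earlier, so the seasonal difference compares each week with the same week one year before. A minimal sketch, assuming the series is a list ordered by week; the function name is illustrative:

```python
def seasonal_difference(series, period=52):
    """Return series[t] - series[t - period] for every t with a full lag available."""
    if len(series) <= period:
        raise ValueError("series must be longer than the seasonal period")
    return [series[t] - series[t - period] for t in range(period, len(series))]
```

The differenced series is `period` observations shorter than the original, so one year of data is lost at the start of the sample.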

The Data
• Histogram of the seasonally differenced data: still not normal, but closer to normal, with less skewness and a single peak

The Data
• Correlogram of the seasonal difference: looks like an AR(2)

The Data
• Unit-root test: further evidence of stationarity

The Data
• First modeled using OLS, starting with AR(1) and AR(2) terms
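Fitting AR(1) and AR(2) terms by OLS amounts to regressing the series on its own first two lags. The sketch below assumes the model y_t = c + φ1·y_{t-1} + φ2·y_{t-2} + e_t and solves the normal equations directly; `solve_linear` and `fit_ar2` are hypothetical helper names, and the presentation's own software output is not reproduced here.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ar2(series):
    """OLS fit of y_t = c + phi1*y_{t-1} + phi2*y_{t-2} + e_t.

    Returns ([c, phi1, phi2], residuals). Assumes the regressors are not
    perfectly collinear, otherwise the elimination divides by zero.
    """
    rows = [[1.0, series[t - 1], series[t - 2]] for t in range(2, len(series))]
    targets = [series[t] for t in range(2, len(series))]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    coef = solve_linear(xtx, xty)
    residuals = [y - sum(c * v for c, v in zip(coef, r)) for r, y in zip(rows, targets)]
    return coef, residuals
```

The residuals returned here are the input to the diagnostic checks that follow: correlogram, histogram, and ARCH tests.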

The Data
• Correlogram: orthogonal
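Whether a residual correlogram is "orthogonal" is usually judged with a portmanteau Q-statistic. The sketch below computes the Ljung-Box form, Q = n(n+2) Σ r_k²/(n−k); the presentation does not say which Q-statistic its software reported, so this choice is an assumption, and the function name is illustrative.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q-statistic over lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total
```

A small Q (large p-value against a chi-squared reference distribution) is consistent with uncorrelated residuals; a large Q signals remaining serial correlation.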

The Data
• Histogram of the residuals: highly kurtotic and negatively skewed

The Data
• Serial correlation test: no serial correlation detected

The Data
• Correlogram of squared residuals: shows some significance

The Data
• Test for autoregressive conditional heteroskedasticity (ARCH): positive for ARCH
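An ARCH test of this kind is commonly run as Engle's LM test: regress the squared residuals on their own lags and compare T·R² with a chi-squared distribution. The sketch below uses a single lag, which is an assumption (the slide does not state the lag order), and the function name is hypothetical.

```python
def arch_lm_one_lag(residuals):
    """Engle's ARCH LM statistic with one lag: T * R^2 from regressing
    e_t^2 on a constant and e_{t-1}^2.

    Assumes the squared residuals are not constant, otherwise R^2 is undefined.
    """
    squared = [e * e for e in residuals]
    x = squared[:-1]  # lagged squared residuals
    y = squared[1:]   # current squared residuals
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # With one regressor, R^2 equals the squared correlation.
    r_squared = sxy * sxy / (sxx * syy)
    return n * r_squared
```

With one lag the reference distribution is chi-squared with 1 degree of freedom, so a statistic above about 3.84 indicates ARCH effects at the 5% level.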

The Data
• Trace of the squared residuals: shows spikes, meaning ARCH is present

The Data
• ARCH/GARCH model used
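The slide does not state the order of the ARCH/GARCH model. As an illustration only, the sketch below assumes a GARCH(1,1) variance equation, h_t = ω + α·e²_{t-1} + β·h_{t-1}, and shows how conditional variances and standardized residuals follow from given parameters. The parameter values in any call are placeholders, not the presentation's estimates, and estimation itself (normally by maximum likelihood) is not shown.

```python
from math import sqrt


def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variances h_t from a GARCH(1,1) recursion.

    The recursion is started at the sample mean of the squared residuals,
    which is one common but arbitrary initialization.
    """
    h = [sum(e * e for e in residuals) / len(residuals)]
    for t in range(1, len(residuals)):
        h.append(omega + alpha * residuals[t - 1] ** 2 + beta * h[t - 1])
    return h


def standardize(residuals, variances):
    """Divide each residual by its conditional standard deviation."""
    return [e / sqrt(h) for e, h in zip(residuals, variances)]
```

If the variance model is adequate, the standardized residuals and their squares should show no remaining autocorrelation, which is what the later correlogram slides check.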

The Data
• Correlogram of the residuals: now not orthogonal

The Data
• Correlogram of squared residuals: now orthogonal

The Data
• Histogram of ARCH/GARCH residuals: far less kurtosis and skewness, and closer to normally distributed than before

The Data
• Test for ARCH is no longer significant

The Data
• Looking back at the OLS estimates and correlogram, there is a spike at lag 9 that could be significant, so we added an MA(9) term to see whether it would orthogonalize the correlogram in the ARCH/GARCH model

The Data
• The Q-statistics still show orthogonal residuals

The Data
• Still a positive test for ARCH

The Data
• ARCH/GARCH model estimated

The Data
• Now the residuals are orthogonal at all visible lags

The Data
• Squared residual correlogram is also significant

The Data
• Histogram of the residuals: still single-peaked, slightly skewed and kurtotic

The Data
• No longer a positive test for ARCH

Correlogram of Standardized Residuals

Correlogram of Resid Squared

Garch Trace

Garch Histogram

Ordinary residuals

Standardized residuals
• Lower kurtosis

1 Forecast with 1 Year Time Saved
• Good fit

95% Confidence Interval Included

Recolored Forecast With One Year Saved
• Looks like a really good fit!

Few Months Ahead Forecast

Few months ahead forecast with 95% confidence interval included:

Recolored forecast with confidence interval included:

1 Year Ahead Forecast
• Standard error becomes huge at the end of the time horizon
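Forecast standard errors widen with the horizon because each additional step accumulates more shock variance. The sketch below shows this for a plain AR(2) with constant shock standard deviation σ, using the ψ-weights of the model. That is a simplifying assumption: the presentation's model also includes GARCH and MA terms, which this sketch ignores. The function name is illustrative.

```python
from math import sqrt


def ar2_forecast_std_errors(phi1, phi2, sigma, horizon):
    """Standard errors of 1..horizon step-ahead forecasts for an AR(2).

    Uses psi_0 = 1, psi_1 = phi1, psi_j = phi1*psi_{j-1} + phi2*psi_{j-2},
    and Var(h) = sigma^2 * sum_{j < h} psi_j^2.
    """
    psi = [1.0, phi1]
    while len(psi) < horizon:
        psi.append(phi1 * psi[-1] + phi2 * psi[-2])
    errors = []
    cumulative = 0.0
    for h in range(horizon):
        cumulative += psi[h] ** 2
        errors.append(sigma * sqrt(cumulative))
    return errors
```

An approximate 95% interval at each horizon is the point forecast plus or minus 1.96 times the corresponding standard error.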

Forecast With Actual Data:

Forecast and data with 95% confidence interval:

Recolored forecast
• Looks the same as the previous year but is actually slightly different

The Google search data and actual flu cases
• The trace of Google search data and actual cases

The correlation matrix and Granger Test
• Highly correlated with the actual flu cases
• Both significant at the 5% level in the Granger causality test
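A Granger causality test asks whether lags of one series improve the prediction of another beyond that series' own lags. The sketch below uses a single lag and an F-statistic comparing restricted and unrestricted OLS fits. The lag length, the helper names `ols_rss` and `granger_f_one_lag`, and the argument names are all assumptions for illustration; the presentation does not give its test settings.

```python
def ols_rss(rows, targets):
    """Residual sum of squares from an OLS fit solved via the normal equations."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):
        coef[r] = (m[r][k] - sum(m[r][c] * coef[c] for c in range(r + 1, k))) / m[r][r]
    return sum(
        (y - sum(b * v for b, v in zip(coef, row))) ** 2
        for row, y in zip(rows, targets)
    )


def granger_f_one_lag(effect, cause):
    """F-statistic for 'cause Granger-causes effect' with one lag of each series."""
    targets = effect[1:]
    restricted = [[1.0, effect[t - 1]] for t in range(1, len(effect))]
    unrestricted = [[1.0, effect[t - 1], cause[t - 1]] for t in range(1, len(effect))]
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    n = len(targets)
    # One restriction; n - 3 residual degrees of freedom in the unrestricted fit.
    return (rss_r - rss_u) / (rss_u / (n - 3))
```

Calling the function twice with the arguments swapped tests both directions, for example `granger_f_one_lag(cases, searches)` and `granger_f_one_lag(searches, cases)`, where both lists are hypothetical weekly series of equal length.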

Vector Autoregression Model
• Lab-confirmed cases Granger-cause Google searches: significant at lags 1, 3, 4, 6, 7, 8, 9
• Google searches Granger-cause lab-confirmed cases: significant only at lag 2

The response graph

Conclusions
• Model fits very well
• The forecasting approach can be used for more than just the flu: it applies to any medical ailment that is easily contracted
  – Could be especially useful in the coming months, when H1N1 mutates and returns this coming fall