Predicting the Present With Google Trends
Hyunyoung Choi, Hal Varian
June 2009
Google Confidential and Proprietary
Problem statement
Government agencies and other organizations produce monthly reports on economic activity: retail sales, house sales, automotive sales, unemployment.
Problems with these reports: compilation delay of several weeks; subsequent revisions; sample size may be small; not available at all geographic levels.
Google Trends releases daily and weekly indexes of search queries by industry vertical: real-time data; no revisions (but some sampling variation); large samples; available by country, state, and city.
Can Google Trends data help predict current economic activity, both before the release of preliminary statistics and before the release of the final revision?
[Figure: Categories in Google Trends by query shares. Note: growth comparison with the same time window.]
Real Estate
[Screenshot labels: Geography, Category, Time window]
Subcategories under Real Estate by query shares: Real Estate Agencies, Rental Listings & Referrals, Home Insurance, Home Inspections & Appraisal, Property Management, Home Financing
[Figure: Searches on Real Estate Agencies]
[Figure: Searches on Rental Listings & Referrals]
Depicting trends
Google Trends measures the normalized query share of a particular category of queries, which controls for overall growth.
It is often useful to look at year-on-year changes to eliminate seasonality.
Illustrate correlations and covariates.
Improving predictions
Forecast a time series using its own lagged values, then add Trends data as a predictor.
Questions to ask: Statistical significance? Improved fit? Improved forecasts? Ability to identify turning points?
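The year-on-year comparison mentioned on this slide can be sketched in a few lines. This is only an illustration: the function name `year_on_year_change` and the monthly index values are made up for the example and do not come from the presentation.

```python
def year_on_year_change(series, period=12):
    """Percentage change of each value versus the value `period` steps earlier."""
    changes = []
    for t in range(period, len(series)):
        previous = series[t - period]
        changes.append(100.0 * (series[t] - previous) / previous)
    return changes


# Hypothetical monthly index: the second year repeats the seasonal shape of
# the first year, scaled up by 10%.
year_one = [80, 85, 100, 120, 130, 125, 115, 110, 100, 90, 75, 70]
year_two = [value * 1.1 for value in year_one]

yoy = year_on_year_change(year_one + year_two)
print([round(change, 1) for change in yoy])
```

The raw series swings strongly from month to month, while the year-on-year series is flat at the underlying growth rate, which is why the comparison removes a stable seasonal pattern.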
[Figure: yr Mortgage Rate vs. Home Financing]
Forecasting primer
Basic forecasting models:
Autoregressive: the value at time t depends on the value at time t-1.
Seasonal adjustment: the value at time t depends on the value at time t-12 (for monthly data).
Transfer function: the value at time t depends on other contemporaneous or lagging variables.
Seasonal autoregressive transfer model: the value at time t depends on the value at time t-12 (seasonality), the value at time t-1 (recent behavior), and other lagging or contemporaneous variables (such as Google Trends data).
Typical question of interest: how much more accurate are forecasts that use additional variables, over and above the accuracy obtained from the history of the time series itself?
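A minimal sketch of the seasonal autoregressive transfer idea follows, assuming ordinary least squares on synthetic data. The data-generating coefficients, the helper names (`solve`, `ols`, `fit`), and the stand-in search index `x` are all assumptions made for illustration; the presentation does not specify an implementation. The sketch fits the model once with only the lags at t-1 and t-12, and once with the extra contemporaneous variable, then compares the in-sample fit.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, targets):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def fit(y, x=None):
    """Regress y[t] on 1, y[t-1], y[t-12] and, if given, x[t].

    Returns the coefficients and the in-sample sum of squared errors.
    """
    rows, targets = [], []
    for t in range(12, len(y)):
        row = [1.0, y[t - 1], y[t - 12]]
        if x is not None:
            row.append(x[t])
        rows.append(row)
        targets.append(y[t])
    coefs = ols(rows, targets)
    sse = sum((target - sum(c * v for c, v in zip(coefs, row))) ** 2
              for row, target in zip(rows, targets))
    return coefs, sse


# Synthetic monthly data (assumed, not from the presentation): y depends on
# its own lags at 1 and 12 months and on a contemporaneous index x.
rng = random.Random(0)
n = 120
x = [rng.uniform(0.0, 10.0) for _ in range(n)]
y = [100.0 + rng.gauss(0.0, 1.0) for _ in range(12)]
for t in range(12, n):
    y.append(10.0 + 0.5 * y[t - 1] + 0.3 * y[t - 12] + 2.0 * x[t]
             + rng.gauss(0.0, 1.0))

baseline_coefs, baseline_sse = fit(y)
trends_coefs, trends_sse = fit(y, x)
print("SSE without search index:", round(baseline_sse, 1))
print("SSE with search index:   ", round(trends_sse, 1))
print("Estimated search coefficient:", round(trends_coefs[3], 2))
```

In-sample fit can only improve when a regressor is added, so the out-of-sample comparison asked for on the slide is the more telling test.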
New Home Sales Model
Time series: recent trend (New Home Sales at t-1); seasonality (New Home Sales at t-12).
Google Trends: recent search activity on Real Estate Agencies, Rental Listings & Referrals, Home Inspections & Appraisal, Property Management, Home Insurance, Home Financing.
Exogenous variables: housing affordability (average/median home price).
Predicting the present
New Residential Sales from US Census: monthly release 24–28 days after the month; seasonally adjusted; national and regional aggregates.
Google Trends Real Estate by category: Home Inspections & Appraisal, Home Insurance, Home Financing, Property Management, Rental Listings & Referrals, Real Estate Agencies.
[Figure: New House Sales vs. Real Estate Google Trends]
Analysis and Forecasting
Model: Y_t = a * Y_{t-1} - b * us378.1 + c * us96.2 - d * AvgP_{t-1}, where a, b, c, and d stand for the estimated coefficients (values not shown)
Y_t: new houses sold in month t
AvgP_{t-1}: average sales price of new one-family houses sold in month t-1
us378.1: Google Trends index for vertical id = 378 (Rental Listings & Referrals) in the 1st week of month t
us96.2: Google Trends index for vertical id = 96 (Real Estate Agencies) in the 2nd week of month t
July 2008: Actual = 515K, Predicted = K, Z-score = 2.53
August 2008: Prediction = K
Analysis and Forecasting: Observations
Since 2005, new house sales have been decreasing, with little seasonality.
Google Trends captures seasonality and recent trends.
New house sales show a positive association with Real Estate Agencies (96) and a negative association with Rental Listings & Referrals (378) and with average price.
Travel
Subcategories under Travel by query shares: Hotels & Accommodations, Attractions & Activities, Air Travel, Bus & Rail, Cruises & Charters, Adventure Travel, Car Rental & Taxi Services, Vacation Destinations
Travel to Hong Kong
Visitors Arrival Statistics from Hong Kong Tourism Board: monthly summaries released with a 1-month lag; reports country/territory of residence of visitors; Data available
Google Trends Travel by category: Hotels & Accommodations, Air Travel, Car Rental & Taxi Services, Cruises & Charters, Attractions & Activities, Vacation Destinations (Australia, Caribbean Islands, Hawaii, Hong Kong, Las Vegas, Mexico, New York City, Orlando), Adventure Travel, Bus & Rail
[Figure: Visitors Arrival Statistics vs. Google Trends]
Analysis and Forecasting
Model: log(Y_{i,t}) = b_1 * log(Y_{i,t-1}) + b_2 * log(Y_{i,t-12}) + b_3 * X_{i,t,j} + b_4 * X_{i,t,k} + b_5 * FXrate_{i,t} + η_i + e_{i,t}
e_{i,t} ~ N(0, σ_e^2), η_i ~ N(0, σ_η^2)
Here b_1 to b_5 stand for the estimated coefficients (values not shown), and j and k index two of the weekly search terms defined below.
Y_{i,t} = arrivals to Hong Kong in month t from the i-th country
X_{i,t,1} = Google Trends search index in the 1st week of month t from the i-th country
X_{i,t,2} = Google Trends search index in the 2nd week of month t from the i-th country
X_{i,t,3} = Google Trends search index in the 3rd week of month t from the i-th country
FXrate_{i,t} = Hong Kong dollars per one unit of the i-th country's local currency in month t. The average FX rate over the first week is used as a proxy for each month's FX rate.
[Figure: Visitor Arrival Statistics, Actual & Fitted]
Analysis and Forecasting: Conclusion
Arrivals at time t are positively associated with arrivals at time t-1 and at time t-12; the series shows strong seasonality and autocorrelation.
Arrivals at time t are positively associated with searches on [Hong Kong].
Arrivals at time t are positively associated with FX rates: when the local currency appreciates relative to the Hong Kong dollar, visitors to Hong Kong increase.
Automobiles
US Auto Sales by Make
US auto sales by make: monthly summaries released 1 week after the end of the month; data available by car sales, truck sales, and total sales for each make. Data available from Source: Automotive News Data Center
Google Trends under the Vehicle Brands category: weekly search query index; 31 verticals in total in this subcategory; 27 verticals match the available monthly sales.
[Figure: Google Categories under Vehicle Brands. Note: area represents query volume in the first half of 2008; color represents the yearly growth rate of queries.]
[Figure: Auto Sales by Make (Top 9 Makes by Sales), Monthly Sales vs. Google Trends at Second Week of each month]
Analysis and Forecasting
Fixed effects model: log(Y_{i,t}) = b_1 * log(Y_{i,t-1}) + b_2 * log(Y_{i,t-12}) + b_3 * X_{i,t,1} + b_4 * X_{i,t,2} + a_i * Make_i + e_{i,t}
e_{i,t} ~ N(0, σ^2), Adjusted R^2 =
Here b_1 to b_4 stand for the estimated coefficients (values not shown).
Y_{i,t} = auto sales of the i-th make in month t
X_{i,t,1} = Google Trends search index in the 1st week of month t for the i-th make
X_{i,t,2} = Google Trends search index in the 2nd week of month t for the i-th make
Make_i = dummy variable for auto make
a_i = coefficient capturing the mean level of auto sales by make
ANOVA table (columns: Df, Sum Sq, Mean Sq, F value, Pr(>F)):
trends            < 2e-16 ***
trends
log(s1)           < 2e-16 ***
log(s12)          < 2e-16 ***
as.factor(brand)  < 2e-16 ***
Residuals
[Figure: Actual vs. Fitted Sales (Top 9 Makes by Sales)]
Analysis and Forecasting: Conclusion
Sales at time t are positively associated with sales at time t-1 and at time t-12; sales show strong seasonality and autocorrelation.
Monthly sales are positively correlated with search volume in the first and second weeks of each month.
If search volume increases by 1%, sales volume increases by 0.19% on average.
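As a rough reading aid for the 0.19% figure, the sketch below treats 0.19 as a constant elasticity, that is, it assumes a log-log relationship between search volume and sales. That reading is an assumption made for illustration; the slide does not say how the figure was derived from the fitted coefficients, and the function name `sales_change_pct` is hypothetical.

```python
def sales_change_pct(search_change_pct, elasticity=0.19):
    """Percentage change in sales implied by a constant-elasticity (log-log) reading.

    The default elasticity of 0.19 is the figure quoted on the slide;
    interpreting it as a constant elasticity is an assumption.
    """
    ratio = (1.0 + search_change_pct / 100.0) ** elasticity
    return 100.0 * (ratio - 1.0)


# A 1% rise in search volume gives roughly a 0.19% rise in sales.
print(round(sales_change_pct(1.0), 3))
```

Under this reading the effect is not linear for large changes: a 10% rise in searches implies somewhat less than ten times the effect of a 1% rise.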
Unemployment
YoY Growth in Initial Claims & Google Search: According to the NBER, the current recession started in December 2007. The national unemployment rate passed 5% in mid 2008, and search queries on [Welfare and Unemployment] increased at the same time.
Initial claims are an important leading indicator
Google Trends data [Search Insights screenshot]
[Figure: Initial Claims and Google Trends]
Strong Autocorrelation in Initial Claims [Figures: time series; autocorrelation function]
Initial Claims Before/After Recession Started [Figures: California; New York]
Time Window for Analysis [Figure labels: window for long-term model; window for short-term model; recession starts]
Model
Reference model: ARIMA(0,1,1) x (1,0,0)_12
Model with Google Trends: ARIMA(0,1,1) x (1,0,0)_12 with Google Trends
Model fit improved significantly: smaller standard deviation, higher log likelihood, and smaller AIC.
Initial claims are positively correlated with searches on Jobs and Welfare.
Signif. codes: *** 0.05 ** 0.01 *
Long Term Model: Prediction Comparison with MAE
With Google Trends, the out-of-sample prediction MAE decreases by 16.84%.
Prediction with rolling window from 1/11/2009 to 4/12/2009
Prediction Error at t:
Mean Absolute Error:
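The rolling out-of-sample comparison can be sketched as follows. The slide's own formulas for the prediction error and the MAE are not preserved, so this sketch assumes the plain absolute error in levels. The two forecasters (`last_value`, `mean_of_last_two`) and the short series are hypothetical stand-ins for the reference model and the model with Google Trends.

```python
def mean_absolute_error(actual, predicted):
    """Average of |predicted - actual| over all forecast dates."""
    errors = [abs(p - a) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


def mae_reduction_pct(mae_baseline, mae_candidate):
    """Percentage by which the candidate's MAE is lower than the baseline's."""
    return 100.0 * (mae_baseline - mae_candidate) / mae_baseline


def rolling_forecasts(series, forecaster, first_target):
    """One-step-ahead forecasts: each series[t] is predicted from series[:t] only."""
    return [forecaster(series[:t]) for t in range(first_target, len(series))]


def last_value(history):
    """Hypothetical baseline: repeat the most recent observation."""
    return history[-1]


def mean_of_last_two(history):
    """Hypothetical candidate: average the two most recent observations."""
    return (history[-1] + history[-2]) / 2


series = [10, 12, 11, 13, 12, 14, 13, 15]  # made-up data
first_target = 4
actual = series[first_target:]

mae_base = mean_absolute_error(actual, rolling_forecasts(series, last_value, first_target))
mae_cand = mean_absolute_error(actual, rolling_forecasts(series, mean_of_last_two, first_target))
print(mae_base, mae_cand, mae_reduction_pct(mae_base, mae_cand))
```

Each forecast uses only data available before the target date, which is what makes the comparison out-of-sample rather than a measure of in-sample fit.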
Short Term Model: Prediction Comparison with MAE
With Google Trends, the out-of-sample prediction MAE decreases by 19.23%.
Prediction errors are within the same range as for the LT Model.
The fit improvement is larger with the ST Model.
Summary
Google Trends significantly improves out-of-sample prediction of state unemployment, up to 18 days in advance of the data release.
Mean absolute error for out-of-sample predictions declines by 16.84% for the LT Model and 19.23% for the ST Model.
Further work: examine metro-level data; other local data (real estate); combine with other predictors; detect turning points?