Presentation on theme: "Predicting the Present"— Presentation transcript:
1 Predicting the Present With Google Trends
Hyunyoung Choi, Hal Varian
June 2009
2 Problem statement
Government agencies and other organizations produce monthly reports on economic activity: Retail Sales, House Sales, Automotive Sales, Unemployment.
Problems with reports: compilation delay of several weeks; subsequent revisions; sample size may be small; not available at all geographic levels.
Google Trends releases daily and weekly indexes of search queries by industry vertical: real-time data; no revisions (but some sampling variation); large samples; available by country, state and city.
Can Google Trends data help predict current economic activity? Before release of preliminary statistics; before release of final revision.
3 Categories in Google Trends by Query Shares
Clearings / Sectors: There are 30 top-level categories with 274 subcategories beneath them; in all there are 4 layers of categories with 753 verticals.
Context: This market map shows the top 2 layers of categories. Area represents relative query volume and color represents the relative growth rate over the year. The map shows that Entertainment categories have high search volume, which has been growing faster than in other sectors such as real estate and travel. For all 753 verticals we can see how search volume changes over time; depending on the sector of interest, this data could give some insight into customer behavior.
Transition: Let's look at the example of real estate. As we all know, the real estate sector hasn't been doing well this year, and you can see that from the color of that sector.
Note: Queries from to & Growth Comparison w/ the same time window
6 Subcategories under Real Estate by Query Shares
Real Estate Agencies; Rental Listings & Referrals; Home Insurance; Home Inspections & Appraisal; Property Management; Home Financing.
9 Improving predictions
Depicting trends: Google Trends measures the normalized query share of a particular category of queries, which controls for overall growth. It is often useful to look at year-on-year changes to eliminate seasonality. Illustrate correlations and covariates.
Improving predictions: Forecast a time series using its own lagged values and add Trends data as a predictor. Statistical significance? Improved fit? Improved forecasts? Identify turning points?
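The year-on-year idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `yoy_log_change`, the use of log differences, and the synthetic monthly index are all assumptions made for the example.

```python
import math

def yoy_log_change(series, period=12):
    """Return log(x_t) - log(x_{t-period}) for every t >= period."""
    return [math.log(series[t]) - math.log(series[t - period])
            for t in range(period, len(series))]

# Hypothetical monthly index: a fixed seasonal pattern that grows 10% per year.
seasonal = [1.0, 0.9, 1.1, 1.3, 1.4, 1.2, 1.0, 0.8, 0.9, 1.1, 1.5, 1.6]
index = [100 * seasonal[m % 12] * 1.10 ** (m // 12) for m in range(36)]

changes = yoy_log_change(index)

# The seasonal factor cancels in each ratio, leaving only the annual growth.
print(round(changes[0], 4), round(changes[-1], 4))
```

Because each month is compared with the same month one year earlier, the seasonal factor drops out and only the underlying growth remains.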
11 Forecasting primer
Basic forecasting models
Autoregressive: value at time t depends on the value at time t-1.
Seasonal adjustment: value at time t depends on the value at time t-12 (for monthly data).
Transfer function: value at time t depends on other contemporaneous or lagging variables.
Seasonal autoregressive transfer model: value at time t depends on the value at time t-12 (seasonality), the value at time t-1 (recent behavior), and other lagging or contemporaneous variables (such as Google Trends data).
Typical question of interest: How much more accurate are forecasts that use additional variables, over and above the accuracy you get from the history of the time series itself?
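The "typical question of interest" can be illustrated with a small sketch: fit the seasonal autoregressive model with and without an extra contemporaneous predictor and compare the residual sum of squares. Everything here is an assumption for illustration: the data are simulated, the coefficients (2.0, 0.5, 0.3, 1.5) are made up, and ordinary least squares via the normal equations stands in for whatever estimation the authors used.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def rss(X, y, beta):
    """Residual sum of squares of the fitted linear model."""
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(X, y))

# Simulated monthly series (hypothetical): y depends on its own values at
# t-1 and t-12 and on a contemporaneous "trends" index.
rng = random.Random(0)
n = 120
trends = [rng.gauss(0, 1) for _ in range(n)]
y = [10 + rng.gauss(0, 1) for _ in range(12)]
for t in range(12, n):
    y.append(2.0 + 0.5 * y[t - 1] + 0.3 * y[t - 12]
             + 1.5 * trends[t] + rng.gauss(0, 0.2))

rows = range(12, n)
target = [y[t] for t in rows]
X_base = [[1.0, y[t - 1], y[t - 12]] for t in rows]          # history only
X_full = [row + [trends[t]] for row, t in zip(X_base, rows)]  # plus predictor

beta_base = ols(X_base, target)
beta_full = ols(X_full, target)
rss_base = rss(X_base, target, beta_base)
rss_full = rss(X_full, target, beta_full)
print("RSS, history only:", round(rss_base, 2))
print("RSS, with predictor:", round(rss_full, 2))
```

In-sample fit can only improve when a regressor is added, so a real evaluation would compare out-of-sample forecasts, as the later slides do with MAE.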
12 Model: New Home Sales
Time series: recent trend (New Home Sales at t-1); seasonality (New Home Sales at t-12).
Exogenous variables: housing affordability (Average/Median Home Price).
Google Trends: recent search activity on Real Estate Agencies, Rental Listings & Referrals, Home Inspections & Appraisal, Property Management, Home Insurance, Home Financing.
13 Predicting the present
New Residential Sales from US Census: monthly release 24 to 28 days after the month; seasonally adjusted; national and regional aggregate.
Google Trends Real Estate by Category: Home Inspections & Appraisal; Home Insurance; Home Financing; Property Management; Rental Listings & Referrals; Real Estate Agencies.
14 New House Sales vs. Real Estate Google Trends
(Note: get new pics.) Plot: Mortgage rate vs. Home financing
15 Analysis and Forecasting
Model: $Y_t = \beta_1 Y_{t-1} - \beta_2\,\mathrm{us378.1}_t + \beta_3\,\mathrm{us96.2}_t - \beta_4\,\mathrm{AvgP}_{t-1}$, where the $\beta_k$ stand for the estimated coefficients.
$Y_t$: new houses sold in month t.
$\mathrm{AvgP}_{t-1}$: average sales price of new one-family houses sold in month t-1.
$\mathrm{us378.1}_t$: Google Trends index of vertical id = 378 (Rental Listings & Referrals) in the 1st week of month t.
$\mathrm{us96.2}_t$: Google Trends index of vertical id = 96 (Real Estate Agent) in the 2nd week of month t.
July 2008: Actual = 515K, Predicted = K, Z-score = 2.53
August 2008: Prediction = K
16 Analysis and Forecasting
Observations: Since 2005 new house sales have been decreasing, with little seasonality. Google Trends captures seasonality and recent trends. Positive association with Real Estate Agencies (96). Negative association with Rental Listings & Referrals (378) and Average Price.
18 Subcategories under Travel by Query Shares
Hotels & Accommodations; Attractions & Activities; Air Travel; Bus & Rail; Cruises & Charters; Adventure Travel; Car Rental & Taxi Services; Vacation Destinations.
19 Travel to Hong Kong
Hong Kong Visitors Arrival Statistics from Hong Kong Tourism Board: monthly summaries released with a 1-month lag; reports country/territory of residence of visitors; Data available
Google Trends Travel by Category: Hotels & Accommodations; Air Travel; Car Rental & Taxi Services; Cruises & Charters; Attractions & Activities; Vacation Destinations (Australia, Caribbean Islands, Hawaii, Hong Kong, Las Vegas, Mexico, New York City, Orlando); Adventure Travel; Bus & Rail.
20 Visitors Arrival Statistics vs. Google Trends
21 Analysis and Forecasting
Model: $\log(Y_{i,t}) = \beta_1 \log(Y_{i,t-1}) + \beta_2 \log(Y_{i,t-12}) + \beta_3 X_{i,t,1} + \beta_4 X_{i,t,2} + \beta_5 X_{i,t,3} + \beta_6\,\mathrm{FXrate}_{i,t} + \eta_i + e_{i,t}$, with $e_{i,t} \sim N(0, \sigma_e^2)$ and $\eta_i \sim N(0, \sigma_\eta^2)$, where the $\beta_k$ stand for the estimated coefficients.
$Y_{i,t}$ = arrivals to Hong Kong in month t from the i-th country.
$X_{i,t,1}$ = Google Trends search index in the 1st week of month t from the i-th country.
$X_{i,t,2}$ = Google Trends search index in the 2nd week of month t from the i-th country.
$X_{i,t,3}$ = Google Trends search index in the 3rd week of month t from the i-th country.
$\mathrm{FXrate}_{i,t}$ = Hong Kong Dollars per one unit of the i-th country's local currency in month t. The average of the first week's FX rate is used as a proxy for each month's FX rate.
22 Visitor Arrival Statistics - Actual & Fitted
23 Analysis and Forecasting
Conclusion: Arrival at time t is positively associated with arrival at time t-1 and arrival at time t-12; it shows strong seasonality and autocorrelation. Arrival at time t is positively associated with searches on [Hong Kong]. Arrival at time t is positively associated with FX rates: when the local currency appreciates relative to the Hong Kong Dollar, visitors to Hong Kong increase.
25 US Auto Sales by Make and Google Trends under Vehicle Brands Category
US Auto Sales by Make: monthly summaries released 1 week after end of month; data available by Car Sales, Truck Sales and Total Sales for each make; Data available from; Source: Automotive News Data Center.
Google Trends under Vehicle Brands Category: Google Trends subcategory Vehicle Brands; weekly search query index; 31 verticals in total in this subcategory; 27 verticals match makes with monthly sales available.
26 Google Categories under Vehicle Brands
NOTE: Area represents the query volume from the first half of 2008 and the color represents the yearly growth rate of queries.
27 Auto Sales by Make (Top 9 Makes by Sales): Monthly Sales vs. Google Trends at Second Week of Each Month
28 Analysis and Forecasting
Fixed effects model: $\log(Y_{i,t}) = \beta_1 \log(Y_{i,t-1}) + \beta_2 \log(Y_{i,t-12}) + \beta_3 X_{i,t,1} + \beta_4 X_{i,t,2} + a_i\,\mathrm{Make}_i + e_{i,t}$, with $e_{i,t} \sim N(0, \sigma^2)$, where the $\beta_k$ stand for the estimated coefficients. Adjusted R2 =
$Y_{i,t}$ = auto sales of the i-th make in month t.
$X_{i,t,1}$ = Google Trends search index in the 1st week of month t for the i-th make.
$X_{i,t,2}$ = Google Trends search index in the 2nd week of month t for the i-th make.
$\mathrm{Make}_i$ = dummy variable for auto make.
$a_i$ = coefficient capturing the mean level of auto sales by make.
ANOVA table (columns: Df, Sum Sq, Mean Sq, F value, Pr(>F)):
trends              Pr(>F) < 2e-16 ***
trends
log(s1)             Pr(>F) < 2e-16 ***
log(s12)            Pr(>F) < 2e-16 ***
as.factor(brand)    Pr(>F) < 2e-16 ***
Residuals
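The role of the per-make dummy term can be sketched as follows. This is a simplified illustration, not the slide's full model: it uses a single regressor instead of the lagged and weekly terms, simulated data, and the within (group-demeaning) estimator, which gives the same slope as regressing on one dummy per make. The names `within_estimator`, `MakeA`, `MakeB`, `MakeC` and all numeric values are hypothetical.

```python
import random
from collections import defaultdict

def within_estimator(groups, x, y):
    """Fixed-effects fit of y = a_g + slope * x with one intercept per group.

    Demeaning x and y within each group removes the group intercepts, so the
    slope is the ratio of within-group covariation to within-group variation.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for g, xi, yi in zip(groups, x, y):
        totals[g][0] += xi
        totals[g][1] += yi
        totals[g][2] += 1
    means = {g: (sx / n, sy / n) for g, (sx, sy, n) in totals.items()}
    num = den = 0.0
    for g, xi, yi in zip(groups, x, y):
        mx, my = means[g]
        num += (xi - mx) * (yi - my)
        den += (xi - mx) ** 2
    slope = num / den
    # Recover each group's intercept (its mean level) from the group means.
    effects = {g: my - slope * mx for g, (mx, my) in means.items()}
    return slope, effects

# Simulated panel (hypothetical): three makes with different mean log sales.
rng = random.Random(1)
true_level = {"MakeA": 12.0, "MakeB": 11.0, "MakeC": 10.0}
true_slope = 0.2
groups, x, y = [], [], []
for make, level in true_level.items():
    for _ in range(36):
        xi = rng.gauss(level - 10.0, 1.0)  # search index, higher for big makes
        groups.append(make)
        x.append(xi)
        y.append(level + true_slope * xi + rng.gauss(0, 0.05))

slope, effects = within_estimator(groups, x, y)
print("slope:", round(slope, 3))
print("make effects:", {g: round(a, 2) for g, a in effects.items()})
```

Letting each make have its own intercept keeps differences in average sales levels between makes from being attributed to the search index.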
29 Actual vs. Fitted Sales (Top 9 Makes by Sales)
30 Analysis and Forecasting
Conclusion: Sales at time t are positively associated with sales at time t-1 and sales at time t-12; sales show strong seasonality and autocorrelation. Monthly sales are positively correlated with the search volume in the first and second weeks of each month. If search volume increases by 1%, sales volume increases by 0.19% on average.
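The 1% to 0.19% statement can be read as an elasticity. The sketch below assumes a constant-elasticity (log-log) relationship between search volume and sales, which the slides do not state explicitly; the function name `pct_change_in_sales` is hypothetical.

```python
def pct_change_in_sales(search_pct, elasticity=0.19):
    """Percent change in sales implied by a percent change in search volume,
    assuming sales are proportional to search_volume ** elasticity."""
    return ((1 + search_pct / 100) ** elasticity - 1) * 100

small = pct_change_in_sales(1.0)    # 1% more searches
large = pct_change_in_sales(10.0)   # 10% more searches
print(round(small, 3), round(large, 3))
```

For small changes the result is close to elasticity times the percent change; for larger changes the compounding makes it slightly smaller than that linear approximation.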
32 YoY Growth in Initial Claims & Google Search
According to the NBER, the current recession started in December 2007. The national unemployment rate passed 5% in mid-2008, and search queries on [Welfare and Unemployment] increased at the same time.
33 Initial claims are an important leading indicator
36 Strong Autocorrelation in Initial Claims
Time Series; Autocorrelation Function
37 Initial Claims Before/After Recession Started
California; New York
38 Time Window for Analysis
Recession Starts; Window For Long Term Model; Window For Short Term Model
39 Model
Reference: ARIMA(0,1,1) X (1,0,0)12 Model
ARIMA(0,1,1) X (1,0,0)12 Model With Google Trends
Signif. codes: ‘***’ 0.05 ‘**’ 0.01 ‘*’
Model fit improved significantly: smaller standard deviation, higher log likelihood and smaller AIC.
Initial Claims are positively correlated with searches on Jobs and Welfare.
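For readers unfamiliar with the notation, the ARIMA(0,1,1) X (1,0,0)12 specification can be written out with the backshift operator $B$. The symbols below ($\theta$, $\Phi$, $\beta$, $x_t$, $n_t$, $\varepsilon_t$) are generic textbook notation chosen for this sketch, not the authors' own, and the regression-with-ARIMA-errors form for the Google Trends variant is an assumption about how the predictors enter.

```latex
% Reference model: one regular difference, one MA(1) term,
% one seasonal AR term at lag 12.
(1 - \Phi B^{12})(1 - B)\, y_t = (1 + \theta B)\, \varepsilon_t,
\qquad \varepsilon_t \sim N(0, \sigma^2)

% With Google Trends: the same error structure around a regression on
% the search indexes x_t (for example the Jobs and Welfare categories).
y_t = \beta^{\top} x_t + n_t,
\qquad
(1 - \Phi B^{12})(1 - B)\, n_t = (1 + \theta B)\, \varepsilon_t
```

Setting $\beta = 0$ in the second form recovers the reference model, so the two are nested and their log likelihood and AIC can be compared directly.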
40 Long Term Model: Prediction Comparison with MAE
With Google Trends, the out-of-sample prediction MAE decreases by 16.84%.
Prediction uses a rolling window from 1/11/2009 to 4/12/2009.
Prediction Error at t:
Mean Absolute Error:
41 Short Term Model: Prediction Comparison with MAE
With Google Trends, the out-of-sample prediction MAE decreases by 19.23%.
Prediction errors are within the same range as for the LT Model.
The improvement in fit is larger with the ST Model.
42 Summary
Google Trends significantly improves out-of-sample prediction of state unemployment, up to 18 days in advance of data release.
Mean absolute error for out-of-sample predictions declines by 16.84% for the LT Model and 19.23% for the ST Model.
Further work: can examine metro-level data; other local data (real estate); combine with other predictors; detect turning points?