Presentation on theme: "Lecture 9: Pricing with a “Double ML” approach to Causal Inference"— Presentation transcript:

1 Lecture 9: Pricing with a “Double ML” approach to Causal Inference

2 Broad Agenda
Review the economics of price setting and the importance of learning consumer price sensitivity. Learning = causal inference.
Review the econometric challenges of causal inference in demand problems.
Quick review of machine learning concepts.
Combine machine learning + econometrics into the “Double ML” method of causal inference.

3 Economics of Price Setting

4 Context: Pricing is Really Important
Worldwide retail sales were $24 trillion in 2015 (~$5 trillion in the US).
Gross margins for retailers are often high (20-35%).
However, after factoring in fixed/employee costs, net margins are often very low (1-3%).
Better pricing to sell just a few more units or eke out a higher gross margin can be tremendously important to the retailer’s bottom line.

5 Economic Value to the Customer (EVC)
The maximum price a customer would be willing to pay, assuming she is fully informed about the product’s benefits as compared to the closest competitor’s product and price.
Goal: Generate an accurate value proposition.

6 The inefficiency of a single price
EVC is traced out by the demand curve.
Customers “in the DWL triangle” could have been profitably served (EVC > MC), but are “priced out” of the market.
To serve these customers with a single price, the firm has to discount all units to this price. It would lose more money on the intensive margin than it makes up by serving more customers.

7 Perfect Price Discrimination
Theoretical standard: charge everyone their willingness to pay, provided this exceeds costs.
In theory, perfectly efficient and results in much greater producer surplus. In practice, impossible to achieve. View this as a benchmark.
Quantity discounts (e.g., shoes: buy one, get one free) let consumers self-select and help solve the arbitrage problem.

8 Customer segmentation as price discrimination
Goal: whenever possible, charge customer segments a price that reflects their segment’s willingness to pay.
Method: Measure price sensitivity in each group. Set prices/markups accordingly.
Q: What does this mean? A: Let’s do some math. Price sensitivity can be converted into a markup formula. This is often used to justify a “constant markup” policy, but it doesn’t justify that, since elasticity changes from market to market.
In markets with less elastic demand, price higher. This is known as price discrimination.

9 Profit Maximization Review
Let p(q) be the price needed to sell q units and c(q) be the cost of producing q units.
Firm profits (choose q to maximize this):
π = p(q)·q − c(q)
First-order condition:
dπ/dq = p(q) + p′(q)·q − c′(q) = 0
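The first-order condition can be sanity-checked numerically. A minimal sketch, assuming a hypothetical linear demand p(q) = a − b·q and constant marginal cost c (all numbers are illustrative, not from the lecture):

```python
# Check the FOC p(q) + p'(q)*q = c'(q) under linear demand p(q) = a - b*q
# and cost c(q) = c*q (hypothetical, illustrative numbers).

def profit(q, a=100.0, b=2.0, c=20.0):
    """pi(q) = p(q)*q - c(q)."""
    return (a - b * q) * q - c * q

# FOC: (a - b*q) + (-b)*q - c = 0  =>  q* = (a - c) / (2*b)
a, b, c = 100.0, 2.0, 20.0
q_star = (a - c) / (2 * b)

# The FOC quantity should beat any nearby quantity.
assert all(profit(q_star) >= profit(q_star + d) for d in (-1.0, -0.1, 0.1, 1.0))
print(q_star, profit(q_star))  # 20.0 800.0
```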

10 Inverse Elasticity Rule
Profit Max (MR = MC):
p(q_m) + p′(q_m)·q_m = c′(q_m)
GM fraction (“markup”, AKA the price-cost margin or Lerner index) = 1 over elasticity:
(p − c′(q_m)) / p = −q_m·p′(q_m) / p ≡ 1/ε
More elastic = more price sensitive → accept a lower GM fraction = lower price.
Less elastic = less price sensitive → demand a higher GM fraction = higher price.
Major goal of pricing: measure elasticities and use price discrimination strategies to inversely correlate them with markups.
Note: at the optimum, necessarily ε > 1. The markup formula is widely abused as a pricing strategy.
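The rule can be rearranged into a price formula, p = c′ / (1 − 1/ε). A tiny sketch (illustrative numbers; `optimal_price` is our own helper, not from the lecture):

```python
def optimal_price(mc, eps):
    """Price implied by the inverse elasticity rule (p - mc)/p = 1/eps.

    eps is the absolute value of the demand elasticity; the rule only
    makes sense for elastic demand (eps > 1)."""
    assert eps > 1, "optimal monopoly pricing requires elastic demand"
    return mc / (1 - 1 / eps)

# Less elastic demand -> higher GM fraction -> higher price.
assert optimal_price(10.0, 2.0) > optimal_price(10.0, 5.0)
print(optimal_price(10.0, 2.0), optimal_price(10.0, 5.0))  # 20.0 12.5
```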

11 Extension to Multiple Products
Simple extension to the multi-product case, but we must also measure cross-price elasticities.
If a product has mostly substitute (complementary) relationships with other products, its optimal price will be higher (lower) than in the one-good case.

12 Extension to Multiple Products
Matrix notation:
L: vector of GM fractions for each product.
E: matrix of own- and cross-price elasticities, where E_{i,j} = % change in demand of product i in response to a 1% change in price of product j.
Profit-optimal solution: choose prices such that L* = −E⁻¹·1, where 1 is a vector of ones.
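The profit-optimal solution is one line of linear algebra. A sketch with a made-up two-product elasticity matrix:

```python
import numpy as np

# Hypothetical elasticity matrix E: own-price on the diagonal,
# cross-price off the diagonal (positive = substitutes).
E = np.array([[-2.0, 0.4],
              [0.3, -1.5]])

# Profit-optimal GM fractions: L* = -E^{-1} @ 1
L_star = -np.linalg.solve(E, np.ones(2))

# With substitute products, the optimal margins exceed the one-good
# inverse-elasticity rule 1/|E_ii|, as the previous slide claims.
assert L_star[0] > 1 / 2.0 and L_star[1] > 1 / 1.5
print(L_star.round(3))
```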

13 Anatomy of a Sale
MSFT puts the Surface Pro 4 on sale:
More of that model are sold, at a lower price.
More peripherals (Surface Pen, Surface Dock) are sold.
Fewer Surface Books are sold.
Fewer MacBook Pros are sold.
Some people planning to buy later are induced to buy now; future sales at full margin are cannibalized.
All these effects are captured in the optimal pricing formula!

14 Pricing Goal: Measure Elasticities (causal effects) and then optimize

15 Review Retail net margins are narrow. Better pricing can dramatically improve business valuation. Pricing is all about segmentation and measurement. Often better segmentation is impossible. Once you have measured all causal impacts, optimal pricing is just math.

16 Causal Inference in Demand Systems

17 Pricing Goal: Measure everything
Measuring demand elasticities is a causal inference task. Gold standard of causal inference: A/B testing. May be difficult to convince retailers to run experiments. What can we do with observational data?

18 Orange Juice Data
Office of the Chief Economist
Three different orange juice brands. Each dot corresponds to price and sales for one particular week.
[Figure: scatter of weekly log price vs. log sales for each of the three brands.]

19 Orange Juice Data: Econometric Counterfactual
Simple econometric model:
log(sales_{i,t}) = α_i + ε_i · log(price_{i,t}) + μ_{i,t}
The implied counterfactual demand curves are fixed from week to week: a simple econometric model implies that you have assumed a simple counterfactual.
Is this counterfactual right? Maybe not:
Seasonal trends.
Products could wax/wane in popularity.
If these factors are also related to pricing policy, they will cause omitted variable bias.

20 Omitted Variable Bias
Suppose the world looks like:
y = β·x + γ·z + ε
But we run a regression of y onto x, ignoring z. We estimate:
β̂ → β + γ · Cov(x, z) / Var(x)
Key takeaway: the bias is proportional to the impact of z on outcomes AND the correlation between x and z.
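The omitted variable bias logic is easy to verify by simulation. A minimal sketch (all coefficients invented for illustration):

```python
import numpy as np

# True model: y = beta*x + gamma*z + noise, with x correlated with z.
rng = np.random.default_rng(0)
n, beta, gamma = 100_000, -2.0, 3.0
z = rng.normal(size=n)                      # the omitted variable
x = 0.5 * z + rng.normal(size=n)            # "price" moves with z
y = beta * x + gamma * z + rng.normal(size=n)

# Short regression of y on x alone: slope = Cov(x, y) / Var(x).
xc, yc = x - x.mean(), y - y.mean()
b_short = (xc @ yc) / (xc @ xc)

# Bias formula: plim b_short = beta + gamma * Cov(x, z) / Var(x)
#             = -2 + 3 * 0.5 / 1.25 = -0.8, far from the true -2.
assert abs(b_short - (-0.8)) < 0.05
assert abs(b_short - beta) > 1.0
print(round(b_short, 3))
```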

21 Econometric relationship
We observe a correlation between sales and price. We want to infer the causal relationship.
Lower Prices (X) → More Sales (Y)

22 Christmas is bad for econometrics
Christmas (Z) → Lower Prices (X); Christmas (Z) → More Sales (Y)

23 Christmas is bad for econometrics
Solution: Add Christmas to the regression.
log(sales_{i,t}) = α_i + ε_i · log(price_{i,t}) + γ · 1{Christmas_t} + μ_{i,t}

24 Changes in product popularity are bad for econometrics
Declining Popularity → Lower Prices; Declining Popularity → Fewer Sales

25 Changes in product popularity are bad for econometrics
Solution: Add lagged sales to the regression.
log(sales_{i,t}) = α_i + ε_i · log(price_{i,t}) + γ · log(sales_{i,t−1}) + μ_{i,t}

26 Marketing Campaigns are bad for econometrics
Marketing (Z) → Lower Prices; Marketing (Z) → More Sales

27 Marketing Campaigns are bad for econometrics
Solution: Add a variable for marketing to the regression.
log(sales_{i,t}) = α_i + ε_i · log(price_{i,t}) + γ · Marketing_{i,t} + μ_{i,t}

28 Anything could be bad for econometrics
??? → Lower Prices; ??? → More Sales
Solution: Add “everything” to the model??

29 Adding “everything” to a regression model is infeasible.
Economists spend a lot of time hand-coding regression equations to match their intuitions about causality.
What we want: a data-driven solution for “specifying” demand regressions.

30 Omitted Variable Bias What variables do we “need” to control for if we want to learn causal effects well? A: Anything correlated with sales and correlated with price (recall the formula for omitted variable bias). Safer Answer: Anything correlated with sales or correlated with price.

31 Estimating with Machine Learning

32 The intersection is brand new
Econometrics: hand-picked models that exploit natural experiments to mimic A/B tests.
Machine Learning: automated model selection, embrace of high-dimensional data, focus on prediction.
This is fundamentally new. Two fields are colliding, and the potential is enormous. We are not talking about machine learners naively fitting models to economic data, or about economists using simple toy ML, but rather a full intersection where economic theory (and economists) frame systems of tasks that can be solved with state-of-the-art ML. That is Economic AI: domain experts combining ML tasks to solve complex problems.
Harness the predictive power of ML to assist in causal inference.

33 What does this mean?
Machine learning is good at selecting from many features to predict something.
Econometrics is good at carefully measuring effects that we care about.
Solution: split demand estimation into
(1) pure prediction steps, where machine learning algorithms can be used to optimally control for high-dimensional confounds, and
(2) measurement steps, where elasticities are estimated in an unbiased way on the leftover variation.

34 Partially Linear Model for Treatment Effect of Price (Chernozhukov et al. 2016)
Our strategy: estimate a “partially linear” model:
Q_{i,t} = ε_i · P_{i,t} + g(X_{i,t}) + μ,  E[μ | P, X] = 0
P_{i,t} = f(X_{i,t}) + ζ,  E[ζ | X] = 0
Q_{i,t}: log sales of product i in period t (outcome).
P_{i,t}: log price of product i in period t (treatment).
ε_i: elasticity of demand for product i, the treatment effect we wish to estimate.
X_{i,t}: baseline features that predict prices or sales (potentially high-dimensional).
g, f: unknown predictive functions we don’t really care about.
Step 1: Use modern ML techniques to estimate g and f non-parametrically as functions of X.
Step 2: Compute residuals: P̃_{i,t} = P_{i,t} − f̂(X_{i,t}) and Q̃_{i,t} = Q_{i,t} − ĝ(X_{i,t}).
Step 3: Run a simple regression of Q̃ onto P̃.
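The three steps can be sketched end-to-end on simulated data. This is a sketch, not the lecture’s production code: the data-generating process, the gradient-boosting learners, and the use of out-of-fold (“cross-fitted”) predictions are our own illustrative choices (cross-fitting is recommended in Chernozhukov et al. to avoid overfit residuals):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict

# Simulated partially linear model with a known elasticity of -2.
rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 5))
f_X = np.sin(X[:, 0]) + X[:, 1]            # pricing rule depends on X
g_X = 2.0 * np.sin(X[:, 0]) + X[:, 1]      # baseline demand depends on X too
eps = -2.0                                  # true treatment effect
P = f_X + 0.5 * rng.normal(size=n)          # log price
Q = eps * P + g_X + rng.normal(size=n)      # log sales

# Steps 1-2: predict P and Q from X; residualize with out-of-fold predictions.
P_res = P - cross_val_predict(GradientBoostingRegressor(random_state=0), X, P, cv=5)
Q_res = Q - cross_val_predict(GradientBoostingRegressor(random_state=0), X, Q, cv=5)

# Step 3: residual-on-residual regression.
eps_hat = (P_res @ Q_res) / (P_res @ P_res)

# Naive regression of Q on P is badly biased by the shared dependence on X.
Pc, Qc = P - P.mean(), Q - Q.mean()
eps_naive = (Pc @ Qc) / (Pc @ Pc)

assert abs(eps_hat - eps) < 0.5
assert abs(eps_hat - eps) < abs(eps_naive - eps)
print(round(eps_naive, 2), round(eps_hat, 2))
```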

35 Why does this work?
P̃_{i,t} and Q̃_{i,t} are the extent to which prices and sales were “surprises” for product i in period t. If our ML models are good, these “surprises” are true idiosyncratic variation (similar to experiments) and are not contaminated by any other variables.


37 Does this approach always work?
Not always. The key assumption is E[μ | P, X] = 0. What does this mean?

38 Does this approach really work?
Yes, but the key assumption is E[μ | P, X] = 0. What does this mean?
Diagram: Observable Characteristics (X) → Prices (P) and Sales (Q); Prices (P) → Sales (Q).

39 Does this approach really work?
Yes, but the key assumption is E[μ | P, X] = 0. What does this mean?
Diagram: Observable Characteristics (X) → Prices (P) and Sales (Q); Other Shocks (μ) → Sales (Q) only, not Prices (P); Prices (P) → Sales (Q).

40 What does it mean in English?
Assumption: price setters don’t act on anything that allows them to predict sales and that isn’t in the data.
If retailers give random discounts: this analysis will work.
If retailers give discounts more often in certain markets, countries, or SKUs: this analysis will work.
If retailers give discounts after a period of low sales: this analysis will still work.
If retailers give discounts because they anticipate that a product is about to become less popular: this analysis will not work.
Implication: if you can “featurize” all the possible confounds (sources of omitted variable bias) into your X matrix, you can do unbiased inference.

41 Mechanics of Double ML In Pricing

42 Pricing Engine Step 1: Predicting Sales
Use “baseline demand” features to predict sales for product i in period t0+1. This is just a prediction problem and can use any ML algorithm that maximizes predictive power out of sample.
Key predictive features:
Lagged sales information available up to time t0.
Lagged price information available up to time t0.
Indicator variables for each site, product, and different months/weeks of the year (X_{i,t}).
Arbitrary interactions and transformations of these variables.
Important: do NOT (yet) use any information on the price offered in period t0+1.
Q_{i,t} = ĝ(X_{i,t}) + Q̃_{i,t}
Q̃_{i,t} are “residual sales”: the extent to which sales were “different from our expectation”.
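A minimal sketch of the feature build for this step, using pandas (the column names and numbers are hypothetical; only lagged information and identifiers enter X, never the period-t0+1 price):

```python
import pandas as pd

# Toy weekly panel (made-up numbers).
df = pd.DataFrame({
    "product":   ["a", "a", "a", "b", "b", "b"],
    "week":      [1, 2, 3, 1, 2, 3],
    "log_sales": [2.0, 2.2, 1.9, 3.0, 3.1, 2.8],
    "log_price": [1.0, 0.9, 1.1, 1.5, 1.4, 1.6],
})

# Lagged sales/prices: information available up to the prior period only.
df = df.sort_values(["product", "week"])
df["lag_log_sales"] = df.groupby("product")["log_sales"].shift(1)
df["lag_log_price"] = df.groupby("product")["log_price"].shift(1)

# Indicator variables for product (site/month dummies would be built the same way).
X = pd.concat(
    [df[["lag_log_sales", "lag_log_price"]], pd.get_dummies(df["product"])],
    axis=1,
)
# Note: the contemporaneous log_price column is deliberately NOT in X.
print(X)
```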

43 Predicting Sales Visualization
Q_{i,t} = ĝ(X_{i,t}) + Q̃_{i,t}
Q̃_{i,t} are “residual sales” that could not be explained by the machine learning model.
[Figure: sales and price time series for t0−4 … t0+1; sales at t0+1 is the outcome, lagged sales and prices are the predictors (X), and the price at t0+1 is ignored.]

44 Pricing Engine Step 2: Predicting Prices
Use “baseline demand” features to predict prices in period t0+1.
Key predictive features:
Lagged sales information available up to time t0.
Lagged prices available up to time t0.
Indicator variables for each site, product, and different months/weeks of the year (X_{i,t}).
Arbitrary interactions and transformations of these variables.
Important: do NOT (yet) use any information on the price offered in period t0+1.
P_{i,t} = f̂(X_{i,t}) + P̃_{i,t}
P̃_{i,t} are “residual prices”: the extent to which pricing was “different from our expectation”.

45 Predicting Prices Visualization
P_{i,t} = f̂(X_{i,t}) + P̃_{i,t}
P̃_{i,t} are “residual prices” that could not be explained by this regression.
[Figure: sales and price time series for t0−4 … t0+1; the price at t0+1 is the outcome, lagged sales and prices are the predictors (X), and sales at t0+1 are ignored.]

46 Step 3: Measuring Price Sensitivity
P̃_{i,t} tells us how the pricing policy in period t differed from the model’s expectations. Q̃_{i,t} tells us how sales outcomes in period t differed from the model’s expectations.
Run a simple “residual on residual” regression to learn the immediate causal impact of changing price in an unbiased way:
Q̃_{i,t} = ε · P̃_{i,t} + μ̃_{i,t}   (pooled), or
Q̃_{i,t} = Σ_{i∈Prod} ε_i · P̃_{i,t} · 1{Product i} + μ̃_{i,t}   (heterogeneous)
ε is the average elasticity across all products (completely un-penalized). ε_i is the heterogeneous elasticity for product i.
Item hierarchies combined with additional ML (ridge regression, forests, etc.) can be used to learn these more efficiently if there are “many products”.
μ̃_{i,t} is the leftover variation in sales that is still unexplained.
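The heterogeneous version is just OLS of residual sales on product-specific residual prices. A sketch with fake residuals (in practice P̃ and Q̃ come from Steps 1 and 2; all numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
n_prod = 3
true_eps = np.array([-1.5, -2.5, -3.0])     # per-product elasticities (made up)
prod = rng.integers(0, n_prod, size=6000)   # product id for each observation
P_res = rng.normal(size=prod.size)          # residual log prices (Step 2 output)
Q_res = true_eps[prod] * P_res + rng.normal(size=prod.size)

# Design matrix: residual price interacted with product dummies,
# i.e. columns P_res * 1{product = i}.
D = (np.arange(n_prod) == prod[:, None]) * P_res[:, None]
eps_hat, *_ = np.linalg.lstsq(D, Q_res, rcond=None)

assert np.allclose(eps_hat, true_eps, atol=0.1)
print(eps_hat.round(2))
```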

47 Thank You!

