Consumer Behavior Prediction using Parametric and Nonparametric Methods Elena Eneva Carnegie Mellon University 25 November 2002

Recent Research Projects
–Dimensionality Reduction Methods and Fractal Dimension (with Christos Faloutsos)
–Learning to Change Taxonomies (with Valery Petrushin, Accenture Technology Labs)
–Text Re-Classification Using Existing Schemas (with Yiming Yang)
–Learning Within-Sentence Semantic Coherence (with Roni Rosenfeld)
–Automatic Document Summarization (with John Lafferty)
–Consumer Behavior Prediction (with Alan Montgomery [Business School] and Rich Caruana [SCS])

Outline
–Introduction & Motivation
–Dataset
–Baseline Models
–New Hybrid Models
–Results
–Summary & Work in Progress

How to increase profits?
–Without raising the overall price level?
–Without more advertising?
–Without attracting new customers?

A: Better Pricing Strategies
Encourage the demand for products which are most profitable for the store
Recent trend to consolidate independent stores into chains
Pricing doesn’t take into account the variability of demand due to neighborhood differences

A: Micro-Marketing
Pricing strategies should adapt to the neighborhood demand
The basis: the difference in interbrand competition in different stores
Stores can increase operating profit margins by 33% to 83% [Montgomery 1997]

Understanding Demand
Need to understand the relationship between the prices of products in a category and the demand for these products
–Price Elasticity of Demand

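Price Elasticity
The consumer’s response to a price change:
$$E = \frac{\partial Q / Q}{\partial P / P} = \frac{\partial \ln Q}{\partial \ln P}$$
where Q is the quantity purchased and P is the price of the product. Demand is elastic when |E| > 1 and inelastic when |E| < 1.
[Figure: inelastic vs. elastic demand curves]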
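
Because elasticity is a ratio of relative changes, it can be read off as a slope in log space. The following is a minimal sketch that assumes a hypothetical constant-elasticity demand curve Q = 100·P^-2.5, which is not taken from the dataset. The helper names `log_elasticity` and `classify` are illustrative only.

```python
import math


def log_elasticity(p1: float, q1: float, p2: float, q2: float) -> float:
    """Price elasticity as the slope between two (P, Q) points in log space."""
    return (math.log(q2) - math.log(q1)) / (math.log(p2) - math.log(p1))


def classify(elasticity: float) -> str:
    """|E| > 1: quantity changes more than proportionally to price (elastic)."""
    return "elastic" if abs(elasticity) > 1 else "inelastic"


def demand(price: float) -> float:
    """Hypothetical constant-elasticity demand curve Q = 100 * P**-2.5."""
    return 100 * price ** -2.5


e = log_elasticity(2.0, demand(2.0), 2.2, demand(2.2))
print(round(e, 6), classify(e))  # -2.5 elastic
```

For a constant-elasticity curve, the log-space slope is the same between any two prices. Real scanner data is noisier, so a regression over many weeks replaces the two-point formula.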

Prices and Quantities
The quantity demanded of a specific product is a function of the prices of all the products in its category
This function is different for every store and for every category

The Function
Category: Price of Product 1, Price of Product 2, Price of Product 3, ..., Price of Product N
→ “I know your customers” Predictor →
Quantity bought of Product 1, Quantity bought of Product 2, Quantity bought of Product 3, ..., Quantity bought of Product N
Need to multiply this across many stores, many categories.

How to find this function? Traditionally – using parametric models (linear regression)

Data Example

Data Example – Log Space

The Function
Category: Price of Product 1, Price of Product 2, Price of Product 3, ..., Price of Product N (convert to ln space)
→ “I know your customers” Predictor →
Quantity bought of Product 1, Quantity bought of Product 2, Quantity bought of Product 3, ..., Quantity bought of Product N (convert to original space)
Need to multiply this across many stores, many categories.

How to find this function? Traditionally – using parametric models (linear regression) Recently – using non-parametric models (neural networks)

Our Goal
Advantage of LR: known functional form (linear in log space), extrapolation ability
Advantage of NN: flexibility, accuracy
[Figure: robustness vs. accuracy, positioning LR, NN, and the new models]
Take advantage of both: use the known functional form to bias the NN
Build hybrid models from the baseline models

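Evaluation Measure
Root Mean Squared Error (RMS): the typical deviation between the true quantity $q_t$ and the predicted quantity $\hat q_t$ over n observations
$$\mathrm{RMS} = \sqrt{\frac{1}{n}\sum_{t=1}^{n} \left(q_t - \hat q_t\right)^2}$$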
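
The RMS measure is a short computation. The sketch below shows it on made-up numbers. The function name `rms_error` is hypothetical.

```python
import math
from typing import Sequence


def rms_error(true_q: Sequence[float], pred_q: Sequence[float]) -> float:
    """Root mean squared error between true and predicted quantities."""
    if len(true_q) != len(pred_q) or not true_q:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(t - p) ** 2 for t, p in zip(true_q, pred_q)]
    return math.sqrt(sum(squared) / len(squared))


print(rms_error([10, 12, 8], [11, 12, 6]))  # sqrt(5/3) ≈ 1.291
```

Squaring the errors before averaging means that one badly mispredicted week counts more than several small misses.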

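Error Measure – Unbiased Model
The models predict $\widehat{\ln q}$ in log space. $e^{\widehat{\ln q}}$ is a biased estimator for q, and we correct the bias by computing the integral over the distribution of the log-space error $\varepsilon$:
$$\hat q = \int e^{\widehat{\ln q} + \varepsilon}\, f(\varepsilon)\, d\varepsilon = e^{\widehat{\ln q}}\; E\!\left[e^{\varepsilon}\right],$$
which is an unbiased estimator for q.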
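
Correcting the bias of back-transforming a log-space prediction requires an estimate of E[e^ε] over the error distribution. Because exp is convex, E[e^ε] ≥ 1, so the naive back-transform underestimates q on average. The following is a minimal sketch of two common ways to estimate the factor. These are named here as substitutes and are not confirmed as the author's method:

- averaging e^r over observed log-space residuals r (Duan's smearing estimator);
- the closed form e^{σ²/2}, which assumes normal residuals.

The residual values and function names are hypothetical.

```python
import math
import statistics
from typing import Sequence


def smearing_factor(log_residuals: Sequence[float]) -> float:
    """Estimate E[exp(eps)] by averaging exp() over observed log-space residuals."""
    return sum(math.exp(r) for r in log_residuals) / len(log_residuals)


def lognormal_factor(log_residuals: Sequence[float]) -> float:
    """E[exp(eps)] = exp(sigma^2 / 2), valid only if eps ~ N(0, sigma^2)."""
    return math.exp(statistics.pvariance(log_residuals) / 2)


def to_original_space(log_q_hat: float, factor: float) -> float:
    """Bias-corrected back-transform: exp(predicted ln q) * E[exp(eps)]."""
    return math.exp(log_q_hat) * factor


residuals = [-0.3, -0.1, 0.0, 0.1, 0.3]  # hypothetical validation residuals in log space
naive = math.exp(2.0)
print(round(naive, 3), round(to_original_space(2.0, smearing_factor(residuals)), 3))
```

The smearing version makes no distributional assumption. This may matter for NN residuals, which need not be normal.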

Dataset
Store-level cash register data at the product level for 100 stores
Store prices updated every week
Two years of transactions
Chilled Orange Juice category (12 products)

Models
Hybrids
–Smart Prior
–MultiTask Learning
–Jumping Connections
–Frozen Jumping Connections
Baselines
–Linear Regression
–Neural Networks

Baselines
–Linear Regression
–Neural Networks

Linear Regression
$$\ln q = a + \sum_{i=1}^{K} b_i \ln p_i + \varepsilon$$
q is the quantity demanded, $p_i$ is the price of the i-th product, and there are K products overall. The coefficients a and $b_i$ are chosen so that the sum of the squared residuals is as small as possible.
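
The log-linear baseline ln q = a + Σ b_i ln p_i is ordinary least squares on logged data. The sketch below solves the normal equations in plain Python for a hypothetical 3-product category. The coefficients are invented and not estimated from the orange juice data. The helper names `solve`, `fit_log_linear` and `predict_log_q` are illustrative.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_log_linear(prices, quantities):
    """Least squares fit of ln q = a + sum_i b_i ln p_i; returns [a, b_1, ..., b_K]."""
    X = [[1.0] + [math.log(p) for p in row] for row in prices]
    y = [math.log(q) for q in quantities]
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def predict_log_q(coef, price_row):
    return coef[0] + sum(b * math.log(p) for b, p in zip(coef[1:], price_row))


# Hypothetical demand for product 1: own-price elasticity -2.0, cross-price +0.5 and +0.3
random.seed(0)
true = [4.0, -2.0, 0.5, 0.3]
prices = [[random.uniform(1.5, 3.5) for _ in range(3)] for _ in range(200)]
qty = [math.exp(predict_log_q(true, row)) for row in prices]  # noise-free for illustration
coef = fit_log_linear(prices, qty)
print([round(c, 3) for c in coef])  # [4.0, -2.0, 0.5, 0.3]
```

Each b_i is directly an elasticity: the own-price term is negative, and positive cross-price terms mark substitutes. This is the "known functional form" the hybrids try to pass on to the NN.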

Results – RMS Error
[Figure: RMS error chart]

Neural Networks
Generic nonlinear function approximators
Collection of basic units (neurons), computing a (non)linear function of their input
Random initialization
Backpropagation
Early stopping to prevent overfitting

Neural Networks
1 hidden layer, 100 units, sigmoid activation function
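
The slides specify one sigmoid hidden layer with 100 units, random initialization, backpropagation and early stopping. The sketch below follows that outline in plain Python. The following details are assumptions, not from the slides:

- uniform initialization in [-0.1, 0.1];
- per-example SGD on squared error;
- a linear output unit;
- a held-out validation set;
- a patience of 10 epochs.

The class and function names are hypothetical.

```python
import copy
import math
import random
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (inputs, target), e.g. (log prices, log quantity)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class OneHiddenLayerNet:
    """Regression net: one sigmoid hidden layer, one linear output unit."""

    def __init__(self, n_in: int, n_hidden: int = 100, seed: int = 0):
        rng = random.Random(seed)  # random initialization of all weights
        self.W1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.W1, self.b1)]
        return h, self.b2 + sum(w * hj for w, hj in zip(self.w2, h))

    def predict(self, x: Sequence[float]) -> float:
        return self.forward(x)[1]

    def sgd_step(self, x: Sequence[float], y: float, lr: float) -> None:
        """One backpropagation step on the squared error 0.5 * (out - y)**2."""
        h, out = self.forward(x)
        err = out - y
        for j, hj in enumerate(h):
            grad_pre = err * self.w2[j] * hj * (1.0 - hj)  # uses w2[j] before its update
            self.w2[j] -= lr * err * hj
            self.b1[j] -= lr * grad_pre
            self.W1[j] = [w - lr * grad_pre * xi for w, xi in zip(self.W1[j], x)]
        self.b2 -= lr * err


def mse(net: OneHiddenLayerNet, data: List[Example]) -> float:
    return sum((net.predict(x) - y) ** 2 for x, y in data) / len(data)


def train_early_stopping(net, train, val, lr=0.05, max_epochs=200, patience=10) -> float:
    """Run SGD epochs, keep the weights with the lowest validation error, stop when it stalls."""
    best, best_state, stale = float("inf"), None, 0
    for _ in range(max_epochs):
        for x, y in train:
            net.sgd_step(x, y, lr)
        val_err = mse(net, val)
        if val_err < best:
            best, stale = val_err, 0
            best_state = copy.deepcopy((net.W1, net.b1, net.w2, net.b2))
        else:
            stale += 1
            if stale >= patience:
                break
    net.W1, net.b1, net.w2, net.b2 = best_state  # restore the best-validation weights
    return best


# Hypothetical log-space data; 10 hidden units instead of 100 to keep the demo fast
rng = random.Random(1)
data = []
for _ in range(60):
    x = [rng.uniform(0.4, 1.2), rng.uniform(0.4, 1.2)]
    data.append((x, 1.0 - 2.0 * x[0] + 0.5 * x[1]))
net = OneHiddenLayerNet(n_in=2, n_hidden=10)
before = mse(net, data[40:])
after = train_early_stopping(net, data[:40], data[40:])
print(before, after)
```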

Results
[Figure: RMS error chart]

Hybrid Models
–Smart Prior
–MultiTask Learning
–Jumping Connections
–Frozen Jumping Connections

Smart Prior
Idea: Initialize the NN with a “good” set of weights; help it start from a “smart” prior
Start the search in a state which already gives a linear approximation
NN training in 2 stages
–First, on synthetic data (generated by the LR model)
–Second, on the real data
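
The two-stage Smart Prior schedule has two parts: generate LR-labelled synthetic data, then train on it before training on the real data. The sketch below shows this, assuming two things:

- synthetic prices are drawn uniformly within each product's observed price range;
- the NN trainer is a caller-supplied function `train(model, data)`.

Both points are assumptions, and every name here is hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[List[float], float]  # (log prices, log quantity)


def lr_log_quantity(coef: Sequence[float], log_prices: Sequence[float]) -> float:
    """Fitted LR baseline in log space: a + sum_i b_i * ln p_i."""
    return coef[0] + sum(b * lp for b, lp in zip(coef[1:], log_prices))


def synthetic_lr_data(coef, price_ranges, n, seed=0) -> List[Example]:
    """Stage-1 data: prices drawn uniformly within each product's range, labels from the LR model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        log_prices = [math.log(rng.uniform(lo, hi)) for lo, hi in price_ranges]
        data.append((log_prices, lr_log_quantity(coef, log_prices)))
    return data


def smart_prior_training(model, train: Callable, coef, price_ranges, real_data, n_synthetic=5000):
    """Fit the LR surface first, then continue from those weights on real data."""
    train(model, synthetic_lr_data(coef, price_ranges, n_synthetic))  # stage 1: synthetic LR data
    train(model, real_data)                                            # stage 2: real data
    return model
```

Stage 2 starts from weights that already reproduce the linear approximation. The real data therefore only needs to teach the network where demand departs from it.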

Smart Prior
[Figure: diagram labeled LR]

Results
[Figure: RMS error chart]

MultiTask Learning
Idea: learn an additional related task in parallel, using a shared representation
Add the output of the LR model (built over the same inputs) as an extra output of the NN
Make the NN share its hidden nodes between both tasks [Caruana 1997]

MultiTask Learning
–Custom halting function
–Custom RMS function
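
The MTL setup needs three pieces: targets that carry the LR prediction as a second output, an RMS that scores only the real quantity, and a halting rule driven by that score. The sketch below assumes a two-output network whose `predict(x)` returns (main, auxiliary), an equal auxiliary loss weight by default, and a patience-based halting rule. All of these and all names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def mtl_targets(data, lr_predict: Callable[[Sequence[float]], float]):
    """Attach the LR model's prediction (same inputs) as an auxiliary second target."""
    return [(x, (y, lr_predict(x))) for x, y in data]


def mtl_loss(pred: Tuple[float, float], target: Tuple[float, float], aux_weight: float = 1.0) -> float:
    """Training loss over both outputs, so the shared hidden layer gets gradient from each."""
    return (pred[0] - target[0]) ** 2 + aux_weight * (pred[1] - target[1]) ** 2


def main_task_rms(predict: Callable, data) -> float:
    """Custom RMS: score only output 0 (the real quantity); the LR output is ignored."""
    return math.sqrt(sum((predict(x)[0] - t[0]) ** 2 for x, t in data) / len(data))


def should_halt(val_history: List[float], patience: int = 10) -> bool:
    """Custom halting: stop when main-task validation RMS has not improved for `patience` checks."""
    if len(val_history) <= patience:
        return False
    best_before = min(val_history[:-patience])
    return min(val_history[-patience:]) >= best_before
```

The auxiliary output matters only during training. It acts as a hint toward the linear structure, and it is never used for evaluation or stopping.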

Results
[Figure: RMS error chart]

Jumping Connections
Idea: fusing LR and NN
Modify the architecture of the NN
Add connections which “jump” over the hidden layer
Gives the effect of simulating an LR and an NN together
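
With jumping connections, the output is the sum of a nonlinear path through the hidden layer and a linear path from the inputs directly to the output. The sketch below shows the forward pass under that reading. The parameter names are hypothetical. With the hidden-to-output weights at zero, the network reduces exactly to the log-linear LR model.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def jumping_forward(x: Sequence[float], W1, b1, w2, v, b2) -> float:
    """Output = nonlinear path through the hidden layer + linear path that jumps over it.

    x: log prices; W1, b1: input->hidden; w2: hidden->output; v: input->output (jumping); b2: output bias.
    """
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    nonlinear = sum(w * h for w, h in zip(w2, hidden))
    linear = sum(vi * xi for vi, xi in zip(v, x))  # same form as LR: b2 + sum_i v_i ln p_i
    return b2 + linear + nonlinear


x = [math.log(2.0), math.log(3.0)]
W1, b1 = [[0.3, -0.2], [0.1, 0.4]], [0.0, 0.1]
print(jumping_forward(x, W1, b1, w2=[0.0, 0.0], v=[-2.0, 0.5], b2=4.0))  # pure LR path
print(jumping_forward(x, W1, b1, w2=[0.2, -0.1], v=[-2.0, 0.5], b2=4.0))  # LR + nonlinear correction
```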

Jumping Connections

Results
[Figure: RMS error chart]

Frozen Jumping Connections
Idea: show the model what the “jump” is for
Same architecture as Jumping Connections, but two training stages
Freeze the weights of the jumping layer, so the network can’t “forget” about the linearity
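
One plausible reading of the two training stages is sketched below. This reading is an assumption, not a confirmed detail:

- Stage 1 trains only the jumping weights and the output bias, with the hidden path switched off. This amounts to fitting the LR model by gradient descent.
- Stage 2 freezes the jumping weights and trains the hidden layer on what is left.

Learning rate, initialization, epoch counts and all names are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class JumpNet:
    """One sigmoid hidden layer plus direct input-to-output ("jumping") weights v."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [0.0] * n_hidden  # hidden path starts switched off
        self.v = [0.0] * n_in       # jumping connections
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.W1, self.b1)]
        linear = sum(vi * xi for vi, xi in zip(self.v, x))
        return h, self.b2 + linear + sum(w * hj for w, hj in zip(self.w2, h))

    def sgd_step(self, x, y, lr, train_jump=True, train_hidden=True):
        h, out = self.forward(x)
        err = out - y
        if train_jump:
            self.v = [vi - lr * err * xi for vi, xi in zip(self.v, x)]
        if train_hidden:
            for j, hj in enumerate(h):
                grad_pre = err * self.w2[j] * hj * (1.0 - hj)
                self.w2[j] -= lr * err * hj
                self.b1[j] -= lr * grad_pre
                self.W1[j] = [w - lr * grad_pre * xi for w, xi in zip(self.W1[j], x)]
        self.b2 -= lr * err


def mse(net, data):
    return sum((net.forward(x)[1] - y) ** 2 for x, y in data) / len(data)


def train_frozen_jumping(net, data, lr=0.05, epochs=(200, 200)):
    for _ in range(epochs[0]):  # stage 1: linear (jumping) path and output bias only
        for x, y in data:
            net.sgd_step(x, y, lr, train_jump=True, train_hidden=False)
    for _ in range(epochs[1]):  # stage 2: jumping weights frozen, hidden layer learns the rest
        for x, y in data:
            net.sgd_step(x, y, lr, train_jump=False, train_hidden=True)
    return net


rng = random.Random(3)
xs = [[rng.uniform(0.4, 1.2), rng.uniform(0.4, 1.2)] for _ in range(30)]
data = [(x, 0.5 - 1.5 * x[0] + 0.8 * x[1]) for x in xs]  # hypothetical log-space data
net = train_frozen_jumping(JumpNet(n_in=2, n_hidden=4), data)
print(net.v, mse(net, data))
```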

Frozen Jumping Connections

Results
[Figure: RMS error chart]

Models
Hybrids
–Smart Prior
–MultiTask Learning
–Jumping Connections
–Frozen Jumping Connections
Baselines
–Linear Regression
–Neural Networks
Combinations
–Voting
–Weighted Average

Combining Models
Idea: Ensemble Learning
Use all models and then combine their predictions
–Committee Voting
–Weighted Average
2 baseline and 3 hybrid models (Smart Prior, MultiTask Learning, Frozen Jumping Connections)

Committee Voting
Average the predictions of the models
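
Committee voting is an equal-weight average of the models' predictions for each observation. The sketch below uses made-up predictions for three of the models. The numbers and the function name are hypothetical.

```python
from statistics import fmean
from typing import Dict, List, Sequence


def committee_vote(predictions: Dict[str, Sequence[float]]) -> List[float]:
    """Equal-weight average of each model's predicted quantities, observation by observation."""
    columns = list(predictions.values())
    n = len(columns[0])
    if any(len(c) != n for c in columns):
        raise ValueError("all models must predict the same observations")
    return [fmean(c[t] for c in columns) for t in range(n)]


preds = {"LR": [10.0, 20.0], "NN": [12.0, 18.0], "SmartPrior": [11.0, 22.0]}  # hypothetical
print(committee_vote(preds))  # [11.0, 20.0]
```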

Results
[Figure: RMS error chart]

Weighted Average – Model Regression
Optimal weights determined by a linear regression model over the predictions
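
The weighted average treats each model's prediction as a regressor and fits q ≈ c + Σ_m w_m q̂_m by least squares. The sketch below includes an intercept c, which is an assumption. All names and numbers are hypothetical. In practice, fitting the weights on held-out predictions rather than training-set predictions avoids favoring models that overfit.

```python
from typing import List, Sequence


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_combination_weights(preds: Sequence[Sequence[float]], true_q: Sequence[float]) -> List[float]:
    """Least squares fit of q ≈ c + sum_m w_m * qhat_m; returns [c, w_1, ..., w_M]."""
    X = [[1.0] + [p[t] for p in preds] for t in range(len(true_q))]
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * q for r, q in zip(X, true_q)) for i in range(k)]
    return solve(XtX, Xty)


def combine(weights: Sequence[float], preds_at_t: Sequence[float]) -> float:
    return weights[0] + sum(w * p for w, p in zip(weights[1:], preds_at_t))


m1 = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical predictions of model 1
m2 = [2.0, 1.0, 4.0, 3.0, 6.0]  # hypothetical predictions of model 2
true_q = [0.3 * a + 0.7 * b for a, b in zip(m1, m2)]
weights = fit_combination_weights([m1, m2], true_q)
print([round(w, 6) for w in weights])
```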

Results
[Figure: RMS error chart]

Normalized RMS Error
Compare model performance across stores with different:
–Sizes
–Ages
–Locations
Need to normalize
Compare to baselines
Take the error of the LR benchmark as unit error
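
Taking the LR error as the unit means dividing each model's RMS by the LR model's RMS on the same store. A value below 1.0 then means the model beats the benchmark, regardless of store size. The sketch below averages these ratios across stores. That aggregation step is an assumption, and the data and names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def rms(true_q: Sequence[float], pred_q: Sequence[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true_q, pred_q)) / len(true_q))


def normalized_rms(true_q, model_pred, lr_pred) -> float:
    """Model RMS in units of the LR benchmark's RMS on the same store (1.0 = as good as LR)."""
    return rms(true_q, model_pred) / rms(true_q, lr_pred)


def mean_normalized_rms(stores: Dict[str, Tuple[Sequence[float], Sequence[float], Sequence[float]]]) -> float:
    """Average normalized RMS over stores; each value is (true_q, model_pred, lr_pred)."""
    return sum(normalized_rms(*s) for s in stores.values()) / len(stores)


stores = {
    "store_a": ([10.0, 10.0], [11.0, 9.0], [12.0, 8.0]),  # model halves the LR error
    "store_b": ([5.0, 5.0], [6.0, 4.0], [6.0, 4.0]),      # model ties LR
}
print(mean_normalized_rms(stores))  # 0.75
```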

Normalized RMS Error

Summary
Built new models for better pricing strategies for individual stores, categories
Hybrid models clearly superior to baselines for customer choice prediction
Incorporated domain knowledge (linearity) in Neural Networks
New models allow stores to
–price the products more strategically and optimize profits
–maintain better inventories
–understand product interaction
www.cs.cmu.edu/~eneva
[Diagram: category prices (P of Prod1, P of Prod2, P of Prod3, ..., P of ProdN) → “I know your customers” Predictor → quantities bought (Q bought of Prod1, ..., Q bought of ProdN)]

References
Montgomery, A. (1997). Creating Micro-Marketing Pricing Strategies Using Supermarket Scanner Data.
West, P., Brockett, P. and Golden, L. (1997). A Comparative Analysis of Neural Networks and Statistical Methods for Predicting Consumer Choice.
Guadagni, P. and Little, J. (1983). A Logit Model of Brand Choice Calibrated on Scanner Data.
Rossi, P. and Allenby, G. (1993). A Bayesian Approach to Estimating Household Parameters.

Work In Progress
Analyze the Weighted Average model
Compare the extrapolation ability of the new models
Other MTL tasks:
–shrinkage model – a “super” store model with data pooled across all stores
–store zones

On one hand… In log space, Price-Quantity relationship is fairly linear

On the other hand… the derivation of consumers' demand responses to price changes without the need to write down and rely upon particular mathematical models for demand

“The” Model
Category: Price of Product 1, Price of Product 2, Price of Product 3, ..., Price of Product N (convert to ln space)
→ “I know your customers” Predictor →
Quantity bought of Product 1, Quantity bought of Product 2, Quantity bought of Product 3, ..., Quantity bought of Product N (convert to original space)
Need to multiply this across many stores, many categories.

Problem Definition
For a set of products
–Given the price distribution
–Predict the consumption distribution
A change in the price of one product affects the consumption of all other products

Assumptions
Independence
–Substitutes: fresh fruit, other juices
–Other Stores
Stationarity
–Change over time
–Holidays

The Most Important Slide for this presentation and the paper: www.cs.cmu.edu/~eneva/ eneva@cs.cmu.edu

Converting Predictions to Original Space
