Published by Opal Cameron. Modified over 6 years ago.
1
Target’s Pregnancy Prediction Problem The Complete Analytical Process
“Take a fictional Target shopper named Jenny Ward, who is 23, lives in Atlanta and in March bought cocoa-butter lotion, a purse large enough to double as a diaper bag, zinc and magnesium supplements and a bright blue rug. There’s, say, an 87 percent chance that she’s pregnant and that her delivery date is sometime in late August.”
2
The Model
3
Model Inputs
Based on Duhigg’s description of Jenny Ward, what kinds of data did Target analysts use in constructing the pregnancy prediction model?
Classify each input as Definitely, Likely, or Unlikely or No:
- Age
- Gender
- Income
- Past Purchase Indicators
- Past Purchase Intensity
4
DATA: Past Purchases (Related items, Target items), Age, Gender
OUTPUT: Pregnancy Scores
ACTION: Product Selection in Brochures
5
Discuss An analyst wants to improve the model by adding more variables to it. Suggest some additional variables.
6
Discuss Suggest other actions that can be informed by the predicted pregnancy scores.
7
Model
DATA: Past Purchases (Related items, Target items), Age, Gender
OUTPUT: Pregnancy Scores
ACTION: Product Selection in Brochures
8
What is a Model? Valuation Model Source: Keith Howe (2009)
9
What is a Model? Climate Model Source: Mark Chandler, EdGCM
10
What is a Model? Climate Model Source: IPCC
11
What is a Model? Digital Marketing Attribution Model
12
Digital Marketing Attribution
For each response, allocate credit to the responsible channel (e.g. SEO, Display Ad, SEM).
13
[Timeline diagram: touches from Display Ad, SEO, SEM and Email at times t0 to t10, followed by a Response]
FIRST CLICK: credit goes to the first touch (user clicked on banner ad)
LAST CLICK: credit goes to the last touch (user clicked on Google organic search result)
EXP DECAY: influence scales with time order
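The three attribution rules named on this slide can be compared side by side in a few lines of Python. This is a minimal sketch, not the method used by any particular analytics tool: the function name `attribute`, the `decay` parameter and the touch times are all hypothetical, and the exponential-decay rule is implemented here by weighting touches by their position in the sequence, which is one reading of "influence scales with time order".

```python
import math

def attribute(touches, rule="last_click", decay=0.5):
    """Split the credit for one response across the channels that preceded it.

    touches: list of (time, channel) pairs, all earlier than the response.
    rule:    "first_click", "last_click" or "exp_decay".
    decay:   hypothetical decay rate (an assumption, not from the slides).
    """
    if not touches:
        return {}
    ordered = sorted(touches)  # oldest touch first
    n = len(ordered)
    if rule == "first_click":
        weights = [1.0] + [0.0] * (n - 1)
    elif rule == "last_click":
        weights = [0.0] * (n - 1) + [1.0]
    elif rule == "exp_decay":
        # Weight depends on position in the sequence: the most recent
        # touch gets weight 1, each earlier one is discounted by exp(-decay).
        weights = [math.exp(-decay * (n - 1 - i)) for i in range(n)]
    else:
        raise ValueError(f"unknown rule: {rule}")
    total = sum(weights)
    credit = {}
    for (_, channel), weight in zip(ordered, weights):
        credit[channel] = credit.get(channel, 0.0) + weight / total
    return credit

# Hypothetical path: four touches at made-up times before one response.
touches = [(1, "Display Ad"), (3, "SEM"), (5, "Email"), (8, "SEO")]
for rule in ("first_click", "last_click", "exp_decay"):
    print(rule, attribute(touches, rule))
```

With this path, first click gives all credit to the display ad and last click gives all credit to SEO, while exponential decay spreads credit across all four channels with more going to the later touches. The same data supports three different answers, which is the sense in which an attribution model is a simplified view of reality.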
14
A model is an abstraction, a simplified view of reality
15
All models are wrong but some are useful
George Box
16
First to Your Door “Right around the birth of a child... parents are exhausted and overwhelmed and their shopping patterns and brand loyalties are up for grabs.” “We knew that if we could identify them in their second trimester, there’s a good chance we could capture them for years.”
17
Brochure Design “As long as a pregnant woman thinks she hasn’t been spied on, she’ll use the coupons.” “We’d put an ad for wineglasses next to infant clothes. That way, it looked like all the products were chosen by chance.” “We started mixing in all these ads for things we knew pregnant women would never buy, so the baby ads looked random.”
18
Customer acquisition tool
It’s predictive
So accurate it’s creepy
19
The Marketing Problem
Better Prospecting | More Relevant Brochures
More New Customers | More Revenue Per Current Customer | Lower Cost
More Revenue
More Profit
20
Discuss Identify any problems with the way Target analysts framed the business problem.
21
Hint: How is this model scored?
22
The Business Problem Revisited
Targeting of At-Risk Customers | More Relevant Brochures
Retain More Customers | More Revenue Per Customer | Cost: New Brochures v. Fewer Brochures + Any Offers
More Revenue
More Profit
23
Measuring Success According to Duhigg, Target’s pregnancy prediction effort was highly successful. How accurate was Target’s prediction model? The accuracy rate was not disclosed.
24
10% of targets are pregnant
25
20% predicted to be pregnant 3 of 10 predictions are accurate
2 of 5 pregnancies are missed
26
Predictive Lift
27
20% predicted to be pregnant
Accurate Prediction | Missed Opp | False Positives
Why did Target mix in random products?
A) 7 out of 10 receiving brochures will not be pregnant
B) 3 out of 10 receiving brochures will feel creeped out
A >> B
28
Customer acquisition tool
It’s predictive
So accurate it’s creepy
1. Not for customer acquisition
2. It’s not very predictive, even with Big Data
3. Inaccurate and detracting
29
Target’s Pregnancy Prediction Problem
- Defining and framing the business problem
- Collecting data for the analytical model
- Selecting an analytical method
- Developing a useful model that solves the problem
- Describing how model outputs can drive action
- Projecting the impact of such action
- Measuring the model performance
30
Complete Analytical Process
31
The Baby Names Voyager Importance of Proper Framing
32
Use Cases
Which of the following questions can be answered directly by the Baby Names Voyager (without referring to other materials)?
A. Why did the name Barbara peak in the 1940s?
B. Is the name Charlotte or Chelsea more popular?
C. How popular will David be in the year 2025?
D. What name should I choose for my baby girl?
33
Other Analyses of the Data
Source: Social Security Administration
34
Other Analyses of the Data
Source: Social Security Administration
35
Other Analyses of the Data
Source: Social Security Administration
36
Other Analyses of the Data
37
Other Analyses of the Data
38
Other Analyses of the Data
39
Other Analyses of the Data
40
Inverting the Frame
Given a time period, which names are popular? (Baby Names Voyager)
Given a name, which time period is most likely? (fivethirtyeight.com)
Given a name, guess someone’s age
Given a name, guess what languages he/she speaks
41
DATA: Address, Religion, First Name, Last Name
OUTPUT: Probabilities of speaking English, Spanish, German, Japanese, etc.
ACTION: Segmentation, Targeting, etc.
42
Evaluating Model Performance
- Make prediction using the median
- Use IQR as a measure of error
- Accuracy varies with name
Source: fivethirtyeight.com
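The median-plus-IQR idea can be sketched with a weighted quantile over the age distribution of people who share a name. The counts below are hypothetical and chosen only to make the arithmetic easy to follow; they are not Social Security Administration figures, and the helper `weighted_quantile` is an illustrative name rather than a library function.

```python
def weighted_quantile(values_with_weights, q):
    """Return the smallest value at which cumulative weight reaches fraction q."""
    pairs = sorted(values_with_weights)
    total = sum(weight for _, weight in pairs)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= q * total:
            return value
    return pairs[-1][0]

# Hypothetical data: (age, number of living people with this name at that age).
living_by_age = [(10, 100), (20, 200), (30, 300), (40, 200), (50, 100), (60, 100)]

predicted_age = weighted_quantile(living_by_age, 0.50)  # the median is the guess
q1 = weighted_quantile(living_by_age, 0.25)
q3 = weighted_quantile(living_by_age, 0.75)
iqr = q3 - q1                                           # spread used as the error measure

print(f"guess {predicted_age}, middle half of people aged {q1} to {q3}, IQR {iqr}")
```

A name whose bearers cluster in a narrow band of birth years gives a small IQR and a confident guess; a name that has stayed popular for decades gives a wide IQR, which is one way to read "accuracy varies with name".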
43
Evaluating Model Performance
- Accuracy varies with gender
- Accuracy improves with more co-variates
Source: fivethirtyeight.com

There are a number of curiosities about this Target story. First, is it true that Target is playing offense? Is it true that Target is using this to take business away from competitors, identifying the moment in which women switch loyalty and switch brands?

Let’s look at the structure of the predictive model. I created this diagram to illustrate the example given in Duhigg’s article. The model seems to be a form of market basket analysis. It looks at the past purchasing behavior of shoppers and identifies 25 products that are highly correlated with the purchase of baby products in the near future. Jenny Ward is a hypothetical example: because she purchased four of these 25 products, she scored 87% on the pregnancy prediction scale.

The trouble is that someone like Jenny must be a current customer of Target; otherwise Target would not have data on what she purchased in the past. In fact, the other descriptions in the text seem to indicate that Jenny was already a great Target customer even before she became pregnant. She was said to use coupons, and so on. So the best we can say about this model is that it is built for defense, not offense. It is more about customer retention than about customer acquisition.
44
Discuss What other co-variates might be useful to help predict age more accurately?
45
Complete Analytical Process
46
Course Project
I. Project Proposal (Wk 3)
II. Midterm: Data Cleaning & Processing (Wk 7)
III. Final: Analysis & Modeling (Wk 12)
47
Project Proposal
Objectives:
- Select a dataset and specify a business/organizational problem you want to solve.
- Diagnose data issues in your dataset (you will fix these issues in Deliverable #2). We cover diagnosing and fixing data issues in Module 2.
Due Date: [Sep 25th], 11:59 PM
Grading: max 10 points
All assignment files must be uploaded to Canvas. We do not accept emailed files.
Reminder: Late assignments (excused or not) will incur a penalty of 20%. Assignments that are late without prior notification, or late by more than 7 days, will be scored zero.
Ling or I will provide feedback and approval on Canvas. (Please open your documents before you email us asking where our comments are.)
48
Choosing your Dataset
- Not too small (e.g. > 500 rows)
- Not too big (e.g. < 1 million)
- Not too aggregated
- Not too dirty
- Not too clean
- Non-anticipatory (if Prediction)
49
Example of a Bad Dataset
Ebola in West Africa data
- Too aggregated
- For any given business problem, many of these rows will be useless
- Too few variables
50
Selecting an Analytical Problem
PREDICTION
- Probability of a borrower defaulting on a loan
- Probability of an email being spam
- Probability of a customer deactivating (“churn”)
- Amount of revenues
- Frequency of visits

- There is a response (outcome) variable
- If the response is binary (yes/no) or categorical (e.g. which product type), also called a “classification” problem
- Looking for correlations between the response and co-variates
- Predictions can be validated

SEGMENTATION
- How many types of customers do we have?
- What are the characteristics of different types of shoppers?
- What is the probability that a company has a business model of type A (B, C, etc.)? (advanced)

- No response (outcome) variable
- Adding structure to the data
- Looking for correlations between co-variates
- Difficult to validate, need external evidence such as survey results
51
Non-Anticipatory
- You have a dataset of sales records of an electronics manufacturer (B2B)
- You aggregate the data so that the unit of analysis is the retailer (i.e. your customers are retailers)
- You propose to predict sales volume using frequency of different types of products (e.g. hard drives, smartphones, cables)
Problem! In order to make predictions using your model, you will need to know the frequency distribution of products sold (inputs). But you don’t know these inputs until you have the sales transactions. So you don’t really have a prediction problem.
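One way to see the difference is to build the inputs from a cut-off date. The sketch below assumes one possible repair that the slide itself does not propose: use the product mix from earlier months to predict volume in a later month. The records, the month numbering and the function names `features_before` and `volume_in` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records: (retailer, month, product_type, units).
sales = [
    ("R1", 1, "hard drives", 10),
    ("R1", 1, "cables", 5),
    ("R1", 2, "smartphones", 7),
    ("R1", 3, "cables", 20),
    ("R2", 2, "cables", 4),
]

def features_before(records, target_month):
    """Units per product type per retailer, using only months before target_month."""
    mix = defaultdict(lambda: defaultdict(int))
    for retailer, month, product, units in records:
        if month < target_month:  # the non-anticipatory cut-off
            mix[retailer][product] += units
    return {retailer: dict(products) for retailer, products in mix.items()}

def volume_in(records, target_month):
    """Response variable: total units per retailer in target_month."""
    totals = defaultdict(int)
    for retailer, month, _product, units in records:
        if month == target_month:
            totals[retailer] += units
    return dict(totals)

inputs = features_before(sales, target_month=3)   # known before month 3 starts
response = volume_in(sales, target_month=3)       # only known after month 3 ends
print(inputs)
print(response)
```

If the filter were `month <= target_month`, the inputs would already contain the transactions whose total is being predicted, which is exactly the problem described on the slide.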
52
Appendix: Target model accuracy
              Predicted Y   Predicted N   Total
Actual Y            6             4          10
Actual N           14            76          90
Total              20            80         100

Positive Predictive Value = 6/20 = proportion of pregnancy predictions that are accurate
Missed Opportunities = 4/10 = proportion of pregnancies that are not predicted
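The appendix numbers can be checked with a few lines of arithmetic. The counts below are the ones in the table above; the definition of lift as positive predictive value divided by the base rate is a common convention and is an assumption here, since the slides name "Predictive Lift" without giving a formula.

```python
# Confusion-matrix counts from the appendix table (per 100 customers).
tp = 6    # predicted pregnant, actually pregnant
fn = 4    # predicted not pregnant, actually pregnant (missed opportunity)
fp = 14   # predicted pregnant, actually not pregnant (false positive)
tn = 76   # predicted not pregnant, actually not pregnant

total = tp + fn + fp + tn
base_rate = (tp + fn) / total        # share of customers who are pregnant
predicted_rate = (tp + fp) / total   # share of customers the model flags
ppv = tp / (tp + fp)                 # positive predictive value
miss_rate = fn / (tp + fn)           # share of pregnancies not predicted
lift = ppv / base_rate               # assumed definition: PPV relative to base rate

print(f"base rate:        {base_rate:.0%}")
print(f"predicted rate:   {predicted_rate:.0%}")
print(f"PPV:              {ppv:.0%}")
print(f"missed:           {miss_rate:.0%}")
print(f"lift over random: {lift:.1f}x")
```

These figures line up with the earlier slides: 10% of targets are pregnant, 20% are predicted to be pregnant, 3 of 10 predictions are accurate, and 2 of 5 pregnancies are missed. Under the assumed definition, a brochure recipient is about three times as likely to be pregnant as a randomly chosen customer, yet 7 out of 10 recipients are still not pregnant.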