Statistics for Marketing & Consumer Research Copyright © 2008 - Mario Mazzocchi 1 Further advanced methods Chapter 17.

Data mining
- Data mining is “the exploration of a large set of data with the aim of uncovering relationships between variables” (Oxford Dictionary of Statistics).
- It is also known as Knowledge Discovery in Databases (KDD).
- It makes extensive use of information technology, through the automation of data analysis procedures.

Statistics and data mining
- Statistics is also exploited, but it is adapted to deal with (very) large data sets.
- The statistical approaches used are those that make the most of computer-intensive methods.
- Data mining merges statistics with other disciplines:
  - Computer science
  - Machine learning
  - Artificial intelligence
  - Database technology
  - Pattern recognition

Data warehousing
- The common denominator among the techniques is the use of very large databases.
- These databases are the outcome of data warehousing, which:
  - organizes all of the data available to a company into a common format
  - allows integration of different data types
  - allows analysis through data mining
- The organization of company information in data warehouses requires recognition of:
  - linkages between data which relate to the same objects
  - the time dimension (to monitor changes)

Marketing applications
- A typical application is market basket analysis.
- Customer purchasing patterns are discovered by looking at the databases of transactions in one or more stores of the same chain (e.g. through loyalty cards).
- The contents of the trolley are analyzed to detect repeated purchases and brand switching behaviors.
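As a rough illustration of market basket analysis, the sketch below counts how often pairs of products appear in the same trolley. The transactions, product names and the helper `count_pairs` are made up for illustration; real applications work on loyalty-card databases and typically use dedicated association-rule algorithms rather than this brute-force count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is the content of one trolley.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "coffee"},
    {"bread", "butter", "coffee"},
    {"milk", "bread"},
]

def count_pairs(baskets):
    """Count the baskets in which each pair of products appears together."""
    counts = Counter()
    for basket in baskets:
        # Sort the items so that each pair is stored under a single key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

pair_counts = count_pairs(transactions)
item_counts = Counter(item for basket in transactions for item in basket)

# Support: share of all baskets that contain both bread and butter.
support_bread_butter = pair_counts[("bread", "butter")] / len(transactions)

# Confidence of the rule "bread -> butter": share of bread baskets
# that also contain butter.
confidence_bread_butter = pair_counts[("bread", "butter")] / item_counts["bread"]

print(f"support = {support_bread_butter:.2f}, confidence = {confidence_bread_butter:.2f}")
```

Support and confidence are the usual starting point for spotting products that are bought together; repeated purchases and brand switching additionally require linking baskets of the same customer over time.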

Problems with data mining
Data mining is a complex and automated process, which faces many risks:
- Data sets may be contaminated (affected by error).
- Data may be affected by selection biases and non-independent observations.
- Automated data analysis could find spurious relationships (as in spurious regression).

Steps for successful data mining
1. data warehousing
2. target data selection
3. data cleaning
4. preprocessing
5. transformation and reduction
6. data mining
7. model selection (or combination)
8. evaluation and interpretation
9. consolidation and use of the extracted knowledge

Frequentist vs. Bayesian statistics: the Frequentist paradigm
- Assumption: true and fixed population parameters exist, albeit unknown.
- Statistics can exploit sampling to estimate these unknown parameters.
- Observations are associated with probabilities: the probability of a given outcome for a random event can be proxied by the frequency of that outcome.
- The larger the sample, the closer the estimated probability is to the true probability.
- Example: a linear regression model tries to estimate the true coefficients which link the explanatory variables and the dependent variable, using a sample of observations.
- A key concept of the frequentist approach is the confidence interval, a range of values which contains the true and fixed value with a given confidence level.
- The confidence level is nothing more than the frequency with which an interval contains the true and fixed value across different random samples.
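The frequency interpretation of the confidence level can be checked with a small simulation. The sketch below assumes a Normal population with a known standard deviation; the true mean, the sample size and the number of replications are arbitrary illustrative values, not figures from the chapter.

```python
import math
import random

random.seed(42)  # fixed seed so that the sketch is reproducible

TRUE_MEAN = 10.0      # the "true and fixed" parameter (hypothetical value)
SIGMA = 2.0           # population standard deviation, assumed known here
N = 30                # size of each random sample
REPLICATIONS = 2000   # number of different random samples
Z_95 = 1.96           # standard Normal critical value for a 95% interval

# Half-width of the 95% interval for the mean when sigma is known.
half_width = Z_95 * SIGMA / math.sqrt(N)

covered = 0
for _ in range(REPLICATIONS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    sample_mean = sum(sample) / N
    # Does this sample's interval contain the true value?
    if sample_mean - half_width <= TRUE_MEAN <= sample_mean + half_width:
        covered += 1

coverage = covered / REPLICATIONS
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should be close to 0.95: the confidence level describes the long-run behavior of the interval across samples, not the probability that the parameter lies in any single interval.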

The Bayesian approach
- The unknown parameters in the population are not fixed, but are treated as random variables with their own probability distribution.
- One is allowed to exploit knowledge or beliefs about the shape of the probability distribution which existed prior to estimation.
- Once data are collected, Bayesian methods use them to update these prior beliefs; the final outcome is a posterior distribution which depends on both the data and the prior knowledge.

Bayes rule
- The estimation of the posterior distribution opens the way to Bayesian statistical operations. It is based on the Bayes rule, which relates the probabilities of the outcomes of two random events in the following way:
$$P(A|B) = \frac{P(A,B)}{P(B)}$$
- P(A|B) is the probability that the first random event generates the outcome A when the second random event has generated the outcome B; thus it is the probability of A conditional on B.
- P(A,B) is the joint probability that both events A and B happen.
- P(B) is the unconditional probability of the event B.
- The Bayes theorem shows that P(A,B) can also be expressed as the product P(B|A)P(A), that is, the product of the probability that the event B happens conditional on the outcome A and the probability of the event A:
$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$$
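A small numerical check may help fix the notation. The joint probabilities below are invented for two binary events; the sketch only verifies that the two ways of writing the Bayes rule give the same conditional probability.

```python
# Hypothetical joint probabilities P(A, B) for two binary random events.
joint = {
    ("a", "b"): 0.20,
    ("a", "not_b"): 0.20,
    ("not_a", "b"): 0.10,
    ("not_a", "not_b"): 0.50,
}

# Unconditional (marginal) probabilities, obtained by summing the joint ones.
p_a = joint[("a", "b")] + joint[("a", "not_b")]
p_b = joint[("a", "b")] + joint[("not_a", "b")]

# Definition of conditional probability: P(A|B) = P(A,B) / P(B).
p_a_given_b = joint[("a", "b")] / p_b

# Same quantity through the Bayes theorem: P(B|A) P(A) / P(B).
p_b_given_a = joint[("a", "b")] / p_a
p_a_given_b_bayes = p_b_given_a * p_a / p_b

print(f"P(A|B) = {p_a_given_b:.4f}, via Bayes theorem = {p_a_given_b_bayes:.4f}")
```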

Bayes estimation
- To understand the use of the Bayes rule, the two random events could be:
  - the value of the unknown parameter (A), which in Bayesian statistics is determined by a random variable
  - the available data (B), which are also the outcome of a random variable since they were obtained through sampling
- The Bayes theorem says that the probability of the parameter value A given the observed sample B (the posterior probability) can be computed as a function of the probability of observing sample B when the parameter value is A and of the unconditional probability of the parameter value A.
- The unconditional probability of the parameter value A is the prior probability.

Use of the Bayes rule
- The Bayes rule is very helpful when it is easier to estimate P(B|A) than P(A|B).
- If:
  - the probability of having the sample B conditional on the unknown parameter A can be computed,
  - some prior information on the probability of the parameter A is available, and
  - the unconditional probability of the sample B is known,
- then it becomes possible to find the probability distribution of the parameter A conditional on the data, which is the final objective of estimation.

Unconditional probability
- The denominator of the Bayes rule can be rewritten as:
$$P(B) = \sum_j P(B|A_j)\,P(A_j)$$
- This means that the unconditional probability of the sample B can be seen as the sum of the probabilities of the sample B conditional on each of the possible estimates A_j, weighted by the probability of each estimate A_j.

Estimation
Two elements have to be considered:
1) P(B|A) is the likelihood function of A, that is, the probability of a given set of observations depending on a set of parameters, and it is generally known (frequentists use it in maximum likelihood methods as well).
2) The denominator of the Bayes rule is a constant and it is generally not necessary to estimate it, so that estimation can be based on the following result:
$$P(A|B) \propto P(B|A)\,P(A)$$
where the sign ∝, which substitutes the equal sign, means that the left-hand side is proportional to the right-hand side.
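The role of the denominator can be illustrated with a discrete example. In the sketch below the parameter A is a purchase share that can take only three candidate values; the candidates, the prior weights, the sample and the Binomial likelihood are all assumptions chosen for illustration. The denominator P(B) is obtained as the weighted sum over the candidates, and dropping a constant from the likelihood leaves the posterior unchanged.

```python
import math

# Hypothetical setting: A is the unknown share of customers who buy a product.
candidates = [0.2, 0.5, 0.8]   # possible values A_j (assumed)
prior = [0.3, 0.4, 0.3]        # P(A_j), assumed prior beliefs
buyers, sample_size = 7, 10    # the observed sample B (made up)

def likelihood(share, include_constant=True):
    """Binomial probability of the sample B given the parameter value."""
    kernel = share**buyers * (1 - share) ** (sample_size - buyers)
    constant = math.comb(sample_size, buyers) if include_constant else 1
    return constant * kernel

def posterior(include_constant=True):
    # Numerator of the Bayes rule for each A_j: P(B|A_j) * P(A_j).
    unnormalized = [
        likelihood(a, include_constant) * p for a, p in zip(candidates, prior)
    ]
    # Denominator: P(B) as the sum of P(B|A_j) weighted by P(A_j).
    p_b = sum(unnormalized)
    return [u / p_b for u in unnormalized]

with_constant = posterior(include_constant=True)
without_constant = posterior(include_constant=False)

for a, p in zip(candidates, with_constant):
    print(f"P(A = {a} | B) = {p:.4f}")
```

Because any constant factor cancels once the weights are rescaled to sum to one, only the product of likelihood and prior needs to be evaluated for each candidate value.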

Example
- Estimation of a single regression coefficient in a bivariate regression.
- Caviar expenditure (c) as a function of income (i).
- Data come from a random sample which generates a set of observations collected in the vector c (for simplicity, treat i as the observations of a fixed exogenous variable).
- The equation is:
$$c = \beta i$$

Frequentist estimation of the regression coefficient
- Start from some assumptions on the probability distribution of the data and the error term.
  - E.g. normal distribution
- Get the point estimates that are the most likely given the observed sample.
  - E.g. maximum likelihood estimates
- Since the sample is random, confidence intervals can be built for the coefficient estimate.

Bayesian estimation
- Start with the assumption that caviar expenditure follows a Normal probability distribution (the prior distribution) around its mean, which is equal to βi.
- Second, assume a given standard deviation for this Normal distribution, e.g. the standard deviation of caviar expenditure is 0.02.
- Consider the value β = 0.05. If the prior distribution holds, c should be normally distributed around 0.05i.
- Now it becomes necessary to evaluate the probability of getting the observed sample c given that β = 0.05.
- Generate c* by multiplying i by 0.05.

Bayesian estimation
- Considering that c is a random sample from a normal distribution, one can get the likelihood of c* conditional on β = 0.05 using the known likelihood function.
- The unconditional (prior) probability that β = 0.05 is also known, given that we have assumed that the distribution is normal, with a mean of 0.05 and a standard deviation of 0.02.
  - This means that the probability of β = 0.05 is about 20%.
- With a computer, and given the prior distribution of β, one can compute the unconditional probabilities for all possible values of β and the probabilities of all possible values of c*.
- Using a slightly different notation for the Bayes rule, which defines L(β|c) as the likelihood function of the sample c:
$$P(\beta|c) \propto L(\beta|c)\,P(\beta)$$
where the left-hand side is the (unknown) posterior probability of β.

Posterior distribution
- As mentioned, for any fixed value of β it is possible to compute:
  - the likelihood function
  - the unconditional probability, using the prior
- Suppose that for β = 0.05 the likelihood of observing the collected data set is 10%. Then, with a prior probability of about 20%, one may compute:
$$P(\beta = 0.05\,|\,c) \propto 0.10 \times 0.20 = 0.02$$
- The above result does not mean that the probability is 2%, since there is a proportionality relationship (not an equality).
- However, repeating the computation for the whole range of values of β allows one to compute the probability distribution of β conditional on the observed sample (the posterior distribution).
- This ultimately allows one to determine the most likely estimate for β. This estimate will differ from 0.05 unless we had an excellent prior.
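The procedure of repeating the computation over a range of values of β can be sketched as a grid approximation. Everything numerical below is an assumption made for illustration: the income and caviar figures are invented, the error standard deviation is fixed at 0.2, and the prior for β is taken to be Normal with mean 0.05 and standard deviation 0.02. Logarithms are used only to keep the products numerically stable.

```python
import math

# Hypothetical observations: income (i) and caviar expenditure (c).
income = [10.0, 20.0, 30.0, 40.0, 50.0]
caviar = [0.7, 1.1, 2.0, 2.3, 3.1]

PRIOR_MEAN, PRIOR_SD = 0.05, 0.02   # assumed Normal prior for beta
ERROR_SD = 0.2                      # assumed spread of c around beta * i

def normal_log_pdf(x, mean, sd):
    """Logarithm of the Normal density with the given mean and sd."""
    return -0.5 * math.log(2 * math.pi * sd**2) - (x - mean) ** 2 / (2 * sd**2)

def log_likelihood(beta):
    """Log-likelihood of the observed sample c for a fixed value of beta."""
    return sum(
        normal_log_pdf(c, beta * i, ERROR_SD) for i, c in zip(income, caviar)
    )

# Candidate values of beta from 0.000 to 0.150 in steps of 0.001.
grid = [k / 1000 for k in range(0, 151)]

# Log of (likelihood x prior) for each candidate value.
log_post = [
    log_likelihood(b) + normal_log_pdf(b, PRIOR_MEAN, PRIOR_SD) for b in grid
]

# Rescale so that the weights sum to one: this turns the proportionality
# relationship into a proper (discretized) posterior distribution.
max_log = max(log_post)
weights = [math.exp(lp - max_log) for lp in log_post]
total = sum(weights)
posterior = [w / total for w in weights]

posterior_mode = grid[posterior.index(max(posterior))]
print(f"Most likely value of beta on the grid: {posterior_mode:.3f}")
```

With these made-up data the most likely value lies above the prior mean of 0.05, because the sample pulls the estimate towards the value that best fits the observations.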

Final output
- The posterior distribution might also differ from the normal distribution (although not in this case).
- From the posterior distribution it is possible to compute the percentiles (see appendix); thus a 95% Bayesian confidence interval (also called a credible interval) can be obtained by taking the values of β corresponding to the 2.5th and the 97.5th percentiles of the posterior distribution.
- The final result depends on the quality of the prior.
- However, Bayesian statistics has extended these founding concepts considerably, and there are many ways to reduce the influence of the prior assumptions and to check their robustness.
- For example, there are non-informative priors, which do not assume particular knowledge of the parameters because they are uniformly distributed over the widest range of possible values.
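Reading an interval off the percentiles of a posterior distribution can be sketched as follows. The discretized posterior used here is hypothetical (a Normal shape centered on 0.06 with a spread of 0.003), and `credible_interval` is an illustrative helper, not a library function.

```python
import math

def credible_interval(grid, posterior, level=0.95):
    """Return the grid values at the lower and upper tail percentiles."""
    tail = (1.0 - level) / 2.0
    cumulative = 0.0
    lower = upper = None
    for value, prob in zip(grid, posterior):
        cumulative += prob
        # First grid value at which the cumulative probability reaches 2.5%.
        if lower is None and cumulative >= tail:
            lower = value
        # First grid value at which it reaches 97.5%.
        if upper is None and cumulative >= 1.0 - tail:
            upper = value
            break
    return lower, upper

# Hypothetical discretized posterior for beta on a grid from 0.04 to 0.08.
grid = [k / 2000 for k in range(80, 161)]
weights = [math.exp(-0.5 * ((b - 0.06) / 0.003) ** 2) for b in grid]
total = sum(weights)
posterior = [w / total for w in weights]

lower, upper = credible_interval(grid, posterior)
print(f"95% interval for beta: [{lower:.4f}, {upper:.4f}]")
```

Unlike the frequentist interval, this range is read directly as a probability statement about β given the observed sample and the prior.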

Why Bayesian statistics are becoming so popular
- One of the reasons for the comeback of Bayesian statistics in the 21st century is the fact that the Bayes rule can be applied iteratively.
- This means that the prior distribution can be updated.
- The progress in computing power has led to excellent results in estimating complex models through Bayesian methods.
- For example, modern Bayesian methods exploit the posterior distribution to generate a large number of draws, from which estimates are actually computed.

Bayesian statistics and marketing
Rossi and Allenby (2003) have explored the major role that Bayesian methods can play in marketing and include a long and annotated list of applications:
- hypothesis testing with scanner data
- extensions of conjoint analysis
- Bayesian multidimensional scaling
- the multinomial probit
- many other Bayesian alternatives to frequentist multivariate statistics
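The iterative use of the Bayes rule mentioned above can be illustrated with a conjugate Beta-Binomial model, which is swapped in here purely for convenience and does not come from the slides. The prior parameters and the two waves of data are made up. The posterior after the first wave becomes the prior for the second, and the end result matches a single update on the pooled data.

```python
def update_beta(prior, buyers, non_buyers):
    """Update a Beta(a, b) distribution for a purchase share with new counts."""
    a, b = prior
    return (a + buyers, b + non_buyers)

prior = (2, 2)  # hypothetical Beta(2, 2) prior for the share of buyers

# Wave 1: 7 buyers out of 10 customers; the posterior becomes the new prior.
after_wave_1 = update_beta(prior, buyers=7, non_buyers=3)

# Wave 2: 12 buyers out of 20 customers, updating the wave 1 posterior.
after_wave_2 = update_beta(after_wave_1, buyers=12, non_buyers=8)

# Single update using all 30 customers at once.
batch = update_beta(prior, buyers=19, non_buyers=11)

# Mean of a Beta(a, b) distribution is a / (a + b).
posterior_mean = after_wave_2[0] / sum(after_wave_2)
print(after_wave_2, batch, round(posterior_mean, 3))
```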
