Review: Statistical Model
A statistical model for some data is a set of distributions {f_θ : θ ∈ Ω}, one of which corresponds to the true unknown distribution that produced the data. The statistical model corresponds to the information a statistician brings to the application about what the true distribution is, or at least what he or she is willing to assume about it. The variable θ is called the parameter of the model, and the set Ω is called the parameter space. From the definition of a statistical model, we see that there is a unique value θ ∈ Ω such that f_θ is the true distribution that generated the data. We refer to this value as the true parameter value.
Examples
Suppose there are two manufacturing plants for machines. It is known that the life lengths of machines built by the first plant have an Exponential(1) distribution, while machines manufactured by the second plant have life lengths distributed Exponential(1.5). You have purchased five of these machines and you know that all five came from the same plant, but you do not know which plant. Further, you observe the life lengths of these machines, obtaining a sample (x1, …, x5), and want to make inference about the true distribution of the life lengths of these machines. The statistical model here is the two-element set {Exponential(1), Exponential(1.5)}.
Suppose we have observations of heights in cm of individuals in a population and we feel it is reasonable to assume that the distribution of height in the population is normal with some unknown mean and variance. The statistical model in this case is {N(μ, σ²) : (μ, σ²) ∈ Ω}, where Ω = R × R⁺ and R⁺ = (0, ∞).
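As a sketch of how the two-plant model can be confronted with data (the sample values and the rate parameterization of the exponential are assumptions for illustration, not from the lecture), the following Python snippet compares the log-likelihood of the observed sample under each of the two candidate distributions:

```python
import math

def exp_log_likelihood(rate, sample):
    """Log-likelihood of an i.i.d. sample under the Exponential(rate) density
    f(x) = rate * exp(-rate * x) (rate parameterization is an assumption)."""
    return sum(math.log(rate) - rate * x for x in sample)

# Hypothetical life lengths of the five purchased machines (invented values).
sample = [0.4, 1.1, 0.7, 2.3, 0.9]

# The statistical model is the two-point set {Exponential(1), Exponential(1.5)}.
for rate in (1.0, 1.5):
    print("rate =", rate, "log-likelihood =", exp_log_likelihood(rate, sample))
```

Whichever rate gives the larger log-likelihood is the distribution under which the observed sample is more probable; the Bayesian machinery developed below turns this comparison into posterior probabilities.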
Bayesian Approach to Inference
The basic underlying principle is that, to be amenable to analysis, all uncertainties need to be described by probabilities. Bayesian statistical inferences about a parameter of interest θ are made in terms of probability statements. Therefore, the prescription of a statistical model alone, as defined above, is incomplete, since it does not tell us how to make probability statements about the unknown true value of θ. In the Bayesian approach to inference, in addition to specifying the model, the researcher prescribes a probability distribution for θ. This leads to a full probability model, which is the joint probability distribution for all observable (data) and unobservable (parameter) quantities in a problem. The model should be consistent with knowledge about the underlying scientific problem and the data collection process.
Important Note
In the Bayesian framework we distinguish between two kinds of estimands, that is, unobserved quantities for which statistical inferences are made. First, quantities that are not directly observed: parameters that govern the hypothetical process leading to the observed data, for example the mean height in the example above. Second, potentially observable quantities, such as a future observation of a process, for example the life lengths of machines produced by another plant.
Why Bayesian Approach?
Many statisticians prefer to develop statistical theory without the additional ingredients necessary for a full probability description of the unknowns. This is motivated by the desire to avoid prescribing the additional model ingredients necessary for the Bayesian formulation. In general, we would prefer a statistical analysis based on the fewest and weakest model assumptions possible, e.g., distribution-free methods. However, there is a price for this weakening, which typically manifests itself in ambiguities about how inference should proceed. The Bayesian formulation in essence removes the ambiguity, but at the price of a more involved model.
Bayesian Versus Frequentist Methods
Frequentist methods are based on repeated sampling properties, e.g., confidence intervals as discussed in STA261. The Bayesian approach to inference is sometimes presented as antagonistic to such frequentist methods. However, the Bayesian model arises naturally from the statistician assuming more ingredients for the model. Therefore, it is up to the statistician to decide which ingredients can be justified and then use the appropriate methods. Nevertheless, we must be wary about all model assumptions, as when they are inappropriate our inferences may be invalid. We will discuss model checking procedures later on.
The Prior Distribution
The Bayesian model for inference contains the statistical model {f_θ : θ ∈ Ω} for the data and adds to this the prior probability measure П for θ. The prior describes the statistician's beliefs about the true value of the parameter θ a priori, i.e., before observing the data. Note that the statistical model is a set of conditional distributions for the data given θ.
Example: suppose the parameter of interest θ is the probability of getting a head on the toss of a coin. The parameter space is Ω = [0, 1]; then…
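The slide breaks off before giving a concrete prior. A common choice for a probability parameter on Ω = [0, 1] is a Beta distribution; the sketch below (the Beta family is our illustrative assumption, not taken from the slide) shows how different Beta priors encode different a priori beliefs about the coin:

```python
import math

def beta_pdf(theta, a, b):
    """Density of a Beta(a, b) distribution on [0, 1]."""
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * theta ** (a - 1) * (1 - theta) ** (b - 1)

# Beta(1, 1) is the uniform prior on [0, 1]: no preference for any theta.
# Beta(10, 10) concentrates prior belief near a fair coin, theta = 0.5.
for theta in (0.1, 0.5, 0.9):
    print(theta, beta_pdf(theta, 1, 1), beta_pdf(theta, 10, 10))
```

The uniform prior assigns the same density everywhere, while Beta(10, 10) gives much more prior weight to values of θ near 0.5, reflecting a stronger belief that the coin is close to fair.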
Important Comments
The probabilities prescribed by the prior represent beliefs. Where do these beliefs come from in an application? Sometimes they come from previous experience with the random system under investigation, or perhaps with related systems. However, this is rarely the case in reality. In fact, the prior, as well as the statistical model, is often a somewhat arbitrary construction used to drive the statistician's investigations. This may call into question the relevance of the inferences derived for the practical context, if the model ingredients suffer from this arbitrariness. This is where the concept of model checking comes into play. From now on, we assume that all the ingredients make sense, but remember that in an application these must be checked if the inferences drawn are to be practically meaningful.
The Prior Predictive Distribution
The ingredients of the Bayesian formulation for inference include a marginal distribution for θ, namely the prior П, and a set of conditional distributions for the data s given θ. By the law of total probability, these ingredients specify a joint distribution for θ and s, which is given by π(θ) f_θ(s), where π denotes the density (or probability function) of the prior. The marginal distribution for the data s is given by

m(s) = ∫_Ω π(θ) f_θ(s) dθ

if the prior distribution is absolutely continuous, or

m(s) = Σ_{θ ∈ Ω} π(θ) f_θ(s)

if the prior distribution is discrete.
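For a discrete prior, the prior predictive m(s) is a finite sum. A minimal sketch, reusing the two-plant exponential model with an assumed 50/50 prior and invented sample values:

```python
import math

def exp_density(rate, sample):
    """Joint density f_theta(s) of an i.i.d. Exponential(rate) sample."""
    return math.prod(rate * math.exp(-rate * x) for x in sample)

# Assumed discrete prior: weight 1/2 on each plant's distribution.
prior = {1.0: 0.5, 1.5: 0.5}

# Hypothetical observed life lengths (invented values).
sample = [0.4, 1.1, 0.7, 2.3, 0.9]

# Prior predictive: m(s) = sum over theta of pi(theta) * f_theta(s).
m = sum(p * exp_density(rate, sample) for rate, p in prior.items())
print("m(s) =", m)
```

Before any data are observed, m describes what samples the full probability model expects; after data are observed, the same quantity serves as the normalizing constant of the posterior.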
This distribution is referred to as the prior predictive distribution of the data. The prior predictive distribution is the relevant distribution for making probability statements about the data before any data have been observed. Similarly, the prior distribution is the relevant distribution to use in making probability statements about θ before observing the data.
The Posterior Distribution
Recall, the principle of conditional probability tells us that P(A) should be replaced by P(A|C) after we are told that C is true. Similarly, after observing the data, the relevant distribution to use in making probability statements about θ is the conditional distribution of θ given the data s. This conditional probability measure is denoted by П(∙|s). It has a density or probability function (whichever is relevant) given by

π(θ | s) = π(θ) f_θ(s) / m(s),

i.e., the joint density of θ and s divided by the marginal density of s. This conditional distribution is called the posterior distribution of θ.
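Continuing the discrete two-plant sketch (the 50/50 prior and the sample values are assumptions for illustration), the posterior is obtained by dividing the joint weights π(θ) f_θ(s) by m(s):

```python
import math

def exp_density(rate, sample):
    """Joint density f_theta(s) of an i.i.d. Exponential(rate) sample."""
    return math.prod(rate * math.exp(-rate * x) for x in sample)

prior = {1.0: 0.5, 1.5: 0.5}        # assumed 50/50 prior over the two plants
sample = [0.4, 1.1, 0.7, 2.3, 0.9]  # hypothetical observed life lengths

# Joint weights pi(theta) * f_theta(s); dividing by m(s) gives the posterior.
joint = {rate: p * exp_density(rate, sample) for rate, p in prior.items()}
m = sum(joint.values())
posterior = {rate: w / m for rate, w in joint.items()}
print(posterior)
```

The resulting dictionary gives the posterior probability that the machines came from each plant, and these probabilities necessarily sum to 1.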
Important Comments
The use of the posterior distribution is sometimes referred to as an application of Bayes' rule. Note that the choice to use the posterior distribution for probability statements about θ is an axiom, or principle, and not a theorem. Note that the prior predictive density of the data s, m(s), is referred to as the inverse normalizing constant for the posterior density. This means that the posterior density is proportional to π(θ) f_θ(s) as a function of θ, and to convert it into a proper density function we only need to divide by m(s). In many examples we do not need to compute the inverse normalizing constant, as we can recognize the functional form of π(θ) f_θ(s), as a function of θ, and so immediately deduce the posterior probability distribution of θ.
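A standard instance of recognizing the functional form is the Beta prior with binomial data (an example of our choosing, not from the slides): with a Beta(a, b) prior and x heads in n tosses, π(θ) f_θ(s) is proportional to θ^(a+x−1)(1−θ)^(b+n−x−1), which is immediately the Beta(a+x, b+n−x) kernel, so m(s) never needs explicit computation. The sketch below checks this numerically:

```python
import math

def kernel(theta, a, b, x, n):
    """Unnormalized posterior: Beta(a, b) prior kernel times the binomial
    likelihood kernel theta**x * (1 - theta)**(n - x)."""
    return theta ** (a + x - 1) * (1 - theta) ** (b + n - x - 1)

def beta_pdf(theta, a, b):
    """Density of a Beta(a, b) distribution on [0, 1]."""
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * theta ** (a - 1) * (1 - theta) ** (b - 1)

a, b, x, n = 2.0, 2.0, 7, 10  # assumed prior parameters and data

# Normalize the kernel numerically on a grid and compare with the
# Beta(a + x, b + n - x) density recognized analytically.
grid = [i / 1000 for i in range(1001)]
norm = sum(kernel(t, a, b, x, n) for t in grid) / 1000
theta = 0.6
print(kernel(theta, a, b, x, n) / norm, beta_pdf(theta, a + x, b + n - x))
```

The two printed values agree (up to the error of the numerical grid), confirming that the posterior deduced by matching functional forms is the same one obtained by explicitly dividing by the normalizing constant.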