Week 1 Review: Statistical Model. A statistical model for some data is a set of distributions, one of which corresponds to the true unknown distribution that produced the data.

Presentation transcript:

Review: Statistical Model
A statistical model for some data is a set of distributions {f_θ : θ ∈ Ω}, one of which corresponds to the true unknown distribution that produced the data. The statistical model corresponds to the information a statistician brings to the application about what the true distribution is, or at least what he or she is willing to assume about it. The variable θ is called the parameter of the model, and the set Ω is called the parameter space. From the definition of a statistical model, we see that there is a unique value θ ∈ Ω such that f_θ is the true distribution that generated the data. We refer to this value as the true parameter value.

Examples
Suppose there are two manufacturing plants for machines. It is known that the life lengths of machines built by the first plant have an Exponential(1) distribution, while machines manufactured by the second plant have life lengths distributed Exponential(1.5). You have purchased five of these machines, and you know that all five came from the same plant but not which plant. Further, you observe the life lengths of these machines, obtaining a sample (x₁, …, x₅), and want to make inference about the true distribution of the life lengths of these machines. Here the statistical model consists of just two distributions, {Exponential(1), Exponential(1.5)}.

Suppose we have observations of heights in cm of individuals in a population, and we feel it is reasonable to assume that the distribution of height in the population is normal with some unknown mean and variance. The statistical model in this case is {N(μ, σ²) : (μ, σ²) ∈ Ω}, where Ω = R × R⁺ and R⁺ = (0, ∞).
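As a minimal sketch of the first example (not from the slides; the observed life lengths are made-up numbers, and Exponential(λ) is taken here to mean rate λ), we can compare how well each candidate distribution explains the sample:

    import math

    # Hypothetical life lengths of the five purchased machines (made-up data).
    x = [0.8, 1.3, 0.4, 2.1, 0.9]

    # Exponential(rate) log-likelihood of a sample: sum of log(rate) - rate * t.
    # The parameter space here is Omega = {1, 1.5}.
    def log_likelihood(rate, data):
        return sum(math.log(rate) - rate * t for t in data)

    for rate in (1.0, 1.5):
        print(f"rate {rate}: log-likelihood = {log_likelihood(rate, x):.3f}")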

Bayesian Approach to Inference
The basic underlying principle is that, to be amenable to analysis, all uncertainties need to be described by probabilities. Bayesian statistical inferences about a parameter of interest θ are made in terms of probability statements. Therefore, the prescription of a statistical model alone, as defined above, is incomplete, since it does not tell us how to make probability statements about the unknown true value of θ. In the Bayesian approach to inference, in addition to specifying the model, the researcher prescribes a probability distribution for θ. This leads to a full probability model, which is the joint probability distribution for all observable (data) and unobservable (parameter) quantities in a problem. The model should be consistent with knowledge about the underlying scientific problem and the data collection process.
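As a small sketch of what a full probability model means in practice (not from the slides; the prior and sampling distribution below are illustrative assumptions), one can simulate the pair (θ, data) from its joint distribution by first drawing θ from the prior and then drawing the data given θ:

    import random

    # Illustrative full probability model for heights in cm:
    # theta ~ N(170, 10^2) (prior on the mean), data | theta ~ N(theta, 6^2), n = 5.
    random.seed(1)

    theta = random.gauss(170, 10)                       # unobservable quantity (parameter)
    data = [random.gauss(theta, 6) for _ in range(5)]   # observable quantities

    print("theta:", round(theta, 1))
    print("data: ", [round(d, 1) for d in data])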

Important Note
In the Bayesian framework we distinguish between two kinds of estimands, that is, unobserved quantities for which statistical inferences are made. First, quantities that are not directly observed: parameters that govern the hypothetical process leading to the observed data, for example the mean height in the example above. Second, potentially observable quantities, such as future observations from a process, for example the life lengths of machines produced by another plant.

Why Bayesian Approach?
Many statisticians prefer to develop statistical theory without the additional ingredients necessary for a full probability description of the unknowns, motivated by the desire to avoid prescribing the extra model ingredients the Bayesian formulation requires. In general, we would prefer a statistical analysis based on the fewest and weakest model assumptions possible, e.g. distribution-free methods. However, there is a price for this weakening, which typically manifests itself in ambiguities about how inference should proceed. The Bayesian formulation in essence removes the ambiguity, but at the price of a more involved model.

Bayesian Versus Frequentist Methods
Frequentist methods are based on repeated sampling properties, e.g. confidence intervals as discussed in STA261. The Bayesian approach to inference is sometimes presented as antagonistic to such methods. However, the Bayesian model arises naturally from the statistician assuming more ingredients for the model, so it is up to the statistician to decide which ingredients can be justified and then use the appropriate methods. Nevertheless, we must be wary of all model assumptions, since when they are inappropriate our inferences may be invalid. We will discuss model checking procedures later on.

The Prior Distribution
The Bayesian model for inference contains the statistical model {f_θ : θ ∈ Ω} for the data and adds to this the prior probability measure П for θ. The prior describes the statistician's beliefs about the true value of the parameter θ a priori, i.e., before observing the data. Note that the statistical model is then a set of conditional distributions for the data given θ. Example: suppose the parameter of interest θ is the probability of getting a head on the toss of a coin. The parameter space is Ω = [0, 1], and any probability distribution on [0, 1] can serve as a prior for θ; one common choice is sketched below.
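A common and convenient choice of prior on [0, 1] (not specified on the slide; the hyperparameters below are illustrative) is a Beta(a, b) distribution:

    import math

    # Beta(a, b) prior density on theta in [0, 1]; a and b encode prior beliefs.
    def beta_pdf(theta, a, b):
        const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
        return const * theta ** (a - 1) * (1 - theta) ** (b - 1)

    # Beta(1, 1) is the uniform prior on [0, 1]; Beta(10, 10) concentrates
    # prior belief around theta = 0.5, i.e., a roughly fair coin.
    for theta in (0.1, 0.5, 0.9):
        print(theta, round(beta_pdf(theta, 1, 1), 3), round(beta_pdf(theta, 10, 10), 3))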

Important Comments
The probabilities prescribed by the prior represent beliefs. Where do these beliefs come from in an application? Sometimes they come from previous experience with the random system under investigation, or perhaps with related systems; however, this is rarely the case in reality. In fact, the prior, like the statistical model, is often a somewhat arbitrary construction used to drive the statistician's investigations. This may call into question the relevance of the derived inferences to the practical context if the model ingredients suffer from this arbitrariness. This is where the concept of model checking comes into play. From now on we assume that all the ingredients make sense, but remember that in an application these must be checked if the inferences drawn are to be practically meaningful.

The Prior Predictive Distribution
The ingredients of the Bayesian formulation for inference include a marginal distribution for θ, namely the prior П, and a set of conditional distributions for the data s given θ. By the law of total probability, these ingredients specify a joint distribution for θ and s, which is given by

π(θ) f_θ(s),

where π denotes the density (or probability function) of the prior. The marginal distribution for the data s is given by

m(s) = ∫_Ω π(θ) f_θ(s) dθ

if the prior distribution is absolutely continuous, or by

m(s) = Σ_{θ ∈ Ω} π(θ) f_θ(s)

if the prior distribution is discrete.
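As a concrete sketch (not from the slides), the prior predictive probability of a particular sequence of n coin tosses with x heads can be computed numerically under an assumed Uniform[0, 1] prior; the Riemann sum below is purely illustrative:

    # Prior predictive m(s) for the coin example: f_theta(s) = theta^x (1-theta)^(n-x),
    # prior Uniform[0, 1], so m(s) = integral over [0, 1] of f_theta(s) dtheta.
    n, x = 5, 3
    N = 100_000  # grid resolution for the Riemann sum
    m = sum((k / N) ** x * (1 - k / N) ** (n - x) for k in range(1, N)) / N
    print(f"m(s) = {m:.6f}")  # exact value is x!(n-x)!/(n+1)! = 1/60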

This distribution is referred to as the prior predictive distribution of the data. The prior predictive distribution is the relevant distribution for making probability statements about the data before any data are observed. Similarly, the prior distribution is the relevant distribution to use in making probability statements about θ before observing the data.

The Posterior Distribution
Recall that the principle of conditional probability tells us that P(A) should be replaced by P(A|C) after we are told that C is true. Similarly, after observing the data, the relevant distribution to use in making probability statements about θ is the conditional distribution of θ given the data s. This conditional probability measure is denoted by П(∙|s). It has a density or probability function (whichever is relevant) given by

π(θ | s) = π(θ) f_θ(s) / m(s),

i.e., the joint density of θ and s divided by the marginal density of s. This conditional distribution is called the posterior distribution of θ.
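A minimal grid sketch (not from the slides) of this formula for the coin example, again assuming a Uniform[0, 1] prior and data s with x heads in n tosses:

    # Posterior on a grid: pi(theta | s) = pi(theta) * f_theta(s) / m(s).
    n, x = 5, 3
    grid = [k / 1000 for k in range(1001)]
    prior = [1.0 for _ in grid]                       # Uniform[0, 1] prior density
    like = [t**x * (1 - t)**(n - x) for t in grid]    # f_theta(s)
    unnorm = [p * l for p, l in zip(prior, like)]     # joint density pi(theta) f_theta(s)
    m = sum(unnorm) / len(grid)                       # Riemann approximation of m(s)
    post = [u / m for u in unnorm]                    # posterior density on the grid

    # Posterior mean; the exact answer here is (x + 1) / (n + 2) = 4/7.
    mean = sum(t * p for t, p in zip(grid, post)) / len(grid)
    print(f"posterior mean = {mean:.4f}")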

Important Comments
The use of the posterior distribution is sometimes referred to as an application of Bayes' rule. Note that the choice to use the posterior distribution for probability statements about θ is an axiom, or principle, and not a theorem. The prior predictive m(s) of the data s is referred to as the inverse normalizing constant for the posterior density. This means that the posterior density is proportional to π(θ) f_θ(s) as a function of θ, and to convert it into a proper density function we only need to divide by m(s). In many examples we do not need to compute the inverse normalizing constant, as we can recognize the functional form of the posterior, as a function of θ, from the expression π(θ) f_θ(s), and so immediately deduce the posterior probability distribution of θ.
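For example (a standard worked case, continuing the coin-tossing example with an assumed Beta(α, β) prior): if the data s consist of x heads in n tosses, then as a function of θ

π(θ) f_θ(s) ∝ θ^(α-1) (1-θ)^(β-1) · θ^x (1-θ)^(n-x) = θ^(x+α-1) (1-θ)^(n-x+β-1),

which is the kernel of a Beta(x+α, n-x+β) density. Hence the posterior distribution of θ is Beta(x+α, n-x+β), and m(s) never needs to be computed explicitly.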

Examples