732A36 Theory of Statistics
Course within the Master's program in Statistics and Data mining, fall semester 2011

Course details
Course web:
Course responsible, tutor and examiner: Anders Nordgaard
Course period: Nov 2011 - Jan 2012
Examination: written exam in January 2012, plus compulsory assignments
Course literature: Garthwaite PH, Jolliffe IT and Jones B (2002). Statistical Inference. 2nd ed. Oxford University Press, Oxford. ISBN

Course contents
- Statistical inference in general
- Point estimation (unbiasedness, consistency, efficiency, sufficiency, completeness)
- Information and likelihood concepts
- Maximum-likelihood and method-of-moments estimation
- Classical hypothesis testing (power functions, the Neyman-Pearson lemma, maximum likelihood ratio tests, Wald's test)
- Confidence intervals
- ...

Course contents, cont.
- Statistical decision theory (loss functions, risk concepts, prior distributions, sequential tests)
- Bayesian inference (estimation, hypothesis testing, credible intervals, predictive distributions)
- Non-parametric inference
- Computer-intensive methods for estimation

Details about teaching and examination
Teaching is (as usual) sparse: a mixture of lectures and problem seminars.
Lectures: overview and some details of each chapter covered. No full coverage of the contents!
Problem seminars: discussions of solutions to recommended exercises. Students should be prepared to present solutions on the board!
Towards the end of the course a couple of larger compulsory assignments (whose solutions need to be worked out with the help of a computer) will be distributed.
The course concludes with a written exam.

Prerequisites
- Good understanding of calculus and algebra
- Good understanding of the concepts of expectations (including variance calculations)
- Familiarity with families of probability distributions (normal, exponential, binomial, Poisson, gamma (chi-square), beta, ...)
- Skills in computer programming (e.g. R, SAS, Matlab)

Statistical inference in general
[Diagram: Population - Sample - Model]
Conclusions about the population are drawn from the sample with the assistance of a specified model.

The two paradigms: Neyman-Pearson (frequentist) and Bayesian
[Diagram: Population - Sample - Model]
Neyman-Pearson: the model specifies the probability distribution for data obtained in a sample, including a number of unknown population parameters.
Bayesian: the model specifies the probability distribution for data obtained in a sample, together with a probability distribution (prior) for each of the unknown population parameters of that distribution.

How is inference made?
Point estimation: find the "best" approximation of an unknown population parameter.
Interval estimation: find a range of values that with high certainty covers the unknown population parameter (can be extended to regions if the parameter is multidimensional).
Hypothesis testing: make statements about the population (values of parameters, probability distributions, issues of independence, ...) along with a quantitative measure of "certainty".

Tools for making inference
- Criteria for a point estimate to be "good"
- "Algorithmic" methods to find point estimates (maximum likelihood, least squares, method of moments)
- Classical methods of constructing hypothesis tests (the Neyman-Pearson lemma, maximum likelihood ratio tests, ...)
- Classical methods of constructing confidence intervals (regions)
- Decision theory (making use of loss and risk functions, utility and cost) to find point estimates and hypothesis tests
- Prior distributions to construct tests, credible intervals and predictive distributions (Bayesian inference)

Tools for making inference, cont.
- Theory of randomization to form non-parametric tests (tests not depending on any probability distribution behind the data)
- Computer-intensive methods (bootstrap and cross-validation techniques)
- Advanced models that make use of auxiliary information (explanatory variables): generalized linear models, generalized additive models, spatio-temporal models, ...

The univariate population-sample model
The population to be investigated is such that the values that come out in a sample x_1, x_2, ... are governed by a probability distribution.
The probability distribution is represented by a probability density (or mass) function f(x).
Alternatively, the sample values can be seen as the outcomes of independent random variables X_1, X_2, ..., all with probability density (or mass) function f(x).

Point estimation (frequentist paradigm)
We have a sample x = (x_1, ..., x_n) from a population.
The population contains an unknown parameter θ.
The functional forms of the distributional functions may be known or unknown, but they depend on the unknown θ. Denote generally by f(x; θ) the probability density or mass function of the distribution.
A point estimate of θ is a function of the sample values such that its values should be close to the unknown θ.

"Standard" point estimates
The sample mean x̄ is a point estimate of the population mean μ.
The sample variance s² is a point estimate of the population variance σ².
The sample proportion p of a specific event (a specific value or range of values) is a point estimate of the corresponding population proportion π.
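These three estimates are straightforward to compute; a minimal sketch in Python (the data vector and the event {X > 5} are arbitrary choices for illustration):

```python
import numpy as np

x = np.array([4.1, 5.3, 3.8, 6.0, 5.1, 4.7])  # hypothetical sample

mean_hat = x.mean()          # point estimate of the population mean
var_hat = x.var(ddof=1)      # sample variance s^2 (divisor n - 1)
prop_hat = np.mean(x > 5.0)  # sample proportion of the event {X > 5}

print(mean_hat, var_hat, prop_hat)
```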

Assessing a point estimate
A point estimate has a sampling distribution: replace the sample observations x_1, ..., x_n with their corresponding random variables X_1, ..., X_n in the functional expression. The point estimate then becomes a random variable that is observed in the sample (a point estimator).
As a random variable, the point estimator has a probability distribution that can be deduced from f(x; θ).
The point estimator/estimate is assessed by investigating its sampling distribution, in particular its mean and variance.
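The sampling distribution can also be explored by simulation; a small sketch, assuming (purely for illustration) an exponential population with a chosen rate:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 2.0, 30, 10_000  # rate, sample size, replications (all chosen arbitrarily)

# Each replication draws a fresh sample and evaluates the estimator on it
means = np.array([rng.exponential(1 / theta, n).mean() for _ in range(reps)])

print(means.mean())  # close to the population mean 1/theta = 0.5
print(means.var())   # close to Var(X)/n = 1/(n * theta**2)
```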

Unbiasedness
A point estimator θ̂ is unbiased for θ if the mean of its sampling distribution is equal to θ, i.e. E(θ̂) = θ.
The bias of a point estimator θ̂ of θ is b(θ̂) = E(θ̂) - θ.
Thus, a point estimator with bias 0 is unbiased; otherwise it is biased.

Examples (within the univariate population-sample model)
The sample mean is always unbiased for estimating the population mean.
Is the sample mean an unbiased estimate of the population median?
Why do we divide by n - 1 in the sample variance (and not by n)?
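The last question can be checked empirically; a sketch under assumed settings (standard normal population, arbitrary sample size and replication count):

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 10, 100_000  # sample size and number of replications (arbitrary)

samples = rng.standard_normal((reps, n))   # N(0, 1) population, true variance 1
s2_unbiased = samples.var(axis=1, ddof=1)  # divisor n - 1
s2_biased = samples.var(axis=1, ddof=0)    # divisor n

print(s2_unbiased.mean())  # approx. 1.0
print(s2_biased.mean())    # approx. (n - 1)/n = 0.9, i.e. biased downwards
```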

Consistency
A point estimator θ̂ = θ̂(X_1, ..., X_n) is (weakly) consistent if P(|θ̂ - θ| > ε) → 0 as n → ∞, for every ε > 0.
Thus, the point estimator should converge in probability to θ.
Theorem: a point estimator is consistent if it is asymptotically unbiased and its variance tends to zero, i.e. E(θ̂) → θ and Var(θ̂) → 0 as n → ∞.
Proof: use Chebyshev's inequality in terms of E((θ̂ - θ)²).
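The Chebyshev step can be spelled out; a compressed sketch of the standard argument (not necessarily the slide's exact derivation):

```latex
% Markov's inequality applied to (\hat{\theta} - \theta)^2 gives
P\bigl(|\hat{\theta} - \theta| > \varepsilon\bigr)
  \le \frac{E\bigl[(\hat{\theta} - \theta)^2\bigr]}{\varepsilon^2}
  = \frac{\operatorname{Var}(\hat{\theta}) + b(\hat{\theta})^2}{\varepsilon^2}
  \longrightarrow 0
% since both the variance and the bias b(\hat{\theta}) = E(\hat{\theta}) - \theta tend to 0.
```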

Examples
The sample mean is a consistent estimator of the population mean. What probability law can be applied?
What do we require for the sample variance to be a consistent estimator of the population variance?

Efficiency
Assume we have two unbiased estimators of θ, i.e. E(θ̂_1) = E(θ̂_2) = θ. Then θ̂_1 is said to be more efficient than θ̂_2 if Var(θ̂_1) < Var(θ̂_2).
The relative efficiency of θ̂_1 with respect to θ̂_2 is defined as e(θ̂_1, θ̂_2) = Var(θ̂_2) / Var(θ̂_1).

Example
Let ...
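The worked example on this slide did not survive the transcript; as an illustrative stand-in (not necessarily the original example), compare the sample mean and the sample median, which are both unbiased for the mean μ of a normal population:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 25, 20_000  # arbitrary sample size and replication count

samples = rng.standard_normal((reps, n))  # N(mu, 1) population with mu = 0
means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

# Both estimators are unbiased for mu; the mean has the smaller variance
print(means.var(), medians.var())
print(medians.var() / means.var())  # relative efficiency, approx. pi/2 for large n
```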

Likelihood function
For a sample x = (x_1, ..., x_n), the likelihood function for θ is defined as L(θ; x) = ∏ f(x_i; θ), and the log-likelihood function is l(θ; x) = ln L(θ; x) = Σ ln f(x_i; θ).
Both measure how likely (or expected) the sample is under different values of θ.
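A minimal numerical sketch, assuming an exponential model with rate θ (the data vector is hypothetical):

```python
import numpy as np

x = np.array([0.4, 1.2, 0.7, 2.1, 0.3])  # hypothetical sample

def loglik(theta, x):
    """Log-likelihood of an Exp(theta) sample, density theta * exp(-theta * x)."""
    return len(x) * np.log(theta) - theta * x.sum()

grid = np.linspace(0.01, 5, 1000)
mle = grid[np.argmax(loglik(grid, x))]
print(mle, 1 / x.mean())  # grid maximizer vs the closed-form MLE n / sum(x_i)
```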

Fisher information
The (Fisher) information about θ contained in a sample x is defined as I(θ) = E[(∂/∂θ ln L(θ; X))²].
Theorem: under some regularity conditions (interchangeability of integration and differentiation), I(θ) = -E[∂²/∂θ² ln L(θ; X)].
In particular, the range of X cannot depend on θ (such as in a population where X ~ U(0, θ)).

Why is it a measure of information about θ?
The larger I(θ) is, the more sharply the log-likelihood is expected to peak around the true parameter value, i.e. the better the sample can discriminate between nearby values of θ.

Example: X ~ Exp(θ)
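The slide's computation is not preserved; a worked version under the rate parametrization f(x; θ) = θ e^(-θx) (the parametrization is an assumption):

```latex
\ln L(\theta; x) = n \ln\theta - \theta \sum_{i=1}^{n} x_i
\qquad
\frac{\partial^2}{\partial\theta^2} \ln L(\theta; x) = -\frac{n}{\theta^2}
\qquad
I(\theta) = -E\!\left[\frac{\partial^2}{\partial\theta^2} \ln L(\theta; X)\right]
          = \frac{n}{\theta^2}
```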

Cramér-Rao inequality
Under the same regularity conditions as for the previous theorem, the following holds for any unbiased estimator θ̂ of θ: Var(θ̂) ≥ 1/I(θ).
The lower bound is attained if and only if the score is proportional to the estimation error, i.e. ∂/∂θ ln L(θ; x) = k(θ)(θ̂ - θ) for some function k(θ).

Proof:
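The proof on the original slides was not captured; a compressed version of the standard Cauchy-Schwarz argument:

```latex
% Let U = \frac{\partial}{\partial\theta} \ln L(\theta; X) be the score.
% Under the regularity conditions, E(U) = 0 and Var(U) = I(\theta).
% Differentiating E(\hat{\theta}) = \theta under the integral sign:
\operatorname{Cov}(\hat{\theta}, U)
  = E(\hat{\theta}\, U)
  = \int \hat{\theta}(x)\, \frac{\partial}{\partial\theta} L(\theta; x)\, dx
  = \frac{\partial}{\partial\theta} E(\hat{\theta}) = 1
% The Cauchy-Schwarz inequality then gives
1 = \operatorname{Cov}(\hat{\theta}, U)^2
  \le \operatorname{Var}(\hat{\theta})\, \operatorname{Var}(U)
\quad\Longrightarrow\quad
\operatorname{Var}(\hat{\theta}) \ge \frac{1}{I(\theta)}
```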


Example: X ~ Exp(θ)
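Again the slide's algebra is missing; continuing the assumed rate parametrization from before, the bound works out as:

```latex
% From the earlier computation, I(\theta) = n/\theta^2, so any unbiased
% estimator of \theta satisfies
\operatorname{Var}(\hat{\theta}) \ge \frac{\theta^2}{n}
% For the function g(\theta) = 1/\theta = E(X), the corresponding bound is
\frac{[g'(\theta)]^2}{I(\theta)} = \frac{1/\theta^4}{n/\theta^2} = \frac{1}{n\theta^2}
% which \bar{X} attains exactly, since Var(\bar{X}) = Var(X)/n = 1/(n\theta^2).
```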

Sufficiency
A function T of the sample values of a sample x, i.e. T = T(x) = T(x_1, ..., x_n), is a statistic that is sufficient for the parameter θ if the conditional distribution of the sample random variables given T does not depend on θ, i.e. P(X_1 = x_1, ..., X_n = x_n | T = t; θ) is free of θ.
What does it mean in practice?
- If T is sufficient for θ, then no more information about θ than what is contained in T can be obtained from the sample.
- It is enough to work with T when deriving point estimates of θ.

Example
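The original example is not shown in the transcript; a standard one (a Bernoulli sample, which is an assumption here) runs as follows:

```latex
% X_1, \ldots, X_n i.i.d. Bernoulli(\theta), T = \sum X_i \sim Bin(n, \theta).
P(X_1 = x_1, \ldots, X_n = x_n \mid T = t)
  = \frac{\theta^{t}(1-\theta)^{n-t}}{\binom{n}{t}\theta^{t}(1-\theta)^{n-t}}
  = \frac{1}{\binom{n}{t}}
% which is free of \theta, so T is sufficient for \theta.
```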


The factorization theorem: T is sufficient for θ if and only if the likelihood function can be written L(θ; x) = g(T(x); θ) · h(x), i.e. it can be factorized using two non-negative functions such that the first depends on x only through the statistic T (and also on θ) and the second does not depend on θ.

Example, cont.: X ~ Exp(θ)
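Under the same assumed rate parametrization as in the earlier exponential examples, the factorization is immediate:

```latex
L(\theta; x) = \prod_{i=1}^{n} \theta e^{-\theta x_i}
  = \underbrace{\theta^{n} e^{-\theta \sum x_i}}_{g(T(x);\,\theta)}
    \cdot \underbrace{1}_{h(x)}
\qquad \text{with } T(x) = \sum_{i=1}^{n} x_i
% so T = \sum X_i is sufficient for \theta.
```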
