Class 2 Statistical Inference Lionel Nesta Observatoire Français des Conjonctures Economiques CERAM February-March-April 2008

Hypothesis Testing

The Notion of Hypothesis in Statistics  Expectation  A hypothesis is a conjecture, an expected explanation of why a given phenomenon occurs  Operationality  A hypothesis must be precise, univocal and quantifiable  Refutability  The result of a given experiment must give rise to either the refutation or the corroboration of the tested hypothesis  Replicability  Exclude ad hoc, local arrangements from the experiment, and seek universality

Examples of Good and Bad Hypotheses  « The shares of Peugeot and Citroën have the same variance »  « God exists! »  « In general, the closure of a given production site in Europe is positively associated with the share price of the company on financial markets. »  « Knowledge has a positive impact on economic growth »

Hypothesis Testing  In statistics, hypothesis testing aims at accepting or rejecting a hypothesis  The statistical hypothesis is called the "null hypothesis" H0  The null hypothesis proposes something initially presumed true.  It is rejected only when it becomes evidently false, that is, when the researcher has a certain degree of confidence, usually 95% to 99%, that the data do not support the null hypothesis.  The alternative hypothesis (or research hypothesis) H1 is the complement of H0.

Hypothesis Testing  There are two kinds of hypothesis testing:  A homogeneity test compares the means of two samples.  H0: Mean(x) = Mean(y) ; Mean(x) = 0  H1: Mean(x) ≠ Mean(y) ; Mean(x) ≠ 0  A conformity test looks at whether the distribution of a given sample follows a given distribution law (normal/Gaussian, Poisson, binomial).  H0: ℓ(x) = ℓ*(x)  H1: ℓ(x) ≠ ℓ*(x)

The Four Steps of Hypothesis Testing 1. Spelling out the null hypothesis H0 and the alternative hypothesis H1. 2. Computation of a statistic corresponding to the distance between two sample means (homogeneity test) or between the sample and the distribution law (conformity test). 3. Computation of the (critical) probability of observing what one observes. 4. Conclusion of the test according to an agreed threshold around which one arbitrates between H0 and H1.

The Logic of Hypothesis Testing  We need to say something about the reliability (or representativeness) of a mean  Law of large numbers; Central Limit Theorem  The notion of confidence interval  Once done, we can assess whether two means are alike  If so (not), their confidence intervals are (not) overlapping

Statistical Inference  In real life, calculating the parameters of a population is prohibitive because populations are very large.  Rather than investigating the whole population, we take a sample, calculate a statistic related to the parameter of interest, and make an inference.  The sampling distribution of the statistic is the tool that tells us how close the statistic is to the parameter.

Prerequisite: Standard Normal Distribution

Two Prerequisites  Law of large numbers  The law of large numbers tells us that the sample mean will converge to the population (true) mean as the sample size increases.  Central Limit Theorem  The Central Limit Theorem tells us that for many samples of like and sufficiently large size, the histogram of these sample means will appear to be a normal distribution.
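The two prerequisites above can be checked empirically. A minimal Python sketch (not from the slides; the seed and sample sizes are arbitrary choices) simulating fair dice, whose true mean is 3.5:

```python
# Sketch: simulate the law of large numbers and the Central Limit
# Theorem with fair six-sided dice (true mean = 3.5).
import random
import statistics

random.seed(42)

# Law of large numbers: the sample mean approaches 3.5 as n grows.
small = [random.randint(1, 6) for _ in range(50)]
large = [random.randint(1, 6) for _ in range(50_000)]
print(statistics.mean(small), statistics.mean(large))

# Central Limit Theorem: the means of many same-size samples pile up
# around 3.5 in a roughly bell-shaped histogram.
sample_means = [
    statistics.mean(random.randint(1, 6) for _ in range(30))
    for _ in range(2_000)
]
print(statistics.mean(sample_means))  # close to 3.5
```

The mean over 50,000 rolls sits much closer to 3.5 than the mean over 50 rolls, and the 2,000 sample means cluster tightly around 3.5.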

The Dice Experiment — for a single fair die, each value x = 1, …, 6 has P(X = x) = 1/6.

The Dice Experiment (n = 2)

[Figure: distribution for two dice — the probabilities rise from 1/36 to 6/36 and fall back: 1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, 1/36]
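The triangular distribution of the two-dice experiment can be recovered exactly by enumerating all 36 equally likely outcomes. A short sketch (illustrative, not from the slides):

```python
# Sketch: enumerate all 36 outcomes of two fair dice to recover the
# triangular distribution (1/36 up to 6/36 for a sum of 7, back down).
from fractions import Fraction
from collections import Counter

counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
probs = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

for s, p in probs.items():
    print(s, p)  # e.g. sum 2 -> 1/36, sum 7 -> 1/6, sum 12 -> 1/36
```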

The Normal Distribution In probability, a random variable follows a normal distribution law (also called the Gaussian or Laplace-Gauss distribution) with expectation μ and standard deviation σ if its probability density function (pdf) is f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²)). This law is written N(μ, σ²). The density function of a normal distribution is symmetrical.

Normal Distributions for Different Values of μ and σ

The Standard Normal Distribution The standard normal distribution, also called the Z distribution, is the normal distribution with mean μ = 0 and standard deviation σ = 1. It is written N(0,1). Any random variable following a normal law can be standardized via the transformation z = (x − μ) / σ.

[Figure: empirical rule — 68% of observations within ±1σ, 95% within ±2σ, 99.7% within ±3σ]

The Standard Normal Distribution [Figure: 95% of observations lie within the central interval; 2.5% in each tail]

The Standard Normal Distribution (z scores) [Figure: by symmetry, P(Z < 0) = P(Z ≥ 0) = 0.5]

Probability of an event (z = 0.51): P(Z ≥ 0.51)

 The z-score is used to compute the probability of obtaining an observed score.  Example  Let z = 0.51. What is the probability of observing z = 0.51?  It is the probability of observing z ≥ 0.51: P(z ≥ 0.51) = ??

Standard Normal Distribution Table [z table omitted]

Probability of an event (z = 0.51)  The z-score is used to compute the probability of obtaining an observed score.  Example  Let z = 0.51. What is the probability of observing z = 0.51?  It is the probability of observing z ≥ 0.51: P(z ≥ 0.51)  P(z ≥ 0.51) = 0.305
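Instead of reading the z table, the tail probability can be computed directly from the standard normal CDF. A minimal sketch in Python:

```python
# Sketch: compute P(Z >= 0.51) from the standard normal CDF instead of
# looking it up in the z table.
from statistics import NormalDist

z = 0.51
p = 1 - NormalDist().cdf(z)  # upper-tail probability
print(round(p, 4))  # ≈ 0.3050, matching the table lookup
```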

Example  Suppose that for a population students of a famous business school in Sophia-Antipolis, grades are distributed normal with an average of 10 and a standard deviation of 3. What proportion of them  Exceeds 12 ; Exceeds 15  Does not exceed 8 ; Does not exceed 12  Let the mean μ = 10 and standard deviation σ = 3:

Confidence Interval

Inverting the way of thinking  Until now, we have thought in terms of an observation x and sample values μ and σ to produce the z score.  Let us now imagine that we do not know x, but we know μ and σ. If we consider any interval around the mean, we can write: P(μ − z·σ ≤ x ≤ μ + z·σ) = 1 − α

Inverting the way of thinking  If z ∈ [−2.55; +2.55], we know that 99% of z-scores will fall within the range  If z ∈ [−1.64; +1.64], we know that 90% of z-scores will fall within the range  Let us now consider an interval which comprises 95% of observations. Looking at the z table, we know that z = 1.96

Confidence Interval  In statistics, a confidence interval is an interval within which the value of a parameter (here, the mean) is likely to lie. Instead of estimating the parameter by a single value, an interval of likely estimates is given.  Confidence intervals are used to indicate the reliability of an estimate.  A1. The sample mean is a random variable following a normal distribution  A2. The sample values x̄ and s are good approximations of the population values μ and σ.

The Central Limit Theorem  If a random sample is drawn from any population, the sampling distribution of the sample mean is approximately normal for a sufficiently large sample size.  The larger the sample size, the more closely the sampling distribution of the sample mean will resemble a normal distribution.

Moments of Sample Mean: The Mean E(x̄) = μ. On average, the sample mean will be on target, that is, equal to the population mean.

Moments of Sample Mean: The Variance Var(x̄) = σ²/n. The standard deviation of the sample mean, σ/√n, represents the estimation error of the sample mean, and is therefore called the standard error.
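The σ/√n behaviour of the standard error can be verified by simulation. A minimal sketch (the seed and replication count are arbitrary choices):

```python
# Sketch: check empirically that the standard deviation of sample means
# (the standard error) shrinks like sigma / sqrt(n).
import random
import statistics
import math

random.seed(0)
SIGMA = 1.0

def empirical_se(n, reps=4000):
    """Standard deviation of `reps` sample means of size n."""
    means = [statistics.mean(random.gauss(0, SIGMA) for _ in range(n))
             for _ in range(reps)]
    return statistics.stdev(means)

for n in (4, 16, 64):
    print(n, round(empirical_se(n), 3), round(SIGMA / math.sqrt(n), 3))
```

Quadrupling the sample size halves the standard error, as the formula predicts.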

The Sampling Distribution of the Sample Mean

Confidence Interval  General definition: x̄ ± z(α/2) · σ/√n  Definition for 95% CI: x̄ ± 1.96 · σ/√n  Definition for 90% CI: x̄ ± 1.64 · σ/√n

Standard Normal Distribution and CI [Figure: intervals comprising 90%, 95% and 99.7% of observations]

 Let us draw a sample of 25 students from CERAM (n = 25), with x̄ = 10 and σ = 3. Let us build the 95% CI: 10 ± 1.96 × 3/√25 = [8.82; 11.18] Application of Confidence Interval

CERAM Average Grades [Figure: a 95% chance that the mean is indeed located within this interval]

 Let us draw a sample of 25 students from CERAM (n = 25), with x̄ = 10 and σ = 3. Let us build the 95% CI: [8.82; 11.18] Application of Confidence Interval  Let us draw a sample of 30 students from HEC (n = 30), with x̄ = 11.5 and σ = 4.7. Let us build the 95% CI: 11.5 ± 1.96 × 4.7/√30 = [9.82; 13.18]

HEC Average Grades [Figure: a 95% chance that the mean is indeed located within this interval]
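The two intervals can be computed and compared in a few lines. A sketch using the slide's numbers and the z value 1.96 (population σ assumed known):

```python
# Sketch: 95% confidence intervals for the CERAM and HEC samples,
# using x_bar ± 1.96 * sigma / sqrt(n).
import math

def ci95(mean, sigma, n):
    half = 1.96 * sigma / math.sqrt(n)
    return (mean - half, mean + half)

ceram = ci95(10.0, 3.0, 25)    # ≈ (8.82, 11.18)
hec   = ci95(11.5, 4.7, 30)    # ≈ (9.82, 13.18)

# The intervals overlap, so the two means do not differ significantly
# at the 95% level.
overlap = ceram[0] <= hec[1] and hec[0] <= ceram[1]
print(ceram, hec, overlap)
```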

Hypothesis Testing  Hypothesis 1: Students from CERAM have an average grade which is not significantly different from 11  H0: μ(CERAM) = 11  H1: μ(CERAM) ≠ 11  Hypothesis 2: Students from CERAM have similar grades to students from HEC  H0: μ(CERAM) = μ(HEC)  H1: μ(CERAM) ≠ μ(HEC) In both cases, I accept H0 and reject H1.

Comparing the Means Using CIs [Figure: HEC and CERAM confidence intervals] The overlap of the two CIs means that, at the 95% level, the two means do not differ significantly.

 Thus far, we have assumed that we know both the mean and the standard deviation of the population. But in fact we do not know them: both μ and σ are unknown.  The Student t statistic is then preferred to the z statistic. Its distribution is similar (identical to z as n → +∞). The CI becomes x̄ ± t(α/2, n−1) · s/√n. The Student Test

 Let us draw a sample of 25 students from CERAM (n = 25), with x̄ = 10 and s = 3. Let us build the 95% CI: 10 ± 2.064 × 3/√25 = [8.76; 11.24] Application of Student t to CIs  Let us draw a sample of 30 students from HEC (n = 30), with x̄ = 11.5 and s = 4.7. Let us build the 95% CI: 11.5 ± 2.045 × 4.7/√30 = [9.75; 13.25]
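A sketch of the same intervals with Student t critical values in place of 1.96; the values t(0.025, 24) ≈ 2.064 and t(0.025, 29) ≈ 2.045 are taken from a t table, since σ is now unknown and estimated by s:

```python
# Sketch: 95% CIs with the Student t critical value (from a t table)
# instead of z = 1.96. The t intervals are slightly wider.
import math

def t_ci95(mean, s, n, t_crit):
    half = t_crit * s / math.sqrt(n)
    return (mean - half, mean + half)

ceram_t = t_ci95(10.0, 3.0, 25, 2.064)   # t with 24 degrees of freedom
hec_t   = t_ci95(11.5, 4.7, 30, 2.045)   # t with 29 degrees of freedom
print(ceram_t, hec_t)
```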

 Import CERAM_LMC into SPSS  Produce descriptive statistics for sales, labour, and R&D expenses  Analyse → Statistiques descriptives → Descriptives  Options: choose the statistics you may wish  A newspaper writes that, by and large, LMCs have 95,000 employees.  Test statistically whether this is true at the 1% level  Test statistically whether this is true at the 5% level  Test statistically whether this is true at the 10% and 20% levels  Write out H0 and H1  Analyse → Comparer les moyennes → Test t pour échantillon unique  Options: 99, 95, 90% SPSS Application: Student t
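Under the hood, the SPSS one-sample t test computes t = (x̄ − μ0) / (s/√n). A hedged sketch of that computation; the summary numbers below are placeholders for illustration, not the CERAM_LMC data:

```python
# Sketch of what the one-sample t test computes:
#   t = (x_bar - mu0) / (s / sqrt(n))
# The inputs here are hypothetical summary statistics, NOT CERAM_LMC.
import math

def one_sample_t(xbar, s, n, mu0):
    """t statistic for H0: population mean = mu0."""
    return (xbar - mu0) / (s / math.sqrt(n))

t = one_sample_t(xbar=101_000, s=40_000, n=100, mu0=95_000)
print(round(t, 2))  # 1.5
```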

SPSS Application: t test at 99% level

SPSS Application: t test at 95% level

SPSS Application: t test at 80% level

SPSS Results (at 1% level)

Critical probability  The confidence interval is designed in such a way that, for each t statistic chosen, we define the share of observations which the CI comprises.  For large n, when t = 1.96, we have a 95% CI  For large n, when t = 2.55, we have a 99% CI  Actually, to each t there corresponds a share of observations  One can compute the t value directly from our observations as follows: t = (x̄ − μ0) / (s/√n)
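For large n the t statistic is approximately standard normal, so the critical probability attached to an observed t can be computed directly rather than read off a table. A sketch for the t = 1.552 obtained in the SPSS exercise:

```python
# Sketch: two-sided critical probability for t = 1.552, using the
# large-n normal approximation to the t distribution.
from statistics import NormalDist

t = 1.552
p_two_sided = 2 * (1 - NormalDist().cdf(t))
print(round(p_two_sided, 3))  # ≈ 0.12, the 12% quoted on the slide
```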


Critical probability  With t = 1.552, I can conclude the following:  12% probability that μ belongs to the distribution where the population mean = 95,000  I have a 12% chance of wrongly rejecting H0  88% probability that μ belongs to another distribution where the population mean ≠ 95,000  I have an 88% chance of rightly rejecting H0 Shall I accept or reject H0?

[Figure: shaded areas of 6.1% and 88.0%]

Critical probability  With t = 1.552, I can conclude the following:  12% probability that μ belongs to the distribution where the population mean = 95,000  I have a 12% chance of wrongly rejecting H0  88% probability that μ belongs to another distribution where the population mean ≠ 95,000  I have an 88% chance of rightly rejecting H0 I accept H0 !!!

Critical probability  The practice is to reject H0 only when the critical probability is lower than 0.1, or 10%  Some are even more cautious and prefer to reject H0 at a critical probability level of 0.05, or 5%.  In any case, the philosophy of the statistician is to be conservative.

A Direct Comparison of Means Using Student t  Another way to compare two sample means is to calculate the CI of the mean difference: (x̄1 − x̄2) ± t · SE, where the standard error SE is computed from the pooled variance. If 0 does not belong to the CI, then the two samples have significantly different means.
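A sketch of this mean-difference CI using the CERAM (n = 25, x̄ = 10, s = 3) and HEC (n = 30, x̄ = 11.5, s = 4.7) numbers from the earlier slides; the critical value t(0.025, 53) ≈ 2.006 is an assumed table value:

```python
# Sketch: CI of the mean difference with the pooled variance,
#   s_p^2 = ((n1-1)s1^2 + (n2-1)s2^2) / (n1+n2-2)
#   SE    = sqrt(s_p^2 * (1/n1 + 1/n2))
import math

n1, m1, s1 = 25, 10.0, 3.0   # CERAM sample
n2, m2, s2 = 30, 11.5, 4.7   # HEC sample

pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

half = 2.006 * se_diff       # assumed t table value for 53 d.f.
ci = (m2 - m1 - half, m2 - m1 + half)
print(ci)  # 0 lies inside, so the means do not differ significantly
```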

 Another newspaper argues that US companies are much larger than those from the rest of the world. Is this true?  Produce descriptive statistics for labour comparing the two groups  Produce a group variable which equals 1 for US firms, 0 otherwise  This is called a dummy variable  Write out H0 and H1  Analyse → Comparer les moyennes → Test t pour échantillons indépendants  What do you conclude at the 5% level?  What do you conclude at the 1% level? SPSS Application: t test comparing means