Class 2 Statistical Inference Lionel Nesta Observatoire Français des Conjonctures Economiques SKEMA Ph.D programme 2010-2011.


1 Class 2 Statistical Inference Lionel Nesta Observatoire Français des Conjonctures Economiques Lionel.nesta@ofce.sciences-po.fr SKEMA Ph.D programme 2010-2011

2 Hypothesis Testing

3 The Notion of Hypothesis in Statistics  Expectation  A hypothesis is a conjecture, an expected explanation of why a given phenomenon occurs  Operationality  A hypothesis must be precise, univocal and quantifiable  Refutability  The result of a given experiment must give rise to either the refutation or the corroboration of the tested hypothesis  Replicability  Exclude ad hoc, local arrangements from the experiment, and seek universality

4 Examples of Good and Bad Hypotheses « The stocks of Peugeot and Citroën have the same variance » « God exists! » « In general, the closure of a given production site in Europe is positively associated with the share price of the company on financial markets. » « Knowledge has a positive impact on economic growth »

5 Hypothesis Testing  In statistics, hypothesis testing aims at accepting or rejecting a hypothesis  The statistical hypothesis is called the “null hypothesis” H 0  The null hypothesis proposes something initially presumed true.  It is rejected only when it becomes evidently false, that is, when the researcher has a certain degree of confidence, usually 95% to 99%, that the data do not support the null hypothesis.  The alternative hypothesis (or research hypothesis) H 1 is the complement of H 0.

6 Hypothesis Testing  There are two kinds of hypothesis testing:  Homogeneity test compares the means of two samples.  H 0 : Mean( x ) = Mean( y ) ; Mean( x ) = 0  H 1 : Mean( x ) ≠ Mean( y ) ; Mean( x ) ≠ 0  Conformity test looks at whether the distribution of a given sample follows the properties of a distribution law (normal, Gaussian, Poisson, binomial).  H 0 : ℓ( x ) = ℓ*( x )  H 1 : ℓ( x ) ≠ ℓ*( x )

7 The Four Steps of Hypothesis Testing 1.Spelling out the null hypothesis H 0 and the alternative hypothesis H 1. 2.Computation of a statistic corresponding to the distance between two sample means (homogeneity test) or between the sample and the distribution law (conformity test). 3.Computation of the (critical) probability of observing what one observes. 4.Conclusion of the test according to an agreed threshold around which one arbitrates between H 0 and H 1.

8 The Logic of Hypothesis Testing  We need to say something about the reliability (or representativeness) of a sample mean  The law of large numbers; the central limit theorem  The notion of confidence interval  Once done, we can ask whether two means are alike  If they are (are not), their confidence intervals do (do not) overlap

9 Statistical Inference  In real life, calculating the parameters of a population is prohibitive because populations are very large.  Rather than investigating the whole population, we take a sample, calculate a statistic related to the parameter of interest, and make an inference.  The sampling distribution of the statistic is the tool that tells us how close the statistic is to the parameter.

10 Prerequisite 1 Standard Normal Distribution

11 The standard normal distribution, also called the Z distribution, is the probability distribution with mean μ = 0 and standard deviation σ = 1. It is written N (0,1). The Standard Normal Distribution

12 Since the standard deviation is by definition 1, each unit on the horizontal axis represents one standard deviation

13 Because of the shape of the Z distribution (symmetrical), statisticians have computed the probability of occurrence of events for given values of z. The Standard Normal Distribution

14 68% of observations 95% of observations 99.7% of observations

15 The Standard Normal Distribution 95% of observations 2.5%

16 P(Z ≥ 0) P(Z < 0) The Standard Normal Distribution (z scores)

17 P(Z ≥ 0.51) Probability of an event (z = 0.51)

18  The z-score is used to compute the probability of obtaining an observed score.  Example  Let z = 0.51. What is the probability of observing z=0.51?  It is the probability of observing z ≥ 0.51: P(z ≥ 0.51) = ??

19 Standard Normal Distribution Table – upper-tail probabilities P(Z ≥ z)

 z   0.00  0.01  0.02  0.03  0.04  0.05  0.06  0.07  0.08  0.09
0.0  0.500 0.496 0.492 0.488 0.484 0.480 0.476 0.472 0.468 0.464
0.1  0.460 0.456 0.452 0.448 0.444 0.440 0.436 0.433 0.429 0.425
0.2  0.421 0.417 0.413 0.409 0.405 0.401 0.397 0.394 0.390 0.386
0.3  0.382 0.378 0.375 0.371 0.367 0.363 0.359 0.356 0.352 0.348
0.4  0.345 0.341 0.337 0.334 0.330 0.326 0.323 0.319 0.316 0.312
0.5  0.309 0.305 0.302 0.298 0.295 0.291 0.288 0.284 0.281 0.278
0.6  0.274 0.271 0.268 0.264 0.261 0.258 0.255 0.251 0.248 0.245
0.7  0.242 0.239 0.236 0.233 0.230 0.227 0.224 0.221 0.218 0.215
0.8  0.212 0.209 0.206 0.203 0.201 0.198 0.195 0.192 0.189 0.187
0.9  0.184 0.181 0.179 0.176 0.174 0.171 0.169 0.166 0.164 0.161
1.0  0.159 0.156 0.154 0.152 0.149 0.147 0.145 0.142 0.140 0.138
1.6  0.055 0.054 0.053 0.052 0.050 0.050 0.049 0.048 0.047 0.046
1.9  0.029 0.028 0.027 0.027 0.026 0.026 0.025 0.024 0.024 0.023
2.0  0.023 0.022 0.022 0.021 0.021 0.020 0.020 0.019 0.019 0.018
2.5  0.006 0.006 0.006 0.006 0.006 0.005 0.005 0.005 0.005 0.005
2.9  0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.001 0.001 0.001

20 Probability of an event (Z = 0.51)  The Z-score is used to compute the probability of obtaining an observed score.  Example  Let z = 0.51. What is the probability of observing z=0.51?  It is the probability of observing z ≥ 0.51: P(z ≥ 0.51)  P(z ≥ 0.51) = 0.3050
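A quick way to check a table entry such as P(z ≥ 0.51) is Python's standard-library NormalDist (a sketch added for illustration, not part of the original deck):

```python
from statistics import NormalDist

def upper_tail(z):
    """Upper-tail probability P(Z >= z) of the standard normal N(0, 1)."""
    return 1 - NormalDist(0, 1).cdf(z)

print(round(upper_tail(0.51), 3))  # 0.305, the table entry for z = 0.51
```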

21 Prerequisite 2 Normal Distribution

22 Normal distributions are just like standard normal distributions (or z distributions) with different values for the mean μ and standard deviation σ. This law is written N (μ,σ ²). The normal distribution is symmetrical. Normal Distributions

23 The Normal Distribution In probability, a random variable follows a normal distribution law (also called the Gaussian, or Laplace-Gauss, distribution) with mean μ and standard deviation σ if its probability density function is

f(x) = (1 / (σ√(2π))) · exp( −(x − μ)² / (2σ²) )

This law is written N (μ, σ²). The density function of a normal distribution is symmetrical.

24 Normal distributions for different values of μ and σ

25 Still, it would be nice to be able to say something about these distributions just like we did with the z distribution. For example, textile companies (and clothes manufacturers) may be very interested in the distribution of heights of men and women, for a given country (provided that we have all observations). How could we compute the proportion of men taller than 1.80 meters? Standardization of Normal Distributions

26 Assuming that the heights of men are normally distributed, is there any way we could express them in terms of a z distribution? 1.We must center the distribution around 0, expressing any value as a deviation from the mean: (X – μ) 2.We must reduce each deviation, expressing it as a number of standard deviations: (X – μ) / σ Standardization of Normal Distributions

27 Standardization of a normal distribution is the operation of recovering a z distribution from any other distribution, assuming the distribution is normal. It is achieved by centering (around the mean) and reducing (in terms of number of standard deviations) each observation. The obtained z value expresses each observation by its distance from the mean, in terms of number of standard deviations. Standardization of Normal Distributions

28 Example  Suppose that for a population of students of a famous business school in Sophia-Antipolis, grades are normally distributed with an average of 10 and a standard deviation of 3. What proportion of them  Exceeds 12 ; Exceeds 15  Does not exceed 8 ; Does not exceed 12  Let the mean μ = 10 and the standard deviation σ = 3:
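The four proportions can be obtained by standardizing each grade, or directly with NormalDist (a hedged sketch; the population parameters are the slide's μ = 10 and σ = 3):

```python
from statistics import NormalDist

grades = NormalDist(mu=10, sigma=3)  # assumed normal population of grades

p_above_12 = 1 - grades.cdf(12)  # P(X > 12)  ~ 0.25
p_above_15 = 1 - grades.cdf(15)  # P(X > 15)  ~ 0.05
p_below_8 = grades.cdf(8)        # P(X <= 8)  ~ 0.25
p_below_12 = grades.cdf(12)      # P(X <= 12) ~ 0.75
```

Note the symmetry: P(X > 12) = P(X ≤ 8), since 12 and 8 lie at ±0.67 standard deviations from the mean.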

29 Implication 1 Intervals of likely values

30 Inverting the way of thinking  Until now, we have used an observation x, the mean μ and the standard deviation σ to produce the z score.  Let us now imagine that we do not know x, but we do know μ and σ. If we consider any interval, we can write: μ − z·σ ≤ x ≤ μ + z·σ

31 Inverting the way of thinking  If z ∈ [−2.55; +2.55], we know that 99% of z-scores will fall within the range  If z ∈ [−1.64; +1.64], we know that 90% of z-scores will fall within the range  Let us now consider an interval which comprises 95% of observations. Looking at the z table, we know that z = 1.96

32 Example  Take the population of students of this famous business school in Sophia-Antipolis, with an average of 10 and a standard deviation of 3. What is the 99% interval? The 95% interval? The 90% interval?
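Using the slides' multipliers (2.55 for 99%, 1.96 for 95%, 1.64 for 90%), the intervals of likely values around μ = 10 with σ = 3 can be sketched as:

```python
mu, sigma = 10, 3

# z multipliers as used in the slides: 2.55 (99%), 1.96 (95%), 1.64 (90%)
intervals = {level: (mu - z * sigma, mu + z * sigma)
             for level, z in [(99, 2.55), (95, 1.96), (90, 1.64)]}
# intervals[99] ~ (2.35, 17.65); intervals[95] ~ (4.12, 15.88); intervals[90] ~ (5.08, 14.92)
```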

33 Prerequisite 3 Sampling theory

34 The social scientist is not so much interested in the characteristics of the sample itself. Most of the time, the social scientist wants to say something about the population by looking at the sample. In other words, s/he wants to infer something about the population from the sample. Why worry about sampling theory?

35 On the use of random samples The quality of the sample is key to statistical inference. The most important thing is that the sample must be representative of the characteristics of the population. Representativeness is achieved by drawing random samples, where each individual observation has an equal probability of being drawn. Because we would infer wrong conclusions from biased samples, the latter are worse than no sample at all.

36 Use of random samples The quality of the sample is key to statistical inference. The most important thing is that the sample must be representative of the characteristics of the population. Representativeness is achieved by drawing random samples, where each individual observation has an equal probability of being drawn. Hence observations are mutually independent. Because we would infer wrong conclusions from biased samples, the latter are worse than no sample at all.

37 Reliability of random samples The ultimate objective with the use of random samples is to infer something about the underlying population. Ideally, we want the sample mean to be as close as possible to the population mean μ. In other words, we are interested in the reliability of the sample. There are two ways to deal with reliability: 1.Monte Carlo simulation (infinite number of samples) 2.Sampling theory (moments of a distribution)

38 Our goal is to estimate the population mean μ from the sample mean. Is the sample mean a good estimator of the population mean? Reminder: the sample mean is computed as X̄ = (1/n) Σ i X i. The trick is to consider each observation as a random variable, in line with the idea of a random sample. Moment 1 – The Mean

39 What is the expected value of X i – E(X i ) – if I draw it an infinite number of times? Obviously, if samples are random, then the expected value of X i is μ. Working out the math: E(X̄) = E( (1/n) Σ i X i ) = (1/n) Σ i E(X i ) = (1/n) · (n·μ) = μ. On average, the sample mean will be on target, that is, equal to the population mean.

40 Moment 2 – The Variance Doing just the same with the variance, we simply need to know that if two variables are independent, then Var(X + Y) = Var(X) + Var(Y). Hence Var(X̄) = Var( (1/n) Σ i X i ) = (1/n²) Σ i Var(X i ) = (1/n²) · (n·σ²) = σ²/n. The standard deviation of the sample means, σ/√n, represents the estimation error made when approximating the population mean by the sample mean, and is therefore called the standard error.
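The claim that sample means scatter with standard deviation σ/√n can be checked by simulation (a Monte Carlo sketch reusing the SKEMA-style parameters μ = 10, σ = 3, n = 25, which are purely illustrative):

```python
import random
from statistics import mean, stdev

random.seed(42)  # fixed seed so the experiment is reproducible
mu, sigma, n = 10, 3, 25

# Draw 2000 random samples of size n and record each sample mean.
sample_means = [mean(random.gauss(mu, sigma) for _ in range(n))
                for _ in range(2000)]

observed_se = stdev(sample_means)
theoretical_se = sigma / n ** 0.5  # sigma / sqrt(n) = 0.6
```

The observed standard deviation of the 2000 sample means lands close to the theoretical standard error of 0.6.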

41 Forms of sampling distributions With random samples, sample means X̄ vary around the population mean μ with a standard deviation of σ/√n (the standard error).  The law of large numbers tells us that the sample mean will converge to the population (true) mean as the sample size increases. But what about the shape of the distribution, essential if we want to use z-scores?  The central limit theorem tells us that for many samples of the same, sufficiently large size, the histogram of the sample means will look like a normal distribution. The sampling distribution of the mean is thus approximately normal, regardless of the form of the underlying population distribution, provided that the sample size is large enough.

42 The Dice Experiment

Value  P( X = x )
  1      1/6
  2      1/6
  3      1/6
  4      1/6
  5      1/6
  6      1/6

43 The Dice Experiment (n = 2)

44 1 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6/36 5/36 4/36 3/36 2/36 1/36

45 From SKEMA sample grade distribution…

46 …to SKEMA sample mean distribution

47 From SKEMA sample grade distribution… …to SKEMA sample mean distribution

48 Note the change in horizontal axis !!

49 Implication 2 Confidence Interval

50 Confidence Interval  In statistics, a confidence interval is an interval within which the value of a parameter – here the unknown population mean – is likely to be. Instead of estimating the parameter by a single value, an interval of likely estimates is given.  Confidence intervals are used to indicate the reliability of an estimate.  Reminder 1. The sample mean is a random variable following a normal distribution  Reminder 2. The sample values X̄ and σ s can be used to approximate the population mean μ and its standard deviation σ p

51 Remember intervals! Fully known sample mean Sample standard deviation Unknown value

52 Confidence Interval Sample mean used as a guess for population mean Standard error as a guess for standard deviation of errors Unknown value : population mean

53 General definition: CI = X̄ ± z · σ/√n Definition for 95% CI: X̄ ± 1.96 · σ/√n Definition for 90% CI: X̄ ± 1.64 · σ/√n Confidence Interval

54 Standard Normal Distribution and CI 90% of observations 95% of observations 99.7% of observations

55  Let us draw a sample of 25 students from SKEMA (n = 25), with X̄ = 10 and σ = 3. What can we say about the likely values of the population mean μ? Let us build the 95% CI Application of Confidence Interval

56 SKEMA Average grades

57 SKEMA sample mean distribution There is a 95% chance that the population mean is located within the interval [8.8 ; 11.2]

58  Let us draw a sample of 25 students from SKEMA (n = 25), with X̄ = 10 and σ = 3. What can we say about the likely values of the population mean μ? Let us build the 95% CI Application of Confidence Interval  Let us draw a sample of 30 students from HEC (n = 30), with X̄ = 11.5 and σ = 4.7. What can we say about the likely values of the population mean μ? Let us build the 95% CI

59 HEC Sample Mean Distribution There is a 95% chance that the population mean is located within the interval [9.8 ; 13.2]
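Both confidence intervals come from the same formula X̄ ± 1.96 · σ/√n; a short sketch using the slides' summary statistics:

```python
def ci95(xbar, sigma, n):
    """95% confidence interval for the population mean (sigma known)."""
    half = 1.96 * sigma / n ** 0.5
    return xbar - half, xbar + half

skema = ci95(10, 3, 25)    # ~ (8.82, 11.18), the slides' [8.8 ; 11.2]
hec = ci95(11.5, 4.7, 30)  # ~ (9.82, 13.18), the slides' [9.8 ; 13.2]
```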

60 Hypothesis Testing  Hypothesis 1 : Students from SKEMA have an average grade which is not significantly different from 11 at the 95% CI.  H 0 : Mean ( SKEMA ) = 11  H 1 : Mean ( SKEMA ) ≠ 11  Hypothesis 2 : Students from HEC have an average grade which is not significantly different from 11 at the 95% CI.  H 0 : Mean ( HEC ) = 11  H 1 : Mean ( HEC ) ≠ 11 In both cases, I accept H 0 and reject H 1 because 11 is within the confidence interval.

61 Implication 3 Critical probability

62 We have concluded that the mean grade of the population of students from SKEMA is not significantly different from 11. To do so, we had to agree beforehand that the 95% CI was the relevant confidence interval. But it is clear that if we had chosen another confidence interval (90%, 80%), our conclusion might have been different. Example

63 Critical probability The purpose of hypothesis testing is to determine whether there is enough statistical evidence in favor of a certain belief about a parameter. There are two hypotheses  H 0 - the null hypothesis (against your intuition)  H 1 - the alternative hypothesis (what you want to prove)

64 Critical probability The confidence interval is designed in such a way that, for each z statistic chosen, we define the share of observations which the CI comprises.  When z = 1.96, we have a 95% CI  When z = 2.55, we have a 99% CI If the tested value is within the confidence interval, we accept H 0. If the tested value is outside the confidence interval, we reject H 0 and accept H a.

65 Critical probability An alternative method by which the decision about H 0 and H a can be made is the computation of the critical probability – or p-value. What is the threshold value of z at which one concludes in favor of H a against H 0 ? One can compute the z value directly from the sample as z = (X̄ − μ 0 ) / (σ/√n), where μ 0 is the target (or tested) value for the population mean

66 The p-value provides information about the amount of statistical evidence bearing on the null hypothesis. The p-value of a test is the probability of observing a test statistic at least as extreme as the one computed, given that the null hypothesis is true. Critical probability

67 Computing the critical probability  Let us draw a sample of 25 students from SKEMA (n = 25), with X̄ = 10 and σ = 3. What can we say about the likely values of the population mean μ? Let us compute the z value.  Looking at the z table, it is then straightforward to recover the critical probability, at which we are indifferent between accepting and rejecting H 0.
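The z value and its critical probability can also be computed directly (a sketch; the exact figures differ slightly from the slides' table-based 9.8% because of rounding):

```python
from statistics import NormalDist

xbar, mu0, sigma, n = 10, 11, 3, 25    # the slide's sample vs the tested value 11
z = (xbar - mu0) / (sigma / n ** 0.5)  # = -1.667
p_one_tail = NormalDist().cdf(z)       # lower-tail probability ~ 0.048
p_two_tail = 2 * p_one_tail            # two-sided p-value ~ 0.096
```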

68 SKEMA Average grades – 90.2% chance that the mean is located within this interval, with 4.9% in each tail

69 Interpreting the critical probability The probability of observing a test statistic at least as extreme as 11, given that the null hypothesis is true, is 9.8%. We can conclude that the smaller the p-value, the more statistical evidence exists to support the alternative hypothesis. But is 9.8% low enough to reject H 0 and to accept H a ?

70 Interpreting the critical probability  The practice is to reject H 0 only when the critical probability is lower than 0.1, or 10%  Some are even more cautious and prefer to reject H 0 at a critical probability level of 0.05, or 5%.  In any case, the philosophy of the statistician is to be conservative.

71 Interpreting the critical probability If the p-value is less than 1%, there is overwhelming evidence supporting the alternative hypothesis. If the p-value is between 1% and 5%, there is strong evidence supporting the alternative hypothesis. If the p-value is between 5% and 10%, there is weak evidence supporting the alternative hypothesis. If the p-value exceeds 10%, there is no evidence supporting the alternative hypothesis.

72 The p-value can be used when making decisions based on rejection region methods as follows: 1. Define the hypotheses to test, and the required significance level α. 2. Perform the sampling procedure, calculate the test statistic and the p-value associated with it. 3. Compare the p-value to α: reject the null hypothesis only if p < α ; otherwise, do not reject the null hypothesis. Decisions Using the Critical probability

73 If we reject the null hypothesis, we conclude that there is enough evidence to infer that the alternative hypothesis is true. If we do not reject the null hypothesis, we conclude that there is not enough statistical evidence to infer that the alternative hypothesis is true. Decisions Using the Critical probability The alternative hypothesis is the more important one. It represents what we are investigating.

74 Prerequisite 4 Student T test

75  Thus far, we have assumed that we know the standard deviation of the population. But in fact we do not: σ is unknown.  When the sample is small, estimates are less precise. To take account of sample size, we use the t distribution, not the z distribution.  The Student t statistic is then preferred to the z statistic. Its distribution is similar (identical to z as n → +∞). The CI becomes X̄ ± t n−1 · s/√n The Student Test

76  Let us draw a sample of 25 students from SKEMA (n = 25), with X̄ = 10 and s = 3. Let us build the 95% CI Application of Student t to CI’s  Let us draw a sample of 30 students from HEC (n = 30), with X̄ = 11.5 and s = 4.7. Let us build the 95% CI
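With σ unknown, the CI uses the t critical value instead of 1.96. The standard library has no t quantile function, so the sketch below hardcodes the two-sided 95% critical values from a t table (2.064 for 24 degrees of freedom, 2.045 for 29):

```python
# Two-sided 95% t critical values from a t table {degrees of freedom: value}.
T_CRIT_95 = {24: 2.064, 29: 2.045}

def ci95_t(xbar, s, n):
    """95% CI for the population mean when sigma is estimated by s."""
    half = T_CRIT_95[n - 1] * s / n ** 0.5
    return xbar - half, xbar + half

skema = ci95_t(10, 3, 25)    # ~ (8.76, 11.24), slightly wider than the z-based (8.82, 11.18)
hec = ci95_t(11.5, 4.7, 30)  # ~ (9.75, 13.25)
```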

77  Import SKEMA_LMC into Stata  Produce descriptive statistics for sales, labour, and R&D expenses  A newspaper writes that, by and large, LMCs have 95,000 employees.  Test statistically whether this is true at the 1% level  Test statistically whether this is true at the 5% level  Test statistically whether this is true at the 10% and 20% levels  Write out H 0 and H 1 STATA Application: Student t

78 Results (at 1% level)

79 STATA Application: Student t Two ways of computing confidence intervals:
mean var1
ttest var1 == value, options
For example:
mean lnassets
ttest lnassets == 11
Or even manually (for a sample of more than 100 observations):
sum lnassets
display r(mean) - 1.96 * r(sd) / r(N)^(1/2)
display r(mean) + 1.96 * r(sd) / r(N)^(1/2)

80 STATA Application: Student t Stata InstructionDescriptive statistics H0 Ha T value I accept H 0

81 Critical probability  With t = 1.552, I can conclude the following:  12% probability that X̄ belongs to the distribution whose population mean = 95,000  I have a 12% chance of wrongly rejecting H 0  88% probability that X̄ belongs to another distribution whose population mean ≠ 95,000  I have an 88% chance of rightly rejecting H 0 Shall I accept or reject H 0 ?

82 [Figure: t distribution with 88.0% of the mass in the central interval and 6.1% in the upper tail]

83 Critical probability  With t = 1.552, I can conclude the following:  12% probability that X̄ belongs to the distribution whose population mean = 95,000  I have a 12% chance of wrongly rejecting H 0  88% probability that X̄ belongs to another distribution whose population mean ≠ 95,000  I have an 88% chance of rightly rejecting H 0 I accept H 0 !!!

84  Import SKEMA_LMC into SPSS  Produce descriptive statistics for sales, labour, and R&D expenses  Analyse → Statistiques descriptives → Descriptive  Options: choose the statistics you may wish  A newspaper writes that, by and large, LMCs have 95,000 employees.  Test statistically whether this is true at the 1% level  Test statistically whether this is true at the 5% level  Test statistically whether this is true at the 10% and 20% levels  Write out H 0 and H 1  Analyse → Comparer les moyennes → Test t pour échantillon unique  Options: 99, 95, 90% SPSS Application: Student t

85 SPSS Application: t test at 99% level

86 SPSS Application: t test at 95% level

87 SPSS Application: t test at 80% level

88 Implication 4 Comparison of means

89 Comparison of means  Sometimes, the social scientist is interested in comparing means across two populations.  Mean wage across regions  Mean R&D investment across industries  Mean satisfaction level across social classes  Instead of comparing a sample mean with a target value, we will compare the two sample means directly

90 Comparing the Means Using CI’s  The simplest way to do so is to compute the confidence intervals of the two population means.  Confidence interval for population 1: X̄ 1 ± 1.96 · s 1 /√n 1  Confidence interval for population 2: X̄ 2 ± 1.96 · s 2 /√n 2

91 Comparing the Means Using CI’s  If the two confidence intervals overlap, we will conclude that the two sample means come from the same population. We do not reject the null hypothesis H 0 that µ 1 = µ 2  If the two confidence intervals do not overlap, we will conclude that the two sample means come from different populations. We reject the null hypothesis and accept the alternative hypothesis H a that µ 1 ≠ µ 2

92 Example  Competition across business schools is fierce. Imagine you want to compare the performance of students between SKEMA and HEC.  Hypothesis 1: Students from SKEMA have similar grades as students from HEC  H 0 : µ SKEMA = µ HEC  H 1 : µ SKEMA ≠ µ HEC

93 SKEMA sample mean distribution 8.8 11.2

94 HEC Average grades 9.8 13.2

95 Comparison of sample mean Distributions CI SKEMA CI HEC Since the two confidence intervals overlap, we conclude that the two sample means come from the same population. We do not reject the null hypothesis H 0 that µ SKEMA = µ HEC

96 A Direct Comparison of Means Using Student t  Another way to compare two sample means is to calculate the CI of the mean difference. If 0 does not belong to this CI, then the two samples have significantly different means. The test statistic is t = (X̄ 1 − X̄ 2 ) / SE, where SE is the standard error of the difference, computed from the pooled variance.
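With only summary statistics, the pooled-variance t statistic for the SKEMA/HEC comparison can be sketched as follows (the equal-variance assumption is what "pooled" means; the inputs are the slides' figures):

```python
def pooled_t(x1, s1, n1, x2, s2, n2):
    """Two-sample t statistic under the equal-variance (pooled) assumption."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se = (sp2 * (1 / n1 + 1 / n2)) ** 0.5  # standard error of the mean difference
    return (x1 - x2) / se

t = pooled_t(10, 3, 25, 11.5, 4.7, 30)  # ~ -1.38
```

Since |t| ≈ 1.38 is smaller than the 95% critical value (about 2.006 for 53 degrees of freedom), we do not reject H 0, consistent with the conclusion drawn from the overlapping CIs.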

97  Another newspaper argues that American (US + Canada) companies are much larger than those from the rest of the world. Is this true?  Produce descriptive statistics for labour comparing the two groups  Produce a group variable which equals 1 for US firms, 0 otherwise  This is called a dummy variable  Write out H 0 and H 1  Run the Student t test  What do you conclude at the 5% level?  What do you conclude at the 1% level? Stata Application: t test comparing means

98 STATA Application: Student t  We again use the same command as before. But since we compare means, we need to specify the two groups we are comparing:
ttest var1, by(catvar)
For example:
ttest labour, by(usgroup)

99 STATA Application: Student t Stata Instruction Descriptive statistics H0 Ha T value

100 SPSS Application: t test comparing means

101

102 Implication 5 Bilateral or Unilateral Tests?

103 Bilateral versus Unilateral tests  Up to now, we have always thought in terms of whether two means are equal. The alternative hypothesis is that the two means are different  There are many instances for which one may be willing to test inequalities between means.  Biotech companies have a higher R&D intensity than big pharmas (large pharmaceutical companies)  Biotech (pharma) companies publish / patent / innovate more (less)

104 Unilateral tests  To answer this question, we need to rewrite H 0 and H a as follows:  H 0 stands for the hypothesis which contradicts your intuition  H a stands for the hypothesis in favour of your intuition  In the case of R&D intensity, our intuition is that biotech companies are more R&D intensive. Hence  H 0 : µ biotech ≤ µ pharma  H a : µ biotech > µ pharma

105 The Bilateral Test  Reminder on the method of the bilateral test. H 0 : µ biotech = µ pharma H a : µ biotech ≠ µ pharma Reject H 0 if |z| ≥ z a (critical values −z a and +z a )

106 Superior Unilateral Tests  The trick is simply to put the whole critical region on one side of the distribution. H 0 : µ biotech ≤ µ pharma H a : µ biotech > µ pharma Reject H 0 if z ≥ z a

107 Inferior Unilateral Tests  The trick is simply to put the whole critical region on the other side of the distribution. H 0 : µ biotech ≥ µ pharma H a : µ biotech < µ pharma Reject H 0 if z ≤ −z a
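The three decision rules differ only in which tail(s) of the distribution count as extreme. A small helper makes this concrete (a Python sketch for z statistics; the function name and labels are illustrative):

```python
from statistics import NormalDist

def p_value(z, alternative="two-sided"):
    """p-value of a z statistic for the given alternative hypothesis."""
    if alternative == "two-sided":   # H a : mu1 != mu2
        return 2 * (1 - NormalDist().cdf(abs(z)))
    if alternative == "greater":     # H a : mu1 > mu2
        return 1 - NormalDist().cdf(z)
    return NormalDist().cdf(z)       # H a : mu1 < mu2
```

A one-sided p-value is half the two-sided one when z falls in the favoured tail, which is why a unilateral test rejects H 0 more easily in that direction.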

108 STATA Application: Student t Stata Instruction Descriptive statistics H0 Ha T value

109  Another newspaper argues that American (US + Canada) companies are much larger than those from the rest of the world. Is this true?  Produce descriptive statistics for labour comparing the two groups  Produce a group variable which equals 1 for US firms, 0 otherwise  This is called a dummy variable  Write out H 0 and H 1  Run the Student t test  What do you conclude at the 5% level?  What do you conclude at the 1% level? Stata Application: t test comparing means

110 STATA Application: Student t  We again use the same command as before. But since we compare means, we need to specify the two groups we are comparing:
ttest var1, by(catvar)
For example:
ttest labour, by(usgroup)

