# Hypothesis testing.


Hypothesis testing: H0 = ? Ha = ?
The process of making judgments about a large group (population) on the basis of a small subset of that group (sample) is known as statistical inference. Hypothesis testing, one of two fields in statistical inference, allows us to objectively assess the probability that statements about a population are true. Because these statements are probabilistic in nature, we can never be certain of their truth.

Steps in hypothesis testing:
1. Stating the hypotheses.
2. Identifying the appropriate test statistic and its probability distribution.
3. Specifying the significance level.
4. Stating the decision rule.
5. Collecting the data and calculating the test statistic.
6. Making the statistical decision.
7. Making the economic or investment decision.

LOS: Define a hypothesis and describe the steps of hypothesis testing. Pages 243–244. It is worth spending time on Steps 1 and 2 at this stage. The statements of the null and alternative hypotheses are often where individuals struggle. You will often encounter questions like "Is the null what I want to show or what I don't want to show?" This creates an immediate tension because any truly scientific process should be devoid of "want" and be totally objective. Rather, think along the lines of what can and can't be demonstrated.

1. State the hypothesis. The foundation of hypothesis testing lies in determining exactly what we are to test. We organize a hypothesis test into two categories: the null hypothesis, denoted H0, is the hypothesis we are testing, and the alternative hypothesis is denoted Ha. The possibilities represented by the two hypotheses should be mutually exclusive and collectively exhaustive.

Three ways of formulating a hypothesis test:
- H0: θ = θ0 versus Ha: θ ≠ θ0 (a "not equal to" alternative hypothesis)
- H0: θ ≤ θ0 versus Ha: θ > θ0 (a "greater than" alternative hypothesis)
- H0: θ ≥ θ0 versus Ha: θ < θ0 (a "less than" alternative hypothesis)

Hypothesis tests generally concern the true value of a population parameter as determined using a sample statistic.

LOS: Define and interpret the null hypothesis and alternative hypothesis. Page 245. Each of these formulations is mutually exclusive and collectively exhaustive; the following slide shows them in visual form to reinforce this point. The last bullet point is an opportunity to link the prior chapters on statistical (point) estimation and hypothesis testing: in hypothesis testing, we are making probabilistic statements about the relationship between point estimates and their underlying population parameters.

1. State the hypothesis. The hypothesis is designed to assess the likelihood of a sample statistic accurately representing the population parameter it attempts to measure. Hypothesis tests are formulated so that they lead to either one-tailed or two-tailed tests. One-tailed tests are comparisons based on a single side of the distribution, whereas two-tailed tests admit the possibility of the true population parameter lying in either tail of the distribution.
- H0: θ = θ0 versus Ha: θ ≠ θ0 (a two-tailed test)
- H0: θ ≤ θ0 versus Ha: θ > θ0 (a one-tailed test for the upper tail)
- H0: θ ≥ θ0 versus Ha: θ < θ0 (a one-tailed test for the lower tail)

LOS: Distinguish between one-tailed and two-tailed tests of hypotheses. Page 245. It is important to note that the test of the null always ends up being a test of equivalence, but the choice of a one- or two-tailed test does affect the testing procedure when we assess the likelihood of the null being true against the probability distribution of our point estimate. In practice, virtually all tests end up being conducted as two-tailed tests.

1. State the hypothesis. Focus On: Choosing the Null and Alternative Hypotheses. The selection of an appropriate null hypothesis and, as a result, an alternative hypothesis centers on economic or financial theory as it relates to the point estimate(s) being tested.
- Two-tailed tests are more "conservative" than one-tailed tests; in other words, they lead to a fail-to-reject conclusion more often.
- One-tailed tests are often used when financial or economic theory proposes a relationship of a specific direction.

LOS: Discuss the choice of the null and alternative hypotheses. Pages 245–246. As pointed out in the text, one-tailed tests are most often selected when there is a "hoped for" outcome or when the researcher has strong prior beliefs. Because they are also less conservative tests, the probability is increased that the outcome of the test is the result of researcher biases (see the earlier sampling chapter) instead of a true underlying difference between the sample estimate and the population parameter. Accordingly, many researchers will use two-tailed tests even when one-tailed tests might be appropriate.

2. Identifying the appropriate test statistic and its probability distribution
Test statistic = (sample statistic − population parameter under H0) / (standard error of the sample statistic)

The test statistic is a measure based on the difference between the hypothesized parameter and the sample point estimate that is used to assess the likelihood of that sample statistic resulting from the underlying population. For a hypothesis test regarding the numerical value of the population mean as contained in the null hypothesis, the test statistic is:
- Known population variance: TS = (X̄ − μ0) / σ_X̄, where σ_X̄ = σ/√n
- Unknown population variance: TS = (X̄ − μ0) / s_X̄, where s_X̄ = s/√n

LOS: Define and interpret a test statistic. Pages 246–247. Almost all the test statistics we use in this presentation take the form at the top of the slide. There are some couched as ratios, which we will encounter later.

2. Identifying the appropriate test statistic and its probability distribution
Test statistic = (sample statistic − population parameter under H0) / (standard error of the sample statistic)

Test statistics that we implement will generally follow one of the following distributions:
- t-distribution
- Standard normal
- F-distribution
- Chi-square distribution

LOS: Define and interpret a test statistic. Page 247. The selection of an appropriate distribution is covered later in the slides.

Errors in hypothesis tests
Type I errors occur when we reject a null hypothesis that is actually true. Type II errors occur when we do not reject a null hypothesis that is false.

| Decision | H0: True | H0: False |
| --- | --- | --- |
| Do not reject (DNR) | Correct decision | Type II error |
| Reject | Type I error | Correct decision* |

Mutually exclusive problems: if we mistakenly reject the null, we make a Type I error; if we mistakenly fail to reject the null, we make a Type II error. Because we cannot reject and fail to reject simultaneously, owing to the mutually exclusive nature of the null and alternative hypotheses, the errors are also mutually exclusive.

*The rate at which we correctly reject a false null hypothesis is known as the power of the test.

LOS: Define and interpret a Type I error and a Type II error. Pages 247–248. The information here regarding Type I errors is reinforced when we discuss significance levels on the next slide.
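A quick simulation makes the meaning of a Type I error concrete: sampling repeatedly from a population where the null is actually true, a two-tailed 5% z-test should reject about 5% of the time. This is an illustrative sketch only; the sample size, seed, and trial count are arbitrary choices.

```python
import math
import random

random.seed(7)
TRIALS = 2000
N = 30
REJECTIONS = 0
for _ in range(TRIALS):
    # Sample from a population where H0 (mu = 0, sigma = 1) is actually true
    sample = [random.gauss(0.0, 1.0) for _ in range(N)]
    z = (sum(sample) / N) / (1.0 / math.sqrt(N))  # z-test with known sigma
    if abs(z) > 1.96:        # two-tailed test at alpha = 0.05
        REJECTIONS += 1      # rejecting a true null is a Type I error

print(REJECTIONS / TRIALS)   # should be close to 0.05
```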

3. Specifying the significance level
The level of significance is the desired standard of proof against which we measure the evidence contained in the test statistic. The level of significance is identical to the probability of a Type I error and, like that probability, is often referred to as "alpha" (α).
- How much sample evidence do we require to reject the null? A statistical "burden of proof."
- The level of confidence in the statistical results is directly related to the significance level of the test and, thus, to the probability of a Type I error.

| Significance Level | Suggested Description |
| --- | --- |
| 0.10 | "some evidence" |
| 0.05 | "strong evidence" |
| 0.01 | "very strong evidence" |

LOS: Define and interpret a significance level and explain how significance levels are used in hypothesis testing. Page. Type I errors (levels of significance) are generally considered the most important feature of the hypothesis testing procedure because a failure to reject generally indicates no action to be taken, whereas a rejection of the null may lead to action. Because the Type I error rate is the rate at which we reject when we shouldn't, a high Type I error rate would lead to taking action when we shouldn't.

Because the significance level and the Type I error rate are the same, and Type I and Type II errors are mutually exclusive, there is a trade-off in setting the significance level: if we decrease the probability of a Type I error by specifying a smaller significance level, we increase the probability of a Type II error. The only way to decrease the probability of both errors at the same time is to increase the sample size, because a larger sample reduces the standard error in the denominator of our test statistic.

LOS: Discuss how the choice of significance level affects the probabilities of Type I and Type II errors. Page 248. Quantifying the trade-off is very difficult because the level of a Type II error generally cannot be observed. Decreased Type I, but increased Type II.

Power of the test. The power of the test is the rate at which we correctly reject a false null hypothesis. When more than one test statistic is available, use the one with the highest power for the specified level of significance. The power of a given test statistic also generally increases with an increase in sample size.

LOS: Define the power of a test. Page 248. We depict the power of a given test statistic using a "power curve," which plots the power of the test statistic at different levels of confidence, often across different sample sizes.
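A rough sketch of how power grows with sample size, using a two-tailed z-test with known σ; the effect size and sample sizes are hypothetical, and `norm_cdf` is a helper built on `math.erf`.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power_z_test(effect, sigma, n, z_crit=1.96):
    """Power of a two-tailed z-test when the true mean shift is `effect`."""
    shift = effect * math.sqrt(n) / sigma
    return (1 - norm_cdf(z_crit - shift)) + norm_cdf(-z_crit - shift)

for n in (10, 50, 100):   # power rises as the sample grows
    print(n, round(power_z_test(0.5, 1.0, n), 3))
```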

4. Stating the decision rule
The decision rule uses the significance level and the probability distribution of the test statistic to determine the value above (below) which the null hypothesis is rejected. The critical value (CV) of the test statistic is the value above (below) which the null hypothesis is rejected, also known as a rejection point. One-tailed tests use critical values with a subscript α; two-tailed tests use a subscript α/2.

LOS: Define and interpret a decision rule. Page 249. Although the interior region is known as an "acceptance" region, this label is a misnomer: we never accept a null, we only fail to reject it, a much lower standard of proof.

Confidence Interval or Hypothesis test?
Two-tailed hypothesis tests can easily be rewritten as confidence intervals. Recall that a two-tailed hypothesis test rejects the null when the observed value of the test statistic is either below the lower critical value or above the upper. The lower critical value can be restated as the lower limit of a confidence interval, and the upper critical value as the upper limit:

[X̄ − z_{α/2} σ_X̄, X̄ + z_{α/2} σ_X̄]

When the hypothesized population parameter lies within this confidence interval, we fail to reject the null hypothesis. Although this relationship is useful, it precludes easy calculation of the significance level of the test, known as a p-value, from the values of the standard error and point estimate.

LOS: Explain the relation between confidence intervals and hypothesis tests. Pages 249–253. This also works for one-tailed tests, particularly with distributions that have an infinite limit to one side or the other; in that case, one end of the confidence interval is infinite.
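The equivalence between the two-tailed test and the confidence interval can be checked numerically; all inputs below are hypothetical, and the sketch assumes a z-test with known σ.

```python
import math

def ci_and_test(xbar, sigma, n, mu0, z_half=1.96):
    """Two-tailed 5% z-test and the matching 95% confidence interval."""
    se = sigma / math.sqrt(n)
    lower, upper = xbar - z_half * se, xbar + z_half * se
    reject = abs((xbar - mu0) / se) > z_half
    inside = lower <= mu0 <= upper
    return reject, inside

# Hypothetical inputs: xbar = 10.4, sigma = 2, n = 100, H0: mu = 10
reject, inside = ci_and_test(10.4, 2.0, 100, 10.0)
print(reject, inside)  # reject exactly when mu0 falls outside the interval
```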

The empirical conclusion
The next two steps in the process follow from the first four. 5. Collect the data and calculate the test statistic. In practice, data collection is likely to represent the largest portion of the time spent in hypothesis testing, and care should be given to the sampling considerations discussed in the other chapters, particularly biases introduced in the data collection process. 6. Make the statistical decision. The statistical process is completed when we compare the test statistic from Step 5 with the critical value in Step 4 and assess the statistical significance of the result. Reject or fail to reject the null hypothesis. LOS: Explain the relation between confidence intervals and hypothesis tests. Pages 251–252

7. Make the economic decision
Quantitative analysis is used to guide decision making in a scientific manner; hence, the end of the process lies in making a decision. The economic or investment decision should take into account not only the statistical evidence, but also the economic value of acting on the statistical conclusion. We may find strong statistical evidence of a difference but only weak economic benefit to acting. Because the statistical process often focuses only on one attribute of the data, other attributes may affect the economic value of acting on our statistical evidence. For example, a statistically significant difference in mean return for two alternative investment strategies may not lead to economic gain if the higher-returning strategy has much higher transaction costs. The economic forces leading to the statistical outcome should be well understood before investing. LOS: Distinguish between a statistical decision and an economic decision. Page 252

The p-value approach

The p-value is the smallest level of significance at which a given null hypothesis can be rejected. The selection of a particular level of significance is somewhat arbitrary: lower levels lead to greater confidence but come at an increased risk of Type II errors. For a given test statistic and its distribution, we can determine the lowest possible level of alpha (the highest possible critical value) for which we would reject the null hypothesis.
1. Calculate the test statistic as before.
2. Use a statistical package, spreadsheet, etc., to look up the "inverse" value of that test statistic. This value is the probability at which you would encounter a test statistic of that magnitude or greater (lesser).
Smaller p-values mean greater confidence in the significance of the results but leave the assessment of how much confidence to the reader.

LOS: Discuss the p-value approach to hypothesis testing. Pages 252–253. This approach is also sometimes known as the "marginal significance level" approach.
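For a z-statistic, the p-value lookup can be done with the standard normal CDF (built here on `math.erf`); this is a sketch of the two-tailed case only.

```python
import math

def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def two_tailed_p(z):
    """Smallest significance level at which H0 is rejected for this z."""
    return 2 * (1 - norm_cdf(abs(z)))

print(round(two_tailed_p(1.96), 3))  # 0.05
print(round(two_tailed_p(2.58), 3))  # 0.01
```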

Testing a single mean
We almost never know the variance of the underlying population; tests of a single mean are either t-tests or z-tests.

Tests comparing a single mean with a value:
- Use a t-test with df = n − 1 when the population variance is unknown and the sample is large, or the sample is small but (approximately) normally distributed: t_{n−1} = (X̄ − μ0) / (s/√n)
- Can use a z-test if the sample is large, z = (X̄ − μ0) / (s/√n), or if the population is normally distributed with known variance, z = (X̄ − μ0) / (σ/√n)

Note that two of these use the sample standard deviation as an estimate of the population standard deviation.

LOS: Identify the appropriate test statistic and interpret the results for a hypothesis test concerning the population mean of a normally distributed population with (1) known or (2) unknown variance. Pages 254–261. The t-test is robust to departures from normality, but not to outliers or strong skewness. In both cases, the t-test is the theoretically correct test, but with large samples or normally distributed populations, the z and t critical values will be close to one another.

Testing a single mean
Focus On: Calculations — t_{n−1} = (X̄ − μ0) / (s/√n)

You have collected data on monthly equity returns and determined that the average return across the 48-month period you are examining was 12.94%, with a standard deviation of returns of 15.21%. You want to test whether this average return is equal to the 15% return that your retirement models use as an underlying assumption, and you want to be 95% confident of your results.
1. Formulate hypotheses → H0: μ = 15% versus Ha: μ ≠ 15% (a two-tailed test).
2. Identify the appropriate test statistic → t-test for an unknown population variance.
3. Specify the significance level → 0.05, as stated in the setup, leading to the corresponding two-tailed critical value.
4. Collect data (see above) and calculate the test statistic.
5. Make the statistical decision → Do not reject (DNR) the null hypothesis.
Statistically → 12.94% is not statistically different from 15% for this sample. Economically → 12.94% is likely to affect the forecast outcomes of retirement planning.

LOS: Formulate a null and an alternative hypothesis about a population mean and determine whether to reject the null hypothesis at a given level of significance. Pages 254–261. This is a problem for which the statistical outcome is "no difference," but economically there would be a large difference in retirement plans resulting from changing the underlying assumption of a retirement model from 15% to 12.94%. This result highlights the need for an underlying rationale for the hypothesis test being conducted, and the necessity of examining all possible hypotheses in light of the decision you are contemplating. The economic conclusion is that there is no statistical evidence for changing the assumption, but there is certainly an economic rationale for ensuring that it is correct with further examination.
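The calculation on this slide can be reproduced as follows; note that the 2.012 critical value is an approximate two-tailed 5% value for df = 47 taken from a t-table, not computed by the code.

```python
import math

xbar, mu0, s, n = 0.1294, 0.15, 0.1521, 48
t = (xbar - mu0) / (s / math.sqrt(n))   # t-statistic, df = n - 1 = 47
t_crit = 2.012   # approximate two-tailed 5% critical value for df = 47 (t-table)

print(round(t, 2))        # about -0.94
print(abs(t) > t_crit)    # False -> do not reject the null
```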

Difference in means or mean differences?
The critical distinction between testing for a difference in means and testing for a mean difference lies with sample independence.
- Independent samples → test of difference in means. If the population variance is known, we use the population standard deviation in determining the standard error of the statistic; otherwise, we use the sample standard deviation. When the variances are presumed the same, the standard error of the mean is calculated on a pooled basis, and the degrees of freedom differ for two samples from the same population versus two from different populations.
- Dependent samples → test of mean difference, using the variance of the differences in the test statistic.

LOS: Discuss the choice between tests of differences between means and tests of mean differences (paired comparisons test) in relation to the independence of samples. Pages 261–265. The primary issue in selecting between the two is whether there is an underlying reason to believe the two samples are not independent, which may be because of a functional relationship (balance sheet identities, for example) or because of an underlying economic or financial relationship (oligopolistic competitors, for example).

Testing for a difference in means
Independent Samples

Normally distributed, equal but unknown variances: uses a pooled variance estimator, s_p², a weighted average of the sample variances:

t = (X̄1 − X̄2 − (μ1 − μ2)) / √(s_p²/n1 + s_p²/n2), with df = n1 + n2 − 2

Normally distributed, unequal and unknown variances: uses the individual sample variances and has a lower number of degrees of freedom:

t = (X̄1 − X̄2 − (μ1 − μ2)) / √(s1²/n1 + s2²/n2), with df = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/n1 + (s2²/n2)²/n2]

LOS: Identify the appropriate test statistic and interpret the results for a hypothesis test concerning the equality of two population means of two normally distributed populations, based on independent random samples, with (1) equal or (2) unequal assumed variances. Pages 261–265. The Chapter 7 spreadsheet has these tests built in as sample calculations. In the first case, we use both sample variances as estimators for the unknown but presumed equal population variances. In the second, we lose degrees of freedom because we are using two estimates to capture differing population variance effects; by decreasing the degrees of freedom, we increase the critical value for any given alpha.

Testing for a difference in means
Focus On: Calculations

You have decided to investigate whether the return to your client's retirement portfolio will be enhanced by the addition of foreign equities. Accordingly, you first want to test whether foreign equities have the same return as domestic equities before proceeding with further analysis. Recall that U.S. equities returned 12.94% with a standard deviation of 15.21% over the prior 48 months. You have determined that foreign equities returned 17.67% with a standard deviation of 16.08% over the same period. You want the same significance level as before (5%). You are willing to assume, for now, that the two samples are independent, approximately normally distributed, and drawn from populations with the same underlying variance.

LOS: Formulate a null and an alternative hypothesis about the equality of two population means (normally distributed populations, independent samples), select the appropriate test statistic, and determine whether to reject the null hypothesis at a given level of significance. Pages 262–265.

Testing for a difference in means
Focus On: Calculations
1. Stating the hypotheses → H0: μDomEq = μForEq versus Ha: μDomEq ≠ μForEq
2. Identifying the appropriate test statistic and its probability distribution → t-test for a difference in means, normal distributions with unknown but equal variances: t = (X̄1 − X̄2 − (μ1 − μ2)) / √(s_p²/n1 + s_p²/n2), df = 48 + 48 − 2 = 94
3. Specifying the significance level → CV = ±1.986
4. Stating the decision rule → Reject the null if |TS| > 1.986
5. Collecting the data and calculating the test statistic.
6. Making the statistical decision → Fail to reject (FTR)

LOS: Formulate a null and an alternative hypothesis about the equality of two population means (normally distributed populations, independent samples), select the appropriate test statistic, and determine whether to reject the null hypothesis at a given level of significance. Pages 262–265. How would this change if you assumed the variances are unequal? In this case, it would change very little because the sample sizes are large and the variances are relatively similar. The conclusion of this test is that the mean returns to domestic and foreign equities are not statistically different.
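The pooled-variance test statistic from this slide can be verified numerically (the 1.986 critical value is the slide's own, for df = 94):

```python
import math

x1, s1, n1 = 0.1294, 0.1521, 48   # domestic equities (48 months)
x2, s2, n2 = 0.1767, 0.1608, 48   # foreign equities

# Pooled variance estimator (variances presumed equal), df = n1 + n2 - 2 = 94
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se = math.sqrt(sp2 / n1 + sp2 / n2)
t = (x1 - x2) / se                # H0: mu1 - mu2 = 0

print(round(t, 2))                # about -1.48
print(abs(t) > 1.986)             # False -> fail to reject
```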

Testing for a mean difference
Dependent samples, by definition, use paired observations; we test the mean difference across pairs, assuming the differences are normally distributed with unknown variance.

Steps:
1. Calculate the difference for each pair of observations.
2. Calculate the mean and standard deviation of the differences.
3. The test statistic, t = (d̄ − μ_d0) / s_d̄, where s_d̄ = s_d/√n, is approximately t-distributed.

LOS: Identify the appropriate test statistic and interpret the results for a hypothesis test concerning the mean difference between two normally distributed populations (paired comparisons test). Pages 265–267. This is sometimes called a "paired comparisons test." These tests rely on a common factor in the observations that causes dependence between the observations in such a way that we can combine them into pairs for analysis. The pairs may be two different units of observation for which we are analyzing differences between the units across time, or the same unit of observation for which we are measuring differences across time (before and after).

Testing for a mean difference
Focus On: Calculations

You are interested in determining whether a portfolio of dividend-paying stocks that you hold has performed the same as a portfolio of non-dividend-paying stocks over the last 12 months. The portfolios are composed of a dividend-paying/non-dividend-paying pair in each industry you hold. The returns on the portfolios and the differences in returns are:

| Month | Dividend Payers | Not Payers | Difference |
| --- | --- | --- | --- |
| 1 | 0.2340 | 0.2203 | 0.0137 |
| 2 | 0.4270 | 0.1754 | 0.2516 |
| 3 | 0.1609 | 0.1599 | 0.0010 |
| 4 | 0.1827 | 0.4676 | –0.2849 |
| 5 | 0.3604 | 0.1504 | 0.2100 |
| 6 | 0.4039 | 0.3398 | 0.0641 |
| 7 | 0.3594 | 0.1332 | 0.2262 |
| 8 | 0.1281 | 0.0582 | 0.0699 |
| 9 | –0.0426 | 0.1488 | –0.1914 |
| 10 | 0.0653 | –0.0035 | 0.0688 |
| 11 | –0.0867 | 0.1227 | –0.2094 |
| 12 | 0.0878 | 0.1781 | –0.0903 |

LOS: Formulate a null and an alternative hypothesis about the mean difference between two normally distributed populations (paired comparisons test), select the appropriate test statistic, and determine whether to reject the null hypothesis at a given level of significance. Pages 268–269.

Testing for a mean difference
Focus On: Calculations

|  | Dividend Payers | Not Payers | Difference |
| --- | --- | --- | --- |
| Average | 0.1900 | 0.1792 | 0.0108 |
| Std Dev | 0.1714 | 0.1229 | 0.1760 |

1. Stating the hypotheses → H0: μPayers − μNoPay = 0 versus Ha: μPayers − μNoPay ≠ 0
2. Identifying the appropriate test statistic and its probability distribution → t = (d̄ − μ_d0) / s_d̄, a t-test with 12 − 1 = 11 degrees of freedom
3. Specifying the significance level → CV = ±2.201
4. Stating the decision rule → Reject the null if |TS| > 2.201
5. Collecting the data and calculating the test statistic.
6. Making the statistical decision → Fail to reject (FTR)

LOS: Formulate a null and an alternative hypothesis about the mean difference between two normally distributed populations (paired comparisons test), select the appropriate test statistic, and determine whether to reject the null hypothesis at a given level of significance. Pages 268–269.
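The paired-comparisons numbers above can be reproduced from the monthly differences on the previous slide, using the standard library's `statistics` module:

```python
import math
import statistics

# Monthly return differences (dividend payers minus non-payers) from the slide
diffs = [0.0137, 0.2516, 0.0010, -0.2849, 0.2100, 0.0641,
         0.2262, 0.0699, -0.1914, 0.0688, -0.2094, -0.0903]

n = len(diffs)
d_bar = statistics.mean(diffs)            # about 0.0108
s_d = statistics.stdev(diffs)             # about 0.1760
t = (d_bar - 0.0) / (s_d / math.sqrt(n))  # H0: mean difference = 0

print(round(d_bar, 4), round(s_d, 4), round(t, 2))
print(abs(t) > 2.201)                     # False -> fail to reject
```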

Testing a single variance
Tests of a single variance:
- Normally distributed population
- Chi-square test with df = n − 1: χ² = (n − 1)s² / σ0²
- Very sensitive to the underlying assumptions

Is the standard deviation of domestic equity returns from our previous example, 15.21%, statistically different from a hypothesized 10%? Compute the test statistic, compare it with the critical value for α = 5%, and reject the null.

LOS: Identify the appropriate test statistic and interpret the results for a hypothesis test concerning the variance of a normally distributed population. Pages 269–270.
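A sketch of the chi-square calculation, assuming the hypothesized standard deviation is 10% and using approximate table values for the df = 47 critical values:

```python
n, s = 48, 0.1521          # sample size and sample standard deviation
sigma0 = 0.10              # hypothesized standard deviation (assumed from the slide)
chi2 = (n - 1) * s**2 / sigma0**2
print(round(chi2, 1))      # about 108.7

# Approximate 2.5%/97.5% chi-square critical values for df = 47 (from a table)
lower, upper = 29.96, 67.82
print(chi2 > upper)        # True -> reject the null
```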

Testing for equality of variance
Tests comparing two variance measures: if we have two normally distributed populations, the ratio of the two sample variances follows an F-distribution:

F_{df1,df2} = s1² / s2², where df_i = n_i − 1

If the test statistic is greater than the critical value for an F-distribution with df1 and df2 degrees of freedom, reject the null.

LOS: Identify the appropriate test statistic and interpret the results for a hypothesis test concerning the equality of the variances of two normally distributed populations, based on two independent random samples. Pages 271–272. By convention, we use the larger of the two variances in the numerator of the F-test statistic.

Testing for equality of variance
Focus On: Calculations

Return now to our earlier example comparing foreign and domestic equity returns. In that example, we assumed that the variances were equal. Perform the necessary test to assess the validity of this assumption. Recall that we had 48 observations for each return series; foreign equity returns had a standard deviation of 16.08%, and domestic equity returns a standard deviation of 15.21%.

LOS: Formulate a null and an alternative hypothesis about the equality of the variances of two populations (normally distributed populations, independent samples), select the appropriate test statistic, and determine whether to reject the null hypothesis at a given level of significance. Pages 271–274.

Testing for equality of variance
Focus On: Calculations
1. Stating the hypotheses → H0: σ²ForEq/σ²DomEq = 1 versus Ha: σ²ForEq/σ²DomEq ≠ 1
2. Identifying the appropriate test statistic and its probability distribution → F-test for a ratio of variances: F_{df1,df2} = s1² / s2², df_i = n_i − 1
3. Specifying the significance level → determine the corresponding critical value (CV)
4. Stating the decision rule → Reject the null if TS > CV
5. Collecting the data and calculating the test statistic.
6. Making the statistical decision → Fail to reject (FTR)

LOS: Formulate a null and an alternative hypothesis about the equality of the variances of two populations (normally distributed populations, independent samples), select the appropriate test statistic, and determine whether to reject the null hypothesis at a given level of significance. Pages 271–274. The results bear out our assumption that the variances of the two samples are statistically equal.
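The F-statistic for this example can be computed directly; the 1.80 critical value is an approximate 2.5% upper-tail value for F(47, 47) read from a table, not an exact figure:

```python
s_for, s_dom = 0.1608, 0.1521   # sample standard deviations, n = 48 each
F = s_for**2 / s_dom**2          # larger variance in the numerator, by convention
print(round(F, 2))               # about 1.12

f_crit = 1.80   # approximate 2.5% upper-tail critical value for F(47, 47), from a table
print(F > f_crit)                # False -> fail to reject
```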

Nonparametric statistics
Tests are said to be parametric when they are concerned with parameters and their validity depends on a definite set of assumptions, particularly when one of those assumptions deals with the underlying distributional characteristics of the test statistic. Nonparametric tests, in contrast, are either not concerned with the value of a specific parameter or make minimal assumptions about the population from which the sample is drawn; in particular, no, or few, assumptions are made about the distribution of the population.

Nonparametric tests are useful when:
- The data do not meet the necessary distributional assumptions.
- The data are given in ranks.
- The hypothesis does not address the value of a parameter or parameters.

LOS: Distinguish between parametric and nonparametric tests and describe the situations in which the use of nonparametric tests may be appropriate. Pages 275–276. Nonparametric tests are an entire field of statistics. The tests most commonly encountered in finance are tests of the median and median differences and include the Wilcoxon signed-rank test, the Mann–Whitney U test, and the sign test.

Testing for nonzero correlation
The Spearman rank correlation test can be used to assess the strength of a monotonic relationship between two variables.

Calculating the test statistic:
1. Rank the observations from largest to smallest for X and Y separately, with the largest value being ranked 1 for each.
2. Calculate the difference in ranks for each pair of observations and then the Spearman rank correlation.
The Spearman rank correlation test is t-distributed with df = n − 2.

LOS: Explain the use of the Spearman rank correlation coefficient in a test that the correlation between two variables is zero. Pages 276–278. There are more evolved correlation tests that allow you to compare correlations between different pairs; if there is inquiry, refer readers to Fisher's z for correlation tests.
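A minimal sketch of the ranking procedure and the test statistic; the data are hypothetical, and the helper assumes no tied values:

```python
import math

def spearman_test(x, y):
    """Spearman rank correlation and its t-statistic (df = n - 2); assumes no ties."""
    n = len(x)

    def ranks(values):
        # Largest value gets rank 1, as on the slide
        order = sorted(range(n), key=lambda i: values[i], reverse=True)
        r = [0] * n
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    rs = 1 - 6 * d2 / (n * (n**2 - 1))          # Spearman rank correlation
    t = rs * math.sqrt((n - 2) / (1 - rs**2))   # t-statistic with df = n - 2
    return rs, t

# Hypothetical data with a clear monotonic relationship
rs, t = spearman_test([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5])
print(round(rs, 3), round(t, 2))
```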

Summary Hypothesis testing allows us to formulate beliefs about investment attributes and subject those beliefs to rigorous testing following the scientific method. For parametric hypothesis testing, we formulate our beliefs (hypotheses), collect data, and calculate a value of the investment attribute in which we are interested (the test statistic) for that set of data (the sample), and then we compare that with a value determined under assumptions that describe the underlying population (the critical value). We can then assess the likelihood that our beliefs are true given the relationship between the test statistic and the critical value. Commonly tested beliefs associated with the expected return and variance of returns for a given investment or investments can be formulated in this way.