HYPOTHESIS TESTING Dr. Aidah Abu Elsoud Alkaissi


1 HYPOTHESIS TESTING Dr. Aidah Abu Elsoud Alkaissi
An-Najah National University Faculty of Nursing

2 HYPOTHESIS TESTING Statistical hypothesis testing provides objective criteria for deciding whether hypotheses are supported by empirical evidence. Suppose we hypothesized that participation of cancer patients in a stress management program would lower anxiety levels.

3 HYPOTHESIS TESTING The sample is 25 patients in a control group who do not participate in the program and 25 experimental subjects who do. All 50 subjects complete a post-treatment anxiety scale; the mean anxiety score for the experimental group is 15.8, and the mean for the control group is higher. Should we conclude that the hypothesis was correct?

4 HYPOTHESIS TESTING True, group differences are in the predicted direction, but the results might simply be due to sampling fluctuations. The two groups might happen to be different by chance, regardless of the intervention. Perhaps with a new sample the group means would be nearly identical.

5 HYPOTHESIS TESTING Statistical hypothesis testing allows researchers to make objective decisions about study results. Researchers need such a mechanism for deciding which results likely reflect chance sample differences and which reflect true population differences.

6 The Null Hypothesis The procedures used in testing hypotheses are based on rules of negative inference. In the stress management program example, we found that those participating in the intervention had lower mean anxiety scores than subjects in the control group.

7 The Null Hypothesis There are two possible explanations for this result: (1) the intervention was successful in reducing patients’ anxiety; or (2) the differences resulted from chance factors, such as group differences in anxiety even before the treatment.

8 The Null Hypothesis The first explanation is our research hypothesis, and the second is the null hypothesis. The null hypothesis, it may be recalled, states that there is no relationship between variables. Statistical hypothesis testing is basically a process of disproof or rejection. It cannot be demonstrated directly that the research hypothesis is correct, but it is possible to show, using theoretical sampling distributions, that the null hypothesis has a high probability of being incorrect. Researchers seek to reject the null hypothesis through various statistical tests.

9 The null hypothesis in our example can be stated formally as follows:
H0: µE = µC
The null hypothesis (H0) claims that the mean population anxiety score for experimental subjects (E) is the same as that for controls (C). The alternative, or research, hypothesis (HA) claims the means are not the same:
HA: µE ≠ µC
Although null hypotheses are accepted or rejected based on sample data, the hypothesis is about population values. Hypothesis testing uses samples to draw conclusions about relationships within the population.
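The test of H0: µE = µC can be sketched in Python. The anxiety scores below are made-up illustrative data (the presentation does not give the raw scores); the pooled-variance t statistic and the tabled critical value 2.101 for df = 18 at α = .05 are standard, but everything else is a hypothetical sketch:

```python
# Hedged sketch: independent-samples t statistic for H0: mu_E = mu_C,
# using made-up scores, not the study's actual data.
from statistics import mean, variance

experimental = [14, 16, 15, 17, 13, 16, 15, 18, 14, 16]  # hypothetical
control      = [18, 20, 17, 19, 21, 18, 20, 19, 17, 20]  # hypothetical

n1, n2 = len(experimental), len(control)
# Pooled variance assumes roughly equal spread in the two populations.
pooled_var = ((n1 - 1) * variance(experimental)
              + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
t = (mean(experimental) - mean(control)) / (pooled_var * (1/n1 + 1/n2)) ** 0.5

df = n1 + n2 - 2        # degrees of freedom: 18
critical = 2.101        # tabled t for alpha = .05, two-tailed, df = 18
print(f"t = {t:.2f}, critical = ±{critical}")
print("Reject H0" if abs(t) > critical else "Retain H0")
```

A negative t here simply reflects that the (hypothetical) experimental mean is lower; the decision uses the absolute value.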

10 Type I and Type II Errors
Researchers decide whether to accept or reject a null hypothesis by determining how probable it is that observed group differences are due to chance. Because researchers lack information about the population, they cannot flatly assert that a null hypothesis is or is not true. Researchers must be content to conclude that hypotheses are either probably true or probably false. Statistical inferences are based on incomplete information, so there is always a risk of error.

11 Researchers can make two types of error: rejecting a true null hypothesis or accepting a false null hypothesis. Researchers make a Type I error by rejecting the null hypothesis when it is, in fact, true. For instance, if we concluded that the experimental treatment was more effective than the control condition in alleviating patients’ anxiety, when in fact observed differences in anxiety scores resulted from sampling fluctuations, we would be making a Type I error.

12 A Type II Error Conversely, if we concluded that group differences in anxiety scores resulted by chance, when in fact the intervention did reduce anxiety, we would be committing a Type II error by accepting a false null hypothesis.
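A Type I error can be made concrete by simulation: draw both "groups" from the same population (so H0 is really true) many times and count how often a .05-level test rejects it anyway. This is a minimal pure-Python sketch with arbitrary population values, not anything from the study:

```python
# Hedged Monte Carlo sketch: when H0 is true (both groups come from the
# same population), a .05 criterion rejects H0 in roughly 5% of samples;
# each such rejection is a Type I error. Population mean/SD are made up.
import random
from statistics import mean, stdev

random.seed(1)

def t_statistic(a, b):
    n = len(a)  # assumes equal group sizes
    se = ((stdev(a) ** 2 + stdev(b) ** 2) / n) ** 0.5
    return (mean(a) - mean(b)) / se

critical = 2.02   # approximate tabled t, alpha = .05, two-tailed, df ~ 38
trials = 2000
rejections = 0
for _ in range(trials):
    g1 = [random.gauss(17, 3) for _ in range(20)]   # same population...
    g2 = [random.gauss(17, 3) for _ in range(20)]   # ...so H0 is true
    if abs(t_statistic(g1, g2)) > critical:
        rejections += 1                              # a Type I error

print(f"Type I error rate: {rejections / trials:.3f}")
```

The observed rate should hover near the chosen α of .05, which is exactly what "level of significance" promises.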

13 Level of Significance Researchers do not know when an error in statistical decision making has been made. The validity of a null hypothesis could be ascertained only by collecting data from the entire population, in which case there would be no need for statistical inference. Researchers do, however, control the risk of a Type I error by selecting a level of significance, which signifies the probability of rejecting a true null hypothesis. The two most frequently used significance levels (referred to as alpha, or α) are .05 and .01.

14 With a .05 significance level, we are accepting the risk that out of 100 samples drawn from a population, a true null hypothesis would be rejected only 5 times. With a .01 significance level, the risk of a Type I error is lower: in only 1 sample out of 100 would we erroneously reject the null hypothesis. The minimum acceptable α level usually is .05. A stricter level (e.g., .01 or .001) may be needed when the decision has important consequences.

15 Naturally, researchers want to reduce the risk of committing both types of error, but unfortunately lowering the risk of a Type I error increases the risk of a Type II error. The stricter the criterion for rejecting a null hypothesis, the greater the probability of accepting a false null hypothesis. Researchers must deal with tradeoffs in establishing criteria for statistical decision making, but the simplest way of reducing the risk of a Type II error is to increase sample size.
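The claim that a larger sample reduces Type II error can also be shown by simulation. Here a true treatment effect exists (so H0 is false), and we count how often the test fails to detect it at each sample size. The effect size, SD, and sample sizes are made-up illustrative values:

```python
# Hedged sketch: with a real treatment effect, larger samples lower the
# Type II error rate (failing to reject a false H0). Values are invented.
import random
from statistics import mean, stdev

random.seed(2)

def rejects(n, effect=2.0, sd=3.0, critical=2.0):
    # Treatment genuinely lowers the mean by `effect`, so H0 is false.
    treated = [random.gauss(17 - effect, sd) for _ in range(n)]
    control = [random.gauss(17, sd) for _ in range(n)]
    se = ((stdev(treated) ** 2 + stdev(control) ** 2) / n) ** 0.5
    return abs((mean(treated) - mean(control)) / se) > critical

rates = {}
trials = 1000
for n in (10, 25, 100):
    misses = sum(not rejects(n) for _ in range(trials))
    rates[n] = misses / trials            # proportion of Type II errors
    print(f"n = {n:3d}: Type II error rate ~ {rates[n]:.2f}")
```

The Type II error rate falls sharply as n grows, which is the tradeoff-free lever the slide recommends.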

16 Example of significance levels:
Stark (2001) studied the relationship between the psychosocial tasks of pregnancy (e.g., preparation for labor) and pregnant women’s capacity to focus attention. Stark stated the following with regard to significance levels: “For all tests an alpha of .05 was designated a priori for significance. Because this is a new area of study, tests with an alpha of .10 were examined for trends” (p. 197).

17 Statistical Tests In practice, researchers do not construct sampling distributions and calculate critical regions. Research data are used to compute test statistics, using appropriate formulas. For every test statistic, there is a related theoretical distribution. Researchers compare the value of the computed test statistic to values in a table that specify critical limits for the applicable distribution.

18 When researchers calculate a test statistic that is beyond the critical limit, the results are said to be statistically significant. The word significant should not be read as important or clinically relevant. In statistics, significant means that obtained results are not likely to have been the result of chance, at a specified level of probability. A nonsignificant result means that any observed difference or relationship could have resulted from chance fluctuations.

19 TIP: When a statistical test indicates that the null hypothesis should be retained (i.e., when the results are nonsignificant), this is sometimes referred to as a negative result. Negative results are often disappointing to researchers and sometimes lead to rejection of a manuscript by a journal editor. Research reports with negative results are not rejected because editors are prejudiced against certain types of outcomes; they are rejected because negative results are usually inconclusive and difficult to interpret.

20 A nonsignificant result indicates that the observed outcome could have occurred by chance; it does not prove that the research hypothesis is incorrect.

21 Overview of Hypothesis-Testing Procedures
1. Select a test statistic. Researchers consider such factors as whether a parametric test is justified, which levels of measurement were used, whether a between-groups test is needed, and how many groups are being compared.

22 Overview of Hypothesis-Testing Procedures
2. Establish the level of significance. Researchers establish the criterion for accepting or rejecting the null hypothesis before analyses are undertaken. An α level of .05 is usually acceptable.

23 Overview of Hypothesis-Testing Procedures
3. Select a one-tailed or two-tailed test. In most cases, a two-tailed test should be used, but in some cases a one-tailed test may be warranted.

24 Overview of Hypothesis-Testing Procedures
4. Compute a test statistic. Using collected data, researchers calculate a test statistic using appropriate computational formulas, or instruct a computer to calculate the statistic.

25 Overview of Hypothesis-Testing Procedures
5. Calculate the degrees of freedom (symbolized as df). Degrees of freedom is a concept that refers to the number of observations free to vary about a parameter. The concept is too complex for full elaboration here, but fortunately df is easy to compute.
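As the slide notes, df is easy to compute even if the underlying concept is subtle. These two formulas are standard results for common tests (not specific to this presentation's data); the 25-vs-25 example refers to the anxiety study described earlier:

```python
# Hedged sketch: df formulas for two common tests. Standard textbook
# results; the function names are illustrative, not an established API.
def df_independent_t(n1, n2):
    """Independent-samples t-test: total subjects minus two."""
    return n1 + n2 - 2

def df_chi_square(rows, cols):
    """Chi-square test of independence on an R x C table."""
    return (rows - 1) * (cols - 1)

print(df_independent_t(25, 25))   # 48 for the 25-vs-25 anxiety example
print(df_chi_square(2, 3))        # 2
```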

26 Overview of Hypothesis-Testing Procedures
6. Obtain a tabled value for the statistical test. There are theoretical distributions for all test statistics. These distributions enable researchers to determine whether obtained values of the test statistic are beyond the range of what is probable if the null hypothesis were true. Researchers examine a table for the appropriate test statistic and obtain the critical value corresponding to the degrees of freedom and significance level.

27 7. Compare the test statistic with the tabled
value. In the final step, researchers compare the value in the table with the value of the computed test statistic. If the absolute value of the test statistic is larger than the tabled value, the results are statistically significant. If the computed value is smaller, the results are nonsignificant.
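Steps 6 and 7 can be sketched with a tiny excerpt of a published two-tailed t table. The critical values are standard tabled figures; the dictionary layout and the helper name `is_significant` are purely illustrative:

```python
# Hedged sketch: look up the tabled critical value for the given df and
# alpha, then compare it with the computed statistic's absolute value.
T_TABLE = {
    # df: {alpha: two-tailed critical value} -- standard published values
    10: {0.05: 2.228, 0.01: 3.169},
    18: {0.05: 2.101, 0.01: 2.878},
    48: {0.05: 2.011, 0.01: 2.682},
}

def is_significant(t_computed, df, alpha=0.05):
    return abs(t_computed) > T_TABLE[df][alpha]

print(is_significant(2.40, 48, alpha=0.05))   # True:  2.40 > 2.011
print(is_significant(2.40, 48, alpha=0.01))   # False: 2.40 < 2.682
```

The same computed statistic can thus be significant at .05 yet nonsignificant at .01, which is why the α level must be fixed in advance.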

28 Overview of Hypothesis-Testing Procedures
When a computer is used to analyze data, researchers follow only the first two steps, and then give commands to the computer. The computer calculates the test statistic, the degrees of freedom, and the actual probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true. For example, the computer may show that the probability (p) of an experimental group doing better than a control group on a measure of anxiety by chance alone is .025.

29 Overview of Hypothesis-Testing Procedures
This means that only 25 times out of 1000 would a difference between the two groups as large as the one obtained reflect haphazard differences rather than true differences resulting from an intervention. The computed probability level can then be compared with the desired level of significance. If the significance level desired were .05, then the results would be significant, because .025 is more stringent than .05.
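The computer-output decision rule described above reduces to a single comparison between the reported p-value and the chosen α. A minimal sketch (the helper name is hypothetical):

```python
# Hedged sketch: decide significance by comparing p with alpha.
def decide(p_value, alpha=0.05):
    return "significant" if p_value < alpha else "nonsignificant"

print(decide(0.025, alpha=0.05))   # significant:    .025 < .05
print(decide(0.025, alpha=0.01))   # nonsignificant: .025 > .01
```

The same p of .025 flips from significant to nonsignificant when the stricter .01 criterion is applied.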

30 Overview of Hypothesis-Testing Procedures
If the desired significance level were .01, however, the results would be nonsignificant, because .025 is larger than .01. When the groups being compared involve different people (e.g., men versus women), the study uses a between-subjects design, and the statistical test is a between-subjects test (or test for independent groups). Other research designs involve one group of subjects (e.g., with a repeated-measures design, subjects are exposed to two or more treatments). In this situation, comparisons across treatments are not independent because the same subjects are used in all conditions. The appropriate statistical tests for such designs are within-subjects tests (or tests for dependent groups).


