# Lecture 2: Null Hypothesis Significance Testing Continued Laura McAvinue School of Psychology Trinity College Dublin.



Null Hypothesis Significance Testing

- Previous lecture, steps of NHST
  - Specify the alternative/research hypothesis
  - Set up the null hypothesis
  - Collect data
  - Run the appropriate statistical test
  - Obtain the test statistic and associated p value
  - Decide whether to reject or fail to reject the null hypothesis on the basis of the p value

Null Hypothesis Significance Testing

- Decision to reject or fail to reject Ho
  - P value
  - Probability of obtaining results at least as extreme as those observed if Ho is true
  - By convention, use the significance level of p < .05
  - Conclude that it is highly unlikely that we would obtain these results by chance, so we reject Ho
  - Caveat! The fact that there is a significance level does not mean that there is a simple ‘yes’ or ‘no’ answer to your research question

Null Hypothesis Significance Testing

- If you obtain results that are not statistically significant (p > .05), this does not necessarily mean that the relationship you are interested in does not exist
- There are a number of factors that affect whether your results come out as statistically significant
  - One and two-tailed tests
  - Type I and Type II errors
  - Power

One and Two-tailed Tests

- One-tailed / Directional Test
  - Run this when you have a prediction about the direction of the results
- Two-tailed / Non-Directional Test
  - Run this when you don’t have a prediction about the direction of the results

Recall previous example…

- Research question
  - Do anxiety levels of students differ from anxiety levels of young people in general?
- Prediction
  - Due to the pressure of exams and essays, students are more stressed than young people in general
- Method
  - You know the mean score for the normal young population on the anxiety measure = 50
  - You predict that your sample will have mean > 50
  - Run a one-tailed one-sample t test at p < .05 level

One-tailed Test

- Compare the mean of your sample to the sampling distribution for the population mean
- Decide to reject Ho if your sample mean falls into the highest 5% of the sampling distribution

Dilemma

- But! What if your prediction is wrong?
  - Perhaps students are less stressed than the general young population
    - Their own bosses, summers off, no mortgages
  - With the previous one-tailed test, you could only reject Ho if you got an extremely high sample mean
  - What if you get an extremely low sample mean?
- Run a two-tailed test
  - Hedge your bets
  - Reject Ho if you obtain scores at either extreme of the distribution, very high or very low sample mean

Two-tailed Test

- You will reject Ho when a score appears in the highest 2.5% of the distribution or the lowest 2.5%
- Note that it’s not the highest 5% and the lowest 5% as then you’d be operating at p = .1 level, rejecting Ho for 10% of the distribution
- So, we gain the ability to reject Ho for extreme values at either end but values must be more extreme
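The 5% versus 2.5% cut-offs can be checked numerically. The following is a minimal sketch that uses a normal (z) approximation rather than the t test used in the lecture; the standard deviation of 10 and sample size of 25 are hypothetical values chosen only for illustration, and `critical_z` is an illustrative helper name.

```python
from statistics import NormalDist

def critical_z(alpha=0.05, two_tailed=False):
    """Return the z cut-off beyond which Ho is rejected (normal approximation)."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # A two-tailed test puts alpha / 2 in each tail; a one-tailed test puts all of alpha in one tail
    tail_area = alpha / 2 if two_tailed else alpha
    return std_normal.inv_cdf(1 - tail_area)

one_tailed_cutoff = critical_z(0.05, two_tailed=False)  # highest 5%
two_tailed_cutoff = critical_z(0.05, two_tailed=True)   # highest 2.5% and lowest 2.5%

# Hypothetical anxiety example: Ho mean 50, assumed sigma 10, assumed n 25
standard_error = 10 / 25 ** 0.5
print("one-tailed: reject Ho if sample mean >", round(50 + one_tailed_cutoff * standard_error, 2))
print("two-tailed: reject Ho if sample mean >", round(50 + two_tailed_cutoff * standard_error, 2),
      "or <", round(50 - two_tailed_cutoff * standard_error, 2))
```

The two-tailed cut-off sits further from the Ho mean, which is the sense in which values must be more extreme.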

Errors in NHST

- Howell (2008) p. 157
  - “Whenever we reach a decision with a statistical test, there is always a chance that our decision is the wrong one”
- Misleading nature of NHST
  - Because there is a significance level (p = .05), people interpret NHST as a definitive exercise
  - Results are statistically significant or not
  - We reject Ho or we don’t
  - The Ho is wrong or right

Errors in NHST

- Remember we are dealing with probabilities
  - We make our decision on the basis of the likelihood of obtaining the results if Ho is true
  - There is always the chance that we are making an error
- Two kinds of Error
  - We reject Ho when it is true (Type I error)
    - We say there’s a significant difference when there’s not
  - We fail to reject Ho when it is false (Type II error)
    - We say there is no significant difference when there is

Type I Error

- Our anxiety example
- Predict that students will have a greater anxiety score than young people in general
- Test Ho that students’ anxiety levels do not differ from young people
- One-tailed one-sample t-test at p < .05
- Compare sample mean with sampling distribution of mean for the population (Ho)

Type I Error

- Decide to reject Ho if your sample mean falls in the top 5% of the distribution
- But! This 5%, even though at the extreme end, still belongs to the distribution
- If your sample mean falls within this top 5%, there is still a chance that your sample came from the Ho population

Type I Error

- For example, if p = .04, this means that a sample mean this extreme would occur only 4% of the time if your sample came from the Ho population
  - But this is still a chance, you could be rejecting Ho when it is in fact true
- Researchers are willing to accept this small risk (5%) of making a Type I error, of rejecting Ho when it is in fact true
- Probability of making Type I error = alpha (α) = the significance level that you chose
  - .05, .01
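The claim that the Type I error rate equals the chosen significance level can be illustrated by simulation. This is a rough sketch, not the lecture's own procedure: it draws many samples from a population in which Ho is true and counts how often a one-tailed z test rejects. The population mean of 50 comes from the anxiety example; the standard deviation, sample size, number of experiments and seed are all assumed values.

```python
import random
from statistics import NormalDist

def type_one_error_rate(alpha=0.05, n=25, mu0=50.0, sigma=10.0,
                        n_experiments=4000, seed=1):
    """Proportion of simulated experiments that reject Ho even though Ho is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # one-tailed cut-off
    standard_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(n_experiments):
        # Every sample really does come from the Ho population
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / standard_error
        if z > z_crit:
            rejections += 1  # a Type I error
    return rejections / n_experiments

rate = type_one_error_rate()
print("Proportion of false rejections:", rate)
```

With these settings the printed proportion should land close to the chosen alpha of .05, though it will vary a little with the seed.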

Type II Error

- So why not set a very low significance level to minimise your risk of making a Type I error?
  - Set p < .01 rather than p < .05
- As you decrease the probability of making a Type I error you increase the probability of making a Type II error
- Type II Error
  - Fail to reject Ho when it is false
  - Fail to detect a significant relationship in your data when a true relationship exists

For argument’s sake, imagine that H1 is correct

- Sampling Distribution under Ho
- Sampling Distribution under H1
- Reject Ho if sample mean equals any value to the right of the critical value (red region)
  - Correct Decision
- Fail to reject Ho if sample mean equals any value to the left of the critical value
  - Type II Error

Four Outcomes of Decision Making

| Decision | True State of Nature: Ho is True | True State of Nature: Ho is False |
|---|---|---|
| Fail to reject Ho | Correct Decision | Type II Error |
| Reject Ho | Type I Error | Correct Decision |

Power

- You should minimise both Type I and Type II errors
  - In reality, people are often very careful about Type I (i.e. strict about α) but ignore Type II altogether
- If you ignore Type II error, your experiment could be doomed before it begins
  - Even if a true effect exists (i.e. H1 is correct), if β (the probability of a Type II error) is high, the results may not show a statistically significant effect
- How do you reduce the probability of a Type II error?
  - Increase the power of the experiment

Power

- Power
  - The probability of correctly rejecting a false Ho
  - A measure of the ability of your experiment to detect a significant effect when one truly exists
  - 1 - β, where β is the probability of a Type II error
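Power can be computed directly once a specific alternative is assumed. The sketch below uses a one-tailed z test with a known standard deviation as a stand-in for the lecture's t test; the true mean of 55, standard deviation of 10 and sample size of 25 are hypothetical, and `power_one_tailed` is an illustrative name.

```python
from statistics import NormalDist

def power_one_tailed(mu0, mu1, sigma, n, alpha=0.05):
    """Power (1 - beta) of an upper-tailed z test when the true mean is mu1."""
    std_normal = NormalDist()
    standard_error = sigma / n ** 0.5
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Under H1 the z statistic is centred on (mu1 - mu0) / standard_error,
    # so beta is the area of the H1 distribution below the critical value
    beta = std_normal.cdf(z_crit - (mu1 - mu0) / standard_error)
    return 1 - beta

power = power_one_tailed(mu0=50, mu1=55, sigma=10, n=25)
print("beta  =", round(1 - power, 3))
print("power =", round(power, 3))
```

When `mu1` equals `mu0` the function returns alpha, because the only way to reject a true Ho is to make a Type I error.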

How do we increase the power of our experiment?

- Factors affecting power
  - The significance level (α)
  - One-tailed v two-tailed test
  - The true difference between Ho and H1 (μo - μ1)
  - Sample size (n)

The Influence of α on Power

- Reduce the significance level (α)…
  - Reduce the probability of making a Type I error
    - Rejecting the Ho when it is true
  - Increase the probability of making a Type II error
    - Failing to reject the Ho when it is false
  - Reduce the power of the experiment to detect a true effect as statistically significant

Reduce α and reduce power

Increase α and increase power. But! You increase the probability of a Type I error!
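The trade-off between the significance level and power can be seen by holding everything else fixed and varying only alpha. This is a sketch under assumed values (Ho mean 50, true mean 55, standard deviation 10, n = 25) using a one-tailed z approximation, not a reproduction of the figures on the slides.

```python
from statistics import NormalDist

def power_one_tailed(mu0, mu1, sigma, n, alpha):
    """Power of an upper-tailed z test (normal approximation, sigma assumed known)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)    # stricter alpha gives a larger cut-off
    shift = (mu1 - mu0) / (sigma / n ** 0.5)  # true difference in standard-error units
    return 1 - std_normal.cdf(z_crit - shift)

# Hypothetical scenario; only alpha changes between runs
power_by_alpha = {a: power_one_tailed(50, 55, 10, 25, a) for a in (0.01, 0.05, 0.10)}
for a, p in power_by_alpha.items():
    print(f"alpha = {a:.2f}  power = {p:.3f}")
```

Power rises as alpha rises, but so does the probability of a Type I error, which is alpha itself.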

The Influence of One v Two-tailed Tests on Power

- We lose power with a two-tailed test
  - α is divided across the two tails of the distribution
  - Values must be more extreme to be statistically significant
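The loss of power from a two-tailed test can be quantified for a given true difference. In this sketch the true difference is expressed in standard-error units (`shift`), and the value 2.5 is a hypothetical choice; a z approximation is used in place of the t test.

```python
from statistics import NormalDist

def power_z_test(shift, alpha=0.05, two_tailed=False):
    """Power of a z test when the true mean lies `shift` standard errors above the Ho mean."""
    std_normal = NormalDist()
    if two_tailed:
        z_crit = std_normal.inv_cdf(1 - alpha / 2)  # alpha / 2 in each tail
        upper = 1 - std_normal.cdf(z_crit - shift)  # rejections in the predicted direction
        lower = std_normal.cdf(-z_crit - shift)     # rejections in the opposite tail
        return upper + lower
    z_crit = std_normal.inv_cdf(1 - alpha)          # all of alpha in one tail
    return 1 - std_normal.cdf(z_crit - shift)

one_tailed_power = power_z_test(2.5, two_tailed=False)
two_tailed_power = power_z_test(2.5, two_tailed=True)
print("one-tailed power:", round(one_tailed_power, 3))
print("two-tailed power:", round(two_tailed_power, 3))
```

The one-tailed advantage only holds when the effect lies in the predicted direction; if the true difference were in the opposite direction, the one-tailed test would have almost no chance of rejecting Ho.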

The Influence of the True Difference between Ho and H1

- The bigger the difference between μo and μ1, the easier it is to detect it
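To see how the size of the true difference drives power, the sketch below fixes alpha, the standard deviation and the sample size at assumed values (.05, 10 and 25) and varies only the hypothetical true mean. It uses a one-tailed z approximation.

```python
from statistics import NormalDist

def power_for_difference(difference, sigma=10.0, n=25, alpha=0.05):
    """Power of an upper-tailed z test for a given true difference between the H1 and Ho means."""
    std_normal = NormalDist()
    standard_error = sigma / n ** 0.5
    z_crit = std_normal.inv_cdf(1 - alpha)
    return 1 - std_normal.cdf(z_crit - difference / standard_error)

# Hypothetical true differences, in the units of the anxiety measure
differences = [1.0, 2.5, 5.0, 10.0]
powers = [power_for_difference(d) for d in differences]
for d, p in zip(differences, powers):
    print(f"true difference = {d:>4}  power = {p:.3f}")
```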

The Influence of Sample Size on Power

- The bigger the sample size, the more power you have
- A big sample provides a better estimate of the population mean
- With bigger sample sizes, the sampling distribution for the mean clusters more tightly around the population mean
  - The standard deviation of the sampling distribution, known as the standard error of the mean, is reduced
- There is less overlap between the sampling distributions under Ho and H1
- The power to detect a significant difference increases

The Influence of Sample Size on Power
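The link between sample size, standard error and power can be sketched numerically. The standard error of the mean is the population standard deviation divided by the square root of n, so it shrinks as n grows. The values below (Ho mean 50, true mean 53, standard deviation 10) are hypothetical, and a one-tailed z approximation stands in for the t test.

```python
from statistics import NormalDist

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the mean."""
    return sigma / n ** 0.5

def power_one_tailed(mu0, mu1, sigma, n, alpha=0.05):
    """Power of an upper-tailed z test (normal approximation, sigma assumed known)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    return 1 - std_normal.cdf(z_crit - (mu1 - mu0) / standard_error(sigma, n))

# Same hypothetical true difference throughout; only n changes
sample_sizes = [10, 25, 100]
errors = [standard_error(10, n) for n in sample_sizes]
powers = [power_one_tailed(50, 53, 10, n) for n in sample_sizes]
for n, se, p in zip(sample_sizes, errors, powers):
    print(f"n = {n:>3}  standard error = {se:.2f}  power = {p:.3f}")
```

A smaller standard error means narrower sampling distributions under Ho and H1, hence less overlap between them and more power.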

Sample Size Exercise

- Open the following dataset
  - Software / Kevin Thomas / Power dataset (revised)
  - Explores the effects of Therapy on Depression
- Perform two Independent Samples t-tests
  - Analyse / Compare means / Independent Samples t test
  - Group represents Therapy v Control
  - Score represents post-treatment depression
  - Analysis 1: Group1 & Score1
  - Analysis 2: Group 2 & Score 2

Complete the following table

| | Analysis 1 | Analysis 2 |
|---|---|---|
| Size of sample | | |
| Therapy mean score | | |
| Therapy standard deviation | | |
| Control mean score | | |
| Control standard deviation | | |
| Mean difference | | |
| t statistic | | |
| df | | |
| p value | | |

What explains these results?

| | Analysis 1 | Analysis 2 |
|---|---|---|
| Size of sample | 20 | 200 |
| Therapy mean score | 5.5 | 5.5 |
| Therapy standard deviation | 3.03 | 2.89 |
| Control mean score | 6.3 | 6.3 |
| Control standard deviation | 2.75 | 2.62 |
| Mean difference | -.8 | -.8 |
| t statistic | -.618 | -2.051 |
| df | 18 | 198 |
| p value | .54 | .042 |
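The two analyses can be roughly reproduced from the summary statistics alone. The sketch below assumes equal group sizes (10 and 100 per group, inferred from the df values of 18 and 198, not stated in the slides) and computes the two-tailed p value by numerically integrating the t density rather than calling a statistics package. All function names are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p value via Simpson's rule on the interval [0, |t|]."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3       # probability between 0 and |t|
    return 2 * (0.5 - area)    # what remains in both tails

def independent_t(mean1, sd1, mean2, sd2, n_per_group):
    """Independent-samples t test from summary statistics, equal group sizes assumed."""
    se_diff = math.sqrt(sd1 ** 2 / n_per_group + sd2 ** 2 / n_per_group)
    t = (mean1 - mean2) / se_diff
    df = 2 * n_per_group - 2
    return t, df, two_tailed_p(t, df)

# Therapy v Control, using the rounded values from the table
small_t, small_df, small_p = independent_t(5.5, 3.03, 6.3, 2.75, 10)
large_t, large_df, large_p = independent_t(5.5, 2.89, 6.3, 2.62, 100)
print(f"Analysis 1: t = {small_t:.3f}, df = {small_df}, p = {small_p:.3f}")
print(f"Analysis 2: t = {large_t:.3f}, df = {large_df}, p = {large_p:.3f}")
```

The mean difference is the same in both analyses; only the standard error of the difference changes, because it shrinks as the group sizes grow. Small discrepancies from the table are expected since the inputs are rounded.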

So, how do I increase the power of my study?

- You can’t manipulate the true difference between Ho and H1
- You could increase your significance level (α) but then you would increase the risk of a Type I error
- If you have a strong prediction about the direction of the results, you should run a one-tailed test
- The factor that is most under your control is sample size
  - Increase it!
