
1 Statistics fundamentals
Neil Corrigan – Senior Medical Statistician, Clinical Trials Research Unit, University of Leeds

2 Topics to cover
Sampling distribution
p-value (Hypothesis testing)
Confidence interval
Significance & Power

3 Is this coin biased?  Is P(H) ≠ 0.5?
Our primary question here equates to asking “Is this unknown true value, P(H), not equal to 0.5?” Suppose that we have no previous knowledge of this coin – we’ve never flipped it before. Suppose also that we have limited time/resources – we’re in a queue about to spend this coin on something, say – but for whatever reason it has become very important to us to know whether or not this coin is biased. How can we go about trying to answer our question?

4 Is this coin biased?  Is P(H) ≠ 0.5?
How can we try to answer this? Could try repeatedly flipping the coin and see how often Heads comes up.

5 If the coin happens to be fair then:
What do we think is the most likely number of heads from 10 flips? What range of values is reasonably likely? Say we only have enough time to flip the coin 10 times. If we want to use the information from those 10 flips to help us decide whether or not we can rule out the possibility that the coin is fair, then we need to think about what kinds of results we would reasonably expect to see if the coin was in fact fair. Suppose we don’t have a calculator or computer in front of us and can’t do any formal statistical calculations. What, just intuitively, do we think about these questions?

6 If the coin happens to be fair then:
What do we think is the most likely number of heads from 10 flips? What range of values is reasonably likely? 5 Heads. 3-7, 2-8 at a push?

7 H H H T H H T H T H – 7/10 Heads
Is this coin biased? (Is P(H) ≠ 0.5?) Don’t know. Is 7/10 so unlikely from a fair coin that the coin must not be fair? No. We flip our coin 10 times, and this is what we get: 7 Heads, 3 Tails. Just intuitively, without doing any formal statistics, what does this result tell us about our coin? Can we answer our primary question? (I.e. can we rule out the possibility that P(H)=0.5?) If the coin was fair, then would this observation of 7/10 be so unlikely that it casts reasonable doubt on the possibility?

8 Does our result tell us anything about P(H)?
So we can’t conclusively answer our primary question with what we observed, but can we glean ANY information from our experiment? Anything at all? Before our experiment, all we could say was the (obvious) statement that P(H) might be 0, 1, or any value in between. Can we narrow that uncertainty down now that we have some information?

9 Does our result tell us anything about P(H)?
Let’s start off with the easiest inferences… how about P(H)=0? If P(H)=0, then could we have seen 7/10 heads? (No)

10 Does our result tell us anything about P(H)?
We now know that P(H) is not 0, since we saw a H, and is not 1, since we saw a T. Kind of trivial, but it’s a start – we know this now.

11 Does our result tell us anything about P(H)?
How about some of the lower values on the spectrum? If the true P(H)=0.1, then it would be incredibly unlikely that we’d observe 7/10 heads! Of course, it’s not impossible, but based on our results we can pretty confidently say that P(H) is not 0.1. In fact, arguably you can rule out higher values than that too!

12 Does our result tell us anything about P(H)?
Based on our results we can pretty confidently say that P(H) is not 0.1…

13 Does our result tell us anything about P(H)?
Plausible range: 0.25 (??) to 0.95 (??). …in fact, arguably you can rule out higher values than that too!

14 Does our result tell us anything about P(H)?
Plausible range: 0.25 (??) to 0.95 (??). Conclusions: Can’t say whether or not the coin is fair… but pretty confident that P(H) is not very low (e.g. 0.2 or less). So, these are the conclusions that we can intuitively make from our findings. They pretty much sum up this experiment, although they aren’t very precise, which is fine if it’s about a coin, but not so much if it’s about something that we really do need to know with precision (and which isn’t as intuitive to think about). Let’s briefly consider what a Statistician would do if you gave these results to them…

15 So, let’s give our coin flip data to a statistician and ask them to use it to figure out whether the coin is biased.

16 Is the coin biased?
Null hypothesis H0: P(H)=0.5. Alternative hypothesis Ha: P(H)≠0.5. The process of a statistical analysis pretty much follows the rationale that we followed earlier; it just seeks to formalise it to make it more reproducible, quantify it all, build it on a mathematical framework and express it all with much more precision. First things first: we want to know “Is the coin biased?”. In the more formal statistical language it would sound more like this… Null hypothesis = the thing which is to be rejected (“disproven” would be too strong).

17 Is the coin biased? If the coin is fair, then what range of values for “number of heads out of 10 flips” would be reasonably likely? 3-7, 2-8 at a push. What is the sampling distribution of “number of Heads from 10 flips” under H0? We then had a think about what kind of range you’d expect to see if the coin was fair (to lay the foundation for testing whether it’s fair based on the data). Without using statistics, we can kind of put our finger in the air and say how likely certain values “feel”. Similarly, the statistician will ask “what is the sampling distribution?”…

18 What is the sampling distribution?
If you were to repeat your study lots and lots of (/infinitely many) times, what would the spread of results be? Assume H0 is true, i.e. P(H)=0.5. Whenever you run an experiment, the true value that you’re trying to estimate is constant, unchanging, but because of the nature of chance there is a whole range of possibilities of what your results could show. To illustrate this, I painstakingly found a perfectly fair coin and did our experiment (10 coin flips) 100,000 times (by which I mean I wrote a program to simulate flipping a fair coin 10 times, and then repeatedly ran it and recorded the results… so not so painstaking). Here’s what happened:
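The simulation described in these notes is easy to reproduce. Here is a minimal Python sketch (the deck shows no code, so the details below – numpy, the seed, the variable names – are illustrative, not the presenter’s actual program):

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # fixed seed, purely for reproducibility

n_flips = 10       # flips per experiment
n_runs = 100_000   # how many times the 10-flip experiment is repeated

# Each run: count the Heads in 10 flips of a fair coin (H0: P(H)=0.5)
heads = rng.binomial(n=n_flips, p=0.5, size=n_runs)

# Empirical sampling distribution of "number of Heads out of 10 flips"
values, counts = np.unique(heads, return_counts=True)
for v, c in zip(values, counts):
    print(f"{v:2d}/10 Heads: {c / n_runs:.3f}")
```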

19 What is the sampling distribution?
Run 1: 7/10 Heads.

20 What is the sampling distribution?
Run 2: 5/10 Heads.

21 What is the sampling distribution?
Run 3: 4/10 Heads.

22 What is the sampling distribution?
… Run 10: 9/10 Heads. After 10 runs, we can see that 4 and 5 are coming up the most. There has also been a 9/10 – a relatively unlikely result with a fair coin, but it happened!

23 What is the sampling distribution?
… Run 50. After 50 runs, my arm was starting to ache…

24 What is the sampling distribution?
… Run 100: 3/10 Heads. After 100 runs my thumb nail was numb, but also we can see the distribution of results starting to take shape, peaking at 0.5 (5/10 Heads), as one would expect from a fair coin, with the frequency of a result diminishing the further we deviate from 5/10 (with the exception of 0.7 – a chance thing – 100 runs is not a good enough representation of a sampling distribution…).

25 What is the sampling distribution?
… Run 100,000: 8/10 Heads. After 100,000 of these things (okay, I used a computer to simulate this) we start to see a decent representation of what the sampling distribution is for “number of heads out of 10 coin flips” from a fair coin. The most common result was 5/10, as expected, but that only happens ~25% of the time…

26 What is the sampling distribution?
~75%: …~75% of the time the result was not exactly 5/10, deviating by at least 1 “Heads” (or 0.1).

27 What is the sampling distribution?
~35%: …~35% of the time it deviates by 2 or more.

28 What is the sampling distribution?
~11%: …~11% of the time it deviates by 3 or more! So, in a given experiment flipping a fair coin 10 times, you would usually expect the number of Heads to fall between 3 and 7.

29 What is the sampling distribution?
~2%: …it deviates by 4 or more just 2% of the time. So, if you flip a fair coin 10 times, then getting 0, 1, 9 or 10 Heads would be a surprising (although obviously not impossible) outcome! So, a 0, 1, 9 or 10 would be the only values in this experiment that would raise suspicions about the fairness of this coin.
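The ~75% / ~35% / ~11% / ~2% figures don’t actually need simulation – they are exact binomial tail probabilities. A quick check, assuming scipy is available:

```python
from scipy.stats import binom

n, p = 10, 0.5
# P(number of Heads deviates from 5 by at least k) under H0
for k in range(1, 5):
    tail = binom.cdf(5 - k, n, p) + binom.sf(5 + k - 1, n, p)
    print(f"deviates by >= {k}: {tail:.4f}")
# prints ~0.75, ~0.34, ~0.11, ~0.02 for k = 1..4
```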

30 Is the coin biased? If the coin is fair, then what range of values for “number of heads out of 10 flips” would be reasonably likely? 3-7, 2-8 at a push. (~2%) So, the statistician would have a similar answer to ours: 3-7 seems reasonably likely, 2-8 at a push – except derived rigorously, quantified and thus expressed with less ambiguity.

31 Is the coin biased? If the coin is fair, then what range of values for “number of heads out of 10 flips” would be reasonably likely? 3-7, 2-8 at a push. If the coin was fair, then would our observation be so unlikely that it actually casts doubt on the possibility? 7/10 doesn’t seem that unlikely from a fair coin. What is the p-value? Now we determine whether or not our observation – 7/10 Heads – is enough evidence to reject the null hypothesis (or, in our earlier non-statistical terms: is 7/10 heads so unlikely from a fair coin that this coin must not be fair?). Conduct a hypothesis test. Get a p-value.

32 What is the p-value? Do our results deviate so much from what we’d expect to see under the null hypothesis that we actually think the null hypothesis can’t be true? What is the probability that, under H0, we would see a result that was as extreme as, or more extreme than, the observed result? (~35%) Is 7/10 Heads so unlikely from a fair coin that the coin must not be fair? H0: P(H)=0.5. Observed 7/10 Heads. p=0.3438. The p-value seeks to answer this question. (In terms of our coin, the p-value is answering this question…) The actual value itself is this (in terms of our coin, that means considering the sampling distribution under the null hypothesis – what is the likely spread of results if the coin is fair – then looking at our estimate and summarising the probability that you would see such a deviation, or larger, if the coin was fair). Here we get p=0.3438. Generally speaking, anything above 0.1 is considered to be very weak evidence, therefore here we say that we have insufficient evidence to reject H0. IMPORTANT TO NOTE: we are not “accepting H0” – we’re not saying “the coin is a fair coin”. We’re just saying that there is not enough evidence to rule it out – we don’t know whether it’s fair or not.
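The p=0.3438 here is the exact two-sided binomial test, which scipy exposes directly. A quick sketch, assuming scipy:

```python
from scipy.stats import binomtest

# Exact two-sided test of H0: P(H)=0.5, having observed 7 Heads in 10 flips
result = binomtest(k=7, n=10, p=0.5, alternative='two-sided')
print(result.pvalue)  # 0.34375 -> the p=0.3438 quoted on the slide
```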

33 Common p-value misconceptions
1. Non-significant p-value means that the null hypothesis is true. NO! It just means that we can’t reject the null hypothesis, not that the null hypothesis is true! 2. Statistical significance = clinical significance. NO! It just means we can reject the null hypothesis. It doesn’t tell us anything about the effect size. p-values, in isolation, aren’t really that useful. Confidence intervals are what contain the really useful information! Could have a list of different interpretations here and ask people which they think is correct?

36 Common p-value misconceptions
1. Non-significant p-value means that the null hypothesis is true. E.g. 7/10 Heads (p=0.3438) doesn’t mean that the coin is definitely fair! 2. Statistical significance = clinical significance. E.g. observing 55,500/110,000 Heads might give something like: Est: 0.505 (95% CI: 0.502, 0.508; p=0.0009)! Yes, the coin is probably biased in the strictest sense, but not by an amount that actually matters! p-values, in isolation, aren’t really that useful. Confidence intervals are what contain the really useful information!
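The second example can be checked numerically too. A sketch using the same exact test (the method behind the slide’s quoted figures isn’t stated, so the p-value printed here may differ a little from the one above depending on the approximation used):

```python
from scipy.stats import binomtest

# 55,500 Heads in 110,000 flips: a tiny deviation from 0.5, a huge sample
result = binomtest(k=55_500, n=110_000, p=0.5)
ci = result.proportion_ci(confidence_level=0.95)

print(f"Est: {55_500 / 110_000:.3f}")            # 0.505
print(f"95% CI: ({ci.low:.3f}, {ci.high:.3f})")  # ~(0.502, 0.508)
print(f"p = {result.pvalue:.4f}")  # 'significant', yet clinically trivial
```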

37 Insufficient evidence to reject H0.
Is the coin biased? If the coin is fair, then what range of values for “number of heads out of 10 flips” would be reasonably likely? 3-7, 2-8 at a push. If the coin was fair, then would our observation be so unlikely that it actually casts doubt on the possibility? 7/10 doesn’t seem that unlikely from a fair coin. p=0.3438. Insufficient evidence to reject H0. Again, the statistics line up with our general feeling: 7/10 heads is not sufficient evidence to conclude that the coin is biased. Except the statistician expresses this with a p-value.

38 What is the confidence interval?
Is the coin biased? If the coin is fair, then what range of values for “number of heads out of 10 flips” would be reasonably likely? 3-7, 2-8 at a push. If the coin was fair, then would our observation be so unlikely that it actually casts doubt on the possibility? 7/10 doesn’t seem that unlikely from a fair coin. What can we infer about P(H)? It’s probably not very low (e.g. 0.2 or less). What is the confidence interval? We can’t conclusively answer our main question due to lack of evidence, but we can still make some inferences about P(H), i.e. narrow down the plausible possibilities for its value. The statistician would use a confidence interval for this.

39 P(H)=0.5 – Run / Result / Est. and 95% CI
Run 1: 7/10 Heads; Run 2: 5/10 Heads; Run 3: 4/10 Heads; … Run 6: 6/10 Heads; … Run 10: 9/10 Heads; … Run 13: 3/10 Heads; … Run 15: 2/10 Heads; … Run 20: 8/10 Heads. To see how confidence intervals work, let’s look back at the results I got when I repeated the 10-flip experiment with a fair coin 100,000 times. Specifically, let’s just consider the first 20 runs.

40 P(H)=0.5 – Run / Result / Est. and 95% CI
Run 1: 7/10 Heads → 0.7 (0.39, 0.90). In the first run we have 7/10 heads, so our estimate of P(H) is 0.7 (7/10) and the corresponding confidence interval is 0.39 to 0.90. Now, without delving into the method of calculating that interval or how it’s derived, let’s just consider what it represents. The confidence interval is simply an expression of uncertainty. It’s like if you’re about to drive somewhere to meet someone and they ask you “how long are you going to be?” – you wouldn’t respond with “I’m going to be exactly 12.5 minutes”, because you don’t know what the traffic etc. will be like. Instead, you’re more likely to give a rough range of minutes. You express your estimation of an unknown quantity with an interval that represents your uncertainty. This is what the confidence interval is doing. Specifically, here we saw 7/10 heads and we conclude that P(H) is between 0.39 and 0.90; 0.7 is our best estimate, but we can’t rule out any of the values in that interval.
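The deck doesn’t say how its intervals were computed, but the figures in this table are consistent with an Agresti-Coull-style interval, so here is a sketch assuming that method:

```python
import math

def agresti_coull_ci(heads: int, flips: int, z: float = 1.96):
    """Approximate 95% CI for P(H) using the Agresti-Coull adjustment."""
    n_adj = flips + z ** 2                # adjusted number of trials
    p_adj = (heads + z ** 2 / 2) / n_adj  # adjusted proportion
    half = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half), min(1.0, p_adj + half)

low, high = agresti_coull_ci(7, 10)
print(f"7/10 Heads -> 95% CI: ({low:.2f}, {high:.2f})")  # (0.39, 0.90)
```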

41 P(H)=0.5 – Run / Result / Est. and 95% CI
Run 2: 5/10 Heads → 0.5 (0.24, 0.76). The second run gave 5/10 heads. If you analysed this experiment (ignoring the previous one), then you’d get an estimate for P(H) of 0.5 (5/10) and a confidence interval of 0.24 to 0.76 to express the uncertainty.

42 P(H)=0.5 – Run / Result / Est. and 95% CI
Run 3: 4/10 Heads → 0.4 (0.17, 0.69). Etc.

43 P(H)=0.5 – Run / Result / Est. and 95% CI
Got 5/10 again on run 4, so the CI is the same as for run 2.

44 P(H)=0.5 – Run / Result / Est. and 95% CI
Run 5: same again. Note that all of the confidence intervals so far contain the true value (0.5).

45 P(H)=0.5 – Run / Result / Est. and 95% CI
Run 1: 7/10 Heads → 0.7 (0.39, 0.90); Run 2: 5/10 → 0.5 (0.24, 0.76); Run 3: 4/10 → 0.4 (0.17, 0.69); … Run 6: 6/10 → 0.6 (0.31, 0.83); … Run 10: 9/10 → 0.9 (0.57, 1.00); … Run 13: 3/10 → 0.3 (0.10, 0.61); … Run 15: 2/10 → 0.2 (0.05, 0.52); … Run 20: 8/10 → 0.8 (0.48, 0.95). And if we keep doing this for all of them, we see that in fact most of the confidence intervals contain the true value 0.5. In fact, this is the whole point of a confidence interval: a 95% confidence interval is derived in such a way that 95% of the time (or 19/20 times) it will contain the true value. We can see here that 19/20 experiments yielded confidence intervals containing 0.5, the true P(H) value of the fair coin used for this, and just one – run 10 – produced a confidence interval that does not contain 0.5. But of course in reality we don’t repeatedly run our experiment lots and lots of times. We just do it once, so it’s a bit of a lottery which of these results we end up with (with some being more likely than others, as we know from our sampling distribution!). But what we do know is that whichever confidence interval we get, it has been generated via a process that, if repeated lots of times, gives an interval that contains the true value 95% of the time…
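That “95% of the time” claim can itself be checked by simulation: rerun the 10-flip experiment many times, compute an interval each time, and count how often the interval covers 0.5. A sketch reusing the agresti_coull_ci function above (with only 10 flips the outcomes are very discrete, so the empirical coverage typically lands a little above 95% rather than exactly on it):

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n_runs, n_flips, true_p = 100_000, 10, 0.5

heads = rng.binomial(n=n_flips, p=true_p, size=n_runs)
covered = sum(
    low <= true_p <= high
    for low, high in (agresti_coull_ci(int(h), n_flips) for h in heads)
)
print(f"Empirical coverage: {covered / n_runs:.3f}")
```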

46 What can we infer about P(H)?
Is the coin biased? If the coin is fair, then what range of values for “number of heads out of 10 flips” would be reasonably likely? 3-7, 2-8 at a push. If the coin was fair, then would our observation be so unlikely that it actually casts doubt on the possibility? 7/10 doesn’t seem that unlikely from a fair coin. What can we infer about P(H)? It’s probably not very low (e.g. 0.2 or less). 95% CI: (0.39, 0.90). Can confidently rule out the possibility that P(H) lies outside this interval. …so, the statistician would say that “we’re 95% confident that P(H) lies within this interval” or, in other words, we can confidently rule out values outside this interval.

47 Does our result tell us anything about P(H)?
Plausible range: 0.25 (??) to 0.95 (??). Conclusions: Can’t say whether or not the coin is fair… but pretty confident that P(H) is not very low (e.g. 0.2 or less). So, a quick look back at what we thought before…

48 Insufficient evidence to reject H0. p=0.3438.
95% CI: (0.39, 0.90). Conclusions: Insufficient evidence to reject H0 (p=0.3438). Estimate and 95% confidence interval for P(H): 0.70 (0.39, 0.90). …and a summary of what the statistics would tell us. COMMON MISCONCEPTION: typically you hear people interpreting “non-significant” results like this as proof that the null hypothesis is true. If these results got published somewhere, I guarantee that someone would interpret them as “so that coin is fair then”. This is NOT the correct interpretation. There is simply insufficient evidence to reject the null hypothesis. The coin might be fair, or it might be biased. We don’t know, so, in particular, we can’t conclude that the coin is fair. In this case, the sample size might be letting us down. We have only flipped the coin 10 times, that is not a lot to go on, and so this “non-significant” result may in fact be down to a lack of power rather than the coin actually being fair.

49 Topics to cover
Sampling distribution
p-value (Hypothesis testing)
Confidence interval
Significance & Power

50 Study shows that this coin is totally biased! P(H)≈0.8
…so why did our study not produce a “significant” result? (Our study: p=0.3438, 95% CI 0.39 to 0.90.)

51 Common p-value misconceptions
1. Non-significant p-value means that the null hypothesis is true. NO! It just means that we can’t reject the null hypothesis, not that the null hypothesis is true! 2. Statistical significance = clinical significance. NO! It just means we can reject the null hypothesis. It doesn’t tell us anything about the effect size. p-values, in isolation, aren’t really that useful. Confidence intervals are what contain the really useful information!

52 Sampling distributions of number of heads from 10 coin flips where:
H0: P(H)=0.5 → Significance: probability of an (incorrect) significant result under H0 ≈ 2%. Ha: P(H)=0.8 → Power: probability of a (correct) significant result under this specific Ha ≈ 38%.
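Both percentages can be computed exactly rather than read off the plots: build the rejection region from the exact test at the 5% level, then sum binomial probabilities over that region under each hypothesis. A sketch, assuming scipy and a 5% two-sided level:

```python
from scipy.stats import binom, binomtest

n, alpha = 10, 0.05

# Outcomes the exact two-sided test rejects at the 5% level: {0, 1, 9, 10}
rejection = [k for k in range(n + 1) if binomtest(k, n, 0.5).pvalue < alpha]

significance = sum(binom.pmf(k, n, 0.5) for k in rejection)  # ~0.02
power = sum(binom.pmf(k, n, 0.8) for k in rejection)         # ~0.38

print(f"Significance: {significance:.3f}")
print(f"Power at P(H)=0.8: {power:.3f}")
```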

53 The study was under-powered. (Power≈38%)
The coin was biased! Why did our study not produce a significant result? The study was under-powered. (Power≈38%)

54 Sampling distributions of number of heads from 28 coin flips where:
H0: P(H)=0.5 → Significance: probability of an (incorrect) significant result under H0 ≈ 5%. Ha: P(H)=0.8 → Power: probability of a (correct) significant result under this specific Ha ≈ 90%. Don’t have time to flip a coin 100 times. What’s the minimum sample size that gets me decent power?
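The “minimum sample size” question is the same calculation wrapped in a loop: increase n until the exact power at P(H)=0.8 clears the target (the 90% target below is an assumption, read off the ~90% on this slide):

```python
from scipy.stats import binom, binomtest

def exact_power(n: int, p_alt: float, alpha: float = 0.05) -> float:
    """Exact power of the two-sided binomial test of H0: P(H)=0.5."""
    rejection = [k for k in range(n + 1)
                 if binomtest(k, n, 0.5).pvalue < alpha]
    return sum(binom.pmf(k, n, p_alt) for k in rejection)

n = 2
while exact_power(n, p_alt=0.8) < 0.90:
    n += 1
print(n, round(exact_power(n, p_alt=0.8), 3))  # consistent with 28 flips
```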

55 Sampling distributions of number of heads from 10 coin flips where:
H0: P(H)=0.5 → Significance: probability of an (incorrect) significant result under H0 ≈ 2%. Ha: P(H)=0.8 → Power: probability of a (correct) significant result under this specific Ha ≈ 38%.

56 H H H T H H T H T H H H T H H H H H H H H H H T H H H H – 23/28 Heads
We flip our coin 28 times this time, and this is what we get: 23 Heads, 5 Tails. Just intuitively, without doing any formal statistics, what does this result tell us about our coin? Can we answer our primary question? (I.e. can we rule out the possibility that P(H)=0.5?) If the coin was fair, then would this observation of 23/28 be so unlikely that it casts reasonable doubt on the possibility?

57 Sufficient evidence to reject H0. p=0.0009.
Conclusions: Sufficient evidence to reject H0 (p=0.0009). Estimate and 95% confidence interval for P(H): 0.82 (0.64, 0.92). …and a summary of what the statistics would tell us. This time the result is significant: if the coin were fair, then 23/28 Heads would be extremely unlikely, and the confidence interval excludes 0.5, so we reject H0 and conclude that the coin is biased. With 28 flips, the study finally had enough power to detect the bias.
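And p=0.0009 is again just the exact binomial test, now with 23 Heads in 28 flips. A quick check, assuming scipy:

```python
from scipy.stats import binomtest

result = binomtest(k=23, n=28, p=0.5)
print(f"p = {result.pvalue:.4f}")  # 0.0009 -> strong evidence against H0
```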

58 Summary
Sampling distribution: VERY important! Inferences (p-values, confidence intervals…) derive from this. It’s important that it is determined properly!
p-value (Hypothesis testing): Determines whether you can reject the null hypothesis. Nothing more.
Confidence interval: Very useful! Information about the magnitude of the thing we’re trying to estimate (specifically, which values we can confidently rule out).
Significance & Power: Probability of an erroneous significant result (given H0 true) and of a correct significant result (given H0 not true), respectively. In particular, these are used to determine sample sizes.

