[image] + [image] = £55 and [image] = £50 more than a ball. How much is a ball?
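The images on this slide did not survive. Assuming they show a bat and a ball (an assumption based on the well-known bat-and-ball puzzle, not something the transcript confirms), the arithmetic can be worked through as follows, writing b for the price of the ball in pounds:

```latex
\begin{aligned}
\underbrace{(b + 50)}_{\text{bat}} + \underbrace{b}_{\text{ball}} &= 55 \\
2b &= 5 \\
b &= 2.50
\end{aligned}
```

Under that assumption the ball costs £2.50 and the bat £52.50, rather than the intuitive answer of £5.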
“Are people good intuitive statisticians? … …expert colleagues, like us, greatly exaggerated the likelihood that the original result of an experiment would be successfully replicated even with a small sample. They also gave very poor advice to a fictitious graduate student about the number of observations she needed to collect. Even statisticians were not good intuitive statisticians.”
Introduction to Statistical Considerations in Experimental Research Dr. Richy Hetherington and Dr. Kim Pearce
Important messages early: Statistical support is available for your needs. Get advice at the right time. Keep it simple. Set up your tests to get noteworthy results. p < 0.05 is not essential for reporting interesting results.
An Experiment: “The action of trying anything, or putting it to proof; a test, trial” (Oxford English Dictionary). My Life as a Turkey. Book: Illumination in the Flatwoods, Joe Hutto.
Support for broader applied health research methods: For students with social or health research. Signposting with –Surveys –Interviews –Qualitative methods. Tuesday 11/11/2014 (10:00 - 12:00), Dr Justin Presseau.
Today’s Session: Start a live experiment. Discussion of considerations when setting up experiments. Analyse the results of our experiments, with thoughts on what to look out for. The best help for you.
Results: Now how do we analyse the results? FIRST MAKE A GUESS OF HOW MANY YOU ARE OUT BY IN TOTAL AND WRITE IT ON THE BACK.
Results: Unfold the sheet. Find the difference between your guess and the real answer (take the smaller number from the larger). Write it in the final column. Add all the numbers in the last column. That will be your ‘coefficient of unconscious counting’.
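The scoring steps above can be sketched in a few lines of Python. This is only an illustration: the function name and the example numbers are made up, not taken from the session.

```python
def coefficient_of_unconscious_counting(guesses, real_answers):
    """Sum of the absolute differences between each guess and the real answer."""
    if len(guesses) != len(real_answers):
        raise ValueError("need one real answer per guess")
    # abs() is the same as taking the smaller number from the larger.
    return sum(abs(g - r) for g, r in zip(guesses, real_answers))


# Hypothetical sheet: three guesses against three real counts.
total_error = coefficient_of_unconscious_counting([12, 30, 7], [15, 28, 7])
print(total_error)  # 3 + 2 + 0 = 5
```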
A New Hypothesis: Men are more confident about their numerical guesses than women. Less chance of error. Maybe more chance of significance?!
Take Home Messages: Leave no stone unturned (use all possible sources of information). Training to help (workshops throughout the year): Library Databases; Robust Search Methodologies for Literature Review; Systematic Review; Alerting Services; Advanced Medline. Think about what is coming next.
Planning Your Experiments. Take home message: Don’t believe everything you read. Introduction to Critical Appraisal (online); Academic Integrity and Plagiarism. Use non-rigorous experiments but be prepared to repeat them with rigour. Take home message: Get as much help as is available in setting up your experiments (shy bairns get nowt!).
Make every result count. Take home message: Set up your experiments so all eventualities are interesting. Results can be meaningful and interesting without being statistically significant. Also, reporting non-significant findings saves others from needlessly repeating that experiment.
Subject Selection and Randomisation: Make sure the sample you take is representative of what you are testing. Samples should be drawn at random to avoid bias, e.g. are you a representative sample of the population? If I want more left-handed people for my experiment, how should I find them?
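One way to let chance, rather than the experimenter, decide who goes into which group is sketched below. The function name, subject labels and seed are illustrative assumptions only.

```python
import random


def randomly_assign(subjects, n_groups=2, seed=None):
    """Shuffle the subjects, then deal them into n_groups of near-equal size."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    # Deal the shuffled subjects out like cards: group sizes differ by at most one.
    return [shuffled[i::n_groups] for i in range(n_groups)]


subjects = [f"subject_{i}" for i in range(1, 11)]  # hypothetical labels
treatment, control = randomly_assign(subjects, n_groups=2, seed=42)
print(treatment)
print(control)
```

Recording the seed alongside the results lets someone else check how the allocation was made.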
Replication: Combining datasets from separate experiments is difficult. Datasets can be treated as replicates if all other variables are the same or weighted. Analysis of replicates indicates the amount of variation in a result.
Controls: Controls should give you internal validity. Take as much care with controls as with samples. Each experiment requires its own control.
Why have small sample sizes? Non-human primates: often n=1. Very rare conditions: the population is small. Animal experimentation.
Get a statistician’s help now: “To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.” Dr. R. A. Fisher, ca. 1938
Males vs Females: The 2 sample t-test (Parametric Test). Subjects (units) are usually randomly assigned to two groups. One of the groups undergoes experimental manipulation (e.g. has a treatment applied); the other group is the control. In many examples, however, two groups are compared where membership is ‘fixed’, e.g. males vs females, left vs right handed etc. We are testing if the two population means are equal. The 2 sample t-test statistic makes use of 1. the difference between the (average) values of the male and female groups, 2. the (pooled) standard deviation, and 3. the sizes of the male and female groups. (We do not have to have equal numbers in our groups.) We compare the value of the statistic to a statistical distribution. The significance of the statistic is obtained and is expressed by a ‘p value’. When p is < 0.05 we say that the statistic is statistically significant, i.e. in this case, there is evidence that the male group is different to the female group (in the population).
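As a rough sketch of how those three ingredients combine, the pooled (equal-variance) 2 sample t statistic can be computed with the Python standard library alone. The function names and the data are invented for illustration, and the p value is obtained by numerically integrating the t density; in practice a statistics package would do this step for you.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p value, using Simpson's rule on the t density over [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1 - 2 * area)


def pooled_t_test(a, b):
    """Return (t, df, p) for the equal-variance 2 sample t-test."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance: each sample variance weighted by its degrees of freedom.
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df, two_sided_p(t, df)


# Hypothetical scores for two groups (group sizes need not be equal).
males = [12, 15, 11, 14, 13, 16]
females = [10, 9, 12, 11, 8, 10]
t, df, p = pooled_t_test(males, females)
print(f"t = {t:.3f}, df = {df}, p = {p:.4f}")
```

The larger the difference in means relative to the pooled standard deviation and the group sizes, the larger |t| and the smaller the p value.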
Result using the data available: Do males and females differ? P=? Let’s look at the data on a plot.
Using smaller samples: 5 males and 5 females were randomly chosen and the 2 sample t-test was again carried out. Is there evidence that the male group is different to the female group (in the population)? As the group size is small, there is a reduced chance of detecting a difference between the male and female groups when we conduct the test, even if one truly exists.
What is the power of these tests? We would like our test to have high power, which means that the test is likely to detect a difference when it truly exists. The power of the test is influenced by different things, including sample size. The lower sample size of our 2nd test (using 5 males and 5 females) means that the test’s power has been reduced.
Power of Our tests Test 1 (large sample sizes). Power = Test 2 (5 males, 5 females). Power=
What influences the power of a test? 1. As variation in the sample increases, power decreases. 2. As the difference we care about decreases, power decreases. 3. As sample size decreases, power decreases.
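The three influences listed above can be illustrated with a small sketch. It uses a normal approximation to the power of a two-sided comparison of two group means with equal group sizes; this is an assumption made to keep the example short (it overstates power for very small groups, where the t distribution should be used), and all numbers are invented.

```python
from statistics import NormalDist


def approx_power(diff, sd, n_per_group, alpha=0.05):
    """Approximate power to detect a true difference `diff` between two means."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # True difference expressed in units of its standard error.
    shift = abs(diff) / (sd * (2 / n_per_group) ** 0.5)
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


baseline = approx_power(diff=5, sd=10, n_per_group=50)
more_variation = approx_power(diff=5, sd=15, n_per_group=50)
smaller_difference = approx_power(diff=3, sd=10, n_per_group=50)
smaller_sample = approx_power(diff=5, sd=10, n_per_group=5)

for label, value in [
    ("baseline", baseline),
    ("more variation", more_variation),
    ("smaller difference", smaller_difference),
    ("smaller sample", smaller_sample),
]:
    print(f"{label:<20} power = {value:.2f}")
```

Each of the three changes lowers the power relative to the baseline.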
Prospective Power Analysis (used before collecting data): Finding a sample size to detect an effect size we care about at a specific power. Usually need to specify: alpha level; variance (from literature or pilot data); statistical power; effect size we care about*. * Effect size could be, for example, the difference between the means.
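A minimal sketch of such a calculation for comparing two means is given below. It assumes equal group sizes and uses the common normal-approximation formula n = 2 (z_alpha + z_power)^2 sd^2 / diff^2 per group; the function name and inputs are illustrative, and dedicated software using the t distribution will give a slightly larger answer.

```python
import math
from statistics import NormalDist


def n_per_group(diff, sd, power=0.8, alpha=0.05):
    """Approximate subjects needed per group to detect `diff` with the given power."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # from the alpha level (two-sided)
    z_power = z.inv_cdf(power)          # from the power we want
    n = 2 * ((z_alpha + z_power) * sd / diff) ** 2
    return math.ceil(n)                 # round up to whole subjects


# Hypothetical inputs: sd of 10 from pilot data, difference we care about of 5.
print(n_per_group(diff=5, sd=10))    # 63
print(n_per_group(diff=2.5, sd=10))  # 252: halving the difference roughly quadruples n
```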
Retrospective Power Analysis (after test has been done on collected data): controversial! Finding the power of the test that you have performed to detect “an effect size”. Usually need to specify: alpha level; variance (from data); sample size; effect size.
Retrospective Power Analysis: You could calculate power based on the effect size you observe in your data: not recommended. Power calculated in this way is related to the p value of the test, and both are dependent on the observed effect size. - A non-significant test tends to have low power; - A significant test tends to have high power.
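The link between ‘observed power’ and the p value can be made concrete with a short sketch. It assumes a two-sided z-test (a simplification of the t-test, chosen so that only the standard library is needed); under that assumption the observed power is completely determined by the p value, so it adds no new information.

```python
from statistics import NormalDist

Z = NormalDist()


def observed_power_from_p(p, alpha=0.05):
    """Power computed by plugging the observed effect back in, for a z-test."""
    z_obs = Z.inv_cdf(1 - p / 2)        # observed |z| implied by the p value
    z_crit = Z.inv_cdf(1 - alpha / 2)   # critical value for the test
    return Z.cdf(z_obs - z_crit) + Z.cdf(-z_obs - z_crit)


for p in (0.50, 0.20, 0.05, 0.01, 0.001):
    print(f"p = {p:<5}  observed power = {observed_power_from_p(p):.2f}")
```

A result sitting exactly at p = 0.05 has an observed power of about 0.5; larger p values always give lower observed power and smaller p values always give higher.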
Retrospective Power Analysis: Calculate power based on the effect size you care about. Less controversial. For example, say we get a non-significant test: we can work out the power that the test has to detect an effect size that we care about. If the test has low power to detect this effect size then you can do something about it (e.g. collect more data) to increase the power, then continue to evaluate the same problem; if the test has high power to detect this effect size, then you may conclude that there is no meaningful difference (effect) and refrain from collecting additional data. It is suggested that you also report a 95% confidence interval for power (as variance is estimated from sample data). Which effect size should I choose? Look at a range of effect sizes. You can also use ‘reverse power analysis’: determine the effect size detectable with a certain power. The question could be ‘what effect size am I able to detect with my data at power 0.8?’
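The ‘reverse power analysis’ question can be sketched as follows for a comparison of two means with equal group sizes. The formula is a normal approximation and the inputs are hypothetical; the function name is an assumption made for this example.

```python
from statistics import NormalDist


def detectable_difference(sd, n_per_group, power=0.8, alpha=0.05):
    """Smallest true difference between means detectable with the given power."""
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    standard_error = sd * (2 / n_per_group) ** 0.5
    return multiplier * standard_error


# Hypothetical study with sd = 10: what can 5 per group detect, versus 50 per group?
print(round(detectable_difference(sd=10, n_per_group=5), 1))
print(round(detectable_difference(sd=10, n_per_group=50), 1))
```

If the detectable difference is much larger than the difference you care about, the study could not have been expected to find it.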
Retrospective Power Analysis: Calculate confidence intervals about the effect size calculated from your data (recommended). For example, if dealing with differences between means, we can be 95% confident that the true difference between the means (in the population) lies within this interval. If zero is contained within the 95% confidence interval, there is no evidence at the 5% level to suggest that there is a difference between the means. We ask ourselves: does the ‘difference we care about’ lie in this interval? Confidence intervals ‘quantify our uncertainty’.
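For the difference between two means under the pooled 2 sample t-test, one standard form of the 95% confidence interval is shown below. The notation is introduced here for illustration and is not taken from the slides: sample means \bar{x}_1 and \bar{x}_2, sample variances s_1^2 and s_2^2, and group sizes n_1 and n_2.

```latex
(\bar{x}_1 - \bar{x}_2) \;\pm\; t_{0.975,\; n_1 + n_2 - 2}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}},
\qquad
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}
```

The interval is centred on the observed difference, and it narrows as the group sizes grow or the pooled standard deviation s_p shrinks.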
Retrospective Power Analysis References: Hoenig, J.M. and Heisey, D.M. (2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. The American Statistician 55, 19-24. Thomas, L. (1997). Retrospective power analysis. Conservation Biology 11, 276-280. Lenth, R.V. (2001). Some practical guidelines for effective sample size determination. The American Statistician 55(3), 187-193.
More than 2 groups: 1-way Analysis of Variance (ANOVA) We are testing if population means are equal when there are 3+ groups. 1-way ANOVA is also called a ‘completely randomised’ experiment. Subjects are regarded as being homogeneous ‘units’; even so, the subjects are assigned to the experimental groups at random to reduce the risk of any (unknown) variation influencing the experiment.
More than 2 groups: 1-way Analysis of Variance (ANOVA). Each group comprises different subjects. A measurement is recorded for each subject (say, “test score”). Although not necessary, it is usually a good idea to have the same number of subjects in each treatment group. Hypothetical experimental set-up: say treatment 1 is learning method 1 and treatment 2 is learning method 2.
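As a rough illustration of what the 1-way ANOVA compares, the F statistic is the ratio of the variation between the group means to the variation within the groups. The sketch below uses invented test scores for three learning methods and stops at the F statistic; a statistics package would convert it to a p value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual subjects around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical test scores, with different subjects in each group.
method_1 = [4, 5, 6]
method_2 = [6, 7, 8]
method_3 = [8, 9, 10]
f, df_between, df_within = one_way_anova_f([method_1, method_2, method_3])
print(f"F = {f:.1f} on {df_between} and {df_within} degrees of freedom")
```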
Adding a 2nd Factor: 2-Way ANOVA. In a 2-way ANOVA we have 2 factors. Experiments such as this with two or more crossed factors are called factorial experiments. There are n replicates per treatment combination (here 10 replicates), i.e. there are 10 different people per treatment combination. The subjects (units) are considered homogeneous, and these units are randomly assigned to the 6 experimental conditions (combinations). Here the 2 factors are ‘alertness’ and ‘drug’ type. By testing, we can establish if there are differences between (i) levels of alertness and (ii) levels of drug, and (iii) establish if there is an alertness x drug interaction.
2-Way ANOVA: What is meant by an interaction? There is a significant interaction. The lines on the plot are non-parallel. The difference in (mean) driving performance between fresh and tired subjects depends on which treatment (drug) they have received. If an interaction is significant you must be careful interpreting the main effects: here, the effect of being fresh or tired is dependent on which level of drug you are considering.
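Non-parallel lines mean that the fresh-versus-tired gap is not the same at every level of drug. A small descriptive sketch is given below; the drug labels and scores are invented, only two drug levels are shown for brevity, and whether the interaction is statistically significant would still need the 2-way ANOVA itself.

```python
from statistics import mean

# Hypothetical driving-performance scores per (alertness, drug) combination.
scores = {
    ("fresh", "placebo"): [20, 22, 21],
    ("tired", "placebo"): [14, 15, 16],
    ("fresh", "drug A"): [21, 20, 22],
    ("tired", "drug A"): [20, 19, 21],
}


def fresh_minus_tired(drug):
    """Gap in mean performance between fresh and tired subjects for one drug."""
    return mean(scores[("fresh", drug)]) - mean(scores[("tired", drug)])


gap_placebo = fresh_minus_tired("placebo")
gap_drug_a = fresh_minus_tired("drug A")
# If the lines were parallel the two gaps would be equal and this would be zero.
interaction_contrast = gap_placebo - gap_drug_a
print(gap_placebo, gap_drug_a, interaction_contrast)
```

Here being tired costs 6 points under placebo but only 1 point under drug A, so a single ‘effect of tiredness’ would be misleading.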
1-way ANOVA - revisited: What are its disadvantages? 1. We may get differences between treatment groups occurring not just because the treatments are having different effects, but also because the groups of people tested are different (due to IQ levels, age, experience etc.), i.e. there is a lot of noise which can cloud the result. 2. It uses a lot of subjects.
Repeated Measures: Each subject has a measure taken at each level of the treatment factor. In the example below, ‘learning method’ is the factor. It is called a ‘within-subjects’ factor.
            Learning Method
            One    Two    Three
Person 1
Person 2
...
Person 20
Note this is a simple example! There are many other more complex designs.
Repeated Measures Disadvantages: Practice Effect: say if you had to learn 3 similar lists. The first list was learned under a control condition, then the second under method A, then the third under method B. An improvement under method A, for example, may be a practice effect – the more lists one learns, the better one gets at learning lists. Carry over effect: Recall of items in a list is prone to interference from items in previous lists. Order Effect (dependent on sequence of conditions). If we moved from method A to control condition, it would be almost impossible for the subject to cease to use method A on demand.
Repeated Measures: Counterbalancing. Remedy by “counterbalancing”: the order of presentation of the levels making up the repeated measures factor is varied from subject to subject. It is hoped that carry over effects and order effects will balance out. Counterbalancing makes little sense in some situations, e.g. it would make little sense to have the control condition coming last in the above example.
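One simple way to vary the order from subject to subject is full counterbalancing, where every possible order is used equally often. The sketch below assumes three conditions and 12 subjects; the names are illustrative, and in a real study the mapping of subjects to orders would itself be randomised.

```python
from itertools import cycle, permutations


def counterbalanced_orders(conditions, n_subjects):
    """Give each subject one presentation order, cycling through all possible orders."""
    all_orders = list(permutations(conditions))  # 3 conditions give 6 orders
    order_cycle = cycle(all_orders)
    return [next(order_cycle) for _ in range(n_subjects)]


orders = counterbalanced_orders(["method one", "method two", "method three"], n_subjects=12)
for subject, order in enumerate(orders, start=1):
    print(f"Person {subject}: {' -> '.join(order)}")
```

With 12 subjects and 6 possible orders, each order is used exactly twice, so each method appears equally often in first, second and third position.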
Repeated Measures: Instead of the effects of different treatments being studied for a set of subjects, we may look at the effect of something over time. For example: does IQ change when we compare a set of subjects at age 12, age 13, age 14 and age 15? Or: a set of subjects learns a list of 50 words and is given 3 trials; the number of words recalled correctly per trial is recorded. We can test if the subjects learn as a function of practice.
I haven’t much experience….. I have no time TO LEARN IT! Ha Ha….I will get someone to do it for me….. I’ve never done statistics before! I did statistics years ago and can’t remember a thing! Statistics is easy – I will have no trouble….. HELP!!!! YAWN – Statistics is SO tedious and boring!!! Why should I have to learn it when it’s not going to be part of my job? Aaarrrgggghhh!!! I thought I’d left maths behind a long time ago! It’s not important….. I know what I want to do but which statistical test should I use?…..
When do you need a 1-2-1 statistical session? When: 1. You do not know what sample size is required to get a reliable result. 2. You need to check that your proposed design is appropriate for a statistical test. 3. You have some idea of how to analyse your data but you need to double check and/or get further advice on appropriate methods. 4. You need some suitable study references.
Statistics 1-2-1 Sessions: The statistics 1-2-1 sessions are only 1 hour long. They are NOT: 1. Meant as a means of regular intensive statistical tuition. 2. Provided to solve a list of all of your statistical problems. 3. Provided to have a statistician do your analysis for you. 4. Provided to correct your results. 5. A means to have a statistician interpret results and write your conclusions. PLEASE send a detailed description of your query at least 2 days before the session. PLEASE avoid bringing queries/papers to the session which have not previously been seen by the statistician.
Statistics – The Way Forward: Think Ahead! What are the potential problems? Drop out? Missing values? Use your supervisor. Read some statistics books that feature the types of tests you need (manuals written to accompany statistical packages are good). Don’t gather your data, THEN try and fit a statistical test to a messy data set: you are going to run into problems, e.g. missing values, unequal replicates etc. It could make your analysis much more difficult than it should have been, and you may have to learn advanced techniques. Please don’t leave the statistics until the last minute. The analysis can be VERY time consuming and the writing of associated conclusions has to be spot on!
Analysis Software There are many statistics packages available. MINITAB & SPSS are the most widely used & among the most straightforward to learn (Minitab has a good help facility) The ISS (computing service) provides support to users. Other packages (e.g. SAS) may be used in various schools. Excel is not recommended as a piece of analysis software.
So what is right for you? Refresher in stats: –ISRU very basic stats (45 minutes) –ISRU basic stats (3 hours) clinical / pure science. Overview of stats packages. SPSS Beginners and Advanced. Getting started with SAS. MatLab. Introduction to Applied Health Research Methods. One to one stats is useful for anyone at the right time. Maths aid by appointment (ncl.ac.uk/students/mathsaid/support/book.htm). Applied Statistics (ICM students).
Important messages reminder: Statistical support is available for your needs. Get advice at the right time. Keep it simple. Don’t underestimate what information is relevant. Set up your tests to get noteworthy results. p < 0.05 is not everything.