1 Choosing the right test Mathematics & Statistics Help University of Sheffield

2 Learning outcomes
By the end of this session you should know about:
- Some useful approaches to analysing data
By the end of this session you should be able to:
- Recognise different data types
- Use a flowchart to decide which analysis method to use
- Undertake some basic analyses and construct appropriate charts for your data

3 Some initial thoughts

4 Planning a study
- What do you want to investigate and why? What are your aims?
- How are you going to investigate it?
- How will you collect your data?
- Who/what is in the sample?
- How will you summarise your data?
- How will you analyse your data?

5 Steps for choosing the right test (1)
Clearly define your research question.
- What is your main outcome of interest? There may be more than one.
- What data type is it? The data type will determine the type of analysis.
- Are the observations paired?
- Can it be characterised using a known distribution (i.e. parametric vs non-parametric test)?
- What may affect the outcome of interest? What data type is it/are they?
- How will your results be summarised?
- What charts can you use to display your results?

6 Data types: recap
What types of data are there?

A data set is made up of observations (or individuals), and for each observation there are data variables. Variables fall into two main categories: numerical and categorical.

Categorical variables indicate categories, for example gender (Male or Female) and marital status (Single, Married, Divorced or Widowed). They are sometimes coded as numbers, e.g. 1 = male. Categorical variables are either ordinal or nominal. If the categories are meaningfully ordered, the variable is ordinal; if the order of the categories does not matter, the variable is nominal. For example, satisfaction level (dissatisfied, satisfied, highly satisfied) and education level (secondary, sixth form, undergraduate, postgraduate) are ordinal variables; a student's religion (Christian, Muslim, Hindu, etc.) and gender (Male, Female) are nominal variables.

Numerical variables take meaningful, comparable numbers, such as blood pressure, height, weight, income, age and probability of illness. They are either continuous or discrete. Continuous variables can take any value within a range and are the most common, e.g. body weight, height, income. Discrete variables can only take whole numbers, such as the number of students in a class or the number of new patients each day, but they are treated as continuous for statistical analysis if they cover a large range of values.

There is another variable type called a 'label' variable, which identifies observations uniquely, such as a student ID or a subject's name.

7 Summary measures: recap
Data type | Summary statistics
Nominal | Mode, %'s
Ordinal | Mode, median, %'s
Discrete (count) | %'s; means and medians can also be calculated as for continuous data, depending on how many separate counts you have
Continuous, normally distributed | Mean, standard deviation
Continuous, skewed | Median, interquartile range
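
The summary measures above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not part of the original session (which uses SPSS): the function name `summarise`, the labels "nominal", "normal" and "skewed", and all the values are assumptions made up for the example. Quartile conventions differ between packages, so the interquartile range may not match other software exactly.

```python
import statistics
from collections import Counter

def summarise(values, data_type):
    """Return the summary statistics suggested for each data type.

    data_type is a hypothetical label: "nominal", "normal" or "skewed".
    """
    if data_type == "nominal":
        counts = Counter(values)
        percentages = {k: 100 * c / len(values) for k, c in counts.items()}
        return {"mode": counts.most_common(1)[0][0], "percentages": percentages}
    if data_type == "normal":
        # Mean and sample standard deviation for roughly normal data.
        return {"mean": statistics.mean(values), "sd": statistics.stdev(values)}
    if data_type == "skewed":
        # Median and interquartile range for skewed data.
        q1, _, q3 = statistics.quantiles(values, n=4)
        return {"median": statistics.median(values), "iqr": q3 - q1}
    raise ValueError(f"unknown data type: {data_type}")

# Made-up values, for illustration only.
print(summarise(["single", "married", "married", "widowed"], "nominal"))
print(summarise([2, 4, 4, 4, 5, 5, 7, 9], "normal"))
print(summarise([1, 2, 3, 4, 5, 6, 7], "skewed"))
```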

8 Chart types: recap
One variable
- Categorical: pie chart, bar chart
- Numerical discrete: bar chart
- Numerical continuous: histogram, boxplots
Two variables
- Both categorical: stacked bar chart, clustered bar chart, multiple pie charts
- One categorical / one numerical discrete: boxplots (sometimes!), multiple bar charts
- One categorical / one numerical continuous: boxplots, multiple histograms
- Both numerical: scatterplot

9 Steps for choosing the right test (2)
Are you interested in:
- Testing differences between groups? How many groups are there?
- Assessing/modelling the relationship between variables?
Are the observations paired? Is the pairing due to having repeated measurements of the same variable for each subject?
Does the test you have chosen make any assumptions? Are the assumptions met? e.g. the assumption of normality for the t-test

10 Test assumptions
Parametric tests: generally assume that the data, or some function of the data, follow a known distribution, e.g. normal.
Non-parametric tests: usually based on ranks/signs rather than the actual data.

All tests have assumptions, and one of the main assumptions for many of the most common tests is that the data are normally distributed. Tests with this assumption are called parametric tests. Non-parametric tests do not require this assumption because they are based on the ranks of the data rather than the actual values.

11 Non-parametric methods are used when:
- The dependent variable is ordinal
- A plot of the data appears to be very skewed, or the data do not seem to follow any particular shape or distribution (e.g. Normal)
- Assumptions underlying the parametric test are not met
- There are potentially influential outliers in the dataset
- The sample size is small

12 Comparing averages (1)
For comparing means ask two questions:
- Are there repeated measurements of the same variable for each participant?
- How many means are being compared? 2 = t-test, 3+ = ANOVA

Comparing BETWEEN groups:
Number of groups | Normally distributed | Skewed or ordinal
2 | Independent samples t-test | Mann-Whitney
3+ | One-way ANOVA | Kruskal-Wallis

13 Paired data (1)
Most commonly, measurements from the same individuals collected on more than one occasion.
Can be used to look at differences in mean score across:
- 2 or more time points, e.g. before/after a diet
- 2 or more conditions, e.g. a hearing test at different frequencies
Example: each person listened to a sound at three different frequencies until they could no longer hear it. A repeated measures ANOVA would be used to test for a difference between the frequencies.

14 Comparing averages (1)
For comparing means ask two questions:
- Are there repeated measurements of the same variable for each participant?
- How many means are being compared? 2 = t-test, 3+ = ANOVA

Comparing | Number of means | Normally distributed | Skewed or ordinal
BETWEEN groups | 2 | Independent samples t-test | Mann-Whitney
BETWEEN groups | 3+ | One-way ANOVA | Kruskal-Wallis
WITHIN the same subject | 2 | Paired t-test | Wilcoxon signed rank test
WITHIN the same subject | 3+ | Repeated measures ANOVA | Friedman
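
The two questions and the table above amount to a small lookup. The sketch below encodes it as a Python function; the name `choose_test` and its parameters are hypothetical and chosen only for this illustration. It returns the name of the suggested test and does not run any analysis.

```python
def choose_test(n_means, repeated_measures, normally_distributed):
    """Look up the suggested test for comparing averages.

    n_means: how many means (groups or time points) are compared.
    repeated_measures: True if the same subjects are measured each time.
    normally_distributed: False for skewed or ordinal data.
    """
    if n_means < 2:
        raise ValueError("need at least two means to compare")
    three_plus = n_means >= 3
    table = {
        # (repeated_measures, three_plus, normally_distributed): test
        (False, False, True): "Independent samples t-test",
        (False, True, True): "One-way ANOVA",
        (False, False, False): "Mann-Whitney",
        (False, True, False): "Kruskal-Wallis",
        (True, False, True): "Paired t-test",
        (True, True, True): "Repeated measures ANOVA",
        (True, False, False): "Wilcoxon signed rank test",
        (True, True, False): "Friedman",
    }
    return table[(repeated_measures, three_plus, normally_distributed)]

# Weight before and after a diet, roughly normal paired differences.
print(choose_test(2, repeated_measures=True, normally_distributed=True))
# Hearing thresholds at three frequencies, skewed data.
print(choose_test(3, repeated_measures=True, normally_distributed=False))
```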

18 Comparing averages (2)
Comparing | Dependent (outcome) variable | Independent (explanatory) variable | Parametric test (data are normally distributed) | Non-parametric test (ordinal/skewed data)
Two INDEPENDENT groups | Continuous | Nominal (binary) | Independent t-test | Mann-Whitney test / Wilcoxon rank sum
3+ INDEPENDENT groups | Continuous | Nominal | One-way ANOVA | Kruskal-Wallis test
2 measurements on the same subject, e.g. weight before and after a diet | Continuous | Time/condition variable | Paired t-test | Wilcoxon signed rank test
3+ measurements on the same subject | Continuous | Time/condition variable | Repeated measures ANOVA | Friedman test

19 Examples?

20 What to check for normality
Comparing (parametric test) | What to check for normality | Non-parametric test for ORDINAL variable or skewed data
Independent samples t-test | Dependent variable by group | Mann-Whitney U test
ANOVA | Residuals (differences between each individual and their group mean) | Kruskal-Wallis test
Paired t-test | Paired differences | Wilcoxon signed rank test
Repeated measures ANOVA | Residuals by time point (differences between each individual and the time point mean) | Friedman test
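
The quantities named in the middle column can be computed directly. The sketch below shows the paired differences (for a paired t-test) and the residuals from group means (for ANOVA) in plain Python. The function names and the numbers are made up for illustration; the code only produces the values whose distribution would then be inspected, and does not itself test for normality.

```python
import statistics

def paired_differences(before, after):
    """Differences (after minus before) to check before a paired t-test."""
    if len(before) != len(after):
        raise ValueError("paired data must have equal lengths")
    return [a - b for b, a in zip(before, after)]

def group_residuals(groups):
    """Residuals to check before ANOVA: each value minus its group mean."""
    residuals = {}
    for name, values in groups.items():
        mean = statistics.mean(values)
        residuals[name] = [v - mean for v in values]
    return residuals

# Hypothetical weights (kg) before and after a diet.
print(paired_differences([70, 80, 92], [68, 81, 88]))

# Hypothetical scores in three independent groups.
print(group_residuals({"a": [1, 2, 3], "b": [10, 20], "c": [5, 5, 8]}))
```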

24 Example 1: Did gender affect ticket price paid on the Titanic?
Steps:
1. What is the outcome variable?
2. What is the grouping / explanatory variable?
3. What methods are available to analyse these data?
4. Check the assumptions
5. Conduct the appropriate analysis and report the results
What test do you think would be appropriate?

25 Example 1: Did gender affect ticket price paid on the Titanic?
Steps:
1. What is the outcome variable? Ticket price
2. What is the grouping / explanatory variable? Gender
3. What methods are available to analyse these data? We are comparing ticket price between two groups (male and female), so the most appropriate method is the independent samples t-test.
4. Check the assumptions. The t-test assumes that the groups are independent, the data in the two groups are normally distributed and the variability in the two groups is similar.
5. Conduct the appropriate analysis and report the results. If the assumptions for the t-test are not met, use the Mann-Whitney U test.

26 Example 1: Did gender affect ticket price paid on the Titanic?
Data were positively skewed, so a Mann-Whitney U test was carried out to compare the ticket price for men and women.
There was highly significant evidence (U=5.5, p < 0.001) to suggest a difference in the distributions of ticket price for males and females.
What else would be useful to know when interpreting these results? Medians: women £23 vs men £12
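
To show what "based on ranks" means for this test, here is a minimal sketch of the Mann-Whitney U statistic in plain Python. The prices are made up for illustration and are not the Titanic data; the function names are hypothetical. The sketch computes only the U statistics and the group medians, not a p-value, which statistical software would normally supply.

```python
import statistics

def average_ranks(values):
    """Rank from 1 to n, giving tied values the average of their ranks."""
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]

def mann_whitney_u(x, y):
    """Return (U for x, U for y) from the ranks of the pooled sample."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y

# Hypothetical, positively skewed ticket prices for two groups.
group_a = [8, 10, 12, 13, 90]
group_b = [15, 20, 23, 26, 150]

u_a, u_b = mann_whitney_u(group_a, group_b)
print("U statistics:", u_a, u_b)  # the smaller one is usually reported
print("Medians:", statistics.median(group_a), statistics.median(group_b))
```

Reporting the medians alongside U, as the slide suggests, shows the direction and size of the difference, which the test statistic alone does not.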

27 Investigating relationships
Assessing the relationship between two continuous variables:
- Parametric test (data are normally distributed): Pearson's correlation
- Non-parametric test (ordinal/skewed data): Spearman's correlation
Predicting the value of one variable from the value of a predictor variable, or looking for significant relationships:
- Continuous (scale) dependent variable, any independent variable: simple linear regression; for ordinal/skewed data, transform the data
- Nominal (binary) dependent variable, any independent variable: logistic regression
Assessing the relationship between two categorical variables:
- Categorical dependent and independent variables: Chi-squared test
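
The difference between the two correlation coefficients can be seen in a short sketch: Spearman's correlation is Pearson's correlation applied to the ranks of the data. The code below uses only the standard library; the function names and the data are assumptions made up for this illustration.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank from 1 to n, giving tied values the average of their ranks."""
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]

def spearman(x, y):
    """Spearman's correlation: Pearson's correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up data with a curved but always increasing relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print("Pearson:", round(pearson(x, y), 3))
print("Spearman:", round(spearman(x, y), 3))
```

Because y always increases with x, the ranks agree perfectly and Spearman's coefficient is 1, while Pearson's coefficient is slightly lower because the relationship is not a straight line.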

31 Examples?

32 Example 2: two categorical variables
Survival of the pushiest?

33 Example 2: Survival of the pushiest
Research question: Was survival on the Titanic linked to nationality?
Dependent: Survival
Independent: Nationality
What test do you think you should use? Chi-squared test

34 Example 2: Survival of the pushiest
The data suggest that Americans were more likely to survive: 56% survived, compared to 32% of British passengers and 35% of those from other countries.
Results from the χ2 test suggest that there is evidence of a significant relationship between nationality and survival (p < 0.001).
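
The chi-squared statistic compares the observed counts in each cell of the table with the counts expected if the two variables were unrelated. The sketch below computes it in plain Python. The slides report only percentages, so the counts here are made up for illustration and are not the Titanic data; the function names are hypothetical. The p-value helper is valid only for 2 degrees of freedom (as in a 3 x 2 table), where the chi-squared tail probability has the closed form exp(-x/2).

```python
import math

def chi_squared(table):
    """Return (statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

def p_value_two_df(statistic):
    """Upper tail probability of chi-squared with exactly 2 df."""
    return math.exp(-statistic / 2)

# Hypothetical counts: rows are three groups, columns are (yes, no).
counts = [[30, 20], [25, 25], [20, 30]]
stat, df = chi_squared(counts)
print("chi-squared:", stat, "df:", df)
print("p-value:", round(p_value_two_df(stat), 4))
```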

35 Example 2: Further thoughts
- Class was one of the most important predictors of survival on the Titanic
- 70% of Americans were travelling in 1st class
- A more detailed analysis, using logistic regression, showed that nationality was NOT a significant predictor of survival after controlling for class
In looking at these data, is there any other information that would be useful? The numbers for each nationality

36 Learning outcomes
You should now know about:
- Some useful approaches to analysing data
By the end of this session you should be able to:
- Recognise different data types
- Use a flowchart to decide which analysis method to use
- Undertake some basic analyses and construct appropriate charts for your data

37 Exercises
Attempt the 4 exercises in SPSS
- In each case you need to identify an appropriate analysis based on the dataset provided
- Remember to check the assumptions for any analysis you conduct
- Add value labels to the data if required
- Use the flow charts & table to assist you

38 Download the data In your web browser, type in the following address and save the files to your computer:

39 Maths And Statistics Help
Statistics appointments: Mon-Fri (10am-1pm)
Statistics drop-in: Mon-Fri (10am-1pm), Weds (4-7pm)

40 Resources: All resources are available in paper form at MASH or on the MASH website

41 Contacts
Follow MASH on twitter: @mash_uos
Staff (stats): Jenny Freeman, Basile Marquier, Marta Emmett
Website

