Descriptive Statistics
Summarising, organising and describing raw data
Limitation: cannot tell whether a difference is large enough to be attributed to the IV, or whether it has occurred by chance
Limitation: does not establish a cause-effect relationship between the variables
Includes: percentages, measures of central tendency, spread of scores and graphs
1. Percentages
The proportion of a sample, expressed out of 100
Quick, effective comparison tool
\[ \text{Percentage} = \frac{\text{number of people with feature X}}{\text{size of the sample}} \times 100 \]
Question 1. Of 12 subjects, 2 scored 2 points, 5 scored 4 points, 4 scored 3 points and 1 scored 9 points. Compare the percentage of the sample who scored 4 points with the percentage who scored 2 points.
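A minimal Python sketch of the percentage formula, applied to the Question 1 data so you can check your working. The function name `percentage` is a hypothetical helper, not part of any library.

```python
def percentage(count: int, sample_size: int) -> float:
    """Percentage of the sample with a feature: (count / sample size) x 100."""
    if sample_size <= 0:
        raise ValueError("sample size must be positive")
    return count / sample_size * 100


# Question 1 data: 12 subjects in total
sample_size = 12
scored_4 = 5  # subjects who scored 4 points
scored_2 = 2  # subjects who scored 2 points

print(f"Scored 4 points: {percentage(scored_4, sample_size):.1f}%")
print(f"Scored 2 points: {percentage(scored_2, sample_size):.1f}%")
```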
2. Measures of Central Tendency
Provide a single score that represents all the scores in the sample.
Includes: mean, mode and median
a) Mean
All the scores added together and divided by the number of scores.
Most suitable for interval or ratio data (data measured on a scale; the most precise type of data)
Strength – most sensitive measure, as every score contributes to it
Weakness – can be distorted by extreme (very large or very small) scores (outliers)
b) Mode
The most common score in the set.
Most suitable for nominal data (data in categories, where we count the number of times something occurs)
Strength – easy to obtain; not influenced by extreme scores
Weakness – can be unreliable in small samples; less useful when there is more than one mode (bimodal or trimodal data)
c) Median
The middle score when all the scores are arranged in numerical order. With an even number of scores, the median is the average of the two middle scores.
Most suitable for ordinal data (data that can be ordered / ranked)
Strength – not affected by outliers
Weakness – can be distorted in small samples; less sensitive than the mean
Question 2. Of 12 subjects, 2 scored 2 points, 5 scored 4 points, 4 scored 3 points and 1 scored 9 points. Calculate the mean, median and mode of this sample. Which measure of central tendency would be best to use in this instance? Explain.
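The three measures can be checked with Python's standard `statistics` module. This sketch assumes Python 3.8 or later (needed for `statistics.multimode`) and simply expands the Question 2 frequencies into a list of individual scores; deciding which measure is best is still left to you. The bimodal list at the end is a hypothetical example, not from the slides.

```python
import statistics

# Question 2 data expanded into one score per subject:
# 2 subjects scored 2, 5 scored 4, 4 scored 3 and 1 scored 9
scores = [2] * 2 + [4] * 5 + [3] * 4 + [9] * 1

mean = statistics.mean(scores)      # sum of scores / number of scores
median = statistics.median(scores)  # 12 scores, so the average of the 6th and 7th sorted scores
mode = statistics.mode(scores)      # the most common score

print("Sorted scores:", sorted(scores))
print("Mean:", mean, "Median:", median, "Mode:", mode)

# Hypothetical bimodal set: multimode lists every mode instead of just one
print("Modes of [1, 1, 2, 3, 3]:", statistics.multimode([1, 1, 2, 3, 3]))
```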
3. Spread of Scores
Shows how spread out the data are (their variability)
Includes: range and standard deviation
a) Range
Calculated by subtracting the lowest score from the highest score
Eg: 1, 2, 3, 4, 5, 6, 7, 8, 9: Range = 9 – 1 = 8
Strength – quick and easy to calculate
Weakness – uses only the two most extreme scores, so it is directly affected by outliers and may not represent the majority of the data
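A short sketch of the range calculation using the slide's example. The second list is hypothetical (not from the slides): the same data with one outlier added, to show how a single extreme score changes the range. The helper name `score_range` is also hypothetical.

```python
def score_range(scores: list) -> float:
    """Range = highest score minus lowest score."""
    if not scores:
        raise ValueError("need at least one score")
    return max(scores) - min(scores)


example = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print("Range:", score_range(example))

# Hypothetical: the same data with one extreme score (outlier) added
with_outlier = example + [50]
print("Range with outlier:", score_range(with_outlier))
```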
b) Standard Deviation
Measures how far, on average, scores lie from the mean (it is the square root of the variance, which is the average squared deviation from the mean)
A low SD means the data are grouped closely around the mean; a high SD means the data are spread further from the mean
Strength – most sensitive measure of spread; can be used to relate the sample to the population
Weakness – more time-consuming to calculate
Example of variance and standard deviation
Eg: Calculate the variance and standard deviation of: 1, 5, 10, 15, 19
Mean = 50/5 = 10
\[ \text{var} = \frac{(1-10)^2 + (5-10)^2 + (10-10)^2 + (15-10)^2 + (19-10)^2}{5} = \frac{(-9)^2 + (-5)^2 + 0^2 + 5^2 + 9^2}{5} = \frac{81 + 25 + 0 + 25 + 81}{5} = \frac{212}{5} \]
var = 212/5 = ??
SD = \( \sqrt{\text{var}} \) = ??
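The variance and standard deviation steps can be written out in Python as a way to check your working. This sketch divides by the number of scores (N), matching the slide's working; Python's `statistics.pvariance` and `statistics.pstdev` use the same population formula and appear here only as a cross-check. By contrast, `statistics.variance` and `statistics.stdev` divide by n − 1 (the sample formula) and would give different answers. The helper names are hypothetical.

```python
import math
import statistics


def population_variance(scores: list) -> float:
    """Average squared deviation from the mean (divides by N)."""
    if not scores:
        raise ValueError("need at least one score")
    mean = sum(scores) / len(scores)
    squared_deviations = [(x - mean) ** 2 for x in scores]
    return sum(squared_deviations) / len(scores)


def population_sd(scores: list) -> float:
    """Standard deviation = square root of the variance."""
    return math.sqrt(population_variance(scores))


data = [1, 5, 10, 15, 19]
var = population_variance(data)
sd = population_sd(data)
print(f"variance = {var:.2f}, SD = {sd:.2f}")

# Cross-check against the standard library's population versions
assert math.isclose(var, statistics.pvariance(data))
assert math.isclose(sd, statistics.pstdev(data))
```

The same functions can be used to check Question 4, for example `population_sd([1, 3, 5, 7, 9])`.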
Question 4. Calculate the standard deviation of the following data: 1,3,5,7,9
4. Graphing
Requirements for graphs
Title that summarises both the IV and the DV (i.e. both axes)
Axes labelled with both a title and units
Scale – must begin at the origin (0,0), with breaks (cuts) in the axis shown where required
More than half the graph paper must be used
X-axis is the IV, y-axis is the DV (if applicable)
Points must be plotted as accurately as possible
The correct type of graph for the data must be drawn
a) Frequency Distribution Table
Used for large amounts of data
Uses class intervals / categories
Counts how many times each score (or score within each interval) occurs
Can be graphed as a frequency histogram (intervals on the x-axis, frequency on the y-axis)
Shows data for all categories, even those with zero frequency
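A minimal sketch of building a frequency distribution table with equal-width class intervals. The scores, the interval width of 10 and the helper name `frequency_table` are all hypothetical; the point is that every interval is listed, including those with zero frequency.

```python
from collections import Counter


def frequency_table(scores, lowest, highest, width):
    """Return (start, end, frequency) rows for equal-width class intervals."""
    # Index of the interval each in-range score falls into (0 = first interval)
    counts = Counter((s - lowest) // width for s in scores if lowest <= s <= highest)
    rows = []
    for i, start in enumerate(range(lowest, highest + 1, width)):
        # Counter returns 0 for a missing key, so empty intervals still appear
        rows.append((start, start + width - 1, counts[i]))
    return rows


# Hypothetical test scores (illustration only)
scores = [3, 7, 12, 14, 15, 21, 22, 24, 28, 29, 41, 44]

print("Interval  Frequency")
for start, end, freq in frequency_table(scores, lowest=0, highest=49, width=10):
    print(f"{start:>2}-{end:<2}     {freq}")
```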
b) Bar Graph
Used for discrete data (categories) that are not on a continuum
Bars must not touch
Categories on the x-axis, frequency on the y-axis
c) Pie Chart
Displays categories of data as a proportion of the total
Data must be converted into percentages and then into degrees of a circle (degrees = percentage × 3.6, since 100% = 360°)
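The percentage-to-degrees conversion for a pie chart can be sketched as below. The category names, the counts and the helper name `pie_chart_angles` are hypothetical; since the whole sample (100%) fills the full circle (360°), each percentage is multiplied by 3.6.

```python
def pie_chart_angles(counts):
    """Convert category counts to (percentage, degrees) pairs for a pie chart."""
    total = sum(counts.values())
    result = {}
    for category, count in counts.items():
        percent = count / total * 100  # step 1: convert to a percentage
        degrees = percent * 3.6        # step 2: 100% = 360 degrees
        result[category] = (percent, degrees)
    return result


# Hypothetical category counts (not from the slides)
study_methods = {"Reading notes": 9, "Flashcards": 6, "Practice questions": 15}

for category, (percent, degrees) in pie_chart_angles(study_methods).items():
    print(f"{category}: {percent:.1f}% -> {degrees:.1f} degrees")
```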
d) Line Graph
Shows data as a series of points connected by lines
Used for data that are continuous (on a continuum)
Can show more than one data set on the same set of axes
X-axis: the continuous variable; y-axis: what is being measured
e) Scatter Graph / Scatter Plot
Plots pairs of scores to show their correlational relationship
The overall pattern can be summarised with a line of best fit
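The slides do not say how a line of best fit is calculated; one common method is least squares, sketched below as an assumption. The paired scores (hours of study and test score) and the helper name `line_of_best_fit` are hypothetical. A positive slope indicates a positive correlation and a negative slope a negative correlation.

```python
def line_of_best_fit(xs, ys):
    """Least-squares line: returns (slope, intercept) for y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two pairs of scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much x varies on its own
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x scores must not all be the same")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical paired scores: hours of study (x) and test score out of 100 (y)
hours = [1, 2, 3, 4, 5]
test_scores = [52, 55, 61, 64, 68]

slope, intercept = line_of_best_fit(hours, test_scores)
print(f"Line of best fit: y = {slope:.2f}x + {intercept:.2f}")
```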
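f) Frequency Distribution Curve
Shows continuous psychological variables
Bell-shaped and symmetrical about the mean
The percentages of scores covered by the curve are known: about 68% of scores fall within 1 SD of the mean, about 95% within 2 SD and about 99.7% within 3 SD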
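Python's `statistics.NormalDist` (Python 3.8 or later) can confirm the known percentages under a bell-shaped (normal) curve. The function name `percent_within` is hypothetical, and the mean of 100 and SD of 15 in the last line are arbitrary illustrative values, included to show that the percentages do not depend on the particular mean or SD.

```python
from statistics import NormalDist


def percent_within(k, mean=0.0, sd=1.0):
    """Percentage of a normal distribution lying within k SDs of the mean."""
    curve = NormalDist(mu=mean, sigma=sd)
    return (curve.cdf(mean + k * sd) - curve.cdf(mean - k * sd)) * 100


for k in (1, 2, 3):
    print(f"Within {k} SD of the mean: {percent_within(k):.1f}%")

# Arbitrary mean and SD: the percentage is the same
print(f"Within 1 SD (mean 100, SD 15): {percent_within(1, mean=100, sd=15):.1f}%")
```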
Inferential statistics
Statistical significance: a number (p) obtained from inferential statistics that estimates how likely it is that the experimental results occurred by chance alone, rather than because of the independent variable.
If p is less than 0.05 (p < 0.05):
The difference between the control and experimental groups is statistically significant
The difference is unlikely to be due to chance alone
The difference is likely to be due to the independent variable
The hypothesis is supported / a conclusion can be drawn
If p is more than 0.05 (p > 0.05):
The difference between the control and experimental groups is NOT statistically significant
The difference is likely to be due to chance alone
The difference is unlikely to be due to the independent variable
The hypothesis is rejected / no conclusion can be drawn
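The decision rule for p-values can be written as a tiny function. It follows the slides' wording (p less than 0.05 is significant); the function name `is_significant` and the three p-values are hypothetical, chosen for illustration only.

```python
ALPHA = 0.05  # significance level used on the slides


def is_significant(p: float, alpha: float = ALPHA) -> bool:
    """Apply the slides' rule: a p-value less than 0.05 is statistically significant."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    return p < alpha


# Hypothetical p-values from three imaginary experiments
for p in (0.001, 0.03, 0.20):
    verdict = "significant" if is_significant(p) else "NOT significant"
    print(f"p = {p}: {verdict}")
```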
Revision Questions
1. Put the following into order from lowest to highest probability that the results are due to chance: 0.01, 0.10, 0.05, 0.50, 0.005, 0.02
2. Circle the values in your list that are statistically significant