Questions to consider
– What is my question / interest?
– Can the sample tell me about it?
– What are the relevant variables, and what are their characteristics? Are they in the form I want? (If not, change them into that form.)
– What is the most appropriate method to use?
– If it works, what will / might it show me? NB negative results are not necessarily uninformative.
– If it doesn’t work, why might that be?
Levels of measurement
– Nominal, e.g., colours: the numbers are only labels and are not meaningful as quantities
– Ordinal, e.g., the order in which you finished a race: the numbers give the ranking but don’t indicate how far ahead the winner of the race was
– Interval, e.g., temperature (in °C): equal intervals between each number on the scale, but no absolute zero
– Ratio, e.g., time: equal intervals between each number, with an absolute zero
Univariate analysis
Measures of central tendency
– Mean = ∑x / N (the sum of all scores divided by the number of scores)
– Median = midpoint of the distribution
– Mode = most common value
Mode – the value or category that has the highest frequency (count)

age      frequency (count)
16-25    12
26-35    20
36-45    32
46-55    27
56+      18

sex      frequency (count)
male     435
female   654
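Median – the value that is halfway through the ordered distribution (50th percentile)
Odd number of cases: age 12 14 18 21 36 41 42, median = 21 (the middle value)
Even number of cases: age 12 14 18 21 36 41, median = (18+21)/2 = 19.5 (the mean of the two middle values)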
Mean – the sum of all scores divided by the number of scores What most people call the average Mean: ∑x / N
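The three measures of central tendency can be checked with a short Python sketch. It reuses the age frequency table from the mode slide and the two age lists from the median slide; the variable names are illustrative only, and Python's standard `statistics` module stands in for the SPSS output.

```python
from statistics import median

# Frequency table from the mode slide (age group -> count).
age_counts = {"16-25": 12, "26-35": 20, "36-45": 32, "46-55": 27, "56+": 18}

# Mode: the category with the highest frequency.
modal_age_group = max(age_counts, key=age_counts.get)

# Age lists from the median slide.
ages_odd = [12, 14, 18, 21, 36, 41, 42]   # 7 cases: the middle value is the median
ages_even = [12, 14, 18, 21, 36, 41]      # 6 cases: average the two middle values

median_odd = median(ages_odd)
median_even = median(ages_even)

# Mean: sum of all scores divided by the number of scores.
mean_odd = sum(ages_odd) / len(ages_odd)

print("modal age group:", modal_age_group)
print("median (7 cases):", median_odd)
print("median (6 cases):", median_even)
print("mean (7 cases):", round(mean_odd, 2))
```

The mean of the seven ages (about 26.3) sits above the median (21) because the three oldest cases pull it upwards, which is one reason to report both.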
Which One To Use?

            Mode   Median   Mean
Nominal     yes    no       no
Ordinal     yes    yes      no
Interval    yes    yes      yes
Measures of dispersion
– Range = highest value - lowest value
– Variance, s² = ∑(x - mean)² / (N - 1)
– Standard deviation, s (or SD) = √(s²)
The standard error of the mean and confidence intervals
– SE = s / √N
Definitions: Measures of Dispersion
Variance: summarises how far the scores lie from the mean. Positive and negative differences from the mean would cancel each other out, so we square each difference and add the squares together (the Sum of Squares). This gives the total error within the sample, but the larger the sample the larger this total, so we divide by N-1 to get the average error.
Definitions: Measures of Dispersion
Standard deviation: because we squared the error of each score before summing, the variance actually tells us the average error² (it is in squared units). To get the SD we take the square root of the variance, which returns the measure to the original units. The SD is a measure of how representative the mean is: the smaller the SD, the more representative the mean is of your sample.
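The variance and standard deviation definitions can be followed step by step in a minimal Python sketch. The scores are the seven ages from the median slide, reused here purely for illustration, and the hand calculation is compared with the standard library's `statistics.variance` and `statistics.stdev`, which also divide by N-1.

```python
import math
from statistics import stdev, variance

scores = [12, 14, 18, 21, 36, 41, 42]  # ages reused from the median slide
n = len(scores)

data_range = max(scores) - min(scores)  # highest value - lowest value

mean_score = sum(scores) / n
deviations = [x - mean_score for x in scores]  # + and - differences from the mean

# The raw deviations cancel out, which is why they are squared before summing.
sum_of_deviations = sum(deviations)
sum_of_squares = sum(d ** 2 for d in deviations)

sample_variance = sum_of_squares / (n - 1)  # average squared error
sample_sd = math.sqrt(sample_variance)      # back in the original units (years)

print("range:", data_range)
print("sum of deviations:", round(sum_of_deviations, 10))
print("variance:", round(sample_variance, 2))
print("SD:", round(sample_sd, 2))
print("matches statistics module:",
      math.isclose(sample_variance, variance(scores)),
      math.isclose(sample_sd, stdev(scores)))
```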
Definitions: Measures of Dispersion
Standard error: the standard error is the standard deviation of sample means. If you took a lot of separate samples from the same population and worked out their means, the standard deviation of those means would show how much the means vary from sample to sample. The smaller the standard error, the more likely your sample mean is to be close to the population mean.
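The idea of "the standard deviation of sample means" can be illustrated with a small simulation. This is a sketch under stated assumptions: the population (normal, mean 50, SD 10), the sample size, and the number of samples are all hypothetical values chosen for the demonstration, and the comparison value σ / √N is the usual formula for the standard error of the mean.

```python
import math
import random
from statistics import stdev

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population and sampling plan (not from the lecture data).
POP_MEAN, POP_SD = 50.0, 10.0
SAMPLE_SIZE = 25
N_SAMPLES = 5000

# Draw many separate samples and record the mean of each one.
sample_means = []
for _ in range(N_SAMPLES):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    sample_means.append(sum(sample) / SAMPLE_SIZE)

# The standard deviation of those means is the (empirical) standard error.
empirical_se = stdev(sample_means)

# The formula gives the same quantity without repeated sampling.
theoretical_se = POP_SD / math.sqrt(SAMPLE_SIZE)

print("SD of the sample means:", round(empirical_se, 3))
print("sigma / sqrt(N):", theoretical_se)
```

In practice only one sample is available, so the population SD is replaced by the sample SD, giving SE = s / √N.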
Definitions: Measures of Dispersion
Confidence intervals: a 95% confidence interval means that if we collected 100 samples and calculated the mean and confidence interval for each one, we would expect about 95 of those confidence intervals to contain the population mean.
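The repeated-sampling reading of a confidence interval can also be simulated. The sketch below assumes a hypothetical normal population (mean 40, SD 12) and builds each interval as mean ± 1.96 × SE, the normal approximation; the helper name `mean_ci` is made up for this example. With small samples a t-based multiplier would be more appropriate than 1.96.

```python
import math
import random
from statistics import mean, stdev

Z_95 = 1.96  # normal-approximation multiplier for a 95% interval


def mean_ci(sample, z=Z_95):
    """Return (lower, upper) limits for the mean, using SE = s / sqrt(n)."""
    se = stdev(sample) / math.sqrt(len(sample))
    centre = mean(sample)
    return centre - z * se, centre + z * se


# Hypothetical population and sampling plan, chosen only for illustration.
POP_MEAN, POP_SD = 40.0, 12.0
SAMPLE_SIZE, N_SAMPLES = 50, 1000

random.seed(1)  # fixed seed so the run is repeatable
hits = 0
for _ in range(N_SAMPLES):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    lower, upper = mean_ci(sample)
    if lower <= POP_MEAN <= upper:
        hits += 1  # this interval contains the population mean

coverage = hits / N_SAMPLES
print(f"{coverage:.1%} of the intervals contain the population mean")
```

The proportion should land close to 95%, though not exactly on it, since any finite run of samples varies.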
Describing data
Numbers / tables
– Analyze – Descriptive Statistics – Frequencies / Descriptives
Charts / graphs
– Graphs – Pie / Histogram / Bar
– Using Excel for charts
Bivariate relationships
Asking research questions involving two variables:
– Categorical and interval
– Interval and interval
– Categorical and categorical
Describing relationships
Testing relationships
Interval and interval
Correlation
– To be covered next week with OLS
Categorical (dichotomous) and interval
T-tests
– Analyze – Compare Means – Independent-Samples T Test – check for equality of variances
– t value = observed difference between the means for the two groups, divided by the standard error of the difference
– Significance of the t statistic; upper and lower confidence limits based on the standard error
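E.g. (with stats sceli.sav)
Average age in sample = 37.34
Average age of single = 31.55
Average age of partnered = 39.45
Difference between the means (single - partnered) = 31.55 - 39.45 = -7.9
t = -7.9/.74 = -10.68
Upper bound = -7.9+(1.96*.74) = -6.45
Lower bound = -7.9-(1.96*.74) = -9.35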
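The arithmetic in this example can be reproduced directly from the summary figures on the slide. The sketch below uses only the two group means and the standard error of the difference (.74) quoted above; the variable names are illustrative, and the difference is taken as single minus partnered, which is the sign used in the bounds.

```python
# Summary figures quoted on the slide.
mean_single = 31.55
mean_partnered = 39.45
se_difference = 0.74
Z_95 = 1.96  # multiplier used on the slide for the 95% bounds

# Observed difference between the group means (single - partnered).
difference = mean_single - mean_partnered

# t = observed difference divided by the standard error of the difference.
t_value = difference / se_difference

# 95% confidence limits for the difference.
ci_lower = difference - Z_95 * se_difference
ci_upper = difference + Z_95 * se_difference

print(f"difference = {difference:.2f}")            # -7.90
print(f"t = {t_value:.2f}")                        # -10.68
print(f"95% CI = ({ci_lower:.2f}, {ci_upper:.2f})")  # (-9.35, -6.45)
```

Because the whole interval lies below zero, the data are consistent with single respondents being younger on average than partnered respondents.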
Categorical and Categorical
Chi Square Test
– Tabulation of two variables
– How does the observed variation compare with what would be expected if the distributions were the same across groups?
– How large is that observed variation relative to the number of cells across which variation could occur? (the chi-square statistic)
– What is its significance? (the chi-square distribution and degrees of freedom)
E.g. Are the proportions within employment status similar across the sexes? Could also think about it the other way round: are the proportions of each sex similar across employment statuses?
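A chi-square test on a sex by employment status table can be sketched in a few lines of Python. The counts below are invented for illustration and are not taken from sceli.sav; the function name `chi_square` is also made up for this example. Expected counts are those implied by equal distributions across the rows (row total × column total / grand total), and the statistic is compared with 5.991, the 5% critical value of the chi-square distribution with 2 degrees of freedom.

```python
def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the row distributions were equal.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts, invented for illustration.
# Rows: male, female. Columns: employed, unemployed, not in the labour force.
observed_counts = [
    [300, 60, 75],
    [380, 70, 204],
]

statistic, df = chi_square(observed_counts)
CRITICAL_5PCT_DF2 = 5.991  # chi-square critical value for df = 2 at the 5% level

print(f"chi-square = {statistic:.2f}, df = {df}")
print("significant at 5%" if statistic > CRITICAL_5PCT_DF2 else "not significant at 5%")
```

Swapping rows and columns gives the same statistic, which is why the question can be asked either way round.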