# Week 1: Review of basic concepts in statistics. Trevor Thompson, 30-9-2007.

## Presentation transcript

1 Week 1: Review of basic concepts in statistics. Handout available at http://homepages.gold.ac.uk/aphome (30-9-2007). Trevor Thompson

2 Review of the following topics:

- Population vs. sample
- Measurement scales
- Plotting data
- Mean & Standard deviation
- Degrees of freedom
- Transforming data
- Normal distribution

See Howell (2002), Statistical Methods for Psychology, Chap. 1-3.

3 Population vs. sample

- Population: an entire collection of measurements (e.g. reaction times, IQ scores, height, or even height of male Goldsmiths students)
- Sample: a smaller subset of observations taken from the population
- A sample should be drawn randomly if we want to make inferences about the population. Random assignment to groups improves validity.

4 Population vs. sample

In general:

- population parameters = Greek letters
- sample statistics = English letters

| | Population | Sample |
|---|---|---|
| mean | μ (mu) | X̄ |
| variance | σ² (sigma squared) | s² |

It is worth learning a glossary of other symbols now to avoid later confusion (e.g. Σ = "the sum of").

5 Measurement scales

- Categorical or Nominal: e.g. male/female, or catholic/protestant/other
- Continuous:
  - Ordinal: e.g. private/sergeant/admiral
  - Interval: e.g. temperature in Celsius
  - Ratio: e.g. weight, height etc.

6 Plotting data

- The basic rule is to select the plot which represents what you want to say in the clearest and simplest way
- Avoid chart junk (e.g. plotting in 3D where 2D would be clearer)
- Popular options include bar charts, histograms, pie charts etc.; see any textbook. SPSS charts are discussed in the workshop

7 Summary statistics

- Two essential components of data are: (i) the central tendency of the data (e.g. mean) and (ii) the spread of the data (e.g. standard deviation)
- Although the mean (central tendency) and standard deviation (spread) are most commonly used, other measures can also be useful

8 Measures of central tendency

- Mode: the most frequent observation. In 1, 2, 2, 3, 4, 5 the mode is 2
- Median: the middle number of a dataset arranged in numerical order. In 0, 1, 2, 5, 1000 the median is 2 (when there is an even number of scores, take the average of the middle two). The median is relatively uninfluenced by outliers
- Mean: the sum of the scores divided by the number of scores, $\bar{X} = \frac{\sum X}{N}$
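The three measures can be checked on the slide's own example data with Python's standard `statistics` module. This is a minimal sketch; the variable names and the four-score list used for the even-length case are illustrative additions, not part of the handout.

```python
from statistics import mean, median, mode

mode_data = [1, 2, 2, 3, 4, 5]       # example used for the mode
median_data = [0, 1, 2, 5, 1000]     # example used for the median

print(mode(mode_data))       # 2, the most frequent observation
print(median(median_data))   # 2, the middle value of the sorted data
print(mean(median_data))     # 201.6, pulled upwards by the outlier 1000

# Even number of scores: the median is the average of the middle two.
even_data = [0, 1, 2, 5]     # illustrative list, not from the handout
print(median(even_data))     # 1.5
```

Comparing the median (2) with the mean (201.6) on the same data shows why the median is described as relatively uninfluenced by outliers.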

9 Measures of dispersion

- There are several ways to measure the spread of data: Range (max-min), IQR or Inter-Quartile Range (middle 50%), Average Deviation, Mean Absolute Deviation
- Variance: the average of the squared deviations
- The variance for a population of 3 scores (-10, 0, 10) is 66.67 (200/3)
- The standard deviation is simply the square root of the variance
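The variance example for the three scores can be worked through step by step. The sketch below computes the population variance by hand and cross-checks it against `statistics.pvariance`; the variable names are illustrative.

```python
from statistics import pvariance

scores = [-10, 0, 10]                      # the whole population

mu = sum(scores) / len(scores)             # population mean = 0
squared_devs = [(x - mu) ** 2 for x in scores]   # 100, 0, 100
variance = sum(squared_devs) / len(scores)       # 200 / 3
sd = variance ** 0.5                       # square root of the variance
data_range = max(scores) - min(scores)     # max - min = 20

print(round(variance, 2))   # 66.67
print(round(sd, 2))         # 8.16
print(data_range)           # 20

# Cross-check against the standard library's population variance.
assert abs(variance - pvariance(scores)) < 1e-9
```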

10 Calculating sample variance

Population variance (σ²) is the true variance of the population, calculated by

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

This equation is used when we have all values in a population (unusual).

However, the variance of a sample calculated in the same way tends to be smaller than the variance of the population from which it was drawn. So, for the sample variance (s²) we use this equation:

$$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$$

Dividing by N-1 rather than N increases the variance to be closer to the true population variance (in fact, the average of all possible sample variances calculated this way exactly equals σ²).
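The claim about the average of all possible sample variances can be checked by brute force on a tiny population. The sketch below assumes samples of size 2 drawn with replacement from the three-score population (-10, 0, 10). The sample size, the helper `variance`, and the choice of population are illustrative assumptions, not part of the handout.

```python
from itertools import product
from statistics import mean

population = [-10, 0, 10]
mu = mean(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)  # 200/3

def variance(sample, divisor):
    """Sum of squared deviations from the sample mean, over `divisor`."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / divisor

n = 2
# Every possible ordered sample of size n, drawn with replacement (9 samples).
samples = list(product(population, repeat=n))

avg_div_n = mean(variance(s, n) for s in samples)              # divide by N
avg_div_n_minus_1 = mean(variance(s, n - 1) for s in samples)  # divide by N-1

print(round(sigma2, 2))             # 66.67
print(round(avg_div_n, 2))          # 33.33, too small on average
print(round(avg_div_n_minus_1, 2))  # 66.67, matches the population variance
```

Dividing by N underestimates the population variance on average, while dividing by N-1 recovers it exactly in this enumeration.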

11 Degrees of freedom

- Why is N-1 used to calculate sample variance?
- When calculating sample variance, we first calculate the sample mean, which makes the last number in the dataset redundant, i.e. we lose a degree of freedom (the last number is not free to vary)
- e.g. M=10, sample data: 12, 9, 10, 11, 8. Calculating the sample mean (10) means that we have already (implicitly) included the last number in our calculations. If we knew and used the population mean rather than the sample mean, this would not be the case, so we could use N rather than N-1
- Howell illustrates this with a worked example (and a mathematical proof can be found with an internet search)
- The bottom line is that whenever we have to estimate a statistic (e.g. the mean) we lose a degree of freedom
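The redundancy of the last score can be made concrete with the example data (12, 9, 10, 11, 8; mean 10). In the sketch below, once the mean and the first four scores are known, the fifth score is fully determined. Variable names are illustrative.

```python
sample = [12, 9, 10, 11, 8]
n = len(sample)
m = sum(sample) / n                      # sample mean = 10.0

# Deviations from the sample mean always sum to zero.
deviations = [x - m for x in sample]     # 2, -1, 0, 1, -2
print(sum(deviations))                   # 0.0

# Given the mean and the first n-1 scores, the last score is not free to vary.
last_score = n * m - sum(sample[:-1])    # 50 - 42
print(last_score)                        # 8.0

# Only n-1 deviations carry independent information, hence N-1 in the formula.
sample_variance = sum(d ** 2 for d in deviations) / (n - 1)
print(sample_variance)                   # 2.5
```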

12 Transforming data

- One reason we might transform data is to convert from one scale to another, e.g. feet into inches, centigrade into Fahrenheit, raw IQ scores into standard IQ scores
- Scale conversion can usually be achieved by a simple linear transformation (multiplying/dividing by a constant and adding/subtracting a constant): $X_{new} = b \cdot X_{old} + c$
- So to convert centigrade data into Fahrenheit we would apply the following: $F = 1.8 \cdot C + 32$
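A linear transformation of the form X_new = b*X_old + c is easy to sketch in code. The example below uses the standard centigrade-to-Fahrenheit constants (b = 1.8, c = 32). The helper `linear_transform` and the three temperatures are illustrative, not taken from the handout. It also shows the well-known effect of such a transform on the summary statistics: the mean becomes b*mean + c and the standard deviation becomes |b|*SD.

```python
from statistics import mean, stdev

def linear_transform(xs, b, c):
    """Apply X_new = b * X_old + c to every value."""
    return [b * x + c for x in xs]

celsius = [0, 37, 100]                   # illustrative temperatures
fahrenheit = linear_transform(celsius, b=1.8, c=32)
print([round(f, 1) for f in fahrenheit])  # [32.0, 98.6, 212.0]

# Effect on summary statistics: the mean is shifted and scaled,
# the standard deviation is only scaled.
print(round(mean(fahrenheit), 4), round(1.8 * mean(celsius) + 32, 4))
print(round(stdev(fahrenheit), 4), round(1.8 * stdev(celsius), 4))
```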

13 Transforming data

- The Z-transform (standardisation) is one common type of linear transform, which produces a new variable with M=0 & SD=1: $z = \frac{X - M}{SD}$
- Standardisation is useful when comparing the same dimension measured on different scales (e.g. anxiety scores measured on a VAS and on a questionnaire)
- After standardisation these scales could also be added together (adding two quantities on different scales is obviously problematic)
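Standardisation subtracts the mean from each score and divides by the standard deviation, z = (X - M) / SD. The sketch below applies this to two sets of anxiety scores and then adds the standardised scores into a composite. Both datasets are hypothetical, and using the sample standard deviation (N-1) is an assumption, since the slide does not say which SD is meant.

```python
from statistics import mean, stdev

def z_scores(xs):
    """Standardise scores: z = (X - M) / SD, using the sample SD (N-1)."""
    m = mean(xs)
    sd = stdev(xs)
    return [(x - m) / sd for x in xs]

# Hypothetical anxiety scores for the same five people on two scales.
vas = [20, 35, 50, 65, 80]               # visual analogue scale, 0-100
questionnaire = [8, 14, 10, 18, 20]      # questionnaire total

z_vas = z_scores(vas)
z_quest = z_scores(questionnaire)

# After standardisation each variable has mean 0 and SD 1 ...
print(round(mean(z_vas), 6), round(stdev(z_vas), 6))
print(round(mean(z_quest), 6), round(stdev(z_quest), 6))

# ... so the two measures can be added into a composite score.
composite = [a + b for a, b in zip(z_vas, z_quest)]
print([round(c, 2) for c in composite])
```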

14 Normal Distribution

- Many real-life variables (height, weight, IQ etc.) follow a bell-shaped distribution
- A mathematical equation mimics this normal (or Gaussian) distribution
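For reference, the equation usually meant here is the normal probability density function, written below in the notation used earlier in these slides (μ for the population mean, σ for the population standard deviation). The transcript does not preserve the slide's own formula, so this is the standard textbook form rather than a reproduction of it.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```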

15 Normal Distribution

- The mathematical normal distribution is useful because its known mathematical properties give us useful information about our real-life variable (assuming our real-life variable is normally distributed)
- For example, scores more than 2 standard deviations above the mean represent roughly the extreme 2.5% of scores (more precisely about 2.3%; the exact 2.5% cutoff is 1.96 standard deviations). Calculus is used to derive this
- Consequently, a person with an IQ score of 130 (M=100, SD=15) would be in roughly the top 2.5% (assuming IQ is normally distributed)
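The tail area for the IQ example can be computed with `statistics.NormalDist` from the Python standard library (Python 3.8+). This sketch assumes, as the slide does, that IQ is normally distributed with M=100 and SD=15. It shows that the area beyond 2 standard deviations is about 2.3%, and that the cutoff for exactly the top 2.5% lies at about 1.96 standard deviations.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

z = (130 - 100) / 15                     # 130 is 2 SDs above the mean
p_above_130 = 1 - iq.cdf(130)            # proportion scoring above 130
print(z)                                 # 2.0
print(round(p_above_130, 4))             # 0.0228

# Score that cuts off exactly the top 2.5% of the distribution.
cutoff = iq.inv_cdf(0.975)
print(round(cutoff, 1))                  # 129.4
print(round((cutoff - 100) / 15, 2))     # 1.96
```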

16 Normal Distribution

Normality is an important assumption (more about this next week). Violations of normality generally take two forms:

- Skewness
- Kurtosis
