3 Standardization
Standardization is used to improve the interpretability of variables.
Some variables have a naturally interpretable metric, e.g. income, age, gender, country.
Others, primarily ordinal variables, do not, e.g. education, attitude items, intelligence.
Standardizing these variables makes them more interpretable.
4 Standardization
Transforming the variable to a comparable metric:
–known unit
–known mean
–known standard deviation
–known range
Three ways of standardizing:
–P-standardization (percentile scores)
–Z-standardization (z-scores)
–D-standardization (dichotomizing a variable)
5 When you should always standardize
When averaging multiple variables, e.g. when creating a socioeconomic status variable out of income and education.
When comparing the effects of variables with unequal units, e.g. does age or education have a larger effect on income?
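The averaging case can be sketched as follows. This is a minimal illustration, assuming z-scores (introduced later in these slides) as the common metric; the income and education values, the helper `z_scores`, and the use of the population standard deviation are all assumptions made for the example, not taken from the source data.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Center on the mean and divide by the (population) standard deviation."""
    m, sd = mean(values), pstdev(values)
    return [(x - m) / sd for x in values]

# Hypothetical respondents: monthly income in guilders and an ordinal education code.
income = [1500, 2500, 4000, 3000]
education = [2, 3, 7, 4]

# Averaging the raw values would let income dominate, because its unit is much larger.
# Averaging the z-scores gives both variables equal weight.
ses = [(zi + ze) / 2 for zi, ze in zip(z_scores(income), z_scores(education))]

for person, score in enumerate(ses):
    print(person, round(score, 2))
```

The resulting status score again has mean zero, but its standard deviation is generally not 1, so it can be standardized once more if needed.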
6 P-standardization
Every observation is assigned a number between 0 and 100, indicating the percentage of observations below it.
Can be read from the cumulative distribution.
In case of ties: assign midpoints.
The median, quartiles, quintiles, and deciles are special cases of P-scores.
8 P-standardization
Turns the variable into a ranking, i.e. it turns the variable into an ordinal variable.
It is a non-linear transformation: relative distances change.
Results in a fixed mean, range, and standard deviation: M=50, SD≈28.9 (100/√12, the SD of a uniform distribution on 0 to 100).
This can change slightly due to ties.
A histogram of a P-standardized variable approximates a uniform distribution.
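A minimal sketch of P-standardization with midpoints for ties. The function name `p_scores` and the education codes are hypothetical; the midpoint rule is implemented here as "percentage below, plus half the percentage tied", which is one common reading of "assign midpoints".

```python
from statistics import mean, pstdev

def p_scores(values):
    """Percentile score per observation, using midpoints for tied values."""
    n = len(values)
    scores = []
    for v in values:
        below = sum(1 for other in values if other < v)
        equal = sum(1 for other in values if other == v)  # includes v itself
        scores.append(100 * (below + 0.5 * equal) / n)
    return scores

# Hypothetical ordinal education codes with ties.
education = [1, 2, 2, 3, 3, 3, 4, 5]
p = p_scores(education)
print(p)
print(mean(p), pstdev(p))

# A tie-free variable: the SD comes out close to 100 / sqrt(12).
no_ties = p_scores(list(range(1000)))
print(mean(no_ties), pstdev(no_ties))
```

With this midpoint rule the mean stays at 50 even with ties; it is the standard deviation that shrinks as ties become more frequent.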
9 Linear transformation
Say you want income in thousands of guilders instead of guilders.
You divide INCMID by 1000.
               M          SD
Incmid         ƒ 2543,-   ƒ 1481,-
Incmid/1000    kƒ 2,543   kƒ 1,481
10 Linear transformation
Say you want to know the deviation from the mean.
Subtract the mean (ƒ 2543,-) from INCMID.
               M          SD
Incmid         ƒ 2543,-   ƒ 1481,-
Incmid-M       ƒ 0,-      ƒ 1481,-
11 Recap: multiplication and addition and the number line
12 Linear transformation
Adding a constant (X’ = X+c):
–M(X’) = M(X)+c
–SD(X’) = SD(X)
Multiplying by a constant (X’ = X*c):
–M(X’) = M(X)*c
–SD(X’) = SD(X) * |c|
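The two rules can be checked numerically. The sketch below uses made-up incomes and a hypothetical helper `describe`; a negative multiplier is chosen on purpose to show why the SD rule needs |c|.

```python
from math import isclose
from statistics import mean, pstdev

# Hypothetical monthly incomes in guilders (illustration only).
incomes = [1000, 2000, 2500, 4000]

def describe(values):
    """Return (mean, standard deviation), using the population SD."""
    return mean(values), pstdev(values)

m, sd = describe(incomes)

# Adding a constant shifts the mean and leaves the SD unchanged.
c_add = 500
m_add, sd_add = describe([x + c_add for x in incomes])

# Multiplying by a constant scales the mean by c and the SD by |c|.
c_mul = -0.001
m_mul, sd_mul = describe([x * c_mul for x in incomes])

print(isclose(m_add, m + c_add), isclose(sd_add, sd))
print(isclose(m_mul, m * c_mul), isclose(sd_mul, sd * abs(c_mul)))
```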
13 Z-standardization
Z = (X-M)/SD
Two steps:
–center the variable (the mean becomes zero)
–divide by the standard deviation (the unit becomes the standard deviation)
Results in a fixed mean and standard deviation: M=0, SD=1.
Not in a fixed range!
Z-standardization is a linear transformation: relative distances remain intact.
14 Z-standardization
Step 1: subtract the mean, so c = -M(X)
M(X’) = M(X)+c
M(X’) = M(X)-M(X) = 0
SD(X’) = SD(X)
15 Z-standardization
Step 2: divide by the standard deviation, so c = 1/SD(X)
M(Z) = M(X’) * c
M(Z) = 0 * 1/SD(X) = 0
SD(Z) = SD(X’) * c
SD(Z) = SD(X) * 1/SD(X) = 1
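The two steps can be sketched in code as follows. The incomes are hypothetical, and the population standard deviation (`pstdev`) is an assumption; the slides do not say whether the sample or population formula is meant.

```python
from math import isclose
from statistics import mean, pstdev

# Hypothetical monthly incomes in guilders (illustration only).
x = [1000, 2000, 2500, 4000]

# Step 1: center the variable, i.e. add c = -M(X).
m = mean(x)
centered = [value - m for value in x]

# Step 2: divide by the standard deviation, i.e. multiply by c = 1/SD(X).
sd = pstdev(centered)  # same as pstdev(x): centering does not change the SD
z = [value / sd for value in centered]

print(mean(centered), pstdev(centered))
print(mean(z), pstdev(z))

# Relative distances stay intact: the ratio of two gaps is the same before and after.
ratio_x = (x[3] - x[2]) / (x[1] - x[0])
ratio_z = (z[3] - z[2]) / (z[1] - z[0])
print(isclose(ratio_x, ratio_z))
```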
16 Normal distribution
Normal distribution = Gauss curve = bell curve.
Formula (McCall p. 120):
–note the (x-μ)² part
–apart from that, all you have to remember is that the formula is complicated
The normal distribution occurs when a large number of small random events cause the outcome, e.g. measurement error.
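For reference, the standard textbook form of the normal density is given below. The notation μ and σ for the mean and standard deviation (M and SD elsewhere in these slides) is an assumption and may differ from the notation in McCall.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)
```

The squared deviation from the mean, divided by the variance, is what makes the curve symmetric around μ and lets it fall off quickly in both tails.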
17 Normal distribution
Other examples: the height of individuals, intelligence, attitudes.
But: the variables education, income, and age in Eenzaam98 are not normally distributed.
18 Z-scores and the normal distribution
Z-standardization will not result in a normally distributed variable.
Standardization is NOT the same as normalization.
We will not discuss normalization (but it does exist).
But: if the original variable is normally distributed, then the z-standardized variable will have a standard normal distribution.
19 Standard normal distribution
Normal distribution with M=0 and SD=1.
Table A in Appendix 2 of McCall.
Important numbers (to be remembered):
–68% of the observations lie within ± 1 SD
–90% of the observations lie within ± 1.64 SD
–95% of the observations lie within ± 1.96 SD
–99% of the observations lie within ± 2.58 SD
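These four numbers can be reproduced without a table. The sketch below uses `NormalDist` from Python's standard library (Python 3.8 or later); the helper `share_within` is a hypothetical name introduced for the example.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a standard normal distribution between -k and +k SD."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 1.64, 1.96, 2.58):
    print(k, round(100 * share_within(k), 1))
```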
20 Why bother?
If you know:
–that a variable is normally distributed
–the mean and standard deviation
then you know the percentage of observations above or below an observation.
These numbers are a good approximation, even if the variable is not exactly normally distributed.
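As a worked illustration, assume a test score that is normally distributed with M=100 and SD=15. These values are hypothetical and chosen only to make the arithmetic easy; they do not come from the course data.

```python
from statistics import NormalDist

# Hypothetical normally distributed variable: M = 100, SD = 15.
scores = NormalDist(mu=100, sigma=15)

observation = 130
z = (observation - scores.mean) / scores.stdev  # distance from the mean in SD units

pct_below = 100 * scores.cdf(observation)
pct_above = 100 - pct_below

print(z, round(pct_below, 1), round(pct_above, 1))
```

Knowing only the mean and standard deviation is enough here: an observation two standard deviations above the mean has roughly 98% of the observations below it.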
21 P & Z-standardization
Both give a distribution with a fixed mean, standard deviation, and unit.
P-standardization also gives a fixed range.
Both are relative to the sample: if you take observations out, then you have to recompute the standardized variables.
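The dependence on the sample can be seen in a small sketch. The incomes and the helper `z_scores` are hypothetical, and the population standard deviation is an assumption; the same person gets a different z-score once one extreme observation is removed.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Z-standardize relative to the values that are passed in."""
    m, sd = mean(values), pstdev(values)
    return [(x - m) / sd for x in values]

# Hypothetical incomes; the last respondent is an extreme case.
full_sample = [1000, 2000, 2500, 4000, 10000]
trimmed_sample = full_sample[:-1]  # same respondents without the extreme case

z_full = z_scores(full_sample)
z_trimmed = z_scores(trimmed_sample)

# The respondent earning 2500 is below the mean in the full sample
# but above the mean once the extreme case is removed.
print(round(z_full[2], 2), round(z_trimmed[2], 2))
```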
22 P & Z-standardization
When interpreting Z-standardized variables one uses percentiles.
With P-standardization one reduces the level of measurement to ordinal, BUT this improves interpretability.