Assignment Surgery. In F35 Edward Wright: Today (Thu) 10-12; Tomorrow (Friday) 2-4; Monday 9-12, 2-5
Quantitative methods, lecture 7 Scale levels Brief discussion on more advanced statistical techniques. Correlation, regression, factor analysis
Scale levels Variables can be of different scale levels. The scale level of a variable determines how advanced the statistical techniques are that we can use to analyse it. There are four different scale levels: 1. Nominal scale 2. Ordinal scale 3. Interval scale 4. Ratio scale
Nominal scale variables… …are not numerical and can therefore not be quantified. Examples: gender, nationality. Possible mathematical operations: = We cannot rank or quantify nationality, since it is not meaningful to give one nationality a higher value than another. In a dataset, each nationality has a value on the variable, but these values mean nothing; they are only there to separate the different nationalities from each other. The statistics software can do all sorts of things with nominal scale variables, such as calculating means and standard deviations, but the results are meaningless. Nominal scale variables can normally not be used for more than cross tabulations.
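As a minimal sketch of the cross tabulation mentioned above, the Python snippet below counts how often each combination of two nominal variables occurs. The respondents and their values are invented for illustration only.

```python
from collections import Counter

# Hypothetical respondents as (nationality, gender) pairs. Invented data.
respondents = [
    ("Finnish", "female"),
    ("Finnish", "male"),
    ("Australian", "female"),
    ("Australian", "female"),
    ("Finnish", "female"),
]

# A cross tabulation only counts category combinations. It never ranks
# the categories or does arithmetic on them, so it is safe for nominal data.
crosstab = Counter(respondents)

for (nationality, gender), count in sorted(crosstab.items()):
    print(f"{nationality:<12}{gender:<8}{count}")
```

Counting is the only operation used here, which matches the single permitted operation (=) for nominal scale variables.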
Ordinal scale variables… …can be ranked. We know that one value is higher or lower than another, but not by how much. Possible mathematical operations: = > < But we do not know the distance between the values! How much higher than A levels or Highers is a university degree? We do not know. Another example is a question with five response alternatives: agree entirely, agree on the whole, neither agree nor disagree, disagree on the whole and disagree entirely. They are not numerical… …so we do not know that two different persons who have answered 'agree on the whole' are located at the same place vis-à-vis the neutral standpoint. Sometimes the researcher assumes that there is an equal distance between the different values. But such assumptions are always questionable.
Interval scale variables… …allow us not only to rank the values of a variable, but also to determine the distance between them. Possible mathematical operations: = > < + - Interval scale variables have no natural zero point. There may be a conventionally agreed zero point, but no absolute one. Interval scale variables are rare in social science. Example: temperature. The Celsius (centigrade) and Fahrenheit scales have no natural zero point, only zero points set by agreed convention (the Kelvin scale does have a natural zero point). Example: time. In the Western world, the birth of Jesus Christ is used as a conventional zero point, but we can hardly set an absolute zero point of time.
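The point about conventional zero points can be illustrated with a little arithmetic. In this sketch (the two temperatures are arbitrary example values), differences between Celsius readings are meaningful, but the ratio between them is not: it changes as soon as the same temperatures are expressed on the Kelvin scale, which has a natural zero.

```python
# Two temperatures in degrees Celsius (arbitrary example values).
c_low, c_high = 10.0, 20.0

# Differences are meaningful on an interval scale.
difference = c_high - c_low

# The ratio of the Celsius readings suggests "twice as warm", but this
# is an artefact of where the conventional zero point happens to be.
naive_ratio = c_high / c_low

# The same temperatures on the Kelvin scale, which has a natural zero.
k_low, k_high = c_low + 273.15, c_high + 273.15
kelvin_ratio = k_high / k_low

print(difference, naive_ratio, round(kelvin_ratio, 3))
```

The Celsius ratio is 2, whereas the Kelvin ratio is only slightly above 1, so 20 degrees Celsius is not "twice as warm" as 10 degrees Celsius.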
Ratio scale variables… …do have a natural zero point. Possible mathematical operations: = > < + - * / Example: age. The natural zero point is the time when the person was born. Therefore we can measure the ratio between the values of the variable in question. It is, for example, meaningful to state that one person is twice as old as another.
Scale levels It is advantageous to have as high a scale level as possible. This can be influenced through the design of the questionnaire. It is advantageous to make the responses to questions numerical, and to avoid open-ended questions. One way of doing this is to let respondents answer in terms of scales, for example like/dislike scales, where it is then assumed that you like something twice as much if you answer 10 (Really like) as if you answer 5 (Neither like nor dislike). You can also set the scale so that zero is the mid point, surrounded by an equal number of positive and negative numbers.
Scale levels Scales (0 to 10, -5 to +5 etc.) are based on the assumption that they reflect people's opinions, and that people interpret the scale in exactly the same way. That is, it is assumed to mean exactly the same thing when persons A and B both answer +2 on a like/dislike scale to a question on how much they like the prime minister. With ordinal scale variables this is even more questionable. An example is course evaluation forms, which have a five-step agree/disagree scale. Can it be assumed that there is an equal distance between the different response choices? And if the alternatives had been numbered +2, +1, 0, -1 and -2 instead, would that have meant that the 'interval scale assumption' is reasonable? This is of course a philosophical problem, and many would argue that any such quantification of what really are individual qualitative answers is misconceived. As said in the first lecture, a lot of what we are doing in this module is based on assumptions, and these assumptions are by no means unquestionable.
Scale levels Variables at interval scale or higher can be used in two main statistical techniques: correlation and regression. Nominal and ordinal scale variables can also be transformed into ratio level variables… …by creating dummy variables. A dummy variable has only two values, 1 and 0. For example, nationality can be transformed into a great number of dummy variables: is Finnish (1), is not Finnish (0); is Australian (1), is not Australian (0); and so on. Virtually any variable can thus be transformed into ratio level variables. However, the variable is then no longer nationality. It is instead a long series of separate variables: 'Finnish', 'Australian' and so on.
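The dummy variable transformation described above can be sketched in a few lines of Python. The list of nationalities is invented, and the variable names are chosen here purely for illustration.

```python
# Hypothetical nominal variable: one nationality per respondent.
nationalities = ["Finnish", "Australian", "Finnish", "Swedish"]

# One dummy variable per category: 1 if the respondent belongs to
# that category, 0 otherwise.
categories = sorted(set(nationalities))
dummies = {
    category: [1 if value == category else 0 for value in nationalities]
    for category in categories
}

for category, values in dummies.items():
    print(f"{category:<12}{values}")
```

Note that the single variable 'nationality' has become three separate variables, and that each respondent scores 1 on exactly one of them.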
Correlation Correlation analysis is a technique by which we can analyse the extent to which two variables are related to each other. A correlation coefficient is a measure of the extent to which the value of variable X coincides with the value of variable Y. A correlation can be positive (high values on variable X tend to coincide with high values on variable Y)… …or negative (high values on variable X tend to coincide with low values on variable Y). There are several types of correlation techniques. One of the most commonly used is Pearson's r… …which requires interval scale variables or higher.
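As a minimal sketch of how Pearson's r is calculated, the function below divides the co-variation of the two variables by the square root of the product of their individual variations. The function name and the example data are invented for illustration; statistics software does the same calculation for you.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r for two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the two means (co-variation).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable on its own.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented example data.
hours = [1, 2, 3, 4, 5]
marks = [2, 4, 6, 8, 10]    # high values coincide with high values of hours
errors = [10, 8, 6, 4, 2]   # high values coincide with low values of hours

print(pearson_r(hours, marks))    # perfect positive correlation
print(pearson_r(hours, errors))   # perfect negative correlation
```

Real data would of course give values somewhere between these two extremes.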
Regression analysis In correlation, we cannot speak in terms of independent and dependent variables… …but this is possible in regression analysis. A regression coefficient tells us how much a unit moves along the vertical scale if it moves one step along the horizontal scale. In other words, it expresses the effect of the independent x variable on the dependent y variable.
Correlation v regression 1 Both correlation and the type of regression we are dealing with here assume and measure linear relationships. We can calculate a straight line which reflects the relationship between the variables. This is done with the so-called Least-Squares method. I will not go into that, but it is a way of calculating the straight line for which the sum of the squared deviations of the observations from the line is as small as possible.
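For those who are curious, the Least-Squares method can be sketched as follows. This is an illustration under simple assumptions (one independent variable, invented data, function names chosen here), not something you need to reproduce by hand.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_of_squared_deviations(xs, ys, intercept, slope):
    """Sum of squared vertical distances from the observations to a line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Invented example data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

intercept, slope = least_squares_line(xs, ys)
best = sum_of_squared_deviations(xs, ys, intercept, slope)
# Any other line, here an arbitrary alternative, gives a larger sum.
other = sum_of_squared_deviations(xs, ys, 2.0, 0.7)

print(intercept, slope, best, other)
```

The comparison at the end shows what "least squares" means in practice: the fitted line has a smaller sum of squared deviations than the alternative line.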
Correlation v regression 2 Correlation tells us how closely concentrated the observations are around the line (see last point on previous slide). Correlation coefficients vary between -1 and 1, where 0 means no correlation at all. A positive correlation means that high values on one variable tend to coincide with high values on the other, and vice versa (low values on one variable coincide with low values on the other). A negative correlation tells us that the relationship between the two variables is inverse, i.e. that high values on one variable coincide with low values on the other. In a perfect correlation, i.e. -1 or 1, all observations would lie exactly on the line. In practice, we seldom get correlation coefficients higher than 0.30. Regression tells us the slope of the line. This is done by using an equation: y = α + βx (α = intercept with the y (vertical) axis; β = slope: the change in y caused by a one-unit change in x, but do not worry about this!)
Correlation v regression 3 Thus, we may have a high correlation but a low regression coefficient. A gentle slope, i.e. an almost horizontal line, means that there is movement along one variable but hardly any along the other. Therefore the independent variable has little effect on the dependent variable. Hence, a low regression coefficient. But, at the same time, the observations may be closely concentrated around that line. Hence, a high correlation coefficient. We may also have a high regression coefficient and a low correlation coefficient at the same time. This happens when the slope is steep but the observations are spread out far from the line. The above assumes a linear relationship. This is often a simplification of reality, and there are so-called curvilinear regression models which allow for more complex relationships.
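The difference between the two coefficients can be seen in a small numerical sketch. Both datasets below are invented: in the first, the observations lie exactly on a gently sloping line; in the second, they are scattered widely around a steeper line.

```python
from math import sqrt

def slope_and_r(xs, ys):
    """Return (regression slope, Pearson's r) for two lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / sqrt(sxx * syy)

xs = [0, 1, 2, 3, 4, 5]
flat_tight = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]   # gentle slope, no scatter
steep_scattered = [15, -11, 23, -3, 31, 5]    # steeper slope, much scatter

flat_slope, flat_r = slope_and_r(xs, flat_tight)
steep_slope, steep_r = slope_and_r(xs, steep_scattered)

print(round(flat_slope, 2), round(flat_r, 2))
print(round(steep_slope, 2), round(steep_r, 2))
```

The first dataset combines a low regression coefficient with a very high correlation, and the second combines a higher regression coefficient with a low correlation.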
Factor analysis Factor analysis is a technique which uses a long series of correlations among several variables in order to find out whether they form patterns, or dimensions. Factor analysis can, for example, determine that negative attitudes to inner city driving, willingness to raise the penalties for pollution, support for subsidies to organic food production and scepticism towards economic growth constitute a factor. This, simplistically put, means that they correlate more with each other than with any other attitudes. This can also be called an attitude dimension. In this example, we can call it a green factor, or dimension.
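The sketch below is not a factor analysis. It only illustrates the raw material a factor analysis works with: the correlations between every pair of variables. The answers (on a 1 to 5 agreement scale) are invented, and the 'unrelated' item is a hypothetical attitude added purely for contrast.

```python
from itertools import combinations
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r for two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented answers from six respondents on three attitude items.
items = {
    "against_city_driving": [5, 4, 4, 2, 1, 2],
    "raise_pollution_penalties": [5, 4, 4, 2, 2, 1],
    "unrelated": [3, 2, 4, 3, 4, 2],  # hypothetical non-green attitude
}

# Correlate every pair of items, as a factor analysis would do
# across a much longer list of variables.
correlations = {
    (a, b): pearson_r(items[a], items[b]) for a, b in combinations(items, 2)
}

for (a, b), r in correlations.items():
    print(f"{a} / {b}: {r:.2f}")
```

The two green items correlate strongly with each other and only weakly with the unrelated item, which is the kind of pattern a factor analysis would pick up as a dimension.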