3 Factor Analysis - Intro
Data reduction - identifies parts of a data set which potentially measure the same thing.
– Commonly encountered through the identification of personality dimensions.
Hundreds of questions relating to components of personality are compiled:
– Do you enjoy socialising with different people at parties?
– Do you worry a lot?
– Do you enjoy trying out new things?
– Do you get upset very easily?
Questions that different respondents consistently answer in a similar manner supposedly address the same underlying construct or ‘Common Factor’ - e.g., Extraversion-Introversion or Neuroticism.
4 Factor Analysis - Intro
Most data generated from responses can be subjected to FA, so it is not limited to questionnaires - e.g., a series of physical tests may have as their essence one or two core skills.
It is arguably the most abused statistical technique in use.
It generates much controversy, and the treatment here is very simplified.
There are many different types - the simplest form is described here; others are identified in due course.
5 Identifying No. of factors - inspection of responses
Question Sample (answer 1 for strongly agree and 5 for strongly disagree)
Q1 I enjoy socialising                    1 2 3 4 5
Q2 I often act on impulse                 1 2 3 4 5
Q3 I am a cheerful sort of person         1 2 3 4 5
Q4 I often feel depressed                 1 2 3 4 5
Q5 I have difficulty getting to sleep     1 2 3 4 5
Q6 Large crowds make me feel anxious      1 2 3 4 5
Response Sample
            Q1  Q2  Q3  Q4  Q5  Q6
Stephen      5   5   4   1   1   2
Ann          1   2   1   1   1   2
Paul         3   4   3   4   5   4
Janette      4   4   3   1   2   1
Michael      3   3   4   1   2   2
Christine    3   3   3   5   4   5
6 Tentative inferences
Responses to Q1-3 were very similar to one another, as were responses to Q4-6.
This suggests that the questions within each group are addressing the same common factor.
The pattern was easy to see because
– related question items were positioned side by side
– there were very few participants.
In normal situations this would be impossible.
Usually a correlation matrix is required to identify which items are related to each other.
7 Correlation matrix
       Q1      Q2      Q3      Q4      Q5      Q6
Q1    1
Q2     .933   1
Q3     .824    .696   1
Q4    -.096   -.052   0       1
Q5    -.046    .058    .111    .896   1
Q6    -.167   -.127   0        .965    .808   1
Correlation matrix showing the correlations between the six items, given the responses previously documented.
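The correlations can be recomputed from the six responses on the earlier slide. The following is a minimal sketch, assuming NumPy is available; the variable names are illustrative and not part of the original slides.

```python
import numpy as np

# Responses from the Response Sample (rows: respondents, columns: Q1..Q6).
responses = np.array([
    [5, 5, 4, 1, 1, 2],  # Stephen
    [1, 2, 1, 1, 1, 2],  # Ann
    [3, 4, 3, 4, 5, 4],  # Paul
    [4, 4, 3, 1, 2, 1],  # Janette
    [3, 3, 4, 1, 2, 2],  # Michael
    [3, 3, 3, 5, 4, 5],  # Christine
])

# rowvar=False treats each column (question) as a variable.
corr = np.corrcoef(responses, rowvar=False)

labels = [f"Q{i}" for i in range(1, 7)]
for i, label in enumerate(labels):
    # Print only the lower triangle, in the layout used on the slide.
    row = " ".join(f"{corr[i, j]:6.3f}" for j in range(i + 1))
    print(label, row)
```

With only six respondents these are Pearson correlations on a very small sample, so they illustrate the mechanics rather than a stable estimate.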
8 Interpretation
Q1-3 correlate strongly with each other and hardly at all with Q4-6, indicating 2 common factors.
This would not be typical.
– The correlations here are artificially large - in real life they would rarely exceed .5 and would typically be between .2 and .3. This would make it very difficult to establish a pattern by eye.
– With more items there would be a greater number of correlations to inspect. 6 items produce 15 correlations; 40 items would produce 780 correlations - N(N-1)/2.
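The N(N-1)/2 count can be checked quickly. This is a small sketch; the function name `n_correlations` is made up for illustration.

```python
from itertools import combinations


def n_correlations(n_items: int) -> int:
    """Number of distinct item pairs among n_items items: N(N-1)/2."""
    return n_items * (n_items - 1) // 2


print(n_correlations(6))    # 15
print(n_correlations(40))   # 780

# Cross-check the formula by listing every pair of the six items explicitly.
pairs = list(combinations(range(1, 7), 2))
print(len(pairs))           # 15
```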
9 Representing FA through geometry
Items or factors can be represented by straight lines of equal length.
Lines are positioned such that the correlation between the items = cosine of the angle between them.
cosine of angle = adjacent / hypotenuse
Example (Q4 and Q6): correlation = .97, cosine = .97, angle = 15°
10 Interpreting angles
Factors/items above the horizontal are positively correlated with F1.
Factors/items at right angles to F1 have zero correlation with it.
Factors/items at 180° to F1 have a perfect negative correlation with it.
F1 - F2 = 15°, r = .97
F1 - F3 = 105°, r = -.26
F1 - F4 = 165°, r = -.97
F1 - F5 = 285°, r = .26
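The link between angles and correlations can be sketched with the standard library alone. The helper names `angle_to_r` and `r_to_angle` are illustrative assumptions, not terms from the slides.

```python
import math


def angle_to_r(angle_degrees: float) -> float:
    """Correlation implied by the angle between two lines (cosine of the angle)."""
    return math.cos(math.radians(angle_degrees))


def r_to_angle(r: float) -> float:
    """Angle in degrees (between 0 and 180) implied by a correlation."""
    return math.degrees(math.acos(r))


for angle in (15, 105, 165, 285):
    print(f"{angle:3d} degrees -> r = {angle_to_r(angle):.2f}")

# The Q4-Q6 correlation of .965 corresponds to an angle of roughly 15 degrees.
print(f"r = .965 -> {r_to_angle(0.965):.0f} degrees")
```

Because `acos` only returns angles between 0° and 180°, an angle such as 285° maps back to its equivalent of 75°; both have the same cosine.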
11 Combining Factors and Items
Roughly orthogonal solution for the items described previously.
[Figure: items I1-I6 and factors F1 and F2 drawn as lines]
12 Possible relationships between Common Factors
Orthogonal solution - the common factors extracted are not themselves correlated, i.e., they are at right angles to each other.
– This is preferable, since common factors that are not correlated truly represent independent factors.
Oblique solution - the common factors extracted may themselves be correlated.
13 Essential FA output & associated statistical concepts
Factor (structure) matrix - table showing the correlations between all the items and the factors. By convention factors are shown as columns.
– Factor loading - the correlation between an item and a factor.
NB this is different from the correlation matrix, which shows the correlations between the items themselves.
14 Factor matrix shows 3 things/ 1&2
Which items make up which common factor
– By convention an item only contributes to a factor if its loading is above .3 or below -.3.
Reveals the amount of overlap between each item and all the factors
– The square of the correlation indicates the common variance between item and factor.
– The sum of these squared correlations = the communality of the item. For I1: .9² + .1² = .82
– The communality for an item may be low because the item
   measures something conceptually different from all the other items
   has excessive measurement error
   shows few individual differences in the way it is responded to - it may be very easy or very difficult
15 Factor matrix shows 3 things/ 3
Indicates the relative importance of each common factor, i.e., a factor that explains 40% of the overlap between the items will be more important than one that explains only 25%.
– Calculated through an eigenvalue. Square the factor loadings for a single factor and add them up = the eigenvalue.
– Divide the eigenvalue by the number of items = the proportion of variance explained by that factor.
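The three readings of a factor matrix can be sketched as follows, assuming NumPy. The loadings are hypothetical: only the I1 row (.9 and .1) comes from the slides; the other rows are invented for illustration.

```python
import numpy as np

# Hypothetical factor matrix: rows are items I1..I6, columns are factors F1, F2.
# Only the I1 row (.9, .1) appears in the slides; the rest is made up.
loadings = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.7, 0.1],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.7],
])

# 1. Which items belong to which factor: loadings beyond +/-.3.
salient = np.abs(loadings) > 0.3

# 2. Communality of each item: sum of its squared loadings across factors.
communalities = (loadings ** 2).sum(axis=1)

# 3. Eigenvalue of each factor: sum of squared loadings down its column,
#    and the proportion of variance it explains (eigenvalue / number of items).
eigenvalues = (loadings ** 2).sum(axis=0)
proportion_explained = eigenvalues / loadings.shape[0]

print("salient loadings:\n", salient)
print("communalities:", communalities.round(2))
print("eigenvalues:", eigenvalues.round(2))
print("proportion explained:", proportion_explained.round(3))
```

Communalities sum across a row (one item, all factors), whereas eigenvalues sum down a column (one factor, all items).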
17 Additional observation
Indicates the possibility that some of the variance may be unexplained by the factors.
Possible explanations:
– Factors are an approximation - some of the original information is sacrificed during this process.
The 2 different methods of EFA make different assumptions about the possibility of unexplained variance.
18 Principal Components Analysis vs. Principal Axis Factoring
Both are examples of Exploratory FA but are distinguished by their assumptions regarding the possibility of unexplained variance.
PCA - all item variance can be explained by the factors. All items will have a communality of 1 and the factors will, between them, account for 100% of the variation among the items (when as many components as items are extracted).
– Total variance = common factor variance + measurement error
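The PCA claim can be illustrated on the six-item response sample. This is a sketch under assumptions: it uses NumPy's eigendecomposition of the correlation matrix rather than any particular statistics package, and it keeps all six components.

```python
import numpy as np

# Responses from the Response Sample (rows: respondents, columns: Q1..Q6).
responses = np.array([
    [5, 5, 4, 1, 1, 2],
    [1, 2, 1, 1, 1, 2],
    [3, 4, 3, 4, 5, 4],
    [4, 4, 3, 1, 2, 1],
    [3, 3, 4, 1, 2, 2],
    [3, 3, 3, 5, 4, 5],
])
R = np.corrcoef(responses, rowvar=False)

# Eigendecomposition of the correlation matrix, largest eigenvalue first.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negative round-off
eigvecs = eigvecs[:, order]

# Component loadings: eigenvectors scaled by the square root of their eigenvalue.
loadings = eigvecs * np.sqrt(eigvals)

communalities = (loadings ** 2).sum(axis=1)   # one value per item
explained = eigvals / eigvals.sum()           # one value per component

print("communalities:", communalities.round(3))
print("cumulative variance explained:", explained.cumsum().round(3))
```

With every component retained, each communality comes out as 1 and the cumulative variance reaches 100%. Retaining only the first two components would give communalities below 1.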
19 PAF - items may have ‘unique variance’ - variance which cannot be explained by the factors
– Suppose there are two test items:
   What is the capital of Italy?
   What is the capital of Spain?
Let's assume that they are of the same level of difficulty and therefore test the underlying factor (geographical knowledge) to the same degree. Will these items always be responded to in exactly the same way?
– Someone may have a poor level of geographical knowledge but just happen to know the capital of Spain.
It is therefore not possible to consider the two items as being completely equivalent.
A correct response depends on knowledge relating to
– the common factor (geographical knowledge)
– something unique to the individual item - Specific Variance - which cannot be predicted from the common factors
20 – Total variance = common factor variance + specific item variance + measurement error
PAF is more complicated because it must determine how much of the variance relating to an item is ‘common-factor’ variance and how much is ‘specific variance’.
– PCA does not allow for the possibility of item-specific variance.
21 PCA or PAF?
They seem to produce very similar results, so much so that some researchers do not identify which one they are carrying out.
Since PAF allows for specific variance, an item’s communality is necessarily going to be less than one.
– Factor loadings for items are going to appear less impressive with PAF than with PCA.
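The difference can be sketched on a made-up correlation matrix. Everything here is an assumption for illustration: the population loadings are hypothetical, the helper names are invented, and the PAF routine is a simplified iterated version (communality estimates on the diagonal, starting from squared multiple correlations), not the exact procedure of any particular package.

```python
import numpy as np


def top_loadings(matrix, n_factors):
    """Loadings from the n_factors largest eigenvalues of a symmetric matrix."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    top = np.argsort(eigvals)[::-1][:n_factors]
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


def paf_loadings(R, n_factors, n_iter=200):
    """Simplified iterated principal axis factoring (illustrative sketch)."""
    # Starting communalities: squared multiple correlation of each item.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)      # replace the 1s with communality estimates
        loadings = top_loadings(reduced, n_factors)
        h2 = (loadings ** 2).sum(axis=1)   # updated communalities
    return loadings


# Hypothetical population loadings for six items on two uncorrelated factors.
true_loadings = np.array([
    [0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
    [0.0, 0.7], [0.0, 0.6], [0.0, 0.5],
])
R = true_loadings @ true_loadings.T
np.fill_diagonal(R, 1.0)                   # a correlation matrix has 1s on the diagonal

pca_h2 = (top_loadings(R, 2) ** 2).sum(axis=1)   # PCA keeps 1s on the diagonal
paf_h2 = (paf_loadings(R, 2) ** 2).sum(axis=1)

print("PCA communalities:", pca_h2.round(2))
print("PAF communalities:", paf_h2.round(2))
```

In this constructed example the PAF communalities settle near the squared hypothetical loadings, while the two-component PCA communalities are larger because the specific variance on the diagonal is treated as explainable.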
22 Used for 4 basic purposes /1
Shows how many distinct common factors are measured by a set of test items
– Are the supposedly different constructs neuroticism, anxiety, hysteria, ego strength, self-actualisation, and locus of control 6 independent entities, or would they be better described as only 2 factors?
   ‘Elements of Pathology’: neuroticism, anxiety, hysteria
   ‘Healthy mechanisms’: ego strength, self-actualisation, locus of control
23 Used for 4 basic purposes /2&3
Shows which items relate to which common factors
– In the previous example, neuroticism belonged to the factor ‘Elements of Pathology’.
Determines whether tests that purportedly measure the same thing in fact do so
– Take 3 tests that claim to measure anxiety. FA may produce more than one factor, indicating that something in addition to anxiety is being measured.
24 Used for 4 basic purposes /4
Checks the psychometric properties of a questionnaire - do the same factors materialise with a different sample?
– Would a different population, made up of Native Americans, identify the constructs of Extraversion-Introversion and Neuroticism which have been found in European cultures?