1 PSY6010: Statistics, Psychometrics and Research Design Professor Leora Lawton Spring 2006 Wednesdays 7-10 PM Room 204 FACTOR ANALYSIS, CLUSTER ANALYSIS.

Presentation transcript:

1 PSY6010: Statistics, Psychometrics and Research Design Professor Leora Lawton Spring 2006 Wednesdays 7-10 PM Room 204 FACTOR ANALYSIS, CLUSTER ANALYSIS and SEGMENTATIONS

2 1. Purpose of Factor Analysis
Factor analysis is a ‘data reduction’ technique. It:
1. Helps deal with multicollinearity.
2. Transforms Likert items into factor scores, as an alternative to a linear additive scale.
3. Groups attitude items that respondents tend to answer alike (it explains the variables in terms of their underlying dimensions).
4. Facilitates interpretation of a large number of variables.
5. Produces factor scores (the grouped attitudes) that can then be used as independent variables.

3 2. Steps to conducting FA When creating a questionnaire, you will often want to include a number of attitudinal questions around certain issues. When analyzing the data, start by selecting the attitude variables that you think describe some overall category, for example ‘Taste in Music’. These variables should ideally share the same metric (e.g., 1, 2, 3, 4, 5). Some say the variables should have 7 values, but 5 works fine. Don’t use dichotomous variables. Begin by computing a correlation matrix of all the variables in question. There should be some significant correlations, both positive and negative. There should be at least a 4:1 ratio of cases to variables (e.g., a minimum of 100 cases for 25 variables) and a sample size of at least 50.
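
The two sample-size rules of thumb on this slide can be checked mechanically. The following is a minimal Python sketch, not part of the SPSS workflow; the function name `check_fa_sample` and its parameters are made up for illustration.

```python
def check_fa_sample(n_cases, n_variables, min_ratio=4, min_cases=50):
    """Apply the slide's rules of thumb: at least min_ratio cases per
    variable and at least min_cases cases overall.
    Returns (ok, required_cases)."""
    required = max(min_ratio * n_variables, min_cases)
    return n_cases >= required, required


# The slide's own example: 25 variables need at least 100 cases.
print(check_fa_sample(100, 25))
# A small study: 5 variables still need the overall minimum of 50 cases.
print(check_fa_sample(40, 5))
```

Whichever rule asks for more cases is the binding one, which is why the function takes the maximum of the two requirements.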

4 Correlation matrix of musical tastes Research issue: You’ve been asked by a music store owner to assist in increasing sales by making sure the placement of music genres in the store is optimal. Using GSS93 subset.sav, run a set of frequencies to check that the variables fit the requirements. Then run a correlation matrix of all the music questions.

5 Correlation Matrix

6 Evaluating Appropriateness of FA Check the correlation matrix first. It examines only relationships between pairs of variables (i.e., bivariate, not multivariate, correlation). Then enter these variables into the FA: Analyze – Data Reduction – Factor. Move all 11 music variables to the Variables window. Under Descriptives, select the option for KMO and Bartlett’s test of sphericity. Use Bartlett’s Test of Sphericity to examine the entire matrix: you want to reject the null hypothesis that the matrix is an identity matrix (i.e., the test should be significant). An identity matrix is one in which all the correlations are 0 except, of course, the correlation between a variable and itself (= 1). (Note that our text says not to place much value on this test in most cases.) KMO stands for Kaiser-Meyer-Olkin Measure; it compares the magnitude of the observed correlation coefficients to the partial correlation coefficients (that is, the correlation between two variables once the others are controlled for). Here you want a number closer to 1. Less than .5 indicates that FA may not be appropriate. Ours is .748.
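
For readers who want to see what these two diagnostics compute, here is a hedged Python sketch using NumPy. It is an illustration of the standard formulas, not SPSS's implementation; the function names and the small equicorrelated matrix are assumptions chosen for the example, and the p-value for Bartlett's test (which needs a chi-square distribution) is deliberately left out.

```python
import numpy as np


def bartlett_sphericity(R, n):
    """Chi-square statistic and degrees of freedom for the null hypothesis
    that the p x p correlation matrix R is an identity matrix, given n cases."""
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return chi2, df


def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure: squared correlations relative to
    squared correlations plus squared partial correlations (off-diagonal only)."""
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)
    off_diag = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.sum(R[off_diag] ** 2)
    q2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + q2)


# Hypothetical correlation matrix: three variables, all pairs correlated .5
R_demo = np.full((3, 3), 0.5)
np.fill_diagonal(R_demo, 1.0)

chi2, df = bartlett_sphericity(R_demo, n=100)
print(round(chi2, 2), df, round(kmo(R_demo), 3))
```

When the partial correlations are small relative to the raw correlations, KMO approaches 1, which is the situation the slide describes as favorable for FA.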

7 SPSS for PCA/FA Analyze – Data Reduction – Factor. Under Extraction, choose Principal Components, Eigenvalues over 1, and display both the unrotated solution and the scree plot. Note that there is an option for Number of Factors. There are times you may want to impose a number rather than letting SPSS decide for you (it decides based on the eigenvalues in the extraction). For Rotation, choose Varimax (variance maximization; it’s the most commonly used) and Display Rotated Solution. For Scores, select Save as Variables with the Regression method once you have found your solution, but not while you are still in the exploration phase.
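
To make ‘variance maximization’ concrete, the sketch below implements a common SVD-based varimax iteration in Python with NumPy. It is an illustration under stated assumptions: the loading matrix is invented, the function names are hypothetical, and Kaiser normalization (which SPSS applies by default) is omitted, so results would not match SPSS output exactly.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonally rotate a loading matrix toward the varimax criterion.
    Returns (rotated_loadings, rotation_matrix)."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    rotation = np.eye(k)
    d_old = 0.0
    for _ in range(max_iter):
        Lr = L @ rotation
        # Gradient-like target whose SVD gives the next orthogonal rotation
        target = L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(target)
        rotation = u @ vt
        d_new = s.sum()
        if d_new < d_old * (1 + tol):
            break
        d_old = d_new
    return L @ rotation, rotation


def varimax_criterion(L):
    """Sum over components of the variance of the squared loadings."""
    return float(np.var(np.asarray(L) ** 2, axis=0).sum())


# Hypothetical example: a clean two-factor pattern, deliberately blurred
# by a 30-degree rotation so that every item loads on both components.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.9]])
theta = np.deg2rad(30)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
unrotated = simple @ mix

rotated, rot = varimax(unrotated)
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, each item's communality (its row sum of squared loadings) is unchanged; only the way that variance is split across components moves.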

8 SPSS for PCA/FA
FACTOR
  /VARIABLES bigband blugrass country blues musicals classicl folk jazz opera rap hvymetal
  /MISSING LISTWISE
  /ANALYSIS bigband blugrass country blues musicals classicl folk jazz opera rap hvymetal
  /PRINT INITIAL KMO EXTRACTION ROTATION
  /CRITERIA MINEIGEN(1) ITERATE(25)
  /EXTRACTION PC
  /CRITERIA ITERATE(25)
  /ROTATION VARIMAX
  /METHOD=CORRELATION.

9 Interpreting SPSS results In the table ‘Total Variance Explained’ you will see that four factors have been identified, based on having eigenvalues > 1. The scree plot gives you a pictorial view of the eigenvalues. We have four; some might want to try a fifth, because that is where the slope of the eigenvalues changes, or, similarly, to try only two. The most important thing is that the solution is interpretable: that it makes sense and that the factors provide insight into your overall concept. An eigenvalue is the amount of variance in the original variables that a factor accounts for. Each standardized variable contributes a variance of 1, so a factor with an eigenvalue above 1 explains more than a single variable does, and the factor with the largest eigenvalue accounts for the most variance. The unrotated matrix doesn’t tell you too much, so go directly to the rotated matrix: here the ‘rotated view’ can give you a better picture of the distinctiveness of each factor. Rotation pushes each variable’s loadings toward being high on one factor and near zero on the others, which makes the factors more distinguishable to the ‘naked eye.’ In the rotated matrix, you then select the variables (attributes) with the highest coefficients on each factor. This one works out pretty well; sometimes you have to go back to the drawing board and redefine the variable set. Try it by limiting the result to just two factors. What underlying issue might explain this result compared to the four-factor solution?
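
The ‘eigenvalues > 1’ rule can be reproduced outside SPSS. The Python sketch below, using NumPy, extracts the eigenvalues of a correlation matrix, the share of variance each accounts for, and the number that exceed 1. The correlation matrix is a made-up example (two pairs of items correlated .6 within each pair and 0 across pairs), not the music data.

```python
import numpy as np


def kaiser_summary(R):
    """Eigenvalues of a correlation matrix (largest first), the proportion
    of total variance each accounts for, and how many exceed 1."""
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
    proportion = eigvals / R.shape[0]
    n_keep = int(np.sum(eigvals > 1.0))
    return eigvals, proportion, n_keep


# Hypothetical correlation matrix with two clear pairs of items
R_pairs = np.array([
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
])

eigvals, proportion, n_keep = kaiser_summary(R_pairs)
print(np.round(eigvals, 2), np.round(proportion, 2), n_keep)
```

The eigenvalues always sum to the number of variables, so dividing by that number gives the ‘% of Variance’ column of the Total Variance Explained table as a proportion.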

10 Interpreting SPSS results You want to find components whose coefficients are above .3, with a clear demarcation between the highest coefficients on each component. Note that folk music is high on both components 1 and 3. For that reason it is sometimes worthwhile to set the number of components to one more, and one less, than the default number based on the eigenvalue cutoff you’ve selected.
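
Screening a rotated matrix for salient and cross-loading items is easy to automate. This is a small Python sketch; the loading values below are invented for illustration and are not the course output, and `flag_items` is a hypothetical helper name.

```python
def flag_items(loadings, threshold=0.3):
    """For each item, list the (1-based) components on which its absolute
    loading is above the threshold."""
    report = {}
    for item, row in loadings.items():
        report[item] = [i + 1 for i, value in enumerate(row) if abs(value) > threshold]
    return report


# Invented loadings on four components, for illustration only
demo_loadings = {
    "opera": [0.78, 0.05, 0.21, -0.02],
    "folk": [0.55, 0.12, 0.48, -0.05],
    "rap": [0.03, 0.10, -0.08, 0.81],
}

report = flag_items(demo_loadings)
cross_loaded = [item for item, comps in report.items() if len(comps) > 1]
print(report)
print("Cross-loading items:", cross_loaded)
```

An item flagged on more than one component, like folk in this invented example, is the cue to rerun the analysis with one more or one fewer component and compare the solutions.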

11 Scree Plot: Number of Components

12 Interpreting SPSS results Rotated Component Matrix. Columns: Components 1 to 4. Rows: Bigband Music, Bluegrass Music, Country Western Music, Blues or R & B Music, Broadway Musicals, Classical Music, Folk Music, Jazz Music, Opera, Rap Music, Heavy Metal Music. (The loading values are not reproduced in this transcript.) Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization. Rotation converged in 5 iterations.

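13 Project Recommendations
Current
Aisle 1A | Aisle 1B | Aisle 2A | Aisle 2B
bigband | jazz | heavy metal | rap
bluegrass | blues | musicals | C&W
Further genres (aisle not recoverable from the slide): classical, opera, folk
Recommended
Aisle 1A | Aisle 1B | Aisle 2A | Aisle 2B
bigband | folk | C&W | metal
musicals | classical | bluegrass | rap
Further genres (aisle not recoverable from the slide): opera, blues, jazz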

14 Homework #8 Using our own employee dataset (or, if you wish, your SDA data set with variables of your own choosing), use the attitudinal variables below to understand how people define “quality of work.”
– V11 I have the necessary resources (e.g., computers, databases) to do my work comfortably and efficiently.
– V13 The work I'm responsible for is appropriate for my level of capability.
– V16 I'm challenged and interested in my work.
– V17 My immediate manager recognizes and acknowledges my contributions.
– V22 I have responsibility with the required authority.
– V24 I am satisfied with communications between management and employees.
– v41r Your total compensation (salary, bonuses)
– v42r 401(k), retirement and/or pension
– v43r Availability of PTO (vacation) days
– v44r The office itself (lighting, space, decor)
– v45r Performance awards and bonuses

15 Homework #8 Run a frequencies test to make sure they are appropriate. Are they? Explain. Run a correlations table. Is this appropriate for PCA/FA? Explain. On this same selection of variables, conduct tests for KMO and Bartlett. Are we still on track for PCA/FA? Explain. Now conduct a factor analysis using these variables, setting the defaults as in the class example. Are you happy with this result? Then try setting the number of components differently, adding one or more, or subtracting, from the first result. Are you happy with this result? Explain. What can you say about components of Quality of Work?

16 Using Factor Scores Factor analyses are rarely conducted just for their own sake. Rather, they are used as attitudinal measures to predict, or be associated with, other behaviors or statuses. One could use factor scores as predictors in regression analyses. Or, as will be seen in segmentation later this semester, one can cluster factor scores together with other characteristics to create typologies, or segments, of subgroups in a population. Today we’ll go back and use our music taste factors as predictors of other behaviors.

17 Review of Factor Analysis First, let’s not twist our brains into pretzels: begin by doing an automatic recode on all musical variables. Give them a consistent new name, e.g., prefix or suffix each with an ‘r’, so that BIGBAND becomes RBIGBAND. /VARIABLES bigband blugrass country blues musicals classicl folk jazz opera rap hvymetal

18 Saving the Factor Score Analyze – Data Reduction – Factor
– Descriptives (check KMO and Bartlett’s test)
– Extraction (uncheck the unrotated matrix, check Scree Plot, and select method = principal components)
– Rotation (select Varimax)
– Scores (select Save as Variables)
Run. Now look at your Variable View, and then at the Data View. Now run Descriptive Statistics – Descriptives (Mean, Std Dev, Min, Max).
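
It helps to know what to expect from the Descriptives run: saved principal component scores are standardized, so each has a mean of 0 and a standard deviation of 1. The Python sketch below illustrates this with NumPy on synthetic data. It computes unrotated component scores only, which is a simplification of what SPSS saves after a Varimax rotation, and all variable names are hypothetical.

```python
import numpy as np

# Synthetic data: 200 cases, 4 variables forming two correlated pairs
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 1] += X[:, 0]
X[:, 3] += X[:, 2]

# Standardize the variables and extract components from the correlation matrix
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep components with eigenvalue > 1 and compute standardized scores
k = int(np.sum(eigvals > 1.0))
scores = Z @ eigvecs[:, :k] / np.sqrt(eigvals[:k])

print("components kept:", k)
print("means:", np.round(scores.mean(axis=0), 6))
print("std devs:", np.round(scores.std(axis=0, ddof=1), 6))
```

If the saved score variables in your own data set show means far from 0 or standard deviations far from 1, that is a sign something other than principal component scores was saved.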

19 Using Factor Scores in a Regression Now, let’s predict TV viewing. First, run frequencies on the variable for TV hours watched. Recode it so that 8 hours and above = 8. Create a conceptual model: TV viewing = a + musical taste + education + sex + age, where musical taste is represented by the saved factor scores. Run your regression with these variables.
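
The two steps on this slide, top-coding the outcome and fitting a linear model, can be sketched in Python with NumPy. This is an illustration on synthetic, noise-free numbers rather than the GSS data; the helper names `top_code` and `ols` and the predictor names are assumptions made for the example.

```python
import numpy as np


def top_code(values, cap=8):
    """Recode every value above the cap to the cap (8 and above becomes 8)."""
    return np.minimum(np.asarray(values, dtype=float), cap)


def ols(y, X):
    """Ordinary least squares with an intercept; returns [a, b1, b2, ...]."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Top-coding a few hypothetical TV-hours values
print(top_code([0, 3, 8, 12, 24]))

# Synthetic predictors standing in for a factor score and years of education
rng = np.random.default_rng(42)
factor_score = rng.normal(size=100)
education = rng.integers(8, 21, size=100).astype(float)
outcome = 2.0 + 0.5 * factor_score - 0.1 * education  # noise-free by construction

print(np.round(ols(outcome, np.column_stack([factor_score, education])), 3))
```

Top-coding keeps a handful of extreme viewers from dominating the fit, at the cost of understating differences among the heaviest viewers.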

20 Homework #9 Using the same factor analysis you ran last week with the employee data (see slide #14), run the factor analysis again and save the factor score variables. Now run a regression: Overall satisfaction = a + (factor scores) + male + hours worked (hourswk) + whether there was a layoff (v32). Explain why this model makes theoretical sense. Now explain the results. If you were an HR manager, what areas would you either try to improve, or make sure stay as good as they are?

21 Segmentation Using Factor Analysis and Cluster Analysis As you learned last week, segmentation analysis is used to create typologies, or categorical groups, of constituents such as customers or patrons. Segmentations often employ factor score results as well. In a segmentation, one first develops any necessary factor scores and saves them as output variables (you will see them added to your data set). Then, because the purpose of the segmentation is to create groups that can be reached through some sort of marketing (social or commercial), or used for some other actionable purpose, choose demographic variables that can be employed to target the groups. Finally, with the factor scores and the sociodemographic variables you have identified as logical, use a clustering technique to create the groups. We will use cluster analysis, but other techniques include discriminant analysis (also in SPSS), CHAID and CART (separate software packages), and, most adventurous, latent class models (also separate software, such as AMOS).

22 Cluster Analysis - 1 We’ll use GSS93 subset.sav. You will remember our musical factors (go back to slide #12 for results). First create names for your factor scores. I’ve labeled them bigclass, bluejazz, cwgrass and heavyrap. Clients like meaningful labels, and they help you when reading the output. Then consider possible demographic factors that might relate to musical taste, e.g., sex, age, race, region, education, income. Because this kind of analysis tends to be exploratory, you don’t need to specify the logic behind the relationships, but you should have some a priori idea about why these factors might be important in distinguishing the possible groups, in this case by musical taste. Cluster analysis doesn’t require recoding of IVs the way the other methods do: simply specify each one as a categorical variable or as a covariate, as appropriate.

23 Cluster Analysis - 2 Analyze – Classify – TwoStep Cluster: select factors (categorical variables, e.g., sex) and covariates (ratio, interval or continuous variables). In our first round, do not specify the number of clusters. Because segmentations are part art, part science, you need to experiment until you find one that ‘works’ for you, so then try it again with a different number of clusters.

24 Syntax for Cluster Analysis
TWOSTEP CLUSTER
  /CATEGORICAL VARIABLES = sex politics
  /CONTINUOUS VARIABLES = bigclass bluejazz cwgrass heavyrap age educ
  /DISTANCE LIKELIHOOD
  /NUMCLUSTERS FIXED = 4
  /HANDLENOISE 0
  /MEMALLOCATE 64
  /CRITERIA INITHRESHOLD (0) MXBRANCH (8) MXLEVEL (3)
  /PLOT BARFREQ PIEFREQ
  /PRINT COUNT SUMMARY
  /SAVE VARIABLE=TSC_4337.
AIM TSC_4337
  /CATEGORICAL sex politics
  /CONTINUOUS bigclass bluejazz cwgrass heavyrap age educ
  /PLOT ERRORBAR CATEGORY CLUSTER (TYPE=PIE).
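
To build intuition for comparing solutions with different numbers of clusters, here is a Python sketch using NumPy. Note the substitution: it uses plain k-means on standardized continuous variables, which is a simpler technique than SPSS's TwoStep procedure and does not handle categorical variables such as sex or politics. The data are synthetic and the function names are hypothetical.

```python
import numpy as np


def standardize(X):
    """Put every continuous variable on a mean-0, SD-1 scale."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def kmeans(X, k, n_iter=50):
    """Plain k-means with a deterministic farthest-point start.
    Returns (labels, centers, within_cluster_sum_of_squares)."""
    centers = [X[0]]
    for _ in range(1, k):
        nearest = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(nearest)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    # Final assignment against the final centers
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1)
    wss = float(((X - centers[labels]) ** 2).sum())
    return labels, centers, wss


# Synthetic data: three well-separated groups of 30 cases on two variables
rng = np.random.default_rng(1)
groups = [rng.normal(loc=c, scale=1.0, size=(30, 2))
          for c in ((0, 0), (10, 10), (-10, 10))]
data = standardize(np.vstack(groups))

for k in (3, 4, 5):
    labels, _, wss = kmeans(data, k)
    print(k, np.bincount(labels, minlength=k), round(wss, 2))
```

Comparing cluster sizes and within-cluster sums of squares across k is only a starting point; as the slides stress, the deciding question is whether the resulting groups are believable and usable.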

25 Segmentation Homework Use the same data set, but this time use the variables for tv viewing and attendance at sports events and art museums for your factors. Label the factors, then cluster them with age, sex, political views. Try it with 3, 4, and 5 clusters. Which do you find, if any, to be believable? Why?