Introduction to SAS Essentials Mastering SAS for Data Analytics Alan Elliott and Wayne Woodward SAS ESSENTIALS -- Elliott & Woodward1.



Chapter 17: FACTOR ANALYSIS

LEARNING OBJECTIVES
- To be able to perform an exploratory factor analysis using PROC FACTOR
- To be able to use PROC FACTOR to identify underlying factors or latent variables in a data set
- To be able to use PROC FACTOR to rotate factors for improved interpretation
- To be able to use PROC FACTOR to compute factor scores

Factor Analysis
- Factor analysis is a dimension reduction technique designed to express the actual observed variables using a smaller number of underlying latent variables.
- Exploratory factor analysis involves identifying factors, determining which factors are needed to satisfactorily describe the original data, and interpreting the meaning of these factors.
- Confirmatory factor analysis involves techniques for testing hypotheses to confirm theories.

17.1 FACTOR ANALYSIS BASICS
- The typical steps in performing an exploratory factor analysis are the following:
  (a) Compute a correlation (or covariance) matrix for the observed variables.
  (b) Extract the factors (this involves deciding how many factors to extract, the method to use, and the values to use for the prior communality estimates).
  (c) Rotate the factors to improve interpretation.
  (d) Compute factor scores (if needed).
- Factor analysis can be quite subjective, and its solutions are not unique. Consequently, there is a certain amount of "art" involved in any factor analysis solution.
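As a rough illustration, steps (a) through (d) can all be requested in a single PROC FACTOR call. This is only a sketch: the data set MYDATA, the variables X1-X6, and the output data set SCORES are hypothetical names, and the choice of two factors is an assumption made for the example.

```sas
/* Sketch only: MYDATA, X1-X6, and SCORES are hypothetical names. */
PROC FACTOR DATA=MYDATA
      CORR               /* (a) display the correlation matrix being analyzed */
      METHOD=PRINCIPAL   /* (b) extraction method                             */
      PRIORS=SMC         /* (b) prior communality estimates                   */
      NFACTORS=2         /* (b) number of factors to retain (assumed here)    */
      ROTATE=VARIMAX     /* (c) rotate the factors for interpretability       */
      SCORE OUT=SCORES;  /* (d) save factor scores in data set SCORES         */
   VAR X1-X6;
RUN;
```

NFACTORS= is given explicitly because OUT= needs to know how many factor score variables to create.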

Using PROC FACTOR
- The SAS procedure used to perform exploratory factor analysis is PROC FACTOR. A simplified syntax for this procedure is as follows:

    PROC FACTOR <options>;
       VAR variables;
       PRIORS communalities;
    RUN;

Table 17.1 Common Options for PROC FACTOR

Option            Explanation
DATA=dataname     Specifies which data set to use.
METHOD=option     Specifies the estimation method. Options include ML and PRINCIPAL.
MINEIGEN=n        Specifies the smallest eigenvalue for retaining a factor.
NFACTORS=n        Specifies the maximum number of factors to retain.
NOPRINT           Suppresses output.
PRIORS=option     Specifies the method for obtaining prior communalities.
ROTATE=name       Specifies the rotation method. The default is ROTATE=NONE. Common rotation
                  methods are VARIMAX, QUARTIMAX, EQUAMAX, and PROMAX. All of these are
                  orthogonal rotations except PROMAX.
SCREE             Displays a scree plot of the eigenvalues.
SIMPLE            Displays means, standard deviations, and number of observations.
CORR              Displays the correlation matrix.

Common Statements for PROC FACTOR (Table 17.1 Continued)

Statement                  Explanation
VAR variable list;         Specifies the numeric variables to be analyzed. The default is to use all numeric variables.
BY, FORMAT, LABEL, WHERE   These statements are common to most procedures and may be used here.

- NOTE: If the METHOD=PRINCIPAL option is used, then principal component analysis is performed when the PRIORS= option is not used or is set to ONE (the default).
- If you specify a PRIORS= value other than PRIORS=ONE, then a principal factor method analysis is performed.
- A common usage is PRIORS=SMC, in which case the prior communality for each variable is its squared multiple correlation with all other variables.
- After the factors are extracted, the communalities represent the proportion of the variance in each original variable that the factors retain.
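The difference described in the NOTE comes down to the PRIORS= option. A minimal side-by-side sketch, assuming a hypothetical data set MYDATA with variables X1-X6:

```sas
/* Sketch only: MYDATA and X1-X6 are hypothetical names. */

/* Principal component analysis: PRIORS= is omitted, so it defaults to ONE */
PROC FACTOR DATA=MYDATA METHOD=PRINCIPAL;
   VAR X1-X6;
RUN;

/* Principal factor analysis: priors are squared multiple correlations */
PROC FACTOR DATA=MYDATA METHOD=PRINCIPAL PRIORS=SMC;
   VAR X1-X6;
RUN;
```

In the first call the full correlation matrix is analyzed; in the second, the diagonal is replaced by the prior communality estimates, giving the reduced correlation matrix.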

Do Hands-On Exercise, p. 379 (AFACTOR1.SAS)
- Two of the types of intelligence are Logical-Mathematical Intelligence and Linguistic Intelligence. In this example, we examine a hypothetical data set that contains six variables, each measured on a scale as follows:
  - COMPUTATION - Test on mathematical computations
  - VOCABULARY - A vocabulary test
  - INFERENCE - A test of the use of inductive and deductive inference
  - REASONING - A test of sequential reasoning
  - WRITING - A score on a writing sample
  - GRAMMAR - A test measuring proper grammar usage

Using PROC FACTOR

    PROC FACTOR DATA=MYSASLIB.INTEL
          SIMPLE CORR         /* Displays common statistics */
          SCORE
          METHOD=PRINCIPAL    /* Specifies the estimation method */
          ROTATE=VARIMAX      /* Specifies the rotation method */
          OUT=FS
          PRIORS=SMC          /* Specifies the method for obtaining prior communalities */
          PLOTS=SCREE;        /* Requests a scree plot */
    RUN;

Observe Output From PROC FACTOR
- Simple Statistics

Correlation Matrix for Six Variables
- The high pairwise correlations among COMPUTATION, INFERENCE, and (to a lesser extent) REASONING suggest that these variables tend to measure Math Intelligence. The variables VOCABULARY, WRITING, and GRAMMAR, which seem to measure Linguistic Intelligence, are also positively pairwise correlated.

Prior Communality Estimates
- Because we specified METHOD=PRINCIPAL and PRIORS=SMC, SAS uses the principal factors method, where the prior communality estimate for each variable is its squared multiple correlation with all other variables. These prior communality estimates are given in this table.
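One way to see where an SMC prior comes from is to regress one variable on the other five: the R-square from that regression is the squared multiple correlation. The sketch below assumes the variable names in MYSASLIB.INTEL match the labels listed for the hands-on exercise, which the slides do not confirm.

```sas
/* Sketch only: assumes the data set variables are named as on the slide. */
PROC REG DATA=MYSASLIB.INTEL;
   /* The R-square of this model is the squared multiple correlation of  */
   /* COMPUTATION with the other five variables.                         */
   MODEL COMPUTATION = VOCABULARY INFERENCE REASONING WRITING GRAMMAR;
RUN;
QUIT;
```

The R-square reported here should agree with the prior communality estimate that PROC FACTOR lists for COMPUTATION under PRIORS=SMC.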

Scree Plot
- The scree plot gives a visual illustration of the sizes of the eigenvalues. It is clear that there are two dominant eigenvalues.

Eigenvalues
- This table displays eigenvalues associated with the factors based on the reduced correlation matrix. It is clear from the table that there are two dominant eigenvalues (2.319 and 1.725). Based on any reasonable criterion, it is clear that a two-factor solution should be used.

Communality Estimates
- The communalities in this table are the proportion of the variance in each of the original variables retained after extracting the factors. It seems that all six variables are sufficiently well represented by the two factors, with REASONING having the smallest communality.

Factor Pattern Matrix
- In this table, it can be seen that for Factor 1, each variable has a positive coefficient, ranging from 0.41 for REASONING to 0.77 for WRITING.
- A reasonable interpretation of this factor is that it is an overall measure of intelligence.
- The second factor (Factor 2) has negative loadings on the variables measuring Linguistic Intelligence and positive loadings on the others.

Interpreting the Factor Analysis Results
- Based on the less than ideal interpretability of these factors, we use a rotation in hope of producing more interpretable results. (Recall that by construction, there should be two factors: Math Intelligence and Linguistic Intelligence.)
- Using the option ROTATE=VARIMAX, we have instructed SAS to perform a Varimax rotation.
- SAS provides several rotation options, and Varimax is a popular "orthogonal rotation," which produces two orthogonal factors that are potentially easier to interpret.
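For comparison, Table 17.1 notes that PROMAX is the one listed rotation that is not orthogonal. The slides do not run it on this data set; the following is only a sketch of how the same analysis might be repeated with an oblique rotation, with NFACTORS=2 assumed from the two-factor conclusion above.

```sas
/* Sketch only: the slides use ROTATE=VARIMAX; PROMAX is shown for contrast. */
PROC FACTOR DATA=MYSASLIB.INTEL
      METHOD=PRINCIPAL
      PRIORS=SMC
      NFACTORS=2         /* assumed from the two-factor solution */
      ROTATE=PROMAX;     /* oblique rotation: factors may be correlated */
RUN;
```

With an oblique rotation the output also reports the correlations between the factors, which is a quick check on whether the orthogonality imposed by Varimax is a reasonable restriction.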

Interpreting the Rotated Factor Pattern Matrix
- In this table the coefficients for COMPUTATION are the correlations of the variable COMPUTATION with each of the two factors.
- There is a large positive correlation between COMPUTATION and Factor 2 and a very small correlation between COMPUTATION and Factor 1.
- Similar interpretations show that Factor 1 is highly correlated with the three variables measuring Linguistic Intelligence and Factor 2 tends to correspond to Math Intelligence.

Storing Factor Scores
- Suppose you want to calculate factor scores and save them in a temporary working file FSCORE. To do so, add the following PROC FACTOR options before PLOTS=SCREE;

    SCORE NFACTORS=2 OUT=FSCORE

  OUT=FSCORE outputs a SAS data set named FSCORE.
- Then, after the RUN; statement, add the code

    PROC PRINT DATA=FSCORE;
       VAR FACTOR1 FACTOR2;
    RUN;

Results of OUT=FSCORE
- The two factor scores are given the default names FACTOR1 and FACTOR2 (the prefix "FACTOR" can be changed using the PREFIX= option).
- Recalling that Factor 1 is a measure of Linguistic Intelligence and Factor 2 measures Math Intelligence, the factor scores show that Subject 1 has a higher Linguistic Intelligence score, Subject 2 seems to have high Math Intelligence, and Subject 3 unfortunately doesn't seem to have strength in either dimension.
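The PREFIX= option mentioned above can be sketched as follows. The prefix INTEL is an arbitrary choice made for this illustration; the remaining options repeat those used in the exercise.

```sas
/* Sketch only: PREFIX=INTEL is an arbitrary illustrative choice. */
PROC FACTOR DATA=MYSASLIB.INTEL
      METHOD=PRINCIPAL
      PRIORS=SMC
      ROTATE=VARIMAX
      NFACTORS=2
      SCORE OUT=FSCORE
      PREFIX=INTEL;      /* score variables become INTEL1 and INTEL2 */
RUN;

/* Print the factor scores for the first three subjects */
PROC PRINT DATA=FSCORE (OBS=3);
   VAR INTEL1 INTEL2;
RUN;
```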

Do Hands-On Example, p. 386 (AFACTOR2.SAS)
- Olympic Data
- This data set contains scores of 193 athletes who completed all 10 decathlon events in the 1988 through 2012 Olympic Games.
- The 10 events in the decathlon are 100-m run, long jump, shot put, high jump, 400-m run, 110-m hurdles, discus, pole vault, javelin, and 1500-m run.
- These events measure a wide variety of athletic ability, and in this example we use this decathlon data set to explore whether there are some underlying dimensions of athletic ability.
- It should be noted that the "times" in the running events are given negative signs so that "larger" values are better than "smaller" values, as is the case in the distance measurements.

Factor Analysis Code for Olympic Data

    PROC FACTOR SIMPLE CORR DATA=MYSASLIB.OLYMPIC
          METHOD=PRINCIPAL MSA
          PRIORS=SMC
          ROTATE=VARIMAX
          OUTSTAT=FACT ALL
          PLOTS=SCREE;
       VAR RUN100 LONGJUMP SHOTPUT HIGHJUMP RUN400
           HURDLES DISCUS POLEVAULT JAVELIN RUN1500S;
    RUN;

Simple Statistics for Olympic Data
- As mentioned earlier, times in the running events are given negative signs so that "larger" values are better than "smaller" values, as is the case in the distance measurements.
- Moreover, the 1500-m results are given in (negative) seconds rather than the usual reporting of minutes and seconds.
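The slides do not show how the running times were recoded. A DATA step along the following lines would produce variables of the kind described; the input data set RAWTIMES and its variables T100, T1500_MIN, and T1500_SEC are hypothetical names invented for this sketch.

```sas
/* Sketch only: RAWTIMES, T100, T1500_MIN, and T1500_SEC are hypothetical. */
DATA WORK.DECATHLON;
   SET WORK.RAWTIMES;
   /* Negate the time so that larger values mean better performance */
   RUN100   = -T100;
   /* Convert minutes and seconds to total seconds, then negate */
   RUN1500S = -(60*T1500_MIN + T1500_SEC);
RUN;
```

Putting every event on a "larger is better" scale keeps the signs of the correlations and factor loadings easy to read.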

Correlations for Olympic Data
- There are positive correlations between speed events such as the 100-m run and 110-m hurdles (0.692) and between the strength events SHOTPUT and DISCUS (0.748).
- The 1500-m run is not highly correlated with any of the other events; its largest correlation is with the 400-m run (0.368).

Communality Estimates, Olympic Data
- Since we specified METHOD=PRINCIPAL and PRIORS=SMC, SAS uses the principal factors method, where the prior communality estimate for each variable is its squared multiple correlation with all other variables. This table shows the prior communality estimates (slightly rearranged from the original output).

Eigenvalues for Olympic Data
- The eigenvalues table shows factors based on the reduced correlation matrix. PROC FACTOR selected three factors. It is clear from the eigenvalues table and the scree plot that there are three dominant eigenvalues.

- The communalities in this table (rearranged slightly from the output) are the proportion of the variance in each of the original variables retained after extracting the factors.
- It seems that all 10 events are fairly well represented by the three factors.
- However, HIGHJUMP, POLEVAULT, JAVELIN, and RUN1500S all have communalities below 0.4.

Factor Patterns
- As was the case for the unrotated solution for the Intelligence Data, it can be seen that Factor 1 has positive coefficients, all of which are above 0.4 except for RUN1500S.
- A reasonable interpretation is that Factor 1 measures overall athletic ability, primarily related to the first nine events. Factors 2 and 3 are more difficult to interpret.

Use ROTATE=VARIMAX
- Based on the confusing interpretations associated with the three-factor solution given in the previous table, we again use a rotation to produce more interpretable results.
- Using the option ROTATE=VARIMAX results in the Rotated Factor Pattern Matrix given in the following slide.

Rotated Factor Patterns
- The first rotated factor seems to focus on events that involve speed and spring: the 100-m run, long jump, 400-m run, and 110-m hurdles.
- Factor 2 seems to be primarily an arm strength factor, with high coefficients for shot put and discus and lesser coefficients for javelin, pole vault, and high jump.
- The only event with a large coefficient in Factor 3 is the 1500-m run. This is consistent with the correlation matrix, which suggested the 1500-m run was "different" from the other events.

17.2 SUMMARY
- In this chapter, we have discussed methods for using PROC FACTOR to perform exploratory factor analysis. In the Hands-on Examples, we have illustrated the use of rotation to obtain more understandable results.
- Continue to Chapter 18: CREATING CUSTOM GRAPHS

These slides are based on the book:
Introduction to SAS Essentials Mastering SAS for Data Analytics, 2nd Edition
By Alan C. Elliott and Wayne A. Woodward
Paperback: 512 pages
Publisher: Wiley; 2 edition (August 3, 2015)
Language: English

These slides are provided for you to use to teach SAS using this book. Feel free to modify them for your own needs. Please send comments about errors in the slides (or suggestions for improvements) to the authors. Thanks.