Introduction to SAS Essentials: Mastering SAS for Data Analytics, Alan Elliott and Wayne Woodward. SAS ESSENTIALS -- Elliott & Woodward


2 Chapter 17: FACTOR ANALYSIS

3 LEARNING OBJECTIVES
- To be able to perform an exploratory factor analysis using PROC FACTOR
- To be able to use PROC FACTOR to identify underlying factors or latent variables in a data set
- To be able to use PROC FACTOR to rotate factors for improved interpretation
- To be able to use PROC FACTOR to compute factor scores

4 Factor Analysis
- Factor analysis is a dimension-reduction technique that expresses the observed variables in terms of a smaller number of underlying latent variables (factors).
- Exploratory factor analysis involves identifying factors, deciding how many factors are needed to describe the original data satisfactorily, and interpreting what those factors mean.
- Confirmatory factor analysis tests hypotheses about a factor structure specified in advance, in order to confirm (or refute) a theory.
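The idea in the first bullet is usually formalized as the common factor model. In standard textbook notation (not taken from these slides), x is the vector of p observed variables, f holds m < p common factors that have unit variance and are uncorrelated, Λ is the p × m matrix of factor loadings, and Ψ is a diagonal matrix of unique variances:

```latex
\mathbf{x} - \boldsymbol{\mu} = \boldsymbol{\Lambda}\,\mathbf{f} + \boldsymbol{\varepsilon},
\qquad
\operatorname{Cov}(\mathbf{x}) = \boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Psi}
```

Exploratory analysis estimates Λ and Ψ from the data. Confirmatory analysis fixes some elements of Λ in advance and tests whether that structure fits.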

5 17.1 FACTOR ANALYSIS BASICS
- The typical steps in performing an exploratory factor analysis are:
  (a) Compute a correlation (or covariance) matrix for the observed variables.
  (b) Extract the factors. This involves deciding how many factors to extract, which extraction method to use, and which values to use for the prior communality estimates.
  (c) Rotate the factors to improve interpretation.
  (d) Compute factor scores (if needed).
- Factor analysis solutions are not unique (for example, any orthogonal rotation of a set of factors reproduces the correlations equally well), so the analysis involves subjective choices. Consequently, there is a certain amount of "art" in any factor analysis solution.
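As a rough sketch (not the book's code), steps (a) through (d) can all be requested in a single PROC FACTOR call. SASHELP.CARS, a sample data set shipped with SAS, stands in for real study data, and the output data set name CARS_SCORES is a hypothetical choice of ours.

```sas
/* Steps (a)-(d) of an exploratory factor analysis in one call.          */
/* SASHELP.CARS is only a stand-in; CARS_SCORES is a hypothetical name.  */
proc factor data=sashelp.cars
            corr                /* (a) display the correlation matrix    */
            method=principal    /* (b) principal factor extraction       */
            priors=smc          /* (b) SMC prior communality estimates   */
            nfactors=2          /* (b) retain two factors                */
            rotate=varimax      /* (c) orthogonal varimax rotation       */
            score               /* (d) display scoring coefficients      */
            out=cars_scores;    /* (d) input data plus Factor1, Factor2  */
   var msrp enginesize horsepower mpg_city mpg_highway
       weight wheelbase length;
run;
```

In practice you would usually run steps (a) and (b) first, look at the eigenvalues and scree plot, and only then fix NFACTORS= and request rotation and scores.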

6 Using PROC FACTOR
- The SAS procedure used to perform exploratory factor analysis is PROC FACTOR. A simplified syntax for this procedure is:
PROC FACTOR <options>;
   VAR variables;
   PRIORS communalities;
RUN;

7 Table 17.1 Common Options for PROC FACTOR
Option            Explanation
DATA=dataname     Specifies which data set to use.
METHOD=option     Specifies the estimation method. Options include ML and PRINCIPAL.
MINEIGEN=n        Specifies the smallest eigenvalue for retaining a factor.
NFACTORS=n        Specifies the maximum number of factors to retain.
NOPRINT           Suppresses output.
PRIORS=option     Specifies the method for obtaining prior communalities.
ROTATE=name       Specifies the rotation method. The default is ROTATE=NONE. Common rotation methods are VARIMAX, QUARTIMAX, EQUAMAX, and PROMAX. All of these except PROMAX are orthogonal rotations.
SCREE             Displays a scree plot of the eigenvalues.
SIMPLE            Displays means, standard deviations, and number of observations.
CORR              Displays the correlation matrix.

8 Common Statements for PROC FACTOR (Table 17.1 Continued)
Statement                  Explanation
VAR variable list;         Specifies the numeric variables to be analyzed. The default is to use all numeric variables.
BY, FORMAT, LABEL, WHERE   These statements are common to most procedures and may be used here.
- NOTE: If the METHOD=PRINCIPAL option is used, a principal component analysis is performed when the PRIORS= option is omitted or set to ONE (the default).
- If you specify a PRIORS= value other than PRIORS=ONE, a principal factor analysis is performed instead.
- A common choice is PRIORS=SMC, in which case the prior communality for each variable is the squared multiple correlation of that variable with all the other variables.
- After the factors are extracted, the communalities give the proportion of the variance in each original variable that is accounted for by the extracted factors.
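To see the difference the note describes, a minimal sketch can run the same variables twice, once with PRIORS=ONE (principal components) and once with PRIORS=SMC (principal factors). SASHELP.CARS is a stand-in data set, and the OUTSTAT= data set names ST_ONE and ST_SMC are hypothetical.

```sas
/* Same variables, two PRIORS= settings (SASHELP.CARS is a stand-in). */
%let vars = msrp enginesize horsepower mpg_city mpg_highway weight wheelbase length;

title 'PRIORS=ONE: principal component analysis';
proc factor data=sashelp.cars method=principal priors=one
            nfactors=2 outstat=st_one;
   var &vars;
run;

title 'PRIORS=SMC: principal factor analysis';
proc factor data=sashelp.cars method=principal priors=smc
            nfactors=2 outstat=st_smc;
   var &vars;
run;
title;
```

In the first run every prior communality is 1, so the eigenvalues come from the full correlation matrix; in the second the priors are the SMCs (all below 1), so the eigenvalues come from the reduced correlation matrix.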

9 Do Hands-On Example p. 379 (AFACTOR1.SAS)
- Two types of intelligence are Logical-Mathematical Intelligence and Linguistic Intelligence. In this example, we examine a hypothetical data set that contains six variables, each measured on a 0-10 scale:
  - COMPUTATION: a test of mathematical computation
  - VOCABULARY: a vocabulary test
  - INFERENCE: a test of the use of inductive and deductive inference
  - REASONING: a test of sequential reasoning
  - WRITING: a score on a writing sample
  - GRAMMAR: a test of proper grammar usage

10 Using PROC FACTOR
PROC FACTOR DATA=MYSASLIB.INTEL
   SIMPLE CORR          /* Displays common statistics */
   SCORE
   METHOD=PRINCIPAL     /* Specifies the estimation method */
   ROTATE=VARIMAX       /* Specifies the rotation method */
   OUT=FS
   PRIORS=SMC           /* Specifies the method for obtaining prior communalities */
   PLOTS=SCREE;         /* Requests a scree plot */
RUN;

11 Observe Output From PROC FACTOR
- Simple Statistics

12 Correlation Matrix for Six Variables
- The high pairwise correlations among COMPUTATION, INFERENCE, and (to a lesser extent) REASONING suggest that these variables tend to measure Math Intelligence, while VOCABULARY, WRITING, and GRAMMAR, which appear to measure Linguistic Intelligence, are also positively correlated with one another.

13 Prior Communality Estimates
- Because we specified METHOD=PRINCIPAL and PRIORS=SMC, SAS uses the principal factor method, in which the prior communality estimate for each variable is the squared multiple correlation of that variable with all other variables. These prior communality estimates are given in this table.
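For reference, the squared multiple correlation of variable i can be computed directly from the inverse of the correlation matrix R (a standard identity, written in our own notation). Replacing the 1s on the diagonal of R with these values gives the reduced correlation matrix that principal factor extraction works with:

```latex
\mathrm{SMC}_i = R_i^2 = 1 - \frac{1}{r^{ii}},
\qquad r^{ii} = \bigl[\mathbf{R}^{-1}\bigr]_{ii},
\qquad
\mathbf{R}_{\text{reduced}} = \mathbf{R} - \operatorname{diag}\bigl(1 - \mathrm{SMC}_1, \ldots, 1 - \mathrm{SMC}_p\bigr)
```

As a check, with only two variables whose correlation is r, we have r^{11} = 1/(1 - r^2), so SMC_1 = r^2, as expected.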

14 Scree Plot
- The scree plot gives a visual illustration of the sizes of the eigenvalues. It clearly shows two dominant eigenvalues.

15 Eigenvalues
- This table displays the eigenvalues of the reduced correlation matrix (the correlation matrix with the prior communality estimates on its diagonal). The table shows two dominant eigenvalues (2.319 and 1.725); by any reasonable criterion, a two-factor solution should be used.
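If you prefer to fix the number of factors yourself (for example, after reading the scree plot), the NFACTORS= and MINEIGEN= options from Table 17.1 can be used as sketched below. SASHELP.CARS is again a stand-in data set, and ST_N2 is a hypothetical OUTSTAT= name. As we understand the SAS defaults, when neither option is given PROC FACTOR uses MINEIGEN=1 with PRIORS=ONE and a PROPORTION=1 rule otherwise; check the PROC FACTOR documentation for your release.

```sas
ods graphics on;                 /* PLOTS=SCREE uses ODS Graphics */
%let vars = msrp enginesize horsepower mpg_city mpg_highway weight wheelbase length;

/* Fixed number of factors, e.g. chosen from the scree plot */
proc factor data=sashelp.cars method=principal priors=smc
            nfactors=2 plots=scree outstat=st_n2;
   var &vars;
run;

/* Kaiser-style rule: keep factors whose eigenvalue is at least 1 */
proc factor data=sashelp.cars method=principal priors=one
            mineigen=1;
   var &vars;
run;
```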

16 Communality Estimates
- The communalities in this table are the proportion of the variance in each original variable that is accounted for by the extracted factors. All six variables appear to be sufficiently well represented by the two factors; REASONING has the smallest communality, 0.335.
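With orthogonal factors, each variable's communality is the sum of its squared loadings across the m retained factors (a standard result). The loadings in the worked line are hypothetical, not values from the INTEL output:

```latex
h_i^2 = \sum_{j=1}^{m} \lambda_{ij}^2,
\qquad\text{e.g. } \lambda_{i1} = 0.50,\ \lambda_{i2} = 0.30
\;\Rightarrow\; h_i^2 = 0.25 + 0.09 = 0.34
```

The remainder, 1 - h_i^2, is the part of the variable's variance that the factors leave unexplained.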

17 Factor Pattern Matrix
- In this table, every variable has a positive coefficient on Factor 1, ranging from 0.41 for REASONING to 0.77 for WRITING.
- A reasonable interpretation is that this factor is an overall measure of intelligence.
- The second factor (Factor 2) has negative loadings on the variables measuring Linguistic Intelligence and positive loadings on the others.

18 Interpreting the Factor Analysis Results
- Because these factors are not easy to interpret, we use a rotation in the hope of producing more interpretable results. (Recall that, by construction, there should be two factors: Math Intelligence and Linguistic Intelligence.)
- The option ROTATE=VARIMAX instructs SAS to perform a varimax rotation.
- SAS provides several rotation options. Varimax is a popular orthogonal rotation: it produces two orthogonal (uncorrelated) factors that are often easier to interpret.
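For comparison, an oblique rotation lets the factors correlate. A minimal sketch with ROTATE=PROMAX (the non-orthogonal option listed in Table 17.1) is below, with SASHELP.CARS as a stand-in data set. With an oblique rotation, PROC FACTOR also reports the correlations between the rotated factors, which is one way to judge whether an orthogonal rotation such as VARIMAX is reasonable.

```sas
/* Oblique alternative to VARIMAX (SASHELP.CARS is a stand-in). */
proc factor data=sashelp.cars method=principal priors=smc
            nfactors=2 rotate=promax;
   var msrp enginesize horsepower mpg_city mpg_highway
       weight wheelbase length;
run;
```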

19 Interpreting the Rotated Factor Pattern Matrix
- In this table, the coefficients for COMPUTATION are the correlations of the variable COMPUTATION with each of the two factors.
- There is a large positive correlation between COMPUTATION and Factor 2 and a very small correlation between COMPUTATION and Factor 1.
- Similar reasoning shows that Factor 1 is highly correlated with the three variables measuring Linguistic Intelligence, while Factor 2 corresponds to Math Intelligence.
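Why rotation does not change how well the factors fit: if Λ is the p × m matrix of unrotated loadings, an orthogonal rotation multiplies it by an m × m orthogonal matrix T. A short piece of standard algebra shows that ΛΛ^T, and therefore every communality on its diagonal, is unchanged:

```latex
\boldsymbol{\Lambda}^{*} = \boldsymbol{\Lambda}\mathbf{T},\quad
\mathbf{T}\mathbf{T}^{\top} = \mathbf{I}
\;\Longrightarrow\;
\boldsymbol{\Lambda}^{*}\boldsymbol{\Lambda}^{*\top}
= \boldsymbol{\Lambda}\mathbf{T}\mathbf{T}^{\top}\boldsymbol{\Lambda}^{\top}
= \boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\top}
```

Only the way the explained variance is divided among the factors changes, which is what makes the rotated pattern easier to read.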

20 Storing Factor Scores
- Suppose you want to calculate factor scores and save them in a temporary (WORK) data set named FSCORE. To accomplish this, add the following PROC FACTOR options before PLOTS=SCREE:
   SCORE NFACTORS=2 OUT=FSCORE
- Then, after the RUN; statement, add the code
   PROC PRINT DATA=FSCORE; VAR FACTOR1 FACTOR2; RUN;
- OUT=FSCORE outputs a SAS data set named FSCORE that contains the original variables plus the factor scores.

21 Results of OUT=FSCORE
- The two factor scores are given the default names FACTOR1 and FACTOR2 (the prefix "FACTOR" can be changed using the PREFIX= option).
- Recalling that Factor 1 measures Linguistic Intelligence and Factor 2 measures Math Intelligence, the factor scores show that Subject 1 has a higher Linguistic Intelligence score, Subject 2 appears to have high Math Intelligence, and Subject 3 unfortunately does not seem to be strong in either dimension.
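A sketch of the PREFIX= option, run on SASHELP.CARS as a stand-in: PREFIX=FSCORE (a hypothetical choice) names the score variables FSCORE1 and FSCORE2, PROC PRINT with OBS=3 lists the first three observations (analogous to reading off Subjects 1 to 3), and PROC MEANS checks that the scores are centered. CARS_FS is also a hypothetical data set name.

```sas
/* Rename the score variables with PREFIX= (names are hypothetical). */
proc factor data=sashelp.cars method=principal priors=smc
            nfactors=2 rotate=varimax
            prefix=fscore out=cars_fs noprint;
   var msrp enginesize horsepower mpg_city mpg_highway
       weight wheelbase length;
run;

/* First three observations with their two factor scores */
proc print data=cars_fs(obs=3);
   var make model fscore1 fscore2;
run;

/* Factor scores are centered, so their means should be close to 0 */
proc means data=cars_fs mean std;
   var fscore1 fscore2;
run;
```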

22 Do Hands-On Example p. 386 (AFACTOR2.SAS)
- Olympic Data
- This data set contains the scores of 193 athletes who completed all 10 decathlon events in the 1988 through 2012 Olympic Games.
- The 10 events in the decathlon are the 100-m run, long jump, shot put, high jump, 400-m run, 110-m hurdles, discus, pole vault, javelin, and 1500-m run.
- These events measure a wide variety of athletic abilities. In this example, we use the decathlon data set to explore whether there are underlying dimensions of athletic ability.
- Note that the times in the running events are given negative signs so that larger values are better than smaller values, as is the case for the distance measurements.

23 Factor Analysis Code for Olympic Data
PROC FACTOR SIMPLE CORR DATA=MYSASLIB.OLYMPIC METHOD=PRINCIPAL MSA
   PRIORS=SMC ROTATE=VARIMAX OUTSTAT=FACT ALL PLOTS=SCREE;
   VAR RUN10 LONGJUMP SHOTPUT HIGHJUMP RUN400 HURDLES DISCUS
       POLEVAULT JAVELIN RUN1500S;
RUN;
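The OUTSTAT=FACT option in the code above saves the analysis results in a data set, with a _TYPE_ variable identifying what each row holds. A minimal sketch of reading such a data set back is below. Because MYSASLIB.OLYMPIC is not available here, it uses SASHELP.CARS as a stand-in, and it assumes (from our reading of the PROC FACTOR documentation) that the eigenvalues are stored with _TYPE_='EIGENVAL' and the final factor pattern with _TYPE_='PATTERN'.

```sas
/* Save results with OUTSTAT= and list selected rows (stand-in data). */
proc factor data=sashelp.cars method=principal priors=smc
            rotate=varimax nfactors=2 outstat=fact noprint;
   var msrp enginesize horsepower mpg_city mpg_highway
       weight wheelbase length;
run;

/* Eigenvalues and factor pattern rows only */
proc print data=fact noobs;
   where _type_ in ('EIGENVAL', 'PATTERN');
   var _type_ _name_ msrp horsepower mpg_city weight;
run;
```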

24 Simple Statistics for Olympic Data
- As mentioned earlier, times in the running events are given negative signs so that larger values are better than smaller values, as is the case for the distance measurements.
- Moreover, the 1500-m results are given in (negative) seconds rather than in the usual minutes-and-seconds format.
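The sign and unit conventions described above can be applied in a DATA step along these lines. The raw variable names (T100, MIN1500, SEC1500, RUN100) and the two data lines are hypothetical illustrations, not values from the Olympic data set; only RUN1500S echoes the chapter's variable name.

```sas
/* Hypothetical raw results: 100-m time (s) and 1500-m time (min, s) */
data decathlon_demo;
   input athlete $ t100 min1500 sec1500;
   run100   = -t100;                      /* negate so larger = better   */
   run1500s = -(60*min1500 + sec1500);    /* min:sec -> negative seconds */
   datalines;
A 10.85 4 35.2
B 11.02 4 21.7
;

proc print data=decathlon_demo noobs;
   var athlete run100 run1500s;
run;
```

For athlete A, 4 minutes 35.2 seconds becomes -(240 + 35.2) = -275.2, so a faster 1500-m run gives a larger (less negative) value.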

25 Correlations for Olympic Data
- There are positive correlations between speed events such as the 100-m run and 110-m hurdles (0.692) and between the strength events SHOTPUT and DISCUS (0.748).
- The 1500-m run is not highly correlated with any of the other events (its correlation with the 400-m run is 0.368).

26 Communality Estimates, Olympic Data
- Because we specified METHOD=PRINCIPAL and PRIORS=SMC, SAS uses the principal factor method, in which the prior communality estimate for each variable is the squared multiple correlation of that variable with all other variables. This table shows the prior communality estimates (slightly rearranged from the original output).

27 Eigenvalues for Olympic Data
- See next slide.

28 Eigenvalues for Olympic Data
- The eigenvalues table shows the eigenvalues of the reduced correlation matrix. PROC FACTOR retained three factors, and it is clear from the previous table and the scree plot that there are three dominant eigenvalues.

29
- The communalities in this table (slightly rearranged from the output) are the proportion of the variance in each original variable that is accounted for by the extracted factors.
- All 10 events appear to be fairly well represented by the three factors, with all communalities above 0.33.
- However, HIGHJUMP, POLEVAULT, JAVELIN, and RUN1500S all have communalities below 0.4.

30 Factor Patterns
- As was the case for the unrotated solution for the Intelligence Data, every event has a positive coefficient on Factor 1, and all of these are above 0.4 except for RUN1500S, which has a coefficient of 0.17.
- A reasonable interpretation is that Factor 1 measures overall athletic ability, primarily as reflected in the first nine events. Factors 2 and 3 are more difficult to interpret.

31 Use ROTATE=VARIMAX
- Because the three-factor solution in the previous table is difficult to interpret, we again use a rotation to produce more interpretable results.
- Using the option ROTATE=VARIMAX produces the rotated factor pattern matrix given on the following slide.

32 Rotated Factor Patterns
- The first rotated factor seems to focus on events that involve speed and spring: the 100-m run, long jump, 400-m run, and 110-m hurdles.
- Factor 2 seems to be primarily an arm-strength factor, with high coefficients for shot put and discus and lesser ones for javelin, pole vault, and high jump.
- The only event with a large coefficient on Factor 3 is the 1500-m run. This is consistent with the correlation matrix, which suggested that the 1500-m run was "different" from the other events.

33 17.2 SUMMARY
- In this chapter, we have discussed how to use PROC FACTOR to perform exploratory factor analysis. In the Hands-On Examples, we illustrated the use of rotation to obtain more interpretable results.
- Continue to Chapter 18: CREATING CUSTOM GRAPHS

34 These slides are based on the book:
Introduction to SAS Essentials: Mastering SAS for Data Analytics, 2nd Edition
By Alan C. Elliott and Wayne A. Woodward
Paperback: 512 pages
Publisher: Wiley; 2nd edition (August 3, 2015)
Language: English
ISBN-10: 111904216X
ISBN-13: 978-1119042167
These slides are provided for you to use to teach SAS using this book. Feel free to modify them for your own needs. Please send comments about errors in the slides (or suggestions for improvements) to acelliott@smu.edu. Thanks.

