Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 12, 2012.

Slides:



Advertisements
Similar presentations
FMRI Methods Lecture 10 – Using natural stimuli. Reductionism Reducing complex things into simpler components Explaining the whole as a sum of its parts.
Advertisements

Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.
Some (Simplified) Steps for Creating a Personality Questionnaire Generate an item pool Administer the items to a sample of people Assess the uni-dimensionality.
Exploratory Factor Analysis
Dimension reduction (1)
OVERVIEW What is Factor Analysis? (purpose) History Assumptions Steps Reliability Analysis Creating Composite Scores.
Principal Components Analysis with SAS Karl L. Wuensch Dept of Psychology East Carolina University.
Lecture 7: Principal component analysis (PCA)
Psychology 202b Advanced Psychological Statistics, II April 7, 2011.
Principal Components An Introduction Exploratory factoring Meaning & application of “principal components” Basic steps in a PC analysis PC extraction process.
PSY 307 – Statistics for the Behavioral Sciences
Common Factor Analysis “World View” of PC vs. CF Choosing between PC and CF PAF -- most common kind of CF Communality & Communality Estimation Common Factor.
Factor Analysis Research Methods and Statistics. Learning Outcomes At the end of this lecture and with additional reading you will be able to Describe.
Factor Analysis There are two main types of factor analysis:
When Measurement Models and Factor Models Conflict: Maximizing Internal Consistency James M. Graham, Ph.D. Western Washington University ABSTRACT: The.
Education 795 Class Notes Factor Analysis II Note set 7.
Multivariate Methods EPSY 5245 Michael C. Rodriguez.
Factor Analysis Psy 524 Ainsworth.
Principal Components An Introduction
Measuring the Unobservable
The Tutorial of Principal Component Analysis, Hierarchical Clustering, and Multidimensional Scaling Wenshan Wang.
Principal Components Analysis BMTRY 726 3/27/14. Uses Goal: Explain the variability of a set of variables using a “small” set of linear combinations of.
Chapter 3 Data Exploration and Dimension Reduction 1.
Chapter 15 Correlation and Regression
Discriminant Function Analysis Basics Psy524 Andrew Ainsworth.
Data Reduction. 1.Overview 2.The Curse of Dimensionality 3.Data Sampling 4.Binning and Reduction of Cardinality.
MGMT 6971 PSYCHOMETRICS © 2014, Michael Kalsher
Advanced Correlational Analyses D/RS 1013 Factor Analysis.
CJT 765: Structural Equation Modeling Class 8: Confirmatory Factory Analysis.
Factor Analysis Psy 524 Ainsworth. Assumptions Assumes reliable correlations Highly affected by missing data, outlying cases and truncated data Data screening.
Measurement Models: Exploratory and Confirmatory Factor Analysis James G. Anderson, Ph.D. Purdue University.
Aron, Aron, & Coups, Statistics for the Behavioral and Social Sciences: A Brief Course (3e), © 2005 Prentice Hall Chapter 12 Making Sense of Advanced Statistical.
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 February 22, 2012.
Lecture 12 Factor Analysis.
Examining Data. Constructing a variable 1. Assemble a set of items that might work together to define a construct/ variable. 2. Hypothesize the hierarchy.
SEM Basics 2 Byrne Chapter 2 Kline pg 7-15, 50-51, ,
CJT 765: Structural Equation Modeling Class 8: Confirmatory Factory Analysis.
Multivariate Analysis and Data Reduction. Multivariate Analysis Multivariate analysis tries to find patterns and relationships among multiple dependent.
Applied Quantitative Analysis and Practices
Special Topics in Educational Data Mining HUDK5199 Spring term, 2013 March 13, 2013.
Education 795 Class Notes Factor Analysis Note set 6.
Exploratory Factor Analysis Principal Component Analysis Chapter 17.
Chapter 13.  Both Principle components analysis (PCA) and Exploratory factor analysis (EFA) are used to understand the underlying patterns in the data.
Principle Component Analysis and its use in MA clustering Lecture 12.
MACHINE LEARNING 7. Dimensionality Reduction. Dimensionality of input Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
Department of Cognitive Science Michael Kalsher Adv. Experimental Methods & Statistics PSYC 4310 / COGS 6310 Factor Analysis 1 PSYC 4310 Advanced Experimental.
Special Topics in Educational Data Mining HUDK5199 Spring term, 2013 March 6, 2013.
Principal Component Analysis Zelin Jia Shengbin Lin 10/20/2015.
Advanced Statistics Factor Analysis, I. Introduction Factor analysis is a statistical technique about the relation between: (a)observed variables (X i.
FACTOR ANALYSIS 1. What is Factor Analysis (FA)? Method of data reduction o take many variables and explain them with a few “factors” or “components”
Special Topics in Educational Data Mining HUDK5199 Spring, 2013 April 3, 2013.
Principal Component Analysis
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 April 9, 2012.
FACTOR ANALYSIS.  The basic objective of Factor Analysis is data reduction or structure detection.  The purpose of data reduction is to remove redundant.
Chapter 14 EXPLORATORY FACTOR ANALYSIS. Exploratory Factor Analysis  Statistical technique for dealing with multiple variables  Many variables are reduced.
1 FACTOR ANALYSIS Kazimieras Pukėnas. 2 Factor analysis is used to uncover the latent (not observed directly) structure (dimensions) of a set of variables.
Methods of multivariate analysis Ing. Jozef Palkovič, PhD.
Lecture 2 Survey Data Analysis Principal Component Analysis Factor Analysis Exemplified by SPSS Taylan Mavruk.
Exploratory Factor Analysis
Unsupervised Learning
Advanced Methods and Analysis for the Learning and Social Sciences
LECTURE 11: Advanced Discriminant Analysis
Factor analysis Advanced Quantitative Research Methods
Making Sense of Advanced Statistical Procedures in Research Articles
EPSY 5245 EPSY 5245 Michael C. Rodriguez
Principal Component Analysis
Data Transformations targeted at minimizing experimental variance
Exploratory Factor Analysis. Factor Analysis: The Measurement Model D1D1 D8D8 D7D7 D6D6 D5D5 D4D4 D3D3 D2D2 F1F1 F2F2.
Unsupervised Learning
Presentation transcript:

Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 12, 2012

Today’s Class Factor Analysis

Goal 1 of Factor Analysis You have a large data space with many quantitative* variables You want to reduce that data space into a smaller number of factors

Goal 1 of Factor Analysis You have a large data space with many quantitative* variables You want to reduce that data space into a smaller number of factors * -- there is also a variant for categorical and dichotomous data, Latent Class Factor Analysis (LCFA -- Magidson & Vermunt, 2001; Vermunt & Magidson, 2004), as well as a variant for mixed data types, Exponential Family Principal Component Analysis (EPCA – Collins et al., 2001)

Goal 2 of Factor Analysis You have a large data space with many quantitative variables You want to understand the structure that unifies these variables

Classic Example You have a questionnaire with 100 items Do the 100 items group into a smaller number of factors – E.g. Do the 100 items actually tap only 6 deeper constructs? – Can the 100 items be divided into 6 scales? – Which items fit poorly in their scales? Common in attempting to design questionnaire with scales and sub-scales

Another Example You have a set of 600 features of student behavior You want to reduce the data space before running a classification algorithm Do the 600 features group into a smaller number of factors? – E.g. Do the 600 features actually tap only 15 deeper constructs?

Example from my work (Baker et al., 2009) We developed a taxonomy of 79 design features that a Cognitive Tutor lesson could possess We wanted to reduce the data space before running statistical significance tests Do the 79 design features group into a smaller number of factors? – E.g. Do the 79 features actually group into 6 major dimensions of tutor design? – The answer was yes

Two types of Factor Analysis Experimental – Determine variable groupings in bottom-up fashion – More common in EDM/DM Confirmatory – Take existing structure, verify its goodness – More common in Psychometrics

Mathematical Assumption in most Factor Analysis Each variable loads onto every factor, but with different strengths

Example F1F2F3 V V V V V V V V V V V V

Computing a Factor Score Can we write equation for F1? F1F2F3 V V V V V V V V V V V V

Which variables load strongly on F1? F1F2F3 V V V V V V V V V V V V

Wait… what’s a “strong” loading? One common guideline: > 0.4 or < -0.4 Comrey & Lee (1992) – 0.70 excellent (or -0.70) – 0.63 very good – 0.55 good – 0.45 fair – 0.32 poor One of those arbitrary things that people seem to take exceedingly seriously – Another approach is to look for a gap in the loadings in your actual data

Which variables load strongly on F2? F1F2F3 V V V V V V V V V V V V

Which variables load strongly on F3? F1F2F3 V V V V V V V V V V V V

Which variables don’t fit this scheme? F1F2F3 V V V V V V V V V V V V

Assign items to factors to create scales F1F2F3 V V V V V V V V V V V V

Item Selection After loading is created, you can create one- factor-per-variable models by iteratively – assigning each item to one factor – dropping item that loads most poorly in its factor, and has no strong loading – re-fitting factors

Item Selection Some researchers recommend conducting item selection based on face validity – e.g. if it doesn’t look like it should fit, don’t include it What do people think about this?

Final chance to decide on scales F1F2F3 V V V V V V V V V V V V

How does it work mathematically? Two algorithms – Principal axis factoring (PAF) Fits to shared variance between variables – Principal components analysis (PCA) Fits to all variance between variables, including variance unique to specific variables The reading discusses PAF PCA a little more common these days Very similar, especially as number of variables increases

How does it work mathematically? Tries to find lines, planes, and hyperplanes in the K-dimensional space (K variables) Which best fit the data

Figure From Eriksson et al (2006)

How does it work mathematically? First factor tries to find a combination of variable-weightings that gets the best fit to the data Second factor tries to find a combination of variable-weightings that best fits the remaining unexplained variance Third factor tries to find a combination of variable-weightings that best fits the remaining unexplained variance…

How does it work mathematically? Factors are then made orthogonal (e.g. uncorrelated to each other) – Uses statistical process called factor rotation, which takes a set of factors and re-fits to maintain equal fit while minimizing factor correlation – Essentially, there is a large equivalence class of possible solutions; factor rotation tries to find the solution that minimizes between-factor correlation

Goodness What proportion of the variance in the original variables is explained by the factoring? (e.g. r 2 – called in Factor Analysis land an estimate of the communality) Better to use cross-validated r 2 – Still not standard

How many factors? Best approach: decide using cross-validated r 2 Alternate approach: drop any factor with fewer than 3 strong loadings Alternate approach: add factors until you get an incomprehensible factor – But one person’s incomprehensible factor is another person’s research finding!

How many factors? Best approach: decide using cross-validated r 2 Alternate approach: drop any factor with fewer than 3 strong loadings Alternate approach: add factors until you get an incomprehensible factor – But one person’s incomprehensible factor is another person’s research finding! – WTF!

Relatively robust to violations of assumptions Non-linearity of relationships between variables – Leads to weaker associations Outliers – Leads to weaker associations Low correlations between variables – Leads to weaker associations

Desired Amount of Data At least 5 data points per variable (Gorsuch, 1983) At least 3-6 data points per variable (Cattell, 1978) At least 100 total data points (Gorsuch, 1983) Comrey and Lee (1992) guidelines for total sample size – 100= poor – 200 = fair – 300 = good – 500 = very good – 1,000 or more = excellent

Desired Amount of Data At least 5 data points per variable (Gorsuch, 1983) At least 3-6 data points per variable (Cattell, 1978) At least 100 total data points (Gorsuch, 1983) Comrey and Lee (1992) guidelines for total sample size – 100= poor – 200 = fair – 300 = good – 500 = very good – 1,000 or more = excellent My opinion: cross-validation controls for this

Desired Amount of Data More important for confirmatory factor analysis than exploratory factor analysis Why might this be?

OK you’ve done a factor analysis, and you’ve got scales One more thing to do before you publish

OK you’ve done a factor analysis, and you’ve got scales One more thing to do before you publish Check internal reliability of scales Cronbach’s 

N = number of items C = average inter-item covariance (averaged at subject level) V = average variance (averaged at subject level)

Cronbach’s  : magic numbers (George & Mallory, 2003) > 0.9 Excellent Good Acceptable Questionable Poor < 0.5 Unacceptable

Related Topic Clustering Not the same as factor analysis – Factor analysis finds how data features/variables/items group together – Clustering finds how data points/students group together In many cases, one problem can be transformed into the other But conceptually still not the same thing Next class!

Asgn. 7 Questions? Comments?

Next Class Wednesday, March 14 3pm-5pm AK232 Clustering Readings Witten, I.H., Frank, E. (2005)Data Mining: Practical Machine Learning Tools and Techniques. Sections 4.8, 6.6. Assignments Due: 7. Clustering

The End