1
Integrating Exploratory Factor Analysis And Confirmatory Factor Analysis To Find Robust Predictors Of Pedestrian/Bicyclist Crashes
Farah J. Al-Mahameed, Ph.D. Candidate
Xiao Qin, Ph.D., PE, Associate Professor
Robert James Schneider, Ph.D., Associate Professor
Md. Shaon, Ph.D. Candidate
University of Wisconsin-Milwaukee
2
Outline
Introduction
Exploratory Factor Analysis (EFA) vs. Confirmatory Factor Analysis (CFA)
Literature Review
Data Collection
EFA and CFA Results
Final Model and Discussion
Conclusions and Future Work
3
Introduction
According to the Fatality Analysis Reporting System (FARS) (1):
Pedestrian fatalities involving motor vehicles increased by 46%.
Bicyclist fatalities increased by 34%.
Previously used statistical methods aim to construct models that represent the direct relationships between explanatory and dependent variables.
This study uses exploratory factor analysis (EFA) to inform the construction of a confirmatory factor analysis (CFA) model.
(1) National Highway Traffic Safety Administration. Fatality Analysis Reporting System. Available online, accessed June 2018.
1. According to FARS, pedestrian fatalities involving motor vehicles increased by 46% (4,109 to 5,987) and bicyclist fatalities increased by 34% (628 to 840) between 2009 and [...]. Pedestrians and bicyclists, or vulnerable roadway users (VRUs), accounted for more than 18% of the 37,461 total US fatalities in 2016, up from a low of 13% in 2003.
2. Previously used statistical methods aim to construct models that represent the direct relationships between explanatory and dependent variables. However, the causes of crashes often involve intricate relationships among multiple variables, which such models may not adequately capture.
3. This study integrated exploratory factor analysis (EFA) and confirmatory factor analysis (CFA), which is a special case of structural equation modeling (SEM), to establish the relationship between pedestrian and bicyclist crashes and explanatory variables.
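As a quick check on the percentages quoted above, the relative increases can be recomputed from the fatality counts in the notes. This is only an arithmetic sketch; the function name `percent_increase` is introduced here for illustration.

```python
def percent_increase(before: float, after: float) -> float:
    """Relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# Fatality counts quoted in the speaker notes.
pedestrian = percent_increase(4109, 5987)
bicyclist = percent_increase(628, 840)

print(f"pedestrian: {pedestrian:.1f}%  bicyclist: {bicyclist:.1f}%")
# prints "pedestrian: 45.7%  bicyclist: 33.8%"
```

Both values round to the 46% and 34% shown on the slide.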
4
EFA vs. CFA
EFA:
Defines the content of factors using latent constructs and determines the factor structure.
Assumes random sampling and a linear relationship between observed variables.
Limited in that it does not provide causal inferences and assumes normally distributed data.
CFA:
Requires a priori specification of a model, including the number of factors.
Provides a fit of the hypothesized factor structure to the observed data.
Allows the researcher to specify correlated measurement errors, perform statistical comparisons of alternative models, test second-order factor models, and statistically compare the factor structure of two or more models.
5
Literature Review
(2) Kim, K., and E. Yamashita. Motor Vehicle Crashes and Land Use: Empirical Analysis from Hawaii. Transportation Research Record: Journal of the Transportation Research Board, Vol. 1784, 2002, pp. 73–79.
(3) Lee, J.-Y., J.-H. Chung, and B. Son. Analysis of Traffic Accident Size for Korean Highway Using Structural Equation Models. Accident Analysis & Prevention, Vol. 40, No. 6, 2008, pp. 1955–
(4) Schorr, J. P., and S. H. Hamdar. Safety Propensity Index for Signalized and Unsignalized Intersections: Exploration and Assessment. Accident Analysis & Prevention, Vol. 71, 2014, pp. 93–105.
(5) Wang, K., and X. Qin. Exploring Driver Error at Intersections: Key Contributors and Solutions. Transportation Research Record: Journal of the Transportation Research Board, Vol. 2514, 2015, pp. 1–9.
6
Data Collection
Among our 200 study corridors, most of which are located in southeast Wisconsin, 115 had at least one reported pedestrian crash and 67 had at least one reported bicycle crash.
7
EFA ─ Number of Latent Factors
[Figure: eigenvalues of the principal factors plotted against factor number, with the 2-factor and 3-factor solutions marked as candidates.]
Parallel analysis showed that the most suitable number of factors for these data is between 2 and 3 (the elbow); loading values for both scenarios are shown. The EFA with 2 factors has more non-significant variables and more cross-loading, which makes it harder to interpret. Comparing both scenarios using goodness-of-fit indices, the model with 3 latent constructs outperformed the other.
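For readers unfamiliar with parallel analysis, a minimal sketch follows. It assumes Horn's procedure applied to the eigenvalues of the full correlation matrix, which may differ from the principal-factor variant and the software used in the study. The function name, the number of simulations, the 95th-percentile cut-off, and the synthetic data are all illustrative assumptions, not the study's settings or data.

```python
import numpy as np

def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Horn's parallel analysis: keep the leading factors whose observed
    eigenvalue exceeds the chosen percentile of eigenvalues obtained from
    random normal data of the same shape."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    simulated = np.empty((n_sims, p))
    for i in range(n_sims):
        noise = rng.standard_normal((n, p))
        eig = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))
        simulated[i] = np.sort(eig)[::-1]
    threshold = np.percentile(simulated, percentile, axis=0)
    exceeds = observed > threshold
    # Index of the first eigenvalue that fails the test = number retained.
    n_factors = p if exceeds.all() else int(np.argmin(exceeds))
    return n_factors, observed, threshold

# Synthetic data (NOT the corridor data): 9 variables driven by 3 factors.
rng = np.random.default_rng(1)
n_obs, n_vars, true_k = 200, 9, 3
factors = rng.standard_normal((n_obs, true_k))
loadings = np.zeros((true_k, n_vars))
for k in range(true_k):
    loadings[k, 3 * k:3 * k + 3] = 0.8
data = factors @ loadings + 0.6 * rng.standard_normal((n_obs, n_vars))

n_factors, observed, threshold = parallel_analysis(data)
print("factors retained:", n_factors)
```

On this synthetic example the three planted factors are well separated from noise, so the procedure is expected to retain three. Real data, as in the slide, often give a less clear-cut answer.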
8
EFA ─ Factor Loading
The “Walk” variable, which represents walking activity, was grouped under factor 2 along with high speed limit, sidewalk density, and number of unsignalized intersections. This grouping makes it difficult to name the latent variables.
The bolded loadings are larger than the threshold of 0.40; a few variables are correlated with more than one factor.
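The screening rule described above (keep loadings above 0.40, flag variables that load on more than one factor) can be sketched as follows. The variable names and loading values are hypothetical placeholders, not the study's estimates, and `classify_loadings` is a helper introduced only for this illustration.

```python
import numpy as np

THRESHOLD = 0.40  # salience cut-off quoted on the slide

def classify_loadings(names, loadings, threshold=THRESHOLD):
    """Split variables into dropped, single-factor, and cross-loading groups."""
    salient = np.abs(np.asarray(loadings, dtype=float)) > threshold
    result = {"dropped": [], "assigned": {}, "cross_loading": []}
    for name, row in zip(names, salient):
        hits = np.flatnonzero(row)
        if hits.size == 0:
            result["dropped"].append(name)            # no salient loading
        elif hits.size == 1:
            result["assigned"][name] = int(hits[0]) + 1  # 1-based factor number
        else:
            result["cross_loading"].append(name)      # salient on 2+ factors
    return result

# Hypothetical loadings on three factors, NOT the study's estimates.
names = ["walk", "speed_limit", "poverty", "bike_lane"]
loadings = [
    [0.10, 0.62, 0.05],
    [0.45, 0.48, 0.02],
    [0.03, 0.12, 0.71],
    [0.22, 0.31, 0.08],
]
result = classify_loadings(names, loadings)
print(result)
```

In this toy input, `speed_limit` is flagged as cross-loading and `bike_lane` is dropped. Cross-loading variables are the ones a subsequent CFA has to assign to a single factor.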
9
EFA → CFA
64 variables
EFA: Reduce the data by excluding non-significant variables. A preliminary model structure is derived, since there is no previous knowledge about the structure.
CFA: Estimate the loading of each indicator variable on its corresponding latent factor. Derive the estimated interrelations, verify the model structure, find causal relations among variables, and resolve the cross-loading.
10
CFA Model Comparison
The three latent exogenous variables are:
bicycle/pedestrian-oriented roadway, measured by unsignalized intersection density, paved shoulder, bike lane, sidewalk, and posted speed limit;
exposure, measured by walk, bike, employment density, and AADT;
low social status, measured by poverty, education, wage, and vehicle ownership.

Model                                        RMSEA   CFI     AIC       Number of variables
SEM with two latent exogenous variables      0.083   0.853   1,104.3   9
SEM with three latent exogenous variables    0.070   0.915   378.7     13
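To make the comparison concrete, the sketch below shows the commonly used textbook formulas for RMSEA and CFI and then ranks the two models using the values reported in the table. The chi-square inputs are hypothetical, since the slides do not report them; some software uses N rather than N − 1 in the RMSEA denominator. The helper `preferred` is introduced only for this illustration.

```python
import math

def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (lower is better)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_model: float, df_model: int, chi2_base: float, df_base: int) -> float:
    """Comparative fit index against a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0.0 else 1.0 - d_model / d_base

# Hypothetical chi-square inputs, for illustration only.
example_rmsea = rmsea(chi2=120.0, df=62, n=200)
example_cfi = cfi(120.0, 62, 800.0, 78)

# Fit indices as reported in the comparison table.
reported = {
    "two_factor": {"rmsea": 0.083, "cfi": 0.853, "aic": 1104.3},
    "three_factor": {"rmsea": 0.070, "cfi": 0.915, "aic": 378.7},
}

def preferred(models):
    """Return the model that wins on all three indices, or None if they disagree."""
    best_rmsea = min(models, key=lambda m: models[m]["rmsea"])
    best_cfi = max(models, key=lambda m: models[m]["cfi"])
    best_aic = min(models, key=lambda m: models[m]["aic"])
    return best_rmsea if best_rmsea == best_cfi == best_aic else None

print(preferred(reported))  # prints "three_factor"
```

All three indices point the same way here: the three-factor model has the lower RMSEA, the higher CFI, and the lower AIC.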
11
Final CFA Model Results
12
Conclusions
In this study, 64 variables relating to pedestrian and bicycle safety were collected from mile-long street corridors. Among them, 13 variables are significant indicators of the three latent factors: pedestrian/bicycle-oriented roadway design, exposure, and social status. The three latent factors quantify corridor safety through a crash index, which is a latent factor measured by observed pedestrian and bicycle crashes.
The EFA discovers possible relationships among data items and informs the development of the CFA. The CFA fits the data to a model structure hypothesized from the EFA and a priori knowledge.
Combining EFA and CFA enables effective distinction between direct, indirect, and synergistic effects among variables and provides a better understanding of the intricate interrelationships among the influential factors.
A unique conclusion is derived from the correlation between two exogenous latent variables, which shows a high positive correlation between low social status and exposure. Owning zero vehicles (an indicator of low social status) increases an individual’s exposure and therefore their crash index, leading to more crash involvement. In addition, the presence of paved shoulders tends to decrease the crash index indirectly by improving the road design for pedestrians and bicyclists. Paved shoulders provide additional space for pedestrians and bicyclists outside of travel lanes, even out the elevation between the shoulder and the roadway, and reduce the presence of gravel or sand that may contribute to bicyclist crashes.
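The structure summarized above can be written compactly in conventional SEM notation. This is a sketch under stated assumptions: the symbols are not taken from the slides, and the authors' exact specification (for example, any correlated errors) is not recoverable from the transcript.

```latex
\begin{align}
  \mathbf{x} &= \Lambda_x \boldsymbol{\xi} + \boldsymbol{\delta},
    \qquad \operatorname{Cov}(\boldsymbol{\xi}) = \Phi \\
  \eta &= \gamma_1 \xi_{\text{design}} + \gamma_2 \xi_{\text{exposure}}
        + \gamma_3 \xi_{\text{status}} + \zeta \\
  y_{\text{ped}} &= \lambda_{\text{ped}}\, \eta + \varepsilon_{\text{ped}} \\
  y_{\text{bike}} &= \lambda_{\text{bike}}\, \eta + \varepsilon_{\text{bike}}
\end{align}
```

Here $\mathbf{x}$ collects the 13 observed indicators, $\boldsymbol{\xi} = (\xi_{\text{design}}, \xi_{\text{exposure}}, \xi_{\text{status}})^{\top}$ are the three exogenous latent factors, $\eta$ is the latent crash index, and $y_{\text{ped}}$ and $y_{\text{bike}}$ are the observed pedestrian and bicycle crashes. The off-diagonal element of $\Phi$ linking exposure and low social status corresponds to the positive correlation highlighted in the conclusions.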
13
Future Work
The database contained only crashes involving motor vehicles, as these appeared to be the most severe, but such crashes have been found to represent only a fraction of total pedestrian and bicycle crashes.
More sites are desirable to improve the model fit and the significance of the input variables.
Behavioral data could be used to ensure that these factors are well studied.
14
Thank you! Questions?