Fall 2015. Statistical Models For Crash Data Modeling Process Determine Modeling Objectives Definition (Intersections, Pedestrians, etc.) Data availability.


1 Fall 2015

2 Statistical Models For Crash Data: Modeling Process
• Determine Modeling Objectives
  ◦ Definition (Intersections, Pedestrians, etc.)
  ◦ Data availability
  ◦ Unit Scales (Crashes/year; Severity; etc.)
• Establish Appropriate Process
  ◦ Sampling Models
  ◦ Observational Models
  ◦ Process/System State Models
  ◦ Parameter Models (Bayesian Models Only)

3 Statistical Models For Crash Data: Modeling Process
• Determine Inferential Goals
  ◦ Point estimate (Value + Standard Error)
  ◦ Distribution (Bayesian Models)
  ◦ Percentiles (2.5%, 85%, etc.; Bayesian Models)
• Select Computation Techniques
  ◦ Frequentist (MLE)
  ◦ Bayesian (via simulation)
  ◦ Empirical Bayes
• Evaluate Models
  ◦ Goodness-of-Fit
  ◦ Prediction
  ◦ Confidence Intervals

4 Statistical Models For Crash Data: Data and Methodological Issues Associated with Crash-Frequency Data
Data/Methodological Issue: Associated Problems
• Overdispersion: Can violate some of the basic count-data modeling assumptions of some modeling approaches.
• Underdispersion: As with overdispersion, can violate some of the basic count-data modeling assumptions of some modeling approaches.
• Time-varying explanatory variables: Averaging of variables over studied time intervals ignores potentially important variations within time intervals, which can result in erroneous parameter estimates.
• Temporal and spatial correlation: Correlation over time and space causes losses in estimation efficiency.
• Low sample mean and small sample size: Causes an excess number of observations where zero crashes are observed, which can cause errors in parameter estimates.
• Injury severity and crash type correlation: Correlation between severities and crash types causes losses in estimation efficiency when separate severity-count models are estimated.
• Underreporting: Can distort model predictions and lead to erroneous inferences with regard to the influence of explanatory variables.
• Omitted variables bias: If significant variables are omitted from the model, parameter estimates will be biased and possibly erroneous inferences with regard to the influence of explanatory variables will result.
• Endogenous variables: If endogenous variables are included without appropriate statistical corrections, parameter estimates will be biased and erroneous inferences with regard to the influence of explanatory variables may be drawn.
• Functional form: If an incorrect functional form is used, the result will be biased parameter estimates and possibly erroneous inferences with regard to the influence of explanatory variables.
• Fixed parameters: If parameters are estimated as fixed when they actually vary across observations, the result will be biased parameter estimates and possibly erroneous inferences with regard to the influence of explanatory variables.

5 Statistical Models For Crash Data: Summary of Existing Models for Analyzing Crash-Frequency Data
• Poisson
  ◦ Advantages: Most basic model; easy to estimate
  ◦ Disadvantages: Cannot handle over- and under-dispersion; negatively influenced by the low sample mean and small sample size bias
• Negative binomial/Poisson-gamma
  ◦ Advantages: Easy to estimate; can account for overdispersion
  ◦ Disadvantages: Cannot handle under-dispersion; can be adversely influenced by the low sample mean and small sample size bias
• Poisson-lognormal
  ◦ Advantages: More flexible than the Poisson-gamma to handle over-dispersion
  ◦ Disadvantages: Cannot handle under-dispersion; can be adversely influenced by the low sample mean and small sample size bias (less than the Poisson-gamma); cannot estimate a varying dispersion parameter
• Zero-inflated Poisson and negative binomial
  ◦ Advantages: Handles datasets that have a large number of zero-crash observations
  ◦ Disadvantages: Can create theoretical inconsistencies; zero-inflated negative binomial can be adversely influenced by the low sample mean and small sample size bias
• Conway-Maxwell-Poisson
  ◦ Advantages: Can handle under- and over-dispersion or combination of both using a variable dispersion (scaling) parameter
  ◦ Disadvantages: Could be negatively influenced by the low sample mean and small sample size bias; no multivariate extensions available to date
• Gamma
  ◦ Advantages: Can handle under-dispersed data
  ◦ Disadvantages: Truncated distribution (full gamma function); independence of data (incomplete gamma function)
• Generalized estimating equation models
  ◦ Advantages: Can handle temporal correlation
  ◦ Disadvantages: May need to determine or evaluate the type of temporal correlation a priori; results sensitive to missing values
• Generalized additive models
  ◦ Advantages: More flexible than the traditional generalized estimating equation models; allows non-linear variable interactions
  ◦ Disadvantages: Relatively complex to implement; may not be easily transferable to other datasets

6 Statistical Models For Crash Data: Summary of Existing Models for Analyzing Crash-Frequency Data
• Random-effects models
  ◦ Advantages: Handles temporal and spatial correlation
  ◦ Disadvantages: May not be easily transferable to other datasets
• Negative multinomial
  ◦ Advantages: Can account for overdispersion and serial correlation; panel count data
  ◦ Disadvantages: Cannot handle under-dispersion; can be adversely influenced by the low sample mean and small sample size bias
• Random-parameters models
  ◦ Advantages: More flexible than the traditional fixed parameter models in accounting for unobserved heterogeneity
  ◦ Disadvantages: Complex estimation process; may not be easily transferable to other datasets
• Bivariate/multivariate models
  ◦ Advantages: Can model different crash types simultaneously; more flexible functional form than the generalized estimating equation models (can use non-linear functions)
  ◦ Disadvantages: Complex estimation process; requires formulation of correlation matrix
• Finite mixture/Markov Switching
  ◦ Advantages: Can be used for analyzing sources of dispersion in the data
  ◦ Disadvantages: Complex estimation process; may not be easily transferable to other datasets
• Duration models
  ◦ Advantages: By considering the time between crashes (as opposed to crash frequency directly), allows for a very in-depth analysis of data and duration effects
  ◦ Disadvantages: Requires more detailed data than traditional crash frequency models; time-varying explanatory variables are difficult to handle
• Hierarchical/Multilevel Models
  ◦ Advantages: Can handle temporal, spatial and other correlations among groups of observations
  ◦ Disadvantages: May not be easily transferable to other datasets; correlation results can be difficult to interpret
• Neural Network, Bayesian Neural Network, and support vector machine
  ◦ Advantages: Nonparametric approach does not require an assumption about distribution of data; flexible functional form; usually provides better statistical fit than traditional parametric models
  ◦ Disadvantages: Complex estimation process; may not be transferable to other datasets; work as black boxes; may not have interpretable parameters

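7 Review of Multivariate Linear Models
Ordinary Least Squares Method: This estimation technique is used for estimating unknown coefficients. It consists of solving p = k + 1 simultaneous linear equations obtained by minimizing the sum of squared errors. Let
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i, \quad i = 1, 2, \ldots, n$$
Note: $E(\varepsilon) = 0$ and $\mathrm{var}(\varepsilon) = \sigma^2$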

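8 Review of Multivariate Linear Models
The least squares function S is given by
$$S(\beta_0, \beta_1, \ldots, \beta_k) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij} \right)^2$$
The function S is to be minimized with respect to $\beta_0, \beta_1, \ldots, \beta_k$. The least squares estimators, say $b_0, b_1, \ldots, b_k$, must satisfy
$$\left. \frac{\partial S}{\partial \beta_0} \right|_{b_0, \ldots, b_k} = -2 \sum_{i=1}^{n} \left( y_i - b_0 - \sum_{j=1}^{k} b_j x_{ij} \right) = 0$$
and
$$\left. \frac{\partial S}{\partial \beta_j} \right|_{b_0, \ldots, b_k} = -2 \sum_{i=1}^{n} \left( y_i - b_0 - \sum_{j=1}^{k} b_j x_{ij} \right) x_{ij} = 0, \quad j = 1, 2, \ldots, k$$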

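9 Review of Multivariate Linear Models
It is easier to solve the equations by using a matrix format. The equations can be written the following way:
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$
where
$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
\mathbf{X} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1k} \\ 1 & x_{21} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{bmatrix}, \quad
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$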

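10 Review of Multivariate Linear Models
Need to find the least squares estimator b that minimizes
$$S(\boldsymbol{\beta}) = \sum_{i=1}^{n} \varepsilon_i^2 = \boldsymbol{\varepsilon}'\boldsymbol{\varepsilon} = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$$
It can be shown that S(β) can be expressed this way:
$$S(\boldsymbol{\beta}) = \mathbf{y}'\mathbf{y} - 2\boldsymbol{\beta}'\mathbf{X}'\mathbf{y} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}$$
The least squares estimator* must satisfy
$$\left. \frac{\partial S}{\partial \boldsymbol{\beta}} \right|_{\mathbf{b}} = -2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{0}$$
which simplifies to
$$\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{y}, \quad \text{so that} \quad \mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$
* b is called the ordinary least squares estimator of β.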
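As a minimal sketch of the matrix approach, the ordinary least squares estimator can be obtained by solving the normal equations X'Xb = X'y. The helper names (`transpose`, `matmul`, `solve`, `ols`) and the small data set are hypothetical and chosen only for illustration; the data are generated exactly from y = 1 + 2·x1 + 3·x2, so the recovered coefficients should be close to (1, 2, 3).

```python
def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, rhs):
    """Solve a @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(x_rows, y):
    """Return b solving X'X b = X'y; a column of ones is added for b0."""
    X = [[1.0] + list(row) for row in x_rows]
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2
x_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in x_rows]
b = ols(x_rows, y)
print([round(v, 6) for v in b])
```

In practice a numerical library would be used instead of hand-written elimination; the pure-Python version is only meant to make each step of the normal equations visible.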

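11 Review of Multivariate Linear Models
Maximum Likelihood Method: The likelihood function is found from the joint probability distribution of the observations. Same model as before:
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$
Given the assumption that the errors are normally distributed and the variance $\sigma^2$ is constant, the likelihood function is the following (normal distribution):
$$L(\mathbf{y}, \mathbf{X}, \boldsymbol{\beta}, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left( -\frac{1}{2\sigma^2} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) \right)$$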

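12 Review of Multivariate Linear Models
The maximum likelihood estimators are the values of the parameters β and $\sigma^2$ that maximize the likelihood function. Maximizing the likelihood is equivalent to maximizing the log-likelihood, $\ln L$. The log-likelihood is:
$$\ln L(\mathbf{y}, \mathbf{X}, \boldsymbol{\beta}, \sigma^2) = -\frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln(\sigma^2) - \frac{1}{2\sigma^2} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$$
The derivative of the log-likelihood function is called the score function. Taking the derivatives with respect to the coefficients β and equating to zero yields
$$\left. \frac{\partial \ln L}{\partial \boldsymbol{\beta}} \right|_{\mathbf{b}, \hat{\sigma}^2} = \frac{1}{\hat{\sigma}^2} \mathbf{X}'(\mathbf{y} - \mathbf{X}\mathbf{b}) = \mathbf{0}, \quad \text{so that} \quad \mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$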

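13 Review of Multivariate Linear Models
Taking the partial derivative with respect to $\sigma^2$ gives
$$\left. \frac{\partial \ln L}{\partial \sigma^2} \right|_{\mathbf{b}, \hat{\sigma}^2} = -\frac{n}{2\hat{\sigma}^2} + \frac{1}{2\hat{\sigma}^4} (\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b}) = 0$$
Which is
$$\hat{\sigma}^2 = \frac{(\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b})}{n}$$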
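The following sketch illustrates the variance estimate that results from setting the derivative of the normal log-likelihood with respect to σ² to zero: the sum of squared residuals divided by n. For comparison it also returns the usual unbiased estimate, which divides by n − p instead; that comparison is an addition here and is not taken from the slides. The function name and the data are hypothetical, and the fitted values are those of the least squares line through the four points.

```python
def sigma2_estimates(y, fitted, n_params):
    """Return (MLE, unbiased) estimates of the error variance.

    MLE      = SSE / n
    unbiased = SSE / (n - p), with p the number of estimated coefficients
    """
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    n = len(y)
    return sse / n, sse / (n - n_params)


# Hypothetical data: y observed at x = 0, 1, 2, 3; least squares line 1.3 + 0.8 x
y = [1.0, 2.0, 4.0, 3.0]
fitted = [1.3 + 0.8 * x for x in range(4)]
sigma2_mle, sigma2_unbiased = sigma2_estimates(y, fitted, n_params=2)
print(sigma2_mle, sigma2_unbiased)
```

The maximum likelihood estimate is always the smaller of the two, and the gap shrinks as n grows relative to the number of coefficients.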

14 Generalized Linear Models
In the previous overheads, it was obvious how the normal distribution played an important role in estimating the coefficients and in making inferences for probabilistic models. Unfortunately, there are many practical situations where the normal assumption is not valid. Count data, binary responses (0 or 1), or continuous variables with a positive and highly skewed distribution cannot be modeled with normally distributed errors. The generalized linear model (GLM) was developed to allow fitting regression models for univariate response data that follow a very general family of distributions called the exponential family. This family includes the normal, binomial, negative binomial, geometric, gamma, etc.

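15 Statistical Models For Crash Data: Poisson-gamma Model (NB)
The crash count (or any count) follows a Poisson distribution:
$$P(y_i \mid \mu_i) = \frac{\mu_i^{y_i} e^{-\mu_i}}{y_i!}, \quad y_i = 0, 1, 2, \ldots$$
The count $y_i$, conditional on $\mu_i$, is Poisson with the conditional mean and variance given by
$$E(y_i \mid \mu_i) = \mathrm{Var}(y_i \mid \mu_i) = \mu_i$$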

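16 Statistical Models For Crash Data: Poisson-gamma Model (NB)
The PDF of the Poisson-gamma regression for $y_i$, written with dispersion parameter $\alpha$, is
$$f(y_i; \alpha, \mu_i) = \frac{\Gamma(y_i + 1/\alpha)}{\Gamma(1/\alpha)\, y_i!} \left( \frac{1}{1 + \alpha\mu_i} \right)^{1/\alpha} \left( \frac{\alpha\mu_i}{1 + \alpha\mu_i} \right)^{y_i}$$
The mean and variance are given by
$$E(y_i) = \mu_i, \quad \mathrm{Var}(y_i) = \mu_i + \alpha\mu_i^2$$
The mean function is given by
$$\mu_i = \exp\left( \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} \right)$$
or
$$\ln \mu_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}$$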
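A quick numerical check can make the Poisson-gamma (negative binomial) mean and variance concrete. The sketch below assumes the common parameterization with dispersion parameter α, in which Var(y) = μ + αμ²; the slide's own notation is not preserved in the transcript, so this choice is an assumption. The function names and the value μ = 4.0 are hypothetical; α = 0.3153 is the dispersion estimate reported in the GENMOD output of this presentation.

```python
import math


def nb_pmf(y, mu, alpha):
    """Negative binomial probability of count y with mean mu, dispersion alpha."""
    r = 1.0 / alpha  # inverse dispersion
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)


def nb_moments(mu, alpha, y_max=2000):
    """Sum the pmf over 0..y_max and return (total probability, mean, variance)."""
    probs = [nb_pmf(y, mu, alpha) for y in range(y_max + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var


mu, alpha = 4.0, 0.3153  # mu is hypothetical; alpha is the GENMOD dispersion
total, mean, var = nb_moments(mu, alpha)
print(total, mean, var, mu + alpha * mu ** 2)
```

The summed variance should agree with μ + αμ², which exceeds the Poisson variance μ whenever α > 0; this is how the model accommodates overdispersion.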

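17 Statistical Models For Crash Data: Poisson-gamma Model
Example – Crash Data at 3-legged signalized intersections
Functional form needed to model crash data:
$$\mu_i = \beta_0 F_{1i}^{\beta_1} F_{2i}^{\beta_2}$$
where
• $\mu_i$ = expected number of crashes
• $F_1$ = major traffic flow
• $F_2$ = minor traffic flow
Functional form (log-linear):
$$\ln \mu_i = \ln \beta_0 + \beta_1 \ln F_{1i} + \beta_2 \ln F_{2i}$$
Need to take the natural log of the flow variables.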

18 Statistical Models For Crash Data: Poisson-gamma Model

The GENMOD Procedure

Model Information
  Data Set              WORK.C
  Distribution          Negative Binomial
  Link Function         Log
  Dependent Variable    Total    Total

Number of Observations Read    255
Number of Observations Used    255

Criteria For Assessing Goodness Of Fit
  Criterion                    DF        Value    Value/DF
  Deviance                    252     288.8580      1.1463
  Scaled Deviance             252     288.8580      1.1463
  Pearson Chi-Square          252     312.6975      1.2409
  Scaled Pearson X2           252     312.6975      1.2409
  Log Likelihood                      836.0686
  Full Log Likelihood                -606.7989
  AIC (smaller is better)            1221.5978
  AICC (smaller is better)           1221.7578
  BIC (smaller is better)            1235.7628

Algorithm converged.

Analysis Of Maximum Likelihood Parameter Estimates
  Parameter    DF    Estimate    Standard Error    Wald 95% Confidence Limits    Wald Chi-Square    Pr > ChiSq
  Intercept     1    -10.0648            1.3659      -12.7420      -7.3876                 54.29        <.0001
  logf_maj      1      0.7517            0.1320        0.4929       1.0105                 32.41        <.0001
  logf_min      1      0.4837            0.0562        0.3735       0.5939                 74.01        <.0001
  Dispersion    1      0.3153            0.0519        0.2135       0.4170

NOTE: The negative binomial dispersion parameter was estimated by maximum likelihood.
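To show how the parameter estimates in this output translate into a prediction, the sketch below evaluates the log-link mean function with the reported coefficients for the intercept, `logf_maj` and `logf_min`. The function names and the two traffic flows (20,000 and 2,000) are hypothetical and serve only as an illustration.

```python
import math

# Estimates copied from the GENMOD output above
COEF = {"intercept": -10.0648, "logf_maj": 0.7517, "logf_min": 0.4837}


def predicted_crashes(f_major, f_minor, coef=COEF):
    """Expected crash count from the log-link model: exp(b0 + b1 ln F1 + b2 ln F2)."""
    eta = (coef["intercept"]
           + coef["logf_maj"] * math.log(f_major)
           + coef["logf_min"] * math.log(f_minor))
    return math.exp(eta)


def predicted_crashes_power_form(f_major, f_minor, coef=COEF):
    """Same prediction written as exp(b0) * F1^b1 * F2^b2."""
    return (math.exp(coef["intercept"])
            * f_major ** coef["logf_maj"]
            * f_minor ** coef["logf_min"])


# Hypothetical flows, for illustration only
mu = predicted_crashes(20000, 2000)
mu_power = predicted_crashes_power_form(20000, 2000)
print(round(mu, 3), round(mu_power, 3))
```

Both functions give the same value, which illustrates why taking the natural log of the flow variables lets a multiplicative flow model be estimated with a log link.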

19 Statistical Models For Crash Data: Statistical fit (Goodness of fit)
There are various methods for evaluating the statistical fit of models. The methods can be divided into two categories:
• Likelihood Statistics
  ◦ Log-Likelihood
  ◦ Deviance
  ◦ Pearson Chi-Square
  ◦ Akaike’s Information Criterion (AIC)
  ◦ Bayesian Information Criterion (BIC)
• Model Errors
  ◦ Mean Absolute Deviation
  ◦ Mean Squared Prediction Error

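20 Statistical Models For Crash Data: Log-likelihood
Poisson:
$$\ln L = \sum_{i=1}^{n} \left[ y_i \ln \mu_i - \mu_i - \ln(y_i!) \right]$$
NB:
$$\ln L = \sum_{i=1}^{n} \left[ \ln \Gamma(y_i + 1/\alpha) - \ln \Gamma(1/\alpha) - \ln(y_i!) + y_i \ln(\alpha\mu_i) - (y_i + 1/\alpha) \ln(1 + \alpha\mu_i) \right]$$
Where: $\mu_i$ is the fitted mean for observation i and $\alpha$ is the dispersion parameter.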

21 Statistical Models For Crash Data: Log-likelihood
Example – Crash Data at 3-legged signalized intersections:
• Poisson: -685.34
• NB: -606.80
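The two log-likelihoods can be sketched in a few lines. The negative binomial version below assumes the parameterization with dispersion parameter α (variance μ + αμ²), which is an assumption because the slide's formulas are not preserved in the transcript. The function names are hypothetical, and the small inputs at the bottom are illustrative only; they are not the intersection data set.

```python
import math


def poisson_loglik(y, mu):
    """Poisson log-likelihood: sum of y ln(mu) - mu - ln(y!)."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))


def nb_loglik(y, mu, alpha):
    """Negative binomial log-likelihood with dispersion alpha (assumed form)."""
    r = 1.0 / alpha
    return sum(
        math.lgamma(yi + r) - math.lgamma(r) - math.lgamma(yi + 1)
        + yi * math.log(alpha * mi) - (yi + r) * math.log1p(alpha * mi)
        for yi, mi in zip(y, mu)
    )


# Illustrative counts and fitted means (hypothetical)
y = [0, 1, 2, 7]
mu = [1.0, 1.5, 2.0, 3.0]
ll_poisson = poisson_loglik(y, mu)
ll_nb = nb_loglik(y, mu, alpha=0.3153)
print(ll_poisson, ll_nb)
```

As α approaches zero the negative binomial log-likelihood approaches the Poisson one, so the Poisson model can be viewed as the limiting case without overdispersion.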

22 Statistical Models For Crash Data: Statistical fit (Goodness of fit)
The deviance statistic is defined as twice the difference between the maximum log-likelihood achievable (y = μ) and the log-likelihood of the fitted model:
$$D = 2 \left[ \ln L(\mathbf{y}; \mathbf{y}) - \ln L(\hat{\boldsymbol{\mu}}; \mathbf{y}) \right]$$
When competing models are compared, the model with the lowest deviance offers the best statistical fit. A note of caution: this is only valid when the dispersion parameter Φ is the same for each competing model.

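23 Statistical Models For Crash Data: Statistical fit (Goodness of fit)
The deviance statistic for the Poisson model is the following:
$$D = 2 \sum_{i=1}^{n} \left[ y_i \ln\left( \frac{y_i}{\hat{\mu}_i} \right) - (y_i - \hat{\mu}_i) \right]$$
The deviance statistic for the Poisson-gamma model, with dispersion parameter $\alpha$, is the following:
$$D = 2 \sum_{i=1}^{n} \left[ y_i \ln\left( \frac{y_i}{\hat{\mu}_i} \right) - \left( y_i + \frac{1}{\alpha} \right) \ln\left( \frac{1 + \alpha y_i}{1 + \alpha\hat{\mu}_i} \right) \right]$$
In both expressions the term $y_i \ln(y_i / \hat{\mu}_i)$ is taken as zero when $y_i = 0$.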
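A minimal sketch of the two deviance statistics follows. It uses the standard Poisson deviance and, for the Poisson-gamma model, the form written with dispersion parameter α; that parameterization is an assumption, since the slide's equations did not survive in the transcript. The function names and the small inputs are hypothetical.

```python
import math


def _y_log_ratio(y, mu):
    """y * ln(y / mu), taken as 0 when y == 0."""
    return y * math.log(y / mu) if y > 0 else 0.0


def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum[y ln(y/mu) - (y - mu)]."""
    return 2.0 * sum(_y_log_ratio(yi, mi) - (yi - mi) for yi, mi in zip(y, mu))


def nb_deviance(y, mu, alpha):
    """Negative binomial deviance with dispersion alpha (assumed form)."""
    r = 1.0 / alpha
    return 2.0 * sum(
        _y_log_ratio(yi, mi) - (yi + r) * math.log((yi + r) / (mi + r))
        for yi, mi in zip(y, mu)
    )


# Illustrative counts and fitted means (hypothetical)
y = [0, 1, 2, 7]
mu = [1.0, 1.5, 2.0, 3.0]
print(poisson_deviance(y, mu), nb_deviance(y, mu, alpha=0.3153))
```

Both statistics are zero when every fitted mean equals its observed count, which is the saturated model that the deviance uses as its reference.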

25 Statistical Models For Crash Data: Statistical fit (Goodness of fit)
AIC:
$$\mathrm{AIC} = -2 \ln L + 2P$$
BIC:
$$\mathrm{BIC} = -2 \ln L + P \ln(n)$$
P = estimated coefficients + 1
n = number of observations
AIC and BIC penalize the fit when additional variables are added to the model.
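The standard definitions AIC = −2 ln L + 2P and BIC = −2 ln L + P ln(n) can be checked against the GENMOD output shown earlier in this presentation. The sketch below uses the reported full log-likelihood (−606.7989), n = 255 observations, and P = 4 (three estimated coefficients plus the dispersion parameter); the function names are hypothetical.

```python
import math


def aic(loglik, n_params):
    """Akaike's Information Criterion: -2 ln L + 2 P."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion: -2 ln L + P ln(n)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


# Values taken from the GENMOD output for the negative binomial model
full_loglik = -606.7989
n_params = 4    # intercept, logf_maj, logf_min, plus the dispersion parameter
n_obs = 255

aic_nb = aic(full_loglik, n_params)
bic_nb = bic(full_loglik, n_params, n_obs)
print(round(aic_nb, 4), round(bic_nb, 4))
```

The results agree with the AIC (1221.5978) and BIC (1235.7628) lines of the GENMOD output, which confirms that P counts the dispersion parameter in addition to the regression coefficients.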

26 AIC and BIC Statistical Models For Crash Data AIC: BIC: AIC and BIC penalize the fit when additional variables are added to the model.

27 Statistical Models For Crash Data: Statistical fit (Model Errors)
Mean Absolute Deviation (MAD)
This criterion has been proposed by Oh et al. (2003) to evaluate the fit of models. The Mean Absolute Deviation (MAD) calculates the average absolute difference between the estimated and observed values:
$$\mathrm{MAD} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|$$
Mean Squared Prediction Error (MSPE)
The Mean Squared Prediction Error (MSPE) is a traditional indicator of error and calculates the average squared difference between the estimated and observed values:
$$\mathrm{MSPE} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2$$
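The two error measures are simple to compute once predicted and observed counts are available. This is a minimal sketch in which both are averaged over the n observations; the function names and the three data points are hypothetical.

```python
def mean_absolute_deviation(predicted, observed):
    """Average absolute difference between predicted and observed values."""
    n = len(observed)
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / n


def mean_squared_prediction_error(predicted, observed):
    """Average squared difference between predicted and observed values."""
    n = len(observed)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n


# Hypothetical observed crash counts and model predictions
observed = [2, 0, 5]
predicted = [1.5, 0.5, 4.0]
mad = mean_absolute_deviation(predicted, observed)
mspe = mean_squared_prediction_error(predicted, observed)
print(mad, mspe)
```

Because MSPE squares each error, it penalizes a few large misses more heavily than MAD does, so the two measures can rank competing models differently.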

28 Recent Models for Over-dispersion:
  ◦ Poisson-lognormal
    • Poisson mean follows a lognormal distribution
  ◦ Poisson-Weibull
    • Poisson mean follows a Weibull distribution
  ◦ Random-Parameters (investigation of the variance)
  ◦ Negative Binomial-Lindley (highly dispersed data)
    • Overcome problems with zero-inflated models.
  ◦ Generalized Sichel (highly dispersed data)
  ◦ Generalized Waring (highly dispersed data – investigation of variance)
  ◦ Finite mixture (Poisson and Poisson-gamma – investigation of variance and structure of data)
  ◦ Bayesian Model Averaging (automatically compare different models)
  ◦ See AA&P and Safety Science for info on some of these models.

29 Recent Models for Under-dispersion:
  ◦ Not very common; usually with low sample mean and often based on model output (conditional on the mean).
  ◦ All the models below can also be used for over-dispersion
  ◦ Gamma time-dependent
    • Observations not independent.
  ◦ Conway-Maxwell-Poisson
    • Has become increasingly popular
  ◦ Double-Poisson
    • Work published
  ◦ Hyper-Poisson
    • Work published

30 Low Mean Issue
• Crash data often have the characteristic that the mean μ can be very low (below 1.0)
• This creates problems with goodness-of-fit and prediction
• Read papers by
  ◦ Wood, G.R. (2002) Generalised Linear Accident Models and Goodness of Fit Testing. Accident Analysis & Prevention, Vol. 34, pp. 417-427.
  ◦ Lord, D. (2006) Modeling Motor Vehicle Crashes using Poisson-gamma Models: Examining the Effects of Low Sample Mean Values and Small Sample Size on the Estimation of the Fixed Dispersion Parameter. Accident Analysis & Prevention, Vol. 38, No. 4, pp. 751-766.

31 Statistical Models For Crash Data Low Mean Issue

32 Statistical Models For Crash Data Time Trend Effects

33 Statistical Models For Crash Data: Time Trend Effects
Goal: capture changes that vary from year to year directly in the model.
Model structure: the time trend is captured with the intercept (i.e., one intercept for each year).
Characteristic: each year is defined as a different observation.
Issues: Since each site is observed at different points in time, temporal serial correlation exists and affects the statistical inferences drawn from the models. Therefore, this correlation needs to be accounted for in the model.
Modeling approach: Generalized Estimating Equations (GEE), Random-Effects models, etc.
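One way to picture a model with one intercept for each year is a log-link mean function in which only the intercept changes from year to year. The sketch below is an assumed form, not the slide's equation (which is not preserved in the transcript), and every number in it is hypothetical.

```python
import math

# Hypothetical year-specific intercepts and flow coefficients (not from the slides)
YEAR_INTERCEPTS = {2012: -10.2, 2013: -10.1, 2014: -10.0}
B_MAJOR, B_MINOR = 0.75, 0.48


def expected_crashes(year, f_major, f_minor):
    """Expected crashes for one site-year: exp(b0[year] + b1 ln F1 + b2 ln F2)."""
    eta = (YEAR_INTERCEPTS[year]
           + B_MAJOR * math.log(f_major)
           + B_MINOR * math.log(f_minor))
    return math.exp(eta)


# Same hypothetical site (same flows) evaluated in each year
by_year = {year: expected_crashes(year, 20000, 2000) for year in YEAR_INTERCEPTS}
for year, mu in sorted(by_year.items()):
    print(year, round(mu, 3))
```

With flows held fixed, the ratio of expected crashes between two years depends only on the difference between their intercepts. The sketch covers the mean structure only; the serial correlation between observations of the same site still has to be handled by the estimation method (for example GEE or random effects).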

34 The Bayes Method
• The Bayes method approaches the analysis of data differently than the classical (frequentist) method
• Subjective judgment is more easily incorporated with the observed data and models
• Unknown coefficients of regression models are treated as random variables
• Data analysis is less limited by the number of observations (can be supplemented with subjective judgment)
• Computationally intensive (no longer an issue)

35 The Bayes Method
• The Bayes method makes inferences from data using probability models for quantities that are observed and for quantities one is interested in learning about
• Bayesian data analysis can be divided into three steps:
  ◦ Setting up a full probability model: provide a joint probability distribution for all observable and unobservable quantities
  ◦ Conditioning on observed data: calculating and interpreting the appropriate posterior distribution (conditional probability distribution)
  ◦ Evaluating the fit of the model and the implications of the posterior distribution
• Emphasis placed on interval estimation (confidence interval) rather than hypothesis testing
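The first two steps can be illustrated with the simplest conjugate case: a Poisson count model whose mean has a gamma prior. This is only a sketch under stated assumptions; the prior parameters, the counts, and the function name are hypothetical, and the gamma distribution is written with shape and rate parameters.

```python
def gamma_poisson_posterior(prior_shape, prior_rate, counts):
    """Posterior of a Poisson mean under a Gamma(shape, rate) prior.

    Step 1 (full model): mean ~ Gamma(shape, rate), counts ~ Poisson(mean).
    Step 2 (conditioning): posterior is Gamma(shape + sum(counts), rate + n).
    Returns (posterior shape, posterior rate, posterior mean, posterior variance).
    """
    post_shape = prior_shape + sum(counts)
    post_rate = prior_rate + len(counts)
    post_mean = post_shape / post_rate
    post_var = post_shape / post_rate ** 2
    return post_shape, post_rate, post_mean, post_var


# Hypothetical prior and yearly crash counts at one site
shape, rate, post_mean, post_var = gamma_poisson_posterior(2.0, 1.0, [3, 1, 4, 2])
print(shape, rate, post_mean, post_var)
```

The posterior mean (2.4) lies between the prior mean (2.0) and the sample mean (2.5), which shows how the prior and the observed data are combined. Models without a conjugate form are typically handled by simulation instead.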

36 The Empirical Bayes (EB) Method
• For the EB method, a different weight is assigned to the prior distribution and to the standard estimate
• In safety analyses, the weights are estimated with the assumption that the mean (μ) for each site follows a Gamma distribution
• The EB estimate has been found to outperform other estimates, such as the MLE
• The EB framework is presented on the next overhead

37 Formulation: where Dispersion parameter of NB regression Mean of a Poisson-gamma regression
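Since the formulation itself is not preserved in the transcript, the sketch below uses the commonly cited EB form for a Poisson-gamma model with variance μ + αμ²: a weighted average of the model prediction and the observed count, with weight w = 1 / (1 + αμ). That form, the function name, and all input values are assumptions for illustration and are not meant to reproduce the figures of the worked example.

```python
def eb_estimate(mu_hat, y_obs, alpha):
    """Empirical Bayes estimate for one site and one period (assumed form).

    mu_hat : mean predicted by the Poisson-gamma (NB) regression
    y_obs  : observed crash count at the site
    alpha  : NB dispersion parameter, with Var(y) = mu + alpha * mu**2
    Returns (EB estimate, weight given to the model prediction).
    """
    w = 1.0 / (1.0 + alpha * mu_hat)
    return w * mu_hat + (1.0 - w) * y_obs, w


# Hypothetical inputs
eb, w = eb_estimate(mu_hat=4.0, y_obs=10, alpha=0.25)
print(eb, w)
```

With these inputs the weight is 0.5, so the EB estimate (7.0) sits halfway between the model prediction and the observed count. A larger dispersion parameter lowers the weight on the model and pulls the estimate toward the observed value.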

38 Using the same example shown earlier: $F_1$ = 24,164; $F_2$ = 3,392; y = 10
The values are estimated as follows (crashes per year):

39 [Figure: crashes per year by year (1, 2, …, t)]
• MLE estimate: 3.9
• EB estimate: 7.63
• Observed value: 10

