Exploratory Analysis of Crash Data


1 Exploratory Analysis of Crash Data
Fall 2017

2 Sampling Frame
The sampling frame is the list of the population (a general term) from which the sample is drawn. It is important to understand how the sampling frame defines the population represented.
Example: If the study seeks to identify the safety effects of traffic signals, the sampling frame should include a sample of signalized intersections in a given geographical area. If a control group is included, the sampling frame will also include the sites categorized under this group.
Figure: map of signalized intersections (Sig Int #1, #2, #9) and unsignalized intersections (Unsig Int #1, #2, #7).

3 Sampling Frame Map crashes for Year 1 Map crashes for Year 2

4 Sampling Frame
Figure: maps labeled with the number of crashes at each intersection.
Number of Crashes for Year 1: 3 10 5 2 7 1 1 4 2 11 2 6 3 1 8 10 5 1 2 4 6 1 3
Number of Crashes for Year 2: 6 3 7

5 Sampling Frame: Signalized Intersections Database
Table columns: Intersection Number; Crashes/Year; Traffic Flow – Major; Other Site Characteristics*
Table values (in extraction order): Year 1 11,500 2 3 12,000 10 10,000 9 6 6,300 12,200 6,100
* e.g., number of lanes, actuated signals, exclusive left-turn lane, etc.

6 Sampling Frame: Signalized Intersections Database
Crash Count 1 Intersection 1 Year 1 2 Crash Count 6 3 Intersection 9 Year 1 2

7 Sampling Frame: Unsignalized Intersections Database
Table columns: Intersection Number; Crashes/Year; Traffic Flow – Major; Other Site Characteristics*
Table values (in extraction order): Year 1 2 8,400 9,000 3 8,500 7 7,900 5 8,600 9,400 9 7,800
* e.g., number of lanes, actuated signals, exclusive left-turn lane, etc.

8 Histograms

9 Ogives Source: Washington et al. (2003)

10 Box Plots

11 Scatter Diagrams

12 Scatter Diagrams

13 Scatter Diagrams

14 Scatter Diagrams

15 Bar and Line Charts Source: Washington et al. (2003)

16 3D Bar Charts

17 Two by Two Tables
Crash Severity / Flow Range    < 5,000    5,000-9,999    ≥ 10,000
Fatal                          10         12             15
Non-Fatal Injury               100        120            135
PDO                            550        700            900

18 Maps

19 Maps – GIS Information

20 Confidence Intervals
Statistics calculated from samples, such as the sample average $\bar{X}$, the variance $s^2$, and the standard deviation $s$, are used to estimate the population parameters. For instance:
$\bar{X}$ is used as an estimate of the population mean $\mu$.
$s^2$ is used as an estimate of the population variance $\sigma^2$.
Interval estimates, defined as Confidence Intervals, allow inferences to be drawn about the population by providing an interval, with a lower and an upper value, within which the unknown parameter will lie with a prescribed level of confidence. In other words, the true value of the population parameter is expected to be located within the estimated interval at that level of confidence.

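21 Confidence Intervals: Confidence Interval for μ and Known σ²
95% CI: $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$
90% CI: $\bar{X} \pm 1.645\,\sigma/\sqrt{n}$
Any CI: $\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, where $z_{\alpha/2}$ is the standard normal value for a confidence level of $1-\alpha$.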

22 Confidence Intervals
Compute the 95% confidence interval for the mean vehicular speed. Assume the data are normally distributed. The sample size is 1,296 and the sample mean $\bar{X}$ is [value missing]. Suppose the population standard deviation (σ) has previously been computed to be 5.5.

23 Confidence Intervals: Answer
With σ = 5.5 and n = 1,296, the 95% margin of error is 1.96 × 5.5/√1,296 = 1.96 × 5.5/36 ≈ 0.30, so the 95% confidence interval for the mean vehicular speed is $\bar{X} \pm 0.30$.
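The interval for a mean with known σ can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's own calculation: the function name `mean_ci_known_sigma` is made up for this example, and because the sample mean is missing from the transcript, the value `HYPOTHETICAL_MEAN = 60.0` is purely a placeholder assumption. Only σ = 5.5 and n = 1,296 come from the slide.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci_known_sigma(sample_mean, sigma, n, confidence=0.95):
    """Two-sided confidence interval for a mean when sigma is known."""
    # z value that leaves (1 - confidence) / 2 in each tail of the normal curve
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# ASSUMPTION: the slide's sample mean is missing; 60.0 is a placeholder only.
HYPOTHETICAL_MEAN = 60.0

low, high = mean_ci_known_sigma(HYPOTHETICAL_MEAN, sigma=5.5, n=1296)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

Whatever the true sample mean is, the half-width of the interval depends only on σ, n, and the confidence level, so the margin of about 0.30 carries over unchanged.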

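24 Confidence Intervals: Confidence Interval for μ and Unknown σ²
When σ is unknown, the sample standard deviation s replaces it:
95% CI: $\bar{X} \pm 1.96\,s/\sqrt{n}$
90% CI: $\bar{X} \pm 1.645\,s/\sqrt{n}$
Any CI: $\bar{X} \pm z_{\alpha/2}\,s/\sqrt{n}$
Only valid if n > 30.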

25 Confidence Intervals: Answer
Same example: compute the 95% confidence interval for the mean vehicular speed. Assume the data are normally distributed. The sample size is 1,296 and the sample mean $\bar{X}$ is [value missing]. Now suppose a sample standard deviation (s) has previously been computed to be 4.41.
Answer: the 95% margin of error is 1.96 × 4.41/√1,296 = 1.96 × 4.41/36 ≈ 0.24, so the interval is $\bar{X} \pm 0.24$.

26 Confidence Intervals: Confidence Interval for a Population Proportion
The relative frequency in a population may sometimes be of interest. The confidence interval can be computed using the following equation:
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\,\hat{q}}{n}}$$
where $\hat{p}$ is an estimator of the proportion in the population and $\hat{q} = 1 - \hat{p}$. The normal approximation is only good when $n\hat{p} > 5$ and $n\hat{q} > 5$.

27 Confidence Intervals
A transportation agency located in a small city wants to know the percentage of people who were involved in a collision during the last calendar year. A random sample of 1,000 drivers is surveyed. In the sample, 110 drivers were involved in at least one collision. Compute the 90% CI.

28 Confidence Intervals: Answer
With $\hat{p}$ = 110/1,000 = 0.11 and $\hat{q}$ = 0.89, the 90% margin of error is 1.645 × √(0.11 × 0.89/1,000) ≈ 0.016, so the 90% CI is approximately 0.094 to 0.126 (9.4% to 12.6% of drivers).
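The collision-proportion example can be checked with a short Python sketch. The function name `proportion_ci` is an illustrative choice, not something from the slides; the inputs (110 of 1,000 drivers, 90% confidence) and the np > 5, nq > 5 rule are taken from the presentation.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.90):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    # Rule of thumb from the slides: the approximation needs np > 5 and nq > 5.
    if n * p_hat <= 5 or n * q_hat <= 5:
        raise ValueError("normal approximation not reliable: need np > 5 and nq > 5")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(successes=110, n=1000, confidence=0.90)
print(f"90% CI for the proportion: {low:.3f} to {high:.3f}")
```

Raising an error when the rule of thumb fails is a design choice for this sketch; an exact (binomial) interval would be the usual fallback in that case.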

29 Population Proportion

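30 Confidence Intervals: Confidence Interval for the Population Variance
When the population variance is of interest, the confidence interval can be computed using the following equation:
$$\frac{(n-1)\,s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)\,s^2}{\chi^2_{1-\alpha/2}}$$
where $\chi^2$ is Chi-Square with n-1 degrees of freedom, and $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ are the values with upper-tail areas $\alpha/2$ and $1-\alpha/2$.
Assumption: the population is normally distributed.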

31 Confidence Intervals
Using the earlier vehicular speed example, compute the 95% confidence interval for the variance of the speed distribution. A sample of 100 vehicles has shown a variance equal to [value missing] mph².

32 Confidence Intervals: Answer
The critical values are taken from the Chi-Square table, using n - 1 = 99 degrees of freedom.
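The variance interval can be sketched with the Python standard library alone. Two assumptions are worth flagging. First, the sample variance is missing from the transcript, so `HYPOTHETICAL_VARIANCE = 20.0` is a placeholder, not the slide's value. Second, instead of a table lookup, the chi-square critical values are obtained with the Wilson-Hilferty approximation, a swapped-in technique that is quite accurate for 99 degrees of freedom but is not what the slide uses. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def chi2_quantile_wh(p, dof):
    """Approximate chi-square quantile (Wilson-Hilferty cube-root approximation)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * sqrt(c)) ** 3


def variance_ci(sample_variance, n, confidence=0.95):
    """Confidence interval for a population variance, assuming normal data."""
    alpha = 1 - confidence
    dof = n - 1
    upper_crit = chi2_quantile_wh(1 - alpha / 2, dof)  # larger critical value
    lower_crit = chi2_quantile_wh(alpha / 2, dof)      # smaller critical value
    # Dividing by the larger critical value gives the lower bound, and vice versa.
    return dof * sample_variance / upper_crit, dof * sample_variance / lower_crit


# ASSUMPTION: the slide's sample variance is missing; 20.0 is a placeholder only.
HYPOTHETICAL_VARIANCE = 20.0

low, high = variance_ci(HYPOTHETICAL_VARIANCE, n=100, confidence=0.95)
print(f"95% CI for the variance: {low:.2f} to {high:.2f}")
```

Note that the interval is not symmetric around the sample variance, because the chi-square distribution is skewed.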

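33 The Chi-Square Goodness-of-Fit Test
A non-parametric test, useful for checking whether observations follow an assumed distribution (for example, the normal distribution). Each cell needs more than 5 observations. The test statistic is
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ and $E_i$ are the observed and estimated (expected) frequencies in cell i. If the value on the right-hand side is less than the critical Chi-Square value with n-1 degrees of freedom, the hypothesis that the observed and estimated values are the same cannot be rejected. If not, the observed and estimated values are not the same. The test can also be performed for two-way contingency tables.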
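As an illustration of the two-way contingency case, the sketch below applies the chi-square statistic to the crash severity by flow range counts shown on slide 17. Pairing the test with that table is an assumption made for this example, since the slides do not say the table was analyzed this way. The function name is illustrative, and the critical value 9.488 is the standard table entry for 4 degrees of freedom at the 5% level.

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if severity were independent of flow range
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Rows: Fatal, Non-Fatal Injury, PDO. Columns: < 5,000; 5,000-9,999; >= 10,000.
crash_table = [
    [10, 12, 15],
    [100, 120, 135],
    [550, 700, 900],
]
CRITICAL_5PCT_DF4 = 9.488  # chi-square table value, 4 degrees of freedom

statistic, dof = chi_square_independence(crash_table)
verdict = "cannot reject" if statistic < CRITICAL_5PCT_DF4 else "reject"
print(f"chi-square = {statistic:.2f}, dof = {dof}: {verdict} independence")
```

Every cell in this table has more than 5 observations, so the rule of thumb stated on the slide is satisfied.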

