Direction
Direction = 0: Northbound
Direction = 1: Southbound

Trip Time
Points to include in the paper:
- Trip time is a very powerful tool.
- Important to generate regularly and for every route.
- Can use to understand the effect of different variables on trip time.
- Can give the example of considering the impact of adding another stop, etc.

Trip Time Model
Modified Ahmed Version – outliers removed
tripmiles > 0 & tripmiles 0
X1 = Distance (in miles)
X2 = Scheduled Number of Stops
X3 = Direction or Southbound
X4 = AM Peak
X5 = PM Peak
X6 = Actual Number of Stops
X7 = Boardings + Alightings
X8 = Lift
X9 = Average Passenger Load
X10 = Total Dwell Time
X11 = Precipitation
X12 = Average Temperature
X13 = Summer (dummy variable if month = June thru August)
X14 = (Boardings + Alightings)^2
Ahmed says use this version
N = 53,072 (# of trips)

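The slide lists the variables but not the functional form. Assuming an ordinary linear regression (an assumption, since the form is not stated here), with T the trip time, \beta_0 the intercept, and \varepsilon the error term (symbols introduced only for illustration), the model would read:

```latex
T = \beta_0 + \sum_{i=1}^{14} \beta_i X_i + \varepsilon, \qquad X_{14} = X_7^{2}
```

Under this reading, each \beta_i is the expected change in trip time for a one-unit change in X_i with the other variables held fixed, and the squared term X_{14} lets the effect of boardings and alightings be nonlinear.
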
Trip Time Model
Modified Ahmed Version – outliers removed
tripmiles > 0 & tripmiles 0
X1 = Distance (in miles)
X2 = Scheduled Number of Stops
X3 = Direction or Southbound
X4 = AM Peak
X5 = PM Peak
X6 = Actual Number of Stops
X7 = Boardings + Alightings
X8 = Lift
X9 = Average Passenger Load
X10 = Total Dwell Time
X11 = Precipitation
X12 = Average Temperature
X13 = Summer (dummy variable if month = June thru August)
X14 = (Boardings + Alightings)^2
This is the same as the prior model, except that it does not exclude runs where the difference between boardings and alightings is greater than 100.
N = 53,130 (# of trips)

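To illustrate how a model of this kind could be used to gauge the effect of a single variable, such as one added stop, here is a minimal sketch assuming an ordinary least squares fit. The data are synthetic and noise-free, only two of the fourteen predictors are included, and the coefficient values (5, 3, 0.4) are invented for the example; none of this comes from the study.

```matlab
% Synthetic trip-level data (invented for illustration, not from the study)
miles = [8.0; 8.5; 9.0; 8.2; 9.4; 8.8];   % X1: distance in miles
stops = [30; 34; 41; 28; 37; 45];         % X6: actual number of stops

% Hypothetical "true" relationship used to generate the trip times
triptime = 5 + 3*miles + 0.4*stops;

% Design matrix with an intercept column, solved by least squares
X = [ones(size(miles)), miles, stops];
b = X \ triptime;                         % b = [intercept; per mile; per stop]

% Estimated change in trip time from one additional stop
extra_stop_effect = b(3);
```

Because the synthetic trip times contain no noise, the fit recovers the invented coefficients, so extra_stop_effect comes back as approximately 0.4 time units per stop. With real data the estimate would carry a standard error that should be reported alongside it.
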
Tips for Utilizing a Large Data Set

Managing a data set in excess of 6 million rows was a challenge that required careful thought and experimentation. Ultimately, we were able to arrange the data such that complex calculations could be performed very quickly.

Even with the added capacity of Excel 2007 (1,048,576 rows per worksheet), 6 million rows of data cannot be opened. Instead, we imported the data into a statistical software application called Matlab™. Initial calculations, however, were prohibitively slow. For example, with more than 54,000 unique bus trips, looping through each trip in order to calculate trip-level information such as trip distance, trip time, etc. took several hours. The keys to our ultimate success were 1) sorting the data and 2) setting up variables and scripts that optimized Matlab’s powers of calculation. Below are examples of the basic logic we used.

The first step was to create a unique identifier for each trip, which we called ‘bus_run’. The combination of fields in the data that makes up a unique trip is described earlier in the paper. The ‘unique’ function in Matlab not only identifies the unique values in a variable, it can also return the index of the first or last point in the data where each value occurs. In the script below, the indexed locations are stored in the ‘low’ and ‘hi’ variables, respectively. These two lines of code run in a matter of seconds.

    % create a unique trips variable
    [uniqueTrips,low,n] = unique(bus_run, 'first');
    [uniqueTrips,hi,n] = unique(bus_run, 'last');

Now we can calculate the total dwell time (and several other variables) for each trip by summing up all of the dwells in each trip. The ordering of the data by trip and the fact that we know the location of the first and last occurrence of each unique trip (‘low’ and ‘hi’ variables) enable us to loop through 54,000 trips and 6 million records in a matter of seconds.

    % trip-level averages and totals, using the first (low) and last (hi) row of each trip
    for x = 1:length(uniqueTrips)
        ave_load_perrun(x) = mean(estimated_load(low(x):hi(x)));
        total_dwell(x) = sum(dwell(low(x):hi(x)));
        total_ons(x) = sum(ons(low(x):hi(x)));
        total_offs(x) = sum(offs(low(x):hi(x)));
    end

Other calculations do not require a for loop and can be made almost instantaneously. For example, the ‘starttime’ of each trip is equal to the ‘leave_time’ of the first occurrence of each unique trip. As written below, this variable contains 54,311 records, each one indicating the time that the respective trip began its service.

    x = 1:length(uniqueTrips);
    starttime(x) = leave_time(low(x));

Similarly, the total trip time for each trip was calculated by subtracting the ‘leave_time’ of the first instance of a trip from the ‘leave_time’ of the last instance. Again, the result is a trip-level variable with 54,311 records. Total trip mileage is calculated in a similar way.

    % total trip time per run
    triptime(x) = leave_time(hi(x)) - leave_time(low(x));
    % total trip miles per run
    tripmiles(x) = train_mileage(hi(x)) - train_mileage(low(x));
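
The indexing logic above can be checked on a toy data set. The sketch below uses made-up ‘bus_run’ and ‘dwell’ values, already sorted by trip, and preallocates the output before looping. It also shows a loop-free alternative based on ‘cumsum’; that variant is a swapped-in technique added for comparison, not the method described above, and it relies on the same assumption that each trip occupies one contiguous block of rows.

```matlab
% Toy data: 7 stop-level records for 3 trips, sorted by trip
% (values are invented for illustration)
bus_run = [101; 101; 101; 205; 205; 310; 310];
dwell   = [ 10;  20;   5;  30;   0;  15;  25];

% Row index of the first and last record of each trip
[uniqueTrips, low] = unique(bus_run, 'first');
[uniqueTrips, hi]  = unique(bus_run, 'last');

% Loop version, with the output preallocated
total_dwell = zeros(numel(uniqueTrips), 1);
for x = 1:numel(uniqueTrips)
    total_dwell(x) = sum(dwell(low(x):hi(x)));
end

% Loop-free alternative: difference a running total at the trip boundaries
c = [0; cumsum(dwell)];
total_dwell_vec = c(hi + 1) - c(low);
```

Here ‘low’ is [1; 4; 6] and ‘hi’ is [3; 5; 7], and both versions give total dwell times of 35, 30, and 40 for the three trips. If the data were not sorted by trip, the rows between ‘low(x)’ and ‘hi(x)’ could include records from other trips, so sorting remains a precondition for either version.
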