Boilé M., M.M. Golias, & S. Ivey. Contents Introduction Motivation Case study.


1 Boilé M., M.M. Golias, & S. Ivey

2 Contents Introduction Motivation Case study

3 Introduction: Freight demand modeling and regression techniques. Regression is one of the best and worst tools we have. Problems come from the data and from misleading performance measures. When data are limited, regression techniques cannot perform well (after all, they are pattern recognition techniques). Even worse, we sometimes rely on training-based measures of performance.

4 Typical regression. Input: We want to predict Y. We believe that a number of known inputs X can predict Y through some function. Black box (or not): Run an algorithm to select which of the chosen variables are actually meaningful and to estimate the parameters of the function. Output: Obtain a model and performance measures. Use the model.

5 Input: We want to predict Y. We believe that a number of known inputs X can predict Y through some function. Although it is not really the case, we assume that Y is a linear function of X. Is this assumption correct? Given our usual data availability, non-linear models will not (necessarily) perform better. What are the best inputs? Two mentalities: throw in whatever you can find, or use some rationale.

6 Black box (or not): Run an algorithm to select which of the chosen variables are actually meaningful and to estimate the parameters of the function. There are a number of algorithms for regression. Most of them select some of the X’s (variable selection). Some of them add constraints to the Y’s (constrained regression). Some of them constrain the effect of the X’s (shrinkage techniques).
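
To make "constraining the effect of the X’s" concrete, here is a minimal sketch of one shrinkage technique, ridge regression, reduced to a single centered predictor with no intercept. The data, the penalty values, and the function name `ridge_slope` are illustrative assumptions and do not come from the presentation.

```python
def ridge_slope(x, y, lam):
    """Slope of a no-intercept ridge fit of y on one predictor x.

    lam is the penalty weight (hypothetical values below); lam = 0
    gives the ordinary least-squares slope.
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)

# Made-up, already-centered data with a true slope near 2.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.9, 0.2, 2.1, 3.9]

# A larger penalty pulls the estimated slope toward zero.
slopes = {lam: ridge_slope(x, y, lam) for lam in (0.0, 1.0, 10.0, 100.0)}
for lam, slope in slopes.items():
    print(f"lambda = {lam:6.1f}  slope = {slope:.3f}")
```

The penalty trades a little bias for lower variance, which is the usual argument for shrinkage when observations are scarce.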

7 Output: Obtain a model and performance measures. Use the model. Two main measures of performance: How well does the model fit (R²)? Are the model and its inputs significant (p-values)? When many independent variables are used, variable selection techniques can lead to models with high R². Some accept performance measures based on the data used to train the model (not such a good idea). Some use what is called a hold-out sample (more appropriate).
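
The gap between training-based and hold-out R² can be illustrated with a small simulation. In this sketch the response and all candidate predictors are pure noise, the single best candidate is chosen by training R², and the chosen model is then scored on a hold-out sample. The sample sizes, the seed, and the number of candidates are arbitrary assumptions, not values from the case study.

```python
import random

def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(x, y, intercept, slope):
    """R^2 of a fitted line on (x, y); can be negative on unseen data."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

rng = random.Random(0)
n_train, n_holdout, n_candidates = 15, 60, 200  # arbitrary illustration sizes

# The response is pure noise, so no candidate truly explains it.
y_train = [rng.gauss(0, 1) for _ in range(n_train)]
y_holdout = [rng.gauss(0, 1) for _ in range(n_holdout)]

# "Variable selection": keep the candidate with the best training R^2.
best = None
for _ in range(n_candidates):
    x_train = [rng.gauss(0, 1) for _ in range(n_train)]
    x_holdout = [rng.gauss(0, 1) for _ in range(n_holdout)]
    a, b = fit_line(x_train, y_train)
    score = r_squared(x_train, y_train, a, b)
    if best is None or score > best[0]:
        best = (score, a, b, x_holdout)

train_r2, a, b, x_holdout = best
holdout_r2 = r_squared(x_holdout, y_holdout, a, b)
print(f"training R^2 = {train_r2:.2f}, hold-out R^2 = {holdout_r2:.2f}")
```

Because the winning predictor was picked for its fit to the training data, its training R² overstates what it can do on data it has not seen.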

8 Data, data, data. Selection of input: we need data. Performance measures: we need data. Testing of the model: we need data. So what can we do when we have limited data? Simulation looks like a good approach, and it has worked in other areas.

9 Markov Chain Monte Carlo (MCMC) simulation. Typical regression: a linear model with variable selection, solved in closed form with some heuristic (e.g. backward selection, forward selection) instead of going through all the possible subsets. Instead of a closed-form solution, we can assume prior and posterior distributions for the variables (we can also do that for the parameters, but let's talk about that some other time) and use simulation (more precisely, MCMC simulation). Why use simulation? The integrals are intractable, so MCMC simulation is used to go from the priors to the posteriors. Is it better? One way to find out!
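
As a toy illustration of going from a prior to a posterior by simulation, the sketch below runs a random-walk Metropolis sampler for the slope of a one-predictor regression. It is not the model from the presentation: it samples a single coefficient rather than variable-inclusion indicators, and the known noise level, the diffuse normal prior, the step size, and the synthetic data are all assumptions made for the example.

```python
import math
import random

def log_posterior(beta, x, y, sigma=1.0, prior_sd=100.0):
    """Unnormalized log posterior of the slope in y = beta * x + noise.

    Assumes a known noise standard deviation sigma and a diffuse
    N(0, prior_sd^2) prior on beta (both are illustration choices).
    """
    log_lik = -sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y)) / (2 * sigma ** 2)
    log_prior = -beta ** 2 / (2 * prior_sd ** 2)
    return log_lik + log_prior

def metropolis(x, y, n_iter=20000, step=0.3, seed=1):
    """Random-walk Metropolis sampler; returns the chain of beta draws."""
    rng = random.Random(seed)
    beta = 0.0
    current = log_posterior(beta, x, y)
    draws = []
    for _ in range(n_iter):
        proposal = beta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, x, y)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, candidate - current))
        if rng.random() < accept_prob:
            beta, current = proposal, candidate
        draws.append(beta)
    return draws

# Synthetic data with a true slope of 2 (made up for the example).
data_rng = random.Random(0)
x = [i / 5 - 2 for i in range(21)]
y = [2.0 * xi + data_rng.gauss(0.0, 1.0) for xi in x]

draws = metropolis(x, y)
kept = draws[2000:]  # discard the first draws as burn-in
posterior_mean = sum(kept) / len(kept)
ols_slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
print(f"posterior mean = {posterior_mean:.3f}, least-squares slope = {ols_slope:.3f}")
```

With a diffuse prior the posterior mean should land close to the least-squares slope. Sampling becomes worthwhile when the posterior has no convenient closed form, which is the situation the slide describes.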

10 Case Study: Prediction of truck volumes on state highways in New Jersey. Major assumption: Truck volumes can be predicted given socioeconomic data surrounding the highway.

11 Case study: Data. Dependent dataset: 270 locations throughout NJ (long- and short-duration classification counts). Long-duration counts: Weigh-In-Motion (WIM) locations. Short-duration vehicle classification counts. Vehicle classes 5 through 13 (FHWA classification). 34 independent variables: population; number of employees (11 SIC codes); sales volume (11 SIC codes); number of establishments (11 SIC codes).

12 Case Study: Traffic counts by roadway class

Table 1. Clustered Dataset by Highway FC and Count Availability

Group  Functional Class (FC)                                              Counts (# observations)
A      1, 2 (Rural interstate and major arterials)                        31
B      6, 7, 8, 9 (Rural minor arterials, collectors, and local)          51
C      11 (Urban interstate)                                              29
D      12 (Urban expressways and parkways)                                20
E      14 (Urban major arterials)                                         59
F      16, 17, 19 (Urban minor arterials, collectors, and local)          80

13 Case Study: Bandwidth of sections. Uniform highway sections: major interchanges, roadway functionality, geometry. Nine different bandwidths (0.25, 0.50, 0.75, 1.0, 1.25, 1.5, 2, 3, and 5 miles). Nine different models were estimated for each FC, one per bandwidth. Different models => sensitivity to the increasing size of the area.

14 Model. What do we want to achieve: 1. Select the most appropriate X’s out of a pool of candidate predictors. 2. Constrain the values of Y. 3. Constrain the influence of the selected X’s. A priori, none of the variables can explain truck volumes. The dependent variable can only take positive values. Diffuse priors.

15 Results. Models compared: Bayesian model (BRM); stepwise linear regression (SLR); statewide model (4-step planning model) (SWTM). Cross-validation with a 90%-10% estimation-validation dataset split. R² values.
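
One way to obtain 90%-10% estimation-validation splits is 10-fold cross-validation, sketched below. The slides do not say whether folds or repeated random splits were used, so the fold scheme is an assumption, as are the function name `ninety_ten_folds` and the seed. The sample size of 80 is borrowed from group F in Table 1 purely for illustration.

```python
import random

def ninety_ten_folds(n, seed=0):
    """Yield (estimation, validation) index lists for 10-fold cross-validation.

    Each validation fold holds about 10% of the n observations and the
    matching estimation set holds the remaining 90%.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::10] for k in range(10)]
    for k in range(10):
        validation = folds[k]
        estimation = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield estimation, validation

# 80 observations, as for group F in Table 1.
splits = list(ninety_ten_folds(80))
for estimation, validation in splits[:2]:
    print(len(estimation), "estimation /", len(validation), "validation")
```

Each observation is used for validation exactly once, so an R² computed on the validation folds never scores a model on data it was fitted to.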

16 Usability for Practitioner. BUGS: The best thing since sliced bread!!! It's free and easy to use. http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml

17 Boilé M., M.M. Golias, & S. Ivey

