1 Enhancing the Quality of Transferred Household Travel Survey Data: A Bayesian Updating Approach Using MCMC with Gibbs Sampling
Yongping Zhang
Kouros Mohammadian, PhD
Department of Civil and Materials Engineering, University of Illinois at Chicago
The 11th TRB National Transportation Planning Applications Conference, May 7, 2007

2 Data Transferability
The idea is to use data collected in one context in a new context. This can reduce or eliminate the need for a large data collection in the application context.
Previous Studies
- ITE trip generation tables
- NCHRP 365 (Nancy McGuckin, et al)
  - Highly aggregate
- ORNL’s NPTS/NHTS transferability study (Pat Hu, et al)
  - Aggregate (CT level)
- Data simulation (Stopher and Greaves)
  - Disaggregate (HH level), C&RT classification method, limited number of independent variables

3 Project Approach
- Consider a larger set of variables
- NHTS and CTPP datasets
- Use quantifiable variables that can be easily predicted or are available from other sources (e.g., PUMS)
- Consider variables representing land use, urban form, and transportation system characteristics
- Advanced clustering, updating, and simulation approaches

4 Data
- Data Sources
  - 2001 NHTS, 2000 CTPP, PUMS, 2003 TTI, Tiger/Line GIS data files
- Data Cleaning
  - 33 variables of demographics, socio-economics and land use
  - Individual level: Age group, Race/Ethnicity, Education, Occupation
  - Household level: HH size, Income, Adults, Vehicles, Drivers, Workers
  - Census tract level: Housing, Employment, and Population densities
- New Variables

5 New Variables
- Intersection density (Tiger/Line)
  - No. of intersections / Area
- Road density (Tiger/Line)
  - Road length / Area
- Pedestrian environment (Tiger/Line)
  - Block size: Road length / No. of intersections
- Transit friendly environment (CTPP)
  - Transit users / Total no. of workers
  - Transit trips / Total no. of trips
- Congestion factor
  - Travel time index (TTI report for 85 MSAs)
  - Avg. travel time / Free flow TT in that region
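The ratios on this slide can be sketched in a few lines of Python. This is only an illustration: the `TractInputs` container, its field names, the mile-based units, and the example numbers are all assumptions, and on the slide the congestion factor comes from the TTI report at the MSA level rather than from tract data.

```python
from dataclasses import dataclass


@dataclass
class TractInputs:
    """Hypothetical per-area inputs; field names and units are illustrative."""
    area_sq_mi: float
    n_intersections: int
    road_length_mi: float
    transit_workers: int
    total_workers: int
    avg_travel_time: float   # minutes, regional average
    free_flow_time: float    # minutes, regional free-flow travel time


def derived_variables(t: TractInputs) -> dict:
    """Compute the ratio-based variables listed on the slide."""
    return {
        "intersection_density": t.n_intersections / t.area_sq_mi,
        "road_density": t.road_length_mi / t.area_sq_mi,
        "block_size": t.road_length_mi / t.n_intersections,
        "transit_share_workers": t.transit_workers / t.total_workers,
        "congestion_factor": t.avg_travel_time / t.free_flow_time,
    }


# Made-up example values, not taken from the study
example = TractInputs(2.0, 120, 30.0, 150, 1000, 39.0, 30.0)
variables = derived_variables(example)
```

The transit-trip share (transit trips / total trips) follows the same pattern and is omitted for brevity.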

6 Dependent Variables
Travel Characteristics (from NHTS trip file aggregated to HH level)
- VMT for each household
- No. of trips
- No. of mandatory trips
- No. of maintenance trips
- No. of discretionary trips
- No. of transit trips in the HH
- No. of private vehicle trips
- No. of non-motorized (bicycle and walk) trips
- No. of tours
- Average trips per tour
- Average trip distance in miles for all HH members
- No. of transit users in the HH
- No. of carpool users in the HH
- Percentage of public transit usage in the HH
- Percentage of carpool usage among workers in the HH
- Total commute distance in the HH
- Average commute distance in the HH

8 Clustering
- Classification schema is a critical issue.
- Clustering methods tested include: K-Means, hierarchical, C&RT, TwoStep, ANN.
- 11 clusters were generated using the TwoStep clustering method.
- ONLY national data is used.

9 Clusters
1. Rich and Smart:
- middle age families
- professional or managerial white collar jobs
- graduate degrees
- high incomes
- majority live in suburbs
- greater part are White but also some Asian
2. Young Achievers:
- young couples without children or mainly with pre-school children
- college degrees
- white collar jobs in sales, service, technical, and professional
- mid-range income
- higher percentages live in suburb or rural areas
3. Kids-centered Families:
- middle aged and working class families
- pre-school and school age children
- usually have college education
- mid-range to high level income
- primarily White and live in suburb or town

10 Clusters, cont.
4. Rural Blues:
- working class, middle aged families
- pre-school and school age children
- mainly high school graduates
- blue collar jobs (farming, manufacturing, etc.)
- low to mid-range income
- greater part are White and mainly live in rural areas or small towns
5. Working Mixing Pot:
- working class White, Black, Asian, or Hispanic
- single adults or couples
- college or high school education
- low to mid-range income
6. Mainstream Families:
- mid-scale, upper mid age, White
- large working class couples or families with older children
- college or high school education
- mid-range to high level income
- suburb or rural areas

11 Clusters, cont.
7. Senior Couples:
- senior couples
- majority working and some are retired
- greater part is White but includes some Black, Asian, or American-Indians
- suburb or rural areas
8. Sustaining Minority Families:
- low income
- middle aged, working class families
- mainly Hispanic or Black but also some Asian and White
- majority have not finished high school
- service, sales, manufacturing, farming, or construction jobs
9. Forever Youngs:
- White senior couples, empty nesters
- mostly retired but some have sales, service, or managerial jobs
- low to mid-range income

12 Clusters, cont.
10. Traditional Seniors:
- mainly retired single individuals and some retired couples
- low income
- majority are White but some Black, Asian, or American-Indians
11. Neo Urbans:
- small families/couples or single individuals
- dense urban areas
- college education
- low to mid-range income
- sales, service, or professional jobs
- dominant race is White but a significant number are Black, Asian, and Hispanic

13 Cluster-Based Travel Characteristics

15 Transferability
- An ANN model (with genetic algorithm) is used to simulate cluster membership as a function of 11 factors for each HH in add-on datasets.
- The model has 92.4% prediction potential.
- Travel characteristics are transferred from national clusters to add-on data according to their cluster membership.
- Weighted observed and predicted travel characteristics are compared.
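The transfer-and-compare step can be illustrated with a small sketch. It assumes the cluster membership of each add-on household has already been predicted (the ANN itself is not reproduced here), and every number below is made up for illustration; `weighted_mean`, `cluster_mean_trips`, and `households` are hypothetical names.

```python
def weighted_mean(values, weights):
    """Survey-weighted mean of a list of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical national cluster means (trips per person); not from the slides
cluster_mean_trips = {1: 4.1, 2: 3.9, 3: 4.4}

# Hypothetical add-on households:
# (predicted cluster, survey weight, observed trips per person)
households = [(1, 120.0, 4.5), (2, 80.0, 3.6), (3, 200.0, 4.0), (2, 100.0, 4.2)]

weights = [w for _, w, _ in households]
# Transferred value: each household inherits its cluster's national mean
predicted = [cluster_mean_trips[c] for c, _, _ in households]
observed = [obs for _, _, obs in households]

weighted_predicted = weighted_mean(predicted, weights)
weighted_observed = weighted_mean(observed, weights)
```

Comparing `weighted_predicted` with `weighted_observed` mirrors, in a very reduced form, the weighted comparisons shown on the following slides.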

16 Comparison of Weighted Trip Count per Person

17 Comparison of Weighted Mandatory Trips per Person

18 Original Comparison of Transit Usage
Not so good! Some clusters need improvement.
Compared to No. of Trips, the prediction of transit usage is not so good. Clusters 5, 8, 10, and 11 show significant differences and need improvement.

19 Improvement to Clusters Using C&RT
1. The first level of the tree is grown on the difference in the No. of vehicles in the household (own vehicle or not).
2. The improvement of the model due to this level is defined as improvement / (variance of Node 0).
3. For example, here 0.0017 corresponds to 13.3%, 0.0009 to 7.05%, and 0.0002 to 1.57%.
4. Total model improvement is about 22%.
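The arithmetic behind these percentages can be checked with a short sketch. The variance of Node 0 is not stated on the slide, so it is back-calculated here from the first pair (0.0017 and 13.3%); that back-calculation is an assumption. The second-level improvement is taken as 0.0009, the value consistent with the stated 7.05%.

```python
# Improvement at each tree level
improvements = [0.0017, 0.0009, 0.0002]

# Assumption: variance of Node 0 inferred from 0.0017 / variance = 13.3%
root_variance = 0.0017 / 0.133

# Share of the root-node variance explained by each level
shares = [imp / root_variance for imp in improvements]
total_share = sum(shares)

for imp, share in zip(improvements, shares):
    print(f"improvement {imp:.4f} -> {share:.2%} of Node 0 variance")
print(f"total improvement: {total_share:.1%}")
```

Under this assumption the three shares add up to roughly 22%, in line with the total quoted on the slide.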

20 Considering Distributions: Trip Rate
Nice match shown! However, this is not always the case. How can the transferability be improved?

21 Considering Distributions: Trip Distance
Not so good! Needs to be improved.

22 Considering Distributions
Various distributions were fitted to the dataset, including:
- Normal, Gamma, Weibull, Exponential, Max Extreme, Lognormal, Logistic, Student’s t, Min Extreme, Triangular, General Beta, Pareto, Uniform, Binomial, Geometric, Hyper Geometric, and Poisson.
The fitting results are interpreted by:
- examining the rankings of the three fit statistics: A-D, K-S, and Chi-squared
- visually judging plots, density and cumulative curves
- p-value and critical values at different sig. levels
Non-normal distributions are dominant (e.g., Gamma).

23 Gamma Distribution
[PDF, CDF, and Gamma function formulas]
- k > 0 is the shape parameter
- θ > 0 is the scale parameter
- the location parameter determines where the origin is located
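For reference, the density with these three parameters can be sketched as below. This assumes the standard three-parameter (shifted) gamma form, which may differ in notation from the formula on the original slide; the function and argument names are illustrative.

```python
import math


def gamma_pdf(x, shape, scale, location=0.0):
    """Density of a gamma distribution shifted by `location`.

    Assumed form: f(x) = (x - loc)^(k-1) * exp(-(x - loc)/theta)
                         / (Gamma(k) * theta^k)   for x > loc, else 0.
    """
    z = x - location
    if z <= 0:
        return 0.0
    return z ** (shape - 1) * math.exp(-z / scale) / (math.gamma(shape) * scale ** shape)


def gamma_mean(shape, scale, location=0.0):
    """Mean of the shifted gamma: location + shape * scale."""
    return location + shape * scale


def gamma_sd(shape, scale):
    """Standard deviation: sqrt(shape) * scale (unaffected by location)."""
    return math.sqrt(shape) * scale
```

Under this form the mean is location + shape × scale and the standard deviation is √shape × scale.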

24 Fitted Distribution with Parameters for each Variable by Cluster

26 Bayesian Updating
Local updating can significantly improve the quality of the transferred data.
Used Bayesian updating:
- Traditionally, in the transferability literature only variables with normal distributions have been studied, due to the simplicity of calculating the posterior from a normal prior and likelihood.
- In practice, the variables of interest (i.e., the likelihood) can take various distributional forms.

27 Bayesian Updating
- f(x|θ) is the probability function for the observed data x (i.e., local sample), given the unknown parameter θ
- g(θ) is the prior distribution for θ
- k(θ|x) is the posterior distribution for θ given observed data x
The technique can be expanded to situations when no prior data is available.
The analyst can do successive updating, using the new information without losing the gains from the old one.
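The three quantities defined on this slide are linked by Bayes' theorem. The formula is not preserved in the transcript, so the following is a reconstruction in the slide's own notation (with θ' as a dummy integration variable), not a copy of the original equation:

```latex
k(\theta \mid x)
  = \frac{f(x \mid \theta)\, g(\theta)}{\int f(x \mid \theta')\, g(\theta')\, d\theta'}
  \;\propto\; f(x \mid \theta)\, g(\theta)
```

Successive updating follows from reusing the posterior k(θ|x) as the prior g(θ) when a new sample arrives.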

28 Bayesian Updating (2)
- The national sample of NHTS 2001 is used as the source for the prior information.
- A small local sample is randomly selected from the NY add-on, leaving the rest for validation.
- The bootstrap method is used to resample the data and justify the prior distribution assumptions for the parameters of interest (i.e., scale and shape for Normal distribution). A Normal distribution is fitted to each of the resampled datasets.
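A minimal sketch of the bootstrap step is given below. It makes several assumptions: the data are synthetic stand-ins for the national sample, and each resample is summarised by a method-of-moments gamma fit with the location fixed at zero, which is a simplification chosen for this sketch rather than the fitting procedure reported on the slide. The resulting collection of (shape, scale) estimates is what one would inspect to judge a prior distribution for those parameters.

```python
import random
import statistics


def fit_gamma_moments(sample):
    """Method-of-moments gamma fit (location fixed at 0): (shape, scale)."""
    mean = statistics.fmean(sample)
    var = statistics.variance(sample)
    return mean * mean / var, var / mean


def bootstrap_parameters(sample, n_resamples=500, seed=0):
    """Resample with replacement and refit, returning one fit per resample."""
    rng = random.Random(seed)
    fits = []
    for _ in range(n_resamples):
        resample = rng.choices(sample, k=len(sample))
        fits.append(fit_gamma_moments(resample))
    return fits


# Synthetic stand-in for one cluster of the national sample (not real data)
data_rng = random.Random(42)
sample = [data_rng.gammavariate(3.0, 1.2) for _ in range(400)]

fits = bootstrap_parameters(sample)
shapes = [s for s, _ in fits]
scales = [t for _, t in fits]
shape_mean, shape_sd = statistics.fmean(shapes), statistics.stdev(shapes)
scale_mean, scale_sd = statistics.fmean(scales), statistics.stdev(scales)
```

The means and standard deviations of `shapes` and `scales` could then serve as hyperparameters of the priors used in the updating step.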

29 Bayesian Updating (3)
- Then, Markov Chain Monte Carlo (MCMC) simulation with Gibbs Sampling is utilized to update the prior with the small local sample.
- Assuming the updated variables of interest are still Gamma distributed, the posteriors of the parameters are used to derive the updated means and SDs of the variables.
- Updated parameters are then compared with the validation data and national data to test the effectiveness of the updating procedure.
- The comparisons show that significant improvement is achieved.
- The improvement increases with the local sample size; a relatively cost-effective sample size is suggested.
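To show the mechanics of Gibbs sampling, here is a small self-contained sketch. It deliberately swaps in a simpler model than the one in the study: a Normal likelihood with unknown mean and precision and semi-conjugate priors, for which both full conditionals have closed forms. The gamma-distributed variables on the slides do not have such a simple update, so this is an illustration of the sampling scheme only, with synthetic data and assumed prior values.

```python
import random
import statistics


def gibbs_normal(data, mu0, tau0, a, b, n_iter=4000, burn_in=1000, seed=0):
    """Gibbs sampler for a Normal(mu, 1/tau) likelihood.

    Priors (assumptions of this sketch):
      mu  ~ Normal(mu0, 1/tau0)
      tau ~ Gamma(shape=a, rate=b)
    Returns the post-burn-in draws as a list of (mu, tau) pairs.
    """
    rng = random.Random(seed)
    n = len(data)
    total = sum(data)
    mu = statistics.fmean(data)
    tau = 1.0 / statistics.variance(data)
    draws = []
    for i in range(n_iter):
        # mu | tau, data ~ Normal with precision tau0 + n * tau
        precision = tau0 + n * tau
        mean = (tau0 * mu0 + tau * total) / precision
        mu = rng.gauss(mean, precision ** -0.5)
        # tau | mu, data ~ Gamma(a + n/2, rate = b + SS/2);
        # random.gammavariate takes a scale, hence the reciprocal
        ss = sum((x - mu) ** 2 for x in data)
        tau = rng.gammavariate(a + n / 2.0, 1.0 / (b + ss / 2.0))
        if i >= burn_in:
            draws.append((mu, tau))
    return draws


# Synthetic "local sample" of 75 observations (not real survey data)
data_rng = random.Random(1)
local_sample = [data_rng.gauss(3.4, 1.0) for _ in range(75)]

# Prior centred on a hypothetical national mean of 3.9
draws = gibbs_normal(local_sample, mu0=3.9, tau0=4.0, a=2.0, b=2.0)
posterior_mean = statistics.fmean(m for m, _ in draws)
```

The posterior mean sits between the prior mean and the local sample mean, and moves closer to the sample mean as the local sample grows, which is the qualitative behaviour the slide describes.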

30 Root Mean Square Error (RMSE) decreases with the increase of sample size. There is instability when the sample size within each cluster is smaller than 45 observations. A sample size of 75 per cluster seems to be the most cost-effective plan.

31 Updating Results
Updated mean values move significantly closer to the validation data.

32 Summary of Updating Results

Trip Rates per Person
          National                      National-updated              State of New York
Cluster   Location Shape  Scale  Mean   Location Shape  Scale  Mean   Location Shape  Scale  Mean
2         -0.83    5.42   0.88   3.94   -0.83    5.15   0.92   3.91   -0.30    3.47   1.14   3.66
3         -3.13    12.31  0.61   4.38   -3.13    12.05  0.61   4.22   -1.66    8.44   0.67   3.99
4         -0.99    6.42   0.77   3.95   -0.99    6.05   0.80   3.85   -0.42    4.43   0.89   3.53
8         -0.13    3.14   1.15   3.48   -0.13    2.90   1.12   3.13   0.18     2.40   1.24   3.16
11        0.04     2.52   1.47   3.75   0.04     2.44   1.45   3.58   0.32     2.20   1.40   3.39

Trip Distance per Person
          National                      National-updated              State of New York
Cluster   Location Shape  Scale  Mean   Location Shape  Scale  Mean   Location Shape  Scale  Mean
2         -0.09    1.45   21.28  30.67  -0.09    1.34   21.04  28.10  -0.07    1.32   20.84  27.33
3         -0.49    1.68   18.91  31.18  -0.49    1.62   18.93  30.18  0.11     1.53   19.31  29.62
4         -0.22    1.61   18.55  29.59  -0.22    1.45   19.98  28.75  -0.02    1.30   20.59  26.67
5         -0.09    1.20   24.93  29.93  -0.09    1.20   24.03  28.84  -0.09    1.19   23.97  28.36
6         -0.43    1.91   18.12  34.18  -0.43    1.89   18.22  34.01  -0.08    1.58   21.40  33.69
7         0.11     1.48   22.69  33.58  0.11     1.54   21.69  33.51  -0.08    1.52   20.75  31.55
8         -0.12    1.06   24.08  25.38  -0.12    1.03   24.03  24.63  -0.09    0.90   22.91  20.53
9         -0.09    1.16   21.43  24.72  -0.09    1.16   22.23  25.65  -0.03    1.17   22.17  25.91
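As a rough check on the trip-rate figures above, the mean columns can be compared with a simple RMSE across clusters. This sketch treats the State of New York column as the reference, which is an assumption; the RMSE reported on slide 30 may have been computed differently (for example, against the validation subsample).

```python
import math

# Mean trip rates per person by cluster, copied from the table above
national = {2: 3.94, 3: 4.38, 4: 3.95, 8: 3.48, 11: 3.75}
updated = {2: 3.91, 3: 4.22, 4: 3.85, 8: 3.13, 11: 3.58}
new_york = {2: 3.66, 3: 3.99, 4: 3.53, 8: 3.16, 11: 3.39}


def rmse(estimates, reference):
    """Root mean square error across the clusters in `reference`."""
    errors = [(estimates[c] - reference[c]) ** 2 for c in reference]
    return math.sqrt(sum(errors) / len(errors))


rmse_national = rmse(national, new_york)
rmse_updated = rmse(updated, new_york)
print(f"national vs NY: {rmse_national:.3f}")
print(f"updated  vs NY: {rmse_updated:.3f}")
```

On these five clusters the updated means give a smaller RMSE than the unadjusted national means.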

34 Population Synthesizing and Travel Data Simulation Using PUMS Data, NYC population is synthesized. All of the contextual factors were calculated for each HH. Synthetic population with all required 33 variables was generated. Using the ANN model, cluster memberships are obtained. Travel data are simulated for each HH using Monte Carlo simulation of each travel attribute with updated parameters of the fitted distributions.
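The last step, drawing a travel attribute for each synthetic household from its cluster's fitted distribution, might look like the sketch below. The (location, shape, scale) triples are the National-updated trip-rate parameters from slide 32; the random cluster assignment stands in for the ANN output, and truncating draws at zero is an assumption of this sketch, not something stated on the slides.

```python
import random
import statistics

# National-updated trip-rate parameters by cluster: (location, shape, scale)
trip_rate_params = {
    2: (-0.83, 5.15, 0.92),
    3: (-3.13, 12.05, 0.61),
    4: (-0.99, 6.05, 0.80),
    8: (-0.13, 2.90, 1.12),
    11: (0.04, 2.44, 1.45),
}


def simulate_trip_rate(cluster, rng):
    """Draw one trip rate from the cluster's shifted gamma distribution."""
    location, shape, scale = trip_rate_params[cluster]
    # Assumption: negative draws are truncated to zero trips
    return max(0.0, location + rng.gammavariate(shape, scale))


rng = random.Random(7)
clusters = list(trip_rate_params)
# Hypothetical cluster memberships for 5000 synthetic households
memberships = [rng.choice(clusters) for _ in range(5000)]
simulated = [(c, simulate_trip_rate(c, rng)) for c in memberships]

mean_cluster_2 = statistics.fmean(v for c, v in simulated if c == 2)
```

With enough households, the simulated mean for each cluster should approach the corresponding mean in the slide-32 table (about 3.91 for cluster 2).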

35 Comparison of Simulated and Add-on NYC Samples (Trips per Person)

36 Comparison of Simulated and Add-on NYC Samples (Trip Distance per Person)
