What about the whole country? Extending the Activity-Based Person-Trip Synthesizer to all 330 million Americans
Judy Sun ’14 & Luke Cheng ’14, ORF467 F13

1 What about the whole country? Extending the Activity-Based Person-Trip Synthesizer to all 330 million Americans
Judy Sun ’14 & Luke Cheng ’14, ORF467 F13

2 The Process
- Generate Schools
- Generate Employee Patronage File
- Assign Patronage
- Generate Patronage-Employee Ratios
- A Look at the Data
- Generate Census File (with Microsoft Access)
- NN Files through 7 NJ Modules by Jake and Talal
- Trip File Generator: out-of-state commuters, students, workplace assignment, 18 Tour Type (Activity Patterns) assignment, Temporal Dimension

3 Roadmap
- Schools Data
- Employee-Patronage Data
- A Look at the Data
- Census Data
- Further Steps

4 Schools Data

5 Public Schools in the US

6 Quick stats on Public Schools (2011)
School Type    # of CHARTER    # of PUBLIC      Total
Primary               2,584         51,793     54,377
Middle                  615         16,332     16,947
High                  1,316         19,762     21,078
Other                 1,145          5,847      6,992
No Answer               564          3,525      4,089
Total                 6,224         97,259    103,483

7 Public Schools: Enrollment
School Type       CHARTER        PUBLIC         Total
Primary           896,544    23,226,606    24,123,150
Middle            166,519     9,425,155     9,591,674
High              368,109    13,767,489    14,135,598
Other             626,562     1,289,050     1,915,612
No Answer         (1,128)       (7,016)       (8,144)
Total           2,056,606    47,701,284    49,757,890

8 Private Schools in the US
Type         Number of Schools
Primary                 18,400
Secondary                2,517
Combined                 7,300
Total                   28,217

9 Private Schools: Enrollment
Type         # of Students
Primary          2,134,007
Secondary          738,600
Combined         1,431,252
Total            4,303,859

10 Private Schools: School Size

11 Post-secondary schools (2009)
Institution type               # of Students Enrolled    % of Total Students    Number of Schools
Graduate                                          291                     0%                  350
Primarily Baccalaureate                     1,483,018                    93%                2,169
Primarily Non-Bacc                             53,903                     3%                  623
Associate's                                    49,263                     3%                1,745
Nondegree-granting postbac                         17                     0%                   14
Nondegree-granting pre-bac                     10,960                     1%                2,698
Total                                       1,597,452                   100%                7,735

12 Employee-Patronage Data

13 The Process
- 2012 InfoGroup US Businesses File (5.80 GB)
- 30 CSV files with 500,000 entries each (~200 MB) – Shell Script
- 30 CSV files with patronage generation, data cleaning, and mapping (~115 MB) – R Script
- 1570 Segmented State Files (1 KB to 20 MB) – R Script
- 51 Merged State Files (8 MB to 390 MB) – Python Script
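The split and merge steps in this pipeline can be sketched in Python. This is only an illustration of the idea, not the authors' Shell, R, or Python scripts; the function names `split_rows` and `merge_by_state` and the in-memory row format are assumptions made for the example.

```python
from itertools import islice

def split_rows(rows, chunk_size):
    """Yield successive lists of at most chunk_size rows (e.g. 500,000 per file)."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk

def merge_by_state(segments):
    """Combine (state, rows) segments into one list of rows per state."""
    merged = {}
    for state, rows in segments:
        merged.setdefault(state, []).extend(rows)
    return merged

# Hypothetical toy data standing in for business records.
chunks = list(split_rows(range(5), chunk_size=2))
merged = merge_by_state([("NJ", [1, 2]), ("NY", [3]), ("NJ", [4])])
```

In practice each chunk would be written to its own CSV file and each merged state list to one state file, so that no step has to hold the full 5.80 GB source in memory.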

14 Patronage Generation
- Previous Process – Manual Fine-Tuning
  - Inconsistent: Same NAICS Code, Different Patronage/Employee Ratio
- Current Process – Employee Size Range, Sales Volume Range
  - Not Perfect Data
  - Matching businesses (Zip, County, NAICS, Lat/Long)
  - Same Employee Size Range
  - Assumption: Sales Volume same across time
  - Trying to acquire the 2005 Data for better correlations
  - Ratios from Averaging Previous EP file

15 Comparison: Distributions

16 Conclusion: Need to use NAICS codes in addition to employee size and sales volume
- A large number of ratio values in the 0-1 range are offset by values in the 7-20 range, so the averages cluster around 4-5.
- Difficult to capture nuances with just employee size and sales volume.
- Next steps: manpower needed to assign a ratio for each combination of NAICS code, sales volume, and employee size
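The averaging problem described in this conclusion can be illustrated with a short Python sketch. The records, the placeholder codes `CODE_A` and `CODE_B`, the range labels, and the function name `average_ratios` are all hypothetical; they are chosen only to show how grouping without NAICS codes blends low and high ratios into a mid-range average.

```python
from collections import defaultdict
from statistics import mean

def average_ratios(records, key_fields):
    """Average the patron/employee ratio within each group defined by key_fields."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[field] for field in key_fields)
        groups[key].append(rec["ratio"])
    return {key: mean(values) for key, values in groups.items()}

# Hypothetical businesses sharing one employee-size and sales-volume range.
records = [
    {"naics": "CODE_A", "emp_range": "10-19", "sales_range": "S1", "ratio": 0.5},
    {"naics": "CODE_A", "emp_range": "10-19", "sales_range": "S1", "ratio": 1.0},
    {"naics": "CODE_B", "emp_range": "10-19", "sales_range": "S1", "ratio": 7.0},
    {"naics": "CODE_B", "emp_range": "10-19", "sales_range": "S1", "ratio": 9.5},
]

without_naics = average_ratios(records, ["emp_range", "sales_range"])
with_naics = average_ratios(records, ["naics", "emp_range", "sales_range"])
```

With these made-up values, the two-field grouping produces a single blended average, while adding the NAICS field keeps the low-ratio and high-ratio businesses apart.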

17 A Look at the Data

18 NJ Counties (Change in NJ EP File)
[Figure panels: Uncensored; Un-Named Removed]

19 NJ Wide
Metric           Uncensored    Un-named Removed
No Businesses       +73,500             +39,350
Tot Emp               +4.8M               +4.8M
Emp Size              +7.85               +9.09
Tot Patrons           -4.9M               -5.3M
Avg Patrons          -17.17              -16.29

20 Nation-Wide
Rank  State            Sales Volume  No. Businesses  Total Employees  Avg Employee Size  Total Patrons  Average Patrons
   1  California             $1,889       1,579,342       23,518,022              14.89     36,820,129            23.31
   2  Texas                  $2,115         999,331       17,624,235              17.64     24,846,695            24.86
   3  Florida                $1,702         895,586       12,331,524              13.77     21,231,864            23.71
   4  New York               $1,822         837,773       18,327,933              21.88     19,610,813            23.41
   5  Pennsylvania           $2,134         550,678       10,498,442              19.06     13,704,903            24.89
   9  New Jersey             $1,919         428,596        8,833,890              20.61      9,986,529            23.30
  45  Washington DC          $1,317          49,488        5,702,617             115.23      1,067,938            21.58
  47  Rhode Island           $1,814          46,503        1,117,140              24.02      1,201,124            25.83
  48  North Dakota           $1,978          44,518          492,547              11.06      1,021,077            22.94
  49  Delaware               $2,108          41,296          670,622              16.24      1,011,400            24.49
  50  Vermont                $1,554          39,230          379,291               9.67        821,193            20.93
  51  Wyoming                $1,679          35,881          340,342               9.49        772,090            21.52
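The two average columns in this table appear to be the totals divided by the number of businesses. A small Python check, using the California and Washington DC rows as given on the slide, makes that relationship explicit; the dictionary layout and the function name `derived_averages` are assumptions for the example.

```python
# Figures copied from the Nation-Wide table on this slide.
rows = {
    "California": {"businesses": 1_579_342, "employees": 23_518_022, "patrons": 36_820_129},
    "Washington DC": {"businesses": 49_488, "employees": 5_702_617, "patrons": 1_067_938},
}

def derived_averages(row):
    """Return (average employee size, average patrons) per business."""
    avg_employees = row["employees"] / row["businesses"]
    avg_patrons = row["patrons"] / row["businesses"]
    return avg_employees, avg_patrons

for state, row in rows.items():
    avg_employees, avg_patrons = derived_averages(row)
    print(f"{state}: {avg_employees:.2f} employees, {avg_patrons:.2f} patrons per business")
```

A check like this is also a quick way to flag rows worth verifying, such as the Washington DC average employee size, which is far above every other state listed.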

21 Census Data

22 Inputs
- 2010 Census Summary File 1
  - http://www2.census.gov/census_2010/04-Summary_File_1/
  - Does not convert to CSV/TXT; files made for MS Access
  - Process tables (P12, P16, P29, H13, P43) with Talal’s VBA macro in MS Access (p.78)
  - VBA code – whereabouts unknown, perhaps with Prof K
- 2012 5-Year Census American Community Survey
  - http://www2.census.gov/acs2012_5yr/summaryfile/
  - Income data to assign incomes to households and residents

23 Generation
- Module 1 – Outputs resident file for each county in state
  - Rows: Individual People
  - Attributes/Columns: County Number (replace with State Number_County Number for national file), Household ID, Household Type, Lat/Long, ID Number, Age, Sex, Traveler Type, Income Bracket
- Module 2 – Out of state/region/nation nodes
- For commenting on code, go to p.17-19
  - http://www.princeton.edu/~alaink/Orf467F12/MuftiTripSynthesizer_v.1.pdf
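One way to picture a row of the Module 1 resident file, with the county number replaced by a combined state and county identifier, is the Python sketch below. The field names, the field types, and the helper `national_county_id` are assumptions for illustration; the slide only lists the attributes, not how they are stored.

```python
from dataclasses import dataclass

def national_county_id(state_number, county_number):
    """Build the 'State Number_County Number' key suggested for the national file."""
    return f"{state_number}_{county_number}"

@dataclass
class Resident:
    county_id: str        # State Number_County Number
    household_id: int
    household_type: str
    lat: float
    lon: float
    person_id: int
    age: int
    sex: str
    traveler_type: str
    income_bracket: str

# Hypothetical resident; every value here is made up for the example.
example = Resident(
    county_id=national_county_id("34", "021"),
    household_id=1, household_type="family",
    lat=40.35, lon=-74.66,
    person_id=1, age=8, sex="F",
    traveler_type="student", income_bracket="unknown",
)
```

Keeping the state and county numbers as strings preserves any leading zeros, which matters if the identifiers are later joined against other files.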

24 Further Steps

25 What To Do Next?
- Patronage Generation with NAICS, Sales Volume, Employee Size and Research – Low Difficulty
  - I already generated a file mapping all NAICS and employment counts along with payrolls for patronage assignment using 2010 Census Data (200K entries)
- Census Data Generation and Rework NN Generation Modules – High Difficulty
- Optional: Data Verification for Employee-Patronage Files

26 Modules
- Very hard-coded for NJ; not very well-commented
- Initial National Implementation Ideas:
  - Treat US as one entity with external nodes at airports to represent foreigners
    - Problem: Computationally intensive for 330M people
    - Solution: Do a semi-randomized sample
  - Regionalize the US and use out-of-region external nodes
    - Less labor-intensive; allows parallel processing
  - Doing each state
    - Problem: Hard to generalize code, out-of-state nodes
    - Extremely labor-intensive
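The slide does not define "semi-randomized sample". One possible reading, sketched here in Python, is a stratified sample that draws the same fraction of residents at random from every state so that small states are not lost. The function name `stratified_sample`, the fixed seed, and the toy population are assumptions for the example.

```python
import random

def stratified_sample(people_by_state, fraction, seed=0):
    """Draw roughly `fraction` of the residents of each state, at least one per non-empty state."""
    rng = random.Random(seed)  # fixed seed so a run can be reproduced
    sample = {}
    for state in sorted(people_by_state):
        residents = people_by_state[state]
        k = max(1, round(len(residents) * fraction)) if residents else 0
        sample[state] = rng.sample(residents, k)
    return sample

# Hypothetical populations: integer IDs stand in for resident records.
population = {"NJ": list(range(100)), "WY": list(range(10))}
sampled = stratified_sample(population, fraction=0.1)
```

Trips synthesized from such a sample would then need to be scaled back up by the inverse of the sampling fraction to approximate national totals.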

27 The Code: Thought Process
- Trips generated state-by-state
  - Use state-level demographic information on residents
  - Ignore state-level boundaries since we have employer and attraction information for the nation.
- Example:
  - John Smith lives in NYC and works in CT.
  - We will get his household from the NYC Census file and the probability distribution of his workplace from the CT E-P file.
  - When we map NYC trips, we will see John Smith going to CT for work. When we map CT trips, we will see John Smith returning from work.
- Trip destinations can be approximated using destination county centroids
  - Requires assigning a centroid to each county
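A rough way to assign a centroid to each county is to average the coordinates of the points already known to lie in it, such as businesses or households. The Python sketch below assumes that approach; it is a simple coordinate mean rather than a true geometric centroid of the county boundary, and the function name `county_centroids` and the sample points are hypothetical.

```python
from collections import defaultdict

def county_centroids(points):
    """points: iterable of (county_id, lat, lon). Returns {county_id: (mean_lat, mean_lon)}."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # running lat sum, lon sum, point count
    for county_id, lat, lon in points:
        acc = sums[county_id]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {cid: (acc[0] / acc[2], acc[1] / acc[2]) for cid, acc in sums.items()}

# Hypothetical locations in two counties labelled "A" and "B".
centroids = county_centroids([
    ("A", 40.0, -74.0),
    ("A", 41.0, -75.0),
    ("B", 41.2, -73.2),
])
```

Averaging the locations of businesses weights the centroid toward where trip destinations actually are, which may be preferable to the geographic center for approximating trip ends.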

28 The Code: Thought Process
- Workplace assignment (without replacement):
  - Census maps individuals to workplace
    - John Smith lives in NYC and works in CT
  - Use distribution to match workplace to E-P file (keep a count of employees to match the number given)
    - John Smith mapped to an employer in CT
  - If more than x (e.g. 250) miles, assume arrival at airport
- School assignment (without replacement):
  - Use bounds and distribution to match students with schools (assume same county)
    - Jane (8) is mapped to elementary school in her county
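Assignment without replacement, together with the distance rule for airport arrivals, might look like the following Python sketch. It assumes each employer has a known number of open positions, draws employers with probability proportional to their remaining positions, and uses great-circle (haversine) distance with the 250-mile example threshold from the slide. The function names and the sample data are hypothetical.

```python
import math
import random

EARTH_RADIUS_MILES = 3958.8

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def assign_workplaces(workers, positions, rng):
    """Assign each worker an employer, decrementing that employer's open positions."""
    remaining = dict(positions)
    assignment = {}
    for worker in workers:
        open_names = [name for name, count in remaining.items() if count > 0]
        if not open_names:
            break  # no positions left; remaining workers stay unassigned
        weights = [remaining[name] for name in open_names]
        chosen = rng.choices(open_names, weights=weights, k=1)[0]
        remaining[chosen] -= 1
        assignment[worker] = chosen
    return assignment

def arrival_mode(distance_miles, threshold_miles=250):
    """Beyond the threshold, assume the traveler arrives at an airport."""
    return "airport" if distance_miles > threshold_miles else "direct"

# Hypothetical workers and employers.
assignment = assign_workplaces(
    ["w1", "w2", "w3", "w4", "w5"], {"Employer A": 2, "Employer B": 3}, random.Random(1)
)
```

The same capacity-counting loop could be reused for school assignment by treating each school's enrollment as its number of open positions and restricting candidates to the student's county.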

29 The Code: Thought Process
- Tour Type assignment and Temporal Dimension
  - Can try to repurpose Talal’s code
  - Add in Time Zones in Temporal Dimension
  - Can do this with replacement (patrons)
  - Assumptions: Same behavior across states in terms of work time and leisure time and activity patterns
- Out-of-Country Commuters / Non-Resident Workers
  - International nodes for the states along the Canadian and Mexican borders
  - Trip to the nearest border crossing
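Adding time zones to the temporal dimension could be as simple as shifting each local departure time onto one reference clock. The Python sketch below assumes times are stored as minutes after local midnight and uses fixed standard-time UTC offsets for the four contiguous US zones. It ignores daylight saving time, Alaska, and Hawaii, and the names `UTC_OFFSET_HOURS` and `to_reference_minutes` are hypothetical.

```python
# Standard-time UTC offsets in hours; daylight saving time is ignored in this sketch.
UTC_OFFSET_HOURS = {"Eastern": -5, "Central": -6, "Mountain": -7, "Pacific": -8}

MINUTES_PER_DAY = 24 * 60

def to_reference_minutes(local_minutes, zone, reference="Eastern"):
    """Convert minutes after local midnight in `zone` to minutes after midnight in `reference`."""
    shift = (UTC_OFFSET_HOURS[reference] - UTC_OFFSET_HOURS[zone]) * 60
    # The modulo wraps past midnight; the day rollover itself is not tracked here.
    return (local_minutes + shift) % MINUTES_PER_DAY

# An 8:00 departure in the Pacific zone expressed on the Eastern clock.
pacific_departure = to_reference_minutes(8 * 60, "Pacific")
```

Under the slide's assumption that activity patterns are the same across states, the local schedule stays unchanged and only this final conversion differs by zone.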

