What about the whole country? Extending the Activity-Based Person-Trip Synthesizer to all 330 million Americans. Judy Sun ’14 & Luke Cheng ’14, ORF467 F13.


The Process
- Generate Schools
- Generate Employee-Patronage File
- Assign Patronage
- Generate Patronage-Employee Ratios
- A Look at the Data
- Generate Census File (with Microsoft Access)
- NN Files through 7 NJ Modules by Jake and Talal
- Trip File Generator: out-of-state commuters, students, workplace assignment, 18 Tour Type (Activity Patterns) assignment, Temporal Dimension

Roadmap
- Schools Data
- Employee-Patronage Data
- A Look at the Data
- Census Data
- Further Steps

Schools Data

Public Schools in the US

Quick stats on Public Schools (2011): number of schools
School type | Charter | Public (non-charter) | Total
Primary | 2,584 | 51,793 | 54,377
Middle | 615 | 16,332 | 16,947
High | 1,316 | 19,762 | 21,078
Other | 1,145 | 5,847 | 6,992
No answer | 564 | 3,525 | 4,089
Total | 6,224 | 97,259 | 103,483

Public Schools: Enrollment
School type | Charter | Public (non-charter) | Total
Primary | 896,544 | 23,226,606 | 24,123,150
Middle | 166,519 | 9,425,155 | 9,591,674
High | 368,109 | 13,767,489 | 14,135,598
Other | 626,562 | 1,289,050 | 1,915,612
No answer | (1,128) | (7,016) | (8,144)
Total | 2,056,606 | 47,701,284 | 49,757,890
Parenthesized values are negative and are subtracted in the totals.

Private Schools in the US
Type | Number of schools
Primary | 18,400
Secondary | 2,517
Combined | 7,300
Total | 28,217

Private Schools: Enrollment
Type | Number of students
Primary | 2,134,007
Secondary | 738,600
Combined | 1,431,252
Total | 4,303,859

Private Schools: School Size

Post-secondary schools (2009)
Institution type | Students enrolled | Share of total students | Number of schools
Graduate | 291 | 0% | 350
Primarily baccalaureate | 1,483,018 | 93% | 2,169
Primarily non-baccalaureate | 53,903 | 3% | 623
Associate's | 49,263 | 3% | 1,745
Nondegree-granting, post-baccalaureate | 17 | 0% | 14
Nondegree-granting, pre-baccalaureate | 10,960 | 1% | 2,698
Total | 1,597,452 | 100% | 7,735

Employee-Patronage Data

The Process
- 2012 InfoGroup US Businesses File (5.80 GB)
- 30 CSV files with 500,000 entries each (~200 MB) – shell script
- 30 CSV files after patronage generation, data cleaning and mapping (~115 MB) – R script
- 1,570 segmented state files (1 KB to 20 MB) – R script
- 51 merged state files (8 MB to 390 MB) – Python script
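The split, segment and merge steps above could be collapsed into one streaming pass. Below is a minimal Python sketch, not the original shell/R/Python scripts: it assumes the businesses file is a CSV with a header row and a state column called `State` (a hypothetical name; the real InfoGroup header may differ), and it leaves out the patronage-generation and cleaning step.

```python
import csv
import os
from collections import defaultdict


def split_segment_merge(input_path, out_dir, chunk_rows=500_000, state_col="State"):
    """Stream a large business CSV in fixed-size chunks, write one segment
    file per (chunk, state), then merge all segments into one file per state.

    `state_col` is a hypothetical column name; adjust it to the real header.
    Returns a dict mapping state -> path of the merged state file.
    """
    seg_dir = os.path.join(out_dir, "segments")
    os.makedirs(seg_dir, exist_ok=True)
    segments = defaultdict(list)  # state -> list of segment file paths

    with open(input_path, newline="") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames

        def flush(rows, cid):
            # Group this chunk's rows by state and write one segment per state.
            by_state = defaultdict(list)
            for row in rows:
                by_state[row[state_col]].append(row)
            for state, state_rows in by_state.items():
                path = os.path.join(seg_dir, f"{state}_{cid:03d}.csv")
                with open(path, "w", newline="") as out:
                    writer = csv.DictWriter(out, fieldnames=header)
                    writer.writeheader()
                    writer.writerows(state_rows)
                segments[state].append(path)

        chunk, chunk_id = [], 0
        for row in reader:
            chunk.append(row)
            if len(chunk) == chunk_rows:
                flush(chunk, chunk_id)
                chunk, chunk_id = [], chunk_id + 1
        if chunk:
            flush(chunk, chunk_id)

    # Merge each state's segments (in chunk order) into one file with one header.
    merged = {}
    for state, paths in segments.items():
        merged_path = os.path.join(out_dir, f"{state}.csv")
        with open(merged_path, "w", newline="") as out:
            writer = csv.DictWriter(out, fieldnames=header)
            writer.writeheader()
            for path in sorted(paths):
                with open(path, newline="") as seg:
                    writer.writerows(csv.DictReader(seg))
        merged[state] = merged_path
    return merged
```

Writing per-chunk segments first keeps memory bounded by `chunk_rows`, which is the same reason the original pipeline split the 5.8 GB file before processing it.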

Patronage Generation
- Previous process: manual fine-tuning
  - Inconsistent: same NAICS code, different patronage/employee ratio
- Current process: employee size range, sales volume range
  - Not perfect data
  - Matching businesses (ZIP, county, NAICS, lat/long)
  - Same employee size range
  - Assumption: sales volume is the same across time
  - Trying to acquire the 2005 data for better correlations
  - Ratios from averaging the previous EP file
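A minimal sketch of the current process, assuming hypothetical field names (`zip`, `county`, `naics`, `lat`, `lon`, `emp_range`, `sales_range`, `employees`, `patrons`): match businesses against the previous EP file, keep only matches in the same employee size range, and average the previous patron/employee ratio per (employee size range, sales volume range) bucket.

```python
from collections import defaultdict
from statistics import mean


def match_key(rec):
    # Businesses are matched on ZIP, county, NAICS and coordinates.
    return (rec["zip"], rec["county"], rec["naics"], rec["lat"], rec["lon"])


def bucket_ratios(previous_ep, current):
    """Average the previous EP file's patron/employee ratio for each
    (employee size range, sales volume range) bucket.

    Only businesses that match across files and keep the same employee size
    range contribute. The sales volume range comes from the current record,
    i.e. sales volume is assumed unchanged over time.
    """
    prev_by_key = {match_key(r): r for r in previous_ep}
    buckets = defaultdict(list)
    for rec in current:
        old = prev_by_key.get(match_key(rec))
        if old is None or old["emp_range"] != rec["emp_range"] or old["employees"] == 0:
            continue
        buckets[(rec["emp_range"], rec["sales_range"])].append(old["patrons"] / old["employees"])
    return {k: mean(v) for k, v in buckets.items()}


def assign_patrons(current, ratios):
    """Patrons = employees x bucket ratio (None if the bucket was never seen)."""
    out = []
    for rec in current:
        r = ratios.get((rec["emp_range"], rec["sales_range"]))
        out.append(None if r is None else rec["employees"] * r)
    return out
```

Because every business in a bucket gets the same averaged ratio, two businesses with very different ratios in the old file end up with the same one here, which is the loss of nuance the conclusion slide points out.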

Comparison: Distributions

Conclusion: we need to use NAICS codes as well. A large number of ratio values between 0 and 1 are offset by the … Therefore, we get a surge of averages around 4-5. It is difficult to capture nuances with just employee size and sales volume.
Next steps: manpower is needed to assign a ratio for each combination of NAICS code, sales volume and employee size.
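One way to organize the proposed NAICS × sales volume × employee size ratios is a lookup that falls back to coarser keys when a combination has no data. The fallback order below (full key, then 2-digit NAICS sector, then size/sales only, then an overall mean) is our assumption, not something the slides specify.

```python
from collections import defaultdict
from statistics import mean


def build_tables(samples):
    """samples: non-empty iterable of (naics, sales_range, emp_range, ratio).

    Returns mean ratios at three levels of detail plus an overall default.
    """
    levels = {"full": defaultdict(list), "sector": defaultdict(list),
              "size": defaultdict(list)}
    all_ratios = []
    for naics, sales, emp, ratio in samples:
        levels["full"][(naics, sales, emp)].append(ratio)
        levels["sector"][(naics[:2], sales, emp)].append(ratio)  # 2-digit NAICS sector
        levels["size"][(sales, emp)].append(ratio)
        all_ratios.append(ratio)
    tables = {name: {k: mean(v) for k, v in t.items()} for name, t in levels.items()}
    tables["default"] = mean(all_ratios)
    return tables


def lookup_ratio(tables, naics, sales, emp):
    """Return (ratio, level) for the most specific level that has data."""
    for name, key in (("full", (naics, sales, emp)),
                      ("sector", (naics[:2], sales, emp)),
                      ("size", (sales, emp))):
        if key in tables[name]:
            return tables[name][key], name
    return tables["default"], "default"
```

Returning the level alongside the ratio makes it easy to count how many businesses still rely on the coarse size/sales buckets, i.e. where manual ratio assignment would help most.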

A Look at the Data

NJ Counties (Change in NJ EP File): Uncensored vs. Un-Named Removed

NJ-Wide (Change in NJ EP File)
Uncensored
- No. businesses: +73,500
- Total employees: +4.8M
- Employee size
- Total patrons: -4.9M
- Average patrons
Un-named removed
- No. businesses: +39,350
- Total employees: +4.8M
- Employee size
- Total patrons: -5.3M
- Average patrons

Nation-Wide
State | Sales volume | No. businesses | Total employees
California | $1,889 | 1,579,342 | 23.5M
Texas | $2,115 | 999,331 | 17.6M
Florida | $1,702 | 895,586 | 12.3M
New York | $1,822 | 837,773 | 18.3M
Pennsylvania | $2,134 | 550,678 | 10.5M
New Jersey | $1,919 | 428,596 | 8.8M
Washington, DC | $1,317 | 49,488 | 5.7M
Rhode Island | $1,814 | 46,503 | 1.1M
North Dakota | $1,978 | 44,518 | 0.49M
Delaware | $2,108 | 41,296 | 0.67M
Vermont | $1,554 | 39,230 | 0.38M
Wyoming | $1,679 | 35,881 | 0.34M
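The per-state columns (businesses, total and average employees, total and average patrons) can be rebuilt from the 51 merged state files. A sketch assuming each business record has hypothetical fields `state`, `sales_volume`, `employees` and `patrons`. The slide doesn't say whether sales volume is an average or a total, so the code reports the per-business average. States are ranked by number of businesses, which is consistent with the row order above.

```python
from collections import defaultdict


def state_summary(businesses):
    """Aggregate business records into one row per state, ranked by
    number of businesses (descending). Field names are hypothetical."""
    acc = defaultdict(lambda: {"n": 0, "sales": 0.0, "emp": 0, "patrons": 0.0})
    for b in businesses:
        a = acc[b["state"]]
        a["n"] += 1
        a["sales"] += b["sales_volume"]
        a["emp"] += b["employees"]
        a["patrons"] += b["patrons"]
    rows = []
    for state, a in acc.items():
        rows.append({
            "state": state,
            "avg_sales_volume": a["sales"] / a["n"],
            "businesses": a["n"],
            "total_employees": a["emp"],
            "avg_employee_size": a["emp"] / a["n"],
            "total_patrons": a["patrons"],
            "avg_patrons": a["patrons"] / a["n"],
        })
    rows.sort(key=lambda r: r["businesses"], reverse=True)
    for rank, row in enumerate(rows, start=1):
        row["rank"] = rank
    return rows
```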

Census Data

Inputs
- 2010 Census Summary File 1
  - Does not convert to CSV/TXT; the files are made for MS Access
  - Process tables (P12, P16, P29, H13, P43) with Talal's VBA macro in MS Access (p. 78)
  - VBA code: whereabouts unknown, perhaps with Prof. K
- Year Census American Community Survey
  - Income data to assign incomes to households and residents
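For the income step, one simple approach is to sample each household's income bracket from the county's ACS bracket counts, using those counts as weights, and give every resident the household's bracket. This is an assumption, since the slides don't say how incomes are drawn. Bracket labels and counts in the test are placeholders, not ACS figures.

```python
import random


def assign_income_brackets(households, bracket_counts, seed=0):
    """Assign an income bracket to each household and to all its residents.

    households: dict household_id -> list of resident dicts (mutated in place).
    bracket_counts: dict bracket label -> number of households in that bracket
        for the county (e.g. taken from an ACS income table); used as weights.
    Returns a dict household_id -> assigned bracket.
    """
    rng = random.Random(seed)  # seeded so a synthesis run is reproducible
    labels = list(bracket_counts)
    weights = [bracket_counts[b] for b in labels]
    draws = rng.choices(labels, weights=weights, k=len(households))
    for residents, bracket in zip(households.values(), draws):
        for person in residents:
            person["income_bracket"] = bracket
    return dict(zip(households, draws))
```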

Generation
- Module 1: outputs a resident file for each county in the state
  - Rows: individual people
  - Attributes/columns: County Number (replace with State Number_County Number for the national file), Household ID, Household Type, Lat/Long, ID Number, Age, Sex, Traveler Type, Income Bracket
- Module 2: out-of-state/region/nation nodes
- For comments on the code, go to p… esizer_v.1.pdf
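A sketch of one Module 1 output row using the national county key. It assumes State Number and County Number are FIPS codes zero-padded to 2 and 3 digits (an assumption). The field types and the `HEADER` names are illustrative, not the original module's. Keeping the key as a string preserves leading zeros, e.g. California is state 06.

```python
from dataclasses import astuple, dataclass


def national_county_id(state_fips: int, county_fips: int) -> str:
    """Build the 'State Number_County Number' key, e.g. (34, 21) -> '34_021'."""
    return f"{state_fips:02d}_{county_fips:03d}"


@dataclass
class Resident:
    county_id: str        # national key from national_county_id()
    household_id: int
    household_type: int
    lat: float
    lon: float
    person_id: int
    age: int
    sex: str
    traveler_type: int
    income_bracket: int

    def to_row(self) -> list:
        """Values in the same order as HEADER, ready for csv.writer."""
        return list(astuple(self))


# Hypothetical column names, in the order listed on the slide.
HEADER = ["County", "HouseholdID", "HouseholdType", "Lat", "Long",
          "ID", "Age", "Sex", "TravelerType", "IncomeBracket"]
```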

Further Steps

What To Do Next?
- Patronage generation with NAICS, sales volume, employee size and research: low difficulty
  - I already generated a file mapping all NAICS codes and employment counts, along with payrolls, for patronage assignment using 2010 Census data (200K entries)
- Census data generation and reworking the NN generation modules: high difficulty
- Optional: data verification for the Employee-Patronage files

Modules
- Very hard-coded for NJ; not well commented
- Initial national implementation ideas:
  - Treat the US as one entity, with external nodes at airports to represent foreigners
    - Problem: computationally intensive for 330M people
    - Solution: do a semi-randomized sample
  - Regionalize the US and use out-of-region external nodes
    - Less labor-intensive, and allows parallel processing
  - Do each state separately
    - Problem: hard to generalize the code; out-of-state nodes
    - Extremely labor-intensive
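The slides don't define "semi-randomized sample". One reading, sketched here as an assumption, is a stratified sample: draw the same fraction of households in every county (at least one per county) and keep an expansion weight so results can be scaled back up to the full population. The `county_id` field is hypothetical.

```python
import math
import random
from collections import defaultdict


def stratified_sample(households, fraction, seed=0):
    """Sample about `fraction` of households within each county.

    households: list of dicts with a 'county_id' key (hypothetical field).
    Returns a list of (household, weight) pairs, where weight is how many
    households each sampled one stands for in its county.
    """
    rng = random.Random(seed)
    by_county = defaultdict(list)
    for hh in households:
        by_county[hh["county_id"]].append(hh)
    sample = []
    for county in sorted(by_county):
        members = by_county[county]
        k = max(1, math.ceil(fraction * len(members)))  # every county is kept
        picked = rng.sample(members, k)                  # without replacement
        weight = len(members) / k
        sample.extend((hh, weight) for hh in picked)
    return sample
```

The weights sum to the original household count, so trip totals from the sample can be scaled back to national totals.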

The Code: Thought Process
- Trips are generated state by state
  - Use state-level demographic information on residents
  - Ignore state boundaries otherwise, since we have employer and attraction information for the whole nation
- Example:
  - John Smith lives in NYC and works in CT.
  - We get his household from the NYC Census file and his workplace from the probability distribution of workplaces in the CT E-P file.
  - When we map NYC trips, we see John Smith going to CT for work. When we map CT trips, we see him returning from work.
- Trip destinations can be approximated using destination county centroids
  - Requires assigning a centroid to each county
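A sketch of the county-centroid approximation. It assumes a centroid is the plain mean of known points in the county (e.g. business locations from the E-P file); published Census county centroids would be an alternative. County keys follow the hypothetical `State_County` format and trip fields are illustrative.

```python
from collections import defaultdict


def county_centroids(points):
    """Approximate each county's centroid as the mean lat/long of the
    points (e.g. business locations) that fall in it.

    points: iterable of (county_id, lat, lon). Averaging raw degrees is
    fine for county-sized areas but not across the antimeridian.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for county, lat, lon in points:
        s = sums[county]
        s[0] += lat
        s[1] += lon
        s[2] += 1
    return {c: (s[0] / s[2], s[1] / s[2]) for c, s in sums.items()}


def trip_destination(trip, centroids):
    """Return a copy of the trip whose destination is its county centroid."""
    return {**trip, "dest_lat_lon": centroids[trip["dest_county"]]}
```

In the John Smith example, his work trip would end at the centroid of his CT workplace county rather than at the employer's exact coordinates.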

The Code: Thought Process
- Workplace assignment (without replacement):
  - Census maps individuals to a workplace: John Smith lives in NYC and works in CT
  - Use the distribution to match the workplace to the E-P file (keep a count of employees so it matches the number given)
  - John Smith is mapped to an employer in CT
  - If the workplace is more than x miles away (e.g. 250), assume arrival at an airport
- School assignment (without replacement):
  - Use bounds and the distribution to match students with schools (assume same county)
  - Jane (age 8) is mapped to an elementary school in her county
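Both assignments are "without replacement": each employer (or school) has a fixed number of slots, and a slot is used up once someone is assigned to it. The sketch assumes "use distribution" means drawing in proportion to remaining slots, and applies the airport rule with a great-circle (haversine) distance and x = 250 miles. All names are hypothetical.

```python
import math
import random


def haversine_miles(a, b):
    """Great-circle distance in miles between (lat, lon) points a and b."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(h))  # mean Earth radius ~3958.8 mi


class SlotPool:
    """Sites (employers or schools) with limited slots; each draw uses one."""

    def __init__(self, capacities, seed=0):
        self.remaining = dict(capacities)  # site_id -> open slots
        self.rng = random.Random(seed)

    def draw(self, eligible=None):
        """Pick a site with probability proportional to its open slots,
        optionally restricted to the `eligible` site ids. None if all full."""
        sites = [s for s, n in self.remaining.items()
                 if n > 0 and (eligible is None or s in eligible)]
        if not sites:
            return None
        site = self.rng.choices(sites, weights=[self.remaining[s] for s in sites])[0]
        self.remaining[site] -= 1
        return site


def work_arrival_mode(home, work, airport_threshold_miles=250):
    """Beyond the threshold distance, assume the worker arrives by airport."""
    return "airport" if haversine_miles(home, work) > airport_threshold_miles else "ground"
```

For schools, `eligible` would hold the schools in the student's county whose grade bounds fit the student's age, so Jane (8) can only draw an elementary school.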

The Code: Thought Process
- Tour type assignment and temporal dimension
  - Can try to repurpose Talal's code
  - Add time zones to the temporal dimension
  - Can do this with replacement (patrons)
  - Assumption: same behavior across states in terms of work time, leisure time and activity patterns
- Out-of-country commuters / non-resident workers
  - International nodes for the states along the Canadian and Mexican borders
  - Trip to the nearest border crossing