Adaptive Kernel Density in Demographic Analysis Richard Lycan Institute on Aging Portland State University.


Context
We examined the use of adaptive bandwidth density mapping to spatially generalize block group data from the American Community Survey (ACS) and tabulate it to new geographies. We used disability rates for the population age 65 and older to test the concepts. Disability rate data are published in the ACS tables. The survey asks about several types of disability, such as hearing or problems getting around in the residence.
Methods were evaluated in two contexts:
1. A study of rural transportation needs in Oregon, for Census urban area geographies (2015 conference proceedings)
2. Development of neighborhood statistics for Portland, Oregon, for neighborhood association districts (this paper)
An earlier study examined grid based methods in school enrollment forecasting (2014 Education Conference Proceedings).

Some characteristics of disability rates
Partly to escape the problems of small sample size, we use the data for any disability. Here are the individual level correlations between the six measures. Disability rates rise with age. We use rates for the population age 65 plus, but the rates for the very old are high. Persons with higher educational levels respond less frequently to the disability questions and incur disabilities later in life. Disability rates are slightly higher for persons living in urban than in rural settings.

Disability rates for block groups, tracts, and PUMAs
Block groups, census tracts, and PUMAs show disability rates at varying levels of detail. At the block group level we see interesting detail, but we know that much of the variation is due to sampling error. Census tract data average out some of the sampling error but also reduce the amount of geographical detail. Going all the way to PUMA geography, areas with at least 100,000 population, provides stable results but little geographical detail.

Our goal
- We would like to have data for various geographies that are reliable.
- There is a tradeoff between risk from sampling error and geographical specificity.
- Here is a map of disability rates by neighborhood, compiled by allocating block group data to neighborhoods using block centroid populations.
Can grid based methods improve on this standard method for allocating data to new geographies?
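The standard allocation described above can be sketched as follows. This is a minimal illustration, not the procedure actually used for the map: it assumes each block group's estimates are split among its blocks in proportion to block centroid population, and the function name, argument layout, and identifiers are hypothetical.

```python
from collections import defaultdict

def allocate_to_neighborhoods(block_groups, blocks):
    """Allocate block group estimates to neighborhoods by block population share.

    block_groups: {bg_id: (disabled_65plus, pop_65plus)} (assumed ACS estimates)
    blocks: list of (bg_id, neighborhood_id, block_pop_65plus), one per block centroid
    Returns {neighborhood_id: disability_rate}.
    """
    # Total block centroid population within each block group.
    bg_block_pop = defaultdict(float)
    for bg_id, _, pop in blocks:
        bg_block_pop[bg_id] += pop

    disabled = defaultdict(float)
    persons = defaultdict(float)
    for bg_id, nbhd, pop in blocks:
        total = bg_block_pop[bg_id]
        if total == 0:
            continue  # no population to apportion in this block group
        share = pop / total
        bg_disabled, bg_pop = block_groups[bg_id]
        disabled[nbhd] += share * bg_disabled
        persons[nbhd] += share * bg_pop

    return {n: disabled[n] / persons[n] for n in persons if persons[n] > 0}
```

Because each block inherits its block group's rate, any sampling error in the block group estimate passes straight through to the neighborhood figure.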

How density grids are used to calculate disability rates
The map to the right shows a combination of:
- A grid map of disability rates
- A point map of block centroids with population age 65+
- Neighborhood boundaries
The Spatial Analyst Extract Values to Points tool is used to add the rate to the point file. Disability rates for neighborhoods are summarized using ArcMap or Excel.
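The extract and summarize steps can be imitated outside ArcMap. The sketch below is an assumption-laden stand-in, not the Spatial Analyst tool itself: the grid is a plain list of rows with row 0 at the northern edge, the function names are hypothetical, and neighborhood rates are taken as population-weighted means of the rates picked up at the block centroids.

```python
from collections import defaultdict

def extract_value(grid, x_min, y_max, cell_size, x, y):
    """Return the value of the grid cell containing (x, y), or None if outside."""
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)  # row 0 is the top (northern) row
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None

def neighborhood_rates(grid, x_min, y_max, cell_size, centroids):
    """Summarize grid rates by neighborhood.

    centroids: list of (x, y, pop_65plus, neighborhood_id)
    Returns {neighborhood_id: population-weighted mean rate}.
    """
    weighted = defaultdict(float)
    pop_total = defaultdict(float)
    for x, y, pop, nbhd in centroids:
        rate = extract_value(grid, x_min, y_max, cell_size, x, y)
        if rate is None:
            continue  # centroid falls outside the grid extent
        weighted[nbhd] += rate * pop
        pop_total[nbhd] += pop
    return {n: weighted[n] / pop_total[n] for n in pop_total if pop_total[n] > 0}
```

Weighting by centroid population keeps sparsely populated blocks from dominating a neighborhood's rate.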

Quantifying the error
The graphs to the right show the coefficient of variation (CV), the ratio of the MOE to the estimate. The CV declines from huge to acceptable as the size of the geography increases. We include an intermediate geography, referred to as a “supertract”, between the tract and the PUMA. Supertracts were constructed to have about 3,000 persons age 65+.
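As a small worked example of the error measure, the sketch below computes the ratio as defined on the slide (MOE divided by the estimate) alongside the more common convention, in which the published 90 percent ACS margin of error is first converted to a standard error by dividing by 1.645. The function names are illustrative only.

```python
Z_90 = 1.645  # ACS margins of error are published at the 90 percent confidence level

def moe_ratio(estimate, moe):
    """Ratio of the MOE to the estimate, as the CV is defined on the slide."""
    if estimate == 0:
        raise ValueError("ratio is undefined for a zero estimate")
    return moe / estimate

def cv_from_moe(estimate, moe, z=Z_90):
    """Conventional CV: standard error (MOE / z) divided by the estimate."""
    if estimate == 0:
        raise ValueError("CV is undefined for a zero estimate")
    return (moe / z) / estimate
```

The two measures differ only by the constant factor 1.645, so they rank geographies identically; only the thresholds for "acceptable" change.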

Sample size by geography
In the following grid based analyses we use data for geographies about the size of the supertract. The median value of the CV declines by about a factor of two as one progresses from small to large geographies. The median number of persons age 65+ rises more rapidly across block group, census tract, supertract (2,923), and PUMA (14,714). These values give us some rough guidelines about the population desired in our grid based analysis.

The type of grid mapping affects the results
A fixed distance grid and an adaptive grid are used to calculate disability rates. With a fixed distance grid, all the points within a fixed radius of each grid cell are used to calculate density at that grid location. Here the radius is 4,400 feet. Note that the number of persons age 65+ varies considerably between the two red circles with a 4,400 foot radius, and thus the sampling error would vary considerably across the map.
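A fixed distance calculation for one grid cell might look like the sketch below. It is a simplified illustration with unweighted sums inside the radius; the point layout (x, y, disabled, persons) and the function name are assumptions. Returning the number of persons found makes it easy to see how the effective sample varies from cell to cell.

```python
import math

def fixed_radius_rate(cell_xy, points, radius_ft=4400.0):
    """Disability rate from all points within a fixed radius of a grid cell.

    points: list of (x, y, disabled_65plus, persons_65plus), coordinates in feet
    Returns (rate, persons_in_radius); rate is None when no persons are found.
    """
    cx, cy = cell_xy
    disabled = 0.0
    persons = 0.0
    for x, y, d, p in points:
        if math.hypot(x - cx, y - cy) <= radius_ft:
            disabled += d
            persons += p
    rate = disabled / persons if persons > 0 else None
    return rate, persons
```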

The adaptive grid reaches out to get a required number of persons age 65+
When only 500 persons age 65+ are required, that number can be found within a few thousand feet. This range roughly equates to the sample in a single census tract. As the number required increases to 1,000, 2,000, …, the search needs to reach out over a longer radius. The 3,000 person distance roughly equates to the sample in four or five census tracts.
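The reaching-out behavior can be sketched as a search over points sorted by distance from the grid cell. This is an illustration under simple assumptions (a brute-force sort, points given as x, y, persons age 65+), with a hypothetical function name; it is not the code behind the maps.

```python
import math

def adaptive_radius(cell_xy, points, required_persons):
    """Smallest distance from the cell at which the cumulative count of
    persons reaches required_persons, or None if it is never reached.

    points: list of (x, y, persons_65plus), coordinates in feet
    """
    cx, cy = cell_xy
    by_distance = sorted((math.hypot(x - cx, y - cy), p) for x, y, p in points)
    total = 0.0
    for dist, p in by_distance:
        total += p
        if total >= required_persons:
            return dist
    return None
```

Raising the required count from 500 to 3,000 lengthens the returned radius, which is the pattern the slide describes.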

Disability rates using adaptive bandwidth grid
This animation shows the computation of disability rates for various adaptive bandwidths. A bandwidth of 500 means that the grid is based on a search radius sufficient to include 500 persons age 65+. A bandwidth of 3,000 persons is roughly equivalent to a supertract.

This sequence of slides walks through the calculation of the number of disabled by bandwidth. As bandwidth increases, the range of computed values decreases. Note the outlier values in ALLOC for Sunderland and Portland Airport.

Comparing fixed and adaptive methods
This scatter chart shows the correlation between the estimates of disabled age 65+ from the fixed and adaptive bandwidth models. The fixed bandwidth is 4,400 feet and the adaptive bandwidth is 3,000 persons. The results of the two models in this case are quite similar. Both reduce the effects of the sampling errors. Applied to Census urban area data, the fixed bandwidth did not perform as well.

Software for adaptive kernel density analysis
I developed my own program to better understand the method.
Two stages:
- 1. Computes a distance grid for a particular bandwidth, e.g. the distance that includes 1,000 persons
- 2. Uses the distances from stage 1 to calculate a ratio grid between numerator and denominator, e.g. disabled / all persons
Features:
- Grids for numerator and denominator use the same bandwidth, based on the same area
- Weights values by the reciprocal of distance raised to a power; d^2 was used in the examples
- Cuts off the search at some distance value; 8 miles was used in the example
- Written in low level code using True BASIC
Available software:
- Spatial Analyst does not include an adaptive tool. Statistical Analyst includes an adaptive tool, but it is not applicable to this computation.
- CrimeStat includes an adaptive bandwidth grid model. It does not use common geography for numerator and denominator.
- SaTScan includes an adaptive bandwidth model similar to my program.
- Various. Researchers in such fields as remote sensing, epidemiology, criminology, and ecology make use of the adaptive bandwidth grid model using various scientific computing software, such as Stata.
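The two stages and the listed features can be sketched in a few lines. This is an illustrative reconstruction from the description above, not a port of the True BASIC program: the function names, the point layout (x, y, disabled, persons), and the 1 foot minimum distance that guards against division by zero are all assumptions.

```python
import math

FEET_PER_MILE = 5280.0

def stage1_bandwidth(cell_xy, points, required_persons,
                     cutoff_ft=8 * FEET_PER_MILE):
    """Stage 1: distance from the cell needed to include required_persons,
    capped at cutoff_ft (8 miles in the slide's example)."""
    cx, cy = cell_xy
    by_distance = sorted((math.hypot(x - cx, y - cy), p) for x, y, _, p in points)
    total = 0.0
    for dist, p in by_distance:
        if dist > cutoff_ft:
            break  # stop searching beyond the cutoff
        total += p
        if total >= required_persons:
            return dist
    return cutoff_ft

def stage2_ratio(cell_xy, points, bandwidth_ft, power=2.0, min_dist_ft=1.0):
    """Stage 2: inverse-distance-weighted ratio of disabled to all persons,
    using the same bandwidth for numerator and denominator."""
    cx, cy = cell_xy
    num = 0.0
    den = 0.0
    for x, y, disabled, persons in points:
        dist = math.hypot(x - cx, y - cy)
        if dist > bandwidth_ft:
            continue
        # Assumed floor on distance so a point at the cell center is usable.
        weight = 1.0 / max(dist, min_dist_ft) ** power
        num += weight * disabled
        den += weight * persons
    return num / den if den > 0 else None
```

Running stage 1 once per bandwidth and reusing its distances for both numerator and denominator is what keeps the two grids on a common area.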

Conclusions
There is high interest in publishing data from the ACS for local geographies, such as neighborhoods. Allocating block group data makes this possible, but with problems due to sampling error. The use of grid based models in the allocation process is one option for compiling ACS data for areas like neighborhoods.
Advantages:
- Reduces wild swings due to sampling error.
- Recompilation to different geographies is easy.
Disadvantages:
- The computed values represent a general region, not a specific geography.
- Error measures can only be stated in broad terms.
- Works best where values vary in a smooth form.
Fixed and adaptive distance grids behave differently:
- The adaptive distance grid emphasizes sample size, but it is hard to visualize what area is included in the calculations.
- For the fixed distance grid, it is difficult to decide what radius to use when the density of the observed variable varies greatly.

Richard Lycan Institute on Aging College of Urban and Public Affairs Portland State University, Oregon