Kevin A. Henry, Ph.D., New Jersey Cancer Registry, Cancer Epidemiology Services; Frank Boscoe, Ph.D., New York State Cancer Registry: Estimating the accuracy of different geographical imputation methods

Presentation transcript:

Estimating the accuracy of different geographical imputation methods
Kevin A. Henry, Ph.D., New Jersey Cancer Registry, Cancer Epidemiology Services
Frank Boscoe, Ph.D., New York State Cancer Registry
Paper presentation: NAACCR Annual Meeting, 2007, Detroit, MI

Introduction
Geographical imputation: methods to assign a case a geographic location that is approximate or accurate given available geographic and demographic data.
The goal of geo-imputation is to assign a case a location at one geographical aggregate level based on information from one or more known geographical aggregates (Boscoe 2007).
Assigned locations can be:
- Area (e.g. census tract, block group)
- Point (e.g. latitude and longitude within census tract)
Geo-imputation example: zip code to census tract. Available case information: zip code '08648', race 'Black'.
[Figure: Black population % by area within the zip code; legible legend values: 8.9%, 19.5%, 50.%]

Introduction: Why should we geo-impute?
Studies can be biased due to the geographic non-randomness of ungeocoded cases or of cases geocoded to a zip code centroid (Oliver et al. 2006).
- Cases geocoded to a zip code centroid may not be located in the correct census tract.
- Removing cases geocoded by zip code can result in selection bias.
- Cases geocoded to zip code centroids can inflate case counts at the location where the zip centroid falls.
No systematic evaluation of geo-imputation has been completed to determine which method offers the best predictive power.
Should we geo-impute?

Study Objective
Examine the usefulness of geo-imputation for assigning census tracts to cases that have previously been geocoded only to a zip code centroid.
Study Questions
What census tract demographic information (e.g. race, age) provides the best predictive value for assigning a case to the correct census tract?
Is demographic-based geo-imputation better than two alternatives?
1) Selecting census tracts within a zip code zone randomly
2) Using the census tracts originally assigned to cases based on the zip code centroid location

Background: What is a zip code?
ZIP ('Zone Improvement Plan') codes are linear features associated with specific roads or specific addresses.
Zip code zones are created by digitizing boundaries around street ranges.
[Figure labels: census tracts falling within a zip code zone; street segments used for geocoding; zip code centroid]

Background: New Jersey Zip Codes
- 558 zip code zones
- 92% of zip codes have 2 or more potential census tracts
- 1 zip code has 23 potential census tracts
- Average tracts per zip code: [value not preserved]
[Figure: Census Tracts Per Zip Code; axes: tract frequency and percent (up to 25%)]

Methods: Study Population
New Jersey residents diagnosed with breast, prostate and colorectal cancer and geocoded to a full street address ([years not preserved], N=96,852, NJSCR).
Additional study exclusions (N=4,100):
- No age or race
- Invalid zip codes
- Invalid census tracts
- Cases geocoded to zip centroids with only one census tract
Registry variables: race, age, census tract, zip code, census tract certainty.
[Diagram: census tracts assigned to cases in the imputed case data are compared with the 'truth' census tracts assigned to cases in the original case data]

Methods: Demographic Data
Creation of census tract populations:
- 2000 Census block populations aggregated into zip codes (Tele Atlas, 2006).
- Census tract populations created to include only populations within the zip code.
- Cumulative probabilities calculated for each tract per zip code.
2000 SF1 Census populations included:
-Total Population (P001001)
-White alone (P003003)
-Black or African Amer. alone (P003004)
-Asian alone (P003006)
-Hispanic or Latino (P004002)
-Total Population by age ([start code not preserved] to P012049)
[Figure labels: Total Tract Population, Zip code, Census Block Population; legible values: 6,774 and 3,101]
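The aggregation described on this slide can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function name `tract_cumulative_probs`, the `(zip_code, tract_id, population)` record layout, and the sample figures are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import accumulate


def tract_cumulative_probs(blocks):
    """Build per-zip cumulative probabilities from census block populations.

    blocks: iterable of (zip_code, tract_id, population), one record per
    census block (hypothetical layout). Only the part of each tract that
    lies inside the zip code contributes to that zip code's table.
    Returns {zip_code: [(tract_id, cumulative_probability), ...]}.
    """
    pops = defaultdict(lambda: defaultdict(int))
    for zip_code, tract_id, population in blocks:
        pops[zip_code][tract_id] += population

    tables = {}
    for zip_code, tracts in pops.items():
        total = sum(tracts.values())
        if total == 0:
            continue  # no population to weight by; skip this zip code
        ids = sorted(tracts)
        cumulative = list(accumulate(tracts[t] / total for t in ids))
        cumulative[-1] = 1.0  # remove floating-point drift at the top
        tables[zip_code] = list(zip(ids, cumulative))
    return tables


if __name__ == "__main__":
    # Made-up block populations for one zip code.
    sample = [("08648", "A", 10), ("08648", "A", 20),
              ("08648", "B", 30), ("08648", "C", 40)]
    print(tract_cumulative_probs(sample))
```

The same function could be run once per demographic group (for example, one table built from Black population counts and another from total population) to obtain the race-specific and total-population variants compared later in the talk.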

Method: Geo-imputation
Step 1: Calculate cumulative probabilities from census tract (CT) population.
Step 2: Generate a random number (0-1) for each case.
Step 3: Assign a census tract based on the random number ranges.
[Example table columns: Percent, Cum Probability; legible percent values: 32.8%, 18.4%, 15.9%]
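The three steps can be sketched as a lookup of a random number against cumulative-probability ranges. The sketch below is an assumption about how such a draw might be coded, not the method as the authors implemented it; the names `pick_tract` and `impute_tract` and the three-tract table are hypothetical.

```python
import bisect
import random


def pick_tract(cum_probs, u):
    """Return the tract whose cumulative-probability range contains u.

    cum_probs: list of (tract_id, cumulative_probability) in ascending
    order, with the last cumulative probability equal to 1.0.
    u: a number in [0, 1).
    """
    cutoffs = [c for _, c in cum_probs]
    # Index of the first cutoff strictly greater than u.
    i = bisect.bisect_right(cutoffs, u)
    # Guard against u == 1.0 or rounding at the top of the range.
    i = min(i, len(cum_probs) - 1)
    return cum_probs[i][0]


def impute_tract(cum_probs, rng=random):
    """Draw one random number for a case and map it to a tract."""
    return pick_tract(cum_probs, rng.random())


# Hypothetical zip code with three candidate tracts whose population
# shares are 50%, 30% and 20%.
table = [("T1", 0.50), ("T2", 0.80), ("T3", 1.00)]

if __name__ == "__main__":
    print(impute_tract(table, random.Random(2007)))
```

Separating the draw (`impute_tract`) from the lookup (`pick_tract`) keeps the lookup deterministic, which makes it easy to check the range boundaries by hand.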

Methods: Test Samples
Random samples were drawn for race and age groups, stratified by population density (quintiles).
Geo-imputations were completed for each subset, and the imputed census tracts were compared with the tracts from the original case data (truth).
Each imputation was run 1000 times.
Results: boxplots of mean % of matches.
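The evaluation loop described here (impute, compare with the true tract, repeat, average) might look like the following sketch. It assumes cases are available as `(zip_code, true_tract)` pairs and that each zip code has a cumulative-probability table; all function names, the seed, and the data layout are hypothetical. A helper for the random-allocation baseline, which gives every tract in a zip code an equal chance, is included for comparison.

```python
import bisect
import random
from statistics import mean


def uniform_table(tract_ids):
    """Cumulative table for the random baseline: equal weight per tract."""
    k = len(tract_ids)
    return [(t, (i + 1) / k) for i, t in enumerate(tract_ids)]


def match_rate(cases, tables, rng):
    """Percent of cases whose imputed tract equals the true tract.

    cases: list of (zip_code, true_tract).
    tables: {zip_code: [(tract_id, cumulative_probability), ...]}.
    """
    hits = 0
    for zip_code, true_tract in cases:
        table = tables[zip_code]
        cutoffs = [c for _, c in table]
        i = min(bisect.bisect_right(cutoffs, rng.random()), len(table) - 1)
        hits += table[i][0] == true_tract
    return 100.0 * hits / len(cases)


def mean_match_rate(cases, tables, runs=1000, seed=0):
    """Repeat the imputation `runs` times and average the match rate."""
    rng = random.Random(seed)
    return mean(match_rate(cases, tables, rng) for _ in range(runs))
```

Passing population-weighted tables and then `uniform_table` output for the same cases would give the two kinds of figures the talk compares: demographic imputation versus random allocation.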

Results:
[Figure: boxplots of mean percent correct (10% to 35%) by population per square mile by census tract, in quintiles from rural (<1,132) to urban (>11,579); intermediate quintile bounds not preserved. Reference values: no imputation (17.1%), random (13%).]

Results:
[Figure: boxplots of mean percent correct (10% to 30%) by population group: Asian, White, Black, Hispanic, and Asian, White, Black & Hispanic combined (N=33,500). Sample sizes as listed: N=1,500, N=25,000, N=4,000, N=3,000. Reference values: no imputation (17.1%), random (13%), total population (24%). Other values shown: 23.1%, 22%, 26.3%, 22.2%, 24.6%.]

Results:
[Figure: boxplots of mean percent correct (10% to 30%) by age group, up to >85. Reference values: no imputation (17.1%), random (13%), age combined (24.9%).]

Conclusion
- Geo-imputation provides a higher match rate than no imputation or random allocation of tracts.
- Percent correct depends on population density.
- The match rate for imputation based on race-specific population is slightly higher than for total population (23.1% vs 24%).
- States with larger rural populations would likely have better match rates than New Jersey.
- Geographic imputation does offer some advantages and no serious drawbacks compared with the alternative of excluding ungeocoded cases from an analysis.
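One way to see why population-weighted imputation would be expected to beat random allocation, under a simplifying assumption that the slides do not state: suppose a zip code contains k tracts, a case's true tract is tract i with probability p_i (that tract's share of the relevant population), and the imputed tract is drawn independently with the same probabilities. The symbols k and p_i are introduced here for illustration only.

```latex
P(\text{match} \mid \text{weighted}) = \sum_{i=1}^{k} p_i^{2}
\;\ge\; \frac{1}{k}
= \sum_{i=1}^{k} p_i \cdot \frac{1}{k}
= P(\text{match} \mid \text{random})
```

The inequality follows from the Cauchy-Schwarz inequality, with equality only when every tract has the same share, p_i = 1/k. In practice the observed rates also depend on how well census population shares describe where cancer cases actually live, so this is a rough intuition rather than a prediction of the reported percentages.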

Thank you
Note: New Jersey case counts for breast, prostate, colorectal and cervical cancer ([years not preserved]; N=154,071). Data extracted from NJ Registry analytical database, March 5, 2007.