Global Clustering Tests. Tests for Spatial Randomness. $H_0$: The risk of disease is the same everywhere after adjustment for age, gender, and/or other covariates.

Presentation transcript:

Global Clustering Tests

Tests for Spatial Randomness. $H_0$: The risk of disease is the same everywhere after adjustment for age, gender, and/or other covariates.

Tests for Global Clustering. These tests evaluate whether clustering exists as a global phenomenon throughout the map, without pinpointing the location of specific clusters.

Tests for Global Clustering. More than 100 different tests for global clustering have been proposed by different scientists in different fields. For example: Whittemore's Test, Biometrika 1987; Cuzick-Edwards k-NN, JRSS 1990; Besag-Newell's R, JRSS 1991; Tango's Excess Events Test, StatMed 1995; Swartz Entropy Test, Health and Place 1998; Tango's Max Excess Events Test, StatMed 2000.

Cuzick-Edwards k-NN Test. $\sum_i c_i \sum_j c_j \, I(d_{ij} < d_{i\,k(i)})$, where $c_i$ = number of deaths in county $i$, $d_{ij}$ = distance from county $i$ to county $j$, and $k(i)$ = the county with the k-th nearest neighbor to an individual in county $i$, defined in terms of expected cases rather than individuals.

Cuzick-Edwards k-NN Test. It is a special case of the weighted Moran's I test, proposed by Cliff and Ord, 1981.
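The k-NN statistic above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes counties are represented by centroid coordinates, and it reads k(i) as the first county (in order of distance from county i, starting with i itself) at which the cumulative expected case count reaches k. The function name and arguments are hypothetical.

```python
import math

def cuzick_edwards_knn(cases, expected, coords, k):
    """Sum over i of c_i * sum over j of c_j * I(d_ij < d_i,k(i)).

    Assumption: k(i) is the first county, in order of distance from
    county i, at which cumulative expected cases reach k.
    """
    n = len(cases)
    total = 0.0
    for i in range(n):
        dists = [math.dist(coords[i], coords[j]) for j in range(n)]
        order = sorted(range(n), key=lambda j: dists[j])
        cumulative = 0.0
        radius = dists[order[-1]]  # fallback if k is never reached
        for j in order:
            cumulative += expected[j]
            if cumulative >= k:
                radius = dists[j]
                break
        # Strict inequality, as written on the slide.
        inner = sum(cases[j] for j in range(n) if dists[j] < radius)
        total += cases[i] * inner
    return total
```

As the conclusions of the deck stress, the value of k matters a great deal, so in practice the statistic would be computed for several values of k.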

Tango's Excess Events Test. $\sum_i \sum_j [c_i - E(c_i)]\,[c_j - E(c_j)]\, e^{-4 d_{ij}^2 / \lambda^2}$, where $c_i$ = number of deaths in county $i$, $E(c_i)$ = expected cases in county $i$ under $H_0$, $d_{ij}$ = distance from county $i$ to county $j$, and $\lambda$ = clustering scale parameter.
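A minimal sketch of the excess events statistic, assuming county centroids as coordinates and reading the slide's kernel as exp(-4 d_ij^2 / lam^2), where `lam` stands for the clustering scale parameter. The function name and argument layout are hypothetical.

```python
import math

def tango_excess_events(cases, expected, coords, lam):
    """Double sum of excess-case products, weighted by a distance kernel.

    Assumption: the kernel is exp(-4 * d_ij**2 / lam**2), with lam the
    clustering scale parameter.
    """
    n = len(cases)
    excess = [c - e for c, e in zip(cases, expected)]
    stat = 0.0
    for i in range(n):
        for j in range(n):
            d = math.dist(coords[i], coords[j])
            stat += excess[i] * excess[j] * math.exp(-4.0 * d * d / lam ** 2)
    return stat
```

When observed counts equal expected counts everywhere the statistic is zero; nearby counties that both have excess cases push it upward, and a larger `lam` lets more distant pairs contribute.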

Whittemore's Test Whittemore et al. proposed the statistic

Besag-Newell's R. For each case, find the collection of nearest counties so that there are a total of at least k cases in the area of the original and neighboring counties. Using the Poisson distribution, check whether this area is statistically significant (not adjusting for multiple testing). R is the number of cases for which this procedure creates a significant area.

Besag-Newell's R. Let $u_{m(i)} = \min\{j : D_{j(i)} + 1 \ge k\}$. Under the null hypothesis, the number of cases $s$ will have a Poisson distribution with probability where $p = C/N$. For each county R is defined as
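The verbal description of Besag-Newell's R can be sketched as follows. This is a simplified illustration under stated assumptions: counties are represented by centroid coordinates, all cases in a county share that location, the overall rate is p = C/N (total cases over total population), and significance is judged by the Poisson upper tail P(X >= k) at a hypothetical level `alpha`. The exact probability used on the slide is not preserved in the transcript, so this tail is an assumption.

```python
import math

def poisson_upper_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu)."""
    term = math.exp(-mu)  # P(X = 0)
    cdf = 0.0
    for s in range(k):
        cdf += term
        term *= mu / (s + 1)  # P(X = s + 1) from P(X = s)
    return 1.0 - cdf

def besag_newell_r(cases, population, coords, k, alpha=0.05):
    """Count cases whose k-case neighborhood is significant (no multiplicity adjustment)."""
    if sum(cases) < k:
        raise ValueError("k exceeds the total number of cases")
    n = len(cases)
    p = sum(cases) / sum(population)  # overall rate, p = C/N
    r = 0
    for i in range(n):
        if cases[i] == 0:
            continue
        order = sorted(range(n), key=lambda j: math.dist(coords[i], coords[j]))
        area_cases = 0
        area_pop = 0
        for j in order:  # grow the area until it holds at least k cases
            area_cases += cases[j]
            area_pop += population[j]
            if area_cases >= k:
                break
        if poisson_upper_tail(k, p * area_pop) <= alpha:
            r += cases[i]  # every case in county i yields the same area
    return r
```

A small area that already holds k cases has a low expected count, so its tail probability is small and its cases count toward R.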

Swartz's Entropy Test. The test statistic is defined as where $n_i$ is the population in county $i$, and $N$ is the total population.

Global Clustering Tests: Power Evaluation. Joint work with Toshiro Tango, Peter Park and Changhong Song.

Power Evaluation, Setup. 245 counties and county equivalents in the Northeastern United States. Female population. 600 randomly distributed cases, according to different probability models.

Note: Besag-Newell's R and Cuzick-Edwards k-NN tests depend on a clustering scale parameter. For each of these two tests we evaluate three different parameter values.
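The slides do not spell out how power was computed. A common Monte Carlo approach, sketched here purely as an assumption, is to obtain a critical value from data sets simulated under the null hypothesis and then report the fraction of data sets simulated under the alternative whose statistic exceeds it. All names and default values below are hypothetical.

```python
import math
import random

def estimate_power(statistic, simulate_null, simulate_alt,
                   n_null=999, n_alt=1000, alpha=0.05):
    """Monte Carlo power estimate for a test that rejects for large statistics."""
    null_stats = sorted(statistic(simulate_null()) for _ in range(n_null))
    # Empirical (1 - alpha) quantile of the null distribution.
    index = min(n_null - 1, math.ceil((1 - alpha) * n_null) - 1)
    critical = null_stats[index]
    rejections = sum(statistic(simulate_alt()) > critical for _ in range(n_alt))
    return rejections / n_alt
```

Here `statistic` would be one of the global clustering tests and the two simulators would generate case counts per county under the null and alternative models.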

Global Chain Clustering. Each county has the same expected number of cases under the null and alternative hypotheses. 300 cases are distributed according to complete spatial randomness. Each of these has a twin case, located at the same or a nearby location.
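A minimal sketch of how such twin-case data could be generated at the county level. It assumes the 300 original cases are drawn with probability proportional to county population, and it uses a hypothetical `neighbors` mapping to place twins nearby; the slides do not specify how nearby locations were chosen, and this simple neighbor rule does not by itself guarantee equal expected counts under both hypotheses.

```python
import random

def simulate_chain_clustering(population, n_pairs=300, neighbors=None, rng=None):
    """Return case counts per county for n_pairs cases plus one twin each.

    neighbors: optional dict/list mapping a county index to candidate
    counties for its twin (hypothetical); None places the twin in the
    same county (the zero-distance scenario).
    """
    rng = rng or random.Random()
    counties = list(range(len(population)))
    parents = rng.choices(counties, weights=population, k=n_pairs)
    counts = [0] * len(population)
    for i in parents:
        counts[i] += 1
        twin = i if neighbors is None else rng.choice(neighbors[i])
        counts[twin] += 1
    return counts
```

With the defaults this yields 600 cases in total, matching the setup described earlier.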

Power, Zero Distance. Besag-Newell; Cuzick-Edwards; Tango's MEET: 0.99; Swartz Entropy: 1.00; Whittemore's Test: 0.13; Spatial Scan: 0.79

Power, Fixed Distance, 1%. Besag-Newell; Cuzick-Edwards; Tango's MEET: 0.41; Swartz Entropy: 0.14; Whittemore's Test: 0.12; Spatial Scan: 0.28

Power, Fixed Distance, 4%. Besag-Newell; Cuzick-Edwards; Tango's MEET: 0.17; Swartz Entropy: 0.06; Whittemore's Test: 0.10; Spatial Scan: 0.12

Power, Random Distance, 1%. Besag-Newell; Cuzick-Edwards; Tango's MEET: 0.56; Swartz Entropy: 0.39; Whittemore's Test: 0.12; Spatial Scan: 0.35

Power, Random Distance, 4%. Besag-Newell; Cuzick-Edwards; Tango's MEET: 0.25; Swartz Entropy: 0.13; Whittemore's Test: 0.10; Spatial Scan: 0.18

Hot Spot Clusters. One or more neighboring counties have higher risk than those outside. Risks are constant among counties in the cluster, as well as among those outside the cluster.
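A minimal sketch of a hot-spot alternative, assuming cases are allocated to counties with probability proportional to population, multiplied by a constant relative risk inside the cluster. The function name and arguments are hypothetical; the default of 600 cases follows the setup described earlier.

```python
import random

def simulate_hot_spot(population, cluster, relative_risk, n_cases=600, rng=None):
    """Return case counts per county under a single hot-spot cluster."""
    rng = rng or random.Random()
    cluster = set(cluster)
    # Constant risk inside the cluster, constant (baseline) risk outside.
    weights = [pop * (relative_risk if i in cluster else 1.0)
               for i, pop in enumerate(population)]
    counts = [0] * len(population)
    for i in rng.choices(range(len(population)), weights=weights, k=n_cases):
        counts[i] += 1
    return counts
```

Setting `relative_risk=1.0` recovers the null model, so the same function can serve as both the null and the alternative simulator.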

Power, Grand Isle, Vermont (RR=193). Besag-Newell; Cuzick-Edwards; Tango's MEET: 0.20; Swartz Entropy: 0.94; Whittemore's Test: 0.02; Spatial Scan: 1.00

Power, Grand Isle + 15 neighbors (RR=3.9). Besag-Newell; Cuzick-Edwards; Tango's MEET: 0.23; Swartz Entropy: 0.71; Whittemore's Test: 0.01; Spatial Scan: 0.97

Power, Pittsburgh, PA (RR=2.85). Besag-Newell; Cuzick-Edwards; Tango's MEET: 0.92; Swartz Entropy: 0.27; Whittemore's Test: 0.00; Spatial Scan: 0.94

Power, Pittsburgh + 15 neighbors (RR=2.1). Besag-Newell; Cuzick-Edwards; Tango's MEET: 0.83; Swartz Entropy: 0.35; Whittemore's Test: 0.00; Spatial Scan: 0.95

Power, Manhattan (RR=2.73). Besag-Newell; Cuzick-Edwards; Tango's MEET: 0.94; Swartz Entropy: 0.26; Whittemore's Test: 0.27; Spatial Scan: 0.92

Power, Manhattan + 15 neighbors (RR=1.53). Besag-Newell; Cuzick-Edwards; Tango's MEET: 0.99; Swartz Entropy: 0.05; Whittemore's Test: 0.87; Spatial Scan: 0.93

Power, Three Clusters: Grand Isle (RR=193), Pittsburgh (RR=2.85), Manhattan (RR=2.73). Besag-Newell; Cuzick-Edwards; Tango's MEET: 1.00; Swartz Entropy: 0.99; Whittemore's Test: 0.01; Spatial Scan: 1.00

Power, Three Clusters: Grand Isle + 15, Pittsburgh + 15, Manhattan + 15. Besag-Newell; Cuzick-Edwards; Tango's MEET: 0.98; Swartz Entropy: 0.74; Whittemore's Test: 0.12; Spatial Scan: 0.98

Conclusions. Besag-Newell's R and Cuzick-Edwards k-NN often perform very well, but are highly dependent on the chosen parameter. Moran's I and Whittemore's Test have problems with many types of clustering. Tango's MEET performs well for global clustering. The spatial scan statistic performs well for hot-spot clusters.

Limitations. Only a few alternative models were evaluated, on one particular geographical data set. Results may be different for other types of alternative models and data sets.