A Spatial Scan Statistic for Survival Data Lan Huang, Dep Statistics, Univ Connecticut Martin Kulldorff, Harvard Medical School David Gregorio, Dep Community.


A Spatial Scan Statistic for Survival Data Lan Huang, Dep Statistics, Univ Connecticut Martin Kulldorff, Harvard Medical School David Gregorio, Dep Community Medicine, Univ Connecticut

Motivation and Background
What is the geographical distribution of prostate cancer survival in Connecticut? Are there geographical clusters with exceptionally short or long survival?

Survival Data
For each person: time of diagnosis; whether dead or censored; time until death or censoring; residential geographical coordinates; age, etc.

Motivation and Background
Spatial scan statistics with Bernoulli and Poisson models are designed for count data. Length of survival is continuous data. Survival data are often censored.

Solution
A spatial scan statistic using an exponential probability model.

Methodology
Exponential-model-based spatial scan statistic. Hypotheses: H0: θ_in = θ_out versus Ha: θ_in ≠ θ_out. Workflow: exponential likelihood; spatial scan statistic; distribution of the statistic via a permutation test; statistical inference (hypothesis test); detection of a significant cluster.
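The workflow above can be sketched in code. This is a minimal illustration, not the authors' implementation. It assumes that θ is the mean survival time, that the likelihood is the standard censored exponential one (so the estimate of θ is total follow-up time divided by number of deaths), that candidate zones are circles centred on each individual, and that the permutation test reshuffles (time, status) pairs over the fixed locations. The function names and the `max_frac` cap on zone size are hypothetical.

```python
import math
import numpy as np


def xlogx(r, t):
    """Return r * log(r / t), taking the term as 0 when there are no deaths."""
    return r * math.log(r / t) if r > 0 else 0.0


def exp_llr(time_in, deaths_in, time_all, deaths_all):
    """Log likelihood ratio of Ha (theta_in != theta_out) against H0 for one zone.

    Under a censored exponential model the maximised log likelihood of a group
    with r deaths and total follow-up t is r * log(r / t) - r; the "- r" parts
    cancel in the ratio because deaths_in + deaths_out = deaths_all.
    """
    time_out = time_all - time_in
    deaths_out = deaths_all - deaths_in
    return (xlogx(deaths_in, time_in) + xlogx(deaths_out, time_out)
            - xlogx(deaths_all, time_all))


def scan(order, times, dead, max_size):
    """Maximise the LLR over circular zones (nearest-neighbour prefixes)."""
    time_all, deaths_all = float(times.sum()), int(dead.sum())
    best_llr, best_zone = 0.0, order[0, :1]
    for neighbours in order:
        cum_t = np.cumsum(times[neighbours[:max_size]])
        cum_r = np.cumsum(dead[neighbours[:max_size]])
        for k in range(max_size):
            llr = exp_llr(cum_t[k], cum_r[k], time_all, deaths_all)
            if llr > best_llr:
                best_llr, best_zone = llr, neighbours[:k + 1]
    return best_llr, best_zone


def scan_test(coords, times, dead, n_perm=999, max_frac=0.5, seed=0):
    """Return (observed max LLR, indices of the best zone, Monte Carlo p-value).

    Assumes strictly positive follow-up times.
    """
    rng = np.random.default_rng(seed)
    coords = np.asarray(coords, float)
    times = np.asarray(times, float)
    dead = np.asarray(dead, int)
    # Row i of `order` lists all individuals by increasing distance from i.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)
    max_size = max(1, int(max_frac * len(times)))
    observed, zone = scan(order, times, dead, max_size)
    exceed = 0
    for _ in range(n_perm):
        # Shuffle (time, status) pairs over locations; locations stay fixed.
        perm = rng.permutation(len(times))
        llr, _ = scan(order, times[perm], dead[perm], max_size)
        exceed += llr >= observed
    return observed, zone, (exceed + 1) / (n_perm + 1)
```

Calling `scan_test(coords, times, dead)` with an n-by-2 coordinate array, a vector of follow-up times and a 0/1 death indicator returns the most likely cluster and its p-value. Because the p-value comes from the permutation distribution rather than from exponential theory, it does not rely on the survival times actually being exponential.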

Methods Evaluation
Locations of 610 Connecticut prostate cancer patients diagnosed in [...]. [...] patients in southwest Connecticut constitute a cluster with shorter survival (cluster radius: 8.65 km). Each of the 610 patients is assigned a random survival or censoring time, using different distributions inside and outside the cluster.

Model Evaluation
Survival-time distributions: exponential, gamma, log-normal. Parameters: θ_in, θ_out, θ_diff. Censoring: non-censored, or censored with a random or fixed censoring time. 610 individuals.
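The simulation design can be illustrated with a short sketch. The parameter values used in the slides are not preserved here, so everything numeric below is a placeholder. The gamma shape, the log-normal sigma, the uniform distribution for random censoring times and the function names are all assumptions made for illustration only.

```python
import numpy as np


def draw_times(rng, n, mean, dist="exponential", shape=2.0, sigma=0.5):
    """Draw n survival times with the requested mean from one of three families."""
    if dist == "exponential":
        return rng.exponential(mean, n)
    if dist == "gamma":
        # Gamma mean = shape * scale, so scale = mean / shape.
        return rng.gamma(shape, mean / shape, n)
    if dist == "lognormal":
        # Log-normal mean = exp(mu + sigma^2 / 2), solved here for mu.
        return rng.lognormal(np.log(mean) - sigma ** 2 / 2, sigma, n)
    raise ValueError(f"unknown distribution: {dist}")


def simulate(in_cluster, mean_in, mean_out, dist="exponential",
             censoring="none", cutoff=None, seed=0):
    """Return (observed time, death indicator) for every individual.

    in_cluster : boolean array, True for individuals inside the true cluster.
    censoring  : "none", "fixed" (everyone censored at `cutoff`) or
                 "random" (censoring time uniform on [0, cutoff], an assumption).
    """
    rng = np.random.default_rng(seed)
    in_cluster = np.asarray(in_cluster, bool)
    n = in_cluster.size
    # Different mean survival inside and outside the cluster.
    t = np.where(in_cluster,
                 draw_times(rng, n, mean_in, dist),
                 draw_times(rng, n, mean_out, dist))
    if censoring == "none":
        return t, np.ones(n, bool)
    if censoring == "fixed":
        c = np.full(n, float(cutoff))
    elif censoring == "random":
        c = rng.uniform(0.0, cutoff, n)
    else:
        raise ValueError(f"unknown censoring mechanism: {censoring}")
    # Observed time is the earlier of death and censoring.
    return np.minimum(t, c), t <= c
```

For example, `simulate(in_cluster, 2.0, 5.0, dist="gamma", censoring="fixed", cutoff=4.0)` produces one dataset with shorter mean survival inside the cluster. Repeating the call with different seeds and feeding each dataset to a scan statistic gives an estimate of how often the true cluster is detected.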

Figure: number of individuals inside the true cluster that were successfully detected (P-value < 0.05), by θ_diff, for simulated datasets without censoring.

Figure: number of individuals inside the true cluster that were successfully detected (P-value < 0.05), by θ_diff, for censored datasets with a fixed censoring time.

Figure: number of individuals inside the true cluster that were successfully detected (P-value < 0.05), by θ_diff, for censored datasets with a random censoring time.

Model Evaluation
The exponential model is robust: the exponential-based scan statistic rejects the null hypothesis with a low p-value when the difference between the distributions is moderate or large, regardless of the survival-time distribution and the censoring mechanism.

Application to Prostate Cancer Data
Between 1984 and 1995, the Connecticut Tumor Registry recorded [...] invasive prostate cancer incidence cases among the population at risk (roughly 1.2 million males 20+ years old in 1990). [...] records were available after data cleaning. Follow-up ran through December [...]; [...] had died and 8753 were censored.

Significant clusters using exponential model

Application to Prostate Cancer Data
Table: significant clusters with short survival and with long survival. Columns: cluster; in cluster (#deaths, #individuals); RR; LLR; P.

Covariate Adjustment
Younger patients may live longer. There may also be geographical variation in histology or stage.
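One way to adjust for a categorical covariate such as age group under a constant-hazard exponential model is to rescale the follow-up times so that every stratum has the same estimated mean survival before scanning. The sketch below illustrates that idea only; it is not necessarily the procedure used in the slides, and `adjust_times` and its arguments are hypothetical names.

```python
import numpy as np


def adjust_times(times, dead, groups):
    """Rescale follow-up times so each covariate stratum shares the overall theta.

    theta is estimated as total follow-up time divided by number of deaths,
    the maximum likelihood estimate under a censored exponential model.
    """
    times = np.asarray(times, float)
    dead = np.asarray(dead, bool)
    groups = np.asarray(groups)
    theta_all = times.sum() / dead.sum()
    adjusted = times.copy()
    for g in np.unique(groups):
        mask = groups == g
        deaths = dead[mask].sum()
        if deaths == 0:
            continue  # theta is not estimable in this stratum; leave it unchanged
        theta_g = times[mask].sum() / deaths
        # Strata with long survival are shrunk, strata with short survival stretched.
        adjusted[mask] = times[mask] * theta_all / theta_g
    return adjusted
```

The adjusted times, together with the unchanged death indicators, would then be passed to the scan statistic in place of the raw times, so that a cluster is not flagged merely because its residents are younger or older than average.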

Significant clusters after age-adjustment

Discussion
The exponential model works well for censored and non-censored survival data from different distributions, but it probably does not do well for all continuous variables, such as data that are approximately normally distributed. The statistical inference is valid even when the survival times are not exponentially distributed, because the test procedure is permutation based.

Discussion
The covariate adjustment method used here is based on the exponential model, which assumes a constant hazard. It could be extended to a non-constant hazard with several levels, or to a hazard that is a function of survival time under different kinds of models. The method could be extended to a space-time scan statistic when time series data are available. It could also be extended to a scan statistic with elliptical or other cluster shapes. Unfortunately, no statistical software is available.