1
What is the true shape of a disease cluster? The multi-objective genetic scan
Luiz Duczmal, Ricardo C.H. Takahashi, André L.F. Cançado
Univ. Federal Minas Gerais, Brazil: Statistics Dept., Electrical Engineering Dept., Mathematics Dept.
Geoinfo 2006

2
Irregularly shaped spatial disease clusters occur commonly in epidemiological studies, but their geographic delineation is poorly defined. Most current spatial scan software displays only one of the many possible cluster solutions. These range in shape from the most compact, round cluster to the most irregularly shaped one, depending on how strongly the penalization parameters restrict the freedom of shape. Even when a fairly complete set of solutions is available, the choice of the most appropriate parameter setting is left to the practitioner, whose decision is often subjective.

3
We propose quantitative criteria for choosing the best cluster solution through multi-objective optimization, by finding the Pareto set in the solution space. Two competing objectives are involved in the search: regularity of shape and scan statistic value. Instead of running a cluster-finding algorithm sequentially with varying degrees of penalization, a genetic algorithm finds the complete set of solutions in parallel.

4
The cluster significance concept is extended to this set in a natural and unbiased way, and is employed as the decision criterion for choosing the optimal solution. The Gumbel distribution is used to approximate the empirical scan statistic distribution, speeding up the significance estimation. The method is fast and has good detection power. An application to breast cancer clusters is discussed.
Keywords: spatial scan statistic, disease clusters, geometric compactness penalty correction, Pareto sets, multi-objective optimization, vector optimization, Gumbel distribution, genetic algorithm.

5
Spatial Scan Statistics (Kulldorff, 1997)
Map with m regions, total population N, and C cases.
Under the null hypothesis there is no cluster in the map, and the number of cases in each region is Poisson distributed.

6
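For each circle centered on a region's centroid, let z be the collection of regions that lie inside it. Let
c_z = number of cases inside z,
μ_z = expected number of cases inside z (under the null hypothesis, proportional to the population of z: μ_z = C times the fraction of the total population N living in z).
The likelihood ratio for z is
$$L(z) = \left(\frac{c_z}{\mu_z}\right)^{c_z}\left(\frac{C-c_z}{C-\mu_z}\right)^{C-c_z} \quad \text{if } c_z > \mu_z,$$
and L(z) = 1 otherwise. The scan statistic is defined as
$$T = \max_z L(z).$$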
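As a minimal sketch, the log of L(z) for one zone can be computed directly from c_z, μ_z and C. The function name is illustrative, and the convention 0·log 0 = 0 is assumed for a zone holding every case.

```python
import math

def log_likelihood_ratio(c_z: int, mu_z: float, C: int) -> float:
    """log L(z) for Kulldorff's Poisson scan (illustrative helper).

    c_z: observed cases inside z; mu_z: expected cases inside z;
    C: total cases on the map. Returns 0.0 (L(z) = 1) unless c_z > mu_z.
    """
    if c_z <= mu_z:
        return 0.0
    llr = c_z * math.log(c_z / mu_z)
    outside = C - c_z
    if outside > 0:  # assumed convention: 0 * log(0) = 0
        llr += outside * math.log(outside / (C - mu_z))
    return llr

# 30 observed vs 20 expected cases, 100 cases on the map
print(log_likelihood_ratio(30, 20.0, 100))
```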

7
The collection (or zone) z with the highest L(z) is the most likely cluster. We sweep through all m² possible circular zones, looking for the highest L(z) value. We then need to compare this value against the maximum of L(z) for maps whose cases are distributed randomly under the null hypothesis, so the whole procedure is repeated thousands of times, once for each set of randomly distributed cases (Monte Carlo testing, Dwass (1957)).
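The circular sweep and its Monte Carlo test can be sketched as below. This is an illustration under stated assumptions, not the authors' software: expected counts are taken proportional to population, zones are capped at 50% of the total population (a common choice, assumed here), and null maps redistribute the C cases at random in proportion to population. All function names are hypothetical.

```python
import math
import random

def llr(c, mu, C):
    # Poisson log likelihood ratio; 0.0 when the zone has no excess of cases
    if c <= mu:
        return 0.0
    out = C - c
    return c * math.log(c / mu) + (out * math.log(out / (C - mu)) if out > 0 else 0.0)

def max_circular_llr(xy, pop, cases, max_frac=0.5):
    """For each centroid, grow a circle through the regions in order of
    distance (at most m zones per centre, m^2 zones overall)."""
    m, N, C = len(xy), sum(pop), sum(cases)
    best, best_zone = 0.0, []
    for i in range(m):
        order = sorted(range(m), key=lambda j: math.dist(xy[i], xy[j]))
        n_z = c_z = 0
        zone = []
        for j in order:
            n_z += pop[j]
            c_z += cases[j]
            zone.append(j)
            if n_z > max_frac * N:  # assumed cap on the zone's population
                break
            value = llr(c_z, C * n_z / N, C)
            if value > best:
                best, best_zone = value, list(zone)
    return best, best_zone

def monte_carlo_p_value(xy, pop, cases, replicas=999, seed=1):
    """Rank the observed maximum among maxima of simulated null maps."""
    rng = random.Random(seed)
    observed, zone = max_circular_llr(xy, pop, cases)
    C, m = sum(cases), len(pop)
    exceed = 0
    for _ in range(replicas):
        sim = [0] * m
        for j in rng.choices(range(m), weights=pop, k=C):
            sim[j] += 1
        if max_circular_llr(xy, pop, sim)[0] >= observed:
            exceed += 1
    return observed, zone, (exceed + 1) / (replicas + 1)

# 5 x 5 grid of equal populations with one hot region in the middle (index 12)
xy = [(x, y) for x in range(5) for y in range(5)]
pop = [1000] * 25
cases = [10] * 25
cases[12] = 60
print(monte_carlo_p_value(xy, pop, cases, replicas=99))
```

With R replicas the smallest attainable p-value is 1/(R + 1), which is why thousands of replicas are usually run.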

8
Penalty function to control the freedom of shape (joint work with Kulldorff and Huang) Extreme example of an irregularly shaped cluster

9
A(z) = area of the zone z
H(z) = perimeter of the convex hull of z
Intuitively, the convex hull of a planar object is the region enclosed by a rubber band stretched around it.
Compactness: K(z) = the area of z divided by the area of the circle with perimeter H(z), that is,
$$K(z) = \frac{A(z)}{\pi \left(H(z)/2\pi\right)^2} = \frac{4\pi A(z)}{H(z)^2}.$$
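A possible computation of K(z), assuming each region is given as a simple polygon (list of vertices) and that the regions of a zone do not overlap, so A(z) is the sum of their shoelace areas. The hull uses Andrew's monotone chain; all names are illustrative.

```python
import math

def polygon_area(poly):
    # shoelace formula; poly is a list of (x, y) vertices in order
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def convex_hull(points):
    # Andrew's monotone chain; returns hull vertices counter-clockwise
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def compactness(zone_polygons):
    """K(z) = 4*pi*A(z) / H(z)^2, H(z) = perimeter of the convex hull."""
    A = sum(polygon_area(p) for p in zone_polygons)  # regions assumed disjoint
    hull = convex_hull([v for p in zone_polygons for v in p])
    H = sum(math.dist(a, b) for a, b in zip(hull, hull[1:] + hull[:1]))
    return 4 * math.pi * A / H ** 2

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(compactness([square]))  # pi/4 for a square
```

For a square of side s, A = s² and H = 4s, so K = 4πs²/16s² = π/4; a zone of two adjacent unit squares gives A = 2, H = 6 and K = 2π/9.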

10
Compactness for some common shapes: circle, K(z) = 1; square, K(z) = π/4.

11
Penalty function for the log of the likelihood ratio, LLR(z):
$$K(z)\cdot \mathrm{LLR}(z)$$
Generalized compactness correction:
$$K(z)^{a}\cdot \mathrm{LLR}(z)$$
a = 1: full compactness correction
a = 0.5: medium compactness correction
a = 0: no compactness correction
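A small sketch of how the exponent a changes which zone wins. The two candidate zones and their (LLR, K) values are made up for illustration; the names are hypothetical.

```python
def penalized_llr(llr_z: float, k_z: float, a: float = 1.0) -> float:
    """Generalized compactness correction K(z)**a * LLR(z).
    a = 1: full correction, a = 0.5: medium, a = 0: none."""
    return (k_z ** a) * llr_z

def most_likely(candidates, a):
    # candidates: list of (name, LLR(z), K(z)); pick the highest penalized value
    return max(candidates, key=lambda z: penalized_llr(z[1], z[2], a))[0]

# hypothetical zones: a compact one and a higher-LLR but irregular one
candidates = [("compact", 10.0, 0.8), ("irregular", 12.0, 0.3)]
for a in (0.0, 0.5, 1.0):
    print(a, most_likely(candidates, a))
```

Without correction the irregular zone wins on raw LLR (12 > 10); with a = 0.5 or a = 1 the compact zone wins (for example 10·0.8 = 8 against 12·0.3 = 3.6 at a = 1).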

12
The Elliptic Scan Statistic (joint work with Kulldorff, Huang and Pickle) The scanning window has variable location, size, shape and angle. A penalty function may be used.
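One way to build the zone for a single elliptic window, assuming (as in the circular scan) that a region belongs to the zone when its centroid falls inside the window. "Shape" is represented here by the axis ratio (major/minor); the function names are illustrative, and the penalty function is not modelled.

```python
import math

def in_ellipse(point, center, semi_major, axis_ratio, angle):
    """True if point lies inside the ellipse with the given centre,
    semi-major axis, axis_ratio = major/minor >= 1 and angle (radians)."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    # rotate the offset into the ellipse's own axes
    u = dx * math.cos(angle) + dy * math.sin(angle)
    v = -dx * math.sin(angle) + dy * math.cos(angle)
    semi_minor = semi_major / axis_ratio
    return (u / semi_major) ** 2 + (v / semi_minor) ** 2 <= 1.0

def elliptic_zone(centroids, center, semi_major, axis_ratio, angle):
    # zone z = indices of regions whose centroids fall inside the window
    return [i for i, c in enumerate(centroids)
            if in_ellipse(c, center, semi_major, axis_ratio, angle)]

centroids = [(0, 0), (3, 0), (0, 3), (0, 1)]
print(elliptic_zone(centroids, (0, 0), 4.0, 2.0, 0.0))
print(elliptic_zone(centroids, (0, 0), 4.0, 2.0, math.pi / 2))
```

Sweeping the centre, size, axis ratio and angle over a grid and evaluating LLR(z) for each resulting zone gives the elliptic scan.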

13
Breast cancer mortality rates: most likely cluster for three scanning windows (circular; elliptical, axis ratio = 2; elliptical, axis ratio = 5). Source: Pickle et al., Atlas of United States Mortality, NCHS, 1996.

14

15
Circular scan, penalty correction from 1 to 0

16
Elliptical scan, penalty correction from 1 to 0

17
Irregular scan, penalty correction from 1 to 0

18
Irregular scan, penalty correction from 1 to 0: with no penalty correction (0), the result is a disaster!

19
(joint work with Martin Kulldorff and Lan Huang) Extreme example of an irregularly shaped cluster

20
Average homicide rate, Minas Gerais State, Brazil (homicides per 100,000 inhabitants per year), 853 municipalities. Source: DATASUS. Map by Ricardo Tavares.

21
Genetic Algorithms (joint work with Cançado, Takahashi and Bessegato)
OBJECTIVE: Find a quasi-optimal solution for a maximization problem.
1. Build an initial population.
2. Randomly cross over parents to generate offspring.
3. Select children and parents for the next generation.
4. Apply random mutation.
5. Repeat steps 2 to 4 for a predefined number of generations or until the objective function stops improving.
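A generic sketch of the loop above, on bit strings with a toy objective (the number of 1 bits). It is not the authors' zone-based genetic algorithm, whose offspring are generated from trees of connected regions. One-point crossover, the mutation rate, a fixed number of generations, and mutating children before selection (the slide lists mutation after selection) are all assumptions of this sketch.

```python
import random

def genetic_algorithm(fitness, n_bits, pop_size=30, generations=60,
                      mutation_rate=0.02, seed=0):
    """Generic GA: crossover, mutation of children, then selection of the
    best pop_size among parents and children (so the best never gets worse)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            mom, dad = rng.sample(population, 2)
            cut = rng.randrange(1, n_bits)       # one-point crossover
            child = mom[:cut] + dad[cut:]
            for k in range(n_bits):              # random mutation
                if rng.random() < mutation_rate:
                    child[k] = 1 - child[k]
            children.append(child)
        # selection of children and parents for the next generation
        population = sorted(population + children, key=fitness, reverse=True)[:pop_size]
        history.append(fitness(population[0]))
    return population[0], history

best, history = genetic_algorithm(sum, 20)
print(sum(best), history[:5])
```

Because parents compete with their children, the best fitness per generation is non-decreasing; a stopping rule on "no improvement for k generations" could replace the fixed count.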

22
We minimize graph-related operations through fast offspring generation and fast evaluation of Kulldorff's scan likelihood ratio statistic. This algorithm is more than ten times faster than a similar approach using simulated annealing and exhibits less variance, and thus gives tighter confidence intervals in the Monte Carlo inference used to evaluate the significance of the most likely cluster found.

23

24

25
Incidence of Malaria Deaths in the Brazilian Amazon ( )

26

27
Initial population construction Start at a region of the map.

28
Initial population construction Add the neighbor which forms the highest LLR 2-cell zone.

29
Initial population construction Add the neighbor which forms the highest LLR 3-cell zone.

30
Initial population construction Add the neighbor which forms the highest LLR 4-cell zone.

31
Initial population construction: Stop. (No neighbor can be added to form a 5-cell zone with a higher LLR.)

32
Initial population construction Start at another region of the map.

33
Initial population construction Add the neighbor which forms the highest LLR 2-cell zone.

34
Initial population construction, etc.: Repeat the previous steps starting from every region of the map.
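The greedy construction walked through in the previous slides can be sketched as follows. Regions are given as an adjacency list with observed and expected counts; the helper names are hypothetical, and ties between neighbors are broken by the lowest region index (an assumption).

```python
import math

def llr(c, mu, C):
    # Poisson log likelihood ratio; 0.0 when the zone has no excess of cases
    if c <= mu:
        return 0.0
    out = C - c
    return c * math.log(c / mu) + (out * math.log(out / (C - mu)) if out > 0 else 0.0)

def greedy_zone(start, neighbors, cases, expected):
    """Grow a connected zone from `start`, each time adding the neighbor that
    gives the highest-LLR (k+1)-cell zone; stop when no addition raises LLR."""
    C = sum(cases)
    zone = {start}
    c_z, mu_z = cases[start], expected[start]
    current = llr(c_z, mu_z, C)
    while True:
        frontier = {j for i in zone for j in neighbors[i]} - zone
        best_j, best_value = None, current
        for j in sorted(frontier):
            value = llr(c_z + cases[j], mu_z + expected[j], C)
            if value > best_value:
                best_j, best_value = j, value
        if best_j is None:
            return zone, current
        zone.add(best_j)
        c_z += cases[best_j]
        mu_z += expected[best_j]
        current = best_value

def initial_population(neighbors, cases, expected):
    # one greedy zone per starting region of the map
    return [greedy_zone(s, neighbors, cases, expected) for s in range(len(cases))]

# a chain of five regions 0-1-2-3-4
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]
cases = [10, 20, 20, 5, 5]
expected = [12.0] * 5
for zone, value in initial_population(neighbors, cases, expected):
    print(sorted(zone), round(value, 3))
```

Every zone produced this way is connected by construction, which matches the requirement that offspring zones stay connected.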

35
THE OFFSPRING GENERATION (a simple example)

36
THE OFFSPRING GENERATION (a simple example)

37
THE OFFSPRING GENERATION (a simple example)

38
THE OFFSPRING GENERATION (a simple example) Another possible numbering

39
THE OFFSPRING GENERATION (a more sophisticated example)

40
One instance of two parent trees

41
Advantages: The offspring generation is very inexpensive; All the children zones are automatically connected; Random mutations are easy to implement; The selection for the next generation is straightforward; Fast evolution convergence; The variance between different test runs is small.

42
Population Evolution Performance

43
Irregularly shaped clusters benchmark, Northeast US counties map. Duczmal L, Kulldorff M, Huang L. (2006) Evaluation of spatial scan statistics for irregularly shaped clusters. J. Comput. Graph. Stat.

44
Power evaluation of the genetic algorithm, compared to the simulated annealing algorithm.

45
Cluster of high incidence of breast cancer. São Paulo State, Brazil; population adjusted for age and under-reporting.

46
Cluster of high incidence of breast cancer. São Paulo State, Brazil; population adjusted for age and under-reporting.
Compactness correction: 1.0
Cluster cases: 2,924
Cluster population: 346,024
Incidence:
LLR:
p-value: 0.001
Data source: DATASUS, G.L. Souza

47
Cluster of high incidence of breast cancer. São Paulo State, Brazil; population adjusted for age and under-reporting.
Compactness correction: 0.5
Cluster cases: 3,078
Cluster population: 361,373
Incidence:
LLR:
p-value: 0.001
Data source: DATASUS, G.L. Souza

48
Cluster of high incidence of breast cancer. São Paulo State, Brazil; population adjusted for age and under-reporting.
Compactness correction: 0.0
Cluster cases: 3,324
Cluster population: 394,294
Incidence:
LLR:
p-value: 0.001
Data source: DATASUS, G.L. Souza

49
The genetic algorithm for disease cluster detection is fast and exhibits less variance than similar approaches; its use in epidemiological studies and syndromic surveillance is encouraged; the power evaluation tests clearly demonstrate the need for penalty functions on the irregularity of cluster shape; its power to detect clusters is similar to that of the simulated annealing algorithm; and the flexible control of shape gives the practitioner more insight into the geographic delineation of clusters.

50
Northeast US counties map with observed cases of age-adjusted female breast cancer. Kulldorff M., Feuer E.J., Miller B.A., Freedman L.S. (1997) Breast cancer clusters in the Northeast United States: a geographic analysis. American Journal of Epidemiology, 146:
Legend, percent below/above expected: > 20%; 12% to 20%; 4% to 12%; -4% to +4%; -12% to -4%; -20% to -12%; < -20%

51
The Gumbel parametric approximation to the log likelihood ratio scan. Joint work with Cançado and Takahashi, based on the results of Abrams, Kulldorff and Kleinman.
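A minimal sketch of a Gumbel-based p-value: fit a Gumbel distribution to the maximum LLR values from the null simulations, then read the tail probability of the observed value from the fitted distribution. The slide does not say how the parameters are estimated; the method of moments used here is an assumption, as are the function names. Unlike the rank-based estimate, this p-value is not bounded below by 1/(R + 1).

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329

def fit_gumbel(samples):
    """Method-of-moments Gumbel fit (assumed estimator):
    mean = mu + gamma*beta, variance = (pi*beta)**2 / 6."""
    beta = statistics.stdev(samples) * math.sqrt(6) / math.pi
    mu = statistics.mean(samples) - EULER_GAMMA * beta
    return mu, beta

def gumbel_p_value(observed, mu, beta):
    # P(T >= observed) = 1 - exp(-exp(-(observed - mu) / beta))
    return 1.0 - math.exp(-math.exp(-(observed - mu) / beta))

# null_max_llrs would hold the maximum LLR from each Monte Carlo replica
null_max_llrs = [5.1, 6.3, 4.8, 7.0, 5.9, 6.1, 5.5, 8.2, 6.6, 5.0]
mu, beta = fit_gumbel(null_max_llrs)
print(gumbel_p_value(12.0, mu, beta))
```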

52
Pareto Sets The detection of irregularly shaped disease clusters through multi-objective optimization.

53
The genetic algorithm is used to maximize two objectives: the scan statistic and the regularity of shape (compactness).

54
Elite (red dots), plotted as log likelihood ratio against compactness: each red dot is not surpassed by any other point in both objectives simultaneously, i.e. it is non-dominated.
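The elite set can be extracted with a direct pairwise dominance check, sketched below for points (LLR, compactness) with both objectives maximized. The O(n²) scan and the names are illustrative choices, not the authors' implementation.

```python
def pareto_elite(points):
    """Non-dominated points when maximizing both coordinates.
    q dominates p if q >= p in both objectives and > in at least one."""
    elite = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for q in points
        )
        if not dominated:
            elite.append(p)
    return elite

# hypothetical zones as (LLR, compactness)
zones = [(10.0, 0.2), (8.0, 0.5), (6.0, 0.9), (7.0, 0.4), (5.0, 0.5), (10.0, 0.1)]
print(pareto_elite(zones))
```

Here (7.0, 0.4) and (5.0, 0.5) are dominated by (8.0, 0.5), and (10.0, 0.1) by (10.0, 0.2); the remaining three points trade LLR against compactness.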

58
The Pareto surface is formed by joining the elite points (axes: log likelihood ratio and compactness).

59

60

61

62

63

64

65

66
Null hypothesis critical value: Pareto surface at the 95th percentile (circles), computed from 100 elite sets obtained from 100 simulations under the null hypothesis. Axes: log likelihood ratio and compactness.
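The slides do not spell out how the 95th-percentile surface is built, so the following is only one hypothetical construction. At each compactness value k, a simulated front's "height" is the highest LLR among its elite points with compactness at least k; the critical value at k is the empirical 95th percentile of these heights over the null simulations. An observed elite point counts as significant if it lies above that value, and power is the fraction of alternative-hypothesis simulations with at least one such point. All names are illustrative.

```python
import math

def front_height(elite, k):
    # highest LLR reached by an elite set at compactness >= k (0.0 if none)
    return max((llr for llr, comp in elite if comp >= k), default=0.0)

def critical_value(null_elites, k, level=0.95):
    # empirical `level` quantile of the null front heights at compactness k
    heights = sorted(front_height(e, k) for e in null_elites)
    idx = min(len(heights) - 1, math.ceil(level * len(heights)) - 1)
    return heights[idx]

def significant_elite(observed_elite, null_elites, level=0.95):
    # observed elite points (llr, compactness) above the null critical surface
    return [p for p in observed_elite
            if p[0] > critical_value(null_elites, p[1], level)]

def estimated_power(alt_elites, null_elites, level=0.95):
    # share of alternative simulations with at least one significant elite point
    hits = sum(1 for e in alt_elites if significant_elite(e, null_elites, level))
    return hits / len(alt_elites)

# toy null: 100 simulated elite sets, each a single point at compactness 0.5
null_elites = [[(float(h), 0.5)] for h in range(1, 101)]
print(critical_value(null_elites, 0.4))
print(significant_elite([(96.0, 0.4), (90.0, 0.45)], null_elites))
```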

67
Power test: Pareto surface at the 95th percentile under the null hypothesis (red circles), compared with 100 elite sets obtained from 100 simulations under the alternative hypothesis. Vertical axis: log likelihood ratio.

68

69

70

71

72

74

75

76
References
Duczmal L, Kulldorff M, Huang L. (2006) Evaluation of spatial scan statistics for irregularly shaped clusters. J. Comput. Graph. Stat., 15(2), 1-15.
Duczmal L, Cançado ALF, Takahashi RHC, Bessegato LF. A genetic algorithm for irregularly shaped spatial scan statistics (submitted).
Duczmal L, Cançado ALF, Takahashi RHC. Delineation of Irregularly Shaped Disease Clusters through Multi-Objective Optimization (submitted).
Duczmal L, Assunção R. (2004) A simulated annealing strategy for the detection of arbitrarily shaped spatial clusters. Comp. Stat. & Data Anal., 45,
Kulldorff M, Huang L, Pickle L, Duczmal L. (2005) An Elliptic Spatial Scan Statistic. Statistics in Medicine (to appear).
Patil GP, Taillie C. (2004) Upper level set scan statistic for detecting arbitrarily shaped hotspots. Envir. Ecol. Stat., 11,
Kulldorff M. (1997) A Spatial Scan Statistic. Comm. Statist. Theory Meth., 26(6),
Kulldorff M, Tango T, Park PJ. (2003) Power comparisons for disease clustering tests. Comp. Stat. & Data Anal., 42,
Kulldorff M, Feuer EJ, Miller BA, Freedman LS. (1997) Breast cancer clusters in the Northeast United States: a geographic analysis. Amer. J. Epidem., 146:
de Souza Jr. GL (2005) The Detection of Clusters of Breast Cancer in São Paulo State, Brazil. M.Sc. Dissertation, Univ. Fed. Minas Gerais.
