1 Spatially Constrained Clustering and Upper Level Set Scan Hotspot Detection in Surveillance GeoInformatics
G.P. Patil, Penn State University; Reza Modarres, George Washington University; Pushkar Patankar, Penn State University; Yun Cai, Penn State University; W.L. Myers, Penn State University

2 Spatially Constrained Clustering
Constrained clustering is part of a family of methods whose purpose is to delimit homogeneous regions on a univariate or multivariate surface, by forming blocks of pieces that are also adjacent in space or in time (Legendre, 1987). In addition to requiring similarity in the measured variables, a clustering is spatially constrained if the cells within each cluster must be adjacent to one another.

3 Motivations
Spatially constrained clustering is useful in applications including landscape ecology, geospatial environment, image analysis, spatial economics, public policy in geospace, and geography of disease.

4 Comparing the Two Methods…
The result of method I is easier to interpret in the variable space, but it tends to give small patches and splinters. Method II yields tighter clusters, but it is harder to interpret. On the programming side, method I can be run with existing programs and is more computationally efficient. Usually, the two methods do not yield the same result.

5 Spatially Constrained Clustering in Hotspot Detection
Definition: A hotspot is that portion of the study region with an elevated risk of an adverse outcome. A major task of geospatial hotspot detection is to delineate areas that exhibit elevated risks over geographical regions. Upper Level Set (ULS) scan turns out to be a special case of spatially constrained clustering.

6 NSF Digital Government surveillance geoinformatics project, federal agency partnership and national applications for digital governance.

7 Geoinformatic hotspot surveillance system

8 Geographic and Network Surveillance for Arbitrarily Shaped Hotspots
Center for Statistical Ecology and Environmental Statistics
G.P. Patil, R. Acharya, W.L. Myers, P. Patankar, Y. Cai, and S.L. Rathbun, The Penn State University, University Park, PA 16802; R. Modarres, George Washington University, Washington, D.C.

Overview: Geospatial Surveillance; Upper Level Set Scan Statistic System; Spatial-Temporal Surveillance; Typology of Space-Time Hotspots; Hotspot Prioritization; Ranking Without Having to Integrate Multiple Indicators.

Surveillance Geoinformatics for Hotspot Detection, Prioritization, Early Warning and Sustainable Management

Upper Level Set Scan System
Definition: A hotspot is that portion of the study region with an elevated risk of an adverse outcome.
Example: West Nile Virus. First isolated in 1937, this mosquito-borne disease, indigenous to north Africa, the Middle East and west Asia, was first introduced into the United States in 1999.
Example: Lyme Disease. Infections from the bacterium Borrelia burgdorferi, vectored by ticks from the genus Ixodes.
Features of ULS Scan Statistic: identifies arbitrarily shaped hotspots; applicable to data on a network; confidence sets and hotspot ratings; computationally efficient; generalizes to space-time scan.
[Figures: Changing Connectivity of ULS as Level Drops; Comparison of ULS Scan with Cylindrical Scan and with Circular Scan (panels by year, 1997 to 2003; disease count quintiles, population quintiles, disease rates); Confidence set for ULS Hotspot; Hotspot Membership Rating.]

Hasse Diagram Poset Prioritization System
Objective: Prioritize or rank hotspots based on multiple indicator and stakeholder criteria without having to integrate indicators into an index, using Hasse diagrams and partially ordered sets.
Admissible linear extensions consist of rankings compatible with the rankings of all indicators. Treating each linear extension as a voter, the cumulative rank function (crf) is obtained from the frequencies at which each object receives each rank.
Example: Prioritization of disease clusters with multiple indicators (disease rate quintiles, likelihood quintiles).
Example: Human-environment indicator values for 16 European countries. There are a total of 3,764,448 admissible linear extensions. The cumulative rank function for Sweden exceeds that of all remaining countries. The crf's of all countries dominate that of Ireland. The remaining countries cannot be uniquely ordered based on their crf's. Belgium, Netherlands and United Kingdom have identical crf's.
The crf's also form a partially ordered set. There are only 182 admissible linear extensions for this poset, which yield a new cumulative rank function. One more iteration yields the rankings in the data table.

Federal Agency Partnerships: CDC, DOD, EPA, NASA, NIH, NOAA, USFS, USGS.

National Applications and Case Studies: Biosurveillance; Carbon Management; Coastal Management; Community Infrastructure; Crop Surveillance; Disaster Management; Disease Surveillance; Ecosystem Health; Environmental Justice; Sensor Networks; Robotic Networks; Environmental Management; Environmental Policy; Homeland Security; Invasive Species; Poverty Policy; Public Health; Public Health and Environment; Syndromic Surveillance; Social Networks; Stream Networks.
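The cumulative rank function described on this poster can be illustrated with a small brute-force sketch. The four objects and their two indicator scores below are invented for illustration, higher scores are assumed to be better, and the function names are hypothetical; enumerating all permutations is only feasible for a handful of objects, not for a poset with millions of linear extensions.

```python
from itertools import permutations

def dominates(x, y):
    """x is ranked above y if it is at least as good on every indicator."""
    return all(a >= b for a, b in zip(x, y)) and x != y

def linear_extensions(scores):
    """All rankings (best first) compatible with every indicator ordering."""
    names = sorted(scores)
    pairs = [(p, q) for p in names for q in names
             if dominates(scores[p], scores[q])]
    extensions = []
    for perm in permutations(names):
        position = {name: i for i, name in enumerate(perm)}
        if all(position[p] < position[q] for p, q in pairs):
            extensions.append(perm)
    return extensions

def cumulative_rank_function(scores):
    """For each object, the cumulative share of extensions giving rank <= k."""
    extensions = linear_extensions(scores)
    crf = {}
    for name in scores:
        counts = [0] * len(scores)
        for perm in extensions:          # each extension acts as one voter
            counts[perm.index(name)] += 1
        running, row = 0, []
        for c in counts:
            running += c
            row.append(running / len(extensions))
        crf[name] = row
    return crf

# Hypothetical indicator values: B and C are not comparable.
scores = {"A": (3, 3), "B": (2, 1), "C": (1, 2), "D": (0, 0)}
for name, row in cumulative_rank_function(scores).items():
    print(name, row)
```

In this toy poset B and C receive identical crf's, the same kind of tie the poster reports for Belgium, Netherlands and United Kingdom.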

9 ULS Connectivity Tree -- 1
Ingredients: a tessellation of a geographic region, with an intensity value G on each cell. This determines a cellular (piece-wise constant) surface with G as elevation.
Imagine the surface initially inundated with water. The water evaporates gradually, exposing the surface, which appears as islands in the sea.
How does the connectivity (number of connected components) of the exposed surface change with falling water level?
[Figure: cell map with cells a to k; a, b, c, … are cell labels.]

10 ULS Connectivity Tree -- 2
Think of the tessellated surface as a landform. Initially the entire surface is under water. As the water level recedes, more and more of the landform is exposed.
At each water level, cells are colored as follows:
Green for previously exposed cells (green = vegetated).
Yellow for newly exposed cells (yellow = sandy beach).
Blue for unexposed cells (blue = under water).
For each newly exposed cell, one of three things happens:
New island emerges. Cell is a local maximum. Morse index = 2. Connectivity increases.
Existing island increases in size. Cell is not a critical point. Connectivity unchanged.
Two (or more) islands are joined. Cell is a saddle point. Morse index = 1. Connectivity decreases.
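The three cases for a newly exposed cell can be sketched with a union-find structure. This is a minimal illustration, not the authors' implementation: the function name, the five-cell chain and its intensity values are assumptions, and cells are exposed one at a time (ties broken by label) rather than level by level.

```python
def classify_exposures(intensity, adjacency):
    """Expose cells from highest to lowest intensity and label each event.

    intensity: dict mapping cell -> intensity value G
    adjacency: dict mapping cell -> list of neighbouring cells
    Returns a list of (cell, event) pairs, where event is
    'new_island', 'grow', or 'join'.
    """
    parent = {}  # union-find forest over the cells exposed so far

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    events = []
    for cell in sorted(intensity, key=lambda c: (-intensity[c], c)):
        # Islands already exposed that touch the new cell.
        islands = {find(n) for n in adjacency.get(cell, ()) if n in parent}
        parent[cell] = cell
        for root in islands:
            parent[root] = cell            # merge every touching island
        if not islands:
            events.append((cell, "new_island"))   # local maximum
        elif len(islands) == 1:
            events.append((cell, "grow"))         # not a critical point
        else:
            events.append((cell, "join"))         # saddle point
    return events

# Hypothetical chain of cells a - b - c - d - e.
intensity = {"a": 5, "b": 4, "c": 1, "d": 3, "e": 2}
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
             "d": ["c", "e"], "e": ["d"]}
print(classify_exposures(intensity, adjacency))
```

Each 'new_island' event corresponds to a leaf of the ULS tree and each 'join' event to a junction node.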

11 ULS Connectivity Tree -- 3
[Figure, two panels, each showing the cell map (cells a to k) beside the ULS tree built so far.]
Panel 1: Newly exposed island. Tree node: a.
Panel 2: Island grows. Tree nodes: a; b,c.

12 ULS Connectivity Tree -- 4
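[Figure, two panels, each showing the cell map (cells a to k) beside the ULS tree built so far.]
Panel 1: Second island appears. Tree nodes: a; b,c; d, where d is a new leaf node (local maximum).
Panel 2: Both islands grow. Tree nodes: a; b,c; d; e; f,g.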

13 ULS Connectivity Tree -- 5
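[Figure, two panels, each showing the cell map (cells a to k) beside the ULS tree built so far.]
Panel 1: Islands join at a saddle point. Tree nodes: a; b,c; d; e; f,g; h, where h is a junction node.
Panel 2: Exposed land grows. Tree nodes: a; b,c; d; e; f,g; h; i,j,k, where i,j,k is the root node.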

14 Changing Connectivity of ULS as Level Drops
Hotspot zones at level g (Connected Components of upper level set)
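As a sketch of this definition, the zones at level g can be found by keeping the cells whose intensity is at least g and flood-filling over cell adjacency. The function name, the use of "at least g" as the cutoff, and the five-cell example are assumptions made for illustration.

```python
def hotspot_zones(intensity, adjacency, g):
    """Connected components of the upper level set {cell : G(cell) >= g}."""
    upper = {cell for cell, value in intensity.items() if value >= g}
    seen, zones = set(), []
    for start in sorted(upper):
        if start in seen:
            continue
        seen.add(start)
        stack, zone = [start], set()
        while stack:                      # depth-first flood fill
            cell = stack.pop()
            zone.add(cell)
            for n in adjacency.get(cell, ()):
                if n in upper and n not in seen:
                    seen.add(n)
                    stack.append(n)
        zones.append(zone)
    return zones

# Hypothetical chain of cells a - b - c - d - e.
intensity = {"a": 5, "b": 4, "c": 1, "d": 3, "e": 2}
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
             "d": ["c", "e"], "e": ["d"]}
for level in (5, 3, 1):
    print(level, hotspot_zones(intensity, adjacency, level))
```

Lowering the level from 5 to 3 to 1 in this example gives one zone, then two zones, then a single zone again, which is the change in connectivity the slide title refers to.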

15 ULS Connectivity Tree
Schematic intensity “surface”. N.B. The intensity surface is cellular (piece-wise constant), with only finitely many levels. A, B, C are junction nodes where multiple zones coalesce into a single zone.

16 Demonstration Example
Disease Rate

17 Demonstration Example Upper Level Set Tree: Method I
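[Figure: map of numbered cells with the upper level set tree listed by level. Entries as they survive in the transcript:]
Level 1: 3. Level 2: 18. Level 3: 8, 0. Levels 4 to 6: entries missing from the transcript. Level 7: 9, 11. Level 8: 1. Level 9: 10. Level 10: 6. Level 11: 13. Level 12: 2.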

18 Demonstration Example Upper Level Set Tree: Method II
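[Figure: map of numbered cells with the upper level set tree, each node labeled by the cells of its zone. Node labels as they survive in the transcript:]
[3]; [3,18]; [3,18,0]; [3,18,0,4]; [8]; [8,7]; [3,18,0,4;8,7;19]; [17]; [17,16]; [14]; [17,16;14;15]; [3,18,0,4,8,7,19,5;17,16,14,15;11]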

19 The Two Methods Give the Same Results Using the ULS Tree
The ULS tree differs from general classification methods in the way it defines its similarity matrix: similarity is determined by the exceedance of incidence rates. As a result, it does not matter at what stage the spatial constraint is applied. Whether it is applied at the end or as you go along, the result is the same.
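A small sketch can make this equivalence concrete at a single level g. It assumes one reading of the slide: "at the end" means grouping cells by exceedance alone and then splitting the group into contiguous pieces, while "as you go along" means merging only adjacent cells that both exceed g. The function names and the five-cell data are hypothetical.

```python
def constraint_at_end(rates, adjacency, g):
    """Group by exceedance first, then split into contiguous pieces."""
    group = {c for c, r in rates.items() if r >= g}
    seen, parts = set(), set()
    for start in group:
        if start in seen:
            continue
        seen.add(start)
        stack, part = [start], set()
        while stack:
            c = stack.pop()
            part.add(c)
            for n in adjacency[c]:
                if n in group and n not in seen:
                    seen.add(n)
                    stack.append(n)
        parts.add(frozenset(part))
    return parts

def constraint_as_you_go(rates, adjacency, g):
    """Merge cells pairwise, but only across adjacent exceeding cells."""
    parent = {c: c for c, r in rates.items() if r >= g}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for c in parent:
        for n in adjacency[c]:
            if n in parent:
                parent[find(c)] = find(n)
    clusters = {}
    for c in parent:
        clusters.setdefault(find(c), set()).add(c)
    return {frozenset(s) for s in clusters.values()}

# Hypothetical chain of cells a - b - c - d - e.
rates = {"a": 5, "b": 4, "c": 1, "d": 3, "e": 2}
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
             "d": ["c", "e"], "e": ["d"]}
for g in sorted(set(rates.values())):
    same = constraint_at_end(rates, adjacency, g) == constraint_as_you_go(rates, adjacency, g)
    print(g, same)
```

Both routes compute the connected components of the same set of exceeding cells, which is why the order of applying the constraint cannot change the answer under this reading.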

20 Maximum Likelihood Estimate and Resultant Cluster Partition

21 ULS Scan on Multivariate Data Spatial Coincidence
With multivariate data, ULS is run once per dimension of the data, and each run operates on a single dimension only. The final clusters are the intersections of the clusters obtained from the individual runs.
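The intersection step can be sketched as follows. The per-variable hotspot cell sets are hard-coded, hypothetical stand-ins for the output of separate ULS runs, and the function name is an assumption.

```python
def spatial_coincidence(hotspots_per_variable):
    """Cells that belong to a hotspot in every single-variable ULS run."""
    cell_sets = [set(cells) for cells in hotspots_per_variable]
    if not cell_sets:
        return set()
    return set.intersection(*cell_sets)

# Hypothetical hotspot cells from two single-variable runs.
hotspot_variable_1 = {"a", "b", "d"}
hotspot_variable_2 = {"b", "c", "d"}
print(sorted(spatial_coincidence([hotspot_variable_1, hotspot_variable_2])))
```

An intersection of connected zones is not guaranteed to be connected, so the coincident cells may need to be split into contiguous pieces again before being reported as clusters.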

22 Conclusion
In many situations, the usual cluster analysis is not sufficient: the spatial structure needs to be taken into account. Spatially constrained clustering addresses this, and it can be carried out by method I or method II, which in general give different results. In the hotspot detection work, the upper level set scan approach has been adopted. ULS is a spatially constrained clustering method based on upper level sets of rates, with similarity defined by exceedance. It delivers identical clusters under both methods.

23 References
De Soete, G., Carroll, J.D., and DeSarbo, W.S. (1987). Least squares algorithms for constructing constrained ultrametric and additive tree representations of symmetric proximity data. Journal of Classification, 4.
BoundarySeer (2005). Software for Geographic Boundary Analysis. TerraSeer, Ann Arbor, MI.
Everitt, B., Landau, S., and Leese, M. (2001). Cluster Analysis. Arnold, London.
Kulldorff, M. and Nagarwalla, N. (1995). Spatial disease clusters: detection and inference. Statistics in Medicine, 14, pp. 799–810.
Lebart, L. (1978). Programme d’agrégation avec contraintes (C.A.H. contiguïté). Cah. Anal. Données, 3.
Legendre, P. (1987). Constrained clustering. In Developments in Numerical Ecology, P. and L. Legendre, eds. Springer-Verlag, Berlin.
Legendre, P. and Legendre, L. (1998). Numerical Ecology. Elsevier, NY. 853 pp.
Patil, G.P. and Taillie, C. (2004). Upper level set scan statistic for detecting arbitrarily shaped hotspots. Environmental and Ecological Statistics, 11.
Urban, D.L. (2004). Multivariate Analysis. Nonhierarchical Agglomeration. Spatially Constrained Classification.

