Spatially Constrained Clustering and Upper Level Set Scan Hotspot Detection in Surveillance GeoInformatics

Presentation transcript:

Spatially Constrained Clustering and Upper Level Set Scan Hotspot Detection in Surveillance GeoInformatics
G.P. Patil, Penn State University
Reza Modarres, George Washington University
Pushkar Patankar, Penn State University
Yun Cai, Penn State University
W.L. Myers, Penn State University

Spatially Constrained Clustering Constrained clustering is part of a family of methods whose purpose is to delimit homogeneous regions on a univariate or multivariate surface, by forming blocks of pieces that are also adjacent in space or in time. (Legendre, 1987) Besides the similarity of measured variables, the clustering is spatially constrained if the cells within a cluster are adjacent to each other.
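The adjacency requirement can be sketched in a few lines of Python. This is only an illustrative greedy agglomeration, not method I or method II from the slides: the function name `constrained_clusters`, the use of cluster means as the similarity measure, and the toy data are all assumptions made for the example.

```python
from itertools import combinations

def constrained_clusters(values, adjacency, n_clusters):
    """Greedy agglomeration that only merges spatially adjacent clusters.

    values:    dict cell -> measured value (univariate, for simplicity)
    adjacency: dict cell -> set of neighbouring cells
    """
    clusters = {cell: {cell} for cell in values}   # cluster id -> member cells

    def mean(members):
        return sum(values[c] for c in members) / len(members)

    def touching(a, b):
        # True if some cell of cluster a has a neighbour in cluster b
        return any(adjacency[c] & clusters[b] for c in clusters[a])

    while len(clusters) > n_clusters:
        candidates = [(abs(mean(clusters[a]) - mean(clusters[b])), a, b)
                      for a, b in combinations(sorted(clusters), 2)
                      if touching(a, b)]
        if not candidates:          # no adjacent pair left to merge
            break
        _, a, b = min(candidates)   # most similar adjacent pair
        clusters[a] |= clusters.pop(b)
    return sorted(sorted(m) for m in clusters.values())

# Hypothetical strip of five cells: a - b - c - d - e
values = {"a": 1.0, "b": 1.2, "c": 5.0, "d": 1.5, "e": 5.2}
adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
             "d": {"c", "e"}, "e": {"d"}}
result = constrained_clusters(values, adjacency, n_clusters=3)
print(result)   # [['a', 'b'], ['c', 'd'], ['e']]
```

Cell d is close to a and b in value, but it is never grouped with them because it is not adjacent to either. That is the effect of the spatial constraint.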

Motivations Spatially constrained clustering is useful in applications such as landscape ecology, geospatial environment, image analysis, spatial economics, public policy in geospace, and geography of disease, among others.

Comparing the Two Methods… The result of method I is easier to interpret in the variable space, but it tends to produce small patches and splinters. Method II yields tighter clusters, but they are harder to interpret. On the programming side, ready-to-use programs exist for method I, and it is more computationally efficient. The two methods usually do not yield the same result.

Spatially Constrained Clustering in Hotspot Detection Definition: A hotspot is that portion of the study region with an elevated risk of an adverse outcome. A major task of geospatial hotspot detection is to delineate areas that exhibit elevated risks over geographical regions. Upper Level Set (ULS) scan turns out to be a special case of spatially constrained clustering.

NSF Digital Government surveillance geoinformatics project, federal agency partnership and national applications for digital governance.

Geoinformatic hotspot surveillance system

Geographic and Network Surveillance for Arbitrarily Shaped Hotspots
Center for Statistical Ecology and Environmental Statistics
G.P. Patil, R. Acharya, W.L. Myers, P. Patankar, Y. Cai, and S.L. Rathbun, The Penn State University, University Park, PA 16802
R. Modarres, George Washington University, Washington, D.C.

Overview: Geospatial Surveillance; Upper Level Set Scan Statistic System; Spatial-Temporal Surveillance; Typology of Space-Time Hotspots; Hotspot Prioritization; Ranking Without Having to Integrate Multiple Indicators.

Surveillance Geoinformatics for Hotspot Detection, Prioritization, Early Warning and Sustainable Management

Upper Level Set Scan System
Definition: A hotspot is that portion of the study region with an elevated risk of an adverse outcome.
Example: West Nile Virus. First isolated in 1937, this mosquito-borne disease, indigenous to north Africa, the Middle East and west Asia, was first introduced into the United States in 1999.
Example: Lyme Disease. Infections from the bacterium Borrelia burgdorferi, vectored by ticks from the genus Ixodes.
Features of ULS Scan Statistic: identifies arbitrarily shaped hotspots; applicable to data on a network; confidence sets and hotspot ratings; computationally efficient; generalizes to space-time scan.
[Figure: Changing Connectivity of ULS as Level Drops (level g)]
[Figure: Comparison of ULS Scan with Cylindrical Scan; panel labels: ULS Scan, Cylindrical Scan, Disease Count Quintiles, Population Quintiles, Disease Rates, Year (1997 to 2003)]
[Figure: Comparison of ULS Scan with Circular Scan; panel labels: ULS Scan, Circular Scan]
[Figure: Confidence set for ULS Hotspot; Hotspot Membership Rating]

Hasse Diagram Poset Prioritization System
Objective: Prioritize or rank hotspots based on multiple indicator and stakeholder criteria without having to integrate indicators into an index, using Hasse diagrams and partially ordered sets.
Example: Prioritization of disease clusters with multiple indicators (Disease Rate Quintiles, Likelihood Quintiles).
Example: Human-environment indicator values for 16 European countries.
Admissible linear extensions are the rankings compatible with the rankings of all indicators. Treating each linear extension as a voter, the cumulative rank function (crf) is obtained from the frequencies at which each object receives each rank. There are a total of 3,764,448 admissible linear extensions. The cumulative rank function for Sweden exceeds that of all remaining countries. The crfs of all countries dominate that of Ireland. The remaining countries cannot be uniquely ordered based on their crfs. Belgium, Netherlands and United Kingdom have identical crfs.
The crfs also form a partially ordered set. There are only 182 admissible linear extensions for this poset, yielding the cumulative rank function: [Figure]. One more iteration yields the rankings in the data table.

Federal Agency Partnerships: CDC, DOD, EPA, NASA, NIH, NOAA, USFS, USGS.

National Applications and Case Studies: Biosurveillance, Carbon Management, Coastal Management, Community Infrastructure, Crop Surveillance, Disaster Management, Disease Surveillance, Ecosystem Health, Environmental Justice, Sensor Networks, Robotic Networks, Environmental Management, Environmental Policy, Homeland Security, Invasive Species, Poverty Policy, Public Health, Public Health and Environment, Syndromic Surveillance, Social Networks, Stream Networks.
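The "linear extension as voter" idea can be illustrated on a tiny poset. The sketch below enumerates extensions by brute force, which is only feasible for a handful of objects and is not necessarily how the counts quoted on the poster were obtained. The function names, the toy indicator values, and the conventions that larger indicator values are better and that the crf accumulates from the best rank downward are assumptions for illustration.

```python
from itertools import permutations

def linear_extensions(objects, indicators):
    """All rankings (best first) compatible with every indicator.

    indicators: dict object -> tuple of indicator values (larger = better).
    x must precede y whenever x is at least as good as y on every
    indicator and their indicator vectors differ.
    """
    def dominates(x, y):
        vx, vy = indicators[x], indicators[y]
        return vx != vy and all(a >= b for a, b in zip(vx, vy))

    exts = []
    for perm in permutations(objects):
        pos = {obj: i for i, obj in enumerate(perm)}
        if all(pos[x] < pos[y] for x in objects for y in objects
               if dominates(x, y)):
            exts.append(perm)
    return exts

def cumulative_rank_function(objects, exts):
    """crf[obj][k] = fraction of extensions ranking obj at position k+1 or better."""
    crf = {}
    for obj in objects:
        counts = [0] * len(objects)
        for ext in exts:            # each extension casts one vote
            counts[ext.index(obj)] += 1
        running, row = 0, []
        for c in counts:
            running += c
            row.append(running / len(exts))
        crf[obj] = row
    return crf

# Hypothetical objects with two indicators each
objects = ["A", "B", "C", "D"]
indicators = {"A": (3, 3), "B": (2, 1), "C": (1, 2), "D": (0, 0)}
exts = linear_extensions(objects, indicators)
crf = cumulative_rank_function(objects, exts)
print(len(exts))    # 2
print(crf["B"])     # [0.0, 0.5, 1.0, 1.0]
```

Here B and C are incomparable and end up with identical crfs, the same situation the poster reports for Belgium, Netherlands and United Kingdom.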

ULS Connectivity Tree -- 1 Ingredients: a tessellation of a geographic region and an intensity value G on each cell. Together these determine a cellular (piece-wise constant) surface with G as elevation. Imagine the surface initially inundated with water. The water evaporates gradually, exposing the surface, which appears as islands in the sea. How does the connectivity (number of connected components) of the exposed surface change with falling water level? [Figure: tessellated region; a, b, c, … are cell labels]

ULS Connectivity Tree -- 2 Think of the tessellated surface as a landform. Initially the entire surface is under water. As the water level recedes, more and more of the landform is exposed. At each water level, cells are colored as follows: green for previously exposed cells (green = vegetated), yellow for newly exposed cells (yellow = sandy beach), and blue for unexposed cells (blue = under water). For each newly exposed cell, one of three things happens: (1) A new island emerges. The cell is a local maximum (Morse index = 2). Connectivity increases. (2) An existing island increases in size. The cell is not a critical point. Connectivity is unchanged. (3) Two (or more) islands are joined. The cell is a saddle point (Morse index = 1). Connectivity decreases.
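The three cases can be traced with a small union-find sketch. It is a simplification under stated assumptions: cells are exposed one at a time (ties in intensity are broken by cell label rather than exposed together), and the function name `uls_events` and the five-cell example are hypothetical.

```python
def uls_events(intensity, adjacency):
    """Expose cells from highest to lowest intensity and record what each does.

    Returns a list of (cell, event, number_of_islands) tuples.
    """
    parent = {}                       # union-find forest over exposed cells

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]   # path halving
            c = parent[c]
        return c

    events, n_islands = [], 0
    for cell in sorted(intensity, key=lambda c: (-intensity[c], c)):
        # Distinct islands that the newly exposed cell touches
        roots = {find(nb) for nb in adjacency[cell] if nb in parent}
        parent[cell] = cell
        for r in roots:
            parent[r] = cell          # attach every touching island to the new cell
        if not roots:
            event = "new island"      # local maximum
        elif len(roots) == 1:
            event = "island grows"    # not a critical point
        else:
            event = "islands join"    # saddle point
        n_islands += 1 - len(roots)
        events.append((cell, event, n_islands))
    return events

# Hypothetical strip of five cells: a - b - c - d - e
intensity = {"a": 5, "b": 3, "c": 1, "d": 4, "e": 2}
adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
             "d": {"c", "e"}, "e": {"d"}}
events = uls_events(intensity, adjacency)
for row in events:
    print(row)
```

In this toy run, a and d emerge as separate islands, b and e enlarge them, and c finally joins the two, so the island count goes 1, 2, 2, 2, 1.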

ULS Connectivity Tree -- 3 [Figure: cell map alongside the ULS tree, in two steps. Step captions: "Newly exposed island" (tree node a); "Island grows" (tree nodes a; b,c).]

ULS Connectivity Tree -- 4 [Figure: cell map alongside the ULS tree, in two steps. Step captions: "Second island appears", with a new leaf node (local maximum) d; "Both islands grow" (tree nodes a; b,c; d; e; f,g).]

ULS Connectivity Tree -- 5 [Figure: cell map alongside the ULS tree, in two steps. Step captions: "Islands join (saddle point)", with junction node h; "Exposed land grows", with root node i,j,k.]

Changing Connectivity of ULS as Level Drops Hotspot zones at level g (Connected Components of upper level set)
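As a minimal sketch, the hotspot zones at a given level can be computed as connected components by breadth-first search. The convention that the upper level set is the set of cells with intensity at least g, the function name `hotspot_zones`, and the toy data are assumptions for illustration.

```python
from collections import deque

def hotspot_zones(intensity, adjacency, g):
    """Connected components of the upper level set {cell : intensity >= g}."""
    upper = {c for c, v in intensity.items() if v >= g}
    zones, seen = [], set()
    for start in sorted(upper):
        if start in seen:
            continue
        zone, queue = [], deque([start])
        seen.add(start)
        while queue:
            c = queue.popleft()
            zone.append(c)
            for nb in adjacency[c]:
                if nb in upper and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        zones.append(sorted(zone))
    return zones

# Hypothetical strip of five cells: a - b - c - d - e
intensity = {"a": 5, "b": 3, "c": 1, "d": 4, "e": 2}
adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
             "d": {"c", "e"}, "e": {"d"}}
print(hotspot_zones(intensity, adjacency, g=3))   # [['a', 'b'], ['d']]
print(hotspot_zones(intensity, adjacency, g=1))   # [['a', 'b', 'c', 'd', 'e']]
```

Lowering g from 3 to 1 turns two zones into one, which is the change in connectivity that the slide title refers to.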

ULS Connectivity Tree Schematic intensity “surface”. N.B. The intensity surface is cellular (piece-wise constant), with only finitely many levels. A, B, C are junction nodes where multiple zones coalesce into a single zone. [Figure: schematic tree with junction nodes A, B, C]

Demonstration Example Disease Rate

Demonstration Example Upper Level Set Tree: Method I
[Figure: map with numbered cells]
Level 1: 3
Level 2: 18
Level 3: 8, 0
Level 4: 7, 4, 17
Level 5: 19, 16, 14
Level 6: 12, 5, 15
Level 7: 9, 11
Level 8: 1
Level 9: 10
Level 10: 6
Level 11: 13
Level 12: 2

Demonstration Example Upper Level Set Tree: Method II
[Figure: map with numbered cells, and the ULS tree with the node labels below]
[3], [3,18], [3,18,0], [3,18,0,4]
[8], [8,7]
[3,18,0,4;8,7;19]
[17], [17,16]
[14]
[17,16;14;15]
[3,18,0,4,8,7,19,5;17,16,14,15;11]

The Two Methods Give the Same Results Using the ULS Tree The ULS tree differs from general classification methods in how it defines its similarity matrix: similarity is determined by the exceedance of incidence rates. As a consequence, it does not matter at what stage the spatial constraint is applied. Whether it is imposed at the end or as you go along, the result is the same.
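One way to see why the order does not matter is to compare two procedures on a toy example. The transcript does not spell out the two methods, so the reading used here is an assumption: "constraint at the end" groups cells by exceedance of a level g and then splits the group into connected pieces, while "constraint as you go" adds cells in decreasing rate order and merges a cell only with zones adjacent to it. All names and data are hypothetical.

```python
def zones_constraint_last(rate, adjacency, g):
    """Group by exceedance first, then split the group into connected pieces."""
    upper = {c for c in rate if rate[c] >= g}
    zones, seen = set(), set()
    for start in upper:
        if start in seen:
            continue
        stack, zone = [start], set()
        seen.add(start)
        while stack:
            c = stack.pop()
            zone.add(c)
            for nb in adjacency[c]:
                if nb in upper and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        zones.add(frozenset(zone))
    return zones

def zones_constraint_first(rate, adjacency, g):
    """Add cells in decreasing rate order, merging only with adjacent zones."""
    zone_of = {}                      # cell -> the zone (set) it currently belongs to
    for cell in sorted(rate, key=lambda c: -rate[c]):
        if rate[cell] < g:
            break
        merged = {cell}
        for nb in adjacency[cell]:
            if nb in zone_of:
                merged |= zone_of[nb]
        for c in merged:
            zone_of[c] = merged
    return {frozenset(z) for z in zone_of.values()}

# Hypothetical strip of five cells: a - b - c - d - e
rate = {"a": 5, "b": 3, "c": 1, "d": 4, "e": 2}
adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
             "d": {"c", "e"}, "e": {"d"}}
same = all(zones_constraint_last(rate, adjacency, g) ==
           zones_constraint_first(rate, adjacency, g)
           for g in rate.values())
print(same)   # True
```

Under this reading, both procedures return the connected components of the same upper level set, so they agree at every level.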

Maximum Likelihood Estimate and Resultant Cluster Partition

ULS Scan on Multivariate Data: Spatial Coincidence For multivariate data, ULS is run once per dimension of the data, with each run operating on a single dimension. The final clusters are the intersections of the clusters obtained from the individual runs.
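The intersection step might look like the sketch below. It assumes each univariate ULS run has already produced its hotspot as a collection of cell labels. The function name `coincident_hotspot` and the cell sets are hypothetical.

```python
def coincident_hotspot(zones_per_variable):
    """Cells flagged as hotspot by every univariate ULS run."""
    cell_sets = [set(z) for z in zones_per_variable]
    if not cell_sets:
        return set()
    return set.intersection(*cell_sets)

# Hypothetical hotspots from two separate runs, one per variable
hotspot_var1 = [3, 18, 0, 4]
hotspot_var2 = [18, 0, 17]
common = coincident_hotspot([hotspot_var1, hotspot_var2])
print(sorted(common))   # [0, 18]
```

Note that the intersection of two connected zones need not itself be connected, so a further connectivity check may be needed depending on the application.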

Conclusion In many situations, the usual cluster analysis is not sufficient: the spatial structure needs to be taken into account. Spatially constrained clustering has been introduced for this purpose. It can be accomplished by method I or method II, which in general give different results. In the hotspot detection work, the upper level set scan approach has been adopted. ULS is a spatially constrained clustering method based on upper level sets of rates, with similarity defined by exceedance. It delivers identical clusters under both methods.

References
De Soete, G., Carroll, J.D. and DeSarbo, W.S. (1987). Least squares algorithms for constructing constrained ultrametric and additive tree representations of symmetric proximity data. Journal of Classification, 4, 155-173.
BoundarySeer (2005). Software for Geographic Boundary Analysis. TerraSeer, Ann Arbor, MI.
Everitt, B., Landau, S. and Leese, M. (2001). Cluster Analysis. Arnold, London, pp. 161-164.
Kulldorff, M. and Nagarwalla, N. (1995). Spatial disease clusters: detection and inference. Statistics in Medicine, 14, 799-810.
Lebart, L. (1978). Programme d’agrégation avec contraintes (C.A.H. contiguïté). Cah. Anal. Données, 3, 275-287.
Legendre, P. (1987). Constrained clustering. In Developments in Numerical Ecology, P. Legendre and L. Legendre, eds. Springer-Verlag, Berlin, pp. 289-307.
Legendre, P. and Legendre, L. (1998). Numerical Ecology. Elsevier, NY. 853 pp.
Patil, G.P. and Taillie, C. (2004). Upper level set scan statistic for detecting arbitrarily shaped hotspots. Environmental and Ecological Statistics, 11, 183-197.
Urban, D.L. (2004). Multivariate Analysis. Nonhierarchical Agglomeration. Spatially Constrained Classification. http://www.env.duke.edu/landscape/classes/env358/mv_pooling.pdf