Privacy and Confidentiality Issues with Spatial Data: The Data Center Perspective
Deborah Balk, Robert Downs, W. Christopher Lenhardt, Francesca Pozzi
22 May 2003
© The Trustees of Columbia University in the City of New York
2 Presentation Overview
- Issues
- Trends and Examples
- Data Center-based Responses
- Benefits from Appropriate Data Center Responses
3 Issues
- Why privacy and confidentiality?
- Privacy and confidentiality and spatial data
- Why use spatial data?
4 Restating the Issue of Privacy and Confidentiality
Researchers and users of data have a legal and moral responsibility to protect the privacy and confidentiality of individuals participating in research.
5 Personal Identifying Information and Spatial Data
The typical concern is not the spatial data itself but:
- mapping sensitive information in a way that could allow a subject to be identified, or
- integrating different data sets in a way that could allow individual respondents to be identified.
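The risk created by integrating data can be illustrated with a k-anonymity-style count, a standard disclosure-control measure that is not named in this presentation. The sketch below assumes hypothetical field names and an illustrative threshold `k`: once a location field such as a cluster ID sits alongside tabular attributes, any combination of values shared by fewer than `k` respondents is a candidate for identification, even with names removed.

```python
from collections import Counter

# Hypothetical survey extract: names removed, but each record still carries
# a cluster identifier and a few tabular attributes.
records = [
    {"cluster": "C01", "sex": "F", "age_group": "15-19", "births": 0},
    {"cluster": "C01", "sex": "F", "age_group": "15-19", "births": 0},
    {"cluster": "C01", "sex": "F", "age_group": "40-44", "births": 7},
    {"cluster": "C02", "sex": "M", "age_group": "20-24", "births": 0},
    {"cluster": "C02", "sex": "M", "age_group": "20-24", "births": 0},
]

# Fields that are not direct identifiers but can identify in combination.
QUASI_IDENTIFIERS = ("cluster", "sex", "age_group", "births")


def equivalence_class_sizes(rows, keys):
    """Count how many records share each combination of quasi-identifiers."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def at_risk(rows, keys, k=2):
    """Return the combinations shared by fewer than k records."""
    sizes = equivalence_class_sizes(rows, keys)
    return [combo for combo, n in sizes.items() if n < k]


print(at_risk(records, QUASI_IDENTIFIERS))
```

Dropping the location field from the key (for example, checking only `("sex",)`) typically enlarges every group, which is why the spatial component deserves particular attention.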
6 Why Integrate or Use Spatial Data?
- Re-evaluation of social or health data in a geospatial framework
- Evaluating spatial patterns is only a first step
- Analysis of these data with geographically specified and environmental parameters
- Geographic parameters have often been implicit, e.g., county of residence
- New technologies, such as global positioning systems (GPS), make geographic parameters explicit, e.g., latitude-longitude coordinates
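Explicit coordinates let analysts derive geographic parameters that a county code cannot provide, such as the distance from a cluster to the nearest facility. A minimal sketch, assuming hypothetical cluster and clinic coordinates and a spherical Earth (the haversine great-circle formula):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the sphere is an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat-long points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_km(point, features):
    """Distance from a (lat, lon) point to the closest (lat, lon) feature."""
    return min(haversine_km(point[0], point[1], f[0], f[1]) for f in features)


# Hypothetical cluster and clinic coordinates (decimal degrees).
cluster = (12.65, -8.00)
clinics = [(12.70, -8.05), (13.00, -7.50)]
print(round(nearest_km(cluster, clinics), 1))
```

The same precision that makes this kind of analysis possible is what raises the confidentiality concern discussed on the following slides.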
7 Uses of Linked Micro-Level Data
Data applications:
- At the individual level, exact locations are known
  - Confidentiality is a clear concern, even with masked identifiers (names removed)
  - Even when grouped (e.g., in sample clusters)
- At different scales: aggregating up
  - Why isn't this enough? Or, when is it enough?
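Aggregating up can be sketched as grouping individual records by area and suppressing areas with too few records. Minimum-cell-size suppression is a common disclosure-control rule, used here as an assumption rather than as the presenters' method; the field names and the threshold are hypothetical.

```python
from collections import defaultdict


def aggregate_with_suppression(rows, area_key, value_key, min_count=10):
    """Aggregate individual records (value 0/1) to an area level.

    Areas with fewer than `min_count` records are suppressed (reported as None)
    because small counts can still point to identifiable individuals.
    `min_count` is an illustrative threshold, not a DHS rule.
    """
    counts = defaultdict(int)
    totals = defaultdict(int)
    for row in rows:
        counts[row[area_key]] += 1
        totals[row[area_key]] += row[value_key]
    out = {}
    for area, n in counts.items():
        out[area] = None if n < min_count else {"n": n, "rate": totals[area] / n}
    return out


# Hypothetical records: district A has 12 respondents, district B only 3.
rows = (
    [{"district": "A", "uses_contraception": 1}] * 6
    + [{"district": "A", "uses_contraception": 0}] * 6
    + [{"district": "B", "uses_contraception": 1}] * 3
)
print(aggregate_with_suppression(rows, "district", "uses_contraception"))
```

Suppression answers the slide's question only partly: whether aggregation is "enough" still depends on what other data can be overlaid on the published areas.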
8 Trends and Examples
- Accessibility of higher-resolution data is increasing
- Ubiquity of GIS technology
- Demographic and Health Surveys (DHS)
9 Easily Accessible High-Resolution Data
Lamont-Doherty Earth Observatory, Palisades, NY
Source: http://terraserver.microsoft.com/
10 Tornado Damage, Oklahoma City, May 8, 2003
One-meter IKONOS imagery from Space Imaging (http://www.spaceimaging.com/)
11 Examples with Demographic and Health Survey (DHS) Data
- 100 surveys in roughly 75 countries (1984-present)
- 45 surveys with GPS data in 30 countries (late 1990s to present), mostly in Africa
- GPS points are taken at the population center of each cluster (or enumeration area)
- Roughly 30 households per cluster; clusters range from a single building in an urban area to a 250 km² area in sparsely populated areas
- Survey content includes highly sensitive subjects:
  - Births
  - Deaths
  - Contraceptive use
  - HIV knowledge, preventive measures, and blood samples
  - Household assets
- Data are publicly and freely available upon request
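For a sense of scale, treating the largest (250 km²) cluster area as a circle, which is a simplifying assumption since real enumeration areas are irregular, gives its equivalent radius and approximate household density:

```latex
r = \sqrt{\frac{A}{\pi}} = \sqrt{\frac{250\ \text{km}^2}{\pi}} \approx 8.9\ \text{km},
\qquad
\frac{30\ \text{households}}{250\ \text{km}^2} = 0.12\ \text{households/km}^2
```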
12 Case for Integrating Geospatial Data with Health Data: DHS Clusters and Aridity Zones, West Africa
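Overlaying cluster points on a zone map amounts to a point-in-polygon test. A minimal sketch using the ray-casting algorithm, assuming hypothetical, highly simplified zone polygons; real analyses would normally use a GIS library and properly projected boundaries.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside `polygon`, a list of (lon, lat) vertices?

    Treats coordinates as planar, which is adequate for small polygons
    away from the poles and the antimeridian.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # x-coordinate where the edge crosses that line (y2 != y1 here).
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_zone(lon, lat, zones):
    """Return the name of the first zone containing the point, or None."""
    for name, polygon in zones.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical, highly simplified aridity zones as (lon, lat) vertices.
zones = {
    "arid": [(-10.0, 15.0), (0.0, 15.0), (0.0, 20.0), (-10.0, 20.0)],
    "semi-arid": [(-10.0, 10.0), (0.0, 10.0), (0.0, 15.0), (-10.0, 15.0)],
}
clusters = {"C01": (-5.0, 17.0), "C02": (-5.0, 12.0), "C03": (5.0, 12.0)}
for cid, (lon, lat) in clusters.items():
    print(cid, assign_zone(lon, lat, zones))
```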
13 Overlaying Satellite Imagery
- Moderate resolution (roughly 30 m pixels), e.g., Landsat
  - Gives a good indication of vegetation, land-use change, and some vector habitats
  - Gives a general indication of DHS clusters, but the precise location of a cluster is difficult to determine
- High resolution (4 m pixels), e.g., Quickbird
  - Indicates vegetation, roads, bridges, and built environments, even individual buildings
  - Could easily be mapped with street-location data
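The difference in what each resolution can reveal is simple arithmetic: a feature much smaller than one pixel is blended into it, while a feature spanning several pixels can be outlined. A sketch, assuming a hypothetical 10 m × 10 m building and square pixels at the resolutions cited on this slide:

```python
def pixels_covering(footprint_m, pixel_m):
    """Approximate number of square pixels covered by a square footprint of side `footprint_m`."""
    return (footprint_m / pixel_m) ** 2


# Hypothetical 10 m x 10 m building at the two resolutions on the slide.
for sensor, pixel in [("Landsat (~30 m)", 30.0), ("Quickbird (4 m)", 4.0)]:
    print(sensor, round(pixels_covering(10.0, pixel), 2))
```

At roughly 30 m the building occupies about a tenth of a pixel, whereas at 4 m it spans more than six pixels, which is why individual structures become visible.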
14 Imagery Comparison: Landsat and Quickbird
19 Frequency of Cluster Size
Cluster size ranges from 2 to 36 persons per cluster.
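A frequency table of this kind can be built directly from per-cluster counts; the smallest clusters are the ones in which each respondent is least hidden. A sketch with hypothetical cluster IDs and sizes, and an assumed small-cluster threshold:

```python
from collections import Counter


def size_frequencies(cluster_sizes):
    """Map each cluster size to the number of clusters with that size, sorted by size."""
    return dict(sorted(Counter(cluster_sizes).items()))


def small_clusters(cluster_sizes_by_id, threshold=10):
    """IDs of clusters with fewer respondents than `threshold` (an illustrative cut-off)."""
    return sorted(cid for cid, n in cluster_sizes_by_id.items() if n < threshold)


# Hypothetical cluster sizes (persons per cluster).
sizes = {"C01": 2, "C02": 24, "C03": 36, "C04": 24, "C05": 8}
print(size_frequencies(sizes.values()))
print(small_clusters(sizes))
```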
20 HIV/AIDS Testing
- Three recent DHS surveys (Mali, Dominican Republic, Zambia) have conducted HIV testing among a subsample of surveyed women aged 15-49 and men aged 15-59, making them among the first nationally representative surveys to include biomarker testing for HIV/AIDS.
- HIV tests were "anonymized" or "delinked" so that the test results could not be linked back to the individual data file, preserving respondents' confidentiality.
- Respondents were given coupons to obtain testing themselves if they wished, along with counseling services.
- Results were then relinked to the original survey, but with random IDs.
Source: L. Montana, 2003
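The delink-then-relink step can be sketched as a key table that maps survey IDs to random IDs, with the released file carrying only the random ID. This is a minimal sketch with hypothetical field names and placeholder result values; the slide does not describe the actual DHS implementation.

```python
import secrets


def delink(survey_ids):
    """Assign each survey respondent a random test ID.

    Returns the key (survey ID -> random ID). The key is what would have to be
    kept separately, or destroyed, to keep test results unlinkable.
    """
    key = {}
    used = set()
    for sid in survey_ids:
        rid = secrets.token_hex(8)
        while rid in used:  # guard against an (unlikely) collision
            rid = secrets.token_hex(8)
        used.add(rid)
        key[sid] = rid
    return key


def relink(survey_rows, test_results, key):
    """Attach test results to survey rows, replacing the survey ID with the random ID."""
    released = []
    for row in survey_rows:
        rid = key[row["id"]]
        public = {k: v for k, v in row.items() if k != "id"}
        public["random_id"] = rid
        public["test_result"] = test_results.get(rid)
        released.append(public)
    return released


# Hypothetical survey rows; the testing side sees only random IDs.
survey = [{"id": 101, "cluster": "C01", "age": 23}, {"id": 102, "cluster": "C01", "age": 41}]
key = delink(r["id"] for r in survey)
results = {key[101]: "result_a", key[102]: "result_b"}  # placeholders, not real data
released = relink(survey, results, key)
print(released)
```

Note that the released rows still carry a cluster field, so the location-based risks discussed earlier apply to the relinked file as well.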
21 Adding Spatial Noise (EA = enumeration area; Malawi)
- 2 km, urban: increases the potential number of households from 260 to 2,340; adds 9 EAs for every sampled EA
- 5 km, rural: increases the potential number of households from 214 to 2,568; adds 12 EAs for every sampled EA
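Random displacement of the kind described above (up to 2 km for urban and 5 km for rural clusters) can be sketched as follows. The slide does not specify how direction and distance are drawn; a uniform direction and a uniform distance are assumptions here, as is the flat-Earth conversion from kilometres to degrees. The last line checks the slide's candidate-pool figures: the enlarged pools are exactly 9 and 12 times the original household counts.

```python
import math
import random

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def displace(lat, lon, max_km, rng=random):
    """Move a point in a random direction by a random distance in [0, max_km).

    Uses a flat-Earth approximation, adequate for offsets of a few km.
    Drawing the distance uniformly (rather than uniformly over the disk's area)
    places more displaced points near the original location; this is an
    assumed design choice, not a documented DHS procedure.
    """
    angle = rng.uniform(0.0, 2.0 * math.pi)
    dist_km = rng.uniform(0.0, max_km)
    dlat = dist_km * math.cos(angle) / KM_PER_DEG_LAT
    dlon = dist_km * math.sin(angle) / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon


MAX_DISPLACEMENT_KM = {"urban": 2.0, "rural": 5.0}  # limits from the slide


def displace_cluster(lat, lon, residence, rng=random):
    """Displace a cluster point using the limit for its urban/rural status."""
    return displace(lat, lon, MAX_DISPLACEMENT_KM[residence], rng)


# Candidate-pool arithmetic from the slide (Malawi):
print(2340 / 260, 2568 / 214)  # 9.0 12.0
```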
22 Methodological Questions
- How much error do these buffers introduce, especially when they fall within the spatial error of some overlaid data sets?
- Does spatial noise compound "tabular" noise?
- Can we predict, a priori, all the possible permutations created by newly available data?
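One way to start on the first question is a Monte Carlo estimate of how often displacement moves a cluster across a zone boundary, which would assign it the wrong environmental value in an overlay. A sketch, assuming a straight boundary at a given distance from the true point and displacement in a uniformly random direction by a distance drawn uniformly up to a maximum (the slide does not specify the displacement model):

```python
import math
import random


def crossing_probability(boundary_km, max_km, trials=100_000, seed=0):
    """Estimate how often a displaced point lands across a straight zone boundary.

    The true point sits `boundary_km` from a straight boundary. It is displaced
    in a uniformly random direction by a distance drawn uniformly from
    [0, max_km), and crosses when its displacement component toward the
    boundary exceeds `boundary_km`.
    """
    rng = random.Random(seed)
    crossed = 0
    for _ in range(trials):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        dist = rng.uniform(0.0, max_km)
        if dist * math.cos(angle) > boundary_km:
            crossed += 1
    return crossed / trials


# Rural case: maximum displacement of 5 km, boundary at various distances.
for d in (0.0, 1.0, 2.5, 5.0):
    print(d, round(crossing_probability(d, max_km=5.0), 3))
```

A point sitting on the boundary is misassigned about half the time, while one farther from the boundary than the maximum displacement is never misassigned; real zone maps add their own positional error on top of this.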
23 Data Center Responses: 3Ps and a K
- Policies
- Procedures
- People
- Knowledge
24 Policies
- To control data
- To sensitize personnel and end users
25 Procedures
- Restrict access to data through a controlled environment
- Promote a data "enclave model," in which individual researchers may visit a "safe" site for full access to confidential data
- Consider developing virtual data environments to extract and use micro-level data while protecting confidentiality, e.g., IPUMS at the University of Minnesota
- Document confidentiality issues in metadata
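The controlled-environment idea can be expressed as an access rule on the data itself: exact locations are released only inside the enclave, and displaced ones are released elsewhere. A minimal sketch with hypothetical tier names, field names, and coordinates:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterRecord:
    """Hypothetical record holding both exact and displaced cluster coordinates."""
    cluster_id: str
    exact_lat: float
    exact_lon: float
    displaced_lat: float
    displaced_lon: float


def coordinates_for(record: ClusterRecord, access_tier: str) -> tuple[float, float]:
    """Return the coordinates a user in `access_tier` may see.

    'enclave' users (inside the controlled environment) see exact locations;
    'public' users see only displaced coordinates. Any other tier is refused.
    """
    if access_tier == "enclave":
        return (record.exact_lat, record.exact_lon)
    if access_tier == "public":
        return (record.displaced_lat, record.displaced_lon)
    raise PermissionError(f"unknown access tier: {access_tier!r}")


rec = ClusterRecord("C01", -13.50, 34.00, -13.52, 34.03)
print(coordinates_for(rec, "public"))
```

Refusing unknown tiers, rather than defaulting to one of them, keeps a configuration mistake from silently exposing exact locations.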
26 People
- Staff: read and sign an agreement indicating a commitment to protect confidential data and to follow relevant procedures (similar to a computer use policy)
- Researchers: responsible-use statement
27 Knowledge Transfer
- Support researchers and local IRBs by transmitting knowledge of potential confidentiality issues in using spatial data
- Communicate the methods used to protect confidentiality in a data set, e.g., adding spatial noise
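Communicating protection methods is easier when they are recorded in machine-readable metadata distributed with the data. A sketch of such a record, in which every field name is hypothetical (not drawn from a specific metadata standard) and the displacement limits are the 2 km urban / 5 km rural values discussed earlier in this presentation:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ConfidentialityMetadata:
    """Hypothetical metadata block describing how a data set was protected."""
    dataset: str
    coordinates_displaced: bool
    max_displacement_km: dict = field(default_factory=dict)
    identifiers_removed: list = field(default_factory=list)
    access: str = "public"
    notes: str = ""


meta = ConfidentialityMetadata(
    dataset="example-survey-clusters",
    coordinates_displaced=True,
    max_displacement_km={"urban": 2.0, "rural": 5.0},
    identifiers_removed=["name", "household_address"],
    notes="Cluster points are randomly displaced; do not treat them as exact locations.",
)
print(json.dumps(asdict(meta), indent=2))
```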
28 Benefits
- Protect respondents
- Further science
- Support researchers' interface with local IRBs
- Create an "enclave" for the responsible use of confidential data products, e.g., US Census Data Centers
- An alternative model for conducting research: "getting out from behind your desk" promotes scientific interactions and new ways of thinking