Privacy and Confidentiality Issues with Spatial Data: The Data Center Perspective Deborah Balk, Robert Downs, W. Christopher Lenhardt, Francesca Pozzi.

1 Privacy and Confidentiality Issues with Spatial Data: The Data Center Perspective
Deborah Balk, Robert Downs, W. Christopher Lenhardt, Francesca Pozzi
22 May 2003
© The Trustees of Columbia University in the City of New York

2 Presentation Overview
- Issues
- Trends and Examples
- Data Center-based Responses
- Benefits from Appropriate Data Center Responses

3 Issues
- Why privacy and confidentiality?
- Privacy and confidentiality and spatial data
- Why use spatial data?

4 Restate the issue of privacy and confidentiality
Researchers and users of data have a legal and moral responsibility to protect the privacy and confidentiality of individuals participating in research.

5 Personal Identifying Information and Spatial Data
The typical concern is not the spatial data itself but either (a) mapping sensitive information in a way that could allow a subject to be identified, or (b) integrating different data sets in a way that could allow individual respondents to be identified.

6 Why integrate or use spatial data?
- Re-evaluation of social or health data in a geospatial framework
- Evaluating spatial patterns is only a first step
- Analysis of these data with geographically specified and environmental parameters
- Geographic parameters have often been implicit
  - E.g., county of residence
- New technologies, like global positioning systems (GPS), make geographic parameters explicit
  - E.g., lat-long coordinates

7 Usages of linked micro-level data
- Data applications
- At the individual level: exact locations known
  - Confidentiality is a clear concern, even with masked identifiers (names removed)
  - Even when grouped (e.g., in sample clusters)
- At different scales: aggregating up
  - Why isn't this enough? Or, when is it enough?
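The "aggregating up" question can be illustrated with a minimal Python sketch. It counts individual records per area and withholds any count below a threshold. The field name `district`, the sample records, and the threshold of 5 are hypothetical and do not come from the presentation; small-cell suppression is one common disclosure-control convention, used here as an assumption rather than as the authors' method.

```python
from collections import Counter

def aggregate_with_suppression(records, area_key, min_cell_size):
    """Count records per area; return None for areas with too few records."""
    counts = Counter(record[area_key] for record in records)
    return {
        area: (count if count >= min_cell_size else None)
        for area, count in counts.items()
    }

# Hypothetical individual-level records (not real survey data).
records = (
    [{"district": "A"} for _ in range(12)]
    + [{"district": "B"} for _ in range(3)]
)

table = aggregate_with_suppression(records, "district", min_cell_size=5)
print(table)  # {'A': 12, 'B': None}
```

District B shows why aggregation alone may not be enough: with only three records, a published count could still point to identifiable individuals, so the cell is withheld.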

8 Trends and Examples
- Accessibility of higher-resolution data is increasing
- Ubiquity of GIS technology
- Demographic and Health Surveys

9 Easily Accessible High Resolution Data
http://terraserver.microsoft.com/
Lamont-Doherty Earth Observatory, Palisades, NY

10 From Space Imaging (http://www.spaceimaging.com/)
Tornado Damage, Oklahoma City, May 8, 2003
One-meter IKONOS

11 Examples with Demographic and Health Survey (DHS) data
- 100 surveys in roughly 75 countries (1984-present)
- 45 with GPS data in 30 countries (late 1990s to present)
  - Mostly in Africa
- GPS points taken at the population center of the cluster (or enumeration area)
  - Roughly 30 households per cluster
  - Ranges from a single building in an urban area to a 250 km² area in sparsely populated areas
- Survey content includes highly sensitive subjects:
  - Births
  - Deaths
  - Contraceptive use
  - HIV knowledge, preventative measures and blood samples
  - Household assets
- Data are publicly and freely available on request

12 Case for integrating geospatial data with health data
DHS Clusters & Aridity Zones, West Africa

13 Overlaying satellite imagery
- Moderate resolution (roughly 30 meters²), e.g., Landsat
  - Gives a good indication of vegetation, land use change, some vector habitats
  - Gives a general indication of DHS clusters; difficult to determine the precise location of a cluster
- High resolution (4 meters²), e.g., Quickbird
  - Indicates vegetation, roads, bridges and built environments
  - Even exact buildings
  - Could easily be mapped with street-location data

14 Landsat, Quickbird

19 Frequency of cluster size
Ranges from 2 to 36 persons per cluster

20 HIV/AIDS testing
- Three recent DHS surveys have conducted testing among a subsample of surveyed women age 15-49 and men age 15-59, making them some of the first nationally representative surveys to include biomarker testing for HIV/AIDS: Mali, Dominican Republic, Zambia
- HIV tests were "anonymized" or "delinked" so that the results could not be linked back to the individual data file, in order to preserve the confidentiality of respondents
- Coupons were provided to respondents so they could obtain testing themselves if they wished, along with counseling services
- Results were then relinked to the original survey, but with random IDs
- Source: L. Montana, 2003
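One possible reading of "relinked to the original survey but with random IDs" is sketched below in Python. This is an assumption about the general idea, not a description of the actual DHS procedure: the field names (`respondent_id`, `random_id`, `hiv_result`) and the sample rows are hypothetical. The sketch attaches each test result to its survey row and then replaces the real identifier with a random one, so the released file carries no original ID.

```python
import secrets

def relink_with_random_ids(survey_rows, test_results, id_field):
    """Attach test results to survey rows, then swap the real ID for a random one."""
    released = []
    used_ids = set()
    for row in survey_rows:
        # Draw a random identifier that carries no information about the respondent.
        random_id = secrets.token_hex(8)
        while random_id in used_ids:
            random_id = secrets.token_hex(8)
        used_ids.add(random_id)

        # Copy every field except the original identifier.
        released_row = {k: v for k, v in row.items() if k != id_field}
        released_row["random_id"] = random_id
        released_row["hiv_result"] = test_results.get(row[id_field])
        released.append(released_row)
    return released

# Hypothetical rows (not real survey data).
survey_rows = [
    {"respondent_id": "R001", "age": 24},
    {"respondent_id": "R002", "age": 37},
]
test_results = {"R001": "negative", "R002": "negative"}

released = relink_with_random_ids(survey_rows, test_results, "respondent_id")
```

Replacing the identifier does not by itself protect respondents if other fields, such as location, remain identifying, which is the concern the following slides address.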

21 Adding spatial noise
- 2 km urban
  - Increases the potential number of households from 260 to 2,340
  - Adds 9 EA for every sampled EA
- 5 km rural
  - Increases the potential number of households from 214 to 2,568
  - Adds 12 EA for every sampled EA
- EA = enumeration area (Malawi)
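A minimal Python sketch of adding spatial noise follows. The 2 km (urban) and 5 km (rural) radii come from the slide; everything else is an assumption made for illustration, including the choice of a uniformly random offset within the radius, the function names, and the sample coordinates. The slides do not specify how the displacement was actually drawn.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0
# Maximum displacement per setting, taken from the slide.
MAX_OFFSET_KM = {"urban": 2.0, "rural": 5.0}

def displace(lat, lon, setting, rng):
    """Return (lat, lon) moved by a random offset within the setting's radius."""
    max_km = MAX_OFFSET_KM[setting]
    # sqrt() spreads points uniformly over the disc's area
    # instead of concentrating them near the center.
    distance_km = max_km * math.sqrt(rng.random())
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    # Flat-earth approximation, adequate for offsets of a few
    # kilometers away from the poles.
    dlat = distance_km * math.cos(bearing) / EARTH_RADIUS_KM
    dlon = distance_km * math.sin(bearing) / (
        EARTH_RADIUS_KM * math.cos(math.radians(lat))
    )
    return lat + math.degrees(dlat), lon + math.degrees(dlon)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

rng = random.Random(42)
true_lat, true_lon = -13.96, 33.79  # hypothetical cluster location
masked_lat, masked_lon = displace(true_lat, true_lon, "rural", rng)
moved_km = haversine_km(true_lat, true_lon, masked_lat, masked_lon)
print(round(moved_km, 2), "km moved")
```

The household figures on the slide correspond to multiplying the candidate pool by 9 in urban areas (260 × 9 = 2,340) and by 12 in rural areas (214 × 12 = 2,568): the larger the buffer, the more households a displaced point could plausibly belong to.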

22 Methodological Questions
- How much error is introduced by these buffers, especially if the buffers are within the spatial error of some overlaying data sets?
- Does spatial noise compound "tabular" noise?
- Can we predict a priori all the possible permutations with newly available data?

23 Data Center Responses: 3 Ps and a K
- Policies
- Procedures
- People
- Knowledge

24 Policies
- To control data
- Sensitize personnel and end-users

25 Procedures
- Restricting access to data through a controlled environment
  - Promote the data "enclave model," whereby individual researchers may visit a "safe" site for full access to confidential data
  - Consider developing virtual data environments to extract and use micro-level data while protecting confidentiality, e.g., IPUMS at the University of Minnesota
- Documenting confidentiality issues in metadata

26 People
- Staff: read and sign an agreement indicating a commitment to protect confidential data and to follow relevant procedures (similar to a computer use policy)
- Researchers: responsible use statement

27 Knowledge Transfer
- Support researchers and local IRBs by transmitting knowledge of potential confidentiality issues in using spatial data
- Communicate the methods used to protect confidentiality in a data set, e.g., adding spatial noise

28 Benefits
- Protect respondents
- Further science
- Support researchers as they interface with local IRBs
- Create an "enclave" for the responsible use of confidential data products, e.g., US Census Data Centers
- Alternative model for conducting research, "getting out from behind your desk," promotes scientific interactions and new ways of thinking

