Presentation on theme: "1 Laura Odwazny Senior Attorney Office of the General Counsel U.S. Department of Health and Human Services Research Using Data Mined from the Internet."— Presentation transcript:
1 Laura Odwazny Senior Attorney Office of the General Counsel U.S. Department of Health and Human Services Research Using Data Mined from the Internet -- Regulatory Considerations DOE CIRB meeting June 14, 2012
2 Disclaimer This presentation does not constitute legal advice. The views expressed are the presenter’s own, and do not bind the U.S. Department of Health and Human Services or its components.
3 Do Note: OHRP has no guidance on Internet research specifically Many boards have separate guidelines and best practices for Internet research
4 Internet Research Internet research = research which utilizes the Internet to collect information through an online tool, such as an online survey; studies about how people use the Internet, e.g., through collecting data and/or examining activities in or on any online environments; and/or, uses of online datasets, databases, databanks, repositories. –Internet as a TOOL FOR research or… –Internet as a MEDIUM/LOCALE OF research TOOL=search engines, databases, catalogs, etc… MEDIUM/LOCALE=chat rooms, newsgroups, home pages, multi-player gaming sites, blogs, Skype, tweeting, online course software, etc.
5 Forms of Research: Exploring Where Human Subjects Fit Consider Methodologies, Venues, Types of Data Generated through: Quantitative Research – Data Aggregation, Scraping, Transaction Log Analysis, Network Analysis, Statistical Analysis, etc. Qualitative Research – Ethnography, Focus Groups, Observation, Surveys, Content/Discourse Analysis, etc.
6 Forms of Internet Research Venues Email, IM, tweets Listserves, chat rooms Search engines, other archives Social network sites, media sharing sites Blogs and home pages Virtual worlds Online marketplaces, online gaming Databanks, repositories Venues other than “place-based”, e.g. mobile data collection
7 E-Data Raises New Ethical Challenges Trackability –“Dataveillance” = data monitoring + recording “Greased” –“When information is computerized, it is greased to slide easily and quickly to many ports of call. But legitimate concerns about privacy arise when this speed and convenience lead to the improper exposure of information. Greased information is information that moves like lightning and is hard to hold onto.” Malleability –Can be utilized in varied ways for multiple purposes Invisibility Factor –Computer operations usually invisible; can allow for abuse James Moor, 1985
9 Online Support Groups
11 Twitter Blurs the boundaries between public/private Tweeter A (private) followed by Tweeter B (public) Tweeter B retweets A = Tweet A is now visible to Tweeter B’s (public) feed Track-backability is increased; consider sensitivity, reputation, risk/benefit Archived Tweet Data fields: –country code: –id: –klout score: –link: –location: –coord type: –location coords: –location displayname: –location type: –posted time: –real name: –rule match: –tweet url: –user twitter page: –username:
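As a rough illustration of why the archived fields matter for track-backability, the sketch below takes the field names from the slide and drops the ones that point straight back at an account. Which fields count as direct identifiers is an assumption made for this example (`ASSUMED_DIRECT_IDENTIFIERS` is hypothetical, not a regulatory list), and the sample record is invented.

```python
# Field names as listed on the slide.
ARCHIVED_TWEET_FIELDS = [
    "country code", "id", "klout score", "link", "location", "coord type",
    "location coords", "location displayname", "location type", "posted time",
    "real name", "rule match", "tweet url", "user twitter page", "username",
]

# ASSUMPTION for illustration only: fields that name the account holder
# or link directly back to the tweet or profile.
ASSUMED_DIRECT_IDENTIFIERS = {
    "id", "link", "real name", "tweet url", "user twitter page", "username",
}


def strip_direct_identifiers(record):
    """Return a copy of the record without the assumed direct identifiers."""
    return {
        field: value
        for field, value in record.items()
        if field not in ASSUMED_DIRECT_IDENTIFIERS
    }


# Invented sample record; values are placeholders, not real data.
example = {field: "placeholder" for field in ARCHIVED_TWEET_FIELDS}
cleaned = strip_direct_identifiers(example)
remaining = sorted(cleaned)
```

Even after this step, fields such as `location coords` and `posted time` remain in `cleaned`, so a record may still be traceable when combined with other sources. Dropping obvious identifiers is therefore a starting point for the sensitivity and risk/benefit discussion rather than an answer to it.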
12 Regulatory considerations
13 Big regulatory issues… What is “private”? What is “identifiable”? How to protect subjects’ privacy and confidentiality interests? Minimizing risk when using sensitive online data –Current sensitivity vs. future sensitivity –Informational risks –Data security
14 OHRP’s Analytic Framework for the Common Rule: Always Start With… Is the activity subject to regulation? –Conducted or supported by a Common Rule agency? –Covered under an applicable FWA? Is it research? Does it involve human subjects? Is it exempt? Keep in mind regulatory flexibilities: –Can it be expedited? –Waiver of informed consent? –Waiver of documentation of consent?
15 Human subject .102(f): “a living individual about whom an investigator conducting research obtains (1) data through intervention or interaction with the individual, or (2) identifiable private information… Private information includes information about behavior that occurs in a context in which an individual can reasonably assume that no observation or recording is taking place, and information which has been provided for specific purposes by an individual and which the individual can reasonably expect will not be made public (for example, a medical record). Private information must be individually identifiable (i.e., the identity of the subject is or may readily be ascertained by the investigator or associated with the information) in order for obtaining the information to constitute research involving human subjects.” (emphasis added)
16 Privacy in the Internet age Private How to interpret “reasonably expect that no observation or recording is taking place” or “reasonably expect will not be made public” –IMs, tweets, email, FB profile, chatroom discussions, listserves Must information be considered either “public” or “private”? –Members-only forum, community standards Shifting norms about what information is “private” What is a “reasonable” expectation of privacy in grid/Internet/e-data? –Expectations of privacy vs. actual privacy
17 How should the IRB assess privacy? What expectations of privacy are “reasonable”? –Get information about the environment –Get information about the users –Review Terms of Service –Data security consideration
18 Human subjects (2) Identifiable Individually identifiable = subject’s identity readily ascertainable by the investigator or associated with the information Structure of social network, search terms, purchase habits, movie ratings on Netflix may uniquely identify an individual –Zip code + sex + DOB enough for Latanya Sweeney to identify individuals Given demonstrated ability to reidentify individuals from anonymized or aggregated data, is this a meaningful decision point?
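The zip code + sex + DOB point can be made concrete with a small uniqueness check. The sketch below is a minimal illustration, not a reidentification method from the presentation: it measures what share of records in a dataset are the only ones with a given combination of attributes. The function name `unique_fraction`, the field names, and the sample records are all invented for this example.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Share of records whose combination of the given fields is unique."""
    if not records:
        return 0.0
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(keys)


# Invented records with no direct identifiers (no names, no account IDs).
SAMPLE = [
    {"zip": "00001", "sex": "F", "dob": "1970-01-01"},
    {"zip": "00001", "sex": "F", "dob": "1970-01-01"},
    {"zip": "00001", "sex": "M", "dob": "1970-01-01"},
    {"zip": "00002", "sex": "F", "dob": "1985-06-15"},
]

zip_only = unique_fraction(SAMPLE, ("zip",))                  # 0.25
combined = unique_fraction(SAMPLE, ("zip", "sex", "dob"))     # 0.5
```

In this toy data, one record in four is unique on zip code alone, but half are unique once sex and date of birth are added. A record that is unique on such a combination can be matched against any outside source that carries the same fields alongside a name, which is why "anonymized" data may not stay that way.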
19 How should the IRB assess identifiability? When will the subject’s identity be “readily” ascertainable by the investigator or associated with the information? –Consider the investigator, e.g. Professor Latanya Sweeney vs. Professor Elizabeth Buchanan –Consider the potential identifiers –Consider likelihood of reidentification with triangulation
20 Exemption .101(b)(4) Research involving the collection or study of existing data, documents, records, pathological specimens, or diagnostic specimens, if these sources are publicly available or if the information is recorded by the investigator in such a manner that subjects cannot be identified, directly or through identifiers linked to the subjects.
21 Exemption .101(b)(4) applied When is information “recorded in an identifiable manner”? –Is an email address an identifier? –Do tweets contain identifiers? –Does the inclusion of IP address make information identifiable? When are data, documents, or records publicly available on the internet? –Does “publicly available” include large datasets purchased/obtained from Google or Facebook? –What if data are semi-restricted -- available only to ‘friends’, listserve members?
22 Key Considerations for IRB Review What type of venue? Expectations of privacy? Consent procedures? Sensitivity of data? Harm/Risk? Age verification? Authentication of participants? Identification of participants? Use of encryption? Storage/transmission of data?
23 Other potential issues – international research “PI is proposing to collect data from publicly accessible social media sites, some of which are hosted by servers outside of the US. The PI will collect all data from his computer in the US. Is the activity international research?” (from IRB Forum) –Consider EU Data Protection Directive, Canadian laws, etc. if applicable!
24 Stay tuned
25 ANPRM – Implications for Internet research Base concept of identifiability under Common Rule on HIPAA Privacy Rule standards of identifiability? To protect from informational risks (inappropriate use/disclosure of information), mandatory data security measures “modeled on” HIPAA? Apply Common Rule to all institutions receiving support from CR agency? No continuing review for most minimal risk research?
26 ANPRM – Proposals for “excused” research Additional requirements for “excused” (formerly exempt) research? –Registration –Consent, oral or written, depending, with waiver contemplated Oral w/o documentation for educational tests, surveys, focus groups, interviews –Data security standards –Retrospective auditing of portion of “excused” submissions
27 Proposal: Revised scope of existing exemption 4 Expansion of .101(b)(4) by removing “existing” and de-identified recording? –Keep collected for purposes other than the research
28 ANPRM – consent and exempt research Additional consent requirements for “excused” (formerly exempt) research? –Oral or written consent, depending, with waiver contemplated Oral w/o documentation for educational tests, surveys, focus groups, interviews (modifying exemption (b)(2)) Secondary use of data (modifying exemption (b)(4)) –originally collected for research purposes, consent required whether or not the researcher obtains identifiers –originally collected for non-research purposes, no change (no consent required unless identifiers are obtained)