Presentation is loading. Please wait.

Presentation is loading. Please wait.

DIMACS Working Group on Privacy / Confidentiality of Health Data Rutgers University Center Piscataway, New Jersey December 10-12, 2003.

Similar presentations


Presentation on theme: "DIMACS Working Group on Privacy / Confidentiality of Health Data Rutgers University Center Piscataway, New Jersey December 10-12, 2003."— Presentation transcript:

1

2 DIMACS Working Group on Privacy / Confidentiality of Health Data Rutgers University Center Piscataway, New Jersey December 10-12, 2003

3 Health Care Databases under HIPAA: Statistical Approaches to De-identification of Protected Health Information Judith E. Beach, Ph.D., Esq. Associate General Counsel, Regulatory Affairs Chief Privacy Officer Chair, Council on Data Protection and Council on Research Ethics

4 Outline  1.  1.Evolution of De-identification Standards – HIPAA Privacy Regulation  2.  2.De-identification Standards for Health Information in Research   a. Safe Harbor   b. Statistician Method  )  )HIPAA Provisions  )  )Quintiles Experience and Methodology   c. Limited Data Set  3.  3.Preemption of State laws on De-identification Standards for Health Information  4.  4.Health Information Privacy - Cases and Controversies

5 Evolution of De-Identification Standards in HIPAA Privacy Regulation

6 Federal Policy: De-Identification of Health Information  Government’s intent - to provide a balance of stringent standards flexible enough not to be a disincentive to use or disclose de-identified health information, wherever possible.  De-Identified health data is one of the best mechanisms for avoiding wrongful disclosure of Protected Health Information (PHI). See Draft (05/27/03) DHHS Policy and Procedure Manual “De-Identification Policy d11” (effective date 6/1/03) - applies to DHHS agencies: HIPAA covered health care components and Internal Business Associates 5

7 Federal Policy: Use of De-identified Health Data Rather than PHI for Research  “We [HHS] expressed the hope that covered entities, their business [associates] and others would make greater use of de-identified health information... when it is sufficient for the [research] purpose and that such practice would reduce the burden and the confidentiality concerns that result from the use of individually identifiable health information for some of these purposes.” [HHS, in final privacy rule, 65 Fed. Reg. at (Dec. 28, 2000), citing proposed privacy rule of Nov. 3, 1999] 6

8 HIPAA’s Jurisdiction  Individually Identifiable Health Information (IIHI):  A subset of health information, including demographic information, that identifies the individual or with respect to which there is a reasonable basis to believe the information can be used to identify the individual  Protected health information (PHI):  Means individually identifiable health information (IIHI = Health Information + Identifier) that is transmitted or maintained electronically, or transmitted or maintained in any other form or medium  An investigator who submits health claims would be a HIPAA covered entity (CE)  CE + Health Information + Identifier = PHI  CE + Identifier - Health Information = NOT PHI  Health Information + Identifier - CE = NOT PHI 7

9 De-identification Standards for Health Information in Research

10 De-identified Health Information  Definition: health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual. [45 CFR § (a)]  The Privacy Rule permits de-identification of PHI so that such information may be used and disclosed freely, without being subject to the Privacy Rule’s requirements.  Once de-identified, the data is out of the Privacy Rule. 9

11 HIPAA De-identification Standards  Two methods for the de-identification of health information:  “Safe Harbor” -- remove 18 specified identifiers - intended to provide a simple, definitive method for de-identifying health information with protection from litigation  “Statistician Method” -- retain some of the 18 safe harbor’s specified identifiers and demonstrate the standard is met if person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods, e.g., a Biostatistician, makes and documents that the risk of re-identification is very small. [45 CFR § ] 10

12 Limited Data Set  Final rule: added another method requiring removal of facial identifiers -- “Limited Data Set”  Under confidentiality agreements - for research, public health, and health care operations  Regarded as PHI - NOT de-identified  therefore, still subject to Privacy Rule requirements such as minimum necessary rule. 11

13 Safe Harbor Method

14 Safe Harbor  Covered entities must remove all of a list of 18 enumerated identifiers and have no actual knowledge that the information remaining could be used alone or in combination to identify a subject of the information.  The identifiers to be removed include  direct identifiers such as name, address, SSN  indirect identifiers such as birth date, admission and discharge dates, and five-digit zip code  [45 CFR § (b)(2)] 13

15 Safe Harbor The safe harbor does allow for the disclosure of  All geographic subdivisions no smaller than a State, as well as the initial three digits of a zip code  IF the geographic unit formed by combining all zip codes with the same initial three digits contains more than 20,000 people  AGE, if less than 90, gender, ethnicity and other demographic information not listed. 14

16 Safe Harbor’s 18 Identifiers   Names   All geographic subdivisions smaller than a State, including street address, city, county, precinct, zip code, and their equivalent geocodes   Except for the initial three digits of a zip code if according to the currently available data from the Bureau of the Census:   The geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people; and   The initial three digits of a zip code for all such geographic units containing 20,000 or fewer people are changed to 000;   All elements of dates (except year) or dates directly relating to an individual, including:   birth date, admission date, discharge date, date of death;   and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older;   Telephone numbers;   Fax numbers;   Electronic mail addresses;   Social security numbers;   Medical record numbers;   Health plan beneficiary numbers;   Account numbers;   Certificate/license numbers;   Vehicle identifiers and serial numbers, including license plate numbers;   Device identifiers and serial numbers;   Web Universal Resource Locators (URLs);   Internet Protocol (IP) address numbers;   Biometric identifiers, including finger and voice prints;   Full face photographic images and any comparable images; and   Any other unique identifying number, characteristic, or code. 15

17 Sources of Authority   In Privacy Rule Preamble, HHS recognizes two sources of authority as to what constitutes such principles and methods for de- identification adequate for posting a de- identified database on the Internet [65 Fed. Reg. at 82, ,710 (Dec. 28, 2000)]   “Paper 22”: Statistical Policy Working Paper 22—Report on Statistical Disclosure Limitation Methodology   “The Checklist”: The Checklist on Disclosure Potential of Proposed Data Releases -“intended primarily for use in the development of public-use data products.” 16

18 Safe Harbor  BUT many researchers and other groups have complained that the Safe Harbor renders the de-identified data as virtually useless for research so that the result will be MORE research using PHI.  No dates of service, no patient initials, no date of birth  Can have “deltas” such as number of patient visits over time  However, the safe harbor was NOT designed for research, but to provide an approved method of de-identification for any purpose by any covered entity, regardless of sophistication.  For instance, such de-identified data would be deemed to be safely posted on the Internet. 17

19 Statistician Method

20 For this method, the covered entity  must remove all direct identifiers  reduce the number of variables on which a match might be made  should limit the distribution of records through a “data use agreement” or “restricted access agreement” [65 Fed. Reg. at 82, (Dec. 28, 2000)] 19

21 Opinion of Statistician  Statistician must  determine that there is a “very small risk” of re- identification  after applying “generally accepted statistical and scientific principles and methods for rendering information not individually identifiable”  documents the methods and results of the analysis that justify such determination. [45 CFR (b)(1)] 20

22 Statistician Method  This method has been generally ignored by covered entities.  Who prefer a safe harbor approach with “safe” being the operative word.  Consider the Statistician alternative as too complicated. 21

23 Statistician Method: Quintiles Experience  An expert statistician calculated the statistical likelihood of re-identification IF all 18 safe harbor identifiers were removed, that is, the “de- identification probability.”  Then, the statistician calculated the likelihood of re-identification if certain dates of service of medical or pharmacy claims were retained  And rather than age or year of birth, which is allowed in the safe harbor, the month and year of birth was included. 22

24 Statistician’s Opinion  This calculated number, the “de-identification probability” served as a benchmark of a “very small risk of re-identification” against which the statistician method would be compared. 23

25 Analysis: Comparison of Both Methods To ensure the statistical likelihood of re-identification was comparable to that of the calculated safe harbor benchmark, the following data fields were made stricter than as permitted by the safe harbor:  For all patients older than 85 years of age (rather than 90), the year of their birth modified to make them all 85 years old.  All five-digit patient zip codes truncated to first 3 digits and further merged so that no resulting 3 digit code has a total population of less than 200,

26 Factors Considered by Statistician In the analysis, the statistician pointed out the obvious:  The de-identified data received is conveyed under a confidentiality agreement, which specifically prohibits re- identification or further disclosure of the data except in statistically aggregated form.  The database is maintained on a physically and technically secure, password-protected server. 25

27 Statistician’s Opinion “Applying generally accepted statistical and scientific principles and methods for rendering information not individually identifiable,... I conclude that the risk is very small that the information... could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information.... In practice the actual reidentification probabilities are much, much lower... arguably de minimis.” “Applying generally accepted statistical and scientific principles and methods for rendering information not individually identifiable,... I conclude that the risk is very small that the information... could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information.... In practice the actual reidentification probabilities are much, much lower... arguably de minimis.” 26

28 Statistician Method  It is clear that most persons who have reviewed the Privacy Rule have failed to appreciate the significance of the statistician opinion to de- identification, and, instead, have focused almost exclusively on the "safe harbor."  In particular, many have failed to understand the importance of the "restricted access" as it relates to the statistician opinion approach to de-identification. 27

29 Ensuring HIPAA Compliance Data DataWarehouse Warehouse Data Encryption Process Patient identifiable electronic healthcare claims (standard health claims data fields) De-identified data All data handled is de-identified using a unique patient identifier that is irreversibly encrypted. * zip = 3 digit ** DOB = modified Upon completion of the de-identification process a unique patient identifier is created, which is irreversibly encrypted. 28

30 Core Data Elements Pharmacy Data Medical Data Jan ‘98 - to date July ‘98 - to date RX Pharmacy Data (NCPDP) RX Pharmacy Data (NCPDP)  Anonymous Patient ID  Patient Age & Gender  Date Written  Date Filled  NDC Code  Quantity Dispensed  Days Supply  Refill Flag  Prescribing Physician  Pharmacy  Payor Type^  Anonymous Patient ID  Patient Age & Gender  Date Written  Date Filled  NDC Code  Quantity Dispensed  Days Supply  Refill Flag  Prescribing Physician  Pharmacy  Payor Type^ MX Provider Data (HCFA 1500) MX Provider Data (HCFA 1500)  Anonymous Patient ID  Patient Age & Gender  Diagnosis Codes (ICD9)  Procedure Codes (CPT)  Service Dates  Physician/Provider ID  Location of Care  Payor Type  Anonymous Patient ID  Patient Age & Gender  Diagnosis Codes (ICD9)  Procedure Codes (CPT)  Service Dates  Physician/Provider ID  Location of Care  Payor Type HX Facility Data (UB-92) HX Facility Data (UB-92)  Anonymous Patient ID  Patient Age & Gender  Diagnosis Codes (ICD9)  Procedure Codes (CPT)  DRG  Admit Date  Discharge Date  Physician/Provider ID  Location of Care  Payor Type  Anonymous Patient ID  Patient Age & Gender  Diagnosis Codes (ICD9)  Procedure Codes (CPT)  DRG  Admit Date  Discharge Date  Physician/Provider ID  Location of Care  Payor Type ^Note: Payor Type not available on all records 29

31 Physician Demographics  Specialty  Region  Number of years in practice  Prescribing volume  Type of practice  Number of HMO / PPO / IPA affiliations  % patient volume by insurance type  Physician race  Physician age 30

32 Patient Characteristics  Location of contact  Height and weight  Age  Gender  Race  Blood pressure  Cholesterol levels (total, HDL, LDL, triglycerides)  Insurance type  Physician reimbursement method (fee-for-service vs. capitation)  Smoker or non-smoker 31

33 Disease Entities  Visits (with and without drugs)  Visits per physician per year  Total patients seeking treatment  Newly diagnosed patients  Visit type (first vs. subsequent)  Referrals and referring specialty  Severity of condition  Tests ordered or completed during visit  Existing medical conditions not treated  Number of times seen and days since last visit  Number of patient drug requests for condition 32

34 Treatment Regimens  Dosage form, strength and signa  Formulary impact  Quantity prescribed and number of refills (mean and frequency)  Weighted diagnosis value  Dispensing instructions  Occurrences per physician per year  Therapy type:  New  First-line versus adjunct therapy  Drug replacement and reason  Continued 33

35 Treatment Regimens  Desired action  Concomitant drugs (to treat same diagnosis)  Concurrent drugs (regardless of diagnosis)  Drug issuance  Sample days of therapy (mean and frequency)  Prescribed days of therapy (mean and frequency)  Daily average consumption (DACON)  Non-drug therapy 34

36 Limited Data Set (LDS)

37 HHS’ Solution: Limited Data Set  For research, public health, or health care operations purposes  Authorization not required  A limited data use agreement must be in place between the covered entity and the recipient of limited data set (LDS) [45 CFR § (e)] “Data Use Agreements would only be needed for those public health, research, or health care operation uses and disclosures that are not otherwise permitted by federal or state laws.” [See Draft (05/27/03) DHHS Policy and Procedure Manual “De- Identification Policy d11”] 36

38 LDS = Still PHI  Regarded as PHI, that is, not de-identified data and, therefore subject to requirements for protection of PHI such as  Prohibits re-identification or any attempt to contact individuals by recipient  BUT re-identification code permitted for covered entity  Subject to minimum necessary standards  BUT no accounting of disclosures or IRB approval 37

39 Limited Data Set Specifications  May be useful for records-based research such as epidemiological and other population research  But may NOT be useful for patient recruitment  Because re-identification of individuals or attempt to contact individuals is prohibited by a third party even if by Researcher (without IRB or internal privacy board approval) unless the contact is made by the Covered Entity or the Covered Entity’s Workforce. 38

40 LDS: Remove 16 Identifiers  Name  Postal address information (other than city, state, zip code)  Telephone number  Fax number  address  Social Security Number  Medical record / prescription numbers  Health plan beneficiary numbers  Account numbers  Certificate / license numbers  Vehicle identity / serial numbers  Device numbers  Web URL  IP address  Biometric identifiers (e.g., fingerprints, retinal scans)  Full face similar photographic images 39 [45 CFR § (e)(2)]

41 LDS: Retain Indirect Identifiers  Five-digit zip code  Dates of service (e.g., admission / discharge)  Dates of birth and death  Geographic subdivision (e.g., state, county, city, precinct), but not street address 40

42 Statistical Method for Dummies “Limited Data Set”... the Statistician Method made easy. 41

43 Preemption of State Laws on De- identification Standards for Health Information

44 Preemption of De-identification Standards - A View  HIPAA Statute and privacy regulation  Preemption of state law only if  The provision of state law relates to the privacy of individually identifiable health information  HIPAA Statute § 1178 AND 45 CFR §§

45 Preemption of State Law: HIPAA Statute  Health information considered identifiable and, therefore, subject to all requirements of rule ONLY if “reasonable basis to believe that the information can be used to identify the individual.”  Exception to preemption - when states can assert contrary and more stringent definition of “individually identifiable health information”  But exception analysis does not apply to de-identified data 44

46 Preemption: Deidentification Standards  Thus, states would be preempted from enforcing a standard for deidentification that exceeds the “reasonable basis” definition of individually identifiable health information as established in HIPAA statute.  Note: in response to Quintiles’ written request, HHS responded by revising preemption section of the Rule to refer to “individually identifiable” health information rather than merely health information. 45

47 Privacy Cases & Controversies: De-identified Health Databases

48 U.S. Controversy  Quintiles Transnational Corp. v. WebMD  No demonstrable violation of HIPAA or other privacy law by transmission and aggregation of deidentified health data  Inhibits additional state regulation of national electronic data system  Order of Judge Terrence Boyle.  Re de-identified data: “the Dormant Commerce Clause prevents the individual states from regulating the interstate transmission of data.”  [No. 5:01-CV-180-BO(3), U.S. EDNC Western Division] 47

49 UK Controversy  Regina v. Department of Health, Ex Parte Source Informatics Ltd.  Regina v. Department of Health, Ex Parte Source Informatics Ltd. [Judge Latham, 4 All ER 185, May 29, 1999; Case No. CO\4490\97, Queen’s Bench Division]  “The Protection [and] Use of Health Information.”  Judge Latham dismissed applicants' application for a Declaration that a policy document issued in March 1996 by the Department of Health “The Protection [and] Use of Health Information.” 48

50 UK: Source Informatics: Overturned on Appeal  Court of Appeals: Simon Brown, Aldous and Schiemann LJJ: 21 December 1999  Where a patient's identity was protected, it would not be a breach of confidence for general practitioners and pharmacists to disclose to a third party, without the patient's consent, the information contained in the patient's prescription form for marketing research purposes. 49

51 UK Health and Social Care Bill: Clause 65  Department of Health included language in the Health and Social Care Bill that would have essentially reinstated the lower court’s opinion (Judge Latham’s)  After heavy lobbying in the House of Lords against Clause 65, the language was defeated. 50

52 The key is... Safeguarding protected health information by encouraging use of federal standards for de- identification of health data for clinical research. Conclusion


Download ppt "DIMACS Working Group on Privacy / Confidentiality of Health Data Rutgers University Center Piscataway, New Jersey December 10-12, 2003."

Similar presentations


Ads by Google