2 Privacy in Organizational Processes Patient medical billsPatientinformationHospitalInsurance CompanyDrug CompanyAggregate anonymized patient informationAdvertisingComplex Processwithin a HospitalPUBLICPatient2
3 Transfer and Use Between Organizations Achieve organizational purpose while respecting privacy expectations in the transfer and use of personal information (individual and aggregate) within and across organizational boundaries
4 We Use the Health Data for Research in Many Aspects
5 Two Swords in Health Research Informed Consent FormDe-Identification目前臨床研究兩種方式，可以說服 IRB 和社會大衆，它們是有盡到人身和資料保護的方法第一種方式，就是受試者簽署 ICF，讓受試者(被研究者)預先或有條件的放棄他們本應擁有權力的隱私。第二種方式，是讓研究者無法透過資料的比對獲得受試者個人的病歷資料，進而傷害受試者的隱私。
6 HIPAA Background Commercial Healthcare Insurance Pharmaceutical Benefit Maker (Intruder)Health Maintain Organization holding hospitals’ stock share or M&A hospitalsResearch Fraud and Scandal of Clinical TrialsWho can market our medical record data?
7 Health Insurance Portability and Accountability Act HIPPA, enacted by US Congress in 1996Title I: Health Care Access, Portability, and RenewabilityTitle II: Preventing Health Care Fraud and Abuse; Administrative Simplification; Medical Liability ReformPrivacy RuleTransactions and Code Sets RuleSecurity RuleUnique Identifiers RuleEnforcement RuleHITECH Act: Privacy Requirements
11 HIPAA Privacy Rule and Research with De-identified Information (1) (1) Names (2) All geographic subdivisions smaller than a State, including: street, city, county, precinct, zip code - the first three digits of the zip code can be used if this geocode includes more than 20,000 people. If such geocode is less than 20,000 persons, "000" must be used as the zip code. (3) All elements of dates (except year) related to an individual, including birth date, admission date, discharge date, date of death. For individuals > 89 years of age, year of birth cannot be used - all elements must be aggregated into a category of 90 and older.
12 HIPAA Privacy Rule and Research with De-identified Information (2) (4) Telephone numbers (5) FAX numbers (6) Electronic mail addresses (7) SSN (8) Medical record numbers (9) Health plan beneficiary numbers (10) Account numbers (11) Certificate/license numbers(12) Vehicle identifiers and serial numbers, including license plates (13) Device identifiers and serial numbers (14) Web universal resource locators (URLs) (15) Internet protocol (IP) address (16) Biometric identifiers, including finger and voice prints (17) Full face photos, and comparable images (18) Any unique identifying number, characteristic or code and
13 Following the HIPAA Regulation Is it really a safe procedure to “de-identification” ? ( Yes or No )Are you sure that researchers can proceed their research after deleting these tags or codes ( Yes or No )
18 ExampleTo track those subjects of cervical cancer by comparing the ICD9 and SCC data ( Date, Tag and Result )Age and Location (Place) are very important influencing factors. Will this data-link-decoding spoil your research?
19 Categories of variables in a data set Directly Identifying VariablesQuasi-identifiersSensitive variablesSensitive Variable : like the financial or health status of an individual.How many sensitive variables are allowed in a limited database ?胡自強事件，選舉時病歷被公佈，來阻止胡自強當選一事
23 Quasi-Identifier Date of Birth (DoB) DoB – Month and Year Day, Month and Year of Admission, Discharge or OperationGenderInitialsAddressCityRegionPostal Code
24 The Difference Anonymous Confidential De-identified The IRB often finds that the terms anonymous, confidential, and de-identified are used incorrectly. These terms are described below as they relate to an individual’s participation in the research and the way that their data are collected and maintained for analysis.
25 AnonymousIt is impossible to know whether or not an individual participated in the study directly.A study participant who is a member of a minority ethnic group might be identifiable from even a large data pool.Information regarding other unique individual characteristics (indirect identifiers) might make it possible to identify an individual from a pool of dataset.
26 Example ATaiwan Health Insurance Claim Data Set for Physician Behavior of Prescription in Commercial Use (PBMs know which physician prescribed their medications)
27 ConfidentialThe research team is obligated to protect the data from disclosure outside the research according to the terms of the research protocol and the informed consent document.In order to protect against accidental disclosure, the subject’s name or other identifiers should be stored separately from their research data and replaced with a unique code to create a new identity for the subject.Note that coded data are not anonymous.
28 Example BUse distrust or conflict mechanism between different individuals or branchesCongressmen and OfficersAccounting and Financial BranchMarket and SaleIRB and Researcher
29 De-identifiedWhen any direct or indirect identifiers or codes linking the data to the individual subject’s identity are destroyed.Data have been de-identified. There were no risk to re-identify. However, in the research aspect, there were a lot of details and facts would be ignored and loosed.
33 Re-IdentificationRe-Link with some identifier or quasi-identifier to access original identification.Evaluation the risk of re-identification is an attitude or consensus for a reviewer.
34 Limited or De-Identified Contract or not ?(Non-Disclosure Agreement)Regulation or not ?Expiated or Full Board ?Preservation or Time Period Available ?IndefiniteWith Date to be ExpiredDatabase Access Committee ?Database Administrator ?
35 A Perfect Data Security Management & Infrastructure HeuristicsA Perfect Data Security Management & InfrastructureIRB Role and Review FAQ
37 Are subjects identifiable by their age, gender, and residence ? 原住民、少數族群、特殊疾病，能夠透過不同資料庫的比對，讓受試者或被研究者的個人資料重新再被連結。某些研究需要年紀、性別和居住地的資料，年紀可以限制在一定的 Interval，如 10年、5年為一個單位，ZIPCode 要重新編碼
38 Can a person be re-identified from their diagnosis code ? Many data sets also include diagnosis codes (for example, ICD-10 codes).Hospital medical record abstract data is almost publicly available.A set of diagnosis codes can make an individual very unique.Some of the records in the disclosed data set have diagnosis codes for rare and visible diseases/conditionsIcd10 比 ICD 9 更容易 Identifiable
39 Can a claim database be used for re-identification ? A lot of literature makes the point that claim database can be used for re-identification. However, the accuracy of this statement will depend on your jurisdiction.Other sources of public information they can still be very useful for re-identification.
40 Can individuals be re-identified from disease maps ?
41 Do these maps risk identifying any of the individuals ? There are three questions that need to be answered to determine the risk:Is the disease visible ?Is the disease rare in the geography ?If I re-identify an individual, will I learn something new about them ?
42 Can postal codes re-identify individuals ? 5 codes are the smallest geographic unit that is used by Taiwan post to deliver mail. In a health care context they are the most common geographic unit because that is what patients know and are able to provide.The postal code is the only demographic information that is being disclosed in this data set.The smallest postal codes in all provinces and territories have very few people living there. Any information about the postal code would pertain to a very small number of individuals.
43 Definition of identifiable dataset if a person can find their record(s) in the dataset Who is most sensitive to a data de-identification ? (Individual or reviewer)Best de-identification of dataset is that a individual cannot point out his/her record.
44 How can I de-identify longitudinal records ? Time Series Record is just a DNA (unique)-sequential dataset.It can easily re-identified.It should be considered a limited database.Intervals are less likely to be unique than actual dates.
45 How can I safely release data to multiple researchers? Re-numberingRe-rankingDifferent SamplingShuffle your data before disclosureStrong dis-incentive to match the two data setsChange (Say 0.4 to 40%, English style to metric)
46 Is sampling sufficient to de-identify a data set ? Not only statistical significance but also risk re-identification would be taken into consideration.Intruder may not know their target within disclosure databaseSampling fraction if it is higher ? (Similar as public database)
47 Is there a secondary use market for health information ? Yes or NoPharmaceutical Benefit MakerPrivate Health Insurance ServiceOther service ( Women and Children)
48 Should de-identified data go through a research ethics review ? In the first approach the IRB form has a checkbox question asking the investigator if the data is de- identified. (UM forms)If the investigator checks that box then the IRB does not review the protocol and it is automatically approved.The reasoning is that it is de-identified data and therefore there is no requirement to review the protocol.
49 Should IRBs decide if a data set is de-identified ? Yes or No ? (No)We don’t have a privacy expert.Whether a particular data set is identifiable, and resolving any re-identification risk concerns is iterative.If these interactions are attempted they can be very slow and consequently frustrating.IRB 沒那麼厲害，管天管地又要管人…
50 Should we de-identify if technology is moving so fast ? Re-Identification technology moves faster than De-Identification道高一尺，魔高一丈Educations for data security is cheaper than new technology.High technology stands for high risk
51 The difference between consenters and non-consenters Secondary use of previous dataset which is contributed from previous consenter. (對原提供者有益)Should the data of drop out consenter would be included (Consenter 是否認同提供)?Were there any words of consent to use his/her personal information found in the ICF. (Usually, to agree specimen but no personal Information)Non-consenter would be reviewed by a data access center or the privacy expert. (沒有同意資料使用)資料使用必須要對原提供者是有益的，如果是有害的呢使用者是否認同後續的提供，如果他不贊同呢，在 ICF 內他無法表達，因為沒有不贊同的選項通常他只有同意標本的使用，但是沒有同意他的個資，所以未來在 ICF上，PI 如果有資料使用的可能，記住要加以註明。沒有同意個人資料使用的，一定要經過 Privacy Expert 或資料安全管理中心的 Review.
52 The five levels of identifiability Level 1. The full data set as is.Level 2. The names are replaced by fake names, the health insurance number is replaced with a fake number, and the street address field is removed altogether.Level 3. The data set at Level 2 also has the postal code generalized from six characters to five characters. The risk at Level 3 is the same as Level 2, but the organization believes it has de-identified the data and discloses it. Therefore, the organization is exposed.有其它的資料安全的分類方式Level 只有改名字
53 The five levels of identifiability Level 4. The data set at Level 3 is further modified by replacing the 5 digit postal code with a single character postal code, the date of birth is replaced by age, and the date of visit is replaced by the month of the visit. A re-identification risk assessment is then performed on this data set and the risk was found to be below a pre-specified threshold.Level 5. The number of individuals with a sexually transmitted disease.
56 What are the quasi-identifiers that I should use for managing risk What are the quasi-identifiers that I should use for managing risk ? (Neighborhood)Address and telephone information about the target individualHousehold and dwelling information (number of children, value of property, type of property)Key dates (births, deaths, weddings, admissions, discharges)Visible characteristics: gender, race, ethnicity, language spoken at home, weight, height, physical disabilitiesProfession除了 HIPAA 和前面所例出來的Quasi-Identifier那些可能是 Sensitive Identifier, 也就是可能在道德和法律上應該注意。電訪的內容，或者訪問的情境。…….
57 What are the quasi-identifiers that I should use for managing prosecutor risk ? (Ex-Spouse) The same things that a neighbor would knowBasic medical history (allergies, chronic diseases)Income, Years of schooling其它的如前妻，前夫的問題，包含…
58 What de-identification software tools are there ? The PARAT tool from Privacy Analytics implements comprehensive risk management for three types of identity disclosure risk.mu-Argus, developed by the Netherlands national statistical agency.The Cornell Anonymization Toolkit (CAT) implements a k-anonymity algorithm.The University of Texas at Dallas Anonymization Toolbox推薦一些 De-identification 的資訊產品。但前面我已經表達我的意見了，不要迷信這種東東…
59 Who cares about my medical records ? (Finance) Some medical records have financial information in them (e.g. information used for billing purposes)For example, date of birth, address, and mother's maiden name. It is used as pin or password frequently.Even if medical records do not have information in them that is suitable for financial fraud, if your record has information about your health insurance then it can be very valuable.醫療費用，使用項目，可以評估病人的經濟情況，或許衣服的穿著不會完全反應他的財富，但是醫療的使用，是比較好的指標。例如自費癌症藥品的使用。密碼的來源詐騙集團的來源，藥物王碌仙的
60 Who cares about my medical records ? (Media) If you ever become of interest to the media and they want to do a story on you or your family, then reporters may be interested in re-identifying records about you.Medical records are a good source of revenue if you are in the extortion business.Even if there is no financial impact, some people feel violated if there is a breach of privacy of their medical information and change their behavior by adopting privacy protective behaviors.There are a number of attempts to make your health information publicly (or at least very widely) available.辜家的事件….認父親，爭遺產的 ….
61 Other researchers’ questions What genes predict better prognosis or response to treatment?Can study these questions using cancer registries, claims data如果你的employee 癌症的資料，在你的手上。61
62 Ethical questionsMay you share (coded genome-wide) data with other researchers?May you use (coded genome-wide) data for additional research without consent?May you do whole genome sequencing on existing coded samples without consent?三個答案都是 (不可以)62
63 Ethical concernsMay you follow participants as prospective cohort using medical records without consent?With identifiers can link to cancer registry, Medicare claims databases第四個是醫療工作的大忌，尤其是癌症或者是健保資料庫63
64 Federal regulations on human subjects research IRB reviewInformed consentNot apply ifResearcher not interact with participant ANDInformation not “identifiable”研究者沒有和參加者互動，或者是這些資訊己經去連結64
65 What are de-identified data? 18 HIPAA specific identifiersOvert identifiers, including SSN, medical record numberGeographic data more precise than first 3 digits of zip codeDates except for yearBiometric identifiersAny other unique identifying characteristicStricter than Common RuleNeed these data elements to carry out researchLimited Data Set allows dates and zip code65
66 Ethical issues Informed consent When giving broad permission for future research, do donors appreciateWhole genome sequencing?Very sensitive downstream research?什麼都可以的 同意書，萬用的同意書我的是我的，你的也是我的。66
67 Sensitive research projects Some donors may object to researchGenetics of antisocial behaviorHuman evolutionBeliefs about group ancestry不屬於 IRB 的查審範圍，精神病，進化，族群67
68 Ethical issues Privacy and confidentiality Heightened concerns, particularly about whole genome sequencing不認為可以審查的部份68
69 Special concerns about genetic confidentiality Information considered particularly sensitiveAbout relatives and groupsHighly predictive of future illness“Future diaries”Annas, JAMA 1993.69
70 Genetic Information Nondiscrimination Act (2008) Remove barriers to genetic testingHealth insurers may notUse genetic information to set eligibility or premiumsRequire or request genetic testing不可以用基因檢查來作的研究70
71 Genetic Information Nondiscrimination Act (2008) Employers may notUse genetic information in employment or promotion decisionsRequire or request genetic testing71
72 Limitations of GINAAfter job offer, employer may request medical recordsImpractical to delete genetic informationNot apply to disability, life, long-term care insuranceAdverse selection if individual rating72
73 HIPAA fails to protect privacy Weak security protectionsApplies only to “covered entities”Protection does not follow information technology advancingIRBs not oversee once research is initiated73