De-identifying Pathology Reports for Pathology Informatics

Presentation transcript:

De-identifying Pathology Reports for Pathology Informatics
James Gardner, Li Xiong, Department of Math and Computer Science
Fusheng Wang, Andrew Post, Joel Saltz, Center for Comprehensive Informatics

Introduction
The HIPAA Privacy Rule regulates the use and disclosure of Protected Health Information (PHI).
De-identification of pathology reports is critical to enabling secondary use of medical records for research.
HIDE (Health Information DE-identification) is an open-source de-identification tool based on advanced statistical de-identification technologies.
While statistical learning based techniques have shown promising results for de-identification, few such systems are publicly available.
This work presents a comprehensive study of the effects of different feature sets, and the potential impact of sampling, on extracting PHI from pathology reports.

HIPAA Identifiers
Either these identifiers have to be removed:
  1. Names
  2. All geographical subdivisions smaller than a state
  3. All elements of dates (except year)
  4. Phone numbers
  5. Fax numbers
  6. Electronic mail addresses
  7. Social Security numbers
  8. Medical record numbers
  9. Health plan beneficiary numbers
  10. Account numbers
  11. Certificate/license numbers
  12. Vehicle identifiers and serial numbers
  13. Device identifiers and serial numbers
  14. Web Universal Resource Locators (URLs)
  15. Internet Protocol (IP) address numbers
  16. Biometric identifiers, including finger and voice prints
  17. Full face photographic images or comparable images
  18. Any other unique identifying number, characteristic, or code
or, based on the opinion of a qualified statistical expert, the risk of identifying an individual must be very small.

HIDE Overview
Utilizes a state-of-the-art named entity recognition technique, Conditional Random Fields (CRFs), for extracting PHI
  Previous tools such as DE-ID and HMS Scrubber use rule-based approaches, which are labor intensive and not portable
Provides flexible de-identification options, including full de-identification and state-of-the-art statistical de-identification
  Previous tools allow only simple removal or substitution of the PHI
Provides an easy-to-use web-based interface that utilizes the latest web technologies
Integrated with caTIES and caTissue (in progress)

PHI Extraction
Utilizes a state-of-the-art NLP technique, Conditional Random Fields
  High accuracy, easy to train, portable
Combines different feature sets and sampling techniques
  Feature sets: dictionary, affix, regular expression, and context
Can use default models or custom-trained models
Web interface for annotating and training custom models
  A set of reports is loaded and manually labeled
  The labeled documents are used to train a model that automatically de-identifies new reports
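The four feature sets named on this slide can be illustrated with a small per-token feature function of the kind typically fed to a CRF tagger. This is only a sketch under stated assumptions: the dictionary contents, the regular expressions, the feature names, and the function `token_features` are all hypothetical and are not taken from HIDE itself.

```python
import re

# Hypothetical dictionary of known first names; a real system would load a large lexicon.
NAME_DICTIONARY = {"james", "mary", "john"}

# Hypothetical regular-expression features for common PHI shapes.
REGEX_FEATURES = {
    "looks_like_date": re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}$"),
    "looks_like_phone": re.compile(r"^\d{3}-\d{3}-\d{4}$"),
    "all_digits": re.compile(r"^\d+$"),
}


def token_features(tokens, i):
    """Build dictionary, affix, regular-expression, and context features for tokens[i]."""
    token = tokens[i]
    lower = token.lower()
    features = {
        # Dictionary feature: is the token a known name?
        "in_name_dictionary": lower in NAME_DICTIONARY,
        # Affix features: short prefix and suffix of the token.
        "prefix3": lower[:3],
        "suffix3": lower[-3:],
        # Context features: neighbouring words, with boundary markers.
        "prev_word": tokens[i - 1].lower() if i > 0 else "<START>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
    }
    # Regular-expression features: one boolean per pattern.
    for name, pattern in REGEX_FEATURES.items():
        features[name] = bool(pattern.match(token))
    return features
```

One feature dictionary per token, paired with a manually assigned label (for example NAME, DATE, or O for non-PHI), is the usual shape of training data for a sequence labeler.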

HIDE: De-identification Options
Full de-identification
  Safe harbor: all 18 HIPAA identifiers removed or substituted
Partial de-identification
  Limited data set: all direct HIPAA identifiers removed or substituted (except dates and address elements other than street/P.O. box)
Configurable de-identification
  A configurable set of identifiers removed or substituted
Statistical de-identification
  Advanced anonymization that guarantees rigorous, statistically acceptable privacy while preserving the utility of the data
  An example of utility from statistical de-id results?
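The difference between the full, partial, and configurable options comes down to which identifier types are substituted once PHI spans have been extracted. A minimal sketch, assuming PHI is available as (start, end, label) character spans; the function `deidentify`, the label names, and the bracketed placeholder format are hypothetical and not HIDE's actual output format.

```python
def deidentify(text, spans, remove_labels):
    """Replace each tagged span whose label is in remove_labels with a [LABEL] placeholder.

    spans is a list of (start, end, label) character offsets into text.
    Spans are processed from the end of the text backwards so that earlier
    offsets stay valid after each substitution.
    """
    result = text
    for start, end, label in sorted(spans, reverse=True):
        if label in remove_labels:
            result = result[:start] + "[" + label + "]" + result[end:]
    return result
```

Passing every label gives behaviour in the spirit of full de-identification, while leaving a label such as DATE out of `remove_labels` keeps those values in the text, as a limited data set would.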

Statistical De-identification Example
De-identification satisfying k-anonymity (k=2): every record is indistinguishable from the others within a group of at least k records
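The k-anonymity condition can be checked mechanically: group the records by their quasi-identifier values and confirm that every group has at least k members. The sketch below assumes records are plain dictionaries; the function name `is_k_anonymous` and the choice of quasi-identifier columns are illustrative, not part of HIDE.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values occurs at least k times."""
    group_sizes = Counter(
        tuple(record[column] for column in quasi_identifiers) for record in records
    )
    return all(size >= k for size in group_sizes.values())
```

For example, four records generalized to two (age range, ZIP prefix) groups of two records each satisfy the check for k=2 but not for k=3.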

Study 1: PHI Extraction on Emory Pathology Reports
The CRF classifier with a good feature set achieves good attribute extraction accuracy (100 reports, 10-fold cross validation)
Precision: true positives over the sum of true positives and false positives
Recall (sensitivity): true positives over total actual positives
F1: combination of the two, 2*precision*recall/(precision+recall)
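The three metrics defined on this slide translate directly into code. A minimal sketch, where the function name and the convention of returning 0.0 for an undefined ratio are assumptions made for illustration:

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall, and F1 from raw counts.

    Total actual positives are true_positives + false_negatives.
    Undefined ratios (zero denominators) are reported as 0.0 by convention here.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1
```

Because F1 is the harmonic mean of precision and recall, it is pulled toward the lower of the two values rather than sitting at their arithmetic midpoint.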

Study 2: PHI Extraction on i2b2 Reports
Based on 669 discharge summaries, 10-fold cross validation
Good precision and recall for most individual PHI identifiers
Good overall precision and recall for PHI extraction
i2b2: Informatics for Integrating Biology and the Bedside
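In 10-fold cross validation the documents are divided into ten folds, and each fold is held out once for testing while the other nine are used for training. The sketch below shows one simple way to produce such splits; the round-robin assignment and the function name `k_fold_indices` are assumptions, since the slides do not say how the folds were formed or whether the documents were shuffled.

```python
def k_fold_indices(n_items, n_folds=10):
    """Yield (train_indices, test_indices) pairs for n_folds-fold cross validation.

    Items are assigned to folds round-robin by index; shuffle the items
    beforehand if their original order is meaningful.
    """
    folds = [list(range(fold, n_items, n_folds)) for fold in range(n_folds)]
    for held_out in range(n_folds):
        test = folds[held_out]
        train = [
            index
            for fold_number, fold in enumerate(folds)
            if fold_number != held_out
            for index in fold
        ]
        yield train, test
```

With 669 documents this gives nine folds of 67 documents and one of 66, and every document is used for testing exactly once.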

Study 3: Impact of Different Feature Sets
The context features (the previous word, the next word, etc.) are the most important
Regular expression features
Affix features: prefix and suffix
Dictionary features (phrase or token)
Context features: previous and next words, and occurrence counts
Dictionary (d), affix (a), regular expression (r), and context (c) features are in order of increasing importance for statistical CRF-based PHI extraction

Integrating HIDE with caTIES
caTIES (cancer Text Information Extraction System) provides tools for de-identification and automated coding of free-text pathology reports
caTIES provides de-id extensibility through implementations of its CaTIES_DeIdentifier interface
  HIDEDeIdentifier implements this interface and calls the HIDE client API
Added a HIDE de-id option to the caTIES installer
HIDE has been bundled with caTIES since release v3.7 (May 2010)

Integrating HIDE with caTissue (in Progress)
caTissue uses caTIES V2.x, refactored into caTissue's workflow
HIDE integration with caTissue is similar to the integration with caTIES
Implementation and evaluation are under way
Goal: integration of pathology reports into the caTissue installation at Winship Cancer Institute at Emory University

Ongoing Development
Continue development on HIDE/caTissue integration
Usability improvement: simplified installation process
System improvements
  Efficiency and scalability of the system
  Support for multiple file formats
  Additional statistical de-identification options

HIDE Demo http://www.mathcs.emory.edu/hide/demos

Thank you
http://www.mathcs.emory.edu/hide
Li Xiong (lxiong@mathcs.emory.edu)