Strategic Health IT Advanced Research Projects (SHARP) Area 4: Secondary Use of EHR Data Project 3: High-Throughput Phenotyping June 30, 2011 Jyoti Pathak,


1 Strategic Health IT Advanced Research Projects (SHARP) Area 4: Secondary Use of EHR Data Project 3: High-Throughput Phenotyping June 30, 2011 Jyoti Pathak, PhD Assistant Professor of Biomedical Informatics Department of Health Sciences Research

2 Project 3: Collaborators and Acknowledgments  CDISC (Clinical Data Interchange Standards Consortium) –Rebecca Kush, Landen Bain, Mark Arratoon  Centerphase Solutions –Gary Lubin, Jeff Tarlowe  Harvard University/MIT –Guergana Savova, Margarita Sordo, Peter Szolovits  IBM T.J. Watson Research Labs –Marshall Schor  Intermountain Healthcare/University of Utah –Susan Welch, Herman Post, Darin Wilcox, Peter Haug  Mayo Clinic –Cui Tao, Lacey Hart, Erin Martin, Sridhar Dwarkanath, Calvin Beebe, Kent Bailey, Kevin Bruce, Mike Conway (UCSD)

3 Outline  Background  On-going projects and updates  Proposed project ideas for Year 2  Productivity till date  Q & A

4 The Big Question…  The era of Genome-Wide Association Studies (GWAS) has arrived –Genotyping cost is asymptoting to free [Altman et al.] –Most (all?) published GWAS are done on carefully selected and uniformly characterized patient populations –Time consuming  Clinical Phenotyping, on the other hand, is lacking –Slow-throughput –Costly and time consuming  How “good” are EMRs (with inconsistencies and biases) as a source for phenotypes?

5 Why is this important now?  Bio-repositories are becoming popular –Linking biospecimens to personal health data  Population-based studies for genetic and environmental conditions and contributions to disease etiology –Often limited in scope or population diversity  Clinical trials eligibility –Cohort identification is always a bottleneck  Quality metrics and HITECH Act  Large-scale prospective cohort studies could be facilitated by availability of complete, standardized, and unbiased data from EMRs

6

7 Pros and Cons of EMR Data for Phenotyping  We have a LOT of information about subjects –Demographics, labs, meds, procedures… –Team diagnoses as opposed to a diagnosis based on a single person’s opinion –Potential for more reliable diagnoses –Identification of otherwise latent population differences  Possible issues with using EMR data for phenotyping –Non-standardized, heterogeneous, unstructured data –Measured (e.g., demographics) vs. un-measured (e.g., socio-economic status) population differences –Hospital specialization and coding practices –Population/regional market landscape

8 But…the challenges can be addressed…if we  Develop techniques for standardization and normalization of clinical data  Develop techniques for transforming and managing unstructured clinical text into structured representations  Develop techniques for resolving missing and inconsistent data  Develop a scalable, robust and flexible framework for demonstrating all of the above in a “real-world setting”

9 EMR-derived Phenotyping  Overarching goal –To develop techniques and algorithms that operate on normalized EMR data to identify cohorts of potentially eligible subjects on the basis of disease, symptoms, or related findings  Phenotyping (from our perspective) –Inclusion and exclusion criteria for cohort identification –Numerator and denominator criteria for clinical quality metrics –Trigger criteria for clinical decision support –…

10 EMR-based Phenotype Algorithms  Typical components –Billing and diagnoses codes –Procedure codes –Labs –Medications –Phenotype-specific co-variates (e.g., Demographics, Vitals, Smoking Status, CASI scores) –Pathology –Imaging?  Organized into inclusion and exclusion criteria  Experience from eMERGE (http://www.gwas.net) –Electronic Medical Records and Genomics Network

11 EMR-based Phenotype Algorithms  Iteratively refine case definitions through partial manual review to achieve ~PPV ≥ 95%  For controls, exclude all potentially overlapping syndromes and possible matches; iteratively refine such that ~NPV ≥ 98%
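The PPV and NPV targets on this slide can be checked against chart-review counts after each refinement round. The following is a minimal sketch, assuming manual review yields counts of true/false positives (for cases) and true/false negatives (for controls); the function names and the default thresholds as parameters are illustrative and not taken from the eMERGE tooling.

```python
def ppv(true_pos: int, false_pos: int) -> float:
    """Positive predictive value: share of algorithm-flagged cases confirmed on review."""
    flagged = true_pos + false_pos
    if flagged == 0:
        raise ValueError("no flagged cases were reviewed")
    return true_pos / flagged


def npv(true_neg: int, false_neg: int) -> float:
    """Negative predictive value: share of algorithm-selected controls confirmed on review."""
    cleared = true_neg + false_neg
    if cleared == 0:
        raise ValueError("no controls were reviewed")
    return true_neg / cleared


def meets_targets(tp: int, fp: int, tn: int, fn: int,
                  ppv_target: float = 0.95, npv_target: float = 0.98) -> bool:
    """True when both slide targets (PPV >= 95%, NPV >= 98%) are met."""
    return ppv(tp, fp) >= ppv_target and npv(tn, fn) >= npv_target
```

If `meets_targets` returns False, the case or control definition would be refined and a fresh sample reviewed.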

12 Example: Type 2 Diabetes (cases)

13 Challenges  Algorithm design –Non-trivial; requires significant expert involvement –Highly iterative process –Time-consuming manual chart reviews –Representation of “phenotypic logic”  Data access and representation –Lack of unified vocabularies, data elements, and value sets –Questionable reliability of ICD & CPT codes (e.g., omit codes that don’t pay well, billing the wrong code since it is easier to find) –Natural Language Processing needs  And many more…

14 Outline  Background  On-going projects and updates  Proposed projects for Year 2  Productivity till date  Q & A

15 Current HTP Project Themes  Identification of Clinical Element Models  Phenotyping Execution Logic  Data Quality, Validation and Cost Effectiveness

16 Project Overview  Three eMERGE phenotyping algorithms as initial Use Cases –Type 2 Diabetes Mellitus (T2DM) –Peripheral Arterial Disease (PAD) –Hypothyroidism  Specified computable mappings between CEMs and algorithms  Classified phenotyping input specifications into two categories: –General EHR data requirements (Examples: demographics, diagnoses) –Phenotype-specific EHR data (Example: Ankle-brachial index for PAD)  Proposed semantic types of the input specifications

17 Semantic Classification Types  Demographic data (e.g., Gender, Race, Age, etc.)  Physical measurements (e.g., Weight, Height, BMI, etc.)  Diagnosis (ICD codes, SNOMED CT annotations from problem list, administrative coding workflows, clinical notes, etc.)  Procedure (CPT codes, ICD procedure codes)  Medication  Laboratory

18 General Models for Scalability  Diagnosis –AdministrativeDiagnosisCode: billing purposes –ClinicalAssertedDiagnosisCode: problem list, clinical notes, etc  Medication –Prescribed/Ordered –Dispensed –Administered  Procedure –AdministrativeProcedureCode: CPT code, ICD 9 code for inpatient.  Laboratory
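One way to picture the general models on this slide is as typed records that carry their provenance. This is a hedged sketch, not the CEM definitions themselves: the enum values reuse the slide's labels, while the class and field names (`Diagnosis`, `Medication`, `patient_id`, `code`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class DiagnosisSource(Enum):
    ADMINISTRATIVE = "AdministrativeDiagnosisCode"        # billing purposes
    CLINICAL_ASSERTED = "ClinicalAssertedDiagnosisCode"   # problem list, clinical notes


class MedicationStage(Enum):
    PRESCRIBED = "Prescribed/Ordered"
    DISPENSED = "Dispensed"
    ADMINISTERED = "Administered"


@dataclass(frozen=True)
class Diagnosis:
    patient_id: str
    code: str
    source: DiagnosisSource


@dataclass(frozen=True)
class Medication:
    patient_id: str
    code: str
    stage: MedicationStage


def clinically_asserted(diagnoses: Iterable[Diagnosis]) -> List[Diagnosis]:
    """Keep only diagnoses asserted clinically rather than recorded for billing."""
    return [d for d in diagnoses if d.source is DiagnosisSource.CLINICAL_ASSERTED]
```

Keeping the source on each record lets an algorithm decide whether billing codes alone are acceptable evidence or whether a clinical assertion is required.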

19 Mapping Issues  Secondary use versus patient care meanings –History of X meaning “evidence of X prior to date Y” versus history of X statement in text documents –Diagnosis inputs often validated on ICD-9-CM codes  Non-standard aggregations –Fasting glucose test  Availability of data in EHR –Age at onset of X –Medical specialty (ankle brachial index) –Smoking history/family history (NLP/structured solutions)

20 Mapping Considerations  Algorithm inputs are abstractions of EHR content –Native content –Generalized content –Computed –Selected content  Common constraints of EHR content –Source of data, i.e., EHR application used, encounter type –Allowable codes –Temporal bounds –Relationships among separate observations
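The common constraints listed above (source, allowable codes, temporal bounds) can be read as a filter applied to EHR observations before they reach an algorithm. A minimal sketch under that reading follows; `Observation`, `InputConstraint`, and their fields are hypothetical names, and relationships among separate observations are not modelled here.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Observation:
    code: str             # e.g. a diagnosis or lab code
    observed_on: date
    encounter_type: str   # e.g. "outpatient"


@dataclass(frozen=True)
class InputConstraint:
    allowable_codes: FrozenSet[str]
    start: date                       # temporal bounds, inclusive
    end: date
    encounter_types: FrozenSet[str]   # acceptable sources of the data

    def accepts(self, obs: Observation) -> bool:
        """True when the observation satisfies every constraint."""
        return (
            obs.code in self.allowable_codes
            and self.start <= obs.observed_on <= self.end
            and obs.encounter_type in self.encounter_types
        )


def select(observations: Iterable[Observation],
           constraint: InputConstraint) -> List[Observation]:
    """Return only the observations an algorithm input is allowed to see."""
    return [o for o in observations if constraint.accepts(o)]
```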

21 Example CEM to Algorithm Map

22 Example CEM to Algorithm Map - 2

23 Current HTP Project Themes  Identification of Clinical Element Models  Phenotyping Execution Logic  Data Quality, Validation and Cost Effectiveness

24 Drools-based Phenotyping Architecture Clinical Element Database List of Patients for Specific Cases  Rule accessibility by clinicians – BPMN, decision tables, DSL; collaborative authoring  Workflow authoring by domain experts (clinicians) Domain Expert ~Analyst ~Developer Drools (Along with other technologies)

25 Drools-based Phenotyping Architecture Business Logic Clinical Element Database List of Diabetic Patients Data Access Layer Transformation Layer Inference Engine (Drools) Service for Creating Output (File, Database, etc) Transform physical representation → Normalized logical representation (Fact Model)
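The layering on this slide (data access, transformation to a fact model, inference, output service) can be sketched as four small functions. This is an illustration only: plain Python predicates stand in for the Drools inference engine, and the row layout, column names, and function names are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]
Fact = Dict[str, object]
Rule = Callable[[Fact], bool]


def fetch_rows() -> List[Row]:
    """Data access layer: stand-in for a query against the clinical element database."""
    return [
        {"PT_ID": "p1", "DM_VISITS": "3"},
        {"PT_ID": "p2", "DM_VISITS": "1"},
    ]


def to_fact(row: Row) -> Fact:
    """Transformation layer: physical row -> normalized logical fact."""
    return {"patient_id": row["PT_ID"], "dm_visit_count": int(row["DM_VISITS"])}


def infer(facts: Iterable[Fact], rules: List[Rule]) -> List[str]:
    """Inference step: keep patients for whom every rule holds."""
    return [str(f["patient_id"]) for f in facts if all(rule(f) for rule in rules)]


def write_output(patient_ids: List[str]) -> str:
    """Output service: here, one patient id per line."""
    return "\n".join(patient_ids)


def run_pipeline(rules: List[Rule]) -> str:
    facts = [to_fact(r) for r in fetch_rows()]
    return write_output(infer(facts, rules))
```

Because the rules only see normalized facts, the same rule set could in principle run against a different source system by swapping the first two functions.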

26 Drools – Workflow

27 Diabetes Project Status  Diabetes Rules are Completed  Demonstrated the Workflow/Rules for Feedback  Make Rules “Shareable”  Performance Validation  More details in the later session!

28 DM2 algorithm
Columns: Logic Statement | GELLO expression | QDM expression

Logic Statement: Patient record flagged as “Y” with research Authorization (nothing in data model to represent this)
GELLO expression: context Patient def: researchAuthorization: Boolean = Exist(Self.explicitConsent = ‘Y’) If researchAuthorization
QDM expression: If Patient.explicitConsent = ‘Y’

Logic Statement: Patient age greater than 18 at the start of measurement period [1/1/09-12/31/10]
GELLO expression: context Patient def: age: Integer = let startOfMeasurement = PointInTime : 1/1/09 in startOfMeasurement – Self.birthdate If age > 18
QDM expression: startOfMeasurement = 1/1/09 If startOfMeasurement – Patient.birthdate > 18

Logic Statement: Patient meets at least one of the following criteria: Patient has at least 2 clinic (face-to-face outpatient) visits during measurement period with visits coded with a diabetes ICD-9 CM code OR
GELLO expression: context Patient def: face2face: Integer = let startOfMeasurement = PointInTime : 1/1/09, let endOfMeasurement = PointInTime: 12/31/10, let dmCodes = {listOfICD9CodesForDM}, let encountersWithDMcodes: Set(Encounter) -> select (Encounter.EncounterType = outpatient AND Encounter.StartDate >= startOfMeasurement AND Encounter.StartDate <= endOfMeasurement AND Encounter.ClinicalEncounterId = dmCodes) in count(encountersWithDMcodes) if face2face >= 2
QDM expression: startOfMeasurement = 1/1/09 endOfMeasurement = 12/31/10 dmCodes = {listOfICD9CodesForDM} CountDistinct(Encounter: encounter outpatient DURING startOfMeasurement and endOfMeasurement and Encounter.ClinicalEncounterId in dmCodes) >= 2

Logic Statement: Patient is on DM medications OR
GELLO expression: context Patient def: onDMmeds: Boolean = let dmMedications = {listOfRxNormCodesForDMmeds} in Exist(Medication.MedicationId in dmMedications) If onDMmeds
QDM expression: dmMedications = {listOfRxNormCodesForDMmeds} Count(Medication.MedicationId in dmMedications) > 0

Logic Statement: Patient has at least 2 clinic (face-to-face outpatient) visits during measurement period with capillary glucose lab value in the measurement period OR with an abnormal lab glucose level > 200 mg/dL OR fasting blood glucose level > 125 mg/dL OR (glyco) hemoglobin A1c >= 6.5% OR
GELLO expression: context Patient def: face2face: Boolean = let startOfMeasurement = PointInTime : 1/1/09, let endOfMeasurement = PointInTime: 12/31/10, let capillaryGlucoseCodes = Set (codes) : {listOfLoincCapillaryGlucoseCodes}, let glucoseTestCodes = Set (codes): {listOfLoincCodesForGlucose}, let HbA1CCodes = Set (codes) : {listOfLoincCodesForHbA1c}, let fastingGlucoseCodes = Set (codes): {listOfLoincCodesForFastingGlucose}, let capillaryGlucoseTests: Set(Lab) -> select (Lab.specimenCollectionDate >= startOfMeasurement AND Lab.specimenCollectionDate <= endOfMeasurement AND Lab.ResultCode in capillaryGlucoseCodes), let abnormalGlucoseTests: Set(Lab) -> select (Lab.specimenCollectionDate >= startOfMeasurement AND Lab.specimenCollectionDate […] 200 mg/dL), let fastingGlucoseTests: Set(Lab) -> select (Lab.specimenCollectionDate >= startOfMeasurement AND Lab.specimenCollectionDate […] 125 mg/dL), let HbA1CTests: Set(Lab) -> select (Lab.specimenCollectionDate >= startOfMeasurement AND Lab.specimenCollectionDate […]= 6.5%), let EncountersDuringMeasurement: Set(Encounter) -> select (Encounter.EncounterType = outpatient AND Encounter.StartDate >= startOfMeasurement AND Encounter.StartDate <= endOfMeasurement) in count(EncountersDuringMeasurement) >= 2 AND ( exist(EncountersDuringMeasurement -> intersection(capillaryGlucoseTests)) OR exist(EncountersDuringMeasurement -> intersection(abnormalGlucoseTests)) OR exist(EncountersDuringMeasurement -> intersection(fastingGlucoseTests)) OR exist(EncountersDuringMeasurement -> intersection(HbA1CTests)) )
QDM expression: startOfMeasurement = 1/1/09 endOfMeasurement = 12/31/10 capillaryGlucoseCodes = {listOfLoincCapillaryCodes} glucoseTestCodes = {listOfLoincCodesForGlucose} HbA1CCodes = {listOfLoincCodesForHbA1c} fastingGlucoseCodes = {listOfLoincCodesForFastingGlucose} If( CountDistinct(Encounter: encounter outpatient DURING startOfMeasurement and endOfMeasurement) >= 2 AND ( Lab.ResultCode in capillaryGlucoseCodes starts concurrent with Encounter outpatient OR ( Lab.ResultCode in glucoseTestCodes starts concurrent with Encounter outpatient AND Lab.Value > 200 mg/dL ) OR ( Lab.ResultCode in fastingGlucoseCodes starts concurrent with Encounter outpatient AND Lab.Value > 125 mg/dL ) OR ( Lab.ResultCode in HbA1CCodes starts concurrent with Encounter outpatient AND Lab.Value >= 6.5% ) ) )

Logic Statement: Patient has ‘diabetes’ in the EMR problem list
GELLO expression: context Patient def: hasDMinProblemList: Boolean = let DMproblem = {listOfICD9codesForDM} in Exist(Problem.ProblemId in DMproblem) If hasDMinProblemList
QDM expression: DMproblem = {listOfICD9codesForDM} Count(Problem.ProblemId in DMproblem) > 0
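To make the combined logic of this slide concrete, the logic statements can be sketched in Python. This is an illustrative reading, not the project's Drools, GELLO, or QDM implementation. The data classes and field names are hypothetical, code-list lookups are assumed to have been resolved upstream into simple flags and lab kinds, and the "lab concurrent with an outpatient encounter" condition is simplified to "lab collected within the measurement period".

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

START = date(2009, 1, 1)     # measurement period from the slide
END = date(2010, 12, 31)


@dataclass
class Lab:
    kind: str          # "capillary_glucose", "glucose", "fasting_glucose" or "hba1c"
    value: float       # mg/dL for glucose tests, percent for HbA1c
    collected_on: date


@dataclass
class Patient:
    birthdate: date
    explicit_consent: bool
    outpatient_visits: List[date] = field(default_factory=list)
    dm_coded_outpatient_visits: List[date] = field(default_factory=list)
    on_dm_meds: bool = False
    dm_in_problem_list: bool = False
    labs: List[Lab] = field(default_factory=list)


def in_period(d: date) -> bool:
    return START <= d <= END


def age_at_start(p: Patient) -> int:
    """Whole years of age at the start of the measurement period."""
    before_birthday = (START.month, START.day) < (p.birthdate.month, p.birthdate.day)
    return START.year - p.birthdate.year - int(before_birthday)


def lab_qualifies(lab: Lab) -> bool:
    """Any capillary glucose result, or a value over the slide's thresholds."""
    if not in_period(lab.collected_on):
        return False
    return (
        lab.kind == "capillary_glucose"
        or (lab.kind == "glucose" and lab.value > 200)
        or (lab.kind == "fasting_glucose" and lab.value > 125)
        or (lab.kind == "hba1c" and lab.value >= 6.5)
    )


def is_t2dm_case(p: Patient) -> bool:
    # Gate conditions: research authorization and age > 18 at period start.
    if not p.explicit_consent or age_at_start(p) <= 18:
        return False
    coded = sum(in_period(d) for d in p.dm_coded_outpatient_visits)
    visits = sum(in_period(d) for d in p.outpatient_visits)
    # At least one of the four criteria must hold.
    return (
        coded >= 2
        or p.on_dm_meds
        or (visits >= 2 and any(lab_qualifies(lab) for lab in p.labs))
        or p.dm_in_problem_list
    )
```

Writing the criteria as one boolean expression keeps the "at least one of the following" structure of the logic statement visible.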

29 NQF QDM Criteria

30 Current HTP Project Themes  Identification of Clinical Element Models  Phenotyping Execution Logic  Data Quality, Validation and Cost Effectiveness

31 Data Quality: Objectives  Assess Data variability within and across institutions  Assess impact of this variability on Secondary Use of EMR  Generate specifications for Widgets –“Warning Label” for suspect data categories –Data quality audits with logs –Batch data correction / removal  More details during the later session!

32 Centerphase Project Research Design Randomly generate ONE sample set of patient records from database: Based on T2DM ICD9 codes from at least 2 visits during measurement period Sample Patient Records Screens 1 -3 Patient Result Set Patient Result Set Manual Process Algorithm-Driven Process Compare time, cost and accuracy of results Study coordinator (SC) conducts manual review of patient charts, and monitors activity time Programmer develops and runs algorithm to query records, and monitors development and run time

33 Outline  Background  On-going projects and updates  Proposed projects for Year 2  Productivity till date  Q & A

34 Project 1: National Library for Clinical Phenotyping Algorithms  Current state of the art –MS Word files: do not scale –An FTP server: will not work either –We need…programmatic access, querying, navigation –Promote re-use (where applicable)  Research Question: To develop an implementation independent, phenotyping logic representation template for algorithm design –Existing work on Drools, GELLO and NQF –Leverage CEMs for algorithm design and representation –Publicly accessible Web-based environment for phenotyping algorithms –Validate algorithm deployment in multiple EMR settings

35 Project 2: Machine Learning and Phenotyping  EMR-derived phenotyping algorithm development is tedious, and time-consuming –Based on our own experience!  Research Question: To leverage machine learning methods for rule/algorithm development, and validate against expert developed ones –Use eMERGE library of phenotype algorithms for validation –Asthma and Diabetes as initial use-cases  Preliminary work by Susan –Work with data normalization and NLP teams

36 Project 3: Just-in-Time Phenotyping  The current pipeline prototype is based on a relational persistence layer –Access to historical, retrospective data –Offline processing of data and phenotyping algorithms  Research Question: To apply phenotyping algorithms as “data sniffers” that can be plugged within a UIMA pipeline –Online, real-time phenotyping (e.g., for clinical decision support) –How much data is “necessary”? How much data is “necessary and sufficient”? –More active role of NLP techniques
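The contrast with offline processing can be illustrated by a detector that consumes coded events one at a time and fires as soon as enough evidence has accumulated. This is only a sketch of the "data sniffer" idea in plain Python; it does not use UIMA, and the class name, the single counting criterion, and the threshold of 2 are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class JustInTimeDetector:
    """Flags a patient once a threshold of qualifying coded events is reached."""

    def __init__(self, qualifying_codes: Set[str], threshold: int = 2) -> None:
        self.qualifying_codes = qualifying_codes
        self.threshold = threshold
        self._counts: DefaultDict[str, int] = defaultdict(int)
        self._flagged: Set[str] = set()

    def observe(self, patient_id: str, code: str) -> bool:
        """Consume one event; return True only when this event first
        pushes the patient over the threshold."""
        if code not in self.qualifying_codes or patient_id in self._flagged:
            return False
        self._counts[patient_id] += 1
        if self._counts[patient_id] >= self.threshold:
            self._flagged.add(patient_id)
            return True
        return False
```

The point at which `observe` first returns True is one concrete way to ask how much data was "necessary" for a given patient.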

37 Project 4: Phenotyping Workbench  EMR-based phenotyping algorithms are hard to design, and even harder to implement –Access to domain experts—often a resource issue –Access to IT/informatics experts—also, a resource issue –Lot of moving components  Research Question: To develop a phenotyping “plug & play” workbench for algorithm design and evaluation –Visual and graphical algorithm editing (jPBMN) –Configurable algorithms (Drools code snippets) –User workspace management (who are these “users”?) –File-based or database access layer (CEM-based) –Leverage i2b2 workbench where applicable –“Plug & Play” is still a big challenge…

38 Outline  Background  On-going projects and updates  Proposed projects for Year 2  Productivity till date  Q & A

39 Productivity till date  Manuscripts/Abstracts/Posters –Conway MA, Berg RL, Carrell D, Denny JC, Kho AN, Kullo IJ, Linneman JG, Pacheco JA, Peissig PL, Rasmussen L, Weston N, Chute CG, Pathak J. Analyzing Heterogeneity and Complexity of Electronic Health Record Oriented Phenotyping Algorithms. AMIA 2011 (paper). –Tao C, Parker CG, Oniki TA, Pathak J, Huff SM, Chute CG. An OWL Meta-Ontology for Representing the Clinical Element Model. AMIA 2011 (paper). –Chute CG, Pathak J, Savova GK, Bailey KR, Schor MI, Hart LA, Beebe CE, Huff SM. The SHARPn Project on Secondary Use of Electronic Medical Record Data: Progress, Plans and Possibilities. AMIA 2011 (paper). –Conway MA, Pathak J. Analyzing the Prevalence of Hedges in Electronic Health Record Oriented Phenotyping Algorithms. AMIA 2011 (poster). –Tao C, Welch SR, Wei WQ, Oniki TA, Parker CA, Pathak J, Huff SM, Chute CG. Normalized Representation of Data Elements for Phenotype Cohort Identification in Electronic Health Record. AMIA 2011 (poster).  Prototype software –Drools-based implementation of the diabetes algorithm

40 Thank You!

