1 Session II: Standardization of phenotyping algorithms and data Jyotishman Pathak, PhD Associate Professor of Biomedical Informatics Department of Health Sciences Research 2015 AMIA Translational Summits, San Francisco March 23rd, 2015

2 2015 AMIA Translational Summits DISCLOSURES Relevant Financial Relationship(s) Mayo Clinic and Dr. Pathak have a financial interest related to a device, product or technology referenced in this presentation with Apervita Inc.® Off Label Usage None ©2015 MFMER | slide-2

3 Outline
- Standards-based approaches to EHR phenotyping
  - NQF Quality Data Model
  - JBoss® Drools business rules environment
  - PhenotypePortal.org
- Standards-based approaches to phenotype data representation
  - Biomedical vocabularies and information models
  - eleMAP data element harmonization toolkit

4 Key lessons learned from eMERGE
- Algorithm design and transportability
  - Non-trivial; requires significant expert involvement
  - Highly iterative process
  - Time-consuming manual chart reviews
  - Representation of “phenotype logic” is critical
- Standardized data access and representation
  - Importance of unified vocabularies, data elements, and value sets
  - Questionable reliability of ICD & CPT codes (e.g., billing the wrong code since it is easier to find)
  - Natural Language Processing (NLP) is critical
[Kho et al. Sc. Trans. Med 2011; 3(79): 1-7]

5 Example algorithm: Hypothyroidism

6 EHR-driven Phenotyping Algorithms
[Diagram labels: Data; Transform; Phenotype Algorithm; Visualization; Evaluation; NLP, SQL; Rules; Mappings] [eMERGE Network]

7 Porting Phenotype Algorithms
- Initial eMERGE plan: sharing SQL, XML, near-executable pseudo-code
  - That didn’t work: the sites were too heterogeneous
- More effective: sharing documents and human-readable flowcharts
  - Downside: requires humans to translate; each implementation requires interpretation of ambiguity
- Moving forward: executable algorithm and workflow specifications (back to the intentions of the initial plan)
  - Requirements: portability, standards-based (when possible), buy-in at different levels

8 Algorithm Execution Process (baseline)
[Diagram labels: Human; Custom Scripts; Text; Custom Scripts/Code] [eMERGE Network]

9 Pros and Cons
- Pros
  - Very flexible
  - “Portable” (with caveats)
- Cons
  - Too flexible (no standard format)
  - Requires implementation from scratch for every algorithm
  - Error prone; can lead to ambiguity

10 Algorithm Execution Process (standardized phenotype definitions)
[Diagram labels: Human; Custom Scripts; QDM; Custom Scripts/Code] [eMERGE Network]

11 Pros and Cons
- Pros
  - Consistent format
  - Standards based
  - Automatic translation to HTML (etc.) for human consumption
- Cons
  - Not as flexible as text
  - May require extensions to cover algorithms not expressible as Boolean rules
  - Still requires mapping to executable code

12 Algorithm Execution Process (custom execution engine)
[Diagram labels: Automatic; Custom Engine; QDM; Custom Scripts/Code; Target] [eMERGE Network]

13 Pros and Cons
- Pros
  - Same as for the previous approach, plus:
  - Consistent target for mapping specifications
  - Allows possibility of automatic mapping
- Cons
  - Need to transform source data into the format required by the execution engine
  - Not portable to other institutions (may not be a problem if you don’t care)
  - Unclear how to integrate NLP scripts/code

14 Algorithm Execution Process (common execution engine)
[Diagram labels: Automatic; Drools, KNIME; QDM; Custom Scripts/Code; Target] [eMERGE Network]

15 Pros and Cons
- Pros
  - Same as for the previous approach, plus:
  - Can re-use mappings developed externally
- Cons
  - Need to transform source data into the format required by the execution engine
  - Automatic mappings may be sub-optimal
  - Execution engine may not fit idiosyncratic features of local data sources

16 NQF Quality Data Model (QDM) - I
- Standard of the National Quality Forum (NQF)
- A structure and grammar to represent quality measures and phenotype definitions in a standardized format
- Groups of codes in a code set (ICD-9, etc.)
  - "Diagnosis, Active: steroid induced diabetes" using "steroid induced diabetes Value Set GROUPING ( )"
- Supports temporality & sequences
  - AND: "Procedure, Performed: eye exam" > 1 year(s) starts before or during "Measurement end date"
- Implemented as a set of XML schemas
- Links to standardized terminologies (ICD-9, ICD-10, SNOMED-CT, CPT-4, LOINC, RxNorm etc.)
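The two QDM ideas on this slide, value sets as groups of codes and temporal relations against the measurement period, can be illustrated with a rough Python sketch. This is not QDM or HQMF syntax. The class names, the placeholder codes, and the literal reading of "> 1 year(s) starts before" as "starts more than one year before the anchor date" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class ValueSet:
    """A named group of (code system, code) pairs."""
    name: str
    codes: FrozenSet[Tuple[str, str]]


@dataclass(frozen=True)
class Event:
    """One coded entry from a patient record."""
    category: str      # QDM-style category, e.g. "Procedure, Performed"
    code_system: str
    code: str
    start: date


def in_value_set(event: Event, category: str, value_set: ValueSet) -> bool:
    """True if the event has the given category and its code is in the value set."""
    return (event.category == category
            and (event.code_system, event.code) in value_set.codes)


def shift_years_back(d: date, years: int) -> date:
    """Move a date back by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def starts_more_than_years_before(event: Event, anchor: date, years: int) -> bool:
    """Assumed reading of '> N year(s) starts before <anchor>'."""
    return event.start < shift_years_back(anchor, years)


# Placeholder codes only; a real value set would list ICD-9, SNOMED-CT, CPT-4 codes, etc.
EYE_EXAM = ValueSet("eye exam", frozenset({("EXAMPLE-SYSTEM", "EXAMPLE-001")}))
measurement_end = date(2014, 12, 31)


def criterion(event: Event) -> bool:
    """'Procedure, Performed: eye exam' > 1 year(s) starts before 'Measurement end date'."""
    return (in_value_set(event, "Procedure, Performed", EYE_EXAM)
            and starts_more_than_years_before(event, measurement_end, 1))


old_exam = Event("Procedure, Performed", "EXAMPLE-SYSTEM", "EXAMPLE-001", date(2013, 6, 1))
recent_exam = Event("Procedure, Performed", "EXAMPLE-SYSTEM", "EXAMPLE-001", date(2014, 6, 1))
print(criterion(old_exam), criterion(recent_exam))
```

Keeping the value set separate from the criterion mirrors the slide's point that the logic refers to a named group of codes rather than to individual codes.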

17 NQF Quality Data Model (QDM) - II [NQF QDM update, June 2012]

18 NQF Quality Data Model (QDM) - III [NQF QDM update, June 2012]

20 Example: Diabetes & Lipid Mgmt. - I

21 Example: Diabetes & Lipid Mgmt. - II
Human readable HTML

22 Example: Diabetes & Lipid Mgmt. - III

23 Example: Diabetes & Lipid Mgmt. - IV

24 Example: Diabetes & Lipid Mgmt. - V
Computer readable HQMF XML (based on HL7 v3 RIM)

25 Modeling NQF criteria using Measure Authoring Tool (MAT)

27 [Thompson et al., AMIA 2012]

28 [Thompson et al., AMIA 2012]

29 JBoss® Drools rules management system
- Represents knowledge with declarative production rules
- Origins in artificial intelligence expert systems
- Simple when/then rules specified in text files
- Separation of data and logic into separate components
- Forward chaining inference model (Rete algorithm)
- Domain specific languages (DSL)

30 Example Drools rule

rule "Glucose <= 40, Insulin On"
when
    $msg : GlucoseMsg(glucoseFinding 0 )
then
    glucoseProtocolResult.setInstruction(GlucoseInstructions.GLUCOSE_LESS_THAN_40_INSULIN_ON_MSG);
end

Annotations: {binding}; {Java Class}; {Class Getter Method}; Parameter; {Java Class}; {Class Setter Method}; {Rule Name}
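To make the when/then structure and the forward-chaining idea concrete, here is a minimal Python sketch. It is not Drools and does not implement the Rete algorithm; it simply re-evaluates every rule until no new rule fires. The fact keys `glucose` and `insulin_on`, and the second "Escalate low glucose" rule, are hypothetical and are not taken from the slide, whose original rule condition did not survive extraction.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Facts = Dict[str, object]


@dataclass
class Rule:
    """A production rule: a condition over the facts and an action that updates them."""
    name: str
    when: Callable[[Facts], bool]
    then: Callable[[Facts], None]


def run_rules(rules: List[Rule], facts: Facts) -> List[str]:
    """Naive forward chaining: loop until a full pass fires no rule.

    Each rule fires at most once. Returns rule names in firing order.
    """
    fired: List[str] = []
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.name not in fired and rule.when(facts):
                rule.then(facts)
                fired.append(rule.name)
                changed = True
    return fired


rules = [
    # Hypothetical follow-up rule: depends on a fact produced by the rule below.
    Rule(
        name="Escalate low glucose",
        when=lambda f: f.get("instruction") == "GLUCOSE_LESS_THAN_40_INSULIN_ON_MSG",
        then=lambda f: f.update(alert=True),
    ),
    # Loosely modeled on the slide's rule name; the condition is an assumption.
    Rule(
        name="Glucose <= 40, Insulin On",
        when=lambda f: f["glucose"] <= 40 and f["insulin_on"],
        then=lambda f: f.update(instruction="GLUCOSE_LESS_THAN_40_INSULIN_ON_MSG"),
    ),
]

facts: Facts = {"glucose": 38, "insulin_on": True}
fired = run_rules(rules, facts)
print(fired, facts)
```

The rules are listed in the "wrong" order on purpose: the follow-up rule only becomes eligible after the glucose rule has added a new fact, which is the behavior forward chaining provides without the author having to sequence the rules by hand.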

31 Automatic translation from NQF criteria to Drools
From non-executable to executable
[Diagram labels: Measure Authoring Toolkit; Data Types (XML-based structured representation); Value Sets (saved in XLS files); Measures (XML-based structured representation); Mapping data types and value sets; Fact Models; Converting measures to Drools scripts; Drools scripts; Drools Engine]

32 [Li et al., AMIA 2012]

33 The “executable” Drools workflow [Li et al., AMIA 2012]

34 [Endle et al., AMIA 2012]

35
1. Convert QDM to Drools
2. Execute rules by querying the CEM database
3. Generate summary reports

38 [Peterson AMIA 2014]

39 Example: Initial Patient Population criteria for CMS eMeasure (CMS163V1)

40 [Peterson AMIA 2014]

41 Validation using Project Cypress [Peterson AMIA 2014]

42 Alternative: Execution using KNIME

43 Outline
- Standards-based approaches to EHR phenotyping
  - NQF Quality Data Model
  - JBoss® Drools business rules environment
  - PhenotypePortal.org
- Standards-based approaches to phenotype data representation
  - Biomedical vocabularies and information models
  - eleMAP data element harmonization toolkit

44 Overall Objective - I
- Without terminology and metadata standards:
  - Health data are non-comparable
  - Health systems cannot meaningfully interoperate
  - Secondary uses of data for research and applications (e.g., clinical decision support) are not possible
- Our goal: standardized and consistent representation of eMERGE phenotype data submitted to dbGaP (Database of Genotypes and Phenotypes)

45 Overall Objective - II
1. Create/modify a data dictionary
2. Harmonize data elements to standards
3. Generate standardized phenotype data
[Diagram label: Raw/Local Data from EMR]

46 What are Data Dictionaries?
- Data dictionaries
  - Collections of variables
  - Often constructed for a particular study
  - May contain common and study-specific variables
  - Wide spectrum of formality
- Inconsistencies can complicate (or prevent) data integration
  - Differences in format (require transformations)
  - Differences in semantic meaning
  - Differences in data values
[PHONT RIPS 2011]

47 Data Element Standardization: Lab Measurements
- Similar data element with different semantics
  - Fasting vs. non-fasting (implied semantics?)
  - Specific time is implied (visit number)
  - Mathematically transformed (natural log)
[PHONT RIPS 2011]
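The lab-measurement pitfalls above can be made explicit by recording the implied semantics as metadata on each variable. The Python sketch below is illustrative only: the `LabVariable` fields, the variable names, and the pooling rule are assumptions, not part of any eMERGE data dictionary.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LabVariable:
    """Metadata that a data dictionary often leaves implicit."""
    name: str
    fasting: Optional[bool]   # None means the source dictionary does not say
    visit: Optional[int]      # visit number, if the time point is implied
    log_transformed: bool     # True if values are stored as natural logs


def to_raw_scale(value: float, var: LabVariable) -> float:
    """Undo a natural-log transform so values share one scale."""
    return math.exp(value) if var.log_transformed else value


def can_pool(a: LabVariable, b: LabVariable) -> bool:
    """Assumed rule: pool only when fasting status is stated and identical."""
    return a.fasting is not None and a.fasting == b.fasting


# Hypothetical variables from two study dictionaries.
site_a = LabVariable("ldl_fasting_v1", fasting=True, visit=1, log_transformed=False)
site_b = LabVariable("ln_ldl", fasting=True, visit=None, log_transformed=True)
site_c = LabVariable("ldl", fasting=None, visit=None, log_transformed=False)

stored_value = math.log(130.0)                 # site B stores ln(LDL)
restored = to_raw_scale(stored_value, site_b)  # back on the raw scale
print(round(restored, 6), can_pool(site_a, site_b), can_pool(site_a, site_c))
```

Treating an unstated fasting status as "cannot pool" is a deliberately conservative choice; a real harmonization effort might instead flag such variables for manual review.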

48 Data Element Standardization: Medications
- Different permissible values
  - Content and meaning of values
  - Local meanings ("other", "unknown")
  - Could also have different representations (codes vs. text)
- Can use standard drug ontologies (RxNorm)
  - Standardize drug names and classes (NDF-RT)
  - Auto-classify agents (more flexible, less error prone)
[PHONT RIPS 2011]

49 eMERGE Data Dictionary Standardization
- Data dictionary standardization effort
  - Harmonize core eMERGE data elements
  - Leverage standardized ontologies and metadata
  - Develop SOPs for data dictionary usage
[Diagram labels: Standardization; Collect Dictionaries; Library of Standardized Data Elements] [PHONT RIPS 2011]

50 Background: Clinical Terminology Standards and Resources
- NCI Cancer Data Standards Repository
  - Metadata registry based on ISO/IEC standard for storing common data elements (CDEs)
  - Allows creating, editing, deploying, and finding of CDEs
  - Provides the backbone for NCI’s semantic-computing environment, including caBIG (Cancer Biomedical Informatics Grid)
  - Approx. 40,000 CDEs

53 Background: Clinical Terminology Standards and Resources
- CDISC Terminology
  - To define and support terminology needs of the CDISC models across the clinical trial continuum
  - Used as part of the Study Data Tabulation Model: an international standard for clinical research data, approved by the FDA as a standard electronic submission format
  - Comprises approx terms covering demographics, interventions, findings, events, trial design, units, frequency, and ECG terminology

55 Background: Clinical Terminology Standards and Resources
- NCI Thesaurus
  - Reference terminology for clinical care, translational and basic cancer research
  - Comprises approx. 70,000 concepts representing information for nearly 10,000 cancers and related diseases
  - NCI Enterprise Vocabulary Services (LexEVS) provides the terminology infrastructure for caBIG, NCBO etc.

58 Background: Clinical Terminology Standards and Resources
- Systematized Nomenclature of Medicine Clinical Terms is a comprehensive terminology covering most areas of clinical information including diseases, findings, procedures, microorganisms, pharmaceuticals etc.
  - Comprises approx. 370,000 concepts
  - Acquired by International Health Terminology Standards Development Organization (IHTSDO) in 2007

60 Background: Clinical Terminology Standards and Resources
- LOINC
  - Logical Observation Identifiers Names and Codes provides a set of universal codes and names to identify laboratory and other clinical observations
  - Over 100,000 terms
  - RELMA: the Regenstrief LOINC Mapping Assistant program helps users map their local terms or lab tests to universal LOINC codes

62 eleMAP Conceptual Architecture
https://victr.vanderbilt.edu/eleMAP

63 eleMAP Data Harmonization Process
5 easy steps
1. Create a study
2. Create your data dictionary
3. Harmonize the data elements
4. Harmonize the actual/raw data
5. Iterate if necessary…
Quick demo or Screen Shots

64 eleMAP Data Harmonization Process: 1
Step 1: Select “Harmonize Data” under My Account

65 eleMAP Data Harmonization Process: 2
Step 2: Select Study, Source, and Upload raw data file

66 eleMAP Data Harmonization Process: 3
Step 3: Click OK to confirm import of data file

67 eleMAP Data Harmonization Process: 4
Step 4: Display of harmonized data; Download file

68 eleMAP Data Harmonization Process: 5
Step 5: Harmonized data file for dbGaP submission

69 eleMAP Statistics (as of 03/09/2015)
- 18 different eMERGE studies
- 407 data elements, across 13 different categories
  - 68% mapped to caDSR CDEs
  - 41% mapped to SNOMED CT concepts
  - 41% mapped to NCI Thesaurus concepts
  - 25% mapped to SDTM DEs
  - 30% of DEs have no mapping

70 Key lessons learned from eMERGE phenotype data integration
Use case: eMERGE Network Combined Dataset
Studies: RBC, WBC, Height, Lipids, Diabetic Retinopathy, and Hypothyroidism
1. Issue: Data value inconsistencies were found in common variables among studies (e.g., race).
   Suggestion: Use eleMAP to define study phenotypes/data elements and disseminate the finalized data dictionary (DD) to all sites prior to actual data collection.
2. Issue: The same variable name was used for different data element concepts (e.g., weight, height, and BMI).
   Suggestion: Use eleMAP to review the concept/description of existing data elements and define new data elements if necessary.
3. Issue: Inconsistent data values were received (e.g., Sex). Some were original values (F=Female; M=Male) and some were mapped values (Female=C46110; Male=C46109).
   Suggestion: Best to gather data in original values, combine the data sets, and then harmonize the merged data files via eleMAP.
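The third issue, a column mixing original values with already-mapped codes, can be sketched in Python as follows. The two codes are the ones quoted on the slide (Female=C46110; Male=C46109); the function names and the choice to return `None` for unmapped values are assumptions for illustration and do not describe how eleMAP works internally.

```python
from typing import Dict, List, Optional, Tuple

# Codes as quoted on the slide; the lookup keys are assumed local spellings.
SEX_TO_CODE: Dict[str, str] = {
    "F": "C46110",
    "FEMALE": "C46110",
    "M": "C46109",
    "MALE": "C46109",
}
KNOWN_CODES = set(SEX_TO_CODE.values())


def harmonize_sex(value: str) -> Optional[str]:
    """Map an original or already-mapped value to a code; None if unmapped."""
    normalized = value.strip().upper()
    if normalized in KNOWN_CODES:
        return normalized                 # already a mapped value
    return SEX_TO_CODE.get(normalized)    # original value, or None


def harmonize_column(values: List[str]) -> Tuple[List[Optional[str]], List[str]]:
    """Return harmonized values plus the distinct inputs that need manual review."""
    harmonized = [harmonize_sex(v) for v in values]
    unmapped = sorted({v for v, h in zip(values, harmonized) if h is None})
    return harmonized, unmapped


merged = ["F", "M", "C46110", "Male", " female ", "U"]
harmonized, unmapped = harmonize_column(merged)
print(harmonized)
print(unmapped)
```

Returning the unmapped inputs separately, rather than guessing a code, keeps local meanings such as "unknown" visible to a human reviewer.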

71 PheKB Data Dictionary Validation
- Validate data dictionary
  - Columns
  - Formatting
- Validate data file against data dictionary
  - Variable names, order
  - Data types, min, max
  - Encoded values
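The checks listed above (variable names and order, data types, min/max, encoded values) can be sketched as a small Python validator. This is an illustrative sketch, not the PheKB implementation: the `Variable` structure, the example dictionary, and the error message wording are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Variable:
    """One data dictionary entry (hypothetical structure)."""
    name: str
    dtype: type                                # callable used to parse the raw string
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    encoded_values: Optional[Set[str]] = None  # permissible values, if enumerated


def validate(header: List[str], rows: List[List[str]],
             dictionary: List[Variable]) -> List[str]:
    """Return human-readable errors; an empty list means the file passed."""
    expected = [v.name for v in dictionary]
    if header != expected:  # checks both variable names and their order
        return [f"header mismatch: expected {expected}, got {header}"]

    errors: List[str] = []
    for row_number, row in enumerate(rows, start=1):
        for var, raw in zip(dictionary, row):
            if var.encoded_values is not None:
                if raw not in var.encoded_values:
                    errors.append(f"row {row_number}: {var.name}={raw!r} is not an encoded value")
                continue
            try:
                value = var.dtype(raw)
            except ValueError:
                errors.append(f"row {row_number}: {var.name}={raw!r} is not {var.dtype.__name__}")
                continue
            if var.minimum is not None and value < var.minimum:
                errors.append(f"row {row_number}: {var.name}={raw} is below {var.minimum}")
            if var.maximum is not None and value > var.maximum:
                errors.append(f"row {row_number}: {var.name}={raw} is above {var.maximum}")
    return errors


# Hypothetical dictionary and data file contents.
dictionary = [
    Variable("subject_id", str),
    Variable("sex", str, encoded_values={"C46110", "C46109"}),
    Variable("height_cm", float, minimum=0, maximum=300),
]
header = ["subject_id", "sex", "height_cm"]
rows = [["S1", "C46110", "162.5"], ["S2", "F", "abc"], ["S3", "C46109", "512"]]

errors = validate(header, rows, dictionary)
print("\n".join(errors))
```

Stopping at a header mismatch is a design choice: if the columns are misnamed or out of order, per-cell checks would mostly report noise.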

72 PheKB Data Dictionary Validation

73 PheKB Data Dictionary Validation

74 PheKB Data Dictionary Validation

75 PheKB Data Dictionary Validation

76 Future eleMAP/PheKB Integration
- Tightly integrate data dictionary authoring in the phenotype definition process
  - Utilize eleMAP functionality to define the data dictionary
  - Submitted data sets validated against the built-in definition

77 Concluding remarks
- Standardization of phenotyping algorithms and data dictionaries is critical
  - Portability of algorithms across multiple EMR environments
  - Consistent and comparable data sets
- To the extent possible, the goal should be to leverage on-going community-wide and national standardization efforts
  - Join the club!

78 Relevant presentations
- Monday (03/23/15), Session: TBI03 (Cyril Magnin 3:30PM
  - Computational Phenotyping from Electronic Health Records across National Networks
- Wednesday (03/25/15), Session: CRI02 10:30AM
  - A Modular Architecture for Electronic Health Record-Driven Phenotyping

80 Thank You!

