4 Entities
Entities in the social web: person, location, organization, book, music (freebase.com: Metformin).
Entities in translational medicine: gene, drug, disease, protein, side effect (conceptwiki: Disease Lafora).
Data: scientific papers (PubMed, PubMed Central) and experimental data (Swiss-Prot, KEGG, DrugBank).
6 Use Case: Individualized Cohort in EHR
An EHR-based individualized cohort can provide a better solution than standard guidelines because the cohort is drawn from a patient population with the same geolocation, demographics, and socio-economic group as the given patient.
EHRs are organized around the patient, not by concepts (diseases, lab results, medications, etc.).
7 Use Case: Individualized Cohort in EHR
EHR data contains controlled vocabularies (e.g., demographics, diagnostic codes, medications, procedures) and continuous values (e.g., lab tests, medication doses).
Category hierarchy (parent, siblings, subtrees): a search for patients like a given diagnosis “ICD10:E11.21” (diabetes with nephropathy) should also reach ICD10:E11.22 (with chronic kidney disease) and ICD10:E11 (diabetes in general).
Continuous values: serum glucose = 120 mg/dL (many continuous values may not have a natural aggregate binning).
Queries for searching patients are rarely exact: fasting serum glucose = 126 becomes serum glucose between 120 and 130, or serum glucose in the 80th percentile at this time.
A patient can have ,000 property values, which contain 100 controlled vocabulary values and 1000 continuous values. Most values are time based.
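The hierarchy expansion and the inexact value match described on this slide can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the `PARENT` table is a hand-written fragment of ICD-10 (the intermediate code E11.2 is added so that E11.21 and E11.22 share a parent), and the patient records and function names are hypothetical.

```python
# Hypothetical, hand-written fragment of the ICD-10 hierarchy (child -> parent).
# A real system would load the full code system instead.
PARENT = {
    "E11.21": "E11.2",
    "E11.22": "E11.2",
    "E11.2": "E11",
}

def ancestors(code):
    """Return the chain of parents, from the nearest to the most general."""
    chain = []
    while code in PARENT:
        code = PARENT[code]
        chain.append(code)
    return chain

def siblings(code):
    """Return the other codes that share a direct parent with `code`."""
    parent = PARENT.get(code)
    if parent is None:
        return []
    return sorted(c for c, p in PARENT.items() if p == parent and c != code)

def expand_diagnosis(code):
    """Codes treated as 'like' the given one: itself, siblings, ancestors."""
    return [code] + siblings(code) + ancestors(code)

def matches(patient, codes, glucose_range):
    """Inexact match: diagnosis in the expanded set, value inside the range."""
    low, high = glucose_range
    return patient["diagnosis"] in codes and low <= patient["glucose"] <= high

# Made-up patients for illustration only.
patients = [
    {"id": "A", "diagnosis": "E11.22", "glucose": 124},
    {"id": "B", "diagnosis": "E11.21", "glucose": 150},
    {"id": "C", "diagnosis": "I10", "glucose": 126},
]

codes = expand_diagnosis("E11.21")
# The exact query "glucose = 126" is relaxed to the range 120..130.
hits = [p["id"] for p in patients if matches(p, codes, (120, 130))]
print(codes)
print(hits)
```

Only patient A is returned: B has a matching diagnosis but a glucose value outside the relaxed range, and C has a matching value but an unrelated diagnosis.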
8 Challenges
Searching challenges:
Category hierarchy (parent, siblings, subtrees): a search for patients like a given diagnosis “ICD10:E11.21” (diabetes with nephropathy) should also reach ICD10:E11.22 (with chronic kidney disease) and ICD10:E11 (diabetes in general).
Continuous values: serum glucose = 120 mg/dL (many continuous values may not have a natural aggregate binning).
Queries for searching patients are rarely exact: fasting serum glucose = 126 becomes serum glucose between 120 and 130, or serum glucose in the 80th percentile at this time.
Map the changes in value to changes in time: search for patients with a transition from the 60th to the 90th percentile between two serum glucose values over a 6-month time frame. If we have N glucose values, then for any two patients we have to make N*(N-1)/2 time-based glucose-value comparisons. How do we scale this up?
Find common patterns in a set of individualized cohort patients. This means comparing combinations of subsets of the millions of differentials for each patient in the cohort.
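The quadratic growth mentioned above can be made concrete with a short sketch. It enumerates every time-ordered pair of measurements for one patient and checks for a percentile transition. The function names, the reference population, and the reading of "transition" (an earlier value at or below the low percentile, a later value at or above the high percentile) are assumptions made for illustration.

```python
from itertools import combinations

def pairwise_differentials(series):
    """All time-ordered pairs of measurements for one patient.

    `series` is a list of (day, value) tuples. N measurements produce
    N * (N - 1) / 2 differentials, each a (day delta, value delta) pair.
    """
    ordered = sorted(series)
    return [(t2 - t1, v2 - v1) for (t1, v1), (t2, v2) in combinations(ordered, 2)]

def percentile_rank(value, population):
    """Percentage of population values at or below `value`."""
    return 100.0 * sum(1 for v in population if v <= value) / len(population)

def has_transition(series, population, low_pct, high_pct, max_days):
    """True if an earlier value ranks at or below low_pct and a later value,
    at most max_days afterwards, ranks at or above high_pct."""
    ordered = sorted(series)
    for (t1, v1), (t2, v2) in combinations(ordered, 2):
        if (t2 - t1 <= max_days
                and percentile_rank(v1, population) <= low_pct
                and percentile_rank(v2, population) >= high_pct):
            return True
    return False

# Made-up data: a reference population of glucose values and one patient's
# (day, value) readings.
population = list(range(80, 180))
series = [(0, 139), (90, 150), (150, 170)]

print(len(pairwise_differentials(series)))                # 3 pairs for N = 3
print(has_transition(series, population, 60, 90, 180))    # within about 6 months
```

Because every patient contributes N*(N-1)/2 pairs per lab test, a brute-force scan like this grows quickly with N, which is the scaling problem the slide raises.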
9 Relational Database to Semantic Graph
Paradigm shift from relational row-column lookup to semantic graph traversal.
Relational databases are less efficient at joins.
Big indexing overhead (every column needs to be indexed).
10 EHR RDF Graph
Patient EHR data in semantic graph representation. EHR timelines for Patients A and B are shown as RDF graphs. Property values of each patient (demographics, labs, diagnoses, etc.) are connected to their respective ontologies, enabling searches for patterns across different patients.
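A minimal sketch of this representation, using plain Python tuples as (subject, predicate, object) triples rather than an RDF library. All identifiers and predicate names (`hasDiagnosis`, `subClassOf`, `patient:A`) are hypothetical placeholders, not the vocabulary used in the talk.

```python
# Instance data (patients) and ontology edges live in the same triple list.
triples = [
    ("patient:A", "hasDiagnosis", "ICD10:E11.21"),
    ("patient:B", "hasDiagnosis", "ICD10:E11.22"),
    ("patient:A", "hasGender", "gender:female"),
    ("patient:B", "hasGender", "gender:female"),
    ("ICD10:E11.21", "subClassOf", "ICD10:E11.2"),
    ("ICD10:E11.22", "subClassOf", "ICD10:E11.2"),
]

def match(pattern, triples):
    """Return the triples matching a pattern in which None is a wildcard."""
    s, p, o = pattern
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

def patients_under(concept, triples):
    """Patients whose diagnosis is the concept or a direct subclass of it."""
    classes = {concept} | {s for s, _, _ in match((None, "subClassOf", concept), triples)}
    return sorted(
        s for s, _, o in match((None, "hasDiagnosis", None), triples) if o in classes
    )

print(patients_under("ICD10:E11.2", triples))
```

Because both patients' diagnosis nodes point to the same ontology node, a single traversal from `ICD10:E11.2` reaches patients A and B even though their literal codes differ.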
11 EHR RDF Graph
Applying continuous value classes will enrich the set of patients retrieved from the database.
2A. Property values as literal nodes will not link “like” patients together without a “relational” query.
2B. By using controlled vocabulary (CV)-ontology edges, we will be able to link patients through CV-value nodes.
2C. By adding “nearby” classes to continuous value nodes, we will link additional patients. Different strategies will create different “nearby” links.
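The difference between literal nodes, value classes, and "nearby" classes can be sketched as below. The fixed-width binning and the "own bin plus adjacent bins" rule are just two possible strategies chosen for illustration; the class labels, bin width, and readings are assumptions.

```python
from collections import defaultdict

def fixed_width_class(value, width=10):
    """Label of the fixed-width bin containing an integer value."""
    low = (value // width) * width
    return f"glucose:{low}-{low + width}"

def nearby_classes(value, width=10):
    """The value's own bin plus the two adjacent bins (one 'nearby' strategy)."""
    return [fixed_width_class(value + d, width) for d in (-width, 0, width)]

def link_patients(readings, class_fn):
    """Map each class node to the set of patients attached to it."""
    links = defaultdict(set)
    for patient, value in readings.items():
        for cls in class_fn(value):
            links[cls].add(patient)
    return links

# Made-up serum glucose readings (mg/dL) for three patients.
readings = {"A": 120, "B": 128, "C": 131}

literal = link_patients(readings, lambda v: [str(v)])             # literal nodes only
binned = link_patients(readings, lambda v: [fixed_width_class(v)])
nearby = link_patients(readings, nearby_classes)                  # nearby classes

print(sorted(binned["glucose:120-130"]))
print(sorted(nearby["glucose:120-130"]))
```

With literal nodes no two patients share a node. Binning links A and B, and the nearby strategy additionally links C, whose value of 131 falls just outside the 120-130 bin. A different bin width or neighbourhood rule would produce different links.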
12 Challenges: Semantic Graph Mining
Graph indexing
gIndex: indexing frequent subgraphs, using subgraphs as features
Graph classification, clustering
Path-based clustering and top-k similarity problems in heterogeneous information networks
Path-based graph mining
Complex dependencies within heterogeneous networks
Conventional supervised classification methods assume that the objects are independent
Sequential matching vs. snapshot matching, as EHR records have a time dimension
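As an illustration of path-based top-k similarity, the sketch below scores patients along a patient-diagnosis-patient path. It uses the PathSim formula (twice the number of paths between x and y, divided by the number of paths from x to itself plus the number from y to itself). The slide does not name a specific measure, so choosing PathSim here is an assumption, and the adjacency data is made up.

```python
def path_count(x, y, adjacency):
    """Number of patient-diagnosis-patient paths between patients x and y."""
    return sum(
        weight * adjacency[y].get(concept, 0)
        for concept, weight in adjacency[x].items()
    )

def pathsim(x, y, adjacency):
    """PathSim score in [0, 1] along the patient-diagnosis-patient path."""
    denom = path_count(x, x, adjacency) + path_count(y, y, adjacency)
    return 2 * path_count(x, y, adjacency) / denom if denom else 0.0

def top_k(query, adjacency, k):
    """The k patients most similar to `query`, best first, ties by name."""
    scores = [
        (pathsim(query, other, adjacency), other)
        for other in adjacency
        if other != query
    ]
    return sorted(scores, key=lambda s: (-s[0], s[1]))[:k]

# Hypothetical patient -> {diagnosis code: edge count} adjacency.
adjacency = {
    "A": {"E11.21": 1, "I10": 1},
    "B": {"E11.21": 1},
    "C": {"J45": 1},
}

print(top_k("A", adjacency, 2))
```

Patient B ranks first for query A because they share one diagnosis, while C shares none. The same function works for longer symmetric paths if the adjacency counts are precomputed for that path.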
14 Challenges for Semantic Web
How to handle ontology graph + instance graph
How to handle inferred triples and existing triples (reasoning)
Graph pattern search vs. graph mining
Datatype properties vs. object properties
Different levels of semantics: ontology (schema), categorized values (terminology), continuous values (binning?), literals
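The distinction between existing (asserted) and inferred triples can be illustrated with a small subclass reasoner. This is a sketch under simple assumptions: only transitive `subClassOf` and `type` propagation are handled, and the predicate names and identifiers are hypothetical.

```python
def superclasses(cls, asserted):
    """All direct and indirect superclasses of `cls` via subClassOf edges."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in asserted:
            if p == "subClassOf" and s == current and o not in found:
                found.add(o)
                frontier.append(o)
    return found

def inferred_triples(asserted):
    """Type triples entailed by subclass reasoning but not already asserted."""
    existing = set(asserted)
    derived = set()
    for s, p, o in asserted:
        if p == "type":
            for sup in superclasses(o, asserted):
                derived.add((s, "type", sup))
    return derived - existing

# Ontology edges and one instance triple, all made up for illustration.
asserted = [
    ("ICD10:E11.21", "subClassOf", "ICD10:E11.2"),
    ("ICD10:E11.2", "subClassOf", "ICD10:E11"),
    ("dx:1", "type", "ICD10:E11.21"),
]

inferred = inferred_triples(asserted)
print(sorted(inferred))
```

Keeping `inferred` separate from `asserted` makes the trade-off visible: materializing inferred triples speeds up pattern search but enlarges the graph, while computing them at query time keeps the graph small at the cost of reasoning during each search.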