Knowledge Graph: Connecting Big Data Semantics

Knowledge Graph: Connecting Big Data Semantics
Ying Ding, Indiana University

Entity in Big Data
- Entity: things, not strings
- Relationships matter: connecting entities
- Search is changing: string → entity → relation → subgraph
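The string → entity → relation → subgraph progression can be sketched on a toy triple list. This is a minimal illustration, not the author's system: the `ex:` identifiers, the `label`/`treats`/`hasComplication` predicates and the helper names are all hypothetical.

```python
# Hypothetical toy knowledge graph as (subject, predicate, object) triples.
# Entity IDs ("ex:...") are the "things"; a label is just one property of a thing.
TRIPLES = [
    ("ex:Metformin", "label", "Metformin"),
    ("ex:Metformin", "treats", "ex:Type2Diabetes"),
    ("ex:Type2Diabetes", "label", "Type 2 diabetes"),
    ("ex:Type2Diabetes", "hasComplication", "ex:DiabeticNephropathy"),
    ("ex:DiabeticNephropathy", "label", "Diabetic nephropathy"),
]


def resolve(text):
    """string -> entity: entity IDs whose label matches the search string."""
    return [s for s, p, o in TRIPLES if p == "label" and o.lower() == text.lower()]


def relations(entity):
    """entity -> relation: outgoing non-label edges of an entity."""
    return [(p, o) for s, p, o in TRIPLES if s == entity and p != "label"]


def subgraph(entity, depth):
    """relation -> subgraph: all edges reachable from `entity` within `depth` hops."""
    frontier, seen, edges = {entity}, {entity}, []
    for _ in range(depth):
        next_frontier = set()
        for node in frontier:
            for p, o in relations(node):
                edges.append((node, p, o))
                if o not in seen:
                    seen.add(o)
                    next_frontier.add(o)
        frontier = next_frontier
    return edges


if __name__ == "__main__":
    entity = resolve("metformin")[0]   # the search string becomes a thing
    print(subgraph(entity, depth=2))   # the thing expands into a subgraph
```

A keyword search stops at `resolve`; the entity-centric view keeps going along typed edges, which is what makes subgraph answers possible.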

[Figure: entity relations]

Entities
- Entities in the social web: person, location, organization, book, music (e.g., freebase.com: Metformin)
- Entities in translational medicine: gene, drug, disease, protein, side effect (e.g., ConceptWiki: Lafora disease)
- Data: scientific papers (PubMed, PubMed Central) and experimental data (Swiss-Prot, KEGG, DrugBank)

Challenges
- Knowledge graph (entity graph)
- Schema graph (small) vs. instance graph (large)
- Graph mining (e.g., shortest path, depth-first/breadth-first search, PageRank): Neo4j, a NoSQL graph database
- Graph pattern search (SPARQL): triple stores such as Virtuoso (OpenLink Software)
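Two of the graph-mining operations named above, breadth-first shortest path and PageRank, are small enough to sketch in plain Python instead of Neo4j. The five-node graph is hypothetical; the damping factor 0.85 and 50 power iterations are conventional assumptions, not values from the slides.

```python
from collections import deque

# Hypothetical directed entity graph as adjacency lists.
GRAPH = {
    "drug": ["disease", "protein"],
    "protein": ["gene"],
    "gene": ["disease"],
    "disease": ["side_effect"],
    "side_effect": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest node path, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:          # walk parent links back to start
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; dangling nodes spread their rank uniformly,
    so the ranks always sum to 1."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in graph[v]:
                new[w] += damping * rank[v] / len(graph[v])
        rank = new
    return rank


if __name__ == "__main__":
    print(shortest_path(GRAPH, "drug", "side_effect"))
    print(pagerank(GRAPH))
```

Both are whole-graph traversals over the instance graph, which is why they fit a graph database better than pattern-oriented SPARQL queries.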

Use Case: Individualized Cohort in EHR
- An EHR-based individualized cohort can be a better basis for decisions than standard guidelines, because the cohort is drawn from patients with the same geolocation, demographics, and socio-economic group as the given patient.
- EHRs are organized around the patient, not around concepts (diseases, lab results, medications, etc.).
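Because EHRs are patient-centric, finding a cohort first requires a concept-centric view. The sketch below inverts patient → concepts into concept → patients and then keeps only patients who match the index patient on a few grouping keys. The record layout and the keys `zip3`, `age_band` and `income_band` are hypothetical stand-ins for geolocation, demographics and socio-economic group.

```python
from collections import defaultdict

# Hypothetical patient-centric records: each record is organized around one patient.
PATIENTS = {
    "A": {"zip3": "462", "age_band": "50-59", "income_band": "mid",
          "concepts": {"ICD10:E11.21", "lab:serum_glucose"}},
    "B": {"zip3": "462", "age_band": "50-59", "income_band": "mid",
          "concepts": {"ICD10:E11.21"}},
    "C": {"zip3": "462", "age_band": "60-69", "income_band": "mid",
          "concepts": {"ICD10:E11.21"}},
    "D": {"zip3": "462", "age_band": "50-59", "income_band": "mid",
          "concepts": {"ICD10:E11.22"}},
}


def concept_index(patients):
    """Invert patient -> concepts into concept -> patients (concept-centric view)."""
    index = defaultdict(set)
    for pid, record in patients.items():
        for concept in record["concepts"]:
            index[concept].add(pid)
    return index


def individualized_cohort(patients, index_patient, concept,
                          keys=("zip3", "age_band", "income_band")):
    """Patients who share `concept` and every grouping key with the index patient."""
    target = patients[index_patient]
    candidates = concept_index(patients)[concept] - {index_patient}
    return sorted(pid for pid in candidates
                  if all(patients[pid][k] == target[k] for k in keys))


if __name__ == "__main__":
    print(individualized_cohort(PATIENTS, "A", "ICD10:E11.21"))
```

Exact matching on every key is the simplest possible cohort rule; the next slides explain why real queries need hierarchy- and range-aware matching instead.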

Use Case: Individualized Cohort in EHR
- EHR data contains controlled vocabularies (e.g., demographics, diagnostic codes, medications, procedures) and continuous values (e.g., lab tests, medication doses).
- Category hierarchy (parent, siblings, subtrees): search for patients like a given diagnosis, e.g., “ICD10:E11.21” (diabetes with nephropathy) → ICD10:E11.22 (with chronic kidney disease) → ICD10:E11 (diabetes in general).
- Continuous values: e.g., serum glucose = 120 mg/dL (many continuous values may not have a natural aggregate binning).
- Patient-search queries are rarely exact (fasting serum glucose = 126 → serum glucose between 120 and 130, or serum glucose in the 80th percentile at this time).
- A patient can have 100 to 100,000 property values, for example 100 controlled-vocabulary values and 1,000 continuous values. Most values are time-based.
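The parent/sibling/subtree search over diagnosis codes can be sketched by deriving ancestors from the code itself. This assumes ICD-10-CM-style codes written without the `ICD10:` prefix, where dropping trailing characters gives the parent (E11.21 → E11.2 → E11); a production system would read the hierarchy from the terminology rather than from string structure. The patient data is hypothetical.

```python
def ancestors(code):
    """Ancestors of an ICD-10-CM-style code by truncation, nearest first.
    e.g. "E11.21" -> ["E11.2", "E11"]; a bare category has no ancestors."""
    category, _, sub = code.partition(".")
    result = [f"{category}.{sub[:i]}" for i in range(len(sub) - 1, 0, -1)]
    if sub:
        result.append(category)
    return result


def like_patients(diagnoses, code, level=1):
    """Patients whose diagnosis lies in the subtree of the `level`-th ancestor
    of `code` (level=1: siblings under the parent; level=2: the grandparent)."""
    anchor = ancestors(code)[level - 1]
    return sorted(pid for pid, dx in diagnoses.items()
                  if dx == anchor or anchor in ancestors(dx))


# Hypothetical one-diagnosis-per-patient data.
DIAGNOSES = {"A": "E11.21", "B": "E11.22", "C": "E11.9", "D": "I10"}

if __name__ == "__main__":
    print(like_patients(DIAGNOSES, "E11.21", level=1))   # siblings of E11.21
    print(like_patients(DIAGNOSES, "E11.21", level=2))   # all of E11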

Challenges
Searching challenges:
- Category hierarchy (parent, siblings, subtrees): search for patients like a given diagnosis, e.g., “ICD10:E11.21” (diabetes with nephropathy) → ICD10:E11.22 (with chronic kidney disease) → ICD10:E11 (diabetes in general).
- Continuous values: e.g., serum glucose = 120 mg/dL (many continuous values may not have a natural aggregate binning).
- Patient-search queries are rarely exact (fasting serum glucose = 126 → serum glucose between 120 and 130, or serum glucose in the 80th percentile at this time).
- Map changes in value to changes in time: e.g., search for patients with a 60th-to-90th-percentile transition between two serum glucose values within a 6-month time frame. If there are N glucose values, comparing any two patients requires N*(N-1)/2 time-based glucose-value comparisons. How can this scale up?
- Find common patterns across a set of individualized-cohort patients. This means comparing combinations of subsets of millions of differentials for each patient in the cohort.
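The time-based transition search can be sketched as a brute-force scan over all time-ordered value pairs, which makes the N*(N-1)/2 cost visible. Several assumptions are made here: the percentile rank is the share of population values at or below a value, a "60th-to-90th" transition means a rank of at most 60 followed by a rank of at least 90, and 6 months is taken as 183 days. All data is hypothetical.

```python
from bisect import bisect_right
from datetime import date


def percentile_rank(population, value):
    """Percent of population values <= value (one simple, assumed definition).
    Re-sorts on every call for clarity, not speed."""
    ordered = sorted(population)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


def pair_count(n):
    """Number of time-ordered value pairs for a patient with n measurements."""
    return n * (n - 1) // 2


def transitions(series, population, low=60.0, high=90.0, window_days=183):
    """Brute force over all pair_count(len(series)) pairs: return (t1, t2) where
    t1 is not later than t2, t2 - t1 <= window_days, rank(t1) <= low and
    rank(t2) >= high."""
    ranked = sorted((t, percentile_rank(population, v)) for t, v in series)
    hits = []
    for i in range(len(ranked)):
        for j in range(i + 1, len(ranked)):
            (t1, p1), (t2, p2) = ranked[i], ranked[j]
            if (t2 - t1).days <= window_days and p1 <= low and p2 >= high:
                hits.append((t1, t2))
    return hits


if __name__ == "__main__":
    population = [80 + i for i in range(100)]  # hypothetical glucose values, mg/dL
    series = [(date(2024, 1, 1), 130), (date(2024, 3, 1), 175), (date(2024, 12, 1), 178)]
    print(transitions(series, population))
    print(pair_count(1000))                    # pairs for 1,000 values of one lab
```

The quadratic pair count per patient, multiplied across a cohort, is the scaling problem the slide raises; indexing or pre-binning the values would be needed to avoid the full scan.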

Relational Database Semantic Graph Paradigm shift from relational row-column lookup to semantic graph traversal Relational Database is less efficient in joins, Big indexing overhead (need to indexing every column)

EHR RDF Graph
- Patient EHR data in a semantic graph representation: the EHR timelines for Patients A and B are shown as RDF graphs.
- The property values of each patient (demographics, labs, diagnoses, etc.) are connected to their respective ontologies.
- This enables searching for patterns across different patients.
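Pattern search across patients amounts to matching a basic graph pattern, the conjunction of triple patterns at the core of SPARQL. The matcher below is a minimal pure-Python sketch of that idea, not a SPARQL engine; the `ex:` and `icd:` vocabulary is hypothetical, while `rdfs:subClassOf` is the standard RDFS property linking a code to its parent class.

```python
# Hypothetical patient triples linked to an ICD ontology fragment.
TRIPLES = {
    ("ex:patientA", "ex:hasDiagnosis", "icd:E11.21"),
    ("ex:patientB", "ex:hasDiagnosis", "icd:E11.22"),
    ("ex:patientC", "ex:hasDiagnosis", "icd:I10"),
    ("icd:E11.21", "rdfs:subClassOf", "icd:E11.2"),
    ("icd:E11.22", "rdfs:subClassOf", "icd:E11.2"),
    ("ex:patientA", "ex:gender", "female"),
    ("ex:patientB", "ex:gender", "female"),
}


def is_var(term):
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:   # variable already bound differently
                return None
        elif p != t:                        # constant term does not match
            return None
    return new


def query(patterns, triples):
    """Evaluate a basic graph pattern: join triple patterns left to right."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for t in triples:
                m = match(pattern, t, b)
                if m is not None:
                    extended.append(m)
        bindings = extended
    return bindings


if __name__ == "__main__":
    bgp = [("?p", "ex:hasDiagnosis", "?d"),
           ("?d", "rdfs:subClassOf", "icd:E11.2"),
           ("?p", "ex:gender", "female")]
    print(sorted(b["?p"] for b in query(bgp, TRIPLES)))
```

The second pattern only works because diagnoses point into the ontology; with bare code strings the two patients would not be connected.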

EHR RDF Graph
Applying continuous-value classes enriches the set of patients retrieved from the database.
2A. Property values stored as literal nodes do not link “like” patients together without a “relational” query.
2B. Using controlled vocabulary (CV) ontology edges, we can link patients through CV-value nodes.
2C. Adding “nearby” classes to continuous-value nodes links additional patients. Different strategies create different “nearby” links.
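Strategies 2A and 2C can be contrasted with a small binning sketch. The assumptions here are ours: glucose values are mapped to fixed 10 mg/dL class nodes, and "nearby" means the same class or an adjacent one; the slide notes that other strategies would create other links. Patient values are hypothetical.

```python
def bin_of(value, width=10):
    """Map a continuous value to a class node, e.g. 124 -> 'glucose:120-129'."""
    low = int(value // width) * width
    return f"glucose:{low}-{low + width - 1}"


def nearby(bin_label, width=10):
    """The two adjacent class nodes of a bin (one assumed 'nearby' strategy)."""
    low = int(bin_label.split(":")[1].split("-")[0])
    return {f"glucose:{l}-{l + width - 1}" for l in (low - width, low + width)}


def like_patients(values, patient, strategy="same_bin", width=10):
    """Patients linked to `patient` under one linking strategy:
    'literal'  (2A): only identical literal values match;
    'same_bin' (2C): values falling in the same class node;
    'nearby'   (2C): the same class node or an adjacent one."""
    if strategy == "literal":
        return sorted(p for p, v in values.items()
                      if p != patient and v == values[patient])
    target = bin_of(values[patient], width)
    classes = {target} if strategy == "same_bin" else {target} | nearby(target, width)
    return sorted(p for p, v in values.items()
                  if p != patient and bin_of(v, width) in classes)


if __name__ == "__main__":
    glucose = {"A": 124, "B": 127, "C": 133, "D": 160}   # hypothetical, mg/dL
    for s in ("literal", "same_bin", "nearby"):
        print(s, like_patients(glucose, "A", s))
```

Each strategy widens the set of linked patients, which matches the slide's point that the choice of "nearby" links changes what a graph query retrieves.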

Challenges: Semantic Graph Mining
- Graph indexing: gIndex (indexing frequent subgraphs, using subgraphs as features)
- Graph classification and clustering
- Path-based clustering and top-k similarity problems in heterogeneous information networks
- Path-based graph mining: complex dependencies within a heterogeneous network, whereas conventional supervised classification methods assume that the objects are independent
- Sequential matching vs. snapshot matching, since EHR records have a time dimension
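One published measure for path-based top-k similarity in heterogeneous information networks is PathSim (Sun et al., 2011). For a symmetric meta path with commuting matrix M, it scores s(x, y) = 2·M[x][y] / (M[x][x] + M[y][y]). The sketch applies it to an assumed Patient-Diagnosis-Patient meta path over hypothetical data; the slides do not say which similarity measure the author uses.

```python
# Hypothetical patient -> diagnosis-code incidence (a bipartite, heterogeneous network).
DIAGNOSES = {
    "A": {"E11.21", "I10", "E78.5"},
    "B": {"E11.21", "I10"},
    "C": {"E11.21", "I10", "E78.5", "N18.9"},
    "D": {"J45"},
}


def commuting_matrix(incidence):
    """M = W W^T for meta path Patient-Diagnosis-Patient:
    M[x][y] = number of paths x -> code -> y = |codes(x) & codes(y)|."""
    return {x: {y: len(incidence[x] & incidence[y]) for y in incidence}
            for x in incidence}


def pathsim(m, x, y):
    """PathSim: 2 * M[x][y] / (M[x][x] + M[y][y]), or 0.0 for empty rows."""
    denom = m[x][x] + m[y][y]
    return 2 * m[x][y] / denom if denom else 0.0


def top_k(incidence, x, k=2):
    """Top-k most similar patients to x; ties broken by patient ID."""
    m = commuting_matrix(incidence)
    scored = sorted((-pathsim(m, x, y), y) for y in m if y != x)
    return [y for _, y in scored[:k]]


if __name__ == "__main__":
    print(top_k(DIAGNOSES, "A"))
```

Normalizing by each patient's own path count keeps patients with very many diagnoses from looking similar to everyone, which is the main reason PathSim is preferred over raw path counts.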

Linked Open Data

Challenges for Semantic Web
- How to handle the ontology graph + the instance graph
- How to handle inferred triples vs. existing triples (reasoning)
- Graph pattern search vs. graph mining
- Datatype properties vs. object properties
- Different levels of semantics: ontology (schema), categorized values (terminology), continuous values (binning?), literals
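One way to handle inferred vs. existing triples is to materialize inferences with forward chaining and keep them in a separate set. The sketch applies two standard RDFS entailment rules (rdfs9: type propagation up `rdfs:subClassOf`; rdfs11: `rdfs:subClassOf` transitivity) until nothing new is derived. The `ex:`/`icd:` data is hypothetical, and a real triple store would use its own reasoner.

```python
SUB, TYPE = "rdfs:subClassOf", "rdf:type"


def rdfs_closure(asserted):
    """Forward-chain two RDFS rules to a fixpoint and return ONLY the inferred
    triples, kept apart from the asserted ones:
      rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
      rdfs9:  (x type A), (A subClassOf B)       => (x type B)"""
    known = set(asserted)
    while True:
        sub = {(s, o) for s, p, o in known if p == SUB}
        new = {(a, SUB, c) for a, b in sub for b2, c in sub if b == b2}
        new |= {(x, TYPE, b) for x, p, a in known if p == TYPE
                for a2, b in sub if a == a2}
        new -= known
        if not new:                      # fixpoint: nothing left to derive
            return known - set(asserted)
        known |= new


# Hypothetical asserted triples: one diagnosis instance plus an ontology fragment.
ASSERTED = {
    ("ex:dx1", TYPE, "icd:E11.21"),
    ("icd:E11.21", SUB, "icd:E11.2"),
    ("icd:E11.2", SUB, "icd:E11"),
}

if __name__ == "__main__":
    for triple in sorted(rdfs_closure(ASSERTED)):
        print(triple)
```

Keeping the inferred set separate means it can be dropped and recomputed when the ontology (schema) graph changes, without touching the asserted instance data.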