Translational Medicine from a Semantic Web Perspective Eric Neumann W3C June 16, 2006.


2 Translational Medicine from a Semantic Web Perspective Eric Neumann W3C June 16, 2006

3 2 Drug Discovery and Medicine Hygieia, G. Klimt Health Practice Safety Prevention Privacy Knowledge

4 3 Data Expansion Large Data Sets Variables >> Samples Many New Data Types Which Formats? Combine

5 4 Where Information Advances are Most Needed Supporting Innovative Applications in R&D –Translational Medicine (Biomarkers) –Molecular Mechanisms (Systems) –Data Provenance, Rich Annotation Clinical Information –eHealth Records, EDC, Clinical Submission Documents –Safety Information, Pharmacovigilance, Adverse Events, Biomarker data Standards –Central Data Sources Genomics, Diseases, Chemistry, Toxicology –MetaData Ontologies Vocabularies

6 5 Knowledge --is the human acquired capacity (both potential and actual) to take effective action in varied and uncertain situations. How does this translate into using Information Systems better in support of Innovation?

7 6 Knowledge Predictiveness Knowledge of Target Mechanisms Knowledge of Toxicity Knowledge of Patient-Drug Profiles Drug Discovery Challenges

8 7 Current Challenges: Drug Discovery Business –Costly, lengthy drug discovery process (12-14 years) –Poor funding to find new uses for existing therapies (ie antibiotics) –Insufficient economic drivers for certain disease areas –Discovery and clinical trials design not well aligned with anticipating adverse effect detection Post-launch surveillance is weak Science & Technology –Counteracting the legacy of Silos –How to break away from the DD conveyor belt model to the Translation model gaining and sharing insights throughout the process –The Benefit of New Targets for New Diseases –How to best identify safety and efficacy issues early on, so that cost and failure are reduced A D3 Knowledge-base: Drugability and Safety

9 8 The Big Picture - Hard to understand from just a few Points of View

10 9

11 10 Complete view tells a very different Story

12 11 Distributed Nature of R&D Silos of Data…

13 12 Static, Untagged, Disjoint Existing Web Data Throttles the R&D Potential R&D Scientist Integrating Data Manually LIMS Bioinformatics Cheminformatics Public Data Sources

14 13 Data Integration: Biology Requirements Disease Proteins Genes Papers Retention Policy Audit Trail Curation Tools Ontology Experiment Assays Compounds

15 14 Semantic Web Data Integration R&D Scientist Bioinformatics Cheminformatics LIMS Public Data Sources Dynamic, Linked, Searchable

16 15 Decision Support Translational Research Toxicity New Applications Safety Target Validation Biomarker Qualification GO BioPAX ICH Raw Data MAGE-ML ASN.1 XLS PSI XML CSV SAS Tables CDISC Semantic Bridge

17 16 Key Technologies Pharmaceuticals Use to Exchange Knowledge

18 17 New Regulatory Issues Confronting Pharmaceuticals from Innovation or Stagnation, FDA Report March 2004 Tox/Efficacy ADME Optim

19 18 Key Functionality Ubiquity –Same identifiers for anything from anywhere Discoverability –Global search on any entity Interoperability –=> Application independence: Recombinant Data

20 19 Additional Functionality Provenance –Origin and history of data and annotations Scalability –Over all potentially relevant data and content Authentication/Security –Single user and team identity and granular data security –Non-repudiation of authorship –Encryption of graphs –Policy Awareness Data Preservation –Long-term persistence by minimizing API needs

21 20 Translational Research and Personalized Medicine Research Practice Clinical Biological Personalized Medicine Translational Medicine -Two significant areas of HCLS activity - Span most areas of activity

22 21 HCLS Framework: Biomedical Research Molecular, Cellular and Systems Biology/Physiology –Organism as an integrated and interacting network of genes, proteins and biochemical reactions –Human body as a system of interacting organs Molecular Cell Biology/Genomic and Proteomic Research –Gene Sequencing, Genotyping, Protein Structures –Cell Signaling and other Pathways Biomarker Research –Discovery of genes and gene products that can be used to measure disease progression or impacts of drug Pharmaco-genomics –Impact of genetic inheritance on Drug Discovery and Translational Research –Use of preclinical research to identify promising drug candidates

23 22 HCLS Framework: Clinical Research Clinical Trials –Determination of efficacy, impact and safety of drugs for particular diseases Pharmaco-vigilance/ADE Surveillance –Monitoring of impacts of drugs on patients, especially safety and adverse event related information Patient Cohort Identification and Management –Identifying patient cohorts for drug trials is a challenging task Translational Research –Test theories emerging from pre-clinical experimentation on disease affected human subjects Development of EHRs/EMRs for both clinical research and practice –Currently EHRs/EMRs focussed on clinical workflow processes –Re-using that information for clinical research and trials is a challenging task

24 23 Translational Medicine Enable physicians to more effectively translate relevant findings and hypotheses into therapies for human health Support the blending of huge volumes of clinical research and phenotypic data with genomic research data Apply that knowledge to patients and finally make individualized, preventative medicine a reality for diseases that have a genetic basis

25 24 Translational Research Improve communication between basic and clinical science so that more therapeutic insights may be derived from new scientific ideas - and vice versa. Testing of theories emerging from preclinical experimentation on disease-affected human subjects. Information obtained from preliminary human experimentation can be used to refine our understanding of the biological principles underpinning the heterogeneity of human disease and polymorphism(s). Reference NIH Digital Roadmap activity

26 25 Clinical Practice Electronic Medical/Health Record –Integration of Structured and Unstructured Information –Design of EHRs/EMRs for both clinical research and practice Computerized Physician Order Entry –Computerized aids for submitting medication and lab orders Clinical Decision Support –Physician perspective: Therapeutic Decision Support, Drug-Drug Interactions Structured Clinical Documentation –Templated forms to aid structured observation capture and storage into the electronic medical record Enterprise Terminological Services –Standardization of definitions and codes for conditions, findings, observations, labs, therapies, diagnoses, etc. Disease Management –Portals containing information relevant to a particular disease condition, e.g., diabetes Personalized Medicine –Personalizing therapeutic recommendations based on genetic profile of patient

27 26 Public and Consumer Health Epidemiology/Bio-surveillance –Monitoring of disease occurrences for unusual patterns –Indicative of epidemics, terrorist attacks Bio-sensors –E.g., detection of cancer causing agents in ground water (ORNL) Consumer Health Portals –Health Information Prescription –Electronic Prescription Disease Management –Portals containing reminders and alerts for patients for upcoming physicals, labs, etc. Personalized Health Records –Presentation of patient health related information in a language understandable to the lay person Clinical Decision Support: –Patient Perspective – help in choosing a good doctor –Population perspective – help deploy appropriate resources in appropriate areas

28 27 Personalized Medicine Propagation of insights from Genomic research into clinical practice Impact of new Molecular diagnostic tests hitting the market –How can they be incorporated into clinical care? –How does one update current clinical guidelines to incorporate the use of these tests –How can one enable novel clinical decision support? How can phenotypic characteristics and genomic markers be used to: –Stratify patient populations –Personalize clinical care Genetic test results as risk factors Therapeutic use of genomic markers

29 28 The Bench Bedside Vision Healthcare and Life Sciences: Framework –An informatician's viewpoint Current Challenges The Healthcare Life Sciences (HCLS) Ecosystem HCLS Ecosystem Business Drivers

30 29 Ecosystem: Current State Pharmaceutical Companies Clinical Research Organizations (CROs) FDA National Institutes Of Health Hospitals Universities, Academic Medical Centers (AMCs) Characterized by silos with uncoordinated supply chains leading to inefficiencies in the system Center for Disease Control Hospitals Doctors Payors Patients Patients, Public Patients Biomedical Research Clinical Practice Clinical Trials/Research Clinical Practice

31 30 Ecosystem: Goal State /* Need to expand this to include Healthcare and Biomedical Research Players as well… Show an integrated picture with continuous information flow */ /* Need to expand this with Biomedical Research + Clinical Practice */ Biomedical Research Clinical Practice

32 31 Ecosystem: Goal State Patients, Public Hospitals Doctors Payors CDC CROs Pharmaceutical Companies FDA NIH (Research) Universities, AMCs (1) (2) (3) (4) (5) (6) Synergies (1) NIH and FDA knowledge sharing on biomedical research (2) FDA and CDC knowledge sharing on ADE monitoring + epidemics (3) Pharma, CDC knowledge sharing on epidemic outbreaks (4) AMCs, Pharma knowledge sharing on clinical and biomedical research (5) Universities/AMCs, CROs knowledge sharing on clinical research and trials (6) CROs, Hospitals knowledge sharing on identifying patient cohorts (7) Pharma, Hospitals knowledge sharing on post market drug surveillance (8) Payors get information regarding new conditions, drug efficacies from FDA and CDC (7) (8) From FDA, CDC

33 32 Use Case Flow: Drug Discovery and Development Qualified Targets Lead Generation Toxicity & Safety Biomarkers Pharmacogenomics Clinical Trials Molecular Mechanisms Lead Optimization KDKD

34 33 Drug Discovery & Development Knowledge Qualified Targets Lead Generation Toxicity & Safety Biomarkers Pharmacogenomics Clinical Trials Molecular Mechanisms Lead Optimization Launch

35 34 London Underground App View

36 35 Semantic Web Drug DD Application Space Genomics Therapeutics Biology HTS NDA Compound Opt safety eADME DMPK informatics manufacturing genes Clinical Studies Patent Chem Lib Production Critical Path

37 36 Opportunities for Semantics in HealthCare Enhanced interoperability via: –Semantic Tagging –Grounding of concepts in Standardized Vocabularies –Complex Definitions Semantics-based Observation Capture Inference on Diseases –Phenotypes –Genetics –Mechanisms Semantics-based Clinical Decision Support –Guided Data Interpretation –Guided Ordering Semantics-based Knowledge Management

38 37 Text Unstructured Data Types Structured and Complex Data Types Histology Profiling Data Semantics in the Life Sciences Publications Image + Text Publications + data Text + data items genomics Gene expression Data Items Clinical Findings Categorical Taxonomic Data Items Pathways, Biomarkers Complex Objects Clinical trials Complex Objects with Categorical/Taxonomic Data Items Systems Biology Composite Objects with Embedded process

39 38 DB Mapping from Current Formats

40 39 RDB => RDF Virtualized RDF

41 40 Excel => RDF ls:indivCell ${ rdf:type ls:GE_Cell; ls:probeHub gl:CASP2; ls:GE_Expected_Ratio "0.2726"; ls:conditionHub gl:BREAST_MALIGNANT } ; ls:indivCell ${ rdf:type ls:GE_Cell; ls:probeHub gl:TNFRS; ls:GE_Expected_Ratio "0.0138"; ls:conditionHub gl:BREAST_MALIGNANT } ; ls:indivCell ${ rdf:type ls:GE_Cell; ls:probeHub gl:CASP2; ls:GE_Expected_Ratio "0.1275"; ls:conditionHub gl:BREAST_NORMAL } ; Casp2 TNFRS Breast Malig
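
The spreadsheet-to-RDF mapping on this slide can be sketched in Python. This is only an illustration, not the tool used in the talk: the grid layout and the function name `grid_to_triples` are assumptions, while the `ls:` and `gl:` names and the three ratios are taken from the slide. The output uses ordinary Turtle-style blank-node brackets in place of the `${ }` notation shown.

```python
# Sketch: turn a probe-by-condition spreadsheet grid into RDF-style statements.
# The grid layout and helper name are hypothetical; property names follow the slide.

def grid_to_triples(probes, conditions, ratios):
    """Yield one Turtle-like blank-node description per non-empty cell.

    probes      -- row labels, e.g. ["CASP2", "TNFRS"]
    conditions  -- column labels, e.g. ["BREAST_MALIGNANT", "BREAST_NORMAL"]
    ratios      -- dict mapping (probe, condition) to the cell value as a string
    """
    for probe in probes:
        for condition in conditions:
            value = ratios.get((probe, condition))
            if value is None:
                continue  # empty cell: emit nothing
            yield (
                "ls:indivCell [ rdf:type ls:GE_Cell ; "
                f"ls:probeHub gl:{probe} ; "
                f'ls:GE_Expected_Ratio "{value}" ; '
                f"ls:conditionHub gl:{condition} ] ;"
            )

# The three cells shown on the slide.
cells = {
    ("CASP2", "BREAST_MALIGNANT"): "0.2726",
    ("TNFRS", "BREAST_MALIGNANT"): "0.0138",
    ("CASP2", "BREAST_NORMAL"): "0.1275",
}
statements = list(
    grid_to_triples(["CASP2", "TNFRS"], ["BREAST_MALIGNANT", "BREAST_NORMAL"], cells)
)
for line in statements:
    print(line)
```

Each cell becomes its own node that points at a probe and a condition, so the row and column meaning travels with the value instead of staying implicit in the sheet layout.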

42 41 Use-Case: Semantics of Multivariate Analysis Column Semantic Row Semantic Make the Row and Column types explicit and universal. Link Entities to unique web resources Include experimental Metadata

43 42 Use-Case: COSA Row Semantic Data Set Column Semantic

44 43 Use-Case: Experimental Design Definition Treatment W Control Time Points Staining Visible Microscopy Fluorescent Microscopy Cultured Cells Treatment Z Image Analysis

45 44 Case Study: Drug Safety - Safety Lenses Lenses can focus data in specific ways –Hepatotoxicity, genotoxicity, hERG, metabolites Can be wrapped around statistical tools Aggregate other papers and findings (knowledge) in context with a particular project Align animal studies with clinical results Support special Alert-channels by regulators for each different toxicity issue Integrate JIT information on newly published mechanisms of action

46 45 Courtesy of BG-Medicine Example: Knowledge Aggregation

47 46 Case Study: Omics ApoA1 … … is produced by the Liver … is expressed less in Atherosclerotic Liver … is correlated with DKK1 … is cited regarding Tangiers disease … has Tx Reg elements like HNFR1 Subject Verb Object

48 47 Scenario: Kinase Targets Chemical Knowledgebase (Hits, Leads, NCE) –Classification of all chemical entities by classified targets: VEGFR, HER2, EGFR, c-ABL, P38, CDK2, GSK3b, AKT, PKC, ERK1/2, JAK, JNK MEK, PDK1, SYK, PI3K –Secondary Targets (when known) –Molecular interaction models –Integrate external databases –Manage scientific literature –Cross-map/cluster chemical properties –Known and applied synthesis methods –Toxicity (mol and cell markers) and ADME Link in all screening data and analyses –Mine patterns from across projects –Connect team and expert information Chemical Annotations –Common interface for project teams –Maintain strategy and challenges record

49 48 Scenario: Biomarker Qualification Biomarker Roles –Disease –Toxicity –Efficacy Molecular and cytological markers –Tissue-specific –High content screening derived information –Different sets associated with different predictive tools Statistical discrimination based on selected samples –Predictive power –Alternative cluster prediction algorithms –Support qualifications from multiple studies (comparisons) Causal mechanisms –Pathways –Population variation

50 49 BioMarker Semantics Disease Pathways Significance & Strength +Samples-Samples Biomarker Set

51 50 Scenario: Toxicity Mechanisms –Tissue-selective, Species-specific –Pathways, Off-Targets –Metabolites, PK sensitivity Evidence –Biomarkers In vitro assays (cell lines), Animal models, Clinical Phase 1 –Literature Population Variation –Drug Metabolism to toxic forms (CYP, SULT, UGT) –Target interaction variability –Potential vs. Demonstrated Predictions –Data Mining Patterns –Computational Modeling Working Solutions –Chemical modifications –Dosing, Reformulation –Documented animal human similarity and variation

52 51 Knowledge Mining using Semantic Web Gene Prioritization through Data Fusion - Aerts et al., 2006, Nature Biotechnology -Use of quantitative and qualitative information for statistical ranking. -Can be used to identify novel genes involved in diseases

53 52 Dishevelled to GSK3beta IRREVERSIBLE-LEFT-TO-RIGHT INHIBITION Case Study: BioPAX (Pathways)

54 53 Dishevelled to GSK3beta IRREVERSIBLE-LEFT-TO-RIGHT INHIBITION Case Study: BioPAX (Pathways)

55 54 Dishevelled to GSK3beta IRREVERSIBLE-LEFT-TO-RIGHT INHIBITION Case Study: BioPAX (Pathways) Modulation CHIR99102 affectedBy

56 55 Potential Linked Clinical Ontologies Clinical Trials ontology RCRIM (HL7) Genomics CDISC IRB Applications Molecules Clinical Obs ICD10 Pathways (BioPAX) Disease Models Extant ontologies Mechanisms Under development Bridge concept SNOMED Disease Descriptions Tox

57 56 Case Study: Drug Discovery Dashboards Dashboards and Project Reports Next generation browsers for semantic information via Semantic Lenses Renders OWL-RDF, XML, and HTML documents Lenses act as information aggregators and logic style-sheets add { ls:TheraTopic hs:classView:TopicView }

58 57 Drug Discovery Dashboard Topic: GSK3beta Topic Target: GSK3beta Disease: DiabetesT2 Alt Dis: Alzheimers Cmpd: SB44121 CE: DBP Team: GSK3 Team Person: John Related Set Path: WNT

59 58 Bridging Chemistry and Molecular Biology urn:lsid:uniprot.org:uniprot: P49841 Semantic Lenses: Different Views of the same data Apply Correspondence Rule: if ?target.xref.lsid == ?bpx:prot.xref.lsid then ?target.correspondsTo.?bpx:prot BioPax Components Target Model
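
The correspondence rule above can be sketched in Python as a join on shared cross-reference identifiers. The record shapes, the function name `apply_correspondence`, and every identifier other than the UniProt LSID are assumptions made for illustration; the slide only states the rule itself.

```python
# Sketch of the correspondence rule: link a target-model entry to a BioPAX
# protein when both carry the same LSID cross-reference.
# Record shapes and identifiers other than the LSID are hypothetical.

from collections import defaultdict

def apply_correspondence(targets, biopax_proteins):
    """Return (target_id, "correspondsTo", protein_id) triples for shared LSIDs."""
    # Index proteins by each LSID they reference.
    by_lsid = defaultdict(list)
    for protein in biopax_proteins:
        for lsid in protein["xref"]:
            by_lsid[lsid].append(protein["id"])

    links = []
    for target in targets:
        for lsid in target["xref"]:
            for protein_id in by_lsid.get(lsid, []):
                links.append((target["id"], "correspondsTo", protein_id))
    return links

GSK3B_LSID = "urn:lsid:uniprot.org:uniprot:P49841"  # LSID shown on the slide

targets = [{"id": "target:GSK3beta", "xref": [GSK3B_LSID]}]
biopax_proteins = [
    {"id": "bpx:prot_1", "xref": [GSK3B_LSID]},
    {"id": "bpx:prot_2", "xref": ["urn:lsid:example.org:protein:other"]},
]
links = apply_correspondence(targets, biopax_proteins)
print(links)
```

The link asserts only that the two descriptions share a reference, which matches the point that correspondence need not mean the same descriptive object.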

60 59 Lenses can aggregate, accentuate, or even analyze new result sets Behind the lens, the data can be persistently stored as RDF-OWL Correspondence does not need to mean same descriptive object, but may mean objects with identical references Bridging Chemistry and Molecular Biology

61 60 Pathway Polymorphisms Merge directly onto pathway graph Identify targets with lowest chance of genetic variance Predict parts of pathways with highest functional variability Map genetic influence to potential pathway elements Select mechanisms of action that are minimally impacted by polymorphisms Non-synonymous polymorphisms from db-SNP

62 61 Knowledge Channels High Mda-7 expression promotes malignant cell survival and p38 MAP kinase activation in chronic lymphocytic leukemia. Posted by hannahr to CLLSignalling&Processes on Thu Jan hannahr T11:24:03Z CLLSignalling&Processes High Mda-7 expression promotes malignant cell survival and p38 MAP kinase activation in chronic lymphocytic leukemia. A Sainz-Perez H Gary-Gouy PMID: Leukemia

63 62 Knowledge Channels High Mda-7 expression promotes malignant cell survival and p38 MAP kinase activation in chronic lymphocytic leukemia. Posted by hannahr to CLLSignalling&Processes on Thu Jan hannahr T11:24:03Z CLLSignalling&Processes Giles Day pf#P38 pf#Kinases This paper suggests a mechanism for P38 protection of CLL B-cells High Mda-7 expression promotes malignant cell survival and p38 MAP kinase activation in chronic lymphocytic leukemia. A Sainz-Perez H Gary-Gouy PMID: Leukemia P38 paper N251 Giles Day pf#P38 Pf#Kinases nugget expert topic kChannel

64 63 Case Study: Drug Safety - Safety Lenses Lenses can focus data in specific ways –Hepatotoxicity, genotoxicity, hERG, metabolites Can be wrapped around statistical tools Aggregate other papers and findings (knowledge) in context with a particular project Align animal studies with clinical results Support special Alert-channels by regulators for each different toxicity issue Integrate JIT information on newly published mechanisms of action

65 64 Excel => RDF ls:indivCell ${ rdf:type ls:GE_Cell; ls:probeHub gl:CASP2; ls:GE_Expected_Ratio "0.2726"; ls:conditionHub gl:BREAST_MALIGNANT } ; ls:indivCell ${ rdf:type ls:GE_Cell; ls:probeHub gl:TNFRS; ls:GE_Expected_Ratio "0.0138"; ls:conditionHub gl:BREAST_MALIGNANT } ; ls:indivCell ${ rdf:type ls:GE_Cell; ls:probeHub gl:CASP2; ls:GE_Expected_Ratio "0.1275"; ls:conditionHub gl:BREAST_NORMAL } ; Casp2 TNFRS Breast Malig

66 65 GeneLogic GeneExpress Data Additional relations and aspects can be defined Diseased Tissue Links to OMIM (RDF)

67 66 Bar View of GeneExpress

68 67 ClinDash: Clinical Trials Browser Clinical Obs Expression Data Subjects Values can be normalized across all measurables (rows) Samples can be aligned to their subjects using RDF rules Clustering can now be done over all measurables (rows)

69 68

70 69

71 70

72 71

73 72 Data Integration: Chemistry Requirements Sample Target Biomarker Experiment Project Paper Compounds Scientists Issues Annotation Pathway Disease A Single Compound

74 73 Reporting on Progression Notify Others of Decisions Progression Manager Found Determinations Noted Alternatives Scientist Toxicogenomicist Shared Annotations Notified of Alternatives Semantic Web in R&D A Single Compound Open Data Format and Flexible Linking Enabled Data Integration and Collaboration

75 74 Progression Manager Project Dashboard Scientist R&D Commons Toxicogenomicist Experiment Manager A Single Compound R&D Applications in the Semantic Web

76 HealthCare and Life Sciences IG

77 76 W3C Launches Semantic Web for HealthCare and Life Sciences Interest Group Interest Group formally launched Nov 2005: First Domain Group for W3C - …take SW through its paces An Open Scientific Forum for Discussing, Capturing, and Showcasing Best Practices Recent life science members: Pfizer, Merck, Partners HealthCare, Teranode, Cerebra, NIST, U Manchester, Stanford U, AlzForum SW Supporting Vendors: Oracle, IBM, HP, Siemens, AGFA, Co-chairs: Dr. Tonya Hongsermeier (Partners HealthCare); Eric Neumann (Teranode)

78 77 HCLS Objectives Share use cases, applications, demonstrations, experiences Exposing collections Developing vocabularies Building / extending (where appropriate) core vocabularies for data integration

79 78 HCLS Activities BioRDF - data + NLP as RDF BioONT - ontology coordination Scientific Publishing - evidence management Adaptive Clinical Protocols and Pathways Clinical Trials

80 79 BioRDF: NeuroCommons.org The Neurocommons project, a collaboration between Science Commons and the Teranode Corporation, is creating a free, public Semantic Web for neurological research. The project has three distinct goals: 1. To demonstrate that scientific impact and innovation is directly related to the freedom to legally reuse and technically transform scientific information. 2. To establish a legal and technical framework that increases the impact of investment in neurological research in a public and clearly measurable manner. 3. To develop an open community of neuroscientists, funders of neurological research, technologists, physicians, and patients to extend the Neurocommons work in an open, collaborative, distributed manner.

81 80 BioRDF: Reagents RDF resources that describe various kinds of experimental reagents, starting with antibodies: Initial RDF that captures: Gene, the fact that this is an antibody, various kinds of pages about the antibody, such as vendor documentation, and any other properties that are explicitly captured in the source material Work with the Ontology task force to identify appropriate ontologies and vocabularies to use in the RDF. Write queries against the RDF to answer questions of the sort posed on the Alzforum's

82 81 BioRDF: NCBI NCBI Data: URIs and as RDF Terminology Integration: NLM's UMLS, MeSH –SNOMED Olivier Bodenreider

83 82 BioRDF Neuro Tasks Aggregate facts and models around Parkinson's Disease BIRN / Human Brain Project SWAN: scientific annotations and evidence Use RDF and OWL to describe –Brain Connectivity –Neuronal data in SenseLab

84 83 Other Benefits of Semantic Web Enterprise Distributed Connectivity –Uniform Resource Identifiers (URI) Ontologies Data Representation Authenticity –Auditability (Sarbanes-Oxley) –Authorship Non-repudiability Privacy –Encryptibility and Trust Networks Security –At any level of granularity

85 84 W3C Roadmap Semantic Web foundation specifications –RDF, RDF Schema and OWL are W3C Recommendations as of Feb 2004 Standardization work is underway in Query, Best Practices and Rules Goal of moving from a Web of Documents to a Web of Data The Only Open and Web-based Data Integration Model Game in Town

86 85 Leveraging with Semantic Web Free Data from Applications –Map all data to URIs Add semantics to anything on the Web –Ontologies Support the Management of Knowledge –Interpretations and Models

87 86 Leveraging with Semantic Web Free Data from Applications… –Data uniquely defined by URIs, even across multiple databases –Mapped through a common graph semantic model –Data can be distributed (not in one location) –New relations and attributes dynamically added As easy as spreadsheets, but with semantics and web locations Benefit #1

88 87 Leveraging with Semantic Web All things on the Web can have semantics added to them –Ability to define and link in ontologies –Documents Management through Links –Changed data and semantics can be managed as versions –Semantics can be used to define and apply policies –No Need for complex Middleware Benefit #2

89 88 Leveraging with Semantic Web Supporting the Management of Knowledge –All data nodes and doc resources can be linked –Ability to represent Assertions and Hypotheses Include authorship and assumptions Use of KD45 logic –Both Local and Global Knowledge Scientists can upload partially validated facts –View Data and Interpretations through Points-of-View (Semantic Lenses) Share views with others Benefit #3

90 89 What does RDF get you? Structure is not format-rigid (i.e. tree) –Semantics not implicit in Syntax –No new parsers need to be defined for new data Entities can be anywhere on the web (URI) Define semantics into graph structures (ontologies) –Use rules to test data consistency and extract important relations Data can be merged into complete graphs Multiple ontologies supported

91 90 RDF vs. XML example Wang et al., Nature Biotechnology, Sept 2005 AGML HUPML

92 91 RDF Stripe Mode Node>Edge>Node >Edge….

93 92 RDF Graph

94 93 #38;dopt=Abstract&list_uids= e value 23 ; units :nM ; forTarget gsk:GSK3beta } C16H11BrN2O kenpaullone bromo-paullone C1C2=C(C3=CC=CC=C3NC1=O)NC4=C2C=C(C=C4)B 1/C16H11BrN2O/c (7-9) (20) (13)16(12)19- 14/h1-7,19H,8H2,(H,18,20)/f/h18H

95 94 gsk:KENPAL rdf:type :Compound ; dc:source bstract&list_uids= ; chemID 3820 ; clogP 2.4 ; kA e-8 ; mw ; ic50 { rdf:type :IC50 ; value 23 ; units :nM ; forTarget gsk:GSK3beta } ; chemStructure C16H11BrN2O ; rdfs:label kenpaullone ; synonym bromo-paullone ; smiles C1C2=C(C3=CC=CC=C3NC1=O)NC4=C2C=C(C=C4)B ; inChI 1/C16H11BrN2O/c (7-9) (20) (13)16(12)19- 14/h1-7,19H,8H2,(H,18,20)/f/h18H ; xref

96 95 Multiple Ontologies Used Together Drug target ontology FOAF Patent ontology OMIM Person Group Chemical entity Disease SNP BioPAX UniProt Extant ontologies Protein Under development Bridge concept UMLS Disease Polymorphisms PubChem

97 96 Case Studies

98 97 Case Study: NeuroCommons.org Public Data & Knowledge for CNS R&D Forum Available for industry and academia All based on Semantic Web Standards

99 98 NeuroCommons The Recontribution of Knowledge Publications are usually copyrighted… Knowledge of Nature should be openly shareable!

100 99 NeuroCommons.org The Neurocommons project, a collaboration between Science Commons and the Teranode Corporation, is creating a free, public Semantic Web for neurological research. The project has three distinct goals: 1. To demonstrate that scientific impact and innovation is directly related to the freedom to legally reuse and technically transform scientific information. 2. To establish a legal and technical framework that increases the impact of investment in neurological research in a public and clearly measurable manner. 3. To develop an open community of neuroscientists, funders of neurological research, technologists, physicians, and patients to extend the Neurocommons work in an open, collaborative, distributed manner.

101 100 NeuroCommons Executive Summary The Neurocommons project, a collaboration between Science Commons and the Teranode Corporation, is creating a free, public Semantic Web for neurological research. The project has three distinct goals: 1. To demonstrate that scientific impact and innovation is directly related to the freedom to legally reuse and technically transform scientific information. 2. To establish a legal and technical framework that increases the impact of investment in neurological research in a public and clearly measurable manner. 3. To develop an open community of neuroscientists, funders of neurological research, technologists, physicians, and patients to extend the Neurocommons work in an open, collaborative, distributed manner.

102 101 NeuroCommons First Steps The first stage is underway: Using NLP and other automated technologies, extract machine-readable representations of neuroscience-related knowledge as contained in free text and databases Assemble those representations into a graph Publish the graph with no intellectual property rights or contractual restrictions on reuse

103 102 HCLS Neuro Tasks Aggregate facts and models around Parkinson's Disease SWAN: scientific annotations and evidence Use RDF and OWL to describe –Brain scans in The Whole Brain Atlas –Neural entries in NCBI's Entrez Gene Database –Brain Connectivity –Neuronal data in SenseLab –Neurological Disease entries in OMIM

104 103 Semantic Browsers Next generation browsers extend viewing functionality to semantic information via semantic lenses Renders OWL-RDF, XML, and HTML documents Lenses act as information aggregators and logic style-sheets

105 104 Conclusions: Key Semantic Web Principles Plan for change Free data from the application that created it Lower reliance on overly complex Middleware The value in "as needed" data integration Big wins come from many little ones The power of links - network effect Open-world, open solutions are cost effective Importance of "Partial Understanding"

106 Extras…

107 106 What is the Semantic Web? It's AI It's Web 2.0 It's Ontologies It's Data Tracking It's a Global Conspiracy It's Semantic Webs It's Text Extraction

108 107 W3C Roadmap Semantic Web foundation specifications –RDF, RDF Schema and OWL are W3C Recommendations as of Feb 2004 Standardization work is underway in Query, Best Practices and Rules Goal of moving from a Web of Documents to a Web of Data The Only Open and Web-based Data Integration Model Game in Town

109 108 The Current Web What the computer sees: Dumb links No semantics - treated just like Minimal machine- processable information

110 109 The Semantic Web Machine-processable semantic information Semantic context published – making the data more informative to both humans and machines

111 110 Google Graphs Ranking Sites based on Topology Associate Word frequencies with ranked sites

112 111 The Technologies: RDF Resource Description Framework W3C standard for making statements of fact or belief about data or concepts Descriptive statements are expressed as triples: (Subject, Verb, Object) –We call verb a predicate or a property SubjectObject Property
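
A small Python sketch can make the triple model concrete. It is a toy stand-in, not an RDF library: statements are plain tuples, `None` acts as a wildcard, and the predicate names (`isProducedBy`, `inPathway`, and so on) are hypothetical labels. The subjects and objects are taken from the ApoA1 and GSK3beta slides earlier in the deck.

```python
# Sketch: RDF-style statements as (subject, predicate, object) tuples,
# with a tiny pattern matcher where None acts as a wildcard.
# Identifiers are illustrative strings, not real URIs.

def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples in `graph` that agree with every non-None field."""
    return [
        (s, p, o)
        for (s, p, o) in sorted(graph)  # sorted for a stable result order
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

graph = {
    ("ApoA1", "isProducedBy", "Liver"),
    ("ApoA1", "isCorrelatedWith", "DKK1"),
    ("GSK3beta", "inPathway", "WNT"),
    ("GSK3beta", "associatedDisease", "DiabetesT2"),
}

about_apoa1 = match(graph, subject="ApoA1")
in_wnt = match(graph, predicate="inPathway", obj="WNT")
print(about_apoa1)
print(in_wnt)
```

Because every statement has the same three-part shape, one generic matcher serves any subject area, which is the practical meaning of not needing a new parser for each new data type.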

113 112 Universal, semantic connectivity supports the construction of elaborate structures. What RDF Gets You

114 113 Losing Connectedness in Tables Casp2 Colon ? Fast Uptake and ease of use, but loose binding to entities and terms Casp2 Endodermal

115 114 Data Integration? Querying Databases is not sufficient Data needs to include the Context of Local Scientists Concepts and Vocabulary need to be associated More about Sociology than Technology Information Knowledge

116 115 Standards- Why Not? Good when there's a majority of agreement By vendors, for vendors? Mainly about Data Packing-- should be more about Semantics (user-defined) API dominated (Time trapped) Ease and Expressivity Too often they're Brittle and Slow to develop They're great, that's why there are so many of them

117 116 Data Integration Enables Business Integration: Efficiency and Innovation Searching Visualization Analysis Reporting Notification Navigation

118 117 Searching… #1 way for finding information in companies…

119 118 Catalog General Logical constraints Terms/ glossary Thesauri: BT/NT, Parent/Child, Informal Is-A Formal is-a Frames (Properties) Formal instances Value Restriction Disjointness, Inverse Ontology Dimensions based on McGuinness and Finin Simple Taxonomies Expressive Ontologies MeSH, Gene Ontology, UMLS Meta CYC RDF(S) DB Schema IEEE SUO OWL KEGG TAMBIS EcoCyc BioPAX Ontylog Snomed The Knowledge Semantics Continuum Medication Lists DDI Lists

120 119 Use Case: Personalized Medicine Clinical exam reveals abnormal heart sounds Family History: Father with sudden death at 40, 2 younger brothers apparently normal Ultrasound ordered based on clinical exam reveals cardiomyopathy Structured Physical Exam Structured Family History Imaging Study Reports with Metadata Annotations Dr. Genomus Meets Basketball Player who fainted at Practice

121 120 Clinical Knowledge Genomic Knowledge Figure reprinted with permission from Cerebra, Inc. Information Integration: Ontology OWL ontologies that blend knowledge from the Clinical and Genomic Domains

122 121 Information Integration: Architecture Domain Ontologies for Translational Medicine Research RPDR GIGPAD Study RDF Wrapper RDF Graph 1 RDF Graph 2 Merged RDF Graph Instantiation Use of RDF graphs that instantiate these ontologies: -- Rules/semantics-based integration independent of location, method of access or underlying data structures! - Highly configurable, minimize software coding

123 122 Bridging Clinical and Genomic Information Paternal 1 type degree Patient (id = URI1) Mr. X name Person (id = URI2) related_to FamilyHistory (id = URI3) has_family_history Sudden Death problem associated_relative EMR Data Patient (id = URI1) MolecularDiagnosticTestResult (id = URI4) has_structured_test_result MYH7 missense Ser532Pro (id = URI5) identifies_mutation Dilated Cardiomyopathy (id = URI6) indicates_disease LIMS Data Rule/Semantics-based Integration: - Match Nodes with same Ids - Create new links: IF a patient's structured test result indicates a disease THEN add a suffers_from link to that disease 90% evidence1 95% evidence2
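
The two integration steps on this slide (match nodes with the same IDs, then add links by rule) can be sketched in Python. This is a toy model under stated assumptions: graphs are sets of tuples, the function names `merge` and `infer_suffers_from` are invented here, the edge directions are read off the flattened slide text and may not match the original figure exactly, and the evidence percentages are left out.

```python
# Sketch: merge two RDF-style graphs by shared node IDs, then apply the rule
# "IF a patient's structured test result indicates a disease
#  THEN add a suffers_from link to that disease".
# Triples are plain tuples; URI1..URI6 are the placeholder IDs from the slide.

emr_graph = {
    ("URI1", "rdf:type", "Patient"),
    ("URI1", "name", "Mr. X"),
    ("URI1", "has_family_history", "URI3"),
    ("URI3", "problem", "Sudden Death"),
}
lims_graph = {
    ("URI1", "rdf:type", "Patient"),
    ("URI1", "has_structured_test_result", "URI4"),
    ("URI4", "identifies_mutation", "URI5"),
    ("URI4", "indicates_disease", "URI6"),  # assumed edge direction
}

def merge(*graphs):
    """Set union: nodes with the same ID become the same node automatically."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged

def infer_suffers_from(graph):
    """Return new suffers_from triples implied by the rule; input is not modified."""
    inferred = set()
    for (patient, p1, result) in graph:
        if p1 != "has_structured_test_result":
            continue
        for (subject, p2, disease) in graph:
            if subject == result and p2 == "indicates_disease":
                inferred.add((patient, "suffers_from", disease))
    return inferred

merged = merge(emr_graph, lims_graph)
new_links = infer_suffers_from(merged)
merged |= new_links
print(sorted(new_links))
```

The merge needs no mapping table because both sources already name the patient with the same identifier; the rule then runs over the combined graph regardless of which system each statement came from.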

124 123 Bridging Clinical and Genomic Information RDF Graphs provide a semantics-rich substrate for decision support. Can be exploited by SWRL Rules Patient (id = URI1) Mr. X name Person (id = URI2) related_to FamilyHistory (id = URI3) has_family_history Sudden Death problem Paternal 1 type degree associated_relative StructuredTestResult (id = URI4) MYH7 missense Ser532Pro (id = URI5) identifies_mutation Dilated Cardiomyopathy (id = URI6) indicates_disease has_structured_test_result suffers_from has_gene 90% evidence

125 124 Healthcare and Life Sciences: Framework Research Practice Clinical Biological Personalized Medicine Translational Medicine Public Health Research Biosurveillance

