Applying Semantic Web Standards to Drug Discovery and Development Eric Neumann W3C HCLS co-chair.

1

2 Applying Semantic Web Standards to Drug Discovery and Development Eric Neumann W3C HCLS co-chair

3 2 Knowledge is the human-acquired capacity (both potential and actual) to take effective action in varied and uncertain situations. How does this translate into using Information Systems better in support of Innovation?

4 3 Knowledge Predictiveness Knowledge of Target Mechanisms Knowledge of Toxicity Knowledge of Patient-Drug Profiles

5 4 Where Information Advances are Most Needed Supporting Innovative Applications in R&D –Mol Diagnostics (Biomarkers) –Molecular Mechanisms (Systems) –Data Provenance, Rich Annotation Clinical Information –eHealth Records + EDC –Clinical Submission Documents –Safety Information, Pharmacovigilance, Adverse Events –Handling Biomarker evidence Standards –Central Data Sources Genomics, Diseases, Chemistry, Toxicology –MetaData Ontologies Vocabularies

6 5 Decision Support Translational Research Tox New Applications Safety Target Validation Biomarker Qualification GO BioPAX ICH Raw Data MAGE ML ASN1. XLS Psi XML CSV SAS Tables CDISC Semantic Bridge

7 6 Losing Connectedness in Tables Genes Tissues ? Fast Uptake and ease of use, but loose binding to entities and terms

8 7 Data Integration? Querying Databases is not sufficient Data needs to include the Context of Local Scientists Concepts and Vocabulary need to be associated More about Sociology than Technology Information Knowledge

9 8 Data Integration: Biology Requirements Disease Proteins Genes Papers Retention Policy Audit Trail Curation Tools Ontology Experiment Samples Compounds

10 9 Standards: Why Not? Good when there's a majority of agreement By vendors, for vendors? Mainly about Data Packing; should be more about Semantics (user-defined) Ease and Expressivity Too often they're Brittle and Slow to develop They're great, that's why there are so many of them

11 10 Data Integration Enables Business Integration: Efficiency and Innovation Searching Visualization Analysis Reporting Notification Navigation

12 11 Searching… #1 way for finding information in companies…

13 12 Static, Untagged, Disjoint Existing Web Data Throttles the R&D Potential R&D Scientist Integrating Data Manually LIMS Bioinformatics Cheminformatics Public Data Sources

14 13 Semantic Web Data Integration R&D Scientist Bioinformatics Cheminformatics LIMS Public Data Sources Dynamic, Linked, Searchable

15 14 The Current Web What the computer sees: Dumb links No semantics - treated just like Minimal machine-processable information

16 15 The Semantic Web Machine-processable semantic information Semantic context published – making the data more informative to both humans and machines

17 16 The Web of Data URIs are universal IDs Distributed data references Non-locality of data NamedGraphs can help segment external references New meaning for Annotation target gene pathway

18 17 Case Study: Omics ApoA1 … … is produced by the Liver … is expressed less in Atherosclerotic Liver … is correlated with DKK1 … is cited regarding Tangier disease … has Tx Reg elements like HNFR1 Subject Verb Object
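
The ApoA1 statements on this slide can be written down directly as (subject, verb, object) tuples. The following is a minimal Python sketch, not the presenter's implementation: the verb names (isProducedBy and so on) are hypothetical labels invented here, and plain strings stand in for the URIs a real RDF store would use.

```python
from collections import defaultdict

# Each statement is a (subject, verb, object) tuple taken from the slide.
# Verb names are hypothetical; plain strings stand in for URIs.
STATEMENTS = [
    ("ApoA1", "isProducedBy", "Liver"),
    ("ApoA1", "isExpressedLessIn", "Atherosclerotic Liver"),
    ("ApoA1", "isCorrelatedWith", "DKK1"),
    ("ApoA1", "isCitedRegarding", "Tangier disease"),
    ("ApoA1", "hasTxRegElementLike", "HNFR1"),
]


def index_by_subject(statements):
    """Group (verb, object) pairs under their subject."""
    index = defaultdict(list)
    for subject, verb, obj in statements:
        index[subject].append((verb, obj))
    return dict(index)


def objects_of(statements, subject, verb):
    """Return every object linked to `subject` by `verb`."""
    return [o for s, v, o in statements if s == subject and v == verb]


about_apoa1 = index_by_subject(STATEMENTS)["ApoA1"]
for verb, obj in about_apoa1:
    print(f"ApoA1 --{verb}--> {obj}")
```

Because every fact has the same three-part shape, adding a new kind of relation means appending a tuple rather than changing a table schema.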

19 18 Courtesy of BG-Medicine Example: Knowledge Aggregation

20 19 Data Integration: Chemistry Requirements Sample Target Biomarker Experiment Project Paper Compounds Scientists Issues Annotation Pathway Disease A Single Compound

21 20 Tim Berners-Lee's App View

22 21 Semantic Web Drug DD Application Space Genomics Therapeutics Biology HTS NDA Compound Opt safety eADME DMPK informatics manufacturing genes Clinical Studies Patent Chem Lib Production

23 22 W3C Launches Semantic Web for HealthCare and Life Sciences Interest Group Interest Group formally launched Nov 2005: First Domain Group for W3C - …take SW through its paces An Open Scientific Forum for Discussing, Capturing, and Showcasing Best Practices Recent life science members: Pfizer, Merck, Partners HealthCare, Teranode, Cerebra, NIST, U Manchester, Stanford U, AlzForum SW Supporting Vendors: Oracle, IBM, HP, Siemens, AGFA, Co-chairs: Dr. Tonya Hongsermeier (Partners HealthCare); Eric Neumann (Teranode)

24 23 HCLS Objectives Share use cases, applications, demonstrations, experiences Exposing collections Developing vocabularies Building / extending (where appropriate) core vocabularies for data integration

25 24 HCLS Activities BioRDF - data as RDF BioNLP - unstructured data BioONT - ontology coordination Clinical Trials - CDISC/HL7 Scientific Publishing - evidence management Adaptive Healthcare Protocols

26 25 Reporting on Progression Notify Others of Decisions Progression Manager Found Determinations Noted Alternatives Scientist Toxicogenomicist Shared Annotations Notified of Alternatives Semantic Web in R&D A Single Compound Open Data Format and Flexible Linking Enabled Data Integration and Collaboration

27 26 Progression Manager Project Dashboard Scientist R&D Commons Toxicogenomicist Experiment Manager A Single Compound R&D Applications in the Semantic Web

28 27 Other Benefits of Semantic Web Enterprise Distributed Connectivity –Uniform Resource Identifiers (URIs) Authenticity –Auditability (Sarbanes-Oxley) –Authorship Non-repudiability Privacy –Encryptibility and Trust Networks Security –At any level of granularity

29 28 What is the Semantic Web? It's AI It's Web 2.0 It's Ontologies It's Data Tracking It's a Global Conspiracy It's Semantic Webs It's Text Extraction

30 29 W3C Roadmap Semantic Web foundation specifications –RDF, RDF Schema and OWL are W3C Recommendations as of Feb 2004 Standardization work is underway in Query, Best Practices and Rules Goal of moving from a Web of Documents to a Web of Data The Only Open and Web-based Data Integration Model Game in Town

31 30 Leveraging with Semantic Web Free Data from Applications… –Data uniquely defined by URIs, even across multiple databases –Mapped through a common graph semantic model –Data can be distributed (not in one location) –New relations and attributes dynamically added As easy as spreadsheets, but with semantics and web locations Benefit #1
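
The points above (data identified by URIs across databases, a common graph model, new relations added dynamically) can be illustrated with plain sets of triples. This is a sketch under stated assumptions: the example.org URIs, the two "databases", and the testedAgainst and synonym predicates are hypothetical; only the compound name, the clogP value, and the synonym come from the slides.

```python
# Hypothetical URIs; two sources describe the same compound independently.
COMPOUND = "http://example.org/compound/KENPAL"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
VOCAB = "http://example.org/vocab#"  # hypothetical vocabulary namespace

chemistry_db = {
    (COMPOUND, VOCAB + "clogP", "2.4"),
    (COMPOUND, RDFS_LABEL, "kenpaullone"),
}
biology_db = {
    (COMPOUND, VOCAB + "testedAgainst", "http://example.org/target/GSK3beta"),
    (COMPOUND, RDFS_LABEL, "kenpaullone"),
}


def merge_graphs(*graphs):
    """Union of triple sets; identical triples collapse automatically."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Map each predicate of `subject` to a sorted list of its objects."""
    description = {}
    for s, p, o in graph:
        if s == subject:
            description.setdefault(p, []).append(o)
    return {p: sorted(objs) for p, objs in description.items()}


merged = merge_graphs(chemistry_db, biology_db)

# A new relation is just one more triple; no schema migration is needed.
merged_with_synonym = merged | {(COMPOUND, VOCAB + "synonym", "bromo-paullone")}
```

Because both sources use the same URI for the compound, the merge needs no join keys or mapping tables; the shared label triple appears once in the result.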

32 31 Leveraging with Semantic Web All things on the Web can have semantics added to them –Ability to define and link in ontologies –Documents Management through Links –Changed data and semantics can be managed as versions –Semantics can be used to define and apply policies –No Need for complex Middleware Benefit #2

33 32 Leveraging with Semantic Web Supporting the Management of Knowledge –All data nodes and doc resources can be linked –Ability to represent Assertions and Hypotheses Include authorship and assumptions Use of KD45 logic –Both Local and Global Knowledge Scientists can upload partially validated facts –View Data and Interpretations through Points-of-View (Semantic Lenses) Share views with others Benefit #3

34 33 The Technologies: RDF Resource Description Framework Think: "Relational Data Format" W3C standard for making statements of fact or belief about data or concepts Descriptive statements are expressed as triples: (Subject, Verb, Object) –We call verb a predicate or a property Subject Object Property
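
To make the triple idea concrete, here is a small Python sketch that writes triples out in the line-based N-Triples syntax (one statement per line, IRIs in angle brackets, literals in quotes). It is a simplification, not a full serializer: the is_iri prefix check is a heuristic assumed here for illustration, only a few escape sequences are handled, and the example.org names are hypothetical.

```python
def is_iri(term):
    """Heuristic (an assumption of this sketch): treat common URI schemes as IRIs."""
    return term.startswith(("http://", "https://", "urn:"))


def escape_literal(text):
    """Escape backslashes, quotes, and newlines for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    lines = []
    for subject, predicate, obj in triples:
        if is_iri(obj):
            obj_text = f"<{obj}>"
        else:
            obj_text = f'"{escape_literal(obj)}"'
        lines.append(f"<{subject}> <{predicate}> {obj_text} .")
    return "\n".join(lines)


example = [
    (
        "http://example.org/gene/ApoA1",
        "http://example.org/vocab#isProducedBy",
        "http://example.org/tissue/Liver",
    ),
    (
        "http://example.org/gene/ApoA1",
        "http://www.w3.org/2000/01/rdf-schema#label",
        "ApoA1",
    ),
]
print(to_ntriples(example))
```

The subject and predicate are always IRIs here; only the object may be either an IRI or a literal value.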

35 34 Universal, semantic connectivity supports the construction of elaborate structures. What RDF Gets You

36 35 What does RDF get you? Structure is not format-rigid (i.e. tree) –Semantics not implicit in Syntax –No new parsers need to be defined for new data Entities can be anywhere on the web (URI) Define semantics into graph structures (ontologies) –Use rules to test data consistency and extract important relations Data can be merged into complete graphs Multiple ontologies supported

37 36 RDF vs. XML example Wang et al., Nature Biotechnology, Sept 2005 AGML HUPML

38 37 RDF Stripe Mode Node>Edge>Node >Edge….

39 38 RDF Graph

40 39 [Figure: RDF graph for kenpaullone (synonym bromo-paullone), chemStructure C16H11BrN2O, with an IC50 node (value 23; units :nM; forTarget gsk:GSK3beta) plus SMILES and InChI literals; the same data is serialized as text on the next slide]

41 40 gsk:KENPAL rdf:type :Compound ; dc:source bstract&list_uids= ; chemID 3820 ; clogP 2.4 ; kA e-8 ; mw ; ic50 { rdf:type :IC50 ; value 23 ; units :nM ; forTarget gsk:GSK3beta } ; chemStructure C16H11BrN2O ; rdfs:label kenpaullone ; synonym bromo-paullone ; smiles C1C2=C(C3=CC=CC=C3NC1=O)NC4=C2C=C(C=C4)B ; inChI 1/C16H11BrN2O/c (7-9) (20) (13)16(12)19- 14/h1-7,19H,8H2,(H,18,20)/f/h18H ; xref
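
The nested braces in the snippet above describe an intermediate node (the IC50 measurement) hanging off the compound. A minimal Python sketch of how such a nested record flattens into triples follows. The flatten helper and the _:bN blank-node labels are illustrative assumptions, and fields whose values are truncated in this transcript (dc:source, kA, mw, smiles, inChI, xref) are deliberately left out rather than guessed.

```python
from itertools import count

# Only fields whose values are legible in the transcript are included.
record = {
    "rdf:type": ":Compound",
    "chemID": "3820",
    "clogP": "2.4",
    "ic50": {
        "rdf:type": ":IC50",
        "value": "23",
        "units": ":nM",
        "forTarget": "gsk:GSK3beta",
    },
    "chemStructure": "C16H11BrN2O",
    "rdfs:label": "kenpaullone",
    "synonym": "bromo-paullone",
}


def flatten(subject, properties, counter=None):
    """Turn a nested property dict into flat (subject, predicate, object) triples.

    A nested dict becomes an anonymous (blank) node with a generated label.
    """
    if counter is None:
        counter = count(1)
    triples = []
    for predicate, value in properties.items():
        if isinstance(value, dict):
            node = f"_:b{next(counter)}"
            triples.append((subject, predicate, node))
            triples.extend(flatten(node, value, counter))
        else:
            triples.append((subject, predicate, value))
    return triples


triples = flatten("gsk:KENPAL", record)
for triple in triples:
    print(triple)
```

The value, units, and target stay attached to the measurement node rather than to the compound itself, which is what lets one compound carry several IC50 results against different targets.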

42 41 DB Mapping from Current Formats

43 42 Excel => RDF ls:indivCell ${ rdf:type ls:GE_Cell; ls:probeHub gl:CASP2; ls:GE_Expected_Ratio "0.2726"; ls:conditionHub gl:BREAST_MALIGNANT } ; ls:indivCell ${ rdf:type ls:GE_Cell; ls:probeHub gl:TNFRS; ls:GE_Expected_Ratio "0.0138"; ls:conditionHub gl:BREAST_MALIGNANT } ; ls:indivCell ${ rdf:type ls:GE_Cell; ls:probeHub gl:CASP2; ls:GE_Expected_Ratio "0.1275"; ls:conditionHub gl:BREAST_NORMAL } ; Casp2 TNFRS Breast Malig
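
The mapping shown above gives every spreadsheet cell its own node that points at its row (probe) and column (condition). A hedged Python sketch of that idea follows. The sheet layout (probes as rows, conditions as columns) is an assumption inferred from the slide, the cells_from_sheet helper is hypothetical, and the TNFRS value under BREAST_NORMAL is left empty because the slide does not show one.

```python
import csv
import io

# Assumed layout: one row per probe, one column per condition.
SHEET = """probe,BREAST_MALIGNANT,BREAST_NORMAL
CASP2,0.2726,0.1275
TNFRS,0.0138,
"""


def cells_from_sheet(text):
    """Emit one ls:GE_Cell description per non-empty spreadsheet cell."""
    cells = []
    for row in csv.DictReader(io.StringIO(text)):
        probe = row["probe"]
        for condition, ratio in row.items():
            if condition == "probe" or not ratio:
                continue  # skip the row label and empty cells
            cells.append(
                {
                    "rdf:type": "ls:GE_Cell",
                    "ls:probeHub": f"gl:{probe}",
                    "ls:GE_Expected_Ratio": ratio,
                    "ls:conditionHub": f"gl:{condition}",
                }
            )
    return cells


cells = cells_from_sheet(SHEET)
for cell in cells:
    print(cell)
```

Each cell keeps explicit links to its probe and condition, so the binding to genes and tissues that a bare table loses is preserved in the data itself.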

44 43 W3C Launches Semantic Web for HealthCare and Life Sciences Interest Group Interest Group formally launched Nov 2005: First Domain Group for W3C - …take SW through its paces –Not a standards group, but a group to identify the best implementations of current SW Standards! An Open Scientific Forum for Discussing, Capturing, and Showcasing Best Practices Co-chairs: Dr. Tonya Hongsermeier (Partners HealthCare); Eric Neumann (Teranode)

45 44 W3C Launches Semantic Web for HealthCare and Life Sciences Interest Group First formal meeting: Jan 25-26, 2006 Cambridge, MA SW Supporting Vendors: Oracle, IBM, HP, Siemens, Agfa Recent life science members: Pfizer, Merck, Partners HealthCare, Teranode, Cerebra, NIST, U Manchester, Stanford U, U Bolzano, AlzForum Joining W3C gets you in as a group member –Early access to technology and discussions –Interaction with potential partners and clients

46 45 Multiple Ontologies Used Together Drug target ontology FOAF Patent ontology OMIM Person Group Chemical entity Disease SNP BioPAX UniProt Extant ontologies Protein Under development Bridge concept UMLS Disease Polymorphisms PubChem

47 46 Potential Linked Clinical Ontologies Clinical Trials ontology RCRIM (HL7) Genomics CDISC IRB Applications Molecules Clinical Obs ICD10 Pathways (BioPAX) Disease Models Extant ontologies Mechanisms Under development Bridge concept SNOMED Disease Descriptions Tox

48 47 Case Studies

49 48 Case Study: NeuroCommons.org Public Data & Knowledge for CNS R&D Forum Available for industry and academia All based on Semantic Web Standards

50 49 NeuroCommons The Recontribution of Knowledge Publications are usually copyrighted… Knowledge of Nature should be openly shareable!

51 50 NeuroCommons.org The Neurocommons project, a collaboration between Science Commons and the Teranode Corporation, is creating a free, public Semantic Web for neurological research. The project has three distinct goals: 1. To demonstrate that scientific impact and innovation is directly related to the freedom to legally reuse and technically transform scientific information. 2. To establish a legal and technical framework that increases the impact of investment in neurological research in a public and clearly measurable manner. 3. To develop an open community of neuroscientists, funders of neurological research, technologists, physicians, and patients to extend the Neurocommons work in an open, collaborative, distributed manner.

53 52 NeuroCommons First Steps The first stage is underway: Using NLP and other automated technologies, extract machine-readable representations of neuroscience-related knowledge as contained in free text and databases Assemble those representations into a graph Publish the graph with no intellectual property rights or contractual restrictions on reuse

54 53 HCLS Neuro Tasks Aggregate facts and models around Parkinson's Disease SWAN: scientific annotations and evidence Use RDF and OWL to describe –Brain scans in The Whole Brain Atlas –Neural entries in NCBI's Entrez Gene Database –Brain Connectivity –Neuronal data in SenseLab –Neurological Disease entries in OMIM

55 54 Dishevelled to GSK3beta IRREVERSIBLE-LEFT-TO-RIGHT INHIBITION Case Study: BioPAX (Pathways)

56 55 Dishevelled to GSK3beta IRREVERSIBLE-LEFT-TO-RIGHT INHIBITION Case Study: BioPAX (Pathways) Modulation CHIR99102 affectedBy

57 56 Case Study: Drug Discovery Dashboards Dashboards and Project Reports Next generation browsers for semantic information via Semantic Lenses Renders OWL-RDF, XML, and HTML documents Lenses act as information aggregators and logic style-sheets add { ls:TheraTopic hs:classView:TopicView }

58 57 Semantic Browsers Next generation browsers extend viewing functionality to semantic information via semantic lenses Renders OWL-RDF, XML, and HTML documents Lenses act as information aggregators and logic style-sheets

59 58 Drug Discovery Dashboard Topic: GSK3beta Topic Target: GSK3beta Disease: DiabetesT2 Alt Dis: Alzheimer's Cmpd: SB44121 CE: DBP Team: GSK3 Team Person: John Related Set Path: WNT

60 59 Bridging Chemistry and Molecular Biology urn:lsid:uniprot.org:uniprot:P49841 Semantic Lenses: Different Views of the same data Apply Correspondence Rule: if ?target.xref.lsid == ?bpx:prot.xref.lsid then ?target.correspondsTo.?bpx:prot BioPAX Components Target Model
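
The correspondence rule on this slide links a drug-target record to a BioPAX protein whenever both cite the same LSID. A minimal Python sketch of that rule follows. The record layout, the identifiers target:GSK3beta, bpx:GSK3beta_protein, and bpx:Dishevelled, and the placeholder LSID for the second protein are all hypothetical; only the UniProt LSID is taken from the slide.

```python
# LSID from the slide; everything else below is illustrative.
GSK3B_LSID = "urn:lsid:uniprot.org:uniprot:P49841"

targets = [
    {"id": "target:GSK3beta", "xref_lsid": GSK3B_LSID},
]
proteins = [
    {"id": "bpx:GSK3beta_protein", "xref_lsid": GSK3B_LSID},
    # Placeholder LSID: the real cross-reference is not given in the slides.
    {"id": "bpx:Dishevelled", "xref_lsid": "urn:lsid:example.org:protein:PLACEHOLDER"},
]


def apply_correspondence(targets, proteins):
    """Emit (target, correspondsTo, protein) when both share an xref LSID."""
    proteins_by_lsid = {}
    for protein in proteins:
        proteins_by_lsid.setdefault(protein["xref_lsid"], []).append(protein["id"])

    inferred = []
    for target in targets:
        for protein_id in proteins_by_lsid.get(target["xref_lsid"], []):
            inferred.append((target["id"], "correspondsTo", protein_id))
    return inferred


links = apply_correspondence(targets, proteins)
print(links)
```

The rule asserts correspondence, not identity: the target record and the pathway protein remain separate descriptions that happen to point at the same external reference.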

61 60 Lenses can aggregate, accentuate, or even analyze new result sets Behind the lens, the data can be persistently stored as RDF-OWL Correspondence does not need to mean same descriptive object, but may mean objects with identical references Bridging Chemistry and Molecular Biology

62 61 Case Study: Drug Safety "Safety Lenses" Lenses can focus data in specific ways –Hepatotoxicity, genotoxicity, hERG, metabolites Can be wrapped around statistical tools Aggregate other papers and findings (knowledge) in context with a particular project Align animal studies with clinical results Support special Alert-channels by regulators for each different toxicity issue Integrate JIT information on newly published mechanisms of action

63 62 GeneLogic GeneExpress Data Additional relations and aspects can be defined Diseased Tissue Links to OMIM (RDF)

64 63 ClinDash: Clinical Trials Browser Clinical Obs Expression Data Subjects Values can be normalized across all measurables (rows) Samples can be aligned to their subjects using RDF rules Clustering can now be done over all measurables (rows)

65 64 Case Study: Nokia Developers Forum Portal

66 65 Case Study: TERANODE Design Suite Supports Laboratory Data and Workflow Protocol Modeler –Accelerates workflow development –Eliminates database programming Protocol Player –Guides users through workflow –Automates data capture –Automates complex data flow plates –Integrates lab data with project and enterprise data

67 66 Conclusions: Key Semantic Web Principles Plan for change Free data from the application that created it Lower reliance on overly complex Middleware The value in "as needed" data integration Big wins come from many little ones The power of links - network effect Open-world, open solutions are cost effective Importance of "Partial Understanding"

68 67 Applications of Ontologies Controlling vocabulary (e.g. SNOMED CT) Controlling data types (concepts) Integrating data (instance serialization) Tagging of text to associate meta-data (publishing and search) Reasoning over aggregated information Web-service categorization

69 68 Where is this compound in development? Progression Manager How does this compound affect the pathway? R&D Scientist Toxicogenomicist Are there compounds with similar profiles? Teranode Vision A Single Compound To integrate data and workflow across laboratories and research groups throughout the R&D value chain

70 Efficiency and Innovation: Semantic Web Applications Roadmap

