Semantic Web Technology in Support of Bioinformatics for Glycan Expression Amit Sheth Large Scale Distributed Information Systems (LSDIS) lab, Univ. of.

Slides:



Advertisements
Similar presentations
Dr. Leo Obrst MITRE Information Semantics Information Discovery & Understanding Command & Control Center February 6, 2014February 6, 2014February 6, 2014.
Advertisements

Knowledge Modeling and its Application in Life Sciences: A Tale of two ontologies Bioinformatics for Glycan Expression Integrated Technology Resource for.
RDB2RDF: Incorporating Domain Semantics in Structured Data Satya S. Sahoo Kno.e.sis CenterKno.e.sis Center, Computer Science and Engineering Department,
Web Services for N-Glycosylation Process Integrated Technology Resource for Biomedical Glycomics NCRR/NIH Satya S. Sahoo, Amit P. Sheth, William S. York,
Semantic Web & Semantic Web Services: Applications in Healthcare and Scientific Research International IFIP Conference on Applications of Semantic Web.
Prof. Carolina Ruiz Computer Science Department Bioinformatics and Computational Biology Program WPI WELCOME TO BCB4003/CS4803 BCB503/CS583 BIOLOGICAL.
Knowledge Enabled Information and Services Science What can SW do for HCLS today? Panel at HCSL Workshop, WWW2007 Amit Sheth Kno.e.sis Center Wright State.
Acknowledgements UGA Biologists/Biochemists Will York, Complex Carbohydrate Research Center Jonathan Arnold, Fungal Genomics Phillip Bowen, CCQC Project.
PR-OWL: A Framework for Probabilistic Ontologies by Paulo C. G. COSTA, Kathryn B. LASKEY George Mason University presented by Thomas Packer 1PR-OWL.
Fungal Semantic Web Stephen Scott, Scott Henninger, Leen-Kiat Soh (CSE) Etsuko Moriyama, Ken Nickerson, Audrey Atkin (Biological Sciences) Steve Harris.
Use of Ontologies in the Life Sciences: BioPax Graciela Gonzalez, PhD (some slides adapted from presentations available at
Semantics powered Bioinformatics Amit Sheth, William S. York, et al Large Scale Distributed Information Systems Lab & Complex Carbohydrate Research Center.
Semantics For the Semantic Web: The Implicit, the Formal and The Powerful Amit Sheth, Cartic Ramakrishnan, Christopher Thomas CS751 Spring 2005 Presenter:
The bioinformatics of biological processes The challenge of temporal data Per J. Kraulis CMCM, Tartu University.
Semantic Web Technology Evaluation Ontology (SWETO): A test bed for evaluating tools and benchmarking semantic applications WWW2004 (New York, May 22,
Semantic Web Technologies Lecture # 2 Faculty of Computer Science, IBA.
Knowledge Mediation in the WWW based on Labelled DAGs with Attached Constraints Jutta Eusterbrock WebTechnology GmbH.
Semantics for Scientific Experiments and the Web– the implicit, the formal and the powerful Amit Sheth Large Scale Distributed Information Systems (LSDIS)
Semantic Web applications in Financial Industry, Government, Health care and Life Sciences SWEG 2006, March 2006 Amit Sheth LSDIS Lab, Department of Computer.
Knowledge Representation Ontology are best delivered in some computable representation Variety of choices with different: –Expressiveness The range of.
Semantic Web Technologies ufiekg-20-2 | data, schemas & applications | lecture 21 original presentation by: Dr Rob Stephens
Knowledge Enabled Information and Services Science GlycO.
Kno.e.sis Center, Wright State University,
Semantics Enabled Industrial and Scientific Applications: Research, Technology and Deployed Applications Part III: Biological Applications Keynote - the.
Learning Object Metadata Mining Masoud Makrehchi Supervisor: Prof. Mohamed Kamel.
Semantics in the Semantic Web– the implicit, the formal and the powerful (with a few examples from Glycomics) Amit Sheth Large Scale Distributed Information.
Grant Number: IIS Institution of PI: Arizona State University PIs: Zoé Lacroix Title: Collaborative Research: Semantic Map of Biological Data.
Of 39 lecture 2: ontology - basics. of 39 ontology a branch of metaphysics relating to the nature and relations of being a particular theory about the.
SWETO: Large-Scale Semantic Web Test-bed Ontology In Action Workshop (Banff Alberta, Canada June 21 st 2004) Boanerges Aleman-MezaBoanerges Aleman-Meza,
Ontology Summit2007 Survey Response Analysis -- Issues Ken Baclawski Northeastern University.
Metadata and Geographical Information Systems Adrian Moss KINDS project, Manchester Metropolitan University, UK
Flexible Text Mining using Interactive Information Extraction David Milward
Metadata. Generally speaking, metadata are data and information that describe and model data and information For example, a database schema is the metadata.
What is an Ontology? An ontology is a specification of a conceptualization that is designed for reuse across multiple applications and implementations.
Semantic empowerment of Life Science Applications October 2006 Amit Sheth LSDIS Lab, Department of Computer Science, University of Georgia Acknowledgement:
BAA - Big Mechanism using SIRA Technology Chuck Rehberg CTO at Trigent Software and Chief Scientist at Semantic Insights™
The Functional Genomics Experiment Object Model (FuGE) Andrew Jones, School of Computer Science, University of Manchester MGED Society.
©Ferenc Vajda 1 Semantic Grid Ferenc Vajda Computer and Automation Research Institute Hungarian Academy of Sciences.
Semantic (Web) Technology in Action - today The Semantic Web – Scientific American article considered harmful? WWW2003 Panel (PN2), Budapest, May 21, 2003.
Knowledge Enabled Information and Services Science Glycomics project overview.
Biological Signal Detection for Protein Function Prediction Investigators: Yang Dai Prime Grant Support: NSF Problem Statement and Motivation Technical.
OntoQA: Metric-Based Ontology Quality Analysis Samir Tartir, I. Budak Arpinar, Michael Moore, Amit P. Sheth, Boanerges Aleman-Meza IEEE Workshop on Knowledge.
FuGE: A framework for developing standards for functional genomics Andrew Jones School of Computer Science, University of Manchester Metabomeeting 2.0.
Applying Semantic Technologies to the Glycoproteomics Domain W. S York May 15, 2006.
Ontologies Working Group Agenda MGED3 1.Goals for working group. 2.Primer on ontologies 3.Working group progress 4.Example sample descriptions from different.
12/7/2015Page 1 Service-enabling Biomedical Research Enterprise Chapter 5 B. Ramamurthy.
Mining the Biomedical Research Literature Ken Baclawski.
Bioinformatics Research Overview Outline Biomedical Ontologies oGlycO oEnzyO oProPreO Scientific Workflow for analysis of Proteomics Data Framework for.
Proposed Research Problem Solving Environment for T. cruzi Intuitive querying of multiple sets of heterogeneous databases Formulate scientific workflows.
Data Management Support for Life Sciences or What can we do for the Life Sciences? Mourad Ouzzani
DeepDive Model Dongfang Xu Ph.D student, School of Information, University of Arizona Dec 13, 2015.
Clinical research data interoperbility Shared names meeting, Boston, Bosse Andersson (AstraZeneca R&D Lund) Kerstin Forsberg (AstraZeneca R&D.
Copy right 2004 Adam Pease permission to copy granted so long as slides and this notice are not altered Ontology Overview Introduction.
A Portrait of the Semantic Web in Action Jeff Heflin and James Hendler IEEE Intelligent Systems December 6, 2010 Hyewon Lim.
An Ontological Approach to Financial Analysis and Monitoring.
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
High throughput biology data management and data intensive computing drivers George Michaels.
OWL (Ontology Web Language and Applications) Maw-Sheng Horng Department of Mathematics and Information Education National Taipei University of Education.
Special thanks to Christopher Thomas & Satya Sanket Sahoo
‘Ontology Management’ Peter Fox (Semantic Web Cluster lead)
ece 627 intelligent web: ontology and beyond
LSDIS Lab, Department of Computer Science,
Semantic Visualization
Lecture #11: Ontology Engineering Dr. Bhavani Thuraisingham
Amit Sheth LSDIS Lab & Semagix University of Georgia
ece 627 intelligent web: ontology and beyond
Piotr Kaminski University of Victoria September 24th, 2002
Collaborative RO1 with NCBO
Amit Sheth, CTO, Semagix Inc
Service-enabling Biomedical Research Enterprise
Presentation transcript:

Semantic Web Technology in Support of Bioinformatics for Glycan Expression Amit Sheth Large Scale Distributed Information Systems (LSDIS) lab, Univ. of Georgia and Semagix, Inc. W3C workshop on Semantic Web for Life SciencesW3C workshop on Semantic Web for Life Sciences, October 28, 2004, Cambridge MA Thanks to Will York, Christopher Thomas, Satya Sanket Sahoo

NIH Integrated Technology Resource for Biomedical Glycomics Complex Carbohydrate Research Center The University of Georgia Biology and Chemistry Michael Pierce – CCRC (PI) Al Merrill - Georgia Tech Kelley Moremen - CCRC Ron Orlando - CCRC Parastoo Azadi – CCRC Stephen Dalton – UGA Animal Science Bioinformatics and Computing Will York - CCRC Amit Sheth, Krys Kochut, John Miller; UGA Large Scale Distributed Information Systems Laboratory

Quick Take Comprehensive and deep domain ontology –GlycO Process ontology to go beyond provenance –ProGlycO Semantic Annotation of Scientific data –Textual –Experimental, machine-generated, non-textual Tool for ontology visualization, querying,… All open source, free

Central Role of Ontology Ontology represents agreement, represents common terminology/nomenclature Ontology is populated with extensive domain knowledge or known facts/assertions Key enabler of semantic metadata extraction from all forms of content: –unstructured text (and 150 file formats) –semi-structured (HTML, XML) and –structured data Ontology is in turn the center price that enables –resolution of semantic heterogeneity –semantic integration –semantically correlating/associating objects and documents

Types of Ontologies (or things close to ontology) Upper ontologies: modeling of time, space, process, etc Broad-based or general purpose ontology/nomenclatures: Cyc, CIRCA ontology (Applied Semantics), SWETO, WordNet ; Domain-specific or Industry specific ontologies –News: politics, sports, business, entertainment –Financial Market –Terrorism –Pharma –GlycO –(GO (a nomenclature), UMLS inspired ontology, …) Application Specific and Task specific ontologies –Anti-money laundering –Equity Research –Repertoire Management Blue: Commercial ontologies developed by Semagix or its customers; Brown: open/public ontologies from LSDIS Lab, Univ. of Georgia

Expressiveness Range: Knowledge Representation and Ontologies Catalog/ID General Logical constraints Terms/ glossary Thesauri “narrower term” relation Formal is-a Frames (properties) Informal is-a Formal instance Value Restriction Disjointness, Inverse, part of… Ontology Dimensions After McGuinness and Finin Simple Taxonomies Expressive Ontologies Wordnet CYC RDFDAML OO DB SchemaRDFS IEEE SUOOWL UMLS GO KEGG TAMBIS EcoCyc BioPAX GlycO SWETO Pharma

Ontology can be very large Semantic Web Ontology Evaluation Testbed – SWETO v1.4 is Populated with over 800,000 entities and over 1,500,000 explicit relationships among them Continue to populate the ontology with diverse sources thereby extending it in multiple domains, new larger release due soon Two other ontologies of Semagix customers have over 10 million instances, and requests for even larger ontologies exist

GlycO statistics: Ontology schema can be large and complex 767 classes 142 slots Instances Extracted with Semagix Freedom: –69,516 genes (From PharmGKB and KEGG) –92,800 proteins (from SwissProt) –18,343 publications (from CarbBank and MedLine) –12,308 chemical compounds (from KEGG) –3,193 enzymes (from KEGG) –5,872 chemical reactions (from KEGG) –2210 N-glycans (from KEGG)

GlycO is a focused ontology for the description of glycomics models the biosynthesis, metabolism, and biological relevance of complex glycans models complex carbohydrates as sets of simpler structures that are connected with rich relationships

GlycO taxonomy The first levels of the GlycO taxonomy Most relationships and attributes in GlycO GlycO exploits the expressiveness of OWL-DL. Cardinality constraints, value constraints, Existential and Universal restrictions on Range and Domain of properties allow the classification of unknown entities as well as the deduction of implicit relationships.

Query and visualization

A biosynthetic pathway GNT-I attaches GlcNAc at position 2 UDP-N-acetyl-D-glucosamine + alpha-D-Mannosyl-1,3-(R1)-beta-D-mannosyl-R2 UDP + N-Acetyl-$beta-D-glucosaminyl-1,2-alpha-D-mannosyl-1,3-(R1)-beta-D-mannosyl-$R2 GNT-V attaches GlcNAc at position 6 UDP-N-acetyl-D-glucosamine + G00020 UDP + G00021 N-acetyl-glucosaminyl_transferase_V N-glycan_beta_GlcNAc_9 N-glycan_alpha_man_4

The impact of GlycO GlycO models classes of glycans with unprecedented accuracy. Implicit knowledge about glycans can be deductively derived Experimental results can be validated according to the model

Identification and Quantification of N-glycosylation Cell Culture Glycoprotein Fraction Glycopeptides Fraction extract Separation technique I Glycopeptides Fraction n*m n Signal integration Data correlation Peptide Fraction ms datams/ms data ms peaklist ms/ms peaklist Peptide listN-dimensional array Peptide identification and quantification proteolysis Separation technique II PNGase Mass spectrometry Data reduction Peptide identification binning n 1

ProglycO – Structure of the Process Ontology Four structural components † :  Sample Creation  Separation (includes chromatography)  Mass spectrometry  Data analysis †: pedro download.man.ac.uk/Domains.shtml

Semantic Annotation of Scientific Data ms/ms peaklist data <parameter instrument=micromass_QTOF_2_quadropole_time_of_flight_m ass_spectrometer mode = “ms/ms”/> Annotated ms/ms peaklist data

Semantic annotation of Scientific Data Annotated ms/ms peaklist data <parameter instrument=“micromass_QTOF_2_quadropole_time_of_flight_mass_s pectrometer” mode = “ms/ms”/>

Beyond Provenance…. Semantic Annotations  Data provenance: information regarding the ‘ place of origin ’ of a data element  Mapping a data element to concepts that collaboratively define it and enable its interpretation – Semantic Annotation  Data provenance paves the path to repeatability of data generation, but it does not enable:  Its (machine) interpretability  Its computability (e.g., discovery) Semantic Annotations make these possible.

Identified and quantified peptides Specific cellular process Lectin Collection of N-glycan ligands Collection of Biosynthetic enzymes Discovery of relationship between biological entities Fragment of Specific protein GlycOProglycO Gene Ontology (GO) Genomic database (Mascot / Sequest ) The inference: instances of the class collection of Biosynthetic enzymes (GNT-V) are involved in the specific cellular process (metastasis). processprocess

Dimensions of expressiveness complexity bivalentMultivalued discrete continu ous Degree of Agreement Informal Semi-Formal Formal Expressiveness XML RDF FOL with functions Current Semantic Web Focus Future research Cf: Guarino, Gruber RDFS/OWL FOL w/o functions

The downside That a structure is not valid according to the ontology could just mean that it is a new kind of structure that needs to be incorporated That a substance can be synthesized according to one pathway does not exclude the synthesis through another pathway

Man 9 GlcNAc 2 Glycan is a Glycosyl Transferase is a synthesizes May Synthesize Mannose contains transfers May Synthesize Lipid  -mannosyl transferase Probabilistic Relationships

For more information –Especially see Glycomics projectGlycomics –SWETO ontologySWETO

Backup slides

Automatic Semantic Annotation of Text: Entity and Relationship Extraction KB, statistical and linguistic techniques

Ontologies – many questions remain How do we design ontologies with the constituent concepts/classes and relationships? How do we capture knowledge to populate ontologies Certain knowledge at time t is captured; but real world changes imprecision, uncertainties and inconsistencies –what about things of which we know that we don’t know? –What about things that are “in the eye of the beholder”? Need more powerful semantics

What we need We need a formalism that can –express the degree of confidence that e.g. a glycan is synthesized according to a certain pathway. –express the probability of a glycan attaching to a certain site on a protein –derive a probability for e.g. a certain gene sequence to be the origin of a certain protein

What we want Validate pathways with experimental evidence. Many pathways still need to be verified. Reason on experimental data using statistical techniques such as Bayesian reasoning Are activities of iso-forms of biosynthetic enzymes dependent on physiological context? (e.g. is it a cancer cell?)

How to power the semantics A major drawback of logics dealing with uncertainties is the assignment of prior probabilities and/or fuzzy membership functions. Values can be assigned manually by domain experts or automatically Techniques to capture implicit semantics –Statistical methods –Machine Learning