Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation. Bioinformatics, July 2003. P.W. Lord, R.D. Stevens, A. Brass and C.A. Goble

Presentation transcript:

Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation
Bioinformatics, July 2003
P.W. Lord, R.D. Stevens, A. Brass and C.A. Goble, University of Manchester
Presented by 임 동혁, July 22, 2005

Contents
Introduction
Semantic Similarity Measures
Validating Semantic Similarity
Investigating Semantic and Sequence Similarity
Semantic Searching of GO Annotated Resources
Discussion

Introduction
Bioinformatics resources hold data in the form of sequences, which are then annotated in scientific natural language as text. Such text is human-readable and understandable, but it is not easy to interpret computationally.
Ontologies provide a mechanism for capturing a view of a domain in a shareable form that is both accessible to humans and computationally amenable. An ontology provides a set of vocabulary terms that label concepts in the domain, connected by relationships such as "is-a" (between parent and child) and "part-of" (between part and whole).

Gene Ontology (1/2)
GO comprises three orthogonal taxonomies, or aspects: molecular function, biological process and cellular component. GO is a rapidly growing collection of about 11,000 phrases, representing terms or concepts, organized as a Directed Acyclic Graph (DAG).

Gene Ontology (2/2)
GO allows improved querying of databases: different resources can be queried with the same term. This shared understanding improves retrieval consistency across resources, as well as recall and precision.
One obvious alternative is to ask for proteins that are semantically similar to a query protein. Semantic similarity has previously been applied to taxonomies of biomedical terms, e.g. Medical Subject Headings (MeSH), to find similar content (by words).

The Gene Ontology
[Figure: part of the GO molecular function DAG, connected by is-a links. Each node shows its term, GO accession and probability p: molecular function (GO:0003674, p=1), signal transducer (GO:0004871, p=0.208), receptor (GO:0004872, p=0.124), transmembrane receptor (GO:0004888, p=0.0997), photoreceptor (GO:0009881, p=0.000433), receptor signaling protein (GO:0005057, p=0.0281), receptor-associated protein (GO:0016962, p=0.00159), chaperone (GO:0003754, p=0.0102), ligand (GO:0005102, p=0.0460).]
Two proteins that are both annotated as "transmembrane receptor" (GO:0004888) have a similar semantic description. If one of them is annotated only as "receptor" (GO:0004872), the pair is semantically less similar.

Semantic Similarity Measure (1/3)
Early techniques (Rada et al., 1989) used path distances between terms. This assumes that all semantic links are of equal weight, which is a poor assumption. For example, "photoreceptor" and "transmembrane receptor" are semantically more closely related than "chaperone" and "signal transducer", yet an equal-weight path count can score the two pairs the same.
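
A minimal Python sketch of an equal-weight, Rada-style path distance follows. The is-a edges are an assumption: they are a hypothetical reconstruction of part of the GO fragment figure, which did not survive extraction. Every edge costs 1 and direction is ignored, so the distance is simply a breadth-first-search hop count; `path_distance` is an illustrative name.

```python
from collections import deque

# Hypothetical child -> parents (is-a) map, reconstructed as an assumption
# from the GO fragment slide; the real figure's edges are not recoverable.
IS_A = {
    "signal transducer": ["molecular function"],
    "chaperone": ["molecular function"],
    "receptor": ["signal transducer"],
    "transmembrane receptor": ["receptor"],
    "photoreceptor": ["receptor"],
}


def undirected_neighbours(is_a):
    """Adjacency map ignoring edge direction, as plain path counting does."""
    adj = {}
    for child, parents in is_a.items():
        for parent in parents:
            adj.setdefault(child, set()).add(parent)
            adj.setdefault(parent, set()).add(child)
    return adj


def path_distance(a, b, is_a=IS_A):
    """Edges on the shortest path between two terms; every edge weighs 1."""
    adj = undirected_neighbours(is_a)
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two terms are not connected


print(path_distance("photoreceptor", "transmembrane receptor"))
print(path_distance("chaperone", "signal transducer"))
```

Under these assumed edges both pairs come out the same number of links apart, which is the insensitivity the slide criticises.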

Semantic Similarity Measure (2/3)
Edges could instead be weighted by depth: the greater the distance from the root of the graph, the more specific the term. However, GO varies widely in the distance of nodes from the root. For example, GO:0005300 is 14 terms deep, while GO:0008435 is only 3 terms deep, yet the latter is not significantly less semantically precise.
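
In a DAG, "distance from the root" is not even a single number: a term with several parents can sit at different depths along different paths. This sketch uses hypothetical placeholder terms rather than real GO accessions, and `depths` is an illustrative helper, not something from the paper; it returns both the shortest and the longest is-a path to the root.

```python
# Hypothetical child -> parents (is-a) map; the names are placeholders, not GO terms.
IS_A = {
    "term A": ["root"],
    "term B": ["term A"],
    "term C": ["term B"],
    "term D": ["term C", "term A"],  # two parents, so two routes to the root
}


def depths(term, is_a=IS_A):
    """Return (shortest, longest) number of is-a edges from `term` up to the root."""
    parents = is_a.get(term, [])
    if not parents:
        return 0, 0  # the root (or any parentless term) has depth 0
    parent_depths = [depths(p, is_a) for p in parents]
    shortest = 1 + min(d[0] for d in parent_depths)
    longest = 1 + max(d[1] for d in parent_depths)
    return shortest, longest


for t in IS_A:
    print(t, depths(t))
```

A depth-based edge weight would therefore need an extra rule for which depth to use, on top of the uneven depths across GO branches noted above.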

Semantic Similarity Measure (3/3)
Resnik (1999) instead uses the usage of terms within a corpus, through the notion of "information content", an idea familiar from most internet search engines. For example, "chaperone" is a more informative term than "signal transducer": the former is used several times, the latter thousands of times. Is-a links are taken into account when counting usage: whenever GO:0004872 (receptor) occurs, GO:0004871 (signal transducer) and GO:0003674 (molecular function) are also counted as occurring. Terms that occur less often are more informative.
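
Information content in Resnik's sense is -ln p(c). This short sketch uses three of the probabilities printed on the GO fragment slide; `information_content` is an illustrative name. It computes ln(1/p), which equals -ln p but returns a clean 0.0 for the root.

```python
import math

# Probabilities as printed on the GO fragment slide (SWISS-PROT-Human corpus).
P = {
    "molecular function": 1.0,
    "signal transducer": 0.208,
    "chaperone": 0.0102,
}


def information_content(term, p=P):
    """-ln p(c), written as ln(1/p): rarer terms get higher scores."""
    return math.log(1.0 / p[term])


for term in P:
    print(f"{term}: {information_content(term):.3f}")
```

The root always scores 0, since every annotation counts as an occurrence of it.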

Probabilities in the Gene Ontology
In the figure above, each node is annotated with its GO accession and the probability of the term occurring in the SWISS-PROT-Human database. The probabilities are computed as follows:
1. Count the number of times each concept occurs.
2. A concept occurs if its term, or any of its descendants, occurs.
3. The probability p(c) of each node is this count divided by the count for the root, so the probability of the root node is 1.
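
The three counting steps can be sketched in Python as follows. The is-a map and the annotation list are toy, hypothetical data, not SWISS-PROT-Human; the point is only that each occurrence is propagated to every ancestor and that counts are normalised by the root's count.

```python
from collections import Counter

# Hypothetical child -> parents (is-a) map; not the real GO structure.
IS_A = {
    "signal transducer": ["molecular function"],
    "chaperone": ["molecular function"],
    "receptor": ["signal transducer"],
    "transmembrane receptor": ["receptor"],
}


def ancestors_and_self(term, is_a=IS_A):
    """The term plus every term reachable by following is-a links upward."""
    found, stack = {term}, [term]
    while stack:
        for parent in is_a.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def term_probabilities(annotations, root="molecular function", is_a=IS_A):
    counts = Counter()
    for term in annotations:
        # Steps 1-2: an occurrence of a term also counts for each of its ancestors.
        counts.update(ancestors_and_self(term, is_a))
    # Step 3: divide by the root's count, so p(root) == 1.
    return {term: n / counts[root] for term, n in counts.items()}


# Toy annotation list: one entry per (protein, term) annotation.
annotations = ["transmembrane receptor", "receptor", "chaperone", "transmembrane receptor"]
probs = term_probabilities(annotations)
print(probs)
```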

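Semantic Similarity between Terms
The simplest measure (Resnik, 1999) is used, based on the information content of the shared parents of the two terms. S(c1, c2) is the set of parental concepts shared by both c1 and c2. Because GO allows multiple paths to a term, the minimum p(c) over this set is taken; this is p_ms, the probability of the minimum subsumer:
$p_{ms}(c_1, c_2) = \min_{c \in S(c_1, c_2)} p(c)$
The similarity score between two terms is then
$\mathrm{sim}(c_1, c_2) = -\ln p_{ms}(c_1, c_2)$
As probability increases, informativeness decreases, so a shared parent with a low p(c) yields a high score.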
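
A minimal sketch of this measure in Python, using the slide's probabilities and an assumed set of is-a edges (the figure's edges are not recoverable from the transcript). As a further assumption, a term counts as one of its own shared parents, so two identical terms score -ln p of that term; `resnik_similarity` is an illustrative name.

```python
import math

# Probabilities from the GO fragment slide; the is-a edges are an assumed reconstruction.
P = {
    "molecular function": 1.0,
    "signal transducer": 0.208,
    "receptor": 0.124,
    "transmembrane receptor": 0.0997,
    "photoreceptor": 0.000433,
    "chaperone": 0.0102,
}
IS_A = {
    "signal transducer": ["molecular function"],
    "chaperone": ["molecular function"],
    "receptor": ["signal transducer"],
    "transmembrane receptor": ["receptor"],
    "photoreceptor": ["receptor"],
}


def ancestors_and_self(term):
    """The term plus every term reachable via is-a links."""
    found, stack = {term}, [term]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def resnik_similarity(c1, c2):
    """-ln p_ms, where p_ms is the smallest p(c) over the shared parents S(c1, c2)."""
    shared = ancestors_and_self(c1) & ancestors_and_self(c2)
    p_ms = min(P[c] for c in shared)
    return math.log(1.0 / p_ms)


print(resnik_similarity("transmembrane receptor", "transmembrane receptor"))
print(resnik_similarity("transmembrane receptor", "receptor"))
print(resnik_similarity("chaperone", "signal transducer"))
```

This reproduces the earlier slide's intuition: two "transmembrane receptor" annotations score higher than "transmembrane receptor" against plain "receptor", and terms whose only shared parent is the root score 0.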

Validating Semantic Similarity
How do we validate such a measure? A protein's sequence relates to its function, so highly similar sequences should be highly semantically similar. Comparing protein sequences in pairs and plotting sequence similarity against semantic similarity should therefore reveal a relationship.
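
The slides do not name a statistic for summarising this relationship. As an assumption, a plain Pearson coefficient over protein pairs is one simple way to quantify it; the score pairs below are made up for illustration and are not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy, made-up protein pairs: (sequence similarity score, semantic similarity score).
pairs = [(20.0, 0.5), (45.0, 1.1), (80.0, 2.0), (150.0, 2.4)]
r = pearson([s for s, _ in pairs], [t for _, t in pairs])
print(f"r = {r:.3f}")
```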

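Adapting the Similarity Measures to GO and SWISS-PROT
"part-of" relationships: terms that become orphans when only is-a links are used were linked directly to the root (e.g. GO:0009542), so that is-a links alone could be used.
Proteins may be annotated with more than a single term. Whereas WordNet-based measures take the maximum similarity over term pairs, for GO the average similarity is used.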
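
A sketch of the two ways of combining term scores into a protein score, assuming Resnik's term similarity as defined on the earlier slide and the same hypothetical is-a edges. `protein_similarity` and its `combine` flag are illustrative names, not from the paper.

```python
import math
from itertools import product

# Slide probabilities; is-a edges are an assumed reconstruction of the figure.
P = {
    "molecular function": 1.0,
    "signal transducer": 0.208,
    "receptor": 0.124,
    "transmembrane receptor": 0.0997,
    "photoreceptor": 0.000433,
    "chaperone": 0.0102,
}
IS_A = {
    "signal transducer": ["molecular function"],
    "chaperone": ["molecular function"],
    "receptor": ["signal transducer"],
    "transmembrane receptor": ["receptor"],
    "photoreceptor": ["receptor"],
}


def ancestors_and_self(term):
    found, stack = {term}, [term]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def term_similarity(c1, c2):
    """Resnik similarity: -ln of the smallest p(c) among shared parents."""
    shared = ancestors_and_self(c1) & ancestors_and_self(c2)
    return math.log(1.0 / min(P[c] for c in shared))


def protein_similarity(terms1, terms2, combine="average"):
    """Combine all term-pair scores between two proteins' annotation sets."""
    scores = [term_similarity(a, b) for a, b in product(terms1, terms2)]
    if combine == "max":
        return max(scores)  # WordNet-style: best-matching pair only
    return sum(scores) / len(scores)  # GO-style: average over all pairs


protein_a = ["transmembrane receptor", "chaperone"]
protein_b = ["photoreceptor"]
avg = protein_similarity(protein_a, protein_b)
best = protein_similarity(protein_a, protein_b, combine="max")
print(avg, best)
```

With the average, an unrelated extra annotation (here "chaperone") pulls the protein score down, whereas the maximum ignores it.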

Comparing Semantic Similarity Across GO Aspects
There is a good correlation between sequence similarity and semantic similarity. The correlation is greatest when semantic similarity is measured over the "molecular function" aspect.

The Relationship Between Semantic Similarity and Evidence Codes
TAS (Traceable Author Statement) is regarded as the highest standard of evidence. When only GO annotations with TAS evidence are considered, the correlation is much greater.
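
Restricting the analysis to TAS annotations amounts to filtering annotations by evidence code before computing any similarities. The `Annotation` record and the protein names are hypothetical; TAS and IEA are real GO evidence codes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    protein: str   # hypothetical protein identifier
    go_id: str     # GO accession
    evidence: str  # GO evidence code, e.g. "TAS" or "IEA"


def filter_by_evidence(annotations, allowed=frozenset({"TAS"})):
    """Keep only annotations whose evidence code is in `allowed`."""
    return [a for a in annotations if a.evidence in allowed]


annotations = [
    Annotation("PROT_1", "GO:0004888", "TAS"),
    Annotation("PROT_1", "GO:0004872", "IEA"),
    Annotation("PROT_2", "GO:0004888", "TAS"),
]
tas_only = filter_by_evidence(annotations)
print(tas_only)
```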

Effect of Using Semantic Links in Semantic Similarity
The measure was also computed considering only links of a single type, "is-a" or "part-of". There is little difference between using all links and using only "is-a" links, because almost all links are of the "is-a" type (6167 of 6202).
No links: drop in the middle part; proteins share similar (links are included in the semantic similarity measure).
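
A sketch of computing ancestors over only some link types, with terms orphaned by that restriction re-attached to the root (as described for part-of links on the adaptation slide). The edges and term names are hypothetical placeholders; `parents_using` and `ancestors` are illustrative helpers.

```python
# Hypothetical typed edges (child, relation, parent); names are placeholders.
EDGES = [
    ("term A", "is_a", "root"),
    ("term B", "is_a", "term A"),
    ("term C", "part_of", "term A"),
    ("term D", "is_a", "term C"),
]


def parents_using(edges, allowed, root="root"):
    """Child -> parents map keeping only the allowed relation types.

    Any non-root term left without a parent is attached directly to the
    root, so every term still has a path upward.
    """
    terms = {c for c, _, _ in edges} | {p for _, _, p in edges}
    parents = {t: [] for t in terms}
    for child, rel, parent in edges:
        if rel in allowed:
            parents[child].append(parent)
    for term, ps in parents.items():
        if term != root and not ps:
            ps.append(root)
    return parents


def ancestors(term, parents):
    """All terms reachable upward from `term` (excluding the term itself)."""
    found, stack = set(), [term]
    while stack:
        for p in parents[stack.pop()]:
            if p not in found:
                found.add(p)
                stack.append(p)
    return found


is_a_only = parents_using(EDGES, {"is_a"})
all_links = parents_using(EDGES, {"is_a", "part_of"})
print(sorted(ancestors("term D", is_a_only)))
print(sorted(ancestors("term D", all_links)))
```

Dropping the part-of edge cuts "term D" off from "term A", so the two link sets give different shared-parent sets and hence potentially different similarity scores.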

Analysis (1/2)
Some protein pairs show very high semantic similarity but little sequence similarity. Causes include: "polymorphic" groups, i.e. two or more classes of protein involved in the same process (e.g. proteins that heterodimerize, or sub-families); hypervariable protein families; arbitrary cases; and mis-annotations, such as a protein described as "x-like" in SWISS-PROT but annotated as "x" in GO, or spelling mistakes.

Analysis (2/2): Example

Semantic Searching of GO Annotated Resources
A search tool was developed. It compares a given query protein against all the others in SWISS-PROT-Human and generates a ranked list of semantically similar proteins, e.g. for the query "OPSR_HUMAN".
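
A minimal sketch of such a search, assuming average Resnik similarity between annotation sets, the slide probabilities with assumed is-a edges, and a handful of hypothetical proteins. `PROT_A` and the others are placeholders, not SWISS-PROT entries, and the real tool's interface is not described in the slides.

```python
import math
from itertools import product

# Slide probabilities; is-a edges are an assumed reconstruction of the figure.
P = {
    "molecular function": 1.0,
    "signal transducer": 0.208,
    "receptor": 0.124,
    "transmembrane receptor": 0.0997,
    "photoreceptor": 0.000433,
    "chaperone": 0.0102,
}
IS_A = {
    "signal transducer": ["molecular function"],
    "chaperone": ["molecular function"],
    "receptor": ["signal transducer"],
    "transmembrane receptor": ["receptor"],
    "photoreceptor": ["receptor"],
}
# Hypothetical proteins and their GO annotations.
PROTEINS = {
    "PROT_A": ["transmembrane receptor"],
    "PROT_B": ["receptor"],
    "PROT_C": ["chaperone"],
    "PROT_D": ["photoreceptor"],
    "PROT_E": ["transmembrane receptor"],
}


def ancestors_and_self(term):
    found, stack = {term}, [term]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def term_similarity(c1, c2):
    shared = ancestors_and_self(c1) & ancestors_and_self(c2)
    return math.log(1.0 / min(P[c] for c in shared))


def protein_similarity(terms1, terms2):
    scores = [term_similarity(a, b) for a, b in product(terms1, terms2)]
    return sum(scores) / len(scores)


def semantic_search(query, proteins=PROTEINS):
    """Rank every other protein by average semantic similarity to the query."""
    q_terms = proteins[query]
    scored = [
        (pid, protein_similarity(q_terms, terms))
        for pid, terms in proteins.items()
        if pid != query
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


for pid, score in semantic_search("PROT_A"):
    print(f"{pid}\t{score:.3f}")
```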

Discussion
A semantic similarity measure over GO was investigated. In all cases, semantic similarity is correlated with sequence similarity; the correlation is strongest for the molecular function aspect of GO and for annotations with the evidence code "Traceable Author Statement".
Future work: the effect of the different semantic links in ontologies, and co-expression as revealed by microarray experiments, for which the biological process aspect is expected to be of great use.