Using Ontology Reasoning to Classify Protein Phosphatases
K. Wolstencroft, P. Lord, L. Tabernero, A. Brass, R. Stevens
University of Manchester


Introduction
Automated classification of proteins into protein subfamilies
1. Background
2. Architecture
3. Advantages
4. Results
5. Future directions

Motivation
Biological data production is fast:
- High-throughput techniques
- Large numbers of species being sequenced
- Large amounts of data remain uncharacterised
Data analysis is now the rate-limiting step.

Why Classify?
- Classification and curation of a genome is the first step in understanding the processes and functions happening in an organism.
- Classification enables comparative genomic studies, drawing on what is already known in other organisms.
- The similarities and differences between processes and functions in related organisms often provide the greatest insight into the biology.

Protein Classification
- Proteins are divided into broad functional classes.
- Protein families share evolutionary relationships and a common domain architecture.
- The relationship between sequence and structure allows searching for distinct structural (and functional) domains within the sequence.
- Domains can be several amino acids long or can span most of the protein.

Example
A search of the linear sequence of protein tyrosine phosphatase type K identified 9 functional domains.
>uniprot|Q15262|PTPK_HUMAN Receptor-type protein-tyrosine phosphatase kappa precursor (EC ) (R-PTP-kappa).
MDTTAAAALPAFVALLLLSPWPLLGSAQGQFSAGGCTFDDGPGACDYHQDLYDDFEWVHV
SAQEPHYLPPEMPQGSYMIVDSSDHDPGEKARLQLPTMKENDTHCIDFSYLLYSQKGLNP
GTLNILVRVNKGPLANPIWNVTGFTGRDWLRAELAVSSFWPNEYQVIFEAEVSGGRSGYI
AIDDIQVLSYPCDKSPHFLRLGDVEVNAGQNATFQCIATGRDAVHNKLWLQRRNGEDIPV………..

Protein Family Classification
Diagnostic domains or motifs often signify family membership, e.g. ALL proteins with a tyrosine protein kinase-specific active site (IPR008266) domain are types of tyrosine kinase.
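
The diagnostic-domain rule can be sketched as a simple lookup. This is a minimal illustration, not the authors' implementation: the function name `families_for` and the placeholder ID `IPR_PLACEHOLDER` are assumptions, and only IPR008266 comes from the slide.

```python
# Diagnostic InterPro domain -> family it signifies.
# IPR008266 is the tyrosine kinase active-site entry named on the slide.
DIAGNOSTIC_DOMAINS = {"IPR008266": "tyrosine kinase"}


def families_for(domains):
    """Return the families implied by any diagnostic domains in `domains`."""
    return sorted({DIAGNOSTIC_DOMAINS[d] for d in domains if d in DIAGNOSTIC_DOMAINS})


# Hypothetical protein with one diagnostic domain and one placeholder domain.
print(families_for(["IPR008266", "IPR_PLACEHOLDER"]))
```

A single diagnostic domain is enough here. The class definitions later in the talk also constrain how many copies of each domain are present.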

Current Techniques
Human expert classification
- Gold standard
- Human knowledge applied to results from bioinformatics analysis tools
Automated use of bioinformatics analysis tools
- Quick
- Less detailed

Automated Methods
Bioinformatics analysis tools:
- Top BLAST hit: annotates a protein as similar to other known proteins. This can chain: protein A is similar to protein B, which is similar to protein C, which is similar to protein D, and so on.
- InterProScan analysis: shows the number and types of domains, but does not provide interpretations.

Human Expert Annotation
- The same similarity searching tools are used for domain/motif identification.
- Humans use expert knowledge to classify proteins according to domain arrangements.
- The presence, order and number of each domain are all important.
- Can an ontology be used to capture this knowledge to the standard of a human annotator?

Ontology Approach
- Use an ontology to capture the rules for protein family membership in a formal OWL representation.
- The ontology contains the human expert knowledge.
- Ontology reasoning can take the place of human analysis of the data.

The Protein Phosphatases
- A large superfamily of proteins involved in the removal of phosphate groups from molecules
- Important in almost all cellular processes
- Involved in diseases, including diabetes and cancer
- Human phosphatases are well characterised

Phosphatase Functional Domains
Andersen et al. (2001) Mol. Cell. Biol.

Determining Class Definitions
R5:
- Contains 2 protein tyrosine phosphatase domains
- Contains 1 transmembrane domain
- Contains 1 fibronectin domain
- Contains 1 carbonic anhydrase domain
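
A class definition of this kind can be read as a set of exact domain counts. The sketch below assumes that reading. The names `R5_DEFINITION` and `matches_definition` are hypothetical, and domains are identified by plain labels rather than InterPro IDs, which the slide does not give.

```python
from collections import Counter

# R5 definition from the slide: domain label -> required number of copies.
R5_DEFINITION = {
    "protein tyrosine phosphatase": 2,
    "transmembrane": 1,
    "fibronectin": 1,
    "carbonic anhydrase": 1,
}


def matches_definition(domains, definition):
    """True if every domain in `definition` occurs exactly the required number of times.

    Domains that the definition does not mention are ignored; whether they
    should disqualify a protein is a modelling choice not stated on the slide.
    """
    counts = Counter(domains)
    return all(counts[name] == n for name, n in definition.items())


# Hypothetical protein with the full R5 domain complement.
candidate = [
    "protein tyrosine phosphatase",
    "protein tyrosine phosphatase",
    "transmembrane",
    "fibronectin",
    "carbonic anhydrase",
]
print(matches_definition(candidate, R5_DEFINITION))
```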

Protégé OWL Modelling

Requirements
- Extract phosphatase sequences from the rest of the protein sequences in a whole genome
- Identify the domains present in each
- Compare these sequences to the formal ontology descriptions
- Classify each protein instance to a place in the hierarchy

Architecture
Diagram components: raw protein sequences, myGrid services, Instance Store, OWL DL ontology, reasoner (Racer), classified protein phosphatases

myGrid Services
- Extract protein phosphatase sequences from the whole genome using simple filtering: the EMBOSS tool patmatdb is used to extract proteins with phosphatase diagnostic motifs
- Perform InterProScan to determine domain architecture
- Transform the InterProScan results into abstract OWL instance descriptions
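
The first two steps above can be sketched as a filter followed by a domain lookup. Everything here is a stand-in: a regular expression takes the place of patmatdb, and a precomputed dictionary takes the place of InterProScan. The motif `C.{5}R` is the C-x(5)-R pattern commonly cited for the protein tyrosine phosphatase active site; the slides do not say which motifs were actually used. The sequences and identifiers are invented.

```python
import re

# Illustrative diagnostic motif (assumption): C, any five residues, then R.
PTP_MOTIF = re.compile(r"C.{5}R")


def filter_candidates(proteome):
    """Keep only proteins whose sequence contains the diagnostic motif."""
    return {pid: seq for pid, seq in proteome.items() if PTP_MOTIF.search(seq)}


def scan_domains(pid, precomputed):
    """Stand-in for a domain scan: look up precomputed domain hits."""
    return precomputed.get(pid, [])


def domain_architectures(proteome, precomputed):
    """Filter the proteome, then attach a domain list to each candidate."""
    return {pid: scan_domains(pid, precomputed) for pid in filter_candidates(proteome)}


# Invented toy data: protA contains the motif, protB does not.
proteome = {"protA": "MKVHCSAGVGRTG", "protB": "MKLLAAGG"}
precomputed = {"protA": ["PTP_domain"], "protB": ["other_domain"]}
print(domain_architectures(proteome, precomputed))
```

The resulting domain lists are what would then be turned into OWL instance descriptions.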

InterProScan Results

Conversion to abstract OWL format
restriction( cardinality(1))
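
A conversion of this kind might count each domain hit and emit one cardinality restriction per domain. The sketch below follows the general `restriction(property cardinality(n))` shape of the OWL abstract syntax. The property naming scheme `contains<ID>` and the IDs `IPR_A` and `IPR_B` are placeholders, since the property and domain identifiers on the slide did not survive.

```python
from collections import Counter


def to_restrictions(domain_hits):
    """Turn a list of domain hits into cardinality restriction strings.

    One restriction per distinct domain; the cardinality is the number of
    times that domain was found. Property names are hypothetical.
    """
    counts = Counter(domain_hits)
    return [
        f"restriction(contains{domain} cardinality({n}))"
        for domain, n in sorted(counts.items())
    ]


# Placeholder domain IDs: IPR_A found twice, IPR_B once.
for line in to_restrictions(["IPR_A", "IPR_B", "IPR_A"]):
    print(line)
```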

Instance Store
- The Instance Store enables reasoning over individuals
- It can support much higher numbers of individuals
- The OWL ontology is loaded into the Instance Store
- A DL reasoner (Racer) is used to compare individuals to the OWL ontology definitions

Instance Store

Example Instances
Protein individual (Dual Specificity Phosphatase DUSE):
restriction( cardinality(1))
Ontology definition of Dual Specificity Phosphatase:
- containsDomain IPR (necessary and sufficient for class membership)
- Also inherits containsDomain IPR from parent class PTP
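
The effect of necessary-and-sufficient definitions with inheritance can be imitated with set containment. This is only a toy stand-in for a DL reasoner such as Racer, which computes subsumption over full OWL class expressions. The class table, the function names, and the domain IDs `IPR_PTP` and `IPR_DUAL` are all hypothetical placeholders.

```python
# Each class lists its parent and the domains it requires itself.
CLASSES = {
    "PTP": {"parent": None, "requires": {"IPR_PTP"}},
    "DualSpecificityPhosphatase": {"parent": "PTP", "requires": {"IPR_DUAL"}},
}


def full_requirements(name):
    """Collect the domains required by a class and all of its ancestors."""
    required = set()
    while name is not None:
        required |= CLASSES[name]["requires"]
        name = CLASSES[name]["parent"]
    return required


def classify(domains):
    """Return the most specific class whose requirements the individual meets."""
    present = set(domains)
    matches = [c for c in CLASSES if full_requirements(c) <= present]
    # The class with the most requirements is treated as the most specific.
    return max(matches, key=lambda c: len(full_requirements(c)), default=None)


print(classify(["IPR_PTP", "IPR_DUAL"]))
print(classify(["IPR_PTP"]))
```

An individual with only the parent's domain stays in the parent class, which is how a protein that fits no known subfamily would be left for closer inspection.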

Results
- Human phosphatases have been classified using the system.
- The ontology classification performed as well as expert classification.
- The ontology system refined the classification:
  - DUSC contains a zinc finger domain: characterised and conserved, but not in the classification
  - DUSA contains a disintegrin domain: previously uncharacterised, evolutionarily conserved

Aspergillus fumigatus
- Phosphatase proteins are very different from human: >100 in human, <50 in A. fumigatus
- Whole subfamilies are missing
- Different fungi-specific phosphorylation pathways?
- No requirement for tissue-specific variations?
- Novel serine/threonine phosphatase with a homeobox, conserved in Aspergillus and closely related species but not in any other (virulence)

Ongoing Work
Phosphatases in other genomes
- Trypanosomes
- Plasmodium falciparum
Other protein families
- Ion channels
- ABC transporters
- Nuclear receptors

Conclusions
- Using an ontology allows automated classification to reach the standard of human expert annotation
- Reasoning capabilities allow interpretation of domain organisation
- Highlights anomalies and variations from what is known
- Allows fast, efficient comparative genomics studies

Acknowledgements
- PhD supervisors: Andy Brass, Robert Stevens
- Group: myGrid, Phil Lord, Carole Goble
- Phosphatase biologist: Lydia Tabernero
- Medical Research Council