EBI is an Outstation of the European Molecular Biology Laboratory. Master title Molecular Interactions – the IntAct Database Sandra Orchard EMBL-EBI 18.09.2015.

Presentation transcript:

EBI is an Outstation of the European Molecular Biology Laboratory. Master title Molecular Interactions – the IntAct Database Sandra Orchard EMBL-EBI

What data are we dealing with? Why are we interested in interactions?
1. As a means of precisely understanding a protein's role inside a specific cell type
2. To verify data: visualise your own interaction network over the known interaction space
3. Guilt by association: it may be the only means of predicting a protein's function
4. As building blocks for systems biology and drug discovery

Why are there so many issues with interaction data?
1. Wide variety of methods for demonstrating molecular interactions, all of which have their strengths and weaknesses
2. No single method accurately defines an interaction as being a true binary interaction observed under physiological conditions

Interaction Detection Methods
1. Complementation assays
The readout mechanism is split into two independent parts, each fused to one of the two proteins of interest; the readout is only reconstituted when the two halves are brought into close proximity by binding of the fusion proteins. Typified by Y2H.
Advantages
- Very high numbers of coding sequences assayed in a relatively simple experiment
- Wide variety of interactions detected and characterized following one single, commonly used protocol
- Binding sites can be accurately mapped
- In vivo assay

1. Complementation assays: Disadvantages
Technical
- Spurious activation of reporter genes, e.g. self activators (countered by using multiple reporter genes or by swapping the two domains between the two proteins)
- Mutational events leading to an increase in the rate of transcription
- Fusion to irrelevant small peptides
- The cDNA for the interacting protein might not be represented in the library (or may be under-represented)
- No expression of the fusion protein
- Insufficient folding and/or stability of a fusion protein
Biological
- Possibility of indirect interactions: yeast proteins may act as a bridge
- Subcellular location: proteins are brought into proximity in the nucleus. This may not be the physiological location of one of the proteins, so proteins may be brought together which would not normally be co-expressed or co-located
- Different environment in yeast and mammalian cells: loss of physiological control
- Absence of the required post-translational modifications
- Toxicity of fusion proteins

2. Affinity-based Assays
Techniques which depend upon the strength of the interaction between two entities. Typified by affinity chromatography, pulldown and coimmunoprecipitation.
Advantages
- Proteins can be in their native state and at their native concentration (unless transfected)
- Transfection or prior isolation of proteins allows binding sites to be mapped and binary interactions to be demonstrated

2. Affinity-based Assays: Disadvantages
Technical
- Participant determination is more problematic. Antibody detection depends on prior knowledge and good-quality reagents. Mass spec determination is still of variable quality
Biological
- Mixing of compartments during cell lysis/purification, i.e. interacting proteins might not be in the same cellular compartment
- Does not indicate whether the interaction is direct (except when in vitro)
- Can pull down entire pathways, but very transient, weak interactions are probably missed

3. Physical methods
Depend on physical properties of molecules to enable measurement of an interaction. Typified by X-ray crystallography.
Advantages
- High-quality data
- Can be measurable (e.g. SPR)
- Can be very detailed

3. Physical methods: Disadvantages
Technical
- Tend to rely on large amounts of purified proteins
- Tend not to work well on hydrophobic proteins, e.g. transmembrane proteins
- Very expensive, very low-throughput
Biological
- In vitro techniques: proteins lose all physiological regulation

4. Enzymatic Assays
Enzyme/substrate reaction taken as evidence of interaction.
Advantages
- One of the few ways of identifying transient interactions
Disadvantages
- Can only use in vitro data; too many unknowns if performed in whole cells
- Many enzymes are promiscuous in vitro
- Requires purified protein

Why do we need interaction databases?
- There are issues with all interaction data: a true picture can only be built up by combining data derived using multiple techniques in multiple laboratories
- This is problematic for any bench researcher to do, because of issues with data formats, molecular identifiers and the sheer volume of data
- Molecular interaction databases are publicly funded to collect this data and annotate it in the format most useful to researchers

Interaction Databases
Deep curation
- IntAct: active curation, broad species coverage, all molecule types
- MINT: active curation, broad species coverage, PPIs
- DIP: active curation, broad species coverage, PPIs
- MPACT: no curation, limited species coverage, PPIs
- MatrixDB: active curation, extracellular matrix molecules only
- InnateDB: active curation, interactions involved in innate immunity
- BIND: ceased curating 2006/7, broad species coverage, all molecule types; information becoming dated
Shallow curation
- BioGRID: active curation, limited number of model organisms
- HPRD: active curation, human-centric, modelled interactions
- MPIDB: active curation, microbial interactions

Molecular Interactions Through IntAct. EBI Walkthrough May 2009: EBI Data, Standards and Tools

IntAct goals & achievements
1. Publicly available repository of molecular interactions (mainly PPIs): >250K binary interactions taken from >4,700 publications
2. Data are standards-compliant and available via our website, for download from our FTP site (ftp://ftp.ebi.ac.uk/pub/databases/intact) or via PSICQUIC
3. Open-access versions of the software are provided to allow installation of local IntAct nodes

“Lifecycle of an Interaction” (flow diagram). Labels: Publication (full text, abstract); IntAct Curation (CVs, curation manual); curator; annotate; super curator (check: accept / reject); Sanity Checks (nightly, report); IMEx, MatrixDB, MINT, DIP; Public web site; FTP site

UniProt Knowledge Base: interactions in IntAct use splice variants

UniProt Knowledge Base: IntAct exports interaction data to UniProt. Only interactions detected by specific methods are exported. Mostly physical -> higher-quality interactions!

Controlled vocabularies
Why do we use them? There are, for example, more than 20 ways to write: yeast two hybrid, Y2H, 2H, two-hybrid, …
- Full integration of the PSI-MI ontology
- Over 1,500 terms, fully defined and cross-referenced
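The benefit of a controlled vocabulary can be sketched in a few lines of Python. This is a minimal illustration, not IntAct code: the synonym table is a hypothetical hand-written sample, and the only PSI-MI term used is MI:0018 ("two hybrid"). Every free-text spelling collapses to the same accession, so searches and filters see one method rather than twenty.

```python
import re

# Hypothetical synonym table (an assumption for illustration only).
# MI:0018 is the PSI-MI controlled vocabulary term "two hybrid".
TWO_HYBRID = ("MI:0018", "two hybrid")
SYNONYMS = {
    "yeast two hybrid": TWO_HYBRID,
    "y2h": TWO_HYBRID,
    "2h": TWO_HYBRID,
    "two hybrid": TWO_HYBRID,
}


def normalise(label):
    """Lower-case the label and collapse hyphens/whitespace to single spaces."""
    return re.sub(r"[\s\-]+", " ", label.strip().lower())


def to_cv_term(label):
    """Return (accession, name) for a known spelling, or None if unmapped."""
    return SYNONYMS.get(normalise(label))


if __name__ == "__main__":
    for spelling in ["Yeast two-hybrid", "Y2H", "2H", "two-hybrid", "coip"]:
        print(spelling, "->", to_cv_term(spelling))
```

Unmapped spellings return `None` rather than a guess, which mirrors the curation principle that a term is either in the vocabulary or flagged for review.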

Controlled vocabularies

Data model: support for detailed features, i.e. definition of the interacting interface. Interacting domains. Overlay of ranges on sequence:

How to deal with complexes: some experimental protocols generate complex data, e.g. tandem affinity purification (TAP). One may want to convert these complexes into sets of binary interactions; 2 algorithms are available:
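The slide does not name the two algorithms in the text that survives. A minimal sketch, assuming they are the two expansion models commonly described for co-complex data (spoke and matrix), looks like this; the function names are hypothetical.

```python
from itertools import combinations


def spoke_expansion(bait, preys):
    """Spoke model: pair the bait with every prey, and nothing else."""
    return [(bait, prey) for prey in preys]


def matrix_expansion(participants):
    """Matrix model: pair every participant with every other participant."""
    return list(combinations(participants, 2))


if __name__ == "__main__":
    # A purification with bait A that co-purifies B, C and D.
    print(spoke_expansion("A", ["B", "C", "D"]))
    print(matrix_expansion(["A", "B", "C", "D"]))
```

For a complex of n participants the spoke model yields n-1 pairs and the matrix model n(n-1)/2, so the matrix model adds more pairs that were never directly observed. This is why the expansion method is recorded alongside each binary interaction.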

PSI-MI XML format
- Community standard for molecular interactions
- XML schema and detailed controlled vocabularies
- Jointly developed by major data providers: BIND, CellZome, DIP, GSK, HPRD, Hybrigenics, IntAct, MINT, MIPS, Serono, U. Bielefeld, U. Bordeaux, U. Cambridge, and others
- Version 1.0 published in February 2004: "The HUPO PSI Molecular Interaction Format - A community standard for the representation of protein interaction data", Henning Hermjakob et al., Nature Biotechnology
- Version 2.5 published in October 2007: "Broadening the horizon - Level 2.5 of the HUPO-PSI format for molecular interactions", Samuel Kerrien et al., BMC Biology

Data distribution: PSICQUIC
Proteomics Standards Initiative Common QUery InterfaCe.
- Community effort to standardise the way to access and retrieve data from molecular interaction databases
- Widely implemented by independent interaction data resources
- Based on the PSI standard formats (PSI-MI XML and MITAB)
- Not limited to protein-protein interactions; also covers e.g. drug-target interactions and simplified pathway data
- A registry lists the resources implementing PSICQUIC
Documentation:
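Because every PSICQUIC service exposes the same interface, one client can address any of them by swapping the base URL. The sketch below only builds a REST query URL and makes no network call. The base URL is a placeholder, and the path layout (`/query/<MIQL>`) and parameter names (`format`, `firstResult`, `maxResults`) are assumptions to be checked against the PSICQUIC documentation and registry.

```python
from urllib.parse import quote, urlencode

# Placeholder base URL (assumption): look up real service URLs in the registry.
BASE = "https://example.org/psicquic/webservices/current/search"


def psicquic_query_url(miql, fmt="tab25", first=0, max_results=50, base=BASE):
    """Build a REST URL that asks one PSICQUIC service for a page of results."""
    params = urlencode(
        {"format": fmt, "firstResult": first, "maxResults": max_results}
    )
    # The query is percent-encoded so that ':' and spaces survive in the path.
    return f"{base}/query/{quote(miql, safe='')}?{params}"


if __name__ == "__main__":
    print(psicquic_query_url("species:human"))
```

Querying several resources then amounts to calling the same function with a different `base` for each registered service and merging the MITAB rows returned.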

PSICQUIC: distributing data over multiple sources

IMEx: The International Molecular Exchange Consortium
Group of major public interaction data providers sharing curation effort: DIP, IntAct, I2D, MINT, MatrixDB, Molecular Connections, InnateDB and MPIDB
- Independent molecular interaction resources
- Common curation standards for detailed curation
- Common data formats (PSI-MI XML, PSICQUIC)
- Common accession number space
- Coordinated & non-redundant curation
- In production mode since February 2010
Since 3/2009 supported by the European Commission under PSIMEx, contract number FP7-HEALTH, with additional partners Vital-IT, Nature, Wiley, BiaCore (GE), U. Maryland, CSIC, TU Munich, MIPS, SCBIT (Shanghai)


Performing and visualising a Simple Search. EBI Walkthrough May 2009: EBI Data, Standards and Tools

IntAct – Home Page

Performing a Simple Search

Visualizing - networkView: from search to networkView…

Extend and Visualise your Search

Visualizing - networkView: simple, immediate visualisation of your network. For manipulation, go to Cytoscape

Cytoscape View

Exploring a single interaction in more depth

Interaction detail: first search from the home page… Choice of UniProtKB or Dasty View; UniProt; Taxonomy; PubMed; Expansion method; Details of interaction

Participant information: search result for ‘RAD1’

Interaction Detail: first search from the home page… Choice of UniProtKB or Dasty View; UniProt; Taxonomy; PubMed; Expansion method; Details of interaction

Viewing Interaction Data: first search from the home page… Details of interaction

Viewing Interaction Details: additional information

IntAct – Home Page - Quick Search

Advanced search: Fields Filtering options Add more filtering options

Searching with MIQL: first search from the home page… Using the Molecular Interaction Query Language (MIQL), one can also build complex queries. List of terms one can query on:
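The list of queryable terms is not preserved in this transcript. As a rough sketch of how complex queries are composed, the helper below joins `field:value` clauses with boolean operators in the Lucene-like style MIQL uses. The field names in the example (`species`, `detmethod`) are assumptions and should be checked against the actual MIQL term list; the helper names are hypothetical.

```python
def term(field, value):
    """Render one field:value clause, quoting values that contain spaces."""
    text = str(value)
    if " " in text:
        text = '"' + text.replace('"', '\\"') + '"'
    return f"{field}:{text}"


def all_of(*clauses):
    """Combine clauses so that every one must match."""
    return "(" + " AND ".join(clauses) + ")"


def any_of(*clauses):
    """Combine clauses so that at least one must match."""
    return "(" + " OR ".join(clauses) + ")"


if __name__ == "__main__":
    query = all_of(
        term("species", "human"),
        any_of(term("detmethod", "two hybrid"), term("detmethod", "pull down")),
    )
    print(query)
```

Building queries from small functions keeps the quoting rules in one place, which matters once values contain spaces or quotes.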

Browsing – Molecule View: binary view of o60671_human

Browsing – extending your search


An overview of the PSI-MI XML 2.5 DATA MODEL

Bird’s eye view of PSI-MI XML 2.5: top-level structure unchanged compared to PSI-MI 1.0; use of Id/Ref on main objects

Main objects - Experiment: controlled by ontologies; literature references; confidence measures

Main objects - Interactor: generic interactor; reference to a public database

Main objects - Interaction: controlled by ontology; copyright; experiment; kinetics parameters; confidence value

Basics – Controlled Vocabularies
Why? Ensure data consistency; provide a reliable means of searching & filtering data
How? By providing a reference to an ontology term, using Xref!

Main objects - Participant: e.g. enzyme target; Interactor; e.g. bait, prey; delivery method, expression level…; interactor used experimentally; building of complex

An overview of the PSI-MI TAB DATA MODEL

PSIMITAB Standard Columns
Standard columns (15):
- ID(s) interactor A & B
- Alt. ID(s) interactor A & B
- Alias(es) interactor A & B
- Interaction detection method(s)
- Publication 1st author(s)
- Publication Identifier(s)
- Taxid interactor A & B
- Interaction type(s)
- Source database(s)
- Interaction identifier(s)
- Confidence value(s)
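The 15 standard columns map directly onto a tab-separated row, which makes the format easy to read without special tooling. A minimal Python sketch follows, assuming the usual conventions that `|` separates multiple values in one cell and `-` marks an empty cell. The key names and the sample row are invented for illustration and are not real data; splitting on `|` is a simplification that would break on a pipe inside quoted text.

```python
# Hypothetical key names for the 15 standard columns, in file order.
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_id_a", "alt_id_b", "alias_a", "alias_b",
    "detection_method", "first_author", "publication_id",
    "taxid_a", "taxid_b", "interaction_type", "source_db",
    "interaction_id", "confidence",
]


def parse_mitab_line(line):
    """Turn one MITAB row into a dict of column name -> list of values."""
    cells = line.rstrip("\n").split("\t")
    if len(cells) < len(MITAB25_COLUMNS):
        raise ValueError(f"expected 15 columns, found {len(cells)}")
    return {
        name: ([] if cell == "-" else cell.split("|"))
        for name, cell in zip(MITAB25_COLUMNS, cells)
    }


# Made-up row for illustration only; identifiers are placeholders.
EXAMPLE = "\t".join([
    "uniprotkb:EXAMPLE_A", "uniprotkb:EXAMPLE_B", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "Example et al.", "pubmed:0000000",
    "taxid:9606(human)", "taxid:9606(human)", "-", "-", "-", "-",
])

if __name__ == "__main__":
    row = parse_mitab_line(EXAMPLE)
    print(row["id_a"], row["detection_method"])
```

Extra columns beyond the fifteenth, such as the IntAct extended ones, are ignored by `zip`, so the same function reads both variants.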

A quick look into INTACT EXTENDED MITAB

PSIMITAB Extended Columns
Standard columns (15):
- ID(s) interactor A & B
- Alt. ID(s) interactor A & B
- Alias(es) interactor A & B
- Interaction detection method(s)
- Publication 1st author(s)
- Publication Identifier(s)
- Taxid interactor A & B
- Interaction type(s)
- Source database(s)
- Interaction identifier(s)
- Confidence value(s)
IntAct-specific columns (+11):
- Experimental role(s) of interactors
- Biological role(s) of interactors
- Properties (CrossReference) of interactors
- Type(s) of interactors
- HostOrganism(s)
- Expansion method(s)
- Dataset name(s)

A hands-on introduction to the PSI-MI XML 2.5 JAVA API

PSI-MI XML Java API
- Uses Java 5
- Provides binding between XML and a Java object model
- Tools to read/write XML from/to file
Reading can be done in 2 fashions:
1. Load a whole file into an EntrySet
- Large files can only be loaded if you have enough memory
- Easy to update content and write back to file
2. Index the XML data and give access through an IndexedEntry
- Memory-efficient with large files
- Allows browsing through interactions, experiments…
- Trickier to write updated content (yet feasible)
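The memory trade-off between the two reading styles can be illustrated without the Java API. The Python sketch below uses the standard library on a deliberately simplified, namespace-free XML fragment (an assumption; real PSI-MI XML files declare a namespace and carry far more detail). The streaming reader is a stand-in for the memory-efficient approach, not a reimplementation of IndexedEntry, which indexes the file for random access.

```python
import io
import xml.etree.ElementTree as ET

# Simplified stand-in for a PSI-MI XML file (assumption: no namespace).
SAMPLE = b"""<entrySet><entry><interactionList>
<interaction id="1"/><interaction id="2"/><interaction id="3"/>
</interactionList></entry></entrySet>"""


def load_all(source):
    """Whole-file read: the full tree is held in memory at once."""
    root = ET.parse(source).getroot()
    return [el.get("id") for el in root.iter("interaction")]


def stream_ids(source):
    """Streaming read: yield each interaction id, then free the element."""
    for _event, el in ET.iterparse(source, events=("end",)):
        if el.tag == "interaction":
            yield el.get("id")
            el.clear()  # drop the element's content to keep memory flat


if __name__ == "__main__":
    print(load_all(io.BytesIO(SAMPLE)))
    print(list(stream_ids(io.BytesIO(SAMPLE))))
```

The whole-file version makes edits and write-back trivial because the tree is complete; the streaming version trades that convenience for a memory footprint that does not grow with file size.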

A hands-on introduction to the PSI-MI TAB 2.5 JAVA API

PSI-MI TAB Java API
- Uses Java 5
- Provides binding between TAB and a Java object model
- Tools to read/write TAB from/to file
You can read in 2 fashions:
1. Load a whole file into a Collection
- Large files can only be loaded if you have enough memory
2. Load interactions one at a time using an Iterator
- Memory-efficient with large files
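The same collection-versus-iterator choice can be shown in a few lines of Python. This is an analogy to the two reading styles above rather than the Java API itself; the function names are hypothetical, and skipping lines that start with `#` assumes the file may carry a commented header row.

```python
import io


def _is_data(line):
    """True for non-blank lines that are not '#' header/comment lines."""
    return bool(line.strip()) and not line.startswith("#")


def read_all(handle):
    """Collection style: every row is parsed and kept in a list."""
    return [ln.rstrip("\n").split("\t") for ln in handle if _is_data(ln)]


def iter_rows(handle):
    """Iterator style: rows are parsed lazily, one at a time."""
    for ln in handle:
        if _is_data(ln):
            yield ln.rstrip("\n").split("\t")


if __name__ == "__main__":
    text = "#header\nA\tB\nC\tD\n"
    print(read_all(io.StringIO(text)))
    first = next(iter_rows(io.StringIO(text)))
    print(first)  # only the first row has been parsed at this point
```

With the iterator, a filter such as "count rows for one publication" runs over a multi-gigabyte file while holding a single row in memory.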

Summary
- PSI-MI XML is the de facto standard for molecular interactions
- The Java API makes it easy to handle
- We have code samples & exercises for both APIs! Let me know if you want access to them…
Links: PSI-MI Home page; API Download; Data: ftp://ftp.ebi.ac.uk/pub/databases/intact/current/psi25

Quick introduction to R packages for PSI-MI

Rintact & RpsiXML
- Initiative from Wolfgang Huber’s group at the EBI
- Enables PSI-MI XML data to be read into R data structures
- Enables data analysis using existing packages such as RBGL, ppiStats, apComplex, …
- Currently supports: IntAct, MINT, HPRD, DIP, BioGRID, MIPS/CORUM, MatrixDB, MPACT
API Download. Documentation.