Sandra Orchard: Introduction to Molecular Interaction Data


1 Sandra Orchard (orchard@ebi.ac.uk) Introduction to Molecular Interaction Data

2 Living cells contain crowded and diverse molecular environments
- Proteins constitute ~30% of E. coli and ~5% of yeast cytoplasm by weight
- ~2000 protein types are co-expressed and co-localized in yeast cytoplasm

3 Example of a protein-protein interaction (PPI) network
- Nodes: proteins
- Edges: interactions
- >80% of proteins are connected in one giant cluster of the PPI network
- Small-world effect: median network distance is 6 steps
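The two network statistics on this slide (fraction of proteins in the giant cluster, median network distance) can be computed from an edge list with a breadth-first search. The following is a minimal sketch using only the Python standard library; the function names and the toy edge list are illustrative assumptions, not part of the original presentation.

```python
from collections import deque
from statistics import median

def build_graph(edges):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def bfs_distances(graph, start):
    """Number of steps from start to every node reachable from it."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def giant_component_fraction(graph):
    """Fraction of all nodes that sit in the largest connected cluster."""
    if not graph:
        return 0.0
    seen, largest = set(), 0
    for node in graph:
        if node not in seen:
            component = set(bfs_distances(graph, node))
            seen |= component
            largest = max(largest, len(component))
    return largest / len(graph)

def median_distance(graph):
    """Median shortest-path length over all connected pairs of distinct nodes."""
    lengths = [d for node in graph
               for d in bfs_distances(graph, node).values() if d > 0]
    return median(lengths)

# Toy network (hypothetical protein names): a chain A-B-C-D-E plus a separate pair F-G.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("F", "G")]
toy_graph = build_graph(toy_edges)
fraction = giant_component_fraction(toy_graph)   # 5 of 7 nodes
typical_steps = median_distance(toy_graph)
```

On a real interactome the all-pairs search is quadratic in the number of proteins, so sampling start nodes may be preferable.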

4 Why is it useful to study PPI networks?
- Proteins are the workhorses of the cell: enzymes, structural proteins, signal transduction, transport, transcription, translation and degradation, traversing membranes ... all done as a functional/regulatory network.
- By mapping these interactions we can map cellular pathways, their interconnectivities and their dynamic regulation.
- One way to predict protein function is through identification of binding partners: Guilt by Association.
- If the function of at least one of the components with which the protein interacts is known, that should let us assign its function(s) and pathway(s).
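As a rough illustration of Guilt by Association, the sketch below counts the annotations of a protein's direct binding partners and ranks them. The vote-counting rule, the function name and the toy data are assumptions made for illustration only; real function prediction would weigh evidence quality as well.

```python
from collections import Counter

def predict_function(protein, interactions, annotations):
    """Rank candidate functions for `protein` by how many of its
    direct interaction partners carry each annotation."""
    partners = set()
    for a, b in interactions:
        if a == protein:
            partners.add(b)
        elif b == protein:
            partners.add(a)
    partners.discard(protein)  # ignore self-interactions

    votes = Counter()
    for partner in partners:
        for term in annotations.get(partner, ()):
            votes[term] += 1
    return votes.most_common()

# Hypothetical data: X is uncharacterised, P1-P3 have known functions.
toy_interactions = [("X", "P1"), ("X", "P2"), ("P3", "X"), ("P1", "P2")]
toy_annotations = {
    "P1": ["DNA repair"],
    "P2": ["DNA repair"],
    "P3": ["transport"],
}
ranked = predict_function("X", toy_interactions, toy_annotations)
```

Here `ranked` lists "DNA repair" first because two of the three partners carry it.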

5 Why is it useful to study the structure of PPI networks?
- Biological networks share common properties, which can be used to understand cell behaviour
- Can help us relate network structure to biological function
- Understand a protein's relative position in a network
- Correlate conserved functional modules with protein complexes

6 Properties of networks
- Scale-free effect: the majority of nodes in a scale-free network have only a few connections to other nodes, whereas some nodes (hubs) are connected to many other nodes in the network
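The scale-free picture can be inspected by tabulating node degrees: most nodes should fall in the low-degree bins and a few hubs in the high-degree tail. The sketch below is a minimal, standard-library illustration; the hub threshold and the toy edge list are arbitrary assumptions.

```python
from collections import Counter

def degrees(edges):
    """Degree of each node, counting each undirected pair once
    and ignoring self-interactions."""
    unique_pairs = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for pair in unique_pairs:
        for node in pair:
            degree[node] += 1
    return degree

def degree_distribution(degree):
    """Map each degree k to the number of nodes that have it."""
    return Counter(degree.values())

def hubs(degree, min_degree):
    """Nodes whose degree reaches a chosen (arbitrary) threshold."""
    return sorted(node for node, k in degree.items() if k >= min_degree)

# Hypothetical network: H binds five partners, a and b also bind each other.
# ("a", "H") repeats ("H", "a") and is counted only once.
toy_edges = [("H", "a"), ("H", "b"), ("H", "c"), ("H", "d"), ("H", "e"),
             ("a", "b"), ("a", "H")]
toy_degree = degrees(toy_edges)
toy_distribution = degree_distribution(toy_degree)
toy_hubs = hubs(toy_degree, min_degree=4)
```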

7 Properties of networks
- Scale-free networks are stable: if failures occur at random, and the vast majority of proteins have a small degree of connectivity, the likelihood that a hub protein is affected is small.
- Even if a hub fails, the network will generally not lose its connectedness, because of the remaining hubs.
- However, if a few major hubs are lost, the network breaks up into a set of rather isolated graphs.
- Many cancer-linked proteins are hub proteins.
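The contrast between random failure and loss of hubs can be tried out on a toy graph by removing nodes and measuring the largest remaining connected cluster. This is only a sketch under simple assumptions: the two-hub network, the function names and the fixed random seed are all made up for illustration.

```python
import random

def largest_component(graph, removed=()):
    """Size of the largest connected cluster after deleting `removed` nodes."""
    removed = set(removed)
    seen, best = set(), 0
    for start in graph:
        if start in removed or start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in graph[node]:
                if neighbour not in removed and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best

def targeted_attack(graph, n):
    """Pick the n best-connected nodes (the hubs)."""
    return sorted(graph, key=lambda node: len(graph[node]), reverse=True)[:n]

def random_failure(graph, n, seed=0):
    """Pick n nodes uniformly at random (seeded for repeatability)."""
    return random.Random(seed).sample(sorted(graph), n)

# Hypothetical network: two linked hubs, each with ten low-degree partners.
toy = {}
for hub in ("hub1", "hub2"):
    for i in range(10):
        leaf = f"{hub}_leaf{i}"
        toy.setdefault(hub, set()).add(leaf)
        toy.setdefault(leaf, set()).add(hub)
toy["hub1"].add("hub2")
toy["hub2"].add("hub1")

intact_size = largest_component(toy)
after_attack = largest_component(toy, targeted_attack(toy, 2))
after_two_leaves = largest_component(toy, ["hub1_leaf0", "hub2_leaf0"])
after_random = largest_component(toy, random_failure(toy, 2))
```

Removing both hubs leaves only isolated nodes, whereas removing two low-degree nodes barely changes the largest cluster.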

8 Properties of networks
- Scale-free networks are resistant to random failure but vulnerable to targeted attack, specifically against hubs.
- This property has been held to account for the robustness of biological networks to perturbations such as mutation and environmental stress.
- One model of a proteome views date hubs as global or 'higher level' connectors between modules, while party hubs function inside modules at a 'lower level' of the organization.

9 Properties of networks
- Given the current limited coverage (with a bias towards small, soluble cytoplasmic proteins) and the variable quality of interaction data, the observed scale-free topology of existing interactome maps cannot be confidently extrapolated to complete interactomes.
- There is a real need to increase coverage through further experimentation and increased data input into interaction databases.

10 Why are there so many issues with interaction data?
1. Wide variety of methods for demonstrating molecular interactions; all have their strengths and weaknesses
2. No single method accurately defines an interaction as being a true binary interaction observed under physiological conditions

11 Interaction Detection Methods
1. Complementation assays
The function of the readout mechanism can be split into two independent parts, each fused to one of the two proteins of interest; the readout is only reconstituted when the two halves are brought into close proximity by binding of the fusion proteins.
Typified by yeast two-hybrid (Y2H)
Advantages
- Very high numbers of coding sequences assayed in a relatively simple experiment
- Wide variety of interactions detected and characterized following one single commonly used protocol
- Binding sites can be accurately mapped
- In vivo assay

12 1. Complementation assays
Disadvantages
Technical
- Spurious activation of reporter genes, e.g. self-activators (countered by using multiple reporter genes or by swapping the two domains between the two proteins)
- Mutational events leading to an increase in the rate of transcription
- Fusion to irrelevant small peptides
- The cDNA for the interacting protein might not be represented in the library (or may be under-represented)
- No expression of the fusion protein
- Insufficient folding and/or stability of a fusion protein
Biological
- Possibility of indirect interactions: yeast proteins may act as a bridge
- Subcellular location: proteins are brought into proximity in the nucleus. This may not be the physiological location of one of the proteins, so proteins that would not normally be co-expressed or co-located are brought together
- Different environment in yeast and mammalian cells: loss of physiological control
- Absence of the required post-translational modifications
- Toxicity of fusion proteins

13 2. Affinity-based Assays
Techniques which depend upon the strength of the interaction between two entities.
Typified by affinity chromatography, pulldown and coimmunoprecipitation
Advantages
- Proteins can be in their native state and at their native concentration (unless transfected)
- Transfection or prior isolation of proteins allows binding sites to be mapped and binary interactions to be demonstrated

14 2. Affinity-based Assays
Disadvantages
Technical
- Participant determination is more problematic. Antibody (Ab) detection depends on prior knowledge and good-quality reagents. Mass spectrometry determination is still of variable quality
Biological
- Mixing of compartments during cell lysis/purification, i.e. interacting proteins might not be in the same cellular compartment
- Does not indicate whether the interaction is direct (except when in vitro)
- Can pull down entire pathways, but very transient or weak interactions are probably missed

15 3. Physical methods
Depend on physical properties of molecules to enable measurement of an interaction
Typified by X-ray crystallography
Advantages
- High-quality data
- Can be quantitative (e.g. SPR)
- Can be very detailed

16 3. Physical methods
Disadvantages
Technical
- Tend to rely on large amounts of purified proteins
- Tend not to work well on hydrophobic proteins, e.g. transmembrane proteins
- Very expensive, very low-throughput
Biological
- In vitro techniques: proteins lose all physiological regulation

17 4. Enzymatic Assays
Enzyme/substrate reaction taken as evidence of interaction
Advantages
- One of the few ways of identifying transient interactions
Disadvantages
- Can only use in vitro data; too many unknowns if performed in whole cells
- Many enzymes are promiscuous in vitro
- Requires purified protein

18 5. Co-localization
Advantage
- The only proof that two molecules are expressed at the same time and in the same space under 'normal' conditions
Disadvantage
- No actual proof of a physical interaction

19 Molecular Interactions
- All data are artefactual to a greater or lesser extent
- Each interaction determination builds a degree of confidence in an interaction
- Users need to understand this before attempting to interpret molecular interaction experiments

20 Why do we need interaction databases?
- Issues with all interaction data: a true picture can only be built up by combining data derived using multiple techniques and from multiple laboratories
- Problematic for any bench researcher to do: issues with data formats, molecular identifiers, sheer volume of data
- Molecular interaction databases are publicly funded to collect these data and annotate them in the format most useful to researchers
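One simple way to picture "combining data derived using multiple techniques and multiple laboratories" is to group evidence records by interacting pair and count the distinct detection methods and publications behind each pair. The sketch below is a hypothetical illustration: the record layout, the thresholds and the example records are assumptions, and it is not the scoring scheme of any particular database.

```python
from collections import defaultdict

def summarise_evidence(records):
    """Group (protein_a, protein_b, method, publication) records by
    unordered pair and collect the distinct methods and publications."""
    evidence = defaultdict(lambda: {"methods": set(), "publications": set()})
    for a, b, method, publication in records:
        pair = tuple(sorted((a, b)))  # A-B and B-A are the same interaction
        evidence[pair]["methods"].add(method)
        evidence[pair]["publications"].add(publication)
    return dict(evidence)

def well_supported(evidence, min_methods=2, min_publications=2):
    """Pairs backed by several methods and several papers
    (thresholds are arbitrary and purely illustrative)."""
    return sorted(
        pair for pair, ev in evidence.items()
        if len(ev["methods"]) >= min_methods
        and len(ev["publications"]) >= min_publications
    )

# Hypothetical evidence records.
toy_records = [
    ("P1", "P2", "two hybrid", "paper1"),
    ("P2", "P1", "coimmunoprecipitation", "paper2"),
    ("P1", "P3", "two hybrid", "paper1"),
    ("P1", "P3", "two hybrid", "paper3"),
]
toy_evidence = summarise_evidence(toy_records)
supported_pairs = well_supported(toy_evidence)
```

Only the P1-P2 pair passes here: it is seen by two methods in two papers, whereas P1-P3 is seen twice but always by the same method.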

21 Interaction Databases
Deep curation
- IntAct: active curation, broad species coverage, all molecule types
- MINT: active curation, broad species coverage, PPIs; interactions now in IntAct
- DIP: active curation, broad species coverage, PPIs
- MatrixDB: active curation, extracellular matrix molecules only
- MPACT: no curation, limited species coverage, PPIs
- BIND: ceased curating 2006/7, broad species coverage, all molecule types; information becoming dated
Shallow curation
- BioGRID: active curation, limited number of model organisms
- HPRD: ceased curation 2010, human-centric, modelled interactions
- *InnateDB: active curation, interactions involved in innate immunity
- *I2D: active curation, PPIs involved in cancer

22 Why are data standards essential?
- Prior to 2003, many databases meant many formats; the onus was on the user to reformat when merging data
- File conversion inevitably leads to data loss
- Many formats compromised tool development: each tool developed tended to be database-specific

23 PSI-MI XML format
- Community standard for molecular interactions
- XML schema and detailed controlled vocabularies
- Jointly developed by major data providers: BIND, CellZome, DIP, GSK, HPRD, Hybrigenics, IntAct, MINT, MIPS, Serono, U. Bielefeld, U. Bordeaux, U. Cambridge, and others
- Version 1.0 published in February 2004: The HUPO PSI Molecular Interaction Format - A community standard for the representation of protein interaction data. Henning Hermjakob et al., Nature Biotechnology 2004, 22, 176-183.
- Version 2.5 published in October 2007: Broadening the Horizon - Level 2.5 of the HUPO-PSI Format for Molecular Interactions. Samuel Kerrien et al., BioMed Central, 2007.

24 PSI-MI XML benefits
- Collecting and combining data from different sources has become easier
- Standardized annotation through PSI-MI ontologies
- Tools from different organizations can be chained, e.g. analysis of IntAct data in Cytoscape
- Home page: http://www.psidev.info/MI

25 Controlled vocabularies: www.ebi.ac.uk/ols

26 IMEx
- There are many databases providing large amounts of data, BUT IntAct, DIP and MINT provide original curation, whereas many databases (iRefIndex, APID, I2D, STRING ...) do not curate but merge data from curation resources
- IntAct data is repeated in multiple other resources
- Curation databases formed a consortium to provide users with a single, non-redundant dataset

27 IMEx
- Independent molecular interaction resources, all separately funded and with their own curation priorities
- Spent several years developing:
  - Common curation standards for detailed curation and a joint curation manual
  - Common data formats: all data downloadable in PSI formats (PSI-MI MITAB/XML)
- IMEx is an instance of PSICQUIC; specific records are tagged as part of the IMEx set, and only these records are searchable and downloadable on the website.
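To give a feel for the tab-delimited PSI format mentioned above, here is a minimal sketch that splits one MITAB 2.5 line into its 15 columns and unpacks `db:id(text)` style fields. The column labels are this sketch's own names, the identifiers in the example line are placeholders rather than real records, and the parser assumes that `|` never occurs inside a quoted value; a production parser should follow the published format description.

```python
import re

# Short labels for the 15 MITAB 2.5 columns (names chosen for this sketch).
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_methods", "first_authors", "publication_ids",
    "taxid_a", "taxid_b", "interaction_types", "source_databases",
    "interaction_ids", "confidence_scores",
]

# Matches fields such as uniprotkb:P11111 or psi-mi:"MI:0018"(two hybrid).
FIELD = re.compile(r'^(?P<db>[^:]+):"?(?P<value>[^"(]+)"?(?:\((?P<text>.*)\))?$')

def parse_line(line):
    """Split one MITAB line into {column: [values]}; '-' means no value."""
    cells = line.rstrip("\n").split("\t")
    if len(cells) < len(MITAB25_COLUMNS):
        raise ValueError("expected at least 15 tab-separated columns")
    return {
        name: [] if cell == "-" else cell.split("|")
        for name, cell in zip(MITAB25_COLUMNS, cells)
    }

def parse_field(value):
    """Unpack a db:id(text) field, or return None if it has another shape."""
    match = FIELD.match(value)
    return match.groupdict() if match else None

# Made-up example line; accessions and IDs are placeholders only.
example_line = "\t".join([
    "uniprotkb:P11111", "uniprotkb:P22222", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "Example et al. (2010)", "pubmed:00000000",
    "taxid:9606(human)", "taxid:9606(human)",
    'psi-mi:"MI:0915"(physical association)', 'psi-mi:"MI:0469"(IntAct)',
    "intact:EBI-0000000|imex:IM-00000-1", "-",
])
row = parse_line(example_line)
method = parse_field(row["detection_methods"][0])
```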

28 IMEx
- Coordinated and non-redundant curation: databases ensure that each paper is curated once, and once only, by a single member database
- Each paper is registered with a central database, IMEx Central, which ensures curation is not repeated by a second database

29 IMEx
- Common accession number space: all submitted data gets an IMEx ID and is searchable on the IMEx site, the site of submission and multiple member database sites

30 IMEx partners
- IntAct: Active
- DIP: Active
- MINT: Active
- MatrixDB: Active
- I2D: Active
- InnateDB: Active
- Molecular Connections: Active
- UniProtKB: Active
- UCL-BHF: Active
- MBInfo: Active
- (MPACT: Inactive)
- (BIND: Inactive)
- (MPIDB: Inactive); data in IntAct
- PRIMESDB: Observer
- BioGRID: Observer

31 MBInfo


34 IMEx statistics
- May 2014: 311,141 binary interactions from 9010 publications

35 Curation levels
1. IMEx: as agreed by the consortium; detailed curation, with full description of constructs (tags, binding sites etc. detailed as features) as well as experimental information
2. MIMIx (Minimal information…): full experimental information but no details of the constructs
3. Minimal: no experimental detail

36 IMEx
- In production mode since February 2010
- Since 3/2009 supported by the European Commission under PSIMEx, contract number FP7-HEALTH-2007-223411, with additional partners Vital-IT, Nature, Wiley, BiaCore (GE), U. Maryland, CSIC, TU Munich, MIPS, SCBIT (Shanghai)


