1
Protein interactions and Pathways
Jyoti Khadake & Vicky Schneider, Joint Wellcome Trust-EBI Summer School, 24th June 2011
2
This morning session outline
Where do protein sequences come from?
Introduction to protein databases
Introduction to protein interactions
Standardisation of the protein interaction data
IntAct and demo
PSICQUIC/Cytoscape and demo
Data visualisation and network building, including protein information from other sources to enhance networks
3
Where do protein sequences come from?
4
Can you name THE database of protein sequences?
Protein databases
Based on nucleotide sequence similarity
Based on peptide sequences
Organism databases
The organism a protein comes from is as important as its sequence: taxonomy databases
5
UniProtKB factsheet
6
Let’s explore a protein: CDC42
Cell division control protein 42 homolog also known as CDC42 is a protein involved in regulation of the cell cycle. It is a small GTPase of the Rho-subfamily, which regulates signaling pathways that control diverse cellular functions including cell morphology, migration, endocytosis and cell cycle progression. What could go wrong if CDC42 is not doing its job?
7
UniProtKB (CDC42 protein)
Search for the gene CDC42
Check the different proteins retrieved
Organisms
Same organism: Swiss-Prot/TrEMBL
Different referenced databases: PRIDE
Sequences and references
Information about the protein: where is it present, how does it act, what are its properties…
IntAct, Reactome, GOA, InterPro, PDB
How are TrEMBL entries generated?
8
UniProt Knowledge Base
Swiss-Prot: manual annotation (~450,000 proteins)
TrEMBL: automatic annotation (~3,300,000 proteins)
9
UniProt Knowledge Base
Interactions in IntAct use splice variants.
10
UniProt Knowledge Base
Summary:
Master protein: P60953
Splice variants / isoforms: P , P
Note: splice variants have their own sequence. Note the formatting of the accession (AC); it will be useful for detecting splice variants (SV) in the IntAct web interface.
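The accession formatting mentioned above can be checked mechanically. This is a minimal sketch, assuming isoform accessions consist of the master accession followed by a hyphen and a number. The function name `split_accession` and the example isoform accession are illustrative and do not come from the slides.

```python
import re

# Assumption: an isoform accession is "<master accession>-<isoform number>".
ISOFORM_RE = re.compile(r"^(?P<master>[A-Z0-9]+)-(?P<isoform>\d+)$")


def split_accession(ac):
    """Return (master_accession, isoform_number), with None for a master entry."""
    match = ISOFORM_RE.match(ac)
    if match:
        return match.group("master"), int(match.group("isoform"))
    return ac, None


if __name__ == "__main__":
    print(split_accession("P60953"))    # master protein entry
    print(split_accession("P60953-2"))  # illustrative isoform-style accession
```

A check like this is enough to spot splice variants in a list of interactor identifiers exported from IntAct.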
11
UniProt Knowledge Base
Protein Families, domains and motifs
12
What is a protein family?
A protein domain? And a protein motif? Why bother creating a database that groups proteins sharing the same domain?
13
InterPro factsheet Protein Families, domains (and motifs)
14
UniProt Knowledge Base
Summary:
Master protein: P60953
Interaction and pathway databases
Note: splice variants have their own sequence. Note the formatting of the accession (AC); it will be useful for detecting splice variants (SV) in the IntAct web interface.
15
UniProt Taxonomy: web interface to the NCBI taxonomy
16
Newt: necessary for the demo on programmatic access.
Could add a viewlet on OLS here
17
PRIDE: where does the data come from?
18
PRIDE factsheet
19
Protein interactions
20
Interactions Basis of protein action Types
Self
Binary: homomeric or heteromeric
n-ary complexes
Co-localisations
Biological types of interactions
Information in literature and websites
21
Types of Interaction data in IntAct
1. Direct interactions
2. Association
3. Functional interaction
22
In pairs start the next activity:
Match the types of experimental technique (you can find information on the cards provided) with the types of interaction Jyoti just explained: direct interactions, association, functional interaction.
23
Standardisation of the protein interaction data
Ontologies factsheet
24
www.ebi.ac.uk/ols for controlled vocabularies
25
Format for storage and exchange –
PSI-MI XML 2.5
26
Interaction Databases
Deep curation
IntAct: active curation, broad species coverage, all molecule types
MINT: active curation, broad species coverage, PPIs
DIP: active curation, broad species coverage, PPIs
MPACT: ? curation, limited species coverage, PPIs
MatrixDB: active curation, extracellular matrix molecules only
BIND: ceased curating 2006/7, broad species coverage, all molecule types; information becoming dated
Shallow curation
BioGRID: active curation, limited number of model organisms
HPRD: active curation, human-centric, modelled interactions
MPIDB: active curation, microbial interactions
27
The IMEx consortium
28
IntAct
29
How to model an interaction
[Diagram labels: Publication; Experiment1, Experiment2; Interaction1 to Interaction4; Participant1 to Participant3 (roles, features, preparations); Protein1, Protein2]
An interaction can have one participant (auto-phosphorylation) or many (binary or n-ary).
It can involve proteins, but also small molecules, DNA, RNA…
A participant is the specific instance of the interactor (e.g. a protein) in the context of an interaction, e.g. Interactor: P12345; Participant: P12345 with GST tag and mutated residue.
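The interactor/participant distinction can be sketched as a small data model. The class and field names below are hypothetical and chosen for illustration; they are not IntAct's actual schema, and the publication identifier is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Interactor:
    """The molecule itself, e.g. a protein identified by an accession."""
    accession: str
    molecule_type: str = "protein"


@dataclass
class Participant:
    """One specific instance of an interactor within one interaction."""
    interactor: Interactor
    experimental_role: Optional[str] = None      # e.g. bait, prey
    features: List[str] = field(default_factory=list)


@dataclass
class Experiment:
    publication_id: str                          # placeholder identifier
    detection_method: str


@dataclass
class Interaction:
    experiment: Experiment
    participants: List[Participant]

    def arity(self):
        """1 = self interaction, 2 = binary, more = n-ary."""
        return len(self.participants)


protein = Interactor("P12345")
tagged = Participant(protein, "bait", ["GST tag", "mutated residue"])
plain = Participant(protein, "prey")
experiment = Experiment("PMID:placeholder", "pull down")
interaction = Interaction(experiment, [tagged, plain])
print(interaction.arity())
```

The same `Interactor` appears twice here with different features, which is the point of keeping participants separate from interactors.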
30
Main objects - Experiment
Literature references Controlled by Ontologies Confidence measures
31
Main objects - Participant
Interactor Building of Complex e.g. enzyme target e.g. bait, prey Delivery method expression level… Interactor used experimentally
32
IntAct Search MITab From MiTab to detailed view Expanding network
Network view - TBC Other data that can be visualised
33
IntAct – Home Page http://www.ebi.ac.uk/intact
Complete redesign of the web site, released in July
Use of tabs for better separation
Just reached 200,000 interaction evidences in September '09
34
Software demonstration
Many ways to search data!
Simple, yet powerful search engine
Advanced search: how to build complex queries
Searching by ontology terms
Searching by chemical substructure
IntAct home page
Use the simple search box (highlight search by xref, annotation, shortlabel, fullname…)
GO: : regulation of cell cycle (matches 197 proteins)
Show detailed protein view ()
Show binary view
Experiment view (details of experiments, interactions, participants, features)
Linking to InterPro search
Linking to external applications: Mine, HierarchView
35
Details of interaction
Simple search: first search from the home page…
Each line is a binary interaction evidence reported by a scientific publication
Evidences are grouped by molecule pair (allowing for subsequent filtering should you need to)
Data can be downloaded in standard formats (see table header)
Only 30 interactions per page for speedy loading
One can customize the list of columns by clicking on “Change Column Displayed”
Details of interaction: UniProt Taxonomy, PubMed, OLS, Complex?
36
Downloading & Customizing
First search from the home page…
3 formats to choose from: different flavours of PSI-MI TAB (MITAB) and PSI-MI XML (XML limited to 300 interactions for download size considerations)
One can choose the columns to be displayed
Note: the column selection is not saved between sessions
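A downloaded MITAB file is plain tab-separated text, so it is easy to read in a script. The sketch below assumes the 15-column MITAB 2.5 layout, with "-" for an empty field and "|" between multiple values. The column names are my own labels and the sample line is invented for illustration.

```python
# Assumed MITAB 2.5 column order; the names are illustrative labels only.
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_id_a", "alt_id_b", "alias_a", "alias_b",
    "detection_method", "first_author", "publication", "taxid_a", "taxid_b",
    "interaction_type", "source_db", "interaction_id", "confidence",
]


def parse_mitab_line(line):
    """Map one MITAB line to {column name: list of values}.

    Simplification: values are split on every "|", so a "|" inside a quoted
    description would be split too.
    """
    fields = line.rstrip("\n").split("\t")
    record = {}
    for name, value in zip(MITAB25_COLUMNS, fields):
        record[name] = [] if value == "-" else value.split("|")
    return record


# Invented example line; identifiers are placeholders, not real evidence.
sample = "\t".join([
    "uniprotkb:P60953", "uniprotkb:P12345", "-", "-", "-", "-",
    "-", "-", "pubmed:placeholder", "taxid:9606", "taxid:9606",
    "-", "-", "-", "-",
])
record = parse_mitab_line(sample)
print(record["id_a"], record["id_b"])
```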
37
Searching –more How to build complex queries…
Quick search (top box under the search tab) provides the same functionality as the search field shown on every tab, with added examples (all examples can be run by clicking on them) and documentation.
Ontology search allows users to search by ontology terms and provides suggestions as you type.
Chemical search: more on that on the next slides.
Advanced fields: more on that on the next slides.
38
Searching – Fields Unsure how to build your own complex query ?
How to build complex queries…
39
Searching – Fields Some fields provide easy ways to select terms
How to build complex queries…
40
Software demonstration
Single interaction details
Selecting an interaction
Looking at the details
Fetching all other interactions reported in the same paper
Searching for similar interactions in the database
41
Interaction Details Selecting an interaction…
42
Interaction Details Looking at the details…
43
Interaction Details Looking at the details…
Emphasise that this is the place where one would find information about phosphorylation, specific ranges… Also stoichiometry, experimental parameters…
44
Interaction Details Searching for similar interactions…
We search for similar interactions by looking for interactions that share the same participants. Interactions with the most participants in common are shown first. So far all hits are shown; we will work on speeding up that view, as it can be rather slow when the original interaction has many participants.
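The ranking described above can be sketched as a simple overlap count. This is an illustration of the idea only, not IntAct's implementation; the function name and the interaction identifiers are hypothetical.

```python
def rank_similar(query_participants, candidates):
    """Order candidate interactions by how many participants they share.

    `candidates` maps an interaction id to its participant identifiers.
    Candidates sharing nothing with the query are dropped.
    """
    query = set(query_participants)
    scored = []
    for name, participants in candidates.items():
        shared = len(query & set(participants))
        if shared:
            scored.append((shared, name))
    # Most shared participants first; ties broken alphabetically by id.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored]


candidates = {
    "interaction-1": ["A", "B", "C"],
    "interaction-2": ["A", "D"],
    "interaction-3": ["E", "F"],
}
print(rank_similar(["A", "B", "C", "D"], candidates))
```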
45
Network Visualisation PSICQUIC Cytoscape
46
Network visualisation
In IntAct
From IntAct: binary and expanded
From IntAct: n-ary and expanded
Important: type of interaction and method used
In PSICQUIC
Data from other interaction databases
47
What is PSICQUIC?
Proteomics Standards Initiative Common QUery InterfaCe.
Community effort to standardise the way to access and retrieve data from molecular interaction databases.
PSICQUIC is a specification of a web service.
Resources already implementing PSICQUIC are listed in a registry.
Based on the PSI standard formats (XML and MITAB)
Documentation:
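Because PSICQUIC is a web service specification, a client mostly has to build the right request. The sketch below only constructs a REST-style query URL and performs no network call. The base URL is a placeholder (real service URLs are listed in the registry), and the `search/query` path and the `format`, `firstResult` and `maxResults` parameter names reflect my understanding of the REST conventions, so check them against the PSICQUIC documentation before relying on them.

```python
from urllib.parse import quote, urlencode


def build_query_url(base_url, query, fmt="tab25", first=0, max_results=50):
    """Build a PSICQUIC-style REST URL (assumed layout, see the note above)."""
    params = urlencode(
        {"format": fmt, "firstResult": first, "maxResults": max_results}
    )
    return f"{base_url.rstrip('/')}/search/query/{quote(query)}?{params}"


# Placeholder base URL, not a real service.
url = build_query_url("https://example.org/psicquic/current", "CDC42")
print(url)
```

The response in MITAB format could then be read line by line like any other tab-separated file.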
48
PSICQUIC implementation
[Diagram labels: User; PSICQUIC client; PSICQUIC Registry; PSICQUIC sources; Interaction databases; Publications; Sample; Observation error; Annotation error]
49
PSICQUIC View Enables clustering of queries across providers,
Visualization of graphical network Linking back to the original source for more details …
50
PSICQUIC Services Tagging
Content: protein-protein, small molecule-protein, nucleic acid-protein
Interaction representation: evidence, clustered
Curation standards: MIMIx curation, IMEx curation, rapid curation
Source: internally curated, text mining, predicted, imported
Complex expansion: spoke, matrix, bipartite
This is how we can identify IMEx services amongst all those available
51
PSICQUIC View
52
How to deal with Complexes
Some experimental protocols generate complex (n-ary) data, e.g. tandem affinity purification (TAP).
One may want to convert these complexes into sets of binary interactions. Two algorithms are available, spoke and matrix:
A computational tool to aid searches
Transforms an n-ary interaction into binary interactions
Both are somewhat wrong; spoke is said to generate three times fewer false positives (Bader et al.).
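The two expansion algorithms are easy to compare in code. This is a minimal sketch assuming the usual definitions: spoke pairs the bait with each prey, and matrix pairs every member of the complex with every other member. The function names and protein labels are illustrative.

```python
from itertools import combinations


def spoke_expand(bait, preys):
    """Spoke model: one binary interaction between the bait and each prey."""
    return [(bait, prey) for prey in preys]


def matrix_expand(members):
    """Matrix model: one binary interaction for every pair of members."""
    return list(combinations(members, 2))


bait, preys = "BAIT", ["P1", "P2", "P3"]
spoke_pairs = spoke_expand(bait, preys)
matrix_pairs = matrix_expand([bait] + preys)
print(len(spoke_pairs), "spoke pairs;", len(matrix_pairs), "matrix pairs")
```

For a complex of n members, spoke yields n - 1 pairs and matrix yields n(n - 1)/2, so matrix output grows much faster and includes prey-prey pairs that were never tested directly.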
53
In pairs start the next activity: Binary or n-ary? Spoke or Matrix?
Please identify the type of interaction for each of the interaction method cards given. Also choose the expansion algorithm you think is best for each method.
54
Software demonstration
Visualising a network in Cytoscape
Selecting a network
Import into Cytoscape
Change layout
Add attributes and change the view based on these
Change and add properties to nodes and edges
55
Cytoscape network visualisation
56
Visualization Highlighting network layout…
57
Visualization Highlighting network properties: edges
58
Visualization Highlighting network properties: nodes
59
Attributes and analysis using Cytoscape
60
What else? How to look deeper into a dataset…
GO: = T Cell Proliferation
61
62
GOA How to look deeper into a dataset…
Click on the interaction count to restrict your dataset. This operation can be done several times to add multiple filters.
GO: = T Cell Proliferation
63
Improving and increasing protein annotations
64
PSI, IMEx, Enfin Proteomics community
Acknowledgements Rolf Apweiler PANDA Henning Hermjakob Sandra Orchard Margaret Duesbury Samuel Kerrien Bruno Aranda Marine Dumousseau Proteomics IntAct team PSI, IMEx, Enfin Proteomics community IntAct is funded by the European Commission under FELICS, contract number (RII3)
65
What data are we dealing with ?
Systems biology?
DNA, RNA, Protein, Small Molecules
Genomics, Transcriptomics, Proteomics, Metabolomics
Research generates a very large amount of data; bioinformatics is required to cope with the ever-growing amount of data.
Current scope: protein-protein interactions
Very near future: extension to RNA/DNA (genomics/transcriptomics) and small molecules (metabolomics)
Genomics: study of the genes, their functions, and the information carried in DNA (gene expression, splicing…).
Transcriptomics: study of messenger RNA expression. Makes use of microarrays.
Proteomics: protein expression levels, protein-protein interactions, subcellular localisation, analysis of post-translational modifications (PTM).
Functional genomics/proteomics: function of individual genes and interactions among groups of genes, used to address biological questions.
Jyoti's version:
Informatics is the management and analysis of data using advanced computing techniques. Bioinformatics is particularly important as an adjunct to genomics research, because of the large amount of complex data this research generates.
Genomics includes the study of genes, their functions and the information carried in the DNA sequence: information about the genes and the control of gene expression, and sequence-level information about splicing and alternative splicing.
Transcriptomics is the study of messenger RNA expression (the transcriptome), which can be performed using microarrays. However, analysis at the mRNA level does not always correlate well with changes in protein levels.
Proteomics: protein expression levels, protein-protein interactions and subcellular localisation can be studied using proteomic techniques, and proteomics can also be applied to the analysis of post-translational modifications (PTM).
Functional genomics/proteomics: study of the function of individual genes and the interactions among groups of genes to address biological questions.
Functional Genomics/Proteomics Databases What data are we dealing with ?
66
Pathways:
67
This afternoon session outline
Reactome Overview What type of data it contains Where the data comes from What and how can you access through Reactome Have a go: tutorial
68
A Database of human biological pathways
Steve Jupe Please acknowledge Reactome and the sources of Reactome funding (in the panel bottom right).
69
Rationale – Journal information
Nature 407(6805):770-6. The Biochemistry of Apoptosis.
“Caspase-8 is the key initiator caspase in the death-receptor pathway. Upon ligand binding, death receptors such as CD95 (Apo-1/Fas) aggregate and form membrane-bound signalling complexes (Box 3). These complexes then recruit, through adapter proteins, several molecules of procaspase-8, resulting in a high local concentration of zymogen. The induced proximity model posits that under these crowded conditions, the low intrinsic protease activity of procaspase-8 (ref. 20) is sufficient to allow the various proenzyme molecules to mutually cleave and activate each other (Box 2). A similar mechanism of action has been proposed to mediate the activation of several other caspases, including caspase-2 and the nematode caspase CED-3 (ref. 21).”
Papers contain rich and useful information, but it is not easily accessible to computers. Languages are highly flexible and nuanced; scientific literature is full of statements that appear definitive but on examination may prove to be based on weak or no cited evidence, or are carefully constructed to avoid being too definitive (a phenomenon known as ‘hedging’).
Text mining might spot the red words as ‘molecules’ and blue words as ‘actions’; it might even spot the green possible name of a pathway and the brown literature reference, but to date even the best text mining methods would have trouble identifying the correct relationships.
Reactome data consists of reactions and pathways based on information extracted from papers by people: biologists who are experts in the field, assisted by PhD-level curators who add structure to the information and place it in the database. This process maximises the details and subtleties but makes the information accessible to data mining and re-use.
How can I access the pathway described here and reuse it?
70
A picture paints a thousand words…
Rationale - Figures
A picture paints a thousand words… but….
Just pixels
Omits key details
Assumes
Fact or hypothesis?
A picture is a great way to visualize a process, but impossible for a computer to understand. In most cases you can’t click to obtain more detail. Reactome pathway diagrams are interactive.
Another problem with figures in papers is that they assume the reader is familiar with the subject, leaving out some details. Often they mix the established with the hypothetical. Reactome gives the full process in detail, with literature references that experimentally demonstrate the event described.
Nature Oct 12;407(6805):770-6. The biochemistry of apoptosis.
71
Reactome is… Free, online, open-source curated
database of pathways and reactions in human biology
Authored by expert biologists, maintained by Reactome editorial staff (curators)
Mapped to cellular compartment
Free: unlike most other similar resources, there are no restrictions on use (only an acknowledgement is required).
Authored by biologists: NOT simply text mining (which is prone to misinformation).
Mapped to compartment: not the case for many other pathway databases.
Tissue and development: there was too little information to justify this when Reactome started. It is now a recurrent discussion, but not felt to be the primary aim of Reactome. It can be achieved by expression data overlay: if there is no expression, that reaction is likely absent in the tissue or disease.
72
Reactome is… Extensively cross-referenced
Tools for data analysis: Pathway Analysis, Expression Overlay, Species Comparison, Biomart…
Used to infer orthologous events in 20 non-human species
We link out to many external databases. UniProt is our primary external reference for proteins, ChEBI for small molecules, PubMed for literature.
Reactome is human-centric: pathways are based on data obtained using human materials or manually inferred from model organism data, i.e. the expert has to believe that the experiment could be repeated in human with the same results (elaborated in the next slide). In a separate process, human pathways are used to computationally infer (by an orthology method) non-human equivalents.
73
Using model organism data to build pathways – Inferred pathway events
[Diagram labels: PMID:5555, direct evidence; PMID:4444, direct evidence; human; PMID:8976; mouse; indirect evidence; PMID:1234; cow]
Our preferred source is evidence from human experiments for a particular event. However, sometimes all the evidence is from another species. In these cases, we infer that the event can happen in human based on experimental data from another species. A parallel pathway step is created in the other species; this contains the associated literature reference with the experimental details. The human pathway step points to this non-human pathway step with a link called ‘Inferred from’.
74
Theory - Reactions = events in biology
Pathway steps = the “units” of Reactome = events in biology
BINDING, DISSOCIATION, DEGRADATION, CLASSIC BIOCHEMICAL, PHOSPHORYLATION, DEPHOSPHORYLATION, TRANSPORT
We call steps in a pathway ‘reactions’, by analogy with biochemical pathways. These reactions are the units of Reactome. However, Reactome considers many biological events to be reactions, not just the classical biochemical reaction. Examples are in the slide. Any event that changes the state of a molecule is a reaction.
75
Reaction Example 1: Enzymatic
This slide shows a "metabolic" reaction, of the kind that most students will be familiar with. It has reactants (called "inputs" in Reactome-speak), products (outputs) and possibly also a catalyst. In this real example, ADP is regulating the reaction. The symbols used are explained later.
76
Reaction Example 2: Transport
Transport of Ca++ from platelet dense tubular system to cytoplasm (REACT_945.4)
This slide illustrates a transport reaction - glucose is transported from the extracellular space to cytosol. The input of this reaction is glucose in the extracellular compartment; the output is glucose in the cytosol. The facilitator is the transporter protein or complex.
N.B. In Reactome a molecule in the cytosol is NOT the same as the same molecule in another part of the cell. This is a very important distinction if you want to model events in biology. Reactome uses the GO compartment classification.
77
Other Reaction Types Dimerization Binding Phosphorylation
Different kinds of reaction: IRF3 dimerization, complex formation (LBP/LPS/CD14). N.B. Phosphorylation within Reactome is a reaction; phosphorylated IRF3 is NOT the same as IRF3.
78
Reactions Connect into Pathways
[Diagram labels: INPUT, OUTPUT and CATALYST, repeated for three connected reactions]
Pathways are a set of reactions, e.g. Reaction 1: protein X is cleaved; Reaction 2: protein X is phosphorylated; Reaction 3: protein X has catalytic activity. Pathways can have sub-pathways to structure information.
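How reactions chain into a pathway can be sketched with a few lines of code. The classes below are hypothetical and are not Reactome's data model. They only encode two ideas from the slides: an entity's identity includes its compartment and modification state, and one reaction follows another when it consumes, or is catalysed by, something the first one produced.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Entity:
    name: str
    compartment: str
    modification: Optional[str] = None   # e.g. "cleaved", "phospho"


@dataclass(frozen=True)
class Reaction:
    name: str
    inputs: FrozenSet[Entity]
    outputs: FrozenSet[Entity]
    catalyst: Optional[Entity] = None


def follows(first, second):
    """True if an output of `first` is an input or the catalyst of `second`."""
    return bool(first.outputs & second.inputs) or second.catalyst in first.outputs


x = Entity("X", "cytosol")
x_cleaved = Entity("X", "cytosol", "cleaved")
x_active = Entity("X", "cytosol", "cleaved+phospho")
substrate = Entity("S", "cytosol")
product = Entity("S", "cytosol", "phospho")

r1 = Reaction("X is cleaved", frozenset({x}), frozenset({x_cleaved}))
r2 = Reaction("X is phosphorylated", frozenset({x_cleaved}), frozenset({x_active}))
r3 = Reaction("X acts as catalyst", frozenset({substrate}), frozenset({product}), x_active)
print(follows(r1, r2), follows(r2, r3), follows(r1, r3))
```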
79
Data Expansion - Link-outs From Reactome
GO: Molecular Function, Compartment, Biological Process
KEGG, ChEBI: small molecules
UniProt: proteins
Sequence databases: Ensembl, OMIM, Entrez Gene, RefSeq, HapMap, UCSC, KEGG Gene
PubMed references: literature evidence for events
Some of the most common link-outs from Reactome. Self-explanatory. Molecular Function is used for catalysis; Biological Process is assigned to pathways.
80
Species Selection Reactome has computationally predicted pathways across all these species, currently 20.
81
Data Expansion – Projecting to Other Species
[Diagram: Human: B; A + ATP → A-P + ADP. Mouse: B; A + ATP → A-P + ADP. Drosophila: B; A + ATP; no orthologue, protein not inferred; reaction not inferred.]
The basic principle of inference; it’s a little more complicated than this. Details are on the website; the method is based on the Ensembl Compara database. All is OK for mouse, but in fly, protein A has no orthologue (using Compara), so the reaction cannot be inferred. Because orthology is harder to establish as the evolutionary distance increases, this process works better with species closer to human, like mouse and rat, than it does for fly (or yeast).
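The basic principle can be written down in a few lines. This sketch implements only the simplified rule shown on the slide (every protein in the reaction needs an orthologue), not the full procedure, which the notes say is more complicated. The orthologue tables are invented stand-ins for data that would come from Ensembl Compara.

```python
def infer_reaction(proteins, orthologues):
    """Map each human protein to its orthologue, or return None if any is missing."""
    mapped = {}
    for protein in proteins:
        if protein not in orthologues:
            return None          # one missing orthologue blocks the inference
        mapped[protein] = orthologues[protein]
    return mapped


reaction_proteins = ["A", "B"]   # A is phosphorylated, B is the other protein shown

# Invented orthology tables for illustration only.
mouse_orthologues = {"A": "mouse_A", "B": "mouse_B"}
fly_orthologues = {"B": "fly_B"}

print(infer_reaction(reaction_proteins, mouse_orthologues))
print(infer_reaction(reaction_proteins, fly_orthologues))
```

Small molecules such as ATP and ADP are left out of the check because they are not species-specific.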
82
Exportable Protein-Protein Interactions
Inferred from complexes and reactions
Interactions between proteins in the same complex, reaction, or adjoining reaction
Lists available from Downloads
See the Readme document for more details
Reactome is not an interactions database, but we infer interactions from the complexes and reactions, and provide these for export. Low false positive rates; complementary to IntAct and other protein-protein interaction databases.
Types of derived interaction:
Direct complex: interactors in the same complex
Indirect complex: interactors in a subcomplex of a complex
Reaction: interactors participate in the same reaction
Neighbouring reaction: interactors participate in 2 consecutive reactions
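The derived interaction types can be illustrated with two small helpers. This is a sketch of the idea only, with invented protein names; the Readme mentioned above is the authority on how the exported lists are actually built.

```python
from itertools import combinations, product


def pairs_within(proteins):
    """All unordered pairs inside one complex or one reaction."""
    return {frozenset(pair) for pair in combinations(sorted(set(proteins)), 2)}


def pairs_between(first, second):
    """Unordered pairs linking participants of two consecutive reactions."""
    return {
        frozenset((a, b))
        for a, b in product(set(first), set(second))
        if a != b
    }


complex_members = ["P1", "P2", "P3"]   # direct complex
reaction_1 = ["P1", "P4"]              # participants of one reaction
reaction_2 = ["P4", "P5"]              # participants of the next reaction

direct_complex = pairs_within(complex_members)
same_reaction = pairs_within(reaction_1)
neighbouring = pairs_between(reaction_1, reaction_2)
print(len(direct_complex), len(same_reaction), len(neighbouring))
```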
83
Coverage – Content, TOC And many more...
Screenshot of current TOC (accessed via the Menu bar on homepage). Point is that we have covered lots of topics and pathways, but we always need volunteer authors for topics not covered!
84
Planned Coverage – Editorial Calendar
Release calendar is on the website – it usually gives details of the next 2 releases. Reactome has a quarterly update cycle.
85
Reactome Tools Interactive Pathway Browser
Pathway Mapping and Over-representation
Expression overlay onto pathways
Molecular Interaction overlay
Biomart
These tools are covered in the tutorial. The pathway browser is the main method of visualizing pathways. Pathway mapping and over-representation are tools to compare a dataset to Reactome content. Expression overlay uses a submitted dataset to colour pathways according to the expression values; it is useful to see whether the pathway exists in your favourite tissue, or changes expression level in response to some treatment or developmental stage. Molecular interaction overlay adds protein-protein or protein-small molecule interactors onto a pathway. Biomart is a tool for federated queries, i.e. it can join data from more than one database.
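Over-representation asks whether a submitted list hits a pathway more often than expected by chance. The sketch below uses a one-sided hypergeometric test, which is a common choice for this kind of analysis. The slides do not say which statistic Reactome uses, so the test, the function name and the numbers are assumptions for illustration.

```python
from math import comb


def hypergeom_pvalue(total, in_pathway, sample_size, hits):
    """P(at least `hits` pathway members in a random sample of `sample_size`).

    total       - number of annotated proteins overall
    in_pathway  - how many of those belong to the pathway
    sample_size - size of the submitted list
    hits        - submitted proteins found in the pathway
    """
    upper = min(in_pathway, sample_size)
    favourable = sum(
        comb(in_pathway, k) * comb(total - in_pathway, sample_size - k)
        for k in range(hits, upper + 1)
    )
    return favourable / comb(total, sample_size)


# Invented numbers: 1000 proteins, 40 in the pathway, 50 submitted, 8 hits.
p_value = hypergeom_pvalue(1000, 40, 50, 8)
print(f"p = {p_value:.3g}")
```

In practice the p-values for all pathways would also need a multiple-testing correction before being interpreted.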
86
Summary
Pathway databases are an integral part of the scientific enterprise.
Reactome has deployed a user-friendly web site that promotes integrated research on pathways and networks.
Data visualization
Data analysis
Data expansion
Data integration
Data standards/exports
Develop and distribute open software and standard operating procedures for the management of pathway information.
87
Credits OICR/CSHL NYU EBI Lincoln Stein Peter D'Eustachio Ewan Birney
Michael Caudy Shahana Mahajan Henning Hermjakob Marc Gillespie Lisa Matthews David Croft Robin Haw Veronica Shamovsky Phani Garapati Irina Kalatskaya Bijay Jassal Bruce May Steven Jupe Leontius Pradhana Nelson Ndegwa Guanming Wu Gavin O’Kelly Christina Yung Esther Schmidt Supported by grants from the US National Institutes of Health (P41 HG003751) and EU grant LSHG-CT "ENFIN”
88
In pairs start the Reactome Tutorial