Proteomics technologies and protein-protein interaction
Lars Kiemer, Center for Biological Sequence Analysis, The Technical University of Denmark
Advanced bioinformatics, November 2005
Outlining the problem
Around 30% of human proteins still have no annotated function. Even when the function is known, we often know little about the bigger picture: how the protein is regulated, whether it has multiple functions, its role in pathogenesis, and the effects of mutations and splice variants. In fact, individual proteins are about as interesting as individual bricks in a wall; what we really want to understand is the system.
Example: signal transduction cascade
[Figure: signalling network spanning the extracellular space, cytoplasm and nucleus, built around the Ras-Raf-MEK-MAPK module, with CREB, c-Fos and transcription in the nucleus. Other components shown: Rap1, bRaf, NCAM, FGFR, Frs2, Shc, Grb2, Sos, Fyn, Fak, CB1, PLC, DAGL, DAG, PIP2, IP3, 2-AG, Ca2+, cAMP, PKC, PKA, CaMKII and GAP43.]
Obtaining data
High-throughput data can provide information about interactions with other proteins, protein abundance in different tissues, transcriptional regulation, and more. High-throughput experimental techniques produce data sets far too large for manual curation, and these data sets often contain false positives. Combining several such data sets, however, increases confidence in the observations they share.
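Why combining data sets helps can be sketched with a simple heuristic. Assume (hypothetically) that each data set has a known precision, the fraction of its reported interactions that are real, and that data sets make their errors independently. The "noisy-OR" rule then scores an interaction reported by several data sets as one minus the probability that every one of those reports is a false positive. The data set names and precision values below are illustrative assumptions, not figures from the presentation.

```python
from math import prod

# Hypothetical precision of each data set: the fraction of its reported
# interactions that are real. Names and values are illustrative only.
PRECISION = {"y2h_screen": 0.3, "ap_ms_screen": 0.5, "curated_literature": 0.8}


def combined_confidence(supporting_sets: list[str]) -> float:
    """Noisy-OR score for an interaction reported by the given data sets.

    Assumes the data sets produce false positives independently, so the chance
    that every supporting report is wrong is the product of (1 - precision).
    """
    all_wrong = prod(1.0 - PRECISION[name] for name in supporting_sets)
    return 1.0 - all_wrong


if __name__ == "__main__":
    print(round(combined_confidence(["y2h_screen"]), 2))                  # 0.3
    print(round(combined_confidence(["y2h_screen", "ap_ms_screen"]), 2))  # 0.65
    print(round(combined_confidence(list(PRECISION)), 2))                 # 0.93
```

The independence assumption is the weak point: two screens that share the same systematic bias will make this score overstate confidence.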
Protein interactions reveal a lot!
Knowing a protein's interaction partners gives hints about its function: guilt by association! Complexes in which none of the interaction partners has a known function are even more interesting.
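Guilt by association can be turned into a simple prediction rule: rank candidate functions for an unannotated protein by how many of its interaction partners carry each annotation. The sketch below uses a hypothetical toy network and hypothetical annotations; majority voting among neighbours is one of the simplest such rules and is assumed here purely for illustration.

```python
from collections import Counter

# Hypothetical toy network: each protein maps to its interaction partners.
INTERACTIONS = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2"},
    "P4": {"P1"},
}

# Hypothetical annotations; P1 has no annotated function.
ANNOTATIONS = {
    "P2": {"DNA repair"},
    "P3": {"DNA repair", "transcription"},
    "P4": {"cell cycle"},
}


def predict_functions(protein: str) -> list[tuple[str, int]]:
    """Rank functions by the number of interaction partners annotated with each."""
    votes = Counter()
    for partner in INTERACTIONS.get(protein, set()):
        votes.update(ANNOTATIONS.get(partner, set()))
    return votes.most_common()


if __name__ == "__main__":
    print(predict_functions("P1")[0])  # ('DNA repair', 2)
```

For a complex in which no partner is annotated, the rule returns an empty ranking.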
Yeast two-hybrid screening
Widely used. Detects only binary interactions. High false-positive rate. The proteins must be able to enter the nucleus.
Affinity purification
Large-scale. Can be done on any preparation of cells. Often whole complexes are purified, so the order of binding (which proteins contact each other directly) is not obtained. An extra step, typically mass spectrometry, is needed to identify the purified proteins.
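Mass spectrometer
Three principal components: an ion source, which converts the analyte into gas-phase ions; one or more mass analyzers, which separate the gas-phase ions by mass-to-charge ratio (m/z); and a detector, on which the ions are detected as they discharge. [Figure: a hybrid instrument with quadrupole Q1, collision cell q2 and a time-of-flight (TOF) analyzer.]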
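Because the analyzer separates ions by m/z rather than by mass, a peptide of neutral mass M carrying z extra protons appears at m/z = (M + z × 1.00728)/z, where 1.00728 Da is the proton mass. The sketch below, an illustration not taken from the presentation, computes M from standard monoisotopic residue masses for unmodified peptides and converts it to the m/z of the [M+zH]z+ ion.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free N- and C-termini
PROTON = 1.00728   # mass of the proton behind each positive charge


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mz(sequence: str, charge: int) -> float:
    """m/z of the [M + zH]z+ ion, the quantity the mass analyzer separates on."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


if __name__ == "__main__":
    print(f"{peptide_mass('PEPTIDE'):.3f}")  # 799.360
    print(f"{mz('PEPTIDE', 2):.3f}")         # 400.687
```

The same peptide shows up at different m/z values depending on its charge state, which is why spectra are interpreted in terms of m/z rather than mass.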
Mass spectrometry in short
Extremely sensitive. Mass precision at the level of a single atom. In principle, detecting one relatively short peptide allows unambiguous identification of a protein. Some proteins are difficult to digest with proteases. Some peptides are very difficult to ionize. Because the method is so sensitive, contamination is difficult to avoid.
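Why one short peptide can suffice: if a proteolytic peptide occurs in only one protein of the searched database, observing it pins down that protein. The sketch below digests hypothetical toy sequences in silico with the simplified trypsin rule (cleave after K or R, but not before P) and checks whether a peptide maps to exactly one protein. Real identification matches measured masses and fragment spectra against such a digest; that step is omitted here.

```python
import re
from collections import defaultdict
from typing import Optional

# Hypothetical toy "proteome"; protA and protC share the peptide WVTFISLLLLFSSAYSR.
PROTEINS = {
    "protA": "MKWVTFISLLLLFSSAYSRGVFRR",
    "protB": "MGSSHHHHHHSSGLVPRGSHMASMTGGQQMGR",
    "protC": "MKWVTFISLLLLFSSAYSRDTHKSEIAHR",
}


def tryptic_peptides(sequence: str, min_length: int = 6) -> list[str]:
    """Simplified trypsin digest: cut after K or R unless the next residue is P.

    Fragments shorter than min_length are dropped as rarely informative.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if len(p) >= min_length]


def build_index(proteins: dict[str, str]) -> dict[str, set[str]]:
    """Map each tryptic peptide to the set of proteins that contain it."""
    index = defaultdict(set)
    for name, sequence in proteins.items():
        for peptide in tryptic_peptides(sequence):
            index[peptide].add(name)
    return dict(index)


def identify(peptide: str, index: dict[str, set[str]]) -> Optional[str]:
    """Return the protein if the peptide is unique to it, otherwise None."""
    hits = index.get(peptide, set())
    return next(iter(hits)) if len(hits) == 1 else None


if __name__ == "__main__":
    index = build_index(PROTEINS)
    print(identify("SEIAHR", index))             # protC
    print(identify("WVTFISLLLLFSSAYSR", index))  # None (shared by protA and protC)
```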
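Protein interaction databases: Spoke/Matrix
[Figure: one affinity pulldown (a bait and its co-purified preys) recorded as interactions in two ways. Spoke: the bait is linked to each prey. Matrix: every pair of purified proteins is linked. Neither is guaranteed to match the true topology ("Truth?").]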
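The two models can be written down directly. A minimal sketch with hypothetical protein names: for a bait with n preys, the spoke model yields n pairs and the matrix model n(n+1)/2, so the matrix model records many more interactions, many of which may be indirect.

```python
from itertools import combinations


def spoke_pairs(bait: str, preys: list[str]) -> set[frozenset[str]]:
    """Spoke model: the bait is recorded as interacting with every prey."""
    return {frozenset((bait, prey)) for prey in preys}


def matrix_pairs(bait: str, preys: list[str]) -> set[frozenset[str]]:
    """Matrix model: every pair of co-purified proteins is recorded as interacting."""
    members = [bait, *preys]
    return {frozenset(pair) for pair in combinations(members, 2)}


if __name__ == "__main__":
    preys = ["PreyA", "PreyB", "PreyC", "PreyD"]  # hypothetical pulldown result
    print(len(spoke_pairs("Bait", preys)))   # 4
    print(len(matrix_pairs("Bait", preys)))  # 10
```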
Protein interaction databases: Overlap
Protein interaction data: a total of 18,629 articles are represented in the databases (June 2005).

Database              Unique article references   Interaction pairs in unique references   Representation
DIP                   1,353                       5,403                                    binary?
MINT                  1,406                       5,430                                    spoke
IntAct                355                         6,836                                    spoke
GRID                  1,232                       49,135                                   binary?
BIND* (protein part)  5,733                       44,279                                   spoke/matrix
HPRD                  6,989                       14,533                                   matrix

*Approximately 10% of the protein-protein interactions in BIND are imports from other databases.
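The slide does not say how "unique article references" were counted. Reading them as articles cited by one database and no other (an assumption) is consistent with the per-database counts not adding up to the overall total: the six unique counts sum to 17,068, fewer than 18,629, because articles shared by several databases are excluded from every unique count but still counted once in the total. A sketch with hypothetical article identifiers:

```python
# Hypothetical article identifiers (e.g. PubMed IDs) cited by each database.
REFERENCES = {
    "DIP": {101, 102, 103},
    "MINT": {102, 104},
    "IntAct": {105},
    "HPRD": {103, 106, 107},
}


def unique_references(refs: dict[str, set[int]]) -> dict[str, set[int]]:
    """Articles cited by one database and by no other database."""
    result = {}
    for db, articles in refs.items():
        others = set().union(*(a for other, a in refs.items() if other != db))
        result[db] = articles - others
    return result


def total_articles(refs: dict[str, set[int]]) -> int:
    """Number of distinct articles cited by at least one database."""
    return len(set().union(*refs.values()))


if __name__ == "__main__":
    unique = unique_references(REFERENCES)
    print({db: len(a) for db, a in unique.items()})  # {'DIP': 1, 'MINT': 1, 'IntAct': 1, 'HPRD': 2}
    print(total_articles(REFERENCES))                # 7
```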
Species bias in available data
A few selected organisms are very well studied, while others are not. Species distribution in the BIND database (Alfarano et al., NAR, 2005): [Figure: species distribution of the BIND database.]
Trans-organism protein interaction network
Orthologs? Orthologous genes are direct descendants of a gene in a common ancestor (O'Brien K, Remm et al., 2005). [Figure: orthologs linking S. cerevisiae, D. melanogaster and H. sapiens.]
Trans-organism protein interaction network
[Figure: experimental (Experim.) interaction data from D. melanogaster, C. elegans and S. cerevisiae, together with H. sapiens, combined into the MOSAIC network.]
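The transfer itself is a mapping through orthologs: an interaction observed between two proteins in one organism is predicted between their orthologs in another (often called an interolog). A minimal sketch, assuming one-to-one ortholog assignments and hypothetical gene names; real ortholog groups can be one-to-many, which this ignores. The supporting organisms are kept for each predicted pair so that evidence from several organisms remains visible.

```python
# Hypothetical experimentally observed interactions, per source organism.
OBSERVED = {
    "S. cerevisiae": [("YFG1", "YFG2"), ("YFG2", "YFG3")],
    "D. melanogaster": [("dmA", "dmB")],
}

# Hypothetical one-to-one ortholog assignments to human genes (YFG3 has none).
TO_HUMAN = {"YFG1": "HUMAN1", "YFG2": "HUMAN2", "dmA": "HUMAN1", "dmB": "HUMAN2"}


def transfer_interactions(
    observed: dict[str, list[tuple[str, str]]], orthologs: dict[str, str]
) -> dict[frozenset[str], set[str]]:
    """Map observed pairs onto orthologs, recording which organisms support each pair.

    A pair is skipped when either partner has no ortholog in the target organism.
    """
    network: dict[frozenset[str], set[str]] = {}
    for organism, pairs in observed.items():
        for a, b in pairs:
            if a in orthologs and b in orthologs:
                key = frozenset((orthologs[a], orthologs[b]))
                network.setdefault(key, set()).add(organism)
    return network


if __name__ == "__main__":
    for pair, organisms in transfer_interactions(OBSERVED, TO_HUMAN).items():
        print(sorted(pair), sorted(organisms))
    # ['HUMAN1', 'HUMAN2'] ['D. melanogaster', 'S. cerevisiae']
```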
Repetition of experiments adds credibility
Light blue connection: 1 experiment. Darker blue connection: more than 1 experiment, 1 organism. Purple connection: more than 1 experiment, more than 1 organism.
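The colour legend is a simple rule over an interaction's observations. A sketch, assuming each observation is recorded as a hypothetical (experiment identifier, organism) pair and that distinct identifiers count as separate experiments; the slide does not say how experiments were counted.

```python
def evidence_class(observations: list[tuple[str, str]]) -> str:
    """Colour an interaction from its (experiment_id, organism) observations.

    Light blue: one experiment. Darker blue: several experiments, one organism.
    Purple: several experiments across more than one organism.
    """
    if not observations:
        raise ValueError("an interaction needs at least one observation")
    experiments = {experiment for experiment, _ in observations}
    organisms = {organism for _, organism in observations}
    if len(experiments) == 1:
        return "light blue"
    if len(organisms) == 1:
        return "darker blue"
    return "purple"


if __name__ == "__main__":
    print(evidence_class([("exp1", "yeast"), ("exp2", "fly")]))  # purple
```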
Adding co-expression data
Red connector: co-expression across 80 different tissues with a correlation coefficient above 0.7. Grey nodes: no expression data available.
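The co-expression criterion compares two proteins' expression profiles across tissues. The slide does not name the correlation measure; the sketch below assumes Pearson's correlation coefficient and uses five hypothetical tissues instead of 80 to keep the example short. The 0.7 cut-off is the one quoted on the slide.

```python
from math import sqrt

THRESHOLD = 0.7  # correlation cut-off quoted on the slide


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must cover the same tissues (at least two)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_x * var_y)


def coexpressed(x: list[float], y: list[float]) -> bool:
    """True if two profiles correlate above the cut-off (would get a red connector)."""
    return pearson(x, y) > THRESHOLD


if __name__ == "__main__":
    tissue_levels_a = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical expression levels
    tissue_levels_b = [1.0, 3.0, 2.0, 5.0, 4.0]
    print(round(pearson(tissue_levels_a, tissue_levels_b), 2))  # 0.8
    print(coexpressed(tissue_levels_a, tissue_levels_b))        # True
```

Proteins without expression data (the grey nodes) simply cannot be scored by this test.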
Nucleolus dynamics
Nodes are coloured according to the level of the protein in the nucleolus following transcriptional inhibition (Andersen et al., Nature, 2005). Legend, relative level of protein in the nucleolus after inhibition of transcription: increased, unchanged, decreased.
Adding up to make high-quality associations
Integrating various data sources builds up confidence.
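One textbook way to integrate different types of evidence, shown here as an illustration rather than as the method behind these slides, is naive Bayes: assuming the evidence types are conditionally independent, the posterior odds that two proteins are functionally associated are the prior odds multiplied by one likelihood ratio per observed evidence type. The prior odds and likelihood ratios below are hypothetical placeholders; in practice they would be estimated against a reference set of known associations.

```python
from math import prod

# Hypothetical likelihood ratios: how much more often each evidence type is seen
# for truly associated protein pairs than for unassociated ones.
LIKELIHOOD_RATIO = {
    "physical_interaction": 20.0,
    "coexpression": 10.0,
    "shared_nucleolar_response": 3.0,
}
PRIOR_ODDS = 1 / 100  # hypothetical prior odds that a random pair is associated


def posterior_probability(evidence: list[str], prior_odds: float = PRIOR_ODDS) -> float:
    """Posterior odds = prior odds times the product of likelihood ratios, as a probability."""
    odds = prior_odds * prod(LIKELIHOOD_RATIO[e] for e in evidence)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    print(round(posterior_probability(["physical_interaction"]), 3))                  # 0.167
    print(round(posterior_probability(["physical_interaction", "coexpression"]), 3))  # 0.667
    print(round(posterior_probability(list(LIKELIHOOD_RATIO)), 3))                    # 0.857
```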
Upon integration comes enlightenment
Identifying functional complexes
[Figure: network clusters labelled Ribosome (predominantly 60S), DNA repair, SMARCA complex, TFIID and Arp2/3.]
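The slide does not say how the complexes were delineated. A minimal sketch, assuming a deliberately simple approach: keep only edges whose confidence passes a hypothetical cut-off and report connected components of at least a few proteins as candidate complexes. Dedicated graph-clustering methods exist; this only illustrates the idea, with hypothetical protein names and scores.

```python
from collections import defaultdict

# Hypothetical scored interactions: (protein_a, protein_b, confidence).
EDGES = [
    ("A", "B", 0.9), ("B", "C", 0.95), ("C", "A", 0.85),
    ("C", "D", 0.3),  # low-confidence link that should not merge the two groups
    ("D", "E", 0.9), ("E", "F", 0.9), ("F", "G", 0.99),
]


def candidate_complexes(edges, min_confidence=0.8, min_size=3):
    """Connected components of the high-confidence subnetwork."""
    graph = defaultdict(set)
    for a, b, confidence in edges:
        if confidence >= min_confidence:
            graph[a].add(b)
            graph[b].add(a)

    seen, complexes = set(), []
    for start in graph:
        if start in seen:
            continue
        # Depth-first search collecting every protein reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        if len(component) >= min_size:
            complexes.append(component)
    return complexes


if __name__ == "__main__":
    print(sorted(sorted(c) for c in candidate_complexes(EDGES)))
    # [['A', 'B', 'C'], ['D', 'E', 'F', 'G']]
```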
Summary
Protein-protein interactions can reveal hints about the function of a protein (guilt by association). Information about protein interactions is obtained with different technologies, each with its own advantages and weaknesses. Because interaction systems are highly conserved across species, interactions can be inferred from those observed in other species. Data are always error-prone, and repeated observations build up confidence. Integrating different types of data can further build up confidence.