Using structure in protein function annotation: predicting protein interactions
Donald Petrey, Cliff Qiangfeng Zhang, Raquel Norel, Barry Honig
Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia University

Classification: Fold, Superfamily, Family

Discrete islands

Functional labels: protein disulfide oxidoreductase; iron-sulfur cluster assembly
Thioredoxin      L-VVVDFS-A-----TWCGPCKMI-KPFFH-SLSEK
Q8L5D4           KSSLVVLY-A-----PWCSFSQAM-DESYN-DVAEK
Glutaredoxin-4   P--ILLYM-KGSPKLPSCGFSAQA-VQALA-AC---

Figure labels: P22 Cro repressor, λ Cro repressor, Afe1, Xfaso 1, Pfl6; sequence identity labels: 25%, 42%, 39%, 44%, 42%

Continuous space

Putative active site (SCREEN)

Formyl-CoA transferase from O. formigenes; NESG Target TM1055 from T. maritima; Coenzyme-A

CoA from Formyl-CoA transferase
SAH from DNA methyltransferase
Tyrosine from tyrosyl tRNA synthetase
Thiamin diphosphate from DXP synthetase
TM1055

Structural neighbors of TM proteins:
70 SCOP folds
3 CATH architectures
10 CATH topologies
48 CATH homologous superfamilies
~500 distinct ligands

“jelly roll”, “β-propeller”, “β-prism”

Figure labels: virus, cell, bacterium, cell, “jelly roll”, “β-propeller”, phagosome, lysosome, “β-prism”

Experimental interactions (from BIND + Cellzome); modeled interactions. Davis FP, Braberg H, et al. (2006). Nucleic Acids Research 34(10).

Figure labels: target sequences (?); sequence similarity; structural similarity; template complex; modeled complex

Structures from the same SCOP family (non-redundant): 8 (SCOP domain d )
Structures from the same SCOP superfamily (non-redundant): 23 (SCOP domain d.17.4)
SCOP fold (non-redundant): 44 (SCOP domain d.17)
Structural neighbors by structure alignment: 420 (PSD < 0.8; the SCOP domain id of the green structure here is d )

Structure model: the overlap of the modeled interface with the predicted interface (shown in red). Panel labels: good, bad.
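One simple way to quantify the overlap between a modeled and a predicted interface is sketched below. The slide does not say how the overlap is scored, so the measure here (the fraction of modeled interface residues that are also predicted) is an assumption, and the function name `interface_overlap` and the residue numbers are hypothetical.

```python
def interface_overlap(modeled_residues, predicted_residues):
    """Fraction of modeled interface residues that are also predicted.

    Both arguments are iterables of residue identifiers (e.g. residue
    numbers). This scoring rule is an illustrative assumption, not the
    measure used in the presentation.
    """
    modeled = set(modeled_residues)
    predicted = set(predicted_residues)
    if not modeled:
        # No modeled interface: define the overlap as zero.
        return 0.0
    return len(modeled & predicted) / len(modeled)


# Hypothetical example: two of four modeled residues are also predicted.
score = interface_overlap([10, 11, 12, 13], [12, 13, 14])
print(score)
```

A model whose score is close to 1 would fall on the "good" side of the slide, and one close to 0 on the "bad" side; any cutoff between the two would have to be chosen separately.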

B. subtilis lethal factor

Pelle; B. subtilis lethal factor

Evidence sources: gene co-expression profiles; GeneWays (literature), e.g. RGS4 block RASD1, CKS1A interact SKP2, CD4 bind TFAP2A, GPNMB contain PPFIBP1, TACR1 require PARP1; structures.

Figure 8. A Bayesian method is used to integrate PPI evidence from various sources. The likelihood ratio of an interaction between two proteins x and y, $LR(x, y)$, is inferred from different pieces of evidence $c_i$. Here $P(c_i \mid I)$ and $P(c_i \mid \bar{I})$ represent the probability that a “clue” $c_i$ is observed for proteins x and y that are known to interact or not to interact (represented as $I$ and $\bar{I}$).
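The evidence integration described in Figure 8 can be sketched as follows. The slide says only that a Bayesian method is used; treating the clues as conditionally independent (a naive Bayes combination, where the per-clue ratios multiply) is an assumption here, and the function names and probability values are hypothetical.

```python
def likelihood_ratio(clues):
    """Combine per-clue likelihood ratios by multiplication.

    `clues` is an iterable of (p_interacting, p_non_interacting) pairs:
    the probability of observing the clue for protein pairs known to
    interact, and for pairs known not to interact. Multiplying the ratios
    assumes the clues are conditionally independent (naive Bayes), which
    is an assumption of this sketch.
    """
    lr = 1.0
    for p_pos, p_neg in clues:
        if p_neg <= 0:
            raise ValueError("p_non_interacting must be positive")
        lr *= p_pos / p_neg
    return lr


def posterior_odds(prior_odds, lr):
    """Posterior odds of interaction = prior odds * likelihood ratio."""
    return prior_odds * lr


# Hypothetical values for three clues (e.g. co-expression, literature,
# structure); they are illustrative only.
clues = [(0.5, 0.25), (0.75, 0.25), (0.5, 0.5)]
lr = likelihood_ratio(clues)
print(lr, posterior_odds(0.001, lr))
```

A clue that is equally likely for interacting and non-interacting pairs contributes a ratio of 1 and leaves the combined score unchanged, which is why uninformative evidence does no harm in this scheme.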


Conclusions
Structural information needs to be leveraged.
Interactively combining overall function annotation with analysis that depends on local bioinformatic/biophysical features.
Infrastructure applies equally to analyzing subtle differences within families.

Acknowledgements
NIH grant U54-GM
Honig Lab: Markus Fischer, Cliff Zhang, Kely Norel