Modeling Cell Proliferation Activity of Human Interleukin-3 (IL-3) Upon Single Residue Replacements Majid Masso Bioinformatics and Computational Biology.


Modeling Cell Proliferation Activity of Human Interleukin-3 (IL-3) Upon Single Residue Replacements Majid Masso Bioinformatics and Computational Biology George Mason University, Manassas, Virginia, USA BIOSTEC BIOINFORMATICS 2011

IL-3 Structure, Function, and Experimental Mutagenesis Data
IL-3 promotes the growth of many hematopoietic cell lines.
Theoretically, there are 19 × 112 = 2128 possible IL-3 mutants via single residue substitutions at all positions in the structure.
Experimental dataset: 630 of these IL-3 mutants were synthesized, representing substitutions at all but 12 positions.
Activity of the synthesized IL-3 mutants was measured as % of wild type (wt) using erythroleukemic cell proliferation assays: 27 “increased” mutants (>100% wt), 373 “full” (20–100% wt), 75 “moderate” (5–19% wt), and 155 “low” (<5% wt).
Alternatively, there are 400 “unaffected” (“increased” + “full”) and 230 “affected” (“moderate” + “low”) IL-3 mutants.
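The activity classes and counts above can be checked with a short Python sketch. The function names are hypothetical, and treating a fractional value between 19% and 20% of wt as “moderate” is an assumption, since the slide gives only integer ranges.

```python
# Hypothetical helpers encoding the activity classes described on the slide
# (activity expressed as percent of wild-type, wt).
def activity_class(pct_wt):
    if pct_wt > 100:
        return "increased"
    if pct_wt >= 20:      # 20-100% wt
        return "full"
    if pct_wt >= 5:       # 5-19% wt (assumption: anything in [5, 20) counts here)
        return "moderate"
    return "low"          # < 5% wt

def binary_class(pct_wt):
    # "unaffected" = increased + full; "affected" = moderate + low
    return "unaffected" if activity_class(pct_wt) in ("increased", "full") else "affected"

# Counts reported for the 630 synthesized mutants.
counts = {"increased": 27, "full": 373, "moderate": 75, "low": 155}
unaffected = counts["increased"] + counts["full"]
affected = counts["moderate"] + counts["low"]
possible = 19 * 112  # 19 substitutions at each of 112 positions
```

With these counts, `unaffected` and `affected` sum to the 630 synthesized mutants, out of the 2128 theoretically possible.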

Delaunay Tessellation of Protein Structure
[Figure: residues D3, K4, R5, L6, F7, A22, G62, C63, S64 abstracted to points; example residue: Aspartic Acid (Asp or D)]
Abstract every amino acid residue to a point, using the atomic coordinates from the Protein Data Bank (PDB): the Cα coordinates.
Delaunay tessellation: a 3D “tiling” of space into non-overlapping, irregular tetrahedral simplices.
Each simplex objectively identifies a quadruplet of nearest-neighbor amino acids at its vertices.
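To make the defining property concrete, here is a naive pure-Python sketch: a tetrahedron belongs to the Delaunay tessellation when its circumsphere contains no other point. It tests every quadruplet, so it is O(n^5) and only suitable for toy inputs; a real pipeline would use a Qhull-based routine such as `scipy.spatial.Delaunay`. The function names are hypothetical, and the sketch assumes points in general position (no four coplanar, no five cospherical).

```python
from itertools import combinations

def _det3(m):
    # Determinant of a 3x3 matrix given as a list of rows.
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def circumsphere(a, b, c, d, eps=1e-12):
    # The center x satisfies |x - a|^2 = |x - p|^2 for p in (b, c, d), i.e.
    # 2 (p - a) . x = |p|^2 - |a|^2: a 3x3 linear system solved by Cramer's rule.
    rows, rhs = [], []
    for p in (b, c, d):
        rows.append([2 * (p[k] - a[k]) for k in range(3)])
        rhs.append(sum(p[k] ** 2 - a[k] ** 2 for k in range(3)))
    det = _det3(rows)
    if abs(det) < eps:
        return None  # coplanar points: no unique circumsphere
    center = []
    for col in range(3):
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]
        center.append(_det3(m) / det)
    radius_sq = sum((center[k] - a[k]) ** 2 for k in range(3))
    return center, radius_sq

def delaunay_tetrahedra(points, eps=1e-9):
    """Index quadruplets whose circumsphere contains no other point."""
    result = []
    for quad in combinations(range(len(points)), 4):
        sphere = circumsphere(*(points[i] for i in quad))
        if sphere is None:
            continue
        center, radius_sq = sphere
        empty = all(
            sum((points[j][k] - center[k]) ** 2 for k in range(3)) >= radius_sq - eps
            for j in range(len(points)) if j not in quad
        )
        if empty:
            result.append(quad)
    return result
```

For example, four corners of a tetrahedron plus one interior point yield four simplices, each sharing the interior point; the outer tetrahedron is rejected because its circumsphere contains that point.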

Delaunay Tessellation of Interleukin-3 (IL-3)
Ribbon diagram (left) from PDB file 1jli (112 residues, positions 14–125).
Each amino acid residue is represented by its Cα in 3D space.
Tessellation of the 112 Cα points (right) is performed using a 12 Å edge-length cutoff, for “true” residue quadruplet interactions.
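A minimal sketch of the edge-length cutoff, assuming it removes any simplex with at least one edge longer than 12 Å; structural neighbors of a position are then the other vertices of its remaining simplices. Function names and the toy coordinates are hypothetical.

```python
import math
from itertools import combinations

EDGE_CUTOFF = 12.0  # angstroms, as stated on the slide

def edge_lengths(simplex, coords):
    # All six Calpha-Calpha distances of a tetrahedral simplex.
    return [math.dist(coords[a], coords[b]) for a, b in combinations(simplex, 2)]

def filter_simplices(simplices, coords, cutoff=EDGE_CUTOFF):
    # Keep only simplices whose longest edge is within the cutoff.
    return [s for s in simplices if max(edge_lengths(s, coords)) <= cutoff]

def neighbors(position, simplices):
    # Positions sharing at least one retained simplex with `position`.
    out = set()
    for s in simplices:
        if position in s:
            out.update(s)
    out.discard(position)
    return out

# Toy example: the second simplex has a 16.2 A edge (2-5) and is discarded.
coords = {1: (0, 0, 0), 2: (3.8, 0, 0), 3: (0, 3.8, 0), 4: (0, 0, 3.8), 5: (20, 0, 0)}
kept = filter_simplices([(1, 2, 3, 4), (2, 3, 4, 5)], coords)
```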

Four-Body Statistical Potential
PDB training set: nearly 1,400 diverse high-resolution x-ray structures (examples shown: 1bniA barnase, 3lzm T4 lysozyme, 1efaB lac repressor, …, 1rtjA HIV-1 RT).
Tessellate each structure, pool together all simplices from the tessellations, and compute the observed frequencies of simplicial quadruplets.

Four-Body Statistical Potential
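The formula on this slide did not survive extraction. As a hedged sketch only, four-body potentials in this line of work are typically log-likelihood scores q(ijkl) = log10(f_ijkl / p_ijkl), where f_ijkl is the observed frequency of residue quadruplet ijkl among all pooled simplices, and p_ijkl = C · a_i · a_j · a_k · a_l is the frequency expected by chance from the individual residue frequencies a, with C = 4!/∏t! counting distinct orderings (t is the multiplicity of each residue type). The code assumes exactly that form; the function names and toy inputs are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def permutation_factor(quad):
    """C = 4! / prod(t!), t = multiplicity of each residue type in the quadruplet."""
    denom = 1
    for t in Counter(quad).values():
        denom *= math.factorial(t)
    return math.factorial(4) // denom

def four_body_scores(quadruplets, residue_freq):
    """Log-likelihood score log10(f / p) for each observed quadruplet type.

    quadruplets: residue quadruplets pooled from all training tessellations
    residue_freq: background frequency a_i of each residue type
    """
    observed = Counter("".join(sorted(q)) for q in quadruplets)  # order-independent key
    total = sum(observed.values())
    scores = {}
    for key, count in observed.items():
        f = count / total
        p = permutation_factor(key) * math.prod(residue_freq[r] for r in key)
        scores[key] = math.log10(f / p)
    return scores

# 20 residue types give C(23, 4) = 8855 distinct unordered quadruplets.
ALL_QUADS = ["".join(q) for q in combinations_with_replacement(AMINO_ACIDS, 4)]
```

A positive score marks a quadruplet that is over-represented in the pooled tessellations relative to chance. Because the expected frequencies p sum to 1 over all 8855 quadruplet types (the multinomial expansion of (Σa)^4), they form a proper null distribution.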

Computational Mutagenesis
IL-3 tessellation: position D21 (large Cα point) participates in 14 simplices and has 11 neighbors.
Residual profile vector R_mut of the IL-3 D21S mutant; each component is an environmental change (EC) score.
Residual score = EC_21
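A minimal sketch of how a residual profile could be computed, assuming the definition used in related computational mutagenesis work: a position's environmental score is the sum of the four-body scores of all simplices having that position as a vertex, and R_mut is the mutant profile minus the wild-type profile. The toy positions, residues, and potential values below are hypothetical, and unseen quadruplets default to a score of 0.

```python
def profile(simplices, sequence, potential):
    """Per-position sum of four-body scores over all simplices containing that position."""
    q = {pos: 0.0 for pos in sequence}
    for simplex in simplices:
        key = "".join(sorted(sequence[p] for p in simplex))  # order-independent quadruplet
        score = potential.get(key, 0.0)  # assumption: unseen quadruplets score 0
        for p in simplex:
            q[p] += score
    return q

def residual_profile(simplices, sequence, potential, position, new_residue):
    """R_mut = profile(mutant) - profile(wild type); component i is the EC score at i."""
    mutant = dict(sequence)
    mutant[position] = new_residue
    wt = profile(simplices, sequence, potential)
    mt = profile(simplices, mutant, potential)
    return {p: mt[p] - wt[p] for p in sequence}

# Toy D21S example: only simplices containing position 21 change score.
sequence = {21: "D", 22: "A", 23: "L", 24: "K", 25: "G"}
simplices = [(21, 22, 23, 24), (22, 23, 24, 25)]
potential = {"ADKL": 0.5, "AKLS": -0.3, "AGKL": 0.2}  # hypothetical scores
r = residual_profile(simplices, sequence, potential, 21, "S")
residual_score = r[21]  # EC score at the mutated position
```

Position 25 never shares a simplex with position 21, so its EC score is zero, matching the observation that nonzero EC scores occur only at the mutated position and its structural neighbors.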

IL-3 Experimental Data: Structure – Function Relationship

Feature Vectors for IL-3 Mutants
For an IL-3 mutation at position N, nonzero EC scores in the residual profile vector R_mut occur only at N and its structural neighbors.
Every position has at least 6 neighbors, which can be ordered by Euclidean distance from position N (tessellation edge lengths).
So, create a new 7D vector: the residual score (EC score at N) and the EC scores of the 6 closest neighbors (ordered by distance from N).
20 additional features: position number N, wt and replacement residues, residues at the neighbor positions, primary sequence location of the neighbors relative to N, mean tetrahedrality and mean volume of the simplices using N, secondary structure at N, tessellation-defined depth of N, and number of surface contacts.
Total: each IL-3 mutant is represented as a 27D feature vector.
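The feature layout above can be sketched as follows. The function names, the dictionary keys in `extra`, and the encoding of "primary sequence location relative to N" as a signed offset are assumptions; categorical entries (residues, secondary structure) would still need encoding before most random forest libraries could use them.

```python
import math

def ordered_neighbors(n, neighbors, coords, k=6):
    # The k neighbors of position n closest in Euclidean (Calpha) distance.
    return sorted(neighbors, key=lambda j: math.dist(coords[n], coords[j]))[:k]

def build_feature_vector(n, wt, mut, residual, neighbors, coords, sequence, extra):
    nbrs = ordered_neighbors(n, neighbors, coords)
    # 7D block: residual score (EC at n) + EC scores of the 6 closest neighbors.
    ec = [residual[n]] + [residual[j] for j in nbrs]
    # 20 additional features (3 + 6 + 6 + 5).
    other = [n, wt, mut]
    other += [sequence[j] for j in nbrs]   # residues at neighbor positions
    other += [j - n for j in nbrs]         # sequence offset of each neighbor (assumed encoding)
    other += [extra["tetrahedrality"], extra["volume"], extra["sec_struct"],
              extra["depth"], extra["surface_contacts"]]
    return ec + other

# Toy example: position 10 with seven neighbors at increasing distances.
coords = {10: (0, 0, 0), 11: (1, 0, 0), 12: (2, 0, 0), 13: (0, 3, 0),
          14: (0, 0, 4), 15: (5, 0, 0), 16: (0, 6, 0), 17: (0, 0, 10)}
residual = {10: -0.8, 11: -0.1, 12: -0.2, 13: -0.3, 14: -0.4, 15: -0.5, 16: -0.6, 17: -0.7}
sequence = dict(zip(range(10, 18), "DALKGSTV"))
extra = {"tetrahedrality": 0.1, "volume": 40.0, "sec_struct": "H",
         "depth": 2, "surface_contacts": 3}
vec = build_feature_vector(10, "D", "S", residual, [17, 16, 15, 14, 13, 12, 11],
                           coords, sequence, extra)
```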

Supervised Classification (unaffected/affected)
Algorithm: random forest (RF); training set: 630 IL-3 mutants.
Testing: tenfold cross-validation (10-fold CV), leave-one-out CV (LOOCV), and random split (2/3 for training, 1/3 for prediction).
Evaluation of performance:
Overall accuracy, or proportion of correct predictions: Q
Balanced accuracy rate: BAR = 1 − BER, where BER is the balanced error rate
Matthews correlation coefficient: MCC
Area under the ROC curve: AUC
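The four performance measures can be computed from a confusion matrix and classifier scores as sketched below. Treating “affected” as the positive class is an assumption, and the function names are hypothetical; AUC is computed as the probability that a randomly chosen positive outscores a randomly chosen negative (ties count 1/2), which equals the area under the ROC curve.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    q = (tp + tn) / (tp + tn + fp + fn)          # overall accuracy
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ber = 1 - (sensitivity + specificity) / 2    # mean of the two per-class error rates
    bar = 1 - ber
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Q": q, "BER": ber, "BAR": bar, "MCC": mcc}

def auc(pos_scores, neg_scores):
    # Pairwise (Mann-Whitney) estimate of the area under the ROC curve.
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))
```

BAR and MCC are informative here because the classes are imbalanced (400 unaffected versus 230 affected), so Q alone can look good for a classifier biased toward the majority class.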

Statistical Significance of Predictions

Application: Predict Activity of Remaining IL-3 Mutants
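The remaining mutants are the 2128 − 630 = 1498 single substitutions that were not synthesized. A sketch of enumerating them follows; the function names are hypothetical, the IL-3 sequence is not given on the slides (the strings below are placeholders), and the “D21S” naming convention follows the slides.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def all_single_mutants(sequence, start=1):
    # Every single-residue substitution, named like "D21S" (wt, position, replacement).
    mutants = []
    for offset, wt in enumerate(sequence):
        pos = start + offset
        for aa in AMINO_ACIDS:
            if aa != wt:
                mutants.append(f"{wt}{pos}{aa}")
    return mutants

def untested_mutants(sequence, tested, start=1):
    # Mutants without experimental data, i.e. the ones left for the model to predict.
    tested = set(tested)
    return [m for m in all_single_mutants(sequence, start) if m not in tested]

# Placeholder 112-residue sequence numbered 14-125, as in PDB entry 1jli.
placeholder = "A" * 112
```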

Conclusion and Future Directions
The computational mutagenesis procedure effectively elucidates the IL-3 structure-function relationship (via residual scores).
A random forest predictive model for any mutational effect on IL-3 activity was developed using attributes based on computational geometry (Delaunay tessellation of the IL-3 structure) and computational mutagenesis (EC scores of residual profile vectors).
Current work focused on inductive learning; a future project could apply transductive learning to predict unknown mutants.
The techniques can be applied to any similar experimental protein mutant dataset, which motivates robust wet-lab collaborations.
Contact:
Slides available at: