Feature selection for characterizing HLA class I peptide motif anchors. Perry G. Ridge 1, Hernando Escobar 1, Peter E. Jensen 1, Julio C. Delgado 1, David.


Feature selection for characterizing HLA class I peptide motif anchors

Perry G. Ridge 1, Hernando Escobar 1, Peter E. Jensen 1, Julio C. Delgado 1, David K. Crockett 1,2
1 ARUP Laboratories, Department of Pathology, University of Utah School of Medicine, Salt Lake City, UT
2 Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT

INTRODUCTION

HLA class I peptide motifs have been described by dominant amino acid residues located in primary anchor positions. For example, the reported motif for HLA-A*0201 from the SYFPEITHI database is x-[LM]-x-x-x-x-x-x-[VL] [1]. Variations of this nomenclature are also seen in other HLA class I peptide motif databases such as IMGT/HLA [2]. Patterns of anchor residues have led to the development of software tools and algorithms for predicting peptide binding and for screening target organisms or sequences for a given peptide motif. However, the physical and chemical properties of peptide anchor position residues that confer allele specificity have not been as well described. For this study, supervised feature selection was used to identify the physical and chemical properties that best distinguish A*0201 peptide binders from non-binders.

METHODS

A publicly available data set of A*0201 binding peptides (n=1181) and non-binding peptides (n=1908) was downloaded from the Immune Epitope Database (IEDB) [3]. Amino acid residues of the anchor positions (P2 and Pω) were characterized using values of 544 physical, chemical, conformational, or energetic properties (AAindex v9.4) [4]. Properties downloaded from the AAindex were each represented numerically (each amino acid had a numerical value for each property). Where no value existed for a particular amino acid/property combination, a value of zero was assigned. Input files for the next processing step were created with a simple Java program: each amino acid in the anchor positions was assigned the numerical value given in the reported AAindex properties table. For each anchor position, the Correlation-based Feature Subset Selection algorithm [5], together with the Best First (greedy hill-climbing) search method, was used to identify the subset of properties that best distinguished binders from non-binders. Attribute selection algorithms were implemented using the Weka software package v3.6 [6].

RESULTS

Features selected using the full training set for anchor 1 and anchor 2 are summarized in Table 1, and results using fivefold cross-validation are reported below.

Using fivefold cross-validation, the residues in anchor 1 (P2) were best characterized by the following amino acid properties: normalized frequency of extended structure (Burgess et al., 1974), parameter of charge transfer capability (Charton-Charton, 1983), and relative preference value at C1 (Richardson-Richardson, 1988).

The anchor 2 position (Pω), again using fivefold cross-validation, was best represented by the number of atoms in the side chain labeled 3+1 (Charton-Charton, 1983), parameter of charge transfer donor capability (Charton-Charton, 1983), normalized frequency of C-terminal non-helical region (Chou-Suzuki, 1976), information measure for middle turn (Robson-Suzuki, 1976), and amphiphilicity index (Mitaku et al., 2002).

CONCLUSIONS

Supervised feature selection was used to characterize prominent physical and chemical properties for anchoring amino acid residues in HLA-A*0201 allele specificity. Ongoing efforts include allele representation and binding prediction algorithms for different HLA class I subtypes.

References:
1. Rammensee, H.G., T. Friede, and S. Stevanović, MHC ligands and peptide motifs: first listing. Immunogenetics, (4): p.
2. Robinson, J., et al., IMGT/HLA database--a sequence database for the human major histocompatibility complex. Tissue Antigens, (3): p.
3. Peters, B., et al., The immune epitope database and analysis resource: from vision to blueprint. PLoS Biol, (3): p. e
4. Kawashima, S. and M. Kanehisa, AAindex: amino acid index database. Nucleic Acids Res, (1): p.
5. Hall, M.A., Correlation-based feature selection of discrete and numeric class machine learning, in Computer Science Working Papers. 2000, University of Waikato, Department of Computer Science: Hamilton, New Zealand.
6. Witten and Frank. Data Mining: Practical machine learning tools and techniques. 2nd ed. 2005, San Francisco: Morgan Kaufmann.

Table 1. Selected attributes for HLA-A*0201 anchor positions 1 and 2.

Anchor Position | AAindex Property (a) | Original Reference
Anchor 1 | A parameter of charge transfer donor capability | Charton, 1983
Anchor 1 | Amino acid composition | Dayhoff, 1978
Anchor 1 | Atom based hydrophobic moment | Eisenberg, 1986
Anchor 1 | Partition coefficient | Garel, 1973
Anchor 1 | Polarity | Grantham, 1974
Anchor 1 | Hydrophilicity value | Hopp-Woods, 1981
Anchor 1 | Normalized frequency value of alpha-helix with weights | Levitt, 1978
Anchor 1 | AA composition of total proteins | Nakashima, 1990
Anchor 1 | Normalized frequency of beta-sheet in all-beta class | Palau, 1981
Anchor 1 | Weights for alpha-helix at the window position of 3 | Qian-Sejnowski, 1988
Anchor 1 | Average relative fractional occurrence in E0(i) | Rackovsky-Scheraga, 1982
Anchor 1 | Relative preference value at C-cap | Richardson, 1988
Anchor 1 | Normalized positional frequency at helix termini N4 | Aurora-Rose, 1998
Anchor 1 | Volumes including crystallographic waters using ProtOr | Tsai, 1999
Anchor 2 | The number of bonds in the longest chain | Charton, 1983
Anchor 2 | Average volume of buried residue | Chothia, 1975
Anchor 2 | Normalized frequency of N-terminal beta-sheet | Chou-Fasman, 1978
Anchor 2 | Conformational preference for parallel beta-strands | Lifson-Sander, 1979
Anchor 2 | AA composition of mt-proteins from fungi and plant | Nakashima, 1990
Anchor 2 | Information measure for C-terminal turn | Robson-Suzuki, 1976
Anchor 2 | Volumes including crystallographic waters using ProtOr | Tsai, 1999

(a) Accessed March 2010.

Figure 1. Common HLA-A*0201 motif. Anchor 1 and Anchor 2 were characterized using AAindex properties (v9.4).
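
The anchor-encoding step described in the methods (take the residues at P2 and Pω, look up a numerical property value for each, and assign zero where no value exists) can be sketched in Java roughly as follows. This is a minimal illustration, not the authors' program: the class name AnchorEncoder, the method encode, and the example peptide strings are hypothetical. The property table is a four-residue subset of Kyte-Doolittle hydropathy values, chosen only for illustration and left incomplete so that the zero fallback is exercised. It is not one of the properties reported as selected in this study.

```java
import java.util.Map;

// Hypothetical helper: encodes the two anchor residues of a peptide
// (P2 and the C-terminal position, P-omega) with one amino acid property.
class AnchorEncoder {

    // Illustrative property table: Kyte-Doolittle hydropathy values for four
    // residues only. Deliberately partial so that missing values occur.
    static final Map<Character, Double> HYDROPATHY_SUBSET =
            Map.of('I', 4.5, 'L', 3.8, 'M', 1.9, 'V', 4.2);

    // Returns {value at P2, value at P-omega}. A residue with no entry in the
    // property table is assigned 0.0, mirroring the zero-fill rule in the text.
    static double[] encode(String peptide, Map<Character, Double> property) {
        if (peptide == null || peptide.length() < 2) {
            throw new IllegalArgumentException("peptide must have at least 2 residues");
        }
        char p2 = peptide.charAt(1);                        // second residue
        char pOmega = peptide.charAt(peptide.length() - 1); // last residue
        return new double[] {
            property.getOrDefault(p2, 0.0),
            property.getOrDefault(pOmega, 0.0)
        };
    }

    public static void main(String[] args) {
        // Example peptide strings are assumptions used only to exercise the code.
        for (String peptide : new String[] {"GILGFVFTL", "AKAAAAAAV"}) {
            double[] features = encode(peptide, HYDROPATHY_SUBSET);
            System.out.println(peptide + " -> P2=" + features[0]
                    + ", Pomega=" + features[1]);
        }
    }
}
```

In a full pipeline, one such pair of values would be produced per property (544 in the study) and written, together with the binder/non-binder label, to an input file for attribute selection. The file format and the Weka invocation are not specified in the poster and are left out here.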