Feature selection for characterizing HLA class I peptide motif anchors

Perry G. Ridge¹, Hernando Escobar¹, Peter E. Jensen¹, Julio C. Delgado¹, David K. Crockett¹,²

¹ ARUP Laboratories, Department of Pathology, University of Utah School of Medicine, Salt Lake City, UT
² Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT 84108

INTRODUCTION

HLA class I peptide motifs have been described by the dominant amino acid residues located in primary anchor positions. For example, the reported motif for HLA-A*0201 from the SYFPEITHI database is x-[LM]-x-x-x-x-x-x-[VL] [1]. Variations of this nomenclature are also seen in other HLA class I peptide motif databases such as IMGT/HLA [2]. Patterns of anchor residues have led to the development of software tools and algorithms that predict peptide binding and screen target organisms or sequences for a given peptide motif. However, the physical and chemical properties of anchor position residues that confer allele specificity have not been as well described. For this study, supervised feature selection was used to identify the physical and chemical properties that best distinguish A*0201 peptide binders from non-binders.

METHODS

A publicly available data set of A*0201 binding peptides (n=1181) and non-binding peptides (n=1908) was downloaded from the Immune Epitope Database (IEDB) [3]. Amino acid residues at the anchor positions (P2 and Pω) were characterized using values of 544 physical, chemical, conformational, or energetic properties (AAindex v9.4) [4]. Properties downloaded from the AAindex (http://www.genome.jp/aaindex/) were each represented numerically, so that each amino acid had one numerical value per property. Where no value existed for a particular amino acid/property combination, a value of zero was assigned. A simple Java program generated the input files for the next processing step, assigning each anchor position residue the numerical value reported in the AAindex properties table. For each anchor position, the Correlation-based Feature Subset Selection algorithm [5], together with the Best First (greedy hill-climbing) search method, was used to identify the subset of properties that best distinguished binders from non-binders. Attribute selection algorithms were run in the Weka software package v3.6 [6].

RESULTS

Features selected using the full training set for anchor 1 and anchor 2 are summarized in Table 1; results using fivefold cross-validation are reported below.

Using fivefold cross-validation, the residues in anchor 1 (P2) were best characterized by the following amino acid properties:
- normalized frequency of extended structure (Burgess et al., 1974)
- parameter of charge transfer capability (Charton-Charton, 1983)
- relative preference value at C1 (Richardson-Richardson, 1988)

The anchor 2 position (Pω), again using fivefold cross-validation, was best represented by:
- the number of atoms in the side chain labeled 3+1 (Charton-Charton, 1983)
- parameter of charge transfer donor capability (Charton-Charton, 1983)
- normalized frequency of C-terminal non-helical region (Chou-Suzuki, 1976)
- information measure for middle turn (Robson-Suzuki, 1976)
- amphiphilicity index (Mitaku et al., 2002)

CONCLUSIONS

Supervised feature selection was used to characterize prominent physical and chemical properties of the anchoring amino acid residues that underlie HLA-A*0201 allele specificity. Ongoing efforts include allele representation and binding prediction algorithms for different HLA class I subtypes.

References:
1. Rammensee, H.G., T. Friede, and S. Stevanović, MHC ligands and peptide motifs: first listing. Immunogenetics, 1995. 41(4): p. 178-228.
2. Robinson, J., et al., IMGT/HLA database--a sequence database for the human major histocompatibility complex. Tissue Antigens, 2000. 55(3): p. 280-7.
3. Peters, B., et al., The immune epitope database and analysis resource: from vision to blueprint. PLoS Biol, 2005. 3(3): p. e91.
4. Kawashima, S. and M. Kanehisa, AAindex: amino acid index database. Nucleic Acids Res, 2000. 28(1): p. 374.
5. Hall, M.A., Correlation-based feature selection of discrete and numeric class machine learning, in Computer Science Working Papers. 2000, University of Waikato, Department of Computer Science: Hamilton, New Zealand.
6. Witten, I.H. and E. Frank, Data Mining: Practical machine learning tools and techniques. 2nd ed. 2005, San Francisco: Morgan Kaufmann.

Table 1. Selected attributes for HLA-A*0201 anchor positions 1 and 2.

Anchor Position | AAindex Property (a) | Original Reference
Anchor 1 | A parameter of charge transfer donor capability | Charton, 1983
Anchor 1 | Amino acid composition | Dayhoff, 1978
Anchor 1 | Atom based hydrophobic moment | Eisenberg, 1986
Anchor 1 | Partition coefficient | Garel, 1973
Anchor 1 | Polarity | Grantham, 1974
Anchor 1 | Hydrophilicity value | Hopp-Woods, 1981
Anchor 1 | Normalized frequency value of alpha-helix with weights | Levitt, 1978
Anchor 1 | AA composition of total proteins | Nakashima, 1990
Anchor 1 | Normalized frequency of beta-sheet in all-beta class | Palau, 1981
Anchor 1 | Weights for alpha-helix at the window position of 3 | Qian-Sejnowski, 1988
Anchor 1 | Average relative fractional occurrence in E0(i) | Rackovsky-Scheraga, 1982
Anchor 1 | Relative preference value at C-cap | Richardson, 1988
Anchor 1 | Normalized positional frequency at helix termini N4 | Aurora-Rose, 1998
Anchor 1 | Volumes including crystallographic waters using ProtOr | Tsai, 1999
Anchor 2 | The number of bonds in the longest chain | Charton, 1983
Anchor 2 | Average volume of buried residue | Chothia, 1975
Anchor 2 | Normalized frequency of N-terminal beta-sheet | Chou-Fasman, 1978
Anchor 2 | Conformational preference for parallel beta-strands | Lifson-Sander, 1979
Anchor 2 | AA composition of mt-proteins from fungi and plant | Nakashima, 1990
Anchor 2 | Information measure for C-terminal turn | Robson-Suzuki, 1976
Anchor 2 | Volumes including crystallographic waters using ProtOr | Tsai, 1999

(a) Accessed March 2010 from http://www.genome.jp/aaindex/

Figure 1. Common HLA-A*0201 motif. Anchor 1 and Anchor 2 were characterized using AAindex properties (v9.4).
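
The anchor encoding and feature selection steps described in the methods can be sketched in Python. This is a minimal illustration under stated assumptions, not the Java and Weka pipeline used for the poster: the property table holds made-up values rather than AAindex entries, the example peptides are hypothetical, Pearson correlation is a simplification that may differ from the correlation measure in Weka's implementation, and a plain greedy forward search stands in for Best First. All function and variable names are invented for the example.

```python
import math
import re

# Hypothetical stand-in for an AAindex table: property name -> residue -> value.
# These numbers are invented for illustration and are NOT AAindex values.
TOY_PROPERTIES = {
    "toy_hydrophobicity": {"L": 1.0, "M": 0.8, "V": 0.9, "A": 0.3, "K": -1.0, "D": -0.9},
    "toy_volume": {"L": 0.7, "M": 0.7, "V": 0.5, "A": 0.1, "K": 0.8},  # "D" left out on purpose
}

# The motif x-[LM]-x-x-x-x-x-x-[VL] written as a regular expression for 9-mers.
A0201_MOTIF = re.compile(r"^.[LM].{6}[VL]$")


def anchors(peptide):
    """Return the two anchor residues: P2 and the C-terminal residue."""
    return peptide[1], peptide[-1]


def encode(residue, properties=TOY_PROPERTIES):
    """One value per property (sorted by name); missing combinations become 0.0."""
    return [properties[name].get(residue, 0.0) for name in sorted(properties)]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cfs_merit(columns, labels):
    """CFS merit: k * mean|r_cf| / sqrt(k + k*(k-1) * mean|r_ff|)."""
    k = len(columns)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(c, labels)) for c in columns) / k
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_forward_select(columns, labels):
    """Add the feature that most improves the merit until nothing improves it."""
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        merit, idx = max(
            (cfs_merit([columns[i] for i in selected + [j]], labels), j)
            for j in remaining
        )
        if merit <= best:
            break
        selected.append(idx)
        remaining.remove(idx)
        best = merit
    return selected, best


if __name__ == "__main__":
    # Hypothetical peptides: 1 = binder, 0 = non-binder.
    data = [("GLFGGGGGV", 1), ("AMAAAAAAL", 1), ("GKFGGGGGD", 0), ("ADAAAAAAK", 0)]
    labels = [label for _, label in data]
    p2_rows = [encode(anchors(peptide)[0]) for peptide, _ in data]
    p2_columns = [list(col) for col in zip(*p2_rows)]  # one column per property
    chosen, merit = greedy_forward_select(p2_columns, labels)
    print("selected P2 properties:", [sorted(TOY_PROPERTIES)[i] for i in chosen])
    print("merit:", round(merit, 3))
```

The merit rewards properties that correlate with the binder/non-binder label and penalizes properties that correlate with each other, which is why a selected subset can be much smaller than the full set of 544 properties.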