Houssam Nassif, Hassan Al-Ali, Sawsan Khuri, Walid Keirouz, and David Page An ILP Approach to Model and Classify Hexose Binding Sites.

1 Houssam Nassif, Hassan Al-Ali, Sawsan Khuri, Walid Keirouz, and David Page An ILP Approach to Model and Classify Hexose Binding Sites

2 Problem Description Hexoses play a key role in many cellular pathways. Hexose binding properties are of great interest to biomedical researchers. Current protein-sugar computational models are based on prior biochemical knowledge. Goal: investigate the empirical support for biochemical findings by comparing ILP-induced rules to actual biochemical results.

3 Biochemical Review An amino acid consists of: a central carbon atom C, an amino group NH2, a carboxyl group COOH, a hydrogen atom H, and a side chain R. Amino acids differ by their side chain R. The side chain gives each amino acid its distinctive properties. There are 20 different amino acids, also called residues.

4 Biochemical Review A protein is a long chain of amino acids linked together. Residue sequence determines protein shape and function. Similar residues can be easily interchanged in a protein.

5 Biochemical Review Hexoses are 6-carbon sugar molecules. Hexoses consist of a core pyranose ring and hydroxyl (-OH) groups sticking out.

6 Biochemical Review Pyranose ring: apolar (no charge), hydrophobic (water-hating). Hydroxyl (-OH) groups: polar (negative charge), hydrophilic (water-loving), form hydrogen bonds. Interaction forces between hexoses and residues are due to charge, hydrogen bonding, and hydrophobicity.

7 Prior Biochemical Findings Planar polar residues (Asn, Asp, Gln, Glu, Arg) are frequently involved in hydrogen bonding. The aromatic residues (Trp, Tyr, Phe, His) stack against the apolar surface of the sugar pyranose ring. Planar polar and aromatic residues are present at higher frequencies in hexose binding sites. Ordered water molecules and metal ions are involved in binding specificity and affinity. Hexoses and their binding sites are neither purely hydrophobic nor purely hydrophilic: they exhibit both properties, in a dual nature.

8 Predicting Sugar Binding Sites Prior biochemical findings have been incorporated in binding site classifiers, which are black-box models. We take the opposite approach: given hexose binding site data, what biochemical rules can we extract with no prior biochemical knowledge?

9 Dataset Mine the Protein Data Bank for galactose, glucose and mannose. Remove redundancies and keep proteins with a hexose docked in the binding site. Get 80 protein-hexose binding sites (positive set). Extract 80 negatives: non-hexose binding sites and non-binding surface grooves. Total of 160 entries, equally divided between positives and negatives.

10 Binding Site Representation Only the few atoms present at the binding site determine the binding site affinity. We define the binding site as a sphere of radius 10 Å, centered at the binding site center. We extract all atoms in this sphere. We ONLY consider atoms, not residues. For every atom, we compute its charge, hydrogen bonding, and hydrophobicity properties.
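The sphere-based extraction described on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the `Atom` record, the `atoms_in_sphere` helper, and the sample coordinates are all hypothetical, and treating the 10 Å boundary as inclusive is an assumption.

```python
import math
from typing import NamedTuple

class Atom(NamedTuple):
    """Hypothetical atom record; field names are illustrative only."""
    number: int
    x: float
    y: float
    z: float
    element: str
    name: str

def atoms_in_sphere(atoms, center, radius=10.0):
    """Keep the atoms lying within `radius` Å of `center` (boundary included by assumption)."""
    return [a for a in atoms if math.dist((a.x, a.y, a.z), center) <= radius]

# Made-up coordinates, with the binding site center placed at the origin.
atoms = [
    Atom(1, 1.0, 2.0, 2.0, "O", "OD1"),  # 3.0 Å from the center
    Atom(2, 6.0, 7.0, 0.0, "N", "ND1"),  # about 9.2 Å from the center
    Atom(3, 9.0, 9.0, 9.0, "C", "CG1"),  # about 15.6 Å from the center, excluded
]
site = atoms_in_sphere(atoms, center=(0.0, 0.0, 0.0))
print([a.number for a in site])
```

Only atoms 1 and 2 fall inside the sphere here; residue membership is deliberately not part of the filter, matching the atoms-only representation.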

11 Problem Formulation Use Aleph heuristic search to learn first-order rules. Estimate the performance using 10-fold cross-validation. The consequent of any rule is bind(A), where A is predicted to be a hexose binding site. Restrict clause length to a maximum of 8 literals. Tolerate a clause coverage of up to 5 training-set negatives. Minimize the cost function: cost = (# covered negatives) − (# covered positives).
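The clause constraints and cost function on this slide can be illustrated with a small sketch. It does not reproduce Aleph's search; it only scores a handful of made-up candidate clauses. The candidate list, the helper names, and the way literals are counted are assumptions.

```python
MAX_LITERALS = 8   # clause length limit stated on the slide
MAX_NEGATIVES = 5  # tolerated training-set negatives per clause

def clause_cost(pos_covered, neg_covered):
    """cost = (# covered negatives) - (# covered positives); lower is better."""
    return neg_covered - pos_covered

def is_acceptable(num_literals, neg_covered):
    """Check the two constraints from the slide (literal counting is assumed)."""
    return num_literals <= MAX_LITERALS and neg_covered <= MAX_NEGATIVES

# Hypothetical candidate clauses, for illustration only.
candidates = [
    {"name": "c1", "literals": 4, "pos": 22, "neg": 4},
    {"name": "c2", "literals": 9, "pos": 30, "neg": 1},  # rejected: too long
    {"name": "c3", "literals": 5, "pos": 15, "neg": 0},
    {"name": "c4", "literals": 3, "pos": 40, "neg": 6},  # rejected: too many negatives
]
allowed = [c for c in candidates if is_acceptable(c["literals"], c["neg"])]
best = min(allowed, key=lambda c: clause_cost(c["pos"], c["neg"]))
print(best["name"], clause_cost(best["pos"], best["neg"]))
```

Among the acceptable candidates, c1 has cost 4 − 22 = −18 and c3 has cost 0 − 15 = −15, so c1 is preferred despite covering some negatives.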

12 Literals Individual atom literal: point(A,B,C,D,E,F,G,H,I,J). A: binding site. B: atom number. C, D, E: Cartesian coordinates. F: charge value (nominal). G: hydrogen bonding value (nominal). H: hydrophobicity value (nominal). I, J: atomic element and its name.
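As a rough illustration of the point/10 literal, the sketch below holds the ten arguments in a record and renders them as a Prolog-style fact. The nominal labels ("neg", "acceptor", "hydrophilic"), the site identifier, and the two-decimal coordinate format are assumptions; the slides do not list the actual value sets.

```python
from typing import NamedTuple

class Point(NamedTuple):
    site: str     # A: binding site
    atom: int     # B: atom number
    x: float      # C, D, E: Cartesian coordinates
    y: float
    z: float
    charge: str   # F: nominal charge value (labels assumed)
    hbond: str    # G: nominal hydrogen bonding value (labels assumed)
    hydro: str    # H: nominal hydrophobicity value (labels assumed)
    element: str  # I: atomic element
    name: str     # J: atom name

def to_fact(p):
    """Render a Point as a point/10 fact string (formatting is an assumption)."""
    coords = ",".join(f"{v:.2f}" for v in (p.x, p.y, p.z))
    return (f"point({p.site},{p.atom},{coords},"
            f"{p.charge},{p.hbond},{p.hydro},{p.element},{p.name}).")

p = Point("site1", 17, 12.5, -3.25, 8.0, "neg", "acceptor", "hydrophilic", "o", "od1")
print(to_fact(p))
# prints: point(site1,17,12.50,-3.25,8.00,neg,acceptor,hydrophilic,o,od1).
```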

13 Literals Distance between two atoms literal: dist(A,B1,B2,M,N). A: binding site. B1, B2: the two atom numbers. M: their Euclidean distance. N: the error, resulting in M±N (N is set to 0.5 Å).
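The M±N tolerance in the distance literal can be read as an interval test. The sketch below assumes the interval is closed (both bounds included) and uses a hypothetical helper `dist_matches`; the coordinates are made up.

```python
import math

TOLERANCE = 0.5  # N from the slide, in Å

def dist_matches(p, q, m, n=TOLERANCE):
    """True if the Euclidean distance between p and q lies within m ± n (bounds included by assumption)."""
    return abs(math.dist(p, q) - m) <= n

a = (0.0, 0.0, 0.0)
b = (3.0, 0.0, 0.0)  # 3.0 Å from a
c = (4.0, 0.0, 0.0)  # 4.0 Å from a

print(dist_matches(a, b, 3.41))  # 3.0 is within 3.41 ± 0.5
print(dist_matches(a, c, 3.41))  # 4.0 is outside 3.41 ± 0.5
```

With N = 0.5 Å, a rule that mentions 3.41 Å therefore accepts any pair of atoms between 2.91 Å and 3.91 Å apart.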

14 Results 32.5% error rate over 10 folds (p-value < 0.0002), comparable to other general sugar binding site classifiers. Although we only consider atoms, we can infer valuable information regarding amino acids. Example: ND1 atoms are only present in His residues, so a rule requiring ND1 is actually requiring His. We present the rules’ English translation, with residue substitution and sorted by coverage.
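The residue substitution step can be sketched as a lookup from atom name to the residues that carry it. The table below is a small, partial assumption based on standard PDB atom naming and on the atom names that appear in the rules; it is not the authors' mapping, and `residues_implied` is a hypothetical helper.

```python
# Partial, illustrative lookup: atom name -> residues that contain it.
ATOM_TO_RESIDUES = {
    "ND1": {"His"},         # the example given on the slide
    "OD1": {"Asp", "Asn"},
    "NE2": {"Gln", "His"},
    "OE1": {"Glu", "Gln"},
}

def residues_implied(atom_names):
    """Map each atom name to its candidate residues (empty set if not in the table)."""
    return {name: ATOM_TO_RESIDUES.get(name, set()) for name in atom_names}

implied = residues_implied(["ND1", "OD1", "XYZ"])
print(implied["ND1"])  # a rule requiring ND1 is requiring His
```

An atom name that maps to a single residue pins that residue down; one that maps to several yields an "or" in the English translation, as in "an Asp or Asn’s OD1".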

15 A is a hexose-binding site if: 1. It has a Trp residue and a Glu with an OE1 atom that is 8.53 Å away from a negatively charged Oxygen. [Pos cover = 22, Neg cover = 4] 2. It has a Phe or Tyr residue and an Asp with an OD1 atom that is 5.24 Å away from an Asp or Asn’s OD1. [Pos cover = 21, Neg cover = 3] 3. It has a branching aliphatic residue (Leu, Val, Ile), an Asp and an Asn. The OD1 atoms of the Asp and the Asn are 3.41 Å apart. [Pos cover = 15, Neg cover = 0]

16 A is a hexose-binding site if: 4. It has a hydrophilic non-hydrogen bonding Nitrogen atom (Pro, Arg, His) that is 7.95 Å away from a His ND1 nitrogen, and 9.60 Å away from a branching aliphatic residue’s CG1. [Pos cover = 10, Neg cover = 0] 5. It has a hydrophobic CD2 atom, a hydrophilic Pro backbone or His ND1 nitrogen, and two Glu (or two Gln) that are 11.89 Å apart. [Pos cover = 11, Neg cover = 2] 6. It has an Asp B, two identical atoms Q and X, and a hydrophilic hydrogen-bonding atom K. Atoms K, Q and X have the same charge. B’s ODE1 oxygen shares the same Y-coordinate with K and the same Z-coordinate with Q. Atom X is 8.29 Å away from atom K. [Pos cover = 8, Neg cover = 0]

17 A is a hexose-binding site if: 7. It has a Ser, and two Gln and/or His, with NE2 atoms that are 3.88 Å apart. [Pos cover = 8, Neg cover = 2] 8. It has an Asn and a Phe, Tyr or His residue, with a CE1 atom that is 7.07 Å away from a Calcium. [Pos cover = 5, Neg cover = 0] 9. It has a Lys or Arg, a Phe or Tyr, a Pro or His, and a Sulfate or a Phosphate. [Pos cover = 3, Neg cover = 0]
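The coverage figures reported for rules 1 to 9 can be turned into a per-rule precision, pos / (pos + neg). This is a simple arithmetic sketch on the numbers listed on the slides; it assumes nothing about which examples each rule covers, so counts are not summed across rules (rules may overlap).

```python
# (pos cover, neg cover) for rules 1-9, copied from the slides.
COVERAGE = {
    1: (22, 4), 2: (21, 3), 3: (15, 0),
    4: (10, 0), 5: (11, 2), 6: (8, 0),
    7: (8, 2),  8: (5, 0),  9: (3, 0),
}

def precision(pos, neg):
    """Fraction of the examples covered by a rule that are positives."""
    return pos / (pos + neg)

for rule, (pos, neg) in COVERAGE.items():
    print(f"rule {rule}: pos={pos} neg={neg} precision={precision(pos, neg):.3f}")

# Rules that cover no negatives at all.
pure_rules = [r for r, (_, neg) in COVERAGE.items() if neg == 0]
print(pure_rules)
```

Five of the nine rules (3, 4, 6, 8 and 9) cover no negatives; the two highest-coverage rules trade a few covered negatives for a larger share of the positives.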

18 Discussion We infer most of the established biochemical information. Rules 1 and 2, with the highest coverage, rely on the aromatic residues Trp, Tyr, and Phe. The fourth aromatic residue, His, is mentioned in many different rules. This highlights the docking interaction between the hexose and the aromatic residues. All rules require the presence of a planar polar residue (Asn, Asp, Gln, Glu, Arg). These residues are the ones most frequently involved in hexose hydrogen bonding.

19 Discussion The residues most often mentioned in the rules are aromatic and planar polar, which mirrors the fact that they are present at higher frequencies in hexose binding sites. Rule 5 requires both a hydrophobic and a hydrophilic element, reflecting the dual nature of hexose docking. Rules 8 and 9 require the presence of different ions (Calcium, Sulfate, Phosphate), confirming the relevance of ions in hexose binding.

20 New discovery? Rule 2 suggests a dependency between Phe / Tyr and Asn / Asp. Such a relation has been proven in lectins. Similarly, rule 1 suggests a dependency between Trp and Glu, a link not previously identified in the literature. Further investigation is needed to confirm this finding.

21 Conclusion ILP achieves an accuracy similar to that of other general sugar black-box classifiers. In addition, it offers insight into the discriminating process. Aleph was able to induce most of the known hexose-protein interaction biochemical rules. ILP finds a previously unreported dependency between Trp and Glu.

