Taking Geometry to its Edge: Fast Rigid (and Hinge-Bent) Docking Algorithms. Haim Wolfson¹, Dina Duhovny¹, Yuval Inbar¹, Vladimir Polak¹, Ruth Nussinov²,³. ¹School of Computer Science and ²School of Medicine, Tel Aviv University, Israel; ³NCI-Frederick, USA.
CAPRI: Critical Assessment of PRediction of Interactions. First docking contest: 19 groups from all over the world. Round 1 – 3 targets. Round 2 – 4 targets. Only 5 predictions per target can be submitted.
Geometric Docking Algorithms. Pipeline: PDB files → Molecular Surface Representation → Local Critical Feature Selection → Geometric Matching of Critical Features → Candidate Transformations → Filtering and Scoring. Active site knowledge is used as an additional input.
PPD – Norel et al. 1994. Surface representation – Connolly MS. Critical features – local extrema of surface curvature (‘knobs’ / ‘holes’). Matching – pairs of critical points and their associated normals are matched using Geometric Hashing. Scoring – shape complementarity (allowing moderate penetration), electrostatics, aromatic residues.
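Matching pairs of critical points with their normals by Geometric Hashing can be illustrated with a toy sketch. It assumes a common choice of rigid-motion-invariant pair features (the distance, the angle of each normal to the connecting segment, and the torsion between the normals) and fixed bin sizes of 1 Å and 15°; PPD's actual invariants, bin sizes and voting scheme may differ. The names `pair_signature`, `hash_key`, `build_table` and `query` are hypothetical.

```python
import math
from collections import defaultdict

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _angle(a, b):
    c = _dot(a, b) / math.sqrt(_dot(a, a) * _dot(b, b))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding error

def pair_signature(p1, n1, p2, n2):
    """Rigid-motion-invariant features of two oriented surface points."""
    d = _sub(p2, p1)
    dist = math.sqrt(_dot(d, d))
    m1, m2 = _cross(d, n1), _cross(d, n2)
    # Signed dihedral angle between the planes (d, n1) and (d, n2).
    torsion = math.atan2(_dot(_cross(m1, m2), d) / dist, _dot(m1, m2))
    return dist, _angle(n1, d), _angle(n2, d), torsion

def hash_key(sig, dist_step=1.0, angle_step=math.radians(15)):
    """Quantize a signature into a hash-table bin (bin sizes are assumptions)."""
    dist, a1, a2, tors = sig
    return (round(dist / dist_step), round(a1 / angle_step),
            round(a2 / angle_step), round(tors / angle_step))

def build_table(points, normals, max_dist=10.0):
    """Hash every receptor pair (i, j) closer than max_dist by its quantized signature."""
    table = defaultdict(list)
    for i in range(len(points)):
        for j in range(len(points)):
            if i == j:
                continue
            sig = pair_signature(points[i], normals[i], points[j], normals[j])
            if sig[0] <= max_dist:
                table[hash_key(sig)].append((i, j))
    return table

def query(table, p1, n1, p2, n2):
    """Receptor pairs whose signature falls in the same bin as the ligand pair."""
    return table.get(hash_key(pair_signature(p1, n1, p2, n2)), [])
```

Because the signature is invariant under rotation and translation, a ligand pair that matches a receptor pair lands in the same bin regardless of pose. For complementarity (knob into hole), one would typically negate the ligand normals before querying; each hit then proposes a candidate transformation.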
BUDDA – V. Polak (M.Sc. Thesis 2002). Surface representation – Connolly patch centers (caps/pits/belts), distance transform grid. Critical features – ‘knobs’ / ‘holes’ + caps/pits/belts. Option to focus on points related to backbone residues. Matching – a knob/hole + a pair of neighboring caps/pits are matched using Geometric Hashing. Scoring – shape complementarity, allowing moderate penetration.
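A distance transform grid stores, for every voxel, the distance to the nearest molecular-surface voxel. The brute-force sketch below assumes the molecule is given as a boolean occupancy grid, treats an occupied voxel with at least one empty (or out-of-grid) 6-neighbour as surface, and signs distances negative inside the molecule. It is for illustration only: it costs O(voxels × surface voxels), whereas practical codes use linear-time distance transforms. The function names are hypothetical.

```python
import math
from itertools import product

_NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def surface_voxels(occ):
    """Occupied voxels with at least one empty (or out-of-grid) 6-neighbour."""
    nx, ny, nz = len(occ), len(occ[0]), len(occ[0][0])
    surf = []
    for x, y, z in product(range(nx), range(ny), range(nz)):
        if not occ[x][y][z]:
            continue
        for dx, dy, dz in _NEIGHBOURS:
            a, b, c = x + dx, y + dy, z + dz
            inside = 0 <= a < nx and 0 <= b < ny and 0 <= c < nz
            if not inside or not occ[a][b][c]:
                surf.append((x, y, z))
                break
    return surf

def signed_distance_grid(occ, spacing=1.0):
    """Distance to the nearest surface voxel (times spacing); negative inside.

    Raises ValueError if the grid contains no surface voxel.
    """
    surf = surface_voxels(occ)
    nx, ny, nz = len(occ), len(occ[0]), len(occ[0][0])
    grid = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for x, y, z in product(range(nx), range(ny), range(nz)):
        d = min(math.dist((x, y, z), s) for s in surf) * spacing
        grid[x][y][z] = -d if occ[x][y][z] else d
    return grid
```

With such a grid, the depth of any ligand point relative to the receptor surface is a single lookup, which is what makes grid-based complementarity scoring fast.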
PatchDock – Duhovny et al. 2002. Surface representation – distance transform grid, multi-resolution surface. Critical features – three types of surface patches: convex, concave and flat. Focus on the active site: hot-spot-rich patches. Matching – patch points are matched by Geometric Hashing. Scoring – shape complementarity, allowing moderate penetration.
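Shape-complementarity scoring that tolerates moderate penetration can be sketched as follows: count ligand surface points lying in a thin shell around the receptor surface, tolerate points slightly inside, and reject the pose when too many points are deep inside the receptor. The thresholds (1.5 Å shell, 2.0 Å penetration, 10% deep points), the sign convention (negative inside) and the `receptor_distance` callable (for example a lookup into a distance transform grid) are assumptions, not the published scoring function.

```python
import math
from typing import Callable, Iterable, Optional

Point = tuple[float, float, float]

def complementarity_score(ligand_surface: Iterable[Point],
                          receptor_distance: Callable[[Point], float],
                          shell: float = 1.5,
                          max_penetration: float = 2.0,
                          max_deep_fraction: float = 0.1) -> Optional[int]:
    """Number of ligand points in contact with the receptor, or None if the pose clashes."""
    pts = list(ligand_surface)
    good = deep = 0
    for p in pts:
        d = receptor_distance(p)
        if d < -max_penetration:
            deep += 1              # too deep: counts towards rejection
        elif abs(d) <= shell:
            good += 1              # close to the receptor surface: a contact
        # points with -max_penetration <= d < -shell are tolerated penetration
    if pts and deep / len(pts) > max_deep_fraction:
        return None
    return good
```

For a quick check, the receptor can be modelled as a sphere of radius 5 Å whose signed distance is `math.dist(p, origin) - 5`.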
Our Docking Algorithms – comparison of PPD, BUDDA and PatchDock.
Surface representation – PPD: Connolly’s MS; BUDDA: caps/pits/belts, distance transform grid; PatchDock: distance transform grid, multi-resolution surface.
Critical features – PPD: ‘knobs’/‘holes’ (point+normal pairs); BUDDA: backbone ‘knob/hole’ + pair of ‘caps/pits’; PatchDock: surface patches: convex, concave and flat (point+normal pairs in a patch).
Matching algorithm – Geometric Hashing (all three).
Filtering and scoring – PPD: shape complementarity, electrostatics, aromatic residues; BUDDA and PatchDock: shape complementarity.
Active site focusing – PPD and BUDDA: in the matching step; PatchDock: in the matching and scoring steps.
Automatic CDR detection. The light and heavy chains of antibodies have conserved sequence patterns that enable us to align a given sequence to a consensus sequence derived from statistical data. This alignment is then used to locate the CDRs.
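The alignment step can be illustrated with a plain Needleman-Wunsch global alignment that transfers CDR boundaries annotated on a consensus sequence onto a query chain. The scoring values, the toy consensus and its CDR coordinates are hypothetical; the slide does not specify the consensus, scoring scheme or boundary definitions used.

```python
def global_align(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch; returns aligned index pairs (i, j), with None marking a gap."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    sub = lambda i, j: match if a[i - 1] == b[j - 1] else mismatch
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(i, j), S[i - 1][j] + gap, S[i][j - 1] + gap)
    # Traceback from the bottom-right corner along an optimal path.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(i, j):
            pairs.append((i - 1, j - 1)); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            pairs.append((i - 1, None)); i -= 1
        else:
            pairs.append((None, j - 1)); j -= 1
    return pairs[::-1]

def map_cdrs(query, consensus, consensus_cdrs):
    """Map consensus CDR ranges [start, end) onto query residue indices."""
    cons_to_query = {c: q for c, q in global_align(consensus, query)
                     if c is not None and q is not None}
    result = {}
    for name, (start, end) in consensus_cdrs.items():
        hits = [cons_to_query[c] for c in range(start, end) if c in cons_to_query]
        result[name] = (min(hits), max(hits) + 1) if hits else None
    return result
```

For example, if the query lacks one residue before the CDR, the mapped range shifts by one position, while a substitution inside the CDR leaves the range unchanged.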
Target 1 – HPr Kinase. HPr kinase/phosphatase (HprK/P) is a key regulatory enzyme controlling carbon metabolism in bacteria. The protein is a hexamer. HprK/P contains the Walker motif, characteristic of nucleotide-binding proteins.
Target 1 – HPr Kinase. HprK/P catalyses the ATP-dependent phosphorylation/dephosphorylation of Ser46 in HPr.
Target 1 – HPr Kinase/HPr. What was done: – Distance constraint of 10.0 Å between the oxygen atom of Ser(Asp)-46 of HPr and the closest phosphate oxygen. Results: – Best result within the top 10: rank 7, RMSD from native ~8.0 Å. – Explanation: a considerable part of the interface surface area lies between HPr and the enzyme's flexible helix.
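A distance constraint like the one used for Target 1 can be applied as a post-filter on candidate rigid transformations: move the HPr Ser46 oxygen by each candidate rotation and translation, and keep the pose only if the closest phosphate oxygen on the kinase side (kept fixed) lies within 10 Å. The data layout (3×3 rotation given as rows, translation vector, coordinates in Å) and the function names are assumptions.

```python
import math

def apply(rot, trans, p):
    """Apply x' = R x + t, with R given as three rows."""
    return tuple(sum(rot[r][c] * p[c] for c in range(3)) + trans[r] for r in range(3))

def satisfies_constraint(rot, trans, ser_oxygen, phosphate_oxygens, cutoff=10.0):
    """True if the moved Ser46 oxygen is within cutoff of the closest phosphate oxygen."""
    moved = apply(rot, trans, ser_oxygen)
    return min(math.dist(moved, o) for o in phosphate_oxygens) <= cutoff

def filter_poses(poses, ser_oxygen, phosphate_oxygens, cutoff=10.0):
    """Keep the (R, t) poses that satisfy the distance constraint."""
    return [(R, t) for R, t in poses
            if satisfies_constraint(R, t, ser_oxygen, phosphate_oxygens, cutoff)]
```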
Target 1 – Lessons Learned. Flexible hinge-bent docking: – Two rigid parts of the enzyme: (i) the helix of chain C, (ii) the body of chain A without the helix. New results: – 2nd-best-scoring result: ~3.0 Å, run-time: 2 min. – In our solution the phosphocarrier protein is in red and the helix of the kinase is in orange. – These results were achieved without using the distance constraint.
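One way to picture hinge-bent docking with two rigid parts is: dock each part independently, then accept a pair of part placements only if both map the shared hinge point to nearly the same position. The sketch below shows only that consistency test. Representing placements as (rotation, translation) pairs, the 2 Å tolerance and the function names are assumptions; clash checking and scoring of the assembled solution are omitted.

```python
import math
from itertools import product

def apply(rot, trans, p):
    """Apply x' = R x + t, with R given as three rows."""
    return tuple(sum(rot[r][c] * p[c] for c in range(3)) + trans[r] for r in range(3))

def combine_hinge_parts(poses_a, poses_b, hinge, tol=2.0):
    """Index pairs (i, j) of part-A and part-B placements that keep the hinge connected."""
    pairs = []
    for (i, (ra, ta)), (j, (rb, tb)) in product(enumerate(poses_a), enumerate(poses_b)):
        if math.dist(apply(ra, ta, hinge), apply(rb, tb, hinge)) <= tol:
            pairs.append((i, j))
    return pairs
```

A rotation of one part about the hinge leaves the hinge in place, so such a pair is always accepted; this is exactly the degree of freedom that hinge bending adds over rigid docking.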
Helix positions: in the uncomplexed structure (dark green); in the solution obtained by flexible docking (orange); in the structure of the complex (purple).
Target 2 – Biological Background. VP6 protein of rotavirus, which causes gastroenteritis in children.
Trimer (symmetry). The surface of the B (helical) domain is buried in the rotavirus capsid. The H domain interacts with the antibody. A ‘hint’ was given to use the trimer in the docking, meaning that the binding site extends over more than one chain.
Target 2 – What we did. The antibody's potential binding site was restricted to the CDRs. The potential binding site of the antigen VP6 was restricted to the β domain. We selected solutions whose interfaces include: 1. at least 4 CDRs of the antibody, with a high Tyr/Trp concentration; 2. at least 2 chains of the antigen. The solutions obtained for the different chains of the trimer were then clustered.
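The two selection rules for Target 2 can be expressed as a filter over interface contacts. The sketch assumes each candidate's interface is available as (antibody residue, antigen chain) contacts, that CDR membership comes from a precomputed residue-to-CDR map, and reads "high Tyr/Trp concentration" as a minimum fraction of Tyr/Trp among antibody interface residues. The 0.2 threshold, the `Residue` type and the function name are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Residue:
    chain: str
    number: int
    name: str  # three-letter code, e.g. "TYR"

def passes_target2_filter(contacts, cdr_of, min_cdrs=4, min_antigen_chains=2,
                          min_aromatic_fraction=0.2):
    """contacts: iterable of (antibody Residue, antigen chain id) interface pairs."""
    contacts = list(contacts)
    ab_residues = {ab for ab, _ in contacts}
    cdrs = {cdr_of[r] for r in ab_residues if r in cdr_of}       # distinct CDRs touched
    chains = {chain for _, chain in contacts}                     # antigen chains touched
    aromatic = sum(r.name in ("TYR", "TRP") for r in ab_residues)
    fraction = aromatic / len(ab_residues) if ab_residues else 0.0
    return (len(cdrs) >= min_cdrs and len(chains) >= min_antigen_chains
            and fraction >= min_aromatic_fraction)
```

Note that the second rule (at least 2 antigen chains) is the one whose counterpart for Target 3 (only 1 chain) turned out to be the main reason for failure there.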
Target 2 – Our best hit. Our solution (blue) vs. the original complex: RMSD 15 Å, rank 7 (among the 5 results that were not submitted due to technical problems).
Target 2 – Lessons Learned 1. Search only the loop regions of the antigen. Restrict the antigen even further to the exposed part of the virus capsid. Figure: loops of the “cap” region in spacefill (side view and top view).
Target 2 – Lessons Learned 2. Filter out results in which the 3 (symmetry-related) antibodies bound to the antigen trimer clash sterically.
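The symmetry filter can be sketched by generating the two other antibody copies with the trimer's 3-fold rotation and rejecting a pose if atoms of different copies come closer than a clash cutoff. The sketch assumes the symmetry axis is the z-axis through the origin (in practice the axis would be derived from the trimer coordinates) and a 3.0 Å cutoff; the brute-force pair check is used only for clarity.

```python
import math

def rotate_z(p, theta):
    """Rotate point p by theta radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    x, y, z = p
    return (c * x - s * y, s * x + c * y, z)

def symmetric_copies_clash(antibody_atoms, cutoff=3.0):
    """True if the antibody clashes with its copies under 3-fold rotation about z."""
    copies = [[rotate_z(a, k * 2 * math.pi / 3) for a in antibody_atoms] for k in range(3)]
    for k in range(3):
        for l in range(k + 1, 3):
            if any(math.dist(a, b) < cutoff for a in copies[k] for b in copies[l]):
                return True
    return False
```

An antibody atom close to the symmetry axis will always collide with its own copies, so poses that place the antibody over the trimer axis are removed.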
Target 2 – A-posteriori best hits. Our best hit: RMSD 3.08 Å, rank 76. Best within the first 10: RMSD 5.54 Å, rank 9. Run-time: 7 min. Figure: VP6 molecule in spacefill; the Fab of the original complex (blue) superimposed on our solution (rank 9, yellow).
Target 2 – Analysis of geometric shape complementarity. The interface area of the original (blue) complex is ~400 Å². The interface area of our highest-ranked solution (yellow) is ~600 Å². In this result the light chain of the antibody is shifted towards the center of the virus capsid, increasing shape complementarity. The heavy chain is very close to its original location. Figure labels: heavy chains, light chains.
Target 2 – Shape complementarity. Figure panels: heavy chains only; light chains.
Target 3 – Influenza hemagglutinin. Hexamer: 3 dimers (symmetry). One chain of each dimer is buried in the capsid. Other antibody-antigen complexes of this antigen also imply that the epitope is on the ‘external’ chains (A, C and E).
Target 3 – What we did. The antibody's potential binding site was restricted to the CDRs. We selected solutions whose interface includes: 1. at least 4 CDRs of the antibody, with a high Tyr/Trp concentration; 2. only 1 chain of the antigen (the main reason for failure!). The solutions obtained for the different chains of the trimer were clustered.
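Clustering the solutions obtained for the different chains of the trimer removes near-duplicate poses. One common way to do this, assumed here rather than taken from the slides, is greedy clustering by ligand RMSD: walk the solutions in score order and start a new cluster only if a solution is farther than a cutoff from every existing representative. The 4 Å cutoff and the representation of each pose as a list of transformed ligand coordinates (identical atom order across poses) are assumptions. RMSD is computed without superposition because all poses share the receptor's frame.

```python
import math

def rmsd(a, b):
    """RMSD between two equally ordered coordinate lists, without superposition."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))

def greedy_cluster(solutions, cutoff=4.0):
    """solutions: list of (score, coords). Returns cluster representatives, best score first."""
    reps = []
    for score, coords in sorted(solutions, key=lambda s: s[0], reverse=True):
        if all(rmsd(coords, rep) > cutoff for _, rep in reps):
            reps.append((score, coords))
    return reps
```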
Target 3 – Lessons Learned. Restrict the antigen's potential binding site to the exposed domain of the virus capsid. Filter out results in which the 3 antibodies clash sterically (symmetry constraint). Filter out results whose interface includes only one chain of the virus capsid. Detect structurally conserved regions of influenza virus hemagglutinin to reduce the effective protein surface. MultiProt (a tool for multiple alignment and detection of structurally conserved patterns) was applied to 25 hemagglutinin structures from the PDB.
Target 3 – MultiProt Results. 138 structurally conserved residues out of 320 residues in domain HA1. Some of those residues exhibit significant sequence variability. Figure labels: HA1 domain, HA2 domain, structurally conserved residues.
Three sites of antibody binding: CAPRI Target 3, 1QFU, 2VIR. Figure labels: hemagglutinin molecule, structurally conserved residues.
Target 3 – MultiProt Results. Figure panels: 1qfu, capri3, 2vir.
Target 3 – Final results. RMSD 3.10 Å, rank 6, run-time: 5 min. The antibody of the original complex is in red and our solution is in green. Figure labels: virus capsid protein hemagglutinin, antibody from the complex, docking solution.
Targets 4, 5, 6 – Alpha-amylase. 3 catalytic residues in the largest cavity: Asp 197, Asp 300, Glu 233. Also in the cavity: Ca (stabilizer ion), Cl (activator ion), and a Gly-rich flexible loop (304-309).
Targets 4, 5, 6 – What we did. Non-conserved regions were extracted based on a multiple sequence alignment of mammalian amylases. These regions were marked as the amylase's potential binding site for the camel antibody. We favored results with a larger interface area of CDR loop H3, and results with a larger interface area of variable regions (the reason for failure in all 3 targets).
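Extracting non-conserved regions from a multiple sequence alignment can be sketched with a simple per-column measure: the fraction of sequences sharing the column's most common residue. This identity-based score, the 0.8 threshold and the function name are assumptions; the slides report ConSurf output, which uses a different, phylogeny-aware conservation score.

```python
from collections import Counter

def variable_positions(msa, threshold=0.8, reference=0):
    """Residue indices (ungapped, in the reference sequence) at non-conserved columns."""
    ref = msa[reference]
    positions = []
    ref_index = -1
    for col, ref_residue in enumerate(ref):
        if ref_residue == "-":
            continue  # column has no residue in the reference sequence
        ref_index += 1
        residues = [seq[col] for seq in msa if seq[col] != "-"]
        most_common_count = Counter(residues).most_common(1)[0][1]
        # Gaps in other sequences count against conservation.
        if most_common_count / len(msa) < threshold:
            positions.append(ref_index)
    return positions
```

The returned indices refer to the reference sequence (for example, the target amylase), so they can be mapped directly onto its structure to mark the potential binding site.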
Why non-conserved? The camel has its own amylase, so it can only produce antibodies against the residues that differ between the two amylases. The interface must therefore include some of those differing residues. We do not know the sequence of camel amylase, so we simply consider the variable regions.
Targets 4, 5, 6 – Non-conserved regions in the interfaces (ConSurf output). Target 4: 15% of the interface; Target 5: 13%; Target 6: 20%.
Targets 4, 5, 6 – Automatic CDR detection. Fraction of the interface formed by the detected CDRs: Target 4: 89%; Target 5: 88%; Target 6: 83%. Figure labels (Targets 4 and 5): amylase, antibody, CDRs.
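Targets 4, 5, 6 – New results. Only restriction: at least 70% of the interface in the candidate complexes belongs to CDRs. Average running time: 25 min.
Target 4 – best RMSD 2.67 Å, rank 169; interface area ~405 Å² (correct solution) vs. ~765 Å² (highest-ranked solution).
Target 5 – best RMSD 1.82 Å, rank 156; interface area ~435 Å² (correct solution) vs. ~700 Å² (highest-ranked solution).
Target 6 – best RMSD 1.90 Å, rank 4; interface area ~570 Å² (correct solution) vs. ~600 Å² (highest-ranked solution).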
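The single restriction used for the new Targets 4-6 runs, that at least 70% of the interface belongs to CDRs, reduces to a fraction test once the antibody interface residues and the detected CDR residues are known. Counting residues (rather than interface area), representing them as (chain, number) tuples, and the function names are assumptions.

```python
def cdr_fraction(interface_residues, cdr_residues):
    """Fraction of antibody interface residues that lie in detected CDRs."""
    interface = set(interface_residues)
    if not interface:
        return 0.0
    return len(interface & set(cdr_residues)) / len(interface)

def keep_candidates(candidates, cdr_residues, min_fraction=0.7):
    """candidates: list of (candidate_id, antibody interface residues)."""
    return [cid for cid, iface in candidates
            if cdr_fraction(iface, cdr_residues) >= min_fraction]
```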
Target 7: T-cell receptor with streptococcal pyrogenic exotoxin. In a PDB search we found a complex of a TCR with staphylococcal enterotoxin. The two toxins have high structural similarity, so aligning the streptococcal toxin onto the staphylococcal toxin in that complex gives our first solution. Figure: TCR-SAG (superantigen) complex in blue, streptococcal toxin in yellow.
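The "alignment as first solution" step amounts to a least-squares superposition of the streptococcal toxin onto the staphylococcal toxin in the TCR complex, which then places the streptococcal toxin relative to the TCR. A standard way to compute such a superposition is the Kabsch algorithm, sketched below with NumPy. It assumes the residue correspondence (for example matched Cα atoms) is already known; a structural alignment tool would have to find that correspondence first.

```python
import numpy as np

def kabsch(P, Q):
    """Rotation R and translation t minimising sum ||R p + t - q||^2 over matched rows."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p0).T @ (Q - q0)               # 3x3 covariance of centred coordinates
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # correct for a possible reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q0 - R @ p0
    return R, t

def superposed_rmsd(P, Q):
    """RMSD between matched coordinate sets after optimal superposition of P onto Q."""
    R, t = kabsch(P, Q)
    diff = np.asarray(P, dtype=float) @ R.T + t - np.asarray(Q, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Applying the resulting `R` and `t` to all atoms of the streptococcal toxin yields a candidate TCR complex, which can then be refined by the focused docking described next.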
Target 7: Docking. Active site focusing – TCR: only the loops relevant for SAG binding were selected. Toxin: loops from the interface of the complex computed by alignment were selected.
Target 7: Docking Results. Active site focusing for both TCR and SAG: best result RMSD 3.37 Å, rank 3, running time 1 min. Active site focusing for TCR only: best result RMSD 3.37 Å, rank 36, running time 7 min.
Conclusions 1. We have presented results of fast rigid docking algorithms, which are based on geometric shape complementarity only. The algorithms can be easily extended to include main-chain flexibility (hinge bending). Successful approximate focusing on the binding sites of the proteins “almost” ensures that a “correct” solution is ranked at the top.
Conclusions 2. Despite the heuristic nature of the algorithms, which are based on local shape complementarity rather than on exhaustive search of the transformation space, “correct solutions” are not lost. A “correct solution” always appears among the first few hundred, yet the “best solution” might exhibit significantly higher shape complementarity than the “native” one.
Conclusions 3. Re-ranking the top few hundred geometric solutions with a GOOD energy function would yield a correct solution. Biological knowledge of “similar” interactions can assist in focusing on the binding sites. A fully automatic prediction can help to evaluate the relative merit of the various algorithms employed.
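The proposed re-ranking can be sketched as a two-stage pipeline: shortlist the top few hundred solutions by geometric score, then sort only the shortlist by an energy function. The `energy` callable is a placeholder for whatever "good" energy function is used (lower is better here), and `top_n = 300` is an assumed reading of "a few hundred".

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def rerank(solutions: Sequence[T],
           geometric_score: Callable[[T], float],
           energy: Callable[[T], float],
           top_n: int = 300) -> list[T]:
    # Stage 1: shortlist by geometric complementarity (higher is better).
    shortlist = sorted(solutions, key=geometric_score, reverse=True)[:top_n]
    # Stage 2: order the shortlist by energy (lower is better).
    return sorted(shortlist, key=energy)
```

Because the correct solution reportedly always appears among the first few hundred geometric hits, the expensive energy evaluation only needs to run on the shortlist, not on the full transformation space.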
Acknowledgements. The CAPRI organizers and evaluators. Maxim Shatsky, Hadar Benyamini, Inbal Halperin, Adi Barzilay, Snait Tamir. Raquel Norel.