Solving and Analyzing Side-Chain Positioning Problems Using Linear and Integer Programming Carleton L. Kingsford, Bernard Chazelle and Mona Singh Bioinformatics.

Slides:



Advertisements
Similar presentations
Linear Programming (LP) (Chap.29)
Advertisements

Approximation Algorithms Chapter 14: Rounding Applied to Set Cover.
Rosetta Energy Function Glenn Butterfoss. Rosetta Energy Function Major Classes: 1. Low resolution: Reduced atom representation Simple energy function.
Crystallography -- lecture 21 Sidechain chi angles Rotamers Dead End Elimination Theorem Sidechain chi angles Rotamers Dead End Elimination Theorem.
1 MERLIN A polynomial solution for the Traveling Salesman Problem Dr. Joachim Mertz, 2005.
Lecture 10: Integer Programming & Branch-and-Bound
EMIS 8373: Integer Programming Valid Inequalities updated 4April 2011.
1 Fast Primal-Dual Strategies for MRF Optimization (Fast PD) Robot Perception Lab Taha Hamedani Aug 2014.
Rapid Protein Side-Chain Packing via Tree Decomposition Jinbo Xu Department of Mathematics Computer Science and AI Lab MIT.
Computability and Complexity 23-1 Computability and Complexity Andrei Bulatov Search and Optimization.
Computational problems, algorithms, runtime, hardness
Approximation Algorithms
The Side-Chain Positioning Problem Joint work with Bernard Chazelle and Mona Singh Carl Kingsford Princeton University.
1 Optimization problems such as MAXSAT, MIN NODE COVER, MAX INDEPENDENT SET, MAX CLIQUE, MIN SET COVER, TSP, KNAPSACK, BINPACKING do not have a polynomial.
Implicit Hitting Set Problems Richard M. Karp Harvard University August 29, 2011.
Approximation Algorithms
Solving the Protein Threading Problem in Parallel Nocola Yanev, Rumen Andonov Indrajit Bhattacharya CMSC 838T Presentation.
2-Layer Crossing Minimisation Johan van Rooij. Overview Problem definitions NP-Hardness proof Heuristics & Performance Practical Computation One layer:
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 5: February 2, 2009 Architecture Synthesis (Provisioning, Allocation)
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract.
Odds and Ends HP ≤ p HC (again) –Turing reductions Strong NP-completeness versus Weak NP-completeness Vertex Cover to Hamiltonian Cycle.
Integer Programming Difference from linear programming –Variables x i must take on integral values, not real values Lots of interesting problems can be.
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract.
Linear Programming – Max Flow – Min Cut Orgad Keller.
Protein Side Chain Packing Problem: A Maximum Edge-Weight Clique Algorithmic Approach Dukka Bahadur K.C, Tatsuya Akutsu and Tomokazu Seki Proceedings of.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 5: February 2, 2009 Architecture Synthesis (Provisioning, Allocation)
Approximation Algorithms: Bristol Summer School 2008 Seffi Naor Computer Science Dept. Technion Haifa, Israel TexPoint fonts used in EMF. Read the TexPoint.
Daniel Kroening and Ofer Strichman Decision Procedures An Algorithmic Point of View Deciding ILPs with Branch & Bound ILP References: ‘Integer Programming’
1 Lecture 4 Maximal Flow Problems Set Covering Problems.
Two Discrete Optimization Problems Problem: The Transportation Problem.
Decision Procedures An Algorithmic Point of View
Fixed Parameter Complexity Algorithms and Networks.
Design Techniques for Approximation Algorithms and Approximation Classes.
Approximating Minimum Bounded Degree Spanning Tree (MBDST) Mohit Singh and Lap Chi Lau “Approximating Minimum Bounded DegreeApproximating Minimum Bounded.
1 Introduction to Approximation Algorithms. 2 NP-completeness Do your best then.
Introduction to Job Shop Scheduling Problem Qianjun Xu Oct. 30, 2001.
Operations Research Assistant Professor Dr. Sana’a Wafa Al-Sayegh 2 nd Semester ITGD4207 University of Palestine.
Rotamer Packing Problem: The algorithms Hugo Willy 26 May 2010.
Example Question on Linear Program, Dual and NP-Complete Proof COT5405 Spring 11.
EMIS 8373: Integer Programming NP-Complete Problems updated 21 April 2009.
Conformational Entropy Entropy is an essential component in ΔG and must be considered in order to model many chemical processes, including protein folding,
Molecular simulation methods Ab-initio methods (Few approximations but slow) DFT CPMD Electron and nuclei treated explicitly. Classical atomistic methods.
CSE 589 Part VI. Reading Skiena, Sections 5.5 and 6.8 CLR, chapter 37.
NP-COMPLETE PROBLEMS. Admin  Two more assignments…  No office hours on tomorrow.
Integer Programming (정수계획법)
Approximation Algorithms Department of Mathematics and Computer Science Drexel University.
Implicit Hitting Set Problems Richard M. Karp Erick Moreno Centeno DIMACS 20 th Anniversary.
1 CS612 Algorithms for Electronic Design Automation CS 612 – Lecture 8 Lecture 8 Network Flow Based Modeling Mustafa Ozdal Computer Engineering Department,
CPS Computational problems, algorithms, runtime, hardness (a ridiculously brief introduction to theoretical computer science) Vincent Conitzer.
EMIS 8373: Integer Programming Column Generation updated 12 April 2005.
1 Approximation algorithms Algorithms and Networks 2015/2016 Hans L. Bodlaender Johan M. M. van Rooij TexPoint fonts used in EMF. Read the TexPoint manual.
Tightening LP Relaxations for MAP using Message-Passing David Sontag Joint work with Talya Meltzer, Amir Globerson, Tommi Jaakkola, and Yair Weiss.
A Linear Time Algorithm for the Longest Path Problem on 2-trees joint work with Tzvetalin Vassilev and Krassimir Manev
TU/e Algorithms (2IL15) – Lecture 12 1 Linear Programming.
Approximation Algorithms based on linear programming.
1 Geometrical solution of Linear Programming Model.
TU/e Algorithms (2IL15) – Lecture 12 1 Linear Programming.
Example for SCP-problem PDB code: 2OL9 #Resides : 6 ① SER ② ASN ④ ASN ⑤ ASN ⑥ PHE ③ GLN 1 ① SER ② ASN ④ ASN ⑤ ASN ⑥ PHE ③ GLN.
Computational problems, algorithms, runtime, hardness
The minimum cost flow problem
Approximation algorithms
EMIS 8373: Integer Programming
Chapter 5. Optimal Matchings
Integer Programming (정수계획법)
Using WinQSB to solve Linear Programming Models
Linear Programming Duality, Reductions, and Bipartite Matching
CPS 173 Computational problems, algorithms, runtime, hardness
Integer Programming (정수계획법)
Dr. Arslan Ornek DETERMINISTIC OPTIMIZATION MODELS
Presentation transcript:

Solving and Analyzing Side-Chain Positioning Problems Using Linear and Integer Programming Carleton L. Kingsford, Bernard Chazelle and Mona Singh Bioinformatics Vol. 21, no. 7, pages 1028–1036, 2005 Date: October 6, 2005 Created by Jing-Liang Hsin

Abstract Side-chain positioning is a central component of homology modeling and protein design. In a common formulation of the problem, the backbone is fixed, side-chain conformations come from a rotamer library, and a pairwise energy function is optimized. It is NP-complete to find even a reasonable approximate solution to this problem. We seek to put this hardness result into practical context. We present an integer linear programming (ILP) formulation of side- chain positioning that allows us to tackle large problem sizes. We relax the integrality constraint to give a polynomial-time linear programming (LP) heuristic. We apply LP to position side chains on native and homologous backbones and to choose side chains for protein design.

Abstract Surprisingly, when positioning side chains on native and homologous backbones, optimal solutions using a simple, biologically relevant energy function can usually be found using LP. On the other hand, the design problem often cannot be solved using LP directly; however, optimal solutions for large instances can still be found using the computationally more expensive ILP procedure. While different energy functions also affect the difficulty of the problem, the LP/ILP approach is able to find optimal solutions. Our analysis is the first large-scale demonstration that LP-based approaches are highly effective in finding optimal (and successive near-optimal) solutions for the side-chain positioning problem.

Problem Formulation The potential energyε = E 0 + Σ i E( i r ) + Σ i<j E( i r, j s ) –E 0 is the self-energyof the backbone. –E( i r ) is the energy resulting from the interaction between the backbone and the chosen rotamer i r at position i as well as the intrinsic energy of rotamer i r. –E( i r, j s ) accounts for the pairwise interaction energy between chosen rotamers i r and j s.

Problem Formulation Reformulate the SCP problem in graph-theoretic terms –Let G be an undirected p-partite graph. –The node set V 1 ∪ · · · ∪ V p, where V i includes a node u for each rotamer i r at position i. –Each node u of V i is assigned a weight E uu = E( i r ). –Each pair of nodes u ∈ V i and v ∈ V j ( i = j ), corresponding to rotamers i r and j s respectively, is joined by an edge with a weight of E uv = E( i r, j s ).

Integer Linear Programming Formulation The vertex set of this graph is V = V 1 ∪ · · · ∪ V p, and its edge set D = {(u, v) : u ∈ V i, v ∈ V j, i = j }. We introduce a { 0, 1 } decision variable x uu for each node u in V, and a { 0, 1 } decision variable x uv for each edge in D. The following integer program ensures these conditions: –Minimize ε = Σ u ∈ V E uu x uu + Σ {u,v} ∈ D E uv x uv –subject to Σ u ∈ V j x uu = 1 for j = 1,..., p Σ u ∈ V j x uv = x vv for j = 1,..., p and v ∈ V \ V j x uu, x uv ∈ {0, 1}

Integer Linear Programming Formulation For each V j, let N + (V j ) be the set union of the V i for which there exists some v ∈ V i and u ∈ V j with E uv > 0. Let D’ be the set of pairs {u, v} with u ∈ V j such that either v ∈ N + (V j ), or v ∈ N + (V j ) but E uv < 0. The modified ILP is as follows: –Minimize ε = Σ u ∈ V E uu x uu + Σ {u,v} ∈ D’ E uv x uv –subject to Σ u ∈ V j x uu = 1 for j = 1,..., p Σ u ∈ V j x uv = x vv for j = 1,..., p and v ∈ N + (V j ) Σ u ∈ V j, Euv < 0 x uv ≤ x vv for j = 1,..., p and v ∈ N + (V j ) x uu, x uv ∈ {0, 1}

LP/ILP Approach We solve the problem of interest using the computationally easier LP formulation. If the solution returned is integral, then the problem instance was easy to solve, and we have the optimal solution to the original SCP problem. Otherwise, we run polynomial-time Goldstein dead-end elimination (DEE) (Goldstein, 1994) until no more rotamers can be eliminated and then solve the more difficult ILP. The CPLEX package with AMPL (Fourer et al., 2002) was used to solve the linear and integer programs.

Rotamer Library Using Dunbrack’s backbone-dependent rotamer library (Dunbrack and Karplus, 1993). For each 10 ◦ range of ψ(phi), φ(psi) backbone angles, this library has 320 rotamers.

Dead-End Elimination In the cases where ILP was necessary, we first processed the problem instances with DEE. A rotamer u ∈ V i can be thrown out if there is some other rotamer v ∈ V i such that E uu − E vv + Σ j =i min w ∈ V j (E uw − E vw ) > 0.

Energy Function Van der Waals interactions between rotamers –The pairwise van der Waals interaction energy between rotamers u and v is the sum of the van der Waals interactions between the side-chain atoms of u and v. –The parameters used in the van derWaals force are those of AMBER96. Van der Waals interactions in self-energy terms

Energy Function Statistical self-energies –For each amino acid i in a particular backbone setting, let p iu be the fraction of times amino acid i is found in rotamer u, and p i 0 be the fraction of times amino acid i is in its most common rotamer. –As in (Bower et al., 1997), the statistical self-energy term for a particular rotamer u is given by − ln (piu/pi 0 ). –The more common a rotamer, the lower the energy assigned to it. Combining the statistical self-energies with the van der Waals interactions –In summing up the total energy of the system, the statistical self- energy term is weighted by a constant C. –C = 10 was taken to be a good choice.

Dataset

Computational Results