Solving the Protein Threading Problem in Parallel. Nicola Yanev, Rumen Andonov. Indrajit Bhattacharya, CMSC 838T Presentation.


1 Solving the Protein Threading Problem in Parallel
Nicola Yanev, Rumen Andonov
Indrajit Bhattacharya
CMSC 838T – Presentation

2 Motivation
- Problem the paper is trying to solve
  - 3D structure prediction using threading
  - Is a given target sequence likely to fold into a given 3D template core?
  - Find the alignment that minimizes a score function
  - NP-complete; no efficient exact algorithm is known for the general case
  - MAX-SNP-hard; no polynomial-time approximation scheme unless P = NP
- Why do we care?
  - 3D structure determines the biological function of a protein
  - Amino acid sequence (almost) uniquely determines 3D structure
  - Threading is usually less accurate than comparative modeling but easier to solve

3 Talk Overview
- Overview of talk
  - Motivation
  - Techniques
  - Evaluation
  - Related work
  - Observations

4 Techniques
- Approach
  - Reduce the problem to a known theoretical problem of interest
    - In this case, network flow
  - Use existing tools for solving the theoretical problem efficiently
    - CPLEX
  - Explore possibilities for parallelizing the problem
  - Investigate the intrinsic hardness for real biological examples

5 Mathematical Formulation
[Figure panels: two structurally similar proteins; spatial adjacencies (interactions); possible threading with a sequence; objective function]
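The objective on this slide is not preserved in the transcript. As a minimal sketch, assume the usual pairwise threading score: each core segment i is placed at a relative position t, placements must keep the segment order, and the score adds a local cost per placement and an interaction cost per interacting segment pair. The names `local_cost`, `pair_cost` and `interactions` are hypothetical and do not come from the slides.

```python
from itertools import combinations_with_replacement

def threading_score(positions, local_cost, pair_cost, interactions):
    """Score of one threading: local terms plus pairwise interaction terms.

    positions[i] is the (assumed) relative position of segment i.
    """
    score = sum(local_cost[i][t] for i, t in enumerate(positions))
    for i, j in interactions:
        score += pair_cost[(i, j)][positions[i]][positions[j]]
    return score

def best_threading_brute_force(m, n, local_cost, pair_cost, interactions):
    """Try every order-preserving placement of m segments on n positions.

    combinations_with_replacement yields exactly the non-decreasing
    position tuples, so segment order is preserved by construction.
    Exponential in m: only usable on toy instances.
    """
    best = None
    for positions in combinations_with_replacement(range(n), m):
        s = threading_score(positions, local_cost, pair_cost, interactions)
        if best is None or s < best[0]:
            best = (s, positions)
    return best

if __name__ == "__main__":
    local_cost = [[1, 5], [5, 1]]
    pair_cost = {(0, 1): [[0, 10], [0, 0]]}
    print(best_threading_brute_force(2, 2, local_cost, pair_cost, [(0, 1)]))
```

The exhaustive search is only there to make the objective concrete; the pairwise terms are what prevent a simple layer-by-layer solution.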

6 Reduction to Network Flow: An Example
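The example figure is not preserved. As a rough illustration of the flow view, assume a layered graph with one layer per segment and one vertex per position, with arcs only from position t in layer i to positions t' >= t in layer i+1. If the non-local interaction costs are ignored, a threading is a source-to-sink path and the best one is a shortest path, which the sketch below finds by dynamic programming. The function name and the cost layout are assumptions made for this sketch.

```python
def shortest_layered_path(local_cost):
    """Cheapest order-preserving path through a layered graph.

    local_cost[i][t] is the (assumed) cost of placing segment i at
    position t. Interaction costs are deliberately left out.
    Returns (total_cost, positions).
    """
    m, n = len(local_cost), len(local_cost[0])
    best = [list(local_cost[0])]   # best[i][t]: cheapest path ending at (i, t)
    back = [[None] * n]            # back[i][t]: predecessor position in layer i-1
    for i in range(1, m):
        row, brow = [], []
        arg = 0                    # argmin of best[i-1][0..t], updated as t grows
        for t in range(n):
            if best[i - 1][t] < best[i - 1][arg]:
                arg = t
            row.append(local_cost[i][t] + best[i - 1][arg])
            brow.append(arg)
        best.append(row)
        back.append(brow)
    # Pick the cheapest endpoint in the last layer and walk back.
    t = min(range(n), key=lambda k: best[-1][k])
    total = best[-1][t]
    path = [t]
    for i in range(m - 1, 0, -1):
        t = back[i][t]
        path.append(t)
    path.reverse()
    return total, path

if __name__ == "__main__":
    print(shortest_layered_path([[3, 1, 4], [1, 5, 9], [2, 6, 0]]))
```

This runs in O(m·n) time; the hard part of threading comes from the interaction terms that couple non-adjacent layers.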

7 Reduction to Network Flow: Variables and Constraints
- Standard network flow
  - Variable x_{i,t} for each segment-to-position assignment
  - Restricted to [0, 1]
  - With standard flow conservation constraints
- Additional cost for non-local interactions
  - Variable z_{i,t,i',t'} for each non-local interaction
  - Restricted to {0, 1}
  - Constrained to sum to 1 for each non-local pair (i, i')
  - Upper bounded by the flow entering (i, t) and leaving (i', t')
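Written out, the constraints listed above could look like the following. This is one plausible reading of the slide, not the paper's exact model: the symbols m (segments), n (positions), c and d (local and interaction costs) and I (set of interacting pairs) are assumptions, x_{i,t} is treated as the flow through vertex (i, t), and flow conservation is summarized as one unit of flow per layer.

```latex
\begin{align*}
\min\quad
  & \sum_{i=1}^{m}\sum_{t=1}^{n} c_{i,t}\,x_{i,t}
    + \sum_{(i,i') \in I}\ \sum_{t=1}^{n}\sum_{t'=t}^{n}
      d_{i,t,i',t'}\,z_{i,t,i',t'} \\
\text{s.t.}\quad
  & \sum_{t=1}^{n} x_{i,t} = 1
    && i = 1,\dots,m \\
  & \sum_{t=1}^{n}\sum_{t'=t}^{n} z_{i,t,i',t'} = 1
    && (i,i') \in I \\
  & \sum_{t'=t}^{n} z_{i,t,i',t'} \le x_{i,t}
    && (i,i') \in I,\ t = 1,\dots,n \\
  & \sum_{t=1}^{t'} z_{i,t,i',t'} \le x_{i',t'}
    && (i,i') \in I,\ t' = 1,\dots,n \\
  & 0 \le x_{i,t} \le 1, \qquad z_{i,t,i',t'} \in \{0,1\}
\end{align*}
```

The two inequality families tie each interaction variable to the placements it refers to: z_{i,t,i',t'} can be 1 only if the path passes through both (i, t) and (i', t').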

8 Drawbacks of Approach
- Integer programming is hard to solve!
  - Relax to a linear program with variables in [0, 1]
  - Approximate an integer solution using standard heuristics
  - Existing tools like CPLEX
- Huge number of variables
  - For 36 segments and 81 positions, the IP problem has 741,264 rows, 360,945 columns and 54,145,231 non-zeros!
  - Need to reduce the number of variables and constraints
  - Calls for parallelization if possible

9 Parallel Solution
- Utilize special flow constraints
  - Split into sub-problems that may be solved in parallel
  - Split the k-th layer in the graph into r intervals
  - Force the path for a sub-problem to pass through a particular interval in the layer
  - Pass the best bound for the objective function found so far as a parameter to the sub-problem
  - A sub-task aborts when its dual objective function exceeds the current best bound
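The splitting step above can be sketched as follows, assuming positions are numbered 0..n-1 and the intervals are contiguous and nearly equal in size (the slide does not say how the intervals are chosen). Each sub-problem is solved here by exhaustive search over local costs only, purely to show that the best sub-problem optimum equals the global optimum; the paper solves each sub-problem with an LP solver instead. All function names are hypothetical.

```python
from itertools import combinations_with_replacement

def split_layer(n, r):
    """Split positions 0..n-1 into r contiguous, nearly equal intervals.

    Assumes 1 <= r <= n. Returns a list of inclusive (lo, hi) pairs.
    """
    base, extra = divmod(n, r)
    intervals, start = [], 0
    for j in range(r):
        size = base + (1 if j < extra else 0)
        intervals.append((start, start + size - 1))
        start += size
    return intervals

def solve_subproblem(local_cost, k, interval):
    """Best order-preserving placement whose layer-k position lies in interval.

    Toy exhaustive solver (local costs only); returns (score, positions)
    or None if the sub-problem has no feasible placement.
    """
    lo, hi = interval
    m, n = len(local_cost), len(local_cost[0])
    best = None
    for pos in combinations_with_replacement(range(n), m):
        if lo <= pos[k] <= hi:
            s = sum(local_cost[i][t] for i, t in enumerate(pos))
            if best is None or s < best[0]:
                best = (s, pos)
    return best

if __name__ == "__main__":
    cost = [[3, 1, 4], [1, 5, 9], [2, 6, 0]]
    for interval in split_layer(3, 3):
        print(interval, solve_subproblem(cost, k=1, interval=interval))
```

Because every path crosses layer k in exactly one interval, the sub-problems partition the search space and can be handed to separate processors.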

10 Improving Parallel Solution
- Drawback: hardest sub-problem dominates!
  - Parallel strategy was found to be slower than the sequential one!
  - Sub-problems can potentially become harder to solve
  - Many more difficult sub-problems than easy ones
- Solution:
  - Break the atomicity of the tasks
  - Each sub-task periodically checks the current best bound and updates its cut-off
  - Extra overhead is still small compared to task granularity
  - Now the easiest executing sub-task dominates!
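The periodic cut-off update can be sketched with a shared bound that sub-tasks read while running. This is a simplified model, not the paper's implementation: each sub-task is represented by a hypothetical non-empty list of non-decreasing dual objective values whose last entry is the sub-problem's optimum, and Python threads stand in for the processors. `SharedBound`, `run_subtask`, `run_all` and `check_every` are names invented for this sketch.

```python
import threading

class SharedBound:
    """Best objective value found so far, shared by all sub-tasks."""

    def __init__(self):
        self._value = float("inf")
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def offer(self, candidate):
        """Lower the bound if candidate improves on it."""
        with self._lock:
            if candidate < self._value:
                self._value = candidate
                return True
            return False

def run_subtask(dual_values, bound, check_every=2):
    """Simulate one sub-task that re-reads the shared bound periodically.

    dual_values stands in for the solver's increasing dual objective.
    The sub-task aborts as soon as that value exceeds its cut-off.
    """
    cutoff = bound.get()
    for step, value in enumerate(dual_values):
        if step % check_every == 0:
            cutoff = bound.get()       # refresh cut-off from other sub-tasks
        if value > cutoff:
            return "aborted"           # cannot beat the current best
    bound.offer(dual_values[-1])       # finished: publish this optimum
    return "finished"

def run_all(subtasks):
    """Run every sub-task in its own thread and return the final bound."""
    bound = SharedBound()
    threads = [threading.Thread(target=run_subtask, args=(d, bound))
               for d in subtasks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bound.get()

if __name__ == "__main__":
    print(run_all([[1, 5, 9], [2, 3, 4], [0, 6, 7, 8]]))
```

In this model the sub-task holding the global optimum never aborts, because the shared bound never drops below that optimum, so the final bound does not depend on thread scheduling; only the amount of wasted work does.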

11 Evaluation
- Experimental environment
  - Real protein sequences
  - ILOG CPLEX Callable Library
  - Sun UltraSPARC II, 450 MHz
  - Objective function coefficients generated from FROST
  - Maximum of 7 processors and 29 sub-problems
- Evaluation results
  - Sequential version much faster than previous branch-and-bound results for the same problem formulation
  - Time taken comparable to PROSPECT
  - Splitting and parallelization significantly improve turnaround
  - Really tiny gap between relaxed LP and ILP solutions
  - Mostly integer solutions even for the relaxed LP!

12 Result Tables: Comparison with branch-and-bound algorithm
Comment: Self-threading results in significantly lower scores (as it should)

13 Result Tables: Gap between relaxed LP and ILP
Comment: Tiny relaxation gap. (Significance?)

14 Result Tables: Size of the LP formulation
Comment: LP problem size is still too large.

15 Result Tables: Performance with parallel sub-tasks
Comment: Longer times with more sub-problems??

16 Related Work
- Similar / previous approaches
  - Lathrop and Smith, 1998
    - Uses the same cost function
    - Branch-and-bound algorithm for searching the space of threadings
  - Xu, Xu and Uberbacher, 1998
    - Divide-and-conquer algorithm
  - Xu, Li, Lin, Kim and Xu, 2003
    - Linear programming formulation
    - Solved using a branch-and-bound algorithm
- None of the above suggests a parallelization scheme

17 Observations
- Points of interest
  - Mapping to a known problem of interest
  - Nicely utilizes particular constraints to break the problem into independent sub-tasks
  - Threading of real amino acid sequences seems possible
  - Raises interesting questions about real-life protein threading being in P
  - A solver tailored for this particular problem may yield better results

18 Observations
- Criticism
  - Not enough experiments with a large number of sub-tasks and processors to show scaling
  - Prohibitively large number of variables and constraints
  - How accurate are the objective function coefficients?
  - What is the resolution of the objective function?
  - Threading onto multiple sequences for prediction still looks daunting
  - Not clear how to extend the idea to 3-way and more complex interactions
- Improvements
  - Seems possible to break up the sub-tasks recursively

19 Thank you!

