Lynx: A Programmatic SAT Solver for the RNA-folding Problem. Vijay Ganesh, Charles W. O’Donnell, Mate Soos, Srinivas Devadas, Martin C. Rinard, and Armando Solar-Lezama.

1 Lynx: A Programmatic SAT Solver for the RNA-folding Problem Vijay Ganesh, Charles W. O’Donnell, Mate Soos, Srinivas Devadas, Martin C. Rinard, and Armando Solar-Lezama SAT Conference, Trento, Italy 2012

2 SAT Solvers “are a black-box” Problem: Users want more Control. The story so far: SAT solvers have been amazingly successful in many fields (AI, formal methods, testing, program analysis, …), with new applications every day (e.g., biology). However: diminishing returns of “baked-in” solver heuristics; for most users, solvers are a magic black box that is difficult to control. “How can I integrate my heuristic into the solver with minimal effort?”

3 RNA-folding Problem: A programmatic SAT-based Solution. Why does a SAT solver-based approach for biology make sense? Computational biology is an ideal domain for declarative languages like SAT, because problems are often modeled as mathematical constraints and biologists prefer to prototype models quickly and minimize coding. Is a simple translation to SAT sufficient? Unfortunately, no! The naïve SAT representation of problem instances blows up, and users need greater control over the solver heuristics.

4 An Effective Solution to the Black-box Problem: Programmatic Solvers. [Diagram labels: Input Formula, SAT SOLVER, USER CODE, Result.] Key idea: expose solver internals to the user through a programmatic callback API. The user writes code against the API to influence solver behavior. Users gain control.

5 Central Dogma of Biology: From DNA through RNA to Function. DNA to RNA: DNA is transcribed into messenger RNA. RNA to Proteins: RNA encodes amino acid chains that fold into proteins. Proteins to Function: folded proteins interact to control function.

6 RNA-folding: 3-D Structure of non-coding RNA. From Structure to Function. [Figure labels: DNA, RNA, RNA-Protein, RNA-DNA, RNA-RNA, Protein-RNA-DNA, RNA-RNA-DNA, Protein-RNA-Protein-DNA; source: Guttman, Rinn 2012.]

7 RNA: What is its Structure? RiboNucleic Acid: phosphate (P) + sugar + base. 4 bases: C = Cytosine, G = Guanine, A = Adenine, U = Uracil.

8 RNA-folding Problem: From Structure to Function. Question: How does an RNA sequence determine its folded 3D structure, and thus its function? It is easy to determine RNA’s primary structure through biological experiments, but very expensive to determine the 3D structure through experiments alone. Hence, we need computational tools that predict the optimal 3D structure of RNA. [Figure labels: “Primary” structure (sequence), 3D structure.]

9 RNA-folding problem: Optimal Secondary Structure Prediction Problem. Unfortunately, modeling every atom/electron is too computationally demanding. Solution? Implement a reduced model. The reduced model is called the secondary structure: an approximate planar representation of the 3D structure. [Figure labels: “Primary” structure (sequence), “Secondary” structure, “Tertiary” structure (3D).]

10 RNA-folding problem: Optimal Secondary Structure Prediction Problem. Computational thermodynamics-based solution: define an energetic “cost” function for all possible structures, then search all structures to find the “best” (minimum energy funnel). The result must be optimal. [Figure: candidate structures with Score = 839, 1029, 992, and 2267 kcal/mol; labels: UNC LCCC, UIUC, Rothamstad Res, Dill/Chan.]

11 RNA-folding problem: Quick Recap. Given the primary RNA sequence and a thermodynamic cost function, can we predict the optimal secondary structure? [Figure labels: “Primary” structure (sequence), “Secondary” structure, “Tertiary” structure (3D).]

12 Obtaining RNA structures. Lynx RNA model for secondary structure: given a string (the RNA sequence), any nucleotide at position i can pair with another at position j, subject to four general constraints (more later). Lynx decision problem (energy function constraint): assign independent scores to all potential (i,j) pairs, then find a valid assignment of (i,j) pairs whose scores sum to more than some threshold t.
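
The decision problem on this slide can be illustrated with a minimal sketch. It assumes a structure is given as a list of (i, j) index pairs and that the per-pair scores live in a dictionary; the function names and the example scores are hypothetical and are not taken from Lynx.

```python
from typing import Dict, Iterable, Tuple

Pair = Tuple[int, int]  # (i, j) with i < j: position i pairs with position j


def structure_score(pairs: Iterable[Pair], score: Dict[Pair, float]) -> float:
    """Sum the independent per-pair scores of a candidate structure."""
    return sum(score[p] for p in pairs)


def meets_threshold(pairs: Iterable[Pair], score: Dict[Pair, float], t: float) -> bool:
    """Energy-function constraint: the total score must exceed the threshold t."""
    return structure_score(pairs, score) > t


# Hypothetical scores for three potential pairs (made-up values, for illustration only).
example_scores = {(0, 5): 3.0, (1, 4): 2.0, (2, 6): 1.5}
candidate = [(0, 5), (1, 4)]
```

Because the scores are independent, checking the energy constraint is a plain sum; the difficulty lies in the structural validity of the chosen pairs, which the next slides address.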

13 Lynx RNA structure constraints. Based on a published energy model that assumes score independence, valid structures can be “knot-free” or contain “crossing pseudo-knots”. [Figure label: “pseudo-knot”.]

14 Lynx RNA structure constraints. Bit-vectors X and Y (each of length n^2) indicate two independent configurations of “knot-free” (i,j) pairings. Crossing pseudoknots are allowed by simultaneous assignment of X and Y. Constraint 1: Every position (nucleotide) can pair with at most one other position.
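
As a rough illustration of Constraint 1, the sketch below generates at-most-one clauses for a single bit-vector using a pairwise encoding. The slides do not say which encoding Lynx uses, so the pairwise encoding, the variable numbering in `var`, and the restriction to one bit-vector (rather than X and Y together) are all assumptions made for the example.

```python
from itertools import combinations
from typing import List


def var(i: int, j: int, n: int) -> int:
    """Hypothetical numbering: positive DIMACS-style id of the bit 'i pairs with j' (i < j)."""
    return i * n + j + 1


def at_most_one_clauses(n: int) -> List[List[int]]:
    """Pairwise encoding: for each position, no two of its pairing bits may both be true."""
    clauses = []
    for p in range(n):
        # All variables whose pair involves position p.
        involving = [var(min(p, q), max(p, q), n) for q in range(n) if q != p]
        for a, b in combinations(involving, 2):
            clauses.append([-a, -b])  # clause (not a) or (not b)
    return clauses
```

With this encoding each position contributes (n-1)(n-2)/2 binary clauses, so even the simplest of the four constraints is cubic in the sequence length.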

15 Lynx RNA structure constraints. Bit-vectors X and Y (each of length n^2) indicate two independent configurations of “knot-free” (i,j) pairings. Crossing pseudoknots are allowed by simultaneous assignment of X and Y. Constraint 2: X and Y cannot assign the same pair. Constraint 3: X and Y are each knot-free on their own.
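
Constraints 2 and 3 can be checked directly on candidate pair sets, as in the sketch below. It assumes the usual definition of a crossing, namely pairs (i, j) and (k, l) with i < k < j < l, and represents X and Y as lists of (i, j) tuples with i < j; the function names are hypothetical.

```python
from itertools import combinations
from typing import Iterable, Tuple

Pair = Tuple[int, int]  # (i, j) with i < j


def crosses(a: Pair, b: Pair) -> bool:
    """True if the two pairs interleave: i < k < j < l after ordering by start position."""
    (i, j), (k, l) = sorted([a, b])
    return i < k < j < l


def is_knot_free(pairs: Iterable[Pair]) -> bool:
    """A configuration is knot-free if no two of its pairs cross."""
    return not any(crosses(a, b) for a, b in combinations(list(pairs), 2))


def satisfies_constraints_2_and_3(x_pairs: Iterable[Pair], y_pairs: Iterable[Pair]) -> bool:
    """Constraint 2: X and Y share no pair. Constraint 3: each is knot-free on its own."""
    x, y = list(x_pairs), list(y_pairs)
    return set(x).isdisjoint(y) and is_knot_free(x) and is_knot_free(y)
```

For example, X = [(0, 5)] and Y = [(2, 8)] pass both checks even though the two pairs cross each other, which is how a crossing pseudoknot arises from two knot-free configurations.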

16 Lynx RNA structure constraints. Bit-vectors X and Y (each of length n^2) indicate two independent configurations of “knot-free” (i,j) pairings. Crossing pseudoknots are allowed by simultaneous assignment of X and Y. Constraint 4: Only permit pseudoknots with well-characterized biophysical energetics (excluding this constraint would require constructing a novel energy function).

17 SAT Solvers and RNA representations: A Case for a Programmatic SAT Solver. A SAT-based solution would be ideal given the constraint representation above. However, the constraint size is n^6, where n is the length of the RNA primary structure, so the naïve representation is too large. We want to use SAT but avoid the naïve representation and its cost. We want to let users experiment with different secondary structure models and heuristics.

18 An Effective Solution to the Black-box Problem: Programmatic Solvers. [Diagram labels: Input Formula, SAT SOLVER, USER CODE, Result.] Key idea: expose solver internals to the user through a programmatic callback API. The user writes code against the API to influence solver behavior. Users gain control.

19 How does the Programmatic Solver Work? Energetic Constraints are Input, Structural ones are Code. [Diagram labels: SAT SOLVER, Structural Constraints (N^6), Energetic Constraints, Result.] Structural constraints can grow to O(N^6), where N is the length of the RNA. Few solvers can deal with such large sizes when N is 100 or more. Incrementally adding constraints in the inner loop gives fine-grained control of the search.
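
To get a feel for the O(N^6) growth, the short calculation below evaluates N^6 for a few sequence lengths. The constant factor is taken as 1, and the figure of 16 bytes per constraint is a made-up assumption used only to translate the count into a memory scale.

```python
BYTES_PER_CONSTRAINT = 16  # hypothetical storage cost, for illustration only


def naive_constraint_count(n: int) -> int:
    """Order-of-magnitude count of structural constraints, taking the constant factor as 1."""
    return n ** 6


def naive_size_bytes(n: int) -> int:
    """Rough memory footprint of the naive encoding under the assumed per-constraint cost."""
    return naive_constraint_count(n) * BYTES_PER_CONSTRAINT


for n in (50, 100, 200):
    print(f"N={n}: {naive_constraint_count(n):.2e} constraints, "
          f"{naive_size_bytes(n) / 1e12:.2f} TB")
```

At N = 100 the count is already 10^12, which is why the structural constraints are kept in user code instead of being written out up front.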

20 An Effective Solution to the Black-box Problem: Programmatic Solvers. [Diagram labels: SAT SOLVER, Structural Constraints (N^6), Energetic Constraints, Result.] User code examines the trail in the solver at regular intervals. If the assignment violates a structural constraint in the user code, the user code adds a blocking clause that rules out the bad assignment. Detect early and block bad assignments quickly: this is far more efficient than outer-loop incrementality.
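
The inner-loop callback idea can be sketched with a toy backtracking solver. This is not the Lynx API and not a CDCL solver: the names `solve`, `falsified`, and `no_shared_position` are hypothetical, there is no unit propagation or clause learning, and the callback runs at every search node instead of at regular intervals. It only shows the control flow in which user code inspects the partial assignment and injects blocking clauses.

```python
from typing import Callable, Dict, List, Optional

Clause = List[int]            # DIMACS-style literals: +v or -v
Assignment = Dict[int, bool]  # partial assignment (a stand-in for the solver trail)
Callback = Callable[[Assignment], List[Clause]]


def falsified(clause: Clause, assignment: Assignment) -> bool:
    """A clause is falsified when every literal is assigned and evaluates to false."""
    return all(assignment.get(abs(lit)) == (lit < 0) for lit in clause)


def solve(clauses: List[Clause], num_vars: int, callback: Callback,
          assignment: Optional[Assignment] = None) -> Optional[Assignment]:
    """Toy backtracking search that calls the user hook at every node."""
    assignment = {} if assignment is None else assignment
    # Inner-loop hook: user code may return blocking clauses for the current trail.
    for blocking in callback(assignment):
        if blocking not in clauses:
            clauses.append(blocking)
    if any(falsified(c, assignment) for c in clauses):
        return None  # conflict, possibly caused by a fresh blocking clause
    unassigned = [v for v in range(1, num_vars + 1) if v not in assignment]
    if not unassigned:
        return dict(assignment)
    v = unassigned[0]
    for value in (True, False):
        assignment[v] = value
        result = solve(clauses, num_vars, callback, assignment)
        if result is not None:
            return result
        del assignment[v]
    return None


def no_shared_position(assignment: Assignment) -> List[Clause]:
    """Hypothetical structural rule kept in code: variables 1 and 2 must not both be true."""
    if assignment.get(1) and assignment.get(2):
        return [[-1, -2]]
    return []


formula = [[1, 2], [3]]  # stands in for the energetic constraints given as input
model = solve(formula, 3, no_shared_position)
```

In this run the rule is never written into the input formula; the clause [-1, -2] enters the clause database only once the search actually tries the violating assignment, which is the memory-saving behavior the slide describes.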

21 RNA prediction results

22 An Effective Solution to the Black-box Problem: Advantages of Programmatic Solvers. Memory savings if the simple SAT representation of the problem is large (n^6 for RNA). Time savings, since bad assignments are detected in the inner loop of the SAT solver. Domain-specific heuristics and user control. [Diagram labels: SAT SOLVER, Structural Constraints (N^6), Energetic Constraints, Result.]

23 Related Work: Incrementality and DPLL(T). Incrementality: Stuckey et al. (2007); extensible solvers; abstraction-refinement in model checking and SMT. DPLL(T): closest related work (Tinelli, Nieuwenhuis, Oliveras 06); programmatic solvers are for lay users; rich theory sub-solvers (more powerful, but more work). Dynamic programming approach: Zuker (1981), PKNOTS, HOTKNOTS, Vienna RNA; locks you into a set of modeling assumptions, unlike SAT; has to make simplifying assumptions, otherwise NP-complete.

24 Conclusions: The Power of Programmatic Solvers. Benefits of a programmatic API: Flexible (easy for lay users); Adaptive (domain-specific sub-solvers); Performance (improved memory usage and time). Possible other programmatic API choices: Heuristics (branching heuristics); Adaptive strategies (search and restart); User-controlled portfolios (parallel SAT with different heuristics).

25 More complex structural representations

26 Dynamic programming. Dynamic programming example. Question: given a sequence, find the minimum-energy structure. Assume the conformation space is a set of up or down interaction pairs. Energy is the sum over interaction pairs: E( ) = E( ) + E( ). [The arguments of E are structure diagrams on the slide.]

27 Dynamic programming. E.g., additive energy over pairs: E( ) = E( ) + E( ). Minimum E() of all combinations? [Figure labels: E = …, min().]

28 Dynamic programming. E.g., additive energy over pairs: E( ) = E( ) + E( ). Minimum E() of all combinations? [Figure labels: E = …, min(), memoized min(), min(, ), memoized min(, ).]
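
The memoized-minimum idea on these slides can be illustrated with a Nussinov-style recursion. This is a different toy from the up/down interaction-pair example drawn on the slides: it assumes knot-free (nested) pairings only, a hypothetical energy of -1 for each Watson-Crick pair, and no minimum loop length.

```python
from functools import lru_cache

# Hypothetical additive pair energies (made-up values, not a published model).
PAIR_ENERGY = {("A", "U"): -1.0, ("U", "A"): -1.0,
               ("G", "C"): -1.0, ("C", "G"): -1.0}


def min_energy(seq: str) -> float:
    """Minimum total energy over knot-free pairings, with energy additive over pairs."""

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> float:
        """Memoized minimum for the subsequence seq[i..j] (inclusive)."""
        if i >= j:
            return 0.0
        # Option 1: position i stays unpaired.
        result = best(i + 1, j)
        # Option 2: position i pairs with some k, splitting the problem in two.
        for k in range(i + 1, j + 1):
            e = PAIR_ENERGY.get((seq[i], seq[k]))
            if e is not None:
                result = min(result, e + best(i + 1, k - 1) + best(k + 1, j))
        return result

    return best(0, len(seq) - 1)
```

Because the energy is additive, each subsequence's minimum is computed once and reused, which turns the exponential enumeration of combinations into a polynomial number of cached subproblems. The price is that the recursion only works under these modeling assumptions, which is the lock-in the Related Work slide contrasts with SAT.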

