# Combinatorial Problems I: Finding Solutions Ashish Sabharwal Cornell University March 3, 2008 2nd Asian-Pacific School on Statistical Physics and Interdisciplinary.


Combinatorial Problems I: Finding Solutions Ashish Sabharwal Cornell University March 3, 2008 2nd Asian-Pacific School on Statistical Physics and Interdisciplinary Applications KITPC/ITP-CAS, Beijing, China

Cross-fertilization of ideas for the study and design of Intelligent Systems, across Computer Science, Mathematics, Operations Research, Physics, Cognitive Science, Economics, and Engineering (figure label: phase transition). Research part of Cornell's Intelligent Information Systems Institute (IISI); Director: Carla Gomes.

3 Combinatorial Problems: Examples. Routing: given a partially connected network on N nodes, find the shortest path between X and Y. Traveling Salesperson Problem (TSP): given a partially connected network on N nodes, find a path that visits every node of the network exactly once [much harder!]. Scheduling: given N tasks with earliest start times, completion deadlines, and a set of M machines on which they can execute, schedule them so that they all finish by their deadlines.

4 Problem Instance, Algorithm. Problem instance: a specific instantiation of the problem, e.g. three instances of the routing problem with N=8 nodes [figure omitted]. Algorithm: a sequence of steps, a recipe. Objective: a single, generic algorithm for the problem that can solve any instance of that problem.

5 Measuring the Effectiveness of Algorithms. Capture scaling with input size N, rather than runtime on specific instances. The most common notion in Computer Science is worst-case complexity: what is the longest time (or number of steps) the algorithm might take on any input of size N? Perhaps only $N$ steps, or $100N + 5N$: linear time, $O(N)$. Maybe $N^2$ steps, or $N^2 + 4N + 6$: quadratic, $O(N^2)$. Maybe $N^3 + 1000 \log N$: cubic, $O(N^3)$. … Maybe $2^N$, or $2^N + N^{1000}$: exponential, $O(2^N)$.
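
To make the scaling classes above concrete, here is a minimal Python sketch that tabulates step counts for a few input sizes. The function name `step_counts` and the chosen sizes are illustrative assumptions, not part of the original slides.

```python
def step_counts(n):
    """Return illustrative step counts for input size n under four growth rates."""
    return {
        "linear": n,
        "quadratic": n ** 2,
        "cubic": n ** 3,
        "exponential": 2 ** n,
    }

# Print the counts in scientific notation to compare orders of magnitude.
for n in (10, 20, 40, 80):
    counts = step_counts(n)
    print(n, {name: f"{value:.1e}" for name, value in counts.items()})
```

Even at N = 80 the cubic count stays below a million steps, while the exponential count already exceeds $10^{24}$.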

6 Polynomial vs. Exponential Complexity. [Figure: exponential vs. polynomial growth curves.] Polynomial time: tractable; can hope to solve very large problems with enough computing power, e.g. known routing / shortest path algorithms [$O(N^3)$]. Exponential time: quickly runs into scalability issues as N increases, e.g. best known algorithms for TSP.

Are some problems inherently harder than others? A large amount of work on answering this question: computational complexity theory

8 Computational Complexity Hierarchy, from easy to hard: P, NP, PH, P^#P, PSPACE, EXP. In P: sorting, shortest path, … P-complete: circuit-value, … NP-complete: SAT, scheduling, graph coloring, puzzles, … #P-complete/hard: #SAT, sampling, probabilistic inference, … PSPACE-complete: QBF, adversarial planning, chess (bounded), … EXP-complete: games like Go, … Note: widely believed hierarchy; know P ≠ EXP for sure.

9 NP-Completeness. P: class of problems for which a solution can be found in poly time, e.g. can find a shortest path in poly time. NP: class of problems for which a solution can be verified in poly time, e.g. can't find a TSP solution in poly time (as far as we know) but, given a candidate solution (a witness), can verify the correctness of the witness in poly time. In the name NP, N stands for non-deterministic (with the power of guessing) and P for polynomial time. NP-complete: the hardest problems within NP.
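
The "verify a witness in poly time" idea can be sketched as follows, using the TSP variant described earlier in the talk (a path that visits every node exactly once in a partially connected network). The function name `is_valid_tour`, the undirected-edge assumption, and the small example graph are illustrative choices, not from the slides.

```python
def is_valid_tour(num_nodes, edges, path):
    """Check that `path` visits every node exactly once using only existing edges.

    Nodes are numbered 0..num_nodes-1 and edges are treated as undirected.
    The check takes time polynomial in the size of the graph.
    """
    edge_set = {frozenset(e) for e in edges}
    if sorted(path) != list(range(num_nodes)):
        return False  # some node is missing or repeated
    return all(frozenset((u, v)) in edge_set for u, v in zip(path, path[1:]))

edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
print(is_valid_tour(4, edges, [0, 1, 2, 3]))  # True
print(is_valid_tour(4, edges, [0, 2, 1, 3]))  # False: no edge between 1 and 3
```

Checking a proposed path is cheap; the hard part is finding one among the exponentially many orderings.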

10 NP-Completeness. One of the biggest discoveries in Computer Science: all NP-complete problems are equally hard [worst-case complexity]! An algorithm for any one NP-complete problem can be used to solve any other NP-complete problem with only a polynomial overhead! There are catalogues of 10,000s of such problems, e.g. Boolean satisfiability or SAT, TSP, scheduling, (bounded) planning, chip verification, 0-1 integer programming, graph coloring, logical inference, … [Similarly for PSPACE-complete, #P-complete, etc.]

Can one design a single algorithm that can efficiently solve thousands of different problems of interest?

12 The Quest for Machine Reasoning. A cornerstone of Artificial Intelligence. Objective: develop foundations and technology to enable effective, practical, large-scale automated reasoning. Machine Reasoning (1960-90s): computational complexity of reasoning appears to severely limit real-world applications. Current reasoning technology, revisiting the challenge: significant progress with new ideas / tools for dealing with complexity (scale-up), uncertainty, and multi-agent reasoning.

13 General Automated Reasoning. [Figure: a domain-specific problem instance goes through a Model Generator (Encoder) to a General Inference Engine, which produces a solution. The inference engine is generic: applicable to all domains within range of the modeling language.] Research objective: better reasoning and modeling technology. Impact: faster solutions in several domains, e.g. logistics, chess, planning, scheduling, ...

14 Reasoning Complexity. EXPONENTIAL COMPLEXITY: INHERENT. Time/space: $A^N$ in the worst case, where N = no. of variables/objects and A = no. of object states (granularity). Current implementations trade time with soundness. Simple example, Knowledge Base. Variables (binary): X1 = email_received, X2 = in_meeting, X3 = urgent, X4 = respond_to_email, X5 = near_deadline, X6 = postpone, X7 = air_ticket_info_request, X8 = travel_request, X9 = info_request. Rules: 1. X1 & (not X2) & X3 → X4; 2. X2 → not X4; 3. X5 → X3 or X6; 4. X7 → X8; 5. X8 → X9; 6. X8 → X5; 7. X6 → not X9. Question: given X1 = true, X2 = false, X7 = true, what is X4? Answer development (inference chain): Step 1: X7 → X8 (rule 4). Step 2: X8 → X5 (rule 6). Step 3: X5 → X3 or X6 (rule 3); this is the choice point M. Case A: X6 = true. Step 4: X6 → not X9 (rule 7). Step 5: not X9 → not X8 (contrapositive of rule 5). Step 6: contradiction with Step 1; backtrack to M. Case B: X3 = true. Step 7: X1 & (not X2) & X3 → X4, so X4 = true (rule 1). The work lies in searching for rules to apply and checking for contradictions. For N variables: $2^N$ cases drive complexity!
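
The question on this slide can also be answered by brute force over all $2^9$ assignments, which illustrates why the number of cases drives complexity. The sketch below assumes the seven rules are implications (the arrows are lost in the transcript); the helper names `implies`, `satisfies_rules`, and `entailed_values` are illustrative.

```python
from itertools import product

def implies(p, q):
    """Material implication: p -> q."""
    return (not p) or q

def satisfies_rules(x):
    """x maps variable index (1..9) to a bool; True if all seven rules hold."""
    return all([
        implies(x[1] and not x[2] and x[3], x[4]),  # rule 1
        implies(x[2], not x[4]),                    # rule 2
        implies(x[5], x[3] or x[6]),                # rule 3
        implies(x[7], x[8]),                        # rule 4
        implies(x[8], x[9]),                        # rule 5
        implies(x[8], x[5]),                        # rule 6
        implies(x[6], not x[9]),                    # rule 7
    ])

def entailed_values(var, given):
    """Collect every value `var` takes across assignments that obey the rules and `given`."""
    values = set()
    for bits in product([False, True], repeat=9):
        x = dict(zip(range(1, 10), bits))
        if all(x[k] == v for k, v in given.items()) and satisfies_rules(x):
            values.add(x[var])
    return values

# Given X1 = true, X2 = false, X7 = true, X4 is true in every consistent assignment.
print(entailed_values(4, {1: True, 2: False, 7: True}))  # {True}
```

Enumeration is feasible here only because there are 9 variables; the inference chain on the slide reaches the same answer while exploring far fewer cases.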

15 Exponential Complexity Growth: The Challenge of Complex Domains. [Figure: case complexity (axis ticks $10^{30}$, $10^{47}$, $10^{3010}$, $10^{6020}$, $10^{150,500}$, $10^{301,020}$) plotted against number of variables (100 to 1M) and number of rules/constraints (200 to 5M). Example domains: car repair diagnosis, deep space mission control, chess (20 steps deep), VLSI verification, war gaming, military logistics. Reference points: seconds until heat death of sun, protein folding calculation (petaflop-year), no. of atoms on the earth.] Note: rough estimates, for propositional reasoning. [Credit: Kumar, DARPA; cited in Computer World magazine]

16 Progress in Last 15 Years. Focus: combinatorial search spaces; specifically, the Boolean satisfiability problem, SAT. Significant progress since the 1990s. How much? Problem size: we went from 100 variables, 200 constraints (early 90s) to 1,000,000 variables and 5,000,000 constraints in 15 years. Search space: from $10^{15}$ to $10^{300,000}$. [Aside: one can encode quite a bit in 1M variables.] Tools: 50+ competitive SAT solvers available. Overview of the state of the art: plenary talk at IJCAI-05 (Selman); Discrete App. Math. article (Kautz-Selman 06).

17 How Large are the Problems? A bounded model checking problem:

18 SAT Encoding (automatically generated from problem specification). [Clause listing omitted], i.e., ((not $x_1$) or $x_7$), ((not $x_1$) or $x_6$), etc. $x_1$, $x_2$, $x_3$, etc. are our Boolean variables (to be set to True or False). Should $x_1$ be set to False?

19 10 Pages Later: … i.e., ($x_{177}$ or $x_{169}$ or $x_{161}$ or $x_{153}$ … $x_{33}$ or $x_{25}$ or $x_{17}$ or $x_9$ or $x_1$ or (not $x_{185}$)). Clauses / constraints are getting more interesting… Note $x_1$ …

20 … 4,000 Pages Later:

21 Finally, 15,000 Pages Later: Search space of truth assignments: [value shown only as an image in the original slide]. Current SAT solvers solve this instance in under 30 seconds!

22 SAT Solver Progress. Solvers have continually improved over time. [Figure omitted.] Source: Marques-Silva 2002.

23 How do SAT Solvers Keep Improving? From academically interesting to practically relevant. We now have regular SAT solver competitions (Germany 89, Dimacs 93, China 96, SAT-02, SAT-03, …, SAT-07). E.g. at SAT-2006 (Seattle, Aug 06): 35+ solvers submitted, most of them open source; 500+ industrial benchmarks; 50,000+ benchmark instances available on the www. This constant improvement in SAT solvers is the key to making, e.g., SAT-based planning very successful.

24 Current Automated Reasoning Tools. Most successful fully automated methods: based on Boolean Satisfiability (SAT) / propositional reasoning. Problems are modeled as rules / constraints over Boolean variables, and a SAT solver is used as the inference engine. Applications (single-agent search): AI planning: SATPLAN-06, fastest optimal planner, ICAPS-06 competition (Kautz & Selman 06). Verification, hardware and software: major groups at Intel, IBM, Microsoft, and universities such as CMU, Cornell, and Princeton; SAT has become the dominant technology. Many other domains: test pattern generation, scheduling, optimal control, protocol design, routers, multi-agent systems, e-commerce (e-auctions and electronic trading agents), etc.

25 Recall: General Automated Reasoning. [Figure: a domain-specific problem instance goes through a Model Generator (Encoder) to a General Inference Engine, which produces a solution. The inference engine is generic: applicable to all domains within range of the modeling language.] Research objective: better reasoning and modeling technology. Impact: faster solutions in several domains, e.g. logistics, chess, planning, scheduling, ...

26 Automated Reasoning with SAT. A simple but useful modeling language: Boolean formulas. Corresponding inference engine: satisfiability or SAT algorithm (e.g. complete search, local search, message passing). Numerous applications: hardware and software verification, planning, scheduling, e-commerce, circuit design, open problems in algebra, …

27 Boolean Logic. Defined over Boolean (binary) variables a, b, c, …; each of these can be True (1, T) or False (0, F). Variables are connected together with logic operators: and, or, not (denoted ∧, ∨, ¬). E.g. ((c ∧ ¬d) ∨ f) is True iff either c is True and d is False, or f is True. Fact: all other Boolean logic operators can be expressed with and, or, not. E.g. (a → b) is the same as (¬a or b). Boolean formula, e.g. F = (a or b) and ¬(a and (b or c)). (Truth) assignment: any setting of the variables to True or False. Satisfying assignment: an assignment where the formula evaluates to True. E.g. F has 3 satisfying assignments (a, b, c): (0,1,0), (0,1,1), (1,0,0).

28 Boolean Logic: Example. F = (a or b) and ¬(a and (b or c)). Note: True is often written as 1, False as 0. There are $2^3 = 8$ possible truth assignments to a, b, c, e.g. (a=0,b=1,c=0) representing (a=False, b=True, c=False), (a=0,b=0,c=1), …

Truth table for F:

| a | b | c | F |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 1 |
| 1 | 0 | 0 | 1 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 0 |

Exactly 3 truth assignments satisfy F: (a=0,b=1,c=0), (a=0,b=1,c=1), (a=1,b=0,c=0).
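
The truth table can be reproduced by enumeration. This sketch assumes F reads (a or b) and not (a and (b or c)), which is the reading consistent with the three satisfying assignments listed on the slide (the negation symbols are lost in the transcript).

```python
from itertools import product

def F(a, b, c):
    """Assumed reading of the slide's formula: (a or b) and not (a and (b or c))."""
    return bool((a or b) and not (a and (b or c)))

# Enumerate all 2^3 = 8 truth assignments in the order used by the truth table.
for a, b, c in product([0, 1], repeat=3):
    print(a, b, c, int(F(a, b, c)))

satisfying = [bits for bits in product([0, 1], repeat=3) if F(*bits)]
print(satisfying)  # [(0, 1, 0), (0, 1, 1), (1, 0, 0)]
```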

29 Boolean Logic: Expressivity. All discrete single-agent search problems can be cast as a Boolean formula. Variables a, b, c, … often represent states of the system, events, actions, etc. (more on this later, using Planning as an example). Recall the example (in the original slide its parts are annotated as state, action, constraint, or event). Variables: X1 = email_received, X2 = in_meeting, X3 = urgent, X4 = respond_to_email, X5 = near_deadline, X6 = postpone, X7 = air_ticket_info_request, X8 = travel_request, X9 = info_request. Rules: 1. X1 & (not X2) & X3 → X4; 2. X2 → not X4; 3. X5 → X3 or X6; 4. X7 → X8; 5. X8 → X9; 6. X8 → X5; 7. X6 → not X9. Very general encoding language, e.g. can handle numbers (k-bit binary representation), floating-point numbers, and arithmetic operators like +, x, exp(), log(), … SAT encodings (generated automatically from high level languages) are routinely used in domains like planning, scheduling, verification, e-commerce, network design, …

30 Boolean Logic: Standard Representations. Each problem constraint is typically specified as (a set of) clauses (only or, not): e.g. (a or b), (c or d or f), (¬a or c or d), … Formula in conjunctive normal form, or CNF: a conjunction of clauses. E.g. F = (a or b) and ¬(a and (b or c)) changes to F_CNF = (a or b) and (¬a or ¬b) and (b or ¬c). Alternative [useful for QBF]: specify each constraint as a term (only and, not): e.g. (a and d), (b and a and f), (¬b and d and e), … Formula in disjunctive normal form, or DNF: a disjunction of terms. E.g. F_DNF = (¬a and b) or (a and ¬b and ¬c).
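
A common way to store CNF and DNF formulas in code is as lists of signed integers. The sketch below uses that convention (an assumption, not something the slides prescribe) and checks that the CNF and DNF forms of the example F agree on every assignment. The placement of the negations is reconstructed from the slide's truth table and should be read as an assumption.

```python
from itertools import product

# Literals are nonzero ints: +k means variable k, -k its negation (a=1, b=2, c=3).
F_CNF = [[1, 2], [-1, -2], [2, -3]]   # conjunction of clauses
F_DNF = [[-1, 2], [1, -2, -3]]        # disjunction of terms

def lit_true(lit, assignment):
    """A literal holds if its variable's value matches its sign."""
    return assignment[abs(lit)] == (lit > 0)

def eval_cnf(clauses, assignment):
    """True if every clause has at least one true literal."""
    return all(any(lit_true(l, assignment) for l in clause) for clause in clauses)

def eval_dnf(terms, assignment):
    """True if at least one term has all of its literals true."""
    return any(all(lit_true(l, assignment) for l in term) for term in terms)

def all_assignments(n):
    """Yield every assignment of n variables as a dict {index: bool}."""
    for bits in product([False, True], repeat=n):
        yield dict(zip(range(1, n + 1), bits))

equivalent = all(eval_cnf(F_CNF, a) == eval_dnf(F_DNF, a) for a in all_assignments(3))
print(equivalent)  # True
```

The signed-integer convention is the one used by the DIMACS CNF file format that most SAT solvers read.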

31 Boolean Satisfiability Testing. The Boolean Satisfiability Problem, or SAT: given a Boolean formula F, find a satisfying assignment for F or prove that no such assignment exists. A wide range of applications. Relatively easy to test for small formulas (e.g. with a truth table). However, it very quickly becomes hard to solve: the search space grows exponentially with formula size (more on this next). SAT technology has been very successful in taming this exponential blow up!

32 SAT Search Space. For N Boolean variables, the raw search space is of size $2^N$; it grows very quickly with N. Brute-force exhaustive search is unrealistic without efficient heuristics, etc. [Figure: search tree. Root: all vars free; then fix one variable to True or False, fix another var, fix a 3rd var, fix a 4th var; leaves are labeled True or False.] SAT problem: find a path to a True leaf node.

33 SAT Solution. A solution to a SAT problem can be seen as a path in the search tree that leads to the formula evaluating to True at the leaf. Goal: find such a path efficiently out of the exponentially many paths. [Note: this is a 4 variable example. Imagine a tree for 1,000,000 variables!] [Figure: the same search tree, from "all vars free" down to "fix a 4th var", with True / False leaves.]

34 k-CNF, 3-CNF. k-CNF: all clauses have k literals. 1-CNF SAT: trivial. 2-CNF SAT: solvable in $O(N^2)$ time [N = num. of variables]. 3-CNF SAT: NP-complete. 4-CNF SAT: NP-complete. … Note: any Boolean formula can be converted into CNF, with or without extra variables (using extra variables avoids a blow-up in size).

35 Worst-Case Complexity. SAT is an NP-complete problem. Worst case believed to be exponential (roughly $2^N$ for N variables). 10,000+ problems in CS are NP-complete (e.g. planning, scheduling, protein folding, reasoning). P vs. NP: \$1M Clay Prize. However, real-world instances are usually not pathological and can often be solved very quickly with the latest technology! Typical-case complexity provides a more detailed understanding and a more positive picture. [Figure: exponential vs. polynomial growth.]

36 Exponential Complexity Growth. Planning (single-agent): find the right sequence of actions. HARD: 10 actions, $10! \approx 3 \times 10^6$ possible plans. Contingency planning (multi-agent): actions may or may not produce the desired effect! REALLY HARD: $10 \times 9^2 \times 8^4 \times 7^8 \times \dots \times 2^{256} \approx 10^{224}$ possible contingency plans (1 out of 10, then 2 out of 9, 4 out of 8, …). [Figure: exponential vs. polynomial growth.]
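
The two counts on this slide can be checked with exact integer arithmetic. The sketch assumes the product continues the visible pattern, with the k-th factor being (10 - k) raised to the power $2^k$ for k = 0, …, 8.

```python
import math

# Sequential plans: orderings of 10 actions.
sequential_plans = math.factorial(10)

# Contingency plans: 10 x 9^2 x 8^4 x 7^8 x ... x 2^256.
contingency_plans = math.prod((10 - k) ** (2 ** k) for k in range(9))

print(sequential_plans)                  # 3628800
print(len(str(contingency_plans)) - 1)   # 224, i.e. the count is on the order of 10^224
```

`math.prod` requires Python 3.8 or later.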

37 Typical-Case Complexity. A key hardness parameter for k-SAT: the ratio of clauses to variables. [Figure labels: add constraints, delete constraints.] Problems that are not critically constrained tend to be much easier in practice than the relatively few critically constrained ones [Mitchell, Selman, and Levesque 92; Kirkpatrick and Selman, Science 94].
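
To experiment with the clause-to-variable ratio, one can generate random k-SAT instances at a chosen ratio. The following is a minimal sketch; the function name `random_k_sat`, its parameters, and the signed-integer clause encoding are assumptions for illustration. The ratio 4.26 used in the example is the commonly cited hard region for random 3-SAT and is not a figure taken from these slides.

```python
import random

def random_k_sat(num_vars, ratio, k=3, seed=None):
    """Generate a random k-SAT formula with round(ratio * num_vars) clauses.

    Each clause picks k distinct variables uniformly at random and negates
    each one independently with probability 1/2. Literals are signed ints.
    """
    rng = random.Random(seed)
    num_clauses = round(ratio * num_vars)
    formula = []
    for _ in range(num_clauses):
        variables = rng.sample(range(1, num_vars + 1), k)
        formula.append([v if rng.random() < 0.5 else -v for v in variables])
    return formula

formula = random_k_sat(num_vars=50, ratio=4.26, seed=0)
print(len(formula), formula[:3])
```

Sweeping `ratio` from low (under-constrained) to high (over-constrained) while timing a solver is the usual way to observe the easy-hard-easy pattern described above.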

38 Typical-Case Complexity: Random 3-SAT as of 2004. [Figure: algorithms positioned relative to the phase transition: Random Walk, DP, GSAT, Walksat, SP, linear time algs.] SAT solvers are continually getting closer to tackling problems in the hardest region! SP (survey propagation) now handles 1,000,000 variables very near the phase transition region.

39 Tractable Sub-Structure Can Dominate and Drastically Reduce Solution Cost! 2+p-SAT model: mix 2-SAT (tractable) and 3-SAT (intractable) clauses. More than 40% 3-SAT: exponential scaling. At most 40% 3-SAT: linear scaling! (Monasson, Selman et al., Nature 99; Achlioptas 00) [Figure: median runtime vs. number of variables.]

How are other NP-complete problems translated into SAT instances? SAT encoding

41 SAT Encoding Example: Planning Domain. Planning problem → propositional CNF formula, by axiom schemas. Logistics planning: think of a number of trucks and planes that need to transport a bunch of packages from their origin to their destination. Discrete time, modeled by integers. State predicates: indexed by the time at which they hold, e.g. at_location(x,loc,i), free(x,i+1), route(cityA,cityB,i). Action predicates: indexed by the time at which the action begins, e.g. fly(cityA,cityB,i), pickup(x,loc,i), drive_truck(loc1,loc2,i). Each action takes 1 time step; many actions may occur at the same step.

42 Encoding Rules. Actions imply preconditions and effects: fly(x,y,i) → at(x,i) and route(x,y,i) and at(y,i+1). Conflicting actions cannot occur at the same time (A deletes a precondition of B): fly(x,y,i) and y ≠ z → not fly(x,z,i). If something changes, an action must have caused it (Explanatory Frame Axioms): at(x,i) and not at(x,i+1) → ∃y. route(x,y) and fly(x,y,i). Initial and final states hold: at(NY,0) and ... and at(LA,9) and ...
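
As an illustration of how an axiom schema is instantiated into propositional clauses, the sketch below expands the first rule (actions imply preconditions and effects) for every pair of cities and every time step. It assumes the rule is an implication whose arrow was lost in the transcript; the function name `action_clauses` and the tuple-based literal representation are hypothetical, and a real encoder would also map each ground atom to a SAT variable number.

```python
def action_clauses(cities, horizon):
    """Instantiate fly(x,y,i) -> at(x,i) and route(x,y,i) and at(y,i+1).

    An implication A -> (B and C and D) becomes three binary clauses:
    (not A or B), (not A or C), (not A or D).
    Literals are ground atoms (tuples); ("not", atom) marks a negated atom.
    """
    clauses = []
    for i in range(horizon):
        for x in cities:
            for y in cities:
                if x == y:
                    continue
                fly = ("fly", x, y, i)
                consequences = [("at", x, i), ("route", x, y, i), ("at", y, i + 1)]
                for atom in consequences:
                    clauses.append([("not", fly), atom])
    return clauses

clauses = action_clauses(["NY", "LA"], horizon=1)
for clause in clauses:
    print(clause)
```

The number of clauses grows with the number of city pairs times the plan length, which is why instantiated planning encodings become large quickly.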

43 Using SAT Solvers for Planning: Modeling and Solving a Planning Problem. [Figure: a problem description in a high level language yields axiom schemas; given a plan length, these are instantiated into propositional clauses; SAT engine(s) produce a satisfying model; the model is interpreted, via the mapping, as a plan. Stage labels in the figure: (manual) and (fully automatic).]

44 Planning Benchmark Complexity. Logistics domain: a complex, highly parallel transportation domain. E.g. logistics.d problem: 2,165 possible actions per time slot; the optimal solution contains 74 distinct actions over 14 time slots (out of $5 \times 10^{46}$ possible sequential plans of length 14). The Satplan [Selman et al.] approach is currently the fastest optimal planning approach. Winner of the ICAPS-05 & ICAPS-06 international planning competitions.

Solution Approaches to SAT

46 Solving SAT: Systematic Search. One possibility: enumerate all truth assignments one by one and test whether any satisfies F. Note: testing is easy! But there are too many truth assignments (e.g. for N=1000 variables, there are $2^{1000} \approx 10^{300}$ truth assignments). [Figure: enumeration 00000000, 00000001, 00000010, 00000011, …, 11111111; $2^N$ assignments in total.]
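
The enumeration approach can be sketched in a few lines. Clauses are assumed to be lists of signed integers (+k for variable k, -k for its negation); the function name `brute_force_sat` is illustrative.

```python
from itertools import product

def brute_force_sat(clauses, num_vars):
    """Try all 2^num_vars assignments; return the first satisfying one, or None."""
    for bits in product([False, True], repeat=num_vars):
        # Testing one assignment is cheap: every clause needs one true literal.
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause) for clause in clauses):
            return bits
    return None

print(brute_force_sat([[1, 2], [-1, -2], [2, -3]], 3))  # (False, True, False)
print(brute_force_sat([[1], [-1]], 1))                  # None
```

Each test is fast, but the outer loop runs up to $2^N$ times, so this is only usable for very small N.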

47 Solving SAT: Systematic Search. Smarter approach: the DPLL procedure [1960s] (Davis, Putnam, Logemann, Loveland). 1. Assign values to variables one at a time (partial assignments). 2. Simplify F. 3. If a contradiction arises (i.e. some clause becomes False), backtrack, flip the last unflipped variable's value, and continue the search. Extended with many new techniques (100s of research papers, yearly conference on SAT), e.g. extremely efficient data structures (representation), randomization, restarts, learning reasons of failure. Provides a proof of unsatisfiability if F is unsat [complete method]. Forms the basis of dozens of very effective SAT solvers, e.g. minisat, zchaff, relsat, rsat, … (open source, available on the www).
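
The three steps above can be sketched as a short recursive procedure. This is a bare-bones illustration, not how production solvers such as minisat are implemented: it has no clause learning, restarts, or efficient data structures. Clauses are assumed to be lists of signed integers, and the names `simplify` and `dpll` are illustrative.

```python
def simplify(clauses, lit):
    """Assume `lit` is True: drop satisfied clauses and remove the falsified literal.

    Returns None if some clause becomes empty (a contradiction).
    """
    result = []
    for clause in clauses:
        if lit in clause:
            continue  # clause already satisfied
        reduced = [l for l in clause if l != -lit]
        if not reduced:
            return None  # clause became False
        result.append(reduced)
    return result

def dpll(clauses, assignment=()):
    """Return a list of literals set to True that satisfies `clauses`, or None."""
    if clauses is None:
        return None  # contradiction: backtrack
    if not clauses:
        return list(assignment)  # every clause is satisfied
    # Prefer a unit clause if one exists; otherwise branch on the first literal.
    unit = next((c[0] for c in clauses if len(c) == 1), None)
    lit = unit if unit is not None else clauses[0][0]
    for choice in (lit, -lit):  # try one value, then flip it
        result = dpll(simplify(clauses, choice), assignment + (choice,))
        if result is not None:
            return result
    return None

print(dpll([[1, 2], [-1, -2], [2, -3]]))  # [1, -2, -3]
print(dpll([[1], [-1]]))                  # None
```

Because every branch is eventually explored, a `None` result means the formula is unsatisfiable, which is what makes the method complete.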

48 Solving SAT: Local Search. Search space: all $2^N$ truth assignments for F. Goal: starting from an initial truth assignment $A_0$, compute assignments $A_1, A_2, \dots, A_s$ such that $A_s$ is a satisfying assignment for F. $A_{i+1}$ is computed by a local transformation of $A_i$, e.g. $A_1$ = 000110111, $A_2$ = 001110111, $A_3$ = 001110101, $A_4$ = 101110101, …, $A_s$ = 111010000: solution found! (One bit flips at each step; the flipped bit is color-coded in the original slide.) No proof of unsatisfiability if F is unsat [incomplete method]. Several SAT solvers are based on this approach, e.g. Walksat.
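
A minimal local search in the spirit of Walksat can be sketched as follows. It is not the exact Walksat heuristic: the greedy step here simply minimizes the total number of unsatisfied clauses. The function name, the `noise` and `max_flips` parameters, and the signed-integer clause encoding are assumptions for illustration.

```python
import random

def local_search_sat(clauses, num_vars, max_flips=10_000, noise=0.5, seed=0):
    """Flip one variable at a time until all clauses hold; None if flips run out."""
    rng = random.Random(seed)
    assign = [rng.random() < 0.5 for _ in range(num_vars + 1)]  # index 0 unused

    def satisfied(clause):
        return any(assign[abs(l)] == (l > 0) for l in clause)

    def cost_if_flipped(v):
        """Number of unsatisfied clauses after tentatively flipping variable v."""
        assign[v] = not assign[v]
        broken = sum(not satisfied(c) for c in clauses)
        assign[v] = not assign[v]
        return broken

    for _ in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c)]
        if not unsat:
            return assign[1:]
        clause = rng.choice(unsat)  # focus on one violated clause
        if rng.random() < noise:
            var = abs(rng.choice(clause))  # random-walk move
        else:
            var = min((abs(l) for l in clause), key=cost_if_flipped)  # greedy move
        assign[var] = not assign[var]
    return None  # gave up: says nothing about unsatisfiability

model = local_search_sat([[1, 2], [-1, -2], [2, -3]], 3)
print(model)
```

A `None` result only means the flip budget ran out, which is why the method is incomplete.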

49 Solving SAT: Decimation. Search space: all $2^N$ truth assignments for F. Goal: attempt to construct a solution in one shot by very carefully setting one variable at a time. Survey Inspired Decimation: estimate certain marginal probabilities of each variable being True, False, or undecided in each solution cluster using Survey Propagation; fix the variable that is the most biased to its preferred value; simplify F and repeat. A method rarely used by computer scientists, but one that has had tremendous success in the physics community on random k-SAT; it can easily solve random instances with 1M+ variables! No searching for a solution. No proof of unsatisfiability [incomplete method].

50 The Next Two Lectures. Problems beyond SAT / searching for a single solution. #P-complete: count the number of solutions of a SAT instance. #P-hard: sample a solution uniformly at random for a SAT instance. PSPACE-complete: quantified Boolean formula (QBF).

Thank you for attending! Slides: http://www.cs.cornell.edu/~sabhar/tutorials/kitpc08-combinatorial-problems-I.ppt Ashish Sabharwal: http://www.cs.cornell.edu/~sabhar Bart Selman: http://www.cs.cornell.edu/selman