Learning to Support Constraint Programmers Susan L. Epstein 1 Gene Freuder 2 and Rick Wallace 2 1 Department of Computer Science Hunter College and The.


Learning to Support Constraint Programmers
Susan L. Epstein (1), Gene Freuder (2), and Rick Wallace (2)
(1) Department of Computer Science, Hunter College and The Graduate Center of The City University of New York
(2) Cork Constraint Computation Centre

Facts about ACE
• Learns to solve constraint satisfaction problems
• Learns search heuristics
• Can transfer what it learns on simple problems to solve more difficult ones
• Can export knowledge to ordinary constraint solvers
• Both a learner and a test bed
• Heuristic but complete: will find a solution, eventually, if one exists
• Guarantees high-quality, not optimal, solutions
• Begins with substantial domain knowledge

Outline
• The task: constraint satisfaction
• Performance results
• Reasoning mechanism
• Learning
• Representations

The Problem Space
• Constraint satisfaction problem
• Solution: assign a value to every variable consistent with constraints
• Many real-world problems can be represented and solved this way (design and configuration, planning and scheduling, diagnosis and testing)
Example problem:
  Variables: A, B, C, D
  Domains: A {1,2,3}, B {1,2,4,5,6}, C {1,2}, D {1,3}
  Constraints: A = B, A > D, C ≠ D
[Figure: constraint graph; edge labels BA, CD; value pairs (1 1) (2 2) (2 1) (3 1) (3 2) (1 3) (2 1) (2 3)]
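The example problem on this slide can be written down directly as data. The sketch below is illustrative only and is not ACE's representation; it assumes the third constraint reads C ≠ D (consistent with the listed pairs (1 3) (2 1) (2 3)), and the names `DOMAINS`, `CONSTRAINTS`, `is_solution`, and `all_solutions` are hypothetical.

```python
from itertools import product

# Domains as given on the slide.
DOMAINS = {"A": {1, 2, 3}, "B": {1, 2, 4, 5, 6}, "C": {1, 2}, "D": {1, 3}}

# Binary constraints as predicates over the values of an ordered variable pair.
# ASSUMPTION: the third constraint is read as C != D.
CONSTRAINTS = {
    ("A", "B"): lambda a, b: a == b,
    ("A", "D"): lambda a, d: a > d,
    ("C", "D"): lambda c, d: c != d,
}


def is_solution(assignment):
    """True if every variable has a value from its domain and all constraints hold."""
    if set(assignment) != set(DOMAINS):
        return False
    if any(assignment[v] not in DOMAINS[v] for v in DOMAINS):
        return False
    return all(check(assignment[x], assignment[y])
               for (x, y), check in CONSTRAINTS.items())


def all_solutions():
    """Enumerate every complete assignment and keep the consistent ones."""
    names = sorted(DOMAINS)
    for values in product(*(sorted(DOMAINS[n]) for n in names)):
        candidate = dict(zip(names, values))
        if is_solution(candidate):
            yield candidate
```

Brute-force enumeration is only workable for a problem this small; the search methods on the following slides exist to avoid it.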

A Challenging Domain
• Constraint solving is NP-hard
• Problem class parameters:
  ◦ n = number of variables
  ◦ k = maximum domain size
  ◦ d = edge density (% of possible constraints)
  ◦ t = tightness (% of value pairs excluded)
• Complexity peak: values for d and t that make problems hardest
• Heavy-tailed distribution of difficulty [Gomes et al., 2002]
• Problem may have multiple or no solutions
• Unexplored choices may be good
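As a worked illustration of the four class parameters, the sketch below turns (n, k, d, t) into concrete counts. It assumes binary constraints, so that "possible constraints" means the n(n-1)/2 variable pairs, and it assumes every domain has exactly k values; the function name `class_counts` is hypothetical.

```python
def class_counts(n, k, d, t):
    """Return (number of constraints, excluded value pairs per constraint).

    ASSUMPTIONS: binary constraints only, so there are n*(n-1)/2 possible
    constraints; all domains have k values, so each constraint ranges
    over k*k value pairs.
    """
    possible_constraints = n * (n - 1) // 2
    constraints = round(d * possible_constraints)
    excluded_pairs = round(t * k * k)
    return constraints, excluded_pairs


# Parameters of the large test class mentioned later in the talk:
# n = 150, domain size 5, density .05, tightness .24.
print(class_counts(150, 5, 0.05, 0.24))
```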

Finding a Path to a Solution
• Sequence of decision pairs (select variable, assign value)
• Optimal length: 2n for n variables
• For n variables with domain size d, there are (d+1)^n possible states
[Figure: path of alternating "Select a variable" and "Assign a value" decisions ending at "Solution"]
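The state count follows from treating each variable as either unassigned or holding one of its d values, which gives d + 1 options per variable. A small sketch of that arithmetic, generalized to unequal domain sizes (the function name `count_states` is hypothetical):

```python
from math import prod


def count_states(domain_sizes):
    """Each variable is unassigned or takes one of its values: (size + 1) options."""
    return prod(size + 1 for size in domain_sizes)


# Uniform case from the slide: n = 4 variables, domain size d = 2 gives (2+1)^4.
print(count_states([2, 2, 2, 2]))
# The talk's example problem has domain sizes 3, 5, 2, 2.
print(count_states([3, 5, 2, 2]))
```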

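Solution Method
Search from initial state to goal
  Domains: A {1,2,3}, B {1,2,4,5,6}, C {1,2}, D {1,3}
  Constraints: A = B, A > D, C ≠ D
[Figure: search tree from the initial state to a solution; recoverable node and branch labels: B, B=1, A, A=1, A=2, C, C=1, C=2, D, D=1, D=3, "No", "Solution"; constraint graph with value pairs (1 1) (2 2) (2 1) (3 1) (3 2) (1 3) (2 1) (2 3)]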

Consistency Maintenance
• Some values may initially be inconsistent
• Value assignment can restrict domains
  Domains: A {1,2,3}, B {1,2,4,5,6}, C {1,2}, D {1,3}
  Constraints: A = B, A > D, C ≠ D
[Figure: search tree with restricted domains; root shows A {1,2}, C {1,2}, D {1,3}; after B=1 and A=1 the node shows C {1,2} and D "No" (no other possibilities); a B=2 branch continues; constraint graph with value pairs (1 1) (2 2) (2 1) (3 1) (3 2) (1 3) (2 1) (2 3)]
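The domain restriction described on this slide can be sketched as a forward-checking step: after an assignment, remove the values of neighboring variables that no longer have support. This is a minimal illustration, not ACE's code; it assumes the third constraint is C ≠ D, and `forward_check` is a hypothetical name.

```python
DOMAINS = {"A": {1, 2, 3}, "B": {1, 2, 4, 5, 6}, "C": {1, 2}, "D": {1, 3}}

# ASSUMPTION: the third constraint is read as C != D.
CONSTRAINTS = {
    ("A", "B"): lambda a, b: a == b,
    ("A", "D"): lambda a, d: a > d,
    ("C", "D"): lambda c, d: c != d,
}


def forward_check(domains, var, value):
    """Assign var=value and prune its neighbors' domains.

    Returns the restricted domains, or None if some neighbor's domain
    becomes empty (the assignment cannot lead to a solution).
    """
    new = {v: set(vals) for v, vals in domains.items()}
    new[var] = {value}
    for (x, y), check in CONSTRAINTS.items():
        if x == var:
            new[y] = {b for b in new[y] if check(value, b)}
            if not new[y]:
                return None
        elif y == var:
            new[x] = {a for a in new[x] if check(a, value)}
            if not new[x]:
                return None
    return new
```

With A=1 the call returns `None`, because A > D leaves D with no value, which matches the "No" shown for D in the figure. With A=2 it narrows B to {2} and D to {1}.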

Retraction
When an inconsistency arises, a retraction method removes a value and returns to an earlier state.
  Domains: A {1,2,3}, B {1,2,4,5,6}, C {1,2}, D {1,3}
[Figure: the search tree from the previous slide (B=1, A=1, then C {1,2} and D "No!") with the retraction point marked "Here!" and the question "Where's the error?"; a B=2 branch continues; constraint graph with value pairs (1 1) (2 2) (2 1) (3 1) (3 2) (1 3) (2 1) (2 3)]

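Variable Ordering
• A good variable ordering can speed search
  Domains: A {1,2,3}, B {1,2,4,5,6}, C {1,2}, D {1,3}
[Figure: search tree that starts with variable A; branch A=1 shows B {1,2}, C {1,2}, D "No"; branch A=2 shows B {1,2}, C {1,2}, D {1,2} and continues; constraint graph with value pairs (1 1) (2 2) (2 1) (3 1) (3 2) (1 3) (2 1) (2 3)]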
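To make the effect of variable ordering concrete, here is a sketch of plain chronological backtracking that counts retractions under a fixed variable order. It is a simplification of what the slides show: it does no consistency maintenance, it assumes the third constraint is C ≠ D, and the names `consistent` and `solve` are hypothetical.

```python
DOMAINS = {"A": [1, 2, 3], "B": [1, 2, 4, 5, 6], "C": [1, 2], "D": [1, 3]}

# ASSUMPTION: the third constraint is read as C != D.
CONSTRAINTS = {
    ("A", "B"): lambda a, b: a == b,
    ("A", "D"): lambda a, d: a > d,
    ("C", "D"): lambda c, d: c != d,
}


def consistent(assignment):
    """Check only the constraints whose two variables are both assigned."""
    return all(check(assignment[x], assignment[y])
               for (x, y), check in CONSTRAINTS.items()
               if x in assignment and y in assignment)


def solve(order, assignment=None, stats=None):
    """Chronological backtracking over a fixed variable order.

    Returns (solution or None, stats); stats["retractions"] counts every
    value that was assigned and later removed.
    """
    assignment = {} if assignment is None else assignment
    stats = {"retractions": 0} if stats is None else stats
    if len(assignment) == len(order):
        return dict(assignment), stats
    var = order[len(assignment)]
    for value in DOMAINS[var]:
        assignment[var] = value
        if consistent(assignment):
            result, _ = solve(order, assignment, stats)
            if result is not None:
                return result, stats
        del assignment[var]          # retract and try the next value
        stats["retractions"] += 1
    return None, stats


for order in (["B", "A", "C", "D"], ["A", "D", "C", "B"]):
    solution, stats = solve(order)
    print(order, solution, stats)
```

Both orders reach the same solution, but the order B, A, C, D retracts 14 values while A, D, C, B retracts 5, because the second order tests the tight constraint A > D early.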

Value Ordering
• A good value ordering can speed search too
  Domains: A {1,2,3}, B {1,2,4,5,6}, C {1,2}, D {1,3}
  Solution: A=2, B=2, C=2, D=1
[Figure: search path A=2 (B {1,2}, C {1,2}, D {1,3}), then D=1 (B {1,2}, C {1,2}), then B=2 (C {1,2}), then C=2; constraint graph with value pairs (1 1) (2 2) (2 1) (3 1) (3 2) (1 3) (2 1) (2 3)]

Constraint Solvers Know…
• Several consistency methods
• Several retraction methods
• Many variable ordering heuristics
• Many value ordering heuristics
… but the interactions among them are not well understood, nor is one combination best for all problem classes.

Goals of the ACE Project
• Characterize problem classes
• Learn to solve classes of problems well
• Evaluate mixtures of known heuristics
• Develop new heuristics
• Explore the role of planning in solution

Outline
• The task: constraint satisfaction
• Performance results
• Reasoning mechanism
• Learning
• Representation

Experimental Design
• Specify problem class, consistency and retraction methods
• Average performance across 10 runs
• Learn on L problems (halt at 10,000 steps)
• To-completion testing on T new problems
• During testing, use only heuristics judged accurate during learning
• Evaluate performance on
  ◦ Steps to solution
  ◦ Constraint checks
  ◦ Retractions
  ◦ Elapsed time

ACE Learns to Solve Hard Problems
• near the complexity peak
• Learn on 80 problems
• 10 runs, binned in sets of 10 learning problems
• Discards 26 of 38 heuristics
• Outperforms MinDomain, an off-the-shelf heuristic
[Figure: steps to solution (axis ticks 500 to 2500) by bin # (1 to 8); means in blue, medians in red]

ACE Rediscovers Brélaz Heuristic
• Graph coloring: assign different colors to adjacent nodes.
• Graph coloring is a kind of constraint satisfaction problem.
• Brélaz: Minimize dynamic domain, break ties with maximum forward degree.
• ACE learned this consistently on different classes of graph coloring problems. [Epstein & Freuder, 2001]
[Figure: example graph. Color each vertex red, blue, or green so that each pair of adjacent vertices has different colors.]
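The Brélaz rule as stated on the slide (smallest dynamic domain first, ties broken by largest forward degree) can be sketched as a variable-selection function. The four-vertex graph, the final alphabetical tie-break, and the name `brelaz_select` are illustrative assumptions, not taken from the talk.

```python
# Hypothetical graph: a triangle a-b-c with a pendant vertex d attached to c.
NEIGHBORS = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}


def brelaz_select(domains, neighbors, assigned):
    """Pick the unassigned vertex with the smallest current domain.

    Ties are broken by the largest forward degree (number of unassigned
    neighbors), then alphabetically (ASSUMPTION, for determinism).
    """
    def forward_degree(v):
        return sum(1 for n in neighbors[v] if n not in assigned)

    unassigned = [v for v in domains if v not in assigned]
    return min(unassigned,
               key=lambda v: (len(domains[v]), -forward_degree(v), v))


colors = {"red", "blue", "green"}
start = {v: set(colors) for v in NEIGHBORS}
print(brelaz_select(start, NEIGHBORS, assigned=set()))

# After c is colored red, its neighbors lose red from their domains.
after = {"a": {"blue", "green"}, "b": {"blue", "green"},
         "c": {"red"}, "d": {"blue", "green"}}
print(brelaz_select(after, NEIGHBORS, assigned={"c"}))
```

In the first call all domains have three colors, so forward degree decides and c is chosen; in the second, a and b tie on both criteria and the alphabetical fallback picks a.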

ACE Discovers a New Heuristic
• Maximize the product of degree and forward degree at the top of the search tree
• Exported to several traditional approaches:
  ◦ Min Domain
  ◦ Min Domain/Degree
  ◦ Min Domain + degree preorder
• Learned on small problems but tested in 10 runs on n = 150, domain size 5, density .05, tightness .24
• Reduced search tree size by 25% to 96% [Epstein, Freuder, Wallace, Morozov, & Samuels 2002]
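A minimal sketch of the learned heuristic's scoring rule, the product of a variable's degree and its forward degree. The example constraint graph, the alphabetical tie-break, and the function name are assumptions for illustration; the slide's restriction to the top of the search tree is not modeled here.

```python
# Hypothetical constraint graph: a triangle a-b-c with a pendant vertex d on c.
NEIGHBORS = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}


def select_by_degree_product(neighbors, assigned):
    """Choose the unassigned variable maximizing degree * forward degree.

    degree         = number of neighbors in the constraint graph
    forward degree = number of neighbors not yet assigned
    Ties go to the alphabetically first variable (ASSUMPTION).
    """
    def score(v):
        degree = len(neighbors[v])
        forward = sum(1 for n in neighbors[v] if n not in assigned)
        return degree * forward

    unassigned = sorted(v for v in neighbors if v not in assigned)
    return max(unassigned, key=score)


print(select_by_degree_product(NEIGHBORS, assigned=set()))
print(select_by_degree_product(NEIGHBORS, assigned={"c"}))
```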

Outline
• The task: constraint satisfaction
• Performance results
• Reasoning mechanism
• Learning
• Representation

Constraint-Solving Heuristic
• Uses domain knowledge
• What problem classes does it work well on?
• Is it valid throughout a single solution?
• Can its dual also be valid?
• How can heuristics be combined?
… and where do new heuristics come from?

FORR (For the Right Reasons)
• General architecture for learning and problem solving
• Multiple learning methods, multiple representations, multiple decision rationales
• Specialized by domain knowledge
• Learns useful knowledge to support reasoning
• Specify whether a rationale is correct or heuristic
• Learns to combine rationales to improve problem solving
[Epstein 1992]

An Advisor Implements a Rationale
• Class-independent action-selection rationale
• Supports or opposes actions by comments
• Expresses opinion direction by strengths
• Limitedly-rational procedure
[Figure: Advisor diagram with labels "current problem state" and "actions"]

Advisor Categories
• Tier 1: rationales that correctly select a single action
• Tier 2: rationales that produce a set of actions directed to a subgoal
• Tier 3: heuristic rationales that select a single action

Choosing an Action
[Figure: decision flowchart, reconstructed from its labels]
• Input: current state, possible actions
• Tier 1: Reaction from perfect knowledge (Victory, T-11 … T-1n). Decision? yes: take action; no: go to tier 2
• Tier 2: Planning triggered by situation recognition (P-1, P-2 … P-k). Decision? yes: begin plan; no: go to tier 3
• Tier 3: Heuristic reactions (T-31, T-32 … T-3m), then voting, then take action
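The three-tier decision flow can be sketched as a single function. This is an illustration of the control flow only: the Advisor signatures, the example Advisors, and the weighted-sum voting rule are hypothetical and are not the FORR or ACE implementation.

```python
def choose_action(state, actions, tier1, tier2, tier3, weights):
    """Consult tier 1, then tier 2, then vote among tier-3 Advisors.

    tier1: functions returning a single action or None
    tier2: functions returning a plan (list of actions) or an empty list
    tier3: dict name -> function returning {action: strength}
    weights: dict name -> weight for the tier-3 vote (ASSUMED weighted sum)
    """
    for advisor in tier1:                      # correct rationales decide alone
        action = advisor(state, actions)
        if action is not None:
            return ("act", action)
    for planner in tier2:                      # situation-triggered planning
        plan = planner(state, actions)
        if plan:
            return ("plan", plan)
    totals = {a: 0.0 for a in actions}         # heuristic rationales vote
    for name, advisor in tier3.items():
        for action, strength in advisor(state, actions).items():
            totals[action] += weights.get(name, 1.0) * strength
    return ("act", max(actions, key=lambda a: totals[a]))


# Hypothetical Advisors for choosing the next variable.
def only_choice(state, actions):
    return actions[0] if len(actions) == 1 else None


def prefer_small_domain(state, actions):
    return {a: 1.0 / len(state["domains"][a]) for a in actions}


def prefer_high_degree(state, actions):
    return {a: float(state["degree"][a]) for a in actions}


state = {"domains": {"A": [1, 2, 3], "C": [1, 2], "D": [1, 3]},
         "degree": {"A": 2, "C": 1, "D": 2}}
tier3 = {"small": prefer_small_domain, "degree": prefer_high_degree}
weights = {"small": 1.0, "degree": 0.5}
print(choose_action(state, ["A", "C", "D"], [only_choice], [], tier3, weights))
```

In the usage example no tier-1 or tier-2 Advisor decides, so the vote runs and D wins with the highest weighted total.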

ACE's Domain Knowledge
• Consistency maintenance methods: forward checking, arc consistency
• Backtracking methods: chronological
• 21 variable ordering heuristics
• 19 value ordering heuristics
• 3 languages whose expressions have interpretations as heuristics
• Graph theory knowledge, e.g., connected, acyclic
• Constraint solving knowledge, e.g., only one arc consistency pass is required on a tree

An Overview of ACE
• The task: constraint satisfaction
• Performance results
• Reasoning mechanism
• Learning
• Representation

What ACE Learns
• Weighted linear combination for comment strengths
  ◦ For voting in tier 3 only
  ◦ Includes only valuable heuristics
  ◦ Indicates relative accuracy of valuable heuristics
• New, learned heuristics
• How to restructure tier 3
• When random choice is the right thing to do
• Acquire knowledge that supports heuristics (e.g., typical solution path length)

Digression-based Weight Learning
• Learn from trace of each solved problem
• Reward decisions on perfect solution path
• Shorter paths reward variable ordering
• Longer paths reward value ordering
• Blame digression-producing decisions in proportion to error
• Valuable Advisors: weight > baselines
[Figure: solution path of "Select a variable" and "Assign a value" decisions ending at "Solution", with a "digression" branch labeled "error"]
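The slide gives the idea of digression-based weight learning but not its update rule, so the following is only a toy sketch under loud assumptions: each traced decision records which Advisors supported it and the size of the digression it caused (0 if it lay on the solution path); supporters gain a fixed step for on-path decisions and lose an amount proportional to the digression size otherwise. The function name, the trace format, and the step size are all hypothetical.

```python
def update_weights(weights, trace, step=0.1):
    """Return new Advisor weights after one solved problem.

    trace: list of (supporting_advisor_names, digression_size) pairs.
    ASSUMED rule: +step for each supporter of an on-path decision
    (digression_size == 0); -step * digression_size for each supporter
    of a decision that started a digression.
    """
    new = dict(weights)
    for supporters, digression_size in trace:
        for name in supporters:
            if digression_size == 0:
                new[name] = new.get(name, 0.0) + step
            else:
                new[name] = new.get(name, 0.0) - step * digression_size
    return new


weights = {"min_domain": 1.0, "max_degree": 1.0}
trace = [
    ({"min_domain"}, 0),                # good decision on the solution path
    ({"max_degree"}, 3),                # decision that began a 3-step digression
    ({"min_domain", "max_degree"}, 0),  # good decision supported by both
]
print(update_weights(weights, trace))
```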

Learning New Advisors
• Advisor grammar on pairs of concerns
  ◦ Maximize or minimize
  ◦ Product or quotient
  ◦ Stage
• Monitor all expressions
• Use good ones collectively
• Use best ones individually
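The grammar on this slide (a pair of concerns, maximize or minimize, product or quotient, and a stage) can be sketched as an enumeration of candidate expressions. The concern names and stage names below are placeholders chosen for illustration; the talk does not list them.

```python
from itertools import combinations, product

CONCERNS = ["domain_size", "degree", "forward_degree"]  # hypothetical concerns
DIRECTIONS = ["maximize", "minimize"]
OPERATORS = ["product", "quotient"]
STAGES = ["early", "late", "all"]                       # hypothetical stages


def enumerate_advisors():
    """Yield every expression the grammar generates over pairs of concerns."""
    for pair, direction, operator, stage in product(
            combinations(CONCERNS, 2), DIRECTIONS, OPERATORS, STAGES):
        yield {"concerns": pair, "direction": direction,
               "operator": operator, "stage": stage}


def evaluate(expr, features):
    """Score one variable; higher is better. Assumes a nonzero divisor."""
    x = features[expr["concerns"][0]]
    y = features[expr["concerns"][1]]
    value = x * y if expr["operator"] == "product" else x / y
    return value if expr["direction"] == "maximize" else -value


candidates = list(enumerate_advisors())
print(len(candidates))
```

Under these placeholder choices, one generated expression is "maximize the product of degree and forward degree, early in search", which has the same form as the new heuristic reported earlier in the talk.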

Outline
• The task: constraint satisfaction
• Performance results
• Reasoning mechanism
• Learning
• Representation

Representation of Experience
• State describes variables and value assignments, impossible future values, prior state, connected components, constraint checks incurred, dynamic edges, trees
• History of successful decisions
• … plus other significant decisions become training examples
[Figure: example state table with columns "Is", "Can be", "Cannot be" and rows A, B, C, D (entries as extracted: A12, B 2, C1,2, D1,3); checks incurred: 4; 1 acyclic component: A, C, D; dynamic edges: AD, CD; additional labels "No", "Yes"]

Representation of Learned Knowledge
• Weights for Advisors
• Solution size distribution
• Latest error: greatest number of variables bound at retraction

ACE's Status Report
• 41 Advisors in tiers 1 and 3
• 3 languages in which to express additional Advisors
• 5 experimental planners
• Problem classes: random, coloring, geometric, logic, n-queens, small world, and quasigroup (with and without holes)
• Learns to solve hard problems
• Learns new heuristics
• Transfers to harder problems
• Divides and conquers problems
• Learns when not to reason

Current ACE Research
• Further weight-learning refinements
• Learn appropriate restart parameters
• More problem classes, consistency methods, retraction methods, planners, and Advisor languages
• Learn appropriate consistency checking methods
• Learn appropriate backtracking methods
• Learn to bias initial weights
• Metaheuristics to reformulate the architecture
• Modeling strategies
… and, coming soon, ACE on the Web

Acknowledgements
Continued thanks for their ideas and efforts go to:
• Diarmuid Grimes
• Mark Hennessey
• Tiziana Ligorio
• Anton Morozov
• Smiljana Petrovic
• Bruce Samuels
• Students of the FORR study group
• The Cork Constraint Computation Centre
and, for their support, to:
• The National Science Foundation
• Science Foundation Ireland

Is ACE Reinforcement Learning?
• Similarities:
  ◦ Unsupervised learning through trial and error
  ◦ Delayed rewards
  ◦ Learns a policy
• Primary differences:
  ◦ Reinforcement learning learns a policy represented as the estimated values of states it has experienced repeatedly … but ACE is unlikely to revisit a state; instead it learns how to act in any state
  ◦ Q-learning learns state-action preferences … but ACE learns a policy that combines action preferences

How is ACE like STAGGER? [Schlimmer 1987]
• Learns: STAGGER: Boolean classifier / ACE: search control preference function for a sequence of decisions in a class of problems
• Represents: STAGGER: weighted booleans / ACE: weighted linear function
• Supervised: STAGGER: yes / ACE: no
• New elements: STAGGER: failure-driven / ACE: success-driven
• Initial bias: STAGGER: yes / ACE: under construction
• Real attributes: STAGGER: yes / ACE: no

How is ACE like SAGE.2? [Langley 1985]
• Both learn search control from unsupervised experience, reinforce decisions on a successful path, gradually introduce new factors, specify a threshold, and transfer to harder problems, but…
• Learns on: SAGE.2: same task / ACE: different problems in a class
• Represents: SAGE.2: symbolic rules / ACE: weighted linear function
• Reinforces: SAGE.2: repeating rules / ACE: correct comments
• Failure response: SAGE.2: revise / ACE: reduce weight
• Proportional to error: SAGE.2: no / ACE: yes
• Compares states: SAGE.2: yes / ACE: no
• Random benchmarks: SAGE.2: no / ACE: yes
• Subgoals: SAGE.2: no / ACE: yes
• Learns during search: SAGE.2: yes / ACE: no

