Toward Grounding Knowledge in Prediction
or
Toward a Computational Theory of Artificial Intelligence
Rich Sutton, AT&T Labs
with thanks to Satinder Singh and Doina Precup

It’s Hard to Build Large AI Systems
Brittleness
Unforeseen interactions
Scaling
Requires too much manual complexity management
 –people must understand, intervene, patch and tune
 –like programming
Need more autonomy
 –learning, verification
 –internal coherence of knowledge and experience

Marr’s Three Levels of Understanding
Marr proposed three levels at which any information-processing machine must be understood
 –Computational Theory Level: what is computed and why
 –Representation and Algorithm Level
 –Hardware Implementation Level
We have little computational theory for Intelligence
 –Many methods for knowledge representation, but no theory of knowledge
 –No clear problem definition
 –Logic

Reinforcement Learning provides a little Computational Theory
Policies (controllers) π : States → Pr(Actions)
Value Functions
1-Step Models

Outline of Talk
Experience
Knowledge = Prediction
Macro-Predictions
Mental Simulation
offering a coherent candidate computational theory of intelligence

Experience
AI agent should be embedded in an ongoing interaction with a world
[Diagram: the Agent sends actions to the World; the World returns observations to the Agent]
Experience = these 2 time series
Enables clear definition of the AI problem
 –Let {reward_t} be a function of {observation_t}
 –Choose actions to maximize total reward
Experience provides something for knowledge to be about
cf. textbook definitions
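The interaction described on this slide can be sketched in a few lines of Python. This is only an illustrative sketch: the `ToggleWorld` class, the `reward_fn` function, and the `turn_on_policy` controller are hypothetical names invented here and are not part of the talk. The sketch shows experience as two time series (actions and observations), with reward computed as a function of the observation.

```python
from typing import Callable, List, Tuple


class ToggleWorld:
    """Hypothetical toy world: action 1 flips a light, action 0 leaves it alone."""

    def __init__(self) -> None:
        self.light = 0

    def observe(self) -> int:
        return self.light

    def step(self, action: int) -> int:
        if action == 1:
            self.light = 1 - self.light
        return self.observe()  # the observation produced in response to the action


def reward_fn(observation: int) -> float:
    """Assumed reward: a function of the observation only (1.0 when the light is on)."""
    return float(observation)


def run(world: ToggleWorld, policy: Callable[[int], int],
        steps: int) -> Tuple[List[int], List[int], float]:
    """Run the agent-world loop and return the two time series plus total reward."""
    actions: List[int] = []
    observations: List[int] = []
    total = 0.0
    obs = world.observe()
    for _ in range(steps):
        action = policy(obs)
        obs = world.step(action)
        actions.append(action)
        observations.append(obs)
        total += reward_fn(obs)
    return actions, observations, total


def turn_on_policy(obs: int) -> int:
    """Hypothetical controller: flip the light only when it is off."""
    return 1 if obs == 0 else 0
```

Calling `run(ToggleWorld(), turn_on_policy, 5)` returns the action series, the observation series, and the total reward for five steps of interaction.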

What is Knowledge?
Deny the physical world
Deny existence of objects, people, space…
Deny all non-answers, correspondence theories
All we really know about is our experience
Knowledge must be in terms of experience

Grounded Knowledge
A is always followed by B (A,B observations)
 –if observation_t = A then observation_{t+1} = B
A,B predicates
 –if A(observation_t) then B(observation_{t+1})
 –if A(…) then B(…)
Action conditioning:
 –if A(…) and C(action_t) then B(…)
All of these are predictions
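Because every statement on this slide is a prediction, each one can be checked directly against recorded experience. The sketch below is a minimal illustration under stated assumptions: the function name `check_prediction` is hypothetical, predicates are taken to apply to single observations and actions, and `actions[t]` is assumed to be the action taken between `observations[t]` and `observations[t + 1]`.

```python
from typing import Callable, List, Tuple

Predicate = Callable[[int], bool]


def check_prediction(observations: List[int], actions: List[int],
                     A: Predicate, C: Predicate, B: Predicate) -> Tuple[int, int]:
    """Test the rule: if A(obs_t) and C(action_t) then B(obs_{t+1}).

    Returns (number of times the rule applied, number of times it held).
    """
    applied = 0
    confirmed = 0
    for t in range(len(observations) - 1):
        if A(observations[t]) and C(actions[t]):
            applied += 1
            if B(observations[t + 1]):
                confirmed += 1
    return applied, confirmed


# Hypothetical experience from a world where action 1 flips a binary observation.
observations = [0, 1, 1, 0, 0, 1]
actions = [1, 0, 1, 0, 1]

# Action-conditioned rule: if the observation is 0 and the action is 1, the next observation is 1.
flip_rule = check_prediction(observations, actions,
                             A=lambda o: o == 0, C=lambda a: a == 1, B=lambda o: o == 1)

# Unconditioned rule: if the observation is 1, the next observation is 1 (false in this world).
stay_rule = check_prediction(observations, actions,
                             A=lambda o: o == 1, C=lambda a: True, B=lambda o: o == 1)
```

A rule whose two counts agree has never been contradicted by the data seen so far; a rule whose counts differ has been refuted at least once.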

The world is a black box, known only by its I/O behavior (observations in response to actions)
Therefore, all meaningful statements about the world are statements about the observations it generates
The only observations worth talking about are future ones
Therefore: the only meaningful things to say about the world are predictions
World Knowledge = Predictions

Non-predictive “Knowledge”
Mathematical knowledge, theorems and proofs
 –always true, but tell us nothing about the world
 –not world knowledge
Uninterpreted signals, e.g., useful representations
 –real and useful, but not by themselves world knowledge, only an aid to acquiring it
Knowledge of the past
Policies
 –could be viewed as predictions of value
 –but by themselves are more like uninterpreted signals
Predictions capture “regular”, descriptive world knowledge

Grounded Knowledge
A is always followed by B (A,B observations)
 –if observation_t = A then observation_{t+1} = B
A,B predicates
 –if A(observation_t) then B(observation_{t+1})
 –if A(…) then B(…)
Action conditioning:
 –if A(…) and C(action_t) then B(…)
These are all 1-step predictions
Still a pretty limited kind of knowledge. Can’t say anything beyond one step!

Grounded Knowledge
1-step predictions
 –A is always followed by B (A,B observations): if observation_t = A then observation_{t+1} = B
 –A,B predicates: if A(observation_t) then B(observation_{t+1})
 –if A(…) then B(…)
 –Action conditioning: if A(…) and C(action_t) then B(…)
Macro-prediction
 –if A(…) and [a course of action many steps long] then B(…) many steps later
[Diagram labels: prior grounding, posterior grounding]

Both Prior and Posterior Grounding are Needed
“Classical” AI systems omit prior grounding
 –e.g., “Tweety is a bird”, “John loves Mary”
 –sometimes called the “symbol grounding problem”
Modern AI systems tend to skimp on the posterior
 –supervised learning, Bayes nets, robotics…
It is not OK to leave posterior grounding to external, human observers
 –the information is just not in the machine
 –we don’t understand it; we haven’t done our job!
Yet this is such an appealing shortcut that we have almost always done it

Outline of Talk
Experience
Knowledge = Prediction
Macro-Predictions
Mental Simulation
offering a coherent candidate computational theory of intelligence

Macro-Predictions (Options)
à la Sutton, Precup & Singh, 1999, et al.
Let π : States → Pr(Actions) be an arbitrary policy
Let β : States → Pr({0,1}) be a termination condition
Then the pair (π, β) is a kind of experiment
 –do π until β = 1
 –measure something about the resulting experience
Suppose we measure the outcome:
 –the state at the end of the experiment
 –the total reward during the experiment
Then the macro-prediction for (π, β) would predict Pr(end-state), E{total reward} given start-state
This is a very general, expressive form of prediction
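The experiment on this slide (follow the policy until the termination condition fires, then record the end state and the total reward) can be estimated by plain Monte Carlo sampling. The sketch below is illustrative only. The corridor world, the 0.2 slip probability, the reward of -1 per step, and all function names are assumptions made for the example and do not come from the talk or from the 1999 paper.

```python
import random
from collections import Counter
from typing import Callable, Dict, Tuple

Policy = Callable[[int], Dict[int, float]]       # state -> {action: probability}
Termination = Callable[[int], float]             # state -> probability of stopping
StepFn = Callable[[int, int, random.Random], Tuple[int, float]]


def run_option(start: int, pi: Policy, beta: Termination, step: StepFn,
               rng: random.Random, max_steps: int = 1000) -> Tuple[int, float]:
    """One experiment: follow pi until beta fires; return (end state, total reward)."""
    state, total = start, 0.0
    for _ in range(max_steps):
        probs = pi(state)
        action = rng.choices(list(probs), weights=list(probs.values()))[0]
        state, reward = step(state, action, rng)
        total += reward
        if rng.random() < beta(state):  # termination is checked after each transition
            break
    return state, total


def macro_prediction(start: int, pi: Policy, beta: Termination, step: StepFn,
                     n: int = 2000, seed: int = 0) -> Tuple[Dict[int, float], float]:
    """Monte Carlo estimate of Pr(end-state) and E{total reward} from a start state."""
    rng = random.Random(seed)
    ends: Counter = Counter()
    reward_sum = 0.0
    for _ in range(n):
        end, total = run_option(start, pi, beta, step, rng)
        ends[end] += 1
        reward_sum += total
    return {s: c / n for s, c in ends.items()}, reward_sum / n


def corridor_step(state: int, action: int, rng: random.Random) -> Tuple[int, float]:
    """Hypothetical corridor with states 0..5: moves slip (no motion) with probability 0.2."""
    if rng.random() >= 0.2:
        state = max(0, min(5, state + action))
    return state, -1.0  # assumed cost of one unit per step


go_right: Policy = lambda s: {+1: 1.0}
at_far_end: Termination = lambda s: 1.0 if s == 5 else 0.0

end_dist, expected_reward = macro_prediction(0, go_right, at_far_end, corridor_step)
```

In this toy corridor the option always ends in state 5, and the expected total reward is close to -6.25 because each of the five required moves succeeds with probability 0.8.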

Rooms Example
[Figure: policy of one option]
Sutton, Precup, & Singh, 1999

Planning with Macro-Predictions

Learning Path-to-Goal with and without Hallway Macros (Options)

Mental Simulation
Knowledge can be gained from experience
 –by actually performing experiments
But knowledge can also be gained without overt experience
 –we call this thinking, reasoning, planning, cognition…
This can be done through “thought experiments”
 –internal simulation of experience
 –generated from predictive knowledge
 –subject to learning methods as before
Much thought can be achieved this way...
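One way to make the idea of thought experiments concrete is a Dyna-style loop, which the slides do not name and which is used here only as an illustration. Real transitions update a value table and also fill in a 1-step predictive model; the same update rule is then applied to transitions replayed from that model. The four-state chain, the step size 0.5, the discount 0.9, and every function name below are assumptions made for this sketch.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

ACTIONS = (-1, +1)
Transition = Tuple[int, int, float, int, bool]  # (state, action, reward, next state, terminal)


def q_update(Q: DefaultDict, s: int, a: int, r: float, s2: int, terminal: bool,
             alpha: float = 0.5, gamma: float = 0.9) -> None:
    """One Q-learning update, used for both real and simulated experience."""
    target = r if terminal else r + gamma * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def learn_from_real(transitions: List[Transition]):
    """Learn from overt experience and record a deterministic 1-step model."""
    Q: DefaultDict = defaultdict(float)
    model: Dict[Tuple[int, int], Tuple[float, int, bool]] = {}
    for s, a, r, s2, terminal in transitions:
        q_update(Q, s, a, r, s2, terminal)
        model[(s, a)] = (r, s2, terminal)  # predictive knowledge: what follows (s, a)
    return Q, model


def think(Q: DefaultDict, model: dict, n: int, seed: int = 0) -> None:
    """Thought experiments: replay model-generated transitions through the same update."""
    rng = random.Random(seed)
    keys = list(model)
    for _ in range(n):
        s, a = rng.choice(keys)
        r, s2, terminal = model[(s, a)]
        q_update(Q, s, a, r, s2, terminal)


def greedy(Q: DefaultDict, s: int) -> int:
    return max(ACTIONS, key=lambda a: Q[(s, a)])


# Hypothetical chain 0-1-2-3; reaching state 3 ends the episode with reward 1.
REAL: List[Transition] = [
    (0, +1, 0.0, 1, False), (1, +1, 0.0, 2, False), (2, +1, 1.0, 3, True),
    (1, -1, 0.0, 0, False), (2, -1, 0.0, 1, False), (0, -1, 0.0, 0, False),
]
```

After a single pass over `REAL`, only the value of moving right from state 2 is nonzero. Calling `think(Q, model, 1000)` propagates that value back along the chain without any further interaction with the world.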

Illustration: Dynamic Mission Planning for UAVs
Mission: Fly over (observe) most valuable sites and return to base
Stochastic weather affects observability (cloudy or clear) of sites
Limited fuel
Intractable with classical optimal control methods
Temporal scales:
 –Tactics: which way to fly now
 –Strategies: which site to head for
Strategies compress space and time
 –Reduce no. states from ~10^11 to ~10^6
 –Reduce tour length from ~600 to ~6
Reinforcement Learning with strategies and real-time control outperforms optimal tour planner that assumes static weather
[Figure: map of sites around the Base, labeled with rewards (25, 15, 8, 10, 5)]
[Figure: Expected Reward/Mission (axis from 30 to 60) for Static Replanner, RL planning w/strategies, and RL planning w/strategies and real-time control, under Low Fuel and High Fuel]
Barto, Sutton, and Moll, Adaptive Networks Laboratory, University of Massachusetts

What to compute and Why
[Diagram relating Reward, Value Functions, Knowledge/Predictions, and Policy]
The ultimate goal is reward, but our AI spends most of its time with knowledge

A Candidate Computational Theory of Artificial Intelligence
AI Agent should be focused on finding general macro-predictions of experience
Especially seeking predictions that enable rapid computation of values and optimal actions
Predictions and their associated experiments are the coin of the realm
 –they have a clear semantics, can be tested & learned
 –can be combined to produce other predictions, e.g. values
Mental Simulation (plus learning)
 –makes new predictions from old
 –start of a computational theory of knowledge use
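The claim that predictions can be combined to produce values can be illustrated with value iteration over macro-predictions. In the sketch below each macro-prediction is a pair (expected total reward, Pr(end-state)), and a state's value is the best achievable sum of predicted reward and expected end-state value. The three-place example, its rewards, the undiscounted setting, and all names are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from typing import Dict, List, Tuple

# models[state][option] = (expected total reward, {end state: probability})
OptionModel = Tuple[float, Dict[str, float]]
Models = Dict[str, Dict[str, OptionModel]]


def plan(models: Models, states: List[str], sweeps: int = 100) -> Dict[str, float]:
    """Value iteration over macro-predictions (undiscounted; states without options are terminal)."""
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        for s in states:
            if s in models:
                V[s] = max(r + sum(p * V[s2] for s2, p in ends.items())
                           for r, ends in models[s].values())
    return V


def best_option(models: Models, V: Dict[str, float], s: str) -> str:
    """Pick the option whose macro-prediction promises the most value from state s."""
    return max(models[s],
               key=lambda o: models[s][o][0]
               + sum(p * V[s2] for s2, p in models[s][o][1].items()))


# Hypothetical macro-predictions for three places: A, B, and a terminal Goal.
MODELS: Models = {
    "A": {"go-to-B": (-4.0, {"B": 1.0}), "go-direct": (-10.0, {"Goal": 1.0})},
    "B": {"go-to-goal": (-3.0, {"Goal": 1.0})},
}

values = plan(MODELS, ["A", "B", "Goal"])
choice = best_option(MODELS, values, "A")
```

Here the value of A is the value of chaining two macro-predictions (A to B, then B to Goal), which beats the single direct option.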

Conclusions
World knowledge must be expressed in terms of the data
Such posterior grounding is challenging
 –lose expressiveness in the short term
 –lose external (human) coherence, explainability
But can be done step by step
And brings palpable benefits
 –autonomous learning/verification/extension of knowledge
 –autonomous complexity management due to internal coherence
 –knowledge suited to general reasoning process: mental simulation
We must provide this grounding!
