Published by Skylar Liggett. Modified over 4 years ago.

1
Mind is About Predictions Rich Sutton AT&T Labs with special thanks to Michael Littman, Doina Precup, Satinder Singh, David McAllester

2
Mind is About Predictions Hypothesis: Knowledge is predictive About what-leads-to-what, under what ways of behaving What will I see if I go around the corner? Objects: What will I see if I turn this over? Active vision: What will I see if I look at my hand? Value functions: What is the most reward I know how to get? Such knowledge is learnable, chainable Hypothesis: Mental activity is working with predictions Learning them Combining them to produce new predictions (reasoning) Converting them to action (planning, reinforcement learning) Figuring out which are most useful

3
Philosophical and Psychological Roots Like classical British empiricism (1650–1800) –Knowledge is about experience –Experience is central But not anti-nativist (evolutionary experience) Emphasizing sequential rather than simultaneous events –Replace association/contiguity with prediction/contingency Close to Tolman’s “Expectancy Theory” (1932–1950) –Cognitive maps, vicarious trial and error Psychology struggled to make it a science (1890–1950) –Introspection –Behaviorism, operational definitions –Objectivity

4
Modern Computational View of Mind OK to talk about insides of minds OK to talk about the function and purpose of a design We talk about Why –Why a system works –Why it should compute X and in manner Y –Why such a system should achieve purpose Z This is new, and resolves classical struggles –Servo-mechanisms, state-transition probabilities –Utility and decision theory –Information as signal – subjective (private) yet clear Purpose defines and constrains mental constructs

5
Informational View of Mind Mind does information processing Mind exchanges information with the world Only experience is known for sure –Anything more public or “objective” is suspect World is an I-O entity, a black box Although we often seem to talk about what is inside, all we can sensibly talk about is I-O behavior This “interactionist stance” seems to follow from the informational view of mind (IVoM) [Figure: Mind and World linked by experience]

6
Is Mind about Predictions? OR Is Mind about Action (or Policies)? Of course it is ultimately about action But action generation methods are relatively clear –Value functions and decision theory Pick action that maximizes expected cumulative reward –OR Policy gradient RL methods Execution-time search Reflexes and behavior-based robotics Learning-extended reflexes and conditioning Flexible cognition requires more than action generation Most mental activity is working with predictions

7
An old, simple, appealing idea Mind as prediction engine! Predictions are learnable, combinable They represent cause and effect, and can be pieced together to yield plans Perhaps this old idea is essentially correct. Just needs –Development, revitalization in modern forms –Greater precision, formalization, mathematics –The computational perspective to make it respectable –Imagination, determination, patience Not rushing to performance Not building in ungrounded world knowledge

8
Topics Super-Predictions Combining Predictions (reasoning and planning) Predictions and State

9
The Simplest Predictions
[Figure: a stream of experience; a 1-step prediction from state X under action a to Y; a k-step prediction from X to Y]
In general, predictions depend on actions, on policies
And there is a huge space of policies… can be closed loop

10
Simple Mixture Predictions
Where will I be in 10–20 steps?
Where will I be in roughly 10 steps?
[Figure: termination profiles over time, with marks at now, 10 steps, and 20 steps; panels labeled short term, medium term, long term]
Arbitrary termination profiles are possible
Closed-loop termination: Terminate depending on what happens
–Where will I be when X happens?

11
Closed-loop termination loosens the time-specificity of predictions Instead of “what will I see at t +100?” Can say “what will I see when I open the box?” Will we elect a black or a woman president first? Where will the tennis ball be when it reaches me? What time will it be when the talk starts? or “when John arrives?” “when the bus comes?” “when I get to the store?” A substantial increase in expressiveness

12
Super-Predictions Closed-loop terminations And Closed-loop policies Correspond to arbitrary experiments and the results of those experiments What will I see if I go into the next room? What time will it be when the talk is over? Is there a dollar in the wallet in my pocket? Where is my car parked? Can I throw the ball into the basket? Is this a chair situation? What will I see if I turn this object around?

13
Anatomy of a Super-Prediction 1 Predictor Recognizes the conditions, makes the prediction 2 Experiment - policy - termination condition - measurement function(s) 3 Goal A function of the anticipated measurement to be maximized by choice of policy and termination

14
Example: Open-the-door Predictor Use visual input to estimate –Probabilities of succeeding in opening the door, and of other outcomes (door locked, no handle, no real door) –expected cumulative cost (sub-par reward) in trying Experiment –Policy for walking up to the door, shaping grasp of handle, turning, pulling, and opening the door –Terminate on successful opening or various failure conditions –Measure outcome and cumulative cost Goal –Sum of expected cost and expected value of outcome –Can be used to define experiment’s policy and termination
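
The three parts named on the anatomy slide (predictor, experiment, goal) can be written down as a small data structure. This is only a sketch under stated assumptions: the class name `SuperPrediction`, the helper `run_experiment`, the one-dimensional toy world, and all numbers are hypothetical illustrations, not anything given in the talk.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

State = int  # hypothetical toy state: steps remaining to reach the door handle


@dataclass
class SuperPrediction:
    predictor: Callable[[State], Dict[str, float]]  # 1. predicted outcome probabilities
    policy: Callable[[State], str]                   # 2a. closed-loop policy
    terminate: Callable[[State], bool]               # 2b. closed-loop termination condition
    measure: Callable[[State], str]                  # 2c. measurement taken at termination
    goal: Callable[[str, float], float]              # 3. value of (outcome, cumulative cost)


def run_experiment(sp: SuperPrediction, state: State,
                   env_step: Callable[[State, str], Tuple[State, float]],
                   max_steps: int = 100) -> Tuple[str, float, float]:
    """Follow the policy until termination; return (outcome, cost, goal value)."""
    cost = 0.0
    for _ in range(max_steps):
        if sp.terminate(state):
            break
        state, step_cost = env_step(state, sp.policy(state))
        cost += step_cost
    outcome = sp.measure(state)
    return outcome, cost, sp.goal(outcome, cost)


# Toy world (assumed): each "forward" action costs 1 and moves one step closer.
def toy_step(state: State, action: str) -> Tuple[State, float]:
    return (state - 1 if action == "forward" else state), 1.0


open_door = SuperPrediction(
    predictor=lambda s: {"opened": 0.9, "locked": 0.1},  # made-up probabilities
    policy=lambda s: "forward",
    terminate=lambda s: s == 0,
    measure=lambda s: "opened" if s == 0 else "failed",
    goal=lambda outcome, cost: (10.0 if outcome == "opened" else 0.0) - cost,
)

result = run_experiment(open_door, 3, toy_step)  # ("opened", 3.0, 7.0)
```

The predictor is deliberately not consulted by `run_experiment`: running the experiment yields the measured outcome and cost against which a predictor's estimates could be learned or verified.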

15
RoboCup-Soccer Example Safe to pass? Predict the outcome of choosing to pass The pass will take several steps to set up – choosing to pass involves a whole action policy You may choose not to pass halfway through Terminations and outcomes: – pass is aborted – opponents touch the ball before teammate – teammate touches first, appears to control ball – ball goes out of bounds

16
Example: Pass-to-Teammate Predictor uses perceived positions of ball, opponents, etc. to estimate probabilities of –Successful pass, openness of receiver –Interception –Reception failure –Aborted pass, in trouble –Aborted pass, something better to do –Loss of time Experiment –Policy for maneuvering ball, or around ball, to set up and pass –Termination strategy for aborting, recognizing completion –Measurement of outcome, time Goal –Some combination of outcome values, time, openness of receiver

17
Topics Super-Predictions Combining Predictions (reasoning and planning) Predictions and State

18
Combining Predictions I: Composition
If the mind is about predictions, then thinking is combining predictions to produce new ones
[Figure: a prediction from X to Y and a prediction from Y to Z compose into a prediction from X to Z]
Here each prediction is assumed to predict
–A transient measurement (e.g., elapsed time, cumulative reward)
–A final measurement (e.g., partial distribution of outcome states)
The new prediction does not necessarily have a goal

19
Combining Predictions I: Composition (continued)
[Figure: prediction 1 from X has transient measurement T1 and outcomes Y (.8), Y’ (.1), Y’’ (.1); prediction 2 from Y to Z has transient measurement T2; the composite “1 then 2 if Y” has transient measurement T1 + .8 T2 and keeps the outcomes Y’ (.1) and Y’’ (.1)]
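
A minimal sketch of the composition described on this slide, assuming each prediction is stored as an expected transient measurement plus a distribution over final outcomes. The function name `compose`, the dictionary layout, and the values chosen for T1 and T2 are hypothetical; the .8/.1/.1 outcome probabilities are the ones that survive on the slide.

```python
def compose(first, second, link):
    """Compose "first, then second if `link` occurs" into one prediction.

    Each prediction is a dict with:
      "transient": expected transient measurement (e.g., elapsed time)
      "outcomes":  mapping from final outcome to probability
    """
    p_link = first["outcomes"].get(link, 0.0)
    # Outcomes of `first` other than `link` end the composite unchanged.
    outcomes = {o: p for o, p in first["outcomes"].items() if o != link}
    # Outcomes of `second` are reached only through `link`.
    for o, p in second["outcomes"].items():
        outcomes[o] = outcomes.get(o, 0.0) + p_link * p
    # The second transient is incurred only when `link` occurs.
    transient = first["transient"] + p_link * second["transient"]
    return {"transient": transient, "outcomes": outcomes}


x_to_y = {"transient": 10.0, "outcomes": {"Y": 0.8, "Y'": 0.1, "Y''": 0.1}}  # T1 = 10 (assumed)
y_to_z = {"transient": 5.0, "outcomes": {"Z": 1.0}}                          # T2 = 5 (assumed)

x_to_z = compose(x_to_y, y_to_z, link="Y")
# transient is T1 + .8 * T2; outcomes are Z (.8), Y' (.1), Y'' (.1)
```

The result has the same shape as its inputs, so composites can be chained again, which is what makes the predictions "chainable".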

20
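Combining Predictions II: Choice
A predictor p plus a goal g compose to form a value function
We can do all the usual planning backups with it
[Figure: from X, one prediction leads to Y with g = 5; another leads to Y’ with g = 6]
In X, for g, the prediction leading to Y’ (g = 6) is a better choice than the one leading to Y (g = 5). Store it with g.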
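
The choice step amounts to a max-backup over the predictions available in a state. The sketch below assumes the goal is to be maximized and that each candidate prediction has already been scored under the goal; the function name `choose` and the table layout are hypothetical.

```python
def choose(state, goal, candidates, table):
    """Store and return the best candidate prediction for `goal` in `state`.

    `candidates` maps a prediction's name to its value under the goal.
    """
    best = max(candidates, key=candidates.get)
    table[(state, goal)] = (best, candidates[best])
    return best


table = {}
best = choose("X", "g", {"X->Y": 5, "X->Y'": 6}, table)
# best is "X->Y'", stored under the key ("X", "g")
```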

21
Room-to-Room Super-Predictions
[Figure: rooms connected by hallways (Sutton, Precup, & Singh, 1999); labels: policy, termination hallways, target (goal) hallway]
4 stochastic primitive actions: up, down, left, right
–Fail 33% of the time
8 multi-step super-predictions (to each room's 2 hallways)
–“Options”: Precup 2000; Sutton, Precup, & Singh 1999
Predict: Probability of reaching each terminal hallway
Goal: minimize # steps + values for target and other outcome hallway

22
Planning with Super-Predictions (super-predictions)

23
Topics Super-Predictions Combining Predictions (reasoning and planning) Predictions and State

24
Predictive State Representations Hypothesis: What we normally think of as state is a set of predictions about outcomes of experiments –Wallet’s contents, John’s location, presence of objects… Problem: So far we have assumed states but really the world just gives information, “observations” There are several ways to formalize this problem –Learning deterministic Finite State Automata Rivest & Schapire, 1987 –Adding stochasticity: An alternative to Hidden Markov Models Herbert Jaeger, 1999 –Adding action: An alternative to Partially Observable Markov Decision Processes Littman, Sutton, & Singh 2001

25
PSR Formalism 1
[Figure: Mind and World exchanging actions and observations]
Experience: Random variables
A test is a subsequence, a simple case of an experiment
–If the actions are done, will the observations occur?
The world is defined by the probabilities of each test from the beginning of time: [formula lost in extraction]
and after a finite history sequence h (formally another test): [formula lost in extraction]

26
PSR Formalism 2 A Predictive State Representation (PSR) is a set of tests whose vector of predictions is sufficient information to predict all tests, i.e., whose predictions are a sufficient statistic, a state A linear PSR is a PSR where each f_t is linear
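
The formulas on the two formalism slides did not survive in this transcript. The block below is a hedged reconstruction that follows the usual definitions in Littman, Sutton, & Singh (2001), not the slides themselves; the symbols $Q$, $q_i$, and $m_t$ are assumed notation.

```latex
% A test is an action-observation sequence t = a_1 o_1 a_2 o_2 \cdots a_k o_k.
% Its prediction from the beginning of time:
p(t) = \Pr(o_1, \dots, o_k \mid a_1, \dots, a_k)

% Its prediction after a history h (itself formally a test):
p(t \mid h) = \frac{p(ht)}{p(h)}

% A set of tests Q = \{q_1, \dots, q_n\} is a PSR if, for every test t,
% some function f_t of the prediction vector suffices:
p(Q \mid h) = \bigl[\, p(q_1 \mid h), \dots, p(q_n \mid h) \,\bigr], \qquad
p(t \mid h) = f_t\bigl(p(Q \mid h)\bigr)

% In a linear PSR each f_t is a fixed weight vector m_t:
p(t \mid h) = p(Q \mid h)^{\top} m_t
```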

27
Walk/Reset Example
Actions:
–Walk: Take a random step left or right, see 0
–Reset: Jump to rightmost state, see 1 if already there
Need to remember # of Walks since last Reset
Probabilities of being rightmost are: 1, .5, .5, .375, .375, .3125, .3125, …
PSR tests: Reset1, Walk0Reset1
[Figure: row of states; surviving label: 1]

28
Walk/Reset Example (continued)
[Figure: “Start on Right...”; surviving label: 1]

29
Walk/Reset Example (continued)
[Figure: “Start on Right...”, “After one Walk”; surviving labels: .5, 1]

30
Walk/Reset Example (continued)
[Figure: “Start on Right...”, “After one Walk”, “After two Walks”; surviving labels: .25, .5, 1]
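
The sequence of rightmost-state probabilities quoted in this example can be reproduced by pushing a state distribution through repeated Walks. The slides do not give the number of states or what happens at the ends of the row, so this sketch assumes a row of 5 states in which a step off either end leaves the state unchanged; the function name is hypothetical.

```python
from fractions import Fraction


def rightmost_probabilities(num_walks, n_states=5):
    """Probability of being in the rightmost state after 0..num_walks Walks.

    Assumes a start in the rightmost state (just after a Reset) and that a
    step off either end of the row leaves the state unchanged.
    """
    half = Fraction(1, 2)
    dist = [Fraction(0)] * n_states
    dist[-1] = Fraction(1)
    probs = [dist[-1]]
    for _ in range(num_walks):
        nxt = [Fraction(0)] * n_states
        for s, p in enumerate(dist):
            nxt[max(s - 1, 0)] += half * p             # step left
            nxt[min(s + 1, n_states - 1)] += half * p  # step right
        dist = nxt
        probs.append(dist[-1])
    return probs


probs = rightmost_probabilities(6)
# [1, 1/2, 1/2, 3/8, 3/8, 5/16, 5/16], i.e. 1, .5, .5, .375, .375, .3125, .3125
```

Under these assumptions, after k Walks the prediction for the test Reset1 is `probs[k]` and the prediction for Walk0Reset1 is `probs[k + 1]`, since a Walk always shows 0.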

31
PSR Results Exist compact, linear PSRs –# tests ≤ # states in minimal POMDP –# tests ≤ Rivest & Schapire’s Diversity –# tests can be exponentially fewer than diversity and POMDP Compact simulation/update process Construction algorithm from POMDP Learning/discovery algorithms of Rivest and Schapire, and of Jaeger, do not immediately extend to PSRs There are natural EM-like algorithms (current work)

32
Constructing Linear PSRs from POMDPs
Outcome vector u(t): the predictions for test t from all POMDP states
A test t is said to be independent of a set of tests T if its outcome vector is linearly independent of T’s outcome vectors
Accumulate tests whose outcome vectors are independent
Search:
–Start with T = {}
–While some extension aot of a test t ∈ T is independent, add it to T
–Else terminate, return T
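
The search can be sketched on the Walk/Reset world. Several things here are assumptions rather than content of the slides: a row of 5 states with ends that absorb off-row steps, Reset showing 0 when not already rightmost, seeding the search with extensions of the empty test (whose outcome vector is all ones) so that the first tests can enter an empty T, and exact arithmetic with `Fraction` in place of a numerical rank test. All function names are hypothetical.

```python
from fractions import Fraction

N = 5                 # assumed number of states; the rightmost is N - 1
ACTIONS = ("W", "R")  # Walk, Reset
OBSERVATIONS = (0, 1)


def extend(u, action, obs):
    """Outcome vector of the test (action, obs) followed by the test with vector u."""
    half = Fraction(1, 2)
    out = []
    for s in range(N):
        if action == "W":
            # Walk always shows 0, then moves left or right with equal chance.
            left, right = max(s - 1, 0), min(s + 1, N - 1)
            p = half * u[left] + half * u[right] if obs == 0 else Fraction(0)
        else:
            # Reset shows 1 only from the rightmost state, then lands there.
            seen = 1 if s == N - 1 else 0
            p = u[N - 1] if obs == seen else Fraction(0)
        out.append(p)
    return out


def rank(rows):
    """Matrix rank by exact Gaussian elimination."""
    m = [list(r) for r in rows]
    r = 0
    for c in range(N):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


def construct_linear_psr():
    """Accumulate tests whose outcome vectors are linearly independent."""
    ones = [Fraction(1)] * N  # outcome vector of the empty test
    tests = {}                # test (tuple of (action, obs) pairs) -> outcome vector
    grew = True
    while grew:
        grew = False
        for t, u in [((), ones)] + list(tests.items()):
            for a in ACTIONS:
                for o in OBSERVATIONS:
                    candidate = ((a, o),) + t
                    if candidate in tests:
                        continue
                    v = extend(u, a, o)
                    # Every stored vector is independent, so len(tests) is the current rank.
                    if rank(list(tests.values()) + [v]) > len(tests):
                        tests[candidate] = v
                        grew = True
    return tests


psr_tests = construct_linear_psr()
```

Because each accepted test raises the rank and the rank cannot exceed the number of POMDP states, the loop stops after at most N additions; on this toy world it returns 5 tests.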

33
PSR Conclusions A path to exorcizing the assumption of state –Toward the goal of totally data- (experience-) oriented AI The predictive view of state is competitive –Even better (more compact) in some ways –States have data interpretations! –And are thus potentially more learnable, refinable Naturally leads to constructive discovery ideas –Searching for the right tests to understand the world “Tests” generalize naturally to super-predictions

34
Empiricism [Figure: Mind and World exchanging actions and observations] Experience is the data; it is all we really know Experience should be the focus of AI But by and large it is not… even in robotics, Alife, etc. Experience is central –Knowledge is about experience

35
Mind is About Predictions Hypothesis: Knowledge is predictive About what-leads-to-what, under what ways of behaving Such knowledge is learnable, chainable Hypothesis: Mental activity is working with predictions Learning them Combining them to produce new predictions (reasoning) Converting them to action (planning, reinforcement learning) Figuring out which are most useful Hypothesis: These ideas are newly viable Unfamiliar flexibility & expressiveness of “super”-predictions New engineering planning methods DP/RL/Values New state-representation ideas Hypothesis: Predictions are the Coin of the Mental Realm

36
It’s Hard to Build Large AI Systems Brittleness Unforeseen interactions Scaling Requires too much manual complexity management –people must understand, intervene, patch and tune –like programming Need more autonomy –learning, verification –internal coherence of knowledge and experience

37
AI Implications of Predictive View An alternative theory of knowledge and thought –Alternative to conventional, symbolic “language of thought” –Alternative to “database” view of knowledge Requires experiments to be in the machine, not just the designer — true grounding Automated complexity management –Should help with brittleness and scaling Could permit AI systems of much greater complexity

38
Both Predictors and Experiments must be in the Machine “Classical” AI systems omit both! –e.g., “Tweety is a bird”, “John loves Mary” –sometimes called the “symbol grounding problem” Modern AI systems tend to skimp on the experiments –supervised learning, Bayes nets, robotics… It is not OK to leave the experimental definitions to external, human observers –the information is just not in the machine –we don’t understand it; we haven’t done our job! Yet this is such an appealing shortcut that we have almost always done it

39
More Predictive Knowledge John is in the coffee room My car is in the South parking lot What we know about geography, navigation What we know about how an object looks, rotates What we know about how objects can be used Recognition strategies for objects and letters The portrait of Washington on the dollar in the wallet in my other pants in the laundry has a mustache on it –Composing experiments creates a productive representation language

40
Relational, Propositional, and Deictic
∀ objects X: If I drop X, then X will be on the floor
–Holding object X means predicting certain sensations if, for example, one directs one’s eyes toward one’s hand
–Thus, on dropping, the predicted sensations are merely transferred from the looking-at-hand prediction to the looking-at-floor prediction
–Such transfer of existing predictions should be a common part of visual knowledge, updated every time the eyes move
∃ X, Y such that Red(X), Blue(Y), and Above(X,Y)
–There is some place I can foveate and see Red
–There is some place I can foveate and see Blue
–If I foveate first the Red place, “mark” it, then the Blue place, the mark will be Above the fovea (may need to search)
–These are typical ideas of modern, active, deictic vision

41
Should All Knowledge be Experiential?
Allowing only predictions in terms of data?
Loses:
Expressiveness
–can’t talk about objects, space, people; no “is-a” or “part-of”
External (human) coherence
–verbal labels, interpretability, explainability, calibration
–the “shortcut” of entering knowledge directly into the agent
Gains:
The knowledge will have meaning to the machine
It can be mechanically learned/verified/extended
It will be suited for a general reasoning process
–composition and backup of predictions to yield new predictions

42
There is value in forcing world knowledge into prediction form We will finally have all the knowledge in the machine –all will be mechanically interpretable –we will finally really understand the knowledge’s meaning –anything else is just an empty shell Agent will be able to learn/verify/extend knowledge –provides an internal coherence for the knowledge –enables building it up from a firm foundation The knowledge will flow immediately into a general reasoning engine –the concatenation of predictions yields new predictions

43
Conclusions World knowledge must be expressed in terms of the data Such posterior grounding is challenging, –lose expressiveness in the short term –lose external (human) coherence, explainability But can be done step by step, And brings palpable benefits –autonomous learning/verification/extension of knowledge –autonomous complexity management due to internal coherence –knowledge suited to general reasoning process We must provide this grounding!
