Chapter 10 Planning, Acting, and Learning

2 Contents
- The Sense/Plan/Act Cycle
- Approximate Search
- Learning Heuristic Functions
- Rewards Instead of Goals

3 The Sense/Plan/Act Cycle
Pitfalls of the idealized assumptions in Chap. 7:
- Perceptual processes might not always provide the necessary information about the state of the environment (e.g., perceptual aliasing).
- Actions might not always have their modeled effects.
- There may be other physical processes in the world, or other agents.
- The existence of such external effects causes further problems.

4 The Sense/Plan/Act Cycle (cont'd)
- The agent might be required to act before it can complete a search to a goal state.
- Even if the agent had sufficient time, its computational and memory resources might not permit search all the way to a goal state.
Approaches to these difficulties:
- Probabilistic methods: MDPs [Puterman, 1994], POMDPs [Lovejoy, 1991]
- Sense/plan/act with environmental feedback
- Working around the problems with various additional assumptions and approximations

5 Figure 10.1: An Architecture for a Sense/Plan/Act Agent

6 Approximate Search
Definition: a search process that addresses the problem of limited computational and/or time resources, at the price of producing plans that might be suboptimal or that might not always reliably lead to a goal state.
- Relaxing the requirement of producing optimal plans reduces the computational cost of finding a plan.
- Search for a complete path to a goal node without requiring that it be optimal.
- Search for a partial path that does not take us all the way to a goal node, e.g., A*-type search, anytime algorithms [Dean & Boddy 1988; Horvitz 1997].

7 Approximate Search (cont'd)
Island-Driven Search: establish a sequence of "island nodes" in the search space through which it is suspected that good paths pass, then search between successive islands (see the sketch below).
Figure 10.2: An Island-Driven Search
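The idea can be made concrete with a small sketch. The names island_driven_search, segment_search, and islands are illustrative, not part of the chapter; any complete search routine (e.g., an A*-type search) can serve as segment_search.

```python
# A minimal sketch of island-driven search: search between successive
# waypoints (start, islands..., goal) and stitch the segment paths together.
def island_driven_search(start, goal, islands, segment_search):
    """segment_search(a, b) returns a path [a, ..., b] or None."""
    waypoints = [start] + list(islands) + [goal]
    full_path = [start]
    for a, b in zip(waypoints, waypoints[1:]):
        segment = segment_search(a, b)
        if segment is None:                 # a suspected island may be wrong
            return segment_search(start, goal)   # fall back to a direct search
        full_path.extend(segment[1:])       # drop the duplicated segment start
    return full_path
```

If an island turns out not to lie on any path, this sketch simply falls back to a direct search from start to goal.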

8 Hierarchical Search: much like island-driven search, except that it does not have an explicit set of islands.
Figure 10.3: A Hierarchical Search

9 Figure 10.4: Pushing a Block

10 Approximate Search (cont'd)
Limited-Horizon Search
- It may be useful to use whatever time or computation is available to find a path to a node thought to lie on a good path to the goal, even if that node is not a goal node itself.
- n*: the node having the smallest value of f' among the nodes on the search frontier when the search must be terminated.
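A minimal sketch of such a budgeted, A*-type search follows; successors, h, and max_expansions are assumptions made for illustration. When the expansion budget runs out, the path to n* (the frontier node with the smallest f' = g + h at that moment) is returned.

```python
import heapq
import itertools

# Limited-horizon search sketch: behave like A* until the node-expansion
# budget is exhausted, then return the path to n*, the best frontier node.
def limited_horizon_search(start, is_goal, successors, h, max_expansions):
    counter = itertools.count()   # tie-breaker so nodes are never compared directly
    frontier = [(h(start), 0, next(counter), start, [start])]   # (f', g, tie, node, path)
    expansions = 0
    while frontier:
        f, g, _, node, path = heapq.heappop(frontier)
        if is_goal(node):
            return path, True                 # complete path to a goal node
        if expansions >= max_expansions:
            return path, False                # budget exhausted: partial path toward n*
        expansions += 1
        for child, cost in successors(node):  # successors(n) -> [(child, step_cost), ...]
            g2 = g + cost
            heapq.heappush(frontier, (g2 + h(child), g2, next(counter), child, path + [child]))
    return None, False                        # exhausted the space without reaching a goal
```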

11 Approximate Search (cont'd)
Building Reactive Procedures
- Reactive agents can usually act more quickly than planning agents can.
- Pre-compute some frequently used plans off-line and store them as reactive routines that produce appropriate actions quickly online (sketched below).
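A minimal sketch of this compilation step, under the assumption that an offline planner plan_from(s0) returns a list of (state, action) pairs; all names here are hypothetical.

```python
# Compile plans into a reactive routine: plan offline once per anticipated
# start state, then cache (state -> action) so the online loop is a lookup.
def compile_reactive_policy(start_states, plan_from):
    policy = {}
    for s0 in start_states:
        for state, action in plan_from(s0):
            policy.setdefault(state, action)   # keep the first action seen for a state
    return policy

def act(policy, state, fallback_planner):
    if state in policy:
        return policy[state]        # fast path: cached reaction
    return fallback_planner(state)  # slow path: plan online for a novel state
```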

12 Figure 10.5: A Spanning Tree for a Block-Stacking Problem

13 Learning Heuristic Functions
Learning from experience
- Continuous feedback from the environment is one way to reduce uncertainties and to compensate for an agent's lack of knowledge about the effects of its actions.
- Useful information can be extracted from the experience of interacting with the environment.
- Two settings: explicit graphs and implicit graphs.

14 Learning Heuristic Functions
Explicit Graphs
- The agent has a good model of the effects of its actions and knows the costs of moving from any node to its successor nodes.
- C(n_i, n_j): the cost of moving from n_i to n_j.
- δ(n, a): the description of the state reached from node n after taking action a.
- DYNA [Sutton 1990]: a combination of "learning in the world" with "learning and planning in the model" (see the sketch below).
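A minimal Dyna-Q-style sketch, one standard instance of the DYNA idea rather than necessarily the exact formulation in [Sutton 1990]: each real step updates the action values directly ("learning in the world") and also a learned model, which is then replayed for extra planning updates ("learning and planning in the model"). Q is assumed to be a collections.defaultdict(float), model an ordinary dict, and alpha, gamma, n_planning are illustrative parameters.

```python
import random

def dyna_q_step(Q, model, state, action, reward, next_state,
                actions, alpha=0.1, gamma=0.95, n_planning=10):
    # Direct reinforcement-learning update from the real experience
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    # Update the learned model: the observed outcome of (state, action)
    model[(state, action)] = (reward, next_state)
    # Planning: replay previously observed transitions drawn from the model
    for _ in range(n_planning):
        (s, a), (r, s2) = random.choice(list(model.items()))
        best = max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
```

Q and model are updated in place; a greedy policy can be read off Q at any time.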

15 Learning Heuristic Functions
Implicit Graphs
- It is impractical to make an explicit graph or table of all the nodes and their transitions.
- Instead, learn the heuristic function while performing the search process.
- Example: the Eight-puzzle, with features W(n), the number of tiles in the wrong place, and P(n), the sum of the distances that each tile is from "home" (both computed in the sketch below).
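Both features are easy to compute. A minimal sketch, assuming a state is a tuple of nine entries read row by row, with 0 denoting the blank:

```python
def w(state, goal):
    """W(n): number of tiles (ignoring the blank) not in their home cell."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def p(state, goal):
    """P(n): sum of Manhattan distances of each tile from its home cell."""
    total = 0
    for idx, tile in enumerate(state):
        if tile == 0:
            continue
        home = goal.index(tile)
        total += abs(idx // 3 - home // 3) + abs(idx % 3 - home % 3)
    return total

# Example: a state one move away from the goal ordering used here.
goal = (1, 2, 3, 4, 5, 6, 7, 8, 0)
state = (1, 2, 3, 4, 5, 6, 7, 0, 8)
print(w(state, goal), p(state, goal))   # -> 1 1
```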

16 Learning Heuristic Functions
Learning the weights
- Minimize the sum of the squared errors between the training samples and the h' function given by the weighted combination of features.
- Updates can be made as nodes are expanded during the search.
- Temporal difference learning [Sutton 1988]: the weight adjustment depends only on two temporally adjacent values of a function (sketched below).
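A minimal sketch of a temporal-difference-style weight update for a heuristic of the form h'(n) = w1*W(n) + w2*P(n): after moving from a node n to a successor n2 during the search, h'(n) is nudged toward the step cost plus h'(n2). The learning rate and the two-feature form are illustrative assumptions.

```python
def h_prime(weights, features):
    """h'(n) as a weighted combination of the node's feature values."""
    return sum(w * f for w, f in zip(weights, features))

def td_weight_update(weights, features_n, features_n2, step_cost, lr=0.05):
    target = step_cost + h_prime(weights, features_n2)   # value implied by the successor
    error = target - h_prime(weights, features_n)        # temporal-difference error
    # Gradient step on the squared error (the factor of 2 is absorbed into lr)
    return [w + lr * error * f for w, f in zip(weights, features_n)]
```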

17 Rewards Instead of Goals
- State-space search rests on rather idealized conditions: it is assumed that the agent has a single, short-term task that can be described by a goal condition.
- In practical problems, the task often cannot be so simply stated.
- Instead, the user expresses his or her satisfaction and dissatisfaction with task performance by giving the agent positive and negative rewards.
- The agent's task can then be formalized as maximizing the amount of reward it receives.

18 Rewards Instead of Goals
Seeking an action policy that maximizes reward: policy improvement by iteration
- π: a policy function on nodes whose value is the action prescribed by that policy at that node.
- r(n_i, a): the reward received by the agent when it takes action a at node n_i.
- ρ(n_j): the value of any special reward given for reaching node n_j.
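Given these ingredients, the value of following a policy from a node can be written recursively. One common form, assuming a discount factor γ (the discounting and the exact placement of ρ are assumptions here; the slide only names the symbols):

```latex
V^{\pi}(n_i) \;=\; r\bigl(n_i, \pi(n_i)\bigr) \;+\; \rho(n_j) \;+\; \gamma\, V^{\pi}(n_j),
\qquad \text{where } n_j = \delta\bigl(n_i, \pi(n_i)\bigr).
```

Policy improvement then replaces π(n_i) by the action that maximizes this quantity, and evaluation and improvement are iterated until the policy stops changing.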

19 Value iteration [Barto, Bradtke, and Singh, 1995]
- Delayed-reinforcement learning: learning action policies in settings in which rewards depend on a sequence of earlier actions.
- Temporal credit assignment: credit those state-action pairs most responsible for the reward.
- Structural credit assignment: in state spaces too large for us to store the entire graph, we must aggregate states with similar V' values [Kaelbling, Littman, and Moore, 1996].
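A minimal value-iteration sketch over an explicit graph, assuming deterministic transitions via δ(n, a) as defined earlier; gamma and the number of sweeps are illustrative choices.

```python
def value_iteration(nodes, actions, r, delta, gamma=0.9, sweeps=100):
    """Back up each node's value from its best action, then extract a greedy policy."""
    V = {n: 0.0 for n in nodes}
    for _ in range(sweeps):
        for n in nodes:
            V[n] = max(r(n, a) + gamma * V[delta(n, a)] for a in actions)
    # Greedy policy with respect to the (approximately) converged values
    pi = {n: max(actions, key=lambda a: r(n, a) + gamma * V[delta(n, a)])
          for n in nodes}
    return V, pi
```

For state spaces too large to enumerate, V would be replaced by a function approximator over aggregated or feature-based states, as the structural credit assignment point above suggests.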