Miguel Angel Soto Santibanez

Overview Why they are important Previous Work Advantages and Shortcomings

Overview Advantages and Shortcomings New Technique Illustration

Overview Illustration Centralized Learning Agent Issue New Techniques

Overview New Techniques Evaluate Set of Techniques Summary of Contributions

Overview Summary of Contributions Software Tool

Early Sensorimotor Neural Networks Sensory System (receptors) Motor System (motor executors)

Modern Sensorimotor Neural Networks Sensory System Motor System Cerebellum

Natural Cerebellum FAST PRECISELY SYNCHRONIZED

Artificial Cerebellum

Previous Work Cerebellatron CMAC FOX SpikeFORCE SENSOPAC

Previous work strengths Cerebellatron improved movement smoothness in robots.

Previous work strengths CMAC provided function approximation with fast convergence.

Previous work strengths FOX improved convergence by making use of eligibility vectors.

Previous work strengths SpikeFORCE and SENSOPAC have improved our understanding of natural cerebellums.

Previous work strengths May allow us to: Treat nervous system ailments. Discover biological mechanisms.

Previous work strengths LWPR is a step in the right direction toward tackling the scalability issue.

Previous work issues Cerebellatron is difficult to use and requires very complicated control input

Previous work issues CMAC and FOX depend on fixed-size tiles and therefore do not scale well.

Previous work issues Methods proposed by SpikeFORCE and SENSOPAC require rare skills.

Previous work issues The LWPR method proposed by SENSOPAC only works well if the problem has just a few non-redundant and non-irrelevant dimensions.

Previous work issues Two Categories: 1) Framework Usability Issues. 2) Building Blocks Incompatibility Issues.

Previous work issues Two Categories: 1) Framework Usability Issues: very difficult to use; requires very specialized skills.

Previous work issues Two Categories: 2) Building Blocks Incompatibility Issues: memory incompatibility; processing incompatibility.

Proposed technique. Two Categories: Framework Usability Issues: new development framework

Proposed technique. Two Categories: Building Blocks Incompatibility Issues new I/O mapping algorithm

Proposed Technique Provides a shorthand notation.

Proposed Technique Provides a recipe.

Proposed Technique Provides simplification rules.

Proposed Technique Provides a more compatible I/O mapping algorithm. Moving Prototypes

Proposed Technique The shorthand notation symbols: a sensor:

Proposed Technique The shorthand notation symbols: an actuator:

Proposed Technique The shorthand notation symbols: a master learning agent:

Proposed Technique The shorthand notation symbols: the simplest artificial cerebellum:

Proposed Technique The shorthand notation symbols: an encoder:

Proposed Technique The shorthand notation symbols: a decoder:

Proposed Technique The shorthand notation symbols: an agent with sensors and actuators:

Proposed Technique The shorthand notation symbols: a sanity point: S

Proposed Technique The shorthand notation symbols: a slave sensor learning agent:

Proposed Technique The shorthand notation symbols: a slave actuator learning agent:

Proposed Technique The Recipe: 1) Become familiar with the problem at hand. 2) Enumerate significant factors. 3) Categorize factors as sensors, actuators, or both. 4) Specify sanity points.

Proposed Technique The Recipe: 5) Simplify overloaded agents. 6) Describe the system using the proposed shorthand notation. 7) Apply simplification rules. 8) Specify a reward function for each agent.

Proposed Technique The simplification rules: 1) Two agents in series can be merged into a single agent:

Proposed Technique The simplification rules: 2) It is OK to apply simplification rules as long as no sanity point is destroyed.

Proposed Technique The simplification rules: 3) If a decoder and an encoder share the same output and input signals, respectively, they can be deleted.

Proposed Technique The simplification rules: 4) Decoders with a single output signal can be deleted.

Proposed Technique The simplification rules: 5) Encoders with a single output signal can be deleted.

Proposed Technique The simplification rules: 6) If several agents receive signals from a single decoder and send their signals to a single encoder:

Proposed Technique Q-learning: Off-policy control algorithm. Temporal-Difference algorithm. Can be applied online.

Proposed Technique Q-learning: [diagram: learning agent (L. A.) and reward R]

Proposed Technique Q-learning: How does it do it? By learning an estimate of the long-term expected reward for any state-action pair.

Proposed Technique Q-learning: the value function maps each (state, action) pair to its long-term expected reward.

Proposed Technique How is the Q-function updated?
Initialize Q(s, a) arbitrarily
Repeat (for each episode):
    Initialize s
    Repeat (for each step of the episode):
        Choose a for s using an exploratory policy
        Take action a, observe r, s'
        Q(s, a) ← Q(s, a) + α [ r + γ max_a' Q(s', a') − Q(s, a) ]
        s ← s'
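
A minimal tabular Q-learning loop in Python, for illustration. The environment interface (reset()/step() returning next state, reward and a done flag), the hyperparameter values and the epsilon-greedy exploration are assumptions of this sketch; the slides only give the update rule above.

    import random
    from collections import defaultdict

    def q_learning(env, n_actions, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
        # Q maps a (state, action) pair to its estimated long-term expected reward.
        Q = defaultdict(float)

        def choose_action(s):
            # Exploratory (epsilon-greedy) policy.
            if random.random() < epsilon:
                return random.randrange(n_actions)
            return max(range(n_actions), key=lambda a: Q[(s, a)])

        for _ in range(episodes):
            s = env.reset()                      # Initialize s
            done = False
            while not done:
                a = choose_action(s)             # Choose a for s
                s_next, r, done = env.step(a)    # Take action a, observe r, s'
                best_next = max(Q[(s_next, a2)] for a2 in range(n_actions))
                # Q(s,a) <- Q(s,a) + alpha * [ r + gamma * max_a' Q(s',a') - Q(s,a) ]
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
                s = s_next                       # s <- s'
        return Q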

Proposed Technique Q-learning: [diagram: the value function as a table of states × actions]

Proposed Technique Proposed alternatives: Tile Coding, Kanerva Coding

Proposed Technique Distribution of memory resources:

Proposed Technique Moving Prototypes:

Proposed Technique Tile Based vs. Moving Prototypes:

Proposed Technique CMAC vs. Moving Prototypes:

Proposed Technique LWPR vs. Moving Prototypes:

Proposed Technique LWPR vs. Moving Prototypes: [diagram: LWPR network with Input, Hidden and Output layers]

Proposed Technique LWPR vs. Moving Prototypes: O(n) vs. O(log n)
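
The slides do not spell out the Moving Prototypes data structure, but the O(log n) claim suggests a tree-organized set of prototypes. The sketch below is only an illustration of that idea, assuming the value function is approximated by the nearest stored prototype and that the prototypes are kept in a kd-tree (here scipy.spatial.cKDTree), so a lookup costs roughly O(log n); it is not the author's actual algorithm and omits how prototypes move, split or merge.

    import numpy as np
    from scipy.spatial import cKDTree

    class PrototypeValueFunction:
        """Approximate Q(state, action) by the value of the nearest stored prototype."""

        def __init__(self, prototype_points, prototype_values):
            # prototype_points: (n, d) array of (state, action) feature vectors
            # prototype_values: (n,) array with one Q estimate per prototype
            self.values = np.asarray(prototype_values, dtype=float)
            self.tree = cKDTree(np.asarray(prototype_points, dtype=float))

        def query(self, point):
            # Nearest-neighbour search in a kd-tree: roughly O(log n) per lookup.
            _, idx = self.tree.query(point)
            return self.values[idx]

    # Example: four prototypes over a 2-D (state, action) space.
    vf = PrototypeValueFunction([[0, 0], [0, 1], [1, 0], [1, 1]],
                                [0.0, 0.5, -0.2, 1.0])
    print(vf.query([0.9, 0.8]))   # -> 1.0, the value of the closest prototype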

Proposed Technique LWPR vs. Moving Prototypes: one trillion interactions vs. a few dozen interactions

Centralized L.A. issue [diagram: L. A.; R; obstacles' positions; steering angle]

Centralized L.A. issue (possible solution) [diagram: centralized L.A. vs. distributed L.A.]

Distributed L.A. Additional Tasks: 1) Analyze distributed L.A. systems. 2) Identify the most important bottlenecks. 3) Propose amelioration techniques. 4) Evaluate their performance.

Distributed L.A. DEVS Java:

Distributed L.A. Some DEVS Java details: 1) Provides continuous-time support. 2) Models are easy to define. 3) No need to worry about the simulator. 4) Supports concurrency.

Distributed L.A. First technique: Learning will occur faster if we split the value function into slices along the “action” axis. [diagram: one states × actions table split into per-action slices]
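
A minimal sketch of this idea in Python: each agent owns the Q-values of a single action (one slice of the table), and the TD update only needs the per-slice maxima from the other agents. The class and method names are made up for illustration; the slides do not give an implementation.

    import numpy as np

    class ActionSliceAgent:
        """Owns the Q-values for exactly one action (one slice of the value function)."""

        def __init__(self, action, n_states, alpha=0.1, gamma=0.99):
            self.action = action
            self.q = np.zeros(n_states)          # Q(s, a) for this agent's action only
            self.alpha, self.gamma = alpha, gamma

        def update(self, s, r, s_next, max_q_next):
            # max_q_next = max over ALL agents' slices at s_next (obtained by polling or pushing).
            td_target = r + self.gamma * max_q_next
            self.q[s] += self.alpha * (td_target - self.q[s])

    # One agent per action; the maximum over the slices plays the role of max_a' Q(s', a').
    agents = [ActionSliceAgent(a, n_states=10) for a in range(3)]
    s, a, r, s_next = 0, 1, 1.0, 2
    max_q_next = max(agent.q[s_next] for agent in agents)
    agents[a].update(s, r, s_next, max_q_next)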

Distributed L.A. First Technique Results: 1) Without the technique: 3500 time units to learn to balance the pole. 2) With the technique: 1700 time units to learn to balance the pole. ~50% improvement.

Distributed L.A. Second technique: We suspect that we may be able to ameliorate this problem by making “sluggish” learning agents pass some of their load to “speedy” learning agents. [diagram: sluggish agent vs. speedy agent]

Distributed L.A. Second Technique Results: 1) time units to learn to balance pole. 2) time units to learn to balance pole. ~ 20% improvement.

Distributed L.A. Third Technique: Giving each distributed LA its own local simulator should improve learning performance significantly.

Distributed L.A. Third Technique Results: 1) time units to learn to balance pole. 2) time units to learn to balance pole. ~ 50% improvement.

Distributed L.A. Fourth Technique: Using a push policy instead of a poll policy to propagate max-values among the learning agents should improve learning performance significantly.
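
A rough sketch of the push idea, with made-up class and method names (the slides show no code): instead of each agent polling its peers for max Q(s', a') at every update, an agent whose own maximum for a state improves pushes the new value to the other agents, which keep it in a local cache.

    class PushAgent:
        """Q-learning agent that pushes improved per-state maxima to its peers."""

        def __init__(self, n_states, alpha=0.1, gamma=0.99):
            self.q = [0.0] * n_states            # this agent's own action slice
            self.cached_max = [0.0] * n_states   # best value pushed by any peer so far
            self.peers = []
            self.alpha, self.gamma = alpha, gamma

        def receive_max(self, state, value):
            # Called by a peer when its maximum for `state` improves (push policy).
            if value > self.cached_max[state]:
                self.cached_max[state] = value

        def update(self, s, r, s_next):
            # No polling: the cached maxima stand in for max_a' Q(s', a').
            target = r + self.gamma * max(self.q[s_next], self.cached_max[s_next])
            self.q[s] += self.alpha * (target - self.q[s])
            if self.q[s] > self.cached_max[s]:
                self.cached_max[s] = self.q[s]
                for peer in self.peers:          # push the improvement to the peers
                    peer.receive_max(s, self.q[s])

    a1, a2 = PushAgent(5), PushAgent(5)
    a1.peers, a2.peers = [a2], [a1]
    a1.update(0, 1.0, 1)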

Distributed L.A. Fourth Technique Results: 1) time units to learn to balance pole. 2) 6000 time units to learn to balance pole. ~ 60% improvement.

Distributed L.A. How does the Push policy scale?

Distributed L.A. Results: 1) time units to learn to balance pole. 2) time units to learn to balance pole. ~ 50% improvement.

Distributed L.A. Results suggest the use of a distributed system that: 1) divides the value function along the “action” axis; 2) auto-balances load from slow agents to fast ones; 3) allows each agent access to its own simulator; 4) uses a push policy to propagate max Q(s’, a’) information.

Contributions Explained the importance of Artificial Cerebellums.

Contributions Provided Analysis of previous work: Cerebellatron CMAC FOX SpikeFORCE SENSOPAC

Contributions Extracted most important shortcomings: 1) Framework usability. 2) Building blocks incompatibility.

Contributions Proposed to use a new framework

Contributions Proposed the use of Moving Prototypes [diagram: CMAC and FOX vs. Moving Prototypes]

Contributions Proposed the use of Moving Prototypes: O(n) vs. O(log n) [diagram: network with Input, Hidden and Output layers]

Contributions Illustrated new technique using a real-life problem

Contributions Proposed using a distributed learning agent system.

Contributions Proposed a set of techniques to ameliorate bottlenecks.

Future Work Development Tool:

Future Work Automation: [diagram: Train / Extract Policy workflow]

Future Work Intuitive: Extract Policy Train Components

Future Work Simple component configuration: Extract Policy Train Hardware Details Signal Table Details …. Components

Future Work Integrated simulator: Extract Policy Train Components

Future Work Available library: Water Components, Space Components, Land Components, Air Components [diagram: Components Library feeding the Train / Extract Policy workflow]

Questions or Comments

Asteroids Problem

Asteroids Problem (step 1)

Asteroids Problem (step 1) spacecraft

Asteroids Problem (step 2) 1) Distance to nearby asteroids (d) 2) Angle to nearby asteroids (a) 3) Angle to goal direction (g)

Asteroids Problem (step 3) 1) Distance to nearby asteroids (d) → Sensor

Asteroids Problem (step 3) 2) Angle to nearby asteroids (a) → Sensor, Actuator

Asteroids Problem (step 3) 3) Angle to goal direction → Sensor, Actuator

Asteroids Problem (step 3) Sensor 1 → find distance to nearby asteroids. Sensor 2 → find angle to nearby asteroids. Sensor 3 → find angle to goal direction. Actuator 1 → steer spacecraft (to avoid collision). Actuator 2 → steer spacecraft (to match goal).

Asteroids Problem (step 3) Sensor 1 → find distance to nearby asteroids. Sensor 2 → find angle to nearby asteroids. Sensor 3 → find angle to goal direction. Actuator 1 → steer spacecraft (to avoid collision and to match goal).

Asteroids Problem (step 3) The learning agent receives information about the environment from sensors S1, S2 and S3 and instructs actuator A1 what to do.

Asteroids Problem (step 4) One learning agent receives information about the environment from sensors S1, S2 and S3 and generates the “desired delta heading angle” signal. A second learning agent receives the “desired delta heading angle” signal and instructs actuator A1 what to do. (S: sanity point on the “desired delta heading angle” signal.)

Asteroids Problem (step 5) One learning agent receives information about the environment from sensors S1, S2 and S3 and generates the “desired delta heading angle” signal. A second learning agent receives the “desired delta heading angle” signal and instructs actuator A1 what to do. (S: sanity point on the “desired delta heading angle” signal.)

Asteroids Problem (step 6) [diagram: sensors S1, S2, S3; actuator A1; desired delta heading angle (sanity point S)]

Asteroids Problem (step 6) [diagram: radar; angles to nearby asteroids; distances to nearby asteroids; angle to goal; desired delta heading angle (sanity point S)]

Asteroids Problem (step 6)

Asteroids Problem
Distance signal [0,4]:
signal | distance (m)
0 | [0,19)
1 | [19,39)
2 | [39,59)
3 | [59,79)
4 | [79,∞)
Angle signal:
signal | angle (°)
0 | [0,22.5)
…
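
A small sketch of how such a discretization could be computed, using the distance bins from the table above (the function name and the handling of the open-ended last bin are just illustrative):

    def distance_to_signal(distance_m):
        """Map a raw distance in metres to the discrete signal 0..4."""
        bins = [19, 39, 59, 79]            # upper bounds of bins 0..3; bin 4 is [79, inf)
        for signal, upper in enumerate(bins):
            if distance_m < upper:
                return signal
        return 4

    print(distance_to_signal(42.0))        # -> 2, because 42 m falls in [39, 59)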

Asteroids Problem (step 6) [diagram: radar; angles to nearby asteroids [0,7]; angle to goal; distances to nearby asteroids [0,390624]; desired delta heading angle (sanity point S)]

Asteroids Problem (step 6) spacecraft

Asteroids Problem (step 6) [diagram: radar; angles to nearby asteroids [0,7]; angle to goal [0,7]; distances to nearby asteroids [0,390624]; desired delta heading angle (sanity point S)]

Asteroids Problem (step 6) [diagram: radar; angles to nearby asteroids [0,7]; angle to goal [0,7]; distances to nearby asteroids [0,390624]; desired delta heading angle [0,20] (sanity point S)]
Decoding of the desired delta heading angle signal:
signal | angle
0 | −10°
1 | −9°
…
10 | 0°
…
19 | 9°
20 | 10°
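
The encoders and decoders in these diagrams pack several discrete signals into a single integer and back. A generic mixed-radix sketch (the function names and the exact signal composition are assumptions; for example, eight signals with 5 levels each would give the range [0,390624], since 5^8 = 390625):

    def encode(signals, radices):
        """Pack a list of discrete signals (each in [0, radix)) into one integer."""
        code = 0
        for value, radix in zip(signals, radices):
            code = code * radix + value
        return code

    def decode(code, radices):
        """Inverse of encode: unpack the integer back into the individual signals."""
        signals = []
        for radix in reversed(radices):
            signals.append(code % radix)
            code //= radix
        return list(reversed(signals))

    radices = [5] * 8                      # e.g. 8 signals with 5 levels each
    print(encode([4] * 8, radices))        # -> 390624, the top of the [0,390624] range
    print(decode(390624, radices))         # -> [4, 4, 4, 4, 4, 4, 4, 4]

    def signal_to_angle(signal):
        """Decode the [0,20] desired delta heading angle signal into degrees (table above)."""
        return signal - 10                 # 0 -> -10 deg, 10 -> 0 deg, 20 -> +10 deg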

Asteroids Problem (step 6)

[diagram: radar; angles to nearby asteroids [0,7]; angle to goal [0,7]; distances to nearby asteroids [0,390624]; desired delta heading angle [0,20] (sanity point S); propulsion system; propulsion signal [0,100]]

Asteroids Problem (step 7) [diagram: same architecture and signal ranges as in step 6]

Asteroids Problem (step 8) [diagram: same architecture and signal ranges as in step 6]

Asteroids Problem (step 8) FIRST AGENT: spacecraft environment → desired delta heading angle [0,20]. SECOND AGENT: desired delta heading angle [0,20] → propulsion signal [0,100].

Asteroids Problem (step 8) FIRST AGENT REWARD FUNCTION: Reward = … if distance to an asteroid < 10 m; else => (180 − |angleToGoal|). [diagram: spacecraft environment → desired delta heading angle [0,20]]

Asteroids Problem (step 8) SECOND AGENT REWARD FUNCTION: Reward = −|achievedAngle − desiredAngle|. [diagram: desired delta heading angle [0,20] → propulsion signal [0,100]]
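
A sketch of both reward functions in Python. The collision penalty of the first agent is not legible in the transcript, so COLLISION_PENALTY is a made-up placeholder value; the rest follows the formulas on the slides.

    COLLISION_PENALTY = -1000.0   # placeholder: the slide's actual penalty value is not legible

    def master_reward(distances_to_asteroids, angle_to_goal_deg):
        """First agent: penalize coming within 10 m of an asteroid, otherwise reward heading toward the goal."""
        if min(distances_to_asteroids) < 10.0:
            return COLLISION_PENALTY
        return 180.0 - abs(angle_to_goal_deg)

    def slave_reward(achieved_angle_deg, desired_angle_deg):
        """Second agent: penalize the gap between achieved and desired delta heading angle."""
        return -abs(achieved_angle_deg - desired_angle_deg)

    print(master_reward([55.0, 120.0], angle_to_goal_deg=30.0))   # -> 150.0
    print(slave_reward(4.0, 7.0))                                 # -> -3.0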

Asteroids Problem (training)

spacecraft

Asteroids Problem (issues) [diagram: radar; angles to nearby asteroids [0,7]; angle to goal [0,7]; distances to nearby asteroids [0,390624]; desired delta heading angle [0,20] (sanity point S); propulsion system; propulsion signal [0,100]]

Asteroids Problem (issues) 1) The master L.A. took much longer to converge.

Asteroids Problem (issues) 2) The trained system was not as good as an experienced human driver.

Asteroids Problem (issues) 3) The slave L.A. should probably react differently to a given command depending on the current angular velocity.

Asteroids Problem (possible solutions) 1) Simplify the master agent.

Asteroids Problem (possible solutions) 2) Use more precise angles to characterize a nearby asteroid.

Asteroids Problem (possible solutions) 3) Allow the slave agent to know the current angular velocity before selecting the appropriate steering signal.

Asteroids Problem (step 1) Radars are not very precise at predicting the distance to a target. [plot: signal strength vs. θ]

Asteroids Problem (step 1)

Asteroids Problem (step 2) New factors: 1) angular velocity around azimuth of the spacecraft

Asteroids Problem (step 3) Angular velocity around azimuth → Sensor

Asteroids Problem (step 3) One learning agent receives information about the environment from sensors S1, S2 and S3 and generates the “desired delta heading angle” signal. A second learning agent receives the “desired delta heading angle” signal and the current angular velocity from S4, and then instructs actuator A1 what to do. (S: sanity point on the “desired delta heading angle” signal.)

Asteroids Problem (step 4) One learning agent receives information about the environment from sensor S1 and generates the “distance to closest asteroids” signal. Another learning agent receives the position of the closest asteroids and the current heading from sensor S3 and generates the “desired delta heading angle” signal. A third learning agent receives the “desired delta heading angle” signal and the current angular velocity, and then instructs actuator A1 what to do. (Sanity points S: “position of closest asteroids” and “desired delta heading angle”.)

Asteroids Problem (step 4) Laser Verification

Asteroids Problem (step 5) 1) Slave Sensor Learning Agent: signal from radar → distance to closest asteroids. 2) Master Learning Agent: closest asteroids' positions and angle to goal → desired delta heading angle. 3) Slave Actuator Learning Agent: desired delta heading angle → steering signal.
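
A minimal sketch of how the three learning agents could be chained at run time. The Policy class and the stand-in lambdas are placeholders just to make the pipeline runnable; the slides only specify which signals flow between the agents.

    class Policy:
        """Placeholder for a trained learning agent: maps an input signal to an output signal."""
        def __init__(self, fn):
            self.fn = fn
        def act(self, signal):
            return self.fn(signal)

    # Stand-in policies (arbitrary functions), one per agent.
    slave_sensor = Policy(lambda radar_signal: radar_signal % 125)             # -> distances to closest asteroids
    master = Policy(lambda inputs: (inputs[0] + inputs[1]) % 21)               # -> desired delta heading angle [0,20]
    slave_actuator = Policy(lambda inputs: (inputs[0] * 5 + inputs[1]) % 101)  # -> propulsion signal [0,100]

    def control_step(radar_signal, angle_to_goal, angular_velocity):
        distances = slave_sensor.act(radar_signal)
        desired_delta = master.act((distances, angle_to_goal))
        return slave_actuator.act((desired_delta, angular_velocity))

    print(control_step(radar_signal=402, angle_to_goal=3, angular_velocity=4))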

Asteroids Problem (step 6) [diagram: radar, gyroscope and solar wind detector; position of closest asteroids (sanity point S); angle to goal [0,7]; desired delta heading angle [0,20] (sanity point S); propulsion system; propulsion signal [0,100]]

Asteroids Problem (step 6)

[plot: signal strength vs. θ]

Asteroids Problem (step 6) [plot: signal strength vs. θ]

Asteroids Problem (step 6)
Distance signal [0,4]:
signal | distance (m)
0 | [0,19)
1 | [19,39)
2 | [39,59)
3 | [59,79)
4 | [79,∞)
Angle signal [0,4]:
signal | angle (°)
0 | [0,2)
1 | [2,4)
2 | [4,6)
3 | [6,8)
4 | [8,10)

Asteroids Problem (step 6) [plot: signal strength vs. θ, with peaks falling in distance bins 4 ([79,∞)), 1 ([19,39)) and 2 ([39,59))]

Asteroids Problem (step 6) [plot: signal strength vs. θ, with peaks falling in angle bins 4 ([8,10)), 2 ([4,6)) and 1 ([2,4))]

Asteroids Problem (step 6) [diagram: radar, gyroscope and solar wind detector; distance to closest asteroids [0,124]; angle to closest asteroids; position of closest asteroids (sanity point S); angle to goal [0,7]; desired delta heading angle [0,20] (sanity point S); propulsion system; propulsion signal [0,100]; additional signal ranges [0,15624] and [0,124999]]

Asteroids Problem (step 6) [plot: signal strength vs. θ]

Asteroids Problem (step 6)
Radar amplitude signal [0,499]:
signal | amplitude
0 | [0,0.01)
1 | [0.01,0.02)
…
497 | [4.97,4.98)
498 | [4.98,4.99)
499 | [4.99,∞)

Asteroids Problem (step 6) [diagram: as in the previous architecture diagram, now also showing the radar amplitude signal range [0,499]]

Asteroids Problem (step 6)
Angular speed signal [0,9]:
signal | angular speed (°/s)
0 | (−∞,−7]
1 | (−7,−5]
2 | (−5,−3]
3 | (−3,−1]
4 | (−1,1)
5 | [1,3)
6 | [3,5)
7 | [5,7)
8 | [7,∞)

Asteroids Problem (step 6) [diagram: radar, gyroscope and solar wind detector; position of closest asteroids (sanity point S); angle to goal [0,7]; desired delta heading angle [0,20] (sanity point S); propulsion system; propulsion signal [0,100]; signal ranges [0,15624], [0,124], [0,124999], [0,8] and [0,188]]
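
One consistent reading of these combined ranges (an assumption; the slides never state the composition explicitly) is that three closest asteroids are tracked, each with 5-level distance and angle signals:

    # Plausible composition of the encoded ranges (an assumption, not stated on the slides).
    assert 5 ** 3 == 125          # distance signals for 3 asteroids, 5 levels each -> [0,124]
    assert 25 ** 3 == 15625       # (distance, angle) pairs for 3 asteroids -> [0,15624]
    assert 15625 * 8 == 125000    # positions combined with the angle-to-goal signal [0,7] -> [0,124999]
    assert 21 * 9 == 189          # desired delta heading angle [0,20] x angular speed [0,8] -> [0,188]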

Asteroids Problem (step 7) [diagram: same architecture and signal ranges as in step 6]

Asteroids Problem (step 8) [diagram: same architecture and signal ranges as in step 6]

Asteroids Problem (step 8) FIRST AGENT: radar signal [0,499] → distances to closest asteroids [0,124]

Asteroids Problem (step 8) SECOND AGENT: position of closest asteroids [0,124999] → desired delta heading angle [0,20]

Asteroids Problem (step 8) THIRD AGENT: desired delta heading angle and current angular velocity [0,188] → propulsion signal [0,100]

Asteroids Problem (step 8) FIRST AGENT (radar signal → distances to closest asteroids [0,124]) REWARD FUNCTION: for each distance, reward = −10 · deltaDistance, where deltaDistance = |distance − laserDistance|.
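
In code form (a sketch; the function name and the plain sum over the tracked asteroids are assumptions):

    def sensor_agent_reward(estimated_distances, laser_distances):
        """Reward = -10 * |estimated - laser-verified| for each tracked asteroid."""
        return sum(-10.0 * abs(est - laser)
                   for est, laser in zip(estimated_distances, laser_distances))

    print(sensor_agent_reward([20.0, 60.0, 90.0], [22.0, 58.0, 95.0]))   # -> -90.0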

Asteroids Problem (step 8) SECOND AGENT (position of closest asteroids [0,124999] → desired delta heading angle [0,20]) REWARD FUNCTION: reward = … if distance to an asteroid …; else => (180 − |angleToGoal|).

Asteroids Problem (step 8) THIRD AGENT (desired delta heading angle [0,209] → propulsion signal [0,100]) REWARD FUNCTION: reward = −|achievedAngle − desiredAngle|.

Asteroids Problem [diagram: the final architecture with all signal ranges, as in the step 6–8 diagrams]