Presentation on theme: "Miguel Angel Soto Santibanez. Overview Why they are important Previous Work Advantages and Shortcomings."— Presentation transcript:

1 Miguel Angel Soto Santibanez

2 Overview Why they are important Previous Work Advantages and Shortcomings

3 Overview Advantages and Shortcomings New Technique Illustration

4 Overview Illustration Centralized Learning Agent Issue New Techniques

5 Overview New Techniques Evaluate Set of Techniques Summary of Contributions

6 Overview Summary of Contributions Software Tool

7 Early Sensorimotor Neural Networks Sensory System (receptors) Motor System (motor executors)

8 Modern Sensorimotor Neural Networks Sensory System Motor System Cerebellum

9 Natural Cerebellum FAST PRECISELY SYNCHRONIZED

10 Artificial Cerebellum

11 Previous Work Cerebellatron CMAC FOX SpikeFORCE SENSOPAC

12 Previous work strengths Cerebellatron improved movement smoothness in robots.

13 Previous work strengths CMAC provided function approximation with fast convergence.

14 Previous work strengths FOX improved convergence by making use of eligibility vectors.

15 Previous work strengths SpikeFORCE and SENSOPAC have improved our understanding of the natural cerebellum.

16 Previous work strengths May allow us to: Treat nervous system ailments. Discover biological mechanisms.

17 Previous work strengths LWPR is a step in the right direction toward tackling the scalability issue.

18 Previous work issues Cerebellatron is difficult to use and requires very complicated control input.

19 Previous work issues CMAC and FOX depend on fixed-size tiles and therefore do not scale well.

20 Previous work issues Methods proposed by SpikeFORCE and SENSOPAC require rare skills.

21 Previous work issues The LWPR method proposed by SENSOPAC only works well if the problem has only a few non-redundant and non-irrelevant dimensions.

22 Previous work issues Two Categories: 1) Framework Usability Issues 2) Building Blocks Incompatibility Issues

23 Previous work issues Two Categories: 1) Framework Usability Issues: very difficult to use; requires very specialized skills

24 Previous work issues Two Categories: 2) Building Blocks Incompatibility Issues: memory incompatibility; processing incompatibility

25 Proposed Technique Two Categories: Framework Usability Issues: new development framework

26 Proposed Technique Two Categories: Building Blocks Incompatibility Issues: new I/O mapping algorithm

27 Proposed Technique Provides a shorthand notation.

28 Proposed Technique Provides a recipe.

29 Proposed Technique Provides simplification rules.

30 Proposed Technique Provides a more compatible I/O mapping algorithm. Moving Prototypes

31 Proposed Technique The shorthand notation symbols: a sensor:

32 Proposed Technique The shorthand notation symbols: an actuator:

33 Proposed Technique The shorthand notation symbols: a master learning agent:

34 Proposed Technique The shorthand notation symbols: the simplest artificial cerebellum:

35 Proposed Technique The shorthand notation symbols: an encoder:

36 Proposed Technique The shorthand notation symbols: a decoder:

37 Proposed Technique The shorthand notation symbols: an agent with sensors and actuators:

38 Proposed Technique The shorthand notation symbols: a sanity point: S

39 Proposed Technique The shorthand notation symbols: a slave sensor learning agent:

40 Proposed Technique The shorthand notation symbols: a slave actuator learning agent:

41 Proposed Technique The Recipe: 1) Become familiar with the problem at hand. 2) Enumerate the significant factors. 3) Categorize the factors as sensors, actuators, or both. 4) Specify sanity points.

42 Proposed Technique The Recipe: 5) Simplify overloaded agents. 6) Describe the system using the proposed shorthand notation. 7) Apply the simplification rules. 8) Specify a reward function for each agent.

43 Proposed Technique The simplification rules: 1) Two agents in series can be merged into a single agent:

44 Proposed Technique The simplification rules: 2) It is OK to apply simplification rules as long as no sanity point is destroyed. S

45 Proposed Technique The simplification rules: 3) If a decoder and an encoder share the same output and input signals, respectively, they can be deleted.

46 Proposed Technique The simplification rules: 4) Decoders with a single output signal can be deleted.

47 Proposed Technique The simplification rules: 5) Encoders with a single output signal can be deleted.

48 Proposed Technique The simplification rules: 6) If several agents receive signals from a single decoder and send their signals to a single encoder:

49 Proposed Technique Q-learning: Off-policy control algorithm. Temporal-Difference algorithm. Can be applied online.

50 Proposed Technique L. A. R Q-learning:

51 Proposed Technique Q-learning: How does it do it? By learning an estimate of the long-term expected reward for any state-action pair.

52 Proposed Technique Q-learning value function: (state, action) → long-term expected reward.

53 Proposed Technique Q-learning: (state, action) → long-term expected reward.

54 Proposed Technique Q-learning: (state, action) → long-term expected reward. [table: example entries: (64, 140), (72, 140), (80, 140), (88, 140), (96, 140), (104, 140), (112, 140)]

55 Proposed Technique Q-learning: (state, action) → long-term expected reward. [table: example entries: (0.2, 0.70), (0.4, 0.75), (0.6, 0.90), (0.8, 0.95), (1.0, 0.80), (1.2, 0.90), (1.4, 0.55)]

56 Proposed Technique How is the Q-function updated?
Initialize Q(s, a) arbitrarily
Repeat (for each episode):
    Initialize s
    Repeat (for each step of the episode):
        Choose a for s using an exploratory policy
        Take action a, observe r, s'
        Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]
        s ← s'
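A minimal sketch of this update loop in Python, assuming a dictionary-backed Q-table and a hypothetical environment object exposing `actions`, `reset()`, and `step()` (these names are illustrative, not part of the original work):

```python
import random
from collections import defaultdict

def q_learning(env, episodes, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)                  # Q(s, a) defaults to 0 for unseen pairs

    def choose_action(s):
        # epsilon-greedy exploratory policy
        if random.random() < epsilon:
            return random.choice(env.actions)
        return max(env.actions, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            a = choose_action(s)
            r, s_next, done = env.step(a)   # hypothetical environment API
            best_next = max(Q[(s_next, a2)] for a2 in env.actions)
            # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```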

57 Proposed Technique Q-learning: [figure: the value function as a 9×9 grid of states and actions]

58 Proposed Technique Tile Coding Kanerva Coding Proposed alternatives:
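For reference, a minimal sketch of the kind of mapping tile coding performs on a continuous input; the 1-D state and the evenly offset tilings are assumptions made only for illustration (Kanerva coding would instead measure proximity to a fixed set of prototype states):

```python
def tile_indices(x, n_tilings=4, tile_width=1.0):
    """Return one active (tiling, tile) index per tiling for a continuous value x.

    Each tiling is shifted by a fraction of the tile width, so nearby values
    share most, but not all, of their active tiles.
    """
    indices = []
    for t in range(n_tilings):
        offset = t * tile_width / n_tilings
        indices.append((t, int((x + offset) // tile_width)))
    return indices

print(tile_indices(3.2))   # [(0, 3), (1, 3), (2, 3), (3, 3)]
print(tile_indices(3.9))   # [(0, 3), (1, 4), (2, 4), (3, 4)]
```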

59 Proposed Technique Distribution of memory resources:

60 Proposed Technique Moving Prototypes:

61 Proposed Technique Tile Based vs. Moving Prototypes:

62 Proposed Technique CMAC vs. Moving Prototypes:

63 Proposed Technique LWPR vs. Moving Prototypes:

64 Proposed Technique LWPR vs. Moving Prototypes: [figure: network with Input, Hidden, and Output layers]

65 Proposed Technique LWPR vs. Moving Prototypes: [figure] O(n) vs. O(log n)

66 Proposed Technique LWPR vs. Moving Prototypes: [figure] one trillion interactions vs. a few dozen interactions
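The slides do not spell out the Moving Prototypes data structure, so the following is only an assumption-laden sketch of how a prototype store could reach the O(log n) lookup claimed above: keep value prototypes sorted over a 1-D state and find the nearest one by binary search instead of a linear scan.

```python
import bisect

class PrototypeStore:
    """Illustrative 1-D stand-in, not the dissertation's actual structure."""

    def __init__(self, prototypes):
        self.prototypes = sorted(prototypes)            # list of (state, value)
        self.states = [s for s, _ in self.prototypes]

    def value(self, state):
        # Binary search for the nearest prototype: O(log n) rather than O(n)
        i = bisect.bisect_left(self.states, state)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(self.states)]
        nearest = min(candidates, key=lambda j: abs(self.states[j] - state))
        return self.prototypes[nearest][1]

store = PrototypeStore([(0.0, 1.0), (2.5, 0.4), (7.0, -0.2)])
print(store.value(3.1))    # nearest prototype is at state 2.5 -> 0.4
```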

67 Centralized L.A. issue Steering angle L. A. R R R Obstacles’ positions

68 Centralized L.A. issue L. A. R R R

69 Centralized L.A. issue L. A. R R R

70 Centralized L.A. issue L. A. R R R

71 Centralized L.A. issue (possible solution) Centralized L.A.: Distributed L.A.: L. A.

72 Distributed L.A. Additional Tasks: 1)Analyze Distributed L.A. Systems. 2)Identify most important bottlenecks. 3)Propose amelioration techniques. 4)Evaluate their performance.

73 Distributed L.A. DEVS Java:

74 Distributed L.A. Some DEVS Java details: 1) Provides continuous-time support. 2) Models are easy to define. 3) No need to worry about the simulator. 4) Supports concurrency.

75 Distributed L.A. First technique: Learning will occur faster if we split the value function into slices along the “action” axis. [figure: value function divided into slices of states × actions]
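A minimal sketch of the slicing idea, assuming each learning agent owns a contiguous block of the action range and a coordinator takes the maximum over the agents' reports:

```python
def split_actions(actions, n_agents):
    """Assign each agent a contiguous slice of the action axis."""
    chunk = (len(actions) + n_agents - 1) // n_agents
    return [actions[i:i + chunk] for i in range(0, len(actions), chunk)]

class SlicedAgent:
    def __init__(self, my_actions):
        self.my_actions = my_actions
        self.Q = {}                 # holds (state, action) pairs for this slice only

    def best_value(self, state):
        return max(self.Q.get((state, a), 0.0) for a in self.my_actions)

agents = [SlicedAgent(s) for s in split_actions(list(range(9)), 3)]
print([a.my_actions for a in agents])                    # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
best = max(agent.best_value("s0") for agent in agents)   # coordinator's max over slices
```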

76 Distributed L.A. First Technique Results: 1) 3500 time units to learn to balance the pole. 2) 1700 time units to learn to balance the pole. ~50% improvement.

77 Distributed L.A. Second technique: We suspect that we may be able to ameliorate this problem by making “sluggish” learning agents pass some of their load to “speedy” learning agents. [figure: sluggish agent, speedy agent]
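A sketch of that load-shedding idea; the rule of moving one action at a time from the slowest agent to the fastest is an assumption chosen only to make the mechanism concrete:

```python
def rebalance(assignments, updates_per_second):
    """Move one action from the most sluggish agent to the speediest one.

    assignments: agent id -> list of actions it currently owns
    updates_per_second: agent id -> measured learning throughput
    """
    sluggish = min(updates_per_second, key=updates_per_second.get)
    speedy = max(updates_per_second, key=updates_per_second.get)
    if sluggish != speedy and len(assignments[sluggish]) > 1:
        assignments[speedy].append(assignments[sluggish].pop())
    return assignments

assignments = {"A": [0, 1, 2, 3, 4], "B": [5, 6, 7, 8]}
rates = {"A": 120.0, "B": 800.0}                 # A is the sluggish agent here
print(rebalance(assignments, rates))             # A sheds one action to B
```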

78 Distributed L.A. Second Technique Results: 1) 52000 time units to learn to balance the pole. 2) 42000 time units to learn to balance the pole. ~20% improvement.

79 Distributed L.A. Third Technique: Giving each distributed LA its own local simulator should improve learning performance significantly.

80 Distributed L.A. Third Technique Results: 1) 29000 time units to learn to balance the pole. 2) 14000 time units to learn to balance the pole. ~50% improvement.

81 Distributed L.A. Fourth Technique: Using a push policy instead of a poll policy to propagate max-values among the learning agents should improve learning performance significantly.
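A sketch of the difference, under the assumption that each agent caches the best successor-state values reported by its peers: with a poll policy every agent asks all peers on every update, whereas with a push policy an agent broadcasts only when its own local maximum changes.

```python
class PushAgent:
    def __init__(self, name):
        self.name = name
        self.peers = []
        self.local_max = {}     # state -> best Q over this agent's own actions
        self.peer_max = {}      # (peer name, state) -> last value pushed by that peer

    def update_local_max(self, state, value):
        # Push only when the local maximum actually changes,
        # so peers never have to poll for it.
        if self.local_max.get(state) != value:
            self.local_max[state] = value
            for peer in self.peers:
                peer.receive_max(self.name, state, value)

    def receive_max(self, sender, state, value):
        self.peer_max[(sender, state)] = value

    def global_max(self, state):
        values = [self.local_max.get(state, 0.0)]
        values += [v for (p, s), v in self.peer_max.items() if s == state]
        return max(values)

a, b = PushAgent("a"), PushAgent("b")
a.peers, b.peers = [b], [a]
a.update_local_max("s1", 0.8)       # pushed to b exactly once
print(b.global_max("s1"))           # 0.8, without b ever polling a
```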

82 Distributed L.A. Fourth Technique Results: 1) 14000 time units to learn to balance the pole. 2) 6000 time units to learn to balance the pole. ~60% improvement.

83 Distributed L.A. How does the Push policy scale?

84 Distributed L.A. Results: 1) 24000 time units to learn to balance the pole. 2) 12000 time units to learn to balance the pole. ~50% improvement.

85 Distributed L.A. Results suggest the use of a distributed system that: 1) divides the value function along the “action” axis. 2) auto-balances load from slow agents to fast ones. 3) allows each agent access to its own simulator. 4) uses a push policy to propagate max Q(s’, a’) information.

86 Contributions Explained the importance of artificial cerebellums.

87 Contributions Provided an analysis of previous work: Cerebellatron, CMAC, FOX, SpikeFORCE, SENSOPAC.

88 Contributions Extracted the most important shortcomings: 1) Framework usability. 2) Building-block incompatibility.

89 Contributions Proposed the use of a new framework.

90 Contributions Proposed the use of Moving Prototypes: CMAC, FOX Moving Prototypes

91 Contributions Proposed the use of Moving Prototypes: [figure: network with Input, Hidden, and Output layers] O(n) vs. O(log n)

92 Contributions Illustrated the new technique using a real-life problem.

93 Contributions Proposed using a distributed agent system. L. A.

94 Contributions Proposed a set of techniques to ameliorate bottlenecks.

95 Future Work Development Tool:

96 Future Work Automation: Extract Policy Train

97 Future Work Intuitive: Extract Policy Train Components

98 Future Work Simple component configuration: Extract Policy Train Hardware Details Signal Table Details …. Components

99 Future Work Integrated simulator: Extract Policy Train Components

100 Extract Policy Train Components Library Water Components Space Components Land Components Air Components Future Work Available library:

101 Questions or Comments

102

103

104 Asteroids Problem

105 Asteroids Problem (step 1)

106 spacecraft

107 Asteroids Problem (step 1) spacecraft

108 Asteroids Problem (step 2) 1) Distance to nearby asteroids 2) Angle to nearby asteroids 3) Angle to goal direction (figure labels: d, a, g)

109 Asteroids Problem (step 3) 1) Distance to nearby asteroids → Sensor (d)

110 Asteroids Problem (step 3) 2) Angle to nearby asteroids → Sensor, Actuator (a)

111 Asteroids Problem (step 3) 3) Angle to goal direction → Sensor, Actuator

112 Asteroids Problem (step 3) Sensor 1 → find distance to nearby asteroids. Sensor 2 → find angle to nearby asteroids. Sensor 3 → find angle to goal direction. Actuator 1 → steer spacecraft (to avoid collision). Actuator 2 → steer spacecraft (to match goal).

113 Asteroids Problem (step 3) Sensor 1 → find distance to nearby asteroids. Sensor 2 → find angle to nearby asteroids. Sensor 3 → find angle to goal direction. Actuator 1 → steer spacecraft (to avoid collision and to match goal).

114 Asteroids Problem (step 3) The learning agent receives information about the environment from sensors S1, S2, and S3 and instructs actuator A1 what to do in each case.

115 Asteroids Problem (step 4) The first learning agent receives information about the environment from sensors S1, S2, and S3 and generates the “desired delta heading angle” signal. The second learning agent receives the “desired delta heading angle” signal and instructs actuator A1 what to do in each case. (S: desired delta heading angle)

116 Asteroids Problem (step 5) The first learning agent receives information about the environment from sensors S1, S2, and S3 and generates the “desired delta heading angle” signal. The second learning agent receives the “desired delta heading angle” signal and instructs actuator A1 what to do in each case. (S: desired delta heading angle)

117 Asteroids Problem (step 6) S S1 S2 S3 A1 desired delta heading angle

118 Asteroids Problem (step 6) S radar angles to nearby asteroids distances to nearby asteroids desired delta heading angle angle to goal

119 Asteroids Problem (step 6)

120

121 Asteroids Problem Signal quantization. Distance signal [0,4] (m): 0 = [0,19), 1 = [19,39), 2 = [39,59), 3 = [59,79), 4 = [79,∞). Angle signal (°): 0 = [0,22.5), … (remaining rows not legible on the slide).
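A minimal sketch of the quantization described by the table above; the distance bin edges come straight from the slide, while the 22.5° angle bin width is assumed from the single visible angle row:

```python
import bisect

DISTANCE_EDGES = [19, 39, 59, 79]      # meters, from the table above

def distance_signal(distance_m):
    """[0,19) -> 0, [19,39) -> 1, [39,59) -> 2, [59,79) -> 3, [79,inf) -> 4."""
    return bisect.bisect_right(DISTANCE_EDGES, distance_m)

def angle_signal(angle_deg, bin_width=22.5):
    """Map an angle in degrees to its sector signal (bin width assumed)."""
    return int((angle_deg % 360) // bin_width)

print(distance_signal(45.0))   # 2
print(angle_signal(30.0))      # 1
```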

122 Asteroids Problem (step 6) S Radar angles to nearby asteroids [0,7] angle to goal distances to nearby asteroids [0,390624] desired delta heading angle

123 Asteroids Problem (step 6) spacecraft

124 Asteroids Problem (step 6) S Radar angles to nearby asteroids [0,7] angle to goal [0,7] distances to nearby asteroids [0,390624] desired delta heading angle
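The range [0,390624] above is consistent with eight quantized distance readings, each in [0,4], packed positionally in base 5 (5^8 − 1 = 390624); the packing below is an assumption about how the encoder combines its inputs, not something stated explicitly on the slide:

```python
def encode(signals, base):
    """Pack several small signals into one composite signal (positional encoding)."""
    code = 0
    for s in signals:
        code = code * base + s
    return code

# Eight distance signals, each in [0,4], pack into [0, 5**8 - 1] = [0, 390624]
distances = [4, 0, 2, 1, 4, 3, 0, 2]
print(encode(distances, 5), 5**8 - 1)   # composite signal and the range maximum 390624
```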

125 Asteroids Problem (step 6) S Radar angles to nearby asteroids [0,7] angle to goal [0,7] distances to nearby asteroids [0,390624] [0,24999999] [0,20] desired delta heading angle. Delta heading angle signal (°): 0 = -10, 1 = -9, …, 10 = 0, …, 19 = 9, 20 = 10.

126 Asteroids Problem (step 6)

127

128

129 S Radar angles to nearby asteroids [0,7] angle to goal [0,7] distances to nearby asteroids [0,390624] [0,24999999][0,20] propulsion system propulsion signal [0,100] [0,20] desired delta heading angle

130 Asteroids Problem (step 7) S Radar angles to nearby asteroids [0,7] angle to goal [0,7] distances to nearby asteroids [0,390624] [0,24999999][0,20] propulsion system propulsion signal [0,100] [0,20] desired delta heading angle

131 Asteroids Problem (step 8) S Radar angles to nearby asteroids [0,7] angle to goal [0,7] distances to nearby asteroids [0,390624] [0,24999999][0,20] propulsion system propulsion signal [0,100] [0,20] desired delta heading angle

132 Asteroids Problem (step 8) FIRST AGENT: spacecraft environment [0,24999999] → [0,20] desired delta heading angle. SECOND AGENT: desired delta heading angle [0,20] → propulsion signal [0,100].

133 Asteroids Problem (step 8) FIRST AGENT: spacecraft environment [0,24999999] → [0,20] desired delta heading angle. REWARD FUNCTION: reward = if distance to an asteroid < 10 m → -10000, else → (180 − |angleToGoal|).

134 Asteroids Problem (step 8) SECOND AGENT: REWARD FUNCTION: Reward = - |achievedAngle – desiredAngle| desired delta heading angle propulsion signal [0,20] [0,100]
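A sketch of the two reward functions exactly as stated on the slides; the argument names (`distances_to_asteroids`, `angle_to_goal`, `achieved_angle`, `desired_angle`) are illustrative placeholders:

```python
def master_reward(distances_to_asteroids, angle_to_goal):
    """First agent: heavy penalty near an asteroid, otherwise reward heading toward the goal."""
    if min(distances_to_asteroids) < 10.0:       # meters
        return -10000.0
    return 180.0 - abs(angle_to_goal)            # degrees

def slave_reward(achieved_angle, desired_angle):
    """Second agent: penalize the gap between the achieved and requested heading change."""
    return -abs(achieved_angle - desired_angle)

print(master_reward([50.0, 120.0], 15.0))   # 165.0
print(slave_reward(8.0, 10.0))              # -2.0
```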

135 Asteroids Problem (training)

136 spacecraft

137 Asteroids Problem (issues) S Radar angles to nearby asteroids [0,7] angle to goal [0,7] distances to nearby asteroids [0,390624] [0,24999999] desired delta heading angle [0,20] propulsion system propulsion signal [0,100] [0,20]

138 Asteroids Problem (issues) 1)Master L.A. took much longer to converge.

139 Asteroids Problem (issues) 2)Trained system not as good as experienced human driver.

140 Asteroids Problem (issues) 3)Slave L.A. should probably react differently to a given command depending on current angular velocity.

141 Asteroids Problem (possible solutions) 1) Should simplify the Master Agent. L. A.

142 Asteroids Problem (possible solutions) 2) Use more precise angles in order to characterize a nearby asteroid.

143 Asteroids Problem (possible solutions) 3) Allow the slave agent to know the current angular velocity before selecting the appropriate steering signal.

144 Asteroids Problem (step 1) Radars are not very precise with respect to predicting distance to target. θ Signal Strength θ

145 Asteroids Problem (step 1)

146 Asteroids Problem (step 2) New factors: 1) angular velocity around azimuth of the spacecraft

147 Asteroids Problem (step 3) Angular velocity around azimuth → Sensor

148 Asteroids Problem (step 3) Angular velocity around azimuth → Sensor

149 Asteroids Problem (step 3) The first learning agent receives information about the environment from sensors S1, S2, and S3 and generates the “desired delta heading angle” signal. The second learning agent receives the “desired delta heading angle” signal and the current angular velocity from S4, and then instructs actuator A1 what to do in each case. (S: desired delta heading angle)

150 Asteroids Problem (step 4) The master learning agent receives the position of the closest asteroids and the current heading from sensor S3 and generates the “desired delta heading angle” signal. The slave actuator learning agent receives the “desired delta heading angle” signal and the current angular velocity, and then instructs actuator A1 what to do in each case. (S: desired delta heading angle, position of closest asteroids) The slave sensor learning agent receives information about the environment from sensor S1 and generates the “distance to closest asteroids” signal.

151 Asteroids Problem (step 4) Laser Verification

152 Asteroids Problem (step 5) 1) Slave Sensor Learning Agent: signal from radar → distance to closest asteroids. 2) Master Learning Agent: position of closest asteroids and angle to goal → desired delta heading angle. 3) Slave Actuator Learning Agent: desired delta heading angle → steering signal.

153 Asteroids Problem (step 6) radar desired delta heading angle S position of closest asteroids angle to goal [0,7] [0,20] propulsion system propulsion signal [0,100] [0,20] S gyroscope solar wind detector radar

154 Asteroids Problem (step 6)

155 θ Signal Strength

156 Asteroids Problem (step 6) θθ Signal Strength

157 Asteroids Problem (step 6) Signal Strength θ

158 Asteroids Problem (step 6) Signal Strength θ

159 Asteroids Problem (step 6) Signal Strength θ

160 Asteroids Problem (step 6) Signal Strength θ

161 Asteroids Problem (step 6) Signal quantization. Distance signal [0,4] (m): 0 = [0,19), 1 = [19,39), 2 = [39,59), 3 = [59,79), 4 = [79,∞). Angle signal [0,4] (°): 0 = [0,2), 1 = [2,4), 2 = [4,6), 3 = [6,8), 4 = [8,10).

162 Asteroids Problem (step 6) [figure: radar signal strength vs. θ, with returns labeled by distance bin: [79,∞) → 4, [19,39) → 1, [39,59) → 2]

163 Asteroids Problem (step 6) [figure: radar signal strength vs. θ, with returns labeled by angle bin: [8,10) → 4, [4,6) → 2, [2,4) → 1]

164 Asteroids Problem (step 6) radar desired delta heading angle S Position of closest asteroids angle to goal [0,7] [0,20] propulsion system propulsion signal [0,100] [0,20] S gyroscope Solar wind detector radar [0,15624] [0,124] [0,124999] distance to closest asteroids angle to closest asteroids

165 Asteroids Problem (step 6) θ Signal Strength

166 Asteroids Problem (step 6) Radar amplitude quantization. Signal → amplitude: 0 = [0,0.01), 1 = [0.01,0.02), …, 497 = [4.97,4.98), 498 = [4.98,4.99), 499 = [4.99,∞).

167 Asteroids Problem (step 6) radar desired delta heading angle S Position of closest asteroids angle to goal [0,7] [0,20] propulsion system propulsion signal [0,100] [0,20] S gyroscope Solar wind detector radar [0,15624] [0,124] [0,499] [0,124999] distance to closest asteroids angle to closest asteroids

168 Asteroids Problem (step 6) Angular speed signal [0,8]. Signal → angular speed (°/s): 0 = (-∞,-7], 1 = (-7,-5], 2 = (-5,-3], 3 = (-3,-1], 4 = (-1,1), 5 = [1,3), 6 = [3,5), 7 = [5,7), 8 = [7,∞).

169 Asteroids Problem (step 6) radar desired delta heading angle S position of closest asteroids angle to goal [0,7] [0,20] propulsion system propulsion signal [0,100] [0,20] S gyroscope solar wind detector radar [0,15624] [0,124] [0,124999] [0,8] [0,188]
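The signal ranges in this diagram are consistent with the same positional packing used earlier; the groupings below are inferred from the numbers and should be read as an assumption-level check rather than a statement from the slides:

```python
# distances of 3 closest asteroids, each in [0,4]           -> 5**3        -> [0,124]
# (distance, angle) of 3 closest asteroids, each in [0,4]   -> (5*5)**3    -> [0,15624]
# asteroid positions [0,15624] x angle to goal [0,7]        -> 15625 * 8   -> [0,124999]
# desired delta heading [0,20] x angular speed [0,8]        -> 21 * 9      -> [0,188]
print(5**3 - 1, (5 * 5)**3 - 1, 15625 * 8 - 1, 21 * 9 - 1)   # 124 15624 124999 188
```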

170 Asteroids Problem (step 7) radar desired delta heading angle S position of closest asteroids angle to goal [0,7] [0,20] propulsion system propulsion signal [0,100] [0,20] S gyroscope solar wind detector radar [0,15624] [0,124] [0,124999] [0,8] [0,188]

171 Asteroids Problem (step 8) radar desired delta heading angle S position of closest asteroids angle to goal [0,7] [0,20] propulsion system propulsion signal [0,100] [0,20] S gyroscope solar wind detector radar [0,15624] [0,124] [0,124999] [0,8] [0,188]

172 Asteroids Problem (step 8) FIRST AGENT: [0,124] [0,499] radar signal distances to closest asteroids

173 Asteroids Problem (step 8) position of closest asteroids [0,124999][0,20] desired delta heading angle SECOND AGENT:

174 Asteroids Problem (step 8) desired delta heading angle and current angular velocity propulsion signal [0,188] [0,100] THIRD AGENT:

175 Asteroids Problem (step 8) FIRST AGENT: radar signal → [0,124] distances to closest asteroids. REWARD FUNCTION: for each distance: reward = -10 * deltaDistance. Note: deltaDistance = |distance − laserDistance|.

176 Asteroids Problem (step 8) SECOND AGENT: position of closest asteroids [0,124999] → [0,20] desired delta heading angle. REWARD FUNCTION: reward = if distance to an asteroid < 10 m → -10000, else → (180 − |angleToGoal|).

177 Asteroids Problem (step 8) THIRD AGENT: desired delta heading angle [0,188] → propulsion signal [0,100]. REWARD FUNCTION: reward = −|achievedAngle − desiredAngle|.

178 Asteroids Problem radar desired delta heading angle S position of closest asteroids angle to goal [0,7] [0,20] propulsion system propulsion signal [0,100] [0,20] S gyroscope solar wind detector radar [0,15624] [0,124] [0,124999] [0,8] [0,188]

179

