1
Fuzzy Inference System Learning by Reinforcement
Presented by Alp Sardağ
2
A Comparison of Fuzzy & Classical Controllers
Fuzzy controller: an expert system based on if-then rules whose premises and conclusions are expressed by means of linguistic terms.
- Rules are close to natural language
- Encodes a priori knowledge
Classical controller: needs an analytical model of the task.
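An illustrative rule of this kind (not taken from the paper): IF error is Negative_Small AND change_in_error is Zero THEN control is Positive_Small.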
3
Design Problem of FC
A priori knowledge extraction is not easy:
- disagreement between experts
- a great number of variables may be necessary to solve the control task
4
Self-Tuning FIS
Ways of tuning a FIS:
- A direct teacher: based on an input-output set of training data.
- A distal teacher: does not give the correct actions, but the desired effect on the process.
- A performance measure: EA (evolutionary algorithms).
- A critic: gives rewards and punishments with respect to the state reached by the learner (reinforcement learning methods).
(Note: no more than two fuzzy sets are activated for any input value.)
5
Goal
To overcome the limitations of classical reinforcement learning methods: discrete state perception and discrete actions.
Note: in this paper a MISO FIS is used.
6
A MIMO FIS
A FIS is made of N rules of the following form:
R_i: IF S_1 is L_1^i AND ... AND S_{N_I} is L_{N_I}^i THEN Y_1 is O_1^i, ..., Y_{N_O} is O_{N_O}^i
where
- R_i: the i-th rule of the rule base
- S_j: the input variables
- L_j^i: the linguistic term of input variable S_j in rule i, with membership function μ_{L_j^i}
- Y_1, ..., Y_{N_O}: the output variables
- O_k^i: the linguistic term of output variable Y_k in rule i
7
Rule Preconditions
Membership functions are triangles and trapezoids (although not differentiable), because they are:
- simple
- sufficient in a number of applications
A strong fuzzy partition is used: every value activates at least one fuzzy set, so the input universe is completely covered. A sketch of such a partition follows below.
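A minimal sketch, not from the paper: triangular membership functions arranged as a strong fuzzy partition of a hypothetical [0, 10] input universe, so that the activation degrees always sum to 1 and at most two labels fire for any input value.

```python
def triangular(x, left, center, right):
    """Triangular membership function with apex at `center`."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)

# Hypothetical label centres for three labels: Low, Medium, High.
CENTERS = [0.0, 5.0, 10.0]

def partition_degrees(x):
    """Membership degrees of x in Low, Medium, High (they sum to 1)."""
    degrees = []
    for i, c in enumerate(CENTERS):
        left = CENTERS[i - 1] if i > 0 else c - 1.0               # left shoulder
        right = CENTERS[i + 1] if i < len(CENTERS) - 1 else c + 1.0  # right shoulder
        degrees.append(triangular(x, left, c, right))
    return degrees

# Example: partition_degrees(2.5) -> [0.5, 0.5, 0.0]
```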
8
Strong Fuzzy Partition Example
9
Rule Conclusions
Each rule R_i has N_O corresponding conclusions, one per output variable.
For each rule, the truth value with respect to the input vector S is computed with a T-norm over the membership degrees of its premises; here the T-norm is implemented by a product:
α_i(S) = ∏_j μ_{L_j^i}(S_j)
The FIS outputs are the truth-value-weighted combination of the rule conclusions (a sketch follows below).
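A minimal sketch, not the paper's code: product T-norm truth values and the resulting FIS output with crisp rule conclusions. Function and variable names are illustrative.

```python
import numpy as np

def rule_truth_values(memberships, rule_labels):
    """memberships[j][l]: degree of input j in its l-th label;
    rule_labels[i][j]: label index that rule i uses for input j."""
    alphas = []
    for labels in rule_labels:
        alpha = 1.0
        for j, l in enumerate(labels):
            alpha *= memberships[j][l]          # product T-norm
        alphas.append(alpha)
    return np.array(alphas)

def fis_output(alphas, conclusions):
    """Truth-value-weighted sum of crisp rule conclusions; with a strong
    fuzzy partition the alphas already sum to 1."""
    return float(np.dot(alphas, conclusions) / max(np.sum(alphas), 1e-12))
```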
10
Learning
The number and positions of the input fuzzy labels are set using a priori knowledge.
Structural learning consists in tuning the number of rules.
FACL and FQL are reinforcement learning methods that deal only with the conclusion part.
11
Reinforcement Learning
Note: state observability is total (the learner fully observes the state).
12
Markovian Decision Problem
- S: a finite, discrete state set
- U: a finite, discrete action set
- R: primary reinforcements, R: S × U → ℝ
- P: transition probabilities, P: S × U × S → [0, 1]
The state evaluation function is given below.
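The slide's evaluation-function formula did not survive extraction; the standard discounted state-value function it presumably refers to is:

```latex
V^{\pi}(s) \;=\; E_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t} \,\middle|\, s_{0} = s\right],
\qquad 0 \le \gamma < 1
```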
13
The Curse of Dimensionality
Some form of generalization must be incorporated in the state representation. Various function approximators are used:
- CMAC
- neural networks
- FIS: the state-space encoding is based on a feature vector (the rule truth values) corresponding to the current state.
14
Adaptive Heuristic Critic
AHC is made of two components:
- Adaptive Critic Element: the critic, developed in an adaptive way from primary reinforcements; it represents an evaluation function (the V(S) values) that is more informative than the one given by the environment through rewards and punishments.
- Associative Search Element: selects actions that lead to better critic values.
15
FACL Scheme
16
The Critic
At time step t, the critic value is computed from the conclusion vector v and the rule truth values. The TD error and the TD-learning update rule are given below.
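The critic formulas did not survive extraction; a hedged reconstruction of the usual FACL critic equations, assuming Φ_t denotes the vector of rule truth values α_i(S_t) and β the critic learning rate:

```latex
V_t(S_t) = \sum_i \alpha_i(S_t)\, v_t[i] = \Phi_t^{\top} v_t,
\qquad
\tilde{\varepsilon}_{t+1} = r_{t+1} + \gamma\, V_t(S_{t+1}) - V_t(S_t),
\qquad
v_{t+1} = v_t + \beta\, \tilde{\varepsilon}_{t+1}\, \Phi_t
```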
17
The Actor
When rule R_i is activated, one of R_i's local actions is elected to participate in the global action, based on its quality.
The global action triggered is the truth-value-weighted combination of the elected local actions, where election uses an ε-greedy function implementing a mixed exploration-exploitation strategy (a sketch follows below).
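A minimal sketch of this election step, assuming local action qualities are stored per rule in w[i] and the shared discrete action values in actions (illustrative names, not from the paper):

```python
import numpy as np

def epsilon_greedy(qualities, epsilon, rng):
    """Mixed exploration-exploitation: random action with probability epsilon."""
    if rng.random() < epsilon:
        return int(rng.integers(len(qualities)))      # explore
    return int(np.argmax(qualities))                  # exploit

def global_action(alphas, w, actions, epsilon=0.1, rng=None):
    """Elect one local action per activated rule, then combine them into the
    global action weighted by the rules' truth values."""
    rng = rng or np.random.default_rng()
    u, elected = 0.0, []
    for i, alpha in enumerate(alphas):
        a = epsilon_greedy(w[i], epsilon, rng)        # local election
        elected.append(a)
        u += alpha * actions[a]                       # weighted contribution
    return u, elected
```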
18
Tuning Vector w
The TD error is used as the improvement measure: except at the beginning of learning, the critic is a good approximator of the optimal evaluation function. The actor learning rule is given below.
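The actor learning rule itself did not survive extraction; one common actor-critic form that the slide plausibly refers to (η is an actor learning rate and Φ_t the truth-value vector; treat this as an assumption, not the paper's exact rule):

```latex
w_{t+1}[i] = w_t[i] + \eta\, \tilde{\varepsilon}_{t+1}\, \Phi_t[i]
\quad \text{(applied to the local action elected in rule } R_i\text{)}
```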
19
Meta Learning Rule
Update strategy for the learning rates:
- Every parameter should have its own learning rate (one rate for each of the n parameters).
- Every learning rate should be allowed to vary over time (so that the V values converge).
- When the derivative of a parameter has the same sign for several consecutive time steps, its learning rate should be increased.
- When the sign of a parameter's derivative alternates for several consecutive time steps, its learning rate should be decreased.
This is the Delta-Bar-Delta rule (a sketch follows below).
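A minimal sketch of the Delta-Bar-Delta rule: one learning rate per parameter, increased additively while the parameter's derivative keeps the same sign and decreased multiplicatively when the sign alternates. Hyper-parameter names (kappa, phi, theta) are illustrative.

```python
import numpy as np

class DeltaBarDelta:
    def __init__(self, n, eta0=0.01, kappa=0.001, phi=0.1, theta=0.7):
        self.eta = np.full(n, eta0)      # per-parameter learning rates
        self.delta_bar = np.zeros(n)     # exponential trace of past deltas
        self.kappa, self.phi, self.theta = kappa, phi, theta

    def update(self, delta):
        """delta: current derivative (or error contribution) per parameter."""
        grow = delta * self.delta_bar > 0            # consistent sign
        shrink = delta * self.delta_bar < 0          # alternating sign
        self.eta[grow] += self.kappa
        self.eta[shrink] *= (1.0 - self.phi)
        self.delta_bar = (1.0 - self.theta) * delta + self.theta * self.delta_bar
        return self.eta
```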
20
Execution Procedure
1. Estimation of the evaluation function corresponding to the current state.
2. Computation of the TD error.
3. Tuning of the parameter vectors v and w.
4. Estimation of the new evaluation function for the current state with the new conclusion vector v_{t+1}.
5. Learning-rate update with the Delta-Bar-Delta rule.
6. For each activated rule, election of the local action; computation and triggering of the global action U_{t+1}.
(A schematic of one such step follows below.)
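A schematic sketch of one such step under assumed shapes (v: critic conclusion vector, w: per-rule local-action qualities, actions: the shared discrete action values); a sketch under assumptions, not the paper's implementation:

```python
import numpy as np

def facl_step(phi_t, phi_t1, r, v, w, actions,
              gamma=0.95, beta=0.05, eta=0.05, epsilon=0.1, rng=None):
    """phi_t / phi_t1: truth-value vectors of the previous and current states."""
    rng = rng or np.random.default_rng()
    V_old = float(phi_t @ v)                   # 1. evaluation of the states
    V_new = float(phi_t1 @ v)
    td = r + gamma * V_new - V_old             # 2. TD error
    v += beta * td * phi_t                     # 3. tune the critic vector v
    w += eta * td * phi_t[:, None]             #    (one option for tuning w)
    V_new_updated = float(phi_t1 @ v)          # 4. re-estimate with v_{t+1}
    # 5. per-parameter learning rates would be adapted here (Delta-Bar-Delta)
    u = 0.0                                    # 6. elect local actions and
    for i, alpha in enumerate(phi_t1):         #    compose the global action
        greedy = rng.random() >= epsilon
        a = int(np.argmax(w[i])) if greedy else int(rng.integers(len(actions)))
        u += alpha * actions[a]
    return u, td, V_new_updated
```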
21
Example
22
Example (cont.)
The number of rules is twenty-five. For the sake of simplicity, the discrete actions available are the same for all rules.
The discrete action set:
The reinforcement function:
23
Results Performance measure for distance: Results: