1
Fuzzy Inference System Learning by Reinforcement
Presented by Alp Sardağ
2
A Comparison of Fuzzy & Classical Controllers
Fuzzy controller: an expert system based on if-then rules whose premises and conclusions are expressed by means of linguistic terms.
- Rules are close to natural language
- Encodes a priori knowledge
Classical controller: needs an analytical model of the task.
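An illustrative rule of this kind (not taken from the paper): IF error is Negative_Small AND change_in_error is Zero THEN control is Positive_Small.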
3
Design Problem of FC
A priori knowledge extraction is not easy:
- disagreement between experts
- a great number of variables may be necessary to solve the control task
4
Self-Tuning FIS
Ways of tuning a FIS:
- A direct teacher: based on an input-output set of training data.
- A distal teacher: does not give the correct actions, but the desired effect on the process.
- A performance measure: EA (evolutionary algorithms).
- A critic: gives rewards and punishments with respect to the state reached by the learner (reinforcement learning methods).
(Note: no more than two fuzzy sets are activated for any input value.)
5
Goal
To overcome the limitations of classical reinforcement learning methods: discrete state perception and discrete actions.
Note: in this paper a MISO FIS is used.
6
A MIMO FIS
A FIS is made of N rules of the following form:
R_i: IF S_1 is L_1^i AND ... AND S_{N_I} is L_{N_I}^i THEN Y_1 is O_1^i, ..., Y_{N_O} is O_{N_O}^i
where
- R_i: the i-th rule of the rule base
- S_j: the input variables
- L_j^i: the linguistic term of input variable S_j in rule i, with membership function μ_{L_j^i}
- Y_1, ..., Y_{N_O}: the output variables
- O_k^i: the linguistic term of output variable Y_k in rule i
7
Rule Preconditions
Membership functions are triangles and trapezoids (although not differentiable), because they are:
- simple
- sufficient in a number of applications
A strong fuzzy partition is used: every value activates at least one fuzzy set, so the input universe is completely covered. A sketch of such a partition follows below.
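A minimal sketch, not from the paper: triangular membership functions arranged as a strong fuzzy partition of a hypothetical [0, 10] input universe, so that the activation degrees always sum to 1 and at most two labels fire for any input value.

```python
def triangular(x, left, center, right):
    """Triangular membership function with apex at `center`."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)

# Hypothetical label centres for three labels: Low, Medium, High.
CENTERS = [0.0, 5.0, 10.0]

def partition_degrees(x):
    """Membership degrees of x in Low, Medium, High (they sum to 1)."""
    degrees = []
    for i, c in enumerate(CENTERS):
        left = CENTERS[i - 1] if i > 0 else c - 1.0               # left shoulder
        right = CENTERS[i + 1] if i < len(CENTERS) - 1 else c + 1.0  # right shoulder
        degrees.append(triangular(x, left, c, right))
    return degrees

# Example: partition_degrees(2.5) -> [0.5, 0.5, 0.0]
```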
8
Strong Fuzzy Partition Example
9
Rule Conclusions
Each rule R_i has N_O corresponding conclusions, one per output variable.
For each rule, the truth value with respect to the input vector S is computed with a T-norm over the membership degrees of its premises; here the T-norm is implemented by a product:
α_i(S) = ∏_j μ_{L_j^i}(S_j)
The FIS outputs are the truth-value-weighted combination of the rule conclusions (a sketch follows below).
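A minimal sketch, not the paper's code: product T-norm truth values and the resulting FIS output with crisp rule conclusions. Function and variable names are illustrative.

```python
import numpy as np

def rule_truth_values(memberships, rule_labels):
    """memberships[j][l]: degree of input j in its l-th label;
    rule_labels[i][j]: label index that rule i uses for input j."""
    alphas = []
    for labels in rule_labels:
        alpha = 1.0
        for j, l in enumerate(labels):
            alpha *= memberships[j][l]          # product T-norm
        alphas.append(alpha)
    return np.array(alphas)

def fis_output(alphas, conclusions):
    """Truth-value-weighted sum of crisp rule conclusions; with a strong
    fuzzy partition the alphas already sum to 1."""
    return float(np.dot(alphas, conclusions) / max(np.sum(alphas), 1e-12))
```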
10
Learning
The number and positions of the input fuzzy labels are set using a priori knowledge.
Structural learning consists in tuning the number of rules.
FACL and FQL are reinforcement learning methods that deal only with the conclusion part.
11
Reinforcement Learning
Note: state observability is total (the learner fully observes the state).
12
Markovian Decision Problem
- S: a finite, discrete state set
- U: a finite, discrete action set
- R: primary reinforcements, R: S × U → ℝ
- P: transition probabilities, P: S × U × S → [0, 1]
The state evaluation function is given below.
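The slide's evaluation-function formula did not survive extraction; the standard discounted state-value function it presumably refers to is:

```latex
V^{\pi}(s) \;=\; E_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t} \,\middle|\, s_{0} = s\right],
\qquad 0 \le \gamma < 1
```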
13
The Curse of Dimensionality
Some form of generalization must be incorporated in the state representation. Various function approximators are used:
- CMAC
- neural networks
- FIS: the state-space encoding is based on a feature vector (the rule truth values) corresponding to the current state.
14
Adaptive Heuristic Critic
AHC is made of two components:
- Adaptive Critic Element: the critic, developed in an adaptive way from primary reinforcements; it represents an evaluation function (the V(S) values) that is more informative than the one given by the environment through rewards and punishments.
- Associative Search Element: selects actions that lead to better critic values.
15
FACL Scheme
16
The Critic
At time step t, the critic value is computed from the conclusion vector v and the rule truth values. The TD error and the TD-learning update rule are given below.
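The critic formulas did not survive extraction; a hedged reconstruction of the usual FACL critic equations, assuming Φ_t denotes the vector of rule truth values α_i(S_t) and β the critic learning rate:

```latex
V_t(S_t) = \sum_i \alpha_i(S_t)\, v_t[i] = \Phi_t^{\top} v_t,
\qquad
\tilde{\varepsilon}_{t+1} = r_{t+1} + \gamma\, V_t(S_{t+1}) - V_t(S_t),
\qquad
v_{t+1} = v_t + \beta\, \tilde{\varepsilon}_{t+1}\, \Phi_t
```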
17
The Actor
When rule R_i is activated, one of R_i's local actions is elected to participate in the global action, based on its quality.
The global action triggered is the truth-value-weighted combination of the elected local actions, where election uses an ε-greedy function implementing a mixed exploration-exploitation strategy (a sketch follows below).
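A minimal sketch of this election step, assuming local action qualities are stored per rule in w[i] and the shared discrete action values in actions (illustrative names, not from the paper):

```python
import numpy as np

def epsilon_greedy(qualities, epsilon, rng):
    """Mixed exploration-exploitation: random action with probability epsilon."""
    if rng.random() < epsilon:
        return int(rng.integers(len(qualities)))      # explore
    return int(np.argmax(qualities))                  # exploit

def global_action(alphas, w, actions, epsilon=0.1, rng=None):
    """Elect one local action per activated rule, then combine them into the
    global action weighted by the rules' truth values."""
    rng = rng or np.random.default_rng()
    u, elected = 0.0, []
    for i, alpha in enumerate(alphas):
        a = epsilon_greedy(w[i], epsilon, rng)        # local election
        elected.append(a)
        u += alpha * actions[a]                       # weighted contribution
    return u, elected
```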
18
Tuning Vector w
The TD error is used as the improvement measure: except at the beginning of learning, the critic is a good approximator of the optimal evaluation function. The actor learning rule is given below.
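The actor learning rule itself did not survive extraction; one common actor-critic form that the slide plausibly refers to (η is an actor learning rate and Φ_t the truth-value vector; treat this as an assumption, not the paper's exact rule):

```latex
w_{t+1}[i] = w_t[i] + \eta\, \tilde{\varepsilon}_{t+1}\, \Phi_t[i]
\quad \text{(applied to the local action elected in rule } R_i\text{)}
```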
19
Meta Learning Rule
Update strategy for the learning rates:
- Every parameter should have its own learning rate (one rate for each of the n parameters).
- Every learning rate should be allowed to vary over time (so that the V values converge).
- When the derivative of a parameter has the same sign for several consecutive time steps, its learning rate should be increased.
- When the sign of a parameter's derivative alternates for several consecutive time steps, its learning rate should be decreased.
This is the Delta-Bar-Delta rule (a sketch follows below).
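A minimal sketch of the Delta-Bar-Delta rule: one learning rate per parameter, increased additively while the parameter's derivative keeps the same sign and decreased multiplicatively when the sign alternates. Hyper-parameter names (kappa, phi, theta) are illustrative.

```python
import numpy as np

class DeltaBarDelta:
    def __init__(self, n, eta0=0.01, kappa=0.001, phi=0.1, theta=0.7):
        self.eta = np.full(n, eta0)      # per-parameter learning rates
        self.delta_bar = np.zeros(n)     # exponential trace of past deltas
        self.kappa, self.phi, self.theta = kappa, phi, theta

    def update(self, delta):
        """delta: current derivative (or error contribution) per parameter."""
        grow = delta * self.delta_bar > 0            # consistent sign
        shrink = delta * self.delta_bar < 0          # alternating sign
        self.eta[grow] += self.kappa
        self.eta[shrink] *= (1.0 - self.phi)
        self.delta_bar = (1.0 - self.theta) * delta + self.theta * self.delta_bar
        return self.eta
```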
20
Execution Procedure
1. Estimation of the evaluation function corresponding to the current state.
2. Computation of the TD error.
3. Tuning of the parameter vectors v and w.
4. Estimation of the new evaluation function for the current state with the new conclusion vector v_{t+1}.
5. Learning-rate update with the Delta-Bar-Delta rule.
6. For each activated rule, election of the local action; computation and triggering of the global action U_{t+1}.
(A schematic of one such step follows below.)
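A schematic sketch of one such step under assumed shapes (v: critic conclusion vector, w: per-rule local-action qualities, actions: the shared discrete action values); a sketch under assumptions, not the paper's implementation:

```python
import numpy as np

def facl_step(phi_t, phi_t1, r, v, w, actions,
              gamma=0.95, beta=0.05, eta=0.05, epsilon=0.1, rng=None):
    """phi_t / phi_t1: truth-value vectors of the previous and current states."""
    rng = rng or np.random.default_rng()
    V_old = float(phi_t @ v)                   # 1. evaluation of the states
    V_new = float(phi_t1 @ v)
    td = r + gamma * V_new - V_old             # 2. TD error
    v += beta * td * phi_t                     # 3. tune the critic vector v
    w += eta * td * phi_t[:, None]             #    (one option for tuning w)
    V_new_updated = float(phi_t1 @ v)          # 4. re-estimate with v_{t+1}
    # 5. per-parameter learning rates would be adapted here (Delta-Bar-Delta)
    u = 0.0                                    # 6. elect local actions and
    for i, alpha in enumerate(phi_t1):         #    compose the global action
        greedy = rng.random() >= epsilon
        a = int(np.argmax(w[i])) if greedy else int(rng.integers(len(actions)))
        u += alpha * actions[a]
    return u, td, V_new_updated
```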
21
Example
22
Example (cont.)
The number of rules is twenty-five. For the sake of simplicity, the discrete actions available are the same for all rules.
The discrete action set:
The reinforcement function:
23
Results Performance measure for distance: Results: