Other NN Models Reinforcement learning (RL)


1 Other NN Models
Reinforcement learning (RL)
Probabilistic neural networks
Support vector machine (SVM)

2 Reinforcement learning (RL)
Basic ideas:
Supervised learning (delta rule, BP): samples (x, f(x)) are given to learn the function f(.); a precise error can be determined and is used to drive the learning.
Unsupervised learning (competitive, SOM, BM): no target/desired output is provided to help learning; learning is self-organized (clustering).
Reinforcement learning: in between the two. There is no target output for the input vectors in the training samples; instead, a judge/critic evaluates the output: good: reward signal (+1); bad: penalty signal (-1).

3 RL exists in many places
Originated in psychology (the conditioned reflex).
In many applications, it is much easier to determine good/bad, right/wrong, acceptable/unacceptable than to provide a precise correct answer/error.
It is up to the learning process to improve the system's performance based on the critic's signal.
In the machine learning community there are different theories and algorithms.
Major difficulty: credit/blame assignment, e.g., chess playing: win/loss over many steps; soccer playing: win/loss among many players.

4 Principle of RL
Let r = +1: reward (good output); r = -1: penalty (bad output).
If r = +1, the system is encouraged to continue what it is doing.
If r = -1, the system is encouraged not to do what it is doing.
The system needs to search for a better output, because r = -1 does not indicate what the good output should be; a common method is "random search".
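The principle above can be sketched as a tiny reward-driven loop; the function names and the toy critic below are illustrative assumptions, not from the slides:

```python
import random

def random_search_step(params, critic, step=0.1):
    # Perturb the parameters at random; keep the trial on reward (+1),
    # revert to the old parameters on penalty (-1).
    trial = [p + random.gauss(0.0, step) for p in params]
    return trial if critic(trial) == +1 else params

# Toy critic: rewards a trial only if it moves closer to a target value.
random.seed(0)
target = 3.0
params = [0.0]
for _ in range(500):
    old_err = abs(params[0] - target)
    params = random_search_step(
        params, lambda p: +1 if abs(p[0] - target) < old_err else -1)
print(params[0])   # close to 3.0
```

Note that the critic only ever says good/bad; the direction of improvement is discovered by the random perturbations, precisely because r = -1 carries no information about what the good output should be.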

5 ARP: the associative reward-and-penalty algorithm for NN RL (Barto and Anandan, 1985)
Architecture: the input x(k) feeds the network; stochastic units z(k) perform the random search; the output y(k) is evaluated by a critic, which returns the reinforcement signal.

6 Random search by stochastic units zi
Let zi be a binary stochastic unit with P(zi = +1) = 1/(1 + e^(-neti)) and P(zi = -1) = 1 - P(zi = +1);
or let zi obey a continuous probability distribution function;
or let zi = f(neti + η), where η is a random noise obeying a certain distribution.
Key: z is not a deterministic function of x; this gives z a chance to be a good output.
Prepare the desired output (temporary): ẑ(k) = z(k) if r = +1, and ẑ(k) = -z(k) if r = -1.
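A binary stochastic unit of the first kind above can be sampled as follows (a minimal sketch; the ±1 coding and the logistic sigmoid follow the slides, the function name is an assumption):

```python
import math, random

def stochastic_unit(net, rng=random):
    # P(z = +1) = logistic sigmoid of the net input; otherwise z = -1.
    p = 1.0 / (1.0 + math.exp(-net))
    return +1 if rng.random() < p else -1

# With a strongly positive net input, z is +1 most of the time, but it
# still occasionally explores the other value.
random.seed(0)
frac = sum(stochastic_unit(2.0) == +1 for _ in range(10000)) / 10000
print(frac)   # near sigmoid(2.0) ≈ 0.88
```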

7 Compute the errors at the z layer
e(k) = ẑ(k) - E(z(k)), where E(z(k)) is the expected value of z(k), because z is a random variable.
How to compute E(z(k)):
take the average of z over a period of time;
or compute it from the distribution, if possible; if the logistic sigmoid function is used, E(zi) = 2/(1 + e^(-neti)) - 1.
Training:
delta rule to learn the weights of the output nodes;
BP or other methods to modify the weights at lower layers.
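Under the ±1 coding and sigmoid unit above, E(zi) and the delta-rule update for an output node can be sketched as follows (a simplification: the actual ARP algorithm typically uses a much smaller learning rate on penalty steps, which is omitted here):

```python
import math

def expected_z(net):
    # E(z) = (+1)*P(z=+1) + (-1)*P(z=-1) = 2*sigmoid(net) - 1
    return 2.0 / (1.0 + math.exp(-net)) - 1.0

def arp_output_update(w, x, z, r, lr=0.1):
    # Temporary target: z itself on reward, its negation on penalty.
    z_hat = z if r == +1 else -z
    net = sum(wi * xi for wi, xi in zip(w, x))
    err = z_hat - expected_z(net)          # error at the z layer
    return [wi + lr * err * xi for wi, xi in zip(w, x)]

w = arp_output_update([0.0, 0.0], [1.0, 1.0], z=+1, r=+1)
print(w)   # [0.1, 0.1]
```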

8 Probabilistic Neural Networks
Purpose: classify a given input pattern x into one of the predefined classes by the Bayesian decision rule.
Suppose there are k predefined classes s1, …, sk:
P(si): prior probability of class si
P(x|si): conditional probability of x, given si
P(x): probability of x
P(si|x): posterior probability of si, given x
Example: the universe is the set of all patients; si: the set of all patients having disease i; x: a description (manifestations) of a patient.

9 P(x|si): the probability that a patient with disease si will have description x.
P(si|x): the probability that a patient with description x has disease si.
By Bayes' theorem: P(si|x) = P(x|si) P(si) / P(x), where P(x) = Σj P(x|sj) P(sj).
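Bayes' theorem as used here is a one-liner; the disease numbers below are made up for illustration:

```python
def posterior(priors, likelihoods):
    # P(si|x) = P(x|si) P(si) / P(x),  with P(x) = sum_j P(x|sj) P(sj)
    joint = [p * l for p, l in zip(priors, likelihoods)]
    px = sum(joint)
    return [j / px for j in joint]

# A rare disease s1 (prior 1%) whose patients usually show description x
# still ends up with a small posterior given x.
post = posterior(priors=[0.01, 0.99], likelihoods=[0.9, 0.1])
print([round(p, 3) for p in post])   # [0.083, 0.917]
```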

10 2. Estimate probabilities
Training exemplars: x_ij, the jth exemplar belonging to class si.
Priors can be obtained either from experts' estimates or calculated from the exemplars, e.g., P(si) = (number of exemplars of si) / (total number of exemplars).
Conditionals are estimated by the Parzen estimator with Gaussian kernels: P(x|si) ≈ (1/ni) Σj (2πσ²)^(-d/2) exp(-||x - x_ij||² / (2σ²)), where ni is the number of exemplars of class si.
This is closely related to radial basis functions of Gaussian form.
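The Parzen estimate with Gaussian kernels can be written directly from the formula above (a sketch; the shared width σ is a free smoothing parameter, and the function name is an assumption):

```python
import math

def parzen_density(x, exemplars, sigma=1.0):
    # Average of Gaussian kernels centered on the class's exemplars.
    d = len(x)
    norm = (2.0 * math.pi * sigma ** 2) ** (d / 2) * len(exemplars)
    total = sum(math.exp(-sum((a - b) ** 2 for a, b in zip(x, e))
                         / (2.0 * sigma ** 2))
                for e in exemplars)
    return total / norm

# One 1-D exemplar at 0 gives a standard Gaussian density at 0.
print(parzen_density([0.0], [[0.0]]))   # 1/sqrt(2*pi) ≈ 0.3989
```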

11 3. PNN architecture: feedforward network of 4 layers
input layer → exemplar layer → class layer → decision layer
Exemplar layer: RBF nodes, one per exemplar, centered on x_ij.
Class layer: one node per class si, connected to all exemplar nodes belonging to that class; it accumulates the estimate of P(x|si).
Decision layer: picks the winner based on max_i P(x|si) P(si).
If necessary, training adjusts the weights of the upper layers.
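Putting the four layers together, the forward pass can be sketched as follows (the exemplars, priors, and σ below are illustrative; a real PNN takes its centers directly from the training exemplars, as described above):

```python
import math

def pnn_classify(x, class_exemplars, priors, sigma=1.0):
    scores = []
    for exemplars, prior in zip(class_exemplars, priors):
        # Exemplar layer: one Gaussian RBF node per exemplar.
        rbf = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, e))
                        / (2.0 * sigma ** 2)) for e in exemplars]
        # Class layer: pool the class's exemplar nodes, scale by prior.
        scores.append(prior * sum(rbf) / len(exemplars))
    # Decision layer: winner-take-all over the class scores.
    return max(range(len(scores)), key=scores.__getitem__)

# Two 2-D classes, clustered around (0,0) and (5,5).
exemplars = [[[0.0, 0.0], [0.2, -0.1]], [[5.0, 5.0], [4.9, 5.2]]]
print(pnn_classify([4.8, 5.1], exemplars, priors=[0.5, 0.5]))   # 1
print(pnn_classify([0.1, 0.1], exemplars, priors=[0.5, 0.5]))   # 0
```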

12 4. Comments
Classification by Bayes' rule.
Fast classification; fast learning.
Guaranteed to approach the Bayes-optimal decision surface, provided that the class probability density functions are smooth and continuous.
Trades nodes for time (not good with large training samples: one exemplar node per training sample).
The probability density function to be represented must be smooth and continuous.

