On-Line Learning with Recycled Examples: A Cavity Analysis. Peixun Luo and K. Y. Michael Wong, Hong Kong University of Science and Technology.

1 On-Line Learning with Recycled Examples: A Cavity Analysis
Peixun Luo and K. Y. Michael Wong, Hong Kong University of Science and Technology

2 Formulation
Inputs: ξ_j, j = 1, ..., N
Weights: J_j, j = 1, ..., N
Activation: y = J·ξ
Output: S = f(y)
[Figure: network diagram labelled with the weights J_j and the activation y]
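The formulation above can be written out as a minimal sketch. The slide does not specify the output function f, so tanh is used here purely as an assumed example, and the function names are hypothetical.

```python
import numpy as np

def activation(J, xi):
    """Activation y = J . xi for a weight vector J and an input vector xi."""
    return float(np.dot(J, xi))

def output(J, xi, f=np.tanh):
    """Output S = f(y). The choice f = tanh is an assumption, not from the slides."""
    return float(f(activation(J, xi)))

rng = np.random.default_rng(0)
N = 5                          # number of inputs and weights
J = rng.standard_normal(N)     # weights J_j
xi = rng.standard_normal(N)    # inputs xi_j
y = activation(J, xi)
S = output(J, xi)
```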

3 The Learning of a Task
Given p = αN examples with
inputs: ξ_j^μ, j = 1, ..., N, μ = 1, ..., p
outputs: y^μ, generated by a teacher network
Learning is done by defining a risk function and minimizing it by gradient descent.
[Figure: network diagram labelled with the weights J_j and the activation y]

4 Learning Dynamics
* Define a cost function in terms of the examples: E = Σ_μ E_μ + regularization terms
* On-line learning: at time t, draw an example σ(t) and set ΔJ_j ~ gradient with respect to σ(t) + weight decay
* Batch learning: at time t, set ΔJ_j ~ average gradient with respect to all examples + weight decay
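The two update rules can be compared in a short sketch. Everything specific here is an assumption made for illustration: the quadratic per-example cost E_μ = ½(y^μ − J·ξ^μ)², the linear teacher B, the 1/√N input scaling, and the values of the learning rate `eta` and weight decay `decay`. The on-line step recycles examples by drawing σ(t) from the same stored set of p examples.

```python
import numpy as np

def grad_example(J, xi, y_target):
    """Gradient of the assumed cost E_mu = 0.5 * (y_target - J.xi)**2 w.r.t. J."""
    return -(y_target - J @ xi) * xi

def online_step(J, X, Y, rng, eta=0.1, decay=0.01):
    """Draw one stored example sigma(t); step along its gradient plus weight decay."""
    sigma = rng.integers(len(Y))
    return J - eta * (grad_example(J, X[sigma], Y[sigma]) + decay * J)

def batch_step(J, X, Y, eta=0.1, decay=0.01):
    """Step along the gradient averaged over all stored examples plus weight decay."""
    avg_grad = -(Y - X @ J) @ X / len(Y)
    return J - eta * (avg_grad + decay * J)

def training_error(J, X, Y):
    """Mean of the assumed quadratic cost over the stored examples."""
    return 0.5 * np.mean((Y - X @ J) ** 2)

rng = np.random.default_rng(0)
N, alpha = 50, 2.0
p = int(alpha * N)                             # p = alpha * N examples
X = rng.standard_normal((p, N)) / np.sqrt(N)   # inputs, scaled so activations are O(1)
B = rng.standard_normal(N)                     # hypothetical linear teacher
Y = X @ B                                      # teacher-generated outputs

J_online = np.zeros(N)
J_batch = np.zeros(N)
e0 = training_error(J_online, X, Y)            # error before any learning

for t in range(20 * p):                        # each example is revisited many times
    J_online = online_step(J_online, X, Y, rng)
for t in range(2000):
    J_batch = batch_step(J_batch, X, Y)

e_online = training_error(J_online, X, Y)
e_batch = training_error(J_batch, X, Y)
```

With these settings both runs lower the training error from its initial value; the relative speed of the two depends on the assumed step sizes and should not be read as a result from the talk.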

5 Batch vs On-line
Batch learning | On-line learning
Same batch of examples for all steps | An independent example per step
Simple dynamics: no sequence dependence | Complex dynamics: sequence dependence
Small stepwise changes of examples | Giant boosts of examples stepwise
Previous analysis: possible | Previous analysis: limited to infinite sets
Stable but inefficient | Efficient but less stable

6 The Cavity Method
* It has been applied to many complex systems.
* It has been applied to steady-state properties of learning.
* It uses a self-consistency argument to consider what happens when a set of p examples is expanded to p + 1 examples.
* The central quantity is the cavity activation, which is the activation of example 0 in a network which learns examples 1 to p (but never learns example 0).
* Since the original network has no information about example 0, the cavity activation obeys a random distribution (e.g. a Gaussian).
* Now suppose the network incorporates example 0 at time s. The activation is no longer random.

7 Linear Response
* The cavity activation diffuses randomly.
* The generic activation, receiving a stimulus at time s, is no longer random.
* The background examples also adjust due to the newcomer.
* Assuming that the background adjustments are small, we can use linear response theory to superpose the effects due to all previous times s.
[Figure: X(t) and h(t) plotted against time; labels: stimulation time s, random diffusion]

8 Useful Equations
* For batch learning: generic activation of an example at time t = cavity activation of the example at time t + integral over s of (Green's function from time s to t) × (gradient term at time s).
* For on-line learning: generic activation of an example at time t = cavity activation of the example at time t + sum over s of (Green's function from time s to t) × (gradient term at time s). The learning instants s are Poisson distributed.
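The two verbal relations above can be put in symbols as follows. The notation is an assumption: x(t) stands for the generic activation, h(t) for the cavity activation, G(t, s) for the Green's function from time s to t, and F(s) for the gradient term at time s. The slide gives no prefactors, signs, or integration limits, so those shown here are placeholders rather than the authors' exact expressions.

```latex
% Batch learning: the stimulus acts continuously, so the responses are integrated.
x(t) = h(t) + \int_{0}^{t} \mathrm{d}s \, G(t,s) \, F(s)

% On-line learning: the stimulus acts only at the Poisson-distributed
% learning instants s of the example, so the responses are summed.
x(t) = h(t) + \sum_{s < t} G(t,s) \, F(s)
```

In both cases the second term is the linear-response correction that separates the generic activation from the cavity activation.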

9 Simulation Results
[Figure labels: generic activation (with giant boosts); cavity activation from theory (line) and from simulation with the example removed (dots); learning instants (Poisson distributed)]
Theory and simulations agree!
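The Poisson statistics of the learning instants can be illustrated with a small sketch. It assumes that each on-line step draws one example uniformly at random from the stored set; the values of `p` and `steps` are arbitrary. The number of times a given example is drawn is then binomial, which is close to Poisson for large p, so its mean and variance should nearly coincide.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 2000                 # number of stored examples (arbitrary)
steps = 3 * p            # learning steps; each example is drawn 3 times on average

# One uniformly drawn example index sigma(t) per step.
draws = rng.integers(p, size=steps)

# counts[mu] = number of learning instants of example mu.
counts = np.bincount(draws, minlength=p)

mean = counts.mean()     # equals steps / p exactly
var = counts.var()       # close to the mean if the counts are Poisson-like
```

The times at which a single example is picked are correspondingly spread irregularly, which is the source of the sequence dependence of on-line learning noted earlier in the talk.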

10 Further Development
[Figure labels: training error; generalization error]
Theory and simulations agree!

11 Critical Learning Rate (1)
[Figure labels: critical learning rate at which learning diverges; other approximations]
Theory and simulations agree!

12 Critical Learning Rate (2)
[Figure labels: critical learning rate at which learning diverges; other approximations]
Theory and simulations agree!

13 Average Learning
Theory and simulations agree!
Generalization error drops when the dynamics is averaged over monitoring periods.

14 Conclusion
* We have analysed the dynamics of on-line learning with recycled examples using the cavity approach.
* Theory is able to reproduce the Poisson-distributed giant boosts of the activations during learning.
* Theory and simulations agree well on: the evolution of the training and generalization errors, the critical learning rate at which learning diverges, the performance of average learning.
* Future: to develop a Monte Carlo sampling procedure for multilayer networks.

