B. Stochastic Neural Networks


B. Stochastic Neural Networks (in particular, the stochastic Hopfield network)

Trapping in Local Minimum

Escape from Local Minimum

Escape from Local Minimum (continued)

Motivation
Idea: with low probability, go against the local field, i.e., move up the energy surface and make the "wrong" microdecision.
Potential value for optimization: escape from local optima.
Potential value for associative memory: escape from spurious states, because they have higher energy than imprinted states.

The Stochastic Neuron
Deterministic neuron: the new state of unit i is Θ(h_i), where Θ is the unit step function and h_i is the local field.
Stochastic neuron: the new state is chosen at random, s_i' = +1 with probability σ(h_i) and -1 otherwise, where σ is the logistic sigmoid σ(h) = 1 / (1 + e^(-2h/T)).
(Flake, pp. 310-11)
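A minimal sketch (not the author's code) of this update rule, assuming bipolar states s_i in {-1, +1}, symmetric weights, and the sigmoid reconstructed above:

```python
import numpy as np

def sigma(h, T):
    """Logistic sigmoid with pseudo-temperature T; its slope at the origin is 1/(2T)."""
    return 1.0 / (1.0 + np.exp(-2.0 * h / T))

def stochastic_update(s, W, i, T, rng):
    """Update unit i in place: +1 with probability sigma(h_i), else -1.
    As T -> 0 this reduces to the deterministic step s_i' = sign(h_i)."""
    h = W[i] @ s                                   # local field of unit i
    s[i] = 1 if rng.random() < sigma(h, T) else -1
    return s

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 5))
W = (W + W.T) / 2                                  # symmetric weights
np.fill_diagonal(W, 0)                             # no self-connections
s = rng.choice([-1, 1], size=5)
print(stochastic_update(s, W, i=2, T=0.5, rng=rng))
```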

Properties of the Logistic Sigmoid
As h → +∞, σ(h) → 1.
As h → -∞, σ(h) → 0.
σ(0) = 1/2.

Logistic Sigmoid with Varying T
T varies from 0.05 to ∞ (equivalently, 1/T = β = 0, 1, 2, …, 20).
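A small sketch that reproduces this family of curves, assuming the sigmoid σ(h) = 1/(1 + e^(-2βh)) reconstructed above and matplotlib for plotting:

```python
import numpy as np
import matplotlib.pyplot as plt

h = np.linspace(-5, 5, 400)
for beta in range(0, 21):                  # beta = 1/T = 0, 1, ..., 20
    plt.plot(h, 1.0 / (1.0 + np.exp(-2.0 * beta * h)), lw=1)
plt.xlabel("h")
plt.ylabel("sigma(h)")
plt.title("Logistic sigmoid for beta = 1/T = 0, 1, ..., 20")
plt.show()
```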

Logistic Sigmoid, T = 0.5: slope at the origin = 1/(2T).

Logistic Sigmoid, T = 0.01: virtually a threshold (step) function.

Logistic Sigmoid, T = 0.1: a slightly softened threshold.

Logistic Sigmoid, T = 1.

Logistic Sigmoid, T = 10: nearly random state assignment, with a slight bias determined by the local field.

Logistic Sigmoid, T = 100: output virtually independent of the local field.

Pseudo-Temperature
Temperature: a measure of thermal energy (heat).
Thermal energy: the vibrational energy of molecules, a source of random motion.
Pseudo-temperature: a measure of nondirected (random) change.
The logistic sigmoid gives the same equilibrium probabilities as the Boltzmann-Gibbs distribution.
Other rules (such as the Metropolis algorithm) also give the Boltzmann-Gibbs distribution (HKP, App.).
At high temperature the Boltzmann distribution has a uniform preference for all states, regardless of energy; at low temperature only the minimum-energy states have a nonzero probability (Haykin 316).
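For concreteness, here is a minimal sketch (not from the slides) of the Boltzmann-Gibbs point above, using four hypothetical state energies:

```python
import numpy as np

E = np.array([0.0, 0.5, 1.0, 2.0])         # energies of four hypothetical states
for T in (100.0, 1.0, 0.05):
    p = np.exp(-E / T)
    p /= p.sum()                            # Boltzmann-Gibbs: P(state) proportional to exp(-E/T)
    print(f"T = {T:6.2f}   P = {np.round(p, 3)}")
```

At T = 100 the four states come out nearly equally likely; at T = 0.05 essentially all the probability sits on the minimum-energy state.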

Transition Probability
Equivalently, the stochastic update can be expressed as a transition (flip) probability in terms of the energy change ΔE the flip would cause: Pr{flip} = 1 / (1 + e^(ΔE/T)). Flips that lower the energy are likely, while uphill flips are suppressed roughly as e^(-ΔE/T) when ΔE is large compared with T.
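A small numeric check of this reading (the flip-probability form above is my reconstruction of the slide's formula, so treat it as an assumption):

```python
import math

def flip_probability(dE, T):
    """Probability of accepting a flip that changes the energy by dE (assumed form)."""
    return 1.0 / (1.0 + math.exp(dE / T))

for dE in (-1.0, 0.0, 1.0, 5.0):
    print(f"dE = {dE:+.1f}   Pr(flip) at T = 0.5: {flip_probability(dE, 0.5):.4f}")
```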

Stability
Are stochastic Hopfield nets stable? Thermal noise prevents absolute stability of any single state.
But with symmetric weights the net relaxes to a stationary (Boltzmann-Gibbs) equilibrium distribution over states, so statistical averages become stable.

Does "Thermal Noise" Improve Memory Performance?
Experiments by Bar-Yam (pp. 316-20):
n = 100 units, p = 8 imprinted patterns, random initial state.
To allow convergence, after 20 cycles set T = 0.
How often does the net converge to an imprinted pattern?

Probability of a Random State Converging on an Imprinted State (n = 100, p = 8)
Since the net will never reach a stable state while T > 0, it is evolved for 20 steps at T = 1/β and then 5 steps at T = 0.
(Bar-Yam pp. 316-20, fig. 2.2.8; figure from Bar-Yam)
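Below is a hedged sketch of this experiment. The values n, p, the sweep counts, and the temperatures follow the slides; the Hebbian imprinting rule, the asynchronous update order, and the exact convergence test are my assumptions, not Bar-Yam's code.

```python
import numpy as np

def run_trial(W, patterns, T, rng, hot_sweeps=20, cold_sweeps=5):
    """One trial: random start, hot sweeps at temperature T, then cold sweeps at T = 0."""
    n = W.shape[0]
    s = rng.choice([-1, 1], size=n)
    for sweep in range(hot_sweeps + cold_sweeps):
        temp = T if sweep < hot_sweeps else 0.0
        for i in rng.permutation(n):               # asynchronous updates in random order
            h = W[i] @ s
            if temp > 0:
                s[i] = 1 if rng.random() < 1.0 / (1.0 + np.exp(-2.0 * h / temp)) else -1
            else:
                s[i] = 1 if h >= 0 else -1         # deterministic update at T = 0
    # did it end on an imprinted pattern (or its mirror image)?
    return any(abs(pat @ s) == n for pat in patterns)

n, p, trials = 100, 8, 20
rng = np.random.default_rng(1)
patterns = rng.choice([-1, 1], size=(p, n))
W = (patterns.T @ patterns) / n                    # Hebbian imprinting
np.fill_diagonal(W, 0)

for T in (0.0, 0.2, 0.5, 1.0):
    hits = sum(run_trial(W, patterns, T, rng) for _ in range(trials))
    print(f"T = {T:4.1f}   P(converge to imprinted) ~= {hits / trials:.2f}")
```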

Probability of a Random State Converging on an Imprinted State (n = 100, p = 8), continued
(Bar-Yam pp. 316-20, fig. 2.2.8; figure from Bar-Yam)

Analysis of the Stochastic Hopfield Network
A complete analysis was carried out by Daniel J. Amit and colleagues in the mid-1980s.
See D. J. Amit, Modeling Brain Function: The World of Attractor Neural Networks, Cambridge Univ. Press, 1989.
The analysis is beyond the scope of this course.

Phase Diagram
The phase diagram plots the storage load α against the temperature T; its regions are:
(A) Imprinted patterns are the global minima: the desired retrieval states (with the possibility of some wrong bits) coexist with spin-glass states, but the retrieval states are the global minima.
(B) Imprinted patterns are still minima, but the spin-glass states are lower in energy (more stable); retrieval states still exist.
(C) Spin-glass states only: the net has many stable states, but they are all spin-glass states.
(D) All states melt: the spin-glass states melt, and the only solution of the mean-field equations is ⟨S_i⟩ = 0.
Outside of regions A and B the net is not useful as a memory.
The change in properties at the boundary is discontinuous (a first-order phase transition): m, the overlap (ratio of the average state to an imprinted pattern), drops from about 0.97 to 0, except on the T axis (α = 0, T = 1), where the transition is continuous (second-order).
Mixture states exist in a region shaped like A and B combined but smaller in area, because they are higher in energy; the most stable mixtures (the 3-mixtures) extend to T = 0.46, α = 0.03.
(HKP pp. 39-41; figure based on Haykin fig. 8.14, from Domany et al. 1991)

Conceptual Diagrams of the Energy Landscape
(HKP fig. 2.18; figure from Hertz et al., Introduction to the Theory of Neural Computation)

Phase Diagram Detail
Below the T_R curve, replica symmetry breaks down: within the retrieval states, a fraction of the neurons freeze in a spin-glass fashion.
This has little practical effect, but it illustrates the intricacies of the behavior.
(Haykin 313; figure based on Haykin fig. 8.14, from Domany et al. 1991)

Simulated Annealing
(Kirkpatrick, Gelatt & Vecchi, 1983, Science 220: 671-80)

Dilemma
In the early stages of the search we want a high temperature, so that we explore the space and find the basins of the global minimum.
In the later stages we want a low temperature, so that we relax into the global minimum and do not wander away from it.
Solution: decrease the temperature gradually during the search.
[Speaker note: could present the Bar-Yam Question 2.2.2 results next (Bar-Yam pp. 316-20).]

Quenching vs. Annealing
Quenching: rapid cooling of a hot material; may result in defects and brittleness; local order but global disorder; locally low-energy, globally frustrated.
Annealing: slow cooling (or alternating heating and cooling); reaches equilibrium at each temperature; allows global order to emerge; achieves a globally low-energy state.
Examples: metal or glass heated to its melting point.

Multiple Domains
Local coherence, global incoherence.
The arrows might represent crystallization axes; the lowest-energy (most stable) configuration occurs when the axes are aligned.
Red lines indicate mismatched or weak binding, and therefore likely defects.

Moving Domain Boundaries
Note that elevated temperature melts the domain boundaries and allows them to move.

Effect of Moderate Temperature
(figure from Anderson, An Introduction to Neural Networks)

Effect of High Temperature
(figure from Anderson, An Introduction to Neural Networks)

Effect of Low Temperature
(figure from Anderson, An Introduction to Neural Networks)

Annealing Schedule
Controlled decrease of temperature.
The decrease should be slow enough that equilibrium is reached at each temperature.
With sufficiently slow annealing, the global minimum is found with probability 1.
The design of schedules is a topic of research.
Asymptotic convergence was proved by Geman & Geman (1984): they give a sufficient bound on the cooling rate, T_k ≥ T_0 / log(1 + k), but it is too slow to be practical (Haykin 317).
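To see how slow that bound is, the schedule T_k = T_0 / log(1 + k) can be tabulated directly (a small illustrative sketch; T_0 = 10 is chosen arbitrarily):

```python
import math

T0 = 10.0
for k in (2, 10, 100, 10_000, 1_000_000):
    # temperature allowed after k steps under the logarithmic bound
    print(f"k = {k:>9}   T_k = {T0 / math.log(1 + k):.3f}")
```

Even after a million steps the temperature has only dropped to roughly T_0 / 14, which is why the bound is impractical.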

Typical Practical Annealing Schedule
Initial temperature: T_0 high enough that all transitions are allowed.
Exponential cooling: T_{k+1} = αT_k, with typically 0.8 < α < 0.99, and at least 10 accepted transitions at each temperature.
Final temperature: stop after three successive temperatures without the required number of accepted transitions.
Accepted transitions (this is the Metropolis algorithm approach, which Kirkpatrick et al. used): consider a random transition; if ΔE ≤ 0, accept it; if ΔE > 0, accept it with probability exp(-ΔE / T).
(Haykin 318)
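A minimal sketch of this schedule in code; the objective function, the neighbor move, and the parameter values are illustrative placeholders, not part of the slides:

```python
import math
import random

def anneal(energy, neighbor, x0, T0=10.0, alpha=0.9, moves_per_T=100, min_accept=10, patience=3):
    """Metropolis acceptance with exponential cooling T_{k+1} = alpha * T_k.
    Stops after `patience` successive temperatures with fewer than `min_accept`
    accepted transitions."""
    x, E = x0, energy(x0)
    T, stale = T0, 0
    while stale < patience:
        accepted = 0
        for _ in range(moves_per_T):
            y = neighbor(x)
            dE = energy(y) - E
            if dE <= 0 or random.random() < math.exp(-dE / T):   # Metropolis rule
                x, E, accepted = y, E + dE, accepted + 1
        stale = stale + 1 if accepted < min_accept else 0
        T *= alpha                                               # exponential cooling
    return x, E

# Toy usage: a 1-D function with many local minima.
f = lambda x: x * x + 10 * math.sin(3 * x)
step = lambda x: x + random.uniform(-0.5, 0.5)
best_x, best_E = anneal(f, step, x0=5.0)
print(round(best_x, 3), round(best_E, 3))
```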

Summary
Nondirected change (random motion) permits escape from local optima and spurious states.
The pseudo-temperature can be controlled to adjust the relative degree of exploration and exploitation.

Hopfield Network for the Task Assignment Problem
Six tasks to be done (I, II, …, VI) and six agents to do them (A, B, …, F).
The agents can do the tasks at various rates, e.g. A: (10, 5, 4, 6, 5, 1), B: (6, 4, 9, 7, 3, 2), etc.
What is the optimal assignment of tasks to agents?
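As an illustration (see also the NetLogo demo on the next slide), here is a hedged sketch that attacks the assignment problem with simulated annealing. Only the rate rows for agents A and B come from the slide; the remaining rows are made-up filler, and "optimal" is assumed to mean maximizing the total rate.

```python
import math
import random

rates = [
    [10, 5, 4, 6, 5, 1],   # agent A (from the slide)
    [ 6, 4, 9, 7, 3, 2],   # agent B (from the slide)
    [ 3, 7, 2, 8, 6, 4],   # agents C..F: hypothetical filler values
    [ 5, 3, 7, 2, 9, 4],
    [ 4, 2, 5, 6, 8, 7],
    [ 7, 6, 1, 4, 2, 9],
]

def total_rate(assign):                      # assign[agent] = task assigned to that agent
    return sum(rates[a][t] for a, t in enumerate(assign))

assign = list(range(6))                      # start: agent i does task i
T, alpha = 5.0, 0.999
for _ in range(5000):
    i, j = random.sample(range(6), 2)
    candidate = assign[:]
    candidate[i], candidate[j] = candidate[j], candidate[i]     # swap two agents' tasks
    dE = total_rate(assign) - total_rate(candidate)             # "energy" = negative total rate
    if dE <= 0 or random.random() < math.exp(-dE / T):
        assign = candidate
    T *= alpha

print("assignment (agent -> task):", assign, " total rate:", total_rate(assign))
```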

NetLogo Implementation of the Task Assignment Problem
Run TaskAssignment.nlogo.

Additional Bibliography
Anderson, J. A., An Introduction to Neural Networks, MIT Press, 1995.
Arbib, M. (ed.), The Handbook of Brain Theory and Neural Networks, MIT Press, 1995.
Hertz, J., Krogh, A., & Palmer, R. G., Introduction to the Theory of Neural Computation, Addison-Wesley, 1991.
Part IV