
1 Introduction to Neural Networks John Paxton Montana State University Summer 2003

2 Chapter 7: A Sampler of Other Neural Nets
– Optimization problems
– Common extensions
– Adaptive architectures
– Neocognitron

3 I. Optimization Problems: travelling salesperson problem, map coloring, job shop scheduling, RNA secondary structure.

4 Advantages of Neural Nets: can find near-optimal solutions; can handle weak (desirable, but not required) constraints.

5 TSP Topology. Units form a grid in which rows correspond to cities (City A, City B, City C, …) and columns to tour positions (1st, 2nd, 3rd, …). Each row has exactly one unit that is on, and each column has exactly one unit that is on.

6 Boltzmann Machine (Hinton, Sejnowski 1983)
– Can be modelled using Markov chains
– Uses simulated annealing
– Each row is fully interconnected
– Each column is fully interconnected

7 Architecture. Unit u_ij is connected to u_k,j+1 with weight −d_ik; the columns wrap around, so u_i1 is connected to u_kn with weight −d_ik. In addition, each unit (u_11, …, u_1n, u_n1, …, u_nn) carries the weight b, and units in the same row or column are connected with the penalty weight −p.

8 Algorithm
1. Initialize the weights b and p, with p > b and p > the greatest distance between cities. Initialize the temperature T. Initialize the activations of the units to random binary values.

9 Algorithm
2. While the stopping condition is false, do steps 3–8.
3. Do steps 4–7 n^2 times (1 epoch).
4. Choose i and j randomly, 1 <= i, j <= n; u_ij is the candidate to change state.

10 Algorithm
5. Compute the change in consensus c = [1 − 2 u_ij][b + Σ u_km(−p)], where the sum runs over the other units in row i and column j.
6. Compute the probability of accepting the change: a = 1 / (1 + e^(−c/T)).
7. Accept the change if a random number in [0..1] is less than a. If the change is accepted, u_ij = 1 − u_ij.
8. Adjust the temperature: T = 0.95 T.

11 Stopping Condition. No state change for a specified number of epochs, or the temperature reaches a certain value.

12 Example
– T(0) = 20
– ½ of the units are on initially
– b = 60, p = 70
– 10 cities, all distances less than 1
– 200 or fewer epochs to find a stable configuration in 100 random trials
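To make steps 2–8 concrete, here is a minimal Python sketch (not part of the original slides). It uses the parameters from the example above (b = 60, p = 70, T(0) = 20, 10 cities); the random distance matrix, the variable names, and the specific stopping thresholds are illustrative assumptions.

```python
import math
import random

import numpy as np

# Illustrative setup: 10 cities with random symmetric distances < 1 (slide 12).
n = 10
rng = np.random.default_rng(0)
d = rng.random((n, n)) / 2
d = d + d.T                           # symmetric distance matrix, entries < 1
np.fill_diagonal(d, 0.0)

b, p, T = 60.0, 70.0, 20.0            # p > b and p > greatest distance (slide 8)
u = rng.integers(0, 2, size=(n, n))   # u[i, j] = 1 means city i is visited at position j

def consensus_change(i, j):
    """Effect on the consensus of flipping u[i, j] (step 5).

    Units in the same row or column contribute the penalty weight -p (slide 8);
    units in the adjacent columns contribute the distance weights -d_ik (slide 7).
    """
    mates = u[i, :].sum() - u[i, j] + u[:, j].sum() - u[i, j]
    dist = sum(d[i, k] * (u[k, (j + 1) % n] + u[k, (j - 1) % n]) for k in range(n))
    return (1 - 2 * u[i, j]) * (b - p * mates - dist)

epochs_without_change = 0
while epochs_without_change < 10 and T > 0.1:             # stopping condition (slide 11), thresholds assumed
    changed = False
    for _ in range(n * n):                                 # one epoch = n^2 candidate flips (step 3)
        i, j = random.randrange(n), random.randrange(n)    # step 4
        c = consensus_change(i, j)                         # step 5
        a = 1.0 / (1.0 + math.exp(min(-c / T, 700.0)))     # step 6 (argument clamped to avoid overflow)
        if random.random() < a:                            # step 7
            u[i, j] = 1 - u[i, j]
            changed = True
    T *= 0.95                                              # step 8
    epochs_without_change = 0 if changed else epochs_without_change + 1

print(u)   # ideally one 1 per row and per column, defining a tour
```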

13 Other Optimization Architectures
– Continuous Hopfield net
– Gaussian machine
– Cauchy machine: adds noise to the input in an attempt to escape from local minima; a faster annealing schedule can be used as a consequence

14 II. Extensions
– Modified Hebbian learning: find parameters for an optimal surface fit of the training patterns

15 Boltzmann Machine with Learning. Add hidden units. The 2-1-2 net shown (inputs x_1, x_2; hidden unit z_1; outputs y_1, y_2) could be used for simple encoding/decoding (data compression).

16 Simple Recurrent Net. Learns sequential or time-varying patterns; doesn't necessarily have a steady-state output. Four groups of units: input units, context units, hidden units, output units.

17 Architecture. [Figure: input units x_1 … x_n and context units c_1 … c_p feed hidden units z_1 … z_p, which feed output units y_1 … y_m.]

18 Simple Recurrent Net. The context units hold a copy of the previous hidden activations: f(c_i(t)) = f(z_i(t−1)), with f(c_i(0)) = 0.5. Can use backpropagation; can learn strings of characters.

19 Example: Finite State Automaton. 4 input units x_i, 4 output units y_i, 2 hidden units z_i, 2 context units c_i. The symbols are BEGIN, A, B, END.
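A small Python sketch (not from the slides) of the context-unit behaviour described above: context units simply store the previous hidden activations, with c_i(0) = 0.5. The sizes follow the finite-state-automaton example (4 inputs, 2 hidden, 2 context, 4 outputs); the weights are random and untrained, and the weight names are assumptions. Backpropagation training is omitted; only the forward pass through a sequence is shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Sizes from the finite state automaton example: 4 x_i, 2 z_i, 2 c_i, 4 y_i.
n_in, n_hidden, n_out = 4, 2, 4
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.5, size=(n_hidden, n_in))       # input units   -> hidden units
W_ctx = rng.normal(scale=0.5, size=(n_hidden, n_hidden))  # context units -> hidden units
W_out = rng.normal(scale=0.5, size=(n_out, n_hidden))     # hidden units  -> output units

def forward_sequence(xs):
    """Forward pass over a sequence of input vectors."""
    c = np.full(n_hidden, 0.5)             # c_i(0) = 0.5
    outputs = []
    for x in xs:
        z = sigmoid(W_in @ x + W_ctx @ c)  # hidden units see current input plus context
        y = sigmoid(W_out @ z)
        outputs.append(y)
        c = z.copy()                       # context copies hidden state: c_i(t) = z_i(t - 1)
    return outputs

# Example: a one-hot encoding of the symbols BEGIN, A, B, END.
sequence = [np.eye(n_in)[k] for k in (0, 1, 2, 3)]
for y in forward_sequence(sequence):
    print(np.round(y, 2))
```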

20 Backpropagation in Time (Rumelhart, Williams, Hinton 1986). Application: a simple shift register. [Figure: inputs x_1, x_2, hidden unit z_1, outputs y_1, y_2, with the recurrent connections back to x_1 and x_2 fixed at weight 1.]

21 Backpropagation Training for Fully Recurrent Nets Adapts backpropagation to arbitrary connection patterns.

22 III. Adaptive Architectures Probabilistic Neural Net (Specht 1988) Cascade Correlation (Fahlman, Lebiere 1990)

23 Probabilistic Neural Net. Builds its own architecture as training progresses. Chooses class A over class B if h_A c_A f_A(x) > h_B c_B f_B(x), where c_A is the cost of misclassifying an example that actually belongs to class A, and h_A is the a priori probability of an example belonging to class A.

24 Probabilistic Neural Net. f_A(x) is the probability density function for class A; f_A(x) is learned by the net. z_A1 is a pattern unit and f_A is a summation unit. [Figure: inputs x_1 … x_n feed pattern units z_A1 … z_Aj and z_B1 … z_Bk, which feed summation units f_A and f_B and then the decision unit y.]
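A minimal Python sketch (not from the slides) of the decision rule h_A c_A f_A(x) > h_B c_B f_B(x). Each pattern unit responds with a Gaussian kernel centred on one stored training example, and the summation unit averages those responses to estimate f_A(x); the kernel width sigma, the training points, and the equal priors and costs are all illustrative assumptions.

```python
import numpy as np

def class_density(x, patterns, sigma=0.5):
    """Summation unit: average of Gaussian pattern-unit responses centred on training examples.

    sigma is an assumed smoothing width; the slides do not specify the kernel.
    """
    diffs = patterns - x
    return np.mean(np.exp(-np.sum(diffs**2, axis=1) / (2.0 * sigma**2)))

# Illustrative training patterns for classes A and B.
A = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3]])
B = np.array([[1.0, 1.0], [0.9, 0.8], [1.1, 0.9]])

h_A, h_B = 0.5, 0.5   # a priori probabilities (assumed equal)
c_A, c_B = 1.0, 1.0   # misclassification costs (assumed equal)

x = np.array([0.15, 0.2])
score_A = h_A * c_A * class_density(x, A)
score_B = h_B * c_B * class_density(x, B)
print("class A" if score_A > score_B else "class B")   # decision rule from slide 23
```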

25 Cascade Correlation
– Builds its own architecture while training progresses
– Tries to overcome the slow rate of convergence of other neural nets
– Dynamically adds hidden units (as few as possible)
– Trains one layer at a time

26 Cascade Correlation, Stage 1. [Figure: inputs x_0, x_1, x_2 connect directly to outputs y_1, y_2.]

27 Cascade Correlation, Stage 2 (fix the weights into z_1). [Figure: hidden unit z_1 is added; it receives x_0, x_1, x_2 and feeds y_1, y_2.]

28 Cascade Correlation, Stage 3 (fix the weights into z_2). [Figure: hidden unit z_2 is added; it receives x_0, x_1, x_2 and z_1, and feeds y_1, y_2.]

29 Algorithm
1. Train stage 1. If the error is not acceptable, proceed.
2. Train stage 2. If the error is not acceptable, proceed.
3. Etc.
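A rough Python sketch (not from the slides) of the staged structure above: train the output weights, and while the error is unacceptable add one hidden unit, freeze its incoming weights, give its output to the output layer as an extra feature, and retrain. The XOR data, learning rates, and the simplified correlation-driven training of each new hidden unit are assumptions; the full algorithm trains a pool of candidate units to maximise correlation with the residual error.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative data: XOR with a bias input x0 (slides 26-28 show inputs x0, x1, x2).
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
T = np.array([[0.0], [1.0], [1.0], [0.0]])

def train_outputs(F, T, epochs=2000, lr=0.5):
    """One stage: fit the output weights from the current feature matrix F by gradient descent."""
    W = np.zeros((F.shape[1], T.shape[1]))
    for _ in range(epochs):
        Y = sigmoid(F @ W)
        W += lr * F.T @ ((T - Y) * Y * (1 - Y)) / len(F)
    return W, T - sigmoid(F @ W)

F = X                                  # features seen by the output units (inputs only at stage 1)
W, err = train_outputs(F, T)
stage = 1
while np.mean(err**2) > 1e-3 and stage < 6:
    # Add one hidden unit: nudge its weights toward correlation with the residual error,
    # then freeze them (a simplified stand-in for candidate training).
    v = np.zeros(F.shape[1])
    for _ in range(2000):
        z = sigmoid(F @ v)
        grad = F.T @ ((err[:, 0] - err[:, 0].mean()) * z * (1 - z))
        v += 0.5 * grad / len(F)
    F = np.hstack([F, sigmoid(F @ v)[:, None]])   # hidden unit output becomes a new input feature
    W, err = train_outputs(F, T)                  # retrain the output weights at the next stage
    stage += 1

print("stages:", stage, "mse:", float(np.mean(err**2)))
```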

30 IV. Neocognitron (Fukushima, Miyake, Ito 1983)
– Many layers, hierarchical
– Very sparse and localized connections
– Self-organizing
– Supervised learning, layer by layer
– Recognizes handwritten 0, 1, 2, 3, … 9, regardless of position and style

31 Architecture

Layer     # of Arrays   Size
Input     1             19^2
S1 / C1   12 / 8        19^2 / 11^2
S2 / C2   38 / 22       11^2 / 7^2
S3 / C3   32 / 30       7^2 / 7^2
S4 / C4   16 / 10       3^2 / 1^2

32 Architecture. S layers respond to patterns; C layers combine results and use a larger field of view. For example, the first S1 array responds to the 3 × 3 pattern
0 0 0
1 1 1
0 0 0
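A small Python sketch (not from the slides) of how a plane of S cells sharing the 3 × 3 feature above could respond: each cell looks at a local 3 × 3 field of the layer below and fires only if that field matches its template. The normalized-correlation threshold used here is a simplified stand-in for the actual S-cell response formula; the image, threshold, and function names are assumptions.

```python
import numpy as np

# The 3x3 feature from slide 32: a horizontal bar.
template = np.array([[0, 0, 0],
                     [1, 1, 1],
                     [0, 0, 0]], dtype=float)

def s_cell_plane(image, template, threshold=0.99):
    """Response plane of S cells sharing one template; each cell sees a local 3x3 field."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for r in range(h - 2):
        for c in range(w - 2):
            patch = image[r:r + 3, c:c + 3]
            denom = np.linalg.norm(patch) * np.linalg.norm(template)
            # Fire (1.0) only when the local field closely matches the template.
            out[r, c] = float(denom > 0 and np.sum(patch * template) / denom >= threshold)
    return out

# Tiny example: a 7x7 image containing one horizontal bar.
img = np.zeros((7, 7))
img[3, 2:5] = 1.0
print(s_cell_plane(img, template))
```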

33 Training
– Progresses layer by layer
– S1-to-C1 connections are fixed
– C1-to-S2 connections are adaptable
– A V2 layer is introduced between C1 and S2; V2 is inhibitory
– C1-to-V2 connections are fixed
– V2-to-S2 connections are adaptable

