1 Introduction to Neural Networks John Paxton Montana State University Summer 2003

2 Chapter 4: Competition Force a decision (yes, no, maybe) to be made. Winner-take-all is a common approach. Kohonen learning: w_j(new) = w_j(old) + α(x – w_j(old)), where w_j is the closest weight vector to x, determined by Euclidean distance.
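A minimal sketch of this winner-take-all update in Python (the array values and α are illustrative, not from the slides):

```python
# Winner-take-all competition with the Kohonen learning rule.
import numpy as np

def kohonen_update(weights, x, alpha=0.5):
    """weights: (m, n) array, one weight vector per cluster unit.
    The winner is the unit whose weight vector is closest to x
    (Euclidean distance); only the winner is moved toward x."""
    distances = np.sum((weights - x) ** 2, axis=1)
    j = np.argmin(distances)                  # winner take all
    weights[j] += alpha * (x - weights[j])    # w_j(new) = w_j(old) + alpha(x - w_j(old))
    return j, weights

# one update step on made-up data
w = np.array([[0.2, 0.8], [0.9, 0.1]])
winner, w = kohonen_update(w, np.array([1.0, 0.0]), alpha=0.5)
print(winner, w)
```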

3 MaxNet Lippman, 1987 Fixed-weight competitive net. Activation function f(x) = x if x > 0, else 0. Architecture: two units a_1 and a_2, each with a self-connection of weight 1 and a mutual inhibitory connection of weight -ε.

4 Algorithm
1. w_ij = 1 if i = j, otherwise -ε
2. a_j(0) = s_j, t = 0
3. a_j(t+1) = f[a_j(t) – ε * Σ_{k≠j} a_k(t)]
4. go to step 3 if more than one node has a non-zero activation
Special case: if more than one node has the same maximum activation, the net cannot break the tie.

5 Example s_1 = .5, s_2 = .1, ε = .1
a_1(0) = .5, a_2(0) = .1
a_1(1) = .49, a_2(1) = .05
a_1(2) = .485, a_2(2) = .001
a_1(3) = .4849, a_2(3) = 0
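A minimal sketch of the MaxNet algorithm in Python, assuming the fixed weights from step 1 (1 on the diagonal, -ε elsewhere); with s = (.5, .1) and ε = .1 it reproduces the trace above:

```python
import numpy as np

def maxnet(s, eps=0.1, max_steps=100):
    a = np.array(s, dtype=float)
    for _ in range(max_steps):
        # a_j(t+1) = f[a_j(t) - eps * sum_{k != j} a_k(t)], with f(x) = max(x, 0)
        a = np.maximum(0.0, a - eps * (a.sum() - a))
        if np.count_nonzero(a) <= 1:      # stop when at most one winner remains
            break
    return a

print(maxnet([0.5, 0.1]))   # only a_1 stays positive
```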

6 Mexican Hat Kohonen, 1989 Contrast enhancement. Architecture (w_0, w_1, w_2, w_3): w_0 connects x_i to itself, w_1 connects x_{i+1} and x_{i-1} to x_i, and so on. Across the neighbors x_{i-3} … x_{i+3}, the signs of the connections into x_i are 0 - + + + - 0 (excitatory nearby, inhibitory farther out, zero beyond).

7 Algorithm
1. initialize weights
2. x_i(0) = s_i
3. for some number of steps do
4. x_i(t+1) = f[ Σ_k w_k x_{i+k}(t) ]
5. x_i(t+1) = max(0, x_i(t+1))

8 Example x_1, x_2, x_3, x_4, x_5 radius 0 weight = 1, radius 1 weight = 1, radius 2 weight = -.5, all other radii weights = 0 s = (0 .5 1 .5 0) f(x) = 0 if x < 0, x if 0 <= x <= 2, 2 otherwise

9 Example x(0) = (0 .5 1 .5 0)
x_1(1) = 1(0) + 1(.5) - .5(1) = 0
x_2(1) = 1(0) + 1(.5) + 1(1) - .5(.5) = 1.25
x_3(1) = -.5(0) + 1(.5) + 1(1) + 1(.5) - .5(0) = 2.0
x_4(1) = 1.25
x_5(1) = 0
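A minimal sketch of the Mexican Hat iteration in Python, assuming activations outside the row of units are treated as 0; one step on s = (0 .5 1 .5 0) reproduces the values above:

```python
import numpy as np

def mexican_hat_step(x, weights_by_radius, x_max=2.0):
    """weights_by_radius[r] is the weight for units at distance r; others are 0."""
    n = len(x)
    new_x = np.zeros(n)
    for i in range(n):
        total = 0.0
        for k, xk in enumerate(x):
            r = abs(i - k)
            if r < len(weights_by_radius):
                total += weights_by_radius[r] * xk
        # f(x) = 0 if x < 0, x if 0 <= x <= x_max, x_max otherwise
        new_x[i] = min(max(total, 0.0), x_max)
    return new_x

x0 = np.array([0.0, 0.5, 1.0, 0.5, 0.0])
print(mexican_hat_step(x0, [1.0, 1.0, -0.5]))   # [0. 1.25 2. 1.25 0.]
```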

10 Why the name? Plot x(0) vs. x(1) across units x_1 … x_5 (activations between 0 and 2): the enhanced profile is peaked in the middle and suppressed toward the edges, like the cross-section of a Mexican hat.

11 Hamming Net Lippman, 1987 Maximum likelihood classifier. The similarity of two vectors is taken to be n – H(v_1, v_2), where H is the Hamming distance. Uses MaxNet with this similarity metric.

12 Architecture Concrete example: three input units x_1, x_2, x_3 feed two units y_1, y_2, whose outputs are passed into a MaxNet.

13 Algorithm
1. w_ij = s_i(j)/2
2. n is the dimensionality of a vector
3. y_in.j = Σ_i x_i w_ij + (n/2)
4. select max(y_in.j) using MaxNet

14 Example Training examples: (1 1 1), (-1 -1 -1), n = 3
y_in.1 = 1(.5) + 1(.5) + 1(.5) + 1.5 = 3
y_in.2 = 1(-.5) + 1(-.5) + 1(-.5) + 1.5 = 0
These last 2 quantities are the similarities n – H with each training example. They are then fed into MaxNet.
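A minimal sketch of the Hamming net in Python; for input (1 1 1) it yields similarities (3, 0), and the winner is then selected (here simply with argmax, where the slides feed these values into MaxNet):

```python
import numpy as np

def hamming_net(examples, x):
    """examples: stored bipolar vectors; x: input vector.
    Returns y_in.j = sum_i x_i * s_i(j)/2 + n/2 = n - HammingDistance(x, s(j))."""
    S = np.array(examples, dtype=float)
    n = S.shape[1]
    w = S.T / 2.0                       # w_ij = s_i(j) / 2
    return x @ w + n / 2.0

y_in = hamming_net([(1, 1, 1), (-1, -1, -1)], np.array([1.0, 1.0, 1.0]))
print(y_in)                  # [3. 0.]
print(np.argmax(y_in))       # index of the most similar stored example
```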

15 Kohonen Self-Organizing Maps Kohonen, 1989 Maps inputs onto one of m clusters. Human brains seem to be able to self-organize.

16 Architecture: input units x_1 … x_n are fully connected to cluster units y_1 … y_m.

17 Neighborhoods Linear (distances from the winning unit #): 3 2 1 # 1 2 3
Rectangular (distances from the winning unit #):
2 2 2 2 2
2 1 1 1 2
2 1 # 1 2
2 1 1 1 2
2 2 2 2 2

18 Algorithm
1. initialize w_ij
2. select topology of y_i
3. select learning rate parameters
4. while stopping criteria not reached
5. for each input vector do
6. compute D(j) = Σ_i (w_ij – x_i)^2 for each j

19 Algorithm (continued)
7. select minimum D(j)
8. update neighborhood units: w_ij(new) = w_ij(old) + α[x_i – w_ij(old)]
9. update α
10. reduce radius of neighborhood at specified times
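A minimal Python sketch of steps 1-10, assuming a neighborhood radius of 0 (only the winner is updated), which is what the example on the following slides uses; the decay schedule and epoch count are illustrative:

```python
import numpy as np

def train_som(data, m, alpha=0.6, alpha_decay=0.5, epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    n = data.shape[1]
    w = rng.random((n, m))                            # w_ij: input i -> cluster j
    for _ in range(epochs):
        for x in data:
            d = ((w - x[:, None]) ** 2).sum(axis=0)   # D(j) = sum_i (w_ij - x_i)^2
            j = np.argmin(d)                          # winning cluster unit
            w[:, j] += alpha * (x - w[:, j])          # update winner (radius 0)
        alpha *= alpha_decay                          # update alpha
    return w

data = np.array([[1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 1]], dtype=float)
print(train_som(data, m=2))
```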

20 Example Place (1 1 0 0), (0 0 0 1), (1 0 0 0), (0 0 1 1) into two clusters
α(0) = .6, α(t+1) = .5 * α(t)
random initial weights (rows = inputs, columns = clusters):
.2 .8
.6 .4
.5 .7
.9 .3

21 Example Present (1 1 0 0) D(1) = (.2 – 1)^2 + (.6 – 1)^2 + (.5 – 0)^2 + (.9 – 0)^2 = 1.86 D(2) = .98 D(2) wins!

22 Example w_i2(new) = w_i2(old) + .6[x_i – w_i2(old)]
Updated weights (only the cluster 2 column changes):
.2 .92 (bigger)
.6 .76 (bigger)
.5 .28 (smaller)
.9 .12 (smaller)
This example assumes no neighborhood

23 Example After many epochs the weights approach (rows = inputs, columns = clusters):
0 1
0 .5
.5 0
1 0
(1 1 0 0) -> category 2
(0 0 0 1) -> category 1
(1 0 0 0) -> category 2
(0 0 1 1) -> category 1

24 Applications Grouping characters Travelling Salesperson Problem –Cluster units can be represented graphically by weight vectors –Linear neighborhoods can be used with the first and last cluster units connected

25 Learning Vector Quantization Kohonen, 1989 Supervised learning There can be several output units per class

26 Architecture Like Kohonen nets, but no topology for output units. Each y_i represents a known class. Input units x_1 … x_n are fully connected to output units y_1 … y_m.

27 Algorithm
1. initialize the weights (e.g. the first m training examples, or random)
2. choose α
3. while stopping criteria not reached do (e.g. a number of iterations, or α is very small)
4. for each training vector do

28 Algorithm (continued)
5. find minimum ||x – w_j||
6. if w_j belongs to the target class: w_j(new) = w_j(old) + α[x – w_j(old)], else: w_j(new) = w_j(old) – α[x – w_j(old)]
7. reduce α
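A minimal Python sketch of the LVQ algorithm, assuming the weights are seeded with the first m training examples (as suggested on the previous slide) and an illustrative decay schedule for α:

```python
import numpy as np

def train_lvq(X, labels, m, alpha=0.1, epochs=10, decay=0.9):
    w = X[:m].astype(float).copy()        # one weight vector per output unit
    w_labels = labels[:m]                 # class represented by each output unit
    for _ in range(epochs):
        for x, c in zip(X, labels):
            j = np.argmin(((w - x) ** 2).sum(axis=1))   # closest weight vector
            if w_labels[j] == c:
                w[j] += alpha * (x - w[j])   # move toward x if class is correct
            else:
                w[j] -= alpha * (x - w[j])   # move away from x if class is wrong
        alpha *= decay                       # reduce alpha
    return w, w_labels

X = np.array([[1, 1, -1, -1], [-1, -1, -1, 1], [-1, -1, 1, 1],
              [1, -1, -1, -1], [-1, 1, 1, -1]], dtype=float)
labels = np.array([1, 2, 2, 1, 2])
w, w_labels = train_lvq(X, labels, m=2)
print(w, w_labels)
```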

29 Example
(1 1 -1 -1) belongs to category 1
(-1 -1 -1 1) belongs to category 2
(-1 -1 1 1) belongs to category 2
(1 -1 -1 -1) belongs to category 1
(-1 1 1 -1) belongs to category 2
2 output units: y_1 represents category 1 and y_2 represents category 2

30 Example Initial weights (the first two training examples, one per category):
w_1 = (1 1 -1 -1), w_2 = (-1 -1 -1 1)
α = .1

31 Example Present training example 3, (-1 -1 1 1). It belongs to category 2. D(1) = (1 + 1)^2 + (1 + 1)^2 + (-1 – 1)^2 + (-1 – 1)^2 = 16 D(2) = 4 Category 2 wins. That is correct!

32 Example w_2(new) = (-1 -1 -1 1) + .1[(-1 -1 1 1) – (-1 -1 -1 1)] = (-1 -1 -.8 1)

33 Issues How many y_i should be used? How should we choose the class that each y_i represents? LVQ2 and LVQ3 are enhancements to LVQ that sometimes also modify the runner-up.

34 Counterpropagation Hecht-Nielsen, 1987 There are input, output, and clustering layers Can be used to compress data Can be used to approximate functions Can be used to associate patterns

35 Stages Stage 1: Cluster input vectors Stage 2: Adapt weights from cluster units to output units

36 Stage 1 Architecture: input units x_1 … x_n (weights w, e.g. w_11) and output units y_1 … y_m (weights v, e.g. v_11) both connect to the cluster units z_1 … z_p.

37 Stage 2 Architecture: the winning cluster unit z_j connects to the output units x*_1 … x*_n (weights t, e.g. t_j1) and y*_1 … y*_m (weights v, e.g. v_j1).

38 Full Counterpropagation Stage 1 Algorithm
1. initialize weights, α
2. while stopping criteria is false do
3. for each training vector pair do
4. find the cluster unit j that minimizes ||x – w_j|| + ||y – v_j||, then update: w_j(new) = w_j(old) + α[x – w_j(old)], v_j(new) = v_j(old) + α[y – v_j(old)]
5. reduce α

39 Stage 2 Algorithm
1. while stopping criteria is false
2. for each training vector pair do
3. perform step 4 above (find the winning cluster unit j)
4. t_j(new) = t_j(old) + α[x – t_j(old)], v_j(new) = v_j(old) + α[y – v_j(old)]
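A minimal Python sketch of both stages of full counterpropagation, assuming a single learning rate α for every update (the slides do not name separate rates) and a fixed number of epochs as the stopping criterion:

```python
import numpy as np

def train_full_cpn(X, Y, p, alpha=0.3, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.random((p, X.shape[1]))   # x -> cluster weights
    v = rng.random((p, Y.shape[1]))   # y -> cluster weights
    t = rng.random((p, X.shape[1]))   # cluster -> x* weights
    u = rng.random((p, Y.shape[1]))   # cluster -> y* weights (called v_j on the slide)
    for _ in range(epochs):           # Stage 1: cluster the (x, y) pairs
        for x, y in zip(X, Y):
            j = np.argmin(((w - x) ** 2).sum(1) + ((v - y) ** 2).sum(1))
            w[j] += alpha * (x - w[j])
            v[j] += alpha * (y - v[j])
    for _ in range(epochs):           # Stage 2: adapt cluster -> output weights
        for x, y in zip(X, Y):
            j = np.argmin(((w - x) ** 2).sum(1) + ((v - y) ** 2).sum(1))
            t[j] += alpha * (x - t[j])
            u[j] += alpha * (y - u[j])
    return w, v, t, u

# approximate y = 1/x with 10 cluster units, as in the partial example below
X = np.linspace(0.1, 10.0, 50).reshape(-1, 1)
w, v, t, u = train_full_cpn(X, 1.0 / X, p=10)
j = int(np.argmin(((w - 3.0) ** 2).sum(1)))   # present only x = 3.0 at test time
print(u[j])                                   # predicted y, roughly 1/3
```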

40 Partial Example Approximate y = 1/x over [0.1, 10.0]: 1 x unit, 1 y unit, 10 z units, 1 x* unit, 1 y* unit

41 Partial Example v_11 = .11, w_11 = 9.0; v_12 = .14, w_12 = 7.0; … ; v_10,1 = 9.0, w_10,1 = .11 Test with .12, predict 9.0. In this example, the output weights will converge to the cluster weights.

42 Forward Only Counterpropagation Sometimes the function y = f(x) is not invertible. Architecture (only 1 z unit active): input units x_1 … x_n connect to cluster units z_1 … z_p, which connect to output units y_1 … y_m.

43 Stage 1 Algorithm
1. initialize weights, α (.1), β (.6)
2. while stopping criteria is false do
3. for each input vector do
4. find minimum ||x – w||, then w(new) = w(old) + α[x – w(old)]
5. reduce α

44 Stage 2 Algorithm
1. while stopping criteria is false do
2. for each training vector pair do
3. find minimum ||x – w||, then w(new) = w(old) + α[x – w(old)], v(new) = v(old) + β[y – v(old)]
4. reduce β
Note: interpolation is possible.
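A minimal Python sketch of forward-only counterpropagation, assuming α = .1 and β = .6 as on the Stage 1 slide and a fixed number of epochs as the stopping criterion:

```python
import numpy as np

def train_forward_cpn(X, Y, p, alpha=0.1, beta=0.6, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.uniform(X.min(), X.max(), (p, X.shape[1]))  # x -> cluster weights
    v = np.zeros((p, Y.shape[1]))                       # cluster -> y weights
    for _ in range(epochs):                             # Stage 1: cluster on x only
        for x in X:
            j = np.argmin(((w - x) ** 2).sum(1))
            w[j] += alpha * (x - w[j])
    for _ in range(epochs):                             # Stage 2: learn the outputs
        for x, y in zip(X, Y):
            j = np.argmin(((w - x) ** 2).sum(1))
            w[j] += alpha * (x - w[j])
            v[j] += beta * (y - v[j])
    return w, v

X = np.linspace(0.1, 10.0, 100).reshape(-1, 1)
w, v = train_forward_cpn(X, 1.0 / X, p=10)
j = int(np.argmin(((w - 4.0) ** 2).sum(1)))
print(v[j])    # approximate y for x = 4.0 (only one z unit active)
```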

45 Example y = f(x) over [0.1, 10.0], 10 z_i units. After phase 1, z_i = 0.5, 1.5, …, 9.5. After phase 2, z_i = 5.5, 0.75, …, 0.1

