1 Introduction to Neural Networks John Paxton Montana State University Summer 2003

2 Chapter 4: Competition Force a decision (yes, no, maybe) to be made. Winner-take-all is a common approach. Kohonen learning: w_j(new) = w_j(old) + α(x – w_j(old)), where w_j is the closest weight vector to x, determined by Euclidean distance.
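A minimal sketch of this winner-take-all update in Python (the array values and α are illustrative, not from the slides):

```python
# Winner-take-all competition with the Kohonen learning rule.
import numpy as np

def kohonen_update(weights, x, alpha=0.5):
    """weights: (m, n) array, one weight vector per cluster unit.
    The winner is the unit whose weight vector is closest to x
    (Euclidean distance); only the winner is moved toward x."""
    distances = np.sum((weights - x) ** 2, axis=1)
    j = np.argmin(distances)                  # winner take all
    weights[j] += alpha * (x - weights[j])    # w_j(new) = w_j(old) + alpha(x - w_j(old))
    return j, weights

# one update step on made-up data
w = np.array([[0.2, 0.8], [0.9, 0.1]])
winner, w = kohonen_update(w, np.array([1.0, 0.0]), alpha=0.5)
print(winner, w)
```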

3 MaxNet Lippman, 1987 Fixed-weight competitive net. Activation function f(x) = x if x > 0, else 0. Architecture: two units a_1 and a_2, each with a self-connection of weight 1 and a mutual inhibitory connection of weight -ε.

4 Algorithm
1. w_ij = 1 if i = j, otherwise -ε
2. a_j(0) = s_j, t = 0
3. a_j(t+1) = f[a_j(t) – ε * Σ_{k≠j} a_k(t)]
4. go to step 3 if more than one node has a non-zero activation
Special case: if more than one node has the same maximum activation, the net cannot break the tie.

5 Example s_1 = .5, s_2 = .1, ε = .1
a_1(0) = .5, a_2(0) = .1
a_1(1) = .49, a_2(1) = .05
a_1(2) = .485, a_2(2) = .001
a_1(3) = .4849, a_2(3) = 0
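A minimal sketch of the MaxNet algorithm in Python, assuming the fixed weights from step 1 (1 on the diagonal, -ε elsewhere); with s = (.5, .1) and ε = .1 it reproduces the trace above:

```python
import numpy as np

def maxnet(s, eps=0.1, max_steps=100):
    a = np.array(s, dtype=float)
    for _ in range(max_steps):
        # a_j(t+1) = f[a_j(t) - eps * sum_{k != j} a_k(t)], with f(x) = max(x, 0)
        a = np.maximum(0.0, a - eps * (a.sum() - a))
        if np.count_nonzero(a) <= 1:      # stop when at most one winner remains
            break
    return a

print(maxnet([0.5, 0.1]))   # only a_1 stays positive
```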

6 Mexican Hat Kohonen, 1989 Contrast enhancement. Architecture (w_0, w_1, w_2, w_3): w_0 connects x_i to itself, w_1 connects x_{i+1} and x_{i-1} to x_i, and so on. Across the neighbors x_{i-3} … x_{i+3}, the signs of the connections into x_i are 0 - + + + - 0 (excitatory nearby, inhibitory farther out, zero beyond).

7 Algorithm
1. initialize weights
2. x_i(0) = s_i
3. for some number of steps do
4. x_i(t+1) = f[ Σ_k w_k x_{i+k}(t) ]
5. x_i(t+1) = max(0, x_i(t+1))

8 Example x_1, x_2, x_3, x_4, x_5 radius 0 weight = 1, radius 1 weight = 1, radius 2 weight = -.5, all other radii weights = 0 s = (0 .5 1 .5 0) f(x) = 0 if x < 0, x if 0 <= x <= 2, 2 otherwise

9 Example x(0) = (0 .5 1 .5 0)
x_1(1) = 1(0) + 1(.5) - .5(1) = 0
x_2(1) = 1(0) + 1(.5) + 1(1) - .5(.5) = 1.25
x_3(1) = -.5(0) + 1(.5) + 1(1) + 1(.5) - .5(0) = 2.0
x_4(1) = 1.25
x_5(1) = 0
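A minimal sketch of the Mexican Hat iteration in Python, assuming activations outside the row of units are treated as 0; one step on s = (0 .5 1 .5 0) reproduces the values above:

```python
import numpy as np

def mexican_hat_step(x, weights_by_radius, x_max=2.0):
    """weights_by_radius[r] is the weight for units at distance r; others are 0."""
    n = len(x)
    new_x = np.zeros(n)
    for i in range(n):
        total = 0.0
        for k, xk in enumerate(x):
            r = abs(i - k)
            if r < len(weights_by_radius):
                total += weights_by_radius[r] * xk
        # f(x) = 0 if x < 0, x if 0 <= x <= x_max, x_max otherwise
        new_x[i] = min(max(total, 0.0), x_max)
    return new_x

x0 = np.array([0.0, 0.5, 1.0, 0.5, 0.0])
print(mexican_hat_step(x0, [1.0, 1.0, -0.5]))   # [0. 1.25 2. 1.25 0.]
```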

10 Why the name? Plot x(0) vs. x(1) across units x_1 … x_5 (activations between 0 and 2): the enhanced profile is peaked in the middle and suppressed toward the edges, like the cross-section of a Mexican hat.

11 Hamming Net Lippman, 1987 Maximum likelihood classifier. The similarity of two vectors is taken to be n – H(v_1, v_2), where H is the Hamming distance. Uses MaxNet with this similarity metric.

12 Architecture Concrete example: three input units x_1, x_2, x_3 feed two units y_1, y_2, whose outputs are passed into a MaxNet.

13 Algorithm
1. w_ij = s_i(j)/2
2. n is the dimensionality of a vector
3. y_in.j = Σ_i x_i w_ij + (n/2)
4. select max(y_in.j) using MaxNet

14 Example Training examples: (1 1 1), (-1 -1 -1), n = 3
y_in.1 = 1(.5) + 1(.5) + 1(.5) + 1.5 = 3
y_in.2 = 1(-.5) + 1(-.5) + 1(-.5) + 1.5 = 0
These last 2 quantities are the similarities n – H with each training example. They are then fed into MaxNet.
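A minimal sketch of the Hamming net in Python; for input (1 1 1) it yields similarities (3, 0), and the winner is then selected (here simply with argmax, where the slides feed these values into MaxNet):

```python
import numpy as np

def hamming_net(examples, x):
    """examples: stored bipolar vectors; x: input vector.
    Returns y_in.j = sum_i x_i * s_i(j)/2 + n/2 = n - HammingDistance(x, s(j))."""
    S = np.array(examples, dtype=float)
    n = S.shape[1]
    w = S.T / 2.0                       # w_ij = s_i(j) / 2
    return x @ w + n / 2.0

y_in = hamming_net([(1, 1, 1), (-1, -1, -1)], np.array([1.0, 1.0, 1.0]))
print(y_in)                  # [3. 0.]
print(np.argmax(y_in))       # index of the most similar stored example
```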

15 Kohonen Self-Organizing Maps Kohonen, 1989 Maps inputs onto one of m clusters. Human brains seem to be able to self-organize.

16 Architecture: input units x_1 … x_n are fully connected to cluster units y_1 … y_m.

17 Neighborhoods Linear (distances from the winning unit #): 3 2 1 # 1 2 3
Rectangular (distances from the winning unit #):
2 2 2 2 2
2 1 1 1 2
2 1 # 1 2
2 1 1 1 2
2 2 2 2 2

18 Algorithm
1. initialize w_ij
2. select topology of y_i
3. select learning rate parameters
4. while stopping criteria not reached
5. for each input vector do
6. compute D(j) = Σ_i (w_ij – x_i)^2 for each j

19 Algorithm (continued)
7. select minimum D(j)
8. update neighborhood units: w_ij(new) = w_ij(old) + α[x_i – w_ij(old)]
9. update α
10. reduce radius of neighborhood at specified times
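A minimal Python sketch of steps 1-10, assuming a neighborhood radius of 0 (only the winner is updated), which is what the example on the following slides uses; the decay schedule and epoch count are illustrative:

```python
import numpy as np

def train_som(data, m, alpha=0.6, alpha_decay=0.5, epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    n = data.shape[1]
    w = rng.random((n, m))                            # w_ij: input i -> cluster j
    for _ in range(epochs):
        for x in data:
            d = ((w - x[:, None]) ** 2).sum(axis=0)   # D(j) = sum_i (w_ij - x_i)^2
            j = np.argmin(d)                          # winning cluster unit
            w[:, j] += alpha * (x - w[:, j])          # update winner (radius 0)
        alpha *= alpha_decay                          # update alpha
    return w

data = np.array([[1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 1]], dtype=float)
print(train_som(data, m=2))
```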

20 Example Place (1 1 0 0), (0 0 0 1), (1 0 0 0), (0 0 1 1) into two clusters
α(0) = .6, α(t+1) = .5 * α(t)
random initial weights (rows = inputs, columns = clusters):
.2 .8
.6 .4
.5 .7
.9 .3

21 Example Present (1 1 0 0) D(1) = (.2 – 1)^2 + (.6 – 1)^2 + (.5 – 0)^2 + (.9 – 0)^2 = 1.86 D(2) = .98 D(2) wins!

22 Example w_i2(new) = w_i2(old) + .6[x_i – w_i2(old)]
Updated weights (only the cluster 2 column changes):
.2 .92 (bigger)
.6 .76 (bigger)
.5 .28 (smaller)
.9 .12 (smaller)
This example assumes no neighborhood

23 Example After many epochs the weights approach (rows = inputs, columns = clusters):
0 1
0 .5
.5 0
1 0
(1 1 0 0) -> category 2
(0 0 0 1) -> category 1
(1 0 0 0) -> category 2
(0 0 1 1) -> category 1

24 Applications Grouping characters Travelling Salesperson Problem –Cluster units can be represented graphically by weight vectors –Linear neighborhoods can be used with the first and last cluster units connected

25 Learning Vector Quantization Kohonen, 1989 Supervised learning There can be several output units per class

26 Architecture Like Kohonen nets, but no topology for output units. Each y_i represents a known class. Input units x_1 … x_n are fully connected to output units y_1 … y_m.

27 Algorithm
1. initialize the weights (e.g. the first m training examples, or random)
2. choose α
3. while stopping criteria not reached do (e.g. a number of iterations, or α is very small)
4. for each training vector do

28 Algorithm (continued)
5. find minimum ||x – w_j||
6. if w_j belongs to the target class: w_j(new) = w_j(old) + α[x – w_j(old)], else: w_j(new) = w_j(old) – α[x – w_j(old)]
7. reduce α
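A minimal Python sketch of the LVQ algorithm, assuming the weights are seeded with the first m training examples (as suggested on the previous slide) and an illustrative decay schedule for α:

```python
import numpy as np

def train_lvq(X, labels, m, alpha=0.1, epochs=10, decay=0.9):
    w = X[:m].astype(float).copy()        # one weight vector per output unit
    w_labels = labels[:m]                 # class represented by each output unit
    for _ in range(epochs):
        for x, c in zip(X, labels):
            j = np.argmin(((w - x) ** 2).sum(axis=1))   # closest weight vector
            if w_labels[j] == c:
                w[j] += alpha * (x - w[j])   # move toward x if class is correct
            else:
                w[j] -= alpha * (x - w[j])   # move away from x if class is wrong
        alpha *= decay                       # reduce alpha
    return w, w_labels

X = np.array([[1, 1, -1, -1], [-1, -1, -1, 1], [-1, -1, 1, 1],
              [1, -1, -1, -1], [-1, 1, 1, -1]], dtype=float)
labels = np.array([1, 2, 2, 1, 2])
w, w_labels = train_lvq(X, labels, m=2)
print(w, w_labels)
```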

29 Example
(1 1 -1 -1) belongs to category 1
(-1 -1 -1 1) belongs to category 2
(-1 -1 1 1) belongs to category 2
(1 -1 -1 -1) belongs to category 1
(-1 1 1 -1) belongs to category 2
2 output units: y_1 represents category 1 and y_2 represents category 2

30 Example Initial weights (the first two training examples, one per category):
w_1 = (1 1 -1 -1), w_2 = (-1 -1 -1 1)
α = .1

31 Example Present training example 3, (-1 -1 1 1). It belongs to category 2. D(1) = (1 + 1)^2 + (1 + 1)^2 + (-1 – 1)^2 + (-1 – 1)^2 = 16 D(2) = 4 Category 2 wins. That is correct!

32 Example w_2(new) = (-1 -1 -1 1) + .1[(-1 -1 1 1) – (-1 -1 -1 1)] = (-1 -1 -.8 1)

33 Issues How many y_i should be used? How should we choose the class that each y_i represents? LVQ2 and LVQ3 are enhancements to LVQ that sometimes also modify the runner-up.

34 Counterpropagation Hecht-Nielsen, 1987 There are input, output, and clustering layers Can be used to compress data Can be used to approximate functions Can be used to associate patterns

35 Stages Stage 1: Cluster input vectors Stage 2: Adapt weights from cluster units to output units

36 Stage 1 Architecture: input units x_1 … x_n (weights w, e.g. w_11) and output units y_1 … y_m (weights v, e.g. v_11) both connect to the cluster units z_1 … z_p.

37 Stage 2 Architecture: the winning cluster unit z_j connects to the output units x*_1 … x*_n (weights t, e.g. t_j1) and y*_1 … y*_m (weights v, e.g. v_j1).

38 Full Counterpropagation Stage 1 Algorithm
1. initialize weights, α
2. while stopping criteria is false do
3. for each training vector pair do
4. find the cluster unit j that minimizes ||x – w_j|| + ||y – v_j||, then update: w_j(new) = w_j(old) + α[x – w_j(old)], v_j(new) = v_j(old) + α[y – v_j(old)]
5. reduce α

39 Stage 2 Algorithm
1. while stopping criteria is false
2. for each training vector pair do
3. perform step 4 above (find the winning cluster unit j)
4. t_j(new) = t_j(old) + α[x – t_j(old)], v_j(new) = v_j(old) + α[y – v_j(old)]
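A minimal Python sketch of both stages of full counterpropagation, assuming a single learning rate α for every update (the slides do not name separate rates) and a fixed number of epochs as the stopping criterion:

```python
import numpy as np

def train_full_cpn(X, Y, p, alpha=0.3, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.random((p, X.shape[1]))   # x -> cluster weights
    v = rng.random((p, Y.shape[1]))   # y -> cluster weights
    t = rng.random((p, X.shape[1]))   # cluster -> x* weights
    u = rng.random((p, Y.shape[1]))   # cluster -> y* weights (called v_j on the slide)
    for _ in range(epochs):           # Stage 1: cluster the (x, y) pairs
        for x, y in zip(X, Y):
            j = np.argmin(((w - x) ** 2).sum(1) + ((v - y) ** 2).sum(1))
            w[j] += alpha * (x - w[j])
            v[j] += alpha * (y - v[j])
    for _ in range(epochs):           # Stage 2: adapt cluster -> output weights
        for x, y in zip(X, Y):
            j = np.argmin(((w - x) ** 2).sum(1) + ((v - y) ** 2).sum(1))
            t[j] += alpha * (x - t[j])
            u[j] += alpha * (y - u[j])
    return w, v, t, u

# approximate y = 1/x with 10 cluster units, as in the partial example below
X = np.linspace(0.1, 10.0, 50).reshape(-1, 1)
w, v, t, u = train_full_cpn(X, 1.0 / X, p=10)
j = int(np.argmin(((w - 3.0) ** 2).sum(1)))   # present only x = 3.0 at test time
print(u[j])                                   # predicted y, roughly 1/3
```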

40 Partial Example Approximate y = 1/x over [0.1, 10.0]: 1 x unit, 1 y unit, 10 z units, 1 x* unit, 1 y* unit

41 Partial Example v_11 = .11, w_11 = 9.0; v_12 = .14, w_12 = 7.0; … ; v_10,1 = 9.0, w_10,1 = .11 Test with .12, predict 9.0. In this example, the output weights will converge to the cluster weights.

42 Forward Only Counterpropagation Sometimes the function y = f(x) is not invertible. Architecture (only 1 z unit active): input units x_1 … x_n connect to cluster units z_1 … z_p, which connect to output units y_1 … y_m.

43 Stage 1 Algorithm
1. initialize weights, α (.1), β (.6)
2. while stopping criteria is false do
3. for each input vector do
4. find minimum ||x – w||, then w(new) = w(old) + α[x – w(old)]
5. reduce α

44 Stage 2 Algorithm
1. while stopping criteria is false do
2. for each training vector pair do
3. find minimum ||x – w||, then w(new) = w(old) + α[x – w(old)], v(new) = v(old) + β[y – v(old)]
4. reduce β
Note: interpolation is possible.
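A minimal Python sketch of forward-only counterpropagation, assuming α = .1 and β = .6 as on the Stage 1 slide and a fixed number of epochs as the stopping criterion:

```python
import numpy as np

def train_forward_cpn(X, Y, p, alpha=0.1, beta=0.6, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.uniform(X.min(), X.max(), (p, X.shape[1]))  # x -> cluster weights
    v = np.zeros((p, Y.shape[1]))                       # cluster -> y weights
    for _ in range(epochs):                             # Stage 1: cluster on x only
        for x in X:
            j = np.argmin(((w - x) ** 2).sum(1))
            w[j] += alpha * (x - w[j])
    for _ in range(epochs):                             # Stage 2: learn the outputs
        for x, y in zip(X, Y):
            j = np.argmin(((w - x) ** 2).sum(1))
            w[j] += alpha * (x - w[j])
            v[j] += beta * (y - v[j])
    return w, v

X = np.linspace(0.1, 10.0, 100).reshape(-1, 1)
w, v = train_forward_cpn(X, 1.0 / X, p=10)
j = int(np.argmin(((w - 4.0) ** 2).sum(1)))
print(v[j])    # approximate y for x = 4.0 (only one z unit active)
```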

45 Example y = f(x) over [0.1, 10.0], 10 z_i units. After phase 1, z_i = 0.5, 1.5, …, 9.5. After phase 2, z_i = 5.5, 0.75, …, 0.1

