Neural Network Fall 2013 Comp3710 Artificial Intelligence Computing Science Thompson Rivers University.



2 Neural Network Fall 2013 Comp3710 Artificial Intelligence Computing Science Thompson Rivers University

3 Course Outline
Part I – Introduction to Artificial Intelligence
Part II – Classical Artificial Intelligence
Part III – Machine Learning
Introduction to Machine Learning
Neural Networks
Probabilistic Reasoning and Bayesian Belief Networks
Artificial Life: Learning through Emergent Behavior
Part IV – Advanced Topics

4 A new sort of computer
What are (everyday) computer systems good at... and not so good at? How about humans?
Good at: Rule-based systems – doing what the programmer wants them to do
Not so good at: Dealing with noisy data; dealing with unknown environment data; massive parallelism; fault tolerance; adapting to circumstances

5 Reference
Artificial Intelligence Illuminated, Ben Coppin, Jones and Bartlett Illuminated Series
Many tutorials from the Internet:
Neural Networks with Java: http://fbim.fh-regensburg.de/~saj39122/jfroehl/diplom/e-main.html
Joone – Java Object Oriented Neural Engine: http://www.jooneworld.com/
http://www.ai-junkie.com/ai-junkie.html
http://www.willamette.edu/~gorr/classes/cs449/intro.html
http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html
…

6 Learning Outcomes
Train a perceptron with a training data set.
Update the weights in a feed-forward network with error backpropagation.
Update the weights in a feed-forward network with error backpropagation and the delta rule.
Implement a feed-forward network for a simple pattern recognition application.
List two examples of recurrent networks.
List applications of Hopfield networks, Bidirectional Associative Memories and Kohonen Maps.
Update the weights in a Hebbian network.

7 Chapter Contents
1. Biological Neurons
2. How to model biological neurons – artificial neurons
3. The first neural network – Perceptrons
4. How to overcome the problems in Perceptron networks – Multilayer Neural Networks (feed-forward, backpropagation, backpropagation with delta rule)
5. Can an ANN remember? – Recurrent Networks (Hopfield Networks, Bidirectional Associative Memories)
How to learn without using a training data set:
6. Kohonen Maps
7. Hebbian Learning
8. Fuzzy Neural Networks
9. Evolving Neural Networks

8 1. Biological Neurons
The human brain is made up of about 100 billion simple processing units – neurons. -> parallelism; emergent behavior; ...
Inputs are received on dendrites, and if the input levels are over a threshold, the neuron fires, passing a signal through the axon to a synapse, which then connects to another neuron.
The human brain has a property known as plasticity: neurons can change the nature and number of their connections to other neurons in response to events that occur. In this way, the brain is able to learn.
[Q] How to model? Topics

9 2. Artificial Neurons
Artificial neurons are based on biological neurons. McCulloch and Pitts (1943)
Each neuron in the network receives one or more inputs.
An activation function is applied to the inputs, which determines the output of the neuron – the activation level.
Weights are associated with synapses.

10 Three typical activation functions.

11 A typical activation function, called the step function, works as follows:
Each previous node i has a weight w_i associated with it. The sum of all the weights is 1.
The input from previous node i is x_i; t is the threshold.
If the weighted sum of the inputs to the neuron, Σ_i w_i x_i, is above the threshold t, then the neuron fires (Y = 1); otherwise it does not (Y = 0).
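The rule above can be sketched directly in code. This is a minimal illustration, not the course's own implementation; the function name is a hypothetical choice.

```python
# A single artificial neuron with a step activation function:
# fire (output 1) iff the weighted sum of the inputs exceeds the threshold t.

def step_neuron(inputs, weights, t):
    """Return 1 if the weighted sum of the inputs exceeds threshold t, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > t else 0

# Example: two inputs whose weights sum to 1, threshold t = 0.5
print(step_neuron([1, 0], [0.7, 0.3], 0.5))  # 0.7 > 0.5, so the neuron fires: 1
print(step_neuron([0, 1], [0.7, 0.3], 0.5))  # 0.3 <= 0.5, so it does not: 0
```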

12 There is no central processing or control mechanism: the entire network is involved in every piece of computation that takes place.
The processing time in each artificial neuron is small, unlike in a real biological neuron. Parallelism of a massive number of neurons.
The weight associated with each connection (equivalent to a synapse in the biological brain) can be changed in response to particular sets of inputs and events. In this way, an artificial neural network is able to learn.
[Q] How and when to change the weights? Topics

13 3. Perceptrons
A perceptron is a single neuron that classifies a set of inputs into one of two categories (usually 1 or -1). Rosenblatt (1958)

14 A perceptron is a single neuron that classifies a set of inputs into one of two categories (usually 1 or -1). Rosenblatt (1958)
If the inputs are in the form of a grid, a perceptron can be used to recognize visual images of shapes.
The perceptron usually uses a step function, which returns 1 if the weighted sum of inputs exceeds a threshold, and 0 otherwise.

15 The perceptron is trained as follows:
First, random weights (usually between –0.5 and 0.5) are given.
An item of training data is presented, and the output classification is observed.
If the perceptron misclassifies it, the weights are modified according to the perceptron training rule: w_i ← w_i + (a × x_i × e), where
a is the learning rate, between 0 and 1;
e is the size of the error: 0 if the output is correct, positive if the output is too low, negative if the output is too high.
Training continues until all errors become zero; the weights are then not changed anymore.

16 Example of logical OR for two inputs, with t = 0 and a = 0.2.
Random initial weights are given. For the first item of training data, (0, 0), the expected output is 0 (= 0 OR 0). The output is correct, so the weights are not changed.

17 For the next training data

18 Epoch-by-epoch training for the OR example (t = 0, a = 0.2):

Epoch  x1  x2  Expected Y  Actual Y  Error  w1    w2
1      0   0   0           0         0      -0.2  0.4
1      0   1   1           1         0      -0.2  0.4
1      1   0   1           0         1       0    0.4
1      1   1   1           1         0       0    0.4
2      0   0   0           0         0       0    0.4
2      0   1   1           1         0       0    0.4
2      1   0   1           0         1       0.2  0.4
2      1   1   1           1         0       0.2  0.4
3      0   0   0           0         0       0.2  0.4
3      0   1   1           1         0       0.2  0.4
3      1   0   1           1         0       0.2  0.4
3      1   1   1           1         0       0.2  0.4
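The table above can be reproduced with a short sketch of the perceptron training rule w_i ← w_i + a·x_i·e. The function name and loop structure are illustrative, not from the slides.

```python
# Training a perceptron on logical OR, following the worked example:
# threshold t = 0, learning rate a = 0.2, initial weights w1 = -0.2, w2 = 0.4.
# The step activation fires (outputs 1) only when the weighted sum exceeds t.

def train_perceptron(data, weights, t=0.0, a=0.2, max_epochs=100):
    """Apply the rule w_i <- w_i + a*x_i*e until an error-free epoch occurs."""
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for inputs, expected in data:
            actual = 1 if sum(x * w for x, w in zip(inputs, weights)) > t else 0
            e = expected - actual          # 0, +1 (too low) or -1 (too high)
            if e != 0:
                errors += 1
                weights = [w + a * x * e for x, w in zip(inputs, weights)]
        if errors == 0:
            return weights, epoch          # converged: a full error-free epoch
    return weights, max_epochs

or_data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, epoch = train_perceptron(or_data, [-0.2, 0.4])
print(weights, epoch)   # converges to w1 = 0.2, w2 = 0.4 by epoch 3, as in the table
```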

19 [Q] Can you test the previous system with the inputs (0.1, 1), (-0.7, 0.3), (0, -0.2)?
[Q] Can you develop a system for the Boolean AND operation?
[Q] What is learning in Perceptrons?

20 Perceptrons can only classify linearly separable functions (e.g., AND and OR).
The first of the following graphs shows a linearly separable function (OR). The second is not linearly separable (Exclusive-OR).
Demo – Perceptron Learning Applet
[Q] How to improve? Topics

21 4. Multilayer Neural Networks
Multilayer neural networks can classify a range of functions, including non-linearly separable ones.
Each input layer neuron connects to all neurons in the hidden layer (or layers). The neurons in the hidden layer connect to all neurons in the output layer. A feed-forward network. [Q] How to train?
Not cellular automata. Neurons work synchronously.

22 Demo – XOR, AND, OR
XOR can be written: y = (x1 ∧ ¬x2) ∨ (¬x1 ∧ x2)
Therefore, y = (x1 ∨ x2) ∧ ¬(x1 ∧ x2)
Split:
y1 = x1 ∨ x2
y2 = ¬(x1 ∧ x2)
y = y1 ∧ y2
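The decomposition y = (x1 ∨ x2) ∧ ¬(x1 ∧ x2) can be realized by a two-layer network of step-function neurons. The weights and thresholds below are hand-chosen for illustration (not learned, and not from the slides):

```python
# XOR via a two-layer feed-forward network of step-function neurons.
# Hidden node y1 computes OR, hidden node y2 computes NAND, the output
# node computes AND of the two hidden outputs.

def step(weighted_sum, t):
    return 1 if weighted_sum > t else 0

def xor_net(x1, x2):
    y1 = step(1 * x1 + 1 * x2, 0.5)      # hidden node: x1 OR x2
    y2 = step(-1 * x1 + -1 * x2, -1.5)   # hidden node: NOT (x1 AND x2)
    return step(1 * y1 + 1 * y2, 1.5)    # output node: y1 AND y2

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_net(a, b))   # prints 0, 1, 1, 0 respectively
```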

23 Backpropagation
Multilayer neural networks learn in the same way as perceptrons, but training takes much longer. [Q] Why? There are many more weights, and it is important to assign credit (or blame) correctly when changing weights.
Backpropagation networks use the sigmoid activation function, not a step function, as it is easy to differentiate.
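The sigmoid and its derivative can be sketched as follows. The convenient identity σ′(x) = σ(x)(1 − σ(x)) is what makes the sigmoid "easy to differentiate" during training.

```python
# The sigmoid activation used by backpropagation networks, and its derivative.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)       # sigma'(x) = sigma(x) * (1 - sigma(x))

print(sigmoid(0.0))             # 0.5
print(sigmoid_derivative(0.0))  # 0.25 (the maximum of the derivative)
```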

24 For node j:
X_j is the input
y_j is the output
n is the number of inputs to node j
θ_j is the threshold for j
After values are fed forward through the network, errors are fed back to modify the weights in order to train the network. For each node, we calculate an error gradient.

25 For a node k in the output layer, the error e_k is the difference between the desired output and the actual output. The error gradient for k is δ_k = y_k (1 − y_k) e_k.
Now the weights are updated as follows: w_jk ← w_jk + η y_j δ_k, where η is the learning rate (a positive number below 1).
Similarly, for a node j in the hidden layer: δ_j = y_j (1 − y_j) Σ_k w_jk δ_k.
Known as gradient descent.
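One full forward-and-backward step with these update rules can be sketched on a tiny 2-2-1 network. The weights, learning rate and training pair below are illustrative assumptions, not values from the slides:

```python
# One backpropagation step on a 2-2-1 sigmoid network, following the rules:
# delta_k = y_k(1 - y_k) e_k for the output node,
# delta_j = y_j(1 - y_j) * w_jk * delta_k for each hidden node, and
# w <- w + eta * y * delta for every weight (gradient descent).
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

eta = 0.5                                # learning rate (illustrative)
x = [1.0, 0.0]                           # one training input (illustrative)
target = 1.0                             # desired output
w_hidden = [[0.1, 0.2], [-0.1, 0.3]]     # w_hidden[j][i]: input i -> hidden j
w_out = [0.2, -0.2]                      # w_out[j]: hidden j -> output

# Forward pass
y_hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
y_out = sigmoid(sum(w * y for w, y in zip(w_out, y_hidden)))

# Backward pass: error gradients
e = target - y_out
delta_out = y_out * (1 - y_out) * e
delta_hidden = [y * (1 - y) * w_out[j] * delta_out
                for j, y in enumerate(y_hidden)]

# Weight updates
w_out = [w + eta * y_hidden[j] * delta_out for j, w in enumerate(w_out)]
for j, ws in enumerate(w_hidden):
    w_hidden[j] = [w + eta * x[i] * delta_hidden[j] for i, w in enumerate(ws)]

print(round(y_out, 3))   # the output before the update; the step nudges it toward 1
```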

26 Neural Network25 j w jk w ij k yjyj yjyj xixi l w jl ykyk ylyl

27 Not likely to occur in the human brain.
Learning is too slow: with some simple problems it can take hundreds or even thousands of epochs to reach a satisfactorily low level of error. [Q] Why? Weights are changed too easily.
Can we use a sort of the idea of Simulated Annealing? [Q] How to improve?

28 Backpropagation with Delta Rule
Generalized delta rule: inclusion of momentum, the extent to which a weight was changed on the previous iteration
Hyperbolic tangent instead of sigmoid activation function
…
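The momentum idea can be sketched in a few lines: part of the previous weight change is carried over into the current one, smoothing the descent. The coefficient names and values here are illustrative assumptions:

```python
# Generalized delta rule with momentum: the change applied on the previous
# iteration is partially added to the current gradient step.

eta = 0.5     # learning rate (illustrative)
alpha = 0.9   # momentum coefficient (illustrative)

def delta_rule_update(w, gradient, prev_change):
    """Return the new weight and the change that was applied to it."""
    change = eta * gradient + alpha * prev_change
    return w + change, change

w, change = delta_rule_update(0.2, 0.1, 0.0)
print(w, change)   # first step: a plain gradient step of eta * gradient
w, change = delta_rule_update(w, 0.1, change)
print(w, change)   # second step: momentum adds 0.9 times the previous change
```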

29 Backpropagation
Advantages: It works! Relatively fast.
Downsides: Requires a training set; can be slow; probably not biologically realistic.
Alternatives to backpropagation:
Hebbian learning – not successful in feed-forward nets
Reinforcement learning – only limited success
Artificial evolution – more general, but can be even slower than backpropagation

30 Example: Voice Recognition
Task: Learn to discriminate between two different voices saying "Hello". [Q] How to implement?
Data sources: Steve Simpson, David Raubenheimer
Format: Frequency distribution (60 bins)

31 Network architecture
Feed-forward network: 60 inputs (one for each frequency bin), 6 hidden nodes, 2 outputs (0-1 for "Steve", 1-0 for "David").
[Q] What is the total number of nodes?

32 Presenting the data Steve David

33 Presenting the data (untrained network)
Steve: 0.43, 0.26
David: 0.73, 0.55

34 Calculate error
Steve: |0.43 – 0| = 0.43, |0.26 – 1| = 0.74
David: |0.73 – 1| = 0.27, |0.55 – 0| = 0.55

35 Backpropagate the error and adjust the weights
Steve: errors 0.43 and 0.74, overall error 1.17
David: errors 0.27 and 0.55, overall error 0.82

36 Repeat the process (sweep) for all training pairs: present the data, calculate the error, backpropagate the error, adjust the weights. Repeat the process multiple times.

37 Presenting the data (trained network)
Steve: 0.01, 0.99
David: 0.99, 0.01

38 Results – Voice Recognition
Performance of trained network:
Discrimination accuracy between known "Hello"s: 100%
Discrimination accuracy between new "Hello"s: 100%

39 Results – Voice Recognition (contd.)
Network has learnt to generalise from original data.
Networks with different weight settings can have the same functionality.
Trained networks 'concentrate' on lower frequencies.
Network is robust against non-functioning nodes.

40 Applications of Feed-forward nets
Pattern recognition: character recognition, face recognition
Sonar mine/rock recognition (Gorman & Sejnowski, 1988)
Navigation of a car (Pomerleau, 1989)
Stock-market prediction
Pronunciation (NETtalk) (Sejnowski & Rosenberg, 1987) Topics

41 5. Recurrent Networks
Feed-forward networks do not have memory. [Q] What does it mean? They are acyclic: once a feed-forward network is trained, its state is fixed and does not alter as new input data is presented to it.
Recurrent networks, also called feedback networks, can have arbitrary connections between nodes in any layer, even backward from output nodes to input nodes.
The internal state can alter as sets of input data are presented to it –> a memory. [Q] What does it mean?
Also called memory units.

42 Biological nervous systems show high levels of recurrence (but feed-forward structures exist too).
Recurrent networks can be used to solve problems where the solution depends on previous inputs as well as current inputs (e.g., predicting stock market movements).
Inputs are fed through the network, including feeding data back from outputs to inputs, and this process repeats until the values of the outputs do not change – a state of equilibrium or stability.
The stable values of the network are called fundamental memories. But it is not always the case that a recurrent network reaches a stable state. Recurrent networks are also called attractor networks.

43 Hopfield Networks
A Hopfield network is a recurrent network. It uses a sign activation function: if a neuron receives a 0 as an input, it does not change state – in other words, it continues to output its previous value.
Weights are usually represented as matrices.

44 Three states to learn: [Q] How many input values?
Then, the learning step (with the three states as bipolar vectors X1, X2, X3) is W = X1X1ᵀ + X2X2ᵀ + X3X3ᵀ − 3I; subtracting 3I zeroes the diagonal, since each outer product contributes 1 to it.
The three states will be the stable states for the network.

45 Output vector

46 Output vector (worked example): a test input vector is fed through the weight matrix and the sign activation is applied; applied again, the output no longer changes. [Q] What does this mean?

47 Three steps:
Training the weights with the attractor states as inputs – a storage or memorization stage
Testing
Using the network – acting as a memory to retrieve data from its memory
The network is trained to represent a set of attractors, or stable states. Any input will usually be mapped to the attractor closest to the input. The measure of distance is the Hamming distance, which counts the number of elements in which two vectors differ.
Hence, the Hopfield network is a memory that usually maps an input vector to the memorized vector whose Hamming distance from the input vector is least. In fact, it does not always converge to the state closest to the original input.
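The storage and retrieval stages can be sketched as follows. This is a minimal illustration assuming the standard construction (weights as a sum of outer products with a zero diagonal, synchronous sign updates); the two stored patterns are made up for the example:

```python
# A minimal Hopfield network: train on attractor states, then recall by
# repeatedly applying the sign activation until the state stops changing.

def train(patterns):
    """W = sum of outer products of the stored bipolar patterns, zero diagonal."""
    n = len(patterns[0])
    W = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:                # no self-connections
                    W[i][j] += p[i] * p[j]
    return W

def recall(W, state, max_steps=10):
    """Feed the state through the network until equilibrium (or give up)."""
    for _ in range(max_steps):
        new = []
        for i, row in enumerate(W):
            total = sum(w * s for w, s in zip(row, state))
            # sign activation; an input of 0 leaves the neuron's state unchanged
            new.append(state[i] if total == 0 else (1 if total > 0 else -1))
        if new == state:                  # equilibrium: a stable state
            return new
        state = new
    return state

stored = [[1, 1, 1, -1, -1, -1], [1, -1, 1, 1, -1, 1]]
W = train(stored)
noisy = [-1, 1, 1, -1, -1, -1]            # first stored pattern, one bit flipped
print(recall(W, noisy))                   # recovers the first stored pattern
```

Note that recall minimizes Hamming distance only "usually": with more patterns, spurious stable states appear, matching the caveat on the slide.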

48 [Q] Applications of Hopfield networks? Pattern recognition. [Q] How?
Demo – Pattern recognition: http://www.cbu.edu/~pong/ai/hopfield/hopfieldapplet.html
[Q] How to implement the above demo?
[Q] How is the pattern recognition using Hopfield networks different from the one that uses the backpropagation network in the seminar?

49 A Hopfield network is autoassociative – it can only associate an item with itself or a similar one.
However, the human brain is fully associative, or heteroassociative: one item is able to cause the brain to recall an entirely different item. E.g., fall makes me think of autumn colors.
[Q] How to improve Hopfield networks?

50 Bidirectional Associative Memories
A BAM (Bidirectional Associative Memory) is a heteroassociative memory: like the brain, it can learn to associate one item from one set with a completely unrelated item in another set.
Similar in structure to the Hopfield network, the network consists of two fully connected layers of nodes: every node in one layer is connected to every node in the other layer, but not to any node in the same layer. (In a Hopfield network, there is only one layer and all nodes are interconnected.)
The BAM is guaranteed to produce a stable output for any given inputs and for any training data.

51 The BAM uses a sign activation function. Two sets of data are to be learned, so that when an item from set X is presented to the network, it recalls a corresponding item from set Y. The weight matrix is the sum of the outer products of the associated pairs: W = Σᵢ XᵢYᵢᵀ.
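The construction can be sketched as follows, assuming bipolar vectors and the standard outer-product rule; the two associated pairs are made up for the example:

```python
# A minimal BAM: W = sum of outer products X_i * Y_i^T of the associated
# bipolar pairs; presenting an X recalls its Y via W and a sign activation.

def sign(v, fallback):
    return fallback if v == 0 else (1 if v > 0 else -1)

def train_bam(pairs):
    n, m = len(pairs[0][0]), len(pairs[0][1])
    W = [[0] * m for _ in range(n)]
    for x, y in pairs:
        for i in range(n):
            for j in range(m):
                W[i][j] += x[i] * y[j]
    return W

def recall_y(W, x):
    """Present an item from set X; recall the associated item from set Y."""
    m = len(W[0])
    return [sign(sum(W[i][j] * x[i] for i in range(len(x))), 1)
            for j in range(m)]

pairs = [([1, -1, 1, -1], [1, 1, -1]),
         ([-1, -1, 1, 1], [-1, 1, 1])]
W = train_bam(pairs)
print(recall_y(W, [1, -1, 1, -1]))   # recalls the associated [1, 1, -1]
```

Recall in the other direction (Y back to X) works the same way through the transpose of W, which is what makes the memory bidirectional.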

52 [Q] Can you prove it?

53 [Q] Applications of BAM? Pattern recognition.
Demo – Pattern recognition: http://www.cbu.edu/~pong/ai/bam/bamapplet.html Topics

54 6. Kohonen Maps
Also called SOM (Self-Organizing Feature Map).
An unsupervised learning system: there is no labeled training data; it finds the natural structure inherent in the input data.
The objective of a Kohonen network is to map input vectors (patterns) of arbitrary dimension N onto a discrete map with 1, 2 or 3 dimensions. Patterns close to one another in the input space should be close to one another in the map: they should be topologically ordered.

55 Two layers of nodes: an input layer and a cluster (output) layer.
Uses competitive learning (winner-take-all):
The number of input nodes is equal to the size of the input vectors; the number of cluster nodes is equal to the number of clusters to find.
Each input node is connected to every node in the cluster layer.
Every input is compared with the weight vector of each node in the cluster layer; Euclidean distance is used.
The cluster node that most closely matches the input fires. This node is called the winner; this is the clustering of the input.
The winning node has its weight vector modified to be closer to the input vector.

56 Learning process:
initialize the weights for each cluster unit
loop until weight changes are negligible
    for each input pattern
        present the input pattern
        find the winning cluster unit (i.e., the most similar one to the input pattern)
        find all units in the neighborhood of the winner
        update the weight vectors for all those units
    reduce the size of neighborhoods if required

57 The cluster node j for which the distance d_j between its weight vector and the input is smallest is the winner.
The weights of the winner are updated as w_j ← w_j + α(x − w_j). The weights become similar to the input, i.e., the winner becomes similar to the input. The learning rate α decreases over time.
In fact, a neighborhood of neurons around the winner is usually updated together. The radius defining the neighborhood decreases over time.
The training phase terminates when the modification of weights becomes very small.
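The winner-take-all step can be sketched as follows. For brevity this updates only the winner, not its neighborhood, and the data and initial weights are illustrative assumptions:

```python
# One competitive-learning step: find the cluster node whose weight vector is
# closest (Euclidean distance) to the input, then move the winner's weights
# toward the input: w <- w + alpha * (x - w).
import math

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def som_step(x, weights, alpha):
    """Apply one winner-take-all update; return the index of the winning node."""
    winner = min(range(len(weights)), key=lambda j: euclidean(x, weights[j]))
    weights[winner] = [w + alpha * (xi - w)
                       for xi, w in zip(x, weights[winner])]
    return winner

weights = [[0.0, 0.0], [1.0, 1.0]]   # two cluster nodes in a 2-D input space
winner = som_step([0.9, 0.8], weights, alpha=0.5)
print(winner, weights[winner])       # node 1 wins and moves halfway to the input
```

A full Kohonen map would also apply a (smaller) update to nodes in the winner's neighborhood and shrink both alpha and the neighborhood radius over time, as the learning process on the previous slide describes.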

58 It has been shown that while self-organizing maps with a small number of nodes behave in a way that is similar to K-means, larger self-organizing maps rearrange data in a way that is fundamentally topological in character.
High dimensional data to 3D -> Applications? Clustering; dimension reduction; visualization of high dimensional data.
Demo: http://www.superstable.net/sketches/som/ Topics

59 7. Hebbian Learning
Hebb's law: "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
Hence, if two neurons that are connected together fire at the same time, the weight of the connection between them is strengthened. Conversely, if the neurons fire at different times, the weight of the connection between them is decreased.

60 The activity product rule is used to modify the weight of a connection between two nodes i and j that fire at the same time: Δw_ij = α x_i y_j, where α is the learning rate, x_i is the input to node j from node i, and y_j is the output of node j.
Hebbian networks usually also use a forgetting factor, which decreases the weight of the connection between the two nodes if they fire at different times.
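The rule can be sketched in a few lines. The forgetting factor is modeled here as simple weight decay, which is one common choice but an assumption; the learning rate, decay value and helper name are illustrative:

```python
# The activity product rule: the weight between nodes i and j grows when both
# are active together (delta_w = alpha * x_i * y_j); a forgetting factor decays
# the weight when they fire at different times.

alpha = 0.1   # learning rate (illustrative)
phi = 0.01    # forgetting factor, applied as weight decay (assumed form)

def hebbian_update(w, x_i, y_j):
    """Strengthen the connection when i and j fire together; decay it otherwise."""
    if x_i != 0 and y_j != 0:
        return w + alpha * x_i * y_j
    return w - phi * w

w = 0.5
w = hebbian_update(w, 1, 1)   # both fire together: weight increases to 0.6
print(w)
w = hebbian_update(w, 1, 0)   # fire at different times: weight decays slightly
print(w)
```

Note that no target output appears anywhere in the rule: like the Kohonen map, Hebbian learning needs no training data set.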

61 Neural Network60 More interesting demo BrainyAliens Topics

62 8. Fuzzy Neural Networks
Weights are fuzzy sets.
Takagi-Sugeno fuzzy rules. Topics

63 9. Evolving Neural Networks
Neural network training can be susceptible to local minima of the error surface. Evolutionary methods (genetic algorithms) can be used to determine the starting weights for a neural network, thus avoiding these kinds of problems.

64 Other Learning Methods
Clustering: the K-Means; the Fuzzy C-Means
Classification: Naïve Bayes classifier; support vector machine; decision tree
Reinforcement learning
…
I will be back! Topics

