
1 Artificial Neural Networks
Lehrstuhl für Informatik 2, Gabriella Kókai: Machine Learning

2 Overview
Motivation & Goals
Perceptron Learning
Gradient Descent Algorithms & the Δ-Rule
Multi-Layer Nets
The Backpropagation Algorithm
Example Application: Recognition of Faces
More Network Architectures
Application Areas of ANNs

3 Model: The Brain
A complex learning system built from simple learning units: the neurons.
A network of ~10^11 neurons, each of which has ~10^4 connections.
Switching time of a neuron: ~10^-3 s (speed versus flexibility).
Observation: face recognition takes ~10^-1 s, which implies massive parallelism.

4 Goals of ANNs
Learning instead of programming
Learning complex functions with simple learning units
Parallel computation (e.g. the layer model)
The network parameters should be found automatically by a learning algorithm
An ANN is a black box that maps inputs to outputs.

5 When are ANNs used?
Input instances are described as a vector of discrete or real values.
The output of the target function is a single value or a vector of discrete or real-valued attributes.
The input data may contain noise.
The target function is unknown or difficult to describe.

6 The Perceptron (as a NN Unit) (1/2)
A linear unit with a threshold: it outputs 1 if the weighted sum of its inputs exceeds the threshold, and -1 otherwise.
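A minimal sketch of such a unit in Python (the bias-as-w0 convention and the example weights are illustrative assumptions, not taken from the slides):

# Perceptron unit: output +1 if w0 + w1*x1 + ... + wn*xn > 0, else -1.
def perceptron_output(weights, x):
    # weights[0] acts as the (negative) threshold; weights[1:] match the inputs.
    net = weights[0] + sum(wi * xi for wi, xi in zip(weights[1:], x))
    return 1 if net > 0 else -1

# Example: with these (assumed) weights the unit computes logical OR on {0, 1} inputs.
or_weights = [-0.5, 1.0, 1.0]
print(perceptron_output(or_weights, (0, 0)))  # -1
print(perceptron_output(or_weights, (1, 0)))  # 1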

7 The Perceptron (as a NN Unit) (2/2)

8 Geometrical Classification (Decision Surface)
A perceptron can classify only linearly separable training data, so we need networks of these units.
Linearly separable: e.g. the OR function.
Not linearly separable: e.g. the XOR function.
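A small brute-force illustration of this difference (it reuses the perceptron_output sketch above; the weight grid is an arbitrary assumption): every candidate weight setting fails on XOR, while several realise OR.

from itertools import product

def realizes(weights, table):
    # table maps input tuples to the desired +1/-1 output
    return all(perceptron_output(weights, x) == t for x, t in table.items())

OR_TABLE  = {(0, 0): -1, (0, 1): 1, (1, 0): 1, (1, 1): 1}
XOR_TABLE = {(0, 0): -1, (0, 1): 1, (1, 0): 1, (1, 1): -1}

grid = [w / 2 for w in range(-4, 5)]  # candidate weights -2.0, -1.5, ..., 2.0
or_ok  = any(realizes(w, OR_TABLE)  for w in product(grid, repeat=3))
xor_ok = any(realizes(w, XOR_TABLE) for w in product(grid, repeat=3))
print(or_ok, xor_ok)  # True False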

9 The Perceptron Learning Rule (1/2)
Training a perceptron = learning the best hypothesis, i.e. one that classifies all training data correctly.
A hypothesis = a vector of weights.

10 The Perceptron Learning Rule (2/2)
Idea:
1. Initialise the weights with random values.
2. Apply the perceptron iteratively to each training example and modify the weights according to the learning rule w_i ← w_i + Δw_i with Δw_i = η (t - o) x_i, where t is the target output, o the actual output and η the learning rate.
3. Step 2 is repeated for all training examples until all of them are correctly classified.
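A minimal sketch of this training loop in Python (the dataset, the learning rate and the bias handling are illustrative assumptions; it reuses perceptron_output from above):

import random

def train_perceptron(examples, eta=0.1, max_epochs=100):
    # examples: list of (x, t) pairs with x a tuple of inputs and t in {-1, +1}
    n = len(examples[0][0])
    w = [random.uniform(-0.5, 0.5) for _ in range(n + 1)]  # w[0] is the bias weight
    for _ in range(max_epochs):
        all_correct = True
        for x, t in examples:
            o = perceptron_output(w, x)
            if o != t:
                all_correct = False
                w[0] += eta * (t - o)            # bias input fixed at 1
                for i, xi in enumerate(x, start=1):
                    w[i] += eta * (t - o) * xi   # w_i <- w_i + eta*(t-o)*x_i
        if all_correct:
            break
    return w

# Example: learning the OR function (linearly separable, so the loop terminates).
print(train_perceptron([((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]))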

11 The Perceptron Learning Rule: Convergence
The perceptron learning rule converges if the training examples are linearly separable and the learning rate η is chosen small enough (e.g. 0.1).

12 The Gradient Descent Algorithm & the Δ-Rule (1/5)
Better: the Δ-rule converges even if the training examples are not linearly separable.
Idea: use gradient descent to search for the best hypothesis in hypothesis space; the best hypothesis is the one that minimises the squared error.
This is the basis of the backpropagation algorithm.

13 The Gradient Descent Algorithm & the Δ-Rule (2/5)
For continuity (differentiability) reasons, the Δ learning rule is applied to an unthresholded linear unit instead of to the perceptron.
Linear unit: o(x) = w0 + w1 x1 + ... + wn xn
The squared error to be minimised: E(w) = 1/2 Σ_{d∈D} (t_d - o_d)^2, where
D: the set of training examples
t_d: the target output of example d
o_d: the computed output of example d
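The same definitions as a small Python sketch (the bias-as-w0 convention is an assumption carried over from the earlier examples):

def linear_output(w, x):
    # Unthresholded linear unit: o = w0 + w1*x1 + ... + wn*xn
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def squared_error(w, examples):
    # E(w) = 1/2 * sum over the training examples of (t_d - o_d)^2
    return 0.5 * sum((t - linear_output(w, x)) ** 2 for x, t in examples)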

14 The Gradient Descent Algorithm & the Δ-Rule (3/5)
Geometric interpretation: the error function E plotted over the hypothesis space of weight vectors (e.g. a 2-dimensional error surface for two weights); gradient descent moves downhill on this surface.

15 The Gradient Descent Algorithm & the Δ-Rule (4/5)
Gradient: ∇E(w) = [∂E/∂w_0, ∂E/∂w_1, ..., ∂E/∂w_n]
Learning rule: Δw = -η ∇E(w), i.e. Δw_i = -η ∂E/∂w_i
Derivation: ∂E/∂w_i = Σ_{d∈D} (t_d - o_d)(-x_{id}), hence Δw_i = η Σ_{d∈D} (t_d - o_d) x_{id}

16 The Gradient Descent Algorithm & the Δ-Rule (5/5)
Standard (batch) method: do until the termination criterion is satisfied:
1. Initialise each Δw_i to zero.
2. For all (x, t) in D: compute o(x); for all i: Δw_i ← Δw_i + η (t - o) x_i.
3. For all i: w_i ← w_i + Δw_i.
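A minimal sketch of this batch update in Python (it reuses linear_output from above; the learning rate and the fixed epoch count stand in for the unspecified termination criterion):

def gradient_descent(examples, eta=0.05, epochs=1000):
    n = len(examples[0][0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        delta = [0.0] * (n + 1)                  # accumulated weight changes
        for x, t in examples:
            err = t - linear_output(w, x)
            delta[0] += eta * err                # bias input fixed at 1
            for i, xi in enumerate(x, start=1):
                delta[i] += eta * err * xi
        for i in range(n + 1):
            w[i] += delta[i]                     # apply the summed update once per pass
    return w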

17 The Δ-Rule
Stochastic (incremental) method: do until the termination criterion is satisfied:
1. Initialise the weights.
2. For each training example (x, t) in D: compute o(x); for all i: w_i ← w_i + η (t - o) x_i (the Δ-rule).
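The corresponding stochastic sketch, updating the weights after every single example instead of once per pass (same assumptions as the batch version):

def stochastic_gradient_descent(examples, eta=0.05, epochs=1000):
    n = len(examples[0][0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        for x, t in examples:
            err = t - linear_output(w, x)
            w[0] += eta * err                    # update immediately, per example
            for i, xi in enumerate(x, start=1):
                w[i] += eta * err * xi           # Δ-rule: w_i <- w_i + eta*(t-o)*x_i
    return w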

18 Remarks
Advantages of the stochastic approximation of the gradient:
quicker convergence (incremental update of the weights)
less likely to get stuck in a local minimum

19 Remarks
Single perceptrons can learn only linearly separable training data, so we need multi-layer networks of several 'neurons'.
Example: the XOR problem is not linearly separable.

20 XOR-Function
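The slide's figure is not reproduced in this transcript; as an illustration, here is a minimal sketch of a two-layer threshold network that does compute XOR (the particular weights encode the assumed standard construction XOR = OR AND NOT AND):

def step(w, x):
    # threshold unit with outputs in {0, 1}
    return 1 if w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)) > 0 else 0

def xor_net(x1, x2):
    h_or  = step([-0.5, 1.0, 1.0], (x1, x2))        # hidden unit computing x1 OR x2
    h_and = step([-1.5, 1.0, 1.0], (x1, x2))        # hidden unit computing x1 AND x2
    return step([-0.5, 1.0, -1.0], (h_or, h_and))   # OR AND (NOT AND)  ->  XOR

print([xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]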

21 Supervised Learning: Backpropagation NNs
Since 1985 the BP algorithm has become one of the most widespread and successful learning algorithms for NNs.
Idea: the minimum of the network's error function is searched by descending in the direction of the gradient. The vector of weights that minimises the error of the network is taken as the solution of the learning problem. Therefore the gradient of the error function must exist for all points of the weight space, i.e. the error function must be differentiable.

22 Learning in Backpropagation Networks
The sigmoid unit: o = σ(net) with net = Σ_i w_i x_i and σ(y) = 1 / (1 + e^(-y)).
Property of the sigmoid unit: dσ(y)/dy = σ(y) (1 - σ(y)).
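A small Python sketch of the sigmoid unit and of this derivative property (plain Python, no external libraries assumed):

import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def sigmoid_unit(w, x):
    net = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))   # weighted sum of the inputs
    return sigmoid(net)

def sigmoid_derivative(y):
    # the property backpropagation exploits: sigma'(y) = sigma(y) * (1 - sigma(y))
    s = sigmoid(y)
    return s * (1.0 - s)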

23 Definitions Used by the BP Algorithm
The network consists of input units, hidden units and output units.
x_{ji}: the input from node i to unit j
w_{ji}: the weight of the ith input to unit j
outputs: the set of output units
o_i: the output of unit i
t_i: the target output of unit i
δ_n: the error term of unit n

24 The Backpropagation Algorithm
Initialise all weights to small random numbers.
Until the termination criterion is satisfied do:
For each training example do
1. Compute the network's outputs.
2. For each output unit k: δ_k ← o_k (1 - o_k) (t_k - o_k)
3. For each hidden unit h: δ_h ← o_h (1 - o_h) Σ_{k∈outputs} w_{kh} δ_k
4. Update each network weight: w_{ji} ← w_{ji} + Δw_{ji}, where Δw_{ji} = η δ_j x_{ji}
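A minimal sketch of one such training step for a network with a single hidden layer (the layer structure, the initial weight range and the learning rate are illustrative assumptions; it reuses sigmoid from above):

import random

def init_layer(n_units, n_inputs):
    # one weight vector per unit; index 0 is the bias weight
    return [[random.uniform(-0.05, 0.05) for _ in range(n_inputs + 1)]
            for _ in range(n_units)]

def layer_outputs(layer, x):
    return [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) for w in layer]

def backprop_step(hidden, output, x, t, eta=0.05):
    # 1. Forward pass: compute the output of every unit.
    h = layer_outputs(hidden, x)
    o = layer_outputs(output, h)
    # 2. Error terms of the output units: delta_k = o_k (1 - o_k) (t_k - o_k)
    delta_out = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
    # 3. Error terms of the hidden units: delta_h = o_h (1 - o_h) * sum_k w_kh delta_k
    delta_hid = [oh * (1 - oh) * sum(output[k][j + 1] * delta_out[k]
                                     for k in range(len(output)))
                 for j, oh in enumerate(h)]
    # 4. Update every weight: w_ji <- w_ji + eta * delta_j * x_ji
    for k, w in enumerate(output):
        w[0] += eta * delta_out[k]
        for j, hj in enumerate(h):
            w[j + 1] += eta * delta_out[k] * hj
    for j, w in enumerate(hidden):
        w[0] += eta * delta_hid[j]
        for i, xi in enumerate(x):
            w[i + 1] += eta * delta_hid[j] * xi

# Example shape: 2 inputs, 2 hidden units, 1 output unit; repeating backprop_step
# over all training examples until the error is small gives the full algorithm.
hidden = init_layer(2, 2)
output = init_layer(1, 2)
backprop_step(hidden, output, (0, 1), [0.9])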

25 Derivation of the BP Algorithm
For each training example d the error is E_d(w) = 1/2 Σ_{k∈outputs} (t_k - o_k)^2.
With net_j = Σ_i w_{ji} x_{ji} (the weighted sum of the inputs of unit j) the chain rule gives
∂E_d/∂w_{ji} = (∂E_d/∂net_j)(∂net_j/∂w_{ji}) = (∂E_d/∂net_j) x_{ji}.

26 Derivation of the BP Algorithm
Output layer: ∂E_d/∂net_j = -(t_j - o_j) o_j (1 - o_j), i.e. δ_j = o_j (1 - o_j)(t_j - o_j).
Hidden layer: δ_j = o_j (1 - o_j) Σ_{k∈Downstream(j)} w_{kj} δ_k, where Downstream(j) is the set of units whose immediate inputs include the output of unit j.
And therefore Δw_{ji} = η δ_j x_{ji}.

27 Derivation of the BP Algorithm (Explanation)

28 Convergence of the BP Algorithm
Generalisation to arbitrary acyclic directed network architectures is simple.
In practice it works well, but it sometimes gets stuck in a local rather than the global minimum. Remedy: introduction of a momentum term ('escape routes'), Δw_{ji}(n) = η δ_j x_{ji} + α Δw_{ji}(n-1). Disadvantage: global minima can also be jumped over!
Training can take thousands of iterations and is therefore slow (it can be accelerated by the momentum term).
Over-fitting versus adaptability of the NN.

29 Example: Recognition of Faces
Given: 32 photos of each of 20 persons, in different poses:
Direction of view: right, left, up or straight.
With and without sunglasses.
Expression: happy, sad, neutral, ...

30 Example: Recognition of Faces
Goal: classification of the photos with respect to the direction of view.
Preparation of the input: rastering the photos down to 30 x 32 pixels accelerates the learning process.
Input vector = the greyscale values of the 30 x 32 pixels.
Output vector = (left, straight, right, up); the predicted class is the maximum of (left, straight, right, up).
e.g. o = (0.9, 0.1, 0.1, 0.1) means looking to the left.
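A small sketch of this input/output encoding (the 1-of-4 target encoding with 0.9/0.1 values follows the example above; the helper names are assumptions):

DIRECTIONS = ("left", "straight", "right", "up")

def encode_target(direction):
    # 1-of-4 target vector, using 0.9/0.1 instead of 1/0 to suit sigmoid outputs
    return [0.9 if d == direction else 0.1 for d in DIRECTIONS]

def decode_output(o):
    # the predicted direction is the output unit with the largest activation
    return DIRECTIONS[max(range(len(o)), key=lambda i: o[i])]

print(decode_output([0.9, 0.1, 0.1, 0.1]))  # left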

31 Recognition of the direction of view

32 Recurrent Neural Networks
Recurrent networks are directed cyclic networks with memory: outputs at time t serve as inputs at time t+1. The cycles allow results to be fed back into the network.
(+) They are more expressive than acyclic networks.
(-) Training of recurrent networks is expensive; in some cases recurrent networks can be trained using a variant of the backpropagation algorithm.
Example: forecast of the next stock market price y(t+1), based on the current indicator x(t) and the previous indicator x(t-1).
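A minimal sketch of one such recurrent step (the single hidden layer with context units is an assumed Elman-style structure matching the figure on the next slide; it reuses layer_outputs from the backpropagation sketch):

def recurrent_step(hidden, output, x_t, context):
    # the hidden layer sees the current input x(t) together with the previous
    # context c(t-1); its activations become the new context c(t)
    h = layer_outputs(hidden, list(x_t) + list(context))
    y = layer_outputs(output, h)      # prediction, e.g. the forecast y(t+1)
    return y, h                       # h is fed back as the context at the next step

Unfolding this step over several time points gives the 'unfolded in time' picture on the next slide, which is what allows a backpropagation variant to be applied.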

33 Recurrent NNs
(Figure: a feedforward network over x(t), x(t-1), x(t-2); a recurrent network with input x(t), context c(t) and output y(t+1); and the same recurrent network unfolded in time with contexts c(t), c(t-1), c(t-2).)

