
1 Neural Networks
Intelligent Systems – Lecture 13
© Copyright 2008 STI INNSBRUCK, www.sti-innsbruck.at

2
1) Motivation
2) Technical solution: definitions, explanations, procedures, illustrations by short examples
3) Illustration by a larger example
4) Extensions (work only sketched)

3 Motivation
A main motivation behind neural networks is the fact that symbolic rules do not reflect the reasoning processes performed by humans.
Biological neural systems can capture highly parallel computations based on representations that are distributed over many neurons.
They learn and generalize from training data – there is no need to program everything explicitly.
They are very noise tolerant, with better resistance than symbolic systems.
In summary: neural networks can do whatever symbolic or logic systems can do, and more; in practice, however, this is not that obvious.

4 Motivation
Neural networks are strong in:
– Learning from a set of examples
– Optimizing solutions via constraints and cost functions
– Classification: grouping elements in classes
– Speech recognition, pattern matching
– Non-parametric statistical analysis and regression

5 Introduction: What are Neural Networks?
Neural networks are networks of neurons, as in the real biological brain.
Neurons are highly specialized cells that transmit impulses within animals to cause a change in a target cell, such as a muscle effector cell or a glandular cell.
The axon is the primary conduit through which the neuron transmits impulses to neurons downstream in the signal chain.
Humans: about 10^11 neurons of more than 20 types, 10^14 synapses, 1 ms–10 ms cycle time.
Signals are noisy "spike trains" of electrical potential.

6 Introduction: What are Neural Networks?
What we refer to as Neural Networks in this course are thus mostly Artificial Neural Networks (ANN).
ANNs are approximations of biological neural networks and are built of physical devices, or simulated on computers.
ANNs are parallel computational entities that consist of multiple simple processing units, connected in specific ways in order to perform the desired tasks.
Remember: ANNs are computationally primitive approximations of real biological brains.

7 Comparison & Contrast
Neural networks vs. classical symbolic computing:
1. Sub-symbolic vs. symbolic
2. Non-modular vs. modular
3. Distributed representation vs. localist representation
4. Bottom-up vs. top-down
5. Parallel processing vs. sequential processing
In reality, however, it can be observed that these distinctions become increasingly less obvious.

8 McCulloch-Pitts "Unit" – Artificial Neurons
Output is a "squashed" linear function of the inputs: a_i ← g(in_i) = g(Σ_j W_{j,i} a_j).
A gross oversimplification of real neurons, but its purpose is to develop an understanding of what networks of simple units can do.
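
A minimal sketch of such a unit in Python (not from the slides): a weighted sum of the inputs is passed through a step activation. The weights and bias below are illustrative assumptions.

```python
# McCulloch-Pitts-style unit: output = g(sum_j w_j * x_j + bias).
# The weights and bias here are illustrative, not values from the lecture.

def step(z):
    """Threshold activation: fire (1) iff the weighted input is non-negative."""
    return 1 if z >= 0 else 0

def mp_unit(inputs, weights, bias):
    """Compute the 'squashed' linear function g(w . x + bias)."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return step(weighted_sum + bias)

print(mp_unit([1, 0], weights=[0.5, 0.5], bias=-0.7))  # 0
print(mp_unit([1, 1], weights=[0.5, 0.5], bias=-0.7))  # 1 (these weights make the unit behave like AND)
```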

9 Activation Functions
(a) is a step function or threshold function.
(b) is a sigmoid function, 1/(1 + e^{-x}).
Changing the bias weight w_{0,i} moves the threshold location.
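
A small sketch of the two activation functions; the sample points and the threshold shift below are illustrative, and plotting is omitted in favour of tabulating a few values.

```python
import math

def step(z, threshold=0.0):
    return 1.0 if z >= threshold else 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative sample points: the step function jumps, the sigmoid "squashes" smoothly.
for z in (-4, -1, 0, 1, 4):
    print(f"z={z:+d}  step={step(z):.0f}  sigmoid={sigmoid(z):.3f}")

# Shifting the bias weight moves the threshold: the unit fires only when the weighted input
# exceeds the (now larger) threshold.
print(step(0.8, threshold=1.0))  # 0.0: with the shifted threshold this input no longer fires
```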

10 Perceptron
McCulloch-Pitts neurons can be connected together in any desired way to build an artificial neural network.
A construct of one input layer of neurons that feeds forward to one output layer of neurons is called a perceptron.
(Figure of a simple perceptron)

11 Example: Logical Functions
McCulloch and Pitts: every Boolean function can be implemented with a network of such artificial neurons (AND, OR and NOT each need only a single unit).
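
A hedged sketch of how single threshold units realize AND, OR, and NOT. The weight and bias values are standard textbook choices, not taken from the (missing) slide figure.

```python
def threshold_unit(inputs, weights, bias):
    """Fire (1) iff the weighted sum of the inputs plus bias is non-negative."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if s >= 0 else 0

AND = lambda a, b: threshold_unit([a, b], [1, 1], bias=-1.5)   # fires only when both inputs are 1
OR  = lambda a, b: threshold_unit([a, b], [1, 1], bias=-0.5)   # fires when at least one input is 1
NOT = lambda a:    threshold_unit([a],    [-1],   bias=0.5)    # inverts its single input

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))

# XOR has no single-unit solution (it is not linearly separable), but a small network works:
XOR = lambda a, b: AND(OR(a, b), NOT(AND(a, b)))
print([XOR(a, b) for a, b in ((0, 0), (0, 1), (1, 0), (1, 1))])  # [0, 1, 1, 0]
```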

12 Example: Finding Weights for AND Operation
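
The slide's worked example (presumably a figure) is not in the transcript. As a stand-in under that assumption, the sketch below exhaustively searches a small grid of weights and biases for a threshold unit that reproduces the AND truth table.

```python
# Brute-force search for weights implementing AND; the candidate grid is an arbitrary choice.
import itertools

def unit(x1, x2, w1, w2, b):
    return 1 if w1 * x1 + w2 * x2 + b >= 0 else 0

truth_table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
candidates = [v / 2 for v in range(-4, 5)]  # -2.0 ... 2.0 in steps of 0.5

solutions = [
    (w1, w2, b)
    for w1, w2, b in itertools.product(candidates, repeat=3)
    if all(unit(x1, x2, w1, w2, b) == y for (x1, x2), y in truth_table.items())
]
print(f"{len(solutions)} weight settings implement AND, e.g. {solutions[0]}")
```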

13 Neural Network Structures
Mathematically, artificial neural networks are represented by weighted directed graphs.
In more practical terms, a neural network has activations flowing between processing units via one-way connections.
Three common artificial neural network architectures are:
– Single-layer feed-forward (perceptron)
– Multi-layer feed-forward
– Recurrent neural network

14 Single-Layer Feed-Forward
A single-layer feed-forward structure is a simple perceptron and thus has:
– one input layer
– one output layer
– NO feedback connections
Feed-forward networks implement functions and have no internal state (this of course also holds for multi-layer perceptrons).

15 Multi-Layer Feed-Forward
Multi-layer feed-forward structures have:
– one input layer
– one output layer
– one or MORE hidden layers of processing units
The hidden layers are between the input and the output layer, and thus hidden from the outside world: no input from the world, no output to the world.

16 Recurrent Network
Recurrent networks have at least one feedback connection:
– They thus have directed cycles with delays: they have internal state (like flip-flops), can oscillate, etc.
– The response to an input depends on the initial state, which may depend on previous inputs – this can model short-term memory.
– Hopfield networks have symmetric weights (W_ij = W_ji); see the sketch below.
– Boltzmann machines use stochastic activation functions, ≈ MCMC in Bayes nets.
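
As an illustration of "internal state", here is a minimal Hopfield-style sketch (not from the lecture): symmetric weights are set by a Hebbian rule for one stored pattern, and repeated feedback updates drive a corrupted state back to that pattern. The stored pattern and the single-bit corruption are illustrative.

```python
import numpy as np

pattern = np.array([1, -1, 1, -1, 1, -1])          # one stored +/-1 pattern (illustrative)
W = np.outer(pattern, pattern).astype(float)       # Hebbian weights, symmetric: W_ij = W_ji
np.fill_diagonal(W, 0.0)                           # no self-connections

state = pattern.copy()
state[0] = -state[0]                               # corrupt one bit: the network's initial state

for _ in range(5):                                 # synchronous updates; the state feeds back into itself
    state = np.where(W @ state >= 0, 1, -1)

print(np.array_equal(state, pattern))              # True: the corrupted state is pulled back to the stored memory
```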

17 Feed-forward example
A feed-forward network is a parameterized family of nonlinear functions.
Adjusting the weights changes the function: this is where learning occurs!
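
A small sketch of this idea, assuming a 2-2-1 architecture with sigmoid units (the slide's own example network is not in the transcript): the output is a nonlinear function of the inputs, and two different random weight settings give two different functions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward(x, W1, b1, W2, b2):
    """A 2-2-1 network: a nonlinear function of x, parameterized by its weights."""
    hidden = sigmoid(W1 @ x + b1)
    return sigmoid(W2 @ hidden + b2)

x = np.array([0.5, -1.0])                 # an arbitrary input
rng = np.random.default_rng(0)
for trial in range(2):                    # two random weight settings = two different functions
    W1, b1 = rng.normal(size=(2, 2)), rng.normal(size=2)
    W2, b2 = rng.normal(size=(1, 2)), rng.normal(size=1)
    print(f"weights #{trial}: f(x) = {feed_forward(x, W1, b1, W2, b2)}")
```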

18 Single-layer feed-forward neural networks: perceptrons
Output units all operate separately – no shared weights (the study can therefore be limited to single-output perceptrons).
Adjusting the weights moves the location, orientation, and steepness of the cliff.

19 Expressiveness of perceptrons (1)
Consider a perceptron with g = step function (Rosenblatt, 1957, 1960) – it can model a Boolean function, i.e. perform classification.
It can represent AND, OR, NOT, majority (all weights 1, threshold n/2), etc. compactly, but not XOR.
It represents a linear separator in input space.

20 Threshold perceptrons can represent only linearly separable functions (i.e. functions for which such a separating hyperplane exists).
Limited expressiveness, but there exists an algorithm that will fit a threshold perceptron to any linearly separable training set.
(Figure: points in the (x1, x2) input plane)

21 Perceptron learning (1)
Learn by adjusting weights to reduce the error on the training set.
The squared error for an example with input x and true output y is E = ½ Err² = ½ (y − h_W(x))².
Perform optimization search by gradient descent.
Simple weight update rule: W_j ← W_j + α × Err × g'(in) × x_j
– positive error ⇒ increase network output ⇒ increase weights on positive inputs, decrease on negative inputs
The algorithm runs training examples through the net one at a time and adjusts the weights after each example. Such a cycle through all examples is called an epoch.
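
A hedged sketch of this update rule for a step-activation perceptron (for which the rule reduces to W_j ← W_j + α · Err · x_j), run one example at a time over several epochs. The training data (the OR function) and the learning rate are illustrative choices, not from the lecture.

```python
# Perceptron learning rule: positive error -> increase weights on positive inputs,
# negative error -> decrease them. Data and learning rate are illustrative.
import random

examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # OR: linearly separable
weights = [0.0, 0.0]
bias = 0.0
alpha = 0.1  # learning rate

def predict(x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else 0

for epoch in range(20):                   # one pass through the data = one epoch
    random.shuffle(examples)
    for x, y in examples:
        err = y - predict(x)              # error for this single example
        weights = [w + alpha * err * xi for w, xi in zip(weights, x)]
        bias += alpha * err               # bias treated as a weight on a constant input of 1

print(weights, bias, [predict(x) for x, _ in sorted(examples)])  # predictions: [0, 1, 1, 1]
```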

22 Perceptron learning (2)
The perceptron learning rule converges to a consistent function for any linearly separable data set.
A perceptron learns the majority function easily, while decision tree learning (DTL) is hopeless.
DTL learns the restaurant function easily, while a perceptron cannot even represent it.

23 Multilayer perceptrons
Layers are usually fully connected; the number of hidden units is typically chosen by hand.
Hidden layers enlarge the space of hypotheses that the network can represent.
Learning is done by the back-propagation algorithm: errors are back-propagated from the output layer to the hidden layers.

24 Expressiveness of MLPs (1)
2 layers can represent all continuous functions; 3 layers can represent all functions.
– Combine two opposite-facing threshold functions to make a ridge.
– Combine two perpendicular ridges to make a bump.
– Add bumps of various sizes and locations to fit any surface.
The required number of hidden units grows exponentially with the number of inputs (2^n/n for all Boolean functions).
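
The ridge-and-bump argument can be made concrete in one dimension: two opposite-facing sigmoids, summed, produce a localized bump, and adding scaled, shifted bumps approximates an arbitrary continuous function. A small numeric sketch, with centre, width and steepness chosen arbitrarily:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bump(x, centre=0.0, width=1.0, steepness=10.0):
    """Two opposite-facing sigmoids: ~1 inside [centre-width, centre+width], ~0 outside."""
    left = sigmoid(steepness * (x - (centre - width)))     # rises at centre - width
    right = sigmoid(-steepness * (x - (centre + width)))   # falls at centre + width
    return left + right - 1.0

for x in (-3.0, -1.5, 0.0, 1.5, 3.0):                      # illustrative sample points
    print(f"x={x:+.1f}  bump={bump(x):.3f}")               # close to 1 near the centre, close to 0 far away
```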

25 Expressiveness of MLPs (2)
The required number of hidden units grows exponentially with the number of inputs (2^n/n for all Boolean functions):
– more hidden units – more bumps
– a single, sufficiently large hidden layer – any continuous function of the inputs with arbitrary accuracy
– two hidden layers – even discontinuous functions
– for any particular network structure, it is harder to characterize exactly which functions can be represented and which ones cannot

26 Example MLP
Build a hidden-layer network for the restaurant problem (see Lecture 7, Learning from observations):
– 10 attributes describing each example
A multilayer neural network suitable for the restaurant problem:
– one hidden layer with four hidden units
– 10 input units
Restaurant problem: decide whether to wait for a table at a restaurant, based on the following attributes:
– Alternate: is there an alternative restaurant nearby?
– Bar: is there a comfortable bar area to wait in?
– Fri/Sat: is today Friday or Saturday?
– Hungry: are we hungry?
– Patrons: number of people in the restaurant (None, Some, Full)
– Price: price range ($, $$, $$$)
– Raining: is it raining outside?
– Reservation: have we made a reservation?
– Type: kind of restaurant (French, Italian, Thai, Burger)
– WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)
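
A sketch of just the network shape described above (10 input units, one hidden layer of 4 units, 1 output unit). The attribute encoding and the random weights are placeholders; the slide does not specify them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_inputs, n_hidden, n_outputs = 10, 4, 1           # 10 attributes -> 4 hidden units -> wait / don't wait
rng = np.random.default_rng(42)
W_hidden = rng.normal(scale=0.1, size=(n_hidden, n_inputs))   # untrained placeholder weights
W_output = rng.normal(scale=0.1, size=(n_outputs, n_hidden))

# One example, each attribute crudely encoded as a single number in [0, 1]
# (a real encoding would use one-hot vectors for Patrons, Price, Type, WaitEstimate).
example = np.array([1, 0, 1, 1, 0.5, 0.33, 0, 1, 0.25, 0.1])

hidden = sigmoid(W_hidden @ example)
output = sigmoid(W_output @ hidden)
print("P(wait) ~", output.item())                  # untrained network: the output is essentially arbitrary
```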

27 Learning
Learning is one of the most powerful features of neural networks.
The network adapts the weights of the connections between neurons (analogous to synaptic strengths in biological neural networks) so that the final output activation reflects the correct answer.

28 Back-propagation learning (1)
Output layer: same as for a single-layer perceptron, W_{j,i} ← W_{j,i} + α × a_j × Δ_i, where Δ_i = Err_i × g'(in_i).
Hidden layer: back-propagate the error from the output layer: Δ_j = g'(in_j) Σ_i W_{j,i} Δ_i.
Update rule for weights in the hidden layer: W_{k,j} ← W_{k,j} + α × a_k × Δ_j.
(Most neuroscientists deny that back-propagation occurs in the brain.)
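
A minimal back-propagation sketch for a sigmoid network with one hidden layer, following the deltas and update rules above. The task (XOR), the network size, the learning rate and the number of epochs are illustrative assumptions, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # illustrative task: XOR
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, W2 = rng.normal(size=(2, 4)), rng.normal(size=(4, 1))     # 2 inputs -> 4 hidden -> 1 output
b1, b2 = np.zeros(4), np.zeros(1)
alpha = 0.5                                                   # illustrative learning rate

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for epoch in range(5000):
    # forward pass
    hidden = sigmoid(X @ W1 + b1)
    out = sigmoid(hidden @ W2 + b2)
    # output-layer delta: error times g'(in); for the sigmoid, g'(in) = out * (1 - out)
    delta_out = (y - out) * out * (1 - out)
    # hidden-layer delta: back-propagate the output deltas through the weights
    delta_hidden = (delta_out @ W2.T) * hidden * (1 - hidden)
    # weight updates: alpha * activation * delta
    W2 += alpha * hidden.T @ delta_out
    b2 += alpha * delta_out.sum(axis=0)
    W1 += alpha * X.T @ delta_hidden
    b1 += alpha * delta_hidden.sum(axis=0)

# Typically approaches [0, 1, 1, 0]; as the next slide notes, slow convergence
# and local minima are typical problems, so some runs may fall short.
print(np.round(out.ravel(), 2))
```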

29 Back-propagation learning (2)
At each epoch, the gradient updates are summed over all examples.
The training curve for 100 restaurant examples converges to a perfect fit to the training data.
Typical problems: slow convergence, local minima.

30 Back-propagation learning (3)
Learning curve for an MLP with 4 hidden units (as in our restaurant example).
MLPs are quite good for complex pattern recognition tasks, but the resulting hypotheses cannot be understood easily.

31 Kernel machines
A newer family of learning methods: support vector machines (SVMs) or, more generally, kernel machines.
They combine the best of single-layer and multi-layer networks:
– they use an efficient training algorithm
– they can represent complex, nonlinear functions
Techniques:
– kernel machines find the optimal linear separator, i.e. the one that has the largest margin between it and the positive examples on one side and the negative examples on the other
– this is a quadratic programming optimization problem
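
A hedged sketch of fitting a maximum-margin linear separator using scikit-learn, which is my assumption here (the lecture names no library); the toy data are made up for illustration.

```python
# Maximum-margin linear separator; scikit-learn and the toy data are assumptions,
# not part of the lecture material.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.5, 0.3], [0.2, 0.8],    # class 0 (illustrative points)
              [2.0, 2.2], [2.5, 1.8], [1.9, 2.6]])   # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)   # solves the quadratic-programming margin problem internally
clf.fit(X, y)

print("separating hyperplane: w =", clf.coef_[0], " b =", clf.intercept_[0])
print("support vectors:\n", clf.support_vectors_)    # only the margin-defining examples matter
```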

32 Application Example (1)
There are many applications of MLPs: speech, driving, handwriting, fraud detection, etc.
Example: handwritten digit recognition
– automated sorting of mail by postal code
– automated reading of checks
– data entry for hand-held computers

33 Application Example (2)
Two sets of example handwritten digits:
– Top row: digits 0-9, easy to identify
– Bottom row: digits 0-9, more difficult to identify
Different learning approaches applied to the examples:
– 3-nearest-neighbor classifier: 2.4% error
– single-hidden-layer neural network, a 400-300-10 unit MLP (i.e., 400 input units, 300 hidden units, 10 output units): 1.6% error
– specialized neural networks ("LeNet"): MLPs with 3 hidden layers of 768, 192 and 30 units: 0.9% error
– current best (kernel machines, vision algorithms): ≈ 0.6% error
Humans are estimated to have an error rate of about 0.2% on this problem
– not extensively tested, though
– experience from the United States Postal Service: human error rate of 2.5%

34 Summary
Most brains have lots of neurons; each neuron approximates a linear-threshold unit.
Perceptrons (one-layer networks) are insufficiently expressive.
Multi-layer networks are sufficiently expressive; they can be trained by gradient descent, i.e., error back-propagation.
Many applications: speech, driving, handwriting, etc.
The engineering, cognitive modeling, and neural system modeling subfields have largely diverged.

