1 Neural Networks
Slides from: Doug Gray, David Poole
2 What is a Neural Network?
An information processing paradigm inspired by the way biological nervous systems, such as the brain, process information
A method of computing based on the interaction of multiple connected processing elements
3 What can a Neural Net do?
Compute a known function
Approximate an unknown function
Pattern recognition
Signal processing
Learn to do any of the above
4 Basic Concepts
A neural network generally maps a set of inputs to a set of outputs
The number of inputs/outputs is variable
The network itself is composed of an arbitrary number of nodes with an arbitrary topology
5 Basic Concepts
Definition of a node: a node is an element which performs the function
y = fH(Σ wi xi + Wb)
[Diagram: nodes joined by weighted connections]
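A minimal Python sketch of this node function; the sigmoid used for fH and all concrete values are illustrative assumptions, not from the slides:

```python
import math

def fH(x):
    # Sigmoid chosen as an example activation; any squashing function works.
    return 1.0 / (1.0 + math.exp(-x))

def node_output(inputs, weights, bias_weight):
    # y = fH(sum(wi * xi) + Wb)
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return fH(weighted_sum + bias_weight)

print(node_output([0.5, -0.2], [0.8, 0.3], 0.1))
```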
6 Properties
Inputs are flexible: any real values, highly correlated or independent
Target function may be discrete-valued, real-valued, or a vector of discrete or real values
Outputs are real numbers between 0 and 1
Resistant to errors in the training data
Long training time
Fast evaluation
The function produced can be difficult for humans to interpret
7 Perceptrons
Basic unit in a neural network; a linear separator
Parts:
N inputs, x1 ... xn
Weights for each input, w1 ... wn
A bias input x0 (constant) and associated weight w0
Weighted sum of inputs, y = w0x0 + w1x1 + ... + wnxn
A threshold function (activation function), i.e., 1 if y > 0, -1 if y <= 0
8 Diagram
[Figure: inputs x0, x1, x2, ..., xn with weights w0, w1, w2, ..., wn feed a summation y = Σ wi xi, followed by a threshold: 1 if y > 0, -1 otherwise]
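A minimal Python sketch of the diagrammed perceptron, folding the bias in as the constant input x0 = 1; the variable names and example values are illustrative:

```python
def perceptron(x, w):
    # x and w include the bias term: x[0] is the constant 1, w[0] its weight.
    y = sum(wi * xi for wi, xi in zip(w, x))
    # Threshold activation: 1 if y > 0, -1 otherwise.
    return 1 if y > 0 else -1

# Example: two real inputs plus the bias input x0 = 1.
print(perceptron([1, 0.7, -0.4], [-0.5, 1.0, 2.0]))
```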
9 Typical Activation Functions
F(x) = 1 / (1 + e^(-x))
Using a nonlinear function which approximates a linear threshold allows a network to approximate nonlinear functions
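A small sketch showing how F(x) behaves like a smoothed threshold, saturating toward 0 and 1 for large |x|:

```python
import math

def sigmoid(x):
    # F(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))

for x in (-6, -1, 0, 1, 6):
    # Values near 0 or 1 for large |x|: a smooth approximation of a hard threshold.
    print(x, round(sigmoid(x), 4))
```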
10 Simple Perceptron
Binary logic application
fH(x) = u(x) [linear threshold]
Wi = random(-1, 1)
Y = u(W0X0 + W1X1 + Wb)
Now how do we train it?
11 Basic Training
Perceptron learning rule: ΔWi = η * (D - Y) * Xi
η = learning rate
D = desired output
Adjust weights based on how well the current weights match the objective
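As a sketch, one application of the rule in Python; the example weights, inputs, and η are made-up values:

```python
def update_weights(w, x, d, y, eta):
    # Perceptron learning rule: delta_Wi = eta * (D - Y) * Xi
    return [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]

w = [0.2, -0.5, 0.1]   # includes the bias weight for x[0] = 1
x = [1, 0, 1]          # bias input plus two binary inputs
# Desired output 1, actual output -1: weights on active inputs move up.
print(update_weights(w, x, d=1, y=-1, eta=0.1))
```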
12 Logic Training
Expose the network to the logical OR operation
Update the weights after each epoch
As the output approaches the desired output for all cases, ΔWi will approach 0

X0  X1  D
0   0   0
0   1   1
1   0   1
1   1   1
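Combining the threshold unit, the random initialization, and the learning rule, a minimal sketch of this OR training in Python; updates are batched per epoch as the slide describes, and the epoch count and η are illustrative choices:

```python
import random

def u(x):
    # Linear threshold from slide 10: output 1 if x > 0, else 0.
    return 1 if x > 0 else 0

def train_or(eta=0.1, epochs=200):
    # Truth table for logical OR: inputs (X0, X1) -> desired output D.
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    w0, w1, wb = (random.uniform(-1, 1) for _ in range(3))
    for _ in range(epochs):
        d0 = d1 = db = 0.0
        for (x0, x1), d in data:
            y = u(w0 * x0 + w1 * x1 + wb)
            # delta_Wi = eta * (D - Y) * Xi; the bias input is constant 1.
            d0 += eta * (d - y) * x0
            d1 += eta * (d - y) * x1
            db += eta * (d - y) * 1
        # Update the weights after each epoch.
        w0, w1, wb = w0 + d0, w1 + d1, wb + db
    return w0, w1, wb

w0, w1, wb = train_or()
for (x0, x1), d in [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]:
    print((x0, x1), "->", u(w0 * x0 + w1 * x1 + wb), "desired", d)
```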
17 Axiomatizing the Network
The values of the attributes are real numbers.
Thirteen parameters w0, ..., w12 are real numbers.
The attributes h1 and h2 correspond to the values of hidden units.
There are 13 real numbers to be learned. The hypothesis space is thus a 13-dimensional real space.
Each point in this 13-dimensional space corresponds to a particular logic program that predicts a value for reads given known, new, short, and home.
20 Neural Network Learning
Aim of neural network learning: given a set of examples, find parameter settings that minimize the error.
Back-propagation learning is gradient descent search through the parameter space to minimize the sum-of-squares error.
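For reference, a sketch of the sum-of-squares error being minimized; the names for targets and predictions are assumed:

```python
def sum_of_squares_error(targets, predictions):
    # E = sum over examples of (target - output)^2
    return sum((t - y) ** 2 for t, y in zip(targets, predictions))

print(sum_of_squares_error([1, 0, 1], [0.9, 0.2, 0.7]))
```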
21 Backpropagation Learning
Inputs:
A network, including all units and their connections
Stopping criteria
Learning rate (constant of proportionality of gradient descent search)
Initial values for the parameters
A set of classified training data
Output: updated values for the parameters
22 Backpropagation Learning Algorithm
Repeat:
evaluate the network on each example given the current parameter settings
determine the derivative of the error for each parameter
change each parameter in proportion to its derivative
until the stopping criterion is met
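A minimal sketch of this loop in Python on a toy one-unit network, using numerical derivatives so the three steps stay visible; a real back-propagation implementation computes the derivatives analytically, and all names and values here are illustrative:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(w, x):
    # Tiny network: one sigmoid unit with a bias weight w[0] and input weight w[1].
    return sigmoid(w[0] + w[1] * x)

def error(w, data):
    # Sum-of-squares error over the training examples.
    return sum((t - predict(w, x)) ** 2 for x, t in data)

data = [(0.0, 0.0), (1.0, 1.0)]   # a set of classified training data
w = [0.1, -0.1]                   # initial values for the parameters
eta, eps = 0.5, 1e-6              # learning rate, finite-difference step

for step in range(2000):          # stopping criterion: a fixed step count
    # Evaluate the network and determine the derivative of the error
    # for each parameter (numerically, for clarity).
    grads = []
    for i in range(len(w)):
        w_hi = list(w)
        w_hi[i] += eps
        grads.append((error(w_hi, data) - error(w, data)) / eps)
    # Change each parameter in proportion to its derivative.
    w = [wi - eta * g for wi, g in zip(w, grads)]

print(w, error(w, data))
```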
24 Bias in neural networks and decision trees
It's easy for a neural network to represent "at least two of I1, ..., Ik are true" with a single unit and weights w0, w1, ..., wk (see the sketch below), but this concept forms a large decision tree.
Consider representing a conditional, "if c then a else b":
Simple in a decision tree.
Needs a complicated neural network to represent (c ∧ a) ∨ (¬c ∧ b).
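A concrete sketch of such a unit in Python; the specific weight choice (each input weight 1, bias weight -1.5) is an assumption that realizes "at least two are true", not a value from the slides:

```python
def at_least_two(inputs):
    # Unit weights of 1 per input and bias weight -1.5 (assumed values):
    # the unit fires (returns 1) iff two or more inputs are 1.
    y = sum(inputs) - 1.5
    return 1 if y > 0 else 0

print(at_least_two([1, 0, 1, 0]))  # 1: two inputs are true
print(at_least_two([1, 0, 0, 0]))  # 0: only one input is true
```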
25 Neural Networks and Logic
Meaning is attached to the input and output units.
There is no a priori meaning associated with the hidden units.
What the hidden units actually represent is something that's learned.