Neural Networks (9/1/2014)
Artificial Neural Network (ANN)
o Neural network: "a machine that is designed to model the way in which the brain performs a particular task or function of interest" (Haykin, 1994, p. 2).
– Uses a massive interconnection of simple computing cells (neurons or processing units).
– Acquires knowledge through learning.
– Modifies the synaptic weights of the network in an orderly fashion to attain the desired design objective.
o Attempts to use ANNs date from the 1950s.
– Abandoned by most researchers by the 1970s.

Artificial Intelligence (AI)
o "A field of study that encompasses computational techniques for performing tasks that apparently require intelligence when performed by humans" (Tanimoto, 1990).
– Goal: to increase our understanding of reasoning, learning, & perceptual processes.
o Fundamental issues:
– Knowledge representation.
– Search.
– Perception & inference.

Traditional AI vs. Neural Networks
Traditional AI:
o Programs are brittle & overly sensitive to noise.
o Programs are either right or fail completely.
– Human intelligence is much more flexible (e.g., guessing).
o http://www-ai.ijs.si/eliza/eliza.html
Neural Networks:
o Capture knowledge in a large number of fine-grained units.
o More potential for partially matching noisy & incomplete data.
o Knowledge is distributed uniformly across the network.
o A model for parallelism: each neuron is an independent unit.
o Similar to human brains?
Human Brain
o "... a highly complex, nonlinear, and parallel computer (information-processing system). It has the capability to organize its structural constituents, known as neurons, so as to perform certain computations (e.g., pattern recognition, perception, and motor control) many times faster than the fastest digital computer in existence today." (Haykin, 1999, Neural Networks: A Comprehensive Foundation, p. 1).

Approaches to Studying the Brain
o Know enough neuroscience to understand why computer models make certain approximations.
– Understand when the approximations are good & when they are bad.
o Know the tools of formal analysis for models.
– Some simple mathematics.
– Access to a simulator or the ability to program.
o Know enough cognitive science to have some idea of what the system is supposed to do.

Why Build Models? "... a model is simply a detailed theory."
1. Explicitness: constructing a model of a theory & implementing it as a computer program requires a great level of detail.
2. Prediction: it is difficult to predict the consequences of a model due to interactions between its different parts.
– Connectionist models are non-linear.
3. Discover & test new experiments & novel situations.
4. Practical reasons why it is difficult to test a theory in the real world.
– Systematically vary parameters through the full range of possible values.
5. Help understand why a behavior might occur. Simulations are open to direct inspection & explanation of behavior.

Simulations As Experiments
o It is easy to do simulations, but difficult to do them well.
o Running a good simulation is like running a good experiment.
1. A clearly articulated problem (goal).
2. A well-defined hypothesis, a design for testing the hypothesis, & a plan for how to evaluate the results.
– Hypothesis drawn from current issues in the literature.
– E.g., test predictions, replicate observed behaviors, test a theory of behavior.
3. The task, stimulus representations & network architectures must be defined.

What kinds of problems can ANNs help us understand?
o The brain of a newborn child contains billions of neurons,
– but the child can't perform many cognitive functions.
o After a few years of receiving continuous streams of signals from the outside world via the sensory systems,
– the child can see, understand language & control the movements of the body.
o The brain discovers, without being taught, how to make sense of signals from the world.
o How??? Where do you start?
Neural Networks (ACM)
o Web spam detection by probability mapping graphSOMs and graph neural networks
o No-reference quality assessment of JPEG images by using CBP neural networks
o An Embedded Fingerprints Classification System based on Weightless Neural Networks
o Forecasting Portugal global load with artificial neural networks
o 2006 Special issue: Neural network forecasts of the tropical Pacific sea surface temperatures
o Developmental learning of complex syntactical song in the Bengalese finch: A neural network model
o Neural networks in astronomy
Artificial & Biological Neural Networks
o Build intelligent programs using models that parallel the structure of neurons in the human brain.
o Neurons: a cell body with dendrites & an axon.
– Dendrites receive signals from other neurons.
– When the combined impulses exceed a threshold, the neuron fires & an impulse passes down the axon.
– Branches at the end of the axon form synapses with the dendrites of other neurons. Synapses are excitatory or inhibitory.

Do Neural Networks Mimic the Human Brain?
o "It is not absolutely necessary to believe that neural network models have anything to do with the nervous system, ...
o ... but it helps.
o Because, if they do, we are able to use a large body of ideas, experiments, and facts from cognitive science and neuroscience to design, construct, and test networks." (Anderson, 1997, p. 1)

Neural Networks Abstract From the Details of Real Neurons
o Conductivity delays are neglected.
o Net input is calculated as the weighted sum of the input signals.
o Net input is transformed into an output signal via a simple function (e.g., a threshold function).
o The output signal is either discrete (e.g., 0 or 1) or a real-valued number (e.g., between 0 and 1).

ANN Features
o A series of simple computational elements, called neurons (or nodes, units, cells).
o Connections between neurons that carry signals.
o Each link (connection) between neurons has a weight that can be modified.
o Each neuron sums the weighted input signals and applies an activation function to determine the output signal (Fausett, 1994).
Neural Networks Are Composed of Nodes & Connections
o Nodes: simple processing units.
– Similar to neurons: they receive inputs from other sources.
– Excitatory inputs tend to increase a neuron's rate of firing.
– Inhibitory inputs tend to decrease a neuron's rate of firing.
o Firing rate is represented by a real-valued number (activation).
o Input to a node comes from other nodes or from some external source.
[Figures: a fully recurrent network; a 3-layer feedforward network]

Connections
o Input travels along connection lines.
o In many models, connections between different nodes can have different potency (connection strength).
– Strength is represented by a real-valued number (connection weight).
– Input from one node to another is multiplied by the connection weight.
o If the connection weight is a
– negative number, the input is inhibitory.
– positive number, the input is excitatory.

Nodes & Connections Form the Various Layers of a NN

A Single Node/Neuron
o Inputs to a node are usually summed (Σ).
o The net input is passed through an activation function (f(net)).
o This produces the node's activation, which is sent to other nodes.
o Each input line (connection) represents the flow of activity from some other neuron or some external source.
[Figure: inputs from other nodes feed into f(net); outputs go to other nodes]
More Complex Model of a Neuron
[Figure: input signals x1, x2, ..., xp are multiplied by the synaptic weights wk1, wk2, ..., wkp of neuron k, summed by a linear combiner to give uk, offset by a threshold θk, and passed through an activation function to produce the output yk.]
Add up the Net Inputs to a Node
o Each input (from a different node) is calculated by multiplying the activation value of the input node by the weight on the connection (from the input node to the receiving node).

  net_i = Σ_j w_ij a_j     (net input to node i)

o Σ = sigma (summation over j)
o i = receiving node
o a_j = activation on node j sending to node i
o w_ij = weight on the connection between nodes j & i.

Sum (weight * activation) Over All Input Nodes
  net_i = Σ_j w_ij a_j
o i = 4 (node 4).
o j = 0, 1, 2 (3 input nodes into node 4).
o Add up w_ij * a_j for all 3 input nodes.
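The summation above can be sketched in a few lines of Python; the weights & activations below are illustrative values, not taken from the slides.

```python
def net_input(weights, activations):
    """net_i = sum over j of w_ij * a_j."""
    return sum(w * a for w, a in zip(weights, activations))

# Node 4 receiving from input nodes 0, 1, 2 (hypothetical values):
weights = [0.5, -0.2, 0.8]       # w_40, w_41, w_42
activations = [1.0, 0.5, 0.25]   # a_0, a_1, a_2
print(net_input(weights, activations))  # 0.5 - 0.1 + 0.2, about 0.6
```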
Activation Functions: A Node Can Do Several Things With Its Net Input
1. Activation (output) = input: f(net) is the identity function. Simplest case.
2. A threshold must be reached before activation occurs.
3. The activation function may be a non-linear function of the input (e.g., a sigmoid, which real neurons resemble).
4. The activation function may be linear.

Different Types of NN Possible
1. Single-layer or multi-layer architectures (Hopfield, Kohonen).
2. Data flow through the network.
o Feedforward.
o Recurrent.
3. Variations in nodes.
o Number of nodes.
o Types of connections among nodes in the network.
4. Learning algorithms.
o Supervised.
o Unsupervised (self-organizing).
o Backpropagation learning (training).
5. Implementation.
– Software or hardware.

Steps in Designing a Neural Network
1. Arrange neurons in various layers.
2. Decide the type of connections among neurons of different layers, as well as among neurons within a layer.
3. Decide the way a neuron receives input & produces output.
4. Determine the strengths of connections within the network by allowing the network to learn appropriate values of the connection weights via a training data set.

Activation Functions
1. Identity function: f(x) = x for all x
2. Binary step function: f(x) = 1 if x >= θ; f(x) = 0 if x < θ
3. Continuous log-sigmoid function (logistic function): f(x) = 1 / [1 + exp(-σx)]
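The three activation functions listed can be written directly as Python functions (a minimal sketch; the parameter names theta & sigma follow the slide's notation).

```python
import math

def identity(x):
    return x

def binary_step(x, theta):
    return 1 if x >= theta else 0

def log_sigmoid(x, sigma=1.0):
    # Logistic function: squashes any real input into (0, 1).
    return 1.0 / (1.0 + math.exp(-sigma * x))

print(identity(0.3))               # 0.3
print(binary_step(2.0, 2.0))       # 1 (input meets the threshold)
print(round(log_sigmoid(0.0), 2))  # 0.5 (mid-range output at zero input)
```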
Sigmoid Activation Function
  a_i = 1 / (1 + e^(-net_i))
o a_i = activation (output) of node i
o net_i = net activation flowing into node i
o e = exponential
o Gives what the output of a node will be for any given net input.
o Graph of the relationship (next slide).

Sigmoid Activation Function Often Used for Nodes in a NN
o For a wide range of inputs (> 4.0 or < -4.0), nodes exhibit all-or-nothing behavior.
– Output near the maximum value of 1 (on).
– Output near the minimum value of 0 (off).
o Within the range of -4.0 to 4.0, nodes show greater sensitivity.
– Output is capable of making fine discriminations between different inputs.
o This non-linear response is at the heart of what makes these networks interesting.

o What will be the activation of node 2, assuming the input you just calculated?
o If node 2 receives an input of 1.25, its activation is 0.777.
o The activation function scales output from 0.0 to 1.0.
o When net input = 0.0, the output is the exact mid-range of possible activation (0.5). Negative inputs give activations below 0.5.
Example 2-Layered Feedforward Network: Step Through the Process
o A neural network consists of a collection of nodes.
– The number & arrangement of nodes defines the network architecture.
o Example: 2-layered feedforward network.
– 2 layers (input, output).
– No intra-level connections.
– No recurrent connections.
– A single connection into input nodes & out of output nodes.
o Very simplified in comparison to a biological neural network!
[Figure: 2-layered feedforward network with input nodes a0, a1, output node a2, and weights w20, w21]

o Each input node has a certain level of activity associated with it.
– 2 input nodes (a0, a1).
– 2 output nodes (a2, a3).
o Look at one output unit (a2).
– It receives input from a0 & a1 via independent connections.
– The amount depends on the activation values of the input nodes (a0 & a1) and the weights (w20, w21).
o For this network, activity flows in one direction along the connections.
– E.g., w20 exists but w02 doesn't.
o Total input to node 2 (a2) = w20*a0 + w21*a1.

Exercise 1.1
o What is the input received by node 2?
o Net input for node 2 = (1.0 * 0.75) + (1.0 * 0.5) = 1.25.
o Net input alone doesn't determine the activity of the output node.
o We must know the activation function of the node.
o Assume nodes have the activation function shown in EQ 1.2 (& Fig. 1.3).
o The next slide shows sample inputs & the activations produced, assuming a logistic activation function.
[Figure: node a2 with input activations 0.75 & 0.5, both connection weights 1.0]
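Exercise 1.1 can be checked numerically (assuming, as the slides do, a logistic activation function):

```python
import math

net = 1.0 * 0.75 + 1.0 * 0.5      # both weights are 1.0
activation = 1.0 / (1.0 + math.exp(-net))
print(net)                         # 1.25
print(round(activation, 3))        # 0.777
```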
Bias Node (Default Activation)
o In the absence of any input (i.e., input = 0), nodes have an output of 0.5.
o It is useful to allow nodes to have a default activation.
– Node is "off" (output 0.0) in the absence of input.
– Or it can have a default state where the node is "on".
o Accomplish this by adding a node to the network which receives no inputs but is always fully activated & outputs 1.0 (the bias node).
– This node can be connected to any node in the network.
– It is often connected to all nodes except input nodes.
– The weights on the connections from this node to receiving nodes are allowed to differ.

o This guarantees that all receiving nodes have some input even if all other nodes are off.
o Since the output of the bias node is always 1.0, the input it sends to any other node is 1.0 * w_ij (the value of the weight itself).
o Only one bias node is needed per network.
o Similar to giving each node a variable threshold.
– Large negative bias: the node is off (activation close to 0.0) unless it gets sufficient positive input from other sources to compensate.
– Large positive bias: the receiving node is on & requires negative input from other nodes to turn it off.
o Useful to allow individual nodes to have different defaults.
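The variable-threshold effect of the bias can be seen numerically; the weight values below are hypothetical.

```python
import math

def node_output(weights, activations, bias_weight):
    # The bias node always outputs 1.0, so it contributes bias_weight directly.
    net = sum(w * a for w, a in zip(weights, activations)) + bias_weight * 1.0
    return 1.0 / (1.0 + math.exp(-net))

# With all other inputs off, a large negative bias keeps the node "off":
print(round(node_output([0.5], [0.0], bias_weight=-5.0), 3))  # 0.007
# A large positive bias keeps it "on" by default:
print(round(node_output([0.5], [0.0], bias_weight=5.0), 3))   # 0.993
```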
Learning From Experience
o Changing a neural network's connection weights (training) causes the network to learn the solution to a problem.
o The strength of the connection between neurons is stored as a weight value for that specific connection.
o The system learns new knowledge by adjusting these connection weights.

Three Training Methods for NNs
1. Unsupervised learning: hidden neurons must find a way to organize themselves without help from outside.
o No sample outputs are provided to the network against which it can measure its predictive performance for a given vector of inputs.
o Learning by doing.

2. Supervised Learning (Reinforcement)
o Works on reinforcement from outside. Connections among neurons in the hidden layer are randomly arranged, then reshuffled as the network is told how close it is to the solution.
o Requires a teacher: a training set of data, or an observer who grades the performance of the network's results.
o Both unsupervised & supervised learning suffer from relative slowness & inefficiency when relying on random shuffling to find the proper connection weights.

3. Backpropagation
o The network is given reinforcement for how it is doing on a task, plus information about errors is used to adjust the connections between layers.
– Proven highly successful in the training of multilayered neural nets.
– A form of supervised learning.
McCulloch-Pitts (1943) Neuron
1. The activity of a neuron is an "all-or-none" process.
2. A certain fixed number of synapses must be excited within the period of latent addition to excite a neuron at any time.
o This number is independent of previous activity & the position of the neuron.
3. The only significant delay within the nervous system is synaptic delay.
4. Activity of any inhibitory synapse absolutely prevents excitation of the neuron at that time.
5. The structure of the net does not change with time.

McCulloch-Pitts Neuron
o Firing within a neuron is controlled by a fixed threshold (θ).
o Binary step function: f(x) = 1 if x >= θ; f(x) = 0 if x < θ.
o What happens here if θ = 2?

McCulloch-Pitts Neuron: AND
P | Q | P ∧ Q
T | T |   T
T | F |   F
F | T |   F
F | F |   F
Threshold = 2. Does a2 fire?

McCulloch-Pitts Neuron: OR
P | Q | P ∨ Q
T | T |   T
T | F |   T
F | T |   T
F | F |   F
Threshold = 2. Does a2 fire?
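Both gates can be sketched with a binary step unit. The connection weights below (1 for AND, 2 for OR) are an assumed, common choice consistent with the shared threshold of 2; the slides show only the threshold.

```python
def mp_neuron(inputs, weights, theta=2):
    """McCulloch-Pitts unit: fire (1) iff the weighted sum reaches theta."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0

AND_WEIGHTS = [1, 1]  # both inputs needed to reach threshold 2
OR_WEIGHTS = [2, 2]   # either input alone reaches threshold 2

for p in (1, 0):
    for q in (1, 0):
        print(p, q, mp_neuron([p, q], AND_WEIGHTS), mp_neuron([p, q], OR_WEIGHTS))
```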
Hebb: The Organization of Behavior (1949)
o "When an axon of cell A is near enough to excite a cell B & repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
o If a neuron receives input from another neuron & both are highly active, the weight between the neurons should be strengthened.
– A specific synaptic change (the Hebb synapse) underlies learning.
o The result was interconnections between large, diffuse sets of cells in different parts of the brain, called "cell assemblies."
o Changes suggested by Rochester et al. (1956) make a more practical model.

Hebb's Rule: Associative Learning
"Cells that fire together, wire together."
o Δw_ij = a_i a_j
– The change in weight is the product of the activations of the nodes the connection links.
o Δw_ij = η a_i a_j
– where η is the learning rate.
o Unsupervised learning.
o Succeeds at learning some patterns,
– but it only learns these patterns (e.g., pair-wise correlations). There will be times when we want an ANN to learn to associate a pattern with some desired behavior even when there is no pair-wise correlation.
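Hebb's rule is simple enough to sketch directly; the learning rate and activation values below are illustrative.

```python
def hebb_update(w, a_i, a_j, eta=0.1):
    """Delta w_ij = eta * a_i * a_j: strengthen when both nodes are active."""
    return w + eta * a_i * a_j

w = 0.0
for _ in range(10):                # repeated co-activation of two active nodes
    w = hebb_update(w, 1.0, 1.0)
print(round(w, 1))                 # 1.0: the connection has strengthened

# If either node is inactive, the weight never changes:
print(hebb_update(0.0, 1.0, 0.0))  # 0.0
```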
Pros & Cons of Hebbian Learning (CS 271 Ch. 4)
o There are known biological mechanisms that might use Hebbian learning.
o Provides a reasonable answer to "where does the teacher info for the learning process come from?"
– Lots of useful info is in correlated activity.
– The system just needs to look for patterns.
o All it can learn is pair-wise correlations.
o May need to learn to associate patterns with desired behaviors even if the patterns aren't pair-wise correlated.
– The Hebb rule can't do this.

Perceptron Convergence Procedures (PCP)
o Variations of Hebb's Rule from the 1960s.
– Perceptron (Rosenblatt, 1958).
– The Widrow-Hoff rule (1960) is similar to PCP.
o Start with a network of units whose connections are initialized with random weights.
o Take a target set of input/output patterns & adjust the weights automatically so that at the end of training the weights yield correct outputs for any input.
– The network should generalize to produce correct output for input patterns it hasn't seen during training.
o Also known as the gradient descent rule, Delta rule or Adaline rule.
Widrow-Hoff Rule
o Starts with connections initialized with random weights; one input pattern is presented to the network at a time.
o For each input pattern, the network's actual output is compared to the target output for that pattern.
Figure 18: Supervised (Delta Rule) vs. Unsupervised (Perceptron) Learning (www.willamette.edu/~gorr/classes/cs449/Classification/delta.html)

o Any discrepancy (error) is used as the basis for changing the weights on input connections & changing the output node's threshold for activation.
o How much the weights are changed depends on the error produced & the activation from the given input.
– The correction is proportional to the error signal multiplied by the value of the activation given by the derivative of the transfer function.
– Using the derivative allows making finely tuned corrections when the activation is near its extreme values (minimum or maximum) & larger corrections when the activation is in the middle range.
o The goal of the Widrow-Hoff Rule is to minimize the error on the output unit by apportioning credit & blame to the input nodes.
o Only works for simple, 2-layer networks (I/O units).
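A minimal sketch of this style of training: a single sigmoid output unit learning OR with the error-times-derivative correction described above. The pattern set, learning rate & iteration count are illustrative choices, not taken from the slides.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

patterns = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]  # OR

w = [random.uniform(-1, 1), random.uniform(-1, 1)]
bias = random.uniform(-1, 1)
eta = 0.5

for _ in range(5000):
    x, target = random.choice(patterns)
    out = sigmoid(w[0] * x[0] + w[1] * x[1] + bias)
    # Correction proportional to error times derivative of the sigmoid:
    delta = (target - out) * out * (1 - out)
    w[0] += eta * delta * x[0]
    w[1] += eta * delta * x[1]
    bias += eta * delta              # bias input is always 1.0

for x, target in patterns:
    out = sigmoid(w[0] * x[0] + w[1] * x[1] + bias)
    print(x, target, round(out, 2))
```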
Using Similarity
o The basic principle that drives learning.
o Allows generalization of behaviors because similar inputs tend to yield similar outputs.
o 11110000 vs. 11110001.
o "make"/"made" and "bake"/"baked".
o Cats and tigers.
o Similarity is generally a good rule of thumb, but not in every case.
o Hebbian networks & basic, 2-layer PCP networks can only learn to generalize on the basis of physical similarity.
A 2-layer Perceptron Can't Solve the Problem of Boolean XOR
o If we want the output to be true (1),
– at least 1 input must be 1 & at least 1 weight must be large enough so that, when multiplied, the output node turns on.
o For patterns 00 & 11 we want 0, so set the weights to 0.
o For patterns 01 & 10, we need the weight from either input large enough that 1 input alone activates the output.
o Contradictory requirements: no set of weights allows the output to come on if either input is on & keeps it off if both are on!
Node 0 | Node 1 | XOR
   0   |   0    |  0
   1   |   0    |  1
   0   |   1    |  1
   1   |   1    |  0
[Figure: inputs a0, a1 connected to output a2 via weights w20, w21]

Vectors
o A vector is a collection of numbers, or a point in space.
o Can think of the inputs in the XOR example as a 2-D space,
– with each number indicating how far out along the dimension the point is located.
o Judge the similarity of 2 vectors by their Euclidean distance in the space.
– The pairs of patterns that are furthest apart & most dissimilar (00 & 11) are the ones XOR needs to group together.
[Figure: the four input points (0,0), (0,1), (1,0), (1,1) plotted in 2-D space]

o The input-to-output weights impose a linear decision boundary on the input space.
– Patterns which fall on one side of the decision line are classified differently than patterns on the other side.
o When groups of inputs can't be separated by a line, there is no way for the unit to discriminate between the categories.
– Such problems are called non-linearly separable.
o What's needed are hidden units & learning algorithms that can handle more than one layer.
[Figures: a single line separates the classes for AND and for OR, but no line separates the classes for XOR]
Solving the XOR Problem: Allow Internal Representation (CS 271 Ch. 4)
o Add extra node(s) between input & output, and the XOR problem is solved.
o "Hidden" units are equivalent to internal representations & aren't seen by the world.
– Very powerful: networks have internal representations that capture more abstract, functional relationships.
o Inputs (sensors), outputs (motor effectors) & hidden units (inter-neurons).
o Input similarity is still important.
– All things being equal, the physical resemblance of inputs exerts strong pressure to induce similar responses.

Hidden Units & the XOR Problem
o (a) What the input looks like to the network, showing the intrinsic similarity structure of the inputs.
o The input vectors are passed through (multiplied by) the weights between the inputs & hidden units; this transforms (folds) the input space to produce (b).
o (b) The 2 most distinct patterns (11, 00) are close together in hidden space.
o The weights to the output unit can then impose a linear decision boundary & classify the output (c).
[Figures: (a) input space, (b) hidden-unit space, (c) output classification]
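One concrete set of weights that solves XOR with a single hidden layer of threshold units (hand-chosen for illustration, not learned):

```python
def step(x, theta):
    return 1 if x >= theta else 0

def xor_net(p, q):
    h_or = step(p + q, theta=0.5)    # hidden unit 1: fires if either input is on
    h_and = step(p + q, theta=1.5)   # hidden unit 2: fires only if both are on
    # Output: "on" when OR is on but AND is off -- exactly XOR.
    return step(h_or - 2 * h_and, theta=0.5)

for p in (0, 1):
    for q in (0, 1):
        print(p, q, xor_net(p, q))   # 0 0 0 / 0 1 1 / 1 0 1 / 1 1 0
```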
Hidden Units Used to Construct Internal Representations of the External World
o Hidden units make it possible for the network to treat physically similar inputs as different, as needed.
– Transform input representations into more abstract kinds of representations.
– Solve difficult problems like XOR.
o However, being able to solve a problem just means that some set of weights exists, in principle.
– The network must be able to learn these weights!
o The real challenge is how to train networks!
– One solution: backpropagation of error.

Earlier Rules (PCP) Can't Handle Hidden Layers Since We Don't Know How to Change the Weights Leading to Them
o PCP & others work well for the weights leading to the outputs, since we have a target for the output & can calculate the weight changes.
o The problem occurs when we have hidden units: how do we change the weights from inputs to hidden units?
– With these algorithms we must know how much error is already apparent at the level of the hidden units before the output is activated.
– We don't have a predefined target for the hidden units, so we can't say what their activation levels should be.
– We can't specify the error at this level of the network.
Hopfield Networks
o Recurrent ANNs.
o They are guaranteed to converge to a local minimum, but convergence to one of the stored patterns is not guaranteed.
o http://www.cbu.edu/~pong/ai/hopfield/hopfieldapplet.html

Backpropagation of Error, AKA the Generalized Delta Rule (δ) (Rumelhart, Hinton & Williams, 1986)
o Begin with a network which has been assigned initial weights drawn at random.
– Usually from a uniform distribution with a mean of 0.0 & some user-defined upper & lower bounds (±1.0).
o The user has a set of training data in the form of input/output pairs.
o Goal of training: learn a single set of weights such that any input pattern will produce the correct output pattern.
– Desirable if the weights allow the network to generalize to novel data not seen during training.

Backprop
o An extremely powerful learning tool.
– Applied over a wide range of domains.
o Provides a very general framework for learning.
– Implements gradient descent search in the space of possible network weights to minimize network error.
o What counts as error is up to the modeler.
– Usually the squared difference between target & actual output, but any quantity that is affected by the weights may be minimized.
Backprop Training Takes 4 Steps
1. Select an I/O pattern (usually at random).
2. Compare the network's output with the desired output (teacher pattern) on a node-by-node basis & calculate the error for each output node.
3. Propagate the error info backwards through the network, from output to hidden layers.
4. Adjust the weights on the connections to reduce the errors.

1. Select an I/O Pattern
o The pattern is usually selected at random.
o The input pattern is used to activate the network & the activation values for the output nodes are calculated.
o There can be additional nodes between input & output ("hidden").
o Since the weights were selected at random, the outputs generated at the start are typically not those that go with the input pattern.

2. Calculate the Delta (δ_ip) Error (EQ 1.3)
  δ_ip = (t_ip - o_ip) f'(net_ip) = (t_ip - o_ip) o_ip (1 - o_ip)
o δ_ip = the difference between the target value for node i on training pattern p (t_ip) and the actual output for that node on that pattern (o_ip),
o multiplied by the derivative of the output node's activation function given its input.
– f'(net_ip) = the slope of the activation function.
– EQ 1.2, Fig. 1.3: steepest around the middle of the function, where the net input is closest to 0.

o For large values of net input to a node (positive or negative), the derivative is small.
– δ_ip will be small.
– Net input to a node tends to be large when the connections feeding into it are strong.
o Weak connections tend to yield a small net input to the node.
– The derivative of the activation function is then large & δ_ip can be large.
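A quick numerical check of EQ 1.3 shows the effect: the same raw error produces a much smaller δ when the output node is saturated.

```python
def delta(target, output):
    """EQ 1.3 for a sigmoid output node: (t - o) * o * (1 - o)."""
    return (target - output) * output * (1 - output)

print(round(delta(1.0, 0.5), 4))    # 0.125: mid-range output, large correction
print(round(delta(1.0, 0.99), 4))   # 0.0001: saturated output, tiny correction
```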
Weight Changes in the Delta Rule
[Figure: error surface over weights x & y. The delta vector moves the current weight vector to a new weight vector closer to the ideal weight vector.]

Gradient Descent Learning Rule
o Moves the weight vector from its current position on the error bowl to a new position closer to the minimum error by falling down the negative gradient of the bowl.
o Not guaranteed to find the correct answer.
– It always goes downhill & may get stuck in a local minimum.
o Use momentum to "push" changes in the same direction & possibly keep the network from getting stuck.

Backprop: Calculate Weight Adjustments
o We know, for each output node, how far off the target value it is.
o We must adjust the weights on the connections that feed into it to reduce the error.
– We want to change the weight on the connection from every node j coming into the current node i so that we can reduce the error on the pattern.
– The weight change combines a learning rate, the node's error & the incoming activation.
o Partial derivative: a rate of change.
– There may be other variables, but they're being held constant.
o Measures how the quantity on top changes when the quantity on the bottom is changed.
– I.e., how is the error (E) affected by changing the weights (w)?
o If we know this, we know how to change a weight to decrease the error,
– i.e., to decrease the discrepancy between what the network outputs & what we want it to output.

o The partial derivative is bell-shaped for sigmoidal curves (threshold function).
– Large values are in the mid-range.
o Contributes to the stability of the network: as outputs approach 0 or 1, only small changes occur.
o Helps compensate for excessive blame attached to hidden nodes.
o η = the learning rate.
o Converting the partial derivative in EQ 1.4 yields EQ 1.5.

Backprop: Delta Rule (EQ 1.5)
  Δw_ij = η δ_ip o_jp
o Make the changes small: the learning rate (η) is set to less than 1.0 so that changes aren't too drastic.
– The change in weight depends on the error we have for the unit (δ_ip).
o Take the output into account (o_jp), since a node's error is related to how much (mis)information it has received from another node.
– If a node is highly active & contributed a lot to the current activation, then it is responsible for much of the current error.
– If a node sends no activation to unit i, it won't contribute to i's error.

Delta Rule continued
  Δw_ij = η δ_ip o_jp
o δ_ip reflects the error on unit i for input pattern p.
– The difference between target & output.
– It also includes the partial derivative (EQ 1.4).
o Calculate the errors on all output nodes & the weight changes on the connections coming into them.
– Don't make any changes yet.
3. Propagate the Error Info Backwards From Output to Hidden
o Assume shared blame of a hidden unit on the basis of:
– what errors are on the output units the hidden unit is activating, and
– the strength of the connection between the hidden unit & each output unit it connects to.
o Move to the hidden layer(s), if any, & use EQ 1.5 to change the weights leading into the hidden units from below.
– We can't use EQ 1.3 to compute the hidden nodes' errors, since there is no given target to compare with.
– Hidden nodes "inherit" the errors of all the nodes they've activated.
– If the nodes activated by a hidden unit have large errors, then the hidden unit shares the blame.

o Calculate a hidden node's error by summing up the errors of the nodes it activates, each multiplied by the weight between the nodes, since the weight determines how much effect it has:
  δ_ip = (Σ_k δ_kp w_ki) o_ip (1 - o_ip)
– i = hidden node.
– p = current pattern.
– k indexes the output nodes feeding back to the hidden node.
– The derivative of the hidden unit's activation function is multiplied in.
o This continues iteratively down through the network (backpropagation of error)...

4. Adjust the Weights on Connections to Reduce Errors
o When we reach the layer above the input layer (which has no incoming weights), we actually impose the weight changes.
[Figure: error flows backwards through the network]
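The four steps can be sketched end-to-end as a small network learning XOR. The architecture, learning rate & iteration count are illustrative choices; as the slides note, gradient descent may occasionally settle in a local minimum.

```python
import math
import random

random.seed(1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

patterns = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]  # XOR

H = 2                                           # hidden units
w_ih = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
b_h = [random.uniform(-1, 1) for _ in range(H)]
w_ho = [random.uniform(-1, 1) for _ in range(H)]
b_o = random.uniform(-1, 1)
eta = 0.5

def forward(x):
    h = [sigmoid(w_ih[j][0]*x[0] + w_ih[j][1]*x[1] + b_h[j]) for j in range(H)]
    o = sigmoid(sum(w_ho[j]*h[j] for j in range(H)) + b_o)
    return h, o

error_before = sum((t - forward(x)[1]) ** 2 for x, t in patterns)

for _ in range(20000):
    x, t = random.choice(patterns)              # 1. select an I/O pattern
    h, o = forward(x)
    d_o = (t - o) * o * (1 - o)                 # 2. delta error at output (EQ 1.3)
    d_h = [d_o * w_ho[j] * h[j] * (1 - h[j])    # 3. propagate error back to hidden
           for j in range(H)]
    for j in range(H):                          # 4. adjust weights (EQ 1.5)
        w_ho[j] += eta * d_o * h[j]
        w_ih[j][0] += eta * d_h[j] * x[0]
        w_ih[j][1] += eta * d_h[j] * x[1]
        b_h[j] += eta * d_h[j]
    b_o += eta * d_o

error_after = sum((t - forward(x)[1]) ** 2 for x, t in patterns)
print(round(error_before, 3), round(error_after, 3))  # total squared error falls
```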
Backprop Pros & Cons
o An extremely powerful learning tool that is applied over a wide range of domains.
o Provides a very general framework for learning.
– Implements gradient descent search.
o What counts as error is up to the modeler.
– Usually the squared difference between target & actual output.
– Any quantity that is affected by the weights may be minimized.
o Requires a large number of presentations of the input data to learn.
o Each presentation requires 2 passes through the network (forward & backward).
o Each pass is computationally complex.
3 Ways Developmental Models Handle Change
1. Development results from working out predetermined behaviors. Change is the triggering of innate knowledge.
2. Change is inductive learning. Learning involves copying or internalizing behaviors present in the environment.
3. Change arises through the interaction of maturational factors, under genetic control, and the environment.
o Progress in the neurosciences.
o A computational framework is good for exploring & modeling.

Biologically-Oriented Connectionism (Elman, et al.)
1. We think it is critical to pay attention to what is known about the genetic basis for behavior & about developmental neuroscience.
2. At the level of computation & modeling, we believe it is important to understand the sorts of computations that can plausibly be carried out in neural systems.
3. We take a broad view of biology which includes concern for the evolutionary basis of behavior.
4. A broader biological perspective emphasizes the adaptive aspects of behaviors & recognizes that understanding adaptation requires attention to the environment.

Connectionist Models
o Cognitive functions are performed by a system that computes with simple neuron-like elements, acting in parallel, on distributed representations.
1. Have precisely matched data from human-subject experiments.
– Measure the speed of reading words: it depends on the frequency of the word & the regularity of its pronunciation pattern (e.g., GAVE, HAVE). Humans & NNs show a similar pattern (humans: latency; NNs: errors). Fig. P.1 on p. 3 (McLeod, Plunkett, Rolls).

2. Connectionist models can predict results.
– Suggest areas of investigation.
– E.g., U-shaped learning or over-generalization problems when kids learn the past tense of verbs (WENT - GOED) suggest that linguistic development occurs in stages.
– A NN model produced over-regularization errors.
– Fig. P.2 (McLeod, Plunkett, Rolls).

3. Connectionist models have suggested solutions to some of the oldest problems in cognitive science.
– E.g., face recognition from various angles.
– View invariance: respond to one particular face (regardless of view) & not to other faces. E.g., face 3 in Fig. P.3 (McLeod, Plunkett, Rolls).
9/1/2014Neural Networks98 Task oWhen we train a network, we want it to produce some behavior. oTask – the behavior we are training the network to do. –E.g., associate the present tense form of a verb with its past tense form. oTask must be precisely defined – for the class of networks we’re dealing with, learning the correct output for a given input. The training environment consists of: –Set of input stimuli. –Correct output paired with each input.
9/1/2014Neural Networks99 Implications of Defining the Task oMust conceptualize behavior in terms of inputs & outputs. –May need an abstract notion of input & output. –E.g., associate 2 forms of a verb – neither is really input for the other. oTeach network task by example, not by explicit rule. –If successful, network learns underlying relationship between input & output by induction. –Can’t assume network has learned the generalization we assume underlies the behavior – may have learned some other behavior! E.g., tanks.
1980s Pentagon trained NN to recognize tanks 9/1/2014Neural Networks100
9/1/2014Neural Networks101 Implications - 2 oNature of training data is extremely important for learning. –The more data you give a network, the better. –With too little data, may make bad generalization. –Quality counts too!! – structure of environment influences outcome. oSome tasks more convincing/more effective/more informative than others to demonstrate a point. –Is info represented in teacher (output) plausibly available to human learners? –E.g., children? See task on next slide.
9/1/2014Neural Networks102 Two Ways to Teach a Network to Segment Sounds into Words 1.Expose network to sequences of sounds (presented one at a time, in order, with no breaks between words). Train network to produce “yes” when the sequence makes a word. The network explicitly learns about words from info about where words start. 2.Train network on a different task – given the same sequences of sounds as input, the task is to predict the next sound. At the beginning of a word, the network makes many mistakes. As it hears more of the word, prediction error declines until the end of the word. Learns about words implicitly, as an indirect consequence of the task. oFirst approach gives away the secret by directly teaching the task (boundary info), which is NOT how children learn.
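The second approach can be illustrated with a toy sketch (hypothetical words and a simple bigram predictor standing in for the network, not the original simulation): a next-sound predictor trained on an unsegmented stream shows higher prediction error at word onsets than inside words.

```python
from collections import defaultdict

# Hypothetical toy vocabulary; the stream has no breaks between words.
words = ["cat", "dog", "cat", "bird", "dog", "cat", "bird", "dog"]
stream = "".join(words)

# Count sound-to-sound transitions in the stream.
counts = defaultdict(lambda: defaultdict(int))
for a, b in zip(stream, stream[1:]):
    counts[a][b] += 1

def prediction_error(a, b):
    # error = 1 - P(next sound is b | current sound is a)
    total = sum(counts[a].values())
    return 1.0 - counts[a][b] / total

# Indices in the stream where a new word starts.
boundaries, pos = set(), 0
for w in words:
    pos += len(w)
    boundaries.add(pos)

internal, onset = [], []
for i in range(len(stream) - 1):
    err = prediction_error(stream[i], stream[i + 1])
    (onset if (i + 1) in boundaries else internal).append(err)

avg_onset = sum(onset) / len(onset)
avg_internal = sum(internal) / len(internal)
print(avg_onset > avg_internal)  # prediction error peaks at word onsets
```

The word boundaries fall out of the statistics of the stream; nothing in the training signal marks them explicitly.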
9/1/2014Neural Networks103 Network Architectures : Number & Arrangement of Nodes in Network 1.Single-layer feedforward networks -- input layer that projects onto output layer of neurons in one direction. 2.Multilayer feedforward network -- has 1+ hidden layers that intervene between external input & network output.
9/1/2014Neural Networks104 Network Architectures : Number & Arrangement of Nodes in Network 3.Recurrent network -- has at least 1 feedback loop. 4.Lattice structure -- 1-D, 2-D or greater arrays of neurons with output neurons arranged in rows & columns.
9/1/2014Neural Networks105 Most Neural Networks Consist of 3 Layers
9/1/2014Neural Networks106 6 Different Types of Connections Used Between Layers (Inter-layer Connections) 1.Fully connected. Each neuron on first layer is connected to every neuron on second layer. 2.Partially connected. Neuron of first layer does not have to be connected to all neurons on second layer. 3.Feed forward. Neurons on first layer send their output to neurons on second layer, but receive no input back from neurons on second layer.
9/1/2014Neural Networks107 4.Bi-directional (recurrent). Another set of connections carries the output of neurons of the second layer back into neurons of the first layer. 5.Hierarchical. Neurons of a lower layer may only communicate with neurons on the next layer. 6.Resonance. Layers have bi-directional connections. –Can continue sending messages across connections a number of times until a certain condition is achieved.
9/1/2014Neural Networks108 How to Select Correct Network Architectures oAny task can be solved by some neural network (in theory) – not any neural network can solve any task. oNumber & arrangement of nodes defines network architecture. oTextbook uses: 1) feedforward. 2) simple recurrent networks. o# nodes depends on task & how I/O are represented. –E.g., if images input in 100x100 dot array -- 10,000 I nodes. oSelection of architecture reflects modeler’s theory about what info processing is required for task.
9/1/2014Neural Networks109 Analysis 1.Train network on task. 2.Evaluate network’s performance & try to understand basis for performance. oNeed to anticipate kinds of tests before training! Ways to evaluate network performance: 1.Global error. 2.Individual pattern error. 3.Analyzing weights & internal representations.
9/1/2014Neural Networks110 Evaluate Network Performance: Global Error oDuring training, simulator calculates discrepancy between actual network output activations & target activations it is being taught to produce. oSimulator reports this error on-line -- sum it over number of patterns. –As learning occurs, error should decline & reach 0. oIf network is trained on task in which same input can produce different outputs, then network can learn correct probabilities, but error rate never reaches 0.
9/1/2014Neural Networks111 Evaluate Network Performance: Individual Pattern Error oGlobal error can be misleading. –If have large # of patterns to learn, global error may be low even if some patterns are not learned correctly. –These may be the interesting patterns. oAlso may want to create special test stimuli not presented to network during training. –Generalize to novel cases? –What has network learned? oHelps discover what generalizations have been created from a finite data set.
9/1/2014Neural Networks112 Evaluate Network Performance: Analyzing Weights & Internal Representations 1.Hierarchical clustering of hidden unit activations. 2.Principal component analysis & projection pursuit. 3.Activation patterns in conjunction with actual weights.
9/1/2014Neural Networks113 Hierarchical Clustering of Hidden Unit Activations oPresent test patterns to network after training. oPatterns produce activations on hidden units which we record & tag -- vectors in multi-dimensional space. oClustering looks at similarity structure of space. oInputs treated as similar by network produce internal representations that are similar. oProduces tree format of inter-pattern distances. oCan’t examine space directly -- difficult to visualize high-dimensional spaces.
9/1/2014Neural Networks114 Principal Component Analysis & Projection Pursuit oUsed to identify interesting lower-dimensional slices of the high-dimensional hidden unit space. oMove viewing perspective around in this space.
9/1/2014Neural Networks115 Activation Patterns in Conjunction With Actual Weights oWhen look at activation patterns, only look at part of what network “knows.” oNetwork manipulates & transforms info via connections between nodes. oExamine connections & weights to see how transformations are being carried out. oHinton diagrams can be used -- weights shown as colored squares with color & size of square representing magnitude & sign of connection.
9/1/2014Neural Networks117 Hinton Diagram. White = positive weight. Black = negative weight. Area of box proportional to absolute value of corresponding weight.
9/1/2014Neural Networks118 What Do We Learn From a Simulation? oAre the simulations framed in such way that clearly address some issue? oAre the task & stimuli appropriate for points being made? oDo you feel you’ve learned something from the simulation?
9/1/2014Neural Networks119 Uses of Neural Networks oPrediction -- Use input values to predict some output. E.g. pick best stocks, predict weather, identify people at risk of cancer. oClassification -- Use input values to determine classification. E.g. is the input the letter A; is a blob of video data a plane & what kind? oData association -- Recognize data that contains errors. E.g. identify characters when the scanner is not working properly. oData conceptualization -- Analyze inputs so that grouping relationships can be inferred. E.g. extract from a database the names most likely to buy a product. oData filtering -- Smooth an input signal. E.g. take the noise out of a telephone signal.
9/1/2014Neural Networks120 Send In The Robots http://www.spacedaily.com/news/robot-01b.html by Annie Strickler and Patrick Barry for NASA Science News, Pasadena - May 29, 2001 oAs a project scientist specializing in artificial intelligence at NASA's Jet Propulsion Laboratory (JPL), Ayanna is part of a team that applies creative energy to a new generation of space missions - - planetary and moon surface explorations led by autonomous robots capable of "thinking" for themselves. oNearly all of today's robotic space probes are inflexible in how they respond to the challenges they encounter (one notable exception is Deep Space 1, which employs artificial intelligence technologies). They can only perform actions that are explicitly written into their software or radioed from a human controller on Earth. oWhen exploring unfamiliar planets millions of miles from Earth, this "obedient dog" variety of robot requires constant attention from humans. In contrast, the ultimate goal for Ayanna and her colleagues is "putting a robot on Mars and walking away, leaving it to work without direct human interaction."
9/1/2014Neural Networks121 o"We want to tell the robot to think about any obstacle it encounters just as an astronaut in the same situation would do," she says. "Our job is to help the robot think in more logical terms about turning left or right, not just by how many degrees." … oTo do this, Ayanna relies on 2 concepts in the field of artificial intelligence: "fuzzy logic" & "neural networks." … oNeural networks also have the ability to learn from experience. This shouldn't be too surprising, since the design of neural networks mimics the way brain cells process information. o"Neural networks allow you to associate general input to a specific output," Ayanna says. "When someone sees four legs and hears a bark (the input), their experience lets them know it is a dog (the output)." This feature of neural networks will allow a robot pioneer to choose behaviors based on the general features of its surroundings, much like humans do.
9/1/2014Neural Networks122 oBy combining these two technologies, Ayanna and her colleagues at JPL hope to create a robot "brain" that can learn on its own how to expertly traverse the alien terrains of other planets. oSuch a brainy 'bot might sound more like the science fiction fantasies of children's comics than a real NASA project, but Ayanna thinks the sci-fi flavor of the project contributes to its importance for space exploration. oAyanna -- who wanted to be television's "Bionic Woman" when she was young, and later decided she wanted to try to build her instead -- says she believes that the flights of imagination common in childhood translate into adult scientific achievement. o"I truly believe science fiction drives real science forward," she says. "You must have imagination to go to the next level."
9/1/2014Neural Networks123 Learning to Use tlearn oDefine task. oDefine architecture. oSet up simulator. –Configuration (.cf) file. –Data (.data) file. –Teach (.teach) file. oCheck architecture. oRun simulation. –Global error. –Pattern error. oExamine weights. –Role of start state. –Role of learning rate. oTry: –Logical OR. –Exclusive OR.
9/1/2014Neural Networks124 Define Task oTrain a neural network to map the Boolean functions AND, OR, EXCLUSIVE OR (XOR). oBoolean functions take a set of inputs (1, 0) & decide if a given input falls into the positive or negative category. oInput & output are activation values of nodes in a network with 2 input units & 1 output unit. oNetworks are simple & relatively easy to construct for this task. oMany of the problems encountered with this task have direct implications for more complex problems.
9/1/2014126 Define Architecture for AND Function o4 input patterns & 2 distinct outputs. –Each input pattern has 2 activation values. –Each output has a single activation. –For every input pattern, have a well-defined output. oUse a simple feedforward network with 2 I units & 1 O unit. Single Layer Perceptron – 1 layer of weights. [Figure: single-layer perceptron with units a0, a1, a2 and weights w20, w21.]
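A minimal sketch of this architecture in Python (the weights here are illustrative hand-picked values, not trained ones): the output unit sums a bias plus weighted inputs and passes the net input through a sigmoid activation.

```python
import math

def sigmoid(x):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))

def and_unit(a1, a2, w0=-15.0, w1=10.0, w2=10.0):
    # net input = bias weight (the bias node is always on) + weighted inputs
    return sigmoid(w0 + w1 * a1 + w2 * a2)

for a1, a2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a1, a2, round(and_unit(a1, a2)))  # rounds to 1 only for (1, 1)
```

With these weights neither input alone can overcome the strongly negative bias, so the unit computes AND.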
9/1/2014Neural Networks128 1.Network menu – New Project option. 2.New project dialogue box appears. 3.Select directory or folder in which to save your project files. Use N: Drive! 4.Call the project “and”. All files associated with a project should have the same name (any name you want). 5.Get 3 windows on screen – each used for entering info relevant to a different aspect of the network architecture. –and.teach – defines output patterns to network, how many & format. –and.data – defines input patterns to network, how many & format. –and.cf – used to define # nodes in network & initial pattern of connectivity between nodes before training.
9/1/2014Neural Networks129 Info Stored in .cf, .data & .teach Files oCan use the editor in tlearn. oOr a text editor or word processor. –Must save files in ASCII format (text). oEnter data for the and.cf file. –Follow upper- & lower-case distinctions, spaces & colons. –Use delete or backspace keys to correct errors. oFile Save command in tlearn.
9/1/2014Neural Networks130 The AND task (INPUT, OUTPUT, CONFIGURATION):
1 AND 1 = 1
0 AND 0 = 0
0 AND 1 = 0
1 AND 0 = 0
9/1/2014Neural Networks131 Key to setting up the simulator. Describes the configuration of the network. Conforms to a fairly rigid format. 3 sections: NODES:, CONNECTIONS:, SPECIAL:
9/1/2014Neural Networks132 NODES: section of and.cf:
NODES:              beginning of the nodes section
nodes = 1           # units in network (not counting input units)
inputs = 2          # input units (counted separately)
outputs = 1         # output units in network
output node is 1    identifies the output unit – the only non-input node in the network. Numbering starts at 1.
oInputs don’t count as nodes; output nodes do. oSpaces are critical.
9/1/2014Neural Networks133 CONNECTIONS: section of and.cf:
CONNECTIONS:        beginning of the section
groups = 0          how many groups of connections are constrained to have the same value
1 from i1-i2        node 1 (output) receives input from the 2 input units; input units are given the prefix i
1 from 0            node 0 is the bias unit, which is always on – so node 1 has a bias
oAll connections in a group are identical in strength. –groups = 0 is common.
9/1/2014Neural Networks134 oA from line provides info about connections – it is a comma-separated list of node #s, with dashes indicating that intermediate node #s are included. –1 from i1-i2 –Contains no spaces. –Nodes are numbered counting from 1. oInputs are numbered, counting from 1, with the i prefix. oNode 0 always outputs a 1 & serves as the bias node. –If biases are desired, connections must be specified from node 0 to the specific other nodes. –1 from 0
9/1/2014Neural Networks135 SPECIAL: section of and.cf:
SPECIAL:            beginning of the section
selected = 1        which units are selected for special printout – the output node (1) is selected
weight_limit = 1.00 sets start weights (from inputs to output & bias to output) randomly in the range of +/- 0.5
oOptional lines can specify: –linear = some nodes linear –bipolar = values range from –1 to 1 –selected = nodes selected for special printout
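Assembled from the three sections described above, the complete and.cf file reads:

```
NODES:
nodes = 1
inputs = 2
outputs = 1
output node is 1
CONNECTIONS:
groups = 0
1 from i1-i2
1 from 0
SPECIAL:
selected = 1
weight_limit = 1.00
```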
9/1/2014Neural Networks136 Data (.data) File oDefines input patterns presented to tlearn. oFirst line is either: –distributed (normal) – each input is a vector of i values, where i = number of input units. –localist – only a few of many input lines are non-zero. oSecond line is an integer specifying the number of input vectors to follow. oRemainder of file consists of input. –Integers or floating-point numbers.
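For the AND task, a matching and.data file might look like this (distributed format; the ordering of the four patterns shown here is one possible choice and must match the .teach file):

```
distributed
4
1 1
0 0
0 1
1 0
```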
9/1/2014Neural Networks138 Teach (.teach) File oRequired whenever learning is to be performed. oFirst line: distributed (normal) localist (only few of many target values nonzero). oInteger specifying # output vectors to follow. oOrdering of output pattern matches ordering of corresponding input patterns in.data file. oIn normal (distributed), each output vector contains o floating point or integer numbers. –o = number of outputs in network. –can use * instead of a floating point number to indicate “don’t care”.
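A matching and.teach file, with one target per pattern, in the same order as the patterns in and.data above:

```
distributed
4
1
0
0
0
```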
9/1/2014Neural Networks140 Checking the Architecture oIf you typed the info into the and.cf, and.data & and.teach files correctly, you should have no problems. otlearn offers a check of and.cf by displaying a picture of the network architecture. –Displays menu, Network Architecture option. –Can change how nodes are displayed, but this doesn’t change the contents of the network configuration file. oGet an error message if there is a mistake in the syntax of the training files. oDoes NOT find incorrect entries in the data!!
9/1/2014Neural Networks142 Running the Simulation oSpecify 3 input files (.cf,.data,.teach) & save them. oSpecify parameters for tlearn to determine initial start state of network, learning rate, & momentum. oNetwork menu, training options.
9/1/2014Neural Networks143 o# training sweeps before stop –training sweep is 1 presentation of input pattern causing activation to propagate thru network & appropriate weight adjustments to be carried out. oOrder in which patterns are presented to network determined by : –train sequentially – presents patterns in order they appear in.data &.teach files. –train randomly – presents patterns in random order. oLearning Rate – determines how fast weights are changed in response to a given error signal. –set to 0.100 oMomentum –discussed later. –set to 0.0
9/1/2014Neural Networks144 oInitial state of network determined by weight values assigned to connections before training starts. –.cf file specifies weight_limit oWeights assigned according to random seed indicated by number next to Seed with: button. –Select any number you like. –Simulation can be replicated using the same random seed – initial start weights of network are identical & patterns are sampled in same random order. oSeed randomly – computer selects random seed. oBoth Seed with & Seed randomly select set of random start weights within the limits specified by weight_limit parameter.
9/1/2014Neural Networks145 Train the Network oOnce set training options, select Train the network from Network menu. oGet tlearn Status display. –# sweeps –Abort, dump current state in weights file. –Iconify – clear screen for other tasks while tlearn runs in background.
9/1/2014Neural Networks146 Has the Network Solved the Problem? 1.Examine global error produced at output nodes averaged across patterns. 2.Examine response of network to individual input patterns. 3.Analyzing weights & internal representations.
9/1/2014Neural Networks147 Examine Global Error oDuring training, simulator calculates discrepancy between actual network output activations & target activations it is being taught to produce. oSimulator reports this error on-line -- sum it over a number of patterns. –As learning occurs, error should decline & reach 0. oIf network is trained on task in which same input can produce different outputs, then network can learn correct probabilities, but error rate never reaches 0. oError calculated by subtracting actual response from desired (target) response. oValue of discrepancy is either: –Positive if target greater than actual output. –Negative if actual output is greater than target output.
9/1/2014Neural Networks148 Root Mean Square (RMS) Error oGlobal error – average error across 4 pairs at a given point in training. otlearn provides Root Mean Square error (RMS) to prevent cancellation of positive & negative numbers. –Average of the squared errors for all patterns. –Returns square root of average.
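The RMS computation can be sketched as follows (a simplified version assuming one output node per pattern):

```python
import math

def rms_error(outputs, targets):
    """Root Mean Square error: square each output-target discrepancy,
    average over patterns, then take the square root."""
    sq = [(t - o) ** 2 for o, t in zip(outputs, targets)]
    return math.sqrt(sum(sq) / len(sq))

# E.g., four AND patterns each off target by 0.35 give RMS error 0.35:
print(round(rms_error([0.35, 0.35, 0.35, 0.65], [0, 0, 0, 1]), 2))
```

Squaring before averaging is what prevents positive and negative discrepancies from cancelling.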
9/1/2014Neural Networks149 Tracks RMS error throughout training (every 100 sweeps). Error decreases as training continues … after 1000 sweeps RMS error = 0.35. –Average output error = 0.35 –Output off target by approx. 0.35 averaged across 4 patterns. AND Network
9/1/2014Neural Networks150 oEquation 3.1 (RMS error = sqrt( Σ_k |t_k – o_k|² / k )) –k indicates the number of input patterns (4 for AND) –o_k is the vector of output activations produced by input pattern k –the number of elements in the vector corresponds to the number of output nodes. E.g., in this case (AND), there is only one output node, so the vector contains only 1 element. –vector t_k specifies the desired or target activations for input pattern k. oWith 1000 sweeps & 4 input patterns, the network sees each pattern approximately 250 times.
9/1/2014Neural Networks151 oGiven RMS error = 0.35, has the network learned the AND function? –Depends on how define acceptable level of error. oActivation function of output unit is sigmoid function (EQ 1.2). –Activation curve never reaches 1.0 or 0.0 –Net input to node would need to be ± infinity. –Always some residual finite error. oSo what level of error is acceptable? No right answer. –Can say all outputs be within 0.1 of target. –Can round off activation values & ones closest to 1.0 are correct if target is 1.0.
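The asymptotic behavior of the sigmoid is easy to see numerically: the activation creeps toward 1.0 as net input grows but never reaches it, so some residual error always remains for binary targets.

```python
import math

def sigmoid(x):
    """Logistic activation (EQ 1.2 style): 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

for net in [2, 5, 10, 20]:
    print(net, sigmoid(net))   # approaches 1.0 but never reaches it

print(sigmoid(20) < 1.0)       # still strictly below 1.0
```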
9/1/2014Neural Networks152 Has Network Solved Problem? oRMS error = 0.35. Solved? oDepends on how define acceptable level of error. –Can’t always use just global error. –Network may have low RMS, but hasn’t solved all input patterns correctly. Exercise 3.3 1.How many times has network seen each input pattern after 1000 sweeps through training set? 2.How small must RMS error be before we can say network has solved problem?
9/1/2014Neural Networks153 Pattern Error – Verify Network Has Learned oRMS error is the average error across 4 patterns. oIs error uniformly distributed across different patterns or have some patterns been correctly learned while others are not?? oVerify network has learned from Network menu –Presents each input pattern to network once & observes resulting output node activation. –Compare output activations with teacher signal in.teach file.
9/1/2014Neural Networks154 oOutput window indicates file and.1000.wts as specification of state of network. oUsed and.data training patterns to verify network performance. oCompare activation values to target activations in and.teach file. oHas the network solved Boolean AND?
9/1/2014Neural Networks155 Pattern Error – Node Activities oActivation levels indicated by squares. –Large white = high activations. –Small white = low activations. –Grey = inactive node.
9/1/2014Neural Networks156 Individual Pattern Error Global error can be misleading. –If have large # of patterns to learn, global error may be low even if some patterns are not learned correctly. –These may be the interesting patterns. Also may want to create special test stimuli not presented to network during training. –Generalize to novel cases? –What has network learned? Helps discover what generalizations have been created from a finite data set.
9/1/2014Neural Networks157 Pattern Error: Present each Input Pattern Just Once Select Verify network has learned from Network menu. Presents each input pattern to network just once. E.g., for AND function, should do 4 sweeps (1 per each training input). Observe resulting output node activations. Compare output activations with teacher signal in.teach file.
9/1/2014Neural Networks158 Output window indicates file and.1000.wts as specification of state of network. Used and.data training patterns to verify network performance. Compare activation values to target activations in and.teach file. Has the network solved Boolean AND? AND Network
9/1/2014Neural Networks159 Calculate Actual RMS Error Value & Compare it to Value Plotted (Boolean AND)
Input   Output   Round Off   Target   Squared Error
0 0     0.099    0           0        0.0098
1 0     0.294    0           0        0.0864
0 1     0.301    0           0        0.0906
1 1     0.620    1           1        0.1444
RMS Error = sqrt(0.3312/4) = 0.2877
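The table's arithmetic can be checked directly (squared errors rounded to 4 places before summing, as on the slide):

```python
import math

outputs = [0.099, 0.294, 0.301, 0.620]   # activations from the verify step
targets = [0, 0, 0, 1]

sq = [round((t - o) ** 2, 4) for o, t in zip(outputs, targets)]
print(sq)                                  # the Squared Error column
print(round(math.sqrt(sum(sq) / 4), 4))    # RMS error
```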
9/1/2014Neural Networks160 Pattern Error – Node Activities Activation levels indicated by squares. –Large white = high activations. –Small white = low activations. –Grey = inactive node.
9/1/2014Neural Networks161 Examine Weights oInput activations transmitted to other nodes along modifiable connections. oPerformance of network determined by strength of connections (weight values). 1.Display menu, Connection Weights (Hinton diagram). –white (positive) –black (negative) –size reflects absolute size of connection bias node/first input/second input
9/1/2014Neural Networks162 oAll rectangles in first column code values of connection from bias node. oRectangles in 2 nd column code connections from 1 st input unit. oAcross columns – higher numbered nodes (from.cf) oRows in each column identify destination nodes of connection. –higher numbered rows indicate higher numbered destination nodes. –Only one node in this example receives inputs (output node) – only one that receives incoming connections.
9/1/2014Neural Networks163 oHinton diagram provides clues how network solves Boolean AND. –Bias has strong negative connection to output node. –2 input nodes have moderately sized positive connections to output node. –One active node by itself can’t provide enough activation to overcome strong negative bias. –Two active input nodes together can overcome negative bias. –Output node only turns on if both input nodes are active!
9/1/2014Neural Networks164 Role of Start State oNetwork solved Boolean AND starting with particular set of random weights & biases. oUse different random seed (Training options) to wipe out learning that has occurred … oCan resume training beyond the specified number of sweeps using the Resume training option. oStart states can have dramatic impact on way network attempts to solve a problem & on final solution. –Training networks with different random seeds is like running subjects on experiments.
9/1/2014Neural Networks165 Role of Learning Rate oLearning rate determines proportion of error signal which is used to change weights in network. –Large learning rates lead to big weight changes. –Small learning rates lead to small weight changes. oTo examine effect of learning rate on performance, run simulation so that learning rate is only factor changed. –Start with same random weights & biases. oModelers often use small learning rate to avoid large weight changes. –Large weight changes can be disruptive (learning is undone). –Large weight changes can be counter-productive when network is close to a solution!
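The proportionality can be sketched with a single delta-rule style update on one connection (hypothetical values; the exact update rule tlearn uses belongs with the discussion of backpropagation):

```python
def weight_change(learning_rate, error_signal, input_activation):
    # Delta-rule sketch: the weight change is the error signal scaled by
    # the input activation and the learning rate.
    return learning_rate * error_signal * input_activation

small = weight_change(0.1, 0.5, 1.0)   # small learning rate, small change
large = weight_change(1.0, 0.5, 1.0)   # 10x the rate, 10x the change
print(small, large)
```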
9/1/2014Neural Networks166 1.Network menu – New Project option. New project dialogue box appears. 2.Select directory or folder in which to save your project files. Use N: Drive! 3.Get 3 windows on screen – each used for entering info relevant to different aspect of network architecture (.teach,.data, &.cf). 4.Check architecture. 5.Specify training option parameters to determine initial start state of network, learning rate, & momentum. 6.Train network (from Network menu). 7.Determine if network has learned task by checking error rates, examine response to individual patterns, etc. Steps To Building Neural Network in tlearn
9/1/2014Neural Networks167 Bias Node First Input Second Input AND Network : Hinton Diagram
9/1/2014Neural Networks168 Hinton Diagram. White = positive weight. Black = negative weight. Area of box proportional to absolute value of corresponding weight.
9/1/2014Neural Networks169 Logical AND Network Implemented With 2 I & 1 O oOutput unit on (value close to 1.0) when both inputs are 1.0. Otherwise off. oWith a large negative weight from the bias unit to the output, the output is off by default. oMake weights from input nodes to output large enough that if both inputs are present, the net input is great enough to turn the output on. –Neither input by itself is large enough to overcome the negative bias. Node 0 is the bias unit, which is always on, so node 1 has a bias.
9/1/2014Neural Networks170 Hinton Diagram Example
9/1/2014Neural Networks171 Weights File in tlearn otlearn keeps an up-to-date record of the network’s state in a weights file. oSaved to disk at regular intervals & at the end of training. oLists all connections in the network, grouped according to receiving node. oIn the and.cf file only 1 receiving node is specified (output node 1).
9/1/2014Neural Networks172 o1st # represents weight on connections from bias node to output node (-2.204). o2nd # (1.328) shows connection from 1 st input node to output. o3rd # (1.36) shows connection from 2 nd input node to output node. oFinal number (0.000) shows connection from output node itself – non-existent due to feedforward nature.
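Feeding the four patterns through a single sigmoid unit with these weights reproduces (to rounding) the activations reported in the verify step:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

bias, w1, w2 = -2.204, 1.328, 1.360   # values from the and.1000.wts file

for a1, a2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    net = bias + w1 * a1 + w2 * a2
    print(a1, a2, round(sigmoid(net), 3))  # ~0.099, 0.294, 0.301, 0.62
```

This is the whole forward pass: the strong negative bias keeps the output low unless both inputs contribute.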
9/1/2014Neural Networks173 Resume Training oCan continue network training by Resume training option on the Network menu. –Extend training by # sweeps & adjust error display to accommodate extra training sweeps. oDoes the RMS error decrease significantly?
9/1/2014Neural Networks174 Several Different Ways to Analyze Weights & Examine Internal Representations 1.Hierarchical clustering of hidden unit activations. 2.Principal component analysis & projection pursuit. 3.Activation patterns in conjunction with actual weights. Examine these methods in detail later in semester!
9/1/2014Neural Networks175 1 - Hierarchical Clustering of Hidden Unit Activations Present test patterns to network after training. Patterns produce activations on hidden units which we record & tag -- vectors in multi-dimensional space. Clustering looks at similarity structure of space. Inputs treated as similar by network produce internal representations that are similar. Produces tree format of inter-pattern distances. Can’t examine space directly -- difficult to visualize high-dimensional spaces.
9/1/2014Neural Networks176 2 - Principal Component Analysis & Projection Pursuit Used to identify interesting lower-dimensional slices of the high-dimensional hidden unit space. Move viewing perspective around in this space.
9/1/2014Neural Networks177 3 - Activation Patterns In Conjunction With Actual Weights When look at activation patterns, only look at part of what network “knows.” Network manipulates & transforms info via connections between nodes. Examine connections & weights to see how transformations are being carried out. Hinton diagrams can be used -- weights shown as colored squares with color & size of square representing magnitude & sign of connection.
9/1/2014Neural Networks178 Has Network Solved AND Problem? RMS error = 0.35. Solved? Depends on how define acceptable level of error. –Can’t always use just global error. –Network may have low RMS, but hasn’t solved all input patterns correctly. Exercise 3.3 1.How many times has network seen each input pattern after 1000 sweeps through training set? 2.How small must RMS error be before we can say network has solved problem? Exercise 3.4 1.Compare exact value of RMS to plotted value.
9/1/2014Neural Networks179 What Do We Learn From a Simulation? Are the simulations framed in such way that clearly address some issue? Are the task & stimuli appropriate for points being made? Do you feel you’ve learned something from the simulation?
9/1/2014Neural Networks180 Logical OR oWhat type of network architecture? o2 input, 1 output + bias node oTry the OR network (pg. 57-62).
Input Activations      Output Activations (Node 3)
Node 0   Node 1        AND   OR   XOR
0        0             0     0    0
0        1             0     1    1
1        0             0     1    1
1        1             1     1    0
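The target columns of this table can be generated with Python's bitwise operators, which compute exactly these Boolean functions on 0/1 inputs:

```python
# AND, OR, XOR targets for the four input patterns.
for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "|", a & b, a | b, a ^ b)  # AND, OR, XOR columns
```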
9/1/2014Neural Networks182 Exclusive OR oCreate third project called xor and try the exclusive OR function with input layer and output layer.
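Unlike AND and OR, XOR is not linearly separable, so a network with only an input layer and an output layer cannot compute it; a hidden layer is needed. A hand-wired sketch (hypothetical weights, not tlearn output) shows one way a 2-2-1 network can do it:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def xor_net(a, b):
    # Hidden unit 1 approximates OR, hidden unit 2 approximates NAND;
    # the output unit approximates AND of the two hidden units.
    h1 = sigmoid(-5 + 10 * a + 10 * b)       # ~OR(a, b)
    h2 = sigmoid(15 - 10 * a - 10 * b)       # ~NAND(a, b)
    return sigmoid(-15 + 10 * h1 + 10 * h2)  # ~AND(h1, h2)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, round(xor_net(a, b)))  # rounds to 0, 1, 1, 0
```

Training a network to discover weights like these is the point of the xor exercise.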
9/1/2014Neural Networks183 Neural Network Simulation Software : tlearn, Membrain oSimulations allow examination of how model solved problem. oSimulator needs to be told: –Network architecture. –Training data. –Learning rate & other parameters. oSimulator: –Creates network. –Performs training. –Reports results. oYou can examine results.
9/1/2014Neural Networks184 Tlearn Software 1.Copy win_tlearn.exe from disk or R: drive to N: drive. 2.Double-click on the file to begin installation. 3.Executable is called tlearn. ohttp://www.columbia.edu/cu/psychology/courses/3205/tlearn/ To download Adobe Acrobat PDF version: ftp://ftp.crl.ucsd.edu/pub/neuralnets/tlearn/TlearnManual.pdf