
1 Neural Networks

2 Artificial Neural Network (ANN)
Neural network -- "a machine that is designed to model the way in which the brain performs a particular task or function of interest" (Haykin, 1994, p. 2). Uses massive interconnection of simple computing cells (neurons or processing units). Acquires knowledge through learning. Modifies the synaptic weights of the network in an orderly fashion to attain the desired design objective. Attempts to use ANNs date back to the 1950s; abandoned by most researchers by the 1970s.

3 Artificial Intelligence (AI)
"A field of study that encompasses computational techniques for performing tasks that apparently require intelligence when performed by humans" (Tanimoto, 1990). Goal: to increase our understanding of reasoning, learning, & perceptual processes. Fundamental issues: knowledge representation, search, perception & inference.

4 Traditional AI vs. Neural Networks
Traditional AI: programs are brittle & overly sensitive to noise; programs are either right or fail completely. Human intelligence is much more flexible (guessing). Neural networks: capture knowledge in a large number of fine-grained units. More potential for partially matching noisy & incomplete data. Knowledge is distributed uniformly across the network. Model for parallelism -- each neuron is an independent unit. Similar to human brains?

5 Artificial Neural Networks
Biologically Inspired Computing, Parallel Distributed Processing, Artificial Neural Networks, Machine Learning Algorithms, Neural Networks, Connectionism, Natural Intelligent Systems, Neuro-computing

6 Handwriting Neural Network

7 http://www.manifestation.com/neurotoys/eliza.php3

8 NETtalk (Sejnowski & Rosenberg)

9 Human Brain "… a highly complex, nonlinear, and parallel computer (information-processing system). It has the capability to organize its structural constituents, known as neurons, so as to perform certain computations (e.g., pattern recognition, perception, and motor control) many times faster than the fastest digital computer in existence today." (Haykin, 1999, Neural Networks: A Comprehensive Foundation, p. 1).

10 Approaches to Studying Brain
Know enough neuroscience to understand why computer models make certain approximations, and to understand when the approximations are good & when they are bad. Know the tools of formal analysis for models: some simple mathematics; access to a simulator or the ability to program. Know enough cognitive science to have some idea about what the system is supposed to do.

11 Why Build Models? “… a model is simply a detailed theory.”
Explicitness -- constructing a model of a theory & implementing it as a computer program requires a great level of detail. Prediction -- it is difficult to predict the consequences of a model due to interactions between different parts of the model; connectionist models are non-linear. Discover & test new experiments & novel situations; there are practical reasons why it is difficult to test a theory in the real world; systematically vary parameters through the full range of possible values. Help understand why a behavior might occur; simulations are open to direct inspection, which supports explanation of behavior.

12 Simulations As Experiments
Easy to do simulations, but difficult to do them well. Running a good simulation is like running a good experiment: a clearly articulated problem (goal); a well-defined hypothesis, a design for testing the hypothesis, & a plan for how to analyze the results. Hypotheses come from current issues in the literature, e.g., test predictions, replicate observed behaviors, test a theory of behavior. Task, stimulus representations & network architectures must be defined.

13 What kinds of problems can ANNs help us understand?
The brain of a newborn child contains billions of neurons, but the child can't perform many cognitive functions. After a few years of receiving continuous streams of signals from the outside world via sensory systems, the child can see, understand language & control movements of the body. The brain discovers, without being taught, how to make sense of signals from the world. How??? Where do you start?

14 NN Applications http://www-cs-faculty.stanford
Character recognition. Image compression. Stock market prediction. Traveling salesman problem. Medicine, electronic nose, loan applications.

15 Neural Networks (ACM)
Web spam detection by probability mapping graphSOMs and graph neural networks
No-reference quality assessment of JPEG images by using CBP neural networks
An Embedded Fingerprints Classification System based on Weightless Neural Networks
Forecasting Portugal global load with artificial neural networks
2006 Special issue: Neural network forecasts of the tropical Pacific sea surface temperatures
Developmental learning of complex syntactical song in the Bengalese finch: A neural network model
Neural networks in astronomy

16 Artificial & Biological Neural Networks
Build intelligent programs using models that parallel the structure of neurons in the human brain. Neurons -- cell body with dendrites & axon. Dendrites receive signals from other neurons. When the combined impulses exceed a threshold, the neuron fires & an impulse passes down the axon. Branches at the end of the axon form synapses with dendrites of other neurons. Synapses are excitatory or inhibitory.

17 Do Neural Networks Mimic Human Brain?
“It is not absolutely necessary to believe that neural network models have anything to do with the nervous system, … … but it helps. Because, if they do, we are able to use a large body of ideas, experiments, and facts from cognitive science and neuroscience to design, construct, and test networks.” (Anderson, 1997, p. 1) 4/6/2017 Neural Networks

18 Neural Networks Abstract From the Details of Real Neurons
Conductivity delays are neglected. Net input is calculated as a weighted sum of input signals. Net input is transformed into an output signal via a simple function (e.g., a threshold function). Output signal is either discrete (e.g., 0 or 1) or a real-valued number (e.g., between 0 and 1).


20 ANN Features A series of simple computational elements, called neurons (or nodes, units, cells). Connections between neurons that carry signals. Each link (connection) between neurons has a weight that can be modified. Each neuron sums the weighted input signals and applies an activation function to determine the output signal (Fausett, 1994).

21 Neural Networks Are Composed of Nodes & Connections
Nodes -- simple processing units. Similar to neurons -- they receive inputs from other sources. Excitatory inputs tend to increase a neuron's rate of firing; inhibitory inputs tend to decrease it. The firing rate is represented by a real-valued number (activation). Input to a node comes from other nodes or from some external source. (Figure: a fully recurrent network and a 3-layer feedforward network.)

22 Connections Input travels along connection lines.
Connections between different nodes can have different potency (connection strength) in many models. Strength is represented by a real-valued number (connection weight). Input from one node to another is multiplied by the connection weight. If the connection weight is a negative number, the input is inhibitory; if positive, the input is excitatory.

23 Nodes & Connections Form Various Layers of NN

24 A Single Node/Neuron: Σ, f(net). Inputs from other nodes; outputs to other nodes.
Inputs to a node are usually summed (Σ). The net input is passed through an activation function (f(net)). This produces the node's activation, which is sent to other nodes. Each input line (connection) represents a flow of activity from some other neuron or some external source.

25 More Complex Model of a Neuron
(Figure: input signals x1, x2, …, xp; synaptic weights of the neuron wk1, wk2, …, wkp; summing function / linear combiner output uk; threshold θk; activation function; output yk.)

26 Add up Net Inputs to Node
Each input (from different nodes) is calculated by multiplying the activation value of the input node by the weight on the connection (from the input node to the receiving node). neti = Σj wij aj = net input to node i. Σ = sigma (summation over j); i = the receiving node; aj = activation of the nodes sending to node i; wij = weight on the connection between nodes j & i.

27 Sums (weights * activation) For All Input Nodes
neti = Σj wij aj. Here i = 4 (node 4) and j ranges over the 3 input nodes into node 4: add up wij * aj for all 3 input nodes.
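As a quick illustration of the weighted-sum step above, here is a minimal Python sketch; the three weights and activations are made-up values for hypothetical sending nodes, not values from the slides:

    # Net input to a receiving node: net_i = sum over j of (w_ij * a_j)
    weights = {1: 0.5, 2: -0.3, 3: 0.8}       # w_4j: weights on connections into node 4
    activations = {1: 1.0, 2: 0.6, 3: 0.9}    # a_j: activations of the three sending nodes
    net_input = sum(weights[j] * activations[j] for j in weights)
    print(net_input)   # 0.5*1.0 + (-0.3)*0.6 + 0.8*0.9 = 1.04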

28 Activation Functions : Node Can Do Several Things With Net Input
Activation (e.g., output) = input: f(net) is the identity function. Simplest case. A threshold must be achieved before activation occurs. The activation function may be a non-linear function of the input (resembles a sigmoid, as in real neurons). The activation function may be linear.

29 Different Types of NN Possible
Single-layer or multi-layer architectures (Hopfield, Kohonen). Data processing through the network: feedforward or recurrent. Variations in nodes: number of nodes, types of connections among nodes in the network. Learning algorithms: supervised, unsupervised (self-organizing), back-propagation learning (training). Implementation: software or hardware.


31 Steps in Designing a Neural Network
Arrange neurons in various layers. Decide the type of connections among neurons of different layers, as well as among neurons within a layer. Decide the way a neuron receives input & produces output. Determine the strength of connections within the network by allowing the network to learn the appropriate values of the connection weights via a training data set.

32 Activation Functions Identity function: f(x) = x for all x
Binary step function: f(x) = 1 if x >= θ; f(x) = 0 if x < θ. Continuous log-sigmoid function (logistic function): f(x) = 1/[1 + exp(-σx)].
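A minimal Python sketch of the three activation functions listed above; theta and sigma correspond to the threshold θ and steepness σ in the formulas:

    import math

    def identity(x):
        # Identity function: f(x) = x
        return x

    def binary_step(x, theta=0.0):
        # Binary step: 1 if x >= theta, else 0
        return 1 if x >= theta else 0

    def logistic(x, sigma=1.0):
        # Log-sigmoid (logistic): 1 / (1 + exp(-sigma * x))
        return 1.0 / (1.0 + math.exp(-sigma * x))

    print(identity(0.7), binary_step(0.7, theta=0.5), round(logistic(0.7), 3))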

33 Sigmoid Activation Function
ai = 1 / (1 + e^-neti), where ai = activation (output) of node i; neti = net activation flowing into node i; e = the exponential. This tells us what the output of the node will be for any given net input. Graph of the relationship (next slide).

34 Sigmoid Activation Function Often Used for Nodes in NN
For net inputs outside the range of -4.0 to 4.0, nodes exhibit all-or-nothing behavior: output at the maximum value of 1 (on) or the minimum value of 0 (off). Within the range of -4.0 to 4.0, nodes show greater sensitivity: the output is capable of making fine discriminations between different inputs. This non-linear response is at the heart of what makes these networks interesting.

35 If node 2 receives input of 1.25, activation of 0.777.
What will be the activation of node 2, assuming the input you just calculated? If node 2 receives an input of 1.25, the activation is 0.777. The activation function scales from 0.0 to 1.0. When the net input = 0.0, the output is the exact mid-range of possible activation (0.5). Negative inputs give activations below 0.5.

36 Example 2-Layered Feedforward Network : Step Thru Process
A neural network consists of a collection of nodes. The number & arrangement of nodes defines the network architecture. Example: a 2-layered feedforward network -- 2 layers (input, output); no intra-level connections; no recurrent connections; a single connection into input nodes & out of output nodes. Very simplified in comparison to a biological neural network! (Figure: a 2-layered feedforward network with input nodes a0, a1, output node a2, and weights w20, w21.)

37 Each input node has certain level of activity associated with it.
2 input nodes (a0, a1); 2 output nodes (a2, a3). Look at one output unit (a2): it receives input from a0 & a1 via independent connections. The amount depends on the activation values of the input nodes (a0 & a1) and the weights (w20, w21). For this network, activity flows in one direction along connections, e.g., w20 ≠ w02 (w02 doesn't exist). Total input to node 2 (a2) = w20*a0 + w21*a1. (The subscript ij is 20 when i = 2 & j = 0.)

38 What is the input received by node 2?
Exercise 1.1: What is the input received by node 2? Net input for node 2 = (1.0 * 0.75) + (1.0 * 0.5) = 1.25. Net input alone doesn't determine the activity of the output node; we must know the activation function of the node. Assume nodes have the activation function shown in EQ 1.2 (& Fig. 1.3). The next slide shows sample inputs & the activations produced, assuming a logistic activation function. (Figure: node a2 with input activations 0.75 and 0.5, both connection weights 1.0.)
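A quick check of Exercise 1.1 in Python, using the logistic activation function of EQ 1.2 and the figure's values (both weights 1.0, input activations 0.75 and 0.5):

    import math

    def logistic(net):
        # Logistic activation (EQ 1.2): a = 1 / (1 + exp(-net))
        return 1.0 / (1.0 + math.exp(-net))

    net_input = (1.0 * 0.75) + (1.0 * 0.5)    # weighted sum into node 2
    print(net_input)                          # 1.25
    print(round(logistic(net_input), 3))      # ~0.777, matching the slide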

39 Bias Node (Default Activation)
In the absence of any input (i.e., input = 0.0), nodes have an output of 0.5. It is useful to allow nodes to have a default activation: a node is "off" (output 0.0) in the absence of input, or it can have a default state where the node is "on". Accomplish this by adding a node to the network which receives no inputs but is always fully activated & outputs 1.0 (the bias node). This node can be connected to any node in the network; it is often connected to all nodes except input nodes. Allow the weights on connections from this node to the receiving nodes to be different.

40 Only need one bias node per network.
Guarantees that all receiving nodes have some input even if all other nodes are off. Since the output of the bias node is always 1.0, the input it sends to any other node is 1.0 * wij (the value of the weight itself). Only need one bias node per network. Similar to giving each node a variable threshold: a large negative bias means the node is off (activation close to 0.0) unless it gets sufficient positive input from other sources to compensate; a large positive bias means the receiving node is on & requires negative input from other nodes to turn it off. Useful to allow individual nodes to have different defaults.

41 Learning From Experience
Changing a neural network's connection weights (training) causes the network to learn the solution to a problem. The strength of the connection between neurons is stored as a weight value for the specific connection. The system learns new knowledge by adjusting these connection weights.

42 Three Training Methods for NN
1. Unsupervised learning -- hidden neurons must find a way to organize themselves without help from outside. No sample outputs are provided to the network against which it can measure its predictive performance for a given vector of inputs. Learning by doing.

43 2. Supervised Learning (Reinforcement)
Works on reinforcement from outside. Connections among neurons in the hidden layer are randomly arranged, then reshuffled as the network is told how close it is to the solution. Requires a teacher -- a training set of data or an observer who grades the performance of the network's results. Both unsupervised & supervised learning suffer from relative slowness & inefficiency when relying on random shuffling to find the proper connection weights.

44 3. Back Propagation. The network is given reinforcement for how it is doing on the task, plus information about errors, which is used to adjust the connections between layers. Proven highly successful in the training of multilayered neural nets. A form of supervised learning.

45 Example Learning Algorithms
Hebb's Rule -- how physical networks might learn. Perceptron Convergence Procedures (PCP). Widrow-Hoff Learning Rule (1960s). Hopfield. Backpropagation of Error (Generalized Delta Rule). Kohonen's Learning Laws (not covered here).

46 McCulloch-Pitts (1943) Neuron
Activity of neuron is an "all-or-none" process. Certain fixed number of synapses must be excited within period of latent addition to excite neuron at any time. Number is independent of previous activity & position of neuron. Only significant delay within nervous system is synaptic delay. Activity of any inhibitory synapse absolutely prevents excitation of neuron at that time. Structure of net does not change with time.

47 McCulloch-Pitts Neuron
Firing within a neuron is controlled by a fixed threshold (θ). Binary step function: f(x) = 1 if x >= θ; f(x) = 0 if x < θ. What happens here if θ = 2?

48 McCulloch-Pitts Neuron AND
Threshold = 2. Does a2 fire?
P Q | P AND Q
T T | T
T F | F
F T | F
F F | F

49 McCulloch-Pitts Neuron OR
Threshold = 2. Does a2 fire?
P Q | P OR Q
T T | T
T F | T
F T | T
F F | F

50 McCulloch-Pitts Neuron XOR
Threshold = 2. Does a2 fire?
P Q | P XOR Q
T T | F
T F | T
F T | T
F F | F

51 McCulloch-Pitts Neuron AND NOT
Did you get weights of 2 for w20 and -1 for w21?
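A small Python sketch of a McCulloch-Pitts unit with a fixed threshold of 2, covering the gates above. The AND NOT weights (2 and -1) come from slide 51; the AND weights (1, 1) and OR weights (2, 2) are conventional choices for this threshold and are assumptions here:

    def mcculloch_pitts(inputs, weights, theta=2):
        # Binary step unit: fires (1) if the weighted sum reaches the threshold
        net = sum(w * x for w, x in zip(weights, inputs))
        return 1 if net >= theta else 0

    for p in (1, 0):
        for q in (1, 0):
            print(p, q,
                  mcculloch_pitts((p, q), (1, 1)),    # AND: both inputs needed to reach 2
                  mcculloch_pitts((p, q), (2, 2)),    # OR: either input alone reaches 2
                  mcculloch_pitts((p, q), (2, -1)))   # AND NOT: p and not q (slide 51 weights)

Note that no choice of two weights and a threshold makes this single unit compute XOR, which is the point of the XOR slide.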

52 McCulloch-Pitts Neuron
No learning algorithm.

53 Hebb : The Organization of Behavior (1949)
"When an axon of cell A is near enough to excite a cell B & repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." If a neuron receives input from another neuron & both are highly active, the weight between the neurons should be strengthened. A specific synaptic change (the Hebb synapse) underlies learning. The result was interconnections between large, diffuse sets of cells in different parts of the brain, called "cell assemblies." Changes suggested by Rochester et al. (1956) make a more practical model.

54 Hebb’s Rule: Associative learning “Cells that fire together, wire together”
Δwij = ai aj, where the change in weight = the product of the activations of the nodes the connection links. Δwij = η ai aj, where η is the learning rate. Unsupervised learning. Despite success at learning some patterns, it only learns these patterns (e.g., pair-wise correlations). There will be times when we want an ANN to learn to associate a pattern with some desired behavior even when there is no pair-wise correlation.
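A minimal sketch of the Hebbian update Δwij = η ai aj for a single connection; the learning rate and the activation history are illustrative values, not from the slides:

    def hebb_update(w, a_i, a_j, eta=0.1):
        # Hebb's rule: strengthen the weight by eta * a_i * a_j
        return w + eta * a_i * a_j

    w = 0.0
    for a_i, a_j in [(1.0, 1.0), (1.0, 0.0), (0.8, 0.9)]:   # co-activation history
        w = hebb_update(w, a_i, a_j)
    print(round(w, 3))   # the weight grows only when both nodes are active together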

55 Pros & Cons of Hebbian Learning
There are known biological mechanisms that might use Hebbian learning. Provides a reasonable answer to "where does the teacher info for the learning process come from?" There is lots of useful info in correlated activity; the system just needs to look for patterns. But all it can learn is pair-wise correlations. We may need to learn to associate patterns with desired behaviors even when there are no pair-wise correlations -- the Hebb rule can't do this.

56 Perceptron Convergence Procedures (PCP)
Variations of Hebb's Rule from the 1960s: Perceptron (Rosenblatt, 1958); the Widrow-Hoff rule is similar to PCP (1960). Start with a network of units whose connections are initialized with random weights. Take a target set of input/output patterns & adjust the weights automatically so that at the end of training the weights yield correct outputs for any input. The network should generalize to produce correct output for input patterns it hasn't seen during training. Also known as the gradient descent rule, Delta rule, or Adaline rule.


58 The Widrow-Hoff Rule starts with connections initialized with random weights; one input pattern is presented to the network. For each input pattern, the network's actual output is compared to the target output for that pattern. Figure 18: Supervised (Delta Rule) vs. Unsupervised (Perceptron) Learning (www.willamette.edu/~gorr/classes/cs449/Classification/delta.html)

59 Only works for simple, 2-layer networks (I/O units).
Any discrepancy (error) is used as the basis for changing the weights on input connections & changing the output node's threshold for activation. How much the weights are changed depends on the error produced & the activation from the given input. The correction is proportional to the error signal multiplied by the value of the activation given by the derivative of the transfer function. Using the derivative allows finely tuned corrections when the activation is near its extreme values (minimum or maximum) & larger corrections when the activation is in the middle range. The goal of the Widrow-Hoff Rule is to minimize error on the output unit by apportioning credit & blame to the input nodes. Only works for simple, 2-layer networks (I/O units).

60 Using Similarity Basic principle that drives learning
Allows generalization of behaviors because similar inputs tend to yield similar outputs. E.g., "make" and "bake" → "made" and "baked"; cats and tigers. Similarity is generally a good rule of thumb, but not in every case. Hebbian networks & basic, 2-layer PCP networks can only learn to generalize on the basis of physical similarity.

61 2-layer Perceptron Can’t Solve Problem of Boolean XOR
If we want the output to be true (1): at least 1 input must be 1 & at least 1 weight must be large enough so that, when multiplied, the output node turns on. For patterns 00 & 11 we want 0, so set the weights to 0. For patterns 01 & 10, we need the weight from either input large enough so that 1 input alone activates the output. Contradictory requirements -- no set of weights allows the output to come on if either input is on & keeps it off if both are on! (Figure: XOR truth table and a 2-input perceptron with inputs a0, a1, weights w20, w21, output a2.)

62 Vectors Vector -- collection of numbers or point in space.
Vector -- a collection of numbers, or a point in space. Can think of the inputs in the XOR example as a 2-D space, with each number indicating how far out along the dimension the point is located. Judge the similarity of 2 vectors by their Euclidean distance in space. The pairs of patterns furthest apart & most dissimilar (00 & 11) are the ones we need to group together for the XOR function. (Figure: the four input points 0,0; 0,1; 1,0; 1,1 plotted in 2-D.)

63 I/O weights impose linear decision bound on input space.
(Figure: the four input points plotted for AND, OR, and XOR.) Patterns which fall on one side of the decision line are classified differently than patterns on the other side. When groups of inputs can't be separated by a line, there is no way for the unit to discriminate between the categories. Such problems are called non-linearly separable. What's needed are hidden units & learning algorithms that can handle more than one layer.

64 Solving the XOR Problem : Allow Internal Representation
Add extra node(s) between input & output → XOR problem solved. "Hidden" units are equivalent to internal representations & aren't seen by the world. Very powerful -- networks have internal representations that capture more abstract, functional relationships. Inputs (sensors), outputs (motor effectors) & hidden units (inter-neurons). Input similarity is still important: all things being equal, physical resemblance of inputs exerts strong pressure to induce similar responses.

65 Hidden Units & XOR Problem
(Figure: panels (a) input space, (b) hidden-unit space, (c) output.) (a) What the input looks like to the network, showing the intrinsic similarity structure of the inputs. Input vectors are passed through the weights between inputs & hidden units (multiplied); this transforms (folds) the input space to produce (b). (b) The 2 most distinct patterns (11, 00) are close in hidden space. Weights to the output unit can then impose a linear decision bound & classify the output (c).

66 Hidden Units Used to Construct Internal Representations of External World
Hidden units make it possible for the network to treat physically similar inputs as different, as needed. They transform input representations into more abstract kinds of representations. They solve difficult problems like XOR. However, being able to solve a problem just means that some set of weights exists -- in principle. The network must be able to learn these weights! The real challenge is how to train networks! One solution -- backpropagation of error.

67 Can’t specify error at this level of network.
Earlier laws (PCP) can't handle hidden layers since they don't know how to change the weights leading to them. PCP & others work well for weights leading to outputs, since we have a target for the output & can calculate the weight changes. The problem occurs when we have hidden units -- how do we change the weights from inputs to hidden units? With these algorithms we must know how much error is already apparent at the level of the hidden units before the output is activated. We don't have a predefined target for the hidden units, so we can't say what their activation levels should be. We can't specify error at this level of the network.

68 Hopfield Recurrent ANN
They are guaranteed to converge to a local minimum, but convergence to one of the stored patterns is not guaranteed.

69 Backpropagation of Error (Rumelhart, Hinton & Williams, 1986)
AKA the Generalized Delta Rule (δ). Begin with a network which has been assigned initial weights drawn at random, usually from a uniform distribution with a mean of 0.0 & some user-defined upper & lower bounds (±1.0). The user has a set of training data in the form of input/output pairs. Goal of training -- learn a single set of weights such that any input pattern will produce the correct output pattern. It is desirable if the weights allow the network to generalize to novel data not seen during training.

70 Backprop Extremely powerful learning tool.
Applied over a wide range of domains. Provides a very general framework for learning. Implements gradient descent search in the space of possible network weights to minimize network error. What counts as error is up to the modeler -- usually the squared difference between target & actual output, but any quantity that is affected by the weights may be minimized.

71 Backprop Training Takes 4 Steps
Select I/O pattern (usually at random). Compare network's output with desired output (teacher pattern) on node-by-node basis & calculate error for each output node. Propagate error info backwards in network from output to hidden. Adjust weights on connections to reduce errors.

72 1. Select I/O pattern Pattern usually selected at random.
Input pattern used to activate network & activation values for output nodes are calculated. Can have additional nodes between I/O ("hidden"). Since weights selected at random, outputs generated at start are typically not those that go with input pattern.

73 2: Calculate Delta (δip) Error (EQ 1.3)
δip = (tip - oip) f'(netip) = (tip - oip) oip (1 - oip). δip = the difference in value between the target for node i on training pattern p (tip) and the actual output for that node on that pattern (oip), multiplied by the derivative of the output node's activation function given its input. f'(netip) = slope of the activation function (EQ 1.2, Fig. 1.3) -- steepest around the middle of the function, where net input is closest to 0.

74 For large values of net input to node (+ & -), derivative is small.
(δip) will be small. Net input to a node tends to be large when the connections feeding into it are strong. Weak connections tend to yield a small input to the node; the derivative of the activation function is then large & (δip) can be large.
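A small Python sketch of EQ 1.3 for a logistic output node; the target and output values are illustrative:

    def output_delta(target, output):
        # EQ 1.3 for a logistic unit: delta = (t - o) * o * (1 - o)
        return (target - output) * output * (1.0 - output)

    print(round(output_delta(1.0, 0.777), 4))   # error signal when the target is 1 and the output is 0.777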

75 Weight Changes in the Delta Rule
(Figure: error surface over weights x & y, showing the current weight vector, the delta vector, the new weight vector, and the ideal weight vector.)

76 Gradient Descent Learning Rule
Moves the weight vector from its current position on the bowl to a new position closer to the minimum error by falling down the negative gradient of the bowl. Not guaranteed to find the correct answer: it always goes downhill & may get stuck in a local minimum. Use momentum to "push" changes in the same direction & possibly keep the network from getting stuck.

77 Backprop: Calculate Weight Adjustments
We know, for each output node, how far off the target value is. We must adjust the weights on the connections that feed into it to reduce the error. We want to change the weight on the connection from every node j coming into the current node i so that we can reduce the error on the pattern.

78 Partial derivative – rate of change.
There may be other variables, but they're being held constant. Measures how the quantity on top changes when the quantity on the bottom is changed, i.e., how is error (E) affected by changing the weights (w)? If we know this, we know how to change the weight to decrease the error, i.e., to decrease the discrepancy between what the network outputs & what we want it to output.

79 Large values are in the mid-range.
The partial derivative is bell-shaped for sigmoidal curves (threshold function); large values are in the mid-range. Contributes to the stability of the network -- as outputs approach 0 or 1, only small changes occur. Helps compensate for excessive blame attached to hidden nodes. (η) = learning rate. Converts the partial derivative in EQ 1.4 to EQ 1.5.

80 Backprop: Delta Rule (EQ 1.5): Δwij = η δip ojp
Make changes small -- the learning rate (η) is set to less than 1.0 so that changes aren't too drastic. The change in weight depends on the error we have for unit i (δip). Take the output into account (ojp), since a node's error is related to how much (mis)information it has received from another node. If a node is highly active & contributed lots to the current activation, then it is responsible for much of the current error. If a node sends no activation to unit i, it won't contribute to i's error.

81 Delta Rule continued: Δwij = η δip ojp
δip reflects the error on unit i for input pattern p: the difference between target & output. It also includes the partial derivative (EQ 1.4). Calculate the errors on all output nodes & the weight changes on the connections coming into them -- but don't make any changes yet. Δwij = η δip ojp

82 3. Propagate error info backwards from output to hidden
Assign shared blame to a hidden unit on the basis of: the errors on the output units the hidden unit is activating, and the strength of the connection between the hidden unit & each output it connects to. Move to the hidden layer(s), if any, & use EQ 1.5 to change the weights leading into the hidden units from below. Can't use EQ 1.3 to compute the hidden nodes' errors since there is no given target to compare with. Hidden nodes "inherit" the errors of all nodes they've activated. If the nodes activated by a hidden unit have large errors, then the hidden unit shares the blame.

83 k indexes output node feeding back to hidden node.
Calculate the hidden unit's error by summing up the errors of the nodes it activates, multiplied by the weight between the nodes (since that weight determines its effect): δip = f'(netip) Σk δkp wki, where i = hidden node, p = current pattern, and k indexes the output nodes feeding back to the hidden node. The derivative of the hidden unit's activation function is multiplied in. This continues iteratively down through the network (backpropagation of error)…

84 4. Adjust weights on connections to reduce errors
When we reach the layer above the input layer (inputs have no incoming weights), actually impose the weight changes. (Figure: error flow backwards through the network.)
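Pulling the four steps together, here is a compact Python sketch of backpropagation for a small feedforward network with one hidden layer of logistic units, following EQ 1.3 and EQ 1.5 as described above. The network size (2-2-1), learning rate, number of sweeps, and training task (XOR) are illustrative assumptions, not taken from the slides:

    import math, random

    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))

    random.seed(0)
    w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]   # 2 hidden units, 2 inputs + bias
    w_output = [random.uniform(-1, 1) for _ in range(3)]                       # 1 output unit, 2 hidden + bias
    patterns = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
    eta = 0.5

    for sweep in range(50000):
        inputs, target = random.choice(patterns)                # step 1: select an I/O pattern
        x = inputs + [1.0]                                      # append the bias input
        hidden = [logistic(sum(w * v for w, v in zip(ws, x))) for ws in w_hidden]
        h = hidden + [1.0]
        output = logistic(sum(w * v for w, v in zip(w_output, h)))

        delta_out = (target - output) * output * (1 - output)   # step 2: EQ 1.3
        delta_hid = [h_i * (1 - h_i) * delta_out * w_output[i]  # step 3: propagate error back
                     for i, h_i in enumerate(hidden)]

        w_output = [w + eta * delta_out * h_i                   # step 4: EQ 1.5 weight changes
                    for w, h_i in zip(w_output, h)]
        for i, d in enumerate(delta_hid):
            w_hidden[i] = [w + eta * d * x_j for w, x_j in zip(w_hidden[i], x)]

    for inputs, target in patterns:                              # check the learned mapping
        x = inputs + [1.0]
        h = [logistic(sum(w * v for w, v in zip(ws, x))) for ws in w_hidden] + [1.0]
        print(inputs, target, round(logistic(sum(w * v for w, v in zip(w_output, h))), 2))

As the gradient-descent slide notes, some random seeds can leave the network in a local minimum; re-running with a different seed usually fixes this.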

85 Backprop Pros & Cons. Extremely powerful learning tool that is applied over a wide range of domains. Provides a very general framework for learning. Implements gradient descent search. What counts as error is up to the modeler: usually the squared difference between target & actual output, but any quantity that is affected by the weights may be minimized. Requires a large # of presentations of the input data to learn. Each presentation requires 2 passes through the network (forward & backward), and each pass is computationally complex.


87 Kohonen

88 3 Ways Developmental Models Handle Change
1. Development results from working out predetermined behaviors; change is the triggering of innate knowledge. 2. Change is inductive learning; learning involves copying or internalizing behaviors present in the environment. 3. Change arises through the interaction of maturational factors, under genetic control, and the environment. Progress in the neurosciences: a computational framework is good for exploring & modeling.

89 Biologically-Oriented Connectionism (Elman, et al)
We think it is critical to pay attention to what is known about the genetic basis for behavior & about developmental neuroscience. At the level of computation & modeling, we believe it is important to understand the sorts of computations that can plausibly be carried out in neural systems. We take a broad view of biology which includes concern for the evolutionary basis for behavior. A broader biological perspective emphasizes adaptive aspects of behaviors & recognizes that understanding adaptation requires attention to the environment.

90 Connectionist Models. Cognitive functions are performed by a system that computes with simple neuron-like elements, acting in parallel, on distributed representations. Such models have precisely matched data from human subject experiments. E.g., measure the speed of reading words -- it depends on the frequency of the word & the regularity of its pronunciation pattern (e.g., GAVE, HAVE). Similar pattern (humans -- latency, NN -- errors). Fig. P.1 on p. 3 (McLeod, Plunkett, Rolls).


92 Connectionist models can predict results.
Suggest areas of investigation. E.g., U-shaped learning or over-generalization problems when kids learn the past tense of verbs (WENT – GOED) suggest that linguistic development occurs in stages. A NN model produced over-regularization errors. Fig. P.2 (McLeod, Plunkett, Rolls).


94 E.g., face recognition from various angles.
Connectionist models have suggested solutions to some of the oldest problems in cognitive science. E.g., face recognition from various angles. View invariance – respond to one particular face (regardless of view) & not the other faces. E.g., face 3 in Fig. P.3. (McLeod, Plunkett, Rolls)




98 Task When train network, want it to produce some behavior.
Task -- the behavior that we are training the network to do, e.g., associate the present tense form of a verb with the past tense form. The task must be precisely defined -- for the class of networks we're dealing with, learning the correct output for a given input. Training environment: a set of input stimuli, with the correct output paired with each input.

99 Implications of Defining the Task
Must conceptualize the behavior in terms of inputs & outputs. May need an abstract notion of input & output -- e.g., associating 2 forms of a verb, where neither is really the input for the other. Teach the network the task by example, not by explicit rule. If successful, the network learns the underlying relationship between input & output by induction. Can't assume the network has learned the generalization we assume underlies the behavior -- it may have learned some other behavior! E.g., the tanks example.

100 1980s Pentagon trained NN to recognize tanks

101 Implications - 2. The nature of the training data is extremely important for learning. The more data you give a network, the better; with too little data, it may make bad generalizations. Quality counts too!! -- the structure of the environment influences the outcome. Some tasks are more convincing/more effective/more informative than others for demonstrating a point. Is the info represented in the teacher (output) plausibly available to human learners, e.g., children? See the task on the next slide.

102 Two Ways to Teach Network to Segment Sounds into Words
(1) Expose the network to sequences of sounds (presented one at a time, in order, with no breaks between words). Train the network to produce "yes" when a sequence makes a word. It explicitly learns about words from information about where words start. (2) Train the network on a different task -- given the same sequences of sounds as input, the task is to predict the next sound. At the beginning of a word, the network makes many mistakes; as it hears more of the word, prediction error declines until the end of the word. It learns about words implicitly, as an indirect consequence of the task. The first approach gives away the secret by directly teaching the task (boundary info), which is NOT how children learn.

103 Network Architectures : Number & Arrangement of Nodes in Network
Single-layer feedforward networks -- input layer that projects onto output layer of neurons in one direction. Multilayer feedforward network -- has 1+ hidden layers that intervene between external input & network output.

104 Network Architectures : Number & Arrangement of Nodes in Network
Recurrent network -- has at least 1 feedback loop. Lattice structure -- 1-D, 2-D or greater arrays of neurons with output neurons arranged in rows & columns.

105 Most Neural Networks Consist of 3 Layers

106 6 Different Types of Connections Used Between Layers (Inter-layer Connections)
Fully connected. Each neuron on first layer is connected to every neuron on second layer. Partially connected. Neuron of first layer does not have to be connected to all neurons on second layer. Feed forward. Neurons on first layer send their output to neurons on second layer, but receive no input back from neurons on second layer.

107 More inter-layer connection types.
Bi-directional (recurrent). Another set of connections carries the output of neurons of the second layer into the neurons of the first layer. Hierarchical. Neurons of a lower layer may only communicate with neurons on the next level of layers. Resonance. Layers have bi-directional connections; they can continue sending messages across the connections a number of times until a certain condition is achieved.

108 How to Select Correct Network Architectures
Any task can be solved by some neural network (in theory) -- but not any neural network can solve any task. The number & arrangement of nodes defines the network architecture. The textbook uses: 1) feedforward and 2) simple recurrent networks. The # of nodes depends on the task & how I/O are represented -- e.g., if images are input in a 100x100 dot array, 10,000 input nodes. Selection of the architecture reflects the modeler's theory about what info processing is required for the task.

109 Analysis Train network on task.
Evaluate network's performance & try to understand basis for performance. Need to anticipate kinds of tests before training! Ways to evaluate network performance: Global error. Individual pattern error. Analyzing weights & internal representations.

110 Evaluate Network Performance: Global Error
During training, the simulator calculates the discrepancy between the actual network output activations & the target activations it is being taught to produce. The simulator reports this error on-line, summed over a number of patterns. As learning occurs, the error should decline & approach 0. If the network is trained on a task in which the same input can produce different outputs, then the network can learn the correct probabilities, but the error rate never reaches 0.

111 Evaluate Network Performance: Individual Pattern Error
Global error can be misleading. If we have a large # of patterns to learn, the global error may be low even if some patterns are not learned correctly -- and these may be the interesting patterns. We may also want to create special test stimuli not presented to the network during training. Does it generalize to novel cases? What has the network learned? Helps discover what generalizations have been created from a finite data set.

112 Evaluate Network Performance: Analyzing Weights & Internal Representations
Hierarchical clustering of hidden unit activations. Principal component analysis & projection pursuit. Activation patterns in conjunction with actual weights.

113 Hierarchical Clustering of Hidden Unit Activations
Present test patterns to the network after training. The patterns produce activations on the hidden units, which we record & tag -- vectors in a multi-dimensional space. Clustering looks at the similarity structure of the space: inputs treated as similar by the network produce internal representations that are similar. Produces a tree format of inter-pattern distances. We can't examine the space directly -- it is difficult to visualize high-dimensional spaces.

114 Principal Component Analysis & Projection Pursuit
Used to identify interesting lower-dimensional slices of the space examined in hierarchical clustering. Move the viewing perspective around in this space.

115 Activation Patterns in Conjunction With Actual Weights
When we look at activation patterns, we only see part of what the network "knows." The network manipulates & transforms info via the connections between nodes. Examine the connections & weights to see how the transformations are being carried out. Hinton diagrams can be used -- weights are shown as colored squares, with the color & size of the square representing the magnitude & sign of the connection.


117 White = positive weight. Black = negative weight.
Hinton Diagram. White = positive weight. Black = negative weight. Area of box proportional to absolute value of corresponding weight.

118 What Do We Learn From a Simulation?
Are the simulations framed in such a way that they clearly address some issue? Are the task & stimuli appropriate for the points being made? Do you feel you've learned something from the simulation?

119 Uses of Neural Networks
Prediction -- Use input values to predict some output. E.g. pick best stocks, predict weather, identify cancer risk people. Classification -- Use input values to determine classification. E.g. is input letter A; is blob of video data a plane & what kind? Data association -- Recognize data that contains errors. E.g. identify characters when scanner is not working properly. Data Conceptualization -- Analyze inputs so that grouping relationships can be inferred. E.g. extract from database names most likely to buy product. Data Filtering -- Smooth an input signal. E.g. take the noise out of a telephone signal.

120 Send In The Robots http://www.spacedaily.com/news/robot-01b
Send In The Robots by Annie Strickler and Patrick Barry for NASA Science News, Pasadena - May 29, 2001. As a project scientist specializing in artificial intelligence at NASA's Jet Propulsion Laboratory (JPL), Ayanna is part of a team that applies creative energy to a new generation of space missions -- planetary and moon surface explorations led by autonomous robots capable of "thinking" for themselves. Nearly all of today's robotic space probes are inflexible in how they respond to the challenges they encounter (one notable exception is Deep Space 1, which employs artificial intelligence technologies). They can only perform actions that are explicitly written into their software or radioed from a human controller on Earth. When exploring unfamiliar planets millions of miles from Earth, this "obedient dog" variety of robot requires constant attention from humans. In contrast, the ultimate goal for Ayanna and her colleagues is "putting a robot on Mars and walking away, leaving it to work without direct human interaction."

121 "We want to tell the robot to think about any obstacle it encounters just as an astronaut in the same situation would do," she says. "Our job is to help the robot think in more logical terms about turning left or right, not just by how many degrees." … To do this, Ayanna rely on 2 concepts in field of artificial intelligence: "fuzzy logic" & "neural networks." … Neural networks also have ability to learn from experience. This shouldn't be too surprising, since design of neural networks mimics way brain cells process information. "Neural networks allow you to associate general input to a specific output," Ayanna says. "When someone sees four legs and hears a bark (the input), their experience lets them know it is a dog (the output)." This feature of neural networks will allow a robot pioneer to choose behaviors based on the general features of its surroundings, much like humans do. “ 4/6/2017 Neural Networks

122 By combining these two technologies, Ayanna and her colleagues at JPL hope to create a robot "brain" that can learn on its own how to expertly traverse the alien terrains of other planets. Such a brainy 'bot might sound more like the science fiction fantasies of children's comics than a real NASA project, but Ayanna thinks the sci-fi flavor of the project contributes to its importance for space exploration. Ayanna -- who wanted to be television's "Bionic Woman" when she was young, and later decided she wanted to try to build her instead -- says she believes that the flights of imagination common in childhood translate into adult scientific achievement. "I truly believe science fiction drives real science forward," she says. "You must have imagination to go to the next level."

123 Learning to Use tlearn Define task. Define architecture.
Setting up the simulator: configuration (.cf) file, data (.data) file, teach (.teach) file. Check the architecture. Run the simulation: global error, pattern error, examine weights. Role of the start state. Role of the learning rate. Try: Logical OR, Exclusive OR.

124 Define Task. Train a neural network to map the Boolean functions AND, OR, EXCLUSIVE OR (XOR). Boolean functions take a set of inputs (1, 0) & decide if a given input falls into the positive or negative category. Input & output are activation values of nodes in a network with 2 input units & 1 output unit. The networks are simple & relatively easy to construct for the task. Many of the problems encountered with this task have direct implications for more complex problems.

125 Boolean Functions AND, OR, XOR
Input activations (Node 0, Node 1) and output activations (Node 3):
Node 0  Node 1 | AND  OR  XOR
0       0      | 0    0   0
0       1      | 0    1   1
1       0      | 0    1   1
1       1      | 1    1   0
4 possible input combinations (2^2).

126 Define Architecture for AND Function
4 input patterns & 2 distinct outputs. Each input pattern has 2 activation values; each output has a single activation. For every input pattern, we have a well-defined output. Use a simple feedforward network with 2 input units & 1 output unit -- a single-layer perceptron, with 1 layer of weights. (Figure: inputs a0, a1, output a2, weights w20, w21.)


128 Network menu – New Project option. New project dialogue box appears.
Select the directory or folder in which to save your project files. Use the N: drive! Call the project "and". All files associated with a project should have the same name (any name you want). You get 3 windows on screen -- each used for entering info relevant to a different aspect of the network architecture. and.teach -- defines the output patterns to the network, how many & their format. and.data -- defines the input patterns to the network, how many & their format. and.cf -- used to define the # of nodes in the network & the initial pattern of connectivity between nodes before training.

129 Info Stored in .cf, .data & .teach Files
Can use the editor of tlearn, or a text editor or word processor. Must save files in ASCII format (text). Enter the data for the and.cf file. Follow upper- & lower-case distinctions, spaces & colons. Use the delete or backspace keys to correct errors. File Save command in tlearn.

130 Input configuration → output for AND: 1 AND 1 = 1; 0 AND 0 = 0; 0 AND 1 = 0; 1 AND 0 = 0.

131 The .cf file: key to setting up the simulator.
Describes the configuration of the network. Conforms to a fairly rigid format. 3 sections: NODES:, CONNECTIONS:, SPECIAL:

132 NODES: (beginning of the nodes section)
nodes = 1 -- # of units in the network (not counting inputs)
inputs = 2 -- # of input units (counted separately)
outputs = 1 -- # of output units in the network
output node is 1 -- identifies the output unit, the only non-input node in the network
Node numbering starts at 1; inputs don't count as nodes. Output nodes are given as a <node-list>. Spaces are critical.

133 CONNECTIONS: (beginning of the connections section)
groups = 0 -- how many groups of connections are constrained to have the same value
1 from i1-i2 -- node 1 (the output) receives input from the 2 input units; input units are given the prefix i
1 from 0 -- node 0 is the bias unit, which is always on, so node 1 has a bias
All connections in a group have identical strength; groups = 0 is common.

134 Nodes numbered counting from 1.
<node-list> from <node-list> provides info about connections. A <node-list> is a comma-separated list of node #s, with dashes indicating that intermediate node #s are included, e.g., 1 from i1-i2 (contains no spaces). Nodes are numbered counting from 1. Inputs are numbered, counting from 1, with the i prefix. Node 0 always outputs a 1 & serves as the bias node. If biases are desired, connections must be specified from node 0 to specific other nodes, e.g., 1 from 0.

135 SPECIAL: (beginning of the special section)
selected = 1 -- which units are selected for special printout; the output node (1) is selected
weight_limit = 1.00 -- sets the start weights (from inputs to output & biases to output) randomly within a limited range
Optional lines can specify:
linear = <node-list> -- some nodes are linear
bipolar = <node-list> -- values range from -1 to 1
selected = <node-list> -- nodes selected for special printout
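Putting the three sections together, a sketch of what the complete and.cf file might look like for this 2-input, 1-output AND network, based on the entries described on the last few slides:

    NODES:
    nodes = 1
    inputs = 2
    outputs = 1
    output node is 1
    CONNECTIONS:
    groups = 0
    1 from i1-i2
    1 from 0
    SPECIAL:
    selected = 1
    weight_limit = 1.00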

136 Data (.data) File Defines input patterns presented to tlearn.
First line is either: distributed (normal) – set of vectors with i values. localist (only few numbers of many input lines are non-zero). Second line is integer specifying number of input vectors to follow. Remainder of file consists of input. Integers or floating-point numbers. 4/6/2017 Neural Networks


138 Teach (.teach) File Required whenever learning is to be performed.
First line: distributed (normal) or localist (only a few of many target values are nonzero). Then an integer specifying the # of output vectors to follow. The ordering of the output patterns matches the ordering of the corresponding input patterns in the .data file. In normal (distributed) form, each output vector contains o floating-point or integer numbers, where o = the number of outputs in the network. Can use * instead of a floating-point number to indicate "don't care".
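For the AND task, the and.data and and.teach files might look like the sketches below; the ordering of the four patterns is an assumption (it only matters that the two files list them in the same order):

    and.data:
    distributed
    4
    0 0
    0 1
    1 0
    1 1

    and.teach:
    distributed
    4
    0
    0
    0
    1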


140 Checking the Architecture
If you typed the info into the and.cf, and.data & and.teach files correctly, you should have no problems. tlearn offers a check of and.cf by displaying a picture of the network architecture (Display menu, Network Architecture option). You can change how you see the nodes, but this doesn't change the contents of the network configuration file. You get an error message if there is a mistake in the syntax of the training files -- but it does not find incorrect entries in the data!!


142 Running the Simulation
Specify 3 input files (.cf, .data, .teach) & save them. Specify parameters for tlearn to determine initial start state of network, learning rate, & momentum. Network menu, training options.

143 # training sweeps before stop
A training sweep is 1 presentation of an input pattern, causing activation to propagate through the network & the appropriate weight adjustments to be carried out. The order in which patterns are presented to the network is determined by: train sequentially -- presents patterns in the order they appear in the .data & .teach files; train randomly -- presents patterns in random order. Learning rate -- determines how fast weights are changed in response to a given error signal (set to 0.100). Momentum -- discussed later (set to 0.0).

144 .cf file specifies weight_limit
The initial state of the network is determined by the weight values assigned to connections before training starts. The .cf file specifies weight_limit. Weights are assigned according to the random seed indicated by the number next to the Seed with: button; select any number you like. A simulation can be replicated using the same random seed -- the initial start weights of the network are identical & patterns are sampled in the same random order. Seed randomly -- the computer selects a random seed. Both Seed with and Seed randomly select a set of random start weights within the limits specified by the weight_limit parameter.

145 Train the Network Once set training options, select Train the network from Network menu. Get tlearn Status display. # sweeps Abort, dump current state in weights file. Iconify – clear screen for other tasks while tlearn runs in background. 4/6/2017 Neural Networks

146 Has the Network Solved the Problem?
Examine global error produced at output nodes averaged across patterns. Examine response of network to individual input patterns. Analyzing weights & internal representations. 4/6/2017 Neural Networks

147 Examine Global Error During training, simulator calculates discrepancy between actual network output activations & target activations it is being taught to produce. Simulator reports this error on-line -- sum it over a number of patterns. As learning occurs, error should decline & reach 0. If network is trained on task in which same input can produce different outputs, then network can learn correct probabilities, but error rate never reaches 0. Error calculated by subtracting actual response from desired (target) response. Value of discrepancy is either: Positive if target greater than actual output. Negative if actual output is greater than target output. 4/6/2017 Neural Networks

148 Root Mean Square (RMS) Error
Global error – average error across 4 pairs at a given point in training. tlearn provides Root Mean Square error (RMS) to prevent cancellation of positive & negative numbers. Average of the squared errors for all patterns. Returns square root of average. 4/6/2017 Neural Networks

149 AND Network Tracks RMS error throughout training (every 100 sweeps).
Error decreases as training continues … after 1000 sweeps the RMS error = 0.35. Average output error = 0.35: the output is off target by approximately 0.35, averaged across the 4 patterns.

150 k indicates number of input patterns (4 for AND)
Equation 3.1: k indicates the number of input patterns (4 for AND). ok is the vector of output activations produced by input pattern k; the number of elements in the vector corresponds to the number of output nodes -- e.g., in this case (AND) there is only one output node, so the vector contains only 1 element. The vector tk specifies the desired or target activations for input pattern k. With 1000 sweeps & 4 input patterns, the network sees each pattern approximately 250 times.
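A small Python sketch of the RMS error computation as described verbally above (square the pattern errors, average them over the patterns, take the square root); the printed form of Equation 3.1 is not reproduced on the slide, and the example values are illustrative:

    import math

    def rms_error(targets, outputs):
        # Root mean square error across patterns: sqrt(mean of squared errors)
        squared = [(t - o) ** 2 for t, o in zip(targets, outputs)]
        return math.sqrt(sum(squared) / len(squared))

    # Illustrative single-output example with 4 patterns
    print(round(rms_error([0, 0, 0, 1], [0.1, 0.2, 0.3, 0.6]), 3))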

151 Given RMS error = 0.35, has the network learned the AND function?
Depends on how we define an acceptable level of error. The activation function of the output unit is the sigmoid function (EQ 1.2); the activation curve never reaches 1.0 or 0.0 -- the net input to the node would need to be ± infinity -- so there is always some residual finite error. So what level of error is acceptable? There is no right answer. We can say all outputs must be within 0.1 of the target, or we can round off the activation values & count those closest to 1.0 as correct if the target is 1.0.

152 Has Network Solved Problem?
RMS error = 0.35. Solved? Depends on how we define an acceptable level of error. Can't always use just the global error: the network may have a low RMS error, but hasn't solved all input patterns correctly. Exercise 3.3: How many times has the network seen each input pattern after 1000 sweeps through the training set? How small must the RMS error be before we can say the network has solved the problem?

153 Pattern Error – Verify Network Has Learned
RMS error is the average error across the 4 patterns. Is the error uniformly distributed across the different patterns, or have some patterns been correctly learned while others have not?? Select Verify network has learned from the Network menu: it presents each input pattern to the network once & observes the resulting output node activation. Compare the output activations with the teacher signal in the .teach file.

154 Used and.data training patterns to verify network performance.
Output window indicates file and.1000.wts as specification of state of network. Used and.data training patterns to verify network performance. Compare activation values to target activations in and.teach file. Has the network solved Boolean AND?

155 Pattern Error – Node Activities
Activation levels indicated by squares. Large white = high activations. Small white = low activations. Grey = inactive node.

156 Individual Pattern Error
Global error can be misleading. If we have a large # of patterns to learn, the global error may be low even if some patterns are not learned correctly -- and these may be the interesting patterns. We may also want to create special test stimuli not presented to the network during training. Does it generalize to novel cases? What has the network learned? Helps discover what generalizations have been created from a finite data set.

157 Pattern Error: Present each Input Pattern Just Once
Select Verify network has learned from the Network menu. This presents each input pattern to the network just once -- e.g., for the AND function, it should do 4 sweeps (1 per training input). Observe the resulting output node activations. Compare the output activations with the teacher signal in the .teach file.

158 Used and.data training patterns to verify network performance.
AND Network Output window indicates file and.1000.wts as specification of state of network. Used and.data training patterns to verify network performance. Compare activation values to target activations in and.teach file. Has the network solved Boolean AND?

159 Calculate Actual RMS Error Value & Compare it to Value Plotted (Boolean AND)
Output   Round Off   Target   Squared Error
0.099       0          0         .0098
0.294       0          0         .0864
0.301       0          0         .0906
0.620       1          1         .1444
(one row per AND input pattern)
RMS Error = Sqrt(.3312 / 4) = .2877 4/6/2017 Neural Networks
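A quick check of the arithmetic in the table (outputs and targets taken directly from it):

```python
import math

outputs = [0.099, 0.294, 0.301, 0.620]   # output activations for the four AND patterns
targets = [0, 0, 0, 1]

squared_errors = [round((t - o) ** 2, 4) for o, t in zip(outputs, targets)]
print(squared_errors)                                  # [0.0098, 0.0864, 0.0906, 0.1444]
print(round(sum(squared_errors), 4))                   # 0.3312
print(round(math.sqrt(sum(squared_errors) / 4), 4))    # 0.2877
```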

160 Pattern Error – Node Activities
Activation levels are indicated by squares: a large white square = high activation, a small white square = low activation, grey = an inactive node. 4/6/2017 Neural Networks

161 Examine Weights Input activations are transmitted to other nodes along modifiable connections. The performance of the network is determined by the strength of the connections (weight values). Display menu, Connection Weights (Hinton diagram): white = positive, black = negative; size reflects the absolute size of the connection. Columns: bias node / first input / second input. 4/6/2017 Neural Networks

162 Rectangles in 2nd column code connections from 1st input unit.
All rectangles in the first column code the values of connections from the bias node; rectangles in the 2nd column code connections from the 1st input unit. Moving across columns corresponds to higher-numbered sending nodes (from the .cf file). Rows in each column identify the destination nodes of the connections; higher-numbered rows indicate higher-numbered destination nodes. Only one node in this example (the output node) receives incoming connections. 4/6/2017 Neural Networks

163 Hinton diagram provides clues how network solves Boolean AND.
The bias has a strong negative connection to the output node, while the 2 input nodes have moderately sized positive connections to it. One active input node by itself can't provide enough activation to overcome the strong negative bias; two active input nodes together can. The output node only turns on if both input nodes are active! 4/6/2017 Neural Networks

164 Role of Start State The network solved Boolean AND starting with a particular set of random weights & biases. Use a different random seed (Training options) to wipe out learning that has occurred … Training can be resumed beyond the specified number of sweeps using the Resume training option. Start states can have a dramatic impact on the way the network attempts to solve a problem & on the final solution. Training networks with different random seeds is like running different subjects in an experiment. 4/6/2017 Neural Networks

165 Role of Learning Rate The learning rate determines the proportion of the error signal that is used to change the weights in the network: large learning rates lead to big weight changes, small learning rates to small ones. To examine the effect of the learning rate on performance, run the simulation so that the learning rate is the only factor changed (start with the same random weights & biases). Modelers often use a small learning rate to avoid large weight changes, which can be disruptive (learning is undone) and counter-productive when the network is close to a solution! 4/6/2017 Neural Networks
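As an illustration of how the learning rate scales weight changes, here is a minimal sketch of the delta-rule update for a single sigmoid output unit (with no hidden units, backpropagation reduces to this form; the starting weights and numbers below are illustrative, not tlearn output):

```python
import math

def logistic(net):
    return 1.0 / (1.0 + math.exp(-net))

def delta_rule_step(weights, bias, inputs, target, lr):
    """One weight update: change = learning_rate * error_signal * input."""
    net = bias + sum(w * x for w, x in zip(weights, inputs))
    out = logistic(net)
    delta = (target - out) * out * (1.0 - out)        # error scaled by the sigmoid slope
    new_weights = [w + lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias + lr * delta                      # the bias node's input is always 1
    return new_weights, new_bias

# Same starting weights and same pattern, two learning rates:
# the larger rate produces a proportionally larger weight change.
for lr in (0.1, 1.0):
    w, b = delta_rule_step([0.5, 0.5], 0.0, [1, 1], target=1, lr=lr)
    print(lr, [round(x, 3) for x in w], round(b, 3))
```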

166 Steps To Building Neural Network in tlearn
Network menu – New Project option; the New Project dialogue box appears. Select the directory or folder in which to save your project files (use the N: drive!). You get 3 windows on screen, each used for entering information relevant to a different aspect of the network (.teach, .data, & .cf). Check the architecture. Specify the training option parameters that determine the initial start state of the network, the learning rate, & the momentum. Train the network (from the Network menu). Determine whether the network has learned the task by checking error rates, examining responses to individual patterns, etc. 4/6/2017 Neural Networks
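For the AND problem, the three project files might look roughly as follows. This is a sketch reconstructed from memory of the file formats described in the tlearn manual; treat every keyword and the exact layout as an assumption and check it against the manual before using it.

```
and.cf (network architecture):
    NODES:
    nodes = 1
    inputs = 2
    outputs = 1
    output node is 1
    CONNECTIONS:
    groups = 0
    1 from i1-i2
    1 from 0
    SPECIAL:
    selected = 1
    weight_limit = 1.00

and.data (input patterns):        and.teach (targets):
    distributed                       distributed
    4                                 4
    0 0                               0
    0 1                               0
    1 0                               0
    1 1                               1
```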

167 AND Network : Hinton Diagram
[Hinton diagram of the trained AND network; columns, left to right: Bias Node, First Input, Second Input.] 4/6/2017 Neural Networks

168 White = positive weight. Black = negative weight.
Hinton Diagram. White = positive weight. Black = negative weight. Area of box proportional to absolute value of corresponding weight. 4/6/2017 Neural Networks

169 Logical AND Network Implemented With 2 I & 1 O
The output unit is on (value close to 1.0) when both inputs are on; otherwise it is off. With a large negative weight from the bias unit to the output, the output is off by default. Make the weights from the input nodes to the output large enough that if both inputs are present, the net input is great enough to turn the output on; neither input by itself is large enough to overcome the negative bias. Node 0 is the bias unit, which is always on, so node 1 (the output) has a bias. 4/6/2017 Neural Networks

170 Hinton Diagram Example
4/6/2017 Neural Networks

171 Weights File in tlearn tlearn keeps an up-to-date record of the network's state in a weights file, saved to disk at regular intervals & at the end of training. It lists all connections in the network, grouped according to the receiving node. In the and.cf file only 1 receiving node is specified (output node 1). 4/6/2017 Neural Networks

172 2nd # (1.328) shows connection from 1st input node to output.
The 1st # represents the weight on the connection from the bias node to the output node (-2.204). The 2nd # (1.328) shows the connection from the 1st input node to the output. The 3rd # (1.36) shows the connection from the 2nd input node to the output node. The final number (0.000) shows the connection from the output node to itself – non-existent due to the feedforward nature of the network. 4/6/2017 Neural Networks
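Assuming the standard logistic activation, plugging these weights into the net-input calculation reproduces the output activations seen at the verify step (up to small rounding differences, and assuming the weights file and the verify output come from the same training run):

```python
import math

def logistic(net):
    return 1.0 / (1.0 + math.exp(-net))

bias_to_out   = -2.204   # weight from bias node (always on) to output node
input1_to_out = 1.328    # weight from 1st input node to output node
input2_to_out = 1.360    # weight from 2nd input node to output node

for i1, i2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    net = bias_to_out + input1_to_out * i1 + input2_to_out * i2
    print(i1, i2, round(logistic(net), 3))
# Prints roughly 0.099, 0.301, 0.294, 0.619 for the four patterns.
# Only the (1, 1) pattern overcomes the negative bias -- Boolean AND.
```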

173 Resume Training Network training can be continued via the Resume training option on the Network menu. Extend training by a number of sweeps & adjust the error display to accommodate the extra training sweeps. Does the RMS error decrease significantly? 4/6/2017 Neural Networks

174 Several Different Ways to Analyze Weights & Examine Internal Representations
Hierarchical clustering of hidden unit activations. Principal component analysis & projection pursuit. Activation patterns in conjunction with actual weights. Examine these methods in detail later in semester! 4/6/2017 Neural Networks

175 1 - Hierarchical Clustering of Hidden Unit Activations
Present test patterns to the network after training. The patterns produce activations on the hidden units, which we record & tag as vectors in a multi-dimensional space. Clustering looks at the similarity structure of that space: inputs treated as similar by the network produce internal representations that are similar. The result is a tree representation of inter-pattern distances. We can't examine the space directly -- it is difficult to visualize high-dimensional spaces. 4/6/2017 Neural Networks
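A minimal sketch of this procedure using SciPy; the hidden-unit activation vectors and pattern labels here are made up purely for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Hypothetical hidden-unit activation vectors recorded for four test patterns
# (rows = patterns, columns = hidden units).
activations = np.array([
    [0.05, 0.10, 0.92],
    [0.08, 0.14, 0.88],
    [0.91, 0.85, 0.07],
    [0.89, 0.90, 0.11],
])
labels = ["pat1", "pat2", "pat3", "pat4"]

# Build the cluster tree from inter-pattern distances and print the merge order.
tree = linkage(activations, method="average")
print(tree)
# dendrogram(tree, labels=labels)  # draws the tree if matplotlib is available
```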

176 2 - Principal Component Analysis & Projection Pursuit
Used to identify interesting lower-dimensional slices of the high-dimensional hidden unit space. Move the viewing perspective around in this space. 4/6/2017 Neural Networks
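A minimal PCA sketch using NumPy's SVD, reusing the same kind of made-up activation matrix as above (illustrative values only):

```python
import numpy as np

# Hypothetical hidden-unit activations (rows = patterns, columns = hidden units).
activations = np.array([
    [0.05, 0.10, 0.92],
    [0.08, 0.14, 0.88],
    [0.91, 0.85, 0.07],
    [0.89, 0.90, 0.11],
])

centered = activations - activations.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# Project each pattern onto the first two principal components:
# a 2-D "slice" of the hidden-unit space that can be plotted and rotated.
projected = centered @ Vt[:2].T
print(projected)
```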

177 3 - Activation Patterns In Conjunction With Actual Weights
When we look at activation patterns, we only see part of what the network "knows." The network manipulates & transforms information via the connections between nodes, so we must examine the connections & weights to see how those transformations are being carried out. Hinton diagrams can be used: weights are shown as colored squares, with the color & size of each square representing the sign & magnitude of the connection. 4/6/2017 Neural Networks

178 Has Network Solved AND Problem? RMS error = 0.35. Solved?
It depends on how we define an acceptable level of error. We can't always rely on the global error alone: the network may have a low RMS error but still not have solved all input patterns correctly. Exercise 3.3: How many times has the network seen each input pattern after 1000 sweeps through the training set? How small must the RMS error be before we can say the network has solved the problem? Exercise 3.4: Compare the exact value of the RMS error to the plotted value. 4/6/2017 Neural Networks

179 What Do We Learn From a Simulation?
Are the simulations framed in such a way that they clearly address some issue? Are the task & stimuli appropriate for the points being made? Do you feel you have learned something from the simulation? 4/6/2017 Neural Networks

180 Logical OR What type of network architecture?
Input and output activations (output node 3) for the three Boolean functions:
Node 0   Node 1   |   AND   OR   XOR
  0        0      |    0     0    0
  0        1      |    0     1    1
  1        0      |    0     1    1
  1        1      |    1     1    0
What type of network architecture? 2 inputs, 1 output + a bias node. Try the OR network (pg ). 4/6/2017 Neural Networks
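By analogy with the AND solution, OR can be computed with a weaker negative bias so that a single active input is already enough to turn the output on. The weights below are illustrative values chosen by hand, not weights produced by tlearn:

```python
import math

def logistic(net):
    return 1.0 / (1.0 + math.exp(-net))

bias, w1, w2 = -2.0, 4.0, 4.0   # hand-picked: each input alone can beat the bias

for i1, i2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(i1, i2, round(logistic(bias + w1 * i1 + w2 * i2), 3))
# 0 0 -> 0.119, 0 1 -> 0.881, 1 0 -> 0.881, 1 1 -> 0.998  (Boolean OR)
```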

181 4/6/2017 Neural Networks

182 Exclusive OR Create a third project called xor and try the exclusive OR function with an input layer and an output layer. 4/6/2017 Neural Networks

183 Neural Network Simulation Software : tlearn, Membrain
Simulations allow examination of how the model solved the problem. The simulator needs to be told: the network architecture, the training data, and the learning rate & other parameters. The simulator then creates the network, performs the training, and reports the results, which you can examine. 4/6/2017 Neural Networks

184 Tlearn Software Copy win_tlearn.exe from disk or R: drive to N: drive.
Double-click on file to begin installation. Executable is called tlearn. To download Adobe Acrobat PDF version: ftp://ftp.crl.ucsd.edu/pub/neuralnets/tlearn/TlearnManual.pdf 4/6/2017 Neural Networks

