1
Artificial Intelligence
Neural Networks
2
What is Learning? The word "learning" has many different meanings. It is used, at least, to describe:
- memorizing something
- learning facts through observation and exploration
- development of motor and/or cognitive skills through practice
- organization of new knowledge into general, effective representations
3
Learning: the study of processes that lead to self-improvement of machine performance. It implies the ability to use knowledge to create new knowledge, or to integrate new facts into an existing knowledge structure. Learning typically requires repetition and practice to reduce the difference between observed and desired performance.
4
What is Learning? Herbert Simon: “Learning is any process by which a system improves performance from experience.”
5
Learning Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience.
6
Learning & Adaptation
- "Modification of a behavioral tendency by experience." (Webster 1984)
- "A learning machine, broadly defined, is any device whose actions are influenced by past experiences." (Nilsson 1965)
- "Any change in a system that allows it to perform better the second time on repetition of the same task or on another task drawn from the same population." (Simon 1983)
7
Negative Features of Human Learning
- It's slow (5-6 years for motor skills, more for abstract reasoning)
- Inefficient
- Expensive
- There is no copy process
- Learning strategy is often a function of the knowledge available to the learner
8
Applications of ML
- Learning to recognize spoken words
- Learning to drive an autonomous vehicle
- Learning to classify objects
- Learning to play world-class backgammon
- Designing the morphology and control structure of electro-mechanical artefacts
9
Motivating Problems Handwritten Character Recognition
10
Motivating Problems Fingerprint Recognition (e.g., border control)
11
Motivating Problems Face Recognition (security access to buildings etc)
12
Different kinds of learning…
- Supervised learning: someone gives us examples and the right answer for those examples; we have to predict the right answer for unseen examples.
- Unsupervised learning: we see examples but get no feedback; we need to find patterns in the data.
- Reinforcement learning: we take actions and get rewards; we have to learn how to get high rewards.
13
Reinforcement learning
Another learning problem, familiar to most of us, is learning motor skills, like riding a bike. We call this reinforcement learning. It's different from supervised learning because no one explicitly tells you the right thing to do; you just have to try things and see what makes you fall over and what keeps you upright.
14
Learning with a Teacher
Supervised learning: knowledge is represented by a set of input-output examples (xi, yi); minimize the error between the actual response of the learner and the desired response.
[Diagram: the Environment supplies a state x to both a Teacher (desired response) and the Learning system (actual response); their difference is fed back as an error signal.]
15
Unsupervised Learning
Self-organized learning: no teacher, and a task-independent quality measure; identify regularities in the data and discover classes automatically.
[Diagram: the Environment supplies a state directly to the Learning system; there is no teacher or error signal.]
16
The red and the black Imagine that we were given all these points, and we needed to guess a function of their x, y coordinates that would have one output for the red ones and a different output for the black ones.
17
What’s the right hypothesis?
In this case, it seems like we could do pretty well by defining a line that separates the two classes.
18
Now, what’s the right hypothesis
Now, what if we have a slightly different configuration of points? We can't divide them conveniently with a line.
19
Now, what’s the right hypothesis
But this parabola-like curve seems like it might be a reasonable separator.
20
Design a Learning System
Step 0: Let's treat the learning system as a black box.
[Diagram: the learning system as a black box producing an output Z.]
21
Design a Learning System
Step 1: Collect training examples (experience). Without examples, our system will not learn (so-called learning from examples).
[Figure: sample handwritten digits 2, 3, 6, 7, 8, 9.]
22
Design a Learning System
Step 2: Representing Experience
What should the representation be like? There are many possibilities. Assuming our system is to recognise 10 digits only, the target D can be a 10-d binary vector, each component corresponding to one of the digits: D = (d0, d1, d2, d3, d4, d5, d6, d7, d8, d9). For example:
X = (1,1,0,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,0, ..., 1), a 64-d vector, with D = (0,0,0,0,0,1,0,0,0,0)
X = (1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0, ..., 1), a 64-d vector, with D = (0,0,0,0,0,0,0,0,1,0)
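As a concrete sketch (in Python; the particular pixel values are illustrative, not taken from the slide), an input vector X and its one-hot target D might be built like this:

```python
# Sketch: representing one training example for 10-digit recognition.
# X is a 64-d binary vector (an 8x8 black/white image, flattened);
# D is a 10-d one-hot vector marking which digit the image shows.

def one_hot(digit, n_classes=10):
    """Return a one-hot target vector D for the given digit."""
    d = [0] * n_classes
    d[digit] = 1
    return d

# Hypothetical 8x8 image, flattened row by row (values are illustrative):
X = [1, 1, 0, 1] + [0] * 60   # 64-d input vector
D = one_hot(5)                # target vector for the digit "5"
```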
23
Example of supervised learning: classification
We lend money to people, and we have to predict whether they will pay us back or not. People have various (say, binary) features: do we know their Address? do they have a Criminal record? high Income? Educated? Old? Unemployed?
We see examples (Y = paid back, N = not):
+a, -c, +i, +e, +o, +u: Y
-a, +c, -i, +e, -o, -u: N
+a, -c, +i, -e, -o, -u: Y
-a, -c, +i, +e, -o, -u: Y
-a, +c, +i, -e, -o, -u: N
-a, -c, +i, -e, -o, +u: Y
+a, -c, -i, -e, +o, -u: N
+a, +c, +i, -e, +o, -u: N
Next person is +a, -c, +i, -e, +o, -u. Will we get paid back?
24
Learning by Examples
Concept: "days on which my friend Aldo enjoys his favourite water sports"
Task: predict the value of "Enjoy Sport" for an arbitrary day based on the values of the other attributes.
Attribute values: Sky (Sunny, Rainy), Temp (Warm, Cold), Humid (Normal, High), Wind (Strong), Water (Cool), Forecast (Same, Change), Enjoy Sport (Yes, No)
25
Decision trees
high Income?
  yes -> Criminal record?
           yes -> NO
           no  -> YES
  no  -> NO
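One plausible reading of the tree above, hand-coded in Python (the function name and the boolean encoding are ours, for illustration only):

```python
# Hand-coded version of the sketched decision tree:
# lend only to high-income applicants with no criminal record.

def will_repay(high_income: bool, criminal_record: bool) -> str:
    if not high_income:
        return "N"          # low income -> predict not paid back
    if criminal_record:
        return "N"          # high income but criminal record -> N
    return "Y"              # high income, clean record -> Y
```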
26
Constructing a decision tree, one step at a time
Split first on address?
  address = yes: +a, -c, +i, +e, +o, +u: Y / +a, -c, +i, -e, -o, -u: Y / +a, -c, -i, -e, +o, -u: N / +a, +c, +i, -e, +o, -u: N
  address = no: -a, +c, -i, +e, -o, -u: N / -a, -c, +i, +e, -o, -u: Y / -a, +c, +i, -e, -o, -u: N / -a, -c, +i, -e, -o, +u: Y
Next split the no-address branch on criminal?
  criminal = yes: -a, +c, -i, +e, -o, -u: N / -a, +c, +i, -e, -o, -u: N  (all N)
  criminal = no: -a, -c, +i, +e, -o, -u: Y / -a, -c, +i, -e, -o, +u: Y  (all Y)
Then split the with-address branch on criminal? and income?
  +a, -c, +i, +e, +o, +u: Y / +a, -c, +i, -e, -o, -u: Y / +a, -c, -i, -e, +o, -u: N / +a, +c, +i, -e, +o, -u: N
Address was maybe not the best attribute to start with...
27
Different approach: nearest neighbor(s)
Next person is -a, +c, -i, +e, -o, +u. Will we get paid back?
Nearest neighbor: simply look at the most similar example in the training data and see what happened there.
+a, -c, +i, +e, +o, +u: Y (distance 4)
-a, +c, -i, +e, -o, -u: N (distance 1)
+a, -c, +i, -e, -o, -u: Y (distance 5)
-a, -c, +i, +e, -o, -u: Y (distance 3)
-a, +c, +i, -e, -o, -u: N (distance 3)
-a, -c, +i, -e, -o, +u: Y (distance 3)
+a, -c, -i, -e, +o, -u: N (distance 5)
+a, +c, +i, -e, +o, -u: N (distance 5)
The nearest neighbor is the second example, so predict N.
k nearest neighbors: look at the k nearest neighbors and take a vote. E.g., the 5 nearest neighbors have 3 Ys and 2 Ns, so predict Y.
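The procedure above can be sketched in Python over the loan examples; +/- features are encoded as +1/-1, and `knn_predict` is our own illustrative name:

```python
# Nearest-neighbour classification over the loan examples.
# Each example: a tuple of six features (a, c, i, e, o, u) and a label.

TRAIN = [
    ((+1, -1, +1, +1, +1, +1), "Y"),
    ((-1, +1, -1, +1, -1, -1), "N"),
    ((+1, -1, +1, -1, -1, -1), "Y"),
    ((-1, -1, +1, +1, -1, -1), "Y"),
    ((-1, +1, +1, -1, -1, -1), "N"),
    ((-1, -1, +1, -1, -1, +1), "Y"),
    ((+1, -1, -1, -1, +1, -1), "N"),
    ((+1, +1, +1, -1, +1, -1), "N"),
]

def hamming(x, y):
    """Number of features on which two examples disagree."""
    return sum(a != b for a, b in zip(x, y))

def knn_predict(query, k=1):
    """Vote among the k training examples closest to `query`."""
    ranked = sorted(TRAIN, key=lambda ex: hamming(query, ex[0]))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)

query = (-1, +1, -1, +1, -1, +1)   # the query from the slide
```

With k=1 the nearest example (distance 1) is an N, so the prediction is N; with k=5 the vote is 3 Ys to 2 Ns, so the prediction flips to Y, matching the slide.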
28
Neural Networks They can represent complicated hypotheses in high-dimensional continuous spaces. They are attractive as a computational model because they are composed of many small computing units. They were motivated by the structure of neural systems in parts of the brain. Now it is understood that they are not an exact model of neural function, but they have proved to be useful from a purely practical perspective.
29
If…then rules
If tear production rate = reduced then recommendation = none.
If age = young and astigmatic = no then recommendation = soft.
30
Approaches to Machine Learning
- Numerical approaches: build a numeric model with parameters based on successes.
- Structural approaches: concerned with the process of defining relationships by creating links between concepts.
31
Learning methods
- Decision rules: If income < $ then reject
- Bayesian network: P(good | income, credit history, ...)
- Neural Network
- Nearest Neighbor: take the same decision as for the customer in the database that is most similar to the applicant
32
Classification: assign an object/event to one of a given finite set of categories.
- Medical diagnosis
- Credit card applications or transactions
- Fraud detection in e-commerce
- Worm detection in network packets
- Spam filtering
- Recommended articles in a newspaper
- Recommended books, movies, music, or jokes
- Financial investments
- DNA sequences
- Spoken words
- Handwritten letters
- Astronomical images
33
Problem Solving / Planning / Control
Performing actions in an environment in order to achieve a goal:
- Solving calculus problems
- Playing checkers, chess, or backgammon
- Balancing a pole
- Driving a car or a jeep
- Flying a plane, helicopter, or rocket
- Controlling an elevator
- Controlling a character in a video game
- Controlling a mobile robot
34
Another Example: Handwriting Recognition
- Positive examples: images of the letter S
- Negative examples: images of the letter Z
- Background concepts: pixel information
- Categorisations: (Matrix, Letter) pairs, both positive and negative
- Task: correctly categorise an unseen example into 1 of 26 categories
35
History Roots of work on NN are in:
- Neurobiological studies (more than one century ago): How do nerves behave when stimulated by different magnitudes of electric current? Is there a minimal threshold needed for nerves to be activated? Given that no single nerve cell is long enough, how do different nerve cells communicate with each other?
- Psychological studies: How do animals learn, forget, recognize and perform other types of tasks? Psycho-physical experiments helped to understand how individual neurons and groups of neurons work.
- McCulloch and Pitts introduced the first mathematical model of a single neuron, widely applied in subsequent work.
36
History Widrow and Hoff (1960): Adaline
- Minsky and Papert (1969): limitations of single-layer perceptrons (and they erroneously claimed that the limitations hold for multi-layer perceptrons)
- Stagnation in the 70's: individual researchers continued laying foundations; von der Malsburg (1973): competitive learning and self-organization
- Big neural-nets boom in the 80's: Grossberg: adaptive resonance theory (ART); Hopfield: Hopfield network; Kohonen: self-organising map (SOM)
37
Applications
- Classification: image recognition, speech recognition, diagnostics, fraud detection, ...
- Regression: forecasting (prediction on the basis of past history)
- Pattern association: retrieve an image from a corrupted one
- Clustering: client profiles, disease subtypes
38
Real Neurons
Cell structures: cell body, dendrites, axon, synaptic terminals.
39
Non-Symbolic Representations
- Decision trees can be easily read: a disjunction of conjunctions (logic). We call this a symbolic representation.
- Non-symbolic representations are more numerical in nature and more difficult to read.
- Artificial Neural Networks (ANNs) are a non-symbolic representation scheme: they embed a giant mathematical function that takes inputs and computes an output which is interpreted as a categorisation.
- Often shortened to "Neural Networks"; don't confuse them with real neural networks (in heads).
40
Complicated Example: Categorising Vehicles
Input to function: pixel data from vehicle images. Output: 1 for a car, 2 for a bus, 3 for a tank.
[Figure: four example input images with their corresponding outputs.]
41
Real Neural Learning Synapses change size and strength with experience. Hebbian learning: When two connected neurons are firing at the same time, the strength of the synapse between them increases. “Neurons that fire together, wire together.”
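A minimal sketch of a Hebbian weight update (the learning rate `eta` is an assumed parameter, not something the slide specifies): when pre- and post-synaptic activity agree, the connecting weight grows.

```python
# Hebbian update sketch: delta_w = eta * pre * post.
# "Neurons that fire together, wire together."

def hebbian_update(w, pre, post, eta=0.1):
    """Return the weight after one Hebbian step."""
    return w + eta * pre * post
```

If either neuron is silent (activity 0), the weight is unchanged; if both fire, it is strengthened.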
42
Neural Network
[Diagram: input layer -> hidden layer 1 -> hidden layer 2 -> output layer.]
43
Simple Neuron
[Diagram: inputs X1, X2, ..., Xn with weights W1, W2, ..., Wn feeding a unit f that produces the output.]
44
Neuron Model
- A neuron has one or more inputs x1, x2, .., xm
- Each input is associated with a weight w1, w2, .., wm
- The neuron has a bias b
- The net input of the neuron is n = w1 x1 + w2 x2 + ... + wm xm + b
45
Neuron output
The neuron output is y = f(n), where f is called the transfer function.
46
Transfer Function
There are 3 common transfer functions:
- Hard limit transfer function
- Linear transfer function
- Sigmoid transfer function
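The three transfer functions, together with the neuron model from the previous slides, can be sketched as follows (the example inputs used at the end are illustrative):

```python
import math

# The three common transfer functions, applied to the net input
# n = w1*x1 + ... + wm*xm + b of a single neuron.

def hardlim(n):
    """Hard limit: 1 if n >= 0, else 0."""
    return 1 if n >= 0 else 0

def linear(n):
    """Linear: output equals the net input."""
    return n

def sigmoid(n):
    """Sigmoid: squashes n into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

def neuron(inputs, weights, bias, f):
    """Single neuron: weighted sum plus bias, passed through f."""
    n = sum(w * x for w, x in zip(weights, inputs)) + bias
    return f(n)

# Illustrative two-input neuron: n = 0.4*1.0 + (-0.2)*0.5 + 0.1 = 0.4
out = neuron([1.0, 0.5], [0.4, -0.2], 0.1, hardlim)
```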
47
Exercises
The input to a single-input neuron is 2.0, its weight is 2.3 and the bias is -3. What is the output of the neuron if its transfer function is: hard limit? linear? sigmoid?
48
Architecture of ANN
- Feed-forward networks: signals travel one way, from input to output.
- Feed-back networks: signals travel in loops; the output is connected back to the input of the network.
49
Learning Rule
The learning rule modifies the weights of the connections. The learning process is divided into supervised and unsupervised learning.
50
Perceptron
A perceptron is a network of one neuron with a hard limit transfer function.
[Diagram: inputs X1, X2, ..., Xn with weights W1, W2, ..., Wn feeding f, which produces the output.]
51
Perceptron
- The perceptron is first given a vector of random weights.
- The perceptron is then given chosen data pairs (input and desired output).
- The perceptron learning rule changes the weights according to the error in the output.
52
Perceptron
The weight-adapting procedure is an iterative method and should reduce the error to zero. The output of the perceptron is
y = f(n) = f(w1 x1 + w2 x2 + ... + wn xn) = f(Σ wi xi) = f(W^T X)
53
Perceptron Learning Rule
W_new = W_old + (t - a) X
where W_new is the new weight, W_old is the old value of the weight, X is the input value, t is the desired value of the output, and a is the actual value of the output.
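The rule above can be turned into a small training loop. In this sketch the bias is folded in as a weight on a constant input of 1, and the AND data set is our own illustrative example of a linearly separable problem:

```python
# Perceptron training with the rule W_new = W_old + (t - a) * X.

def hardlim(n):
    return 1 if n >= 0 else 0

def predict(weights, x):
    """x includes a leading 1, whose weight acts as the bias."""
    return hardlim(sum(w * xi for w, xi in zip(weights, x)))

def train_perceptron(data, weights, epochs=10):
    """data: list of (input vector with leading 1, target in {0, 1})."""
    for _ in range(epochs):
        for x, t in data:
            a = predict(weights, x)                       # actual output
            weights = [w + (t - a) * xi                   # learning rule
                       for w, xi in zip(weights, x)]
    return weights

# Learning logical AND (linearly separable, so the rule converges):
AND = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]
w = train_perceptron(AND, [0, 0, 0])
```

After training, the learned weights classify all four AND cases correctly.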
54
Example
Consider a perceptron that has two real-valued inputs and an output unit. All the initial weights and the bias are equal. Assume the teacher has said that the output should be 0 for the input x1 = 5 and x2 = -3. Find the optimum weights for this problem.
55
Example
Convert the classification problem into a perceptron neural network model (start with w1=1, b=3 and w2=2, or any other values).
x1 = [0 2], t1=1; x2 = [1 0], t2=1; x3 = [0 -2], t3=0; x4 = [2 0], t4=0
56
Example
Perceptron example calculation with inputs x1=-1, x2=1, x3=1, x4=-1 and weights 0.25 each:
S = 0.25*(-1) + 0.25*(1) + 0.25*(1) + 0.25*(-1) = 0
0 > -0.1, so the output from the ANN is +1, and the image is categorised as "bright".
57
The First Neural Networks
AND function
[Diagram: X1 and X2 each connect to Y with weight 1; Threshold(Y) = 2.]
58
Simple Networks
[Diagram: inputs x and y with weights W = 1.5 and W = 1, a constant input of -1, and threshold t = 0.0.]
59
Exercises
Design a neural network to recognize the problem:
x1=[2 2], t1=0; x2=[1 -2], t2=1; x3=[-2 2], t3=0; x4=[-1 1], t4=1
Start with initial weights w=[0 0] and bias = 0.
60
Problems
Four one-dimensional data points belonging to two classes are:
X = [ ]; T = [ ]; W = [ ]
61
Example
[Diagram: decision regions labeled -1 and +1.]
62
Example
[Diagram: decision regions labeled -1 and +1.]
63
AND Network
This example means we construct a network for the AND operation. The network draws a line to separate the two classes; this is called classification.
64
Perceptron Geometric View
The equation below describes a (hyper-)plane in the input space, consisting of real-valued m-dimensional vectors. The plane splits the input space into two regions, each of them describing one class.
Decision boundary: w1 x1 + w2 x2 + w0 = 0
Decision region for C1: w1 x1 + w2 x2 + w0 >= 0 (the region on the other side of the boundary is C2).
65
Perceptron: Limitations
The perceptron can only model linearly separable classes, like (those described by) the following Boolean functions: AND, OR, COMPLEMENT. It cannot model XOR. You can experiment with these functions in the Matlab practical lessons.
66
Multi-layer Networks
Consider a network of 3 layers: an input layer, hidden layers, and an output layer. Each layer can have a different number of neurons.
67
Multi-layer feed-forward NN
FFNNs overcome the limitation of single-layer NNs: they can handle non-linearly separable learning tasks.
[Diagram: input layer -> hidden layer -> output layer.]
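As an illustration of why the extra layer helps, here is a hand-wired two-input network that computes XOR, which no single perceptron can. The weights below are hand-picked for illustration, not learned:

```python
# A one-hidden-layer feed-forward net computing XOR with step units.

def step(n):
    return 1 if n >= 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)      # hidden unit 1: x1 OR x2
    h2 = step(x1 + x2 - 1.5)      # hidden unit 2: x1 AND x2
    return step(h1 - h2 - 0.5)    # output: OR but not AND = XOR
```

Each hidden unit draws one line in the input plane; the output unit combines the two half-planes into the non-convex XOR region.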
68
Types of decision regions
[Diagram: a network with a single node separates the plane along the line w1 x1 + w2 x2 + w0 = 0; a one-hidden-layer network with lines L1, L2, L3, L4 realizes a convex region.]
69
Learning rule
The perceptron learning rule cannot be applied to a multi-layer network; we use the backpropagation algorithm in the learning process.
70
Backprop
The back-propagation training algorithm adjusts the weights of the NN in order to minimize the network's total mean squared error.
Forward step: network activation, then error computation. Backward step: error propagation.
71
Bp Algorithm
The weight change rule is Δw = η × error × f'(n) × input
where η, the learning factor, is < 1; the error is the difference between the actual and desired value; and f' is the derivative of the sigmoid function, f' = f(1 - f).
72
Delta Rule
- Each observation contributes a variable amount to the output.
- The scale of the contribution depends on the input.
- Output errors can be blamed on the weights.
- A least mean square (LMS) error function can be defined (ideally it should be zero): E = ½ (t - y)²
73
Calculation of Network Error
We could calculate network error as the proportion of mis-categorised examples, but there are multiple output units with numerical output, so we use a more sophisticated measure (not as complicated as it looks): square the difference between target and observed output (squaring ensures we get a positive number), then add up all the squared differences for every output unit and every example in the training set.
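Putting the error measure and the weight-change rule together, here is a minimal gradient-descent sketch for a single sigmoid neuron (the data point and learning rate are illustrative, not from the slides):

```python
import math

# One sigmoid neuron trained by the delta rule, using E = 1/2 * (t - y)^2.

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

def train_step(w, b, x, t, eta=0.5):
    """One update: delta_w = eta * (t - y) * f'(n) * x, with f' = y(1-y)."""
    y = sigmoid(w * x + b)
    delta = (t - y) * y * (1 - y)          # error signal scaled by f'
    return w + eta * delta * x, b + eta * delta

def error(w, b, x, t):
    y = sigmoid(w * x + b)
    return 0.5 * (t - y) ** 2

# Train on a single illustrative example and watch the error shrink.
w, b = 0.0, 0.0
before = error(w, b, x=1.0, t=1.0)
for _ in range(100):
    w, b = train_step(w, b, x=1.0, t=1.0)
after = error(w, b, x=1.0, t=1.0)
```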
74
Example
For a network with one neuron in the input layer and one neuron in the hidden layer, the following values are given: X=1, w1=1, b1=-2, w2=1, b2=1, η=1 and t=1, where X is the input value, w1 is the weight connecting input to hidden, w2 is the weight connecting hidden to output, b1 and b2 are the biases, and t is the training value.
75
Momentum in Backpropagation
For each weight, remember what was added in the previous epoch. In the current epoch, add on a small amount of the previous Δ. The amount is determined by the momentum parameter, denoted α, which is taken to be between 0 and 1.
76
How Momentum Works
If the direction of the weight change doesn't change, then the movement of the search gets bigger: the additional extra compounds in each epoch. This may mean that narrow local minima are avoided, and that the convergence rate speeds up.
Caution: the search may not have enough momentum to get out of local minima; also, too much momentum might carry the search back out of the global minimum and into a local minimum.
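The momentum idea can be sketched as follows; the function and parameter names (`eta`, `alpha`) are ours, not from the slides:

```python
# Momentum update: each step adds a fraction alpha of the previous step,
# so movement compounds while the gradient direction stays consistent.

def momentum_step(w, gradient, prev_delta, eta=0.1, alpha=0.9):
    """Return (new weight, this epoch's step) for one weight."""
    delta = -eta * gradient + alpha * prev_delta
    return w + delta, delta

# With a constant gradient, successive steps grow (the compounding effect):
w, d = 0.0, 0.0
steps = []
for _ in range(3):
    w, d = momentum_step(w, gradient=-1.0, prev_delta=d)
    steps.append(d)
```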
77
Building Neural Networks
- Define the problem in terms of neurons: think in terms of layers.
- Represent information as neurons: operationalize neurons, select their data type, locate data for testing and training.
- Define the network.
- Train the network.
- Test the network.
78
Application: FACE RECOGNITION
The problem: face recognition of persons of a known group in an indoor environment.
The approach: learn face classes over a wide range of poses using a neural network.
79
Navigation of a car
Done by Pomerleau. The network takes inputs from a 34x36 video image and a 7x36 range finder. Output units represent "drive straight", "turn left", or "turn right". After training for about 40 epochs on 1200 road images, the car drove around the CMU campus at 5 km/h (using a small workstation on the car). This was almost twice the speed of any other non-NN algorithm at the time.
80
Automated driving at 70 mph on a public highway
[Diagram: a camera image provides 30x32 pixels as inputs; 30x32 weights feed into each of 4 hidden units, which drive 30 output units for steering.]
81
Exercises
Perform one iteration of backpropagation on a network of two layers. The first layer has one neuron with weight 1 and bias -2; its transfer function is f(n) = n². The second layer has one neuron with weight 1 and bias 1; its transfer function is f(n) = 1/n. The input to the network is x=1 and t=1.
82
[Diagram: a two-input network with weights w11, w12, w13, w21, w22, w23 and biases b1, b2, b3.]
Using the initial weights [b1 = -0.5, w11 = 2, w12 = 2, w13 = 0.5, b2 = 0.5, w21 = 1, w22 = 2, w23 = 0.25, and b3 = 0.5] and the input vector [2, 2.5] with t = 8, process one iteration of the backpropagation algorithm.
83
Consider a transfer function f(n) = n²
Perform one iteration of backpropagation with a = 0.9 for a neural network with two neurons in the input layer and one neuron in the output layer. The input values are X = [1 -1] and t = 8; the weights between the input and hidden layer are w11 = 1, w12 = -2, w21 = 0.2, and w22 = 0.1; the weights between the hidden and output layer are w1 = 2 and w2 = -2; the biases in the input layer are b1 = -1 and b2 = 3.
84
Kakuro
Kakuro is a kind of game puzzle. The object of the puzzle is to insert a digit from 1 to 9 inclusive into each white cell such that the sum of the numbers in each entry matches the clue associated with it, and that no digit is duplicated in any entry. Briefly describe how you'd use Constraint Satisfaction Problem methods to solve Kakuro puzzles intelligently.