Computer Vision
Lecture 14: Object Recognition II
November 21, 2013

Statistical Pattern Recognition
The formal description consists of relevant numerical features, e.g., size, brightness, or curvature. The pattern (or pattern vector, feature vector) is a vector of all chosen features for a given object, e.g. (x_1, x_2, …, x_n). The description of any object corresponds to one point in the pattern space X.
[Diagram: Object → construction of formal description → Pattern → Classifier → Classification]

Classification Principles
A statistical classifier is a device with n inputs and 1 output. The input is a feature vector (x_1, x_2, …, x_n). For an R-class classifier, its output is one of R symbols w_1, w_2, …, w_R, which are the class identifiers. The function d(x) = w_r is the classifier’s decision rule. It divides the feature space X into R disjoint subsets K_r, r = 1, …, R.

Classification Principles
The function d(x) can be defined using R scalar discrimination functions g_1(x), g_2(x), …, g_R(x). For all x ∈ K_r and any s ∈ {1, …, R}, s ≠ r, the discrimination functions must be defined such that g_r(x) ≥ g_s(x). This way, the discrimination hyper-surface between regions K_r and K_s is defined by g_r(x) - g_s(x) = 0.

Classification Principles
The decision rule then becomes d(x) = w_r ⟺ g_r(x) = max_{s=1,…,R} g_s(x). Often, linear classification functions are used: g_r(x) = q_{r0} + q_{r1} x_1 + … + q_{rn} x_n. In that case, the discrimination hyper-surfaces become discrimination hyper-planes.
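
To illustrate this decision rule (my own sketch, not part of the original lecture), here is a minimal Python example of an R-class classifier with linear discrimination functions. The matrix Q and the function name classify are assumptions made for this example.

import numpy as np

def classify(x, Q):
    """Return the index r of the winning class for feature vector x.

    Q is an R x (n+1) matrix; row r holds (q_r0, q_r1, ..., q_rn),
    so g_r(x) = q_r0 + q_r1*x_1 + ... + q_rn*x_n.
    """
    x_aug = np.concatenate(([1.0], x))   # prepend 1 to absorb the constant term q_r0
    g = Q @ x_aug                        # evaluate all R discrimination functions at once
    return int(np.argmax(g))             # decision rule: class with maximal g_r(x)

# Example: three classes in a two-dimensional feature space
Q = np.array([[ 0.0,  1.0,  0.0],
              [ 0.0, -1.0,  1.0],
              [-1.0,  0.0, -1.0]])
print(classify(np.array([2.0, 0.5]), Q))   # -> 0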

Neural Networks
The “building blocks” of neural networks are the neurons. In technical systems, we also refer to them as units or nodes. Basically, each neuron
– receives input from many other neurons,
– changes its internal state (activation) based on the current input,
– sends one output signal to many other neurons, possibly including its input neurons (recurrent network).

How do Neural Networks Work?
Information is transmitted as a series of electric impulses, so-called spikes. The frequency and phase of these spikes encode the information. In biological systems, one neuron can be connected to as many as 10,000 other neurons. Usually, a neuron receives its information from other neurons in a confined area, its so-called receptive field.

How do Neural Networks Learn?
NNs are able to learn by adapting their connectivity patterns so that the organism improves its behavior in terms of reaching certain (evolutionary) goals. The strength of a connection, or whether it is excitatory or inhibitory, depends on the state of a receiving neuron’s synapses. The NN achieves learning by appropriately adapting the states of its synapses.

An Artificial Neuron
[Diagram: an artificial neuron i with input signals x_1, x_2, …, x_n arriving through synapses with weights W_{i,1}, W_{i,2}, …, W_{i,n}, forming the net input, and producing the output x_i.]

The Activation Function
One possible choice is a threshold function: f_i(net_i(t)) = 1 if net_i(t) ≥ θ, and 0 otherwise.
[Graph: f_i(net_i(t)) as a function of net_i(t), a step from 0 to 1 at the threshold θ.]

Linear Separability
Input space in the two-dimensional case (n = 2):
[Plots: three example separations of the (x_1, x_2) plane, for w_1 = 1, w_2 = 2, θ = 2; for w_1 = -2, w_2 = 1, θ = 2; and for w_1 = -2, w_2 = 1, θ = 1.]

Linear Separability
So by varying the weights and the threshold, we can realize any linear separation of the input space into a region that yields output 1, and another region that yields output 0. As we have seen, a two-dimensional input space can be divided by any straight line. A three-dimensional input space can be divided by any two-dimensional plane. In general, an n-dimensional input space can be divided by an (n-1)-dimensional plane or hyperplane. Of course, for n > 3 this is hard to visualize.
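
To make the two-dimensional case concrete (my own sketch, not from the slides), the snippet below evaluates a single threshold unit for the three weight/threshold settings from the previous slide; the function name threshold_unit and the test point are invented for illustration.

import numpy as np

def threshold_unit(x, w, theta):
    """Output 1 if w . x >= theta, else 0 (the threshold activation above)."""
    return 1 if np.dot(w, x) >= theta else 0

# The three settings from the "Linear Separability" plots (n = 2):
settings = [((1.0, 2.0), 2.0), ((-2.0, 1.0), 2.0), ((-2.0, 1.0), 1.0)]
point = np.array([1.0, 1.0])
for w, theta in settings:
    print(w, theta, threshold_unit(point, np.array(w), theta))
# Each setting defines a different dividing line w_1*x_1 + w_2*x_2 = theta.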

The Perceptron
[Diagram: a perceptron (unit i) with input signals x_1, x_2, …, x_n, weights W_1, W_2, …, W_n, threshold θ, and output f(x_1, x_2, …, x_n).]

The Perceptron
[Diagram: the same unit with an additional constant input x_0 ≡ 1 and weight W_0; W_0 corresponds to -θ.]
Here, only the weight vector is adaptable, but not the threshold.

Perceptron Computation
A perceptron divides its n-dimensional input space by an (n-1)-dimensional hyperplane defined by the equation
w_0 + w_1 x_1 + w_2 x_2 + … + w_n x_n = 0.
For w_0 + w_1 x_1 + w_2 x_2 + … + w_n x_n > 0, its output is 1, and for w_0 + w_1 x_1 + w_2 x_2 + … + w_n x_n ≤ 0, its output is -1. With the right weight vector (w_0, …, w_n)^T, a single perceptron can compute any linearly separable function. We are now going to look at an algorithm that determines such a weight vector for a given function.
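
As a small illustration (not from the slides), this computation is a single dot product over the augmented input; the function name perceptron_output and the AND example are my own choices.

import numpy as np

def perceptron_output(w, x):
    """w = (w_0, ..., w_n); x = (x_1, ..., x_n). Returns +1 or -1."""
    net = w[0] + np.dot(w[1:], x)        # w_0 + w_1*x_1 + ... + w_n*x_n
    return 1 if net > 0 else -1          # output 1 if net > 0, else -1

# Example: logical AND over inputs from {0, 1} is linearly separable:
w_and = np.array([-1.5, 1.0, 1.0])
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, perceptron_output(w_and, np.array(x, dtype=float)))
# -> -1, -1, -1, +1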

Perceptron Training Algorithm
Algorithm Perceptron;
  Start with a randomly chosen weight vector w_0;
  Let k = 1;
  while there exist input vectors that are misclassified by w_{k-1} do
    Let i_j be a misclassified input vector;
    Let x_k = class(i_j) · i_j, implying that w_{k-1} · x_k < 0;
    Update the weight vector to w_k = w_{k-1} + η x_k;
    Increment k;
  end-while;
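
The same algorithm as a runnable Python sketch (my addition, under the assumption that each input vector already includes the leading offset component 1, as in the example on the following slides; the max_iter cutoff is an extra safeguard not present in the pseudocode).

import numpy as np

def train_perceptron(inputs, classes, w0, eta=1.0, max_iter=1000):
    """Perceptron training as in the pseudocode above.

    inputs  : list of augmented input vectors i_j = (1, i_1, ..., i_n)
    classes : list of class labels, each +1 or -1
    w0      : initial weight vector (w_0, ..., w_n)
    """
    w = np.array(w0, dtype=float)
    for _ in range(max_iter):
        misclassified = [(i, c) for i, c in zip(inputs, classes)
                         if (1 if np.dot(w, i) > 0 else -1) != c]
        if not misclassified:
            return w                        # perfect classification reached
        i, c = misclassified[0]             # pick a misclassified input vector
        x = c * np.asarray(i, dtype=float)  # x_k = class(i_j) * i_j, so w . x < 0
        w = w + eta * x                     # w_k = w_{k-1} + eta * x_k
    return w                                # possibly not separable; give up after max_iter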

Perceptron Training Algorithm
For example, consider some input i with class(i) = -1. If w · i > 0, then we have a misclassification. Then the weight vector needs to be modified to w + Δw with (w + Δw) · i < w · i to possibly improve classification. We can choose Δw = -ηi, because (w + Δw) · i = (w - ηi) · i = w · i - η (i · i) < w · i, and i · i is the square of the length of vector i and is thus positive. If class(i) = 1, things are the same but with opposite signs; we introduce x to unify these two cases.

Perceptron Learning Example
We would like our perceptron to correctly classify the five 2-dimensional data points below. Let the random initial weight vector be w_0 = (2, 1, -2)^T. Then the dividing line crosses at (0, 1)^T and (-2, 0)^T.
[Plot: the five data points in the (i_1, i_2) plane, labeled class -1 and class 1, together with this dividing line.]
Let us pick the misclassified point (-2, -1)^T for learning:
i = (1, -2, -1)^T (include offset 1)
x_1 = (-1) · (1, -2, -1)^T (i is in class -1)
x_1 = (-1, 2, 1)^T

Perceptron Learning Example
w_1 = w_0 + x_1 (let us set η = 1 for simplicity)
w_1 = (2, 1, -2)^T + (-1, 2, 1)^T = (1, 3, -1)^T
The new dividing line crosses at (0, 1)^T and (-1/3, 0)^T.
[Plot: the data points with the updated dividing line.]
Let us pick the next misclassified point (0, 2)^T for learning:
i = (1, 0, 2)^T (include offset 1)
x_2 = (1, 0, 2)^T (i is in class 1)

Perceptron Learning Example
w_2 = w_1 + x_2
w_2 = (1, 3, -1)^T + (1, 0, 2)^T = (2, 3, 1)^T
Now the line crosses at (0, -2)^T and (-2/3, 0)^T.
[Plot: the data points with the final dividing line.]
With this weight vector, the perceptron achieves perfect classification! The learning process terminates. In most cases, many more iterations are necessary than in this example.
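
As a quick numerical check (my addition, not part of the slides), the two updates of this example can be reproduced in a few lines of Python; only the two points named in the transcript are used, since the remaining three data points appear only in the missing figure.

import numpy as np

w = np.array([2.0, 1.0, -2.0])                 # w_0
x1 = -1 * np.array([1.0, -2.0, -1.0])          # point (-2, -1), class -1, offset included
assert np.dot(w, x1) < 0                       # it was indeed misclassified
w = w + x1                                     # eta = 1  ->  w_1 = (1, 3, -1)
x2 = +1 * np.array([1.0, 0.0, 2.0])            # point (0, 2), class +1, offset included
assert np.dot(w, x2) < 0                       # it was indeed misclassified
w = w + x2                                     # w_2 = (2, 3, 1)
print(w)                                       # -> [2. 3. 1.], matching w_2 above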

Learning Rate and Termination
Terminate when all samples are correctly classified. If the number of misclassified samples has not changed in a large number of steps, the problem could be the choice of the learning rate η:
– If η is too large, classification may just be swinging back and forth and take a long time to reach the solution;
– on the other hand, if η is too small, changes in classification can be extremely slow.
If changing η does not help, the samples may not be linearly separable, and training should terminate. If it is known that there will be a minimum number of misclassifications, train until that number is reached.

Guarantee of Success
Novikoff (1963) proved the following theorem: Given training samples from two linearly separable classes, the perceptron training algorithm terminates after a finite number of steps and correctly classifies all elements of the training set, irrespective of the initial random non-zero weight vector w_0. But are those solutions optimal? One of the reasons why we are interested in neural networks is that they are able to generalize, i.e., give plausible output for new (untrained) inputs. How well does a perceptron deal with new inputs?

Perceptron Learning Results
Perfect classification of training samples, but may not generalize well to new (untrained) samples.

Perceptron Learning Results
This function is likely to perform better classification on new samples.

Perceptron Learning Results
Perceptrons do not “care” about whether they found an optimal solution with regard to new samples. They stop learning once perfect classification of the training samples has been achieved. Therefore, results are often suboptimal for novel samples. This is one of the reasons why perceptrons are rarely used in current applications. However, their learning algorithm is simple and illustrates the general idea underlying neural classification approaches.