1
Introduction To Neural Networks
These slides are largely borrowed from George Papadourakis, Prévotet Jean-Christophe
2
Introduction What are Neural Networks?
Neural networks are a new method of programming computers. They are exceptionally good at performing pattern recognition and other tasks that are very difficult to program using conventional techniques. Programs that employ neural nets are also capable of learning on their own and adapting to changing conditions.
3
Background
Development of neural networks dates back to the early 1940s. The field experienced an upsurge in popularity in the late 1980s, as a result of new techniques and developments and general advances in computer hardware technology. Some NNs are models of biological neural networks and some are not, but historically much of the inspiration for the field came from the desire to produce artificial systems capable of sophisticated, perhaps intelligent, computations similar to those the human brain routinely performs, and thereby possibly to enhance our understanding of the brain. Most NNs have some sort of training rule: NNs learn from examples (as children learn to recognize dogs from examples of dogs) and exhibit some capability for generalization beyond the training data. Neural computing should not be considered a competitor to conventional computing, but rather a complement: the most successful neural solutions have been those which operate in conjunction with existing, traditional techniques.
4
Background (Cont.) An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by the biological nervous systems, such as the human brain’s information processing mechanism. The key element of this paradigm is the novel structure of the information processing system. It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. NNs, like people, learn by example. An NN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in biological systems involves adjustments to the synaptic connections that exist between the neurons. This is true of NNs as well.
5
Biological Neuron
In the human brain, a typical neuron collects signals from others through a host of fine structures called dendrites. The neuron sends out spikes of electrical activity through a long, thin strand known as an axon, which splits into thousands of branches. At the end of each branch, a structure called a synapse converts the activity from the axon into electrical effects that inhibit or excite activity in the connected neurons.
6
How the Human Brain learns
The brain is a collection of about 10 billion interconnected neurons. Each neuron is a cell that uses biochemical reactions to receive, process and transmit information. A neuron's dendritic tree is connected to a thousand neighbouring neurons. When one of those neurons fires, a positive or negative charge is received by one of the dendrites. The strengths of all the received charges are added together through the processes of spatial and temporal summation.
7
Biological inspirations
Some numbers…
The human brain contains about 10 billion nerve cells (neurons)
Each neuron is connected to the others through synapses
Properties of the brain:
It can learn and reorganize itself from experience
It adapts to the environment
It is robust and fault tolerant
8
Biological neuron A neuron has
A branching input (dendrites)
A branching output (the axon)
The information circulates from the dendrites to the axon via the cell body
The axon connects to dendrites via synapses
Synapses vary in strength
Synapses may be excitatory or inhibitory
9
A Neuron Model
When a neuron receives excitatory input that is sufficiently large compared with its inhibitory input, it sends a spike of electrical activity down its axon. Learning occurs by changing the effectiveness of the synapses so that the influence of one neuron on another changes. We construct artificial neural networks by first trying to deduce the essential features of neurons and their interconnections, and then typically program a computer to simulate these features.
10
A Simple Neuron
An artificial neuron is a device with many inputs and one output. The neuron has two modes of operation: the training mode and the using mode.
11
A Simple Neuron (Cont.)
In the training mode, the neuron can be trained to fire (or not) for particular input patterns. In the using mode, when a taught input pattern is detected at the input, its associated output becomes the current output. If the input pattern does not belong to the taught list of input patterns, the firing rule is used to determine whether to fire or not. The firing rule is an important concept in neural networks and accounts for their high flexibility. A firing rule determines how one calculates whether a neuron should fire for any input pattern; it relates to all the input patterns, not only the ones on which the node was previously trained.
12
Neural Network Techniques
Computers have to be explicitly programmed:
Analyze the problem to be solved.
Write the code in a programming language.
Neural networks learn from examples:
No requirement for an explicit description of the problem.
No need for a programmer.
The neural computer adapts itself during a training period, based on examples of similar problems, even without a desired solution to each problem. After sufficient training the neural computer is able to relate the problem data to the solutions (inputs to outputs) and can then offer a viable solution to a brand-new problem.
Able to generalize and to handle incomplete data.
13
NNs vs Computers
Digital Computers:
Deductive reasoning: we apply known rules to input data to produce output.
Computation is centralized, synchronous, and serial.
Memory is packetted, literally stored, and location addressable.
Not fault tolerant: one transistor goes and it no longer works.
Exact.
Static connectivity.
Applicable if well-defined rules with precise input data.
Neural Networks:
Inductive reasoning: given input and output data (training examples), we construct the rules.
Computation is collective, asynchronous, and parallel.
Memory is distributed, internalized, short term and content addressable.
Fault tolerant: redundancy and sharing of responsibilities.
Inexact.
Dynamic connectivity.
Applicable if rules are unknown or complicated, or if data are noisy or partial.
14
Applications: classification
In marketing: consumer spending pattern classification
In defence: radar and sonar image classification
In agriculture & fishing: fruit and catch grading
In medicine: ultrasound and electrocardiogram image classification, EEGs, medical diagnosis
Recognition and identification
In general computing and telecommunications: speech, vision and handwriting recognition
In finance: signature verification and bank note verification
15
Applications (Cont.): assessment
In engineering: product inspection, monitoring and control
In defence: target tracking
In security: motion detection, surveillance image analysis and fingerprint matching
Forecasting and prediction
In finance: foreign exchange rate and stock market forecasting
In agriculture: crop yield forecasting
In marketing: sales forecasting
In meteorology: weather prediction
16
What can you do with an NN and what not?
In principle, NNs can compute any computable function, i.e., they can do everything a normal digital computer can do. Almost any mapping between vector spaces can be approximated to arbitrary precision by feedforward NNs. In practice, NNs are especially useful for classification and function approximation problems usually when rules such as those that might be used in an expert system cannot easily be applied. NNs are, at least today, difficult to apply successfully to problems that concern manipulation of symbols and memory. And there are no methods for training NNs that can magically create information that is not contained in the training data.
17
Who is concerned with NNs?
Computer scientists want to find out about the properties of non-symbolic information processing with neural nets and about learning systems in general.
Statisticians use neural nets as flexible, nonlinear regression and classification models.
Engineers of many kinds exploit the capabilities of neural networks in many areas, such as signal processing and automatic control.
Cognitive scientists view neural networks as a possible apparatus to describe models of thinking and consciousness (high-level brain function).
Neuro-physiologists use neural networks to describe and explore medium-level brain function (e.g. memory, sensory system, motorics).
Physicists use neural networks to model phenomena in statistical mechanics and for many other tasks.
Biologists use neural networks to interpret nucleotide sequences.
Philosophers and some other people may also be interested in neural networks for various reasons.
18
Pattern Recognition An important application of neural networks is pattern recognition. Pattern recognition can be implemented by using a feed-forward neural network that has been trained accordingly. During training, the network is trained to associate outputs with input patterns. When the network is used, it identifies the input pattern and tries to output the associated output pattern. The power of neural networks comes to life when a pattern that has no output associated with it, is given as an input. In this case, the network gives the output that corresponds to a taught input pattern that is least different from the given pattern.
19
Pattern Recognition (cont.)
Suppose a network is trained to recognize the patterns T and H. The associated patterns are all black and all white respectively as shown above.
20
Pattern Recognition (cont.)
Since the input pattern looks more like a ‘T’, when the network classifies it, it sees the input closely resembling ‘T’ and outputs the pattern that represents a ‘T’.
21
Pattern Recognition (cont.)
The input pattern here closely resembles ‘H’ with a slight difference. The network in this case classifies it as an ‘H’ and outputs the pattern representing an ‘H’.
22
Pattern Recognition (cont.)
Here the top row is 2 errors away from a 'T' and 3 errors away from an 'H', so the top output is black. The middle row is 1 error away from both 'T' and 'H', so its output is random. The bottom row is 1 error away from 'T' and 2 away from 'H', so its output is black. Since the input resembles a 'T' more than an 'H', the output of the network is in favor of a 'T'.
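This nearest-taught-pattern behaviour can be sketched in a few lines of Python. The 3×3 patterns below are illustrative stand-ins, not the slide's actual images; the classifier simply counts mismatched cells (Hamming distance) against each taught pattern and outputs the closer one.

```python
# Nearest stored pattern by Hamming distance (illustrative patterns, not the slide's actual images).
T = [1, 1, 1,
     0, 1, 0,
     0, 1, 0]          # a 3x3 "T" (1 = black, 0 = white)
H = [1, 0, 1,
     1, 1, 1,
     1, 0, 1]          # a 3x3 "H"

def hamming(a, b):
    """Number of cells in which two patterns differ."""
    return sum(x != y for x, y in zip(a, b))

def classify(pattern):
    """Return the taught pattern that is least different from the input."""
    return "T" if hamming(pattern, T) <= hamming(pattern, H) else "H"

noisy_t = [1, 1, 1,
           0, 1, 0,
           0, 1, 1]     # one cell flipped
print(classify(noisy_t))  # -> "T", the closest taught pattern
```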
23
Perceptron Learning Algorithm
First neural network learning model, developed in the late 1950s and 1960s
Simple and limited (single-layer models)
Basic concepts are similar for multi-layer models, so this is a good learning tool
Still used in many current applications (modems, etc.)
24
Perceptron Node – Threshold Logic Unit
[Figure: a perceptron node with inputs x1, x2, …, xn, weights w1, w2, …, wn, and output z.]
What parameters and what objective function? Learn weights such that an objective function is maximized.
What objective function should we use?
What learning algorithm should we use?
26
Perceptron Learning Algorithm
[Figure: a perceptron with inputs x1 and x2, weights .4 and −.2, threshold .1, and output z.]
Training set (x1, x2 → t): (.8, .3 → 1), (.4, .1 → 0)
27
First Training Instance
For the pattern (x1 = .8, x2 = .3): net = .8(.4) + .3(−.2) = .26, which exceeds the threshold .1, so z = 1. The target is t = 1, so the output is correct and no weight change is needed.
28
Second Training Instance
For the pattern (x1 = .4, x2 = .1): net = .4(.4) + .1(−.2) = .14, which still exceeds the threshold .1, so z = 1. The target is t = 0, so the weights must be updated using Δwi = c(t − z)xi.
29
Perceptron Rule Learning
Δwi = c(t − z)xi, where wi is the weight from input i to the perceptron node, c is the learning rate, t is the target for the current instance, z is the current output, and xi is the ith input.
Least perturbation principle: only change weights if there is an error, use a small c rather than changing the weights enough to make the current pattern correct, and scale the change by xi.
Create a perceptron node with n inputs.
Iteratively apply a pattern from the training set and apply the perceptron rule.
Each iteration through the training set is an epoch.
Continue training until the total training set error ceases to improve.
Perceptron Convergence Theorem: guaranteed to find a solution in finite time if a solution exists.
(Note: some authors write the rule with (z − t), which simply negates Δw.)
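A minimal Python sketch of this procedure, assuming the threshold is folded in as a bias weight on a constant input of 1 (one common convention, not necessarily the one used in the slides); the two-pattern training set from the worked example is reused.

```python
def perceptron_train(patterns, targets, c=1.0, epochs=10):
    """Train a threshold unit with the perceptron rule: dw_i = c * (t - z) * x_i."""
    n = len(patterns[0])
    w = [0.0] * (n + 1)                 # last weight acts as the bias (threshold folded in)
    for _ in range(epochs):
        errors = 0
        for x, t in zip(patterns, targets):
            xa = list(x) + [1.0]        # augmented pattern: constant 1 input for the bias
            net = sum(wi * xi for wi, xi in zip(w, xa))
            z = 1 if net > 0 else 0     # threshold logic unit
            if z != t:
                errors += 1
                w = [wi + c * (t - z) * xi for wi, xi in zip(w, xa)]
        if errors == 0:                 # whole epoch correct: training set error stopped improving
            break
    return w

# The two training instances used in the worked example: (.8, .3) -> 1 and (.4, .1) -> 0.
weights = perceptron_train([(.8, .3), (.4, .1)], [1, 0], c=1.0)
print(weights)
```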
31
Augmented Pattern Vectors
[Example patterns and their augmented versions are shown on the slide.]
Treat the threshold like any other weight, with no special case. Call it a bias, since it biases the output up or down. Since we start with random weights anyway, we can drop the minus-sign convention and just think of the bias as an extra available weight. Always use a bias weight.
(Note: the author feeds −1 into the bias input, making it a threshold weight again; since the weights start random this makes no real difference, although the final bias weight will be negated by comparison.)
32
Perceptron Rule Example
Assume a 3-input perceptron plus bias (it outputs 1 if net > 0, else 0). Assume a learning rate c of 1 and initial weights all 0: Δwi = c(t − z)xi.
Training set: four patterns with targets 0, 1, 1, 0 (the pattern values are given on the slide).
The worked table tracks, for each step: Pattern, Target, Weight Vector, Net, Output, ΔW.
38
What if there is no bias?
If there is no bias, then the hyperplane must pass through the origin.
39
Linear Separability
40
Linear Separability and Generalization
When is data noise vs. a legitimate exception?
41
Limited Functionality of Hyperplane
42
How to Handle Multi-Class Output
This is an issue with any learning model which only supports binary classification (perceptron, SVM, etc.).
One-vs-rest: create 1 perceptron for each output class, where the training set considers all other classes to be negative examples. Run all perceptrons on novel data and set the output to the class of the perceptron which outputs high; if there is a tie, choose the perceptron with the highest net value. (A sketch of this strategy follows below.)
One-vs-one: create 1 perceptron for each pair of output classes, where the training set only contains examples from the 2 classes. Run all perceptrons on novel data and set the output to the class with the most wins (votes) from the perceptrons; in case of a tie, use the net values to decide. The number of models grows with the square of the number of output classes.
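A hedged sketch of the one-vs-rest decision logic. The per-class weight vectors here are made-up placeholders standing in for already-trained perceptrons; only the firing and tie-breaking logic follows the description above.

```python
def net(w, x):
    """Weighted sum with the last weight acting as a bias on a constant 1 input."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))

def predict_multiclass(weight_vectors, x):
    """Run one perceptron per class; output the class whose unit fires.
    Ties (or no unit firing) are broken by the highest net value."""
    nets = {cls: net(w, x) for cls, w in weight_vectors.items()}
    firing = [cls for cls, n in nets.items() if n > 0]
    if len(firing) == 1:
        return firing[0]
    return max(nets, key=nets.get)      # tie or no winner: fall back to the largest net

# Illustrative, hand-made weight vectors for three classes (last entry is the bias weight).
weights = {"A": [0.5, -0.2, 0.1], "B": [-0.3, 0.7, 0.0], "C": [0.1, 0.1, -0.4]}
print(predict_multiclass(weights, (1.0, 0.5)))
```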
43
Objective Functions: Accuracy/Error
How do we judge the quality of a particular model (e.g. a perceptron with a particular setting of weights)?
Consider how accurate the model is on the data set:
Classification accuracy = # correct / total instances
Classification error = # misclassified / total instances (= 1 − accuracy)
Usually we minimize a loss function (aka cost, error). For real-valued outputs and/or targets:
Pattern error = target − output. Raw errors could cancel each other, so use Σ|ti − zi| (L1 loss).
A common approach is squared error, Σ(ti − zi)² (L2 loss).
Total sum squared error = Σ over patterns of the pattern errors = Σp Σi (ti − zi)².
For nominal data, the pattern error is typically 1 for a mismatch and 0 for a match; for nominal (including binary) outputs and targets, SSE and classification error are equivalent.
L2 has some mathematical advantages (derivative, etc.).
44
Mean Squared Error (MSE) = SSE/n, where n is the number of instances in the data set. This is convenient because it normalizes the error for data sets of different sizes: MSE is the average squared error per pattern.
Root Mean Squared Error (RMSE) is the square root of the MSE. Because the SSE squares the errors, taking the square root puts the error value back into the same units as the features and can thus be more intuitive: RMSE is the average distance (error) of the targets from the outputs, in the same scale as the features.
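A small sketch of these error measures, assuming real-valued targets and outputs held in parallel lists (the values are illustrative):

```python
import math

targets = [1.0, 0.0, 1.0, 0.5]
outputs = [0.8, 0.2, 0.9, 0.4]

sse  = sum((t - z) ** 2 for t, z in zip(targets, outputs))  # sum squared error (L2 loss)
mse  = sse / len(targets)                                   # average squared error per pattern
rmse = math.sqrt(mse)                                       # back in the same units as the targets

print(sse, mse, rmse)
```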
45
Perceptron Learning Theorem
A perceptron (threshold unit) can learn anything that it can represent (i.e. anything separable with a hyperplane)
46
The Exclusive OR problem
A Perceptron cannot represent Exclusive OR since it is not linearly separable.
48
Minsky & Papert (1969) offered a solution to the XOR problem by combining perceptron unit responses using a second layer of units: piecewise linear classification using an MLP with threshold (perceptron) units.
[Figure: a two-layer network of threshold units solving XOR.]
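One classic construction of this kind, sketched with hand-chosen weights (not necessarily the figure's exact values): the two hidden threshold units compute OR and AND of the inputs, and the output unit fires only when OR is true and AND is false, which is exactly XOR.

```python
def step(net):
    """Threshold (perceptron) unit."""
    return 1 if net > 0 else 0

def xor_mlp(x, y):
    # Hidden layer: two linear boundaries.
    h_or  = step(x + y - 0.5)          # fires if x OR y
    h_and = step(x + y - 1.5)          # fires only if x AND y
    # Output layer combines the two half-planes (piecewise linear classification).
    return step(h_or - h_and - 0.5)    # fires if OR but not AND  ->  XOR

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_mlp(a, b))     # prints 0, 1, 1, 0 for the four input combinations
```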
49
Linear Models which are Non-Linear in the Input Space
So far we have used net = Σ wi xi over the raw inputs. We could instead preprocess the inputs in a non-linear way and compute net = Σ wi φi(x) over transformed inputs. To the perceptron algorithm it looks just the same and can use the same learning algorithm; it just has different inputs (this is also the idea behind kernel approaches such as the SVM). For example, for a problem with two inputs x and y (plus the bias), we could also add the inputs x², y², and x·y. The perceptron would just think it is a 5-dimensional task, and it is linear in those 5 dimensions. But what kind of decision surfaces would it allow in the original 2-d input space?
50
Quadric Machine
All quadratic surfaces (2nd order): ellipsoid, parabola, etc.
This significantly increases the number of problems that can be solved, but there are still many problems which are not quadrically separable.
We could go to 3rd- and higher-order features, but the number of possible features grows exponentially.
Multi-layer neural networks will allow us to discover high-order features automatically from the input space.
51
Simple Quadric Example
[Figure: the data plotted along the single feature f1.]
A perceptron with just the feature f1 cannot separate the data. Could we add a transformed feature to our perceptron?
Adding the transformed feature f2 = f1² makes the data separable in the (f1, f2) space.
Note that this can also be thought of as still using only feature f1, but now allowing a quadric surface to separate the data.
Note also that feature f1 itself was not even needed in this case, just f1² (not all combinations are needed).
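A tiny sketch of the idea with made-up data: a 1-D data set that no single threshold on f1 can separate becomes separable once the transformed feature f2 = f1² is appended.

```python
# Points near the origin belong to class 0, points far from it to class 1:
# not separable by a single threshold on f1 alone.
f1_values = [-3.0, -2.5, -0.5, 0.0, 0.4, 2.8, 3.1]
labels    = [  1,    1,    0,   0,   0,   1,   1]

# Append the transformed feature f2 = f1**2; in (f1, f2) space a line f2 = const separates the classes.
augmented = [(f1, f1 ** 2) for f1 in f1_values]

for (f1, f2), t in zip(augmented, labels):
    z = 1 if f2 > 1.0 else 0        # a linear boundary in the augmented space (here: f2 > 1)
    print(f1, f2, t, z)             # prediction z matches target t for every point
```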
54
The Key Elements of Neural Networks
Neural computing requires a number of neurons, to be connected together into a neural network. Neurons are arranged in layers. Each neuron within the network is usually a simple processing unit which takes one or more inputs and produces an output. At each neuron, every input has an associated weight which modifies the strength of each input. The neuron simply adds together all the inputs and calculates an output to be passed on.
55
[Figure: a three-layer network with inputs x1, x2, …, xn, hidden layers, and an output.]
56
Properties of architecture
No connections within a layer
No direct connections between input and output layers
Fully connected between layers
Often more than 3 layers
Number of output units need not equal number of input units
Number of hidden units per layer can be more or fewer than input or output units
Each unit is a perceptron
Often include bias as an extra weight
60
What do each of the layers do?
1st layer draws linear boundaries
2nd layer combines the boundaries
3rd layer can generate arbitrarily complex boundaries
61
Feed Forward Neural Networks
The information is propagated from the inputs to the outputs.
The network computes non-linear functions of the n input variables by composing the algebraic functions of its units.
Time plays no role (there are no cycles between outputs and inputs).
[Figure: inputs x1, x2, …, xn feeding a 1st hidden layer, a 2nd hidden layer, and an output layer.]
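A minimal forward-pass sketch of such a feed-forward network. Layer sizes and weights are arbitrary illustrations, and a sigmoid is assumed as the non-linearity (as in the MLP slides later on).

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def layer(inputs, weights):
    """One fully connected layer: each unit takes a weighted sum of the inputs plus a bias."""
    return [sigmoid(sum(w * x for w, x in zip(unit[:-1], inputs)) + unit[-1])
            for unit in weights]

def forward(x, network):
    """Propagate the inputs through the layers in order; no cycles, so time plays no role."""
    for weights in network:
        x = layer(x, weights)
    return x

random.seed(0)
# A 3-input, 4-hidden, 2-hidden, 1-output network with random weights
# (the last entry of each unit's weight list is its bias).
sizes = [3, 4, 2, 1]
network = [[[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_out)]
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
print(forward([0.5, -0.1, 0.9], network))
```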
62
Learning: the procedure that consists in estimating the parameters of the neurons so that the whole network can perform a specific task.
2 types of learning: supervised learning and unsupervised learning.
The (supervised) learning process:
Present the network with a number of inputs and their corresponding outputs.
See how closely the actual outputs match the desired ones.
Modify the parameters to better approximate the desired outputs.
63
Supervised learning: the desired response of the neural network for particular inputs is well known. A "teacher" provides examples and teaches the neural network how to fulfil a certain task.
64
Unsupervised learning
Idea: group typical input data according to resemblance criteria that are unknown a priori (data clustering).
No teacher is needed: the network finds the correlations between the data by itself.
65
Properties of Neural Networks
Supervised (non-recurrent) networks are universal approximators.
Theorem: any bounded function can be approximated to arbitrary precision by a neural network with a finite number of hidden neurons.
Types of approximators:
Linear approximators (e.g. polynomials): for a given precision, the number of parameters grows exponentially with the number of variables.
Non-linear approximators (NNs): the number of parameters grows linearly with the number of variables.
66
Other properties
Adaptivity: adapts its weights to the environment and can easily be retrained.
Generalization ability: may compensate for a lack of data.
Fault tolerance: graceful degradation of performance if damaged => the information is distributed within the entire net.
67
Example
68
Classification (Discrimination)
Classify objects into defined categories.
Either a rough (hard) decision, or an estimation of the probability that a certain object belongs to a specific class.
Example: data mining.
Applications: economy, speech and pattern recognition, sociology, etc.
69
Example: handwritten postal codes drawn from a database available from the US Postal Service.
70
What do we need to use a NN?
Determination of pertinent inputs
Collection of data for the learning and testing phases of the neural network
Finding the optimum number of hidden nodes
Estimating the parameters (learning)
Evaluating the performance of the network
If the performance is not satisfactory, review all of the previous points.
71
Why we need backpropagation
Networks without hidden units are very limited in the input-output mappings they can model.
More layers of linear units do not help: the result is still linear.
Fixed output non-linearities are not enough.
We need multiple layers of adaptive non-linear hidden units; this gives us a universal approximator. But how can we train such nets?
We need an efficient way of adapting all the weights, not just the last layer. This is hard.
Learning the weights going into hidden units is equivalent to learning features, and nobody is telling us directly what the hidden units should do.
72
Learning by perturbing weights
Randomly perturb one weight and see if it improves performance; if so, save the change.
This is very inefficient: we need to do multiple forward passes on a representative set of training data just to change one weight, and towards the end of learning, large weight perturbations will nearly always make things worse.
We could instead randomly perturb all the weights in parallel and correlate the performance gain with the weight changes. This is not any better, because we need lots of trials to "see" the effect of changing one weight through the noise created by all the others.
Learning the hidden-to-output weights is easy; learning the input-to-hidden weights is hard.
[Figure: a network with input units, hidden units, and output units.]
73
The idea behind back propagation
We don't know what the hidden units ought to do, but we can compute how fast the error changes as we change a hidden activity. Instead of using desired activities to train the hidden units, use error derivatives w.r.t. the hidden activities. Each hidden activity can affect many output units and can therefore have many separate effects on the error; these effects must be combined. We can compute the error derivatives for all the hidden units efficiently. Once we have the error derivatives for the hidden activities, it's easy to get the error derivatives for the weights going into a hidden unit.
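A compact sketch of this idea for a single-hidden-layer sigmoid network trained with squared error (layer sizes, learning rate and data are illustrative): the error derivative for each hidden activity is obtained by combining its effects on every output unit, and the weight derivatives then follow.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

random.seed(1)
n_in, n_hid, n_out, lr = 2, 3, 1, 0.5
W1 = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hid)]   # +1 for bias
W2 = [[random.uniform(-1, 1) for _ in range(n_hid + 1)] for _ in range(n_out)]

def train_step(x, t):
    # Forward pass.
    h = [sigmoid(sum(w * xi for w, xi in zip(row[:-1], x)) + row[-1]) for row in W1]
    z = [sigmoid(sum(w * hi for w, hi in zip(row[:-1], h)) + row[-1]) for row in W2]
    # Error derivatives at the outputs (squared error, sigmoid units).
    dz = [(zk - tk) * zk * (1 - zk) for zk, tk in zip(z, t)]
    # Combine each hidden unit's separate effects on every output unit.
    dh = [sum(dz[k] * W2[k][j] for k in range(n_out)) * h[j] * (1 - h[j]) for j in range(n_hid)]
    # Gradient-descent updates for both weight layers.
    for k in range(n_out):
        for j in range(n_hid):
            W2[k][j] -= lr * dz[k] * h[j]
        W2[k][-1] -= lr * dz[k]
    for j in range(n_hid):
        for i in range(n_in):
            W1[j][i] -= lr * dh[j] * x[i]
        W1[j][-1] -= lr * dh[j]

# e.g. repeated passes over the XOR patterns:
data = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]
for _ in range(2000):
    for x, t in data:
        train_step(x, t)
```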
74
Some Success Stories
Back-propagation has been used for a large number of practical applications:
Recognizing hand-written characters
Predicting the future price of stocks
Detecting credit card fraud
Recognizing speech
Predicting the next word in a sentence from the previous words
75
Overview of the applications in this lecture
Modeling relational data This toy application shows that the hidden units can learn to represent sensible features that are not at all obvious. It also bridges the gap between relational graphs and feature vectors. Learning to predict the next word in a sentence The toy model above can be turned into a useful model for predicting words to help a speech recognizer. Reading documents An impressive application that is used to read checks.
76
Multi-Layer Perceptron
One or more hidden layers; sigmoid activation functions.
[Figure: input data feeding a 1st hidden layer, a 2nd hidden layer, and an output layer.]
77
Learning: the back-propagation algorithm (credit assignment).
If the jth node is an output unit, its error term can be computed directly from the target and the output, δj = (tj − zj) f′(netj); for hidden units the error terms are obtained by propagating the output errors backwards through the weights.
78
Momentum term: used to smooth the weight changes over time by blending the previous weight change into the current one.
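A small sketch of a momentum update for a single weight (lr, alpha and the gradient sequence are illustrative): the previous weight change is blended into the current one, which smooths the weight changes over time.

```python
def momentum_update(w, grad, prev_delta, lr=0.1, alpha=0.9):
    """Gradient-descent step with a momentum term: dw(t) = -lr*grad + alpha*dw(t-1)."""
    delta = -lr * grad + alpha * prev_delta
    return w + delta, delta

w, prev = 0.5, 0.0
for grad in [0.8, 0.6, -0.2, 0.1]:       # a made-up sequence of error derivatives
    w, prev = momentum_update(w, grad, prev)
    print(w)
```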
79
Different non linearly separable problems
Types of decision regions by network structure:
Single-layer: half plane bounded by a hyperplane.
Two-layer: convex open or closed regions.
Three-layer: arbitrary regions (complexity limited by the number of nodes).
[The original table also illustrates, for each structure, the Exclusive-OR problem, classes with meshed regions, and the most general region shapes.]
Source: Neural Networks – An Introduction, Dr. Andrew Hunter.
80
Feedforward NNs
The basic structure of a feedforward neural network.
The learning rule modifies the weights according to the input patterns that it is presented with. In a sense, ANNs learn by example, as do their biological counterparts. When the desired outputs are known, we have supervised learning, or learning with a teacher.
81
An overview of the backpropagation
1. A set of examples for training the network is assembled. Each case consists of a problem statement (which represents the input to the network) and the corresponding solution (which represents the desired output from the network).
2. The input data is entered into the network via the input layer. Each neuron in the network processes the input data, with the resultant values steadily "percolating" through the network, layer by layer, until a result is generated by the output layer.
3. The actual output of the network is compared to the expected output for that particular input. This results in an error value.
4. The connection weights in the network are gradually adjusted, working backwards from the output layer, through the hidden layer, to the input layer, until the correct output is produced. Fine-tuning the weights in this way has the effect of teaching the network how to produce the correct output for a particular input, i.e. the network learns.
82
The Learning Rule The delta rule is often utilized by the most common class of ANNs called backpropagational neural networks. When a neural network is initially presented with a pattern it makes a random guess as to what it might be. It then sees how far its answer was from the actual one and makes an appropriate adjustment to its connection weights.
83
The Insides of the Delta Rule
Backpropagation performs a gradient descent within the solution's vector space towards a global minimum. The error surface itself is a hyperparaboloid but is seldom smooth in practice. Indeed, in most problems the solution space is quite irregular, with numerous pits and hills which may cause the network to settle into a local minimum which is not the best overall solution.
84
Early stopping: the available data is split into training data, validation data and test data. Training is stopped when the error on the validation data stops improving.
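A hedged sketch of the stopping logic. The error curves below are made-up stand-ins for the training and validation errors a real run would produce; the point is only that training stops once the validation error stops improving, and the test data is reserved for the final performance estimate.

```python
# Stand-in error curves: training error keeps falling, validation error starts rising at epoch 6.
train_err = [0.90, 0.70, 0.50, 0.40, 0.32, 0.27, 0.24, 0.21, 0.19, 0.18]
valid_err = [0.90, 0.72, 0.55, 0.45, 0.40, 0.38, 0.41, 0.45, 0.50, 0.56]

best_epoch, best_valid = 0, float("inf")
patience, since_best = 2, 0                  # allow a couple of bad epochs before stopping

for epoch, (tr, va) in enumerate(zip(train_err, valid_err)):
    # (in a real run, one epoch of weight updates on the training data would happen here)
    if va < best_valid:
        best_epoch, best_valid, since_best = epoch, va, 0   # remember the best weights so far
    else:
        since_best += 1
    if since_best >= patience:               # validation error has stopped improving
        break

print("stop training; keep the weights from epoch", best_epoch)
# The held-out test data is then used once, to estimate the performance of those weights.
```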
85
Design Considerations
What transfer function should be used?
How many inputs does the network need?
How many hidden layers does the network need?
How many hidden neurons per hidden layer?
How many outputs should the network have?
There is no standard methodology to determine these values. Even though there are some heuristic guidelines, the final values are determined by a trial-and-error procedure.
86
Characteristics of NNs
Learning from experience: complex, difficult-to-solve problems, but with plenty of data that describe the problem.
Generalizing from examples: can interpolate from previous learning and give the correct response to unseen data.
Rapid applications development: NNs are generic machines and quite independent of domain knowledge.
Adaptability: adapts to a changing environment, if properly designed.
Computational efficiency: although the training of a neural network demands a lot of computer power, a trained network demands almost nothing in recall mode.
Non-linearity: not based on linear assumptions about the real world.
87
Pre-processing: transform data into NN inputs
Applying a mathematical or statistical function
Encoding textual data from a database
Selection of the most relevant data and outlier removal
Minimizing network inputs:
Feature extraction
Principal components analysis
Waveform / image analysis
Coding the pre-processed data as network inputs
88
Different types of Neural Networks
Feed-forward networks
Feed-forward NNs allow signals to travel one way only, from input to output. There is no feedback (no loops), i.e. the output of any layer does not affect that same layer. Feed-forward NNs tend to be straightforward networks that associate inputs with outputs. They are extensively used in pattern recognition. This type of organization is also referred to as bottom-up or top-down.
89
Continued Feedback networks
Feedback networks can have signals traveling in both directions by introducing loops in the network. Feedback networks are dynamic; their 'state' is changing continuously until they reach an equilibrium point. They remain at the equilibrium point until the input changes and a new equilibrium needs to be found. Feedback architectures are also referred to as interactive or recurrent, although the latter term is often used to denote feedback connections in single-layer organizations.
90
Diagram of an NN Fig: A simple Neural Network
91
Network Layers Input Layer - The activity of the input units represents the raw information that is fed into the network. Hidden Layer - The activity of each hidden unit is determined by the activities of the input units and the weights on the connections between the input and the hidden units. Output Layer - The behavior of the output units depends on the activity of the hidden units and the weights between the hidden and output units.
92
Continued This simple type of network is interesting because the hidden units are free to construct their own representations of the input. The weights between the input and hidden units determine when each hidden unit is active, and so by modifying these weights, a hidden unit can choose what it represents.
93
Network Structure The number of layers and of neurons depend on the specific task. In practice this issue is solved by trial and error. Two types of adaptive algorithms can be used: start from a large network and successively remove some neurons and links until network performance degrades. begin with a small network and introduce new neurons until performance is satisfactory.
94
Network Parameters How are the weights initialized?
How many hidden layers and how many neurons? How many examples in the training set?
95
Preprocessing
96
Why Preprocessing ? The curse of Dimensionality
The quantity of training data needed grows exponentially with the dimension of the input space. In practice, we only have a limited quantity of input data, so increasing the dimensionality of the problem leads to a poor representation of the mapping.
97
Preprocessing methods
Normalization: translate input values so that they can be exploited by the neural network.
Component reduction: build new input variables in order to reduce their number, with no loss of information about their distribution.
98
Character recognition example
An image of 256×256 pixels with 8-bit pixel values (grey levels) gives 65,536 raw inputs, so it is necessary to extract features.
99
Normalization
The inputs of the neural net are often of different types with different orders of magnitude (e.g. pressure, temperature, etc.). It is necessary to normalize the data so that they have the same impact on the model: center and scale (standardize) the variables.
100
Average over all points: μ = (1/N) Σn xn. Variance: σ² = (1/N) Σn (xn − μ)². Transformation of the variables: x′ = (x − μ)/σ.
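A small sketch of this centering-and-scaling step, using made-up values for two input variables with very different orders of magnitude (a pressure and a temperature):

```python
import math

def standardize(values):
    """Center each variable on its mean and scale it by its standard deviation."""
    mean = sum(values) / len(values)
    var  = sum((v - mean) ** 2 for v in values) / len(values)
    std  = math.sqrt(var)
    return [(v - mean) / std for v in values]

pressure    = [101300.0, 101900.0, 100800.0, 101100.0]   # Pa
temperature = [19.5, 22.0, 18.0, 20.5]                   # degrees C

print(standardize(pressure))     # both variables now have mean 0 and variance 1,
print(standardize(temperature))  # so they have comparable impact on the model
```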
101
Components reduction
Sometimes the number of inputs is too large to be exploited. Reducing the number of inputs simplifies the construction of the model.
Goal: a better representation of the data in order to get a more synthetic view, without losing relevant information.
Reduction methods: PCA, CCA, etc.
102
Principal Components Analysis (PCA)
Principle:
A linear projection method to reduce the number of parameters.
Transforms a set of correlated variables into a new set of uncorrelated variables.
Maps the data into a space of lower dimensionality.
A form of unsupervised learning.
Properties:
It can be viewed as a rotation of the existing axes to new positions in the space defined by the original variables.
The new axes are orthogonal and represent the directions with maximum variability.
103
Compute the d-dimensional mean.
Compute the d×d covariance matrix.
Compute its eigenvectors and eigenvalues.
Choose the k largest eigenvalues; k is the inherent dimensionality of the subspace governing the signal.
Form a d×k matrix A whose k columns are the corresponding eigenvectors.
The representation of the data consists of projecting it onto the k-dimensional subspace: x′ = Aᵀ(x − μ).
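The same steps written as a short NumPy sketch; the data is synthetic and k is chosen by hand here rather than from the eigenvalue spectrum.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                         # 100 samples, d = 5 input variables
X[:, 3] = 2 * X[:, 0] + 0.1 * rng.normal(size=100)    # make one variable nearly redundant

mu  = X.mean(axis=0)                     # d-dimensional mean
cov = np.cov(X - mu, rowvar=False)       # d x d covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues and eigenvectors (ascending order)

k = 3                                    # keep the k largest eigenvalues
A = eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # d x k matrix of the top-k eigenvectors

X_reduced = (X - mu) @ A                 # project the data onto the k-dimensional subspace
print(X_reduced.shape)                   # (100, 3)
```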
104
Example of data representation using PCA
105
Limitations of PCA The reduction of dimensions for complex distributions may need non linear processing
106
Curvilinear Components Analysis
A non-linear extension of the PCA.
Can be seen as a self-organizing neural network.
Preserves the proximity between points in the input space, i.e. the local topology of the distribution.
Enables some manifolds in the input data to be unfolded while keeping the local topology.
107
Example of data representation using CCA
Non linear projection of a spiral Non linear projection of a horseshoe
108
Other methods Neural pre-processing
Use a neural network to reduce the dimensionality of the input space Overcomes the limitation of PCA Auto-associative mapping => form of unsupervised training
109
Non linear component analysis
Non-linear component analysis: transformation of a d-dimensional input space into an M-dimensional sub-space. The dimensionality of the sub-space must be decided in advance.
[Figure: an auto-associative network mapping the d-dimensional input (x1, x2, …, xd) through an M-dimensional sub-space (z1, …, zM) back to a d-dimensional output.]
110
« Intelligent preprocessing »
Use an “a priori” knowledge of the problem to help the neural network in performing its task Reduce manually the dimension of the problem by extracting the relevant features More or less complex algorithms to process the input data
111
Example in the H1 L2 neural network trigger
Principle: intelligent preprocessing extracts physical values for the neural net (momentum, energy, particle type) by combining information from different sub-detectors.
Executed in 4 steps:
Clustering: find regions of interest within a given detector layer.
Matching: combine clusters belonging to the same object.
Ordering: sort the objects by a parameter.
Post-processing: generate the variables for the neural network.
112
Conclusion on the preprocessing
The preprocessing has a huge impact on the performance of neural networks.
The distinction between the preprocessing and the neural net is not always clear.
The goal of preprocessing is to reduce the number of parameters to face the challenge of the "curse of dimensionality".
Many preprocessing algorithms and methods exist: preprocessing with prior knowledge, and preprocessing without.