1 Neural Networks Lecture 4 out of 4

2 Practical Considerations: Input, Architecture, Output

3 Multilayer Perceptron Architecture

4 Input

5 What is the input to a neural net? Binary or real-valued strings. How do we represent real-world input in this form? XOR needs just 2 (usually binary) inputs. But how would you represent – an image? – a voice? – a document?

6 Input: Features Extraction: how to get features out of the input. Selection: which features to choose. Representation: how to represent these features as binary or real-valued strings.

7 Example: Audio Input E.g. voice recognition – amplitude (loudness) – pitch (frequency). The frequency ‘signature’ of a voice can be represented as a sequence of numbers.

8 Example: Time Series Data E.g. stock market prediction. Use 'past performance' figures – caveat: past performance is no guarantee of future performance. Use a 'sliding window' technique – each day the ANN predicts tomorrow’s performance – often with a different architecture (recurrent connections). ANNs are now a standard part of the analysis toolkit – and so their effect has been 'factored into' the market.
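
A minimal sketch of the sliding-window idea in Python; the prices are made up and the window length of 5 days is an arbitrary choice:

```python
# Turn a price series into (input window, next-day target) training pairs.
# Prices are made up; the window length of 5 is an arbitrary choice.
prices = [101.2, 102.5, 101.9, 103.0, 104.1, 103.8, 105.2, 104.9]

def sliding_windows(series, window=5):
    pairs = []
    for i in range(len(series) - window):
        x = series[i:i + window]   # the last `window` days: the ANN's input
        y = series[i + window]     # tomorrow's value: the expected output
        pairs.append((x, y))
    return pairs

for x, y in sliding_windows(prices):
    print(x, "->", y)
```

Each day the window slides forward by one, so a single series yields many training pairs.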

9 Example: Image Processing E.g. face recognition – face recognition (is it you?) is easier than face classification (who is it?). Naïve approach – bitmap values for each pixel. Features – edges, corners, even shapes. Markers – position of eyes, mouths, faces (in photos). There are lots of ways of turning an image into a string of real numbers.
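
A minimal sketch of the simplest kind of image feature – differences between neighbouring pixel intensities – on a made-up 5×5 greyscale image:

```python
import numpy as np

# Toy 5x5 greyscale 'image': dark left half, bright right half (made up).
img = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float) * 255

# The simplest edge feature: horizontal differences between neighbours.
edges = np.abs(np.diff(img, axis=1))
print(edges)   # large values mark the vertical edge between the two halves
```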

10 Example: Text Processing E.g. document classification (news recommender systems). Scan the document looking for interesting features – Words: a dictionary of interesting words – Entities: people or places you care about – Concepts: ideas and themes, e.g. political stories, sports news.
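
A minimal sketch of the dictionary-of-words feature; the word list and document are made up:

```python
# Binary features: does the document contain each 'interesting' word?
dictionary = ["election", "goal", "interest", "match"]   # hand-picked, made up
document = "the match ended with a late goal".split()

features = [1 if word in document else 0 for word in dictionary]
print(features)   # -> [0, 1, 0, 1]: looks like a sports story
```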

11 Input: Features Extraction: how to get features out of the input. Selection: which features to choose. Representation: how to represent these features as binary or real-valued strings.

12 Feature Extraction Audio: Fourier transforms represent a sound as a series of frequencies and their relative importance. Question: what ‘granularity’ do we choose? Time Series Data: use 'past performance' figures over a certain time period (day? minute? second?). Images: preprocessing to extract features like eyes is hard – it is easier to detect simple things like edges. Documents: it is easy to extract words from a document!
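
A minimal sketch of audio feature extraction with a Fourier transform, using NumPy on a synthetic two-tone signal; the sample rate and tone frequencies are made up:

```python
import numpy as np

# Synthetic 1-second signal: a 440 Hz tone plus a quieter 880 Hz tone.
rate = 8000                             # samples per second
t = np.arange(rate) / rate
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

# Fourier transform: frequencies and their relative importance.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / rate)

# The two strongest components recover the two tones.
print(freqs[np.argsort(spectrum)[-2:]])   # -> [880. 440.]
```

The 'granularity' question corresponds to the frame length: a longer frame gives finer frequency resolution but coarser timing.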

13 Feature Selection Which features are most important? Audio: which frequencies? Time Series Data: what time period? Images: which are the ‘most important’ features (eye positions? inter-eye distance? gaze direction?)? Documents: the most ‘informative’ words (words like ‘and’ are probably not that useful) – stopwords – information gain/entropy.
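
A minimal sketch of ranking words by information gain (entropy reduction) over a toy labelled corpus; the documents and labels are made up:

```python
import math

def entropy(labels):
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

# Made-up corpus: (set of words in the document, class label).
docs = [({"election", "vote"}, "politics"),
        ({"goal", "match"}, "sport"),
        ({"vote", "poll"}, "politics"),
        ({"match", "score"}, "sport")]

def information_gain(word, docs):
    labels = [c for _, c in docs]
    with_w = [c for ws, c in docs if word in ws]
    without_w = [c for ws, c in docs if word not in ws]
    remainder = (len(with_w) / len(docs)) * entropy(with_w) \
              + (len(without_w) / len(docs)) * entropy(without_w)
    return entropy(labels) - remainder

print(information_gain("vote", docs))   # 1.0: splits the classes perfectly
print(information_gain("the", docs))    # 0.0: tells us nothing here
```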

14 Feature Representation Binary or real? Scaling, e.g. between 0 and 1? Audio: normalise each pitch contribution. Time Series Data: normalise magnitudes – can also look at differences rather than absolute levels. Images: – binary: presence or absence of a feature – real-numbered: e.g. size, colour or illumination. Documents: – binary for presence or absence of a word/entity/concept – real-numbered for (normalised) frequency – the standard measure is TF-IDF.
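
A minimal sketch of one common TF-IDF variant (term frequency times log inverse document frequency) over a made-up corpus:

```python
import math

# Made-up corpus; each document is a list of already-extracted words.
docs = [["cat", "sat", "mat"],
        ["dog", "sat", "log"],
        ["cat", "cat", "purr"]]

def tfidf(word, doc, docs):
    tf = doc.count(word) / len(doc)           # how frequent here
    df = sum(1 for d in docs if word in d)    # how many docs contain it
    return tf * math.log(len(docs) / df)      # rare across docs -> high weight

print(tfidf("cat", docs[2], docs))   # frequent here, rarer elsewhere -> higher
print(tfidf("sat", docs[0], docs))   # spread across documents -> lower
```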

15 Middle Layer

16 Multilayer Perceptron Architecture

17 Architecture basically means: how many hidden-layer nodes? Input units are fixed (once you have decided on features). Output units are fixed (once you have decided on the output – see next section). Most networks are feedforward and fully connected (changing this means changing the network type). What’s left is the middle layer.
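
A minimal sketch of a fully connected feedforward pass in NumPy; the layer sizes and random weights are made up, and the hidden-layer size is the one genuinely free choice:

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_hidden, n_outputs = 3, 5, 1       # only n_hidden is ours to choose
W1 = rng.normal(size=(n_hidden, n_inputs))    # input -> hidden weights
W2 = rng.normal(size=(n_outputs, n_hidden))   # hidden -> output weights

def forward(x):
    hidden = np.tanh(W1 @ x)   # every input feeds every hidden unit
    return W2 @ hidden         # every hidden unit feeds every output unit

print(forward(np.array([0.5, -1.0, 2.0])))
```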

18 Overfitting and underfitting What a neural network does, under the hood, is compute a function. – It’s difficult to work out what this function is, but in most cases we won’t actually care; we just want to know what the output will be for any given input. Underfitting is fitting too simple a function, e.g. a straight line where you actually needed a curve. Overfitting is fitting too complex a function, e.g. a very wavy line where you just needed a simple curve without twists and turns. These two faults show up in different ways.

19 Training and Test Data Training data = data for which we already know the answer (expected output). Therefore we can work out the error – and apply backpropagation to change the weights. Test data = we also know the answer (expected output), but we don’t tell the network during training. – After training, ‘freeze’ the weights. – Try the network on the test data to see how well it generalises to unseen data.

20 How do over- and underfitting neural networks do on training and test data? Underfitting networks perform poorly on both training and test sets. Overfitting networks may do very well on training sets but terribly on test sets. Underfitting networks tend to have too few hidden units; overfitting networks have too many. Some learning regimes allow the number of hidden units to be changed during training.
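
A sketch of how the two faults show up, using polynomial degree as a stand-in for the number of hidden units; the noisy quadratic data is made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
truth = lambda x: 1.0 + 2.0 * x - 3.0 * x ** 2

x_train = np.linspace(-1, 1, 15)
y_train = truth(x_train) + rng.normal(0, 0.2, 15)
x_test = np.linspace(-0.9, 0.9, 15)               # unseen data
y_test = truth(x_test) + rng.normal(0, 0.2, 15)

for degree in (1, 2, 10):        # too simple / about right / too complex
    coeffs = np.polyfit(x_train, y_train, degree)
    train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2}: train error {train:.3f}, test error {test:.3f}")
```

Degree 1 (underfitting) typically does badly on both sets; the high degree (overfitting) drives the training error down while the test error grows.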

21 Validation Data A set of data that, like the test set, is ‘hidden’ from the network during training. From time to time during training, we try the network out against the validation data. If the validation error starts increasing while the training error is still decreasing, we have started overfitting and should stop training. We may allow the network to add (or remove) hidden units during this process.
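
A minimal sketch of the early-stopping rule; the validation-error curve is simulated (with its minimum at epoch 20) rather than coming from a real training run:

```python
# Simulated validation errors: falling, then rising once overfitting starts.
val_err = [0.5 + (epoch - 20) ** 2 / 1000 for epoch in range(50)]

patience, best, bad = 5, float("inf"), 0
for epoch, err in enumerate(val_err):
    if err < best:
        best, bad, best_epoch = err, 0, epoch   # would save the weights here
    else:
        bad += 1
        if bad >= patience:    # validation error keeps rising while training
            break              # error falls: stop, we have started overfitting
print(f"stop at epoch {epoch}, keep weights from epoch {best_epoch}")
```

This prints "stop at epoch 25, keep weights from epoch 20": training halts a few epochs after the minimum, and the best weights seen are kept.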

22 Output

23 Multilayer Perceptron Architecture

24 Classification vs Regression Classification: putting data into ‘classes’ – ‘whose face is this?’ – ‘is this an interesting web page?’ – ‘is this an example of fraud?’ Regression: function approximation. – If I adjust the dials like this, what will the result be? – If I change the interest rate, what will happen to the economy? For classification, output units produce a ‘1’ or a ‘0’ – i.e. a threshold. For regression, output units produce a real number – typically they are linear: the output is exactly the weighted sum of the inputs.
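
A minimal sketch of the two kinds of output unit applied to the same weighted sum; the weights and inputs are made up:

```python
import numpy as np

def output_unit(weights, inputs, task):
    s = np.dot(weights, inputs)       # weighted sum into the output unit
    if task == "classification":
        return 1 if s >= 0 else 0     # threshold -> a class label
    return s                          # linear -> the real number itself

w = np.array([0.4, -0.2, 0.7])        # made-up weights
x = np.array([1.0, 2.0, 0.5])         # made-up inputs

print(output_unit(w, x, "classification"))   # -> 1
print(output_unit(w, x, "regression"))       # -> the weighted sum, ~0.35
```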

25 Summary Input: – Feature Extraction – Feature Selection – Feature Representation Architecture: – Training, Test and Validation data – Number of hidden units Output: – Classification – Regression

26 Summary of Module Neural Networks: nature-inspired; units connected with weights that can be changed. Used for classification and regression tasks. Long history: logical calculus, perceptrons. The problem of linear separability (solved by backpropagation). Pragmatics – the main task is input processing.

27 Next Up… Other nature-inspired techniques – Evolutionary Algorithms – Swarms

