1 Neural networks for data mining Eric Postma MICC-IKAT Universiteit Maastricht

2 Overview: introduction to the biology of neural networks (the biological computer, brain-inspired models, basic notions); interactive neural-network demonstrations (Perceptron, multilayer perceptron, Kohonen’s self-organising feature map); examples of applications.

3 A typical AI agent

4 Two types of learning: supervised learning (curve fitting, surface fitting, ...) and unsupervised learning (clustering, visualisation, ...).

5 An input-output function

6 Fitting a surface to four points

7 Regression

8 Classification

9 The history of neural networks: a powerful metaphor. Several decades of theoretical analyses led to a formalisation in terms of statistics (the Bayesian framework). Here we discuss neural networks from the original metaphorical perspective.

10 (Artificial) neural networks The digital computer versus the neural computer

11 The Von Neumann architecture

12 The biological architecture

13 Digital versus biological computers: five distinguishing properties: speed, robustness, flexibility, adaptivity, and context-sensitivity.

14 Speed: the “hundred time steps” argument. “The critical resource that is most obvious is time. Neurons whose basic computational speed is a few milliseconds must be made to account for complex behaviors which are carried out in a few hundred milliseconds (Posner, 1978). This means that entire complex behaviors are carried out in less than a hundred time steps.” Feldman and Ballard (1982)

15 Graceful degradation (figure: performance as a function of damage).

16 Flexibility: the Necker cube

17 vision = constraint satisfaction

18 And sometimes plain search…

19 Adaptivity: in biological computers, processing implies learning; in digital computers, processing does not imply learning.

20 Context-sensitivity: patterns as emergent properties.

21 Robustness and context-sensitivity: coping with noise.

22 The neural computer. Is it possible to develop a model after the natural example? Brain-inspired models: models based on a restricted set of structural and functional properties of the (human) brain.

23 The Neural Computer (structure)

24 Neurons, the building blocks of the brain

25

26 Neural activity (figure: input in, activation out).

27 Synapses, the basis of learning and memory

28 Learning: Hebb’s rule (figure: neuron 1, synapse, neuron 2).

29 Forgetting in neural networks

30 Towards neural networks

31 Connectivity. An example: the visual system is a feedforward hierarchy of neural modules; every module is (to a certain extent) responsible for a certain function.

32 (Artificial) Neural Networks: neurons (activity, nonlinear input-output function), connections (weight), and learning (supervised, unsupervised).

33 Artificial neurons: input (vectors), summation (excitation), output (activation).

34 Input-output function: a nonlinear function, the sigmoid f(x) = 1 / (1 + e^(-x/a)); the steepness parameter a interpolates between a step function (a → 0) and an increasingly shallow curve (a → ∞).
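A minimal numpy sketch of this input-output function, added for illustration; the function and variable names are my own, not from the slides.

```python
import numpy as np

def sigmoid(x, a=1.0):
    """Sigmoid input-output function f(x) = 1 / (1 + exp(-x / a)).

    A small a makes the curve approach a step function; a large a flattens it.
    """
    return 1.0 / (1.0 + np.exp(-x / a))

# Example: the same excitation passed through a steep and a shallow sigmoid.
x = np.linspace(-5, 5, 11)
print(sigmoid(x, a=0.1))   # nearly a step function
print(sigmoid(x, a=10.0))  # nearly flat around 0.5
```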

35 Artificial connections (synapses): w_AB, the weight of the connection from neuron A to neuron B.

36 The Perceptron

37 Learning in the Perceptron. The delta learning rule is based on the difference between the desired output t and the actual output o, given input x. The global error E is a function of the differences between the desired and actual outputs.
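A minimal sketch of one delta-rule update for a single perceptron unit, assuming a simple threshold output; the learning rate eta and all names are illustrative.

```python
import numpy as np

def delta_rule_step(w, x, t, eta=0.1):
    """One delta-rule update: w <- w + eta * (t - o) * x.

    t is the desired (target) output, o the actual output for input x
    (here produced by a simple threshold unit).
    """
    o = 1.0 if np.dot(w, x) > 0 else 0.0   # actual output
    return w + eta * (t - o) * x           # weight change proportional to the error and the input

# Example: nudge the weights towards producing target 1 for this input.
w = np.zeros(3)
x = np.array([1.0, 0.5, -0.2])   # the last component could serve as a bias input
w = delta_rule_step(w, x, t=1.0)
print(w)
```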

38 Gradient Descent

39 Linear decision boundaries

40 Minsky and Papert’s connectedness argument

41 The history of the Perceptron: Rosenblatt (1959), Minsky & Papert (1969), Rumelhart & McClelland (1986).

42 The multilayer perceptron: input layer, one or more hidden layers, output layer.

43 Training the MLP: supervised learning. Each training pattern consists of an input and a desired output; in each epoch all patterns are presented; at each presentation the weights are adapted; after many epochs the network converges to a local minimum.

44 Phoneme recognition with an MLP. Input: frequencies; output: pronunciation.

45 Non-linear decision boundaries

46 Compression with an MLP: the autoencoder.

47 hidden representation

48 Restricted Boltzmann machines (RBMs)

49 Learning in the MLP

50

51

52 Preventing overfitting. Generalisation = performance on the test set. Techniques: early stopping; training, validation, and test sets; k-fold cross-validation; the leave-one-out procedure.
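A minimal early-stopping skeleton, added for illustration. The callables train_epoch and validation_error are hypothetical placeholders standing in for the actual MLP training and evaluation routines; the patience scheme is one common choice, not necessarily the one used in the original lecture.

```python
import copy

def train_with_early_stopping(model, train_set, val_set,
                              train_epoch, validation_error,
                              max_epochs=1000, patience=10):
    """Stop training when the validation error has not improved for `patience` epochs.

    train_epoch(model, train_set) and validation_error(model, val_set) are
    placeholders for the real training and evaluation code.
    """
    best_error = float("inf")
    best_model = copy.deepcopy(model)
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_epoch(model, train_set)              # one pass over all training patterns
        error = validation_error(model, val_set)   # generalisation estimate on held-out data
        if error < best_error:
            best_error = error
            best_model = copy.deepcopy(model)      # keep the best weights seen so far
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                              # validation error stopped improving
    return best_model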

53 Image Recognition with the MLP

54

55 Hidden Representations

56 Other applications. Practical: OCR, financial time series, fraud detection, process control, marketing, speech recognition. Theoretical: cognitive modelling, biological modelling.

57 Some mathematics…

58 Perceptron

59 Derivation of the delta learning rule: target output t, actual output o, weighted input h = Σ_i w_i x_i.
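The slide content is lost in the transcript; a reconstruction of the standard gradient-descent derivation, using the notation above (target t, actual output o = f(h), weighted input h). For a linear output unit, f'(h) = 1 and the rule reduces to Δw_i = η (t − o) x_i.

```latex
h = \sum_i w_i x_i, \qquad o = f(h), \qquad E = \tfrac{1}{2}(t - o)^2
```

```latex
\frac{\partial E}{\partial w_i}
  = \frac{\partial E}{\partial o}\,\frac{\partial o}{\partial h}\,\frac{\partial h}{\partial w_i}
  = -(t - o)\, f'(h)\, x_i
\quad\Longrightarrow\quad
\Delta w_i = -\eta\,\frac{\partial E}{\partial w_i} = \eta\,(t - o)\, f'(h)\, x_i .
```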

60 MLP

61 Sigmoid function. May also be the tanh function (with outputs in (−1, 1) instead of (0, 1)). Derivative: f′(x) = f(x) [1 − f(x)].
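The convenient form of the derivative follows directly from the definition of the sigmoid (shown here for unit steepness, a = 1):

```latex
f(x) = \frac{1}{1 + e^{-x}}, \qquad
f'(x) = \frac{e^{-x}}{(1 + e^{-x})^2}
      = \frac{1}{1 + e^{-x}}\cdot\frac{e^{-x}}{1 + e^{-x}}
      = f(x)\,\bigl[1 - f(x)\bigr].
```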

62 Derivation generalized delta rule

63 Error function (LMS)
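The slide content did not survive the transcript; the error function presumably meant here is the usual least-mean-squares (LMS) error, summed over patterns p and output units k:

```latex
E = \frac{1}{2} \sum_{p} \sum_{k} \bigl(t_k^{\,p} - o_k^{\,p}\bigr)^2 .
```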

64 Adaptation hidden-output weights

65 Adaptation input-hidden weights
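A minimal numpy sketch of both weight adaptations (slides 64 and 65) for a two-layer MLP, under the usual assumptions of sigmoid activations and the LMS error; the matrix names W1, W2 and all constants are illustrative, not from the original slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(W1, W2, x, t, eta=0.1):
    """One generalized-delta-rule update for a single training pattern.

    W1 : input-to-hidden weights, shape (n_hidden, n_in)
    W2 : hidden-to-output weights, shape (n_out, n_hidden)
    """
    # Forward propagation.
    h = sigmoid(W1 @ x)          # hidden activations
    o = sigmoid(W2 @ h)          # output activations

    # Backward propagation of the error.
    delta_out = (t - o) * o * (1 - o)              # output deltas: error * f'(net)
    delta_hid = (W2.T @ delta_out) * h * (1 - h)   # hidden deltas: back-propagated error * f'(net)

    # Adaptation of hidden-output and input-hidden weights.
    W2 = W2 + eta * np.outer(delta_out, h)
    W1 = W1 + eta * np.outer(delta_hid, x)
    return W1, W2

# Example epoch loop: present all patterns, adapt the weights at each presentation.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 2)), rng.normal(size=(1, 4))
patterns = [(np.array([0.0, 1.0]), np.array([1.0])),
            (np.array([1.0, 0.0]), np.array([1.0]))]
for epoch in range(100):
    for x, t in patterns:
        W1, W2 = backprop_step(W1, W2, x, t)
```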

66 Forward and Backward Propagation

67 Decision boundaries of Perceptrons: straight lines (surfaces); only linearly separable problems can be solved.

68 Decision boundaries of MLPs Convex areas (open or closed)

69 Decision boundaries of MLPs Combinations of convex areas

70 Learning and representing similarity

71 Alternative conception of neurons. Neurons do not take the weighted sum of their inputs (as in the perceptron), but measure the similarity of the weight vector to the input vector. The activation of the neuron is a measure of similarity: the more similar the weight vector is to the input, the higher the activation. Neurons represent “prototypes”.
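A small sketch of this prototype view, added for illustration. Here similarity is taken as a decreasing function of Euclidean distance between weight vector and input; the particular choice of similarity function is an assumption, not something the slides specify.

```python
import numpy as np

def prototype_activation(weights, x):
    """Activation as the similarity between each neuron's weight vector and the input.

    weights : array of shape (n_neurons, n_inputs), one prototype per neuron
    Returns values close to 1 for a near-perfect match, falling towards 0 with distance.
    """
    distances = np.linalg.norm(weights - x, axis=1)   # one distance per neuron
    return np.exp(-distances ** 2)

# Example: three prototype neurons; the second one matches the input best.
prototypes = np.array([[0.0, 0.0],
                       [1.0, 1.0],
                       [3.0, 0.0]])
print(prototype_activation(prototypes, np.array([0.9, 1.1])))
```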

72 Coarse Coding

73 2nd order isomorphism

74 Prototypes for preprocessing

75 Kohonen’s SOFM (Self-Organizing Feature Map): unsupervised learning, competitive learning. (Figure: n-dimensional input, output map, winner.)

76 Competitive learning: determine the winner (the neuron whose weight vector has the smallest distance to the input vector), then move the weight vector w of the winning neuron towards the input i. (Figure: weight vector before and after learning.)
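A minimal sketch of one competitive-learning step, following the two operations described above; the learning rate eta and the example values are illustrative.

```python
import numpy as np

def competitive_step(weights, x, eta=0.5):
    """One step of plain competitive learning.

    weights : array of shape (n_neurons, n_inputs), one weight vector per neuron
    x       : input vector
    The winner is the neuron whose weight vector is closest to the input;
    only its weight vector is moved towards the input.
    """
    distances = np.linalg.norm(weights - x, axis=1)
    winner = np.argmin(distances)                    # determine the winner
    weights[winner] += eta * (x - weights[winner])   # move w towards the input i
    return winner, weights

# Example: the closest weight vector moves halfway towards the input.
w = np.array([[0.0, 0.0], [1.0, 1.0]])
winner, w = competitive_step(w, np.array([0.8, 0.9]))
print(winner, w)
```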

77 Kohonen’s idea: impose a topological order onto the competitive neurons (e.g., a rectangular map) and let neighbours of the winner share the “prize” (the “postcode lottery” principle). After learning, neurons with similar weights tend to cluster on the map.

78 Biological inspiration

79 Topological order: neighbourhoods. Square grid: winner (red) and its nearest neighbours. Hexagonal grid: winner (red) and its nearest neighbours.

80 (Figure: inputs connected to the output map.)

81

82 A simple example: a topological map of 2 × 3 neurons and two inputs. (Figure: 2D input, input weights, visualisation.)

83 Weights before training

84 Input patterns (note the 2D distribution)

85 Weights after training

86 Another example. Input: uniformly randomly distributed points. Output: a map of 20 × 20 neurons. Training: starting with a large learning rate and neighbourhood size, both are gradually decreased to facilitate convergence.
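A compact sketch of this training scheme (uniformly random 2D inputs, a 20 × 20 map, learning rate and neighbourhood radius decayed over time). The Gaussian neighbourhood and all decay constants are illustrative assumptions, not taken from the original slides.

```python
import numpy as np

rng = np.random.default_rng(0)
map_size = 20                                   # 20 x 20 output map
weights = rng.random((map_size, map_size, 2))   # one 2D weight vector per map neuron

# Grid coordinates of the map neurons, used to compute map-space neighbourhoods.
gy, gx = np.mgrid[0:map_size, 0:map_size]

n_steps = 10000
for t in range(n_steps):
    x = rng.random(2)                           # uniformly distributed 2D input

    # Winner: the map neuron whose weight vector is closest to the input.
    dist = np.linalg.norm(weights - x, axis=2)
    wy, wx = np.unravel_index(np.argmin(dist), dist.shape)

    # Learning rate and neighbourhood radius both decrease gradually.
    eta = 0.5 * (1.0 - t / n_steps) + 0.01
    radius = map_size / 2 * (1.0 - t / n_steps) + 1.0

    # Neighbours of the winner share the "prize": a Gaussian neighbourhood in map space.
    grid_dist2 = (gy - wy) ** 2 + (gx - wx) ** 2
    h = np.exp(-grid_dist2 / (2.0 * radius ** 2))

    # Move all weight vectors towards the input, scaled by the neighbourhood function.
    weights += eta * h[:, :, None] * (x - weights)
```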

87 Weights visualisation

88 Dimension reduction 3D input 2D output

89 Adaptive resolution 2D input 2D output

90 Output map representation

91 Application of SOFM: examples (input), SOFM after training (output).

92 Visual features (biologically plausible)

93 Face Classification

94 Colour classification

95 Car classification

96 Relation with statistical methods 1: Principal Components Analysis (PCA). The axes pca1 and pca2 are the first principal components, onto which the data are projected.
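A short numpy sketch of PCA by eigendecomposition of the covariance matrix, added for illustration; it projects the data onto the first two principal directions (pca1, pca2). Function names and the example data are my own.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project data onto the first principal components.

    X : array of shape (n_samples, n_features)
    Returns the projections onto the top `n_components` directions.
    """
    Xc = X - X.mean(axis=0)                      # centre the data
    cov = np.cov(Xc, rowvar=False)               # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigendecomposition of a symmetric matrix
    order = np.argsort(eigvals)[::-1]            # sort by decreasing variance
    components = eigvecs[:, order[:n_components]]
    return Xc @ components                       # projections of the data

# Example: 3D data projected onto pca1 and pca2.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ np.diag([3.0, 1.0, 0.1])
print(pca_project(X).shape)   # (100, 2)
```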

97 Relation with statistical methods 2: Multi-Dimensional Scaling (MDS) and Sammon mapping, both based on distances in high-dimensional space.
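A brief sketch using scikit-learn's MDS on a precomputed distance matrix. This is metric MDS rather than the Sammon mapping (which weights small distances more heavily), but it illustrates the same idea of embedding high-dimensional distances in a 2D map; the data and parameters are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))        # points in a 10-dimensional space

D = squareform(pdist(X))             # pairwise distances in the high-dimensional space
embedding = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
Y = embedding.fit_transform(D)       # 2D coordinates that approximate those distances
print(Y.shape)                       # (50, 2)
```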

