
1 Statistical Classification Methods 1.Introduction 2.k-nearest neighbor 3.Neural networks 4.Decision trees 5.Support Vector Machine

2 What is classification? Assign new observations to known classes using a previously trained model. The model is trained on existing data with known labels. Introduction

3 Machine Learning is the study of computer algorithms that improve automatically through experience. [Machine Learning, Tom Mitchell, McGraw Hill, 1997] (Diagram: training data of example input/output pairs feeds a learning machine, which produces a classifier that maps inputs to outputs.) Machine Learning for Classification

4 Classification Steps Introduction

5 Examples of Classification Tasks: Classifying cells as cancerous or non-cancerous. Classifying credit card transactions as legitimate or fraudulent. Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil. Categorizing news stories as finance, weather, entertainment, sports, etc. Introduction

6 Prototype-based methods: K-Nearest Neighbour (KNN), Weighted KNN, Fuzzy KNN, etc. Boundary-based methods: Neural Networks, such as the Multi-Layer Perceptron (MLP), Back Propagation (BP), and the Support Vector Machine (SVM). Rule-based methods: Decision Trees. Classification Methods used in DM

7 Classification Method 1: kNN - Basic Information Training method: – Save the training examples. At prediction time: – Find the k training examples (x_1, y_1), …, (x_k, y_k) that are nearest to the test example x. – Predict the most frequent class among those y_i's. kNN

8 kNN Steps Store all input data in the training set. For each sample in the test set: search for the K nearest samples to the input sample using a Euclidean distance measure. For classification, compute the confidence for each class as C_i/K (where C_i is the number of samples among the K nearest samples belonging to class i). The classification for the input sample is the class with the highest confidence. kNN

9 An arbitrary instance is represented by (a_1(x), a_2(x), a_3(x), …, a_n(x)), where a_i(x) denotes the i-th feature. Euclidean distance between two instances: d(x_i, x_j) = sqrt( Σ_{r=1..n} (a_r(x_i) − a_r(x_j))^2 ). For a continuous-valued target function, predict the mean value of the k nearest training examples. kNN Calculation kNN
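The following is a minimal kNN sketch in Python/NumPy that follows the steps and distance formula above; the function name knn_predict, the value of k, and the toy data are illustrative assumptions rather than part of the original slides.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify one test sample x by majority vote among its k nearest neighbours."""
    # Euclidean distance d(x_i, x_j) = sqrt(sum_r (a_r(x_i) - a_r(x_j))^2)
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]                 # indices of the k closest training samples
    votes = Counter(y_train[i] for i in nearest)
    label, count = votes.most_common(1)[0]
    confidence = count / k                          # C_i / K from the slide
    return label, confidence

# Illustrative toy data: two 2-D classes
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))
```

For a continuous-valued target, the same neighbour search applies but the prediction becomes the mean of the k nearest y values.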

10 kNN Calculation - 1-Nearest Neighbor kNN

11 kNN Calculation - 3-Nearest Neighbor kNN

12 In-Class Practice 1 Data – Iris.arff and your own data (if applicable) Method – k-NN – Parameters (select by yourself) Software – wekaclassalgos1.7 Step – Explorer -> Classify -> Classifier (Lazy - IBk)

13 Classification Method 2: Neural Networks – Biological inspiration Animals are able to react adaptively to changes in their external and internal environment, and they use their nervous system to produce these behaviours. An appropriate model/simulation of the nervous system should be able to produce similar responses and behaviours in artificial systems. The nervous system is built from relatively simple units, the neurons, so copying their behaviour and functionality should be the solution. Neural networks

14 A neural network is an interconnected group of nodes Neural Networks – Basic Structure Neural networks

15 Neural Networks – Biological inspiration Neural networks (Diagram: a biological neuron with its dendrites, soma (cell body), and axon.)

16 Neural Networks – Biological inspiration Neural networks (Diagram: synapses connecting the axon of one neuron to the dendrites of the next.) Information transmission happens at the synapses.

17 Neural Networks – Biological inspiration Neural networks The spikes (signal) travelling along the axon of the pre-synaptic neuron trigger the release of neurotransmitter substances at the synapse. The neurotransmitters cause excitation (+) or inhibition (-) in the dendrite of the post-synaptic neuron. The integration of the excitatory and inhibitory signals may produce spikes in the post-synaptic neuron. The contribution of the signals depends on the strength of the synaptic connection.

18 Neural Networks – Artificial neurons Neural networks Neurons work by processing information. They receive and provide information in the form of spikes. The McCulloch-Pitts model: (Diagram: inputs x_1, …, x_n with synaptic weights w_1, …, w_n feeding a single unit that produces output y.)

19 Neural Networks – Artificial neurons Neural networks The McCulloch-Pitts model: spikes are interpreted as spike rates; synaptic strengths are translated into synaptic weights; excitation means a positive product between the incoming spike rate and the corresponding synaptic weight; inhibition means a negative product between the incoming spike rate and the corresponding synaptic weight.

20 Neural Networks – Artificial neurons Neural networks Nonlinear generalization of the McCulloch-Pitts neuron: y = f(x, w), where y is the neuron's output, x is the vector of inputs, and w is the vector of synaptic weights. Examples: the sigmoidal neuron and the Gaussian neuron.
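A small sketch of the two example neuron types; the formulas used are the standard sigmoidal and Gaussian forms, assumed here because the slide's equations are not reproduced in the transcript.

```python
import numpy as np

def sigmoidal_neuron(x, w, a=0.0):
    """Sigmoidal neuron: y = 1 / (1 + exp(-(w.x + a))), the standard form (assumed)."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + a)))

def gaussian_neuron(x, w, a=1.0):
    """Gaussian neuron: y = exp(-||x - w||^2 / (2 a^2)), the standard form (assumed)."""
    return np.exp(-np.linalg.norm(x - w) ** 2 / (2.0 * a ** 2))

x = np.array([0.5, 1.0])
w = np.array([0.3, -0.2])
print(sigmoidal_neuron(x, w), gaussian_neuron(x, w))
```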

21 Neural Networks – Artificial neural networks Neural networks (Diagram: inputs feeding an interconnected network that produces an output.) An artificial neural network is composed of many artificial neurons that are linked together according to a specific network architecture. The objective of the neural network is to transform the inputs into meaningful outputs.

22 Neural Networks Neural networks Learning in biological systems Learning = learning by adaptation The young animal learns that the green fruits are sour, while the yellowish/reddish ones are sweet. The learning happens by adapting the fruit picking behaviour. At the neural level the learning happens by changing of the synaptic strengths, eliminating some synapses, and building new ones.

23 Neural Networks Neural networks Learning as optimisation The objective of adapting the responses on the basis of the information received from the environment is to achieve a better state. E.g., the animal likes to eat many energy rich, juicy fruits that make its stomach full, and makes it feel happy. In other words, the objective of learning in biological organisms is to optimise the amount of available resources, happiness, or in general to achieve a closer to optimal state.

24 Neural Networks Neural networks Learning in biological neural networks The learning rules of Hebb: synchronous activation increases the synaptic strength; asynchronous activation decreases the synaptic strength. These rules fit with energy minimization principles: maintaining synaptic strength costs energy, so strength should be maintained where it is needed and not where it is not.

25 Neural Networks Neural networks Learning principle for artificial neural networks ENERGY MINIMIZATION We need an appropriate definition of energy for artificial neural networks, and having that we can use mathematical optimisation techniques to find how to change the weights of the synaptic connections between neurons. ENERGY = measure of task performance error

26 Neural Networks – mathematics Neural networks (Diagram: inputs propagated through the network layers to the output.)

27 Neural Networks – mathematics Neural networks Input/output transformation y = F(x, W), where W is the matrix of all weight vectors. F is effectively two functions: a weighted sum of the inputs followed by an activation function.

28 Neural Networks – Perceptron Neural networks ● Basic unit in a neural network ● Linear separator ● Parts: – N inputs, x_1 … x_n – Weights for each input, w_1 … w_n – A bias input x_0 (constant) and associated weight w_0 – Weighted sum of inputs, z = w_0 x_0 + w_1 x_1 + … + w_n x_n – A threshold function, i.e. y = 1 if z > 0, y = −1 if z ≤ 0

29 Neural Networks – Perceptron Neural networks (Diagram: inputs x_0, x_1, …, x_n with weights w_0, w_1, …, w_n feed a summation unit computing z = Σ w_i x_i, followed by a threshold that outputs 1 if z > 0 and −1 otherwise.)

30 Neural Networks – Perceptron Neural networks Learning in the Perceptron: Start with random weights. Select an input pair (x, F(x)). If the perceptron's output y differs from the target F(x), modify the weights; the standard rule is w ← w + η (F(x) − y) x. Note that the weights are not modified if the network gives the correct answer.
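A minimal sketch of this learning procedure, assuming the standard perceptron update w ← w + η (target − y) x; the toy OR-style data, learning rate, and epoch count are illustrative.

```python
import numpy as np

def train_perceptron(X, t, eta=0.1, epochs=20):
    """Train a perceptron with targets t in {-1, +1}; a constant bias input x0 = 1 is appended."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])   # bias column x0
    w = np.random.uniform(-0.5, 0.5, X.shape[1])   # start with random weights
    for _ in range(epochs):
        for x, target in zip(X, t):
            y = 1 if np.dot(w, x) > 0 else -1      # threshold output
            if y != target:                        # weights change only on a wrong answer
                w += eta * (target - y) * x
    return w

# Linearly separable toy data (OR-like problem)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([-1, 1, 1, 1])
print(train_perceptron(X, t))
```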

31 Neural Networks – Perceptron Neural networks A learning rate can be added to speed up the learning process; it is simply multiplied into the delta computation. The perceptron is essentially a linear discriminant. Perceptron theorem: if a linear discriminant exists that can separate the classes without error, the training procedure is guaranteed to find that line or plane. With only one layer, however, the perceptron has trouble with complex (non-linearly-separable) problems.

32 Neural Networks Neural networks MLP Backpropagation networks Popularized by Rumelhart and McClelland in the mid 1980s. Can construct multilayer networks; typically we have fully connected, feedforward networks. (Diagram: inputs feeding a multilayer feedforward network that produces an output.)

33 Neural Networks – MLP BP Neural networks Learning Procedure: Randomly assign weights (between 0 and 1). Present inputs from training data and propagate them to the outputs. Compute the outputs O and adjust the weights according to the delta rule, backpropagating the errors; the weights are nudged closer to values that produce the desired output. Repeat; stop when there are no errors, or when enough epochs have been completed.
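A sketch of this procedure for a one-hidden-layer network trained with plain batch backpropagation; the XOR data, layer size, learning rate, and epoch count are illustrative assumptions, and the weights are initialised in a small symmetric range (rather than the 0-1 range mentioned above) because that tends to train more reliably.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, T, n_hidden=8, eta=0.5, epochs=10000, seed=0):
    """One-hidden-layer MLP trained by backpropagating the output errors."""
    rng = np.random.default_rng(seed)
    W1 = rng.uniform(-0.5, 0.5, (X.shape[1], n_hidden))   # small symmetric initial weights
    W2 = rng.uniform(-0.5, 0.5, (n_hidden, T.shape[1]))
    for _ in range(epochs):
        # forward pass: present inputs and propagate to the outputs
        H = sigmoid(X @ W1)
        O = sigmoid(H @ W2)
        # backward pass: delta rule, errors backpropagated from output to hidden layer
        delta_out = (O - T) * O * (1 - O)
        delta_hid = (delta_out @ W2.T) * H * (1 - H)
        W2 -= eta * H.T @ delta_out
        W1 -= eta * X.T @ delta_hid
    return W1, W2

# XOR: a problem a single-layer perceptron cannot solve
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])
W1, W2 = train_mlp(X, T)
# after successful training the outputs should approach the XOR targets 0, 1, 1, 0
print(sigmoid(sigmoid(X @ W1) @ W2).round(2))
```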

34 Neural Networks – MLP BP Neural networks (Diagram: when an error is found at the output, a perceptron can only change the weights of its single layer, whereas MLP BP changes the weights of the earlier hidden layers as well.)

35 Neural Networks – MLP BP Neural networks Very powerful: given enough hidden units, such a network can learn any function. It has the usual trade-off between generalization and memorization: with too many units, the network tends to memorize the input and not generalize well. Some schemes exist to “prune” the neural network. Networks require extensive training, have many parameters to fiddle with, can be extremely slow to train, and may fall into local minima. The algorithm is inherently parallel and thus ideal for multiprocessor hardware. Despite the cons, it is a very powerful algorithm that has seen widespread successful deployment.

36 Neural Networks – MLP BP Neural networks Parameters: number of layers, number of neurons per layer, transfer function (activation function), number of iterations (cycles).

37 In-Class Practice 2 Data – Iris.arff (Weka format) and your own data (if applicable) – Iris.txt (Neucom format) Method – Back-Propagation and Multi-Layer Perceptron – Parameters (select by yourself) Software – wekaclassalgos1.7 Steps: Explorer -> Classify -> Classifier (Neural – MultilayerPerceptron – BackPropagation) – Neucom Steps: Modeling Discovery -> Classification -> Neural Networks -> Multi-Layer Perceptron

38 Self Organizing Map (SOM) Neural networks Characteristics: 1. Uses a neighbourhood function. 2. Maps high-dimensional input data onto a low-dimensional grid. Two concepts: 1. Training builds the map using input examples. 2. Mapping classifies a new input vector. Components: nodes (neurons), each with a weight vector of the same dimension as the input data vectors and a position in the map space.

39 – Gaussian neighborhood function: h_ji = exp(−d_ji^2 / (2σ^2)). – d_ji: lattice distance between neurons i and j: |j − i| in a 1-dimensional lattice, ||r_j − r_i|| in a 2-dimensional lattice, where r_j is the position of neuron j in the lattice. Neighborhood Function Neural networks

40 (Diagram: the neighbourhood of winning neuron 13 at two successive iterations, N_13(1) and N_13(2).) Neural networks

41 – σ measures the degree to which excited neurons in the vicinity of the winning neuron cooperate in the learning process. – In the learning algorithm, σ is updated at each iteration during the ordering phase using an exponential decay rule, σ(n) = σ_0 exp(−n/τ_1), with parameters σ_0 (initial width) and τ_1 (time constant). Neighborhood Function Neural networks
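A small helper sketch of the Gaussian neighbourhood and its exponential width decay; the parameter values sigma0 and tau1 follow the standard SOM formulation and are illustrative assumptions, since the slide's equations are not reproduced in the transcript.

```python
import numpy as np

def neighborhood(r_j, r_i, sigma):
    """Gaussian neighbourhood h_ji = exp(-d_ji^2 / (2 sigma^2)) between lattice positions."""
    d = np.linalg.norm(np.asarray(r_j, float) - np.asarray(r_i, float))
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))

def sigma_at(n, sigma0=3.0, tau1=1000.0):
    """Exponential decay of the neighbourhood width: sigma(n) = sigma0 * exp(-n / tau1)."""
    return sigma0 * np.exp(-n / tau1)

print(neighborhood((0, 0), (1, 2), sigma_at(0)))      # wide neighbourhood early in training
print(neighborhood((0, 0), (1, 2), sigma_at(5000)))   # much narrower later on
```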

42 Neighborhood Function Neural networks (Plots: degree of neighbourhood versus distance from the winner, shown at successive points in time.)

43 SOM – Algorithm Steps 1. Randomly initialise all weights. 2. Select an input vector x = [x_1, x_2, x_3, …, x_n] from the training set. 3. Compare x with the weight vector w_j of each neuron j. 4. Determine the winner: find the unit j with the minimum distance to x. 5. Update the winner so that it becomes more like x, together with the winner's neighbours within the radius, according to the standard SOM rule w_j ← w_j + η(n) h_j(n) (x − w_j) (see the sketch after this list). 6. Adjust parameters: learning rate and neighbourhood function. 7. Repeat from (2) until convergence. Note: the learning rate generally decreases with time. Neural networks
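A compact SOM training sketch following the steps above, using the standard update w_j ← w_j + η(n) h_j(n) (x − w_j); the grid size, decay schedules, and toy data are illustrative assumptions.

```python
import numpy as np

def train_som(X, grid=(5, 5), epochs=50, eta0=0.5, sigma0=2.0, tau=1000.0, seed=0):
    """Train a 2-D SOM: for each sample, find the winning unit and pull it (and its
    lattice neighbours) towards the sample, shrinking learning rate and radius over time."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    W = rng.uniform(X.min(), X.max(), (rows * cols, X.shape[1]))   # step 1: random weights
    # lattice positions r_j of each neuron
    pos = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    n = 0
    for _ in range(epochs):
        for x in rng.permutation(X):                               # step 2: select input vector
            dists = np.linalg.norm(W - x, axis=1)                  # steps 3-4: compare, find winner
            winner = np.argmin(dists)
            eta = eta0 * np.exp(-n / tau)                          # learning rate decreases with time
            sigma = sigma0 * np.exp(-n / tau)                      # neighbourhood radius shrinks
            lat_d2 = ((pos - pos[winner]) ** 2).sum(axis=1)
            h = np.exp(-lat_d2 / (2.0 * sigma ** 2))               # Gaussian neighbourhood
            W += eta * h[:, None] * (x - W)                        # step 5: update winner and neighbours
            n += 1
    return W, pos

# Illustrative 2-D data drawn from two clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(3.0, 0.3, (50, 2))])
W, pos = train_som(X)
print(W[:5])
```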

44 SOM - Architecture A lattice of neurons ('nodes') accepts and responds to a set of input signals. Responses are compared and the 'winning' neuron is selected from the lattice. The selected neuron is activated together with its 'neighbourhood' neurons. An adaptive process changes the weights to more closely resemble the inputs. (Diagram: a 2-D array of neurons; each neuron j receives the full set of input signals x_1 … x_n through weights w_j1 … w_jn.) Neural networks

45 In-Class Practice 3 Data – Iris.arff and your own data (if applicable) Method – SOM – Parameters (select by yourself) Software – wekaclassalgos1.7 Step – Explorer -> Classify -> Classifier (Functions - SOM)

