
1 Last lecture summary Naïve Bayes Classifier

2 Bayes Rule: Posterior = Likelihood × Prior / Normalization Constant. Prior and likelihood must be learnt (i.e. estimated from the data).
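Written out explicitly (the equation itself appears only as an image on the slide), with Y the class and X the observed features:

```latex
\underbrace{P(Y \mid X)}_{\text{posterior}}
  \;=\;
  \frac{\overbrace{P(X \mid Y)}^{\text{likelihood}}\;\;\overbrace{P(Y)}^{\text{prior}}}
       {\underbrace{P(X)}_{\text{normalization constant}}}
```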

3 Learning the prior – A hundred independently drawn training examples will usually suffice to obtain a reasonable estimate of P(Y). Learning the likelihood – The Naïve Bayes Assumption: assume that all features are independent given the class label Y.
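Under this assumption the likelihood factorizes over the individual features X_1, …, X_n, so only per-feature conditionals have to be estimated; combined with Bayes' rule above this gives the usual decision rule:

```latex
P(X_1,\dots,X_n \mid Y) \;=\; \prod_{i=1}^{n} P(X_i \mid Y),
\qquad
\hat{y} \;=\; \arg\max_{y}\; P(Y=y)\prod_{i=1}^{n} P(X_i = x_i \mid Y=y)
```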

4 Example – Play Tennis

5 Example – Learning Phase

P(Play=Yes) = 9/14, P(Play=No) = 5/14

Outlook     | Play=Yes | Play=No
Sunny       | 2/9      | 3/5
Overcast    | 4/9      | 0/5
Rain        | 3/9      | 2/5

Temperature | Play=Yes | Play=No
Hot         | 2/9      | 2/5
Mild        | 4/9      | 2/5
Cool        | 3/9      | 1/5

Humidity    | Play=Yes | Play=No
High        | 3/9      | 4/5
Normal      | 6/9      | 1/5

Wind        | Play=Yes | Play=No
Strong      | 3/9      | 3/5
Weak        | 6/9      | 2/5

e.g. P(Outlook=Sunny|Play=Yes) = 2/9
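A minimal sketch of how these learning-phase tables could be computed by relative-frequency counting. The 14 data rows are the standard Play Tennis set (Mitchell), whose counts reproduce the tables above; the variable names are my own, not from the slides.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# The classic Play Tennis data; columns: Outlook, Temperature, Humidity, Wind, Play.
data = [
    ("Sunny",    "Hot",  "High",   "Weak",   "No"),
    ("Sunny",    "Hot",  "High",   "Strong", "No"),
    ("Overcast", "Hot",  "High",   "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny",    "Mild", "High",   "Weak",   "No"),
    ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "Normal", "Weak",   "Yes"),
    ("Sunny",    "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High",   "Strong", "Yes"),
    ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Strong", "No"),
]
features = ["Outlook", "Temperature", "Humidity", "Wind"]

# Prior P(Play): relative frequency of each class in the training data.
class_counts = Counter(row[-1] for row in data)
prior = {c: Fraction(n, len(data)) for c, n in class_counts.items()}

# Likelihood P(feature=value | Play): relative frequency of the value within each class.
likelihood = defaultdict(dict)
for i, f in enumerate(features):
    for (value, c), n in Counter((row[i], row[-1]) for row in data).items():
        likelihood[(f, value)][c] = Fraction(n, class_counts[c])

print(prior)                             # {'No': Fraction(5, 14), 'Yes': Fraction(9, 14)}
print(likelihood[("Outlook", "Sunny")])  # {'No': Fraction(3, 5), 'Yes': Fraction(2, 9)}
```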

6 Example - Prediction
x' = (Outl=Sunny, Temp=Cool, Hum=High, Wind=Strong)
Look up the tables:
P(Outl=Sunny|Play=Yes) = 2/9, P(Temp=Cool|Play=Yes) = 3/9, P(Hum=High|Play=Yes) = 3/9, P(Wind=Strong|Play=Yes) = 3/9, P(Play=Yes) = 9/14
P(Outl=Sunny|Play=No) = 3/5, P(Temp=Cool|Play=No) = 1/5, P(Hum=High|Play=No) = 4/5, P(Wind=Strong|Play=No) = 3/5, P(Play=No) = 5/14
P(Yes|x') ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes) = 0.0053
P(No|x') ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play=No) = 0.0206
Since P(Yes|x') < P(No|x'), we label x' as "No".
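The two numbers on the slide, reproduced as a small calculation (values taken from the look-up tables above):

```python
# Unnormalized posteriors for x' = (Outl=Sunny, Temp=Cool, Hum=High, Wind=Strong).
p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)    # ≈ 0.0053
p_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)    # ≈ 0.0206
print(round(p_yes, 4), round(p_no, 4))            # 0.0053 0.0206
print("Play =", "Yes" if p_yes > p_no else "No")  # Play = No
```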

7 Last lecture summary Binary classifier performance

8 TP, TN, FP, FN
Precision, Positive Predictive Value (PPV): TP / (TP + FP)
Recall, Sensitivity, True Positive Rate (TPR), Hit rate: TP / P = TP / (TP + FN)
False Positive Rate (FPR), Fall-out: FP / N = FP / (FP + TN)
Specificity, True Negative Rate (TNR): TN / (TN + FP) = 1 − FPR
Accuracy: (TP + TN) / (TP + TN + FP + FN)
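The same definitions restated as a small helper function (the function name and the example confusion-matrix counts are made up for illustration):

```python
def binary_metrics(tp, tn, fp, fn):
    """Performance measures of a binary classifier, from the confusion-matrix counts."""
    precision   = tp / (tp + fp)                   # positive predictive value (PPV)
    recall      = tp / (tp + fn)                   # sensitivity, TPR, hit rate
    fpr         = fp / (fp + tn)                   # false positive rate, fall-out
    specificity = tn / (tn + fp)                   # TNR = 1 - FPR
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, fpr, specificity, accuracy

# Example with made-up counts.
print(binary_metrics(tp=40, tn=45, fp=5, fn=10))
```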

9

10 Neural networks (new stuff)

11 Biological motivation
The human brain has been estimated to contain ~10^11 brain cells (neurons). A neuron is an electrically excitable cell that processes and transmits information by electrochemical signaling. Each neuron is connected with other neurons through connections called synapses. A typical neuron possesses a cell body (often called the soma), dendrites (many, on the order of millimetres long), and an axon (one, 10 cm – 1 m long).

12

13 A synapse permits a neuron to pass an electrical or chemical signal to another cell. A synapse can be either excitatory or inhibitory. Synapses are of different strengths (the stronger the synapse, the more important it is). The effects of synapses accumulate inside the neuron. When the cumulative effect of the synapses reaches a certain threshold, the neuron gets activated and a signal is sent along the axon, through which the neuron is connected to other neuron(s).

14

15 Neural networks for applied science and engineering, Samarasinghe

16 Threshold neuron
Warren McCulloch (1898–1969), Walter Pitts (1923–1969)

17 1st mathematical model of a neuron – the McCulloch & Pitts binary (threshold) neuron
– only binary inputs and output
– the weights are pre-set, no learning

x1    x2    t
0.2   0.3   0
0.2   0.8   0
0.2   …     0
1.0   0.8   1

18
x1    x2    t
0.2   0.3   0
0.2   0.8   0
0.2   …     0
1.0   0.8   1

19 Heaviside (threshold) activation function
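A minimal sketch of a binary threshold (McCulloch–Pitts style) neuron with a Heaviside activation. The weights and threshold below are hypothetical – the slide's actual pre-set values appear only in its figure – but they happen to reproduce the targets of the complete rows in the table on slide 17:

```python
def heaviside(u, threshold):
    """Heaviside (threshold) activation: fires (1) once the input reaches the threshold."""
    return 1 if u >= threshold else 0

def threshold_neuron(x, w, threshold):
    """Threshold neuron: weighted sum of the inputs passed through the step function."""
    u = sum(wi * xi for wi, xi in zip(w, x))
    return heaviside(u, threshold)

# Hypothetical weights (1, 1) and threshold 1.5, for illustration only.
for x in [(0.2, 0.3), (0.2, 0.8), (1.0, 0.8)]:
    print(x, "->", threshold_neuron(x, w=(1.0, 1.0), threshold=1.5))
# (0.2, 0.3) -> 0, (0.2, 0.8) -> 0, (1.0, 0.8) -> 1
```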

20

21

22 Perceptron (1957) – Frank Rosenblatt. He developed the learning algorithm and used his neuron (pattern recognizer = perceptron) for the classification of letters.

23

24 Multiple-output perceptron for multicategory (i.e. more than 2 classes) classification
– one output neuron for each class
– input layer, output layer
– single layer (one-layered) vs. double layer (two-layered)
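A minimal sketch of the multi-output perceptron described above: one output neuron per class, each computing a thresholded weighted sum of the inputs. The dimensions and weights are made up for illustration; predicting the class with the largest activation is one common convention, not necessarily the one used in the lecture.

```python
import numpy as np

def step(u):
    """Elementwise threshold (Heaviside) activation."""
    return (u >= 0).astype(int)

def multi_output_perceptron(x, W, b):
    """Single-layer perceptron with one output neuron per class.

    W holds one weight row per output neuron; the predicted class is the
    output neuron with the largest activation (before thresholding)."""
    a = W @ x + b                       # one activation per output neuron
    return step(a), int(np.argmax(a))

# Illustrative only: 3 classes, 4 inputs, random weights.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 4)), rng.normal(size=3)
x = np.array([0.5, -1.0, 0.2, 0.8])
print(multi_output_perceptron(x, W, b))
```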

25 Learning

26

27 Requirements for the minimum. The gradient (grad) is a vector pointing in the direction of the greatest rate of increase of the function. Since we want to descend, we take −grad.

28 Delta rule

29 error gradient

30 To find the gradient, differentiate the error E with respect to w1. According to the delta rule, the weight change is proportional to the negative of the error gradient. The new weight is the old weight plus this weight change.
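The formulas referenced on this and the next two slides appear only as images in the original; written out, with β the learning rate (introduced on the next slide) and i the iteration index:

```latex
\Delta w_1 = -\beta\,\frac{\partial E}{\partial w_1},
\qquad
w_1^{(i+1)} = w_1^{(i)} + \Delta w_1^{(i)}
            = w_1^{(i)} - \beta\left.\frac{\partial E}{\partial w_1}\right|_{w_1 = w_1^{(i)}}
```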

31 β is called the learning rate. It determines how far along the gradient it is necessary to move.

32 The new weight after the i-th iteration.

33 This is an iterative algorithm; one pass through the training set is not enough. One pass through the whole training data set is called an epoch. Adjusting the weights after each input pattern presentation (iteration) is called example-by-example (online) learning.
– For some problems this can cause the weights to oscillate – the adjustment required by one pattern may be cancelled by the next pattern.
– The next method (batch learning) is more popular.

34 Batch learning
– wait until all input patterns (i.e. an epoch) have been processed and then adjust the weights in the average sense
– more stable solution
– obtain the error gradient for each input pattern
– average them at the end of the epoch
– use this average value to adjust the weights using the delta rule
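A minimal sketch of one epoch of batch learning as described above, assuming a single linear neuron and a squared-error measure (the slides do not spell out the exact error function, so this is illustrative only):

```python
import numpy as np

def batch_epoch(X, t, w, beta=0.1):
    """One epoch of batch delta-rule learning for a linear neuron y = x @ w.

    The error gradient is computed for every input pattern, averaged over the
    epoch, and the weights are adjusted once using that average."""
    grads = []
    for x_n, t_n in zip(X, t):
        y_n = x_n @ w                      # neuron output for this pattern
        grads.append((y_n - t_n) * x_n)    # gradient of 0.5*(y - t)^2 w.r.t. w
    return w - beta * np.mean(grads, axis=0)

# Illustrative data and targets.
X = np.array([[0.2, 0.3], [0.2, 0.8], [1.0, 0.8]])
t = np.array([0.0, 0.0, 1.0])
w = np.zeros(2)
for epoch in range(100):
    w = batch_epoch(X, t, w)
print(w)
```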

35

36 Perceptron failure
Please help me and draw the following functions on the blackboard:
– AND, OR, XOR (eXclusive OR: true when exactly one of the operands is true, otherwise false)
[Figure: three plots over the unit square {0, 1} × {0, 1}, labelled AND, OR and XOR ???]
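For reference, the three truth tables the slide asks for, enumerated in a few lines (XOR is 1 exactly when the operands differ):

```python
from itertools import product

# Truth tables for AND, OR and XOR over binary inputs.
print("x1 x2 | AND OR XOR")
for x1, x2 in product([0, 1], repeat=2):
    print(f" {x1}  {x2} |  {x1 & x2}   {x1 | x2}   {x1 ^ x2}")
```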

37 The perceptron uses a linear activation function, so only linearly separable problems can be solved. 1969 – the famous book "Perceptrons" by Marvin Minsky and Seymour Papert showed that it was impossible for this class of network to learn an XOR function. They conjectured (incorrectly!) that a similar result would hold for a perceptron with three or more layers. The often-cited Minsky/Papert text caused a significant decline in interest in and funding of neural network research. It took ten more years until neural network research experienced a resurgence in the 1980s.

38 Play with http://isl.ira.uka.de/neuralNetCourse/2004/VL_11_5/Perceptron.html

