
1 Last lecture summary Naïve Bayes Classifier

2 Bayes Rule: Posterior = Likelihood × Prior / Normalization Constant. Prior and likelihood must be learnt (i.e. estimated from the data).
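Written out explicitly (the equation itself appears only as an image on the slide), with Y the class and X the observed features:

```latex
\underbrace{P(Y \mid X)}_{\text{posterior}}
  \;=\;
  \frac{\overbrace{P(X \mid Y)}^{\text{likelihood}}\;\;\overbrace{P(Y)}^{\text{prior}}}
       {\underbrace{P(X)}_{\text{normalization constant}}}
```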

3 Learning the prior – A hundred independently drawn training examples will usually suffice to obtain a reasonable estimate of P(Y). Learning the likelihood – The Naïve Bayes Assumption: assume that all features are independent given the class label Y.
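Under this assumption the likelihood factorizes over the individual features X_1, …, X_n, so only per-feature conditionals have to be estimated; combined with Bayes' rule above this gives the usual decision rule:

```latex
P(X_1,\dots,X_n \mid Y) \;=\; \prod_{i=1}^{n} P(X_i \mid Y),
\qquad
\hat{y} \;=\; \arg\max_{y}\; P(Y=y)\prod_{i=1}^{n} P(X_i = x_i \mid Y=y)
```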

4 Example – Play Tennis

5 Example – Learning Phase

P(Play=Yes) = 9/14, P(Play=No) = 5/14

Outlook     | Play=Yes | Play=No
Sunny       | 2/9      | 3/5
Overcast    | 4/9      | 0/5
Rain        | 3/9      | 2/5

Temperature | Play=Yes | Play=No
Hot         | 2/9      | 2/5
Mild        | 4/9      | 2/5
Cool        | 3/9      | 1/5

Humidity    | Play=Yes | Play=No
High        | 3/9      | 4/5
Normal      | 6/9      | 1/5

Wind        | Play=Yes | Play=No
Strong      | 3/9      | 3/5
Weak        | 6/9      | 2/5

e.g. P(Outlook=Sunny|Play=Yes) = 2/9
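A minimal sketch of how these learning-phase tables could be computed by relative-frequency counting. The 14 data rows are the standard Play Tennis set (Mitchell), whose counts reproduce the tables above; the variable names are my own, not from the slides.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# The classic Play Tennis data; columns: Outlook, Temperature, Humidity, Wind, Play.
data = [
    ("Sunny",    "Hot",  "High",   "Weak",   "No"),
    ("Sunny",    "Hot",  "High",   "Strong", "No"),
    ("Overcast", "Hot",  "High",   "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny",    "Mild", "High",   "Weak",   "No"),
    ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "Normal", "Weak",   "Yes"),
    ("Sunny",    "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High",   "Strong", "Yes"),
    ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Strong", "No"),
]
features = ["Outlook", "Temperature", "Humidity", "Wind"]

# Prior P(Play): relative frequency of each class in the training data.
class_counts = Counter(row[-1] for row in data)
prior = {c: Fraction(n, len(data)) for c, n in class_counts.items()}

# Likelihood P(feature=value | Play): relative frequency of the value within each class.
likelihood = defaultdict(dict)
for i, f in enumerate(features):
    for (value, c), n in Counter((row[i], row[-1]) for row in data).items():
        likelihood[(f, value)][c] = Fraction(n, class_counts[c])

print(prior)                             # {'No': Fraction(5, 14), 'Yes': Fraction(9, 14)}
print(likelihood[("Outlook", "Sunny")])  # {'No': Fraction(3, 5), 'Yes': Fraction(2, 9)}
```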

6 Example - Prediction
x' = (Outl=Sunny, Temp=Cool, Hum=High, Wind=Strong)
Look up the tables:
P(Outl=Sunny|Play=Yes) = 2/9, P(Temp=Cool|Play=Yes) = 3/9, P(Hum=High|Play=Yes) = 3/9, P(Wind=Strong|Play=Yes) = 3/9, P(Play=Yes) = 9/14
P(Outl=Sunny|Play=No) = 3/5, P(Temp=Cool|Play=No) = 1/5, P(Hum=High|Play=No) = 4/5, P(Wind=Strong|Play=No) = 3/5, P(Play=No) = 5/14
P(Yes|x') ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes) = 0.0053
P(No|x') ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play=No) = 0.0206
Since P(Yes|x') < P(No|x'), we label x' as "No".
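The two numbers on the slide, reproduced as a small calculation (values taken from the look-up tables above):

```python
# Unnormalized posteriors for x' = (Outl=Sunny, Temp=Cool, Hum=High, Wind=Strong).
p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)    # ≈ 0.0053
p_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)    # ≈ 0.0206
print(round(p_yes, 4), round(p_no, 4))            # 0.0053 0.0206
print("Play =", "Yes" if p_yes > p_no else "No")  # Play = No
```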

7 Last lecture summary Binary classifier performance

8 TP, TN, FP, FN
Precision, Positive Predictive Value (PPV): TP / (TP + FP)
Recall, Sensitivity, True Positive Rate (TPR), Hit rate: TP / P = TP / (TP + FN)
False Positive Rate (FPR), Fall-out: FP / N = FP / (FP + TN)
Specificity, True Negative Rate (TNR): TN / (TN + FP) = 1 − FPR
Accuracy: (TP + TN) / (TP + TN + FP + FN)
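The same definitions restated as a small helper function (the function name and the example confusion-matrix counts are made up for illustration):

```python
def binary_metrics(tp, tn, fp, fn):
    """Performance measures of a binary classifier, from the confusion-matrix counts."""
    precision   = tp / (tp + fp)                   # positive predictive value (PPV)
    recall      = tp / (tp + fn)                   # sensitivity, TPR, hit rate
    fpr         = fp / (fp + tn)                   # false positive rate, fall-out
    specificity = tn / (tn + fp)                   # TNR = 1 - FPR
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, fpr, specificity, accuracy

# Example with made-up counts.
print(binary_metrics(tp=40, tn=45, fp=5, fn=10))
```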

9

10 Neural networks (new stuff)

11 Biological motivation
The human brain has been estimated to contain ~10^11 brain cells (neurons). A neuron is an electrically excitable cell that processes and transmits information by electrochemical signaling. Each neuron is connected with other neurons through connections called synapses. A typical neuron possesses a cell body (often called the soma), dendrites (many, on the order of millimetres long), and an axon (one, 10 cm – 1 m long).

12

13 A synapse permits a neuron to pass an electrical or chemical signal to another cell. A synapse can be either excitatory or inhibitory. Synapses are of different strengths (the stronger the synapse, the more important it is). The effects of synapses accumulate inside the neuron. When the cumulative effect of the synapses reaches a certain threshold, the neuron gets activated and a signal is sent along the axon, through which the neuron is connected to other neuron(s).

14

15 Neural networks for applied science and engineering, Samarasinghe

16 Threshold neuron
Warren McCulloch (1898–1969), Walter Pitts (1923–1969)

17 1st mathematical model of a neuron – the McCulloch & Pitts binary (threshold) neuron
– only binary inputs and output
– the weights are pre-set, no learning

x1    x2    t
0.2   0.3   0
0.2   0.8   0
0.2   …     0
1.0   0.8   1

18
x1    x2    t
0.2   0.3   0
0.2   0.8   0
0.2   …     0
1.0   0.8   1

19 Heaviside (threshold) activation function
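A minimal sketch of a binary threshold (McCulloch–Pitts style) neuron with a Heaviside activation. The weights and threshold below are hypothetical – the slide's actual pre-set values appear only in its figure – but they happen to reproduce the targets of the complete rows in the table on slide 17:

```python
def heaviside(u, threshold):
    """Heaviside (threshold) activation: fires (1) once the input reaches the threshold."""
    return 1 if u >= threshold else 0

def threshold_neuron(x, w, threshold):
    """Threshold neuron: weighted sum of the inputs passed through the step function."""
    u = sum(wi * xi for wi, xi in zip(w, x))
    return heaviside(u, threshold)

# Hypothetical weights (1, 1) and threshold 1.5, for illustration only.
for x in [(0.2, 0.3), (0.2, 0.8), (1.0, 0.8)]:
    print(x, "->", threshold_neuron(x, w=(1.0, 1.0), threshold=1.5))
# (0.2, 0.3) -> 0, (0.2, 0.8) -> 0, (1.0, 0.8) -> 1
```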

20

21

22 Perceptron (1957) – Frank Rosenblatt. He developed the learning algorithm and used his neuron (pattern recognizer = perceptron) for the classification of letters.

23

24 Multiple-output perceptron for multicategory (i.e. more than 2 classes) classification
– one output neuron for each class
– input layer, output layer
– single layer (one-layered) vs. double layer (two-layered)
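A minimal sketch of the multi-output perceptron described above: one output neuron per class, each computing a thresholded weighted sum of the inputs. The dimensions and weights are made up for illustration; predicting the class with the largest activation is one common convention, not necessarily the one used in the lecture.

```python
import numpy as np

def step(u):
    """Elementwise threshold (Heaviside) activation."""
    return (u >= 0).astype(int)

def multi_output_perceptron(x, W, b):
    """Single-layer perceptron with one output neuron per class.

    W holds one weight row per output neuron; the predicted class is the
    output neuron with the largest activation (before thresholding)."""
    a = W @ x + b                       # one activation per output neuron
    return step(a), int(np.argmax(a))

# Illustrative only: 3 classes, 4 inputs, random weights.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 4)), rng.normal(size=3)
x = np.array([0.5, -1.0, 0.2, 0.8])
print(multi_output_perceptron(x, W, b))
```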

25 Learning

26

27 Requirements for the minimum. The gradient (grad) is a vector pointing in the direction of the greatest rate of increase of the function. Since we want to descend, we take −grad.

28 Delta rule

29 error gradient

30 To find the gradient, differentiate the error E with respect to w1. According to the delta rule, the weight change is proportional to the negative of the error gradient. The new weight is the old weight plus this weight change.
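The formulas referenced on this and the next two slides appear only as images in the original; written out, with β the learning rate (introduced on the next slide) and i the iteration index:

```latex
\Delta w_1 = -\beta\,\frac{\partial E}{\partial w_1},
\qquad
w_1^{(i+1)} = w_1^{(i)} + \Delta w_1^{(i)}
            = w_1^{(i)} - \beta\left.\frac{\partial E}{\partial w_1}\right|_{w_1 = w_1^{(i)}}
```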

31 β is called the learning rate. It determines how far along the gradient it is necessary to move.

32 The new weight after the i-th iteration.

33 This is an iterative algorithm; one pass through the training set is not enough. One pass through the whole training data set is called an epoch. Adjusting the weights after each input pattern presentation (iteration) is called example-by-example (online) learning.
– For some problems this can cause the weights to oscillate – the adjustment required by one pattern may be cancelled by the next pattern.
– The next method (batch learning) is more popular.

34 Batch learning
– wait until all input patterns (i.e. an epoch) have been processed and then adjust the weights in the average sense
– more stable solution
– obtain the error gradient for each input pattern
– average them at the end of the epoch
– use this average value to adjust the weights using the delta rule
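A minimal sketch of one epoch of batch learning as described above, assuming a single linear neuron and a squared-error measure (the slides do not spell out the exact error function, so this is illustrative only):

```python
import numpy as np

def batch_epoch(X, t, w, beta=0.1):
    """One epoch of batch delta-rule learning for a linear neuron y = x @ w.

    The error gradient is computed for every input pattern, averaged over the
    epoch, and the weights are adjusted once using that average."""
    grads = []
    for x_n, t_n in zip(X, t):
        y_n = x_n @ w                      # neuron output for this pattern
        grads.append((y_n - t_n) * x_n)    # gradient of 0.5*(y - t)^2 w.r.t. w
    return w - beta * np.mean(grads, axis=0)

# Illustrative data and targets.
X = np.array([[0.2, 0.3], [0.2, 0.8], [1.0, 0.8]])
t = np.array([0.0, 0.0, 1.0])
w = np.zeros(2)
for epoch in range(100):
    w = batch_epoch(X, t, w)
print(w)
```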

35

36 Perceptron failure
Please help me and draw the following functions on the blackboard:
– AND, OR, XOR (eXclusive OR: true when exactly one of the operands is true, otherwise false)
[Figure: three plots over the unit square {0, 1} × {0, 1}, labelled AND, OR and XOR ???]
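For reference, the three truth tables the slide asks for, enumerated in a few lines (XOR is 1 exactly when the operands differ):

```python
from itertools import product

# Truth tables for AND, OR and XOR over binary inputs.
print("x1 x2 | AND OR XOR")
for x1, x2 in product([0, 1], repeat=2):
    print(f" {x1}  {x2} |  {x1 & x2}   {x1 | x2}   {x1 ^ x2}")
```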

37 The perceptron uses a linear activation function, so only linearly separable problems can be solved. 1969 – the famous book "Perceptrons" by Marvin Minsky and Seymour Papert showed that it was impossible for this class of network to learn an XOR function. They conjectured (incorrectly!) that a similar result would hold for a perceptron with three or more layers. The often-cited Minsky/Papert text caused a significant decline in interest in and funding of neural network research. It took ten more years until neural network research experienced a resurgence in the 1980s.

38 Play with http://isl.ira.uka.de/neuralNetCourse/2004/VL_11_5/Perceptron.html

