

Computer Vision, Lecture 8: Performance Evaluation
Oleh Tretiak © 2005

Slide 1: This Lecture
Estimates of random variables and confidence intervals
–Probability estimation
–Parameter estimation
Receiver operating characteristic
–Theory
–Experiment
Training and testing
–Bias due to re-testing
–Leave-one-out

Slide 2: Experiment
We develop a vision system that detects whether it is safe to cross the street. We test it 100 times and it works every time. What can we say about the probability of success?

Slide 3: Repeated Tosses of a Fair Coin
We toss a coin 10 times.
–How many heads?
–Numerical experiment: the function RAND() in Excel produces random numbers uniformly distributed from 0 to 1, so the expression IF(RAND()>0.5,1,0) produces 0 or 1, each 50% of the time.
–Number of 1's in 10 trials: 7, 6, 6, 6, 7
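A minimal sketch of the same numerical experiment in Python rather than Excel; the use of the random module and the fixed seed are choices made here for reproducibility, not part of the lecture.

import random

random.seed(1)  # arbitrary seed so the run is repeatable
for run in range(5):
    # IF(RAND()>0.5,1,0) translated: each toss is 1 with probability 0.5
    tosses = [1 if random.random() > 0.5 else 0 for _ in range(10)]
    print(f"run {run + 1}: {sum(tosses)} ones out of 10")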

Slide 4: Theory
The number of 1's in n trials is given by
P(k, n, p) = C(k, n) p^k (1-p)^(n-k),
where p is the probability of a 1 and C(k, n) = n!/(k!(n-k)!) is the binomial coefficient.
P(5, 10, 0.5) = 0.246
P(6, 10, 0.5) = 0.205
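A short sketch of this formula in Python (math.comb supplies the binomial coefficient), reproducing the two values quoted on the slide.

from math import comb

def binom_pmf(k, n, p):
    # C(k, n) p^k (1-p)^(n-k), with C(k, n) = n!/(k!(n-k)!)
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(round(binom_pmf(5, 10, 0.5), 3))  # 0.246
print(round(binom_pmf(6, 10, 0.5), 3))  # 0.205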

Slide 5:
We can say with 99% confidence that the probability of error is between 0 and 0.05, and with 99.9% confidence that it is between 0 and 0.07 (for the experiment of Slide 2: 100 trials, all successful).
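A rough numerical check of these bounds, under the assumption that the upper limit is the failure probability p at which 100 consecutive successes become as unlikely as 1% (or 0.1%); this interpretation is not spelled out on the slide.

# Solve (1 - p)^100 = alpha for p: the largest failure probability that would
# still produce 100 straight successes with probability alpha.
for alpha in (0.01, 0.001):
    p_upper = 1 - alpha ** (1 / 100)
    print(f"{(1 - alpha) * 100:.1f}% confidence: failure probability <= {p_upper:.3f}")
# prints roughly 0.045 and 0.067, i.e. about 0.05 and 0.07 as on the slide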

Slide 6: Parameter Estimation
We perform an experiment with a binary outcome n times and obtain k successes. What can we say about the underlying probability? The expected number of successes is np, and the standard deviation of the number of successes is [np(1-p)]^0.5. The usual estimate of the probability is p̂ = k/n. What range of values of p could have produced this result?

Slide 7: Cumulative Probability Plot
P(35) = 0.027, P(54) = 0.971. The interval [0.35, 0.54] therefore has a probability of 0.971 - 0.027 = 0.944, and we call it the 95% confidence interval for estimating the probability of success.
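A small sketch that reproduces the kind of numbers quoted here from the binomial cumulative distribution; n = 100 trials and p = 0.45 are assumptions inferred from the quoted values, not stated in the transcript.

from math import comb

def binom_cdf(k, n, p):
    # P(number of successes <= k)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p = 100, 0.45
lo, hi = binom_cdf(35, n, p), binom_cdf(54, n, p)
print(round(lo, 3), round(hi, 3), round(hi - lo, 3))  # roughly 0.027, 0.971, 0.944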

Slide 8: Experimental Design
We need to perform an experiment to test the reliability of a collection of parts. The test is destructive, so we want to test as few parts as possible. The test consists of examining n parts and rejecting the collection if k of them fail. The test is designed using the binomial formula. The design is complicated because:
–We need to specify the type of test to use, what constitutes a success or failure, and how many bad parts we are willing to accept.
–We need to specify how to select the parts to be tested.
–There are two types of errors: we need to decide acceptable levels for both.
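As one illustration of designing such a test from the binomial formula: the sample size, rejection threshold, and the two defect rates below are hypothetical numbers chosen only to show the two types of errors, not values from the lecture.

from math import comb

def prob_k_or_more_failures(k, n, p):
    # probability that k or more of the n tested parts fail, each failing with probability p
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n, k = 20, 2                 # hypothetical design: test 20 parts, reject the lot on 2 or more failures
p_good, p_bad = 0.01, 0.10   # hypothetical acceptable vs. unacceptable defect rates
print("P(reject a good lot) =", round(prob_k_or_more_failures(k, n, p_good), 3))
print("P(accept a bad lot)  =", round(1 - prob_k_or_more_failures(k, n, p_bad), 3))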

Slide 9: Two-Category Test
We perform a test on a patient and get a measurement x. There are two possibilities: the patient is healthy or sick. The density functions for the two possibilities are shown in the slide's figure. We choose a threshold t and decide that the patient is sick if the value x is higher than t. The probability that a healthy patient's measurement exceeds t is called P(False Positive); the probability that a sick patient's measurement exceeds t is called P(True Positive).
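A sketch of computing the two probabilities for one threshold t. The Gaussian densities, their means and standard deviations, and the particular threshold are assumptions for illustration; the slide's actual densities appear only in its figure.

from math import erf, sqrt

def gauss_tail(t, mean, sd):
    # P(x > t) when x is Gaussian with the given mean and standard deviation
    return 0.5 * (1 - erf((t - mean) / (sd * sqrt(2))))

t = 1.5
p_fp = gauss_tail(t, mean=0.0, sd=1.0)  # healthy patient measured above t
p_tp = gauss_tail(t, mean=2.0, sd=1.0)  # sick patient measured above t
print(f"P(FP) = {p_fp:.3f}, P(TP) = {p_tp:.3f}")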

Slide 10: ROC Plot
As the threshold t is varied, both P(FP) and P(TP) change. The curve traced out by the points (P(FP), P(TP)) is called the Receiver Operating Characteristic (ROC).
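Continuing the hypothetical Gaussian example above, the ROC can be traced by sweeping the threshold and recording each (P(FP), P(TP)) pair; the threshold range here is an arbitrary choice.

from math import erf, sqrt

def gauss_tail(t, mean, sd):
    return 0.5 * (1 - erf((t - mean) / (sd * sqrt(2))))

# sweep the threshold from -4 to 6; each value gives one (P(FP), P(TP)) point
roc = [(gauss_tail(t, 0.0, 1.0), gauss_tail(t, 2.0, 1.0))
       for t in [x / 10 for x in range(-40, 61)]]
for p_fp, p_tp in roc[::25]:  # print a few of the points
    print(round(p_fp, 3), round(p_tp, 3))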

Slide 11: Evaluating the ROC
The ROC lies in the unit square. Ideally, the curve should go up from (0, 0) to (0, 1) and then horizontally to (1, 1). Rather than evaluating one combination of the two probabilities, it is desirable to measure the whole curve. A single measure of quality is given by Az, the area under the operating curve.
–In the slide's figure, the red curve has Az = 1 and the blue curve has Az = 0.89.
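Az can be estimated from a sampled ROC with the trapezoidal rule; this sketch assumes a list of (P(FP), P(TP)) points such as the `roc` list built in the previous sketch, and the hand-made curve at the end is purely illustrative.

def area_under_roc(roc_points):
    # sort the points by P(FP) and apply the trapezoidal rule
    pts = sorted(roc_points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

print(area_under_roc([(0.0, 0.0), (0.1, 0.6), (0.3, 0.85), (1.0, 1.0)]))  # 0.8225 for this hand-made curve
# applied to the `roc` list from the previous sketch it gives roughly 0.92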

Slide 12: Experimental Results
In any experiment with actual data, the performance measure, such as the error probability or Az, should be treated as a random variable. The evaluation results should consist not only of the performance estimate but also of a confidence interval.

Slide 13: Performance of Classifiers
In the last lecture we considered the performance of classifiers, such as neural networks. Such classifiers contain a number of parameters whose values must be chosen to obtain correct operation.
–For example, if the input to the classifier is one-dimensional and the classifier uses a threshold to discriminate between two categories, we need to find the threshold from the training data.

Slide 14: Training and Evaluation
The quality of performance depends on the accuracy with which we estimate the parameters. Suppose we are measuring the error rate. We will denote the error rate obtainable with the best values of the classifier parameters by Pmin.
–To measure this probability we need a very large collection of samples, so that the confidence interval of the estimate is very small.
–If we do not use the right values of the parameters, then the error probability will be greater than Pmin.

Slide 15: Training Error
We find the classifier parameters by examining some training samples and adjusting the classifier to correctly identify these data.
–With a finite sample, the parameter values we obtain are random variables. Since they will differ from the best values, the performance with these parameters will be worse than with the best parameters. To avoid this we need a large training set.
–If we test the classifier on the same sample that was used for training, then the performance may seem better than the true performance. This is an optimistic bias, and it can be very large.

Slide 16: Independent Testing
To avoid the bias of testing on the same data as training, it is necessary to divide the available data into two groups: a training set and a test set. One set is used for training and the other for testing. If the total number of available data items is N, then one can use N1 items for training and N2 items for testing, with N = N1 + N2.
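A minimal sketch of such a split; the data range and the particular N1 = 70 below are placeholders, not values from the lecture.

import random

def split_data(items, n1, seed=0):
    # shuffle a copy, then take the first n1 items for training and the rest for testing
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n1], shuffled[n1:]

train_set, test_set = split_data(range(100), n1=70)
print(len(train_set), len(test_set))  # 70 training items, 30 test items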

Slide 17: Testing Dilemma
N = N1 + N2
To obtain accurate parameter values, N1 should be large.
To obtain accurate performance values, N2 should be large.

Slide 18: Leave-One-Out
One way of ameliorating this problem is to use the leave-one-out design:
–We exclude one data item and train the system on the other N-1 items.
–We test the system on the excluded item.
–We repeat this for each item in the set.
This produces training sets with N1 = N-1 elements, and each test item is independent of the set used to train for it.
–However, the N tests are not independent of one another, so confidence interval estimation is complicated.
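A sketch of leave-one-out applied to a toy one-dimensional threshold classifier; the (value, label) samples and the midpoint training rule are illustrative choices, not taken from the lecture.

def train_threshold(samples):
    # a very simple training rule: put the threshold midway between the two class means
    class0 = [x for x, y in samples if y == 0]
    class1 = [x for x, y in samples if y == 1]
    return (sum(class0) / len(class0) + sum(class1) / len(class1)) / 2

def leave_one_out_error(samples):
    errors = 0
    for i in range(len(samples)):
        held_out = samples[i]
        training = samples[:i] + samples[i + 1:]   # train on the other N-1 items
        t = train_threshold(training)
        predicted = 1 if held_out[0] > t else 0    # test on the excluded item
        errors += predicted != held_out[1]
    return errors / len(samples)

samples = [(0.1, 0), (0.4, 0), (0.6, 0), (1.2, 1), (1.5, 1), (2.0, 1)]
print(leave_one_out_error(samples))  # 0.0 for this toy, well-separated data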

Slide 19: Summary
Complex classifiers can be designed to correctly identify large data sets. However, they may not perform well on data they have not encountered.
–This is called the generalization problem.
To obtain a valid evaluation of performance, one must use independent training and test data.
The appropriate complexity of a classifier depends not only on the problem (the data distribution) but also on the size of the training set.
–A classifier with very many parameters may perform poorly when trained with a small set because the parameters are not estimated accurately.

