Evaluating Hypotheses
Outline
Empirically evaluating the accuracy of hypotheses is fundamental to machine learning.
– Given the observed accuracy of a hypothesis over a limited sample of data, how well does this estimate its accuracy over additional examples?
– When data is limited, what is the best way to use this data to both learn a hypothesis and estimate its accuracy?
Motivation
We want to evaluate the performance of learned hypotheses as precisely as possible:
– to decide whether to use the hypothesis;
– because evaluating hypotheses is an integral component of many learning methods.
Estimating the future accuracy of a hypothesis given only a limited set of data raises two difficulties:
– bias in the estimate;
– variance in the estimate.
Estimating Hypothesis Accuracy
There is some space of possible instances X over which various target functions may be defined. A convenient way to model this setting is to assume there is some unknown probability distribution D that defines the probability of encountering each instance in X. The learning task is to learn the target concept, or target function, f by considering a space H of possible hypotheses.
The Estimation Problem
Given a hypothesis h and a data sample containing n examples drawn at random according to the distribution D:
– What is the best estimate of the accuracy of h over future instances drawn from the same distribution?
– What is the probable error in this accuracy estimate?
Sample Error and True Error
The sample error (error_S(h)) of hypothesis h with respect to target function f and data sample S is the fraction of S that h misclassifies:

$$\mathrm{error}_S(h) \equiv \frac{1}{n}\sum_{x \in S}\delta(f(x) \neq h(x))$$

where n is the number of examples in S, and δ(f(x) ≠ h(x)) is 1 if f(x) ≠ h(x) and 0 otherwise.
The true error (error_D(h)) of hypothesis h with respect to target function f and distribution D is the probability that h will misclassify an instance drawn at random according to D:

$$\mathrm{error}_D(h) \equiv \Pr_{x \in D}[f(x) \neq h(x)]$$
Sample Error and True Error
We want to know the true error error_D(h) of the hypothesis, because this is the error we can expect when applying the hypothesis to future examples. The error we can actually measure, however, is error_S(h). How good an estimate of error_D(h) is provided by error_S(h)?
Confidence Intervals for Discrete-Valued Hypotheses
Suppose h is a discrete-valued hypothesis, and suppose that
– the sample S contains n examples drawn independently of one another, and independently of h, according to the probability distribution D;
– n >= 30;
– hypothesis h commits r errors over these n examples.
Under these conditions, statistical theory allows us to make the following assertions:
– Given no other information, the most probable value of error_D(h) is error_S(h) = r/n.
– With approximately 95% probability, the true error error_D(h) lies in the interval

$$\mathrm{error}_S(h) \pm 1.96\sqrt{\frac{\mathrm{error}_S(h)\,(1 - \mathrm{error}_S(h))}{n}}$$
Example
Suppose the data sample S contains n = 40 examples and that hypothesis h commits r = 12 errors over this data. In this case, error_S(h) = 12/40 = 0.30.
Given no other information, the best estimate of the true error is error_D(h) = 0.30. If we were to collect a second sample S' containing 40 new randomly drawn examples, we might expect the sample error error_S'(h) to vary slightly from 0.30.
If we repeated this experiment over and over, each time drawing a new sample containing 40 new examples, we would find that for approximately 95% of these experiments the calculated interval would contain the true error. We therefore call this interval the 95% confidence interval estimate for error_D(h): 0.30 ± (1.96 × 0.07).
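A minimal Python sketch of this calculation (the function name error_confidence_interval is ours, for illustration only):

```python
import math

# 95% confidence interval for error_D(h), using the Normal
# approximation to the Binomial distribution (valid for n >= 30).
def error_confidence_interval(r, n, z=1.96):
    error_s = r / n                               # sample error
    sd = math.sqrt(error_s * (1 - error_s) / n)   # ~0.07 for r=12, n=40
    return error_s - z * sd, error_s + z * sd

low, high = error_confidence_interval(r=12, n=40)
print(f"95% CI: ({low:.2f}, {high:.2f})")         # about (0.16, 0.44)
```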
Confidence Intervals for Discrete-Valued Hypotheses
We can calculate the 68% confidence interval in this case to be 0.30 ± (1.00 × 0.07). It makes intuitive sense that the 68% confidence interval is smaller than the 95% confidence interval.

Confidence level N%:  50%   68%   80%   90%   95%   98%   99%
Constant z_N:         0.67  1.00  1.28  1.64  1.96  2.33  2.58
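Using the table above, the earlier sketch generalizes to an arbitrary confidence level (again only an illustrative sketch):

```python
import math

# z_N constants, copied from the table above.
Z = {50: 0.67, 68: 1.00, 80: 1.28, 90: 1.64, 95: 1.96, 98: 2.33, 99: 2.58}

def n_percent_interval(error_s, n, level):
    sd = math.sqrt(error_s * (1 - error_s) / n)
    return error_s - Z[level] * sd, error_s + Z[level] * sd

for level in (68, 95):
    low, high = n_percent_interval(0.30, 40, level)
    print(f"{level}% CI: ({low:.2f}, {high:.2f})")
```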
Basics of Sampling Theory
The following concepts from statistics are reviewed below:
– random variables
– probability distributions
– the expected value of a random variable
– the variance of a random variable
– the standard deviation
– the Binomial distribution
– the Normal distribution
– the estimation bias of an estimator Y
– N% confidence intervals
Random Variables
A random variable is a function from the sample space of an experiment to the set of real numbers; that is, a random variable assigns a real number to each possible outcome.
Example: suppose a coin is flipped 3 times, and let X(t) be the random variable that equals the number of heads that appear when t is the outcome. Then:
X(HHH) = 3
X(HHT) = X(HTH) = X(THH) = 2
X(TTH) = X(THT) = X(HTT) = 1
X(TTT) = 0
Distribution of a Random Variable
The distribution of a random variable X on a sample space S is the set of pairs (r, p(X = r)), where p(X = r) is the probability that X takes the value r. A distribution is usually described by specifying p(X = r) for each r.
For the three coin flips above, with a fair coin:
p(X = 3) = 1/8
p(X = 2) = 3/8
p(X = 1) = 3/8
p(X = 0) = 1/8
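This distribution can be checked by brute-force enumeration:

```python
from itertools import product
from collections import Counter

# Enumerate all 8 outcomes of 3 coin flips; X = number of heads.
outcomes = list(product("HT", repeat=3))
counts = Counter(o.count("H") for o in outcomes)

# p(X = r) for each r, assuming a fair coin (all outcomes equally likely).
distribution = {r: c / len(outcomes) for r, c in sorted(counts.items())}
print(distribution)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```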
Expected Values
The expected value of the random variable X(s) on the sample space S is equal to

$$E(X) = \sum_{s \in S} p(s)\,X(s)$$

Example: let X be the number that comes up when a fair die is rolled. What is the expected value of X?
E(X) = (1 + 2 + 3 + 4 + 5 + 6)/6 = 21/6 = 7/2
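The same computation in Python, using exact rational arithmetic:

```python
from fractions import Fraction

# E(X) = sum over outcomes of p(s) * X(s), for a fair six-sided die.
E = sum(Fraction(1, 6) * x for x in range(1, 7))
print(E)  # 7/2
```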
Variance and Standard Deviation
Let X be a random variable on a sample space S. The variance of X, denoted by V(X), is

$$V(X) = E\big[(X - E(X))^2\big]$$

The standard deviation of X, denoted by σ(X), is defined to be

$$\sigma(X) = \sqrt{V(X)}$$

Two useful facts:
– If X is a random variable on a sample space S, then V(X) = E(X^2) - E(X)^2.
– If X and Y are two independent random variables on a sample space S, then V(X + Y) = V(X) + V(Y). Furthermore, if X_i (i = 1, 2, ..., n) are pairwise independent random variables on S, then V(X_1 + X_2 + ... + X_n) = V(X_1) + V(X_2) + ... + V(X_n).
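A quick check of V(X) = E(X^2) - E(X)^2 for the die roll:

```python
from fractions import Fraction

xs = range(1, 7)
p = Fraction(1, 6)                    # fair die

E  = sum(p * x for x in xs)           # E(X)   = 7/2
E2 = sum(p * x * x for x in xs)       # E(X^2) = 91/6
print(E2 - E * E)                     # V(X)   = 35/12

# The direct definition V(X) = E[(X - E(X))^2] gives the same value.
assert sum(p * (x - E) ** 2 for x in xs) == E2 - E * E
```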
Bernoulli Trials and the Binomial Distribution
Each performance of an experiment with 2 possible outcomes is called a Bernoulli trial; the trials are assumed to be mutually independent.
Theorem: the probability of exactly r successes in n independent Bernoulli trials, with probability of success p and probability of failure q = 1 - p, is C(n, r) p^r q^(n-r).
This defines the Binomial distribution: b(r; n, p) = C(n, r) p^r q^(n-r).
Example: a coin is biased so that the probability of heads is 2/3. What is the probability that exactly 4 heads come up when the coin is flipped 7 times, assuming that the flips are independent?
C(7, 4) (2/3)^4 (1/3)^3 = 560/2187
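The example can be verified directly:

```python
from math import comb
from fractions import Fraction

# Binomial probability b(r; n, p) = C(n, r) * p^r * (1 - p)^(n - r)
def binomial_pmf(r, n, p):
    return comb(n, r) * p**r * (1 - p)**(n - r)

print(binomial_pmf(4, 7, Fraction(2, 3)))  # 560/2187
```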
The Binomial Distribution
The general setting to which the Binomial distribution applies is:
– There is a base, or underlying, experiment whose outcome can be described by a random variable, say Y.
– The probability that Y = 1 on any single trial of the underlying experiment is given by some constant p, independent of the outcome of any other trial.
– A series of n independent trials of the underlying experiment is performed, yielding values Y_1, ..., Y_n. Let R denote the number of trials for which Y_i = 1 in this series of n experiments.
– The probability that the random variable R will take on a specific value r is given by the Binomial distribution:

$$\Pr(R = r) = \binom{n}{r} p^r (1-p)^{n-r}$$
The Binomial Distribution
For a Binomially distributed random variable R:
– Expected value: E[R] = np
– Variance: Var(R) = np(1 - p)
– Standard deviation: σ_R = √(np(1 - p))
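These moments are easy to confirm by simulation; the parameters n = 40 and p = 0.3 below simply echo the earlier example:

```python
import random
from statistics import mean, pvariance

# Draw many Binomial(n, p) samples and compare sample moments
# with the theoretical E[R] = np and Var(R) = np(1 - p).
n, p, k = 40, 0.3, 100_000
draws = [sum(random.random() < p for _ in range(n)) for _ in range(k)]
print(mean(draws), n * p)                  # both close to 12
print(pvariance(draws), n * p * (1 - p))   # both close to 8.4
```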
Error Estimation and Estimating Binomial Proportions
Imagine that we run k such random experiments, measuring the random variables error_S1(h), error_S2(h), ..., error_Sk(h). If we allowed k to grow, the histogram of these values would approach the Binomial distribution.
The Binomial Distribution
Estimating the proportion p from a random sample of coin tosses is equivalent to estimating error_D(h): the probability p that a single random coin toss will turn up heads corresponds to the probability that a single instance drawn at random will be misclassified (p corresponds to error_D(h)).
The Binomial distribution depends on the specific sample size n and the specific probability of success p, i.e., error_D(h).
Error Estimation and Estimating Binomial Proportions
Measuring the sample error amounts to performing an experiment with a random outcome: collect a random sample S of n independently drawn instances from the distribution D, and then measure the sample error error_S(h). If we repeat this experiment many times, each error_Si(h) is a random variable.
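A small simulation of this setup; the true error p = 0.3 and sample size n = 40 are illustrative values matching the earlier example:

```python
import random

# One experiment: draw n instances, each misclassified with
# probability p = error_D(h), and measure the sample error.
def measure_sample_error(n, p):
    errors = sum(random.random() < p for _ in range(n))
    return errors / n

# Repeating the experiment: each error_Si(h) is a random variable.
samples = [measure_sample_error(n=40, p=0.3) for _ in range(10)]
print(samples)  # values scattered around 0.3
```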
Estimators, Bias, and Variance
Given that the random variable error_S(h) obeys a Binomial distribution, how is error_S(h) likely to differ from error_D(h)?
We have
– error_S(h) = r/n
– error_D(h) = p
Statisticians call error_S(h) an estimator for the true error error_D(h). The first question to ask of an estimator is whether, on average, it gives the right estimate.
Estimators, Bias, and Variance
The estimation bias of an estimator Y for an arbitrary parameter p is E[Y] - p.
If the estimation bias is zero, we say that Y is an unbiased estimator of p: the average of many random values of Y generated by repeated random experiments converges toward p.
error_S(h) obeys a Binomial distribution with mean error_D(h); thus error_S(h) is an unbiased estimator for error_D(h).
Note that in order for error_S(h) to give an unbiased estimate of error_D(h), the hypothesis h and the sample S must be chosen independently.
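An empirical check of unbiasedness, again with the illustrative values p = 0.3 and n = 40:

```python
import random

# Average many independently measured sample errors; for an unbiased
# estimator this average converges to the true error p.
p, n, k = 0.3, 40, 50_000
estimates = [sum(random.random() < p for _ in range(n)) / n for _ in range(k)]
print(sum(estimates) / k)  # close to 0.3
```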
Estimators, Bias, and Variance
Example: with n = 40 and r = 12, the standard deviation of r is √(np(1-p)) ≈ √(40 × 0.3 × 0.7) ≈ 2.9, so the standard deviation of error_S(h) = r/n is approximately 2.9/40 ≈ 0.07.
In general, given r errors in a sample of n independently drawn test examples, the standard deviation of error_S(h) is given by

$$\sigma_{\mathrm{error}_S(h)} = \sqrt{\frac{p(1-p)}{n}} \approx \sqrt{\frac{\mathrm{error}_S(h)\,(1 - \mathrm{error}_S(h))}{n}}$$

where the approximation comes from substituting the observed error_S(h) = r/n for the unknown p.
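Numerically:

```python
import math

r, n = 12, 40
error_s = r / n                                  # 0.3
sd_r = math.sqrt(n * error_s * (1 - error_s))    # std. dev. of r, ~2.9
print(round(sd_r, 2), round(sd_r / n, 3))        # 2.9 0.072
```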
Confidence Intervals
One common way to describe the uncertainty associated with an estimate is to give an interval within which the true value is expected to fall. An N% confidence interval for some parameter p is an interval that is expected with probability N% to contain p.
Since error_S(h) follows a Binomial probability distribution, to derive a 95% confidence interval we need only find the interval centered around the mean value error_D(h) that is wide enough to contain 95% of the probability mass.
Normal Distribution
It is difficult to find the size of the interval that contains N% of the probability mass for the Binomial distribution. For sufficiently large sample sizes, however, the Binomial distribution can be closely approximated by the Normal distribution.
The Normal distribution is the bell-shaped continuous distribution widely used in statistical inference. A random variable X with mean μ and standard deviation σ is normally distributed if its probability density function is given by

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
The Probability Density Function
The probability that X will fall into the interval (a, b) is given by

$$\Pr(a < X < b) = \int_a^b p(x)\,dx$$

The expected value of X is E[X] = μ, and the variance of X is Var(X) = σ².
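A sketch of these quantities in Python; the interval probability is computed via the standard erf-based closed form of the Normal CDF rather than by numerical integration:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def normal_cdf(x, mu=0.0, sigma=1.0):
    # Pr(X <= x) for X ~ Normal(mu, sigma^2)
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

print(normal_pdf(0.0))                       # peak of the standard Normal, ~0.399
# Pr(a < X < b) = CDF(b) - CDF(a); e.g. a standard Normal between +/-1.96:
print(normal_cdf(1.96) - normal_cdf(-1.96))  # ~0.95
```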
The 68-95-99.7 Rule
For a Normal distribution, about 68% of the data lies within 1 standard deviation of the mean, about 95% within 2 standard deviations, and about 99.7% within 3 standard deviations.
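The rule can be verified directly (normal_cdf as defined above, repeated here so the snippet runs on its own):

```python
import math

def normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Fraction of a Normal distribution within k standard deviations of the mean.
for k in (1, 2, 3):
    print(k, round(normal_cdf(k) - normal_cdf(-k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```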
Confidence Intervals
If a random variable Y obeys a Normal distribution with mean μ and standard deviation σ, then a measured random value y of Y will fall into the interval μ ± z_N σ N% of the time. Equivalently, the mean μ will fall into the interval y ± z_N σ N% of the time.
Confidence Intervals
With 95% confidence, the value of a standard Normal random variable will lie in the two-sided interval [-1.96, 1.96]; note that z_0.95 = 1.96.
Two approximations are involved in deriving the confidence interval for error_D(h):
1. In estimating the standard deviation σ of error_S(h), we have approximated error_D(h) by error_S(h).
2. The Binomial distribution has been approximated by the Normal distribution.