
Data Science
Credibility: Evaluating What’s Been Learned
Evaluating Numeric Prediction
WFH: Data Mining, Section 5.8
Rodney Nielsen
Many of these slides were adapted from: I. H. Witten, E. Frank and M. A. Hall

Credibility: Evaluating What’s Been Learned
Issues: training, testing, tuning
Predicting performance
Holdout, cross-validation, bootstrap
Comparing schemes: the t-test
Predicting probabilities: loss functions
Cost-sensitive measures
Evaluating numeric prediction
The Minimum Description Length principle

Evaluating Numeric Prediction
Same strategies: independent training, validation and test sets, significance tests, etc. (avoid cross-validation and bootstrapping for reporting)
Difference: error measures
Actual target values: $y_1, y_2, \ldots, y_N$
Predicted target values: $\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_N$
Most popular measure: mean-squared error
$$\text{MSE} = \frac{1}{N}\sum_{i=1}^{N} (\hat{y}_i - y_i)^2$$
Easy to manipulate mathematically
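The mean-squared error translates directly into code. Below is a minimal Python sketch, not taken from the slides; the helper name `mean_squared_error` (a hypothetical function, not scikit-learn's function of the same name) and the sample values are illustrative assumptions.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between predicted and actual values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n


# Made-up example values (illustrative only).
actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]
print(mean_squared_error(actual, predicted))  # 0.875
```

Squaring makes every error positive and gives the measure a smooth derivative, which is part of why it is easy to manipulate mathematically.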

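Other Measures
The root mean-squared error:
$$\text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}$$
The mean absolute error is less sensitive to outliers than the mean-squared error:
$$\text{MAE} = \frac{1}{N}\sum_{i=1}^{N} \lvert \hat{y}_i - y_i \rvert$$
Sometimes relative error values are more appropriate (e.g., 10% for an error of 50 when predicting 500)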
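To make the outlier claim concrete, here is a small Python sketch (not from the slides) comparing RMSE and MAE on hypothetical data with four errors of 1 and a single error of 20. The function names are illustrative assumptions; `relative_error` reproduces the slide's 10% example.

```python
import math


def root_mean_squared_error(actual, predicted):
    """Square root of the MSE; expressed in the same units as the target."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)


def mean_absolute_error(actual, predicted):
    """Average absolute difference; each error counts in proportion to its size."""
    n = len(actual)
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / n


def relative_error(actual_value, predicted_value):
    """Error as a fraction of the actual value (undefined when actual is 0)."""
    return abs(predicted_value - actual_value) / abs(actual_value)


# Hypothetical data: four errors of 1 and one outlier error of 20.
actual = [10.0, 10.0, 10.0, 10.0, 10.0]
predicted = [11.0, 9.0, 11.0, 9.0, 30.0]
print(mean_absolute_error(actual, predicted))      # 4.8
print(root_mean_squared_error(actual, predicted))  # about 8.99
print(relative_error(500, 550))                    # 0.1, i.e. 10%
```

Without the outlier both measures would equal 1; the single large error roughly doubles the MAE's reading when squared, because RMSE squares it before averaging.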

Improvement on the Mean
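How much does the scheme improve on simply predicting the average? The relative squared error is:
$$\text{RSE} = \frac{\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{N} (\bar{y} - y_i)^2}$$
where $\bar{y}$ is the average of the actual target values.
Root relative squared error:
$$\text{RRSE} = \sqrt{\frac{\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{N} (\bar{y} - y_i)^2}}$$
Relative absolute error:
$$\text{RAE} = \frac{\sum_{i=1}^{N} \lvert \hat{y}_i - y_i \rvert}{\sum_{i=1}^{N} \lvert \bar{y} - y_i \rvert}$$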
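A minimal Python sketch of the three relative measures follows (function names are illustrative, not from the slides). It assumes the baseline mean $\bar{y}$ is taken over the same actual values passed in; an implementation could instead use the mean of the training targets. A value of 1 means "no better than always predicting the mean"; values below 1 are improvements.

```python
import math


def relative_squared_error(actual, predicted):
    """Total squared error divided by that of always predicting the mean of actual."""
    mean = sum(actual) / len(actual)  # assumption: baseline mean from these actuals
    num = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    den = sum((mean - a) ** 2 for a in actual)
    return num / den


def root_relative_squared_error(actual, predicted):
    """Square root of the relative squared error."""
    return math.sqrt(relative_squared_error(actual, predicted))


def relative_absolute_error(actual, predicted):
    """Total absolute error divided by that of always predicting the mean."""
    mean = sum(actual) / len(actual)
    num = sum(abs(p - a) for a, p in zip(actual, predicted))
    den = sum(abs(mean - a) for a in actual)
    return num / den


# Hypothetical values.
actual = [2, 4, 6, 8]
predicted = [3, 4, 5, 9]
print(relative_squared_error(actual, predicted))       # 0.15
print(root_relative_squared_error(actual, predicted))  # about 0.387
print(relative_absolute_error(actual, predicted))      # 0.375
```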

Correlation Coefficient
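Measures the statistical correlation between the predicted values and the actual values
Pearson product-moment correlation coefficient, ρ:
$$\rho = \frac{\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2 \, \sum_{i=1}^{N} (y_i - \bar{y})^2}}$$
Scale independent, between −1 and +1
Good performance leads to large values (close to +1)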
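The formula can be sketched in Python as follows (the function name and sample values are illustrative assumptions). The $1/(N-1)$ factors of the covariance and variances cancel, so plain sums suffice.

```python
import math


def pearson_correlation(actual, predicted):
    """Pearson product-moment correlation between predicted and actual values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    s_pa = sum((p - mean_p) * (a - mean_a) for a, p in zip(actual, predicted))
    s_p = sum((p - mean_p) ** 2 for p in predicted)
    s_a = sum((a - mean_a) ** 2 for a in actual)
    if s_p == 0 or s_a == 0:
        # A constant sequence has zero variance, so rho is undefined.
        raise ValueError("correlation is undefined when an input has zero variance")
    return s_pa / math.sqrt(s_p * s_a)


# Hypothetical values.
actual = [2, 4, 6, 8]
predicted = [3, 4, 5, 9]
print(pearson_correlation(actual, predicted))  # about 0.93
```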

Pearson product-moment correlation coefficient
Examples of scatter diagrams with different values of correlation coefficient (ρ)

Pearson product-moment correlation coefficient
Several sets of (x, y) points, with the correlation coefficient of x and y for each set. Note that the correlation reflects the noisiness and direction of a linear relationship (top row), but not the slope of that relationship (middle row), nor many aspects of nonlinear relationships (bottom row). Note: the figure in the center has a slope of 0, but in that case the correlation coefficient is undefined because the variance of Y is zero.
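These caveats matter when ρ is used to score a predictor. The sketch below (illustrative names and hypothetical data, not from the slides) shows three cases: predictions off by a linear transform still score ρ = 1 despite a large MSE, a perfect but nonlinear relationship scores ρ = 0, and constant predictions leave ρ undefined.

```python
import math


def correlation(xs, ys):
    """Pearson correlation; raises ValueError if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("an input has zero variance")
    return sxy / math.sqrt(sxx * syy)


def mse(actual, predicted):
    """Mean-squared error, for contrast with the correlation."""
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# 1. Slope and offset are ignored: every prediction is wrong, yet rho = 1.
actual = [1.0, 2.0, 3.0, 4.0]
scaled = [2 * a + 10 for a in actual]
print(correlation(actual, scaled), mse(actual, scaled))  # 1.0 157.5

# 2. A perfect but nonlinear (quadratic) relationship gives rho = 0 here.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
squares = [x * x for x in xs]
print(correlation(xs, squares))  # 0.0

# 3. Constant predictions: rho is undefined (zero variance).
try:
    correlation(actual, [2.5] * 4)
except ValueError as err:
    print("rho is undefined:", err)
```

Case 1 is why a high correlation alone does not guarantee small errors.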

Which Measure?
Student Q: In what situations would we want to use the correlation coefficient as a performance measure for numeric prediction?
Best to look at all of them
Often it doesn’t matter
Example:

                             A       B       C       D
Root mean-squared error     67.8    91.7    63.3    57.4
Mean absolute error         41.3    38.5    33.4    29.2
Root rel squared error      42.2%   57.2%   39.4%   35.8%
Relative absolute error     43.1%   40.1%   34.8%   30.4%
Correlation coefficient     0.88    0.88    0.89    0.91

D best, C second-best, A and B arguable
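A small Python sketch (not from the slides) that ranks the four schemes on each error measure using the numbers from the example table, where lower is better; the correlation row is left out because higher is better there. The dictionary layout and `rank_schemes` helper are illustrative assumptions. The ranking suggests why A versus B is arguable: A comes out ahead on the squared-error measures and B on the absolute-error measures.

```python
# Error measures from the example table (lower is better); relative measures in percent.
results = {
    "Root mean-squared error": {"A": 67.8, "B": 91.7, "C": 63.3, "D": 57.4},
    "Mean absolute error": {"A": 41.3, "B": 38.5, "C": 33.4, "D": 29.2},
    "Root rel squared error": {"A": 42.2, "B": 57.2, "C": 39.4, "D": 35.8},
    "Relative absolute error": {"A": 43.1, "B": 40.1, "C": 34.8, "D": 30.4},
}


def rank_schemes(scores):
    """Return scheme names ordered from best (lowest error) to worst."""
    return sorted(scores, key=scores.get)


for measure, scores in results.items():
    # " > " reads as "better than".
    print(f"{measure}: {' > '.join(rank_schemes(scores))}")
```

D and C take first and second place on every measure, while A and B swap places depending on whether errors are squared.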
