CSE 5331/7331 Fall 2011 P-Value and Statistical Significance Margaret H. Dunham Department of Computer Science and Engineering Southern Methodist University.


Outline
Overview
P-Value
Statistical Significance
Test Statistics
Examples

Data Mining - Remember
When we analyze data we may only be dealing with a sample of the complete data (which may be infinite).
We often want to generalize our findings to the entire data population.
Calculating a P-value can help us determine how likely our results are to apply to the entire population.
How do we do this?

Normal Distribution
Will G. Hopkins, “A New View of Statistics; P Values and Statistical Significance,” 2002, http://sportsci.org/resource/stats/pvalues.html

But what are we comparing?
Two different results:
–Employment rate for MSCS vs MBA
–Scores on a standardized test between male and female, or between the USA and another country
–Classification by two different classifiers
Sometimes a correlation coefficient or a test statistic whose distribution is known (such as chi squared) is examined.

Correlation Coefficient
When we look at statistical significance we will be comparing two values. The figure from Hopkins* shows the probability distribution of correlation coefficient values in a sample of size 20 when the true correlation is 0 (that is, no correlation).
* Will G. Hopkins, “A New View of Statistics,” 2002, http://www.sportsci.org/resource/stats/pvalues.html

Statistics Assumptions
Sample data is representative of the entire population.
Sample data is randomly chosen.
These assumptions often (usually) do not hold for a given real-world sample (training data).
–Real-world data format is unknown.
–May wish to extrapolate to a larger population.

Hypotheses
Alternative Hypothesis (H_A): This is the relationship between the variables that you expect (hope) your experiments will demonstrate.
Null Hypothesis (H_0): This is just the opposite: there is no relationship between the variables.
In significance testing we really determine whether we should reject the null hypothesis.

P-Value
The probability that a variable has a value greater than the observed value
http://en.wikipedia.org/wiki/P-value
http://sportsci.org/resource/stats/pvalues.html

P-Value
“The probability that a variate would assume a value greater than or equal to the observed value strictly by chance.”*
“the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.”**
The smaller the value the better (you want to reject the Null Hypothesis).
* Weisstein, Eric W. "P-Value." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/P-Value.html, 9/21/11.
** Wikipedia, “P-value”, http://en.wikipedia.org/wiki/P-value, 9/19/11.
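
As a rough illustration of the definitions above, the sketch below computes the probability of seeing a value at least as large as an observed one when the null distribution is standard normal. The function name `upper_tail_p` and the choice of a standard normal null are assumptions made for this example; they do not come from the slides.

```python
import math

def upper_tail_p(z):
    """P(Z >= z) for a standard normal Z (the assumed null distribution)."""
    # For a standard normal, P(Z >= z) = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical observation: 2 standard deviations above the mean.
p = upper_tail_p(2.0)
print(f"P(Z >= 2.0) = {p:.4f}")
```

The further the observed value lies from the mean, the smaller this probability becomes.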

Finding P-value
You may be able to calculate the P-value directly, or you may have to convert the data (by using a test statistic) into a value that can be used.
Example: Find the correlation between two variables. The correlation coefficient can be used directly if we know its distribution.
However, in some cases a new statistic is used to provide that P-value.
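
When the distribution of the correlation coefficient is not at hand, one way to approximate it is a permutation test, which the slides do not mention: shuffle one variable many times and count how often the shuffled correlation is at least as extreme as the observed one. The sketch below assumes made-up data and hypothetical helper names (`pearson_r`, `permutation_p`).

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def permutation_p(xs, ys, trials=5000, seed=0):
    """Two-tailed P-value: share of shuffles with |r| >= the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # breaks any real pairing between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return hits / trials

# Hypothetical sample of size 20 with a strong built-in relationship.
xs = list(range(20))
ys = [2 * x + (x * 7) % 5 for x in xs]
print(f"r = {pearson_r(xs, ys):.3f}, permutation P-value = {permutation_p(xs, ys):.4f}")
```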

Another Way to Look at It
The P-value is a measure of how much evidence you have against the Null Hypothesis.
Null Hypothesis: hypothesis of no change
Critical Regions – values of the statistic for which, if they occur, you will reject the Null Hypothesis

Confidence Intervals
Range of values that is likely to contain the true parameter.
Size depends on sample size and variance.
Larger sample size, smaller interval.
Larger variance, larger interval.
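
The effect of sample size on interval width can be sketched as follows. This is a minimal example that assumes a normal approximation (critical value 1.96) and invented data; `mean_ci_95` is a hypothetical helper, not something defined in the slides.

```python
import math
import statistics

def mean_ci_95(sample):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    se = statistics.stdev(sample) / math.sqrt(n)
    half_width = 1.96 * se  # 1.96 is the two-sided 95% normal critical value
    return mean - half_width, mean + half_width

small = [4.0, 6.0, 5.0, 7.0, 3.0]   # hypothetical data, n = 5
large = small * 20                  # same spread of values, n = 100

for name, data in (("n=5", small), ("n=100", large)):
    low, high = mean_ci_95(data)
    print(f"{name}: ({low:.2f}, {high:.2f}), width {high - low:.2f}")
```

With the same spread of values, the interval for the larger sample is much narrower.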

P-Value and Confidence Intervals
Confidence Intervals and P-Values are related.
However, a 95% confidence interval is not the same thing as P=0.05. A 95% confidence interval “that doesn’t overlap zero” is the same as a P-value below 0.05. [3]
“If the 95% CI includes no difference between groups, then the P value is > 0.05.” [1]
“If the 95% CI does not include no difference between groups then the P value is < 0.05.” [1]
* Will G. Hopkins, “A New View of Statistics,” 2002, http://www.sportsci.org/resource/stats/pvalues.html

So …
So, if we obtain a data value from a population distribution that is normal AND we know that it occurs in one of the 0.2% wings of the distribution, then we should be somewhat convinced that that value is less likely (although still possible) to be seen than the more commonly occurring data values.
How sure are you that the results you’ve found in the experiments are actually true?

Statistical Significance
A way to assign a confidence value to a finding.
Probability that a finding is true in the general population and not a fluke (not due to chance or randomness).
A significance level is a measure of how likely it is that the result is due to chance.
The P-value is a test or measure of statistical significance.
What is the probability that a relationship between two variables exists?
What is the probability that this relationship (which our experiments seem to indicate exists) is due only to random chance?

Using Significance Test
State Alternative Hypothesis.
State Null Hypothesis.
Perform research.
Identify statistic and its distribution.
Decide on alpha threshold.
Calculate statistic.
Calculate P-Value.
Compare P-Value to threshold:
–If lower, the probability is small that the result was by chance, therefore the finding is significant.
–If higher, then the finding is not significant.

Probability of Error
Alpha Level (Threshold)
Threshold value to make decision
Rule of thumb: 0.05
Since you hope to reject the Null Hypothesis, you hope you’ll find p < 0.05.
This means you are beyond (outside in the distribution) the Alpha level and your chance of a Type I error is acceptable.

Significance Levels
Rule of thumb: 0.95 – The result has a 95% chance of being true.
However, results are being stated in terms of not being true (i.e., P-value = 1 – Level), so a 0.05 value is good.
So p<0.05 indicates a significant result.
The smaller the p value the better: p=0.001 is better than p=0.05.
If we know what the distribution of the statistic is, then we can estimate the extreme areas outside (furthest from the mean). This can tell us the significance.

Sample P-Value Levels
Suppose your threshold is P=0.05.
P-Value          Wording
> 0.05           Not Significant
0.01 to 0.05     Significant
0.001 to 0.01    Very Significant
< 0.001          Extremely Significant
Table from GraphPad, “Interpreting statistical significance,” 1999, http://www.graphpad.com/articles/interpret/principles/stat_sig.htm.
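
The wording table above can be turned into a small lookup. The table does not say which label applies exactly at a boundary such as 0.01, so the handling of boundary values below is an assumption, as is the function name `describe_p`.

```python
def describe_p(p):
    """Map a P-value to the wording in the GraphPad table.

    Boundary values (exactly 0.05, 0.01, 0.001) are assigned to the
    less extreme label; the table itself leaves this unspecified.
    """
    if p > 0.05:
        return "Not Significant"
    if p >= 0.01:
        return "Significant"
    if p >= 0.001:
        return "Very Significant"
    return "Extremely Significant"

for p in (0.2, 0.03, 0.004, 0.00005):
    print(p, "->", describe_p(p))
```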

DM & Significance Tests
In MOST data mining experiments we want to determine if the results we found in the sample can be generalized to the general population.
So the two variables being compared are the one found in the sample and the one that should be found in the population (if we could even get the entire population set).

Two-Tailed vs One-Tailed
Whether the test considers both sides (tails) of the statistic's distribution.
If the direction of the difference or relationship is important, then a one-tailed probability is used; otherwise two-tailed.
Will G. Hopkins, “A New View of Statistics; P Values and Statistical Significance,” 2002, http://sportsci.org/resource/stats/pvalues.html
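
The difference between the two can be sketched for a statistic that is standard normal under the null hypothesis (an assumption made for this example; the function names are hypothetical as well). The two-tailed probability counts extreme values on both sides, so it is twice the one-tailed probability for the same positive statistic.

```python
import math

def one_tailed_p(z):
    """P(Z >= z): only values in the expected (upper) direction count as extreme."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def two_tailed_p(z):
    """P(|Z| >= |z|): extreme values in either direction count."""
    return math.erfc(abs(z) / math.sqrt(2))

z = 1.8  # hypothetical observed statistic
print(f"one-tailed: {one_tailed_p(z):.4f}")
print(f"two-tailed: {two_tailed_p(z):.4f}")
```

For this statistic the result falls below a 0.05 threshold with the one-tailed test but not with the two-tailed test, which is why the choice should be made before looking at the data.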

Types of Errors
Type I: Conclude a relationship exists when in fact it doesn’t and evidence shows that it doesn’t.
–Null hypothesis should be accepted.
–Alpha error
Type II: Conclude a relationship does not exist when in fact it does and the evidence shows it does.
–Null hypothesis should be rejected.
–Beta error
To be on the safe side, try to minimize Type I errors.
–Use a low Alpha probability.
–Alpha level is the probability of making a Type I error that you are willing to accept.
–Often Alpha=0.05 or even 0.01.
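
A small simulation can make the link between Alpha and Type I errors concrete. The sketch below repeatedly draws samples for which the null hypothesis really is true and counts how often a two-tailed z-test rejects it anyway. The z-test with known standard deviation, the sample size, the trial count, and the function names are all assumptions chosen for illustration.

```python
import math
import random

def z_test_p(sample, mu0=0.0, sigma=1.0):
    """Two-tailed P-value for a z-test of the mean (sigma assumed known)."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))

def type_i_rate(alpha, trials=2000, n=30, seed=0):
    """Fraction of trials that reject a null hypothesis that is actually true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # The null is true here: the population mean really is 0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p(sample) < alpha:
            rejections += 1
    return rejections / trials

for alpha in (0.05, 0.01):
    print(f"alpha={alpha}: observed Type I rate {type_i_rate(alpha):.3f}")
```

The observed rejection rate should land close to the chosen Alpha, and lowering Alpha lowers it.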

Test Statistics
So how do we calculate the P-Value?
We need to convert the raw data generated by experiments into a single value which we know follows some known frequency distribution.
We can then calculate the P-Value based on that.
These are called Test Statistics.

Standard Test Statistics
Common Test Statistics
http://en.wikipedia.org/wiki/Statistical_hypothesis_testing
T – Compares distributions of means of two groups
http://www.socialresearchmethods.net/kb/stat_t.php
http://mathworld.wolfram.com/Studentst-Distribution.html
http://www.stattools.net/tTest_Tab.php
Conversion between
http://www.graphpad.com/quickcalcs/pvalue1.cfm

Chi Square Test
Experiments may actually generate nominal or ordinal data.
How do you convert these values into numbers that can be evaluated?
Often a Chi Square Test is used. Here the collected data is summarized and then converted into a statistic that can be used to measure how the values compare to what would be expected randomly.

Chi Square Distribution
Not normal, but approaches normal as the degrees of freedom (number of independent variables) increase.
http://stattrek.com/lesson3/chisquare.aspx

Chi Square Test Process
Complete a contingency table based on two subset types and possible values.
Even though actual values are not numeric, when we complete the table we then have counts (frequencies) which are numeric and can be analyzed.
The Chi Squared Statistic is calculated by comparing the observed frequency values to the expected frequency values.
This statistic is what is examined for significance.
It follows a chi squared distribution.
Note: http://epm.sagepub.com/content/52/1/57.short

Chi Squared Example
Suppose we compare the hiring rate for recent MS CS graduates to MBA graduates. We find that 150 out of 200 MS students have a job and 100 out of 200 MBA students have a job.
Hypothesis: MSCS students are hired at a higher rate than MBA students.
Null Hypothesis: MBA and MSCS students are hired at the same rate.
Suppose we use p=0.05 as the significance level.

Chi Squared Example (cont’d)
250 students have jobs, so 250/400=0.625
150 do not have jobs, so 150/400=0.375
We would expect these percentages to hold for each population if the Null Hypothesis is true.
150/200=0.75 MSCS have jobs
100/200=0.5 MBA have jobs
We would expect 0.625 * 200 = 125 in each group to have jobs.
          MSCS   MBA   Total
Job        150   100     250
No Job      50   100     150
Total      200   200     400
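
The expected counts under the Null Hypothesis follow from the row and column totals: each expected cell is (row total × column total) / grand total. A minimal sketch, with `expected_counts` as a hypothetical helper name:

```python
def expected_counts(table):
    """Expected cell counts if rows and columns were independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

# Rows: Job, No Job. Columns: MSCS, MBA (counts from the table above).
observed = [[150, 100],
            [50, 100]]
print(expected_counts(observed))  # [[125.0, 125.0], [75.0, 75.0]]
```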

Chi Squared Example (cont’d)
Calculate Chi Square: http://www.graphpad.com/quickcalcs/contingency1.cfm
Chi Square value is 26.67.
How do we convert this to a P-Value? http://www.statsoft.com/textbook/distribution-tables/#chi
P-Value is <0.05.
Reject the Null Hypothesis.
Yes, it is significant. (Actually p<0.0001, so it is extremely significant.)
http://course1.winona.edu/sberg/Equation/chi-squa.gif
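
The value 26.67 can be reproduced with the Pearson statistic, the sum over all cells of (observed − expected)² / expected, computed without a continuity correction. The sketch below is one way to do that using only the standard library; the function names are hypothetical, and the P-value formula used applies only to 1 degree of freedom, which is what a 2×2 table has.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic (no continuity correction) and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / total
            chi2 += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

def p_value_df1(chi2):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    # A chi-square variable with 1 df is a squared standard normal,
    # so P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

# Rows: Job, No Job. Columns: MSCS, MBA.
observed = [[150, 100],
            [50, 100]]
chi2, df = chi_square(observed)
print(f"chi-square = {chi2:.2f}, df = {df}, P-value = {p_value_df1(chi2):.2e}")
```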

Example 1 – Iris Data
J48 Default Cross Validation
We have the Kappa statistic but can’t use it to calculate a P-value without its variance.
http://twiki.org/p/pub/Main/SigurdurRunarSaemundsson/Interrater_agreement.Kappa_statistic.pdf

Example 1 – Iris Data
=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
0.98      0         1           0.98     0.99        0.99       Setosa
0.94      0.03      0.94        0.94     0.94        0.952      Versicolor
0.96      0.03      0.941       0.96     0.95        0.961      Virginica
0.96      0.02      0.96        0.96     0.96        0.968      Weighted Avg.
=== Confusion Matrix ===
 a    b    c    <-- classified as
49    1    0  | a = Setosa
 0   47    3  | b = Versicolor
 0    2   48  | c = Virginica
View the confusion matrix as a contingency table and calculate Chi Squared.

Example 1 (cont’d)
http://faculty.vassar.edu/lowry/newcs.html
Chi Square = 266
P < 0.0001
Extremely statistically significant
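
The same Pearson calculation can be applied to the 3×3 confusion matrix. A 3×3 table has (3 − 1) × (3 − 1) = 4 degrees of freedom, and for 4 degrees of freedom the upper-tail probability has the closed form exp(−x/2) · (1 + x/2). The sketch below assumes that formula and uses hypothetical function names; it is an illustration, not the calculator linked on the slide.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = sum(
        (obs - row_totals[i] * col_totals[j] / total) ** 2
        / (row_totals[i] * col_totals[j] / total)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

def p_value_df4(chi2):
    """Upper-tail probability of a chi-square distribution with 4 degrees of freedom."""
    half = chi2 / 2
    return math.exp(-half) * (1 + half)

# Rows: actual class; columns: predicted class (Setosa, Versicolor, Virginica).
confusion = [[49, 1, 0],
             [0, 47, 3],
             [0, 2, 48]]
chi2, df = chi_square(confusion)
print(f"chi-square = {chi2:.1f}, df = {df}, P-value = {p_value_df4(chi2):.2e}")
```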

Example 2
Nektarios Leontiadis, Tyler Moore and Nicolas Christin. "Measuring and Analyzing Search-Redirection Attacks in the Illicit Online Prescription Drug Trade". 20th USENIX Security Symposium. August 10-12, 2011: San Francisco, CA.
http://cs.wellesley.edu/~tmoore/usenix11.pdf

References
1. American College of Physicians-American Society of Internal Medicine, “Primer on 95% Confidence Intervals,” Effective Clinical Practice, September/October 2001, Vol 4, No 6, pp 229-231.
2. Will G. Hopkins, “A New View of Statistics,” 2002, http://www.sportsci.org/resource/stats/pvalues.html
3. M. A. Saint-Germain, “PPA 696 Research Methods,” 1/1/01, http://www.csulb.edu/~msaintg/ppa696/696menu.htm
