Presentation on theme: "STATISTICAL ANALYSIS. Your introduction to statistics should not be like drinking water from a fire hose!!"— Presentation transcript:
1 STATISTICAL ANALYSIS. Your introduction to statistics should not be like drinking water from a fire hose!!
2 What do you mean by data?? Nature of the Data
Two main types: categorical or continuous
1. Categorical:
Nominal (unordered, unequal categories). E.g.: Female=1 and male=2
Ordinal (ordered, unequal or ranked categories). E.g.: 1=SD 2=D 3=N 4=A 5=SA
2. Continuous:
Interval (ordered, equal intervals, no true zero). E.g.: 5-point Likert scale with equal intervals or IQ score
Ratio (ordered, equal intervals with absolute zero). E.g.: raw scores, class attendance (in days); age (in years)
Descriptive statistics: Procedures used for summarizing the data in both numerical and graphic form. Includes frequencies, distributions, percents, cumulative percents, pie charts, bar graphs, histograms and scatter plots.
(Cross-tabulations: summarize the relationship between two variables, like a scatter plot but in table form.)
Measures of central tendency:
Mean: arithmetic average (interval & ratio data only)
Mode: most frequent value; a distribution can be bimodal or multimodal (all types)
Median: midpoint with equal halves above and below (ordinal, interval and ratio)
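As a rough illustration of the three measures of central tendency, the sketch below uses Python's standard statistics module. The scores and survey answers are made-up values, not data from the presentation.

```python
import statistics

# Hypothetical ratio-level data (e.g. days of class attendance); not from the slides.
scores = [2, 3, 3, 5, 7, 10]

mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # midpoint of the sorted values
score_modes = statistics.multimode(scores)  # list of the most frequent value(s)

# The mode is the only one of the three that also works for nominal data.
# Hypothetical Likert-style answers, treated here purely as labels.
answers = ["A", "SA", "A", "N", "D", "A"]
answer_modes = statistics.multimode(answers)

print("mean:", mean_score)
print("median:", median_score)
print("mode(s) of scores:", score_modes)
print("mode(s) of answers:", answer_modes)
```

multimode returns a list so that bimodal or multimodal data report every tied value rather than just one.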
3 Statistics 101!!
Measures of location: mean vs. median, and why
Measures of scale: range, interquartile range, standard deviation (and variance)
Measures of position: percentiles, deciles, quartiles, median
Note. For categorical variables, we use proportions as the descriptive statistics.
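The measures of scale and position listed above can be sketched with the standard library as follows. The data are hypothetical. Quartile conventions differ between textbooks and software; this sketch simply uses the default ("exclusive") method of statistics.quantiles.

```python
import statistics

# Hypothetical raw scores; not from the slides.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of scale
data_range = max(data) - min(data)
sample_variance = statistics.variance(data)  # uses the n - 1 denominator
sample_sd = statistics.stdev(data)           # square root of the sample variance

# Measures of position: quartiles split the sorted data into four parts,
# deciles into ten. The second quartile is the median.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
deciles = statistics.quantiles(data, n=10)

print("range:", data_range)
print("variance:", sample_variance, "standard deviation:", sample_sd)
print("quartiles:", q1, q2, q3, "IQR:", iqr)
print("deciles:", deciles)
```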
4 Why does lack of normality cause problems? When we calculate the p-value for an inference test, we find the probability of getting a sample at least as far from the hypothesized value as ours through sampling variability alone. Basically, we are trying to see whether a recorded value could have occurred by chance and chance alone. When we look up a p-value for a test about a mean, we are assuming that the means of all samples of the given sample size are normally distributed around the population mean. This is why we can use the test statistic, which is the number of standard errors the sample mean lies away from the hypothesized population mean. Therefore, without (at least approximate) normality, the p-value found this way cannot be trusted.
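To make the role of the normal curve concrete, here is a minimal sketch of how a one-sided p-value is read off it. It assumes a z-test with a known population standard deviation, and every number in it is hypothetical. The conversion from test statistic to p-value is only meaningful if sample means really are (approximately) normally distributed.

```python
import math
from statistics import NormalDist

# Hypothetical setup: all four values are illustrative assumptions.
hypothesized_mean = 100.0  # population mean under the null hypothesis
population_sd = 15.0       # assumed known
n = 36                     # sample size
sample_mean = 105.0        # observed sample mean

# Spread of sample means of size n around the population mean.
standard_error = population_sd / math.sqrt(n)

# Test statistic: how many standard errors the sample mean lies from the
# hypothesized mean.
z = (sample_mean - hypothesized_mean) / standard_error

# One-sided p-value: chance of a sample mean at least this large if the null
# hypothesis is true AND sample means follow a normal curve.
p_value = 1.0 - NormalDist().cdf(z)

print("z =", z)
print("one-sided p-value =", round(p_value, 4))
```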
5 There are non-parametric tests which are similar to the parametric tests. The following table shows how some of the tests match up.
Parametric Test | Goal for Parametric Test | Non-Parametric Test | Goal for Non-Parametric Test
Two Sample T-Test | To see if two samples have identical population means | Wilcoxon Rank-Sum Test | To see if two samples have identical population medians
One Sample T-Test | To test a hypothesis about the mean of the population a sample was taken from | Wilcoxon Signed Ranks Test | To test a hypothesis about the median of the population a sample was taken from
Chi-Squared Test for Goodness of Fit | To see if a sample fits a theoretical distribution, such as the normal curve | Kolmogorov-Smirnov Test | To see if a sample could have come from a certain distribution
ANOVA | To see if two or more sample means are significantly different | Kruskal-Wallis Test | To test if two or more sample medians are significantly different
6 What is different about Non-Parametric Statistics? Sometimes statisticians use what is called “ordinal” data. This data is obtained by taking the raw data and giving each observation a rank. These ranks are then used to create test statistics.
In non-parametric statistics, one deals with the median rather than the mean. Since a mean can be easily influenced by outliers or skewness, and we are not assuming normality, the mean is no longer a reliable summary. The median is another measure of location, which makes more sense in a non-parametric test. The median is considered the center of a distribution.
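The ranking step can be sketched as below. The helper name average_ranks and the data are illustrative assumptions. Giving tied values the average of the ranks they would occupy is a common convention, but not the only one.

```python
def average_ranks(values):
    """Return 1-based ranks in the original order; ties share their mean rank."""
    # Indices of the observations, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend 'end' over the run of observations tied with order[start].
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Mean of the 1-based positions start+1 .. end+1.
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


# Hypothetical raw data with one tie (the two 7s).
raw = [12, 7, 7, 20, 15]
ranks = average_ranks(raw)
print(ranks)
```

Whatever the ties, the ranks of n observations always add up to n(n + 1)/2, which is a quick sanity check on any ranking routine.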
7 Drawing a histogram: the good, the bad and the downright ugly!! Many modern introductory texts confuse frequency graphs, relative frequency graphs, and histograms.
[Figure: two example graphs, labeled “Bad” and “Good”]
8 What's the difference between a bar chart & a histogram??
9 Critical Values
For a given number of degrees of freedom, by the properties of the t-distribution, we know how large the t-statistic must be in order to reject the null.
We call that number the “critical value” of the t-statistic; it is typically looked up in a table of the t-distribution.
If the value of the t-statistic calculated from the data is greater than this critical value, then we “reject the null hypothesis.”
- This is because, for t-statistics greater than this critical value, our probability of falsely rejecting the null hypothesis is very small (no larger than the chosen significance level).
10 Example
Suppose our null hypothesis is that the population mean of X is not greater than 0.
The sample mean is 3; the sample standard deviation is 2; there are 121 observations.
Step 1. We need to establish our “critical value.” We wish to reject the null hypothesis at the 5% significance level (loosely, if we are 95% certain that it is false). For 121 observations and a “one-tailed test,” the critical value is 1.66 (which we look up in the table; this corresponds to a significance level of .05 with 120 degrees of freedom).
Step 2. The t-statistic is $t = (3 - 0) / (2 / \sqrt{121}) = 3 / 0.182 \approx 16.5$.
Step 3. Compare the t-statistic with the critical value. If the t-statistic is greater than the critical value, then you can reject the null hypothesis.
In this case, 16.5 is greater than 1.66, so we can reject the null hypothesis that the mean of X is not greater than zero.
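The three steps of this example can be checked with a few lines of Python. The variable names are illustrative, and the critical value is taken from the slide rather than computed. Without rounding the standard error the statistic comes out at about 16.5; rounding the standard error to 0.18 first gives about 16.7. Either way the conclusion is the same.

```python
import math

# Values from the example.
sample_mean = 3.0
hypothesized_mean = 0.0
sample_sd = 2.0
n = 121

# Step 1: one-tailed critical value at the .05 level with 120 degrees of
# freedom, as read from a t-table on the slide.
critical_value = 1.66

# Step 2: the t-statistic is the distance between the sample mean and the
# hypothesized mean, measured in standard errors.
standard_error = sample_sd / math.sqrt(n)  # 2 / 11
t_statistic = (sample_mean - hypothesized_mean) / standard_error

# Step 3: reject the null hypothesis if the statistic exceeds the critical value.
reject_null = t_statistic > critical_value

print("standard error:", round(standard_error, 3))
print("t-statistic:", round(t_statistic, 2))
print("reject null hypothesis:", reject_null)
```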
11 Example
The table below is a sample “cross-tab.” Your research hypothesis is that dog ownership and gender are related. How do you test this hypothesis?
 | Dog-Owners | No Pets | Totals
Men | 100 | 400 | 500
Women | 50 | 450 | 500
Totals | 150 | 850 | 1,000
12 Hypothesis Tests about tables
Step 1. Define null and research hypotheses. The null hypothesis will usually be that there is no relationship between the rows and the columns.
Step 2. Determine your tolerance for falsely rejecting the null hypothesis of no relationship.
Step 3. Empirically analyse the data to determine if there is a relationship.
13 Example
To test for independence:
1) Identify the number of respondents in each internal cell of the table
2) Calculate the number of respondents who would be in each cell if rows and columns were independent: row proportion times column proportion times the grand total (shown in parentheses in the table)
e.g. cell(1,1) = .5 * .15 * 1000 = 75
cell(1,2) = .5 * .85 * 1000 = 425
3) Compute the chi-squared test statistic (next slide)
 | Dog-Owners | No Pets | Totals
Men | 100 (75) | 400 (425) | 500
Women | 50 | 450 | 500
Totals | 150 | 850 | 1,000
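Step 2 can be sketched in Python as follows. The variable names are illustrative. The code uses the equivalent shortcut row total times column total divided by grand total, which gives the same numbers as multiplying the two proportions by 1,000.

```python
# Observed counts from the cross-tab: rows are Men, Women; columns are
# Dog-Owners, No Pets.
observed = [
    [100, 400],
    [50, 450],
]

row_totals = [sum(row) for row in observed]        # 500, 500
col_totals = [sum(col) for col in zip(*observed)]  # 150, 850
grand_total = sum(row_totals)                      # 1000

# Expected count under independence:
# (row total / N) * (column total / N) * N = row total * column total / N
expected = [
    [row_total * col_total / grand_total for col_total in col_totals]
    for row_total in row_totals
]

for label, row in zip(["Men", "Women"], expected):
    print(label, row)
```

Because men and women each make up half the sample, both rows get the same expected counts.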
14 The Chi-Square Test Statistic
To test for independence:
3) Compute the chi-squared test statistic
The chi-squared test statistic is simply:
$\chi^2 = \sum_{\text{rows}} \sum_{\text{columns}} \frac{(\text{Observed}_{\text{row},\text{column}} - \text{Expected}_{\text{row},\text{column}})^2}{\text{Expected}_{\text{row},\text{column}}}$
The chi-squared statistic follows a chi-squared distribution with degrees of freedom = (rows – 1) (columns – 1).
15 Example
If we look at our table of the $\chi^2$ distribution with 1 degree of freedom, the critical value for our test statistic is 3.84.
$\chi^2 = (100 - 75)^2 / 75 + (400 - 425)^2 / 425 + (50 - 75)^2 / 75 + (450 - 425)^2 / 425 = 19.6$
In this case, we reject the null hypothesis that the two variables (dog ownership and gender) are statistically independent, because our test statistic is greater than our critical value.
 | Dog-Owners | No Pets | Totals
Men | 100 (75) | 400 (425) | 500
Women | 50 | 450 | 500
Totals | 150 | 850 | 1,000
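The arithmetic of this example can be reproduced with a short sketch. The observed counts, the expected counts (75 and 425 in each row) and the critical value are taken from the slides; the variable names are illustrative.

```python
# Observed and expected counts; rows are Men, Women; columns are Dog-Owners,
# No Pets.
observed = [[100, 400], [50, 450]]
expected = [[75, 425], [75, 425]]

# Critical value for 1 degree of freedom, as quoted on the slide.
critical_value = 3.84

# Sum (observed - expected)^2 / expected over every cell of the table.
chi_squared = sum(
    (o - e) ** 2 / e
    for observed_row, expected_row in zip(observed, expected)
    for o, e in zip(observed_row, expected_row)
)

# Degrees of freedom = (rows - 1) * (columns - 1).
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# Reject independence if the statistic exceeds the critical value.
reject_independence = chi_squared > critical_value

print("chi-squared:", round(chi_squared, 1))
print("degrees of freedom:", degrees_of_freedom)
print("reject independence:", reject_independence)
```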