 Statistics II: An Overview of Statistics. Outline for Statistics II Lecture: SPSS Syntax – Some examples. Normal Distribution Curve. Sampling Distribution.


Statistics II: An Overview of Statistics

Outline for Statistics II Lecture: SPSS Syntax – Some examples; Normal Distribution Curve; Sampling Distribution; Hypothesis Testing; Type I and Type II Errors; Linking Z to alpha and hypothesis testing; Bivariate Measures of Association; Bivariate Regression/Correlation.

The Normal Distribution

The standard normal distribution: a bell-shaped symmetrical distribution having a mean of 0 and a standard deviation of 1.

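Z scores. A z score (or standard score) is a transformed score expressed as a deviation from an expected value, with the standard deviation as its unit of measurement. When the original scores are normally distributed, the z score is a score on the standard normal distribution.

z = (Y - μ) / σ

where Y is the raw score, μ is the mean, and σ is the standard deviation.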
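As a small illustration of the z score formula above, the sketch below standardizes a single raw score. The function name and the example values (a score of 130 on a scale with mean 100 and standard deviation 15) are hypothetical and are not taken from the lecture.

```python
def z_score(y, mu, sigma):
    """Return how many standard deviations the raw score y lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (y - mu) / sigma

# Hypothetical example: raw score 130, mean 100, standard deviation 15.
print(z_score(130, 100, 15))  # 2.0
```

A z score of 2.0 says the raw score sits two standard deviations above the mean, whatever the original units were.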

Sampling Distribution

That is, the spread of the sampling distribution (the standard error, σ/√n) depends on both the sample size n and the spread of the population distribution. As the sample size n increases, the standard error decreases. The reason is that the denominator of the ratio, √n, increases as n increases, whereas the numerator is the population standard deviation σ, which is a constant and does not depend on n.
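The shrinking standard error can be checked with a few lines of arithmetic. This sketch assumes the ratio referred to above is the usual standard error of the sample mean, σ/√n. The population standard deviation of 10 and the sample sizes are hypothetical.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: population SD divided by sqrt(n)."""
    return sigma / math.sqrt(n)

# Hypothetical population SD of 10; only the sample size n changes.
for n in (25, 100, 400):
    print(n, standard_error(10, n))
# 25 2.0
# 100 1.0
# 400 0.5
```

Quadrupling the sample size halves the standard error, because n enters through its square root.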

Central Limit Theorem. For random sampling, as the sample size n grows, the sampling distribution of the sample mean approaches a normal distribution. This approximate normality applies no matter what the shape of the population distribution.
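A quick simulation can make the Central Limit Theorem concrete. The sketch below is an illustration under stated assumptions, not part of the lecture: the population is a strongly skewed exponential distribution with mean 1 and standard deviation 1, and the sample size, number of replications, and seed are arbitrary choices.

```python
import math
import random
import statistics

def sample_means(n, reps, rng):
    """Draw `reps` random samples of size n from a skewed population
    (exponential, mean 1, SD 1) and return the mean of each sample."""
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

rng = random.Random(0)       # fixed seed so the run is repeatable
n, reps = 50, 5000
means = sample_means(n, reps, rng)

se = 1.0 / math.sqrt(n)      # theoretical standard error, sigma / sqrt(n)
# Share of sample means within 1.96 standard errors of the population mean.
within = sum(abs(m - 1.0) <= 1.96 * se for m in means) / reps

print("mean of sample means:", round(statistics.fmean(means), 3))
print("SD of sample means:", round(statistics.stdev(means), 3), "theory:", round(se, 3))
print("share within 1.96 SE:", round(within, 3))
```

If the sampling distribution is roughly normal, about 95% of the sample means should fall within 1.96 standard errors of the population mean, even though the population itself is far from normal.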

Hypothesis Testing

Steps of a Statistical Significance Test
1. Assumptions: type of data, form of population, method of sampling, sample size.
2. Hypotheses: null hypothesis, Ho (parameter value for “no effect”); alternative hypothesis, Ha (alternative parameter values).
3. Test statistic: compares the point estimate to the null hypothesized parameter value.
4. P-value: weight of evidence about Ho; a smaller P is more contradictory to Ho.
5. Conclusion: report the P-value and make a formal decision.

Alpha or significance levels: The α-level is a number such that one rejects Ho if the P-value is less than or equal to it. The α-level is also called the significance level of the test. The most common α-levels are .05 and .01.
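The steps of a significance test and the α-level decision rule can be sketched for one simple case. The example below assumes a two-sided z test for a mean with a known population standard deviation. That choice, the function names, and the numbers are illustrative assumptions, not the lecture's own example.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n):
    """Two-sided z test of Ho: mu = mu0, assuming sigma is known."""
    se = sigma / math.sqrt(n)                # standard error of the mean
    z = (sample_mean - mu0) / se             # test statistic
    p = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided P-value
    return z, p

def decide(p, alpha):
    """Formal decision: reject Ho when the P-value is at or below alpha."""
    return "reject Ho" if p <= alpha else "do not reject Ho"

# Hypothetical data: sample mean 103 from n = 100, Ho: mu = 100, sigma = 15.
z, p = z_test(103, 100, 15, 100)
print(round(z, 2), round(p, 4))   # 2.0 0.0455
for alpha in (0.05, 0.01):
    print(alpha, decide(p, alpha))
# 0.05 reject Ho
# 0.01 do not reject Ho
```

The same P-value leads to different formal decisions at the two common α-levels, which is one reason to report the P-value itself alongside the decision.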

Type I and Type II Errors: A Type I error occurs when Ho is rejected, even though it is true. A Type II error occurs when Ho is not rejected, even though it is false.
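The link between the α-level and Type I errors can be illustrated by simulation: when Ho is true, a test run at level α should reject in roughly a proportion α of repeated samples. The sketch below assumes a two-sided z test with known σ and a standard normal population. The sample size, replication count, and seed are arbitrary.

```python
import math
import random
from statistics import NormalDist, fmean

def two_sided_p(sample, mu0, sigma):
    """Two-sided P-value for Ho: mu = mu0, assuming sigma is known."""
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))

def type_i_rate(alpha, n=30, reps=4000, seed=1):
    """Share of samples in which a true Ho is (wrongly) rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Ho is true here: the population mean really is 0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_sided_p(sample, mu0=0.0, sigma=1.0) <= alpha:
            rejections += 1
    return rejections / reps

print(type_i_rate(0.05))  # should land near 0.05
print(type_i_rate(0.01))  # should land near 0.01
```

Every rejection counted here is a Type I error, since Ho is true by construction. Estimating the Type II error rate would additionally require choosing a specific false value of the parameter.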

Bivariate Statistics

PROPORTIONAL REDUCTION IN ERROR (PRE). “All good measures of association use a proportionate reduction in error (PRE) approach. The PRE family of statistics is based on comparing the errors made in predicting the dependent variable with knowledge of the independent variable, to the errors made without information about the independent variable. In other words, PRE measures indicate how knowing the values of the independent variable (first variable) increases our ability to accurately predict the dependent variable (second variable).”

PRE statistic = (Error without decision rule - Error with decision rule) / Error without decision rule

Another way of stating this is:

PRE value = (E1 - E2) / E1

where E1 = number of errors made by the first prediction method and E2 = number of errors made by the second prediction method.
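The PRE ratio is simple enough to compute by hand, but a short sketch shows how the two error counts combine. The function name and the error counts (40 errors under the first prediction method, 30 under the second) are hypothetical.

```python
def pre_value(e1, e2):
    """Proportional reduction in error: (E1 - E2) / E1.

    e1: errors made by the first prediction method
    e2: errors made by the second prediction method
    """
    if e1 <= 0:
        raise ValueError("E1 must be positive")
    return (e1 - e2) / e1

# Hypothetical counts: 40 errors with the first method, 30 with the second.
print(pre_value(40, 30))  # 0.25
```

A value of 0.25 means the second prediction method makes 25% fewer errors than the first.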

PRE measures are more versatile and more informative than are the chi-square-based measures. All PRE measures are normed; they use a standardized scale where the value 0 means there is no association and 1 means there is perfect association. Any value between these extremes indicates the relative degree of association in a ratio comparison sense. E.g., a PRE measure with a value of .50 represents an association that is twice as strong as one that has a PRE value of .25. The number of cases, the table size, and the variables being measured do not interfere with the interpretation that can be given to them.

Chi Square. The chi-square test examines whether two nominal variables are associated. It is NOT a PRE measure. The chi-square test is based on a comparison between the frequencies that are observed in the cells of a cross-classification table and those that we would expect to observe if the null hypothesis were true. The hypotheses for the chi-square are: Ho: the variables are statistically independent. Ha: the variables are statistically dependent.
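The comparison of observed and expected frequencies can be sketched as follows. The code assumes the standard Pearson chi-square statistic, with each expected frequency computed as (row total × column total) / N, which is what independence implies. The 2×2 table is hypothetical.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    cross-classification table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Frequency expected in this cell if Ho (independence) were true.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical 2x2 table of observed counts.
stat, df = chi_square([[30, 20], [20, 30]])
print(stat, df)  # 4.0 1
```

With 1 degree of freedom the critical value at α = .05 is 3.841, so a statistic of 4.0 would lead to rejecting Ho at that level.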

Goodman and Kruskal’s Gamma (G). A measure of association for data grouped in ordered categories. G is a PRE measure. G compares two methods of prediction: first, it randomly predicts all untied pairs to be either in agreement or disagreement; second, it predicts all untied pairs to be of the same type. Agreement or disagreement is determined by the direction of the bivariate distribution. For a positive pattern we expect untied pairs to be in agreement. For a negative pattern we expect untied pairs to be in disagreement.

Pa: the number of agreement pairs is found by multiplying the frequency for each cell by the sum of the frequencies from all cells that are both to the right of and below it, then summing these products over all cells. Pd: the number of disagreement pairs is found by multiplying the frequency for each cell by the sum of the frequencies from all cells that are both to the left of and below it, then summing these products over all cells.
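The counting rules for Pa and Pd translate directly into nested loops. The sketch below also applies the usual definition G = (Pa - Pd) / (Pa + Pd), which the slides do not spell out and is added here as an assumption. The table, with rows and columns both ordered from low to high, is hypothetical.

```python
def agreement_disagreement(table):
    """Count agreement (Pa) and disagreement (Pd) pairs in an ordered table."""
    pa = pd = 0
    for i, row in enumerate(table):
        below = table[i + 1:]                  # all rows below the current cell
        for j, freq in enumerate(row):
            # Cells both below and to the right: pairs in agreement.
            pa += freq * sum(sum(r[j + 1:]) for r in below)
            # Cells both below and to the left: pairs in disagreement.
            pd += freq * sum(sum(r[:j]) for r in below)
    return pa, pd

def gamma(table):
    """Goodman and Kruskal's gamma, assuming G = (Pa - Pd) / (Pa + Pd)."""
    pa, pd = agreement_disagreement(table)
    if pa + pd == 0:
        raise ValueError("no untied pairs; gamma is undefined")
    return (pa - pd) / (pa + pd)

# Hypothetical 2x2 table with ordered categories.
table = [[20, 10],
         [10, 20]]
print(agreement_disagreement(table))  # (400, 100)
print(gamma(table))                   # 0.6
```

Here agreement pairs outnumber disagreement pairs, so G is positive, matching a positive pattern in the table.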

Bivariate Regression and Correlation

BIVARIATE REGRESSION AND CORRELATION WHY AND WHEN TO USE REGRESSION/CORRELATION? WHAT DOES REGRESSION/CORRELATION MEAN?

You should be able to interpret:
The least squares equation.
R² and adjusted R².
F and significance.
The unstandardized regression coefficient.
The standardized regression coefficient.
t and significance.
The 95% confidence interval.
A graph of the regression line.
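Several of the quantities listed above can be computed by hand for a small data set. The sketch below fits a bivariate least squares line and reports the unstandardized coefficient, the standardized coefficient (which equals Pearson's r when there is one predictor), R², and adjusted R². The data and function name are hypothetical, and the F, t, and confidence-interval calculations are left out.

```python
import math

def least_squares(x, y):
    """Bivariate least squares fit of y = a + b*x with summary measures."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                        # unstandardized slope
    a = mean_y - b * mean_x              # intercept
    r = sxy / math.sqrt(sxx * syy)       # standardized slope (Pearson's r)
    r2 = r * r
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)   # one predictor
    return {"a": a, "b": b, "beta": r, "r2": r2, "adj_r2": adj_r2}

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fit = least_squares(x, y)
print({name: round(value, 3) for name, value in fit.items()})
```

For these data the least squares equation is approximately Y = 2.2 + 0.6X, with R² of about 0.6: each one-unit increase in X is associated with a predicted increase of 0.6 in Y.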

ASSUMPTIONS UNDERLYING REGRESSION/CORRELATION
NORMALITY OF VARIANCE IN Y FOR EACH VALUE OF X: For any fixed value of the independent variable X, the distribution of the dependent variable Y is normal.
NORMALITY OF VARIANCE FOR THE ERROR TERM: The error term is normally distributed. (Many authors argue that this is more important than normality in the distribution of Y.)
THE INDEPENDENT VARIABLE IS UNCORRELATED WITH THE ERROR TERM

ASSUMPTIONS UNDERLYING REGRESSION/CORRELATION (Continued)
HOMOSCEDASTICITY: It is assumed that Y has equal variance for each fixed value of X.
LINEARITY: The relationship between X and Y is linear.
INDEPENDENCE: The Y’s are statistically independent of each other.
