Practical Statistics Abbreviated Summary

There are six statistics that will answer 90% of all questions!
1. Descriptive
2. Chi-square
3. Z-tests
4. Comparison of Means
5. Correlation
6. Regression

Z-tests are for proportions. This test is so easy that it is not even included in some computer programs, such as SPSS.

Z-tests are for proportions. What is the probability that, out of 250 customers, 220 would like the service when the usual percentage satisfied is 70% (175 out of 250)?

Z-tests are for proportions. What is the probability that, in a random sample of male and female customers, the percentage of men and the percentage of women who like a new product are the same?

Z-tests are for proportions. They come in two types:
1. A sample proportion against a hypothesis.
2. Two samples compared to each other.

Z-tests are for proportions. The standard error (sampling error) for proportions is: SE = √(pq / n), where p = freq/total and q = 1 − p.

Z-tests are for proportions. Hence: Z = (pt − p) / √(pq / n), where p is the hypothesized value and pt is the proportion found in a sample of size n.

Z-tests are for proportions. Suppose that XYZ Company believed that 20% of their customers bought 80% of their product (the "heavy half"). A sample of 200 customers found that 25% bought 80% of the product. Was the company correct in its estimate?

Z-tests are for proportions. The test statistic looks like this: Z = (0.25 − 0.20) / √((0.20 × 0.80) / 200) = 0.05 / 0.028 = 1.77.

Z-tests are for proportions. Since the test was "two-tailed," the critical value of Z would be 1.96. Therefore, we would conclude that there is not enough evidence to override the assumption that 20% of the customers bought 80% of the product.

Z-tests are for proportions. https://www.medcalc.org/calc/test_one_proportion.php http://www.danielsoper.com/statcalc/calculator.aspx?id=2 P = 0.077
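As a sketch, the one-proportion test above can be reproduced with only the Python standard library; the function name is illustrative, not from the slides:

```python
from math import sqrt, erf

def z_test_one_proportion(p_hat, p0, n):
    """Z-test of a sample proportion against a hypothesized value."""
    q0 = 1 - p0
    se = sqrt(p0 * q0 / n)        # standard error under H0
    z = (p_hat - p0) / se
    # two-tailed p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# XYZ Company example: hypothesized 20%, observed 25% in a sample of 200
z, p = z_test_one_proportion(0.25, 0.20, 200)
print(round(z, 2), round(p, 3))   # 1.77 0.077
```

Since 1.77 is below the two-tailed critical value of 1.96, the company's 20% estimate stands, matching the slide's conclusion.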

Z-tests are for proportions. They come in two types:
1. A sample proportion against a hypothesis.
2. Two samples compared to each other.

Z-tests are for proportions. The test for this case looks like this: Z = (p1 − p2) / √(pq (1/n1 + 1/n2)).

Z-tests are for proportions. Usually, the test assumes that the two groups are equal, or: H0: p1 = p2.

There is a problem here. What is the value of p?

p is the value of the population proportion, but we usually don't know it, so p is estimated by the weighted average of the two groups….

Suppose that a new product was test marketed in the United States and in Japan. The company hypothesizes that both countries' responses to the product will be the same. 80% of a sample of 500 said they would buy the product again in the U.S., while 75% of a sample of 200 in Japan said they would buy the product again.

Test the hypothesis….

Since p = 0.80 in the U.S., and 0.75 in Japan, the weighted average is used for p. So: p = ((.8 x 500)+(.75 x 200))/700 = 0.786

The test would be: Z = .05 / .0343 = 1.45. The critical value is 1.96; p = 0.147. The null hypothesis cannot be rejected; the U.S. and Japanese customers are assumed to be the same.
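The two-proportion test can be sketched the same way with the standard library; the function name is mine, and the pooled p is the weighted average described above:

```python
from math import sqrt, erf

def z_test_two_proportions(p1, n1, p2, n2):
    """Z-test comparing two sample proportions, pooling p under H0: p1 = p2."""
    p = (p1 * n1 + p2 * n2) / (n1 + n2)   # weighted average of the two groups
    q = 1 - p
    se = sqrt(p * q * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p, z, p_value

# U.S.: 80% of 500; Japan: 75% of 200
pooled, z, p = z_test_two_proportions(0.80, 500, 0.75, 200)
print(round(pooled, 3), round(z, 2), round(p, 3))   # 0.786 1.46 0.145
```

The slight differences from the slide (1.46 vs. 1.45, 0.145 vs. 0.147) are only rounding; the conclusion is the same, since 1.46 < 1.96.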

There are six statistics that will answer 90% of all questions!
1. Descriptive
2. Chi-square
3. Z-tests
4. Comparison of Means
5. Correlation
6. Regression

t-tests and ANOVA are for the means of interval and ratio scales. They are very common statistics….

t-test. Why is it called a t-test?

William S. Gosset 1876-1937 Published under the name: Student

t-tests come in three types:
1. A sample mean against a hypothesis.
2. Two sample means compared to each other.
3. Two means within the same sample.

t-test. The standard error for means is: SE = s / √n.

t-test. Hence, for one mean compared to a hypothesis: t = (X̄ − μ) / (s / √n). Each t value comes with a certain degree of freedom: df = n − 1.

t-test. IQ has a mean of 100 and a standard deviation of 15. Suppose a group of immigrants came to Iowa. A sample of 400 of these immigrants found an average IQ of 98. Does this group have an IQ below the population average?

t-test. The test statistic looks like this: t = (98 − 100) / (15 / √400) = −2 / 0.75 = −2.67. There are n − 1 = 399 degrees of freedom. The results are printed out by a computer or looked up in a t-test table.

Of course, we could look this up on the internet…. http://www.danielsoper.com/statcalc/calculator.aspx?id=8 For the IQ test: t(399) = 2.67, p = 0.00395

t-test. Since the test was "one-tailed," the critical value of t would be 1.65. Therefore, t(399) = 2.67 indicates that the immigrants' average IQ is below normal.
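Because 399 degrees of freedom make the t distribution nearly normal, the IQ example can be sketched with the standard library, approximating the one-tailed p-value with the normal CDF (the slides' exact t-based value is 0.00395):

```python
from math import sqrt, erf

# One-sample test: mean IQ of 98 in a sample of 400 vs. population mean 100, SD 15
mu, sd, n, x_bar = 100, 15, 400, 98
se = sd / sqrt(n)            # standard error of the mean = 0.75
t = (x_bar - mu) / se        # = -2.67

# Normal approximation to the one-tailed p-value (good for large df)
p_one_tailed = 0.5 * (1 + erf(t / sqrt(2)))
print(round(t, 2), round(p_one_tailed, 4))   # -2.67 0.0038
```

Either way the p-value is far below 0.05, so the sample mean is judged below the population average.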

t-tests come in three types:
1. A sample mean against a hypothesis.
2. Two sample means compared to each other.
3. Two means within the same sample.

t-test. The standard error of the difference between two means looks like this: SE(diff) = √(s1²/n1 + s2²/n2).

t-test. Therefore, the test statistic would look like this: t = (X̄1 − X̄2) / SE(diff), with degrees of freedom = n(1) + n(2) − 2.

t-test. Usually this is simplified by pooling the variance of the two samples, so that: t = (X̄1 − X̄2) / √(s²p (1/n1 + 1/n2)), where: s²p = ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2).

Suppose that a new product was test marketed in the United States and in Japan. The company hypothesizes that customers in both countries would consume the product at the same rate. A sample of 500 in the U.S. used an average of 200 kilograms a year (SD = 20), while a sample of 400 in Japan used an average of 180 kilograms a year (SD = 25). Test the hypothesis….

The test would start by computing the pooled variance: s²p = ((499 × 400) + (399 × 625)) / 898 ≈ 500 (an SD of 22.36).

The results are written as: (t(898) = 13.33, p < .0001), and the conclusion is that there is a large difference in the consumption rate between the U.S. and Japanese customers.
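The pooled two-sample test above can be checked with a short standard-library script; the function name is mine:

```python
from math import sqrt

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t-test with a pooled variance estimate."""
    # pooled variance: weighted average of the two sample variances
    var_p = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se = sqrt(var_p * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    df = n1 + n2 - 2
    return var_p, t, df

# U.S.: mean 200, SD 20, n 500; Japan: mean 180, SD 25, n 400
var_p, t, df = pooled_t(200, 20, 500, 180, 25, 400)
print(round(var_p), round(sqrt(var_p), 2), round(t, 2), df)   # 500 22.36 13.33 898
```

This reproduces the slides' numbers: a pooled variance of about 500 (SD 22.36) and t(898) = 13.33.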

t-tests come in three types:
1. A sample mean against a hypothesis.
2. Two sample means compared to each other.
3. Two means within the same sample.

t-tests come in three types: 3. Two means within the same sample. This t-test is used with correlated samples and/or when the same person or object is measured twice in the same sample.

Student   T1   T2    d
Tom       89   90    1
Jan       88   91    3
Jason     87   86   -1
Halley    90   90    0
Bill      75   79    4
The measurement of interest is d.

H0: Average of d = 0. That is… the average difference between test 1 and test 2 is zero.

t-test. The sampling error for this t-test is: SE = s(d) / √n, where d = score(2) − score(1).

t-test. The t-test is: t = d̄ / (s(d) / √n). The degrees of freedom = n − 1.

Suppose there are more than two groups that need to be compared. The t-test cannot be utilized, for two reasons:
1. The number of pairs becomes large.
2. The probability of t is no longer accurate.

Hence a new statistic is needed: the F-test, or Analysis of Variance (ANOVA). R.A. Fisher, 1880-1962.

The F-test compares the means of two or more groups by comparing the variance between groups with the variance that exists within groups. According to the Central Limit Theorem, there is a relationship between the variance of a statistic and the variance of the population. If that relationship is violated, it is likely that the statistic did not come from the same population as the other statistics.

https://en.wikipedia.org/wiki/Analysis_of_variance https://www.youtube.com/watch?v=0Vj2V2qRU10

F is the ratio of variance: F = (variance between groups) / (variance within groups).

The F-test http://www.statsoft.com/textbook/distribution-tables/

The F-test Typical output looks like this:

In SPSS ANOVA looks like this:
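As a sketch of what that output computes, here is a minimal one-way ANOVA with hypothetical data (the groups and function name are mine, not from the slides):

```python
from statistics import mean

def one_way_anova(groups):
    """F = (variance between groups) / (variance within groups)."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    # between-group sum of squares: how far group means sit from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # within-group sum of squares: spread of scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within)

# three hypothetical groups of scores
f = one_way_anova([[1, 2, 3], [2, 3, 4], [4, 5, 6]])
print(round(f, 2))   # 7.0
```

A large F means the group means vary more than the within-group scatter can explain, which is the signal that the groups did not all come from the same population.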

There are six statistics that will answer 90% of all questions!
1. Descriptive
2. Chi-square
3. Z-tests
4. t-tests
5. Correlation
6. Regression

Correlation tests the degree of association between interval and ratio measures.

Karl Pearson, Darwin, Galton

Correlation is based on a very simple idea that Karl Pearson saw….

If you take two measures of the same person or object, multiply them, and then add the products across persons or objects… such as:
Person   M1   M2   Product
1         5    5      25
2         4    4      16
3         3    3       9
4         2    2       4
5         1    1       1
               Sum = 55

This is called: the sum of the cross products.

The largest possible sum will occur if M1 and M2 are in perfect ordinal order. Note what happens when only one measure changes.
Person   M1   M2   Product
1         5    4      20
2         4    5      20
3         3    3       9
4         2    2       4
5         1    1       1
               Sum = 54

The smallest possible sum will occur if M1 and M2 are in perfect inverse order.
Person   M1   M2   Product
1         5    1       5
2         4    2       8
3         3    3       9
4         2    4       8
5         1    5       5
               Sum = 35

The "normal score" or "standardized score" is equal to: Z = (X − mean) / SD.

Converting the measures to Z scores…
Person    M1      M2     Product
1        -1.41   -1.41    2.0
2        -0.71   -0.71    0.5
3         0       0       0
4         0.71    0.71    0.5
5         1.41    1.41    2.0
                  Sum = 5.0

Note that the sum is equal to the number of people in the sample, i.e., 5.

Note now what happens when the measures in Z scores are arranged in perfect inverse order:
Person    M1      M2     Product
1        -1.41    1.41   -2.0
2        -0.71    0.71   -0.5
3         0       0       0
4         0.71   -0.71   -0.5
5         1.41   -1.41   -2.0
                  Sum = -5.0

This is called "the sum of the cross products": Σ(Zx × Zy).

So: when X and Y are in ranked order, the maximum will be equal to n, and when ranked in perfect negative order, the maximum will be −n. (Or n − 1 and −(n − 1) if taken from a sample.)

The average of the sum of cross products is the correlation: r = Σ(Zx × Zy) / n.

r = 1.0 or r = −1.0. This means that a perfect association always has a value of 1.0 when in positive order and a value of −1.0 when in negative order. A value of zero would indicate a random relationship between the two variables.
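The z-score construction above can be sketched directly with the standard library, using the population standard deviation as the slides do (the function name is mine):

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    """Correlation as the average cross product of z-scores (population form)."""
    zx = [(x - mean(xs)) / pstdev(xs) for x in xs]
    zy = [(y - mean(ys)) / pstdev(ys) for y in ys]
    cross = sum(a * b for a, b in zip(zx, zy))   # sum of the cross products
    return cross / len(xs)

m1 = [5, 4, 3, 2, 1]
print(round(pearson_r(m1, [5, 4, 3, 2, 1]), 10))   # perfect positive order:  1.0
print(round(pearson_r(m1, [1, 2, 3, 4, 5]), 10))   # perfect inverse order:  -1.0
```

For a sample, `stdev` (the n − 1 form) would be used and the sum divided by n − 1 instead, as the earlier slide notes.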

The correlation can be graphically shown by using a scatter plot:

The correlation is related to the shape of the scatter plot: http://en.wikipedia.org/wiki/Scatter_plot http://www.itl.nist.gov/div898/handbook/eda/section3/eda33q.htm

The correlation is an INDEX of association. It contains three pieces of information:
1. How much association is present (an index),
2. Is that a "significant" association (that is, can we reject the H0 that the true association is zero),
3. And, what is the magnitude of that association.

The correlation is r… and r-squared is the amount of variation in Y accounted for by knowing X.

Be careful!! The correlation does not tell you that X is the cause of Y. It is a necessary condition for cause, but it does not prove cause…

That correlation proves causation is a logical fallacy by which two events that occur together are claimed to have a cause-and-effect relationship. The fallacy is also known as cum hoc ergo propter hoc (Latin for "with this, therefore because of this") and false cause.

The number of people waiting for a bus or train is highly correlated with how long a person must wait for a ride.

Trains come sooner when more people are waiting for them! Does this mean a train will come sooner if you bring your friends?

Since the 1950s, both the atmospheric CO2 level and crime levels have increased sharply. Hence, atmospheric CO2 causes crime.

There are six statistics that will answer 90% of all questions!
1. Descriptive
2. Chi-square
3. Z-tests
4. Comparison of Means
5. Correlation
6. Regression

Regression tests the degree of association between interval and ratio measures, AND gives the best fit to the data.

Regression does three things:
1. Association
2. Best fit
3. Prediction

Regression creates an equation. A simple linear equation would be: Y = bX + a

Can we use the correlations to create equations to estimate one variable from another?

For example: Evaluations = b*Personality + a Y = bX + a

So… Evaluation = 0.637 * Personality - 0.530
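A least-squares fit of this kind can be sketched with the standard library. The data below are hypothetical, not the evaluation data behind the slides' coefficients (0.637 and −0.530 come from their own data set):

```python
from statistics import mean

def least_squares(xs, ys):
    """Slope b and intercept a of the best-fit line Y = bX + a."""
    x_bar, y_bar = mean(xs), mean(ys)
    # slope: sum of cross products of deviations over sum of squared X deviations
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar   # the line passes through the point of means
    return b, a

# hypothetical personality scores (X) and evaluations (Y)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, a = least_squares(x, y)
print(round(b, 3), round(a, 3))   # 0.6 2.2
```

"Best fit" here means these b and a minimize the sum of squared vertical distances between the data points and the line.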

The equations do not have to be linear.

Regression can use more than one variable to predict. This is called multiple regression.

When all these variables are put together, as they are in the real world, only the instructor's gender, what section a student took, the student's GPA, and the evaluation the students gave to the instructor were related to the final grade.

Path Diagram

Question: What predicts the evaluation a class and instructor will get?

The final evaluation of the class and instructor is related to (in order):
1. Expected grade in Week 16
2. Actual grade in Week 16
3. Final grade for the class
Note: If all these grades were the same thing, only one would be related.

If we add variables one at a time, sometimes the answer is different. Notice below how the deserved grade at Week 16 becomes important. Why?

This diagram shows that it is the expected grade at Week 16 that is predicting the evaluation of the class and instructor. The other grades are being used by students to estimate the expected grades.