Practical Statistics Abbreviated Summary.


There are six statistics that will answer 90% of all questions!
Descriptive, Chi-square, Z-tests, Comparison of Means, Correlation, Regression

Z-tests are for proportions.
This test is so easy that it is not even provided in some computer programs, such as SPSS.

What is the probability that, out of 250 customers, 220 would like the service when the usual rate is that 70% (175 out of 250) are satisfied?

What is the probability that, in a random sample of male and female customers, the percentage of men who like a new product is the same as the percentage of women?

They come in two types:
1. A sample proportion against a hypothesis.
2. Two samples compared to each other.

The standard error (sampling error) for proportions is $\sigma_p = \sqrt{\frac{pq}{n}}$, where p = freq/total and q = 1 - p.

Hence: $Z = \frac{p_t - p}{\sqrt{pq/n}}$, where p is the hypothesized value and $p_t$ is the proportion found in a sample of size n.

Suppose that XYZ Company believed that 20% of their customers bought 80% of their product (“heavy half”). A sample of 200 customers found that 25% bought 80% of the product. Was the company correct in their estimate?

The test statistic looks like this: $Z = \frac{0.25 - 0.20}{\sqrt{(0.20)(0.80)/200}} = \frac{0.05}{0.0283} = 1.77$

Since the test was “two-tailed,” the critical value of Z would be 1.96. Therefore, we would conclude that there is not enough evidence to override the assumption that 20% of the customers bought 80% of the product.

p = 0.077 (two-tailed)
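The XYZ Company calculation can be reproduced in a few lines. This is a minimal Python sketch using only the standard library; the function name `one_sample_proportion_z` is hypothetical, and the two-tailed p-value comes from the normal distribution via `math.erfc`.

```python
import math


def one_sample_proportion_z(p_sample: float, p_hyp: float, n: int) -> tuple[float, float]:
    """Z-test of a sample proportion against a hypothesized proportion.

    Returns (z, two-tailed p-value) using the normal approximation.
    """
    q_hyp = 1.0 - p_hyp
    se = math.sqrt(p_hyp * q_hyp / n)                # standard error sqrt(pq/n)
    z = (p_sample - p_hyp) / se
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_two_tailed


# XYZ Company: hypothesized 20%, observed 25% in a sample of 200
z, p = one_sample_proportion_z(0.25, 0.20, 200)
print(f"Z = {z:.2f}, p = {p:.3f}")  # Z = 1.77, p = 0.077
```

Because |Z| is below the two-tailed critical value of 1.96, the hypothesis of 20% is not rejected.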

2. Two samples compared to each other.

The test for this case looks like this: $Z = \frac{p_1 - p_2}{\sigma_{p_1 - p_2}}$

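Usually, the test assumes that the two groups are equal ($H_0: p_1 = p_2 = p$), so the standard error of the difference becomes: $\sigma_{p_1 - p_2} = \sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$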

There is a problem here: what is the value of p?

p is the value of the population proportion, but we usually don't know it, so p is estimated by the weighted average of the two groups: $p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$

Suppose that a new product was test marketed in the United States and in Japan. The company hypothesizes that both countries' responses to the product will be the same. In a sample of 500 in the U.S., 80% said they would buy the product again, while 75% of a sample of 200 in Japan said they would buy the product again.

Test the hypothesis.

Since $p_1 = 0.80$ in the U.S. and $p_2 = 0.75$ in Japan, the weighted average is used for p. So: $p = \frac{(0.80)(500) + (0.75)(200)}{700} = \frac{550}{700} = 0.786$

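The test would be: $Z = \frac{0.05}{\sqrt{(0.786)(0.214)\left(\frac{1}{500} + \frac{1}{200}\right)}} = \frac{0.05}{0.0343} = 1.46$. The critical value is 1.96; p ≈ 0.145 (two-tailed). The null hypothesis cannot be rejected, so the U.S. and Japanese customers are assumed to respond the same way.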
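The pooled two-sample test for the U.S. and Japan example can be sketched the same way. This is an illustrative Python version using the weighted-average (pooled) estimate of p; the function name `two_sample_proportion_z` is hypothetical, and the p-value again uses the normal approximation.

```python
import math


def two_sample_proportion_z(p1, n1, p2, n2):
    # Pooled estimate of p: weighted average of the two sample proportions
    p = (p1 * n1 + p2 * n2) / (n1 + n2)
    q = 1.0 - p
    # Standard error of the difference under H0: p1 = p2 = p
    se = math.sqrt(p * q * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-tailed
    return p, z, p_value


pooled, z, p_value = two_sample_proportion_z(0.80, 500, 0.75, 200)
print(f"pooled p = {pooled:.3f}, Z = {z:.2f}, p = {p_value:.3f}")
```

Since |Z| stays below 1.96, the sketch reaches the same conclusion as the hand calculation.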

The t-test and ANOVA are for the means of interval and ratio scales. They are very common statistics.

Why is it called a t-test?

William S. Gosset, who published under the name “Student.”

The t-test comes in three types:
1. A sample mean against a hypothesis.
2. Two sample means compared to each other.
3. Two means within the same sample.

The standard error for means is: $s_{\bar{X}} = \frac{s}{\sqrt{n}}$

Hence, for one mean compared to a hypothesis: $t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$. Each t value comes with a certain number of degrees of freedom: df = n - 1.

IQ has a mean of 100 and a standard deviation of 15. Suppose a group of immigrants came into Iowa. A sample of 400 of these immigrants found an average IQ of 98. Does this group have an IQ below the population average?

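The test statistic looks like this: $t = \frac{98 - 100}{15/\sqrt{400}} = \frac{-2}{0.75} = -2.67$. The sign shows the direction; its size, 2.67, is what is compared with the critical value. There are n - 1 = 399 degrees of freedom. The results are printed out by a computer or looked up on a t-test table.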

Of course, we could look this up on the internet. For the IQ test: t(399) = 2.67, p ≈ .004 (one-tailed).

Since the test was “one-tailed,” the critical value of t would be 1.65. Therefore, t(399) = 2.67 would indicate that the immigrants' IQ is below the population average.
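The one-sample IQ calculation can be checked with a short Python sketch. The function name `one_sample_t` is hypothetical, and because the Python standard library has no t distribution, the p-value here is a normal approximation, which is reasonable only because df = 399 is large.

```python
import math


def one_sample_t(sample_mean, mu, sd, n):
    se = sd / math.sqrt(n)          # standard error of the mean
    t = (sample_mean - mu) / se
    df = n - 1
    return t, df


t, df = one_sample_t(98, 100, 15, 400)
critical = 1.65  # one-tailed critical value used in the slides

# Normal approximation to the one-tailed p-value (assumes large df)
p_one_tailed = 0.5 * math.erfc(abs(t) / math.sqrt(2))
print(f"t({df}) = {t:.2f}, approx. one-tailed p = {p_one_tailed:.4f}")
print("below average" if t < -critical else "no evidence of a difference")
```

For small samples the exact t distribution (for example from a statistics package) should replace the normal approximation.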

The standard error of the difference between two means looks like this: $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

Therefore the test statistic would look like this: $t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$, with degrees of freedom = $n_1 + n_2 - 2$.

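Usually this is simplified by pooling the variances of the two samples, so that: $s_{\bar{X}_1 - \bar{X}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

Where: $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$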

Suppose that a new product was test marketed in the United States and in Japan. The company hypothesizes that customers in both countries would consume the product at the same rate. A sample of 500 in the U.S. used an average of 200 kilograms a year (SD = 20), while a sample of 400 in Japan used an average of 180 kilograms a year (SD = 25). Test the hypothesis.

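The test would start by computing the pooled variance: $s_p^2 = \frac{(499)(20^2) + (399)(25^2)}{898} \approx 500$ (an SD of 22.36). Then $t = \frac{200 - 180}{22.36\sqrt{\frac{1}{500} + \frac{1}{400}}} = \frac{20}{1.50} = 13.33$.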

The results are written as t(898) = 13.33, p < .0001, and the conclusion is that there is a large difference in the consumption rate between the U.S. and Japanese customers.
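The pooled two-sample t-test for the consumption example can be sketched from the summary statistics alone. This Python version is illustrative; `pooled_t` is a hypothetical name, and it assumes equal population variances, as the pooled formula does.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    df = n1 + n2 - 2
    # Pooled variance: each sample's variance weighted by its degrees of freedom
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    sp = math.sqrt(sp2)
    se = sp * math.sqrt(1 / n1 + 1 / n2)   # standard error of the difference
    t = (mean1 - mean2) / se
    return t, df, sp


t, df, sp = pooled_t(200, 20, 500, 180, 25, 400)
print(f"pooled SD = {sp:.2f}, t({df}) = {t:.2f}")
```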

3. Two means within the same sample. This t-test is used with correlated samples and/or when the same person or object is measured twice in the same sample.

Student   T1   T2    d
Tom       89   90    1
Jan       88   91    3
Jason     87   86   -1
Halley    90   90    0
Bill

The measurement of interest is d.

H0: average of d = 0. That is, the average difference between test 1 and test 2 is zero.

The sampling error for this t-test is $s_{\bar{d}} = \frac{s_d}{\sqrt{n}}$, where d = score(2) - score(1).

The t-test is $t = \frac{\bar{d}}{s_d / \sqrt{n}}$, with degrees of freedom = n - 1.
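A minimal Python sketch of the paired t-test follows. It uses only the four students whose scores survive in the table (Tom, Jan, Jason, Halley); Bill's scores are not available, so the result is illustrative rather than the slide's own answer.

```python
import math
import statistics

# d = T2 - T1 for Tom, Jan, Jason, Halley (Bill's scores are unavailable)
d = [90 - 89, 91 - 88, 86 - 87, 90 - 90]

n = len(d)
d_bar = statistics.mean(d)
s_d = statistics.stdev(d)            # sample SD of d (n - 1 in the denominator)
t = d_bar / (s_d / math.sqrt(n))     # mean difference over its standard error
df = n - 1
print(f"mean d = {d_bar:.2f}, t({df}) = {t:.2f}")
```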

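Suppose there are more than two groups that need to be compared. The t-test cannot be used, for two reasons:
1. The number of pairs becomes large.
2. The probability of t is no longer accurate: each extra test adds another chance of a false positive, so the overall error rate rises above the stated level.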
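Both problems can be made concrete with a little arithmetic. This Python sketch counts the pairwise comparisons for k groups and estimates the chance of at least one false positive at alpha = .05, under the simplifying assumption that the tests are independent (pairwise t-tests are not exactly independent, so treat the numbers as approximate).

```python
def pairwise_comparisons(k):
    # Number of distinct pairs among k groups
    return k * (k - 1) // 2


def familywise_error(alpha, m):
    # P(at least one false rejection) across m tests, ASSUMING independence
    return 1 - (1 - alpha) ** m


for k in (3, 5, 10):
    m = pairwise_comparisons(k)
    print(f"{k} groups: {m} t-tests, chance of >= 1 false positive = {familywise_error(0.05, m):.2f}")
```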

Hence a new statistic is needed: the F-test, or Analysis of Variance (ANOVA), developed by R.A. Fisher.

The F-test compares the means of two or more groups by comparing the variance between groups with the variance that exists within groups. According to the Central Limit Theorem, there is a relationship between the variance of a statistic and the variance of the population: the variance of sample means equals the population variance divided by n. If that relationship is violated, it is likely that the group means did not all come from the same population.

https://en.wikipedia.org/wiki/Analysis_of_variance

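F is the ratio of two variances: $F = \frac{MS_{\text{between}}}{MS_{\text{within}}}$, the variance between groups divided by the variance within groups.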
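The ratio can be computed directly from raw scores. This is a minimal one-way ANOVA sketch in Python; the function name `one_way_anova_f` and the three groups of scores are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import statistics


def one_way_anova_f(groups):
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n_total = len(groups), len(all_values)
    # Between-groups sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: scores around their own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, (k - 1, n_total - k)


# Hypothetical scores for three groups (illustration only)
f, dfs = one_way_anova_f([[4, 5, 6], [6, 7, 8], [8, 9, 10]])
print(f"F{dfs} = {f:.2f}")
```

A large F means the group means vary more than the within-group spread would lead us to expect.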

Typical F-test output looks like this:
[Figure: typical F-test output]

In SPSS, ANOVA looks like this:
[Figure: ANOVA output in SPSS]


Correlation tests the degree of association between interval and ratio measures.

Karl Pearson, Darwin, Galton

Correlation is based on a very simple idea that Karl Pearson saw.

If you take two measures of the same person or object, multiply them, and then add the products across persons or objects, you get a sum such as this one:
[Table: Person, M1, M2, Product; sum of products = 55]

This is called the sum of the cross products.

The largest possible sum will occur if M1 and M2 are in perfect ordinal order. Note what happens when only one measure changes: the sum drops to 54.

The smallest possible sum will occur if M1 and M2 are in perfect inverse order: the sum is 35.
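The original table values did not survive, but the sums 55, 54, and 35 are exactly what five people scoring 1 through 5 on both measures would produce. This Python sketch ASSUMES those scores (the values and the function name `sum_of_cross_products` are illustrative, not taken from the slides).

```python
def sum_of_cross_products(m1, m2):
    # Multiply the two measures for each person, then add across people
    return sum(a * b for a, b in zip(m1, m2))


m1 = [1, 2, 3, 4, 5]  # ASSUMED scores for five people
ordered = sum_of_cross_products(m1, [1, 2, 3, 4, 5])   # perfect order: 55
one_swap = sum_of_cross_products(m1, [2, 1, 3, 4, 5])  # one measure changed: 54
inverse = sum_of_cross_products(m1, [5, 4, 3, 2, 1])   # perfect inverse order: 35
print(ordered, one_swap, inverse)
```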

The “normal score” or “standardized score” is equal to: $Z = \frac{X - \bar{X}}{\sigma}$

Converting the measures to Z scores, the sum of the cross products becomes 5.0.

Note that the sum is equal to the number of people in the sample, i.e., 5.

Note now what happens when the measures in Z scores are arranged in perfect inverse order: the sum is -5.0.


So: when X and Y are in ranked order, the maximum will be equal to n, and when ranked in perfect negative order, the minimum will be -n (or n - 1 and -(n - 1) if the standard deviations are computed from a sample).

The average of the sum of cross products of the Z scores is the correlation: $r = \frac{\sum Z_X Z_Y}{n}$
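The Z-score version of the idea can be sketched in Python. It ASSUMES scores of 1 through 5 for five people (values that reproduce the slides' sums) and uses the population standard deviation (`statistics.pstdev`), which is what makes the sums come out to exactly n and -n; the function names are hypothetical.

```python
import statistics


def z_scores(xs):
    mu = statistics.mean(xs)
    sigma = statistics.pstdev(xs)   # population SD, so the sums equal +/- n
    return [(x - mu) / sigma for x in xs]


def correlation(xs, ys):
    zx, zy = z_scores(xs), z_scores(ys)
    cross = sum(a * b for a, b in zip(zx, zy))
    return cross, cross / len(xs)   # (sum of cross products, r)


m1 = [1, 2, 3, 4, 5]  # ASSUMED scores for five people
pos_sum, pos_r = correlation(m1, [1, 2, 3, 4, 5])  # sum ~ 5.0, r ~ 1.0
neg_sum, neg_r = correlation(m1, [5, 4, 3, 2, 1])  # sum ~ -5.0, r ~ -1.0
print(pos_sum, pos_r, neg_sum, neg_r)
```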

This means that a perfect association always has a value of r = 1.0 when in positive order and a value of r = -1.0 when in negative order. A value of zero would indicate no linear relationship between the two variables.

The correlation can be shown graphically using a scatter plot.

The correlation is related to the shape of the scatter plot.

The correlation is an INDEX of association. It contains three pieces of information:
1. How much association is present (an index).
2. Is that a “significant” association? That is, can we reject the H0 that the true association is zero?
3. And what is the magnitude of that association? The correlation is r, and r-squared is the amount of variation in Y accounted for by knowing X.
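The second and third pieces of information can be computed from r and the sample size. This Python sketch uses the standard textbook test of H0 (true correlation = 0), t = r√(n - 2) / √(1 - r²) with n - 2 degrees of freedom; that formula is not shown in the slides, and the values r = 0.5 and n = 30 are hypothetical.

```python
import math


def r_summary(r, n):
    r_squared = r ** 2                                  # variation in Y accounted for by X
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)   # tests H0: true correlation = 0
    return r_squared, t, n - 2


r2, t, df = r_summary(0.5, 30)  # hypothetical r and sample size
print(f"r^2 = {r2:.2f}, t({df}) = {t:.2f}")
```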

Be careful! The correlation does not tell you that X is the cause of Y. It is a necessary condition for cause, but it does not prove cause.

cum hoc ergo propter hoc
The belief that correlation proves causation is a logical fallacy by which two events that occur together are claimed to have a cause-and-effect relationship. The fallacy is also known as cum hoc ergo propter hoc (Latin for "with this, therefore because of this") and false cause.

The number of people waiting for a bus or train is highly correlated with how long a person must wait for a ride.

Trains come sooner when more people are waiting for them! Does this mean the train will come sooner if you bring your friends?

Since the 1950s, both the atmospheric CO2 level and crime levels have increased sharply. Hence, atmospheric CO2 causes crime.

Regression tests the degree of association between interval and ratio measures, AND gives the best fit to the data.

Regression does three things:
1. Association
2. Best fit
3. Prediction

Regression creates an equation. A simple linear equation would be: Y = bX + a

Can we use the correlations to create equations to estimate one variable from another?

For example: Evaluation = b × Personality + a, which has the form Y = bX + a.

So… Evaluation = 0.637 * Personality - 0.530
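A least-squares line like this one can be fitted with two sums. This Python sketch shows the fitting step on hypothetical data (the slides' underlying evaluation and personality scores are not available), then uses the slide's own coefficients to make a prediction; `least_squares` and `predict_evaluation` are illustrative names.

```python
def least_squares(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return b, a


# Hypothetical data, only to show the fitting step
b, a = least_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"Y = {b:.2f}X + {a:.2f}")


def predict_evaluation(personality):
    # Coefficients from the slide: Evaluation = 0.637 * Personality - 0.530
    return 0.637 * personality - 0.530


print(round(predict_evaluation(4), 3))
```

Prediction is simply plugging a new X into the fitted equation, which is the third thing regression does.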

The equations do not have to be linear.

Regression can use more than one variable to predict. This is called multiple regression.

When all these variables are put together, as they are in the real world, only the instructor's gender, the section a student took, the student's GPA, and the evaluation the students gave to the instructor were related to the final grade.

Path Diagram

Question: What predicts the evaluation a class and instructor will get?

The final evaluation of the class and instructor is related to (in order):
1. Expected grade in Week 16
2. Actual grade in Week 16
3. Final grade for the class
Note: If all these grades were the same thing, only one would be related.

If we add variables one at a time, sometimes the answer is different. Notice in the diagram how the deserved grade at Week 16 becomes important. Why?

This diagram shows that it is the expected grade at Week 16 that is predicting the evaluation of the class and instructor. The other grades are being used by students to estimate their expected grades.
