Data Analysis and Statistical Software I (323-21-403) Quarter: Autumn 02/03

1 Data Analysis and Statistical Software I (323-21-403) Quarter: Autumn 02/03
Daniela Stan, PhD
Course homepage:
Office hours (no appointment needed): M, 3:00pm - 3:45pm at LOOP, CST 471; W, 3:00pm - 3:45pm at LOOP, CST 471
11/7/2019 Daniela Stan - CSC323

2 Outline
Chapter 7: Inference for Distributions
Summary of tests of significance (Chapter 6)
Inference for the mean of a population (Section 7.1)
The t-distributions
The one-sample t confidence interval
The one-sample t significance test
Robustness of the t procedures

3 General comments on stating hypotheses
It is not easy to state the null and the alternative hypotheses! Often we set Ha first and then define Ho as the “opposite” statement. The hypotheses are statements about population values, not about sample statistics. The alternative hypothesis Ha is often called the “research hypothesis”, because it is the hypothesis we are interested in. A significance test assesses the evidence against the null hypothesis.

4 Significance levels & P-values
If the p-value is small, the null hypothesis should not be accepted (in statistical terminology, it is rejected). Common statistical conventions:
If P is less than α = 0.05, the null hypothesis is rejected at the 5% significance level, and the result is called “statistically significant”.
If P is less than α = 0.01, the null hypothesis is rejected at the 1% significance level, and the result is called “highly significant”.
If P is larger than 0.05, the null hypothesis cannot be rejected, and the result is called “not significant”.
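The verbal labels above can be encoded as a small lookup. This is a minimal sketch, assuming the slide's two thresholds (α = 0.05 and α = 0.01). The function name `significance_label` is hypothetical, and the thresholds are conventions, not laws.

```python
def significance_label(p_value: float) -> str:
    """Map a p-value to the verbal labels used in these slides.

    Thresholds follow the slide's conventions (alpha = 0.05 and 0.01);
    a p-value exactly equal to 0.05 is treated as not significant.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie in [0, 1]")
    if p_value < 0.01:
        return "highly significant (reject Ho at the 1% level)"
    if p_value < 0.05:
        return "statistically significant (reject Ho at the 5% level)"
    return "not significant (cannot reject Ho)"
```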

5 Assumptions when applying z-statistic
1. The population has a normal distribution with mean µ and standard deviation σ.
2. The standard deviation σ is known.
3. The size n of the simple random sample (SRS) is large.
4. The appropriate test statistic for inference about µ when σ is known is the z statistic
$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
where µ0 is the value of the population mean assumed in the null hypothesis Ho. Under Ho, z has the standard normal distribution N(0,1).
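The z statistic and its two-sided p-value can be computed directly from the formula above. This is a minimal sketch using Python's standard-library `statistics.NormalDist`, assuming σ is known as the slide requires. The function name `one_sample_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(xbar: float, mu0: float, sigma: float, n: int) -> tuple[float, float]:
    """Return the z statistic and its two-sided p-value.

    Assumes sigma (the population standard deviation) is known.
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    # Two-sided p-value: area under N(0,1) beyond |z| in both tails.
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided
```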

6 Assumptions when applying z-statistic
Is the z-statistic appropriate to use when:
1. The sample size is small?
2. The population does not have a normal distribution?
3. The population has a normal distribution, but the standard deviation σ is unknown?
When the standard deviation of a statistic (in our case $\bar{x}$) is estimated from the data, the result is called the standard error of the statistic:
$SE_{\bar{x}} = \frac{s}{\sqrt{n}}$
What is the distribution of $\frac{\bar{x} - \mu_0}{s/\sqrt{n}}$? It is not normal!
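A quick simulation shows why the answer is "not normal". The sketch below assumes Ho is true (samples from N(0,1), µ0 = 0) and counts how often the statistic computed with s lands beyond ±1.96, the normal curve's 5% two-sided cutoff. If the statistic really were N(0,1), that share would be close to 0.05. For small n it is noticeably larger. The function name, seed, and trial counts are arbitrary illustrative choices.

```python
import random
from math import sqrt
from statistics import mean, stdev


def exceed_rate(n: int, trials: int = 5000, seed: int = 1) -> float:
    """Share of simulated samples with |(xbar - mu0) / (s / sqrt(n))| > 1.96.

    Each sample of size n comes from N(0, 1) and mu0 = 0, so Ho is true.
    If the statistic were N(0, 1), the share would be close to 0.05.
    """
    rng = random.Random(seed)
    mu0 = 0.0
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        stat = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
        if abs(stat) > 1.96:
            hits += 1
    return hits / trials


for n in (5, 10, 100):
    print(f"n = {n:3d}: share beyond 1.96 in absolute value = {exceed_rate(n):.3f}")
```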

7 The t-distributions
Suppose that an SRS of size n is drawn from an N(µ, σ) population. Then the one-sample t statistic
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
has the t-distribution with n - 1 degrees of freedom when µ0 is the true mean (that is, under Ho). The degrees of freedom come from the standard deviation s in the denominator of t.
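The t statistic differs from z only in replacing σ with the sample standard deviation s. This is a minimal sketch with the standard-library `statistics` module, whose `stdev` uses the n - 1 denominator. The function name `one_sample_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(data: list[float], mu0: float) -> tuple[float, int]:
    """Return the one-sample t statistic and its degrees of freedom.

    s is the sample standard deviation (n - 1 in the denominator),
    which is where the n - 1 degrees of freedom come from.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations to estimate s")
    se = stdev(data) / sqrt(n)  # standard error of the sample mean
    t = (mean(data) - mu0) / se
    return t, n - 1
```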

8 Inference on averages for small samples
When the sample is small, say n < 50, the z-test has to be modified: we need other methods. Consider the following example. A new type of keyboard has been developed, and the producers want to test whether the new design makes data entry easier and faster. They take a random sample of 24 individuals and, for each individual, record the input time of a standard data entry task with the new keyboard. A previous study showed that the same data entry task was completed in 47.20 seconds (on average) using a current type of keyboard. We need to perform a test of significance on the hypotheses:
Null hypothesis Ho: µ = 47.20 sec
Alternative hypothesis Ha: µ ≠ 47.20 sec

9 Inference on averages for small samples (cont.)
The number of observations (n = 24) is so small that the normal approximation is not accurate enough to calculate the p-value. If the data arise from a population with a normal distribution, we can use a different curve, called the t-distribution or Student’s curve.
The t-distribution was discovered by W. S. Gosset (born 13 June 1876 in Canterbury, England), the chief statistician of the Guinness brewery in Dublin, Ireland. He developed it to deal with the small samples arising in statistical quality control. Because the brewery had a policy against employees publishing under their own names, he published his results about the t-distribution under the pen name "Student", and that name has become attached to the distribution.

10 Comparing Student’s curve and the standard normal curve
[Figure: Student’s curve vs. the standard normal curve for d.f. = 5, d.f. = 15 and d.f. = 30; horizontal axis t]
Student’s curve has “fatter” tails than the standard normal curve. For d.f. around 30, Student’s curve is very similar to the standard normal curve.
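The “fatter tails” can be checked numerically from the standard density formula of Student’s curve. This sketch evaluates each curve's height at t = 3 (in the tail) and at t = 0 (the center) for the three d.f. in the figure. The helper names `t_pdf` and `normal_pdf` are hypothetical. As d.f. grows, the tail height drops and the center rises toward the normal curve.

```python
from math import exp, gamma, pi, sqrt


def t_pdf(x: float, df: int) -> float:
    """Height of Student's curve with df degrees of freedom at x."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def normal_pdf(x: float) -> float:
    """Height of the standard normal curve N(0,1) at x."""
    return exp(-x * x / 2) / sqrt(2 * pi)


# Compare the curves far out in the tail (x = 3) and at the center (x = 0).
for df in (5, 15, 30):
    print(f"d.f.={df:2d}: tail {t_pdf(3.0, df):.4f}, center {t_pdf(0.0, df):.4f}")
print(f"normal : tail {normal_pdf(3.0):.4f}, center {normal_pdf(0.0):.4f}")
```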

11 Finding the p-value using Student’s curve
There are many Student’s curves: one for each number of degrees of freedom. For tests on averages:
Degrees of freedom = number of observations – 1
In the previous example we had 24 observations, so the degrees of freedom are d.f. = 24 – 1 = 23. The p-value is found using a table of values for Student’s curves or a statistical package such as SAS. The table for the t-distribution (Table D) is on page T-11 in the appendix.
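Without a table or SAS, the two-sided p-value can be approximated by integrating Student’s curve numerically. This is a minimal sketch, assuming Simpson's rule with 2000 sub-intervals is accurate enough (it is a swapped-in numerical technique, not how SAS or Table D compute values). The inputs t = -2.22 and d.f. = 23 come from the keyboard example. The function names are hypothetical. For comparison, the sketch also shows the normal-curve answer, which understates the p-value for small samples.

```python
from math import gamma, pi, sqrt
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Height of Student's curve with df degrees of freedom at x."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value P(|T| >= |t|) under Student's curve with df d.f.

    Integrates the density over [0, |t|] with Simpson's rule (steps must
    be even) and uses the symmetry of the curve about 0.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # area between 0 and |t|
    return 1 - 2 * central_half


t_obs, df = -2.22, 24 - 1  # observed t and d.f. from the keyboard example
p_student = t_two_sided_p(t_obs, df)
p_normal = 2 * (1 - NormalDist().cdf(abs(t_obs)))  # normal-curve answer, for contrast
print(f"Student's curve, d.f.={df}: p = {p_student:.4f}")
print(f"Standard normal curve:      p = {p_normal:.4f}")
```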

12 The one-sample t test
Step 1: Specify the hypotheses of the significance test: set up the null hypothesis Ho and the alternative hypothesis Ha.
Step 2: Compute the test statistic. The test statistic measures the difference between the data and what is expected under the null hypothesis.
Step 3: Determine the appropriate Student’s curve. The p-value is obtained NOT from the normal curve but from one of Student’s curves, with degrees of freedom d.f. = number of observations – 1.

13 The one-sample t test (cont.)
Step 4: Compute the p-value using Student’s curve with the degrees of freedom calculated in Step 3.
Step 5: Draw a conclusion about the test on the basis of the p-value. Small p-values are evidence against the null hypothesis: they indicate that the observed difference from Ho is unlikely to be due to chance alone.
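The five steps can be strung together in one function. This is a minimal sketch, assuming a two-sided Ha and α = 0.05 by default. It uses a Simpson's-rule approximation to Student’s curve for the p-value (a numerical stand-in for Table D), and that helper is included so the block runs on its own. All function names are hypothetical.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev


def t_pdf(x: float, df: int) -> float:
    """Height of Student's curve with df degrees of freedom at x."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|) via Simpson's rule on [0, |t|] (steps must be even)."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3


def one_sample_t_test(data: list[float], mu0: float, alpha: float = 0.05) -> dict:
    # Step 1: hypotheses are Ho: mu = mu0 versus the two-sided Ha: mu != mu0.
    n = len(data)
    # Step 2: test statistic t = (xbar - mu0) / (s / sqrt(n)).
    t = (mean(data) - mu0) / (stdev(data) / sqrt(n))
    # Step 3: choose Student's curve with d.f. = n - 1.
    df = n - 1
    # Step 4: p-value from that curve (not from the normal curve).
    p = t_two_sided_p(t, df)
    # Step 5: conclusion based on the p-value.
    return {"t": t, "df": df, "p": p, "reject_Ho": p < alpha}
```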

14 When to use the t-test
When should we use it? Each of the following conditions should hold:
1. We are computing a statistical test on averages.
2. The sample is a simple random sample.
3. The number of observations is small: the sample size n is less than 30.
4. The distribution of the population is bell-shaped, not too different from the normal distribution. (This is not easy to check, but it is typically true for measurements.)

15 Tests on averages: z-test or t-test?
Amount of current data | Distribution of the population | Test to use
Large | (any) | Use the z-test & the normal curve
Small (n < 50) | Unknown but not different from the normal curve | Use the t-test & Student’s curve
Small (n < 50) | Unknown but quite different from the normal curve | Do not use the t-test!
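The decision table can be read as a small function. This is a sketch, assuming the cutoff n < 50 from this slide. Slide 14 states 30 instead, so the cutoff is a parameter. Judging whether the population "looks normal" is left to the analyst as a boolean input. The function name is hypothetical.

```python
def choose_mean_test(n: int, population_looks_normal: bool, small_n: int = 50) -> str:
    """Encode the slide's decision table for tests on averages.

    small_n=50 follows this slide; slide 14 uses 30 instead.
    """
    if n >= small_n:
        return "z-test with the normal curve"
    if population_looks_normal:
        return "t-test with Student's curve"
    return "do not use the t-test"
```

For the keyboard data (n = 24, population assumed approximately normal), the function returns the t-test.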

16 Example: keyboard data
Null hypothesis Ho: µ = 47.20 sec
Alternative hypothesis Ha: µ ≠ 47.20 sec
Assume the data are drawn from an approximately normal population. The sample size is small, so we use the t-test, computed with SAS. By default, PROC UNIVARIATE tests for a zero population mean (Mu0=0 in the output; the MU0= option can change this). To test Ho: µ = c for some nonzero constant c with the default:
1. Transform the data by subtracting c from every observation.
2. The new null hypothesis is Ho: µ_new = 0.
PROC UNIVARIATE computes the two-sided p-value "Pr > |t|" for the alternative hypothesis Ha: µ_new ≠ 0.
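Why the subtraction trick works: subtracting c shifts $\bar{x}$ by c but leaves s unchanged, so the t statistic for Ho: µ = c on the original data equals the t statistic for Ho: µ_new = 0 on the shifted data. This minimal sketch checks that identity in Python. It does not reproduce SAS itself, and both function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(data: list[float], mu0: float) -> float:
    """One-sample t statistic (xbar - mu0) / (s / sqrt(n))."""
    return (mean(data) - mu0) / (stdev(data) / sqrt(len(data)))


def subtract_constant(data: list[float], c: float) -> list[float]:
    """Shift every observation by -c, as done before running PROC UNIVARIATE."""
    return [x - c for x in data]
```

For any sample `data`, `t_statistic(data, 47.20)` and `t_statistic(subtract_constant(data, 47.20), 0.0)` agree up to floating-point rounding.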

17 SAS output
Testing for Population Mean Completion Time of 47.20
The UNIVARIATE Procedure
…
Tests for Location: Mu0=0
Test          Statistic   p Value
Student's t   t           Pr > |t|
Sign          M           Pr >= |M|
Signed Rank   S           Pr >= |S|
The observed t = -2.22 < 0, and the two-sided p-value is less than 0.05. Therefore the result is significant: we can reject the null hypothesis and conclude that the new keyboard is better. Since t = -2.22 is negative, we can conclude that the average completion time is probably shorter.
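SAS reports only the two-sided "Pr > |t|". If the research question is explicitly one-sided (for example Ha: µ < 47.20, "the new keyboard is faster"), the symmetry of Student’s curve gives the one-sided p-value directly. It is half the two-sided value when t points in the direction of Ha, and one minus that half otherwise. This is a minimal sketch; the function name and the `direction` values are hypothetical.

```python
def one_sided_from_two_sided(p_two: float, t: float, direction: str) -> float:
    """Convert a two-sided p-value (Pr > |t|) into a one-sided one.

    direction = "less" for Ha: mu < mu0, "greater" for Ha: mu > mu0.
    Valid because Student's curve is symmetric about 0.
    """
    if direction not in ("less", "greater"):
        raise ValueError("direction must be 'less' or 'greater'")
    agrees = t < 0 if direction == "less" else t > 0
    return p_two / 2 if agrees else 1 - p_two / 2
```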

