Introduction to the Statistical Analysis Using SPSS


1 Introduction to the Statistical Analysis Using SPSS
Lecture # 2 By: Dr. Nahed Mohammad Hilmy Department of Statistical and Operations Research

2 T-Test Hypothesis testing involves making a decision about a hypothesis, or statement, concerning a population parameter such as the population mean $\mu$. The sample mean $\bar{x}$ is used to decide whether the stated value of $\mu$ is valid or not. The steps of hypothesis testing are as follows. 1- The first step is to formulate a null hypothesis, written $H_0$. The statement for $H_0$ is usually expressed as an equation or inequality, as follows: $H_0: \mu = \mu_0$ (or $\mu \ge \mu_0$, or $\mu \le \mu_0$), where $\mu_0$ is the stated value.

3 Also in this step we state an alternative hypothesis, written $H_1$, a statement that indicates the opinion of the person conducting the test as to the actual value of $\mu$. It is expressed as follows: $H_1: \mu \ne \mu_0$, $H_1: \mu > \mu_0$, or $H_1: \mu < \mu_0$. We conduct a hypothesis test on a given value $\mu_0$ to find out whether the actual observations would lead us to reject the stated value.

4 T-Test The alternative hypothesis suggests the direction of the actual value of the parameter relative to the stated value. A statement of $H_1$ in the form $\mu \ne \mu_0$ indicates that the investigator has no opinion as to whether the actual value of $\mu$ is more than or less than the stated value, only a feeling that the stated value is incorrect. In this case the test is a two-tailed test. Statements of the form $\mu > \mu_0$ (strictly greater than) or $\mu < \mu_0$ (strictly less than) indicate that the investigator has an opinion as to the direction of the value of the parameter relative to the stated value. In this case the test is called a one-tailed test.

5 T-Test 2- State the level of significance $\alpha$ of the test and the corresponding Z values (for large-sample tests) or T values (for small-sample tests). Hypothesis tests are most frequently conducted at the 10%, 5% and 1% levels of significance, and the corresponding Z values are read from the standard normal table. For a test conducted at any other level of significance, we simply use the normal distribution table to determine the corresponding Z value. 3- Calculate the test statistic for the sample that has been taken. There are three cases:

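6 T-Test Case 1: The variable has a normal distribution and $\sigma$ is known. In this case the test statistic is $Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$, which has a standard normal distribution if $H_0$ is true. Case 2: The variable has a normal distribution and $\sigma$ is unknown. The test statistic is $T = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$, which has a $t_{n-1}$ distribution if $H_0$ is true.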

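7 T-Test Case 3: The variable is not normal but $n$ is large ($n > 30$); $\sigma$ may be known or unknown. The test statistic is $Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$, with $\sigma$ replaced by $s$ when it is unknown. By the central limit theorem it has approximately a standard normal distribution $N(0, 1)$ if $H_0$ is true.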
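The three cases can be sketched in Python using only the standard library. The function name `one_sample_statistic` is hypothetical, not from the lecture or from SPSS. Passing `sigma` covers Case 1; leaving it out computes the statistic with the sample standard deviation $s$, which is then compared with the $t_{n-1}$ table in Case 2, or treated as approximately $N(0, 1)$ in Case 3 when $n$ is large.

```python
import math
from statistics import mean, stdev


def one_sample_statistic(data, mu0, sigma=None):
    """Test statistic for H0: mu = mu0.

    Case 1 (sigma known):         Z = (xbar - mu0) / (sigma / sqrt(n)), returns df=None.
    Case 2 (sigma unknown, normal): T = (xbar - mu0) / (s / sqrt(n)), df = n - 1.
    Case 3 (large n, sigma unknown): same formula with s, read as approximately N(0, 1).
    """
    n = len(data)
    xbar = mean(data)
    if sigma is not None:
        return (xbar - mu0) / (sigma / math.sqrt(n)), None
    s = stdev(data)  # sample standard deviation (divisor n - 1)
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1
```

The caller decides whether to read the result against the Z table or the t table, since that depends on which case applies, not on the arithmetic.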

8 T-Test 4- Determine the boundary (or boundaries) of the rejection region using either $Z$ or $t$ values. A critical value is the boundary, or limit, value beyond which we are required to reject the statement of the null hypothesis.

9 T-Test [Figure: two-tailed test, with a lower and an upper rejection region]
In a non-directional (two-tailed) test there are two critical values, lower and upper, when $H_1: \mu \ne \mu_0$.

10 T-Test [Figure: one-tailed test, with an upper rejection region]
In a directional (one-tailed) test there is one critical value (the upper boundary) when $H_1: \mu > \mu_0$.

11 T-Test [Figure: one-tailed test, with a lower rejection region]
In a directional (one-tailed) test there is one critical value (the lower boundary) when $H_1: \mu < \mu_0$.
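As a sketch, the Z boundaries for the three kinds of rejection region can be computed rather than read from the normal table, using `statistics.NormalDist` from the Python standard library (Python 3.8+). The names `z_critical` and `reject_h0` are illustrative only; for small-sample t tests the Z boundaries would be replaced by values from the t table.

```python
from statistics import NormalDist


def z_critical(alpha: float, two_tailed: bool) -> float:
    """Positive Z boundary for a test at significance level alpha."""
    tail_area = alpha / 2 if two_tailed else alpha
    # inv_cdf(p) returns z with P(Z <= z) = p, so take the upper-tail quantile.
    return NormalDist().inv_cdf(1 - tail_area)


def reject_h0(z: float, alpha: float, alternative: str) -> bool:
    """True if the observed Z falls in the rejection region.

    alternative: "two-sided" (mu != mu0), "greater" (mu > mu0) or "less" (mu < mu0).
    """
    if alternative == "two-sided":
        c = z_critical(alpha, two_tailed=True)  # lower boundary -c, upper boundary +c
        return z < -c or z > c
    if alternative == "greater":
        return z > z_critical(alpha, two_tailed=False)   # single upper boundary
    if alternative == "less":
        return z < -z_critical(alpha, two_tailed=False)  # single lower boundary
    raise ValueError(f"unknown alternative: {alternative!r}")


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha={alpha:.2f}  one-tailed z={z_critical(alpha, False):.3f}  "
          f"two-tailed z={z_critical(alpha, True):.3f}")
```

For $\alpha = 0.05$ this gives the familiar boundaries 1.645 (one-tailed) and 1.960 (two-tailed).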

12 The critical value is simply the maximum or minimum value that we are willing to accept as being consistent with the stated parameter $\mu_0$. The mean of the sampling distribution of $\bar{x}$ is given by $\mu_{\bar{x}} = \mu_0$, and its standard deviation is given by $\sigma_{\bar{x}} = \sigma / \sqrt{n}$. 5- Formulate a decision rule on the basis of the boundary values obtained in step 4. When we conduct a hypothesis test, we are required to make one of two decisions: a- Reject $H_0$, or b- Accept $H_0$.
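A small sketch of the critical value expressed on the scale of $\bar{x}$ itself: with $\sigma$ known, the two-tailed acceptance interval is $\mu_0 \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, built from the mean and standard deviation of the sampling distribution given above. The helper name and the numbers used ($\mu_0 = 50$, $\sigma = 10$, $n = 25$) are hypothetical.

```python
import math
from statistics import NormalDist


def xbar_critical_values(mu0, sigma, n, alpha):
    """Lower and upper critical values of xbar for a two-tailed test of H0: mu = mu0.

    Under H0 the sampling distribution of xbar has mean mu0 and standard
    deviation sigma / sqrt(n), so the boundaries are mu0 -/+ z * sigma / sqrt(n).
    """
    se = sigma / math.sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return mu0 - z * se, mu0 + z * se


print(xbar_critical_values(50, 10, 25, 0.05))
```

A sample mean outside this interval leads to rejecting $H_0$; one inside it leads to accepting $H_0$.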

13 It is possible to make two errors in a decision. One error is called a type I error, or $\alpha$-error. We make a type I error whenever we reject the statement of $H_0$ when $H_0$ is in fact true. The probability of making a type I error is the level of significance $\alpha$ of the test. The second error we can make in a hypothesis test is called a type II error, or $\beta$-error. We commit a type II error if we fail to reject the statement of $H_0$ when $H_0$ is in fact false. The four combinations of the truth of $H_0$ and the resulting decisions are summarized below:

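14 Decision versus the truth of $H_0$:
Accept $H_0$: correct decision if $H_0$ is true; type II error ($\beta$) if $H_0$ is false.
Reject $H_0$: type I error ($\alpha$) if $H_0$ is true; correct decision if $H_0$ is false.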
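The statement that the probability of a type I error equals $\alpha$ can be checked by simulation: draw many samples for which $H_0$ is true and count how often a two-tailed Z test rejects it. This is a sketch with hypothetical settings (standard normal data, $n = 10$, 20,000 trials, a fixed seed); the rejection rate should come out close to $\alpha = 0.05$, not exactly equal to it.

```python
import math
import random
from statistics import NormalDist, fmean


def simulated_type_i_rate(mu0=0.0, sigma=1.0, n=10, alpha=0.05, trials=20_000, seed=1):
    """Estimate P(reject H0 | H0 true) for a two-tailed Z test by simulation."""
    rng = random.Random(seed)
    c = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed boundary
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]  # H0 is true by construction
        z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
        if abs(z) > c:
            rejections += 1
    return rejections / trials


print(simulated_type_i_rate())
```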

15 When we lower the level of significance of a hypothesis test (keeping the sample size fixed), we increase the probability of committing a $\beta$-error. 6- State a conclusion for the hypothesis test based on the sample data obtained and the decision rule stated in step 5.
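To illustrate the trade-off, the sketch below computes $\beta$ for a one-tailed Z test of $H_0: \mu = \mu_0$ against $H_1: \mu > \mu_0$ when the true mean is $\mu_1$, using $\beta = P\left(Z \le z_\alpha - (\mu_1 - \mu_0)\sqrt{n}/\sigma\right)$. The values $\mu_0 = 50$, $\mu_1 = 52$, $\sigma = 5$, $n = 25$ are hypothetical.

```python
import math
from statistics import NormalDist


def type_ii_error(alpha, mu0, mu1, sigma, n):
    """beta for the one-tailed Z test H0: mu = mu0 vs H1: mu > mu0 when the true mean is mu1."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    shift = (mu1 - mu0) * math.sqrt(n) / sigma
    # We fail to reject when Z <= z_alpha; under mu1, Z is normal with mean `shift`.
    return std_normal.cdf(z_alpha - shift)


for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(type_ii_error(alpha, mu0=50, mu1=52, sigma=5, n=25), 3))
```

With $n$ fixed, each reduction of $\alpha$ moves the boundary $z_\alpha$ outward, so $\beta$ grows.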

16 P-value of a test: The p-value is the probability, computed assuming $H_0$ is true, of getting a value of the test statistic at least as extreme as the observed value $z_0$. When $H_1$ is $\mu \ne \mu_0$: p-value $= 2P(Z > |z_0|)$. When $H_1$ is $\mu > \mu_0$: p-value $= P(Z > z_0)$. When $H_1$ is $\mu < \mu_0$: p-value $= P(Z < z_0)$.
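The three Z p-value formulas can be sketched with the standard normal CDF from Python's `statistics.NormalDist`; the function name `z_p_value` is illustrative. The t version needs the t distribution, which the standard library does not provide; if SciPy is available, `scipy.stats.t.sf(x, df)` gives the upper-tail probability $P(T_{df} > x)$.

```python
from statistics import NormalDist


def z_p_value(z0, alternative):
    """p-value for an observed Z statistic z0, computed assuming H0 is true."""
    phi = NormalDist().cdf
    if alternative == "two-sided":
        return 2 * (1 - phi(abs(z0)))  # 2 P(Z > |z0|)
    if alternative == "greater":
        return 1 - phi(z0)             # P(Z > z0)
    if alternative == "less":
        return phi(z0)                 # P(Z < z0)
    raise ValueError(f"unknown alternative: {alternative!r}")


alpha = 0.05
p = z_p_value(2.1, "two-sided")
print(p, "reject H0" if p < alpha else "accept H0")
```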

17 If we have a T statistic with a $t_{n-1}$ distribution and observe the value $t_0$, these p-values become:
$\ne$ alternative: p-value $= 2P(T_{n-1} > |t_0|)$; $>$ alternative: p-value $= P(T_{n-1} > t_0)$; $<$ alternative: p-value $= P(T_{n-1} < t_0)$.

18 Thus $H_0$ is rejected if p-value $< \alpha$. When data are collected from a normally distributed population with unknown $\sigma$ and the sample size is small, the t values of Student's t distribution must be used in the hypothesis test, not the Z values of the normal distribution. This is because the large-sample normal approximation given by the central limit theorem cannot be relied on when $n < 30$.

19 Ex: Suppose we measure the sulfur content (as a percent) of 15 samples of crude oil from a particular Middle Eastern area, obtaining: 1.9, 2.3, 2.9, 2.5, 2.1, 2.7, 2.8, 2.6, 2.6, 2.5, 2.7, 2.2, 2.8, 2.7, 3. Assume that sulfur content is normally distributed. Can we conclude that the average sulfur content in this area is less than 2.6? Use a level of significance of .05.

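21 [Figure: t distribution with 14 degrees of freedom; acceptance area .95 and lower rejection region of area .05 to the left of the critical value $-t_{0.05,14} = -1.761$]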
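A sketch of the calculation for this example, assuming the data exactly as listed and the lower-tailed test $H_0: \mu = 2.6$ versus $H_1: \mu < 2.6$. The critical value $-1.761$ is the tabulated $t_{0.05,14}$ value, entered by hand because the standard library has no t distribution.

```python
import math
from statistics import mean, stdev

sulfur = [1.9, 2.3, 2.9, 2.5, 2.1, 2.7, 2.8, 2.6, 2.6, 2.5, 2.7, 2.2, 2.8, 2.7, 3.0]
mu0 = 2.6

n = len(sulfur)
xbar = mean(sulfur)
s = stdev(sulfur)                     # sample standard deviation, n - 1 divisor
t_stat = (xbar - mu0) / (s / math.sqrt(n))

# Lower-tail critical value -t(0.05, 14) = -1.761, taken from a t table.
t_crit = -1.761
print(f"n={n}, mean={xbar:.4f}, s={s:.4f}, T={t_stat:.3f}")
print("reject H0" if t_stat < t_crit else "do not reject H0")
```

With these data $T$ comes out at about $-0.58$, which lies above $-1.761$, so $H_0$ is not rejected at $\alpha = .05$: the sample does not support the claim that the average sulfur content is less than 2.6.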

24 Testing for the Difference in Two Population means:
Often we have two populations whose means we would like to compare. Independent random samples of sizes $n_1$ and $n_2$ are selected from the two populations, with no relationship between the elements drawn from the two populations. The statistical hypotheses are given by: $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \ne \mu_2$ (or $H_1: \mu_1 > \mu_2$, or $H_1: \mu_1 < \mu_2$).

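26 There are three cases, which depend on what is known about the population variances.
Case 1: The population variances $\sigma_1^2$ and $\sigma_2^2$ are known, for normal populations (or for non-normal populations with both $n_1$ and $n_2$ large). In this case the test statistic is $Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$, which has a standard normal distribution (approximately so for large non-normal samples) if $H_0$ is true.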

27 Case 2: The population variances are unknown but assumed to be equal ($\sigma_1^2 = \sigma_2^2$), in normal populations. In this case we pool our estimates to get the pooled two-sample variance $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$.

28 The test statistic is $T = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}}$, which has a $t_{n_1 + n_2 - 2}$ distribution if $H_0$ is true.

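29 Case 3: $\sigma_1^2$ and $\sigma_2^2$ are unknown and unequal, in normal populations. In this case the test statistic is given by $T' = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$, which does not have an exact known distribution under $H_0$.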
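The three two-sample statistics can be sketched side by side; the function name and its `case` argument are hypothetical conveniences, and the returned value still has to be compared with the Z table (Case 1) or the appropriate t table (Case 2).

```python
import math
from statistics import mean, variance


def two_sample_statistic(x1, x2, case, sigma1=None, sigma2=None):
    """Statistic for H0: mu1 = mu2 from independent samples x1 and x2.

    case 1: variances known   -> Z using sigma1**2/n1 + sigma2**2/n2
    case 2: equal, unknown    -> pooled T with df = n1 + n2 - 2
    case 3: unequal, unknown  -> T' using s1**2/n1 + s2**2/n2 (no exact t distribution)
    """
    n1, n2 = len(x1), len(x2)
    diff = mean(x1) - mean(x2)
    if case == 1:
        return diff / math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    v1, v2 = variance(x1), variance(x2)  # sample variances (divisor n - 1)
    if case == 2:
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
        return diff / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    if case == 3:
        return diff / math.sqrt(v1 / n1 + v2 / n2)
    raise ValueError("case must be 1, 2 or 3")
```

When $n_1 = n_2$ the pooled and unpooled statistics coincide numerically; they differ only in which distribution they are referred to.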

30 Ex: The amount of solar ultraviolet light of wavelength from 290 to 320 nm reaching the earth's surface in the Riyadh area was measured for independent samples of days in the cooler months (October to March) and in the warmer months (April to September): Cooler: 5.31, 4.36, 3.71, 3.74, 4.51, 4.58, 4.64, 3.83, 3.16, 3.67, 4.34, 2.95, 3.62, 3.29, 2.45. Warmer: 4.07, 3.83, 4.75, 4.84, 5.03, 5.48, 4.11, 4.15, 3.9, 4.39, 4.55, 4.91, 4.11, 3.16, 2.99, 3.01, 3.5, 3.77.

31 Assuming normal distributions with equal variances, test whether there is a difference in the average ultraviolet light reaching Riyadh in the cooler and warmer months. Use a level of significance of .05.

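33 The pooled two-sample variance is $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$, with $n_1 = 15$ (cooler) and $n_2 = 18$ (warmer).
The test statistic is $T = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}}$, with $n_1 + n_2 - 2 = 31$ degrees of freedom.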

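34 [Figure: t distribution with 31 degrees of freedom; acceptance area .95 in the middle and rejection areas of .025 in each tail]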

37 Since the value of the test statistic lies in the acceptance region, $H_0$ is accepted at $\alpha = .05$.
This means that the data show no significant difference in the average ultraviolet light reaching Riyadh in the cooler and warmer months.
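The pooled-variance test for this example can be recomputed as a check. This sketch uses the data exactly as listed; the two-tailed critical value 2.040 is the tabulated $t_{0.025,31}$, entered by hand.

```python
import math
from statistics import mean, variance

cooler = [5.31, 4.36, 3.71, 3.74, 4.51, 4.58, 4.64, 3.83, 3.16, 3.67,
          4.34, 2.95, 3.62, 3.29, 2.45]
warmer = [4.07, 3.83, 4.75, 4.84, 5.03, 5.48, 4.11, 4.15, 3.9, 4.39,
          4.55, 4.91, 4.11, 3.16, 2.99, 3.01, 3.5, 3.77]

n1, n2 = len(cooler), len(warmer)
# Pooled two-sample variance (equal-variance assumption).
sp2 = ((n1 - 1) * variance(cooler) + (n2 - 1) * variance(warmer)) / (n1 + n2 - 2)
t_stat = (mean(cooler) - mean(warmer)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Two-tailed critical value t(0.025, 31) = 2.040, taken from a t table.
t_crit = 2.040
print(f"df={n1 + n2 - 2}, pooled variance={sp2:.4f}, T={t_stat:.3f}")
print("reject H0" if abs(t_stat) > t_crit else "accept H0")
```

The statistic comes out at about $T \approx -1.04$, well inside $\pm 2.040$, which agrees with accepting $H_0$.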

38 Dependent Samples: The method of comparing parameters of populations using paired dependent samples requires that we pair the items of data as we sample them from the two populations. Furthermore, the sizes of the samples selected from both populations are the same, that is, $n_1 = n_2 = n$.

39 For each pair $x_{1i}$ (the elements of the sample before the experiment) and $x_{2i}$ (the elements of the sample after the experiment) obtained in the two samples, we compute a value $d_i = x_{1i} - x_{2i}$ of a random variable $D$, which represents the difference between the two populations; $n$ is the number of items of data obtained in each of the two samples.

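40 The samples drawn from the two populations are therefore converted to a single sample: a sample of the differences $d_1, \dots, d_n$.
The mean, $\bar{d}$, and the standard deviation, $s_d$, of the sample of differences are obtained as follows: $\bar{d} = \frac{\sum d_i}{n}$, $s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}}$.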

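41 We are interested in one of the following tests of hypothesis: $H_0: \mu_D = 0$ versus $H_1: \mu_D \ne 0$, $H_1: \mu_D > 0$, or $H_1: \mu_D < 0$.
Under $H_0$, the quantity $T = \frac{\bar{d}}{s_d / \sqrt{n}}$ has a $t_{n-1}$ distribution.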
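The paired procedure reduces to a one-sample t statistic on the differences. A minimal sketch; the function name `paired_t` and the small data set in the check are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Paired-sample T for H0: mu_D = 0, with d_i = before_i - after_i; returns (T, df)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same size n")
    d = [x1 - x2 for x1, x2 in zip(before, after)]  # one difference per pair
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n)), n - 1


print(paired_t([10, 12, 14], [9, 10, 11]))
```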

42 Ex: In an experiment comparing two feeding methods for calves, eight pairs of twins were used, one twin receiving Method A and the other twin receiving Method B. At the end of a given time, the calves were slaughtered and cooked, and the meat was rated for its taste (with a higher number indicating a better taste).

43 Method B Method A Twin pair 23 27 1 28 37 2 30 31 3 32 38 4 29 5 35 6 36 41 7 8

44 Assuming approximate normality, test whether the average taste score for calves fed by Method B is less than the average taste score for calves fed by Method A. Use $\alpha = .05$.

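45 Differences $d$ (Method A minus Method B) and their squares, by twin pair 1 to 8:
d: 4, 9, 1, 6, 2, 6, 5, 6 (total $\sum d = 39$)
d²: 16, 81, 1, 36, 4, 36, 25, 36 (total $\sum d^2 = 235$)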

47 The test statistic is $T = \frac{\bar{d}}{s_d / \sqrt{n}}$ with $n = 8$ pairs, that is, 7 degrees of freedom.

48 [Figure: t distribution with 7 degrees of freedom; acceptance area .95 and upper rejection region of area .05]
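A sketch of the calf example computed from the column totals of the differences table ($n = 8$, $\sum d = 39$, $\sum d^2 = 235$), assuming $d$ = Method A score minus Method B score, so that the claim "Method B is less than Method A" corresponds to $H_1: \mu_D > 0$. The upper critical value 1.895 is the tabulated $t_{0.05,7}$.

```python
import math

# Column totals of the differences table (d = Method A - Method B), n = 8 twin pairs.
n, sum_d, sum_d2 = 8, 39, 235

d_bar = sum_d / n
s_d = math.sqrt((sum_d2 - sum_d**2 / n) / (n - 1))  # shortcut formula for s_d
t_stat = d_bar / (s_d / math.sqrt(n))

# One-tailed (upper) critical value t(0.05, 7) = 1.895, taken from a t table.
t_crit = 1.895
print(f"d_bar={d_bar:.3f}, s_d={s_d:.3f}, T={t_stat:.2f}")
print("reject H0" if t_stat > t_crit else "do not reject H0")
```

Here $\bar{d} = 4.875$ and $T \approx 5.45 > 1.895$, so under these assumptions $H_0$ is rejected: the average taste score for Method B appears lower than for Method A.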

52 Quality Control A “defect” is an instance of a failure to meet a requirement imposed on a unit with respect to a single quality characteristic. In inspection or testing, each unit is checked to see whether it does or does not contain any defects. Testing every unit is usually impractical: for example, if every dosage unit were tested, the expense would probably be prohibitive to both manufacturer and consumer. Testing every unit may also cause misclassification of items and other errors. Quality can be accurately and precisely estimated by testing only part of the total material (a sample), which requires only small samples for inspection or analysis.

53 Data obtained from this sampling can then be treated statistically to estimate population parameters. After inspecting $n$ units we will have found, say, $d$ of them to be defective and $n - d$ of them to be good. Alternatively, we may count and record the number of defects, $c$, found on a single unit. This count may be 0, 1, 2, …. Counting defects on a unit becomes especially useful if most of the units contain one or more defects.

54 Control charts can be applied during in-process manufacturing operations, for finished-product characteristics, and in research and development for repetitive procedures. We can always convert a measurable characteristic x of a unit to an attribute by setting limits, say L (lower bound) and U (upper bound), for x. Then if x lies between them the unit is a good one, and if it lies outside them it is a defective one. As an example of a control chart, consider tablet weight.
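Converting a measurement to an attribute with limits L and U can be sketched as follows. The weights and limits are hypothetical, and counting a value exactly on a limit as good is an assumption of this sketch.

```python
def classify_units(measurements, lower, upper):
    """Convert a measured characteristic x to an attribute: good if lower <= x <= upper.

    Returns (n, d, n - d): units inspected, defectives, good units.
    """
    n = len(measurements)
    good = sum(1 for x in measurements if lower <= x <= upper)
    d = n - good  # number of defective units
    return n, d, n - d


weights = [298.5, 301.2, 305.0, 299.8, 294.1]  # hypothetical tablet weights
print(classify_units(weights, lower=296.0, upper=304.0))
```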

55 We are interested in ensuring that tablet weight remains close to a target value under “statistical control”. To achieve this objective, we periodically sample a group of tablets and measure their mean weight and variability. Variability can be calculated on the basis of the standard deviation or the range, where the range is the difference between the highest and lowest values.

56 If the sample size is small (fewer than 10), the range is an efficient estimator of the standard deviation. The mean weight and variability of each sample (subgroup) are plotted sequentially as a function of time. The control chart is a graph that has time, or order of submission of sequential lots, on the x axis and the average test result on the y axis. The subgroups should be as homogeneous as possible relative to the overall process. They are usually (but not always) taken as units manufactured close together in time.

57 Four to five items per subgroup is usually an adequate sample size. In our example, 10 tablets are individually weighed at approximately 1-hour intervals. The mean and range are calculated for each subgroup sample. As long as the mean and range of the 10-tablet samples do not vary “too much” from subgroup to subgroup, the product is considered to be in control (meaning that the observed variation is due only to the random, uncontrolled variation inherent in the process).
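Computing the mean and range of each subgroup is straightforward; this sketch uses two hypothetical hourly subgroups of 5 weights (the lecture's example uses 10 tablets per subgroup).

```python
from statistics import mean


def subgroup_summary(subgroups):
    """Return (mean, range) for each subgroup; range = highest value - lowest value."""
    return [(mean(g), max(g) - min(g)) for g in subgroups]


# Hypothetical hourly subgroups of tablet weights.
hourly = [[300, 302, 298, 301, 299], [297, 303, 300, 300, 305]]
for m, r in subgroup_summary(hourly):
    print(f"mean={m:.1f}  range={r}")
```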

58 We will define upper and lower limits for the mean and range of the subgroups. The construction of these limits is based on the normal distribution. In particular, a value more than 3 standard deviations from the mean is highly unlikely and can be considered probably due to some systematic, assignable cause. The average line (the target value) may be determined from the history of the product, with regular updating, or from the product specifications.

59 The action lines (the limits) are constructed to represent $\pm 3$ standard deviations ($3\sigma$ limits) from the target value. The upper and lower limits for the mean chart are $\bar{\bar{x}} \pm A\bar{R}$, where $\bar{\bar{x}}$ is the average of the subgroup means, $\bar{R} = \sum R_i / K$ is the average range, $K$ is the number of samples (subgroups), and $A$ is a factor obtained from a table according to the sample size.

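60 For the mean chart: Central line $= \bar{\bar{x}}$, Lower limit $= \bar{\bar{x}} - A\bar{R}$, Upper limit $= \bar{\bar{x}} + A\bar{R}$.
The central line and the upper and lower limits for the range chart are given by: Central line $= \bar{R}$, Lower limit $= D_3\bar{R}$, Upper limit $= D_4\bar{R}$.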

61 Here $D_3$ and $D_4$ are factors obtained from a table according to the sample size. Note that the sample size is assumed to be constant. Ex: Tablet weights and ranges from a tablet manufacturing process (data are the average and range of 10 tablets):

62 5 12 300.3 17 Range R Mean Time Date 16 302.4 11 a.m. 3/1 13 298.4
12 p.m. 10 300.2 1 p.m. 9 299 2 p.m. 300.4 3/5 5 12 300.3 17

63 Range R Mean Time Date 18 300.8 11 a.m. 3/9 6 301.5 12 p.m. 7 301.6 1 p.m. 8 301.3 2 p.m. 12 301.7 3/11 9 303 300.5 11 299.3

64 Range R Mean Time Date 13 300 11 a.m. 3/16 8 299.1 12 p.m. 300.1 1 p.m. 10 303.5 2 p.m. 14 297.2 3/22 9 296.2 11 297.4 12 296

67 [Figure: mean chart of tablet weight by date (3/1 to 3/22), with center line CL = 300 and upper and lower control limits (UCL, LCL)]

68 [Figure: range chart by date (3/1 to 3/22), with UCL = 19.283, CL = 10.833 and LCL = 2.383]
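The chart limits can be sketched from the subgroup means and ranges. The factor values used here for subgroups of 10 ($A = 0.31$, $D_3 = 0.22$, $D_4 = 1.78$) are assumed from a standard control-chart table rounded to two decimals (more precise tables list 0.308, 0.223 and 1.777); the function name is hypothetical. Passing a single mean of 300 and a single range of 10.833 simply stands in for the chart's center lines.

```python
def xbar_r_limits(subgroup_means, subgroup_ranges, A, D3, D4):
    """(lower, central, upper) lines for the mean chart and the range chart.

    A, D3, D4 are table factors for the (constant) subgroup size.
    """
    k = len(subgroup_means)                       # K, the number of subgroups
    grand_mean = sum(subgroup_means) / k
    r_bar = sum(subgroup_ranges) / len(subgroup_ranges)
    mean_chart = (grand_mean - A * r_bar, grand_mean, grand_mean + A * r_bar)
    range_chart = (D3 * r_bar, r_bar, D4 * r_bar)
    return mean_chart, range_chart


mean_chart, range_chart = xbar_r_limits([300.0], [10.833], A=0.31, D3=0.22, D4=1.78)
print("mean chart (LCL, CL, UCL):", [round(v, 3) for v in mean_chart])
print("range chart (LCL, CL, UCL):", [round(v, 3) for v in range_chart])
```

With $\bar{R} = 10.833$ these factors reproduce the range-chart limits shown on the chart (LCL about 2.383, UCL about 19.283).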

