
1 Test of Categorical Data / Proportion

2 2 Estimation
- Estimate population means
- Estimate population proportion
- Estimate population variance
Hypothesis testing
- Testing population means
- Testing categorical data / proportion
- Testing population variances
Hypothesis about many population means
- One-way ANOVA
- Two-way ANOVA

3 3 Test a proportion of interest in a population, e.g.
- Proportion of defects in production
- Proportion of people travelling by Skytrain in Bangkok (BKK)
Test whether the proportion in the population is as expected, using data collected from a sample.

4 4 Binomial proportion
- Single population
- Two populations
Multiple-group proportions: Chi-square test
- Single population: test of homogeneity, goodness-of-fit test
- Two populations: test of homogeneity, test of independence

5 5 The sample is categorized into two groups.
- Single population: determine whether the proportion in one of the two categories differs from a specified proportion.
- Two populations: compare the proportions of two populations.
The steps are similar to those for testing a population mean.
Assumptions
- The sample proportion (or the difference between the sample proportions, in the two-population case) is approximately normally distributed.
- The sample size (of both samples, in the two-population case) is sufficiently large (n ≥ 30).

6 6

7 7 A plastic product factory takes a sample of 400 plastic containers, 12 of which are defective. From these data, test whether the proportion of defects is more than 2% at significance level 0.05.
Hypotheses: $H_0: P \le 0.02$, $H_1: P > 0.02$; $\alpha = 0.05$

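8 8 Calculate the test statistic: $\hat{p} = 12/400 = 0.03$, so $z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{0.03 - 0.02}{\sqrt{0.02 \times 0.98 / 400}} \approx 1.429$.
z-score from table: $z_{0.05} = 1.645$.
The calculated z-score is 1.429 < 1.645, so it does not fall in the right-tailed critical region.
Do not reject $H_0$: the data do not show that the proportion of defects is more than 2% at significance level 0.05.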

9 9

10 10 In an observation of students wearing and not wearing safety helmets when riding motorcycles, 75 of the 500 sampled students wear a helmet. Can we conclude that the proportion of students wearing a helmet is less than 20% at significance level 0.01?
Hypotheses: $H_0: P \ge 0.2$, $H_1: P < 0.2$; $\alpha = 0.01$

11 11 Calculate the test statistic: $\hat{p} = 75/500 = 0.15$, so $z = \frac{0.15 - 0.20}{\sqrt{0.20 \times 0.80 / 500}} \approx -2.80$.
z-score from table: $z_{0.01} = 2.326$.
The calculated z-score is -2.80 < -2.326, so it falls in the left-tailed critical region.
Reject $H_0$ and accept $H_1$: the proportion of students wearing a helmet is less than 20% at significance level 0.01.
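The two worked examples above (defective containers and helmet use) can be checked with a few lines of Python. This is a minimal sketch, assuming the usual one-sample z statistic with the standard error computed under the hypothesized proportion; the function name `one_proportion_z` is illustrative and does not come from the slides.

```python
import math

def one_proportion_z(x, n, p0):
    """z statistic for a single proportion: x successes out of n, tested against p0."""
    p_hat = x / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # uses p0, the value under H0
    return (p_hat - p0) / standard_error

# Critical values as quoted on the slides (standard normal table).
Z_05 = 1.645
Z_01 = 2.326

# Defect example: 12 defective out of 400, right-tailed test of P > 0.02.
z_defect = one_proportion_z(12, 400, 0.02)
reject_defect = z_defect > Z_05

# Helmet example: 75 wearers out of 500, left-tailed test of P < 0.2.
z_helmet = one_proportion_z(75, 500, 0.20)
reject_helmet = z_helmet < -Z_01

print(f"defects: z = {z_defect:.3f}, reject H0: {reject_defect}")
print(f"helmets: z = {z_helmet:.2f}, reject H0: {reject_helmet}")
```

The direction of the comparison against the critical value is what distinguishes the right-tailed test from the left-tailed one; the statistic itself is computed the same way.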

12 12 A shampoo company expects that, after advertising a new product for 2 months, the product will be popular among 60% of consumers. After the advertisement period, 300 bottles are given out to 300 sampled consumers, 220 of whom respond positively. Test whether the assumption is true at significance level 0.05.

13 13 From a sample of 90 students, 28 students have private cars. Test whether the proportion of students having private cars is more than 25% at significance level 0.05.

14 14

15 15

16 16 If $d_0 = 0$, e.g. $H_0: P_1 = P_2$ or $P_1 - P_2 = 0$
Pooled estimated proportion: $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$
- $x_1, x_2$: numbers of items of interest in the first and second samples
- $n_1, n_2$: sizes of the first and second samples
Additional assumption

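17 17 If $d_0 = 0$, e.g. $H_0: P_1 = P_2$ or $P_1 - P_2 = 0$
Estimated variance of the proportion difference: $\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$, where $\hat{q} = 1 - \hat{p}$
Z-score calculation adjusted to: $z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$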

18 18 In a survey, 70 of 100 IT students have a smartphone, and 72 of 150 art students have a smartphone. Test whether the proportion of IT students who have a smartphone is more than 10 percentage points higher than that of art students at significance level 0.05.
Hypotheses (with $P_1$ for IT students and $P_2$ for art students): $H_0: P_1 - P_2 \le 0.1$, $H_1: P_1 - P_2 > 0.1$; $\alpha = 0.05$

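19 19 Calculate the test statistic: $\hat{p}_1 = 0.70$, $\hat{p}_2 = 0.48$, $d_0 = 0.10$, so $z = \frac{(0.70 - 0.48) - 0.10}{\sqrt{0.70 \times 0.30 / 100 + 0.48 \times 0.52 / 150}} \approx 1.956$.
z-score from table: $z_{0.05} = 1.645$.
The calculated z-score is 1.956 > 1.645, so it falls in the right-tailed critical region.
Reject $H_0$ and accept $H_1$: the proportion of IT students who have a smartphone is more than 10 percentage points higher than that of art students at significance level 0.05.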
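The smartphone example can be reproduced as follows. This is a sketch under an assumption: because the hypothesized difference is not zero, the two sample proportions are not pooled, and each contributes its own variance term. That form is inferred from the quoted result z = 1.956 rather than stated in the transcript, and the function name is illustrative.

```python
import math

def two_proportion_z_unpooled(x1, n1, x2, n2, d0):
    """z statistic for H0: P1 - P2 = d0 with d0 != 0 (separate variance terms)."""
    p1 = x1 / n1
    p2 = x2 / n2
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2 - d0) / standard_error

# 70 of 100 IT students vs 72 of 150 art students, hypothesized gap of 0.10.
z_phone = two_proportion_z_unpooled(70, 100, 72, 150, 0.10)
reject_phone = z_phone > 1.645  # right-tailed test, alpha = 0.05

print(f"z = {z_phone:.3f}, reject H0: {reject_phone}")
```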

20 20 In a survey, 120 of 200 university students have a notebook computer, and 240 of 500 high school students have a notebook computer. Test whether the proportion of university students who have a notebook computer is higher than that of high school students at significance level 0.025.
Hypotheses (with $P_1$ for university students and $P_2$ for high school students): $H_0: P_1 - P_2 \le 0$, $H_1: P_1 - P_2 > 0$ (*$d_0 = 0$); $\alpha = 0.025$

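21 21 Calculate the pooled estimated proportion: $\hat{p} = \frac{120 + 240}{200 + 500} \approx 0.51$, $\hat{q} = 0.49$
$n_1 \hat{p}$ = 200*0.51 = 102
$n_1 \hat{q}$ = 200*0.49 = 98
$n_2 \hat{p}$ = 500*0.51 = 255
$n_2 \hat{q}$ = 500*0.49 = 245
Calculate the test statistic: $z = \frac{0.60 - 0.48}{\sqrt{0.51 \times 0.49 \left(\frac{1}{200} + \frac{1}{500}\right)}} \approx 2.9$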

22 22 z-score from table: $z_{0.025} = 1.96$.
The calculated z-score is 2.9 > 1.96, so it falls in the right-tailed critical region.
Reject $H_0$ and accept $H_1$: the proportion of university students who have a notebook computer is higher than that of high school students at significance level 0.025.

23 23 From the previous observation of students wearing and not wearing safety helmets when riding motorcycles, the 500 sampled students are grouped by gender as shown in the table. Can we conclude that the proportion of female students wearing a helmet is higher than that of male students at significance level 0.05?

                      Male   Female
Wearing helmet          40       35
Not wearing helmet     260      165
Total                  300      200

24 24 Hypotheses: $H_0: P_f \le P_m$, $H_1: P_f > P_m$

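25 25 Calculate the test statistic: $\hat{p}_f = 35/200 = 0.175$, $\hat{p}_m = 40/300 \approx 0.133$, pooled $\hat{p} = 75/500 = 0.15$, so $z = \frac{0.175 - 0.133}{\sqrt{0.15 \times 0.85 \left(\frac{1}{200} + \frac{1}{300}\right)}} \approx 1.28$.
z-score from table: $z_{0.05} = 1.645$.
The calculated z-score is 1.28 < 1.645, so it does not fall in the right-tailed critical region.
Do not reject $H_0$: the data do not show that the proportion of female students wearing a helmet is higher than that of male students at significance level 0.05.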
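When the hypothesized difference is zero, the pooled proportion replaces the two separate estimates in the standard error. A minimal sketch of that calculation, applied to the notebook-computer and helmet-by-gender examples; the function name is illustrative and not from the slides.

```python
import math

def two_proportion_z_pooled(x1, n1, x2, n2):
    """z statistic for H0: P1 = P2 using the pooled proportion (x1 + x2) / (n1 + n2)."""
    p1 = x1 / n1
    p2 = x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error

# University (120 of 200) vs high school (240 of 500), alpha = 0.025.
z_notebook = two_proportion_z_pooled(120, 200, 240, 500)
reject_notebook = z_notebook > 1.96

# Female (35 of 200) vs male (40 of 300) helmet wearers, alpha = 0.05.
z_gender = two_proportion_z_pooled(35, 200, 40, 300)
reject_gender = z_gender > 1.645

print(f"notebook: z = {z_notebook:.2f}, reject H0: {reject_notebook}")
print(f"helmet by gender: z = {z_gender:.2f}, reject H0: {reject_gender}")
```

Without rounding the pooled proportion to 0.51, the notebook statistic comes out near 2.87, which agrees with the 2.9 quoted on the slide to one decimal place.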

26 26

27 27 In a polio vaccination program at a school, 16 out of 100 vaccinated female students are infected, and 20 out of 200 vaccinated male students are infected. Test whether the proportion of infected female students is 5% higher than the proportion of infected male students at significance level 0.10.

28 28 Categorical data cannot be measured numerically but can be grouped, e.g. a 5-point rating scale, religion, occupation, and gender.
The data for each group are then frequencies, which can be tested using the Chi-square test ($\chi^2$).
The test determines whether the observed proportions of the groups differ from a specified expected ratio.

29 29 Assumptions
- The sample size must be sufficiently large: 4-5 times the number of groups.
- The frequency of each group must not be less than 5. If such a group exists, combine it with an adjacent group (reducing the degrees of freedom).
- The test cannot be applied to repeated-measures designs:
  - Measuring the same sample after a time period, e.g. measuring the effect of a drug after the 1st, 2nd, and 3rd hour.
  - Measuring the same variable after changing the treatment, e.g. measuring the blood pressure of the same sample after administering different drug dosages.

30 30 If the sample contains 2 groups (degrees of freedom = 1) and the total frequency is less than 50, Frank Yates suggested using the corrected Chi-square.
*If the total frequency is 50 or more, there is no need to use the corrected Chi-square.
But we leave this matter here.

31 31 Single variable
- Test of homogeneity
- Goodness-of-fit test
Two variables
- Test of homogeneity
- Test of independence
$df = k - 1 - m$

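32 32 Used to determine whether the proportions of two or more groups in a population are similar.
Hypotheses: $H_0$: the proportions of all groups are equal; $H_1$: the proportions are not all equal
Test statistic: $\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$
- $O_i$: observed frequency in each group
- $E_i$: expected frequency in each group
- $k$: number of groups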

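33 33 Reject $H_0$ when the calculated $\chi^2$ is greater than the critical $\chi^2$ from the table.
[Figure: chi-square distribution showing the acceptance region and the rejection region]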

34 34 In the teaching evaluation of a course, out of a total of 200 students, 72 are very satisfied, 60 are satisfied, 22 are indifferent, and 46 are unsatisfied. Are the proportions of the satisfaction levels similar at significance level 0.01?
Hypotheses
$H_0$: The frequencies of the satisfaction levels are not different
$H_1$: The frequencies of the satisfaction levels are different

35 35 Calculate the test statistic

Level             O     E    O-E   (O-E)^2   (O-E)^2/E
Very satisfied    72    50    22     484        9.68
Satisfied         60    50    10     100        2.00
Indifferent       22    50   -28     784       15.68
Unsatisfied       46    50    -4      16        0.32
Total            200                           27.68

36 36 Critical Chi-square: degrees of freedom = k - 1 = 4 - 1 = 3, so the critical value at $\alpha = 0.01$ is 11.34.
The calculated Chi-square is 27.68 > 11.34, so it falls in the critical region.
Reject $H_0$ and accept $H_1$: the proportions of the satisfaction levels are not similar at significance level 0.01.
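The satisfaction example can be recomputed directly from the observed counts. A minimal sketch, assuming equal expected frequencies across the four levels (as in the table) and taking the critical value 11.34 from the slide; `chi_square_stat` is an illustrative name.

```python
def chi_square_stat(observed, expected):
    """Sum of (O - E)^2 / E over all groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Very satisfied, satisfied, indifferent, unsatisfied.
observed = [72, 60, 22, 46]
total = sum(observed)
expected = [total / len(observed)] * len(observed)  # 50 in each group under H0

chi2_satisfaction = chi_square_stat(observed, expected)
df_satisfaction = len(observed) - 1
reject_satisfaction = chi2_satisfaction > 11.34  # critical value for df = 3, alpha = 0.01

print(f"chi-square = {chi2_satisfaction:.2f}, df = {df_satisfaction}, "
      f"reject H0: {reject_satisfaction}")
```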

37 37

38 38 A coffee bean reseller assumes that the sales proportions of 4 types of coffee beans are equal. 500 customers are sampled, and the number of sales of each type of coffee bean is shown in the table. Test whether the assumption is true at significance level 0.01.

Type   Sale count
A         110
B         162
C          98
D         130

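39 39 Used to determine whether the proportions of two or more groups in a population fit a specified proportion.
Hypotheses: $H_0$: the group proportions equal the specified proportions; $H_1$: the group proportions differ from the specified proportions
Test statistic: $\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$, with $df = k - 1 - m$
- $O_i$: observed frequency in each group
- $E_i$: expected frequency in each group, $E_i = n p_i$, where $n$ is the total frequency and $p_i$ is the probability of the group under the specified distribution
- $k$: number of groups
- $m$: number of parameters to be estimated (we only study the non-parametric chi-square, so ignore this)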

40 40 A financial institution studies the history of its loan clients. It finds that 80% of the clients repay their loan in 1 year, 10% in 2 years, 6% in 3 years, and 4% in over 3 years. To assess the current situation, 400 recent loan clients are sampled: 287 repay their loan in 1 year, 49 in 2 years, 30 in 3 years, and 34 in over 3 years. Test whether the clients' ability to repay loans has changed.

41 41 Hypotheses
$H_0: p_1 : p_2 : p_3 : p_4 = 0.8 : 0.1 : 0.06 : 0.04$
$H_1: p_1 : p_2 : p_3 : p_4 \ne 0.8 : 0.1 : 0.06 : 0.04$
OR
$H_0$: Clients' ability to repay loans has not changed
$H_1$: Clients' ability to repay loans has changed
$\alpha = 0.05$
Calculate the test statistic

42 42
Time        O_i    P_i    E_i = n p_i   O_i - E_i   (O_i - E_i)^2   (O_i - E_i)^2 / E_i
1 year      287    0.80       320          -33          1,089             3.403
2 years      49    0.10        40            9             81             2.025
3 years      30    0.06        24            6             36             1.500
> 3 years    34    0.04        16           18            324            20.250
Total       400    1.00       400                                        27.178

Degrees of freedom = 4 - 1 = 3
The calculated Chi-square is 27.178 > 7.81, so it falls in the critical region.
Reject $H_0$ and accept $H_1$: clients' ability to repay loans has changed at significance level 0.05.
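The loan table can be rebuilt row by row, which also shows which category drives the result. A sketch under the definitions given earlier (expected frequency = total frequency times the specified probability); the variable names are illustrative, and the critical value 7.81 is the one quoted on the slide.

```python
# Observed repayment counts and the historical proportions under H0.
labels = ["1 year", "2 years", "3 years", "> 3 years"]
observed = [287, 49, 30, 34]
probabilities = [0.80, 0.10, 0.06, 0.04]

n = sum(observed)
expected = [n * p for p in probabilities]

# Per-category contribution (O - E)^2 / E, as in the last column of the table.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2_loan = sum(contributions)
df_loan = len(observed) - 1          # m = 0: no parameters are estimated
reject_loan = chi2_loan > 7.81       # critical value for df = 3, alpha = 0.05

for label, o, e, c in zip(labels, observed, expected, contributions):
    print(f"{label:<10} O = {o:>3}  E = {e:>5.1f}  contribution = {c:.3f}")
print(f"chi-square = {chi2_loan:.3f}, df = {df_loan}, reject H0: {reject_loan}")

# Category contributing most to the statistic.
largest = labels[contributions.index(max(contributions))]
```

Most of the statistic comes from the "> 3 years" group, where 34 clients were observed against 16 expected.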

43 43

44 44 In an exam for a sales training program with 150 participants, the manager expects that the results, categorized into 3 groups (very good, good, and fair), will be in the proportion 2:1:2. After the exam, the actual frequencies in the 3 groups are 70, 30, and 50 participants respectively. Are the actual and the expected proportions different at significance level 0.05?

45 45 Test of Homogeneity
Used to determine whether the proportions of the groups in one variable are similar when grouped by another variable.
Two or more groups in each variable.
$H_0: p_1 = p_2 = p_3 = \dots = p_n$
$H_1$: not all of $p_1, p_2, p_3, \dots, p_n$ are equal
E.g. the proportions of occupations across three countries.

46 46 Test of Independence
Used to determine whether the effects of one variable depend on the value of another variable (2 variables).
$H_0$: Variable x and variable y are independent of each other (are not related)
$H_1$: Variable x and variable y are dependent on each other (are related)

47 47 Data are grouped in the rows and columns of a two-way table.

                        Occupation
Country      Researcher   Business    Programmer   Sum
Thailand     $O_{11}$     $O_{12}$    $O_{13}$     $R_1$
USA          $O_{21}$     $O_{22}$    $O_{23}$     $R_2$
Australia    $O_{31}$     $O_{32}$    $O_{33}$     $R_3$
Sum          $C_1$        $C_2$       $C_3$        $N$

48 48
- $r$: number of rows
- $c$: number of columns
- $O_{ij}$: observed frequency of row $i$, column $j$
- $E_{ij}$: expected frequency of row $i$, column $j$

49 49

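50 50 Reject $H_0$ when the calculated $\chi^2$ is greater than the critical $\chi^2$ from the table.
[Figure: chi-square distribution showing the acceptance region and the rejection region]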

51 51 In a survey of 1200 sampled individuals grouped by four occupations, the numbers of smokers and non-smokers are listed in the table. Test whether the proportion in each occupation is different.

Occupation    Non-smoker   Smoker   Freq.
Engineer          32         268     300
Educator          51         199     250
Accountant        67         233     300
Scientist         83         267     350
Total            233         967    1200

52 52 Hypotheses
$H_0: p_1 = p_2 = p_3 = p_4$
$H_1$: not all of $p_1, p_2, p_3, p_4$ are equal
$\alpha = 0.05$

53 53 Calculate the expected frequencies from the row and column totals of the occupation table:
$E_{11}$ = (300*233)/1200 = 58.25
$E_{12}$ = (300*967)/1200 = 241.75
$E_{21}$ = (250*233)/1200 = 48.54
$E_{22}$ = (250*967)/1200 = 201.46
$E_{31}$ = (300*233)/1200 = 58.25
$E_{32}$ = (300*967)/1200 = 241.75
$E_{41}$ = (350*233)/1200 = 67.96
$E_{42}$ = (350*967)/1200 = 282.04

54 54 Calculate the test statistic

Row-Column   O_ij     E_ij    O_ij - E_ij   (O_ij - E_ij)^2   (O_ij - E_ij)^2 / E_ij
1-1            32     58.25     -26.25          689.0625             11.83
1-2           268    241.75      26.25          689.0625              2.85
2-1            51     48.54       2.46            6.0516              0.12
2-2           199    201.46      -2.46            6.0516              0.03
3-1            67     58.25       8.75           76.5625              1.31
3-2           233    241.75      -8.75           76.5625              0.32
4-1            83     67.96      15.04          226.2016              3.33
4-2           267    282.04     -15.04          226.2016              0.80
Total        1200                                                    20.59

55 55 Degrees of freedom = (r-1)(c-1) = 3*1 = 3
The calculated Chi-square is 20.59 > 7.81, so it falls in the critical region.
Reject $H_0$ and accept $H_1$: the proportion of smokers to non-smokers differs between occupations at significance level 0.05.
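The expected frequencies and the statistic for the occupation table can be computed from the observed counts alone. A minimal sketch, assuming the rule used in the worked calculation (row total times column total, divided by the grand total); the function names are illustrative.

```python
def expected_counts(table):
    """Expected frequency of each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_table(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(table, expected)
        for o, e in zip(observed_row, expected_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Rows: engineer, educator, accountant, scientist; columns as in the slide's table.
smoking = [[32, 268], [51, 199], [67, 233], [83, 267]]

expected_smoking = expected_counts(smoking)
chi2_smoking, df_smoking = chi_square_table(smoking)
reject_smoking = chi2_smoking > 7.81  # critical value for df = 3, alpha = 0.05

print(f"chi-square = {chi2_smoking:.2f}, df = {df_smoking}, reject H0: {reject_smoking}")
```

Computed without intermediate rounding, the statistic is about 20.60, which matches the slide's 20.59 up to the rounding of the expected frequencies.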

56 56 To test whether the achievement score of a training program is related to the achievement score of the actual operation at significance level 0.01, 400 employees are sampled. The scores are listed in the table.

                                 Training score
Operation score   Below Average   Average   Above Average   Total
Fair                    23           60           29         112
Good                    28           79           60         167
Very good                9           49           63         121
Total                   60          188          152         400

57 57 Hypotheses
$H_0$: The score of the training program and the score of the actual operation are not related
$H_1$: The score of the training program and the score of the actual operation are related
$\alpha = 0.01$

58 58 Calculated test statistic = 20.178

Row-Column   O_ij   E_ij = r_i c_j / N       O_ij - E_ij   (O_ij - E_ij)^2   (O_ij - E_ij)^2 / E_ij
1-1           23    112*60/400 = 16.80           6.20           38.44               2.288
1-2           60    112*188/400 = 52.64          7.36           54.17               1.029
1-3           29    112*152/400 = 42.56        -13.56          183.87               4.320
2-1           28    167*60/400 = 25.05           2.95            8.70               0.347
2-2           79    167*188/400 = 78.49          0.51            0.26               0.003
2-3           60    167*152/400 = 63.46         -3.46           11.97               0.189
3-1            9    121*60/400 = 18.15          -9.15           83.72               4.613
3-2           49    121*188/400 = 56.87         -7.87           61.94               1.089
3-3           63    121*152/400 = 45.98         17.02          289.68               6.300
Total        400                                                                   20.178

59 59 Degrees of freedom = (r-1)(c-1) = 2*2 = 4
The calculated Chi-square is 20.178 > 13.28, so it falls in the critical region.
Reject $H_0$ and accept $H_1$: the score of the training program and the score of the actual operation are related (are dependent on each other) at significance level 0.01.
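As an alternative to looking up the critical value, the decision for the training-score table can be made from a p-value. This goes slightly beyond the slides, which use the table lookup only. The sketch relies on a known closed form that holds for exactly 4 degrees of freedom, P(X > x) = exp(-x/2)(1 + x/2), so the helper `chi2_sf_df4` (an illustrative name) must not be reused for other degrees of freedom.

```python
import math

def independence_test(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

def chi2_sf_df4(x):
    """Upper-tail probability of a chi-square variable with exactly 4 degrees of freedom."""
    return math.exp(-x / 2) * (1 + x / 2)

# Rows: operation score (fair, good, very good);
# columns: training score (below average, average, above average).
scores = [[23, 60, 29], [28, 79, 60], [9, 49, 63]]

chi2_scores, df_scores = independence_test(scores)
p_value = chi2_sf_df4(chi2_scores)
reject_scores = p_value < 0.01

print(f"chi-square = {chi2_scores:.3f}, df = {df_scores}, "
      f"p-value = {p_value:.5f}, reject H0: {reject_scores}")
```

Comparing the p-value with 0.01 gives the same decision as comparing the statistic with the tabulated 13.28, since that critical value is the point whose upper-tail probability is 0.01.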

60 60

61 61 A factory manager believes that the efficiency of workers depends on how long they have worked in the factory. To test this belief, 100 sample products are inspected. The quality of the samples is listed in the table. Test the belief at significance level 0.05.

                     Employee Experience (year)
Product Quality       1     2 – 5    6 – 10    Total
Good                  6       9         9        24
Minor damaged         9      19        23        51
Major damaged         7       8        10        25
Total                22      36        42       100

62 62 A toothpaste company wants to know whether the color of the toothpaste is related to the gender of buyers. Samples of 500 males and 500 females are randomly selected to examine their favored toothpaste color. Test whether the color of the toothpaste is related to gender at significance level 0.01.

                Color
Gender     White    Other    Total
Male        328      172      500
Female      138      362      500
Total       466      534     1000

63 63 The Test of Homogeneity and the Test of Independence use the same calculation.
The Test of Homogeneity tells whether the proportions are the same.
- $H_0$: The proportions are similar for all groups
- $H_1$: The proportions are not similar for some/all groups
The Test of Independence tells whether two variables are dependent.
- $H_0$: The two variables are independent
- $H_1$: The two variables are dependent

64 64 Consider this:
- The proportions of selected majors are the same for any gender.
- That means that, no matter the gender, the proportions remain the same.
- That means gender has no effect on the selection of major, and therefore the two are independent.

