A random sample of 300 doctoral degree


Presentation on theme: "A random sample of 300 doctoral degree"— Presentation transcript:

1 A random sample of 300 doctoral degree
In recent years, a national effort has been made to give members of minority groups increased educational opportunities. You want to know whether the policy of “affirmative action” and similar initiatives have had any effect in this regard. You obtain information on the ethnicity distribution of holders of the highest academic degree, the doctor of philosophy degree, for 1981:

Race/Ethnicity                    Percent
White, Non-Hispanic               78.9
Black, Non-Hispanic               3.9
Hispanic                          1.4
Asian/Pac. Islander               2.7
American Indian/Alaskan Native    0.4
Nonresident Alien                 12.8

A random sample of 300 doctoral degree recipients in 2004 showed the following frequency distribution:

Race/Ethnicity                    Count
White, Non-Hispanic               189
Black, Non-Hispanic               10
Hispanic                          6
Asian/Pac. Islander               14
American Indian/Alaskan Native    1
Nonresident Alien                 80

Perform a goodness-of-fit test to determine whether the distribution of doctoral degrees in 2004 is significantly different from the distribution in 1981. In which categories have the greatest changes occurred, and in what direction?
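One way to set up this goodness-of-fit calculation is sketched below in plain Python. The variable names are illustrative, and the 5% critical value for 5 degrees of freedom (11.07) is taken from a standard chi-square table rather than from the slides. The 1981 percents sum to 100.1 because of rounding; the sketch uses them as given.

```python
# Goodness-of-fit sketch: 2004 sample counts vs. the 1981 distribution.
categories = [
    "White, Non-Hispanic",
    "Black, Non-Hispanic",
    "Hispanic",
    "Asian/Pac. Islander",
    "American Indian/Alaskan Native",
    "Nonresident Alien",
]
pct_1981 = [78.9, 3.9, 1.4, 2.7, 0.4, 12.8]   # percents from the 1981 table
observed_2004 = [189, 10, 6, 14, 1, 80]       # counts from the 2004 sample

n = sum(observed_2004)                        # sample size (300)

# Expected count in each category if the 1981 distribution still held.
expected = [n * p / 100 for p in pct_1981]

# One (observed - expected)^2 / expected term per category.
components = [(o - e) ** 2 / e for o, e in zip(observed_2004, expected)]
chi_square = sum(components)
df = len(categories) - 1

# Assumed table value: upper 5% point of chi-square with 5 df.
CRITICAL_05_DF5 = 11.07

for name, o, e, c in zip(categories, observed_2004, expected, components):
    direction = "more" if o > e else "fewer"
    print(f"{name:32s} obs={o:4d} exp={e:6.1f} term={c:6.2f} ({direction} than expected)")

print(f"chi-square = {chi_square:.2f}, df = {df}")
print("reject Ho at the 5% level" if chi_square > CRITICAL_05_DF5 else "fail to reject Ho")
```

Two of the expected counts (Hispanic and American Indian/Alaskan Native) fall below 5, so the chi-square approximation should be treated with some caution here. The per-category terms show where the change is concentrated and whether each 2004 count is above or below its expected value.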

2 Inference for Two-Way Tables
For comparing 2 groups from a two-way table = CATEGORICAL! Recall: A 2-way table describes relationships between 2 categorical variables.

3 2 types of Chi-Squared Tests
1) Test of Homogeneity of populations (14.1)
Ho: The distribution of the response variable is the same in all populations
Ha: The distributions are not all the same
2) Independence/Association Test (14.2) (single sample from a single population; individuals classified according to two categorical variables)
Ho: There is no association between two categorical variables (i.e., the categorical variables are independent)
Ha: There is an association between two categorical variables (i.e., the categorical variables are not independent)

4 Statistical Methods for many comparisons (2 parts)
1) An overall test to see if there is good evidence of any differences among the parameters that we want to compare.
2) A detailed follow-up analysis to decide which of the parameters differ and to estimate how large the differences are.

5 Example
Chronic users of cocaine need the drug to feel pleasure. A 3-year study compared an antidepressant called desipramine with lithium and a placebo. The subjects were 72 chronic users of cocaine who wanted to break their drug habit. 24 of the subjects were randomly assigned to each treatment. The counts of subjects who avoided relapse were 14 of 24 for desipramine, 6 of 24 for lithium, and 4 of 24 for placebo. Are these data good evidence that the proportions of successes for the 3 treatments differ in the population of all cocaine users?

6 “Fix” the table (if it needs it!)
Arrange the data in a 2-way table with complete categories to show the relationship between 2 categorical variables (relapse: yes/no; drug: desipramine, lithium, placebo)
Calculate marginal totals
Describe/look at the data prior to jumping into the hypothesis test (preliminary analysis); comment on any relationship you see
r x c table: r = rows, c = columns
Each of the 6 counts occupies a cell of the table.

7 Hypothesis (Part I: Test)
Ho: There is no association between treatment method and rate of relapse (i.e., treatment method and rate of relapse are independent). Ha: There is an association between treatment method and rate of relapse (i.e., treatment method and rate of relapse are dependent).

8 Expected Counts
The expected count in any cell of a two-way table when Ho is true is:
expected count = (row total x column total) / table total
If we have n independent tries and the probability of a success on each try is p, we expect np successes.
Find the expected count for the cell in row 1 (desipramine) and column 2 (relapse).
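The expected-count formula can be checked with a short sketch. The table layout (rows = desipramine, lithium, placebo; columns = no relapse, relapse) follows the matrix entries used later in the deck, and the function name `expected_counts` is just an illustrative choice.

```python
def expected_counts(table):
    """Return expected counts under Ho: (row total * column total) / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Rows: desipramine, lithium, placebo. Columns: no relapse, relapse.
observed = [
    [14, 10],
    [6, 18],
    [4, 20],
]

expected = expected_counts(observed)

# Row 1 (desipramine), column 2 (relapse): 24 * 48 / 72
desipramine_relapse = expected[0][1]
print(desipramine_relapse)
```

Because every treatment group has 24 subjects, each row gets the same pair of expected counts: 8 avoiding relapse and 16 relapsing.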

9 Table Observations
Because 2/3 of all subjects relapsed, we expect 2/3 of the 24 subjects in each group to relapse (if there are no differences among the treatments).
Desipramine has fewer relapses (10) and more successes (14) than expected.
Placebo has fewer successes (4) and more relapses (20) than expected.
Overall: Desipramine does the best, followed by lithium, and then placebo.

10 The chi-square statistic
Measure of how far the observed counts in a two-way table are from the expected counts.
A sum of terms, one for each cell in the table (in our example, we have 6); each term is (observed - expected)^2 / expected.
Large chi-square values: evidence against Ho because they say that the observed counts are far from what we would expect if Ho were true.
Always 0 or positive.
Only 0 when the observed counts are exactly equal to the expected counts.
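A minimal sketch of the statistic for the cocaine example follows. It assumes the 3 x 2 layout from the TI-83 slide (columns = no relapse, relapse); the helper name `chi_square_statistic` is illustrative, not something from the deck.

```python
def chi_square_statistic(table):
    """Sum (observed - expected)^2 / expected over every cell of a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            total += (obs - exp) ** 2 / exp
    return total

# Rows: desipramine, lithium, placebo. Columns: no relapse, relapse.
observed = [[14, 10], [6, 18], [4, 20]]
chi_square = chi_square_statistic(observed)
print(chi_square)

# A table whose observed counts equal the expected counts gives exactly 0.
no_difference = [[8, 16], [8, 16], [8, 16]]
print(chi_square_statistic(no_difference))
```

The first call reproduces the value 10.5 quoted on the P-value slide, and the second illustrates that the statistic is 0 only when observed and expected counts coincide.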

11 Degrees of Freedom/P-Value
r = number of rows, c = number of columns; df = (r-1)(c-1)
P-value (using the table): the area to the right of the chi-square statistic under the chi-square density curve.
It is the probability that chi-square would take a value at least as large as 10.5 if Ho were really true.
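For the 3 x 2 cocaine table the degrees of freedom and P-value can be sketched without a table lookup. The sketch relies on a special case: with exactly 2 degrees of freedom, the area to the right of x under the chi-square density is exp(-x/2). That shortcut does not apply to other df, where a table or statistical software is needed.

```python
import math

rows, cols = 3, 2                    # 3 treatments, 2 outcomes
df = (rows - 1) * (cols - 1)

chi_square = 10.5                    # statistic from the cocaine example

# Special case, valid only for df == 2: P(chi-square >= x) = exp(-x / 2).
assert df == 2, "closed form below is only valid for 2 degrees of freedom"
p_value = math.exp(-chi_square / 2)

print(f"df = {df}, P-value = {p_value:.4f}")
```

The resulting P-value is about 0.005, which lies between the 0.005 and 0.01 columns of a typical chi-square table for df = 2.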

12 Chi-Square with Technology (note on p-value)

13 TI-83
Matrix (2nd x-1)/Edit/1:A
3 x 2 matrix entries:
14 10
6 18
4 20
Stat/Tests/C:Chi-Squared Test (Calculate, then Draw)
For expected counts, look in matrix B (Matrix/2 (B), Enter)
Verify that this agrees with the Minitab results.

14 Part II: Follow up Analysis: Comment on observed and expected counts
The study found significant differences among the proportions of successes for the three treatments for cocaine addiction.
Examine the sample proportions: the major difference is that desipramine had a higher success rate than lithium or placebo.
Comparing observed and expected counts in Minitab:
Desipramine has more successes and fewer failures than we would expect if all 3 treatments had the same success rate in the population.
Lithium and placebo had fewer successes and more failures than expected.
Largest component contributing to the overall chi-square: the desipramine/success cell (4.50), where desipramine had more successes than expected.
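The follow-up comparison can be sketched by listing each cell's contribution to the chi-square statistic alongside the sample proportions. The labels and variable names below are illustrative; the counts are the ones entered in the TI-83 matrix.

```python
treatments = ["desipramine", "lithium", "placebo"]
outcomes = ["no relapse", "relapse"]
observed = [[14, 10], [6, 18], [4, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sample proportion of successes (no relapse) for each treatment.
success_rates = {t: observed[i][0] / row_totals[i] for i, t in enumerate(treatments)}

# Contribution of each cell: (observed - expected)^2 / expected.
components = []
for i, t in enumerate(treatments):
    for j, o in enumerate(outcomes):
        exp = row_totals[i] * col_totals[j] / grand_total
        term = (observed[i][j] - exp) ** 2 / exp
        components.append((term, t, o, observed[i][j], exp))

# Largest contributions first.
components.sort(reverse=True)
largest = components[0]

for term, t, o, obs, exp in components:
    print(f"{t:12s} {o:10s} obs={obs:2d} exp={exp:4.1f} term={term:.2f}")
print(success_rates)
```

Sorting the terms makes it easy to see that the desipramine success cell dominates, with the desipramine relapse and placebo success cells contributing next.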

15 The chi-square test of association/independence
The cocaine study is an experiment that assigned 24 addicts to each of 3 groups.
Each group was a sample from a separate population corresponding to a separate treatment.
The study design fixed the size of each sample (24) in advance, and the data recorded which of two outcomes occurred for each subject (relapse, no relapse).
Ho: “no difference” took the form of “equal proportions of successes” in the 3 populations (but we can have different settings…)

16 Is high blood pressure dangerous
Is high blood pressure dangerous? Medical researchers classified each of a group of men as “high” or “low” blood pressure, then watched them for 5 years. (Men with systolic blood pressure 140 mm Hg or higher were “high”; the others, “low.”) The following two-way table gives the results of the study.

            Died    Survived
Low bp      21      2655
High bp     55      3283

1. Is high blood pressure associated with a higher death rate? Calculate and compare percents to answer this question. Make an appropriate graph to display the association.
2. Write null and alternative hypotheses for a chi-square analysis of these data.
3. State and verify the conditions for performing the inference procedure.
4. The chi-square statistic is [value missing]. Complete the chi-square analysis of the blood pressure data and write your conclusion(s).
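The percents and the chi-square analysis for the blood pressure table could be worked along the following lines. This is a sketch computed from the counts in the table, not the value that was printed on the original slide. For a 2 x 2 table df = 1, and the sketch uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal, so the right-tail area is `erfc(sqrt(x / 2))`.

```python
import math

# Rows: low bp, high bp. Columns: died, survived.
observed = [[21, 2655], [55, 3283]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Question 1: death rate within each blood pressure group.
death_rate_low = observed[0][0] / row_totals[0]
death_rate_high = observed[1][0] / row_totals[1]

# Expected counts under Ho and the chi-square statistic.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
chi_square = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(2)
    for j in range(2)
)

df = (2 - 1) * (2 - 1)
# Valid only for df == 1: P(chi-square >= x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"death rate, low bp:  {100 * death_rate_low:.2f}%")
print(f"death rate, high bp: {100 * death_rate_high:.2f}%")
print(f"smallest expected count: {min(min(row) for row in expected):.1f}")
print(f"chi-square = {chi_square:.2f}, df = {df}, P-value = {p_value:.4f}")
```

All four expected counts are well above 5, which is the usual condition checked before relying on the chi-square approximation.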

17 A study of the career plans of young women and men sent questionnaires to all 722 members of the senior class in the College of Business Administration at the University of Illinois. One question asked which major within the business program the student had chosen. Here are the data from the students who responded:

                  Female    Male
Accounting        68        56
Administration    91        40
Economics         5         6
Finance           61        59

This is an example of a single sample classified according to two categorical variables (gender and major).
1. Describe the differences between the distributions of majors for women and men with percents, with a bar graph, and in words.
2. Verify that the expected cell counts satisfy the requirement for use of chi-square. Test the null hypothesis that there is no relationship between the gender of students and their choice of major. Give a P-value.
3. Which two cells have the largest terms of the chi-square statistic? How do the observed and expected counts differ in these cells? (This should strengthen your conclusions in #3.)
4. What % of the students did not respond to the questionnaire? Why does this nonresponse weaken conclusions drawn from these data?
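The bookkeeping parts of this exercise (nonresponse rate, expected-count check, and the largest chi-square terms) can be sketched as follows. The names are illustrative, and the condition checked is the common textbook rule that all expected counts are at least 1 and no more than 20% are below 5; the deck itself does not spell out which rule it uses.

```python
majors = ["Accounting", "Administration", "Economics", "Finance"]
genders = ["Female", "Male"]
observed = [[68, 56], [91, 40], [5, 6], [61, 59]]
class_size = 722

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
responded = sum(row_totals)

# Percent of the senior class that did not respond.
nonresponse_pct = 100 * (class_size - responded) / class_size

# Expected counts and per-cell chi-square terms.
cells = []
for i, major in enumerate(majors):
    for j, gender in enumerate(genders):
        exp = row_totals[i] * col_totals[j] / responded
        term = (observed[i][j] - exp) ** 2 / exp
        cells.append({"major": major, "gender": gender,
                      "observed": observed[i][j], "expected": exp, "term": term})

# Assumed rule of thumb: all expected >= 1, at most 20% of cells below 5.
small_cells = [c for c in cells if c["expected"] < 5]
condition_ok = (min(c["expected"] for c in cells) >= 1
                and len(small_cells) <= 0.2 * len(cells))

chi_square = sum(c["term"] for c in cells)
df = (len(majors) - 1) * (len(genders) - 1)
top_two = sorted(cells, key=lambda c: c["term"], reverse=True)[:2]

print(f"responded: {responded}, nonresponse: {nonresponse_pct:.1f}%")
print(f"expected counts below 5: {len(small_cells)}, condition met: {condition_ok}")
print(f"chi-square = {chi_square:.2f}, df = {df}")
for c in top_two:
    print(c["major"], c["gender"], c["observed"], round(c["expected"], 1), round(c["term"], 2))
```

With 3 degrees of freedom the P-value has to come from a chi-square table or software; the sketch stops at the statistic and the two largest terms.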

18 The Chi-Square Test of Association/Independence
Use the chi-square test of association/independence to test Ho: There is no relationship between two categorical variables when you have a 2-way table from a single SRS, with each individual classified according to both of two categorical variables.

19 Smoking and SES
In a study of heart disease in male federal employees, researchers classified 356 volunteer subjects according to their SES (socioeconomic status) and their smoking habits. There were 3 categories of SES: high, middle, and low. Individuals were asked whether they were current smokers, former smokers, or had never smoked, producing three categories for smoking habits as well. We want to see if there is an association between SES and smoking habits.

20 C. S. Test of Association/Indep
Here is a 2-way table that summarizes the data: This table does not compare several populations; it classifies observations from a single population in two ways (by smoking habits and SES). Both of these variables have 3 levels, so Ho:

21 Comparing the distribution of smoking habits among male federal employees having a specific SES

