An Integrated Approach to Teaching with Real Data
Joint Mathematics Meetings, January 2005
MAA Contributed Paper Session: Using Real-World Data to Illustrate Statistical Concepts
Sarah Knapp Abramowitz, Drew University
Sharon Lawner Weinberg, New York University

The National Education Longitudinal Study of 1988
Based on a survey conducted by the National Center for Education Statistics (NCES) of a nationally representative sample of eighth graders
Initiated in 1988, with additional waves in 1990, 1992, and 1994
The goal of the study was to measure achievement outcomes in four core subject areas (English, history, mathematics, and science), along with the personal, familial, social, institutional, and cultural factors that might relate to these outcomes

Our NELS
Sub-sample of 500 cases and 48 variables
Sampled randomly from the approximately 5,000 students who responded to all four administrations of the survey and who pursued some form of post-secondary education

Beneficial Properties of NELS
Contains a variety of variables
Can be used throughout the course because it can be analyzed by multiple methods
Is appropriately analyzed using a computer statistics package, modeling practical data analytic skills
Demonstrates some of the subtleties in selecting the appropriate statistical technique for a given research question
Contains real values, many of which are intuitive, so that interpretation is emphasized and students gain number sense

Selected Variables in the NELS
Naturally numeric: FAMSIZE, the number of members in the student’s household
Instrument-based composites: SLFCNC08, eighth grade self-concept, and SES, socio-economic status
Coded categories: GENDER; HOMELANG, the home language background of the student, with 1 representing non-English only, 2 representing non-English dominant, 3 representing English dominant, and 4 representing English only; and CUTS12, the number of times the student skipped or cut classes in twelfth grade, on an ordinal scale with 0 representing never, 1 representing one to two times, 2 representing three to six times, etc.
Likert-type variables: TCHERINT, which measures the level of agreement with the statement “my teachers are interested in students” on a four-point scale

Variety of Distributions: Scale Variables
Approximately symmetric: SES and achievement variables like ACHMAT12
Negatively skewed: SLFCNC08 and SCHATTRT
Positively skewed: EXPINC30, the estimate the student makes in eighth grade for his or her income at age 30, and APOFFER, the number of advanced placement courses offered by the school the student attends

Variety of Distributions: Categorical Variables
Fairly evenly distributed between categories: GENDER
Unevenly distributed: HOMELANG (81% speak only English at home) and CIGARETT, whether or not the student had ever smoked a cigarette by eighth grade (85% indicated that they had not)

Examples Using NELS in Paper
Graphical displays of a single variable
Measures of central tendency
Describing relationships between variables
Independent samples t-test

Describing relationships between variables
Exemplify a variety of magnitudes for the Pearson correlation
Exemplify relationships between variables with a variety of levels of measurement

Pearson Correlations of different directions and magnitudes
Between ACHMAT12 and TCHERINT, r = -.18. For TCHERINT, a low score indicates greater perceived teacher interest.
Between ACHMAT12 and ACHRDG12, r = .64.
Between ACHMAT12 and FAMSIZE, r = .02.
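Because TCHERINT is coded so that a low score means greater perceived interest, the sign of its correlation depends on the coding. The sketch below uses made-up values (not NELS records) and a hand-written `pearson_r` helper, both assumptions for illustration. It shows that reflecting a four-point scale flips the sign of r without changing its magnitude, while translating a variable leaves r unchanged.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up values for illustration only; these are NOT NELS records.
# Low teacher_interest codes mean greater perceived interest, as on the slide.
teacher_interest = [1, 2, 2, 3, 4, 1, 3, 2]
math_score = [62, 58, 55, 51, 47, 60, 54, 59]

r_original = pearson_r(teacher_interest, math_score)

# Reflection: recode the 1-4 scale so that a high score means greater interest.
reflected = [5 - t for t in teacher_interest]
r_reflected = pearson_r(reflected, math_score)

# Translation: add a constant to every math score.
shifted = [m + 10 for m in math_score]
r_shifted = pearson_r(teacher_interest, shifted)

print(round(r_original, 2), round(r_reflected, 2), round(r_shifted, 2))
```

With the real data set, the same recoding would presumably turn r = -.18 into r = .18, which may be easier for students to interpret.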

Other cases of Pearson
Point-biserial: Between ACHMAT12 and NURSERY, r = .13
Phi-coefficient: Between NURSERY and COMPUTER, r = .20
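The point-biserial and phi coefficients are the ordinary Pearson correlation applied when one or both variables are dichotomous. As a minimal sketch with made-up 0/1 codes (not NELS records), the code below checks that the usual 2x2 formula for phi agrees with Pearson's r computed on the codes. The helper names `pearson_r` and `phi_from_table` are illustrative assumptions.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def phi_from_table(a, b, c, d):
    """Phi for the 2x2 table of counts [[a, b], [c, d]]."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Made-up 0/1 codes for illustration only; these are NOT NELS records.
nursery = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
computer = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

# Cell counts: rows are nursery = 0, 1; columns are computer = 0, 1.
pairs = list(zip(nursery, computer))
a = pairs.count((0, 0))
b = pairs.count((0, 1))
c = pairs.count((1, 0))
d = pairs.count((1, 1))

phi = phi_from_table(a, b, c, d)
r = pearson_r(nursery, computer)
print(round(phi, 3), round(r, 3))  # the two values agree
```

A point-biserial correlation would use the same `pearson_r` call with a scale variable in place of one of the 0/1 lists.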

Other types of relationships
Between a dichotomous variable and a nominal or ordinal variable with fewer than five categories. Method: contingency table
Between an ordinal variable and a dichotomous, ordinal, interval, or ratio variable. Method: Spearman correlation
Between a nominal or ordinal variable with fewer than five categories and an interval or ratio variable. Method: measures of central tendency

Examples of other types of relationships
Between REGION and NURSERY
Method: Contingency table
Conclusion: Approximately 34 percent of the children who had not attended nursery school owned a computer in eighth grade, whereas approximately 56 percent of those who had attended nursery school owned a computer in eighth grade
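A conclusion of this kind comes from comparing row percentages in a contingency table. The following is a minimal sketch with made-up records (not NELS data, so the percentages differ from those on the slide); the `row_percent` helper is a hypothetical name introduced for illustration.

```python
from collections import Counter

# Made-up (attended_nursery, owned_computer) pairs for illustration only;
# these are NOT NELS records.
records = [
    ("no", "no"), ("no", "no"), ("no", "no"), ("no", "yes"),
    ("yes", "no"), ("yes", "no"),
    ("yes", "yes"), ("yes", "yes"), ("yes", "yes"), ("yes", "yes"),
]

# Each key of the Counter is one cell of the 2x2 contingency table.
counts = Counter(records)

def row_percent(counts, row, col):
    """Percentage of cases in `row` that fall in column `col`."""
    row_total = sum(n for (r, _), n in counts.items() if r == row)
    return 100 * counts[(row, col)] / row_total

for row in ("no", "yes"):
    pct = row_percent(counts, row, "yes")
    print(f"nursery = {row}: {pct:.1f}% owned a computer")
```

Percentaging within rows, rather than reporting raw counts, keeps the comparison fair when the two groups differ in size.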

Examples of other types of relationships
Between HWKIN12 and HWKOUT12
Method: Spearman correlation, rho = .38
Conclusion: Students who spend more time in school on homework tend to do so outside of school too.
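The Spearman correlation can be presented to students as Pearson's r computed on ranks, with tied values sharing the average of the ranks they occupy. The sketch below assumes that definition and uses made-up ordinal codes (not NELS records); `average_ranks` and `spearman_rho` are hypothetical helper names.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values get the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation as Pearson's r on average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Made-up ordinal codes for illustration only; these are NOT NELS records.
hwk_in = [0, 1, 1, 2, 3, 4, 2, 5]
hwk_out = [1, 1, 2, 2, 4, 3, 5, 6]

rho = spearman_rho(hwk_in, hwk_out)
print(round(rho, 2))
```

Because only the ordering of the values matters, any order-preserving recoding of the homework categories leaves rho unchanged, which is why the method suits ordinal scales.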

Examples of other types of relationships
Between ACHMAT12 and REGION
Method: Measures of central tendency. Because the distribution of twelfth grade math achievement is skewed for the Northeast and the North Central, we compare medians.
Conclusion: We see that among students in the NELS data set, the highest typical achievement is found in the West (median = 59.03), followed by the Northeast (median = 58.74), the North Central (median = 56.50), and then the South (median = 55.29).
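Comparing groups by their medians can be sketched as follows. The scores are made up for illustration and are not NELS values; the group labels merely echo the slide. The first group is deliberately skewed by one low score, to show why the median may be the better summary in that case.

```python
from collections import defaultdict
from statistics import mean, median

# Made-up (region, score) pairs for illustration only; these are NOT NELS records.
records = [
    ("West", 40), ("West", 58), ("West", 60), ("West", 61), ("West", 62),
    ("Northeast", 45), ("Northeast", 57), ("Northeast", 58),
    ("Northeast", 59), ("Northeast", 61),
    ("South", 50), ("South", 52), ("South", 55), ("South", 57), ("South", 59),
]

# Group the scores by region.
by_region = defaultdict(list)
for region, score in records:
    by_region[region].append(score)

medians = {region: median(scores) for region, scores in by_region.items()}

# Order the regions from highest to lowest median.
ranked = sorted(medians.items(), key=lambda item: item[1], reverse=True)
for region, med in ranked:
    print(f"{region}: median = {med}, mean = {mean(by_region[region]):.1f}")
```

In the skewed group the single low score pulls the mean well below the median, so the two summaries can tell different stories about the typical student.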

Benefits of the Approach
Correlation magnitudes are typical
Can easily study the effects of transformations such as translation and reflection
Emphasizes choosing an appropriate statistical technique and the importance of the level of measurement and the shape of the distribution of the variable
Demonstrates that several analytical approaches may be possible

Obtaining the NELS data set
The following website contains a copy of the paper, this PowerPoint presentation, and the NELS data set formatted for SPSS: http://www.users.drew.edu/sabramow/
Send an e-mail request to sabramow@drew.edu
Make your own version of the NELS through the NELS88 page of the National Center for Education Statistics website, http://nces.ed.gov/surveys/nels88/

