Chapter 1 Why Statistics?
2 Learning can result from: Critical thinking Asking an authority Religious experience However, collecting DATA is the surest way to learn about the world
3 Data in the Sciences are messy At first glance, data often look like an incoherent jumble of numbers How do we make sense of data? Statistical procedures are tools for learning about the world by Learning from Data.
4 Real Data! To help you understand the power and usefulness of statistics, we will explore two real and interesting data sets “The Smoking Study” “The Maternity Study”
5 The Smoking Study From the University of Wisconsin Center for Tobacco Research and Intervention 608 participants provided data on smoking, addiction, withdrawal, and how best to quit smoking The full data set is provided on the CD; a description of the data collected is provided in the appendices of the book
6 The Maternity Study From the Wisconsin Maternity Leave and Health Project 244 families provided data on marital satisfaction, child-rearing styles, and other household events The full data set is provided on the CD; a description of the data collected is provided in the appendices of the book
7 Variability Why are data messy? Consider a concrete example: Depression scores (“CESD”) for participants in the Smoking Study Some participants (each has a different ID number) have CESD scores of 0, while others have scores of 2, 11, or 7, or some other value These data are messy in that the scores differ from one another Variability is the statistical term for the degree to which scores (such as the depression scores) differ from one another.
8 Sources of Variability It is easy to see that depression scores are variable, but why? –Individual differences Some people are more depressed than others Some people have difficulty reading and understanding the questions on the test Some people answer the questions more honestly than others –Procedure Differences in the ways the data were collected –Conditions or Treatments The conditions that are imposed on the participants of the study
9 Populations and Samples Statistical Population – a collection or set of measurements of a variable that share some common characteristic Sample – a subset of measurements from a population Random sample – a sample selected such that every score in the population has an equal chance of being included
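The definition of a random sample can be illustrated with a minimal Python sketch. The population here is a placeholder list of 608 made-up measurements (not the Smoking Study data), and the function name `simple_random_sample` is hypothetical.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n measurements without replacement.

    random.sample gives every measurement in the population
    the same chance of being included in the sample.
    """
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(list(population), n)


# Placeholder population: 608 made-up measurements, NOT study data.
population = list(range(608))
sample = simple_random_sample(population, 60, seed=1)
print(len(sample), "measurements drawn from", len(population))
```

Changing or omitting the seed gives a different sample each time, which is one reason two samples from the same population rarely look identical.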
Chapter 2 Frequency Distributions and Percentiles
Variability (revisited) Collecting Data means measuring a variable Those measurements differ (vary) from one another One way to organize and summarize a set of measurements is to construct a frequency distribution These methods can be applied to both populations and samples
Example: YRSMK – Number of Years Smoking Daily From the First 60 Participants in the Smoking Study (unsorted scores) 513172019352128322 261330 324027144 27332845292538353339 54202425271625389 36201811122322273249 22300324239292223
Example: YRSMK – Number of Years Smoking Daily From the First 60 Participants in the Smoking Study (sorted scores) 03444559910 1113 14 1617181920 21 22 23 24 25 2627 28 29 30 32 33 35 3638 39404549
A Better Summary? YRSMK – Number of Years Smoking Daily From the First 60 Participants in the Smoking Study
Class Interval   Frequency   Relative Frequency   Cumulative Frequency   Cumulative Proportion
0 - 4            5           .083                 5                      .083
5 - 9            4           .067                 9                      .150
10 - 14          5           .083                 14                     .233
15 - 19          4           .067                 18                     .300
20 - 24          12          .200                 30                     .500
25 - 29          12          .200                 42                     .700
30 - 34          9           .150                 51                     .850
35 - 39          6           .100                 57                     .950
40 - 44          1           .017                 58                     .967
45 - 49          2           .033                 60                     1.00
Total (n)        60          1.000
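The columns of a grouped frequency distribution can be computed with a short Python sketch. The function name `frequency_table` is hypothetical, whole-number scores are assumed, and the `stand_in_scores` list is not the raw YRSMK data: it is rebuilt from the frequencies in the table above by repeating each interval's lower limit.

```python
def frequency_table(scores, width, start=0):
    """Return one row per class interval:
    (low, high, frequency, relative frequency,
     cumulative frequency, cumulative proportion).
    Assumes whole-number scores, so intervals run low..low+width-1.
    """
    n = len(scores)
    top = max(scores)
    rows = []
    cumulative = 0
    low = start
    while low <= top:
        high = low + width - 1
        freq = sum(1 for s in scores if low <= s <= high)
        cumulative += freq
        rows.append((low, high, freq, freq / n, cumulative, cumulative / n))
        low += width
    return rows


# Stand-in scores (an assumption): each interval's lower limit repeated
# as many times as that interval's frequency in the table.
frequencies = [5, 4, 5, 4, 12, 12, 9, 6, 1, 2]
stand_in_scores = []
for i, f in enumerate(frequencies):
    stand_in_scores.extend([i * 5] * f)

rows = frequency_table(stand_in_scores, width=5)
for low, high, freq, rel, cum, cum_prop in rows:
    print(f"{low:>2} - {high:<2}  {freq:>3}  {rel:.3f}  {cum:>3}  {cum_prop:.3f}")
```

Relative frequency is the frequency divided by n, and cumulative proportion is the cumulative frequency divided by n, so the last row always ends at n and 1.000.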
Percentiles We have been focusing on distributions rather than individual scores Sometimes, individual scores are of great importance Computing percentiles when n = 608: The 50th percentile is the “middle” score. Its position is 608*0.50 = 304, so it is the 304th sorted score. The 32nd percentile is at position 608*0.32 = 194.56, which rounds up to the 195th sorted score.
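The position arithmetic on this slide can be sketched in Python. The rounding-up rule is taken from the slide's own example (194.56 becomes 195); other textbooks and software use interpolation or different rounding, so treat this as one convention among several. The function names are hypothetical.

```python
import math


def percentile_position(n, p):
    """1-based position of the p-th percentile among n sorted scores.

    Follows the slide's rule: multiply n by the proportion and round up.
    The position is never allowed to fall below 1.
    """
    return max(1, math.ceil(n * p / 100))


def percentile_score(sorted_scores, p):
    """Score sitting at the p-th percentile position of a sorted list."""
    position = percentile_position(len(sorted_scores), p)
    return sorted_scores[position - 1]  # convert 1-based position to 0-based index


print(percentile_position(608, 50))  # position of the 50th percentile
print(percentile_position(608, 32))  # position of the 32nd percentile
```

With n = 608 these calls reproduce the slide's positions, 304 and 195.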
Percentile Rank The percentile rank of a score is the percent (the proportion times 100) of the measurements in the distribution below that score value Computing percentile rank for YRSMK: Sort the variable and call the result YRSMK_sorted Count the scores below the value of interest and divide by n = 608 The percentile rank of 9 is 50/608 = 0.082, so it is the 8th percentile The percentile rank of 21 is 246/608 = 0.4046, so it is the 40th percentile
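The percentile rank calculation can be sketched in Python as a count of scores strictly below a value, divided by n and multiplied by 100. The function name `percentile_rank` is hypothetical, and `stand_in_sorted` is not the real YRSMK_sorted variable: it is a made-up sorted list built only so that 50 scores fall below 9 and 246 fall below 21, matching the counts on the slide.

```python
from bisect import bisect_left


def percentile_rank(sorted_scores, value):
    """Percent of measurements strictly below `value`.

    bisect_left returns the index of the first element >= value,
    which equals the number of scores below it in a sorted list.
    """
    below = bisect_left(sorted_scores, value)
    return 100 * below / len(sorted_scores)


# Stand-in data (an assumption, NOT the study data): 608 sorted scores
# with 50 scores below 9 and 246 scores below 21.
stand_in_sorted = [0] * 50 + [9] * 196 + [21] * 362

rank_9 = percentile_rank(stand_in_sorted, 9)
rank_21 = percentile_rank(stand_in_sorted, 21)
print(f"{rank_9:.1f} {rank_21:.1f}")
```

Because the definition counts only scores below the value, ties at the value itself do not raise its percentile rank.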
Graphing Distributions Graphing distributions is a very valuable tool for highlighting features of the data –Shape –Range –Central Tendency –Variability
Shape We classify the shape of distributions in three ways: –Symmetry – is one half a mirror image of the other half? –Skew – are there high/low frequencies of low/high scores? –Modality – how many humps or modes?
Symmetry Is one half of the distribution a mirror image of the other (along a vertical axis)? Three examples of symmetrical distributions:
Skew Positive – high frequencies of low values and low frequencies of high values Negative – low frequencies of low values and high frequencies of high values
Modality How many humps (or modes)? Unimodal Bimodal
Central Tendency and Variability In addition to shape, distributions differ in terms of: –Central Tendency - scores near the center of the distributions; where the scores “tend” to be –Variability – the degree to which scores differ from one another; the “spread” of the scores
Comparing Distributions It is very useful to be able to compare and contrast distributions (name their similarities and differences) Distributions can differ in terms of shape, central tendency, and variability
Comparing Distributions How do these distributions differ?