Published by Christopher Teagle. Modified over 2 years ago.

1
Chapter 1 Why Statistics?

2
Learning can result from:
–Critical thinking
–Asking an authority
–Religious experience
However, collecting DATA is the surest way to learn about the world.

3
Data in the sciences are messy.
At first glance, data often look like an incoherent jumble of numbers. How do we make sense of data?
Statistical procedures are tools for learning about the world by learning from data.

4
Real Data!
To help you understand the power and usefulness of statistics, we will explore two real and interesting data sets:
–“The Smoking Study”
–“The Maternity Study”

5
The Smoking Study
From the University of Wisconsin Center for Tobacco Research and Intervention.
608 participants provided data on smoking, addiction, withdrawal, and how best to quit smoking.
The full data set is provided on the CD; a description of the data collected is provided in the appendices of the book.

6
The Maternity Study
From the Wisconsin Maternity Leave and Health Project.
244 families provided data on marital satisfaction, child-rearing styles, and other household events.
The full data set is provided on the CD; a description of the data collected is provided in the appendices of the book.

7
Variability
Why are data messy? Consider a concrete example: depression scores (“CESD”) for participants in the Smoking Study.
Some participants (each has a different ID number) have CESD scores of 0, while others have scores of 2, 7, or 11, or some other value.
These data are messy in that the scores differ from one another.
Variability is the statistical term for the degree to which scores (such as the depression scores) differ from one another.

8
Sources of Variability
It is easy to see that depression scores are variable, but why?
–Individual differences: some people are more depressed than others; some people have difficulty reading and understanding the questions on the test; some people answer the questions more honestly than others
–Procedure: differences in the ways the data were collected
–Conditions or Treatments: the conditions that are imposed on the participants of the study

9
Populations and Samples
–Statistical Population – a collection or set of measurements of a variable that share some common characteristic
–Sample – a subset of measurements from a population
–Random sample – a sample selected such that every score in the population has an equal chance of being included
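The definition of a random sample can be illustrated with a short Python sketch. The population below is made up for illustration (it is not the Smoking Study data), and the helper name `draw_random_sample` is hypothetical; the only library call is the standard `random.Random.sample`, which draws without replacement so that every score has the same chance of being included.

```python
import random

# Hypothetical population: 608 made-up scores (NOT the Smoking Study data).
rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = [rng.randint(0, 40) for _ in range(608)]


def draw_random_sample(scores, n, rng):
    """Return n scores drawn without replacement, each equally likely to be chosen."""
    return rng.sample(scores, n)


sample = draw_random_sample(population, 60, rng)
print(len(sample))  # number of scores in the sample
```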

10
Chapter 2 Frequency Distributions and Percentiles

11
Variability (revisited)
Collecting data means measuring a variable. Those measurements differ (vary) from one another.
One way to organize and summarize a set of measurements is to construct a frequency distribution.
These methods can be applied to both populations and samples.

12
Example YRSMK – Number of Years Smoking Daily From the First 60 Participants in the Smoking Study

13
Example YRSMK – Number of Years Smoking Daily From the First 60 Participants in the Smoking Study

14
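A Better Summary?
[Table: grouped frequency distribution with columns Class Interval, Frequency, Relative Frequency, Cumulative Frequency, and Cumulative Proportion, plus a Total (n) row; values not reproduced in this transcript]
YRSMK – Number of Years Smoking Daily From the First 60 Participants in the Smoking Study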
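The columns named on this slide can be computed with a few lines of Python. This is a minimal sketch, not the procedure used in the book: the function name `frequency_table`, the interval width of 5, and the ten integer scores are all assumptions made for illustration, and the scores are not the YRSMK data.

```python
from collections import Counter


def frequency_table(scores, width, start=0):
    """Group integer scores into equal-width class intervals and tabulate them."""
    n = len(scores)
    # Index of the class interval that each score falls into.
    counts = Counter((s - start) // width for s in scores)
    rows = []
    cumulative = 0
    for k in range(min(counts), max(counts) + 1):
        freq = counts.get(k, 0)
        cumulative += freq
        low = start + k * width
        rows.append({
            "interval": (low, low + width - 1),
            "frequency": freq,
            "relative_frequency": freq / n,
            "cumulative_frequency": cumulative,
            "cumulative_proportion": cumulative / n,
        })
    return rows


# Made-up scores for illustration only (not the study data).
years = [1, 3, 4, 7, 8, 8, 12, 15, 15, 21]
table = frequency_table(years, width=5)
for row in table:
    print(row)
```

The last row's cumulative frequency equals n and its cumulative proportion equals 1, which is a quick check that no score was dropped.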

15
Graphing Distributions

16
Percentiles
We have been focusing on distributions rather than individual scores. Sometimes, individual scores are of great importance.
Computing percentiles when n = 608:
–The 50th percentile is the “middle” score. Its position is 608 × 0.50 = 304, i.e., the 304th sorted score.
–The 32nd percentile has position 608 × 0.32 = 194.56, which rounds up to the 195th sorted score.
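The arithmetic on this slide can be sketched in Python. The rule used here (multiply n by the percentile as a proportion and round up to the next whole position) is inferred from the slide's own example, where 194.56 becomes the 195th score; other textbooks and software use different conventions. The function names are hypothetical.

```python
import math


def percentile_position(n, p):
    """1-based position, in the sorted scores, of the p-th percentile (0 < p <= 100)."""
    if not 0 < p <= 100:
        raise ValueError("p must be greater than 0 and at most 100")
    return math.ceil(n * p / 100)  # round up, as in the slide's 194.56 -> 195


def percentile_score(scores, p):
    """Score found at the p-th percentile position of the sorted scores."""
    ordered = sorted(scores)
    return ordered[percentile_position(len(ordered), p) - 1]


print(percentile_position(608, 50))  # 304
print(percentile_position(608, 32))  # 195
```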

17
Percentile Rank
The percentile rank of a score is the percent (the proportion times 100) of the measurements in the distribution below that score value.
Computing percentile rank for YRSMK:
–Sort the variable; call the result YRSMK_sorted.
–The percentile rank of 9 is 50/608 ≈ 0.082 (about 8%), so 9 falls at the 8th percentile.
–The percentile rank of 21 is 246/608 ≈ 0.405 (about 40%), so 21 falls at the 40th percentile.
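The definition above translates directly into code: count the measurements below the score, divide by n, and multiply by 100. The sketch below uses a hypothetical function name and a made-up list of scores (not the YRSMK data); the last two lines simply redo the slide's own divisions.

```python
def percentile_rank(scores, value):
    """Percent of the measurements that fall strictly below `value`."""
    below = sum(1 for s in scores if s < value)
    return 100 * below / len(scores)


# Made-up scores for illustration only.
example_scores = [2, 5, 5, 9, 12, 21, 21, 30]
print(percentile_rank(example_scores, 9))  # 3 of 8 scores are below 9 -> 37.5

# The slide's arithmetic, expressed as percents:
print(round(100 * 50 / 608, 1))   # 8.2
print(round(100 * 246 / 608, 1))  # 40.5
```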

18
Graphing Distributions
Graphing distributions is a very valuable tool for highlighting features of the data:
–Shape
–Range
–Central Tendency
–Variability

19
Shape
We classify the shape of distributions in three ways:
–Symmetry – is one half a mirror image of the other half?
–Skew – are there high/low frequencies of low/high scores?
–Modality – how many humps or modes?

20
Symmetry Is one half of the distribution a mirror image of the other (along a vertical axis)? Three examples of symmetrical distributions:

21
Skew
–Positive – high frequencies of low values and low frequencies of high values
–Negative – low frequencies of low values and high frequencies of high values
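One way to check the direction of skew numerically is a moment-based skewness coefficient. The slides do not define this statistic, so treat it as an outside illustration: the function name and the data are made up, and the coefficient used is the third central moment divided by the second central moment raised to the power 1.5. A positive result corresponds to positive skew as described above, and a negative result to negative skew.

```python
def skewness(scores):
    """Moment-based skewness: m3 / m2**1.5 (assumes the scores are not all equal)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n  # second central moment
    m3 = sum((s - mean) ** 3 for s in scores) / n  # third central moment
    return m3 / m2 ** 1.5


# Made-up data: many low values and a few high values (positive skew).
positively_skewed = [1, 1, 1, 2, 2, 3, 5, 9]
# Mirror image of the same data: many high values and a few low values.
negatively_skewed = [10 - s for s in positively_skewed]

print(skewness(positively_skewed) > 0)  # True
print(skewness(negatively_skewed) < 0)  # True
```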

22
Modality
How many humps (or modes)?
[Figure panels: Unimodal; Bimodal]

23
Characterizing Shape
[Figure panels: (1) asymmetric, negatively skewed, bimodal; (2) asymmetric, positively skewed, unimodal]

24
Central Tendency and Variability
In addition to shape, distributions differ in terms of:
–Central Tendency – scores near the center of the distribution; where the scores “tend” to be
–Variability – the degree to which scores differ from one another; the “spread” of the scores
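A small Python sketch can make the distinction concrete: two made-up sets of scores with the same center but different spread. The mean and median stand in for central tendency, and the range and sample standard deviation stand in for variability; these particular statistics are chosen here for illustration and are not named on this slide. The calls are from Python's standard `statistics` module.

```python
import statistics

# Made-up scores for illustration only.
narrow = [8, 9, 10, 11, 12]
wide = [0, 5, 10, 15, 20]

for name, scores in (("narrow", narrow), ("wide", wide)):
    center_mean = statistics.mean(scores)      # central tendency
    center_median = statistics.median(scores)  # central tendency
    spread_range = max(scores) - min(scores)   # variability
    spread_sd = statistics.stdev(scores)       # variability (sample standard deviation)
    print(name, center_mean, center_median, spread_range, round(spread_sd, 2))
```

Both sets are centered on 10, but the scores in `wide` differ from one another far more than those in `narrow`.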

25
Comparing Distributions
It is very useful to be able to compare and contrast distributions, that is, to name their similarities and differences.
Distributions can differ in terms of shape, central tendency, and variability.

26
Comparing Distributions How do these distributions differ?


