Statistics as a Tool in Scientific Research

PY209 Stats Fall 2001, Dr. L. Henkel

Types of Research Questions
Descriptive (What does X look like?)
Correlational (Is there an association between X and Y? As X increases, what does Y do?)
Experimental (Do changes in X cause changes in Y?)
Different statistical procedures allow us to answer the different kinds of research questions.

Start with the science and use statistics as a tool to answer the research question.
Get your students to formulate a research question first:
How often does this happen?
Did all plants/people/chemicals act the same?
What happens when I add more sunlight, give more praise, pour in more water?

Helping students to formulate a research question:
* Laura, should we try to say something more here?

Types of Statistical Procedures
Descriptive: Organize and summarize data
Inferential: Draw inferences about the relations between variables; use samples to generalize to the population

Descriptive Statistics
The first step is ALWAYS getting to know your data.
Summarize and visualize your data.
It is a big mistake to just throw numbers into the computer and look at the output of a statistical test without any idea what those numbers are trying to tell you.

Numerical Summaries:
  Measures of central tendency
  Measures of variability
  Frequencies
  Contingency tables
  Representing numerical summaries in tables
Graphical Summaries:
  Bar graphs
  Histograms
  Scatterplots
  Time series

Types of Data: Measurement Scales
In order to choose the appropriate ways to summarize and visualize your data, you have to know what type of data you have.
Categorical: male/female, blood type (A, B, AB, O)
Numerical: weight, number of white blood cells

Categorical:
  Nominal (name/label)
  Ordinal (rank order)
Numerical:
  Interval (equal intervals)
  Ratio (equal intervals and absolute zero)

Nominal: numbers are arbitrary labels; 1 = male, 2 = female
Ordinal: numbers have order (i.e., more or less) but you do not know how much more or less; the 1st place runner was faster than the 2nd place runner, but you do not know how much faster
Interval: numbers have order and equal intervals, so you know how much more or less; a temperature of 102 is 2 degrees higher than one of 100
Ratio: same as interval, but because there is an absolute zero you can talk meaningfully about twice as much and half as much; weighing 200 pounds is twice as heavy as weighing 100 pounds

Once you know what type of measurement scale the data were measured on, you can choose the most appropriate statistics to summarize them:
Measures of central tendency: Most representative score
Measures of dispersion: How far spread out the scores are

LAURA: Do we need to have them look at the distribution first (the shape of the distribution), or will we cover that later? I feel like I keep referencing the normal curve when talking about central tendency and variability. Do you have something in Excel that visualizes the shape of the distribution? Or maybe this is too much, too in depth. We have slated "issues" for later, to talk about normality and outliers.

The Normal Curve
Most scores in center, tapering off symmetrically in both tails
Amount of peakedness (kurtosis) can vary

Variations to Normal Distribution
Skew = Asymmetrical distribution
Positive skew = greater frequency of low scores than high scores (the long tail extends toward the high end)
Negative skew = greater frequency of high scores than low scores (the long tail extends toward the low end)
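One way to check the direction of skew numerically is a moment-based skewness coefficient. The sketch below is illustrative only: the score lists are made up, and the `skewness` helper is a hypothetical name, not something from the course materials. A positive value indicates a tail toward the high end, a negative value a tail toward the low end.

```python
import math

def skewness(scores):
    """Moment-based skewness: third central moment / (second central moment ** 1.5)."""
    n = len(scores)
    m = sum(scores) / n
    m2 = sum((x - m) ** 2 for x in scores) / n
    m3 = sum((x - m) ** 3 for x in scores) / n
    return m3 / m2 ** 1.5

# Hypothetical data: mostly low scores with one high score (positive skew),
# its mirror image (negative skew), and a symmetric set.
positive = [1, 1, 1, 2, 2, 3, 9]
negative = [9, 9, 9, 8, 8, 7, 1]
symmetric = [1, 2, 3, 4, 5]

skew_pos = skewness(positive)
skew_neg = skewness(negative)
skew_sym = skewness(symmetric)
print(skew_pos, skew_neg, skew_sym)
```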

Examples of Different Skews

Variations to Normal Distribution
Bimodal distribution: two peaks
Rectangular: all scores (high and low) occur with equal frequency
Important to determine normality (skewness & kurtosis) so you can choose appropriate measures of central tendency and variability

MEASURES OF CENTRAL TENDENCY
Central tendency = Typical or representative value of a group of scores
Mean: Average score
Median: Middlemost score; score at the 50th percentile; half the scores are above, half are below
Mode: Most frequently occurring score

MEASURES OF CENTRAL TENDENCY
Measure: Mean
  Definition: $M = \sum X / n$
  Takes every value into account? Yes
  When to use: Interval or ratio data, BUT it can be heavily influenced by extreme scores (outliers), so it can give an inaccurate view if the distribution is not normal
Measure: Median
  Definition: Middle value
  Takes every value into account? No
  When to use: Ordinal data, or interval/ratio data that are skewed
Measure: Mode
  Definition: Most frequent data value
  When to use: Nominal data
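A small illustration of why the mean is sensitive to outliers while the median and mode are not. This is a sketch using Python's standard `statistics` module; the scores are invented for the example.

```python
from statistics import mean, median, mode

# Hypothetical scores (not course data).
scores = [2, 3, 3, 4, 5]
with_outlier = scores + [43]  # add one extreme score

mean_before, mean_after = mean(scores), mean(with_outlier)
median_before, median_after = median(scores), median(with_outlier)
mode_before, mode_after = mode(scores), mode(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
print("mode:  ", mode_before, "->", mode_after)
```

A single extreme score moves the mean from 3.4 to 10, while the median only shifts from 3 to 3.5 and the mode stays at 3.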

MEASURES OF VARIABILITY
Knowing the center of a set of scores is useful, but:
How far spread out are all the scores?
Were all scores that exact value, or did they have some variability?
Range, Standard deviation, Interquartile range


Variability = extent to which scores in a distribution differ from each other (how spread out they are)

The Range as a Measure of Variability
Difference between the lowest score in the set and the highest score
Ages ranged from years of age
There was a 29-year age range
The number of calories ranged from 256 to 781

Sample Standard Deviation
Standard deviation = How far, on average, do the scores deviate from the mean?
$s = SD = \sqrt{\sum (X - M)^2 / (n - 1)}$
In a normal distribution, about 68% of the scores (the midmost 68%) fall within one SD of the mean
The bigger the SD is, the more spread out the scores are around the mean
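The range and the sample standard deviation can be computed step by step as below. This is a minimal sketch with made-up scores; `sample_sd` is a hypothetical helper that follows the n - 1 formula, and it is compared against `statistics.stdev` from the standard library, which uses the same definition.

```python
import math
from statistics import stdev

# Hypothetical scores (not course data).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

def sample_sd(xs):
    """Sample SD: square root of the summed squared deviations over n - 1."""
    m = sum(xs) / len(xs)                      # mean M
    ss = sum((x - m) ** 2 for x in xs)         # sum of (X - M)^2
    return math.sqrt(ss / (len(xs) - 1))       # divide by n - 1, then take the root

score_range = max(scores) - min(scores)        # highest score minus lowest score
sd = sample_sd(scores)

print("range:", score_range)
print("SD by hand:", sd)
print("SD from statistics.stdev:", stdev(scores))
```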

Variations of the normal curve (larger SD= wider)

Interquartile Range
Quartile 1: 25th percentile
Quartile 2: 50th percentile (median)
Quartile 3: 75th percentile
Quartile 4: 100th percentile
Interquartile range = Score at 75th percentile – score at 25th percentile
So this is the midmost 50% of the scores
LAURA: DO WE NEED THIS? IS IT TOO MUCH?

Interquartile range on a positively skewed distribution
IQR is often used for interval or ratio data that are skewed (do not want to consider ALL the scores)
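As an illustration, the quartiles and IQR of a small positively skewed sample can be computed with `statistics.quantiles`. The ticket counts are invented, and note that software packages use slightly different conventions for locating quartiles, so other tools may report somewhat different values for the same data.

```python
from statistics import median, quantiles

# Hypothetical, positively skewed data: most values are low, one is very high.
tickets = [0, 1, 1, 2, 2, 3, 15]

# n=4 asks for the three cut points that split the data into quarters
# (Python's default "exclusive" method is used here).
q1, q2, q3 = quantiles(tickets, n=4)
iqr = q3 - q1
full_range = max(tickets) - min(tickets)

print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", iqr, "range:", full_range)
```

The single extreme value stretches the range to 15, while the IQR of 2 still describes the spread of the middle half of the scores.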

MEASURES OF VARIABILITY
Measure: Range
  Definition: Highest score – lowest score
  Takes every value into account? No, only based on the two most extreme values
  When to use: To give a crude measure of spread
Measure: Standard Deviation
  Definition: Average deviation of scores from the mean; M ± 1 SD spans the midmost 68% of scores in a normal distribution
  Takes every value into account? Yes, but describes the majority
  When to use: For interval or ratio data that are normally distributed
Measure: Interquartile Range
  Definition: Midmost 50% of scores
  Takes every value into account? Yes, but describes most
  When to use: When interval/ratio data are skewed
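The "when to use" guidance from the central tendency and variability tables can be sketched as a small lookup. `recommend_summary` is a hypothetical helper written for illustration; it encodes only what the slides state, and returns `None` where the slides give no recommendation.

```python
def recommend_summary(scale, skewed=False):
    """Return (central tendency, variability) suggested by the slides.

    scale: "nominal", "ordinal", "interval", or "ratio"
    skewed: whether interval/ratio data are clearly skewed
    """
    scale = scale.lower()
    if scale == "nominal":
        return ("mode", None)        # slides list no variability measure here
    if scale == "ordinal":
        return ("median", None)      # slides list no variability measure here
    if scale in ("interval", "ratio"):
        if skewed:
            return ("median", "interquartile range")
        return ("mean", "standard deviation")
    raise ValueError(f"unknown measurement scale: {scale!r}")

normal_choice = recommend_summary("ratio")
skewed_choice = recommend_summary("ratio", skewed=True)
nominal_choice = recommend_summary("nominal")
print(normal_choice, skewed_choice, nominal_choice)
```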

Presenting Measures of Central Tendency and Variability in Text
The number of fruit flies observed each day ranged from 0 to 57 (M = 25.32, SD = 5.08).
Plants exposed to moderate amounts of sunlight were taller (M = 6.75 cm, SD = 1.32) than plants exposed to minimal sunlight (M = 3.45 cm, SD = 0.95).
Sentences should always be grammatical and sensible. Do not just list a bunch of numbers. Use the statistical information to supplement what you are saying.

Presenting Measures of Central Tendency and Variability in Tables
                         Range    M        SD
Number of Fruit Flies    0-57     25.32    5.08
Weight                            160.31   10.97
Speed of Pitch (mph)     25-96    45.37    12.66

Presenting Measures of Central Tendency and Variability in Tables
                         Range    Median   IQR
Number of Fruit Flies    MAKE UP #S
Weight
Speed of Pitch (mph)

Describing Your Data: Simple Frequency
Frequency = number of times each score occurs in a set of data

SIMPLE FREQUENCY TABLE
Score   f
A       6
A-      3
B+      10
B       13
B-      8
C+      7
C       5
C-      6
D       3
F       2
Total: N = 63

GROUPED FREQUENCY TABLE
Score         f
A (90-100)    9
B (80-89)     31
C (70-79)     18
D (60-69)     3
F (0-59)      2
Total: N = 63

Choose class intervals to maximize informativeness with truthfulness
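Collapsing the simple frequency table into letter-grade classes is just a matter of summing frequencies within each class. A minimal sketch using the frequencies from the simple frequency table (the variable names are illustrative):

```python
# Frequencies from the simple frequency table (N = 63).
simple_f = {"A": 6, "A-": 3, "B+": 10, "B": 13, "B-": 8,
            "C+": 7, "C": 5, "C-": 6, "D": 3, "F": 2}

# Group by the letter alone, ignoring the +/- modifier.
grouped_f = {}
for grade, f in simple_f.items():
    letter = grade[0]
    grouped_f[letter] = grouped_f.get(letter, 0) + f

total = sum(grouped_f.values())
print(grouped_f, "N =", total)
```

Grouping this way gives 9 A's, 31 B's, 18 C's, 3 D's, and 2 F's, and the total stays at N = 63.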

RELATIVE FREQUENCY DISTRIBUTIONS
Considers frequency relative to the number of scores
Relative frequency = frequency / number of scores in the set
rel f = f/N
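The rel f = f/N calculation can be sketched with `collections.Counter`. The 15 scores below are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical set of N = 15 scores.
scores = [3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 9]

n_scores = len(scores)
freq = Counter(scores)                                   # simple frequency f
rel_freq = {s: f / n_scores for s, f in freq.items()}    # rel f = f / N

for s in sorted(freq):
    print(s, freq[s], round(rel_freq[s], 3))
print("Sum of rel f:", sum(rel_freq.values()))
```

The relative frequencies always sum to 1.0 (100%), which is a useful check on the table.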

Score   f   Rel f
Total: N = 15   Sum = 1.0 (100%)

Examples of relative frequency distributions

Graphical Representations of Data
STUFF TO COVER AND MAKE SLIDES FOR:
Bar graphs
Histograms
Scatterplots
Time series

Simple frequency bar graph for nominal and ordinal data

Histogram showing the simple frequency of parking tickets in a sample

Simple frequency polygon showing the frequency of parking tickets in a sample

Guidelines to Making Graphs
The Y axis should be about ¾ as tall as the X axis is long
When the number of score values on the X axis is large, scores should be collapsed so there are at least 5 intervals but no more than 12
The width of each interval on the X axis should be equal
Frequency on the Y axis must be continuous and regular
The ranges on the Y axis and X axis must neither unduly compress nor unduly stretch the data
LAURA: Let's add to this from different sources; e.g., the unit of measure should be labeled on the Y axis…
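The interval guidelines (equal widths, between 5 and 12 intervals) can be sketched as a small binning routine. `equal_width_bins` is a hypothetical helper, and the scores are invented; plotting libraries offer their own binning options, so this is only meant to show the arithmetic.

```python
def equal_width_bins(scores, k=8):
    """Collapse scores into k equal-width intervals; return (edges, counts)."""
    if not 5 <= k <= 12:
        raise ValueError("use at least 5 and no more than 12 intervals")
    lo, hi = min(scores), max(scores)
    if hi == lo:
        raise ValueError("all scores are identical; nothing to bin")
    width = (hi - lo) / k                       # every interval has the same width
    counts = [0] * k
    for x in scores:
        i = min(int((x - lo) / width), k - 1)   # the top score goes in the last interval
        counts[i] += 1
    edges = [lo + i * width for i in range(k + 1)]
    return edges, counts

# Hypothetical data: 50 distinct score values collapsed into 5 intervals.
scores = list(range(50))
edges, counts = equal_width_bins(scores, k=5)
print(edges)
print(counts)
```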

Choosing the Appropriate Type of Graph
Laura: I pilfered this from a book. I think we need to think this out better:
One interval/ratio variable (with frequencies): histogram or frequency polygon
One interval independent variable and one interval/ratio dependent variable: scatterplot or line graph
One nominal independent variable and one interval/ratio dependent variable: bar graph
Two or more nominal independent variables and one interval/ratio dependent variable: bar graph

Choosing the Appropriate Type of Graph
STUFF TO COVER
Issues:
Use and misuse of graphs
Outliers and normality
Dangers of low n
Association vs. causation
