Psych 230 Psychological Measurement and Statistics Pedro Wolf September 2, 2009
Previously on “let’s learn statistics in five weeks”: the logic of research – samples, populations, and variables; descriptive and inferential statistics – statistics and parameters; understanding experiments – experimental and correlational studies – independent and dependent variables; characteristics of scores – nominal, ordinal, interval, and ratio scales – continuous and discrete
Which Scale? Does the variable have an intrinsic value? NO: Nominal. YES: Does the variable have equal intervals between scores? NO: Ordinal. YES: Does the variable have a real zero point? NO: Interval. YES: Ratio.
Continuous A continuous scale allows for fractional amounts – it ‘continues’ between the whole-number amounts – decimals make sense Examples: – Height – Weight – IQ
Discrete In a discrete scale, only whole-number amounts can be measured – decimals do not make sense – usually, nominal and ordinal scales are discrete – some interval and ratio variables are also discrete, e.g., number of children in a family Special type of discrete variable: dichotomous – only two amounts or categories – pass/fail; living/dead; male/female
Today…. Why graphical representations of data? Stem and leaf plots. Box plots. Frequency – what is it – how a frequency distribution is created Graphing frequency distributions – bar graphs, histograms, polygons Types of distribution – normal, skewed, bimodal Relative frequency and the normal curve – percentiles, area under the normal curve
“… look at the data” (Robert Bolles, 1998) Raw data is often messy, overwhelming, and un-interpretable. Many data sets can have thousands of measurements and hundreds of variables. Graphical representations of data can make data interpretable Looking at the data can inspire ideas.
What in the world could these data mean? Imagine over 30,000 observations [Table with columns Time, Lat, Long; the values are not recoverable]
After plotting those data By plotting the data and superimposing it on map data, suddenly the previous slide’s data can tell a story Of course not all data can tell such a story People have developed various ways to visualize their data graphically
Stem and Leaf Plots [Stem-and-leaf plot, N = 18; plot layout not recoverable] data - 54, 56, 57, 59, 59, 63, 64, 66, 68, 72 … A stem and leaf plot preserves the data intact and is a way to see the distribution. Numbers on the left of the line are called the stems and represent the leading digit(s) of each number. Numbers on the right of the line are called the leaves; each leaf represents an individual number and indicates its value by completing the stem.
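A minimal sketch of how a stem-and-leaf plot can be built, assuming the stem is the tens part of each score and the leaf is the ones digit. Only the ten values listed on the slide are used (the full data set has N = 18), and the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Return stem-and-leaf rows as strings, one row per stem."""
    leaves = defaultdict(list)
    for score in sorted(scores):
        stem, leaf = divmod(score, 10)   # tens part is the stem, ones digit is the leaf
        leaves[stem].append(leaf)
    return [f"{stem} | {' '.join(str(leaf) for leaf in leaves[stem])}"
            for stem in sorted(leaves)]

# The ten values shown on the slide (the remaining eight are not listed).
scores = [54, 56, 57, 59, 59, 63, 64, 66, 68, 72]
for row in stem_and_leaf(scores):
    print(row)
```

Each row keeps every original score readable (stem plus leaf), which is what "preserves the data intact" refers to.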
Box Plots Each of the lines in a box plot represents either quartiles or the range of the data. In this particular plot the dots represent outliers.
Frequency distributions - why? Standard method for graphing data – easy way of visualizing group data Introduction to the Normal Distribution – underlies all of the statistical tests we will be studying this semester – understanding the concepts behind statistical testing will make life a lot easier later on
Frequency
Frequency - some definitions Raw scores are the scores we initially measure in a study. The number of times a score occurs in a set is the score’s frequency. A distribution is the general name for any organized set of data. A frequency distribution organizes the scores based on each score’s frequency. N is the total number of scores in the data.
Understanding Frequency Distributions A frequency distribution table shows the number of times each score occurs in a set of data. The symbol for a score’s frequency is simply f. N = ∑f
Raw Scores The following is a data set of raw scores. We will use these raw scores to construct a frequency distribution table
Frequency Distribution Table
Frequency Distribution Table - Example Make a frequency distribution table for the following scores: 5, 7, 4, 5, 6, 5, 4 X | f: 7 | 1, 6 | 1, 5 | 3, 4 | 2 (N = ∑f = 7)
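The same table can be tallied in code. This is a small sketch using Python's standard `collections.Counter`; the variable names are illustrative only.

```python
from collections import Counter

scores = [5, 7, 4, 5, 6, 5, 4]
freq = Counter(scores)                    # maps each score X to its frequency f

print("X  f")
for x in sorted(freq, reverse=True):      # list the highest score first
    print(f"{x}  {freq[x]}")

N = sum(freq.values())                    # N is the sum of all frequencies
print("N =", N)
```

Summing the f column recovers the number of raw scores, which is a quick check that no score was missed while tallying.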
Learning more about our data What are the values for N and ∑X for the scores below?
Results via Frequency Distribution Table What is N? N = ∑f
Results via Frequency Distribution Table What is ∑X? (17 * 1) = 17 (16 * 0) = 0 (15 * 4) = 60 (14 * 6) = 84 (13 * 4) = 52 (12 * 1) = 12 (11 * 1) = 11 (10 * 1) = 10 __________ Total = 246
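As a sketch, the frequency table on this slide can be written as a mapping from each score X to its frequency f. Then N is the sum of the frequencies and ∑X is the sum of each score multiplied by its frequency. The name `freq_table` is an assumption made for illustration.

```python
# Frequency table from the slide: score X -> frequency f
freq_table = {17: 1, 16: 0, 15: 4, 14: 6, 13: 4, 12: 1, 11: 1, 10: 1}

N = sum(freq_table.values())                        # N = sum of f
sum_x = sum(x * f for x, f in freq_table.items())   # sum of X = sum of (X * f)

print("N =", N)
print("sum of X =", sum_x)
```

Multiplying each score by its frequency gives the same total as adding up all 18 raw scores one at a time.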
Graphing Frequency Distributions
Graphing Frequency Distributions A frequency distribution graph shows the scores on the X axis and their frequency on the Y axis Why? – Because it’s not easy to make sense of a list of raw scores, e.g., answers to: On a scale of 0-10, how excited are you about this class? (0 = absolutely dreading it; 10 = extremely excited/highlight of my semester) [Raw scores and the corresponding X / f frequency table are not recoverable]
Graphing Frequency Distributions A frequency distribution graph shows the scores on the X axis and their frequency on the Y axis The type of measurement scale (nominal, ordinal, interval, or ratio) determines whether we use: – a bar graph – a histogram – a frequency polygon
Graphs - bar graph A frequency bar graph is used for nominal and ordinal data – Values on the x-axis – Frequencies on the y-axis – In a bar graph, bars do not touch
Graphs - histogram A histogram is used for a small range of different interval or ratio scores – Values on the x-axis – Frequencies on the y-axis – In a histogram, adjacent bars touch
Graphs - frequency polygon A frequency polygon is used for a large range of different scores – In a freq. polygon, there are many scores on the x-axis
Constructing a Frequency Distribution Step 1: make a frequency table Step 2: put values along x-axis (bottom of page) Step 3: put a scale of frequencies along y-axis (left edge of page) Step 4 (bar graphs and histograms) – make a bar for each value Step 4 (frequency polygons) – mark a point above each value with a height for the frequency of that value – connect the points with lines
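The four steps above can be sketched as a text-only graph. This assumes single-digit whole-number scores so the columns line up. The function name `text_histogram` and the `#` bar character are illustrative choices, not part of the lecture.

```python
from collections import Counter

def text_histogram(scores):
    """Return the rows of a simple text frequency graph, top row first."""
    freq = Counter(scores)                               # Step 1: frequency table
    values = list(range(min(scores), max(scores) + 1))   # Step 2: values along the x-axis
    top = max(freq.values())
    rows = []
    for level in range(top, 0, -1):                      # Step 3: frequency scale on the y-axis
        # Step 4: one bar per value, drawn as high as that value's frequency
        bars = " ".join("#" if freq[v] >= level else " " for v in values)
        rows.append(f"{level} | {bars}")
    rows.append("    " + " ".join(str(v) for v in values))
    return rows

for row in text_histogram([5, 7, 4, 5, 6, 5, 4]):
    print(row)
```

For a frequency polygon, the same table would be used, but only the top of each column would be marked and the marks joined with lines.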
Graphing - example A researcher observes driving behavior on a road, noting the gender of drivers, type of vehicle driven, and the speed at which they are traveling. Which type of graph should be used for each variable? Gender? nominal: bar graph Vehicle Type? nominal: bar graph Speed? ratio: frequency polygon
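The rule of thumb used in this example can be written as a small lookup. This is only a sketch: the lecture does not say how many different scores count as a "large range", so the `large_range_cutoff` default of 20 is an arbitrary assumption, and the function name `choose_graph` is hypothetical.

```python
def choose_graph(scale, n_distinct_scores=0, large_range_cutoff=20):
    """Suggest a graph type from the measurement scale.

    large_range_cutoff is an assumed threshold, not a value from the lecture.
    """
    scale = scale.lower()
    if scale in ("nominal", "ordinal"):
        return "bar graph"
    if scale in ("interval", "ratio"):
        if n_distinct_scores > large_range_cutoff:
            return "frequency polygon"   # many different scores on the x-axis
        return "histogram"               # small range of different scores
    raise ValueError(f"unknown scale: {scale}")

print(choose_graph("nominal"))      # gender, vehicle type
print(choose_graph("ratio", 60))    # speed, assuming about 60 different speeds were observed
```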
Use and Misuse of Graphs Which graph is correct? Neither does a very good job of summarizing the data. Beware of graphing tricks.
Types of Distributions
Distributions Frequency tables, bar-graphs, histograms and frequency polygons describe frequency distributions
Distributions - Why? Describing the shape of this frequency distribution is important for both descriptive and inferential statistics The benefit of descriptive statistics is being able to understand a set of data without examining every score
Distributions : The Normal Curve It turns out that many, many variables have a distribution that looks the same. This has been called the ‘normal distribution’. A bell-shaped curve Symmetrical Extreme scores have a low frequency – extreme scores: scores that are relatively far above or far below the middle score
The Ideal Normal Curve – Symmetrical – Most scores in middle range – Few extreme scores – In theory, tails never reach the x-axis
Normal Curve - height
Normal Curve - hours slept
Normal Curve - GPA
Normal Distributions While the scores in the population may approximate a normal distribution, it is not necessarily so for a sample of scores
Skewed Distributions A skewed distribution is not symmetrical. It has only one pronounced tail. A distribution may be either negatively skewed or positively skewed. Negative or positive depends on which side the longer tail is on – the side with the longer tail describes the distribution. Tail on the negative (low-score) side: negatively skewed. Tail on the positive (high-score) side: positively skewed.
Negatively Skewed Distributions Tail on negative side: negatively skewed – Contains extreme low scores – Does not contain extreme high scores – Can occur due to a “ceiling effect”
Positively Skewed Distributions Tail on positive side: positively skewed – Contains extreme high scores – Does not contain extreme low scores – Can occur due to a “floor effect”
Bimodal Distributions A distribution containing two distinct humps; the idealized form is symmetrical
Bimodal - birth month
Distributions - data How many alcoholic drinks do you have per week? Positively skewed
Distributions - data How much did you spend on textbooks for this semester? Normal – one outlier
Kurtosis The prefixes in mesokurtic, leptokurtic, and platykurtic: meso- Forming chiefly scientific terms with the sense ‘middle, intermediate’; lepto- Small, fine, thin, delicate; platy- Forming nouns and adjectives, particularly in biology and anatomy, with the sense ‘broad, flat’
Relative Frequency and the Normal Curve
Relative Frequency Another way to organize scores is by relative frequency. Relative frequency is the proportion of times that a particular score occurs – remember: a proportion is a number between 0 and 1 Simple frequency: the number of times a score occurs Relative frequency: the proportion of times a score occurs
Relative Frequency - Why? We are still asking how often certain scores occurred. Sometimes, relative frequency is easier to interpret than simple frequency. Example: – Simple frequency: 82 people in the class reported drinking no alcohol weekly – Relative frequency: 0.42 of the class (42%) reported drinking no alcohol
Relative Frequency The formula for a score’s relative frequency is: relative frequency = f / N
Relative Frequency Distribution
Example Using the following data set, find the relative frequency of the score 12
Example The frequency table for this set of data is:
Example The frequency for the score of 12 is 1, and N = 18 Therefore, the relative frequency of 12 is: 1/18 = 0.06
Relative Frequencies We can also add relative frequencies together. – For example, what proportion of people scored a passing mark in this exam (>3)? Answer: 0.78 [Table with columns Value, Frequency, Relative Frequency; only partly recoverable: first row is value 6, frequency 5, 5/18 = 0.28; last row has relative frequency 1/18 = 0.06; N = 18, Total = 1.00]
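A sketch of both ideas, dividing each frequency by N and then adding relative frequencies together. Because the exam table is only partly legible, this example assumes the frequency table from the earlier ∑X slide (scores 10 to 17, N = 18), in which the score 12 also has a frequency of 1. The cutoff of 14 used for the addition is purely illustrative.

```python
# Assumed data: the frequency table from the earlier sum-of-X slide (score X -> frequency f)
freq_table = {17: 1, 16: 0, 15: 4, 14: 6, 13: 4, 12: 1, 11: 1, 10: 1}

N = sum(freq_table.values())
rel_freq = {x: f / N for x, f in freq_table.items()}   # relative frequency = f / N

print("relative frequency of 12:", round(rel_freq[12], 2))

# Adding relative frequencies: proportion of scores at 14 or higher (illustrative cutoff)
prop_14_or_more = sum(rf for x, rf in rel_freq.items() if x >= 14)
print("proportion scoring 14 or more:", round(prop_14_or_more, 2))

# All relative frequencies together should account for every score
print("total:", round(sum(rel_freq.values()), 2))
```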
Relative Frequency and the Normal Curve When the data are normally distributed (as many variables approximately are), we can use the normal curve directly to determine relative frequency. There is a known proportion of scores above or below any point. For example, exactly 0.50 of the scores lie above the mean.
Relative Frequency and the Normal Curve The proportion of the total area under the normal curve at certain scores corresponds to the relative frequency of those scores.
Relative Frequency and the Normal Curve Normal distribution showing the area under the curve to the left of selected scores
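The area to the left of a score can be computed with Python's standard `statistics.NormalDist`. As a sketch, this assumes an ideal normal curve with mean 0 and standard deviation 1, so each score is expressed as a number of standard deviations from the mean. That scaling is an assumption made for illustration, not something introduced on these slides.

```python
from statistics import NormalDist

# Assumed: an ideal normal curve with mean 0 and standard deviation 1
curve = NormalDist(mu=0, sigma=1)

for score in (-2, -1, 0, 1, 2):
    area_left = curve.cdf(score)          # proportion of scores at or below this point
    print(f"score {score:+d}: area to the left = {area_left:.4f}")

# Half of the area lies on each side of the mean
area_above_mean = 1 - curve.cdf(0)
print("proportion above the mean:", area_above_mean)
```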
Percentiles A percentile is the percent of all scores in the data that are at or below a score – Example: 98th percentile - 98% of the scores lie at or below this score.
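A minimal sketch of this definition applied to a frequency table: count the scores at or below the target and divide by N. The data are assumed to be the frequency table from the earlier ∑X slide, and the function name `percentile_of` is hypothetical.

```python
def percentile_of(freq_table, score):
    """Percent of all scores in freq_table that are at or below `score`."""
    n = sum(freq_table.values())
    at_or_below = sum(f for x, f in freq_table.items() if x <= score)
    return 100 * at_or_below / n

# Assumed data: the frequency table from the earlier sum-of-X slide (score X -> frequency f)
freq_table = {17: 1, 16: 0, 15: 4, 14: 6, 13: 4, 12: 1, 11: 1, 10: 1}

print(round(percentile_of(freq_table, 14), 1))   # 13 of the 18 scores are at or below 14
print(percentile_of(freq_table, 17))             # the highest score has every score at or below it
```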
Homework Complete exercises 1, 6, and 9 for chapter 3. Read chapters 4 and 5 for next week.