Inferential Statistics

Descriptive Statistics vs. Inferential Statistics
Descriptive statistics. Definition: Statistics used to describe the characteristics of a distribution of scores. They apply only to the members of the sample or population from which data have been collected. Generalizability to the population is not the objective of descriptive statistics.
Inferential statistics. Definition: Statistics, derived from sample data, that are used to make inferences about the population from which the sample was drawn. Generalizability is important in this type of statistic because it is the ability to use the results of data collected from a sample to reach conclusions about the characteristics of the population.

Population
Definition: The collection of cases that comprises the entire set of cases with the specified characteristics (e.g., “All living adult males in the United States”).
Example: To find the average salary of Psychology majors who graduated from college in 2004, collect the salaries of all the 2004 Psychology graduates and derive an average from those data. Any value generated from or applied to the population is a parameter.

Sample
Definition: A collection of cases selected from a larger population.
Example: To find the average salary of Psychology majors who graduated from college in 2004, select (randomly or non-randomly) some of these graduates and derive a mean from their salaries. Any value derived from the sample, such as the mean, is a statistic.
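The distinction between a parameter and a statistic can be sketched in a few lines of Python. The salary figures below are invented for illustration only, and the names `population` and `sample` are assumptions, not part of the original slides.

```python
import random
import statistics

# Hypothetical population: salaries (in dollars) of every 2004 graduate.
# These numbers are made up purely for illustration.
rng = random.Random(0)
population = [rng.randint(25_000, 60_000) for _ in range(1_000)]

# Parameter: a value computed from every case in the population.
population_mean = statistics.fmean(population)

# Statistic: a value computed from a sample drawn from that population.
sample = rng.sample(population, 50)
sample_mean = statistics.fmean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two printed values will usually be close but not identical, which is why the sample mean is treated as an estimate of the population mean.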

Sampling Methods: Random, Representative, Convenience
RANDOM
Definition: Selecting cases from a population in a manner that ensures each member of the population has an equal chance of being selected into the sample. Random sampling is one of the most useful methods, but also one of the most difficult to use. Its major benefit is that any differences between the sample and the population from which the sample was selected will not be systematic.
REPRESENTATIVE
Definition: A method of selecting a sample in which members are purposely selected to create a sample that represents the population on some characteristic(s) of interest (e.g., when a sample is selected to have the same percentages of various ethnic groups as the larger population). This type of sampling can be expensive and time consuming; however, it ensures that the sample looks like the population on some important variables, thereby increasing the generalizability of the results.
CONVENIENCE
Definition: Selecting a sample based on ease of access or availability. This method is less labor-intensive than selecting a random or representative sample. For it to be an acceptable method, the sample cannot differ from the population of interest in ways that influence the outcome of the study.
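The three sampling methods can be contrasted with a small sketch. The population, the group labels "A", "B", and "C", and the function names are all hypothetical; the representative sampler uses simple proportional quotas, which is only one way to build such a sample.

```python
import random
from collections import Counter

rng = random.Random(1)

# Hypothetical population of 100 cases, each tagged with a group label.
population = (
    [("A", i) for i in range(60)]
    + [("B", i) for i in range(30)]
    + [("C", i) for i in range(10)]
)

def random_sample(cases, k):
    """Random: every case has an equal chance of being selected."""
    return rng.sample(cases, k)

def representative_sample(cases, k):
    """Representative: match the population's group percentages."""
    counts = Counter(group for group, _ in cases)
    chosen = []
    for group, count in counts.items():
        quota = round(k * count / len(cases))  # rounding may not sum to k in general
        members = [case for case in cases if case[0] == group]
        chosen.extend(rng.sample(members, quota))
    return chosen

def convenience_sample(cases, k):
    """Convenience: take whichever cases are easiest to reach (here, the first k)."""
    return cases[:k]

for name, sampler in [("random", random_sample),
                      ("representative", representative_sample),
                      ("convenience", convenience_sample)]:
    groups = Counter(group for group, _ in sampler(population, 20))
    print(name, dict(groups))
```

In this toy setup the convenience sample contains only group "A", which illustrates how a convenience sample can differ systematically from the population.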

Variable
Definition: Any construct with more than one value that is examined in research. Examples include income, gender, age, height, attitudes about school, score on a measure of depression, etc.

Types of Variables
Quantitative (continuous) variable: A variable that has assigned values, where the values are ordered and meaningful, such that 1 is less than 2, 2 is less than 3, etc.
Qualitative (categorical) variable: A variable that has discrete categories. If the categories are given numerical values, the values have meaning as nominal references but not as numerical values (e.g., in 1 = “male” and 2 = “female”, 1 is not more or less than 2).

Scales of Measurement for Variables
Nominal (or categorical) variable: A variable in which the numerical values assigned to each category are simply labels rather than meaningful numbers.
Ordinal variable: A variable measured with numerical values where the numbers are meaningful (e.g., 2 is larger than 1) but the distance between the numbers is not constant.
Interval or ratio variable: A variable measured with numerical values with equal distance, or space, between each number (e.g., the distance between 1 and 2 is the same as the distance between 2 and 3). On a ratio scale, which also has a true zero point, ratios are meaningful as well (e.g., 2 is twice as much as 1, and 4 is twice as much as 2).

Collecting Data
Collecting data produces a group of scores on one or more variables. To get the distribution of scores, arrange the scores from lowest to highest. Researchers are usually interested in central tendency, a set of distribution characteristics consisting of the mean, median, and mode.

The Mean
Definition: The arithmetic average of a distribution of scores.
It provides a single, simple number that gives a rough summary of the distribution, and it is the most commonly used statistic in all social science research. It is useful, but it does not tell you anything about how spread out the scores are (i.e., variance) or how many scores in the distribution are close to the mean.

The Median
Definition: The score in a distribution that marks the 50th percentile. It is the score that 50% of the distribution falls below and 50% falls above.
It is used when dividing distribution scores into two groups (median split). It is a useful statistic to examine when the scores in a distribution are skewed or when there are a few extreme scores at the high end or the low end of the distribution.

The Mode
Definition: The score in the distribution that occurs most frequently.
It is the least used of the measures of central tendency and provides the least amount of information.

Formula for calculating the mean of a distribution
Method 1: Add, or sum, all of the scores in the distribution, then divide by the number of scores.
Sample mean: $\bar{X} = \frac{\Sigma X}{n}$
Population mean: $\mu = \frac{\Sigma X}{N}$
where $\bar{X}$ is the sample mean, $\mu$ is the population mean, $\Sigma$ means “the sum of”, $X$ is an individual score in the distribution, $n$ is the number of scores in the sample, and $N$ is the number of scores in the population.
OR
Method 2: Multiply each value by the frequency with which the value occurred, add all of these products, then divide by the number of scores.
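Both ways of calculating the mean (summing every score, or weighting each value by its frequency) give the same answer. A minimal sketch, using made-up scores:

```python
from collections import Counter

scores = [2, 3, 3, 4, 4, 4, 5, 5]  # made-up scores for illustration

# Method 1: sum all of the scores and divide by the number of scores.
mean_direct = sum(scores) / len(scores)

# Method 2: multiply each value by its frequency, add the products,
# and divide by the number of scores.
frequencies = Counter(scores)
total = sum(value * freq for value, freq in frequencies.items())
mean_weighted = total / sum(frequencies.values())

print(mean_direct, mean_weighted)
```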

Calculating The Median
Arrange all of the scores in the distribution in order, from smallest to largest, then find the middle score in the distribution.
If there is an odd number of scores, there will be a single score that marks the middle of the distribution.
If there is an even number of scores, the median is the average of the two scores in the middle of the distribution (as long as the scores are arranged in order). To find the average, add the two middle scores together and divide by two.
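The steps above can be sketched as a short function. The function name `median` and the example scores are illustrative choices; Python's standard library also provides `statistics.median`, which should agree with this sketch.

```python
def median(scores):
    """Return the median of a non-empty list of numeric scores."""
    ordered = sorted(scores)          # arrange from smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of scores: a single score marks the middle.
        return ordered[middle]
    # Even number of scores: average the two middle scores.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([5, 1, 3]))     # odd number of scores
print(median([4, 1, 3, 2]))  # even number of scores
```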

Finding The Mode
Remember, the mode is simply the category in the distribution that has the highest number of scores, or the highest frequency.
Multimodal: When a distribution of scores has two or more values that have the highest frequency of scores.
Bimodal distribution: A distribution that has two values that have the highest frequency of scores; it often occurs when people respond to controversial questions that tend to polarize the public.
Example of a bimodal distribution: “On the following scale, please indicate how you feel about capital punishment,” answered on a scale from 1 (Strongly Opposed) to 5 (Strongly In Favor).
[Figure: frequency of responses in each category of the response scale]
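A small sketch of finding every mode in a distribution follows. The response counts are invented to mimic a polarized, bimodal pattern; they are not the frequencies from the original slide, and the helper name `modes` is an assumption.

```python
from collections import Counter

def modes(scores):
    """Return every value that shares the highest frequency."""
    counts = Counter(scores)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

# Invented responses on a 1-5 scale, piled up at both extremes.
responses = [1] * 4 + [2] * 2 + [3] * 1 + [4] * 2 + [5] * 4

print(modes(responses))           # two modes: the distribution is bimodal
print(modes([1, 2, 2, 3]))        # a single mode
```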

Example: The Mean, Median, and Mode of a Distribution
A distribution of test scores is given.
Mean: Add up all the scores, then divide by the number of scores. In this case, there are 8 IQ scores, so the mean is the sum of the scores divided by 8.
Median = 98: Because there is an even number of scores, sum the two scores found in the middle of the distribution when it is put into numerical order, then divide by two.
Mode = 96: 96 is the score that occurs most frequently.

Skewed Distribution
Definition: A distribution in which a high number of scores is clustered at one end, with relatively few scores spread out toward the other end, forming a tail.
When working with a skewed distribution, the mean, median, and mode are usually all at different points rather than at the center of the distribution.
Similarities between a skewed and a normal distribution: The procedures used to calculate the mean, median, and mode are the same.
Differences between a skewed and a normal distribution: The position of the three measures of central tendency in the distribution.
[Figure: two panels, “Left or Negative” skew and “Right or Positive” skew]

Skewness Ranges
If skewness is less than −1 or greater than +1, the distribution is highly skewed. If skewness is between −1 and −½ or between +½ and +1, the distribution is moderately skewed. If skewness is between −½ and +½, the distribution is approximately symmetric.
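The ranges above can be applied with a short sketch. The slides do not say which skewness formula they assume; the code below uses the common moment-based coefficient (third central moment divided by the 1.5 power of the second), which is an assumption, and the example scores are made up.

```python
def skewness(scores):
    """Moment-based skewness: m3 / m2**1.5 (an assumed formula)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    return m3 / m2 ** 1.5

def describe_skew(value):
    """Apply the rule-of-thumb ranges for skewness."""
    if abs(value) > 1:
        return "highly skewed"
    if abs(value) >= 0.5:
        return "moderately skewed"
    return "approximately symmetric"

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 1, 10]  # one extreme high score forms a right tail

for data in (symmetric, right_tailed):
    g = skewness(data)
    print(data, round(g, 2), describe_skew(g))
```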

Kurtosis
If a distribution is symmetric, the next question is about the central peak: is it high and sharp, or short and broad? The reference standard is a normal distribution, which has a kurtosis of exactly 3. Often the excess kurtosis is presented instead: excess kurtosis = kurtosis − 3, so a normal distribution has an excess kurtosis of exactly 0.
Mesokurtic: Any distribution with kurtosis ≈ 3 (excess ≈ 0).
Platykurtic: A distribution with kurtosis < 3 (excess kurtosis < 0). Compared to a normal distribution, its central peak is lower and broader, and its tails are shorter and thinner.
Leptokurtic: A distribution with kurtosis > 3 (excess kurtosis > 0). Compared to a normal distribution, its central peak is higher and sharper, and its tails are longer and fatter.
[Figure: three example distributions with kurtosis = 3 (excess = 0), kurtosis = 1.8 (excess = −1.2), and kurtosis = 4.2 (excess = 1.2)]
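A sketch of the kurtosis classification follows, assuming the moment-based definition (fourth central moment divided by the squared second central moment). The `tolerance` used to decide what counts as "approximately 3" is a hypothetical choice, as are the example scores.

```python
def kurtosis(scores):
    """Moment-based kurtosis: m4 / m2**2 (a normal distribution gives 3)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    return m4 / m2 ** 2

def describe_kurtosis(value, tolerance=0.1):
    """Classify by excess kurtosis; the tolerance is an illustrative assumption."""
    excess = value - 3
    if abs(excess) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if excess > 0 else "platykurtic"

flat = [1, 2, 3, 4, 5]                       # evenly spread scores
heavy_tails = [-5, 0, 0, 0, 0, 0, 0, 0, 0, 5]  # sharp peak with two extreme scores

for data in (flat, heavy_tails):
    k = kurtosis(data)
    print(round(k, 2), round(k - 3, 2), describe_kurtosis(k))
```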

Measures of Central Tendency vs. Measures of Variability
Measures of central tendency provide useful information, but are limited: they say little about the dispersion of scores in a distribution or, in other words, the variety of the scores. Three measures of dispersion that researchers typically examine are the range, the variance, and the standard deviation. The standard deviation is the most informative and widely used of the three.

Range
Definition: The difference between the largest score (maximum value) and the smallest score (minimum value) of a distribution.
It gives researchers a quick sense of how spread out the scores of a distribution are, and helps show whether all or most of the points on a scale, such as a survey, were covered. It is not very practical and is misleading at times, because it depends only on the two most extreme scores.

Interquartile Range (IQR)
Definition: The difference between the 75th percentile (third quartile) and 25th percentile (first quartile) scores in a distribution.
If the scores in a distribution are arranged in numerical order, the IQR contains the scores in the two middle quartiles.
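The range and the interquartile range can be computed with Python's standard library. Quartile conventions differ between textbooks and software; this sketch uses the "inclusive" method of `statistics.quantiles`, which is one reasonable choice rather than the only one, and the scores are made up.

```python
import statistics

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up scores, already in order

# Range: largest score minus smallest score.
value_range = max(scores) - min(scores)

# Quartiles: cut points at the 25th, 50th, and 75th percentiles.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

# Interquartile range: third quartile minus first quartile.
iqr = q3 - q1

print("range:", value_range)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```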

Variance
Definition: The sum of the squared deviations divided by the number of cases in the population, or by the number of cases minus one in the sample.
It provides a statistical average of the amount of dispersion in a distribution of scores. Researchers rarely look at the variance by itself because it is not on the same scale as the original measure of a variable (it is in squared units); even so, it is helpful for calculating other statistics (e.g., analysis of variance, regression).

Standard Deviation
Definition: The average deviation between the individual scores in the distribution and the mean for the distribution.
To understand the standard deviation, consider the meanings of the two words. Standard: typical or average. Deviation: the difference between an individual score and the average score for the distribution.
It is a useful statistic that provides a handy measure of how spread out the scores are in the distribution. When combined, the mean and standard deviation provide a pretty good picture of what the distribution of scores is like.

Sample Statistics as Estimates of Population Parameters
For the most part, researchers are concerned with what a sample tells us about the population from which the sample was drawn. This is important because most statistics, although generated from sample data, are used to make inferences about the population. The formulas for calculating the variance and standard deviation of sample data are actually designed to make sample statistics better estimates of the population parameters (i.e., the population variance and standard deviation).

Making Sense of the Formulas for Calculating the Variance
Here we are interested not in the average score of the distribution, but in the average difference, or deviation, between each score in the distribution and the mean of the distribution. First, calculate a deviation score for each individual score in the distribution. See the next slide for the formulas.

Similarities Between the Variance and Standard Deviation Formulas
Variance
Population: $\sigma^2 = \frac{\Sigma (X - \mu)^2}{N}$
Estimate based on a sample: $s^2 = \frac{\Sigma (X - \bar{X})^2}{n - 1}$
Standard Deviation
Population: $\sigma = \sqrt{\frac{\Sigma (X - \mu)^2}{N}}$
Estimate based on a sample: $s = \sqrt{\frac{\Sigma (X - \bar{X})^2}{n - 1}}$
where $\Sigma$ means to sum, $X$ is a score in the distribution, $\mu$ is the population mean, $N$ is the number of cases in the population, $\bar{X}$ is the sample mean, and $n$ is the number of cases in the sample.
The formulas for calculating the variance and the standard deviation are virtually identical; the square root in the standard deviation formula is the only difference. The variance is calculated the same way for sample and population data except for the denominator of the sample formula, which is n − 1. The formula for calculating the variance is known as the deviation score formula.
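The deviation score formulas can be traced step by step in code. The scores below are made up; the same list is treated first as a whole population (dividing by N) and then as a sample (dividing by n − 1) to show how the two results differ.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up scores for illustration
n = len(scores)
mean = sum(scores) / n

# Sum of the squared deviations of each score from the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in scores)

# Population formulas: divide by the number of cases.
population_variance = sum_sq_dev / n
population_sd = math.sqrt(population_variance)

# Sample formulas: divide by the number of cases minus one.
sample_variance = sum_sq_dev / (n - 1)
sample_sd = math.sqrt(sample_variance)

print("population:", population_variance, population_sd)
print("sample:    ", round(sample_variance, 3), round(sample_sd, 3))
```

The standard library functions `statistics.pvariance`, `statistics.pstdev`, `statistics.variance`, and `statistics.stdev` compute the same four quantities.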

Differences Between the Variance and Standard Deviation Formulas: Why n – 1?
Brief explanation: If the population mean is unknown, use the sample mean as an estimate. But the sample mean will probably differ from the population mean.
The sum of squared deviations is smallest when the deviations are taken from the actual mean of the scores being used. Whenever a number other than that mean is used to calculate the variance, a larger variance will be found, regardless of whether the number used in the formula is smaller or larger than the actual mean.
Because the sample mean usually differs from the population mean, the variance and standard deviation calculated around the sample mean will probably be smaller than they would have been if the population mean had been used.
As a result, when the sample mean is used to generate an estimate of the population variance or standard deviation, it will tend to underestimate the size of the population variance or standard deviation.
To adjust for this underestimation, use n – 1 in the denominator of the sample formulas. A smaller denominator produces larger variance and standard deviation statistics, making them more accurate estimates of the population parameters.
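The underestimation can be checked exactly on a tiny example. The sketch below takes a made-up population of five scores, lists every possible sample of size 2 drawn with replacement, and averages the variance estimates obtained with each denominator. Drawing with replacement is an assumption chosen to keep the arithmetic exact.

```python
from itertools import product
from statistics import fmean, pvariance

population = [1, 2, 3, 4, 5]           # made-up population
true_variance = pvariance(population)  # population variance (divides by N)

n = 2
samples = list(product(population, repeat=n))  # all 25 ordered samples

def sum_sq_dev(sample):
    """Sum of squared deviations around the sample's own mean."""
    m = fmean(sample)
    return sum((x - m) ** 2 for x in sample)

# Average estimate when dividing by n versus by n - 1.
avg_with_n = fmean([sum_sq_dev(s) / n for s in samples])
avg_with_n_minus_1 = fmean([sum_sq_dev(s) / (n - 1) for s in samples])

print("true population variance:", true_variance)
print("average estimate, divide by n:    ", avg_with_n)
print("average estimate, divide by n - 1:", avg_with_n_minus_1)
```

In this example, dividing by n gives estimates that average half the true variance, while dividing by n − 1 gives estimates that average the true variance.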

Working with a Population Distribution
Researchers usually assume they are working with a sample that represents a larger population. How much difference it makes to use N rather than n − 1 in the denominator depends on the size of the sample. If the sample is large, there is virtually no difference. If the sample is small, there is a relatively large difference between the results produced by the population and sample formulas.
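Because both formulas share the same numerator, the sample-formula variance is always n / (n − 1) times the population-formula variance for the same scores. A brief sketch of how that factor shrinks as the sample grows (the function name `inflation_factor` and the sample sizes are illustrative):

```python
def inflation_factor(n):
    """Ratio of the n - 1 variance to the N variance for the same n scores."""
    return n / (n - 1)

for n in (5, 50, 500):
    print(f"n = {n:3d}: sample variance is {inflation_factor(n):.4f} x population-formula variance")
```

With 5 scores the sample formula gives a variance 25% larger; with 500 scores the difference is about 0.2%.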

Why Have Variance? Why not go straight to standard deviation?
We need to calculate the variance before finding the standard deviation. That is because we need to square the deviation scores (so they will not sum to zero). These squared deviations produce the variance. Then we need to take the square root to find the standard deviation. The fundamental piece of the variance formula, which is the sum of the squared deviations, is used in a number of other statistics, most notably analysis of variance (ANOVA)

Students’ responses to the item “I would feel really good if I were the only one who could answer the teacher’s question in class.”
Sample Size = 491
Mean = 2.92
Standard Deviation = 1.43
Variance = (1.43)^2 = 2.04
Range = 5 – 1 = 4
The range does not provide very much information. The mean of 2.92 is not particularly informative either, because from the mean alone it is impossible to determine whether:
most students circled a 3 on the scale;
roughly equal numbers of students circled each of the five numbers on the response scale; or
almost half of the students circled 1 whereas the other half circled 5.

Drawing Conclusions…
Consider the standard deviation in conjunction with the mean.
Predicting what the size of the standard deviation will be:
If almost all of the students circled a 2 or a 3 on the response scale, expect a fairly small standard deviation.
If half of the students circled 1 whereas the other half circled 5, expect a large standard deviation (about 2.0) because each score would be about two units away from the mean.
If the responses are fairly evenly spread out across the five response categories, expect a moderately sized standard deviation (about 1.50).
[Figure: boxplot for the “desire to appear able” variable, the same variable shown in the previous graph (wanting to demonstrate ability)]
Conclusions: The distribution looks somewhat symmetrical because the mean of 2.92 is near the middle of the scale. From the standard deviation of 1.43, we know that the scores are pretty well spread out across the five response categories.
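The three predictions can be checked with invented response patterns on a 1 to 5 scale. The counts below are hypothetical (100 respondents per pattern), not the 491 responses from the study, and the population formula is used for simplicity.

```python
from statistics import fmean, pstdev

# Hypothetical response patterns, 100 respondents each.
clustered = [2] * 50 + [3] * 50    # almost everyone circles 2 or 3
polarized = [1] * 50 + [5] * 50    # half circle 1, half circle 5
even = [1, 2, 3, 4, 5] * 20        # responses spread evenly

for name, data in [("clustered", clustered),
                   ("polarized", polarized),
                   ("even", even)]:
    print(f"{name:10s} mean = {fmean(data):.2f}  SD = {pstdev(data):.2f}")
```

The polarized and even patterns have the same mean of 3 but clearly different standard deviations (2.0 versus roughly 1.4), which is why the standard deviation is read together with the mean.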
