# Chapter 1 Introduction


Chapter 1 Introduction
Individual: objects described by a set of data (people, animals, or things). Variable: a characteristic of an individual. It can take on different values for different individuals. Examples: age, height, gender, favorite class, speed, moisture, etc.

Types of Variables
Quantitative: numerical values that can be added, subtracted, averaged, etc. Discrete: takes on values which are spaced. That is, for two values of a discrete variable that are adjacent, there is no value that goes between them. Continuous: values are all numbers in a given interval. That is, between any two values of a continuous variable there is always another possible value. Categorical: an individual is placed into one of several groups or categories. These groups or categories are not usually numerical.

Types of Variables Examples
Classify each variable as numeric (discrete or continuous) or categorical: Length, Hours Enrolled, Major, Zip Code.

Distribution of a Variable
The distribution of a variable tells us the possible values for the variable and the probability that the variable takes these values. There are two ways to describe a distribution: numerically and graphically.

Categorical Variables
Suppose we poll 46 people on an issue. How can we exhibit their response? Numerically: counts, proportions, percentages. Graphically: frequency tables, bar charts, pie charts.

Categorical Variables
Suppose we poll 46 people on an issue. How can we exhibit their response? Frequency tables: counts (14 agree), proportions (14/46 = 0.304 agree), percents (30.4% agree).
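As a rough sketch of how the three columns of a frequency table relate, the snippet below tallies a poll of 46 responses. Only the 14 "agree" answers come from the slide; the split of the remaining 32 responses and the helper name `frequency_table` are assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical poll of 46 people: only the 14 "agree" responses come from
# the slide; the split of the other 32 is assumed for illustration.
responses = ["agree"] * 14 + ["disagree"] * 20 + ["no opinion"] * 12


def frequency_table(values):
    """Return {category: (count, proportion, percent)} for a list of labels."""
    counts = Counter(values)
    n = len(values)
    return {
        category: (count, round(count / n, 3), round(100 * count / n, 1))
        for category, count in counts.items()
    }


table = frequency_table(responses)
for category, (count, proportion, percent) in table.items():
    print(f"{category:<12}{count:>4}{proportion:>8.3f}{percent:>7.1f}%")
```

The proportion is the count divided by the total, and the percent is the proportion times 100, so any one column can be recovered from another once the total is known.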

Categorical Variables
Suppose we poll 46 people on an issue. How can we exhibit their response? Bar chart: can have counts, percents, or proportions on the vertical axis.

Categorical Variables
Suppose we poll 46 people on an issue. How can we exhibit their response? Pie Chart:

Examining a Distribution
To describe a distribution we need 3 items. Shape: modes, symmetric, skewed. Center: mean, median. Spread: range, standard deviation, IQR. Look for the overall pattern and for striking deviations. Outlier: an individual value that falls outside the overall pattern.

Numeric Variable Distributions
Shape. Modes: major peaks in the distribution. Symmetric: the values smaller and larger than the midpoint are mirror images of each other. Skewed to the right: the right tail is much longer than the left tail. Skewed to the left: the left tail is much longer than the right tail.

Center. Mean: the arithmetic average. Add up the numbers and divide by the number of observations. Median: list the data from smallest to largest. If there is an odd number of data values, the median is the middle one in the list. If there is an even number of data values, average the middle two in the list.
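The two measures of center described above can be sketched in a few lines. The sample lists are made up for illustration and are not from the book; the last one shows how a single large value pulls the mean toward the long tail while the median stays put.

```python
def mean(values):
    """Arithmetic average: add up the numbers, divide by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted list, or the average of the middle two."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Made-up illustration data, not from the book.
odd_sample = [3, 1, 7, 5, 9]
even_sample = [3, 1, 7, 5]
skewed_sample = [1, 2, 3, 4, 40]

print(mean(odd_sample), median(odd_sample))
print(mean(even_sample), median(even_sample))
print(mean(skewed_sample), median(skewed_sample))
```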

Numeric Variable Distributions
Spread. Range: the difference between the largest and smallest values (Max – Min). Standard deviation: measures spread by looking at how far observations are from their mean. The formula for the standard deviation of a sample is $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$. Interquartile range (IQR): the distance between the first quartile (Q1) and the third quartile (Q3), IQR = Q3 – Q1. Q1: 25% of the observations are less than Q1 and 75% are greater than Q1. Q3: 75% of the observations are less than Q3 and 25% are greater than Q3.
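A minimal sketch of the range and the standard deviation follows. It assumes the usual sample formula with n - 1 in the denominator; the data and the function names `value_range` and `sample_std` are hypothetical.

```python
import math


def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def sample_std(values):
    """Sample standard deviation, assuming the n - 1 denominator."""
    n = len(values)
    xbar = sum(values) / n
    squared_deviations = sum((x - xbar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


# Made-up illustration data, not from the book.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(value_range(scores))
print(round(sample_std(scores), 3))
```

Each observation contributes its squared distance from the mean, so values far from the mean dominate the result. That is why the standard deviation is sensitive to outliers.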

Numeric Variable Distributions
Example 1.5 on page 11 of the book shows how much 50 consecutive shoppers spent in a store. The data appear as follows: \$3.11 \$18.30 \$24.50 \$36.30 \$50.30 \$8.88 \$18.40 \$25.10 \$38.60 \$52.70 \$9.26 \$19.20 \$26.20 \$39.10 \$54.80 \$10.80 \$19.50 \$41.00 \$59.00 \$12.60 \$27.60 \$42.90 \$61.20 \$13.70 \$20.10 \$28.00 \$44.00 \$70.30 \$15.20 \$20.50 \$44.60 \$82.70 \$15.60 \$22.20 \$28.30 \$45.40 \$85.70 \$17.00 \$23.00 \$32.00 \$46.60 \$86.30 \$17.30 \$24.40 \$34.90 \$48.60 \$93.30

Numerical Variables
How can we describe the distribution of these 50 numbers? Numerically: center (mean or median) and spread (quartiles, range, IQR, or standard deviation). Graphically: frequency table, histogram, boxplot, stem and leaf plot, normal quantile plot.

Descriptive statistics
The descriptives box from SPSS gives the mean, median, variance, standard deviation, minimum, maximum, range, and IQR.

Percentiles
The 50th percentile is also called the median: the middle data value when the data are ordered smallest to largest. The 25th and 75th percentiles are also called the quartiles, Q1 and Q3 respectively: the middle data value of each half.
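The "middle value of each half" description can be sketched as follows. This assumes one common textbook convention, in which the overall median is left out of both halves when the number of observations is odd; statistical software such as SPSS may use a different interpolation rule and give slightly different quartiles. The data are made up.

```python
def median(values):
    """Middle value of the sorted list, or the average of the middle two."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-each-half convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd count, skip the middle value so it belongs to neither half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)


# Made-up illustration data, not from the book.
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8, 9]))
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))
```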

Frequency Table
Notice the amount spent is broken into categories or groups. Recall, frequency tables can be used for categorical variables as well.

| Category | Count or Frequency | Percent |
| --- | --- | --- |
| 0 - 10 | 3 | 6.00% |
|  | 12 | 24.00% |
|  | 13 | 26.00% |
|  | 5 | 10.00% |
|  | 7 | 14.00% |
|  | 4 | 8.00% |
|  | 1 | 2.00% |

Histogram
Breaks the range of values of a variable into intervals (the midpoint is displayed here). Displays only the count or percent of the observations that fall into each interval.

Box Plot
Minimum, Q1, Median, Q3, and Maximum. These five numbers are called the five-number summary. What are these points?

Stem and Leaf Plot
Works best for smaller data sets. Example 1.4 on pg 10: here are the numbers of home runs that Babe Ruth hit in each of his 15 years with the New York Yankees: 54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22
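One way to build the stem and leaf plot for the home run counts is sketched below, using the tens digit as the stem and the ones digit as the leaf. The function name `stem_and_leaf` is hypothetical, and the sketch only handles non-negative whole numbers.

```python
from collections import defaultdict

home_runs = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]


def stem_and_leaf(values):
    """Map each tens digit (stem) to the sorted list of ones digits (leaves)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)
    # Include empty stems so any gaps in the data stay visible.
    return {s: stems.get(s, []) for s in range(min(stems), max(stems) + 1)}


plot = stem_and_leaf(home_runs)
for stem, leaves in plot.items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
```

Read sideways, the rows form a rough histogram: most seasons fall in the 40s, with a single season at 60.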

Normal Quantile Plot
This plot compares the distribution of the sample to the normal distribution. The straight line represents a normal distribution; compare the dots to the line. If the dots fall close to the line, the data plausibly come from a normal distribution.

Describing Numeric Variable Distributions
Now, we examine the appearance of other data. Modes are major peaks in the distribution. The first histogram below has one mode (unimodal); the second has two modes (bimodal).

Describing Numeric Variable Distributions
Now, we examine the appearance of other data. This example is called right skewed since the distribution has a long right tail. This is an example of a boxplot that is skewed to the _______.

Describing Numeric Variable Distributions
Outliers: observations that are unusually far from the bulk of the data. What are some possible explanations for outliers? The data point was recorded wrong. The data point wasn’t actually a member of the population we were trying to sample. We just happened to get an extreme value in our sample. The 1.5 x IQR Criterion for Outliers: designate an observation a suspected outlier if it falls more than 1.5 x IQR below the first quartile or above the third quartile.

1.5*IQR Criterion Example
Suppose you had the following data set: -2, 15, 3, 7, 10, 21, 1, 5, 12, 8, 1, 35, 10. List the data from smallest to largest. Find Q1, Median, Q3, Min, and Max. IQR = Q3 – Q1 = ______. 1.5*IQR = _______. Q1 – 1.5*IQR = ________ (if less than this number, then outlier). Q3 + 1.5*IQR = ________ (if more than this number, then outlier). Are there any outliers in this data set?
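The steps of this exercise can be checked with a short script. It is a sketch that assumes the median-of-each-half convention for quartiles (with the overall median excluded from both halves when the count is odd); other quartile conventions can move the fences slightly.

```python
def median(values):
    """Middle value of the sorted list, or the average of the middle two."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def q1_q3(values):
    """Quartiles as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(upper)


data = [-2, 15, 3, 7, 10, 21, 1, 5, 12, 8, 1, 35, 10]

q1, q3 = q1_q3(data)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr   # anything below this is a suspected outlier
upper_fence = q3 + 1.5 * iqr   # anything above this is a suspected outlier
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(sorted(data))
print(min(data), q1, median(data), q3, max(data))
print(iqr, 1.5 * iqr, lower_fence, upper_fence)
print(outliers)
```

Under that convention only the value 35 lies beyond a fence, so it is the one suspected outlier.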

Describing Numeric Variable Distributions
Symmetry versus Skewness: __________ _________ ___________

Mean versus Median
For a skewed distribution, the mean is farther out in the longer tail than is the median.

| Shape | Mean versus median | To describe the distribution use |
| --- | --- | --- |
| Left skewed | mean < median | Median and IQR |
| Symmetric | mean = median | Mean and standard deviation |
| Right skewed | mean > median | Median and IQR |

Strategy for Exploring Data on a Single Quantitative Variable
Always plot your data: make a graph, usually a stem and leaf plot or histogram. Look for the overall pattern and for outliers. Calculate an appropriate numerical summary to briefly describe center and spread. Sometimes the overall pattern of a large number of observations is so regular that it can be described by a smooth curve.

Introducing the Normal Distribution
It is customary to describe a normal distribution by its mean, μ, and standard deviation, σ. Properties of the normal distribution: symmetric and bell-shaped; mean μ and standard deviation σ; area under the curve is 1.

The Normal Distribution
Normal distributions can take on many different means and standard deviations. Only the general bell shape must remain the same. Here are some examples of normal distributions: μ = -2, μ = 0, μ = 3; σ = 0.5, σ = 1, σ = 2.

Distribution Properties
Introducing: The Standard Normal Distribution Properties: 1. _________________ 2. _________________ 3. _________________

Distribution Properties
Empirical Rule (The 68-95-99.7 Rule): If the distribution is normal, then approximately 68% of the data falls within one standard deviation of the mean, approximately 95% of the data falls within two standard deviations of the mean, and approximately 99.7% of the data falls within three standard deviations of the mean.
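The three percentages can be checked against the standard normal curve. The sketch below uses `NormalDist` from Python's standard `statistics` module (available from Python 3.8); the helper name `area_within` is made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Area under the standard normal curve within k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} standard deviation(s): {100 * area:.2f}%")
```

The same areas hold for any normal distribution, because measuring in standard deviations from the mean removes the dependence on the particular mean and standard deviation.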

Distribution Properties Empirical Rule

Percentiles of a Standard Normal Curve

Empirical Rule Example
If the grades on an exam are normally distributed with a mean of 68 and a variance of 16, what grade do you have to make to be in the top 15% of the class?
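One way to reason about this example: a variance of 16 means a standard deviation of 4. By the empirical rule, about 68% of grades fall within one standard deviation of the mean, leaving about 16% above 68 + 4 = 72, so a grade of roughly 72 is close to the top-15% cutoff. The sketch below compares that estimate with the exact 85th percentile from Python's `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mean_grade = 68
variance = 16
sd = math.sqrt(variance)  # standard deviation is the square root of variance

# Empirical-rule estimate: about 16% of grades lie above mean + 1 sd.
rule_cutoff = mean_grade + 1 * sd

# Exact cutoff: the grade with 85% of the distribution below it.
exact_cutoff = NormalDist(mean_grade, sd).inv_cdf(0.85)

print(rule_cutoff)
print(round(exact_cutoff, 2))
```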

Distribution Properties
Shift Changes: adding or subtracting a number from each of the values. (Figure labels: mean, mean + c, mean - c.)

Distribution Properties
The mean, median, Q1, Q3, minimum, and maximum all shift when there is a shift change. The shift change, say c, is added or subtracted to each of the statistics accordingly. The measures of spread (standard deviation, variance, IQR, and range) do not change when there is a shift change.
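A quick numerical check of the shift rule, using a small made-up list of weights (not the team data from the example) and Python's standard `statistics` module:

```python
import statistics

# Hypothetical raw data, made up for illustration.
weights = [170, 200, 230, 240, 280, 350]
c = 10  # the shift change


def summarize(values):
    """Collect a few measures of center, position, and spread."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "sd": statistics.stdev(values),
        "range": max(values) - min(values),
    }


before = summarize(weights)
after = summarize([w + c for w in weights])

for name in before:
    print(f"{name:<7}{before[name]:>10.2f}{after[name]:>10.2f}")
```

The mean, median, minimum, and maximum each move up by c, while the standard deviation and range are unchanged, because every observation keeps the same distance from every other observation.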

Distribution Properties
Scale Changes: multiplying or dividing each of the values by a number. (Figure labels: mean, mean*c, mean/c.)

Distribution Properties
The mean, median, Q1, Q3, minimum, and maximum all change when there is a scale change unless they are zero. Each is multiplied or divided by the scale change c. The measures of spread (standard deviation, variance, IQR, and range) always change when there is a scale change. The standard deviation, IQR, and range are multiplied or divided by the scale change c. The variance is multiplied or divided by $c^2$.
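The scale rule can be verified the same way. The data and the scale factor below are made up for illustration; the point is the ratio of each statistic after and before the change.

```python
import statistics

# Hypothetical raw data and scale change, made up for illustration.
values = [170, 200, 230, 240, 280, 350]
c = 0.5
scaled = [c * v for v in values]

ratios = {
    "mean": statistics.mean(scaled) / statistics.mean(values),
    "median": statistics.median(scaled) / statistics.median(values),
    "sd": statistics.stdev(scaled) / statistics.stdev(values),
    "range": (max(scaled) - min(scaled)) / (max(values) - min(values)),
    "variance": statistics.variance(scaled) / statistics.variance(values),
}

for name, ratio in ratios.items():
    print(f"{name:<9}{ratio:.4f}")
```

Every ratio equals c except the variance, whose ratio is c squared, since the variance is measured in squared units.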

Shift Change Example
Suppose we measure the weight of everyone on a football team and obtain the following statistics for a team report: Mean: 230 lbs.; Median: 240 lbs.; Std. Dev.: 50 lbs.; Variance: 2500 sq. lbs.; Q1: 200 lbs.; Q3: 280 lbs.; IQR: 80 lbs.; Min.: 170 lbs.; Max.: 350 lbs.; Range: 180 lbs.

Shift Change Example
Now suppose we found out the scale was 10 lbs. under so we need to add 10 lbs. to every weight. What would happen to each of the following statistics?

| Statistic | Original | After Shift Change |
| --- | --- | --- |
| Mean | 230 lbs. | ________ |
| Median | 240 lbs. | ________ |
| s | 50 lbs. | ________ |
| Q1 | 200 lbs. | ________ |
| Q3 | 280 lbs. | ________ |

Shift Change Example
Now suppose we found out the scale was 10 lbs. under so we need to add 10 lbs. to every weight. What would happen to each of the following statistics?

| Statistic | Original | After Shift Change |
| --- | --- | --- |
| Variance | 2500 sq. lbs. | ________ |
| IQR | 80 lbs. | ________ |
| Min | 170 lbs. | ________ |
| Max | 350 lbs. | ________ |
| Range | 180 lbs. | ________ |

Shift and Scale Change Example
Further, suppose we found out that we are supposed to report the weights and statistics in kilograms, not lbs. (Remember, 1 lb ≈ 0.45 kilograms.) What would happen to each of the following statistics?

| Statistic | After Shift Change | After Shift and Scale Change |
| --- | --- | --- |
| Mean | 240 lbs. | ________ |
| Median | 250 lbs. | ________ |
| s | 50 lbs. | ________ |
| Q1 | 210 lbs. | ________ |
| Q3 | 290 lbs. | ________ |

Shift and Scale Change Example
Further, suppose we found out that we are supposed to report the weights and statistics in kilograms, not lbs. (Remember, 1 lb ≈ 0.45 kilograms.) What would happen to each of the following statistics?

| Statistic | After Shift Change | After Shift and Scale Change |
| --- | --- | --- |
| Variance | 2500 sq. lbs. | ________ |
| IQR | 80 lbs. | ________ |
| Min | 180 lbs. | ________ |
| Max | 360 lbs. | ________ |
| Range | 180 lbs. | ________ |
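The shift and scale rules can be applied directly to the team's summary statistics without the raw weights. The sketch below is one way to organize that; the function `linear_change` and the constant `LB_TO_KG` are hypothetical names, and the conversion factor is an assumption (about 0.4536 kg per pound) that should be swapped for whichever factor the exercise calls for.

```python
LB_TO_KG = 0.4536  # assumed approximate factor; replace as the exercise requires

original = {
    "mean": 230, "median": 240, "q1": 200, "q3": 280, "min": 170, "max": 350,
    "sd": 50, "iqr": 80, "range": 180, "variance": 2500,
}

# Statistics that mark a position on the scale, as opposed to a spread.
POSITION = {"mean", "median", "q1", "q3", "min", "max"}


def linear_change(stats, shift=0.0, scale=1.0):
    """Apply 'add shift, then multiply by scale' to summary statistics."""
    changed = {}
    for name, value in stats.items():
        if name in POSITION:
            changed[name] = scale * (value + shift)   # shifted and scaled
        elif name == "variance":
            changed[name] = scale ** 2 * value        # squared units
        else:
            changed[name] = abs(scale) * value        # spread ignores shifts
    return changed


after_shift = linear_change(original, shift=10)
in_kilograms = linear_change(after_shift, scale=LB_TO_KG)

for name in original:
    print(f"{name:<9}{after_shift[name]:>9.1f}{in_kilograms[name]:>10.2f}")
```

Measures of spread ignore the 10 lb. correction entirely, and the variance picks up the square of the conversion factor because it is measured in squared units.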

Linear Transformations
If you are given a mean, $\bar{x}$ (or $\mu$), and a standard deviation, $s$ (or $\sigma$), and want to convert your data so you have a new mean, $\bar{x}_{new}$ (or $\mu_{new}$), and new standard deviation, $s_{new}$ (or $\sigma_{new}$), all you need is to remember what shift and scale changes affect. In our linear transformation formula, $x_{new} = a + bx$, $a$ is the shift change and $b$ is the scale change. Standard deviations are only affected by scale changes, but means are affected by both shift and scale changes.

Linear Transformation Example
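For example: $\bar{x} = 12$ and $s = 7$, but we want $\bar{x}_{new} = 25$ and $s_{new} = 10$.

Scale: $s_{new} = \text{scale} \cdot s$, so $10 = \text{scale} \cdot 7$ and $\text{scale} = 10/7 \approx 1.43$.

Shift: substituting into $\bar{x}_{new} = \text{shift} + \text{scale} \cdot \bar{x}$ gives $25 = \text{shift} + 1.43 \cdot 12$, so $\text{shift} = 25 - 1.43 \cdot 12 = 7.84$.

So our linear transformation equation is: $x_{new} = 7.84 + 1.43x$.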
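The same two-step calculation can be written as a small function. This is a sketch with a hypothetical name, `linear_transformation`; it keeps full precision for the scale, so the shift comes out near 7.86 rather than the 7.84 obtained by rounding the scale to 1.43 first.

```python
def linear_transformation(mean, sd, new_mean, new_sd):
    """Return (a, b) so that x_new = a + b * x has the requested mean and sd."""
    b = new_sd / sd          # scale: only the scale change affects the sd
    a = new_mean - b * mean  # shift: what is left after scaling the mean
    return a, b


a, b = linear_transformation(mean=12, sd=7, new_mean=25, new_sd=10)
print(round(a, 2), round(b, 2))

# Rounding the scale to 1.43 before solving for the shift gives 7.84.
shift_with_rounded_scale = round(25 - 1.43 * 12, 2)
print(shift_with_rounded_scale)
```

Solving for the scale first works because the shift has no effect on the standard deviation; once the scale is fixed, the shift is whatever moves the scaled mean onto the target.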
