Chapter 1 Introduction Individual: objects described by a set of data (people, animals, or things) Variable: Characteristic of an individual. It can take.
1 Chapter 1 Introduction
Individual: an object described by a set of data (a person, animal, or thing).
Variable: a characteristic of an individual. It can take on different values for different individuals.
Examples: age, height, gender, favorite class, speed, moisture, etc.
2 Types of Variables
Quantitative: takes numerical values that can be added, subtracted, averaged, etc.
Discrete: takes on values that are spaced apart. That is, for two adjacent values of a discrete variable, there is no value that goes between them.
Continuous: values are all numbers in a given interval. That is, between any two values of a continuous variable, there is always another value.
Categorical: an individual is placed into one of several groups or categories. These groups or categories are usually not numerical.
4 Distribution of a Variable
The distribution of a variable tells us the possible values for the variable and the probability that the variable takes these values.
There are two ways to describe a distribution: numerically and graphically.
5 Categorical Variables
Suppose we poll 46 people on an issue. How can we exhibit their responses?
Numerically: counts, proportions, percentages
Graphically: frequency tables, bar charts, pie charts
6 Categorical Variables
Suppose we poll 46 people on an issue. How can we exhibit their responses?
Frequency tables: counts (14 agree), proportions (14/46 = .304 agree), percents (30.4% agree)
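The three numerical summaries on this slide are the same information in different units. A minimal Python sketch of the conversion, using the slide's 14 "agree" responses out of 46 (the function name `summarize_count` is only illustrative):

```python
def summarize_count(count, total):
    """Return the count, proportion, and percent for one response category."""
    proportion = count / total
    return {
        "count": count,
        "proportion": round(proportion, 3),
        "percent": round(100 * proportion, 1),
    }


# 14 of the 46 people polled agree (figures taken from the slide).
agree = summarize_count(14, 46)
print(agree)  # {'count': 14, 'proportion': 0.304, 'percent': 30.4}
```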
7 Categorical Variables
Suppose we poll 46 people on an issue. How can we exhibit their responses?
Bar chart: can have counts, percents, or proportions on the vertical axis.
8 Categorical Variables
Suppose we poll 46 people on an issue. How can we exhibit their responses?
Pie chart:
9 Examining a Distribution
To describe a distribution we need three items:
Shape: modes, symmetric, skewed
Center: mean, median
Spread: range, standard deviation, IQR
Look for the overall pattern and for striking deviations.
Outlier: an individual value that falls outside the overall pattern.
10 Numeric Variable Distributions
Shape:
Modes: major peaks in the distribution.
Symmetric: the values smaller and larger than the midpoint are mirror images of each other.
Skewed to the right: the right tail is much longer than the left tail.
Skewed to the left: the left tail is much longer than the right tail.
Center:
Mean: the arithmetic average. Add up the numbers and divide by the number of observations.
Median: list the data from smallest to largest. If there is an odd number of data values, the median is the middle one in the list. If there is an even number of data values, average the middle two in the list.
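The two measures of center described above can be sketched directly from their definitions. This is a minimal Python illustration; the two small samples are hypothetical and are not data from the book.

```python
def mean(values):
    """Arithmetic average: add up the numbers and divide by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted list, or the average of the middle two."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_sample = [7, 1, 4]        # hypothetical data, odd number of values
even_sample = [7, 1, 4, 10]   # hypothetical data, even number of values

print(mean(odd_sample), median(odd_sample))    # 4.0 4
print(mean(even_sample), median(even_sample))  # 5.5 5.5
```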
11 Numeric Variable Distributions
Spread:
Range: the difference between the largest and smallest values (Max - Min).
Standard deviation: measures spread by looking at how far the observations are from their mean. The formula for the sample standard deviation is $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$.
Interquartile range (IQR): the distance between the first quartile (Q1) and the third quartile (Q3). IQR = Q3 - Q1.
Q1: 25% of the observations are less than Q1 and 75% are greater than Q1.
Q3: 75% of the observations are less than Q3 and 25% are greater than Q3.
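As a rough sketch of the range and standard deviation in Python: the function below assumes the usual sample standard deviation with an n - 1 divisor, and the sample values are hypothetical. The result is checked against `statistics.stdev` from the standard library.

```python
from math import sqrt
import statistics


def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)


def sample_std(values):
    """Sample standard deviation, assuming the n - 1 divisor."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return sqrt(squared_deviations / (n - 1))


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

print(value_range(sample))          # 7
print(round(sample_std(sample), 3))
# The hand-written version agrees with the standard library:
print(abs(sample_std(sample) - statistics.stdev(sample)) < 1e-12)  # True
```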
12 Numeric Variable Distributions
Example 1.5 on page 11 of the book shows how much 50 consecutive shoppers spent in a store. The data appear as follows:
$3.11    $18.30   $24.50   $36.30   $50.30
$8.88    $18.40   $25.10   $38.60   $52.70
$9.26    $19.20   $26.20   $39.10   $54.80
$10.80   $19.50            $41.00   $59.00
$12.60            $27.60   $42.90   $61.20
$13.70   $20.10   $28.00   $44.00   $70.30
$15.20   $20.50            $44.60   $82.70
$15.60   $22.20   $28.30   $45.40   $85.70
$17.00   $23.00   $32.00   $46.60   $86.30
$17.30   $24.40   $34.90   $48.60   $93.30
13 Numerical Variables
How can we describe the distribution of these 50 numbers?
Numerically:
Center: mean or median
Spread: quartiles, range, IQR, or standard deviation
Graphically: frequency table, histogram, boxplot, stem and leaf plot, normal quantile plot
14 Descriptive statistics The descriptives box from SPSS gives the mean, median, variance, standard deviation, minimum, maximum, range, and IQR.
15 Percentiles
The 50th percentile is also called the median: the middle data value when the data are ordered from smallest to largest.
The 25th and 75th percentiles are also called the quartiles, Q1 and Q3 respectively: the middle data value of each half.
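The "middle value of each half" description can be sketched as follows. This assumes one common convention: when the number of observations is odd, the overall median is left out of both halves. Statistical software often uses other quartile rules, so results may differ slightly. The sample lists are hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-each-half convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd count, skip the middle value so it belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))  # (2.5, 4.5, 6.5)
print(quartiles([1, 2, 3, 4, 5, 6, 7]))     # (2, 4, 6)
```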
16 Frequency Table
Notice that the amount spent is broken into categories or groups. Recall that frequency tables can be used for categorical variables as well.
Category    Count or Frequency    Percent
0 - 10      3                     6.00%
            12                    24.00%
            13                    26.00%
            5                     10.00%
            7                     14.00%
            4                     8.00%
            1                     2.00%
17 Histogram
Breaks the range of values of a variable into intervals (the midpoint is displayed here).
Displays only the count or percent of the observations that fall into each interval.
18 Box Plot
Minimum, Q1, Median, Q3, and Maximum
These five numbers are called the five-number summary.
What are these points?
19 Stem and Leaf Plot
Works best for smaller data sets.
Example 1.4 on pg 10: Here are the numbers of home runs that Babe Ruth hit in each of his 15 years with the New York Yankees from :
54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22
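A stem and leaf plot for the Babe Ruth data can be built by splitting each value into a tens digit (the stem) and a ones digit (the leaf). The sketch below assumes non-negative whole numbers below 100; the function name `stem_and_leaf` is only illustrative.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return one text row per stem: tens digit, a bar, then the sorted ones digits."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    rows = []
    # Include every stem between the smallest and largest, even if it has no leaves.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} | {leaves}")
    return rows


ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
for row in stem_and_leaf(ruth):
    print(row)
# 2 | 25
# 3 | 45
# 4 | 1166679
# 5 | 449
# 6 | 0
```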
20 Normal Quantile Plot
This plot compares the distribution of the sample to the normal distribution. The straight line represents a normal distribution; compare the dots to the line. If the dots fall close to the line, then the data are consistent with a normal distribution.
21 Describing Numeric Variable Distributions
Now, we examine the appearance of other data:
Modes are major peaks in the distribution.
One histogram has two modes (bimodal); the other has one mode (unimodal).
22 Describing Numeric Variable Distributions
Now, we examine the appearance of other data:
This example is called right skewed since the distribution has a long right tail.
This is an example of a boxplot that is skewed to the _______.
23 Describing Numeric Variable Distributions
Outliers: observations that are unusually far from the bulk of the data.
What are some possible explanations for outliers?
The data point was recorded wrong.
The data point wasn’t actually a member of the population we were trying to sample.
We just happened to get an extreme value in our sample.
The 1.5 x IQR Criterion for Outliers: designate an observation a suspected outlier if it falls more than 1.5 x IQR below the first quartile or above the third quartile.
24 1.5*IQR Criterion Example
Suppose you had the following data set: -2, 15, 3, 7, 10, 21, 1, 5, 12, 8, 1, 35, 10
List the data from smallest to largest:
Find Q1, Median, Q3, Min, and Max:
IQR = Q3 - Q1 = ______
1.5*IQR = _______
Q1 - 1.5*IQR = ________ (any value less than this number is a suspected outlier)
Q3 + 1.5*IQR = ________ (any value more than this number is a suspected outlier)
Are there any outliers in this data set?
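One way to check work on this exercise is the short Python sketch below. It assumes the quartiles are the medians of the lower and upper halves, with the overall median excluded from both halves when the count is odd; other quartile conventions can move the fences slightly. The function names are illustrative.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-each-half convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


def suspected_outliers(values):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


data = [-2, 15, 3, 7, 10, 21, 1, 5, 12, 8, 1, 35, 10]
print(five_number_summary(data))  # (-2, 2.0, 8, 13.5, 35)
print(suspected_outliers(data))   # [35]
```

Under this convention the IQR is 11.5, so the fences sit at -15.25 and 30.75, and only the value 35 is flagged.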
25 Describing Numeric Variable Distributions Symmetry versus Skewness:______________________________
26 Mean versus Median
For a skewed distribution, the mean is farther out in the longer tail than is the median.
Left skewed: mean < median; describe the distribution with the median and IQR.
Symmetric: mean = median; describe the distribution with the mean and standard deviation.
Right skewed: mean > median; describe the distribution with the median and IQR.
27 Strategy for Exploring Data on a Single Quantitative Variable
Always plot your data: make a graph, usually a stem and leaf plot or histogram.
Look for the overall pattern and for outliers.
Calculate an appropriate numerical summary to briefly describe center and spread.
Sometimes the overall pattern of a large number of observations is so regular that it can be described by a smooth curve.
28 Introducing the Normal Distribution
It is customary to describe a normal distribution in the following way:
Properties of the Normal Distribution:
Symmetric, bell-shaped
Mean μ and standard deviation σ
Area under the curve is 1
29 The Normal Distribution
Normal distributions can take on many different means and standard deviations. Only the general bell shape must remain the same.
Here are some examples of normal distributions:
μ = -2, μ = 0, μ = 3
σ = 0.5, σ = 1, σ = 2
30 Distribution Properties
Introducing: The Standard Normal Distribution
Properties:
1. _________________
2. _________________
3. _________________
31 Distribution Properties
Empirical Rule (The 68-95-99.7 Rule): If the distribution is normal, then
Approximately 68% of the data fall within one standard deviation of the mean.
Approximately 95% of the data fall within two standard deviations of the mean.
Approximately 99.7% of the data fall within three standard deviations of the mean.
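The three percentages in the empirical rule can be checked numerically. A minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `share_within` is only illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Fraction of a normal distribution within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

Because any normal distribution is a shifted and rescaled standard normal, the same fractions hold for every mean and standard deviation.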
34 Empirical Rule Example If the grades on an exam are normally distributed with a mean of 68 and a variance of 16, what grade do you have to make to be in the top 15% of the class?
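One way to approach this example: a variance of 16 means a standard deviation of 4. About 68% of grades fall within one standard deviation of the mean, so about 32% fall outside and roughly 16% lie above mean + 1 standard deviation. That is close to the top 15%. The Python sketch below compares this rule-of-thumb cutoff with the exact 85th percentile; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mean_grade = 68
sd_grade = sqrt(16)  # the variance is 16, so the standard deviation is 4

# Empirical rule: about 16% of grades lie above one standard deviation over the mean.
rule_cutoff = mean_grade + sd_grade

# Exact cutoff for the top 15%: the 85th percentile of the normal distribution.
exact_cutoff = NormalDist(mean_grade, sd_grade).inv_cdf(0.85)

print(rule_cutoff)             # 72.0
print(round(exact_cutoff, 2))  # 72.15
```

The empirical rule therefore puts the cutoff at a grade of about 72.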
35 Distribution Properties
Shift Changes: adding a number to, or subtracting a number from, each of the values.
[Figure: curves centered at mean, mean + c, and mean - c]
36 Distribution Properties
The mean, median, Q1, Q3, minimum, and maximum all shift when there is a shift change. The shift change, say c, is added to or subtracted from each of these statistics accordingly.
The measures of spread (standard deviation, variance, IQR, and range) do not change when there is a shift change.
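The effect of a shift change can be seen on a small data set. In this Python sketch the weights are hypothetical and the shift c = 10 is chosen only for illustration.

```python
from statistics import mean, median, stdev

weights = [170, 200, 240, 280, 350]   # hypothetical data
c = 10                                # the shift change
shifted = [w + c for w in weights]

# Measures of location move by exactly c.
print(mean(weights), mean(shifted))      # 248 258
print(median(weights), median(shifted))  # 240 250

# Measures of spread are unchanged.
print(max(weights) - min(weights), max(shifted) - min(shifted))  # 180 180
print(abs(stdev(weights) - stdev(shifted)) < 1e-9)               # True
```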
37 Distribution Properties
Scale Changes: multiplying or dividing each of the values by a number.
[Figure label: mean]
38 Distribution Properties
Scale Changes: multiplying or dividing each of the values by a number.
[Figure label: mean*c]
39 Distribution Properties
Scale Changes: multiplying or dividing each of the values by a number.
[Figure label: mean/c]
40 Distribution Properties
The mean, median, Q1, Q3, minimum, and maximum all change when there is a scale change unless they are zero. Each is multiplied or divided by the scale change c.
The measures of spread (standard deviation, variance, IQR, and range) always change when there is a scale change. The standard deviation, IQR, and range are multiplied or divided by the scale change c. The variance is multiplied or divided by c².
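A quick numerical check of the scale-change rules, sketched in Python. The data values and the scale factor c = 3 are hypothetical.

```python
from statistics import mean, stdev, variance

values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
c = 3                               # the scale change
scaled = [c * v for v in values]

# The mean is multiplied by c.
print(mean(values), mean(scaled))   # 5 15

# The standard deviation is multiplied by c, the variance by c squared.
print(abs(stdev(scaled) - c * stdev(values)) < 1e-9)           # True
print(abs(variance(scaled) - c ** 2 * variance(values)) < 1e-9)  # True
```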
41 Shift Change Example
Suppose we measure the weight of everyone on a football team and obtain the following statistics for a team report:
Mean: 230 lbs.       Median: 240 lbs.
Std. Dev.: 50 lbs.   Variance: 2500 sq. lbs.
Q1: 200 lbs.         Q3: 280 lbs.         IQR: 80 lbs.
Min.: 170 lbs.       Max.: 350 lbs.       Range: 180 lbs.
42 Shift Change Example
Now suppose we found out that the scale read 10 lbs. under, so we need to add 10 lbs. to every weight. What would happen to each of the following statistics?
Original                   After Shift Change
Mean: 230 lbs.             Mean: ________
Median: 240 lbs.           Median: ________
s: 50 lbs.                 s: ________
Q1: 200 lbs.               Q1: ________
Q3: 280 lbs.               Q3: ________
43 Shift Change Example
Now suppose we found out that the scale read 10 lbs. under, so we need to add 10 lbs. to every weight. What would happen to each of the following statistics?
Original                   After Shift Change
Variance: 2500 sq. lbs.    Variance: ________
IQR: 80 lbs.               IQR: ________
Min: 170 lbs.              Min: ________
Max: 350 lbs.              Max: ________
Range: 180 lbs.            Range: ________
44 Shift and Scale Change Example
Further, suppose we found out that we are supposed to report the weights and statistics in kilograms, not lbs. (Remember, 1 lb ≈ 0.45 kilograms.) What would happen to each of the following statistics?
After Shift Change         After Shift and Scale Change
Mean: 240 lbs.             Mean: ______________
Median: 250 lbs.           Median: ______________
s: 50 lbs.                 s: ______________
Q1: 210 lbs.               Q1: ______________
Q3: 290 lbs.               Q3: ______________
45 Shift and Scale Change Example
Further, suppose we found out that we are supposed to report the weights and statistics in kilograms, not lbs. (Remember, 1 lb ≈ 0.45 kilograms.) What would happen to each of the following statistics?
After Shift Change         After Shift and Scale Change
Variance: 2500 sq. lbs.    Variance: ______________
IQR: 80 lbs.               IQR: ______________
Min: 180 lbs.              Min: ______________
Max: 360 lbs.              Max: ______________
Range: 180 lbs.            Range: ______________
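The shift and scale rules can be applied to the team's summary statistics without the raw weights. The Python sketch below is one possible way to organize that. The function name, the dictionary keys, and the assumption of a positive scale factor are all illustrative, and the conversion factor is passed in as a parameter (here the standard 0.45359237 kg per pound).

```python
LOCATION = {"mean", "median", "q1", "q3", "min", "max"}
SPREAD = {"sd", "iqr", "range"}


def shift_then_scale(stats, shift, scale):
    """Apply x_new = (x + shift) * scale to summary statistics (assumes scale > 0)."""
    out = {}
    for name, value in stats.items():
        if name in LOCATION:
            out[name] = (value + shift) * scale   # shifted, then scaled
        elif name in SPREAD:
            out[name] = value * scale             # unaffected by the shift
        elif name == "variance":
            out[name] = value * scale ** 2        # scaled by the factor squared
        else:
            raise KeyError(f"unknown statistic: {name}")
    return out


team_lbs = {"mean": 230, "median": 240, "sd": 50, "q1": 200, "q3": 280,
            "variance": 2500, "iqr": 80, "min": 170, "max": 350, "range": 180}

LB_TO_KG = 0.45359237  # kilograms per pound

after_shift = shift_then_scale(team_lbs, shift=10, scale=1)
in_kg = shift_then_scale(team_lbs, shift=10, scale=LB_TO_KG)

print(after_shift["mean"], after_shift["sd"], after_shift["max"])  # 240 50 360
print(round(in_kg["mean"], 1), round(in_kg["sd"], 1))
```

Keeping the variance in its own branch makes the squared factor explicit, which is the step that is easiest to get wrong by hand.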
46 Linear Transformations
If you are given a mean, $\bar{x}$ (or $\mu$), and a standard deviation, $s$ (or $\sigma$), and want to convert your data so that you have a new mean, $\bar{x}_{new}$ (or $\mu_{new}$), and a new standard deviation, $s_{new}$ (or $\sigma_{new}$), all you need to remember is what shift and scale changes affect.
In our linear transformation formula, $x_{new} = a + bx$:
a is the shift change
b is the scale change
Standard deviations are affected only by scale changes, but means are affected by both shift and scale changes.
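47 Linear Transformation Example
For example, suppose $\bar{x} = 12$ and $s = 7$, but we want $\bar{x}_{new} = 25$ and $s_{new} = 10$.
$s_{new} = \text{scale} \cdot s$
$10 = \text{scale} \cdot 7$
$\text{scale} = 10/7 \approx 1.43$
Substituting in:
$\bar{x}_{new} = \text{shift} + \text{scale} \cdot \bar{x}$
$25 = \text{shift} + 1.43 \cdot 12$
$\text{shift} = 25 - 1.43 \cdot 12 = 7.84$
So our linear transformation equation is $x_{new} = 7.84 + 1.43x$.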
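The same two steps (scale from the standard deviations, then shift from the means) can be sketched as a small Python function. The function name is illustrative, and the sketch assumes the old standard deviation is not zero.

```python
def linear_transform_coefficients(mean_old, sd_old, mean_new, sd_new):
    """Return (shift, scale) so that x_new = shift + scale * x has the new mean and sd."""
    scale = sd_new / sd_old                # only the scale change affects the sd
    shift = mean_new - scale * mean_old    # the mean absorbs both changes
    return shift, scale


shift, scale = linear_transform_coefficients(12, 7, 25, 10)
print(round(scale, 2), round(shift, 2))  # 1.43 7.86

# Check: transforming the old mean gives the new mean.
print(round(shift + scale * 12, 6))      # 25.0
```

The slide rounds the scale to 1.43 before computing the shift, which gives 7.84; carrying full precision gives about 7.86. Either pair reproduces the target mean of 25 to within rounding.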