Published by Alberta Bridges. Modified over 8 years ago.
1
Seminar 15 | Tuesday, October 18, 2007 | Aliaksei Smalianchuk
2
Means and Variances
What happens to means and variances when data is manipulated? Let’s check by manipulating data from the survey.
3
Data
Height in inches (HT)
Shoe size (Shoe)
Age (Age)
Additional columns:
  Height with a 1 inch heel (HeightPlus1)
  Height in centimeters (2.5TimesHeight)
  Sum of height and shoe size (HeightPlusShoe)
  Sum of height and age (HeightPlusAge)
4
Statistics
Variable          N     Mean     StDev
HT                444   66.928   3.938
Shoe              445   9.1056   1.9484
Age               444   20.371   2.912
HeightPlus1       444   67.928   3.938
2.5TimesHeight    444   167.32   9.84
HeightPlusShoe    444   76.035   5.693
HeightPlusAge     444   87.299   4.913
5
Observation 1
The mean of heights with a 1 inch heel (67.928) is one inch larger than the mean of heights (66.928)
6
Why? If the same constant is added to every data value, the mean increases by that same constant.
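A short derivation may make this concrete. It is a sketch in standard notation that does not appear on the slides: $x_i$ are the original heights, $c$ is the added constant (here 1 inch), and $\bar{x}$ is the original mean.

```latex
\bar{y}
  = \frac{1}{n}\sum_{i=1}^{n} (x_i + c)
  = \frac{1}{n}\sum_{i=1}^{n} x_i + \frac{n c}{n}
  = \bar{x} + c
```

With $\bar{x} = 66.928$ and $c = 1$ this gives 67.928, the HeightPlus1 mean in the table.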
7
Observation 2
The standard deviation of heights with a 1 inch heel (3.938) equals the standard deviation of heights (3.938)
8
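Why? Standard deviation measures spread around the mean. Adding a constant shifts every value and the mean by the same amount, so the deviations from the mean, and the shape of the distribution, don’t change.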
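This can be checked on a small data set. The sketch below uses five made-up heights (not the survey data, which is not included here) and Python's standard `statistics` module instead of Minitab.

```python
from statistics import mean, stdev

# Hypothetical heights in inches; NOT the survey data from the slides.
heights = [62.0, 65.0, 67.0, 70.0, 73.0]

# Add a 1 inch heel to every height.
with_heel = [h + 1 for h in heights]

mean_shift = mean(with_heel) - mean(heights)  # the mean moves by the constant
sd_change = stdev(with_heel) - stdev(heights)  # the spread does not move

print("shift in mean:", mean_shift)
print("change in standard deviation:", sd_change)
```

Every deviation from the mean is the same before and after the shift, so the standard deviation cannot change.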
9
Observation 3
The standard deviation of heights in centimeters (9.84) is 2.5 times the standard deviation of heights in inches (3.938)
10
Why? Multiplying all data values by a constant stretches the spread of the histogram by the same factor, so every property that depends on the spread (like standard deviation) is multiplied by that factor too.
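A quick check of the scaling rule, as a sketch: the five heights are made up for illustration, while the last two lines reuse the HT mean and standard deviation reported in the table.

```python
from statistics import mean, stdev

# Hypothetical heights in inches; NOT the survey data from the slides.
heights_in = [62.0, 65.0, 67.0, 70.0, 73.0]
factor = 2.5  # the slides' inch-to-centimeter multiplier

scaled = [factor * h for h in heights_in]

# Both the mean and the standard deviation are multiplied by the factor.
mean_ratio = mean(scaled) / mean(heights_in)
sd_ratio = stdev(scaled) / stdev(heights_in)
print("mean ratio:", mean_ratio, "sd ratio:", sd_ratio)

# The same rule applied to the values reported in the table for HT.
table_mean_scaled = factor * 66.928  # table lists 167.32 for 2.5TimesHeight
table_sd_scaled = factor * 3.938     # table lists 9.84 (rounded)
print(table_mean_scaled, table_sd_scaled)
```

The variance, being the square of the standard deviation, is multiplied by the square of the factor (here 6.25).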
11
Observation 4
Mean of HeightPlusShoe = Mean of Height + Mean of Shoe
12
Observation 5
Mean of HeightPlusAge = Mean of Height + Mean of Age
13
Why? Since
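One standard justification for Observations 4 and 5, offered as a sketch and not necessarily the formula shown on the original slide, is that a sum can be split term by term. Here $x_i$ and $y_i$ are the two values recorded for the same person (notation assumed, not from the slides).

```latex
\overline{x + y}
  = \frac{1}{n}\sum_{i=1}^{n} (x_i + y_i)
  = \frac{1}{n}\sum_{i=1}^{n} x_i + \frac{1}{n}\sum_{i=1}^{n} y_i
  = \bar{x} + \bar{y}
```

No assumption about how $x$ and $y$ are related is needed, which is why the rule holds for both shoe size and age: 66.928 + 20.371 = 87.299.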
14
Variances
Variance = $\sigma^2$
Variances apply to a probability distribution
Variance is a way to capture the degree of spread of a distribution
15
Variances
Variable          Variance
HT                15.50784
Shoe              3.796263
Age               8.479744
HeightPlusShoe    32.41025
HeightPlusAge     24.13757
16
Dependence
Are shoe sizes and heights dependent?
Are age and height dependent?
Let’s check using scatter plots
17
Height vs. Shoe Size
18
Height vs. Age
19
Back to variances
Variance of HeightPlusShoe (32.41) is much greater than Var(Height) + Var(Shoe) (15.51 + 3.80 = 19.30)
Variance of HeightPlusAge (24.14) is very close to Var(Height) + Var(Age) (15.51 + 8.48 = 23.99)
20
Why? Can you see a difference between the relationships (Height vs. Shoe Size) and (Height vs. Age)?
21
Dependence
Adding two positively dependent variables produces extremes: small values are added to correspondingly small values, and large values to correspondingly large values.
This makes the variance of the sum much larger than the sum of the variances.
22
Dependence
For independent variables, values do not necessarily correspond by relative size (large values can be added to small values).
The variance of the sum then stays close to the sum of the variances.
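The size of the gap can be quantified with the standard identity Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y), which the slides do not state explicitly. The sketch below solves it for the covariance using the variances from the table. The function names are illustrative, and the results are approximate because Shoe has N = 445 while the sums have N = 444.

```python
from math import sqrt

# Variances copied from the slides' table.
var_height = 15.50784
var_shoe = 3.796263
var_age = 8.479744
var_height_plus_shoe = 32.41025
var_height_plus_age = 24.13757


def implied_covariance(var_sum, var_x, var_y):
    """Solve Var(X + Y) = Var(X) + Var(Y) + 2*Cov(X, Y) for Cov(X, Y)."""
    return (var_sum - var_x - var_y) / 2


def implied_correlation(var_sum, var_x, var_y):
    """Covariance rescaled by the two standard deviations."""
    return implied_covariance(var_sum, var_x, var_y) / sqrt(var_x * var_y)


corr_height_shoe = implied_correlation(var_height_plus_shoe, var_height, var_shoe)
corr_height_age = implied_correlation(var_height_plus_age, var_height, var_age)

print("implied correlation, height vs. shoe size:", round(corr_height_shoe, 3))
print("implied correlation, height vs. age:", round(corr_height_age, 3))
```

A strong positive correlation for shoe size and a correlation near zero for age would match what the two scatter plots are meant to show.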
23
Variance of sample mean
Mean = $(X_1 + X_2 + \dots + X_n)/n$
For independent $X_1, X_2, \dots, X_n$:
$\mathrm{Variance}[(X_1 + X_2 + \dots + X_n)/n] = (\mathrm{Variance}[X_1] + \mathrm{Variance}[X_2] + \dots + \mathrm{Variance}[X_n])/n^2$
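The steps behind the variance of a sample mean can be sketched as follows. The derivation combines two rules: dividing by $n$ divides the variance by $n^2$, and variances of independent variables add. The final step additionally assumes every $X_i$ has the same variance $\sigma^2$, which the slide does not state.

```latex
\operatorname{Var}\!\left[\frac{X_1 + X_2 + \dots + X_n}{n}\right]
  = \frac{1}{n^2}\operatorname{Var}[X_1 + X_2 + \dots + X_n]
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}[X_i]
  = \frac{n\sigma^2}{n^2}
  = \frac{\sigma^2}{n}
```

Only the second equality uses independence; the first holds for any random variables.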
24
Dependence?
Would this work for dependent values of $X_1, X_2, \dots, X_n$?
Would the variance produced by this formula be larger or smaller than actual?
Sampling without replacement
Would the variance formula hold true? Why?
25
Dependence
For positively dependent values, adding the variances produces a result smaller than the actual variance, because adding dependent data sets produces extremes, which widens the spread.
Sampling without replacement on smaller populations (n < 10) will produce dependence.
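The effect of sampling without replacement can be computed exactly for a tiny population by listing every possible sample. This is a sketch under stated assumptions: the population `[1, 2, 3, 4, 5]` and the function name are made up for illustration. In this particular kind of dependence the draws are negatively related (a large first draw leaves smaller values behind), so the sample mean varies less than the independent-draws formula predicts.

```python
from itertools import permutations, product
from statistics import mean, pvariance


def sample_mean_variance(population, n, replace):
    """Exact variance of the sample mean over all equally likely ordered samples."""
    if replace:
        samples = product(population, repeat=n)  # independent draws
    else:
        samples = permutations(population, n)    # draws without replacement
    return pvariance([mean(s) for s in samples])


population = [1, 2, 3, 4, 5]  # hypothetical tiny population
n = 2

sigma2 = pvariance(population)  # population variance
formula = sigma2 / n            # what the independent-draws formula predicts
with_replacement = sample_mean_variance(population, n, replace=True)
without_replacement = sample_mean_variance(population, n, replace=False)

print("formula sigma^2 / n:", formula)
print("with replacement:", with_replacement)
print("without replacement:", without_replacement)
```

With replacement the enumeration agrees with the formula. Without replacement the result is smaller by the factor (N - n)/(N - 1), where N is the population size, and that factor approaches 1 as the population grows relative to the sample.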
26
The End
27
Extra Credit (Dr. Pfenning)
Use Minitab Calculator to create column “Birthyear”
Plot Earned vs. Birthyear, note relationship
Create column “EarnedPlusBirthyear”
Find sds of Earned, Birthyear, EarnedPlusBirthyear, square to variances
Compare variances
Explain results