Seminar 15 | Tuesday, October 18, 2007 | Aliaksei Smalianchuk.


1 Seminar 15 | Tuesday, October 18, 2007 | Aliaksei Smalianchuk

2 Means and Variances  What happens to means and variances when data is manipulated?  Let’s check by manipulating data from the survey.

3 Data  Height in inches (HT)  Shoe size (Shoe)  Age (Age)  Additional columns: height with a 1-inch heel (HeightPlus1); height in centimeters, approximated as 2.5 times height in inches (2.5TimesHeight); sum of height and shoe size (HeightPlusShoe); sum of height and age (HeightPlusAge)

4 Statistics
Variable          N     Mean      StDev
HT                444   66.928    3.938
Shoe              445   9.1056    1.9484
Age               444   20.371    2.912
HeightPlus1       444   67.928    3.938
2.5TimesHeight    444   167.32    9.84
HeightPlusShoe    444   76.035    5.693
HeightPlusAge     444   87.299    4.913

5 Observation 1  The mean of heights with a 1-inch heel (HeightPlus1, 67.928) is one inch larger than the mean of heights (HT, 66.928)

6 Why?  If the same constant is added to every data value, the mean increases by that same constant.
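
A quick numerical check of this rule, as a minimal sketch. The five heights below are made-up values, not the survey data:

```python
from statistics import mean

# Hypothetical heights in inches (illustrative, not the survey data).
heights = [62.0, 65.5, 67.0, 70.0, 71.5]

# Add a 1-inch heel to every value.
heights_plus_1 = [h + 1 for h in heights]

# The mean moves by the constant that was added (up to floating-point rounding).
shift_in_mean = mean(heights_plus_1) - mean(heights)
print(shift_in_mean)
```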

7 Observation 2  The standard deviation of heights with a 1-inch heel (HeightPlus1, 3.938) equals the standard deviation of heights (HT, 3.938)

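8 Why?  Standard deviation measures spread around the mean. Adding a constant shifts every value and the mean by the same amount, so the deviations from the mean, and the shape of the distribution, do not change.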
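
The same kind of check works for the standard deviation. This is a sketch on made-up heights, not the survey data:

```python
from statistics import stdev

# Hypothetical heights in inches (illustrative, not the survey data).
heights = [62.0, 65.5, 67.0, 70.0, 71.5]
heights_plus_1 = [h + 1 for h in heights]

# Each value and the mean both move by 1, so every deviation from the
# mean is unchanged and the standard deviation stays the same.
sd_original = stdev(heights)
sd_shifted = stdev(heights_plus_1)
print(sd_original, sd_shifted)
```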

9 Observation 3  The standard deviation of heights in centimeters (2.5TimesHeight, 9.84) is 2.5 times the standard deviation of heights in inches (HT, 3.938)

10 Why?  Multiplying every data value by a constant stretches the spread of the histogram by that same factor, so every statistic that depends on the spread (such as the standard deviation) is multiplied by it too.
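
A small sketch of the scaling rule, again on made-up heights. The factor 2.5 is the slides' rounded inch-to-centimeter conversion:

```python
from statistics import mean, stdev

# Hypothetical heights in inches (illustrative, not the survey data).
heights_in = [62.0, 65.5, 67.0, 70.0, 71.5]

FACTOR = 2.5  # rounded conversion factor used in the slides
heights_cm = [FACTOR * h for h in heights_in]

# Both the mean and the standard deviation are multiplied by the factor.
mean_ratio = mean(heights_cm) / mean(heights_in)
sd_ratio = stdev(heights_cm) / stdev(heights_in)
print(mean_ratio, sd_ratio)
```

For a negative factor the mean would change sign, while the standard deviation would be multiplied by the factor's absolute value.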

11 Observation 4  Mean of HeightPlusShoe = Mean of Height + Mean of Shoe (76.035 ≈ 66.928 + 9.1056)

12 Observation 5  Mean of HeightPlusAge = Mean of Height + Mean of Age (87.299 = 66.928 + 20.371)

13 Why? Since the mean of a sum is the sum of the means.
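
One way to write out the reasoning, as a sketch in assumed notation: x_i is a person's height, y_i is the same person's shoe size (or age), and z_i = x_i + y_i is their sum, for n people.

```latex
\bar{z} = \frac{1}{n}\sum_{i=1}^{n} (x_i + y_i)
        = \frac{1}{n}\sum_{i=1}^{n} x_i + \frac{1}{n}\sum_{i=1}^{n} y_i
        = \bar{x} + \bar{y}
```

This step only rearranges a sum, so it holds whether or not the two variables are related.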

14 Variances  Variance = σ² (the square of the standard deviation)  Variances apply to a probability distribution  Variance is a way to capture the degree of spread of a distribution

15 Variances
Variable          Variance
HT                15.50784
Shoe              3.796263
Age               8.479744
HeightPlusShoe    32.41025
HeightPlusAge     24.13757
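
These variances are the squares of the StDev values in the statistics table. A short sketch of that arithmetic, with the standard deviations typed in from the slides:

```python
# StDev values copied from the statistics table in the slides.
st_devs = {
    "HT": 3.938,
    "Shoe": 1.9484,
    "Age": 2.912,
    "HeightPlusShoe": 5.693,
    "HeightPlusAge": 4.913,
}

# Variance is the standard deviation squared.
variances = {name: sd ** 2 for name, sd in st_devs.items()}

for name, var in variances.items():
    print(f"{name}: {var:.5f}")
```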

16 Dependence  Are shoe sizes and heights dependent?  Are age and height dependent?  Let’s check using scatter plots

17 Height vs. Shoe Size

18 Height vs. Age

19 Back to variances  Variance of HeightPlusShoe (32.41) is much greater than Var(Height) + Var(Shoe) (15.51 + 3.80 = 19.30)  Variance of HeightPlusAge (24.14) is very close to Var(Height) + Var(Age) (15.51 + 8.48 = 23.99)
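
The comparison can be reproduced from the variance table. The sketch below computes the gap between the variance of each sum and the sum of the separate variances. By the standard identity Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y), that gap is twice the covariance, a term the slides do not name.

```python
# Variances copied from the slides' variance table.
var_ht = 15.50784
var_shoe = 3.796263
var_age = 8.479744
var_ht_plus_shoe = 32.41025
var_ht_plus_age = 24.13757

# Gap between the variance of the sum and the sum of the variances.
# Under Var(X+Y) = Var(X) + Var(Y) + 2*Cov(X, Y), this gap is 2*Cov(X, Y).
gap_shoe = var_ht_plus_shoe - (var_ht + var_shoe)
gap_age = var_ht_plus_age - (var_ht + var_age)

print(f"Height + Shoe gap: {gap_shoe:.3f}")
print(f"Height + Age gap:  {gap_age:.3f}")
```

The height and shoe gap is large compared with the variances themselves, while the height and age gap is close to zero.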

20 Why?  Can you see a difference between the two relationships, Height vs. Shoe Size and Height vs. Age?

21 Dependence  Adding two positively dependent variables produces extremes: small values are added to correspondingly small values, and large values to correspondingly large values  This makes the variance of the sum much larger.

22 Dependence  With independent variables, values do not pair up by relative size (a large value is as likely to be added to a small value as to a large one)  The spread of the sum then stays close to what adding the separate variances predicts
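
The contrast between the two cases can be sketched with simulated data. Everything here is an assumption made for illustration: the normal distributions, their parameters, and the way the dependent partner is built from x. None of it comes from the survey.

```python
import random
from statistics import variance

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical "height-like" values.
x = [rng.gauss(67, 4) for _ in range(n)]

# Dependent partner: built from x plus a little noise,
# so large values of x go with large values of y_dep.
y_dep = [0.5 * xi + rng.gauss(0, 0.5) for xi in x]

# Independent partner: drawn without looking at x at all.
y_ind = [rng.gauss(20, 3) for _ in range(n)]

def ratio(a, b):
    """Variance of the element-wise sum divided by the sum of the variances."""
    total = [ai + bi for ai, bi in zip(a, b)]
    return variance(total) / (variance(a) + variance(b))

ratio_dep = ratio(x, y_dep)
ratio_ind = ratio(x, y_ind)
print(round(ratio_dep, 2), round(ratio_ind, 2))
```

A ratio well above 1 means the sum is more spread out than the separate variances suggest. A ratio near 1 means the variances simply add.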

23 Variance of sample mean  Mean = (X_1 + X_2 + … + X_n)/n  For independent X_1, X_2, …, X_n: Variance[(X_1 + X_2 + … + X_n)/n] = (Variance[X_1] + Variance[X_2] + … + Variance[X_n])/n^2, because dividing every value by n divides the variance by n^2
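
For independent draws that share a common variance σ², the sum of the n variances is nσ², and dividing by n² leaves σ²/n. The sketch below checks this exactly on a hypothetical population, one fair die, by listing every possible sample of size 2 drawn with replacement. The helper `pvar` is defined here for illustration.

```python
from fractions import Fraction
from itertools import product

def pvar(values):
    """Population variance, computed exactly with fractions."""
    values = [Fraction(v) for v in values]
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

population = [1, 2, 3, 4, 5, 6]  # hypothetical population: one fair die
n = 2                            # sample size

# Every equally likely ordered sample drawn WITH replacement,
# which makes the draws independent.
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]

var_of_mean = pvar(sample_means)
formula = pvar(population) / n   # sigma^2 / n
print(var_of_mean, formula)
```

Both numbers agree, which is what the formula predicts when the draws are independent.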

24 Dependence?  Would this work for dependent values of X_1, X_2, …, X_n?  Would the variance produced by this formula be larger or smaller than the actual variance?  Sampling without replacement: would the variance formula hold true? Why?

25 Dependence  Adding the variances of positively dependent values produces a result smaller than the actual variance of the sum, because adding dependent data sets produces extremes that widen the spread  Sampling without replacement on smaller populations (n < 10) will produce dependence
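
Sampling without replacement can be checked exactly on a tiny hypothetical population (one set of die faces, sample size 2). In this case the dependence is negative: a value that has been drawn cannot be drawn again, so the extremes are harder to reach. The independent-draws formula σ²/n therefore comes out larger than the actual variance of the sample mean. That is the opposite direction from positively dependent data such as height and shoe size. The helper `pvar` is defined here for illustration.

```python
from fractions import Fraction
from itertools import permutations

def pvar(values):
    """Population variance, computed exactly with fractions."""
    values = [Fraction(v) for v in values]
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

population = [1, 2, 3, 4, 5, 6]  # hypothetical small population
n = 2                            # sample size

# Every equally likely ordered sample drawn WITHOUT replacement:
# the same value can never appear twice in one sample.
sample_means = [Fraction(sum(s), n) for s in permutations(population, n)]

actual = pvar(sample_means)
independent_formula = pvar(population) / n  # sigma^2 / n, assumes independence
print(actual, independent_formula)
```

With a population much larger than the sample, the two values would be nearly equal, which is why the dependence matters mainly for small populations.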

26 The End

27 Extra Credit (Dr. Pfenning)  Use Minitab Calculator to create column “Birthyear”  Plot Earned vs. Birthyear and note the relationship  Create column “EarnedPlusBirthyear”  Find the standard deviations of Earned, Birthyear, and EarnedPlusBirthyear, then square them to get variances  Compare the variances  Explain the results

