Presentation on theme: "F IND THE S TANDARD DEVIATION FOR THE FOLLOWING D ATA OF GPA: 4, 3, 2, 3.5, 3 1. 3.1 2. 1.48 3. 2.2 4. 0.74 5. 0.44."— Presentation transcript:
F IND THE S TANDARD DEVIATION FOR THE FOLLOWING D ATA OF GPA: 4, 3, 2, 3.5, 3 1. 3.1 2. 1.48 3. 2.2 4. 0.74 5. 0.44
U PCOMING I N C LASS Homework #2 due Monday at 5pm Quiz #1 in class Jan. 26 th (open book) Part 1 of the Data Project due Jan. 31 st Slide 4- 2
C HAPTER 5 Understanding and Comparing Distributions
W HY WE NEED TO UNDERSTAND AND COMPARE DISTRIBUTIONS ? In order to examine your model, you need to know what your data looks like. It is a connection between your data and statistical results. Understanding the distributions provides us the preliminary descriptive data information and help you get a sense of models for further explanations.
T HE B IG P ICTURE We can answer much more interesting questions about variables when we compare distributions for different groups. Below is a histogram of the Average Wind Speed for every day in 1989.
T HE F IVE -N UMBER S UMMARY The five-number summary of a distribution reports its median, quartiles, and extremes (maximum and minimum). Example: The five- number summary for the daily wind speed is: Max8.67 Q32.93 Median1.90 Q11.15 Min0.20
C OMPARING G ROUPS It is always more interesting to compare groups. With histograms, note the shapes, centers, and spreads of the two distributions. What does this graphical display tell you?
D AILY W IND S PEED : M AKING B OXPLOTS A boxplot is a graphical display of the five- number summary. Boxplots are particularly useful when comparing groups. Slide 1- 8
M EN VS W OMEN ECO 138 Men Women Min0.059 Q10.895 Q2 - Median0.962 Q31 Max1 Min0.775 Q10.993 Q2 - Median1 Q31 Max1
M EN VS W OMEN S TARTING S ALARIES Men Women Min18,000 Q125,000 Q2 - Median45,000 Q365,000 Max70,000 Min18,000 Q135,000 Q2 - Median42,000 Q345,000 Max50,000
C ONSTRUCTING B OXPLOTS 1.Draw a single vertical axis spanning the range of the data. Draw short horizontal lines at the lower and upper quartiles and at the median. Then connect them with vertical lines to form a box.
C ONSTRUCTING B OXPLOTS ( CONT.) Slide 1- 12 2.Erect “fences” around the main part of the data. The upper fence is 1.5 IQRs above the upper quartile. Q3 + 1.5*IQR The lower fence is 1.5 IQRs below the lower quartile. Q1 - 1.5*IQR Note: the fences only help with constructing the boxplot and should not appear in the final display.
C ONSTRUCTING B OXPLOTS ( CONT.) Slide 1- 13 3.Use the fences to grow “whiskers.” Draw lines from the ends of the box up and down to the most extreme data values found within the fences. If a data value falls outside one of the fences, we do not connect it with a whisker.
C ONSTRUCTING B OXPLOTS ( CONT.) Slide 1- 14 4.Add the outliers by displaying any data values beyond the fences with special symbols. We often use a different symbol for “far outliers” that are farther than 3 IQRs from the quartiles.
W IND S PEED : M AKING B OXPLOTS ( CONT.) Compare the histogram and boxplot for daily wind speeds: How does each display represent the distribution? Slide 1- 15
C OMPARING G ROUPS ( CONT ) Boxplots offer an ideal balance of information and simplicity, hiding the details while displaying the overall summary information. We often plot them side by side for groups or categories we wish to compare. What do these boxplots tell you? Slide 1- 16
W HAT A BOUT O UTLIERS ? If there are any clear outliers and you are reporting the mean and standard deviation, report them with the outliers present and with the outliers removed. The differences may be quite revealing. Note: The median and IQR are not likely to be affected by the outliers.
W HAT C AN G O W RONG ? ( CONT.) Beware of outliers Be careful when comparing groups that have very different spreads. Consider these side-by- side boxplots of cotinine levels:
A CLASS OF FOURTH GRADERS TAKES A DIAGNOSTIC READING TEST, AND THE SCORES ARE REPORTED BY READING GRADE LEVEL. T HE 5- NUMBER SUMMARY FOR THE BOYS AND GIRLS ARE SHOWN BELOW. Girls Min: 2.5 Q1: 3.7 Q2: 4.3 Q3: 4.7 Max: 5.8 Boys Min: 2.7 Q1: 4.1 Q2: 4.5 Q3: 4.9 Max: 5.9
W HICH GROUP HAD THE HIGHEST SCORE Slide 1- 20 1. Girls 2. Boys
W HICH GROUP HAD THE GREATEST RANGE Slide 1- 21 1. Girls 2. Boys 3. They are the same
W HICH GROUP HAD THE GREATEST IQR Slide 1- 22 1. Girls 2. Boys 3. They are the same
W HICH GROUP ’ S SCORES APPEAR MORE SKEWED ? Slide 1- 23 1. The boy’s scores are more skewed. The quartiles are the same distance from the mean. 2. The girl’s scores are more skewed. The quartiles are not the same distance from the median. 3. The boy’s scores are more skewed. The quartiles are not the same distance from the median. 4. The girl’s scores are more skewed. The quartiles are the same distance from the median.
W HICH GROUP GENERALLY DID BETTER ON THE TEST ? Slide 1- 24 1. Girls did better b/c the mean for girls was higher. 2. Girls did better b/c the median for girls was higher. 3. Boys did better b/c the mean for boys was higher. 4. Boys did better b/c the median for boys was higher.
T IMEPLOTS : O RDER, P LEASE ! For some data sets, we are interested in how the data behave over time. In these cases, we construct timeplots of the data.
W HAT C AN G O W RONG ? ( CONT.) Slide 1- 26 Avoid inconsistent scales, either within the display or when comparing two displays. Label clearly so a reader knows what the plot displays. Good intentions, bad plot:
*R E - EXPRESSING S KEWED D ATA TO I MPROVE S YMMETRY Slide 1- 27
T RANSFORMING D ATA y=Log(x) To get original data back x=10^y =10 y y=Sqrt(x) To get original data back x=y^2 = y*y Slide 1- 28
*R E - EXPRESSING S KEWED D ATA TO I MPROVE S YMMETRY ( CONT.) One way to make a skewed distribution more symmetric is to re-express or transform the data by applying a simple function (e.g., logarithmic function). Note the change in skewness from the raw data (previous slide) to the transformed data (right):
N EXT T IME Chapter 6 How we use the Standard Deviation to make comparisons….