Presentation on theme: "DESCRIBING DISTRIBUTION NUMERICALLY"— Presentation transcript:
1. DESCRIBING DISTRIBUTION NUMERICALLY
MEASURES OF CENTER:
MIDRANGE = (MAX + MIN) / 2
MEDIAN IS THE MIDDLE VALUE, WITH HALF OF THE DATA ABOVE AND HALF BELOW IT.
MEAN = (SUM OF DATA) / (NUMBER OF OBSERVATIONS n)
EXAMPLE:
DATA: 45, 46, 49, 35, 76, 80, 89, 94, 37, 61, 62, 64, 68, 56, 57, 57, 59, 71, 72.
SORTED DATA: 35, 37, 45, 46, 49, 56, 57, 57, 59, 61, 62, 64, 68, 71, 72, 76, 80, 89, 94.
MIDRANGE = (94 + 35) / 2 = 64.5
MEDIAN = 61
MEAN = (35 + 37 + … + 94) / 19 = 62
NOTE: FOR SKEWED DISTRIBUTIONS THE MEDIAN IS A BETTER MEASURE OF THE CENTER THAN THE MEAN.
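The three measures of center can be checked with a few lines of Python. This is only a sketch using the standard-library statistics module; the variable names are chosen here for illustration and are not part of the slides.

```python
from statistics import mean, median

# The 19 observations listed on the slide.
data = [45, 46, 49, 35, 76, 80, 89, 94, 37, 61,
        62, 64, 68, 56, 57, 57, 59, 71, 72]

midrange = (max(data) + min(data)) / 2  # (MAX + MIN) / 2
center_median = median(data)            # middle value of the sorted data
center_mean = mean(data)                # sum of data / number of observations

print(midrange, center_median, center_mean)
```

The results should agree with the slide's values of 64.5, 61, and 62.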
2. MEASURES OF THE SPREAD
RANGE = MAX – MIN
INTERQUARTILE RANGE (IQR) = Q3 – Q1
Q3 = UPPER QUARTILE = MEDIAN OF UPPER HALF OF DATA (INCLUDE MEDIAN IF n IS ODD)
Q1 = LOWER QUARTILE = MEDIAN OF LOWER HALF OF DATA (INCLUDE MEDIAN IF n IS ODD)
VARIANCE (later)
STANDARD DEVIATION (later)
3. EXAMPLE: (odd number of observations, 19)
Note: Include the median in the calculation of both quartiles.
Median = 61
UPPER HALF: [61, 62, 64, 68, 71, 72, 76, 80, 89, 94]
Q3 = (71 + 72) / 2 = 71.5
LOWER HALF: [35, 37, 45, 46, 49, 56, 57, 57, 59, 61]
Q1 = (49 + 56) / 2 = 52.5
IQR = 71.5 – 52.5 = 19
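The quartile rule used in this example (median of each half, with the overall median placed in both halves when n is odd) could be sketched as follows. The function name quartiles_including_median is hypothetical, and other textbooks and software use different quartile conventions, so results from library routines may not match.

```python
from statistics import median

def quartiles_including_median(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    When the number of observations is odd, the overall median is
    included in both halves, following the rule stated on the slide.
    """
    s = sorted(values)
    n = len(s)
    half = n // 2
    if n % 2 == 1:
        lower, upper = s[:half + 1], s[half:]  # both halves contain the median
    else:
        lower, upper = s[:half], s[half:]      # clean split, nothing shared
    return median(lower), median(upper)

# The 19 observations from the first slide.
data = [45, 46, 49, 35, 76, 80, 89, 94, 37, 61,
        62, 64, 68, 56, 57, 57, 59, 71, 72]

q1, q3 = quartiles_including_median(data)
iqr = q3 - q1
print(q1, q3, iqr)
```

For the 19 observations this should give Q1 = 52.5, Q3 = 71.5, and IQR = 19.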
4. Quartiles
EXAMPLE: (even number of observations, 18)
Median = (59 + 61) / 2 = 60 (average of the middle two numbers)
UPPER HALF: Q3 = 71
LOWER HALF: Q1 = 49
IQR = 71 – 49 = 22
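With an even number of observations the data splits cleanly into two halves and no value is shared. The 18 values used on this slide did not survive in the transcript, so the list below is an assumption: it is the earlier 19-value example with the maximum (94) dropped, chosen only because it reproduces the slide's median and quartiles.

```python
from statistics import median

# ASSUMPTION: stand-in for the slide's lost 18-value data set
# (the 19-value example without its maximum, 94).
data18 = sorted([35, 37, 45, 46, 49, 56, 57, 57, 59,
                 61, 62, 64, 68, 71, 72, 76, 80, 89])

half = len(data18) // 2
lower, upper = data18[:half], data18[half:]  # 9 values each, none shared

med = median(data18)  # average of the two middle values
q1 = median(lower)    # middle value of the lower half
q3 = median(upper)    # middle value of the upper half
iqr = q3 - q1

print(med, q1, q3, iqr)
```

Under that assumption the median is 60, Q1 is 49, Q3 is 71, and Q3 – Q1 comes out to 22.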
5. 5-NUMBER SUMMARY:
THE 5-NUMBER SUMMARY OF A DISTRIBUTION REPORTS ITS MEDIAN, QUARTILES, AND EXTREMES (MINIMUM AND MAXIMUM).
MAX = 94
Q3 = 71.5
MEDIAN = 61
Q1 = 52.5
MIN = 35
OUTLIERS: DATA VALUES WHICH ARE BEYOND THE FENCES
IQR = Q3 – Q1 = 19
UPPER FENCE = Q3 + 1.5 IQR = 71.5 + 1.5 x 19 = 100
LOWER FENCE = Q1 – 1.5 IQR = 52.5 – 1.5 x 19 = 24
IN THE EXAMPLE CONSIDERED ABOVE, THERE ARE NO OUTLIERS: ALL VALUES LIE BETWEEN 24 AND 100.
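The fence rule lends itself to a short check. The sketch below takes Q1 and Q3 directly from the slide rather than recomputing them, and the variable names are illustrative only.

```python
from statistics import median

# The 19 observations from the first slide.
data = [45, 46, 49, 35, 76, 80, 89, 94, 37, 61,
        62, 64, 68, 56, 57, 57, 59, 71, 72]

q1, q3 = 52.5, 71.5  # quartiles as given on the slide
iqr = q3 - q1

upper_fence = q3 + 1.5 * iqr
lower_fence = q1 - 1.5 * iqr

# Outliers are the data values that fall beyond either fence.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Minimum, Q1, median, Q3, maximum.
five_number_summary = (min(data), q1, median(data), q3, max(data))

print(five_number_summary, lower_fence, upper_fence, outliers)
```

Since the minimum (35) is above the lower fence and the maximum (94) is below the upper fence, the list of outliers should come back empty.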
6. BOXPLOTS
WHENEVER WE HAVE A 5-NUMBER SUMMARY OF A (QUANTITATIVE) VARIABLE, WE CAN DISPLAY THE INFORMATION IN A BOXPLOT.
THE CENTER OF A BOXPLOT IS A BOX THAT SHOWS THE MIDDLE HALF OF THE DATA, BETWEEN THE QUARTILES.
THE HEIGHT OF THE BOX IS EQUAL TO THE IQR.
IF THE MEDIAN IS ROUGHLY CENTERED BETWEEN THE QUARTILES, THEN THE MIDDLE HALF OF THE DATA IS ROUGHLY SYMMETRIC. IF IT IS NOT CENTERED, THE DISTRIBUTION IS SKEWED.
THE MAIN USE FOR BOXPLOTS IS TO COMPARE GROUPS.
8. Examples:
1. Here are costs of 10 electric smoothtop ranges rated very good or excellent by Consumer Reports in August 2002.
Find the following statistics by hand:
a) mean
b) median and quartiles
c) range and IQR
9. VARIANCE = “AVERAGE” SQUARED DEVIATION FROM THE MEAN
DEVIATION = (each data value) – mean
VARIANCE = 4658 / (19 – 1) = 258.8
STANDARD DEVIATION = SQUARE ROOT (VARIANCE) = 16.1
10. Step 1: Sort Data:
565, 750, 850, 900, …, 1050, 1200, 1250, 1400
Mean = …
Median = 1025
Q1 = 850
Q3 = 1200
Range = 835
IQR = 350
11. VARIANCE = “AVERAGE” SQUARED DEVIATION FROM THE MEAN
Computing the Variance
Deviation = (each data value) – mean
Squared deviation = ((each data value) – mean)^2
Sum all squared deviations.
Variance = (sum of all squared deviations) / (n – 1), where n is the number of observations
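The steps above can be sketched directly in Python. The function name sample_variance is hypothetical; the standard library's statistics.variance uses the same n – 1 divisor and is called here only as a cross-check. The data is the 19-value example from the first slide.

```python
from math import sqrt
from statistics import variance as library_variance

def sample_variance(values):
    """Follow the slide's steps: deviations, squares, sum, divide by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

# The 19 observations from the first slide.
data = [45, 46, 49, 35, 76, 80, 89, 94, 37, 61,
        62, 64, 68, 56, 57, 57, 59, 71, 72]

variance = sample_variance(data)
std_dev = sqrt(variance)  # standard deviation = square root of variance

print(round(variance, 1), round(std_dev, 1))

# Cross-check against the standard library (same n - 1 divisor).
assert abs(variance - library_variance(data)) < 1e-9
```

Rounded to one decimal place, the results should match the 258.8 and 16.1 reported for this data set.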
12. Variance Example:
[Table: Data, Squared Deviations (only the entry 6.76 survives)]
Mean = 42.4
Variance = 147.2 / 4 = 36.8
Std deviation = square root of variance ≈ 6.07
13. Some Remarks
If the shape is skewed, report the median and IQR. In that case the mean and median can differ substantially. You may want to include the mean and std deviation, but you should point out why the mean and the median differ.
If the histogram is symmetric, report the mean and the std deviation, and possibly the median and IQR.