Skewness & Kurtosis: Reference

Name: Skewness & Kurtosis: Reference
Uploaded: 2017-08-12T15:33:24+00:00
Duration: PTM27S25
Channel: Geoffrey Short
Description: Skewness & Kurtosis: Reference


Further Moments – Skewness
Skewness measures the degree of asymmetry exhibited by the data.
If skewness equals zero, the histogram is symmetric about the mean.
Skewness can be positive or negative.
Skewness measured in this way is sometimes referred to as “Fisher’s skewness”.

Further Moments – Skewness
[Figure: distribution curves A and B with the mode, median, and mean marked]

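n = 26, mean = 4.23, median = 3.5, mode = 3

Value  Occurrences  Deviation             Cubed deviation        Occur*Cubed
  1         1       (1 – 4.23) = -3.23    (-3.23)^3 =  -33.70      -33.70
  2         4       (2 – 4.23) = -2.23    (-2.23)^3 =  -11.09      -44.36
  3         8       (3 – 4.23) = -1.23    (-1.23)^3 =   -1.86      -14.89
  4         4       (4 – 4.23) = -0.23    (-0.23)^3 =   -0.01       -0.05
  5         3       (5 – 4.23) = +0.77    (+0.77)^3 =   +0.46       +1.37
  6         2       (6 – 4.23) = +1.77    (+1.77)^3 =   +5.55      +11.09
  7         1       (7 – 4.23) = +2.77    (+2.77)^3 =  +21.25      +21.25
  8         1       (8 – 4.23) = +3.77    (+3.77)^3 =  +53.58      +53.58
  9         1       (9 – 4.23) = +4.77    (+4.77)^3 = +108.53     +108.53
 10         1       (10 – 4.23) = +5.77   (+5.77)^3 = +192.10     +192.10

Sum ≈ 294.9
Mean = 4.23, s = 2.27, Skewness = 0.97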
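The skewness formula itself did not survive in the transcript. The sketch below assumes it is the sum of occurrence-weighted cubed deviations divided by n·s³, with s the sample standard deviation (n − 1 in the denominator). That assumption reproduces the quoted mean, s, and skewness for this table. The variable names are illustrative only.

```python
# Frequency table from the worked example: value -> number of occurrences.
freq = {1: 1, 2: 4, 3: 8, 4: 4, 5: 3, 6: 2, 7: 1, 8: 1, 9: 1, 10: 1}

n = sum(freq.values())
mean = sum(value * count for value, count in freq.items()) / n

# Sample standard deviation: squared deviations divided by (n - 1).
sum_sq = sum(count * (value - mean) ** 2 for value, count in freq.items())
s = (sum_sq / (n - 1)) ** 0.5

# Assumed formula: sum(occurrences * cubed deviation) / (n * s^3).
cubed_sum = sum(count * (value - mean) ** 3 for value, count in freq.items())
skewness = cubed_sum / (n * s ** 3)

print(n, round(mean, 2), round(s, 2), round(skewness, 2))
```

The positive result agrees with the long right tail of the table (single occurrences at 7 through 10).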

Skewness > 0 (Positively skewed)
[Figure: positively skewed distribution with the mode, median, and mean marked]

Skewness < 0 (Negatively skewed)
[Figure: negatively skewed distribution with the mode, median, and mean marked; curves labeled A and B]

Skewness = 0 (symmetric distribution)
[Figure: symmetric distribution]

Skewness – Review
Positive skewness: There are more observations below the mean than above it. This occurs when the mean is greater than the median.
Negative skewness: There are a small number of low observations and a large number of high ones. This occurs when the median is greater than the mean.

Kurtosis – Review
Kurtosis measures how peaked the histogram is (Karl Pearson, 1905).
Measured as excess kurtosis (3 subtracted from the fourth standardized moment), the kurtosis of a normal distribution is 0.
Kurtosis characterizes the relative peakedness or flatness of a distribution compared to the normal distribution.

Kurtosis – Review
Platykurtic – When the kurtosis < 0, the frequencies throughout the curve are closer to equal (i.e., the curve is flatter and wider). Thus, negative kurtosis indicates a relatively flat distribution.
Leptokurtic – When the kurtosis > 0, there are high frequencies in only a small part of the curve (i.e., the curve is more peaked). Thus, positive kurtosis indicates a relatively peaked distribution.

Source: http://espse.ed.psu.edu/Statistics/Chapters/Chapter3/Chap3

Measures of central tendency – Review
Measures of the location of the middle or the center of a distribution: mean, median, and mode.

Mean – Review
Mean – Average value of a distribution; the most commonly used measure of central tendency.
Median – The value of a variable such that half of the observations are above and half are below this value, i.e., this value divides the distribution into two groups of equal size.
Mode – The most frequently occurring value in the distribution.

An Example Data Set
Daily low temperatures recorded in Chapel Hill (01/18-01/31, 2005, °F)

Jan. 18 – 11    Jan. 25 – 25
Jan. 19 – 11    Jan. 26 – 33
Jan. 20 – 25    Jan. 27 – 22
Jan. 21 – 29    Jan. 28 – 18
Jan. 22 – 27    Jan. 29 – 19
Jan. 23 – 14    Jan. 30 – 30
Jan. 24 – 11    Jan. 31 – 27

For these 14 values, we will calculate all three measures of central tendency: the mean, median, and mode.

Mean – Review
Mean – Most commonly used measure of central tendency
Procedures
(1) Sum all the values in the data set
(2) Divide the sum by the number of values in the data set
Watch for outliers: the mean is sensitive to extreme values.
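As a minimal sketch of the two-step procedure, the snippet below computes the mean of the Chapel Hill temperatures. The extra reading of 90 °F is a hypothetical outlier, not part of the source data. It is added only to illustrate how a single extreme value can pull the mean.

```python
# Daily low temperatures (°F), Chapel Hill, Jan. 18-31, 2005, in date order.
temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]

# (1) Sum all the values; (2) divide by the number of values.
total = sum(temps)
mean = total / len(temps)
print(total, round(mean, 2))

# Hypothetical outlier (NOT in the source data): one misrecorded 90 °F reading.
with_outlier = temps + [90]
mean_with_outlier = sum(with_outlier) / len(with_outlier)
print(round(mean_with_outlier, 2))
```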

Mean – Review
(1) Sum all the values in the data set → sum = 302
(2) Divide the sum by the number of values in the data set → Mean = 302/14 = 21.57 (°F)
Is this a good measure of central tendency for this data set?

Median – Review
Median – 1/2 of the values are above it and 1/2 below
(1) Sort the data in ascending order
(2) Find the value with an equal number of values above and below it
(3) Odd number of observations → the [(n-1)/2]+1 value from the lowest
(4) Even number of observations → average the (n/2) and [(n/2)+1] values
(5) Use the median with asymmetric distributions, particularly with outliers
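The odd/even rule can be sketched as follows. The function name `median` is an illustrative choice, and the positions in the slide are 1-based, so they are shifted by one for Python's 0-based indexing.

```python
def median(values):
    """Median by the positional rule: middle value, or average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the [(n-1)/2]+1-th value (1-based) is index (n-1)//2 (0-based).
        return ordered[(n - 1) // 2]
    # Even n: average the (n/2)-th and [(n/2)+1]-th values (1-based).
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]
print(median(temps))
```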

Median – Review
(1) Sort the data in ascending order:
→ 11, 11, 11, 14, 18, 19, 22, 25, 25, 27, 27, 29, 30, 33
(2) Find the value with an equal number of values above and below it
Even number of observations → average the (n/2) and [(n/2)+1] values
→ (14/2) = 7; [(14/2)+1] = 8
→ (22+25)/2 = 23.5 (°F)
Is this a good measure of central tendency for this data?

Mode – Review
Mode – This is the most frequently occurring value in the distribution
(1) Sort the data in ascending order
(2) Count the instances of each value
(3) Find the value that has the most occurrences
If more than one value occurs an equal number of times and these counts exceed all other counts, we have multiple modes.
Use the mode for multi-modal data.
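A short sketch of the counting procedure, including the multiple-mode case. The function name `modes` and the second, bimodal list are illustrative assumptions and do not come from the slides.

```python
from collections import Counter


def modes(values):
    """Return every value whose count equals the highest count (one or more modes)."""
    counts = Counter(values)          # count the instances of each value
    top = max(counts.values())        # the highest number of occurrences
    return sorted(v for v, c in counts.items() if c == top)


temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]
print(modes(temps))

# Hypothetical bimodal data to show the multiple-mode case.
print(modes([1, 1, 2, 2, 3]))
```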

Mode – Review
(1) Sort the data in ascending order:
→ 11, 11, 11, 14, 18, 19, 22, 25, 25, 27, 27, 29, 30, 33
(2) Count the instances of each value:
11: 3x, 14: 1x, 18: 1x, 19: 1x, 22: 1x, 25: 2x, 27: 2x, 29: 1x, 30: 1x, 33: 1x
(3) Find the value that has the most occurrences → mode = 11 (°F)
Is this a good measure of the central tendency of this data set?

Measures of Dispersion – Review
In addition to measures of central tendency, we can also summarize data by characterizing its variability.
Measures of dispersion are concerned with the distribution of values around the mean in data: range, interquartile range, variance, standard deviation, z-scores, and coefficient of variation (CV).

An Example Data Set
Daily low temperatures recorded in Chapel Hill (01/18-01/31, 2005, °F)

Jan. 18 – 11    Jan. 25 – 25
Jan. 19 – 11    Jan. 26 – 33
Jan. 20 – 25    Jan. 27 – 22
Jan. 21 – 29    Jan. 28 – 18
Jan. 22 – 27    Jan. 29 – 19
Jan. 23 – 14    Jan. 30 – 30
Jan. 24 – 11    Jan. 31 – 27

For these 14 values, we will calculate all measures of dispersion.

Range – Review
Range – The difference between the largest and the smallest values
(1) Sort the data in ascending order
(2) Find the largest value → max
(3) Find the smallest value → min
(4) Calculate the range → range = max - min
Vulnerable to the influence of outliers

Range – Review
Range – The difference between the largest and the smallest values
(1) Sort the data in ascending order → 11, 11, 11, 14, 18, 19, 22, 25, 25, 27, 27, 29, 30, 33
(2) Find the largest value → max = 33
(3) Find the smallest value → min = 11
(4) Calculate the range → range = 33 – 11 = 22
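The range calculation is a one-liner once the data are sorted. In the sketch below the reading of -5 °F is a hypothetical outlier, not part of the source data. It shows why the range is vulnerable to a single extreme value.

```python
temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]

ordered = sorted(temps)
data_range = ordered[-1] - ordered[0]   # max - min
print(ordered[-1], ordered[0], data_range)

# Hypothetical outlier (NOT in the source data): one reading of -5 °F.
with_outlier = temps + [-5]
range_with_outlier = max(with_outlier) - min(with_outlier)
print(range_with_outlier)
```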

Interquartile Range – Review
Interquartile range – The difference between the 25th and 75th percentiles
(1) Sort the data in ascending order
(2) Find the 25th percentile – the (n+1)/4 observation
(3) Find the 75th percentile – the 3(n+1)/4 observation
(4) Interquartile range is the difference between these two percentiles

Interquartile Range – Review
(1) Sort the data in ascending order → 11, 11, 11, 14, 18, 19, 22, 25, 25, 27, 27, 29, 30, 33
(2) Find the 25th percentile – the (n+1)/4 observation → (14+1)/4 = 3.75 → 11+(14-11)*0.75 = 13.25
(3) Find the 75th percentile – the 3(n+1)/4 observation → 3(14+1)/4 = 11.25 → 27+(29-27)*0.25 = 27.5
(4) Interquartile range is the difference between these two percentiles → 27.5 – 13.25 = 14.25
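The (n+1)/4 positions are usually fractional, so the slide interpolates linearly between the two neighbouring observations. A minimal sketch of that rule follows. The helper name `value_at_position` is an assumption, and it expects a 1-based position between 1 and n. Other percentile conventions exist and give slightly different answers.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional position, by linear interpolation."""
    lower = int(position)              # 1-based rank of the lower neighbour
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]
ordered = sorted(temps)
n = len(ordered)

q1 = value_at_position(ordered, (n + 1) / 4)        # position 3.75
q3 = value_at_position(ordered, 3 * (n + 1) / 4)    # position 11.25
iqr = q3 - q1
print(q1, q3, iqr)
```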

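Variance – Review
Variance is formulated as the sum of squares of statistical distances (or deviations) divided by the population size or the sample size minus one:
Population: \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}
Sample: s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}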

Variance – Review
(1) Calculate the mean
(2) Calculate the deviation for each value
(3) Square each of the deviations
(4) Sum the squared deviations
(5) Divide the sum of squares by (n-1) for a sample

Variance – Review
(1) Calculate the mean → 21.57
(2) Calculate the deviation for each value

Jan. 18  (11 – 21.57) = -10.57    Jan. 25  (25 – 21.57) =   3.43
Jan. 19  (11 – 21.57) = -10.57    Jan. 26  (33 – 21.57) =  11.43
Jan. 20  (25 – 21.57) =   3.43    Jan. 27  (22 – 21.57) =   0.43
Jan. 21  (29 – 21.57) =   7.43    Jan. 28  (18 – 21.57) =  -3.57
Jan. 22  (27 – 21.57) =   5.43    Jan. 29  (19 – 21.57) =  -2.57
Jan. 23  (14 – 21.57) =  -7.57    Jan. 30  (30 – 21.57) =   8.43
Jan. 24  (11 – 21.57) = -10.57    Jan. 31  (27 – 21.57) =   5.43

Variance – Review
(3) Square each of the deviations

Jan. 18  (-10.57)^2 = 111.76    Jan. 25  (3.43)^2  =  11.76
Jan. 19  (-10.57)^2 = 111.76    Jan. 26  (11.43)^2 = 130.61
Jan. 20  (3.43)^2   =  11.76    Jan. 27  (0.43)^2  =   0.18
Jan. 21  (7.43)^2   =  55.18    Jan. 28  (-3.57)^2 =  12.76
Jan. 22  (5.43)^2   =  29.47    Jan. 29  (-2.57)^2 =   6.61
Jan. 23  (-7.57)^2  =  57.33    Jan. 30  (8.43)^2  =  71.04
Jan. 24  (-10.57)^2 = 111.76    Jan. 31  (5.43)^2  =  29.47

(4) Sum the squared deviations = 751.43

Variance – Review
(5) Divide the sum of squares by (n-1) for a sample
→ 751.43 / (14-1) = 57.8
The variance of the Tmin data set (Chapel Hill) is 57.8

Standard Deviation – Review
Standard deviation is equal to the square root of the variance.
Compared with variance, standard deviation has a scale closer to that used for the mean and the original data.

(1) Calculate the mean
(2) Calculate the deviation for each value
(3) Square each of the deviations
(4) Sum the squared deviations
(5) Divide the sum of squares by (n-1) for a sample
(6) Take the square root of the resulting variance

(1) – (5) → s^2 = 57.8
(6) Take the square root of the variance → s = sqrt(57.8) = 7.6
The standard deviation (s) of the Tmin data set (Chapel Hill) is 7.6 (°F)
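Steps (1) through (6) can be sketched directly in Python. The snippet follows the sample (n − 1) formula used in the slides and cross-checks the result against the standard library's `statistics.variance` and `statistics.stdev`. Variable names are illustrative.

```python
import math
import statistics

temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]
n = len(temps)

mean = sum(temps) / n                          # (1) mean
deviations = [x - mean for x in temps]         # (2) deviation of each value
squared = [d ** 2 for d in deviations]         # (3) squared deviations
sum_sq = sum(squared)                          # (4) sum of squares
variance = sum_sq / (n - 1)                    # (5) sample variance
std_dev = math.sqrt(variance)                  # (6) standard deviation

print(round(sum_sq, 2), round(variance, 1), round(std_dev, 1))

# Cross-check against the standard library (both use n - 1).
assert math.isclose(variance, statistics.variance(temps))
assert math.isclose(std_dev, statistics.stdev(temps))
```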

z-score – Review
Since data come from distributions with different means and different degrees of variability, it is common to standardize observations.
One way to do this is to transform each observation into a z-score.
A z-score may be interpreted as the number of standard deviations an observation is away from the mean.

z-scores – Review
A z-score is the number of standard deviations an observation is away from the mean
(1) Calculate the mean
(2) Calculate the deviation
(3) Calculate the standard deviation
(4) Divide the deviation by the standard deviation

z-scores – Review
Z-score for the maximum Tmin value (33 °F)
(1) Calculate the mean → 21.57
(2) Calculate the deviation → 33 – 21.57 = 11.43
(3) Calculate the standard deviation (SD) → 7.6
(4) Divide the deviation by the standard deviation → 11.43 / 7.6 ≈ 1.50
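A minimal sketch of the z-score for the maximum Tmin value, using the sample standard deviation as in the variance slides. The helper name `z_score` is an assumption.

```python
import statistics

temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]

mean = statistics.mean(temps)
sd = statistics.stdev(temps)      # sample standard deviation (n - 1)


def z_score(x):
    """Number of standard deviations x lies away from the mean."""
    return (x - mean) / sd


z_max = z_score(max(temps))       # z-score of the 33 °F reading
print(round(z_max, 2))
```

A value of roughly 1.5 means the warmest night was about one and a half standard deviations above the mean.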

Coefficient of Variation – Review
Coefficient of variation (CV) measures the spread of a set of data as a proportion of its mean.
It is the ratio of the sample standard deviation to the sample mean.
It is sometimes expressed as a percentage.
There is an equivalent definition for the coefficient of variation of a population.

Coefficient of Variation – Review
(1) Calculate the mean → 21.57
(2) Calculate the standard deviation → 7.6
(3) Divide the standard deviation by the mean → CV = 7.6 / 21.57 ≈ 0.35 (about 35%)
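The three steps translate to a few lines of Python. This sketch assumes the sample standard deviation (n − 1), consistent with the earlier slides.

```python
import statistics

temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]

mean = statistics.mean(temps)          # (1) mean
sd = statistics.stdev(temps)           # (2) sample standard deviation
cv = sd / mean                         # (3) coefficient of variation
cv_percent = cv * 100                  # the same ratio expressed as a percentage

print(round(cv, 2), round(cv_percent, 1))
```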

Histograms – Review
We may also summarize our data by constructing histograms, which are vertical bar graphs.
A histogram is used to graphically summarize the distribution of a data set.
A histogram divides the range of values in a data set into intervals.
Over each interval is placed a bar whose height represents the percentage of data values in the interval.

Building a Histogram – Review
(1) Develop an ungrouped frequency table → 11, 11, 11, 14, 18, 19, 22, 25, 25, 27, 27, 29, 30, 33

Value  Frequency
11     3
14     1
18     1
19     1
22     1
25     2
27     2
29     1
30     1
33     1

(2) Construct a grouped frequency table → select a set of classes

Class  Frequency
11-15  4
16-20  2
21-25  3
26-30  4
31-35  1

(3) Plot the frequencies of each class
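The grouped frequency table and a rough text rendering of the histogram can be sketched as follows. The class boundaries are the ones chosen in the slide. The text bars are only a stand-in for the plotted figure.

```python
temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]

# Classes from the slide, as inclusive (low, high) bounds.
classes = [(11, 15), (16, 20), (21, 25), (26, 30), (31, 35)]

# Count how many observations fall in each class.
counts = [sum(1 for t in temps if low <= t <= high) for low, high in classes]

# Plot the frequencies of each class as a simple text histogram.
for (low, high), count in zip(classes, counts):
    print(f"{low}-{high}: {'#' * count} ({count})")
```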

Box Plots – Review
We can also use a box plot to graphically summarize a data set.
A box plot represents a graphical summary of what is sometimes called a “five-number summary” of the distribution: minimum, 25th percentile, median, 75th percentile, and maximum. The interquartile range (IQR) is the distance between the 25th and 75th percentiles.
[Figure: box plot with max., 75th %-ile, median, 25th %-ile, and min. labeled (Rogerson, p. 8)]

Boxplot – Review

Further Moments of the Distribution
While measures of dispersion are useful for helping us describe the width of the distribution, they tell us nothing about the shape of the distribution.
Source: Earickson, RJ, and Harlin, JM. Geographic Measurement and Quantitative Analysis. USA: Macmillan College Publishing Co., p. 91.

Skewness – Review
Skewness measures the degree of asymmetry exhibited by the data.
Positive skewness – More observations below the mean than above it
Negative skewness – A small number of low observations and a large number of high ones
For the example data set: Skewness = -0.1851 (Negatively skewed)
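The formula behind this value is not preserved in the transcript. The sketch below assumes skewness is the sum of cubed deviations divided by n·s³, with s the sample standard deviation. Under that assumption the temperature data give the quoted value. The function name `skewness` is illustrative.

```python
def skewness(values):
    """Assumed formula: sum((x - mean)^3) / (n * s^3), with s the sample SD (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    s = (sum((x - mean) ** 2 for x in values) / (n - 1)) ** 0.5
    cubed_sum = sum((x - mean) ** 3 for x in values)
    return cubed_sum / (n * s ** 3)


temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]
skew = skewness(temps)
print(round(skew, 4))
```

The negative sign is consistent with the median (23.5) exceeding the mean (21.57).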

Kurtosis – Review
Kurtosis measures how peaked the histogram is.
Leptokurtic: a high degree of peakedness; values of kurtosis over 0
Platykurtic: flat histograms; values of kurtosis less than 0
For the example data set: Kurtosis = -1.54 < 0 (Platykurtic)
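As with skewness, the kurtosis formula is missing from the transcript. The sketch below assumes excess kurtosis computed as the sum of fourth-power deviations divided by n·s⁴, minus 3, with s the sample standard deviation. That assumption reproduces the quoted value for the temperature data. The function name `excess_kurtosis` is illustrative.

```python
def excess_kurtosis(values):
    """Assumed formula: sum((x - mean)^4) / (n * s^4) - 3, with s the sample SD (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    s_squared = sum((x - mean) ** 2 for x in values) / (n - 1)
    fourth_sum = sum((x - mean) ** 4 for x in values)
    return fourth_sum / (n * s_squared ** 2) - 3


temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]
kurt = excess_kurtosis(temps)
print(round(kurt, 2))
```

A value below 0 marks the data as platykurtic, which fits the fairly even spread of readings between 11 and 33 °F.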
