Graphical Summary of Data Distribution Statistical View Point Histograms Skewness Kurtosis Other Descriptive Summary Measures Source: www.unc.edu/courses/2006spring/geog/090/001/www/Lectures/


Graphical Summary of Data Distribution
Statistical View Point: Histograms, Skewness, Kurtosis, Other Descriptive Summary Measures
Source: Geog090-Week03-Lecture02-SkewsnessKurtosis.ppt, www.unc.edu/courses/2006spring/geog/090/001/www/Lectures/

Measures of Dispersion – Coefficient of Variation
The coefficient of variation (CV) measures the spread of a set of data as a proportion of its mean. It is the ratio of the sample standard deviation to the sample mean, CV = s / x̄, and is sometimes expressed as a percentage. There is an equivalent definition for the coefficient of variation of a population.

Coefficient of Variation (CV)
The CV is a dimensionless number that can be used to compare the amount of variability between populations with different means.
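The dimensionless property can be checked numerically. The sketch below is illustrative only: the height values are hypothetical, and the function name `coefficient_of_variation` is not from the lecture. It computes the CV of the same measurements in two units and shows that the result does not change.

```python
import statistics

def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the sample mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean

# Hypothetical data: the same five heights in centimetres and in metres.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
heights_m = [h / 100 for h in heights_cm]

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)
print(f"CV (cm): {cv_cm:.4f}  CV (m): {cv_m:.4f}")
```

Because the units cancel in the ratio, both calls give the same value, which is what makes the CV usable for comparing samples with different means or scales.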

Histogram: Frequency & Distribution
A histogram is one way to depict a frequency distribution. Frequency is the number of times a variable takes on a particular value. Note that any variable has a frequency distribution: e.g., roll a pair of dice several times and record the resulting values (constrained to being between 2 and 12), count the number of times each value occurs (the frequency of that value), and take these counts together to form a frequency distribution.

Frequency & Distribution
Frequencies can be absolute (the actual count of occurrences) or relative (the absolute frequency divided by the total number of observations, giving a value in [0, 1]). Relative frequencies are particularly useful for comparing distributions drawn from two different sources, e.g., when the number of observations differs between the sources.
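The dice example and the absolute/relative distinction can be sketched in a few lines. This is a minimal illustration, assuming 100 simulated rolls with a fixed seed; the helper names `roll_two_dice` and `frequency_table` are hypothetical.

```python
import random
from collections import Counter

def roll_two_dice(n_rolls, seed=0):
    """Simulate n_rolls throws of a pair of dice and return the sums."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls)]

def frequency_table(values):
    """Return (absolute, relative) frequencies keyed by observed value."""
    absolute = Counter(values)
    total = len(values)
    relative = {value: count / total for value, count in absolute.items()}
    return absolute, relative

rolls = roll_two_dice(100)
absolute, relative = frequency_table(rolls)
for value in sorted(absolute):
    print(f"{value:2d}  absolute={absolute[value]:3d}  relative={relative[value]:.2f}")
```

The absolute frequencies sum to the number of rolls, while the relative frequencies sum to 1, so two experiments with different numbers of rolls can be compared on the relative scale.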

Histograms
We may summarize our data by constructing histograms, which are vertical bar graphs. A histogram graphically summarizes the distribution of a data set: it divides the range of values into intervals, and over each interval places a bar whose height represents the frequency of data values in that interval.

Building a Histogram
To construct a histogram, the data are first grouped into categories. The histogram contains one vertical bar for each category, and the height of the bar represents the number of observations in the category (i.e., its frequency). It is common to note the midpoint of the category on the horizontal axis.

Building a Histogram – Example
1. Develop an ungrouped frequency table. That is, build a table that counts the number of occurrences of each variable value, from lowest to highest.
[Table not preserved; column headers: TMI Value, Ungrouped Freq.]
We could attempt to construct a bar chart from this table, but it would have too many bars to be useful.

2. Construct a grouped frequency table. Select an appropriate number of classes.
[Table not preserved; surviving column label: Percentage.]

Building a Histogram – Example
3. Plot the frequencies of each class. All that remains is to create the bar graph.
[Figure not preserved; surviving label: A proxy for Soil Moisture.]
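The grouping and plotting steps can be sketched as follows. The values are hypothetical (the lecture's TMI data are not preserved), equal-width classes are an assumption, and `grouped_frequencies` is an illustrative name rather than anything from the lecture.

```python
def grouped_frequencies(values, n_classes):
    """Group values into n_classes equal-width classes and count each class."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; no classes to form")
    width = (hi - lo) / n_classes
    counts = [0] * n_classes
    for v in values:
        # The maximum value falls on the upper edge, so clamp it into the last class.
        index = min(int((v - lo) / width), n_classes - 1)
        counts[index] += 1
    classes = [(lo + i * width, lo + (i + 1) * width) for i in range(n_classes)]
    return classes, counts

# Hypothetical observations, not the lecture's data.
values = [1, 2, 2, 3, 5, 6, 6, 7, 8, 9, 12, 13, 17, 20, 21]
classes, counts = grouped_frequencies(values, n_classes=4)

# Text "histogram": one bar per class, labeled with the class midpoint.
for (start, end), count in zip(classes, counts):
    midpoint = (start + end) / 2
    print(f"{start:5.1f} to {end:5.1f} (mid {midpoint:5.1f}) | {'#' * count} {count}")
```

Changing `n_classes` shows why the number of classes matters: too many classes reproduces the unwieldy ungrouped table, too few hides the shape.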

Further Moments of the Distribution
While measures of dispersion help us describe the width of the distribution, they tell us nothing about its shape.
Source: Earickson, RJ, and Harlin, JM. Geographic Measurement and Quantitative Analysis. USA: Macmillan College Publishing Co., p. 91.

Further Moments of the Distribution
There are further statistics that describe the shape of the distribution, using formulae similar to those of the mean and variance:
– 1st moment: Mean (describes central value)
– 2nd moment: Variance (describes dispersion)
– 3rd moment: Skewness (describes asymmetry)
– 4th moment: Kurtosis (describes peakedness)

Further Moments – Skewness
Skewness measures the degree of asymmetry exhibited by the data. [Formula not preserved; in it, s denotes the sample standard deviation.] If skewness equals zero, the histogram is symmetric about the mean. Otherwise the skewness is either positive or negative.


Further Moments – Skewness
Positive skewness
– There are more observations below the mean than above it
– The mean is greater than the median
Negative skewness
– There are a small number of low observations and a large number of high ones
– The median is greater than the mean
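The sign behaviour described above can be checked on small samples. The lecture's own skewness formula is not preserved in this transcript, so the sketch below assumes one common form, the sum of cubed deviations divided by n·s³ with s the sample standard deviation; the data sets are hypothetical.

```python
import statistics

def skewness(values):
    """Assumed form: sum((x - mean)^3) / (n * s^3), s = sample std. deviation."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return sum((x - mean) ** 3 for x in values) / (n * s ** 3)

# Hypothetical samples chosen to show each case.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 15]   # one large value pulls the mean up
symmetric = [1, 2, 3, 4, 5]
left_skewed = [-x for x in right_skewed]          # mirror image of the first sample

for name, data in [("right", right_skewed), ("symmetric", symmetric), ("left", left_skewed)]:
    print(f"{name:9s} skew={skewness(data):+.3f} "
          f"mean={statistics.mean(data):.2f} median={statistics.median(data):.2f}")
```

In the right-skewed sample the mean exceeds the median and the skewness is positive; mirroring the data flips both relationships.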

Further Moments – Kurtosis
Kurtosis measures how peaked the histogram is. It characterizes the relative peakedness or flatness of a distribution compared to the normal distribution. Under the convention used here (excess kurtosis, i.e., the fourth standardized moment minus 3), the kurtosis of a normal distribution is 0.

Further Moments – Kurtosis
Platykurtic: when the kurtosis < 0, the frequencies throughout the curve are closer to being equal (i.e., the curve is flatter and wider). Thus, negative kurtosis indicates a relatively flat distribution.
Leptokurtic: when the kurtosis > 0, there are high frequencies in only a small part of the curve (i.e., the curve is more peaked). Thus, positive kurtosis indicates a relatively peaked distribution.

Further Moments – Kurtosis
Kurtosis is based on the size of a distribution's tails.
– Negative kurtosis (platykurtic): distributions with short tails
– Positive kurtosis (leptokurtic): distributions with relatively long tails
[Figure not preserved; curves labeled leptokurtic and platykurtic.]

Why Do We Need Kurtosis?
These two distributions have the same variance and approximately the same skew, but differ markedly in kurtosis.
[Figure not preserved.]
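A small numerical sketch of the sign convention may help. It assumes the excess-kurtosis form m4 / m2² − 3, where m2 and m4 are the second and fourth central moments computed with divisor n; the lecture's exact formula is not preserved, and both data sets are hypothetical.

```python
import statistics

def excess_kurtosis(values):
    """Assumed form: fourth central moment / (second central moment)^2, minus 3."""
    n = len(values)
    mean = statistics.mean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3

# Hypothetical samples: one flat with short tails, one peaked with long tails.
flat = list(range(1, 11))                         # every value equally frequent
long_tailed = [-10, -1, -1, 0, 0, 0, 0, 1, 1, 10]  # most mass near 0, two far values

print(f"flat:        {excess_kurtosis(flat):+.3f}")
print(f"long-tailed: {excess_kurtosis(long_tailed):+.3f}")
```

Both samples are symmetric (zero skew), yet the flat one is platykurtic and the long-tailed one leptokurtic, which is the kind of difference that variance and skewness alone do not reveal.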

How to Graphically Summarize Data?
– Histograms
– Box plots

Functions of a Histogram
The function of a histogram is to graphically summarize the distribution of a data set. The histogram graphically shows the following:
1. Center (i.e., the location) of the data
2. Spread (i.e., the scale) of the data
3. Skewness of the data
4. Kurtosis of the data
5. Presence of outliers
6. Presence of multiple modes in the data

Functions of a Histogram
The histogram can be used to answer the following questions:
1. What kind of population distribution do the data come from?
2. Where are the data located?
3. How spread out are the data?
4. Are the data symmetric or skewed?
5. Are there outliers in the data?


Box Plots
We can also use a box plot to graphically summarize a data set. A box plot is a graphical representation of what is sometimes called the “five-number summary” of the distribution:
– Minimum
– Maximum
– 25th percentile
– 75th percentile
– Median
[Figure (Rogerson, p. 8) not preserved; labels: min., max., 25th %-ile, 75th %-ile, median, Interquartile Range (IQR).]

Box Plots
Example: consider the first 9 Commodore prices (in thousands of dollars):
6.0, 6.7, 3.8, 7.0, 5.8, 9.975, 10.5, 5.99, 20.0
Arrange these in order of magnitude:
3.8, 5.8, 5.99, 6.0, 6.7, 7.0, 9.975, 10.5, 20.0
The median is Q2 = 6.7 (there are 4 values on either side)
Q1 = 5.9 (median of the 4 smallest values)
Q3 = 10.2 (median of the 4 largest values)
IQR = Q3 – Q1 = 10.2 – 5.9 = 4.3

Example (ranked): 3.8, 5.8, 5.99, 6.0, 6.7, 7.0, 9.975, 10.5, 20.0
The median is Q2 = 6.7
Q1 = 5.9
Q3 = 10.2
IQR = Q3 – Q1 = 10.2 – 5.9 = 4.3
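The quartile rule used in this example (median of the lower half, overall median, median of the upper half, excluding the middle value when the count is odd) can be sketched as below. The function name `quartiles_by_halves` is illustrative; the prices are the nine values from the example.

```python
import statistics

def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + (n % 2):]  # skip the middle value when n is odd
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

prices = [6.0, 6.7, 3.8, 7.0, 5.8, 9.975, 10.5, 5.99, 20.0]
q1, q2, q3 = quartiles_by_halves(prices)
print(f"Q1={q1:.3f}  Q2={q2:.3f}  Q3={q3:.3f}  IQR={q3 - q1:.3f}")
```

The unrounded results are Q1 = 5.895 and Q3 = 10.2375, which round to the 5.9 and 10.2 quoted in the example and give an IQR of about 4.3.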

Box Plots
Example: Table 1.1, Commuting data (Rogerson, p. 5)
Ranked commuting times: 5, 5, 6, 9, 10, 11, 11, 12, 12, 14, 16, 17, 19, 21, 21, 21, 21, 21, 22, 23, 24, 24, 26, 26, 31, 31, 36, 42, 44, 47
The 25th percentile is represented by observation (30+1)/4 = 7.75
The 75th percentile is represented by observation 3(30+1)/4 = 23.25
25th percentile: 11.75
75th percentile: 26
Interquartile range: 26 – 11.75 = 14.25

Example (ranked commuting times): 5, 5, 6, 9, 10, 11, 11, 12, 12, 14, 16, 17, 19, 21, 21, 21, 21, 21, 22, 23, 24, 24, 26, 26, 31, 31, 36, 42, 44, 47
25th percentile: 11.75
75th percentile: 26
Interquartile range: 26 – 11.75 = 14.25
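The position rule used here places the p-th percentile at observation number p(n+1) in the ranked list. The sketch below assumes linear interpolation between the two neighbouring observations when that position is fractional, which reproduces the 25th-percentile value implied by the interquartile range above; `percentile_by_position` is an illustrative name.

```python
def percentile_by_position(ordered, p):
    """Percentile at 1-based position p*(n+1), interpolating between neighbours."""
    n = len(ordered)
    position = p * (n + 1)          # may be fractional, e.g. 7.75
    lower_rank = int(position)      # rank of the observation just below
    fraction = position - lower_rank
    if lower_rank < 1:
        return ordered[0]
    if lower_rank >= n:
        return ordered[-1]
    lower = ordered[lower_rank - 1]  # convert 1-based rank to 0-based index
    upper = ordered[lower_rank]
    return lower + fraction * (upper - lower)

commuting = [5, 5, 6, 9, 10, 11, 11, 12, 12, 14, 16, 17, 19, 21, 21,
             21, 21, 21, 22, 23, 24, 24, 26, 26, 31, 31, 36, 42, 44, 47]

p25 = percentile_by_position(commuting, 0.25)
p75 = percentile_by_position(commuting, 0.75)
print(f"25th: {p25}  75th: {p75}  IQR: {p75 - p25}")
```

Position 7.75 lies three quarters of the way from the 7th observation (11) to the 8th (12), giving 11.75; position 23.25 lies between two observations that both equal 26.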

Other Descriptive Summary Measures
Descriptive statistics provide an organization and summary of a dataset: a small number of summary measures replaces the entirety of a dataset. We’ll briefly talk about other simple descriptive summary measures.

Other Descriptive Summary Measures
You're likely already familiar with some simple descriptive summary measures:
– Ratios
– Proportions
– Percentages
– Rates of Change
– Location Quotients

Other Descriptive Summary Measures
Ratios: (# of observations in A) / (# of observations in B), e.g., A = 6 overcast days and B = 24 mostly cloudy days give a ratio of 6/24.
Proportions: relate one part or category of data to the entire set of observations, e.g., a box of marbles that contains 4 yellow, 6 red, 5 blue, and 2 green gives a yellow proportion of 4/17; that is, colors = {yellow, red, blue, green} with counts = {4, 6, 5, 2}.

Other Descriptive Summary Measures
Proportions: the sum of all proportions is 1. Proportions are useful for comparing two sets of data with different sizes and category counts, e.g., a different box of marbles gives a yellow proportion of 2/23; for the comparison to be reasonable we need to know the totals for both samples.
Percentages: calculated as proportion × 100, e.g., 2/23 × 100% = 8.696%. Their use should be restricted to larger sample sizes, perhaps 20+ observations.
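The ratio, proportion, and percentage calculations from the weather and marble examples can be reproduced directly. This is a minimal sketch using only the counts quoted in the slides; the variable names are illustrative.

```python
# Ratio: observations in A relative to observations in B.
overcast_days = 6
mostly_cloudy_days = 24
ratio = overcast_days / mostly_cloudy_days

# Proportions: each category relative to the whole box of marbles.
counts = {"yellow": 4, "red": 6, "blue": 5, "green": 2}
total = sum(counts.values())
proportions = {color: count / total for color, count in counts.items()}

# Percentages: proportions multiplied by 100.
percentages = {color: 100 * p for color, p in proportions.items()}

# Yellow percentage for the second box (2 yellow out of 23 marbles).
second_box_yellow_pct = 2 / 23 * 100

print(f"ratio A:B = {ratio:.2f}")
for color in counts:
    print(f"{color:6s} proportion={proportions[color]:.3f} percent={percentages[color]:.1f}%")
print(f"second box yellow: {second_box_yellow_pct:.3f}%")
```

The yellow proportions 4/17 and 2/23 are directly comparable even though the two boxes hold different numbers of marbles.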

Other Descriptive Summary Measures
Location Quotients: an index of relative concentration in space, a comparison of a region's share of something to the total.
Example: suppose we have a region of 1000 km² which we subdivide into three smaller areas of 200, 300, and 500 km² (labeled A, B, and C). The region has an influenza outbreak with 150 cases in A, 100 in B, and 350 in C (a total of 600 flu cases):
Area | Proportion of Area | Proportion of Cases | Location Quotient
A | 200/1000 = 0.2 | 150/600 = 0.25 | 0.25/0.2 = 1.25
B | 300/1000 = 0.3 | 100/600 = 0.17 | 0.17/0.3 = 0.57
C | 500/1000 = 0.5 | 350/600 = 0.58 | 0.58/0.5 = 1.17
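The influenza example can be recomputed as a short sketch. The function name `location_quotients` is illustrative; the areas and case counts are those given in the example.

```python
def location_quotients(areas, cases):
    """Return each sub-area's share of cases divided by its share of area."""
    total_area = sum(areas.values())
    total_cases = sum(cases.values())
    return {
        name: (cases[name] / total_cases) / (areas[name] / total_area)
        for name in areas
    }

areas_km2 = {"A": 200, "B": 300, "C": 500}
flu_cases = {"A": 150, "B": 100, "C": 350}

lq = location_quotients(areas_km2, flu_cases)
for name, value in lq.items():
    print(f"{name}: LQ = {value:.2f}")
```

A quotient above 1 (areas A and C) means the area holds more than its proportional share of cases; below 1 (area B) means less. Computed without intermediate rounding, B comes out at about 0.56; the 0.57 in the example results from first rounding the case proportion to 0.17.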