Lecture 2 Describing Data II ©


1 Lecture 2 Describing Data II ©

2 Summarizing and Describing Data
- Frequency distribution and the shape of the distribution
- Measures of variability

3 1. Frequency distribution and the shape of the distribution
- In the previous lecture, we saw that the mean of household savings gives an inflated picture of the savings of a “normal household”.
- This was because the shape of the histogram was not symmetric.
- It is important to look at how the observations are distributed.

4 Japanese household savings

5 1-1 Frequency Distribution
- The frequency table that we used in the previous lecture is also called the frequency distribution. The term “frequency distribution” usually refers to how the observations are distributed.
- When we plot the frequency table, the plot is called a histogram. A histogram usually shows the number of observations in each range. Sometimes, however, it shows the percentage of observations in each range.
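As a rough illustration of the two ways a histogram can be scaled, the sketch below builds a frequency table as counts and then converts it to percentages. The ages and the bin width of 10 are made up for this example; they are not the lecture's data.

```python
from collections import Counter

def frequency_table(values, bin_width):
    """Count observations in bins [k*bin_width, (k+1)*bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical client ages (illustrative only, not the lecture's data set)
ages = [23, 27, 31, 34, 35, 38, 41, 42, 44, 47, 52, 58]

counts = frequency_table(ages, 10)
# The same table expressed as a percentage of all observations
percentages = {low: 100 * c / len(ages) for low, c in counts.items()}

for low in counts:
    print(f"{low}-{low + 9}: {counts[low]} observations, {percentages[low]:.1f}%")
```

Both versions have the same shape; only the scale of the vertical axis differs.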

6 1-2 Shape of the Distribution The shape of the distribution refers to the shape of the Histogram.

7 1-3 Symmetric Distribution
- The shape of the distribution is said to be symmetric if the observations are balanced, or evenly distributed, about the mean.
- In other words, the distribution is symmetric if the shape of the histogram is symmetric.

8 Symmetric Distribution Note: For a symmetric distribution, the mean and median are equal.

9 Symmetric Distribution: An example
- The age distribution of the clients (from the previous lecture note) is nearly symmetric.

10 1-4 Skewed Distribution
- A distribution is skewed if the observations are not symmetrically distributed above and below the mean.
- A positively skewed (or skewed to the right) distribution has a tail that extends to the right, in the direction of positive values.
- A negatively skewed (or skewed to the left) distribution has a tail that extends to the left, in the direction of negative values.

11 Positively skewed distribution

12 Positively skewed distribution: An example
- The household savings histogram (from the previous lecture) is an example of a positively skewed distribution.

13 Positively skewed distribution: A note
- For a positively skewed distribution, the mean is greater than the median.

14 Negatively skewed distribution Note: For a negatively skewed distribution, the mean is less than the median.
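The relationship between the mean and the median can be checked on small samples. The sketch below uses two made-up data sets, one with a long right tail and one with a long left tail; the numbers are assumptions chosen for illustration, not the lecture's household savings data.

```python
import statistics

# Hypothetical savings with a long right tail (positively skewed)
savings = [50, 100, 150, 200, 250, 300, 400, 600, 1500, 4000]
# Hypothetical test scores with a long left tail (negatively skewed)
scores = [5, 40, 70, 80, 85, 88, 90, 92, 95, 95]

mean_savings = statistics.mean(savings)
median_savings = statistics.median(savings)
mean_scores = statistics.mean(scores)
median_scores = statistics.median(scores)

print("positively skewed: mean", mean_savings, "median", median_savings)
print("negatively skewed: mean", mean_scores, "median", median_scores)
```

In the first sample, a few very large values pull the mean above the median; in the second, a few very small values pull it below.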

15 2. Measures of Variability
- Variance
- Standard deviation

16 Example
- The data set “Sales at two different stores” contains daily sales data for two different stores. The data were collected over 60 days.
- Store A’s average daily sales are 231,800 yen. Store B’s average daily sales are 230,500 yen.
- Can we say that they are similar stores?
- Look at the following graphs.

17 Daily sales of the two stores

18 The difference between the two stores is that Store A’s sales have much higher variation than Store B’s sales. We need a measure of the variability in the data.

19 2-1 How to measure the variability (1)
- Taking Store A’s data as an example, the variability of each observation can be seen from the difference between the observation and the mean.
- But how do we measure the overall variability of the data?

20 How to measure the variability (2): Overall variability
- How about taking the average of all the differences? This is not a good idea: the differences can be positive or negative, and the positive and negative differences from the mean cancel out, so they always sum to zero.
- Therefore, we take the square of each difference. This is the first step in computing the “variance”, a measure of overall variability.
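A quick numerical check makes the cancellation concrete. The five daily sales figures below are made up for illustration (in thousand yen); the point is only that the differences from the mean sum to zero while their squares do not.

```python
# Hypothetical daily sales in thousand yen (not the lecture's data)
sales = [150, 320, 210, 275, 195]

mean = sum(sales) / len(sales)
differences = [x - mean for x in sales]

# Positive and negative differences cancel each other out
total_difference = sum(differences)

# Squaring makes every term non-negative, so nothing cancels
squared_differences = [d ** 2 for d in differences]
total_squared = sum(squared_differences)

print("differences:", differences)
print("sum of differences:", total_difference)
print("sum of squared differences:", total_squared)
```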

21 2-2 Variance: A measure of variability
- Variance is computed in the following way.
  1. Subtract the mean from each observation (that is, compute the difference between each observation and the mean; note that the difference can be negative).
  2. Square each difference.
  3. Sum all the squared differences.
  4. Divide the sum of squared differences by n-1 (the number of observations minus 1).
- We will learn why we divide the sum of squares by n-1 after we learn the concept of expectation.
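The four steps can be sketched in a few lines of code. The function name `sample_variance` and the sales figures are assumptions made for this illustration; the result is compared with `statistics.variance` from the Python standard library, which also divides by n-1.

```python
import statistics

def sample_variance(observations):
    """Sample variance computed by following the four steps one at a time."""
    n = len(observations)
    mean = sum(observations) / n
    differences = [x - mean for x in observations]  # step 1: subtract the mean
    squared = [d ** 2 for d in differences]         # step 2: square each difference
    sum_of_squares = sum(squared)                   # step 3: sum the squares
    return sum_of_squares / (n - 1)                 # step 4: divide by n - 1

# Hypothetical daily sales in thousand yen (not the lecture's data)
store_sales = [150, 320, 210, 275, 195]

result = sample_variance(store_sales)
print("step-by-step variance:", result)
print("statistics.variance:  ", statistics.variance(store_sales))
```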

22 Computation of the variance: Exercise
- Open the data “Computation of Variance”, and compute the variance of Store A’s daily sales.
- Compute the variance of Store B’s daily sales.

23 Computation of the variance: Exercise
- Store A: average daily sales = 231.8 thousand yen, variance = 4979.9
- Store B: average daily sales = 230.5 thousand yen, variance = 335.9
- Notice that the variance for Store A is higher than that for Store B. This is because the variation in daily sales is higher for Store A.

24 Variance: note
- In the previous slide, we did not use any unit of measurement for the variance. (For example, we do not say that the variance for Store A is 4979.9 thousand yen.)
- This is because, when we compute the variance, we square the differences. The unit of measurement for the variance is therefore “thousand yen squared”, which is not a meaningful unit.
- For this reason, we use the standard deviation, another measure of variation.

25 2-3 A measure of variability: Standard deviation
- The standard deviation is the square root of the variance.
- Exercise: Compute the standard deviation of the daily sales for Store A and Store B.

26 Standard Deviation: Store sales data example
- Standard deviation of Store A’s daily sales = 70.57 thousand yen.
- Standard deviation of Store B’s daily sales = 18.33 thousand yen.
- Roughly speaking, this means that Store A’s daily sales typically deviate from their mean by about 70.6 thousand yen, and Store B’s daily sales by about 18.3 thousand yen.
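Since the standard deviation is just the square root of the variance, the two figures can be reproduced from the variances reported earlier in the lecture (4979.9 for Store A and 335.9 for Store B). A minimal check, with variable names chosen for this sketch:

```python
import math

# Variances reported in the lecture (unit: thousand yen squared)
variance_a = 4979.9
variance_b = 335.9

# Taking the square root brings the unit back to thousand yen
sd_a = math.sqrt(variance_a)
sd_b = math.sqrt(variance_b)

print(f"Store A standard deviation: {sd_a:.2f} thousand yen")
print(f"Store B standard deviation: {sd_b:.2f} thousand yen")
```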

27 Standard deviation and variance as measures of risk (or uncertainty)
- Standard deviation and variance are often used as measures of uncertainty or risk.
- If you would like to work as a store manager, Store B may be the better store to work for: although its average sales are almost the same as Store A’s, the uncertainty is lower (lower standard deviation).

28 Standard deviation and variance as measures of risk (or uncertainty)
- In the store sales data, the average sales for both stores are similar.
- On many other occasions, however, a higher return (higher average sales) comes with a higher risk (higher standard deviation).
- One makes a decision by choosing a good combination of return and risk. For example, if you invest in a stock, you would choose a stock with a combination of return and risk that suits your preference.
- Therefore, standard deviation and variance are important numerical measures for summarizing data for decision-making purposes.

29 2-4. Understanding the mathematical notation of the variance
- Most of the time, we only have sample data (not population data).
- Variance computed from a sample is called the sample variance. We denote the sample variance by $s^2$.
- When we have population data (which does not happen often), we can compute the population variance. We denote the population variance by $\sigma^2$.

30 Understanding the mathematical notation of sample variance

Observation id   Variable X
1                $x_1$
2                $x_2$
3                $x_3$
:                :
n                $x_n$

The typical data we use come in this format. Using this format, we would like to represent the variance in mathematical form.

31 Understanding the mathematical notation of sample variance

Obs id    Variable X    Each data - the mean    (Each data - the mean)^2
1         $x_1$         $x_1 - \bar{x}$         $(x_1 - \bar{x})^2$
2         $x_2$         $x_2 - \bar{x}$         $(x_2 - \bar{x})^2$
3         $x_3$         $x_3 - \bar{x}$         $(x_3 - \bar{x})^2$
:         :             :                       :
n         $x_n$         $x_n - \bar{x}$         $(x_n - \bar{x})^2$
Average   $\bar{x}$

The first steps of computing the variance are written in the table. The variance can be computed by summing the last column and dividing the sum by (n-1). The mathematical form of the sample variance, $s^2$, is given on the next slide.

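32 Understanding the mathematical notation for sample variance
Mathematically, the sample variance, denoted $s^2$, can be written as
$$ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 $$
where $\bar{x}$ is the sample mean and $n$ is the number of observations.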

33 Mathematical notation for population variance
- Though not often, we may have population data. Then we can compute the population variance. We use the notation $\sigma^2$ to denote the population variance, and upper case $N$ to denote the number of observations. The mathematical notation for the population variance is
$$ \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 $$
where $\mu$ is the population mean. Unlike the case of the sample variance, we do not divide the sum of squares by N-1. We simply divide it by N.
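The only difference between the two computations is the divisor. As a small sketch, Python's standard `statistics` module provides both: `variance` divides by n-1 and `pvariance` divides by N. The data below are made up, and treating the same five numbers first as a sample and then as a whole population is purely for comparison.

```python
import statistics

# Hypothetical daily sales in thousand yen (not the lecture's data)
sales = [150, 320, 210, 275, 195]
n = len(sales)

sample_var = statistics.variance(sales)       # sum of squares divided by n - 1
population_var = statistics.pvariance(sales)  # sum of squares divided by N

print("sample variance:    ", sample_var)
print("population variance:", population_var)
# The two differ exactly by the factor (n - 1) / n
print("sample variance * (n - 1) / n:", sample_var * (n - 1) / n)
```

With many observations the factor (n-1)/n is close to 1, so the two values become nearly identical.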

34 2-5. Mathematical notation for the sample standard deviation
The sample standard deviation, $s$, is written as
$$ s = \sqrt{s^2} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} $$

35 Mathematical notation for the population standard deviation
The population standard deviation, $\sigma$, is written as
$$ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2} $$

36 2-6. Short-cut formula for sample variance
The short-cut formula for the sample variance is:
$$ s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1} $$

37 Exercise
- Compute the variance of the sales of Store A by applying the short-cut formula for the sample variance, and show that the result coincides with our previous calculation.
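The same kind of check can be sketched in code. The block below assumes one common form of the short-cut formula, $s^2 = (\sum x_i^2 - n\bar{x}^2)/(n-1)$, and compares it with the definition on made-up data; the function names and the numbers are illustrative and do not come from the Store A data set.

```python
def variance_by_definition(xs):
    """Sum of squared differences from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def variance_by_shortcut(xs):
    """Assumed short-cut form: (sum of x^2 - n * mean^2) / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return (sum(x * x for x in xs) - n * mean ** 2) / (n - 1)

# Hypothetical daily sales in thousand yen (not the lecture's data)
sales = [150, 320, 210, 275, 195]

by_definition = variance_by_definition(sales)
by_shortcut = variance_by_shortcut(sales)
print("definition:", by_definition)
print("short-cut: ", by_shortcut)
```

The short-cut form saves work by hand because the differences never have to be written out. On a computer, it can lose precision when the values are very large relative to their spread, so the definition is usually the safer choice there.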

38 Other Measures of Variability
1. The Range
The range of a set of data is the difference between the largest and smallest observations.

39 Other Measures of Central Tendency
2. Mode
The mode, if one exists, is the most frequently occurring observation in the sample or population.
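Both the range and the mode are simple to compute. The sketch below uses made-up data, and its helper names (`data_range`, `modes`) are chosen for this example. It treats "no mode exists" as the case where no value occurs more than once, which is one possible reading of the definition.

```python
from collections import Counter

def data_range(values):
    """Difference between the largest and smallest observations."""
    return max(values) - min(values)

def modes(values):
    """Most frequent value(s); an empty list if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so no mode exists
    return [v for v, c in counts.items() if c == highest]

# Hypothetical observations (illustrative only)
observations = [3, 5, 5, 7, 9, 9, 9, 12]

print("range:", data_range(observations))
print("mode(s):", modes(observations))
print("mode(s) when nothing repeats:", modes([1, 2, 3]))
```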

40 This lecture note covers:
- Textbook pp. 23-28: Frequency distribution
- Textbook 3.1, 3.2: Measures of central tendency and variability

