Class 1 Introduction Sigma Notation Graphical Descriptions of Data Numerical Descriptions of Data.

2 Class 1 Introduction Sigma Notation Graphical Descriptions of Data Numerical Descriptions of Data

3 Sigma Notation Representation of a sum. Uses the capital Greek letter sigma, Σ, and a variable of summation.

4 Sigma Notation This is used in many situations to represent a computation performed with a data set. Let $x_i$ represent the $i$th value in a data set of size $n$. Then the sum of the data set can be written as: $\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$
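As a minimal sketch, the sum written in sigma notation corresponds to a simple loop over the data set. The function name `sigma_sum` and the data values are illustrative assumptions, not part of the slides.

```python
def sigma_sum(x):
    """Return x_1 + x_2 + ... + x_n, the sum over i = 1..n of x_i."""
    total = 0
    # Python lists are indexed from 0, so x[i - 1] plays the role of x_i.
    for i in range(1, len(x) + 1):
        total += x[i - 1]
    return total


data = [3, 5, 8, 10]  # hypothetical data set with n = 4
print(sigma_sum(data))  # 26
```

In practice the built-in `sum(data)` does the same job; the explicit loop is only there to mirror the index of summation.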

5 Graphical Representations of Data Frequently, there is too much information in raw data. It is common to attempt to reduce the amount of information. Examples include: histograms, line graphs, bar charts, and pie charts.

6 Graphical Representations of Data This is an art form. Creativity is a key to success. Some dimensions that can be used include: vertical dimension, horizontal dimension, color, size, icon, and animation.

10 Numerical Representations of Data It is absolutely critical to distinguish between a population and a sample. A population is the entire body of data from which a sample may be drawn. A sample is a specific subset of a population.

11 Numerical Representations of Data A parameter is a numerical measure of a population. Parameters are frequently represented with Greek letters. A statistic is a numerical measure of a sample.

12 [Diagram: parameters describe a population; statistics describe a sample.]

13 Numerical Representations of Data Measures of Central Tendency in a population: The median is the middle value of a population whose values have been ordered by size. The mode is the most frequently occurring value. The most important one is the mean (average). Let $x_i$ be the $i$th data point in a population of size $N$. Then $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$

14 Numerical Representations of Data Note that the median and mode are insensitive to outliers, while the mean is not. What might this imply about using means, medians, and modes? In a sample of size $n$, the mean is computed by $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
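The sensitivity of the mean to outliers can be seen in a small sketch using Python's standard `statistics` module. The data values below are hypothetical and chosen only to make the effect obvious.

```python
from statistics import mean, median, mode

values = [2, 3, 3, 4]            # hypothetical data set
with_outlier = values + [103]    # same data plus one extreme value

# Without the outlier, all three measures agree.
print(mean(values), median(values), mode(values))

# With the outlier, the mean jumps while the median and mode stay put.
print(mean(with_outlier), median(with_outlier), mode(with_outlier))
```

One reading of the slide's question is that the median or mode may be the safer summary when extreme values are suspected.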

15 Numerical Representations of Data Measures of Central Tendency might not reflect important attributes of the data. What are the measures of central tendency for the following two populations? {31000, 40000, 40000, 49000} and {39000, 40000, 40000, 41000}
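A short sketch answers the question for the two populations on this slide. The variable names `pop_a` and `pop_b` are assumptions introduced for illustration.

```python
from statistics import mean, median, mode

pop_a = [31000, 40000, 40000, 49000]
pop_b = [39000, 40000, 40000, 41000]

for name, pop in (("A", pop_a), ("B", pop_b)):
    # Each line prints: label, mean, median, mode.
    print(name, mean(pop), median(pop), mode(pop))
```

Both populations have a mean, median, and mode of 40000, even though the first is spread far more widely than the second, so central tendency alone cannot tell them apart.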

16 Numerical Representations of Data Measures of Variability or Dispersion: The range is the difference between the largest and smallest values in a population (sample). Consider the populations {0, 0, 0, 0, 4} and {0, 1, 2, 3, 4}. How can we include all of the data in a measure of dispersion? We can try to measure how far the values are from some point, but if we fix that point (say 0), then we will get non-intuitive results.
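The limitation of the range shows up directly with the two populations from this slide. The helper name `value_range` is an assumption made for this sketch.

```python
def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)


pop_c = [0, 0, 0, 0, 4]
pop_d = [0, 1, 2, 3, 4]

print(value_range(pop_c), value_range(pop_d))  # 4 4
```

The two ranges are identical because the range looks only at the two extreme values and ignores how the remaining data are distributed.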

17 Numerical Representations of Data If we select $\mu$ (for a population), then at least we will be measuring the distance from the middle of the population. Note that the distance must be positive (unsigned), or the distances will always sum to 0! How can we make the distance positive?
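The reason signed distances from the mean always cancel follows from the definition of the population mean, $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$. A short derivation, using the same symbols:

```latex
\sum_{i=1}^{N} (x_i - \mu)
  = \sum_{i=1}^{N} x_i - N\mu
  = N\mu - N\mu
  = 0
```

Two common ways to make each distance unsigned are to take the absolute value, $|x_i - \mu|$, or to square it, $(x_i - \mu)^2$.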

18 Numerical Representations of Data The variance of a population is the average (mean) squared distance of the values to the mean: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$. The standard deviation is the square root of the variance: $\sigma = \sqrt{\sigma^2}$.

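19 Numerical Representations of Data The sample variance is computed in a slightly different way, dividing by $n - 1$ rather than $n$: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$. The sample standard deviation, $s$, is computed by taking the square root of the sample variance: $s = \sqrt{s^2}$.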
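The difference between the two variance computations can be sketched as follows, assuming the usual divisors: $N$ for a population and $n - 1$ for a sample. The function names are illustrative; `statistics.pvariance` and `statistics.variance` are the standard-library equivalents.

```python
from math import sqrt
from statistics import pvariance, variance


def population_variance(x):
    """Mean squared distance to the population mean (divide by N)."""
    mu = sum(x) / len(x)
    return sum((xi - mu) ** 2 for xi in x) / len(x)


def sample_variance(x):
    """Sum of squared distances to the sample mean, divided by n - 1."""
    x_bar = sum(x) / len(x)
    return sum((xi - x_bar) ** 2 for xi in x) / (len(x) - 1)


data = [0, 1, 2, 3, 4]

# Treating the data as a whole population versus as a sample.
print(population_variance(data), sqrt(population_variance(data)))
print(sample_variance(data), sqrt(sample_variance(data)))

# The standard library gives the same variances.
print(pvariance(data), variance(data))
```

For this data set the population variance is 2.0 and the sample variance is 2.5; the smaller divisor always makes the sample variance the larger of the two.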

20 Numerical Representations of Data Chebyshev’s Theorem: At least $(1 - 1/k^2)$ of the values in a data set must be within $k$ standard deviations of the mean, where $k > 1$. As an example, if $k = 2$, we can say that at least $(1 - 1/2^2) = (1 - 1/4) = 3/4$ of the values will be within 2 standard deviations of the mean. For a population, this is the interval $[\mu - 2\sigma, \mu + 2\sigma]$. For a sample, this is the interval $[\bar{x} - 2s, \bar{x} + 2s]$.
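Chebyshev's bound can be checked against any data set. The sketch below treats a hypothetical data set as a population and compares the observed fraction within $k$ standard deviations to the guaranteed minimum; the function names and data are assumptions made for illustration.

```python
from statistics import mean, pstdev


def fraction_within(values, k):
    """Fraction of values within k population standard deviations of the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    inside = [v for v in values if mu - k * sigma <= v <= mu + k * sigma]
    return len(inside) / len(values)


def chebyshev_bound(k):
    """Minimum fraction guaranteed by Chebyshev's theorem, for k > 1."""
    return 1 - 1 / k ** 2


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]  # hypothetical, with one extreme value

for k in (1.5, 2, 3):
    print(k, fraction_within(data, k), ">=", chebyshev_bound(k))
```

The bound is deliberately loose: it holds for every data set, so real data usually sit well above it.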

21 Numerical Representations of Data In fact, many data sets are unimodal (mound or bell shaped). In this case, the following approximation is found to hold empirically: About 68% of the values will be within 1 standard deviation of the mean. About 95% of the values will be within 2 standard deviations of the mean. About 99.7% of the values will be within 3 standard deviations of the mean.
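These percentages match what an exactly normal (bell-shaped) distribution would give. As a sketch under that assumption, the standard library's `statistics.NormalDist` can compute the fractions directly; real data will only approximate them.

```python
from statistics import NormalDist

# Assumption: the data follow an exact normal distribution.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def normal_fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(normal_fraction_within(k), 4))
```

The three values come out near 0.68, 0.95, and 0.997, all tighter than the Chebyshev guarantee because they rely on the bell shape.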

22 Looking for Outliers: z-scores A z-score for the $i$th data point in a sample is computed by $z_i = \frac{x_i - \bar{x}}{s}$. How would we define it for a population?
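A minimal sketch of using z-scores to look for outliers in a sample follows. The data set is hypothetical, and the cutoff of 2 standard deviations is an assumption chosen for illustration rather than a rule from the slides.

```python
from statistics import mean, stdev


def z_scores(sample):
    """z_i = (x_i - x_bar) / s, using the sample standard deviation s."""
    x_bar = mean(sample)
    s = stdev(sample)
    return [(x - x_bar) / s for x in sample]


sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]  # hypothetical sample
cutoff = 2                                 # assumed threshold for "unusual"

# Keep the data points whose z-score is more than `cutoff` away from 0.
flagged = [x for x, z in zip(sample, z_scores(sample)) if abs(z) > cutoff]
print(flagged)  # [20]
```

For a population, the same idea would use the population mean and standard deviation (for example `statistics.pstdev`) in place of the sample versions.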

