Topic 1: Descriptive Statistics CEE 11 Spring 2001 Dr. Amelia Regan These notes draw liberally from the class text, Probability and Statistics for Engineering.


1 Topic 1: Descriptive Statistics
CEE 11, Spring 2001
Dr. Amelia Regan
These notes draw liberally from the class text, Probability and Statistics for Engineering and the Sciences by Jay L. Devore, Duxbury 1995 (4th edition).

2 definitions
- A population consists of all objects of a certain type that are relevant to a particular study or analysis.
  - All students at UCI represent a population.
- A sample is a subset or portion of the population.
  - Students in this class represent a sample of the population of students at UCI.

3 frequency distributions and histograms
- A frequency is a count: the number of values in the sample that fall within a particular class.
- Classes must be mutually exclusive (no overlap allowed) and collectively exhaustive (the full range of the data must be covered).
- A histogram is a bar chart of the frequency distribution.

4 guidelines for forming class intervals
- Use intervals of equal length with midpoints at convenient round numbers.
- For large data sets, use more intervals.
- For small data sets, use a small number of intervals.

5 Example
- 30 students are asked to submit their weights, with these results:

Men (18 in sample):
140 145 160 190 155 165
170 157 130 185 190 155
130 155 150 148 150 140

Women (12 in sample):
140 120 130 138 121 125
118 122 115 102 115 150

6 Example
- We might break the sample into classes and construct the following frequency table:

class       frequency   rel. freq.   class midpoint
100-<120    4           0.133        110
120-<140    8           0.267        130
140-<160    11          0.367        150
160-<180    3           0.100        170
180-<200    4           0.133        190
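The relative frequencies and midpoints follow mechanically from the class limits and the counts. A minimal Python sketch, using the counts from the table above; the names `lower`, `width`, and `class_index` are illustrative choices, not taken from the course text:

```python
# Class limits and counts copied from the frequency table above.
lower = 100      # lower limit of the first class
width = 20       # common class width
counts = [4, 8, 11, 3, 4]

def class_index(x, lower=lower, width=width):
    """Return the 0-based class of x; classes are half-open, [a, b)."""
    return int((x - lower) // width)

n = sum(counts)                                   # sample size, 30
rel_freq = [round(f / n, 3) for f in counts]      # frequency divided by n
midpoints = [lower + width * i + width / 2 for i in range(len(counts))]

print(rel_freq)          # [0.133, 0.267, 0.367, 0.1, 0.133]
print(midpoints)         # [110.0, 130.0, 150.0, 170.0, 190.0]
print(class_index(120))  # 1: a value of 120 belongs to 120-<140, not 100-<120
```

The half-open intervals are what make the classes mutually exclusive: a boundary value such as 120 can land in only one class.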

7 From the table we can easily construct a histogram for the sample
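As a rough stand-in for the chart, the same table can be drawn as a text histogram. This is only a sketch; `text_histogram` is a hypothetical helper, and the bar symbol is arbitrary:

```python
# Class labels and counts copied from the frequency table.
classes = ["100-<120", "120-<140", "140-<160", "160-<180", "180-<200"]
counts = [4, 8, 11, 3, 4]

def text_histogram(labels, freqs, symbol="#"):
    """Return one line per class: the label, a bar of length freq, and the count."""
    return [f"{label:<9} {symbol * f} ({f})" for label, f in zip(labels, freqs)]

for line in text_histogram(classes, counts):
    print(line)
```

Because all classes have the same width, bar length proportional to frequency is enough; with unequal widths the bar area, not the length, would have to carry the frequency.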

8 mean
- The mean of a sample or data set is simply the arithmetic average of the values in the set, obtained by summing the values and dividing by the number of values.
- The mean of the sample of weights is 144.63 pounds.
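In symbols, writing the sample as x_1, ..., x_n (the notation used on the later slides) and the mean as x-bar, the definition reads:

```latex
\bar{x} \;=\; \frac{1}{n}\sum_{i=1}^{n} x_i \;=\; \frac{x_1 + x_2 + \cdots + x_n}{n}
```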

9 mean of a frequency distribution
- When we summarize a data set in a frequency distribution, we are approximating the data set by "rounding" each value in a given class to the class mark (the class midpoint).
- The mean of the weight data obtained in this way is 146.67.
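The 146.67 can be reproduced from the frequency table by weighting each class midpoint by its frequency. The symbols f_j (class frequency), m_j (class midpoint), and k (number of classes) are notation assumed here:

```latex
\bar{x} \;\approx\; \frac{1}{n}\sum_{j=1}^{k} f_j\, m_j
       \;=\; \frac{4(110) + 8(130) + 11(150) + 3(170) + 4(190)}{30}
       \;=\; \frac{4400}{30} \;\approx\; 146.67
```

The result differs from the mean of the raw values because every observation in a class is treated as if it sat at the midpoint.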

10 median
- The median is the value that is roughly in the middle of the data set. If n is odd, the median is the single value in the middle, namely the value with rank (n + 1)/2.
- If n is even, there is not a single value in the middle, so the median is defined to be the average of the two middle values, namely the values with ranks n/2 and n/2 + 1.
- The median for our example is (140 + 145)/2 = 142.5 lb.
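The rank rule translates directly into code. A minimal sketch applied to the 30 weights from the example; the function name `median` is our own, and the only subtlety is converting the 1-based ranks on the slide to Python's 0-based indices:

```python
def median(values):
    """Median by the rank rule: the middle value if n is odd,
    the average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]                 # rank (n + 1)/2
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2   # ranks n/2 and n/2 + 1

# The 30 weights from the example (men first, then women).
weights = [140, 145, 160, 190, 155, 165, 170, 157, 130, 185, 190, 155,
           130, 155, 150, 148, 150, 140,
           140, 120, 130, 138, 121, 125, 118, 122, 115, 102, 115, 150]

print(median(weights))   # 142.5
```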

11 mode
- The mode of a data set is the value that appears most often. The modal values for our sample are 130 and 140; the mode need not be a single value.
- If data are broken into classes, the modal class is the class with the most members. The modal class for our sample is 140-<160.
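Since the mode need not be unique, a function that finds it should return every value tied for the highest count. A small sketch on made-up data (the list `sample` is hypothetical, chosen only to show a tie):

```python
from collections import Counter

def modes(values):
    """Return every value that attains the highest count, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

sample = [3, 5, 5, 7, 7, 9]   # hypothetical data, not from the notes
print(modes(sample))          # [5, 7]
```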

12 range
- The range or spread of a data set is the difference between its largest and smallest values.
- The weight data run from 102 to 190 lb, so the range is 190 - 102 = 88 lb.

13 variance
- The variance of a population is the average of the squared deviations from the mean.
- The variance of a sample is approximately the average of the squared deviations from the mean (note that we divide the sum of the squared deviations by n - 1 rather than n).
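Written out, the two definitions differ only in the mean used and in the divisor. The symbols below are the conventional ones and are assumed here (population size N, population mean mu, sample size n, sample mean x-bar); S^2 matches the notation on the later slides:

```latex
\sigma^2 \;=\; \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
\qquad\qquad
S^2 \;=\; \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
```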

14 standard deviation
- The standard deviation is the square root of the variance.
- The standard deviation is useful because it is in the same units as the mean (and the original data), so it provides better insight into the relative variability of a sample.
- The variance and standard deviation of the weight data are 559.14 lb² and 23.64 lb.

15 coefficient of variation
- The coefficient of variation is the standard deviation divided by the mean.
- The coefficient of variation is used to compare the relative variability of two or more data sets.
- For the weight data the coefficient of variation is 0.163.
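Using the standard deviation and mean reported on the earlier slides, the arithmetic behind the 0.163 is as follows (CV is an abbreviation introduced here):

```latex
\mathrm{CV} \;=\; \frac{S}{\bar{x}} \;=\; \frac{23.64}{144.63} \;\approx\; 0.163
```

Because the units cancel, the result is dimensionless, which is what allows data sets measured on different scales to be compared.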

16 shortcut formula for the variance
- It is sometimes more convenient to use the following formula for the variance: S² = [Σx_i² - (Σx_i)²/n] / (n - 1)
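The shortcut can be checked numerically against the definition. The sketch below takes the shortcut to be the usual computing form, S² = (Σx² - (Σx)²/n)/(n - 1); that is an assumption if the slide intended a different arrangement. The function names and the sample are hypothetical:

```python
def variance_definition(xs):
    """Sample variance from the definition: squared deviations over n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def variance_shortcut(xs):
    """Sample variance from the sum of squares and the square of the sum."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

sample = [2, 4, 4, 5, 7, 8]   # hypothetical data, not from the notes
print(variance_definition(sample), variance_shortcut(sample))   # 4.8 4.8
```

The shortcut needs only two running totals, which is why it suits hand calculation. In floating-point arithmetic it can lose precision when the mean is large relative to the spread, so the definitional form is usually the safer choice in software.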

17 Class exercise
- The National Weather Service maintains and publishes historical weather data for 100 US cities. The average annual rainfall in inches for the cities in the database whose names begin with A is listed below.
- Calculate the mean, median, range, variance and standard deviation for the following data

18 properties of S²
- Let x_1, x_2, ..., x_n be a sample and c be any nonzero constant.
- If y_1 = x_1 + c, y_2 = x_2 + c, ..., y_n = x_n + c, then S_y² = S_x².
- If y_1 = cx_1, y_2 = cx_2, ..., y_n = cx_n, then S_y² = c² S_x² and S_y = |c| S_x.
- In other words: adding a constant to every observation leaves the variance unchanged; multiplying every observation by a constant multiplies the variance by the square of that constant.

19 related properties of the sample mean
- Let x_1, x_2, ..., x_n be a sample and c be any nonzero constant.
- If y_1 = x_1 + c, y_2 = x_2 + c, ..., y_n = x_n + c, then ȳ = x̄ + c.
- If y_1 = cx_1, y_2 = cx_2, ..., y_n = cx_n, then ȳ = c x̄.
- In other words, if we add a constant to the sample or multiply it by a constant, the mean is shifted or multiplied by the same constant.
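The shift and scale rules for the mean, the variance, and the standard deviation are easy to confirm numerically. A sketch using Python's standard `statistics` module on a hypothetical sample, with a negative constant chosen to show why the standard deviation scales by |c| rather than c:

```python
from math import isclose
from statistics import mean, stdev, variance

x = [2, 4, 4, 5, 7, 8]   # hypothetical sample, not from the notes
c = -3                   # any nonzero constant

shifted = [xi + c for xi in x]   # y_i = x_i + c
scaled = [c * xi for xi in x]    # y_i = c * x_i

# Adding c shifts the mean by c and leaves the variance unchanged.
shift_ok = (isclose(mean(shifted), mean(x) + c)
            and isclose(variance(shifted), variance(x)))

# Multiplying by c multiplies the mean by c, the variance by c**2,
# and the standard deviation by |c|.
scale_ok = (isclose(mean(scaled), c * mean(x))
            and isclose(variance(scaled), c ** 2 * variance(x))
            and isclose(stdev(scaled), abs(c) * stdev(x)))

print(shift_ok, scale_ok)   # True True
```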

20 Class exercise
- Without using your calculators, calculate the mean and variance of the following data:

Xi | 35  40  45  50  55
---+--------------------
fi | 13  11  14  13  12

- Hint: shift the observations "to the left" by subtracting a constant, and then divide by another constant.
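One way to follow the hint, sketched in Python. It assumes the table reads Xi = 35, 40, 45, 50, 55 with frequencies fi = 13, 11, 14, 13, 12, and it picks 45 and 5 as the two constants; other choices work equally well. The variable names are our own:

```python
values = [35, 40, 45, 50, 55]
freqs = [13, 11, 14, 13, 12]

a, h = 45, 5   # assumed coding constants: subtract 45, then divide by 5
coded = [(x - a) // h for x in values]   # -2, -1, 0, 1, 2

n = sum(freqs)                                           # 63 observations
mean_y = sum(f * y for f, y in zip(freqs, coded)) / n    # 0 / 63
var_y = sum(f * (y - mean_y) ** 2
            for f, y in zip(freqs, coded)) / (n - 1)     # 124 / 62

# Undo the coding: the mean is rescaled and shifted back,
# while the variance ignores the shift and scales by h squared.
mean_x = a + h * mean_y
var_x = h ** 2 * var_y

print(mean_x, var_x)   # 45.0 50.0
```

The coded values are small integers, so both sums can be done by hand; the properties of the mean and of S² from the two preceding slides then convert the coded results back to the original scale.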

