3 Summarizing Data
Data are observations on one or more variables selected from a population (or a sample).
4 Summarizing Data
Qualitative Variables - Variables that express attributes about a sample or population.
Quantitative Variables - Variables that are the result of measurement or counting.
5 Summarizing Data
Qualitative variables give rise to nominal or ordinal data.
Quantitative variables give rise to ratio or interval data.
6 Summarizing Data
It is also important to categorize observations as to their source.
Primary Data - Collected by means of experiment or survey.
Secondary Data - Acquired from a source that did not collect the data, even though the source may have published them.
7 Summarizing Data
Data collection allows us to make informed decisions about the problem at hand.
Manipulation and statistical analysis of the data allow for improved decision making.
Generally speaking, statistical data are collected in random order. Since there is no real order to the data, it is difficult to obtain any valuable information upon inspection.
Data: 3, 1, 7, 22, 9, 10, 4, 17, 19
8 Definition of Array
Array - An array reorders the data from the smallest to the largest value.
Data: 3, 1, 7, 22, 9, 10, 4, 17, 19
Arrayed data: 1, 3, 4, 7, 9, 10, 17, 19, 22
9 Measures of Dispersion
Definition: Range - A range is computed by subtracting the smallest from the largest observation.
Range: 22 - 1 = 21
An array also indicates something about the distribution of the units between the two extremes and their tendency to cluster toward some central value.
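The array and range from the last two slides can be checked with a short Python sketch. The variable names are illustrative only; the data are the nine values shown on the slides.

```python
# Data from the slides, in the order collected.
data = [3, 1, 7, 22, 9, 10, 4, 17, 19]

# Array: reorder the data from the smallest to the largest value.
arrayed = sorted(data)

# Range: largest observation minus smallest observation.
data_range = arrayed[-1] - arrayed[0]

print(arrayed)     # [1, 3, 4, 7, 9, 10, 17, 19, 22]
print(data_range)  # 21
```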
10 Measures of Dispersion
Data can further be summarized in the form of a frequency distribution.
A number of classes is chosen (normally 5 to 15).
The distribution has the classes on the horizontal axis and frequencies on the vertical axis.
11 Measures of Dispersion
Cotton Yield: 215 to 235, 235 to 255, 255 to 275
Number of farms: 4613211575
Classes do not have to have equal widths (e.g., income).
12 Measures of Dispersion
Histogram - A frequency distribution presented as a bar chart. Advantage: you can see its shape.
Frequency polygon - A line graph used to display data. Frequencies on y-axis, class midpoints on x-axis.
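As a rough illustration of building a frequency distribution, the sketch below groups the nine data values from the earlier slides into classes. The class limits (five classes of width 5 starting at 0, each including its lower limit and excluding its upper limit) are an assumption made for this example and do not come from the slides.

```python
data = [3, 1, 7, 22, 9, 10, 4, 17, 19]

# Assumed classes (not from the slides): 0 to 5, 5 to 10, ..., 20 to 25.
lower, width, n_classes = 0, 5, 5

# Count how many observations fall in each class.
counts = [0] * n_classes
for x in data:
    k = (x - lower) // width   # index of the class that contains x
    counts[k] += 1
# counts -> [3, 2, 1, 2, 1]

# Class midpoints, the x-axis values for a frequency polygon.
midpoints = [lower + width * k + width / 2 for k in range(n_classes)]

# Text histogram: one row per class, one '#' per observation.
for k in range(n_classes):
    start = lower + width * k
    print(f"{start:>2} to {start + width:<2}  midpoint {midpoints[k]:>4}  {'#' * counts[k]}")
```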
13 Averages
Average - A number used to represent the central value of a data set or distribution.
1. Arithmetic Mean - most widely used.
$\mu = \frac{1}{N} \sum_{i=1}^{N} X_i$
14 Example: this is the population
7, 3, 2, 8
Sum = 20
20/4 = 5 = Arithmetic Mean
15 Example cont: Now Take Some Samples
$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
S1 = 7, 3: 10/2 = 5
S2 = 7, 8: 15/2 = 7.5
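The population mean and the two sample means can be reproduced with a small Python sketch. It reads the population as 7, 3, 2, 8 and the samples as (7, 3) and (7, 8); the helper name `mean` is illustrative.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

population = [7, 3, 2, 8]
mu = mean(population)        # 20 / 4 = 5.0

sample_1 = [7, 3]
sample_2 = [7, 8]
x_bar_1 = mean(sample_1)     # 10 / 2 = 5.0
x_bar_2 = mean(sample_2)     # 15 / 2 = 7.5

print(mu, x_bar_1, x_bar_2)
```

The second sample mean differs from the population mean, which shows that a sample mean is only an estimate and varies from sample to sample.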
17 Weighted Mean Example
Crop        Hourly Wage, x   Workers, w   wx
Cucumbers   4.50               950         4,275
Melons      4.75               600         2,850
Onions      5.25             1,020         5,355
Total                        2,570        12,480
$\bar{x}_w = \frac{\sum wx}{\sum w} = \frac{12{,}480}{2{,}570} \approx 4.86$
The other way (the unweighted mean of the three wages) it is 4.83.
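A minimal Python sketch of the weighted mean follows. The cucumber and melon rows and the totals (2,570 workers, 12,480 in wx) are from the slide. The onion row (wage 5.25, 1,020 workers) is an assumption inferred from those totals, not a value printed on the slide.

```python
# Hourly wage (x) and number of workers (w) per crop.
# The onion values are inferred from the slide's totals (assumption).
wages = {"Cucumbers": 4.50, "Melons": 4.75, "Onions": 5.25}
workers = {"Cucumbers": 950, "Melons": 600, "Onions": 1020}

total_wx = sum(wages[c] * workers[c] for c in wages)   # 12480.0
total_w = sum(workers.values())                        # 2570

# Weighted mean: each wage counts in proportion to its number of workers.
weighted_mean = total_wx / total_w

# Unweighted mean: every crop counts equally, regardless of workers.
unweighted_mean = sum(wages.values()) / len(wages)

print(round(weighted_mean, 2))    # 4.86
print(round(unweighted_mean, 2))  # 4.83
```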
18 Two Properties of Arithmetic Mean
a. The sum of the deviations from the mean is zero: $\sum (X_i - \bar{X}) = 0$.
Example deviations: -1, 3, 0, 2, -4; their sum is 0.
19 b. The sum of squares of the deviations from the mean is a minimum; that is, it is smaller than the sum of squared deviations from any other value.
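Both properties can be checked numerically. The sketch below uses the four-value population from the earlier example (7, 3, 2, 8); the function name `sum_sq_dev` and the list of alternative centers are illustrative choices.

```python
data = [7, 3, 2, 8]
x_bar = sum(data) / len(data)    # 5.0

# Property a: deviations from the mean sum to zero.
deviations = [x - x_bar for x in data]
sum_dev = sum(deviations)        # 2 - 2 - 3 + 3 = 0.0

def sum_sq_dev(center):
    """Sum of squared deviations of the data from an arbitrary center."""
    return sum((x - center) ** 2 for x in data)

# Property b: the sum of squares is smallest when the center is the mean.
at_mean = sum_sq_dev(x_bar)      # 4 + 4 + 9 + 9 = 26.0
others = {c: sum_sq_dev(c) for c in (3, 4, 4.5, 5.5, 6, 7)}

print(sum_dev, at_mean, others)
```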
20 2. Midrange (or center) - the arithmetic mean of the smallest and the largest items in the data set.
Unreliable as an estimate of the population mean: it is based on two values that change significantly from sample to sample.
21 Example
$MR = \frac{X_1 + X_n}{2}$, where $X_1$ is the smallest and $X_n$ is the largest.
$MR = \frac{0 + 7}{2} = 3.5$
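The midrange formula is easy to sketch in Python. The function name `midrange` is illustrative. The second part reuses the nine data values from the earlier slides to show how strongly a single extreme value moves the result.

```python
def midrange(values):
    """Arithmetic mean of the smallest and the largest values."""
    return (min(values) + max(values)) / 2

# Slide example: smallest value 0, largest value 7.
mr_example = midrange([0, 7])            # 3.5

# Nine data values from the earlier slides.
data = [3, 1, 7, 22, 9, 10, 4, 17, 19]
mr_all = midrange(data)                  # (1 + 22) / 2 = 11.5

# Dropping only the largest observation shifts the midrange noticeably.
without_max = [x for x in data if x != max(data)]
mr_without_max = midrange(without_max)   # (1 + 19) / 2 = 10.0

print(mr_example, mr_all, mr_without_max)
```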
22 Median
3. Median - a place average. For ungrouped data, it is the value of the middle observation after the data are arrayed.
When there is an even number of observations, the middle two observations are averaged.
Better measure when extreme values are encountered.
Should not be used for small sample sizes.
Half of the observations are below it and half are above.
23 Mode
4. Mode - the most common observation in the data set. For ungrouped data we determine the mode by inspection.
Ungrouped data may not have a mode (all values appear once).
Several modes could occur as well.
Use the mode when we want to know what is in vogue.
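The median and mode rules can be sketched in Python. The median uses the standard library's `statistics.median`. The helper `modes` is a hypothetical function written for this example so that "no mode" and "several modes" are both visible. The list `two_modes` is made-up data, not from the slides.

```python
from collections import Counter
from statistics import median

data = [3, 1, 7, 22, 9, 10, 4, 17, 19]   # nine values from the slides

# Odd number of observations: the middle value of the arrayed data.
median_odd = median(data)                 # 9

# Even number of observations: the two middle values are averaged.
median_even = median(sorted(data)[:-1])   # (7 + 9) / 2 = 8.0

def modes(values):
    """Return every most-common value, or [] if all values appear once."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return [v for v, c in counts.items() if c == top]

no_mode = modes(data)                     # [] since every value appears once
two_modes = modes([2, 3, 3, 5, 5, 8])     # [3, 5] (hypothetical data)

print(median_odd, median_even, no_mode, two_modes)
```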
24 An arithmetic mean might be meaningless.
Example with nominal codes: ABC show 1, CBS show 2, NBC show.
$\bar{X} = 2.3$ is meaningless.
25 Characteristics of Mean, Median, Mode
Use the three averages together to determine the relative symmetry of a distribution.
Perfect symmetry: all three values (averages) are identical.
If the distribution has a tail on the right, it is skewed positively:
Arithmetic mean is largest.
Mode is smallest.
Median is about 2/3 of the way from the mode toward the mean.
26 Characteristics of Mean, Median, Mode
Mean is the largest because it is affected by large values.
Median is sensitive to the position of the values.
Arithmetic mean is the only one that can be used in algebraic calculations, which makes it most useful.
Downside: it is impossible to calculate with open-ended classes. This does not affect the other two averages.
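The ordering of the three averages in a positively skewed distribution can be illustrated with a small Python sketch. The data set below is hypothetical and was chosen so that the 2/3 rule of thumb holds exactly. Real data will only follow it approximately.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data (long tail toward large values).
skewed = [1, 3, 3, 3, 5, 6, 7, 9, 17]

avg = mean(skewed)      # 54 / 9 = 6
mid = median(skewed)    # middle (5th) value = 5
most = mode(skewed)     # 3 appears three times

# Fraction of the way from the mode to the mean at which the median sits.
position = (mid - most) / (avg - most)   # (5 - 3) / (6 - 3)

print(most, mid, avg, position)
```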
27 Measures of Dispersion
Range (R) = $X_n - X_1$
Can be used with mean, median, and midrange.
Range indicates both how high and low the numbers go and the spread of the data itself.
Based on two extreme values of the data set. Not the first choice for a measure of dispersion.
28 Quartile Deviation
$QD = \frac{Q_3 - Q_1}{2}$
Used only with the median. One half the distance between the first and the third quartiles.
30 Quartile Deviation
QD is similar to the range but uses values in the middle half of the distribution rather than the endpoints.
Poor measure when there is wide dispersion in the tails of the distribution!
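A minimal sketch of the quartile deviation in Python follows, using the nine data values from the earlier slides. Textbooks differ on how quartiles are located. The `method="inclusive"` option of `statistics.quantiles` (Python 3.8+) is one convention picked for this example and is not necessarily the one intended by the slides.

```python
from statistics import quantiles

data = [3, 1, 7, 22, 9, 10, 4, 17, 19]

# Three cut points that split the arrayed data into four parts.
# The "inclusive" method is an assumed convention for this sketch.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# Quartile deviation: half the distance between Q1 and Q3.
qd = (q3 - q1) / 2

# Range, for comparison: it depends only on the two endpoints.
data_range = max(data) - min(data)

print(q1, q2, q3, qd, data_range)
```

With this convention the quartiles are 4, 9, and 17, so QD is 6.5 while the range is 21.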
31 Standard Deviation
A measure of dispersion used with the arithmetic mean. Its value is based on all the observations of the data set.
32 SD For ungrouped data
SD is the most widely used measure of dispersion. Arithmetic mean is the most widely used average.
$s^2$ - sample variance, an estimate of the population variance $\sigma^2$ computed from sample data.
33 Standard Deviation
In repeated sampling, the sample variance computed with divisor $n$ is biased: on average it underestimates the population variance by the fixed factor $(n-1)/n$.
Thus a revision in the sample SD formula is needed: divide by $n-1$ for a sample.
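34 Example for Calculating SD
Days Absent, $x$   $x - \bar{x}$   $(x - \bar{x})^2$
 7                 -3                9
14                  4               16
 8                 -2                4
 5                 -5               25
15                  5               25
11                  1                1
Total 60            0               80
$\bar{x} = 60/6 = 10$
$s^2 = 80/(6 - 1) = 16$, so $s = 4$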
35 Standard Deviation (Another Formula)
This formula does not contain deviations from the mean.
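The two routes to the sample standard deviation can be compared in Python. This sketch reads the days-absent values as 7, 14, 8, 5, 15, 11, which reproduces the slide's mean of 10 and its totals of 60 and 80. The computational formula used here, $s = \sqrt{(\sum x^2 - (\sum x)^2/n)/(n-1)}$, is one common form that avoids deviations from the mean. It is an assumption that this is the form the slide intends.

```python
import math
from statistics import stdev

days_absent = [7, 14, 8, 5, 15, 11]
n = len(days_absent)
x_bar = sum(days_absent) / n                           # 60 / 6 = 10.0

# Definitional formula: squared deviations from the mean, divided by n - 1.
ss_dev = sum((x - x_bar) ** 2 for x in days_absent)    # 80.0
s_definitional = math.sqrt(ss_dev / (n - 1))           # sqrt(16)

# Computational formula: needs only the sum of x and the sum of x squared.
sum_x = sum(days_absent)                               # 60
sum_x2 = sum(x * x for x in days_absent)               # 680
s_computational = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

# Cross-check against the standard library (which also divides by n - 1).
s_library = stdev(days_absent)

print(s_definitional, s_computational, s_library)
```

All three values agree at 4, up to floating-point rounding.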
36 Two properties of $s$ and $\bar{X}$
1. If we add a constant to every element in the data set, the mean changes by that same value and the SD remains unchanged.
2. Multiplying each value of x by a constant multiplies the mean by the constant, the SD by the absolute value of the constant, and the variance by the square of the constant.
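Both properties can be verified numerically. The sketch below uses the days-absent values read as 7, 14, 8, 5, 15, 11 (mean 10, SD 4). The constants 5 and -2 are arbitrary choices for illustration.

```python
from statistics import mean, stdev, variance

data = [7, 14, 8, 5, 15, 11]      # mean 10, SD 4, variance 16

# Property 1: adding a constant shifts the mean and leaves the SD alone.
shifted = [x + 5 for x in data]
shifted_mean = mean(shifted)      # 10 + 5 = 15
shifted_sd = stdev(shifted)       # still 4

# Property 2: multiplying by a constant c (here c = -2).
scaled = [-2 * x for x in data]
scaled_mean = mean(scaled)        # -2 * 10 = -20 (sign is kept)
scaled_sd = stdev(scaled)         # |-2| * 4 = 8 (absolute value)
scaled_var = variance(scaled)     # (-2)**2 * 16 = 64

print(shifted_mean, shifted_sd, scaled_mean, scaled_sd, scaled_var)
```

A negative constant is used on purpose: it shows that the mean keeps the sign of the constant while the SD does not.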
37 Standardizing The Data
Subtract the mean from every element, then multiply every element in the data set by 1/s.
The result has mean 0 and SD 1.
This becomes Z for a population: $Z = \frac{X - \mu}{\sigma}$.
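Standardizing can be sketched in a few lines of Python. The data are the days-absent values read as 7, 14, 8, 5, 15, 11. The sample SD (divisor n - 1) is used here, which is an assumption about which SD the slide has in mind for sample data.

```python
from statistics import mean, stdev

data = [7, 14, 8, 5, 15, 11]
x_bar = mean(data)     # 10
s = stdev(data)        # 4.0 (sample SD, divisor n - 1)

# Subtract the mean from every element, then scale by 1/s.
z = [(x - x_bar) / s for x in data]
# z -> [-0.75, 1.0, -0.5, -1.25, 1.25, 0.25]

# The standardized values have mean 0 and SD 1.
z_mean = mean(z)
z_sd = stdev(z)

print(z, z_mean, z_sd)
```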
38 Coefficient of Variation
Uses the SD and mean to measure the variability of the data set.
Gives a relative measure of variability in the data set.
States how large the standard deviation is in comparison to the mean, in percentage terms.
CV = 100 would mean that $s$ and $\bar{X}$ are equal. When CV is over 50, use caution in stating that the mean represents the population.
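Since the CV expresses the SD as a percentage of the mean, it can be computed as $CV = 100 \cdot s / \bar{X}$. The Python sketch below applies this to the days-absent values read as 7, 14, 8, 5, 15, 11. The function name and the threshold check are illustrative.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample SD expressed as a percentage of the mean."""
    return 100 * stdev(values) / mean(values)

data = [7, 14, 8, 5, 15, 11]              # mean 10, SD 4
cv = coefficient_of_variation(data)       # 100 * 4 / 10 = 40.0

# Rule of thumb from the slide: be cautious when CV exceeds 50.
needs_caution = cv > 50

print(cv, needs_caution)
```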