 # Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc. Chapter Two Treatment of Data.


## 2.1-2.3 Graphical Methods

Graphing Categorical Data. Categorical data can be graphed with pie charts, bar charts, or Pareto diagrams.

Bar and Pie Charts. Bar charts and pie charts are often used for qualitative (categorical) data. The height of the bar or the size of the pie slice shows the frequency or percentage for each category.

Pie Chart Example: Current Investment Portfolio. (Variables are qualitative.)

| Investment Type | Amount (in thousands \$) | Percentage |
|---|---|---|
| Stocks | 46.5 | 42.27 |
| Bonds | 32.0 | 29.09 |
| CD | 15.5 | 14.09 |
| Savings | 16.0 | 14.55 |
| Total | 110 | 100 |

Pie chart labels, with percentages rounded to the nearest percent: Stocks 42%, Bonds 29%, Savings 15%, CD 14%.
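
The percentages in the table above can be recomputed from the amounts. The following is a minimal sketch; the variable names are illustrative and not part of the slides.

```python
# Amounts in thousands of dollars, taken from the investment table.
amounts = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}

total = sum(amounts.values())

# Percentage share of each category, rounded to two decimals as in the table.
percentages = {
    name: round(100 * value / total, 2) for name, value in amounts.items()
}

for name, pct in percentages.items():
    # The pie chart labels round each share to the nearest whole percent.
    print(f"{name}: {pct}% (pie label: {round(pct)}%)")
```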

Bar Chart Example [figure]

Pareto Diagram Example. A bar graph shows the % invested in each category, and a line graph shows the cumulative % invested.

Dot Plots. Represent data with dots. Observed values: 9, 10, 15, 22, 9, 15, 16, 24, 11. [Figure: dot plot on an axis marked 5, 10, 15, 20, 25]

2.3 Graphs of Frequency Distributions. Determine the frequency and relative frequency for each value of x. Then mark possible x values on a horizontal scale. Above each value, draw a rectangle whose height is the relative frequency of that value.

Ex. Students from a small college were asked how many charge cards they carry. x is the variable representing the number of cards, and the results are below.

Frequency Distribution

| x | # people | Rel. Freq. |
|---|---|---|
| 0 | 12 | 0.08 |
| 1 | 42 | 0.28 |
| 2 | 57 | 0.38 |
| 3 | 24 | 0.16 |
| 4 | 9 | 0.06 |
| 5 | 4 | 0.03 |
| 6 | 2 | 0.01 |
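
The relative frequencies in this example are each count divided by the total number of students. A short sketch of that calculation, with illustrative names, might look like this:

```python
# Number of students (value) carrying x charge cards (key), from the slide.
counts = {0: 12, 1: 42, 2: 57, 3: 24, 4: 9, 5: 4, 6: 2}

n = sum(counts.values())  # total number of students surveyed

# Relative frequency = frequency / n, rounded to two decimals as on the slide.
rel_freq = {x: round(count / n, 2) for x, count in counts.items()}

for x, rf in rel_freq.items():
    print(f"x = {x}: frequency = {counts[x]}, relative frequency = {rf:.2f}")
```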

Histograms. Credit card results:

| x | Rel. Freq. |
|---|---|
| 0 | 0.08 |
| 1 | 0.28 |
| 2 | 0.38 |
| 3 | 0.16 |
| 4 | 0.06 |
| 5 | 0.03 |
| 6 | 0.01 |

Histograms: Continuous Data, Equal Class Widths. Determine the frequency and relative frequency for each class. Then mark the class boundaries on a horizontal measurement axis. Above each class interval, draw a rectangle whose height is the relative frequency.

Frequency Distribution: Continuous Data. Continuous data may take on any value in some interval. Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature: 24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27.

Grouping Data by Classes.

- Sort raw data in ascending order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
- Find range: 58 - 12 = 46
- Select number of classes: 5 (usually between 5 and 20)
- Compute class width: 10 (46/5, then round off)
- Determine class boundaries: 10, 20, 30, 40, 50, 60
- Compute class midpoints: 15, 25, 35, 45, 55
- Count observations and assign them to classes

Frequency Distribution Example. Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

| Class | Frequency | Relative Frequency |
|---|---|---|
| 10 but under 20 | 3 | .15 |
| 20 but under 30 | 6 | .30 |
| 30 but under 40 | 5 | .25 |
| 40 but under 50 | 4 | .20 |
| 50 but under 60 | 2 | .10 |
| Total | 20 | 1.00 |
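
The counting step can be sketched in a few lines. This assumes the classes from the example (lower limit 10, width 10, five classes); the variable names are hypothetical.

```python
# Daily high temperatures from the insulation example (unsorted raw data).
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

lower_limit = 10   # lower boundary of the first class
class_width = 10   # 46 / 5, rounded off
num_classes = 5

# Count how many observations fall in each "a but under b" class.
frequencies = [0] * num_classes
for t in temps:
    index = (t - lower_limit) // class_width
    frequencies[index] += 1

relative = [f / len(temps) for f in frequencies]

for i, (f, r) in enumerate(zip(frequencies, relative)):
    low = lower_limit + i * class_width
    print(f"{low} but under {low + class_width}: {f} ({r:.2f})")
```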

Histograms. The classes or intervals are shown on the horizontal axis, and frequency is measured on the vertical axis. Bars of the appropriate heights represent the number of observations within each class. Such a graph is called a histogram.

Histogram Example. Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58. [Figure: histogram with class midpoints on the horizontal axis] No gaps between bars, since the data are continuous.

Histogram Shapes: symmetric, unimodal, bimodal, positively skewed, negatively skewed.

## 2.4 Stem-and-Leaf Displays

Stem-and-Leaf Displays.

1. Select one or more leading digits for the stem values. The trailing digits become the leaves.
2. List stem values in a vertical column.
3. Record the leaf for every observation.
4. Indicate the units for the stem and leaf on the display.

Stem-and-Leaf Example. Observed values: 9, 10, 15, 22, 9, 15, 16, 24, 11. Stem: tens digit. Leaf: units digit.

| Stem | Leaves |
|---|---|
| 0 | 9 9 |
| 1 | 1 0 5 5 6 |
| 2 | 2 4 |
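
A stem-and-leaf display for two-digit data can be built by splitting each value into its tens and units digits. The sketch below is one possible way to do it; note that it sorts the leaves within each stem, which is a presentation choice rather than something the slide requires.

```python
from collections import defaultdict

# Observed values from the example.
values = [9, 10, 15, 22, 9, 15, 16, 24, 11]

# Map each stem (tens digit) to its list of leaves (units digits).
stems = defaultdict(list)
for v in sorted(values):
    stem, leaf = divmod(v, 10)
    stems[stem].append(leaf)

# Print one row per stem, stems in ascending order.
for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
```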

Stem-and-Leaf Displays convey information about:

- Identification of a typical value
- Extent of spread about a value
- Presence of gaps
- Extent of symmetry
- Number and location of peaks
- Presence of outlying values

## 2.5 Descriptive Measures

Types of Variables. A variable is discrete if its set of possible values constitutes a finite set or can be listed in an infinite sequence. A variable is continuous if its set of possible values consists of an entire interval on a number line.

Measures of Center and Location (overview): mean, median, mode, weighted mean.

Mean (Average). The mean is the arithmetic average of data values.

- Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where n = sample size
- Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where N = population size

Mean (Average), continued. The most common measure of central tendency. Mean = sum of values divided by the number of values. Affected by extreme values (outliers). [Figure: two data sets plotted on a 0-10 axis, one with Mean = 3 and one with Mean = 4]

Median. The sample median, $\tilde{x}$, is the middle value in a set of data that is arranged in ascending order. For an even number of data points, the median is the average of the middle two. Population median: $\tilde{\mu}$.

Median. Not affected by extreme values. [Figure: two data sets plotted on a 0-10 axis, both with Median = 3]

Mode.

- A measure of central tendency
- Value that occurs most often
- Not affected by extreme values
- Used for either numerical or categorical data
- There may be no mode
- There may be several modes

[Figure: two dot plots, one on a 0-14 axis with Mode = 5 and one on a 0-6 axis with no mode]

Review Example. Five houses on a hill by the beach. House prices: \$2,000,000; \$500,000; \$300,000; \$100,000; \$100,000.

Summary Statistics. House prices: \$2,000,000; \$500,000; \$300,000; \$100,000; \$100,000. Sum: \$3,000,000.

- Mean: \$3,000,000 / 5 = \$600,000
- Median: middle value of ranked data = \$300,000
- Mode: most frequent value = \$100,000
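
These three summary statistics can be checked with Python's standard `statistics` module. This is only an illustrative sketch; the slides themselves do not use any software.

```python
import statistics

# House prices in dollars, from the review example.
prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(prices)      # sum of values / number of values
median_price = statistics.median(prices)  # middle value of the ranked data
mode_price = statistics.mode(prices)      # most frequent value

print(f"Mean:   {mean_price:,}")
print(f"Median: {median_price:,}")
print(f"Mode:   {mode_price:,}")
```

The single \$2,000,000 house pulls the mean well above the median, which is the outlier sensitivity the example is meant to show.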

Which measure of location is the “best”? The mean is generally used, unless extreme values (outliers) exist. Then the median is often used, since the median is not sensitive to extreme values. Example: Median home prices may be reported for a region, since they are less sensitive to outliers.

Shape of a Distribution. Describes how data is distributed: symmetric or skewed.

- Left-skewed (longer tail extends to left): Mean < Median < Mode
- Symmetric: Mean = Median = Mode
- Right-skewed (longer tail extends to right): Mode < Median < Mean

## 2.6 Quartiles and Percentiles

Other Measures of Location: Percentiles and Quartiles.

- 1st quartile = 25th percentile
- 2nd quartile = 50th percentile = median
- 3rd quartile = 75th percentile

The pth percentile in a data array (where 0 ≤ p ≤ 100): p% of the values are less than or equal to this value, and (100 - p)% are greater than or equal to this value.

Percentiles. The pth percentile in an ordered array of n values is the value in the ith position, where $i = \frac{p}{100}(n + 1)$. Example: The 60th percentile in an ordered array of 19 values is the value in the 12th position: $i = \frac{60}{100}(19 + 1) = 12$.

Quartiles. Quartiles split the ranked data into 4 equal groups of 25% each, separated by Q1, Q2, and Q3. Sample data in ordered array: 11 12 13 16 16 17 18 21 22. Example: Find the first quartile (n = 9). Q1 is the 25th percentile, so find the $\frac{25}{100}(9 + 1) = 2.5$ position. Use the value halfway between the 2nd and 3rd values, so Q1 = 12.5.
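
The position rule can be written as a small function. The sketch below assumes the position is p(n + 1)/100 and, when the position is fractional, interpolates linearly between the two neighbouring values; the quartile example only shows the halfway case, so the general interpolation is an assumption. The function name is hypothetical.

```python
import math


def percentile(sorted_values, p):
    """Return the p-th percentile of data already sorted in ascending order."""
    n = len(sorted_values)
    # 1-based position in the ordered array: i = p * (n + 1) / 100.
    position = p * (n + 1) / 100
    # Keep the position inside the array for very small or very large p.
    position = min(max(position, 1), n)
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0:
        return sorted_values[lower - 1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    # Assumption: interpolate linearly between the neighbouring values.
    return below + fraction * (above - below)


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1 = percentile(data, 25)
print(f"Q1 = {q1}")
```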

## Measures of Variability

Measures of Variation (overview): range, interquartile range, variance (population variance, sample variance), standard deviation (population standard deviation, sample standard deviation), coefficient of variation.

Variation. Measures of variation give information on the spread or variability of the data values. [Figure: distributions with the same center and different variation]

Range. The simplest measure of variation: the difference between the largest and the smallest observations. Range = $x_{\text{maximum}} - x_{\text{minimum}}$. Example: Range = 14 - 1 = 13. [Figure: data plotted on a 0-14 axis]

Interquartile Range. Some outlier problems can be eliminated by using the interquartile range: eliminate some high- and low-valued observations and calculate the range from the remaining values. Interquartile range = 3rd quartile - 1st quartile.

Interquartile Range Example. X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70. Interquartile range = 57 - 30 = 27.

Variance. Average of squared deviations of values from the mean.

- Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
- Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

Standard Deviation. The most commonly used measure of variation. Shows variation about the mean. Has the same units as the original data.

- Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
- Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

Ex. Sample Standard Deviation. Sample data ($x_i$): 10 12 14 15 17 18 18 24. n = 8. Mean = $\bar{x}$ = 16.
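
The transcript does not carry the worked calculation for this sample, so the following sketch works through it step by step using the sample standard deviation formula with n - 1 in the denominator. Variable names are illustrative.

```python
import math

# Sample data from the example.
data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)
mean = sum(data) / n  # 128 / 8 = 16

# Squared deviation of each value from the sample mean.
squared_deviations = [(x - mean) ** 2 for x in data]

# Sample variance divides by n - 1, not n.
sample_variance = sum(squared_deviations) / (n - 1)
sample_sd = math.sqrt(sample_variance)

print(f"n = {n}, mean = {mean}")
print(f"sum of squared deviations = {sum(squared_deviations)}")
print(f"s^2 = {sample_variance:.4f}, s = {sample_sd:.4f}")
```

For this sample the squared deviations sum to 130, so s is the square root of 130/7, roughly 4.31.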

Comparing Standard Deviations. [Figure: three data sets plotted on an 11-21 axis]

| Data set | Mean | s |
|---|---|---|
| Data A | 15.5 | 3.338 |
| Data B | 15.5 | .9258 |
| Data C | 15.5 | 4.57 |

Sample Variance. Variance is a measure of the spread of the data. The sample variance of the sample $x_1, x_2, \ldots, x_n$ of n values of X is given by $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$. We refer to $s^2$ as being based on n - 1 degrees of freedom.

Standard Deviation. The sample standard deviation is the square root of the sample variance: $s = \sqrt{s^2}$. Standard deviation is a measure of the spread of the data using the same units as the data.

Computing Formula for $s^2$. An alternative expression for the numerator of $s^2$ is $S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$.
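
A quick numerical check can show that the computing formula agrees with the definition. The sketch below reuses the eight-value sample from the standard deviation example; that choice of data, and the variable names, are assumptions made for illustration.

```python
# Sample reused from the sample standard deviation example.
data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)
mean = sum(data) / n

# Definition: sum of squared deviations from the mean.
s_xx_definition = sum((x - mean) ** 2 for x in data)

# Computing formula: sum of squares minus (sum)^2 / n.
s_xx_shortcut = sum(x * x for x in data) - sum(data) ** 2 / n

print(s_xx_definition, s_xx_shortcut)

# Either numerator gives the same sample variance.
sample_variance = s_xx_shortcut / (n - 1)
```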

Properties of $s^2$. Let $x_1, x_2, \ldots, x_n$ be any sample and c be any nonzero constant.

1. If $y_i = x_i + c$ for every i, then $s_y^2 = s_x^2$.
2. If $y_i = c x_i$ for every i, then $s_y^2 = c^2 s_x^2$ and $s_y = |c| s_x$,

where $s_x^2$ is the sample variance of the x’s and $s_y^2$ is the sample variance of the y’s.
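
The shift and scale behaviour of the sample variance can be illustrated numerically. In this sketch the sample and the constant c = 3 are arbitrary choices for illustration, not values from the slides.

```python
import math
import statistics

# Arbitrary illustrative sample and nonzero constant (assumptions).
x = [10, 12, 14, 15, 17, 18, 18, 24]
c = 3

shifted = [xi + c for xi in x]  # y_i = x_i + c
scaled = [c * xi for xi in x]   # y_i = c * x_i

var_x = statistics.variance(x)  # sample variance, n - 1 denominator
var_shifted = statistics.variance(shifted)
var_scaled = statistics.variance(scaled)

# Adding a constant leaves the variance unchanged.
print(math.isclose(var_shifted, var_x))
# Multiplying by c multiplies the variance by c squared.
print(math.isclose(var_scaled, c ** 2 * var_x))
```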

Coefficient of Variation. Measures relative variation. Always in percentage (%). Shows variation relative to the mean. Is used to compare two or more sets of data measured in different units.

- Population: $CV = \left(\frac{\sigma}{\mu}\right) \cdot 100\%$
- Sample: $CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$

Comparing Coefficient of Variation.

- Stock A: average price last year = \$50, standard deviation = \$5
- Stock B: average price last year = \$100, standard deviation = \$5

Both stocks have the same standard deviation, but stock B is less variable relative to its price.
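
Working the two stocks through the coefficient of variation formula makes the comparison concrete. The function name below is hypothetical.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(std_dev=5, mean=50)   # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)  # Stock B

print(f"Stock A: CV = {cv_a}%")
print(f"Stock B: CV = {cv_b}%")
```

Stock A comes out at 10% and stock B at 5%, so B varies half as much relative to its price.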

The Empirical Rule. If the data distribution is bell-shaped, then the interval $\mu \pm 1\sigma$ contains about 68% of the values in the population or the sample.

The Empirical Rule (continued). For a bell-shaped data distribution, the interval $\mu \pm 2\sigma$ contains about 95% of the values in the population or the sample, and the interval $\mu \pm 3\sigma$ contains about 99.7% of the values in the population or the sample.
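
The 68%, 95%, and 99.7% figures match what an exact normal distribution gives. The sketch below assumes the normal distribution as the model for "bell-shaped" and uses the standard identity that the share within k standard deviations of the mean is erf(k / sqrt(2)); the function name is hypothetical.

```python
import math


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

Real data that are only roughly bell-shaped will match these percentages only approximately.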

Standardized Data Values. A standardized data value refers to the number of standard deviations a value is from the mean. Standardized data values are sometimes referred to as z-scores.

Standardized Population Values. $z = \frac{x - \mu}{\sigma}$, where: x = original data value, μ = population mean, σ = population standard deviation, z = standard score (number of standard deviations x is from μ).

Standardized Sample Values. $z = \frac{x - \bar{x}}{s}$, where: x = original data value, $\bar{x}$ = sample mean, s = sample standard deviation, z = standard score (number of standard deviations x is from $\bar{x}$).
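
As an illustration, sample z-scores can be computed with the standard `statistics` module. The data below are borrowed from the sample standard deviation example; pairing them with this slide is an assumption made for the sketch.

```python
import statistics

# Sample borrowed from the sample standard deviation example.
data = [10, 12, 14, 15, 17, 18, 18, 24]

sample_mean = statistics.mean(data)  # 16
sample_sd = statistics.stdev(data)   # uses the n - 1 denominator


def z_score(x):
    """Number of sample standard deviations that x lies from the sample mean."""
    return (x - sample_mean) / sample_sd


for x in data:
    print(f"x = {x:2d}, z = {z_score(x):+.2f}")
```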

Box and Whisker Plot. A graphical display of data using the 5-number summary: Minimum, Q1, Median, Q3, Maximum. [Figure: example box-and-whisker plot with segments labeled 25%]

Distribution Shape and Box and Whisker Plot. [Figure: box-and-whisker plots, each marked Q1, Q2, Q3, for left-skewed, symmetric, and right-skewed distributions]

Box-and-Whisker Plot Example. Data: 0 2 2 2 3 3 4 5 5 10 27. The box-and-whisker plot marks Min = 0, Q1 = 2, Q2 = 3, Q3 = 5, Max = 27. The data are very right-skewed, as the plot depicts.
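
The five-number summary for this data set can be reproduced with `statistics.quantiles` (Python 3.8 or later). Its `"exclusive"` method places the quartiles at positions based on n + 1, which agrees with the percentile position rule used earlier in these slides; treating the two as equivalent for this data set is an assumption of the sketch.

```python
import statistics

# Data from the box-and-whisker example.
data = [0, 2, 2, 2, 3, 3, 4, 5, 5, 10, 27]

# n=4 asks for the three cut points that split the data into quartiles.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

five_number_summary = (min(data), q1, q2, q3, max(data))
print("Min, Q1, Q2, Q3, Max =", five_number_summary)

# The upper whisker (Q3 to Max) is far longer than the lower one (Min to Q1),
# which is the right skew the plot shows.
print("lower whisker:", q1 - min(data), "upper whisker:", max(data) - q3)
```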
