Ch 4 實習.

Ch 4 實習

Numerical Descriptive Techniques
Measures of Central Location Mean, Median, Mode Measures of Variability Range, Standard Deviation, Variance, Coefficient of Variation Measures of Relative Standing Percentiles, Quartiles Measures of Linear Relationship Covariance, Correlation

Sum of the observations Number of observations Mean =
The Arithmetic Mean This is the most popular and useful measure of central location Drawback Very sensitive to extreme values (outliers) Sum of the observations Number of observations Mean =

Sample and population medians are computed the same way.
The Median The Median of a set of observations is the value that falls in the middle when the observations are arranged in order of magnitude. Sample and population medians are computed the same way. Example Find the median of the time on the internet for the 10 adults Suppose only 9 adults were sampled Comment 0, 0, 5, 7, 8, 9, 12, 14, 22, 33 Even number of observations Odd number of observations 0, 0, 5, 7, 8, , 12, 14, 22, 33 8.5, 0, 0, 5, 7, 8 9, 12, 14, 22 8

The Mode The Mode of a set of observations is the value that occurs most frequently. Set of data may have one mode (or modal class), or two or more modes. When the number of all data appears only once, mode doesn’t exit. For large data sets the modal class is much more relevant than a single-value mode. The modal class

Example 1 The times (to the nearest minute) that a sample of 9 bank customers waited in line were recorded and are listed here. Determine the mean, median, and mode for these data.

Solution Mean=(7+4+0+2+7+3+1+9+12)/9=5 Ordered data：0 1 2 3 4 7 7 9 12
Median=4, Mode=7

Relationship among Mean, Median, and Mode
If a distribution is symmetrical, the mean, median and mode coincide If a distribution is asymmetrical, and skewed to the left or to the right, the three measures differ. A positively skewed distribution (“skewed to the right”) Mode Mean Median

Relationship among Mean, Median, and Mode
If a distribution is symmetrical, the mean, median and mode coincide If a distribution is non symmetrical, and skewed to the left or to the right, the three measures differ. A positively skewed distribution (“skewed to the right”) A negatively skewed distribution (“skewed to the left”) Mode Mean Mean Mode Median Median

? ? ? The range But, how do all the observations spread out?
The range of a set of observations is the difference between the largest and smallest observations. Its major advantage is the ease with which it can be computed. Its major shortcoming is its failure to provide information on the dispersion of the observations between the two end points. But, how do all the observations spread out? ? ? ? The range cannot assist in answering this question Range Smallest observation Largest observation

Note! the denominator is sample size (n) minus one !
Variance… population mean The variance of a population is: The variance of a sample is: population size sample mean Note! the denominator is sample size (n) minus one !

思考!! 如何測量離散程度?? 每各數字跟平均值差的和? 每各數字跟平均值的差的平方的和?
都會是0 每各數字跟平均值的差的平方的和? 資料越多會越大所以每各數字跟平均值的差的平方的和再取平均來測量離散程度比較合理原來是這樣~~

Variance… As you can see, you have to calculate the sample mean (x-bar) in order to calculate the sample variance. Alternatively, there is a short-cut formulation to calculate sample variance directly from the data without the intermediate step of calculating the mean. Its given by:0

Coefficient of Variation…
The coefficient of variation of a set of observations is the standard deviation of the observations divided by their mean, that is: Population coefficient of variation = CV = Sample coefficient of variation = cv = CV是相對離勢量數（measure of relative disperson）比較幾組資料單位不同的差異情形。比較幾組資料單位相同，但平均數相差懸殊之差異情形。

The Empirical Rule… Approximately 68% of all observations fall
within one standard deviation of the mean. Approximately 95% of all observations fall within two standard deviations of the mean. Approximately 99.7% of all observations fall within three standard deviations of the mean.

Chebysheff’s Theorem…
A more general interpretation of the standard deviation is derived from Chebysheff’s Theorem, which applies to all shapes of histograms (not just bell shaped). The proportion of observations in any sample that lie within k standard deviations of the mean is at least: For k=2 (say), the theorem states that at least 3/4 of all observations lie within 2 standard deviations of the mean. This is a “lower bound” compared to Empirical Rule’s approximation (95%).

Example 2 Determine the variance, standard deviation, range, and the cv of the following sample.

Solution

Example 3 班上同學的第一次考試成績平均為60分，標準差為11分，全班共有100人，成績分配不為常態分配，請問班上至少有多少人位於82分和38分之間? Solution

Example 4 王先生最近打算將手中的閒錢用來購買股票，在要求的投資報酬率下，初步篩選可投資股票，王先生決定由A、B、C三家上市公司的股票中擇一購買。由最近30天的收盤資料得三股票的收盤平均值與標準差分別為： (單位：元) 若王先生是一個保守的投資人，則他會購買何家公司的股票?又他所依據的決策準則為何?

Solution 追求相對風險，即追求三者中Risk最小者因為王先生為一個保守的投資人所以他追求為相對風險最小，所以要用CV來衡量

Measures of Relative Standing and Box Plots
Percentile The pth percentile of a set of measurements is the value for which p percent of the observations are less than that value 100(1-p) percent of all the observations are greater than that value. Example Suppose your score is the 60% percentile of a SAT test. Then 60% of all the scores lie here 40% Your score

Quartiles Commonly used percentiles
First (lower)decile = 10th percentile First (lower) quartile, Q1, = 25th percentile Second (middle)quartile,Q2, = 50th percentile Third quartile, Q3, = 75th percentile Ninth (upper)decile = 90th percentile

Location of Percentiles
Find the location of any percentile using the formula

Example 5 Determine the first, second, and third quartiles of the following data

Solution 排序後數列 First quartile: L25=(13+1)*25/100=3.5; the first quartile is 13.05 Second quartile: L50=(13+1)*50/100=7; the first quartile is 14.7 Third quartile: L75=(13+1)*75/100=10.5; the first quartile is 15.6

Interquartile range = Q3 – Q1
This is a measure of the spread of the middle 50% of the observations Large value indicates a large spread of the observations Interquartile range = Q3 – Q1

Box Plot This is a pictorial display that provides the main descriptive measures of the data set: L - the largest observation Q3 - The upper quartile Q2 - The median Q1 - The lower quartile S - The smallest observation 1.5(Q3 – Q1) Whisker S Q1 Q2 Q3 L

Measures of Linear Relationship…
We now present two numerical measures of linear relationship that provide information as to the strength & direction of a linear relationship between two variables (if one exists). They are the covariance and the coefficient of correlation. Covariance - is there any pattern to the way two variables move together? Coefficient of correlation - how strong is the linear relationship between two variables?

Covariance… population mean of variable X, variable Y
sample mean of variable X, variable Y Note: divisor is n-1, not n as you may expect.

Covariance… In much the same way there was a “shortcut” for calculating sample variance without having to calculate the sample mean, there is also a shortcut for calculating sample covariance without having to first calculate the mean:

Covariance Illustrated…
Consider the following three sets of data (textbook §4.5)… In each set, the values of X are the same, and the value for Y are the same; the only thing that’s changed is the order of the Y’s. In set #1, as X increases so does Y; Sxy is large & positive In set #2, as X increases, Y decreases; Sxy is large & negative In set #3, as X increases, Y doesn’t move in any particular way; Sxy is “small”

Covariance… (Generally speaking)
When two variables move in the same direction (both increase or both decrease), the covariance will be a large positive number. When two variables move in opposite directions, the covariance is a large negative number. When there is no particular pattern, the covariance is a small number.

Coefficient of Correlation…
The coefficient of correlation is defined as the covariance divided by the standard deviations of the variables: Greek letter “rho” This coefficient answers the question: How strong is the association between X and Y?

The advantage of the coefficient of correlation over covariance is that it has fixed range from -1 to +1, thus: If the two variables are very strongly positively related, the coefficient value is close to +1 (strong positive linear relationship). If the two variables are very strongly negatively related, the coefficient value is close to -1 (strong negative linear relationship). No straight line relationship is indicated by a coefficient close to zero.

+1 -1 r or r = Strong positive linear relationship No linear relationship Strong negative linear relationship

Example 4.16 Calculate the coefficient of correlation for the three sets of data above.

Example 4.16 Because we’ve already calculated the covariances we only need compute the standard deviations of X and Y.

Example 4.16 The standard deviations are

Example 4.16 The coefficients of correlation are Set 1: Set 2: Set 3:

Example 6 A retailer wanted to estimate the monthly fixed and variable selling expenses. As a first step she collected data from the past 8 months. The total selling expenses (in $thousands) and the total sales (in $thousands were recorded and listed below. Compute the covariance and the coefficient of correlation Total sales Selling Expenses 20 14 40 16 60 18 50 17 55 70

Solution

鑑往知來樣本為96年學年度修郭老師統計課的考試成績

Ch 4 實習.

Similar presentations

Presentation on theme: "Ch 4 實習."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Ch 4 實習.

Similar presentations

Presentation on theme: "Ch 4 實習."— Presentation transcript:

Similar presentations

About project

Feedback