Numerical Measures: Centrality and Variability

PSYSTA1 – Week 5

Measures of Central Tendency
Statistical measures used to determine a single score that defines the “center” of a distribution.
Goal: find a single score that is most typical or most representative of the entire group.
Most commonly used measures: Mean, Median, Mode

(Arithmetic) Mean
defined as the sum of the scores divided by the number of scores, i.e., computed by adding all the scores in the distribution and dividing by the number of scores
Population Mean (for a population of size $N$): $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$
Sample Mean (for a sample of size $n$): $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$

Median
defined as the scale value below which 50% of the scores fall; it is the value of the middle term in a data set that has been ranked in increasing order
$\mathrm{med}(x)$ = the $\left(\frac{n+1}{2}\right)$th smallest value, if $n$ is odd
$\mathrm{med}(x)$ = the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th smallest values, if $n$ is even
Caution: The observations must be arranged first (preferably in ascending order)!
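The odd/even rule for the median can be sketched in a few lines of Python. This is only an illustration: the function name `median` and the two small data sets are assumptions made for the example, not part of the lecture material.

```python
def median(values):
    """Median by the odd/even rule; the observations are sorted first."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)          # arrange in ascending order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)-th smallest value (0-based index n // 2)
        return ordered[mid]
    # n even: average of the (n / 2)-th and (n / 2 + 1)-th smallest values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_scores = [14, 5, 8, 11, 7]        # hypothetical, n = 5
even_scores = [14, 5, 8, 11, 7, 10]   # hypothetical, n = 6

print(median(odd_scores))    # 8
print(median(even_scores))   # 9.0
```

Sorting inside the function guards against the most common mistake, which is picking the middle position of unsorted data.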

Mode
defined as the most frequent score in the distribution; it is the value that occurs with the highest frequency in a data set

Example 1
A student kept track of the number of hours she studied each day for a 2-week period. The following daily scores were recorded (scores are in hours):
Compute and interpret the mean, the median, and the mode for the given data set.

Some PROPERTIES
The mean is sensitive to outliers (i.e., its value is easily affected by the presence of outliers), while the median is not (i.e., it is robust against outliers).
Note: Outliers, also referred to as extreme values, are values that are very small or very large relative to the majority of the other values in a data set.
Illustration: Consider the data set 5, 7, 8, 11, 14. It can be shown that the mean is 9 and the median is 8. But if 14 is changed to 94, the mean changes drastically to 25, while the median is still 8.
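The illustration can be checked with Python's standard `statistics` module. The variable names below are chosen for this sketch only; the data are the two versions of the data set from the illustration.

```python
from statistics import mean, median

original = [5, 7, 8, 11, 14]
with_outlier = [5, 7, 8, 11, 94]   # 14 replaced by the outlier 94

# The mean moves from 9 to 25, while the median stays at 8.
print("original:    ", mean(original), median(original))
print("with outlier:", mean(with_outlier), median(with_outlier))
```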

Some PROPERTIES
Note that, unlike the mean and the median, the mode is not uniquely defined for some data sets. A data set may have no mode (i.e., the frequency of each value is the same). Also, there may be more than one mode in a data set; such a data set is called multimodal. When there are exactly two modes, the data set is said to be bimodal.
Illustration:
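As a rough sketch of the three cases (one mode, two modes, no mode), the helper below counts frequencies and follows the convention stated above that a data set whose values all occur equally often has no mode. The function name `modes` and the three data sets are hypothetical and serve only as an example.

```python
from collections import Counter


def modes(values):
    """Return the list of modes; an empty list means the data set has no mode."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    # Convention used here: if several distinct values all share the same
    # frequency, no value is "most frequent", so there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([2, 3, 3, 5, 7]))       # [3]     one mode (unimodal)
print(modes([1, 2, 2, 4, 5, 5]))    # [2, 5]  two modes (bimodal)
print(modes([4, 6, 8, 9]))          # []      no mode
```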

Some PROPERTIES
However, one relative advantage of the mode over both the mean and the median is that it can also be determined for qualitative data, whereas the mean and the median can only be calculated from quantitative data.
Illustration: A group of students is asked about their favorite event in the upcoming sports festival. Their answers are given below. Give the mode of the data set.
Clearly, for this data set, the modal response is basketball, but the mean and the median cannot be determined.
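Finding the mode of qualitative data only requires counting, as the following sketch shows. The list of responses is hypothetical (it is not the actual set of student answers) and was chosen so that basketball is the modal response.

```python
from collections import Counter

# Hypothetical responses, for illustration only.
responses = [
    "basketball", "volleyball", "basketball", "chess",
    "badminton", "basketball", "volleyball",
]

counts = Counter(responses)
modal_response, frequency = counts.most_common(1)[0]
print(modal_response, frequency)   # basketball 3
```

No arithmetic is performed on the labels themselves, which is why the mode is available here while the mean and the median are not.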

Some PROPERTIES
In relation to histograms (i.e., locating the centers):

Measures of Variability
Statistical measures that quantify the differences between scores in a distribution and describe the degree to which the scores are spread out or clustered together.
Goal: quantify the extent of dispersion.
Most commonly used measures:
Absolute measures: Range, Variance, Standard Deviation
Relative measure: Coefficient of Variation

Range
defined as the difference between the highest and lowest scores in the distribution, i.e., the distance covered by the scores from the smallest score to the largest score
$\mathrm{Range}(x) = \max(x) - \min(x) = \text{Highest Value} - \text{Lowest Value}$

Variance
the most used measure of variability, which tells how closely the values of the data set are clustered around the mean; computed by taking the “mean” squared deviation (around the mean), where a deviation is $x_i - \mu$ or $x_i - \bar{x}$
Population Variance (for a population of size $N$): $\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Sample Variance (for a sample of size $n$): $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
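The two variance formulas differ only in the divisor ($N$ versus $n-1$). The sketch below computes both from the definition and compares them with `pvariance` and `variance` from Python's standard `statistics` module. The helper names `population_variance` and `sample_variance` are assumptions made for this example; the data set 5, 7, 8, 11, 14 is borrowed from the outlier illustration.

```python
from statistics import pvariance, variance


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    if len(values) < 2:
        raise ValueError("the sample variance needs at least two observations")
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


data = [5, 7, 8, 11, 14]   # mean 9; squared deviations 16, 4, 1, 4, 25 (sum 50)

print(population_variance(data), pvariance(data))   # 50 / 5 = 10
print(sample_variance(data), variance(data))        # 50 / 4 = 12.5
```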

Standard Deviation
calculated as the positive (principal) square root of the variance; indicates the “average deviation” from the mean, the consistency in the scores, and how far scores are spread out around the mean
Population Standard Deviation: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
Sample Standard Deviation: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
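Because the standard deviation is just the square root of the variance, it can be obtained either from the variance or directly with `pstdev` and `stdev` from Python's standard `statistics` module. The data set below is again 5, 7, 8, 11, 14 from the outlier illustration, used here purely as an example.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

data = [5, 7, 8, 11, 14]

sigma = math.sqrt(pvariance(data))   # population SD: sqrt(10)
s = math.sqrt(variance(data))        # sample SD: sqrt(12.5)

print(round(sigma, 4), round(pstdev(data), 4))   # both about 3.1623
print(round(s, 4), round(stdev(data), 4))        # both about 3.5355
```

Unlike the variance, the standard deviation is expressed in the same unit as the original scores, which makes it easier to interpret.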

Coefficient of Variation
a measure of relative variability, which is the ratio of the standard deviation to the mean, usually expressed as a percentage
Population Coefficient of Variation: $\mathrm{cv}(x) = \dfrac{\sigma}{\mu} \times 100\%$
Sample Coefficient of Variation: $\mathrm{cv}(x) = \dfrac{s}{\bar{x}} \times 100\%$

Example 2
Compute the range, the standard deviation, the variance, and the coefficient of variation for the data given in Example

Some PROPERTIES
Among the absolute measures of variability, the range is the simplest (in terms of computation and meaning); its disadvantage, however, is that it considers only two values (i.e., the max and the min). The standard deviation and the variance, on the other hand, are computationally more complicated than the range but are still the most commonly used measures of dispersion, because both consider every value in the data set through its deviation from the mean. However, all three of these measures are sensitive to outliers.

Some PROPERTIES
All the absolute measures of variability are sensitive to the scale of measurement (i.e., larger scale magnitudes lead to a larger “absolute variation measure”). To compare the variation of two data sets having different measurement scales, it is necessary to consider a “dimensionless measure” (thus, a relative measure). In that case, the coefficient of variation is most commonly used.
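The effect of the measurement scale can be seen by expressing the same measurements in two units. In the sketch below, the helper name `coefficient_of_variation` and the times are hypothetical; the point is only that converting minutes to seconds multiplies the standard deviation by 60 but leaves the coefficient of variation unchanged.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample coefficient of variation in percent: (s / x_bar) * 100."""
    x_bar = mean(values)
    if x_bar == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return stdev(values) / x_bar * 100


minutes = [30, 35, 40, 32, 38]         # hypothetical times in minutes
seconds = [m * 60 for m in minutes]    # the same times in seconds

# The absolute measure depends on the unit ...
print(round(stdev(minutes), 2), round(stdev(seconds), 2))
# ... while the relative measure does not.
print(round(coefficient_of_variation(minutes), 2),
      round(coefficient_of_variation(seconds), 2))
```

This is why two data sets recorded in different units should be compared through their coefficients of variation rather than their standard deviations.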

Example 3
The following table shows the time needed for each worker (A and B) to complete a specific task, measured over 7 repetitions. However, due to miscommunication, the times recorded for worker A were measured in minutes, while those for worker B were measured in seconds. Compute the variation in completion times for each worker. Then, comment on the consistency of the completion times of each worker (i.e., identify which worker completes the task more consistently).
Worker 1 2 3 4 5 6 7
A (in mins) 32 34 41 27 33
B (in secs) 2280 1920 2100 1980 1860
